ENCYCLOPEDIA OF IMAGING SCIENCE AND TECHNOLOGY
Editor
Joseph P. Hornak, Rochester Institute of Technology

Editorial Board
Christian DeMoustier, Scripps Institution of Oceanography
William R. Hendee, Medical College of Wisconsin
Jay M. Pasachoff, Williams College
William Philpot, Cornell University
Joel Pokorny, University of Chicago
Edwin Przybylowicz, Eastman Kodak Company
John Russ, North Carolina State University
Kenneth W. Tobin, Oak Ridge National Laboratory
Mehdi Vaez-Iravani, KLA-Tencor Corporation

Editorial Staff
Vice President, STM Books: Janet Bailey
Vice President and Publisher: Paula Kepos
Executive Editor: Jacqueline I. Kroschwitz
Director, Book Production and Manufacturing: Camille P. Carter
Managing Editor: Shirley Thomas
Editorial Assistant: Susanne Steitz
ENCYCLOPEDIA OF IMAGING SCIENCE AND TECHNOLOGY
Joseph P. Hornak, Rochester Institute of Technology, Rochester, New York
The Encyclopedia of Imaging Science and Technology is available Online in full color at www.interscience.wiley.com/eist
A Wiley-Interscience Publication
John Wiley & Sons, Inc.
This book is printed on acid-free paper.

Copyright © 2002 by John Wiley & Sons, Inc., New York. All rights reserved. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: [email protected].

For ordering and customer service, call 1-800-CALL-WILEY.

Library of Congress Cataloging in Publication Data:
Encyclopedia of imaging science and technology / [edited by Joseph P. Hornak].
p. cm.
"A Wiley-Interscience publication."
Includes index.
ISBN 0-471-33276-3 (cloth : alk. paper)
1. Image processing–Encyclopedias. 2. Imaging systems–Encyclopedias. I. Hornak, Joseph P.
TA1632.E53 2001
621.36′7′03–dc21
2001046915

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
PREFACE

Welcome to the Encyclopedia of Imaging Science and Technology. The Encyclopedia of Imaging Science and Technology is intended to be a definitive source of information for the new field of imaging science, and it is available both in print and Online at www.interscience.wiley.com/eist.

To define imaging science, we first need to define an image. An image is a visual representation of a measurable property of a person, object, or phenomenon, and imaging is the creation of this visual representation. Imaging science, therefore, is the science of imaging.

Humans are primarily visual creatures. We receive most of our information about the world through our eyes. This information can be language (words) or images (pictures), but we generally prefer pictures. This is the reason that between 1200 and 1700 A.D. artists were commissioned to paint scenes from the Bible. More recently, it is the reason that television and movies are more popular than books, and that today's newspapers are filled with pictures. We rely more and more on images for transmitting information.

Take a newspaper as the first piece of supporting evidence for this statement. The pictures in the newspaper are possible only because of image printing technology, which converts gray-scale pictures into a series of different-sized dots that allow our eyes to perceive the original picture. Chances are that some of the pictures in your newspaper are in color; color plays an important role in many imaging techniques. This more recent printing technology converts the color image to a series of cyan, magenta, yellow, and black dots for printing. That is the technology of printing an image, but let's examine a few of the pictures in a typical newspaper in more detail to see how ubiquitous imaging science is.
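The gray-scale-to-dots conversion described above (halftoning) is simple enough to sketch in a few lines of code. The following Python fragment is an illustrative toy, not encyclopedia content: it maps each gray value of a small patch to one of five dot sizes, rendered here as text characters. A real halftone screen uses continuously variable dot geometry and rotated screen angles.

```python
# Toy halftone: map gray levels (0 = black, 255 = white) to dot "sizes".
# Darker pixels get larger dots, just as in newspaper printing.

DOTS = " .oO@"  # blank for highlights, large dot for shadows

def halftone(gray_rows):
    """Render a gray-scale image (rows of 0-255 values) as text dots."""
    lines = []
    for row in gray_rows:
        # (255 - g) grows as the pixel darkens; scale it onto the 5 dot sizes.
        lines.append("".join(DOTS[min(4, (255 - g) * 5 // 256)] for g in row))
    return "\n".join(lines)

patch = [
    [255, 192, 128, 64, 0],
    [0, 64, 128, 192, 255],
]
print(halftone(patch))
```

Printed at a coarse enough scale, the eye blends such dots back into a continuous-tone picture.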
A word about color in the Encyclopedia: in the Online Encyclopedia, all images provided by authors in color appear in color. Color images in the print Encyclopedia are more limited and appear in color inserts in the middle of each volume.

The pictures of a politician or sports hero on the front page were probably taken with a camera using silver halide film, one of the more common imaging systems. If the story is very recent, chances are that a digital camera was used instead. The imaging technology used in these cameras decreases the time between image capture and printing of the newspaper, while adding editorial flexibility.

Moving further into the paper, you might see a feature story with images from the sunken RMS Titanic, located approximately 2 miles beneath the surface of the North Atlantic Ocean. The color pictures were taken with a color video camera system, one of the products of the late twentieth century. This specific camera system can withstand the 296 atmospheres of pressure and provide the illumination to see in total darkness and murky water. Another feature in the paper describes the planets that astronomers have found orbiting stars from 10.5 to 202 light-years away. This feat would never be possible if it were not for the powerful telescopes available to astronomers.

Turning to the weather page, we see a 350-mile field-of-view satellite image of a typhoon in the South Pacific Ocean. This image is from one of several orbiting satellites devoted to providing information for weather forecasting. The time progression of the images from this satellite enables meteorologists to predict the path of the typhoon and save lives. Other images on this page of the newspaper show the regions of precipitation on a map (Fig. 1; see color insert). Images of this type are composites of radar reflectivity imagery from dozens of radar sites. Although radar was developed in the mid-twentieth century, the ability to image precipitation, its cumulative amount, and its velocity was not developed until the last quarter of the century. Using this imagery, we can tell whether our airline flights will be delayed or our weekend barbecue will be rained out.

Another article shows a U.S. National Aeronautics and Space Administration (NASA) satellite image of a smoke cloud from wildfires in the Payette National Forest in Idaho. The smoke cloud is clearly visible, spreading 475 miles across the entire length of the state of Montana. A U.S. National Oceanic and Atmospheric Administration (NOAA) infrared satellite image shows the hot spots in another fire across the border in Montana. This imagery allows firefighters to allocate resources more efficiently to battle the wildfires. We take much of this satellite imagery for granted today, but it was not until the development of rockets and imaging cameras that this imaging technology became possible and practical. For example, before electronic cameras were available, some satellites used photographic film that was sent back to earth in a recovery capsule, hardly the instantaneous form used today.

Turning to the entertainment section, we see a description of the way computer graphics were used to generate the special-effects imagery in the latest motion picture. The motion picture industry has truly been revolutionized by the special effects made available by digital image processing, which was originally developed to remove noise from images, enhance image quality, and find features in images.

Those who do not receive their information in printed form need only look at the popularity of television, film and video cameras, and photocopiers to be convinced that imaging science and technology touches our daily lives. Nearly every home and motel room has a television, and considering that the networked desktop computer is becoming a form of television, the prevalence of television is even greater. Most people in the developed world own or have used a photographic film, digital, or video camera. Photocopiers are found in nearly all businesses, libraries, schools, post offices, and grocery stores. In addition to these obvious imaging systems, there are lesser known ones that have had just as important an impact on our lives. These include X-ray, ultrasound, and magnetic resonance
imaging (MRI) medical imaging systems; radar for air traffic control; and reflection seismology for oil exploration. Imaging science is a relatively new discipline; the term was first defined in the 1980s. It is informative to examine when, and in some cases how recently, many of the imaging systems on which we have become so dependent came into existence. It is also interesting to see how the development of one technology initiated new imaging applications. The time line in Table 1 (see pages ix–x) summarizes some of the imaging-related developments. In presenting this time line, it is helpful to note that the first person who discovers or invents something often does not receive credit. Instead, the discovery or invention is attributed to the person who popularizes or mass-markets it. A good example is the development of the optical lens (1). The burning glass, a piece of glass used to concentrate the sun's energy and start a fire, was first referred to in a 424 B.C. play by Aristophanes. The Roman philosopher Seneca (4 B.C.–65 A.D.) is alleged to have read books by peering at them through a glass globe of water to produce magnification. In 50 A.D., Cleomedes studied refraction. Pliny (23–79 A.D.) indicated that Romans had burning glasses. Claudius Ptolemy (85–165 A.D.), the Egyptian-born Greek astronomer and mathematician, mentions the general principle of magnification. Clearly, the concept of the lens was developed before the Dark Ages. But because this early technology was lost in the Dark Ages, reading stones, or magnifying glasses, are said to have been developed in Europe about 1000 A.D., and the English philosopher Roger Bacon is credited with the idea of using lenses to correct vision in 1268. Therefore, this historical account of imaging science is by no means kind to the actual first inventors and discoverers of an imaging technology. It is, however, a fair record of the availability of imaging technology in society.
It is clear from the historical account that technologies are continually evolving to increase the amount of visual information available to humans. The explosive growth of these technologies, and the need to understand and use them, was so great that the field of imaging science emerged. The field was once called photographic science; it is now given the more general name imaging science to include all forms of visual image generation. Because many are unfamiliar with imaging science as a discipline, this introduction defines the field and its scope.

As stated earlier, an image is a visual representation of some measurable property of a person, object, or phenomenon. The visual representation, or map, can be one-, two-, three-, or higher-dimensional. For this encyclopedia, the representation is intended to be exact, not abstract. The measurable property can be physical, chemical, or electrical. A few examples are the reflection, emission, transmission, or absorption of electromagnetic radiation, particles, or sound. The definition of an image implies that an image can also be of a phenomenon, such as temperature, an electric field, or gravity. The device that creates the image is called the imaging system. All imaging systems create visual representations of objects, persons, and phenomena. These representations are intended to be interpreted by the human mind, typically using the eyes, but in some cases by an expert system.
Many imaging systems create visual maps of what the eye and mind can see. Others serve as transducers, converting what the eye cannot see (infrared radiation, radio waves, sound waves) into a visual representation that the eye and mind can see. Imaging systems evolve and keep pace with our scientific and technological knowledge. In prehistoric times, imaging was performed by individuals skilled in creating visual likenesses of objects with their hands. The imaging tools were a carving instrument and a piece of wood, or a piece of charcoal and a cave wall. As time passed, imaging required more scientific knowledge because the tools became more technologically advanced. From 5000 B.C. to about 1000 A.D., persons interested in creating state-of-the-art images needed knowledge of pigments, oils, and color mixing to create paintings, or of the cleavage planes in stone to create sculptures. By the time photography was developed, persons wishing to use this imaging modality had to know optics and the chemistry of silver halides. The development of television further increased the level of scientific knowledge required by those practicing this form of imaging. Few would have thought that the amount of scientific knowledge necessary to understand television would be eclipsed by the amount necessary to understand current imaging systems. Many modern digital imaging systems require knowledge of the interaction of energy with matter (spectroscopy), detectors (electronics), digital image processing (mathematics), hard- and soft-copy display devices (chemistry and physics), and human visual perception (psychology). Clearly, imaging requires more knowledge of science and technology than it did thousands of years ago, and the trend predicts that this amount of knowledge will continue to increase. Early applications of imaging were primarily historic accounts and artistic creations because the imaged objects were visual perceptions and memories.
Some exceptions were the development of microscopy for use in biology and of telescopy for use in terrestrial and astronomical imaging. The discovery of X rays in 1895 changed this further by creating medical and scientific needs for imaging. The development of satellite technology and of positioning systems that have angstrom resolution led, respectively, to additional applications of remote and microscopic imaging in the sciences. Currently, scientists, engineers, and physicians in every discipline are using imaging science to visualize properties of the systems they are studying. Moreover, images are ubiquitous. Because they can be so readily generated and manipulated, they are being used in all aspects of business to convey information more effectively than text. The adage "A picture is worth a thousand words" can be modified in today's context to say "A picture is worth 20 megabits," because an image can in fact summarize and convey such complexity. Imaging science is the pursuit of the scientific understanding of imaging or an imaging technique. The field of imaging science continues to evolve, incorporating new scientific disciplines and finding new applications. Today, imaging is used by astronomers to map distant galaxies, by oceanographers to map the sea floor, by chemists to map the distribution of atoms on a surface, by physicians to map the functionality of the brain, and
by electrical engineers to map the electromagnetic fields around power transmission lines. We see the results of the latest imaging technologies in our everyday lives (e.g., the nightly television news contains instantaneous radar images of precipitation or three-dimensional views of cloud cover), and we can also personally use many of these technologies [e.g., digital image processing (DIP) algorithms are available on personal computers and are being used to remove artistic flaws from digital and digitized family and vacation photographs].

When attempting to summarize a new field, especially an evolving one like imaging science, a taxonomy should be chosen that allows people to see the relationships to other established disciplines, as well as to accommodate the growth and evolution of the new discipline. The information presented in this encyclopedia is organized in a unique way that achieves this goal. To understand this organization, the concept of an imaging system must first be developed. An imaging system is a device such as a camera, a magnetic resonance imager, an atomic force microscope, an ultraviolet telescope, or an optical scanner. Most imaging systems have ten loosely defined components:

1. An imaged object, person, or phenomenon.
2. A probing radiation, particle, or energy, such as visible light, electrons, heat, or ultrasound.
3. A measurable property of the imaged object, person, or phenomenon, such as reflection, absorption, emission, or scattering.
4. An image formation component, typically based on focusing optics, projections, scans, holography, or some combination of them.
5. A detected radiation, particle, or energy, which may be different from the probing radiation, particle, or energy.
6. A detector, consisting of photographic film, a charge-coupled device (CCD), photomultiplier tubes, or pressure transducers.
7. A processor, which can be chemical, as for photographs, or a computer algorithm, as in digital image processing (DIP) techniques.
8. An image storage device, such as photographic film, a computer disk drive, or computer memory.
9. A display, which can be a cathode ray tube (CRT) screen, photographic paper, or some other hard copy (HC) output.
10. An end user.

Examples of the components of two specific imaging systems will be helpful here. In a photographic camera imaging system (see articles on Still Photography and Instant Photography), the imaged object might be a landscape. The probing radiation may be visible light from the sun or from a strobe light (flash). The measurable property is the absorption and reflection of visible light by the objects in the scene. The image formation component is a lens. The detected radiation is visible light. The detector is many tiny silver halide crystals in the film (see article on Silver Halide Detector Technology). The processor is a
chemical reaction that converts the exposed silver halide crystals to silver metal and removes the unexposed silver halide crystals. The storage device is the developed negative. The display is photographic paper (see article on Photographic Color Display Technology), and the end user is a person. In a magnetic resonance imaging (MRI) system (see article on Magnetic Resonance Imaging), the imaged object might be the human brain. The probing radiation is electromagnetic radiation in the radio-frequency range (typically 63 MHz). The measurable property is a resonance associated with the absorption of this energy by magnetic energy levels of the hydrogen nucleus. The image formation component is frequency and phase encoding of the returning resonance signal using magnetic field gradients. The detected radiation is at the same radio frequency. The detector is a doubly balanced mixer and digitizer. The processor is a computer. The storage device is a computer disk. The display device is a cathode ray tube (CRT; see article on Cathode Ray Tubes and Cathode Ray Tube Display Technology) or liquid crystal display (LCD; see article on Liquid Crystal Display Technology) computer screen, or a film transparency. The end user is a radiologist. The imaged object and end user play an important role in defining the imaging system. For example, even though ultrasound imaging (see article on Ultrasound Imaging) is used in medicine, nondestructive testing, and microscopy, the three imaging systems used in these disciplines have differences. The systems vary because the imaged objects (human body, manufactured metal object, and surface) are different, and the end users of the information (physician, engineer, and scientist) have unique information requirements. The way in which the human mind interprets the information in an image (see articles on Human Visual System) plays a big role in the design of an imaging system. 
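The ten-component model above lends itself to a small data structure. The following Python sketch is for orientation only and is not encyclopedia content: the field names come from the numbered list, and the instance is filled in with the photographic camera example from the text.

```python
from dataclasses import dataclass

# The ten loosely defined components of an imaging system,
# modeled as a simple record type.

@dataclass
class ImagingSystem:
    imaged_object: str       # 1. object, person, or phenomenon
    probing_energy: str      # 2. probing radiation, particle, or energy
    measured_property: str   # 3. measurable property
    image_formation: str     # 4. image formation component
    detected_energy: str     # 5. detected radiation, particle, or energy
    detector: str            # 6. detector
    processor: str           # 7. processor
    storage: str             # 8. image storage device
    display: str             # 9. display
    end_user: str            # 10. end user

# The photographic camera example from the text:
camera = ImagingSystem(
    imaged_object="landscape",
    probing_energy="visible light (sun or strobe flash)",
    measured_property="absorption and reflection of visible light",
    image_formation="lens",
    detected_energy="visible light",
    detector="silver halide crystals in film",
    processor="chemical development",
    storage="developed negative",
    display="photographic paper",
    end_user="person",
)

print(camera.detector)
```

The MRI example in the text fills the same ten slots with entirely different technologies, which is exactly the point of the component-based taxonomy.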
Many imaging systems use some form of probing radiation, particle, or energy, such as visible light, electrons, or ultrasound. The radiation, particle, or energy is used to probe a measurable property of the imaged object, person, or phenomenon. Nearly every wavelength of electromagnetic radiation, type of particle, and form of energy has been used to produce images. The nature of the interaction of the radiation, particle, or energy with the imaged object, person, or phenomenon is as important as the probing radiation, particle, or energy itself. This measurable property could be the reflection, absorption, fluorescence, or scattering properties of the imaged object, person, or phenomenon. Systems that do not use probing radiation, particles, or energy typically image the emission of radiation, particles, or energy. This aspect of imaging science is so important that it is covered in detail in a separate article of the encyclopedia entitled Electromagnetic Radiation and Interactions with Matter. The image formation component is responsible for spatially encoding the data from the imaged object so that it forms a one-, two-, three-, or higher-dimensional representation of the imaged object. Optics are the most frequently used image formation method (see article on Optical Image Formation) when dealing with ultraviolet
(UV), visible (vis), or infrared (IR) radiation. Other image formation methods include scanning, used in radar, MRI, and flat-bed scanners; point source projection, used in X-ray plane-film imaging; and hybrid methods used, for example, in electrophotographic copiers (see article on Image Formation).

Many imaging systems have image storage devices. The storage can be digital or hard copy, depending on the nature of the imaging system. There is much concern about the permanence of image storage techniques in use today. Many images are stored on CDs, whose lifetime is ∼10 years. Magnetic storage media face another problem: obsolescence. Many images are archived on magnetic media such as nine-track tape and 8-, 5.25-, or 3.5-inch floppy disks, and the drives for these devices are becoming more and more difficult to find. This concern about image permanence and the ability to view old images is not new. Figure 3 (see color insert) is an image of a mosaic created before 79 A.D. depicting the battle between Alexander the Great and the Persian Emperor Darius III in 333 B.C. This mosaic, from the House of the Faun in Pompeii, Italy, survived the eruption of Mt. Vesuvius that buried it in 79 A.D. It is claimed that this mosaic is a copy of an original painting created sometime after 333 B.C. but destroyed in a fire. (The image is noteworthy for another reason: as a mosaic copy of a painting, it is an early example of a pixelated copy of an image. The creator understood how large the pixels, or tiles, could be and still convey the visual information of the scene.) Will our current image storage formats survive fires and burial by volcanic eruptions?

The display device can produce hard copy, soft copy, or three-dimensional images.
Hard copy devices include dye sublimation printers (see Dye Transfer Printing Technology), laser printers (see article on Electrophotography), ink jet printers (see Ink Jet Display Technology), printing presses (see articles on Gravure Multi-Copy Printing and Lithographic Multicopy Printing), and photographic printers (see article on Photographic Color Display Technology). Soft copy devices include cathode ray tubes (see articles on Cathode Ray Tubes and Cathode Ray Tube Display Technology), liquid crystal displays (see article on Liquid Crystal Display Technology), and field emission displays (see article on Field Emission Display Panels). Three-dimensional displays can be holographic (see article on Holography) or stereo (see article on Stereo and 3D Display Technologies). Table 2 (see pages xi–xiii) will help you distinguish between an imaging system and imaging system components. It identifies the major components of various imaging systems used in the traditional scientific disciplines, as well as in personal and commercial applications. The information in this encyclopedia is organized around Table 2. There is a group of articles designed to give the reader an overview of the way imaging science is used in a select set of fields, including art conservation (see Imaging Science in Art Conservation), astronomy
(see Imaging Science in Astronomy), biochemistry (see Imaging Science in Biochemistry), overhead surveillance (see Imaging Science in Overhead Surveillance), forensics and criminology (see Imaging Science in Forensics and Criminology), geology (see Imaging Applied to the Geologic Sciences), medicine (see Imaging Science in Medicine), and meteorology (see Imaging Science in Meteorology). The articles describe the objects imaged and the requirements of the end user, as well as the way these two influence the imaging systems used in the discipline. A second class of articles describes specific imaging system components. These include articles on spectroscopy, image formation, detectors, image processing, display, and the human visual system (end user). Each of these categories contains several articles on specific aspects or technologies of the imaging system components. For example, the display technology category consists of articles on cathode-ray tubes, field emission panels, liquid crystals, and photographic color displays, and on ink jet, dye transfer, lithographic multicopy, gravure, laser, and xerographic printers. The final class of articles describes specific imaging systems and imaging techniques, such as magnetic resonance imaging (see Magnetic Resonance Imaging), television (see Television Broadcast Transmission Standards), optical microscopy (see Optical Microscopy), lightning strike mapping (see Lightning Locators), and ground penetrating radar (see Ground Penetrating Radar), to name a few.

The editorial board of the Encyclopedia of Imaging Science and Technology hopes that you find the information in this encyclopedia useful. Furthermore, we hope that it provides cross-fertilization of ideas among the sciences for new imaging systems and imaging applications.

REFERENCES

1. E. Hecht, Optics, Addison-Wesley, Reading, 1987.
2. R. Hoy, E. Buschbeck, and B. Ehmer, Science (1999).
3. The Trilobite Eye. http://www.aloha.net/~smgon/eyes.htm
4. The Cave of Chauvet-Pont-d'Arc. Ministère de la culture et de la communication of France. http://www.culture.gouv.fr/culture/arcnat/chauvet/en/
5. The Cave of Lascaux. Ministère de la culture et de la communication of France. http://www.culture.fr/culture/arcnat/lascaux/en/
6. Brad Fortner, The History of Television Technology. http://www.rcc.ryerson.ca/schools/rta/brd038/clasmat/class1/tvhist.htm
7. Electron Microscopy. University of Nebraska-Lincoln. http://www.unl.edu/CMRAcfem/em.htm
8. B. C. Breton, The Early History and Development of The Scanning Electron Microscope. http://www2.eng.cam.ac.uk/~bcb/history.html
JOSEPH P. HORNAK, Ph.D.
Editor-in-Chief
Table 1. The History of Imaging

∼5 × 10^8 B.C.: Imaging systems appear in animals during the Paleozoic Era, for example, the compound eye of the trilobite (2,3).
30,000 B.C.: Chauvet-Pont-d'Arc charcoal drawings on cave walls in southern France (4). Aurignacians of south Germany created sophisticated portable art in the form of ivory statuettes that have naturalistic features (4).
16,000 B.C.: The first imaging tools: paint medium, animal fat; support, the rock and mud of secluded caves; painting tools, fingers, scribing sticks, blending and painting brushes, and a hollow reed to blow paint on the wall (airbrush).
15,000 B.C.: Lascaux, France, cave drawings (5). Paintings were done with iron oxides (yellow, brown, and red) and bone black.
12,000 B.C.: Altamira, Spain, cave paintings made from charcoal, pigments from plants and soil, and animal fat and blood.
5000 B.C.: Water-based paint was used in the upper Nile, Egypt. Turpentine and alcohol were also available as paint thinners for oil-based paints in the region around the Mediterranean Sea.
4000 B.C.: The first paperlike material was developed in Egypt by gluing together fibers from the stem of the papyrus plant.
3500 B.C.: In Memphis, Egypt, lifelike gold sculptures were being poured.
3000 B.C.: Sumerians created mural patterns by driving colored clay cones into walls.
2700 B.C.: Egyptians carved life-size statues in stone with perfect realism.
2500 B.C.: Lead-based paints developed in Morocco, Africa.
1500 B.C.: Vermilion (mercuric sulfide), a bright red pigment, developed in China.
1400 B.C.: Egg and casein (a white dairy protein) mediums used as paint in the Baltic Sea region. Ammonia available in Egypt for wax soap paints.
300 B.C.: Aristotle described the principle of the camera obscura. (Camera is Latin for room; obscura is Latin for dark.)
200 B.C.: In Babylonia, art decorated structures with tile, and in Greece representational mosaics were constructed from colored pebbles (early mosaics).
9 A.D.: Roman sculpture was considered to have achieved high art standards.
105: Cai Lun, the manager of the Chinese Imperial Arsenal, reported the production of paper from bark, hemp, rags, and old fishing nets. Chinese innovators also created movable type, the precursor of the printing press.
1000: Alhazen studied spherical and parabolic mirrors and the human eye. Reading stones, or magnifying glasses, were developed in Europe.
1268: The English philosopher and Franciscan, Roger Bacon, initiated the idea of using lenses to correct vision.
1450: Johannes Gutenberg originates the movable type printing press in Europe.
1590: Hans and Zacharias Janssen, Dutch lens grinders, produced the first compound (two-lens) microscope.
1608: Hans Lipershey, a Dutch spectacle maker, invents the telescope (or at least helped make it more widely known; his patent application was turned down because his invention was too widely known).
1600s: English glass makers discover lead crystal glass, a glass of very high clarity that could be cut and polished.
1677: Anton van Leeuwenhoek, a Dutch biologist, invents the single-lens microscope.
1798: Alois Senefelder in Germany invents lithographic printing.
1820s: Jean Baptiste Joseph Fourier develops his Fourier integral based on sines and cosines.
1827: Joseph Niépce produced the heliograph, or first photograph. The process used bitumen of Judea, a varnish that hardens on exposure to light. Unexposed areas were washed away, leaving an image of the light reflected from the scene.
1839: Louis Daguerre develops the first successful photographic process. The image plate, or picture, was called a daguerreotype.
1842: Alexander Bain proposes facsimile telegraph transmission that scans metal letters and reproduces an image by contact with chemical paper. Christian Andreas Doppler discovers wavelength shifts from moving objects. Known today as the Doppler effect, this shift is used in numerous imaging systems to determine velocity.
1855: Collotype process developed for printing high-quality reproductions of photographs.
1859: Gaspard Felix Tournachon collects aerial photographs of Paris, France, from a hot-air balloon.
1860s: From 1863, William Bullock develops the web press for printing on rolls of paper.
1880: The piezoelectric effect in certain crystals was discovered by Pierre and Jacques Curie in France. The piezoelectric effect is used in ultrasound imagers.
1884: German scientist Paul Gottlieb Nipkow patented the Nipkow disk, a mechanical television scanning system. The American George Eastman introduces his flexible film; in 1888, he introduced his box camera. Eastman's vision was to make photography available to the masses. The Kodak slogan was, "You press the button, we do the rest."
1887: H. R. Hertz discovers the photoelectric effect, the basis of the phototube and photomultiplier tube.
1891: The Edison company successfully demonstrated the kinetoscope, a motion picture projector.
1895: Louis Lumière of France invented a portable motion-picture camera, film processing unit, and projector called the cinématographe, thus popularizing the motion picture camera. Wilhelm Röntgen discovers X rays.
1897: German physicist Karl Braun developed the first deflectable cathode-ray tube (CRT). The CRT is the forerunner of the television picture tube and computer monitor.
1907: Louis and Auguste Lumière produce the Autochrome plate, the first practical color photography process.
1913: Frits Zernike, a Dutch physicist, develops the phase-contrast microscope.
Table 1. (Continued)

1915   Perhaps motivated by the sinking of the HMS Titanic in 1912, Constantin Chilowsky, a Russian living in Switzerland, and Paul Langevin, a French physicist, developed an ultrasonic echo-sounding device called the hydrophone, known later as sonar.
1917   J. Radon, an Austrian mathematician, derived the mathematical principles of reconstruction imaging that would later be used in medical computed tomography (CT) imaging.
1925   John Logie Baird invented mechanical television.
1926   Kenjiro Takayanagi of the Hamamatsu Technical High School in Tokyo, Japan, demonstrates the first working electronic television system using a cathode-ray tube (6).
1927   Philo T. Farnsworth, an American, demonstrates broadcast television (6).
1928   John Logie Baird demonstrates color television using a modified Nipkow disk.
1929   Sergei Y. Sokolov, a Soviet scientist at the Electrotechnical Institute of Leningrad, suggests the concept of ultrasonic metal flaw detection. This technology became ultrasound-based nondestructive testing. Vladimir Zworykin of Westinghouse invented the all-electric camera tube called the iconoscope (6).
1931   Max Knoll and Ernst Ruska of Germany develop the transmission electron microscope (TEM) (7). Harold E. Edgerton develops ultrahigh-speed and stop-action photography using the strobe light. Karl Jansky of Bell Telephone Laboratories discovers a source of extraterrestrial radio waves and thus starts radio astronomy.
1934   Corning Inc., Corning, New York, pours the first 5-m diameter, 66-cm thick borosilicate glass disk for a mirror to be used in the Hale Telescope on Palomar Mountain, near Pasadena, California.
1935   Robert A. Watson-Watt, a British electronics expert, develops radio detection and ranging (radar). The first unit had a range of ∼8 miles.
1938   Manfred von Ardenne constructed a scanning transmission electron microscope (STEM) (8). Electrophotography (xerography) invented by Chester F. Carlson.
1940s  Moving target indication (MTI) radar, or pulsed-Doppler radar, developed.
1941   Konrad Zuse, a German aircraft designer, develops the first fully programmable computer. He used old movie film to store programs and data.
1947   Edwin Herbert Land presents one-step photography at an Optical Society of America meeting. One year later, the Polaroid Land camera was introduced.
1949   Rokuro Uchida, Juntendo University, Japan, builds the first A-mode ultrasonic scanner.
1951   Charles Oatley, an engineer at the University of Cambridge, produces the scanning electron microscope.
1953   Ian C. Browne and Peter Barratt, Cambridge University, England, apply pulsed-Doppler radar principles to meteorological measurements.
1956   Charles Ginsburg led a research team at Ampex Corporation in developing the first practical videotape recorder (VTR). The system recorded high-frequency video signals using a rapidly rotating recording head.
1957   Gordon Gould invents light amplification by stimulated emission of radiation (the laser). The Soviet Union launches Sputnik 1, the first artificial satellite to orbit the earth successfully. This success paved the way for space-based remote sensing platforms.
1963   Polaroid introduces Polacolor film, which made instant color photos possible. Allan M. Cormack develops the backprojection technique, a refinement of Radon's reconstruction principles.
1965   James W. Cooley and John W. Tukey publish their mathematical algorithm known as the fast Fourier transform (FFT).
1969   The charge-coupled device (CCD) was developed at Bell Laboratories. The first live images of man on the earth's moon and images of the earth from the moon.
1971   Sony Corporation sells the first video cassette recorder (VCR). James Fergason demonstrates the liquid crystal display (LCD) at the Cleveland Electronics Show. Intel introduces the single-chip microprocessor (U.S. Pat. 3,821,715), designed by engineers Federico Faggin, Marcian E. Hoff, and Stan Mazor. Dennis Gabor receives the Nobel prize in physics for the development of a lensless method of photography called holography.
1972   Godfrey Hounsfield constructs the first practical computerized tomographic scanner. Pulsed-Doppler ultrasound was developed for blood flow measurement.
1973   Paul G. Lauterbur develops backprojection-based magnetic resonance imaging (MRI). Robert M. Metcalfe, from the Xerox Palo Alto Research Center (PARC), invents hardware for a multipoint data communication system with collision detection, which made the Internet possible.
1974   The first personal computers were marketed (Scelbi, Mark-8, and Altair).
1975   Richard Ernst develops Fourier-based magnetic resonance imaging (MRI) (see Fig. 2). Laser printer invented. First CCD television camera.
1976   Ink-jet printer developed.
1981   Gerd Karl Binnig and Heinrich Rohrer invent the scanning tunneling microscope (STM), which provides the first images of individual atoms on the surfaces of materials. The IBM personal computer (PC) was introduced.
1989   First Ph.D. program in Imaging Science offered by Rochester Institute of Technology.
1990   Hubble Space Telescope launched.
1997   Mars Pathfinder lands on Mars and transmits images back to earth.
Table 2. Partial List of Imaging Technologies

For each technique, the table lists the application field, imaged object, radiation/energy, spectroscopy (interaction), image formation, detection, processing and enhancement, display, and end user. The application fields and techniques covered are:

Archeology/Art Conservation: fluorescence, narrow-bandwidth optical, subsurface radar, X ray. Astronomy: radio, UV, visible, X ray. Aviation: radar (ground-based), radar (airborne), storm scope. Chemistry: autoradiograph, molecular modeling. Defense/Spy/Surveillance: night vision (IR), satellite imaging. Electrical Engineering: E-field imaging, B-field imaging. Forensics/Criminology: fluorescence. Geology and Earth Resource Management: airborne imaging, gravitational imaging, Overhauser NMR, satellite imaging, subsurface radar, synthetic aperture radar, magnetic anomaly imaging. Machine Vision/Process Inspection: optical camera, ultrasound, X ray. Medicine: CT, Doppler ultrasound, EEG topography, ECG topography, EMG topography, endoscopy, fMRI, MRI, MRA, X-ray angiography, MEG topography, nuclear medicine imaging, particle radiogram, PET, thermography, ultrasound, X ray. Meteorology: Doppler radar, lightning locator, satellite. Microscopy: acoustic force, Auger imaging, atomic force, capacitive probe, compound, confocal, electron microscopy, fluorescence, magnetic force, PIXE imaging, polarizing microscope, scanning confocal, scanning tunneling, SIMS imaging. Miscellaneous: ballistic photon, ESR imaging, holography. Oceanography: sonar, magnetometer. Personal/Consumer: broadcast television, digital camera, movie, photography, video camera, xerography, surveillance (visible and IR).
ACKNOWLEDGMENTS

The Encyclopedia of Imaging Science and Technology is the work of numerous individuals who believed in the idea of an encyclopedia covering the new field of imaging science. I am grateful to all of them, and especially to:

The authors of the articles in this encyclopedia, who set their professional and personal activities aside to write about their specialties.

The nine members of the Editorial Board, who signed up to guide the development of this encyclopedia when it was just a vague idea. Their input and help molded this encyclopedia into its present form.

Susanne Steitz and the many present and past employees at John Wiley & Sons, Inc., for their help in making this encyclopedia possible.

Dr. Bruno Alfano and the Consiglio Nazionale delle Ricerche, Centro per la Medicina Nucleare, Naples, Italy, for giving me the opportunity to devote part of my sabbatical year to completing this encyclopedia.

My wife, Elizabeth, and daughter, Emma, for their love and support during the writing and editing of this work.

JOSEPH P. HORNAK, PH.D.
Editor-in-Chief
A

ACOUSTIC SOURCES OR RECEIVER ARRAYS

WILLIAM THOMPSON, JR.
Penn State University
University Park, PA

INTRODUCTION

This article discusses the directional response characteristics (i.e., the beam patterns or directivity patterns) of arrays of time-harmonic acoustic sources. A wonderful consequence of the acoustic reciprocity theorem (1), however, is that the directional response pattern of a transducer used as a source is the same as its directional response pattern used as a receiver. The arrays in question begin with the simplest, two identical point sources, and build up to line arrays that contain many point sources, continuous two-dimensional arrays of sources, and discrete arrays of finite-sized sources. Topics such as beam width of the main lobe, amplitudes of the side lobes, amplitude weighting to reduce the side lobes, and beam tilting and its consequences are discussed. Again, although the emphasis will be on the response of arrays of sources, the result (the directivity pattern) is the same for the same array used as a receiver. Harmonic time dependence of the form exp(jωt) at circular frequency ω is assumed and suppressed throughout most of the following discussion.

TWO IN-PHASE POINT SOURCES

Consider two in-phase, equistrength point sources separated by a distance d, and a single point sensor located at observation point P, as shown in Fig. 1. This source configuration is sometimes called a bipole. The sources are located in an infinitely extending isovelocity fluid medium characterized by ambient density ρ0 and sound speed c. The origin of the coordinate system is chosen at the midpoint of the line segment that joins the two sources. The angle θ is measured from the normal to that line segment; alternatively, one could use the complementary angle θc measured from the line segment. The distance from the center point to point P is r. Similarly, the distances from sources 1 and 2 to that same point P are r1 and r2, respectively. By linear superposition, the total acoustic pressure p at point P is simply the sum of the pressures created by each source, or

    p(P) = (jρ0 ckQ / 4π r1) e^{−jkr1} + (jρ0 ckQ / 4π r2) e^{−jkr2},   (1)

where k is the acoustic wave number (= ω/c or 2π/λ, where λ is the wavelength) and Q is the strength of either source (the volume velocity it generates, which is the integral of the normal component of velocity over the surface of the source; SI units of m³/s). Now, what is really desired is a response function that describes how the pressure varies with the angle θ as a sensor is moved around on a circle of radius r. This requires expressing r1 and r2 as functions of r and θ. Unfortunately, using the law of cosines, these functional relationships are nonlinear:

    r1² = r² + (d/2)² − rd cos(π/2 − θ),   (2)
    r2² = r² + (d/2)² − rd cos(π/2 + θ).   (3)

However, if one relaxes the demands on this response function to say that interest is only in the directivity pattern observed at distances r that are much greater than the size d of the array (the so-called acoustic far field), useful approximations of Eqs. (2) and (3) can be made:

    r1 ≈ r[1 − (d/r) sin θ]^{1/2} ≈ r[1 − (1/2)(d/r) sin θ + ···] ≈ r − (d/2) sin θ,   (4)
    r2 ≈ r[1 + (d/r) sin θ]^{1/2} ≈ r[1 + (1/2)(d/r) sin θ + ···] ≈ r + (d/2) sin θ.   (5)

Here, the binomial theorem was used to expand the square root of one plus or minus a small quantity. The term (d/2)² in both Eqs. (2) and (3) was ignored because it has second-order smallness compared to the (rd sin θ) term. This same result can be obtained by arguing that when r is sufficiently large, all three lines r1, r2, and r are essentially parallel and can be related by simple geometrical considerations that immediately yield the results in Eqs. (4) and (5). A quantitative relationship for how large r must be compared to d so that the observation point P is in the far field will be discussed later. Equations (4) and (5) are to be used in Eq. (1), but the quantities r1 and r2 occur in two places, in an amplitude factor such as 1/r1 and in a phase factor such as exp(−jkr1).

Figure 1. Two in-phase, equistrength point sources.
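As a quick numerical check (not part of the original article; the values of r, d, and θ below are arbitrary illustrative choices), the exact law-of-cosines distances of Eqs. (2) and (3) can be compared with the far-field approximations of Eqs. (4) and (5) in a few lines of Python:

```python
import math

# Arbitrary illustrative geometry: observation distance much greater than
# the source separation (the acoustic far field), r >> d.
r = 100.0                    # distance from array midpoint to point P
d = 1.0                      # separation of the two point sources
theta = math.radians(30.0)   # angle measured from the normal to the array

# Exact distances from sources 1 and 2 to P, Eqs. (2) and (3)
r1_exact = math.sqrt(r**2 + (d / 2)**2 - r * d * math.cos(math.pi / 2 - theta))
r2_exact = math.sqrt(r**2 + (d / 2)**2 - r * d * math.cos(math.pi / 2 + theta))

# Far-field (binomial) approximations, Eqs. (4) and (5)
r1_approx = r - (d / 2) * math.sin(theta)
r2_approx = r + (d / 2) * math.sin(theta)

print(abs(r1_exact - r1_approx))   # on the order of 1e-3 for r/d = 100
print(abs(r2_exact - r2_approx))
```

For r/d = 100 the approximation is already good to about a millimeter on a 100-m path; the error grows as the observation point moves closer to the array, consistent with the neglected (d/2)² term.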
Another approximation that is made, which is consistent with the fact that r ≫ d, is

    1/r1 ≈ 1/r2 ≈ 1/r   (6)

in both amplitude factors. However, in the phase factors, because of the independent parameter k that can be large, the quantity [(kd/2) sin θ] could be a significant number of radians. Therefore, it is necessary to retain the correction terms that relate r1 and r2 to r in the phase factors. Hence, Eq. (1) becomes

    p(P) = (jρ0 ckQ / 4π r) e^{−jkr} [e^{j(kd/2) sin θ} + e^{−j(kd/2) sin θ}]
         = (jρ0 ckQ / 2π r) e^{−jkr} cos[(kd/2) sin θ].   (7)

Because the source array is axisymmetrical about the line segment that joins the two sources, so is the response function. If we had instead chosen to use the angle θc of Fig. 1, the term sin θ in Eq. (7) would simply become cos θc. The quantity in the brackets, which describes the variation of the radiated pressure with angle θ for a fixed value of r, is called the directivity function. It has been normalized so that its absolute maximum value is unity at θ = 0° or whenever [(kd/2) sin θ] = nπ for integer n. Conversely, the radiated pressure equals zero at any angle such that [(kd/2) sin θ] = (2n − 1)π/2. If kd/2 is small, which is to say the size of the source configuration is small compared to the wavelength, then the directivity function is approximately unity for all θ, or one says that the source is omnidirectional. On the other hand, if kd/2 is large, the directivity pattern, by which is meant a plot of 20 log10 |directivity function| versus angle θ, is characterized by a number of lobes, all of which have the same maximum level. These lobes, however, alternate in phase relative to one another.

TWO 180° OUT-OF-PHASE SOURCES

Referring to Fig. 1, imagine that the lower source is 180° out of phase from the upper one. Hence, the pressure field at point P would be the difference of the two radiated pressures rather than the sum, as in Eq. (1). Making the same far-field approximations as before, the total pressure at point P becomes

    p(P) = −(ρ0 ckQ / 2π r) e^{−jkr} sin[(kd/2) sin θ].   (8)

There are two evident differences between this result and that in Eq. (7) to discuss. The first is that now the directivity function is a sine function rather than a cosine, although the argument remains the same. This means that the positions of maxima and nulls of the pattern of the in-phase pair of sources become, instead, the respective positions of the nulls and the maxima of the 180° out-of-phase pair of sources. There is always a null in the plane defined by θ = 0°, the plane which is the perpendicular bisector of line segment d, because any observation point in that plane is equidistant from the two identical but 180° out-of-phase sources, whose outputs therefore totally cancel one another. The other difference in Eq. (8) is an additional phase factor of j, which says that the pressure field for this source configuration is in phase quadrature with that of the two in-phase sources. This factor is not important when considering just this source configuration by itself, but it must be accounted for if the output of this array were, for example, to be combined with that of another array; in that case, their outputs must be combined as complex phasors, not algebraically.

As a special case of this source arrangement, imagine that the separation distance d is a small fraction of a wavelength. Hence, for a small value of kd/2, one can approximate the sine function by its argument, thereby obtaining

    p(P) = −(ρ0 ckQ / 2π r) [(kd/2) sin θ] e^{−jkr},  for kd/2 ≪ 1.   (9)

This configuration is known as the acoustic dipole. Its directivity function is simply the term sin θ (or cos θc), independent of frequency. The directivity pattern shows one large lobe centered about θ = 90°, a perfect null everywhere in the plane θ = 0°, as discussed above, and another lobe, identical to the one centered about θ = 90° but differing in phase from it by 180°, centered about θ = −90°. Because of the shape of this pattern plotted in polar format, it is often referred to as a ''figure eight'' pattern. Because of the multiplicative factor kd/2, which is small, the acoustic output of this source configuration is small compared to that of the in-phase case previously discussed. In fact, if the distance d were to shrink to zero, so that the two 180° out-of-phase sources were truly superimposed on each other, the total output of the dipole would be zero.

FOUR IN-PHASE COLLINEAR SOURCES

Now, consider the source configuration of Fig. 2. This is a symmetrical line array of four in-phase point sources, equally spaced by distance d, where the relative strengths are symmetrical with respect to the midpoint of the line array. These relative strength factors, which incorporate many of the physical constants as well as the source strengths Q in either of the terms of Eq. (1), are denoted A1 and A2 here. Hence, by superposing the basic result in Eq. (7), the total pressure field at far-field point P is

    p(P) = (2/r) e^{−jkr} [A1 cos u + A2 cos 3u],   (10)

where u = (kd/2) sin θ. The quantity in the brackets is the (unnormalized) directivity function for this line array. The response is maximum in the direction θ = 0°, whereupon all of the cosine factors are unity. Therefore, the normalized directivity factor is obtained by dividing the quantity in the brackets by (A1 + A2).
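These bracketed directivity functions are easy to evaluate numerically. The following Python sketch is illustrative only (the function names and the spacing d/λ = 1/2 are choices made here, not taken from the article); it implements the directivity factors of Eqs. (7), (8), and (10) and checks their maxima and nulls:

```python
import math

def u_of(theta, d_over_lambda):
    """u = (kd/2) sin(theta), with k = 2*pi/lambda."""
    return math.pi * d_over_lambda * math.sin(theta)

def bipole(theta, d_over_lambda):
    """Normalized directivity of two in-phase point sources, Eq. (7)."""
    return math.cos(u_of(theta, d_over_lambda))

def out_of_phase_pair(theta, d_over_lambda):
    """Directivity of two 180-degree out-of-phase sources, Eq. (8)."""
    return math.sin(u_of(theta, d_over_lambda))

def four_element(theta, d_over_lambda, A1=1.0, A2=1.0):
    """Normalized directivity of the symmetric four-source array, Eq. (10)."""
    u = u_of(theta, d_over_lambda)
    return (A1 * math.cos(u) + A2 * math.cos(3 * u)) / (A1 + A2)

d_over_lambda = 0.5   # half-wavelength spacing, an arbitrary choice
# In-phase pair: maximum on the normal, null along the line of the array
print(bipole(0.0, d_over_lambda))                  # 1.0
print(bipole(math.radians(90.0), d_over_lambda))   # cos(pi/2), i.e. ~0
# Out-of-phase pair: the nulls and maxima trade places
print(out_of_phase_pair(0.0, d_over_lambda))       # 0.0, the perfect null
print(four_element(0.0, d_over_lambda))            # 1.0 at theta = 0
```

Note that for in-phase sources the pattern is an even function of θ, as the text discusses; swapping cos for sin in the pair reproduces the exchanged nulls and maxima of the 180° out-of-phase case.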
Figure 2. Symmetrical line array of four point sources.

SYMMETRICAL EVEN ARRAY OF 2N IN-PHASE SOURCES

Extending the last result to a line array of 2N sources that are all in phase with one another, where there is equal spacing d between any two adjacent sources and where the relative strengths are symmetrical about the midpoint (the centermost pair each have strength A1, the next pair outward from the center each have strength A2, etc.), one obtains

    p(P) = (2/r) e^{−jkr} [A1 cos u + A2 cos 3u + ··· + AN cos(2N − 1)u].   (11)

The solid curve in Fig. 3 presents the beam pattern for six equistrength sources (i.e., all Ai = 1.0) and for d/λ = 2/3. The plot shows only one-quarter of the possible range of angle θ; it is symmetrical about both the θ = 0° and the θ = 90° directions. The quantity on the ordinate of the plot, sound pressure level (SPL), equals 20 log |p(θ)/p(0)|.

Figure 3. Directivity pattern of a symmetrical line array of six equispaced point sources, d/λ = 2/3. Solid curve: A1 = A2 = A3 = 1.0; dashed curve: A1 = 1.0, A2 = 0.75, A3 = 0.35. (SPL in dB, 0 to −50, versus angle θ from 0° to 90°.)

SYMMETRICAL ODD ARRAY OF (2N + 1) IN-PHASE SOURCES

The analysis of an odd array is equally straightforward. Placing the origin of coordinates at the centermost source and assuming that all sources are in phase with one another and that the relative strengths are symmetrical about the midpoint of the array, one obtains the far-field response function

    p(P) = (2/r) e^{−jkr} [(1/2)A0 + A1 cos 2u + A2 cos 4u + ··· + AN cos 2Nu].   (12)

This directivity function differs in two regards from Eq. (11) for an even array. First, because the spacing between any two corresponding sources is an even multiple of distance d, all of the cosine arguments are even multiples of the parameter u; and second, because the source at the midpoint of the array does not pair up with any other, its relative strength counts only half as much as that of the other pairs. Nevertheless, the beam pattern of an odd array is very similar to that of an even array of the same total length.

AMPLITUDE WEIGHTING

By adjusting the relative weights of the sources (a process called shading the array), attributes of the beam pattern, such as the width of the main lobe or the levels of the side lobes compared to that of the main lobe, can be altered. The dashed curve in Fig. 3 illustrates the effect of decreasing the strengths of the six-source array in some monotonic fashion from the center of the array toward either end (A1 = 1.0, A2 = 0.75, A3 = 0.35). Invariably, such an amplitude weighting scheme, where the sources at the center of the array are emphasized relative to those at the ends, reduces the side lobes and broadens the width of the main lobe compared to uniform strengths along the array. Many procedures have been devised for choosing or computing what the relative weights should be to realize a desirable directivity pattern. Perhaps two of the most famous and most often used are those due to Dolph (2) and to Taylor (3). Dolph, appreciating features of the mathematical function known as the Chebyshev polynomial that would make it a desirable directivity function, developed a procedure by which the weighting coefficients of the directivity function in either Eq. (11) or (12) could be computed to make that directivity function match the Chebyshev polynomial whose order is one less than the number of sources in the array. This technique has become known as Dolph–Chebyshev shading. Its result is that all of the side lobes of the beam pattern have exactly the same level, which can be chosen at will. In the Taylor procedure, only the first few side lobes on either side of the main lobe are so controlled, and the remainder of them simply decrease in amplitude with increasing angle via natural laws of diffraction theory, as illustrated by the unshaded case (solid curve) in Fig. 3.
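The effect shown in Fig. 3 can be reproduced numerically. The sketch below (illustrative only; the scan resolution and the 21° starting angle for the side-lobe search are choices made here) evaluates the normalized directivity of Eq. (11) with N = 3 and d/λ = 2/3, and compares the highest side-lobe level of the uniform and shaded arrays:

```python
import math

WEIGHTS_UNIFORM = [1.0, 1.0, 1.0]    # A1, A2, A3 of Eq. (11), N = 3
WEIGHTS_SHADED = [1.0, 0.75, 0.35]   # the shading quoted for Fig. 3
D_OVER_LAMBDA = 2.0 / 3.0            # element spacing of Fig. 3

def directivity(theta_deg, weights):
    """Normalized directivity of the even 2N-source array, Eq. (11)."""
    u = math.pi * D_OVER_LAMBDA * math.sin(math.radians(theta_deg))
    num = sum(a * math.cos((2 * n + 1) * u) for n, a in enumerate(weights))
    return num / sum(weights)

def spl(theta_deg, weights):
    """Sound pressure level, 20 log10 |p(theta)/p(0)|, in dB."""
    return 20.0 * math.log10(max(abs(directivity(theta_deg, weights)), 1e-12))

def peak_sidelobe(weights, start_deg=21.0):
    """Highest side-lobe SPL, scanning past the main lobe of both patterns
    (the shaded main lobe is broader, its first null falling near 20.5 deg)."""
    return max(spl(0.01 * i, weights) for i in range(int(start_deg * 100), 9001))

print(round(peak_sidelobe(WEIGHTS_UNIFORM), 1))   # about -12.5 dB
print(round(peak_sidelobe(WEIGHTS_SHADED), 1))    # markedly lower after shading
```

The run shows the trade-off the text describes: the monotonic center-weighted shading pushes the side lobes down by roughly 10 dB while widening the main lobe (the first null moves from about 14.5° out to about 20.5°).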
PRODUCT THEOREM

The discussion so far has concerned arrays of point sources. Naturally, a realistic source has a finite size. The effect of the additional directivity associated with the finite size of the sources can be accounted for, in certain situations, by using a result known as the product theorem (4) (sometimes called the first product theorem to distinguish it from another one), which says that the directivity function of an array of N sources of identical size and shape, which are oriented the same, equals the product of the directivity function of one of them times the directivity function of an array of N point sources that have the same center-to-center spacings and the same relative amplitudes and phases as the original sources.

NONSYMMETRICAL AMPLITUDES

The directivity function of the in-phase array that has a symmetrical amplitude distribution, Eq. (11) or (12), predicts a pattern that is symmetrical about the normal to the line that joins the sources, the θ = 0° direction. It is interesting to note that even if the amplitude distribution is nonsymmetrical, as long as the sources are in phase, the pattern is still symmetrical about the θ = 0° direction. This can be seen by considering the simplest case of two point sources again, as in Fig. 1, but assuming that the upper one has strength A1 and the lower one has complex strength A2 exp(jα), where α is some arbitrary phase. In the manner previously discussed, one can show that the magnitude of the unnormalized directivity function R(θ) is

    R(θ) = [(A1 + A2)² cos²(u − α/2) + (A1 − A2)² sin²(u − α/2)]^{1/2}
         = [A1² + A2² + 2 A1 A2 cos(2u − α)]^{1/2}.   (13)

Now, if A1 ≠ A2 but α = 0, this is an even or symmetrical function of argument u and hence of angle θ. However, for α ≠ 0 or nπ, the latter form of Eq. (13) clearly indicates that R(θ) is not symmetrical about u = 0, that is, about θ = 0°. In fact, introducing specific phase shifts into the sources along a line is a method of achieving a tilted or steered beam pattern.

TILTING OR STEERING

Figure 4 shows a line array of four equistrength, equispaced point sources. The following analysis could be performed for any number of sources and for any strength distribution, but this simple example is very illustrative. The result will then be generalized. We wish to have the direction of maximum response at angle θ0, as measured from the normal direction, rather than at θ = 0°, as would be the case if they were all in phase. This can be accomplished if we ''fool'' the sources into ''believing'' that they lie at positions indicated by + along the line inclined by that same angle θ0 measured from the x axis; that is, we want to pseudorotate the original line array by the angle θ0. Then the normal direction to that pseudoarray will make an angle θ0 with the normal direction to the actual array.

Figure 4. Geometry associated with tilting the direction of maximum response of an equispaced line array of four point sources to some angle θ0 measured from the normal to the line array.

This pseudorotation is accomplished by introducing time delays (or equivalent amounts of phase shift) into the relative outputs of the sources. Specifically, the output of #3 must be time delayed relative to that of #4 corresponding to the shorter propagation distance (d sin θ0) to #3; therefore, that time is τ3 = d sin θ0 /c. Because of the even spacing of the sources in this case, delay times τ2 and τ1 for sources #2 and #1 are two and three times time τ3, respectively. The total pressure field at the usual far-field observation point at an arbitrary angle θ becomes (momentarily bringing back the previously suppressed harmonic time factor for illustrative purposes)

    p(P) = (A e^{−jkr1}/r1) e^{jω(t−τ1)} + (A e^{−jkr2}/r2) e^{jω(t−τ2)} + (A e^{−jkr3}/r3) e^{jω(t−τ3)} + (A e^{−jkr4}/r4) e^{jωt}
         = (A e^{−jkr}/r) e^{jωt} [e^{j(3kd/2) sin θ} e^{−j3kd sin θ0} + e^{j(kd/2) sin θ} e^{−j2kd sin θ0} + e^{−j(kd/2) sin θ} e^{−jkd sin θ0} + e^{−j(3kd/2) sin θ}].   (14)

If one factors the quantity exp(−j3kd sin θ0 /2) out of every term and, by analogy to the previous definition of quantity u, defines u0 = (kd/2) sin θ0, Eq. (14) becomes

    p(P) = (2A e^{−jkr}/r) e^{jωt} e^{−j3u0} [cos 3(u − u0) + cos(u − u0)].   (15)

The factor in brackets is the unnormalized directivity function of the tilted array. Because it predicts a maximum value at u = u0, which is to say θ = θ0, the peak response has been steered or tilted in that direction. As opposed to introducing sequential time delays into the various outputs, as has been described, it seems that one could choose sequential amounts of phase shift so that at circular frequency ω, the phase shift for source n is ωτn. Then, Eq. (14) would be unchanged. This is true. However, one must note that if one uses true time-delay elements to accomplish the beam tilting, the angle of tilt θ0 will be the same at all frequencies of operation. If, however, one chooses phase shift networks to accomplish the desired tilt
ACOUSTIC SOURCES OR RECEIVER ARRAYS
at a certain frequency, the direction of tilt will change as frequency, changes, which may not be desirable. Generalizing this result for the array of four sources, from Eq. (11) for an even symmetric array of 2N sources, one obtains the unnormalized directivity function of the tilted array as R(θ ) = A1 cos(u − u0 ) + A2 cos 3(u − u0 ) + · · · + AN cos(2N − 1)(u − u0 ),
(16)
and from Eq. (12), for an odd symmetrical array of (2N + 1) sources,

R(θ) = (1/2)A0 + A1 cos 2(u − u0) + · · · + AN cos 2N(u − u0).   (17)

AMPLITUDE BEAM TILTING

An interesting alternative interpretation of Eqs. (16) and (17) exists (5). In particular, if one were to expand each of the trigonometric functions of a compound argument in Eq. (16) by using the standard addition theorems and then regroup the resulting terms, one would obtain

R(θ) = [A′1 cos u + A′2 cos 3u + · · · + A′N cos(2N − 1)u]
     + [A″1 sin u + A″2 sin 3u + · · · + A″N sin(2N − 1)u],   (18)

where A′1 = A1 cos u0, A′2 = A2 cos 3u0, etc., and A″1 = A1 sin u0, A″2 = A2 sin 3u0, etc. Because u0 is a constant, these primed and double-primed coefficients are simply two new sets of weighting coefficients that are functions of the original unprimed coefficients and of the angle θ0 to which one wishes to tilt.

The two quantities in brackets in Eq. (18) can be readily interpreted. The first is the directivity function of a symmetrical line array of the same size as the actual array, except that it must have weighting coefficients as prescribed by the primed values. The second term in brackets, with reference to Eq. (8), represents the directivity function of a line array of the same size as the actual array, except that it must have an antisymmetrical set of weighting coefficients, as prescribed by the double-primed values; that is, a source on one side of the middle of the array must be 180° out of phase with its corresponding source on the other side of the middle.

Equation (18) suggests that if one simultaneously excites the array by using the two sets of excitation signals, so as to create both the phase-symmetrical output (first term in brackets) and the phase-antisymmetrical output (second term in brackets), and allows the two outputs to combine, a tilted beam will result, and there is no reference to employing time delays or phase shifts to accomplish the tilt. Although this is true, it must be noted that the two outputs must truly be added algebraically, as indicated in Eq. (18). Further reference to Eq. (8), as contrasted to Eq. (7), recalls the fact that the output of an antisymmetrical array is inherently in phase quadrature with that of a symmetrical array. This phase quadrature effect must be removed to combine the two
outputs algebraically, as required. Hence, for the concept discussed previously to work, it is necessary to shift one of the two sets of excitation signals by 90°, but that is all that is required: one 90° phase-shift network for the whole array, regardless of how many sources there are. The concept works just as well for the odd array described by Eq. (17), except that the centermost source does not participate in the antisymmetrical output. By simply reversing the polarity of one of the two outputs, that is, subtracting the two terms of Eq. (18) rather than adding them, one instead realizes a beam tilted at angle −θ0. This technique, akin to the use of phase-shift networks to produce tilted beams, has the feature that, because the primed and double-primed weighting coefficients are frequency-dependent, the angle of tilt will change with frequency for a fixed set of weighting coefficients. Because different angles of tilt can be realized at a given frequency by changing only the weighting coefficients, this technique is called amplitude beam tilting.

GRATING LOBES

Although Eq. (11) (for an even array) and Eq. (12) (for an odd array) both predict that symmetrical, evenly spaced, in-phase arrays have a maximum response at u = 0 (θ = 0°), it is also evident that the absolute value of either of those directivity functions has exactly the same value as at u = 0 whenever the argument u is any integral multiple of π, that is, whenever

sin θ = nλ/d   (19)

for integer n. This repeat of the main lobe is called a grating lobe, a term borrowed from optics, where the effect is common. Equation (19) predicts that this effect cannot occur unless the source spacing d is equal to or greater than the wavelength. Hence, one can usually rule out problems associated with the occurrence of grating lobes in the beam pattern by designing the source spacing to be less than λ at the highest operating frequency. Because the directivity function of an even array involves cosines of odd multiples of the parameter u, the successive grating lobes of an even array alternate in phase, whereas those of an odd array are always in phase with one another because that directivity function involves cosines of even multiples of u.
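The steering and grating-lobe behavior can be checked numerically. The sketch below is an illustration added here, with arbitrary parameter choices (four sources, θ0 = 30°); it evaluates the bracketed factor of Eq. (14) for d = λ/2 and for d = 1.5λ. In the latter case, Eq. (19), generalized to a steered array, predicts full-height repeats of the main lobe where sin θ = sin θ0 − nλ/d:

```python
import numpy as np

def pattern(thetas, theta0, d_over_lambda, n=4):
    """|Unnormalized directivity| of n equispaced, equistrength point sources
    steered to theta0 by ideal time delays (the bracketed factor in Eq. 14)."""
    kd = 2.0 * np.pi * d_over_lambda
    m = np.arange(n) - (n - 1) / 2.0   # source positions along the line, in units of d
    ph = kd * m[:, None] * (np.sin(thetas) - np.sin(theta0))
    return np.abs(np.exp(1j * ph).sum(axis=0))

thetas = np.radians(np.linspace(-90.0, 90.0, 3601))
theta0 = np.radians(30.0)

# d = lambda/2: a single full-height lobe, steered to theta0
R = pattern(thetas, theta0, 0.5)
print(round(float(np.degrees(thetas[np.argmax(R)])), 2))   # 30.0

# d = 1.5 lambda: full-height repeats where sin(theta) = sin(theta0) - n*lambda/d
R2 = pattern(thetas, theta0, 1.5)
idx = np.where((R2[1:-1] > R2[:-2]) & (R2[1:-1] > R2[2:]) & (R2[1:-1] > 3.9))[0] + 1
peaks = np.degrees(thetas[idx])
print(np.round(peaks).astype(int).tolist())                # [-56, -10, 30]
```

With d = λ/2 only the steered main lobe reaches full height; at d = 1.5λ two grating lobes (near −9.6° and −56.4°) join it, which is why spacings below a wavelength are preferred.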
U-SPACE REPRESENTATION

If the directivity function given in either Eq. (11) or Eq. (12) is simply plotted (on a linear scale) as a function of the argument u, that finite Fourier series might yield a curve such as that shown in Fig. 5. The peak centered at u = 0 is the main lobe. It is surrounded by a set of secondary maxima and minima called side lobes (because a beam pattern is a plot of the absolute value of the directivity function, the relative minima become, in fact, relative maxima); then, at u = ±π, there is a repeat of the main lobe, which is a grating lobe; if u is increased further,
Figure 5. u-space representation of the directivity function of a symmetrical line array of six equispaced point sources.
one sees another pair of grating lobes at u = ±2π, and so on. This plot happens to be for an even number of sources, in fact, six. It can be shown that the number of relative maxima and minima between adjacent grating lobes, or between the main lobe and either of its adjacent grating lobes, equals the number of sources. Now, in practice, there is a maximum value of u, and it occurs at θ = 90°. This maximum value is therefore denoted u90, and its position on the abscissa of the plot in Fig. 5 is so indicated for the value of d/λ assumed there.

Although an infinite baffle is a mathematical fiction, it is a reasonably accurate model when the planar piston source is mounted in an enclosure whose dimensions are large compared to the wavelength. Because reflections from the baffle combine constructively with the direct radiation from the source, the pressure in the region z > 0 is effectively doubled. Alternatively, one can argue that the presence of the baffle can be simulated by introducing an in-phase image source immediately adjacent to the actual piston source, thereby effectively doubling its strength (6). Focusing on the one indicated differential patch of area dS in Fig. 9, the differential amount of pressure it creates
Figure 9. Geometry of a two-dimensional radiator lying in the x, y plane; the region of the x, y plane outside the indicated active region has zero velocity.
at the observation point P, based on spherical coordinates (r, θ, φ), is

dp(P) = [jρ0ck u dS/(2πr)] e^(−jkr).   (23)
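As a numerical illustration added here (the medium and piston values below are arbitrary assumptions), Eq. (23) can be summed over small patches and checked against the closed-form axial result for the baffled circular piston, Eq. (26):

```python
import numpy as np

rho0, c, f = 1000.0, 1500.0, 50.0e3   # assumed water-like medium and frequency
k = 2.0 * np.pi * f / c
a, u0 = 0.01, 1.0e-3                  # assumed piston radius (m) and normal velocity (m/s)

def p_axis_sum(z, n=400):
    """Sum Eq. (23) over annular patches of a circular piston for an on-axis point."""
    s = (np.arange(n) + 0.5) * (a / n)        # annular patch radii
    dS = 2.0 * np.pi * s * (a / n)            # annular patch areas
    r = np.sqrt(s**2 + z**2)                  # patch-to-field-point distances
    return np.sum(1j * rho0 * c * k * u0 * dS * np.exp(-1j * k * r) / (2.0 * np.pi * r))

def p_axis_exact(z):
    """Closed-form axial pressure of the baffled circular piston (cf. Eq. 26)."""
    R = np.sqrt(z**2 + a**2)
    return 2j * rho0 * c * u0 * np.exp(-0.5j * k * (R + z)) * np.sin(0.5 * k * (R - z))

z = 0.05
print(abs(p_axis_sum(z) - p_axis_exact(z)) / abs(p_axis_exact(z)) < 1e-4)  # True
```

The agreement (here to better than four digits) reflects the fact that the axial case is one of the few geometries where the Rayleigh-type patch integral can be carried out exactly.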
Then, the total field at point P is obtained by integrating across the active area. This computation can be done, in theory, for an area of any shape, or even for disjoint areas, but in practice there are only a few simple geometries for which it is possible to obtain analytic expressions for the radiated pressure, and then usually only for observation points in the far field. Note that the task of integrating is further complicated if the velocity distribution u is not constant but varies across the active area.

BAFFLED CIRCULAR PISTON

Perhaps the simplest shaped piston source for which it is possible to obtain an exact answer for the radiated pressure is the circular piston of radius a and uniform velocity u0. This is also a practical case because it serves as a model for most loudspeakers as well as for many sonar and biomedical transducers. The integrations involved are not trivial but are well documented in a number of standard textbooks (7–9). The result is

p(r, θ) = (jρ0cka^2 u0/2r) e^(−jkr) [2J1(ka sin θ)/(ka sin θ)],   (24)

where J1 is the Bessel function of the first kind of order one. The quantity in brackets is the normalized directivity function of the uniform-strength, baffled, circular piston. Because of the axisymmetry of the source, the result is independent of the circumferential angle φ. Figure 10 is a plot of this function versus the dimensionless argument ka sin θ. This plot represents one-half of a symmetrical beam pattern; that is, θ varies in only one direction from the z axis of Fig. 9.

Figure 10. Normalized response function of a baffled circular piston radiator of radius a and uniform normal velocity; argument v = ka sin θ.

Depending on the value of ka, only a portion of this curve is applicable; the largest value of the argument is ka, which occurs at θ = 90°. If ka is small, one observes very little decrease in the response even at θ = 90°; that is, any source that is small compared to the wavelength radiates omnidirectionally. However, if ka is, say, 20, there will be about five side lobes on each side of the main lobe. Note that the level of the first side lobe on either side is about 17.6 dB lower than that of the main lobe and that subsequent side lobes are even lower. By noting the value of the argument corresponding to the 1/√2 value of the main lobe, one can determine that the total −3-dB beam width (BW) of the main lobe is predicted by the equation

BW = 2 sin^(−1)(0.26λ/a).   (25)

For the same geometry, it is possible to obtain an exact expression for the radiated pressure at any point along the z axis, that is, any point along the normal through the center of the circular piston:

p(θ = 0) = 2jρ0cu0 exp[−j(k/2)(√(r^2 + a^2) + r)] sin[(k/2)(√(r^2 + a^2) − r)],   (26)

where r is now distance along the z axis. This expression is valid even for r = 0. If one plots the square of the absolute value of this expression versus r for an assumed value of ka, the curve is characterized by a sequence of peaks of equal height, determined by the maxima of the sine-squared function. There will be one last peak at approximately r = a^2/λ. Then, for r greater than this value, because the argument of the sine-squared function actually becomes smaller as r increases, one can make a small-argument approximation to the sine-squared function and to the square-root function and show that the square of the radiated on-axis pressure varies as 1/r^2, or that the pressure varies as 1/r. Hence, we can say that this is the distance to the so-called far field. That distance is variously taken as 2a^2/λ or 4a^2/λ or sometimes πa^2/λ. It is not that the distance is arbitrary, but rather that the exact result asymptotically approaches the 1/r dependence for large r; therefore, the greater the distance, the more accurate the approximation will be. The value πa^2/λ, which can be read as the active area divided by the wavelength, is a convenient metric for other shapes of radiators.

BAFFLED RECTANGULAR PISTON

For a rectangular piston of dimensions ℓ (parallel to the x axis of Fig. 9) by w (parallel to the y axis) and uniform velocity, one can readily integrate Eq. (23) across the active area to obtain the far-field radiated pressure. For this case, the normalized directivity function is

R(θ) = [sin ℓ′/ℓ′][sin w′/w′] = Sinc(ℓ′) Sinc(w′),   (27)

where ℓ′ = (kℓ/2) sin θ cos φ and w′ = (kw/2) sin θ sin φ. Because the source distribution is not axisymmetric, the
directivity function depends on both of the coordinate angles φ and θ that locate the far-field observation point P. This expression is a special case of a result known as the second-product theorem (10), which states that the pattern function of a planar radiator whose velocity distribution can be written as the product of a function of one coordinate variable and a function of the orthogonal coordinate variable, for example, u(x, y) = f(x)g(y), equals the product of the pattern functions of two orthogonal line arrays, one lying along the x axis with velocity distribution f(x) and the other lying along the y axis with velocity distribution g(y). For a piston of uniform velocity, f(x) = g(y) = 1.0, and therefore the two line-array pattern functions, as noted earlier, are sinc functions.
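Both the circular-piston numbers quoted above and the separability just described can be reproduced numerically. The sketch below is an added illustration; J1 is evaluated from its integral representation so that only numpy is required, and the rectangular-piston parameters are arbitrary assumed values:

```python
import numpy as np

def J1(x):
    """Bessel function J1 from its integral representation (numpy-only)."""
    tau = np.linspace(0.0, np.pi, 2001)
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    return np.cos(tau - x * np.sin(tau)).mean(axis=1)

# Circular piston: first side-lobe level and -3-dB beam width, Eqs. (24)-(25)
v = np.linspace(0.002, 8.0, 4000)                 # v = ka sin(theta)
D = np.abs(2.0 * J1(v) / v)
sl_db = 20.0 * np.log10(D[(v > 3.83) & (v < 7.02)].max())   # between first two J1 zeros
i = int(np.argmax(D < 2.0 ** -0.5))                         # first half-power crossing
v3 = v[i - 1] + (D[i - 1] - 2.0 ** -0.5) / (D[i - 1] - D[i]) * (v[i] - v[i - 1])
print(f"side lobe {sl_db:.1f} dB; sin(theta_-3dB) = {v3 / (2 * np.pi):.2f} lambda/a")

# Rectangular piston: direct summation of Eq. (23) phases over the area
# reproduces the separable sinc product of Eq. (27)
k, L, w = 200.0, 0.02, 0.01                       # assumed wavenumber and dimensions
theta, phi = np.radians(25.0), np.radians(40.0)
lp = 0.5 * k * L * np.sin(theta) * np.cos(phi)
wp = 0.5 * k * w * np.sin(theta) * np.sin(phi)
R_sinc = np.sinc(lp / np.pi) * np.sinc(wp / np.pi)  # np.sinc(x) = sin(pi x)/(pi x)
xs = (np.arange(400) + 0.5) / 400 * L - L / 2
ys = (np.arange(400) + 0.5) / 400 * w - w / 2
X, Y = np.meshgrid(xs, ys)
R_sum = np.abs(np.exp(1j * k * np.sin(theta) * (X * np.cos(phi) + Y * np.sin(phi))).mean())
print(bool(np.isclose(abs(R_sinc), R_sum, atol=1e-4)))      # True
```

The half-power crossing falls at v of about 1.62, giving the 0.26λ/a coefficient of Eq. (25); the first side lobe is about 17.6 dB down; and the direct patch summation matches the sinc product to four decimal places.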
NONPLANAR RADIATORS ON IMPEDANCE BOUNDARIES

The problem of computing the radiation from source distributions that lie on nonplanar surfaces is considerably more complicated and is, in fact, analytically tractable only for a few simple surfaces such as spheres and cylinders. These results are well documented in standard textbooks (11,12). For sources on an arbitrary surface, one must resort to numerical procedures, particularly an implementation of the Helmholtz integral equation such as the boundary-value program CHIEF described in (13,14). Furthermore, if the supporting surface is an elastic body or a surface characterized by some finite, locally reacting impedance, a finite-element model of the boundary reaction coupled with a boundary-value program such as CHIEF must be used to compute the radiated sound field.

ABBREVIATIONS AND ACRONYMS

SI   Système International
m    meters
s    seconds
dB   decibel
BW   beamwidth

BIBLIOGRAPHY

1. L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders, Fundamentals of Acoustics, 4th ed., Wiley, NY, 2000, pp. 193–195.
2. C. L. Dolph, Proc. IRE 34, 335–348 (1946).
3. T. T. Taylor, IRE Trans. Antennas Propagation AP-7, 16–28 (1955).
4. L. E. Kinsler et al., op. cit., p. 199.
5. W. J. Hughes and W. Thompson Jr., J. Acoust. Soc. Am. 5, 1040–1045 (1976).
6. L. E. Kinsler et al., op. cit., pp. 163–166.
7. L. E. Kinsler et al., op. cit., pp. 181–184.
8. D. T. Blackstock, Fundamentals of Physical Acoustics, Wiley, NY, 2000, pp. 445–451.
9. A. D. Pierce, Acoustics: An Introduction to Its Physical Principles and Applications, McGraw-Hill, NY, 1981, pp. 225–227.
10. V. M. Albers, Underwater Acoustics Handbook II, The Pennsylvania State University Press, University Park, PA, 1965, p. 188.
11. P. M. Morse and K. U. Ingard, Theoretical Acoustics, McGraw-Hill, NY, 1968, pp. 332–366.
12. M. C. Junger and D. Feit, Sound, Structures, and Their Interaction, 2nd ed., The MIT Press, Cambridge, MA, 1986, pp. 151–193.
13. H. Schenck, J. Acoust. Soc. Am. 44, 41–58 (1968).
14. G. W. Benthien, D. Barach, and D. Gillette, CHIEF Users Manual, Naval Ocean Systems Center Tech. Doc. 970, Sept. 1988.

ANALOG AND DIGITAL SQUID SENSORS

MASOUD RADPARVAR
HYPRES, Inc.
Elmsford, NY

INTRODUCTION

Figure 1. Circuit diagram of a dc SQUID. The feedback (F/B) and X coils are coupled to the SQUID loop through multiturn coils whose number of turns is determined by the required sensitivity.

Superconducting quantum interference devices (SQUIDs) are extremely sensitive detectors of magnetic flux that can be used as low-noise amplifiers for various applications such as high-resolution magnetometers, susceptometers, neuromagnetometers, motion detectors, ultrasensitive voltmeters, picoammeters, readout of cryogenic detectors, and biomagnetic measurements. A SQUID is a superconducting ring interrupted by one or two Josephson tunnel junctions made of a superconductor such as niobium (Nb) (Fig. 1). Nb is the material of choice for superconducting circuits because it has the highest critical temperature among the elemental superconductors (9.3 K). This alleviates certain concerns for uniformity and repeatability that must be taken into account when using compound superconductors. Josephson tunnel junctions using niobium electrodes and thermally oxidized aluminum tunnel barriers have been investigated rather extensively (1) and have been brought by several laboratories to a state of maturity suitable for medium-scale integrated circuits. Figure 2a shows the characteristics of a Josephson tunnel junction. One of the unique features of a Josephson junction is the presence of a tunnel current through the device at zero bias voltage. The magnitude of this current, which is normally referred to as the Josephson current, depends
on a magnetic field. When two Josephson junctions are paralleled by an inductor (Fig. 1), their combined Josephson currents, as a function of the field applied to the SQUID loop by either the coupled transformer (X) or the feedback (F/B) coil, exhibit a characteristic similar to that shown in Fig. 2b, which is normally called an interference pattern. The periodicity of this characteristic is in multiples of the flux quantum Φ0 = h/2e = 2.07 × 10^−7 gauss-cm² (h = Planck's constant, e = electron charge). This sensitivity of the Josephson current to an external magnetic field is exploited in building ultrasensitive SQUID-based magnetometers. Such a magnetometer can easily be transformed into a picoammeter or a low-noise amplifier for many applications by coupling its transformer directly to the output of a detector.

A typical SQUID circuit process uses Josephson junctions 3–10 µm² in area whose critical current density is 100–1000 A/cm². These circuits are fabricated by a multilayer process using all-niobium electrodes and wiring, an aluminum oxide tunnel barrier, Mo or Au resistors, Au metallization, and SiO2 insulating layers. Table 1 lists the layers and their typical thicknesses. The layer line widths are nominally 2.5 µm on a 5-µm pitch. The overall dimensions of a SQUID circuit, including its integrated coils, are about 1000 × 3000 µm².

Figure 2. (a) Current–voltage characteristic of a Josephson tunnel junction (0.1 mA and 0.5 mV per division); high-quality junctions are routinely fabricated from conventional materials such as niobium. (b) Voltage across a shunted SQUID (20 µV per division) as a function of an external magnetic field; the period of the modulation is one flux quantum Φ0.

ANALOG SQUIDS

The first SQUID-based device was used as a magnetometer in 1970 by Cohen et al. (2) to detect the human magnetocardiogram. Since then, this type of magnetic flux detector has been used as the main tool in biomagnetism. The SQUID is inherently very sensitive. Indeed, the principal technical challenge for researchers has mainly been discrimination against the high level of ambient noise. A SQUID amplifier chip has a SQUID gate coupled to a transformer (Fig. 3a). The transformer is, in turn, coupled to a matched input coil (pickup loop). The SQUID and its associated superconducting components are maintained at 4.2 K by immersion in a bath of liquid helium in a cryogenic dewar. Superconducting transformers are essential elements of all superconducting magnetometer/amplifier systems. The SQUID and its transformer are usually surrounded by a superconducting shield to isolate them from ambient fields. The transformer acts as a flux concentrator. In such an input circuit (the transformer plus input coil), the trapped flux is fixed, and subsequent changes in the applied field are canceled exactly by changes in the screening current around the input circuit. Because no energy dissipates, the
superconducting input circuit is a noiseless flux-to-current transducer. The loop operates at arbitrarily low frequencies, thus making it extremely useful for very low frequency applications. The field imposed on the SQUID loop by the input circuit is transformed into a voltage by the SQUID and is sensed by an electronic circuit outside the dewar.

Table 1. A Typical Standard Niobium Process for Fabricating SQUID Magnetometers

Layer  Material          Thickness (nm)  Purpose
1      Niobium           100             Ground plane
2      Silicon dioxide   150             Insulator
3      Niobium           135             Base electrode
—      Aluminum oxide    ~15             Tunnel barrier
4      Niobium           100             Counterelectrode
5      Molybdenum        100             1 Ω/square resistor
6      Silicon dioxide   200             Insulator
7      Niobium           300             First wiring
8      Silicon dioxide   500             Insulator
9      Niobium           500             Second wiring
10     Gold              600–1,000       Pad metallization

Early systems used RF SQUIDs because they were easy to fabricate. The prefix RF refers to the type of bias current applied to the SQUID. An RF SQUID is a single junction in a superconducting loop. Magnetic flux is generally coupled inductively into the SQUID via an input coil. The SQUID is then coupled to a high-quality resonant circuit to read out the current changes in the SQUID loop. This tuned circuitry is driven by a constant-current RF oscillator. RF SQUID-based circuits were slowly phased out because of their lower energy sensitivity (5 × 10^−29 J/Hz at a pump frequency of 20 MHz) compared with dc SQUIDs.

The dc SQUID differs from the RF SQUID in the number of junctions and the bias condition. The dc SQUID magnetometer, which was first proposed by Clarke (3), consists of the dc SQUID exhibited in Fig. 1, where the Josephson junctions are resistively shunted to remove hysteresis in the current–voltage characteristics and the transformer is a multiturn coil tightly coupled to the SQUID loop. When the dc SQUID is biased at a constant current, its response to an external magnetic field is a periodic dependence of output voltage on input magnetic flux. Figure 2b shows such a characteristic for a Nb/AlOx/Nb SQUID. The SQUID's sensitivity to an external field is improved significantly by coupling the device to the multiturn superconducting flux transformer. Figure 3b shows a photograph of an actual SQUID chip and its integrated transformer. It is estimated that the minimum energy sensitivity for dc SQUIDs is on the order of h (Planck's constant, 6.6 × 10^−34 J/Hz). dc SQUIDs that operate at 5h have been fabricated and evaluated in the laboratory (4). The performance of these Josephson tunnel junction SQUIDs is actually limited by the Johnson noise generated in the resistive shunts used to eliminate hysteresis in the current–voltage characteristics.

Figure 3. (a) Layout schematic of a dc SQUID. The transformer is coupled to the SQUID via the hole in the plane of the SQUID. The shunting resistors across the Josephson junctions are not shown in this diagram. (b) Photograph of a dc SQUID chip.

To use a SQUID as an amplifier, its periodic transfer characteristic should be linearized. This linearization also substantially increases the dynamic range of the SQUID circuit. Figure 4 shows a dc SQUID system and its peripheral electronics. The function of the feedback coil is to produce a field that is equal but opposite to the applied field. The circuit uses a phase-lock loop (lock-in amplifier) and a reference signal to facilitate a narrowband, lock-in type of measurement. The output is proportional to the feedback current, hence to the amount of flux required to cancel the applied field, and is independent of the voltage–flux (transfer) characteristic shown in Fig. 2b. Thus, the SQUID circuit that has a feedback coil, lock-in amplifier, oscillator, dc amplifier, and on-chip transformer coil serves as a null detector.

Figure 4. A dc SQUID chip and peripheral electronics. A feedback circuit is needed to linearize the SQUID characteristics. The dotted line encloses the components normally at cryogenic temperature. The output resistor can be on- or off-chip.

The feedback electronics also determine the dynamic range (the ratio of the maximum detectable signal to the minimum sensitivity) and the slew rate (the maximum allowable rate of change of the flux coupled to the SQUID loop) of the system. The maximum measurable signal is determined by the maximum feedback current available from the peripheral electronics. The slew rate is determined by the speed with which the feedback current can null out the applied field, which is dictated by the speed of the peripheral electronics as well as by the physical distance between the electronics and the SQUID chip.

To simplify the room-temperature electronics and eliminate the bulky transformer, which limits the system bandwidth, Drung et al. (5) developed and demonstrated a dc SQUID magnetometer that has additional positive feedback (APF). In this case, an analog SQUID is operated without impedance matching and without flux-modulation techniques. For a minimum preamplifier noise contribution, the voltage–flux characteristic of a dc SQUID (Fig. 2b) should be as steep as possible at the SQUID's operating point. To steepen the characteristic there, Drung's novel circuit directly couples the SQUID voltage, via a series resistor, to the SQUID loop. This current provides additional positive feedback and steepens the voltage–flux characteristic at the operating point, at the expense of flattening the portion of the curve that has the opposite slope. For a properly designed SQUID that has adequate gain, such a technique allows reading out the voltage across the dc SQUID without the need for the step-up transformer.

Alternatively, the peripheral circuitry can be either integrated on-chip with the SQUID gate, as in a digital SQUID, or minimized by exploiting the dc SQUID array technique.
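The flux-locked loop described above can be sketched as a discrete-time feedback recursion. This is an added illustration, not circuitry from the text; the sinusoidal transfer curve and the integrator gain are assumed stand-ins:

```python
import numpy as np

def squid_v(phi):
    """Assumed periodic voltage-flux transfer characteristic (cf. Fig. 2b);
    flux is expressed in units of the flux quantum."""
    return np.sin(2.0 * np.pi * phi)

applied = 3.7          # applied flux, in flux quanta
fb = 0.0               # feedback flux produced by the integrator
gain = 0.05            # assumed integrator gain per clock step
for _ in range(2000):  # discrete-time flux-locked loop
    fb += gain * squid_v(applied - fb)   # integrator drives the SQUID voltage to a null

print(round(applied - fb, 3))  # settles at the nearest stable null: 4.0 flux quanta
```

The loop settles where the SQUID voltage crosses zero, so the feedback flux reproduces the applied flux to within an integer number of flux quanta, independent of the shape of the periodic transfer curve; this is why the output in Fig. 4 is read from the feedback current rather than from the SQUID voltage itself.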
The digital SQUID is suitable for multichannel systems, where it virtually eliminates the need for all of the sophisticated room-temperature electronics and reduces the number of leads to a multichannel chip to fewer than 20 wires. This method reduces system cost and heat leak to the chip significantly, at the expense of a very complex superconducting chip. On the other hand, the dc SQUID array chip relies on relatively simple circuitry and is ideal for systems where the number of channels is limited and the heat load to the chip is manageable. In such a case, an analog system becomes more attractive than a system based on digital SQUID technology.

DC SQUID ARRAY AMPLIFIERS

A dc SQUID array amplifier chip consists of an input SQUID that is magnetically coupled to an array of 100–200 SQUIDs (Fig. 5) (6). An input signal couples flux into the input SQUID, which is voltage-biased by a very small resistor, typically 0.05 Ω, so that the SQUID current is modulated by variations in the applied flux. The flux-modulation coil of the output SQUID is connected in series with the input SQUID, so that variation in the input SQUID current changes the flux applied to the output
Figure 5. Circuit schematic of the two-stage dc array amplifier. The analog SQUID is coupled to an array of dc SQUIDs through coupling series inductors. The room-temperature amplifier can be connected directly to the array. The signal coil is connected to the input transformer, and the feedback coil can be connected to a simple integrator attached to the output of the room-temperature amplifier.
array. The series array is biased at a constant current, so the output voltage is modulated by this applied flux. The external field sensed by the high-sensitivity analog SQUID is converted to a current that is applied, through the series array of inductors, to the series dc SQUIDs (Fig. 5). These dc SQUIDs have voltage–flux characteristics similar to those of Fig. 2b but about 100 times larger in amplitude, owing to the summation of the voltages in the output series SQUID array. Because the output current from the front-end analog SQUID is relatively large, high-sensitivity dc SQUIDs are not needed for the array; this significantly relaxes the SQUID layout density and simplifies fabrication. The series array of dc SQUIDs can generate a dc voltage on the order of millivolts, which can be used directly by room-temperature electronics and feedback circuitry to read out, and provide feedback to, the high-sensitivity input analog SQUID circuit. Because the output voltage of the SQUID array is very large, no special dc amplifiers or step-up transformers are required.

The input SQUID stage normally consists of a low-inductance, double-loop SQUID that has a matched input flux transformer (Fig. 6). The two SQUID loops are connected in parallel, and the junctions are located between them; therefore, the SQUID inductance is half of the individual loop inductance. Two multiturn modulation coils are wound (in opposite directions) on the SQUID loops, one on each side, and connected in series. This double coil is connected to a load or an input coil. An additional coil, for feedback, can be integrated to permit operation in a flux-locked loop.

DIGITAL SQUID SENSORS

The high-resolution data obtained from SQUID magnetometers in biomedical applications such as magnetoencephalography show remarkable confirmation of the power of magnetic imaging. To localize the source of the signal, which could be a single-shot event, it is important
Figure 7. Block diagram of a digital SQUID chip.
Figure 6. A dc SQUID that has two SQUID loops minimizes the effect of an external field.
to have a large number of simultaneous recording channels. However, as the number of channels increases, the wiring requirements (four wires per analog SQUID), the physical size of available dc SQUIDs and SQUID electronics, and the cost become prohibitive. For example, a 128-pixel system using conventional bias and readout circuitry requires at least 512 wires attached to the superconducting chip and a set of 128 channels of sophisticated electronics. A bundle of wires containing 512 connections between superconducting chips held at 4.2 K and room temperature represents a 4-W heat leak, or consumption of about 3 liters of liquid helium per hour. Consequently, one must turn to multiplexed digital SQUIDs for such imaging applications.

The concept of digital SQUID magnetometry is rather new and thus deserves more attention. A digital SQUID integrates the feedback electronics into the SQUID circuit and eliminates the need for modulation and a phase-sensitive detector. A multiplexed digital SQUID requires fewer than 20 connections to a chip and minimal support electronics, by introducing a very sophisticated superconducting chip. Consequently, the heat load from the leads to the chip is substantially reduced, to a few hundred milliwatts or less, owing to the reduction in the number of connections.

A complete digital SQUID amplifier consists of a high-sensitivity analog SQUID coupled to a comparator and integrated with a feedback coil. Figure 7 shows the block diagram of a digital SQUID chip. In this case, the pickup coil is in series with the feedback circuit and is integrated with the analog SQUID front end. The write gates induce currents in the feedback loop for positive and negative
applied magnetic fields to maintain the circulating current in the feedback coil near zero. Due to flux quantization, the induced currents create magnetic fields in the feedback loop in multiples of flux quanta. A quantum of flux is normally called a fluxon or antifluxon, depending on the direction of its magnetic field. The distinct advantage of the digital SQUID amplifier shown in Fig. 7 is its ultrawide dynamic range, which is obtained by using the series connection of the single-flux-quantum feedback device with the SQUID pickup loop. The circuit consists of a comparator SQUID coupled to the output of an analog SQUID preamplifier. The output of the comparator controls the inputs of the two separate SQUID write gates (one of them through a buffer circuit). As can be seen from Fig. 7, the outputs of the write gates are connected through a superconducting inductive path consisting of the pickup coil and the flux transformer of the analog SQUID preamplifier. As a consequence of this series connection, the feedback loop always operates so that the net circulating current vanishes, resulting in a practically unlimited dynamic range. In addition, due to its high dynamic range, such a digital SQUID can be used in a relatively high magnetic field without requiring extensive magnetic shielding. This novel architecture of combining two write gates in series with the applied signal gives this digital magnetometer circuit its high dynamic range. The operation of the single-chip magnetometer is as follows. In the absence of an external field, the comparator switches to a voltage state that causes its corresponding write gate to induce a flux quantum into the feedback loop and prohibits the buffer gate from switching to a voltage state. This flux creates a circulating current in the loop that is amplified by the analog SQUID and is applied back to the comparator SQUID.
In the following clock cycle, this magnetic field keeps the comparator from switching and causes the buffer circuit to switch; hence, its corresponding write gate induces an antifluxon in the loop that annihilates the original fluxon. As long as there is no applied magnetic field, this process of fluxon/antifluxon creation/annihilation continues and represents the steady-state operation of the digital SQUID circuit, as shown in Fig. 8.
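This fluxon/antifluxon bookkeeping behaves like a one-bit (delta) modulator. The following sketch is purely illustrative: an idealized comparator and write-gate pair with the flux quantum as the unit step, not a model of the actual superconducting circuit.

```python
PHI0 = 2.07e-15  # magnetic flux quantum, Wb

def digital_squid_steps(applied_flux_wb, n_clocks):
    """Idealized digital SQUID feedback: each clock cycle the comparator
    decides whether a write gate injects a fluxon (+PHI0) or the buffer
    path injects an antifluxon (-PHI0), driving the residual flux toward zero."""
    loop_flux = 0.0
    outputs = []
    for _ in range(n_clocks):
        residual = applied_flux_wb - loop_flux
        if residual >= 0:   # comparator switches: write gate fires a fluxon
            loop_flux += PHI0
            outputs.append(+1)
        else:               # buffer gate fires: antifluxon annihilates it
            loop_flux -= PHI0
            outputs.append(-1)
    return outputs

# With no applied field, the output alternates in steady state:
print(digital_squid_steps(0.0, 6))            # [1, -1, 1, -1, 1, -1]
# With an applied flux of 3.5 fluxons, the net pulse count tracks the signal:
print(sum(digital_squid_steps(3.5 * PHI0, 100)))   # 4
```

The difference between the numbers of positive and negative pulses is the digitized measure of the applied flux, as in the actual circuit.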
Figure 8. Steady-state outputs of a digital SQUID circuit in the absence of an external field. The traces from top to bottom are clock 1, clock 2, input, and comparator and buffer outputs. As expected, the outputs alternately induce fluxons and antifluxons into the loop.
In the presence of an applied magnetic field, the comparator causes its corresponding write gate to generate pulses into the feedback loop to cancel the applied magnetic field. The injection of fluxons continues in each clock period, as long as the gate current of the comparator is less than its threshold current. Using proper polarity, the single-flux-quantum (SFQ) induced current in the superconducting feedback loop can eventually cancel the applied current and restore the comparator SQUID close to its original state. When the current in the feedback loop is close to zero, both write gates alternately emit fluxons and antifluxons into the loop in each clock period and keep the feedback current close to zero. The difference between the number of pulses is a measure of the applied signal. Figure 9 shows a photograph of a successful high-sensitivity digital SQUID chip. In an optimized digital SQUID circuit, the least significant bit (LSB) of the output must be equal to the flux noise of the front-end analog SQUID, SB^1/2. If the LSB is much less than SB^1/2, then the circuit does not work properly, that is, the comparator output randomly fluctuates between 0 and 1. If the LSB is much larger than SB^1/2, then the sensitivity of the complete digital SQUID is compromised. In the latter case, however, such a low-sensitivity digital SQUID can be produced and will operate properly, albeit with nonoptimal sensitivity. The digital SQUID chip slew rate (a measure of how fast the feedback current can null out the applied signal) is proportional to the clock frequency. The system must track the signal as well as the noise (at a slew rate of a few µT/s) to be able to operate in an unshielded environment. This requirement sets the minimum clock frequency required to accommodate the slew rate associated with the external noise at typically a few tens of MHz.
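The clock-frequency requirement can be checked with back-of-the-envelope arithmetic. The field step per fluxon (0.3 pT) and the ambient slew rate (10 µT/s) used below are assumed, illustrative values, not numbers from the text.

```python
# Each clock cycle the feedback can change the nulling field by at most one
# LSB (one flux quantum referred to the pickup loop), so the maximum
# trackable slew rate is  slew_max = field_lsb * f_clock.

field_lsb = 0.3e-12     # assumed field step per fluxon, tesla (illustrative)
required_slew = 10e-6   # assumed ambient-noise slew rate, tesla per second

f_clock_min = required_slew / field_lsb
print(f"minimum clock frequency ≈ {f_clock_min/1e6:.0f} MHz")  # ≈ 33 MHz
```

With these assumed numbers the result lands in the "few tens of MHz" range quoted in the text.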
The sensitivity of the single-chip digital SQUID magnetometer is limited by the sensitivity of its analog SQUID frontend. Consequently, if an analog SQUID that
Figure 9. Photograph of a high-sensitivity digital SQUID chip magnetometer. The loop inductance of the SQUID comparator is made of eight washers in parallel to facilitate coupling it to the large coupling transformer. The analog SQUID is coupled to the feedback coils through the two washers under these coils.
Figure 10. Block diagram of the digital SQUID peripheral electronics (clock, digital SQUID, 1-bit code, divide-by-2^14 stage, up-down counter, accumulator, and output).
has ultimate (quantum) sensitivity is used as a front-end sensor for a digital SQUID, the complete sensor chip also possesses the same quantum efficiency. The easiest method of reconstructing the signal applied to the input of a digital SQUID is to integrate the output of the digital SQUID in real time. This method, however, does not take advantage of the digital nature of the output signal. Figure 10 shows a block diagram of the complete digital peripheral electronics for digital SQUID magnetometers. The output of the digital SQUID, which is typically 1 mV, is amplified 100 times by a preamplifier (not shown) that is attached to the digital SQUID output. Then, this amplified signal is fed to a gate that either generates a pulse at the positive output when there is an input pulse, or generates a pulse at the negative terminal
when there is no input pulse. The counter simply keeps track of the difference between the number of positive and negative pulses. The counter output is loaded into the accumulator after each read cycle. Between read cycles, the counter's value is added to the accumulator after each clock cycle. This decimation technique is actually the digital equivalent of a simple analog low-pass filter. Any data acquisition program, such as LabVIEW, can postprocess the digital signal.

SUMMARY

Today, analog dc SQUIDs are the most widely produced superconducting magnetometers for a variety of applications such as biomagnetism, rock magnetometry, and nondestructive evaluation. The SQUID array amplifier has led to significant improvement in the performance and cost of readout circuits for cryogenic detectors used in high-energy particle physics. The main features of the array amplifier are its wide bandwidth and low cost. The most intriguing application of the digital SQUID-based system is in the biomedical field, where multiple SQUID
channels are required for magnetic imaging. However, significant engineering work remains to be done before digital SQUIDs can be marketed.
C

CAPACITIVE PROBE MICROSCOPY
JOSEPH J. KOPANSKI
National Institute of Standards and Technology
Gaithersburg, MD

INTRODUCTION

A scanning capacitance microscope (SCM)¹ combines a differential capacitance measurement with an atomic force microscope (AFM). The AFM controls the position and contact force of a scanning probe tip, and a sensor simultaneously measures the capacitance between that tip and the sample under test. The capacitance sensor can detect very small (≈10⁻²¹ F) changes in capacitance. The sensor is electrically connected to an AFM cantilevered tip that has been coated with metal to make it electrically conductive. In the most common mode of operation, an ac voltage (around 1 V peak-to-peak at 10 kHz) is used to induce and modulate a depletion region in a semiconductor sample. The varying depletion region results in a varying, or differential, SCM tip-to-sample capacitance. The capacitance sensor detects the varying capacitance and produces a proportional output voltage. The magnitude and phase of the sensor output voltage are measured by a lock-in amplifier that is referenced to the modulation frequency. The signal measured by the SCM is the output voltage of the lock-in amplifier, which is proportional to the induced tip-to-sample differential capacitance. As the SCM/AFM tip is scanned across the sample surface, simultaneous images of topography and differential capacitance are obtained. Currently, the most common application of SCM is qualitatively imaging the carrier concentration variations (also known as dopant profiles) within transistors that are part of silicon integrated circuits (ICs). The ICs are cross-sectioned to expose the structure of individual transistors and other electronic devices. The differential capacitance measured by the SCM is related to the carrier concentration in the silicon. Regions of high carrier concentration, such as the source and drain of a transistor, produce a low differential capacitance signal level, and regions of low carrier concentration, such as the transistor channel, produce a high signal level. Thus, SCM images of transistors contain information about the details of construction of the transistor that is not usually visible in an optical or topographic image. Silicon integrated circuit manufacturers use SCM images of cross-sectioned transistors for process failure analysis. Semiconductors other than silicon, such as InP/InGaAsP buried heterostructure lasers, have also been imaged with SCM. Another application that has encouraged the development of commercial SCMs is the need for quantitative two-dimensional (2-D) carrier profiling of silicon transistors. The International Technology Roadmap for Semiconductors (1) describes the necessity and requirements for 2-D carrier profiling as a metrology to aid in developing future generations of ICs. If accurate 2-D carrier profiles could be measured, they could be used to enhance the predictive capabilities of technology computer-aided design (TCAD). However, for this application, the performance goals for 2-D carrier profiling tools are very challenging. Carrier profiles of the source and drain regions of the transistors need to be known at 5 nm spatial resolution and ±5% accuracy in 2001; the requirements tighten to 0.6 nm spatial resolution and ±2% accuracy by 2014. The status of quantitative carrier profiling by SCM will be reviewed later. This article will review the history, operating principles, and applications of SCM. Special emphasis will be placed on measuring quantitative 2-D carrier profiles in silicon using SCM. This article will also review the implementation and applications of the scanning capacitance spectroscopy (SCS) technique, the intermittent contact mode of SCM (IC-SCM), and two other techniques that can also be classified as scanning capacitance probes: the scanning Kelvin probe microscope (SKPM), when operated at twice its drive frequency, and the scanning microwave microscope (SMWM).
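The modulation and lock-in detection described above can be sketched numerically. This toy model (arbitrary signal amplitude and noise level, chosen purely for illustration) shows how phase-sensitive detection recovers a value proportional to the C–V slope from a noisy sensor output.

```python
import math, random

def lock_in(signal, reference):
    """Phase-sensitive detection: multiply by the reference and average,
    which rejects components not at the reference frequency and phase."""
    return 2.0 * sum(s * r for s, r in zip(signal, reference)) / len(signal)

random.seed(0)
f_mod = 10e3                  # modulation frequency, 10 kHz
fs = 1e6                      # sample rate, 1 MS/s
n = 50_000                    # 0.05 s of data = 500 modulation cycles
t = [i / fs for i in range(n)]

dCdV = 0.37                   # arbitrary C-V slope (arbitrary units)
sensor = [dCdV * math.sin(2 * math.pi * f_mod * ti)
          + 0.5 * random.gauss(0, 1) for ti in t]   # noisy sensor output
ref = [math.sin(2 * math.pi * f_mod * ti) for ti in t]

print(lock_in(sensor, ref))   # ≈ 0.37: proportional to dC/dV despite the noise
```

Averaging over many modulation cycles is what lets the lock-in pull the small differential-capacitance signal out of the sensor noise.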
DEVELOPMENT OF SCM The first capacitance probe microscope predates the AFM. Matey and Blanc described an early version of SCM in 1985 (2,3). This instrument measured capacitance using a sensor removed from an RCA² SelectaVision VideoDisc System (4,5) and a 2.5 × 5 µm probe that was guided in a groove. The VideoDisc was a competitor to the Video Cassette Recorder (VCR) for home video playback. The VideoDisc retrieved an analog video signal for playback that had been mechanically imprinted onto a capacitance electronic disk (CED). The VideoDisc player featured a capacitance detector that operated at 915 MHz and detected small changes in capacitance by the shift induced in the resonant peak of the response of an inductance–capacitance–resistance (LCR) resonator. The
² Certain commercial equipment, instruments, or materials are identified in this paper to specify the experimental procedure adequately. Such identification does not imply recommendation or endorsement by NIST, nor does it imply that the materials or equipment used are necessarily the best available for the purpose.
¹ The acronym SCM refers to the instrument, scanning capacitance microscope, and the technique, scanning capacitance microscopy.
operating principle of the capacitance detector will be discussed in detail later. The VideoDisc has not endured in the marketplace, but the remarkable sensitivity of its capacitance detector inspired the SCM. Sensors similar in design are used on the commercial SCMs of today. The next development was a generation of SCMs that were similar in some ways to the scanning tunneling microscope (STM) (6–11). These instruments featured a sharpened metal tip and a piezoelectric scanning system like that used for the STM. Instead of controlling tip height by maintaining constant electron tunneling current, as in the STM, these microscopes maintained a constant capacitance to control tip-to-sample separation. By using the capacitance as the feedback signal, these SCMs measured the topography of insulating surfaces. (STMs can measure only surfaces that are relatively good conductors.) Slinkman, Wickramasinghe, and Williams of IBM patented an SCM based on an AFM and an RCA VideoDisc sensor in 1991 (12). This instrument was similar in configuration to current SCMs. Rather than using the capacitance signal to control the tip–sample separation, this instrument used the AFM to control tip position, and the capacitance was measured independently of the AFM. Combinations of commercial AFMs and RCA capacitance sensors to form SCMs were demonstrated soon after (13–15). The first turnkey commercial SCM was introduced by Digital Instruments Inc. in 1995. Development of this instrument was partially funded by the industry consortium SEMATECH. Thermomicroscopes Inc. also markets an SCM, and Iders Inc., Manitoba, Canada, markets a capacitance sensor suitable for SCM. Today, numerous groups pursue research into further development of SCM. Much, but not all, of this work is focused on developing SCM as a tool to measure the two-dimensional carrier profiles of the source/drain regions of silicon transistors.
The following is a thorough, but not exhaustive, survey of recent publications from groups that have active SCM programs. In North America, a variety of university, industrial, and government labs have published SCM-related work. Contributions from each of these groups are referenced where relevant throughout this article. Since 1992, the National Institute of Standards and Technology (NIST) has had an SCM program with the goal of making SCM a quantitative 2-D carrier profiling technique (13,15–26). The NIST program has developed models of the SCM measurement based on three-dimensional finite-element solutions of Poisson's equation (23,24) and Windows-based software for rapid carrier profile extraction from SCM images (21). One of the inventors of the SCM technique, Williams, is now at the University of Utah and continues to lead a large and active SCM group. The University of Utah program also seeks to use SCM to measure quantitative 2-D carrier profiles of silicon transistors (27–37). The industrial consortium, International SEMATECH, has also played a critical role in developing a commercial SCM tool. International SEMATECH established a working group on 2-D dopant profiling that conducts review
meetings on the SCM technique and is conducting round-robin measurements of samples by SCM (38). Universities that publish SCM studies include Stanford University (39,40), the University of California at San Diego (41), and the University of Manitoba (42). Industrial applications of SCM have been reported by scanning probe microscope instrument makers, Digital Instruments Inc. (15,43) and Thermomicroscopes (11,44), and semiconductor manufacturers such as Intel (15,43), AMD (17,18,45), Texas Instruments (46–50), and Lucent Technologies (51,52). In Japan and Korea, several groups are investigating SCM applications. Researchers at Korea's Seoul National University have studied charge trapping and detrapping mechanisms by using SCM in SiO2-on-Si structures (44,53–55). Fuji Photo Film Co. has studied SCM charge injection and detection of bits for ferroelectric/semiconductor memory (56–58). Researchers at Tohoku University, Nissin Electric Co., and Nikon Co. have investigated charge injection into SiO2-on-Si structures (59–62). Multiple groups are active in Europe. Researchers at the Laboratory of Semiconductor Materials, Royal Institute of Technology, Sweden, have specialized in applying SCM to image buried heterostructure (BH) laser structures and other layered structures made from InP/GaInAsP/InP (63–65). Researchers at the University of Hamburg, Germany, have reported manipulating charge in NOS structures as potential mass storage devices using a capacitance sensor of their own design (66,67). In Switzerland, a group at the Swiss Federal Institute of Technology has developed a direct inversion method for extracting carrier profiles from SCM data and simulated the method's sensitivity to tip size (68,69). IMEC, in Belgium, has long been a center of expertise in measuring carrier profiles in silicon. IMEC has published various studies of SCM and scanning spreading resistance microscopy (70–74).
OPERATING PRINCIPLES

Contrast Mechanism

Figure 1 shows a block diagram of a typical scanning capacitance microscope operated in the constant change-in-voltage (ΔV) mode. The SCM can be considered as four components: (1) the AFM, (2) the conducting tip, (3) the capacitance sensor, and (4) the signal detection electronics. The AFM controls and measures the x, y, and z positions of the SCM tip. (See the article on AFMs in this encyclopedia for more details.) The conducting tip is similar to ordinary AFM tips, except that it needs to have high electrical conductivity and must have electrical contact with the capacitance sensor and the sample. The radius of the tip determines the ultimate spatial resolution of the SCM image. Commercially available metal-coated silicon tips or tips made completely from very highly doped (and hence low-resistivity) silicon have been used.

Capacitance Sensor. The capacitance sensor is the key element that enables the scanning capacitance microscope.
Figure 1. Block diagram of the scanning capacitance microscope configured for constant ΔV mode (13): piezoelectric scanner, laser diode and beam-position detector, cantilevered tip on a dielectric-covered semiconductor sample, capacitance sensor, and lock-in amplifier. (Copyright American Institute of Physics 1996, used with permission.)
The archetypical SCM uses a capacitance sensor similar to that used in the RCA VideoDisc player (4,5). Commercial SCMs use sensors that are similar in concept, though unique and proprietary in design. The sensor uses an excited LCR resonant circuit to measure capacitance. The capacitance to be measured is incorporated in series as part of the LCR circuit. The amplitude of the oscillations in this resonant circuit provides a measure of the capacitance. Figure 2 shows a simplified schematic of the capacitance sensor circuitry. The circuit consists of three inductively coupled circuits: an ultrahigh frequency (UHF) oscillator, the central LCR resonant circuit, and a simple peak detection circuit. The UHF (ωhf ≈ 915 MHz for the VideoDisc) oscillator is used to excite the LCR resonator at a constant drive frequency. The oscillator circuit contains a mechanically tunable potentiometer and capacitor to allow some tuning of the UHF signal frequency and amplitude. The magnitude of the UHF signal voltage VHF, which is applied between the tip and the sample, is an important variable for SCM image interpretation. The capacitance CTS between the SCM tip and the sample is incorporated into the central LCR circuit. As shown in Fig. 3, an LCR circuit displays a characteristic bell-shaped response versus frequency; the peak response is at the resonant frequency. When CTS changes, the resonant frequency changes, and the amplitude of the oscillations in the resonant circuit changes. The total
capacitance in the sensor LCR circuit includes the tip–sample capacitance, the stray capacitance between the sample and sensor, and a variable tuning capacitance. In an RCA-style sensor, a varactor diode is used to provide a voltage-variable capacitance. The varactor capacitance can be used to adjust the total capacitance in the LCR circuit, which in turn changes the circuit's resonant frequency. In this way, the sensor can be ''tuned'' to produce a similar response for a range of tip–sample capacitances. A simple video peak detection circuit is used to detect the amplitude of the oscillations in the resonant circuit, thus giving a measure of CTS. The detector circuit produces a dc output voltage that changes in proportion to CTS. Figure 3 shows the effect of introducing a resonant frequency deviation, due to a change in CTS, on the detected sensor output voltage. The rightmost bell-shaped envelope shows the LCR circuit response to changes in drive frequency. The peak response is at the resonant frequency. The leftmost envelope shows the response when the resonant frequency has been altered by 115 kHz due to a change in CTS. At the drive frequency, this causes the amplitude of the oscillations in the resonant circuit to drop by 10 mV. In this example, this change will show up in the detector output voltage as a drop of 10 mV from an unperturbed output voltage of around 2 V.

Summary of MOS Device Physics. To generate image contrast, the SCM exploits the voltage-dependent
Figure 2. Capacitance detection circuit used in the RCA VideoDisc sensor and later modified for use in scanning capacitance microscopy (4): the UHF oscillator drives the LCR loop containing the tip/sample capacitance CTS, and a peak detector demodulates the amplitude-modulated carrier to produce the sensor output. (Copyright American Institute of Physics 1989, used with permission.)
Figure 3. Capacitance sensor LCR circuit response (resonant frequency f0 = 915 MHz, bandwidth BW = 20 MHz) illustrating how the shift in resonant frequency is converted into an output voltage [adapted from (5)].
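The resonance-shift readout of Figs. 2 and 3 can be illustrated with a simple Lorentzian model. The component values below (a 30 nH loop inductance, Q set by the 20 MHz bandwidth, a 2 V peak) are assumptions chosen only to give a resonance near 915 MHz and plausible magnitudes, not values of the actual sensor.

```python
import math

L = 30e-9                                    # assumed loop inductance, H
C0 = 1.0 / ((2 * math.pi * 915e6) ** 2 * L)  # capacitance giving f0 = 915 MHz
Q = 915e6 / 20e6                             # quality factor from 20 MHz bandwidth
VP = 2.0                                     # assumed peak output voltage, V

def amplitude(f_drive, c_total):
    """Lorentzian amplitude of the LCR resonator at the drive frequency."""
    f0 = 1.0 / (2 * math.pi * math.sqrt(L * c_total))
    x = 2 * Q * (f_drive - f0) / f0          # normalized detuning
    return VP / math.sqrt(1 + x * x)

# Drive slightly off resonance, so a shift in f0 changes the amplitude linearly.
f_drive = 915e6 + 10e6                       # sit on the skirt of the resonance
dC = 1e-18                                   # 1 aF change in tip-sample capacitance
dV = amplitude(f_drive, C0 + dC) - amplitude(f_drive, C0)
print(f"C0 ≈ {C0*1e12:.2f} pF, output change ≈ {dV*1e6:.1f} µV per aF")
```

Even an attofarad-scale change in CTS shifts the resonance enough to produce a readily measurable change (tens of µV here) in the detected amplitude, which is the essence of the sensor's sensitivity.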
capacitance between the conducting SCM tip and a semiconductor sample under test. When imaging silicon by SCM, this capacitor is usually considered a metal–oxide–semiconductor (MOS) capacitor, though a metal–semiconductor Schottky contact may also be formed (42). The quality of the SCM measurement on silicon depends on forming a thin oxide and establishing an unpinned semiconductor surface. Because a semiconductor is necessary for generating an SCM signal in contact mode, the following discussion assumes that a semiconductor is present as the sample to be imaged. A MOS capacitor displays a characteristic voltage-dependent capacitance due to the ''field effect.'' The field effect is also the basis of the technologically important metal–oxide–semiconductor field effect transistor (MOSFET). The physics of the field effect and the MOS capacitor are discussed in great detail in many textbooks (75,76). Following is a brief discussion of the MOS capacitance versus voltage (C–V) response in the high-frequency regime that will define the terms and behaviors necessary to describe how the SCM functions. The MOS capacitance changes with voltage because an electric field can attract or repel charge carriers to or from the surface of a semiconductor. A voltage applied across a MOS capacitor induces an electric field at the surface of the semiconductor. Characteristically, the MOS C–V response is divided into three regions. When the voltage on the metal has a sign opposite that of the majority carriers, the electrical field repels minority carriers and attracts additional majority carriers. The silicon surface has ''accumulated'' additional majority charge carriers. In the accumulation region, the capacitance is just the parallel plate oxide capacitance between the metal and semiconductor. When the voltage on the metal has the same sign as that of the majority carriers, the electrical field attracts minority carriers and repels majority carriers.
The silicon surface is then ‘‘depleted’’ of charge carriers. In the depletion region, the capacitance is the oxide capacitance plus a depletion capacitance due to
the field-induced space charge region (or depletion region). When the voltage on the metal is large and has the same sign as the majority carriers, the electrical field can attract sufficient minority carriers to ''invert'' the conduction type at the silicon surface. In the inversion region, the depletion region has reached its maximum extent, and the capacitance is at a minimum. The sign of the voltage necessary to deplete a semiconductor depends on its conduction type. Silicon of n-type, where electrons are the majority carriers, displays accumulation at positive tip voltages and inversion at negative tip voltages. Silicon of p-type, where holes are the majority carriers, displays accumulation at negative tip voltages and inversion at positive tip voltages. The dividing point between accumulation and depletion occurs at the flatband voltage Vfb. Flatband describes a condition of semiconductor band structure where the conduction band, valence band, and Fermi level at the surface are all equal to their bulk values. The ideal value of Vfb depends on the net carrier concentration in the semiconductor and the dielectric constant of the semiconductor. In real systems, oxide surface charges, the dielectric constant of the insulator, and the work function of the gate (or SCM tip) metal cause Vfb to vary from its ideal value. Analysis of the MOS C–V curve reveals a great deal of information about the semiconductor–insulator system. The value of the MOS capacitance depends on the voltage and also on the oxide thickness tox, the oxide dielectric constant εox, the semiconductor dielectric constant εs, and the carrier concentration N in the semiconductor. The maximum capacitance in accumulation is equal to the oxide capacitance Cox. The ratio of the inversion and oxide capacitances and the slope of the capacitance versus voltage in depletion depend on the carrier concentration of the semiconductor.
SCM image contrast is generated from the voltage-dependent capacitance of a MOS capacitor formed by the SCM tip and an oxidized semiconductor. Field of View and Spatial Resolution. Because it is based on an AFM, the SCM has the same field of view as the AFM. For most commercial systems, the field of view can be continuously adjusted down from 100 × 100 µm to less than 1 × 1 µm. The spatial resolution of the SCM is also related to that of the AFM. The AFM system determines the number of points per line and the number of lines per image. However, the tip radius determines the ultimate spatial resolution of SCM. SCM gathers information from a volume beneath the probe tip that is determined by the magnitude of the voltages applied to the SCM. Large applied voltages can cause that volume to expand to many times the tip radius, especially for material that has a low concentration of carriers. It is generally believed that, by controlling the voltages applied to the SCM and by using models in data interpretation, the SCM can measure carrier profiles whose spatial resolution is at least equal to the tip radius (20,30). Current SCM tip technology produces tips whose radii are of the order of 10 nm.
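The doping dependence underlying SCM contrast can be made concrete with the textbook depletion approximation. The formulas are standard for silicon at room temperature; the 3 nm oxide thickness is an assumed, illustrative value.

```python
import math

q = 1.602e-19              # elementary charge, C
kT_q = 0.0259              # thermal voltage at 300 K, V
ni = 1.0e10                # intrinsic carrier density of Si, cm^-3
eps_s = 11.7 * 8.854e-14   # silicon permittivity, F/cm
eps_ox = 3.9 * 8.854e-14   # SiO2 permittivity, F/cm
tox = 3e-7                 # assumed 3 nm oxide, in cm

def mos_summary(N):
    """Maximum depletion width and Cmin/Cox from the depletion approximation."""
    phi_f = kT_q * math.log(N / ni)                  # bulk Fermi potential, V
    w_max = math.sqrt(4 * eps_s * phi_f / (q * N))   # max depletion width, cm
    c_ox = eps_ox / tox                              # oxide capacitance per area
    c_min = 1 / (1 / c_ox + w_max / eps_s)           # series oxide + depletion
    return w_max, c_min / c_ox

for N in (1e15, 1e19):
    w, ratio = mos_summary(N)
    print(f"N = {N:.0e} cm^-3: Wmax = {w*1e7:.0f} nm, Cmin/Cox = {ratio:.2f}")
# lightly doped Si depletes deeply (Wmax ≈ 900 nm, Cmin/Cox ≈ 0.01),
# while heavily doped Si barely depletes (Wmax ≈ 12 nm, Cmin/Cox ≈ 0.43)
```

The large capacitance swing at low doping (and the correspondingly steep C–V slope) is exactly why lightly doped regions give a strong SCM signal and heavily doped regions a weak one.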
Modes of Operation

The SCM detection electronics are used to measure a difference in capacitance, ΔC, between the tip and sample. The SCM is commonly operated to image dopant gradients in two different modes, the constant ΔV mode and the constant ΔC mode.

Constant ΔV (or Open-Loop) Mode. In the constant delta-voltage (ΔV) mode of SCM (8–10), an ac sinusoidal voltage whose magnitude is Vac at a frequency of ωac is applied between the sample and the SCM tip, which is a ''virtual'' ground. In this case, the tip is inductively coupled to ground. The inductance is such that it passes the ωac signal to ground but does not ground the ωhf signal, which is used to detect the capacitance. Vac is typically around 1 V at a frequency of 10–100 kHz. Figure 4 illustrates the voltage dependence of a MOS capacitor and the mechanism of SCM contrast generation. The Vac signal induces a change in the tip-to-sample capacitance, ΔC(ωac), that depends on the slope of the C–V curve between the tip and the semiconductor. When the tip is over a semiconductor that has a low doping concentration, the C–V curve changes rapidly as voltage changes, and Vac produces a relatively large ΔC. When the tip is over a semiconductor that has a high dopant concentration, the C–V curve changes slowly as voltage changes, and Vac produces a relatively small ΔC. When a set value of Vac is applied between the tip and the sample, the capacitance sensor produces a voltage output signal that varies at ωac in proportion to ΔC. In the constant ΔV mode of SCM, the SCM signal is this sensor output voltage. As shown in Fig. 1, the magnitude of the sensor output is detected by using a lock-in amplifier that is referenced to ωac. Because n-type and p-type semiconductors produce C–V characteristics that are mirror images of each other, they produce distinctly different responses to the SCM (17,18).
In general, n-type silicon produces a response that is in phase with the drive signal, and p-type silicon produces a response that is 180° out
Figure 4. Schematic illustration of the way SCM image contrast is generated for two different dopant densities of silicon: capacitance versus tip–sample voltage curves for high and low doping, with the ac modulation Vac marked on the voltage axis.
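The contrast mechanism of Fig. 4 can be sketched with a schematic C–V curve. The tanh shape and all numbers here are arbitrary illustrations, not a physical MOS model; the point is only that a steeper C–V slope yields a larger differential capacitance for the same ac excursion.

```python
import math

def model_cv(v, cox=1.0, cmin=0.4, slope=1.0):
    """Schematic p-type-like high-frequency C-V curve (tanh shape, arbitrary
    units). `slope` stands in for doping: low doping -> steep transition."""
    return cmin + 0.5 * (cox - cmin) * (1 - math.tanh(slope * v))

def delta_c(vdc, vac, slope):
    # differential capacitance induced by the ac modulation about Vdc
    return model_cv(vdc - vac / 2, slope=slope) - model_cv(vdc + vac / 2, slope=slope)

vac = 0.5
print(delta_c(0.0, vac, slope=4.0))   # ≈ 0.46 (a.u.): steep C-V, large signal
print(delta_c(0.0, vac, slope=0.5))   # ≈ 0.07 (a.u.): shallow C-V, small signal
```

Sweeping `vdc` in this toy model also reproduces the qualitative peaking of the SCM response near flatband described in the list above.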
of phase with the drive signal. When using a lock-in amplifier in the x/y mode, n-type silicon produces a positive signal, and p-type silicon produces a negative signal. The capacitance sensor output provides a relative measure of differential capacitance — larger values of C produce larger sensor output voltages. Because the tip shape and the tuning of the capacitance sensor are variable, the absolute value of capacitance (in farads) measured by the SCM is difficult to determine. When measuring an unknown, the absolute capacitance is determined by comparison to measurements of a reference whose capacitance can be calculated from its known properties. Several operational factors strongly influence the SCM signal: 1. ac bias voltage Vac . The ac voltage determines how much of the C–V curve is used to generate the differential capacitance. Larger values of Vac generate larger values of C. The larger change in capacitance is due to depletion of carriers from a larger volume of the sample. This results in lower spatial resolution when the SCM is used to measure quantitative carrier profiles. For quantitative measurements, a low value of Vac is used, typically 1 V or less. 2. dc bias voltage Vdc . The dc voltage Vdc applied between the sample and the tip determines where C is measured on the C–V curve. The effect of Vdc is relative to the flatband voltage of the MOS capacitor formed by the tip and the sample. Vfb will vary from sample to sample. The peak SCM response occurs when Vdc ≈ Vfb (24). In a sample that contains a dopant gradient, Vfb varies with dopant concentration. In practice, Vdc is usually set equal to the voltage that produces the maximum SCM response in the region that has the lowest doping concentration. 3. Sensor high-frequency voltage Vhf . The magnitude of the high-frequency voltage Vhf in the LCR loop of the capacitance sensor is proportional to the magnitude of the output signal. 
Increasing Vhf increases the capacitance sensor output and the signal-to-noise ratio of the capacitance measurement. However, as with Vac, the measured SCM signal is an average over the voltages spanned by Vhf. Large values of Vhf generate large signals at the cost of spatial resolution.

4. Sensor-to-sample coupling. The capacitance sensor forms a grounded LCR loop that includes the tip–sample capacitance. The capacitance between the sample and the grounded outer shield of the capacitance sensor completes the loop. Subtle variations in the sample geometry can influence this coupling capacitance. A signature of large coupling is the generation of a nonzero SCM signal when the probe is over a region, such as a good insulator, where no SCM signal should be expected.
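The interplay of factors 1 and 2 can be sketched with a toy model: treat the tip–sample C–V curve as a smooth step between accumulation and depletion and compute the capacitance swing produced by the ac bias. Everything here (the logistic curve shape, the function names, and all numbers) is invented for illustration; it is not a model of a real device.

```python
import math

def mos_capacitance(v, vfb=0.5, c_max=1.0, c_min=0.2, width=0.8):
    """Toy MOS C-V curve: a smooth step from accumulation (c_max)
    to depletion (c_min) centered near the flatband voltage vfb.
    All quantities are in arbitrary units."""
    return c_min + (c_max - c_min) / (1.0 + math.exp((v - vfb) / width))

def delta_c(vdc, vac):
    """Differential capacitance sensed by the SCM: the change in C as
    the ac bias swings the tip voltage by +/- vac/2 about vdc."""
    return mos_capacitance(vdc - vac / 2) - mos_capacitance(vdc + vac / 2)

# Factor 1: a larger ac bias sweeps more of the C-V curve, so delta C
# grows (in a real measurement, at the cost of spatial resolution).
print(delta_c(vdc=0.5, vac=0.5) < delta_c(vdc=0.5, vac=2.0))  # True

# Factor 2: the response peaks when Vdc is near the flatband voltage.
responses = {vdc: delta_c(vdc, vac=1.0) for vdc in (-2.0, 0.5, 3.0)}
print(max(responses, key=responses.get))  # 0.5, i.e., Vdc ~= Vfb
```

Even this crude sketch reproduces the two qualitative behaviors described above: ΔC grows with Vac, and the response is maximized when the dc bias sits near the flatband voltage.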
CAPACITIVE PROBE MICROSCOPY
Constant ΔC (or Closed-Loop) Mode. For carrier profiling applications, the SCM is also commonly operated in a constant ΔC mode to better control the volume depleted of carriers. A feedback loop (hence, closed-loop mode) is added to the SCM to adjust the magnitude of Vac automatically to keep ΔC constant in response to changes in dopant concentration as the SCM tip is scanned over a dopant gradient (18,27,29). The lower the doping beneath the tip, the smaller the Vac needed to induce the same ΔC. Figure 5 shows a block diagram of a constant ΔC SCM feedback loop. Because ΔC remains constant, the volume depleted of carriers, and therefore the spatial resolution of the image, is less dependent on the dopant concentration in the semiconductor. In the constant ΔC mode of SCM, the SCM signal is the value of Vac that the feedback loop establishes to maintain a constant differential capacitance. To achieve optimum control of the volume of the depletion layer in the constant ΔC mode, the dc bias must be equal to the flatband voltage of the tip–sample MOS capacitor (17). The feedback loop of the constant ΔC mode produces a signal that changes monotonically with carrier concentration only for dopant gradients in like-type semiconductors. When a p–n junction is present, the change in sign of the SCM signal when transiting between n-type and p-type material causes the feedback loop to saturate near the junction.

Capacitance–Voltage Curve Measurement. The measurement of capacitance versus dc voltage (i.e., the C–V curve) is the usual method of characterizing any MOS capacitor, and this is just as true for the MOS capacitor formed by the SCM tip and an oxidized semiconductor. Measurement of the SCM tip–sample C–V characteristics reveals the suitability of the sample preparation for quantitative carrier profiling. SCM tip–sample C–V curves can be measured by a boxcar averager (27) or a digital oscilloscope operated in the x, y mode.
An ac voltage at around 1 kHz that spans the range of the desired voltage of the C–V curve is applied between the tip and
the sample. The ac voltage is displayed on the x channel, and the capacitance sensor output is displayed on the y channel. The averaging feature of the digital oscilloscope is used to improve the signal-to-noise ratio. To be suitable for quantitative carrier profiling, the C–V curve must display clear accumulation and depletion regions, little hysteresis between the forward and reverse sweeps, and no increase in capacitance in the depletion region. The C–V curve can also be measured directly by the SCM (25). In this case, an SCM image is acquired while slowly changing the value of Vdc, so that the desired range of the C–V curve is swept out once in the time it takes to acquire the image. Sections of such an image taken parallel to the slow scan direction reveal the C–V curve. A high-quality surface should yield a single peak (positive for n-type and negative for p-type) at the flatband voltage.

Scanning Capacitance Spectroscopy. The SCM technique has been extended into spectroscopy by measuring capacitance at multiple values of Vdc at each point of an image (46,47). Multiple points of the tip–sample C–V curve are measured by changing the applied dc bias voltage between the tip and the sample on successive scan lines. The measured C–V curves display characteristically different behavior for n-type, p-type, and intermediately doped regions: n-type material has a C–V curve of positive slope, p-type material has a C–V curve of negative slope, and intermediate regions display a U-shaped C–V curve. The p–n junction location can be estimated as the point where the C–V curve has a symmetrical U shape (46).

Intermittent-Contact SCM. The SCM can also be operated in intermittent-contact (or tapping) mode (20,21). In this case, a differential capacitance is generated by the change in the tip–sample distance due to the vibrating tip. Figure 6 shows a block diagram of the IC-SCM.
The capacitance sensor output is measured by a lock-in amplifier using the tip vibration frequency as the reference.
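That demodulation step can be sketched as a textbook dual-phase lock-in: multiply the sensor output by in-phase and quadrature references at the tip vibration frequency and average. The snippet below is a minimal, noise-free simulation with invented numbers; it is not a model of any particular instrument.

```python
import math

# Toy IC-SCM signal chain: the vibrating tip modulates the tip-sample
# capacitance at the vibration frequency f, and the lock-in recovers the
# modulation amplitude from the sensor output.
f = 50e3            # assumed tip vibration frequency (Hz)
fs = 10e6           # sample rate (Hz)
n = 20000           # 2 ms of data = an integer number of cycles of f
amp = 0.37          # capacitance modulation amplitude (arbitrary units)

t = [i / fs for i in range(n)]
sensor = [amp * math.cos(2 * math.pi * f * ti) for ti in t]

# Dual-phase lock-in: in-phase (X) and quadrature (Y) products, averaged.
x = 2 * sum(s * math.cos(2 * math.pi * f * ti) for s, ti in zip(sensor, t)) / n
y = 2 * sum(s * math.sin(2 * math.pi * f * ti) for s, ti in zip(sensor, t)) / n
r = math.hypot(x, y)
print(round(r, 3))  # recovers ~0.37, the modulation amplitude
```

Averaging over an integer number of vibration cycles rejects everything except the component at the reference frequency, which is why the lock-in can pull the small capacitance modulation out of the sensor output.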
Figure 5. Feedback loop used to implement the constant C mode of the scanning capacitance microscope (16). (Copyright Elsevier Science 1997, used with permission.)
Figure 6. Block diagram of the intermittent-contact mode scanning capacitance microscope (20). (Copyright American Institute of Physics 1998, used with permission.)
No modulation of a depletion region in a semiconductor is required in the IC-SCM mode to generate a differential capacitance signal. Because all materials display a tip–sample spacing-dependent capacitance, the IC-SCM extends the technique to measurements on nonsemiconductors. IC-SCM can detect metal lines beneath an insulating layer, even if the insulating layer has been planarized to remove any indication of the metal lines in the surface topography. This may have applications in lithographic overlay metrology. IC-SCM can also detect variations in the dielectric constant of thin films; the difference between SiO2 (ε = 3.9) and Si3N4 (ε = 7.5) is clearly resolvable. This capability may have applications in evaluating deposited dielectric films used in the semiconductor industry, such as candidate alternative gate dielectrics and dielectrics for capacitors in memory circuits. The IC-SCM's ability to measure dielectric constant at high spatial resolution may also have applications in library evaluation in combinatorial methods.

APPLICATIONS OF SCM

This section discusses several applications of SCM, except for quantitative carrier profiling by SCM, which is discussed separately in the next section. The most common application of SCM today is qualitative imaging and inspection of fabricated doped structures in silicon. SCM images of a matched pair of 0.35-µm channel NMOS and PMOS silicon transistors are shown in Figs. 7 and 8 (17,18). Other device structures, such as tungsten plug silicide contacts to source/drain regions and the trench sidewall implants for DRAM
Figure 7. SCM image of a cross-sectioned NMOS silicon transistor (17); the scale bar is 200 nm. The lines show the approximate location of the silicon surface, polysilicon gate, and p–n junction. (Copyright Electrochemical Society 1997, used with permission.) See color insert.
Figure 8. SCM image of a cross-sectioned PMOS silicon transistor (19); the scale bar is 200 nm. The lines show the approximate location of the silicon surface, polysilicon gate, and p–n junction. (Copyright American Institute of Physics 1998, used with permission.) See color insert.
trench capacitor processes have also been imaged by SCM. SCM images are used to detect defects that negatively impact device performance, to evaluate the uniformity of implants, and to determine information as basic as whether an implant step used the right type of dopant, or even whether it was conducted at all. This type of information is not as easily available from other techniques, making SCM an increasingly used method for failure analysis.

SCM is also used to determine the location of the electrical p–n junction in images of cross-sectioned silicon devices, which allows measurement of the basic transistor properties of p–n junction depth and MOSFET channel length. The apparent p–n junction location in an SCM image is where the signal changes sign from positive (for n-type) to negative (for p-type). The apparent junction location is a function of both Vdc and Vac (22,51,52). By varying Vdc, it is possible to move the apparent junction location from one side of the built-in depletion region of the junction to the other. The apparent junction location coincides approximately with the actual electrical junction when Vdc is midway between the flatband voltage of the n-type silicon and the flatband voltage of the p-type silicon (22,51,52). Junction location can also be determined from scanning capacitance spectroscopy images.

A variety of other semiconductors have also been imaged by SCM for the same general goals as for silicon. Images of InP-based laser structures (63–65), SiC, and AlGaN/GaN (41) have recently been published. SCM imaging of InP laser structures has been used to evaluate regrown material quality, interface locations, and the uniformity of doped structures formed by LPE and MOVPE. An SCM image of an InP/InGaAsP buried heterostructure laser is shown in Fig. 9. The active region
Figure 9. SCM image 4 × 8 µm of a cross-sectioned InP/InGaAsP buried heterostructure laser. (Image/photo taken with NanoScope SPM, Digital Instruments, Veeco Metrology Group, Santa Barbara, CA, courtesy Andy Erickson. Copyright Digital Instruments Inc. 1995–98, used with permission.) See color insert.
of the laser is in the center of the image; if lasing, the emitted light would be directed out of the page.

SCMs have been used to study the charge trapping and detrapping dynamics of silicon dioxide or silicon nitride insulating layers on silicon (39,44,53,55–58,60,61,66). A voltage pulse through an SCM tip can be used to inject charge into a nitride–oxide–silicon (NOS) structure, where it becomes trapped. The trapped charge induces a depletion region in the silicon whose capacitance can be detected by SCM. A reverse bias pulse can remove the stored charge. Much of this work has been conducted to determine the potential for using charge trapping and detection with capacitive probes as a nonvolatile semiconductor memory. Because the small tip size enables high packing densities of information, it has been hypothesized that such a technique may be able to store large amounts of data in a small area.

TWO-DIMENSIONAL CARRIER PROFILING OF SILICON TRANSISTORS

Two-dimensional carrier profiling is the measurement of the local carrier concentration (in cm−3) as a function of the x and y positions across a semiconductor surface. When referring to 2-D carrier profiling of MOSFETs, this 2-D surface is assumed to be a cross section of the device parallel to the direction of source-to-drain current flow. Such a cross section reveals the structure of the source and drain regions and allows visualization of the source-to-drain spacing and junction depths. A one-dimensional profile traditionally is the distribution of carriers as a function of depth from the surface of the silicon wafer. The regions of most interest are those where high concentrations of dopants have been introduced into the semiconductor by ion implantation or diffusion to form regions of high conductivity.
In particular, the dopant concentration as a function of position (or the dopant profile) of the source and drain regions of a MOSFET in the vicinity of the channel largely determines the electrical characteristics of the device and is of much technological interest. For the sake of clarity, keep in mind that dopants are the chemical impurities introduced to change the electrical conductivity of a semiconductor, and ‘‘carriers’’
are the electrically active charge carriers. If incorporated into an electrically active site of the crystal lattice, each dopant can provide a charge carrier. Because the internal electric field from steep dopant gradients can cause redistribution of the charge carriers, the dopant profile and the carrier profile are not necessarily the same. SCM is sensitive to the electrically active charge carrier profile.

Measuring 2-D carrier profiles in silicon by SCM requires three separate procedures: sample preparation, SCM image acquisition, and carrier profile extraction. For silicon, sample preparation must expose the region of interest, produce a smooth surface whose residual scratches are less than a nanometer deep, and form a thin oxide or passivation layer. To extract accurate quantitative carrier profiles, SCM images must be acquired under highly controlled and specified conditions. The SCM model parameters must be measured, and data from known reference samples must be included. Once a controlled SCM image has been acquired, the 2-D carrier profile can be extracted with reference to a physical model of the SCM. Forward modeling is the calculation of an SCM signal from a known carrier profile whose measurement and sample model parameters are known. Several forward models of varying complexity and accuracy have been developed. Reverse modeling is the rather more complex problem of determining an unknown carrier profile from the measured SCM signal and the known model parameters.

SCM Measurement Procedures for Carrier Profiling

Sample Preparation. SCM cross-sectional sample preparation is derived from techniques developed for scanning electron microscopy (SEM) and transmission electron microscopy (TEM). A detailed common procedure is described in (77). Sample preparation usually involves four steps: (1) cross-sectional exposure, (2) back side metallization, (3) mechanical polishing, and (4) insulating layer formation.
Before cross-sectioning, the region of interest is usually glued face-to-face to another piece of silicon. This cover prevents rounding and chipping of the surface of interest during mechanical polishing.

N > 3 is required in many applications. Tricolor RGB imaging equipment is manufactured in huge quantities at relatively low cost to serve the imaging needs of humans. Thus, the availability of equipment alone dictates that one at least consider tricolor RGB for any image processing application that cannot be performed with monochrome equipment. Three spectral channels are relatively convenient to accommodate in an imaging problem, although spectral information is thereby quantized rather crudely. In many image analysis applications, it may be unimportant to match the spectral characteristics of the human eye. For example, if one wishes to classify different types of terrain and vegetation in color aerial photographs, the response of the human eye is irrelevant. On the other hand, if one wishes to display on a CRT monitor an image that looks like the original scene, then the perceptual characteristics of the eye are of central importance. Similarly, if one wishes to display
COLOR IMAGE PROCESSING
an image so that it appears exactly as it would look when printed by a certain printing process, then both human vision and the printing process must be modeled accurately. In the remainder of this section, we restrict the discussion exclusively to the three-channel RGB case.

COLOR IMAGE PROCESSING AND ANALYSIS

There are two distinct purposes for which one might use digital color imaging. One is to produce processed color images for human consumption; the other is to analyze color images toward some quantitative goal. These are direct generalizations of monochrome digital image processing and analysis, respectively. The challenge in color image processing is to record an image of a scene using a tricolor system and then use a tricolor display system (e.g., a CRT monitor) to present the image so that it has the same colors as the original scene. Normally, the displayed image will not have the same spectral content as the original scene because the phosphors used for display will not match the spectral characteristics of the objects in the original scene. But if the job is well done, the colors in the displayed image will be perceived to match those in the scene. Here, an accurate model of the human visual system is required; otherwise, even before any processing, the image will not look natural when displayed. In color image analysis, on the other hand, one seeks to analyze the content of a color image. An example might be to estimate the amount of forest, lake, and cultivated farmland in an aerial color photograph. The result of such an analysis may never be displayed as a natural-looking color image. Here, the details of human visual perception are much less important.

International Standards

A large body of color vision research has been done in connection with the international standardization of television transmission technology.
Various standards have been established by the Commission Internationale de l'Eclairage (CIE), an international standards committee for light and color, and by the National Television Standards Committee (NTSC), the Society of Motion Picture and Television Engineers (SMPTE), and the International Telecommunication Union (ITU). Many of these standards are based on data from experiments conducted on human observers.

Definitions

The field of color science has developed a set of strict definitions of the terms used to describe color, particularly in documents published by the organizations mentioned. The same terms, however, are used more freely in digital image processing. Because the focus here is on digital processing of color images, we do not hold rigidly to the terminology of color science. In a color matching experiment, a human observer adjusts the intensity of three primary colors (red, green,
and blue) until it is perceived that their mixture matches a given color. The intensities of the three primaries, relative to those required to match a reference white, form the set of three tristimulus values that specify the color. A color matching experiment requires prior selection of three primary colors and a reference white that corresponds to the chosen illumination source. Additive color matching uses superimposed spots of colored light, whereas subtractive color matching works with stacks of colored filters. In this section, we are concerned only with additive color matching. The tristimulus values derived from a color matching experiment are often conceived of as specifying a location in a three-dimensional color space.

Brightness refers to how strongly a lighted or light-emitting object stimulates the visual system. Scaled between ''bright'' and ''dim,'' it is the perception of how much light is being radiated. Related terms are intensity, luminance, luma, lightness, and value. Intensity is the total emitted radiant power per unit solid angle. It can be obtained by integrating the SPD of the source across the wavelength spectrum. Luminance, as defined by the CIE, is the emitted radiant power weighted by the spectral sensitivity function of human vision. This latter curve is called the ''luminous efficiency of the standard observer.'' Figure 2 shows the standard scotopic (low light level) luminosity function (2). Based on measurements by Wald (3,4) and by Crawford (5), it was adopted by the CIE in 1951 (6). It peaks at about 507 nm and agrees roughly with the absorbance curve of the rod photopigment in Fig. 1. One can integrate the product of the luminosity function (Fig. 2) and an SPD to obtain a numerical luminance value, usually denoted Y. This quantifies the perceptual strength of a color. One can also calculate the luminance of a color as a weighted sum of its RGB tristimulus values, as we shall see later.
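The integration just described can be sketched numerically. The table below is a deliberately coarse, rounded sampling of the CIE photopic luminosity curve (note that Fig. 2 shows the scotopic curve; serious work would use the official 1-nm CIE tables and proper radiometric units), so the numbers are illustrative only.

```python
# Coarse, rounded sampling of the CIE photopic luminosity function V(lambda).
V = {400: 0.0004, 450: 0.038, 500: 0.323, 550: 0.995,
     600: 0.631, 650: 0.107, 700: 0.0041}

def luminance(spd, step=50):
    """Rectangular-rule approximation of Y = integral of SPD(lambda)*V(lambda)."""
    return sum(spd(lam) * v for lam, v in V.items()) * step

# Two sources of equal radiant power, one green and one red:
green_line = lambda lam: 1.0 if lam == 550 else 0.0
red_line = lambda lam: 1.0 if lam == 650 else 0.0

# The green source yields a far larger Y: the eye is most sensitive
# near the middle of the visible spectrum.
print(luminance(green_line), luminance(red_line))
```

The same machinery applied to an arbitrary SPD gives the numerical luminance Y referred to in the text.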
The human eye has a nonlinear response to light intensity. One must reduce the intensity of a light source to about 18% for it to appear half as bright. Lightness models this behavior and measures brightness relative to a reference white. Usually denoted L∗ , it is proportional to
Figure 2. The luminosity function of human vision. This curve shows the sensitivity of the human eye to light, as a function of wavelength, normalized to unity at the peak.
the cube root of the luminance ratio of the test source over the reference white.

Luma, usually denoted Y′, is computed from RGB tristimulus values in a manner similar to luminance, except that nonlinear (gamma-corrected) values of R, G, and B are used. It is similar to lightness, except that the nonlinear transformation is done before averaging the RGB components. Value is a scale of ten perceptually equal steps of brightness developed by Munsell (see later).

The hue of a color refers to the spectral color (e.g., a color of the rainbow) to which it is perceptually closest. The scale of hues is circular and runs first through the rainbow from red through orange, yellow, green, and cyan to blue, then through the nonspectral (magenta or purple) colors and back to red. The hue of a light source usually corresponds to the wavelength at which its SPD peaks.

Saturation is a measure of how strongly colored (colorful) a light source is. Vividly colored lights, like the spectral colors, are highly saturated. Pastel colors have only moderate saturation. Neutral colors (grays) have zero saturation. The related term chroma is sometimes used. The concept of saturation can be illustrated as follows. A bucket of bright red paint would correspond to a hue of 0° and a saturation of 1. Mixing in white paint makes the redness less vivid and reduces its saturation, but does not change its hue. Pink corresponds to a saturation of 0.5 or so. As more white is added to the mixture, the red becomes paler, and the saturation decreases, eventually approaching zero (white). On the other hand, if one mixed black paint with the bright red, its intensity would decrease (toward black), while its hue (red) and saturation (1.0) would remain constant.

Chrominance refers to the colorfulness of a source, independent of intensity. Hue and saturation together make up chrominance.
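These definitions map directly onto the hue–saturation–value conversion in Python's standard colorsys module, which can be used to check the red/pink example above (colorsys returns hue as a fraction of a full turn, with 0.0 corresponding to red):

```python
import colorsys

# Vivid red: hue 0 (red), fully saturated, full value.
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(h, s, v)  # 0.0 1.0 1.0

# Mixing in "white" (raising G and B equally) pales the red toward pink:
# the hue stays at red while the saturation drops to 0.5.
h, s, v = colorsys.rgb_to_hsv(1.0, 0.5, 0.5)
print(h, s, v)  # 0.0 0.5 1.0
```

As the text describes, adding white leaves the hue unchanged and lowers only the saturation.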
COLORIMETRY

The science of quantifying perceived color is called colorimetry. Here, one seeks to specify a set of tristimulus values that correspond uniquely to a particular perceived color. Before this can be done, however, one must establish the spectral characteristics of the three primary colors and of the illumination, which is called the reference white. Then, color matching is used to determine the amount of each primary color that, when mixed together, will reproduce a color sensation indistinguishable from that of the test color. Now, we address ways one can quantitatively specify a color, such as that of a pixel in a color digital image.

Illuminants

Blackbody Radiators. A dense incandescent material radiates electromagnetic energy. As its temperature changes, so do the intensity and spectral characteristics of the emitted energy. At high enough temperatures, the radiation contains visible light. The tungsten filament of a light bulb, for example, is a blackbody radiator. The spectral power distribution of a blackbody radiator is given by Planck's law,

E(λ) = c1 λ^(−5) [e^(c2/λT) − 1]^(−1),   (3)

where T is temperature in degrees Kelvin and E is energy in ergs per second per nanometer of bandwidth per square centimeter of radiating surface. The constants are c1 = 3.7415 × 10^23 and c2 = 1.4388 × 10^7 (7). Materials at about 5000 K emit light whose SPD is reasonably flat across the visible spectrum. Cooler objects emit reddish light, and hotter objects appear bluish. Figure 3 shows the spectral energy distribution of black bodies at three different temperatures (3200, 5500, and 6500 K). Although warmer bodies emit more power per unit surface area, the plots in Fig. 3 are normalized so that they are equal at 550 nm.

Color Temperature. Any reasonably white light source can be specified by its ''color temperature,'' that is, the blackbody temperature to which it is perceptually closest. Ordinary tungsten-filament lamps have a color temperature of about 2400 K. Those designed for photography operate at color temperatures up to about 3200 K. Sunlight on a clear day has a color temperature of about 5000 K. Blue sky has a color temperature of about 20,000 K. Color temperatures between 5000 and 5500 K appear white to most people. Lower color temperatures appear yellowish, and higher temperatures tend to look bluish, although the eye tends to adapt so as to perceive the ambient illumination as white.

CIE Standard Illuminants. In 1931, the CIE established definitions of, and published SPDs for, standard illuminants that correspond to various artificial and sunlight sources.
Figure 3. Spectral energy distribution of incandescent black bodies at three temperatures. Radiant energy per unit of radiating surface area, per unit of wavelength, normalized to unity at 550 nm.
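A short numerical sketch of Eq. (3), using the constants quoted above (with wavelength in nanometers), reproduces the qualitative behavior shown in Fig. 3: hotter bodies are relatively stronger in the blue.

```python
import math

# Planck's law with wavelength in nanometers, per Eq. (3):
# E(lambda) = c1 * lambda**-5 / (exp(c2 / (lambda * T)) - 1)
c1 = 3.7415e23
c2 = 1.4388e7

def planck(lam_nm, T):
    return c1 * lam_nm**-5 / (math.exp(c2 / (lam_nm * T)) - 1.0)

def normalized(lam_nm, T, ref_nm=550):
    """SPD normalized to unity at 550 nm, as in Fig. 3."""
    return planck(lam_nm, T) / planck(ref_nm, T)

# The blue/red (450 nm / 650 nm) ratio grows with temperature:
# reddish output at 3200 K, bluish output at 6500 K.
for T in (3200, 5500, 6500):
    print(T, round(normalized(450, T) / normalized(650, T), 2))
```

The same function evaluated at 2856 K gives the SPD of CIE Illuminant A discussed below.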
Figure 4. SPDs for four CIE standard illuminants.
CIE Illuminant A is intended to represent tungsten light. It is a blackbody radiator whose SPD is calculated from Eq. (3) using a color temperature of 2856 K. CIE Illuminant B is intended to represent direct sunlight. It has a color temperature of approximately 4874 K. CIE Illuminant C is intended to represent average daylight. It has a color temperature of approximately 6774 K. CIE Illuminant D65 is now regarded as the main standard for daylight illumination. It has a color temperature of approximately 6504 K. SPDs for these four illuminants, plotted in Fig. 4, are scaled so that they are equal at 550 nm.

Primaries
Figure 5. Color matching experiment configuration. The intensities of the overlapping primaries are adjusted to match first the reference white, then the test color.
A color matching experiment requires selecting red, green, and blue primaries, which are then mixed in proper proportions to match a specific color. Thus, the tristimulus values obtained depend on the specific choice of primaries.

Monochromatic Primaries. In 1931, the CIE adopted a set of three monochromatic light sources as primary illuminants for use in color matching experiments (8). Their wavelengths were red: 700 nm, green: 546.1 nm, and blue: 435.8 nm.

Phosphor Primaries. Color images are often displayed on CRT tubes that generate a mixture of light from red, green, and blue phosphors, so it is natural to use the SPDs of those phosphors as primaries in color matching experiments. The NTSC primaries of 1953 were standardized in ITU Recommendation BT.601 (9). The luminance resulting from these primaries is given by

Y601 = 0.299 RN + 0.587 GN + 0.114 BN.   (4)

Primaries consistent with contemporary CRT phosphors were standardized in ITU-R Recommendation BT.709 (10). The luminance resulting from these primaries is given by

Y709 = 0.2125 RN + 0.7154 GN + 0.0721 BN.   (5)

Color Matching Experiments. In a color matching experiment, a spot of light of an arbitrary test color is projected on a uniformly diffusing (white) surface, that is, one that reflects all wavelengths equally in all directions. A spot of reference white light is also projected on the same surface (Fig. 5). Finally, three spots of primary-colored light are superimposed on the same surface. If A1(C), A2(C), and A3(C) are the primary intensities required to make the color of the overlapped spot match that of the test color and A1(W), A2(W), and A3(W) are the primary intensities required to match the reference white, then the tristimulus values that specify the test color are

T1(C) = A1(C)/A1(W),   T2(C) = A2(C)/A2(W),   and   T3(C) = A3(C)/A3(W).   (6)

It may not be possible to match the test color with any mixture of the three primary colors. In that case, one matches a mixture of two of the primaries to a mixture of the test color and the third primary. Again, all three primary intensities are adjusted to produce a perceptual match. In this case, the tristimulus values are

T1(C) = −A1(C)/A1(W),   T2(C) = A2(C)/A2(W),   and   T3(C) = A3(C)/A3(W),   (7)
where the first primary is that mixed with the test color. If this cannot produce a match, even after all three primaries have been used with the test color, one attempts to match one primary with a mixture of the test color and the other
two primaries. In this case,

T1(C) = −A1(C)/A1(W),   T2(C) = −A2(C)/A2(W),   and   T3(C) = A3(C)/A3(W),   (8)
where the first and second primaries are mixed with the test color. Notice that the latter two cases produce negative tristimulus values for colors that cannot be matched by a mixture of the three primaries.

The Laws of Color Matching

Grassman (11) has advanced a set of axioms that describe color matching phenomena. These lead to eight principles of color matching:

1. Any color can be matched by a properly balanced mixture of no more than three colored lights (primaries), provided that none of the primaries can be matched by a mixture of the other two.
2. Color matches are relatively independent of intensity level.
3. The human eye is unable to decompose a mixture into its primary components.
4. The luminance of a mixture is equal to the sum of the luminances of the components.
5. If color A matches color B and color C matches color D, then when color A is mixed with color C, it will match the mixture of color B with color D.
6. If the mixture of color A with color B matches the mixture of color C with color D and color A matches color C, then color B matches color D.
7. If color A matches color B and color B matches color C, then color A matches color C.
8. Either color A can be matched by a mixture of colors B, C, and D; a mixture of colors A and B can be matched by a mixture of colors C and D; or a mixture of colors A, B, and C can be matched by color D.

COLOR COORDINATE SYSTEMS

Several systems have been developed for specifying color. They derive mainly from the large body of work that has been done on the international standardization of television transmission methodology by standards organizations. A few of the more important color systems are listed here.

RGB Color Spaces

The most straightforward way to specify color is to use red, green, and blue tristimulus values, scaled between, for example, zero and one. Each pixel can be represented
Figure 6. Rectangular RGB color space.
by a point in the first quadrant of three-space, as shown by the color cube in Fig. 6. The origin of the RGB color space represents zero intensity of each of the primary colors and thus corresponds to the color black. Full intensity of all three primaries together appears as white at the opposite corner of the cube. Equal amounts of the three color components at lesser intensity produce a shade of gray. The locus of all such points falls on the diagonal of the color cube, called the gray line. Three of the corners of the color cube correspond to the primary colors: red, green, and blue. The remaining three corners correspond to the secondary colors: yellow, cyan (blue-green), and magenta (purple). Each of the secondaries is a combination of two primary colors. The gray-level histogram of an RGB image is a scatter of points in three-dimensional RGB space. Figure 7 shows a color image and its R, G, and B components.

The Rc Gc Bc Coordinate System

In 1931, the CIE developed a color coordinate system (8) based on color matching experiment data obtained by Guild (12) and by Wright (13). The CIE 1931 system forms the basis for much of modern colorimetry. It uses the three 1931 CIE monochromatic primaries mentioned before (14,15). The reference white is an illuminant that has constant energy throughout the visible spectrum. Figure 8 shows the tristimulus values required to match the spectral (monochromatic) colors (2,16). These curves were determined experimentally using several normal human subjects, and they represent the perception of the CIE Standard Observer. Note that, for this set of primaries, certain of the tristimulus values are negative at some wavelengths.

The RN GN BN and RS GS BS Coordinate Systems

The NTSC developed the RN GN BN colorimetry system using as primaries three phosphors that were typical of those used in commercial color television in 1953 (17). This was codified in ITU Recommendation BT.601 (9).
The reference white is CIE source C, which approximates the light from an overcast sky (18). The NTSC tristimulus values for a particular color can be obtained from the CIE
COLOR IMAGE PROCESSING
Figure 7. Linear color space: Upper left: color image; upper right: red; Lower left: green; Lower right: blue component. See color insert.
values by the linear transformation

\[ \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix} = \begin{bmatrix} 0.842 & 0.156 & 0.091 \\ -0.129 & 1.320 & -0.203 \\ 0.008 & -0.069 & 0.897 \end{bmatrix} \begin{bmatrix} R_C \\ G_C \\ B_C \end{bmatrix}. \quad (9) \]
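In code, Eq. (9) — like the other colorimetric conversions in this section — is simply a 3 × 3 matrix applied to each pixel's color vector. A minimal sketch (the function name is ours; the matrix entries are transcribed from Eq. (9)):

```python
# Eq. (9): CIE R_C G_C B_C -> NTSC R_N G_N B_N.
EQ9 = [
    [0.842, 0.156, 0.091],
    [-0.129, 1.320, -0.203],
    [0.008, -0.069, 0.897],
]

def apply_matrix(m, color):
    """Multiply a 3x3 color transformation matrix by a 3-component color vector."""
    return [sum(m[i][j] * color[j] for j in range(3)) for i in range(3)]
```

Feeding in a pure CIE primary simply picks out one column of the matrix; applying the transform to a whole image means applying it at every pixel.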
Later the SMPTE developed the RS GS BS system that better matches CRT phosphors in use today (19). It is codified in ITU Recommendation BT.709 (10). These two RGB systems are linearly related by

\[ \begin{bmatrix} R_S \\ G_S \\ B_S \end{bmatrix} = \begin{bmatrix} 1.609 & -0.447 & -0.104 \\ -0.058 & 0.977 & 0.051 \\ -0.025 & -0.037 & 1.162 \end{bmatrix} \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix}. \quad (10) \]

Color Difference Systems

For color television transmission to remain compatible with existing black-and-white receivers, it was necessary to establish color standards that (1) continue to transmit the luminance signal as before and (2) encode the color information so that it would not affect existing monochrome receivers. The RGB signals were combined to form the luminance signal [e.g., as in Eq. (4)], and two new signals were designed to carry the color information. These were color difference signals (B–Y and R–Y) from which R, G, and B could be recovered by suitable processing.

The YUV and YIQ Coordinate Systems

The PAL and SECAM color television broadcast systems used in many countries (20) transmit three color component signals called Y, U, and V. Y is the luminance signal, again formed by summing the RN, GN, and BN values. The chrominance (color difference) signals U and V that contain the color information are ignored by black-and-white television receivers. The YUV coordinates of a color can be obtained from its NTSC components by the linear relationship

\[ \begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.148 & -0.289 & 0.437 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix}. \quad (11) \]
The NTSC color broadcasting system used in the United States employs three color component signals called Y, I, and Q. Y is again the luminance signal, but I and Q are chrominance signals obtained by rotating the U and V signals by 33°. They are given by

\[ I = -U\sin(33°) + V\cos(33°) \quad \text{and} \quad Q = U\cos(33°) + V\sin(33°). \quad (12) \]
The I and Q signals can be subjected to more bandwidth reduction than the U and V signals without visibly degrading the transmitted image (18). The YIQ signals can be obtained directly from the NTSC components by the linear relationship

\[ \begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix} \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix}. \quad (13) \]
Figure 9 shows an image and its Y, I, and Q components.

Figure 8. CIE 1931 tristimulus values. These curves show how much of each primary is required to match monochromatic colors of different wavelengths.

The XYZ Coordinate System

The negative-going tristimulus values of the 1931 CIE system (Fig. 8) create problems in television transmission, where it is more convenient to encode positive signals. The CIE developed a color system in which the tristimulus values (X, Y, Z) are always positive. It uses three artificial primaries and an equal-energy reference white. It is designed so that the Y component is the luminance. The XYZ tristimulus values can be obtained from the NTSC tristimulus values by the linear relationship

\[ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.607 & 0.174 & 0.200 \\ 0.299 & 0.587 & 0.114 \\ 0.000 & 0.066 & 1.116 \end{bmatrix} \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix}. \quad (14) \]

Note that the middle row of the matrix contains the luminance weighting coefficients from Eq. (4). Figure 10 shows an image and its X, Y, and Z components. The XYZ values can be normalized to sum to one to form the chromaticity coordinates

\[ x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}, \qquad \text{and} \qquad z = \frac{Z}{X + Y + Z}. \quad (15) \]

Then, the chrominance of the color, independent of intensity, can be shown in an x, y chromaticity diagram (Fig. 11). The chromaticity coordinates of the D65 illuminant, for example, are [x, y] = [0.3127, 0.3290]. Table 1 shows the chromaticity coordinates of the BT.709 primaries. The XYZ and x, y systems are now the most commonly used color specification systems.

Figure 9. YIQ color difference space: Upper left: color image; Upper right: Y; Lower left: I; Lower right: Q component. See color insert.

Figure 10. XYZ color space: Upper left: color image; Upper right: X; Lower left: Y; Lower right: Z component. See color insert.

Cube Root Systems

It is desirable that perceptually equal changes in color (hue, saturation, and intensity) should produce
equal changes in color coordinates. Several color spaces that have been developed for this use the cube root function, which approximates the sensitivity of the human eye.

Table 1. Chromaticity Coordinates of the BT.709 Primaries

           x       y
Red      0.640   0.330
Green    0.300   0.600
Blue     0.150   0.060

Figure 11. The x, y chromaticity diagram. The horseshoe shows where the spectral colors (wavelengths in nm) fall in normalized x, y space. The horizontal curve shows where different color temperatures (K) fall.

The UVW and U∗V∗W∗ Coordinate Systems

The C.I.E. UVW system, introduced in 1960, was designed so that just noticeable changes in hue and saturation produce equal changes in color coordinates (14,21). The U∗V∗W∗ system, adopted in 1964, was an attempt to make just noticeable changes in both luminance and chrominance produce equal changes in color coordinates (14,22). They were both rendered obsolete by the 1976 introduction of the more accurate L∗a∗b∗ and L∗u∗v∗ systems.

The L∗a∗b∗ Coordinate System

The L∗a∗b∗ system (14,23) was designed to agree more closely with the Munsell color system (24,25). In this system, L∗ specifies the lightness, and a∗ and b∗ specify chrominance. The L∗a∗b∗ color coordinates are related to the XYZ system by

\[ L^* = 116\left(\frac{Y}{Y_n}\right)^{1/3} - 16, \]
\[ a^* = 500\left[\left(\frac{X}{X_n}\right)^{1/3} - \left(\frac{Y}{Y_n}\right)^{1/3}\right] \quad \text{and} \quad b^* = 200\left[\left(\frac{Y}{Y_n}\right)^{1/3} - \left(\frac{Z}{Z_n}\right)^{1/3}\right], \quad (16) \]

where Xn, Yn, and Zn are the coordinates of the reference white and Y/Yn > 0.008856.

The L∗u∗v∗ Coordinate System

The L∗u∗v∗ system (14,26), another cube root system, was introduced by the CIE in 1976. The color coordinates are related to the XYZ system by

\[ L^* = 116\left(\frac{Y}{Y_n}\right)^{1/3} - 16, \qquad u^* = 13L^*(u' - u'_0), \qquad v^* = 13L^*(v' - v'_0), \quad (17) \]

where

\[ u' = \frac{4X}{X + 15Y + 3Z}, \qquad v' = \frac{9Y}{X + 15Y + 3Z}, \quad (18) \]

\[ u'_0 = \frac{4X_n}{X_n + 15Y_n + 3Z_n}, \qquad v'_0 = \frac{9Y_n}{X_n + 15Y_n + 3Z_n}, \quad (19) \]

and Xn, Yn, Zn correspond to the reference white. Figure 12 shows L∗u∗v∗ component images when the D65 illuminant is used for the reference white.

Figure 12. The L∗u∗v∗ cube root space: Upper left: color image; Upper right: L∗; Lower left: u∗; Lower right: v∗ component. See color insert.

Cylindrical Color Spaces

Separating Luminance and Chrominance

The three types of color spaces mentioned before (linear RGB, color difference, and cube root spaces) do not truly separate luminance (brightness) from chrominance (hue and saturation). In color image processing, it is often necessary, for example, to segment the image into regions that correspond to different objects, such as forest and cropland in an aerial photograph. These regions differ in luminance, hue, and saturation due to the different reflective properties of the vegetation. They will also differ in luminance due to variations in illumination, such as shadows from clouds. Thus luminance, like R, G, and B, is affected by nonuniform illumination, but hue and saturation, to a first approximation, are not. Thus, it is often useful to use a color space in color image processing wherein hue and saturation are truly independent of luminance.

The Color Circle
Because the hue scale is periodic, artists have used the concept of a ''color circle'' for centuries. The fully saturated colors of the rainbow (the spectral colors) are spaced around the perimeter of the circle in the sequence red, yellow, green, cyan, blue, purple, and back to red (Fig. 13). Note that the rainbow extends only from red through blue. Between blue and red fall the nonspectral (purple) colors that the eye perceives when red- and blue-sensitive cones, but not the green-sensitive ones, are stimulated simultaneously. This is not realizable with monochromatic light, but it does occur in nature. The colors on the color circle become less saturated (paler, more pastel) toward the center, but still retain their hue. The color circle, which has long proved useful in illustrating similarities and differences among colors, forms the basis for cylindrical color spaces.

Figure 13. The color circle. Hue is an angle, and saturation is a vector length.

The Ostwald System

About 100 years ago, Wilhelm Ostwald advanced a three-dimensional generalization of the color circle (27). He placed a perpendicular gray axis through the center of the hue circle to form a double cone solid (Fig. 14). White fell at the upper apex, and black at the lower apex. The hues, numbered 1 through 24, are arranged around the periphery. The fully saturated colors fall on the perimeter of the circle where the two cones intersect. Adding white to a paint color causes it to move toward the upper apex, and adding black moves it toward the lower apex.

Figure 14. The Ostwald color system. The spectral colors lie on the perimeter of the solid.

The Munsell System

The color space developed by A. H. Munsell (24,25) at about the same time as Ostwald's is a cylindrical coordinate system, wherein ten hues are spaced around the color circle and nine ''values'' (intensities) cover the range from black to white along the vertical axis. Munsell established scales of ''chroma'' (saturation) and, for each hue, illustrated different chroma/value combinations with painted chips (Fig. 15). The hue, value, and chroma scales were designed to have perceptually equal steps. The Munsell system, like Ostwald's, is a three-dimensional generalization of the color circle, and it parallels human vision rather well. The Munsell system has retained its importance, in part, because the company founded by Munsell has supplied color matching books since 1904. Due to the availability of better pigments, the number of hues in the Munsell system has increased to forty. Since 1934, the CIE has used Munsell colors in color matching experiments to establish a correspondence with CIE color coordinates.

Figure 15. The Munsell color system. For each hue, painted chips show the various combinations of value and chroma. (Photo provided courtesy of Munsell Color, division of GretagMacbeth). See color insert.

Cylindrical Coordinates

Intensity. For quantitative purposes, the three color coordinates, hue, saturation, and intensity, define a cylindrical color space, shown in Fig. 16. Intensity specifies
the overall brightness of a color, without regard to its spectral wavelength, and it is the vertical axis of cylindrical color space. Different intensities (the gray shades) fall along the vertical axis from black at the bottom to white at the top. In this discussion, we depart somewhat from the strict definition of intensity put forth by the CIE and used routinely in color science. Instead we use a more flexible definition, as is common practice in digital image processing.

Hue. The two chrominance parameters, hue and saturation, are illustrated by the color circle in Fig. 16. Hue is an angle, and saturation is a vector length in the hue–saturation plane. Arbitrarily, a hue of 0° is red, 120° is green, and 240° is blue. Hue traverses the colors of the visible spectrum as it goes from zero to 240 degrees. The nonspectral colors fall between 240° and 360°.

Saturation. The saturation parameter is the radius of the point from the origin of the color circle. The ''vivid'' (saturated) colors fall around the periphery of the cylinder, and their saturation values are unity. At the center lie neutral (gray) shades, those with zero saturation. In between lie pastel shades. The fully bright, fully saturated colors fall on the perimeter of the circular top surface.

Figure 16. Cylindrical color space. Hue is an angle, saturation is a vector length, and intensity is the vertical axis.

The effects of hue, saturation, and intensity are illustrated by the color test patterns in Fig. 17, where hue varies with angle from zero to 360°. In the left image, both saturation and intensity are constant at unity. In the middle image, saturation goes to zero at the center as the colors fade to gray. In the right-hand image, intensity goes to zero at the center, as the colors fade to black.

Color Coordinate Conversion

For image processing, some techniques are naturally more successful when carried out in one color coordinate system or another. Thus, it is useful to be able to convert between rectangular (RGB) and cylindrical color coordinates. We present two alternative cylindrical systems that are useful for color image processing, even though the hue, saturation, and intensity values obtained are not the same as those used in color science.

The IHS Coordinate System

Several formulations for HSI spaces have been proposed and used. One of the most straightforward is called the IHS (intensity, hue, and saturation) system (7). The conversion from NTSC RGB to IHS color coordinates is given by

\[ \begin{bmatrix} I \\ V_1 \\ V_2 \end{bmatrix} = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ -1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} \\ 1/\sqrt{6} & -2/\sqrt{6} & 0 \end{bmatrix} \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix} \quad (20) \]

and

\[ H = \tan^{-1}\left(\frac{V_2}{V_1}\right), \qquad S = \sqrt{V_1^2 + V_2^2}. \quad (21) \]

In this formulation, blue is the reference for hue. The inverse formulation is

\[ V_1 = S\cos(H), \qquad V_2 = S\sin(H), \quad (22) \]

and

\[ \begin{bmatrix} R_N \\ G_N \\ B_N \end{bmatrix} = \begin{bmatrix} 4/3 & -2\sqrt{6}/9 & \sqrt{6}/3 \\ 2/3 & -\sqrt{6}/9 & -\sqrt{6}/3 \\ 1 & \sqrt{6}/3 & 0 \end{bmatrix} \begin{bmatrix} I \\ V_1 \\ V_2 \end{bmatrix}. \quad (23) \]
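A runnable sketch of Eqs. (20)–(23) follows (the function names are ours; hue is returned in degrees, with blue at 0° as stated in the text, and the arc tangent of Eq. (21) is taken quadrant-aware):

```python
import math

SQRT6 = math.sqrt(6)

def rgb_to_ihs(r, g, b):
    """Eqs. (20)-(21): NTSC RGB -> intensity, hue (degrees), saturation."""
    i = (r + g + b) / 3
    v1 = (-r - g + 2 * b) / SQRT6
    v2 = (r - 2 * g) / SQRT6
    h = math.degrees(math.atan2(v2, v1)) % 360   # blue is the hue reference
    s = math.hypot(v1, v2)
    return i, h, s

def ihs_to_rgb(i, h, s):
    """Eqs. (22)-(23): back to NTSC RGB."""
    v1 = s * math.cos(math.radians(h))
    v2 = s * math.sin(math.radians(h))
    r = (4 / 3) * i - (2 * SQRT6 / 9) * v1 + (SQRT6 / 3) * v2
    g = (2 / 3) * i - (SQRT6 / 9) * v1 - (SQRT6 / 3) * v2
    b = i + (SQRT6 / 3) * v1
    return r, g, b
```

For pure red this reproduces the Table 2 entries H = 135° and S ≈ 0.577, and the two functions invert one another.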
Figure 17. Color charts: Hue varies with angle according to Fig. 13. Left: constant saturation and intensity. Middle: constant intensity, saturation goes to zero at the center. Right: constant saturation, intensity goes to zero at the center. See color insert.
Figure 18. Cylindrical IHS color space: Upper left: color image; Upper right: I; Lower left: H; Lower right: S component. In this image, blue is the reference for zero hue. See color insert.
Table 2. RGB, IHS, and HSI Values for Primary, Secondary, and Neutral Colors

Color Name    R     G     B   |  I(IHS)  H(IHS)  S(IHS) |  H(HSI)  S(HSI)  I(HSI)
Black         0     0     0   |  0       270°    0      |  45°     0       0
Gray          0.5   0.5   0.5 |  0.5     270°    0.204  |  45°     0       0.872
White         1.0   1.0   1.0 |  1.0     270°    0.408  |  45°     0       1.743
Red           1.0   0     0   |  0.333   135°    0.577  |  0°      1       0.588
Yellow        0.5   0.5   0   |  0.333   207°    0.455  |  60°     1       0.588
Green         0     1.0   0   |  0.333   243°    0.912  |  120°    1       0.588
Cyan          0     0.5   0.5 |  0.333   297°    0.456  |  180°    1       0.588
Blue          0     0     1.0 |  0.333   0°      0.816  |  240°    1       0.588
Magenta       0.5   0     0.5 |  0.333   45°     0.290  |  300°    1       0.588
The transformations between this system and RGB space are relatively simple to compute, and intensity and hue are well behaved (Fig. 18). Saturation, however, is intensity-dependent, and the primary colors do not possess unity saturation, as one would hope. Table 2 illustrates this. Thus, this IHS space is less desirable for color image processing than a truly cylindrical color space might be.
HSI Format A more useful cylindrical color specification system is one we call the ‘‘HSI format.’’ Its saturation is intensityindependent, and it also offers other advantages for image processing. The conversion to and from RGB space, however, is a bit more complex than for the IHS system.
RGB to HSI Conversion

The conversion from RGB to HSI format can be approached as follows (28–30). Recall that the gray line is the diagonal of the color cube in RGB space, and it is the vertical axis in cylindrical HSI space. Thus we can begin by establishing an (x, y, z) coordinate system in which the RGB cube is rotated so that its diagonal lies along the z axis and its R axis lies in the x–z plane (Fig. 19). This is given by

\[ x = \frac{1}{\sqrt{6}}[2R - G - B], \qquad y = \frac{1}{\sqrt{2}}[G - B], \]
and

\[ z = \frac{1}{\sqrt{3}}[R + G + B]. \quad (24) \]

Figure 19. Rotating the RGB cube. This is a step in the development of the conversion from RGB to HSI coordinates.

Next we convert to cylindrical coordinates by defining polar coordinates in the x, y plane,

\[ \rho = \sqrt{x^2 + y^2}, \qquad \phi = \angle(x, y), \quad (25) \]

where ∠(x, y) is the angle that a line from the origin to the point (x, y) makes with the x axis. This is basically the arc tangent function, but attention is paid to the quadrant in which the point falls. Now, we have cylindrical coordinates, where (φ, ρ, z) correspond to (H, S, I), but there are two problems with saturation. It is not independent of intensity, as we would like, and the fully saturated colors (those that have no more than two of the primary colors present) fall on a hexagon in the x, y plane (Fig. 20a) rather than on a circle. The remedy is to normalize ρ by dividing by its maximum for that value of φ. This leads to the saturation formula

\[ S = \frac{\rho}{\rho_{max}} = 1 - \frac{3\min(R, G, B)}{R + G + B} = 1 - \frac{\sqrt{3}}{I}\min(R, G, B). \quad (26) \]

This places all the fully saturated colors on a unit-radius circle in the x, y plane (Fig. 20b).

Figure 20. The x, y plane of color space: (a) unnormalized polar coordinates; (b) normalized saturation.

There are at least three ways to compute hue. One can use Eq. (24) and take φ in Eq. (25) as the hue. An alternative expression for y, which yields results similar to those of Eq. (24) but provides hue values that are more stable in neutral gray (i.e., small S) areas, is y = (1/√6)[G − 2B]. As a third alternative, one can compute the angle

\[ \theta = \cos^{-1}\left\{ \frac{\tfrac{1}{2}[(R - G) + (R - B)]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right\}, \quad (27) \]

after which the hue is

\[ H = \begin{cases} \theta, & G \ge B \\ 2\pi - \theta, & G < B. \end{cases} \quad (28) \]
Figure 21 shows the HSI components of the Lenna image. Notice how different saturation is from that in the IHS system (Fig. 18). Hue differs only in that it is referenced to red at 0° .
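Both directions of this conversion — the forward path of Eqs. (24), (26), (27), and (28), and the sector-based inverse given below in Eqs. (29)–(31) — can be sketched as follows (the function names are ours; hue is handled in radians):

```python
import math

def rgb_to_hsi(r, g, b):
    """Forward conversion: I from Eq. (24), S from Eq. (26), H from Eqs. (27)-(28)."""
    total = r + g + b
    i = total / math.sqrt(3)
    if total == 0:
        return 0.0, 0.0, 0.0                     # black: H and S are undefined
    s = 1 - 3 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0                                  # neutral gray: H is undefined
    else:
        theta = math.acos(max(-1.0, min(1.0, num / den)))
        h = theta if g >= b else 2 * math.pi - theta
    return h, s, i

def hsi_to_rgb(h, s, i):
    """Inverse conversion via the sector formulas of Eqs. (29)-(31)."""
    h = h % (2 * math.pi)
    k = i / math.sqrt(3)
    if h < 2 * math.pi / 3:                      # 0 <= H < 120 degrees
        b = k * (1 - s)
        r = k * (1 + s * math.cos(h) / math.cos(math.pi / 3 - h))
        g = math.sqrt(3) * i - r - b
    elif h < 4 * math.pi / 3:                    # 120 <= H < 240 degrees
        h -= 2 * math.pi / 3                     # note cos(180deg - H) = cos(60deg - (H - 120deg))
        r = k * (1 - s)
        g = k * (1 + s * math.cos(h) / math.cos(math.pi / 3 - h))
        b = math.sqrt(3) * i - r - g
    else:                                        # 240 <= H < 360 degrees
        h -= 4 * math.pi / 3
        g = k * (1 - s)
        b = k * (1 + s * math.cos(h) / math.cos(math.pi / 3 - h))
        r = math.sqrt(3) * i - g - b
    return r, g, b
```

Round-tripping any RGB triple through the two functions returns the original values, which is the invertibility property the text calls for.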
HSI to RGB Conversion

The formulas for converting from HSI to RGB take on slightly different forms, depending upon which sector of the color circle the point falls in (28–30). Assuming that Eqs. (24)–(26) have been used for the RGB to HSI conversion, the reverse conversion proceeds as follows. For 0° ≤ H < 120°,

\[ R = \frac{I}{\sqrt{3}}\left[1 + \frac{S\cos(H)}{\cos(60° - H)}\right], \qquad B = \frac{I}{\sqrt{3}}(1 - S), \]

and

\[ G = \sqrt{3}\,I - R - B, \quad (29) \]

but for 120° ≤ H < 240°,

\[ G = \frac{I}{\sqrt{3}}\left[1 + \frac{S\cos(H - 120°)}{\cos(180° - H)}\right], \qquad R = \frac{I}{\sqrt{3}}(1 - S), \]

and

\[ B = \sqrt{3}\,I - R - G, \quad (30) \]

and for 240° ≤ H < 360°,

\[ B = \frac{I}{\sqrt{3}}\left[1 + \frac{S\cos(H - 240°)}{\cos(300° - H)}\right], \qquad G = \frac{I}{\sqrt{3}}(1 - S), \]

and

\[ R = \sqrt{3}\,I - G - B. \quad (31) \]

Figure 21. The HSI cylindrical color space: Upper left: color image; Upper right: H; Lower left: S; Lower right: I component. Here, red is the reference for zero hue. See color insert.

There are several more variations of HSI color spaces (14,31). From the viewpoint of color image processing, the specific choice may not materially affect the result, as long as hue is an angle, hue and saturation are independent of intensity, and the transformation is accurately invertible.

COLOR CORRECTION

Sometimes, a digitized color image is obtained without the proper balance between the red, green, and blue components, or the RGB values may not truly represent the characteristics of the scene. Then, it may be necessary to correct color before processing or display.

Color Balance
Different sensitivities, offsets, gain factors, etc., in the three color channels perform linear transformations on the three color components during digitizing. The result can be a color image whose primary colors are ''out of balance.'' All of the objects in the scene are shifted in color from the way they should appear. Most noticeably, objects that should be gray take on unwarranted color. The first test of color balance is whether all of the gray objects appear gray. The second test is whether the highly saturated colors have the proper hue. If the image has a prominent black or white background, this will produce a discernible peak in the gray-level histograms of the RGB component images. If these peaks occur at different gray levels in the three channels, this signals a color imbalance. The remedy for color imbalance of this type is to use linear gray-scale transformations on the individual R, G, and B images to give equal gray levels to gray areas and thus bring about color balance. Normally, only two of the component images need to be transformed to match the third. The simplest way to design the required gray-scale transformation function is (1) to select relatively uniform light gray and dark gray areas of the image, (2) to
compute the mean gray level of both areas in all three component images, and then (3) to use a linear gray-scale transformation on two of them to make them match the third. If each of the two areas has the same gray level in all three component images, color balance has been effected, at least to a first approximation.

Figure 22, at the upper left, shows the Lenna picture as it is available on the Internet, and gray-level histograms of its RGB components are to the right. The leftmost peak in each histogram corresponds to the mirror frame on the right side of the picture. The average gray levels there are approximately [R, G, B] = [95, 22, 61]. To effect color balance, we took both the mirror frame and the hatband as neutral colors. First, constants were subtracted from the red and blue components to make their leftmost peaks match that of the green component. Then, the three component images were multiplied by constants designed to place the hatband at an average gray level of approximately 240. The resulting histograms are seen in the lower row of Fig. 22, and the color-balanced image appears at the lower left.

Figure 22. Color balance example. Top row: unbalanced image and RGB histograms. Bottom row: balanced image and RGB histograms. This exercise assumes that the (white) hatband and the (black) mirror frame are neutral colors. See color insert.

More accurate color balance can be obtained if a gray step-wedge test target is present in the image, and nonlinear gray-scale transformations are used to give each step of the wedge the same gray level in each of the three color component images. Figure 23 illustrates the process. If the camera and digitizing system are stable, the required transformations can be determined before the digitizing session simply by digitizing a black-and-white step-wedge test target. Then, all images digitized during that session can be balanced by using the same set of transformations.

Figure 23. Nonlinear color balance illustration. RGB gray-level plots along a line that cuts through a step-wedge target. Left: before color balancing. Right: after color balancing.

Proper color balance is important when converting from RGB to cylindrical color spaces. Because hue is computed as an arc tangent [Eqs. (21), (25)], it is quite susceptible to color imbalance, especially in low-intensity (dark) areas. Even slight color imbalance can result in
assigning inappropriate hue values and excess saturation to the dark regions of an image.

COLOR IMAGE CALIBRATION
Color Compensation

In some applications, the goal is to isolate different types of objects that differ primarily, or exclusively, in chrominance. In fluorescence microscopy, for example, different constituents of a biological specimen (e.g., different cell structures) are stained with differently colored fluorescent dyes. The analysis often involves being able to visualize these objects separately but in correct spatial relationship to one another. If the preparative procedure stains three chemical components of the specimen, for example, using red, green, and blue fluorescent dyes, one can digitize and display this as a normal tricolor image. Then, the RGB component images form a set of registered monochrome images, each of which shows objects of a specific type. This paves the way for image segmentation and object measurement by standard monochrome image processing techniques.

Given the broad and overlapping sensitivity spectra of commonly used color image digitizers and the varied emission spectra of available fluorescent dyes (fluorophores or fluorochromes), one seldom completely isolates the three types of objects in the three component images. Normally, each type of object will be visible in all three of the color component images, although at reduced contrast in two of them. We refer to this phenomenon as ''color spreading.''

We can model color spreading as a linear transformation (32,33). Let the matrix C specify how the three colors are spread among the three channels. Each element cij is the proportion of the intensity of fluorophore j that appears in color channel i of the digitized image. Let x be the three-element vector of actual fluorophore intensities at a particular pixel, scaled as gray levels that would be produced by an ideal digitizer (one with no color spread and no black level offset). Then,

\[ \mathbf{y} = \mathbf{C}\mathbf{x} + \mathbf{b} \quad (32) \]

is the vector of RGB gray levels recorded at that pixel by the digitizer. The matrix C accounts for the color spread, and the vector b accounts for the black level offset of the digitizer; that is, bi is the gray level that corresponds to black (zero intensity) in channel i. This is easily solved for the true intensity values by

\[ \mathbf{x} = \mathbf{C}^{-1}[\mathbf{y} - \mathbf{b}]. \quad (33) \]

Thus, color spreading can be eliminated by premultiplying the RGB gray-level vector for each pixel by the inverse of the color spread matrix, after the black level has been subtracted from each channel. This development assumes that gray levels are linear with intensity. For some cameras, a gray-scale transformation of the RGB components might be required to establish this condition before color compensation. The foregoing analysis assumes that the exposure time in each channel is the same as it was when the color spread matrix was determined. The color compensation matrix can be modified to account for differences in exposure time among channels (33).

Color Compensation Example

Figure 24 shows an RGB image of human bone marrow cells that have been digitized by a color television camera mounted on a fluorescence microscope. In this preparation, all cells have been stained with DAPI, a blue fluorescent dye. Cells that are in the process of dividing also absorb FITC, a green fluorescent dye. Finally, the DNA located at the centromeres of the two number 8 chromosomes is labeled with Texas Red, a red fluorescent dye. Ideally, all of the cells would be visible in the blue channel, dividing cells in the green channel as well, and two dots per cell, corresponding to the two number 8 chromosomes, would appear in the red channel. In Fig. 24, however, all cellular components appear in all channels due to overlapping sensitivity spectra of the image digitizer. The color spread matrix for the instrument that recorded this image appears in Table 3. It states, for example, that only 44% of a DAPI molecule's intensity
Figure 24. Three-color fluorescence microscopy of cells: (a) color; (b) red; (c) green; (d) blue components. See color insert.
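The correction of Eq. (33) amounts to a 3 × 3 matrix inversion and one matrix–vector multiply per pixel. A sketch follows; the spread matrix, black levels, and pixel values below are made up for illustration (a real C is measured from single-dye images), and the helper names are ours:

```python
# Color compensation per Eqs. (32)-(33): subtract the black-level offset b
# from the recorded gray-level vector y, then premultiply by the inverse of
# the color spread matrix C.

def invert3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def compensate(spread, black, y):
    """x = C^{-1}[y - b] for one pixel's RGB gray-level vector (Eq. (33))."""
    cinv = invert3(spread)
    d = [yi - bi for yi, bi in zip(y, black)]
    return [sum(cinv[i][j] * d[j] for j in range(3)) for i in range(3)]

# Hypothetical spread matrix (channel x dye) and black levels, for illustration:
C = [[0.90, 0.20, 0.10],
     [0.05, 0.80, 0.25],
     [0.05, 0.00, 0.65]]
B = [10.0, 12.0, 8.0]
```

Applying Eq. (32) to a known intensity vector and then running `compensate` recovers the original intensities, which is the defining property of the correction.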
is recorded in the blue channel, whereas 32% of it shows up in the green channel and 24% ends up in the red channel. The values in this matrix were determined experimentally from digitized images of cells stained with single fluorophores, and they are specific to this particular combination of dyes, camera, and optics. The relatively large degree of color smear evident in this example resulted from spectral overlap among the RGB band-pass filters built into the color camera. Less severe color smear could be obtained with carefully designed optics, but this example illustrates the technique of color compensation.

Table 3. Color Spread Matrix

          Texas Red   FITC   DAPI
Red          0.85     0.26   0.24
Green        0.05     0.65   0.32
Blue         0.10     0.09   0.44

The color compensation matrix C−1 specifies what must be done to correct the color spread. Using the smear matrix in Table 3 and assuming that the background is zero, color compensation is given by

\[ \mathbf{x} = \mathbf{C}^{-1}\mathbf{y} = \begin{bmatrix} 1.24 & 0.05 & -0.29 \\ -0.45 & 1.69 & -0.24 \\ -0.35 & -1.26 & 2.61 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \quad (36) \]

Thus, to correct the red channel image, at each pixel, one would take 124% of the gray level in the red channel image, add 5% of the green channel value, and subtract 29% of the blue. The second and third rows, likewise, specify how to correct the green and blue channel images. Figure 25 shows the result of color compensation applied to the image in Fig. 24. Here, the three differently labeled types of objects have been effectively isolated to the RGB component images. This makes image segmentation and measurement a much simpler task. Color compensation also increases the saturation of the displayed color image (Fig. 25a) because color spread tends to desaturate the image.

COLOR IMAGE ENHANCEMENT

Contrast and Color Enhancement

When working with the RGB components of a tricolor digital image, one must be careful to avoid upsetting the color balance. Essentially all of the image processing techniques discussed in this volume, however, will produce very predictable results if applied to the intensity component of an image in HSI format. In many ways, the intensity component can be treated as a monochrome image. The color information, embedded in the hue and saturation components, will usually tag along without protest. Any transformations that alter the geometry of the image must be carried out in exactly the same way on all three components, whether these are in RGB or HSI format.
Saturation Enhancement

One can make the colors in an image bolder by multiplying the saturation at each pixel by a constant greater than one. Likewise, a constant less than one reduces the colorfulness of the image. A nonlinear point operation can be used on the saturation image, as long as it is zero at the origin. Changing the saturation of pixels that have near-zero saturation upsets the color balance.

Hue Alteration

Because hue is an angle, one logical thing to do is to add a constant to the hue of each pixel. This shifts the color of each object up or down the rainbow. If the angle added or subtracted is only a few degrees, this process will ''cool'' or ''warm'' the color image, respectively. Larger angles will drastically alter its appearance. A general point operation, performed on the hue image, will exaggerate color differences between objects in portions of the spectrum where its slope is greater than one, and conversely. Because hue is an angle, operations processing that component must treat the hue scale as periodic, using modulo-2π arithmetic. For example, adding 20° to a hue of
350° yields a hue of 10°. One must sometimes add or subtract 360° to the result to constrain hue to the range [0°, 360°].

Figure 25. The result of color compensation: (a) color; (b) red; (c) green; (d) blue components. See color insert.

Color Image Restoration

One can apply the standard image restoration techniques to the R, G, and B images individually in a straightforward extension from monochrome to color. Some special considerations, however, apply to tricolor images.
If an image is being restored or enhanced for the sake of its appearance, one does well to note the strengths and weaknesses of the human eye. Detail, for example, is much more visible in intensity than in color. Blurring of edges, then, is much more disturbing if it affects intensity rather than hue or saturation. Similarly, graininess (random noise) of reasonable amplitude is more apparent in intensity than in color. Finally, the eye is much more sensitive to graininess in flat areas than in ''busy'' areas that contain high-contrast detail. This applies to both intensity and color (hue and saturation) noise. Taking the foregoing into account, we can construct a general outline for approaching a color image enhancement or restoration project.

1. Use a linear point operation to ensure that the RGB image fits properly within the gray scale (e.g., 0–255) and is balanced in color.
2. Convert to HSI format.
3. Use a low-pass filter or, better yet, a median filter on the hue and saturation components to reduce the random color noise within objects. Some blurring of edges in these images will not be noticeable in the final product, so this step might involve significant noise reduction. The filter used must preserve average gray level [i.e., MTF(0,0) = 1.0], and hue must be treated as an angle.
4. Use a spatially variant approach (e.g., linear combination filters) to restore the intensity image. This step sharpens edges and enhances detail while reducing graininess in flat areas. Again, MTF(0,0) = 1.0.
5. Use a linear point operation on the intensity component, as required, to ensure proper utilization of the gray scale.
6. Convert to RGB format, and display or print the image.

Figure 26 shows an example of image restoration. The top row shows the Lenna image after it has been degraded by blurring, hue shift, and random noise addition. To the left of the color image are hue, saturation, and intensity images. The bottom row shows the same thing after restoration.
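The spatially variant idea of step 4 can be illustrated in one dimension: blend a smoothed version and a sharpened version of a signal, weighted by a smoothed gradient magnitude, so that edges receive sharpening while flat areas receive smoothing. A toy sketch (the filter sizes and boost factor are arbitrary choices, and the function names are ours):

```python
def moving_average(sig, k=3):
    """Crude low-pass filter; the window is clamped at the ends of the signal."""
    n = len(sig)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
        out.append(sum(sig[lo:hi]) / (hi - lo))
    return out

def restore_intensity(sig, boost=1.5):
    """Edge-weighted blend of a low-pass signal and an unsharp-masked signal."""
    low = moving_average(sig)
    high = [s + boost * (s - l) for s, l in zip(sig, low)]   # sharpened version
    n = len(sig)
    grad = [abs(sig[min(n - 1, i + 1)] - sig[max(0, i - 1)]) / 2 for i in range(n)]
    gmax = max(grad) or 1.0
    weight = moving_average([g / gmax for g in grad])        # smoothed, in [0, 1]
    return [w * h + (1 - w) * l for w, h, l in zip(weight, high, low)]
```

On a step signal, the flat regions come out smoothed while the step itself is sharpened (with the characteristic overshoot of unsharp masking).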
For the restoration, the hue at each pixel was reduced by 14° to effect color balance. Saturation was increased by 50% and subjected to a 3 × 3 median filter to reduce color noise. The intensity image was synthesized as a weighted sum of a low-pass- and a high-pass-filtered intensity image. The weighting was done by the smoothed output of a Sobel edge detector. This ensured that the high-pass-filtered image contributed most to areas that contain detail and the low-pass-filtered image filled in flat areas. Each pixel in the intensity image was multiplied by 1.3 to increase the contrast by 30%. When processing color components, the filtering operations must account for the periodicity of the hue scale. For example, the average of hues 10° and 50° is 30°, but the average of 10° and 330° is 350°. Here we add 360° to the first hue to place it (370°) nearer the second hue before averaging.
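The wrap-around bookkeeping in this averaging example can be captured in a small helper function (a sketch; the function name is ours):

```python
def hue_mean(h1, h2):
    """Average two hues in degrees, honoring the 360-degree periodicity:
    if the direct difference exceeds 180 degrees, the short way around the
    color circle crosses 0/360, so shift the smaller hue up by a full turn."""
    if abs(h1 - h2) > 180:
        if h1 < h2:
            h1 += 360
        else:
            h2 += 360
    return ((h1 + h2) / 2) % 360
```

With this rule, averaging 10° and 330° yields 350°, as in the text, rather than the meaningless direct mean of 170°.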
COLOR IMAGE PROCESSING
119
Figure 26. Color image restoration example. Top row: degraded image; bottom row: processed image. Columns: HSI components, color image. See color insert.
For a different approach to filtering the color components, one can convert from H, S (polar) coordinates to x, y (rectangular chromaticity) coordinates before filtering. Here, there is no periodicity, and the results will be quite different from filtering hue and saturation separately. For example, suppose we average two adjacent pixels where [H, S] = [0°, 1] and [180°, 1]. These are fully saturated red and cyan, respectively. If we average hue and saturation separately, we obtain [H, S] = [90°, 1], which is a vivid yellow-green. Instead, if we average the corresponding [x, y] = [1, 0] and [−1, 0], we obtain [x, y] = [0, 0], which is a neutral (gray) color.

COLOR IMAGE SEGMENTATION

Segmentation of a color image is basically a process of partitioning the (three-dimensional) color space into regions that correspond to the different types of objects and background. The different objects in the image often correspond to separate clusters of points in a three-dimensional histogram defined in RGB or HSI space. The background upon which the objects reside will also produce one or more clusters. A three-feature (e.g., RGB or HSI) classifier design exercise can prove helpful in defining the surfaces that partition the space best. The hue and saturation of an object are normally dictated by the light-absorbing or light-reflecting properties of the material of which it is composed, combined with the illumination spectrum. The intensity, however, is seriously affected by both illumination intensity and viewing angle. A shadow, for example, that falls across an object would normally affect the intensity of the pixels therein much more than it would affect their hue and saturation. Thus, it may be productive to segment the image in the
hue–saturation plane (i.e., on the color circle), completely ignoring intensity, rather than in three-dimensional color space. One may find, however, that the hue and saturation images are noisy and lack distinct edges. Some noise reduction, implemented by smoothing or median filtering the two chrominance channels, may prove helpful before segmentation. Figure 27 shows how the chromaticity (hue–saturation) plane can be used in image segmentation. Figure 27 has an arrangement of differently colored flowers at the upper left. At the upper right is a hue versus saturation histogram of that image. The hue axis has been shifted 120° to center the clusters of points in the display window. The differently colored objects in the scene produce separate clusters of points in the two-dimensional histogram. The highly saturated yellow flowers produce cluster 1, the red flowers create cluster 2, and the blue flowers are responsible for cluster 3. Cluster 4 is due to the basket, and the background produces the very dense cluster 5 at a hue of 260° and near-zero saturation. The white flowers are hopelessly confounded with the background cluster. Due to the hue shift mentioned, 120° (green), not 0° (red), appears at both ends of the horizontal scale. In the HSI components at the bottom of Fig. 27, the different flowers are largely constant in hue and saturation. One could segment the image either by thresholding the hue and saturation images or by treating the 2-D histogram as a probability density function and partitioning it into regions by standard pattern recognition techniques. A more ambitious approach, not necessarily destined for better performance, would be to partition the 3-D HSI color space in a similar fashion.
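The conversions between the polar hue–saturation representation and rectangular chromaticity coordinates, used in the averaging example above, are straightforward (function names are illustrative):

```python
import math

def hs_to_xy(h_deg, s):
    """Polar hue-saturation to rectangular chromaticity coordinates."""
    return (s * math.cos(math.radians(h_deg)),
            s * math.sin(math.radians(h_deg)))

def xy_to_hs(x, y):
    """Back to polar form; hue is undefined at the neutral point (s = 0)."""
    return math.degrees(math.atan2(y, x)) % 360, math.hypot(x, y)

# Fully saturated red [0 deg, 1] and cyan [180 deg, 1] average to the
# neutral (gray) point when averaged in rectangular coordinates.
x1, y1 = hs_to_xy(0, 1)      # red -> (1, 0)
x2, y2 = hs_to_xy(180, 1)    # cyan -> (-1, 0)
avg = ((x1 + x2) / 2, (y1 + y2) / 2)   # near (0, 0)
```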
Figure 27. Color image segmentation example. Upper left: image; Upper right: hue vs. saturation chromaticity histogram; bottom row: HSI components. In the histogram the numbered clusters correspond to (1) yellow flowers, (2) red flowers, (3) blue flowers, (4) the basket, and (5) the background. See color insert.
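A minimal nearest-cluster classifier in the hue–saturation plane can illustrate the partitioning idea. The cluster centers below are loosely modeled on Fig. 27 but are illustrative assumptions, not values measured from the figure; the relative weighting of the hue and saturation distances is likewise a design choice.

```python
def classify_pixel(h, s, clusters):
    """Assign a pixel to the nearest cluster center in the hue-saturation
    plane, ignoring intensity entirely. Hue distance is circular."""
    def dist(c):
        dh = min(abs(h - c["h"]), 360 - abs(h - c["h"])) / 180.0  # 0..1
        return dh ** 2 + (s - c["s"]) ** 2
    return min(clusters, key=dist)["label"]

# Hypothetical cluster centers, in the spirit of Figure 27.
clusters = [
    {"label": "yellow flowers", "h": 60,  "s": 0.9},
    {"label": "red flowers",    "h": 0,   "s": 0.8},
    {"label": "blue flowers",   "h": 240, "s": 0.7},
    {"label": "background",     "h": 260, "s": 0.05},
]
```

Treating the 2-D histogram as a probability density and applying a full pattern-recognition partition would refine these hand-placed boundaries.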
COLOR IMAGE MEASUREMENT

Once segmentation is complete, size and shape can be measured in the same way as they are for monochrome images. Now, however, brightness includes the added aspect of color. One can compute, for example, the average hue and average saturation of each object in addition to its average intensity. This corresponds to the location, in color space, of the center of the cluster of points due to pixels that fall inside the object. Once measured, then, the objects can be classified by standard techniques. The texture features that have been defined for monochrome images can be generalized to color. The standard deviation of gray level inside the object, for example, is a measure of the amplitude of the textural pattern. Given a color image, one can compute this on the R, G, and B channel images separately, or on the H, S, and I images.

COLOR IMAGE DISPLAY

Pseudocolor

The term pseudocolor refers to a color image generated from a monochrome image by mapping each of the gray levels (located along the axis of the color cylinder) to a point elsewhere in color space (34). This is simply assigning a color to each gray level by some rule that can be stored in a lookup table. Thus, pseudocolor is basically a detail of the monochrome image display process rather than a color image processing technique unto itself. The attraction of pseudocolor stems from the fact that the human eye can reliably distinguish many more colors than it can differentiate shades of gray. Thus, although one might be able to discern only 40 or so distinct shades of gray in a monochrome display, many more shades might be visible when mapped to different colors.

Figure 28 shows an example of pseudocolor display. Figure 28a is a monochrome image used as the input. Figure 28b shows how the gray levels in the input image map to gray levels in the RGB components of the output image, which appears in Fig. 28c. This particular pseudocolor mapping encodes low intensity as blue, high intensity as red, and leaves green in the middle. Increasing the gray level simply cycles through the fully saturated colors of the rainbow. This pseudocolor mapping essentially converts gray level to hue.

A poorly chosen pseudocolor mapping can obscure, rather than enhance, the information in a displayed image. The mapping is usually more satisfactory if it employs some pattern, rather than simply a random assignment of colors to gray levels. Normally, the gray-scale axis maps to a continuous line that spirals through RGB or HSI color space. Mapping the black and white points to themselves is often useful as well. In general, the more conservative mappings are the more successful. Considerable visual discomfort can result from more aggressive pseudocolor mapping schemes.
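A rainbow-style gray-level-to-hue mapping of this kind can be built as a 256-entry lookup table. This is a sketch using the standard library's `colorsys` module; the exact hue endpoints (240° blue for level 0, 0° red for level 255) are illustrative choices.

```python
import colorsys

def pseudocolor_lut():
    """Build a 256-entry lookup table mapping gray level to hue:
    level 0 -> blue (240 deg), level 255 -> red (0 deg), green between."""
    table = []
    for g in range(256):
        hue = (240 * (255 - g) / 255) / 360   # 240..0 degrees, as a fraction
        r, gr, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        table.append((round(255 * r), round(255 * gr), round(255 * b)))
    return table
```

Applying the table is then a single indexed lookup per pixel, which is why real-time pseudocolor display is cheap in hardware.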
Figure 28. Pseudocolor image display example. (a) monochrome input image. (b) RGB gray-scale mapping. (c) output image. See color insert.
The usefulness of pseudocolor can be illustrated by a microscope image digitizing application developed by Donald Winkler at the NASA Johnson Space Center. A real-time pseudocolor display was used to assist operators in setting the light level before digitizing images in a microscope. The entire gray scale was mapped onto itself, except for 255, which was mapped to a brilliant, saturated red. The operators' instructions were to increase the lamp current until red appears, then decrease it slowly until the red goes away. This visual aid proved very useful in practice.

CONCLUSION
A wealth of color and vision data is available on the World Wide Web for those who wish to pursue further study in this area. Worthwhile starting points are (35–37).

ABBREVIATIONS AND ACRONYMS

SMPTE — Society of Motion Picture and Television Engineers
CIE — Commission Internationale de l'Eclairage
NTSC — National Television Standards Committee
ITU — International Telecommunication Union
CRT — Cathode Ray Tube
RGB — red, green, blue
PIXEL — Picture Element
SPD — spectral power density
HSI — hue, saturation, intensity
IHS — intensity, hue, saturation
FITC — a green fluorescent dye
DAPI — a blue fluorescent dye

BIBLIOGRAPHY

1. H. J. A. Dartnall, J. K. Bowmaker, and J. D. Mollon, Proc. R. Soc. London B220, 115–130 (1983).
2. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd ed., Wiley, New York, 1982.
3. G. Wald, Science 101(2635), 653–658 (1945).
4. G. Wald, Science 145(3636), 1007–1017 (1964).
5. B. H. Crawford, Proc. Phys. Soc. B62, 321–334 (1949).
6. CIE Proceedings, Bureau Central de la CIE, Paris, 1951, Vol. 1, Sec. 4; Vol. 3, p. 37.
7. W. Niblack, An Introduction to Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
8. CIE, Commission Internationale de l'Eclairage Proceedings, 1931, Cambridge University Press, Cambridge, 1932.
9. ITU-R Recommendation BT.601, Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios, International Telecommunication Union, Geneva, 1982.
10. ITU-R Recommendation BT.709, Basic Parameter Values for the HDTV Standard for the Studio and for International Programme Exchange, International Telecommunication Union, Geneva, 1990.
11. H. Grassman, Philos. Mag. 4(7), (1854).
12. J. Guild, Philos. Trans. R. Soc. London A230, 149–187 (1931).
13. W. D. Wright, Trans. Opt. Soc. 30, 141–164 (1928).
14. W. K. Pratt, Digital Image Processing, Wiley, New York, 1991.
15. R. W. G. Hunt, The Reproduction of Colour, Wiley, New York, 1957.
16. W. S. Stiles and J. M. Burch, Optica Acta 6, 1–26 (1959).
17. F. J. Bingley, in D. G. Fink, ed., Television Engineering Handbook, McGraw-Hill, New York, 1957.
18. The Science of Color, Crowell, New York, 1953.
19. SMPTE RP 177-1993, Derivation of Basic Television Color Equations, Society of Motion Picture and Television Engineers, White Plains, New York, 1993.
20. P. S. Carnt and G. B. Townsend, Color Television. Vol. 2: PAL, SECAM and Other Systems, Iliffe, London, 1969.
21. D. L. MacAdam, J. Opt. Soc. Am. 27, 294–299 (1937).
22. G. Wyszecki, J. Opt. Soc. Am. 53, 1318–1319 (1963).
23. C.I.E. Colorimetry Committee Proposal for Study of Color Spaces, Tech. Note, J. Opt. Soc. Am. 64, 896–897 (1974).
24. A. H. Munsell, A Color Notation, 8th ed., Munsell Color Company, Boston, 1939.
25. A. H. Munsell, A Grammar of Color, Van Nostrand-Reinhold, New York, 1969.
26. Colorimetry, 2nd ed., Pub. No. 15.2, Commission Internationale de l'Eclairage, Vienna, Austria, 1986.
27. D. L. MacAdam, Color Measurement, Springer-Verlag, Berlin, 1985.
28. R. S. Ledley, M. Buas, and T. J. Golab, Proc. 10th Int. Conf. Pattern Recognition, IEEE Comp. Soc. Press, Los Alamitos, CA, 1, 791–795 (1990).
29. K. R. Castleman, Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1996.
30. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1992.
31. A. K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.
32. K. R. Castleman, Bioimaging 1(3), 159–165 (1993).
33. K. R. Castleman, Bioimaging 2(3), 160–162 (1994).
34. J. J. Sheppard Jr., R. H. Stratton, and C. Gazley Jr., Arch. Am. Acad. Optometry 46, 735–754 (1969).
35. A. Stockman and L. T. Sharpe, Color and Vision Research Laboratories, 1999.
36. C. Poynton, Color FAQs, 1999.
37. D. Bruton, Color Science, 1999.

COLOR PHOTOGRAPHY

JON A. KAPECKI
JAMES RODGERS
Eastman Kodak Company
Rochester, NY

INTRODUCTION

Color photography is a technology by which the visual appearance of a three-dimensional subject may be reproduced on a two-dimensional surface, pleasingly balanced in brightness, hue, and color saturation. There are two essentials in the practice of color photography: the camera and the light-sensitive sensor. The task of the camera is to present an undistorted image to the plane of the sensor that has an intensity level and exposure time appropriate to the sensitivity of the sensor being used. The sensor can be either silver halide based photographic film or an electronic sensor such as a charge-coupled device (CCD). A digital record of the scene may also be obtained by scanning the processed film. The technology of silver halide color photography is discussed herein. The physical record of the image is expected to have high permanence. The image may be viewed directly as a reflection color print, by projection as a color slide, or by back-illumination as a display transparency. A digital image may also be viewed on a video monitor. References 1–20 are general sources of information on the science and technologies supporting color photography (see also PHOTOGRAPHY; COLOR PHOTOGRAPHY, INSTANT PHOTOGRAPHY). A detailed history of color photography may be found in Refs. 1 and 2.

COLOR VISION AND THREE-COLOR PHOTOGRAPHY

Although color has been used for graphic purposes since prehistoric times, it was Isaac Newton's discovery of the solar spectrum in 1666 that led to an understanding of color with regard to the properties of light. From the observation that white light is not a pure entity, but consists of a mixture of colors, the idea developed that any color could be obtained from a mixture of three primary colors. A wide range of colors could be obtained by mixing red, green, and blue lights. At the start of the nineteenth century, the Young–Helmholtz theory of color vision proposed that there were three types of receptors in the human retina; each responds over a certain wavelength range in proportion to the amount of light absorbed. Because the concept of color is meaningless in the absence of an observer, the physiological basis of color is used as a starting point for trichromatic color reproduction.

Human vision is sensitive to electromagnetic radiation in the 400–700 nm range. Figure 1 is an illustration of the colors perceived over this wavelength range, together with the representative spectral sensitivities of human retinal receptors (21). There are no abrupt transitions between the spectral colors, as indicated by the vertical lines, but rather a gradual merging.

Figure 1. Representative spectral sensitivities of the human retinal receptors: (- - - -) scotopic (rod) vision, and β, γ, and ρ cone sensitivities. The wavelengths of the perceived colors are also shown. Reproduced with permission (20).

The human eye contains two main types of light-sensitive cells: rods and cones. These names come from the shape of the outer segment of the cell. The rods are by far the most numerous; they outnumber the cones by a factor of nearly 20 and operate as the main sensors at low illumination levels. Scotopic (rod) vision as shown in Fig. 1 centers at about 510 nm and extends from 400–600 nm. Because there is only one type of rod, it lacks the ability to discriminate colors, and its electrical output varies as the integrated response over this wavelength range. Cones are the primary sensors of color vision, and there is strong evidence that there are three types, shown as ρ, γ, and β. Over most of the visible range, any single wavelength stimulates more than one receptor type. For example, the
peak sensitivities of the ρ and γ receptors, at 580 nm and 540 nm, respectively, are quite close together. The ability to distinguish small differences in spectral wavelength, it is believed, relies on the difference signals between the responses of the three cone detectors. For example, stimulation of the ρ and γ receptors by monochromatic light of 570 nm produces no difference signal and thus is interpreted by the brain as yellow light. However, stimulation of the γ receptor is increased by monochromatic light of 550 nm, that of the ρ receptor is decreased, and the difference signal between the two is interpreted by the brain as greenish light. Most colors consist of many wavelengths of varying intensities; these signals are additive for each receptor.

This understanding of human color vision makes it possible to define the characteristics of a perfect color reproduction system. For simplicity, assume that a reproduction of any uniform patch of color is desired, such that the original and the reproduction are indistinguishable under any illuminant. The first requirement is selecting three sensors that match the cone sensitivity curves shown in Fig. 1. The second requirement is for these sensors to record faithfully the intensity or brightness of the stimulus. The third requirement is that in viewing the reproduction, the physical record of each sensor's response must be capable of stimulating only the cone receptor it is designed to mimic. This last requirement cannot be met in practice because even monochromatic light over most of the visible range stimulates more than one receptor type. Thus, the construction of color vision that permits exquisite wavelength discrimination also makes perfect color reproduction using three-color mixing impossible. With this limitation, the design challenge of color photography is to reduce or correct for unwanted stimulations as much as possible.
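The difference-signal idea can be illustrated with a toy model. The 580 nm and 540 nm peak positions come from the text, but the Gaussian curve shape and the 60 nm width are purely illustrative assumptions; with these symmetric parameters the null falls at 560 nm, midway between the peaks, rather than at the 570 nm quoted for real receptors.

```python
import math

def cone_response(wavelength_nm, peak_nm, width_nm=60):
    """Toy Gaussian model of a cone sensitivity curve (shape and width
    are illustrative assumptions, not measured data)."""
    return math.exp(-((wavelength_nm - peak_nm) / width_nm) ** 2)

def difference_signal(wavelength_nm):
    """rho-minus-gamma difference signal for a monochromatic stimulus:
    positive -> rho dominates (reddish), negative -> gamma dominates
    (greenish), zero -> interpreted as yellow."""
    return cone_response(wavelength_nm, 580) - cone_response(wavelength_nm, 540)
```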
In 1861, James Clerk Maxwell demonstrated that colored objects could be reproduced by photography using three-color mixing (1). He produced three separate negative images in silver on glass plates, ostensibly taken through red, green, and blue filters. After the three negative plates were converted to positive plates, the positives were loaded into three separate projectors, each with the corresponding taking filter in front of the lens, and the images were brought into registration on a white screen. A tolerable color reproduction was obtained. Although considered a landmark, the success of the experiment was fortuitous, because the silver halide crystals used as the light-sensitive element were sensitive only to ultraviolet and blue light. It was subsequently shown (22) that the green exposure resulted from the sensitivity of the silver halide tailing into the blue-green region of the spectrum and that the red exposure resulted from ultraviolet radiation passed by the red filter employed. Thus, the ability to sensitize silver halide crystals to green and red light, in addition to the inherent blue sensitivity, is critical. Intrinsic sensitivity depends on the halide chosen: useful sensitivity does not extend beyond 410 nm for AgCl and 490 nm for AgBr. It is axiomatic that only absorbed radiation can produce chemical action (Grotthus–Draper law), in this case, the
promotion of an electron from the valence to the conduction band of silver halide. It was discovered in 1873 that when certain organic dyes are adsorbed on silver halide crystals, they induce sensitivity to the longer wavelengths of the visible spectrum. In the course of testing photographic plates that contained a yellow dye to prevent back-reflection of light (halation), some sensitivity to green light was observed. Many thousands of dyes have been tested as spectral sensitizers, and materials exist for selective sensitization of any region of the visible spectrum and also the near infrared. Besides permitting better tone reproduction in black-and-white photography, this technology is the key to providing the red, green, and blue discrimination required in modern color photography.

Maxwell's demonstration was an example of an additive trichromatic process: the final image was produced by combining red, green, and blue lights in registration, each light corresponding to its amount in the original scene. The various colors produced by the additive combination of red, green, and blue lights are shown in Fig. 2. To simplify the discussion, blue light is considered the 400–500 nm range of the spectrum, green the 500–600 nm range, and red the 600–700 nm range. White light is produced when all three colors are combined in equal amounts. The two-way combinations of red, green, and blue light produce three colors, each lacking one-third of the spectrum. The color cyan is generated by combining blue and green lights and lacks light in the red part of the spectrum. Similarly, the color magenta lacks light in the green part of the spectrum, and the color yellow lacks light in the blue part of the spectrum. For this reason cyan, magenta, and yellow are called subtractive primaries. Early experimenters focused on the problem of converting the three separation positives produced by Maxwell's method into a form that could be printed with colored inks on a paper surface.
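These additive combinations are easy to verify numerically. This sketch is the digital analogue of Maxwell's three-projector demonstration: each light contributes channel-wise, and the sums are clipped to the 0–255 display range.

```python
def add_lights(*lights):
    """Additive mixing: channel-wise sum of (R, G, B) light intensities,
    clipped to the 0-255 display range."""
    return tuple(min(255, sum(c)) for c in zip(*lights))

# The three additive primaries at full intensity.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
```

Adding any two primaries yields one of the subtractive primaries (cyan, magenta, yellow), and all three together yield white, matching Fig. 2.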
Figure 2. Additive mixing of red, green, and blue lights. See color insert.

Figure 3. Superimposed subtractive dyes. See color insert.

The first clear statements of the basic principles that underlie modern three-color photography
are credited to two independent workers: du Hauron, who applied for a French patent in 1868, and Cros, who wrote an article in Les Mondes in 1869. Both noted that printing colors should be complementary or "antichromatic" to the red, green, and blue taking filters. The complementary colors are the subtractive primaries. Cyan is complementary to red, magenta is complementary to green, and yellow is complementary to blue. Each subtractive primary dye absorbs the color to which it is complementary. Figure 3 illustrates the superposition of equal amounts of the three subtractive primaries as dyes. The three possible two-way combinations yield red, green, and blue. The three-way combination yields black. Thus, the three subtractive primary dyes used singly or together can create a wide variety of colors by selectively modulating the red, green, and blue components in white light.

The Light-Recording Element

The primary element for light capture in color photography is the silver halide crystal. By common usage, the term emulsion has come to denote in the photographic literature what is actually a dispersion of silver halide crystals (grains) in gelatin. Four discrete steps can be identified in forming a colored image: (1) light absorption by the crystal; (2) the solid-state processes that lead to the formation of a latent image; (3) reduction of all or part of a crystal bearing a latent image to metallic silver by a mild reducing agent; and (4) use of the by-products of silver reduction to create a colored image in register by dye formation, dye destruction, or dye transfer.

Sensitivity, or photographic speed, is one of the most important attributes of the light-sensitive element. Practical color photography using a handheld camera is possible in conditions from bright sunlight to night street lighting. These conditions span a factor of about 10^5 in illuminance and must be accommodated by the combination of camera shutter speed, lens aperture, and choice of film speed.
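Returning to the subtractive primaries of Fig. 3: each dye layer multiplies the transmittance of white light band by band. A sketch with idealized "block" dyes (perfect transmission or absorption in each of the R, G, B bands, an obvious simplification of real dye spectra):

```python
def subtractive(*dyes):
    """Subtractive mixing: each dye layer multiplies the per-band
    (R, G, B) transmittance of white light, each band in 0.0-1.0."""
    result = [1.0, 1.0, 1.0]               # white light entering the stack
    for dye in dyes:
        result = [a * b for a, b in zip(result, dye)]
    return tuple(result)

# Ideal block dyes: each absorbs only its complementary band.
CYAN, MAGENTA, YELLOW = (0, 1, 1), (1, 0, 1), (1, 1, 0)
```

Any two-way combination passes a single primary, and all three together pass nothing (black), matching Fig. 3.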
Most commercial emulsions contain a population of silver halide crystals that vary widely in size and shape. Although the structure of the AgBr and AgCl lattice is simple face-centered cubic, an enormous variety of crystal shapes can be obtained, depending on the number and orientation of twin planes in the crystal, the silver ion concentration during growth, and the presence of growth modifiers (23,24). To add to this complexity, the crystals in commercial emulsions usually contain mixed-halide phases. Films suitable for a handheld camera generally contain silver bromoiodide, in which iodide ions are incorporated into the AgBr lattice during crystal growth. The intrinsic sensitivity of silver halide is enhanced during manufacture by heat treatment, usually in the presence of tiny amounts of sulfur and gold compounds, a process known as chemical sensitization. This creates subdevelopable chemical sensitization centers of mixed silver/gold sulfides on the grain surface. The gold incorporated into the latent image formed during exposure enhances the developability of the silver halide crystal and effectively increases its light sensitivity, without affecting the wavelength dependence of sensitivity. The key to efficient chemical sensitization is to trap the photoelectron effectively at a latent-image site while minimizing loss processes such as electron–hole recombination. The important subject of chemical sensitization is dealt with more fully in PHOTOGRAPHIC COLOR DISPLAY TECHNOLOGY.

Sensitizing Dyes

Sensitizing dyes (25,26) are essential to color photography. Although the initial discovery was made in 1873, it was not until the 1930s that photographic scientists began to develop a systematic series of dyes that adsorbed on the silver halide crystal and transferred to it the energy of green and red light necessary to create a latent image.
The most widely used of these materials are the cyanine dyes, which are heterocyclic moieties linked by a conjugated chain of atoms. More than 20,000 dyes of this class have been synthesized. In the example shown in Fig. 4, when n = 0, the dye is yellow and provides sensitization to blue light. Dyes that sensitize in the blue are particularly important for AgCl emulsions, which lack intrinsic blue sensitivity. When n = 1, the dye, a carbocyanine, is magenta and absorbs green light. For n = 2, a dicarbocyanine, the dye appears cyan and sensitizes silver halide to red light. Extending the conjugation further produces dyes that absorb in the infrared and produce films useful for aerial photography and thermal analysis.

There are several requirements for a good sensitizing dye. A good dye is adsorbed strongly on silver halide. The dye molecules attach themselves to the surface of the silver halide crystals, usually up to monolayer coverage. This amount can be determined by measuring the adsorption isotherm for the dye. Dye in excess of this amount can cause the emulsion to lose sensitivity, so-called dye-induced desensitization. In color films, simple positively charged cyanine dyes can also dissolve in the organic phase used to solubilize the image-forming coupler and lead to a phenomenon known as unsensitization. This can be overcome by adding a negatively charged acidic group, such as sulfonate, to turn the dye into a zwitterion that is
Figure 4. (a) Cyanine sensitizing dye structure. (b) Sensitivity curves: (- - - -) intrinsic silver halide; and ( — ) for the dye in (a), where A corresponds to n = 0; B to n = 1; C to n = 2; and D to n = 3.
insoluble in organic media. Adsorption is also influenced by the composition and crystallographic surfaces of the silver halide crystal and by the procedure of dye addition. The optical properties of the dye change when it is attached to silver halide. Typically, the light absorption peak shifts by 30–50 nm to longer wavelengths. Frequently, new absorption peaks occur as a result of different stacking arrangements of the dyes on the crystal surface.

A good sensitizing dye absorbs light of the desired wavelength range very efficiently. The absorption spectrum of the sensitizing dye depends on a number of factors. Each heterocyclic nucleus has a characteristic color value associated with it that can be modified by chemical substitution. The chromophore length is another important variable. A useful empirical rule is that for each increment in n, the absorption peak of the dye shifts to longer wavelengths by 100 nm.

A good sensitizing dye sensitizes very efficiently. A modern sensitizing dye transfers an electron from the dye's excited singlet state to the conduction band of silver halide, where it can subsequently participate in latent-image formation. The dye's ability to serve as an efficient sensitizer is often proportional to the electrochemical reduction potential of the dye measured in solution. This has proved to be a useful tool in designing spectral sensitizers. A practical measure of dye efficiency is the relative quantum efficiency of sensitization (27), defined as the ratio of the number of quanta absorbed in the intrinsic region, usually 400 nm, of the silver halide to the number of quanta absorbed only by dye at a wavelength within its absorption band, both to produce the same specified developed density. For the best dyes this value is only slightly less than 1.0.

A good sensitizing dye does not interfere with other system properties.
Sensitizing dyes can sometimes influence the intrinsic response of a chemically sensitized emulsion and lead to desensitization or additional sensitization. The dye can also interfere with the
development rate, increase or decrease unwanted fog density, and remain as an unwanted stain in the film after processing. The dye should have adequate solubility for addition to the emulsion but should not wander between layers of the final coating.

The relationship between crystal size and photographic speed can be understood by using simple geometric arguments. The sensitivity of an individual crystal is defined as the reciprocal of the minimum light absorption required to generate a developable latent image. For a silver halide crystal without sensitizing dye, blue light absorption is proportional to volume. If it is assumed that the crystal is a sphere and that the latent image can be formed with equal efficiency in all grain sizes, the relationship shown in Fig. 5 is obtained. The adsorption of sensitizing dyes is necessary to confer sensitivity in the green and red regions of the spectrum; this is frequently called "minus-blue" speed. To a first approximation, minus-blue speed depends on the surface area available for dye adsorption. Again assuming sphericity, the line shown in Fig. 5 is the expected change of minus-blue speed with crystal size. Even for the highest speed films, the crystals do not usually exceed 3 µm in linear dimension.

Crystal Morphology

Over the years, obtaining greater uniformity in silver halide crystal size and habit in the grain population has been emphasized, in the belief that the chemical sensitization process can then yield a higher average imaging efficiency. One way of doing this is to adjust the nucleation conditions so that untwinned crystals are favored, and then to ensure that no new crystals are formed during the growth of the starting population. Crystals that contain twin planes grow anisotropically, and it is more difficult to obtain a uniform population.
Figure 5. Calculated relationship between log (relative sensitivity) and log (crystal volume) for ( — ) intrinsic response (blue) and (- - - - ) dyed response (minus blue), assuming crystals are spherical.
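The two lines of Fig. 5 follow directly from sphere geometry; a sketch (the proportionality constants are arbitrary, since only the scaling exponents matter):

```python
import math

def blue_speed(radius, k=1.0):
    """Intrinsic (blue) speed: light absorption scales with crystal volume."""
    return k * (4.0 / 3.0) * math.pi * radius ** 3

def minus_blue_speed(radius, k=1.0):
    """Dyed (minus-blue) speed: limited by the surface area available for
    a monolayer of sensitizing dye."""
    return k * 4.0 * math.pi * radius ** 2
```

Doubling the radius thus raises blue speed by a factor of 8 but minus-blue speed by only a factor of 4, which is why the two lines in Fig. 5 have different slopes on log–log axes.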
Commercial materials are available that contain cubic and octahedral crystals of narrow size dispersity. Along with better control of the crystal size and shape in the population, the placement of iodide in the AgBr crystal has received a great deal of attention. Iodide incorporation increases blue light absorption, and its selective placement within the crystal allows controlling the rate of the development reaction, because iodide also acts as a development inhibitor. In color negative films, the color image is formed by reducing less than one-quarter of the total silver available to metallic silver during development. This strategy is necessary so that the crystals are large enough to have the required sensitivity, yet each crystal by its partial reduction contributes only a small amount of dye to the final image and leads to an image of low graininess. An approach has been devised (28) to break out of the surface-to-volume relationship imposed by crystal shapes that are nearly spherical. Conditions have been
established to favor the growth of crystals that have multiple parallel twin planes (29), and emulsions that contain mostly hexagonal tabular crystals, such as those shown in Fig. 6b, can be prepared. Figure 6 compares the crystals of a newer emulsion to a more traditional one. Tabular thicknesses of about 0.1 µm are commonly employed. By adjusting the projective area and thickness of these tabular crystals, it is possible to create a series of emulsions in which the surface area, and hence minus-blue speed, increases, but the crystal volume, and hence blue speed, remain constant. Because an emulsion that is dyed to be sensitive to green or red light still retains its intrinsic blue sensitivity, the greater minus-blue/blue sensitivity ratio afforded by tabular crystals reduces unwanted blue light sensitivity in those layers of the film designed to record green and red light. Another advantage cited for this technology is improved optical transmission properties of layers that contain tabular emulsions. Because the crystals align themselves nearly parallel to the film support during the coating and drying operations, the transmitted light is less scattered sideways. This is advantageous for image sharpness in multilayer color films.

Color Processes

Additive Mixing
Figure 6. Emulsions: (a) traditional and (b) hexagonal tabular crystals. Scale bars, 1 µm.
The first commercially successful color photographic systems used additive color mixing. Simultaneous recording of red, green, and blue components of an image was achieved in the chromoscope camera in 1898. The image beam was split into three by semitransparent mirrors and prisms; the three beams passed through red, green, and blue filters before striking the emulsion surface. One of the most serious practical difficulties encountered was nonuniformity of intensities at the film plane as a result of the optics of semireflecting mirrors. If different colors are presented rapidly enough to the eye, they are additively fused by the visual system. This principle has been applied to the generation of moving color images, where successive frames contain the red, green, and blue records of the scene. Photography and projection are accomplished by moving the appropriate colored filters synchronously with the image frames. This method places great mechanical demands on the apparatus, however, and requires a projection rate much higher than the normal 24 frames per second. The results are often unsatisfactory because of color fringing of moving objects and flicker from the different transmission characteristics of the filters used. Mixing additive primaries as dyes by superposition does not work because each primary absorbs two-thirds of the spectrum. The two difficulties of registration and additive superposition were overcome by what is known historically as the screen-plate process. The additive primaries are presented to the eye as a mosaic of very small colored dots in juxtaposition, as in pointillist painting. Although close inspection reveals a pattern of colored dots, additive blending occurs by increasing the viewing distance. The retina of the eye is itself a random mosaic of red, green, and blue receptors, and if the dot pattern is fine enough,
the eye interprets the image as smooth. The photographic record is obtained by exposing a silver halide emulsion, which is sensitized throughout the visible spectrum, that is, it is panchromatic, through a mosaic of very small red, green, and blue filters. The film is then processed to give a positive image in silver. When viewed or projected in registration with the original mosaic, a colored image is created. The amount of light transmitted through each filter is controlled by the developed silver optical density. The first commercially successful screen-plate process was the Autochrome Plate made by the Lumiere Co. in France in 1907. The mosaic of filters was integral to the photographic plate; it consisted of starch granules about 15 µm in diameter, which were dyed red, green, and blue. The individually dyed granules were mixed and pressed onto the plate so that their edges touched. The spatial distribution of red, green, and blue granules was random. Any gaps in the mosaic were filled by a carbon paste. This filter screen was then overcoated with the emulsion. Exposure was made through the reverse side so that the exposing light passed through the filters first. The final image was intended for direct viewing or projection. The relative surface areas of the three primary colors were such as to give a satisfactory neutral with the viewing illuminant. The Autochrome process survived commercially until the mid-1930s. Mottle sometimes appeared in the Autochrome image because of clumping of the starch granules. A regular grid of filters gave more satisfactory results, as in Dufaycolor (1908), which employed a very fine square grid of filter elements. The lenticular method (30) was also commercially successful. The reverse side of a panchromatic black-and-white film was embossed with a very fine (about 25 per mm) pattern of cylindrical lenses or lenticules.
The camera had a red filter over the top third of its lens, a green filter over the middle third, and a blue filter over the bottom third. During exposure, the lenticules were parallel to the camera filters. Light from any tiny area of the subject is focused onto the lenticular surface, which faces the lens. The lenticule then focuses a tiny image of the lens aperture with its three filters onto the panchromatic emulsion coated on the reverse side of the film. The relative intensities coming through each filter depend on the color of that tiny area of the subject. This process occurs for every point of the image, resulting in three horizontal bands for each lenticule. In this system, the variables to be optimized were line spacing, the thickness of the support, and the curvature of the lenticules. When the film is given a black-and-white reversal process and the optical path is reversed in a projector where the lens has the same arrangement of filters as the camera, a colored image is obtained. This system was introduced in 1928 by Eastman Kodak Company as Kodacolor and was available until 1935 as a 16-mm motion picture film. At present, the additive process is used in color television, in which light emitted from a tiny regular mosaic of red, green, and blue phosphors blends to give the colored image. Another modern additive color system is Polaroid’s Polachrome 35-mm transparency film that consists of a positive silver image overlying an additive screen that has 394 triplets of red, green,
and blue lines per centimeter of film. However, because additive photographic systems inherently waste light (each additive filter absorbs two-thirds of the light energy), most modern systems rely on the subtractive primaries.

Subtractive Mixing

There are mechanical difficulties in separating a photographic image into three images to record red, green, and blue information, only to recombine them later. Perfect registration of the color information can be preserved in a film if the three-color records are stacked on top of each other on the support. This film structure is known as an integral tri-pack. Photographic systems have been designed in which the three subtractive primary dyes, cyan, magenta, and yellow, can be formed in register, destroyed in register, or transferred in register to create the full-color image. Dye destruction technology, Silver Dye-Bleach, is used in the Cibachrome process. After a black-and-white development step, the film is subjected to an acidic dye-bleach solution that destroys the incorporated azo dyes in proportion to the amount of developed silver and leaves a residual positive color image. Because the presence of light-absorbing dyes during exposure severely limits the photographic speed of these materials, they are used to make display transparencies and prints, usually from a camera transparency original. The azo dyes used in this process offer very good light and dark stability. The first instant color photography system, introduced by the Polaroid Corp. in 1963 as Polacolor, used the transfer of subtractive dyes to a receiver sheet to produce a positive image. The incorporated dye-developers that contain a hydroquinone moiety are soluble in the alkaline activator solution, except where silver development occurs, and then they are immobilized as the quinone form. Another dye diffusion method is the dye transfer system in which three-color separation negatives are prepared from an original positive color transparency.
These are printed onto a special matrix film, which is processed in a tanning developer and washed in hot water to remove unhardened gelatin, giving three positive gelatin relief images. The depth of the gelatin relief is inversely proportional to the original camera exposure received. The corresponding subtractive primary dye is imbibed into each matrix and then transferred in register to a receiver sheet. Technicolor motion picture prints have been made by this process, which is used when exceptional color quality and dye stability are demanded. The first commercially successful film using the in situ formation of three subtractive dyes was Kodachrome film, introduced in 1935. The film has a multilayer structure in which red-sensitive, green-sensitive, and blue-sensitive emulsions are successively coated on the same support. Because the compounds necessary for dye formation are not incorporated in the film, an elaborate process is required to produce each dye in its correct layer. The first step is conventional black-and-white development to give a silver image. In the first commercial Kodachrome process, the silver image was removed (bleached) to leave a reversal image in residual silver halide. As the initial step in color image formation, cyan dye was formed in all layers
by reducing this (residual) silver halide. The film was then dried and subjected to a slowly penetrating bleach solution that decolorized the cyan dye and oxidized the silver in the blue- and green-sensitive layers. The process was repeated for magenta dye formation in the green- and blue-sensitive layers. The film was again dried, and there was subsequent selective bleaching of the magenta image and silver oxidation in the blue-sensitive layer. A final color-forming step generated yellow dye in the blue-sensitive layer. Removal of all image silver then left a positive three-color image. The modern Kodachrome process relies on selective layer exposure and dye formation steps, again using silver reduction to drive the dye formation reactions. Subtractive dye precursors (couplers) that could be immobilized in each of the silver-containing layers were sought, so that dye formation in all layers could proceed simultaneously rather than successively. The first of these to be commercialized were in Agfacolor Neue and Ansco Color films, introduced soon after Kodachrome film. These reversal working films contained colorless couplers that were immobilized (ballasted) by attaching long paraffinic
chains. The addition of sulfonic or carboxylic acid groups provided the necessary hydrophilicity to make them dispersible as micelles in aqueous gelatin. A different approach was taken in Kodacolor film, introduced by Eastman Kodak Company in 1942. The couplers were ballasted but, instead of having hydrophilic functional groups, were dissolved in a sparingly water-soluble oily solvent. This oily phase was then dispersed by high agitation into a gelatin solution as fine droplets less than one micrometer in diameter. Kodacolor film is negative working and was designed to be printed onto a companion color paper, which, because it is also negative working, produces a positive color print. The whole system is known as the negative–positive process. Figure 7 illustrates two ways in which the integral tri-pack can be processed. One leads directly to a positive image; the other leads to a negative image that can be subsequently printed on a negative-working paper. In the positive-working or ‘‘reversal’’ process used in slide films, the first step is black-and-white development to yield a negative silver image. After light or chemical fogging
Figure 7. Positive and negative-working integral tri-packs. See color insert. (The schematic shows the layer order, from top: blue-sensitive emulsion, yellow filter layer, green-sensitized emulsion, red-sensitized emulsion, support; and the processing steps: first developer or color developer, subsequent dye development, and bleach and fix, yielding a color positive or a color negative.)
of the unexposed silver halide, subsequent development is carried out in a color developer that simultaneously reduces the unreacted silver halide and generates dye. Removal of all of the silver leaves the positive color image. The negative-working process uses the initial silver development reaction to drive color formation. Because the camera negative is not the final image, the system tolerates underexposures and overexposures by as much as two stops (a factor of 4), and density differences in the negative can be allowed for in the printing stage. The key process steps for color negative are develop/bleach/fix, which are carried out under carefully controlled temperature and agitation. Water washes follow the bleach and fix steps. The negative–positive system enjoys great commercial success. In 1949, color purity was improved by introducing colored masking couplers to the camera negative film, which partially correct for unwanted absorption of the image dyes (31). Masking couplers account for the yellow-orange color seen in the unexposed parts of modern color negative films after processing. A later improvement was the introduction of development-inhibitor-releasing (DIR) couplers, in which a silver development inhibitor released as a function of exposure in one layer can influence the degree of development in adjacent layers (32). Using masking couplers and DIR couplers in concert substantially improves the quality of color reproduction in the final print. The principal features of an integral tri-pack are shown in Fig. 8. The color records are stacked in the order shown: the red record is on the bottom, the green record next, and the blue record on the top. The blue record is on the top because it is necessary to interpose a blue light filter to remove blue light, which would otherwise form latent images in the underlying red and green records.
Silver halides, except for AgCl, have an intrinsic blue light sensitivity even when spectrally sensitized to green or red. The traditional filter material has been Carey Lea silver, a finely dispersed colloidal form of metallic silver, which removes most of the blue light.
Figure 11. Reaction of ionized coupler and oxidized developer (Devox+) to produce the intermediate leuco dye. If X is a good leaving group, the reaction proceeds spontaneously to dye, and the coupler is said to be two-equivalent. If oxidation by a second molecule of oxidized developer is required, the coupler is four-equivalent.
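The silver bookkeeping behind these terms can be made explicit. In this minimal sketch (the function and variable names are illustrative), each molecule of oxidized developer (QDI) is taken to require the reduction of two silver ions, as stated in the surrounding text:

```python
def silver_ions_per_dye(qdi_per_dye):
    """Silver ions reduced per molecule of image dye formed.
    Each molecule of oxidized developer (QDI) is generated by
    the reduction of two silver ions."""
    return 2 * qdi_per_dye

# Four-equivalent coupler (X = H): one QDI for the coupling step,
# a second QDI to oxidize the leuco dye to image dye.
four_equivalent = silver_ions_per_dye(2)

# Two-equivalent coupler (X = good leaving group): the leuco dye
# eliminates X- spontaneously, so only the coupling QDI is needed.
two_equivalent = silver_ions_per_dye(1)

print(four_equivalent, two_equivalent)  # 4 2
```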
Figure 10. In each image-forming layer, developer oxidized by the exposed silver halide (Devox) reacts with the appropriate coupler to form the corresponding subtractive primary dye, yellow, cyan, or magenta. Ar represents an aryl group, and the various R’s are undefined organic segments.
When X is hydrogen, oxidation of the leuco dye by a second molecule of QDI is required to produce the image dye. Because formation of each mole of QDI requires the reduction of two moles of silver ion (for a total of four moles), such couplers are referred to as ‘‘four-equivalent’’ couplers. However, if X is a good leaving group, such as a halide or a low-pKa phenol or heterocycle, departure of X as an anion to form the dye proceeds spontaneously, and the coupler is said to be ‘‘two-equivalent’’ (45). Although the final dye yield, that is, the ratio of dye formed to silver reduced, depends on the degree of competition in the system, two-equivalent couplers are preferred because they undergo fewer side reactions with oxidized developer and require less silver halide to produce an equivalent amount of dye. Thus, there is an obvious economic benefit, and it can also lead to increased film sharpness by reducing optical scatter by the coated silver halide. The experimentally observed second-order rate constant kc−, for two-equivalent couplers where the conversion of the leuco dye to image dye is rapid, can be equated with kf, the rate constant for nucleophilic attack of coupler anion on oxidized developer. Thus, when the pH of the process is specified, two parameters, pKa and kc−, can be
Figure 12. The relationship between acidity and anion reactivity for naphthol couplers that differ in the 2-position ballast, where A is ballast 1; B, ballast 2; C, ballast 3. For each coupler, data points represent different leaving groups X.
conveniently used to characterize the molecular reactivity of a large variety of photographically well-behaved couplers (42,56). When couplers are grouped together in structurally similar families, such as the naphthols in Fig. 12, it is found that a linear free-energy relationship of the form log kc− = α + β·pKa often exists between the pKa of the coupler and the log kc− of its resulting anion (42); that is, a substituent X that raises the pKa of the coupler, and thus reduces the concentration of active anion at a given pH, also increases the activity of that anion. The constant β is a measure of the trade-off between those two opposing factors on coupler reactivity. For β > 1, increasing the electron-donating ability of the substituent leads to an increase in anion reactivity that more than compensates for the loss in anion concentration. For β < 1, the
opposite is true; overall reactivity results from decreasing the pKa of the coupler, until the pKa drops below the pH of the process. At that point, when the coupler is fully ionized, changes in the molecule that lower pKa do not increase anion concentration but only decrease anion reactivity (56). Studies of the naphthol and phenol couplers indicate that the field and resonance, but not steric, effects of the substituents are important in determining pKa and kc−, suggesting that formation of the leuco dye proceeds through an early transition state where there is overlap between the electron-deficient p orbitals of the developer ring and the electron-rich p orbitals of the naphthol ring. This geometry is quite different from that of the product (57). For coupling reactions where conversion of the leuco dye to image dye is not fast, the rate constant of the reverse reaction, kr, is important. If kr is much less than ke, dye formation is slow, but dye yield can still be high, although the leaving group may not be released in time to affect development. If, however, kr is faster than ke, much of the oxidized developer originally captured by the coupler is lost to other reactions, and dye yield is low (56). The effect of the pH of the processing solution on coupling is most often considered in the context of the coupler, whose structure can be varied, but pH can also affect the activity of the oxidized developer (45,58). In general, oxidized developers fall into three categories. The first, like CD-2, has a constant charge, either cationic or zwitterionic, at normal processing pH values, and coupling rates are little affected by pH changes. The second class exists in cationic or neutral forms as a function of pH. For example, the oxidized form of the commercially important CD-4 reacts reversibly with hydroxide ion to form a neutral pseudobase, which is unreactive toward coupling (43).
Finally, the quinonediimine derived from developers like CD-3 can exist in cationic and zwitterionic forms as a function of pH. Both forms can react with ionized coupler, but at different rates. Though the molecular reactivities of the coupler and the developer are important, both can be severely constrained by the polyphasic nature of the film system, where the coupler is most often embedded in a lipophilic phase and the oxidized developer is generated in the aqueous gelatin phase (45). Because the organic solvent invariably has a lower dielectric constant than the aqueous phase, the effect is to suppress ionization of the coupler and slightly increase the rate at which the positively charged QDI and coupler anion react with one another. The overall result is to reduce the rate of dye formation. For some yellow couplers, the rate-determining step can change from coupling to ionization as a function of solvent or even substituent. The structure of the oxidized developer and its charge can also control its ability to partition into the organic phase (44,58). In some processes, development additives such as benzyl alcohol are added to the developer to increase the hydrophilicity of the organic phase. More frequently, higher pKa couplers are designed to have additional ionizable sites, such as carboxyl, sulfo, or phenolic groups, to accomplish the same end (59).
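The trade-off between coupler acidity and anion reactivity discussed above can be sketched numerically. This is a toy model: α = 0 and the process pH of 10 are arbitrary, illustrative choices, and β = 0.55 and β = 1.3 are simply representative values on either side of 1. The effective rate combines the anion rate constant from the linear free-energy relationship, log kc− = α + β·pKa, with the fraction of coupler ionized at the process pH.

```python
import math

def log_k_observed(pka, ph, alpha=0.0, beta=0.55):
    """log10 of the effective coupling rate constant: the anion rate
    constant from the LFER (log kc- = alpha + beta*pKa) weighted by
    the fraction of coupler present as the reactive anion."""
    log_kc = alpha + beta * pka
    frac_ionized = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return log_kc + math.log10(frac_ionized)

PH = 10.0  # assumed, illustrative process pH

# beta < 1: effective reactivity peaks when pKa is near the pH; once
# the coupler is fully ionized, lowering pKa further only lowers
# anion reactivity without gaining anion concentration.
for pka in (8.0, 10.0, 12.0):
    print(f"beta=0.55  pKa={pka:4.1f}  log k_obs = {log_k_observed(pka, PH):.2f}")

# beta > 1: the gain in anion reactivity more than compensates for
# the loss of ionization, so higher pKa keeps winning.
for pka in (8.0, 10.0, 12.0):
    print(f"beta=1.30  pKa={pka:4.1f}  log k_obs = {log_k_observed(pka, PH, beta=1.30):.2f}")
```

With β < 1 the effective rate is maximal near pKa ≈ pH and falls off on either side, matching the behavior described in the text; with β > 1 it rises monotonically with pKa.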
In any case, the kinetics of such systems can be very complex, depending on the identity of the coupler and developer, and also on the nature of the dispersed organic phase. Often, the rate of coupling is proportional to the total surface area of the droplets that contain the coupler. Less frequently, for example, when the droplet integrity is destroyed in the developer bath, the rate can be independent of droplet size. Surfactants and mechanical milling techniques are used to form these oil-in-water emulsions, or dispersions as they are called in the photographic literature, that are highly controlled in particle size and have the requisite stability for coating and storage. In some cases, dye-forming moieties attached to a polymeric backbone, called a polymeric coupler, can replace the monomeric coupler in coupler solvent (53). In other reports, very small particles of coupler solubilized by surfactant micelles can be formed through a catastrophic precipitation process (60). Though both approaches have the potential to eliminate the need for mechanical manipulation of the coupler phase, as well as detrimental heating to effect coupler dissolution, in practice they have seen only limited application.

Cyan Couplers

Substituted phenols and α-naphthols are the primary classes of cyan dye-forming couplers (Fig. 13). Naphthols of structural types (1) and (2), the 1-hydroxy-2-naphthamides, have proved very useful and are easily and inexpensively prepared. Hydrogen bonding between the naphtholic oxygen and the hydrogen of the 2-amido group shifts the dye hue bathochromically (to longer wavelengths), increases its extinction, and contributes to the dye’s stability. Electron-withdrawing groups on the amide nitrogen also shift the hue bathochromically and increase the extinction coefficient (61). Substitution in the 4-position provides accessibility to image-modifying couplers of various types.
However, some dyes of this class are prone to chemical reduction, which returns them to the colorless leuco form in the presence of ferrous ion during the bleaching step (62). Naphthols of type (3) reportedly show less fade to the leuco dye (63), probably due to stabilization of the dye by internal hydrogen bonding. Phenols of structure (4), it is also claimed, show markedly improved dye stability, both in the presence of ferrous ion and, with a second carbonamido group in the 5-position, to simple thermal fade (64). Numerous substituent variations are described in the literature to adjust dye hue. A perfluoroacylamido group in the 2-position shifts the hues bathochromically and maintains thermal stability of the dyes (65). Phenols of structure (5) are said to show outstanding light stability, which makes them especially suitable for display materials like color paper (66). Some cyan dyes derived from both naphthols and phenols reportedly show thermochromism, a reversible shift in the dye hue as a function of temperature. This can occur in a negative while prints are being made (67). Derivatives of the pyrazolotriazole nucleus, more commonly used in magenta dye-forming couplers (see below), and the related pyrrolotriazole ring have recently been described as new parents for cyan couplers. They have the reported advantage of yielding dyes of very high
Figure 13. Cyan dye-forming couplers (1)–(5), where X can be H, Cl, OAr, OR, or SAr. Ar is aryl. R and R′ are undefined organic segments.
extinction (ε > 60,000). The hue of the dye is shifted into the cyan by using extended conjugation and electron-withdrawing groups (68).

Yellow Couplers

The most important classes of yellow dye-forming couplers are derived from β-ketocarboxamides, specifically the benzoylacetanilides (6) (69) and pivaloylacetanilides (7) (70). Substituents Y and Z can be used to attach ballasting or solubilization groups and to alter the reactivity of the coupler and the hue of the resulting dyes. In the benzoylacetanilides (6), a Y-substituted benzoyl group and a Z-substituted, ballasted anilide flank the coupling carbon that bears X; in the pivaloylacetanilides (7), the benzoyl group is replaced by pivaloyl, (CH3)3CCO. Typical coupling-off groups (X) cited in the literature include H, Cl, OSO2R, OOCR, SO2R, S-aryl, and various carbamoyloxy and cyclic N-acyl groups.
For the widely studied benzoylacetanilides, coupler acidities can be correlated using a two-term Hammett equation that involves substituents in either or both rings. As in the naphthols, there is a linear correlation of the log kc− and pKa values, but β equals 0.55, suggesting that increased reactivity comes with reduced pKa until the coupler is nearly fully ionized (71). The hues of the
resulting dyes can be shifted bathochromically by electron-withdrawing groups in either ring. Ortho substitution in the anilide ring increases the extinction coefficient and narrows the bandwidth of the dyes. Both the couplers and their dyes have significant absorptions in the ultraviolet that offer protection to dyes in the underlying layers (72). The relatively low pKa values for benzoylacetanilides, especially as two-equivalent couplers, minimize concerns over slow ionization rates and contribute to the couplers’ overall reactivity. But this same property often results in slow reprotonation in the acidic bleach, where developer carried over from the previous step can be oxidized and react with the still ionized coupler to produce unwanted dye in a non-image-related fashion. This problem can be eliminated by an acidic stop bath between the developer and the bleach steps or minimized by careful choice of coupling-off group, coupler solvent, or dispersion additives. The second widely used class of yellow couplers is the pivaloylacetanilides (7) and related compounds that bear a fully substituted carbon adjacent to the keto group. The dyes from these couplers tend to show significantly improved light stability, and so these couplers have been widely adopted for use in color papers as well as in many projection materials. In general, the dyes have narrowed bandwidths and less unwanted green absorptions (70). The lack of a second aryl group flanking the active methylene site, however, means that the pKa values of pivaloylacetanilides tend to be considerably higher than those of benzoylacetanilides. As a result, these couplers are rarely used as their four-equivalent parents. Rather, the coupling site is substituted with electron-withdrawing groups to increase the acidity of the coupler or hydrophilic groups to aid in the rate and extent of oil-phase ionization. Both electron-withdrawing groups and hydrophilic groups can also appear in the anilide ring.
An interesting variation on this is the use of polarizable groups, such as C=O or S=O, in the ortho position of an aryloxy group attached to the coupling site. These groups reduce the pKa of the coupler by increasing the rate of the ionization process (73). Another technique for increasing the reactivity of these couplers is reducing the steric
hindrance of the tertiary-butyl group by constraining two of the pendant carbons in a cyclopropyl ring (74). The higher pKa of the coupling site in pivaloylacetanilides does mean that undesirable dye formation in bleach is minimized. A deficiency of many yellow couplers is the relatively low extinction coefficient of the dyes that they form, as low as about 15,000 and usually no higher than 22,000. This can be compared to typical cyan and magenta dyes, whose extinctions are greater than 30,000. To create a neutral in the film, this requires coating considerably more coupler and silver halide, resulting in a thicker, more expensive yellow layer and consequently reduced sharpness. One approach to improved extinction has been to substitute an indolinyl or 2-aryl-3-indolyl group for the t-butyl group of pivaloylacetanilides. These materials combine good reactivity and extinction coefficients as high as 27,000 (75). A second ingenious, albeit more complicated, approach is to make the coupling-off group itself a preformed yellow dye whose hue does not manifest itself until the dye is released on coupling. Thus each coupling event creates two molecules of dye, whose combined extinction is more than 50,000 (76). Other classes of yellow couplers reported in the literature include indazolones (77) and benzisoxazolones (78). Neither of these structures contains an active methylene group; dye formation, it is believed, occurs through a ring-opening process.

Magenta Couplers

For many years, the most widely used magenta couplers have been derived from 1-aryl-2-pyrazolin-5-ones (79). Substituents in the aryl ring or at the 3-position have been used to alter dye hue and stability and to control coupler reactivity. Ballasting groups are usually attached through the 3-position as well. Electron-withdrawing groups at either site tend to shift the hue of the resulting dye bathochromically (80).
The principal absorption of these dyes is in the region from 500 to 570 nm, but most pyrazolinone dyes also show significant unwanted absorptions in the blue. Because these can be minimized by using 3-arylamino or 3-acylamino substituents [see structures (8), (9)] while also increasing the extinction coefficient of the primary absorptions, such couplers have been extensively described in the literature (81). Ar represents an aryl group.
[Structures (8) and (9): 1-aryl-2-pyrazolin-5-ones bearing a 3-arylamino group (8) or a 3-acylamino group (9) and a coupling-off group X, where X can be a nitrogen heterocycle, OOCNHR, OOCR, NHSO2R, SR, S-aryl, or N-aryl.]
Although these pyrazolinone couplers can exist in several tautomeric forms, ionization of the couplers is rapid, and the four-equivalent parents have been widely used for decades. The aryl ring is often trisubstituted in the 2,4,6-positions, and one or more of the substituents are chlorine (82). The pKa values of the 3-arylamino pyrazolinones (8) are higher than those of their 3-acylamino (9) counterparts, and their dyes show hues shifted toward shorter wavelengths and less unwanted blue absorption. Coupler (10) [61354-99-2] is unique because it carries its own stabilizer against photochemical dye fade in the form of the 4-alkoxyphenol ballast, which makes it especially suitable for color paper (83).
[Structure (10): a chlorine-substituted 1-arylpyrazolin-5-one whose ballast incorporates a 4-alkoxyphenol bearing n-C12H25 and tert-butyl substituents.]
The four-equivalent pyrazolinones suffer from a number of disadvantages. Before processing, the couplers can react with ambient formaldehyde to yield a non-dye-forming condensation product. Formaldehyde scavengers, such as ethylenediurea, can be added to control this problem (84). After processing, the residual coupler can react with image dye to form colorless products (85). A formaldehyde stabilizer, which reacts with the residual coupler in the same fashion as the undesirable preprocess reaction, eliminates this problem. Finally, though these couplers react rapidly with oxidized developer, the intermediates undergo side reactions that result in reduced dye yields. The 3-acylamino pyrazolinones are more prone than 3-arylamino couplers to these problems, but both are susceptible (86). The two-equivalent counterparts to these couplers are largely devoid of these problems. Both halogen and sulfo groups are unsatisfactory leaving groups because of various side reactions, whereas the otherwise attractive aryloxy moiety causes the coupler to undergo an often rapid radical chain decomposition (87). Useful coupling-off groups cited in the literature include aryl, alkyl, and hetero thiol groups (88), nitrogen heterocycles (89), and arylazo groups (90). Because the thiols only depress the reactivity of the already low pKa acylamino pyrazolinones, they tend to be most useful on the arylamino couplers. Alkyl thiols are generally reluctant leaving groups in the developer because of their high pKa values, but eventually form image dye in the low pH bleach (56,91). Most of the arylthiol leaving groups cited in the literature show substitution at least in the 2-position. Bis-pyrazolinones, linked through the coupling site by methylene, substituted methylene, sulfide, and disulfide groups, reportedly give two-equivalent couplers, but only one of the couplers yields a dye (92).
A more recent class of magenta dye-forming couplers is the pyrazolo[3,2-c]-s-triazoles (11) and related isomers (93), where X can be Cl, SR, S-aryl, or O-aryl.
COLOR PHOTOGRAPHY
Dyes from this class of couplers are exceedingly attractive; they have good thermal stability and much lower unwanted blue and red absorptions than dyes from the pyrazolinones. However, the high pKa values of the four-equivalent parents translate into unacceptably low reactivity. Two-equivalent analogs that have chloro or aryloxy leaving groups and hydrophilic groups elsewhere in the molecule provide good reactivity and dye yield, are resistant to ambient formaldehyde, and form dyes that do not require postprocess stabilization (94). A related class of pyrazolotriazoles (12) reportedly yields dyes that have improved light stability (95). The higher pKa values of pyrazolotriazole couplers have led to concerns over image variability induced by small changes in developer pH.
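The pH sensitivity noted above follows from simple acid-base equilibrium: the reactive coupling species is the ionized coupler, and its fraction at a given developer pH follows the Henderson-Hasselbalch relation. A minimal sketch with illustrative placeholder pKa values (not measured values for any specific coupler):

```python
# Fraction of coupler present in the ionized (reactive) form at a given
# developer pH, from the Henderson-Hasselbalch relation. The pKa values
# below are illustrative placeholders, not data for specific couplers.

def ionized_fraction(pKa: float, pH: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

developer_pH = 10.0  # color developers are strongly alkaline

for name, pKa in [("low-pKa coupler", 8.0), ("high-pKa coupler", 11.0)]:
    f_nominal = ionized_fraction(pKa, developer_pH)
    f_shifted = ionized_fraction(pKa, developer_pH + 0.1)  # small pH drift
    change = (f_shifted - f_nominal) / f_nominal * 100.0
    print(f"{name}: {f_nominal:.3f} ionized; +0.1 pH shifts this by {change:+.1f}%")
```

A coupler whose pKa sits well below the developer pH is almost fully ionized, so a small pH drift barely changes its reactivity; a coupler whose pKa sits above the developer pH is only fractionally ionized, and the same drift changes the reactive fraction by tens of percent, consistent with the image-variability concern described above.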
[Structures (11) and (12)]

Other classes of magenta dye-forming couplers reported in the literature include pyrazolobenzimidazoles (96) and indazolones (97). The latter are unique because they do not contain an active methylene group and, it is proposed, form magenta dyes with a zwitterionic structure.

Colored Masking Couplers

The dyes produced in chromogenic development have unwanted absorptions. For example, the cyan dye is expected to control or modulate red light alone and thus should absorb only between 600 and 700 nm, but it also shows lesser absorptions in the blue (400–500 nm) and green (500–600 nm) regions. Thus, exposure of the red-sensitive layer of the film produces the desired density to red light in the negative, and also undesirable densities to blue and green light, resulting in desaturation or ‘‘muddying’’ of the color. For materials that are not directly viewed, like a color negative film, masking couplers provide an ingenious solution (31,98). Unlike a normal cyan dye-forming coupler, which is colorless, a cyan masking coupler bears a colored, preformed (usually azo) dye in the coupling-off position. The hue of this dye is chosen to match the unwanted blue-green absorption of the cyan dye that is generated. When coupling occurs, the preformed dye is released and washes out of the film or is destroyed. The result is a negative image formed by the cyan dye with its unwanted absorptions and an entirely complementary positive image left by the preformed dye that remains on the residual coupler (Fig. 14). This is equivalent to a perfect cyan image overlaid with a uniform blue-green density. Because the negative is printed onto color paper using separate cyan, magenta, and yellow exposures, a somewhat longer cyan exposure is required. Similar chemistry is employed to deal with the unwanted blue density of the magenta coupler (99).

Figure 14. Masking coupler used in the cyan layer (density versus log exposure), showing ( — ) the unwanted density to blue-green light that accompanies cyan dye formation, matched by (- - - -) a complementary density to blue-green from the unreacted coupler, and ( — · — ) density to red light.
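Numerically, the masking principle amounts to the released and retained dye fractions summing to a constant unwanted density at every exposure level. A toy illustration with invented density values (not measured data):

```python
# Toy illustration of the masking-coupler principle: wherever cyan dye
# forms, the coupler's preformed blue-green-absorbing dye is released and
# removed, so unwanted green density from the cyan dye plus residual mask
# density is constant everywhere. Density values are invented.

unwanted_green_per_cyan = 0.25  # green density per unit of cyan dye formed
mask_green_per_coupler = 0.25   # green density of the preformed mask dye

for coupled_fraction in (0.0, 0.3, 0.7, 1.0):
    dye_green = coupled_fraction * unwanted_green_per_cyan
    mask_green = (1.0 - coupled_fraction) * mask_green_per_coupler
    total = dye_green + mask_green
    print(f"coupled={coupled_fraction:.1f}  dye={dye_green:.3f}  "
          f"mask={mask_green:.3f}  total green density={total:.3f}")
```

The total green density comes out the same at every exposure level: a uniform blue-green overlay, which the printer compensates with a longer cyan exposure, exactly as the text describes.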
The use of colored masking couplers does not come without a price, however. The preformed dye can absorb light during exposure, reducing effective film speed, and the background density of the masking dye can reduce the dynamic range of the negative. The first problem can be attacked by using a process-removable group, which shifts the hue of the masking dye out of the visible region in the unprocessed film (100).

DIR Couplers

Masking couplers cannot be used for directly viewed materials because of the objectionable color of the mask itself. But similar advantages, and more, can be achieved by using development-inhibitor-releasing (DIR) couplers (32,101). These materials are usually image couplers that carry a silver development inhibitor (In) linked directly or indirectly to the coupling site (Eq. 6). When released as a function of dye formation, the development inhibitors migrate to the silver halide grain and either slow or stop further development. In addition to correcting unwanted dye absorptions, DIR couplers can be used to improve the sharpness and reduce the granularity of films (102).
[Eq. 6: a DIR coupler reacts with oxidized developer to yield image dye plus the released development inhibitor (In); the inhibitors shown are nitrogen-heterocyclic thiols.]

Development inhibitors have long been used to control photographic processes, but the mechanism by which they work remains in dispute. In one model of the developing silver halide grain, the inhibitor reacts with the surface silver ions at the etch pit, a defect region on the crystal where halide ions depart into solution (Fig. 9). This complexation prevents these silver ions from migrating interstitially to the root of the developing silver filament and slows down or stops development (103). Another model has the inhibitor complexing with the silver metal filament, where its insulating properties prevent electron injection by the oxidizing developer (104).

For color correction of the unwanted green absorptions of a cyan dye, for example, a cyan DIR coupler is added to the imaging layer along with the cyan image coupler. The inhibitor is released in proportion to cyan dye formation and is designed to migrate into the adjacent magenta dye-forming layer, where it inhibits silver development and reduces the production of green density to the same degree that unwanted green density is produced by the cyan dye. This is sometimes referred to as a red-onto-green interlayer interimage effect and can occur only when the predominantly red image in the scene has some green component. Similar color correction can originate from the yellow and magenta dye-forming layers, and the overall correction can be described by a 3 × 3 color matrix (101).

DIR couplers can also be used to control granularity, a measure of the visual nonuniformity of the dye image that results from the random distribution of the dye clouds. Granularity, the noise in the photographic signal, is proportional to the density divided by the square root of the number of signal-generating centers, in this case silver halide grains. Increasing the amount of silver halide does not in itself reduce granularity, however, because this also generates more oxidized developer and greater density. A DIR coupler, on the other hand, can permit the coating of more silver halide centers without increased dye density by releasing an inhibitor that allows each grain to develop only partially.

Perhaps the most intriguing use of DIR couplers is their ability to improve the perceived sharpness of an image. If a sharp photographic edge or square-wave signal is imposed on a piece of film, for example, by placing an opaque material over part of the film and exposing it to light, the resulting developed image shows some degradation of the edge, mostly because of light scatter but also because of diffusion processes (Fig. 15a). This ‘‘softening’’ of the edge is perceived as a loss in sharpness. If, however, a DIR coupler is coated with the image coupler, it produces the inhibitor concentration profile of Fig. 15b. The resulting inhibition of development leads to the final dye image of Fig. 15b, in which the density at the edge is enhanced relative to the macro density. This increased rate of density change at the edge is seen as sharpness (102).

Figure 15. Light scatter and chemical diffusion lead to a loss of sharpness at (a) a photographic edge, where ( — ) represents the image and (- - - -) the developed image. In the case of (b) an inhibitor-modified edge, the sharpness of the (- - - -) developed image can be restored by preferentially releasing a development inhibitor in image areas; ( — ) represents the inhibitor concentration profile. Diffusion of the inhibitor accentuates the edge relative to the macro portion of the image.

Although the varied uses for which DIR couplers are employed call for precise control over where the inhibitor diffuses, the very complexation mechanism by which inhibitors work precludes such control. The desired ability to target the inhibitor can be attained by using delayed-release DIR couplers, which release not the inhibitor itself but a diffusible inhibitor precursor or ‘‘switch’’ (Fig. 16) (105). Substituents (X, R) and the structural design of the precursor permit control over both the diffusivity and the rate of inhibitor release.

Figure 16. Reaction of a delayed-release DIR coupler and oxidized developer (Devox). A delayed-release DIR coupler permits fine-tuning of where and when the development inhibitor (In) is generated by releasing a diffusible inhibitor precursor or ‘‘switch’’ as a function of image formation. This permits control of inter- and intralayer effects.

Increasing the effective diffusivity of the inhibitor, however, means that more of it can diffuse into the developer solution, where it can affect film in an undesirable, nonimagewise fashion. This can be minimized by using self-destructing inhibitors that are slowly destroyed by developer components and do not build up or ‘‘season’’ the process (106).

Although most DIR couplers are based on image dye-forming parents, universal DIR couplers have appeared in the literature. These materials react with oxidized developer to produce the inhibitor (or precursor) and either a colorless dye, an unstable dye, or a washout dye (107). Universal DIR couplers could be used in any layer where there is a need to match only image-modifying properties, not hue, to the given layer.
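The granularity argument above can be made concrete. With granularity taken as proportional to density divided by the square root of the number of developing centers, inhibiting each grain so that it develops only partially lets more grains be coated at the same total density, cutting the noise. A sketch under that proportionality (arbitrary units, proportionality constant set to 1):

```python
import math

# Granularity (sigma) modeled as proportional to density / sqrt(N), where
# N is the number of developed silver halide centers. Arbitrary units;
# the constant of proportionality is set to 1 for illustration.

def granularity(density: float, n_centers: float) -> float:
    return density / math.sqrt(n_centers)

# Baseline: N grains developing fully give a total density of 1.0.
base = granularity(1.0, 1000)

# DIR-modified coating: four times as many grains, each inhibited to
# develop only one-quarter as far, so density stays 1.0 while the
# number of signal-generating centers quadruples.
dir_modified = granularity(1.0, 4000)

print(f"baseline sigma     = {base:.4f}")
print(f"DIR-modified sigma = {dir_modified:.4f}  (halved, since sqrt(4) = 2)")
```

Quadrupling the grain count at constant density halves the RMS granularity, which is why partial development via released inhibitor reduces graininess even though more silver halide is coated.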
In addition to inhibitors and masking dyes, similar coupling mechanisms have been used to release other photographically useful fragments, including development accelerators, image dyes, bleaching accelerators, development competitors, and bleaching inhibitors (108). Other mechanisms for imagewise release of inhibitors include inhibitor-releasing developers (IRDs) (13) (109) and the related IRD-releasing couplers (14) (110). The former is of particular interest as a potential means of achieving image modification in reversal systems, where the image structure is determined in a black-and-white development
step before chromogenic development.

[Structures (13) and (14)]
Post-Development Chemistry

The silver and silver halide that remain in the film after color development must be removed to improve the appearance of the color image and to prevent that appearance from changing as silver halide is slowly photoreduced to silver metal. This is generally accomplished in two steps. The first, called bleaching, is an oxidation that converts silver metal to silver salts. The second, called fixing, is solubilization of the silver salts by complexation with a silver ligand. In some processes, particularly those used for color paper, the two can be combined in a single step called a bleach-fix or blix (62,111). The most common color film bleaches are of the rehalogenating type; they contain halide ion, often bromide, to complex the silver ion being formed and drive the reaction to its conclusion. For many years, the most common bleaching agent was ferricyanide, a powerful bleaching agent that can also convert residual leuco dye into image dye (62). However, its high oxidation potential, E0 = 356 mV versus the normal hydrogen electrode (NHE), also makes ferricyanide capable of oxidizing the color developer carried over with the film, which can then couple and form nonimagewise dye. For this reason, an intervening acidic ‘‘stop’’ bath must be used with ferricyanide bleaches. Rehalogenating bleaches based on ferric ethylenediaminetetraacetic acid (ferric EDTA) are lower in potential, E0 = 117 mV versus NHE, do not require a stop bath, and permit a shorter and simpler process (112). The bleaching efficiency of ferric EDTA can be improved by lowering the solution pH below 6. However, as the pH is dropped, the propensity for reducing indoaniline cyan dyes to their colorless leuco form increases. Thus, most iron ligand bleaches are designed to operate in a window bounded by the pH of the bath and the oxidation potential of the bleaching agent.
The rates of most color bleaching processes are limited by diffusion over at least part of the reaction. Color negative films tend to pass from a diffusion-limited into a chemically limited regime as sensitizing dye and other passivating species accumulate on the remaining silver surfaces (113). Persulfate anion is used as a bleaching agent in some motion picture film processes (114). Although it is thermodynamically attractive because of its very high oxidation potential (E0 = 2010 mV vs NHE), persulfate is a kinetically slow bleach in the absence of a catalyst. Commonly, a prebath containing dimethylaminoethanethiol [108-02-1] is used; the use of bleach accelerator-releasing couplers or blocked bleach accelerators incorporated in the film has also been proposed (108,115). Thiosulfate, usually as its sodium or ammonium salt, is almost universally employed as the fixing agent for color films (111). Thiocyanate can be used as a fixing accelerator. Fixing performance is often defined by two parameters: the clearing time necessary to dissolve the silver halide and render the film optically transparent, and the fixing time required to remove the complexed silver from the film. Fixing must be complete, and the fixing agents must be thoroughly washed from the film, because thiosulfate can destroy image dye by reduction or by other reactions during long-term storage. The complexation constants of thiosulfate with silver ion are sufficiently large to keep silver salts from being redeposited on the film when diluted in the wash.

Film Quality

Speed

Standards for photographic speed are now coordinated worldwide by the International Standards Organization (ISO) in Geneva, Switzerland. ISO speed is determined under specified conditions of exposure, processing, and measurement. Standards are published for color negative and color reversal films (116,117). In amateur color photography, the most popular film speeds are in the ISO 100–400 range.
For modern 35-mm cameras, these speeds offer a good compromise between image quality, depth of field, and ability to arrest motion, particularly when coupled with electronic flash. Several high-speed color films of ISO 1000 or greater, now available from primary manufacturers, are recommended for conditions of low illumination or fast subject motion. The larger silver halide grains necessary for high speed become progressively less efficient in converting absorbed radiation into a latent image, and in practice the linear relationship shown in Fig. 5 rolls off at the larger sizes. Image graininess becomes objectionable if high magnification is required. Sensitivity to ambient ionizing radiation of both cosmic and terrestrial origin also increases with crystal size, resulting in a proportion of crystals fogging during storage and increasing graininess (118,119). The human visual system can adapt to changes in the spectral balance of the scene illuminant. For example, a gray color appears gray under both daylight and tungsten illumination. However, a color film designed for use in daylight produces prints that have a yellowish cast if used with tungsten illumination. This effect can be reduced by
placing an appropriate filter over the camera lens or by adjusting light filtration when printing a color negative. In practice, designing a film for a particular illuminant involves adjusting its sensitivity to blue, green, and red spectral light to account for the relative abundance of each in that illuminant. Most amateur films are balanced for optimum performance in daylight at a correlated color temperature of 5500 K (4,21).

Color Reproduction

Color has three basic perceptual attributes: brightness, hue, and saturation. Saturation relates to the amount of the hue that is exhibited. The primary influences on the color quality of the final image are the red, green, and blue spectral sensitivities of the film or paper and the spectral absorption characteristics of the image dyes. If the green record is designed to match the sensitivity of the eye’s γ receptor and the red record the ρ receptor (Fig. 1), channel overlap is so severe that a satisfactory reproduction cannot be obtained. In practice, the peak sensitivity of the red record is moved to longer (ca 650 nm) wavelengths to reduce the overlap. The penalty is that colors that have strong reflectances at 650 nm or longer appear excessively reddish in the reproduction. Ideally, subtractive image dyes should exhibit no absorption outside the spectral ranges intended. In reality, considerable unwanted absorptions occur even in the best dyes, and colored masking couplers (see Fig. 14) are designed to alleviate this problem. However, to avoid serious speed losses, a colored coupler is normally employed only under layers in the film that absorb in a similar range. Thus, a magenta-colored coupler, which absorbs green light, is used in the cyan pack; a yellow-colored coupler, which absorbs blue light, may be used in the magenta or cyan packs.
Color may also be corrected by selectively suppressing dye generation in one color record as a function of exposure in another, using a diffusible development inhibitor generated by a development-inhibitor-releasing (DIR) coupler. Masking couplers and DIR couplers are the main tools for achieving the high color saturation of modern color negative films. Reversal films are designed so that the camera film itself contains the final image, so they are subject to additional constraints. The minimum densities in the image must be low to accommodate scene highlights, and colored masking couplers cannot be used. In addition, because the color development step is exhaustive, that is, it develops all of the residual silver halide, it is difficult to gain color correction from DIR couplers, which rely on influencing relative development rates in a system that is only partially developed. Some degree of interlayer development inhibition can be obtained in the black-and-white developer by the migration of iodide released from the developing silver halide (120). Factors important in designing modern color reversal films include the ability to recover good quality images from underexposed film (push processing), high emulsion efficiency for good speed and grain, and film technology for improved sharpness (121).
Dye Stability

The dyes used in photographic systems can degrade over time, both by thermal reactions and, if the image is displayed for extended periods, by photochemical processes. The relative importance of these two mechanistic classes, known as dark fade and light fade, respectively, depends on the way the product is used (122). Meaningful evaluation of dye stability on a practical time scale is difficult because the reactions themselves are by design either slow or inefficient. The t1/10 value, defined as the time for a 0.1 density loss from a dye patch of 1.0 initial density, ranges for dark fade from 30 to greater than 100 years for chromogenically generated dyes, whereas the quantum yields of photochemical fading are often on the order of 1 × 10−7. Accelerated testing conditions that involve high-temperature storage or high-intensity illumination must therefore be used (123), although these are sometimes unreliable predictors of ambient fade (124). The perceived color stability of a photographic system is usually limited by the fading of its least stable dye, which can produce an undesirable shift in color balance. Although recovery of such faded images is often possible, a so-called neutral fade, in which all three color records lose density at approximately the same rate, is usually preferred. For light fade, the magenta dye has usually been limiting. Numerous studies support the hypothesis that the fading mechanism is photooxidative (125). Stabilization techniques have included exclusion of oxygen by barrier or encapsulation technologies; elimination of ultraviolet light by incorporated UV absorbers; quenching of one or more of the excited-state species in the reaction sequence, that is, dye singlet, dye triplet, or singlet molecular oxygen, by incorporated quenching agents; and scavenging of reactive intermediates such as free radicals. Cyan light fade includes both reductive and oxidative components (126).
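The t1/10 figure can be turned into a rough fade projection. Assuming simple first-order kinetics, D(t) = D0·exp(−kt), which is an illustrative assumption only (real dye fade need not follow first-order behavior), the quoted t1/10 range brackets the expected density loss over decades:

```python
import math

# Rough first-order dark-fade model: D(t) = D0 * exp(-k t). This kinetic
# form is an illustrative assumption, not an established fade law.

def rate_from_t110(t110_years: float, d0: float = 1.0) -> float:
    # t1/10: time for a 0.1 density loss from an initial density of 1.0
    return math.log(d0 / (d0 - 0.1)) / t110_years

for t110 in (30.0, 100.0):
    k = rate_from_t110(t110)
    d_50yr = 1.0 * math.exp(-k * 50.0)
    print(f"t1/10 = {t110:5.0f} yr -> k = {k:.5f}/yr, "
          f"density after 50 yr = {d_50yr:.3f}")
```

Under this model, a dye at the poor end of the quoted range (t1/10 = 30 years) retains about 0.84 of its initial unit density after 50 years, while one at the good end (t1/10 = 100 years) retains about 0.95, illustrating why the least stable dye dominates perceived color balance.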
The dark fade of yellow dyes is largely a hydrolytic process (127). Cyan dark fade results mostly from reduction of the dye to its leuco form. For the magenta record, dark storage can be dominated by yellowing reactions of the residual coupler. A light-induced yellowing has also been observed. These problems have been successfully addressed by the design of new couplers and coupling-off groups. The stabilities of modern photographic products are vastly improved over those of the past. For example, the magenta light stability of color negative print papers has improved by about two orders of magnitude between 1942 and the present (128). Although such products offer very acceptable image stability in most customer uses, other processes, such as silver dye bleach and dye-transfer processes, can offer even greater stability, albeit at a significant cost in processing convenience.

Image Structure

Because the primary photographic sensors are a population of silver halide crystals whose spatial distribution is random, the final image is also particulate. In black-and-white photography, the image consists of opaque deposits of silver. In chromogenic photography
using incorporated couplers, the image is formed by the coupler-containing hydrophobic droplets dispersed with the silver halide. The droplet size, typically 0.2 µm in diameter, is usually less than the crystal size. Dye is formed in a cloud of droplets around each developing crystal as oxidized developer is released. Individual droplets cannot be resolved under usual viewing conditions, but the dye clouds can be seen under magnification and convey a visual sensation of nonuniformity or graininess. The objective correlate of graininess is granularity, the spatial variation in density observed when numerous readings are taken on a uniformly exposed patch using a densitometer that has a very small aperture. The distribution of such measurements is approximately Gaussian and can be characterized by its standard deviation, σD. This quantity, called root-mean-square (RMS) granularity, is published by film manufacturers as a figure of merit. The features that differentiate a color image from a black-and-white image are the transparency of the dyes and the spreading of the dye cloud around the developing crystal. Dye clouds continue to grow during development as the oxidized developer diffuses farther to find unreacted coupler. Overlapping of adjacent dye clouds reduces granularity as more and more of the voids are filled in. The size of the dye clouds can be controlled by reducing the amount of silver development per crystal through development-inhibitor-releasing couplers, or by including a soluble coupler in the developer that competes with the incorporated coupler for oxidized developer (129); the dye it forms is subsequently washed out. Sharpness is the edge detail in the final image. There are many opportunities to lose sharpness in a photographic system. Camera focus, depth of field, subject movement, camera movement during exposure, and the optical properties of the film all contribute to image sharpness.
In the negative–positive system, printer focus and the optics of the color paper stock are also important. Modern multilayers of the type illustrated in Fig. 8 are typically 20–25 µm thick. As light penetrates the front surface, it is scattered by the embedded silver halide crystals, which differ in refractive index from the supporting gelatin medium. This optical spread increases with depth, and thus the uppermost blue-sensitive layer records the sharpest image and the lowest red-sensitive layer the least sharp image. Over the years, film manufacturers have reduced coated thickness to improve sharpness and reduce material costs. Absorbing dyes may also be included in the film to reduce sideways light scatter, although at some cost in speed. These are generally removed by the processing solutions. Another method of improving sharpness is to change the layer order. Motion picture theatrical print film, for example, has the green record on top to improve sharpness, because the eye is most sensitive to differences in midspectrum light. The sharpness of a film is often assessed by the modulation transfer function (MTF), which measures how sinusoidal test patterns of different frequencies are reproduced by the photographic material (130). A perfect reproduction has a MTF of 100% at all frequencies. In practice, a decrease of MTF due to increasing frequency occurs as a result of optical degradation. Films that contain couplers
that release development inhibitors can display MTF values above 100% at low frequencies because of edge effects that increase the output signal amplitude beyond that of the reference signal. The transport of inhibitor fragments across image boundaries leads to development suppression or enhancement on a microscopic scale (Fig. 15). Such edge effects are particularly useful in boosting the apparent sharpness of the lower layers of the film, which are most degraded by optical scatter. However, if chemical enhancement of edges is carried too far, the effects appear unnatural.

Environmental Aspects

Photographic processing (photofinishing) is a geographically dispersed chemical industry. An increasing proportion of images is processed in minilabs rather than in large, centralized photofinishing laboratories, so manufacturers have responded to the need for more environmentally benign chemistry. Processing machines are rarely emptied and refilled with fresh solutions but instead require replenishment because of chemical use, evaporation, and overflow. New films have been designed that make the use of coated silver more efficient and thus require less processing chemistry. The solution overflow that occurs is usually chemically regenerated and returned to the tank. Silver is most commonly recovered by electrolysis or metallic replacement from the processing solutions, or by ion exchange from the wash water (131). Loss of chemicals from one tank into the next has been minimized. The color paper process has progressed from five chemical solutions, three washes, and a replenishment rate of 75 µL/cm2 (70 mL/ft2) for each of the five solutions to two chemical solutions, one wash, and replenishment rates of 6 µL/cm2 (6 mL/ft2) and less than 3 µL/cm2 (3 mL/ft2). For color negative films, developer replenishment has dropped from more than 300 to 43 µL/cm2 (40 mL/ft2). Regeneration of the now reduced overflow has decreased chemical discharge by as much as 55% (132).
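The paired replenishment figures quoted above can be cross-checked with a simple unit conversion (1 ft² = 929.03 cm², 1 mL = 1000 µL):

```python
# Cross-check the paired replenishment-rate figures quoted in the text:
# a rate in microliters per square centimeter converted to milliliters
# per square foot (1 ft^2 = 929.03 cm^2; 1 mL = 1000 uL).

CM2_PER_FT2 = 929.03

def ul_per_cm2_to_ml_per_ft2(rate_ul_cm2: float) -> float:
    return rate_ul_cm2 * CM2_PER_FT2 / 1000.0

for rate in (75.0, 6.0, 3.0, 43.0):
    print(f"{rate:5.1f} uL/cm^2  ~  {ul_per_cm2_to_ml_per_ft2(rate):5.1f} mL/ft^2")
```

The conversion reproduces the text's rounded pairings: 75 µL/cm² is about 69.7 mL/ft² (quoted as 70), and 43 µL/cm² is about 39.9 mL/ft² (quoted as 40).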
The new chemistry of the RA-4 paper process permits elimination of benzyl alcohol from the developer and a 50% reduction in the biological and chemical oxygen demand (BOD/COD) of the film/paper effluent. Substitution of the more powerful ferric 1,3-propylenediaminetetraacetic acid (ferric 1,3-PDTA) bleaching agent for ferric EDTA allowed a 60% reduction in both iron and ligand concentrations and total elimination of ammonia from the bleaching formulation (133). Although ferric EDTA and related materials are rapidly photodegraded in the environment (134), concerns over the potential for persistent chelating agents to translocate heavy metals led to the development of photobleaches based on biodegradable chelating agents such as methyliminodiacetic acid (MIDA) and ethylenediaminedisuccinic acid (135,136). During the period 1985–1998, some components of the effluent were reduced to near zero, and an overall reduction in effluent concentration of as much as 80% was attained (132). Some of these reductions have been achieved by using solutions of highly concentrated, precisely metered chemicals, culminating in a system that uses dry tablets to deliver the replenishment chemistry and plain water as the only liquid.
Economic Aspects

At present, more than 3 billion rolls of film are processed worldwide every year. Despite its great size, the consumer photographic business still has tremendous opportunity for growth. The countries whose consumption is highest (the U.S. and Japan) average 7–8 rolls of film per household per year, whereas in developing markets such as China and Russia, film use averages less than one-half roll per household per year. The last two decades have seen an increasing dominance of the 35-mm color negative at the expense of the color slide because of its greater exposure latitude, the convenience of viewing prints, and the ease of obtaining duplicate prints. The growth in the popularity of picture taking has been fueled by the increasing sophistication of 35-mm point-and-shoot cameras. These offer features such as autoexposure, autofocus, motor drive, and built-in flash, all within a very compact camera body. The single-use camera, in which the film box itself is equipped with a lens and shutter, has also been very successful. In 1996, a camera-film format called the Advanced Photo System (APS) was introduced as a new industry standard. This has a smaller negative size than 35-mm film but offers advantages to the customer such as easy loading, smaller camera size, a choice of three print formats including panoramic, and an index print with each order. APS negatives are returned in their original cassette and provide a convenient interface to a film scanner, and therefore to images in digital form. The relative volumes of different film speeds have changed over the years; there is a trend toward higher speeds because of the photographic flexibility that they offer in improved ability to arrest motion, better zoom capability, and better depth of field.
For example, dividing the color negative population into two categories — those of 400 speed and higher and those of 200 speed and lower — the relative volume of the higher speed films increased by about 50% from 1995 to 1999. The photofinishing industry has been growing at more than 5% annually. Since the mid-1970s, a shift has occurred that favors local minilabs over large centralized processing laboratories. Although minilabs are generally more expensive, consumers appreciate the convenience, rapid access, and personal service. It became clear in the 1990s that the photographic business was going through a period of transition, driven both by the threats and by the opportunities of digital technology. Digital still imaging is reaching the consumer through digital still cameras based on charge-coupled device (CCD) sensors. The price of an equivalently featured digital camera is still considerably higher than that of a film camera, but prices are declining rapidly. An advantage of digital photography is that the image can be modified and transmitted easily, and people are becoming aware that there is more they can do with their pictures. However, high-resolution scanning puts silver halide film on the same playing field as electronic capture for image manipulation and reuse for different purposes. The scanning or ‘‘digitization’’ of conventional films and papers is becoming an increasingly important pathway in the industry. In the future, every film may
be scanned as a matter of course after processing. Stand-alone ‘‘kiosks’’ provide a way to scan, copy, or enhance conventional photographic prints; there are now more than 20,000 such kiosks in the United States. Because of the growth of the Internet, many new businesses have started up offering photographic services, including storage of customers’ pictures. It is too early to judge the long-term viability of their business models. However, it is clear that convenient hard-copy output will be even more important in the future as people learn that they can use their images for different purposes. There will be more options for getting high-quality prints — through wholesale photofinishing, minilabs, inkjet printing at home, and the Internet. The professional segment of color photography includes portrait and wedding photography, advertising photography, and photojournalism. Films tailored to these markets are offered by the major photographic film manufacturers. Most photojournalists now use digital still cameras because they need rapid access to the image. The motion picture industry has enjoyed continued growth in the last two decades. In spite of videocassette recorders, demand remains strong for the theatrical experience of first-run movies. This experience is enhanced by the improved picture quality of 70-mm origination and projection. Color negative film is often the preferred medium for prerecorded television shows and advertising; telecine transfer converts the images to electronic form for transmission.
ABBREVIATIONS AND ACRONYMS

APS    Advanced Photo System
BOD    Biological Oxygen Demand
CCD    Charge-Coupled Device
CD     Color Developer
COD    Chemical Oxygen Demand
DIR    Developer Inhibitor Releasing (Coupler)
Dox    Oxidized Developer
EDTA   Ethylenediaminetetraacetic Acid
IRD    Inhibitor Releasing Developer
ISO    International Standards Organization
MIDA   Methyliminodiacetic Acid
MTF    Modulation Transfer Function
pAg    Negative of the log of the silver ion concentration
PDTA   1,3-Propylenediaminetetraacetic Acid
PPD    para-Phenylenediamine
QDI    Quinone diimine
QMI    Quinone monoimine
RMS    Root Mean Square
BIBLIOGRAPHY

‘‘Color Photography’’ under ‘‘Photography’’ in ECT, 1st ed., vol. 10, pp. 577–584, by T. H. James, Eastman Kodak Company; ‘‘Color Photography’’ in ECT, 2nd ed., vol. 5, pp. 812–845, by J. R. Thirtle and D. M. Zwick, Eastman Kodak Company; in ECT, 3rd ed., vol. 6, pp. 617–646, by J. R. Thirtle and D. M. Zwick, Eastman Kodak Company; in ECT, 4th ed., vol. 6, pp. 965–1002, by J. A. Kapecki and J. Rodgers, Eastman Kodak Company.
1. E. J. Wall, History of Three-Color Photography, American Photographic, Boston, MA, 1925. 2. J. S. Friedman, History of Color Photography, 2nd ed., Focal Press, London, 1968. 3. R. M. Evans, W. T. Hanson, and W. L. Brewer, Principles of Color Photography, Wiley, NY, 1953. 4. R. W. G. Hunt, The Reproduction of Colour, 4th ed., Fountain Press, Tolworth, UK, 1987. 5. T. H. James, ed., Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977. 6. J. Sturge, V. Walworth, and A. Shepp, eds., Imaging Processes and Materials — Neblette’s Eighth Edition, Van Nostrand Reinhold, NY, 1989. 7. J. M. Eder, History of Photography, Columbia University Press, NY, reprinted by Dover, NY, 1978. 8. W. T. Hanson, Photogr. Sci. Eng. 21, 293–296 (1977). 9. A. Weissberger, Sci. Am. 58, 648–660 (1970). 10. P. Glafkides, Photographic Chemistry, vol. II, Fountain Press, London, 1960. 11. G. F. Duffin, Photographic Emulsion Chemistry, Focal Press, London, 1966. 12. G. Haist, Modern Photographic Processing, Wiley, NY, 1979. 13. L. F. A. Mason, Photographic Processing Chemistry, Focal Press, London, 1966. 14. F. W. H. Mueller, Photogr. Sci. Eng. 6, 166 (1962). 15. J. Pouradier, in E. Ostroff, ed., Pioneers of Photography, SPSE, Springfield, VA, 1987, Chap. 3. 16. T. J. Dagon, J. Appl. Photogr. Eng. 2, 42 (1976). 17. R. L. Heidke, L. H. Feldman, and C. C. Bard, J. Imag. Technol. 11, 93 (1985). 18. P. Krause, Modern Photogr. 48(4), 72 (1984); ibid. 49(11), 47 (1985). 19. R. D. Theys and G. Sosnovsky, Chem. Rev. 97, 83 (1997). 20. T. Tani, Photographic Sensitivity, Oxford University Press, NewYork, 1995. 21. R. W. G. Hunt, Measuring Color, Ellis Horwood, Chichester, UK, 1987, p. 21. 22. R. M. Evans, J. Photogr. Sci. 9, 243 (1961). 23. C. R. Berry, in Ref. 5, p. 98. 24. J. E. Maskasky, J. Imag. Sci. 30, 247 (1986). 25. B. H. Carroll, Photogr. Sci. Eng. 21, 151 (1977). 26. S. Dahne, Photogr. Sci. Eng. 23, 219 (1979). 27. J. Spence and B. H. Carroll, J. Phys. Colloid Chem. 52, 1,090 (1948). 
28. J. T. Kofron and R. E. Booms, J. Soc. Photogr. Sci. Technol. Jpn. 49(6), 499 (1986). 29. U.S. Pat. 4,439,520 (1984), J. T. Kofron and co-workers (to Eastman Kodak Company). 30. C. E. K. Mees, J. Chem. Ed. 6, 286 (1929). 31. W. T. Hanson, J. Opt. Soc. Am. 40, 166 (1950). 32. C. R. Barr, J. R. Thirtle, and P. W. Vittum, Photogr. Sci. Eng. 13, 214 (1969). 33. U.S. Pat. 4,923,788 (1990), L. Shuttleworth, P. B. Merkel, G. M. Brown (to Eastman Kodak Company). 34. Ger. Pat. 253,335 (1912), R. Fischer, R. Fischer, and H. Siegrist, Photogr. Korresp. 51, 18 1914. 35. L. E. Friedrich and J. E. Eilers, in F. Granzer and E. Moisar, eds., Progress in Basic Principles of Imaging Systems, Vieweg & Sohn, Wiesbaden, Germany, 1987, p. 385. 36. R. L. Bent and co-workers, J. Am. Chem. Soc. 73, 3,100 (1951).
37. L. K. J. Tong and M. C. Glesmann, J. Am. Chem. Soc. 79, 583,592 (1957). 38. J. Eggers and H. Frieser, Z. Electrochem. 60, 372,376 (1956). 39. L. K. J. Tong, M. C. Glesmann, and C. A. Bishop, Photogr. Sci. Eng. 8, 326 (1964); L. K. J. Tong and M. C. Glesmann, Photogr. Sci. Eng. 8, 319 (1964); R. C. Baetzold and L. K. J. Tong, J. Am. Chem. Soc. 93, 1347 (1971). 40. J. C. Weaver and S. J. Bertucci, J. Photogr. Sci. 30, 10 (1988). 41. E. R. Brown, in S. Patai and Z. Rappoport, eds., The Chemistry of Quinonoid Compounds, vol. 2, Wiley-Interscience, NY, 1988. 42. L. K. J. Tong and M. C. Glesmann, J. Am. Chem. Soc. 90, 5,164 (1968). 43. L. K. J. Tong, M. C. Glesmann, and R. L. Bent, J. Am. Chem. Soc. 82, 1,988 (1960). 44. J. Texter, D. S. Ross, and T. Matsubara, J. Imag. Sci. 34, 123 (1990). 45. L. K. J. Tong, in Ref. 5, pp. 345–351. 46. E. R. Brown and L. K. J. Tong, Photogr. Sci. Eng. 19, 314 (1975). 47. J. Texter, J. Imag. Sci. 34, 243 (1990). 48. U.S. Pat. 2,108,243 (1938), B. Wendt (to Agfa Ansco Corp.); U.S. Pat. 2,193,015 (1940), A. Weissberger (to Eastman Kodak Company); Brit. Pat. 775,692 (1957), D. W. C. Ramsay (to Imperial Chemical Ind.); Fr. Pat. 1,299,899 (1962), W. Pelz and W. Puschel (to Agfa-Gevaert), for examples. 49. C. A. Bishop and L. K. J. Tong, Photogr. Sci. Eng. 11, 30 (1967). 50. R. Bent and co-workers, Photogr. Sci. Eng. 8, 125 (1964). 51. F. W. H. Mueller, in S. Kikuchi, ed., The Photographic Image, Focal Press, London, 1970, p. 91. 52. J. R. Thirtle, Chemtech. 9, 25 (1979), for a tutorial. 53. Brit. Pat. 701,237 (1953), (to DuPont); U.S. Pat. 3,767,412 (1973), M. J. Monbaliu, A. Van Den Bergh, and J. J. Priem (to Agfa-Gevaert); Brit. Pat. 2,092,573 (1982), M. Yagihara, T. Hirano, and K. Mihayashi (to Fuji Photo Film); U.S. Pat. 4,612,278 (1986), P. Lau and P. W. Tang (to Eastman Kodak Company). 54. F. Sachs, Berichte 33, 959 (1900). 55. C. A. Bishop and L. K. J. Tong, J. Phys. Chem. 66, 1,034 (1962). 56. J. A. Kapecki, D. S.
Ross, and A. T. Bowne, Proc. Int. East-West Symp. II, The Society for Imaging Science and Technology, Springfield, VA, 1988, p. D-11. 57. L. E. Friedrich, unpublished data, 1983. 58. T. Matsubara and J. Texter, J. Colloid Sci. 112, 421 (1986); J. Texter, T. Beverly, S. R. Templar, and T. Matsubara, ibid. 120, 389 (1987). 59. U.S. Pat. 4,443,536, 1984, G. J. Lestina (to Eastman Kodak Company). 60. Brit. Pat. 1,193,349, 1970, J. A. Townsley and R. Trunley (to Ilford, Ltd.); W. J. Priest, Res. Discl. 16468, 75–80, 1977; U.S. Pat. 4,957,857, 1990, K. Chari (to Eastman Kodak Company). 61. C. Barr, G. Brown, J. Thirtle, and A. Weissberger, Photogr. Sci. Eng. 5, 195 (1961). 62. K. H. Stephen, in Ref. 5, pp. 462–465. 63. U.S. Pat. 4,690,889, 1987, N. Saito, K. Aoki, and Y. Yokota (to Fuji Photo Film).
64. U.S. Pat. 2,367,531, 1945,I. L. Salminen, P. Vittum, and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,423,730, 1947, I. L. Salminen and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 4,333,999, 1982, P. T. S. Lau (to Eastman Kodak Company). 65. U.S. Pat. 2,895,826, 1959, I. Salminen, C. Barr, and A. Loria (to Eastman Kodak Company). 66. U.S. Pat. 2,369,929, 1945, P. Vittum and W. Peterson (to Eastman Kodak Company); U.S. Pat. 2,772,162, 1956, I. Salminen and C. Barr (to Eastman Kodak Company); U.S. Pat. 3,998,642, 1976, P. T. S. Lau, R. Orvis, and T. Gompf (to Eastman Kodak Company). 67. E. Wolff, in Proc. SPSE 43rd Conf., The Society for Imaging Science and Technology, Springfield, VA, 1990, pp. 251–253. 68. Eur. Pat. Apps. 744,655 and 717,315, 1996, S. Ikesu, V. B. Rudchenko, M. Fukuda Y. Kaneko (to Konica Corp.) 69. U.S. Pat. 2,407,210, 1946, A. Weissberger, C. J. Kibler, and P. W. Vittum (to Eastman Kodak Company); Brit. Pat. 800,108, 1958, F. C. McCrossen, P. W. Vittum, and A. Weissberger (to Eastman Kodak Company); Ger. Offen. 2,163,812, 1970, M. Fujiwhara, T. Kojima, and S. Matsuo (to Konishiroku KK.); Ger. Offen. 2,402,220, 1974, A. Okumura, A. Sugizaki, and A. Arai (to Fuji Photo Film); Ger. Offen. 2,057,941, 1971, I. Inoue, T. Endo, S. Matsuo, and M. Taguchi (to Konishiroku KK.); Eur. Pat. 296793 A2, 1988, B. Clark, N. E. Milner, and P. Stanley (to Eastman Kodak Company); Eur. Pat. 379,309, 1990, S. C. Tsoi (to Eastman Kodak Company). 70. U.S. Pat. 3,265,506, 1966, A. Weissberger and C. Kibler (to Eastman Kodak Company); U.S. Pat. 3,770,446, 1973, S. Sato, T. Hanzawa, M. Furuya, T. Endo, and I. Inoue (to Konishiroku KK.); Fr. Pat. 1,473,553, 1967, R. Porter (to Eastman Kodak Company); U.S. Pat. 3,408,194, 1968, A. Loria (to Eastman Kodak Company); U.S. Pat. 3,894,875, 1975, R. Cameron and W. Gass (to Eastman Kodak Company). 71. E. Pelizzetti and C. Verdi, J. Chem. Soc. Perkin II 806 (1973); E. Pelizetti and G. Saini, J. Chem. Soc. 
Perkin II 1,766 (1973); D. Southby, R. Carmack, and J. Fyson, in Ref. 67, pp. 245–247. 72. G. Brown and co-workers, J. Am. Chem. Soc. 79, 2,919 (1957). 73. J. A. Kapecki, unpublished data, 1981; U.S. Pat. 4,401,752, 1981, P. T. S. Lau (to Eastman Kodak Company). 74. U.S. Pat. 5,359,080, 1984, Y. Shimura, H. Kobayashi, Y. Yoshioka (to Fuji Photo Film Co.) 75. U.S. Pat. 5,213,958, 1993, M. Motoki, S. Ichijima, N. Saito, T. Kamio, and K. Mihayashi (to Fuji Photo Film Co.); U. S. Pat. 6,083,677, 2000, T. Welter and J. Reynolds (to Eastman Kodak Company). 76. U.S. Pat. 5,457,004, 1995, J. B. Mooberry, J. J. Siefert, D. Hoke, D. T. Southby, and F. D. Coms (to Eastman Kodak Company); D. Hoke, J. B. Mooberry, J. J. Siefert, D. T. Southby, and Z. Z. Wu, J. Imaging Sci. Technol. 42, 528 (1988). 77. Brit. Pat. 875,470, 1961, E. MacDonald, R. Mirza, and J. Woolley (to Imperial Chemical Ind.). 78. Brit. Pat. 778,089, 1957, J. Woolley (to Imperial Chemical Ind.). 79. U.S. Pat. 2,600,788, 1952, A. Loria, A. Weissberger, and P. W. Vittum (to Eastman Kodak Company); U.S. Pat. 2,369,489, 1945, H. D. Porter and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 1,969,479, 1934, M. Seymour (to Eastman Kodak Company).
80. G. Brown, B. Graham, P. Vittum, and A. Weissberger, J. Am. Chem. Soc. 73, 919 (1951). 81. U.S. Pat. 2,343,703, 1944, H. Porter and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,829,975, 1958, S. Popeck and H. Schulze (to General Aniline and Film); U.S. Pat. 2,895,826, 1959, I. Salminen, C. Barr, and A. Loria (to Eastman Kodak Company); U.S. Pat. 2,691,659, 1954, B. Graham and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,803,544, 1957, C. Greenhalgh (to Imperial Chemical Ind.); Brit. Pat. 1,059,994, 1967, C. Maggiulli and R. Paine (to Eastman Kodak Company). 82. U.S. Pat. 2,600,788, 1952, A. Loria, A. Weissberger, and P. Vittum (to Eastman Kodak Company); U.S. Pat. 3,062,653, 1962, A. Weissberger, A. Loria, and I. Salminen (to Eastman Kodak Company). 83. U.S. Pat. 3,127,269, 1964, C. W. Greenhalgh (to Ilford, Ltd.); U.S. Pat. 3,519,429, 1970, G. Lestina (to Eastman Kodak Company). 84. U.S. Pat. 3,582,346, 1971, F. Dersch (to Eastman Kodak Company); U.S. Pat. 3,811,891, 1974, R. S. Darlak and C. J. Wright (to Eastman Kodak Company). 85. P. W. Vittum and F. C. Duennebier, J. Am. Chem. Soc. 72, 1,536 (1950). 86. A. Wernberg, unpublished data, 1986. 87. L. E. Friedrich, in Ref. 67, p. 261. 88. U.S. Pat. 3,227,554, 1966, C. R. Barr, J. Williams, and K. E. Whitmore (to Eastman Kodak Company); U.S. Pat. 4,351,897, 1982, K. Aoki and co-workers (to Fuji Photofilm Co.); U.S. Pat. 4,853,319, 1989, S. Krishnamurthy, B. H. Johnston, K. N. Kilminster, D. C. Vogel, and P. R. Buckland (to Eastman Kodak Company). 89. U.S. Pat. 4,241,168, 1980, A. Arai, K. Shiba, M. Yamada, and N. Furutachi (to Fuji Photofilm Co.). 90. U.S. Pat. 3,519,429, 1970, G. J. Lestina (to Eastman Kodak Company). 91. N. Furutachi, Fuji Film Res. Dev. 34, 1 (1989). 92. U.S. Pat. 2,213,986, 1941, J. Kendall and R. Collins (to Ilford, Ltd.); U.S. Pat. 2,340,763, 1944, D. McQueen (to DuPont); U.S. Pat. 2,592,303, 1952, A. Loria, P. Vittum, and A.
Weissberger (to Eastman Kodak Company); U.S. Pat. 2,618,641, 1952, A. Weissberger and P. Vittum (to Eastman Kodak Company); U.S. Pat. 3,834,908, 1974, H. Hara, Y. Yokota, H. Amano, and T. Nishimura (to Fuji Photo Film Co.). 93. U.S. Pat. 3,725,067, 1973, J. Bailey (to Eastman Kodak Company); U.S. Pat. 3,810,761, 1974, J. Bailey, E. B. Knott, and P. A. Marr (to Eastman Kodak Company). 94. U.S. Pat. 4,443,536, 1984, G. J. Lestina (to Eastman Kodak Company); Eur. Pat. 284,240, 1990, A. T. Bowne, R. F. Romanet, and S. E. Normandin (to Eastman Kodak Company); Eur. Pat. 285,274, 1990, R. F. Romanet and T. H. Chen (to Eastman Kodak Company). 95. U.S. Pat. 4,540,654, 1985, T. Sato, T. Kawagishi, and N. Furutachi (to Fuji Photo Film). 96. K. Menzel, R. Putter, and G. Wolfrum, Angew. Chem. 74, 839 (1962); Brit. Pat. 1,047,612, 1966, K. Menzel and R. Putter (to Agfa-Gevaert). 97. J. Jennen, Chim. Ind. 67(2), 356 (1952). 98. W. T. Hanson Jr. and P. W. Vittum, PSA J. 13, 94 (1947); U.S. Pat. 2,449,966, 1948, W. T. Hanson Jr. (to Eastman Kodak Company); K. O. Ganguin and E. MacDonald, J. Photogr. Sci. 14, 260 (1966). 99. Ref. 4; pp. 270 ff; pp. 320–321.
100. U. S. Pat. 5,364,745, 1994, J. B. Mooberry, S. P. Singer, J. J. Seifert, R. J. Ross, and D. L. Kapp (to Eastman Kodak Company) 101. U.S. Pat. 3,227,554, 1966, C. R. Barr, J. Williams, and K. E. Whitmore (to Eastman Kodak Company); C. R. Barr, J. R. Thirtle, and P. W. Vittum, Photogr. Sci. Eng. 13, 74,214 (1969). 102. M. A. Kriss, in Ref. 5, pp. 610–614; P. Kowaliski, Applied Photographic Theory, Wiley, London, UK, 1972, pp. 466–468. 103. L. E. Friedrich, unpublished data, 1985; K. Liang, in Ref. 35, p. 451. 104. T. Kruger, J. Eichmans, and D. Holtkamp, J. Imag. Sci. 35, 59 (1991). 105. U.S. Pat. 4,248,962, 1981, P. T. S. Lau (to Eastman Kodak Company); Brit. Pat. 2,072,363, 1981, R. Sato, Y. Hotta, and K. Matsuura (to Konishiroku KK.). 106. Brit. Pat. 2,099,167, 1982, K. Adachi, H. Kobayashi, S. Ichijima, and K. Sakanoue (to Fuji Photo Film Co.); U.S. Pat. 4,782,012, 1988, R. C. DeSelms and J. A. Kapecki (to Eastman Kodak Company). 107. U.S. Pat. 3,632,345, 1972, P. Marz, U. Heb, R. Otto, W. Puschel, and W. Pelz (to Agfa-Gevaert); U.S. Pat. 4,010,035, 1977, M. Fujiwhara, T. Endo, and R. Satoh (to Konishiroku Photo); U.S. Pat. 4,052,213, 1977, H. H. Credner and co-workers (to Agfa-Gevaert); U.S. Pat 4,482,629, 1984, S. Nakagawa, H. Sugita, S. Kida, M. Uemura, and K. Kishi (to Konishiroku KK.). 108. T. Kobayashi, in Proc. 16th Symp. Nippon Shashin Kokkai, Japan, 1986; U.S. Pat. 4,390,618, 1983, H. Kobayashi, T. Takahashi, S. Hirano, T. Hirose, and K. Adachi (to Fuji Photo Film); U.S. Pat. 4,859,578, 1989, D. Michno, N. Platt, D. Steele, and D. Southby (to Eastman Kodak Company); U.S. Pat. 4,912,025, 1990, N. Platt, D. Michno, D. Steele, and D. Southby (to Eastman Kodak Company); Eur. Pat. 193,389, 1990, J. L. Hall, R. F. Romanet, K. N. Kilminster, and R. P. Szajewski (to Eastman Kodak Company). 109. Brit. Pat. 1,097,064, 1964 C. R. Barr (to Eastman Kodak Company); Belg. Pat. 644,382, 1964, R. F. Porter, J. A. Schwan, and J. W. Gates Jr. 
(to Eastman Kodak Company). 110. K. Mihayashi, K. Yamada, and S. Ichijima, in Ref. 67, p. 254. 111. G. I. P. Levenson, in Ref. 5, pp. 437–461. 112. K. H. Stephen and C. M. MacDonald, Res. Disclosure 240, 156 (1984); J. L. Hall and E. R. Brown, in Ref. 35, p. 438. 113. S. Matsuo, Nippon Shashin Gak. 39(2), 81 (1976). 114. Manual for Processing Eastman Color Films, Eastman Kodak Company, Rochester, NY, 1988. 115. U.S. Pat. 4,684,604, 1987, J. W. Harder (to Eastman Kodak Company); U.S. Pat. 4,865,956, 1989, J. W. Harder and S. P. Singer (to Eastman Kodak Company); U.S. Pat. 4,923,784, 1990, J. W. Harder (to Eastman Kodak Company). 116. ISO 5800:1987(E), Photography — Color Negative Films for Still Photography — Determination of ISO Speed. 117. ISO 2240:1982(E), Photography — Color Reversal Camera Films — Determination of ISO Speed. 118. A. F. Sowinski and P. J. Wightman, J. Imag. Sci. 31, 162 (1987). 119. D. English, Photomethods, 16 (May 1988). 120. W. T. Hanson Jr. and C. A. Horton, J. Opt. Soc. Am. 42, 663 (1952).
121. J. D. Baloga, Proc. P.I.C.S. Conf. (Society for Imaging Science and Technology), Portland, Oregon, May 17–20, 1998, p. 299; J. D. Baloga and P. D. Knight, J. Soc. Photogr. Sci. Technol. Jpn. 62, 111 (1999). 122. R. J. Tuite, J. Appl. Photogr. Eng. 5, 200 (1979). 123. K. O. Ganguin, J. Photogr. Sci. 9, 172 (1961); D. C. Hubbell, R. G. McKinney, and L. E. West, Photogr. Sci. Eng. 11, 295 (1967). 124. Y. Seoka and K. Takahashi, Second Symp. Photogr. Conserv., Society of Scientific Photography, Japan, Tokyo, July 1986, pp. 13–16. 125. P. Egerton, J. Goddard, G. Hawkins, and T. Wear, Royal Photogr. Soc. Colour Imaging Symp., Cambridge, UK, Sept. 1986, p. 128. 126. K. Onodera, T. Nishijima, and M. Sasaki, in Proc. Int. Symp.: The Stability Conser. Photogr. Images, Bangkok, Thailand, Nov. 1986. 127. E. Hoffmann and A. Bruylants, Bull. Soc. Chim. Belges 75, 91 (1966); K. Sano, J. Org. Chem. 34, 2,076 (1969). 128. R. L. Heidke, L. H. Feldman, and C. C. Bard, J. Imag. Technol. 11(3), 93 (1985). 129. U.S. Pat. 2,689,793, 1954, W. R. Weller and N. H. Groet (to Eastman Kodak Company).
130. J. C. Dainty and R. Shaw, Image Science, Academic Press, London, 1974; M. Kriss, in Ref. 5, Chap. 21, p. 596. 131. CHOICES — Choosing the Right Silver-Recovery Method for Your Needs, Kodak Publication J-21, Rochester, NY, 1989. 132. T. Cribbs, Sixth Int. Symp. Photofinishing Technology, The Society for Imaging Science and Technology, Springfield, VA, 1990, p. 53; Kodak Publication Z130, revision of July, 1998 at www.kodak.com. 133. D. G. Foster and K. H. Stephen, in Ref. 132, p. 7. 134. F. G. Kari, S. Hilger, and S. Canonica, Enviro. Sci. and Techno. 29, 1,008 (1995); F. G. Kari and W. Giger, ibid. 29, 2,814 (1995). 135. U.S. Pat. 4,294914, 1981, J. Fyson (to Eastman Kodak Company); U.S. Pat. 5,652,085 and 5,691,120, 1997, D. A. Wilson, D. K. Crump, and E. R. Brown (to Eastman Kodak Company); Eur. Pat. 532,003B1, 1998, Y. Ueda (to Konica Corporation). 136. S. Koboshi, M. Hagiwara, Y. Ueda, IS&T Tenth Int. Symp. on Photofinishing Technol. Paper Summaries, p. 17, The Society for Imaging Science and Technology, Springfield, VA, 1998.
D

DIGITAL VIDEO

ARUN N. NETRAVALI
Bell Labs, Lucent Technologies
Murray Hill, NJ

Analog television was developed and standardized in the 1940s mainly for over-the-air broadcast of entertainment, news, and sports. A few upward compatible changes have been made in the intervening years, such as color, multichannel sound, closed captioning, and ghost cancellation, but the underlying analog system has survived a continuous technological evolution that has pervaded all other media. Television has stimulated the development of a global consumer electronics industry that has brought high-density magnetic recording, high-resolution displays, and low-cost imaging technologies from the laboratory into the living room. A vast array of video production and processing technologies make high-quality programming an everyday reality, real-time on-site video the norm rather than the exception, and video the historical medium of record throughout the world. More recently, the emergence of personal computers and high-speed networks has given rise to desktop video to improve productivity for businesses. In spite of this impressive record and a large installed base, we are on the threshold of a major disruption in the television industry. After fifty years of continuous refinement, the underlying technology of television is going to be entirely redone. Digital video is already proliferating in a variety of applications such as videoconferencing, multimedia computing, and program production; the impediments that have held it back are rapidly disappearing. The key enabling technologies are (1) mature and standardized algorithms for high-quality compression; (2) inexpensive and powerful integrated circuits for processing, storing, and reconstructing video signals; (3) inexpensive, high-capacity networks for transporting video; (4) uniform methods for storing, addressing, and accessing multimedia content; and (5) the evolution of computer architecture to support video I/O. The market drivers include (1) direct consumer access for content providers; (2) convergence of video with other information sources such as print; (3) the emergence of a fast-growing consumer market for personal computing; (4) the evolution of the Internet and other networks in the commercial domain; and (5) the removal of various regulatory barriers. This article deals with the technology of digital television. We start with the way the television signal is sampled (scanning) and digitized. Then we discuss techniques of compression to reduce the bit rate to a manageable level and describe briefly the emerging standards for compression.

SCANNING

The image information captured by a television camera conveys color intensity (in terms of red, green, and blue primary colors) at each spatial location (x, y) and for each time instance t. Thus, the image intensity is multidimensional (x, y, t). However, it needs to be converted to a one-dimensional signal for processing, storage, communications, and display. Raster scanning is the process used to convert a three-dimensional (x, y, t) image intensity into a one-dimensional television waveform (1). The first step is to sample the television scene many times (1/T, where T is the frame period in seconds) per second to create a sequence of still images (called frames). Then, within each frame, scan lines are created by vertical sampling. Scanning proceeds sequentially, left to right for each scan line and from top to bottom, a line at a time, within a frame. In a television camera, an electron beam scans across a photosensitive target upon which the image is focused. In more modern cameras, charge-coupled devices (CCDs) are used to image an area of the picture, such as an entire scan line. At the other end of the television chain, using raster-scanned displays, an electron beam scans and lights up the picture elements in proportion to the light intensity. Although it is convenient to think that all of the samples of a single frame occur at one time (similar to the simultaneous exposure of a single frame of film), the scanning in a camera and in a display makes every sample correspond to a different time.
Progressive and Interlaced Scan
There are two types of scanning: progressive (also called sequential) and interlaced. In progressive scanning, the television scene is first sampled in time to create frames, and within each frame, all of the raster lines are scanned in order from top to bottom. Therefore, all of the vertically adjacent scan lines are also temporally adjacent and are highly correlated even in the presence of rapid motion in the scene. Almost all computer displays, especially those of high-end computers, are sequentially scanned. In interlaced scanning (Fig. 1), all of the odd-numbered lines in the entire frame are scanned first during the first half of the frame period T, and then the even-numbered lines are scanned during the second half. This process produces two distinct images per frame at different times. The set of odd-numbered lines constitutes the odd field, and the even-numbered lines make up the even field. All current TV systems (National Television System Committee [NTSC], PAL, SECAM) use interlaced scanning. One of the principal benefits of interlaced scanning is that it reduces the scan rate (or the bandwidth) without significantly reducing image quality. This is done by using a relatively high field rate (a lower field rate would cause flicker) while maintaining a high total number of scan lines in a frame (a lower number of lines per frame would reduce the resolution of static images). Interlace cleverly preserves the high-detail visual information and, at the same time, avoids visible large-area flicker at the display due to insufficient temporal postfiltering by the human eye.

Figure 1. A television frame is divided into an odd field (containing the odd-numbered scan lines) and an even field (containing the even-numbered scan lines).

The NTSC has 15,735 scan lines/s, because there are 525 lines/frame and 29.97 frames/s. For each scan line, a small period of time (16% to 18% of the total line time), called blanking or retrace, is allocated to return the scanning beam to the left edge of the next scan line. European systems (PAL and SECAM) have 625 lines/frame but 50 fields/s. The larger number of lines results in better vertical resolution, whereas a larger number of frames results in better motion rendition and lower flicker. There is no agreement worldwide yet, but high-definition TV (HDTV) will have approximately twice the horizontal and vertical resolution of standard television. In addition, HDTV will be digital; the television scan lines will also be sampled horizontally in time and digitized. Such sampling will produce an array of approximately 1,000 lines and as many as 2,000 pixels per line. If the height/width ratio of the TV raster is equal to the ratio of the number of scan lines to the number of samples per line, the array is referred to as having ‘‘square pixels,’’ that is, the electron beam is spaced equally in the horizontal and vertical directions, or has a square shape. This facilitates digital image processing as well as computer synthesis of images. One of the liveliest debates regarding the next generation of television systems involves the type of scanning to be employed: interlaced or progressive. Interlaced scanning was invented in the 1930s when signal processing techniques, hardware, and memory devices were all in a state of infancy. Because all current TV systems were standardized more than five decades ago, they use interlace, and therefore, the technology and the equipment (e.g., cameras) using interlace are mature.
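The NTSC interlace numbers quoted above can be checked with a few lines of arithmetic. This sketch splits a 525-line frame into its two fields and derives the field and line rates:

```python
# NTSC interlaced-scanning arithmetic: 525 lines/frame at 29.97 frames/s,
# each frame split into an odd field and an even field.

LINES_PER_FRAME = 525
FRAMES_PER_SECOND = 29.97

lines = list(range(1, LINES_PER_FRAME + 1))  # scan lines numbered from 1
odd_field = lines[0::2]    # lines 1, 3, 5, ... scanned in the first half of T
even_field = lines[1::2]   # lines 2, 4, 6, ... scanned in the second half

print(len(odd_field), len(even_field))  # 263 262
print(2 * FRAMES_PER_SECOND)            # 59.94 fields per second

# 525 x 29.97 = 15,734.25, approximately the 15,735 scan lines/s cited
print(round(LINES_PER_FRAME * FRAMES_PER_SECOND))
```

Note that with an odd line count the two fields are not quite equal (263 versus 262 lines), which is what lets the scanning beam interleave the two fields vertically.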
However, interlace often shows flickering artifacts in scenes that have sharp detail and has poor motion rendition, particularly for fast vertical motion of small objects. In addition, digital data are more easily compressed in progressively scanned frames. Compatibility with film and computers also favors progressive scanning. In the future, because different stages of the television chain have different requirements, it is likely that creation (production studios), transmission, and display may employ different scanning methods. Production studios require high-quality cameras and compatibility with film
and computer-generated material, all of very high quality. If good progressive cameras were available and inexpensive, this would favor progressive scanning at even higher scan rates (>1,000 lines/frame). However, transmission bandwidth, particularly for terrestrial transmission, is expensive and limited, and even with bandwidth compression, current technology can handle only up to 1,000 lines/frame. Display systems can show a better picture by progressive scanning and refreshing at higher frame rates (even if the transmission is interlaced and at lower frame rates), made possible by frame buffers. Thus, though there are strong arguments in favor of progressive scanning in the future, more progress is needed on the learning curve of progressive equipment. The Federal Communications Commission (FCC) in the United States therefore decided to support multiple scanning standards for terrestrial transmission, one interlaced and five progressive, but with a migration path toward the exclusive use of progressive scanning in the future.

Image Aspect Ratio
The image aspect ratio is generally defined as the ratio of picture width to height. It affects the overall appearance of the displayed image. For standard TV, the aspect ratio is 4:3. This value was adopted for TV because this format was already used and found acceptable in the film industry prior to 1953. However, since then, the film industry has migrated to wide-screen formats that have aspect ratios of 1.85 or higher. Subjective tests on viewers show a significant preference for a wider format than that used for standard TV, so HDTV plans to use an aspect ratio of 1.78, which is quite close to that of the wide-screen film format.

Image Intensity
Light is a subset of electromagnetic energy. The visible spectrum ranges from 380 to 780 nm in wavelength. Thus, the visible light in a picture element (pel) can be specified completely by its wavelength distribution {S(λ)}. This radiation excites three different receptors in the human retina that are sensitive to wavelengths near 445 (blue), 535 (green), and 570 (red) nm. Each type of receptor measures the energy in the incident light at wavelengths near its dominant wavelength. The three resulting energy values uniquely specify each visually distinct color C. This is the basis of the trichromatic theory of color, which states that for human perception, any color can be synthesized by an appropriate mixture of three properly chosen primary colors R, G, and B (2). For video, the primaries are usually red, green, and blue. The amounts of each primary required are called the tristimulus values. If a color C has tristimulus values Rc, Gc, and Bc, then C = RcR + GcG + BcB. The tristimulus values of a wavelength distribution S(λ) are given by
R = ∫ S(λ)r(λ) dλ,    G = ∫ S(λ)g(λ) dλ,    B = ∫ S(λ)b(λ) dλ,        (1)

where {r(λ), g(λ), b(λ)} are called the color matching functions for primaries R, G, and B. These are also the tristimulus values of unit intensity monochromatic light of wavelength λ. Figure 2 shows the color matching functions for the primary colors chosen to be spectral (light of a single wavelength) colors of wavelengths 700.0, 546.1, and 435.8 nm. Equation (1) allows us to compute the tristimulus values of any color that has a given spectral distribution S(λ) by using the color matching functions. One consequence of this is that any two colors whose spectral distributions are S1(λ) and S2(λ) match if and only if

R1 = ∫ S1(λ)r(λ) dλ = ∫ S2(λ)r(λ) dλ = R2,
G1 = ∫ S1(λ)g(λ) dλ = ∫ S2(λ)g(λ) dλ = G2,                            (2)
B1 = ∫ S1(λ)b(λ) dλ = ∫ S2(λ)b(λ) dλ = B2,

where {R1, G1, B1} and {R2, G2, B2} are the tristimulus values of the two distributions S1(λ) and S2(λ), respectively. This could happen even if S1(λ) were not equal to S2(λ) for all wavelengths in the visible region.

Figure 2. The color-matching functions for the 2° standard observer, based on primaries of wavelengths 700 (red), 546.1 (green), and 435.8 nm (blue), have units such that equal quantities of the three primaries are needed to match the equal-energy white.

Instead of specifying a color by its tristimulus values {R, G, B}, normalized quantities called chromaticity coordinates {r, g, b} are often used:

r = R/(R + G + B),    g = G/(R + G + B),    b = B/(R + G + B).        (3)

Because r + g + b = 1, any two chromaticity coordinates are sufficient. However, for complete specification a third dimension is required. The luminance (Y) is usually chosen.

Luminance is an objective measure of brightness. Different contributions of wavelengths to the sensation of brightness are represented by the relative luminous efficiency y(λ). Then, the luminance of any given spectral distribution S(λ) is given by

Y = k_m ∫ S(λ)y(λ) dλ,                                                (4)

where k_m is a normalizing constant. For any given choice of primaries and their corresponding color matching functions, luminance can be written as a linear combination of the tristimulus values {R, G, B}. Thus, complete specification of a color is given either by the three tristimulus values or by the luminance and two chromaticities. Then, a color image can be specified by luminance and chromaticities at each pel.

COMPOSITE TV SYSTEMS

A camera that images a scene generates for each pel the three color tristimulus values RGB, which may be further processed for transmission or storage. At the receiver, the three components are sent to the display, which regenerates the contents of the scene at each pel from the three color components. For transmission or storage between the camera and the display, a luminance signal Y that represents brightness and two chrominance signals that represent color are used. The need for such a transmission system arose with NTSC, the standard used in North America and Japan, where compatibility with monochrome receivers required a black-and-white signal, which is now referred to as the Y signal. It is well known that the sensitivity of the human eye is highest to green light, followed by red, and least to blue light. The NTSC system exploited this fact by assigning a lower bandwidth to the chrominance signals than to the luminance Y signal. This made it possible to save bandwidth without losing color quality. The PAL and SECAM systems also employ reduced chrominance bandwidths (3).

The NTSC System

The NTSC color space of YIQ can be generated from the gamma-corrected RGB components or from YUV components as follows:

Y = 0.299R' + 0.587G' + 0.114B',
I = 0.596R' - 0.274G' - 0.322B' = -(sin 33°)U + (cos 33°)V,           (5)
Q = 0.211R' - 0.523G' + 0.311B' = (cos 33°)U + (sin 33°)V,

where U = (B' - Y)/2.03 and V = (R' - Y)/1.14. (Gamma correction is performed to compensate for the nonlinear relationship between the signal voltage V and the light intensity B [B ∝ V^γ].)
The inverse operation, that is, generation of the gamma-corrected RGB components from the YIQ composite color space, can be accomplished as follows:

R' = 1.0Y + 0.956I + 0.621Q,
G' = 1.0Y - 0.272I - 0.649Q,
B' = 1.0Y - 1.106I + 1.703Q.    (6)
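The forward and inverse transforms of Eqs. (5) and (6) can be sanity-checked numerically. The sketch below (function names are illustrative, not from the text) uses the coefficients as printed and verifies that a round trip approximately recovers the input:

```python
# Sketch of the NTSC conversions in Eqs. (5) and (6); coefficients as in the text.

def rgb_to_yiq(r, g, b):
    """Gamma-corrected R'G'B' (0..1) -> Y, I, Q, per Eq. (5)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.311 * b
    return y, i, q

def yiq_to_rgb(y, i, q):
    """Inverse operation, per Eq. (6)."""
    r = 1.0 * y + 0.956 * i + 0.621 * q
    g = 1.0 * y - 0.272 * i - 0.649 * q
    b = 1.0 * y - 1.106 * i + 1.703 * q
    return r, g, b

# Round trip: the published three-digit coefficients invert each other
# only approximately, so allow a small tolerance.
rgb = (0.75, 0.50, 0.25)
back = yiq_to_rgb(*rgb_to_yiq(*rgb))
assert all(abs(a - b) < 0.01 for a, b in zip(rgb, back))
```

Note that the I and Q coefficient rows each sum to (approximately) zero, so a gray input (R' = G' = B') produces zero chrominance, as it should.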
In NTSC, the Y, I, and Q signals are multiplexed into a 4.2-MHz bandwidth. Although the Y component by itself takes the entire 4.2-MHz bandwidth, multiplexing all three components into the same 4.2 MHz becomes possible by interleaving luminance and chrominance frequencies without too much "crosstalk" between them. This is done by defining a color subcarrier at approximately 3.58 MHz. The two chrominance signals I and Q are quadrature amplitude modulated (QAM) onto this carrier. The envelope of this QAM signal is approximately the saturation of the color, and the phase is approximately the hue. The luminance and modulated chrominance signals are then added to form the composite signal. Demodulation involves first comb filtering (horizontal and vertical filtering) of the composite signal to separate the luminance and chrominance signals, followed by further demodulation to separate the I and Q components.
The Phase Alternate Line (PAL) System

The YUV color space of PAL is employed in one form or another in all three color TV systems. The basic YUV color space can be generated from gamma-corrected RGB (referred to in the equations as R'G'B') components as follows:

Y = 0.299R' + 0.587G' + 0.114B',
U = -0.147R' - 0.289G' + 0.436B' = 0.492(B' - Y),
V = 0.615R' - 0.515G' - 0.100B' = 0.877(R' - Y).    (7)

The inverse operation, that is, generation of gamma-corrected RGB from YUV components, is accomplished by the following:

R' = 1.0Y + 1.140V,
G' = 1.0Y - 0.394U - 0.580V,
B' = 1.0Y + 2.030U.    (8)
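Equation (7) gives each chrominance signal in two forms, a matrix form and a color-difference form. A quick numerical check (a sketch with made-up sample values) confirms that the two forms agree to within the rounding of the published coefficients:

```python
# Sketch of Eq. (7): the YUV matrix form and the color-difference forms
# U = 0.492(B' - Y) and V = 0.877(R' - Y) should agree (to rounding).

def rgb_to_yuv(r, g, b):
    """Gamma-corrected R'G'B' (0..1) -> Y, U, V, per Eq. (7)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

r, g, b = 0.2, 0.6, 0.9                   # arbitrary test color
y, u, v = rgb_to_yuv(r, g, b)
assert abs(u - 0.492 * (b - y)) < 0.005   # matrix form vs. difference form
assert abs(v - 0.877 * (r - y)) < 0.005
```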
The Y, U, and V signals in PAL are multiplexed into a total bandwidth of either 5 or 5.5 MHz. In PAL, both the U and V chrominance signals are transmitted in a bandwidth of 1.5 MHz. A color subcarrier is modulated with U and V via QAM, and the composite signal is limited to the allowed frequency band, which ends up truncating part of the QAM signal. The color subcarrier for PAL is located at 4.43 MHz. PAL transmits the V chrominance component as +V and -V on alternate lines. The demodulation of the QAM chrominance signal is similar to that of NTSC. The recovery of the PAL chrominance signal at the receiver includes averaging successive demodulated scan lines to derive the U and V signals.
COMPONENT TELEVISION

In a component TV system, the luminance and chrominance signals are kept separate, such as on separate channels or multiplexed in different time slots. The use of a component system is intended to prevent the cross talk that causes cross-luminance and cross-chrominance artifacts in the composite systems. The component system is preferable in all video applications that are without the constraints of broadcasting, where composite TV standards were made before the advent of high-speed electronics. Although a number of component signals can be used, the CCIR-601 digital component video format is of particular significance. The Y, Cr, Cb color space of this format is obtained by scaling and offsetting the Y, U, V color space. The conversion from gamma-corrected R', G', B' components represented as eight bits (0 to 255) to Y, Cr, Cb is specified as follows:

Y = 0.257R' + 0.504G' + 0.098B' + 16,
Cr = 0.439R' - 0.368G' - 0.071B' + 128,
Cb = -0.148R' - 0.291G' + 0.439B' + 128.    (9)
In these equations, Y is allowed to take values in the 16 to 235 range, whereas Cr and Cb take values in the range from 16 to 240, centered at a value of 128, which indicates zero chrominance. The inverse operation generates gamma-corrected RGB from Y, Cr, Cb components by

R' = 1.164(Y - 16) + 1.596(Cr - 128),
G' = 1.164(Y - 16) - 0.813(Cr - 128) - 0.392(Cb - 128),
B' = 1.164(Y - 16) + 2.017(Cb - 128).    (10)
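Equations (9) and (10) can be exercised directly. In this sketch (the helper names are illustrative), reference white maps to Y = 235 with zero chrominance, and a round trip recovers an arbitrary R'G'B' triple to within rounding:

```python
# Sketch of the CCIR-601 conversions in Eqs. (9) and (10).

def rgb_to_ycrcb(r, g, b):
    """8-bit gamma-corrected R'G'B' (0..255) -> Y, Cr, Cb, per Eq. (9)."""
    y  =  0.257 * r + 0.504 * g + 0.098 * b + 16
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return y, cr, cb

def ycrcb_to_rgb(y, cr, cb):
    """Inverse operation, per Eq. (10)."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.813 * (cr - 128) - 0.392 * (cb - 128)
    b = 1.164 * (y - 16) + 2.017 * (cb - 128)
    return r, g, b

# Reference white (255, 255, 255) lands at the top of the 16..235 Y range
# with both chrominance components at the zero-chrominance value 128.
y, cr, cb = rgb_to_ycrcb(255, 255, 255)
assert abs(y - 235) < 1 and abs(cr - 128) < 1 and abs(cb - 128) < 1

# Round trip on an arbitrary triple, within the rounding of the coefficients.
back = ycrcb_to_rgb(*rgb_to_ycrcb(64, 128, 192))
assert all(abs(a - b) < 2 for a, b in zip((64, 128, 192), back))
```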
The sampling rates for the luminance component Y and the chrominance components are 13.5 MHz and 6.75 MHz, respectively. The number of active pels per line is 720; the number of active lines is 486 for the NTSC version (29.97 frames/s) and 576 for the PAL version (25 frames/s). At eight bits/pel, the bit rate of the uncompressed CCIR-601 signal is 216 Mbps.

Digitizing Video

Video cameras create either analog or sampled analog signals. The first step in processing, storage, or communication is usually to digitize the signals. Analog-to-digital converters that have the required accuracy and speed for video signals have become inexpensive in recent years. Therefore, the cost and quality of digitization are less of an issue. However, digitization with good quality results in a bandwidth expansion, in the sense that transmitting or storing these bits often takes up more bandwidth or storage space than the original analog signal. In spite of this, digitization is becoming universal because of the relative ease of handling the digital signal compared to analog. In particular, enhancement, removal of artifacts, transformation, compression, encryption, integration with
computers, and so forth are much easier to do in the digital domain using digital integrated circuits. One example of this is the conversion from one video standard to another (e.g., NTSC to PAL). Sophisticated adaptive algorithms required for good picture quality in standards conversion can be implemented only in the digital domain. Another example is the editing of digitized signals. Edits that require transformation (e.g., rotation, dilation of pictures, or time warp for audio) are significantly more difficult in the analog domain. Additionally, encrypting bits is a lot easier and safer than encrypting analog signals. Using digital storage, the quality of the retrieved signal does not degrade in an unpredictable manner after multiple reads as it often does in analog storage. Also, based on today’s database and user interface technology, a rich set of interactions is possible only with stored digital signals. Mapping the stored signal to displays with different resolutions in space (number of lines per screen and number of samples per line) and time (frame rates) can be done easily in the digital domain. A familiar example of this is the conversion of film, which is almost always at a different resolution and frame rate than the television signal. Digital signals are also consistent with the evolving network infrastructure. Digital transmission allows much better control of the quality of the transmitted signal. In broadcast television, for example, if the signal were digital, the reproduced picture in the home could be identical to the picture in the studio, unlike the present situation where the studio pictures look far better than pictures at home. Finally, analog systems dictate that the entire television chain from camera to display operates at a common clock with a standardized display. 
In the digital domain, considerable flexibility exists by which the transmitter and the receiver can negotiate the parameters for scanning, resolution, and so forth, and thus create the best picture consistent with the capability of each sensor and display. The process of digitization of video consists of prefiltering, sampling, quantization, and encoding (see Fig. 3).
Figure 3. Conversion of component analog TV signals to digital TV signals.

Filtering

This step is also referred to as prefiltering because it is done prior to sampling. Prefiltering reduces the unwanted frequencies as well as noise in the signal. The simplest filtering operation involves averaging the image intensity within a small area around the point of interest and replacing the intensity of the original point by the computed average intensity. Prefiltering can sometimes be accomplished by controlling the size of the scanning spot in the imaging system. In dealing with video signals, the filtering applied to the luminance signal may be different from that applied to the chrominance signals because of the different bandwidths required.

Sampling

Next, the filtered signal is sampled at a chosen rate and location on the image raster. The minimum rate at which an analog signal must be sampled, called the Nyquist rate, is twice the highest frequency in the signal. For the NTSC system this rate is 2 x 4.2 = 8.4 MHz, and for PAL it is 2 x 5 = 10 MHz. It is normal practice to sample at a rate higher than this for ease of signal recovery when using practical filters. The CCIR-601 signal employs 13.5 MHz for luminance and half that rate for the chrominance signals. This rate is an integral multiple of both the NTSC and PAL line rates but is not an integral multiple of either the NTSC or PAL color subcarrier frequency.

Quantization

The sampled signal is still in analog form and is quantized next. The quantizer assigns each pel, whose value is in a certain range, a fixed value representing that range. The process of quantization results in loss of information because many input pel values are mapped into a single output value. The difference between the value of the input pel and its quantized representation is the quantization error. The choice of the number of levels of quantization involves a trade-off between accuracy of representation and the resulting bit rate.

PCM Encoding

The last step in analog-to-digital conversion is encoding the quantized values. The simplest type of encoding is called pulse code modulation (PCM). Video pels are represented by eight-bit PCM codewords; that is, each pel is assigned one of the 2^8 = 256 possible values in the range of 0 to 255. For example, if the quantized pel amplitude is 68, the corresponding eight-bit PCM codeword is the bit sequence 01000100.
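The quantization and PCM encoding steps just described can be sketched in a few lines; the 0-to-1 input range and the helper names are illustrative assumptions:

```python
# Sketch of the quantize-and-encode steps: an 8-bit uniform quantizer
# maps an analog sample (here assumed in 0.0..1.0) to one of 256 levels,
# and PCM simply writes that level as an eight-bit codeword.

def quantize(sample, levels=256):
    """Uniform scalar quantization of a 0..1 analog sample."""
    q = int(round(sample * (levels - 1)))
    return max(0, min(levels - 1, q))

def pcm_codeword(level):
    """Eight-bit PCM codeword for a quantized level."""
    return format(level, '08b')

level = quantize(68 / 255)                 # pel amplitude 68 on the 0..255 scale
assert level == 68
assert pcm_codeword(level) == '01000100'   # the example from the text

# Quantization error: difference between the input and its representative
# value, bounded by half a quantizer step.
x = 0.3217
err = abs(x - quantize(x) / 255)
assert err <= 0.5 / 255
```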
WHAT IS COMPRESSION?
Most video signals contain a substantial amount of "redundant" or superfluous information. For example, a television camera that captures 30 frames/s of a stationary scene produces very similar frames, one after the other. Compression removes the superfluous information so that a single frame can be represented by a smaller amount of data or, in the case of audio or time-varying images, by a lower data rate (4,5). Digitized audio and video signals contain a significant amount of statistical redundancy; that is, "adjacent" pels are similar to each other, so that one pel can be predicted fairly accurately from another. By removing the predictable component from a stream of pels, the data rate can be reduced. Such statistical redundancy can be removed without loss of any information, so the original data can be recovered exactly by the inverse operation, called decompression. Unfortunately, the techniques for accomplishing this efficiently require probabilistic characterization of the signal. Although many excellent probabilistic models of audio and video signals
have been proposed, serious limitations exist because of the nonstationarity of the statistics. In addition, video statistics may vary widely from application to application. A fast-moving football game shows smaller frame-to-frame correlation than a head-and-shoulders view of people using video telephones. Current practical compression schemes do result in a loss of information, and lossless schemes typically provide a much smaller compression ratio (2:1 to 4:1). The second type of superfluous data, called perceptual redundancy, is the information that the human visual system cannot see. If the primary receiver of the video signal is a human eye (rather than a machine, as in some pattern recognition applications), then transmission or storage of the information that humans cannot perceive is wasteful. Unlike statistical redundancy, the removal of information based on the limitations of human perception is irreversible; the original data cannot be recovered following such a removal. Unfortunately, human perception is very complex, varies from person to person, and depends on the context and the application. Therefore, the art and science of compression still have many frontiers to conquer, even though substantial progress has been made in the last two decades.

ADVANTAGES OF COMPRESSION

The biggest advantage of compression is data rate reduction. Data rate reduction reduces transmission costs and, when a fixed transmission capacity is available, results in better quality of video presentation (4). As an example, a single 6-MHz analog cable TV channel can carry between four and ten digitized, compressed programs, thereby increasing the overall capacity (in terms of the number of programs carried) of an existing cable television plant. Alternatively, a single 6-MHz broadcast television channel can carry a digitized, compressed high-definition television (HDTV) signal to give significantly better audio and picture quality without additional bandwidth. Data rate reduction also has a significant impact on reducing the storage requirements for a multimedia database. A CD-ROM can carry a full-length feature movie compressed to about 4 Mbps. The latest optical disk technology, known as the digital versatile disk (DVD), which is the same size as the CD, can store 4.7 GB of data on a single layer. This is more than seven times the capacity of a CD. Furthermore, the potential storage capabilities of DVD are even greater because it is possible to accommodate two layers of data on each side of the DVD, resulting in 17 GB of data. The DVD can handle many hours of high-quality MPEG2 video and Dolby AC3 audio. Thus, compression reduces the storage requirement and also makes stored multimedia programs portable in inexpensive packages. In addition, the reduction of data rate allows transfer of video-rate data without choking various resources (e.g., the main bus) of either a personal computer or a workstation. Another advantage of digital representation/compression is packet communication. Much of the data communication in the computer world is by self-addressed packets. Packetization of digitized audio-video and the reduction of packet rate due to compression are important in sharing a transmission channel with other signals as well as in maintaining consistency with the telecom/computing infrastructure. The desire to share transmission and switching has created a new evolving standard, called asynchronous transfer mode (ATM), which uses packets of small size, called cells. Packetization delay, which could otherwise hinder interactive multimedia, becomes less of an issue when packets are small. High compression and large packets make interactive communication difficult, particularly for voice.

Table 1. Bit Rates of Compressed Video Signals

COMPRESSION REQUIREMENTS
The algorithms used in a compression system depend on the available bandwidth or storage capacity, the features required by the application, and the affordability of the hardware required for implementing the compression algorithm (encoder as well as decoder) (4,5). Various issues arise in designing the compression system.

Quality

The quality of presentation that can be derived by decoding a compressed video signal is the most important consideration in choosing a compression algorithm. The goal is to provide acceptable quality for the class of multimedia signals that are typically used in a particular service. The three most important aspects of video quality are spatial, temporal, and amplitude resolution. Spatial resolution describes the clarity or lack of blurring in the displayed image, and temporal resolution describes the smoothness of motion. Amplitude resolution describes graininess or other artifacts that arise from coarse quantization.
Uncompressed Versus Compressed Bit Rates
The NTSC video has approximately 30 frames/s, 480 visible scan lines per frame, and 480 pels per scan line in each of three color components. If each color component is coded using eight bits (24 bits/pel total), the bit rate would be approximately 168 Mbps. Table 1 shows the raw uncompressed bit rates for film and several audio and video formats.
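The uncompressed bit rate quoted above is straightforward to reproduce; the exact product of the numbers given is about 166 Mbps, in line with the "approximately 168 Mbps" figure:

```python
# Back-of-the-envelope uncompressed bit rate for the NTSC numbers in the text.
frames_per_s = 30
lines = 480
pels_per_line = 480
bits_per_pel = 24      # eight bits for each of three color components

bps = frames_per_s * lines * pels_per_line * bits_per_pel
print(bps / 1e6)       # 165.888 (Mbps), i.e., roughly the 168 Mbps quoted
```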
Robustness

As the redundancy from the video signal is removed by compression, each compressed bit becomes more important in the sense that it affects a larger number of samples of the video signal. Therefore, an error in either transmission or storage of a compressed bit can have deleterious effects over either a large region of the picture or an extended period of time. For noisy digital transmission channels, video compression algorithms that sacrifice efficiency to allow graceful degradation of the images in the presence of channel errors are better candidates. Some of these are created by merging source and channel coding to optimize the end-to-end service quality. A good example of this is portable video over a wireless channel. Here, the requirements for compression efficiency are severe because of the lack of available bandwidth. Yet, a compression algorithm that is overly sensitive to channel errors would be an improper choice. Of course, error correction is usually added to an encoded signal, along with a variety of error concealment techniques, which are usually successful in reducing the effects of random isolated errors. Thus, the proper choice of the compression algorithm depends on the transmission environment in which the application resides.
Interactivity

Both consumer entertainment and business video applications are characterized by picture switching and browsing. In the home, viewers switch to the channels of their choice. In the business environment, people get to the information of their choice by random access using, for example, on-screen menus. In the television of the future, a much richer interaction based on content rather than channel switching may become possible. Many multimedia offerings and locally produced video programs often depend on the concatenation of video streams from a variety of sources, sometimes in real time. Commercials are routinely inserted into nationwide broadcasts by network affiliates and cable headends. Thus, the compression algorithm must support continuous and seamless assembly of these streams for distribution and rapid switching of images at the point of final decoding. It is also desirable that simple edits, as well as richer interactions, occur on compressed data rather than on reconstructed sequences. In general, a higher degree of interactivity requires a compression algorithm that operates on a smaller group of pels. MPEG, which operates on spatiotemporal groups of pels, is more difficult to interact with than JPEG, which operates only on spatial groups of pels. As an example, it is much easier to fast-forward a compressed JPEG bitstream than a compressed MPEG bitstream. This is one reason that current digital camcorders are based on motion JPEG. In a cable/broadcast environment or in an application requiring browsing through a compressed multimedia database, a viewer may change from program to program with no opportunity for the encoder to adapt itself. It is important that the buildup of resolution following a program change take place quite rapidly, so that the viewer can decide either to stay on the program or change to another, depending on the content.

Compression and Packetization Delay
Advances in compression have come predominantly through better analysis of the video signal arising from the application at hand. As models have progressed from pels to picture blocks to interframe regions, efficiency has grown rapidly. Correspondingly, the complexity of the analytic phase of encoding has also grown, resulting in an increase in encoding delay. A compression algorithm that looks at a large number of samples and performs very complex operations usually has a larger encoding delay. For many applications, such encoding delay at the source is tolerable, but for some it is not. Broadcast television, even in real time, can often admit a delay of the order of seconds. However, teleconferencing or multimedia groupware can tolerate only a much smaller delay. In addition to the encoding delay, modern data communications introduce packetization delay. The more efficient the compression algorithm, the larger the delay introduced by packetization, because the same size packet carries information about many more samples of the video signal.

Symmetry
A cable, satellite, or broadcast environment has only a few transmitters that compress, but a large number of receivers that have to decompress. Similarly, video databases that store information usually compress it only once. However, different viewers may retrieve this information thousands of times. Therefore, the overall economics of many applications is dictated to a large extent by the cost of decompression. The choice of the compression algorithm ought to make decompression extremely simple by transferring much of the cost to the transmitter, thereby creating an asymmetrical algorithm. The analytic phase of a compression algorithm, which routinely includes motion analysis (done only at the encoder), naturally makes the encoder more expensive. In a number of situations, the cost of the encoder is also important (e.g., camcorders, videotelephones). Therefore, a modular design of the encoder that can trade off performance against complexity, but that creates data decodable by a simple decompressor, may be the appropriate solution.

Multiple Encoding
In a number of instances, the original signal may have to be compressed in stages or may have to be compressed and decompressed several times. In most television studios, for example, it is necessary to store the compressed data and then decompress it for editing, as required. Such an edited signal is then compressed and stored again. Any multiple coding-decoding cycle of the signal is bound to reduce the quality of the signal because artifacts are introduced every time the signal is coded. If the application requires such multiple codings, then higher quality compression is required, at least in the several initial stages.

Scalability

A compressed signal can be thought of as an alternative representation of the original uncompressed signal. From this alternative representation, it is desirable to create presentations at different resolutions (in space, time, amplitude, etc.) consistent with the limitations of the equipment used in a particular application. For example, if an HDTV signal compressed to 24 Mbps can be simply processed to produce a lower resolution and lower bit-rate signal (e.g., NTSC at 6 Mbps), the compression is generally considered scalable. Of course, scalability can be achieved by brute force by decompressing, reducing the resolution, and compressing again. However, this sequence of operations introduces delay and complexity and results in a loss of quality. A common compressed representation from which a variety of lower resolution or higher resolution presentations can be easily derived is desirable. Such scalability of the compressed signal puts a constraint on compression efficiency in the sense that the algorithms that have the highest compression efficiency usually are not very scalable.

BASIC COMPRESSION TECHNIQUES
A number of compression techniques have been developed for coding video signals (1). A compression system typically consists of a combination of these techniques to satisfy the types of requirements listed in the previous section. The first step in compression usually consists of decorrelation, that is, reducing the spatial or temporal redundancy in the signal (4,5). The candidates for doing this are

1. Making a prediction of the next sample of the picture signal using some of the past samples and subtracting it from that sample. This converts the original signal into its unpredictable part (usually called the prediction error).
2. Taking a transform of a block of samples of the picture signal so that the energy is compacted in only a few transform coefficients.

The second step is selection and quantization to reduce the number of possible signal values. Here, the prediction error may be quantized a sample at a time, or a vector of prediction errors of many samples may be quantized all at once. Alternatively, for transform coding, only the important coefficients may be selected and quantized. The final step is entropy coding, which recognizes that different values of the quantized signal occur with different frequencies and, therefore, representing them with unequal length binary codes reduces the average bit rate. We give here more details of the following techniques because they have formed the basis of most compression systems:

Predictive coding (DPCM)
Transform coding
Motion compensation
Vector quantization
Subband/wavelet coding
Entropy coding
Incorporation of perceptual factors

Predictive Coding (DPCM)
In predictive coding, the strong correlation between adjacent pels (spatially as well as temporally) is exploited (4). As shown in Fig. 4, an approximate prediction of the sample to be encoded is made from previously coded information that has already been transmitted. The error (or differential signal) resulting from subtracting the prediction from the actual value of the pel is quantized into a set of discrete amplitude levels. These levels are then represented as binary words of fixed or variable lengths and are sent to the channel for transmission. The predictions may use the correlation in the same scanning line or adjacent scanning lines or previous fields. A particularly important method of prediction is the motion-compensated prediction. If a television scene contains moving objects and a frame-to-frame translation of each moving object is estimated, then more efficient prediction can be performed using elements in the previous frame that are appropriately spatially displaced. Such prediction is called motion-compensated prediction. The translation is usually estimated by matching a block of pels in the current frame to a block of pels in the previous frames at various displaced locations. Various criteria for matching and algorithms to search for the best match have been developed. Typically, such motion estimation is done only at the transmitter and the resulting motion vectors are used in the encoding process and are also separately transmitted for use in the decompression process. Transform
Coding
In transform by transform
Figure
coding (Fig. 5), a block of pels is transformed 7’ into another domain called the transform
4. Block
diagram
of a predictive
encoder
and decoder.
DIGITAL
VIDEO
receiver is considerably table lookup. Subband/Wavelet
Figure
5. Block diagram of a transform
coder.
domain, and some of the resulting coefficients are quantized and coded for transmission. The blocks may contain pels from one, two, or three dimensions. The most common technique is to use a block of two dimensions. Using one dimension does not exploit vertical correlation, and using three dimensions requires several frame stores. It has been generally agreed that discrete cosine transform (DCT) is best matched to the statistics of the picture signal, and moreover because it has fast implementation, it has become the transform of choice. The advantage of transform coding (4) comes mainly from two mechanisms. First, not all of the transform coefficients need to be transmitted to maintain good image quality, and second, the coefficients that are selected need not be represented with full accuracy. Loosely speaking, transform coding is preferable to predictive coding for lower compression rates and where cost and complexity are not extremely serious issues. Most modern compression systems have used a combination of predictive and transform coding. In fact, motion compensated prediction is performed first to remove the temporal redundancy, and then the resulting prediction error is compressed by two-dimensional transform coding using discrete cosine transform as the dominant choice. Vector
simple
because
it does a simple
Coding
Subband coding, more recently generalized using the theory of wavelets, is a promising technique for video and it has already been shown, outperforms still image coding techniques based on block transforms such as in JPEG. Although subband techniques have been incorporated into audio coding standards, the only image standard based on wavelets currently is the FBI standard for fingerprint compression. There are several compelling reasons to investigate subband/wavelet coding for image and video compression. One reason is that unlike the DCT, the wavelet framework does not transform each block of data separately. This results in graceful degradation, as the bit rate is lowered, without the traditional “tiling effect” that is characteristic of block-based approaches. Wavelet coding also allows one to work in a multiresolution framework which is a natural choice for progressive transmission or applications where scalability is desirable. One of the current weaknesses in deploying wavelet schemes for video compression is that a major component for efficient video compression is block-based motion estimation which makes block-based DCT a natural candidate for encoding spatial information. Entropy
Coding
If the quantized output values of either a predictive or a transform coder are not all equally likely, then the average bit rate can be reduced by giving each one of the values a different word length. In particular, those values that occur more frequently are represented by a smaller length code word (4,5). If a code of variable length is used and the resulting code words are concatenated to form a stream of bits, then correct decoding by a receiver requires that
Quantization
In predictive coding described in the previous section, each pixel is quantized separately using a scalar quantizer. The concept of scalar quantization can be generalized to vector quantization (5) in which a group of pixels are quantized at the same time by representing them as a code vector. Such vector quantization can be applied to a vector of prediction errors, original pels, or transform coefficients. As in Fig. 6, a group of nine pixels from a 3 x 3 block is represented as one of the K vectors from a codebook of vectors. The problem of vector quantization is then to design the codebook and an algorithm to determine the vector from the codebook that offers the best match to the input data. The design of a codebook usually requires a set of training pictures and can grow to a large size for a large block of pixels. Thus, for an 8 x 8 block compressed to two bits per pel, one would need a 2128 size codebook. Matching the original image with each vector of such a large codebook requires a lot of ingenuity. However, such matching is done only at the transmitter, and the
Figure
6.
Block diagram of vector quantlzation.
DIGITAL
every combination of concatenated code words be uniquely decipherable. A variable word length code that achieves this and at the same time gives the minimum average bit rate is called the Huffman code. Variable word length codes are more sensitive to the effect of transmission errors because synchronization would be lost in the event of an error. This can result in decoding several code words incorrectly. A strategy is required to limit the propagation of errors when Huffman codes are used. Incorporation
Incorporation of Perceptual Factors
Perception-based coding attempts to match the coding algorithm to the characteristics of human vision. We know, for example, that the accuracy with which the human eye can see coding artifacts depends on a variety of factors, such as the spatial and temporal frequency and masking due to the presence of spatial or temporal detail. A measure of the ability to perceive a coding artifact can be calculated on the basis of the picture signal. This is used, for example, in transform coding to determine the precision needed to quantize each coefficient. Perceptual factors control the information that is discarded on the basis of its visibility to the human eye; therefore, they can be incorporated in any of the previously stated basic compression schemes.

Comparison of Techniques
Figure 7 represents an approximate comparison of different techniques, using compression efficiency versus complexity as the criterion, under the condition that picture quality is held constant at an eight-bit PCM level. The complexity allocated to each codec is an approximate estimate relative to the cost of a PCM codec, which is given a value of 5. Furthermore, it is the complexity only of the decoder portion of the codec, because that is the most important cost element of digital television. Also, most of the proposed systems are combinations of several of the techniques in Fig. 7, making such comparisons difficult. As we remarked before, the real challenge is to combine the different techniques to engineer a cost-effective solution for a given service. The next section describes one example of such a codec.
Figure 7. Bits/pel versus complexity of video decoding for several video compression algorithms.
A VIDEO COMPRESSION SCHEME
In this section, we describe a compression scheme that combines the previous basic techniques to satisfy the requirements that follow. Three basic types of redundancy are exploited in the video compression process: motion compensation removes temporal redundancy, the two-dimensional DCT removes spatial redundancy, and perceptual weighting removes amplitude irrelevancy by putting quantization noise in less visible areas.

Temporal processing occurs in two stages. The motion of objects from frame to frame is estimated by using hierarchical block matching. Using the motion vectors, a displaced frame difference (DFD) is computed, which generally contains a small fraction of the information in the original frame. The DFD is transformed by using the DCT to remove the spatial redundancy. Each new frame of DFD is analyzed prior to coding to determine its rate versus perceptual distortion characteristics and the dynamic range of each coefficient (forward analysis). Quantization of the transform coefficients is performed based on the perceptual importance of each coefficient, the precomputed dynamic range of the coefficients, and the rate versus distortion characteristics. The perceptual criterion uses a model of the human visual system to determine a human observer's sensitivity to color, brightness, spatial frequency, and spatiotemporal masking. This information is used to minimize the perception of coding artifacts throughout the picture. Parameters of the coder are optimized to handle the scene changes that occur frequently in entertainment/sports events, as well as channel changes made by the viewer. The motion vectors, compressed transform coefficients, and other coding overhead bits are packed into a format that is highly immune to transmission errors.

The encoder is shown in Fig. 8a. Each frame is analyzed before being processed in the encoder loop.
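As a toy illustration of the motion-estimation stage, the sketch below uses exhaustive rather than hierarchical block matching, with made-up block and search-range sizes; it is not the codec's actual algorithm.

```python
import numpy as np

def best_motion_vector(ref, cur, top, left, bsize=8, search=4):
    """Exhaustive block matching: find the displacement into the reference
    frame that best predicts one block of the current frame (minimum SAD)."""
    block = cur[top:top + bsize, left:left + bsize]
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + bsize, x:x + bsize]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

# A current frame that is just the reference shifted produces a near-zero DFD.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32))
cur = np.roll(ref, (2, 1), axis=(0, 1))       # scene content moved by (2, 1)
dy, dx = best_motion_vector(ref, cur, 8, 8)
pred = ref[8 + dy:16 + dy, 8 + dx:16 + dx]    # motion-compensated prediction
dfd = cur[8:16, 8:16] - pred                  # displaced frame difference
print((dy, dx), np.abs(dfd).sum())
```

The DFD, not the frame itself, is what gets transformed and quantized; when motion estimation succeeds, it carries only a small residual.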
The motion vectors and control parameters resulting from the forward analysis are input to the encoder loop, which outputs the compressed prediction error to the channel buffer. The encoder loop control parameters are weighted by the buffer state, which is fed back from the channel buffer. In the predictive encoding loop, the generally sparse differences between the new image data and the motion-compensated predicted image data are encoded using adaptive DCT coding. The parameters of the encoding are controlled in part by forward analysis. The data output from the encoder consists of some global parameters of the video frame computed by the forward analyzer and transform coefficients that have been selected and quantized according to a perceptual criterion. Each frame is composed of a luminance frame and two chrominance difference frames, which are half the resolution of the luminance frame horizontally. The compression algorithm produces a chrominance bit rate that is generally a small fraction of the total bit rate, without perceptible chrominance distortion. The output buffer has an output rate from 2 to 7 Mbps and a varying input rate that depends on the image content. The buffer history is used to control the
Figure 8. Block diagram of an encoder/decoder.
parameters of the coding algorithm, so that the average input rate equals the average output rate. The feedback mechanism involves adjusting the allowable distortion level because increasing the distortion level (for a given image or image sequence) causes the encoder to produce a lower output bit rate. The encoded video is packed into a special format before transmission, which maximizes immunity to transmission errors by masking the loss of data in the decoder. The duration and extent of picture degradation due to any one error or group of errors is limited.

The decoder is shown in Fig. 8b. The compressed video data enter the buffer, which is complementary to the compressed video buffer at the encoder. The decoding loop uses the motion vectors, transform coefficient data, and other side information to reconstruct the NTSC images. Channel changes and severe transmission errors detected in the decoder initiate a fast picture recovery process. Less severe transmission errors are handled gracefully by several algorithms, depending on the type of error. Processing and memory in the decoder are minimized. Processing consists of one inverse spatial transform and a variable length decoder, which are realizable in a few very large scale integration (VLSI) chips. Memory in the decoder consists of one full frame and a few compressed frames.

COMPLEXITY/COST
Because cost is directly linked to complexity, this aspect of a compression algorithm is the most critical for the asymmetrical situations described previously. The decoder cost is most critical. Figure 7 represents an approximate
trade-off between compression efficiency and complexity under the condition that picture quality is held constant at an eight-bit PCM level. The compression efficiency is in terms of compressed bits per Nyquist sample. Therefore, pictures that have different resolution and bandwidth can be compared simply by proper multiplication to get the relevant bit rates. The complexity allocated to each codec should not be taken too literally. Rather, it is an approximate estimate relative to the cost of a PCM codec, which is given a value of 5. The relation of cost to complexity is controlled by an evolving technology; codecs of high complexity are quickly becoming inexpensive through the use of application-specific video DSPs and submicron device technology. In fact, very soon, fast microprocessors will be able to decompress the video signal entirely in software. It is clear that in the near future, a standard resolution (roughly 500 line by 500 pel) TV signal will be decoded entirely in software, even for the MPEG compression algorithm. Figure 9 shows video encoding and decoding at various image resolutions.

VIDEOPHONE AND COMPACT DISK STANDARDS: H.320 AND MPEG-1

Digital compression standards (DCS) for videoconferencing were developed in the 1980s by the CCITT, which is now known as the ITU-T. Specifically, the ISDN video-conferencing standards are known collectively as H.320, or sometimes P*64 to indicate that it operates at multiples of 64 kbits/s. The video coding portion of the standard, called H.261, codes pictures at a common intermediate format (CIF) of 352 pels by 288 lines.
Figure 9. Computational requirements in millions of instructions per second (mips) for video encoding and decoding at different image resolutions.
A lower resolution of 176 pels by 144 lines, called QCIF, is available for interoperating with PSTN videophones. The H.263 standard is built upon the H.261 framework but modified to optimize video quality at rates lower than 64 kb/s. H.263+ is focused on adding features to H.263, such as scalability and robustness to packet loss on packet networks such as the Internet.

In the late 1980s, a need arose to place motion video and its associated audio onto first-generation CD-ROMs at 1.4 Mbps. For this purpose, in the late 1980s and early 1990s, the ISO MPEG committee developed digital compression standards for both video and two-channel stereo audio. The standard is known colloquially as MPEG-1 and officially as ISO 11172. The bit rate of 1.4 Mbps available on first-generation CD-ROMs is not high enough to allow full-resolution TV. Thus, MPEG-1 was optimized for the reduced CIF resolution of H.320 video-conferencing. It was designed to handle only progressive formats; MPEG-2 later incorporated progressive as well as interlaced formats effectively.

THE DIGITAL ENTERTAINMENT TV STANDARD: MPEG-2
Following MPEG-1, the need arose to compress entertainment TV for such transmission media as satellite, cassette tape, over-the-air, and CATV (5). Thus, to have digital compression methods available for full-resolution standard definition TV (SDTV) pictures such as shown
in Fig. 4a or high definition TV (HDTV) pictures such as shown in Fig. 4b, the ISO (International Standards Organization) developed a second standard, known colloquially as MPEG-2 and officially as ISO 13818. Because the resolution of entertainment TV is approximately four times that of videophone, the bit rate chosen for optimizing MPEG-2 was 4 Mbps.

SUMMARY

A brief survey of digital television has been presented in this article. Digitizing television and compressing it to a manageable bit rate creates significant advantages and major disruption in existing television systems. The future is bright for a variety of systems based on digital television technology.

BIBLIOGRAPHY

1. P. Mertz and F. Gray, Bell Syst. Tech. J. 13, 464-515 (1934).
2. W. T. Wintringham, Proc. IRE 39(10) (1951).
3. K. B. Benson, ed., Television Engineering Handbook, McGraw-Hill, NY, 1986.
4. A. Netravali and B. G. Haskell, Digital Pictures, Plenum, NY, 1988.
5. B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall, London, 1996.
DIGITAL WATERMARKING

R. CHANDRAMOULI
Stevens Institute of Technology
Hoboken, NJ

NASIR MEMON
Polytechnic University
Brooklyn, NY

MAJID RABBANI
Eastman Kodak Company
Rochester, NY
INTRODUCTION

The advent of the Internet has resulted in many new opportunities for creating and delivering content in digital form. Applications include electronic advertising, real-time video and audio delivery, digital repositories and libraries, and Web publishing. An important issue that arises in these applications is protection of the rights of content owners. It has been recognized for quite some time that current copyright laws are inadequate for dealing with digital data. This has led to an interest in developing new copy deterrence and protective mechanisms. One approach that has been attracting increasing interest is based on digital watermarking techniques. Digital watermarking is the process of embedding information into digital multimedia content such that the information (which we call the watermark) can later be extracted or detected for a variety of purposes, including copy prevention and control. Digital watermarking has become an active and important area of research, and development and commercialization of watermarking techniques is deemed essential to help address some of the challenges faced by the rapid proliferation of digital content. In the rest of this article, we assume that the content being watermarked is a still image, though most digital watermarking techniques are, in principle, equally applicable to audio and video data.

A digital watermark can be visible or invisible. A visible watermark typically consists of a conspicuously visible message or a company logo indicating the ownership of the image, as shown in Fig. 1. On the other hand, an invisibly watermarked image appears very similar to the original. The existence of an invisible watermark can be determined only by using an appropriate watermark extraction or detection algorithm. In this article, we restrict our attention to invisible watermarks. An invisible watermarking technique generally consists of an encoding process and a decoding process.
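Schematically, the encoder/decoder pair can be sketched as follows. This is a toy additive scheme invented purely for illustration; the function names, the key-seeded pseudorandom pattern, and the strength parameter `alpha` are all assumptions, not any published algorithm.

```python
import numpy as np

def embed(X, W, key, alpha=2.0):
    """Add a key-generated +/-1 pattern, modulated by the watermark
    bits, to the image pixels (toy additive scheme)."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=X.shape)
    signs = np.where(np.resize(W, X.shape), 1.0, -1.0)  # bit 1 -> +, bit 0 -> -
    return X + alpha * signs * pattern

def extract(X_marked, X, key, w_len):
    """Recover the watermark bits by correlating the pixel differences
    against the same key-generated pattern (non-blind: needs X)."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=X.shape)
    corr = ((X_marked - X) * pattern).flatten()
    bits = np.resize(np.arange(w_len), X.shape).flatten()  # pixel -> bit index
    # Average the correlation over all pixels carrying each bit.
    return np.array([corr[bits == b].mean() > 0 for b in range(w_len)], dtype=int)

rng = np.random.default_rng(1)
X = rng.integers(0, 256, (16, 16)).astype(float)
W = np.array([1, 0, 1, 1, 0, 0, 1, 0])
W_out = extract(embed(X, W, key=42), X, key=42, w_len=8)
print(W_out)  # recovers W when nothing has distorted the marked image
```

With no channel distortion the round trip is exact; after attacks, the per-bit averages become noisy and recovery turns into the detection problem discussed later in the article.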
A generic watermark encoding process is shown in Fig. 2. The watermark insertion step is represented as

X' = E_K(X, W),    (1)
where X is the original image, X’ is the watermarked image, W is the watermark information being embedded, K is the user’s insertion key, and E represents the watermark insertion function. Depending on the way the watermark is inserted and depending on the nature of
Figure 1. An image that has a visible watermark.

Figure 2. Watermark encoding process.
the watermarking algorithm, the detection or extraction method can take on very distinct approaches. One major difference between watermarking techniques is whether the watermark detection or extraction step requires the original image. Watermarking techniques that do not require the original image during the extraction process are called oblivious (or public or blind) watermarking techniques. For oblivious watermarking techniques, watermark extraction is represented as

Ŵ = D_K'(X̂),    (2)

where X̂ is a possibly corrupted watermarked image, K' is the extraction key, D represents the watermark extraction/detection function, and Ŵ is the extracted watermark information (see Fig. 3). Oblivious schemes are attractive for many applications where it is not feasible to require the original image to decode a watermark.

Invisible watermarking schemes can also be classified as either robust or fragile. Robust watermarks are often used to prove ownership claims and so are generally designed to withstand common image processing tasks
Figure 3. Watermark decoding process.
such as compression, cropping, scaling, filtering, contrast enhancement, and printing/scanning, in addition to malicious attacks aimed at removing or forging the watermark. In contrast, fragile watermarks are designed to detect and localize small changes in the image data.

Applications
Digital watermarks are potentially useful in many applications, including the following.
Ownership Assertion. To assert ownership of an image, Alice can generate a watermarking signal using a secret private key and embed it in the original image. She can then make the watermarked image publicly available. Later, when Bob contests the ownership of an image derived from this public image, Alice can produce the unmarked original image and also demonstrate the presence of her watermark in Bob's image. Because Alice's original image is unavailable to Bob, he cannot do the same. For such a scheme to work, the watermark has to survive image processing operations aimed at malicious removal. In addition, the watermark should be inserted so that it cannot be forged because Alice would not want to be held accountable for an image that she does not own.

Fingerprinting. In applications where multimedia content is electronically distributed across a network, the content owner would like to discourage unauthorized duplication and distribution by embedding a distinct watermark (or a fingerprint) in each copy of the data. If unauthorized copies of the data are found at a later time, then the origin of the copy can be determined by retrieving the fingerprint. In this application, the watermark needs to be invisible and must also be invulnerable to deliberate attempts to forge, remove, or invalidate it. The watermark should also be resistant to collusion; that is, a group of users who have the same image containing different fingerprints should not be able to collude and invalidate any fingerprint or create a copy without any fingerprint. Another example is in digital cinema, where information can be embedded as a watermark in every frame or sequence of frames to help investigators locate the scene of the piracy more quickly and point out security weaknesses in the movie's distribution. The information could include data such as the name of the theater and the date and time of the screening. The technology would be most useful in fighting a form of piracy that is surprisingly common, for example, when someone uses a camcorder to record a movie as it is shown in a theater and then duplicates it onto optical disks or VHS tapes for distribution.
Copy Prevention or Control. Watermarks can also be used for copy prevention and control. For example, in a closed system where the multimedia content needs special hardware for copying and/or viewing, a digital watermark can be inserted indicating the number of copies that are permitted. Every time a copy is made, the watermark can be modified by the hardware, and after a certain number of copies, the hardware would not create further copies of the data. An example of such a system is the digital versatile disk (DVD). In fact, a copy protection mechanism that includes digital watermarking at its core is currently being considered for standardization, and second-generation DVD players may well include the ability to read watermarks and act based on their presence or absence (1).

Fraud and Tampering Detection. When multimedia content is used for legal purposes, medical applications, news reporting, and commercial transactions, it is important to ensure that the content originated from a specific source and that it was not changed, manipulated, or falsified. This can be achieved by embedding a watermark in the data. Subsequently, when the photo is checked, the watermark is extracted using a unique key associated with the source, and the integrity of the data is verified through the integrity of the extracted watermark. The watermark can also include information from the original image that can aid in undoing any modification and recovering the original. Clearly, a watermark used for authentication should not affect the quality of an image and should be resistant to forgeries. Robustness is not critical because removing the watermark renders the content inauthentic and hence valueless.

ID Card Security. Information in a passport or ID (e.g., passport number or person's name) can also be included in the person's photo that appears on the ID. The ID card can be verified by extracting the embedded information and comparing it to the written text.
The inclusion of the watermark provides an additional level of security in this application. For example, if the ID card is stolen and the picture is replaced by a forged copy, failure in extracting the watermark will invalidate the ID card. These are a few examples of applications where digital watermarks could be of use. In addition, there are many other applications in digital rights management (DRM) and protection that can benefit from watermarking technology. Examples include tracking use of content, binding content to specific players, automatic billing for viewing content, and broadcast monitoring. From the variety of potential applications exemplified, it is clear that a digital watermarking technique needs to satisfy a number of requirements. The specific requirements vary with the application, so watermarking techniques need to be designed within the context of the entire system in which they are to be employed. Each application
imposes different requirements and requires different types of invisible or visible watermarking schemes, or a combination thereof. In the remaining sections of this article, we describe some general principles and techniques for invisible watermarking. Our aim is to give the reader a better understanding of the basic principles, inherent trade-offs, strengths, and weaknesses of digital watermarking. We will focus on image watermarking in our discussions and examples. However, as we mentioned earlier, the concepts involved are general and can be applied to other forms of content, such as video and audio.

Relationship to Information Hiding and Steganography
In addition to digital watermarking, the general idea of hiding some information in digital content has a wider class of applications that go beyond copyright protection and authentication. The techniques involved in such applications are collectively referred to as information hiding. For example, an image printed on a document could be annotated by information that could lead a user to its high-resolution version, as shown in Fig. 4.

Metadata provide additional information about an image. Although metadata can also be stored in the file header of a digital image, this approach has many limitations. Usually, when a file is transformed to another format (e.g., from TIFF to JPEG or to BMP), the metadata are lost. Similarly, cropping or any other form of image manipulation typically destroys the metadata. Finally, the metadata can be attached to an image only as long as the image exists in digital form and is lost once the image is printed. Information hiding allows the metadata to travel with the image regardless of the file format and image state (digital or analog).

Metadata information embedded in an image can serve many purposes. For example, a business can embed the website URL for a specific product in a picture that shows an advertisement of that product. The user holds the magazine photo in front of a low-cost CMOS camera that is integrated into a personal computer, cell phone, or Palm Pilot. The data are extracted from the low-quality picture and are used to take the browser to the designated website. Another example is embedding GPS data (about 56 bits) about the capture location of a picture. The key difference between this application and many other watermarking applications is the absence of an active adversary. In watermarking applications, such as copyright protection and authentication, there is an active adversary that would attempt to remove,
Figure 4. Metadata tagging using information hiding.
invalidate, or forge watermarks. In information hiding, there is no such active adversary because there is no value in removing the information hidden in the content. Nevertheless, information hiding techniques need to be robust to accidental distortions. For example, in the application shown in Fig. 4, the information embedded in the document image needs to be extracted despite distortions from the print and scan process. However, these distortions are just a part of a process and are not caused by an active adversary.

Another topic that is related to watermarking is steganography (meaning covered writing in Greek), which is the science and art of secret communication. Although steganography has been studied as part of cryptography for many decades, the focus of steganography is secret communication. In fact, the modern formulation of the problem goes by the name of the prisoner's problem. Here, Alice and Bob are trying to hatch an escape plan while in prison. The problem is that all communication between them is examined by a warden, Wendy, who will place both of them in solitary confinement at the first hint of any suspicious communication. Hence, Alice and Bob must trade seemingly inconspicuous messages that actually contain hidden messages involving the escape plan. There are two versions of the problem that are usually discussed: one where the warden is passive and only observes messages, and the other where the warden is active and modifies messages in a limited manner to guard against hidden messages. Clearly, the most important issue here is that the very presence of a hidden message must be concealed, whereas in digital watermarking, it is not always necessary that a good watermarking technique also be steganographic.

Watermarking Issues

The following important issues arise in studying digital watermarking techniques:
• Capacity: What is the optimum amount of data that can be embedded in a given signal? What is the optimum way to embed and then later extract this information?
• Robustness: How do we embed and retrieve data so that it survives malicious or accidental attempts at removal?
• Transparency: How do we embed data so that it does not perceptually degrade the underlying content?
• Security: How do we determine that the information embedded has not been tampered with, forged, or even removed?
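Transparency, in particular, is routinely checked with crude numerical proxies during development. One common (and imperfect) proxy is the peak signal-to-noise ratio (PSNR) between the original and marked images; the sketch below uses made-up image data and a hypothetical ±1 embedding perturbation.

```python
import numpy as np

def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less visible change."""
    mse = np.mean((original.astype(float) - marked.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
marked = img + rng.choice([-1.0, 1.0], size=img.shape)  # +/-1 embedding noise
print(round(psnr(img, marked), 1))  # -> 48.1
```

A PSNR this high usually indicates an imperceptible change, but PSNR ignores the masking and frequency-sensitivity effects of human vision that the perceptual models in this article exploit.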
These questions have been the focus of intense study in the past few years, and some remarkable progress has already been made. However, there are still more questions than answers in this rapidly evolving research area. Perhaps a key reason for this is that digital watermarking is inherently a multidisciplinary topic that builds on developments in diverse subjects. The areas that contribute to the development of digital watermarking include at least the following:

• Information and communication theory
• Decision and detection theory
• Signal processing
• Cryptography and cryptographic protocols
Each of these areas deals with a particular aspect of the digital watermarking problem. Generally speaking, information and communication theoretic methods deal with the data embedding (encoder) side of the problem. For example, information theoretic methods are useful in computing the amount of data that can be embedded in a given signal subject to various constraints, such as the peak power (square of the amplitude) of the embedded data or the embedding-induced distortion. The host signal can be treated as a communication channel, and various operations such as compression/decompression and filtering can be treated as noise. Using this framework, many results from classical information theory can be successfully applied to compute the data-embedding capacity of a signal.

Decision theory is used to analyze data-embedding procedures from the receiver (decoder) side. Given a data-embedding procedure, how do we extract the hidden data from the host signal, which may have been subjected to intentional or unintentional attacks? The data extraction procedure must guarantee a certain amount of reliability. What are the chances that the extracted data are indeed the original embedded data? Even if the data-embedding algorithm is not intelligent or sophisticated, a good data extraction algorithm can offset this effect. In watermarking applications where the embedded data are used for copyright protection, decision theory is used to detect the presence of embedded data. In applications such as media bridging, detection theoretic methods are needed to extract the embedded information. Therefore, decision theory plays a very important role in digital watermarking for data extraction and detection. In fact, it has been shown that when using invisible watermarks for resolving rightful ownership, uniqueness problems arise due to the data detection process, irrespective of the data-embedding process. Therefore, there is a real and immediate need to develop reliable, efficient, and robust detectors for digital watermarking applications.

A variety of signal processing tools and algorithms can be applied to digital watermarking. Such algorithms are based on aspects of the human visual system, properties of signal transforms [e.g., Fourier and discrete cosine
transform (DCT)], noise characteristics, properties of various signal processing attacks, etc. Depending on the nature of the application and the context, these methods can be implemented at the encoder, at the decoder, or both. The user has the flexibility to mix and match different techniques, depending on the algorithmic and computational constraints. Although issues such as visual quality, robustness, and real-time constraints can be accommodated, it is still not clear whether all of the properties desirable for digital watermarking discussed earlier can be achieved by any single algorithm. In most cases, these properties involve an inherent trade-off. Therefore, developing signal processing methods to strike an optimal balance among the competing properties of a digital watermarking algorithm is necessary.

Cryptographic issues lie at the core of many applications of information hiding but have unfortunately received little attention. Perhaps this is due to the fact that most work in digital watermarking has been done in the signal processing and communications community, whereas cryptographers have focused more on issues like secret communication (covert channels, subliminal channels) and collusion-resistant fingerprinting. It is often assumed that simply using appropriate cryptographic primitives like encryption, time-stamps, digital signatures, and hash functions would result in secure information hiding applications. We believe that this is far from the truth. In fact, designing secure digital watermarking techniques requires an intricate blend of cryptography along with information theory and signal processing.

The rest of this article is organized as follows. In the next section, we describe fragile and semifragile watermarking; the following section deals with robust watermarks. Communication and information theoretic approaches to watermarking are discussed in the subsequent section, and concluding remarks are provided in the last section.
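To make the information-theoretic framing above concrete: if the host interference and attack distortions are idealized as additive white Gaussian noise, the classical Shannon capacity bounds how many bits per sample can be embedded. The power values below are arbitrary illustrations, not measurements of any real system.

```python
import math

def capacity_bits_per_sample(embed_power, noise_power):
    """Shannon capacity of an additive white Gaussian noise channel,
    a common idealization for data-embedding capacity estimates."""
    return 0.5 * math.log2(1.0 + embed_power / noise_power)

# Embedding power is limited by visibility; compression noise dominates.
for p, n in [(1.0, 10.0), (1.0, 100.0)]:
    c = capacity_bits_per_sample(p, n)
    print(f"P/N = {p / n:g}: {c:.4f} bits/pixel, "
          f"{c * 512 * 512:.0f} bits in a 512 x 512 image")
```

Even at low signal-to-noise ratios, the per-pixel capacity multiplied by the number of pixels leaves room for thousands of payload bits, which is why whole-image watermarks can survive substantial distortion.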
FRAGILE AND SEMIFRAGILE WATERMARKS
In the analog world, an image (a photograph) has generally been accepted as proof of occurrence of the event depicted. The advent of digital images and the relative ease with which they can be manipulated has changed this situation dramatically. Given an image in digital or analog form, one can no longer be assured of its authenticity. This has led to the need for image authentication techniques. Data authentication techniques have been studied in cryptography for the past few decades. They provide a means of ensuring the integrity of a message. At first, the need for image authentication techniques may not seem to pose a problem because efficient and effective authentication techniques are found in the field of cryptography. However, authentication applications for images present some unique problems that are not addressed by conventional cryptographic authentication techniques. Some of these issues are listed here:

• It is desirable in many applications to authenticate the image content, rather than the representation of the content. For example, converting an image from JPEG to GIF is a change in representation. One would like the authenticator to remain valid across different representations, as long as the perceptual content has not been changed. Conventional authentication techniques based on cryptographic hash functions, message digests, and digital signatures authenticate only the representation.
• When authenticating image content, it is often desirable to embed the authenticator in the image itself. This has the advantage that authentication will not require any modifications to the large number of existing representational formats for image content that do not provide any explicit mechanism for including an authentication tag (like the GIF format). More importantly, the authentication tag embedded in the image would survive transcoding of the data across different formats, including analog-to-digital and digital-to-analog conversions, in a completely transparent manner.
• In addition to detecting any tampering with the original content, it is also desirable to detect the exact location of the tampering.
• Given the highly data-intensive nature of image content, any authentication technique has to be computationally efficient to the extent that a simple real-time implementation should be possible in both hardware and software.
These issues can be addressed by designing image authentication techniques based on digital watermarks. Two kinds of watermarking techniques have been developed for authentication applications: fragile watermarking techniques and semifragile watermarking techniques. In the rest of this section, we describe the general approach taken by each and give some illustrative examples.

Fragile Watermarks
A fragile watermark is designed to indicate and even pinpoint any modification made to an image. To illustrate the basic workings of fragile watermarking, we describe a technique recently proposed by Wong and Memon (2). This technique inserts an invisible watermark W into an
Figure 5. Public key verification watermark insertion procedure.
m × n image X. The original image X and the binary watermark W are partitioned into k × l blocks, where the rth image block and watermark block are denoted X_r and W_r, respectively. For each image block X_r, a corresponding block X̃_r is formed, identical to X_r except that the least significant bit of every element is set to zero. For each block X_r, a cryptographic hash H(K, m, n, X̃_r) (such as MD5) is computed, where K is the user's key. The first kl bits of the hash output, treated as a k × l rectangular array, are XORed with the current watermark block W_r to form a new binary block C_r. Each element of C_r is inserted into the least significant bit of the corresponding element in X_r, generating the output block X'_r. Image authentication is performed by extracting C_r from each block X'_r of the watermarked image and XORing that array with the cryptographic hash H(K, m, n, X̃_r), as before, to produce the extracted watermark block. Changes in the watermarked image result in changes in the corresponding binary watermark region, enabling the technique to localize unauthorized alterations of an image. The watermarking algorithm can also be extended to a public key version, where the private key of a public key algorithm is required to insert the watermark; however, extraction requires only the public key of user A. More specifically, in the public key version of the algorithm, the MSBs of an image data block X̃_r and the image size parameters are hashed, and the result is encrypted using the private key of a public key algorithm. The resulting encrypted block is then XORed with the corresponding binary watermark block W_r before the combined result is embedded in the LSB of the block. In the extraction step, the same MSB data and the image size parameters are hashed. The LSB of the data block (the ciphertext) is decrypted using the public key and then XORed with the hash output to produce the watermark block. Refer to Figs. 5 and 6 for the public key verification watermark insertion and extraction processes, respectively. This technique is one example of a fragile watermarking technique; many others have been proposed in the literature. The following are the main issues that need to be addressed in designing fragile watermarking techniques:
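To make the block-wise procedure concrete, here is a minimal sketch in Python (NumPy plus the standard library MD5). The 8 × 8 block size, the byte encoding of K, m, and n, and the function names are illustrative assumptions, not the exact parameters of the original scheme:

```python
import hashlib
import numpy as np

def _hash_bits(key, m, n, block_msb, k, l):
    """First k*l bits of H(K, m, n, X~_r), reshaped to a k x l binary array."""
    h = hashlib.md5(key + m.to_bytes(4, "big") + n.to_bytes(4, "big")
                    + block_msb.tobytes()).digest()
    return np.unpackbits(np.frombuffer(h, dtype=np.uint8))[:k * l].reshape(k, l)

def embed_block(block, wm_block, key, m, n):
    """Insert one binary watermark block into the LSB plane of an image block."""
    k, l = block.shape
    msb = block & 0xFE                                  # X~_r: LSBs zeroed
    c = _hash_bits(key, m, n, msb, k, l) ^ wm_block     # C_r = hash bits XOR W_r
    return msb | c                                      # C_r goes into the LSBs

def extract_block(wblock, key, m, n):
    """Recover the watermark block; any change to the block scrambles the result."""
    k, l = wblock.shape
    msb = wblock & 0xFE
    return (wblock & 1) ^ _hash_bits(key, m, n, msb, k, l)
```

Verification compares the extracted blocks against the known watermark; blocks that fail to match mark the tampered regions.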
Figure 6. Public key verification watermark extraction procedure.

Figure 7. SARI image authentication system: verification procedure.
- Locality: How well does the technique identify the exact pixels that have been modified? The Wong and Memon technique described above, for example, can localize changes only to image blocks (at least 12 × 12), even if only one pixel in the block has been changed; any region smaller than this cannot be pinpointed as modified.
- Transparency: How much degradation in image quality is suffered by inserting a watermark?
- Security: How difficult is it for someone without knowledge of the secret key (the user key K in the first scenario, or the private key in the second scenario) used in the watermarking process to modify an image without modifying the watermark, or to insert a new but valid watermark?

Semifragile Watermarks
The methods described in the previous subsection authenticate the data that form the multimedia content; the authentication process does not treat the data as distinct from any other data stream. Only the process of inserting the signature into the multimedia content treats the data stream as an object that is to be viewed by a human observer. For example, a watermarking scheme
may maintain the overall average image color, or it may insert the watermark in the least significant bit, thus discarding the least significant bits of the original data stream and treating them as perceptually irrelevant. All multimedia content in current representations has a fair amount of built-in redundancy; that is to say, the data representing the content can be changed without effecting a perceptual change. Further, even perceptual changes in the data may not affect the content. For example, when dealing with images, one can brighten an image, compress it in a lossy fashion, or change contrast settings. The changes caused by these operations could well be perceptible, even desirable, but the image content is not considered changed: objects in the image are in the same positions and are still recognizable. It is highly desirable that authentication of multimedia documents take this into account, that is, that there be a set of allowed operations that can be applied to the image content without affecting the authenticity of the image. There have been a number of recent attempts at techniques that address authentication of “image content”, not just the image data representation. One approach is to use feature points that are robust to image compression in defining image content. Cryptographic schemes such as digital signatures can then be used to authenticate these feature points. Typical feature points include, for example, edge maps (3), local maxima and minima, and low-pass wavelet coefficients (4). The problem with these methods is that it is hard to define image content in terms of a few features; for example, edge maps do not sufficiently define image content because it may be possible for two images to have fairly different content (the face of one person replaced by that of another) but identical edge maps.
Image content remains an ill-defined attribute that defies quantification despite the many attempts by the image processing and vision communities. Another interesting approach to authenticating image content is to compute an image digest (or hash, or fingerprint) of the image and encrypt the digest using a secret key. For public key verification of the image, the secret key is the user's private key, and hence verification can be done by anyone who has the user's public key, much like digital signatures. Note that the image digest is much smaller than the image itself and can be embedded in the image by using a robust watermarking technique. Furthermore, the image digest has the property that, as long as the image content has not changed, the digest computed from the image remains the same. Clearly, constructing such an image digest function is a difficult problem. Nevertheless, a few such functions have been proposed in the literature, and image authentication schemes based on them have been devised. Perhaps the most widely cited image digest function/authentication scheme is SARI, proposed by Lin and Chang (5). The SARI authentication scheme contains an image digest function that generates hash bits that are invariant to JPEG compression; that is, the hash bits do not change if the image is JPEG compressed but do change for any other significant or malicious operation.
The image digest component of SARI is based on the invariance of the relationship between selected DCT coefficients in two given image blocks. It can be proven that this relationship is maintained even after JPEG compression that uses the same quantization matrix for the whole image. Because the image digest is based on this feature, SARI can distinguish between JPEG compression and other malicious operations that modify image content. More specifically, in SARI, the image to be authenticated is first transformed to the DCT domain. The DCT blocks are grouped into nonoverlapping sets P_p and P_q as defined here:

P_p = {P_1, P_2, P_3, . . . , P_(N/2)},
P_q = {Q_1, Q_2, Q_3, . . . , Q_(N/2)},
where N is the total number of DCT blocks in the input image. An arbitrary mapping function Z is defined between these two sets that satisfies the criteria P_q = Z(K, P_p), P_p ∩ P_q = ∅, and P_p ∪ P_q = P, where P is the set of all DCT blocks of the input image. The mapping function Z is central to the security of SARI and is based on a secret key K. The mapping effectively partitions the image blocks into pairs. Then, for each block pair, a number of DCT coefficients is selected. Feature code or hash bits are then generated by comparing the corresponding coefficients in the paired blocks. For example, if a selected DCT coefficient in block P_m is greater than the corresponding coefficient in block P_n of the block pair (P_m, P_n), the hash bit generated is “1”; otherwise, a “0” is generated. It is clear that a hash bit preserves the relationship between the selected DCT coefficients in a given block pair. The hash bits generated for each block are concatenated to form the digest of the input image. This digest can then either be embedded in the image itself or appended as a tag. The authentication procedure at the receiving end involves extracting the embedded digest. The digest for the received image is generated in the same manner as at the encoder and is compared with the extracted and decrypted digest. Because the relationships between selected DCT coefficients are maintained even after JPEG compression, this authentication system can distinguish JPEG compression from other malicious manipulations of the authenticated image. However, it was recently shown that if a system uses the same secret key K, and hence the same mapping function Z, to form block pairs for all of the images it authenticates, an attacker who has access to a sufficient number of images authenticated by this system can produce arbitrary fake images (6). SARI is limited to authentication that is invariant only to JPEG compression.
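A toy version of this digest generation can be sketched as follows. The direct 8 × 8 orthonormal DCT, the choice of the (2, 1) coefficient, and the key-seeded random permutation standing in for the mapping function Z are all illustrative assumptions:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block."""
    N = block.shape[0]
    n = np.arange(N)
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * n[:, None] / (2 * N))
    T[0] /= np.sqrt(2.0)
    return T @ block @ T.T

def sari_digest(image, key, coeff=(2, 1)):
    """One hash bit per block pair: compare a selected DCT coefficient across
    8 x 8 blocks paired by a key-dependent permutation (stand-in for Z)."""
    h, w = image.shape
    coeffs = [dct2(image[i:i + 8, j:j + 8])[coeff]
              for i in range(0, h, 8) for j in range(0, w, 8)]
    order = np.random.default_rng(key).permutation(len(coeffs))
    half = len(coeffs) // 2
    return [int(coeffs[order[i]] > coeffs[order[i + half]]) for i in range(half)]
```

Because the compared coefficients are AC terms, a uniform brightness shift leaves this toy digest unchanged, whereas operations that alter the relative content of the paired blocks generally flip bits.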
Although JPEG compression is one of the most common operations performed on an image, certain applications may require authentication that is invariant to other simple image processing operations, such as contrast enhancement or sharpening. As representative of the published literature on this problem, we review a promising technique proposed by Fridrich (7). In this technique, N random matrices whose entries are uniformly distributed in [0, 1] are generated using a secret key. Then, a low-pass filter is applied to each of these random matrices to obtain N random smooth
Figure 8. Random patterns and their smoothed versions used in Fridrich's semifragile watermarking technique.
patterns, as shown in Fig. 8. These are then made DC free by subtracting their respective means to obtain P_i, where i = 1, . . . , N. Then, an image block B is projected onto each of these random smooth patterns. If a projection is greater than zero, the hash bit generated is a “1”; otherwise, a “0” is generated. In this way, N hash bits are generated for image authentication. Because the patterns P_i have zero mean, the projections do not depend on the mean gray value of the block but depend only on the variations within the block itself. The robustness of this bit-extraction technique was tested on real imagery, and it was shown that it can reliably extract more than 48 correct bits (out of 50) from a small 64 × 64 image for the following image processing operations: 15% quality JPEG compression (as in PaintShop Pro); additive uniform noise that has an amplitude of 30 gray levels; 50% contrast adjustment; 25% brightness adjustment; dithering to 8 colors; multiple applications of sharpening, blurring, median, and mosaic filtering; histogram equalization and stretching; edge enhancement; and gamma correction in the range 0.7-1.5. However, operations such as embossing, and geometric modifications such as rotation, shift, and change of scale, lead to a failure to extract the correct bits. In summary, image content authentication using a visual hash function and then embedding this hash by using a robust watermark is a promising area and will see many developments in the coming years. This is a difficult problem, and there may never be a completely satisfactory solution, because there is no clear definition of image content, and relatively small changes in image representation can lead to large variations in image content.

ROBUST WATERMARKS

Unlike fragile watermarks, robust watermarks are resilient to intentional or unintentional attacks and signal processing operations. Ideally, a robust watermark must withstand attempts to destroy or remove it.
Some of the desirable properties of a good, robust watermark include the following:

- Perceptual transparency: Robustness must not be achieved at the expense of perceptible degradation of the watermarked data. For example, a high-energy watermark can withstand many signal processing attacks; however, even in the absence of any attack, it can cause significant loss in the visual quality of the watermarked image.
- Higher payload: A robust watermark must be able to carry a higher number of information bits reliably, even in the presence of attacks.
- Resilience to common signal processing operations such as compression, linear and nonlinear filtering, additive random noise, and digital-to-analog conversion.
- Resilience to geometric attacks such as translation, rotation, cropping, and scaling.
- Robustness to collusion attacks, where multiple copies of the watermarked data can be used to create or remove a valid watermark.
- Computational simplicity: Computational complexity is an important consideration when designing robust watermarks. If a watermarking algorithm is robust but computationally very intensive during encoding or decoding, its usefulness in real life may be limited.
In general, most of the above properties conflict with one another, so a number of trade-offs are needed. Three major trade-offs in robust watermarking, and the applications impacted by each trade-off factor, are shown in Fig. 9. It is easily understood that placing a watermark in perceptually insignificant components of an image distorts the watermarked image imperceptibly. However, such watermarking techniques are generally not robust to intentional or unintentional attacks. For example, if the watermarked image is lossy compressed, the perceptually insignificant components are discarded by the compression algorithm. Therefore, for a watermark to be robust, it must be placed in the perceptually significant components of an image, even though we run a risk of causing perceptible distortions. This gives rise to two important questions: (a) what are the perceptually significant components of a signal, and (b) how can the perceptual degradation due to robust watermarking be minimized? The answer to the first question depends on the type of medium: audio, image, or video. For example, certain spatial frequencies and some spatial characteristics, such as edges in an image, are perceptually significant.

Figure 9. Trade-offs in robust watermarking.

Therefore, choosing these components as carriers of a watermark will add robustness to operations such as lossy compression. There are many ways in which a watermark can be inserted into perceptually significant components, but care must be taken to shape the watermark to match the characteristics of the carrier components. A common technique used in most robust watermarking algorithms is adaptation of the watermark energy to suit the characteristics of the carrier, usually based on certain local statistics of the original image, so that the watermark is not visually perceptible. A number of robust watermarking techniques have been developed during the past few years. Some apply the watermark in the spatial domain and some in the frequency domain. Some are additive watermarks, and some use a quantize-and-replace strategy. Some are linear and some are nonlinear. The earliest robust spatial-domain techniques were the MIT patchwork algorithm (8) and another by Digimarc (9). One of the first and still the most cited frequency-domain techniques was proposed by Cox et al. (10). Some early perceptual watermarking techniques using linear transforms in the transform domain were proposed in (11). Finally, a recent spatial-domain algorithm that is remarkably robust was proposed by Kodak (12-16). Instead of describing these different algorithms independently, we choose to describe Kodak's technique in detail because it clearly identifies the different elements that are needed in a robust watermarking technique.

Kodak's Watermarking Technique
A spatial watermarking technique based on phase dispersion was developed by Kodak (12-16). The Kodak method is noteworthy for several reasons. First, it can be used to embed either a gray-scale iconic image or binary data. Iconic images include trademarks, corporate logos, or other arbitrary small images; an example is shown in Fig. 10a. Second, the technique can determine cropping coordinates without the need for a separate calibration signal. Furthermore, the strategy used to detect rotation and scale can be applied to other watermarking methods in which the watermark is inserted as a periodic pattern in the image domain. Finally, the Kodak algorithm reportedly scored 0.98 using StirMark 3.0 (15). The following is a brief description of the technique; for brevity, only the embedding of binary data is considered. The binary digits of the message are represented by positive and negative delta functions (corresponding to ones and zeros) that are placed in unique locations within a message image M. These locations are specified by a predefined message template T, an example of which is shown in Fig. 10b. The size of the message template is typically only a portion of the original image size (e.g., 64 × 64 or 128 × 128). Next, a carrier image C, which is the same size as the message image, is generated by using a secret key. The carrier image is usually constructed in the Fourier domain by assigning a uniform amplitude and a random phase (produced by
Figure 11. Schematic of the watermark insertion process.
a random number generator initialized by the secret key) to each spatial-frequency location. The carrier image is convolved with the message image to produce a dispersed message image, which is then added to the original image. Because the message image is typically smaller than the original image, the original image is partitioned into contiguous nonoverlapping rectangular blocks X_r, which are the same size as the message image. The message embedding process creates a block of the watermarked image, X'_r(x, y), according to the following relationship:

X'_r(x, y) = α[M(x, y) * C(x, y)] + X_r(x, y),    (3)

where the symbol * represents cyclic convolution and α is an arbitrary constant chosen to make the embedded message simultaneously invisible and robust to common processing. This process is repeated for every block in the original image, as depicted in Fig. 11. It is clear from Eq. (3) that there are no restrictions on the message image, and its pixel values can be either binary or multilevel. The basic extraction process is straightforward and consists of correlating a watermarked image block with the same carrier image used to embed the message. The extracted message image M'(x, y) is given by

M'(x, y) = X'_r(x, y) ⊗ C(x, y),    (4)

where the symbol ⊗ represents cyclic correlation. The correlation of the carrier with itself can be represented by a point-spread function p(x, y) = C(x, y) ⊗ C(x, y), and because the operations of convolution and correlation commute, Eq. (4) reduces to

M'(x, y) = α[M(x, y) * p(x, y)] + X_r(x, y) ⊗ C(x, y).    (5)

The extracted message is a linearly degraded version of the original message plus a low-amplitude noise term resulting from the cross-correlation of the original image with the carrier. The original message can be recovered by using any conventional restoration (deblurring) technique such as Wiener filtering. However, for an ideal carrier, p(x, y) is a delta function, and the watermark extraction process results in a scaled version of the message image plus low-amplitude noise. To improve the signal-to-noise ratio of the extraction process, the watermarked image blocks are aligned and summed before the extraction process, as shown in Fig. 12. The summation of the blocks reinforces the watermark component (because it is the same in each block), and the noise component is reduced because the image content typically varies from block to block. To create a system that is robust to cropping, rotation, scaling, and other common image processing tasks such as sharpening, blurring, and compression, many factors need to be considered in designing the carrier and the message template. In general, designing the carrier requires considering the visual transparency of the embedded message, the extracted signal quality, and the robustness to image processing operations. For visual transparency, most of the carrier energy should be concentrated in the higher spatial frequencies because the contrast sensitivity function (CSF) of the human visual system falls off rapidly at higher frequencies. However, to improve the extracted signal quality, the autocorrelation function of the carrier, p(x, y), should be as close as possible to a delta function, which implies a flat spectrum.
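Under simplifying assumptions (a single block, a real-valued carrier approximating the flat-amplitude random-phase construction, and FFT-based cyclic convolution and correlation), the embedding and extraction steps can be sketched as follows; the function names and the 64 × 64 size are illustrative:

```python
import numpy as np

def make_carrier(shape, key):
    """Real carrier with approximately flat spectrum: unit amplitude, random phase."""
    rng = np.random.default_rng(key)
    phase = rng.uniform(0.0, 2.0 * np.pi, shape)
    return np.real(np.fft.ifft2(np.exp(1j * phase)))

def embed(block, message, key, alpha=0.1):
    """Eq. (3): X' = alpha * (M cyclically convolved with C) + X."""
    c = make_carrier(message.shape, key)
    dispersed = np.real(np.fft.ifft2(np.fft.fft2(message) * np.fft.fft2(c)))
    return block + alpha * dispersed

def extract(wblock, key):
    """Eq. (4): cyclically correlate the watermarked block with the same carrier."""
    c = make_carrier(wblock.shape, key)
    return np.real(np.fft.ifft2(np.fft.fft2(wblock) * np.conj(np.fft.fft2(c))))
```

With a zero host block and a single message delta, the extracted image is the carrier autocorrelation p centered at the delta's location, so the peak of the extracted image marks the embedded position.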
In addition, it is desirable to spread out the carrier energy across all frequencies to improve robustness to both friendly and malicious attacks because the power spectrum of typical imagery falls off with spatial frequency and concentration of the carrier energy in high frequencies would create little frequency overlap between the image and the embedded watermark. This would render the watermark vulnerable to removal by simple low-pass filtering. The actual design of the carrier is a balancing act between these concerns. The design of an optimal message template is guided by two requirements. The first is to maximize the quality of the extracted signal, which is achieved by placing the
message locations maximally apart. The second is that the embedded message must be recoverable from a cropped version of the watermarked image. Consider a case where the watermarked image has been cropped so that the watermark tiles in the cropped image are displaced with respect to the tiles in the original image. It can be shown that the message extracted from the cropped image is a cyclically shifted version of the message extracted from the uncropped image. Because the message template is known, the amount of the shift can be unambiguously determined by ensuring that all of the cyclic shifts of the message template are unique. This can be accomplished by creating a message template that has an autocorrelation equal to a delta function. Although in practice it is impossible for the autocorrelation of the message template to be an ideal delta function, optimization techniques such as simulated annealing can be used to design a message template that has maximum separation and minimum sidelobes. The ability to handle rotation and scaling is a fundamental requirement of robust data embedding techniques. Almost all applications that involve printing and scanning result in some degree of scaling and rotation. Many algorithms rely on an additional calibration signal to correct for rotation and scaling, which taxes the information capacity of the embedding system. Instead, the Kodak approach uses the autocorrelation of the watermarked image to determine the rotation and scale parameters, which does not require a separate calibration signal. This method can also be applied to any embedding technique where the embedded image is periodically repeated in tiles. It can also be implemented across local regions to correct for low-order geometric warps. To see how this method is applied, consider the autocorrelation function of a watermarked image that has not been rotated or scaled. At zero displacement, there is a large peak due to the image correlation with itself. 
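The cyclic-shift argument can be illustrated numerically; the random ±1 template below is an illustrative stand-in for a template optimized for maximum separation and minimum sidelobes:

```python
import numpy as np

def find_cyclic_shift(template, observed):
    """Locate the peak of the cyclic cross-correlation between a known template
    and an observed (cyclically shifted, e.g., cropped) copy of it."""
    corr = np.real(np.fft.ifft2(np.fft.fft2(observed) * np.conj(np.fft.fft2(template))))
    return np.unravel_index(np.argmax(corr), corr.shape)
```

Because a template whose autocorrelation is close to a delta function makes all of its cyclic shifts distinguishable, the correlation peak recovers the shift unambiguously.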
However, because the embedded message pattern is repeated at each tile, lower magnitude correlation peaks are also expected at regularly spaced horizontal and vertical intervals equal to the tile dimension. Rotation and scaling affect the relative position of these secondary peaks in exactly the same way that they affect the image.
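The secondary-peak idea can be sketched along one axis: for an image built from repeated tiles, the cyclic autocorrelation shows peaks at lags that are multiples of the tile width. The 16-pixel tile, the simple mean-removal preprocessing, and the function name are illustrative choices:

```python
import numpy as np

def tile_period_x(image):
    """Estimate horizontal tile spacing from the largest nonzero-lag
    peak of the cyclic autocorrelation."""
    f = np.fft.fft2(image - image.mean())        # crude preprocessing: remove the mean
    ac = np.real(np.fft.ifft2(f * np.conj(f)))   # cyclic autocorrelation
    row = ac[0, 1:ac.shape[1] // 2 + 1]          # horizontal lags 1 .. W/2
    return int(np.argmax(row)) + 1
```

For a perfectly tiled image, the strongest nonzero-lag peak occurs at a multiple of the tile width; rotation and scaling would move these peaks correspondingly, which is what the detection step exploits.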
Figure 12. Schematic of the watermark extraction process.
By properly detecting these peaks, the exact amounts of rotation and scale can be determined. An example is shown in Fig. 13. Not surprisingly, the energy of the original image is much larger than that of the embedded message, and the autocorrelation of the original image can mask the detection of the periodic peaks. To minimize this problem, the watermarked image needs to be processed before computing the autocorrelation function. Examples of such preprocessing include removing the local mean by a spatially adaptive technique or simple high-pass filtering. In addition, the resulting autocorrelation function is high-pass filtered to amplify the peak values.

Figure 13. (a) Example of a watermarked image without rotation and scale transformation and its corresponding autocorrelation. (b) The image in the top row after scale and rotational transformation and its corresponding autocorrelation.

COMMUNICATION AND INFORMATION THEORETIC ASPECTS

Communication and information theoretic approaches focus mainly on the theoretical analysis of watermarking systems. They deal with abstract mathematical models for watermark encoding, attacks, and decoding. These models enable studying watermarks at a high level without resorting to any specific application (such as image authentication). Therefore, the results obtained by using these techniques are potentially useful in a wide variety of applications, by suitably mapping the application to a communication or information theoretic model. The rich set of mathematical models based primarily on the theory of probability and stochastic processes allows rigorous study of watermarking techniques; however, a common complaint from practitioners is that some of these popular mathematical theories are not completely valid in practice. Therefore, studying watermarks based on communication and information theory is an ongoing process in which theories are proposed and refined based on feedback from engineering applications of watermarks. In this section, we describe some communication and information theoretic aspects of digital watermarking. First, we describe the similarities and differences between classical communication and current watermarking systems. Once this is established, it becomes easier to adapt the theory of communications to watermarking and to make theoretical predictions about the performance of a watermarking system. Following this discussion, we describe some information theoretic models applied to watermarking.

Watermarking as Communication
Standard techniques from communication theory can be adapted to study and improve the performance of watermarking algorithms (17). Figure 14 shows an example of a communication system in which the information bits are first encoded to suit the modulation type, error control, and so on, followed by modulation of a carrier signal to transmit this information across a noisy channel. At the decoder, the carrier is demodulated, and the information bits (possibly corrupted by channel noise) are decoded. Figure 15 shows the counterpart system for digital watermarking. The modulator in Fig. 14 has been replaced by the watermark embedder that places the watermark in the media content. The channel noise has been replaced by the distortions of the watermarked media induced either by malicious attacks or by signal processing operations such as compression/decompression, cropping, filtering, and scaling. The embedded watermark is extracted by the watermark decoder or detector. However, note that a major difference between the two models exists on the encoder side: in communication systems, encoding is done to protect the information bits from channel distortion, but in watermarking, emphasis is usually placed on techniques that minimize perceptual distortions of the watermarked content. Some analogies between the traditional communication system and the watermarking system are summarized in Table 1. We note from this table that the theory and algorithms developed for studying digital communication systems may be directly applicable to studying some aspects of watermarking. Note that though these two systems have common requirements, such as power and reliability constraints, these requirements may be motivated by different factors; for example, the power constraint in a communication channel is imposed from a cost perspective, whereas in watermarking, it is motivated by perceptual issues.

Information Theoretic Analysis
Information theoretic methods have been successfully applied to information storage and transmission (18). Here, messages and channels are modelled probabilistically, and their properties are studied analytically. A great amount of effort during the past five decades has produced many interesting results regarding the capacity of various channels, that is, the maximum amount
Figure 15. Watermarking as a communication system.

Table 1. Analogies between the communication and watermarking systems.
of information that can be transmitted through a channel so that decoding this information with an arbitrarily small probability of error is possible. Using the analogy between communication and watermarking channels, it is possible to compute fundamental information-carrying capacity limits of watermarking channels using information theoretic analysis. In this context, the following two important questions arise:

- What is the maximum length (in bits) of a watermark message that can be embedded and distinguished reliably in a host signal?
- How do we design watermarking algorithms that can effectively achieve this maximum?
Answers to these questions can be found based on certain assumptions (19-28). We usually begin by assuming probability models for the watermark signal, host signal, and the random watermark key. A distortion constraint is then placed on the watermark encoder. This constraint is used to model and control the perceptual distortion induced due to watermark insertion. For example, in image or video watermarking, the distortion metric could be based on human visual perceptual criteria. Based on the application, the watermark encoder can use a suitable distortion metric and a value for this metric that must be met during encoding. A watermark attacker has a similar distortion constraint, so that the attack does not result in a completely corrupted watermarked signal that makes it useless to all parties concerned. The information that is known to the encoder, attacker,
and the decoder is incorporated into the mathematical model through joint probability distributions. Then, the watermarking capacity is given by the maximum rate of reliably embedding the watermark over all possible watermarking strategies and all attacks that satisfy the specified constraints. This problem can also be formulated as a stochastic game whose players are the watermark encoder and the attacker (29). The common payoff function of this game is the mutual information between the random variables representing the input and the received watermark. Now, we discuss the details of the mathematical formulation described before. Let a watermark (or message) W, drawn from a message set 𝒲, be communicated to the decoder. This watermark is embedded in a length-N sequence X^N = (X_1, X_2, . . . , X_N) representing the host signal. Let the watermark key known both to the encoder and the decoder be K^N = (K_1, K_2, . . . , K_N). Then, using W, X^N, and K^N, the encoder produces a watermarked signal X'^N = (X'_1, X'_2, . . . , X'_N). For instance, in transform-based image watermarking, each X_i could represent a block of 8 × 8 discrete cosine transform coefficients, W could be the spread spectrum watermark (10), and K^N could be the locations of the transform coefficients where the watermark is embedded; therefore, N = 4096 for a 512 × 512 image. Usually, it is assumed that the elements of X^N are independent and identically distributed (i.i.d.) random variables whose probability mass function is p(x), x ∈ X. Similarly, the elements of K^N are i.i.d., and their probability mass function is p(k), k ∈ K. If X and K denote generic random variables in the random vectors
X^N and K^N, respectively, then any dependence between X and K is modeled by the joint probability mass function p(x, k). Usually, it is assumed that W is independent of (X, K). Then, a length-N watermarking code that has distortion D_1 is a triple (𝒲, f_N, φ_N), where 𝒲 is a set of messages whose elements are uniformly distributed, f_N is the encoder mapping, and φ_N is the decoder mapping, that satisfy the following (25):

• The encoder mapping x′^N = f_N(x^N, w, k^N) is such that the expected value of the distortion, E[d_N(X^N, X′^N)] ≤ D_1.
• The decoder mapping is given by ŵ = φ_N(y^N, k^N) ∈ 𝒲, where y^N is the received watermarked signal.

The attack channel is modeled as a sequence of conditional probability mass functions A_N(y^N | x^N) such that E[d_N(X^N, Y^N)] ≤ D_2. Throughout, it is assumed that d_N(x^N, y^N) = (1/N) Σ_{j=1}^{N} d(x_j, y_j), where d is a bounded, nonnegative, real-valued distortion function. A watermarking rate R = (1/N) log |𝒲| is said to be achievable for (D_1, D_2) if there exists a sequence of watermarking codes (𝒲, f_N, φ_N) subject to distortion D_1 that have respective rates R_N ≥ R such that the probability of error P_e = (1/|𝒲|) Σ_{w∈𝒲} Pr(ŵ ≠ w | W = w) → 0 as N → ∞ for any attack subject to D_2. Then, the watermarking capacity C(D_1, D_2) is defined as the maximum (or supremum, in general) of all achievable rates for given D_1 and D_2. This information-theoretic framework has been used successfully to compute the watermarking capacity of a wide variety of channels. We discuss a few of them next. When N = 1 in the information-theoretic model, we obtain a single-letter channel. Consider the single-letter, discrete-time, additive channel model shown in Fig. 16. In this model, the message W is corrupted by additive noise J. Suppose that E(W) = E(J) = 0; then the watermark power is E(W²) = σ_W², and the channel noise power is E(J²) = σ_J². If W and J are Gaussian distributed, then it can be shown that the watermarking capacity is (1/2) ln(1 + σ_W²/σ_J²) (28). For the Gaussian channel, a surprising result has also been found recently (25). Let 𝒲 = ℝ be the space of the watermark signal and d(w, y) = (w − y)² be the squared-error distortion measure. If X ~ Gaussian(0, σ_X²), then the capacities of the blind and nonblind watermarking systems are equal! This means that, irrespective of whether or not the original signal is available at the decoder, the watermarking rate remains the same. Watermark capacity has received considerable attention for the case where the host signal undergoes specific processing/attacks that can be modeled using well-known
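As a numerical illustration of the Gaussian capacity formula above, the following sketch evaluates C = (1/2) ln(1 + σ_W²/σ_J²); the variance values used are hypothetical, chosen only to show the computation.

```python
import math

def gaussian_watermark_capacity(var_w, var_j):
    """Capacity in nats per sample of the additive Gaussian watermark
    channel of Fig. 16: C = (1/2) ln(1 + var_w / var_j)."""
    return 0.5 * math.log(1.0 + var_w / var_j)

# Hypothetical powers: watermark variance 1.0, channel noise variance 4.0.
c_nats = gaussian_watermark_capacity(1.0, 4.0)
c_bits = c_nats / math.log(2.0)  # convert nats to bits
print(c_nats, c_bits)
```

As expected, the capacity grows only logarithmically as the watermark-to-noise ratio increases.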
Figure 16. Discrete-time additive channel noise model.
probability distributions. It is also commonly assumed that the type of attack that the watermarked signal undergoes is completely known at the receiver and is usually modeled as additive noise. But, in reality, it is not guaranteed that an attack is known at the receiver, and it need not be only additive; for example, scaling and rotational attacks are not additive. Therefore, a more general mathematical model, as shown in Fig. 17, is required to improve the capacity estimates for many nonadditive attack scenarios (20). In Fig. 17, we see that a random multiplicative component is also introduced to model an attack. Using the model in Fig. 17, where G_d and G_r, respectively, denote the deterministic and random components of the multiplicative channel noise attack, it has been shown (20) that a traditional additive channel model such as that shown in Fig. 16 tends either to over- or underestimate the watermarking capacity, depending on the type of attack. A precise estimate of the loss in capacity due to the uncertainty about the channel attack at the decoder can be computed by using this model. Extensions of this result to multiple watermarks in a host signal show that, to improve capacity, a specific watermark decoder has to cancel the effect of the interfering watermarks rather than treating them as known or unknown interference. It has also been observed (20) that an unbounded increase in watermark energy does not necessarily produce unbounded capacity. These results give us intuitive ideas for optimizing the capacity of watermarking systems. Computations of information-theoretic watermarking capacity do not tell us how to approach this capacity effectively. To address this important problem, a new set of techniques is required. Approaches such as quantization index modulation (QIM) (23) address some of these issues.
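The general channel of Fig. 17 can be sketched in a few lines; the gain and noise parameters below are hypothetical, and the point is only that a received sample y = G_d · G_r · x + J is not an additive function of the input x.

```python
import random

def attack_channel(x, g_det=0.9, g_rand_sigma=0.05, noise_sigma=0.1, rng=None):
    """One pass through a channel with a deterministic gain (g_det), a
    random multiplicative component, and additive noise, in the spirit
    of the general attack model of Fig. 17."""
    rng = rng or random.Random(0)
    g_rand = rng.gauss(1.0, g_rand_sigma)  # random multiplicative part G_r
    noise = rng.gauss(0.0, noise_sigma)    # additive part J
    return g_det * g_rand * x + noise

# A pure scaling attack (both sigmas zero) has no additive description:
y = attack_channel(2.0, g_det=0.5, g_rand_sigma=0.0, noise_sigma=0.0)
print(y)
```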
QIM deals with characterizing the inherent trade-offs among embedding rate, embedding-induced degradation, and robustness of embedding methods. Here, the watermark embedding function is viewed as an ensemble of functions indexed by w that satisfies the following property:

x′ ≈ x  ∀ w.  (6)

It is clear that robustness can be achieved if the ranges of these functions are sufficiently separated from each other. If not, identifying the embedded message uniquely will not be possible, even in the absence of any attacks. Equation (6) and the nonoverlapping ranges of the embedding functions imply that the range of the embedding functions must cover the range space of x′ and that the functions must be discontinuous. QIM embeds information by first modulating an index or a sequence of indexes by using the embedding information and
Figure 17. Multiplicative and additive watermarking channel noise model.
then quantizing the host signal by using an associated quantizer or a sequence of quantizers. We explain this by an example. Consider the case where one bit is to be embedded, that is, w ∈ {0, 1}. Thus two quantizers are required, and their corresponding reconstruction points in ℝ^N must be well separated to provide robustness against attacks. If w = 1, the host signal is quantized by the first quantizer. Otherwise, it is quantized by the second quantizer. Therefore, we see that the quantizer reconstruction points also act as constellation points that carry information. Thus, QIM design can be interpreted as the joint design of an ensemble of source codes and channel codes. The number of quantizers determines the embedding rate. It has been observed that QIM structures are optimal for memoryless watermark channels when energy constraints are placed on the encoder. As we can see, a fundamental principle behind QIM is the attempt to trade off embedding rate optimally for robustness. As discussed in previous sections, many popular watermarking schemes are based on signal transforms such as the discrete cosine transform and the wavelet transform. The transform coefficients play the role of carriers of watermarks. Naturally, different transforms possess widely varying characteristics. Therefore, a natural question to ask is, what is the effect of the choice of transform on the watermarking capacity? Note that good energy-compacting transforms such as the discrete cosine transform produce transform coefficients that have unbalanced statistical variances. This property, it is observed, enhances watermarking capacity in some cases (26). Results such as these could help us design high-capacity watermarking techniques that are compatible with transform-based image compression standards such as JPEG2000 and MPEG-4. To summarize, communication and information-theoretic approaches provide valuable mathematical tools for analyzing watermarking techniques.
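The one-bit example above can be sketched with two scalar uniform quantizers whose reconstruction lattices are offset by half a step. The step size here is a hypothetical design parameter that trades embedding distortion against robustness, and which lattice carries which bit is an arbitrary labeling.

```python
def qim_embed(x, bit, step=1.0):
    """Embed one bit by quantizing x with one of two uniform quantizers
    whose reconstruction lattices are offset by step/2 (scalar QIM)."""
    offset = 0.0 if bit == 0 else step / 2.0
    return round((x - offset) / step) * step + offset

def qim_decode(y, step=1.0):
    """Decode by choosing the quantizer whose lattice is nearest to y."""
    d0 = abs(y - qim_embed(y, 0, step))
    d1 = abs(y - qim_embed(y, 1, step))
    return 0 if d0 <= d1 else 1

x = 3.3
for bit in (0, 1):
    y = qim_embed(x, bit, step=1.0)
    assert qim_decode(y) == bit        # recovered with no attack
    assert qim_decode(y + 0.2) == bit  # survives perturbations < step/4
```

Because the two lattices are step/2 apart, any attack that moves a sample by less than step/4 leaves the decoded bit unchanged, which is the rate-robustness trade-off in miniature.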
They make it possible to predict or estimate the theoretical performance of a watermarking algorithm independently of the underlying application. But the practical utility of these models and analyses has been questioned by application engineers. Therefore, it is important that watermarking theoreticians and practitioners interact with each other through a constructive feedback mechanism to improve the development and implementation of state-of-the-art digital watermarking systems.
CONCLUSIONS

Digital watermarking is a rapidly evolving area of research and development. We discussed only the key problems in this area and presented some known solutions. One key research problem that we still face today is the development of truly robust, transparent, and secure watermarking techniques for different digital media, including images, video, and audio. Another key problem is the development of semifragile authentication techniques. The solution of these problems will require applying known results and developing new results in the fields of information and coding theory, adaptive signal processing, game theory, statistical decision theory, and cryptography.

Although significant progress has already been made, many open issues remain that need attention before this area becomes mature. This article has provided only a snapshot of the current state of the art. For details, the reader is referred to the survey articles (30-43) that deal with various important topics and techniques in digital watermarking. We hope that these references will be of use to both novices and experts in the field.

BIBLIOGRAPHY

1. M. Maes et al., IEEE Signal Process. Mag. 17(5), 47-57 (2000).
2. P. Wong and N. Memon, IEEE Trans. Image Process. (in press).
3. S. Bhattacharjee, Proc. Int. Conf. Image Process., Chicago, Oct. 1998.
4. D. Kundur and D. Hatzinakos, Proc. IEEE, Special Issue on Identification and Protection of Multimedia Information 87(7), 1,167-1,180 (1999).
5. C. Y. Lin and S. F. Chang, SPIE Storage and Retrieval of Image/Video Databases, San Jose, January 1998.
6. R. Radhakrishnan and N. Memon, Proc. Int. Conf. Image Process., 971-974, Thessaloniki, Greece, Oct. 2001.
7. J. Fridrich, Proc. Int. Conf. Image Process., Chicago, Oct. 1998.
8. W. Bender, D. Gruhl, N. Morimoto, and A. Lu, IBM Syst. J. 35(3-4), 313-336 (1996).
9. Digimarc Corporation, http://www.digimarc.com.
10. I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, IEEE Trans. Image Process. 6(12), 1,673-1,687 (1997).
11. R. B. Wolfgang, C. I. Podilchuk, and E. J. Delp, Proc. IEEE 87(7), 1,108-1,126 (1999).
12. Method and apparatus for hiding one image within another, US Pat. 5,905,819, 1999, S. J. Daly.
13. Method for embedding digital information in an image, US Pat. 5,859,920, 1999, S. J. Daly et al.
14. Method for detecting rotation and magnification in images, US Pat. 5,835,639, 1998, C. W. Honsinger and S. J. Daly.
15. C. Honsinger, IS&T PICS 2000, Portland, March 2000, pp. 264-268; C. W. Honsinger and M. Rabbani, Int. Conf. Inf. Tech.: Coding Comput., March 2000.
16. Method for generating an improved carrier for the data embedding problem, US Pat. 6,044,156, 2000, C. W. Honsinger and M. Rabbani.
17. I. J. Cox, M. L. Miller, and A. L. McKellips, Proc. IEEE 87, 1,127-1,141 (1999).
18. C. E. Shannon, Bell Syst. Tech. J. 27, 379-423 (1948).
19. C. Cachin, Proc. 2nd Workshop Inf. Hiding, 1998.
20. R. Chandramouli, Proc. SPIE Security and Watermarking of Multimedia Contents III, 2001.
21. R. Chandramouli, Proc. SPIE Multimedia Syst. Appl. IV 4,518, Aug. 2001.
22. B. Chen and G. W. Wornell, IEEE 2nd Workshop Multimedia Signal Process., 1998, pp. 273-278.
23. B. Chen and G. W. Wornell, IEEE Int. Conf. Multimedia Comput. Syst. 1, 13-18 (1999).
24. B. Chen and G. W. Wornell, IEEE Int. Conf. Acoust. Speech Signal Process. 4, 2,061-2,064 (1999).
25. P. Moulin and M. K. Mihcak, http://www.ifp.uiuc.edu/~moulin/paper.html, June 2001.
26. M. Ramkumar and A. N. Akansu, IEEE 2nd Workshop Multimedia Signal Process., Dec. 1998, pp. 267-272.
27. M. Ramkumar and A. N. Akansu, SPIE Multimedia Syst. Appl. 3528, 482-492 (1998).
28. S. D. Servetto, C. I. Podilchuk, and K. Ramachandran, Int. Conf. Image Process. 1, 445-448 (1998).
29. A. Cohen and A. Lapidoth, Int. Symp. Inf. Theory, 2000, p. 48.
30. P. Jessop, Int. Conf. Acoust. Speech Signal Process., pp. 2,077-2,080.
31. F. Mintzer and G. W. Braudaway, Int. Conf. Acoust. Speech Signal Process., pp. 2,067-2,070.
32. M. Holliman, N. Memon, and M. M. Yeung, SPIE Security and Watermarking of Multimedia Contents, Jan. 1999, pp. 134-146.
33. F. Hartung, J. K. Su, and B. Girod, SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 147-158.
34. J. Dittmann et al., SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 171-182.
35. J. Fridrich and M. Goljan, SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 214-225.
36. M. Kutter and F. A. P. Petitcolas, SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 226-239.
37. Special Issue, Proc. IEEE 87(7) (1999).
38. W. Zhu, Z. Xiong, and Y. Q. Zhang, IEEE Trans. Circuits Syst. Video Technol. 9(4), 545-550 (1999).
39. Proc. Int. Workshop Inf. Hiding.
40. Special Issue, IEEE J. Selected Areas Commun. (May 1998).
41. M. D. Swanson, M. Kobayashi, and A. H. Tewfik, Proc. IEEE 86(6), 1,064-1,087 (1998).
42. G. C. Langelaar et al., IEEE Signal Process. Mag. 17(5), 20-46 (2000).
43. C. Podilchuk and E. Delp, IEEE Signal Process. Mag. 18(4), 33-46 (2001).
DISPLAY CHARACTERIZATION

DAVID H. BRAINARD
University of Pennsylvania
Philadelphia, PA

DENIS G. PELLI
New York University
New York, NY

TOM ROBSON
Cambridge Research Systems, Ltd.
Rochester, Kent, UK
INTRODUCTION

This article describes the characterization and use of computer-controlled displays.1 Most imaging devices are now computer controlled, and this makes it possible for the computer to take into account the properties of the imaging device to achieve the intended image. We emphasize CRT (cathode ray tube) monitors and begin with the standard model of CRT imaging. We then show how this model may be used to render the desired visual image accurately from its numerical representation. We discuss the domain of validity of the standard CRT model. The model makes several assumptions about monitor performance that are usually valid but can fail for certain images and CRTs. We explain how to detect such failures and how to cope with them. Here we address primarily users who will be doing accurate imaging on a CRT. Inexpensive color CRT monitors can provide spatial and temporal resolutions of at least 1024 × 768 pixels and 85 Hz, and the emitted intensity is almost perfectly independent of viewing angle. CRTs are very well suited for accurate rendering. Our treatment of LCDs (liquid crystal displays) is brief, in part because this technology is changing very rapidly and in part because the strong dependence of emitted light on viewing angle in current LCD displays is a great obstacle to accurate rendering. Plasma displays seem more promising in this regard. We present all the steps of a basic characterization that will suffice for most readers and cite the literature for the fancier wrinkles that some readers may need, so that all readers may render their images accurately. The treatment emphasizes accuracy both in color and in space. Our standard of accuracy is visual equivalence: substituting the desired for the actual stimulus would not affect the observer.2 We review the display characteristics that need to be taken into account to present an arbitrary spatiotemporal image accurately, that is, luminance and chromaticity as a function of space and time. We also treat a number of topics of interest to the vision scientist who requires precise control of a displayed stimulus.

The International Color Consortium (ICC, http://www.color.org/) has published a standard file format (4) for storing "profile" information about any imaging device.3 It is becoming routine to use such profiles to achieve accurate imaging (e.g., by using the popular Photoshop® program).4 The widespread support for profiles allows most users to achieve characterization and correction without needing to understand the underlying characteristics of the imaging device. ICC monitor profiles use the standard CRT model presented in this article. For applications where the standard CRT model and instrumentation designed for the mass market are sufficiently accurate, users can simply buy a characterization package consisting of a program
1 The display literature often distinguishes between calibration and characterization (e.g., 1-3): calibration refers to the process of adjusting a device to a desired configuration, and characterization refers to modeling the device and measuring its properties to allow accurate rendering. We adopt this nomenclature here.
2 The International Color Consortium (4) calls this "absolute colorimetric" rendering intent, which they distinguish from their default "perceptual" rendering intent. Their "perceptual" intent specifies that "the full gamut of the image is compressed or expanded to fill the gamut of the destination device. Gray balance is preserved but colorimetric accuracy might not be preserved." 3 On Apple computers, ICC profiles are called "ColorSync" profiles because the ICC standard was based on ColorSync. Free C source code is available to read and write ICC profiles (5,6). 4 Photoshop is a trademark of Adobe Systems Inc.
and a simple colorimeter that automatically produces ICC profiles for their monitors.5 The ICC-required data in a monitor profile are just enough6 to specify the standard CRT model, as presented here. The ICC standard also allows extra data to be included in the profile, making it possible to specify extended versions of the standard CRT model and other relevant aspects of the monitor (e.g., modulation transfer function and geometry). This article explains the standard CRT model (and necessary basic colorimetry) and describes simple visual tests (available online at http://psychtoolbox.org/tips/displaytest.html) that establish the model's validity for your monitor. Then, we show how to use the standard model to characterize your display for accurate rendering. Finally, the Discussion briefly presents several more advanced topics, including characterization of non-CRT displays.7

CRT MONITOR BASICS
We begin with a few basic ideas about the way CRT monitors produce light and computers control displays. The light emitted from each location on a monitor is produced when an electron beam excites a phosphor coating at the front of the monitor. The electron beam scans the monitor faceplate rapidly in a raster pattern (left to right, top to bottom), and the intensity of the beam is modulated during the scan so that the amount of light varies with the spatial position on the faceplate. It is helpful to think of the screen faceplate as being divided up into contiguous discrete pixels. The video voltage controlling the beam intensity is usually generated by a graphics card, which emits a new voltage on every tick of its pixel clock (e.g., 100 MHz). The duration of each voltage sample (e.g., 10 ns) determines the pixel's width (e.g., 0.3 mm). Each pixel is a small fraction of a raster line painted on the phosphor as the beam sweeps from left to right. The pixel height is the spacing of raster lines. Most monitors today are "multisync," allowing the parameters of the raster (pixels per line, lines per frame, and frames per second) to be determined by the graphics card. Color monitors contain three interleaved phosphor types (red, green, and blue) periodically arranged as dots or stripes across the face of the monitor. There are three electron beams and a shadow mask arranged so that each beam illuminates only one of the three phosphor types. 5 Vendors of such packages include Monaco (http://www.monacosys.com/) and ColorBlind (http://www.color.com/).
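The timing figures quoted above fit together arithmetically. The sketch below uses the article's example 100 MHz pixel clock together with an illustrative 1024 × 768, 85-Hz raster; it ignores horizontal and vertical blanking intervals for simplicity.

```python
# Raster timing sketch. The pixel clock matches the article's example;
# the raster parameters are illustrative, and blanking is ignored.
pixel_clock_hz = 100e6                    # 100 MHz: one voltage sample per tick
pixel_duration_s = 1.0 / pixel_clock_hz   # 10 ns per pixel
print(pixel_duration_s)

pixels_per_line, lines_per_frame, frames_per_s = 1024, 768, 85
pixels_per_s = pixels_per_line * lines_per_frame * frames_per_s
print(pixels_per_s / 1e6)  # ~66.8 Mpixels/s, comfortably under 100 MHz
```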
Thus, before purchasing monitor characterization software, one should consider whether the software will include the optional media black point tag in the monitor profiles it produces. We have not found a precise definition of the term “media black point” m the ICC documentation, but we infer that it refers to a specification of the ambient light A(h) defined in Eq. (1) below. 7 More detailed treatments of CRTs (e.g., 2,8-11; see 12), calorimetry (e.g., 13-15), and use of ICC profiles (3) may be found elsewhere.
The phosphor interleaving is generally much finer than a pixel, so that the fine red-green-blue dots or stripes are not resolved by the observer at typical viewing distances. We will not discuss it here, but for some applications, for example, rendering text, it is useful to take into account the tiny displacements of the red, green, and blue subpixel components (16,17). The stimulus for color is light. The light from each pixel may be thought of as a mixture of the light emitted by the red, green, and blue phosphors. Denote the spectrum of the light emitted from a single monitor pixel by C(λ). Then,

C(λ) = rR(λ) + gG(λ) + bB(λ) + A(λ),   (1)
where λ represents wavelength; R(λ), G(λ), and B(λ) are the spectra of light emitted by each of the monitor's phosphors when they are maximally excited by the electron beam; r, g, and b are real numbers in the range [0, 1]; and A(λ) is the ambient (or "flare") light emitted (or reflected) by the monitor when the video voltage input to the monitor is zero for each phosphor. We refer to the values r, g, and b as the phosphor light intensities.8 Later, we discuss the kind of equipment one can use to measure light, but bear in mind that the characterization of a display should be based on the same light as the observer will see. Thus the light sensor should generally be in approximately the same position as the observer's eye. Happily, the luminance of CRTs is almost perfectly independent of viewing angle (i.e., a CRT is nearly a Lambertian light source), allowing a characterization from one viewing point to apply to a wide range. (LCDs lack this desirable property. From a fixed head position, you can see variations in hue across a flat panel caused by the different viewing angles of the nearer and farther parts. This is a serious obstacle to characterization of LCDs for accurate rendering unless the viewpoint can be fixed. Plasma displays are much better in this regard.) Note that Eq. (1) depends on the assumption of channel constancy: to a very good approximation, the relative spectrum of light emitted by each (R, G, or B) monitor channel is independent of the degree to which it is excited.9 To simplify the main development below, we assume that A(λ) = 0. Correcting for nonzero ambient is
The point spread may be neglected while characterizing the display’s color properties if you use a uniform test patch that is much bigger than the point spread. Thus, Eq. (1) does not take into account blur introduced by the optics of the electron beam and the finite size of the phosphor dots: the intensities r, g, and b describe the total light intensity emitted at a pixel but not how this light is spread spatially. Treatment of the point-spread function is provided in the Discussion (see Modulation Transfer Function). ’ Strictly speaking, Eq. (1) only requires phosphor constancy, the assumption that the relative spectrum of light emitted by each of the CRT’s phosphors is invariant. It is possible for phosphor constancy to hold and channel constancy to be violated. for example, when the degree to which electrons intended for one
Figure 1. Schematic of graphics card and CRT monitor. The figure illustrates the video chain from digital pixel value to emitted light. As mentioned in the text and illustrated in Fig. 2, most graphics cards can run either in 24-bit mode, in which m_red, m_green, and m_blue are independent, or in 8-bit mode, in which m_red = m_green = m_blue. The digital video output of each lookup table is 8 bits in most commercial graphics cards, but a few cards have more than 8 bits to achieve finer steps at the digital-to-analog converter (DAC) output. See color insert.
both straightforward and recommended, as explained in the Discussion. Figure 1 shows how a pixel is processed. The graphics card generates the video voltages based on values stored in the onboard memory. Typically, user software can write digital values into two components of graphics card memory: a frame buffer and a color lookup table. The top panel of Fig. 2 illustrates the operation of the frame buffer and lookup table for what is referred to as "true color" or 24-bit ("millions of colors") mode.10 Three 8-bit bytes in the frame buffer are allocated for each monitor pixel. As shown in the figure, this memory can be thought of as three image planes, each arranged in a spatial grid corresponding to the layout of the monitor pixels. These are labeled Red, Green, and Blue in the figure. Each image plane controls the light emitted by one monitor phosphor. Consider a single pixel. The 8-bit (256-level) value m_red in the Red image plane is used as an index to the red lookup table F_red(). This table has 256 entries, and each entry specifies the digital video value R used to generate the video voltage v_red that goes to the monitor to control the intensity of the electron beam as it excites the red phosphor at the pixel of interest. Similar indirection is used to obtain the digital video values G and B for the
green and blue phosphors:

R = F_red(m_red),
G = F_green(m_green),   (2)
B = F_blue(m_blue).
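The indirection of Eq. (2) is easy to sketch. Here the three lookup tables are filled with a hypothetical gamma-correcting ramp (a 2.3 exponent, a typical CRT value); a real profile would load tables built from measurements of the particular monitor.

```python
# Sketch of Eq. (2): each 8-bit frame-buffer value indexes a 256-entry
# lookup table whose output is the digital video value sent to the DAC.
GAMMA = 2.3  # hypothetical display exponent, not a measurement

def make_inverse_gamma_table(gamma=GAMMA):
    """256-entry table that linearizes a power-law display response."""
    return [round(255 * (i / 255) ** (1.0 / gamma)) for i in range(256)]

F_red = F_green = F_blue = make_inverse_gamma_table()

def dac_values(m_red, m_green, m_blue):
    """R = F_red(m_red), G = F_green(m_green), B = F_blue(m_blue)."""
    return F_red[m_red], F_green[m_green], F_blue[m_blue]

print(dac_values(0, 128, 255))
```

Because the tables are consulted per pixel on every frame, rewriting them is a cheap way to change the display's effective transfer function without touching the frame buffer.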
phosphor dot excite another varies with digital video value (see, e.g., 1; also 18,19). For the presentation here, it is more convenient not to distinguish explicitly between phosphor and channel constancy. 10 This mode is also sometimes referred to as 32-bit mode: the 24 bits of data per pixel are usually stored in 32-bit words because this alignment provides faster memory access on most computers.

In most graphics cards sold today, R, G, and B are 8-bit numbers driving 8-bit digital-to-analog converters (DACs). This is very coarse compared to the 12- to 14-bit analog-to-digital converters used in practically all image scanners sold today. It is odd that the digital imaging industry does not allow us to display with the same fidelity as we capture. However, some graphics cards for demanding applications, such as radiology and vision science, do provide higher precision more-than-8-bit DACs, using the lookup table F to transform the 8-bit values of m_red, m_green, and m_blue into values of R, G, and B selected from a palette of more finely quantized numbers. The numbers R, G, and B drive the graphics card's digital-to-analog converters, which produce video voltages proportional to the numbers, except for small errors in the digital-to-analog conversion:

v_red = R + e_red(R, R′),
v_green = G + e_green(G, G′),   (3)
v_blue = B + e_blue(B, B′),
where v represents voltage on a dimensionless scale proportional to the actual voltage, e is the error, and R’, G’, and B’ are the values of R, G, and B for the immediately preceding pixel in the raster scan. The error e has a static and a dynamic component. The static error depends
only on the current number and is sometimes specified by the manufacturer. It is generally less than one-half of the smallest step, that is, ±0.5 on the dimensionless scale we are using here (e.g., 0-255 for 8-bit DACs). The dynamic error, called a "glitch," is a brief (few ns) error that depends on the relationship between the current value and that immediately preceding. The glitch is caused by the change of state of the switches (one per bit) inside the digital-to-analog converter. The glitch has a fixed duration, whereas the pixel duration is determined by the pixel clock rate, so the contribution of the glitch to the light intensity is proportionally reduced as the pixel duration is increased. The glitch is usually negligible.

Figure 2. Graphics card operation. Top panel: 24-bit mode. The frame buffer may be thought of as three separate planes, one each for the red, green, and blue channels. Each plane allows specification of an 8-bit number (0-255) for each image location. In the figure, values for the pixel at x = 4, y = 3 are shown: these are 0 for the red plane, 2 for the green plane, and 4 for the blue plane. For each plane, the frame buffer value is used as an index to a lookup table. A frame buffer value of 0 indexes the first entry of the lookup table, and a frame buffer value of 255 indexes the last entry. Thus the configuration shown in the figure causes digital RGB values (17, 240, 117) to be sent to the DACs for the pixel at x = 4, y = 3. Bottom panel: 8-bit mode. The frame buffer is a single image plane, allowing specification of a single 8-bit number for each image location. This number is used as an index to a color lookup table that provides the RGB values to be sent to the DACs. The 8-bit configuration shown in the bottom panel displays the same color for the pixel at x = 4, y = 3 as the 24-bit configuration shown in the top panel.
The three video voltages produced by the graphics card, as described in Eq. (3), are transmitted by a cable to the CRT. Within the monitor, a video amplifier drives a cathode ray tube that emits light. The light emitted by the phosphors is proportional to the electron beam intensity, but that intensity is nonlinearly related to the video voltage, so the light intensity is a nonlinear function of video voltage:

r = f_red(v_red),
g = f_green(v_green),   (4)
b = f_blue(v_blue),
where r, g, and b have the same meaning as in Eq. (1), and f_red(), f_green(), and f_blue() are so-called gamma functions11 that characterize the input-output relationship for each monitor primary. Note that the term "gamma function" is often used today to describe the single-pixel nonlinear transformation of many display devices, including printers. There are differences in both the form of the nonlinearities and the mechanisms that cause them. The gamma function of a CRT is caused by a space charge effect in the neck of the tube, whereas the gamma function of printers is usually caused by the spreading and overlapping of ink dots and the scatter of light within the paper. Different models are needed to account satisfactorily for the nonlinearities of different devices. Some analog-input LCD displays and projectors process the incoming signal electronically to provide a nonlinear response that approximates that of a traditional CRT, but they are being displaced by newer digital-input LCDs. Equations (3) and (4) incorporate further assumptions about monitor performance, including assumptions of pixel independence and channel independence. These assumptions, particularly pixel independence, greatly simplify the CRT model and make it practical to invert it. Pixel independence is the assumption that the phosphor light intensities (r, g, and b) at each pixel depend solely on the digital video values for that pixel, independent of the digital video values for other pixels. Recall that, as a matter of convenience, Eq. (1) defines the phosphor intensities r, g, and b before blur by the point-spread function of the CRT. The real blurry pixels overlap, but pixel independence guarantees that the contribution of each pixel to the final image may be calculated independently of the other pixels. The final image is the sum of all of the blurry pixels (21).
Channel independence is the assumption that the light intensity for one phosphor channel depends solely on the digital video value for that channel, independent of the digital video values for the other channels. The validity of these assumptions is discussed later. There is a second common mode for graphics cards. This is typically referred to as "indexed color" or 8-bit ("256 colors") mode. It is illustrated at the bottom of Fig. 2. Here, there is only a single plane of 8-bit frame buffer values. These index to a single color lookup table to obtain R, G, and B values; at each pixel, the same index value is used for all three channels, so that m_red = m_green = m_blue. Thus, for 8-bit mode, only 256 colors (distinct R, G, B combinations) are available to paint each frame. It is usually possible to load an entirely new lookup table for 11 The term gamma function is also used to describe the relationship between phosphor intensities (i.e., r, g, and b) and digital video values (i.e., R, G, and B) because this latter relationship is often the one measured. It is called a "gamma function" because the relationship has traditionally been described by power-law-like functions where the symbol gamma denotes the exponent, for example, r ∝ [(R − R0)/(255 − R0)]^γ for R > R0, and r = 0 otherwise [see Eq. (16) later]. Gamma is usually in the range 2 to 3, with a value of 2.3 quoted as typical (11; see also 20).
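The power-law form in footnote 11 can be sketched directly, together with its inverse, which is the computation a characterization ultimately needs. The R0 and gamma values below are illustrative, not measurements of any particular monitor.

```python
# Sketch of the power-law gamma function from footnote 11:
# r = ((R - R0) / (255 - R0)) ** gamma for R > R0, and r = 0 otherwise.
R0, GAMMA = 10, 2.3  # illustrative values, not measurements

def gamma_function(R):
    """Phosphor intensity r in [0, 1] for digital video value R."""
    if R <= R0:
        return 0.0
    return ((R - R0) / (255 - R0)) ** GAMMA

def inverse_gamma(r):
    """Digital video value whose output intensity best approximates r."""
    return round(R0 + (255 - R0) * r ** (1.0 / GAMMA))

for R in (0, 64, 128, 255):
    assert abs(inverse_gamma(gamma_function(R)) - max(R, R0)) <= 1
```

The round trip is exact above R0; below R0 all inputs map to zero intensity, so the inverse can do no better than return R0, which is why correcting for the ambient/black level matters in practice.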
each frame. Most current graphic cards can be configured to operate in either 8 or 24-bit mode. The advantage of 8bit mode is that images can be moved into the frame buffer more rapidly, because less data must be transferred. The disadvantage is that only 256 colors are available for each frame. Both 8- and 24-bit modes are important and widely used in visual psychophysics (see 22). Although it is useful to understand the difference between 8- and 24-bit graphics modes, the distinction is not crucial here. The digital part of the CRT video chain is simple and does not introduce distortions. As illustrated by Fig. 3, the key point is that, for each pixel, we must compute the appropriate R, G, and B values to produce arbitrary desired phosphor intensities r, g, and b. This computation relies on measurements of the analog portion of the video chain and on the properties of the display. In particular, it is necessary to characterize the spectral properties of the light emitted by the monitor phosphors [Eq. (l)] and th e gamma functions [Eqs. (3) and (4)]. BASIC COLORIMETRY Equation (1) shows that monitors can produce only a very limited set of spectra C(k), those that consist of a weighted sum of the three fixed primaries. But that is enough because human color vision is mediated by three classes of light-sensitive cones, referred to as the L, 1M, and S cones (most sensitive to long, middle, and short wavelengths, respectively). The response of each class of cones to a spectrum incident at the eye depends on the rate at which the cone pigment absorbs photons, and this absorption rate may be computed via the spectral sensitivity of that class of cones. Denote the spectral sensitivities of the L, M, and S cones as L(1), M(h), and S(h), respectively. Then the quanta1 absorption rates I, m, and s of the L, M, and S cones for a color stimulus whose spectrum is C(h) are given by the integrals 780 nm I=
.I
UWW
d&
380" nm
m=
I
WW@)
dh,
386 nm
where each integral is taken across the visible spectrum, approximately 380 nm to 780 nm. We refer to these quantal absorption rates as the cone coordinates of the spectrum C(λ). The integrals of Eq. (5) may be approximated by the sums
l = Σn L(λn)C(λn)Δλ,  m = Σn M(λn)C(λn)Δλ,  s = Σn S(λn)C(λn)Δλ,  (6)

across wavelengths λn evenly spaced across the visible spectrum, where Δλ is the interval between wavelength samples. The CIE recommends sampling from 380 nm to 780 nm at 5-nm intervals, making n = 81.

Figure 3. The standard CRT model. Based on Fig. 1, this reduced schematic shows the subsection of the video chain described by the standard CRT model. This subsection takes the digital video values R, G, and B as input and produces intensities r, g, and b as output. See color insert.

Using the matrix notation

s = [l, m, s]^T,

S = [ L(λ1) L(λ2) ⋯ L(λn)
      M(λ1) M(λ2) ⋯ M(λn)
      S(λ1) S(λ2) ⋯ S(λn) ] Δλ,

c = [C(λ1), C(λ2), …, C(λn)]^T,  (7)

we can rewrite Eq. (6) as

s = Sc.  (8)
Equation (8) computes cone coordinates s from the spectrum c of the color stimulus. When two physically distinct lights that have the same spatial structure result in the same quantal absorption rates for the L, M, and S cones, the lights are indistinguishable to cone-mediated vision.12 Thus, accurately rendering a desired image on a characterized monitor means choosing R, G, and B values so that the spectrum c emitted by the monitor produces the same cone coordinates s as the desired image.

12 We neglect for now a possible effect of rod signals that can occur at light levels typical of many color monitors. See Discussion: Use of Standard Colorimetric Data.
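In code, Eqs. (6)–(8) reduce to a single matrix multiplication once the spectra are sampled. The following Python/NumPy sketch uses invented Gaussian curves as stand-ins for tabulated cone sensitivities and for the stimulus spectrum; a real calculation would substitute measured or standard data:

```python
import numpy as np

# Wavelength sampling recommended by the CIE: 380-780 nm in 5-nm steps (n = 81).
wls = np.arange(380.0, 781.0, 5.0)
dwl = 5.0  # delta-lambda, the interval between wavelength samples

# Hypothetical smooth cone sensitivities (stand-ins for tabulated L, M, S data),
# each normalized to a peak of one.
def gaussian(peak, width):
    return np.exp(-0.5 * ((wls - peak) / width) ** 2)

L = gaussian(565.0, 50.0)
M = gaussian(540.0, 45.0)
S_cone = gaussian(445.0, 30.0)

# The 3 x n matrix S of Eq. (7): rows are cone sensitivities, scaled by delta-lambda.
S = np.vstack([L, M, S_cone]) * dwl

# A sampled stimulus spectrum c (an arbitrary nonnegative broadband spectrum).
c = 1.0 + 0.5 * np.sin(wls / 60.0)

# Eq. (8): cone coordinates s = Sc, approximating the integrals of Eq. (5).
s = S @ c
print(s)  # the three quantal absorption rates l, m, s
```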
CHARACTERIZATION USING THE STANDARD CRT MODEL
The CRT model presented above is the standard CRT model used for color characterization (e.g., 1,2,23).13 Using the standard CRT model, we can find the digital video values R, G, and B required to render a spectrum C(λ) through the following computational steps:

1. Computing cone coordinates. Use Eq. (8) to find the l, m, and s quantal absorption rates corresponding to C(λ).

2. Computing phosphor light intensities. Find r, g, and b such that the mixture expressed on the right side of Eq. (1) produces the same quantal absorption rates. This computation relies on measurements of the phosphor spectra but is independent of the gamma functions.

3. Gamma correction. Find DAC values R, G, and B that will produce the desired phosphor intensities r, g, and b. This computation relies on measurements of the gamma functions but is independent of the phosphor spectra.

Because each intensity r, g, and b under the standard CRT model is a simple monotonic function of the corresponding digital video value (R, G, or B), it is straightforward to invert each gamma function to find the necessary digital video value to produce any desired output (assuming that the DAC error e is negligible). The measurements and computations required for each of these steps are discussed in more detail below. Note that before actually using the standard model, you will want to make sure it is valid for your monitor and stimuli.

13 Both Post (1) and Cowan (2) provide systematic developments of the standard model. Different authors have used different terms for the assumptions embodied by the standard model. Our nomenclature here is similar to Post’s (1).
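The three steps can be chained in a few lines of code. This toy sketch invents all of its numbers (the cone-by-phosphor matrix M, the gamma parameters, and the target cone coordinates); it illustrates only the flow of the computation, not any real monitor:

```python
import numpy as np

# Step 2 ingredients: a hypothetical matrix M = SP whose columns are the cone
# coordinates (l, m, s) of the red, green, and blue phosphors at maximum excitation.
M = np.array([[0.60, 0.30, 0.10],
              [0.30, 0.60, 0.10],
              [0.02, 0.10, 0.70]])

# Step 3 ingredients: hypothetical gamma parameters per channel [see Eq. (16)].
gamma = np.array([2.2, 2.1, 2.3])
cutoff = np.array([0.0, 0.0, 4.0])  # G0-style cutoffs, in digital video units

# Step 1 would compute the target cone coordinates from the desired spectrum via
# Eq. (8); here we simply posit a target s = (l, m, s).
s_target = np.array([0.50, 0.45, 0.20])

# Step 2: invert M to get the required phosphor intensities w = (r, g, b) [Eq. (14)].
w = np.linalg.solve(M, s_target)
w = np.clip(w, 0.0, 1.0)  # intensities outside [0, 1] are not displayable

# Step 3: gamma correction, inverting Eq. (16) via Eq. (17) and rounding to integers.
dac = np.rint((255.0 - cutoff) * w ** (1.0 / gamma) + cutoff).astype(int)
print(dac)  # the digital video values R, G, B
```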
Detecting failure of the standard CRT model and possible remedies are also discussed below. Most CRT monitors have controls labeled “contrast” and “brightness” that affect the gamma function, the ambient light A(λ), and the maximum luminance. These should be adjusted before beginning the characterization process. Write the date and “do not touch” next to the controls once they are adjusted satisfactorily.

Computing Cone Coordinates
Recall that, given the spectrum of a color stimulus c, Eq. (8) allows us to compute the cone coordinates s. Measuring spectra requires a spectroradiometer. But spectroradiometers are expensive, so it is more common to use a less expensive colorimeter to characterize the spectral properties of the desired light. A colorimeter measures the CIE XYZ tristimulus coordinates of light. XYZ tristimulus coordinates are closely related to cone coordinates. Indeed, the cone coordinates s of the light may be obtained, to a good approximation, by the linear transformation

s = Tx,  (9)

where

T = [  0.2420  0.8526 −0.0445
      −0.3896  1.1601  0.0853
       0.0034 −0.0018  0.5643 ]  and  x = [X, Y, Z]^T.  (10)

The matrix T was calculated from the Smith-Pokorny estimates (24,25) of cone spectral sensitivities, where each cone sensitivity was normalized to a peak sensitivity of one, and the XYZ functions were as tabulated by the CIE (26,27). The appropriate matrix may be easily computed for any set of cone spectral sensitivities (14). A detailed discussion of the relationship between cone coordinates and tristimulus coordinates is available elsewhere (14,15).

Computing Phosphor Intensities

Adopting the matrix notation w for the phosphor intensities and P for the phosphor spectra,

w = [r, g, b]^T,

P = [ R(λ1) G(λ1) B(λ1)
      R(λ2) G(λ2) B(λ2)
       ⋮      ⋮      ⋮   ],  (11)

allows us to rewrite Eq. (1) [neglecting A(λ)] as

c = Pw.  (12)

Using this to substitute for c in Eq. (8) yields

s = SPw = Mw,  (13)

where the 3 × 3 matrix M equals SP. We can compute the matrix inverse of M and solve for w,

w = M⁻¹s.  (14)

Equation (14) computes the phosphor intensities w (r, g, and b) that will produce the desired cone coordinates s (l, m, and s). Calculation of Eq. (14) requires knowledge of the matrix M,

M = SP = [ Σi L(λi)R(λi)Δλ  Σi L(λi)G(λi)Δλ  Σi L(λi)B(λi)Δλ
           Σi M(λi)R(λi)Δλ  Σi M(λi)G(λi)Δλ  Σi M(λi)B(λi)Δλ
           Σi S(λi)R(λi)Δλ  Σi S(λi)G(λi)Δλ  Σi S(λi)B(λi)Δλ ].  (15)

Each element of M is the quantal absorption rate of one cone class for one phosphor spectrum at maximum excitation. Each column of M is the set of cone coordinates s of one phosphor spectrum [Eqs. (6) and (7)]. Thus, to implement Eq. (14), it is necessary to know these cone coordinates for each phosphor. The best way to find them is to use a spectroradiometer to measure the phosphor emission spectra R(λ), G(λ), and B(λ). If such measurements are not feasible and only the XYZ tristimulus coordinates of each phosphor are available, then the matrix M may be computed using the relationship between the cone and tristimulus coordinates given in Eqs. (9) and (10).
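Equations (11)–(15) amount to forming M = SP and solving a 3 × 3 linear system. A Python/NumPy sketch with invented Gaussian curves standing in for the cone sensitivities and phosphor emission spectra (measured curves belong in their place):

```python
import numpy as np

wls = np.arange(380.0, 781.0, 5.0)  # CIE-recommended sampling, n = 81
dwl = 5.0

def gaussian(peak, width):
    return np.exp(-0.5 * ((wls - peak) / width) ** 2)

# Hypothetical cone sensitivities form the rows of S [Eq. (7)], scaled by delta-lambda;
# hypothetical phosphor spectra at maximum excitation form the columns of P [Eq. (11)].
S = np.vstack([gaussian(565, 50), gaussian(540, 45), gaussian(445, 30)]) * dwl
P = np.column_stack([gaussian(610, 15), gaussian(545, 20), gaussian(450, 20)])

# Eq. (15): each entry of M is one cone class's quantal absorption rate for one
# phosphor; each column holds the cone coordinates of one phosphor.
M = S @ P

# Eq. (14): phosphor intensities w that produce desired cone coordinates s.
# The target here is displayable by construction: the cone coordinates of the
# mixture r = g = b = 0.5.
s_desired = M @ np.full(3, 0.5)
w = np.linalg.solve(M, s_desired)
print(w)  # recovers r = g = b = 0.5
```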
Figure 4. Typical gamma function. The plot shows the gamma function for a typical CRT monitor. The solid circles show measured data. The measurement procedure is described in the text. The intensity data were normalized to a maximum of one for a digital video value of 255. The line shows the best fit obtained using Eq. (16), with parameter estimates γ = 2.11 and G0 = 0.
Alternatively, if one prefers to specify color stimuli in terms of XYZ coordinates rather than cone coordinates, one may set T as the identity matrix and work directly in XYZ coordinates, starting with the colorimetric measurements and replacing the cone spectral sensitivities used to form the matrix S in the previous derivations by the CIE XYZ color matching functions.

Gamma Correction
Figure 4 shows a typical gamma function measurement. To obtain this curve, the spectrum of light emitted by the green phosphor was measured for 35 values of the G digital video value, where R = B = 0. A measurement of the ambient light (R = G = B = 0) was subtracted from each individual measurement. Then, for each measured spectrum, a scalar value was found that expressed that spectrum as a fraction of the spectrum G(λ) obtained at the maximum digital video value (G = 255). These scalars take on values between 0 and 1 and are the measured gamma function fgreen(). Given a desired value for the green-phosphor intensity g, gamma correction consists of using the measured gamma function to find the digital value G that produces the best approximation to g. This is conceptually straightforward, but there are several steps. The first step is to interpolate the measured values. Although exhaustive measurement is possible, fitting a parametric function to the measured values has the advantage of smoothing any error in the gamma function measurements.14 For most CRT monitors, measured gamma functions are well fit by the functional form (e.g., for the green phosphor)
g = fgreen(G + egreen) = [(G + egreen − G0)/(255 − G0)]^γ for G + egreen > G0, and g = 0 otherwise.  (16)

In Eq. (16), parameter G0 represents a cutoff digital video value below which no incremental light is emitted, and parameter γ describes the nonlinear form of the typical gamma function. The constant 255 in Eq. (16) normalizes the digital video value and is appropriate when G is specified by 8-bit numbers. To be consistent with Eq. (4), we retain egreen, which is usually negligible. The curve through the data in Fig. 4 represents the best fit of Eq. (16) to the measured data. Note that Eq. (16) requires that phosphor intensity g = 0 when digital video value G = 0. This is appropriate because both our definition of phosphor intensities [Eq. (1)] and our measurement procedure (see above) account explicitly for the ambient light. Equation (16) is easily inverted to determine, for example, G from g:

G = (255 − G0)g^(1/γ) + G0.  (17)

Because G must be an integer and the value computed by Eq. (17) is real, the computed value of G should be rounded up or down to minimize the error in g predicted by Eq. (16).
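Equations (16) and (17), together with the rounding rule just described, can be sketched as follows. The parameter values are arbitrary illustrations (γ = 2.11 as in Fig. 4, but a nonzero cutoff G0 = 4 to exercise that branch), and the DAC error e is taken as zero:

```python
import numpy as np

gamma_exp, G0 = 2.11, 4.0  # illustrative parameter estimates for one channel

def forward(G):
    # Eq. (16) with e = 0: normalized phosphor intensity for digital video value G.
    return np.where(G > G0, ((G - G0) / (255.0 - G0)) ** gamma_exp, 0.0)

def gamma_correct(g):
    # Eq. (17) gives a real-valued digital video value...
    G_real = (255.0 - G0) * g ** (1.0 / gamma_exp) + G0
    # ...which must be rounded up or down, choosing whichever rounding
    # minimizes the error in g predicted by Eq. (16).
    candidates = np.array([np.floor(G_real), np.ceil(G_real)])
    errors = np.abs(forward(candidates) - g)
    return int(candidates[np.argmin(errors)])

G = gamma_correct(0.5)
print(G, float(forward(G)))  # chosen digital value and the intensity it predicts
```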
14 In a demanding application using 8-bit DACs, it may be desirable to measure light intensity accurately for all 256 digital video values and to use the measured data directly instead of fitting a smooth function, so that the DAC’s static error is taken into account in the gamma correction.
DETECTING FAILURES OF THE STANDARD CRT MODEL
The standard CRT model is useful because it generally provides an acceptably accurate description of the performance of actual CRT monitors, because measurement of its parameters is straightforward, and because it is easily inverted. Use of the standard CRT model is implicit in the information stored in ICC monitor profiles: these profiles store information about the XYZ tristimulus coordinates and gamma function for each channel (R, G, and B). For very accurate rendering, however, it is important to be aware of the assumptions that the model embodies and how these assumptions can fail. A number of authors provide detailed data on how far actual CRT monitors deviate from the assumptions of the standard CRT model (1,2,18,19,21,28-31). As noted above, the key assumption for the standard CRT model is pixel independence, expressed implicitly by Eqs. (3) and (4). Pixel independence is particularly important because, though it is easy to extend the standard CRT model to deal with dependent pixels, the extended model is hard to invert, making it impractical for fast image rendering. The next section (Extending the Standard CRT Model) briefly discusses other assumptions of the standard CRT model. It is generally possible to cope with the failures of these other assumptions because the standard CRT model can be extended to handle them in a fashion that still allows it to be inverted rapidly.

Pixel Independence
There are at least six causes for the failure of pixel independence (also see 21). Dependence on other pixels in the same frame (failures 1-4, below) has been called “spatial dependence”, and dependence on pixels in preceding frames (failures 5 and 6) has been called “temporal dependence” (1). Failures 1 and 2 are short range, causing each pixel’s color to depend somewhat on the preceding pixel in the raster scan. Failures 3-6 are long range in the current frame (3 and 4) or in the preceding frames (5 and 6).

1. Glitch. As mentioned earlier, the glitch (the dynamic component of the DAC error) depends on the preceding as well as the current pixel. The DAC manufacturer generally specifies that the contribution of the glitch to the average voltage for the duration of a pixel at maximum clock rate is a fraction of a least significant step, so it is usually negligible.

2. Finite video bandwidth and slew rate. The whole video signal chain, including the DACs on the graphics card, the video cable, and the monitor’s amplifiers, has a finite bandwidth, so that it cannot instantly follow the desired voltage change from pixel to pixel, resulting in more gradual voltage
transitions. Because this filtering effect precedes the nonlinear gamma function, it is not equivalent to simply blurring the final image horizontally or averaging along the raster lines. Smoothing before the nonlinear gamma function makes the final image dimmer. (Because the gamma function is accelerating, the average of two luminances produced by two voltages is greater than the luminance produced by the average voltage.) An excellent test for this effect is to compare the mean luminance of two fine gratings, one vertical and one horizontal (21,29,31,32). This test is available on-line at http://psychtoolbox.org/tips/displaytest.html. Each grating consists of white and black one-pixel lines. Only the vertical grating is attenuated by the amplifier’s bandwidth, so it is dimmer. Like the glitch, this effect can be reduced by slowing the pixel clock in the graphics card or, equivalently, by using two display pixels horizontally (and thus two clock cycles) to represent each sample in the desired image. Video bandwidth is normally specified by the manufacturer and should be considered when choosing a CRT monitor.

3. Poor high-voltage regulation. The electron beam current is accelerated by a high-voltage (15 to 50 kV) power supply, and on cheaper monitors, the voltage may slowly drop when the average beam current is high. This has the effect of making the intensity of each pixel dependent on the average intensity of all of the pixels that preceded it. (The high-voltage supply will generally recuperate between frames.) You can test for such long-distance effects by displaying a steady white bar in the center of your display surrounded by a uniform field of variable luminance. Changing the surround from white to black ideally would have no effect on the luminance of the bar. To try this informally without a photometer, create a cardboard shield with a hole smaller than the bar to occlude a flickering surround, and observe whether the bar is steady.
This effect depends on position. The effect is negligible at the top of the screen and maximal at the bottom. A single high-voltage supply generally provides the current for all three channels (R, G, and B), so that the effect on a particular test spot is independent of the channel used for background modulation. When the high voltage is very poorly regulated, the whole screen image expands as the image is made brighter, because as the increased current pulls the high voltage down, the electrons take longer to reach the screen and deflect more.

4. Incomplete DC restoration. Unfortunately, the video amplifier in most CRT monitors is not DC coupled (21). Instead it is AC coupled most of the time, and momentarily DC coupled to make zero volts produce black at the end of the vertical blanking interval. (DC, “direct current,” refers to zero temporal frequency; AC, “alternating current,” refers to all higher frequencies.) This is called “DC restoration,” which is slightly cheaper to design and build than a fully DC coupled video circuit. If the AC
time constant were much longer than a frame, the DC restoration would be equivalent to DC coupling, but, in practice, the AC time constant is typically short relative to the duration of a frame, so that the same video voltage will produce different screen luminances depending on the average voltage since the last blanking interval. As for failure 3, this effect is negligible at the top of the screen and maximal at the bottom. However, this effect can be distinguished from failure 3 by using silent substitution. To test, say, the green primary, use a green test spot, and switch the background (the rest of the screen) back and forth between green and blue. The green and blue backgrounds are indistinguishable to the high-voltage power supply (it serves all three guns) but are distinct to the video amplifiers (one per gun).

5. Temporal dependence. This is a special case of pixel dependence, where the pixel intensities depend on pixels in preceding frames. A typical example of this occurs when a full-field white screen is presented after a period of dark. Amongst other things, the large change in energy delivered to the screen results in distortions of the shadow mask and displayed luminances that are temporarily higher than desired. This effect may persist for several frames. It is normally difficult to characterize these phenomena precisely, especially in a general model, and the best line of defense is to avoid stimuli that stress the display. Measurement of temporal dependence requires an instrument that can measure accurately frame by frame.

6. Dynamic brightness stabilization. This is another cause of temporal dependence. Since the mid-1990s, some CRT manufacturers (e.g., Eizo) have incorporated a new “feature” in their CRT monitors. The idea was to minimize objectionable flicker when switching between windows by stabilizing the total brightness of the display. To this end, the monitor compensates for variations in the mean video voltage input to the display.
This has the effect of making the intensity of each pixel dependent on the average voltage of all of the pixels in the preceding frame. This artifact, too, will be detected by the test described in 3 above. However, this effect is independent of spot position and thus can be distinguished from failures 3 and 4 by testing at the top of the screen, where failures 3 and 4 are negligible. Different monitors conform in different degrees to the standard CRT model. More expensive monitors tend to have higher video bandwidth and better regulation of the high-voltage power supply. If accuracy matters, it is worth performing tests like those described here to find a monitor whose performance is acceptable. A suite of test patterns for display in your web browser is available at http://psychtoolbox.org/tips/displaytest.html.

Coping with Pixel Dependence
The standard CRT model is easily extended to deal with dependent pixels, but the resulting model is hard to invert, making it impractical for many applications. However, an extended model may be useful in estimating the errors in your stimuli. Once you understand the nature of pixel dependence, you may be able to minimize its effect by avoiding problematic stimuli. For example, the preceding-pixel effects (failures 1 and 2 of pixel independence) can be reduced by reducing the rate of change of the video signal, either by spacing the changes out (e.g., double the viewing distance, and use a block of four pixels, 2 × 2, in place of each original pixel) or by making a fine grating horizontal (parallel to the raster lines) instead of vertical. The spatial-average effects (failures 3-6 of pixel independence) can be reduced by maintaining a constant mean luminance during characterization and display. If the image to be displayed does not occupy the whole screen, then it is generally possible to mask a small section of the screen from view and to modulate the light emitted from this section to counterbalance modulations of the image that are visible. In this way, the total light output from the entire display may be held constant.

EXTENDING THE STANDARD CRT MODEL

Channel Constancy
As noted above, Eq. (1) embodies an assumption of channel constancy, that the relative spectrum emitted by an RGB monitor channel is independent of its level of excitation.
In our experience, channel constancy holds quite well for CRTs. However, for other display technologies, there can be significant violations of this assumption. Figure 5, for example, shows the relative spectrum of the green primary of an LCD monitor (see discussion of LCD displays later) at two digital video value levels, revealing a substantial change in the spectrum of this primary. It is possible to measure such changes systematically and correct for them. For example, Brainard et al. (33) describe a method based on using small-dimensional linear models to characterize spectral changes. This method was developed for handling spectral changes that occur when the voltage to filtered lamps is varied, but it also works well for LCD displays. Other authors have also proposed methods designed explicitly to deal with failures of channel constancy (28). Lookup table methods (see Channel Independence) may also be used to handle the effects of spectral changes.
Independence
In Eq. (4), the form of the gamma function for the green phosphor is g = fgreen(ugreen). A more general form for this function would be g = fgreen(ured, ugTeen,Ublue), so that g depends on its corresponding digital video value G and also on R and B. Failures of the simple form [Eq. (4)] are called failures of channel independence. Channel independence can fail because of inadequate power supply regulation (see item 3 under pixel independence above). Cowan and Rowe11 (18) and Brainard (19) describe procedures for testing for failures of channel independence. Channel inconstancy and dependence go beyond the standard CRT color model but still obey pixel independence and are within the domain of a general color transformation model. Lookup tables may be used to characterize CRT monitors whose performance violates the assumptions of channel constancy and independence. To create a table, one measures the monitor output (e.g., XYZ) for a large number of monitor digital video value triplets (i.e., RGB values). Then, interpolation is employed to invert the table and find the appropriate digital video value for any desired output. Cowan (2) provides an introduction to lookup table methods (see also 34,35). The ICC profile standard allows for storing and inverting a sampled representation of a general three-space to a three-space nonlinear transform. Spatial
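A lookup-table characterization might be sketched as below. Everything here is an illustrative assumption: the grid spacing, the invented display response standing in for real measurements, and the nearest-neighbor inversion (a practical implementation would interpolate between table entries):

```python
import numpy as np

# Simulate "measuring" display output (three numbers standing in for XYZ) on a
# coarse grid of digital video triplets. A real table would hold measurements.
levels = np.linspace(0.0, 255.0, 9)            # 9 samples per channel -> 729 triplets
R, G, B = np.meshgrid(levels, levels, levels, indexing="ij")
rgb_grid = np.column_stack([R.ravel(), G.ravel(), B.ravel()])

def fake_display(rgb):
    # Invented nonlinear, channel-coupled response, used only for illustration.
    lin = (rgb / 255.0) ** 2.2
    mix = np.array([[0.80, 0.15, 0.05],
                    [0.20, 0.70, 0.10],
                    [0.05, 0.10, 0.85]])
    return lin @ mix.T

xyz_grid = fake_display(rgb_grid)

# Invert the table: return the measured triplet whose output is closest to the
# desired output (nearest neighbor; interpolation would be more accurate).
def lut_invert(xyz_target):
    d = np.sum((xyz_grid - xyz_target) ** 2, axis=1)
    return rgb_grid[np.argmin(d)]

target = fake_display(np.array([[100.0, 150.0, 50.0]]))[0]
best_rgb = lut_invert(target)
print(best_rgb)  # the grid point nearest the desired output
```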
Figure 5. LCD primary spectra. A plot of the relative spectral power distribution of the blue primary of an LCD monitor (Marshall, V-LCD5V) in the spectral range 400-600 nm measured at two digital video values. Measurements were made in David Brainard’s lab. The spectrum of the ambient light was subtracted from each measurement. The solid curve shows the spectrum for a digital video value of 255. The dashed curve shows the spectrum for a digital video value of 28, scaled to provide the best least-squares fit to the spectrum for a digital video value of 255. The plot reveals a change in the relative spectrum of the emitted light as the digital video value changes from 28 to 255. If there were no change, the two curves would coincide.
Spatial Homogeneity
In general, all of the properties of a CRT may vary across the face of the CRT, and, for some applications, it may be necessary to measure this variation and correct for it. Luminance at the edge of a CRT is often 20% less than in the center (19). Failures of spatial homogeneity can be even larger for projectors. Brainard (19) found that the spatial inhomogeneity of a monitor could be characterized by a single light-attenuation factor at each location. Spatial inhomogeneity does not violate pixel independence, because the light output at each pixel is still independent of the light output at other pixel locations. Spatial homogeneity may be handled by allowing the parameters of the standard model to depend on the pixel location.
Temporal Homogeneity

There are two common causes of temporal variation in many types of display: those caused by warm-up and those caused by lifetime decay. CRTs and LCD panels (as a result of their backlight) take a significant time to warm up after being turned on. This can be as long as 45 to 60 minutes, during which time the luminance of a uniform patch will gradually increase by as much as 20%. There may also be significant color changes during this period. CRTs have finite lifetimes, partly because of browning of the phosphors by X rays. This causes the phosphors to absorb some light that would otherwise have been emitted. The same process affects the backlights used in LCD panels. The magnitude of this effect is proportional to the total light that has ever been emitted from the tube. Thus, presenting very bright stimuli for long periods will cause it to happen faster. Typical magnitudes of this phenomenon can be a 50% decrease in luminance across a few thousand hours. In addition to an overall decrease in the amount of emitted light, repeated measurements of the gamma function of a 1989-vintage monochrome CRT monitor during its life revealed an increase in the cutoff parameter G0 [Eq. (16)] by about six per month (on a scale of 0 to 255) when the monitor was left on continuously. We tentatively attribute this to loss of metal from the cathode, which may affect the MTF as well.
DISCUSSION

The preceding development has specified the steps of basic display characterization. Here we touch on several more advanced points.

Further Characterization

There are other imaging properties, beyond color transformation, that you may want to characterize.

Modulation Transfer Function (MTF). The optics of the electron beam and phosphor blur the displayed image somewhat. This is linear superposition and does not violate the pixel independence assumption. (The contribution of each blurry pixel may be calculated independently, and the final image is the sum of all of the blurry pixels.) Although it is easier to think of blur as convolution by a point spread, it is usually best to characterize it by measuring the Fourier analog of the point spread, the MTF. To do this, we recommend displaying drifting sinusoidal gratings and using a microphotometer to measure the light from a small slit (parallel to the gratings) on the display. (Your computer should read the photometer once per frame.) This should be done for a wide range of spatial frequencies. Adjust the drift rate to maintain a fixed temporal frequency, so that the results will not be affected by any temporal frequency dependence of the photometer. Results of such measurements are shown in Fig. 6. The monitor’s MTF is given by the amplitude of the sinusoidal luminance signal as a function of spatial frequency. When the grating is horizontal (no modulation along individual raster lines), the blur is all optical and is well characterized as a linear system that has the specified MTF. When the grating is vertical, some of the “blur” is due to the finite video bandwidth (failure 2 of pixel independence, above), and because the effect is nonlinear, the MTF is only an approximate description.

Figure 6. CRT monitor Modulation Transfer Function (MTF). The measured contrast of a drifting sinusoidal grating as a function of spatial frequency (normalized to 1 at zero frequency) for a CRT monitor, a 1989-vintage Apple High-Resolution Monochrome Monitor made by Sony. It has a fixed scan resolution of 640 × 480, a gamma of 2.28, and a maximum luminance of 80 cd/m2. As explained in the text, the contrast gain using horizontal gratings is independent of contrast and thus is the MTF. The “contrast gain” using vertical gratings, normalized by the horizontal-grating result, characterizes the nonlinear effect of the limited bandwidth of the video amplifier, which will depend on contrast. It is best to avoid presenting stimuli at frequencies at which the video amplifier limits performance, that is, where the dashed curve falls below the solid curve.

Geometry. For some experiments, for example, judging symmetry, it may be important to produce images whose shape is accurately known. For this, you may wish to measure and correct for the geometric distortions across the face of your monitor (36). Some monitors allow adjusting the geometry of the displayed image, and such adjustment may be used to minimize spatial distortions.

Ambient Light or “Flare”

Correcting for the presence of ambient light [term A(λ) in Eq. (1)] is easy. If we want to produce C(λ), first we compute C′(λ) = C(λ) − A(λ) and simply proceed as described previously (see Characterization Using the Standard CRT Model), using C′(λ) in place of C(λ). The same correction can also be applied if one starts with cone coordinates s or tristimulus coordinates x. In these cases, one computes s′ or x′ by subtracting the coordinates of the ambient light from the desired coordinates and then proceeds as before. When a monitor viewed in an otherwise dark room has been adjusted to minimize emitted light when R = G = B = 0, and only a negligible fraction of its emission is reflected back to it from surfaces in the room, correcting for the ambient light is probably not necessary. Under other conditions, the correction can be quite important.

What If I Care Only About Luminance?
For many experiments it is desirable to use just one primary and set the others to zero. Or, because observers typically prefer white to tinted displays, you may wish to set R = G = B and treat the display as monochrome, using a one-channel version of the standard CRT model. Fine
Control
of Contrast
As noted above, both &bit and 24-bit modes on graphics cards usually use B-bit DACs to produce the video voltages that drive the CRT. The 256-level quantization of the &bit DACs is too coarse to measure threshold on a uniform background. A few companies sell video cards that have more-than&bit DACs.15 There are several ways to obtain finer contrast steps using &bit DACs, but none is trivial to implement: Use a half-silvered mirror to mix two monitors optically (37). Optically mix a background light with the monitor image, for example, by using a slide projector to illuminate the face of the CRT (38). For gray-scale patterns, add together the three video voltages produced by a graphics card to produce a single-channel signal that has finer steps (see 39). For color patterns, add together the corresponding from two synchronized u,ed, Ugreen, and Ublue voltages graphics cards. (Synchronizing monitors can be technically difficult, however.) 4. Move the observer far enough away so that individual pixels cannot be resolved and use dithering or error diffusion (40,41). 5. Use a high frame rate, and dither over time. Use of Standard
Use of Standard Colorimetric Data
Standard estimates of color matching functions represent population averages for color-normal observers and are typically provided for foveal viewing at high light levels. These estimates may not be appropriate for individual observers or particular viewing conditions. First, there is variation among observers in which lights match, even among color-normal observers (42,43; see also 25,27,44-50). Second, the light levels produced by typical CRTs may not completely saturate the rods. If rods participate significantly in the visual processing of the stimulus, then color matches computed on the

15 A list of such video cards may be found at http://psychtoolbox.org/tips/videocards.html.
basis of standard colorimetry will be imperfect for lights viewed outside the fovea. Some guidance is available for determining whether rods are likely to have an effect at typical CRT levels (see 51; also 13). Unfortunately, correcting for individual variation is not trivial, because determining the appropriate color matching functions for a given individual and viewing conditions requires substantial measurement. If individual cone sensitivities or color matching functions are known, then it is straightforward to employ them by substituting them for the standard functions in the matrix S of Eq. (7). In some experiments, stimuli are presented on monitors to nonhuman observers. Here it is crucial to customize the rendering calculations for the spectral sensitivities appropriate to the species of interest (see 52). For psychophysical experiments in which taking the role of rods into account is critical, one can consider constructing custom four-primary displays, restricting stimuli to the rod-free fovea, or employing an auxiliary bleaching or adapting light to minimize rod activity. For experiments where silencing luminance is of primary interest, individual variation can be taken into account by using a psychophysical technique such as flicker photometry to equate the luminance of stimuli that differ in chromaticity (15).
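Substituting individual data into the matrix S is a one-line change in the characterization pipeline. A sketch with random stand-in spectra (real use would insert measured phosphor spectra and the observer's sensitivities):

```python
import numpy as np

# Sketch of tailoring the characterization to an individual observer. All
# numbers are random stand-ins: in practice the columns of P are measured
# phosphor emission spectra and the rows of S are cone spectral
# sensitivities, sampled at the same wavelengths.
rng = np.random.default_rng(1)
n_wavelengths = 31                      # e.g., 400-700 nm in 10-nm steps
P = rng.random((n_wavelengths, 3))      # phosphor spectra: red, green, blue columns
S = rng.random((3, n_wavelengths))      # sensitivities: L, M, S rows

# The 3 x 3 matrix of the Eq. (7) calculation:
# cone excitations = M @ (linear phosphor intensities).
M = S @ P

# Using an individual observer's data is just replacing S here and
# recomputing M; the rest of the pipeline is unchanged.
print(M.shape)
```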
Other Display Devices
Although CRT monitors remain the dominant technology for soft copy display, they are by no means the only display technology. Liquid crystal display (LCD) devices are becoming increasingly popular and in fact are the dominant display technology used in laptop computers and data projection systems. Other emerging technologies include the digital light processor (DLP; 53) and plasma displays. Printers, obviously, are also important. Characterization of other display devices requires applying the same visual criterion of accuracy. For each type of display device, a model of its performance must be developed and evaluated. The standard CRT model is a reasonable point of departure for LCD, DLP, and plasma displays. We discuss these in more detail below. Printers are more difficult to characterize. The standard CRT model does not provide an acceptable description of printers. First, printers employ subtractive (adding ink causes more light to be absorbed) rather than additive color mixture, so that Eq. (1) does not hold. Second, the absorption spectrum of a mixture of inks is not easy to predict from the absorption spectra of individual inks. Third, the spectrum of reflected light depends on the inks laid down by the printer and also on the spectrum of the ambient illumination. The illumination under which the printed paper will be viewed is often not under the control of the person who makes the print. Lookup tables (see above) are generally employed to characterize the relationship between digital values input to the printer and the reflected light, given a particular reference illuminant. Accordingly, ICC printer profiles provide for specification of tabular data to characterize printer performance. A detailed treatment of printer characterization is beyond the scope of this article (see 2,3,35,54,55).
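The lookup-table approach mentioned for printers can be sketched in one dimension (the neutral axis); a real printer profile extends the same idea to a 3-D table over ink amounts with trilinear interpolation. All numbers below are invented:

```python
import numpy as np

# Minimal lookup-table characterization along a printer's neutral axis
# (illustrative numbers only). ICC printer profiles generalize this to a
# 3-D table over (C, M, Y) with trilinear interpolation.
digital_in = np.array([0, 64, 128, 192, 255])          # sampled input values
measured_Y = np.array([85.0, 55.0, 30.0, 14.0, 5.0])   # measured luminance factor under the reference illuminant

def predict_Y(d):
    """Interpolate the table to predict reflected luminance for any input."""
    return np.interp(d, digital_in, measured_Y)

def invert_Y(y_target):
    """Search the table for the input value that best produces y_target."""
    grid = np.arange(256)
    pred = predict_Y(grid)
    return int(grid[np.argmin(np.abs(pred - y_target))])

print(predict_Y(96), invert_Y(40.0))
```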
LCD Panels. LCDs appear commonly in two manifestations: as flat-panel equivalents of the CRT for desktop and laptop computers and as the light control element in projection displays. When designing LCDs to replace monitors, some manufacturers have tried to imbue them with some of the characteristics of CRT displays, so LCDs have typically accepted the same analog video signal from the computer that is used to drive a CRT. However, since the 1999 publication of the Digital Visual Interface (DVI) standard (http://www.ddwg.org/; see also http://www.dell.com/us/en/arm/topics/vectors~000dvi.htm), a rapidly increasing proportion of LCD displays accept digital input. At least in principle, digital-input LCDs can provide independent digital control of each pixel and promise excellent pixel independence. LCDs can be characterized similarly to CRT displays, but bear in mind the following points:
1. Angular dependence. The optical filtering properties of LCD panels can have a strong angular dependence, so it is important to consider the observer's viewing position when characterizing LCD displays. This is especially important if the observer will be off-axis or there will be multiple observers. Angular dependence may be the greatest obstacle to the use of LCDs for accurate rendering.
2. Temporal dependencies. The temporal response of LCD panels can be sluggish, resulting in more severe violations of the assumption of temporal independence than typically observed in CRT displays. For example, when measuring the gamma function, a different result will be obtained if one measures the output for each digital video value after a reasonable settling time than if one sandwiches one frame at the test level between two frames of full output. It is important to try to match the timing of the characterization procedure to the timing of the target display configuration.
3. Warm-up. The LCD panel achieves color by filtering a backlight, normally provided by a cold-cathode fluorescent lamp. These behave similarly to CRTs when warming up, so be prepared to wait 45 minutes after turning one on before expecting consistent characterization.
4. Channel constancy. As illustrated by Fig. 5, some LCD displays do not exhibit channel constancy. This does not appear to be inherent in LCD technology, however. Measurements of other LCD panels (56,57) indicate reasonable channel constancy, at least as assessed by examining variation in the relative values of XYZ tristimulus coordinates.
5. Output timing. It can be important to know when the light is emitted from the display with respect to generating the video signals.
In a CRT-based display, the video signal modulates the electron beam directly, so it is easy to establish that the light from one frame is emitted in a 10 ms burst (assuming a 100 Hz frame rate) starting somewhere near the time of the frame synchronization pulse and the top of the screen, but this need not be so in an LCD system. To evaluate the output timing, arrange to display a
single light frame followed by about ten dark frames. Connect one of the video signals (say, the green) to one channel of an oscilloscope, and connect the other channel to a photodiode placed near the top of the screen. (Note that no electronics are needed to use the photodiode, but the signal may be inverted, and the voltage produced is logarithmically, rather than linearly, related to the incident light.) When observing a CRT, it will be possible to identify a short pulse of light, about 1 ms wide, located somewhere near the beginning of the video stream. If the detector is moved down the screen, the pulse of light will move correspondingly toward the end of the video frame. When observing an LCD panel, the signals look completely different. The light pulse is no longer a pulse but a frame-length block, and there may be a significant delay between the video stream and the arrival of the light. In fact, in some displays, the two may have no fixed relationship at all (see Fig. 7).
6. Resolution. Analog-input LCD panels (and projectors) contain interface electronics that automatically resample the video signal and interpret it in a manner suitable for their own internal resolution and refresh rate. This is desirable for easy interfacing to different computers, but the resampling can introduce both spatial and temporal dependencies that make accurate imaging more difficult. If possible, LCD displays should be run at their native spatial and temporal resolution. Even then, it is not guaranteed that the electronics pass the video signal unaltered, and one should be alert for spatial and temporal dependencies. This consideration also applies to DLP displays.
7. Internal quantization. Analog-input LCDs may actually digitize the incoming video voltages with a resolution of only 6 or 8 bits before displaying them, so be prepared to observe quantization beyond that created by the graphics card.
8. Gamma. There is no inherent mechanism in an LCD panel to create the power-law nonlinearity of a CRT; therefore, a power-type function [e.g., Eq. (16)] does not describe the function relating digital video value to emitted light intensity very well. One way to deal with this is to measure the light output for every possible digital video value and invert the function numerically.

LCD Projectors. Analog-input LCD displays currently offer little advantage in image quality over a CRT display, but the unique problems posed by some environments, such as the intense magnetic field inside an MRI scanner, mandate the use of a projector. Apart from CRT-based projectors, two types of projection displays are commonly available: those based on LCDs and those based on digital light processors (DLPs). LCD projectors have properties very similar to those of LCD panels, except that the light source is often an arc lamp rather than a cold-cathode fluorescent device and the pixel processing is more elaborate. Therefore, the same
Figure 7. Delay in LCD projection. Diagram showing how the light output from the center of a projection screen, illuminated by an analog-input LCD projector, varies with time when the patch is nominally illuminated for two frames before being turned off. This is compared to the light from a CRT driven by the same video signal. Note the extra 12-ms delay imposed by the LCD projector's electronics.
considerations that apply to LCD panels generally apply to LCD projectors as well. Projectors are often used in conjunction with "gain" screens, which have nonlambertian viewing characteristics; as for LCD panels, the light that reaches the observer varies with viewing position.

DLP Projectors. Digital light processing (DLP) projectors work by using a two-dimensional array of tiny mirrors (digital micromirrors) that can be deflected by about 20° at high speed; switching times of 15 µs are usual. These are used as light valves to direct the light from a high-intensity source either through the projection lens or off to the side. A color display is generated either by using three of these devices to deflect colored light filtered from the main source (as in modern LCD projectors) or by using one array operating at three times the net frame rate to deflect colored light provided by a rotating filter wheel. Typically, this rotates at about 150 Hz to give a 50 Hz display. The picture breakup that occurs when the eye makes saccadic movements makes projectors based on the filter-wheel design difficult to use. Currently, three-mirror DLP projectors are quite expensive in comparison to LCD projectors, and many are designed for large-scale projection (e.g., digital cinema). Packer et al. (58) evaluate how well a three-mirror DLP device conforms to a number of the assumptions of the standard CRT model.

Plasma. Plasma displays are much more expensive than CRTs and LCDs but are currently favored by museums (e.g., the Museum of Modern Art and the Metropolitan Museum of Art in New York City) because the plasma display is both flat like an LCD and nearly lambertian (i.e., emitted light is independent of viewing angle) like a CRT.
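The numerical gamma inversion recommended above for LCD panels (measure the output at every digital value, then invert the measured curve by table lookup) can be sketched as follows; the "measured" curve here is an arbitrary monotonic stand-in:

```python
import numpy as np

# Sketch of numerical gamma inversion for an LCD: rather than fitting a
# power law, measure the light output at all 256 digital values and invert
# the measured curve directly. The measurements are simulated here with an
# arbitrary S-shaped monotonic response (illustration only).
v = np.arange(256)
measured = 100.0 / (1.0 + np.exp(-(v - 128) / 32.0))   # stand-in luminances

def value_for_luminance(target):
    """Invert the measured curve by table lookup (valid because the
    measured response is monotonically increasing)."""
    idx = int(np.searchsorted(measured, target))
    return max(0, min(255, idx))

# Round-trip check: the value chosen for a measured luminance is the
# original digital value.
print(value_for_luminance(measured[200]))
```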
Instrumental Considerations
Three types of instrument are typically used to characterize display devices: spectroradiometers, colorimeters, and photometers. Spectroradiometers measure the full spectral power distribution of light. Colorimeters typically measure CIE XYZ tristimulus coordinates. Photometers measure only luminance (proportional to CIE Y). Zalewski (59) provides a detailed treatment of light measurement. Several sources (1,11,60,61; http://psychtoolbox.org/tips/lightmeasure.html) discuss instrumentation for display characterization. A spectroradiometer provides the most general characterization of the spectral properties of a display. One may compute cone coordinates or XYZ tristimulus coordinates from the spectral power distributions measured by spectroradiometers. If one wishes to use a description of color vision that differs from that specified by the CIE system (e.g., 24,25,27,42,62,63), then it is necessary to characterize the display by using a spectroradiometer. However, spectroradiometers are expensive (thousands of dollars), whereas colorimeters often cost less than a thousand dollars, and the XYZ measurements they provide are sufficient for many applications. Photometers measure only luminance and thus are suitable only for applications where the color of the stimuli need not be characterized. How much instrumental precision is required? The answer, of course, depends on the application. A simple calculation can often be used to convert the effect of instrumental error into a form more directly relevant to a given application. As an example, suppose that we wish to modulate a uniform field so that only one of the three classes of cones is stimulated and the modulation is invisible to the other two. Such "silent substitution" techniques are commonly used in visual psychophysics to isolate individual mechanisms (e.g., individual cone types; see 64,65). Figure 8 shows an
Figure 8. CRT red primary emission compared with L cone sensitivity. The solid line is a plot of the spectral power distribution of a typical CRT red primary. The measurements were made in David Brainard's lab and have approximately 1-nm resolution. The dashed line is a plot of an estimate (24) of the human L cone spectral sensitivity. Both curves have been normalized to a maximum of 1. Note that considerable light power is concentrated in a region where the slope of the L cone sensitivity is steep.
estimate of the spectral sensitivity of the human L cone along with the spectral power distribution of a typical CRT red phosphor emission. Considerable phosphor power is concentrated in the spectral interval where the slope of the L cone sensitivity function is steep. One might imagine that calculations of the L cone response to light from this phosphor are quite sensitive to imprecision in the phosphor measurement. To investigate, we proceeded as follows. We started with measurements of a CRT's phosphor emissions that had roughly 1-nm resolution. Then, we simulated two types of imprecision. To simulate a loss of spectral resolution, we convolved the initial spectra with a Gaussian kernel 5 nm wide (standard deviation). To simulate an error in spectral calibration, we shifted the spectra 2 nm toward the longer wavelengths. We computed three stimulus modulations for each set of simulated measurements. Each modulation was designed to generate 20% contrast for one cone type and silence the other two. Then, we used the original spectral measurements to compute the actual effect of the simulated modulations. The effect is very small for the reduced-resolution case: the maximum magnitude of modulation in cone classes that should be silenced is 0.25%. However, the 2-nm wavelength shift had a larger effect. Here, a nominally silenced cone class can see almost 2% contrast. For this application, spectral calibration is more critical than spectral resolution. One point that is often overlooked when considering the accuracy of colorimeters and photometers is how well the instrument's spectral sensitivities match those of their target functions: X, Y, and Z color matching
functions for colorimeters or luminance sensitivity (Y) for photometers. To characterize the accuracy of a photometer, for example, it is typical to weight differences between instrumental and luminance spectral sensitivity at each wavelength in proportion to luminance sensitivity at that wavelength. This means that the specified accuracy is predominantly affected by discrepancies between instrumental and luminance sensitivity in the middle of the visible spectrum, where luminance sensitivity is high. The specified accuracy is not very sensitive to discrepancies in the short- or long-wavelength regions of the spectrum, where luminance sensitivity is low. A photometer that is specified to have good agreement with the luminance sensitivity function will accurately measure the luminance of a broadband source, but it may perform very poorly when measurements are made of spectrally narrower light, such as that emitted by CRT phosphors. The indicated luminance in such a situation can be wrong by 50% or more. A final question that arises when using any type of light-measuring instrument for a CRT is whether the instrument was designed to measure pulsatile sources. The very short bursts of light emitted as the raster scans the faceplate of the CRT have peak intensities of 10 to 100 times the average intensity. This can distort the result obtained by using electronics not designed for this possibility.
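Computing a silent-substitution modulation like the one used in the example above reduces to solving a small linear system. A sketch with an invented primaries-to-cone-contrast matrix (in practice its entries come from measured phosphor spectra and cone sensitivities):

```python
import numpy as np

# Sketch of computing a "silent substitution" modulation: find primary
# modulations producing 20% L-cone contrast and 0% M- and S-cone contrast.
# C maps primary modulations to cone contrasts; its entries are invented
# here for illustration.
C = np.array([[0.45, 0.40, 0.05],
              [0.30, 0.55, 0.08],
              [0.02, 0.10, 0.90]])

target = np.array([0.20, 0.0, 0.0])    # 20% L contrast; M and S silenced
modulation = np.linalg.solve(C, target)

# Check: applying the modulation reproduces the target cone contrasts.
print(np.round(C @ modulation, 6))
```

Repeating this calculation with perturbed (smoothed or wavelength-shifted) spectra, then evaluating the resulting modulations against the original spectra, is exactly the error analysis described in the text.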
Recommended Software
ICC. For many purposes, sufficiently accurate imaging may be obtained simply by using any available commercial package to characterize the CRT and create an ICC profile and then using an ICC-compatible image display program.

Vision Research. After using a wide variety of software packages (e.g., see directories 66,67) for running vision experiments, we have come to the conclusion that it is very desirable that the top level, in which the experimenter designs new experiments, be a full-fledged language, preferably interactive, like BASIC or MATLAB. Writing software from scratch is hard. Some packages overcome this limitation by allowing users to design experiments by just filling out a form rather than actually programming. However, this limits the experiment to what the author of the form had in mind and makes it impossible to do a really new experiment. The best approach seems to be to program in a general-purpose language and write as little new code as possible for each new experiment by modifying the most similar existing program. In that spirit, the free Psychophysics Toolbox (http://psychtoolbox.org/) provides a rich set of extensions to MATLAB to allow precise control and synthesis of visual stimuli within a full-featured interactive programming environment, along with a suite of demo programs (68,69). Psychophysica (http://vision.arc.nasa.gov/mathematica/psychophysica/), also free, provides similar extensions for Mathematica (70).

Rush: "Hogging" the Machine. Today's popular computers interrupt the user's program frequently to grant
time to other processes. This is how the computer creates the illusion of doing everything at once. The difficulty with this is that the interruptions can cause unwanted pauses in the temporal presentation of a stimulus sequence. Although it is difficult to shut down the interrupting processes completely, we have found it both possible and desirable to suppress interrupts, hogging the machine, for the few seconds it takes to present a visual stimulus. In the Psychophysics Toolbox (see above), this facility is called Rush; it allows a bit of MATLAB code to run at high priority without interruption.

SUMMARY

Today's graphics cards and CRT monitors offer a cheap, stable, and well-understood technology suited to accurate rendering after characterization in terms of a simple standard model. This article describes the standard model and how to use it to characterize CRT displays. Certain stimuli will strain the assumption of pixel independence, but it is easy to test for such failures and often possible to avoid them. New technologies, particularly LCD, DLP, and plasma displays, are emerging as alternatives to CRT displays. Because these technologies are less mature, there are not yet standard models available for characterizing them. Development of such models can employ the same rendering intent and color vision theory introduced here, but the specifics of the appropriate device model are likely to differ. Digital-input LCD displays promise excellent pixel independence but may be difficult to use for accurate rendering because of the high dependence of the emitted light on viewing angle.
Acknowledgments
We thank J. M. Foley, G. Horwitz, A. W. Ingling, T. Newman, and two anonymous reviewers for helpful comments on the manuscript. More generally, we learned much about monitors and their characterization from W. Cowan, J. G. Robson, and B. A. Wandell. This work was supported by grant EY10016 to David Brainard and EY04432 to Denis Pelli.

BIBLIOGRAPHY

1. D. L. Post, in H. Widdel and D. L. Post, eds., Color in Electronic Displays, Plenum, NY, 1992, pp. 299-312.
2. W. B. Cowan, in M. Bass, ed., Handbook of Optics, vol. 1, Fundamentals, Techniques, and Design, McGraw-Hill, NY, 1995, pp. 27.1-27.44.
3. R. Adams and J. Weisberg, GATF Practical Guide to Color Management, Graphic Arts Technical Foundation, Sewickley, PA, 1998.
4. ICC, Specification ICC.1:1998-09, 1998, http://www.color.org/profiles.html.
5. G. Gill, ICC File I/O README, http://web.access.net.au/argyll/color.html.
6. D. Wallner, Building ICC Profiles - The Mechanics and Engineering, 2000, http://www.color.org/iccprofiles.html.
7. G. Gill, What's wrong with the ICC profile format anyway?, 1999, http://web.access.net.au/argyll/icc_problems.html.
8. D. E. Pearson, Transmission and Display of Pictorial Information, Pentech Press/Wiley, New York, 1975.
9. P. A. Keller, The Cathode-Ray Tube: Technology, History, and Applications, Palisades Press, NY, 1991.
10. T. R. H. Wheeler and M. G. Clark, in H. Widdel and D. L. Post, eds., Color in Electronic Displays, Plenum Press, NY, 1992, pp. 221-256.
11. P. A. Keller, Electronic Display Measurement - Concepts, Techniques and Instrumentation, Wiley, NY, 1997.
12. D. H. Brainard and D. G. Pelli, Raster psychophysics bibliography, http://psychtoolbox.org/tips/rasterbib.html.
13. G. Wyszecki and W. S. Stiles, Color Science - Concepts and Methods, Quantitative Data and Formulae, 2nd ed., Wiley, NY, 1982.
14. D. H. Brainard, in M. Bass, ed., Handbook of Optics, vol. 1, Fundamentals, Techniques, and Design, McGraw-Hill, NY, 1995, pp. 26.1-26.54.
15. P. K. Kaiser and R. M. Boynton, Human Color Vision, 2nd ed., Optical Society of America, Washington, DC, 1996.
16. S. Daly, Soc. Inf. Display 2001 Dig. XXXII, 1,200-1,203 (2001).
17. R. A. Tyrell, T. B. Pasquale, T. Aten, and E. L. Francis, Soc. Inf. Display 2001 Dig. XXXII, 1,205-1,207 (2001).
18. W. B. Cowan and N. Rowell, Color Res. Appl. 11, Suppl., S33-S38 (1986).
19. D. H. Brainard, Color Res. Appl. 14, 23-34 (1989).
20. N. P. Lyons and J. E. Farrell, Soc. Inf. Display Dig. 20, 220-223 (1989).
21. D. G. Pelli, Spatial Vision 10, 443-446 (1997); http://vision.nyu.edu/VideoToolbox/PixelIndependence.html.
22. A. B. Watson et al., Behav. Res. Methods Instrum. Comput. 18, 587-594 (1986).
23. B. A. Wandell, Foundations of Vision, Sinauer, Sunderland, MA, 1995.
24. V. C. Smith and J. Pokorny, Vision Res. 15, 161-171 (1975).
25. P. DeMarco, J. Pokorny, and V. C. Smith, J. Opt. Soc. Am. A 9, 1,465-1,476 (1992).
26. CIE, Colorimetry, 2nd ed., Bureau Central de la CIE, Vienna, 1986, publication 15.2.
27. A. Stockman and L. T. Sharpe, Color and Vision database, http://cvision.ucsd.edu/index.htm.
28. D. L. Post and C. S. Calhoun, Color Res. Appl. 14, 172-186 (1989).
29. P. C. Hung, J. Electron. Imaging 2, 53-61 (1993).
30. R. S. Berns, R. J. Motta, and M. E. Gorzynski, Color Res. Appl. 18, 299-314 (1993).
31. A. C. Naiman and W. Makous, SPIE Conf. Human Vision, Visual Processing, and Digital Display III, vol. 1666, 1992, pp. 41-56.
32. Q. J. Hu and S. A. Klein, Soc. Inf. Display Int. Sym. Tech. Dig. 25, 19-22 (1994).
33. D. H. Brainard, W. A. Brunt, and J. M. Speigle, J. Opt. Soc. Am. A 14, 2,091-2,110 (1997).
34. E. S. Olds, W. B. Cowan, and P. Jolicoeur, J. Opt. Soc. Am. A 16, 1,501-1,505 (1999).
35.
36. L. T. Maloney and K. Koh, Behav. Res. Methods Instrum. Comput. 20, 372-389 (1988).
37. C. C. Chen, J. M. Foley, and D. H. Brainard, Vision Res. 40, 773-788 (2000).
38. A. B. Poirson and B. A. Wandell, Vision Res. 36, 515-526 (1996).
39. D. G. Pelli and L. Zhang, Vision Res. 31, 1,337-1,350 (1991).
40. C. W. Tyler, Spatial Vision 10, 369-377 (1997).
41. J. B. Mulligan and L. S. Stone, J. Opt. Soc. Am. A 6, 1,217-1,227 (1989).
42. W. S. Stiles and J. M. Burch, Optica Acta 6, 1-26 (1958).
43. M. A. Webster and D. I. A. MacLeod, J. Opt. Soc. Am. A 5, 1,722-1,735 (1988).
44. J. Pokorny, V. C. Smith, G. Verriest, and A. J. L. G. Pinckers, Congenital and Acquired Color Vision Defects, Grune and Stratton, NY, 1979.
45. J. Nathans et al., Science 232, 203-210 (1986).
46. J. Neitz and G. H. Jacobs, Vision Res. 30, 621-636 (1990).
47. J. Winderickx et al., Nature 356, 431-433 (1992).
48. J. Neitz, M. Neitz, and G. H. Jacobs, Vision Res. 33, 117-122 (1993).
49. M. Neitz and J. Neitz, Science 267, 1,013-1,016 (1995).
50. J. Carroll, C. McMahon, M. Neitz, and J. Neitz, J. Opt. Soc. Am. A 17, 499-509 (2000).
51. A. G. Shapiro, J. Pokorny, and V. C. Smith, J. Opt. Soc. Am. A 13, 2,319-2,328 (1996).
52. L. J. Fleishman et al., Anim. Behav. 56, 1,035-1,040 (1998).
53. L. J. Hornbeck, Digital light processing for high-brightness high-resolution applications, Texas Instruments, 1997.
54. R. W. G. Hunt, The Reproduction of Colour, 4th ed., Fountain Press, Tolworth, England, 1987.
55. B. A. Wandell and L. D. Silverstein, in S. K. Shevell, ed., The Science of Color, 2nd ed., Optical Society of America, Washington, DC, 2001.
56. M. D. Fairchild and D. R. Wyble, Colorimetric Characterization of the Apple Studio Display (Flat Panel LCD), Munsell Color Science Laboratory Report, Rochester Institute of Technology, Rochester, NY, 1998; http://www.cis.rit.edu/mcsl/research/reports.shtml.
57. J. E. Gibson and M. D. Fairchild, Colorimetric Characterization of Three Computer Displays (LCD and CRT), Munsell Color Science Laboratory Report, Rochester Institute of Technology, Rochester, NY, 2001; http://www.cis.rit.edu/mcsl/research/reports.shtml.
58. O. Packer et al., Vision Res. 41, 427-439 (2001).
59. E. F. Zalewski, in M. Bass, ed., Handbook of Optics, vol. 2, McGraw-Hill, NY, 1995, pp. 24.3-24.51.
60. R. S. Berns, M. E. Gorzynski, and R. J. Motta, Color Res. Appl. 18, 315-325 (1993).
61. D. H. Brainard and D. G. Pelli, Light measurement instrumentation, http://psychtoolbox.org/tips/lightmeasure.html.
62. J. J. Vos, Color Res. Appl. 3, 125-128 (1978).
63. A. Stockman, D. I. A. MacLeod, and N. E. Johnson, J. Opt. Soc. Am. A 10(12), 2,491-2,521 (1993).
64. O. Estevez and H. Spekreijse, Vision Res. 22, 681-691 (1982).
65. D. H. Brainard, in P. K. Kaiser and R. M. Boynton, eds., Human Color Vision, Optical Society of America, Washington, DC, 1996, pp. 563-579.
66. LTSN Psychology, http://www.psychology.ltsn.ac.uk/search-resources.html.
67. F. Florer, Powermac software for experimental psychology, http://vision.nyu.edu/Tips/FaithsSoftwareReview.html.
68. D. H. Brainard, Spatial Vision 10, 433-436 (1997).
69. D. G. Pelli, Spatial Vision 10, 437-442 (1997).
70. A. B. Watson and J. A. Solomon, Spatial Vision 10, 447-466 (1997).

DYE TRANSFER PRINTING TECHNOLOGY

NOBUHITO MATSUSHIRO
Okidata Corporation
Gunma, Japan

INTRODUCTION

This article describes the structure and operation of dye-sublimation thermal-transfer printing technology. In addition to the structure and operation, the capabilities and limitations of the technology, such as resolution, speed, intensity levels, color accuracy, and size, are described. Though the subject of this article is the dye-sublimation printer, details of other thermal-transfer printers are included to assist in understanding and comparing the capabilities and limitations of the dye-sublimation printer.

HISTORY OF PRINTERS USING DYE TRANSFER TECHNOLOGY

Dye-transfer printers form images on sheets using different types of energy, such as mechanical pressure, heat, and light. Impact printers use mechanical pressure, whereas nonimpact printers use heat and light. In this article, the histories of both impact and nonimpact printers are reviewed, which is important for understanding the position of the dye-sublimation printer in the history of dye-transfer printers. The history that follows includes both impact printers and nonimpact printers, the latter of which include the dye-sublimation printer.

Impact Printers Using Dye Transfer
1960 Development of the horizontal chain line printer, the original line printer.
1961 The golf ball printer established the position of the serial impact printer.
1970 Development of the daisy wheel printer, a typeface-font serial impact printer, the successor to the golf ball printer.
1970 The dot matrix printer, the mainstay of current serial impact printers.

Nonimpact Printers Using Dye Transfer

1930s Development of wax-type thermal-transfer recording paper. Development of the early sublimation dye-transfer method.
1940s Development of heat-sensitive paper using almost colorless metallic salts.
1950s Commercialization of colorant melting by exposure to infrared radiation. Disclosure of the patent for the thermal printer. Development of a prototype electrosensitive transfer printer.
1960s Development of the leuco-dye-based heat-sensitive recording sheet, a current mainstay in the market.
1969 Development of the heat-mode laser-based transfer printer.
1970s The dye-sublimation printer, electrosensitive transfer printer, and wax-melt printer were developed in succession.
1971 Dye-sublimation printer was developed.
1973 Electrosensitive transfer printer was developed.
1975 Wax-melt printer was developed.

THERMAL-TRANSFER PRINTERS
PRINTER
This is one of the most promising printers in recent years. Sublimation is a process by which a solid is transformed directly into a vapor without passing through a liquid phase. This cycle forms the basis of dye-sublimation printing.
Figure
1.
Basic structure
of the dye-sublimation
process.
TECHNOLOGY
189
Structure
In the dye-sublimation printer, as shown in Fig. 1 and described in the outline, ink that contains a dispersed dye is applied to ink sheets. Upon heating, the dye is transferred to the sheet, and then resolidifies at room temperature. Therefore, dry-process printing is possible. Sublimation
In principle, images are formed thermally via physical or chemical reactions of solid inks by using a heating element. The principle of image formation by the thermaltransfer printer is depicted in Fig. 1. Figure 1 shows the dye-sublimation printer which is the most representative of the thermal-transfer printers. The thermal-generating elements are heated by applying an input voltage. The heated area of the ink which coats the base material sublimes onto the recording sheet, thereby forming images. The advantage of the thermal-transfer printer is that it does not require an ink supply, an ink recovery mechanism, or a clog recovery system as in ink-jet printers. It requires only an ink-sheet or coloring sheet supply mechanism, that makes the recording and printing mechanisms simple. Due to these advantages, small printers are possible. By controlling the amount of heat applied, gradation control by dot density control is also possible; high-quality images that have high resolution and gradation can be produced. The disadvantage of the thermal-transfer printer is that it requires special paper; accordingly, the operating cost is higher than that of printers which use plain paper. The dye-sublimation printer and the wax-melt printer are representative types of thermal-transfer printers. DYE-SUBLIMATION
Basic
PRINTING
Dye,
Ink
Sheet,
and
Recording
Sheet
The very fact that sublimation dyes sublime is in itself proof that they are essentially unstable. At the same time, however, recorded materials must be stable, which is contradictory to the nature of sublimable dyes. Accordingly, the selection of dyes is the most important factor in determining the success or failure of ink-sheet manufacturing. In the early 198Os, dispersed dyes and materials related to the dispersed dye group were used as ink dyes. Thereafter, due to characteristics such as light resistance and color reproducibility, required for hard dyes were developed. In addition to copies, special regularly dispersed dyes and basic dyes, new dyes that have the same molecular structure as the pigments used for color photosensitive materials have been recently developed. To reduce the cost of printing systems, multiple transferable ink sheets have been developed. Using these sheets, large quantities of sublimable dyes are included in a thermoplastic resin layer; this makes up to 10 printing cycles possible from the same sheet at identical density. The issue of sublimable dyes is most important for sublimation thermal-transfer recording. Progress in the area of materials, development of dyes with appropriate subliming characteristics, characteristics for dispersing by diffusion, high absorption coefficient, severe weather tolerance and good saturation, as well as progress in the area of recording sheets are the most important issues in the development of sublimation printers. To reproduce the full color spectrum, ink sheets are used onto which dyes of the three primary colors, cyan, magenta and yellow, are applied, In addition to these three primary colors, black is used. The reproduction of colors is realized by subtractive color mixture. In addition, the dyes applied onto the recording sheet must have high linearity with respect to the applied heat and many levels of gradation. 
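The subtractive color mixture of the CMY primaries plus black can be illustrated with a short sketch. The conversion rule and the simple gray-replacement step below are textbook conventions, not details taken from this article.

```python
# Minimal sketch of subtractive color mixture as used by CMY(K) printing.
# Values are normalized 0..1; the gray-replacement rule is illustrative.

def rgb_to_cmyk(r, g, b):
    """Convert an additive RGB value to subtractive CMYK ink amounts."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b       # subtractive complements
    k = min(c, m, y)                           # replace common gray with black
    if k == 1.0:                               # pure black: avoid divide by zero
        return (0.0, 0.0, 0.0, 1.0)
    # remove the black component from each ink
    return tuple((x - k) / (1.0 - k) for x in (c, m, y)) + (k,)

print(rgb_to_cmyk(1.0, 0.0, 0.0))  # red is produced by magenta + yellow
```

Using black in addition to the three primaries, as the article notes, deepens shadows and reduces the amount of colored dye needed for dark areas.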
The energy required for transfer from this type of ink sheet and recording sheet is greater than that in the wax-melt process. The requirements for the ink sheets with respect to the thermal head are that (1) the ink sheets should not degrade the thermal head, (2) the ink sheets should not adhere to the thermal head, and (3) no ink should remain on the thermal head. Furthermore, the heat-resistant layer itself should not affect the properties of the dye layer. The requirement for the recording sheets onto which the sublimable dye is transferred is that they should not degrade the thermal head by means of hard chemical compounds included in the sheets.
Recording

The density gradation method, in which the dots themselves possess gradations, can be realized using sublimation thermal-transfer sheets (ink sheets and recording sheets). The essence of sublimation thermal transfer is that density modulation is possible for each dot. The density of each dot can be controlled by the pulse width of the input signals; by increasing or decreasing the joule heat, the amount of dye in the ink layer that is sublimed and transferred to the recording sheet is modulated. This control is itself an analog process. The temperature of the thermal head momentarily reaches 200-300 °C; thus it is essential that the ink sheets be made of heat-resistant materials. Only polyester films are available as low-cost sheets that have heat-resistant properties and strength. However, polyester films have short useful lives under these conditions. A method of incorporating layers that have improved longevity is being studied and should soon be used.
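As a rough illustration of the per-dot density control described above, the sketch below maps a target optical density to a head-conduction (pulse-width) time, assuming the linear density-to-heat response the article calls for. All numeric constants are illustrative, not taken from the article.

```python
# Sketch of per-dot density control by pulse width, assuming a roughly
# linear density-vs-heat response. Calibration constants are illustrative.

D_MAX = 2.0          # maximum optical density (the article cites ~2.0)
T_MAX_MS = 20.0      # longest head-conduction time, ms (2-20 ms range)
T_MIN_MS = 2.0       # shortest head-conduction time, ms

def pulse_width_ms(target_density):
    """Return the head-on time needed for one dot's target density."""
    d = min(max(target_density, 0.0), D_MAX)   # clamp to the printable range
    frac = d / D_MAX                           # linear-response assumption
    return T_MIN_MS + frac * (T_MAX_MS - T_MIN_MS)
```

In a real head the density-versus-heat curve is measured and stored as a lookup table rather than assumed linear; the point here is only that pulse width is the analog control variable.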
Recording Features

Because the recording system can reproduce 64 or more gradations, it is used in pictorial color printers. However, due to changes in environmental temperature and heat accumulation in the thermal head, the thermal balance can be disturbed, and errors in recording density can result. These effects have a serious impact on sublimation systems, which use analog gradation recording: they can cause nonuniform density in the images and also degrade image quality, including color reproducibility, resolution, and MTF. The recording sensitivity is one-third that of heat-sensitive systems, and the operating cost of these systems is high due to the requirements of the image-processing features; furthermore, they are not suitable for high-speed recording. Although their operating cost is high, because of their excellent color reproducibility, high density level (optical density value: 2.0), and high tone range (gradation: 64 levels or higher), they have been incorporated into products that require high image quality, such as digital color proofing systems, video printers, overhead projectors, and card-transferring machines.

Advantages and Disadvantages

The advantages of sublimation thermal-transfer systems include the fact that they can reproduce images that have many gradations; the highest possible range is 256 levels. They also offer high-density images and produce excellent gradations in low-density areas. Resolution can be improved by reducing the dot diameter. Furthermore, due to the spreading of the transferred ink, the grainy nature of dots disappears, rendering images smooth. In addition, due to the simple printing mechanism, the printer can be fabricated compactly and inexpensively. The dye-sublimation printer creates continuous tone and can produce color pictures whose image quality is comparable to that obtained by photographic systems. Thus, this system is suitable when printing quality close to that of photographic images is required, such as in color proofing and the production of pictorial color copies. In addition, this system can be used for applications ranging from personal use, such as video printing, to professional outputs that involve medical imaging and measurement devices. However, the prints have some disadvantages: poor durability, retransfer of transferred dyes, and early degradation of sections touched by fingers. Various methods to alleviate these problems have been developed for practical use. Some representative examples, such as covering the entire recording sheet with a protective layer that contains UV absorbents and stabilizing the characteristics of the staining agent itself, have been effective. Another disadvantage is the high price of sheets; however, sheets for multiple transfers, which can be used several to ten times, have been developed.

WAX-MELT PRINTERS
In the wax-melt printer, when a sheet coated with a heat-meltable ink is heated by signals applied to the thermal head, the ink is transferred onto a recording sheet, and images are formed. Because the density characteristics of this recording system are binary, the dither method or a similar method is used to achieve gradation. Thus, the resolution of this method tends to be low. The melt-type thermal-transfer system produces solid and clear images; thus, the system is used to prepare bar codes.
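The dither method mentioned above can be sketched with a classic ordered (Bayer) threshold matrix. The 4×4 matrix and 8-bit input range below are conventional choices, not details specified in the article.

```python
# Ordered (Bayer) dithering: a common way to simulate gradation with a
# binary process such as wax-melt transfer.

BAYER_4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def dither(image):
    """Map a 2-D list of 8-bit gray levels to binary dots (1 = print ink)."""
    out = []
    for y, row in enumerate(image):
        out.append([])
        for x, level in enumerate(row):
            # scale the 0..15 matrix entry to the 0..255 input range
            threshold = (BAYER_4[y % 4][x % 4] + 0.5) * 16
            out[-1].append(1 if level > threshold else 0)
    return out

# A mid-gray patch prints roughly half of its dots:
patch = [[128] * 4 for _ in range(4)]
printed = sum(map(sum, dither(patch)))
```

Because each output dot is all-or-nothing, gray levels are traded for spatial resolution, which is why the article notes that the resolution of this method tends to be low.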
Basic Structure

The mechanisms of the two thermal-transfer printers are compared in Fig. 2. The major difference between the thermal-transfer mechanisms of the two systems is that, compared with the wax-melt printer, the dye-sublimation printer requires higher thermal energy.

Wax Melt, Ink Sheet, and Recording Sheet
The ink used in the wax-melt printer is a wax containing pigments whereas the dye-sublimation printer uses a resin that contains a sublimable dye. Ink which coats sheets is produced by dispersing coloring materials such as pigments in a wax medium such as paraffin wax. Pigments are used as coloring materials to provide color tones to recorded materials; pigments whose characteristics are very close to those of printing ink can be used. Vehicles function as carriers for pigments during heating, melting, and transferring by the thermal head. At the same time, they cool and bind the pigments onto the recording sheets. Paraffin wax, carnauba wax, and polyethylene wax are used as vehicles. The wax-melt printer employs more stable dyes and pigments as coloring materials than the dye-sublimation printer for high-quality images. Plain paper can be used as the recording sheet for the wax-melt printer, whereas the dye-sublimation printer requires special recording sheets.
Figure 2. Wax-melt thermal transfer and dye-sublimation thermal transfer.

Recording Features

The image produced by melt transfer has excellent sharpness; the edges of images are the sharpest among the images obtained by all types of recording systems.

THERMAL HEAD

Requirements

The following basic characteristics are required for the thermal head:

1. For high-speed printing, the thermal head must have properties that allow rapid heating and cooling. To meet these requirements, the thermal head must have low heat capacity.
2. For high-resolution printing, high-density resistor patterns must be realized.
3. For extended life, the thermal head must be durable throughout continuous rapid heating and cooling cycles.
4. For reduced power consumption, the heat energy supply must be highly efficient.

Structures and Features

Several types of thermal heads have been developed to meet these requirements. Thermal heads can be classified as (1) semiconductor heads, (2) thick-film heads, and (3) thin-film heads. A semiconductor head has not yet been applied practically because of its slow speed resulting from poor thermal response. A thick-film head is the most suitable for large printers. Its heating element is formed using screen-printing and sintering technology. The advantage of this type of head is that the fabrication process is simple, making it suitable for mass production. The thick-film head is used practically in large (A0 paper size) sheet printers and is frequently used for heat-sensitive recording by facsimile machines. The thin-film head has an excellent thermal response and thus is suitable for high-speed printing. It is frequently used in sublimation thermal transfer and melt transfer. However, the production of large-sized thin-film heads is difficult, and the manufacturing process is complex. Figure 3 shows the structure of thick-film and thin-film heads. In sublimation transfer, the conduction time of the current to the thermal head is 2-20 ms, and in melt transfer it is approximately 1 ms. The difference in duration arises from the difference in recording processes, which is reflected in the recording speed.

Figure 3. Thick-film and thin-film heads.

Temperature Control

For ideal temperature control, when an input voltage signal is applied, the thermal head temperature is held at a preset temperature, and when the input signal is set to zero, the thermal head temperature returns to its original value immediately. This kind of control is not easily realized because of heat hysteresis (Fig. 4); however, several methods that use special arrangements have been proposed.

Figure 4. Transition of the surface temperature of a thermal head.

Concentrated Thermal-Transfer Head

Figure 5 shows the pattern of the heating elements of the thermal head used in the concentrated thermal-transfer system. As shown in Fig. 6, the thermal head in this system has many narrow regions. When a heating element of this type is used, high-temperature sections are created in the high-resistance section, and heat transfer starts at areas centered on these high-temperature sections. As the applied energy is increased, the transferred areas expand around these centers. By incorporating a sharp heat-generating distribution in the narrow regions, area-based gradation, in which the transfer recording area within one pixel of the heating element is altered, has been realized. This system can also be used with melt ink sheets. The thermal head is the major feature of this system.

Figure 5. Heating elements of conventional and concentrated thermal heads.

Figure 6. Concentrated thermal-transfer head.

Ultrasonic Thermal-Transfer Head

In this system, heat is generated by the impact of ultrasonic vibrations, and the ink is melted and transferred. This system produces high-quality images at high speed.

Thermal Head Issues

The issues related to the thermal head are as follows:

1. Issues arising from the temperature increase of the thermal head. These include the operating temperature limit of the thermal head and the deterioration of the heat-resistant layer on the thermal head side of the ink sheet. To cope with these problems, various arrangements related to the heat radiation fin have been incorporated.
2. Issues arising from the thermal conditions of the thermal head. These are related to the problems arising from the constant changes in the thermal head temperature during printing. In particular, measures that counteract transient temperature changes are required. These measures include optimum control by temperature detection using thermistors and temperature prediction using hysteresis data from gradation printing.

Wear Issues

The thermal head maintains contact with the heat-sensitive recording sheet and wears due to this frictional contact. In particular, if the heat-sensitive recording sheet contains hard chemical compounds, wear occurs rapidly. In addition to this type of mechanical wear, chemical wear also occurs. Because the surface of the thermal head is glassy, it is corroded by alkali ions and other substances contained in the heat-sensitive recording sheets. Utmost attention must be paid to the causes of wear in developing thermal heads.
Resolution Progress

Thermal heads of 150 dpi (1980), 200 dpi (1987), 400 dpi (1996), and 600 dpi (1998) have been introduced. A 1200-dpi thermal head may be introduced in the 2000s.
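For reference, the heating-element pitch implied by a given head resolution follows directly from the definition of dots per inch (25.4 mm per inch); this small helper is purely illustrative.

```python
# Heating-element (dot) pitch implied by a head resolution in dpi.

def dot_pitch_um(dpi):
    """Micrometers per dot at the given dots-per-inch resolution."""
    return 25.4e3 / dpi   # 25.4 mm = 25,400 um per inch

# A 600-dpi head spaces its elements about 42 um apart,
# roughly three times finer than the 150-dpi heads of 1980.
pitch = dot_pitch_um(600)
```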
DRIVING METHOD

Line Sequential Method
As shown in Fig. 7, two driving methods are available. In the serial line sequential method, the thermal head is shifted for each color, and the recording sheet is gradually moved forward. In the parallel line sequential method, each color is aligned in parallel, the thermal head is shifted in parallel, and the recording sheet is moved forward.

Area Sequential Method

Figure 8. Area sequential method.

As shown in Fig. 8, transfer sheets that are the same size as the recording sheets are continuously aligned, and transfer is carried out onto the entire area of the sheet. This method has a high processing speed, and currently most printers use it.

Typical Configuration

Figure 9 depicts a typical configuration of a thermal-transfer printer. The system shown in Fig. 9 is the swing type; as the recording sheet completes recording in one color, the sheet returns to its initial position. This type of printer is inexpensive and compact. A drum-winding system and a three-head system also exist. In the former, registration is relatively simple, and return of the recording sheet is not required; this contributes to a reduction in recording time. The three-head system is suitable for high-speed recording.

Figure 9. Typical configuration of a thermal-transfer printer.

ITEMS RELATED TO IMAGE PROCESSING

Reproduction of Density Gradation
The reproduction of color density by various printers can be classified into three methods, as illustrated in Fig. 10. In the first method, the density of each pixel changes. In this system, each pixel can be given a continuous change of color density, and nearly perfect gradation can be reproduced over the entire gradation range. In the second method, an area is covered by a fixed number of dots whose color density is constant and whose dot size changes in relation to the density. In this method, the area of ink-covered dots changes; thus, when an image produced by this method is observed from a distance at which each dot is not observable, due to the resolution limit of the human eye, a coherent density change is seen. This method is called the dot-size variable multivalue area pseudodensity reproduction method. In the third method, the number of dots, whose density and size are constant, is changed within an area; this is called the bilevel area method.

Color Reproduction Range

Figure 11 shows the color reproduction range on the CIE x, y chromaticity diagram. The three primary colors, Y (yellow), M (magenta), and C (cyan), of sublimable dyes have almost the same color reproduction range as that realized by offset color printing; they are suitable for use in full-color printer recording systems.
VARIOUS IMPROVED SYSTEMS

Improvement in Processing Speed
Generally, thermal-transfer color printers are inferior to electrophotographic printers in processing speed. In this context, a one-pass full-color system has been developed. Figure 12 shows the construction of such a system. The one-pass system has four independent recording sections, Y, M, C, and K, and the recording sheet is forwarded to the four sections in order. The disadvantage of this system is that the sheet-forwarding section is complex, and therefore color deviations tend to develop easily. Even if the rotational speed of the drive roller is constant, the speed of the recording sheet varies due to the load and back-tension during forwarding, which results in color deviations; correcting the back-tension alone is insufficient to eliminate them. One countermeasure against color deviation is to detect the speed of the recording sheet directly using a detection roller and to control the drive roller so that the recording sheet speed remains constant.
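The speed-control countermeasure described above is a simple feedback loop: a detection roller measures the actual sheet speed, and the drive roller command is trimmed to hold the target speed. The proportional correction, gain, and speed values below are illustrative assumptions, not figures from the article.

```python
# Sketch of drive-roller speed correction from a detection-roller reading.
# Target speed and gain are assumed values for illustration only.

TARGET_SPEED = 50.0   # mm/s, desired recording-sheet speed (assumed)
KP = 0.8              # proportional gain (assumed)

def drive_command(measured_speed, current_command):
    """Adjust the drive-roller command to hold the sheet speed constant."""
    error = TARGET_SPEED - measured_speed   # positive if the sheet lags
    return current_command + KP * error

# If back-tension slows the sheet to 48 mm/s, the command is raised:
cmd = drive_command(48.0, 50.0)
```

A production controller would also use integral action and the temperature/hysteresis data mentioned elsewhere in the article, but the proportional term alone shows why direct speed detection beats open-loop back-tension correction.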
Improvement of the Durability of Dye-Sublimation Prints
Figure 11. Color reproduction range (CIE x, y).

The durability of images is poor in dye thermal transfer due to the inherent nature of the dyes themselves. For example, discoloration due to light irradiation is a problem. To correct this problem, the following measures are adopted:

• A compound that reacts with a transferred dye and stabilizes the resulting substance is mixed into the recording layer.
Figure 12. One-pass full-color system.

• The recording layer is hardened after image recording.
• An ultraviolet-absorbing material or similar substance is added to the recording layer.
• A protective layer is applied as the surface layer.
Improvement of the Wax-Melt Printer

Sheet-Recycling Printer

Used sheets must be discarded in thermal-transfer recording; this is problematic. In the sheet-recycling system, ink for transfer is pumped from a heated ink tank using a heated roller and is applied uniformly to an ink sheet. The ink on the used sheet is scraped off by the heated roller, returned to the ink tank, and reused.

Thermal Rheography

By introducing the idea of ink-jet recording, the disadvantage of the wax-melt printer, that is, the rate at which ink sheets are consumed, has been reduced. In this system, a small hole is made at the center of the resistor in the thermal head, and solid ink, melted by heating, is transferred onto the recording sheets through the hole.

OTHER PRINTERS BASED ON THERMAL TRANSFER

Electrosensitive Transfer Printer
In the electrosensitive transfer printer, colored materials are transferred using joule heat from electric conduction through a coated layer of high electric resistance, which is placed under the sublimable dye or thermal-transfer ink layer (Fig. 13). The amount of ink transferred in this system can be changed in accordance with the duration of electric conduction, and expression of gradation is possible in each pixel. Currently, this system is not popular because the production cost of its recording materials is high compared with that of other systems.

Light-Sensitive Microcapsule Printer
The capsules for this system are hardened by UV exposure; unexposed capsules are crushed by pressure, the pigments in the capsules are released and transferred onto the sheet, and images are produced. The wall material of the capsules is a polymer of urea and formaldehyde. Acrylic monomers, a polymerization initiator, spectral sensitizers, and leuco dyes are contained in the capsules. Each capsule is 5-8 µm in diameter. The recording sheet contains a developer. After capsule-containing sheets are exposed to UV light, they are superimposed on recording sheets and put through a pressure roller. The exposed and hardened capsules are not crushed, but the unexposed capsules are, and their leuco dyes are transferred onto the recording layer. At this point, the developer and the dyes from the capsules make contact with each other, and colors are produced.

Figure 13. Structure of electrosensitive transfer.

Micro Dry-Process Printer

Basically, the process is the same as that used in the conventional melt thermal-transfer system; however, in micro dry processing, to increase the contact pressure between the sheet surface and the ink and to retain the ink on the sheet surface, ink penetration is suppressed by using a high-viscosity ink. The high-viscosity ink transfer process consists of the following four steps.

1. Ink pressurization. Conventionally, ink whose main component is an ethylene vinyl acetate copolymer has been used as a thermoplastic material that is resistant to high pressurization.
2. Ink heating. To realize high-quality printing, the heat responsiveness of the thermal head and ink has been improved. The thermal head and ink are prepared using thin films, thus decreasing the heat capacity as well as the heat conduction delay.
3. Ink pressurization and cooling. Using a resin-type thermoplastic ink of high viscosity to control the ink characteristics in accordance with temperature changes during pressurization and cooling is important; excellent characteristics are achieved.
4. Base peel-off. A design is adopted in which importance is placed on the conditions under which pressurized ink sections are transferred and nonpressurized ink sections are not.

Dye Transfer Printer Using a Semiconductor Laser as a Heat Source
Image resolution in a system that uses a thermal head as the heat source depends on the level of integration of heating elements in the thermal head. Even if higher resolution is desired, there is a limit to the fabrication of the thermal head. A dye-transfer printer that uses a laser as a heat source is being studied to overcome this limit. The dye sheet used in this printer is basically the same as that used in conventional dye-transfer printers; it consists of a base film and a dye layer, and the recording sheet consists of a base sheet and a dye-receiving layer. In addition, it is necessary to incorporate a layer that converts laser light to heat in the dye sheet.

Figure 14. (a) Surface-absorbing type; (b) dye-donor-layer-absorbing type.

Figure 14 shows two types of printing media used for this system. In contrast to the layout in Fig. 14a, the construction of Fig. 14b includes a layer that contains an infrared-ray absorbent and selectively absorbs the semiconductor laser wavelengths, and part of the laser beam is absorbed in the dye layer. By appropriately setting the parameters related to absorption, the amount of dye transferred can be increased. The thermal-transfer recording system that uses a laser as its heat source is mainly used for color proofing in printing processes. Some systems use melt and sublimation thermal transfer. The characteristics of the melt thermal-transfer system provide sharp dot-matrix images. For sublimable dyes, a problem of color reproducibility arises from the difference between the dye colors and the colors used in printing ink. When using a laser, a relatively high-power laser is required because this is a heat-mode system. Previous research used a gas laser; in recent years, however, high-power semiconductor lasers have been developed, which also contributed to the development of compact optical disc recording systems. There are two types of image creation systems in laser drawing: with respect to the recording surface on a drum, one is a system in which the beam from a fixed laser is scanned using a mirror, and the other is a system in which the laser itself moves while emitting its beam. The latter systems are popular for thermal-transfer recording, which has low recording sensitivity. A resolution of 2540 dpi or higher has currently been realized. Resolution is the most important factor in laser recording systems. Multihead systems in which multiple lasers are used are being studied as a countermeasure for the shortcoming of the semiconductor laser system, namely, the time required for laser recording.

DIRECT THERMAL PRINTER
In the direct thermal printer, coloring agents are applied in advance to the recording sheet itself. When heat is applied locally onto the sheet using a thermal head, physical and chemical coloring forms images. Currently, this system is widely used in simple printers for facsimile machines, word processors, and personal computers. In addition, this system is used in printers for measuring instruments, POS labeling, and medical equipment output systems.
This system has advantages over other systems for the following reasons. It uses special sheets on which a coloring agent is applied; thus excellent coloring and image quality can be obtained. The system is so simple that the product is compact, lightweight, and inexpensive. Reliability is excellent, and maintenance-free products can be realized. The only supplies required are recording sheets, so handling and storage are easy. By altering the temperature of the heating element, the color density of each dot can be changed; thus, high-resolution images that have gradation can be obtained. The color density difference is great, and backgrounds that have a high level of whiteness can be obtained; thus it is possible to realize images that have good contrast. High-speed printing is possible.

The following issues have been raised regarding the system. It uses sheets onto which special coloring agents are applied; thus, compared with systems that use plain paper, the cost of sheets is high. The fixing characteristics of the ink on the heat-sensitive recording sheets are poor. Color change and discoloration are easily caused by heat, light, pressure, temperature, and chemical agents. The storage of paper after printing must be managed carefully; the output from this system is not suitable for long-term storage.

CONCLUSIONS

Printers based on dye-sublimation technology using thermal heads were among the most commercially important printers in the first half of the 1990s, and the system has since been improved further in performance. The resolution of the thermal head has been improved from 400 to 600 dpi. Further effort is being put forth, aimed at high precision and low cost. The problem of image storage properties, which was the crucial issue for dye thermal transfer, has been solved substantially by stabilizing the dye itself through chemical reactions. Research in the area of recording materials continues in the search for better materials.
In the area of heat sources, high-power lasers have made it possible to produce ultrahigh-precision images, with resolutions ranging from 3600 to 4000 dpi, thus contributing to improved printing-plate fabrication by replacing thermal heads.
E

ELECTROENCEPHALOGRAM TOPOGRAPHY (EEG)

WOLFGANG SKRANDIES
Institute of Physiology, Justus-Liebig University, Giessen, Germany

THE FUNDAMENTAL THEORY OF ELECTROENCEPHALOGRAPHY (EEG)

Physiological and Functional Considerations of Spontaneous EEG
Neurons, the nerve cells in the brain, have a membrane potential that may change in response to stimulation and incoming information. Excitation is generated locally as intracellular depolarization, and it can be propagated over longer distances to nuclear structures and neurons in other brain areas via the nerve cells' axons by so-called "action potentials". The human brain consists of billions of neurons that are active spontaneously and also respond to internal and external sensory stimuli. Due to synaptic interconnections, brain cells form a vast, complex anatomical and functional physiological structure that constitutes the central nervous system, where information is processed in parallel distributed networks. At the synapses, chemical transmitters induce membrane potential changes at connected, postsynaptic neurons. In this way, information in the central nervous system is coded and transmitted either in the form of frequency-modulated action potentials of constant amplitude or as analog signals where local membrane potential changes occur gradually as postsynaptic depolarization or hyperpolarization. The cellular basics of neuronal membrane potentials and conductance have been studied by physiologists since the 1950s (1), and meanwhile much is also known about the molecular mechanisms in ionic channels and inside neurons [see overview by (2)]. Such studies employed invasive recordings using microelectrodes in animals or in isolated cell cultures, where activity can be picked up in the vicinity of neurons or by intracellular recordings. Human studies have to rely on noninvasive measurements of mass activity that originates from many neurons simultaneously (the only exception is intracranial recording in patients before tumor removal or surgery for epilepsy). The basis for scalp recordings is the spread of field potentials, which originate in large neuronal populations, through volume conduction.
As described before, nerve cells communicate by changes in their electric membrane potential and by dendritic activation (excitation, inhibition) due to local chemical changes and ionic flow across cell membranes. It is now well established that postsynaptic activation (excitatory postsynaptic potentials, EPSP, or inhibitory postsynaptic potentials, IPSP), not action potentials, constitutes the physiological origin
of scalp-recorded electroencephalographic activity (3,4). Scalp electrodes typically have a diameter of 10 mm, which is very large compared to the size of individual neurons (about 20 µm). Thus, the area of one electrode covers about 250,000 neurons, and due to spatial summation and the propagation of activity from more distant neural generators by volume conduction (5), many more nerve cells will contribute to mass activity recorded from the scalp. To detect brain activity at some distance using large electrodes, many neurons must be activated synchronously, and only activity that originates from ''open'' intracranial electrical fields can be assessed by scalp recordings. In contrast, ''closed'' electrical fields are formed by arrays of neurons arranged so that their electric activity cancels. This constellation is found in many subcortical, network-like structures that are inaccessible to distant mass recordings. The major neuronal sources of electrical activity on the scalp are the pyramidal cells in the cortical layers, which are arranged in parallel, perpendicular to the cortical surface; this surface, however, is folded in intricate ways, so it cannot be assumed that the generators are perpendicular to the outer surface of the brain. Brain activity recorded from the intact scalp has amplitudes only in the range up to about 100–150 µV. Thus, these signals often are contaminated by noise that originates from other body systems such as muscle activity (reflected by the electromyogram, EMG), eye movements (reflected by the electrooculogram, EOG), or the heart (reflected by the electrocardiogram, ECG), which often is of large amplitude. Line power spreading from electric appliances in the environment (50 or 60 Hz) is another source of artifact. For this reason, it is not astonishing that extensive measurements of brain activity from human subjects in ordinary electric surroundings became possible only after the advent of electronic amplifiers that have high common-mode rejection.
In 1929, the German psychiatrist Hans Berger published a paper on the first successful recording of human electrical brain activity. Similar to the earlier known electrocardiogram (ECG), the brain's mass activity was called an electroencephalogram (EEG). In this very first description, Berger (6) already stressed the fact that activity patterns change according to the functional status of the brain: electric brain activity is altered in sleep, anesthesia, hypoxia, and in certain nervous diseases such as epilepsy. These clinical fields became the prime domain for the application of EEG measurement in patients [for details see (7)]. Two broad classes of activation can be distinguished in human brain electrophysiology: (1) spontaneous neural activity reflected by the electroencephalogram (EEG), which constitutes the continuous brain activation; and (2) potential fields elicited by physical, sensory stimuli or by internal processing demands. Such activity is named evoked potential (EP) when elicited by external sensory stimuli, and event-related potential (ERP) when internal, psychological events are studied. Conventionally, EEG
ELECTROENCEPHALOGRAM (EEG) TOPOGRAPHY
measures are used to quantify the global state of the brain system and to study ongoing brain activity during long time epochs (from 30 minutes to several hours), such as during sleep or long-term epilepsy monitoring. The study of evoked activity aims at elucidating different brain mechanisms, and it allows assessing sensory or cognitive processing while the subject is involved in perceptual or cognitive tasks. Spontaneous EEG can be described by so-called ''graphoelements'' that identify pathological activity [e.g., ''spike and wave'' patterns (7)], but it is quantified mainly in the frequency domain. Activity defined by waveforms occurs in different frequency bands between 0.5 and about 50 Hz. These frequency bands were established in EEG research during the past 70 years. A brief, nonexhaustive summary of the individual frequency bands follows. It should be stressed that even though the bands behave independently in different recording conditions, the exact functional definition of bands by factorial analyses depends to some extent on the conditions that are studied. In addition, large interindividual variation precludes generalizations of the functional interpretation of a given frequency band. The lowest frequencies are seen in the delta band that ranges from 0.5–4 Hz, followed by theta (4–7 Hz). In the healthy adult, these low frequencies dominate the EEG mainly during sleep, although theta activity is also observed during processing such as mental arithmetic. In awake subjects, one finds alpha activity (8–13 Hz) in a state of relaxation, and beta (14–30 Hz) occurs during active wakefulness. Note, however, that alpha activity is also observed during the execution of well-learned behavior. In addition, there are high-frequency gamma oscillations (30–50 Hz) that have been recorded in the olfactory bulb of animals and are related to attentive processing of sensory stimuli and to learning processes in cortical areas.
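As a compact reference, the band boundaries summarized above can be collected in a small lookup table. This is only an illustrative sketch; the function name is hypothetical, and, as the text stresses, exact band edges vary between laboratories and recording conditions.

```python
# Conventional EEG frequency bands as summarized in the text (edges in Hz).
# Edges are conventions, not physiological constants.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 7.0),
    "alpha": (8.0, 13.0),
    "beta": (14.0, 30.0),
    "gamma": (30.0, 50.0),
}

def classify_frequency(hz):
    """Return the name of the conventional band containing a frequency, or None."""
    for name, (lo, hi) in BANDS.items():
        if lo <= hz <= hi:
            return name
    return None
```

For example, `classify_frequency(10.0)` falls in the alpha band, while a 2-Hz component is classified as delta.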
A common, general observation is that large synchronized amplitudes occur at low frequencies, whereas high EEG frequencies yield only small signals because the underlying neuronal activity is desynchronized. The basic rhythms of electric brain activity are driven by pacemakers that have been identified in animal experiments in various brain structures such as the cortex, thalamus, hippocampus, and the limbic system (4). From the study of the electroresponsive properties of single neurons in the mammalian brain, it was proposed that oscillations have functional roles such as determining the functional states of the central nervous system (8). A description of the detailed clinical implications of the occurrence of pathological EEG rhythms is beyond the scope of this text, and the reader is referred to the overview by Niedermeyer and Lopes da Silva (7). As a general rule of thumb, in addition to epileptic signs like spikes or ‘‘spike and wave’’ patterns, two situations are generally considered pathophysiological indicators: (1) the dominant local or general occurrence of theta and delta activity in the active, awake adult; and (2) pronounced asymmetries of electric activity between the left and right hemispheres of the brain.
Physiological and Functional Considerations of Evoked Brain Activity

Evoked brain activity is an electrophysiological response to external or endogenous stimuli. Evoked potentials are much smaller in amplitude (between about 0.1 and 15 µV) than the ongoing EEG (whose amplitudes reach up to 150 µV), so averaging procedures must be employed to extract the signal from the background activity (9). A fundamental characteristic of evoked activity is that it constitutes time-locked activation that can be extracted from the spontaneous EEG when stimuli are presented repeatedly. For many years, the clinical use of qualitative analysis of EEG dominated the field, and experienced neurological experts visually examined EEG traces on paper. Only in the 1960s did computer technology and modern analytic techniques allow quantitative analyses of EEG data, which finally led to the extraction of evoked brain activity and to topographic mapping. Measures of high temporal resolution, of the order of milliseconds, are needed to study brain processes. This reflects the fact that brain mechanisms are fast and that individual steps in information processing occur at high temporal frequency, which correlates with rapid changes of the spatiotemporal characteristics of spontaneous and evoked electric brain activity. Similarly, it has been proposed that the binding of stimulus features and the cooperation of distinct neural assemblies are mediated through high-frequency oscillations and coherence of neuronal activation in different parts of the brain (10). Due to its high sensitivity to the functional state of the brain, scalp-recorded electrical brain activity is a useful tool for studying human brain processes that occur in a subsecond range. The very high temporal resolution is an important prerequisite when rapidly changing electric phenomena of the human brain are investigated noninvasively.
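The need for averaging can be illustrated numerically: a time-locked response much smaller than the background EEG emerges after averaging repeated epochs, and the residual background shrinks roughly as the square root of the number of trials. All signal parameters below (sampling rate, component shape, amplitudes) are hypothetical, chosen only to match the orders of magnitude mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 250                       # sampling rate in Hz (hypothetical)
t = np.arange(0, 0.5, 1 / fs)  # one 500-ms epoch

# A ~5-µV time-locked "component" peaking 100 ms after the stimulus.
evoked = 5e-6 * np.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))

# 200 repeated presentations: the same response buried in ongoing
# background activity of much larger amplitude (~30 µV RMS).
n_trials = 200
noise = 30e-6 * rng.standard_normal((n_trials, t.size))
epochs = evoked + noise

# Time-locked averaging enhances the stimulus-related activation.
average = epochs.mean(axis=0)

# The residual background shrinks roughly as 1/sqrt(n_trials),
# i.e., by a factor of about sqrt(200) ~ 14 here.
single_trial_rms = noise[0].std()
residual_rms = noise.mean(axis=0).std()
```

With these numbers, the averaged background is reduced by roughly a factor of fourteen, so the 5-µV component stands out clearly in `average` although it is invisible in any single epoch.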
TOPOGRAPHIC IMAGING TECHNOLOGY AND BASIC IDEAS OF ELECTROENCEPHALOGRAPHIC MAPPING

As is obvious from the description of other imaging techniques in this volume (CT, structural or functional MRI, PET), imaging of anatomical brain structures or of hemodynamic responses to different processing demands is available at high spatial resolution but typically has to rely on relatively long integration times to derive significant signals that reflect changes in metabolic responses. Different from brain imaging methods such as PET and functional MRI, noninvasive electrophysiological measurements of EEG and evoked potential fields (or measurements of the accompanying magnetic fields, MEG; see the corresponding article) possess high temporal resolution, and techniques to quantify electric brain topography are unsurpassed when functional validity is required to characterize human brain processes. In addition, electric measurements are relatively easy and inexpensive to perform and offer the possibility of assessing brain function directly in actual situations without referring to indirect comparisons between experimental states and a neutral baseline condition or between different tasks.
Electrophysiological data are recorded from discrete points on the scalp against some reference point and conventionally have been analyzed as time series of potential differences between pairs of recording points. Multichannel recordings allow assessing the topographical distribution of brain electrical activity. For imaging, waveform patterns are transformed to displays that reflect the electric landscape of brain activity at discrete times or the frequency content of the recorded EEG signals. The results of a topographical transformation are maps that show the scalp distribution of brain activity at discrete times or for selected frequencies of the spontaneous EEG. Such functional imaging (1) possesses the high time resolution needed to study brain processes, (2) allows characterizing sequences of activation patterns, and (3) is very sensitive to state changes and processing demands of the organism. Conventionally, electrophysiological data were displayed and analyzed as many univariate time series. Only the technical possibility of simultaneous acquisition of data in multichannel recordings allowed treating EEG data as potential distributions (11–13). The strength of mapping of brain activity lies not only in the display of brain activation in topographical form; mapping is also a prerequisite for adequate analysis of brain activity patterns. Because EEG and evoked potentials are recorded as potential differences between recording sites, the location of a reference point drastically influences the shape of activity recorded over time. The basic properties of scalp potential maps and adequate topographical analysis avoid the fruitless discussion about an electrically neutral reference point (14,15). In contrast to potential difference waveshapes, the spatial structure or landscape of a momentary map does not change when a different recording reference is employed.
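This reference independence of the momentary landscape is easy to verify numerically: changing the reference adds the same constant to every electrode of a momentary map, which leaves extrema locations and gradients untouched. A minimal sketch with simulated values (electrode count and offset are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

n_electrodes = 30
map_ref1 = rng.standard_normal(n_electrodes)  # one momentary map against some reference

# Re-referencing adds the same constant to every channel of the map.
offset = 0.7                                  # arbitrary reference shift
map_ref2 = map_ref1 + offset

# Landscape features are unaffected by the reference:
assert map_ref2.argmax() == map_ref1.argmax()            # location of the maximum
assert map_ref2.argmin() == map_ref1.argmin()            # location of the minimum
assert np.allclose(np.diff(map_ref2), np.diff(map_ref1)) # gradients between sites

# Only the reference-dependent absolute values change:
assert not np.allclose(map_ref2, map_ref1)
```

The analogy in the text holds exactly: shifting ''sea level'' moves every potential value by the same amount, so the relief of the map is unchanged.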
The topographical features remain identical when the electric landscape is viewed from different points, similar to the constant relief of a geographical map where sea level is arbitrarily defined as the zero level: the locations of maxima and minima, as well as the potential gradients in a given map, are independent of the chosen reference that defines zero. To analyze the topographical aspects of electroencephalographic activity, it is important to remember that we are dealing with electric fields that originate in brain structures and whose physical characteristics vary with recording time and space: the positions of the electrodes on the scalp determine the pattern of activity recorded, and multichannel EEG and evoked potential data enable topographical analysis of the electric fields that are reconstructed from many spatial sampling points. Today, from 20 to 64 channels are typically used. From a neurophysiological point of view, components (or subsets) of brain activity are generated by the activation of neural assemblies located in circumscribed brain regions that have certain geometric configurations and spatiotemporal activity patterns. Landscapes of electric brain potentials may give much more information than conventional time series analysis, which stresses only restricted aspects of the available electric data (12,13,16). Examining brain electric activity as a series of maps of the momentary potential distributions shows that these
''landscapes'' change noncontinuously over time. Brief epochs of quasi-stable landscapes are concatenated by rapid changes. Different potential distributions on the scalp must have been generated by different neural sources in the brain. It is reasonable to assume that different active neural assemblies incorporate different brain functions, so a physiologically meaningful data reduction is parsing the map series into epochs that have quasi-stable potential landscapes (''microstates'') whose functional significance can be determined experimentally. In spontaneous EEG, polarity is disregarded; in EP and ERP work, polarity is taken into account (12,14,17). As in most neurophysiological experiments on healthy subjects or on neurological or psychiatric patients, topographical analysis of EEG and evoked potentials is used to detect covariations between experimental conditions manipulated by the investigator and features of the recorded scalp potential fields. Evoked potential (or event-related potential) studies aim to identify subsets or so-called components of electric brain activity that are defined in terms of latency with respect to some external or internal event and in terms of topographical scalp distribution patterns. Measures derived from such data are used as unambiguous descriptors of electric brain activity, and they have been employed successfully to study visual information processing in humans (13,15). Regardless of whether the intracranial generator populations can be localized exactly, the interpretation of scalp potential data combined with knowledge of the anatomy and physiology of the human central nervous system allows drawing useful physiological interpretations (see also a later section). Scalp topography is a means of characterizing electric brain activity objectively in terms of frequency content, neural response strength, processing times, and scalp location.
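Parsing a map series into quasi-stable epochs can be sketched with a simple landscape-dissimilarity criterion. The sketch below is illustrative only: the data are simulated, and the dissimilarity measure (the spatial standard deviation of the difference of strength-normalized maps) and the threshold are assumptions, not the published segmentation procedure.

```python
import numpy as np

def spatial_std(v):
    """Spatial standard deviation of one momentary map."""
    v = v - v.mean()
    return np.sqrt((v ** 2).mean())

def dissimilarity(a, b):
    """Landscape dissimilarity of two maps, each normalized to unit strength.
    0 means identical landscapes; 2 means polarity-inverted landscapes."""
    a = (a - a.mean()) / spatial_std(a)
    b = (b - b.mean()) / spatial_std(b)
    return spatial_std(a - b)

def segment(maps, threshold=0.3):
    """Start a new quasi-stable epoch ('microstate') wherever the landscape
    changes abruptly between successive maps. The threshold is arbitrary."""
    boundaries = [0]
    for t in range(1, len(maps)):
        if dissimilarity(maps[t - 1], maps[t]) > threshold:
            boundaries.append(t)
    return boundaries

# Toy series: five maps sharing one landscape, then five with another one.
rng = np.random.default_rng(2)
a, b = rng.standard_normal(30), rng.standard_normal(30)
series = ([a + 0.01 * rng.standard_normal(30) for _ in range(5)]
          + [b + 0.01 * rng.standard_normal(30) for _ in range(5)])
print(segment(series))  # → [0, 5]: one boundary where the landscape switches
```

Because the maps are normalized before comparison, the criterion responds only to changes of the landscape, not of its overall strength, matching the microstate idea described in the text.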
Comparison of scalp potential fields obtained in different experimental conditions (e.g., using different physical stimulus parameters, in different subjective or psychological states, or for normal vs. pathological neurophysiological traits) may be used to test hypotheses on the identity or nonidentity of the neuronal populations activated in these conditions. Identical scalp potential fields may or may not be generated by identical neuronal populations, whereas nonidentical potential fields must be caused by different intracranial generator mechanisms. Thus, we can study systematic variations of the electrical brain activity noninvasively, and we are interested in variations of scalp potential fields caused by manipulating independent experimental parameters.

TOPOGRAPHIC MAPPING OF SPONTANEOUS ELECTROENCEPHALOGRAPHIC BRAIN ACTIVITY — MAPS OF FREQUENCY BAND POWER

The spontaneous EEG is commonly analyzed in terms of the frequency content of the recorded signals, which can be quantified by frequency analysis (fast Fourier transform, FFT). This allows numerically determining the amount of activity in each of the frequency bands described earlier. In addition, the spectra of electric brain activity summarize the frequency content of brain signals during longer epochs. Conventionally, artifact-free EEG is used with epoch lengths of 2–15 seconds, and the results
are grouped in the classical frequency bands from low (delta, theta) to middle (alpha) and high frequencies (beta, gamma) between 0.5 and about 50 Hz. Intracerebral recordings may reveal even higher frequencies, but due to the filtering characteristics of the cerebrospinal fluid, the skull, and the scalp, the scalp-recorded signals contain only little very-high-frequency activity. To map the distributions of the classical EEG frequency bands, they are displayed topographically as maps of spectral power or amplitude. This allows detecting regional differences in activation patterns and also analyzing the influence of different brain states and processing demands on the spatial scalp activity distribution. Clinically relevant, pathological asymmetries are evident from topographic maps. An example of frequency distribution maps is given in Fig. 1, which illustrates the topography of the spontaneous EEG of a healthy volunteer. The time series of 90 epochs of 1024 ms duration were Fourier-transformed at each of 30 electrodes, and the spectra were averaged. Then, the data were summed in the conventional frequency bands, and the results were plotted as scalp maps. For recording, 30 electrodes were distributed across the head in a regular array (see the inset in Fig. 1), and the resulting potential distribution was reconstructed by spline interpolation. Note that this interpolation does not change the potential values measured at the recording electrodes; the area between electrodes merely attains a smooth appearance. Potential maps are used in electrophysiology to visualize recorded data. All quantitative steps and statistical data analysis rely on the original signals measured at each of the recording electrodes. Various parameters can be extracted for characterization and topographical analysis of EEG and evoked potential data, some of which are described in more detail here. It is obvious that the topographical scalp distributions illustrated in Fig.
1 are dissimilar for the different basic EEG frequencies. Although we see symmetrical activation patterns in all frequency bands across both hemispheres, the detailed topography differs. Note that alpha activity is pronounced across the occipital areas (i.e., the back of the head), whereas lower frequencies such as theta and delta activity show a more frontal distribution.
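The band-mapping procedure described above (FFT per electrode, summation within bands, one value per electrode per band) can be sketched as follows. The sampling rate, band edges, and the toy ''alpha'' topography are assumptions for illustration; plotting each band's vector over the electrode positions would give a Fig. 1-style map.

```python
import numpy as np

rng = np.random.default_rng(3)

fs = 256                               # sampling rate in Hz (hypothetical)
t = np.arange(1024) / fs               # one 4-s artifact-free epoch
n_electrodes = 30

# Toy data: a 10-Hz "alpha" rhythm that grows toward the last ("occipital")
# channels, buried in broadband noise of equal strength on all channels.
alpha_weight = np.linspace(0.2, 2.0, n_electrodes)
eeg = (alpha_weight[:, None] * np.sin(2 * np.pi * 10 * t)
       + 0.2 * rng.standard_normal((n_electrodes, t.size)))

# Amplitude spectrum per electrode, then summed within each band.
freqs = np.fft.rfftfreq(t.size, 1 / fs)
amplitude = np.abs(np.fft.rfft(eeg, axis=1)) * 2 / t.size

bands = {"delta": (0.5, 4), "theta": (4, 7), "alpha": (8, 13), "beta": (14, 30)}
band_maps = {name: amplitude[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
             for name, (lo, hi) in bands.items()}
```

In this toy example, the ''alpha map'' `band_maps["alpha"]` is largest at the channels given the strongest 10-Hz signal, mimicking the occipital alpha dominance seen in Fig. 1.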
TOPOGRAPHIC MAPPING OF EVOKED BRAIN ACTIVITY — TIME SEQUENCE MAPS

Scalp Distribution of Evoked Potential Fields

Following sensory stimulation, brain activity occurs across the primary and secondary sensory areas of the cortex. The brain has specialized regions for analyzing and processing different stimulus modalities that are represented in distinct cortical areas. In addition, sensory information is routed to the central nervous system in parallel pathways that analyze distinct sensory qualities (2). Knowledge of the anatomy and the pathways of the various sensory systems allows studying information processing in different brain systems. Evoked potentials can be elicited in all sensory modalities (9); in response to visual stimulation, the brain generates an electric field that originates in the visual cortex, which is located in the posterior, occipital parts of the brain. Because of their small amplitude, evoked potentials are averaged time-locked to the occurrence of the external sensory stimulus. This enhances stimulus-related activation and thereby removes the contribution of the spontaneous background EEG. Distinct components occur at latencies that directly reflect processing times and steps in information processing. The evoked field changes in strength and topography over time. A series of potential distributions is given in Fig. 2 that depicts the spatiotemporal activation pattern within the recording array at various times after stimulation. The stimulus was a grating pattern flashed in the right visual field at time zero. The maps in Fig. 2 are potential maps that were reconstructed from data recorded from the electrode array shown in the inset. The 30 electrodes are distributed as a regular grid across the brain regions under study. Only potential differences within the scalp field are of interest, so all data are referred to the computed average reference.
This procedure results in a spatial high-pass filter that eliminates the dc offset potential introduced by the necessary choice of some point as a recording reference.

Figure 1. Spatial distribution of frequency spectra of spontaneous EEG. Time series of 1.024 seconds length at each of the 30 electrodes were frequency analyzed by FFT, and 90 amplitude spectra were averaged. The results were grouped in the conventional frequency bands (delta: 0.5–4 Hz; theta: 4–7 Hz; alpha: 8–13 Hz; for beta activity the so-called ''lower β'' spectral values between 14 and 20 Hz were used) and are plotted as potential maps. The subject was a healthy awake volunteer in a quiet room. Head as seen from above, nose up, left ear on the left, right ear on the right side. For recording, 30 electrodes were distributed across the head in a regular array (see inset). Lines are in steps of 0.25 µV; blue corresponds to low activity, and red corresponds to high amplitudes. See color insert.

In Fig. 2, the evoked brain activation is displayed between 60 and 290 ms after the stimulus occurred. It
Figure 2. Sequence of averaged visual evoked potential maps recorded from the scalp of a healthy young adult who had normal vision. Activity was evoked by flashing a vertical grating pattern in the right visual half field. The stimulus subtended a square of 6° × 6°; the spatial frequency was 4.5 cycles/degree. Maps are illustrated between 60 and 290 ms, at steps of 10 ms after the stimulus occurred, with respect to the average reference; 1-µV steps between equipotential lines. The electrode array covers the scalp from the inion (most posterior) to an electrode 5% anterior to Fz [according to the International 10/20 Electrode System (27)]; see also the head schema.
is obvious that the topography and the strength of the electric field change as a function of time. The field relief is pronounced between 80 and 90 ms, 120 and 140 ms, and 200 and 220 ms: there are high voltage peaks and troughs, associated with steep potential gradients. Obviously, these instances indicate strong synchronous activation of neurons in the visual cortex. Due to the regular retinotopic projection of the visual field onto the mammalian visual cortex, stimuli presented in lateral half fields are followed by lateralized brain activation. The left visual field is represented in the right visual cortex, and the right visual field in the left hemisphere. An asymmetrical potential distribution is evident from the maps illustrated in Fig. 2. Brain activity elicited by internal or external stimuli and events may be decomposed into so-called components that index steps of information processing that occur at different latencies after the stimulus. When the visual stimulus is in the right visual field, there is a positive peak over the left occipital cortex at around 80 ms, whereas a similar peak occurs at 140 ms across the right brain area. When the same visual target is presented in the left half field, the topographical pattern of activity shows a reversed scalp distribution pattern (Fig. 3). The basic features of the fields are similar to those evoked by right visual stimuli, but the lateralization is changed toward
Figure 3. Sequence of averaged visual evoked potential maps elicited by a grating pattern in the left visual half field. Maps are illustrated between 60 and 290 ms at steps of 10 ms after the stimulus occurred. Same stimulation and recording parameters as before; for details, refer to the legend of Fig. 2.
the opposite hemisphere. This is evident when the map series in Figs. 2 and 3 are compared. More detailed analysis is needed to interpret the topographic patterns. It is important to remember that the absolute locations of the potential maxima or minima in the field do not necessarily reflect the location of the underlying generators (this fact has led to confusion in the EEG literature, and for visual evoked activity, this phenomenon became known as paradoxical lateralization). Rather, the location of the steepest potential gradients is a more adequate parameter that indicates the intracranial source locations. This point will be discussed in more detail later.

Quantitative Analysis of Brain Activity Images and Derived Measures

Mapping electric brain activity in itself does not constitute data analysis; it is used mainly to visualize complex multichannel time series data. To compare activity patterns by conventional statistical methods, the data must be reduced to quantitative descriptors of potential fields. In the following, derived measures used for topographical EEG and evoked potential analysis are presented. Scalp-recorded fields reflect the synchronous activation of many intracranial neurons, and it has been proposed that steps in information processing are reflected by the occurrence of strong and pronounced potential fields (13,14). During these epochs, the electric landscape of spontaneous EEG and of evoked activity is topographically stable [see Topographic Imaging and (12,14,17)]. In the analysis of evoked brain activity, one of the main goals is identifying so-called components. As evident from the maps shown in Figs. 2 and 3, there are potential field distributions that
have very little activity (few field lines, shallow gradients at 60 ms, around 100 ms, or after 260 ms), whereas at other latencies, maps display high peaks and deep troughs that have large potential gradients (around 80, 140, or 200 ms in Figs. 2 and 3). It appears reasonable to define component latency as the time of maximal activity in the electric field, which reflects synchronous activation of a maximal number of intracranial neuronal elements. To quantify the amount of activity in a given scalp potential field, we have proposed a measure of ''global field power'' (GFP) that is computed as the mean of all possible potential differences in the field and corresponds to the standard deviation of all recording electrodes with respect to the average reference (14). Scalp potential fields that have steep gradients and pronounced peaks and troughs result in high global field power; global field power is low in electric fields that have only shallow gradients and a flat appearance. Thus, the occurrence of a maximum in a plot of global field power over time determines component latency. In a second step, the features of the scalp potential field are analyzed at these component latencies. Derived measures such as the locations of potential maxima and minima and the steepness and orientation of gradients in the field are, by definition, independent of the reference electrode, and they describe the electric brain activity adequately. Global field power is computed as the mean potential deviation among all electrodes in the recording array at each time. Using equidistant electrodes on the scalp, the potentials e_i, i = 1, ..., n, measured against a common reference yield the voltages U_i = e_i − e_reference. From this potential distribution, a reference-independent measure of GFP is computed as the mean of all potential differences within the field:

GFP = \sqrt{\frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}(U_i - U_j)^2}   (1)

Thus, GFP corresponds to the root-mean-square amplitude deviation among all electrodes in a given potential field. Note that this measure is not influenced by the reference electrode, and it allows for a reference-independent treatment of EEG and evoked potential data. Similarly, GFP may be computed from average reference data:

GFP = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(U_i - \frac{1}{n}\sum_{j=1}^{n}U_j\right)^2}   (2)

This measure is mathematically equivalent to Eq. (1). In summary, GFP reflects the spatial standard deviation within each map. Potential fields that have high peaks and troughs and steep gradients are associated with high GFP values, and when fields are flat, GFP is small. The potential field strength determined by GFP results in a single number at each latency point, and its value is commonly plotted as a function of time. This descriptor shows how field strength varies over time, and its maximum value in a predetermined time window can be used to determine component latency. Note that GFP computation considers all recording electrodes equally and that, at each time, it results in one extracted value that is independent of the reference electrode. High global field power also coincides with periods of stable potential field configurations where the major spatial characteristics of the fields remain unchanged. Figure 4 illustrates the GFP of the topographical data of Figs. 2 and 3 as a function of time. In both conditions, stimulation of the right or the left visual field, three components occur between about 80 and 210 ms. The brain's response to the grating pattern resulted in an electric field that is maximally accentuated at these times. This quantification of field strength defines component latency by considering all topographically recorded data, and the procedure is thus independent of the choice of a reference electrode. It is evident that visual stimuli in the right and the left visual field yield brain activation that has comparable temporal characteristics.

Figure 4. Global field power (GFP) between stimulus occurrence and 750 ms after stimulation as a function of time, computed for the topographical data shown in Figs. 2 and 3. Most pronounced activity is seen at latencies below 250 ms. (a) Stimulus in the right; (b) stimulus in the left visual half field. In both conditions, three components occur with latencies of 84, 142, and 210 ms for right stimuli, and 78, 144, and 206 ms for left stimuli. The corresponding scalp field distributions of the components are indicated by the maps at component latency. Blue: relative negative polarity; red: positive polarity with respect to the average reference; 1-µV steps between equipotential lines. See color insert.

The small latency differences seen are not statistically significant. On the other hand, there are large, highly significant differences in the topography of the components. The maps illustrated in Fig. 4a and b that are elicited around 80 ms and around 140 ms after stimulation show similar characteristics; however, the pattern of lateralization is mirror-image symmetrical: for right visual half-field stimuli, a positive extremum occurs at 80 ms across the left occipital areas; for stimuli in the left visual field, the positive component is seen across the right occipital cortex. Similarly, the 140-ms component is systematically differently lateralized in the processing of left and right visual stimuli (compare Fig. 4a and b), whereas at 206 and 210 ms latency, the asymmetries are much less pronounced. There is a large literature for the visual system illustrating how evoked potential measures have been used to elucidate processing mechanisms (9). The time course of information processing, as well as the structures in the visual cortex activated, depends on the location of the stimulus in the visual field (13), and it has also been demonstrated that the temporal characteristics of processing of contrast or stereoscopic depth perception are comparable. Such data on neural processing time in healthy adult subjects or in patients are accessible only with topographic analysis of electric brain activity. Using further topographical analysis, the latency of the components, the field strength, or the complete potential fields at component latency may be compared directly between experimental conditions or between subject and patient populations, resulting in significance probability maps. All derived measures can be submitted to statistical analysis. Component latency may be equated with neural processing time, and field strength is an index of the amount of synchronous activation or the spatial extent of a neuronal population engaged in stimulus processing.
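The two formulations of GFP, the root-mean-square of all pairwise differences and the spatial standard deviation around the average reference, can be checked numerically, and the use of the GFP time course for component latency can be sketched in a few lines. The simulated maps below are arbitrary; only the two GFP formulas themselves follow the text.

```python
import numpy as np

def gfp_pairwise(u):
    """GFP as the RMS of all pairwise potential differences of one map."""
    diffs = u[:, None] - u[None, :]
    return np.sqrt((diffs ** 2).sum() / (2.0 * u.size ** 2))

def gfp_spatial_std(u):
    """GFP as the spatial standard deviation around the average reference."""
    return np.sqrt(((u - u.mean()) ** 2).mean())

rng = np.random.default_rng(0)
n_times, n_electrodes = 250, 30
maps = rng.standard_normal((n_times, n_electrodes))   # one map per time point
maps[100] *= 8.0          # a pronounced field at "time point" 100

# Both formulations agree for every momentary map ...
for u in maps[:5]:
    assert np.isclose(gfp_pairwise(u), gfp_spatial_std(u))

# ... and are unaffected by the choice of reference (adding a constant):
assert np.isclose(gfp_pairwise(maps[0] + 12.3), gfp_pairwise(maps[0]))

# Component latency = time of maximal field strength:
gfp_curve = np.array([gfp_spatial_std(u) for u in maps])
print(gfp_curve.argmax())  # → 100
```

The GFP curve plays the role of the traces in Fig. 4: one reference-independent number per time point, whose maxima within predetermined windows define the component latencies.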
Topographical descriptors such as the location of potential maxima or minima, the centers of gravity, or the location and orientation of potential gradients give information on the underlying neuronal assemblies (see later). Such measures can be used to quantify the topography of potential fields, and they constitute a physiologically meaningful data reduction. As evident from Fig. 4, the visual stimulus that occurred in the right or in the left visual half field elicits cortical activity around 80 and 140 ms that has different strength. As the topographical distribution of the three components suggests, different neuronal assemblies in the human visual cortex are involved in different subsequent processing steps. Inspection of the maps reveals, however, only a very loose relationship between the location of the potential maxima and the intracerebral neuronal assemblies that are activated. From the anatomical wiring of the mammalian visual pathways, we know that information that originates in the left half field is routed to the right occipital cortex, and stimuli in the right visual field are processed by the left hemisphere. The scalp distribution of components around 140 ms is paradoxical. This will be discussed in more detail later. Of course, nonidentical scalp potential fields reflect the activation of nonidentical intracranial neurons. Due to the ‘‘inverse’’ problem, already stated by von Helmholtz (18),
there is no unique solution to the question of source localization if there is more than one active generator. This holds true in general, for the EEG as well as for other electrophysiological signals such as the ECG. The following section will discuss the basic ideas of source localization and will introduce an approach that may yield physiologically meaningful solutions.

RELATIONSHIP OF SCALP-RECORDED ACTIVITY TO INTRACRANIAL BRAIN PROCESSES

Analysis of Intracranial Processes — Model Sources and Neuroanatomical Structures

One aim of electrophysiological recordings of brain activity on the scalp is characterizing the underlying sources in the central nervous system. Information is processed in circumscribed brain areas, and spontaneous activation patterns originate from specific structures in the central nervous system. Thus, it appears a natural consequence to try to explain the topography of scalp distribution patterns in terms of the anatomical localization of neuronal generators. Arriving at valid interpretations of scalp-recorded data is no trivial task: the so-called ‘‘inverse’’ problem, which cannot be uniquely solved, constitutes a fundamental and severe complication. Any given surface distribution of electrical activity can be explained by an endless variety of intracranial neural source distributions that produce an identical surface map. Thus, it has long been known that there is no unique numerical solution when model sources are determined (18). Over the years, a number of different approaches to determining physiologically and anatomically meaningful source locations mathematically have been proposed; among them is the fit of model dipoles located within a spherical model head or a realistic head-shape model [review by (19)]. To solve the inverse problem mathematically, researchers have to make assumptions that are based on anatomical and physiological knowledge of the human brain.
This knowledge can also be used to limit the number of possible intracranial source distributions for multiple-dipole solutions (20,21). Information is processed in parallel distributed networks, and long-range cooperation between different structures constitutes a basic feature of brain mechanisms (10). Due to this fact, neuronal activation is distributed over large areas of the central nervous system, and its extension is unknown. Thus, the approach of single-dipole solutions, or the fit of a small number of dipoles to the data, is not appropriate when unknown source distributions are to be detected. More realistic views of the underlying mechanisms are embodied in methods that aim to determine the most likely intracranial sources, which may be distributed through an extended three-dimensional tissue volume. Low-resolution electromagnetic tomography (LORETA) rests on neurophysiological assumptions such as the smooth intracortical distribution of activity and the activation of dendrites and synapses located in the gray matter of the cortex. The details of the method as illustrated here can be found in Pascual-Marqui et al. (22). Here, we give only a brief summary of the logic of the analysis: low-resolution brain electromagnetic tomography determines
the three-dimensional distribution of generator activity in the cortex. Different from dipole models, it is not assumed that there are a limited number of dipolar point sources or a distribution on a known surface. LORETA directly computes the current distribution throughout the entire brain volume. To arrive at a unique solution for the three-dimensional distribution among the infinite set of possible solutions, it is assumed that adjacent neurons are synchronously activated and that neighboring neurons display only gradually changing orientation. Such an assumption is consistent with known electrophysiological data on neuronal activation (8). Applying this basic constraint yields the smoothest of all possible three-dimensional current distributions, and the result is a true tomography that, however, has relatively low spatial resolution; even a point source appears blurred. The computation yields a strength value for each of a great many voxels in the cortex, and no constraints are placed on the number of model sources. The method thus solves the inverse problem without a priori knowledge of the number of sources, applying instead the restriction of maximal smoothness of the solution, based on maximally similar activity in neighboring voxels. The result is the current density at each voxel as the linear, weighted sum of the scalp electric potentials. The relation to brain anatomy is established by using a three-shell spherical head model matched to the atlas of the human brain (23) that is available as a digitized MRI from the Brain Imaging Centre, Montreal Neurologic Institute. The outlines of brain structures are based on digitized structural probability MR images derived from normal brains. Registration between spherical and realistic head geometry uses the EEG electrode coordinates reported by Towle et al. (24), and in the current implementation, the solution space is restricted to the gray matter of the cortical and hippocampal regions.
A total of 2394 voxels at 7-mm spatial resolution is produced under these neuroanatomical constraints (21).
The final results are not single locations, as obtained from model dipole sources, but reconstructed tomographical slices through the living human brain. In this respect, there is some similarity to brain imaging methods such as CT, PET, and MRI; here, however, we are dealing with the intracranial distribution of extended potential sources that best explain the scalp-recorded EEG or evoked potential data.

Intracranial Correlates of Spontaneous EEG

Spontaneous EEG quantified by frequency analysis can be displayed as topographical maps that illustrate the scalp distribution of conventional frequency bands, as in Fig. 1. The model sources of alpha activity reconstructed by low-resolution electromagnetic tomography are shown in Fig. 5. The source location data were computed for the cortical areas of the Talairach atlas and are displayed as horizontal (axial), sagittal, and frontal (coronal) views. Activity is shown as the estimated current density strength in the different regions of the central nervous system. Dark red areas indicate stronger neural activation; light colors correspond to low activity. It is obvious that major neuronal sources are found mainly in the posterior, occipital areas of the cortex. This is to be expected based on the vast literature on spontaneous EEG in healthy adults (7). The images of Fig. 5 are selected cuts (tomograms) made through the voxel of maximal cortical activity. Other views of the brain anatomy may be used to visualize the three-dimensional distribution of activity further. This is presented in Fig. 6, which shows a series of horizontal cuts through the brain. There are 17 tomograms at different levels of the head that illustrate where alpha activity is located in the cortex. As in Fig. 5, major neuronal activation can be seen in occipital regions, and slight asymmetries become evident at different depths.
These data further support the interpretation that large areas of occipital cortex are involved in producing the resting EEG rhythm of a healthy adult subject.
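The band maps behind Figs. 1, 5, and 6 start from a simple computation: Fourier transform each electrode's signal and average the spectral power over the band of interest. A minimal sketch on synthetic data follows; the sampling rate, band edges, and two-channel layout are assumptions for illustration only.

```python
import numpy as np

def band_power_map(eeg, fs, f_lo=8.0, f_hi=12.0):
    """Mean spectral power per electrode within [f_lo, f_hi] Hz.

    eeg: (n_electrodes, n_samples); fs: sampling rate in Hz.
    The returned vector is the data behind a band-power scalp map,
    e.g., an alpha map for f_lo=8, f_hi=12.
    """
    spec = np.fft.rfft(eeg, axis=1)
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return (np.abs(spec[:, band]) ** 2).mean(axis=1)

# Toy data: electrode 0 carries a 10-Hz (alpha) rhythm, electrode 1 noise only
fs, n = 250, 2500
t = np.arange(n) / fs
rng = np.random.default_rng(2)
eeg = rng.normal(scale=0.1, size=(2, n))
eeg[0] += np.sin(2 * np.pi * 10.0 * t)
p = band_power_map(eeg, fs)
print(p[0] > 10 * p[1])   # the alpha-carrying electrode dominates the map
```

Repeating the computation with other band edges (e.g., 4 to 8 Hz) yields the corresponding theta map discussed next.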
Figure 5. Low-resolution electromagnetic tomography (LORETA) images of alpha-band activity of the data illustrated in Fig. 1. The results are local current densities that were computed for the cortical areas of the Talairach probability atlas (digitized structural probability MR images). From left to right, the three images show horizontal, sagittal, and frontal views of the brain. Red indicates strong activity; light colors indicate low activity. The cuts illustrate intracerebral locations of maximal activation. See color insert.
Figure 6. Low-resolution electromagnetic tomography (LORETA) images of alpha-band activity shown as a series of horizontal (axial) cuts through the brain. There are 17 tomograms computed for different levels. Same data as in Fig. 5. See color insert.
When lower frequencies of spontaneous EEG are analyzed, the pattern of intracerebral activation changes. Fig. 7 illustrates the source distribution of theta activity recorded from the same subject. A redistribution of activity to different brain areas is evident: mainly regions in the frontal lobe and in the cingulate cortex are active when the theta rhythm is mapped. This is what is suggested by the surface distribution illustrated in Fig. 1. In addition, it has been shown that scalp-recorded frontal midline theta activity (FMT) during mental calculation, exercise, and sleep occurs in healthy human subjects as well as in patients suffering from brain tumors or chemical intoxication. Such findings have been related to brain functions located in the frontal lobe of humans (25). The strong activation in frontal areas that is also seen in the series of horizontal tomograms is in line with such reports. The source
localization data in Fig. 8 further illustrate the occurrence of strong electrical activity in frontal brain areas in most of the horizontal (axial) slices through the brain.

Intracranial Correlates of Evoked Brain Activity

As reviewed earlier, some evoked components occur at unexpected locations in the scalp field distributions. Fig. 4 illustrated the paradoxical lateralization of visual evoked potentials. When intracerebral sources are computed from the recorded data, this observation may be validated or corrected. Fig. 9 shows the scalp field distribution of a component elicited at a latency of 144 ms by stimulation of the left visual field. In the surface map, the activation is lateralized across the left hemisphere, where maximal positivity can be identified at occipital areas. This does not agree with what would be expected based on knowledge of the human visual pathways, where
Figure 7. Low-resolution electromagnetic tomography (LORETA) images of theta-band activity of the data illustrated in Fig. 1. The results are local current densities that were computed for the cortical areas of the Talairach probability atlas. From left to right, the three images show horizontal, sagittal, and frontal views. Red indicates strong activity, light colors indicate low activity. The cuts illustrate intracerebral locations of the voxel at maximal activation. Details as in the legend of Fig. 5. See color insert.
Figure 8. Low-resolution electromagnetic tomography (LORETA) images of theta-band activity shown as a series of horizontal slices through the brain. There are 17 tomograms computed for different levels. Same data as in Fig. 7. See color insert.
Figure 9. Scalp distribution map, electrode scheme, and low-resolution electromagnetic tomography (LORETA) images of a component elicited at 144 ms by a stimulus in the left visual field. In the potential map, blue indicates relative negative and red positive polarity with respect to the average reference; there are 1-µV steps between equipotential lines. The three LORETA images illustrate horizontal, sagittal, and frontal (coronal) views. Red indicates strong activity; light colors indicate low activity. Note that the intracerebral location of activity corresponds to what is expected from the anatomy of the visual pathway. See color insert.
information is routed to cortical areas opposite the visual half field that is stimulated. When the most likely source configuration is computed by LORETA, a more clear-cut result arises: major intracranial activation is seen in the occipital cortex of the right hemisphere of the brain. This is evident from the tomograms through the areas of maximal intracranial activity illustrated in Fig. 9, where activation in the sagittal plane is found in the posterior, occipital brain areas. From the frontal (coronal) and the horizontal (axial) sections, it is clear that mainly neurons in the right hemisphere respond to the stimulus. The results shown in Fig. 10 confirm this interpretation: the sequence of horizontal slices through the brain displays lateralized activity restricted mainly to the right hemisphere. Thus, the three-dimensional source localization confirms that neurons in the cortical areas opposite the stimulated visual half field respond to the grating pattern presented. Such data clearly demonstrate that a direct topographical interpretation of scalp field distributions must not rely on the location of extreme values, which may be misleading. More meaningful information is carried by the location and orientation of the potential gradients of the scalp field patterns.

CLINICAL AND PRACTICAL IMAGING APPLICATIONS

As evident from the data and results presented in the preceding sections, mapping brain electrical activity constitutes a means for (1) visualizing brain electric field
distributions on the scalp, (2) adequate quantitative and statistical analysis of multichannel electrophysiological data, and (3) computational determination of possible underlying neuronal populations that are spontaneously active or are activated by sensory stimulation or psychological events. Thus, electric brain activity can be characterized in terms of latency (i.e., processing times), synchronous involvement and extent of neuronal populations (i.e., field strength), and topographical distribution of frequency content or potential components. Practical applications are twofold: studies of functional states of the human brain, information processing, and motor planning in healthy subjects, and clinical questions on the intactness and functionality of the central nervous system of patients suspected of having nervous or psychiatric disease. Such experimental investigations are part of contemporary neurophysiological questions of the way global states affect brain functions such as processing of sensory or psychological information, motor planning and execution, and internal states related to emotion and cognition. In normal subjects as well as in patients, such processes can be studied and characterized by electric brain activity patterns. For clinical purposes, the question of deviation from normal is relevant. Such deviations may often not be obvious in visual inspection of EEG or evoked potential traces but are detectable in quantitative, numerical evaluation. The mapping of the electric scalp distribution patterns of healthy volunteers allows
Figure 10. The source location results of Fig. 9 displayed as a series of 17 slices that display horizontal tomograms. Note that activity is restricted to the right occipital areas throughout the brain. See color insert.
establishing normative data. It has been shown that neurological impairment and psychiatric disorders are characterized by distinct profiles of abnormal brain electric fields. Such significant deviations can be statistically validated and used for clinical diagnosis and for evaluating treatment efficacy (26). Brain topography also allows drawing conclusions about the possible localization of pathological activity, such as spikes or ‘‘spike and wave’’ patterns in epilepsy, or the location of abnormal activity caused by structural damage such as lesions or tumors. In general, sensory evoked brain activity is recorded in clinical settings to test the intactness of the afferent pathways and central processing areas of the various sensory modalities in neurological patients. Event-related brain activity elicited during cognitive tasks has its main application in the fields of psychology and psychiatry
where cognition, attention, learning, and emotion are under study. These fields profit from the application of topographic mapping and the analysis of brain electric activity in real time. Future applications of topographic mapping of electrophysiological activity will include the coregistration of high-time-resolution EEG recordings with brain imaging methods such as functional MRI (see corresponding articles in this volume). It is to be expected that the collaboration of these fields will lead to functional imaging of brain activity that has both high temporal and high spatial resolution.

Acknowledgments

I thank Dr. Roberto Pascual-Marqui for assistance in computing the source localization results illustrated in the section on the relationship of electric brain activity to intracranial brain processes.
ABBREVIATIONS AND ACRONYMS

CT      computed tomography
ECG     electrocardiogram
EEG     electroencephalogram
EMG     electromyogram
EOG     electrooculogram
ERP     event-related potential
EP      evoked potential
EPSP    excitatory postsynaptic potential
FFT     fast Fourier transform
FMRI    functional magnetic resonance imaging
GFP     global field power
IPSP    inhibitory postsynaptic potential
LORETA  low-resolution electromagnetic tomography
MEG     magnetoencephalogram
FMT     frontal midline theta activity
PET     positron emission tomography

BIBLIOGRAPHY

1. A. L. Hodgkin and A. F. Huxley, J. Physiol. (Lond.) 117, 500–544 (1952).
2. E. R. Kandel, J. H. Schwartz, and T. M. Jessel, Principles of Neural Science, 4th ed., McGraw-Hill, NY, 2000.
3. O. Creutzfeldt and J. Houchin, in O. Creutzfeldt, ed., Handbook of Electroencephalography and Clinical Neurophysiology, vol. 2C, Elsevier, Amsterdam, 1974, pp. 5–55.
4. F. H. Lopes da Silva, in R. Greger and U. Windhorst, eds., Comprehensive Human Physiology, Springer, NY, 1996, pp. 509–531.
5. W. Skrandies et al., Exp. Neurology 60, 509–521 (1978).
6. H. Berger, Arch. Psychiatr. Nervenkr. 87, 527–570 (1929).
7. E. Niedermeyer and F. Lopes da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields, 3rd ed., Williams and Wilkins, Baltimore, 1998.
8. R. R. Llinas, Science 242, 1654–1664 (1988).
9. D. Regan, Human Brain Electrophysiology, Elsevier Science, NY, 1989.
10. W. Singer, Neuron 24, 49–65 (1999).
11. D. Lehmann, Electroencephalogr. Clin. Neurophysiol. 31, 439–449 (1971).
12. D. Lehmann, in A. Gevins and A. Remond, eds., Handbook of Electroencephalography and Clinical Neurophysiology, Rev. Series, vol. 1, Elsevier, Amsterdam, 1987, pp. 309–354.
13. W. Skrandies, Prog. Sens. Physiol. 8, 1–93 (1987).
14. D. Lehmann and W. Skrandies, Electroencephalogr. Clin. Neurophysiol. 48, 609–621 (1980).
15. W. Skrandies, Biol. Psychol. 40, 1–15 (1995).
16. W. Skrandies, in F. H. Duffy, ed., Topographic Mapping of Brain Electrical Activity, Butterworths, Boston, 1986, pp. 728.
17. D. Lehmann et al., Int. J. Psychophysiol. 29, 1–11 (1998).
18. H. von Helmholtz, Ann. Phys. Chem. 29, 211–233, 353–377 (1853).
19. P. L. Nunez, in P. L. Nunez, ed., Neocortical Dynamics and Human EEG Rhythms, Oxford University Press, NY, 1995, pp. 3–67.
20. Z. J. Koles and A. C. K. Soong, Electroencephalogr. Clin. Neurophysiol. 107, 343–352 (1998).
21. R. D. Pascual-Marqui, J. Bioelectromagnetism 1, 75–86 (1999).
22. R. D. Pascual-Marqui et al., Int. J. Psychophysiol. 18, 49–65 (1994).
23. J. Talairach and P. Tournoux, Co-Planar Stereotaxic Atlas of the Human Brain, Thieme, Stuttgart, 1988.
24. V. L. Towle et al., Electroencephalogr. Clin. Neurophysiol. 86, 1–6 (1993).
25. S. Matsuoka, Brain Topogr. 3, 203–208 (1990).
26. E. R. John et al., Science 239, 162–169 (1988).
27. H. H. Jasper, Electroencephalogr. Clin. Neurophysiol. 20, 371–375 (1958).

ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER

MICHAEL KOTLARCHYK
Rochester Institute of Technology
Rochester, NY
INTRODUCTION

All imaging systems map some spectroscopic property of the imaged object onto a detector. In a large fraction of imaging modalities, the imaged object directly or indirectly emits, reflects, transmits, or scatters electromagnetic (EM) radiation. Such is the case, for example, in radar and remote sensing systems, radio astronomy, optical microscopy, X-ray and positron-emission tomography, fluorescence imaging, magnetic resonance imaging, and telescope-based systems, to name but a few areas. This chapter addresses the following broad topics:

1. The basic nature and associated properties common to all types of EM radiation (1–3)
2. How radiation from different parts of the electromagnetic spectrum is generated (1,2,4)
3. The dispersion, absorption, and scattering of radiation in bulk material media (2,4–7)
4. Reflection and transmission at a material interface (2,4–7)
5. Interference and diffraction by collections of apertures or obstacles (2,4,7,8)
6. The mechanisms responsible for emitting, absorbing, and scattering EM radiation at the atomic, molecular, and nuclear levels (9–13)

This treatment in no way attempts to compete with the many fine comprehensive and detailed references available on each of these subjects. Instead, the aim is to highlight some of the central facts, formulas, and ideas important for understanding the character of electromagnetic radiation and the way it interacts with matter.

Historical Background — Wave versus Particle Behavior of Light

A fundamental paradox in understanding the nature of electromagnetic radiation is that it exhibits both wave-like
and particle-like properties. Visible light, which we now know is one form of electromagnetic radiation, exhibits this dual behavior. Before the nineteenth century, the scientific community vacillated between one picture of light as a continuous wave that propagates through some all-pervasive, undetectable medium (often referred to as the luminiferous ether) and another picture in which light was envisioned as a stream of localized particles (or corpuscles). The wave picture moved to the forefront shortly after 1800, when the great British scientist Thomas Young, and others such as Augustin Jean Fresnel in France, successfully explained a variety of interference and diffraction phenomena associated with light.

In 1865, James Clerk Maxwell put forth a solid theoretical foundation for the wave theory of light when he showed that the fundamental equations of electricity and magnetism predict the existence of electromagnetic waves (apparently through the ether medium) that propagate at a speed very close to the experimentally measured speed of light. In addition to providing the basis for electromagnetic waves in the visible spectrum, Maxwell's equations also accounted for EM waves at lower and higher frequencies, namely, the previously detected infrared and ultraviolet radiation. In 1888, when Heinrich Hertz published experimental verification of long-wavelength (radio-frequency) EM waves, it became apparent that electromagnetic waves span a wide range of frequencies.

The truly unique character of light and other electromagnetic waves was not understood until around the turn of the century, when scientists and mathematicians began to ponder the controversial experimental results published in 1881 and 1887 by the two American physicists, Albert Michelson and Edward Morley.
In attempts to detect the motion of the earth through the luminiferous ether, they obtained a null result that, given the high precision of their measurements, implied that the presence of the ether was undetectable. Around 1900, the hypothesis of an ether medium therefore came into question, and it was rejected entirely by Albert Einstein in introducing his theory of special relativity in 1905. Around this time, it became clear that an electromagnetic wave is unlike any other wave encountered in nature in that it is the self-sustaining propagation of an oscillating electromagnetic field that can travel through vacuum without the need for a supporting physical medium.

A wealth of phenomena that involve the interaction of light and matter are well explained by the wave character of EM radiation. These include interference and diffraction phenomena, the reflection and refraction of light at an interface, trends in the optical dispersion of materials, and certain scattering and polarization effects. However, many important and even commonplace phenomena are left unexplained by treating light and other electromagnetic radiation as a wave. In those cases, the paradoxes that arise can be resolved by treating light as a stream of discrete particles, or quanta. Historically, this concept, which apparently contradicts the continuous wave picture, is an outgrowth of ideas put forth by the German physicist, Max Planck,
in the year 1900. Until then, the observed frequency spectrum of thermal radiation from a hot, glowing solid, also known as blackbody radiation, was unexplained. Previous calculations of the spectral distribution led to the so-called ultraviolet catastrophe, in which theory grossly overpredicted the amount of radiation emitted at high frequencies. Planck resolved this dilemma by postulating that the walls of a blackbody radiator are composed of oscillators (i.e., atoms) that emit and absorb radiation only in discrete, or quantized, energy units. Only a few years later, Einstein explained the puzzling observations associated with the emission of electrons from metal surfaces exposed to light, a phenomenon known as the photoelectric effect, by similarly quantizing the electromagnetic radiation field itself. Eventually, the name photon was coined to denote a single unit, or quantum, of the radiation field.

The discrete, particle-like behavior of light carries over to all regions of the electromagnetic spectrum. A striking illustration is the scattering of X rays by free electrons. In 1922, experiments by Arthur Compton, an American physicist, demonstrated that the wavelength shift exhibited by X rays upon scattering is solely a function of the scattering angle. A wave theory fails to predict this result. The only way to account for the experimental outcome correctly is to assume that a particle-like collision takes place between each incoming X-ray photon and a free electron (see the section on Compton scattering by a free electron).

Even though the wave and particle pictures (also sometimes referred to as the classical and quantum pictures, respectively) of light may appear contradictory, they are not. It is just that different types of experimental conditions elicit either the wave character or the particle character of electromagnetic radiation.
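Planck's resolution of the ultraviolet catastrophe can be illustrated numerically by comparing the classical (Rayleigh–Jeans) spectral energy density with Planck's quantized form. The physical constants below are rounded, and the temperature of 5000 K is an arbitrary choice for illustration.

```python
import math

h = 6.626e-34    # Planck constant, J*s
c = 2.998e8      # speed of light, m/s
kB = 1.381e-23   # Boltzmann constant, J/K

def planck(nu, T):
    """Planck spectral energy density u(nu, T), in J / (m^3 * Hz)."""
    return (8 * math.pi * h * nu ** 3 / c ** 3) / math.expm1(h * nu / (kB * T))

def rayleigh_jeans(nu, T):
    """Classical prediction; grows as nu^2 without bound."""
    return 8 * math.pi * nu ** 2 * kB * T / c ** 3

T = 5000.0
for nu in (1e12, 1e13, 1e14, 1e15):
    print(f"{nu:.0e} Hz: Planck {planck(nu, T):.3e}, "
          f"classical {rayleigh_jeans(nu, T):.3e}")
```

The two expressions agree at low frequencies, where the photon energy hν is much smaller than kT, but diverge at high frequencies, where the classical form overshoots catastrophically while Planck's quantized oscillators cut the emission off exponentially.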
For example, interference and diffraction experiments bring wave properties to the forefront, whereas counting experiments focus on particle behavior. Neither picture by itself is adequate to describe all features of radiation and the way it interacts with atoms, molecules, and bulk media. Rather, the various phenomena are understood by treating the wave and particle properties as complementary. So before addressing the various specific interactions between radiation and matter, the next two sections present a quantitative, detailed description of the wave and particle models.

Classical Wave Description of Electromagnetic Waves in Vacuum

Maxwell's Prediction of EM Waves

By the 1830s, it was believed that the behavior of electric and magnetic fields and their relationships to charges and electric currents was well understood. Experimental work performed by Coulomb, Biot and Savart, Ampere, and Faraday culminated in a set of four equations for the fields. Embodied in these equations is the observation that electric fields arise because of the presence of electric charge (Gauss's law) or because of the existence of time-varying magnetic fields (Faraday's law). Magnetic fields, on the other hand, arise because of the presence of moving charge, or electric currents (Ampere's law). Unlike electric field lines that originate and terminate on charges,
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
magnetic field lines form loops; they have no starting or ending points (Gauss's law for magnetism). Maxwell's contribution was recognizing an inherent inconsistency in the field equations, which he resolved by modifying Ampere's law. He deduced that magnetic fields arise not only because of the presence of currents but also because of time-varying electric fields. In other words, a changing electric field acts like an effective current — it is referred to as a displacement current. The four field equations, where Ampere's law is modified to include the addition of a displacement-current term, are the famous Maxwell's equations. Without Maxwell's contribution, electric and magnetic fields cannot propagate as waves. The effect of introducing Maxwell's displacement current is essentially this: According to Faraday's law, an oscillating magnetic field gives rise to an oscillating electric field. Because the electric field is time-varying, there is an associated displacement current which, in turn, gives rise to an oscillating magnetic field. Then, the sequence repeats over and over, and the coupled electric and magnetic oscillations propagate through space as a self-sustaining electromagnetic wave. Some of the basic mathematics and resulting wave characteristics are presented here.

The Wave Equation and Properties of Plane Waves in Vacuum

In vacuum, where no charges or currents exist, Maxwell's equations can be reduced to a single vector equation for the electric field E, which is a function of position and time t:

∇²E = ε0 µ0 ∂²E/∂t². (1)

Similarly, the magnetic-field vector also obeys this same equation. Here, ∇² is the Laplacian operator which, in rectangular coordinates, is given by (∂²/∂x²) + (∂²/∂y²) + (∂²/∂z²). The constants ε0 and µ0 are the electric permittivity and magnetic permeability of free (i.e., empty) space, respectively. They are given by ε0 = 8.854 × 10⁻¹² C²/N·m² and µ0 = 4π × 10⁻⁷ N·s²/C².
Equation (1) is the standard, classical wave equation, and its basic solutions are waves that travel in three dimensions. The speed of these waves is given by

c = 1/√(ε0 µ0) = 3.00 × 10⁸ m/s. (2)

This is precisely the measured speed of light in vacuum, which confirms that light is a propagating electromagnetic wave. The most basic form of a traveling wave that satisfies the wave equation is that of a harmonic plane wave. A plane wave is often used to approximate a collimated, monochromatic light beam. In addition, the fact that the wave equation is linear allows one to add, or superimpose, various plane waves to construct more complicated solutions. Figure 1 depicts a snapshot of a plane electromagnetic wave traveling in the +z direction. The particular wave shown here is one where the electric field vector oscillates parallel to the x axis. In this case, one says that the wave is linearly polarized along the x direction (other types of polarization will be discussed in the section on Polarization). The magnetic field vector B propagates in tandem and in phase with the electric field. For an EM plane wave in vacuum, Maxwell's equations demand that, at any moment, the ratio of the field magnitudes is given by the speed of the wave:

E/B = c. (3)

Furthermore, the vectors E and B are mutually orthogonal, as well as orthogonal to the direction of wave propagation. The fact that the oscillation of the field vectors is perpendicular to the direction in which the wave travels means that electromagnetic radiation propagates as a transverse wave. Although the diagram in Fig. 1 is useful for illustrating some of the features of the wave, it fails to convey why it is labeled a "plane wave." From the picture, one gets the false impression that the field vectors are restricted to the z axis. The correct way to visualize the situation is to imagine that the direction of propagation is normal to a series of parallel, infinite, planar surfaces, or wave fronts, at various values of z. At any moment, the electric and magnetic field vectors are the same at every point on a given plane, that is, for the wave in Fig. 1, the fields are independent of the x and y coordinates. The electric field of a linearly polarized plane wave that propagates in the +z direction can be represented by the form

E(z, t) = E0 cos(kz − ωt + φ). (4)

The direction of E0 specifies the direction of the EM wave's polarization, whereas the magnitude gives the amplitude of the wave. The cosine function gives the wave its simple oscillatory, or so-called harmonic, character. k and ω are parameters related to the wave's characteristic wavelength λ and period T. A wavelength represents the repeat distance associated with the wave, in other words, the distance between two adjacent peaks (or troughs) at a fixed instant (see Fig. 2). A period, on the other hand, is the time it takes for one complete wavelength of the disturbance to pass a given point in space. Its inverse ν is
Figure 1. A linearly polarized electromagnetic plane wave propagating in the +z-direction. The electric and magnetic field vectors, E and B, are perpendicular to each other and to the direction of wave propagation.
Figure 2. The wavelength λ and the period T of a harmonic wave of amplitude E0 that travels at speed c.

Figure 3. One of the wave fronts of a plane wave that propagates with wave vector k. Notice that k is normal to the planar wave front and that the product k · r is fixed for all points on the given front.
the frequency of the wave,

ν = 1/T, (5)

which is measured in units of cycles per second, or hertz (Hz). The wavelength and frequency are related simply through the speed c of the wave by

λν = c. (6)

The parameter k in Eq. (4) is called the wave number and is inversely proportional to the wavelength:

k = 2π/λ. (7)

ω represents the angular frequency of the wave, given by

ω = 2πν = 2π/T. (8)
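The relations in Eqs. (2) and (5)–(9) can be checked numerically. A minimal sketch in Python, where the 500-nm wavelength is an arbitrary illustrative choice and the constants are standard rounded values:

```python
import math

C = 2.998e8                  # speed of light in vacuum (m/s)
EPS0 = 8.854e-12             # vacuum permittivity (C^2/N·m^2)
MU0 = 4.0 * math.pi * 1e-7   # vacuum permeability (N·s^2/C^2)

# Eq. (2): the wave speed computed from the vacuum constants
# reproduces the measured speed of light.
assert math.isclose(1.0 / math.sqrt(EPS0 * MU0), C, rel_tol=1e-3)

wavelength = 500e-9              # arbitrary example: green light, 500 nm
nu = C / wavelength              # frequency, from Eq. (6): lambda * nu = c
T = 1.0 / nu                     # period, Eq. (5)
k = 2.0 * math.pi / wavelength   # wave number, Eq. (7)
omega = 2.0 * math.pi * nu       # angular frequency, Eq. (8)

# Eq. (9): the phase velocity omega/k again reproduces c.
assert math.isclose(omega / k, C)
```

Any one of the four parameters (λ, ν, k, ω) fixes the other three once the wave speed is known.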
From the previous definitions of k and ω, Eq. (6) can also be written as

ω/k = c. (9)

The entire argument of the cosine function in Eq. (4) is called the phase of the wave. At any fixed time, all points in space that have the same value of the phase correspond to a wave front — in the present discussion, they are the previously mentioned planar surfaces. Because the rate at which the wave propagates is equivalent to the speed of passing wave fronts (having constant phase), the speed of the wave is often referred to as the phase velocity. The parameter φ in Eq. (4) is the wave's initial phase, and it simply gives the value of the field when both z and t vanish. Clearly, our assumption that the plane wave travels along the z axis is completely arbitrary and unnecessarily restrictive. Mathematically, the way to specify a general plane wave is first to define a wave vector k. This is a vector that points in the direction in which the wave propagates and has a magnitude equal to the wave number. Then, the wave vector would be normal to any chosen planar wave front, and referring to Fig. 3, one sees that all points on such a wave front satisfy the condition k · r = const (r denotes the position vector from the origin to any point in space). Then, the appropriate form for the plane wave becomes

E(r, t) = E0 cos(k · r − ωt + φ). (10)

Energy and Momentum Transport by EM Waves

Electric and magnetic fields store energy. The energy per unit volume, or the energy density (u), contained in any E and/or B fields at a point in space is given by

u = (1/2)ε0 E² + (1/2µ0)B². (11)
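Equation (11) can be illustrated with a small sketch; the 100-V/m field below is an arbitrary illustrative value. For a plane wave, where E = cB by Eq. (3), the electric and magnetic terms contribute equally:

```python
import math

C = 2.998e8                  # speed of light (m/s)
EPS0 = 8.854e-12             # vacuum permittivity (C^2/N·m^2)
MU0 = 4.0 * math.pi * 1e-7   # vacuum permeability (N·s^2/C^2)

def energy_density(e_field, b_field):
    """Eq. (11): u = (1/2) eps0 E^2 + (1/(2 mu0)) B^2, in J/m^3."""
    return 0.5 * EPS0 * e_field**2 + b_field**2 / (2.0 * MU0)

e = 100.0   # arbitrary illustrative field amplitude (V/m)
b = e / C   # the corresponding plane-wave magnetic field, Eq. (3)
u = energy_density(e, b)

# The two terms are equal, so u reduces to eps0 * E^2 for a plane wave.
assert abs(u - EPS0 * e**2) / u < 1e-3
```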
A by-product of Maxwell's classical theory is the result that this energy can flow from one location to another only if, simultaneously, there is an electric and a magnetic field in the same region of space. Specifically, the flow of energy is determined by the cross product between the two fields. In vacuum, the flow is given by

S = (1/µ0) E × B, (12)
where S is called the Poynting vector and its direction gives the direction of energy flow. The magnitude of the Poynting vector corresponds to the instantaneous rate of energy flow per unit area or, equivalently, the power per unit area. Its corresponding SI units are watts per square meter (W/m²). For an electromagnetic plane wave, one can see that using a right-hand rule for the direction of the cross product in Eq. (12) does, in fact, give the expected direction for the energy flow, namely, in the direction of the wave propagation (see Fig. 1). From Eqs. (2), (3) and (10), the
magnitude of this flow becomes

S = (1/µ0)EB = (1/µ0 c)E² = cε0 E0² cos²(k · r − ωt + φ). (13)
The cosine function oscillates very rapidly (for example, optical frequencies are on the order of 10¹⁵ Hz), and the squared cosine oscillates at twice that frequency. In most cases, the rapid variations cannot be resolved by any radiation detector one might use; hence the Poynting vector is usually averaged over many oscillations. Because the squared cosine function oscillates symmetrically between the values of zero and one, it is simply replaced by the value 1/2. A number of names are used for the magnitude of the averaged Poynting vector ⟨S⟩ — they include irradiance, intensity, and flux density. We will use either irradiance or intensity and denote the quantity by I. So, for a plane EM wave,

I ≡ ⟨S⟩ = (1/2)cε0 E0². (14)
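Inverting Eq. (14) gives the field amplitude that corresponds to a measured irradiance. A minimal sketch; the 1000 W/m² input, roughly the irradiance of bright sunlight at the ground, is only an illustrative figure:

```python
import math

C = 2.998e8        # speed of light (m/s)
EPS0 = 8.854e-12   # vacuum permittivity (C^2/N·m^2)

def field_amplitude(irradiance):
    """Invert Eq. (14), I = (1/2) c eps0 E0^2, for the amplitude E0 (V/m)."""
    return math.sqrt(2.0 * irradiance / (C * EPS0))

e0 = field_amplitude(1000.0)   # roughly 870 V/m for I = 1000 W/m^2
```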
The most important result is that the irradiance is proportional to the square of the field amplitude. Classical electromagnetic waves also carry momentum. One way to see this is to consider what occurs when a plane wave strikes a charged particle, say, an electron. At any given instant, the negatively charged electron will be pushed opposite to the direction of the E field. Then, the acquired velocity of the charge is directed perpendicular to the wave's B field. However, because any moving charge in a magnetic field experiences a lateral force perpendicular to both the particle's velocity and the B field, the particle is pushed in the direction of the wave propagation. Consequently, the wave imparts momentum to the particle along the direction of the wave vector. The momentum per unit volume, or the momentum density (g), carried by the original wave is proportional to the Poynting vector and is given by

g = (1/c²)S. (15)

The transfer of momentum from an EM wave to a material surface is responsible for the phenomenon of radiation pressure. For example, a plane wave that strikes a perfectly absorbing surface at normal incidence imparts a radiation pressure of

P = (1/c)S. (16)

Hence, from Eq. (14), the average pressure is

⟨P⟩ = (1/2)ε0 E0². (17)

Spherical Waves

When a small, point-like source emits radiation isotropically in all directions, the wave fronts (i.e., surfaces of constant phase) are spheres centered on the source. As time goes on, these spheres expand outward from the source, as shown in Fig. 4. Assuming that the power that emanates from the source is constant in time, the energy flow (per unit time) carried by the various wave fronts must be the same. Because the area of a wave front is proportional to the square of its radius r, the irradiance, which is the rate of energy flow per unit area, must be proportional to 1/r². This is known as the inverse square law. Because irradiance is proportional to E², it follows that the field amplitude falls off proportionally to 1/r.

Figure 4. Spherical wave fronts that emanate from a point source.

Quantum Nature of Radiation and Matter

Fundamental Properties of Photons

The framework for treating light and other electromagnetic radiation as discrete particles was proposed by Einstein in 1905. In this picture, the transport of radiant energy occurs via photons. A photon is a quantum, or excitation, of the radiation field. It has zero rest mass and propagates at the speed of light. Each photon carries energy E that is determined solely by the frequency or wavelength of the radiation according to

E = hν = hc/λ, (18)

where h is Planck's constant, which has the experimentally determined value 6.6261 × 10⁻³⁴ J·s. Equivalently, from Eqs. (8) and (9), one can express the photon energy in terms of the radiation field's angular frequency or wave number:

E = ħω = ħck, (19)

where ħ = h/2π = 1.0546 × 10⁻³⁴ J·s = 6.583 × 10⁻¹⁶ eV·s (note: ħ is called h-bar). A photon also carries a linear momentum p whose magnitude is

p = E/c = hν/c = h/λ = ħk. (20)

Then, the vector momentum is directly related to the wave vector k via

p = ħk. (21)

At high frequency or wave number, the momentum of a photon is large, and the radiation exhibits behavior that is
the most particle-like. X rays and gamma rays fall in this category. In the photon picture, intensity or irradiance is determined by the number of photons carried by the illumination. More specifically, it is the product of the photon flux, defined as the number of photons per unit time per unit area and the energy of a single photon. This means that at lower frequencies, where the photon energy is smaller, there are more photons in a beam of a given intensity. According to a basic tenet of quantum mechanics known as the correspondence principle, the larger the number of quanta, the more classical-like the behavior. Hence, low-frequency radio and microwave radiation has a general tendency to be more wave-like in its behavior. However, under certain conditions, radiation at any frequency whatsoever can display wave or particle characteristics. Energy Levels and Transitions in Atomic, Molecular, and Nuclear Systems The idea of photon energy is closely related to the scale of energies encountered in atomic, molecular, and nuclear systems. The rules that govern the properties of these submicroscopic systems are dictated by the laws of quantum mechanics (1,12,15). The framework of quantum theory, most of which was developed during the 1920s, predicts that the basic building blocks of matter, namely, electrons, protons, and neutrons, bind together so that the total energy of an atom, molecule, or atomic nucleus may only take on certain well-defined, discrete values. A transition from one energy level to another is accompanied by the absorption or emission of a photon.
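The statement E = hν of Eq. (18) links level spacings directly to line wavelengths. A small sketch using the convenient combination hc ≈ 1240 eV·nm; the 1.89-eV transition is hydrogen's red Balmer line, used only as a familiar example:

```python
HC_EV_NM = 1239.84   # h*c expressed in eV·nm (approximately 1240)

def photon_wavelength_nm(energy_ev):
    """Vacuum wavelength (nm) of the photon absorbed or emitted in a
    transition of energy E, from Eq. (18): E = h*nu = hc/lambda."""
    return HC_EV_NM / energy_ev

# A 1.89-eV atomic transition corresponds to red light near 656 nm,
# whereas a 1-MeV nuclear transition gives a gamma-ray wavelength of
# about 1.2e-3 nm.
wl_atomic = photon_wavelength_nm(1.89)
wl_nuclear = photon_wavelength_nm(1.0e6)
```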
Atomic Transitions and Line Spectra. Figure 5 illustrates an electron transition in an atom. To raise an atom from a low-energy level to a higher level, a photon must be absorbed, and its energy must match the energy difference between the initial and final atomic states. When an atom deexcites to a lower energy level, it must emit a sequence of one or more photons of the proper energy. Atomic transitions are characterized by photon energies that range from a few eV to the order of keV, depending on the type of atom and the particular transition involved. The frequency of the absorbed or emitted radiation is determined solely by photon energy (i.e., E = hν), which in turn is determined by the allowed transitions of atomic electrons. As a result, only certain select frequencies appear in atomic absorption and emission spectra; this gives rise to the observed characteristic line spectra of atoms.

Nuclear Transitions. Protons and neutrons bind within atomic nuclei at certain discrete, quantized values of energy. Nuclear levels and photon energies are orders of magnitude larger than those of atomic transitions. A rough estimate of the energies involved can be calculated from the basic Heisenberg uncertainty principle, which states that the product of the uncertainties in a particle's position and its linear momentum must be at least as large as ħ/2:

Δx Δp ≥ ħ/2. (22)
Figure 5. Electron transitions in an atom. (a) A photon of energy hν incident on an atom in a low-energy state. (b) If the photon energy matches the energy difference ΔEatom between two electronic states of the atom, the photon is absorbed, and the atom is raised to an excited state. (c) An atom can deexcite to a lower energy state by emitting a photon. The atom shown here deexcites by undergoing two transitions in rapid sequence.
A given nucleon (i.e., proton or neutron) is essentially localized within the confines of the nucleus, or Δx ∼ 10⁻⁶ nm. The uncertainty in the nucleon's momentum can be calculated from Δp = √⟨p²⟩ = √(2mE), where m = 1.67 × 10⁻²⁷ kg is the mass of the proton or neutron and E is its (kinetic) energy. Then, the uncertainty relationship predicts that

E ≥ ħ²/[8m(Δx)²] = (hc)²/[32π²(mc²)(Δx)²], (23)
where the second form is particularly convenient for calculation. This gives a rough estimate of the lowest possible quantum energy, or the ground-state energy, of a typical nucleus. Using hc = 1240 eV·nm and mc2 ≈ 940 MeV, we see that the ground-state energy is on the order of 5 MeV. The spacings between nuclear levels are also typically in the MeV range or greater, and so are the corresponding photon energies. These energies are huge compared to those of atomic electrons.
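The order-of-magnitude estimate just quoted follows directly from the second form of Eq. (23); a quick numerical check using the same inputs as the text:

```python
import math

HC = 1240.0     # h*c in eV·nm, as used in the text
MC2 = 940.0e6   # nucleon rest energy, ~940 MeV, expressed in eV
DX = 1.0e-6     # nuclear confinement length, ~1e-6 nm

# Second form of Eq. (23): lower bound on the nucleon's kinetic energy.
e_min_ev = HC**2 / (32.0 * math.pi**2 * MC2 * DX**2)

# e_min_ev comes out close to 5 MeV, as stated in the text.
```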
Vibrational and Rotational States in Molecules. Electron transitions in molecules exhibit energy spacings that are similar to those in individual atoms; they are on the order of eV to keV. However, molecules are more complicated than atoms due to vibrational and rotational motions of the constituent atomic nuclei (14). As an example, consider a diatomic molecule that consists of two atomic nuclei (each of mass M) and the various orbiting electrons. Even though this is the simplest type of molecule, it represents a complex many-body system that has many degrees of freedom. Specifically, each of the two atomic nuclei has three degrees of freedom (corresponding to the three coordinate directions), and each electron in the molecule has the same. At face value, the problems connected with finding the energy states in even a light diatomic molecule appear almost insurmountable. Fortunately, however, a clever method exists that enables one to determine molecular energy states; it is known as the Born–Oppenheimer approximation (16,17). Here, one takes advantage of the fact that nuclei are much more massive than electrons — thus, they move much more slowly than the electrons in a molecule. As a first approximation, therefore, the nuclei can be regarded as having a fixed separation R in space. For each such separation, one then determines the possible values of the total electronic energy ε of the molecule, that is, there is an ε versus R curve associated with each possible electronic energy level. A central assertion of the Born–Oppenheimer approximation is that, for each electronic level, these same ε versus R curves also play the role of the internuclear potential energy function V(R). Typically, each V(R) curve has a rather sharp minimum, as illustrated in Fig. 6. Consequently, the two nuclei have a highly preferred separation at some R0 . The constant V(R0 ) corresponds to the binding energy of the molecule when it is in the electronic state under consideration. 
The nuclear motions may be neglected as far as the basic structure of the molecule is concerned. However, to analyze photon spectra, it is important to consider the nuclear degrees of freedom; they give rise to closely spaced vibrational and rotational states that accompany each electronic level.
For a given electronic state, the characteristics of the associated vibrational levels are obtained by approximating the V(R) curve in the vicinity of its minimum by a simple quadratic function. This is the characteristic potential function for a simple harmonic oscillator and has the form

V(R) ≈ V(R0) + (1/2)Mω²(R − R0)². (24)

Put another way, the two nuclear masses have equilibrium separation R0 and vibrate relative to one another at some characteristic frequency ω. This vibrational frequency is determined by the curvature, or sharpness, of the V(R) curve. Given the frequency, the vibrational amplitude is determined by the vibrational energy of the molecule. A quantum-mechanical treatment of the harmonic oscillator shows that the vibrational energy levels are discrete and are equally spaced by separation ħω (see Fig. 7). The spacing of vibrational levels is on the order of 100 times smaller than that of the electronic levels. This gives a corresponding vibrational amplitude on the order of only one-tenth of the equilibrium separation R0. This means that diatomic molecules are rather stiff, and one is justified in treating the rotations of the molecule independently. To a first approximation, the rotational energy levels of a molecule correspond to those of a rigid rotator that rotates about its center of mass (see Fig. 8a). According to quantum theory, the angular momentum vector L of this (or any other) system is quantized. This leads to the fact that the molecule's rotational kinetic energy is quantized as well. The allowed energy levels are given by

Eℓ = ℓ(ℓ + 1)ħ²/MR0² (25)

and are indexed by the orbital angular momentum quantum number ℓ. The possible values of ℓ are

ℓ = 0, 1, 2, 3, . . . . (26)

In addition, there are 2ℓ + 1 rotational states for each value of ℓ (or value of the energy). Each of these states has the same magnitude of orbital angular momentum but a different value for its z component. The possible values for the z component of the angular momentum vector are

Lz = mħ, (27)

Figure 6. Internuclear potential energy function V(R) for a diatomic molecule. R0 is the equilibrium separation between the two nuclei of the molecule.

Figure 7. The vibrational energy levels of a diatomic molecule are equally spaced. The spacing is simply ħω, where ω is the characteristic vibrational frequency of the molecule.
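The rigid-rotator levels of Eq. (25) and their degeneracies can be sketched numerically; the mass and bond length below are rough nitrogen-like values chosen purely for illustration (hypothetical inputs, not taken from the text):

```python
HBAR = 1.0546e-34   # reduced Planck constant (J·s)
EV = 1.602e-19      # joules per eV

M = 2.33e-26        # illustrative atomic mass (kg), roughly one nitrogen atom
R0 = 1.1e-10        # illustrative equilibrium separation (m), ~0.11 nm

def rotational_level_ev(l):
    """Rotational energy of Eq. (25): E_l = l(l+1) hbar^2 / (M R0^2), in eV."""
    return l * (l + 1) * HBAR**2 / (M * R0**2) / EV

def degeneracy(l):
    """Each level holds 2l + 1 states, one for each allowed m value."""
    return 2 * l + 1

# The l = 1 level lands near 5e-4 eV, roughly 100 times smaller than
# typical vibrational spacings, as the text describes.
levels = [(l, rotational_level_ev(l), degeneracy(l)) for l in range(4)]
```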
where m is called the magnetic quantum number. Its (2ℓ + 1) possible values are

m = −ℓ, −(ℓ − 1), · · · , 0, · · · , +(ℓ − 1), +ℓ. (28)

Because a number of different quantum states are associated with a given energy level Eℓ, one says that each energy level is (2ℓ + 1)-fold degenerate. Figure 8b shows the energy-level diagram for the rigid rotator. The spacing of energy levels, which is on the order of ħ²/MR0², is typically about 100 times smaller than the spacing between the vibrational levels of a molecule (or some 10,000 times smaller than the spacing between electronic levels).

Figure 8. Rotations of a diatomic molecule. (a) Model of the molecule as a rigid rotator that rotates about its center of mass. (b) Rotational energy levels, Eℓ. Each energy level has (2ℓ + 1) distinct quantum states. Here, I = MR0²/2 is the moment of inertia of the molecule about its center.

Magnetic Splitting of Spin States. Electrons and atomic nuclei possess a magnetic dipole moment m. When these particles are placed in an external magnetic field, the vector m tends to align with the field lines. In effect, electrons and nuclei behave like submicroscopic bar magnets, where m points along the axis of the magnet (from the south to the north pole) and |m| is a measure of the magnet's strength. The magnitude of the magnetic dipole moment (or magnetic moment, for short) is an unalterable fundamental property of the electron, proton, and neutron and is proportional to the quantum spin, or intrinsic angular momentum, of each type of particle. In an atomic nucleus, the individual spins and orbital angular momenta of the various nucleons combine to produce a net angular momentum. Nevertheless, one still speaks of a net nuclear spin and its corresponding magnetic moment.

Let S represent the spin of a particle. Because it represents a type of angular momentum, intrinsic spin can be treated quantum mechanically in basically the same way as orbital angular momentum L. One can define a spin angular momentum quantum number s. For the spin angular momentum, it turns out that the possible values of s are

s = 0, 1/2, 1, 3/2, 2, . . . . (29)
For a given value of s, there are 2s + 1 spin states; each has the same magnitude of spin angular momentum, but each has a different value for its z component. The possible values for the z component of the spin angular momentum vector are

Sz = mħ, (30)

where, as before, m is the magnetic quantum number, but now its possible values are

m = −s, −(s − 1), · · · , 0, · · · , +(s − 1), +s. (31)
In the absence of an external magnetic field, there is no difference between the energies of spin states that have different values of m and the same value of s. Hence, each value of s is (2s + 1)-fold degenerate. By turning on a magnetic field, the different m states split into distinct energy levels; they become nondegenerate states. This is called the Zeeman effect and comes about because there is an interaction between the field B and the particle's magnetic moment m. If the magnetic field is uniform and points in the +z direction, so that B = B0 k̂, the classical interaction energy is given by

E = −µz B0. (32)
This basically says that as the moment becomes more and more aligned with the field, the energy becomes lower and lower. As mentioned previously, the magnetic moment is directly proportional to the particle's spin:

m = ±γ S. (33)

The proportionality constant γ is called the gyromagnetic ratio of the particle, and its value depends on the type of particle. The upper sign is used for nuclei and the lower sign is used for electrons. Combining Eq. (33) with Eq. (32) and then substituting in Eq. (30) gives

E = ∓γ B0 Sz = ∓mħγ B0. (34)
This result shows that for a particle whose spin quantum number is s, the degeneracy is split into 2s + 1 energy levels (or Zeeman states) that are equally spaced by an amount

ΔE = ħγ B0. (35)

The spin quantum number of an electron is always s = 1/2, so the magnetic quantum number m is either +1/2 or −1/2. As a result, when an electron is placed in a magnetic field, there are only two possible energy
states. The lower energy level is the m = −1/2 state, and the upper level corresponds to m = +1/2. The spacing between the two levels is obtained from Eq. (35). Given that the gyromagnetic ratio of an electron is γ = 1.76 × 10¹¹ T⁻¹·s⁻¹, each tesla of applied field produces a splitting of 1.16 × 10⁻⁴ eV. Atomic nuclei have spins restricted to half-integer or integer values that vary from one nuclear species to another. So, when a nucleus is placed in an external field, a number of possible energy states can appear. For nuclei, the lower energy levels correspond to the higher values of m, whereas the upper energy levels correspond to lower m values. The value of γ depends on the specific nuclear species in question. However, to get a sense of the spacing between levels, consider the simplest nucleus, namely a proton (or hydrogen nucleus), which has a spin of 1/2. This produces two energy levels. The gyromagnetic ratio is 2.67 × 10⁸ T⁻¹·s⁻¹, and each tesla of field splits the levels by only 1.76 × 10⁻⁷ eV, or about three orders of magnitude less than the splitting of electron spin states.

Regions of the Electromagnetic Spectrum

The energy levels and transitions just described give rise to an enormous range of photon energies that span many orders of magnitude. This, in turn, represents a correspondingly wide range of wavelengths (or frequencies). Figure 9 displays the entire gamut of the EM spectrum. Even though it forms a continuum, the spectrum is partitioned (somewhat artificially) into the following regions: radio waves, microwaves, infrared radiation, visible light, ultraviolet radiation, X rays, and gamma rays. The divisions between adjacent regions of the spectrum are not abrupt — rather, they are fuzzy and allow some degree of overlap between one region and the next. The characteristics of the different spectral regimes are presented here:
Figure 9. The electromagnetic spectrum. The scales indicate wavelength (m), frequency (Hz), and photon energy (eV).
Radio Waves

Radio waves have wavelengths (in vacuum) that exceed about 10 cm and have frequencies typically in the MHz region. The corresponding photon energies are about 10⁻⁵ eV or less. These are too small to be associated with electronic, molecular, or nuclear transitions. However, the energies are in the right range for transitions between nuclear spin states (nuclear magnetic resonance) and are the basis for magnetic resonance imaging. Radio waves are used in television and radio broadcasting systems and are generated by oscillating charges in antennas and electric circuits (see section on Electric dipole radiation).

Microwaves

The wavelengths of microwaves are in roughly the 30-cm to 1-mm range, and their frequencies are of the order of GHz. Microwaves correspond to photon energies typically between about 10⁻⁶ and 10⁻³ eV. These energies correspond to transitions between rotational states of molecules, as well as transitions between electron spin states (electron spin resonance). Microwaves are easily transmitted through clouds and the atmosphere, making them appropriate for radar systems, communication, and satellite imaging. In radio astronomy, a particularly prevalent source of microwaves is the atomic hydrogen that is found in great abundance throughout various regions of space. A 21-cm spectral line is observed when hydrogen is present in large enough quantities. The origin of this line is that the electron in hydrogen, when considered in its own frame of reference, sees a circulating proton, which acts like a current loop. From Ampere's law, any current loop produces a B field. As a result, the magnetic moment of the spinning electron is immersed in a magnetic field, and its spin states are split (Zeeman effect). The splitting is known as the hyperfine structure of hydrogen, and astronomers find that the corresponding microwave emission provides a great deal of information about the structure of galaxies.
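As a numerical aside (rounded standard constants; the 21-cm figure is the one quoted above), the hydrogen line's frequency and photon energy land squarely in the ranges this section assigns to microwave and radio photons:

```python
H = 6.626e-34    # Planck's constant (J·s)
C = 2.998e8      # speed of light (m/s)
EV = 1.602e-19   # joules per eV

wavelength = 0.21            # the 21-cm hydrogen line (m)
nu = C / wavelength          # ~1.4e9 Hz, i.e., about 1.4 GHz
energy_ev = H * nu / EV      # ~5.9e-6 eV, far too small to drive
                             # electronic or molecular transitions
```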
Infrared Radiation

This region of the spectrum is subdivided into the far IR (λ ≈ 1 mm → 50 µm), mid IR (50 → 2.5 µm), and the near IR (2.5 → 0.78 µm). Frequencies range from ∼3 × 10¹¹ Hz in the far-IR region to ∼4 × 10¹⁴ Hz just near the red end of the visible spectrum. The corresponding values of photon energy are on the order of 10⁻³ eV up to a few eV. Infrared (IR) radiation is generally associated with the thermal motion of molecules. The far IR is related mostly to molecular rotations and low-frequency vibrations. Absorption and emission in the mid-IR region is caused primarily by vibrational transitions. The near IR is associated with molecular vibrations and certain electronic transitions. By virtue of their finite temperature, all bodies emit (and absorb) in the infrared. Most heat transfer by radiation occurs in the IR region of the spectrum. In general, the warmer the object, the more IR radiation that is emitted from its surface (see section on Thermal sources). Because of this, variations in the surface temperature of objects can easily be imaged by using
infrared detectors. Many areas of imaging technology, including thermal imaging, remote sensing of the earth's surface, and certain medical imaging techniques, are based on IR emission and its detection.

Visible Light

The human visual system responds to the narrow range of frequencies between about 3.8 × 10¹⁴ Hz and 7.9 × 10¹⁴ Hz that is referred to as visible light. The wavelengths of light in vacuum range from about 780 nm at the extreme reddish end of the visible spectrum to about 380 nm at the violet extreme. The mnemonic ROY-G-BIV is useful for remembering the sequence of colors perceived as frequency increases: red, orange, yellow, green, blue, indigo, and violet. Table 1 lists the range of frequencies and vacuum wavelengths that correspond to each perceived color. The table is only a general guide because individual perceptions vary somewhat. Light that is approximately an equal mixture of the various frequencies in the visible spectrum appears as white light. The color response of the human eye and brain to light is rather complex, especially when presented with a mixture of frequencies. For example, light that contains a 50–50 mixture of red light and green light appears yellow, even though no yellow frequency component is present. In addition, certain pairs of frequencies evoke the same response in the visual system as that produced by broadband white light. The perception of whether an object is white or not is also context sensitive. For example, the processing by the human visual system is such that an object may be interpreted as white, even when observed under totally different illumination conditions, say, when viewed under the light of a fluorescent lamp or under bright sunlight. Photon energies in the visible region are between 1.6 eV (red) and 3.2 eV (violet). These energies are generally associated with the excitation and deexcitation of outer electrons in atoms and molecules.
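The boundaries in Table 1 are mutually consistent with Eq. (6), λν = c; a quick spot check of the red band (rounded speed-of-light value):

```python
C = 2.998e8   # speed of light (m/s)

def wavelength_nm(freq_1e14_hz):
    """Vacuum wavelength (nm) for a frequency given in units of 1e14 Hz."""
    return C / (freq_1e14_hz * 1e14) * 1e9

red_long = wavelength_nm(3.85)    # ~779 nm, matching the 780-nm table entry
red_short = wavelength_nm(4.84)   # ~619 nm, matching the 620-nm table entry
```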
Ultraviolet Radiation

The ultraviolet (UV) region of the spectrum is subdivided into the near UV (λ ≅ 380 nm → 200 nm) and the far UV (200 nm → 10 nm). Frequencies range from ∼8 × 10¹⁴ Hz just above the visible spectrum to ∼3 × 10¹⁶ Hz at the extreme end of the far UV. Photon energies, which range between about 3 and 100 eV, are of the order of magnitude of ionization energies and molecular dissociation energies
Table 1. Frequencies and Vacuum Wavelengths for Colors in the Visible Spectrum

  Color     Frequencies (×10¹⁴ Hz)   Wavelengths (nm)
  Red       3.85–4.84                780–620
  Orange    4.84–5.04                620–595
  Yellow    5.04–5.22                595–575
  Green     5.22–6.06                575–495
  Blue      6.06–6.59                495–455
  Violet    6.59–7.89                455–380
of many chemical reactions. This accounts for various chemical effects triggered by ultraviolet radiation. The sun is a powerful source of ultraviolet radiation and is responsible for the high degree of ionization of the upper atmosphere — hence, the name ionosphere. Solar radiation in the ultraviolet region is often subdivided into UV-A (320–380 nm), UV-B (280–320 nm), and UV-C (below 280 nm). The reason for the different labels is that each region produces different biological effects. UV-A radiation is just outside the visible spectrum and is generally not harmful to tissue. UV-B radiation, on the other hand, can lead to significant cell damage. The effects of UV-C can be worse still, even potentially lethal. However, gases in the earth's upper atmosphere absorb radiation in this region. The gas most effective for absorbing radiation is ozone (O₃), which has led to the recent concern over depletion of the earth's ozone layer.

X Rays

This region of the EM spectrum extends from wavelengths of about 10 nm down to about 0.1 Å, or frequencies in the range 3 × 10¹⁶–3 × 10¹⁹ Hz. Photon energies run between approximately 100 eV and a few hundred keV. The lower end of the energy range is often referred to as the soft X-ray region, and the high-energy extreme is called the hard X-ray region. X rays are associated with transitions that involve the inner, most tightly bound, atomic electrons. If an atom is ionized by removing a core electron, an outer electron will cascade down to fill the vacancy that was left behind. This transition is accompanied by the emission of an X-ray photon (so-called X-ray fluorescence). The energy spectrum of X rays emitted by this process is specific to the particular atomic species involved. In effect, this provides a fingerprint of the different elements. X rays produced in this fashion are called characteristic X rays.
A common method for producing X rays is to accelerate a beam of electrons to high speed and then have the beam strike a metallic target. This is the method for producing X rays in commercial X-ray tubes. When the energetic electrons rapidly decelerate in the target, a broad, continuous spectrum of radiation known as bremsstrahlung, which consists primarily of X rays, is generated (see section on Bremsstrahlung). Superimposed on this continuous spectrum are the discrete lines of the characteristic X rays of the target atoms. Medical X-ray imaging and computerized tomography (CT) use the difference in the X-ray attenuation of different anatomical structures, especially between bone and soft tissue. X-ray astronomy is concerned with detecting and imaging X rays emitted by distant stars, quasars, and supernovas, as well as other sources, including our sun. Often, the origin of these X rays is the high temperature of the emitting object, which can be millions of degrees. At such extreme temperatures, the corresponding blackbody radiation consists primarily of X rays.

Gamma Rays

Gamma radiation is nuclear in origin. The wavelengths run from about 0.1 Å to less than 10⁻⁵ Å, and frequencies
range from ∼10¹⁹ Hz to more than 10²³ Hz. Because their photon energies are so large, of the order of keV to GeV, gamma rays are highly penetrating radiation. Gamma-ray (γ-ray) emission accompanies many of the nuclear decay processes of radioactive nuclides. Photons in this regime readily produce ionization and, in some cases, can initiate photonuclear reactions. The most common type of radioactivity is nuclear beta decay (β decay), of which there are two types — β⁻ decay and β⁺ decay. In β⁻ decay, the nucleus emits an electron, which causes the atomic number of the nuclide to increase by one. In β⁺ decay, the nucleus emits a positron (i.e., a positively charged electron), which causes the atomic number to decrease by one. These processes are important for γ-ray imaging in nuclear medicine because gamma rays are almost always a secondary by-product of the β decays that occur in radiopharmaceuticals administered to the patient. One reason for this is that when the original (or parent) nucleus decays by either β⁻ or β⁺ decay, it produces a new (or daughter) nucleus in an excited state. The subsequent nuclear deexcitation is accompanied by one or more gamma-ray transitions. The other reason is that the positron emitted during β⁺ decay always combines with a nearby electron, and the two particles mutually annihilate one another — a process known as electron–positron annihilation. The result is that the charged-particle pair vanishes and two back-to-back 511-keV photons are created. In positron-emission tomography (PET), these two photons are simultaneously detected by two oppositely facing detectors.
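The 511-keV figure quoted for annihilation photons is simply the electron rest energy m₀c². A quick numerical check (rounded constants; illustrative only):

```python
# Each annihilation photon carries the electron rest energy m0*c^2.
M_E = 9.109e-31   # electron mass, kg
C = 2.998e8       # speed of light, m/s
EV = 1.602e-19    # joules per electron volt
H = 6.626e-34     # Planck constant, J*s

rest_energy_keV = M_E * C**2 / EV / 1e3        # ~511 keV
photon_wavelength_m = H * C / (M_E * C**2)     # lambda = hc/E, ~2.4e-12 m
```

The computed wavelength, about 0.024 Å, sits squarely in the gamma-ray range quoted above (below 0.1 Å).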
Generation of EM Radiation

This section describes the basic principles behind producing electromagnetic radiation and identifies some of the radiative sources commonly found in practice; many are encountered in imaging systems.

Electric Dipole Radiation

According to classical theory, a charged particle that moves at constant velocity does not radiate any electromagnetic energy. However, should the particle experience an acceleration a, the surrounding field lines undergo rearrangement, and energy is carried away as electromagnetic radiation. As long as the accelerated charge is always moving at speeds much less than the speed of light (i.e., v ≪ c), the total power P radiated by such a particle (charge q) is given by

P = q²a²/(6πε₀c³).   (36)

This is known as Larmor's formula. Probably the most fundamental way to generate electromagnetic waves is to oscillate a charged particle harmonically back and forth along a line, say, the z axis, according to z(t) = z₀ cos ωt. This is often referred to as an oscillating electric dipole and is characterized by an instantaneous electric dipole moment of magnitude p, defined as

p(t) ≡ q · z(t).   (37)
The oscillating particle undergoes an acceleration a(t) ≡ d²z/dt² = −z₀ω² cos ωt; hence it radiates and produces so-called electric dipole radiation. According to Eq. (36), the time-averaged power radiated by the dipole is

P = p₀²ω⁴/(12πε₀c³),   (38)
where p₀ ≡ qz₀ is the size of the dipole moment (note: the average value of cos²ωt is simply 1/2). An important result is that the power radiated is proportional to the oscillatory frequency to the fourth power. Sufficiently far from the dipole (in the so-called far-field or radiation zone), the energy is carried away as an outgoing electromagnetic wave that has the same frequency as the oscillating charge. For a small dipole (i.e., z₀ much less than the emission wavelength), the angular distribution of the radiation follows a sin²θ form, as depicted in Fig. 10, where θ represents the angle from the oscillation axis of the dipole (the z axis). This means that an oscillating electric dipole does not radiate along its axis. On the other hand, the radiated wave is strongest in directions along the dipole's midplane. Another important fact is that the radiated waves are linearly polarized, and the electric field oscillates in a plane that contains the dipole. The electric-field lines in such a plane are illustrated in Fig. 11. Probably the most direct practical application of these ideas is the radio-frequency (RF) transmitting antenna. In this case, a long, thin conductor is driven at its midpoint by a sinusoidally varying voltage or current and pushes free electrons up and down at the frequency of the desired radiation. The performance is optimized by using a half-wave antenna, where the length of the antenna is set at one-half the wavelength of the emitted radio waves. For example, AM radio stations transmit at frequencies near 1 MHz, which gives corresponding wavelengths of a few hundred meters. Therefore, to act as a half-wave antenna, the height of the transmitting tower must be on the order of 100 m. At microwave frequencies (on the order of a GHz or larger), the length of a simple half-wave antenna is in the centimeter to submillimeter range.
In many radar and telecommunication applications, this size turns out to be too small for practical purposes and necessitates a number of other designs for antennas and antenna arrays in the
Figure 10. Angular distribution of the far-field radiation emitted by a small electric dipole oscillating along the z axis.
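Equations (36) and (38), together with the half-wave antenna estimate, are easy to exercise numerically. A minimal sketch (the function names are mine, not from the article):

```python
import math

C = 2.998e8        # speed of light, m/s
EPS0 = 8.854e-12   # permittivity of free space, F/m

def larmor_power(q, a):
    """Instantaneous radiated power of an accelerated charge, Eq. (36)."""
    return q**2 * a**2 / (6 * math.pi * EPS0 * C**3)

def dipole_avg_power(p0, omega):
    """Time-averaged power of a small oscillating dipole, Eq. (38)."""
    return p0**2 * omega**4 / (12 * math.pi * EPS0 * C**3)

def half_wave_length(freq_hz):
    """Length of a half-wave antenna: one-half the radiated wavelength."""
    return C / (2 * freq_hz)

tower_m = half_wave_length(1e6)   # AM broadcast near 1 MHz -> ~150-m tower
# Doubling the drive frequency raises the radiated power by 2^4 = 16:
ratio = dipole_avg_power(1.0, 2.0) / dipole_avg_power(1.0, 1.0)
```

The ω⁴ scaling is the important qualitative point: small increases in frequency produce large increases in radiated power.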
microwave region, including reflector-type antennas, lens antennas, and microstrip antennas (18), the details of which are outside the scope of this article. The EM radiation emitted by individual atoms, molecules, and quite often, nuclei is predominantly electric dipole radiation. This can be understood by considering, for simplicity, the circular orbit of a single electron in an atom (or proton in a nucleus). First of all, even though the particle maintains constant speed, it accelerates nonetheless because it experiences a centripetal acceleration toward the center of the orbit. Because the charge accelerates, it radiates electromagnetic energy. Secondly, circular motion is actually a superposition of two straight-line harmonic oscillations in orthogonal directions. Consequently, one can basically treat the circular motion of the charged particle as equivalent to an oscillating electric dipole whose angular vibrational frequency ω matches the angular velocity of the circulating charge, and the radiation emitted is electric dipole in character. One might argue that Larmor's formula and the resulting production of electric dipole radiation are based on classical physics, whereas atoms, molecules, and nuclei are inherently quantum objects. However, except for a few modifications to be discussed later on (see section on Absorption and emission of photons), the results of quantum-mechanical calculations agree essentially with classical predictions.

Figure 11. Snapshot of the electric field lines surrounding a small oscillating electric dipole. The dipole, located at the center, is oriented vertically (adapted from Fundamentals of Electromagnetic Phenomena by P. Lorrain, D. R. Corson, and F. Lorrain. Copyright 2000 W. H. Freeman and Company. Used with permission.)

Synchrotron Radiation

The synchrotron, a source of intense, broadband radiation, is a large facility used to accelerate electrons up to extremely high speeds and steer them around a circular storage ring at approximately constant speed. Bending magnets are used to maintain the circular trajectory, and the resulting centripetal acceleration causes the electrons to emit so-called synchrotron radiation (19). The electron speeds v attained are highly relativistic, which means that they are very close to the speed of light (v/c ≈ 1). Speeds of this magnitude correspond to kinetic energies (K) very large compared to the rest energy of an electron (m₀c² = 511 keV), or γ ≡ K/m₀c² ≫ 1. Currently, there are approximately 75 operating or planned storage-ring synchrotron radiation sources worldwide (20). Just under half of them are located in Japan, the United States, and Russia. The most energetic sources produce electron energies up to 7 GeV (or γ ∼ 10⁴). A highly relativistic electron (charge e) that travels in an orbit of radius R radiates a total power of (21)

P ≅ γ⁴ e²c/(6πε₀R²).   (39)
At each instant, the synchrotron radiation is preferentially emitted in the direction of the electron's velocity, tangent to the circular path. The radiation is concentrated within a narrow cone of angular width 1/γ (∼0.1 to 1 mrad). As the electrons circulate around the storage ring, the radiative beam resembles that of a searchlight. To use the radiation, a momentary time slice, or pulse, of this searchlight is sampled through a beam port. Because the duration Δt of the pulse is very short, the radiation received has a large bandwidth, Δω = 2π/Δt. The resulting frequency spectrum is broad and continuous. An important parameter of the spectrum is the critical frequency, given by ωc = 3cγ³/(2R). The corresponding critical photon energy is

Ec = γ³ (3ħc/2R).   (40)
Up to the critical value, all frequencies (or photon energies) strongly contribute to the radiated spectrum. However, above the critical value, the radiation spectrum drops off exponentially, and little contribution is made to the spectrum (5). An important property of synchrotron radiation is that, when viewed in the same plane as the circular orbit, it is linearly polarized along a direction parallel to that plane. When viewed above or below the orbit plane, the radiation is elliptically polarized (see section on Types of polarization). Often, it is desired to enhance the X-ray portion of the emission spectrum. In practice, this is accomplished by inserting either a beam undulator or a wiggler magnet. The actions of these devices and their effects on the angular distribution, frequency spectrum, and polarization of synchrotron radiation are discussed in the references. Suffice it to say that the addition of these devices has extended the usable spectrum into the hard X-ray regime (22). Astronomy provides many examples of naturally occurring synchrotron radiation. High-speed charged particles are found in the magnetic fields that permeate various regions of space. When this happens, the particles follow circular or helical trajectories and emit synchrotron radiation. Examples of this include radiation from particles trapped in the earth’s magnetic field, from sunspots, and from certain nebulae.
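Equation (40) can be evaluated to estimate where the synchrotron spectrum cuts off. The sketch below assumes a hypothetical 7-GeV ring with a 39-m bending radius; both the radius and the rounded constants are illustrative values of mine, not figures from the article:

```python
HBARC_EV_M = 1.9733e-7   # hbar*c in eV*m (rounded)
MEC2_EV = 5.11e5         # electron rest energy, eV

def critical_energy_eV(beam_energy_eV, radius_m):
    """Critical photon energy Ec = gamma^3 * (3*hbar*c / 2R), Eq. (40)."""
    gamma = beam_energy_eV / MEC2_EV     # gamma = K/(m0 c^2) for K >> m0 c^2
    return gamma**3 * 3 * HBARC_EV_M / (2 * radius_m)

ec = critical_energy_eV(7e9, 39.0)       # ~2e4 eV, i.e. hard X rays
```

The γ³ dependence is the reason multi-GeV rings reach the hard X-ray regime: here γ ≈ 1.4 × 10⁴ and the critical photon energy lands near 20 keV.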
Thermal Sources

The surface of any object at temperature T emits (and absorbs) thermal energy as electromagnetic radiation. The spectrum has the form

I(λ, T) = ε(λ) (2πhc²/λ⁵) [1/(e^(hc/λkBT) − 1)],   (41)
where kB = 1.381 × 10⁻²³ J/K is Boltzmann's constant. I(λ, T) represents the so-called spectral radiant flux density (or spectral radiant exitance), which is the power emitted per unit surface area per unit wavelength. The dimensionless quantity ε(λ) represents the emissivity of the radiating surface and, in general, is a function of wavelength. A surface whose emissivity is unity over all wavelengths is an ideal emitter (and absorber) and is called a blackbody radiator. It produces the Planck blackbody radiation spectrum. A graybody radiator is characterized by a constant emissivity whose value is less than unity. Figure 12 shows the shape of the blackbody spectrum for a few different temperatures. As temperature increases, the spectra peak at shorter and shorter wavelength λmax. This relationship is given quantitatively by the Wien displacement law:

λmax T = hc/(5kB) = 2.90 × 10⁻³ m · K.   (42)
For example, at typical ambient temperatures, say ∼300 K, the spectrum peaks in the mid IR at around 10 µm. It is not until a body is heated to 4000–5000 K that the thermal spectrum begins to peak in the visible. As the temperature is increased further, the peak shifts from red to violet and eventually enters the UV region at temperatures of 6000–7000 K. Notice that when the peak falls, for example, in the blue, there is still plenty of radiation emitted at the red, yellow, and green wavelengths. The mixture of different visible wavelengths
produces the glow of white-hot surfaces. At temperatures that exceed 10⁶–10⁸ K, like those encountered in the plasmas of evolving stars and some nuclear fusion reactors, blackbody radiation peaks well into the X-ray regime. In all, the spectral characteristics of the emitted radiation provide a reliable means for remotely sensing an object's surface temperature. The effective surface temperatures of the sun (∼5500 K) and of distant stars are determined in this manner. Another clear trend in the spectra is that the flux density integrated over all wavelengths (the total power emitted per unit surface area) increases rapidly with temperature. Specifically, the total blackbody emission is proportional to the absolute temperature to the fourth power:

I(T) = ∫₀^∞ I(λ, T) dλ = σT⁴.   (43)
This is known as the Stefan–Boltzmann law. The Stefan–Boltzmann constant σ has the value 5.67 × 10⁻⁸ W/(m²·K⁴). Optical sources that use radiation emitted by a heated element fall under the heading of incandescent sources. The globar is a source of this type used for the near and mid IR. It consists of a rod of silicon carbide that is joule heated by an electric current. The rod acts like a graybody that has a typical emissivity around 0.9. The temperature can be varied up to about 1000 K and produces usable radiation in the range of about 1–25 µm. Tungsten filament lamps are used to generate light in the visible through mid IR. The tungsten filament acts like a graybody and typically operates at about 3000 K. The filament is in a glass bulb that contains nitrogen or argon gas, which retards the evaporation of the filament. Any evaporation degrades the filament and leaves a darkened tungsten coating on the inside of the bulb, which diminishes light output. Halogen lamps mitigate this problem by the addition of iodine or bromine vapor into the gas. The halogen vapor combines with the tungsten on the bulb surface and forms tungsten iodide or tungsten bromide. The molecules dissociate at the heated filament, where the tungsten is redeposited, and the halogen atoms are recycled back into the surrounding gas.
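The Planck spectrum, Wien displacement law, and Stefan–Boltzmann law (Eqs. 41–43) can be sketched together; the helper names are mine and the constants are rounded:

```python
import math

H = 6.626e-34      # Planck constant, J*s
C = 2.998e8        # speed of light, m/s
KB = 1.381e-23     # Boltzmann constant, J/K
SIGMA = 5.67e-8    # Stefan-Boltzmann constant, W/(m^2*K^4)

def spectral_exitance(lam_m, T, emissivity=1.0):
    """Spectral radiant exitance I(lambda, T), Eq. (41), in W/m^3."""
    return (emissivity * 2 * math.pi * H * C**2 / lam_m**5
            / (math.exp(H * C / (lam_m * KB * T)) - 1))

def wien_peak_m(T):
    """Peak wavelength from the Wien displacement law, Eq. (42)."""
    return H * C / (5 * KB * T)

def total_exitance(T):
    """Stefan-Boltzmann law, Eq. (43): total power per unit area."""
    return SIGMA * T**4

peak_300K_m = wien_peak_m(300.0)        # ~1e-5 m = 10 um, in the mid IR
power_300K = total_exitance(300.0)      # ~460 W/m^2
```

At room temperature the peak falls near 10 µm and the emitted power density is a few hundred W/m², consistent with the values quoted in the text.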
Figure 12. Blackbody radiative spectra. Spectral flux density [×10⁷ W/(m²·µm)] versus wavelength (0–2.0 µm) for temperatures of 3000, 4000, 5000, and 6000 K.
Electric Discharge Sources. In a discharge lamp, an electric arc is struck between the electrodes at either end of a sealed transparent tube filled with the atoms of a particular gas. The gas atoms become excited and emit radiation whose spectrum is characteristic of the type of gas. Low-pressure tubes operated at low current produce spectral lines that are quite sharp, whereas lamps operated at high pressure and current tend to sacrifice spectral purity in favor of increased output intensity (4). The sodium arc lamp and the mercury discharge tube are two examples of electric discharge sources. Light from a sodium lamp is yellow and arises from the two closely spaced spectral lines (referred to as the sodium doublet) near 589 nm. Mercury vapor produces prominent visible lines in the violet (405 and 436 nm), green (546 nm), and yellow (577 and 579 nm). Fluorescent lamps, commonly found in household and office fixtures, are low-pressure glass discharge tubes that
contain mercury vapor. The inside surface of the tube is covered with a phosphor coating. Ultraviolet emission from the excited mercury vapor induces fluorescence (see section on Fluorescence and phosphorescence) in the phosphor, which produces visible light. Lasers. In 1917, Einstein first introduced the principle of stimulated emission of radiation from atoms (see section on Stimulated emission). In the process of stimulated emission, an existing radiation field induces the downward transition of an atom in an excited state. More specifically, when a photon is present whose energy hν matches the transition energy of the excited atom, there is an enhanced probability that the atom will deexcite. Upon doing so, the atom emits a photon that has precisely the same energy (or frequency), phase, polarization, and direction of propagation as the stimulating photon. The chance that an emission of this type occurs is proportional to the number of stimulating photons present. Devices that produce radiation by taking full advantage of this process are called lasers (23,24). The word laser stands for light amplification by the stimulated emission of radiation. Lasers emit an intense, highly monochromatic, collimated beam of optical radiation. They serve as light sources in many imaging applications, including holography, Doppler radar, LIDAR (light detection and ranging), and a number of areas in diagnostic medicine. All lasers consist of three basic components. First, the laser must have an amplifying medium in which stimulated emission occurs. Next, there is an optical resonator, or highly reflective cavity, in which the medium is enclosed. The resonator reflects the emitted photons back and forth between two highly reflective ends. This produces further stimulated emission, and leads to the intense buildup of a coherent electromagnetic field; a portion of the emission passes through one end of the cavity and is used as the light beam. 
The third component is the laser pump. To understand its purpose, consider the following: Under conditions of thermal equilibrium (temperature T), the amplifying medium contains many more atoms in low-energy states than in those of high energy; the number in any energy state E is proportional to the Boltzmann factor, exp(−E/kBT). Given two energy levels E₁ and E₂, the number of atoms N₁ and N₂ in these two states is in the ratio

N₂/N₁ = e^(−(E₂−E₁)/kBT).   (44)
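Equation (44) makes the need for pumping concrete. A quick evaluation (kB expressed in eV/K for convenience; the 2.5-eV transition is an illustrative value of mine):

```python
import math

KB_EV = 8.617e-5   # Boltzmann constant, eV/K

def population_ratio(delta_e_eV, temperature_K):
    """Thermal-equilibrium ratio N2/N1 = exp(-(E2 - E1)/(kB*T)), Eq. (44)."""
    return math.exp(-delta_e_eV / (KB_EV * temperature_K))

ratio = population_ratio(2.5, 300.0)   # ~1e-42: the upper level is essentially empty
```

For an optical transition of a few eV at room temperature, the exponent is on the order of −100, so the equilibrium upper-level population is negligible and must be built up by the pump.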
For transitions in the optical regime where E₂ − E₁ is around 2–3 eV, the number of atoms that reside in the upper energy level is on the order of some 10⁻⁵³ to 10⁻³⁵ less than the number in the lower state for a system at room temperature (∼300 K). Under these circumstances, it is much more likely that photons will be absorbed than undergo stimulated emission. Hence, for stimulated emission to dominate and produce the amplification of a light field, it is first necessary to devise a method for placing more atoms in higher energy states than in lower ones — a condition known as population inversion. The role of the laser pump is to accomplish this inversion. A number
Table 2. Properties of Some Different Types of Lasers

  Type                  Wavelength          Comments
  Gas Lasers
    Helium–neon         632.8 nm            Cw^a
    Argon ion           488.0, 514.5 nm     Cw
    Nitrogen            337.1 nm            Pulsed
    Carbon dioxide      10.6 µm             Cw, high power
  Solid-State Lasers
    Ruby                694.3 nm            Pulsed, high power
    Nd-YAG              1.06 µm             Cw
    Nd-glass            1.06 µm             Pulsed
  (Liquid) Dye Lasers
    Rhodamine 6G        560–650 nm          Cw/pulsed, tunable
    Sodium fluorescein  520–570 nm          Cw/pulsed, tunable
  Semiconductor Laser Diodes (Injection Lasers)
    GaAs                840 nm              Cw/pulsed
    AlGaAs              760 nm              Cw/pulsed
    GaInAsP             1300 nm             Cw/pulsed
  Excimer (Molecular) Lasers
    Xenon fluoride      486 nm              Pulsed, high power
    Argon fluoride      193 nm              Pulsed, high power
  Chemical Lasers
    Hydrogen fluoride   2.6–3 µm            Cw/pulsed
  Free-Electron Lasers
    Wiggler magnet      100 nm–1 cm         Tunable, high power

  ^a Continuous wave.
of different types of pumps exist that are based on various types of processes and energy transfer mechanisms. Table 2 lists the different types of lasers and the properties of some that are more commonly used. A description of the various laser types can be found in the references. Note, however, that the last entry in the table, the free-electron laser, is the only type of laser listed that is not based on the process of stimulated emission. Rather, radiation is produced by the oscillation of relativistic, free electrons in a wiggler magnet (25,26). The free-electron laser resembles a synchrotron source and does not function like other types of lasers.

Bremsstrahlung. Figure 13 is a basic drawing of a typical X-ray tube, a common source of X rays used in many industrial and medical settings, as well as a source for X-ray spectroscopy. An electric current heats a filament (the cathode) and generates a cloud of free electrons (a process known as thermionic emission). The electrons are accelerated across the gap in the evacuated tube by a potential difference on the order of 10³–10⁵ V. Then, the energetic electrons strike a metal target (the anode). Upon entering the target, the electrons are deflected by the coulombic field that surrounds the nuclei of the
target atoms and thus experience an acceleration. The electrons radiate electromagnetic energy, causing them to experience a rapid deceleration in the target. The radiation emitted is called bremsstrahlung, or braking radiation. Bremsstrahlung is characterized by a continuous spectrum; most of the radiation is emitted in the X-ray region. Figure 14 shows typical bremsstrahlung emission spectra from an X-ray tube that uses different accelerating voltages. An important feature to observe is that each spectrum has a minimum wavelength λmin below which no radiation is emitted. The existence of this wavelength cutoff can be explained only by considering a quantum, that is, photon, picture of the emitted radiation. In general, as an electron decelerates, it can lose energy by emitting any number of photons. However, the sum of the photon energies emitted by the electron in the vicinity of any one target nucleus cannot exceed the kinetic energy K of the electron upon entering the target. This kinetic energy is determined by the product

K = e · V,   (45)

where e is the electron's charge and V is the accelerating voltage of the X-ray tube. It is possible, on occasion, for all of the electron's kinetic energy (except for an extremely small amount due to recoil of the target nucleus) to be radiated as a single photon. Such a photon would correspond to one that has the maximum permissible energy, or equivalently, the shortest allowable wavelength λmin. The minimum wavelength is determined by setting the photon energy hc/λmin equal to the kinetic energy of the electron. The result is simply λmin = hc/eV, or

λmin (nm) = 1240 / V (volts).   (46)

In addition, the approximate peak of the spectrum is located at

λpeak ≅ (3/2)λmin.   (47)

As an example, the spectrum from a 20-kV accelerating potential has a cutoff wavelength of 0.062 nm and peaks at a wavelength of about 0.093 nm. In addition to bremsstrahlung, real X-ray tubes emit characteristic X rays specific to the target atoms. The characteristic X rays show up as pronounced narrow spectral lines that sit on top of the bremsstrahlung continuum. In practice, only a very small fraction of the electron beam energy shows up as X radiation. Most of the energy actually goes to heat the anode. Target materials like tungsten that have high atomic numbers produce a higher yield of X rays — but even then heating effects dominate. For this reason, the anode must be cooled and/or rotated to prevent melting.

Figure 13. A basic X-ray tube. Electrons from a heated filament are accelerated toward a metal target where X rays are produced.

Figure 14. Continuous X-ray, or bremsstrahlung, spectra for accelerating potentials of 10, 20, and 30 kV (intensity in arbitrary units versus wavelength, 0–0.20 nm).
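Equations (46) and (47) reduce to one-liners; the sketch below reproduces the 20-kV example from the text:

```python
def cutoff_wavelength_nm(volts):
    """Bremsstrahlung short-wavelength cutoff, Eq. (46): 1240 / V nm."""
    return 1240.0 / volts

def peak_wavelength_nm(volts):
    """Approximate spectral peak, Eq. (47): lambda_peak ~ (3/2) * lambda_min."""
    return 1.5 * cutoff_wavelength_nm(volts)

lam_min = cutoff_wavelength_nm(20e3)    # 0.062 nm for a 20-kV tube
lam_peak = peak_wavelength_nm(20e3)     # ~0.093 nm
```

Raising the tube voltage pushes both the cutoff and the peak to shorter (harder) wavelengths, which is exactly the trend shown in Fig. 14.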
Electromagnetic Waves in Matter

Speed of Light in a Dielectric Medium

In vacuum, all electromagnetic waves travel at the same speed, c = 1/√(ε₀µ₀) = 3.00 × 10⁸ m/s (see Eq. 2). This follows from the wave equation (Eq. 1) developed by Maxwell. Now, if one applies Maxwell's equations to a region of space filled with a nonconducting, or dielectric, medium, the wave equation appears yet again, except that ε₀ and µ₀, the electric permittivity and magnetic permeability of free space, are replaced by ε and µ, the permittivity and permeability of the medium. The result is that, like vacuum, homogeneous dielectric materials also support the propagation of EM waves, but now the wave speed is given by

v = 1/√(εµ).   (48)

Basically, ε and µ are measures of a material's response to applied electric and magnetic fields. When an E field is applied to a dielectric, the individual electric dipoles of the atoms or molecules become aligned. Similarly, an applied B field aligns the orientation of the individual electron current loops (or magnetic moments) of the atoms or molecules. These alignments produce internal fields within the dielectric. As a result, the actual electric or magnetic field in the medium is the sum of the externally
applied field and the internal field generated by the atoms or molecules of the material. The overall effect is to change the apparent coupling between the E and B fields in Maxwell's equations, which, in turn, changes the propagation speed of the electromagnetic waves. Ordinarily, the speed of an EM wave in various materials is quoted as an index of refraction (or simply index, for short). The index of refraction n of a medium is just the ratio of the wave speed in vacuum to the wave speed in the medium:

n ≡ c/v = √(εµ/ε₀µ₀).   (49)

For nonmagnetic media, µ and µ₀ are essentially identical, so n = √(ε/ε₀). The dielectric constant κ of a material is defined as its permittivity relative to that of free space:

κ ≡ ε/ε₀.   (50)

Therefore,

n = √κ.   (51)

The refractive indexes of some common substances in yellow light are listed in Table 3. The larger the index of refraction, the slower the speed of the wave. For example, the speed of light in air differs from the speed in vacuum by only about 0.03%, whereas the speed in water is about 33% slower than in vacuum. For a given frequency, the wavelength of the radiation depends on the speed of the wave, and hence on the refractive index of the supporting medium:

λ = v/ν = (c/n)/ν = λ₀/n,   (52)

where λ₀ is the wavelength in vacuum. In other words, compared to vacuum, the wavelength is reduced by the factor n. From Eq. (7), the wave number k is also larger than the wave number k₀ in vacuum by the factor n:

k = nk₀.   (53)

Table 3. Index of Refraction for Some Common Substances^a

  Substance             Index
  Gases (0 °C, 1 atm)
    Air                 1.000293
    Carbon dioxide      1.00045
  Liquids (20 °C)
    Benzene             1.501
    Ethyl alcohol       1.361
    Water               1.333
  Solids (20 °C)
    Diamond             2.419
    Silica              1.458
    Crown glass         1.52
    Flint glass         1.66
    Polystyrene         1.49

  ^a Using sodium light, vacuum wavelength of 589 nm.
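Equations (49) and (52) relate speed and wavelength through the index; a short check using the Table 3 value for water (function names are mine):

```python
def wave_speed_m_s(n, c=2.998e8):
    """Wave speed in a medium, from Eq. (49): v = c / n."""
    return c / n

def wavelength_in_medium_nm(lam0_nm, n):
    """Wavelength in a medium, Eq. (52): lambda = lambda0 / n."""
    return lam0_nm / n

v_water = wave_speed_m_s(1.333)                      # ~2.25e8 m/s (0.75 c)
lam_water = wavelength_in_medium_nm(589.0, 1.333)    # sodium light: ~442 nm in water
```

Note that the frequency is unchanged on entering the medium; only the speed and wavelength are reduced.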
The precise value of the speed, and hence the refractive index, of a substance depends on the frequency of the radiation, a general phenomenon known as dispersion. Notice that the values of n in Table 3 are for yellow light whose vacuum wavelength is 589 nm. Generally, visible light toward the violet end of the spectrum has a somewhat higher index and reddish light has an index that is somewhat lower. Dispersion occurs, because the speed of light is determined by the permittivity of the medium, and the permittivity is a function of frequency. A convincing example of this is that for microwave frequencies and lower, the relative permittivity (i.e., dielectric constant) κ for water is around 80. According to Eq. (51), the fact that yellow light has an index of 1.333 means that the permittivity of water at optical frequencies, must be drastically less. A more complete discussion of dispersion follows in the next section. Dispersion in a Dielectric Medium. Many of the basic features that underlie the frequency dependence of the speed of light in a dielectric are successfully explained by a relatively simple classical model of the interaction between an EM wave and the molecules of a material. The model considers that each molecule, in effect, is composed of two equal and opposite charges, q and −q, bound together. For simplicity, one of the charges is treated as stationary (i.e., infinitely massive), whereas the other charge (mass m) is free to move. In a nonpolar molecule, for example, q and −q correspond to the combined positive charge of the nuclei and the surrounding negatively charged cloud of electrons. When the molecule is unperturbed, the centers of the positive and negative charge distributions coincide. However, in the presence of an external electric field, the center of the electron cloud becomes displaced relative to the nuclei — in other words, the field produces an induced dipole moment in the molecule. 
Now the charge distribution is asymmetrical, and the electron cloud experiences an electrostatic restoring force that tends to pull it back to equilibrium. In our simple model, this restoring force is represented by a linear spring (force constant K) that connects the two charges. The system has the properties of a simple harmonic oscillator, characterized by a natural, or resonant, vibration frequency ω0 = √(K/m). When visible light, or any other electromagnetic wave, is present, the effect is to introduce a harmonically oscillating electric field at the location of the molecule. The field continually shakes the charge in the molecule back and forth at a frequency ω that matches that of the wave. The
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
oscillator (i.e., molecule) experiences both this harmonic driving force and the spring-like restoring force discussed before. There is a third and final force, namely, a damping force, that tends to retard the motion and cause energy losses in the system. It arises because of interactions between the oscillator and other nearby molecules and because energy is carried away by radiation from the oscillating (and hence accelerating) charge. The latter effect is referred to as radiation damping. It can also be thought of as the reactive force of the radiative field on the oscillator. Each of the three forces identified before contributes a term to the oscillator's equation of motion. If it is assumed that the EM wave is linearly polarized along the x direction, the charge will displace along that direction, and the equation of motion is

m d²x/dt² = −Kx − b dx/dt + qE0 cos ωt. (54)
The left-hand side is simply the product of the mass and acceleration of the oscillating charge, and the right-hand side is the sum of the three forces acting on it. The first term represents the restoring force, the second term is the damping force, and the third term corresponds to the driving force that arises from the interaction of the charge with the oscillating field of the EM wave. Notice that the damping term (where the damping constant is b) resembles a drag force and is proportional to the speed of the oscillator at a given instant. The negative signs attached to the first two terms guarantee that the restoring force and the damping force are directed opposite to the oscillator's instantaneous displacement and velocity, respectively. Equation (54) can also be written as

d²x/dt² + (1/τ) dx/dt + ω0²x = (q/m)E0 cos ωt, (55)
where τ = m/b represents an effective time constant associated with the damping, or energy dissipation, in the system. There are two important frequencies that appear — the resonant frequency ω0, which is solely a property of the oscillator or type of molecule, and ω, which is the frequency of the driving wave. Now, we seek the steady-state, or particular, solution to Eq. (55), which is an inhomogeneous, linear, second-order differential equation that has constant coefficients. In cases like this, where the driving term is oscillatory, the process of solving the equation is simplified by replacing the factor cos ωt by the complex exponential exp(−iωt), where i stands for the square root of −1. According to a mathematical identity known as Euler's formula,

e^{±iωt} = cos ωt ± i sin ωt, (56)

we see that cos ωt is identical to the real part of exp(−iωt). This means that the solution to Eq. (55) is obtained by first determining the mathematical solution to the equation

d²x̃/dt² + (1/τ) dx̃/dt + ω0²x̃ = (q/m)E0 e^{−iωt}, (57)

where x̃ is complex, and then taking the real part of this result. To solve Eq. (57), assume a solution of the form

x̃(t) = Ae^{−iωt}, (58)

where A is some complex oscillatory amplitude. Substitution in the equation of motion shows that the amplitude is a function of the driving frequency of the electromagnetic wave:

A(ω) = (q/m)E0 / (ω0² − ω² − iω/τ). (59)

The meaning of a complex amplitude is as follows: As for any complex quantity, A(ω) can be reexpressed in the polar form

A(ω) = |A(ω)|e^{iϕ}, (60)

the product of a magnitude and a complex phase factor (whose phase angle is ϕ). Substituting in the assumed form for the complex solution, Eq. (58), gives x̃(t) = |A(ω)| exp[−i(ωt − ϕ)]. Then, the actual physical solution is obtained by taking the real part of this expression, which is just x(t) = |A(ω)| cos(ωt − ϕ). In other words, a complex amplitude means that there is some phase difference ϕ between the driving force and the resulting oscillation. The actual amplitude of the oscillation is given by the magnitude of A(ω):

|A(ω)| = (q/m)E0 / √[(ω0² − ω²)² + ω²/τ²]. (61)

Figure 15. Resonance curve for a damped, driven oscillator (shown for 1/τω0 = 0.2). The curve shows how the oscillation amplitude |A(ω)| depends on ω/ω0, where ω is the driving frequency and ω0 is the natural, or resonant, frequency of the oscillator; the factor 1/τ is proportional to the amount of damping in the system.

This function, sketched in Fig. 15, represents the so-called resonance curve for a damped, driven oscillator. Naturally, the question at this point is how does finding the solution for the displacement x(t) lead to an understanding of dispersion? The answer has to do with the way different
driving frequencies induce different degrees of charge separation in the molecules and whether they result in different phase shifts between the perturbing wave and the oscillatory motion. The action of the EM field on a molecule is to produce an oscillating electric dipole, whose (complex) dipole moment p(t) = q · x˜ (t). Using Eqs. (58) and (59), one finds that the dipole moment can be reduced to the form
p(t) = α(ω)E(t), (62)

where

α(ω) = (q²/m) / (ω0² − ω² − iω/τ) (63)

and is called the molecular polarizability. This parameter is a measure of how easy it is, at frequency ω, for the oscillating electric field to induce a separation of the bound charges in the molecule. If we consider a region of dielectric that contains N molecules per unit volume, we can speak of the local electric polarization P of the medium, which is the dipole moment per unit volume:

P = Np = NαE. (64)

It is important not to confuse the polarization of the medium with the polarization associated with the EM wave. In a simple, linear dielectric, another measure of the degree to which a medium is polarizable is the value of the relative permittivity κ in excess of the value in vacuum (namely, unity). More specifically,

P = (κ − 1)ε0E. (65)

Combining Eq. (65) with Eq. (64) gives

κ = 1 + (N/ε0)α. (66)

Now, we can make the connection between our model and the refractive index, or speed of light in the medium. Recall that n = √κ, where the refractive index n is a real quantity. Note that in the present treatment, the molecular polarizability α is, in general, complex, and therefore, so is κ. Now, we introduce a complex index of refraction ñ, given by

ñ = √[1 + (N/ε0)α]. (67)

At sufficiently low density, such as in a gas, Nα/ε0 ≪ 1, and the right-hand side of Eq. (67) can be expanded in a Taylor series. Keeping only the two leading terms and plugging in Eq. (63) for α gives

ñ = 1 + (N/2ε0)α = 1 + (Nq²/2mε0) · 1/(ω0² − ω² − iω/τ). (68)

This complex refractive index can be broken up into its real part n and imaginary part nI, ñ = n + i nI, where

n = 1 + (ωp²/2) · (ω0² − ω²) / [(ω0² − ω²)² + ω²/τ²] (69)

and

nI = (ωp²/2) · (ω/τ) / [(ω0² − ω²)² + ω²/τ²]. (70)

We have introduced the parameter ωp, where

ωp² = Nq²/(mε0). (71)

ωp is known as the plasma frequency of the medium. As a specific example, consider variations in the refractive index of crown glass in the visible region of the spectrum (ν = ω/2π ≈ 5 × 10¹⁴ Hz). If we make the simplifying assumptions that the frequencies of interest are much lower than the natural vibrational frequency of molecules in the glass (i.e., ω ≪ ω0, and the light is far from resonance) and that damping effects are small (i.e., τ is very large), then

n ≈ 1 + (ωp²/2) · 1/(ω0² − ω²), (72)

and nI is essentially negligible. In other words, ñ ≈ n, and the refractive index is purely real, as one might expect. Figure 16 displays data for the index of crown glass as a function of the wavelength in vacuum. The accompanying curve is the best fit to the data using the very simple model of Eq. (72). The resonant frequency can be extracted from the fit; the result is ν0 = ω0/2π = 2.95 × 10¹⁵ Hz — that is, the resonant frequency is in the ultraviolet. The characteristic shape of the curve shown here is typical of the dispersion curve for many transparent substances at frequencies far below resonance: At short wavelengths, the index of a given material is higher than at long wavelengths. This type of behavior, where dn/dω > 0 (or dn/dλ < 0), is referred to as normal dispersion. Now, consider the behavior of the refractive index in the vicinity of resonance, when ω ≈ ω0. Then ω0² − ω² = (ω0 + ω)(ω0 − ω) ≈ 2ω0(ω0 − ω), and Eqs. (69) and (70) for the real and imaginary parts of the index reduce to

n = 1 + (ωp²/4ω0) · (ω0 − ω) / [(ω0 − ω)² + (1/2τ)²] (73)

and

nI = (ωp²/8τω0) · 1/[(ω0 − ω)² + (1/2τ)²]. (74)
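A minimal numerical sketch of the low-density complex index, Eq. (68), reproduces the qualitative behavior just derived: normal dispersion below resonance, a real index below unity just above it, and an absorption line of width about 1/τ. Every parameter value below is illustrative, not taken from the article.

```python
# Lorentz-oscillator sketch of Eq. (68): n~ = 1 + (wp^2/2)/(w0^2 - w^2 - i*w/tau).
# Illustrative parameters in arbitrary units (not values from the article):
w0 = 1.0        # resonant frequency
tau = 10.0      # damping time, so the linewidth 1/tau is much less than w0
wp2 = 0.01      # plasma frequency squared, N q^2 / (m eps0)

def n_complex(w):
    """Complex refractive index n + i*nI of the dilute medium, Eq. (68)."""
    return 1.0 + 0.5 * wp2 / (w0**2 - w**2 - 1j * w / tau)

# Normal dispersion: n grows as w approaches w0 from below
assert n_complex(0.5).real > n_complex(0.1).real
# Just above resonance, the real index dips below unity (anomalous region)
assert n_complex(1.05).real < 1.0
# Absorption nI peaks at resonance and falls to half at w0 +/- 1/(2*tau), Eq. (74)
assert abs(n_complex(w0 + 0.05).imag / n_complex(w0).imag - 0.5) < 0.05
```

The three assertions are exactly the features of the dispersion and absorption curves discussed in the following paragraphs.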
Figure 16. Refractive index of crown glass as a function of wavelength for visible light. Fitting the data to simple theory produces a resonant frequency of 2.95 × 10¹⁵ Hz, which is in the ultraviolet. (Data are from Ref. (2), Table 3.3, p. 71.)

Figure 17. Frequency dependence of the real part (n) and imaginary part (nI) of the refractive index in the vicinity of resonance ω0 for a dielectric medium whose damping is 1/τ.

A general sketch of n and nI near resonance is shown in Fig. 17. Consider the dispersion curve for n, the real part of the index. At ω = ω0, the value of n is unity, and just below and above resonance, there are a maximum and a minimum, respectively. For frequencies between the maximum and minimum, the slope of the curve is negative (dn/dω < 0), and one says that the medium exhibits anomalous dispersion. On either side of this region, the slope is positive (dn/dω > 0) and, as before, normal dispersion occurs. A puzzling feature of Fig. 17 is that for frequencies above the resonant frequency, the real index n is less than unity. This means that c/n, the speed of the wave, is greater than c, the speed of light in vacuum. However, according to Einstein's Theory of Special Relativity, it is impossible for a signal to propagate at a speed greater than c. To resolve this enigma, one must distinguish between the phase velocity and the group velocity of a wave. As mentioned earlier, v = c/n, the phase velocity, corresponds to the speed of a perfectly harmonic, monochromatic wave, as in Eq. (4). Such a wave can, in fact, travel faster than c. But to carry any information, some sort of modulation, or variation in amplitude, must be impressed on the idealized wave. Even the process of simply switching the source of the wave on or off introduces some degree of modulation because the wave will no longer be infinite in extent and duration. Information, or the signal carried by the wave, is encoded in the modulation envelope. In general, however, the speed of the envelope may be different from the phase velocity c/n. The rate at which the envelope propagates, known as the group velocity of the wave, is given by

vg = dω/dk. (75)

Compare this to the expression for the phase velocity:

v = ω/k = ω/(nk0) = (ω/k0)/n = c/n. (76)

It is not too difficult to show that these relationships lead to yet another expression for group velocity:

vg = c / (n + ω dn/dω). (77)
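Equation (77) can be checked against the far-from-resonance model of Eq. (72). The parameters below are our own choices (picked to give a crown-glass-like index near 1.5); only ω0 is the fitted value quoted in the text.

```python
import math

C = 2.998e8   # speed of light in vacuum, m/s

def n_model(w, wp2=3.3e32, w0=2 * math.pi * 2.95e15):
    """Far-from-resonance index of Eq. (72). wp2 is chosen here so that n comes
    out near 1.5; w0 is the crown-glass resonance fitted in the text."""
    return 1.0 + 0.5 * wp2 / (w0**2 - w**2)

def group_velocity(w, dw=1e10):
    """Eq. (77): vg = c / (n + w * dn/dw), with dn/dw by central difference."""
    dndw = (n_model(w + dw) - n_model(w - dw)) / (2 * dw)
    return C / (n_model(w) + w * dndw)

w_vis = 2 * math.pi * 5e14     # a visible-light frequency, as used in the text
# In normal dispersion dn/dw > 0, so the envelope travels slower than the phase:
assert group_velocity(w_vis) < C / n_model(w_vis) < C
```

The assertion is the statement made in the next paragraph: where dn/dω > 0, the group velocity is always less than both the phase velocity and c.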
Even though the phase velocity c/n is greater than c in the region of normal dispersion above the resonant frequency, the slope dn/dω is positive, and Eq. (77) guarantees that the group velocity is always less. Hence, the signal (i.e., the envelope of the wave) propagates at a speed less than c. Notice, however, that in the region of anomalous dispersion, where dn/dω is negative, Eq. (77) indicates that vg > v, and apparently the group velocity may exceed c. However, the idea of group velocity loses its significance in this region, and again the transfer of information never exceeds c (5).

Absorption Near Resonance

We have yet to discuss the meaning of nI, the imaginary part of the index of refraction, and its behavior as a function of frequency, as displayed in Fig. 17. To understand the significance of nI, consider a simple EM plane wave that propagates in the +z direction through a dielectric medium. In complex form, the wave is written as

Ẽ(z, t) = E0 e^{i(kz−ωt)}. (78)

k represents the wave number in the medium, or the product of the index of refraction and k0, the wave number in vacuum [see Eq. (53)]. We know that, in general, the medium has a complex index ñ = n + i nI. So, we replace k by ñk0, and the field reduces to

Ẽ(z, t) = E0 e^{−nIωz/c} e^{i(nωz/c−ωt)}. (79)
The actual physical field is obtained by taking the real part of this expression, so

E(z, t) = E0 e^{−z/δ} cos(nωz/c − ωt). (80)

The parameter

δ = c/(nIω) (81)

is called the skin depth of the medium at frequency ω. Recall that I, the intensity of the wave, is proportional to the magnitude of the field squared (averaged over time). So,

I = I0 e^{−(2/δ)z}, (82)

where I0 is the intensity at z = 0. For a purely real refractive index, nI = 0, which means that the skin depth is infinite and the wave propagates without any attenuation or energy loss. However, if the index is complex, the skin depth is finite, and the wave is attenuated. The energy carried by the wave is absorbed exponentially along the direction of propagation. The quantity 2/δ is called the absorption or attenuation coefficient of the medium. For a substance to be transparent to radiation of frequency ω, the skin depth at that frequency must be much larger than the thickness of the material. Now, return to Fig. 17 and the sketch of nI as a function of frequency. It shows that the imaginary part of the index is very pronounced in the vicinity of resonance, which signifies that absorption in this region is strong. The shape of the nI versus ω curve is Lorentzian, and the two frequencies that mark where the curve hits half the maximum height correspond to the edges of the anomalous dispersion region, also called the absorption band. The width of this region, or the curve's full width at half-maximum (FWHM), is simply 1/τ.

Corrections to Dispersion Model

Identified here are two important corrections to the simple molecular dispersion model developed thus far (27):

1. Until now, the model has assumed a low molecular density in the dielectric. However, in general, this assumption holds only for dilute gases. It turns out that the effects of increased density can be properly accounted for by simply replacing the quantity κ − 1 with 3(κ − 1)/(κ + 2) in Eq. (66). The revised equation is known as the Clausius–Mossotti relationship:

(3ε0/N) · (κ − 1)/(κ + 2) = α. (83)

The correction to the original equation comes from the fact that, in addition to the externally applied field, each molecule also experiences another field produced by the polarization of the surrounding molecules. The Clausius–Mossotti relationship is particularly useful because it relates macroscopic quantities [left-hand side of Eq. (83)] to α, which is a molecular quantity. Furthermore, because molecular polarizability should depend only on the type of molecule (and frequency), the left-hand side of Eq. (83) remains constant, independent of density. Hence, if one knows κ for a molecular species at one density, the value can be computed for another density, and likewise for √κ, or the refractive index. It can be shown that the only effect of increased density on the dispersive behavior of a dielectric medium is simply to shift the center of the absorption band downward from frequency ω0 to a value of √(ω0² − ωp²/3).

2. The model presented assumes that a molecule has only a single resonance ω0. In general, there are resonances at a number of frequencies, ω01, ω02, ω03, . . ., etc. As before, resonances in the UV portion of the spectrum are ordinarily associated with natural oscillations of electrons bound in the molecule. Resonances in other regions of the spectrum, however, usually originate from other oscillatory modes in the system. For example, resonances in the IR are usually caused by interatomic vibrations in the molecule. Each resonance ω0i has its own characteristic mass mi and damping time τi. The result is that now the molecular polarizability α contains a contribution from each resonant frequency of the molecule, so that

α(ω) = Σi αi(ω), (84)

where

αi(ω) = fi(q²/mi) / (ω0i² − ω² − iω/τi). (85)
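The two corrections can be exercised together in a short sketch: a polarizability summed over two hypothetical resonances, Eqs. (84)–(85) (the fi are the weighting factors that appear in Eq. (85)), fed through the Clausius–Mossotti relationship, Eq. (83). Every resonance parameter and the density below are illustrative only.

```python
import cmath

EPS0 = 8.854e-12   # vacuum permittivity, F/m

# Hypothetical resonances (f_i, q^2/m_i, w0_i, tau_i): an "IR" ionic mode and a
# "UV" electronic mode. All numbers are made up for illustration.
RESONANCES = [
    (1.0, 2.8e-12, 2.0e14, 1.0e-12),   # IR: large ionic mass, so small q^2/m
    (2.0, 2.8e-8,  2.0e16, 1.0e-13),   # UV: electronic q^2/m
]

def alpha_total(w):
    """Multiple-resonance molecular polarizability, Eqs. (84)-(85)."""
    return sum(f * q2m / (w0**2 - w**2 - 1j * w / tau)
               for f, q2m, w0, tau in RESONANCES)

def kappa_cm(w, N):
    """Permittivity from the Clausius-Mossotti relationship, Eq. (83), solved
    for kappa: with x = N*alpha/(3*eps0), kappa = (1 + 2x)/(1 - x)."""
    x = N * alpha_total(w) / (3.0 * EPS0)
    return (1.0 + 2.0 * x) / (1.0 - x)

N = 2.5e25                                  # gas-like number density, m^-3
n_visible = cmath.sqrt(kappa_cm(4.0e15, N))   # visible-range angular frequency

# Between the two resonances, the gas is weakly refracting and nearly lossless
assert 1.0001 < n_visible.real < 1.001
assert abs(n_visible.imag) < 1e-6
```

Doubling N (while keeping α fixed) and recomputing kappa_cm is exactly the density-scaling use of Eq. (83) described in point 1.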
Each resonance has a weighting factor, or oscillator strength, denoted by fi. As shown in Fig. 18, the overall effect of multiple resonances is that each gives rise to anomalous dispersion and its own absorption band.

Wave Propagation in a Conducting Medium

Any medium that can conduct an electric current contains free charges. Unlike a dielectric, these charges are
Figure 18. Real part of the index of refraction as a function of frequency for a dielectric medium that has multiple resonances at ω01 , ω02 , ω03 , . . ., etc.
unbound, and no restoring force acts on them when they are displaced by an electric field. In a metal, for example, outer (i.e., valence) electrons are released from the grip of individual atoms and, in the process, form a pool of so-called conduction electrons. These electrons are no longer localized near any particular atom. Instead, they are completely free to move about within the solid, much like the motion of particles in an ideal gas. The metal is said to contain a free electron gas. The basic behavior of EM waves in a conductor can be modeled by saying that the existence of an unbound charge is equivalent to setting ω0 → 0. Applying this condition to Eq. (63) for the molecular polarizability α and plugging the expression into Eq. (66) for κ gives

κ = 1 − ωp²/(ω² + iω/τ). (86)

But recall that κ = ñ² = (n + i nI)² = (n² − nI²) + 2in nI. Equating the real and imaginary parts of this expression to those from the right-hand side of Eq. (86) gives

n² − nI² = 1 − ωp²/(ω² + 1/τ²) (87)

and

2n nI = (1/ωτ) · ωp²/(ω² + 1/τ²). (88)

The most common situation is one where the damping is small and τ⁻¹ ≪ ωp. This allows a straightforward determination of n and nI at low, intermediate, and high frequencies, as discussed here. In the limit of low frequency, where ω ≪ τ⁻¹, the preceding equations reduce to

n² − nI² ≅ −ωp²τ² and 2n nI ≅ ωp²τ/ω. (89)

Then, solving for n and nI gives

n ≅ nI ≅ ωp/√(2ω/τ) ≫ 1. (90)

This result is applicable, for example, to radio waves and microwaves in metals. Later on, it will be shown that at these wavelengths, most of the energy carried by the wave is generally reflected from the surface of a metal, but because of the large imaginary part of the index, the small part that enters the medium is strongly absorbed and heats the conductor. From Eq. (81), the skin depth of the medium is given by

δ = √[2c²/(ωp²τω)]. (91)

It turns out (6) that the quantity ωp²τ is identical to η/ε0, where η is the dc (i.e., low-frequency) electrical conductivity of the medium. This means that the conductivity is the only material property needed to determine the skin depth:

δ = √(2c²ε0/ηω) = √(2/μ0ηω). (92)

The conductivities of most metals are on the order of 10⁸ Ω⁻¹·m⁻¹. So, at a typical microwave frequency (say 10 GHz), the skin depth is on the order of a micron or less. Compare this to the skin depth of seawater at the same frequency (seawater conducts because the salt dissociates into ions, which are free charges). Seawater has a conductivity of about 4–5 Ω⁻¹·m⁻¹, which gives a microwave skin depth of a few millimeters. The penetration of the wave can be increased still further by reduction to radio-wave frequencies. At a frequency of 10 kHz, for example, the skin depth of a wave in seawater is a few meters, showing that it can penetrate significantly. At intermediate frequencies, where τ⁻¹ ≪ ω ≪ ωp,

n² − nI² ≅ −ωp²/ω² and 2n nI ≅ ωp²/(ω³τ). (93)

Then, the real and imaginary parts of the index become

n ≅ ωp/(2ω²τ) and nI ≅ ωp/ω. (94)

In this frequency regime, nI is much larger than n, and the skin depth is simply

δ = c/ωp. (95)

These results apply to metals when light is in the infrared. Finally, consider the index at high frequencies, when ω ≫ ωp. Then,

n² − nI² ≅ 1 and 2n nI ≅ ωp²/(ω³τ), (96)

which leads to

n ≅ 1 and nI ≅ ωp²/(2ω³τ). (97)

Here, nI ≪ 1, so the skin depth is extremely large. The conductor is essentially transparent to the radiation. This explains, at least from a classical point of view, a number of important phenomena. For example, consider
copper, which has a free electron density of N = 8.48 × 10²⁸ electrons/m³ (this number is based on the fact that each copper atom contributes a single conduction electron to the solid). From Eq. (71), the plasma frequency of copper is νp = ωp/2π = 2.61 × 10¹⁵ Hz, which is in the near UV. Because the plasma frequency is much lower than the frequency of gamma rays, X rays, and even radiation in the far UV, these radiations are easily transmitted through copper and other metallic conductors. Another example involves transmission of radiation through the ionosphere, which contains ionized gas. Here, N, the density of these ions, is roughly 17 orders of magnitude lower than the value for a typical metal. This gives a plasma frequency of only a few megahertz, which explains why the atmosphere is transparent in the microwave region. Figure 19 is a plot of n and nI versus ω/ωp for the particularly simple case of τ⁻¹ → 0, or no damping. Using this idealization, the plasma frequency plays the role of a critical frequency. At frequencies less than this value, the index is purely imaginary — this means that the wave is completely reflected at the boundary of the medium. At frequencies higher than ωp, the index is purely real, and the conductor is transparent. Of course, any real conducting medium is characterized by a finite value of τ, and the actual curves will not be quite as simple as those shown here. However, except at very low frequencies (ω ≪ τ⁻¹), where the effects of damping are extremely important, there is a remarkable resemblance between the curves of Fig. 19 and those for real conductors (27).
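The skin-depth and plasma-frequency figures quoted above are easy to reproduce from Eqs. (71) and (92). The conductivity values below are typical handbook numbers, inserted by us for illustration.

```python
import math

MU0 = 4 * math.pi * 1e-7        # vacuum permeability, H/m
EPS0 = 8.854e-12                # vacuum permittivity, F/m
QE, ME = 1.602e-19, 9.109e-31   # electron charge (C) and mass (kg)

def plasma_frequency_hz(N):
    """nu_p = omega_p/(2*pi) with omega_p^2 = N q^2/(m eps0), Eq. (71)."""
    return math.sqrt(N * QE**2 / (ME * EPS0)) / (2 * math.pi)

def skin_depth(eta, freq):
    """Low-frequency skin depth, Eq. (92): delta = sqrt(2/(mu0*eta*omega))."""
    return math.sqrt(2.0 / (MU0 * eta * 2 * math.pi * freq))

# Copper: N = 8.48e28 electrons/m^3 gives nu_p ~ 2.61e15 Hz (near UV)
print(plasma_frequency_hz(8.48e28))
# Copper (eta ~ 6e7 / Ohm m) at 10 GHz: skin depth well under a micron
print(skin_depth(6.0e7, 10e9))
# Seawater (eta ~ 4.5 / Ohm m): millimeters at 10 GHz, meters at 10 kHz
print(skin_depth(4.5, 10e9), skin_depth(4.5, 10e3))
```

These reproduce the three penetration-depth scales discussed for metals and seawater, as well as the copper plasma frequency.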
Figure 19. Frequency dependence of the real part (n) and imaginary part (nI) of the refractive index for a conductor that has no damping (i.e., 1/τ → 0). The plasma frequency ωp plays the role of a critical frequency.

Polarization

Until now, it has been assumed that the electric-field vector of an EM wave is linearly polarized along a particular coordinate direction. In reality, a state of linear polarization corresponds to only one of a number of possible types of polarization, as discussed in this section.

Types of Polarization

Consider a monochromatic electromagnetic plane wave (frequency ω, wave number k) that propagates in the +z direction. The most general form for the electric-field vector is a superposition (i.e., sum) of two mutually orthogonal, linearly polarized waves, one polarized along the x direction and the other polarized along the y direction:

E(z, t) = îEx(z, t) + ĵEy(z, t). (98)

Each component is just a harmonic wave that has its own amplitude (E0x, E0y) and its own initial phase:

Ex(z, t) = E0x cos(kz − ωt),
Ey(z, t) = E0y cos(kz − ωt + Δ). (99)

Without any loss of generality, the initial phase of the x component of the wave has been set to zero, and Δ represents the phase difference between the y component and the x component. Depending on the value of Δ and the relative values of the two amplitudes, E0x and E0y, one obtains different types of polarization for the propagating wave. The different cases are presented here:

Linear Polarization. Consider the case when Δ = 0 or π. Then, the field is simply

E(z, t) = (îE0x ± ĵE0y) cos(kz − ωt). (100)

(The upper sign is for Δ = 0, and the lower sign is for Δ = π.) This is the situation where the x and y components oscillate precisely in synch with one another; the result is a linearly polarized wave whose amplitude is √(E0x² + E0y²). Together, the electric-field vector and the direction of propagation define the plane of vibration of the wave. Imagine that the wave is traveling directly toward you, and consider the time variation in the electric-field vector at some fixed position (i.e., z = const). As depicted in Fig. 20, the tip of the vector will oscillate back and forth along a line tilted relative to the +x axis by angle θ, where tan θ = E0y/E0x. When Δ = 0, the field vibrates in quadrants I and III in the x, y coordinate system, and when Δ = π, it vibrates in quadrants II and IV.

Circular Polarization. Let E0x = E0y = E0 and Δ = ±π/2. Then,

E(z, t) = E0[î cos(kz − ωt) ∓ ĵ sin(kz − ωt)]. (101)

In this case, the net field amplitude remains fixed at E0, but its direction rotates about the origin at angular velocity ω. As the wave approaches and passes a fixed position, the tip of the field vector traces out a circle of radius E0, and the wave is said to be circularly polarized. Figure 21 shows the situation for Δ = +π/2, where the rotation is
counterclockwise — this corresponds to so-called left-hand circular polarization. When Δ = −π/2, rotation becomes clockwise, and the wave has right-hand polarization. It is interesting to note that by combining left- and right-circularly-polarized light (i.e., applying the two different signs in Eq. (101) and adding the fields together), one obtains a wave that is linearly polarized.

Figure 20. Linearly polarized electromagnetic wave. For a plane wave [amplitude E0 = √(E0x² + E0y²)] that travels in the +z direction (i.e., toward the reader), the tip of the electric field vector oscillates along a line tilted at angle θ = tan⁻¹(E0y/E0x) relative to the x axis.

Figure 21. Circularly polarized wave (propagating toward the reader). The magnitude of the electric-field vector remains fixed and rotates at constant angular velocity about the z axis. The sense of rotation shown here corresponds to left-hand circular polarization.

Elliptical Polarization. Finally, consider the general case of arbitrary values for the amplitude components and relative phase. Following some involved algebra, the quantity kz − ωt can be eliminated from Eqs. (99), producing the following equation that relates Ex and Ey:

(Ex/E0x)² + (Ey/E0y)² − 2(Ex/E0x)(Ey/E0y) cos Δ = sin²Δ. (102)

This is the equation of an ellipse traced out by the tip of the electric-field vector, as shown in Fig. 22. The resulting wave is elliptically polarized, and Eq. (102) defines a polarization ellipse. As before, the field vector can rotate counterclockwise or clockwise, giving left-elliptically-polarized or right-elliptically-polarized radiation. The ellipse is inscribed in a rectangle whose sides 2E0x and 2E0y are centered at the origin. The tilt of one axis of the ellipse is given by θ, where

tan 2θ = 2E0xE0y cos Δ / (E0x² − E0y²). (103)

Figure 22. Elliptically polarized wave (left-handed). The polarization ellipse is inscribed in a rectangle whose sides are 2E0x and 2E0y.

Linear polarization and circular polarization are both special cases of elliptical polarization.
Unpolarized and Partially Polarized Radiation

As can be seen from the preceding discussion, an electromagnetic wave is polarized (linear, circular, or elliptical) as long as some definite, fixed phase difference is maintained between the oscillations of the two orthogonal field components. However, most sources of light and other EM waves produce radiation that is more complex due to the introduction of rapidly varying, random phase shifts in the x and y field components. In an incandescent or discharge lamp, for example, each excited atom emits a brief polarized wave train of light whose duration is on the order of about 10⁻⁸ s. Once such a wave train is emitted, the random nature of atomic emission makes it impossible to predict exactly when the next wave train will appear and what its polarization state will be. The result is that the phase shift fluctuates randomly and rapidly in time, and no well-defined polarization exists (at least, not for more than about 10⁻⁸ s). Radiation of this type is said to be unpolarized or randomly polarized. Another way to think of unpolarized radiation is as an incoherent superposition of two orthogonal, linearly polarized waves. By incoherent, we mean that there is no well-defined phase relationship between the two waves during any appreciable time. As unpolarized light advances, an observer might imagine seeing an extremely rapid sequence of polarization ellipses that have random orientations and eccentricities. Over time, however, no clear polarization ellipse emerges. The most general state of EM radiation is one of partial polarization. As in the unpolarized case, the phase difference between the orthogonal oscillations is not fixed. On the other hand, the phase difference is not completely random either. In partially polarized light, the phases of the two components are correlated to some degree, and the path of the electric-field vector tends to fluctuate about some preferential polarization ellipse.
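A small Monte Carlo sketch makes the contrast concrete: a fixed relative phase between the two components yields fully polarized light, while a phase that varies randomly from wave train to wave train averages to no polarization at all. The polarization measure used here is the standard Stokes-parameter formula, which this article does not derive; all sample counts and amplitudes are arbitrary.

```python
import cmath, math, random

def degree_of_polarization(samples):
    """Standard Stokes-parameter estimate: P = sqrt(s1^2 + s2^2 + s3^2)/s0,
    averaged over wave trains. `samples` is a list of (Ex, Ey) complex pairs."""
    n = len(samples)
    s0 = sum(abs(ex)**2 + abs(ey)**2 for ex, ey in samples) / n
    s1 = sum(abs(ex)**2 - abs(ey)**2 for ex, ey in samples) / n
    s2 = sum(2 * (ex * ey.conjugate()).real for ex, ey in samples) / n
    s3 = sum(2 * (ex * ey.conjugate()).imag for ex, ey in samples) / n
    return math.sqrt(s1**2 + s2**2 + s3**2) / s0

random.seed(0)
# Fixed phase difference between Ex and Ey: fully polarized light
fixed = [(1.0, cmath.exp(1j * 0.7)) for _ in range(4000)]
# Phase difference random from one wave train to the next: unpolarized light
rand = [(1.0, cmath.exp(1j * random.uniform(0, 2 * math.pi)))
        for _ in range(4000)]

assert degree_of_polarization(fixed) > 0.99
assert degree_of_polarization(rand) < 0.1
```

The two assertions correspond to the limiting degrees of polarization (unity and zero) discussed next.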
One way to think of the light is as a superposition of both polarized and unpolarized radiations. One sometimes refers to the degree of polarization of a radiation, which
is a number between 0 and 1. Light that is completely polarized, whether linear, circular, or elliptical, has a degree of polarization of unity, whereas light that is completely unpolarized corresponds to a value of zero. Partially polarized light is characterized by a fractional value between the two extremes.

Polarization-Modifying Materials

Here, we briefly mention certain classes of materials that can alter the state of polarization of an incident radiation. These materials are the basis for a number of optical elements common in many imaging systems. To begin with, certain materials can selectively absorb one of the two orthogonal polarization components and transmit the other. Materials of this type are said to exhibit dichroism. They include certain natural crystals, as well as commercially produced Polaroid sheet. In the field of optics, dichroic materials function as linear polarizers — they can be used to produce a beam of light that is linearly polarized, so that its plane of vibration is determined by the orientation of the element. These materials can also be used as polarization analyzers, which means that they block or transmit light that has certain polarization characteristics. A large number of crystalline solids exhibit a phenomenon known as birefringence. These are anisotropic materials that have two different indexes of refraction for a given direction of wave propagation; each index corresponds to one of two orthogonal polarization directions. When a general polarized beam travels through a birefringent crystal, each polarization component advances at a different phase velocity. This introduces an additional phase shift between the two components, causing a change in the polarization ellipse. The amount of shift depends on the precise values of the two indexes, the wavelength of the light, and the thickness traversed in the medium.
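The birefringent phase shift just described can be sketched with elementary Jones-matrix algebra, a standard formalism that this article does not otherwise use. The birefringence value below is roughly that of crystalline quartz; everything else is an illustrative assumption.

```python
import cmath, math

def waveplate_jones(delta_n, d, lam):
    """Jones matrix of a birefringent plate, fast axis along x: the y component
    is retarded by phi = 2*pi*delta_n*d/lam (standard Jones-calculus result)."""
    phi = 2 * math.pi * delta_n * d / lam
    return ((1.0, 0.0), (0.0, cmath.exp(1j * phi)))

def apply_jones(m, v):
    """2x2 matrix-vector product for Jones vectors."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

lam = 589e-9                  # yellow light, as in Table 3
dn = 0.009                    # roughly the birefringence of crystalline quartz
d = lam / (4 * dn)            # thickness chosen so that phi = pi/2
jin = (1 / math.sqrt(2), 1 / math.sqrt(2))   # linear polarization at 45 degrees
jout = apply_jones(waveplate_jones(dn, d, lam), jin)

# Equal magnitudes, 90 degrees out of phase: circularly polarized output
assert abs(abs(jout[0]) - abs(jout[1])) < 1e-12
assert abs(cmath.phase(jout[1] / jout[0]) - math.pi / 2) < 1e-9
```

A plate thickness giving a π/2 retardation turns 45-degree linear light into circular light, which is the quarter-wave-plate behavior taken up in the next paragraph.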
An optical element that produces a phase shift of π/2 is called a quarter-wave plate; it can transform linearly polarized light into circularly polarized light, and vice versa. A half-wave plate introduces a shift of π. One of its main uses is to reverse the handedness of right- or left-circular (or elliptical) light. Because of their effect on relative phase, wave plates, such as those mentioned here, are also referred to as phase retarders. Birefringence can also be induced in a material by subjecting it to an external electric, magnetic, or mechanical (i.e., stress) field. For example, when a constant electric field or voltage is applied, the medium experiences a so-called electro-optic effect. When an optically isotropic medium is placed in the E field, a refractive index difference Δn develops in the material for light polarized parallel and perpendicular to the applied field. The degree of birefringence (i.e., the value of Δn) that appears is proportional to the square of the field. This is known as the Kerr effect. A second important electro-optic phenomenon is the Pockels effect, which occurs only for certain crystal structures. In this case, the effect of the field is linear, that is, the induced birefringence is proportional to E, rather than E². Kerr cells and Pockels cells are electro-optical devices based on these effects. They
are used as optical switches and modulators, as well as high-speed optical shutters. Finally, certain materials exhibit a property known as optical activity. These are substances that rotate the plane of vibration of linearly polarized light, either clockwise or counterclockwise. Certain liquids, as well as solids, are optically active. Two simple examples are sugar–water and quartz.

Reflection and Refraction of Waves at an Interface

Consider the fate of an electromagnetic plane wave that propagates in a transparent medium (index n) toward a smooth interface formed with a second transparent medium (index n′). In general, upon encountering the interface, the incident wave gives rise to two other waves; one is transmitted beyond the interface into the second medium, and the other is reflected back into the first medium, as illustrated in Fig. 23. The propagation direction of each wave is represented by a ray, which, for isotropic media, also represents the direction of energy flow. The direction of a given ray is the same as the direction of the corresponding wave vector and hence tracks a line normal to the advancing wave fronts. The wave vectors for the incident, transmitted, and reflected waves are denoted by k, k′, and k″, respectively. The angles between the line normal to the interface and the incident, transmitted, and reflected rays are given by θ, θ′, and θ″, respectively.

Boundary Conditions for the Fields

By applying Maxwell's equations to a region of space that contains material media, one can derive four conditions that must be satisfied by any electric and magnetic fields, E and B, at an interface. These are referred to as boundary conditions for the fields. Assuming for the present that the media are nonconducting, the boundary conditions can be stated as follows:

1. The normal component of B must be continuous across an interface.
2. The normal component of εE must be continuous across an interface.
Figure 23. Incident, transmitted, and reflected rays at an interface. All three rays (as well as the normal to the interface) lie in the same plane, known as the plane of incidence.
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
3. The tangential component of E must be continuous across an interface.
4. The tangential component of µ⁻¹B must be continuous across an interface.

For an EM wave striking an interface (see Fig. 23), the importance of these boundary conditions is that the incident and reflected waves that reside on the first side of the interface produce a net E field and B field that can be compared to the fields on the transmitted side. By demanding that the field components satisfy the criteria listed, one finds that certain restrictions are imposed on the directions of wave propagation, as well as on the relative amplitudes and phases of the incident, transmitted, and reflected fields.

Geometry of Reflection and Refraction

The four boundary conditions on the fields lead to some simple relationships that involve the directions of the three rays in Fig. 23 (these results hold independently of polarization). The first important observation is that all three rays lie in the same plane, called the plane of incidence. Next, we have the Law of Reflection, which simply states that the angle of incidence θ matches the angle of reflection θ″:

θ″ = θ.  (104)

The Law of Reflection is the basis for tracing rays through a sequence of reflective surfaces. When the surface is curved, one constructs the plane tangent to the surface and measures the angles of incidence and reflection relative to the line normal to that plane. The equality between θ and θ″ holds for very smooth, or specular, surfaces. If the surface has any roughness, its rapidly varying curvature will still give rise to specular reflection on a local scale. However, on a larger scale, it will appear that the surface reflects radiation in all directions. This is called diffuse reflection. In general, a surface may produce both a specular and a diffuse component of reflected intensity. The final geometric condition involves the incident and transmitted rays and is called Snell's Law. It is simply

n sin θ = n′ sin θ′.  (105)

According to this relationship, whenever a wave travels from a medium of low index n (sometimes referred to as low optical density) into one whose index n′ is higher (or high optical density), the transmitted ray is bent toward the surface normal, that is, θ′ < θ (unless, of course, θ = 0, in which case θ′ and θ both vanish, and the transmitted wave travels straight through and normal to the interface). This is shown in Fig. 23. When a wave travels from a high-index medium to one whose index is lower, the transmitted ray bends away from the normal, and θ′ > θ. This phenomenon, where the wave changes direction after encountering an interface, is called refraction, and Snell's law is sometimes also known as the Law of Refraction. The transmitted ray is often referred to as the refracted ray, and θ′ is also known as the angle of refraction. Among other things, refraction is responsible for the focusing action of simple lenses.
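A short numerical sketch (not from the original text) shows the bending described by Snell's law; the indexes n = 1.00 and n′ = 1.50 are the air/glass values used throughout this article.

```python
import math

def refraction_angle(n1, n2, theta_deg):
    """Angle of refraction from Snell's law, n1 sin(theta) = n2 sin(theta')."""
    s = n1 * math.sin(math.radians(theta_deg)) / n2
    if abs(s) > 1.0:
        return None  # no transmitted ray (total internal reflection)
    return math.degrees(math.asin(s))

# Air (n = 1.00) into glass (n' = 1.50): the ray bends toward the normal.
theta_t = refraction_angle(1.00, 1.50, 30.0)
print(theta_t)   # about 19.47 degrees, smaller than the 30-degree incidence
```

Running the same function from glass into air at a steep enough angle returns `None`, anticipating the total internal reflection discussed later in the article.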
It is worth mentioning that the Laws of Reflection and Refraction can be derived from considerations other than Maxwell's equations and the boundary conditions for the fields. One method involves the geometric construction of wave fronts, starting with some initial wave front and knowing how the speed of the wave changes when going from one medium to the next. This is the method used in many introductory-level optics texts. A far more interesting method is based on an idea called Fermat's principle of least time. In its simplest form, it states that the path followed by a ray of light (or, for that matter, any other electromagnetic radiation) between any two specified points in space is the one that takes the shortest amount of time. For example, consider the path followed by the incident and refracted rays in Fig. 23, and compare it to a straight-line path that connects the same starting and ending points. Clearly, the straight path covers a shorter distance; however, light would take a longer time to cover that distance because the wave travels at a higher speed in the first medium (lower index) than in the second. Hence, the light's travel time is minimized by having a majority of its path reside in the low-index medium, causing the wave to bend, or refract, at the interface. By applying some simple ideas of calculus to the problem of minimizing the wave's travel time, one finds that Fermat's principle reproduces Snell's law.

Effect of Dispersion on Refraction

Recall that, because of dispersion, the refractive index of a material is, in general, a function of wavelength. This has an important effect on refraction, namely, that different wavelengths are refracted by different amounts. For example, consider a beam of white light obliquely incident on an air–glass interface, and recall that white light is a mixture of the various frequency components in the visible spectrum.
Because the index of glass increases slightly at the shorter wavelengths (normal dispersion), white light will be spread out into a continuum of colors; violet and blue (i.e., short wavelengths) are refracted the most, and red (or long wavelengths) is refracted the least. This phenomenon explains why a prism can be used to separate white light into its component colors. It also underlies the mechanism for rainbow formation when light enters a region of air that contains a mist of small water droplets. Chromatic aberration is an often undesirable effect that occurs in optical instrumentation. The value of the focal length of any lens in the system is governed by refraction at the two surfaces of the lens. Because the index of the lens material is a function of wavelength, so too is the focal length of the lens, and different colors will focus to different points — this has the effect of smearing images that are formed. A parameter that quantifies the degree to which a given lens material produces chromatic aberrations is called the dispersive power of the medium. Roughly speaking, dispersive power is the ratio of the amount of dispersion produced across the range of the visible spectrum compared to the amount the lens refracts light in the middle of the spectrum (when surrounded by air). The inverse of the dispersive power, known as the Abbe number, or V number, of the material, is given by
V = (nyellow − 1)/(nblue − nred).  (106)
nblue, nyellow, and nred are the indexes of the material at wavelengths of 486.1327 nm, 587.5618 nm, and 656.2816 nm, respectively. These values are chosen because they correspond to precisely known hydrogen and helium absorption lines (or Fraunhofer lines) that appear in solar absorption spectra. Larger Abbe numbers signify that less chromatic aberration is produced by that lens material. One way to correct for the problem of chromatic aberration is to use a so-called achromatic doublet. This is nothing more than a combination of particular converging (positive focal length) and diverging (negative focal length) lens elements that have well-chosen Abbe numbers, V1 and V2, and focal lengths, f1 and f2 (for the yellow wavelength listed before). For the most part, chromatic aberrations are eliminated by choosing the parameters to satisfy (2)

f1V1 + f2V2 = 0.  (107)
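The Abbe number and the doublet condition can be exercised in a short sketch (not part of the original text). The crown- and flint-glass indexes below are assumed, catalog-style values, not data from this article.

```python
# Abbe number V = (n_yellow - 1)/(n_blue - n_red), Eq. (106), and the
# achromatic-doublet condition f1*V1 + f2*V2 = 0, Eq. (107).
def abbe_number(n_blue, n_yellow, n_red):
    return (n_yellow - 1.0) / (n_blue - n_red)

# Assumed, catalog-style indexes at the 486.1, 587.6, and 656.3 nm
# Fraunhofer lines for a typical crown and a typical flint glass.
V_crown = abbe_number(1.5224, 1.5168, 1.5143)   # roughly 64
V_flint = abbe_number(1.6321, 1.6200, 1.6153)   # roughly 37

# Pick f1 for the converging crown element; Eq. (107) then fixes f2.
f1 = 0.10                      # meters
f2 = -f1 * V_crown / V_flint   # negative: a diverging flint element
f_combined = 1.0 / (1.0 / f1 + 1.0 / f2)
print(V_crown, V_flint, f2, f_combined)
```

The low-dispersion crown carries the positive power and the high-dispersion flint a weaker negative power, so the combined focal length stays positive while the color errors largely cancel.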
The Fresnel Equations

Now, we turn to the problem of relating the field amplitudes of the transmitted and reflected waves to that of the incident wave. This, again, is determined by applying the boundary conditions at the interface. In this case, the results depend on the polarization of the incoming wave. Figure 24 shows the situation for (a) the electric-field vector normal to the plane of incidence and (b) the electric-field vector parallel to the plane of incidence. The first case is sometimes referred to as the transverse electric, or TE, case, and the second case is referred to as the transverse magnetic, or TM, case.

Figure 24. Electric- and magnetic-field vectors for the incident, transmitted, and reflected waves at an interface. (a) TE polarization: the electric-field vector vibrates normal to the plane of incidence. (b) TM polarization: the magnetic-field vector vibrates normal to the plane of incidence.

In keeping with our previous convention for primed and unprimed symbols, the incident, transmitted, and reflected fields are denoted by the pairs (E, B), (E′, B′), and (E″, B″), respectively (a field vector represented by ⊙ indicates that it is pointing out of the page). Note that for either polarization, the set (E, B, k) defines a right-handed triad, and similarly for the primed vectors. Because some of the boundary conditions depend on the permittivity and permeability of the media, we will use the symbols ε and µ for the incident medium and the symbols ε′ and µ′ for the transmitted medium.

First consider TE polarization. The tangential component of the electric-field vector has to be continuous across the interface, so

E + E″ = E′.  (108)

In addition, the tangential component of µ⁻¹B must be continuous, so

−(1/µ)B cos θ + (1/µ)B″ cos θ″ = −(1/µ′)B′ cos θ′.  (109)

The magnetic fields can be eliminated in this equation because the magnetic- and electric-field amplitudes in a medium of index n are related by B = nE/c [see Eq. (3), where c is replaced by c/n]. In addition, the Law of Reflection allows replacing θ″ by θ. Further simplification occurs if we also assume that the media are nonmagnetic, which means that µ ≅ µ′ ≅ µ0. Then, the reflected field can be eliminated from Eqs. (108) and (109), and one finds an expression for the quantity E′/E. This ratio of the transmitted to the incident field is called the amplitude transmission coefficient t. For TE polarization, it is

tTE ≡ (E′/E)TE = 2n cos θ/(n cos θ + n′ cos θ′) = 2 sin θ′ cos θ/sin(θ + θ′).  (110)

The second expression for t is easily obtained from the first by applying Snell's law. One can also define an amplitude reflection coefficient r, which is the ratio of the reflected to the incident field. By eliminating the transmitted field, one finds that

rTE ≡ (E″/E)TE = (n cos θ − n′ cos θ′)/(n cos θ + n′ cos θ′) = −sin(θ − θ′)/sin(θ + θ′).  (111)

Equations (110) and (111) are known as the Fresnel equations for TE polarization. The Fresnel equations for TM polarization can be derived similarly. Using the same two boundary conditions as before demands that

E cos θ − E″ cos θ″ = E′ cos θ′  (112)

and

(1/µ)B + (1/µ)B″ = (1/µ′)B′.  (113)
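A brief sketch (not part of the original text) evaluates the TE coefficients of Eqs. (110) and (111) for the air/glass interface and checks them against the boundary condition E + E″ = E′ of Eq. (108), which implies 1 + r = t.

```python
import math

# TE (s-polarization) Fresnel amplitude coefficients for nonmagnetic
# media, Eqs. (110) and (111); theta' follows from Snell's law.
def fresnel_te(n1, n2, theta_deg):
    th = math.radians(theta_deg)
    thp = math.asin(n1 * math.sin(th) / n2)   # assumes no total reflection
    c1, c2 = math.cos(th), math.cos(thp)
    t = 2 * n1 * c1 / (n1 * c1 + n2 * c2)
    r = (n1 * c1 - n2 * c2) / (n1 * c1 + n2 * c2)
    return t, r

t, r = fresnel_te(1.0, 1.5, 30.0)
print(t, r)                        # r is negative: a pi phase change
print(abs((1 + r) - t) < 1e-12)    # boundary condition E + E'' = E'
```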
Again assuming nonmagnetic media, the Fresnel equations turn out to be

tTM ≡ (E′/E)TM = 2n cos θ/(n′ cos θ + n cos θ′) = 2 sin θ′ cos θ/[sin(θ + θ′) cos(θ − θ′)]  (114)

and

rTM ≡ (E″/E)TM = (n′ cos θ − n cos θ′)/(n′ cos θ + n cos θ′) = tan(θ − θ′)/tan(θ + θ′).  (115)

The amplitude reflection coefficients for both TE and TM polarization are plotted as a function of incident angle in Figs. 25 and 26 for light that strikes an interface between air and a typical glass. There are two graphs because the results depend on whether the incoming light is on the air side (n′/n = 1.50) or the glass side (n/n′ = 1.50) of the interface. The first is referred to as external reflection, and the second as internal reflection. Negative values of r indicate a phase change of π upon reflection.

Figure 25. Amplitude reflection coefficients for both TE and TM polarization as a function of incident angle for external reflection from a typical air (n = 1)/glass (n′ = 1.50) interface. θB denotes the Brewster angle for external reflection, which is the incident angle where the TM reflection coefficient vanishes.

Figure 26. Amplitude reflection coefficients for both TE and TM polarization as a function of incident angle for internal reflection from a glass/air interface (n/n′ = 1.50). θc is the critical angle for total internal reflection, and θB is the Brewster angle for internal reflection.

Reflectance and Transmittance

One is usually interested in the fraction of the wave's energy or power that is reflected and transmitted at an interface. The desired quantities are the reflectance and transmittance of the interface. The reflectance R is defined as the power carried by the reflected wave divided by the power of the incident wave; the transmittance T is the ratio of the transmitted power to the incident power.

To see how the reflectance and transmittance are related to the Fresnel coefficients just derived, we use the fact that the power carried by any one of the three waves is given by the product of the intensity (I, I′, or I″) and the cross-sectional area presented by the beam. If A is the illuminated area of the interface, then the cross-sectional areas of the incident, transmitted, and reflected beams are A cos θ, A cos θ′, and A cos θ″, respectively. Hence, the reflectance is

R = I″(A cos θ″)/[I(A cos θ)] = I″/I.  (116)

Because I = ½vε|E|² and I″ = ½vε|E″|² [see Eq. (14), in which c is replaced by v, the speed in the incident medium, and ε0 is replaced by ε], the expression, for either polarization, simply reduces to

R = |E″/E|² = |r|².  (117)

To get the transmittance, start with

T = I′(A cos θ′)/[I(A cos θ)] = v′ε′|E′|² cos θ′/(vε|E|² cos θ).  (118)

Assuming nonmagnetic media, ε′/ε = (n′/n)². In addition, make the substitution v′/v = n/n′. Then, the transmittance simplifies to

T = [n′ cos θ′/(n cos θ)]|E′/E|² = [n′ cos θ′/(n cos θ)]|t|².  (119)

Assuming no absorption at the interface, Eqs. (117) and (119) combined with the Fresnel equations yield R + T = 1, which is a statement of energy conservation.
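The energy-conservation statement R + T = 1 can be verified numerically. The sketch below (not from the original text) builds R from Eq. (117) and T from Eq. (119), including the obliquity factor n′ cos θ′/(n cos θ), for both polarizations.

```python
import math

# Reflectance R = |r|^2 (Eq. 117) and transmittance
# T = (n' cos(theta')/(n cos(theta))) * |t|^2 (Eq. 119).
def R_and_T(n1, n2, theta_deg, polarization="TE"):
    th = math.radians(theta_deg)
    thp = math.asin(n1 * math.sin(th) / n2)   # assumes no total reflection
    c1, c2 = math.cos(th), math.cos(thp)
    if polarization == "TE":
        r = (n1 * c1 - n2 * c2) / (n1 * c1 + n2 * c2)
        t = 2 * n1 * c1 / (n1 * c1 + n2 * c2)
    else:  # TM
        r = (n2 * c1 - n1 * c2) / (n2 * c1 + n1 * c2)
        t = 2 * n1 * c1 / (n2 * c1 + n1 * c2)
    obliquity = (n2 * c2) / (n1 * c1)
    return r * r, obliquity * t * t

for pol in ("TE", "TM"):
    R, T = R_and_T(1.0, 1.5, 40.0, pol)
    print(pol, R, T, R + T)   # R + T = 1: energy conservation
```

Note that |t|² alone does not give the transmittance; the beam narrows on refraction and the wave speed changes, and the obliquity factor accounts for both.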
First consider the special case of a wave that strikes an interface at normal incidence, that is, at θ = 0°. Then θ′ = 0° as well, and the reflectance and transmittance formulas become

Rnorm = [(n′ − n)/(n′ + n)]²  (120)

and

Tnorm = 4n′n/(n′ + n)².  (121)

These expressions follow from either the TE or TM Fresnel equations because the plane of incidence cannot be defined when θ = θ′ = θ″ = 0°. For an air–glass interface, we find that Rnorm = 0.04, that is, 4% of the intensity is reflected. This becomes especially important in optical systems that contain, for example, many lenses, where each presents two such interfaces. Hence, the need for applying antireflection coatings. Consider also the other extreme, where the incoming wave strikes the interface at glancing incidence, that is, when θ approaches 90°. The result for both TE and TM polarization is that Rglance → 1, and an air–glass surface will behave in a mirrorlike fashion. Figure 27 shows the reflectance plotted for various incident angles at an air–glass interface, for both external and internal reflection and for both TE and TM polarization. The most important features of these curves are discussed in the two sections that follow.
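The 4% figure, and its compounding over several surfaces, follows directly from Eqs. (120) and (121), as this short sketch (not from the original text) shows.

```python
# Normal-incidence reflectance and transmittance, Eqs. (120) and (121).
def R_normal(n1, n2):
    return ((n2 - n1) / (n2 + n1)) ** 2

def T_normal(n1, n2):
    return 4 * n1 * n2 / (n2 + n1) ** 2

R = R_normal(1.0, 1.5)
print(R)              # 0.04: 4% of the intensity reflected at air/glass
print((1 - R) ** 2)   # one uncoated lens (two surfaces) transmits ~92%
```

For a multi-element system the (1 − R) factors multiply surface by surface, which is why antireflection coatings matter so much in practice.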
Figure 27. Reflectance from a typical air/glass interface as a function of incident angle. The TE and TM polarization cases are graphed for external and internal reflection.
Polarization by Reflection

For TM polarization, notice that there is an incident angle at which R = 0 for both external and internal reflection. This angle is referred to either as the polarization angle or the Brewster angle θB. It is the angle at which there is no reflection of a wave polarized parallel to the plane of incidence. If the polarization of an incoming wave has components both parallel and normal to the plane of incidence, the reflected wave at the Brewster angle will be completely polarized normal to the plane; it will be TE polarized. The amplitude reflection coefficient rTM must vanish at the Brewster angle, so, from Eq. (115), tan(θB + θ′) must become infinite. This occurs when θB + θ′ = π/2. This leads to

tan θB = sin θB/cos θB = sin θB/sin(π/2 − θB) = sin θB/sin θ′.  (122)
Now, use Snell's law to replace the expression on the right by n′/n. The result is the formula for the Brewster angle:

tan θB = n′/n.  (123)
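Equation (123) is a one-liner in code. The sketch below (not from the original text) evaluates both Brewster angles for the air/glass pair used in this article and confirms that they are complementary.

```python
import math

# Brewster angle from Eq. (123): tan(theta_B) = n'/n.
def brewster_deg(n1, n2):
    return math.degrees(math.atan(n2 / n1))

ext = brewster_deg(1.0, 1.5)    # external: air -> glass
inte = brewster_deg(1.5, 1.0)   # internal: glass -> air
print(ext, inte, ext + inte)    # about 56.3 and 33.7; they sum to 90
```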
Furthermore, once θB is determined for external reflection, the Brewster angle for internal reflection is easily obtained by interchanging n and n′ in Eq. (123). One finds that the two Brewster angles are complements of one another. For the air–glass interface, θB = 56.3° for external reflection and θB = 33.7° for internal reflection. Polarized sunglasses take advantage of what happens at the Brewster angle. When driving a car, much of the glare from the road surface is sunlight reflected near the Brewster angle, so the light is TE polarized, that is, polarized parallel to the road surface. From Fig. 27, notice that the minimum of the reflectance curve about θB is quite broad. So even reflections on either side of the true minimum, at angles substantially removed from θB, are primarily TE polarized. Sunglasses can block out most road reflections by using linear polarizers oriented to block waves polarized parallel to the road surface. Boaters and fishermen also use polarized glasses to block TE waves reflected from the surface of lakes. This improves one's ability to view things below the water surface. In designing lasers, one places a so-called Brewster window (see Fig. 28) at each end of the cavity that contains the amplifying medium. A Brewster window is constructed simply by tilting the end of the enclosure (for example, glass) at the Brewster angle, making it completely transparent to laser light that is TM polarized. Conveniently, if the window is tilted at the air–glass Brewster angle, it is straightforward to show that refraction in the glass causes the light to strike the exiting surface at the glass–air Brewster angle (assuming that the surfaces are parallel), allowing complete transmission of the TM wave.

Total Internal Reflection

For internal reflection (n > n′), Fig. 27 shows that beyond some well-defined incident angle, the reflectance of the
Figure 28. Transmission of a TM polarized wave through a Brewster window. θB1 is the external (air–glass) Brewster angle, and θB2 is the internal (glass–air) Brewster angle.
interface has a value of unity. This holds true for either TE or TM polarization, and the angle is the same in both cases. The incident angle at and above which R = 1 is called the critical angle; it marks the onset of the phenomenon known as total internal reflection (TIR). Above the critical angle, the surface behaves as a perfect reflector, and no refracted ray is transmitted through the interface. It is clear from Snell's law why a critical angle must exist. When light is transmitted across an interface from a region of high index to one of low index, the refracted ray is bent away from the normal, that is, θ′ > θ. As the incident angle is increased, so is the refracted angle until, at some point, the refracted angle reaches the value θ′ = π/2. Once this occurs, any further increase in the incident angle θ causes Snell's law to break down, and no refracted ray is produced. The critical angle θc is determined by setting θ′ = π/2 in Snell's law, Eq. (105). The result is

sin θc = n′/n.  (124)
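A quick sketch (not from the original text) evaluates Eq. (124) for glass in air and checks the 45° prism geometry discussed next.

```python
import math

# Critical angle from Eq. (124): sin(theta_c) = n'/n, defined for n > n'.
def critical_angle_deg(n_inside, n_outside):
    return math.degrees(math.asin(n_outside / n_inside))

theta_c = critical_angle_deg(1.5, 1.0)
print(theta_c)            # about 41.8 degrees for glass surrounded by air
print(45.0 > theta_c)     # so a 45-degree prism face totally reflects
```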
As a simple application, consider light that enters the simple glass prism shown in Fig. 29. The critical angle for glass–air (n/n′ = 1.50) is θc = 41.8°. After being transmitted through the first interface, the light strikes the next interface at 45°. Because this angle is greater than θc, the light is totally reflected toward the next interface, again at an incident angle of 45°. The light is totally reflected once more, then exits the prism. Basically, the prism is useful as a light reflector, similar to a shiny metal surface, but it is not hampered by losses due to oxidation or corrosion. Total internal reflection is also the basic principle behind guiding light through fiber-optic cables. The extremely astute reader might realize that there is an apparent problem with this simple treatment of TIR: it violates the boundary conditions that must hold at the interface. Specifically, the incident and reflected waves give rise to electric/magnetic fields on the high-index side of the interface, but no fields exist on the low-index side. This is not permitted because certain tangential and normal field components must be continuous across the interface. The resolution of the problem lies in the fact that in addition to having a totally reflected wave, another
Figure 29. A light reflector that uses a simple 45°–90°–45° glass prism (in air). The light strikes each interface at 45°, which is larger than the glass–air critical angle of 41.8°. Hence, the light undergoes total internal reflection at both interfaces.
wave is produced that travels along the interface. It has the following basic form:
Esurface ∼ e^(−k′αz) cos[k′(1 + α²)^(1/2)x − ωt].  (125)

k′ is the wave number in the low-index medium, and

α = [(sin θ/sin θc)² − 1]^(1/2).  (126)
The z axis has been chosen so that it is normal to the interface, and the x axis is chosen along the interface (and in the plane of incidence). The interpretation of Eq. (125) is that there is a disturbance that propagates along the interface and has an amplitude that decreases exponentially with distance beyond the surface. Strange as it may seem, this surface wave, or evanescent wave, as it is commonly known, does not produce any energy flux across a single interface (a calculation of the Poynting vector shows that this is true). So, as originally stated, all of the energy carried by the incident wave is transferred to the reflected wave. The existence of the evanescent wave leads to an interesting phenomenon called frustrated total internal reflection (FTIR). When a second interface is brought into close proximity to the original one, some optical energy is taken from the reflected wave and leaks across the intervening gap. The fraction of incident energy transmitted to the other side can be varied by adjusting the thickness of the gap relative to the skin depth 1/(k′α) of the evanescent wave. This is the basic principle behind the design of many optical beam splitters. FTIR is the optical analog of the phenomenon of quantum-mechanical tunnelling across a potential barrier (15). Yet another important property of total internal reflection is that the reflected wave is phase-shifted relative to the incident wave. The size of the shift depends on the angle of incidence and how far it is above the critical angle, as well as the wave's state of polarization. This can be seen as follows. The amplitude reflection coefficients given by Eqs. (111) and (115) can, by using Snell's law, be
rewritten as

rTE = (cos θ − iα sin θc)/(cos θ + iα sin θc)  (127)

and

rTM = [cos θ − i(α/sin θc)]/[cos θ + i(α/sin θc)].  (128)
In other words, for θ > θc, the coefficients are complex because α, as given by Eq. (126), is always real. For either polarization, the reflectance R = |r|² = r*r is unity for any angle greater than or equal to θc, as expected. The phase shifts φ, on the other hand, reduce to the following functions of θ:

tan(φTE/2) = (sin²θ − sin²θc)^(1/2)/cos θ  (129)

tan(φTM/2) = (sin²θ − sin²θc)^(1/2)/(sin²θc cos θ).  (130)

φTE and φTM are plotted as a function of incident angle in Fig. 30 for n/n′ = 1.50. By totally reflecting a wave that is partly polarized in the TE and TM directions from one or more interfaces, it is possible to adjust the relative phase shift, Δ = φTM − φTE, of the two orthogonal components and change the polarization state of the wave. This is the basic principle behind a number of achromatic optical polarizing elements. For example, the Fresnel rhomb (2) is an element that transforms linearly polarized light into circularly polarized light (i.e., introduces a relative π/2 phase shift between initially in-phase and equal TE and TM components). Unlike a simple quarter-wave plate, which
can introduce only a π/2 shift at certain wavelengths, the rhomb is relatively insensitive in this regard and can be used throughout the visible spectrum. The idea of total reflection can also be extended to the high-frequency or X-ray region of the spectrum. However, referring back to Fig. 17, observe that in this regime, where the frequency is typically above some UV resonance, the index of refraction is less than unity. Now, if one considers an interface between, say, air and glass, X rays see the air as the higher index medium. Because the index of glass is typically only slightly less than unity, the critical angle for the total reflection of X rays in air from a glass surface is just slightly less than 90°. This means that total reflection will occur only for X rays that strike the interface at a glancing angle (i.e., almost parallel to the interface). This principle is used to steer and focus X rays in various instruments such as X-ray telescopes and spectrometers.

Reflection from a Conducting Surface

Consider a wave that strikes the surface of a metal or other conductor at normal incidence. As before, the reflectance of the surface is given by Eq. (120), but now n′ is replaced by the complex refractive index ñ = n + iκ of the conducting medium. Assuming that the wave starts out in air or vacuum, the reflectance is given by

R = |(ñ − 1)/(ñ + 1)|² = [(n − 1)² + κ²]/[(n + 1)² + κ²].  (131)
Figure 31 graphs the real and imaginary indexes, n and κ, as a function of wavelength for both silver and copper, along with plots of the corresponding reflectance at normal incidence calculated from Eq. (131). Notice that silver has a very high, essentially constant reflectance across the visible and IR spectrum and hence has excellent mirrorlike behavior at these wavelengths. Copper, on the other hand, reflects much more in the IR and reddish end of the visible spectrum than in the blue-violet; this gives copper its characteristic surface color.
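Equation (131) is easy to evaluate. The sketch below (not part of the original text) uses rough, illustrative values of n and κ, not the measured data of Ref. (28), to show why a metal with a large imaginary index reflects so strongly.

```python
# Normal-incidence reflectance of a conductor with complex index
# n~ = n + i*kappa, Eq. (131).
def conductor_reflectance(n, kappa):
    return ((n - 1) ** 2 + kappa ** 2) / ((n + 1) ** 2 + kappa ** 2)

# Rough, illustrative visible-light values (assumed for this example):
# silver-like metals have small n and large kappa, so R is near unity.
print(conductor_reflectance(0.05, 3.0))   # silver-like: about 0.98
print(conductor_reflectance(1.5, 0.0))    # lossless glass: back to 0.04
```

Setting κ = 0 recovers the dielectric result of Eq. (120), a useful consistency check.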
Figure 30. Solid curves: phase shift φ of the reflected wave relative to the incident wave for total internal reflection from a typical glass–air interface (n/n′ = 1.50) for TE and TM polarization. Dashed curve: Δ = φTM − φTE, the difference between the solid curves.

Interference of Electromagnetic Waves
All waves in nature (light waves, sound waves, water waves, etc.) exhibit the phenomenon known as interference. The fact that waves interfere with one another is one of the most important characteristics that distinguishes wave behavior from the behavior of particles. When one particle in motion encounters another particle, a collision occurs that alters the trajectory of the particles. However, when two waves meet, each wave continues on as if the other wave was not present at all. As a result, the net disturbance caused by the two waves is a sum of the two individual disturbances, and interference effects occur. In a nutshell, interference between waves can be explained by a simple law known as the principle of superposition. Simply stated, it says that when two or more waves are present, the value of the resultant wave, at a given point in space and time, is the sum of the values of the individual waves at that point. For electromagnetic
240
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
1.0
Source 1
Copper
k1
Observation point
x
R
0.8
0.6
′(×10−1) k2
0.4 Source 2
0.2
Figure 32. Interference of two waves (wave vectors k1 and k2 ) at a point in space.
n ′(×10−1) 0.0 300
400
500 600 700 Wavelength (nm)
800
Figure 31. Reflection from a copper and a silver surface (in air) as a function of wavelength. R, the reflectance at normal incidence, is determined by both the real and imaginary refractive indices, n and n′, of the metal. (Data are from Ref. (28).)

waves, this principle can be applied to the instantaneous electric-field vector at a given point, that is, the resultant electric field E at position r at time t is obtained by summing the individual electric-field vectors:

E(r, t) = Σ_j Ej(r, t).  (132)

Notice that it is the fields that add, not the intensities or fluxes.

Two-Beam Interference

Consider the simplest case of two monochromatic plane waves that have the same frequency ω, and their interference at some fixed point (r = const) in space. As shown in Fig. 32, the two waves could represent two beams (assume in vacuum) that have wave vectors k1 and k2, emitted by two different sources, that interfere at some chosen observation point. The electric-field vectors are

E1(r, t) = E01 cos(k1·r − ωt)
E2(r, t) = E02 cos(k2·r − ωt).  (133)

Because of the principle of superposition, the resultant field at the observation point is obtained by adding the instantaneous fields of the two waves, E(r, t) = E1(r, t) + E2(r, t). We are interested in determining the intensity produced by the interference at point r. This is given by the time-averaged value of the Poynting vector, which for a plane wave is

I = cε0⟨E²⟩ = cε0⟨E·E⟩.  (134)

From the superposition, this becomes

I = cε0⟨(E1 + E2)·(E1 + E2)⟩ = cε0⟨E1²⟩ + cε0⟨E2²⟩ + 2cε0⟨E1·E2⟩.  (135)

The last term, which involves ⟨E1·E2⟩, comes about because of interference. Clearly, when the polarizations of E1 and E2 are mutually orthogonal, the dot product vanishes, and no interference occurs. So, in general, interference occurs only when the polarizations have parallel components. Let us assume that E1 and E2 are identically polarized. Then, the intensity becomes

I = cε0⟨E1²⟩ + cε0⟨E2²⟩ + 2cε0⟨E1E2⟩  (136)

and the interference term can be reduced to

⟨E1E2⟩ = ⟨E01E02 cos(k1·r − ωt) cos(k2·r − ωt)⟩ = ½E01E02 cos[(k2 − k1)·r].  (137)
The last expression follows after using the standard trigonometric identity for the cosine of a difference and then averaging the resulting cross-terms over time. The argument (k2 − k1)·r represents the phase difference introduced between the two waves that arises from the difference in path lengths traveled by the beams. For the sake of brevity, we introduce the symbol ∆ to represent this phase difference. In general, ∆ also incorporates any initial phase difference associated with the oscillations of the source. After averaging the first two terms in Eq. (136) over time, we can rewrite the expression for the intensity as
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
I = ½cε0E01² + ½cε0E02² + 2√(½cε0E01²) √(½cε0E02²) cos ∆.  (138)

The terms ½cε0E01² and ½cε0E02² represent just the separate intensities, I1 and I2, of the individual contributing waves. Therefore, the final expression for the intensity takes the simplified form

I = I1 + I2 + 2√(I1I2) cos ∆.  (139)

For the special case where both waves have the same intensity, one sets I1 = I2 = I0, and Eq. (139) becomes

I = 2I0(1 + cos ∆) = 4I0 cos²(∆/2).  (140)
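The two-beam result is easy to verify numerically. The short Python sketch below (the helper name and the illustrative intensity values are ours, not part of the original text) evaluates Eq. (139) and checks it against the equal-intensity form of Eq. (140):

```python
import math

def two_beam_intensity(i1, i2, delta):
    """Total intensity of two interfering beams, Eq. (139)."""
    return i1 + i2 + 2.0 * math.sqrt(i1 * i2) * math.cos(delta)

i0 = 1.0
# Constructive limit (delta = 0): the intensity is 4*I0, not 2*I0
assert abs(two_beam_intensity(i0, i0, 0.0) - 4.0 * i0) < 1e-12
# Destructive limit (delta = pi): the waves cancel completely
assert abs(two_beam_intensity(i0, i0, math.pi)) < 1e-12
# Eq. (139) agrees with the cos^2 form of Eq. (140) at an arbitrary phase
delta = 1.234
assert abs(two_beam_intensity(i0, i0, delta)
           - 4.0 * i0 * math.cos(delta / 2.0) ** 2) < 1e-12
```

The first two assertions reproduce the constructive and destructive limits discussed below.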
A plot of the intensity as a function of phase difference ∆ is displayed in Fig. 33. The important features are as follows: When ∆ is any integral multiple of 2π, the waves are exactly in phase at the observation point, and the intensity is maximized; this is known as constructive interference. When ∆ is an odd-integral multiple of π, the waves are exactly 180° out of phase, and the intensity vanishes; this is called destructive interference. Observe that in constructive interference, the value of the intensity is 4I0, four times the intensity of either of the two individual waves. This occurs because interference arises from adding fields, not from adding intensities. When the waves are in phase, the observed intensity is obtained by first adding the individual field amplitudes and then squaring the result. Compare this to what would happen if one were to add intensities: the order of the two operations would be reversed, that is, the individual fields would be squared before adding the results, and one would end up with an observed intensity of only 2I0. This distinction will be discussed more fully in the section on Coherent versus incoherent sources.

A phasor diagram is a useful tool for visualizing interference effects. The basic principle underlying these diagrams is that at any fixed point in space, a harmonic wave can be thought of as the real part of a complex exponential of the form E0 exp[i(ωt + φ)]. This can be represented by a phasor, or vector in the complex plane, as shown in Fig. 34.

Figure 34. Phasor of magnitude E0 and initial phase φ that rotates about the origin at angular velocity ω.

The length of the phasor is E0 and, at t = 0, it makes an angle φ with the real axis. As time goes
on, the phasor rotates about the origin at angular velocity ω. At any chosen instant, the actual field is obtained by taking the geometric projection of the rotating phasor on the real axis, and the result is E0 cos(ωt + φ). Now suppose that one is interested in the interference of two waves that have the same polarization, both of frequency ω. Each of these waves can be represented by a complex exponential; one has initial phase φ1, and the other has initial phase φ2. If we assume, for simplicity, that the waves have equal amplitude, that is, E01 = E02 = E0, the exponentials are given by E0 exp[i(ωt + φ1)] and E0 exp[i(ωt + φ2)]. According to the superposition principle, the net field is obtained by adding these complex quantities and then taking the real part. Alternatively, one can draw the two phasors in the complex plane and add them together, head to tail, as vectors. This is demonstrated in Fig. 35 for arbitrarily chosen φ1 and φ2. The amplitude of the net field is then given by the length of the resultant phasor. The fact that the phasors are rotating becomes immaterial when one is combining waves of identical frequency. Because they rotate at the same rate, the relative angle ∆ = φ2 − φ1 between the phasors remains fixed and fixes the magnitude of the resultant as well. Some simple geometry applied to the diagram shows that the amplitude of the total field is

|Etotal| = 2E0 sin[½(π − ∆)] = 2E0 cos(∆/2).  (141)
As before, the intensity is

I = ½cε0|Etotal|² = 4(½cε0E0²) cos²(∆/2) = 4I0 cos²(∆/2).  (142)
Figure 33. Intensity pattern produced by the interference of two waves, each of intensity I0, as a function of the phase difference ∆ at some observation point.
Figure 35. The addition of the phasors for two harmonic waves that have the same amplitude (E0 ) and frequency. φ1 and φ2 are the initial phases of the two waves.
This is the same as the result obtained previously [Eq. (140)]. The reader should be able to deduce that the phasor diagrams for constructive and destructive interference, where ∆ is some integral multiple of π, are quite trivial. In constructive interference, the two phasors are aligned, so the total field is just 2E0, and the intensity is proportional to its square, giving four times the intensity of a single beam. For destructive interference, the phasors are oriented in opposite directions, and they cancel, giving the expected vanishing field and intensity. As is seen shortly, the real advantage of using phasor diagrams becomes more apparent when one considers the problem of interference among many waves. Ideas related to two-beam interference arise in the analysis of many standard problems, including those of two-slit interference, thin-film interference, and the interference of X rays scattered from crystals. Some of these will be discussed in the sections that follow. In addition, the results of this section also carry over as the basis of various interferometric instruments; probably the most widely known is the Michelson interferometer (2,4).

Coherent Versus Incoherent Sources

It should be apparent from the preceding analysis that interference effects depend critically on the precise phase difference between the waves involved. Unless a well-defined, constant phase difference is maintained over a reasonable amount of time, one will fail to observe interference between the waves. Two sources of light or other electromagnetic waves characterized by a fixed phase difference are said to be mutually coherent, and the same is said of the emitted waves. The radiation fields from such sources combine according to the superposition principle, that is, the fields add and lead to readily observable interference effects. On the other hand, if the phase difference between the sources has a fluctuating value, one says that the sources (and waves) are mutually incoherent.
In this case, the sources emit waves independently, causing the individual intensities, rather than the fields, to add. Incoherent sources produce no interference effects. In actuality, the concept of coherence is somewhat involved because no two real sources can maintain a fixed phase difference forever. Coherence is usually quantified in terms of a characteristic time, or coherence time, denoted by τc. One can think of τc as the time during which an approximately constant phase difference is maintained. Whether or not sources are coherent and hence can produce interference depends on the value of τc relative to the response time Tr of the detector used to observe the interference. In general, the condition for mutual coherence is given by τc ≫ Tr, whereas the condition for incoherence is τc ≪ Tr. When the coherence time and the detector response time have similar values, the situation is no longer clear-cut, and the simple descriptors coherent and incoherent lose their significance. Ordinary extended optical sources such as incandescent filament and gas-discharge lamps are incoherent. They are incoherent in the sense that there are rapidly fluctuating phase differences between waves that originate from
different sources or, for that matter, from different parts of the same source. The fluctuations arise because individual atoms emit independently from one another and randomly, causing the relative phases to become uncorrelated. The coherence time for a typical gas lamp is on the order of a fraction of a nanosecond, and it becomes impossible to observe interference effects from sources of this type even by using the fastest detectors. Electronic oscillators and antennas for generating radio waves and microwaves are generally considered coherent sources. Even so, the phase stability of these sources gives them typical coherence times on the order of a second or less. This means that even for these so-called coherent sources, any interference signal that develops will last only a few seconds, at best, before evolving into a different signal. A method for overcoming such drifts in the signal is to phase-lock the sources by using some type of feedback control. Similar considerations hold for lasers, which serve as coherent sources in the optical regime. Although much more stable than other optical sources, lasers have attainable coherence times only on the order of milliseconds, at most. One might be able to observe interference between beams from two independent lasers by employing a very fast detection scheme (sampling time less than a millisecond), but clearly this presents a challenge. To attain a truly stable interference signal requires long-term coherence between sources. This is usually done by extracting two or more beams from the same set of wave fronts that emanate from a single source. A specific example of this idea is described in the next section. Huygens’ Principle and Two-Slit Interference. The first, and probably most well-known, demonstration of interference between electromagnetic waves is the interference of light passing through two closely spaced slits. The phenomenon was first observed by Thomas Young in 1802. 
A diagram of a basic setup similar to that used by Young is shown in Fig. 36. Visible light from an incoherent source
Figure 36. Schematic of Young’s double-slit setup. Light that passes through a single slit illuminates a pair of slits separated by a small distance d. Light from the two slits produces an interference pattern that consists of bright and dark fringes on a distant screen. If the distance from the slits to the screen is large compared to the slit separation, then the distance marked ∆r is the extra path length traversed by light from one slit, compared to the other slit.
(coherent optical sources were unavailable at the time Young did his experiments) illuminates a single narrow slit. The light passing through illuminates two narrow slits separated by a very small distance d (typically on the order of tenths of millimeters or less), which in turn produce an intensity pattern on a screen a substantial distance D away (D d). Our discussion will assume that the light from the source is approximately monochromatic, centered on some wavelength λ. Understanding the role of the single- and double-slit arrangement requires stating a fundamental idea put forth by the Dutch physicist, Christian Huygens, during the latter part of the seventeenth century, that has become known as Huygens’ principle. Given a wave front at one instant, this principle provides a way to determine how that wave front propagates through space at later times. Basically, Huygens’ principle states that each and every point on a given wave front acts as a new source of outgoing spherical waves, called wavelets. Once one wave front is specified, a subsequent wave front can be constructed by tracking the envelope generated by these wavelets. To follow the development of the wave front at still later times, one only need apply Huygens’ principle again every time a newly constructed wave front is determined. Based on this understanding, suppose we let light from some arbitrary source fall on an extremely small, almost pointlike, aperture. When a wave front reaches the aperture, only the wavelet generated at the opening will appear on the other side — in effect, the aperture acts as a small source of spherical wave fronts. If the single aperture is now followed by two more apertures, closely spaced, the succession of spherical wave fronts will strike the pair of openings, causing the apertures to act like two point sources, emitting spherical wavefronts in the region beyond. Not only do they behave as point sources, but they behave as coherent sources as well. 
This property is essential if one expects to see interference effects. Even though the original light source was incoherent, the presence of the first aperture guarantees that the train of wave fronts that strike the double aperture stimulates the production of two wavelet sources that have a definite, fixed phase relationship with each other. As was learned earlier, a fixed phase difference is the defining characteristic of coherent sources. Even though our discussion has assumed point-like apertures, the essential ideas still carry over to a setup that uses long, narrow slits. The main difference is that instead of acting as coherent sources of spherical waves, the slits behave approximately as coherent sources of cylindrical waves. Now, referring to Fig. 36, consider the illumination at some arbitrarily chosen point on the screen that is positioned at a large distance beyond the double slit. The light level that appears at a given point depends on the phase difference appearing between the waves that originate at each of the two slits. This phase difference can be determined by knowing how much further the light from one slit has to travel compared to light from the other slit. Assuming that D, the distance to the screen, is large, this difference in path length is approximately given by the distance ∆r labeled on the diagram. If the
location of the point under consideration is specified by the angle θ measured relative to the symmetry line that bisects the two-slit arrangement, then, from some simple geometry, one sees that ∆r = d sin θ, where, again, d is the separation between the slits. Thus, the corresponding phase difference is determined by the size of the path difference relative to the wavelength of the light:

∆ = 2π(∆r/λ) = (2πd/λ) sin θ.  (143)
At the center of the screen where θ = 0, the waves from the two slits travel the same distance, so there is no phase difference and the waves arrive at the screen exactly in phase. The waves constructively interfere at this point, and high light intensity is observed. Constructive interference also occurs at other points on the screen on either side of θ = 0, in particular, those that satisfy the condition ∆ = 2πm, where m is an integer. Equivalently, the values of θ for constructive interference meet the condition

d sin θ = mλ,  m = 0, 1, 2, . . . .  (144)
Destructive interference, corresponding to points of zero intensity on the screen, is given by the condition ∆ = 2π(m + ½). So θ must satisfy

d sin θ = (m + ½)λ,  m = 0, 1, 2, . . . .  (145)
Regions of constructive interference are often called bright fringes, and those of destructive interference are called dark fringes. The integer m specifies the so-called order of a fringe. For example, the bright illumination at the center of the pattern corresponds to the zero-order bright fringe, and the adjacent bright fringes on either side of center are both first-order fringes, etc. Suppose that we call the position at the center of the screen x = 0 and measure the actual distance x between this point and the centers of the various bright and dark fringes. Then, as long as θ is sufficiently small, we can replace sin θ ≈ tan θ = x/D in Eqs. (144) and (145). The result is that successive bright fringes or successive dark fringes are equally spaced by the amount ∆x ≈ λD/d. Knowing the slit spacing and the distance to the screen, one can then measure the spacing between adjacent fringes and calculate the wavelength of the light. Using this technique in his early experiments, Young achieved wavelength determinations for visible light. It is important to notice that there is an inverse relationship between the slit spacing d and the fringe spacing ∆x. Hence, for a fixed wavelength of light, increasing the slit spacing decreases the spacing between fringes in the pattern (and vice versa). The functional form of the intensity distribution observed on the screen is given by Eq. (140), where ∆ is given by Eq. (143). In other words,
I = 4I0 cos²(πd sin θ/λ) ≈ 4I0 cos²(πdx/λD).  (146)
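The fringe conditions of Eqs. (144) and (145) and the small-angle pattern of Eq. (146) can be exercised together in a short Python sketch. The slit separation, wavelength, and screen distance below are illustrative values of our choosing:

```python
import math

lam = 550e-9   # wavelength (m), assumed green light
d = 0.25e-3    # slit separation (m), assumed value
D = 1.0        # slit-to-screen distance (m), assumed value
I0 = 1.0       # intensity from a single slit, arbitrary units

def intensity(x):
    """Two-slit pattern on the screen, Eq. (146), small-angle form."""
    return 4.0 * I0 * math.cos(math.pi * d * x / (lam * D)) ** 2

dx = lam * D / d                 # fringe spacing, about 2.2 mm here
for m in range(5):
    x_bright = m * dx            # Eq. (144): d*sin(theta) = m*lambda
    x_dark = (m + 0.5) * dx      # Eq. (145): half-integer orders
    assert abs(intensity(x_bright) - 4.0 * I0) < 1e-9
    assert abs(intensity(x_dark)) < 1e-9
```

Doubling d halves dx, illustrating the inverse relationship between slit spacing and fringe spacing noted above.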
Bragg Reflection of X-Rays from Crystals. In the X-ray region, a standard two-slit arrangement, such as that just described, would not produce an observable interference pattern. The reason for this can be seen by considering a typical X-ray wavelength, for example, λ = 1 Å. According to Eq. (144), if one sends X rays at this wavelength through even the most closely spaced slits, say on the order of only a few microns, the first-order maxima would appear at an angle no larger than a few thousandths of a degree. The spacing between higher order fringes would be similarly small and, in practice, one could not resolve any variations in intensity. Producing an observable X-ray interference pattern therefore requires two or more coherent sources separated by an extremely small distance d that approaches the wavelength of the radiation, that is, the separation needs to be on the order of angstroms or, at most, nanometers. These are exactly the types of spacings between atoms that form the lattice of a crystalline solid. When X rays strike a crystal, the beam is scattered in various directions by the individual atoms. The various atoms that make up the crystal act as a set of mutually coherent X-ray sources (see the section on Rayleigh scattering by atoms), and an interference pattern is produced. Correctly accounting for all the scattering from the individual atoms is somewhat involved (12), but the condition for constructive interference, or maximum intensity, turns out to be quite simple. Rather than concentrating on scattering from the individual atoms, one only needs to treat the X rays as if they are specularly reflected by various sets of parallel atomic planes within the crystal. A crystal structure, in general, has many sets of parallel planes; each set has its own orientation and plane-to-plane spacing. For now, however, consider only one such set and focus on two of its adjacent planes, as in Fig. 37.
Here, d stands for the spacing between successive planes, and φ is the angle of the incoming beam relative to the surface of a reflecting plane. X-ray interference occurs because of the phase difference between the waves that are reflected from the upper and lower planes. From simple geometry, one sees that the ray reflected from the bottom plane must travel an extra path length of 2d sin φ relative to the ray reflected from the top plane. The condition for constructive interference is that this distance must be an
integral multiple of the X-ray wavelength, or

2d sin φ = mλ,  m = 1, 2, . . . .  (147)
This is known as Bragg’s law. Actually, one should be considering interference between waves reflected from all of the many parallel planes within a given set. When this is done, however, one finds that this form of Bragg’s law is still correct (see the ideas related to multiple-beam interference in the next section). For a particular set of crystal planes, the spacing d is fixed. If, in addition, the wavelength is fixed as well, then varying the incoming angle φ produces alternating maxima and minima that represent constructive and destructive interference, respectively. Alternatively, Bragg’s law can be used to measure the plane separation if the wavelength is known, and vice versa. These ideas are the basis for standard crystal spectrometers (or X-ray diffractometers) (29). Keep in mind that, in general, there are many sets of crystal planes in a given crystal structure, and quite often, more than one species of atom is present in the crystal. The result is that real X-ray interference patterns from single crystals (also known as Laue patterns) can be somewhat complex. Nevertheless, the patterns obtained are, in effect, fingerprints of the various crystal structures.

Multiple-Beam Interference and Gratings

Now, we turn to the problem of the interference between waves emitted by N mutually coherent sources of the same frequency ω. The application that we have in mind is multiple-slit (i.e., N-slit) interference, depicted in Fig. 38. It is assumed that each slit is separated from its neighbor by the distance d. A geometric construction essentially identical to that described for two-slit interference can be applied here to the rays from two adjacent slits, showing that a path difference of d sin θ, or a phase difference of ∆ = (2πd/λ) sin θ, arises between the waves from neighboring slits.
In addition, if the slit arrangement is small compared with the distance to the observation screen, then all of the waves will have approximately the same amplitude E0 upon reaching the interference point.
Figure 37. Bragg reflection from atomic planes in a crystal.
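As a numerical illustration of the geometry in Fig. 37 and Bragg’s law, Eq. (147), the Python sketch below finds the allowed Bragg angles for an assumed plane spacing of 2.5 Å and the common Cu Kα wavelength of 1.54 Å:

```python
import math

d = 2.5e-10     # plane spacing (m), an assumed illustrative value
lam = 1.54e-10  # X-ray wavelength (m), the common Cu K-alpha line

# Eq. (147): 2*d*sin(phi) = m*lambda, so phi = arcsin(m*lambda/(2*d))
bragg_angles = []
m = 1
while m * lam / (2.0 * d) <= 1.0:   # sin(phi) cannot exceed unity
    bragg_angles.append(math.degrees(math.asin(m * lam / (2.0 * d))))
    m += 1

assert len(bragg_angles) == 3                 # only three orders occur here
assert abs(bragg_angles[0] - 17.94) < 0.01    # first-order angle, degrees
```

The finite list of allowed orders reflects the requirement sin φ ≤ 1; shorter wavelengths or wider plane spacings admit more orders.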
Figure 38. Interference of light passing through N slits that have separation d. The path difference between rays from two adjacent slits is d sin θ.
As before, we can treat the field produced by any one of the waves as a complex exponential, or equivalently, a phasor in the complex plane. Let us start by using complex exponentials, and let the field produced at the observation point by the wave from the first slit be E0 exp(iωt). Then, the waves from the other slits are just phase-shifted, in succession, by an amount ∆. Hence, the wave from the nth slit is given by E0 exp{i[ωt + (n − 1)∆]}. Superimposing all N waves gives a total complex field of

Ẽ = E0 e^(iωt) Σ_(n=1 to N) e^(i(n−1)∆) = E0 e^(iωt) Σ_(n=0 to N−1) e^(in∆).  (148)

The sum has the form of a standard geometric series Σ_(n=0 to N−1) α^n, where α = exp(i∆). The series converges to the value (1 − α^N)/(1 − α). Therefore,

Ẽ = E0 e^(iωt) [1 − e^(iN∆)]/[1 − e^(i∆)].  (149)

Figure 39. N = 4 slit interference pattern as a function of the phase difference ∆ between waves from two adjacent slits.
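The geometric-series step from Eq. (148) to Eq. (149) can be confirmed by brute-force summation; a short Python sketch with arbitrary illustrative values of N and ∆ (chosen so that ∆ is not a multiple of 2π, where the closed form is singular):

```python
import cmath

N = 7
delta = 0.83   # arbitrary phase step, not a multiple of 2*pi

# Direct sum of Eq. (148); the common factor E0*exp(i*omega*t) is dropped
# because it cancels in the comparison below.
direct = sum(cmath.exp(1j * n * delta) for n in range(N))

# Closed form of Eq. (149)
closed = (1.0 - cmath.exp(1j * N * delta)) / (1.0 - cmath.exp(1j * delta))

assert abs(direct - closed) < 1e-12
```

At ∆ equal to a multiple of 2π, every term in the direct sum is unity, and the sum equals N, the limiting value of the closed form.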
Hence, the observed intensity I is proportional to the squared magnitude of this complex field (or Ẽ times its complex conjugate). The result reduces to the form

I = I0 sin²(N∆/2)/sin²(∆/2),  (150)

where I0 is the intensity from an isolated slit. Note that for N = 2, one can show that Eq. (150) reduces to the expression previously obtained for two coherent sources, or a double slit [i.e., Eq. (140)]. Let us investigate the main features of the intensity pattern predicted by Eq. (150). When the phase difference ∆ is any integral multiple of 2π, both the numerator and denominator vanish, and the limiting value of the intensity becomes N²I0. These points in the interference pattern are referred to as principal maxima; they are points where constructive interference occurs. The condition for the principal maxima is ∆ = (2πd/λ) sin θ = 2πm, or

d sin θ = mλ,  m = 0, 1, 2, . . . .  (151)

This is exactly the same as the condition for maxima in the two-slit interference pattern. Points where the numerator vanishes, but the denominator does not, correspond to zeroes (or minima). These are given by the condition ∆ = 2πn/N, where n = 1, 2, . . ., N − 1. There are N − 1 minima between adjacent principal maxima. Between the minima are peaks known as secondary maxima. There are N − 2 secondary maxima between each pair of principal maxima. Figure 39 shows a plot of intensity as a function of ∆ for N = 4 slits. The phasor diagrams for the points labeled A, B, C, and D are displayed in Fig. 40.

Figure 40. Phasor diagrams for points labeled A, B, C, and D on the interference pattern of Fig. 39.

When the number of slits is large, the separation between the first minima on either side of a principal
maximum is very small. As a result, the principal maxima are very narrow and very intense (I = N 2 I0 ). When N is large, one refers to the slit arrangement as a grating. The principal maxima in the interference pattern are easily resolvable when a grating is used. This is important when the source contains spectral lines that have closely spaced wavelengths. Suppose that one wants to separate principal maxima of the same order m that correspond to two wavelengths λ1 and λ2 . Ordinarily, the condition that must be satisfied to say that two lines (i.e., wavelengths) are just resolved is that the principal maximum for λ1 occurs at the location of the first zero next to the principal maximum for λ2 (for the same value of m). This condition, called the Rayleigh criterion, is sketched in Fig. 41. It can be shown that the Rayleigh criterion is equivalent to
Figure 41. Rayleigh criterion for resolving two wavelength components (i.e., spectral lines). The wavelengths λ1 and λ2 are considered just resolved if the principal maximum for one component falls at the same location as the first zero of the other component.
requiring that

|λ2 − λ1|/λave = 1/(Nm),  (152)
where λave is the average of the two wavelengths. One defines the resolving power RP for a grating as

RP = λave/|λ2 − λ1| = Nm.  (153)
For example, consider a grating where N = 50,000. The resolving power of this grating for the first-order (m = 1) principal maximum is simply RP = 50,000. Suppose that two spectral lines, both green and close to a wavelength of 550 nm, enter the grating. Then, the grating can resolve a wavelength difference of |λ2 − λ1| = (550 nm)/50,000 = 0.011 nm. A number of other optical devices are based on the principles of multiple-beam interference. Fabry–Perot devices, for example, contain two closely spaced, highly reflective surfaces that face each other. Each time light traverses the gap between the two surfaces, most of the light is reflected back to the other surface; however, a small portion is transmitted through the device. If the reflectance of the surfaces is very close to unity, a great many of these reflections will occur; each produces a weakly transmitted beam in the process. These transmitted beams will superimpose and produce an interference pattern whose resolution characteristics are very high. Fabry–Perot interferometers are instruments based on this principle and are used for high-resolution optical spectroscopy. Optical elements that incorporate the Fabry–Perot geometry are also useful as narrowband spectral filters (2,4,27).

Diffraction

When light or other EM radiation encounters an aperture or obstacle whose size is either smaller than or on the order of the wavelength λ, the advancing wave produces a rather complex illumination pattern unlike that predicted by basic geometric or ray optics. A crisp geometric shadow is not formed, as one might expect. Instead, the observed irradiance varies smoothly in the vicinity of an edge and
is accompanied by alternating regions of brightness and darkness. Furthermore, the wave is bent to some degree around the edges of the aperture or obstacle into the region of the geometric shadow. These effects, especially the latter, fall under the heading of diffraction. Diffraction occurs because of the wave properties of EM radiation and basically encompasses any effects that represent deviations from geometric optics. When the size of the aperture or obstacle is significantly greater than the wavelength of the radiation, the observable effects of diffraction are extremely minimal, and one is justified in using simple ray-tracing techniques and the principles of geometric optics. Diffraction effects are generally classified into one of two regimes. When the radiation source and observation plane are both sufficiently far from the diffracting aperture (or obstacle) so that the curvature of the wave fronts at the aperture and observation plane is extremely small, a plane wave approximation can be used, and one speaks of far-field or Fraunhofer diffraction. When the distances involved are such that the curvature of the wave front at the aperture or observation plane is significant, one is dealing with near-field or Fresnel diffraction. As the observation plane is moved further and further beyond the aperture, the observed pattern continuously evolves from that predicted by geometric optics to that of Fresnel diffraction, to that of Fraunhofer diffraction. The changes in the illumination pattern are quite complex until one enters the Fraunhofer regime; at that point, any further increase in distance introduces only a change in the scale, not the shape, of the pattern observed. Because of its complicated nature, Fresnel diffraction will not be discussed any further here. Instead, the interested reader should consult some of the available references on the subject (2,4). 
The focus of this section is on the Fraunhofer regime, starting with diffraction from a single slit.

Single-Slit Diffraction

According to Huygens’ principle, each point of a wave front acts as a source of outgoing spherical wavelets. This means that when a plane wave encounters a slit, the diffraction pattern is essentially caused by the interference of wavelets that emanate from a huge number of infinitesimally spaced coherent point sources that fill the slit (see Fig. 42). To obtain the far-field (Fraunhofer) irradiance pattern, consider the corresponding phasor diagram shown in Fig. 43. Each phasor represents the electric-field contribution that originates from a single point in the slit and has magnitude dE. The phase difference d∆ for waves from adjacent points separated by a distance dy is

d∆ = (2π/λ)(dy) sin θ.  (154)
The various phasors form a circular arc of some radius R that subtends an angle 2α, and the resultant electric-field amplitude E at a given observation angle θ is given by the length of the chord that connects the end points of the circular arc, or E = 2R sin α. (155)
Figure 42. Single-slit diffraction is caused by the interference of wavelets emanating from an infinite number of closely spaced point sources that fill the slit (width a).
Figure 44. Intensity pattern for diffraction from a single slit of width a as a function of α = (π a/λ) sin θ.
Figure 43. Phasor diagram for single-slit diffraction. 2α is the phase difference between waves emanating from the extreme end points of the slit.
The radius R is just the arc length divided by 2α. Let the arc length be denoted by E0; this corresponds to the field amplitude at θ = 0 (i.e., where all the phasors are parallel). Then, Eq. (155) reduces to

E = E0 (sin α)/α.  (156)
This can be squared to give the irradiance:

I = I0 (sin α/α)²,  (157)
where I0 is the intensity at θ = 0. In addition, notice that 2α is the total phase difference between the waves that originate from the extreme end points of the slit; in other words, 2α = N(d∆). Using Eq. (154) gives α = [πN(dy)/λ] sin θ. However, N(dy) is just the width of the slit, which we shall denote by a. Therefore,

α = (πa/λ) sin θ.  (158)
A sketch of Eq. (157) for single-slit diffraction is shown in Fig. 44. The zeros of the intensity pattern occur when α is an integral multiple of π (except for α = 0), or using Eq. (158), when

a sin θ = mλ,  m = 1, 2, 3, . . . .  (159)
The central maximum is twice as broad and much more intense than the secondary maxima and essentially represents the slit’s image. The angular width θ of this image can be defined as the angular separation between the first minima on either side of the central maximum. Using the approximation sin θ θ and setting m = ±1 in Eq. (159) gives the width as θ =
2λ . a
(160)
When a, the size of the slit, is much larger than the wavelength, the spreading of the beam is small, and the effects of diffraction are minimal. As the ratio a/λ increases, one becomes more and more justified in using the approximations inherent in simple geometric optics. However, for slits that are small relative to the wavelength, pronounced beam spreading occurs, and the geometric-optics approach fails completely. For a fixed wavelength, the narrower the slit, the more the beam spreads. Said another way, the more one attempts to collimate a beam of light or other electromagnetic radiation by narrowing it, the more the beam tends to spread because of diffraction. Even a bare, propagating beam with no aperture present will spread naturally and lose collimation to some extent because of diffraction.

Fraunhofer Diffraction by a General Aperture

The single-slit result just obtained is actually a special case of a more general result of Fraunhofer diffraction. Consider a general aperture lying in the x, y plane. The observation plane is a distance D away and parallel to the plane of the aperture. Points on the observation plane are labeled according to a separate X, Y coordinate system (see Fig. 45). It turns out that, in general, the Fraunhofer diffraction pattern for the electric field in the observation plane is proportional to a two-dimensional
Fourier transform of the aperture:

E(ωx, ωy) = const × ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) e^{i(ωx x + ωy y)} dx dy.   (161)

The parameters ωx and ωy are related to the coordinates (X, Y) in the observation plane by

ωx = kX/D and ωy = kY/D,   (162)

where k = 2π/λ, as usual. ωx and ωy have units of inverse length and are usually referred to as spatial frequencies. g(x, y) is known as the aperture transmission function. When the aperture is simply a hole or set of holes in an otherwise opaque screen, the function g(x, y) is zero everywhere except within the hole(s), where it has the value of unity. The resulting diffraction patterns, that is, the squared magnitude of E(ωx, ωy), from a few standard shapes of open apertures are presented here. In general, however, g(x, y) can represent an aperture whose transmission varies in amplitude and/or phase as a function of x and y.

Rectangular Aperture. For a rectangular hole, whose dimensions are a and b in the x and y directions, respectively, the aperture transmission function is

g(x, y) = 1 for |x| ≤ a/2 and |y| ≤ b/2; 0 otherwise.   (163)

It is straightforward to compute the Fourier transform, and the diffraction pattern has the form

I = I0 (sin α/α)² (sin β/β)²,   (164)

where

α = ωx a/2 and β = ωy b/2.   (165)

The single-slit result [Eq. (157)] found in the previous section can be obtained by just letting b → 0 and observing that for small θ, the parameter α reduces to

α = ωx a/2 = kXa/(2D) = (π a/λ)(X/D) = (π a/λ) tan θ ≈ (π a/λ) sin θ,   (166)

as in Eq. (158).
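To see Eq. (161) at work, the sketch below (standard-library Python only; the function name and parameter values are illustrative, not from the article) evaluates the one-dimensional Fraunhofer integral for a slit numerically and compares it with the analytic transform a sin α/α of Eqs. (164)–(165):

```python
import cmath
import math

def fraunhofer_slit_field(omega_x, half_width, n=2000):
    """Numerically evaluate a 1-D version of the Fraunhofer integral, Eq. (161):
    E(omega_x) ~ integral of g(x) exp(i omega_x x) dx for a slit with
    g(x) = 1 on |x| <= half_width (i.e., a/2), using a midpoint Riemann sum."""
    dx = 2.0 * half_width / n
    total = 0j
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        total += cmath.exp(1j * omega_x * x) * dx
    return total

a = 2.0        # full slit width (arbitrary units, our choice)
omega = 3.0    # spatial frequency omega_x = kX/D (arbitrary value)
numeric = abs(fraunhofer_slit_field(omega, a / 2))
alpha = omega * a / 2                      # Eq. (165)
analytic = a * math.sin(alpha) / alpha     # transform of the slit: a sin(alpha)/alpha
assert abs(numeric - abs(analytic)) < 1e-4
```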
Circular Aperture. To handle the computation for a circular aperture of radius R, it becomes convenient to transform to plane polar coordinates, (x, y) → (r, φ) and (X, Y) → (ρ, ϕ). Then, the aperture transmission function is simply

g(x, y) = 1 for r ≤ R; 0 otherwise.   (167)

The observed diffraction pattern is circularly symmetrical and is plotted in Fig. 46. The irradiance has the form

I = I0 [2J1(u)/u]²,   (168)

where J1(u) is the first-order Bessel function whose argument is

u = (kR/D)ρ.   (169)

Figure 45. Coordinate systems for the aperture and observation planes used in Fraunhofer diffraction. The light is incident on the aperture plane (from below) and produces the diffraction pattern on a screen located in the observation plane.

Figure 46. Fraunhofer diffraction pattern produced by a circular aperture of radius R as a function of u = kRρ/D, where ρ is the distance measured from the center of the observed pattern. D is the distance from the aperture to the observation plane, and k = 2π/λ.
The bright, central area is called an Airy disk. The edge of the disk is determined by the first zero of the Bessel function, which occurs at u = 3.832. Then, Eq. (169) gives the angular extent θ of the disk (measured from its center) as approximately

θ ≈ 1.22 λ/(2R).   (170)
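The numbers in Eqs. (168)–(170) can be verified with a short numeric sketch (plain Python; the helper names and the integral-representation approach are ours). It builds J1 from its standard integral representation and locates its first zero by bisection:

```python
import math

def bessel_j1(x, n=2000):
    """First-order Bessel function from its integral representation,
    J1(x) = (1/pi) * integral_0^pi cos(t - x sin t) dt (midpoint rule)."""
    h = math.pi / n
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        s += math.cos(t - x * math.sin(t))
    return s * h / math.pi

def airy_intensity(u):
    """Relative irradiance [2 J1(u)/u]^2 of Eq. (168)."""
    if u == 0.0:
        return 1.0
    return (2.0 * bessel_j1(u) / u) ** 2

# Locate the first zero of J1 (the edge of the Airy disk) by bisection on [3, 4.5]
lo, hi = 3.0, 4.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if bessel_j1(lo) * bessel_j1(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
print(round(lo, 3))   # 3.832, and 3.832/pi = 1.22, the factor appearing in Eq. (170)
```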
The ability of imaging systems to resolve objects is diffraction-limited. For example, when an optical instrument like a telescope or camera images a distant point source, it basically forms a Fraunhofer diffraction pattern at the focal plane of a lens. In this case, the lens opening itself is the aperture. If an attempt is made to image two or more closely spaced point sources, each produces its own Airy disk, and they overlap. According to the Rayleigh criterion discussed previously (see section on multiple-beam interference and gratings), two images are considered resolved when the center of the Airy disk from one source coincides with the edge (i.e., first minimum) of the Airy disk from the other source. The minimum angular separation between the disks (as well as between the sources) is the same θ as given by Eq. (170).

Multiple Apertures. Fraunhofer diffraction from a collection of identical apertures can be handled by using an important result known as the array theorem. The theorem states that the diffraction pattern from an array of identical apertures is given by the product of the diffraction pattern from a single aperture and the interference pattern of an identically distributed array of point sources. An illustration of this idea is the problem of diffraction from N identical long slits whose widths are a and whose center-to-center separation is d. For large N, this would be the observed irradiance from a diffraction grating. The observed diffraction pattern is the product of the diffraction pattern from a single slit [Eq. (157)] and the interference pattern from N slits that have infinitesimal widths [Eq. (150)]. In other words, the pattern is given by

I = I0 (sin α/α)² [sin(Nδ/2)/sin(δ/2)]²,   (171)

where, as before, α = (π a/λ) sin θ and δ = (2π d/λ) sin θ. The pattern for N = 4 and d/a = 8 is shown in Fig. 47. The second factor in Eq. (171), associated with multiple-slit interference, determines the location of fringes in the pattern, and the first factor, which is due to slit width, determines the shape of the envelope that modulates the fringe intensities.

Figure 47. Fraunhofer diffraction from N = 4 identical, long slits. The ratio of the slit separation d to the slit width a is d/a = 8. The intensity pattern is displayed as a function of α = (π a/λ) sin θ.

Babinet’s Principle

An amazing fact about diffraction in the Fraunhofer regime is that if an aperture (or array of apertures) is replaced by an obstacle (or array of obstacles) that has exactly the same size and shape, the diffraction pattern obtained is the same except for the level of brightness. This result comes from a more general theorem known
as Babinet’s principle. For example, coherent light aimed at a long, thin object, like an opaque hair, will produce the same irradiance pattern as light that impinges on a long, thin slit of the same width. Rather than producing a geometric shadow and acting to block the light, the hair causes the light to bend, resulting, as before, in a diffraction pattern that has a central bright fringe.

Classical Scattering of Electromagnetic Waves

The term scattering refers to an interaction between incident radiation and a target entity that results in redirection and possibly a change in frequency (or energy) of the radiation. For radiation of the electromagnetic type, the word target almost always refers to the atomic electrons associated with the scattering medium. From a fundamental standpoint, the scattering of electromagnetic radiation from atomic and molecular systems needs to be treated as a quantum-mechanical problem. This approach will be taken later on in the section on the scattering of photons. In the quantum treatment, both elastic and inelastic scattering of photons can occur. An elastic scattering process is one in which no change in photon energy or frequency takes place. In an inelastic process, radiation experiences an upward or downward frequency shift due to the exchange of energy between photon and target. This section presents a classical wave picture of the scattering process. According to the classical approach, the only type of scattering that can occur is elastic scattering; inelastic scattering appears only as a possibility in the quantum treatment. Amazingly, however, the results for elastic scattering of classical waves are completely consistent with the quantum results for elastic scattering of photons.

Thomson Scattering of X rays and the Classical Concept of a Scattering Cross Section

Begin by considering the elastic scattering of low-energy X rays by an atomic electron, a process known as Thomson
scattering. For X rays, the photon energy ħω is much greater than the binding energy of an atomic electron, and it is quite reasonable to treat the electron as essentially unbound, or free, and initially at rest. Thomson scattering applies only to X rays whose photon energies are much less than mec² (≈ 511 keV), the electron’s rest energy. At higher energies, quantum effects become important, and X rays undergo Compton scattering (to be described later). In a classical treatment of Thomson scattering, one considers the incident radiation as a monochromatic wave (frequency ω, amplitude E0) that is polarized, for example, along the x direction. The wave’s electric-field vector exerts an oscillating force Fx = −eE0 cos ωt on a free electron. As a result of this force, the electron (mass me) undergoes a harmonic acceleration,

d²x/dt² = Fx/me = −(eE0/me) cos ωt,   (172)

or equivalently, a harmonic displacement,

x(t) = (eE0/me ω²) cos ωt.   (173)

Therefore, the electron behaves as an oscillating electric dipole,

p(t) = −e · x(t) = −(e²E0/me ω²) cos ωt,   (174)

which, in turn, radiates electromagnetic energy as a scattered wave, also at frequency ω. It can be shown that the time-averaged power carried by the scattered wave into a small cone of solid angle dΩ (measured in steradians) at an angle γ relative to the polarization vector of the incident field (i.e., the x axis) is given by

dP = I0 [e²/(4π ε0 me c²)]² (dΩ) sin² γ,   (175)

where I0 is the intensity of the incident wave. The sin² γ angular dependence of the scattered power should remind the reader of the distribution of radiation emitted from a simple oscillating electric dipole. The power of the scattered wave per unit solid angle divided by the incident intensity is known as the (angular) differential scattering cross section, and is represented by the symbol dσ/dΩ. For Thomson scattering, it has the particularly simple form

dσ/dΩ = (dP/dΩ)/I0 = r0² sin² γ,   (176)

where

r0 = e²/(4π ε0 me c²)   (177)

is known as the classical electron radius, which has a value of 2.82 × 10⁻¹⁵ m. Note that the dimensions of a differential cross section are that of an area (per steradian). The expression in Eq. (176) was derived for a linearly polarized wave. For an unpolarized beam of X rays, the differential cross section becomes (12)

(dσ/dΩ)unpol = ½ r0² (1 + cos² θ).   (178)

Here, θ is the angle between the propagation directions of the scattered and incident waves, usually referred to as the scattering angle. The total scattering cross section σ is obtained by adding up, or integrating, the expression for the differential cross section over all possible solid angles (i.e., over 4π steradians). Curiously, the result turns out to be the same, independent of whether the radiation is polarized or not:

σ = ∫ (dσ/dΩ) dΩ = ∫_{φ=0}^{2π} ∫_{θ=0}^{π} (dσ/dΩ) sin θ dθ dφ   (179)

= (8π/3) r0² = 0.665 × 10⁻²⁸ m².   (180)
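Plugging CODATA constants into Eqs. (177) and (180) reproduces the numbers quoted in the text (plain Python; the variable names are ours):

```python
import math

# CODATA values (SI units)
e = 1.602176634e-19       # elementary charge, C
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
m_e = 9.1093837015e-31    # electron mass, kg
c = 2.99792458e8          # speed of light, m/s

# Classical electron radius, Eq. (177)
r0 = e**2 / (4.0 * math.pi * eps0 * m_e * c**2)

# Total Thomson cross section, Eq. (180): sigma = (8 pi / 3) r0^2
sigma = 8.0 * math.pi / 3.0 * r0**2

print(r0)     # ~2.82e-15 m
print(sigma)  # ~6.65e-29 m^2, i.e., 0.665e-28 m^2 as in Eq. (180)
```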
Usually, cross sections are stated in units of barns; 1 barn is defined as 10⁻²⁸ m². Hence, the Thomson scattering cross section is 0.665 barns. One can think of the total cross section σ for a particular scattering process as the effective cross-sectional area of the scatterer (in this case, the electron) presented to the incident radiation beam, and it is proportional to the probability that the scattering process in question takes place. It is important to realize that a scattering cross section does not, in general, correspond to the true geometric cross-sectional area of the scatterer. Although not the case here, the differential and total cross sections are commonly functions of the incident energy (or frequency). The effective radius of a circular disk whose geometric area exactly matches that of the derived Thomson cross section is √(σ/π), or 4.6 × 10⁻¹⁵ m. In comparison, the radius of an atom is on the order of an angstrom, or 10⁻¹⁰ m. The fact that the cross section for Thomson scattering is so small explains, in part, why X rays penetrate so easily through many materials.

Scattering of Light by Small Particles

Now, we turn to another problem that can be approached by classical means, namely, the elastic scattering of light by small, dielectric particles. The details are rather mathematical and beyond the scope of this article, but the central ideas and important results are outlined here. Consider a homogeneous dielectric particle with index of refraction n immersed in a uniform background medium of refractive index n0. A typical scattering geometry is shown in Fig. 48. A beam of polarized light, whose wavelength is λ0 in vacuum, is incident on the particle. The propagation of the incoming beam is chosen to be in the +z direction, and it is characterized by the wave number k = kv n0, where kv = 2π/λ0 = ω/c is the wave number in vacuum.
The incoming light is treated as a polarized plane wave of the (complex) form E0 exp[i(k · r − ωt)], where k is the incident wave vector. The setup displayed in the figure assumes that the electric field of the incident wave is polarized along the x axis and that one is concerned only about radiation scattered into the y, z plane, also known as the scattering plane. The wave vector of the scattered wave is k′, and the angle θ between k and k′ is the scattering angle. Because the scattering is elastic, the magnitude of the scattered wave vector is the same as the magnitude of the incident wave vector. In general, the polarization vector of the scattered field could have a component normal to or parallel to the scattering plane. These are called the polarized (or vertical) and depolarized (or horizontal) components, respectively, of the scattered wave. An important parameter that arises in the analysis is the scattering vector, or wave vector transfer, Q. As shown in Fig. 49, Q is the difference between the incident and scattered wave vectors:

Q = k − k′.   (181)

Because |k| = |k′| = k, one can see from simple geometry that the magnitude of the scattering vector is directly related to the scattering angle θ by

Q = 2k sin(θ/2) = (4π n0/λ0) sin(θ/2).   (182)

Figure 48. Scattering of light from a homogeneous dielectric particle (refractive index n) immersed in a uniform background medium (refractive index n0). The wave vector k of the incident light is along the z axis and, in the geometry shown, the incident light is taken as polarized along the x axis. k′ is the wave vector of the scattered light in the y, z plane (or scattering plane), and θ is called the scattering angle. (Adapted from Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
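Equations (181) and (182) can be cross-checked against each other numerically. The short sketch below (standard-library Python; the wavelength, medium index, and angle are illustrative choices of ours) builds k and k′ explicitly and compares |k − k′| with 2k sin(θ/2):

```python
import math

def q_magnitude(lambda_vac, n0, theta):
    """|Q| from Eq. (182): Q = (4 pi n0 / lambda0) sin(theta/2)."""
    return 4.0 * math.pi * n0 / lambda_vac * math.sin(theta / 2.0)

# Illustrative values (our choices): green light in water, 60-degree scattering
lam0, n0, theta = 500e-9, 1.33, math.radians(60.0)
k = 2.0 * math.pi * n0 / lam0

# Vector definition, Eq. (181): Q = k - k', with k along +z and k' in the y,z plane
k_in = (0.0, 0.0, k)
k_out = (0.0, k * math.sin(theta), k * math.cos(theta))
Q_vec = tuple(ki - ko for ki, ko in zip(k_in, k_out))
Q_len = math.sqrt(sum(q * q for q in Q_vec))
assert abs(Q_len - q_magnitude(lam0, n0, theta)) < 1e-9 * Q_len
```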
The general problem to be addressed is to calculate the scattered wave at a position r (often called the field point) far from the scattering particle located at the origin. There are two basic classical approaches for doing so. One approach is based on an integral formulation; the other is based on a differential formulation of the scattering problem. The integral approach will be described first: For field points sufficiently far from the scatterer (far-field approximation), Maxwell’s equations can be recast into a form that produces an explicit expression for the scattered field that involves the integral ∫ E(r′) exp(−ik′ · r′) dV′ over all points r′ (often called source points) inside the particle volume (12). The integrand involves the quantity E(r′), which is the electric-field vector at each point within the particle. Because the internal field of the particle is unknown, it might appear that the integral approach to scattering produces an indeterminate expression for the scattered field. However, by making certain simplifying approximations, a number of useful results can be obtained, as discussed here:
Rayleigh–Gans–Debye Scattering. The Rayleigh–Gans–Debye (RGD) approximation is applicable under the following two conditions:

1. The refractive index of the scattering particle is close to that of the surrounding background medium:

|m − 1| ≪ 1,   (183)

where m = n/n0.

2. The particle is small relative to the wavelength of the light:

kd|m − 1| ≪ 1,   (184)

where d is the diameter, or characteristic size, of the particle.

Figure 49. The scattering vector Q is defined as the difference between the incident and scattered wave vectors k and k′.

If the RGD conditions are met, the scattering is sufficiently weak so that once the incident wave undergoes scattering at some point r′ within the particle, a second scattering event becomes highly unlikely. This allows one essentially to replace the internal field E(r′) with the value of the incident field at the same point:

E(r′) → E0 e^{i(k·r′ − ωt)}.   (185)

Using this replacement, the previous integral reduces to a tractable form, and the complete expression for the scattered field Es becomes

Es = k²(m − 1) [e^{i(kr−ωt)}/(2π r)] E0 V f(Q),   (186)
where V is the volume of the scattering particle. Notice that scattering of light occurs only when the refractive index of the particle is different from that of the background medium, that is, when m ≠ 1. In addition, for RGD scattering, the polarization of the scattered wave is always parallel to the polarization of the incident wave; no depolarized scattering component appears. The function
f(Q) is called the particle form factor (9,10); it depends on the details of the shape and size of the particle, as well as the particle’s orientation relative to the direction of the vector Q. The form factor is given by an integral over the particle volume:

f(Q) = (1/V) ∫_V e^{iQ·r′} dV′.   (187)
In physical terms, each point r′ acts as a source of spherical, outgoing waves. The total scattered wave at the field point r is a coherent superposition of these waves; each has an associated phase shift due to the position of the source point within the particle. This is very much like the superposition of Huygens wavelets encountered in standard interference and diffraction problems. Consider scattering from a small dielectric sphere of radius R. In this case, the particle form factor is only a function of u = QR and is given by

f(u) = (3/u³)(sin u − u cos u).   (188)
The scattered intensity is proportional to |Es |2 , which is proportional to |f (u)|2 . The angular distribution of the scattered light is plotted in Fig. 50. An important observation (notice the logarithmic vertical scale) for RGD scattering is that the larger the particle, the more it tends to scatter in the forward (i.e., small θ ) direction.
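The forward-peaking just described can be seen directly from Eq. (188); the sketch below (plain Python; the small-u series guard is our addition) evaluates |f(u)|² at a few values of u = QR:

```python
import math

def sphere_form_factor(u):
    """Form factor f(u) of Eq. (188) for a homogeneous sphere, u = QR;
    a small-u Taylor-series guard (our addition) handles the u -> 0 limit f -> 1."""
    if abs(u) < 1e-4:
        return 1.0 - u * u / 10.0   # leading terms of the expansion of Eq. (188)
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u**3

# Scattered intensity ~ |f(u)|^2 falls off away from the forward direction (u = 0)
intensities = [sphere_form_factor(u) ** 2 for u in (0.0, 1.0, 2.0, 4.0)]
assert intensities[0] == 1.0
assert intensities[0] > intensities[1] > intensities[2] > intensities[3]
```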
Classical Rayleigh Scattering. Rayleigh scattering occurs when the particle size is very much smaller than the wavelength of the light, irrespective of the relative refractive index m. In other words, it applies when 2kR ≪ 1, or equivalently, when QR ≪ 1. In this limit, the form factor approaches unity, and there is no θ dependence to the scattering. The RGD expression for the scattered field reduces to

Es = (2/3)(m − 1) k² R³ E0 [e^{i(kr−ωt)}/r].   (189)
However, this expression is not quite right because the RGD approximation puts a restriction on the refractive index that is not applicable here. The correct expression is actually

Es = [(m² − 1)/(m² + 2)] k² R³ E0 [e^{i(kr−ωt)}/r].   (190)

We can introduce the particle polarizability, α, which is analogous to the quantity molecular polarizability previously discussed in connection with Eq. (62). The polarizability of a particle is a measure of the ease with which the oscillating light field induces an electric dipole moment in the particle. It corresponds to

α = R³ (m² − 1)/(m² + 2).   (191)

The particle essentially acts like a small radiating electric dipole; it produces light that has the same polarization as that of the incident wave (true only for light emitted into the previously defined scattering plane). Now, we can calculate the differential cross section for Rayleigh scattering:

dσ/dΩ = r² |Es/E0|² = k⁴ α².   (192)
The total scattering cross section is

σ = (8π/3) k⁴ α².   (193)

A parameter that is sometimes quoted is the particle scattering efficiency (9,10) η, which is the total scattering cross section divided by the geometric cross section of the particle, η = σ/πR². For a Rayleigh scatterer, the scattering efficiency can be written as

η = (8/3) k⁴ R⁴ [(m² − 1)/(m² + 2)]².   (194)

Figure 50. Rayleigh–Gans–Debye scattering from a dielectric sphere of radius R; the relative scattered intensity is plotted on a logarithmic scale as a function of QR. (Adapted from Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
Because 2kR ≪ 1, the scattering efficiency is much less than unity. The cross section, and hence likelihood, of Rayleigh scattering is characterized by a k⁴ or 1/λ⁴ dependence. This means that scattering of blue light by small particles is much more pronounced than the scattering of red light from the same particles. Probably the most ubiquitous illustration of this fact is the blue color of the sky. When sunlight enters the atmosphere, it is scattered by small particles in the atmosphere. All wavelengths are Rayleigh scattered to some degree, but the blue component of sunlight is scattered the most, and that is the light our eye picks up.
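The blue-versus-red contrast follows directly from the 1/λ⁴ law. A two-line estimate (Python; the representative wavelengths of 450 nm and 650 nm are our choices):

```python
# Rayleigh cross section ~ k^4 ~ 1/lambda^4, so shorter wavelengths scatter more.
# Representative wavelengths (our choices): blue ~450 nm, red ~650 nm.
lam_blue, lam_red = 450e-9, 650e-9
ratio = (lam_red / lam_blue) ** 4
print(round(ratio, 2))   # 4.35: blue is scattered over four times as strongly as red
```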
Another important property of light scattered by the atmosphere involves its polarization characteristics. Before it enters the atmosphere, light from the sun is completely unpolarized. Now consider some observer who views the scattering of this light along a line of sight perpendicular to a ray of sunlight. Recall that unpolarized light is an incoherent superposition of two orthogonal, linearly polarized waves. Choose one of these polarization components so that it is perpendicular to the line of sight and the other so that it is along the line of sight. Because the line of sight lies in the scattering plane for the perpendicular component, the Rayleigh-scattered wave associated with this component will also be linearly polarized perpendicular to the line of sight. The polarization component that lies along the line of sight is also Rayleigh scattered; however, this component is normal to the scattering plane. Consequently, none of the scattering from this component occurs along the line of sight. The result is that, in this geometry, Rayleigh scattering makes the skylight linearly polarized.
Mie Scattering. For particles that are too large and/or whose refractive index is sufficiently different from that of the background medium to use the RGD approximation, the integral approach to scattering calculations becomes unmanageable. Instead, a differential treatment of the problem can be used. This method involves constructing the solution to Maxwell’s equations both inside and outside the particle, subject to the boundary conditions for the field components at the surface of the particle. The approach is rigorous and has the advantage that it leads to an exact solution of the scattering problem for particles of any size and refractive index. The usefulness of the technique is limited, however, by its inherent mathematical complexity; hence, the problem has been completely solved only for scattering of light from homogeneous spheres, long cylinders or rods, and prolate/oblate spheroids. The application of the differential formalism to the general problem of scattering by a homogeneous sphere (radius R) is referred to as Mie scattering (30). The solution for the scattered field a distance r far outside the particle is

Es = E0 [e^{i(kr−ωt)}/(−ikr)] Σ_{ℓ=1}^{∞} [(2ℓ + 1)/ℓ(ℓ + 1)] [aℓ P_ℓ^{(1)}(cos θ)/sin θ + bℓ dP_ℓ^{(1)}(cos θ)/dθ].   (195)

As before, the expression given is valid only for points in the scattering plane. The P_ℓ^{(1)}’s stand for associated Legendre functions; these are tabulated in standard mathematical handbooks. The aℓ’s and bℓ’s are coefficients given by

aℓ = [ξℓ(x)ξℓ′(y) − mξℓ′(x)ξℓ(y)] / [ζℓ(x)ξℓ′(y) − mζℓ′(x)ξℓ(y)]

and

bℓ = [ξℓ′(x)ξℓ(y) − mξℓ(x)ξℓ′(y)] / [mζℓ(x)ξℓ′(y) − ζℓ′(x)ξℓ(y)].   (196)
As before, m is the relative refractive index of the particle. ξℓ(z) = zjℓ(z) and ζℓ(z) = zh_ℓ^{(2)}(z), where the jℓ’s and h_ℓ^{(2)}’s are, respectively, spherical Bessel and spherical Hankel functions of order ℓ. A primed (′) function indicates a derivative. The arguments are x = kR and y = mkR. It is interesting to note that, irrespective of the particle’s size and refractive index, there is never a depolarized scattering component when detection is performed in the scattering plane. In general, this is not true for nonspherical scatterers. However, recall that when a particle of any shape whatsoever satisfies the RGD conditions, the depolarized component does indeed vanish. Figure 51 shows how the intensity of the scattered light, which is proportional to |Es|², varies with scattering angle for m = 1.33, 1.44, 1.55, and 2.0, evaluated at two different values of kR.

Absorption and Emission of Photons

Until now, the interaction of electromagnetic radiation with matter has been discussed from a classical perspective, where the radiation was treated as a wave. Now, we turn to a quantum picture of radiation–matter interactions, where the EM radiation is treated as a collection of photons (refer to the previous section on Quantum nature of radiation and matter). Specifically, we consider processes that involve the emission or absorption of a single photon. Processes of this type are called first-order radiative processes.

Excitation and Deexcitation of Atoms by Absorption and Emission

An atom in its ground state or other low-lying energy level Ea can be promoted to a state of higher energy Eb by the absorption of a photon of energy hν = Eb − Ea. As one might expect, the likelihood that an atom absorbs one of these photons is proportional to the intensity, or number of such photons, in the incident radiation field.
Once in an excited state, an atom may be able to make a downward transition (subject to certain atomic selection rules) to a state of lower energy by emitting a photon of energy hν, again matching the energy difference Eb − Ea between the two states. When such an event occurs, it falls under one of two headings — either spontaneous emission or stimulated emission.
Spontaneous Emission. This process refers to the emission of a photon by an excited atom when no external EM radiation field is present. That is to say, when an excited atom is left on its own, in the absence of any perturbing outside influence, there is some chance that it will spontaneously emit a photon of its own accord. The photon’s polarization and direction of emission are completely random. One defines Einstein’s coefficient of spontaneous emission, normally denoted by Ae, which is the probability per unit time that an individual atom will emit a photon via spontaneous emission. The coefficient depends on ν, the frequency of the emitted photon, according to

Ae = 8π² p²ba ν³/(3ε0 ħ c³).   (197)
pba is a quantum-mechanical quantity known as the dipole-moment matrix element associated with the transition between the upper and lower energy states of an atom. Its value is determined by an integral that involves the wave functions of the two atomic states (15). The power P emitted by the atom during spontaneous emission is obtained by simply multiplying Eq. (197) by the photon energy hν:

P = 16π³ p²ba ν⁴/(3ε0 c³) = p²ba ω⁴/(3π ε0 c³).   (198)

Comparing this result to Eq. (38), we see that our quantum-mechanical expression is nearly identical to that for the power emitted by a classical radiating dipole, i.e., both expressions are proportional to the fourth power of the radiation frequency and the square of the dipole moment. At a certain level, it is natural then to think that the emission of EM radiation from an atom is due to the oscillating dipole moment produced by the circulating atomic electrons. Keep in mind, however, that this is a semiclassical point of view that has limited utility. Equation (197) allows one to estimate the fluorescent, or radiative, lifetime, τ = 1/Ae, of an excited state before it decays spontaneously. For an order-of-magnitude calculation, we can replace the dipole moment pba by ea, where a is the approximate linear dimension of the atom. Then,

τ⁻¹ = Ae ∼ α (ωa/c)² ω,   (199)

where α = e²/4π ε0 ħc ≈ 1/137 is the so-called fine structure constant. Furthermore, ħω = Eb − Ea ∼ e²/4π ε0 a, or ωa/c ∼ α, so

τ⁻¹ ∼ α³ ω.   (200)

This result shows that for excited atomic states that lead to transitions in the visible region of the spectrum (ω ∼ 10¹⁵ s⁻¹), the lifetimes are on the order of nanoseconds, whereas X-ray transitions (ω ∼ 10¹⁸ s⁻¹) have lifetimes on the order of picoseconds.

Figure 51. Mie scattering from a dielectric sphere (radius R, refractive index n) for kR = 2.0 and 6.0 (k is the wave number in the surrounding medium whose refractive index is n0) and for different values of m = n/n0. The relative intensity is plotted on a logarithmic scale versus the scattering angle θ. (Adapted from Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
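The lifetime estimate of Eq. (200) can be checked with a minimal numeric sketch (Python; the two frequencies are the representative values quoted in the text, and the function name is ours):

```python
alpha = 1.0 / 137.036   # fine structure constant

def radiative_lifetime(omega):
    """Order-of-magnitude radiative lifetime from Eq. (200): tau ~ 1/(alpha^3 omega)."""
    return 1.0 / (alpha**3 * omega)

tau_visible = radiative_lifetime(1e15)   # visible transition, omega ~ 1e15 1/s
tau_xray = radiative_lifetime(1e18)      # X-ray transition, omega ~ 1e18 1/s
print(tau_visible)   # ~2.6e-9 s: nanoseconds
print(tau_xray)      # ~2.6e-12 s: picoseconds
```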
Stimulated Emission. In the process of stimulated emission, an existing EM field induces the emission of a photon from an excited atom. The emitted photon will be of the same type or mode; it will have the same frequency, polarization, and direction of travel as one of the photons in the external field. It is important to know how the probability of stimulated emission compares to the probabilities for spontaneous emission and absorption. If there are n photons that populate a particular mode in the external field, the occurrence of stimulated emission for that photon mode is n times as likely as that for spontaneous emission into that mode. Furthermore, Einstein first discovered the following amazing fact. The probability that an excited atom undergoes stimulated emission to some lower energy state is precisely the same as the probability that an atom in the lower state will be excited to the upper one by photon absorption. When a large collection of atoms is immersed in a radiation field, the process of stimulated emission competes with that of photon absorption. Stimulated emission tends to increase the number of photons in a particular mode and makes it more and more probable that further emissions will occur. On the other hand, photon absorption removes photons from the field and counters the effects of stimulated emission. Because the probabilities of stimulated emission and absorption are identical, the process that dominates is determined by the fraction of atoms in low-energy and high-energy states. For a system in thermal equilibrium, most of the atoms will be in the ground state, and absorption dominates. However,
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
in a laser (see previous section on Lasers), a mechanism such as optical pumping is introduced, which effectively inverts the statistical distribution of the number of atoms in excited versus low-energy states. This makes the rate of stimulated emission overwhelm the rate of absorption. The cumulative result is a coherent amplification of a highly collimated, monochromatic, polarized beam. This process is fundamentally responsible for laser operation. In lasers, spontaneous emission also occurs, but it appears as background noise in the field. Molecules and Luminescence. The term luminescence refers to the spontaneous emission of photons from excited electronic states of a molecule. More specifically, in photoluminescence a molecule absorbs a photon of energy hν, generally decays to a lower energy excited electronic state via some nonradiative relaxation process (see below), and then emits a photon of energy hν′ that is less energetic than the absorbed photon. Chemiluminescence is the emission of radiation from a molecule that has been left in an excited electronic state as a result of some chemical reaction (31). Luminescence is generally divided into two categories, depending on the radiative lifetime of the excited state, namely, fluorescence and phosphorescence; lifetimes of phosphorescence are much longer. As a general rule of thumb, the lifetimes of fluorescence are usually on the order of picoseconds to hundreds of nanoseconds, whereas lifetimes for phosphorescence are on the order of microseconds or longer, sometimes even as long as seconds. Luminescence does not account for all decays from excited molecular states. The reason is that a substantial number of nonradiative processes compete with the spontaneous emission of photons.
For example, when a molecule is excited to a high vibrational level (see section on Vibrational and rotational states in molecules), it will usually relax quickly to the lowest vibrational level of the excited electronic state without the emission of a photon. This process is referred to as vibrational relaxation. Other radiationless relaxation processes include internal conversion (nonradiative transition between excited states of identical spin) and intersystem crossing (transition between excited states of different spin). Energy released from nonradiative deexcitation processes appears as thermal energy in the material. One defines a quantum yield for luminescence as the fraction of transitions that are radiative (i.e., photon-producing). In practice, the quantum yield is less than unity for virtually all luminescent materials. Photoelectric Effect and Its Cross Section Until now, the discussion has been concerned with the absorption and emission of photons that accompany transitions between discrete, or bound, quantum states of atoms and molecules. In the photoelectric effect, a photon is absorbed by an atom, and one of the electrons is ejected from the atom. In other words, as a result of the process, the atom is left in an unbound state, and a so-called photoelectron is promoted to the energy continuum. There is a strong tendency for the electron to be emitted at an angle close to 90° relative to the propagation direction
of the incoming photon, especially at low energies. At higher energies, the angular distribution of photoelectrons is shifted somewhat toward the forward direction (i.e., toward angles less than 90°) (12). The kinetic energy Te of the ejected photoelectron is simply Te = hν − Eb, (201) where hν is the energy of the absorbed photon and Eb represents the binding energy of the electron in the atom. Clearly, the photoelectric effect is possible only if the energy of the incoming photon is at least as large as the binding energy of the most weakly bound electron in an outer atomic shell. Emission of an electron from a more tightly bound shell requires the absorption of a higher energy photon, so that hν exceeds the shell's absorption edge. Specifically, the K, L, M, . . . edges refer to the binding energies of atomic electrons in the first, second, third, . . . shells. It should also be emphasized that whenever a photoelectric event occurs, the emitted electron leaves behind a vacancy in one of the atomic shells. Consequently, the photoelectric effect is always followed by downward transitions made by outer shell atomic electrons, which in turn are accompanied by the emission of characteristic X rays from the ionized atom. It is observed, however, that the measured intensities of the resulting X ray emission lines are often very different from what one would expect, especially when there is an inner shell ionization of an element that has a low Z (atomic number). This is because, in addition to deexcitation by X ray emission, there may also be a significant probability for a nonradiative transition within the atom, known as the Auger effect. In this process, rather than emitting a photon of some characteristic energy, the ionized atom emits a secondary electron whose kinetic energy is equal to the aforementioned characteristic energy less the electron's binding energy in the already singly ionized atom.
Such an electron is readily distinguishable from the primary photoelectron in that the energy of the latter depends on the energy of the incident photon [see Eq. (201)], whereas the energy of an Auger electron does not. For atoms ionized in a given shell, one is usually interested in knowing the fluorescent yield Y, which is the fraction of transitions to the vacant level that produce an X ray (rather than an Auger electron). The fluorescent yield of a given shell increases with atomic number Z according to the approximate form (32), Y = 1/(1 + βZ⁻⁴). (202)
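A quick numerical sketch of Eq. (202), using the approximate K-shell value β ≈ 1.12 × 10⁶ quoted in the text (the helper function name is illustrative, not from any library):

```python
# Approximate K-shell fluorescent yield, Y = 1/(1 + beta * Z**-4) [Eq. (202)].
# beta ~ 1.12e6 for the K shell, per the text.
def fluorescent_yield_K(Z, beta=1.12e6):
    return 1.0 / (1.0 + beta * Z**-4)

for Z in (13, 29, 82):  # Al, Cu, Pb
    print(Z, round(fluorescent_yield_K(Z), 3))
```

The yield rises steeply with Z: light elements deexcite mainly by Auger emission, whereas heavy elements deexcite mainly by characteristic X-ray emission.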
For the K-shell fluorescent yield, β has a value of about 1.12 × 10⁶, and for the L shell, β is about 6.4 × 10⁷. The literature contains very little information on the fluorescent yields for the M shell and above. An important quantity to consider is the quantum-mechanical cross section for the photoelectric effect. This is defined as the rate of occurrence of photoelectric events (number of events per unit time) divided by the incident photon flux (i.e., the number of photons in the incident beam that crosses a unit area per unit time). In effect, the cross section is a measure of the quantum-mechanical probability that a photon will undergo photoelectric absorption. It is found that this cross section is a strong function of the incident photon energy hν and the atomic number Z of the target atom. When hν is near an absorption edge, the cross section varies greatly and generally changes abruptly at the binding energy of each atomic shell. This effect is most pronounced for high-Z target atoms. For K-shell absorption, assuming that hν is far above the absorption edge (i.e., hν ≫ Eb), a quantum-mechanical calculation shows that the photoelectric cross section is proportional to (hν)−7/2 and Z⁵. In other words, the probability of photoelectric interaction diminishes rapidly as incident photon energy increases and increases strongly as the atomic number of the target increases. These results are valid only when the photoelectron produced is nonrelativistic, that is, when the kinetic energy of the electron is much less than its rest energy (∼511 keV). In the highly relativistic case, the energy dependence of the cross section varies as (hν)−1, and there are some complex, but rather small, modifications to the cross section's Z dependence (12). Pair Production An important mechanism for absorbing high-energy photons, or γ rays, by a material is called pair production. In this process, a photon enters a material, spontaneously disappears, and is replaced by a pair of particles, namely, an electron and a positron (i.e., a positively charged electron). Electric charge is conserved in this process because the photon carries no charge, and the net charge of an electron–positron pair is zero as well. To satisfy energy conservation, pair production requires a minimum threshold energy for the incoming photon. In relativistic terms, the photon must be at least energetic enough to be replaced by the rest energy of an electron–positron pair.
So it appears that pair production requires a minimum photon energy of twice the rest energy of an electron, or 2me c². However, this statement is not exactly correct because if a photon is replaced by an electron and a positron at rest in vacuum, the relativistic momentum of the system is not conserved. This is easy to see because a photon always carries a momentum hν/c, whereas a resting electron–positron pair clearly has no momentum. What makes pair production possible is the presence of another particle, specifically some nearby massive atomic nucleus (mass M). By requiring that the process take place within the field of such a nucleus, it becomes possible to conserve both energy and momentum, and pair production can occur. The correct expression for the threshold energy of the incident photon becomes (hν)min = 2me c²(1 + me/M). (203)
Because the electron mass is thousands of times smaller than the nuclear mass, the term me /M is virtually negligible. Hence, as before, one can say that for all practical purposes, the threshold energy is approximately twice the electron rest energy, or 1.02 MeV.
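A numeric check of Eq. (203) confirms that the nuclear-recoil correction is negligible; the lead-nucleus mass below is an approximate, illustrative value:

```python
# Pair-production threshold [Eq. (203)]: the nuclear-recoil correction
# me/M is negligible for any realistic nucleus.
me_c2 = 0.511              # electron rest energy (MeV)
me = 9.109e-31             # electron mass (kg)
M_pb = 207 * 1.6605e-27    # mass of a lead nucleus (kg), approximate

hv_min = 2 * me_c2 * (1 + me / M_pb)
print(f"threshold = {hv_min:.6f} MeV")  # ~1.022 MeV
```

Even for a nucleus as light as hydrogen, the correction shifts the threshold by only about 0.1%.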
Above the threshold value, the energy dependence of the pair-production cross section (rate of pair production divided by incident photon flux) is quite complex, and values are usually obtained from tables or graphs. However, the general trend is that just above the threshold, the probability of interaction is small, but as energy increases, so does the cross section. For high-energy photons above 5–10 MeV, the process of pair production predominates over other interactive processes. The dependence of the cross section on atomic number is nearly proportional to Z². Hence, if for some photon energy hν, it is known that the pair-production cross section for a particular element of atomic number Z is σpp(hν, Z), then the cross section for an element of atomic number Z′ is σpp(hν, Z′) ≈ (Z′/Z)² σpp(hν, Z). (204)
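Equation (204) lets one scale a tabulated cross section from one element to another; a short sketch with a placeholder reference value (not tabulated data):

```python
# Scale a pair-production cross section between elements via the
# approximate Z^2 dependence [Eq. (204)].
def scale_pair_production(sigma_ref, Z_ref, Z_new):
    return sigma_ref * (Z_new / Z_ref)**2

sigma_al = 1.0  # hypothetical reference cross section for Al (Z = 13)
sigma_pb = scale_pair_production(sigma_al, 13, 82)  # estimate for Pb (Z = 82)
print(round(sigma_pb, 1))  # (82/13)^2, roughly a 40x increase
```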
Shortly following pair production, the positron that is created always combines with an electron in the material. The two particles mutually annihilate one another and emit two back-to-back 511-keV photons. This radiation may escape from the material, or it may undergo either photoelectric absorption or Compton scattering (see next section). Scattering of Photons Recall that in the context of quantum mechanics, the absorption and emission of electromagnetic radiation are classified as first-order radiative processes because these processes involve either annihilating (absorbing) or creating (emitting) a single photon. On the other hand, scattering that involves electromagnetic radiation is considered a second-order radiative process because, from a quantum-mechanical perspective, scattering processes involve the participation of two photons, the incident photon and the scattered photon. According to the theory of quantum electrodynamics (QED), scattering is not simply a redirection of one and the same photon. Instead, it involves annihilation of the incident photon, accompanied by the creation of the scattered photon. Analogous to classical scattering (see section on Thomson scattering of X Rays and the concept of a cross section), one can define an angular differential cross section dσ/dΩ for the scattering of photons. It is defined as the rate of scattering per unit solid angle dΩ divided by the incident photon flux. As in the classical case, the total scattering cross section σ is obtained by integrating the differential cross section over all solid angles. Compton Scattering by a Free Electron. Begin by considering the scattering of X rays and γ rays by an atomic electron. For such high-energy radiation, the incident photon energy hν far exceeds the binding energy of an atomic electron; hence, one is justified in treating the electron as a particle that is virtually free.
The quantum theory of scattering from a free electron is essentially an extension of classical Thomson scattering; however, the latter was applicable only to photon energies far less than the electron’s rest energy, where relativistic considerations can be neglected. Now, we allow for
Figure 52. Compton scattering of a high-energy photon (energy hν, momentum hν/c) by a free electron. The energy and momentum of the scattered photon are hν′ and hν′/c, respectively, and the electron recoils with momentum pe′ and total relativistic energy Ee.
high-energy X rays and γ rays and envision that the scattering event is a relativistic particle-like collision between an incoming photon and an initially resting free electron, as illustrated in Fig. 52. This is referred to as Compton scattering. Because the incident photon carries a momentum hν/c, the electron must experience a recoil to conserve momentum in the system. As a result, the scattered photon will have an energy hν′ and momentum hν′/c less than that of the incident photon. Thus, unlike the classical situation, the scattering process shifts the frequency of the radiation downward, that is, ν′ < ν. The precise relationship between the incident and scattered frequencies comes from requiring conservation of both the relativistic energy and momentum of the system. The energy-conservation equation is
hν + me c² = hν′ + Ee, (205)
where Ee is the total relativistic energy (sum of the rest energy me c² and kinetic energy Te) of the scattered (i.e., recoiling) electron. It is related to the relativistic momentum pe of the scattered electron by Ee = Te + me c² = √[(me c²)² + (pe c)²]. (206)
The conservation relationships for the momentum components are hν/c = (hν′/c) cos θ + pe cos φ (207) and 0 = (hν′/c) sin θ − pe sin φ, (208)
where θ and φ are the scattering angles of the photon and electron, respectively, relative to the propagation direction of the incident photon. Manipulation of Eqs. (205)–(208) produces the following result for the energy of the scattered photon: hν′ = hν/[1 + α(1 − cos θ)]. (209) Here, α = hν/me c² is the incident photon energy in units of 511 keV. Using the fact that a photon's wavelength is the speed of light c divided by its frequency, it becomes easy to show that the difference between the wavelength λ′ of the scattered and the wavelength λ of the incident photons (also known as the Compton shift) is λ′ − λ = (h/me c)(1 − cos θ). (210)
The kinetic energy of the recoil electron is obtained simply from the fact that Te = hν − hν′. Using Eq. (209), the expression reduces to Te = hν α(1 − cos θ)/[1 + α(1 − cos θ)]. (211)
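Equations (209)–(211) can be packaged into a small kinematics helper (the function and constant names are illustrative):

```python
# Compton kinematics of Eqs. (209)-(211): scattered photon energy,
# Compton wavelength shift, and recoil-electron kinetic energy.
import math

ME_C2_KEV = 511.0       # electron rest energy (keV)
H_OVER_MEC = 2.426e-12  # Compton wavelength h/(me*c) (m)

def compton(hv_keV, theta):
    alpha = hv_keV / ME_C2_KEV
    hv_prime = hv_keV / (1 + alpha * (1 - math.cos(theta)))  # Eq. (209)
    dlam = H_OVER_MEC * (1 - math.cos(theta))                # Eq. (210)
    Te = hv_keV - hv_prime                                   # Eq. (211)
    return hv_prime, dlam, Te

hv_p, dlam, Te = compton(511.0, math.pi / 2)  # 511-keV photon at 90 degrees
print(round(hv_p, 1), round(Te, 1))           # energy is shared equally here
```

For α = 1 and θ = 90°, the photon and the recoil electron each carry away half the incident energy, as the printout shows.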
The relationships derived until now have been based solely on considerations of momentum and energy conservation. Deriving the probability, or cross section, of Compton scattering requires undertaking a complicated, fully relativistic, quantum-mechanical calculation. The closed-form expression for the differential scattering cross section dσ/d for Compton scattering has been determined and is known as the Klein-Nishina formula. The expression is valid for a beam of unpolarized (i.e., randomly polarized) X rays or γ rays and is given by
(dσ/dΩ)unpol = (r0²/2)(ν′/ν)²(1 + cos²θ){1 + α²(1 − cos θ)²/[(1 + cos²θ)(1 + α(1 − cos θ))]}, (212)
where r0 is the previously defined classical electron radius (2.82 × 10⁻¹⁵ m). As shown in Fig. 53, there is a pronounced increase in the fraction of photons scattered in the forward direction as incident photon energy increases. The total cross section for Compton scattering by an electron, σ, is obtained by integrating the Klein–Nishina formula over all solid angles. The resulting energy dependence of σ is plotted in Fig. 54. The important facts are as follows:
• For hν much less than me c² (i.e., α ≪ 1), the scattering cross section from a free electron approaches 665 millibarns, which matches the value of the cross section for classical Thomson scattering. This is precisely what is expected.
• The cross section is a monotonically decreasing function of energy.
• Each electron in an atom presents the same Compton scattering cross section to a beam of incoming photons. Therefore, the cross section for a particular atom is simply the number of electrons, that is, the atomic number Z, multiplied by the electronic cross section.
Rayleigh Scattering by Atoms. Previously (see section on Classical Rayleigh scattering), Rayleigh scattering was treated as scattering by particles very much smaller than the wavelength of light. In Rayleigh scattering at the quantum level, a photon interacts with the constituent bound electrons of an atom and undergoes elastic scattering; the energy of the scattered photon is
identical to that of the incident photon. As a result, this type of scattering leaves the atom in its original bound state. Unlike Compton scattering, which only involves a single, weakly bound electron, Rayleigh scattering requires the participation of the atom as a whole. In general, it is the primary mechanism for the scattering of EM radiation by atoms in the optical regime, where photon energies are less than or on the same order of magnitude as the spacing of atomic energy levels. The dominance of Rayleigh scattering also extends well into the soft X ray regime, up to photon energies of about 10 keV. In the range of about 10–100 keV, the cross section for the Rayleigh scattering of X rays competes with that for Compton scattering. For hard X rays (∼100 keV and beyond), the probability of Rayleigh scattering becomes virtually negligible. An important property of Rayleigh scattering is that the scattering is coherent. That is to say, the scattered radiation is coherent and allows for interference effects between Rayleigh-scattered waves from atoms at various locations in the target. This fact is what allows the possibility of X ray diffraction and crystallography, as well as many coherent light-scattering and optical effects. Rayleigh scattering arises because of a coupling between the field of the photon and the electric dipole moment of the atom. This type of interaction is the same as that responsible for a transition accompanying a simple single-photon (i.e., first-order) atomic absorption or emission process. Such an electric-dipole interaction, however, can only mediate a transition between distinctly different atomic states. Said another way, the dipole-moment matrix element pba in Eq. (197) always vanishes when the initial and final atomic states are identical. Because the state of the atom is left unchanged as the result of elastic or Rayleigh scattering, one concludes that some bridging intermediate atomic state is required for the process to take place. The basic idea is illustrated in Fig. 55.
Figure 53. Differential scattering cross section for the Compton scattering of randomly polarized photons as a function of scattering angle θ. The cross section is displayed for four different values of the parameter α = hν/me c², the ratio of the incident photon energy to the rest energy of an electron. (From Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
Figure 54. Total cross section for Compton scattering as a function of α = hν/me c². (From Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
Figure 55. Elastic Rayleigh scattering of a photon from an atom occurs via an intermediate atomic state.
Here, one clearly sees that Rayleigh scattering is a second-order, or two-photon, process: The target atom starts with initial energy Ea and absorbs the incident photon of energy hν. The atom forms an excited intermediate state of energy EI and reemits a photon, also of energy hν. Because this photon, in general, propagates in a direction different from that of the incident photon, it is perceived as scattered. The events described, along with Fig. 55, represent a conceptually simplified version of what is really a fuzzy quantum-mechanical process. First of all, the photon energy hν does not, in general, match the energy difference, EI − Ea, between the intermediate and initial atomic states. The transitions that take place are called virtual transitions, and the intermediate states are
often referred to as virtual states. A virtual transition does not need to conserve energy because a virtual state exists only for a fleeting instant. According to the time–energy uncertainty principle of quantum mechanics, violation of energy conservation is permitted for extremely short periods of time (15). In particular, the shorter the lifetime of an excited state, the greater the opportunity not to conserve energy (this principle is responsible for the natural width of spectral lines). Because the lifetime of a virtual state is infinitesimally small, there is no requirement that the excitation energy, EI − Ea, match the energy of the photon. Secondly, an atom presents many choices for the excited intermediate state. According to quantum electrodynamics (QED), the system will actually sample all possible intermediate states! And finally, the transition to and from an intermediate state does not necessarily take place in the order previously described. In addition to having the incident photon vanish before the appearance of the scattered photon, QED also allows the possibility that the scattered photon will appear before the incident photon disappears! As nonsensical as this may seem, such an event is possible in the quantum world. The two possible sequences of events are often illustrated by using constructs called Feynman diagrams. The characteristic behavior of the cross section for Rayleigh scattering depends strongly on the energy of the photon relative to the excitation energy, EI − Ea. The behavior can be divided into the following three regimes:
• hν ≪ EI − Ea. Here, one considers the very long-wavelength limit (relative to the size of an atom) that corresponds to the elastic scattering of optical photons. In this case, the quantum mechanically derived cross section exhibits the same 1/λ⁴ dependence, as well as the same angular dependence, that characterized the classical formulation of Rayleigh scattering.
What is gained from the quantum derivation is an expression for calculating the polarizability of the atom given the available intermediate states and dipole-moment matrix elements of the atom.
• hν ≈ EI − Ea. This corresponds to resonant scattering, or the phenomenon of resonant fluorescence. The resulting cross section is extremely large. Resonant fluorescence occurs, for example, when a well-defined beam of monochromatic yellow light from a sodium lamp enters a transparent cell that contains sodium vapor. When an incoming photon undergoes elastic scattering by one of the gas atoms, the process can be thought of as an absorption of the photon, accompanied by the immediate emission of a photon at precisely the same frequency, but in a different direction. Because the probability of such an event is so large at resonance and because the light frequency is unaffected by the process, the light scattered by any one atom will be rescattered by another. This sequence of events will occur over and over within the gas, causing strong diffusion of the light beam. The result is a pronounced yellow glow from the sodium cell.
• hν ≫ EI − Ea. This condition is met in the soft X ray region of the spectrum (hν ∼ a few keV), where the wavelengths are still large compared to the size of the atom. In this region, the atomic cross section for Rayleigh scattering is proportional to Z². In fact, the cross section is exactly Z² times the Thomson scattering cross section from a single free electron. The fact that the multiplier is Z², rather than simply Z, is a consequence of the fact that the various electrons within the atom all scatter coherently with the same phase.
Raman Scattering. The inelastic scattering of light from an atom is referred to as electronic Raman scattering. The cross section for this type of scattering is much smaller than that for elastic Rayleigh scattering. The basic process is illustrated in Fig. 56.
Figure 56. Inelastic Raman scattering from an atom occurs via an intermediate state. (a) The scattered photon energy hν′ is less than the energy hν of the incident photon. (b) The scattered photon energy is greater than the energy of the incident photon.
The target atom that has initial energy Ea absorbs the incident photon of energy hν and makes a virtual transition to an intermediate state of energy EI. The scattered photon appears as a result of the downward transition from the intermediate state to an atomic state of energy Eb different from energy Ea. As in Rayleigh scattering, the principle of energy conservation can be briefly violated; hence, the energy of the incident photon need not match the transition energy, EI − Ea. In addition, all possible intermediate states come into play, and the order of photon emission and photon absorption can go either way. As shown in Fig. 56, the frequency of the scattered photon is either less than or greater than the frequency of the incident light, depending on which is larger, the energy of the initial state Ea or that of the final state Eb. When Eb > Ea, the frequency ν′ of the scattered radiation is downshifted relative to the frequency of the incident light ν, that is, ν′ = ν − (Eb − Ea)/h, and is commonly referred to as the Stokes line. On the other hand, when Eb < Ea, then ν′ = ν + (Ea − Eb)/h, and the
frequency is upshifted and produces an anti-Stokes line. In a scattered light spectrum, the Stokes and anti-Stokes lines appear as weak sidebands on either side of the elastic peak due to Rayleigh scattering.
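The Stokes and anti-Stokes frequencies follow directly from the expressions above; a short sketch with assumed, illustrative values for the incident wavelength and the level spacing Eb − Ea:

```python
# Stokes and anti-Stokes frequencies: nu' = nu -/+ (Eb - Ea)/h.
H_EV_S = 4.1357e-15  # Planck constant (eV*s)
C = 2.998e8          # speed of light (m/s)

nu = C / 532e-9      # incident frequency for assumed 532-nm light (Hz)
dE = 0.1             # assumed level spacing Eb - Ea (eV)

stokes = nu - dE / H_EV_S       # downshifted line (Eb > Ea)
anti_stokes = nu + dE / H_EV_S  # upshifted line (Eb < Ea)
print(f"{stokes:.3e} {anti_stokes:.3e}")
```

The two sidebands are displaced symmetrically about the incident frequency, as the spectrum described in the text shows.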
Attenuation of Gamma Rays in Matter Recall that in the classical treatment of an electromagnetic wave that passes through a material (see section on Electromagnetic waves in matter), the intensity of a light beam decreases exponentially as it propagates [see Eq. (82)]. There is a continuous absorption of the energy carried by the wave along the direction of travel, and the intensity decays by a factor of 1/e (i.e., ∼0.37) after propagating a distance of one-half the skin depth δ of the medium. The passage of high-energy X rays and γ rays through matter is also characterized by an exponential drop in intensity as a function of the distance z into a medium. The intensity decays as I = I0 e−µz, (213) where µ is called the linear attenuation coefficient of the medium. µ is analogous to the classical quantity 2/δ, but it has a different physical interpretation. In high-energy radiation, it is necessary to think of the intensity in terms of the number of photons carried by the beam. As these photons travel through a medium, the beam is weakened by the interaction of individual photons within the material. Three major interactive mechanisms are present: photoelectric interaction, Compton scattering, and pair production. Any one of these processes removes a photon from the incident beam. Because each process is characterized by its own cross section, there is a certain probability that each will occur, depending on the nature of the target medium and the energy of the incoming photon. Unlike the wave picture, where the beam intensity decays continuously, high-energy X rays and γ rays are removed from the beam via individual, isolated, quantum events that occur with certain likelihoods. The quantity 1/µ can be interpreted as the mean distance traveled by a photon before it undergoes an interaction. The linear attenuation coefficient of a given medium is the sum of the attenuation coefficients due to each type of interaction: µ = µpe + µc + µpp, (214) where µpe, µc, and µpp are the linear attenuation coefficients for photoelectric interaction, Compton scattering, and pair production, respectively. Each attenuation coefficient is actually the atomic cross section for that particular interaction multiplied by the atomic number density (# atoms/m³) for the attenuating medium. Because the latter quantity clearly depends on the physical state of the material (i.e., density ρ of the solid, liquid, or gas), it is often more convenient to tabulate the quantity µ/ρ for each process, as well as for all processes together. µ/ρ, known as the mass-attenuation coefficient of the medium, is usually quoted in units of cm²/g. The quantity ρz then acts as an effective thickness of the material, having units of g/cm², and Eq. (213) is rewritten as I = I0 e−(µ/ρ)(ρz). (215) Figure 57 shows plots of the mass-attenuation coefficient as a function of energy for the passage of photons through silicon and lead. The abrupt peaks that appear correspond to various K, L, M, . . . photoelectric absorption edges. Figure 58 is a map that indicates which of the three interactive processes dominates for different combinations of incoming photon energy and atomic number of the target.
Figure 57. Mass-attenuation coefficients for lead and silicon as a function of photon energy. [Data obtained from NIST Physics Laboratory Website (33).]
Figure 58. Relative importance of the three major types of γ-ray interactions for different combinations of hν, the incident photon energy, and Z, the atomic number of the target. The lines show the values of Z and hν for which the neighboring effects are just equal. (From The Atomic Nucleus by R. D. Evans. Copyright 1955 McGraw-Hill Book Company. Reproduced with permission.)
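Equation (215) is straightforward to evaluate; in this sketch, the µ/ρ value is a rough illustrative figure for photons of about 1 MeV in lead, not a tabulated NIST value:

```python
# Exponential attenuation via the mass-attenuation coefficient,
# I = I0 * exp(-(mu/rho) * rho * z) [Eq. (215)].
import math

mu_over_rho = 0.07  # mass-attenuation coefficient (cm^2/g), approximate
rho = 11.35         # density of lead (g/cm^3)
mu = mu_over_rho * rho  # linear attenuation coefficient (1/cm)

def transmitted_fraction(z_cm):
    return math.exp(-mu * z_cm)

print(round(1 / mu, 2), "cm mean free path")  # 1/mu, per the text
for z in (0.5, 1.0, 2.0):
    print(z, round(transmitted_fraction(z), 3))
```

Each additional mean free path of material attenuates the beam by a further factor of 1/e, mirroring the quantum, photon-by-photon removal described above.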
ABBREVIATIONS AND ACRONYMS
AM amplitude modulated
CT computerized tomography
dc direct current
EM electromagnetic
FTIR frustrated total internal reflection
FWHM full width at half-maximum
IR infrared
LIDAR light detection and ranging
PET positron-emission tomography
QED quantum electrodynamics
RF radio frequency
RGD Rayleigh–Gans–Debye
ROY-G-BIV red-orange-yellow-green-blue-indigo-violet
SI International System of Units
TE transverse electric
TIR total internal reflection
TM transverse magnetic
UV ultraviolet
BIBLIOGRAPHY
1. S. T. Thornton and A. Rex, Modern Physics for Scientists and Engineers, 2nd ed., Saunders College Publishing, Fort Worth, 2000.
2. E. Hecht, Optics, 3rd ed., Addison-Wesley, Reading, MA, 1998.
3. R. Loudon, The Quantum Theory of Light, 2nd ed., Oxford University Press, NY, 1983.
4. F. L. Pedrotti and L. S. Pedrotti, Introduction to Optics, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1993.
5. J. D. Jackson, Classical Electrodynamics, 3rd ed., Wiley, NY, 1998.
6. J. R. Reitz, F. J. Milford, and R. W. Christy, Foundations of Electromagnetic Theory, 4th ed., Addison-Wesley, Reading, MA, 1993.
7. M. Born and E. Wolf, Principles of Optics, 7th ed., Cambridge University Press, Cambridge, UK, 1999.
8. C. Scott, Introduction to Optics and Optical Imaging, IEEE Press, NY, 1998.
9. C. F. Bohren and D. R. Huffman, Absorption and Scattering of Light by Small Particles, Wiley, NY, 1983.
10. M. Kerker, The Scattering of Light and Other Electromagnetic Radiation, Academic Press, NY, 1969.
11. P. W. Barber and S. C. Hill, Light Scattering by Particles: Computational Methods, World Scientific, Singapore, 1990.
12. S. H. Chen and M. Kotlarchyk, Interaction of Photons and Neutrons with Matter, World Scientific, Singapore, 1997.
13. D. Marcuse, Engineering Quantum Electrodynamics, Harcourt, Brace & World, NY, 1970.
14. G. Herzberg, Molecular Spectra and Structure, Van Nostrand Reinhold, NY, 1950.
15. D. J. Griffiths, Introduction to Quantum Mechanics, Prentice-Hall, Upper Saddle River, NJ, 1995.
16. M. Born and J. Oppenheimer, Ann. Phys. 84, 457–484 (1927).
17. A. Goswami, Quantum Mechanics, 2nd ed., Wm. C. Brown, Dubuque, IA, 1997, pp. 425–430.
18. K. F. Sander, Microwave Components and Systems, Addison-Wesley, Wokingham, UK, 1987, pp. 73–81.
19. R. P. Godwin, in Springer Tracts in Modern Physics, vol. 51, G. Hoehler, ed., Springer-Verlag, Heidelberg, 1969, pp. 2–73.
20. H. Winick, Synchrotron Radiation News 13, 38–39 (2000).
21. M. Alonso and E. J. Finn, Fundamental University Physics: Fields and Waves, vol. 2, Addison-Wesley, Reading, MA, 1967, pp. 740–741.
22. H. Wiedemann, Synchrotron Radiation Primer, Stanford Synchrotron Radiation Laboratory, Stanford, CA, 1998, http://www-ssrl.slac.stanford.edu/welcome.html.
23. P. W. Milonni and J. H. Eberly, Lasers, Wiley, NY, 1988.
24. M. Sargent III, M. O. Scully, and W. E. Lamb, Jr., Laser Physics, Addison-Wesley, Reading, MA, 1977.
25. S. F. Jacobs, M. O. Scully, M. Sargent III, and H. Pilloff, Physics of Quantum Electronics, vols. 5–7, Addison-Wesley, Reading, MA, 1978–1982.
26. C. Brau, Free-Electron Lasers, Academic Press, San Diego, CA, 1990.
27. M. V. Klein and T. E. Furtak, Optics, 2nd ed., Wiley, NY, 1986, pp. 98–121.
28. D. R. Lide, ed., CRC Handbook of Chemistry and Physics, 81st ed., CRC Press, Boca Raton, FL, 2000, pp. 12–136 and 12–150.
29. B. E. Warren, X-Ray Diffraction, Dover, NY, 1990.
30. G. Mie, Ann. Physik 25, 377–445 (1908).
31. M. A. Omary and H. H. Patterson, in J. C. Lindon, G. E. Tranter, and J. L. Holmes, eds., Encyclopedia of Spectroscopy and Spectrometry, Academic Press, London, 1999, pp. 1186–1207.
32. E. H. S. Burhop, The Auger Effect and Other Radiationless Transitions, Cambridge University Press, London, 1952, pp. 44–57.
33. Physical Reference Data, National Institute of Standards and Technology, Physics Laboratory, Gaithersburg, MD, 2000, http://physics.nist.gov/PhysRefData/XrayMassCoef/tab3.html.
ELECTRON MICROSCOPES

DIRK VAN DYCK
S. AMELINCKX
University of Antwerp, Antwerp, Belgium
In 1873, Ernst Abbe proved that the resolving power of a light microscope will always be limited by the wavelength of the light, which is of the order of 1 µm, so that there could be no hope of visualizing much smaller objects such as atomic-scale structures. (In the 1980s, near-field optical scanning techniques were developed that can bring the resolution down by two orders of magnitude.) Fifty years later, a new impulse was given to the problem by the hypothesis of Louis de Broglie about the wave nature of particles, so that other particles could also serve as "light." In 1931 Ernst Ruska developed the first transmission electron microscope (TEM), which uses electrons instead of photons. In 1986 Ernst Ruska was awarded the Nobel Prize for his pioneering work. Electrons are the best candidates since they can easily be generated by a heated filament or extracted from a point by an electric field, and they are easily deflected by electric and magnetic fields.
When accelerated to, say, 100 keV, their wavelength is much smaller (3 pm = 3 × 10⁻¹² m) than that of visible light. They can also be detected on a photographic plate, a fluorescent screen, or an electronic camera. On the other hand, they can only propagate in vacuum and they can only penetrate through very thin objects.
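The quoted picometer-scale wavelength can be checked directly from the de Broglie relation. The sketch below is an illustration, not part of the original article; it evaluates both the relativistic and nonrelativistic values at 100 keV.

```python
import math

# Physical constants (SI units)
h = 6.62607015e-34      # Planck constant, J*s
m = 9.1093837015e-31    # electron rest mass, kg
e = 1.602176634e-19     # elementary charge, C
c = 2.99792458e8        # speed of light, m/s

def electron_wavelength(E0_eV, relativistic=True):
    """de Broglie wavelength (in m) of an electron accelerated through E0_eV volts."""
    E = E0_eV * e  # kinetic energy in joules
    if relativistic:
        # momentum p = sqrt(2*m*E*(1 + E/(2*m*c^2)))
        p = math.sqrt(2 * m * E * (1 + E / (2 * m * c * c)))
    else:
        p = math.sqrt(2 * m * E)
    return h / p

print(electron_wavelength(100e3))                      # ~3.7e-12 m at 100 keV
print(electron_wavelength(100e3, relativistic=False))  # nonrelativistic: ~3.9e-12 m
```

Either way the wavelength is five orders of magnitude below that of visible light, which is the point the text makes.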
Figure 16. Comparison of fringe pattern characteristics due to a stacking fault (left) and to a domain boundary (right). The abbreviations F, L, B, and D denote first fringe, last fringe, bright, and dark, respectively.
nature. The reverse is true for domain boundary fringes; the edge fringes are of opposite nature in the bright-field image, whereas in the dark-field image the nature of the two edge fringes is the same. From the nature of the first and last fringes one can conclude, for instance, whether a stacking fault in a face-centered cubic crystal is either intrinsic (i.e., of the type ...abcababc...) (8) or extrinsic (i.e., of the type ...abcabacabc...). Figure 16, right refers to a domain boundary, whereas Fig. 16, left is due to a stacking fault.

Dislocation Contrast (9,10)

The contrast produced at dislocation lines can be understood by noting that the reflecting lattice planes in the regions on two opposite sides of the dislocation line are tilted in the opposite sense. Hence the Bragg condition for reflection is differently affected on the two sides of the line. On one side, the diffracted intensity may be enhanced because the Bragg condition is better satisfied (s is smaller), whereas it is decreased on the other side because s is larger, leading to a black-white line contrast shown schematically in Fig. 17 for the case of an edge dislocation. In this schematic representation the line thickness is proportional to the local beam intensity. In bright-field images dislocation lines are thus imaged as dark lines, slightly displaced from the actual position of the dislocation line towards the "image side." This model implies that imaging in reflections associated with families of lattice planes that are not deformed
by the presence of the dislocation will not produce a visible line image; the image is then said to be extinct (Fig. 18). The extinction condition can, to a good approximation, be formulated as H · b = 0, where b is the Burgers vector of the dislocation. If extinction occurs for two different sets of lattice planes with normals H1 and H2, the direction of the Burgers vector b is parallel to H1 × H2. Images of dislocations can be simulated accurately by numerically solving the equations that describe the dynamical scattering of the electrons in the object (see the Appendix). Fast computer programs (10) have been developed to calculate such images for various strain fields and various diffraction conditions. An example of the agreement between the observed and the computed image that can be achieved is shown in Fig. 19 (after Ref. 10).

Weak-Beam Images (11)
Figure 18. Three images of the same area made under two-beam conditions using three different diffraction vectors H1 = 020, H2 = 110, H3 = 210. Note the extinctions.
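The extinction criterion illustrated in Fig. 18 lends itself to a small calculation: if the dislocation image is extinct for two reflections H1 and H2, the Burgers vector direction follows from their cross product. The sketch below is a standalone illustration with hypothetical reflection indices, not code from the article.

```python
import numpy as np

def burgers_direction(H1, H2):
    """Direction of the Burgers vector b from two extinction reflections,
    using H . b = 0 for both: b is parallel to H1 x H2."""
    b = np.cross(H1, H2)
    g = np.gcd.reduce(np.abs(b).astype(int))  # reduce to smallest integer indices
    return b // g if g else b

# Hypothetical example: extinction observed for the (1 -1 0) and (0 0 2) reflections
print(burgers_direction([1, -1, 0], [0, 0, 2]))  # direction of the [-1 -1 0] type
```

In practice one would confirm the result with a third reflection for which the dislocation is visible, since H · b = 0 is only an approximate visibility criterion.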
The width of the bright peak that images a dislocation in the dark-field imaging mode decreases with increasing s. This effect is exploited systematically in the weak-beam method, which allows one to image the dislocations as very fine bright lines on a dark background, using a reflection that is only weakly excited, that is, for which s is large. Unfortunately the contrast is weak and long exposure times are required (Fig. 20).

High-Resolution Imaging
Images of Thin Objects
Figure 17. Image formation at an edge dislocation according to the kinematical approximation. The thickness of the lines is a measure of the intensity of the beams. IT is the intensity of the transmitted beam and IS that of the scattered beam.
A very thin object acts as a phase object, in which the phase is proportional to the projected potential along the electron path. The reason is that, in an electrostatic potential, the electron changes speed, which results in a phase shift. The exit wave of the object can then be written as

ψ(R) ≈ 1 + iσ VP(R)   (10)

where VP(R) is the projected potential of the object. In the phase-contrast mode [Appendix A1, Eq. (18)], the phase shift of π/2 changes i into −1, so that the image
intensity is

I(R) ≈ 1 − 2σ VP(R)   (11)
The image contrast of a thin object is proportional to its electrostatic potential VP (R) projected along the direction of incidence.
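As a numerical illustration of Eqs. (10) and (11), a standalone sketch with a made-up projected potential (not code from the article): the weak-phase exit wave carries almost no intensity contrast by itself, while the π/2 phase shift of the phase-contrast mode converts it into contrast linear in VP.

```python
import numpy as np

sigma = 0.01  # illustrative interaction constant

# Hypothetical projected potential: two Gaussian "atom columns" on a grid
x, y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
Vp = np.exp(-((x - 0.3)**2 + y**2) / 0.01) + np.exp(-((x + 0.3)**2 + y**2) / 0.01)

psi = 1 + 1j * sigma * Vp      # weak-phase exit wave, Eq. (10)
I_direct = np.abs(psi)**2      # ~1 + (sigma*Vp)^2: almost no contrast
I_phase = 1 - 2 * sigma * Vp   # phase-contrast image, Eq. (11)

# The phase-contrast signal (linear in Vp) dwarfs the direct one (quadratic in Vp)
print(I_direct.max() - 1, 1 - I_phase.min())
```

The dark dots in the phase-contrast image sit exactly where the projected potential peaks, which is what makes this mode directly interpretable for very thin objects.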
Building-Block Structures

It often happens that a family of crystal structures exists of which all members consist of a stacking of the same simple building blocks but with a different stacking sequence. This is, for instance, the case in mixed-layer compounds, polytypes, and periodic twins, but periodic interfaces such as antiphase boundaries and crystallographic shear planes can also be considered as mixed-layer systems. If the blocks are larger than the resolution of the microscope, each block will show its characteristic contrast. In this way, the stacking of the blocks can be directly "read" from the image. The relation between image and structure is called the image code. An example is shown in Fig. 21 for the case of the binary alloy Au4 Mn, in which Au and Mn atoms are located on two sublattices of a single fcc lattice. In this image the Mn atoms are visualized as bright dots. The Au atoms are not visible. This kind of image can be interpreted unambiguously.
Interpretation Using Image Simulation
Figure 19. Comparison of observed (left) and computer simulated (right) images of dislocations. Note the excellent correspondence (after Ref. 9).
Figure 20. Weak beam image of dislocations in RuSe2 .
In most cases, however, the image cannot easily be decoded in terms of the object structure, making interpretation difficult, especially at very high resolution, where the image contrast can vary drastically with the focus distance. As a typical and historical example, structure images obtained by Iijima for the complex oxide Ti2 Nb10 O29 with a point resolution of approximately 0.35 nm are shown in Fig. 22 (upper parts). The structure as reproduced schematically in Fig. 23 consists of a stacking of corner- or face-sharing NbO6 octahedra
Figure 21. Dark-field superlattice image of Au4 Mn. Orientation and translation variants are revealed. (Courtesy of G. Van Tendeloo.)
Holographic Reconstruction Methods
Figure 23. Schematic representation of the unit cell of Ti2 Nb10 O29 consisting of corner-sharing NbO6 octahedra with Ti atoms in tetrahedral sites.
Due to the complexity of the imaging process, the information about the structure of the object is scrambled in the image. The structural information can be extracted directly from the images using holographic reconstruction methods. These methods aim at undoing the imaging process, that is, going back from image to object. Such a procedure consists of three steps. First one has to reconstruct the electron wave in the image plane. Then one has to reconstruct the exit wave at the object, and from this one has to deduce the projected structure of the object. In a recorded image, which shows only intensities, the phase information is lost. Hence, the reconstruction of the whole image wave is a typical phase problem that can only be solved using holographic methods. Basically two methods are workable. One method is off-axis holography (12). Here the electron beam is split in two waves by means of a biprism, which essentially is an electrostatically charged wire. One wave crosses the object so as to produce an enlarged image. The other wave (reference wave) passes by the object through the vacuum and interferes with the image wave in the image plane. In this way the high-resolution image is modulated by the interference fringes. From the position of the fringes one can then determine the phase of the electron wave. The other method is the focus variation method (1). The image wave is calculated by computer processing a series of images taken at different focus settings. Figure 24 shows an experimentally reconstructed exit wave for YBa2 Cu4 O8 . From this the structure of the object can be deduced.
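The sideband extraction at the heart of off-axis holography can be sketched in one dimension; this is a standalone illustration with a made-up object phase (the real reconstruction operates on 2D micrographs). The fringes produced by the tilted reference wave put the object phase onto a carrier frequency, and isolating that sideband in Fourier space recovers it.

```python
import numpy as np

N, q, w = 256, 32, 12          # samples, fringe carrier frequency, sideband half-width
x = np.arange(N)

phi = 0.5 * np.exp(-((x - N / 2)**2) / 200.0)   # hypothetical object phase
psi = np.exp(1j * phi)                          # image wave (pure phase object)
ref = np.exp(2j * np.pi * q * x / N)            # tilted reference wave from the biprism

hologram = np.abs(psi + ref)**2                 # recorded intensity: fringes shifted by phi

# Isolate the sideband at +q in Fourier space and shift it to zero frequency
F = np.roll(np.fft.fft(hologram), -q)
mask = np.zeros(N)
mask[:w] = 1
mask[-w:] = 1
rec = np.fft.ifft(F * mask)                     # ~ exp(-i*phi)

phi_rec = -np.angle(rec)                        # recovered object phase
print(np.max(np.abs(phi_rec - phi)))            # small reconstruction error
```

The carrier frequency q must exceed the bandwidth of the object wave, otherwise the sideband overlaps the central band and the phase cannot be separated — the same condition that governs the choice of biprism fringe spacing in the microscope.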
Figure 22. Comparison of experimental images (upper parts) and computer simulated images (lower parts) for Ti2 Nb10 O29 as a function of defocus. (Courtesy S. Iijima.)
with the titanium atoms in tetrahedral positions. High-resolution images are taken at different focus values, causing the contrast to change drastically. The best resemblance with the X-ray structure is obtained near the optimum Scherzer defocus, which is −90 nm in this particular case. However, the interpretation of such high-resolution images is never trivial. The only solution remains the comparison of the experimental images with those calculated for various trial structures. The results of such a calculation using the model of Fig. 23 are also shown in Fig. 22 (lower parts) and show a close resemblance to the experimental images. Image simulation, however, is a very tedious technique that has to deal with a number of unknown parameters (specimen thickness, exact focus, beam convergence, etc.). Furthermore, the comparison is often done visually. As a consequence, the technique can only be used if the number of plausible models is very limited.
Figure 24. Experimentally reconstructed exit wave for YBa2 Cu4 O8 . Top: reconstructed phase; center: structure model; bottom: experimental image.
ELECTRON MICROSCOPES
Quantitative Structure Determination

Ideally, quantitative extraction of information should be done as follows. One has a model for the object, for the electron-object interaction, for the microscope transfer, and for the detection, that is, all the ingredients needed to perform a computer simulation of the experiment. The object model that describes the interaction with electrons consists of the assembly of the electrostatic potentials of the constituting atoms. Also the imaging process is characterized by a number of parameters such as defocus, spherical aberration, and voltage. These parameters can either be known a priori with sufficient accuracy or not, in which case they have to be determined from the experiment. The model parameters can be estimated from the fit between the theoretical images and the experimental images. What one really wants is not only the best estimate for the model parameters but also their standard deviation (error bars), a criterion for the goodness of fit, and a suggestion for the best experimental setting. This requires a correct statistical analysis of the experimental data. The goodness of the fit between model and experiment has to be evaluated using a criterion such as likelihood, mean square difference, or R factor (as in X-ray crystallography). For each set of parameters of the model, one can calculate this goodness of fit, so as to yield a fitness function in parameter space. In principle, the search for the best parameter set is then reduced to the search for optimal fitness in parameter space. This search can only be done in an iterative way, as given in the schematic in Fig. 25. First one has a starting model, that is, a starting value for the object and imaging parameters {an}. From these one can calculate the corresponding images. This is a classical image simulation. (Note that the experimental data can also be a series of images and/or diffraction patterns.)
From the mismatch between experimental and simulated images one can
Figure 25. Scheme for the refinement procedure.
obtain a new estimate for the model parameters (for instance, using a gradient method), which can then be used for the next iteration. This procedure is repeated until the optimal fitness (i.e., optimal match) is reached. The refinement procedure needs a good starting model to guarantee convergence. Such a model can be derived from holographic reconstruction. The refinement can also be done using experimental electron diffraction patterns. An application of such refinement is shown in Figs. 26 and 27. Figure 26a shows an HREM image of a Mg/Si precipitate in an Al matrix (13). Figure 26b shows the phase of the exit wave, which is reconstructed experimentally using the focus variation method. From this an approximate structure model can be deduced. From different precipitates and different zones, electron diffraction patterns could be obtained, which were used simultaneously for a final fitting. For each diffraction pattern the crystal thickness as well as the local orientation were also treated as fittable parameters. An overview of the results is shown in Table 1.
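The iterative scheme of Fig. 25 can be illustrated with a toy example; everything here (the model function, the noise level, the step rule) is hypothetical, since the real procedure fits dynamical diffraction simulations such as MSLS to measured data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "experiment": a 1D image of one Gaussian column; true parameters unknown
def simulate(params, x):
    position, width = params
    return np.exp(-((x - position)**2) / width)

x = np.linspace(-5, 5, 200)
observed = simulate((1.2, 0.8), x) + 0.01 * rng.normal(size=x.size)  # noisy data

def goodness(params):
    """Mean-square difference between simulated and observed images."""
    return np.mean((simulate(params, x) - observed)**2)

# Crude iterative refinement: random local steps, keeping only improvements
params = np.array([0.0, 1.0])   # starting model (e.g., from holographic reconstruction)
best = goodness(params)
for _ in range(2000):
    trial = params + 0.05 * rng.normal(size=2)
    if trial[1] > 0 and (g := goodness(trial)) < best:
        params, best = trial, g

print(params)   # should land near the true values (1.2, 0.8)
```

A gradient method, as mentioned in the text, converges far faster than this random search, but the structure of the loop — simulate, score, update — is the same.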
Figure 26. HREM image (a) and phase of the experimentally reconstructed exit wave (b) of an Mg/Si precipitate in an Al matrix. [Courtesy H. Zandbergen (14).]
Table 1. Results of Structure Refinement Using Electron Diffraction Data

Zone    Number of Observed    Crystal Misorientation     Thickness    R Value (%)
        Reflections           h       k       l          (nm)         MSLS    Kinematic
[010]   50                    8.3     0       −2.3       6.7(5)       3.0     3.7
[010]   56                    2.6     0       −1.8       15.9(6)      4.1     8.3
[010]   43                    −1.7    0       0.3        16.1(8)      0.7     12.4
[010]   50                    −5.0    0       −1.0       17.2(6)      1.4     21.6
[010]   54                    −5.9    0       2.5        22.2(7)      5.3     37.3
[001]   72                    −3.9    4.5     0          3.7(3)       4.1     4.5
[001]   52                    3.6     −1.9    0          4.9(6)       6.8     9.3

Figure 27. Structure model with MSLS from the fitting procedure described in the text. [Courtesy H. Zandbergen (14).]
The obtained R factors are of the order of 5%, which is well below the R factors obtained using kinematical refinement, which does not account for the dynamical electron scattering. Figure 27 shows the structure obtained after refinement. Details of this study have been published by Zandbergen et al. (14).

SCANNING ELECTRON MICROSCOPY

The SEM is a mapping, rather than an imaging, device (Fig. 28) and so is a member of the same class of instruments as the facsimile machine, the scanning probe microscope, and the confocal optical microscope (19). The sample is probed by a beam of electrons scanned across the surface. Radiations from the specimen stimulated by the incident beam are detected, amplified, and used to modulate the brightness of a second beam of electrons scanned, synchronously with the first beam, across a cathode-ray-tube display. If the area scanned on the display tube is A × A and the corresponding area scanned on the sample is B × B, then the linear magnification M = A/B. The magnification is therefore geometric in origin and may be changed by varying the area scanned on the sample. The arrangement makes it possible for a wide range of magnifications to be obtained and allows rapid changes of magnification since no alterations to the electron-optical system are required. There is no rotation between object and image planes, and once the instrument has been focused on a given area the focus
Figure 28. Schematic illustration of the basic mapping principle of the scanning electron microscope. [Courtesy D. Joy (19).]
need not be changed when the magnification is varied. To a first approximation the size of the finest detail visible in the image will be set by the size of the probe scanning the specimen. Multiple detectors can be used to collect several signals simultaneously, which can then be displayed individually or combined in perfect register with each other. It is this possibility in particular that makes the SEM so useful a tool, since multiple views of a sample, in different imaging modes, can be collected and compared in a single pass of the beam. Figure 29 shows the basic components of a SEM. These can be divided into two main categories: the electron-optical and detector systems and the scanning, processing, and display systems. The electron-optical components are often described as being the "column" of the instrument while the other items are the "console" of the machine. The source of electrons is the gun, which produces electrons either by thermal emission, from tungsten or lanthanum hexaboride cathodes, or from a field-emission source. These electrons are then accelerated to an energy in the range from 500 eV to 30 keV. The beam of electrons leaving the gun is then focused onto the specimen by one or more condenser lenses. Although either electrostatic or electromagnetic lenses could be employed, all modern SEMs use electromagnetic lenses. Typically, the final objective lens has been of the pinhole design with the sample sitting outside the field of the lens, since this
Figure 29. Basic components of the scanning electron microscope. [Courtesy D. Joy (19).]
arrangement gives good physical access to the specimen. However, in this arrangement the specimen is 10 mm to 20 mm away from the lens, which must therefore be of long focal length and correspondingly high aberration coefficients. In modern, high-performance instruments it is now common to use an immersion lens (15), in which the sample sits inside the lens at the center of the lens field, or a "snorkel" lens (16), in which the magnetic field extends outside of the lens to envelop the sample. Although the immersion lens gives very good performance and, by making the sample part of the lens structure, ensures mechanical stability, the amount of access to the specimen is limited. The snorkel lens, on the other hand, combines good electron-optical characteristics with excellent access for detectors and stage mechanisms. The coils that scan the beam are usually incorporated within the objective lens. A double-scan arrangement is often employed in which one set of coils scans the beam through some angle θ from the axis of the microscope while a second set scans the beam through an angle 2θ in the opposite direction. In this way all scanned beams pass through a single point on the optic axis, allowing for the placement of a defining aperture without any constriction of the scanned area. The scan pattern produced on the specimen is usually square in shape and is made up of 1,000 horizontal lines, each containing 1,000 individual scanned points or pixels. The final image frame thus contains 10⁶ pixels, although for special activities such as focusing or alignment frames containing only 256 × 256 pixels may be used. Increasingly the detector output is passed through an analog-to-digital converter (ADC) and then handled digitally rather than as an analog video signal. This permits images to be stored, enhanced, combined, and analyzed using either an internal or an external computer.
While the majority of the images are still recorded onto photographic film, digital images can be stored directly to magnetic or magneto-optic disks, and hard-copy output of the images can then be obtained using laser or
dye sublimation printers. Typically scan repetition rates ranging from 15 or 20 frames/s ("TV rate") to one frame in 30 s to 60 s ("photographic rate") are provided. In addition individual pixels or arrays of pixels within an image field may be accessed if required. In the case of the SEM the attainable resolution is determined by a number of factors, including the diameter of the electron-beam probe that can be generated, the current Ib contained in that probe, the magnification of the image, and the type of imaging mode that is being used. Over most of the operating energy range (5 keV to 30 keV) of the SEM, the probe size and beam current are related by an expression of the form (17)

d = CS^(1/4) λ^(3/4) [1 + Ib/(βλ^2)]^(3/8)   (12)

where λ is the wavelength of the electrons (λ ≈ 1.226 E0^(−1/2) nm, where E0 is the incident electron energy in eV), β is the brightness of the electron gun in A · cm⁻² · sr⁻¹, and CS is the spherical aberration coefficient of the objective lens. Finally, if the gun brightness is further increased to 10⁸ A · cm⁻² · sr⁻¹ by using a field-emission source (18), then the factor is close to unity for both modes of operation considered. For a modern SEM CS is typically a few millimeters; thus minimum probe sizes of 1 nm or 2 nm are available. At low beam energies (below 5 keV) additional effects including the energy spread of electrons in the beam must also be considered, but the general conclusions discussed previously remain correct.
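Equation (12) can be evaluated for a hypothetical operating point to reproduce the orders of magnitude quoted in the text; the brightness and current values below are illustrative, and constant factors of order unity in the formula are omitted.

```python
import math

def probe_diameter(Ib, beta, Cs, E0):
    """Probe diameter (in cm) from Eq. (12); Ib in A, beta in A/(cm^2 sr),
    Cs in cm, E0 in eV. Constant prefactors of order unity are omitted."""
    lam = 1.226e-7 / math.sqrt(E0)   # electron wavelength in cm (1.226/sqrt(E0) nm)
    return Cs**0.25 * lam**0.75 * (1 + Ib / (beta * lam**2))**0.375

# Hypothetical operating point: 10 pA probe current, Cs = 3 mm, E0 = 20 keV
d_w   = probe_diameter(1e-11, 1e5, 0.3, 2e4)  # tungsten thermionic gun
d_feg = probe_diameter(1e-11, 1e8, 0.3, 2e4)  # field-emission gun, brightness 10^8
print(d_w * 1e7, d_feg * 1e7)                 # probe diameters in nm
```

With the field-emission brightness the bracket factor is indeed close to unity, and the probe diameter drops to the 1-2 nm range quoted in the text, versus several nanometers for the thermionic gun.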
energy the SE only travel relatively short distances in the specimen (3 nm to 10 nm) and thus they emerge from a shallow "escape" region beneath the surface. There are two cases in which an SE can be generated and subsequently escape from the specimen: first, when an incident electron passes downward through the escape depth, and second, as a backscattered electron leaves the specimen and again passes through the escape region. Secondary electrons produced in the first type of event are designated SE1, and because they are generated at the point where the incident beam enters the specimen, it is these that carry high-resolution information. The other secondary electrons are called SE2, and these come from a region whose size is of the order of the incident beam range in the sample. Secondary-electron imaging is the most common mode of operation of the SEM. The reason for this is that secondary electrons are easy to collect and they carry information about the surface topography of the specimen. Information about surface chemistry and magnetic and electric fields may also be obtainable on suitable specimens. SE images can usually be interpreted readily without specialized knowledge and they yield a spatial resolution of 1 nm or better. Examples of typical SE images are shown in Figs. 30 and 31. The light and shadow effects together with the very large depth of focus enhance the 3D aspects of the surface structure. Another imaging mode is voltage contrast, which is illustrated in Fig. 32. Here large regions of uniform bright and dark contrast correspond to regions that have a negative and positive voltage with respect to ground.
Figure 31. High-resolution image of magnetic disk media surface recorded at 30 keV in a JEOL JSM 890 field-emission SEM. [Courtesy D. Joy (19).]
Backscattered Electrons

Backscattered electrons (BSE) are defined as being those electrons emitted from the specimen with energies between 50 eV and the incident beam energy E0. Because the yield of BSE varies with the atomic number of the specimen, the contrast of the images is related to the atomic number of the object.

Other Imaging Modes

With a SEM it is possible to measure the current through the object as induced by the imaging electron beam [electron-beam-induced current (EBIC)]. This signal gives
Figure 32. Voltage contrast from an integrated circuit. Recorded at 5 keV in a Hitachi S-800 field-emission SEM. [Courtesy D. Joy (19).]
information about the electron-hole pair carriers in a semiconductor such as those at p–n junctions. In cathodoluminescence, one detects the fluorescence radiation that is due to irradiation by the incident beam. This is a very sensitive technique that gives information about the impurities in semiconductors. For more information we refer the reader to Ref. 19.

SCANNING TRANSMISSION ELECTRON MICROSCOPY
Figure 30. Secondary-electron images of radiolarium. Recorded in Hitachi S-4500 field emission SEM at 5 keV beam energy. Magnification: 800×. [Courtesy D. Joy (19).]
In principle a STEM can be considered as a SEM in which the object is transparent for the high-energy electrons and in which the detector is placed behind the object. As in a SEM, a fine electron probe, formed by using a strong objective electron lens to demagnify a small source, is scanned over the specimen in a two-dimensional raster [Fig. 33(a)]. The electron probe is necessarily convergent: the convergence angle is, ideally, inversely proportional to the minimum probe size that determines the microscope resolution. On any plane after the specimen, a convergent-beam electron-diffraction pattern is formed. Some part of this diffraction pattern is collected in a detector, creating a signal, which is displayed on a cathode-ray-tube screen to
form the image, using a raster scan matched to that which deflects the incident electron beam (20,21). Dark-field images, obtained with an annular detector in a STEM instrument, showed the first clear electron microscopy images of individual heavy atoms (22). From that time, STEM has developed as an important alternative to conventional, fixed-beam transmission electron microscopy (CTEM), with special advantages for many purposes. The use of a field emission gun (FEG) for high-resolution STEM is necessary to provide sufficient signal strength for viewing or recording images in a convenient time period. Because the FEG source has a brightness that is a factor of 10³ or 10⁴ greater than that of a W hairpin filament, the total current in the electron beam is greater when beam diameters of less than about 10 nm are produced. The current in a beam of 1 nm diameter is typically about 1 nA. As suggested by Fig. 33(b), the essential components of a STEM imaging system are the same as those for a CTEM instrument, with the electrons traveling in the opposite direction. In this diagram condenser and projector lenses have been omitted, and only the essential objective lens, which determines the imaging characteristics, is included. The STEM detector replaces the CTEM electron source. The STEM gun is placed in the detector plane of the CTEM, and the scanning system effectively translates the STEM source to cover the CTEM recording plate. When one uses a detector with a hole to eliminate the unscattered electron beam, the imaging is effectively incoherent, so that the image contrast can be interpreted directly in terms of the atomic number of the constituting atoms. This imaging mode is therefore called Z-contrast imaging (20). Figure 34 shows a STEM image of a tilt boundary in silicon in which the local atomic configuration can be seen directly in the images.
Figure 33. (a) Diagram of the essential components of a STEM instrument. (b) Diagram suggesting the reciprocity relationship between STEM (electrons going from left to right) and CTEM (electrons going from right to left). [Courtesy J. Cowley (20).]
Figure 34. " = 9, {221}(100) symmetric tilt boundary in silicon viewed along the [110] direction showing its fiveand seven-membered ring structure. ADF = annular dark field; EELS = electron energy loss spectroscopy. [Courtesy S. Pennycook (21).]
The strength of STEM as compared to TEM is that a variety of signals may be obtained in addition to the bright-field or dark-field signals derived from the elastic scattering of electrons in the specimen. STEM instruments are usually fitted with an energy-loss spectrometer. Energy-filtered images reveal compositional information. For more information we refer to Refs. 20 and 24.
APPENDIX A. ELECTRON-DIFFRACTION THEORIES

Phase Object Approximation

We will now follow a classical approach. The nonrelativistic expression for the wavelength of an electron accelerated by an electrostatic potential E is given by

λ = h/√(2meE)   (A.1)

with h the Planck constant, m the electron mass, and e the electron charge. During the motion through an object with local potential V(x, y, z) the wavelength varies with the position of the electron as

λ′(x, y, z) = h/√(2me[E + V(x, y, z)])   (A.2)

For thin phase objects and large accelerating potentials the assumption can be made that the electron keeps traveling along the z direction, so that by propagation through a slice dz the electron suffers a phase shift

dφ(x, y, z) = 2π dz/λ′ − 2π dz/λ = 2π{√([E + V(x, y, z)]/E) − 1} dz/λ ≈ σ V(x, y, z) dz,  with σ = π/(λE)   (A.3)

Therefore the total phase shift is given by

φ(x, y) = σ ∫ V(x, y, z) dz = σ VP(x, y)   (A.4)

where VP(x, y) represents the potential of the specimen projected along the z direction. Under this assumption the specimen acts as a pure phase object with transmission function

ψ(x, y) = exp[iσ VP(x, y)]   (A.5)

In case the object is very thin, one has

ψ(x, y) ≈ 1 + iσ VP(x, y)   (A.6)

This is the weak-phase approximation. The effect of all processes prohibiting the electrons from contributing to the image contrast, including the use of a finite aperture, can in a first approximation be represented by a projected absorption function µ in the exponent of Eq. (A.5), so that

ψ(x, y) = exp[iσ VP(x, y) − µ(x, y)]   (A.7)

or

ψ(R) = exp[iσ VP(R) − µ(R)]   (A.8)

with R = (x, y) the vector in the plane perpendicular to z.

Kinematical Theory

As follows from Eq. (1), a diffraction pattern can be calculated from the Fourier transform of the exit wave ψ(R). However, even for a simple approximation such as Eq. (A.8) the Fourier transform is not expressed in a simple analytical form. In order to derive a simpler, albeit approximate, expression for the diffraction pattern it is more convenient to describe the diffraction process directly in Fourier space. According to the kinematical diffraction theory, electrons are scattered only once in the specimen and, moreover, the incident beam is not depleted by scattering. Each atom (scattering center) thus sees the same incident beam amplitude. This approximation is excellent in neutron diffraction and justified in X-ray diffraction, but it is poor in electron diffraction because the atomic scattering cross sections for electrons are relatively much larger than those for the other forms of radiation. The kinematical approximation is therefore only applicable to very thin crystals (a few nanometers for most materials) or to very large deviations from the exact Bragg condition (large s). It allows one to compute the amplitude of the diffracted beam only, since the incident beam remains undepleted. Qualitative conclusions from the kinematical theory are nevertheless usually in agreement with the observations.

A crystal is made up of identical unit cells, regularly arranged at the basic lattice nodepoints given by

AL = l1 a1 + l2 a2 + l3 a3   (A.9)

(where the lj are integers). In each unit cell, a number N of atoms is found at the relative positions ρk (k = 1, . . . , N). Mathematically speaking, the whole crystal is made up by convolution of one unit cell with the basic crystal lattice. Atom positions are thus rj = AL + ρk, and they depend on four indices l1, l2, l3, and k. Let k0 represent the wave vector of the incident wave and k that of the diffracted wave; then at large distance, i.e., in the Fraunhofer approximation, the phase difference between a wave diffracted by an atom at the origin and an atom at rj is given by 2π(k − k0) · rj, and the scattered amplitude A(k) along the direction of k (Fig. 35) is

A(k) = Σj fj exp[2π i(k − k0) · rj]   (A.10)

This amplitude will exhibit maxima if all exponents are integer multiples of 2π i; maxima will thus occur if
(k − k0) · rL = integer, which implies that k − k0 must be a reciprocal-lattice vector

k − k0 = BH ≡ h1 b1 + h2 b2 + h3 b3   (A.11)

(where the hj are integers and the bj are reciprocal base vectors). This is Ewald's condition as discussed in the section on electron diffraction. However, A(k) will also be different from zero if the diffraction condition is not exactly satisfied, that is, if Ewald's sphere narrowly misses a reciprocal-lattice node by a vector s, called the excitation error. This vector is parallel to the foil normal and connects the reciprocal-lattice node with the intersection point with Ewald's sphere; by convention s is positive when the reciprocal-lattice node is inside Ewald's sphere and negative when outside (Fig. 7).

Figure 35. Illustrating the path difference OC − AB = (k − k0) · rj between waves diffracted by an atom at the origin and an atom at rj.

One can now set k − k0 = BH + s and

AH = Σj fj exp[2π i(BH + s) · rj]   (A.12)

and with

rj = AL + ρk   (A.13)

AH = ΣL Σk fk exp[2π i(BH + s) · (AL + ρk)]   (A.14)

Neglecting s · ρk as compared to the other terms and noting that BH · AL is always an integer, this can be written as

AH = FH ΣL exp(2π i s · AL)   (A.15)

where the structure factor FH is defined as

FH = Σk fk exp(2π i BH · ρk)   (A.16)

Equation (A.15) is in fact a triple sum over the indices L(l1, l2, l3). If s is written as a vector in terms of the reciprocal-space base vectors bj,

s = s1 b1 + s2 b2 + s3 b3   (A.17)

one has

s · AL = l1 s1 + l2 s2 + l3 s3   (A.18)

and hence

AH = FH Σl1 Σl2 Σl3 exp[2π i(l1 s1 + l2 s2 + l3 s3)]   (lj = 0, . . . , Nj − 1)

The triple sum can be expressed as the product of three single sums of geometrical progressions. Calling N1, N2, and N3 the numbers of unit cells along the three lattice directions a1, a2, and a3, one obtains, neglecting an irrelevant phase factor,

AH = FH [sin(π s1 N1) sin(π s2 N2) sin(π s3 N3)]/[sin(π s1) sin(π s2) sin(π s3)]   (A.19)

This is the well-known von Laue interference function (10) (Fig. 36), which describes the dependence of the scattered amplitude on the deviation parameter s. The sine functions in the denominators can be approximated by their arguments, since these are always small. We further note that for large N one has sin(π N s)/(N π s) ≈ δ(s), with δ(s) = 0 for s ≠ 0 and δ(s) = 1 for s = 0. We can then write, neglecting irrelevant phase factors,

AH = FH (Ω/Va) δ(s1) δ(s2) δ(s3)   (A.20)

where Ω is the volume of the crystal and Va the volume of the unit cell (Ω = N1 N2 N3 a1 a2 a3; Va = a1 a2 a3). For a parallelepiped-shaped crystal block one often introduces the components of s along the three mutually perpendicular edges of the block, with unit vectors ex, ey, and ez:

s = sx ex + sy ey + sz ez   (A.21)

One can then rewrite Eq. (A.19) in terms of sx, sy, and sz as

AH = FH [sin(π sx N1 a1) sin(π sy N2 a2) sin(π sz N3 a3)]/[sin(π sx a1) sin(π sy a2) sin(π sz a3)]   (A.22)

Figure 36. The von Laue interference function sin²(π sx a1 N1)/sin²(π sx a1) for N1 = 20, describing the dependence of the scattered intensity on the excitation error sx.
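The shape of the von Laue function plotted in Fig. 36 is easy to reproduce numerically. The sketch below (a normalized one-dimensional form of Eq. (A.19); the values of N are arbitrary) confirms that the principal maximum has a width proportional to 1/N, which is the origin of the delta functions in Eq. (A.20):

```python
import numpy as np

def von_laue(s, n):
    """Normalized 1D von Laue interference function sin^2(pi*n*s)/(n^2*sin^2(pi*s)),
    equal to 1 at integer s (where the unnormalized function peaks at n^2)."""
    s = np.asarray(s, dtype=float)
    num = np.sin(np.pi * n * s)
    den = np.sin(np.pi * s)
    safe_den = np.where(np.abs(den) < 1e-12, 1.0, den)
    # Take the s -> 0 limit (ratio = n) where the denominator vanishes.
    ratio = np.where(np.abs(den) < 1e-12, float(n), num / safe_den)
    return (ratio / n) ** 2

s = np.linspace(-0.5, 0.5, 20001)
for n in (10, 20, 40):
    inten = von_laue(s, n)
    fwhm = np.ptp(s[inten > 0.5])   # full width at half maximum of the main peak
    print(n, f"{fwhm * n:.3f}")     # FWHM scales roughly as 0.89/n
```

The product fwhm × n stays roughly constant, showing the 1/N narrowing of the principal maximum.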
Hereby use was made of the relations valid for a parallelepiped:

s1 = sx a1,  s2 = sy a2,  s3 = sz a3   (A.23)
Equation (A.22) is only true if N1, N2, and N3 are sufficiently large. However, the foils used in transmission electron microscopy are only large in two dimensions, that is, along x and y, the foil being only a small number N3 of unit cells thick along z. In such a foil one thus obtains

AH = (FH/Va) [sin(π sz N3 a3)/(π sz a3)] δ(sx) δ(sy)   (A.24)

Introducing the specimen thickness N3 a3 = t, assuming sx = sy = 0, and calling sz = s, one finds, per unit surface area,

AH = sin(π s t)/(s tH)   (A.25)

where tH = π/FH; tH is called the extinction distance. This result is interpreted as meaning that the sharp reciprocal-lattice nodes, characteristic of a large crystal, become rods in the case of a thin plate, as already mentioned before. These rods are perpendicular to the foil plane and have a weight profile given by sin(π s t)/(s tH). The corresponding intensity is given by (Fig. 37)

IH = sin²(π s t)/(s tH)²   (A.26)

it is called the rocking curve. An intensity can be associated with each intersection point of the Ewald sphere with this rod (called a relrod), that is, with each s value, the intensity being given by the value of the function at the intersection point. Another way to interpret these results is the following: In the Fraunhofer approximation, the diffracted wave is expressed by the Fourier transform of the electrostatic potential of the crystal. A crystal can be considered as a convolution product of two factors, the potential of one unit cell and the lattice function. The convolution theorem then states that the diffracted wave is given by the product of the respective
Fourier transforms of these two factors [e.g., Eq. (A.15)]. The Fourier transform of the lattice function yields delta functions at the reciprocal nodepoints, which describe the directions of the diffracted beams. The amplitudes of these beams are then given by the Fourier transforms of the potential of one unit cell, i.e., the structure factors [e.g., Eq. (A.20)]. The conservation of energy also requires that the wave vectors of the diffracted beams all have constant length, or, equivalently, that the reciprocal nodes lie on a sphere, the Ewald sphere. In case the object is a thin crystal slab, it can be described as the product of an infinite crystal with a slab function that is equal to 1 inside the slab and 0 elsewhere. In that case, the diffraction pattern is given by the convolution product of the diffraction pattern of the infinite crystal with the Fourier transform of the slab function. Then each reciprocal node is smeared along a line perpendicular to the slab, with a weight given by a sinc function of the form in Eq. (A.26). Note that if the Ewald sphere were flat, the diffracted wave could be derived from the Fourier transform of Eq. (A.8), provided the phase is weak. This means that, apart from the curvature of the Ewald sphere, the weak-phase-object theory and the kinematical theory are equivalent. Equation (A.15) can also be understood intuitively in terms of the column approximation along the z direction. The amplitude of the wave diffracted by the volume element Δzn at level zn in the column (measured from the entrance face) is given by ΔAH = FH Δzn exp(2π i s zn), or in differential form

dAH = FH exp(2π i s z) dz
(A.27)
The amplitude at the exit face of the column is then given by the sum

AH = FH Σn exp(2π i s zn) Δzn   (A.28)

which, if s = const, can be approximated by the integral

AH = FH ∫0t exp(2π i s z) dz   (A.29)

or, apart from a phase factor,

AH = FH sin(π s t)/(π s)   (A.30)
Figure 37. Rocking curve for a foil of thickness t0 according to the kinematical theory.
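The connection between the column integral (A.29), the closed form (A.30), and the rocking curve (A.26) can be verified numerically. In the sketch below, FH and t are arbitrary illustrative values:

```python
import numpy as np

F_H = 0.05   # structure factor per unit length (illustrative units)
t = 50.0     # foil thickness (same length units)

def amplitude_numeric(s, steps=20000):
    """Column sum of Eqs. (A.28)/(A.29): A_H = F_H * sum exp(2*pi*i*s*z) dz
    evaluated with the midpoint rule."""
    dz = t / steps
    z = (np.arange(steps) + 0.5) * dz
    return F_H * np.sum(np.exp(2j * np.pi * s * z)) * dz

def amplitude_closed(s):
    """Modulus of Eq. (A.30): |A_H| = F_H |sin(pi*s*t)| / (pi*|s|)."""
    return F_H * abs(np.sin(np.pi * s * t) / (np.pi * s))

for s in (0.003, 0.011, 0.018):
    assert abs(abs(amplitude_numeric(s)) - amplitude_closed(s)) < 1e-3 * F_H * t

# Zeros of the rocking curve I_H = sin^2(pi*s*t)/(s*t_H)^2, Eq. (A.26), fall at s = n/t:
assert abs(amplitude_numeric(1.0 / t)) < 1e-8 * F_H * t
print("kinematical amplitude checks passed")
```

The explicit sum makes the origin of the sinc-shaped rocking curve obvious: it is just the interference of the slices of one column.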
which is consistent with Eq. (A.26), though not identical. In the complex plane the sum in Eq. (A.28) can be represented by an amplitude-phase diagram (7) (Fig. 38); it consists of the vector sum of elementary vectors, all of the same length, each representing the amplitude diffracted by a unit cell. Successive unit cells along a column in a perfect crystal diffract with constant phase differences, that is, the corresponding vectors enclose constant angles. The diagram is a regular polygon that can be approximated by a circle of radius FH/(2π s). The length of the arc of the circle is equal to the column length. The amplitude diffracted by the column is given by the length of the segment connecting the endpoints P and P′ of the arc. It is clear that the amplitude will be zero if the column length (i.e., the foil thickness t) is an integer number of
complete circles long. The maximum amplitude is equal to the diameter of the circle, that is, to FH/(π s). Along deformed columns the amplitude-phase diagrams become curved (spiral shaped), since the angle between successive segments is no longer constant.

Figure 38. Complex-plane construction of the amplitude-phase diagram for a perfect foil.

Two-Beam Dynamical Theory for Perfect Crystals

The dynamical theory takes into account that a scattered beam can act in turn as an incident beam and be scattered again in the interior of the crystal. The simplest case that can be discussed analytically, and which moreover is relevant for image formation in the diffraction contrast mode, is the two-beam case. Next to the incident beam only one beam is strongly excited (has small s). This scattered beam is then again incident under the Bragg angle on the same set of lattice planes and can thus be scattered again. This interplay between incident and scattered beams tends to obliterate the distinction between them in the interior of the crystal; it strongly limits the lateral spread of an incident Bragg-diffracted electron beam and justifies the column approximation used in image calculations of defects in the diffraction contrast mode. The dynamical theory is applicable to ''thick'' crystals provided absorption is also taken into account. It allows one to compute the amplitudes of the transmitted as well as of the diffracted beam for a single Bragg reflection. We include all the usual approximations in the model from the onset: (i) ideal two-beam situation, (ii) no absorption, and (iii) column approximation. Within a column along z, perpendicular to the foil surface, we describe the interplay between the transmitted beam, represented by the plane wave φ0(z) exp(2π i k0 · r), and the scattered beam, represented by φH(z) exp(2π i k · r) (two-beam approximation). The complex amplitudes φ0 and φH depend on z only (column approximation). Within the slice dz at the level z behind the entrance face we express that the transmitted beam amplitude results from the interference between the twice-transmitted beam, with amplitude φ0(z)φ0(dz), and the doubly scattered beam, of which the amplitude is φH(z)φ−H(dz) [Fig. 39(a)]. The minus sign in −H means that reflection takes place from the −H side of the set of lattice planes. We thus obtain

φ0(z + dz) = φ0(z)φ0(dz) + φH(z)φ−H(dz)   (A.31)

The slice dz being arbitrarily thin, the kinematical approximation [e.g., Eq. (A.27)] can be applied rigorously, with φ0(dz) ≡ 1 (no beam depletion) and φ−H(dz) = (π i/t−H) exp(2π i s z) dz, where the factor i results from the phase change on scattering and where the structure amplitude FH has been expressed in terms of the extinction distance tH. Note that changing H → −H also changes the sign of s [Fig. 39(b)]. Similarly, the scattered beam amplitude results from the interference between (1) the transmitted beam, which is subsequently scattered in dz, and (2) the scattered beam, which is subsequently transmitted through dz [Fig. 39(a)].

Figure 39. Schematic representation of the interfering waves during dynamical diffraction in the two-beam case. (a) Right: transmitted beam; left: scattered beam. (b) Changing H → −H also changes s → −s.
This leads to the relation φH (z + dz) = φ0 (z)φH (dz) + φH (z)φ0 (dz)
(A.32)
where again φ0(dz) = 1 and φH(dz) = (π i/tH) exp(−2π i s z) dz. The two Eqs. (A.31) and (A.32) can be transformed into differential equations by noting that quite generally

φ(z + dz) − φ(z) = (dφ/dz) dz   (A.33)

One thus obtains the following set of coupled differential equations:

dφ0/dz = (π i/t−H) exp(2π i s z) φH(z)
dφH/dz = (π i/tH) exp(−2π i s z) φ0(z)   (A.34)
in centrosymmetric crystals t−H = tH. An alternative system is obtained by the substitution

φ0 = T,  φH = S exp(−2π i s z)   (A.35)

which only changes the phase of the amplitudes but not the resulting intensities. One obtains

dT/dz = (π i/t−H) S
dS/dz = 2π i s S + (π i/tH) T   (A.36)

These are the Darwin-Howie-Whelan equations (10,11) of the two-beam dynamical diffraction theory. The solution for a perfect crystal (i.e., constant s) is easily obtained by the standard procedure used to solve systems of coupled first-order differential equations; one finds

T = [cos(π σH z) − i (sH/σH) sin(π σH z)] exp(π i sH z)
S = [i/(σH tH)] sin(π σH z) exp(π i sH z)   (A.37)

where

σH² = (1 + sH² tH²)/tH²   (A.38)

The scattered intensity is thus given by the square modulus of S,

IS = S S* = sin²(π σH z)/(σH tH)²   (A.39)

where S* denotes the complex conjugate of S, and IT = 1 − IS since absorption is neglected. Formula (A.39) is the homolog of formula (A.26), found in the kinematical approximation. Note that the depth period, which is 1/sH in the kinematical case, now becomes 1/σH; there is no longer a divergence for sH → 0.

Equations (A.34) describe the periodic transfer of electrons from the transmitted beam into the scattered beam and vice versa; this effect is called the Pendellösung effect because of its similarity with the behavior of two coupled pendulums or two coupled oscillating circuits. Equations (A.37) describe the periodic depth variation of the diffracted and transmitted intensities, as well as their variation as a function of the excitation error s. Equation (A.39) is called the rocking curve. In an undeformed wedge-shaped specimen the depth variation gives rise to thickness extinction contours, which are parallel to the cutting edge of the wedge (Fig. 40). In a bent plane-parallel specimen the lines of constant s give rise to equi-inclination or bent contours (Fig. 41). It can be shown that, when absorption is taken into account, the shape of the rocking curve becomes asymmetric in s for the transmitted beam (Fig. 42) whereas it remains symmetric for the scattered beam. [A similar effect occurs in X-ray diffraction: the Borrmann effect (22).] The steep slope of the transmitted intensity in the vicinity of s = 0 is exploited in imaging strain fields due to dislocations and other defects.

Figure 40. Illustration of the formation of thickness extinction contours.

Two-Beam Dynamical Theory for Faulted Crystals

Displacement Fields of Defects

In transmission electron microscopy defects are characterized by their displacement fields R(r). The simplest example is the stacking fault, for which R(r) is a step function: R = 0 for z < z1 and R = R0 for z1 < z < z0, z1 being the level of the stacking-fault plane in the foil and z0 the foil thickness. The exit part of the foil is displaced over a vector R0 with respect to the entrance part (Fig. 15). At the level of the interface the diffracted beam undergoes a relative phase shift given by

α = 2π H · R0   (A.40)

whereas the transmitted beam is unaffected. The amplitude TS of the transmitted beam for the foil containing
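Because Eqs. (A.36) are linear first-order equations, they are also easy to integrate numerically, which provides an independent check of the closed-form solution (A.37) and of the rocking curve (A.39). A sketch (illustrative values of tH and s; centrosymmetric crystal, no absorption):

```python
import numpy as np

t_H = 30.0   # extinction distance (nm, illustrative)
s = 0.02     # excitation error (1/nm, illustrative)

def integrate_hw(z_max, steps=20000):
    """RK4 integration of the Darwin-Howie-Whelan system, Eq. (A.36):
       dT/dz = (i*pi/t_H)*S,   dS/dz = 2*pi*i*s*S + (i*pi/t_H)*T,
       with boundary condition T = 1, S = 0 at the entrance face z = 0."""
    def f(y):
        T, S = y
        return np.array([1j * np.pi / t_H * S,
                         2j * np.pi * s * S + 1j * np.pi / t_H * T])
    y = np.array([1.0 + 0j, 0.0 + 0j])
    h = z_max / steps
    for _ in range(steps):
        k1 = f(y); k2 = f(y + h / 2 * k1)
        k3 = f(y + h / 2 * k2); k4 = f(y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

sigma_H = np.sqrt(1 + (s * t_H) ** 2) / t_H     # Eq. (A.38)

z = 25.0
T, S = integrate_hw(z)
I_S = abs(S) ** 2
assert abs(I_S - np.sin(np.pi * sigma_H * z) ** 2 / (sigma_H * t_H) ** 2) < 1e-6  # Eq. (A.39)
assert abs(abs(T) ** 2 + I_S - 1.0) < 1e-6      # I_T = 1 - I_S (no absorption)
print(f"Pendellösung depth period 1/sigma_H = {1 / sigma_H:.2f} nm")
```

The numerical and analytic intensities agree, and the printed depth period is the Pendellösung period discussed above.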
a stacking fault parallel to the foil plane can thus be formulated as (Fig. 42, left)

TS = T1 T2 + S1 S2− exp(iα)   (A.41)

The expressions T1, T2, S1, S2 refer to the amplitudes for perfect foils. The indices 1 and 2 refer to the entrance part and the exit part, respectively; the minus sign indicates that the excitation error is −s in the corresponding expression because the diffraction vector is −H. Similarly, the diffracted beam amplitude can be expressed as (Fig. 42, right)

SS = T1 S2 exp(−iα) + S1 T2−   (A.42)

The meaning of Eq. (A.41) is obvious; it expresses that the transmitted beam results from the interference between the doubly transmitted beam and the doubly scattered beam. The factor exp(iα) takes into account the phase shift over α of the beam scattered by the exit part. Equation (A.42) has a similar meaning. In Eq. (A.42) the phase shift is −α because the phase shifts are opposite in sign for S2 and S2−. Detailed expressions can be obtained by replacing T1, T2, S1, S2 by their explicit expressions from Eq. (A.37). If the fault plane is inclined with respect to the foil plane, the phase change α takes place at a level z1 that now depends on the position x along the foil. For instance, in Fig. 43, z1 becomes a linear function of x. As a result, TS and SS become quasi-periodic functions not only of z1 but also of x. For s = 0 the depth period is equal to tH; for s ≠ 0 it becomes 1/σH, where σH is given by Eq. (A.38).

Figure 41. Illustration of the formation of equi-inclination (bent) contours.

Figure 42. Schematic representation of the interfering waves in the case of a foil containing a planar interface. Left: transmitted amplitude; right: scattered amplitude.

Figure 43. Cross section of a foil containing a stacking fault in an inclined plane, illustrating the formation of stacking-fault fringes. (a) According to the kinematical theory (s ≠ 0); (b) according to the dynamical theory (s = 0).
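The fringe formation sketched in Fig. 43 follows directly from Eqs. (A.37) and (A.41): letting the fault depth z1 vary, as it does along x for an inclined fault, modulates the transmitted intensity quasi-periodically. A numerical sketch; the parameter values are illustrative, and α = 2π/3 is the familiar value for a stacking fault with H · R0 = ±1/3:

```python
import numpy as np

t_H, s, t0 = 30.0, 0.01, 60.0    # extinction distance, excitation error, thickness
alpha = 2 * np.pi / 3             # fault phase angle alpha = 2*pi*H.R0
sigma_H = np.sqrt(1 + (s * t_H) ** 2) / t_H

def T_perfect(z, s_loc):
    """Transmitted amplitude of a perfect foil of thickness z, Eq. (A.37)."""
    sg = np.sqrt(1 + (s_loc * t_H) ** 2) / t_H
    return (np.cos(np.pi * sg * z)
            - 1j * (s_loc / sg) * np.sin(np.pi * sg * z)) * np.exp(1j * np.pi * s_loc * z)

def S_perfect(z, s_loc):
    """Scattered amplitude of a perfect foil of thickness z, Eq. (A.37)."""
    sg = np.sqrt(1 + (s_loc * t_H) ** 2) / t_H
    return (1j / (sg * t_H)) * np.sin(np.pi * sg * z) * np.exp(1j * np.pi * s_loc * z)

def fringe_profile(z1):
    """Transmitted intensity |T_S|^2 for a fault at depth z1, Eq. (A.41);
    the 'minus' amplitude S2- is obtained by reversing the sign of s."""
    T1, S1 = T_perfect(z1, s), S_perfect(z1, s)
    T2 = T_perfect(t0 - z1, s)
    S2m = S_perfect(t0 - z1, -s)
    return abs(T1 * T2 + S1 * S2m * np.exp(1j * alpha)) ** 2

z1 = np.linspace(0.0, t0, 601)
I_T = np.array([fringe_profile(z) for z in z1])
# A fault lying in either surface leaves the perfect-crystal intensity unchanged:
I_perfect = abs(T_perfect(t0, s)) ** 2
assert abs(I_T[0] - I_perfect) < 1e-9 and abs(I_T[-1] - I_perfect) < 1e-9
print(f"fringe intensity range: {I_T.min():.3f} .. {I_T.max():.3f}")
```

Plotting I_T against z1 (or against x for an inclined fault) reproduces the fringe pattern of Fig. 43.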
Strained Crystals

Strain fields and lattice defects are characterized by their displacement fields R(r): the atom that was at r before deformation will be found at r + R(r) after deformation. A twin boundary with a small twinning vector (domain boundary) parallel to the foil plane at the level z1 (Fig. 15) can, for instance, be represented by the displacement field R = 0 for z < z1 and R = kz for z > z1. A pure screw dislocation can be described by the function R = b(θ/2π), where θ is the azimuth angle measured in the plane perpendicular to b; all displacements are parallel to b. The Darwin-Howie-Whelan Eqs. (A.36) can be adapted to the case of a deformed crystal by the substitution

s ⇒ seff = s + H · (dR/dz)   (A.43)
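In practice Eq. (A.43) is used by integrating the Howie-Whelan system down each column with the local seff(z). A sketch for a column passing at lateral distance x from a screw dislocation (R = b θ/2π, so H · R = n θ/2π with n = H · b); all parameter values are illustrative and absorption is ignored:

```python
import numpy as np

t_H, s, t0 = 30.0, 0.005, 60.0   # illustrative two-beam parameters (nm units)
n_Hb = 2                          # n = H . b of the screw dislocation (illustrative)
z_d = 30.0                        # depth of the dislocation line below the entrance face

def column_intensity(x, steps=2000):
    """Scattered intensity |S|^2 for a column at lateral distance x from the core,
    integrating Eq. (A.36) with s replaced by s_eff(z) = s + H.(dR/dz), Eq. (A.43)."""
    def f(z, y):
        T, S = y
        # For a screw dislocation H.R = n*theta/(2*pi) with theta = atan2(z - z_d, x),
        # so the depth derivative is:
        dHR_dz = n_Hb / (2 * np.pi) * x / ((z - z_d) ** 2 + x ** 2)
        s_eff = s + dHR_dz
        return np.array([1j * np.pi / t_H * S,
                         2j * np.pi * s_eff * S + 1j * np.pi / t_H * T])
    y = np.array([1.0 + 0j, 0.0 + 0j])    # T = 1, S = 0 at the entrance face
    z, h = 0.0, t0 / steps
    for _ in range(steps):                 # RK4 steps down the column
        k1 = f(z, y); k2 = f(z + h / 2, y + h / 2 * k1)
        k3 = f(z + h / 2, y + h / 2 * k2); k4 = f(z + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        z += h
    return abs(y[1]) ** 2

# The image profile across the dislocation is asymmetric in x (side contrast):
print(f"I_S(x=+2) = {column_intensity(2.0):.4f}, I_S(x=-2) = {column_intensity(-2.0):.4f}")
```

Scanning x across the core yields the familiar one-sided dislocation contrast of diffraction-contrast images.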
The Multislice Method

The two-beam dynamical treatment is insufficient for the general situation in which high-resolution images are taken with the incident beam along a zone axis, where many diffracted beams are involved. Therefore the multislice method was developed as a numerical method to compute the exit wave of an object. Although the multislice formula can be derived from quantum-mechanical principles, we follow a simplified version (23) of the more intuitive original optical approach (24). A more rigorous treatment is given in the next section. Consider a plane wave incident on a thin specimen foil oriented nearly perpendicular to the incident beam direction z. If the specimen is sufficiently thin, we can assume the electron to move approximately parallel to z, so that the specimen acts as a pure phase object with transmission function [Eq. (A.5)]

ψ(x, y) = exp[iσ Vp(x, y)]   (A.44)

A thick specimen can now be subdivided into thin slices, perpendicular to the incident beam direction. The potential of each slice is projected into a plane that acts as a two-dimensional phase object. Each point (x, y) of the exit plane of the first slice can be considered as a Huyghens source for a secondary spherical wave with amplitude ψ(x, y) (Fig. 44). The amplitude ψ(x′, y′) at the point (x′, y′) of the next slice can then be found by the superposition of all spherical waves of the first slice, that is, by integration over x and y, yielding

ψ(x′, y′) = ∫∫ exp[iσ Vp(x, y)] [exp(2π i k r)/r] dx dy   (A.45)

When |x − x′| ≪ ε and |y − y′| ≪ ε, with ε the slice thickness, the Fresnel approximation can be used, that is,

r = √[(x − x′)² + (y − y′)² + ε²] ≈ ε[1 + (x − x′)²/(2ε²) + (y − y′)²/(2ε²)]   (A.46)

so that

ψ(x′, y′) = [exp(2π i k ε)/ε] ∫∫ exp[iσ Vp(x, y)] exp{(iπ k/ε)[(x − x′)² + (y − y′)²]} dx dy   (A.47)

which, apart from constant factors, can be written as a convolution product

ψ(x, y) = exp[iσ Vp(x, y)] ∗ exp[iπ k(x² + y²)/ε]   (A.48)

where the convolution product of two functions is defined as (in one dimension)

f(x) ∗ g(x) = ∫ f(x′) g(x − x′) dx′   (A.49)

Figure 44. Schematic representation of the propagation effect of electrons between successive slices of thickness ε.
If the wave function at the entrance face is ψ(x, y, 0) instead of a plane wave, one has for the wave function at the exit face

ψ(x, y, ε) = {ψ(x, y, 0) exp[iσ Vp(x, y)]} ∗ exp[iπ k(x² + y²)/ε]   (A.50)

This is the Fresnel approximation, in which the emerging spherical wavefront is approximated by a paraboloidal wavefront. The propagation through the vacuum gap from one slice to the next is thus described by a convolution product in which each point source of the previous slice contributes to the wave function in each point of the next slice. The motion of an electron through the whole specimen can now be described by an alternation of phase-object transmissions (multiplications) and vacuum propagations (convolutions). In the limit of the slice thickness ε tending to zero, this multislice expression converges to the exact solution of the nonrelativistic Schrödinger equation in the forward-scattering approximation. In the original multislice method one used the Fourier transform of Eq. (A.50), in which the real-space points (x, y) are transformed into diffracted beams g and convolution and normal products are interchanged, that is,

ψ(g, ε) = [ψ(g, 0) ∗ exp(iσ V)g] exp(iπ g² ε/k)   (A.51)
where Vg are the structure factors (Fourier transforms of the unit-cell potential). The wave function at the exit face of the crystal can now be obtained by successive application of Eq. (A.50) or (A.51). This can either be done in real space [Eq. (A.50)] or in reciprocal space [Eq. (A.51)]. The major part of the computing time is required for the calculation of the convolution product, which is proportional to N2 [N is the number of sampling points (real space) or beams (reciprocal space)]. Since the Fourier transform of a convolution product yields a normal product (with calculation time proportional to N) a large gain in speed can be obtained by alternatingly performing the propagation in reciprocal space and the phase object transmission in real space (23). In this way the computing time is devoted to the Fourier transforms and is proportional to N log2 N. Another way of increasing the speed is in the so-called real-space method (24). Here the whole calculation is done in real space using Eq. (A.50) but the forward scattering of the electrons is exploited so as to calculate the convolution effect of the propagation only in a limited number of adjacent sampling points. In this way, the calculation time is proportional to N. This method does not require a periodic crystal and is thus suitable for calculation of crystal defects.
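The alternation of real-space phase-object multiplications and reciprocal-space propagator multiplications described above can be sketched in a few lines. Everything below is illustrative (a toy Gaussian projected potential and an arbitrary interaction constant); a real calculation needs correct potentials and sampling fine enough for the scattering angles involved:

```python
import numpy as np

N, a = 128, 2.0        # sampling points per side and cell edge (nm); illustrative
eps = 0.2              # slice thickness (nm)
lam = 0.00197          # electron wavelength at ~300 kV (nm)
sigma_int = 0.65       # interaction constant (illustrative value)
n_slices = 50

# Toy projected potential of one slice: four Gaussian "atom columns".
xs = (np.arange(N) - N // 2) * (a / N)
X, Y = np.meshgrid(xs, xs, indexing="ij")
Vp = sum(np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * 0.05 ** 2))
         for x0, y0 in [(-0.5, -0.5), (0.5, 0.5), (-0.5, 0.5), (0.5, -0.5)])

# Phase-object transmission of one slice, Eq. (A.44).
t = np.exp(1j * sigma_int * Vp)

# Fresnel propagator for one vacuum gap, applied in reciprocal space (cf. Eq. A.51).
g = np.fft.fftfreq(N, d=a / N)               # spatial frequencies (1/nm)
G2 = g[:, None] ** 2 + g[None, :] ** 2
P = np.exp(1j * np.pi * lam * eps * G2)

psi = np.ones((N, N), dtype=complex)         # plane-wave incidence
for _ in range(n_slices):
    psi = np.fft.ifft2(np.fft.fft2(psi * t) * P)

# Unimodular transmission plus unitary propagation conserve the mean intensity.
print(f"mean intensity after {n_slices} slices: {np.mean(np.abs(psi) ** 2):.6f}")
```

This is the FFT variant discussed in the text: the convolution of Eq. (A.50) becomes a cheap pointwise product in reciprocal space, so the cost per slice is dominated by the two Fourier transforms.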
Electron Channeling

The multislice method is an efficient method to compute numerically the exit wave of an object. However, it obscures interesting physical aspects of dynamical electron scattering. The channeling theory is more approximate (1,25) (although improvements are currently being made), but it is simple and it gives much physical insight.

Electron Wave. Consider an isolated column of atoms, parallel to the electron beam. If we assume that the fast electron behaves in the direction of propagation (z axis) as a classical particle with velocity v = hk/m, we can consider the z axis as a time axis with

t = mz/(hk)   (A.52)

Hence we can start from the time-dependent Schrödinger equation

(−h/i) ∂ψ(R, t)/∂t = H ψ(R, t)   (A.53)

with

H = −(h²/2m) ΔR − eU(R, t)   (A.54)

with U(R, t) the electrostatic crystal potential, m and k the relativistic electron mass and wave number, and ΔR the Laplacian operator acting in the plane (R) perpendicular to z. Using Eq. (A.52) we then have

∂ψ(R, z)/∂z = (i/4π k)[ΔR + V(R, z)] ψ(R, z)   (A.55)

with

V(R, z) = (2me/h²) U(R, z)   (A.56)

This is the well-known high-energy equation in real space, which can also be derived from the stationary Schrödinger equation in the forward-scattering approximation (22). If we now consider the depth proportional to the time, the dynamical Eq. (A.55) represents the walk of an electron in the two-dimensional projected potential of the columns. The solution can be expanded in eigenfunctions (eigenstates) of the Hamiltonian

ψ(R, z) = Σn Cn φn(R) exp[−iπ (En/E)(z/λ)]   (A.57)

where

H φn(R) = En φn(R)   (A.58)

with the Hamiltonian

H = −(h²/2m) ΔR − eU(R)   (A.59)
U(R) is the projected potential of the column,

E = h²k²/(2m)   (A.60)

E is the incident electron energy, and λ is the electron wavelength. For En < 0 the eigenstates are bound to the column. We now rewrite Eq. (A.57) as

ψ(R, z) = Σn Cn φn(R) + Σn Cn φn(R) {exp[−iπ (En/E)(z/λ)] − 1}   (A.61)

The coefficients Cn are determined from the boundary condition

Σn Cn φn(R) = ψ(R, 0)   (A.62)

In the case of plane-wave incidence one thus has

Σn Cn φn(R) = 1   (A.63)

so that

ψ(R, z) = 1 + Σn Cn φn(R) {exp[−iπ (En/E)(z/λ)] − 1}

Only states will appear in the summation for which

|En| ≥ Eλ/z   (A.64)

These are bound states with deep energy levels that are localized near the column cores. In practice, if the column does not consist of heavy atoms and the distance between columns is not too close (e.g., larger than 0.1 nm), only one eigenstate will appear, which can be compared to the 1s state of an atom. We then have

ψ(R, z) = 1 + Cφ(R) {exp[−iπ (E0/E)(z/λ)] − 1}   (A.65)

where E0 is the energy of this bound state. A very interesting consequence of this description is that, since the state φ is very localized at the atom core, the wave function for the total object can be expressed as a superposition of the individual column functions φi, so that Eq. (A.65) in that case becomes

ψ(R, z) = 1 + Σi Ci φi(R − Ri) {exp[−iπ (E0/E)(z/λ)] − 1}   (A.66)

where the summation runs over all the atomic columns of the object parallel to the electron beam. The interpretation of Eq. (A.66) is simple. Each column i acts as a channel in which the wave function oscillates periodically with depth. The periodicity is related to the ''weight'' of the column, that is, proportional to the atomic number of the atoms in the column and inversely proportional to their distance along the column. The importance of these results lies in the fact that they describe the dynamical diffraction for larger thicknesses than the usual phase-grating approximation and that they require only the knowledge of one function φi per column (which can be set in tabular form, similar to atom scattering factors or potentials). Furthermore, even in the presence of dynamical scattering, the wave function at the exit face still retains a one-to-one relation with the configuration of columns, for perfect crystals as well as for defective crystals, provided they consist of columns parallel to the electron beam. Hence this description is very useful for interpreting high-resolution images.

Diffraction Pattern. Fourier transforming the wave function Eq. (A.66) at the exit face of the object yields the wave function in the diffraction plane, which can be written as

ψ(g, z) = Σi exp(−2π i g · Ri) Fi(g, z)   (A.67)

In a sense the simple kinematical expression for the diffraction amplitude holds, provided the scattering factor for the atoms is replaced by a dynamical scattering factor for the columns, which is defined by

Fi(g, z) = Ci fi(g) {exp[−iπ (Ei/E)(z/λ)] − 1}   (A.68)

with fi(g) the Fourier transform of φi(R). It is clear that the dynamical scattering factor varies periodically with depth. This periodicity may be different for different columns. In the case of a monatomic crystal, all Fi are identical. Hence ψ(g, z) varies perfectly periodically with depth. In a sense the electrons are periodically transferred from the central beam to the diffracted beams and back. The periodicity of this dynamical oscillation (which can be compared with the Pendellösung effect) is called the dynamical extinction distance. It has, for instance, been observed in Si(111). An important consequence of Eq. (A.67) is the fact that the diffraction pattern can still be described by a kinematical type of expression, so that existing results and techniques (e.g., extinction rules) that have been based on the kinematical theory remain valid to some extent for thicker crystals in zone orientation.
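A minimal numerical sketch of the channeling picture, assuming a single bound state per column with a Gaussian profile as a crude stand-in for the localized 1s-like state (all energies, widths, and the wavelength are illustrative); it shows the periodic depth oscillation whose period plays the role of the dynamical extinction distance:

```python
import numpy as np

E = 3.0e5         # incident electron energy (eV); illustrative
lam = 0.00197     # electron wavelength (nm); illustrative
E0 = -150.0       # bound-state energy of the column (eV); illustrative

def psi_column(r, z, C=1.0, width=0.03):
    """Exit wave of a single column, Eq. (A.65), with a Gaussian stand-in
    for the localized 1s-like state phi(R)."""
    phi = np.exp(-r ** 2 / (2 * width ** 2))
    return 1.0 + C * phi * (np.exp(-1j * np.pi * (E0 / E) * (z / lam)) - 1.0)

# The wave oscillates periodically with depth; the period 2*E*lam/|E0| plays
# the role of the dynamical extinction distance.
z_p = 2 * E * lam / abs(E0)
assert abs(psi_column(0.0, z_p) - psi_column(0.0, 0.0)) < 1e-9
print(f"depth period: {z_p:.2f} nm")
```

Heavier columns (larger |E0|) oscillate faster, which is the "weight" dependence described in the text.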
ELECTRON PARAMAGNETIC RESONANCE (EPR) IMAGING
BIBLIOGRAPHY
287
ELECTRON PARAMAGNETIC RESONANCE (EPR) IMAGING
1. S. Amelinckx et al., eds., Electron Microscopy, vol. 1, chap. IV, in Handbook of Microscopy, Weinheim, VCH, 1997.
SANDRA S. EATON GARETH R. EATON
2. D. W. Robards and A. J. Wilson, Procedures in Electron Microscopy, Wiley, Chichester, 1993.
University of Denver Denver, CO
3. W. L. Bragg, Nature 124, 125 (1929). 4. P. P. Ewald, Ann. Phys. 54, 519 (1917). 5. J. Steeds, Convergent Beam Electron Diffraction, vol. 1, chap. IV.1.5, in S. Amelinckx et al., eds., Handbook of Microscopy, Methods I, Weinheim, VCH, 1997. 6. S. Takagi, Acta Crystallogr. 15, 1311 (1962). 7. P. B. Hirsch, A. Howie, and M. J. Whelan, Philos. Trans. R. Soc. A252, 499 (1960). 8. S. Amelinckx and J. Van Landuyt, in S. Amelinckx, R. Gevers, and J. Van Landuyt, eds., Diffraction and Imaging Techniques in Material Science, Amsterdam, North Holland, 1978, p. 107. 9. P. B. Hirsch et al., Electron Microscopy of Thin Crystals, Butterworths, London, 1965. 10. P. Humble, in S. Amelinckx, R. Gevers, and J. Van Landuyt, eds., Diffraction and Imaging Techniques in Material Science, Amsterdam, North Holland, 1978, p. 315. 11. D. J. H. Cockayne, in S. Amelinckx, R. Gevers, and J. Van Landuyt eds., Diffraction and Imaging Techniques in Material Science, Amsterdam, North Holland, 1978. 12. H. Lichte, Electron Holography Methods, chap. IV.1.8, in S. Amelinckx et al., eds., Handbook of Microscopy, Methods I, Weinheim, VCH, 1997. 13. J. Jansen et al., Acta Crystallogr. A54, 91 (1998). 14. H. W. Zandbergen, S. Anderson, and J. Jansen, Science 12, 1221 (1997).
PRINCIPLES OF EPR IMAGING Electron paramagnetic resonance (EPR) imaging maps the spatial distribution of unpaired electron spins. Information about a sample can be obtained from images that have one, two, or three spatial dimensions. Images also can be obtained that have an additional spectral dimension that reveals the dependence of the EPR spectrum on the position in the sample. One can then seek to understand why certain types of spins are found at particular locations in a sample, why the concentrations vary with position, or why the concentrations vary with time. For the benefit of readers who are not familiar with EPR, a brief introduction to the principles of EPR is given in the next section. The emphasis is on the aspects of EPR that impact the way that imaging experiments are performed. In the section following, we discuss types of species that have unpaired electrons, also called paramagnetic species, that have been studied by EPR imaging. Subsequent sections discuss procedures for EPR imaging and provide examples of applications. More extensive introductions to EPR imaging can be found in Ref. 1–3. Reviews of the EPR imaging literature for the years 1990 to 1995 (4) and 1996 to early 2000 (5) provide many additional references.
Principles of EPR

Resonance Condition. EPR (also known as electron spin resonance, ESR, or electron magnetic resonance, EMR) studies unpaired electrons by measuring the absorption of energy by the spin system in the presence of a magnetic field. Molecules or materials that have unpaired electrons are called paramagnetic. For the purposes of this discussion, we consider only paramagnetic species that have a single unpaired electron, because they are more likely to be amenable to EPR imaging. More comprehensive introductions to EPR are available in Ref. 2 and in standard texts (6). Many of the physical principles are similar to those of nuclear magnetic resonance (NMR). When an electron is placed in a magnetic field, the projection of the electron's spin angular momentum on the axis defined by the external field (usually designated as the z axis) can take on only one of two values, +1/2 and −1/2, in units of ħ, Planck's constant divided by 2π. This restriction to only a small number of allowed states is called quantization. The separation between the two energy levels for the unpaired electron is proportional to the magnetic field strength B (Fig. 1). Unlike many of the other spectroscopies described in this encyclopedia, EPR involves interaction of the unpaired electron with the magnetic component of electromagnetic
ELECTRON PARAMAGNETIC RESONANCE (EPR) IMAGING

[Figure 1 appears here: an energy-level diagram showing the two projections of the spin on the z axis (+1/2 and −1/2) separating by gβB as B increases from B = 0, together with the resulting absorption line.]
Figure 1. The splitting of electron spin energy levels increases proportionally to magnetic-field strength. Transitions between the two energy levels are stimulated by electromagnetic radiation when hν = gβe B. The customary display in EPR spectroscopy is the derivative of the absorption line that is shown here.
radiation, rather than with the electric component. When electromagnetic radiation whose energy is equal to the separation between the spin energy levels is applied to the sample, energy is absorbed by the spins, and transitions occur between the spin states. This resonance condition is defined by hν = gβeB, where h is Planck's constant, ν is the frequency of the electromagnetic radiation, βe is the Bohr magneton, B is the magnetic field strength, and g is a characteristic value for a particular paramagnetic species. EPR experiments are most commonly performed at microwave frequencies of about 9.0–9.5 GHz (in the frequency band that is commonly called X band). The g values of organic radicals are typically close to 2.0, so resonance at X band for an organic radical occurs at magnetic fields of about 3200 to 3400 G (1 gauss = 0.1 mT). Samples are placed in a structure that is called a resonator. At X band, the most common resonator is a rectangular box, called a cavity, whose dimensions are about 2.3 × 1.1 × 4.3 cm. The maximum usable dimension of a sample is about 1.0 cm, provided that the sample does not absorb too much microwave energy. Other resonant structures are typically used at the lower frequencies employed for in vivo imaging. In a continuous wave (CW) experiment, the microwave frequency and power are held constant, and the magnetic field is swept through resonance to record the spectrum. Because magnetic-field modulation and phase-sensitive detection are traditionally used, the first derivative of the absorption spectrum is recorded. The first-derivative display also provides resolution enhancement, which is advantageous because many EPR lines have rather large line widths. Factors related to the choice of the microwave frequency at which to perform imaging experiments are discussed later.

Hyperfine Splitting. In real samples, an unpaired electron is typically surrounded by many nuclear spins
that contribute to the net magnetic field experienced by the electron spin. The energy required for the electron spin transition depends on the quantized spin states of neighboring nuclei, which splits the EPR signal into multiple lines. This is called hyperfine splitting. The number of lines in the EPR signal is 2nI + 1, where n is the number of equivalent nuclei and I is the nuclear spin. Thus, interaction with one 14N nucleus (I = 1) causes splitting into three lines, and interaction with three equivalent protons (I = 1/2) causes splitting into four lines. The separation between adjacent lines is called the hyperfine splitting constant. The magnitude of the splitting constant in fluid solution, for a system tumbling rapidly enough to average electron–nuclear dipolar interactions to zero, depends upon the extent to which the unpaired electron spin is delocalized onto that nucleus. Hyperfine splitting constants of 10 to 20 G are common for organic radicals, and splitting constants of 100 G or more are common for transition-metal ions. Thus, a typical EPR spectrum of an organic radical may extend over tens of gauss. The hyperfine splitting constants are independent of the magnetic field, so hyperfine splittings become increasingly large fractions of the resonant field as the microwave frequency (and corresponding resonant magnetic field) is decreased.

Rigid Lattice Spectra. The g values for most paramagnetic centers are anisotropic, which means that the g value is different for different orientations of the molecule with respect to the magnetic field. When a paramagnetic center tumbles rapidly in solution, the g anisotropy is averaged, and a single g value is observed.
However, when a sample that contains an unpaired electron is immobilized in a rigid lattice (a crystalline solid, an amorphous solid, a frozen solution, or a solution that formed a glass when it was cooled), the EPR spectrum is more complicated and extends over a wider range of magnetic fields than when the same material is in solution. The anisotropic dipolar contributions to hyperfine splitting, which were averaged in fluid solution, also contribute to spectral complexity in the solid state.

Contributions to Line Widths. As discussed later, the spatial resolution of the image for most approaches to EPR imaging is inversely proportional to the line width of the EPR signal. Thus, it is important to appreciate the more common factors that contribute to line width. EPR line widths for organic radicals typically are of the order of a gauss (1 gauss at g = 2 is 2.8 MHz). In some cases, line widths are determined by relaxation times and are therefore temperature dependent. Frequently, unresolved nuclear hyperfine splitting and a distribution of chemical environments make significant contributions to line widths. In fluid solutions, collisions between paramagnetic species (including oxygen) can broaden EPR lines, so solutions whose radical concentrations are higher than a few mM typically have line widths greater than those observed at lower concentrations. At concentrations higher than about 10 mM, collision broadening is usually severe and can cause loss of
hyperfine structure. At very high concentrations, exchange narrowing can collapse the spectrum to a single line whose width depends strongly on concentration. Solvent viscosity affects line widths in samples that have significant g anisotropy because incomplete motional averaging of the anisotropy broadens the EPR signal. In rigid lattice samples, unresolved hyperfine splitting and distributions of g values and nuclear hyperfine splittings are major contributors to line widths. As the concentration of the paramagnetic species increases, there is increasing dipolar interaction between neighboring paramagnetic centers, which broadens the lines. Thus, for both fluid solution and rigid lattice spectra, concentrations of paramagnetic centers greater than a few mM significantly broaden the spectra and result in a loss of spatial resolution in EPR images. Ultimately, the resolution achievable in EPR imaging is limited by signal-to-noise (S/N), so the broadening of the signal that occurs at high concentrations of paramagnetic centers (high S/N) places physical limits on achievable resolution.

EPR Imaging versus EPR Spectroscopy. EPR spectroscopy is performed in a magnetic field that is as uniform as possible. Often the spectrum is uniform through the sample; if it is not, the EPR spectrum is a superposition of the signals from all positions in the sample. The goal in an imaging experiment is to distinguish between signals from different portions of the sample. An image typically displays EPR signal intensity as a function of one or more spatial dimensions. For samples in which the EPR line shape varies with position in the sample, the image may include line shape as an imaging dimension. Any EPR observable could potentially be an imaging dimension.
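The quantitative relations introduced above lend themselves to quick numerical checks. The sketch below (an illustrative Python snippet, not part of the original article) computes the X-band resonant field from hν = gβeB, the number of hyperfine lines from 2nI + 1, and the frequency equivalent of a 1-G line width at g = 2.

```python
# Numerical checks of the EPR relations discussed above.
# Physical constants (SI units).
h = 6.62607e-34        # Planck's constant, J s
beta_e = 9.27401e-24   # Bohr magneton, J/T

def resonant_field_gauss(freq_hz, g=2.0):
    """Field satisfying h*nu = g*beta_e*B, returned in gauss (1 G = 1e-4 T)."""
    return h * freq_hz / (g * beta_e) / 1e-4

def hyperfine_lines(n, I):
    """Number of hyperfine lines from n equivalent nuclei of spin I: 2nI + 1."""
    return int(2 * n * I + 1)

def gauss_to_mhz(delta_b_gauss, g=2.0):
    """Frequency width equivalent to a field width, g*beta_e*dB/h, in MHz."""
    return g * beta_e * (delta_b_gauss * 1e-4) / h / 1e6

# X band (about 9.0-9.5 GHz) at g = 2 resonates near 3200-3400 G.
print(round(resonant_field_gauss(9.5e9)))   # 3394
# One 14N nucleus (I = 1) gives 3 lines; three equivalent protons give 4.
print(hyperfine_lines(1, 1), hyperfine_lines(3, 0.5))   # 3 4
# A 1-G line width at g = 2 corresponds to about 2.8 MHz.
print(round(gauss_to_mhz(1.0), 1))          # 2.8
```

The computed fields reproduce the 3200–3400 G range quoted above for the 9.0–9.5 GHz X band.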
Species whose EPR Signals are Amenable to Imaging

Most EPR imaging experiments are performed at room temperature, so the species to be imaged must have a suitable EPR signal at room temperature. Many organic radicals in fluid solution are so unstable that their EPR signals do not persist long enough for an imaging experiment. There are important exceptions to this generalization, which has led to the widespread use of certain classes of organic radicals as probes in imaging experiments. Nitroxyl radicals have an unpaired electron delocalized in a nitrogen–oxygen π* molecular orbital. An example of this class of radicals is the molecule that is given the acronym Tempone (I). These compounds are also sometimes called nitroxide radicals or aminoxyl radicals. Radicals of this class can be made with five-membered rings in place of the six-membered ring of Tempone and can have a variety of substituents at the 4-position of the ring. The carbon atoms adjacent to the N–O moiety of these molecules are substituted with methyl groups (as in Tempone) or other bulky side chains to protect the paramagnetic center sterically. Many nitroxyl radicals are commercially available. They are stable in solution, even in the presence of oxygen, for prolonged periods, provided that no species are present that are strong enough oxidizing or reducing agents to destroy the paramagnetic center.
The EPR signals for nitroxyl radicals exhibit hyperfine splitting due to the nitrogen atom, which results in a three-line signal whose splitting is about 15 G in fluid solution. The magnitude of the hyperfine splitting and the relative widths of the three lines of the hyperfine pattern can be useful monitors of the environment of the radical but can also complicate the imaging experiment as discussed later. A second class of compounds that has been exploited more recently in imaging experiments is derivatives of the triarylmethyl radical. These radicals have the advantage that they do not contain nitrogen, which avoids splitting the signal into three hyperfine lines. The triarylmethyl radical has relatively low stability in air. However, highly substituted derivatives, such as II, are substantially more stable than the parent radical and can be made water-soluble by selecting the groups on the periphery (7). These radicals are called trityl radicals or triarylmethyl (TAM) radicals. The EPR signal for these radicals is a single relatively narrow line (that has many unresolved hyperfine components), which is very convenient for spatial imaging. Radicals entrapped in a solid may be more stable than the same species in solution. For example, the EPR signals due to certain radicals in organic chars and irradiated solids are quite stable and suitable for imaging.
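The contrast between the nitroxyl triplet and the single trityl line can be illustrated numerically. The snippet below is a sketch using typical values from the text (X-band center field near 3400 G, aN of about 15 G); the specific numbers are illustrative, not measured data.

```python
# First-order hyperfine line positions: B = B_center - a * mI for each
# nuclear spin projection mI = -I, -I+1, ..., +I.
def hyperfine_positions(center_gauss, a_gauss, I):
    """Resonant fields for coupling to one nucleus of spin I (first order)."""
    n_states = int(2 * I + 1)
    m_values = [-I + k for k in range(n_states)]
    return [center_gauss - a_gauss * m for m in m_values]

# A nitroxyl radical (one 14N, I = 1, aN ~ 15 G) centered at 3400 G:
print(hyperfine_positions(3400.0, 15.0, 1))   # [3415.0, 3400.0, 3385.0]
# A trityl radical with no resolved nuclear coupling gives a single line:
print(hyperfine_positions(3400.0, 0.0, 0))    # [3400.0]
```

The three-line pattern spread over about 30 G is what complicates nitroxyl imaging relative to the single narrow trityl line.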
[Structures I and II appear here: I is Tempone, a six-membered-ring nitroxyl radical; II is a trityl radical whose three aryl rings bear sulfur-containing, CD3-substituted groups and a COO−K+ group.]
Many transition-metal ions have stable oxidation states and one or more unpaired electrons. However, the EPR signals of many of these metal ions at room temperature are quite broad, or there are many lines in the spectrum, which makes imaging based on magnetic-field gradients more difficult than for narrow-line signals.

Encoding Spatial Information using Magnetic-Field Gradients

The spatial information in most EPR imaging experiments is encoded using magnetic-field gradients. The basic principles underlying the use of gradients are illustrated in Fig. 2 for a single imaging dimension. Consider two samples that contain the same paramagnetic species for which the EPR signal is a single line. Let Bres designate the magnetic field that satisfies the resonance condition (hν = gβeBres). In the absence of a magnetic-field gradient, if Bext (the externally applied uniform magnetic field) is swept as in the usual field-swept continuous wave EPR experiment, both samples give a single-line signal centered at Bext = Bres. Now, consider the impact of an applied magnetic-field gradient along the direction of the main magnetic field, which is defined as the z axis. The magnetic-field gradient can be expressed as ∂Bz/∂z, the change in the z component
of the magnetic field as a function of the z spatial coordinate. The magnetic-field gradient is generated so that at the center of the cavity, zo, there is zero gradient, and the total magnetic field Bo (the field at location zo) equals Bext. The applied magnetic-field gradient is such that at the left-hand sample (position zi) the gradient field contributes an amount Bi, and at the right-hand sample (position zj) the gradient contributes Bj. For the present example, let Bi be positive and Bj be negative. In the presence of the applied gradient, the sample at zi experiences a magnetic field that is larger than Bext by the amount Bi. Resonance occurs when Bext + Bi = Bres, so Bext = Bres − Bi. Similarly, the sample at zj experiences a magnetic field that is smaller than Bext because Bj is negative. Hence, the sample at zj will not be at resonance until Bext increases to Bres − Bj. Thus, when the external field is scanned to achieve resonance, the magnetic-field gradient distinguishes between the samples at different spatial locations in the EPR cavity. Therefore, we say that the magnetic-field gradient encodes spatial information into spectral information: samples at different spatial positions appear as if they were different spectral components when the gradient is applied and the external field is varied to record a spectrum. The resulting spectrum in Fig. 2c is shown in the traditional first-derivative presentation. Magnetic-field gradients are usually generated by passing electric currents through coils of wire. A pair of coils is used to generate a gradient along the direction of the external magnetic field (∂Bz/∂z), and the current flows in opposite directions in the two coils. To create gradients in the two perpendicular directions, pairs of coils shaped like a figure eight are employed. The current in the coils is varied to adjust the magnitude of the gradient. Fixed gradients can be generated by using ferromagnetic wedges (8).

Figure 2. Encoding of spatial information by using a magnetic-field gradient. (a) Two samples of the same paramagnetic material that has a single-line spectrum are placed in an EPR cavity at positions zi and zj. (b) A magnetic-field gradient is applied, so that the gradient is zero at position zo, positive at position zi, and negative at position zj. (c) During a CW field-swept EPR experiment, the two samples achieve resonance at different magnitudes of the slowly swept external field Bext. (Figure adapted from Ref. 3 and used with permission.)

Figure 3. One-dimensional X-band spatial imaging of a sample composed of five small tubes that contain coal. Coal exhibits a single-line EPR spectrum in a normal nongradient EPR experiment. The five samples were the same height, as shown in the insert, but the coal in three of the tubes was diluted with NaCl. (a) First-derivative X-band CW spectrum of the sample in the presence of a magnetic-field gradient of 103 G/cm. (b) One-dimensional image obtained by mathematical deconvolution of the nongradient spectrum from the spectrum shown in (a). (Figure reproduced from Ref. 23 and used with permission.)

One-dimensional Spatial Imaging. Consider the sample shown in Fig. 3. Coal has a strong, single-line EPR spectrum. Five tubes that contained coal, or coal mixed with NaCl, were put in the EPR cavity and oriented so that the five samples were spaced along the z direction. The normal (nongradient) X-band EPR spectrum exhibited a single line. When a magnetic-field gradient of 103 G/cm was applied and the external magnetic field was scanned, the spectrum in Fig. 3a was obtained. Five peaks were observed. The horizontal axis in the figure is the magnetic field. This is a complicated spectrum, but knowing the conditions of the experiment, one can see that each of the five samples of coal yielded a separate peak in the
CW EPR spectrum. To find the spatial position of the samples, the magnitude of the field gradient (103 G/cm in this case) is used to convert the spectral dimension to a spatial dimension. Because the EPR lines have finite width (importantly, as discussed later, they are all the same width in this case), the line width can be removed mathematically by deconvolution. The result of deconvoluting the line shape from the first integral of the spectrum, and converting the horizontal axis from gauss to distance, is shown in Fig. 3b. This is a display of the intensity of the EPR signal as a function of position in the cavity. Figure 3b is a one-dimensional spatial image of unpaired spin density in the composite coal sample. The gradient is the scaling factor between the sweep width in gauss and the spatial axis in the image. For example, a 50-G scan of a sample in the presence of a 100-G/cm gradient generates an image whose spatial axis is 0.50 cm long. The intensity of the EPR signal is proportional to the concentration of coal in each tube. The next step in developing the concepts of EPR imaging is to add hyperfine structure to the problem described in the preceding paragraph. So we substitute solutions of nitroxyl radicals for the coal samples and keep everything else the same. A nitroxyl radical in fluid solution in a uniform magnetic field exhibits a three-line spectrum whose splitting is about 15 G, due to the coupling to the I = 1 14N nucleus. The X-band first-derivative CW EPR spectrum for the five-part sample in the presence of a 103-G/cm gradient is shown in Fig. 4a. Six lines are immediately evident, and some of them show further complications. Actually, there are 15 overlapping lines: three lines from each of the five samples. The appearance depends on the magnitude of the gradient (more about this later).
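The deconvolution and axis-conversion steps just described can be sketched numerically. The snippet below is an illustrative model, not the authors' code: a made-up one-dimensional spin-density profile is convolved with a Lorentzian line shape to simulate the observed spectrum, the profile is recovered by Fourier-domain division, and the field axis is rescaled to distance by the gradient, as in the 50-G scan with a 100-G/cm gradient mentioned above.

```python
import numpy as np

# Model a 1D CW EPR imaging experiment: the observed absorption spectrum is
# the spin-density profile convolved with a position-independent line shape.
n = 512
sweep_gauss = 50.0                       # magnetic-field scan width
field = np.linspace(0, sweep_gauss, n)   # field axis, gauss

# Hypothetical profile: three "tubes" of different spin concentration.
profile = np.zeros(n)
profile[100:130] = 1.0
profile[240:270] = 0.5
profile[380:410] = 0.75

# Lorentzian line shape (1-G full width), identical at every position.
line = 1.0 / (1.0 + ((field - sweep_gauss / 2) / 0.5) ** 2)
line /= line.sum()

observed = np.real(np.fft.ifft(np.fft.fft(profile) * np.fft.fft(line)))

# Deconvolution: divide out the line shape in the Fourier domain.  This only
# works this cleanly because the line shape is the same everywhere; a
# position-dependent line shape would produce artifacts, as the text notes.
recovered = np.real(np.fft.ifft(np.fft.fft(observed) / np.fft.fft(line)))

# The gradient converts the field axis to a spatial axis:
gradient = 100.0                         # G/cm
length_cm = sweep_gauss / gradient       # 50 G / (100 G/cm)
print(length_cm)                         # 0.5
print(np.allclose(recovered, profile, atol=1e-6))   # True
```

With noisy experimental data the Fourier division must be regularized, which is one reason the practical resolution enhancement from deconvolution is limited.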
Taking advantage of the fact that the nitroxyl EPR line widths and hyperfine splittings in all five samples are identical, the line shape, including hyperfine splitting, can be deconvoluted in the same way as was done for the coal sample. The result, shown in Fig. 4b, is the spatial distribution of the signal intensity from the five samples prepared from different concentrations of the nitroxyl radical.

Figure 4. One-dimensional X-band spatial image of a phantom composed of five tubes that contain solutions of the nitroxyl radical Tempone. The five tubes were the same height, as shown in the insert, but the concentrations of radical in the solutions were different. The radical gives a three-line spectrum that has hyperfine splitting of about 15 G. The magnetic-field gradient was 103 G/cm. (a) First-derivative field-swept spectrum obtained in the presence of the gradient. (b) Image obtained by deconvoluting the nongradient line shape, including hyperfine splitting, from the first integral of the spectrum shown in (a). (Figure reproduced from Ref. 24 and used with permission.)

Addition of a Spectral Dimension. Key to the deconvolution that was used to obtain the spatial information in the one-dimensional images of the coal and nitroxyl phantoms was the assumption that the line shape of the EPR signal was invariant through the sample. This assumption is not valid for some important classes of samples. In samples that contain naturally occurring radicals or radicals induced by irradiation, there may be more than one radical present, and the relative concentrations of the radicals are likely to vary through the sample. Even when only one species is present, the line shape of the EPR signal may vary through the sample. For example, the line shape for a nitroxyl radical depends on the rate of molecular tumbling, which provides a way to monitor local viscosity. The broadening of a narrow-line signal from a trityl radical (II), a char, or a nitroxyl radical by collisions with paramagnetic molecular oxygen can also be used to monitor local oxygen concentration, which is called oximetry. An attempt to deconvolute a line shape
from an image in which the line shape is not constant causes distortions and artifacts in the image. For these types of samples, it is important to include a spectral dimension in the image. An image that has one spectral dimension and one spatial dimension is called a spectral-spatial image. Rather than include a spectral dimension, the lines in some spectra can be instrumentally broadened to minimize line-shape differences, but line broadening reduces resolution in the image. If the motional information inherent in the nitroxyl line shape, for example, is not of interest, the sample can be cooled; this slows the motion for all environments and restores a broadened but uniform line shape (9). An approach to generating a spectral-spatial image is illustrated in Fig. 5. This image was obtained from a phantom (a phantom is a sample of known composition and geometry that is designed to test an imaging procedure) constructed of two tubes that contained a radical (DPPH) that gives a single-line EPR spectrum and a third
tube that contained isotopically enriched 15N-Tempone, which gives a two-line EPR spectrum. The horizontal axis of the image is the spectral axis, the vertical axis is the spatial axis, and the spectral traces represent variations in signal intensity and line shape. A "view" of this image or pseudo-object is the shadow that an observer would see along a perpendicular plane when viewing the object from a particular direction. Three of these views are shown in the figure. Views are often called "projections." Views of a spectral-spatial object are obtained by varying the magnetic-field gradient and the magnetic-field scan in a particular way. Typically, 16 to 128 such views are obtained and combined mathematically to reconstruct the image shown in the figure. The reconstruction algorithms are common to various imaging modalities based on tomographic reconstruction. Detailed discussions related to EPR imaging can be found in Ref. 10.

Figure 5. X-band spectral-spatial image and three of the projections used to create the image. Projections at higher angles are obtained from higher gradients and wider magnetic-field scans. The maximum magnetic-field gradient used to obtain the experimental projections that had the highest angle in the spectral-spatial plane was 400 G/cm. The sample consisted of two specks of solid DPPH, which gives a single-line EPR spectrum, and a tube that contained isotopically enriched 15N-Tempone. (Reproduced from Ref. 3 and used with permission.)

Imaging by using Two or Three Spatial Dimensions. Spatial information for these images is encoded by using gradients in two or three directions. In each case, the net magnetic field is along the z axis, but the magnitude of the field varies with position along the x and/or y axes. The gradients are then ∂Bz/∂x and/or ∂Bz/∂y. Gradients along the axes are varied, and the magnetic field is scanned to provide "views" of the object, which are then reconstructed to generate the full image. Images that have a spectral dimension in addition to either two or three spatial dimensions can also be obtained by extending the method just described.

Resolution of the Image
To a first approximation, the resolution of an image is determined by the ratio of the line width of the EPR signal to the magnetic-field gradient. For example, if the line width is 1 G and the gradient is 100 G/cm, then the resolution is (1 G)/(100 G/cm), which is 0.01 cm or 100 microns. A detailed discussion of resolution in magnetic resonance imaging can be found in Refs. 1 and 11. The resolution can be improved by deconvoluting the line shape, but practical considerations limit this enhancement to perhaps a factor of 3. In principle, increasing the gradient can increase the resolution, but again there are practical limits. Most EPR imaging experiments to date have used gradients less than 400 G/cm.
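The resolution estimate described above (line width divided by gradient, optionally improved by deconvolution) is easy to tabulate. The snippet below is an illustrative sketch, not from the article; the deconvolution gain parameter is a simplified stand-in for the factor-of-3 practical limit mentioned in the text.

```python
# Approximate EPR image resolution: line width divided by field gradient.
def resolution_cm(linewidth_gauss, gradient_g_per_cm, deconv_gain=1.0):
    """First-approximation spatial resolution in cm.

    deconv_gain models the improvement from line-shape deconvolution;
    the text suggests practical gains of up to about a factor of 3.
    """
    return linewidth_gauss / gradient_g_per_cm / deconv_gain

# A 1-G line and a 100-G/cm gradient give 0.01 cm (100 microns), as above.
print(resolution_cm(1.0, 100.0))             # 0.01
# With deconvolution by a factor of 3 and a 400-G/cm gradient (a practical
# upper limit for most experiments to date), roughly 8 microns:
print(resolution_cm(1.0, 400.0, 3.0) * 1e4)
```

The same arithmetic shows why broad-line species such as many transition-metal ions are difficult to image: a 100-G line width at the same gradient yields millimeter-scale resolution.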
and among organs. Differences in metabolic state may also distinguish healthy tissue from diseased tissue. In narrow-line EPR spectra, the line width depends on local oxygen concentration, which is the basis for EPR oximetry. Images have been obtained of a colloidal glucose char suspension perfused into an excised perfused rat heart. At the resolution of these images, key anatomical features can be recognized, and it is possible to detect the effects of local ischemia (21). Data acquisition can be synchronized with the heartbeat to define the time-dependent alterations in radical distribution and oxygenation (22). The signal-to-noise requirements for in vivo imaging present substantial challenges for technique development. However, the potential for better understanding of physiology and metabolic states in vivo, and even for real-time monitoring of the efficacy of interventional procedures, motivates efforts in this area.

Figure 11. (a) Sketch of the perturbation of a static magnetic field around a cylindrical object whose magnetic permeability is >1 (cgs). (b), (c) Contour plots of an X-band spectral-spatial image of polypyrrole on a 1.0-mm diameter Pd wire. The data were obtained by using a maximum gradient of 300 G/cm. (b) Experimental spectral-spatial image. (c) Image calculated for the effect of a Pd wire whose permeability was 1.0 + 8.0 × 10−4 (cgs). The values on the spectral axis are for the external magnetic field. Labels 1 and 2 in parts (b) and (c) refer to the regions labeled 1 and 2 in part (a). (Figure adapted from Ref. 19 and used with permission.)

Summary and Future Directions
EPR imaging has been shown to provide key insights in materials science, including studies of radiation-damaged samples; this information would be difficult to obtain by other methods. In studies of diffusion coefficients, EPR has the advantage that local tumbling rates can be determined from nitroxyl line shapes. Imaging then provides values of the diffusion coefficients, so both microscopic and macroscopic motion can be monitored in the same sample. Key challenges facing in vivo applications are the time required to obtain an image and the limitations placed on resolution by large line widths and limited signal intensity. At least initially, it may be useful to compare low-resolution EPR imaging results for the local behavior of radicals with higher resolution images of anatomical features obtained by other imaging modalities. Spatial distributions change less rapidly with time in many materials than in in vivo experiments, so rapid signal averaging is less of an issue in materials science than in vivo.
ABBREVIATIONS AND ACRONYMS

DPPH      1,1-diphenyl-2-picryl hydrazyl
EPR       Electron Paramagnetic Resonance
ESR       Electron Spin Resonance
HAS       Hindered Amine Stabilizer
NMR       Nuclear Magnetic Resonance
TAM       Triarylmethyl radical, also called trityl radical
Tempone   4-oxo-2,2,6,6-tetramethyl-piperidin-oxyl
BIBLIOGRAPHY

1. G. R. Eaton, S. S. Eaton, and K. Ohno, EPR Imaging and In Vivo EPR, CRC Press, Boca Raton, FL, 1991.
2. M. Ikeya, New Applications of Electron Spin Resonance: Dating, Dosimetry, and Microscopy, World Scientific, Singapore, 1993.
3. G. R. Eaton and S. S. Eaton, Concepts Magn. Resonance 7, 49–67 (1994).
4. S. S. Eaton and G. R. Eaton, Electron Spin Resonance 15, 169–185 (1996).
5. S. S. Eaton and G. R. Eaton, Electron Spin Resonance 17, 109–129 (2000).
6. J. A. Weil, J. R. Bolton, and J. E. Wertz, Electron Paramagnetic Resonance: Elementary Theory and Practical Applications, Wiley, NY, 1994.
7. J. H. Ardenkjaer-Larsen et al., J. Magn. Resonance 133, 1–12 (1998).
8. J. P. Hornak, J. K. Moscicki, D. J. Schneider, and J. H. Freed, J. Chem. Phys. 84, 3387–3395 (1986).
9. J. Pilar, J. Labsky, A. Marek, and S. Schlick, Macromolecules 32, 8230–8233 (1999).
10. R. K. Woods, W. B. Hyslop, R. B. Marr, and P. C. Lauterbur, in G. R. Eaton, S. S. Eaton, and K. Ohno, eds., EPR Imaging and In Vivo EPR, CRC Press, Boca Raton, FL, 1991, pp. 91–117.
11. M. Van Kienlin and R. Pohmann, in P. Blümler, B. Blümich, R. Botto, and E. Fukushima, eds., Spatially Resolved Magnetic Resonance: Methods, Materials, Medicine, Biology, Rheology, Ecology, Hardware, Wiley-VCH, Weinheim, 1998, chap. 1, pp. 3–20.
12. H. Nishikawa, H. Fujii, and L. J. Berliner, J. Magn. Resonance 62, 79–82 (1985).
13. R. Murugesan et al., Magn. Resonance Med. 38, 409–414 (1997).
14. A. Feintuch et al., J. Magn. Resonance 142, 382–385 (2000).
15. J. H. Freed, Annu. Rev. Biophys. Biomol. Struct. 23, 1–25 (1994).
16. M. Sueki et al., J. Appl. Phys. 77, 790–794 (1995).
17. K. Kruczala, M. V. Motyakin, and S. Schlick, J. Phys. Chem. B 104, 3387–3392 (2000).
18. M. Furusawa, M. Kasuya, S. Ikeda, and M. Ikeya, Nucl. Tracks Radiat. Meas. 18, 185–188 (1991).
19. M. Sueki, S. S. Eaton, and G. R. Eaton, J. Magn. Resonance A 105, 25–29 (1993).
20. H. M. Swartz and H. Halpern, Biol. Magn. Resonance 14, 367–404 (1998).
21. P. Kuppusamy, P. Wang, and J. L. Zweier, Magn. Resonance Med. 34, 99–105 (1995).
22. P. Kuppusamy et al., Magn. Resonance Med. 35, 323–328 (1996).
23. G. R. Eaton and S. S. Eaton, in J. A. Weil, ed., Electronic Magnetic Resonance of the Solid State, Canadian Society for Chemistry, Ottawa, 1987, pp. 639–650.
24. S. S. Eaton and G. R. Eaton, in L. Kevan and M. K. Bowman, eds., Modern Pulsed and Continuous-Wave Electron Spin Resonance, Wiley Interscience, NY, 1990, pp. 405–435.

ELECTROPHOTOGRAPHY

O. G. HAUSER
Rochester, NY
INTRODUCTION

Electrophotography, also called xerography, is a process for producing high-quality copies or images of a document. Electrophotography is based on forming an electrostatic charge pattern, or image, of the original document that is made visible by ultrafine, electrically charged particles. The process was first commercialized by the Haloid Corporation in 1949 in the manual XeroX Copier Model A and then in 1955 in the XeroX Copyflo Printer. In 1986, it was estimated that the total copier business worldwide was roughly twenty billion dollars (1). Today, many corporations manufacture copying or printing machines based on the electrophotographic process, and the worldwide copying and printing market is estimated at well in excess of twenty billion dollars. This represents the production of more than 5000 billion pages in 1999.

The need for convenient, fast, low-cost copying intensified during the 1940s and 1950s as business practice became more complex. Letters and forms were submitted in duplicate and triplicate, and copies were maintained for individual records. Copying at that time was accomplished by Eastman Kodak's Verifax and Photostat, mimeographing, and the 3M Thermofax processes. The intent was to replace carbon paper in the typewriter, but these processes were cumbersome and had major shortcomings. Chester F. Carlson patented the first electrophotographic copying machine (2) in 1944, but the first automatic commercial office electrophotographic copying machine, the Xerox model 914, was commercialized in 1959. Since then, the electrophotographic process has come to dominate the field of office copying and duplicating.

Electrophotography is defined as a process that involves the interaction of electricity and light to form electrostatic latent images.
The formation of electrostatic images (without the use of light) can be traced back to Lichtenberg (3), who observed that dust would settle in star-like patterns on a piece of resin that had been subjected to a spark. In 1842, Ronalds fabricated a device called the electrograph. This device created an electrostatic charge pattern by moving a stylus, which was connected to a lightning rod, across an insulating resin surface. It could be considered a forerunner of electric stylus writing on dielectric-coated, conductive paper substrates. Stylus writing is commonly used today in the commercially successful Versatec wide-format black-and-white engineering and color electrographic printers.
Experiments in forming electrostatic charge patterns on insulators during the 1920s and 1930s were performed by Selenyi (4–8). Carlson realized that these early processes could not lend themselves easily to document copying or reproduction, so he and Otto Kornei began to experiment with the photoconductive insulators sulfur and anthracene. Using a sulfur-coated zinc plate, the successful reduction to practice of what is now called electrophotography occurred on 22 October 1938. The sulfur was charged in darkness by rubbing it vigorously with a handkerchief. The characters 10-22-38 ASTORIA were printed in India ink on a microscope slide. The charged sulfur was exposed for a few seconds to the microscope slide by a bright incandescent lamp. Sprinkling lycopodium powder on the sulfur made the image visible after the loose powder was carefully blown away. Finally, the powder image was transferred to wax paper, which was then heated to retain the image permanently. Carlson called this process electrophotography. A patent for electrophotography was applied for in 1939 and was granted in 1942 (9). Since then, Schaffert (10) has redefined electrophotography to include those processes that involve the interaction of electricity with light to form latent electrostatic images.

Significant improvements were made during the late 1940s and the 1950s to the original process steps and materials that Carlson had used. One major improvement, by Bixby and Ullrich (11), was using amorphous selenium instead of sulfur as the photoconductor. Other significant improvements were wire- and screen-controlled corona-charging devices (12), two-component contact electrification developer particles (13), and electrostatic transfer methods (14). These improvements all helped to commercialize the electrophotographic process.

The advent of the personal computer and desktop publishing created a need for convenient, low-cost, point-of-need printing.
At first, typewriters were adapted to perform impact printing, followed by dot matrix printers, but these concepts had many disadvantages. Small (and slow) ink jet printers have come to dominate the low-volume desktop publishing market at this time. The mid-volume and high-volume markets, however, are presently dominated by faster and more reliable laser and light-emitting-diode (LED) electrophotographic printers. The print quality of these modern printers approaches that of offset lithography quite closely, so many electrophotographic machines compete effectively with the offset press. Examples of these machines are the Xerox Docu-Tech, the Xerox 4850, 5775, 5650, the Heidelberg Digimaster 9110, and the Canon Image Runner 9110. There are business advantages for each type of printer, however, and offset print shops are certainly not going to go out of business in the foreseeable future.

Because of the popularity of electrophotographic printers, both large and small, color and black-only, this article explores the fundamentals involved in producing hard copy by this process. This article first presents an overview of the essential elements in an electrophotographic imaging system. Then, each of the commercialized process steps is covered in some detail to lead the reader from corona sensitization of the
photoconductor to fixing of the final image. An extension of the electrophotographic process to color printing is discussed in the final section. In a form of electrophotography called xeroradiography (10), the X-ray portion of the electromagnetic spectrum is used to form a latent image on a photoconductor. Xeroradiography is very beneficial in the medical profession, but it is not discussed in this article.

An Overview of an Electrophotographic System

An electrophotographic imaging system consists of a photoconductive belt or drum that travels through cyclic charging, exposure, development, transfer, erasure, and cleaning steps. The direction in which the photoconductor moves is called the process direction because the various process steps occur sequentially in this direction. Charging occurs by uniformly depositing positive or negative ions, usually generated by air breakdown, using one or a combination of several devices called chargers. These are usually corotrons, scorotrons, ionographic devices, or biased conductive rollers. It is also possible to charge the photoconductor uniformly by ion transfer from conductive fluids.

The charged photoconductor is exposed to light to form a pattern of charges on the surface, called an electrostatic image. In light-lens copiers, the exposure system usually consists of a light source that illuminates a platen, a set of mirrors, and a lens that focuses the platen surface onto the photoconductor. The illumination can be a full frame flash or a scanning set of lights that move synchronously with the photoconductor. The exposure system in laser printers consists of a laser beam that is turned on and off by an acousto-optic modulator or by modulating the laser power supply itself. The electric charges of the electrostatic image are not visible to the human eye, but the pattern is made visible by a development process.
During development, ultrafine, charged, pigmented toner particles are deposited on the photoconductor in the image areas, and the background areas are left clean. These particles are pigments coated with a thermoplastic and are electrically charged. Commercial dry development methods are open cascade (two-component developer particles are cascaded over the photoconductor surface in an open tray), electroded cascade (two-component developer particles are cascaded between the photoconductor surface and an electrically biased, closely spaced electrode), powder cloud (electrically charged toner particles are raised into a cloud that comes in contact with the photoconductor surface), magnetic toner touchdown (single-component toner particles that contain magnetite contact the photoconductor), and magnetic brush (see the later sections on magnetic brush development and hybrid scavengeless development). The most widely used development technology to date has been the magnetic brush. A form of single-component dry development (widely used in Canon products) is called jumping development.
The details of single-component and two-component jumping development are covered later in the section on jumping development. Liquid toner techniques include liquid immersion development and electroded liquid applicators (see the section on liquid development of electrostatic images), used in the Indigo 9000 machine made by Indigo N.V. of the Netherlands.

After development, it is desirable to transfer these particles to some transfer medium such as paper so that the photoconductor can be cleaned and reused. The transfer can take place electrostatically by charging devices similar to those used to charge the photoconductor uniformly. However, a pressure-assisted biased transfer roll (BTR) is very widely used in modern machines. Then a fixing process, usually the application of heat to the toner image on paper, makes the visible images permanent because the pigment particles are embedded in a thermoplastic base. If the developed image-bearing surface can sustain high pressures and high temperatures, for example, using ionography (see Ref. 15 for a description of the ionographic process used by Delphax Systems Ltd., Mississauga, Ontario, Canada), the fixing and transfer processes can be performed simultaneously. Fusing the toner simultaneously upon transfer is referred to as a transfuse or transfix process. Photoconductors tend to degrade at high temperatures, so transfuse is used when the image-bearing surface is a rugged insulator, such as aluminum oxide (Delphax machines) or Mylar (intermediate belt-transfer methods). Each of these processes will be covered later in some detail.

The Photoconductor

The heart of an electrophotographic system is the imaging material called a photoconductor. (For a concise description of the physical principles involved in photoconductors and a review of the electrophotographic process, see Ref. 1.) The photoconductor conducts electrical charges when illuminated; in darkness, it is an insulating dielectric.
This material is coated as a thin layer on an electrical conductor, usually aluminum. In the first electrophotographic machines produced during the 1950s and 1960s, this material was zinc oxide powder coated on paper covered by aluminum foil, cadmium sulfoselenide or copper phthalocyanines coated on aluminum drums, or polyvinylcarbazole (PVK) coated on metallic foils or drums. Xerox machines, however, used amorphous selenium vacuum deposited on aluminum drums. As electrophotography matured, these substances were replaced by organic photoconductors. See Mort (1) for a concise discussion of the photoconductivity of amorphous selenium and Borsenberger and Weiss (16) for a thorough discussion of the photoconductivity of some organic materials.

A photoconductive material commonly used in modern electrophotographic machines is an Active MATrix (AMAT) coating on a conductive substrate (Fig. 1). Patents to Weigl, Smith et al., and Horgan et al. (17–19) describe AMAT. An AMAT photoconductor consists of an electron-hole pair-generating layer, a charge-transport layer, and an adhesive layer coated on aluminized Mylar. Charge generation occurs within the layer that consists of about 0.5- to 0.7-µm particles of finely powdered trigonal selenium uniformly embedded in the adhesive that is usually the bottom layer. The charge-transport layer is usually doped polycarbonate. The transport layer can be any practical thickness, but usually it is about 25–30 µm.

Figure 1. Schematic drawing of an AMAT photoconductor: a charge-transport layer over a charge-generation layer on a conductive substrate and support structure.

The change to AMAT and other organic photoconductors was brought about by cost, availability, and environmental issues with selenium. This change had far-reaching, process-related consequences. Selenium accepted positive ions as the surface charge, whereas organic materials required negative charging. Subsequently, problems of charging uniformity (it is more difficult to produce a uniform negative corona than a uniform positive corona) had to be resolved. This is covered in more detail in the section on charging. In addition, a magnetic brush material technology had been developed (for use with selenium photoconductors) that produced developer particles that had negatively charged toner particles and positively charged carrier particles. (These will be covered in detail in the section on development.) However, the reversal of the electrostatic image charge sign required new developer materials that provided a positively charged toner and a carrier that had a negative countercharge. The development of new materials is always a time-consuming and expensive task.

As its name implies, a photoconductor is sensitive to light: it is an electrical conductor in illuminated areas and a slightly leaky dielectric (insulator) in darkness. Thus, when it is sensitized by uniform surface electrical charges, it retains an electrostatic charge for some time in the dark areas, but the surface charge in illuminated areas quickly leaks through the material to the ground plane. The mechanism of electrical conduction in the illuminated areas is the formation of electron-hole pairs by the light absorbed in the generating layer. The number of hole-electron pairs per photon is called the quantum efficiency.
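To make the role of quantum efficiency concrete, the sketch below (not from this article; all numbers are invented illustrative values) treats the photoconductor as a parallel-plate capacitor and converts an assumed exposure into a surface-voltage drop, one neutralized elementary charge per photogenerated pair.

```python
# Illustrative sketch (invented numbers): how quantum efficiency links
# exposure to discharge, treating the photoconductor as a parallel-plate
# capacitor whose surface charge is neutralized by photogenerated pairs.
EPS0 = 8.854e-12   # F/m, permittivity of free space
Q_E = 1.602e-19    # C, elementary charge
H = 6.626e-34      # J*s, Planck constant
C_LIGHT = 3.0e8    # m/s, speed of light

def discharge_voltage(exposure, wavelength, quantum_eff, thickness, eps_r):
    """Surface-voltage drop for an exposure in J/m^2, assuming every
    absorbed photon yields quantum_eff hole-electron pairs, each of which
    neutralizes one elementary surface charge."""
    photon_energy = H * C_LIGHT / wavelength     # J per photon
    photons_per_m2 = exposure / photon_energy    # absorbed photon density
    sigma = quantum_eff * photons_per_m2 * Q_E   # C/m^2 neutralized
    cap_per_area = EPS0 * eps_r / thickness      # F/m^2
    return sigma / cap_per_area                  # volts

# Assumed values: 2 erg/cm^2 (2e-3 J/m^2) of 650-nm light, efficiency 0.5,
# 25-um layer, relative permittivity 3.
dv = discharge_voltage(2e-3, 650e-9, 0.5, 25e-6, 3.0)
print(f"voltage drop ~ {dv:.0f} V")   # a few hundred volts
```

With these assumed numbers, a few ergs per square centimeter of red light is enough to discharge several hundred volts, which is the order of sensitivity that makes practical copying speeds possible.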
The magnitude of the charge density in gray (partially exposed) areas depends on the total exposure to light (the total number of hole-electron pairs produced by the exposure). A photoinduced discharge curve (PIDC) is usually generated in a scanner (20). The PIDC relates the surface voltage that remains after exposure to the value of the exposure. These curves are generated at various initial charge densities.

Local internal electric fields set up by the uniform surface charges before exposure cause movement of either holes or electrons through the transport layer. In some materials, such as amorphous selenium, only holes are allowed to move from the charge-generating layer to the ground plane. Thus, the charge-generating layer, the front surface of selenium, has to be charged positively. If it
were charged negatively, then the direction of the electric field would transport electrons through the bulk of the material. However, pure amorphous selenium has a very high density of electron traps distributed through the bulk, so the electrons would be trapped, and the buildup of space charge in the interior of the photoconductor would prevent further discharge. Traps empty comparatively slowly, and the contrast in electrical potential between image and background areas would be low, resulting in a very weak electrostatic image.

The actual charge density in the image at the time of development is the result of the original surface charge density and dark decay. The decay of charge caused by thermally activated sites for electron-hole pair production is called dark decay. (See Refs. 1, 16, and 20 for detailed discussions of photoconductors and photoconductivity.)

The photoconductor is flat in the direction perpendicular to the process direction and can be a drum or a belt. There are advantages and drawbacks to both belts and drums that depend on the particular machine configuration under consideration. These involve the generally complicated problem of choosing subsystem configurations for the specific machine and the process applications being considered. For example, a small, slow personal printer such as the Hewlett Packard LaserJet would use a drum configuration, whereas a high-speed printer such as the Xerox DocuTech would use a belt configuration. The choices depend on the engineering and innovations required to achieve the performance and cost goals set for the machines.

UNIFORM CHARGING OF THE PHOTOCONDUCTOR

The goal of charging the photoconductor is to deposit a spatially uniform density of charges of a single polarity on the surface of the material. Traditionally, this has been accomplished by running the photoconductor at a uniform speed under (or over) a charging corotron or scorotron.
The simplest forms of these two general categories of chargers consist of a very fine wire (about 0.003 in. in diameter) suspended by dielectric materials from a conductive shield and spaced about 0.25 in. over (or under) the moving photoconductor.

Corotron Charging

A traditional corotron is shown schematically in Fig. 2. A very high dc voltage Vw of about 5000 volts is applied to the wire. The conductive backing of the photoconductor and the shield of the corotron are electrically grounded. Thus, the electric field in the region between the wire and the grounded components exceeds the threshold for air breakdown, and a steady flow of ions of single polarity leaves the wire coronode.

Figure 2. Schematic drawing of a traditional corotron charger.

The threshold for corona discharge, however, is a function of many variables. Some of these are the distance between electrodes, the atmospheric pressure, the polarity of the wire voltage (i.e., positive or negative corona emission), the geometry of the wire, and the nature and composition of the gas that surrounds the coronode. Relative humidity and ozone concentration also influence corona production.

The total ionization current i splits between flow to the shield and flow to the photoconductor. The sum of the current supplied to the photoconductor and the grounded shield is constant and is controlled by the wire voltage:

i = A Vw (Vw − Vth).   (1)

The constant A also depends on the geometry of the shield, the wire diameter, air pressure, and temperature, among other things. This constant and the threshold voltage Vth for corona discharge have traditionally been obtained from experiments for specific configurations. However, the current density in the flow of ions that reaches the photoconductor depends on the voltage Vp of the charges on the photoconductor and the voltages of the components surrounding the wire (zero for grounded shields). The current flow to the photoconductor is established within milliseconds, and it would remain steady if the charge receiver were a conductor. It is altered, however, by the buildup of electrical potential on the surface of the photoconductor. So, initially the corona current to the photoconductor has a high value but, as charges accumulate, the current is reduced. Assuming a capacitively charging photoconductor that has no charge traps to be emptied by the initial deposition of corona charge on its surface and negligible dark discharge during the charging time, the current ip to the photoconductor surface is

ip = C dVp/dt.   (2)

If we assume that the ions arriving at the photoconductor surface are driven simply by the instantaneous electric field between the coronode and the photoconductor surface, then in the simplest form

ip = Ap (Vw − Vp)(Vw − Vp − Vth).   (3)

A in Eq. (1) is associated with the total current leaving the wire, and Ap is associated with the current arriving at the photoconductor. The two constants are not the same, and they have to be measured individually. Thus,

∫ (from 0 to Vp) dVp = (1/C) ∫ (from 0 to t) ip dt = (1/C) ∫ (from 0 to t) Ap (Vw − Vp)(Vw − Vp − Vth) dt.   (4)

For any set of wire voltages, geometries, and other conditions that govern the generation of a corona, the variables in Eq. (4) are Vp and t, the charging time. So,

∫ (from 0 to Vp) dVp/[(Vw − Vp)(Vw − Vp − Vth)] = (1/C) ∫ (from 0 to t) Ap dt = Ap t/C.   (5)

Following Schaffert (10, p. 238), we integrate and solve for Vp in terms of Vw, the geometrical factors, and the charging time t to obtain the photoconductor voltage:

Vp = Vw [1 − e^(Ap Vth t/C)] / [1 − (Vw/(Vw − Vth)) e^(Ap Vth t/C)].   (6)
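Equation (6) can be evaluated numerically to show the charging behavior: Vp rises from zero and saturates at Vw − Vth. The constants below (Ap, C, and the voltages) are arbitrary illustrative values, not measured ones.

```python
import math

def corotron_vp(t, vw, vth, ap, c):
    """Photoconductor voltage after charging for time t, per Eq. (6).
    vw: wire voltage; vth: corona threshold voltage; ap: ion-current
    constant; c: photoconductor capacitance per unit area."""
    s = math.exp(ap * vth * t / c)
    return vw * (1.0 - s) / (1.0 - s * vw / (vw - vth))

# Arbitrary illustrative constants, not measured values:
VW, VTH, AP, C = 5000.0, 4000.0, 1.0e-11, 1.0e-10
for t in (0.0, 0.002, 0.01, 0.05):
    print(f"t = {t:5.3f} s -> Vp = {corotron_vp(t, VW, VTH, AP, C):7.1f} V")
# Vp rises from zero and saturates at Vw - Vth = 1000 V.
```

With these constants, the surface voltage is within a few percent of its saturation value after roughly ten milliseconds; raising the wire voltage raises both the charging rate and the saturation voltage.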
The slope of the curve that relates the current to the photoconductor to the voltage of the accumulated charge, called the slope of the charging device, is important. One way of increasing this slope is to increase the wire voltage; however, electrical conditions for sparking or arcing (a disruptive surge of very high current density) must be avoided. When sparking occurs, the wire or the photoconductor can be damaged, and spots of high charge density can appear on the photoconductor surface. (See Ref. 21 for further information on gaseous discharges.)

Corona spatial and temporal uniformity are very important requirements for good print uniformity. When the wire voltage is positive and a positive corona is generated, uniformity is generally very good. However, for negatively charging photoconductors, the spatial and temporal uniformity of the negative corona current emitted by metallic wires is insufficient to achieve good print uniformity. Spots of high current density appear along the corona wires and tend to move along the wire with time, thereby yielding streaky prints. To avoid these streaks, the wire may be coated with a dielectric material such as glass. Then, instead of applying a steady (dc) voltage to the wire and using a grounded shield, a high-frequency alternating (ac) voltage is applied to the wire, and the shield is biased negatively to drive the negative ions to the photoconductor. This arrangement is called a dicorotron and is used in such machines as the Xerox 1075 and the Xerox Docu-Tech. Other methods of providing uniform negative corona charging use a linear array of pins in a scorotron configuration.

Scorotron Charging

A typical scorotron is shown schematically in Fig. 3. In scorotron charging, a screen is inserted between the high-voltage wire and the charge-receiving photoconductive surface. The screen is insulated from the shield and the coronode. It is biased independently to a potential close to the desired voltage of the charges on the photoconductor. Thus, a charge cloud is generated by air breakdown in the volume enclosed by the screen and the wire. The electric field generated by the voltage difference between the photoconductive surface and the screen drives ions of the right polarity to the photoconductor. When this field is quenched by the accumulation of charge on the photoconductive surface, the current in the charging beam diminishes. Finally, no further charging occurs, even though the photoconductive surface element may still be under the scorotron's active area. Possible combinations of wire and screen voltages are a positive dc wire and screen; a negative dc wire and screen; and a biased ac wire that has either a positive or a negative screen.

Another variation of scorotron construction replaces the screen wires that run parallel to the coronode with a conductive mesh. The openings of the mesh are placed at a 45° angle with respect to the coronode. This has been named a potential well (POW) scorotron and is disclosed in Ref. 22. Yet another variation replaces the coronode wire with a series of fine coronode pins. This device is called a "pin scorotron" and is disclosed in Refs. 23 and 24. Each of these devices has its respective advantages and drawbacks.

Other Methods of Charging

Roller charging is used in some electrophotographic machines (see Refs. 25 and 26). A biased conductive elastomeric roll moves synchronously across the uncharged photoconductive surface. The electric field is distributed across the gap defined by the surface of the roller and the photoconductor. If there is no compression of the elastomer as it contacts the photoconductor, the gaps on approach and departure are symmetrical. The threshold for air breakdown corresponds to a critical combination of electric field and gap. Therefore, the instantaneous point along the circumference of the roller, above the photoconductor surface, at which air breakdown initiates is a function of roller nip geometry, conductivity, and applied bias. Consider a point on the surface of the roller during the approach phase of the charging process (Fig. 4).
Figure 3. Schematic drawing of a traditional scorotron charger.

Figure 4. Schematic drawing of a biased roller charger.
The gap diminishes to zero as the point approaches contact. During this time, charging may occur for a period of time that depends on the current flow, the dielectric properties of the photoconductor, and the rate of charge density accumulation on the photoconductor, which moves synchronously with the roller. If the air breakdown is not quenched during the approach, then current continues as the point on the roller surface and the photoconductor surface depart from the nip. As the photoconductor and roller surfaces separate, the gap increases, and charging ceases at some value. The total charge density deposited on the photoconductor is the accumulation of charge during this process. Thus, limitations on charging uniformity in this system are related to both the mechanical and the electrical properties of the coating.

In all of these systems, the final surface voltage on the photoconductor depends on component voltages, device geometry, and process speed. Other methods of charging, not yet mature enough for extensive machine usage, are conductive blade charging (27), contact sheet charging (28), and aquatron charging (29,30).

IMAGE EXPOSURE

Image exposure can occur by full frame flash, scanning illuminators moving synchronously with the photoconductor, an array of light-emitting diodes, or a laser scanner. Let us examine each type of system.

Full Frame Flash

A platen supports the optical original that will be converted into an electrostatic image, which consists of a pattern of electronic charge on the photoconductor surface. Flash exposure requires a flat surface such as a belt or plate. Figure 5 shows an exposure device schematically (for example, as used in the Xerox 1075 family of machines of the late 1970s and 1980s). Let x, y be coordinates on the photoconductor surface. Exposure is a function of light intensity and time of illumination at the photoconductor surface and is given by

EX(x, y) = ∫ (from ton to toff) I(x, y) dt.   (7)

Figure 5. Schematic drawing of a full frame flash exposure system.

For a full frame flash exposure, Eq. (7) can be approximated as

EX(x, y) = I(x, y) t   (8)

if the flash rise and decay times are insignificant compared to the flash duration t and the photoconductor moves only an insignificant distance during t. In Eq. (8), I(x, y) is the intensity of light at the photoconductor surface that is reflected from the platen. Neglecting light lost through the lens and illuminating cavity system, and light reflected from the platen surfaces as stray light,

I(x, y) = I0 R(x, y),   (9)

where R(x, y) is the reflectance distribution of the original image on the platen and I0 is the intensity of the light that strikes the original image surface. Because the lens focuses the original onto a plane, only belt or plate photoconductors can be used for this exposure system. If a drum is desired as a photoconductor, then a moving light source synchronized with the moving drum surface is possible when using stationary flat platens. An alternative is a moving platen that has a stationary light source.

Moving Illuminators with Stationary Platens

Moving illuminators combined with a partially rotating mirror enable focusing a flat image on a platen onto the curved surface of a drum. The exposure system has to be designed so that the moving illuminated portion of the platen is always focused at a stationary aperture above the moving drum surface. This is enabled by a set of mirrors, at least one of which partially rotates within the optical path. Figure 6 shows schematically how this could be accomplished.

Figure 6. Schematic drawing of a stationary platen that has a moving light source exposure system.
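As a toy illustration of Eqs. (8) and (9), with invented intensity, flash duration, and reflectance values, the exposure at each point of the photoconductor is simply I0 times the local document reflectance times the flash duration:

```python
# Toy illustration of Eqs. (8) and (9): EX(x, y) = I0 * R(x, y) * t.
# All numbers are invented; the "document" is a 3 x 4 reflectance grid
# in which 1.0 represents white paper and 0.05 represents dark ink.
I0 = 100.0       # source intensity at the original, arbitrary units
T_FLASH = 0.001  # flash duration in seconds

reflectance = [
    [1.00, 1.00, 0.05, 1.00],
    [1.00, 0.05, 0.05, 1.00],
    [1.00, 1.00, 1.00, 1.00],
]

# Eq. (9) gives I(x, y) = I0 * R(x, y); Eq. (8) multiplies by the duration.
exposure = [[I0 * r * T_FLASH for r in row] for row in reflectance]

for row in exposure:
    print(["%.4f" % ex for ex in row])
# White areas receive the full exposure (0.1000) and will discharge;
# inked areas receive only 0.0050 and retain their surface charge.
```

For a slit-scanning system, t would instead be the aperture width divided by the surface speed, x0/v, but the pointwise product has the same form.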
Here, exposure at the moving drum surface is still given by Eq. (7), and in a 100% magnification system, x, y at the platen corresponds to x, y on the drum. The exposing aperture is a slit lengthwise to the drum, so although Eq. (8) still applies, t is now x0/v, where x0 is the aperture width and v is the drum speed. Because the illuminators move at the same speed as the drum surface, then, again neglecting light loss through the optics, the intensity distribution at the drum surface is given by Eq. (9).
t p < x0/ n
Case 1: tp < x0 v, Case 2: tp = x0 v, Case 3: tp > x0 v. Figures 7–9 show the exposure distribution schematically for each of the three cases, assuming that there is complete darkness before and after the LED array. In these figures, it is assumed that the relationships shown for the three cases are achieved by varying the pulse time. However, if the relationship of Fig. 9 is achieved by increasing velocity instead of pulse time, then exposure is decreased, as shown in Fig. 10. In these figures, points 1 and 2 are the leading and the trailing edges of the LED array aperture. The photoconductor moves at constant velocity v. Points 1 and 2 are the projections of points 1 and 2 onto the moving photoconductor. At t = 0, the LED is activated, and the photoconductor under point 1 moves away from point 1, so at 1 , EX = 0. However, the photoconductor under point 2 continues to be illuminated either
x0 I =0
1
2
I =I 0
EX
EX=I0t p
2′
1′
x0
n
nt p Photoconductor
Exposure by Light-Emitting Diodes Linear arrays of light-emitting diodes (LEDs) are used in some printers as an exposure system alternative to a scanning laser spot. The array of approximately 3000 diodes is arranged perpendicularly to the photoconductor motion. Each diode is pulsed for a period that depends on factors such as the geometry and light intensity of the diode and the photosensitivity and velocity of the photoconductor. Because the array is stationary and the photoconductor moves, the x, y location of each exposed element (pixel) on the photoconductor is determined by the timing of the pulses. A start of scan detector signals the LED array electronics (sets the clock to zero for the scan line) that the photoconductor has moved one scan line in the process direction. Depending on the desired image, sets of individual LEDs are activated simultaneously during each scan line. The x location is defined by the position of the photoconductor, and the y locations by which sets of LEDs are turned on. However, the illuminated area element on the photoconductor is defined by the size of the output aperture of each diode. So exposure is again defined by Eq. (7), but now t is related to pulse time tp , assuming negligible illumination rise and fall times. The photoconductor moves, and the stationary LED’s aperture defines the illuminated area of the photoconductor. Three cases define the exposure distribution at the photoconductor surface. Letting x0 be the width of the aperture in the process direction and specifying that the photoconductor moves at constant velocity, (x0 v = const), the three possible conditions are
LED
Case 1 :
I =0
305
nt p
x0
Figure 7. Schematic drawing of exposure at the photoconductor surface when pulse time is less than transition time.
Figure 8. Schematic drawing of exposure at the photoconductor surface when pulse time equals transition time; nominal photoconductor speed.
Figure 9. Schematic drawing of exposure at the photoconductor surface when pulse time is greater than transition time; nominal photoconductor speed.
until the LED is turned off or until it reaches the leading edge, point 1. Any point on the photoconductor upstream of point 2 is in darkness until it reaches point 2, and then it is exposed until it reaches point 1 or the LED is turned off. The exposure that occurs between points 1 and 1′ and between points 2 and 2′ constitutes smear. From Figs. 7–9, to minimize smear, it is desirable to make pulse time as short and light output as
ELECTROPHOTOGRAPHY
Figure 10. Schematic drawing of exposure at the photoconductor surface when pulse time is greater than transition time because photoconductor speed is greater than nominal.
Figure 11. Optimum exposure at the photoconductor surface to eliminate banding when pulse time is less than transition time; nominal photoconductor speed.
intense as possible, as shown in Fig. 11. However, the raster line spacing d has to be critically controlled to avoid printing bands perpendicular to the process direction. The raster line spacing is the distance that the photoconductor moves during the time interval between the initiation of one exposure pulse and the next. Thus, the optimum combination of pulse time and raster line spacing is such that the combination of smear and fully exposed photoconductor provides constant exposure in all background areas. For this to happen, the time between pulse bursts tr has to be exactly x0/v, where v is the photoconductor velocity. It can be appreciated that any fluctuation in v would cause fluctuations in EX(x), and the areas between 1 and 2 would be either over- or underexposed because exposure is additive. Banding print defects are objectionable because the human eye is particularly sensitive to periodic fluctuations of optical density or color. When periodic banding perpendicular to the process direction occurs in background regions (using charged area development), it is called structured background to distinguish it from the more random background sometimes called fog. However, when the banding occurs in image regions (using discharged area development), it is called structured image. Schemes to avoid banding are reported in Ref. 31.
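The condition tr = x0/v can be checked numerically: every point then spends exactly one pulse period inside the aperture, so the summed smear and full exposures are the same everywhere. The sketch below uses toy units and hypothetical helper names (not from the text):

```python
def led_exposure(s, I0, tp, x0, v):
    """Exposure of a point starting a distance s upstream of the aperture's
    trailing edge, for a single LED pulse of duration tp starting at t = 0."""
    t_in, t_out = s / v, (s + x0) / v
    return I0 * max(0.0, min(tp, t_out) - max(0.0, t_in))

def total_exposure(s, I0=1.0, tp=0.4, x0=1.0, v=1.0, tr=1.0, n_pulses=20):
    """Sum contributions of a pulse train fired every tr seconds; for the pulse
    starting at t = n*tr, the point is effectively n*v*tr closer to the aperture."""
    return sum(led_exposure(s - n * v * tr, I0, tp, x0, v) for n in range(n_pulses))

# With tr = x0/v, total exposure is I0*tp at every sampled background point.
samples = [total_exposure(5.0 + 0.1 * k) for k in range(10)]
```

Perturbing tr (or equivalently v) away from x0/v makes `samples` nonuniform, which is exactly the banding mechanism described above.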
In this discussion, it was assumed that the illumination at the surface caused by the LEDs consisted of sharply defined square or rectangular areas, completely uniform in intensity. In practice, the exposure optical system makes the intensity distribution at the photoconductor surface more Gaussian than shown in Figs. 7–11. Gaussian distributions are discussed in the following section. Relative to lasers, individual LEDs are limited in intensity because of losses in the optical system and possible overheating if run at high power. So, more sensitive photoconductors are required for high-speed LED printers than for laser applications. Exposure by Laser Scanning In laser printers, a succession of spots of light of uniform diameter is turned on and off as the laser beam sweeps perpendicularly across the plane of the moving photoconductor. Because the electrical charges dissipate due to exposure, the desired pattern left on the surface is the electrostatic image. Just as in LED exposure, this electrostatic image has characteristics that correspond to the physical characteristics of the spot of light that exposes the uniformly charged material. The resolution and edge sharpness attainable in line or halftone images depend on, among other things, the exposing light spot diameter and the intensity distribution within the spot. In addition, the darkness of the developed image or the dirtiness of the background areas can depend on the intensity distribution of the spot of light and the ''fast'' and ''slow'' scan velocities with which the spot traverses the photoconductor. Fine lines have their peak charge densities eroded because of the overlap of exposure when the intensity distribution within the spot is broad and Gaussian. The light spot used to expose the photoconductor is generated by a laser that is optically focused on the plane of the generating layer of the photoconductor.
The spot is modulated in time by modulating the voltage across the lasing junction in solid-state lasers, or by passing the light beam through an acousto-optic modulator and modulating the polarization state of the shutter; the acousto-optic shutter is enabled by the plane polarization of laser light. A rotating polygon that has mirror-finished facets reflects the beam so that it sweeps across the photoconductor in the fast scan direction, which is perpendicular to the process direction. A start of scan detector signals a clock in the electronic subsystem that the spot is at the start of scan position. The start of scan position is usually the left edge of the document image area. A line of discharge spots is generated as the photoconductor moves one spot diameter in the slow-scan, or process, direction. A two-dimensional exposure pattern is generated by sequentially turning the spot on or off. Figure 12 shows a laser scanning exposure system schematically. Electrical charges are invisible to the human observer, so the electrostatic image has to be made visible by a development process. Generating an Electrostatic Image by Laser Exposure
Exposure. The fast scan velocity vfast of the exposing laser spot is associated with the rotation of the polygon. The slow scan velocity vslow is the photoconductor velocity.
Figure 12. Schematic drawing of a scanning laser exposure system. See color insert.

A start of the scan detector tells the laser modulator when the spot is at the beginning of a line of information. An end of the scan detector tells the modulator when the spot has reached the end of the line. During the time it takes the spot to complete a line of information of length L, the photoconductor moves one raster line spacing d. The raster spacing is normally one full width at half maximum (FWHM) of the spot intensity distribution at the surface. Fast and slow scan velocities are related to each other by

vslow/vfast = d/L. (10)

The acousto-optic modulator either lets light through or shuts it off. The time from the start of the scan determines the location of the image element. Hence, considering charged area development (CAD), the laser will be on until the spot reaches an area element that corresponds to the start of an image (such as a letter, a line, or a halftone dot). Then the modulator shuts the beam off. The modulator turns the beam on again when the end of the image element is reached. While the spot is traversing the photoconductor perpendicularly to the process direction, the photoconductor is moving along the process direction. The next line of information begins when the photoconductor has moved just one FWHM of the spot intensity distribution since the previous start of the scan signal. Periodic variations in vslow produce periodic banding in image and background areas, perpendicular to the process direction. This image defect will be discussed later. Exposure is defined as the product of light intensity and illumination time, as given by Eq. (7). However, in that equation, x, y, and t are independent variables. When using scanning lasers, light intensity is distributed along the x, y directions on the photoconductor generally as a Gaussian function. Because the spot and the photoconductor move, x, y, and t are no longer independent. Thus,

EX(x, y) = ∫ I(x, y, t) dt, (11)

where x and y are coordinates on the moving photoconductor in the fast and slow scan directions, respectively. Generally, at the photoconductor, at any time t,

I(x, y, t) = I0 e−(ax² + by²). (12)

For simplicity, assume circularly symmetrical Gaussian (a = b) laser spot intensity distributions.

Generating Lines Parallel to the Process Direction. Consider exposure in the fast scan direction separately from exposure in the slow scan direction by letting y be zero. On a moving photoconductor, x = vfast t, but to generate lines, the laser is turned off and then on again. Thus, for charged area development, the laser is on from the start of the scan until the first image element is reached. Then, it is turned off (I0 = 0). At the end of the image element, the laser is turned on again (I = I0) until the next image element is reached, and so on. For discharged area development, the procedure is reversed. Consider charged area development. Exposure at dx located at a point x in background areas is I(x) dt. So, at any point x in the background along the fast scan line,

I(x) = I0 e−a(x − vfast toff)², (13)

where toff is the time from the start of the scan at which the laser is turned off. Then,

dEX(x) = I(x) dt = I(x) dx/vfast. (14)

But as the spot of light moves past that element of the photoconductor, the total exposure at x is the sum of all of the exposures at x, as the intensity of the spot of light first increases and then decreases. We can think of any x as a discrete point along the photoconductor, and intensity is represented by

I(x, t) = I1 e−a(x − vfast t1)² + I2 e−a(x − vfast t2)² + I3 e−a(x − vfast t3)² + · · · + In e−a(x − vfast tn)² + · · · , (15)

where vfast t1, vfast t2, vfast t3, . . . , vfast tn are the fast scan raster points. Each raster point is associated with a laser intensity; if these are all the same, I1 = I2 = I3 = · · · = In = I0, they factor out, of course, and Eq. (15) is simplified. However, when the laser is pulsed periodically to generate lines, the corresponding I's = 0, and the vfast t's locate the positions along the fast scan direction at which the laser is turned off. Total exposure at any x is the integral from the start of the scan to the end of the scan, and the exposure pattern for each line or location on the photoconductor is the superposition of the exposures of all the other patterns. A series of exposures results for every element in the fast scan direction:

EX(x) = I0(x) ∫−∞∞ e−a(x − vfast tn)² dt. (16)
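The superposition can be checked numerically. In this sketch (toy parameter values, not from the text), the Gaussian contributions are summed at small time steps; with the laser always on, the sum converges to the flat background level √π I0/(√a vfast) of Eq. (17):

```python
import math

# Toy discretization of the Gaussian superposition; parameter values are
# illustrative only.
I0, a, v_fast, dt = 1.0, 1.0, 1.0, 0.01

def exposure(x, t_min=-10.0, t_max=10.0):
    """Riemann-sum approximation of EX(x) = I0 * integral of exp(-a(x - v t)^2) dt."""
    n = int((t_max - t_min) / dt)
    return sum(I0 * math.exp(-a * (x - v_fast * (t_min + k * dt)) ** 2) * dt
               for k in range(n))

# With the laser on everywhere, EX(x) approaches sqrt(pi)*I0/(sqrt(a)*v_fast),
# i.e., sqrt(pi) ~ 1.7725 for these unit parameters.
print(exposure(0.0))
```

Setting some of the contributions to zero (laser off) reproduces the line-formation patterns discussed next.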
To evaluate the integrals at each element in the background areas, we use Eq. (16), hold tn constant, and change the variable of integration to u = √a (x − vfast tn); using x = vfast t and du = √a vfast dt, exposure becomes

EX(x) = [I0/(√a vfast)] ∫−∞∞ e−u² du = √π I0/(√a vfast). (17)
Thus, exposure in background regions along a fast scan turns out to be simple until the laser is turned off:

EX = (I0/vfast)√(π/a) while the laser is on; EX = 0 after it is turned off.

See Fig. 13 for an illustration. Exposure along x during the transition from on to off is approximately

dEX(x) = I(x) dt = I0 e−a(x − vfast toff)² dx/vfast. (18)

Figure 13. Formation of lines parallel to the slow scan direction for charged area development by periodically turning off the laser.

The term vfast toff in the exponent is the position along the photoconductor at which the laser turns off. The transition is usually made within one pixel. In the example (see Fig. 14), the laser is on and turns off (I0 = 0) at t = 6/vfast. It turns back on at t = 9/vfast and stays on until t = 15/vfast. Then, it turns off and on again at t = 18/vfast, and so on. The normalized exposure pattern is shown in Fig. 14. The normalization is vfast t = 3n to represent n pixels that are three units wide. It can be appreciated that if the lines are too narrow, then exposure overlap erodes the null exposure between the lines, and therefore charges on the photoconductor are dissipated in the lines as well as in background areas. This is one reason that it becomes difficult in CAD to print narrow lines that are as dark as wider lines. It might be obvious that if the laser spot intensity distribution had a steeper slope, then the overlaps at the peak of the line would be reduced. Hence, the charge density would increase and make it possible to develop a darker narrow line. This, however, has consequences in the avoidance of structured background in CAD systems. When development occurs in the discharged areas of DAD (or write-black) systems, the corresponding print defect is structured solid areas.

Figure 14. Exposure pattern for single-pixel lines spaced two pixels apart; I0 = 0 at n = 2, 5, 8, 11, . . . .

Structured Background or Structured Image and Motion Quality. Exposure on the photoconductor in the y direction, perpendicular to the fast scan velocity, consists of rasters of exposure placed one raster spacing apart. Equation (12) can be expressed as

I(x, y, t) = I0 e−ax² e−by². (19)

All integrations in the previous section were with respect to x. Exposure in background areas along the y direction for each raster at constant x is

EX(y) = [√π I0/(√a vfast)] e−by²; (20)

see Eq. (17) for the integration in the x direction. However, the combination of raster spacing and exposure spot size b is chosen to overlap exposures to prevent bands from printing. To account for the raster spacing, we introduce into Eq. (20) the slow scan velocity vslow and tr, the time required for the polygon to move from one facet to the next:

EX(y) = [√π I0/(√a vfast)] e−b(y − vslow tr)². (21)

Here, vslow tr is the raster spacing d controlled by the rotational rate of the polygon. It can be appreciated that a cyclic perturbation in vslow will cause the raster spacing to fluctuate with time, noting that tr is a constant. If the polygon rotational rate fluctuates or wobbles, then tr is not constant, but that is a separate issue not addressed here. Assume that, because of mechanical and electrical imperfections in the photoconductor drive,

vslow(t) = vs[1 + A sin(π f t)], (22)

where A is the amplitude of the variation, f is the frequency of the variation, and vs is the perfect slow scan velocity. Then,

EX(y, t) = [√π I0/(√a vfast)] e−b(y − vs[1 + A sin(π f t)]tr)². (23)

Equation (23) is a function of t as well as of the constant tr. Assume that the first line of consecutive rasters under consideration occurs at y = 0 and t = 0. Because the photoconductor moves away from the line as t increases, it is necessary in Eq. (23) to include only as many tr's as influence the exposure of two consecutive rasters. The maximum error in exposure occurs halfway between these two rasters. Depending on spot size b, only a few values
Figure 15. Normalized intensity distribution in the slow scan direction for rasters spaced 0.0033 in. apart and b = 2.5 × 10⁵ in⁻².
of tr are significant. For example, notice from Fig. 15 that the intensity (and therefore exposure) contribution of the third raster (y = 0.0066 in.) to the point halfway between rasters at y = 0.00165 in. is essentially zero. Consider a laser that has an optimum spot size for a 300-lpi printer. The intensity distribution is shown in Fig. 15. Here the values of b and vslow tr were chosen so that exposure in background regions is optimum. The intensity distribution shown above would give an exposure distribution as shown in Fig. 16. The waviness in this figure reflects the fact that when exposures add, the middle of the slow scans gets less light than the peak regions. The result is that the photoconductor has more charge density at the halfway points than at the points that correspond to the peaks of exposure. If the voltage of the charges at these points
is not biased out by the developer subsystem, then some background deposition occurs at these points in CAD (see Development). On the other hand, in DAD, the image has structure if this occurs. This is extremely objectionable because the human eye is more sensitive to periodic than to random variations in darkness. For the plot in Fig. 16, the drive system is assumed perfect, so that A = 0 in Eq. (23). Now let us presume that some 120-Hz vibration gets into the drive system. For example, the power supply for the drive motor might have a 5% ripple, A = 0.05 (a realistic value for some low-cost power supplies). This manifests itself as shown in Fig. 17. Only three consecutive rasters were included for the calculation in Eq. (23). The top curve in the set shown in Fig. 17 corresponds to t = 0, and the bottom curve to t = 1/(2f). This variability in exposure is converted to a variability in voltage in the background regions through the photo-induced discharge curves discussed in Photoconductor Discharge.
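The slow-scan superposition and the midpoint dip of Fig. 16 can be reproduced directly from the Gaussian raster sum. This sketch uses the parameter values quoted for Figs. 15–16 (the helper name is hypothetical):

```python
import math

# Raster spacing d (inches) and Gaussian width parameter b (in^-2),
# the values used for Figs. 15-16.
b, d = 2.5e5, 0.0033

def slow_scan_exposure(y, n_rasters=7):
    """Relative exposure at y from n_rasters Gaussian rasters at y = 0, d, 2d, ..."""
    return sum(math.exp(-b * (y - n * d) ** 2) for n in range(n_rasters))

peak = slow_scan_exposure(3 * d)        # on an interior raster center
valley = slow_scan_exposure(3.5 * d)    # halfway between two rasters
# valley < peak: the halfway points receive less light, leaving more charge
# there, which is the "waviness" visible in Fig. 16.
```

Replacing the fixed spacing n*d with a time-dependent spacing built from a rippled velocity, as in Eq. (23), makes the valley depth oscillate with time, which is the banding mechanism of Fig. 17.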
Figure 16. Normalized exposure distribution in the slow scan direction for rasters spaced 0.0033 in. apart and b = 2.5 × 10⁵ in⁻², for perfect motion quality.
Figure 17. Normalized exposure distribution at consecutive rasters as a function of time for one-quarter period of the motion quality perturbation.
It can be appreciated that if the laser spot were smaller, then, for the same raster spacing of 0.0033 in., the severity of underexposure halfway between rasters would increase. Laser power is expressed in milliwatts and exposure in ergs/cm², so intensity is in milliwatts/cm². Because a watt is a joule/second, exposure by 1 mW/cm² for 1 second yields 10⁴ ergs/cm². But for an 11-in.-wide printing system in which the photoconductor moves at 6 in./s and printing spots are 0.0033 in. in diameter, the exposure time is of the order of 10⁻⁴ seconds. Therefore, exposure is of the order of 1 erg/cm² per milliwatt. Thus, for reasonably powered lasers (5–10 mW) in this type of system, using conventional magnetic brush development methods, the photoconductor should discharge from a full charge of about 600 volts to a background in the range of 100–200 volts.

PHOTOCONDUCTOR DISCHARGE

Photoconductor discharge is covered in detail in the Photoconductor Detector Technology article of this encyclopedia and also by Scharfe, Pai, and Gruber (20). Photoconductor discharge is described by a photo-induced discharge curve (PIDC) that can be obtained experimentally by a curve fit of potentials when the exposure It is varied in a scanner (32). A scanner is a device that cyclically charges a photoconductor uniformly and measures the surface voltages at various stations. At the charging station, the corona current to the photoconductor and the surface voltage are measured. The charged photoconductor moves to an exposure station where it is exposed to a known light source. The surface voltages before and after exposure are measured by electrostatic voltmeters. The exposed photoconductor then moves to an erase station where it is discharged to its residual potential. The surface potential between exposure and erase is measured at several stations, so that the discharge rate after exposure can be calculated.
After erase, the process starts over again, and exposure is varied. Thus, after a series of experimentally determined points, a curve can be fitted to the data. The measured data are modeled in terms of the transit times of the photogenerated carriers; the quantum efficiency of charge generation and its dependence on the internal electric field; trapping of charges during transit from the photogeneration layer to the surface; dark decay, if significant; and cyclic stability. In modeling the PIDC, it has been assumed that the electrical contacts are blocking. (There is no charge injection from the conductive substrate into the charge generating layer.) Let us follow the derivation of Scharfe et al. (20). For a capacitively charging photoconductor in which light photons are highly absorbed by the generating layer and the polarity of the mobile charge carriers is the same as that of the surface charges, the voltages of the surface charges before and after exposure can be obtained from

Cp dV/dt = ηeI, (24)

provided that the transit time of the carriers is short compared to the exposure time and that the range of
the carriers is long compared to the photoconductor layer thickness. Here Cp is the photoconductor capacitance, η is the quantum efficiency (see Ref. 20 for the distinction between quantum and supply efficiencies), e is the electronic charge, and I is the exposing light intensity. This leads, of course, to photoconductor discharge as

∫V0V dV = −(eI/Cp) ∫0t η dt, (25)

which leads to a linear discharge curve if the quantum efficiency is a constant independent of the electric field and the intensity is not a function of exposure time. Then Vimage is

Vimage = V0 − (eI/Cp) ∫0t η dt, (26)

where V0 is the voltage at the start of exposure and Vimage is the voltage immediately after exposure. This is a simple linear equation, generally not representative of real photoconductors except for hydrogenated amorphous silicon (20). In terms of surface charge densities,

V0 = σ0 dp/(kε0) and Vimage = σimage dp/(kε0). (27)

Thus, when no charges are in the bulk, σimage is the surface charge density on the photoconductor after exposure, dp is the photoconductor geometrical thickness, k is its dielectric constant, and ε0 is the permittivity of free space. For many photoconductors, η depends on some power of the electric field but not on light intensity (i.e., it obeys reciprocity), given by

η = η0 (V/dp)^Pe. (28)

Here, V is the instantaneous voltage of the charge on the surface, η0 is the free carrier density, and Pe is the power. So, combining Eq. (24) and Eq. (28) and integrating as in Eq. (25) yields, for Pe ≠ 1,

Vimage = [V0^(1−Pe) − (1 − Pe) eη0 It/(Cp dp^Pe)]^(1/(1−Pe)). (29)

The shape of the PIDC depends on the electric field dependence of the photogeneration of the carriers when there are no mobility or range limitations. Another relationship for the PIDC is given by Williams (33):

Vimage = Vr + (V0 − Vr) e−It/Ea. (30)

In Eq. (30), Ea is an empirical constant also called the normalized sensitivity. Williams gives the value of this
constant as 31.05 ergs/cm² for organic photoconductors and 4.3 ergs/cm² for layered photoconductors, when exposure occurs by light illumination at 632.8 nm (33). Melnyk (32) found from experimental measurements in a scanner that PIDC equations of the form

Y = V0 − Vr,
PIV = (Y − SIt − Vc²/Y)/2,
Vimage = PIV + Vr + √(PIV² + Vc²) (31)

fit some typical photoconductors. (It can be shown that the equations in Ref. 32 and Eq. (31) are equivalent.) Here V0 is the electrostatic potential in complete darkness, S is the spectral sensitivity of the photoconductor, Vr is a residual potential (under high exposure, the potential is never less than this value), and Vc is an empirically derived constant that connects the initial slope of the PIDC with the residual value. Figure 18 is a fit of Eq. (31) to PIDC #8 shown in Fig. 6 of Ref. 19. A more useful plot is the linear plot shown in Fig. 19. In this figure, PIDC #8 of Ref. 19 is calculated for V0 = 700 volts and V0 = 1350 volts to show how background voltage increases when the initial corona charge increases. The linear plot is more convenient, especially when the sensitivity and curvatures are improved, as, for example, in Fig. 20. These improved PIDCs are more practical for the exposures in the 5–10 erg/cm² range that are encountered in machine applications. (See Exposure by Laser Scanning.) So, using the PIDC curves along with the exposures discussed in Image Exposure, one obtains the electrostatic image that needs to be developed.

Figure 19. Linear plot of photo-induced discharge curve #8 of Ref. 16, fitted by Melnyk parameters S = 60 volts/erg/cm², Vc = 400 volts, Vr = 40 volts, and Vddp = 1350 volts.
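A short sketch of the Melnyk-form PIDC of Eq. (31), using the fit parameters quoted for Fig. 19 (the function name is hypothetical), confirms the expected limits: no exposure leaves the dark voltage V0, and high exposure approaches the residual Vr:

```python
import math

def pidc_melnyk(It, V0=1350.0, Vr=40.0, S=60.0, Vc=400.0):
    """Surface voltage after exposure It (erg/cm^2), Eq. (31):
    S in V/(erg/cm^2), Vc and Vr in volts."""
    Y = V0 - Vr
    PIV = (Y - S * It - Vc**2 / Y) / 2.0
    return PIV + Vr + math.sqrt(PIV**2 + Vc**2)

print(round(pidc_melnyk(0.0), 1))   # 1350.0 (dark voltage recovered exactly)
print(pidc_melnyk(1000.0))          # close to the residual Vr = 40 V
```

The exact recovery of V0 at It = 0 follows because √(PIV² + Vc²) = (Y + Vc²/Y)/2 when PIV = (Y − Vc²/Y)/2.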
Photoconductor Dark Discharge

The PIDC curves describe the photoconductor voltage in exposed areas. These are background areas when
Figure 20. Linear plot of photo-induced discharge curves that are more practical for exposure systems capable of delivering 1 erg/cm² per milliwatt; S = 80 volts/erg/cm², Vc = 120 volts, Vr = 40 volts.
Figure 18. Logarithmic plot of photo-induced discharge curve #8 of Ref. 16, fitted by Melnyk parameters S = 60 volts/erg/cm², Vc = 400 volts, Vr = 40 volts, and Vddp = 1350 volts.
using charged area development (CAD) and image areas when using discharged area development (DAD). However, a time delay occurs after exposure, during which the electrostatic image traverses from the exposure station to the development station. This time delay depends on process speed and machine architecture. During this delay, the photoconductor voltage of the electrostatic image degenerates because of dark decay. The decrease in voltage has to be considered when the system is designed because, as will be seen later in the section on Development, the development potential is key in determining the toner mass deposited on
the photoconductor image. Development potential is defined as the difference in potential between the image at development time and the bias applied to the developer. However, the applied bias has to be tailored to the background potential to avoid dirty backgrounds. Therefore, it is important to know what the image and background potentials are as the photoconductor enters the development zone. The dark decay rate depends on the photoconductor material, among other things, and is discussed in detail by Borsenberger and Weiss in Ref. 16, pp. 42–52, by Scharfe et al. in Ref. 20, p. 57, and by Scharfe, Ref. 34, pp. 108–114. The basic mechanism of bulk dark decay is thermal generation of charge carriers, which causes time-dependent degeneration of the voltage in the charged areas of the electrostatic image. Obviously, the times between charging and exposure and between exposure and development have to be short enough to prevent significant degeneration. Yet, these times have to be greater than the charge transit times of the carriers. For a discussion of charge transit through the bulk of the photoconductor transport layer, see Williams, Ref. 33, pp. 24–28, and Scharfe, Ref. 34, pp. 127–137. Thus the design of an imaging system architecture needs to take into consideration the concepts of dark decay, photo-induced discharge, charge trapping (16), and carrier transit times.

DEVELOPMENT

Development is the process by which ultrafine pigmented toner particles are made to adhere to the electrostatic image formed on the photoconductive material. Various means are available for developing this image. In the earliest copying machines, such as the Xerox 914 and the Xerox 2400, cascade development was used in open and electroded form. Open cascade development (Xerox 914) deposits toner only at the edges of the electrostatic image. The electric field away from the edges of the image is mainly internal to the photoconductor.
Consequently, only the fringe fields at the edges of images attract toner from the developer particles. A need was perceived to produce images that had dark extended areas. To do this, a biased electrode (Xerox 2400), spaced from the photoconductor, was introduced; it caused the electric field in the interior of extended areas to
split between the field internal to the photoconductor and the field external to it. Consequently, toner deposition in solid areas was somewhat enhanced. The electrically conductive sleeve of a magnetic brush developer serves the same purpose as the electrode in electroded cascade development. Other methods of development are liquid immersion development (LID), powder cloud (used extensively for xeroradiography), fur brush, magnetic toner touchdown, jumping development (used in many machines by Canon), and hybrid scavengeless development (35–39). Two of the most widely used systems now are magnetic brush and jumping development. They will be discussed separately. The form of LID used by the Indigo 9000 printer will also be described.

Magnetic Brush Development

In magnetic brush development, a mixture of carrier particles and much smaller toner particles is dragged across the electrostatic image. The carrier particles are magnetic, but the toner particles are nonmagnetic and pigmented; they are used to make the image visible. Typically, carrier particles are about 100 µm, and toner particles are about 10–20 times smaller, only about 5–10 µm. Carrier particles are usually coated with polymers to enhance contact electrification between the toner and carrier surfaces. Stationary magnets inside a roughened, rotating, nonmagnetic cylindrical sleeve create a stationary nonuniform magnetic field that draws the magnetic carrier particles toward the sleeve surface. Friction between the developer particles and the sleeve makes the developer move through the gap formed by the photoconductor and the sleeve. Figure 21 shows a typical magnetic brush schematically. The combination of toner and carrier, called the developer, moves across the electrostatic image as a blanket. Because of friction between the carrier and sleeve surfaces, it can reasonably be assumed that, for a range of conditions, the developer does not slip on the sleeve surface.
Therefore, the developer flow rate per unit roll length through the development zone is proportional to the product of the gap width g and the relative velocity vR:

dMD/dt = ρD g vR. (32)
Figure 21. Schematic diagram of a magnetic brush developer.
Here MD is the developer mass per unit length, and ρD is the mass density of the developer blanket. However, Eq. (32) is not valid when the developer blanket slips on the sleeve surface. The condition at which slip starts depends on carrier material, shape, magnetic field intensities and gradients, and sleeve material and roughness, among other things. These conditions are found experimentally for specific configurations of specific materials. Developer mass density is not necessarily constant but depends on toner concentration and on carrier and toner particle size averages and distributions. However, the material mass densities and the amount of compression caused by the magnetic fields, as the developer is driven through the development gap, are constants for specific configurations. Thus, for a specific geometry, developer material, magnetic field strength, and configuration, one can reasonably assume that ρD is constant under constant operating conditions. The moving conductive sleeve of the magnetic brush is biased electrically with respect to the (usually grounded) conductive backing of the photoconductor. So, an electric field in image areas is generated by the presence of the electrostatic image. This field makes electrically charged toner particles move, during development time, in the desired direction (toward the photoconductor in the image areas and away from the photoconductor in the background areas) and form a powder image on the surface. Development time is the time that an electrostatic image element spends in the development zone. It is the ratio of the development zone length to the photoconductor relative velocity. The electric field in the image imparts high velocity to the charged toner particles. They traverse the distance between the developer particles and the imaging surface and become part of the developed image. Because the force on a charged particle is the product of charge and field, low-charge toner will fail to make the transition.
After being separated from close contact with the carrier particles, the toner particles that have low charge become entrained in the air stream and land in unwanted areas. The development roll bias is chosen to minimize the deposition of toner that has the wrong polarity into the background areas.

Toner Concentration and Charge-to-Mass Ratio

Toner concentration (ratio of toner mass to developer mass) is a major factor that determines the quantity of electric charge on the toner particles, among other things. For most developers, increased toner concentration yields lowered average charge per particle. However, there are many more toner particles in a developer than carrier particles, and the total electric charge in a quantity of developer is usually distributed normally among the toner particles. Let us use the definition that toner concentration, TC, is the ratio of toner mass Mt to developer mass MD. Developer mass is the sum of toner mass and carrier mass, MC:

TC = Mt / (MC + Mt).  (33)
TC determines the total amount of toner that is delivered into the development zone. However, the amount that is actually available to an electrostatic image is only a fraction of the total amount. So the mass of toner per unit area deposited in the image is

Mt/A = g vR TC ψD ∫₀ᵗ ρD dt.  (34)
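When ρD is constant, as the text argues it is for a fixed configuration, the integral in Eq. (34) reduces to ρD·t. A minimal sketch of the delivery relation follows; every numerical value is an illustrative assumption, not data from the text.

```python
# Toner mass per unit image area from Eq. (34), with rho_D constant so
# the integral reduces to rho_D * t. All numbers are illustrative
# assumptions, not values from the text.

def toner_mass_per_area(g, v_r, tc, psi_d, rho_d, t):
    """M/A = g * vR * TC * psiD * rhoD * t (rhoD constant)."""
    return g * v_r * tc * psi_d * rho_d * t

# 1-mm gap, 30 cm/s roll speed, 2% toner concentration, 20% of the
# delivered toner available to the image field, 50 ms in the zone,
# 800 kg/m^3 developer blanket density.
m_per_a = toner_mass_per_area(g=1e-3, v_r=0.30, tc=0.02,
                              psi_d=0.2, rho_d=800.0, t=0.05)
print(m_per_a)  # kg/m^2
```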
Here, ψD is the fraction of toner delivered into the development zone that becomes available to the image electric field forces during development time t. Toner particles adhere to the carrier particles by electrical and Van der Waals attraction. An electric field is generated in the development gap by the charges on the photoconductor when they come close to the electrically biased developer roll. This field overcomes the adhesion of toner particles to the carrier particles and causes deposition of the charged toner into the image areas. Thus, ψD depends on the adhesion of toner to carrier, the mechanical forces in the development zone that tend to dislodge the toner, and the depth into the developer bed to which the imaging electric field penetrates. Toner concentration affects the toner delivery into the development zone and also influences the quantity of charge on toner particles. The average toner charge-to-mass ratio (Q/M) in most developers is related to the toner-to-carrier mass ratio Ct by

Q/M = At / (Ct + TC0).  (35)

In Eq. (35), Ct is the total mass of toner divided by the total mass of carrier in a bed of developer, and Q/M is the average toner charge-to-mass ratio in the developer bed. At and TC0 are measured parameters. The surface state theory of contact electrification (see Ref. 40, p. 83) assumes that the charge on a carrier particle after charge exchange with toner is the product of the carrier area available for charging, the electronic charge, the carrier surface state density per unit area per unit surface energy, and the difference between the carrier work function and surface energy. Similarly, the charge on the toner particles on a carrier is the product of the total toner surface area, the electronic charge, the toner surface state density per unit area per unit free energy, and the difference between the toner work function and free energy. These assumptions lead (see Ref. 40, p. 83, Eq.
4.14) to the toner mass-to-toner charge ratio on a single carrier particle (the reciprocal of the charge-to-mass ratio) given in Eq. (36):

Mt/Qt = R Ct ρc / (3ϕeNc) + r ρt / (3ϕeNt).  (36)
In Eq. (36), ϕ is the difference between the carrier ‘‘work function’’ and the toner ‘‘work function.’’ The other parameters in Eq. (36) are Nc and Nt , the surface state densities per unit area per unit energy on the carrier and toner particles, respectively
R and r, the carrier and toner average particle radii, respectively; ρc and ρt, the carrier and toner particle mass densities, respectively; Ct, the ratio of the mass of all the toner particles on one carrier to the mass of the carrier particle; e, the electronic charge; Qt, the average charge of all the toner particles on one carrier particle; and Mt, the mass of all of the toner particles on one carrier particle.

It can be shown that the average charge-to-mass ratio on one carrier equals the average charge-to-mass ratio in a bed of developer. Let mt and mc be the mass of one toner and one carrier particle, respectively. The number of toner particles in a bed of developer is n0 nc, where n0 is the number of toner particles on one carrier particle and nc is the number of carrier particles in the bed of developer. Thus, the mass of the total number of toner particles in a bed of developer is n0 nc mt. The total toner charge in a bed is n0 nc q0 = Qt, where q0 is the average charge on a toner particle. The total toner mass in the bed is n0 nc mt = Mt. So, the average charge-to-mass ratio in the bed is Qt/Mt. However, the total toner charge on a single carrier particle is n0 q0, and the total toner mass on the carrier particle is n0 mt. So, the average charge-to-mass ratio on a carrier particle is q0/mt, which is just equal to Qt/Mt. So, the average charge-to-mass ratio on a carrier particle is the same as the average charge-to-mass ratio in the bed. The mass of all of the carriers in a bed of developer is nc mc, and the ratio of the two is n0 mt/mc. So, Ct is both the ratio of the mass of all of the toner particles on one carrier particle to the mass of the carrier particle and the ratio of the total toner mass in the bed of developer to the total carrier mass in the bed. Because both toner and carrier particles are distributed in size, these arguments apply only to the average toner size, carrier size, and charge-to-mass ratio. For these averages, Eq.
(35) can be obtained from Eq. (36) by letting

At = 3ϕeNc / (Rρc)  and  TC0 = Nc r ρt / (Nt R ρc).  (37)
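Equations (35) through (37) can be checked numerically: the reciprocal of the single-carrier mass-to-charge ratio of Eq. (36) must reproduce Q/M = At/(Ct + TC0) exactly when At and TC0 are substituted from Eq. (37). A sketch with arbitrary, purely illustrative parameter values:

```python
# Consistency check between Eqs. (35)-(37): the reciprocal of the
# single-carrier mass-to-charge ratio of Eq. (36) must equal
# Q/M = At/(Ct + TC0) when At and TC0 come from Eq. (37).
# All parameter values are arbitrary illustrations.

R, r = 50e-6, 5e-6             # carrier and toner radii (m)
rho_c, rho_t = 5000.0, 1100.0  # particle mass densities (kg/m^3)
Nc, Nt = 1e38, 5e37            # surface state densities (illustrative)
phi = 0.5                      # carrier/toner work-function difference
e = 1.602e-19                  # electronic charge (C)
Ct = 0.03                      # toner-to-carrier mass ratio

# Eq. (36): Mt/Qt on a single carrier particle.
m_over_q = R * Ct * rho_c / (3 * phi * e * Nc) + r * rho_t / (3 * phi * e * Nt)

# Eq. (37) parameters, then Eq. (35).
At = 3 * phi * e * Nc / (R * rho_c)
TC0 = Nc * r * rho_t / (Nt * R * rho_c)
q_over_m = At / (Ct + TC0)

print(q_over_m * m_over_q)  # should be 1 up to rounding error
```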
The parameters At and TC0 are usually measured in the laboratory, under experimental conditions designed to provide insight into the behavior of developer materials of varying physical and chemical formulations, charge control additives, relative humidity, developer age, running history, and toner concentrations. Ruckdeschel performed experiments on dynamic contact electrification by rolling a bead in the track of a rotating split Faraday cage and measuring the voltage buildup as a function of the number of revolutions (41). The experiments were repeated with beads of various
materials. These experiments showed that contact electrification is a nonequilibrium ion transfer process that depends on the area of contact, the adsorption of water on the surfaces, the speed at which the surfaces separate, and, of course, the materials that comprise the surfaces in contact. The adsorption of water, which influences toner charge when in contact with the carrier, is often seen in the marked influence of relative humidity on the toner charge-to-mass ratio. By performing his experiments in a bell jar, Ruckdeschel found that the nature of the surrounding gas had a small influence, as long as the pressure was close to atmospheric. Close to vacuum, the pressure did indeed have an influence on charging. Nash, in a review article (Ref. 42, pp. 95–107), presents factors that affect toner charge stability. He presents a modified form of Eq. (35) for charge generation of a simple additive-free toner–carrier mixture:

Q/M = [At / (Ct + TC0)] [1 − e^(−γt)].  (38)
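Eq. (38) describes charging that rises toward the surface-state saturation value At/(Ct + TC0) with effective rate constant γ. A hedged sketch, using illustrative values of At, Ct, TC0, and γ (none taken from the text):

```python
import math

# Charging kinetics of Eq. (38): Q/M rises toward the surface-state
# saturation value At/(Ct + TC0) with effective rate constant gamma.
# All values are illustrative assumptions.

def q_over_m(t, At=25.0, Ct=0.02, TC0=0.01, gamma=0.5):
    """Average charge-to-mass ratio after mixing time t (s)."""
    return At / (Ct + TC0) * (1.0 - math.exp(-gamma * t))

print(q_over_m(1.0))   # early in mixing
print(q_over_m(60.0))  # essentially saturated at At/(Ct + TC0)
```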
The exponential factor γ , the effective rate constant for charging, depends on the frequency of toner to carrier contacts during mixing. The frequency per unit mass depends on geometrical mixing factors and the power put into the mixing device. These important factors make it difficult to compare the results of bench tests with those obtained from machine operating conditions, yet, toner charge per particle is one of the key parameters involved in toner deposition. Mixing time t determines the total number of toner to carrier contacts. From surface state theory, it follows that when all the surface states on carrier and toner particles are filled, Q/M saturates. Toner particles are usually combined with charge-control agents. The sign and quantity of charge per toner particle depend on the carrier core and coating materials, surface coverage, toner polymers, and charge-control agents that are used in fabricating the developer (see Ref. 1, p. 119 and Ref. 40, p. 82 for details). Comparisons between bench tests and the results of running the same materials in a machine, although very difficult, are particularly important. The evaluations of these comparisons are used to guide the formulations of carrier coatings and toner additives that provide stable developers that have long life. For a detailed discussion of toner instability, see Ref. 42, p. 95. Some factors involved in developer instability are the sensitivity of Q/M to ambient temperature and humidity, to toner addition strategies during operation, and to developer aging effects. Some charge-to-mass ratio fluctuations at developer start-up and the accumulation of wrong polarity and low charge toner also contribute to developer instability. Developer aging effects are related to the loss of carrier coating after extended operation; toner ‘‘welding’’ onto the carrier surface; and/or ‘‘poisoning’’ of the carrier surface by toner charge-control agents, flow control agents, or pigments. 
Both the toner and carrier charge distributions in the developer determine to what extent toners end up in the image or background areas. The fraction of toner that has the wrong polarity (or very low charge of the right polarity) is also determined by toner concentration. A developer can become ‘‘dirty’’ or ‘‘dusty’’
at various high concentrations. This toner becomes part of unwanted print background or machine dirt. Much effort has gone into producing stable developer materials. Electric Field and Toner Deposition. The optical density of a toner monolayer in an extended area image is proportional to the number of toner particles deposited per unit area. After a monolayer is deposited, the optical density tends to saturate, and further deposition only increases the thickness of the layer without making it darker. Let us define development to completion (neutralization-limited development) of an electrostatic image as the condition reached when the vertical component of the electric field in a horizontal development zone is shut off. This can happen by neutralizing the electrostatic image charge density either by toner charge or by quenching the electric field in the development zone by the space charge left on carriers after the toner is stripped. (There is another limitation on toner deposition called supply-limited development.) Let us consider neutralization-limited development without space charge. (Also see Ref. 40, p. 149 and Ref. 33, p. 165.) When toner flow is sufficient to saturate development, see Eq. (34), field neutralization shuts off further deposition in the image. Before deposition of toner, the initial electric field in the development zone of a capacitively charged photoconductor in image areas is given by

Ei = (Vimage − Vb)/Z = (Vimage − Vb) / [(1/ε0)(dz/kz + dp/kp + dair)].  (39)
Assume that the development zone is partitioned into the photoconductor, the developer blanket, and an air gap. Here, Vimage is obtained by the methods described in Photoconductor Discharge; Vb is the bias applied to the development roller; and Z is the sum of the electrical thickness of the developer bed, the photoconductor, and the air gap. In any specific experimental setup, the photoconductor material and thickness, the magnetic field and distribution, and the developer material are usually held constant. Thus the denominator of Eq. (39) is determined by photoconductor thickness dp, its dielectric constant kp, air gap dair, the effective electrode spacing dz, and the dielectric constant kz in the region between the effective electrode and the photoconductor surface. As toner deposition proceeds, the numerator is diminished by Vt, the voltage of the average charge density of the toner charges. Assuming that these toner charges can be approximated by a thin sheet of charge and that development proceeds until Ei vanishes, then Vimage − Vt − Vb = 0 and Vimage − Vt = (σimage − σt)Z.
(40)
Because σt = (Q/M)(M/A), Eq. (40) can be rewritten as

M/A = (Vimage − Vb) / [(Q/M)Z] = ε0 (Vimage − Vb) / [(Q/M)(dz/kz + dp/kp + dair)].  (41)
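Eq. (41) is easy to evaluate directly. The following sketch uses an illustrative geometry (developer bed, photoconductor, and air gap thicknesses and dielectric constants are assumptions, not values from the text) and converts the result to the mg/cm² units used later in the article:

```python
# Neutralization-limited deposition from Eq. (41). The geometry and
# material values are illustrative assumptions, not data from the text.
EPS0 = 8.85e-12  # F/m

def m_per_area_limit(v_image, v_bias, q_over_m, dz, kz, dp, kp, d_air):
    """M/A (kg/m^2) when the development field is fully neutralized."""
    spacing = dz / kz + dp / kp + d_air  # dz/kz + dp/kp + dair, in m
    return EPS0 * (v_image - v_bias) / (q_over_m * spacing)

# 600 V image, 100 V bias, 15 uC/g toner (0.015 C/kg), 300-um developer
# bed (k = 10), 25-um photoconductor (k = 3), 50-um air gap.
m = m_per_area_limit(600.0, 100.0, 0.015, 300e-6, 10.0, 25e-6, 3.0, 50e-6)
print(m * 100.0)  # mg/cm^2 (1 kg/m^2 = 100 mg/cm^2)
```

With these assumed numbers the limit lands near the 0.3 mg/cm² scale, the same order as the development curves discussed below.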
So, in the neutralization limit, the deposition of toner depends on the development potential, Vimage − Vb, the photoconductor thickness, and the toner charge-to-mass ratio, when other things are equal. Many simplifying assumptions are made in deriving Eq. (41). However, deposition does not usually go to field neutralization. In practice, 50–80% field neutralizations are common. Developer Conductivity. The conductivity of a magnetic brush in the development zone is very important. Consider that charged toner particles leave their countercharged carrier particles during development and become part of the developed image on the photoconductor. This constitutes charge flow or an electrical current out of the developer in the development zone. The countercharges on the carrier particles either dissipate to the developer sleeve, when a highly conductive developer is used, or become part of space charge in the development zone when insulating developers are used. This space charge tends to negate the electric field caused by the electrostatic image near the effective electrode. When an insulating developer is used, this space charge is swept out of the development zone because the development roll moves faster than the photoconductor. By moving faster than the photoconductor, the carrier particles that have lost toner and have become charged (countercharge of toner) are replaced by fresh particles from the developer sump. See Ref. 33, p. 165 for an analysis of space-charge-limited magnetic brush development. Thus, one function of a high conductivity developer is to prevent, at roll speeds slower than for insulating carriers, the premature shutdown of development because of a buildup of space charge in the development zone. When insulating carrier particles are used, the location of the biased effective electrode in the development zone is defined by the electrically conductive sleeve, usually about 1 mm away from the photoconductor surface.
However, if the carrier particles are conductive, then the effective electrode is defined by the time constant of the developer blanket and will be closer (25–200 µm) to the photoconductor than the developer sleeve. The time constant, in turn, is a function of the particle shape and conductivity, toner concentration, magnetic field strength, the particle density of the blanket, and its state of agitation in the development zone (see Ref. 43 and Ref. 44, p. 1510). Carrier particle conductivity can be controlled by the nature of the core material; the shape of the particle (spherical vs. granular); the presence and thickness of an oxide layer; and the nature, thickness, and surface coverage of the carrier coating. Grit and oxidized iron coated with low percentages of polymer to enhance contact electrification are popular materials. Toner concentration affects developer conductivity by influencing the number of carrier-to-carrier contacts made by the conductive carrier particles within a magnetic brush chain. Figure 22 shows, schematically, bead chains that have low or high toner concentrations. The random distribution of insulating toner particles within a chain causes statistically fewer electrically conductive contacts between the conductive carrier particles at high than at low concentrations. One can also visualize how carrier
Figure 22. Schematic diagram of effective electrode location at high and low toner concentrations.
shape and size distribution as well as toner shape and size distribution influence the number of bead-to-bead contacts per chain. It can be appreciated that if the carrier core is conductive, then the fraction of the surface covered by the coating that is added for contact electrification of toner statistically influences the conductivity of the developer bed. The effective electric field during development at locations within extended areas far away from electrostatic image edges is enhanced by conductive developers. When insulating developers are used, the fields at these locations are internal to the photoconductor, and only the fringe fields at the image edges develop. However, if the developer conductivity is high enough to present a closely spaced effective development electrode to the electrostatic image, then the field at the interior of extended areas is no longer internal to the photoconductor. Thus, toner deposition within extended areas is enhanced by using conductive carriers, and fringe field development is reduced. The flow of charge during deposition (toners are charged particles) constitutes an electrical current associated with the flow of toner particles from the developer bed to the photoconductor surface. To approximate the influence of the effective electrode spacing (conductivity) on development, assume that toner current dσt /dt during development is proportional to the electric field in the development zone. Following Ref. 40, p. 125 and Ref. 34, p. 26, assume that the electrode-to-photoconductor backing is divided into three slabs, as shown in Fig. 23. Therefore, because the field in the image area, including toner deposition as a thin sheet of charge, is given by
Ei = (Vimage − Vb − Vt)/Z = (Vimage − Vb − Vt) / [(1/ε0)(dz/kz + dp/kp + dair)] = (VD − Vt)/Z,  (42)

where VD = Vimage − Vb is the development potential. Then,

dσt/dt = αEi = α(VD − Vt)/Z.  (43)

Because σt = (Q/M)(M/A) and Q/M is constant (the average value) during deposition, but the variable is M/A,

dσt/dt = (Q/M) d(M/A)/dt.  (44)

In terms of M/A, Eq. (43) becomes

d(M/A)/dt = [α/(Q/M)] Ei = [α/(Q/M)] (VD − Vt)/Z.  (45)

Figure 23. Schematic diagram of a dielectric slab model of a conductive magnetic brush developer (developer bed dz, kz; air gap dair; photoconductor dp, kp; spanning the development zone length between the effective electrode and the photoconductor backing).

Assume that the toner charge density σt in the developed image is a thin sheet, as before. For ease of notation, let M/A = m, dp/(kp ε0) = c, and Q/M = q. Then,

Vt = σt dp/(kp ε0) = (Q/M)(M/A) dp/(kp ε0)  (46)

   = mqc,  (47)

and Eq. (45) becomes

dm/dt = αVD/(qZ) − (αc/Z) m.  (48)

Putting Eq. (48) into the standard form of a linear first-order differential equation yields

dm/dt + C1 m = C2,
where C2 = αVD/(qZ) and C1 = αc/Z.  (49)
C1 and C2 vary slowly, if at all, during the deposition process. Letting the integrating factor be φ(t) = e^(∫C1 dt) = e^(C1 t) yields the solution

m = e^(−C1 t) [∫ C2 e^(C1 t) dt + K] = e^(−C1 t) [(C2/C1) e^(C1 t) + K],  (50)

m = C2/C1 + K e^(−C1 t) = VD/(qc) + K e^(−αct/Z).  (51)

The boundary conditions are m = 0 when t = 0; m = 0 when VD = 0; and m = 0 when α = 0. Solving Eq. (51) for K using these conditions yields K = −VD/(qc). When the substitutions for the coefficients (Eqs. (46) and (49)) are replaced in Eq. (51), it becomes

M/A = [kp ε0 VD / ((Q/M) dp)] [1 − e^(−α dp t / (kp (dz/kz + dp/kp + dair)))].  (52)
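The closed form Eq. (52) can be sanity-checked against a direct numerical integration of the rate equation, Eq. (48). In this sketch all parameter values, including the rate coefficient α, are illustrative assumptions chosen only to make the exponential behavior visible over a 20-ms development time:

```python
import math

# Check the closed-form development curve, Eq. (52), against a direct
# numerical integration of the rate equation, Eq. (48). All parameter
# values, including the rate coefficient alpha, are illustrative.
EPS0 = 8.85e-12
dp, kp = 25e-6, 3.0                  # photoconductor thickness, k
dz, kz, d_air = 300e-6, 10.0, 50e-6  # developer bed and air gap
VD = 300.0                           # development potential (V)
q = 0.020                            # Q/M in C/kg (20 uC/g)
alpha = 500.0                        # deposition-rate coefficient

Z = (dz / kz + dp / kp + d_air) / EPS0
c = dp / (kp * EPS0)
C1, C2 = alpha * c / Z, alpha * VD / (q * Z)

def m_closed(t):
    """Eq. (52) in the form (C2/C1)(1 - exp(-C1 t))."""
    return (C2 / C1) * (1.0 - math.exp(-C1 * t))

# Forward-Euler integration of dm/dt = C2 - C1*m, Eq. (48).
m, step = 0.0, 1e-5
for _ in range(2000):  # integrate to t = 0.02 s
    m += (C2 - C1 * m) * step

print(m, m_closed(0.02))  # the two values should agree closely
```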
Equation (52) fulfills the experimental requirements that

1. as development potential VD ⇒ 0, toner deposition ⇒ 0;
2. as development time t = LD/vp (development zone length divided by process speed) ⇒ 0, toner deposition ⇒ 0;
3. as the dependence of deposition rate on field, α, ⇒ 0, toner deposition ⇒ 0;
4. as development time increases and other things remain equal, deposition approaches the neutralization limit;
5. as the charge-to-mass ratio of the toner increases, the neutralization limit decreases; and
6. toner deposition is linear with development potential if configuration and materials properties are constant.

As developer conductivity increases, Z decreases because both dz and dair decrease whereas kz increases (see Eq. (42)); that is, the effective electrode spacing decreases. In addition, the exponent in Eq. (52) increases, so that the neutralization limit is reached more rapidly. Thus, Eq. (52) shows that deposition is enhanced by using conducting developers, if other conditions are kept constant. However, Eq. (52) also shows that adjustments can be made to allow insulating developers to deposit as much toner as conductive developers. For example, development time is increased by extending the development zone or slowing the process speed. If high process speed is essential when insulating developers are used, another way of increasing development time is to use multiple development zones. This would allow some insulating developers to yield the
same toner deposition as more conducting developers at comparable development potentials. There are other ways of accomplishing the same thing, but all at some additional cost. Figure 24 shows some examples of development curves. Curves like these examples can be generated by varying the parameters discussed earlier and allow estimates of the optical density of solid areas. (Optical density is a function of M/A, and M/A = 0.6 to 0.8 mg/cm2 is desirable.) These parameters help define an imaging system. Using conductive developers to make the latent image visible has the advantage that close proximity of the front surface of the photoconductor to the effective electrode minimizes fringe field development. Therefore the geometry of the developed image closely approximates the geometry of the electrostatic charge pattern on the photoconductor. This is useful for estimating the optical density of lines and halftones.

Write Black (DAD) and Write White (CAD)

Two forms of development can be used to make electrostatic images visible: Charged Area Development (CAD, ‘‘write white’’) and Discharged Area Development (DAD, ‘‘write black’’). In CAD, toner is charged opposite in polarity to the electrostatic image. Because opposite charges attract each other, toner is deposited into the charged area of the image. CAD provides optically positive image areas from optically positive exposure patterns. The bias on the developer roll has the same polarity as the charges in the image. In DAD, however, toner is charged with the same polarity as the electrostatic image, but the developer bias is set close to the voltage of the charges in the unexposed background areas. Exposure is in the form of an optical negative. Image areas have a lower magnitude of charge than unexposed background areas. Thus, optically negative electrostatic images provide optically positive output images. Figure 25 shows an example of an exposure pattern.
This exposure pattern is converted into the electrostatic image voltage pattern, shown in Fig. 26, by a PIDC. Figure 26 shows development potentials to be used in one of the development models, illustrating the difference between CAD and DAD.
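The bias placement described above can be sketched numerically. All voltage values here are hypothetical illustrations (not taken from the figures); the point is only where the bias sits relative to the charged and discharged levels in each mode:

```python
# Bias placement for CAD ("write white") and DAD ("write black").
# Voltage values are hypothetical illustrations, not from the text.

v_background = 600.0  # unexposed photoconductor voltage (V)
v_exposed = 100.0     # fully discharged (exposed) voltage (V)

# In CAD the bias sits near the discharged level; toner (opposite
# polarity to the image) deposits in the still-charged areas.
v_bias_cad = v_exposed + 50.0
vd_cad = v_background - v_bias_cad  # development potential, CAD

# In DAD the bias sits near the background voltage; toner (same
# polarity as the image charge) deposits in the discharged areas.
v_bias_dad = v_background - 50.0
vd_dad = v_bias_dad - v_exposed     # development potential, DAD

print(vd_cad, vd_dad)
```

The 50-V offsets are the kind of background-suppression margins the text alludes to when it notes that the bias must be readjusted under overexposure.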
Figure 24. Examples of development curves calculated from Eq. (52).
Figure 25. Example of exposure pattern for conversion into the image voltage pattern in Fig. 26.

Figure 26. Electrostatic image pattern illustrating how developer bias is set for CAD or DAD.

Figure 27. Electrostatic image patterns of normal exposure and overexposure that show line growth in DAD and line shrinkage in CAD.
Line growth and overexposure. When overexposure occurs in DAD mode, the totally exposed areas between the unexposed regions tend to widen because the photoinduced discharge curve (PIDC) is flat in the very high exposure regions (see Fig. 20). The bias voltage on the development roll then tends to drive toner into the exposed image areas. Because the developer bias is set close to the background potentials, DAD produces line growth and tends to reduce resolution. However, in the CAD mode, the line image occurs in the regions where the exposing spot is turned off between the exposed background areas. Because light intensity is distributed in space, the fraction of peak exposure at a given point in space will discharge the photoconductor more, as peak exposure is increased. Thus, overexposure in the CAD mode tends to produce line thinning and low optical density. This is illustrated in Fig. 27. The upper curve represents the electrostatic image when peak exposure is 10 erg/cm2 , and the lower curve is for a peak exposure of 20 erg/cm2 . If the difference between the background voltage and developer bias is to be held constant to avoid background deposition when overexposure occurs, the bias has to be readjusted. This influences the value of development potential VD in Fig. 27. Optimizing electrophotographic process parameters is a complicated subject (see Ref. 34 for methods and details).
Figure 28. Schematic drawing of jumping development using magnetic toner.
Jumping Development Using Magnetic Toner

Jumping development is a method also popular in some modern machines. It can be carried out with magnetic or nonmagnetic toner. In the former method, magnetic toner is metered onto a moving roller sleeve by a metering blade that also charges the toner. The thin toner layer is brought close to the electrostatic image but is spaced 50–200 µm away from the photoconductor surface. An oscillating field, provided by an ac bias with a dc offset on the roller sleeve, causes a cloud of toner to be lifted off the roller surface, but out of contact with the photoconductor surface. No contact is made by the airborne toner with the photoconductor surface until an electric field attracts the toner into electrostatic image areas. The force balance that keeps the toner airborne in background areas is the magnetic attraction to the roller and the electrical forces of the ac voltage on the bias. The role of the electrostatic image is to unbalance these forces and motivate toner to move toward the photoconductor. Figure 28 shows a schematic drawing of a jumping development system. Controlling the toner thickness on the roller as it goes into the development zone is very important. Hosono (45,46) describes some ways of achieving this. Because there is a magnetic attraction as well as the usual forces of adhesion between the magnetic toner and the developing roller, deposition does not begin until a threshold electric field is exceeded. The application of biases is explained by Kanbe (47). As toner on the sleeve moves into the development zone, the ac and dc electric fields get stronger, and toner
oscillation back and forth between the photoconductor and the sleeve becomes more energetic. When the phase of the ac bias attracts the toner toward the electrostatic image, the particles move further and faster than in the reverse polarity phase because the fields caused by the electrostatic image augment the fields caused by the ac component of the developer bias. When the phase is such as to bring toner back to the sleeve, the ac component of field is diminished by the presence of the electrostatic image field. Consequently, the motion of the particles that is augmented by the electrostatic image causes them to reach and adhere to the photoconductor and become part of the developed image. However, if the ac component is too high, toner will be urged to the photoconductor in background areas as well as in image areas during deposition. Once the toner contacts the surface and forms an adhering bond, the reverse phase of the ac component cannot dislodge and remove it. This is desirable in image areas but not in background areas. Thus the factors that affect the developed image quality are the ac magnitude and frequency, the dc offset level, the toner layer thickness and charge density on the developing roller, and the effective air gap between the toner layer and the photoconductor surface. Let us divide the development zone into three regions (Figs. 29 and 30). In region 1, the superposition of the electrostatic image field and the ac and dc bias fields is less than the threshold field (Ed < Eth) and attracts toner away from the image. In region 2, the superposed fields are less than the threshold field but attract toner toward the image. In region 3, they are greater than the threshold field (Ed > Eth) and attract toner toward the image (see Fig. 29). As a partially developed image area passes through the development zone, it goes through the three regions a number of times, depending on the frequency and amplitudes of the ac bias,
the value of the threshold field for deposition, the dc offset of the bias, and the instantaneous potential of the partially neutralized electrostatic image. The total length of the development zone is defined by the geometry of the development roller and the photoconductor configuration. Total development time is LD /vp , where vp is process speed. Let the total time that an image element spends in regions 1, 2 and 3 be t1 , t2 , and t3 , respectively. For a sinusoidal ac component (Ref. 48 shows some nonsinusoidal voltage pulses) added to a dc bias, the electric field in the development zone is
ED(t) = {Vimage − Vti(t) − [Vb + Vtc(t) + (Vp−p/2) sin(2πωt)]} / Z.  (53)

Here Vimage is the voltage of the image on the photoconductor, and Vti is the voltage of the toner deposited on the photoconductor as it moves through the development zone. This quantity depends on the instantaneous mass per unit area (M/A) of the layer. Assuming conservation of charge in the cloud, the voltage of the charge in the toner cloud layer Vtc is also a function of instantaneous deposition and time. The dc component of the applied bias Vb is not a function of time. A plot of Eq. (53) is shown in Fig. 30 for a hypothetical case where LD/vp = 0.02 seconds, Vb = 200 volts, Vimage = 200 volts, ω = 400 Hz, Vp−p = 800 volts, Z = 100 µm, and Eth = 1 volt/micron. It was assumed that the voltages Vti and Vtc were zero to indicate a possible initial condition in the development gap before deposition starts to occur. So, in this case, t1 = 0.010 seconds, t2 = 0.0015 seconds, and t3 = 0.008 seconds. However, as deposition begins, Vti depends on the amount of toner deposited and its charge-to-mass ratio.
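With Vti = Vtc = 0, the region times quoted above follow directly from Eq. (53). The sketch below tallies them numerically using the hypothetical values from the text (the quoted 0.0015-s and 0.008-s figures are approximate):

```python
import math

# Region times for jumping development, Eq. (53), using the
# hypothetical values quoted in the text: LD/vp = 0.02 s, Vb = 200 V,
# Vimage = 200 V, omega = 400 Hz, Vp-p = 800 V, Z = 100 um,
# Eth = 1 V/um, and Vti = Vtc = 0 before deposition begins.
dev_time = 0.02                # total development time (s)
v_image, v_b = 200.0, 200.0    # image and dc bias voltages (V)
v_pp, freq = 800.0, 400.0      # ac peak-to-peak voltage, frequency
z_um, e_th = 100.0, 1.0        # gap (um) and threshold field (V/um)

t1 = t2 = t3 = 0.0
n = 100000
dt = dev_time / n
for i in range(n):
    t = (i + 0.5) * dt
    # Field toward the image in V/um (toner voltages taken as zero).
    e_d = (v_image - v_b
           - (v_pp / 2.0) * math.sin(2 * math.pi * freq * t)) / z_um
    if e_d <= 0:
        t1 += dt  # region 1: toner urged away from the image
    elif e_d < e_th:
        t2 += dt  # region 2: toward the image, below threshold
    else:
        t3 += dt  # region 3: toward the image, above threshold

print(round(t1, 4), round(t2, 4), round(t3, 4))
```

The tally reproduces the text's values to within the stated rounding: t1 ≈ 0.010 s, t2 ≈ 0.0015–0.0016 s, t3 ≈ 0.008 s.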
Figure 29. Schematic drawing of development zone for jumping development.
Figure 30. Oscillating electric field in development zone that shows times when toner is attracted toward the electrostatic image.
Therefore, as an element of charged photoconductor moves through the development zone, the value of the development field diminishes. If there is sufficient development time, development again approaches completion. It can be assumed that the instantaneous deposition rate is proportional to ED(t), as in Eq. (45), but only when ED > Eth, that is, in region 3. This assumes that the toner particles respond to the field, make it all the way across the air gap, and adhere to the photoconductor image area. Because the electric force on a toner particle is the product of the field and charge, uncharged particles would not be motivated to move. The details of this development process are covered by Kanbe (49).

Jumping Development using Nonmagnetic Toner

Magnetic toner contains magnetite, a black magnetic material. Thus, it is not a practical toner for color applications. Tajima (50) discloses a method that allows jumping development using nonmagnetic toner. In this method, a conventional two-component magnetic brush developer deposits a layer of toner onto a donor roll that is biased with an ac voltage and a dc offset (Fig. 31). As in magnetic toner jumping development, nonmagnetic toner in the development zone is maintained as an oscillating cloud by the ac component of the bias. The electrostatic image augments the fields to attract toner to deposit on the photoconductor. Because full donor coverage by toner in a confined space is required, a conductive carrier is usually used in the magnetic brush developer. In this process, great care must be taken to prevent carrier particles from depositing on the roll surface and bringing toner particles into the image development zone. The advantage of the magnetic toner scheme is that there are
Development zone
no carrier particles, but of course, the magnetic toner is disadvantageous for color applications. Hybrid Scavengeless Development (HSD) Hybrid scavengeless development is very similar to jumping development using nonmagnetic toner, as just described. However, a set of wires is placed into the photoconductor–donor roll nip. These wires are at a high ac voltage, typically 600–800 volts peak to peak, and mechanically and electrically lift off the toner deposited on the donor by the magnetic brush developer (Fig. 32). So again, an oscillating cloud of toner is maintained in the electrostatic image development zone and the electrostatic image attracts toner to make the transition to the photoconductor. Patents by Hays (51), Bares and Edmunds (52), and Edmunds (53) disclose some of the details involved in this development system. A form of hybrid scavengeless development, described by Hays (54), replaces the wires with electrodes that are integral to the donor roll. These electrodes are biased with respect to each other and have an ac voltage. The electric field strips the toner off the donor surface, raising an oscillating cloud of toner which is used to develop the electrostatic image. Figure 33 shows the electroded development system schematically. The electrodes are all commutated so that the ac bias is applied only to triads of electrodes simultaneously. The donor roll is a dielectric material so that the ac electric fields between the outer electrodes and the central electrode of the triad are mainly tangential to the surface of the roll and tend to break the adhesion between the toner and the roll surface. Once this bond is broken, the toner is available to form a cloud. The dc bias on the central electrode and the electrostatic
Photoconductor
Development zone
Toner image
Toner transition zone
Nonmagnetic toner particles
Toner image Nonmagnetic toner particles
Wires
Skiving blade
S
S
Skiving blade
S
S N
N Magnets
Two−component developer sump
N
Figure 31. Schematic drawing of jumping development using nonmagnetic toner.
Photoconductor
S N
N S
Magnets
Two−component developer sump
N
Figure 32. Schematic drawing of hybrid scavengeless development using wires in the development zone.
ELECTROPHOTOGRAPHY
Photoconductor
Commutated electroded donor roll
Magnetic brush developer
Figure 33. Schematic drawing of hybrid scavengeless development using a segmented donor roll.
image augment the fields perpendicular to the surface and attract toner onto the photoconductor in the image areas. Development continues until either the electrostatic image is neutralized or until the image-bearing element of the photoconductor is beyond the development zone. Liquid Development of Electrostatic Images Liquid development of an electrostatic image, liquid ink electrophotography, was a popular method when ZnO coated papers were used as photoconductors in the 1950s and 1960s. Originally, the electrostatic image on the photoconductor was totally immersed in a mixture of pigment particles and fluid carrier. The development process proceeded by having electric fields attract the charged pigment particles to the latent image. As in open cascade development, only fringe fields were developed in the original version of liquid immersion development (LID). So, Stark and Menchel (55) introduced an electrode close to the imaging surface (aluminized Mylar) and let the developer fluid pass between the member and the electrode. This provides solid area development capability. In this form of LID, the developed images were quite wet and excess carrier fluid had to be evaporated or removed mechanically. This form of LID became undesirable for commercial application because disposable photoconductors became unpopular. To reuse the photoconductor, the developed toner particles needed to be transferred to a transfer sheet such as paper. The easiest way to do this for the small (submicron sized) pigment particles was to use a combination of pressure and heat. ZnO coated paper was not a good candidate for a reuseable photoconductor. In addition, due to increasing environmental awareness in the 1980s and 1990s, evaporation of the hydrocarbon liquid carrier became less desirable, so LID became
commercially unpopular. The invention of developer fluid metering (56,57), blotting and cleaning rollers (58), and compaction and rigidization of the developed image to make it transferable to an intermediate member (59–61) made the new method of LID commercially viable for high-speed, high-quality color printing. Indigo machines use this procedure quite successfully. Toner particles for dry development systems consist of pigments dispersed in a thermoplastic coated with charge-control and flow additives. These particles range from 5–12 µm in average diameter. Liquid development systems use pigment particles that range from 0.2–0.4 µm in average diameter and are dispersed in an insulating hydrocarbon carrier fluid. Commonly used carrier fluids include Isopar. Currently, there are many patented formulations that have complicated chemistry (62–67). They also have charge director substances in the fluids to control the charging of the pigment particles (which are now coated with thermoplastics) by contact with the carrier fluid. The carrier fluid is nonconducting, so that the electrostatic image is not discharged by intimate contact with the fluid. The average particle sizes of the coated pigments range from 0.4–2.0 µm. A counterrotating, very closely spaced, electrically biased metering surface considerably reduces the amount of fluid carryout and provides excellent solid area development. Image disturbances can appear when the toner image is too thick or if the charge on the topmost layer is too low. However, biasing schemes and geometries can counteract these defects (68). A set of cleaning rollers that moves in the same direction as the photoconductor (69) further reduces the amount of fluid carryout. Figure 34 shows a schematic drawing (not to scale) of the development metering zone in a counterrotating system, such as depicted by Denton (61).
In practice, the closest approach of the photoconductor to the roll is on the order of thousandths of an inch, whereas the roll diameter can be several inches. The photoconductor is about 25 µm thick. The dotted lines in Fig. 34 represent the general pattern of fluid flow as the image element on the photoconductor approaches the minimum gap in the development zone. The toner particles move with the carrier fluid until the perpendicular electric field in the development zone overpowers viscous drag. However, fluid at the surface of the photoconductor moves at the photoconductor velocity, and fluid at the roller moves at the velocity of the roller surface. Consequently, the fluid is sheared, and the locations of the menisci at the entrance and exit of the development zone help to define the development zone length and the distribution of the electric field. Thus, the development zone length depends in part on the velocities of the photoconductor and roll surfaces, as well as on the geometry. A relatively thick layer of ink is deposited on the photoconductor either by nozzles that squirt ink on the surface (70) or by allowing it to run through a bath of developer. These ink deposition zones are comparatively far away (on the order of inches) from the development metering roll. The electrostatic image does not have a reference electrode close by during the transition from the
Figure 34. Schematic drawing of development using liquid ink and a counterrotating development/metering roll.
deposition zone to the development zone. So the electric fields generated by the electrostatic image charges are mostly internal to the photoconductor. Some development may start during this transition because of fringe fields, but most development occurs in the development zone. As the ink layer approaches the minimum gap in the development zone, the electric field in the inking gap approaches a maximum. As the field increases, toner particles entrained in the carrier fluid are attracted toward the charges on the photoconductor surface and are deposited when contact is made. The toner deposited tends to neutralize the image (as in dry development). So, the total amount deposited on the photoconductor depends on the flow rate of toner through the development zone, the electrostatic image charge density and the fields set up by these charges, the charge-to-mass ratio of the particles, the total development time, and the fluid characteristics of the ink. Among the ink characteristics are the carrier fluid viscosity, the mobility of the particles and their volume concentrations, and the volume charge density of the micelles that carry the countercharge left behind by the deposited toner and their mobility through the fluid; the development zone geometry also plays a role. Development of the electrostatic image by liquid inks does not leave a dry image. The concentration of toner particles in the ink before development is about 1% to 4%, but the concentration of the particles in the developed image can be in the range of 15% to 20%. In background areas, of course, the volume concentration of particles has to be zero. So, although there is carrier fluid in both background and image regions, the effective action of the development fields is to concentrate the particles and to rigidize the toned image to resist the shearing stress at the exit meniscus of the development zone.
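These concentration figures imply a rough toner mass per unit area for a metered liquid-developed image. The film thickness below is a mid-range assumption, and the toner solids density of 1 g/cm³ is an assumed value, not one given in the text:

```python
# Estimate the toner mass per unit area in a metered liquid-developed image.
# Assumed values (illustrative; density is not given in the text):
film_thickness_cm = 6e-4      # ~6 µm exit film, mid-range of a few-micron metered layer
solids_fraction = 0.17        # 15-20% particle concentration in the developed image
toner_density_g_cm3 = 1.0     # assumed density of the toner solids

mass_per_area = film_thickness_cm * solids_fraction * toner_density_g_cm3
print(f"{mass_per_area * 1000:.3f} mg/cm^2")  # roughly 0.1 mg/cm^2
```

The result, on the order of 0.1 mg/cm², is well below the ~1.0 mg/cm² coverages typical of dry development, consistent with the submicron particle sizes used in liquid inks.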
The function of the metering part of the development metering roll is to reduce the fluid layer thickness from many microns at the entrance meniscus of Fig. 34 to about 4–8 µm at the exit meniscus. Rigidization or compaction of the toned image is disclosed by Denton (61) and by Domoto (68). Then, as disclosed by Landa (60), the developed photoreceptor is passed through an electrostatic transfer zone, where the remaining fluid helps to enable transfer of the particles to an intermediate medium, which then passes the toned image through a blotter or squeegee. The blotter or squeegee is electrically biased to concentrate the toner particles further and remove carrier fluid until particle concentrations of about 40 to 50% are achieved. Another function of the electric field at the blotter is to rigidize the toner image further. This intermediate member is passed into a pressurized and heated zone, and the toner is transferred to paper or any final substrate.

TRANSFER

Now that the electrostatic image is visible on the photoconductor, the toner particles can be transferred to an intermediate member such as a dielectric or to a viewing medium such as paper or a transparency. To do so, the transfer medium is brought into contact with the developed photoconductor so that a sandwich is formed in which the toner layer lies between the photoconductor and the transfer medium. Transfer is achieved electrically, mechanically, or by a combination of both. If the transfer medium can withstand high temperatures (on the order of 150 °C), the toner melts during the transfer process and is simultaneously fixed to the viewing medium. This process is called transfusion. In the ionographic process used by Delphax Corporation (see Ref. 15 for a description of the process), the electrostatic image is formed on an aluminum drum that uses a layer of aluminum oxide as the charge-receiving dielectric. Alumina is hard and heat resistant, so after the image is developed with toner, it is brought into contact with paper backed by a heated pressure roller. Thus, the toner melts in the pressurized nip and transfuses. However, most photoconductors degrade rapidly when subjected to high
temperatures (degradation accelerates for Se alloys and AMAT layers when the temperature exceeds about 110 °F). Thus, the toner image is transfused after it is first electrostatically transferred to an intermediate member, which can be a heat-resistant dielectric such as Mylar. Cold pressure transfer, a purely mechanical means, can be used if the photoconductor can withstand high pressures. However, low transfer efficiencies result unless the pressure is so high that the medium is deformed by it. Paper tends to calender under these conditions. Electrostatic transfer is achieved by corona charging the back of the transfer medium (a purely electrical method) or by bringing the toner sandwich into contact with a conformable, electrically biased roller (a combination of electrical and mechanical means). Corona transfer is disclosed in Ref. 71, and biased roll transfer is disclosed in Refs. 72–78. The biased transfer roller (BTR) consists of a hard, usually metallic, cylindrical core that is coated with a layer of conductive conformable material (72,79). Coating materials such as carbon-loaded elastomers have been used. The electrical conductivity and compressibility have to be tightly controlled and depend on the photoconductor and transfer medium speed and electrical characteristics. Toner particles are charged. Thus, they react to electric fields that are generated between the photoconductor surface and the transfer medium backing. If the field is in the right direction and is strong enough to overcome the adhesive and cohesive forces, the toner particles stick to the paper and provide a heat-fixable image. Because toner particles are charged, the forces that tend to move them toward the surface of the transfer medium are proportional to the product of charge and field. Thus, one might imagine that simply increasing the value of the electric field would enhance transfer.
In fact, air breakdown provides an upper limit to the field; exceeding the threshold for breakdown only inhibits transfer by tending to neutralize the charge on the toner particles. Increasing the charge-to-mass ratio of the toner particles assists transfer, but increased charge on the particles also increases the countercharge on the photoconductor. So there is an effect, but it is small. Obviously, if the particles are uncharged, they will not react to uniform electric fields. When the photoconductor–toner–paper sandwich leaves the corona transfer region, the paper will stick to the photoconductor because of electrostatic tacking forces. The electrical force on the transfer medium arises from the attraction between the electric charges on the back surface of the medium and the conductive backing of the photoconductor. These forces are relieved by a detacking device that allows the charges on the paper backing to relax when the paper is separated from the photoconductor. These devices are usually high-frequency ac corona generators or ac biased rolls. The frequency has to be chosen so that strobing is not introduced. With biased transfer rolls, tacking is not as severe as in corona transfer (80). Tacking can be avoided by tailoring the electric field in the toner sandwich so that it tends to decay at the exit portion of the roller nip. Reference 81 describes exposure of the photoconductor at the exit region to control the field as separation occurs between the toner, paper, and photoconductor sandwich.
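The air-breakdown ceiling mentioned above is often estimated from a linearized Paschen relation for air gaps, Vb ≈ 312 + 6.2·d volts (d in microns), a form commonly quoted in the electrophotography literature. It is used here as an assumption to sketch how the sustainable transfer field falls with gap size:

```python
def breakdown_voltage(gap_um):
    """Linearized Paschen estimate of air-gap breakdown voltage (gap in microns).

    V_b ~ 312 + 6.2*d volts is an approximation valid for gaps beyond a
    few microns; treated here as an assumption, not a value from the text.
    """
    return 312.0 + 6.2 * gap_um

def max_transfer_field(gap_um):
    """Largest average field (V/um) the air gap sustains before breakdown."""
    return breakdown_voltage(gap_um) / gap_um

for d in (10, 25, 50, 100):
    print(f"gap {d:4d} um: V_b ~ {breakdown_voltage(d):5.0f} V, "
          f"E_max ~ {max_transfer_field(d):5.1f} V/um")
```

The allowable average field shrinks as the gap widens, which is why simply raising the transfer voltage eventually triggers breakdown rather than better transfer.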
Transfer efficiency is the fraction of the total toner deposited on the photoconductor that appears on the transfer medium. It depends on the electric field in the sandwich, the electrical and mechanical adhesion of particles to the photoconductor, the cohesion that holds the toner layer together as a unit, and the electric field distribution at the exit of the transfer zone. If one plots the experimentally obtained mass of toner per unit area on the transfer medium versus the mass of toner per unit area on the photoconductor, a linear curve is obtained for coverage up to about 1.0 mg/cm². However, there is usually a threshold coverage of about 0.02 mg/cm² before transfer starts to occur. Therefore, transfer efficiency is usually stated, at or near 1.0 mg/cm² deposition, as the ratio of toner mass on the paper to toner mass on the photoconductor before transfer. Reference 33 (pp. 220–221) presents an exponential curve fit to some transfer data taken at IBM Corporation. Transfer efficiencies of about 0.6–0.9 are common for conventional corotron transfer devices, depending on the specific paper thickness and absorbed humidity, as well as the photoconductor thickness. When paper absorbs moisture, it becomes conductive. If the paper is simultaneously in contact with any grounded conductive surface, then the charges on the back of the paper bleed to that surface, and the potential of the paper diminishes. Thus, charges deposited on the back surface of the paper tend to bleed through toward the toner, and the total electric field in the sandwich diminishes. In extreme cases of absorbed moisture and corona transfer, efficiencies approach zero, and catastrophic failure is experienced. To minimize this failure, it is necessary to fabricate all paper transport belts from good insulators.
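The linear-with-threshold behavior described above can be captured in a toy model. The threshold is the ~0.02 mg/cm² figure from the text; the slope stands in for the quoted 0.6–0.9 efficiencies, and the linear form itself is a simplification:

```python
def transferred_mass(m_pc, threshold=0.02, slope=0.85):
    """Toy model of toner transfer.

    m_pc: toner mass per unit area on the photoconductor (mg/cm^2).
    Below the ~0.02 mg/cm^2 threshold nothing transfers; above it the
    transferred mass rises linearly.  The slope is an illustrative stand-in
    for the 0.6-0.9 efficiencies quoted for corotron transfer.
    """
    return max(0.0, slope * (m_pc - threshold))

def transfer_efficiency(m_pc):
    """Efficiency stated as a mass ratio at a given deposition."""
    return transferred_mass(m_pc) / m_pc if m_pc > 0 else 0.0

print(round(transfer_efficiency(1.0), 3))  # near the quoted 0.6-0.9 range
print(transfer_efficiency(0.01))           # below threshold: nothing transfers
```

Note that the ratio definition makes efficiency depend on where it is measured, which is why the text specifies "at or near 1.0 mg/cm² deposition."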
It is actually possible to transfer toner to a metallic plate or foil if the potential of the foil is kept constant as separation occurs and its value is such that air breakdown is avoided. An approach that minimizes the effects of humidity on transfer, among other things, by field tailoring is disclosed in Ref. 82. Paper is packaged under dry conditions, and the user is advised to keep the stacks intact after opening a ream. Moisture penetration is a slow process that depends on the exposed area (the sides of the stack), among other things. The design of papers for electrophotographic transfer is a separate technology. Yang and Hartmann (83) developed a model of charged-particle transfer under an electric field generated by two dielectric-coated parallel electrodes. This model is cited in Ref. 33 (pp. 204–208). In this model, the electric field in the toner pile between the photoconductor and the transfer medium is calculated. Because the particles are charged, the field vanishes at a point inside the toner layer. It is assumed that the toner layer splits at this thickness. Transfer efficiency in this model is defined as the ratio of the toner pile height on paper to the pile height of the original deposition on the photoconductor. As Williams (Ref. 33, pp. 204–208) points out, this model is not very representative of corona transfer, but it is the only one published. When using photoconductor materials coated on flexible belts, vibration at about 60 kHz can assist in breaking the toner-to-photoconductor adhesion when applied
simultaneously with the electric field (84). The vibration is supplied by a piezoelectrically driven horn that contacts the back surface of the photoconductor. Best performance is obtained by placing the horn tip at a position that corresponds to the peak of the electric field (84). Transfer efficiencies of 98% have been achieved by this method. The transferred image has to be treated so that the charged toner does not jump back to the photoconductor upon separation but rather sticks to the paper. If this retransfer does occur, it usually happens randomly and leaves an ugly image on the paper. Good engineering shapes the paper path and controls the corona currents or ac bias so that the electric field always acts to hold the toner to the paper as the paper leaves the transfer region. Transfer in color electrophotography is complicated by the requirement of transferring three or four separate images in registration. These images are generated either cyclically or in parallel, and the transfer system has to be tailored to the specific application. In a cyclic color electrophotographic process, the same transfer sheet is brought into contact with the same developed photoconductor as many times as there are color separations. Thus, in a cut-sheet machine, the transfer sheet is held in place by clips (there are also other methods of keeping the paper on the transfer member) until the last transfer has occurred. However, in a tandem application, the same transfer sheet moves synchronously with the imaged and developed photoconductor through as many separate transfer zones as there are color separations. The sheet has to be in registration with the image on the photoconductor in each of these zones, which presents engineering challenges.

FUSING OR FIXING

The final step in producing an electrophotographically printed page is fusing or fixing the toner image so that it becomes permanent.
Toner particles are pigments embedded in a thermoplastic such as polystyrene. These particles soften at their glass transition temperature (about 60–70 °C) and become rubbery. Then, as the temperature rises, the melt viscosity decreases, and the plastic begins to flow. When the image support is paper, the molten toner flows into the pores of the paper by capillary action. Upon cooling, the toner solidifies, and it is trapped by the paper fibers and by intimate contact with the fiber surfaces. If the image support is another plastic, for example, a transparent material, then the heated solid plastic sheet comes into intimate contact with the molten toner, and again, upon solidification, the image becomes permanent. As the temperature of the toner–plastic contact drops, the thermal contraction of the toner image must match that of the plastic support; otherwise, a mismatch of deformations at the interface causes the solidified toner to flake off the support. The temperature of the toner support has to match or exceed that of the toner above the glass transition point for a bond to occur.
Contact Heat Transfer

Typically, heat is supplied to both the toner particles and the support by contact with a heated surface, but radiant heat transfer has also been used. A roll fuser is an example of a contact heat transfer system. Typically, the paper that carries the unfused toner is passed between heated pressure rolls, which are hard cylinders made of a highly thermally conductive material such as steel, coated with a heat-resistant conformable material such as silicone rubber. The coating has to be highly thermally conductive and highly resistant to cyclic heating and cooling. Inside the cylinder, an infrared-rich source such as a quartz filament lamp heats the steel. The surface temperature of the cylinder is controlled by pulsing the lamp at the proper rate. The surface temperature, about 180 °C, is measured by sensors, and feedback control loops determine the pulse duration and frequency. These depend on the presence of paper and on the temperature and humidity content of the paper. The quantity of toner on the paper contributes only slightly to the heat transfer required to maintain the proper roll temperature. The conformance of the roll coating to the paper surface and toner image has to be high so that the image is not distorted upon fusing. When the toner melts, pressure from the conformable surface helps confine it to the boundaries of the image. Before conformable coatings were used, the roller contacting the image was hard, and considerable distortion of the image occurred, especially on transparent substrates. This was particularly objectionable when fusing halftones because dot distortions contribute to image optical noise. However, the conformance of the roll surface in the image area depends on the coating and also on the toner melt viscosity and the dwell time in the roller nip. Dwell time is short when the roll surface speed is high and/or the roller diameter is small, and also when the coating has a high elastic modulus.
Thus, for short dwell times, the roll temperatures have to be higher than for long dwell times. Two operational boundaries confine the operating characteristic of a fuser roll: toner from the image can offset to the roller surface when the roller is too hot (melt viscosity is too low) or when it is too cool (melt viscosity is too high). These boundaries can be widened by metering release agents onto the fuser roll, which help form a layer of low cohesion between the molten toner and the roller surface. References 85 and 86 disclose a contact fuser assembly for use in an internally heated fuser roll structure. It comprises a rigid, thermally conductive core coated with a thin layer of a normally solid, thermally stable material; a liquid release agent, a silicone oil, is subsequently applied to the coated core. Reference 87 discloses a heat and pressure roll fuser. The apparatus includes an internally heated fuser roll and a backup or pressure roll that form a nip through which the copy substrates pass so that the images contact the heated roll. The heated fuser roll has an outer layer or surface of silicone rubber or Viton™ to which a low-viscosity polymeric release fluid is applied. The release fluid is dispensed from a sump by a metering roll and a donor roll. The metering roll contacts the release fluid in the sump, and the donor roll contacts the surface
of the heated fuser roll. Reference 88 discloses a release agent management (RAM) system for a heat and pressure fuser for black and color toner images. Both the fuser roll and the backup roll are cleaned in each cycle to remove any debris that may have stuck to their surfaces through contact with the toner and its substrate. Fusing colored toner images requires attention to the degree of coalescence of the toner particles. To a certain extent, the fused image should not remain particulate, because surface scattering of light tends to reduce the color gamut available from the pigments in the toners. Thus, engineering fuser rolls that have operational latitude requires sophisticated heat transfer calculations.

Radiant Heat Transfer

Flash fusing and focused infrared light are examples of radiant heat transfer systems. In machines of the early 1960s, such as the Xerox 914, the paper passed through an oven. This created a fire hazard if a jam held paper in the fuser, so considerable effort went into designing sensors and switches that would turn off the heat source when a jam occurred. In addition, the ovens were constructed of high-thermal-capacity materials. Therefore, engineering was focused on providing an ‘‘instant on or off’’ low-thermal-mass fuser (89). Here, radiant and contact fusing occur simultaneously: the back of the paper contacts the floor of the fuser, and a quartz lamp that has a focusing shield concentrates radiation on the front of the paper. This fuser is constructed of low-thermal-mass materials such as sheet aluminum or stainless steel. Reference 90 describes an instant-on fuser that has a relatively thin, fiber-wound cylinder that supports a resistance wire, heating foil, or printed circuit secured on the outside surface of the cylinder or embedded in its surface. Flash fusing concentrates the energy of a xenon flash tube on the toner particle image.
During the flash, these particles absorb the infrared radiation and are heated to their melting point. Particle contact with paper heats the paper locally until the molten toner flows into the fibers. The paper remains cold in background areas. The toner polymer also has to melt to a low viscosity and flow quickly. Reference 91 describes a toner composition for flash fusing that contains a polyester binder whose glass transition temperature is between 50 and 80 °C. A flash fusing system is currently not compatible with colored toner because most of the energy of the xenon flash is concentrated in the visible portion of the spectrum. By definition, colored toner, other than black, absorbs only part of the visible spectrum. Efforts have been made, with limited success, to include visibly transparent, infrared-absorbing molecules in the toners to enable color flash fusing. Reference 92 describes a toner in which an ammonium salt of an infrared light absorber and a positive charge-control agent are used in combination. Image permanence is a rigid requirement in any commercial application.

Fusing Color Images

A black image requires absorbing light in the whole visible spectrum. Therefore, because the colorant in black
toner is usually carbon black, image quality does not depend on the coalescence of the toner particles into a solid mass. (Some color machines use a ‘‘process black’’ that consists of cyan, magenta, and yellow pigments dispersed within a single toner particle.) The main concern is image permanence. When toner particles are permanently attached to paper fibers without a high degree of coalescence, the image is said to be ‘‘fixed’’ but not necessarily ‘‘fused.’’ On the other hand, the color of nonblack toner does depend on the coalescence of the particles. The surface roughness of the fused toner layer particularly influences the saturation of the output color because of light scattering. Therefore, color toner has to melt in the fuser, and it also has to flow within the image boundary. This requirement makes designing color toner materials very complicated because cyan, magenta, and yellow pigments influence both contact electrification and toner color. The dispersion and concentration of colorant in the plastic binder influence color, toner charging characteristics, and the flow properties in the molten state. The cleanliness of the fuser roll in color applications is a high priority. Any contamination of the image by residue from a previous image contaminates background areas and also distorts the color. A light-colored image such as yellow is particularly vulnerable.

COLOR ELECTROPHOTOGRAPHY

To print process-color electrophotographic images, the basic xerographic process steps described in the preceding sections are repeated three or four times. Each of the latent electrostatic images is developed using a subtractive-primary pigmented toner. Each time, the exposure step is tailored to meet the individual demands of the cyan (minus red), magenta (minus green), yellow (minus blue), or black separation. The merits of four-color printing versus three-color printing are beyond the scope of this article.
Quantitative control of charging, exposure, development, and transfer is necessary to meet the demanding requirements of a customer who is accustomed to high-quality lithographic color prints. However, lithographic and electrophotographic printing processes differ. Lithographic printing of colored images requires skilled press operators to perform trial runs while setting up the press. The final adjustments of the relative amounts of cyan, magenta, yellow, and black inks can use up hundreds if not thousands of pieces of printed material. Then, during the run, the operators observe the output of the presses to ensure constant performance. (Short-run lithographic presses are currently on the market, but a discussion of these devices is beyond the scope of this article.) If colors begin to shift during a job on a conventional press, the operators adjust the press to restore color. This is highly impractical for color electrophotographic printers or copiers that are used for jobs that may consist of 1 to 1,000 prints. The image-quality requirements of high-quality color printing demand a quantitative understanding of all of the process steps involved in forming the final image.
Optical Density of Color Toner

The darkness (optical density) of black images is directly related to the toner mass per unit area on the paper. However, the color of the developed toner image of color prints depends on the total and also on the relative amounts of the subtractive primary colorant pigments that make up the toner layer. Thus, the electrostatic images of the different primary ``separations'' of three- or four-color images are different.

When light of intensity I0(λ) strikes an image made up of pigmented areas on a substrate such as paper, part of the spectrum of the illuminating light is partially absorbed. Part of the illuminating light is scattered at the image surface, and part is reflected by the surface. A portion of the light that enters the image is absorbed by the pigment in the toner layer. The partially absorbed light that strikes the paper surface is scattered and partially absorbed by the paper fibers. Then, it is reflected back through the same toner layer. On its way out of the layer, it is again partially absorbed by the pigment. Thus, the light that emerges from the toner layer has a spectrum different from that of the light that enters it. The total intensity I(λ) of the light gathered by a sensing device such as the human eye or a densitometer is

I(λ) = It(λ) + Ir(λ) + Is(λ),   (54)

where It(λ) is the intensity of light coming through the toner layer after being partially absorbed by the pigment on its way in, partially absorbed by the paper at the toner-paper interface, and again partially absorbed by the pigment on its way out of the layer; Ir(λ) is the intensity of light reflected at the surface; and Is(λ) is the intensity of the light scattered at the surface. The total reflectance of the toner layer on the paper is

R(λ) = I(λ)/I0(λ),   (55)

and the optical density is OD(λ) = −log10 R(λ). But the reflectance associated with only the amount of light absorbed by the pigment in the toner layer is

Rt(λ) = It(λ)/[I0(λ) − Is(λ) − Ir(λ)].   (56)

However, the reflectance of the toner layer on the paper (which may be colored), neglecting surface reflection and scattering, is the product Rt(λ)Rp(λ), where Rp(λ) is the reflectance of the paper. Let Vmass = A·Lt be the volume of 1 cm2 of fused toner. It can readily be seen that the layer thickness is Lt = M/(A·ρt), where M/A is the mass per unit area of the toner that has been transferred and fused on the paper and ρt is the mass density of the layer. It is often assumed that the reflectance of the fused layer can be represented by

Rt(λ) = exp[−β(λ)Cp Lt],   (57)

where Cp is the pigment concentration and β(λ) is the spectral absorption of the pigment per unit layer thickness, that yields

Rt(λ) = exp[−β(λ)Cp M/(A·ρt)].   (58)

Equation (57) is a reasonable assumption, but it is not accurate for OD > ∼1.4 because front surface reflection and light scattering by both the surface and the pigment at the surface of the fused layer become significant at high densities. Optical density has an asymptote at a finite value, which is determined by the reflecting and scattering characteristics of the fused toner layer and by the viewing geometry, not by M/A. To obtain the total reflectance as measured by a spectrophotometer or as seen by the human eye, the front surface reflection and scattering have to be included because Is(λ) + Ir(λ) can become greater than It(λ) at high values of mass deposition. So the total reflectance R(λ) of the fused toner on the paper is modified, yielding reflectance in terms of layer thickness Lt:

R(λ) = [1 − (Is(λ) + Ir(λ))/I0(λ)] exp[−β(λ)Cp Lt] Rp(λ) + (Is(λ) + Ir(λ))/I0(λ).   (59)

If front surface reflection and scattering are ignored, then optical density is given in terms of mass deposition by

OD(λ) = 0.43 β(λ)Cp M/(A·ρt).   (60)

If one knows the toner composition, then the pigment concentration is known, and β(λ) can be measured. Then, a more representative optical density is obtained from the mass deposition by using

R(λ) = [1 − (Is(λ) + Ir(λ))/I0(λ)] exp[−β(λ)Cp M/(A·ρt)] Rp(λ) + (Is(λ) + Ir(λ))/I0(λ),   (61)

and

OD(λ) = −log10 R(λ).   (62)

Figure 35. Optical reflection density versus deposition, M/A (mg/cm2), of some typical powder toners.

Figure 35 is a typical plot of the optical density of black toner using the reflectance given in Eq. (61). Similar plots
are obtained for cyan, magenta, and yellow toners using red, green, and blue filters, respectively, in a densitometer. Parameters for this plot are βCp/ρt = 6.5 per mg/cm2, Rp = 1, and (Ir + Is)/I0 = 0.04; because the toner for the calculation is black, the parameters are uniform across the whole visible spectrum. This calculation agrees well with the curve shown in Ref. 40, p. 39, Fig. 2.11, and p. 38, Eq. (2.4). Also see Ref. 93. The assumption implicit in obtaining Lt from M/(A·ρt) is that fusing at low depositions causes the toner particles to flow into a solid layer of low thickness. This does not usually happen. At depositions less than about 0.3 mg/cm2, toner particles, although fused, tend to remain as agglomerates of small particles. For typical spectral distributions of reflectance by cyan, magenta, or yellow printing inks, see Fig. 36 and Ref. 94. The spectral distributions β(λ) of the pigments in cyan, magenta, and yellow toners are very close to those of the printing inks shown in Fig. 36. The reflectance of white paper is a constant close to one across the visible spectrum. Thus, an approximation can be made from Eq. (58) that β(λ) is represented by a fraction, which is not a function of mass per unit area, multiplied by the natural logarithm of the reflectances shown in Fig. 36.
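The saturating curve of Figure 35 can be reproduced numerically from Eqs. (61) and (62). A minimal sketch, using the parameter values quoted above (βCp/ρt = 6.5 per mg/cm2, Rp = 1, (Ir + Is)/I0 = 0.04) lumped into a single exponent coefficient; the function name and code organization are illustrative, not from the original:

```python
import math

def optical_density(mass_per_area, beta_cp_over_rho=6.5, r_paper=1.0, surface=0.04):
    """Optical density of a fused toner layer, per Eqs. (61) and (62).

    mass_per_area:     toner deposition M/A in mg/cm^2
    beta_cp_over_rho:  lumped coefficient beta*Cp/rho_t in (mg/cm^2)^-1
    r_paper:           paper reflectance Rp (about 1 for white paper)
    surface:           front-surface term (Is + Ir)/I0
    """
    # Eq. (61): attenuated, paper-reflected light plus the front-surface light
    r = (1.0 - surface) * math.exp(-beta_cp_over_rho * mass_per_area) * r_paper + surface
    # Eq. (62): optical density from total reflectance
    return -math.log10(r)

for ma in (0.0, 0.25, 0.5, 1.0):
    print(f"M/A = {ma:4.2f} mg/cm^2  ->  OD = {optical_density(ma):.2f}")
```

At large depositions the exponential term vanishes and OD approaches −log10(0.04) ≈ 1.4, which is exactly the asymptote the text attributes to front-surface reflection and scattering.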
The transformation yields the spectral distribution of β(λ) shown in Fig. 37 when normalized to a maximum value of 6.5 per mg/cm2 . Using the values of β(λ) in Eq. (60) or (61), the spectral reflectance and optical density of layers of cyan, magenta, or yellow toner can be calculated from mass depositions obtained from the process parameters discussed in the preceding sections. If light scattering by pigment particles and toner layer interfaces is ignored, then the spectral reflectance of superposed layers of cyan, magenta yellow, and black can be estimated. Light first passes through the multiple layers and is partially absorbed on its way in. Then, the portion that gets through is reflected at the paper interface and is partially absorbed on its way out. Calculation procedures that account for significant light scattering are used extensively in the paint industry. However, the pigment particles used for paint are opaque. The paint binder is often mixed with titanium or silicon oxides that increase the opacity of the paint to help the newly painted surface color hide the previously painted surface. There are also conditions in color photography where light scattering invalidates Eq. (57). See Ref. 95 for more details. The pigments used in color toners tend to be more transparent than the pigments and fillers such as TiO2 or SiO2 used in paints. So, if one ignores light scattering by the pigment and fusing transforms the toner particles into homogeneous layers, then the spectral reflectance of the composite layer can be estimated from
Rblack(λ) = exp[−βblack(λ)Cblack DMAblack/ρt],
Rcyan(λ) = exp[−βcyan(λ)Ccyan DMAcyan/ρt],
Rmagenta(λ) = exp[−βmagenta(λ)Cmagenta DMAmagenta/ρt],

and

Ryellow(λ) = exp[−βyellow(λ)Cyellow DMAyellow/ρt],   (63)

Figure 36. Spectral distribution of reflectance of some typical cyan, magenta, and yellow printing inks.
where DMA is M/A:

DMAcyan = (M/A)cyan, DMAmagenta = (M/A)magenta, DMAyellow = (M/A)yellow,
Ccyan = cyan pigment concentration,
Cmagenta = magenta pigment concentration,
Cyellow = yellow pigment concentration,

Figure 37. Spectral distribution of absorption per pigment layer thickness per pigment concentration fraction, β(λ) per (%/100 Cp) per mg/cm2.
and the reflectance of the four superposed layers, ignoring front surface reflections and scattering, is

Rlayer(λ) = Rcyan(λ)Rmagenta(λ)Ryellow(λ)Rblack(λ).   (64)
Including reflection and scattering at the front surface in the composite layer yields

Rcomposite(λ) = [1 − (Is(λ) + Ir(λ))/I0(λ)] Rlayer(λ)Rp(λ) + (Is(λ) + Ir(λ))/I0(λ).   (65)
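The superposition of Eqs. (63)–(65) amounts to multiplying per-layer exponential attenuations and then folding in the paper reflectance and the front-surface term. A sketch at a single wavelength; the exponent values for the layers are hypothetical, chosen only to illustrate the calculation:

```python
import math

def composite_reflectance(layer_exponents, r_paper=1.0, surface=0.04):
    """Spectral reflectance of superposed fused toner layers at one wavelength.

    layer_exponents: one value beta_i*C_i*DMA_i/rho_t per layer; Eq. (63)
    gives each layer's reflectance R_i = exp(-beta_i*C_i*DMA_i/rho_t).
    """
    r_layer = 1.0
    for x in layer_exponents:
        r_layer *= math.exp(-x)       # Eq. (63), one attenuation factor per layer
    # Eq. (64): product of layer reflectances; Eq. (65): paper reflectance and
    # front-surface reflection/scattering included.
    return (1.0 - surface) * r_layer * r_paper + surface

# Hypothetical exponents for a cyan-plus-magenta overprint at one wavelength
r = composite_reflectance([1.3, 2.0])
od = -math.log10(r)
print(f"R = {r:.4f}, OD = {od:.2f}")
```

Because the exponentials multiply, the exponents simply add: an overprint of transparent pigments behaves like a single layer whose spectral absorptions are summed, which is why this estimate breaks down when the pigments scatter appreciably, as the text notes for paints.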
So, the spectral reflectance of the composite color as a function of the mass per unit area of the developed, transferred, and fused toner layers can be estimated.

Architecture of a Color Electrophotographic Printer

Light lens color copiers did not enjoy very enthusiastic market acceptance, for many reasons. One technical drawback was that the color could not be corrected for the unwanted absorptions of magenta and yellow. Another drawback was that the color original to be copied could exhibit metamerism because of the spectral absorptions of the pigments used in the original and the spectral distribution of energy in the illuminating system used to expose the photoconductor. A discussion of metamerism is beyond the scope of this article.

Cyclic Color Copying or Printing. A light lens color copier has to be a cyclic machine. An example of a cyclic color copier is discussed in Ref. 96. The charged photoconductor in a cyclic color machine is exposed through a red filter to a colored original and developed with cyan (minus red) toner. The cyan toner image is transferred to a receiver and held there. Then, the photoconductor is cleaned, recharged, and exposed again to the same original through a green filter. The first developer housing, which contains the cyan toner, is moved away, and a second housing is moved in to develop the second latent electrostatic image with magenta (minus green) toner. The magenta image is transferred in registration on top of the cyan toner. This process is repeated for exposure through a blue filter followed by development by yellow toner, until the full color image is on the receiver. Then the receiver that contains the multiply colored toner layer is released and passed through a fuser.

Modern color copiers first scan the original document by using a filtered raster input scanner. They generate a bit map of information that corresponds to the red, green, and blue and (for a four-color printer) black separations.
This information is analyzed by an onboard computer and is color corrected for the unwanted absorptions of the primary colorants used in the toners. The color-corrected separations are printed by a raster output scanner (ROS) exposure system. Printing with a ROS opens the possibility of using a tandem or an ``image on image'' (IOI) process (97) instead of the cyclic process described before. The IOI process is sometimes called the ``recharge, expose, and develop'' (REaD) process. When the bit map of imaging information controls the exposure step, the copier is the same as a printer; however, the source of the information is an original document instead of being generated by a computerized publishing system.
Tandem Color Printing Process

The unique feature of a tandem process is that three (or four) sets of photoconductors, chargers, exposure systems, and transfer systems are used sequentially. Each set constitutes a mini ``engine'' that forms one of the primary colorant separations and deposits the appropriate toner on a single receiver. An example of a tandem printer is discussed in Ref. 98. Thus, as the receiver moves through the individual transfer zones of each mini ``engine,'' the full color toner image is built up; the primary colorant toners accumulate on the single receiver. In this kind of system, the developer housings are not moved in and out between separations. Although mechanical complexity is increased, the number of prints generated per minute at a given process speed is triple (or quadruple) that of the cyclic process. So, to obtain the same output rate as a tandem engine, the process speed of a cyclic engine needs to be increased substantially. The trade-off between cyclic and tandem processes involves many engineering and performance decisions that are beyond the scope of this article.

Image on Image or REaD Color Printing

In the REaD process, a uniformly charged photoconductor is exposed to the bit map of the first separation, pixel by pixel, to form the first latent electrostatic image. This image is developed by a developer housing that contains the appropriate primary colored toner. This first toner image is not transferred from the photoconductor. The toned image on the photoconductor passes through a charge erase and leveling station. Here, the charges that remain on the toned image and untoned background areas of the photoconductor are dissipated. Then the toned image on the photoconductor passes through a recharge station and a reexpose station. Here, the electrostatic latent image of the second separation is formed in registration on top of the first developed toner image and the photoconductor upon which it sits.
The combination of the first developed toner image and the second latent electrostatic image is developed by a second developer housing that contains the appropriate second primary colorant toner. Again, the first two layers of toner are not transferred; rather, the previous steps are repeated until the final, registered, three- or four-layer, multicolored toner image is formed. Then, the multiple layers of toner are treated by a corona to ensure that all of the particles are charged with the same polarity, and they enter a final transfer zone, where they are all transferred simultaneously to the receiver sheet. Then, the toner image is fused to form the colored output. Several challenges presented to the electrophotographic subsystems used in the IOI process are spelled out in the ``Background of the Invention'' section of Ref. 97. The light that exposes a previously developed latent image cannot be appreciably absorbed or scattered by the primary color toner that is on the photoconductor. This challenge was met by using infrared (IR) light for exposure because the primary colored toners are mostly transparent in the near IR. The previously toned latent image can be neutralized before subsequent recharging by
various forms of ac corona. Scorotrons tend to level the surface potentials of the previously developed image and the undeveloped background areas of the photoconductor. These can be used to recharge the photoconductor before it moves into a second, third, or fourth exposure station. Then, because the photoconductor has a previously developed toner image on it when it moves into the second, third, and fourth development zones, these previously developed toner images must not be disturbed. Therefore, the development methods have to be non-interactive. One such development method is hybrid scavengeless development (see the previous section). The nonmagnetic form of jumping development is another development method adapted for scavengeless performance (99,100). Finally, the transfer system has to be able to transfer high as well as low depositions faithfully with high efficiency. One such transfer method involves acoustic transfer assist (see the preceding section and Ref. 84). The final multilayer toner image can be transferred either to the final receiver material or to an intermediate surface. If it is transferred to the final receiver, then the toner image passes on to a fuser, where the toner particles coalesce and form the output color print. The image quality of the color print is determined by the quality of the engineering and systems analyses that were performed in designing the color printer.

Table. Constitutive Equations of the Electrophotographic Subsystems

Subsystem | Equation | Output Quantity
Corotron charging | (6) | Photoconductor voltage before exposure (volts)
Flash exposure | (8) | Exposure (erg/cm2)
Laser scanning exposure with perfect motion quality | (17), (20) | Exposure distributions in the fast and slow scan directions
Laser scanning exposure with imperfect motion quality | (23) | Exposure distribution in the slow scan direction
Photoconductor illuminated area discharge | (30), (31) | Electrostatic image potential (volts)
Magnetic brush development, neutralization limit | (41) | Toner mass per unit area deposited in electrostatic image (mg/cm2)
Magnetic brush development, before neutralization limit | (52) | Toner mass per unit area deposited in electrostatic image (mg/cm2)
Transfer | | % mass transferred; toner mass per unit area on transfer sheet
Optics of fused toner layer | (61), (62) | Toner layer reflectance in visible range; toner layer optical density in visible range
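The constitutive equations tabulated here lend themselves to the sensitivity analysis described in the summary that follows: partially differentiating an output quantity with respect to an input variable estimates how process fluctuations affect print quality. A sketch using a central finite difference on the fused-layer optics relation; the parameter values and function names are illustrative assumptions, not values from the original:

```python
import math

def od(ma, k=6.5, rp=1.0, s=0.04):
    """Optical density from the fused-layer optics (reflectance, then -log10)."""
    return -math.log10((1.0 - s) * math.exp(-k * ma) * rp + s)

def sensitivity(f, x, h=1e-6):
    """Central-difference estimate of df/dx, standing in for a partial derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

# d(OD)/d(M/A) at several depositions: the same mass fluctuation matters far
# more in the mid-tones than near the saturation asymptote.
for ma in (0.0, 0.1, 0.3, 0.8):
    print(f"M/A = {ma:3.1f} mg/cm^2  ->  d(OD)/d(M/A) = {sensitivity(od, ma):6.3f}")
```

At M/A = 0 the derivative reduces analytically to log10(e)·k·(1 − s) ≈ 2.71 per mg/cm2, which the finite difference reproduces; the decreasing values at higher depositions quantify why density control is hardest in the mid-tones.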
SUMMARY AND CONCLUSION

Electrophotography, also called xerography (a process for producing high-quality images by using the interaction of electricity and light with materials), was described in this article. A number of methods were discussed for executing the process steps: charging, exposing, and developing the photoconductor, and transferring and fusing the developed toner image. The measurable parameter is the optical density, which is a function of the toner mass per unit area and the pigmentation on the transfer sheet, among other things. The accompanying table lists the equation numbers that can be used (under the assumptions of their derivations) to estimate the influences of various parametric values on the optical density of the output images. The assumptions used in formulating the equations are discussed in the appropriate sections. These equations can be used to produce multivariable system plots to help examine the influences of the many input variables on the visual appeal of electrophotographic images. Sensitivity can be analyzed by partially differentiating the various constitutive equations in the list to obtain an estimate of the influence of fluctuations of the independent variables on output print quality.

Acknowledgments
Many models and workers not mentioned in this article have contributed to the present state of understanding of electrophotography, and this article is by no means complete. The science of electrophotography is multidisciplinary, involving chemistry, physics, mathematical distributions, and engineering. This article briefly describes only some of the most common concepts, principles, and phenomena that are active during the production of toner images. The reader interested in further detail is referred to the texts and patents cited. The many articles published primarily in the Proceedings of the IEEE-IAS Industry Applications Society, the Journal of the SPIE, and the journal Applied Physics also provide more detailed and possibly more current information.

ABBREVIATIONS AND ACRONYMS

AMAT    active matrix coating on a conductive substrate
BTR     biased transfer roll
CAD     charged area development
DAD     discharged area development
DMA     deposited mass per unit area
FWHM    full width at half maximum
HSD     hybrid scavengeless development
IOI     image on image
IR      infrared radiation
LED     light emitting diode
LID     liquid immersion development
M/A     toner mass per unit area
OD      optical density
PIDC    photo-induced discharge curve
PVK     polyvinylcarbazole
POW     potential well scorotron
Q/M     average toner charge to mass ratio
RAM     release agent management
REaD    recharge and develop
ROS     raster output scanner
TC      toner concentration

BIBLIOGRAPHY

1. J. Mort, The Anatomy of Xerography: Its Invention and Evolution, McFarland, Jefferson, NC, 1989.
2. Electrophotographic Apparatus, US Pat. 2,357,809, (1944), C. F. Carlson.
3. G. C. Lichtenberg, Novi. Comment. Gott. 8, 168 (1777).
4. P. Selenyi, Zeitschr. Phys. 47, 895 (1928).
5. P. Selenyi, Zeitschr. Tech. Phys. 9, 451 (1928).
6. P. Selenyi, Zeitschr. Tech. Phys. 10, 486 (1929).
7. British Pat. 305,168, (1929), P. Selenyi.
8. P. Selenyi, J. Appl. Phys. 9, 637 (1938).
9. Electrophotography, US Pat. 2,297,691, (1942), C. F. Carlson.
10. R. M. Schaffert, Electrophotography, Focal Press, NY, 1975.
11. Method for the Production of a Photographic Plate, US Pat. 2,753,278, (1956), W. E. Bixby and O. A. Ullrich Jr.
12. Corona Discharge Device, US Pat. 2,777,957, (1957), L. E. Walkup.
13. Developer Composition for Developing an Electrostatic Image, US Pat. 2,638,416, (1953), L. E. Walkup and E. N. Wise.
14. Method and Apparatus for Printing Electrically, US Pat. 2,576,047, (1951), R. M. Schaffert.
15. J. R. Rumsey, Electronic Imaging '87, Int. Electron. Imaging Exposition Conf., Boston, MA, 1987, pp. 33–41.
16. P. M. Borsenberger and D. S. Weiss, Organic Photoreceptors for Imaging Systems, Marcel Dekker, NY, 1993.
17. Method for the Preparation of Electrostatographic Photoreceptors, US Pat. 3,956,524, (1976), J. W. Weigl.
18. Layered Imaging Member and Method, US Pat. 4,282,298, (1981), M. W. Smith, C. F. Hackett, and R. W. Radler.
19. Imaging System, US Pat. 4,232,102, (1980), A. M. Horgan.
20. M. E. Scharfe, D. M. Pai, and R. J. Gruber, in J. Sturge, V. Walworth, and A. Shepp, eds., Imaging Processes and Materials, Neblette's Eighth Edition, Van Nostrand Reinhold, NY, 1989.
21. J. D. Cobine, Gaseous Conductors: Theory and Engineering Applications, Dover, NY, 1958.
22. Corona Generating Device, US Pat. 3,936,635, (1976), P. F. Clark.
23. Long Life Corona Charging Device, US Pat. 4,837,658, (1989), L. Reale.
24. Corona Generating Device, US Pat. 5,451,754, (1989), L. Reale.
25. Contact-Type Charging Member Which Includes an Insulating Metal Oxide in a Surface Layer Thereof, US Pat. 5,502,548, (1996), Y. Suzuki et al.
26. Contact Charging Member, Contact Charging Making Use of It, and Apparatus Making Use of It, US Pat. 5,140,371, (1992), Y. Ishihara et al.
27. Electrophotographic Charging Device, US Pat. 5,068,762, (1991), T. Yoshihara.
28. Charging Device and Image Forming Apparatus, US Pat. 5,940,660, (1999), H. Saito.
29. Control of Fluid Carrier Resistance and Liquid Concentration in an Aquatron Charging Device, US Pat. 5,819,141, (1998), J. S. Facci et al.
30. Roll Charger with Semi-Permeable Membrane for Liquid Charging, US Pat. 5,895,147, (1999), J. S. Facci.
31. Banding-Free Printing by Linear Array of Photosources, US Pat. 4,475,115, (1984), Garbe et al.
32. A. Melnyk, Third Int. Congr. Adv. Non-Impact Printing Technol., San Francisco, Aug. 24–28, 1986, pp. 104–105.
33. E. M. Williams, The Physics and Technology of Xerographic Processes, Wiley, NY, 1984.
34. M. Scharfe, Electrophotography Principles and Optimization, Research Studies Press, John Wiley & Sons, Inc., New York, NY, 1984.
35. Scavengeless Development Apparatus for Use in Highlight Color Imaging, US Pat. 4,868,600, (1989), D. Hays et al.
36. Electrode Wire Cleaning, US Pat. 4,984,019, (1991), J. Folkins.
37. Dual AC Development System for Controlling the Spacing of a Toner Cloud, US Pat. 5,010,367, (1991), D. Hays.
38. Development Apparatus Having a Transport Roll Rotating at Least Twice the Surface Velocity of a Donor Roll, US Pat. 5,063,875, (1991), J. Folkins et al.
39. Hybrid Development Type Electrostatographic Reproduction Machine Having a Wrong Sign Toner Purging Mode, US Pat. 5,512,981, (1996), M. Hirsch.
40. L. Schein, Electrophotography and Development Physics, Springer-Verlag, NY, 1988.
41. F. R. Ruckdeschel, Dynamic Contact Electrification Between Insulators, PhD Thesis, University of Rochester, 1975; also J. Appl. Phys. 46, 4416 (1975).
42. R. J. Nash, IS&T Tenth Int. Congr. Adv. Non-Impact Printing Technol., New Orleans, 1994, pp. 95–107.
43. Process for Developing Electrophotographic Images by Causing Electrical Breakdown in the Developer, US Pat. 4,076,857, (1978), G. P. Kasper and J. W. May.
44. D. A. Hays, IEEE-IAS Annu. Conf. Proc., Toronto, Canada, 1985, p. 1510.
45. Developing Apparatus for Electrostatic Image, US Pat. 4,386,577, (1983), N. Hosono et al.
46. Developing Apparatus for Electrostatic Image, US Pat. RE 34,724, (1994), Hosono et al.
47. Magnetic Developing Method Under AC Electrical Bias and Apparatus Therefor, US Pat. 4,292,387, (1981), Kanbe et al.
48. Developing Method and Apparatus, US Pat. 4,610,531, (1986), Hayashi et al.
49. Developing Method for Developer Transfer Under A.C. Electrical Bias and Apparatus Therefor, US Pat. 4,395,476, (1983), Kanbe et al.
50. Developing Device, US Pat. 4,383,497, (1983), H. Tajima.
51. Scavengeless Development Apparatus for Use in Highlight Color Imaging, US Pat. 4,868,600, (1989), D. A. Hays.
52. Hybrid Scavengeless Developer Unit Having a Magnetic Transport Roller, US Pat. 5,359,399, (1994), J. Bares and C. Edmunds.
53. Donor Roll with Electrode Spacer for Scavengeless Development in a Xerographic Apparatus, US Pat. 5,338,893, (1994), C. G. Edmunds.
54. Developing Apparatus Including a Coated Developer Roller, US Pat. 5,386,277, (1995), D. A. Hays et al.
55. H. Stark and R. Menchel, J. Appl. Phys. 41, 2905 (1970).
56. Squeegee Roller System for Removing Excess Developer Liquid from Photoconductive Surfaces, US Pat. 3,955,533, (1976), I. E. Smith et al.
57. Developer Wringing and Removing Apparatus, US Pat. 3,957,016, (1976), K. Yamad et al.
58. Apparatus for Cleaning and Moving a Photoreceptor, US Pat. 4,949,133, (1990), B. Landa.
59. Method and Apparatus for Removing Excess Developing Liquid from Photoconductive Surfaces, US Pat. 4,286,039, (1981), B. Landa et al.
60. Imaging System with Rigidizer and Intermediate Transfer Member, US Pat. 5,028,964, (1991), B. Landa.
61. Method and Apparatus for Compaction of a Liquid Ink Developed Image in a Liquid Ink Type Electrostatographic System, US Pat. 5,655,192, (1997), G. A. Denton and H. Till.
62. Liquid Developer, US Pat. 3,729,419, (1973), S. Honjo et al.
63. Charge Control Agents for Liquid Developers, US Pat. 3,841,893, (1974), S. Honjo et al.
64. Milled Liquid Developer, US Pat. 3,968,044, (1976), Y. Tamai et al.
65. Dyed Stabilized Liquid Developer and Method for Making, US Pat. 4,476,210, (1984), M. D. Croucher et al.
66. Metallic Soap as Adjuvant for Electrostatic Liquid Developer, US Pat. 4,707,429, (1987), T. Trout.
67. Liquid Developer, US Pat. 4,762,764, (1988), D. S. Ng et al.
68. Liquid Ink Development Dragout Control, US Pat. 5,974,292, (1999), G. A. Domoto et al.
69. Apparatus for Cleaning and Moving a Photoreceptor, US Pat. 4,949,133, (1990), B. Landa.
70. Liquid Developer Imaging System Using a Spaced Developing Roller and a Toner Background Removal Surface, US Pat. 5,255,058, (1993), H. Pinhas et al.
71. Electrophotographic Printing Machine, US Pat. 2,807,233, (1957), C. J. Fitch.
72. Constant Current Biasing Transfer System, US Pat. 3,781,105, (1973), T. Meagher.
73. Electrostatic Printing, US Pat. 3,043,684, (1962), E. F. Mayer.
74. Powder Image Transfer System, US Pat. 3,267,840, (1966), T. Honma et al.
75. Method of and Means for the Transfer of Images, US Pat. 3,328,193, (1967), K. M. Oliphant et al.
76. Photoelectrostatic Copying Process Employing Organic Photoconductor, US Pat. 3,598,580, (1971), E. S. Baltazzi et al.
77. Impression Roller for Current-Assisted Printing, US Pat. 3,625,146, (1971), J. F. Hutchison.
78. Electrophotographic Receiver Sheet Pickup Method and Apparatus, US Pat. 3,630,591, (1971), D. R. Eastman.
79. Photoelectrostatic Copier, US Pat. 3,520,604, (1971), L. E. Shelffo.
80. Biasable Member and Method of Making, US Pat. 3,959,574, (1976), D. Seanor.
81. Transfer System with Tailored Illumination, US Pat. 4,014,605, (1977), G. Fletcher.
82. Transfer System with Field Tailoring, US Pat. 5,198,864, (1993), G. Fletcher.
83. C. C. Yang and G. C. Hartmann, IEEE Trans. Electron Devices ED-23, 308 (1976).
84. Method and Apparatus for Using Vibratory Energy with Application of Transfer Field for Enhanced Transfer in Electrophotographic Imaging, US Pat. 5,016,055, (1991), K. Pietrowski et al.
85. Renewable CHOW Fuser Coating, US Pat. 3,934,547, (1976), Jelfo et al.
86. Renewable CHOW Fuser Coating, US Pat. 4,065,585, (1977), Jelfo et al.
87. Roll Fuser Apparatus and Release Agent Metering System Therefor, US Pat. 4,214,549, (1980), R. Moser.
88. Web with Tube Oil Applicator, US Pat. 5,500,722, (1996), R. M. Jacobs.
89. Instant-on Radiant Fuser, US Pat. 4,355,225, (1982), D. G. Marsh.
90. Filament Wound Foil Fusing System, US Pat. 4,883,941, (1989), R. G. Martin.
91. Developer Composition for Electrophotography for Flash Fusing, US Pat. 5,330,870, (1994), S. Tanaka et al.
92. Flash Fixing Color Toner and Process for Producing Same, US Pat. 5,432,035, (1995), Y. Katagiri et al.
93. P. E. Castro and W. C. Lu, Photogr. Sci. Eng. 22, 154 (1978).
94. E. Jaffe, E. Brody, F. Preucil, and J. W. White, Color Separation Photography, GATF, Pittsburgh, PA, 1959.
95. R. M. Evans, W. T. Hanson, and W. L. Brewer, Principles of Color Photography, John Wiley, NY, 1953.
96. Multicolor Xerographic Process, US Pat. 4,135,927, (1979), V. Draugelis et al.
97. Single Positive Recharge Method and Apparatus for Color Image Formation, US Pat. 5,579,100, (1996), Z. Yu et al.
98. Tandem Trilevel Process Color Printer, US Pat. 5,337,136, (1994), J. F. Knapp et al.
99. Method and Apparatus for Color Electrophotography, US Pat. 4,949,125, (1990), H. Yamamoto et al.
100. Cleaning Method for Use in Copy Apparatus and Toner Used Therefor, US Pat. 5,066,989, (1991), H. Yamamoto.

ENDOSCOPY

NIMISH VAKIL
ABBOUD AFFI
University of Wisconsin Medical School
Milwaukee, WI

INTRODUCTION

Endoscopy is an imaging method that is used to visualize the interior of human organs. The widest application of endoscopy is in the gastrointestinal tract. Specialized instruments have been developed to image specific gastrointestinal organs. Gastroscopes, which are passed through the mouth, are used to examine the esophagus, the stomach, and the duodenum. Enteroscopes, also passed through the mouth, are specially designed to examine the small intestine. Colonoscopes, passed through the anal orifice, are used to examine the lower gastrointestinal tract. There are two major classes of instruments: fiber-optic endoscopes, which consist of coaxial bundles of optical fibers, and electronic endoscopes, which rely on a charge-coupled device. The electronic endoscope contains a charge-coupled device that transmits an electronic signal, resulting in an image that is visualized on a television monitor. Modern endoscopes, whether electronic or fiber-optic, contain three systems: (1) a mechanical system used to deflect the endoscope tip; (2) a system of air/water and biopsy/suction channels with controls. Insufflation of air is important to distend the walls of the organ being studied. Suction capabilities allow the aspiration of contents from the gastrointestinal tract for examination; in other cases, fluid in the gastrointestinal tract obscures the view, and suction allows the fluid to be aspirated away. (3) The imaging system, which may be fiber-optic or electronic, is used to visualize tissue. Table 1 lists the most frequent reasons for which endoscopy is performed and the organs that are visualized. Figures 1–10 are examples of endoscopic images.
Figure 2. Normal appearance of the duodenum.
Table 1. Common Endoscopic Procedures: Organs and Indications

Procedure | Organs Examined | Common Conditions Involving These Organs
Esophagogastroduodenoscopy (EGD) | Esophagus, stomach, duodenum | Bleeding; gastroesophageal reflux disease; peptic ulcer disease
Colonoscopy | Colon, ileum | Polyps; bleeding; ulcerative colitis; Crohn's disease
Enteroscopy | Small intestine | Bleeding; tumors, benign and malignant
Figure 3. Esophageal ring. Note the sharp ring that caused obstruction and dysphagia in this patient.
Figure 1. Normal junction of the esophagus and the stomach.
Figure 4. Duodenal ulcer that has a visible vessel. Artery protruding from the base of an ulcer.
Figure 8. Diverticulosis and colon polyp. Multiple diverticuli are seen in the colon, and a polyp is also seen projecting from the wall of the colon.
Figure 5. Bleeding esophageal varicose vein (varix).
Figure 9. Polypectomy. A wire snare has been applied to the stalk of a polyp.

Figure 6. Ulcerative colitis. Multiple ulcers in the colon.
Figure 10. Polypectomy. The polyp has been resected after applying current through the wire snare.
INSTRUMENTS

Fiber-Optic Instruments
Figure 7. Crohn’s disease of the colon. Note the solitary ulcer in the colon.
Clinical endoscopy began in the 1960s using fiber-optic instruments (1). The fiber-optic endoscope consists of two fiber-optic bundles, one to transmit light into the gastrointestinal tract and the other to carry light back to the endoscopist’s eye. Using these devices, light from a cold
light source is directed into a small bundle of glass fibers. Light emerges from the fiber bundle end to illuminate the mucosa of the gastrointestinal tract. Light is then reflected from the tissue of the gastrointestinal tract and enters an adjacent bundle of coherent glass fibers that returns it to the endoscope head, where the light forms an image. Each fiber in the endoscope is specially designed for endoscopy. Because the index of refraction decreases from the axis of the fiber to its outer surface, the light rays that travel through the fiber bend toward the center of the fiber. The outside of each fiber is covered with material that minimizes light leakage and adds strength and flexibility to the fiber. The fibers are arranged identically within the bundle at both the imaging end and the viewing end (a coherent bundle). Fibers used in endoscopes may be as small as 8 microns. A satisfactory image requires a large number of fibers; there may be as many as 50,000 fibers in a bundle.

Electronic Endoscopes

In the last decade, electronic endoscopes replaced fiber-optic endoscopes in gastrointestinal endoscopy. They make up for some of the deficiencies of the fiber-optic instruments (2). Fiber-optic endoscopes do not allow binocular vision, and the endoscope needs to be held up to the endoscopist's eye. Assistants cannot see the endoscopic image, and photography is difficult because a camera must be attached to the endoscope and the resulting image is frequently substandard. Electronic endoscopes permit binocular vision, and a large image is seen on a television monitor, which permits assistants to participate actively in the procedure. This allows teaching and interaction. Images from the procedure can be stored on magnetic media and printed immediately without loss of resolution.
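The fiber figures quoted above for the fiber-optic design (8-micron fibers, up to 50,000 per coherent bundle) imply a surprisingly compact imaging bundle. A rough back-of-the-envelope sketch, assuming ideal hexagonal packing of the fibers and ignoring cladding and interstitial space, which make real bundles somewhat larger:

```python
import math

n_fibers = 50_000          # fibers in a coherent bundle (figure from the text)
fiber_d_um = 8.0           # fiber diameter in microns (figure from the text)
packing = math.pi / (2 * math.sqrt(3))   # hexagonal close packing of circles, ~0.9069

fiber_area = math.pi * (fiber_d_um / 2) ** 2     # cross-section of one fiber, um^2
bundle_area = n_fibers * fiber_area / packing    # total bundle cross-section, um^2
bundle_d_mm = 2 * math.sqrt(bundle_area / math.pi) / 1000.0

print(f"approximate bundle diameter: {bundle_d_mm:.1f} mm")
```

The estimate comes out under 2 mm, consistent with an image-carrying bundle fitting inside a flexible endoscope alongside the illumination bundle and the working channels.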
There are many advantages to electronic endoscopes, but there is a learning curve associated with the device, and calibration of color is important, so that endoscopists view the image as "natural" and so that subtle changes in color and texture that could represent disease are not missed.

All electronic endoscopes consist of two components: a charge-coupled device (CCD) and a video processor. The CCD, mounted at the tip of the endoscope, is a two-dimensional array of photocells. Each photocell generates an electrical charge proportional to the intensity of the light that falls on it. This charge is relayed as a voltage to the video processor, which reconverts the information into an electrical signal that is sent to the television monitor. The television monitor is a two-dimensional grid of tiny fluorescent dots; each produces one additive primary color (red, green, or blue). The electrical impulses from the video processor direct a stream of electrons to the phosphors of each dot on the TV monitor. Stimulation of the phosphors produces a tiny point of colored light. The intensity of a given dot depends on the intensity of the signal from the processor. A trio of primary-colored dots on the monitor is a pixel, the smallest picture element. The brightness and the color of a pixel depend on the intensity of the electron beam that strikes each component dot.
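Each photocell's output is an analog quantity that the processor must eventually represent digitally. A minimal sketch of uniform quantization follows; the 0-1 V full-scale range and the 8-bit depth are illustrative assumptions, not the specification of any endoscope processor.

```python
# Sketch: uniform analog-to-digital conversion of a photocell voltage.
# Full-scale range (0-1 V) and bit depth are illustrative assumptions.
def quantize(voltage, full_scale=1.0, bits=8):
    """Map an analog voltage onto one of 2**bits discrete levels."""
    levels = 2 ** bits
    v = min(max(voltage, 0.0), full_scale)          # clamp to allowed range
    return min(int(v / full_scale * levels), levels - 1)

def reconstruct(code, full_scale=1.0, bits=8):
    """Digital-to-analog conversion: return the voltage at the level's center."""
    levels = 2 ** bits
    return (code + 0.5) / levels * full_scale

code = quantize(0.37)
print(code, round(reconstruct(code), 4))
```

Round-tripping a voltage through `quantize` and `reconstruct` recovers it to within half of one quantization step, which is the resolution limit set by the bit depth.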
ANALOG AND DIGITAL TECHNOLOGIES IN ENDOSCOPY

An analog signal is a waveform whose frequency and amplitude vary continuously and take on any values within an allowed range; a digital signal falls into one of several predetermined discrete values (3). When a CCD records an image, the intensity of illumination (brightness) of each pixel is an analog quantity. In digital imaging, on the other hand, spatial information such as the location of each CCD pixel is represented by a binary number, a digital quantity.

In an analog system, the varying intensities of the pixels are recorded as continuous variations in a physical quantity such as magnetic orientation (for videotape) or optical polarization (for optical disk drives). As the tape or disk moves, the variations in the appropriate physical quantity are converted into the temporal variation of intensity of the monitor pixels, and hence the images are retrieved. In a digital image storage system, the instantaneous value of the intensity of a pixel is converted into a binary pattern by an analog-to-digital converter. This pattern is stored as binary digits on a magnetic or optical disk. When the images are retrieved, the binary digits are fed at a fixed rate into a digital-to-analog converter, which reproduces the original value of the pixel intensity at discrete time intervals. Among the advantages of digital storage methods is the high quality of the images reproduced: they do not degrade after reproduction, and they can be manipulated by computer image-processing systems. The initial acquisition costs of electronic endoscopes are higher, but their longer life more than compensates for the extra cost.

RESOLUTION

Resolution is defined in the field of endoscopy as the shortest distance between two pictorial elements (4). A pictorial element is the smallest discernible portion of a picture.
Therefore, resolution is the ability to distinguish two adjacent points in the target tissue being imaged. Two adjacent points become harder to distinguish as the distance between them decreases or as the distance from the target increases. The number of sensors in an individual instrument determines its resolution. The resolution of a fiberscope therefore depends on the number of fibers in the instrument. The number of fibers is limited by the diameter of the instrument; larger instruments cause greater patient discomfort. Owing to improvements in fiber manufacture, modern endoscopes are narrower but contain more fibers than their predecessors. In electronic endoscopes, resolution is determined by the number of sensing elements in the CCD chip.

Another factor that plays a role in resolution is the angle of view: resolution decreases with wide-angle lenses, which allow a large area of mucosa to be studied. Resolution can be improved by using a narrow angle of view, so that small areas of the mucosa can be examined in detail, but the procedure then becomes cumbersome because the tip of the endoscope has to be moved repeatedly to scan the entire circumference of the bowel. Routine endoscopy is designed for rapid scanning of the mucosa
of the gastrointestinal tract, so wide-angle lenses are part of all endoscopy systems. Therefore, all endoscopes are compromises between resolution and field of view. A third factor that affects resolution is brightness: detail is difficult to discern in dark images, and resolution decreases.

COLOR

The principal determinant of mucosal color is the ability of the mucosa, or of an abnormality in the mucosa, to reflect some wavelengths of light and not others. In addition to reflecting light, some areas in the gut wall also transmit light through the wall to underlying structures, from which it is reflected. One example of light transmission through the bowel wall is the blue color seen in the region of the liver when it is viewed through the colon. The colonic lining in this situation acts as a filter, and the color visualized depends on the absorption and transmission of selected wavelengths through it.

The eye sees color by using receptors of three different sensitivities according to the wavelength of the light within the electromagnetic spectrum. Therefore, it is convenient to break the visible spectrum into three approximately equal portions: the red end (wavelengths longer than 630 nm), the green middle (wavelengths between 480 and 560 nm), and the blue end (wavelengths shorter than 480 nm). The eye can appreciate color only as a combination of these three colors, and therefore they are called primary colors. Adding red, green, and blue light produces the sensation of a particular color; these colors are called additive primaries. Another way to produce the same effect is to use filters that subtract light selectively from the visible spectrum by transmitting only a range of wavelengths. The three colors magenta, yellow, and cyan are special because each is composed of essentially two-thirds of the visible spectrum; that is, a third of the spectrum is missing.
When used as filters, these colors can each selectively subtract a third of the visible portion from white light; they are the subtractive primaries.

The brightness of the colors, and of the image as a whole, is an important aspect of endoscopic visualization. The visual impression of brightness of the image is also called luminosity. Color has three principal attributes: hue, saturation, and brightness. Hue is the property by which the eye distinguishes different parts of the spectrum; it distinguishes red from green and blue. Saturation measures how much color is present; colors appear whiter or grayer as saturation is reduced. Brightness, or lightness, is a measure of the brilliance of the color, which depends on its ability to reflect or transmit light. For example, two colors may have the same hue, like the yellows of lemon and banana, but one may reflect light more intensely, creating a more brilliant color sensation because of the difference in brightness.

When designing a video system for a special application such as endoscopy, it is essential to choose video instruments and configurations for detecting and quantifying color that are best suited to the human tissues and the body conditions under which they are used. The transmission
characteristics of fiber-optic instruments have been studied using standard illuminants of defined spectral composition (5). Fiber-optic instruments transmit red most efficiently, green less efficiently, and blue least efficiently. Fiber-optic and lens assemblies can distort the impression of color by selectively absorbing some wavelengths of light because of the transmission characteristics of the fiber bundles.

The perception of a particular color can be produced in three ways: (1) by viewing a monochromatic source that produces light of a single wavelength, for example, laser light; (2) via the subtractive method of producing color, such as that employed when pigments or dyes are used (when white light shines through stained glass, for example, the glass selectively transmits some wavelengths and absorbs others, giving rise to the sensation of color in the viewer's eye); and (3) via the additive method of producing color, which is used to generate color images on a television monitor, where stimulating red, green, and blue phosphors at different intensities produces a particular color.

The CCD, the key element of the electronic endoscope, is a small microelectronic device that converts an image into a sequence of electronic signals, which are transformed into an image on the monitor screen after appropriate processing. The CCD consists of thousands of minute photosensitive elements and converts light energy into an electrical charge proportional to the red, green, or blue component of the image. The CCD per se does not see color; each sensing element responds only to the brightness of the light that falls upon it, much like the rods in the retina. Therefore, special arrangements must be made for CCDs to visualize and reproduce color. The tristimulus system of human color vision is used as the basis for determining the color image. Two principal mechanisms are used to generate color in the image:

1. Sequential illumination: A rotating wheel that has three colored filters, red, green, and blue, is interposed between the light source and the mucosa. The filter wheel rotates rapidly, so that the mucosa is exposed to short, rapid bursts of red, green, and blue light. The mucosa selectively absorbs different wavelengths of each color of light, depending on its own color, and reflects the rest. Each of the thousands of sensing elements in the CCD senses the reflected light that falls upon it and produces an electrical charge that is sent to the video processor. The processor is a computer that records the information from the CCD for the red, green, and blue filters and transforms the electronic impulses into a message that is sent to the television monitor. The color wheel rotates at 30-50 revolutions per second, too fast for the eye to appreciate the red, green, and blue components of the image.

2. Color chip: In color-chip technology, each sensing element on the CCD is covered with a filter that is red, green, or blue, and white light shines continuously on the mucosa. This is achieved by mounting stationary filters on the surface of the CCD. When light is reflected selectively from the
mucosa, it passes through a filter before reaching an individual sensing element. The element then produces a charge proportional to the red, green, or blue component of the image, depending on the color of its filter. The processor reads and integrates the information from the CCD and transforms the electronic impulses into a message that is sent to the television monitor, as described before. In laboratory studies, the performance of the two types of systems was similar (4).

Measurement of Color and Resolution of an Endoscopy System

Colorimetry, the technique used to measure color in endoscopy, is based on the principle that any color can be defined by an appropriate combination of red, green, and blue. The color receptors (cones) in the human eye are sensitive to these colors, so image display devices such as monitors are designed so that the color of each pixel is determined by the intensities of the red, green, and blue dots that constitute it. Enough diagnostic information is encapsulated in the red, green, and blue values measured by a CCD to make the use of a video system worthwhile. Photoelectric colorimeters consist of three photoelectric cells covered by appropriate filters; they convert light into electrical impulses that are amplified and can be converted into numerical units for comparison to reference colors of known values.

Spectrophotometry is a more accurate but much less convenient color measurement method (6,7), which has been used to characterize the transmission characteristics of fiber-optic instruments and to quantify blood flow in gastrointestinal tissues. Spectrophotometers produce a quantitative graph of the intensity of each wavelength of light across the visible range.

Resolution is measured using standard test charts of alternating black and white lines. It has been shown that the resolution of electronic endoscopes surpasses that of fiberscopes (4).
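The hue, saturation, and brightness attributes described earlier and the red, green, and blue values measured by a CCD are related by a standard conversion. A small sketch using Python's standard colorsys module follows; the two sample colors are arbitrary stand-ins for the lemon/banana comparison, equal in hue but different in brightness.

```python
import colorsys

# Two RGB colors with the same hue and saturation but different brightness,
# like the lemon/banana comparison: only the value (brightness) differs.
bright_yellow = (1.0, 1.0, 0.0)
dim_yellow = (0.6, 0.6, 0.0)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright_yellow)
h2, s2, v2 = colorsys.rgb_to_hsv(*dim_yellow)

print(f"hue:        {h1:.3f} vs {h2:.3f}")
print(f"saturation: {s1:.3f} vs {s2:.3f}")
print(f"brightness: {v1:.3f} vs {v2:.3f}")
```

The two colors decompose to identical hue and saturation; only the value component, corresponding to brightness, differs.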
IMAGE PROCESSING

Both computer and television monitors create the illusion of continuous motion by rapidly displaying a number of images (called frames) each second. The eye cannot distinguish them as separate, and thereby an impression of continuity is created. Each frame is made of thin horizontal lines on the picture tube. The television monitor scans at rates and in sequences different from those of computer monitors, and this creates compatibility problems. A computer monitor, for example, scans lines from the top of the screen to the bottom, whereas television monitors display all odd-numbered lines on the screen and then all even-numbered lines. The time for developing a frame also differs between the two systems. Therefore, the signal from an electronic endoscope must be modified to suit the monitor.

The image processor converts the images captured by the video devices into numbers in a process called digitization. The processed numbers are then used to create an image. Video capture devices that can digitize sufficiently quickly to
approximate real time are called frame grabbers. The digitization process includes color information; this is achieved by matching colors in the image to colors in the computer's memory (via a color look-up table). Computers can display varying numbers of colors depending on the amount of memory attached to each pixel in the image: 256 different colors can be displayed from eight bits of information, several thousand colors from 16 bits, and 16 million colors from 24 bits. Because the range of colors in the gastrointestinal tract is limited, it is uncertain whether images of higher color resolution are necessary for diagnosis. In one study, lesions imaged in 8-, 16-, and 24-bit color were displayed to endoscopists in random order on a monitor (8). The endoscopists could not distinguish among the images in 41% of the cases; images were identified correctly in 22% of the cases and incorrectly in 37% of the cases.

Image Processing in Endoscopy

Image processing has been introduced to the field of endoscopy but remains an investigational tool without general clinical applicability. The principal applications of image processing in endoscopy have been quantifying the characteristics of lesions or organs to provide diagnostic or prognostic information. Image analysis has been used successfully in other endoscopic disciplines to provide objective information that may aid in diagnosis. In gynecology, for example, preliminary studies have shown that computerized colposcopy may be useful in managing cervical dysplasia, a precursor of cancer (9). There are two major applications for image processing in endoscopy: (1) more accurate measurement of lesion dimensions and (2) characterization of tissue.
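The color counts quoted above follow directly from the bits of memory attached to each pixel, since each pixel's binary value selects one of 2^bits entries. A trivial sketch:

```python
# Displayable colors as a function of bits of memory per pixel.
def displayable_colors(bits_per_pixel):
    """Each pixel's binary value selects one of 2**bits look-up-table entries."""
    return 2 ** bits_per_pixel

for bits in (8, 16, 24):
    print(f"{bits}-bit color: {displayable_colors(bits):,} colors")
```

Eight bits give 256 colors and 24 bits give 16,777,216, the "16 million" figure cited in the text.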
Measurement of Dimensions in Endoscopy

In addition to determining the prognosis of lesions visualized at endoscopy, endoscopic measurement of lesion size is important in predicting pathological characteristics or the outcome of therapy. The risk of malignancy in colonic polyps, for example, increases when they are larger than 1 cm in diameter (10). Size may also affect treatment decisions for malignant lesions. For example, rectal cancers smaller than 3 cm in diameter are resected locally, and the recurrence rate is low; larger lesions are associated with extrarectal spread, and the results of local resection are poorer. Therefore, the size of a lesion may help direct the surgical approach.

Problems with Lesion Measurement in Endoscopy

It has been shown that endoscopists' visual estimates of ulcer size are often inaccurate (11-13). The cause of this error is the distortion introduced by the wide-angle lens that is part of the design of all endoscopes. This distortion causes a uniform grid to appear barrel shaped, so that points in the periphery of the field appear compressed relative to those in the center of the image. As a result of this barrel distortion, lesions appear smaller when they are farther from the center
of the endoscopic field. The ability of endoscopists to estimate ulcer size was assessed using a model ulcer visualized by an electronic endoscope (13). The study demonstrated that there was no correlation between the true size of the ulcer and its size as estimated by the endoscopists. The error increased as the size of the lesion increased because the effects of barrel distortion were accentuated.

Several attempts have been made to correct the measurement error caused by the wide-angle lens. Early studies (14,15) used grids built into the lens system of the endoscope but had high error rates (11 ± 6%). Another method used devices of known size introduced into the endoscopic field; the size of the lesion was then compared to the size of the measuring device. Measuring devices have varied, but the most popular method is to compare the size of the lesion to the distance between the tips of the jaws of the alligator forceps used to obtain biopsy specimens. The underlying hypothesis is that the lesion and the measuring device will undergo equal distortion and magnification, so comparative measurements were expected to provide a more accurate determination of size.

For a measuring device of this nature to work, the device needs to be on or adjacent to the lesion, so that the lesion and the measuring device are equally distant from the lens. When the measuring device is not in the same plane as the lesion, an error results from unequal magnification; the magnitude of this error increases with increasing distance of the lesion from the endoscope tip. Another issue is that if the measuring device is smaller or larger than the lesion, unequal barrel distortion will cause incorrect measurements because barrel distortion is not uniform; compression of the image becomes more pronounced at the periphery of the endoscopic field than in the center.
For example, when lesions lie in the periphery of the field, comparative measurements become inaccurate because the measuring device tends to enter the field near the center, owing to the fixed location of the biopsy channel. Under these conditions, barrel distortion squeezes together the pixels in the image of the lesion at the periphery of the field, creating an apparent reduction in size, while the image of the measuring device remains minimally altered; the result is a systematic undermeasurement of ulcer size. In vitro (6,7) and in vivo (7) studies using open biopsy forceps as a measuring device of fixed size have shown significant underestimation of lesion size. Underestimation becomes more pronounced as the lesion grows larger than the open biopsy forceps, owing to unequal barrel distortion.

Dancygier et al. described a modification of the comparative measurement technique (16). Measurements are made by introducing a rubber plate of known area into the endoscopic field through the biopsy channel and placing it on a lesion such as an ulcer. This technique yielded accurate measurements that had a mean error of 4.2 ± 0.5%. Using a modified version of the rubber-plate technique, Okabe et al. found that the method is accurate as long as the distance of the lens from the target is greater than 4 cm (17). When the distance is shorter, techniques to correct barrel distortion are necessary to obtain accurate results.
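The geometric behavior discussed above can be sketched with a simple radial (division-model) barrel distortion. The model and the coefficient K are illustrative assumptions for this sketch, not a calibration of any real endoscope.

```python
# Sketch of barrel distortion and its correction in normalized image
# coordinates (center of distortion at the origin). The division model and
# coefficient K are illustrative assumptions, not endoscope calibration data.
K = 0.25

def distort(x, y):
    """Barrel distortion: points are pulled toward the image center."""
    f = 1.0 / (1.0 + K * (x * x + y * y))
    return x * f, y * f

def undistort(xd, yd, iterations=20):
    """Invert the distortion by fixed-point iteration on the true position."""
    x, y = xd, yd
    for _ in range(iterations):
        f = 1.0 + K * (x * x + y * y)
        x, y = xd * f, yd * f
    return x, y

# A peripheral point is compressed toward the center, then recovered.
xd, yd = distort(0.8, 0.6)
x, y = undistort(xd, yd)
print(f"distorted: ({xd:.3f}, {yd:.3f})  recovered: ({x:.3f}, {y:.3f})")
```

Because the compression factor grows with radius, a lesion at the field periphery shrinks more than a centrally placed measuring device, reproducing the systematic undermeasurement described in the text.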
Image Processing Techniques for Lesion Measurement. In this approach, a new image is created by applying a geometric correction factor to each pixel of the original image, shifting the position of the pixel relative to the center of distortion of the lens (18). The correction factor is determined by measuring how the endoscope distorts an array of points; an inverse transformation is then applied to the image. Once determined, this conversion factor is constant for that endoscope. After the conversion is accomplished, the number of pixels in the image that corresponds to a given distance (1 mm) is determined. By this simple method of converting pixels into millimeters, there is no need to introduce measuring devices into the endoscopic field. This process was used for a model ulcer of known size and also for a target of known size (coated chewing gum) swallowed by a patient before endoscopy (13). The procedure was then implemented as a computer program that can be run on a personal computer. Measurements using this technique showed an error of 1.8 ± 2.2% for the model ulcer and 2.8 ± 3.2% for the swallowed gum. The technique has the important advantage that, once the instrument is calibrated, the only measurement the endoscopist needs to make is the distance of the lesion from the endoscope, using a graduated probe. The technique is relatively inexpensive because it uses widely available personal computers.

Other image processing techniques have also been described. Hofstad et al. described colonic polyp measurement by photographing a polyp that had a measuring device next to it (19). The photographic images were processed, and the size of the polyp was determined. The polyp was subsequently removed, and its size and weight were obtained; the weight of the polyp correlated with the calculated area. Kim et al.
applied an image processing technique to Barrett's esophagus (a precancerous change in the esophagus) to determine the precise area of involvement (20). Photographs of the epithelium were converted into computer-generated maps of the areas involved by the metaplastic changes. The software that was developed corrects first for the distortion of the endoscope lens, then calculates the center of the endoscopic photograph, and finally unrolls the photograph into a planar image. A series of photographs is obtained at short distances from each other, and these are stacked together by the computer to calculate the affected area along the length of the esophagus. The area measured using this technique had an error rate of 5.2%, and there was little interobserver variability.

Finally, Yamagushi et al. described a technique using a diffraction grating and a laser (21). The system consists of an argon ion generator, a side-viewing endoscope that has a diffraction grating made of glass fiber, and a graphic processing system. A fiber grating was fitted at the end of the endoscope, and 1600 light spots produced by the diffracting system were projected on the gastric mucosa. By analyzing the deviation of the laser spots on the gastric lesion, the diameter, area, and depth of the lesion could be determined. For lesions 30 mm in diameter, the error of area measurement was 2.8 ± 1% and 3.7 ± 3% in vitro
and in vivo, respectively. For 5-mm lesions, the error of area measurement was 7.5 ± 5% and 6.5 ± 3% in vitro and in vivo, respectively. A clear disadvantage of this system is that the error increases significantly when the ulcer is smaller than 5 mm in diameter. Unfortunately, none of these image processing techniques is widely available because they are either cumbersome or expensive. Despite their inaccuracies, visual estimation and comparative measurements (without image processing) are the most frequently used methods for determining lesion size today.

Image Processing to Characterize Tissue. Tissue that appears abnormal at endoscopy does not by itself provide a pathological diagnosis. Often, biopsies must be obtained and processed, and a final diagnosis may be delayed by several days. A method that allowed a precise diagnosis at the time of endoscopy would therefore be very useful. A variety of techniques has been developed for characterizing tissue at the time of endoscopy; they are described here.

Laser-Induced Fluorescent Spectroscopy (LIF) and Elastic Scattering Spectroscopy. This is a technique in which low-power laser light (

d_k(x) > d_j(x),   ∀ x ∈ ω_k, and j = 1, 2, . . . , M, j ≠ k,

where d_k(x) and d_j(x) are, respectively, the evaluated values of the discriminant functions for patterns x in classes k and j.

Training the Nonparametric Decision Theoretic Classifier. The decision surfaces that partition the space may be linear or nonlinear. Many nonlinear cases can be handled by the method of piecewise linear or φ machines [see details in (8)]. In the last section of this article, we will introduce artificial neural networks for pattern classification; an artificial neural network is a powerful tool for solving nonlinear problems but has much higher computational complexity. Let us go back to the linear case. Training a system means finding the weight vector w from the a priori information contained in the training samples. We can do this either in the pattern space or in the weight space, but it is much more efficient and effective to work in the weight space. As described in the previous section, the weight space is an (n + 1)-dimensional space. For each prototype z_k^m, k = 1, 2, . . . , M, m = 1, 2, . . . , N_k (where M represents the number of categories (classes) and N_k represents the
Figure 17. Schematic diagram of a multi-spectral scanner. The accompanying chart of the optical spectrum spans the ultraviolet (extreme, far, and near, roughly 10-390 nm), visible light (violet to 455 nm, blue 455-492 nm, green 492-577 nm, yellow 577-597 nm, orange 597-622 nm, red 622-770 nm), and the infrared (near, medium, far, and extreme, 770 nm to about 10^6 nm).
number of prototypes belonging to category k), there is a hyperplane in the weight space on which

w^T z_k^m = 0.

Any weight vector w on the positive side of hyperplane m yields w^T z_k^m > 0 and correctly classifies z_k^m in class k. Any weight vector w on the negative side of this hyperplane yields w^T z_k^m < 0 and incorrectly classifies z_k^m in a class other than class k. Whenever any misclassification occurs, the w vector should be adjusted (or moved) to the positive side of the hyperplane to make w^T z_k^m greater than zero again.

Consider a simple two-class problem. The decision surface is represented by d(x) = d1(x) − d2(x). When w is on the positive side of the hyperplane and z1, z1 ∈ ω1, is presented to the system, the prototype z1 will be correctly classified, because w^T z1 > 0. When z2, z2 ∈ ω2, is presented to the system, w^T z2 should be less than zero. Suppose that there are N1 prototypes that belong to ω1 and N2 prototypes that belong to ω2. Then there are N = N1 + N2 hyperplanes in the weight space. A vector w can be found in such a region so that

w^T z_1^m > 0   ∀ z_1^m ∈ ω1,  m = 1, 2, . . . , N1,  and
w^T z_2^m < 0   ∀ z_2^m ∈ ω2,  m = 1, 2, . . . , N2.

This is the solution region for class ω1 in W space. That region lies on the positive sides of the N1 hyperplanes for class ω1 and on the negative sides of the N2 hyperplanes for class ω2. Figure 18 shows the solution region for three prototypes z_1^1, z_1^2, and z_1^3 (all of them belonging to ω1).

We know that the z_1^m are in ω1 and the z_2^m are in ω2. To start the training, w may be chosen arbitrarily. Let us present z_1^m to the classification system. If w is not on the positive side of the hyperplane of z_1^m, then w^T z_1^m must be less than 0, and we need to move the w vector to the positive side of the hyperplane for z_1^m. The most direct way of doing this is to move w in a direction perpendicular to the hyperplane (i.e., in the direction of z_1^m or −z_2^m). If w(k) and w(k + 1) are, respectively, the weight vectors at the kth and (k + 1)th correction steps, then the correction of w can
FEATURE RECOGNITION AND OBJECT CLASSIFICATION
procedure of weight vector adjustment based on the order of presentation of the prototypes shown in the figure.
w2
. fo
1
rz2
+ Solution region +
h.p
+
. h.p
for
Principal Component Analysis for Dimensionality Reduction
3
z1
w1 +
h.p
. fo
rz1
+
1
+
z1 1
Figure 18. Solution region for three prototypes z11 , z21 , and z31 of ω1 .
2
z1 c
e
+
+
d +
w
y = Ax
wTz2m < 0
z12
f
In previous sections, we have already discussed the problems that may arise in pattern classification in high-dimensional spaces. We have also mentioned that improvements can be achieved by mapping the data in pattern space into a feature space. Note that the feature space has a much lower dimensionality, yet it preserves most of the intrinsic information for classification. On this basis, let us introduce the technique of principal component analysis (8,11,13). The objective of principal component analysis is to derive a linear transformation that emphasizes the difference among pattern samples that belong to different categories. In other words, the objective of the principal component analysis is to define new coordinate axes in directions of high information content useful for classification. Let ml and Cl denote, respectively, the sample mean vector and the covariance matrix for the lth category (l = 1, 2, . . . , M). These two quantities can be computed from the training set. Now, our dimensionality reduction problem is to find a transformation matrix A. Two results are obtained by this transformation. First, an ndimensional observation vector, x = x1 , x2 , . . . , xn , will be transformed into a new vector yT = [y1 , y2 , . . . , yp ], whose dimensionality p is less than n, or
wTz1m > 0
Solution g region for ω1
z22
a
b Order of presentation of prototypes: z11, z21, z12, z22, z11, z21, z12, z22, z11, z21, z12, z22, z11, z21, z12, z22
where A is a p × n transformation matrix. This transformation is based on the statistical properties of the sample patterns in the vector representation and is commonly referred to as principal component analysis. Principal eigenvectors represent directions where the signal has maximum energy. Figure 20 shows a two-dimensional pattern space in the Cartesian coordinate system for a two-class classification problem. From the distribution of the pattern samples in Fig. 20, we can see that these pattern points cannot be effectively
Figure 19. Weight vector adjustment procedure for weight training a two-class classification system.
x2 y2
be formulated as w(k + 1) = w(k) +
363
y1
czm 1
if w
T
(k)zm 1
0
w(k + 1) = w(k)
if correctly classified
ω1
x1
During this training period, patterns are presented, one at a time, through all prototypes. A complete pass through all of these patterns is called an iteration. After an iteration, all of these patterns are presented again in the same or other sequence to carry on another iteration. This is repeated until no corrections are made through a complete iteration. Figure 19 shows the step-by-step
ω2
Figure 20. A two-dimensional pattern space and principal component axes.
364
FEATURE RECOGNITION AND OBJECT CLASSIFICATION
ω1
ω2
ω2 to
y
ω1
ω2
t0
ω1
The conventional way to handle this problem is to preprocess the data to reduce its dimensionality before applying a classification algorithm, as mentioned in the previous section. Fisher’s linear discriminant analysis (8,11,13) uses a linear projection of n-dimensional data onto a one-dimensional space (i.e., a line). It is hoped that the projections will be well separated in classes. In so doing, the classification problem becomes choosing a line that is oriented to maximize class separation and has the least amount of data crossover. Consider that an input vector x is projected on a line resulting in a scalar value y:
Figure 21. Selection of the first principal component axis for classifier design.
discriminated according to their distribution along either the x1 axis or the x2 axis alone; there is an overlapping region that produces classification error. If we rotate the x1 and x2 axes to the positions shown as y1 and y2, then the distribution of these pattern points is well represented for discrimination. Component axis y1 is ranked first by its ability to distinguish between classes ω1 and ω2 (i.e., it has the smallest error in the projected space) and is called the first principal component axis. Along the axis y2, the projections of the sample patterns of the two classes have a large overlapping region; this principal component axis is not effective for classification. Figure 21 shows an example of a two-class problem (ω1 and ω2) in which the distributions of the sample patterns are projected onto two vectors, y and y′, as shown. A threshold t0 is chosen to discriminate between the two classes. The error probability for each projection is indicated by the cross-hatched region in the distributions. From the figure, we see that the error from projection onto the vector y is smaller than that onto the vector y′; the component y is therefore ranked first by its ability to distinguish between classes ω1 and ω2. For the linear classifier design, we select y and t0 to give the smallest error in the projection space. Data analysis by projecting pattern data onto the principal axes is very effective in dealing with high-dimensional data sets; therefore, the method of principal components is frequently used for the optimum design of a linear classifier. How many principal components are sufficient for a classification problem? That depends on the problem. Sometimes the number of components is fixed a priori at two or three, as in situations that require visualization of the feature space. In other situations, we may set a threshold value and drop the less important components whose associated eigenvalues (λ) are less than the threshold.
The procedure for finding the principal component axes can be found in (8).
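As a concrete sketch of this component-selection idea (hypothetical data, a generic eigendecomposition of the sample covariance matrix, and an assumed eigenvalue threshold of 0.5 — none of these specifics come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D samples elongated along one direction,
# like the point cloud of Fig. 20.
x = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.8], [0.0, 0.3]])

# Covariance of the data and its eigendecomposition.
cov = np.cov(x, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # rank axes by eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# y = A x, with the rows of A the principal eigenvectors; keeping only
# components whose eigenvalue exceeds a threshold reduces the
# dimensionality from n to p.
threshold = 0.5
A = eigvecs[:, eigvals > threshold].T
y = (x - x.mean(axis=0)) @ A.T

print(A.shape, y.shape)
```

Here the second eigenvalue falls below the threshold, so the p × n matrix A keeps only the first principal component axis and the 2-D data project to one dimension.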
    y = w^T x,

where w is a vector of adjustable weight parameters specifying the projection. This is actually a transformation from the data point set x into a labeled set in a one-dimensional space y. By adjusting the components of the weight vector w, we can select a projection that maximizes the class separation. Consider a two-class problem consisting of N pattern samples, N1 of which belong to class ω1 and N2 to class ω2. The mean vectors of these two classes are, respectively,

    m1 = (1/N1) Σ_{i=1}^{N1} x_i^1     for class ω1,

and

    m2 = (1/N2) Σ_{i=1}^{N2} x_i^2     for class ω2,
where the subscripts denote the classes and the superscripts denote the patterns in the class. The projection mean (m1) for class ω1 is a scalar given by

    m1 = (1/N1) Σ_{i=1}^{N1} y_i^1 = (1/N1) Σ_{i=1}^{N1} w^T x_i^1 = w^T [(1/N1) Σ_{i=1}^{N1} x_i^1] = w^T m1.
Similarly, the projection mean of class ω2 is a scalar given by

    m2 = (1/N2) Σ_{i=1}^{N2} y_i^2 = (1/N2) Σ_{i=1}^{N2} w^T x_i^2 = w^T [(1/N2) Σ_{i=1}^{N2} x_i^2] = w^T m2.
Optimum Classification Using Fisher's Discriminant

Multidimensional space, no doubt, provides us with more information for classification. However, the extremely large amount of data involved makes it more difficult for us to find the most appropriate hyperplane for classification.
Therefore, the difference between the means of the projected data is

    m2 − m1 = w^T (m2 − m1).
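A numerical sketch of the quantities just defined, on hypothetical Gaussian class samples. It also uses the closed form w ∝ S_w⁻¹(m2 − m1), the optimal direction that this section goes on to derive:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical two-class samples (N1 = N2 = 100) in 2-D.
x1 = rng.normal(loc=[0.0, 0.0], scale=0.8, size=(100, 2))  # class omega_1
x2 = rng.normal(loc=[3.0, 2.0], scale=0.8, size=(100, 2))  # class omega_2

m1, m2 = x1.mean(axis=0), x2.mean(axis=0)    # class mean vectors

# Total within-class covariance (scatter) matrix S_w.
Sw = (x1 - m1).T @ (x1 - m1) + (x2 - m2).T @ (x2 - m2)

# Fisher direction: w proportional to Sw^{-1} (m2 - m1).
w = np.linalg.solve(Sw, m2 - m1)

# Projected scalar means; their difference equals w.T (m2 - m1)
# by linearity of the projection.
proj_m1, proj_m2 = w @ m1, w @ m2
print(proj_m2 - proj_m1, w @ (m2 - m1))
```

Only the direction of w matters; any positive scaling leaves the ratio that defines Fisher's criterion unchanged.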
When data are projected into y, the class separation is reflected by the separation of the projected class means, and we might simply choose an appropriate w to maximize (m2 − m1). However, as shown in Fig. 22, we must also take into account the within-class spreads (scatter) of the data points (or the covariance of each class). Let us define the within-class scatter of the projection data as

    s1^2 = Σ_{y ∈ Y1} (y − m1)^2

and

    s2^2 = Σ_{y ∈ Y2} (y − m2)^2,

where Y1 and Y2 are, respectively, the projections of the pattern points from ω1 and ω2. Then, Fisher's criterion J(w) is

    J(w) = (squared difference of the projection means) / (total within-class scatter of the projection data)

or

    J(w) = (m2 − m1)^2 / (s1^2 + s2^2),

where s1^2 and s2^2 are, respectively, the within-class scatters of the projected data of each class; their sum gives the total within-class scatter for all of the projection data. Then, (m2 − m1)^2, the numerator of J(w), can be rewritten as

    (m2 − m1)^2 = w^T (m2 − m1)(m2 − m1)^T w = w^T S_B w,

where

    S_B = (m2 − m1)(m2 − m1)^T.

S_B is the between-class covariance matrix. Similarly, the denominator can be rewritten as

    s1^2 + s2^2 = w^T S_w w,

where

    S_w = Σ_{i=1, xi∈ω1}^{N1} (x_i − m1)(x_i − m1)^T + Σ_{i=1, xi∈ω2}^{N2} (x_i − m2)(x_i − m2)^T.

S_w is the total within-class covariance matrix. Hence,

    J(w) = (w^T S_B w) / (w^T S_w w).

J(w) is Fisher's criterion function to be maximized. Differentiating J(w) with respect to w and setting the result equal to zero, we obtain

    (w^T S_B w) S_w w = (w^T S_w w) S_B w.

We see that S_B w = (m2 − m1)(m2 − m1)^T w = k(m2 − m1) is always in the direction of (m2 − m1). Furthermore, we do not care about the magnitude of w, only its direction; thus, we can drop the bracketed scalar factors. Then,

    w ∝ S_w^−1 (m2 − m1).

Figure 22. Probability of error in a two-class problem and Fisher's linear discriminant. An arbitrary and the optimum decision boundary divide the space into regions R1 and R2; the cross-hatched areas are E1 = ∫_{R1} p(x | ω2)p(ω2) dx and E2 = ∫_{R2} p(x | ω1)p(ω1) dx.

Figure 22 shows the probability of error in a two-class problem using the Fisher discriminant. The probability of error that would be introduced by the schemes discussed previously is a problem of much concern. Let us take the two-class problem for illustration. The classifier divides the space into two regions, R1 and R2. The decision that x ∈ ω1 is made when the pattern x falls into the region R1, and x ∈ ω2 when x falls into R2. Under such circumstances, there are two possible types of error:

1. x falls in region R1, but actually x ∈ ω2. This gives the probability of error E1, which may be denoted by Prob(x ∈ R1, ω2).
2. x falls in region R2, but actually x ∈ ω1. This gives the probability of error E2, or Prob(x ∈ R2, ω1).

Thus, the total probability of error is

    P_error = Prob(x ∈ R1 | ω2) p(ω2) + Prob(x ∈ R2 | ω1) p(ω1)
            = ∫_{R1} p(x | ω2) p(ω2) dx + ∫_{R2} p(x | ω1) p(ω1) dx,
where p(ωi ) is the a priori probability of class ωi , and p(x | ωi ) is the likelihood function of class ωi or the class conditional probability density function of x, given that x ∈ ωi . More explicitly, it is the probability density function for x given that the state of nature is ωi , and p(ωi | x) is the probability that x comes from ωi . This is actually the a posteriori probability. Perror is the performance criterion that we try to minimize to give a good classification. These
two integrands are plotted in Fig. 22. Figure 22 also shows Fisher's linear discriminant, which gives the optimum decision boundary. For a more detailed treatment of this subject, refer to (8).

K-Nearest Neighbor Classification

What we have discussed so far has been supervised learning; that is, a supervisor teaches the system how to classify a known set of patterns, and the system is then left to classify other patterns on its own. In this section, we discuss unsupervised learning, in which the classification process does not depend on a priori information. As a matter of fact, quite frequently, much a priori knowledge about the patterns does not exist. The only information available is that patterns that belong to the same class share some properties in common. Therefore,
similar objects cluster together by their natural association according to some similarity measures. Euclidean distance, weighted Euclidean distance, Mahalanobis distance, and correlation between pattern vectors have been suggested as similarity measures. Many clustering algorithms have been developed. Interested readers may refer to books dedicated to pattern recognition (7,9,10). One of the algorithms (k nearest neighbor classification) (14–16) will be discussed here. Clusters may come in various forms (see Fig. 23). In previous sections, we have already discussed the classification of two clusters whose shape is shown in Fig. 23a. This is one of the most common forms that we encountered. However, many other forms such as those shown in Fig. 23b–f also appear, whose data sets have different density distributions. In some cases,
Figure 23. Patterns cluster in various forms.
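The k nearest neighbor decision rule discussed in this section — classify a pattern by majority vote among its k closest labeled samples — can be sketched as follows (the samples, labels, and query points are hypothetical; Euclidean distance is used as the similarity measure):

```python
import numpy as np
from collections import Counter

def knn_classify(x, samples, labels, k=3):
    """Classify x by a majority vote of its k nearest neighbors,
    using Euclidean distance as the similarity measure."""
    d = np.linalg.norm(samples - x, axis=1)   # distance to every sample
    nearest = np.argsort(d)[:k]               # indices of the k closest
    vote = Counter(labels[i] for i in nearest)
    return vote.most_common(1)[0][0]          # majority class

# Hypothetical labeled samples: class 1 near the origin, class 2 near (4, 4).
samples = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.7],
                    [4.0, 4.0], [4.5, 3.8], [3.8, 4.4]])
labels = np.array([1, 1, 1, 2, 2, 2])

print(knn_classify(np.array([0.3, 0.3]), samples, labels))  # → 1
print(knn_classify(np.array([4.2, 4.1]), samples, labels))  # → 2
```

Choosing k odd avoids ties in the two-class case; other similarity measures mentioned in the text (weighted Euclidean, Mahalanobis, correlation) would only change the distance computation.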
there is a neck or valley between subclusters, and in other cases, the data set clusters in a chain. In these cases, the boundary surface is not clear-cut, and patterns that belong to different classes may interleave. In this section and the sections that follow, we introduce algorithms that address these problems. Hierarchical clustering based on k nearest neighbors is discussed first. Hierarchical clustering, in general, refers to finding the cluster structure of unknown patterns. Classification based on nearest neighbors is a process of classifying a pattern as a member of the class to which its nearest neighbor (or neighbors) belongs. If the membership is decided by a majority vote of the k nearest neighbors, the procedure is called the k nearest neighbor decision rule (14–16). This addresses cases in which the distributions are not the same throughout the pattern space. The procedure consists of two stages. In the first stage, the data are pregrouped to obtain initial subclusters. In the second stage, the subclusters are merged hierarchically by using a certain similarity measure.

Pregrouping of Data to Form Subclusters. First, let us determine k appropriately. For every pattern point, one can always find its k nearest neighbors. At the same time, this pattern point may also be one of the k nearest neighbors of its k nearest neighbor. According to the argument in the previous paragraph, there are two choices: (1) assign this pattern point as a member of the class to which its nearest neighbor belongs, or (2) classify its k nearest neighbor as a member of the class to which this pattern point belongs. The choice between (1) and (2) will depend on whether this pattern point or its k nearest neighbor has the higher potential to be a subcluster center. For the Euclidean distance d(xi, xj) between sample points xi and xj, we can define Pk(xi) as the average of the distances
d(xi, xj) over k:

    P_k(x_i) = (1/k) Σ_{j ∈ Ω_k(x_i)} d(x_i, x_j),
where Ω_k(x_i) is the set of k nearest neighbors of the sample point x_i based on the Euclidean distance measure, and P_k(x_i) is a potential measure for the pattern point x_i to be the center of a subcluster. Obviously, the smaller the value of P_k(x_i), the higher the potential for the pattern point x_i to be the center of a subcluster. In other words, all k nearest neighbors (x_1, x_2, ..., x_k) of the pattern point x_i should be clustered toward the pattern point x_i (see Fig. 24); x_i is said to be k-adjacent to x_1, x_2, ..., x_k (k = 6 in that example). Figure 25 illustrates ξ_k(x_i), which is the set of sample points that are k-adjacent to the sample point x_i. From this figure, we see that x_i is a k nearest neighbor of x_a, or x_a is k-adjacent to x_i, and that x_i is a k nearest neighbor of x_b, or x_b is k-adjacent to x_i. Therefore, ξ_k(x_i) = [x_a, x_b, ...], which is the set of sample points that are k-adjacent to x_i. After knowing ξ_k(x_i), x_i and its group members can cluster toward x_a or x_b, depending on which one (x_a or x_b) has the higher potential to form a larger subcluster. Note that x_a, x_b, ... may or may not be the k nearest neighbors of x_i.

Figure 24. Definition of Ω_k(x_i). Ω_k(x_i) = [x_1, x_2, ..., x_k], k = 6 in this example; x_i is said to be k-adjacent to (x_1, x_2, ..., x_k), and P_k(x_i) is the smallest in value among P_k(x_j), j = 1, 2, ..., 6, and i.

Figure 25. Definition of ξ_k(x_i), which is the set of sample points k-adjacent to x_i. Note that x_a is k-adjacent to x_i; however, x_i may not be k-adjacent to x_a.

Now follow the described procedure to compute ξ_k for every sample point x. The next step in the process is to subordinate the sample point x_i to a sample point x_j that is k-adjacent to x_i and, at the same time, possesses the highest potential among all pattern points that are k-adjacent to x_i. Then,

    P_k(x_j) = min_{x_m ∈ ξ_k(x_i)} P_k(x_m),

where x_m is one of the pattern points that are k-adjacent to x_i and P_k(x_j) is the potential measure of pattern point x_j. This expression means that P_k(x_j) of the pattern point x_j has the smallest value (i.e., the highest potential) among the P_k's of those points that are k-adjacent to x_i. Then, we assign x_i to x_j, because x_j has the highest potential to be the center of a subcluster. If it is already known that x_j belongs to another subcluster, then we assign the point x_i and all of its members to the same subcluster to which x_j belongs. If P_k(x_i) = P_k(x_j), that means that no points in ξ_k(x_i) have a potential higher than P_k(x_i), and x_j is the only point whose potential P_k(x_j) is equal to P_k(x_i); then, the pattern point x_i subordinates to no point, because the possible center of the subcluster is still itself. So far, all of the pattern points x_i, i = 1, 2, ..., N, can be grouped to form subclusters. Next, let us check whether some of these subclusters can further be grouped together. From Fig. 25, ξ_k(x_i) = (x_a, x_b, ...). Whether the subcluster whose center is x_i should subordinate to x_a or to x_b will depend on the strengths of P_k(x_a) and P_k(x_b). Subordinate x_i and its group members to x_a when x_a has the highest potential [i.e., when P_k(x_a) has the smallest value among the others in ξ_k(x_i)] to be the center of a new, larger subcluster. After the pattern points are grouped in this way, one will find that several subclusters have formed. Let us count the subclusters and assign every pattern point to its nearest subcluster.

Merging of Subclusters. Subclusters obtained after pregrouping may exhibit two undesired forms: (1) the boundary between two neighboring subclusters may be unclear, compared to the regions near the subcluster centers (see Fig. 26a); and (2) sometimes there may be a neck between the two subclusters (see Fig. 26b). What we expect is to merge the two subclusters in (1) and leave the two subclusters as they are in (2). An algorithm to differentiate these cases follows. Let us define P_k^sc(m) and P_k^sc(n) (see Fig. 27) as the potentials, respectively, for subcluster m and subcluster n. They are expressed as

    P_k^sc(m) = (1 / N_m[Ω_k(x_m) ∩ W_m]) Σ_{i ∈ [Ω_k(x_m) ∩ W_m]} P_k(x_i) = (1/N_m) Σ_{i=1}^{N_m} P_k(x_i),

and

    P_k^sc(n) = (1 / N_n[Ω_k(x_n) ∩ W_n]) Σ_{j ∈ [Ω_k(x_n) ∩ W_n]} P_k(x_j) = (1/N_n) Σ_{j=1}^{N_n} P_k(x_j),

Figure 26. Two different forms of pattern distribution.

where Ω_k(x_m) is the set of k nearest neighbor pattern points of the center x_m of subcluster m, and W_m is the set of pattern points contained in subcluster m; the number of these pattern points may be (and usually is) greater than k. N_m is the number of points that are the k nearest neighbors of x_m and are also in subcluster m. Therefore, P_k^sc(m) is the average of P_k(x_i) over all of the points that are in subcluster m and, at the same time, are the k nearest neighbors of the subcluster center x_m. When more points in W_m are k nearest neighbor points of the cluster center x_m, the subcluster m has a higher density; we can therefore use this as a tightness measure for subcluster m. The tightness measure is a good measure for subcluster merging considerations. Similar definitions can be made for Ω_k(x_n), x_n, W_n, and P_k^sc(n). Next, let us define a few more parameters, Y_k^{m,n}, Y_k^{n,m}, BP_k^{m,n}, and BP_k^{n,m}, so that we can establish a condition for merging two subclusters. Y_k^{m,n} is defined as the set of pattern points x in subcluster m, each of which is a k nearest neighbor of some pattern point in subcluster n. That is to say, Y_k^{m,n} represents the set of pattern points x that are in subcluster m and whose k-adjacent points lie in subcluster n. Take a couple of sample pattern points x_i and x_j from Fig. 28 as examples for illustration: x_i is a pattern point in subcluster m and, at the same time, is a k nearest neighbor of x_a, which is in subcluster n. Similarly, x_j is a pattern point in subcluster m and, at the same time, is a k nearest neighbor of x_b, which is in subcluster n. Then, Y_k^{m,n}
= [x | x ∈ W_m and ξ_k(x) ∩ W_n ≠ ∅],

where W_m and W_n represent, respectively, the sets of pattern points contained in subclusters m and n, and ξ_k(x) represents the set of points k-adjacent to the point x, as defined previously. BP_k^{m,n} can be defined as

    BP_k^{m,n} = (1 / N(Y_k^{m,n})) Σ_{x_i ∈ Y_k^{m,n}} P_k(x_i),

where N(Y_k^{m,n}) is the number of pattern points in the set represented by Y_k^{m,n} and P_k(x_i) is the potential measure of x_i, as defined previously. Then, BP_k^{m,n} is the average of P_k(x_i) over all of the pattern points x that are in Y_k^{m,n}. Obviously, these x's are only part of the pattern points in subcluster m. Similar definitions are made for Y_k^{n,m} and BP_k^{n,m}:

    Y_k^{n,m} = [x | x ∈ W_n and ξ_k(x) ∩ W_m ≠ ∅],

and

    BP_k^{n,m} = (1 / N(Y_k^{n,m})) Σ_{x_i ∈ Y_k^{n,m}} P_k(x_i).

Figure 28. Definitions for Y_k^{m,n} and BP_k^{m,n}.
BP_k^{m,n} and BP_k^{n,m} provide tightness information on the boundary between the two subclusters m and n. Let us take the ratio of the tightness measure of the boundary points to the tightness measure of the subclusters (i.e., tightness of BP_k^{m,n} / tightness of subcluster) as a measure in considering the possible merging of the two subclusters. Naturally, we would like the boundary points to be more densely distributed in order to merge the two subclusters, that is, to have a smaller value of BP_k^{m,n}, because BP_k^{m,n} is computed in terms of Euclidean distance in pattern space: the smaller the computed value of BP_k^{m,n}, the tighter the boundary points. A similar argument applies to the tightness of the subclusters: the smaller the computed value of P_k^sc, the denser the distribution of the subcluster. Then, we choose a similarity measure (SM_1) that is proportional to the ratio of P_k^sc to BP_k^{m,n},

    SM_1(m, n) ∝ P_k^sc / BP_k^{m,n},

or

    SM_1(m, n) = min[P_k^sc(m), P_k^sc(n)] / max[BP_k^{m,n}, BP_k^{n,m}].

Figure 27. Definitions for P_k^sc(m) and P_k^sc(n).

Both P_k^sc and BP_k^{m,n} are expressed in terms of distance in the pattern space. To play it safe, we use the max function in the denominator and the min function in the numerator. SM_1(m, n) represents the difference in the tightness measure between the subclusters and the
boundary and can be used to detect the valley between two subclusters. To detect the neck between two subclusters, we use the similarity measure SM_2, defined as

    SM_2(m, n) = [N(Y_k^{m,n}) + N(Y_k^{n,m})] / (2 min[N(W_m), N(W_n)]),
where N(Y_k^{m,n}) and N(Y_k^{n,m}) represent, respectively, the numbers of points in Y_k^{m,n} and Y_k^{n,m}, as defined earlier; N(W_m) denotes the number of points in subcluster m, and N(W_n) the number of points in subcluster n. A large value of SM_2(m, n) signifies that the relative size of the boundary is large in comparison with the sizes of the subclusters; the similarity measure SM_2 can therefore be used to detect the "neck." Combining SM_1(m, n) and SM_2(m, n),
    SM(m, n) = SM_1(m, n) × SM_2(m, n).

The similarity measure SM(m, n) can be used to merge the two most similar subclusters. Figure 29 gives some results for merging two subclusters that have a neck (upper) and a valley (lower).

Algorithm Based on Regions of Influence

Most of the approaches discussed previously were based on a distance measure, which is effective in many applications. But difficulties occur if this simple distance measure is employed for clustering certain types of data sets, such as those in studies of galaxies and constellations. These data sets frequently appear with a change in point density or with chain clusters within a set, as shown, respectively, in Fig. 30a,b, where one may
Figure 30. Examples of special types of data sets that have (a) change in point density; (b) chain clusters.
identify the clusters visually. For data like these, some other method is needed for clustering them. The method discussed in this section is based on the limited neighborhood concept (17), which originated from the visual perceptual model of clusters. Several definitions are given first. Let
    S = [S_1, S_2, ..., S_M]

and

    ℛ = [R_1, R_2, ..., R_M],

Figure 29. Results obtained in merging two subclusters that have a neck and a valley: (a) two subclusters that have a neck between them; (b) two subclusters that have a valley between them.
where S_l and R_l, l = 1, 2, ..., M, represent, respectively, the graphs and the regions of influence, and (x_i, x_j) represents a graph edge that joins points x_i and x_j. To illustrate the region of influence, two graphs are defined: the Gabriel graph and the relative neighborhood graph. The Gabriel graph (GG) is defined in terms of circular regions. The line segment (x_i, x_j) is included as an edge of the GG if no other point x_k lies within or on the boundary of the circle that has (x_i, x_j) as its diameter, as shown in Fig. 31a; otherwise, (x_i, x_j) is not included as an edge segment of the GG. Similarly, the relative neighborhood graph (RNG) is defined in terms of a lune region. The line segment (x_i, x_j) is included as an edge of the RNG if no other point x_k lies within or on the boundary of the lune, where x_i and x_j are the two points on the circular arcs of the lune.

Figure 31. The shapes of regions defined by (a) Gabriel graph; (b) relative neighborhood graph.

Then, S_l and R_l can be defined as

    (x_i, x_j) ∈ S_l   iff   x_k ∉ R_l(x_i, x_j),  ∀ k = 1, 2, ..., n;  k ≠ i ≠ j,

and

    R_l(x_i, x_j) = {x : f[d(x, x_i), d(x, x_j)] < d(x_i, x_j); i ≠ j},

where f[d(x, x_i), d(x, x_j)] is an appropriately defined function. From these definitions, it can be seen that S_l defines a limited neighborhood set. When max[d(x, x_i), d(x, x_j)] is chosen for the function f[d(x, x_i), d(x, x_j)] in the previous equation (i.e., find the maximum of d(x, x_i) and d(x, x_j) and use it for the f function), we obtain

    R_RNG(x_i, x_j) = {x : max[d(x, x_i), d(x, x_j)] < d(x_i, x_j); i ≠ j},

where R_RNG(x_i, x_j) represents the RNG region of influence. It can be seen in Fig. 31b that the upper arc of the lune is drawn with x_j as the center and d(x_i, x_j) as the radius, and the lower arc is drawn with x_i as the center and d(x_i, x_j) as the radius. When d²(x, x_i) + d²(x, x_j) is used for f[d(x, x_i), d(x, x_j)],

    R_GG(x_i, x_j) = {x : d²(x, x_i) + d²(x, x_j) < d²(x_i, x_j); i ≠ j},

where R_GG(x_i, x_j) represents the GG region of influence. The definition of R_l determines the property of S_l. If R_l ⊆ R_GG, the edges of S_l will not intersect; but if R_l ⊃ R_GG, intersecting edges are allowed. Take an example to illustrate the case when R_l ⊃ R_GG. Assume that the regions of influence are defined as

    R_1(x_i, x_j, β) = R_GG(x_i, x_j) ∪ {x : β min[d(x, x_i), d(x, x_j)] < d(x_i, x_j), i ≠ j},

    R_2(x_i, x_j, β) = R_RNG(x_i, x_j) ∪ {x : β min[d(x, x_i), d(x, x_j)] < d(x_i, x_j), i ≠ j},

where β (0 < β < 1) is a factor called the relative edge consistency. Thus, S_1(β) is obtained from the GG by removing edges (x_i, x_j) if

    β min[d(x_i, x_a), d(x_j, x_b)] < d(x_i, x_j),
where x_a (≠ x_j) denotes the nearest Gabriel neighbor to x_i, and x_b (≠ x_i) denotes the nearest Gabriel neighbor to x_j. Figure 32 illustrates the effect of β on the region of influence. It is then clear that varying β controls the fragmentation of the data set and hence gives a sequence of nested clusterings; increasing β breaks the data set into a greater number of smaller clusters. The examples of two-dimensional dot patterns shown in Fig. 33 demonstrate the effectiveness of this clustering method. See (17) for supplementary reading.
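The two regions of influence translate directly into edge-membership tests. A minimal sketch on hypothetical points (only the GG and RNG tests from the text; the β extension would add the extra region check in the same way):

```python
import numpy as np

def gg_edge(xi, xj, points):
    """(xi, xj) is a Gabriel graph edge iff no other point lies in the
    circle with (xi, xj) as diameter: d^2(x,xi) + d^2(x,xj) < d^2(xi,xj)."""
    d2_edge = np.sum((xi - xj) ** 2)
    for x in points:
        if np.array_equal(x, xi) or np.array_equal(x, xj):
            continue
        if np.sum((x - xi) ** 2) + np.sum((x - xj) ** 2) < d2_edge:
            return False
    return True

def rng_edge(xi, xj, points):
    """(xi, xj) is an RNG edge iff no other point lies in the lune:
    max[d(x,xi), d(x,xj)] < d(xi,xj)."""
    d_edge = np.linalg.norm(xi - xj)
    for x in points:
        if np.array_equal(x, xi) or np.array_equal(x, xj):
            continue
        if max(np.linalg.norm(x - xi), np.linalg.norm(x - xj)) < d_edge:
            return False
    return True

# Hypothetical configuration: the third point lies outside the diameter
# circle of (xi, xj) but inside the (larger) lune.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.2]])
xi, xj = pts[0], pts[1]
print(gg_edge(xi, xj, pts), rng_edge(xi, xj, pts))  # → True False
```

Because the diameter circle is contained in the lune, the RNG test rejects every edge the GG test rejects (and possibly more), so the RNG is a subgraph of the GG.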
MULTILAYER PERCEPTRON FOR PATTERN RECOGNITION

The previous sections presented relatively simple pattern recognition problems. For problems in real-world environments, a much more powerful solution method is desired. This motivates us to seek help from artificial neural networks. About 10 years ago, there were two artificial neural network hardware products on the market: one, called ETANN and manufactured by Intel, had 64 neurons; the other, called "neural logic," was manufactured by Neural Logic Inc. Two functions of artificial neural networks are attractive: (1) the associative property, the ability to recall; and (2) self-organization, the ability to learn through organizing and reorganizing in response to external stimuli. Such human-like performance will, no doubt, require an enormous amount of processing. To obtain the required processing capability, an effective approach needs to be developed for the dense interconnection of a large number of simple processing elements, an effective scheme for achieving a high computation rate is required, and some hypotheses need to be established. The multilayer perceptron is one of the models developed under the name "neural networks" and is popular in applications. Many other models have been proposed; due to space limitations, let us discuss only the multilayer perceptron. There are many books on artificial neural networks, and readers interested in this subject can find them in a library (12–13,18–23). The powerful capability of the multilayer perceptron comes from the following network arrangements: (1) one or more layers of hidden neurons are used; (2) a smooth nonlinearity (for example, a sigmoidal nonlinearity) is employed at the output end of each artificial neuron; and (3) there is a high degree of connectivity in the
Figure 32. Drawing illustrating the effect of β.
network. These three distinctive characteristics enable a multilayer perceptron to learn complex tasks by extracting more meaningful features from the input patterns. Naturally, training such a system is much more complicated. Figure 34 shows a three-layer perceptron that has N inputs and M outputs. Between the inputs and outputs are two hidden layers. Let y_l, l = 1, 2, ..., M, be the outputs of the multilayer perceptron, and x′_j and x″_k the outputs of the nodes in the first and second hidden layers. θ_j, θ_k, and θ_l are the internal thresholds (not shown in the figure). W_ji, i = 1, 2, ..., N, j = 1, 2, ..., N_1, are the connection weights from the input to the first hidden layer. Similarly, W_kj, j = 1, 2, ..., N_1, k = 1, 2, ..., N_2, and W_lk, k = 1, 2, ..., N_2, l = 1, 2, ..., M, are, respectively, the connection weights between the first and second hidden layers and between the second hidden layer and the output layer. They are to be adjusted during training. The outputs of the first hidden layer are computed according to

    x′_j = f( Σ_{i=1}^{N} W_ji x_i − θ_j ),   j = 1, 2, ..., N_1.

Those of the second hidden layer are computed from

    x″_k = f( Σ_{j=1}^{N_1} W_kj x′_j − θ_k ),   k = 1, 2, ..., N_2,

and the outputs of the output layer are

    y_l = f( Σ_{k=1}^{N_2} W_lk x″_k − θ_l ),   l = 1, 2, ..., M.
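The three layer equations above are just three matrix–vector products passed through f. A minimal forward-pass sketch with randomly initialized (untrained) weights, the sigmoid choice of f, and assumed layer sizes:

```python
import numpy as np

def f(a):
    """Sigmoid logistic nonlinearity; the threshold theta is subtracted
    from the activation before this is applied, giving 1/[1 + e^-(a - theta)]."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(2)
N, N1, N2, M = 4, 5, 3, 2          # assumed input, hidden, hidden, output sizes

# Connection weights and internal thresholds; small random initial
# weights, as the backpropagation procedure prescribes before training.
W_ji, theta_j = rng.normal(scale=0.1, size=(N1, N)), np.zeros(N1)
W_kj, theta_k = rng.normal(scale=0.1, size=(N2, N1)), np.zeros(N2)
W_lk, theta_l = rng.normal(scale=0.1, size=(M, N2)), np.zeros(M)

x = rng.normal(size=N)             # one input pattern
xp = f(W_ji @ x - theta_j)         # first hidden layer,  x'_j
xpp = f(W_kj @ xp - theta_k)       # second hidden layer, x''_k
y = f(W_lk @ xpp - theta_l)        # outputs, y_l

decision = int(np.argmax(y))       # select the largest output node
print(y.shape, decision)
```

The final line implements the decision rule described next: the class assigned is the one whose output node has the largest value.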
The decision rule is to select the class that corresponds to the output node that has the largest value. The function f can be a hard limiter, a threshold logic, or a sigmoid logistic nonlinearity, which is 1/[1 + e^−(α−θ)].

Training the Multilayer Perceptron: Backpropagation Algorithm. The backpropagation algorithm is a generalization of the least-mean-square (LMS) algorithm. It uses an iterative gradient technique to minimize the mean-square error between the desired output and the actual output of a multilayer feedforward perceptron. The training procedure is initialized by selecting small random weights and internal thresholds. The training data are repeatedly presented to the net, and the weights are adjusted in the order from W_lk to W_ji, in the backward direction, after each trial until they stabilize. At the same time, the cost function (the mean-square error mentioned before) is reduced to an acceptable value. Essentially, the training procedure is as follows. The feedforward network calculates the difference between the actual outputs and the desired outputs. Using this error assessment, weights are adjusted in proportion to the local error gradient. To do this for an individual neuron, we need the values of its input, its output, and the desired output. This is straightforward for a single-layer perceptron, but for a hidden neuron in a multilayer perceptron, it is difficult to know how much input is coming into that node and what its desired output is. To solve this problem, we may consider the synapses backward to see how strong each particular synapse is: the local error gradient produced by each hidden neuron is a weighted sum of the local error gradients of the neurons in the successive layer. The whole training process involves two phases, a forward phase and a backward phase. In the forward phase, the error is computed, and in the backward phase, we proceed to
Figure 33. Effect of changing β on the clustering results: (a) data set; (b) when β is small; (c) when β is larger.
modify the weights to decrease the error. This two-phase sequence runs through for every training pattern until all training patterns are correctly classified. It can be seen that a huge amount of computation is needed for training. For details of the development of the backpropagation training algorithm, see (8,12,13,18–19,23).

Figure 34. Three-layer perceptron.

ABBREVIATIONS AND ACRONYMS

AI    artificial intelligence
CPU   central processing unit
MSS   multi-spectral scanner
SAR   synthetic aperture radar
k-NN  k-nearest neighbor
SM    similarity measure
RNG   relative neighborhood graph
LMS   least mean square
BIBLIOGRAPHY

1. S. T. Bow, R. T. Yu, and B. Zhou, in V. Cappellini and R. Marconi, eds., Advances in Image Processing and Pattern Recognition, North-Holland, Amsterdam, New York, 1986, pp. 21–25.
2. B. Yu and B. Yuan, Pattern Recognition 26(6), 883–889 (1993).
3. O. D. Trier and A. K. Jain, Pattern Recognition 29(4), 641–661 (1996).
4. S. Ghosal and R. Mehrotra, IEEE Trans. Image Process. 6(6), 781–794 (1997).
5. J. R. Ullmann, Pattern Recognition Techniques, Crane, Russak, New York, 1973.
6. K. S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1982.
7. S. T. Bow, Pattern Recognition and Image Preprocessing, Marcel Dekker, New York, 1992.
8. S. T. Bow, Pattern Recognition and Image Preprocessing, revised and expanded, Marcel Dekker, New York, 2001.
9. S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, Boston, 1999.
10. J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
11. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, Boston, 1990.
12. R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, Wiley, New York, 1992.
13. C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
14. R. Mizoguchi and O. Kakusho, Proc. Int. Conf. Pattern Recognition, Japan, 1978, pp. 314–319.
15. A. Djouadi and E. Bouktache, IEEE Trans. Pattern Anal. Mach. Intell. 19(3), 277–282 (1997).
16. T. Hastie and R. Tibshirani, IEEE Trans. Pattern Anal. Mach. Intell. 18(6), 607–616 (1996).
17. R. Urquhart, Pattern Recognition 15(3), 173–187 (1982).
18. A. Pandya and R. Macy, Pattern Recognition with Neural Networks in C++, CRC Press, Boca Raton, FL, 1995.
19. L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1994.
20. S. I. Gallant, IEEE Trans. Neural Networks 1(2), 179–191 (1990).
21. E. M. Johansson, F. U. Dowla, and D. M. Goodman, Int. J. Neural Syst. 2(4), 291–301 (1992).
22. V. Nedeljkovic, IEEE Trans. Neural Networks 4(4), 650–659 (1993).
23. J. M. Zurada, Introduction to Artificial Neural Systems, West Publishing Co., St. Paul, MN, 1992.
FIELD EMISSION DISPLAY PANELS ROBERT J. HANSON Micron Technology, Inc. Boise, ID
DAVID R. HETZER Timber Technologies, Inc. Fremont, CA
DUANE F. RICHTER Metron Technology, Inc. Boise, ID
SCOTT W. STRAKA Seattle, WA
INTRODUCTION

The flat panel display (FPD) market grew significantly throughout the 1990s, and the liquid crystal display (LCD)
had the majority of the market share. However, the world monitor market has been, and still is, dominated by the cathode-ray tube (CRT) (1–3). A relatively new kind of FPD, called a field emission display (FED), combines the wide viewing angle and brightness of the CRT with the linearity and thinness of the LCD. FEDs exhibit characteristics that allow them to compete directly with both the CRT and the LCD. FEDs are smaller and lighter than CRTs and consume less power. They can also overcome the fundamental performance shortcomings of LCDs, such as a limited operating temperature range, a limited viewing angle, and a slow response to fast-motion video. FEDs may be used in desktop applications, original equipment manufacturers' (OEM) systems, and medical equipment, where the CRT has dominated for many years. They may also be used in markets such as laptop computers and camcorders that are traditionally dominated by LCDs. It is projected that the world FPD market will reach $35 billion in 2005 (4). As improvements are made by research and development and manufacturing costs are reduced, FEDs have the potential to account for a substantial percentage of the world FPD and desktop markets. It is estimated that the high-volume manufacturing costs of FEDs will be approximately 30% less than those of active matrix LCDs (AMLCDs), due to fewer process steps and because FEDs do not require a backlight, polarizers, or color filters. The major components of a field emission display are the cathode panel (also referred to as a backplate or baseplate), the emitter tips, the anode panel (also referred to as a faceplate, phosphor screen, or cathodoluminescent screen), the spacers, and the electronic drivers (external circuitry).
The emitter tips on the cathode panel provide the electron beams, the anode panel contains the phosphor that gives off light, the spacers prevent the anode and cathode panels from imploding into the vacuum space between them, and the drivers are responsible for controlling the images displayed. Like CRTs, FEDs are cathodoluminescent (CL) devices that involve the emission of electrons from a cathode and the acceleration of the electrons toward an anode, where they impinge on phosphors to generate light. In a CRT, one to three electron beams are rastered across the screen, which requires enough depth to deflect the beams fully. A FED, on the other hand, has millions of individual electron sources directly behind each pixel, eliminating the need to raster a beam; therefore, the display can be very thin as well as flat. The cathode generally consists of millions of small cone-shaped electron emitters, and each picture element (pixel) has a few hundred to several thousand cones. Applying a potential difference between the cathodes and the gate generates cold-cathode electron emission. Figure 1 shows an array of cathode electron emitters whose tips generate electron beams. The electrons are accelerated to the anode, which is at a much higher potential than the gate. Some FED producers also incorporate a focusing grid between the gate and the anode to enhance the kinetic energy and trajectories of the
electrons toward the phosphors (the concept is the same as that of the focusing plate used in CRTs). Figure 1 also shows the anode panel, which contains the phosphor screen that gives off light when a phosphor is struck by electrons. The light generated by the phosphors is transmitted through the transparent glass substrate, and the end user sees the images that the light produces. FED fabrication involves processing the anode and cathode panels individually and incorporating standoffs, called spacers, on one of the panels. As a rule of thumb, FEDs whose diagonals are larger than approximately 10 cm require spacers to prevent the panels from imploding into the vacuum that separates them. Figure 1 shows how spacers are arranged in the matrix so that they do not block the path of electrons from the cathode to the anode and do not interfere with the optical performance of the display. Once both panels have been completely processed and spacers have been incorporated onto one of them, the panels are brought close together and hermetically sealed so that a vacuum is maintained in the space between them. The vacuum is obtained either by sealing the panels together in a vacuum chamber or by using a tubulated seal. A tubulated seal allows the display to be evacuated outside a chamber; the tube is pinched off when the desired pressure has been reached. The final steps to create a fully functional FED include attaching the external driving circuitry to the panels and testing the display. The testing process evaluates the optical and electrical functionality of the display to determine whether it meets specifications. After the display has been tested, it is assembled into a package that protects it from moisture and physical damage.

BACKGROUND

FEDs are based on the phenomena of cathodoluminescence and cold-cathode field emission. Cathodoluminescence is a particular type of luminescence in which a beam of electrons strikes a phosphor and light is given off.
Figure 1. Cross-sectional view of a FED, including the baseplate and cathodes, the faceplate with phosphor and transparent conductor, the spacers, and the external circuitry with the voltage sign convention.

Luminescence was first written about thousands of years ago in the Chinese Shih Ching (Book of Odes). In 1603, Cascariolo made the first recorded synthesis and observation of luminescence in solids by heating a mixture of barium sulfate and coal. The study of cathodoluminescence burgeoned in 1897 when Braun built the first oscilloscope, thereby inventing the CRT. The phenomenon of extracting electrons from cold cathodes by intense electric fields has been described
by an approximate theory developed by Schottky (5). The theoretical exposition and correlation with experimental results were greatly improved by Fowler and Nordheim in their now famous 1928 article (6). Their derivation led to what is now called the Fowler–Nordheim (F–N) equation and allows for the theoretical treatment of cold cathodes, thereby facilitating the characterization and optimization of cold-cathode electron emission devices. The early development of fabrication strategies for integrated circuit (IC) manufacturing in the 1950s and 1960s made it feasible to produce FED anodes and cathodes by the processing techniques used in thin-film technology and photolithography. This allows for a high density of cathodes per pixel array, process controllability, process cleanliness and purity, and cost-effective manufacture. The field emitter array (FEA) based on microfabricated field emission sources was introduced in 1961 in a paper by Shoulders (7). The concept of a full-video-rate color FED was first presented by Crost et al. in a U.S. patent applied for in 1967 (8). In 1968, Charles A. ‘‘Capp’’ Spindt at the Stanford Research Institute (SRI) had the idea of fabricating a flat display using microscopic molybdenum (Mo) cones as cold-cathode emitters of electrons to illuminate CRT-like phosphors on an anode/faceplate. His basic cathode idea became the focus of nearly all of the ensuing work on field emission displays (9). Since then, considerable effort has been put into research and development of FEDs at academic and industrial levels. Shortly thereafter, Ivor Brodie and others were funded by several U.S. Government agencies to continue development (10). In 1972, the SRI group had demonstrated to their satisfaction that it was feasible to manufacture FEDs (11).

Molybdenum Microtips Versus Silicon Emitters

A different approach to FEDs was to use a p–n junction in a semiconductor device as a ‘‘cold’’ electron source.
In 1969, Williams and Simon published a paper on generating field emission from forward-biased silicon p–n junctions. This work set off a series of experiments using silicon rather than molybdenum as a cold cathode. In 1981, Henry F. Gray and George J. Campisi of the Naval Research Laboratory (NRL) patented a method for fabricating an array of silicon microtips (12). This process involved thermal oxidation of silicon followed by patterning of the oxide and selective etching to form silicon tips. The silicon tips could act as cold cathodes for a FED.
In December 1986, at the International Electron Devices Meeting, Gray, Campisi, and Richard Greene demonstrated a silicon field emitter array (FEA) that was fabricated on a five-inch diameter silicon wafer (13). At this time, Gray and his colleagues were more interested in demonstrating the benefits of using FEAs as switching devices (like transistors) or amplifiers than as displays, but the possibility of using them for displays was held out as an additional potential benefit. The arguments that Gray made in favor of FEAs were that they might permit faster switching speeds than semiconductors and could be made in either a vertical or a planar structure. They could be constructed using a variety of materials: silicon, molybdenum, nickel, platinum, iridium, gallium arsenide, and conducting carbides. The devices did not require an extreme vacuum to operate, but there were problems of achieving uniformity in FEAs that had to be overcome before they could be used in commercial displays (14). In 1985, when further U.S. research funding was unavailable, the technology development moved to the Laboratoire d'Electronique de Technologie et de l'Informatique (LETI), a research arm of the French Atomic Energy Commission, in Grenoble, France. There, a group led by Meyer demonstrated an FPD using Spindt-type cathodes, and the field began to sprout (15). The basic technologies developed at LETI were then licensed to a French company called PixTech, which used them to develop a commercialization strategy of building multiple partnerships for manufacturing. This allowed the company to leverage the strengths of its partnerships. Other companies, such as Raytheon, Futaba, Silicon Video Corporation (later known as Candescent Technologies), Motorola, and Micron Display, all developed their own FED programs; all were based on Spindt's basic idea of an emissive cathode.
Of these companies, only Candescent and Micron were not partnered with PixTech in an alliance established between 1993 and 1995.

PixTech

Jean-Luc Grand Clement put together a short proposal for venture capital (VC) financing of a new display company, called PixTech, that would be able to produce FEDs using the LETI patents. Advent International became one of his earliest investors. PixTech raised capital in incremental amounts, starting at $3 million from Advent in late 1991 and then rising to $10, $22, and $35 million from other investors. More than half of the funds came from U.S. investors. PixTech, Incorporated, was formed as a U.S. corporation in June 1992. Using these funds, Grand Clement purchased exclusive rights to all 16 of the patents owned by LETI in early 1992. Then, he convinced Raytheon to become a partner of PixTech so that it would have access to LETI's patents. Motorola, Texas Instruments (TI), and Futaba also joined the PixTech alliance for the same reason. In March 1996, TI abandoned its FED efforts, and the agreement was terminated. PixTech set up a research and development facility in Montpellier, France, which was leased from IBM
France. They used off-the-shelf equipment in their pilot production plant, along with a small amount of custom FED equipment. The plant used a conventional semiconductor photolithography process and an MRS stepper. PixTech started initial production of FEDs in November 1995. These displays were monochrome green, 5.2-inch diagonal, 1/4 VGA (video graphics array) displays that had 70 foot-lamberts (fL) of brightness. PixTech opened a small R&D facility in Santa Clara, California, in February 1996. Production yields in the first half of 1996 were erratic; there were residual problems with lithography and sealing equipment. PixTech's approach involved the use of high-voltage drivers for the cathode combined with low anode voltages. The high cathode voltages created problems of arcing and expensive drivers, and the low anode voltages meant that PixTech FEDs could not be as bright as CRTs (which relied on high-voltage phosphors) unless better low-voltage phosphors could be developed. The first markets that PixTech targeted were avionics, medical devices, and motor vehicles. These markets were interested in smaller displays in the 4- to 8-inch range (diagonal measurement). Customers in these markets needed very bright displays that were readable from wider angles than were available from LCDs. In November 1996, PixTech announced that it had concluded a memorandum of understanding with Unipac Corporation in Taiwan to use Unipac's high-volume thin-film transistor (TFT) LCD manufacturing facility to produce FEDs on a contract basis. PixTech was given the ‘‘Display of the Year’’ award by Information Display magazine for its 5.2-in. monochrome display in 1997. In April 1997, PixTech announced that it would lead a consortium of European companies to develop FED technology under a $3 million grant from the European Union. They would work with SAES Getters and Rhone-Poulenc to develop getters and low-voltage phosphors, respectively.
PixTech also received a zero-interest loan of $2 million from the French government to develop FEDs for multimedia applications. In the second quarter of 1998, PixTech received its first high-volume commercial order, from Zoll Medical Corporation, which introduced a portable defibrillator using PixTech's FEDs. This was the first commercial order for a field emission display. During this same period, the first FEDs came off the production line at Unipac. In May 1999, PixTech closed a deal to purchase Micron Display. As a result, Micron Technology would own 30% of PixTech's equity and an unspecified amount of liabilities. PixTech said that it would use the Micron Display facility in Boise, Idaho, for the codevelopment of 15-inch displays with a major unnamed Japanese company.

Texas Instruments

Texas Instruments, along with Raytheon and Futaba, was one of the three initial partners in the PixTech alliance. TI's initial interest in FEDs stemmed from its desire to have an alternative to TFT LCDs for notebook computers. In 1990, TI was paying around $1,500 per display for TFT LCDs. When PixTech board members suggested that equivalent FEDs would eventually cost
around $500 per unit, TI became interested in FEDs. The FED project at TI was put under the control of the analog IC division of the firm that was responsible for digital signal processors (DSPs). TI built an R&D laboratory, and PixTech transferred its proprietary technology to the lab, but the lab had difficulties using the PixTech technologies to build FED prototypes.

Futaba

Futaba had experience manufacturing monochrome vacuum fluorescent displays (VFDs) for small devices, such as watches, and was interested in scaling up its work on VFDs. VFDs use low-voltage phosphors, so the fit with PixTech's FED technology was good. Futaba was also working on a 5.2-inch diagonal, 1/4 VGA display. Production began in November 1995, and the first sample units were sold in December of that same year. Futaba demonstrated prototypes of color FEDs at a number of international conferences after 1997 but had not introduced any of these displays to the marketplace as of mid-1999.

Motorola

In 1993, Motorola set up a corporate R&D facility to research FEDs. In 1995, the company set up the Flat Panel Display Division of the Automotive Energy Components Sector (AECS). A 6-inch glass pilot production line was built in Chandler, Arizona, in 1996. Later that year, construction of a second-generation 275,000 square foot facility was begun in Tempe, Arizona; construction was completed in 1998. The new plant was capable of producing up to 10,000 units per month. It was designed to produce multiple displays on 370 × 470 mm glass substrates, a standard size for second-generation TFT LCD plants. Problems of ramping up production occurred soon after the plant was completed. The main problems had to do with sealing the glass tubes, selecting the correct spacers to obtain uniform distances between the anode and the cathode, and discovering the right getters for dealing with residual gases in the cavity.
In May 1999, Motorola said that it would delay the ramp-up of its FED production facility to solve some basic technological problems, including the limited lifetime of color displays. It announced a scale-back in staff, and the effort now consists of a small research team.

Micron Display (Coloray)

Commtech International sold the 14 SRI patents to Coloray Display Corporation of Fremont, California, in 1989. Shortly thereafter, Micron Display, a division of Micron Technology, Inc., was formed and purchased a small stake in Coloray. However, this was not enough to keep Coloray from filing for bankruptcy under Chapter 11 in 1992. Micron Display developed a new microtip manufacturing technology for FEDs involving chemical mechanical polishing (CMP). CMP makes it possible to manufacture cold-cathode arrays precisely and reliably without the need for lithography at certain process modules. The process begins with the formation of the tips, which are then covered by a dielectric layer, which in
turn is covered by the conductive film that will constitute the FED's gate structure. The CMP process removes the raised material much faster than the surface material. As the dielectric and conductive layers are polished, the surface becomes flat. A wet chemical etch is used to remove the insulator surrounding the tips (16). This process would allow Micron Display to scale up uniformly to displays as large as 17 inches. After several years of struggling, Micron Technology decided to sell its Display Division to PixTech in exchange for an equity stake in the firm.

Candescent

In 1991, Robert Pressley founded a company called Silicon Video Corporation (SVC), which was renamed Candescent Technologies in 1996. Initial work was started on the basis of a DARPA contract to develop a hot-cathode thin-film display. Shortly after, it was determined that hot-cathode devices would not compete with LCDs in power consumption, and work was switched to cold-cathode efforts. Additional fund-raising efforts continued along with gradual technological success. In May 1994, SVC and Advanced Technology Materials, Inc. (ATMI) received a $1.1 million contract award from DARPA focused on building FEDs using diamond microtips. It was hoped that depositing thin films of diamond materials on cold cathodes would lower power consumption and hence the cost of building and operating FEDs (17). Difficulties in working with diamond materials stalled advancements. Early in 1994, Hewlett-Packard and Compaq Computers decided to take equity positions in SVC. Several other firms followed, including Oppenheimer Management Corporation, Sierra Ventures, Wyse Technology, Capform, and ATMI (18). In September 1994, SVC purchased a 4-inch wafer fabrication facility in San Jose, California, and in October SVC received a $22 million grant from the U.S. Technology Reinvestment Program (TRP) to develop its FEDs further. In March 1995, the first prototypes were actually produced using ‘‘sealed tubes’’.
In June 1995, the first ‘‘sealed tube’’ prototype came off the line. SVC first showed its 3-inch prototype display publicly at Display Works in San Jose in January 1996. Shortly thereafter, SVC (by then called Candescent) upgraded to 320 × 340 mm display tools that could produce 5-inch prototypes, and the first units came off this line in February 1998 (19). In November 1998, ‘‘Candescent and Sony Corporation announced plans to jointly develop high-voltage Field Emission Display (FED) technology’’ (20). Candescent also has strong partnerships with some of the computer industry's dominant players: an alliance with Hewlett-Packard Company as both a major customer and a development partner; Schott Corporation, a leading maker of special glass; and Advanced Technology Materials, Inc. (ATMI), thin-film technology specialists (21). As of 1 May 1999, Candescent had received more than $410 million in funding from investing strategic partners, venture capital firms, institutional investors, U.S. government-sponsored organizations, and capital
equipment leasing firms. They are committed to delivering full-motion, true-color, video-image-quality, low-cost flat-panel displays to the marketplace. Candescent intends to become a major supplier of flat-panel displays for notebook and desktop computers, communications, and consumer products (19).

THEORY

To understand how a FED functions as an information display device, it is necessary to know the physical, electrical, and optical properties of its major components: the cathode and anode panels, the spacers, and the external driving circuitry. This requires defining each component sufficiently and explaining its theoretical foundations. Once the functionalities of these components are understood, their application to fabricate a display device of the field emission type is easily realized. This section outlines the theoretical treatment of the major components of a FED so that the reader understands them reasonably well. It is meant only as a short summary; the reader is urged to refer to the literature for complete derivations and in-depth theory development.

The Cathode

A cathode is a negatively charged electrode; hence, the FED panel that houses the electron source of the display is often called the cathode panel. FED cathode panels are usually fabricated so that the cathodes are cone shaped and have very sharp tips (called emitter tips because electrons are emitted from the tip surface) in a matrix-addressable array. FED cathode materials must be chosen for their processability, process repeatability and robustness, physical and electrical properties, availability, and cost. There has been significant research and development on a number of cold-cathode materials, including tungsten (22), molybdenum (23), carbon (24,25), silicon (26), nickel, and chromium, to name a few.
Continued research is being conducted on low and negative electron affinity (NEA) materials that lower the gate voltages necessary to induce field emission (27). A potential of typically 50 to 500 volts is applied between the electron extraction gate and the cathode cones, resulting in an electric field of more than 10⁶ V cm⁻¹ at the tips and current densities greater than 10 A cm⁻² (more than 100 µA per emitter) (23). The emission of electrons from a material under the influence of a strong electric field is called field emission (28). Field emission from materials at low temperatures, of the order of room temperature, is termed cold-cathode field emission. Field emission occurs when the electric field at the surface of the emitter, induced by the voltage difference between the gate and the cathode, thins the potential barrier at the tip to the point where electron tunneling occurs (10). Field emission is a tunneling process that is described by theories in quantum mechanics and requires the solution of Schrödinger's wave equation in one dimension to arrive at the well-known and
generally accepted Fowler–Nordheim equation (6,29):

J = [e³E²/(8πhφt²(y))] exp[−8π(2m)^(1/2)φ^(3/2)ν(y)/(3heE)],  (1)

where J is the current density (A cm⁻²), e is the elementary charge in C, E is the electric field strength in V cm⁻¹, h is Planck's constant in J s, φ is the work function of the electron-emitting material in eV, and m is the electron rest mass in kg. The t²(y) and ν(y) terms are image potential correction terms (also called Nordheim's elliptic functions) and are approximated as

t²(y) = 1.1  (2)

and

ν(y) = 0.95 − y²,  (3)

where y is given by

y = Δφ/φ,  (4)

where Δφ is the barrier lowering of the work function of the cathode material, given in terms of e, E, and the permittivity of vacuum ε₀:

Δφ = [eE/(4πε₀)]^(1/2).  (5)

Substituting for the constants e, ε₀, π, h, m, and Eqs. (2), (3), (4), and (5) in (1) yields

J = 1.42 × 10⁻⁶ (E²/φ) exp(10.4/φ^(1/2)) exp[−6.44 × 10⁷ φ^(3/2)/E].  (6)

The F–N equation shows that the current density is a function only of the electric field strength for a constant work function. In practice, however, the field emission current I and the potential difference V between the cathode and gate are much easier to measure than J and E. This requires relating the F–N equation numerically to measurable quantities so that parameters in real-life cold-cathode devices may be quantified and optimized. Therefore, it is necessary to relate I and V to the F–N parameters J and E, so that the observed phenomenon of cold-cathode field emission may be compared to theoretical predictions. This is facilitated by setting

I = JA  (7)

and

E = βV,  (8)
where A is the emitting surface area (cm²) and β is the field emitter geometric factor, or field enhancement factor (cm⁻¹). Due to the sharpness of the tip, a phenomenon known as field enhancement allows a strong electric field
to form at a low applied voltage. Substituting Eqs. (7) and (8) in (6) yields

I = 1.42 × 10⁻⁶ (β²V²A/φ) exp(10.4/φ^(1/2)) exp[−6.44 × 10⁷ φ^(3/2)/(βV)].  (9)
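As a rough numerical illustration (not from the article), Eq. (9) can be evaluated directly. The work function, field enhancement factor, and emitting area below are assumed values of plausible magnitude for a molybdenum microtip; the sketch only shows how steeply the current rises with gate voltage.

```python
import math

def j_fn(E, phi):
    # Current density J (A cm^-2) from the simplified F-N equation (6);
    # E is the surface field in V cm^-1, phi the work function in eV.
    return (1.42e-6 * (E**2 / phi)
            * math.exp(10.4 / math.sqrt(phi))
            * math.exp(-6.44e7 * phi**1.5 / E))

def i_fn(V, phi, beta, area):
    # Emission current I (A) from Eq. (9), i.e., I = J(beta*V) * A.
    return j_fn(beta * V, phi) * area

# Assumed illustrative values: phi ~ 4.5 eV for molybdenum, field
# enhancement beta ~ 3e5 cm^-1, emitting area ~ 1e-11 cm^2.
phi, beta, area = 4.5, 3.0e5, 1.0e-11
for V in (60, 80, 100):
    print(f"V = {V:3d} V -> E = {beta * V:.1e} V/cm, I = {i_fn(V, phi, beta, area):.2e} A")
```

The exponential dependence on 1/E makes the current extremely sensitive to tip sharpness and applied voltage, which is why emitter uniformity dominates display quality.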
By plotting log(I/V²) on the ordinate against 1/V on the abscissa, a negatively sloping linear plot, known as a Fowler–Nordheim plot, is obtained. A typical F–N plot is shown in Fig. 2. The bent line in Fig. 3 shows how the applied electric field changes the barrier height between a metal and vacuum in the energy band diagram. The difference between the vacuum level Ev and the Fermi level Ef is the barrier height and is called the work function of the metal. The difference between Ev and the top of the field-modified band structure is the barrier lowering given by Eq. (5). The difference between the maximum height of the field-modified band structure and Ef gives an effective work function and is a direct result of the applied electric field. The change in the barrier due to the applied electric field results in electron tunneling. This result tells us that to have a constant current I for a fixed potential difference V between the cathode and the gate, the variables φ, β, and A must all remain constant across time (constant current could also be obtained if their variations offset each other, but this is not desired).

Figure 2. A Fowler–Nordheim plot.

Figure 3. Metal–vacuum energy band diagram showing the barrier height without and with an applied electric field.

Therefore,
it is necessary to determine which phenomena give rise to change in each of these variables and to prevent the changes. Changes in the work function φ of the emitter are attributed primarily to the adsorption of chemically active gas species such as oxygen, hydrogen, carbon dioxide, and carbon monoxide. Special measures must be taken to ensure that gases like these cannot contaminate the vacuum area of the display. The anode may also be a source of these contaminants, and various attempts have been made to reduce the amount of contaminants originating from the anode. The geometric factor β generally changes due to variations in the surface roughness of the emitter or when the radius of the tip changes. The former is influenced by ion bombardment, in which ions impinge on the cathode surface and sputter the cathode, resulting in atomic-scale surface roughening. These ions are generated either in a gas-phase electron ionization process or by electron-stimulated desorption of gases/ions from the anode. It has been shown that temperature changes alter the emitter radius. Theories for the temperature-dependent change in the emitter radius of field emission cathodes have been developed and include the effect of an electric field on the emitter radius (30). Variations in emitter radius due to temperature are typically neglected because they are small at the typical operating temperatures of FEDs. The electron-emitting area A at the surface of the cathode plays a smaller role in the current than φ and β, as can be seen by inspecting the F–N equation [Eq. (9)]: A is not included in the exponential term of the equation and thus has a lesser effect on the emitter current. It can also be seen that the slope of the F–N plot remains constant as long as φ and β are held constant, regardless of A. However, if either φ or β changes, A is likely to change as well. Various techniques may be used to quantify A to an order-of-magnitude approximation (23).
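The F–N linearity can also be checked numerically: because the V² prefactor cancels in I/V², Eq. (9) predicts that log₁₀(I/V²) is exactly linear in 1/V, with slope −6.44 × 10⁷ φ^(3/2) log₁₀(e)/β. The sketch below generates two synthetic points from Eq. (9) under assumed parameter values and compares the fitted slope to the analytic one.

```python
import math

def i_fn(V, phi, beta, area):
    # Emission current (A) from Eq. (9); V in volts, phi in eV,
    # beta in cm^-1, area in cm^2.
    E = beta * V
    return (1.42e-6 * (E**2 * area / phi)
            * math.exp(10.4 / math.sqrt(phi))
            * math.exp(-6.44e7 * phi**1.5 / E))

phi, beta, area = 4.5, 3.0e5, 1.0e-11   # assumed illustrative values

# Two (1/V, log10(I/V^2)) points determine the slope of the F-N line.
V1, V2 = 60.0, 100.0
y1 = math.log10(i_fn(V1, phi, beta, area) / V1**2)
y2 = math.log10(i_fn(V2, phi, beta, area) / V2**2)
slope = (y2 - y1) / (1.0 / V2 - 1.0 / V1)

# Analytic slope from the exponential term of Eq. (9).
expected = -6.44e7 * phi**1.5 * math.log10(math.e) / beta
print(f"fitted slope   = {slope:.6e}")
print(f"expected slope = {expected:.6e}")
```

In practice this relation is used in reverse: the measured slope of an experimental F–N plot constrains the ratio φ^(3/2)/β for a real emitter array.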
Field emission currents can vary from tip to tip even though the same potential difference is applied between the cathodes and their respective extraction grids. These variations are generally attributed to localized and global variations in φ, β, and A across the pixel arrays and the cathode panel, respectively. Variations in emission current can result in optical nonuniformities and reduce image quality. These image variations can be greatly reduced by incorporating extremely large numbers of emitter tips per pixel array (100–5,000), aided by electrical compensation techniques such as a resistive ballast layer under the tips or thin-film transistor (TFT) switches to eliminate current runaway (31). After evaluating the variables that affect field emission current and the phenomena that change them, it is obvious that the pressure in the vacuum space of the sealed display must be as low as possible. As a rule of thumb, the lower the vacuum pressure in a sealed FED, the better the stability of the cathode tips and the longer the life of the display. There will always be some degree of outgassing as gases are desorbed from surfaces within the device during
operation. Outgassing is combated by using a gettering material, such as barium (32) or silicon (33), among others, to capture residual gases located inside the vacuum area of the sealed display. Generally, the final pressure in a sealed FED should be less than 10⁻⁶ torr (1.3 × 10⁻⁷ kPa) to meet the vacuum requirements of stable cathode operation.

The Anode

Images produced by a FED are formed in exactly the same way as in a CRT, by a process called cathodoluminescence. Electrons generated by the cathode panel are accelerated toward the anode by applying a higher voltage (1–10 kV) to the anode relative to the cathode; hence the name anode panel. The electrons bombard the phosphor screen located on the anode panel, and the result is light generation (wavelengths in the range of 400–700 nm). Light generation in cathodoluminescence is due to the transfer of electron energy to the phosphors and the subsequent energy transfer and relaxation mechanisms within the phosphors. Certain energy relaxation processes result in energy given off by the phosphor as visible light. Low-voltage (1–10 kV) phosphor technology is one of the most important developmental areas in FED research (34). The promise of FEDs to provide a very bright, long-lasting, fast-responding, and low-power display is highly dependent on the phosphor screen. Traditional CL phosphors have been developed so that the CRT is most efficient at high voltages in the range of 10–35 kV. Low-voltage phosphors must be developed that exhibit the desired chromaticity coordinates (give off light in the color spectrum where the human eye is most sensitive), high luminance efficiency, and long operating lifetime (>10,000 hours). In 1931, the Commission Internationale de l'Eclairage (CIE) standard for color characterization was developed; its commonly accepted chromaticity diagram provides a set of normalized values for comparing colors of differing wavelengths.
This standard is commonly used to characterize the visible emission spectrum of phosphors. The luminous efficiency ε, in units of lumens per watt (lm/W), is defined as the visible light output per amount of input power and is given by (34)

ε = πLA/P,   (10)
where L is the luminance in cd/m², A is the area of the incident electron beam spot in m², P is the power of the electron beam in watts, and the factor of π appears because the emission of the phosphor is Lambertian. Equation (10) is used to evaluate the potential of CL phosphors for use in FEDs. The efficiency and chromaticity coordinates of all phosphors degrade over time due to burning and coulombic aging. At very high loading, phosphors may suffer permanent damage called burning, which may result in a burnt image on the display. Coulombic aging is the gradual degradation in phosphor brightness due to the operation of the display. The reduction in phosphor efficiency can be directly related to the overall integrated coulombic charge
through the phosphor. The emission intensity I after aging is given by

I = I₀/(1 + CN),   (11)

where I₀ is the initial intensity, N is the number of electrons deposited into the phosphor per unit area, and C is an aging parameter of the phosphor/display. Low-voltage CL phosphors usually contain dopant species called activators, which act as the luminescent source. Activator ions are incorporated into the phosphor matrix and are surrounded by the host crystal, forming the luminescent centers where the process of luminescence takes place. These dopants are usually present in small percentages; otherwise, they lie too close together in the host matrix and may quench neighboring dopant species. An activator added to a host material that does not normally luminesce is called an originative activator, and an activator that improves or increases the luminescence of the host material is called an intensifier activator. More than one activator (coactivator) species may also be used to induce the phenomena of cascade or sensitized luminescence (35). Cascading occurs when a coactivator introduces new energy absorption bands and nonradiatively transfers energy to the other activator(s). Sensitization, on the other hand, is caused by radiative energy transfer. Phosphors must also be wide-band-gap materials (>2.8 eV), must form low-energy surface barriers to allow efficient electron injection, and must be highly pure. Some high-luminous-efficiency, low-voltage red phosphors include Eu³⁺-doped oxides such as Y₂O₂S:Eu, Y₂O₃:Eu, and YVO₄:Eu. For green phosphors, ZnO:Zn has been significantly researched and extensively used; however, Gd₂O₂S:Tb and ZnS:Cu,Al show high luminous efficiencies on the order of 30 lm/W at 1 kV. ZnS:Ag, ZnS:Ag,Cl,Al, and SrGa₂S₄:Ce are three promising blue phosphors. These phosphors are chosen for their chromaticity coordinates, luminous efficiency, and resistance to coulombic aging.
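Equations (10) and (11) are easy to exercise numerically. The sketch below implements both; the sample numbers are illustrative assumptions, not measured phosphor data.

```python
import math

def luminous_efficiency(L, A, P):
    """Eq. (10): efficiency in lm/W for a Lambertian-emitting phosphor.
    L: luminance (cd/m^2), A: beam spot area (m^2), P: beam power (W)."""
    return math.pi * L * A / P

def aged_intensity(I0, C, N):
    """Eq. (11): emission intensity after N electrons per unit area
    have been deposited; C is the aging parameter."""
    return I0 / (1 + C * N)

# A hypothetical phosphor: 300 cd/m^2 over a 1 mm^2 spot at 10 mW beam power.
eff = luminous_efficiency(L=300, A=1e-6, P=0.01)   # ~0.094 lm/W

# Aging: once the product C*N reaches 1, brightness has halved.
half = aged_intensity(I0=1.0, C=1e-21, N=1e21)     # 0.5
```

The second call makes the meaning of C concrete: its reciprocal is the electron dose per unit area at which the phosphor reaches half its initial brightness.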
It has also been found that some sulfur-based phosphors contaminate the cathode tips as the phosphors degrade, and this must also be considered when choosing a phosphor (34). Improvements in phosphor synthesis, screening, particle-size uniformity, and characterization of degradation mechanisms are still needed and are ongoing. These areas are the primary focus of low-voltage phosphor research and development for use in FEDs. Anode panels generally have a black matrix pattern incorporated into their film-stack structure. A black matrix is comprised of an opaque material, usually containing silicon or chromium, that is used to increase color contrast between pixels. Gettering materials may also be incorporated into the black matrix grille (33). Because the anode is the high-voltage panel and must be transparent to allow the end user to view the visible light emitted by the phosphors, the anode electrode requires a material that is optically transparent (85–95% transmission of visible light) and electrically conductive (resistivity of 10⁻⁴–10⁻² Ω·cm, or sheet resistance on the order of 10 Ω/square). Some
typical electrode materials are silver oxide (AgO), zinc-doped zinc oxide (ZnO:Zn) (36), and indium tin oxide (ITO) (37,38). ITO is the most common transparent conductor material for FPDs. Its optical and electrical properties may be optimized by obtaining the ideal combination of free-carrier concentration and free-carrier mobility (39), aided by thermal annealing processes (40). It is also well known that ITO surface properties may be altered to improve electrical properties by subjecting the film to chemical treatment (41,42). It is common to deposit a layer over the phosphor screen, called a binder or dead layer, to bind the phosphors to the anode; otherwise, phosphor grains may become dislodged from their pixels and cross-contaminate adjacent pixels of differing phosphors. The binder increases the voltage required to activate the phosphors because the electrons lose energy while passing through the layer. It is also important that ambient light striking the display not cause any unwanted glare due to reflection. Therefore, it is customary to use antireflective coatings (ARCs) such as silicon nitride (SiNx) or silicon oxynitride (SiOxNy) (43). These films are widely used in the semiconductor industry as antireflectants for photolithographic processing.

The Spacers

It is well known that FEDs must have a high vacuum in the gap between the cathode and anode panels. As a rule of thumb, FEDs whose diagonals are more than 2 inches require special standoffs, called spacers, to prevent the panels from imploding into the vacuum space. Of course, extremely thick glass could be used to fabricate large-area FEDs, but this defeats the purpose of creating a small, lightweight display device. Substrate glasses 1–3 mm thick are usually used for fabricating large-area FEDs; FEDs whose diagonals are larger than 30 cm use thin glass primarily to reduce the weight of the display.
To be effective, spacer structures must possess certain physical and electrical characteristics. The spacers must be sufficiently nonconductive to prevent catastrophic electrical breakdown between the cathode array and the anode. They must exhibit sufficient mechanical strength to keep the FPD from collapsing under atmospheric pressure and must be distributed at the proper density across the display so that there is minimal warping of the substrates. Furthermore, they must exhibit electrical and chemical stability under electron bombardment, withstand sealing and bake-out temperatures higher than 400 °C, and have a sufficiently small cross-sectional area to be invisible during display operation. It is also important to be able to control the amount of current that passes from the anode to the cathode (opposite the flow of electrons) via conduction through the spacers, because this affects power consumption and charge bleed-off. Charge bleed-off is important in preventing charging of the spacers and the subsequent electrostatic electron repulsion and associated image distortion. The physical forces (due to atmospheric pressure) that act upon a spacer induce normal compressive stresses in
the spacer. The average stress σs at any point in the cross-sectional area As of a spacer is given by the normal stress equation for axially loaded members (44):

σs = Fs/As,   (12)
where Fs is the internal resultant normal force in each spacer (for this type of problem, it is the same as the force acting on the spacer). If there are n evenly spaced spacers and Fd is the atmospheric force acting on the display, given by

Fd = Pa Ad,   (13)

where Pa is the atmospheric pressure and Ad is the area of the vacuum space of the display parallel to the cross-sectional area of the spacers, given by the product of the length Ld and width Wd of the display, then

Ad = 2Ld Wd,   (14)

and

Fs = Fd/n.   (15)
The factor of 2 in Eq. (14) is required because there are two panels, one on each end of the spacer. Rearranging and substituting Eqs. (13), (14), and (15) in (12) yields the stress per spacer in terms of known and measurable quantities:

σs = 2Pa Ld Wd/(nAs).   (16)

Because the maximum allowable stress applied to a spacer should be known (it is an intrinsic property of the spacer material), it is possible to calculate the minimum number of spacers required to withstand atmospheric pressure by rearranging Eq. (16) and solving for n. To ensure the proper electron beam cross sections, it is important to be able to control cathode emission currents, the emission area, and the gate opening; it is also important to have a uniform and controllable gap separating the cathode and anode panels. For this reason, spacer dimensions are controlled to a high degree of precision. Consideration must be given to the sealed package as well. A spacer subjected to atmospheric pressure and its associated normal compressive strain contracts in length by δ (As is assumed to be constant). The average unit axial strain εs in a spacer is given by (45)

εs = δ/Ls,   (17)

where Ls is the original length of the spacer. Hooke's law for elastic materials relates the stress and strain by the simple linear equation

σs = Eεs,   (18)

where E is Young's modulus, also called the modulus of elasticity. Substituting Eqs. (16) and (17) in (18) and
rearranging yields

δ = 2Pa Ld Wd Ls/(nAs E).   (19)
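As a worked illustration of Eqs. (16) and (19), the following sketch computes the minimum spacer count for a display and the resulting axial contraction. Every dimension and material limit in it is an assumed, representative value, not a specification from the text.

```python
import math

Pa     = 101_325         # atmospheric pressure, Pa
Ld, Wd = 0.21, 0.15      # display length and width, m (assumed, ~10" diagonal)
As     = 50e-6 * 50e-6   # spacer cross-sectional area, m^2 (50 um x 50 um)
Ls     = 200e-6          # spacer length = panel gap, m
E      = 70e9            # Young's modulus, Pa (typical of glass)
sigma_max = 50e6         # allowable compressive stress, Pa (assumed limit)

# Eqs. (13)-(14): total atmospheric force on both panels.
Fd = 2 * Pa * Ld * Wd

# Rearranged Eq. (16): smallest n that keeps sigma_s at or below sigma_max.
n_min = math.ceil(Fd / (sigma_max * As))   # 51,068 for these numbers

# Overdesign with an excess of spacers, as the text recommends.
n = 2 * n_min

sigma_s = Fd / (n * As)        # Eq. (16): actual stress per spacer
delta   = sigma_s * Ls / E     # Eqs. (17)-(19): contraction, tens of nm here
```

Note how weak the panel-gap distortion is once the count is doubled: the contraction δ comes out well under 0.1 µm, small compared with typical gap tolerances.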
From Eqs. (16) and (19), it is possible to calculate the optimum number of spacers n, as well as their dimensions As and Ls, so that the sealed display has good vacuum integrity and proper spacing between panels. Note that we have neglected the change in the diameter of the spacers due to the compressive forces of atmospheric pressure. This is based on the assumption that the spacer length is much greater than the width and that Poisson's ratio for the spacer is small (0.23 for soda-lime glass). Additionally, displays should be overdesigned to have an excess of spacers so that atmospheric pressure and any other outside loads do not result in mechanical failure. The coefficient of thermal expansion (CTE) is another important property of the spacer. CTE mismatches between the substrate and spacer can result in residual stress and possibly the destruction of the spacers during sealing. For this reason, using the same glass as the substrate is often the best choice for spacer fabrication. However, CTE-matched ceramics have also been found useful as spacer structures.

The Drivers

All display technologies require unique electronic circuitry to transform the display assembly from an inert piece of glass into a useful device that can present real-time visual information. In contrast to the now commonplace analog CRT, FEDs use primarily digital techniques to control the display of video information. FED technology is still in development, but the principal approach for driving these displays is well known within the industry. To extend the gate–cathode arrangement to a flat panel display, one possible approach is to connect the emitter tips to the columns of a matrix display and divide the grid into rows. To excite a particular pixel, one applies a sufficiently positive select voltage to a row and a negative, or at least less positive, modulation voltage to a column. Generally, a single row is selected, and all the columns are simultaneously switched.
This allows addressing the display a row at a time, instead of a pixel at a time as in the raster scan of a CRT. A gate–cathode voltage difference of 80 V is sufficient to generate full pixel luminosity for a typical FED. A black level is attained at 40 V or less. This 40-V difference is used to modulate the pixel between on and off states (46). As in most FPD technologies, the overall display area of a FED is subdivided into a rectangular matrix of pixels that are individually addressed in a row and column arrangement. From a sufficient distance, the human eye fuses these discrete points of light to form an image. For this discussion, a common SVGA (super VGA) resolution (800 columns × 600 rows) color display will be used as an example. In this display, the matrix formed by the intersections of these rows and columns yields 480,000 pixels. Even larger matrix sizes are possible, such as 1,024 × 768 or even 1,600 × 1,200, with a corresponding increase in complexity. To depict
color, each pixel is further divided into red, blue, and green subpixels. Although a myriad of subpixel geometries is possible, the most common arrangement consists of vertical color stripes oriented to form a square pixel. This requires multiplying the number of columns by a factor of 3. Thus, the example color SVGA display would require 800 × 3 (RGB) = 2,400 discrete columns × 600 rows. Note that most current approaches to constructing a FED are so-called passive displays, which position the active drive circuitry at the periphery of the display. The typical FED consists of an active area populated by emitter tips and intersecting row and column metal lines, with high-voltage IC driver chips located around the edge. These exterior driver chips attach to the individual rows and columns via a flexible electronics substrate similar to a printed circuit board. An individual chip may be responsible for driving hundreds of rows or columns simultaneously. One consequence of this passive approach is that the narrow metal address lines that cover the entire display can be resistive as well as mutually capacitive where they overlap. The resulting RC transmission-line behavior can distort and delay the outputs from the external driver chips, which can affect display sharpness and contrast. Maximum rise and fall times of the column and row signals can limit the maximum scan rate and gray-scale capability of the display. Furthermore, the large capacitive load of the lines requires that the drive ICs supply large transient currents. Figure 4 shows a block diagram of the overall electronics of a FPD system used in driving a FED. An analog-to-digital (A/D) portion interfaces with the video signals commonly encountered in the real world, such as analog television (NTSC/PAL) or computer video. These analog signals are converted to digital values that can be applied to the display.
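The address-line arithmetic for these matrix formats can be tabulated with a short sketch; the `matrix_lines` helper and the format list are ours, for illustration only.

```python
def matrix_lines(cols, rows, subpixels=3):
    """Return (column address lines, row address lines, total pixels)
    for a color display with vertical RGB stripe subpixels."""
    return cols * subpixels, rows, cols * rows

# Resolutions mentioned in the text (the labels for the two largest are
# the conventional names for those pixel counts).
formats = {"VGA": (640, 480), "SVGA": (800, 600),
           "XGA": (1024, 768), "UXGA": (1600, 1200)}

line_counts = {name: matrix_lines(c, r) for name, (c, r) in formats.items()}
# SVGA -> (2400, 600, 480000): 2,400 column lines, 600 row lines, 480,000 pixels
```

The driver-chip count scales with these totals: at a few hundred outputs per chip, the SVGA example needs on the order of ten column-driver ICs alone.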
Commonly, red, green, and blue are represented by 6- or 8-bit digital values that result in 64 or 256 discrete levels of brightness. It is also becoming increasingly common for digital data to be available directly from a computer source. In this case, no A/D conversion is required, and less noise is introduced into the video data. Once these digital signals are acquired, they may need to be rescaled to match the native resolution of the FPD. For instance, VGA video (640 × 480 pixels) would use only a portion of the SVGA example display. Performing this nonintegral scaling without artifacts is difficult, but high-performance ICs are becoming available to do this task cleanly (47). The appropriately scaled digital data are then piped into a video controller IC that directs the data to the specific column and row drivers. The driver chips located at the edge of the display's active area receive the digital data and control signals, allowing them to drive the row and column address lines to the required levels. Another important component is the system power supply, which must generate the row, column, and anode potentials in addition to the logic-level voltages. Finally, a user interface is required to allow the viewer to control display parameters such as contrast and brightness. In operation, a single output of a row driver is selected, turning on a row by raising its voltage from ground potential to +80 V. Column drivers have been previously loaded with data for that particular row. After a slight
Figure 4. FED driving system block diagram with the various electronic elements: A/D conversion and video scaling, video controller, voltage supply, user control, and the row and column drivers surrounding the FED active area of M columns × N rows.
delay that allows the row to charge to its final value, column outputs are either turned on by swinging to ground or remain in their off state at +40 V. In this manner, an entire row of selected pixels is illuminated simultaneously. After a certain amount of time, the active columns are reset to the off state, and the next row is selected. The columns of new data are again selectively switched, and the process repeats. The entire display is scanned row by row at a rapid rate. Ideally, the rows should be refreshed upward of 60 times a second to minimize flicker. A simple calculation shows that the example SVGA display would have a row time of 1/(60 frames/s × 600 rows), or 27.8 µs. Simply switching rows and columns between on and off states produces black or white pixels, yielding a binary display. This is appropriate in many applications, such as text displays, where gray scale is not needed. However, to render gray levels accurately, the amount of light from individual pixels must be modulated. Two primary methods for attaining modulation involve controlling the amplitude of the column drive, called amplitude modulation (AM), or varying the time the column is turned on, called pulse-width modulation (PWM). AM can be accomplished by modulating the overall column voltage swing, but nonuniform pixel emission will result in a grainy appearance in the display. Another approach to modulating the pixel is to control the column current. However, the large transient current used to charge the capacitive column is in the milliamp range, whereas the emitter current is only microamps. Neither of these approaches has proven feasible for passive displays, so the most popular method for attaining gray-scale encoding is pulse-width modulation. Figure 5 shows representative column and row waveforms in a gray-scale FED. As in the binary display, rows are sequentially enabled.
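The row-time figure, and the gray-scale time increment it implies, follow from a one-line calculation using the refresh rate and resolution of the SVGA example:

```python
refresh_hz = 60    # full-display refresh rate
rows       = 600   # row lines in the SVGA example
gray_bits  = 6     # 6-bit gray scale -> 64 PWM time increments

# Each row gets an equal share of each refresh period.
row_time_s = 1.0 / (refresh_hz * rows)       # 1/36,000 s = 27.8 us per row

# The smallest PWM on-time step for 6-bit gray scale.
pwm_step_s = row_time_s / (2 ** gray_bits)   # ~434 ns per increment
```

The computed step of roughly 434 ns is consistent with the "approximately 400 nanoseconds" quoted for the column drivers; an 8-bit gray scale would shrink it fourfold, to about 109 ns.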
The column drivers, loaded with 6- or 8-bit data, turn on the column lines for a variable amount of time proportional to the encoded value. Six-bit data would require 64 individual time increments from the column driver. For the SVGA FED, these time increments would be in multiples of approximately 400 nanoseconds. Each of the color subpixels is similarly
controlled, allowing display of a full range of colors and intensities.

DISPLAY FABRICATION

A high degree of process control, large-scale manufacturing capability, and the ability to generate small features make semiconductor processing the manufacturing method of choice for field emission displays. Semiconductor processing is the general term for techniques in thin-film processing and photolithography that involve depositing and selectively removing thin films supported on a substrate. FED cathodes and anodes are fabricated by this type of processing, combined with the spacers in an assembly technique, and the finished product is incorporated into a package with the driver system. To understand how the cathode and anode panels are fabricated, it is necessary to understand photolithography and thin-film processing. In general, the process flow is to place a material on the panel (deposition), pattern it (photolithography), and remove it selectively (etch). Thin films may be deposited by a number of means. Physical vapor deposition (PVD) and chemical vapor deposition (CVD) are two modes of thin-film deposition. Sputtering and evaporation are the two main forms of PVD. Sputtering offers a wide variety of choices for material deposition but can be high in cost, depending on the material. Evaporation offers high-purity deposition
Figure 5. Waveform diagram for the driving system, showing pulse-width modulation timing: rows N and N+1 are sequentially switched on and off while column M is turned on for a variable time.
(partially due to the high vacuum levels used) but can be difficult and expensive to scale up to large substrate sizes (48). During sputtering, a panel is placed in a vacuum chamber. Also inside the chamber is a target made of the material to be deposited. Positively charged ions (e.g., from argon or other inert gases) are created by a plasma and are accelerated into the target (the negatively charged electrode) by a potential difference. The result is that target material is deposited on the anode (a FED panel in this case). Evaporation is a PVD technique that uses a source, such as a filament, made of the material to be deposited; the material is heated by an electron beam or thermally until it evaporates and subsequently condenses on the substrate. There are many types of CVD processes. Two types are atmospheric-pressure chemical vapor deposition (APCVD) and plasma-enhanced chemical vapor deposition (PECVD). APCVD is typically used for thin-film oxide depositions. Some APCVD can be done only on silicon wafers because the deposition temperature is above the melting point of most glass substrates used in FED production (∼600 °C). PECVD uses a plasma to aid in depositing materials by forming highly reactive ions from the source gases. This process can also be done at high temperature; however, temperatures of 200–500 °C are sufficiently low for processing FED substrates. PECVD is also used for many deposition processes in LCD manufacturing. Photolithography is a process by which a pattern is transferred to a substrate. Photolithography uses a light-sensitive material called a photoresist that can copy the image from the master, or mask, by changing its properties in the exposed areas but not in the unexposed areas.
The exposed areas of the photoresist are chemically altered so that they can be removed by a solution of water and a developer, commonly a basic liquid such as tetramethyl ammonium hydroxide (TMAH). A typical process would consist of coating the panel with photoresist; baking the resist to remove solvent; aligning the mask to the substrate (panel) to ensure proper placement of the master image; exposing the unmasked areas to light (usually ultraviolet light for semiconductor processing); and finally, removing the exposed resist (for a positive resist), exposing the thin film underneath that is to be removed by the next process step, etching. Etching in semiconductor processing is a technique for selectively removing a particular material from the surrounding materials. There are two main categories of etching: wet and dry. Wet etching uses acid and base chemistries and generally gives an isotropic etch profile, whereas dry etching uses gases and a glow discharge (plasma) and is normally used to generate an anisotropic (directional), that is, vertical, etch profile. Wet etching is often faster and more selective than dry etching. The term dry etching encompasses the general topic of gas-phase etching of a surface by physical reaction, chemical reaction, or some combination of the two (48). Either wet or dry etching is used, depending on the materials being processed, the desired selectivity, and the required sidewall profile.
Because the FED is an electronic unit, the films on each panel produced must have the desired electrical properties. Doping the films by purposefully incorporating impurities into them is the generally accepted method for altering their electrical properties. The most common methods of doping are ion implantation and in situ doping. Ion implantation is the process of implanting high-energy ions through the surface of the material to be doped. In situ doping is doping the film during deposition; it is the preferred method for FED fabrication.

The Cathode

Using standard semiconductor processing tools, cathode fabrication is slightly more complicated than production of the anodes. The first step is to choose a material for the substrate. The material must have many specific physical properties, including strength and a coefficient of thermal expansion (CTE) that matches most of the other materials used in the processing steps. CTE is a property of a material that indicates how much the material expands or contracts when it experiences a thermal history. The two main types of materials chosen for the cathode substrate are silicon wafers and glass. The glass types used vary from typical window-type glass to highly specialized, ultrapure glass. Glass compositions may include oxides of silicon, boron, phosphorus, calcium, sodium, aluminum, and magnesium, to name a few (49). Borosilicate and aluminosilicate glass are two special glasses routinely used in FED manufacturing. Many LCD makers use aluminosilicate glass, which contains oxides of aluminum within the silicon dioxide. Glass can be provided in a wide variety of sizes, including very large sheets (1,000 × 1,000 mm). Several smaller displays may be fabricated on these large substrates; this is known as multi-up processing. Silicon wafers are single crystals of silicon that are produced in sizes up to 300 mm in diameter.
It is impractical to make field emission displays on silicon wafers if the active area of the display is to be more than about 5 cm. Silicon wafers are easy to process but extremely expensive as a starting material. The semiconductor industry uses them for two main reasons. First, silicon is necessary because it is the building block of a semiconductor device. Second, a plethora of devices can be fabricated on each wafer, thus splitting the total cost of the wafer among thousands of parts. Once the substrate has been selected, it is time to start processing the FED. Probably the most important step in all FED processing is cleaning. There are many cleaning steps in semiconductor processing, and FED processing is no different. A particle on a cathode even as small as a human hair would be enough to destroy the quality of the emission from the entire device. In addition, any organic materials that get onto the cathode or into the package can contribute to a phenomenon known as outgassing. The first process completed on the glass substrates as they arrive at the fabrication facility (fab, for short) is cleaning. Normally, once the panels have been
cleaned, no human hands contact them until they are completely processed. The entire process is automated in the interest of cleanliness. After the panels are washed, they receive a thin-film deposit of a barrier (protection) layer. The barrier is typically a silicon dioxide film used to protect the subsequent thin-film layers from any surface defects that may be on the glass. Following the barrier deposition, the panels receive a deposit of metal, which is the conductor for the columns used, in conjunction with the row metal, to address the emitters (by matrix addressing). After metal deposition, the panels receive the very precise pattern that the columns (and later the rows) require. This is done by using the photolithographic process described earlier. The next step is resistive silicon deposition. This thin layer of silicon is deposited on the cathode so that, when driven, each set of emitter tips is limited in the amount of current it can draw. This is important because if one set of tips draws an unexpectedly large amount of current (e.g., due to a short), the entire set of tips or, worse yet, the driver circuits could be destroyed. Again, this thin film is processed so that all excess material is removed by etching, and the final pattern remains. This point in the processing scheme is where the two main methods for fabricating cathodes, the CMP and Spindt methods, diverge. The greatest difference in the way FED cathodes are made is in the emitter-tip process module. The CMP method deposits a layer of doped silicon approximately 1 µm thick and then a layer of silicon dioxide that acts as a hard mask to pattern the tips. The hard mask is photopatterned using circles for the locations of the tips. Before the tips can be formed, the hard mask is etched so that circles, or caps, remain above each future tip location. The tips are formed by plasma etching the silicon to shape it into cones.
The oxide hard mask is removed by a wet chemical etch. Next, two more thin films are deposited on the panel. First, a layer of oxide is placed on top of the tips to isolate them electrically from each other. Second, a layer called the extraction grid is deposited on top of the oxide layer. This grid is usually a metal or a semiconductor, such as niobium or silicon, respectively, and is used as the gate electrode to extract electrons from the emitter tips. After these films are deposited, bumps remain where the materials have conformed to the shapes of the tips. By flattening these bumps, the extraction-grid material is removed from the area directly above the tips, leaving the oxide exposed. The process by which the bumps are flattened is called chemical mechanical planarization (CMP) (50) and is shown in detail in Fig. 6. The final step in fabricating the tips is etching the oxide layer, chemically and isotropically, to create cavities. These cavities play a crucial role in allowing the proper electric fields, necessary to extract electrons, to be generated. If this etch is deferred to the end of the process, however, the oxide layer can serve as a physical protection layer for the tips until all other processing has been completed; the last step in fabricating the cathode then becomes the wet chemical etching of this oxide layer.
Figure 6. Schematic diagram of tip formation using the CMP process: patterned photoresist over a silicon dioxide cap layer, doped (tip) silicon, oxide, and the silicon substrate.
The Spindt method of manufacturing tips differs mostly in the order of operations. In addition, instead of being formed by removing material, the tips take on their sharp conical shape as they are deposited. The thick oxide layer on which the grid metal sits is deposited first. Then, the grid metal is deposited on top and photopatterned using circles, just as for the hard mask in the previous example. The grid metal is dry etched so that circles expose the underlying oxide layer. The oxide layer is isotropically wet etched, creating a cavity under the grid. Before the tips are formed in the cavities, a layer is deposited so that the tip material deposited outside the cavities can be removed later. This sacrificial layer is sometimes called the liftoff layer. When the opening is set, the cathode is placed in an evaporation chamber where a metal, usually molybdenum, is deposited at normal incidence to the holes. As the holes fill up, the conical shape of the emitter tips takes form. When the process is complete, the original holes are filled and closed off. An electrochemical etch is employed to remove residual tip metal from the top of the extraction grid and tips. This process is known as the Spindt technique, and the tips are called Spindt emitters (9). It is the most common method for producing the field emitter arrays used in FEDs. Regardless of emitter type, the next step in cathode fabrication is depositing the metal that will form the rows. The row metal is similar, if not identical, to the column metal. The row and column metal together give the display its x, y addressability. The driver chips can then easily turn on any specific row and column, thus illuminating one defined pixel (or subpixel). After the row metal is patterned and etched, a layer of a dielectric material may be deposited. This material is usually silicon dioxide or silicon nitride.
Because the anode can be at high voltages, this dielectric may be necessary to suppress rogue emission from unwanted, possibly jagged sites.
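The row-at-a-time x,y addressing described above can be sketched numerically. The sketch below uses assumed VGA-class numbers (480 rows at a 60-Hz refresh), which are illustrative rather than parameters of any specific display:

```python
# Illustrative sketch of row-at-a-time (x,y) matrix addressing in an FED.
# The numbers are assumed VGA-class values, not from a specific display.
ROWS, REFRESH_HZ = 480, 60

frame_time = 1.0 / REFRESH_HZ       # one full frame
row_time = frame_time / ROWS        # dwell time while one row is selected

print(f"frame time: {frame_time * 1e3:.2f} ms")   # 16.67 ms
print(f"row dwell : {row_time * 1e6:.1f} us")     # 34.7 us

# During each row's dwell time, the row driver selects that row while the
# column drivers apply data voltages; only emitters at the intersection of
# the selected row and driven columns emit. Each pixel is therefore
# excited for roughly 1/ROWS of the frame, a duty cycle that also enters
# the luminance budget discussed later in the article.
duty_cycle = 1.0 / ROWS
print(f"pixel duty cycle: {duty_cycle:.4f}")      # 0.0021
```

The short per-row dwell time is one reason FED phosphors must be efficient under pulsed excitation.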
FIELD EMISSION DISPLAY PANELS
The Anode

Anodes are much less complicated in terms of the number of processing steps. A typical anode can be fabricated in approximately four to seven major fabrication steps (a major step is normally defined by a photolithographic mask step), compared to cathodes, which can require 5 to 10 or more. Because every pixel site on the cathode has a corresponding pixel site on the anode, the layout of the pixel-defining area is normally identical to the layout that defines the pixel/tip locations on the cathode. Large-area FEDs require that the anode and cathode be fabricated on substrates of the same material; however, small-area FED cathodes may be made on a crystalline silicon substrate, whereas the anode is usually made on glass. The first step in anode fabrication is to deposit a material on the substrate that is electrically conductive and optically transparent, such as ITO. The next step is to deposit and pattern the black matrix, which contributes to the contrast ratio and can act as an antireflectant. The final step(s) in anode fabrication is to deposit the phosphor. The methods employed depend on many factors, including whether the display will be monochrome or color, high or low voltage (on the anode), and the resolution of the display (VGA, SVGA, XGA, etc.). The two main methods for phosphor deposition are electrophoretic (EP) deposition and slurry deposition. In electrophoretic deposition, ionic phosphor particles are placed in an electrolytic solution. The anode is coated with photoresist everywhere except where phosphor is to be deposited. The panel is then placed into the solution, and a potential difference is applied between it and a reference electrode. The charged particles migrate to the panel and coat the surface. The photoresist is then stripped off the panel along with any undesired phosphor that may have been left on it.
This leaves a well-defined pattern of phosphor where the resist was not located. Slurry phosphor deposition involves fully coating an anode with a phosphor of one color (blue, for example) and then patterning and selectively removing that phosphor to form the precise pattern necessary for pixel definition. This process is repeated two more times for the other colors (red and green). The two most common methods for slurry deposition are spin coating and screen printing.

The Spacers

One of the more important concerns for FED spacers is the choice of material. Among other things, the spacers are responsible for preventing the evacuated display from imploding, so it is of the utmost importance that they be strong. In addition, a spacer has the daunting task of occupying a relatively small (10–400 µm) footprint while being tall enough to span large gaps (200–3,000 µm). High aspect ratios, between 4 and 20, are not easy to achieve, and good-quality spacers are increasingly difficult to fabricate given the physical requirements for support and resistivity.
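The aspect-ratio figures quoted above follow from simple arithmetic on the footprint and gap ranges; a quick check:

```python
# Quick arithmetic check of the spacer aspect ratios quoted in the text:
# footprints of 10-400 um and anode-cathode gaps of 200-3,000 um.
footprint_um = (10, 400)
gap_um = (200, 3000)

# The 25 um x 200 um handling example discussed in the text:
print(200 / 25)                      # aspect ratio of 8, in the 4-20 band

# Extremes of the quoted ranges:
print(gap_um[0] / footprint_um[1])   # 0.5 (short gap on a wide spacer)
print(gap_um[1] / footprint_um[0])   # 300 (tall gap on a narrow spacer)
# Practical designs fall in the 4-20 band; the extremes are either
# mechanically unnecessary or beyond current fabrication methods.
```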
Spacers can be fabricated in a variety of shapes and sizes, from a variety of materials, and by a variety of methods. Methods of fabrication have been derived from traditional glass drawing, molding, and micromachining. Spacer structures can be solid or hollow (51). Designs that have been used in spacer technology include cylinders, triangles, squares, rectangles, trapezoids, and cross-shaped structures. Applying these small spacer structures to either substrate (cathode or anode) is a particularly difficult task. The spacers must be placed on the cathode or anode to within ±10 µm in some cases. Handling single structures that are 25 × 200 µm is very difficult without specialized equipment. Moreover, depending on size, a typical display may require placing from thousands to tens of thousands of these small structures in precise locations. Clearly, handling these spacers individually is infeasible in production. One approach is to fabricate the spacers directly on the substrate. Some historical methods of directly forming spacers include screen printing, stencil printing, reactive ion etching (RIE), laser ablation (52), and thick-film photopatterning of organic structures (53). Each of these methods has serious technological limitations: screen printing suffers from an inability to produce the proper aspect ratio, and RIE is an inherently slow process that leads to high production costs. Alternatively, many individual spacers can be attached to the substrate via some procedure that allows fast and accurate placement. Methods of attaching separate spacer structures include anodic bonding (54), metal-to-metal bonding (55), and various methods of gluing. When spacers are manufactured from soda-lime glass, the presence of sodium allows a method of attachment known as anodic bonding.
At high potential and high temperature, soda-lime glass spacers can be bonded to silicon substrates (or glass substrates that have silicon sites connected to a conductor) by countermigration of oxygen and sodium ions. Other methods of attaching spacers include organic glues and inorganic glass frit (56). Issues of charging and electrical breakdown are understood, and solutions have been derived to eliminate these catastrophic failure mechanisms from an operating display. Although there have been many great accomplishments in spacer technology, it is still in its infancy.

The Assembly

The technology for evacuating a glass envelope through an exhaust tube exists (CRT manufacturers have been doing it for years). However, the fundamental shape and design of the glass pieces differ between a CRT and a FED. The use of an exhaust tube in a FED package negates the advantage of producing a thin display. In addition, exhaust tubes can introduce potential failure mechanisms. The technology exists today for assembling and sealing the glass components in vacuum, which eliminates the need for an exhaust tubulation process. The fundamental elements of the sealing process are CTE matching, incorporation of a getter, and alignment of the substrates to each other. Initial work on the assembly of FEDs was done using tubulated seals. Tubulated seals involve a two-step process
whereby the anode and cathode are sealed together in an inert atmosphere that does not oxidize components of the display, leaving a small tubulation. The second step is to pull vacuum through the tubulation and pinch off the tube via various glass manipulations. A high-temperature bake-out can be accomplished using the tubulated sealing approach while actively pumping out the display. Some FED manufacturers have shifted focus to nontubulated sealing, which has the added benefit of being a single-step process (57). The sealing glass (also called frit) is typically a powder form of glass that is CTE matched to the substrates in the display. The powder is combined with an organic vehicle that allows the powder glass to be placed on the panel precisely. When the organic material is removed (through an elevated thermal cycle), the glass powder is below its flow point, so that organic materials cannot be trapped inside and increase the porosity of the sealing glass (58). High porosity and improper wetting of the sealing components can be direct causes of vacuum leakage. Equally important to assembly, and ultimately to final operation, is alignment of the cathode to the anode. It is imperative to line up the cathode pixels (or subpixels) precisely with their counterpart pixels/subpixels on the anode. Misalignment by more than 5 µm on high-definition displays can destroy color purity as well as image quality. Alignment tools are normally custom designed for FED developers. CRT manufacturers, by contrast, use a rough alignment procedure and achieve exacting alignment by internal (magnetic) methods, making corrections after the CRT is manufactured. The CTE is a material property of utmost concern in assembling anodes and cathodes. If the CTEs of the components (anode, cathode, spacers, and sealing glass) are not properly matched, then one component will expand or contract at a different rate than the others and breakage may occur.
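The scale of the CTE problem can be estimated from the linear expansion relationship ΔL = αLΔT. The sketch below uses typical handbook CTE values and an illustrative display size and seal temperature, all of which are assumptions rather than figures from this article:

```python
# Illustrative estimate of CTE-mismatch displacement during seal cooldown.
# CTE values are typical handbook figures, not from this article.
ALPHA_SODA_LIME = 9.0e-6   # 1/degC, soda-lime glass (assumed)
ALPHA_SILICON   = 2.6e-6   # 1/degC, crystalline silicon (assumed)

L_mm = 100.0   # lateral span of a hypothetical display panel, mm
dT = 400.0     # cooldown from a ~425 degC seal to room temperature, degC

# Differential shrinkage of a glass anode against a silicon cathode:
mismatch_um = (ALPHA_SODA_LIME - ALPHA_SILICON) * dT * L_mm * 1e3
print(f"relative shrinkage: {mismatch_um:.0f} um over {L_mm:.0f} mm")
# ~256 um -- far beyond a ~5 um alignment budget, which is why
# large-area FEDs pair anode and cathode substrates of the same material.
```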
Often, there are CTE mismatches that the sealing technology must overcome. One such issue is sealing temperature. Temperature constraints on the high end come from the glass and thin-film materials (e.g., no temperatures over 450 °C for soda-lime glass, because of decreased glass viscosity). Constraints on the low end come from the getter materials (some getter materials are not activated until they have experienced temperatures over 400 °C). Other types of getters can be activated by a point heat source, which eliminates the need to heat the entire display to activate the gettering material. These constraints tighten the tolerances for some of the assembly materials: what is needed is a sealing glass solvent vehicle that burns off completely below 400 °C and a sealing glass powder that completes its transition to glass below 450 °C. Sealing glass powders are available for temperatures lower than 450 °C; however, getter materials that work much below that temperature are hard to obtain. Recently, significant advances in gettering technology have allowed the development of lower temperature sealing methods (59). Insufficient vacuum levels in the presence of high electric fields (like those in operating FEDs) can lead to a condition known as glow discharge and/or voltage
breakdown, known as arcing. Both of these events are catastrophic failures in FED devices.

PERFORMANCE AND PARAMETERS

Field emissive display technology is still evolving; a number of different manufacturers are pursuing unique approaches. Unlike more mature display technologies, such as CRTs, it is difficult to describe general parameters for a typical FED. However, the physics defining the manufacture of a FED results in a certain envelope of operation. Key parameters that define this envelope include luminance, viewing angle, contrast ratio, resolution, size (display diagonal), operating temperature, response time, color purity, power consumption, and lifetime. The luminance of a field emissive display is determined by a combination of the potential difference between the grid and tip, the tip current density, the anode acceleration voltage, the phosphor efficiency, and the duration of excitation for each row. A typical VGA implementation using attainable values of these parameters can produce 100–600 cd/m² (cd is the abbreviation for candela, the unit of luminous intensity). Extreme examples of this technology can yield displays of 1,000 cd/m² or more. Two parameters in which FEDs are inherently strong are viewing angle and response time. Unlike LCDs, whose polarizing filters produce a distinct angular response (typically ±60°), FEDs are an emissive technology that produces light off-axis. For the end user, this results in a display that can be viewed at angles of ±80° or greater in both the horizontal and vertical directions. Response times of FEDs are similar to those of CRTs because both are cathodoluminescent technologies. Typical response times of an FED phosphor are measured in microseconds (of the order of 10–30 µs). The response time is determined by the phosphor persistence and the scan rate of the display.
This enables both FEDs and CRTs to achieve true video-rate image presentation. In comparison, the viscosity of liquid crystal material limits an LCD response time to milliseconds. Operating temperature is another distinct advantage of FEDs. The manufacturing process involves sealing the displays at high temperatures (normally above 300 °C). Although normal operating temperatures are rarely this high, the limiting factor is often the electronics that drive the display. FEDs could actually operate at temperatures above 100 °C, except that the silicon base resistor is temperature-dependent and would require compensation. The benefit of this temperature independence is that FEDs have ‘‘instant on’’ capability across a wide range of temperatures, which appeals to military and automotive markets. The contrast ratio depends in part on the type of phosphor used. A popular phosphor, ZnO:Zn (zinc oxide:zinc), emits blue/green light and appears white when it is not excited. This results in a large amount of reflected ambient light. The ZnO:Zn phosphor has a contrast ratio between 20 : 1 and 40 : 1 in a dark room (60). In
comparison, the contrast ratio of a typical home television set is around 30 : 1 to 50 : 1 (61). With color phosphors, the contrast ratio can be increased by placing a black matrix between the pixels. Color purity, response time, and lifetime all depend on the choice of phosphor (monochrome vs. color, material composition). Color purity is probably the most difficult parameter to quantify because it is related to the presence (or lack) of impurities in color uniformity. These impurities can be due both to the phosphor and to fluctuations in magnetic fields (for CRTs) within the display. Qualitatively, color purity can be reported as good (having no localized distortions) or bad (containing distortions). Early FED development has focused on lower resolutions and smaller display sizes to facilitate research and development. To date, development efforts on FEDs have covered the 1/4 VGA to XGA pixel formats. Display sizes in these pixel formats range from less than 1 inch to as much as 13 inches in diagonal (62). As of May 2000, a 13.1-inch color SVGA FED had been demonstrated by Candescent, Inc., at a Society for Information Display exhibition. Power consumption of a field emissive display is dominated by two specific factors. Neglecting the power required for digital control logic, which is minimal, most of the energy is required to charge and discharge the capacitive columns and to drive the anode current across the vacuum gap (from emitter tip to phosphor). Capacitive switching power consumption is given by P = ½fCV², where f is the switching frequency, C is the capacitance, and V is the voltage being switched. Row capacitive switching power is quite small because only one row is switched at a time and the row scan rate is relatively slow. Column switching, however, can be very significant because of the large column voltage swing, the large number of individual columns, and the fact that all of the columns may be switched every row.
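The column-switching term can be evaluated in a few lines for a color VGA display. The values below follow the article's VGA example; the 1-nF column capacitance is an assumed figure:

```python
# Column-switching power for a color VGA FED, from P = (1/2) f C V^2.
# Values follow the article's VGA example; 1 nF/column is an assumption.
f_refresh = 60            # Hz, display refresh rate
rows = 480                # VGA visible rows
cols = 640 * 3            # 640 RGB triads -> 1,920 columns
C_col = 1e-9              # farads per column (assumed)
V_swing = 25.0            # volts of column swing

f_switch = rows * f_refresh        # worst case: columns switch every row
C_total = cols * C_col             # total switched capacitance

P_cols = 0.5 * f_switch * C_total * V_swing**2
print(f"switching rate: {f_switch / 1e3:.1f} kHz")  # 28.8 kHz
print(f"column power  : {P_cols:.1f} W")            # 17.3 W
```

Adding a typical 10–15 W of anode power (P = IV) brings the total toward the 30–45 W range quoted in the text.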
For the color VGA display example, 640 × RGB = 1,920 columns; assuming a column capacitance of 1 nF, the overall column capacitance is 1.92 µF. At a 60-Hz display refresh rate, the columns could potentially switch every row, that is, at 480 × 60 Hz = 28.8 kHz. For a 25-V column swing, the resulting column power consumption would be 17.3 W. The anode power consumption is calculated simply from the anode current and voltage by P = IV. This depends, in part, on the luminance that the display needs to produce, but values of 10–15 W for anode consumption are typical. Without attempting to reduce the column swing or to find more efficient phosphors, total display power consumption could be in the range of 30–45 W. The field emissive display lifetime is determined by phosphor degradation, emitter tip aging, and high-voltage breakdown between the anode and cathode. If a robust, high-voltage vacuum system can be maintained to eliminate the breakdown issue, phosphor and emitter tip stability remain to be solved. Differential aging describes the phenomenon whereby illuminated pixels degrade differently from dark ones. The same situation manifested itself in early CRTs, where screen savers were necessary to prevent a latent image from being ‘‘burned’’
into the screen. The goal is a display that can operate for more than 10,000 hours and age negligibly.

AUTHORS’ FINAL STATEMENTS

Although field emission displays have not yet entered the major consumer markets, the technology is growing toward that end. Within the next 5–10 years, this emerging technology will share in the world display market. The first company that can produce and profit by selling FEDs will most likely emerge as a powerhouse in this new arena of flat panel displays.

ABBREVIATIONS AND ACRONYMS

A/D  analog-to-digital
AECS  Automotive Energy Components Sector
AM  amplitude modulation
AMLCD  active matrix liquid crystal display
APCVD  atmospheric pressure chemical vapor deposition
ARC  antireflective coating
ATMI  Advanced Technology Materials, Inc.
CIE  Commission Internationale de l’Eclairage
CL  cathodoluminescent
CMP  chemical mechanical polishing or chemical mechanical planarization
CRT  cathode-ray tube
CTE  coefficient of thermal expansion
CVD  chemical vapor deposition
DARPA  Defense Advanced Research Projects Agency
DSP  digital signal processor
Ef  Fermi energy level
Ev  vacuum energy level
FEA  field emitter array
FED  field emission display
FPD  flat panel display
IBM  International Business Machines
IC  integrated circuit
ITO  indium tin oxide
LCD  liquid crystal display
LETI  Laboratoire d’Electronique de Technologie et de l’Informatique
NEA  negative electron affinity
NRL  Naval Research Laboratory
NTSC  National Television Standards Committee
OEM  original equipment manufacturer
PAL  phase alternating line
PECVD  plasma enhanced chemical vapor deposition
PVD  physical vapor deposition
PWM  pulse-width modulation
R&D  research and development
RIE  reactive ion etch
SiNx  silicon nitride
SiOxNy  silicon oxynitride
SRI  Stanford Research Institute
SVC  Silicon Video Corporation
SVGA  super video graphics array
TFT  thin film transistor
TI  Texas Instruments
TMAH  tetramethyl ammonium hydroxide
TRP  Technology Reinvestment Program
VC  venture capital
VFD  vacuum fluorescent display
VGA  video graphics array
XGA  extended graphics array
BIBLIOGRAPHY

1. R. Young, Semicond. Int. 97–100 (May 1998).
2. B. Fedrow, Solid State Technol. 60–67 (September 1998).
3. M. Abbey, Inf. Display 14, 14–17 (1998).
4. Stanford Resources Incorporated, FID 1999.
5. W. Schottky, Z. F. Physik 14, 80 (1923).
6. R. H. Fowler and L. W. Nordheim, Proc. R. Soc. London A 119, 173–181 (1928).
7. K. R. Shoulders, Adv. Comput. 2, 135–293 (1961).
8. Thin Electron Tube with Electron Emitters at Intersections of Crossed Conductors, US Pat. 3,500,102, March 10, 1970, M. E. Crost et al. (to United States of America).
9. C. A. Spindt, J. Appl. Phys. 39, 3,504 (1968).
10. H. Busta, J. Micromechanical Microengineering 2, 43–74 (1992).
11. J. D. Levine, Flat Panel Display Process and Res. Tutorial Symp., San Jose, June 21–22, 1995.
12. Process for Fabricating Self-Aligned Field Emitter Arrays, US Pat. 4,964,946, October 23, 1990, H. F. Gray and G. J. Campisi (to United States of America).
13. D. Pidge, Jpn. Econ. Newswire, July 4, 1989.
14. Vacuum Electronics Could Find Use in HDTV Technology, ASP 31(8), 33 (August 1989).
15. R. Meyer et al., Proc. Japan Display 513 (1986).
16. V. Comello, R&D Mag. (December 1996).
17. ATMI Announces Flat Panel Display Contract: $1.1 Million Contract for Field Emission Display Cathode a Step Toward Diamond Semiconductor Commercialization, Bus. Wire, May 3, 1994.
18. SVC Private Placement Memorandum, March 4, 1996, p. 40.
19. Candescent web page, http://www.candescent.com/Candescent/.
20. Candescent and Sony to Jointly Develop High-Voltage Field Emission Display (FED) Technology, Bus. Wire, November 2, 1998.
21. Candescent and Schott Partner to Establish Flat-Panel Display Manufacturing Infrastructure in the United States, Bus. Wire, May 14, 1997.
22. A. Van Oostrom, J. Appl. Phys. 33, 2,917–2,922 (1962).
23. C. A. Spindt et al., J. Appl. Phys. 42, 5,248–5,263 (1976).
24. J. E. Yater et al., J. Vac. Sci. Technol. A 16, 913–918 (1998).
25. A. Weber et al., J. Vac. Sci. Technol. A 16, 919–921 (1998).
26. R. N. Thomas et al., Solid-State Electron. 17, 155–163 (1974).
27. K. W. Wong et al., Appl. Surf. Sci. 140, 144–149 (1999).
28. R. W. Wood, Phys. Rev. 5, 1 (1897).
29. D. Temple, Mater. Sci. Eng. 5, 185–239 (1999).
30. J. P. Barbour et al., Phys. Rev. 117, 1,452 (1960).
31. H. Gamo et al., Appl. Phys. Lett. 73, 1,301–1,303 (1998).
32. Method of Making a Field Emission Device Anode Plate Having an Integrated Getter, US Pat. 5,520,563, May 28, 1996, R. M. Wallace et al. (to Texas Instruments, Inc.).
33. Anode Plate for Flat Panel Display Having Silicon Getter, US Pat. 5,614,785, March 25, 1997, R. M. Wallace et al. (to Texas Instruments, Inc.).
34. L. E. Shea, Electrochem. Soc. Interface 24–27 (1998).
35. H. W. Leverenz, An Introduction to Luminescence of Solids, Dover, New York, 1968, pp. 333–341.
36. K. Tominaga et al., Thin Solid Films 334, 35–39 (1998).
37. K. Utsimi et al., Thin Solid Films 334, 30–34 (1998).
38. I. Baia, Thin Solid Films 337, 171–175 (1999).
39. A. K. Kulkarni and S. A. Knickerbocker, J. Vac. Sci. Technol. A 14, 1,709–1,713 (1996).
40. D. V. Morgan et al., Renewable Energy 7, 205–208 (1996).
41. F. Neusch et al., Appl. Phys. Lett. 74, 880–882 (1999).
42. C. C. Wu et al., Appl. Phys. Lett. 70, 1,348–1,350 (1997).
43. S. Wolf and R. N. Tauber, Silicon Processing for the VLSI Era, Vol. 1: Process Technology, Lattice Press, Sunset Beach, 2000, pp. 524–527.
44. R. C. Hibbeler, Mechanics of Materials, Macmillan, NY, 1991, pp. 114–117.
45. K. P. Arges and A. E. Palmer, Mechanics of Materials, McGraw-Hill, NY, 1963, pp. 8–10.
46. R. T. Smith, Inf. Display 14, 12–15 (1998).
47. A. Y. Lee, Inf. Display 14, 30–33 (1998).
48. M. Madou, Fundamentals of Microfabrication, CRC Press, Boca Raton, 1997, pp. 53–113.
49. W. H. Dumbaugh and P. L. Bocko, Proc. SID 31, 269–272 (1990).
50. Method to Form Self-Aligned Gate Structures Around Cold Cathode Emitter Tips Using Chemical Mechanical Polishing Technology, US Pat. 5,372,973, April 27, 1993, T. Doan, B. Rolfson, T. Lowery, and D. Cathey (to Micron Technology, Inc.).
51. Method for Manufacturing Hollow Spacers, US Pat. 5,785,569, July 28, 1998, D. M. Stansbury, J. Hofmann, and C. M. Watkins (to Micron Technology, Inc.).
52. Spacers for Field Emission Display Fabricated Via Self-Aligned High Energy Ablation, US Pat. 5,232,549, Aug. 3, 1993, D. A. Cathey et al. (to Micron Technology, Inc.).
53. Sacrificial Spacers for Large Area Displays, US Pat. 5,716,251, Feb. 10, 1998, C. M. Watkins (to Micron Display Technology, Inc.).
54. Anodically Bonded Elements for Flat Panel Displays, US Pat. 5,980,349, November 9, 1999, J. J. Hofmann et al. (to Micron Technology, Inc.).
55. Method for Affixing Spacers within a Flat Panel Display, US Pat. 5,811,927, September 22, 1998, C. L. Anderson and C. D. Moyer (to Motorola, Inc.).
56. Flat Panel Display Having Spacers, US Pat. 5,789,857, August 4, 1998, T. Yamaura et al. (to Futaba Denshi Kogyo K.K.).
57. Field Emission Display Package and Method of Fabrication, US Pat. 5,788,551, August 4, 1998, D. Dynka, D. A. Cathey Jr., and L. D. Kinsman (to Micron Technology, Inc.).
58. J. W. Alpha, Electro-Optical Syst. Design 92–97 (1976).
59. Low Temperature Method for Evacuation and Sealing Field Emission Displays, US Pat. 5,827,102, Oct. 27, 1998, C. M. Watkins and D. Dynka (to Micron Technology, Inc.).
60. R. O. Peterson, FED Phosphors: Low or High Voltage?, Inf. Display 13(3), 22–24 (March 1997).
61. A. F. Inglis and A. C. Luther, Video Engineering, 2nd ed., McGraw-Hill, 1996, pp. 121–123.
62. Candescent Technologies Corporation, San Jose, CA, 2001. http://www.candescent.com/Candescent/showcase.htm.
FLOW IMAGING

NOEL T. CLEMENS
The University of Texas at Austin
Austin, TX
INTRODUCTION

Imaging has a long history in fluid mechanics and has proven critical to the investigation of nearly every type of flow of interest in science and engineering. A less than exhaustive list of flows where imaging has been successfully applied would include flows that are creeping, laminar, turbulent, reacting, high-temperature, cryogenic, rarefied, supersonic, and hypersonic. The wide range of applications for flow imaging is demonstrated by the recent development of techniques for imaging at micro- and macroscales. For example, (1) and (2) report imaging velocity fields in 100-µm channels, and (3) describes a schlieren technique for imaging density gradient fields around full-scale supersonic aircraft in flight for the study of sonic booms. Impressively, the range of flow length scales spanned by these techniques is more than six orders of magnitude. Traditionally, flow imaging has been synonymous with ‘‘flow visualization,’’ which usually connotes that only qualitative information is obtained. Examples of flow visualization techniques include the imaging of smoke that has been introduced into a wind tunnel or vegetable dye introduced into a water flow. Owing to the complex and often unpredictable nature of fluid flows, flow visualization remains one of the most important tools available in fluid mechanics research. Excellent compilations of flow visualization images captured in a number of different flows can be found in (4) and (5). Modern flow imaging, however, goes far beyond qualitative flow visualization. Advances in computer, laser, and digital camera technologies have enabled the development of imaging techniques for obtaining quantitative images of a large number of flow variables such as density, temperature, pressure, species concentration, and velocity. Image data of this type enable the computation of a number of quantities that are important in fluid mechanics research, including vorticity, strain rate, dissipation, and heat flux.
As an example of the power of flow imaging, consider Fig. 1, which shows a 3-D volume of the conserved scalar field in the far field of a turbulent water jet (6,7). The jet was seeded with a fluorescent dye, and the image volumes were captured by recording the fluorescence induced by a thin laser beam that was swept through the flow. The beam was swept rapidly in a raster fashion, and the fluorescent images were recorded by using a high-speed 2-D photodiode array. The resulting data volumes resolve the finest scales of mixing in three spatial dimensions and time, and when several such volumes are acquired sequentially, the data enable studying the temporal evolution of the conserved scalar field. These data can
Figure 1. Three-dimensional rendering of the conserved scalar (ζ ) field measured in a turbulent water jet using laser-induced fluorescence of a fluorescent dye seeded into the jet fluid. The cube is approximately 27 mm on each side, and the data resolve the finest scalar and vorticity scales in the flow. (Reprinted with permission from Quantitative Flow Visualization via Fully-Resolved Four-Dimensional Spatio-Temporal Imaging by W. J. A. Dahm and K. B. Southerland, in Flow Visualization: Techniques and Examples, A. J. Smits and T. T. Lim, eds., Imperial College Press, London, 2000.) See color insert.
yield details of the mixing process and even the complete 3-D, unsteady velocity vector field within the volume (7). This example shows that flow imaging is providing the type of multidimensional, multiparameter data that could be provided only by computational fluid dynamics not too long ago (8). Most imaging in fluid mechanics research involves planar imaging, where the flow properties are measured within a two-dimensional cross section of the flow. This is most often accomplished by illuminating the flow using a thin laser light sheet, as shown in Fig. 2, and then recording the scattered light using a digital camera. The laser light is scattered from either molecules or particles in the flow. The primary emphasis of this article will be on this type of planar laser imaging because it remains the cornerstone of quantitative imaging in fluid mechanics research. Furthermore, planar imaging is often a building block for more complex 3-D imaging techniques, such as that used to produce Fig. 1. Readers interested in
Figure 2. Schematic of a typical planar imaging experiment. (The schematic shows a pulsed laser, timing electronics, sheet-forming optics consisting of a cylindrical telescope and a spherical lens, the flow facility, a CCD camera with filter, a glass flat with white card and video camera for sheet monitoring, and image acquisition computers.)
details of qualitative flow imaging techniques should note that several good references are available in the literature (5,9,10). Quantitative imaging is substantially more challenging than simple visualization because a greater degree of knowledge and effort are required before the researcher can ensure that the spatial distribution of the flow property of interest is faithfully represented in the image. The first part of this article will discuss some of the most important issues that need to be addressed in quantitative flow imaging. The article will end with a brief survey of primarily planar imaging techniques that have been developed. This survey will not be able to discuss all, or even most, of the techniques that have been developed, but hopefully readers will gain an appreciation for the wide range of techniques that can be applied to their flow problems.

BASIC PLANAR LASER IMAGING SYSTEMS

Lasers

Lasers are used almost universally in flow imaging, owing to their high brightness, coherence, excellent focusing properties, and the nearly monochromatic range of wavelengths at which they operate. Lasers can be either pulsed or continuous wave (CW); pulsed lasers are more commonly used because they provide high-energy pulses that are sufficiently short (e.g., 10 ns) to freeze the motion of nearly any flow. Most lasers used in flow imaging operate at visible or UV wavelengths (11). One of the main reasons for this is that, until recently, there were few low-noise imaging arrays that operate outside of the UV-visible to near-IR wavelength range. Furthermore, some techniques, such as Rayleigh and spontaneous Raman scattering, increase in scattering efficiency as the frequency of the incident light increases, and therefore UV and visible lasers have a large advantage over IR sources.
Furthermore, planar laser-induced fluorescence (PLIF) techniques typically involve the excitation of atomic/molecular electronic transitions, which occur primarily at UV and visible wavelengths for species of interest in fluid mechanics. The predominance of techniques in the visible/UV is by no means absolute, however, as recent advances in laser and camera technology have enabled the development of PLIF techniques that rely on the excitation of vibrational transitions at IR wavelengths (12). The most widely used laser in flow imaging is the flashlamp-pumped neodymium:yttrium–aluminum–garnet (Nd:YAG) laser, which emits in the infrared (1.06 µm) but whose output is usually frequency-doubled (532 nm), tripled (355 nm), or quadrupled (266 nm) using nonlinear crystals (13). Frequency-doubled Nd:YAG lasers are used primarily in particle image velocimetry (PIV), Rayleigh and Raman scattering, and for pumping tunable lasers. Nd:YAG lasers are essentially fixed frequency, but when injection seeded (a technique used primarily to obtain very narrow line width), they can be tuned across a narrow frequency range. This ability to tune is used extensively in a class of techniques called filtered Rayleigh scattering (described later). Flashlamp-pumped Nd:YAG
lasers operate at repetition rates of a few tens of hertz and pulse energies of hundreds of millijoules at 532 nm. One drawback of flashlamp-pumped Nd:YAG lasers is that their repetition rates are typically much lower than the characteristic flow frequencies in most applications; the images are thus not temporally correlated and are effectively randomly sampled from the flow. Excimer lasers provide high-energy pulses of UV light (e.g., hundreds of millijoules at hundreds of hertz) in a narrow range of frequencies that depends on the particular gas mixture used. The most commonly used wavelengths in flow imaging are 193 nm (ArF), 249 nm (KrF), 308 nm (XeCl), and 350 nm (XeF). Because Rayleigh and Raman scattering are more efficient at short wavelengths, excimers are particularly attractive for these techniques. Furthermore, versions are commercially available that have narrow line width and are tunable over a small range. These lasers can be used to excite fluorescence from O2 and NO (193 nm) and from OH (248 and 308 nm) without using a dye laser. Copper-vapor lasers are pulsed lasers that produce visible light simultaneously at two wavelengths (510 and 578 nm) and operate at high repetition rates (tens of kHz) but relatively low pulse energies (a few mJ). Because of their high repetition rates, they have been used extensively for high-speed flow visualization (such as smoke scattering), but they are not as widely used as Nd:YAG lasers because of their relatively low pulse energies. Flashlamp-pumped dye lasers provide very high pulse energies (e.g., a few joules per pulse) but at repetition rates of just a few hertz. Because of their high pulse energies, they have been used primarily in imaging techniques where the signals are very weak, such as spontaneous Raman or Rayleigh scattering imaging.
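The short-wavelength advantage for Rayleigh and Raman scattering follows from the λ⁻⁴ dependence of the scattering cross section (a standard result of Rayleigh theory, stated here rather than derived in the text); a quick comparison:

```python
# Rayleigh-scattered signal scales as 1/lambda^4, which is why
# short-wavelength (UV) lasers are favored for Rayleigh/Raman imaging.
def rayleigh_gain(lam_ref_nm, lam_nm):
    """Relative Rayleigh signal at lam_nm vs. a reference wavelength."""
    return (lam_ref_nm / lam_nm) ** 4

# Frequency-quadrupled Nd:YAG (266 nm) vs. frequency-doubled (532 nm):
print(rayleigh_gain(532, 266))            # 16.0 -- a 16x signal advantage
# ArF excimer (193 nm) vs. 532 nm:
print(round(rayleigh_gain(532, 193), 1))  # 57.7
```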
For spectroscopic techniques, where it is necessary to tune the laser wavelength to coincide with an atomic/molecular absorption line, laser-pumped dye lasers and, more recently, optical parametric oscillators (OPOs) are used. Both dye lasers and OPOs are typically pumped by Nd:YAG lasers, although dye lasers are also pumped by excimers. The use of CW lasers is limited to low-speed flows (typically liquids) or to high-speed flows where only time-averaged measurements are desired. The reason is that they typically provide insufficient energy in times that are short enough to freeze the motion of most gas flows. For example, a 20-W CW laser provides only 0.02 mJ of energy in one microsecond, compared to a frequency-doubled Nd:YAG laser that can provide up to 1 J per pulse in 10 ns. The argon-ion laser is the most commonly used CW laser in flow imaging; it has found a niche particularly for laser-induced fluorescence of dyes seeded into liquid flows. Some techniques, such as cinematographic imaging, require high-repetition-rate light sources such as copper-vapor or high-repetition-rate diode-pumped Nd:YAG lasers. The latter achieve repetition rates up to hundreds of kHz by acousto-optic Q-switching of a continuously pumped Nd:YAG rod. The drawback of these high-repetition-rate lasers is that they tend to have low energy per pulse (a few millijoules maximum), despite relatively high average power (e.g., 20–50 W). For
FLOW IMAGING
slower flows, electro-optically Q-switched diode-pumped Nd:YAG lasers can produce repetition rates of the order of a kilohertz and pulse energies of the order of tens of millijoules at 532 nm. Recently, a pulse-burst Nd:YAG laser has been developed that produces a train of up to 100 pulses at a rate as high as 1 MHz and individual pulse energies at 532 nm of about 25 mJ (14). In another technique, repeated Q-switching of a ruby laser (694 nm) was used to generate a train of 65 pulses at a rate of 500 kHz, where the energy of each of the 65 pulses was about 350 mJ (15). If this laser could operate continuously, its average power would be an impressive 175 kW. These laser systems are not currently available commercially, but they are particularly well suited for imaging very high-speed flows.
Optics

The focusing properties of laser beams are related to the mode structure of the beam, specifically to the number of transverse electromagnetic modes (TEM) that characterize the energy flux field (16). A single-mode (TEM00) laser beam has a Gaussian intensity distribution and is considered diffraction-limited. Note that in this article, the term "intensity" refers to the power density, or irradiance, of the laser beam (in units of W/m²), whereas the term "fluence" refers to the energy density (in units of J/m²). The focusing properties of diffraction-limited beams are described by Gaussian optics (16). Many laser beams, however, are not diffraction-limited because they contain many transverse modes. Multimode beams have higher divergence and poorer focusing characteristics than single-mode beams. The degree to which a beam is multimode is often specified by the M² value (pronounced "M-squared"), where the more multimode the beam, the higher the M² value; M² equals unity for a diffraction-limited beam. Many scientific lasers have M² values of 1 to 2, although some lasers, such as copper-vapor or high-power diode-pumped Nd:YAG lasers, can have M² values ranging from tens to hundreds. To see the effect of nonunity M², define the beam diameter d as twice the radius where the laser beam intensity drops to e⁻² of the maximum. Assume that a laser beam whose initial diameter is d is focused by a spherical lens of focal length f. The resulting focal spot will have a diameter d0 given by the relationship (17)

d0 = 4fλM²/(πd).   (1)

The focal spot diameter for a Gaussian (diffraction-limited) beam is 4fλ/(πd); thus Eq. (1) is the same as for a Gaussian beam, except that λ is replaced by λM². Equation (1) shows that the multimode focal spot diameter is M² times the diffraction-limited value for equal beam diameter at the focusing lens. Owing to this, a laser beam is often referred to as being "M² times diffraction-limited," meaning that it will have M² times the spot size. Equation (1) also shows that it is possible to get a smaller focal spot by using a shorter focal length lens or by increasing the initial beam diameter (by using a beam-expanding telescope).

When the beam diameter at the lens is the same for both diffraction-limited and multimode beams, the far-field full-angle divergence, θ = d/f, is the same for both beams. However, if the focal spot sizes (d0) are made the same (which requires that the multimode beam have a larger diameter at the lens), then the divergence will be M² times larger for the multimode beam. This is seen by considering the Rayleigh range, which is an important parameter in imaging because it is a measure of the distance across which the laser beam (or sheet) remains focused. The Rayleigh range xR is defined as the distance along the beam from the focus to the point where the beam diameter is √2 times the diameter at the focus. The relationship is

xR = πd0²/(4λM²),   (2)

which is the same as the Rayleigh range for a Gaussian beam, except that λ has been replaced by λM². Equation (2) shows that for equal spot size, as M² increases, the Rayleigh range decreases because of the greater divergence. It can be concluded from this that aberrated beams can be focused to as small a spot as a diffraction-limited beam (by expanding the beam before the focusing lens), but the focal spot cannot be maintained over as large a distance. Note that the M² value can usually be obtained from the laser manufacturer, but it can also be measured by passing the beam through a lens of known focal length and then measuring the beam diameter at several locations (17). In planar imaging, the laser beam is formed into a thin sheet, which can be accomplished by several different techniques (11). One of the more common methods is shown in Fig. 2, where a spherical lens, typically plano-convex and of focal length 500 to 1000 mm, is used to focus the beam near the center of the field of view of the camera. Such long focal length lenses are used to increase the Rayleigh range, that is, the distance across which the beam remains focused. The larger Rayleigh range obtained from long focal length lenses does not come without a cost, however, because longer focal length lenses also produce larger focal spots, or thicker sheets, in planar imaging. Figure 2 also shows the use of a cylindrical telescope formed from a plano-convex lens of focal length f1 and a larger plano-convex lens of focal length f2. For high peak power laser beams, it may be best to use a negative (plano-concave) lens as the first lens to avoid a real focus and hence reduce the possibility of air breakdown. The reason for using two plano-convex lenses, where the convex sides are directed toward the collimated beam, is that this configuration minimizes the aberrations for a telescope formed from simple spherical lenses (18).
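The scalings of Eqs. (1) and (2) are easy to explore numerically. The sketch below uses illustrative values only (a 532-nm frequency-doubled Nd:YAG beam of 8-mm diameter and a 750-mm sheet-forming lens; none of these numbers come from the text) to compute the focal spot size and Rayleigh range for several beam qualities at fixed beam diameter at the lens:

```python
import math

def focal_spot_diameter(f, wavelength, m2, d_beam):
    """Focal spot diameter for a beam of quality M^2 focused by a lens
    of focal length f, Eq. (1): d0 = 4*f*lambda*M^2 / (pi*d)."""
    return 4.0 * f * wavelength * m2 / (math.pi * d_beam)

def rayleigh_range(d0, wavelength, m2):
    """Rayleigh range of the focused beam, Eq. (2): xR = pi*d0^2 / (4*lambda*M^2)."""
    return math.pi * d0**2 / (4.0 * wavelength * m2)

# Illustrative parameters: frequency-doubled Nd:YAG sheet-forming optics.
lam = 532e-9   # wavelength [m]
f = 0.75       # focal length of the sheet-forming lens [m]
d = 8e-3       # beam diameter at the lens [m]

for m2 in (1.0, 2.0, 10.0):
    d0 = focal_spot_diameter(f, lam, m2, d)
    xr = rayleigh_range(d0, lam, m2)
    print(f"M^2 = {m2:5.1f}: focal spot (sheet thickness) ~ {d0 * 1e6:7.1f} um, "
          f"Rayleigh range ~ {xr * 1e3:6.1f} mm")
```

At fixed beam diameter at the lens, the spot grows as M² while the Rayleigh range grows with it; conversely, expanding a multimode beam to recover the diffraction-limited spot size shrinks the Rayleigh range by M², as the text notes.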
The cylindrical lenses expand the laser beam in one direction only, by a factor of f2/f1. Because the laser sheet height is determined by the height of the second cylindrical lens, producing large sheets (e.g., 100 mm) requires a large lens, which can be very expensive. Often, the second lens is omitted, and the sheet is allowed to diverge. The disadvantage is that the laser intensity then varies in the propagation direction, which can make it harder to
correct the image of the scattered light for variations in intensity. Because a laser sheet is formed by expanding the beam in only one direction by using a cylindrical lens, the thickness of the sheet at the focus is approximately equal to the focal spot diameter given by Eq. (1). When the sheet thickness must be measured, this can be accomplished by the scanning knife-edge technique. In this technique, a knife-edge (e.g., a razor blade) is placed normal to the laser sheet and is translated across it so that the beam is progressively blocked by more of the knife-edge. The transmitted light is measured by a power meter as the knife-edge is translated. The derivative of the power versus distance curve is the mean sheet intensity profile. For example, if the laser sheet intensity profile is Gaussian, then the knife-edge transmission curve will be an error function. Spreading the laser beam into a sheet results in a large reduction in the intensity (or fluence); thus, when the intensity must be maximized, such as in Raman scattering imaging, the laser sheet can be formed by using a multipass cell (19). In this case, the laser beam is reflected back and forth between two confocal cylindrical mirrors. The main problem in this technique is that the sheet intensity profile is very nonuniform, and the nonuniformity may be difficult to correct for on a single-shot basis; shot-to-shot fluctuations in the intensity distribution can then be left as an artifact in the image. Another technique that can be used in low-velocity flows is the scanning method, where a CW laser beam is swept past the field of view by a moving mirror (6). If time-resolved data are desired, the sweep time must be short enough to freeze the motion of the flow. Because of this, the scanning technique is really useful only in liquid flows, whose characteristic flow timescales are relatively long.
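The knife-edge measurement described above can be sketched numerically. The simulated sheet below (the 0.25-mm 1/e² half-width, the 1-W power, and the scan grid are assumed for illustration) produces an error-function transmission curve, and differentiating it recovers the Gaussian profile and the 1/e² sheet thickness:

```python
import math
import numpy as np

# Simulated knife-edge scan of a Gaussian laser sheet (illustrative values).
w, P0 = 0.25e-3, 1.0                  # 1/e^2 half-width [m], total power [W]
s = np.linspace(-1e-3, 1e-3, 401)     # knife-edge positions [m]

# Transmitted power past the knife-edge: integrating the Gaussian
# intensity profile from s to infinity gives an error-function curve.
power = np.array([0.5 * P0 * math.erfc(math.sqrt(2.0) * x / w) for x in s])

# The mean sheet intensity profile is the (negative) derivative of
# transmitted power versus knife-edge position.
profile = -np.gradient(power, s)

# Recover the 1/e^2 full thickness from the reconstructed profile.
peak = profile.max()
inside = s[profile >= peak * math.exp(-2)]
thickness = inside[-1] - inside[0]
print(f"recovered 1/e^2 sheet thickness: {thickness * 1e6:.0f} um "
      f"(true value: {2 * w * 1e6:.0f} um)")
```

In a real scan, the measured power curve would be noisy, so in practice the derivative is usually taken after smoothing or by fitting an error function to the data.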
Cameras

The most commonly used cameras in quantitative imaging are based on charge-coupled device (CCD) arrays or image-intensified CCD arrays. Note that there are a few applications where film may be preferred to a digital camera, such as large field-of-view PIV (20) and high-framing-rate PIV (21,22). Nevertheless, CCD arrays have largely supplanted film and other detectors, including TV tubes, photodiode arrays, and charge-injection device (CID) arrays, owing to their low noise, excellent linearity, uniformity, and resistance to blooming. The operation of a CCD is based on the fundamental property that a photon incident on the CCD produces an electron–hole pair in a region of silicon that is biased to some potential. The electrons generated are called "photoelectrons"; they migrate to the "potential well" of the CCD pixel, where they are stored for later readout. Because the CCD stores charge, it is essentially a capacitor whose charge is proportional to the number of incident photons. The quantum efficiency η is the ratio of the number of photoelectrons generated to the number of photons incident. Front-illuminated CCDs have quantum efficiencies of 10–50% at visible and near-IR wavelengths (peaking near 700 nm), but their efficiency is virtually zero at UV and mid-IR wavelengths.
Back-illuminated CCDs, although more expensive, provide quantum efficiencies up to 90% in the visible and can maintain good response (e.g., η = 20%) well into the UV. CCD arrays can be full frame, frame transfer, or interline transfer type (23). Full frame CCD arrays read out the charge by shifting it down through the entire array (like a ‘‘bucket brigade’’) into an output register where it is then read out serially. Because the array is used to shift the charge, the image will be blurred if the CCD is exposed during readout. Because readout can take several seconds, a mechanical shutter must be used. In contrast, frame transfer CCD arrays use a photosensitive array and an identical array that is masked off from any incident light. After an exposure, the charge of each pixel is shifted down through the array into the masked array, and the masked array is then read out in the same manner as a full frame CCD array. Frame transfer CCD arrays offer some level of electronic shuttering, but this is limited to a few milliseconds. The pixel area for both full frame and frame transfer CCD arrays is 100% photosensitive, thus the pixel width is the same as the pixel pitch (spacing). Interline transfer CCD arrays have nonphotosensitive storage registers located adjacent to the photosensors. This enables the rapid transfer of charge (in parallel) from the pixels into the storage registers. This makes it possible to rapidly shutter the array electronically, where exposure times of the order of microseconds or less are possible. The interline transfer arrays also enable ‘‘frame straddling,’’ whereby two frames can be captured in rapid succession. For example, standard RS-170 format video cameras based on interline transfer arrays can acquire two video fields in less than 10 µs between frames (24). More expensive scientific grade interline transfer cameras report interframe times as short as 200 ns. 
Frame-straddling by video cameras is useful for double-pulse imaging in high-speed flows (25), whereas frame-straddling by higher resolution scientific/industrial cameras (e.g., Kodak ES1.0 and ES4.0) is now becoming the norm for PIV because it enables the use of cross-correlation processing algorithms. The main drawback of interline transfer imagers is that they tend to be noisier than either full frame or frame transfer imagers. The main reason for this is that the storage registers are located adjacent to the photosensitive sites; therefore the photosensitive area of the pixel is substantially smaller than the physical area of the pixel. The fraction of the pixel area that is photosensitive is called the "fill factor" and is typically 20–30% for an interline transfer CCD. As will be discussed later, the signal scales with the number of photons collected per pixel; thus low fill factors result in low signals. Some manufacturers mitigate this problem to some extent by using microlenses over each pixel to collect light across a larger area, which can increase the fill factor to about 60%. If neither electronic shuttering nor frame straddling is required, then full frame or frame transfer imagers are desired to maximize the signal-to-noise ratio (SNR). Generally, the relatively long shutter times are not a problem when pulsed lasers are used because the laser pulse duration acts as the exposure time. Intensified CCD cameras (ICCD) are used for low light-level imaging and for very short exposure times (e.g., as
low as a few nanoseconds). The most common type of image intensifier consists of a photocathode, a microchannel plate, a phosphor screen, and a mechanism to couple the screen to the CCD (26,27). Photons incident on the photocathode eject photoelectrons, which in turn are amplified in the microchannel plate. The amplified electrons strike the phosphor screen, causing photon emission, and these photons are collected by the CCD. The phosphor screen is usually coupled to the CCD by a fiber optic bundle, although lens coupling is also used. Image intensifiers are shuttered by switching on and off, or "gating," the photocathode by a high-voltage pulse. The electron gain is a function of the voltage applied across the microchannel plate. Short duration gating is necessary to reject the background luminosity of very luminous flows, such as sooting flames or plasmas. Because the duration of the laser scattering signal is often of the order of several nanoseconds, short gates greatly reduce the background luminosity but do not affect the signal. One of the main drawbacks of intensified CCD cameras is that the intensifiers tend to have both lower resolution and lower signal dynamic range than the bare CCD. The signal dynamic range is usually limited by saturation of the microchannel plate, particularly at high electron gain (26), rather than by saturation of the CCD itself. Furthermore, as will be shown later, it is unlikely that an ICCD camera will provide better SNR than a low-noise CCD camera under the constraint that a certain minimum SNR is required for an image to be useful for quantitative analysis. For these reasons, ICCD cameras are preferred to low-noise UV-sensitive CCD cameras only when fast gating is required, which is why they are primarily used for imaging high-temperature gases.
SIGNAL AND NOISE

One of the most critical issues in flow imaging is obtaining an adequate SNR. Imaging measurements that use laser light scattering are particularly susceptible to a low SNR because the laser beam must be spread out into a sheet; thus, signals are lower by hundreds to thousands of times compared to a point measurement made with the same laser energy. Figure 3 shows a generic camera system that views a region in the flow illuminated by a laser light sheet of height yL and thickness Δz. Assume that the camera uses an array sensor and a lens of known focal length f and limiting aperture diameter D. Each pixel of the camera, of width δx and height δy, transforms to a region in the flow of dimensions Δx = δx/m, Δy = δy/m, where m = yi/yo is the magnification and yi and yo are as defined in Fig. 3. Each pixel also spatially integrates the signal in the z direction across a distance equal to the sheet thickness Δz. Note that usually in flow imaging the image is inverted, and the magnification is typically less than unity, that is, the object is minified. Now, assuming that a pulsed laser light sheet of local fluence FL is used, the number of photons collected by each pixel Spp will be

Spp = (FL/hν)(dσ/dΩ) V n Ω ηt,   (3)

where h is Planck's constant, ν is the laser frequency, V = ΔxΔyΔz is the volume imaged by each pixel, dσ/dΩ is the differential scattering cross section, n is the number density of the scattering medium, Ω is the solid angle subtended by the lens, and ηt is the transmission efficiency of the collection optics (lens and spectral filters). For a CW laser, FL = IL t, where IL is the laser intensity (power flux density) and t is the integration time. The solid angle, Ω = (πD²/4)/zo² (where zo is the distance from the object to the lens), is related to the magnification and f number (f# = f/D) of the lens by

Ω = (π/4) m²/[f#²(m + 1)²].   (4)
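Equations (3) and (4) can be combined into a quick photon-budget estimate. The sketch below is illustrative only: the 300-mJ, 532-nm sheet (50-mm tall, 0.3-mm thick), the f/1.2 lens at m = 0.5, the 9-µm pixels, the Rayleigh-like cross section (order of magnitude only), and air at standard number density are all assumed values, and the fluence is taken as EL/(yL Δz) so that the units work out:

```python
import math

H, C = 6.626e-34, 2.998e8   # Planck's constant [J s], speed of light [m/s]

def solid_angle(f_num, m):
    """Collection solid angle of the lens, Eq. (4)."""
    return math.pi * m**2 / (4.0 * f_num**2 * (m + 1.0)**2)

def photons_per_pixel(E_L, wavelength, y_L, dz, dx, dy, dsigma_dOmega, n, eta_t, omega):
    """Photons collected per pixel, Eq. (3), for a pulsed sheet of fluence
    F_L = E_L/(y_L*dz) and a pixel volume V = dx*dy*dz in the flow."""
    fluence = E_L / (y_L * dz)                # [J/m^2]
    volume = dx * dy * dz                     # [m^3]
    photon_energy = H * C / wavelength        # h*nu [J]
    return (fluence / photon_energy) * dsigma_dOmega * volume * n * omega * eta_t

m = 0.5                                       # assumed magnification
omega = solid_angle(1.2, m)
spp = photons_per_pixel(E_L=0.3, wavelength=532e-9, y_L=50e-3, dz=0.3e-3,
                        dx=9e-6 / m, dy=9e-6 / m,   # 9-um pixels mapped into the flow
                        dsigma_dOmega=6e-32, n=2.5e25, eta_t=0.8, omega=omega)
print(f"Omega = {omega:.4f} sr, Spp = {spp:.0f} photons/pixel")
```

Even with hundreds of millijoules per pulse, a Rayleigh-like cross section yields only a few hundred photons per pixel, which is why fast lenses and large pixels matter so much in these techniques.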
Figure 3. Planar laser imaging of a flow field using an array detector.
Assuming that the laser sheet is uniform (i.e., the fluence is constant), the fluence can be approximated as FL = EL/yL, where EL is the laser energy. Combining Eqs. (3) and (4) and substituting Δx = δx/m and Δy = δy/m gives

Spp = (EL/hν)(δxδy/yL)(dσ/dΩ)(π/4)[1/(f#²(m + 1)²)] n ηt.   (5)

Equation (5) shows that the photons collected per pixel actually increase as m → 0, or as the camera is moved farther from the object plane. This may seem counterintuitive because the solid angle subtended by the lens progressively decreases. The reason is that Δx and Δy increase as the magnification decreases, which means that each pixel collects light from a larger region of the flow. This is correct as the problem has been posed, but it is not realistic, because it assumes that the laser sheet has the same fluence regardless of the field of view. In practice, as the camera is moved farther away, the laser sheet must be enlarged to accommodate the larger field of view. To see this effect, assume that the condition yL = yo must be maintained as the magnification is changed; in this case yL = Np δy/m, where Np is the number of pixels in one column of the array. Now, Eq. (5) reduces to

Spp = (EL/hν)(yi/Np²)(dσ/dΩ)(π/4)[m/(f#²(m + 1)²)] n ηt.   (6)

This form of the equation is probably the most useful for seeing the effect of varying different parameters. For example, Eq. (6) shows that the signal depends only on the laser energy (actually, the term EL/hν represents the total number of incident photons) and is independent of Δz, or of how tightly the sheet is focused. Although tighter focusing increases the fluence, this effect is counteracted by a decrease in the number of molecules available to scatter the light. In addition, as the number of pixels is increased (at fixed detector size yi), the signal decreases because the pixels are smaller and thus collect light from a smaller area of the flow. This shows the importance of having large pixels (or small Np at fixed yi) to improve the SNR, albeit possibly at the expense of resolution. The trade-off between SNR and resolution is a fundamental one, whose manifestation in point measurements is the trade-off between SNR and bandwidth (or response time). Equation (6) also shows that Spp ∼ m/(m + 1)², a dependence that is plotted in Fig. 4. Here, it is seen that the signal is maximized at a magnification of unity and that there is an abrupt decrease in signal as m → 0. Equation (6) also shows that the signal is inversely proportional to f#², and thus it is essential in many imaging techniques to use lenses that have low f numbers. For several techniques, such as PLIF, Rayleigh scattering, and Raman scattering in gas-phase flows, it is difficult to obtain adequate SNRs using lenses whose f numbers are higher than f/1.2.

Figure 4. Relative variation of photons-per-pixel (Spp) versus magnification for a typical planar imaging experiment.

Equation (6) gives the number of photons incident on a pixel of a generic detector. The resulting signal then consists of the photoelectrons that are generated, whether by creating an electron–hole pair in a CCD or by ejecting an electron from a photocathode. The signal Se (in units of electrons, designated as e−) is given by

Se = ηSppG,   (7)
where G is the overall electron gain from the photocathode to the CCD. For an unintensified CCD, G = 1. The noise in the signal has several sources, but the dominant sources in scientific grade CCD and ICCD cameras are shot noise and "read" noise. Shot noise results from statistical fluctuations in the number of photoelectrons generated at each pixel. The statistical fluctuations of photoelectrons and photons exhibit Poisson statistics, for which the variance is equal to the mean (28). Most of the shot noise arises from statistical fluctuations in the photoelectrons generated, although some noise is induced in the amplification process of image intensifiers. The shot noise (in units of e−), which is the square root of the variance, is given by (29)

Nshot = G(ηκSpp)^1/2,   (8)
where κ is the noise factor. The noise factor quantifies the noise that is induced through the overall gain process between the photocathode and the array; for an ICCD, it is gain dependent and falls within the range 1.5 < κ < 2.5. In an unintensified CCD, G = κ = 1, and the shot noise is equal to (ηSpp)^1/2, which is the square root of the number of photoelectrons collected per pixel during the integration period. One way of interpreting the shot noise in a detector array is to consider the case where the array is composed of identical pixels and is illuminated by a spatially uniform light source. If each pixel collects an average of 1000 photons during the integration time and if η = 0.1, then, on average, each pixel will collect 100 photoelectrons. However, the actual number of photoelectrons collected will vary from pixel to pixel, and compiling a histogram of the pixel values will reveal that the variance of the distribution is equal to the mean number of photoelectrons collected per pixel.
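The histogram experiment described above can be reproduced with a short simulation (the 100,000-pixel array and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

# Uniform illumination: each pixel collects on average 1000 photons,
# detected with quantum efficiency eta = 0.1 -> mean of 100 photoelectrons.
eta, mean_photons, n_pixels = 0.1, 1000, 100_000
photoelectrons = rng.poisson(eta * mean_photons, size=n_pixels)

mean = photoelectrons.mean()
var = photoelectrons.var()
print(f"mean = {mean:.1f} e-, variance = {var:.1f}")        # variance ~ mean (Poisson)
print(f"shot-noise-limited SNR ~ {mean / np.sqrt(var):.1f}")  # ~ sqrt(100) = 10
```

The sample variance comes out equal to the sample mean to within statistical scatter, which is the defining property of Poisson shot noise.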
The dominant noise source intrinsic to scientific grade CCD cameras is ‘‘read noise’’ (30). Read noise is incurred in the output registers in the process of converting the charge of each pixel into a voltage that can be read by an analog-todigital converter. A pixel is read by transferring the charge of each pixel to a small capacitor, whose integrated charge is converted to a voltage by an on-chip amplifier. The dominant sources of read noise are dark-current shot noise, ‘‘reset noise,’’ and output amplifier noise. Dark current is the current that is generated in the absence of incident light due to thermally induced charge carriers. Cooling a CCD greatly reduces the dark current. For example, an uncooled CCD might generate a dark current of 300 e− /s at 20 ° C, but only 1 e− /s at −40 ° C. Owing to the relatively short exposure and readout times that are typically used in flow imaging (of the order of 10 seconds or less), shot noise in dark current is not usually a large contributor to the noise in cooled CCD arrays. Reset noise is injected into the small capacitor by a switching transistor, whose job is to reset the capacitor to a reference voltage in preparation for reading the next pixel’s charge. This switching transistor contaminates the capacitor charge with both ‘‘digital feedthrough’’ and thermal noise. Digital feedthrough noise is caused by capacitive coupling of the clock signals through the switching transistor. These noise sources can be greatly limited by slow (low-bandwidth) readout rates and correlated double sampling (30,31). Because means have been developed to reduce these noise sources, the intrinsic camera noise is typically limited by the on-chip output amplifier to a few electrons rms per pixel (typically 5–20 e− ). When photoelectron shot noise is not the only noise source, then it is assumed that the noise sources are uncorrelated and therefore their variances add. In this case, the SNR is given by (29) SNR =
ηSppG/(ηκSppG² + Ncam²)^1/2,   (9)
where Ncam is the intrinsic background noise of the camera (in electrons rms) and includes contributions from amplifier noise, digital feedthrough noise, thermal noise, dark-current shot noise, and quantization noise from the analog-to-digital converter. There are several interesting implications of Eq. (9). The first is seen by considering the limit when the signal is dominated by shot noise, that is, when ηκSppG² ≫ Ncam². This shot-noise-limited operation of the detection system occurs when either the read noise is small or the signal is high. Equation (9) also shows that it is possible to obtain shot-noise-limited operation by increasing the gain until the first noise term dominates the other. This is the way an image intensifier works; it provides very high electron gain through the microchannel plate and thus causes the shot noise to overwhelm the intrinsic noise sources in the camera. It may seem odd that the goal is to increase the noise, but the signal is also increased as the gain increases, so the SNR either improves or remains constant. At low gain, the signal will be detector-noise-limited. As the gain is increased to arbitrarily high levels, the SNR continues to improve until it reaches the shot-noise limit, beyond which the SNR is
constant. This is seen in Eq. (9) by letting G → ∞, in which case the SNR becomes independent of G. Because electron gains of 10³ are typical of single-plate microchannel intensifiers, it is possible to operate in the shot-noise-limited regime even when the camera that stores the image has relatively high noise, such as a video-format CCD camera. The dynamic range of a CCD, defined as the ratio of the maximum to the minimum usable signals, is limited by the well depth (the total number of photoelectrons that can be stored in a CCD pixel) and the intrinsic noise of the camera. Specifically, the dynamic range DR is given by (29)

DR = (Se,sat − Sdc)/Ncam,   (10)
where Se,sat is the signal at saturation (full well) and Sdc is the integrated dark charge. For example, a cooled slow-scan CCD array that has a short integration time (hence low Sdc), a well depth of 10⁵ e−, and noise of 10 e− gives DR ≈ 10⁴, which is much larger than can usually be obtained in single-shot planar imaging. The dynamic range of an ICCD can be much smaller than this because the electron gain from the photocathode to the CCD effectively reduces the well depth of the CCD (29). For example, if the overall electron gain is 10², then a CCD that has a well depth of 10⁵ e− will saturate when only 10³ photoelectrons are generated at the photocathode. In addition, ICCD cameras may have an even lower dynamic range than that allowed by saturation of the CCD well because of saturation of the microchannel plate (26). Figure 5 shows how the SNR varies as a function of the number of photons per pixel for cameras of high and low read noise, as might be found in video-format and slow-scan CCD cameras, respectively. In this figure, it is assumed that η = 0.7 and Ncam = 10 e− for the low-noise camera, and η = 0.7 and Ncam = 200 e− for the high-noise camera. Also shown is the case where the high-noise camera has been intensified. It is assumed that the intensified camera
Figure 5. Variation of the SNR versus signal (Spp) for three different camera systems.
has a lower quantum efficiency (η = 0.2) and G = 500. Dark charge has been neglected in all cases. The high-noise camera is camera-noise-limited for the entire range of Spp (hence, the slope of unity on the log–log plot), whereas the low-noise camera is camera-noise-limited only for low Spp. As expected, the SNR is substantially higher for the low-noise camera at all Spp. At higher Spp, the low-noise camera becomes shot-noise-limited, as seen by the region where the slope is one-half on the log–log plot. By intensification, the high-noise camera reaches the shot-noise limit even at very low Spp; this results in an SNR that is even higher than that of the low-noise camera. However, for Spp greater than about 60, the low-noise camera outperforms the intensified camera, owing to its higher quantum efficiency. Figure 5 also shows that at an overall electron gain of 500, if the well depth is 10⁵ e−, the intensified camera saturates the CCD when 1000 photons are incident per pixel. One point to consider is that for flow imaging, it is usually not necessary or desired to intensify a slow-scan low-noise CCD camera, unless gating is required to reject a luminous background. The main reason is that if the signal is so low that read noise is a significant contributor to the total noise, then it is unlikely that single-shot images will be useful for quantitative purposes. For example, assume that a minimum SNR of 20 is desired for quantitative analysis and that the intensified slow-scan camera has κ = η = 1, is operated at high gain, and the CCD has 10 e− rms of read noise. If 100 e− are collected per pixel, then the high gain overwhelms the read noise, and the signal is shot-noise-limited, that is, SNR = (100)^1/2 = 10, which is well below our minimum value. Now, assuming that 500 e− are collected, the SNR based only on shot noise is (500)^1/2 ≈ 22.
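The numbers in this worked example follow directly from Eq. (9), as the short sketch below (plain Python, no measured data) confirms, along with the dynamic range estimate of Eq. (10):

```python
import math

def snr(spp, eta, gain=1.0, kappa=1.0, n_cam=0.0):
    """SNR of a (possibly intensified) CCD pixel, Eq. (9):
    eta*Spp*G / sqrt(eta*kappa*Spp*G^2 + Ncam^2)."""
    signal = eta * spp * gain
    noise = math.sqrt(eta * kappa * spp * gain**2 + n_cam**2)
    return signal / noise

spp = 500   # photoelectrons collected (eta = kappa = 1 in the text's example)
print(f"slow-scan CCD (10 e- read noise):  SNR = {snr(spp, 1.0, n_cam=10):.1f}")
print(f"ideal shot-noise limit:            SNR = {snr(spp, 1.0):.1f}")
print(f"video CCD (100 e- read noise):     SNR = {snr(spp, 1.0, n_cam=100):.1f}")
print(f"video CCD + intensifier (G = 500): SNR = {snr(spp, 1.0, gain=500, n_cam=100):.1f}")

# Dynamic range of the bare CCD, Eq. (10), neglecting dark charge:
well_depth, read_noise = 1e5, 10
print(f"CCD dynamic range ~ {well_depth / read_noise:.0f}")
```

The intensified video camera recovers the shot-noise limit of about 22, while the unintensified video camera is stuck near 5 and the low-noise slow-scan CCD already reaches about 20 without any intensifier, which is the point of the argument that follows.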
However, at these signal levels, the signal is nearly shot-noise-limited even without the intensifier, because including the camera noise gives an SNR ≈ 20; thus there would be very little benefit in intensifying the CCD. The fact that the intensifier is likely to have a smaller signal dynamic range, worse resolution, lower quantum efficiency, and a larger noise factor than the CCD makes intensification even less desirable. It is also interesting to consider how the high-noise camera would perform with the signal of 500 e−. In video-format cameras, the read noise will be about 100–200 e− rms. Using the lower value, the SNR for the video camera would be 500/100 = 5. In this case, adding an image intensifier would be an advantage because high electron gain could be used to obtain shot-noise-limited operation, so that the SNR = (500)^1/2 ≈ 22 (assuming equal η with and without intensification).

IMAGE CORRECTIONS

Quantitative imaging always requires several correction steps so that the measured signal can be related to the flow property of interest and to ensure that the spatial structure of the object is faithfully represented by the image. First, consider corrections to the signal measured at each pixel of the array. Most planar imaging involves only relative measurements of signal intensity, from which absolute measurements can be obtained by calibrating a single point within the image. To obtain an image
that represents quantitatively accurate relative intensity measurements requires making several corrections to the measured image. For example, let Se (x, y) represent the desired signal level at a given pixel or location on the array (x, y). By ‘‘desired’’ it is meant that Se (x, y) is proportional to the number of photons incident on that pixel originating from the scattering process of interest. The signal Se can be related to the total signal (Stot ) recorded at that pixel by the imaging system through the relationship
Stot(x, y, ti, tro) = w(x, y)[L(x, y)Se(x, y) + Sback(x, y, ti)] + Sdark(x, y, tro),    (11)
where L(x, y) is a function that is proportional to the laser sheet intensity (or fluence) distribution function, Sback is the signal resulting from unwanted background light, Sdark is the fixed-pattern signal that occurs with no light incident on the detector, ti is the exposure time, and tro is the array readout time (which includes the exposure time). The function w(x, y) is the ‘‘white-field’’ response function, which accounts for variation in the signal across an image of a uniformly white object. It has been assumed that a pulsed laser is used as the light source, in which case the signal Se is not a function of the exposure time. Furthermore, in general, all of the functions involved in the correction may vary from shot to shot. The desired scattering signal is obtained by solving for Se in Eq. (11):

Se(x, y) = {Stot(x, y, ti) − [w(x, y)Sback(x, y, ti) + Sdark(x, y, tro)]} / [w(x, y)L(x, y)].    (12)
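As a concrete sketch of Eqs. (11) and (12), the forward model and its inversion can be checked on synthetic arrays; all of the images below are hypothetical stand-ins for measured correction images:

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 64, 64

# Hypothetical correction images (in practice, measured and averaged)
w = 0.8 + 0.2 * rng.random((ny, nx))             # white-field response w(x, y)
L = np.tile(np.linspace(0.5, 1.0, nx), (ny, 1))  # sheet profile L(x, y)
S_back = 5.0 * np.ones((ny, nx))                 # background signal
S_dark = 2.0 * np.ones((ny, nx))                 # fixed-pattern dark signal
S_e_true = 100.0 * rng.random((ny, nx))          # "true" scattering signal

# Forward model, Eq. (11)
S_tot = w * (L * S_e_true + S_back) + S_dark

# Inversion, Eq. (12)
S_e = (S_tot - (w * S_back + S_dark)) / (w * L)

print(np.allclose(S_e, S_e_true))                # -> True
```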
Equation (12) gives a means of obtaining the desired scattering signal image by arithmetic processing of the signal and correction images. Sdark(x, y, tro) is not noise because it is an offset that is nominally the same for each image that has the same exposure and readout time. The dark image is obtained by acquiring an image when the shutter is closed (or when the lens cap is on) and using the same integration and readout times as in the experiment. The background signal Sback(x, y) is due to reflections of the laser from walls/windows, natural flow luminosity (as in combustion), fluorescence from windows or species not of interest, and external light sources. For nonluminous flows, a good approximation to the background can be obtained by acquiring an image when the laser beam is present but without the scattering medium (e.g., without the fluorescent species seeded into the flow). This is only an approximation of the actual background because the light that is scattered from particles/molecules in the flow can itself reflect from the walls and windows; therefore, an image obtained with the scattering medium omitted may not have the same background signal as during an actual experiment. There is usually no simple way around this problem, but fortunately, this effect is often negligible. It is important to note that the background cannot be measured directly because it is the function wSback that is actually measured when a background image is acquired. In fact, the background image is also affected by the dark signal; therefore, if the background image
FLOW IMAGING
is acquired by using the same exposure and readout times as the scattering signal image, then this yields the term Scorrection = (wSback + Sdark ) in Eq. (12). In this case, the correction relationship is simply, Se = (Stot − Scorrection )/(wL). Note also that to reduce the effect of noise on the correction procedure, the images Scorrection (x, y), w(x, y), and L(x, y), should be average images, unless the corrections are made on a single-shot basis. If the flow is unsteady and luminous, the luminosity varies from shot to shot, and therefore, it is more difficult to correct for the background signal. In this case, it is useful to consider the signal-to-background ratio (SBR), Se /Sback , which is sometimes confused with the SNR. Background luminosity is usually not random, and thus it is not noise (although it may appear so if one does not have an easy way to correct for it). One option for dealing with background luminosity is to reduce the luminosity incident on the array through gating, by using an intensified camera or by using spectral filters in front of the camera that pass the scattered light but reject the bulk of the luminosity. Another option is to use a second camera to capture an image of the flow luminosity a very short time before (or after) the laser fires. This assumes, of course, that the flow is essentially frozen for each camera image, which is unlikely to be the case for the millisecond shutter times used for full frame CCD cameras, but it is likely to be true when using microsecond gates and an intensified camera. The laser sheet intensity distribution function, L(x, y), is not easy to obtain, but it can be approximated in a few different ways. In general, the sheet intensity varies in both the x and y directions and from shot to shot. Figure 2 shows a technique, described in (32), for measuring L(x, y) on a single-shot basis. For single-shot corrections, it is necessary to collimate the laser sheet, so that L is a function only of y. 
In this case, part of the laser sheet energy can be extracted, as done using the glass flat in Fig. 2, and directed onto a target. The glass flat reflects several percent of the laser light from each surface, depending on the angle of incidence (33). In Fig. 2 the target is a white card, although a cell containing fluorescent material could also be used (e.g., laser dye in water, or acetone vapor). The scattering (or fluorescence) from the target must obviously be linear in its response to the incident light intensity and must scatter the light uniformly. In Fig. 2, a video camera is used to image the laser sheet intensity profile. Rather than using a target, it is also possible to image the beam directly using a 2D or linear array. The main drawback of this technique is the risk of damage to the array by the focused laser beam. The scattering image and the sheet profile can be registered by blocking the beam, before the optical flat, at two discrete vertical locations using two very thin wires. Both the scattering image and the profile image will include a shadow of the wires, which can be used to index the two images. If the laser sheet is not collimated, but diverging, this makes it much more difficult to correct for the sheet on every shot. In this case, the laser energy and distribution must be sufficiently repeatable so that L(x, y) can be obtained at a time different from that for the scattering image. The correction image is obtained by
placing a uniform, linear scattering medium in the field of view. Sometimes, it is possible to use the Rayleigh scattering from the air itself, although it is more common to have to introduce a more efficient scattering medium, such as smoke or a fluorescent test cell. Care must be taken when using fluorescent materials, such as laser dyes or acetone vapor, because they will cause substantial absorption of the beam if the concentration is too high. Unless the absorption itself is corrected for, the sheet intensity distribution will be incorrect. Therefore, when using fluorescent media, it is best to use very low concentrations to keep the absorption to less than a few percent across the image. The low concentration may necessitate averaging the correction image over many shots to obtain sufficient SNR. The white-field response function, w(x, y), is obtained by imaging a uniformly white field, such as a uniformly illuminated white card. The signal of a white-field image will tend to decrease from the center of the image because the solid angle subtended by the lens is smaller for point sources located near the periphery of the field of view. The variation in intensity across an image formed by a circular aperture will theoretically follow the ‘‘cosine-to-the-fourth’’ law, or I(β)/I(0) = cos^4 β, where β is the angle between the optical axis and a line connecting the center of the lens aperture and the given point on the object plane (18). The white-field response function will also enable correction for variable response of the pixels in the array. Note that the dark charge contribution to the signal must also be subtracted from the white-field image. In some cases, it will be necessary to correct for geometric distortion. The distortion in an image is typically larger for points farther from the optical axis. For this reason, a square will be imaged as an object whose sides either bulge out (called barrel distortion) or bow in (called pincushion distortion).
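The ‘‘cosine-to-the-fourth’’ falloff just described is easy to evaluate. The sketch below approximates β as arctan(r/f), which assumes the image plane sits roughly one focal length behind the lens; the 105-mm focal length and the field position are illustrative values only:

```python
import math

def cos4_falloff(r_mm, focal_length_mm):
    """Relative irradiance I(beta)/I(0) = cos^4(beta) at a field point a
    distance r off-axis, with beta approximated as arctan(r/f)."""
    beta = math.atan(r_mm / focal_length_mm)
    return math.cos(beta) ** 4

print(round(cos4_falloff(0.0, 105.0), 3))    # -> 1.0 on-axis
print(round(cos4_falloff(12.0, 105.0), 3))   # ~2.6% falloff 12 mm off-axis
```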
When using high quality photographic lenses, the maximum distortion is usually small (often less than a pixel). However, when it must be corrected for, this is usually accomplished by imaging a rectangular grid and then warping (or remapping) the image so that each point of the grid is consistent with its known geometry (34). The warping procedure involves finding a large number of ‘‘tie’’ points across the image, such as the points where two gridlines cross, and using these to solve for the set of polynomial coefficients required for the remapping. Pixels other than the tie points are remapped by interpolating among the coefficients for the tie points.

IMAGING SYSTEM RESOLUTION

Even though the proper specification of the resolution of the imaging system is often critically important to a particular application, it is often neglected in flow imaging studies. For example, it is not unusual to find scalar imaging papers that quote the resolution in terms of the area that each pixel images in the flow. In many cases, however, this is not the factor that limits the resolution, particularly when using fast (low f#) optics. A somewhat better approach involves imaging a standard resolution target, such as the USAF or NBS targets (35), available
from major optics companies, which are composed of a periodic sequence of light and dark bars of varying spatial frequency. The user typically reports the resolution limit as the smallest set of bar patterns for which a contrast modulation can be distinguished. In some cases, this may give the user an idea of the limiting resolution of the imaging system, but this technique is subjective, can be misleading because of aliasing (discussed further below), and is inadequate as a measure of the limitations that finite resolution imposes on the data. The resolution is fundamentally related to the point-spread function (PSF), which is the intensity distribution at the image plane, Ii(x, y), produced by imaging an infinitesimally small point source of light. The overall size of the PSF is referred to as the blur spot, whose diameter is denoted as dblur. In the diffraction limit, the PSF will be the Airy function (33), which has a blur spot diameter that can be approximated as the Airy disk diameter, (dblur)dl, given by the relationship (20)

(dblur)dl = 2.44(m + 1)λf#.    (13)
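Equation (13) can be evaluated directly; a minimal sketch for the conditions of Fig. 6 (532 nm, unity magnification):

```python
def airy_blur_um(f_number, magnification, wavelength_um=0.532):
    """Diffraction-limited blur spot diameter, Eq. (13):
    (d_blur)_dl = 2.44 * (m + 1) * lambda * f#, in micrometers."""
    return 2.44 * (magnification + 1.0) * wavelength_um * f_number

for fnum in (2.8, 11.0, 22.0):
    print(f"f/{fnum:g}: {airy_blur_um(fnum, 1.0):.1f} um")
```

At f/2.8 this gives about 7.3 µm, which is why the measured 50-µm spot of Fig. 6c discussed below is far from diffraction-limited.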
Most flow imaging experiments employ camera lenses designed for 35-mm film cameras. When used at high f# and for magnifications that are not too far off design, these lenses give nearly diffraction-limited performance. The lenses have several lens elements that are necessary to correct for the many types of aberrations, including spherical, chromatic, coma, astigmatism, and distortion. However, chromatic aberrations are not usually a problem in flow imaging, because in most cases the scattered light is effectively monochromatic. In practice, such photographic lenses used at low f# and off-design produce spot sizes that can be several times larger than the Airy disk. For example, Fig. 6 shows digitally sampled images of a point light source (λ = 532 nm) whose diameter is approximately two microns in the object plane, taken at unity magnification by a Nikon 105-mm Micro lens coupled to a Kodak ES1.0 1k × 1k CCD camera (9 µm × 9 µm pixels). For comparison, the length of the horizontal white bar below each image of Fig. 6 is equal to the diffraction-limited spot size computed from Eq. (13). Figure 6 shows that the spot size is approximately diffraction-limited at f/22 and f/11, but at f/2.8, the spot size is about 50 µm, which is substantially larger than the diffraction-limited value. The increase in the blur spot, relative to the diffraction limit, results from the greater aberrations at the lower f#. The PSF directly affects the resolution because the image is the result of the convolution of the PSF with the irradiance distribution of the object. Therefore, the smallest objects that can be imaged are related to the size and shape of the PSF; worse resolution is associated with a broader PSF or larger blur spot. In addition to setting the limiting resolution, or the highest spatial-frequency structure that can be resolved, the imaging system also increasingly blurs smaller scale structures.
Because of this, it is usually not sufficient to simply state the limiting resolution of the system. For example, it will be shown later that measurements of scalar gradients, such as derived from temperature or
Figure 6. Digitally sampled point-spread functions acquired using a Kodak ES1.0 CCD camera (9 × 9 µm pixels) fitted with a Nikon 105-mm lens. The object imaged is a point source approximately 2 µm in diameter, and the magnification is unity. The three images are for three different aperture settings: (a) f/22, (b) f/11, and (c) f/2.8. The white line below each spot is the diameter of the diffraction-limited blur spot.
concentration fields, can exhibit substantial errors due to resolution limitations, even at frequencies substantially lower than the limiting resolution of the system. The blurring incurred by an imaging system that has finite resolution is essentially a result of the system’s inability to transfer contrast variations in the object to the image. The accepted means of quantifying how accurately an imaging system transfers contrast is the optical transfer function (OTF) (18,35). The OTF, which is analogous to a linear filter in time-series analysis, describes the response of the imaging system to a sine wave contrast variation in the object plane. For example,
assume that the intensity distribution of the object is described by the equation

Io(x) = b0 + b1 cos(2πsx),    (14)
where Io is the intensity of the object, b0 and b1 are constants, and s is the spatial frequency (typically in cycles/mm or, equivalently, line-pairs/mm). It can be shown that a linear system will image the object as a sine wave of the form (18)

Ii(x) = b0 + c1 cos(2πsx − φ),    (15)
where Ii is the intensity of the image, c1 is a constant, and φ is a phase shift. Examples of these functions are shown in Fig. 7, where the image exhibits both a reduction in contrast (i.e., c1 < b1) and a phase shift, which corresponds to a shift in the location of the wave. Because the phase shift is associated with a shift in the position of the image, it is generally associated with geometric distortion. The OTF can be described mathematically by the relationship

OTF(s) = MTF(s)e^(iPTF(s)),    (16)
where MTF(s) is the modulation transfer function and PTF(s) is the phase transfer function. The MTF describes the contrast transfer characteristics of the imaging system, and the PTF describes the phase transfer characteristics. Equation (16) shows that the magnitude of the OTF is the MTF, that is, MTF(s) = |OTF(s)|. The MTF is generally considered more important in describing the transfer characteristics of an imaging system because phase differences typically occur only at high spatial frequencies, where the MTF is very small (35). The MTF is measured by imaging objects that have a sine wave irradiance variation of known spatial frequency. The maximum and minimum intensities are defined as Imax and Imin, respectively, and the contrast of the object is defined as Co = (Iomax − Iomin)/(Iomax + Iomin). The contrast of the image is defined similarly as Ci = (Iimax − Iimin)/(Iimax + Iimin). The MTF is then defined as

MTF(s) = Ci/Co.    (17)

For an imaging system that reproduces the contrast of an image perfectly, the MTF is equal to unity, but for all real imaging systems, MTF → 0 as s → ∞. For example, Fig. 8 shows the MTF of a diffraction-limited f/8 lens at a magnification of unity. The figure shows that the MTF immediately begins decreasing as spatial frequency increases, which implies that there are no nonzero frequencies that can be imaged without contrast distortion. This is different from ideal transfer functions in time-series analysis, which generally have a flat response over a wide range and then roll off only at high frequency. In imaging, it is virtually impossible to measure without some level of contrast distortion. The limiting resolution is often specified by a cutoff frequency sco, where the MTF goes to zero. Note that all diffraction-limited MTFs have a universal shape and a cutoff frequency (sco)dl that is related to the numerical aperture (NA) on the image side of the lens and the wavelength of light (36). In the literature, it is common to see the cutoff frequency related to the lens f#, but assuming an infinite conjugate ratio (i.e., object at infinity). For noninfinite conjugate ratios, and assuming that the image is formed in a medium whose index of refraction is unity, the cutoff frequency depends on the magnification per the relationship

(sco)dl = [λf#(m + 1)]^−1.

Figure 7. Effect of the imaging system on a sine wave object. (a) The irradiance distribution of the object; (b) the irradiance distribution of the image resulting from the convolution of the object sine wave with the LSF. The resulting image exhibits contrast reduction and a phase shift φ. (Adapted from W. J. Smith, Modern Optical Engineering: The Design of Optical Systems, 2e., McGraw-Hill, NY, 1990, with permission of The McGraw-Hill Companies.)

Figure 8. Diffraction-limited MTF for an f/8 lens operated at a magnification of unity. Also shown is a hypothetical MTF for an aberrated imaging system. The cutoff frequency for the diffraction-limited MTF is (sco)dl = 117 cycles/mm.
The diffraction-limited MTF is given by (18)

MTF(s) = (2/π)[α(s) − cos α(s) sin α(s)],    (18)
where α(s) = cos^−1[s/(sco)dl]. The human eye can distinguish contrast differences of a few percent, and so the cutoff frequency, particularly for Gaussian MTFs, is sometimes specified as the frequency at which the MTF is 0.04, or 4% of the peak value. Figure 8 also shows a hypothetical MTF for an aberrated optical system. The aberrated system exhibits reduced contrast transferability across the entire frequency range and a lower cutoff frequency. One of the main advantages of the concept of the MTF is that MTFs for different components of an optical system can be cascaded. In other words, the overall MTF is the product of the MTFs of each component. For example, the overall MTF for an intensified camera system is the product of the MTFs for the photocathode, microchannel plate, phosphor screen, optical fiber bundle, and CCD. Because virtually all MTFs exhibit rapid roll-off, the overall MTF is always worse than the worst MTF in the system. It is enlightening to consider an example of how significantly the MTF can affect a certain type of measurement. Assume that it is desired to measure the irradiance gradient dIo/dx, such as is necessary when computing diffusive fluxes. Consider an object that has a sine wave intensity distribution as given by Eq. (14). It can be shown that the image is given by (18)

Ii(x) = b0 + b1 MTF(s) cos(2πsx − φ).    (19)
The derivatives of both Io and Ii are sine waves; for simplicity, consider only the maximum derivative, which occurs at 2πsx − φ = π/2. In this case, the relative error in the maximum gradient (derivative) is

Error = (dIo/dx − dIi/dx)/(dIo/dx) = 1 − MTF(s).    (20)
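Combining Eqs. (18) and (20) gives the frequency range over which gradients can be measured to a given accuracy. The sketch below reproduces the f/8, m = 1, 532-nm example of Fig. 8, using the 10% error threshold discussed in the text:

```python
import math

def dl_mtf(s, s_co):
    """Diffraction-limited MTF, Eq. (18), with alpha = arccos(s/s_co)."""
    if s >= s_co:
        return 0.0
    a = math.acos(s / s_co)
    return (2.0 / math.pi) * (a - math.cos(a) * math.sin(a))

# Cutoff for f/8, m = 1, lambda = 532 nm: (s_co)_dl = [lambda*f#*(m+1)]^-1
s_co = 1.0 / (532e-6 * 8.0 * 2.0)      # wavelength in mm -> cycles/mm
print(round(s_co))                      # -> 117

# Highest frequency at which the gradient error 1 - MTF(s) stays below 10%
s = 0.0
while 1.0 - dl_mtf(s + 0.1, s_co) <= 0.10:
    s += 0.1
print(round(s))                         # -> 9 (i.e., roughly 10 cycles/mm)
```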
Equation (20) shows that the error in the gradient is very large (96%) at the 4% MTF point. If an error no larger than 10% is desired, then the MTF at the frequency of interest must be no less than 0.9. This can be a very stringent requirement for some imaging systems. For the diffraction-limited case shown in Fig. 8, the measurements would be limited to frequencies less than 10 cycles/mm or wavelengths greater than 100 µm. As exemplified in Fig. 8, the situation is typically much worse for an actual aberrated imaging system. In practice, the MTF is a very difficult thing to measure directly because it is difficult to achieve a true sine wave contrast modulation in the object plane (35). It is relatively easy, however, to produce black-and-white bar patterns of varying frequency, which is why the MTF is often approximated by this method. The response of the system to a periodic black-and-white bar pattern is sometimes called the contrast transfer function (CTF) (also the square-wave transfer function). The CTF is relatively
easy to measure, and several square-wave targets are available commercially. However, the CTF is not the same as the MTF, although they are related. Because the FT of a square wave is a sinc function, which exhibits a finite bandwidth of frequencies, the CTF is a reflection of the imaging system’s ability to transfer contrast across a range of frequencies, rather than at just a single frequency as for the MTF. The CTF is related to the MTF by the relationship (28)

MTF(s) = (π/4)[CTF(s) + CTF(3s)/3 − CTF(5s)/5 + CTF(7s)/7 + CTF(11s)/11 − ···].    (21)
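A sketch of Eq. (21), truncated to the terms shown there. Harmonics at or beyond the cutoff frequency contribute nothing, so for s > sco/3 the conversion reduces to MTF(s) = (π/4)CTF(s); the CTF used in the example is hypothetical:

```python
import math

def ctf_to_mtf(ctf, s, s_co):
    """Approximate the MTF from a measured square-wave CTF using the
    series of Eq. (21), truncated to the terms shown there."""
    terms = [(1, +1.0), (3, +1.0), (5, -1.0), (7, +1.0), (11, +1.0)]
    total = 0.0
    for k, sign in terms:
        if k * s < s_co:                  # harmonics past cutoff vanish
            total += sign * ctf(k * s) / k
    return (math.pi / 4.0) * total

# Hypothetical measured CTF; above s_co/3 only the first term survives,
# so the result is exactly (pi/4) * CTF(s).
ctf = lambda s: max(0.0, 1.0 - s / 100.0) * (4.0 / math.pi)
print(round(ctf_to_mtf(ctf, 40.0, 100.0), 3))   # -> 0.6
```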
The CTF generally has a shape similar to that of the MTF, but it will have higher values of the transfer function at a given spatial frequency; therefore, measuring the CTF tends to give the impression that the resolution is better than it actually is. Despite the ease of measuring the CTF, it is not a recommended means of determining the resolution because it is not very accurate, particularly when using discrete sampling detectors, such as CCD arrays (35,37). An array detector can be thought of as a device that averages, owing to the finite size of the pixels (δx), and samples at a frequency that is the inverse of the pixel pitch (spacing) a. When the image, as projected onto the array detector, is sampled at too low a frequency, then aliasing can occur. Aliasing occurs when high-frequency components of the image are incorrectly sampled as lower frequency components and results in spurious contrast modulation in the sampled image. Aliasing can be avoided by ensuring that the image (before sampling) has no frequency content higher than the Nyquist frequency sN = (2a)−1 . When the spatial frequency content of the image is higher than the Nyquist frequency, then the resulting spurious frequency content can mislead the user into thinking that the resolution is higher than it actually is (38). In flow imaging, the input optics typically have a cutoff frequency that is higher than the Nyquist frequency of the array, and thus aliasing is often a potential problem. Furthermore, the broad range of frequencies in a squarewave target makes it very difficult to avoid aliasing effects. In fact, the avoidance of aliasing when measuring contrast transfer characteristics is imperative because the MTF of a discrete sampling detector is not even defined when aliasing is present (35,37). The reason is that for a device to have an MTF, it must be linear and isoplanatic. Isoplanatic means that the output image is insensitive to movement of the input image. 
Array detectors are typically sufficiently linear, but they are not necessarily isoplanatic. For example, consider the case where a white/black bar pattern is imaged at a magnification of unity and where the spacing of the bars is equal to the pixel pitch. In this case, the contrast modulation of the image will depend on whether the bars are ‘‘in-phase’’ (aligned with the pixels), or ‘‘out-of-phase’’ (straddling the pixels). Such nonisoplanatic behavior is mainly a problem at spatial frequencies near the Nyquist limit. For this reason, MTFs
for CCDs and other detectors can be considered ‘‘pseudo’’-MTFs only, which have a limited range of applicability. For example, it has been shown that array detectors are approximately isoplanatic for frequencies lower than sN (35). From purely geometric considerations, the array MTF follows a sinc function,

MTF(s) = sin(πδx s)/(πδx s),    (22)
which goes to zero at a frequency of s = 1/δx. In practice, the MTF will be smaller than that given by Eq. (22), owing to the diffusion of photon-generated charge carriers, light scatter between detector elements, reflections between the array and the protective window, and nonideal charge-transfer efficiency. For video systems, the processing electronics and frame grabber will also reduce the quality of the MTF. Several studies have shown that a useful means of inferring the MTF is to measure the line-spread function (LSF). The LSF is the 1-D analog of the PSF because it is the intensity distribution at the image plane resulting from imaging an infinitesimally narrow slit at the object plane. The importance of the LSF is that its FT is the OTF (35). Furthermore, if the LSF is a symmetrical function, then the OTF is real, indicating that there is no phase distortion and the PTF is zero. If the intensity distribution of the PSF is given by p(x, y), then the LSF irradiance distribution is

l(x) = ∫_{−∞}^{∞} p(x, y) dy.    (23)

Consider the sampled PSF represented by the image of Fig. 6c. Because the LSF covers such a small range of pixels, it is not known how the actual LSF is affected by array sampling. For example, if the LSF contains spatial frequency content that is higher than the Nyquist frequency, then aliasing is present, and the sampled LSF may not reflect the true LSF. There is, however, a superior technique for measuring the LSF that does not suffer from aliasing (39). In this technique, the object (whether sine wave or line source) is translated within the object plane (say, in the x direction), and the output from a single pixel is monitored as a function of the x location. This technique is free from aliasing errors because the LSF is sampled at only a single point, and the pitch of the measurement (i.e., the resolution) can be much finer than the pixel pitch. For example, it is not difficult to obtain 1-µm resolution on standard optical translation stages, which is substantially smaller than the pitch of most CCD arrays. Because good sine wave and line sources may be difficult to generate in practice, a relatively easy alternative is to measure the step response function (SRF), which is the intensity distribution at the image plane obtained by scanning a knife-edge across the object plane. In this case, the output of a single pixel is also measured as a function of the knife-edge position. The SRF irradiance distribution k(x) is the convolution of a step function with the LSF. It necessarily follows that the derivative of k(x) is the LSF:

l(x) = dk(x)/dx.    (24)

Figure 9 shows an example setup for obtaining the LSF by scanning a knife-edge and monitoring the output from a single pixel. Figure 10 shows the SRF obtained using this same setup, for m = 1 and f/2.8, where the knife-edge was translated in 2-µm increments. A single 9-µm pixel near the center of the field of view was monitored, and the resulting SRF was very well resolved. Figure 10 also shows an error function curve fit to k(x), where the error function provides a reasonably good fit to the data. Also shown in Fig. 10 is the Gaussian LSF obtained by differentiating the error function curve fit. The LSF is seen to have a 1/e² full width of about 40 µm, which corresponds to about 4.5 pixels. The point source images of Fig. 6c indicate a larger LSF, but the heavy quantization and the potential for aliasing make this difficult to determine from these types of images. The MTF, which is the FT of the LSF (and is also Gaussian), is shown in Fig. 11.

Figure 9. Schematic of the setup for measuring the step response function (SRF) for a single pixel of a CCD camera. The camera images the back-illuminated knife-edge, and the output of a single pixel is monitored as the knife-edge is translated across the field of view.

Figure 10. Measured SRF for an f/2.8 lens operated at unity magnification. The dashed line is the LSF computed from the derivative of the curve fit to the SRF.
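The SRF-to-MTF procedure just described can be sketched end-to-end on synthetic data. The Gaussian width below matches the 40-µm 1/e² full width reported for Fig. 10; the scan range and step are illustrative:

```python
import numpy as np
from math import erf, sqrt

# Hypothetical knife-edge scan: Gaussian LSF with a 40-um 1/e^2 full
# width (sigma = 10 um), sampled every 2 um as in Fig. 10.
sigma = 10.0
x = np.arange(-50.0, 50.0, 2.0)                 # knife-edge positions (um)
srf = np.array([0.5 * (1.0 + erf(xi / (sigma * sqrt(2.0)))) for xi in x])

# Eq. (24): the LSF is the derivative of the SRF
lsf = np.gradient(srf, x)

# The MTF is the normalized magnitude of the FT of the LSF
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]
s = np.fft.rfftfreq(len(x), d=2.0)              # cycles/um

# For a Gaussian LSF the MTF is also Gaussian: exp(-2*(pi*sigma*s)^2)
analytic = np.exp(-2.0 * (np.pi * sigma * s) ** 2)
print(np.allclose(mtf[:10], analytic[:10], atol=0.01))   # -> True
```

In practice the differentiation step amplifies noise, which is why the text recommends fitting the measured SRF to an error function before differentiating.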
From the figure, it is seen that the resolution of this system is really not very good, because sine wave structures whose frequency is 0.2 cycles/pixel (or whose wavelength is 5 pixels) will exhibit a 40% contrast reduction. The Nyquist frequency sN is associated with an MTF of about 5%, which emphasizes the danger of specifying the resolution in terms of the projection of a pixel into the field of view. An ‘‘ideal’’ MTF is also shown for comparison. The ideal MTF is the product of the MTFs for a diffraction-limited lens (at f/2.8, λ = 532 nm, and m = 1) and an ideal sampling detector whose pixel size is 9 µm [i.e., the product of Eqs. (18) and (22)]. The figure shows that the measured MTF is substantially worse than the ideal one, owing largely to aberrations in the lens. Note that because taking a derivative is a noise-enhancing process, if the SRF cannot be fit to a relatively simple functional form, such as an error function, then determining the LSF by this technique becomes much more difficult. In some cases, it may be worth the trouble of measuring the LSF directly by using a narrow slit rather than a knife-edge.

Figure 11. Comparison of MTFs for an f/2.8 lens and 9-µm pixel array operated at unity magnification. The Gaussian MTF was inferred from the measured LSF shown in Fig. 10, and the ideal MTF was computed assuming a diffraction-limited lens and a geometric sampling function for the CCD detector.

Paul (26) shows, in a planar imaging experiment, that the MTF will be a function of the laser sheet thickness when the sheet thickness is greater than the depth of field of the imaging system. The depth of field δdf is the distance that the object may be shifted in the direction of the lens while still maintaining acceptable blur, whereas the depth of focus δ′df is the distance that the detector can be shifted while maintaining acceptable blur. Note that the two are related by the magnification, that is, δ′df = m²δdf. If the laser sheet is thicker than the depth of field, then the region across which the imaging system collects light will be a ‘‘bow-tie’’ shaped region, rather than the ‘‘box’’ region shown in Fig. 3. Therefore, near the tails of the laser sheet, the blur spot may be substantially larger than at best focus. The depth of field is related to the blur spot of the imaging system per the relationship (18)

δdf = f(m + 1)dblur/[m(D ± dblur)].    (25)

The ± sign in Eq. (25) indicates that the depth of field is smaller in the direction of the lens and larger away from it. The total depth of field δtot is the sum of the depths of field toward and away from the lens. When dblur ≪ D, which is so for most flow imaging cases, the total depth of field simplifies to

δtot ≈ 2dblur f#(m + 1)/m.    (26)
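Equation (26) is simple to apply; the sketch below reproduces the Fig. 6c example worked in the text (50-µm blur spot, f/2.8, m = 1):

```python
def total_depth_of_field_um(d_blur_um, f_number, magnification):
    """Total depth of field, Eq. (26): 2 * d_blur * f# * (m + 1)/m,
    valid when d_blur << D (the lens aperture diameter)."""
    return 2.0 * d_blur_um * f_number * (magnification + 1.0) / magnification

print(total_depth_of_field_um(50.0, 2.8, 1.0))   # -> 560.0 um, as in the text
```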
For the diffraction-limited case, the blur spot size is given by Eq. (13). Equation (26) shows that the depth of field increases as the blur spot size increases and decreases with increasing magnification. For example, the blur spot of Fig. 6c is about 50 µm, which at f/2.8 and m = 1 amounts to δtot = 560 µm. This is somewhat larger than the typical laser sheet thicknesses that are used in planar imaging of scalars, and therefore, it is unlikely that additional blur at the edge of the sheet would be an issue. However, in many cases, such as when using faster optics, this effect will not be negligible. One way to account for the collection of light over the ‘‘bow-tie’’ shaped volume is given in (26), where the MTF was evaluated as the weighted sum of the MTFs of thin laminates (infinitesimally thin planes) parallel to the laser sheet but at different z locations. The weighting function used was the laser sheet energy distribution. This technique of weighting the MTFs by the energy distribution accounts for the fact that more energy will be collected from regions that have smaller blur spots. However, this technique requires either the assumption of ideal MTFs or detailed system MTF measurements at a number of z locations. Another approach is to measure the MTF by the knife-edge technique at best focus and at the edge of the laser sheet. To be conservative, the MTF at the edge of the sheet could be used as the primary measure of resolution, although a reasonable compromise might be to take the average of these two MTFs as the representative MTF of the entire system, including the laser sheet. It is also important to note that the MTF is measured at a given point in the image, but it may vary across the field of view. For this reason, it is also advisable to measure the MTF at the center and near the edges of the field of view.

RESOLUTION REQUIREMENTS IN FLUID FLOWS

One of the major difficulties in flow imaging is achieving adequate spatial and temporal resolution.
This is particularly the case when flows are turbulent because the resolution requirements are typically very severe if it is desired to resolve the smallest scales at which fluctuations occur. Laminar flows, however, pose substantially less stringent requirements on resolution, compared to turbulent flows. The primary issue when considering the resolution requirements is the gradient of the flow property that is being measured because the gradient determines the amount of averaging that occurs across the
resolution volume. In many laminar shear flows, including boundary layers, pipe flows, wakes, jets, and mixing layers, the maximum gradient is of the same order of magnitude as the overall gradient. In other words, the maximum velocity and temperature gradients are approximately (∂U/∂y)max ∼ ΔU/δ and (∂T/∂y)max ∼ ΔT/δ, where ΔU is the characteristic velocity difference, δ is the local width of the shear flow, and ΔT is the characteristic temperature difference across the flow. For example, in a boundary layer formed by the flow of air over a heated flat plate, the maximum velocity gradients scale as (∂U/∂y)max ≈ U∞/δ ∼ (U∞/x)Rex1/2, where Rex = U∞x/ν, x is the downstream distance, and ν is the kinematic viscosity. Gradients in scalars, such as temperature or species concentration, similarly scale with Reynolds number but also depend on the relative diffusivities of momentum and the scalar. For example, the maximum scalar gradient in the boundary layer scales as (∂T/∂y)max ≈ [(T∞ − Tw)/x](Rex Pr)1/2, where T∞ and Tw are the free-stream and wall temperatures, respectively, Pr = ν/α is the Prandtl number, and α is the thermal diffusivity. The preceding relationships show that gradients become large at large Reynolds and Prandtl numbers (or Schmidt number, Sc = ν/D, where D is the mass diffusivity, for mass transfer), which is the same as saying that shear flows become ‘‘thin’’ at high Re (and Pr). Turbulent flows have substantially more severe resolution requirements than laminar flows, owing to the much larger gradients that occur at the smallest scales of turbulence. In turbulent flows, the spatial fluctuations in flow properties, such as velocity, temperature, or concentration, range in scale from the largest physical dimension of the flow (e.g., the local width of the boundary layer or jet) to the scale at which diffusion acts to remove all gradients.
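The laminar boundary-layer gradient estimates above can be sketched numerically. The free-stream conditions below are illustrative assumptions, not values from the text:

```python
import math

def laminar_bl_gradients(U_inf, x, nu, alpha, dT):
    """Order-of-magnitude maximum gradients in a laminar boundary layer:
    (dU/dy)_max ~ (U_inf / x) * Re_x**0.5
    (dT/dy)_max ~ (dT / x) * (Re_x * Pr)**0.5
    """
    Re_x = U_inf * x / nu
    Pr = nu / alpha
    dudy_max = (U_inf / x) * math.sqrt(Re_x)
    dtdy_max = (dT / x) * math.sqrt(Re_x * Pr)
    return Re_x, Pr, dudy_max, dtdy_max

# Assumed illustration: air at 10 m/s, 0.5 m from the leading edge,
# 20 K wall-to-free-stream temperature difference.
Re_x, Pr, dudy, dtdy = laminar_bl_gradients(10.0, 0.5, 1.5e-5, 2.1e-5, 20.0)
print(f"Re_x = {Re_x:.0f}, Pr = {Pr:.2f}")
print(f"(dU/dy)_max ~ {dudy:.0f} 1/s, (dT/dy)_max ~ {dtdy:.0f} K/m")
```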
The largest scales are often called the ‘‘outer scales,’’ whereas the smallest scales are the ‘‘inner’’ or dissipation scales because these are the scales at which the energy of fluctuations, whether kinetic or scalar, is dissipated. In classical turbulence theory, the kinetic energy dissipation scale is the Kolmogorov scale (40),

η ≡ (ν3/ε)1/4,    (27)
where ε is the kinetic energy dissipation rate. Batchelor (41) argued that the smallest scale of scalar fluctuations λB, called the Batchelor scale, is related to the Kolmogorov scale and the ratio of the kinematic viscosity to the scalar diffusivity. For Sc (or Pr) ≫ 1, he argued that λB = ηSc−1/2. There is some disagreement in the literature about the scaling for fluids when Sc ≪ 1 (42), but because most gases and liquids have Schmidt numbers of order unity or larger, this is of little practical concern. Generally, it is assumed that the Sc−1/2 scaling applies at near unity Schmidt numbers, in which case λB ≈ η. For liquids, it is typical that Pr, Sc ≫ 1; thus the Sc−1/2 scaling is appropriate, in which case λB ≪ η. Using scaling arguments, the Kolmogorov scale can also be related to outer scale variables through the relationship η/δ ∝ Reδ−3/4, where Reδ is the Reynolds number based on outer scale variables (such as ΔU, the maximum velocity difference, and δ, the local width of the shear flow). Buch and Dahm (43) make explicit use of such an outer scaling by defining the strain-limited scalar diffusion scale λD as

λD/δ = Λ Reδ−3/4 Sc−1/2,    (28)

where Λ is an empirical constant, δ is the 5–95% velocity full width of the shear flow, and Reδ = ΔUδ/ν. Their planar imaging measurements of the finest mass diffusion scales in round turbulent jets suggest that Λ ≈ 11. Similar measurements in planar jets suggest a value of Λ ≈ 14 (32). The finest velocity gradient scale, analogous to the Kolmogorov scale, is the strain-limited vorticity scale, λν = λD Sc1/2. The strain-limited diffusion scales can be related to the Kolmogorov scale by using measurements of the kinetic energy dissipation rate. For example, using the data for the decay of the kinetic energy dissipation rate for gas-phase round jets (44) and taking Λ = 11, it can be shown that λD ≈ 6λB and λν ≈ 6η. If the mean kinetic energy dissipation scales are about 6η, then accurate measurements of the gradients will necessitate better resolution than this. This is consistent with thermal-wire measurements of temperature and velocity fluctuations, which suggest that a resolution of about 3η is sufficient for accurate measurement of the smallest scale gradients (45–47). Therefore, it is recommended that the resolution of the imaging system be no worse than λD/2 and λν/2 if the smallest fluctuations in a turbulent flow are to be measured accurately. It cannot be emphasized enough that, because of the nature of the MTF of the imaging system, it is too simplistic to speak of ‘‘resolving’’ or ‘‘not resolving’’ particular scales in the flow. Progressively finer scales will be increasingly affected by the imaging system, and any quantitative measurement of gradients must take this into account. Another perspective on Eq. (28) is that it describes the dynamic spatial range that is required for measuring the full range of scales.
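The scale estimates above, Eq. (27), the Batchelor relation, and Eq. (28), can be collected into a short numerical sketch. The dissipation rate, flow width, and Reynolds number below are illustrative assumptions; the constant Λ ≈ 11 is the round-jet value quoted in the text:

```python
def kolmogorov_scale(nu, eps):
    """Eq. (27): eta = (nu**3 / eps)**(1/4)."""
    return (nu**3 / eps) ** 0.25

def batchelor_scale(eta, Sc):
    """lambda_B = eta * Sc**(-1/2), valid for Sc of order unity or larger."""
    return eta * Sc ** -0.5

def strain_limited_scales(delta, Re_delta, Sc, Lambda=11.0):
    """Eq. (28): lambda_D / delta = Lambda * Re_delta**(-3/4) * Sc**(-1/2);
    the vorticity scale is lambda_nu = lambda_D * Sc**(1/2)."""
    lam_D = delta * Lambda * Re_delta ** -0.75 * Sc ** -0.5
    lam_nu = lam_D * Sc ** 0.5
    return lam_D, lam_nu

# Assumed illustration: air (nu = 1.5e-5 m^2/s), dissipation eps = 10 W/kg.
eta = kolmogorov_scale(1.5e-5, 10.0)
# Assumed gas-phase shear flow: delta = 0.1 m, Re_delta = 40,000, Sc = 1.
lam_D, lam_nu = strain_limited_scales(0.1, 4.0e4, 1.0)
print(f"eta = {eta * 1e6:.0f} um, lambda_D = lambda_nu = {lam_D * 1e6:.0f} um")
```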
Here, the dynamic spatial range (DSR) is defined as the ratio of the largest to the smallest spatial structures that can be measured. The largest spatial scale in turbulent flows is generally considered to be the local width of the shear flow (or, in some enclosed flows, a characteristic dimension of the enclosing box). Therefore, for turbulent shear flows, δ/λD, given by Eq. (28), is the range of scales of the flow. This also shows that the Reynolds (and Schmidt) number can be thought of as directly related to the DSR of the turbulent shear flow. Equation (28) also shows that the DSR for scalars is even larger for low scalar diffusivity (high Sc or Pr numbers). For example, fluorescein dye in water has a Schmidt number of about 2000, and thus the finest mass diffusion scale is about 45 times smaller than the smallest vorticity scale (42). The other important point that Eq. (28) reveals is that the DSR is a strong function of the Reynolds number; thus, it is often not possible to resolve the full range of turbulent scales by using currently available camera systems. For example, assume that it is desired to obtain planar images of the jet fluid concentration in a turbulent round jet and to resolve the full range of scales 500 mm downstream of a 5-mm diameter nozzle. The jet velocity 5% full width grows at a rate of δ(x) = 0.44x, where x is the distance downstream of the jet exit and the centerline
velocity decays as Uc/U0 = 6.2/(x/dj), where Uc is the centerline velocity, U0 is the jet exit velocity, and dj is the jet exit diameter (48). In this case, the outer scale Reynolds number Reδ = Ucδ/ν = 1.9Red, where Red is the source Reynolds number (= U0dj/ν). If we desire to study a jet where Red = 20,000, then the range of scales given by Eq. (28) is 150. If a smallest scale of λD/2 must be resolved, then our required DSR is 2δ/λD = 300. In planar imaging using the 1000 × 1000 pixel CCD camera whose measured MTF is shown in Fig. 11, it would not be possible to resolve the entire range of scales because substantial blurring occurs across 4–5 pixels. Turbulent timescales are generally bounded by relatively low-frequency outer scale motions and high-frequency inner scale motions (40). The largest scale motions are independent of viscosity and occur over a characteristic time that is of the order of τos ∼ δ/ΔU. This is also commonly referred to as the ‘‘large-eddy-turnover’’ time. The small-scale motions, however, occur over a substantially shorter timescale, which is of the order of τis ∼ (ν/ε)1/2 or τis ∼ (Reδ)−1/2τos, if based on outer scale variables. This latter relationship shows that, at high Reynolds numbers, the inner scale timescales can be orders of magnitude smaller than outer scale timescales. The turbulent inner scale timescales may not be the shortest timescales that must be resolved if the flow is convecting past the measurement volume. In this case, the shortest time may be the convective inner scale time (τis)conv = λν/U, where U is the local velocity. For example, consider a mixing layer that forms between two parallel streams of air, where the streams have velocities of 100 m/s and 90 m/s. The range of turbulent spatial scales will depend only on the outer scale Reynolds number, which in turn depends only on the velocity difference of 10 m/s.
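The outer, inner, and convective timescale estimates just given can be sketched for the two-stream mixing layer example (100 and 90 m/s). The layer thickness, air viscosity, and the constant Λ = 11 used below for the vorticity scale are assumptions for illustration:

```python
def mixing_layer_timescales(dU, U_conv, delta, nu, Lambda=11.0):
    """Timescale estimates for a turbulent shear flow:
    tau_os ~ delta / dU                      (outer scale)
    tau_is ~ Re_delta**(-1/2) * tau_os       (inner scale)
    (tau_is)_conv = lambda_nu / U_conv       (convective inner scale)
    Sc = 1 is assumed, so lambda_nu = delta * Lambda * Re_delta**(-3/4)."""
    Re_delta = dU * delta / nu
    tau_os = delta / dU
    tau_is = tau_os / Re_delta ** 0.5
    lam_nu = delta * Lambda * Re_delta ** -0.75
    tau_conv = lam_nu / U_conv
    return tau_os, tau_is, tau_conv

# Streams at 100 and 90 m/s (dU = 10 m/s, Uconv = 95 m/s); assumed:
# layer thickness delta = 0.1 m, air nu = 1.5e-5 m^2/s.
tau_os, tau_is, tau_conv = mixing_layer_timescales(10.0, 95.0, 0.1, 1.5e-5)
print(f"tau_os ~ {tau_os * 1e3:.0f} ms, tau_is ~ {tau_is * 1e6:.0f} us, "
      f"(tau_is)_conv ~ {tau_conv * 1e6:.1f} us")
```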
The absolute velocities are irrelevant, except to the extent that they affect the local mixing layer thickness δ. If the imaging system is in the laboratory frame of reference, then the timescales will depend on both the velocity difference (which drives the turbulence) and the bulk convection of these spatial structures, which depends on the local velocity of the structures with respect to the imaging system. For the mixing layer conditions given before, if the mixing layer at the imaging location is 10 cm thick, then τos ≈ 10 ms and τis ≈ 40 µs. However, if the small-scale structures convect by the measurement station at the mean velocity of Uconv = (U1 + U2)/2 = 95 m/s, then the timescale that needs to be resolved is (τis)conv = λν/Uconv = 3 µs, which is considerably less than τis. It is clear that the smaller of the convective and turbulence timescales must be resolved.

FLOW IMAGING: SURVEY OF TECHNIQUES

The purpose of this section is to give the reader an idea of the wide range of flow imaging techniques that have been developed and applied in fluid mechanics research. Owing to space limitations, however, this survey must leave out many techniques that are certainly worthy of discussion. Hopefully, in most cases, a sufficient number of general references is provided for readers to learn about these omitted techniques on their own.
Furthermore, excellent reviews of a number of qualitative and quantitative flow visualization techniques, including some that were omitted in this article, can be found in (5). The reader should keep in mind that the physical principles underlying each technique are usually not discussed in this article because they are covered in different sections of this encyclopedia and in the references cited in the bibliography. This section is organized on the basis of the flow variable to be imaged because, in most cases, the user starts with a need (say, for temperature imaging in an aqueous flow) and then must find the technique that best addresses that need. Because some techniques can be used to measure several flow variables, their use may be described under more than one category. Therefore, to avoid too much redundancy, a technique is described only the first time it is mentioned; thus, the uninitiated reader may need to read the article all the way through rather than skipping to later sections.

Density Gradients (Schlieren and Shadowgraph)

Two of the most widely used techniques for qualitative flow visualization, particularly in high-speed flows, are the schlieren and shadowgraph techniques. Although the main emphasis of this article is on quantitative planar imaging techniques, the shadowgraph and schlieren techniques will be briefly discussed because of their extensive use in gas dynamics. Furthermore, the mechanism of light ray deflection by index-of-refraction gradients, which is the basis for these techniques, is a potential source of error in quantitative laser imaging. In their most commonly used forms, the schlieren and shadowgraph techniques provide line-of-sight integrated information about gradients in the index-of-refraction field.
Because the index of refraction is related to gas density, in fluid flows of uniform composition, such as air, the schlieren technique is sensitive to variations in the first derivative of density and the shadowgraph to the second derivative. Interferometry is a quantitative line-of-sight technique that enables imaging the density field, but it will not be discussed here because it is being increasingly supplanted by planar imaging techniques. Because these techniques are spatially integrated along the line of sight, they are limited in the quantitative information that can be inferred in complex, three-dimensional flows. Further details of these techniques can be found in several excellent references (5,9,10,49). The physical basis for the shadowgraph and schlieren techniques is that spatial variations in the index of refraction of a transparent medium cause spatial variations in the phase of plane light waves (33). The index of refraction is defined as n = c0/c, where c0 is the speed of light in vacuum and c is the speed of light in the medium. When traveling through a medium where n > 1, the transmitted wave undergoes a negative phase shift, owing to a lag in the oscillations of the induced dipoles within the medium. For this reason, an object that causes such a phase shift is termed a ‘‘phase object,’’ and it can be contrasted with an ‘‘amplitude object,’’ such as an opaque disk, which changes the amplitude of the light waves. Because the velocity of light is usually considered the
‘‘phase velocity,’’ which is the velocity of a point of constant phase on the wave, the phase shift can be interpreted as a change in the velocity of the transmitted wave. Both the schlieren and shadowgraph techniques are analyzed by considering how a plane wave is affected by propagation through an index-of-refraction gradient. Consider the propagation of a plane wave in the z direction through a transparent medium that has a gradient of n in the y direction. It can be shown that the angular deflection θy in the y direction is given by (9)

θy = ∫L (1/n)(∂n/∂y) dz,    (29)
where the integration is along the line of sight and over the path length L. Equation (29) shows that the angular deflection increases for increasing gradients and longer path lengths. The equation also shows that the light rays are bent in the direction of the gradient, that is, the rays are bent toward regions of higher index of refraction. In gases, the index of refraction is related to the fluid density ρ by the Gladstone-Dale relationship (9),

n = 1 + Kρ,    (30)
where K is the Gladstone–Dale constant. For example, for 633-nm light and T = 288 K, K = 2.26 × 10−4, 1.57 × 10−4, and 1.96 × 10−4 m3/kg for air, argon, and helium, respectively. In water, which is largely incompressible, the index of refraction varies primarily with temperature. For example, for 632.8-nm light, the index of refraction across the temperature range of 20–34 °C is given by (9)

n(T) = 1.332156 − 8.376 × 10−5(T − 20 °C) − 2.644 × 10−6(T − 20 °C)2 + 4.79 × 10−8(T − 20 °C)3.    (31)

An example of a schlieren setup is shown in Fig. 12. For instantaneous imaging, the light source is usually a point source of short duration (typically a microsecond or less); common sources are xenon flash lamps and lasers. In most cases, a flash lamp is preferred to a laser source because lamps are cheaper and the coherence and mode structure
Figure 12. Schematic of a typical laser schlieren setup, comprising a pulsed laser, a microscope objective lens, a pinhole on a three-axis translation stage, lenses (typically mirrors) of focal lengths f1 and f2, the phase object, a knife-edge, and a CCD camera. The undeflected rays are shown in gray, whereas the deflected rays are shown in black. The knife-edge blocks rays deflected downward by negative gradients, which renders those gradients dark in the image. In contrast, the rays deflected upward by the positive gradients miss the knife-edge and are rendered light in the image.
of most pulsed lasers causes a nonuniform image. In some cases, however, such as in plasmas where the background luminosity is very high, the high brightness of a laser is a necessity. In this case, the beam must be spatially filtered to improve its spatial uniformity (33). This is usually accomplished by tightly focusing the beam through a small pinhole by using a microscope objective lens. Note that it is typically very difficult to focus the laser beam onto such a small pinhole, and an integrated lens/pinhole mount that has three axes of translation is necessary. A further problem when using a pulsed laser is that it is difficult to keep from burning the pinhole material, owing to the very high peak intensity at the focus. This problem can be alleviated by substantially reducing the energy of the beam. Although such low laser energies will result in weaker signals, obtaining a sufficient signal is not usually a problem in the schlieren and shadowgraph techniques because the beam is usually directed into the camera (Fig. 12). When using a flash lamp that has an extended arc, the point source is approximated by imaging the arc onto a small aperture (e.g., submillimeter diameter) with a lens. The sensitivity of the schlieren system will be improved by a smaller point source, but the signals are reduced accordingly. For many flash lamps, the arc is small enough that it may not be necessary to use any spatial filter at all. The point light source is collimated by what is typically a large diameter spherical mirror, which is at least as large as the object that is being imaged. Lenses can also be used when smaller fields of view are desired, and this is the situation shown in Fig. 12. The mirror/lens is placed one focal length from the source, which collimates the beam. After the beam passes through the test section, it is then directed to a second mirror/lens (called the ‘‘schlieren head’’), which refocuses the beam. 
In the conventional schlieren setup, a knife-edge (e.g., razor blade) is placed at the second focal spot, as shown in Fig. 12. The horizontal knife-edge shown produces an optical system that renders upward density gradients as light and downward gradients dark. This occurs because the rays are deflected up by the upward gradients and thus miss the knife-edge, whereas the knife-edge blocks the rays that are deflected down by the downward gradients. The analysis of the schlieren intensity variations is conceptually simpler for a line light source, which forms a line image at the focus. In this case, the relative intensity variations at the film plane are given by (9)
ΔI/I = (f2/a) ∫L (1/n)(∂n/∂y) dz,    (32)

where I is the intensity of the image when no gradient is present, ΔI = I∗ − I, I∗ is the intensity when the gradient is present, f2 is the focal length of the schlieren head, and a is the height of the focal image that is not blocked by the knife-edge. Equation (32) shows that a longer focal length schlieren head and a decreasing height of the transmitted portion of the image at the focus increase the sensitivity. Interestingly, increasing the distance between the phase object and the focusing lens/mirror does not affect the sensitivity.
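The deflection and sensitivity relations can be combined into a short numerical sketch for a uniform gradient over path length L: θy ≈ L K (∂ρ/∂y) from Eqs. (29) and (30) with n ≈ 1, and ΔI/I = (f2/a) θy from Eq. (32). The Gladstone–Dale constant for air is from the text; the gradient, f2, and a below are assumptions for illustration:

```python
def schlieren_contrast(drho_dy, L, K, f2, a):
    """Relative intensity change for a uniform density gradient:
    dn/dy = K * drho/dy          (Gladstone-Dale, Eq. 30)
    theta_y ~ L * dn/dy          (Eq. 29, n ~ 1 for gases)
    dI/I = (f2 / a) * theta_y    (Eq. 32)"""
    dn_dy = K * drho_dy
    theta_y = L * dn_dy
    return (f2 / a) * theta_y

# Assumed illustration: air (K = 2.26e-4 m^3/kg), gradient of 1 kg/m^3 per m
# over a 0.1-m path; schlieren head f2 = 1 m, 0.5 mm of unblocked focal image.
contrast = schlieren_contrast(1.0, 0.1, 2.26e-4, 1.0, 0.5e-3)
print(f"dI/I = {contrast:.3f}")
```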
The focus is found by traversing the knife-edge along the optical axis. When the knife-edge is upstream of the focus, the image reveals an inverted shadow of the knife-edge, whereas when the knife-edge is downstream of the focus, the image reveals an upright shadow. It is only at the focus that inserting the knife-edge into the focal spot results in a uniform reduction of intensity of the image and no shadow of the knife-edge. An example of a schlieren image of a supersonic helium jet issuing into room air is shown in Fig. 13 (from Ref. 50). In this figure, the knife-edge was horizontal, and therefore the vertical n-gradients are visualized. Because helium has a very low index of refraction, the n gradients resulting from mixing are very distinct. Furthermore, even subtle features such as the Mach waves in the ambient air are visualized as the dark lines to the outside of the jet. A useful extension of the schlieren technique is ‘‘color’’ schlieren, which uses a white light source combined with a transparency of varying color in place of the knife-edge (5,9,10). Because the eye is better able to distinguish colors than shades of gray, color schlieren is superior for visualizing the density gradients in flows. Although most color schlieren is used for flow visualization, it has also been used to obtain quantitative temperature data in flows by relating the color of the image to the angular deflection of the light rays (51). When used with an axisymmetric phase object, this technique enables the tomographic reconstruction of the three-dimensional temperature field (52).
An interesting way of looking at the function of the knife-edge is as a filter, which acts on the spatial frequencies in the phase object. This can be seen by considering that the second focus is called the ‘‘Fourier transform plane,’’ because the intensity distribution at the focus is related to the spatial frequency content of the phase object (33,53). Higher spatial frequencies are associated with increasing radial distance from the center of the focal spot. The dc component is the neutral intensity background present when there is no phase object, and it can be filtered out if an opaque disk is used as the spatial filter. In this case, when the phase object has no high-frequency content, then the image will be uniformly dark. When higher spatial frequencies are present in the phase object, the disk will not block them, and they will be visualized as light regions in the image. It can be seen from this that the shape of the spatial filter can be tailored to visualize different frequencies in the phase object. This principle has been used to develop a system that directly measures the power spectrum of the line-of-sight integrated index of refraction fluctuations in turbulent flows (54,55). Note that an alternative technique has been developed, named the ‘‘focusing schlieren’’ technique, which enables visualizing density gradients as in conventional schlieren, but the depth of field along which the signal is integrated can be just a few millimeters (56). The focusing schlieren technique can yield nearly planar images of the density gradient field at substantially lower cost than by planar laser imaging. In some cases, such as a large-scale wind tunnel where optical access is limited, it may be the only means of acquiring spatially resolved image data. The shadowgraph effect can be understood from simple geometrical ray tracing, as shown in Fig. 14. 
Here a plane wave traverses a medium that has a nonuniform index-of-refraction gradient and is allowed to illuminate a screen. The rays traversing the region that has no gradient are not deflected, whereas the rays traversing the region that has an upward gradient are bent up. The resulting image on the screen consists of regions where the rays converge and diverge; these appear as regions of light and dark, respectively. It is this effect that gives the technique its name because gradients leave a shadow, or dark region, on the viewing screen. It can be shown that the intensity variations on the screen follow the relationship (9)

ΔI/I = L ∫ (∂2/∂x2 + ∂2/∂y2)(ln n) dz.    (33)
Figure 13. Sample schlieren image of a Mach 2 jet of helium exhausting into room air. The knife-edge is oriented horizontally; thus the vertical index of refraction gradients are visualized. The image reveals fine structures of the jet turbulence in addition to Mach waves that are generated by structures that travel at supersonic speeds with respect to the ambient. (Reprinted with permission from Mach Waves Radiating from a Supersonic Jet by N. T. Clemens and P. H. Paul, Physics of Fluids A 5, S7, copyright 1993 The American Institute of Physics.)

Figure 14. Illustration of the shadowgraph effect: collimated light rays traverse a phase object, and the deflected rays produce alternating neutral, light, and dark regions on the illumination screen.
For gas flows, incorporating the Gladstone–Dale relationship into Eq. (33) shows that the shadowgraph technique is sensitive to the second derivative of the density along the line of sight of the light beam. A shadowgraph system can be set up almost trivially by using an approximately collimated light source and a screen. For example, a shadowgraph system suitable for classroom demonstrations can be made by expanding the beam from a laser pointer using a short focal length lens and projecting the beam onto a wall a few meters away. This simple system will enable the visualization of the thermal plume rising from a candle flame. Despite the simplicity of this system, more sophisticated setups are typically desired. For example, the schlieren setup shown in Fig. 12 can be used for shadowgraph by simply removing the knife-edge. However, unlike schlieren, where the camera is focused on the phase object, the camera must be slightly defocused to produce sufficient divergence of the deflected rays on the image plane. This feature enables one to ‘‘focus out’’ the shadowgraph effect in a schlieren system. An obvious disadvantage to this technique is that any amplitude objects in the image (e.g., a bullet) will be slightly out of focus. The problem of slight defocus is generally tolerable, compared to the advantages of being able to alternate quickly between the schlieren and shadowgraph techniques.

Concentration/Density

Imaging the concentration of a particular type of fluid or chemical species is primarily of interest in studies of mixing and combustion. Concentration and density are related quantities in that they both quantify the amount of a substance per unit volume. Because most optical diagnostic techniques are sensitive to the number of scatterers per unit volume, rather than to the mass per unit volume, the concentration is the more fundamental quantity. Of course, density can be inferred from the concentration if the fluid composition is known.
Concentration imaging is of interest in nonreacting mixing studies and in reacting flows for investigating the relationship between the chemical state of the fluid and the fluid mechanics. Planar laser-induced fluorescence imaging is probably the most widely used technique for quantitative scalar imaging because it can be used in liquids and gases, it is species specific, and its high signals enable measuring even minor species in gas-phase flows (11,27,57). In laserinduced fluorescence (LIF), a laser is used to excite an atom or molecule from a lower energy state into a higher energy state by the absorption of a photon of light. The frequency of light required is related to the energy difference between the states through the relationship E = hν, where E is the energy per photon and ν is the frequency of light. The excited state is a state of nonequilibrium, and thus the atom/molecule will tend to return to equilibrium by transiting to a lower energy state. The return to the lower state can occur by several processes, including spontaneous emission of a photon of light (fluorescence); stimulated emission by the incident laser light; ‘‘quenching,’’ that is, the transfer of energy to other atoms/molecules through molecular collisions; and
by internal energy transfer, or the transfer of energy to other energy modes within the molecule. Because the probability of quenching depends on local thermodynamic conditions, the LIF signal is in general a function of several flow variables, including the concentrations of all species present, temperature, and pressure. Furthermore, the theoretical dependence of the LIF signal on the flow variables depends on the specific model of the energy-transfer physics. Because properly modeling the physics is an important part of quantifying PLIF measurements, PLIF can be a particularly challenging technique to use. The dependence of the signal on many variables presents both an opportunity and a disadvantage for making quantitative measurements. The opportunity is that PLIF can be used to measure a range of flow variables for a remarkable number of chemical species. However, it is generally very difficult to relate the LIF signal to a particular variable of interest (e.g., species concentration) because the signal depends on so many other flow variables, which may not be known. For example, in using PLIF for OH, which is commonly used in flames as an approximate marker of the reaction zone, the PLIF signal is a function of the OH mole fraction; the mole fractions of several other species, including N2, O2, H2O, and CO2; and the temperature. Because it is virtually impossible to measure all of these variables, the signal can be quantified only by assuming a certain level of knowledge about the thermochemical state of the flow (e.g., equilibrium chemistry). Despite the caveat about the difficulties that can be encountered when using PLIF imaging, there are many cases where PLIF imaging is in fact relatively simple to implement. The first case is using PLIF in liquid flows. PLIF in liquids, particularly water, is achieved by seeding a fluorescent organic dye into the flow.
Because many liquids are essentially incompressible and isothermal, the PLIF signal is usually a function only of the dye concentration and therefore is ideal for mixing studies (6,58,59). Fluorescent dyes absorb light across a very broad range of wavelengths, and thus they can be stimulated by using a number of different lasers. Some of the more popular dyes for aqueous flows include fluorescein, rhodamine B, and rhodamine 6G; all of their absorption bands overlap one or more emission lines of the argon-ion, copper-vapor, and doubled Nd:YAG lasers. Because of this and because liquid-phase PLIF tends to exhibit high signal levels (due to the high density of the fluid), excellent results can usually be achieved without highly specialized equipment. It is important to note that some dyes suffer from photobleaching effects at high laser intensity (or fluence), which can lead to significant errors in concentration measurements (60–63). Photobleaching is the reduction in the concentration of fluorescent molecules due to laser-induced photochemistry. Both fluorescein and rhodamine 110 are particularly problematic, and (60) even suggests abandoning the use of fluorescein in favor of rhodamine B. Another important issue in using PLIF of organic dyes is that the high signals are often a result of the high absorption coefficient of the dye solution. In this
case, substantial laser beam attenuation is encountered when the optical path lengths are relatively large. Beam attenuation can be alleviated by reducing the dye concentration along the beam path or by reducing the optical path length; however, this is often not possible, owing to SNR considerations or other practical limitations. Alternatively, attenuation along the ray path can be corrected for by using the Beer-Lambert absorption law, provided that the entire path length of a given ray of the laser sheet is imaged (11,64). PLIF is also relatively easy to implement in nonreacting gas-phase flows, where the flow can be seeded with a gas-phase tracer species. By far the most popular tracer to date is acetone, although biacetyl, NO, and I2 have also been used to a more limited extent. Acetone (CH3COCH3) is an excellent tracer species in nonreacting flows because it is relatively nontoxic, fairly easy to seed into flows, generally provides good signals, and can be pumped at a range of UV wavelengths (65). A characteristic feature of polyatomic molecules, such as acetone, is that they have broad absorption bands. The absorption band of acetone ranges from about 225 to 320 nm, and thus it is readily pumped by quadrupled Nd:YAG (266 nm) and XeCl excimer (308 nm) lasers. Furthermore, although its fluorescence efficiency is not very high (about 0.1–0.2%), its high saturation intensity means that high laser energies can be used, which compensates for any limitation in fluorescence efficiency. A small sample of studies where acetone PLIF was used for concentration measurements includes jets (32,65), jets in crossflow (66), supersonic shear layers (67), and internal combustion engines (68). Biacetyl (CH3(CO)2CH3) is another low toxicity seed species that has been used in nonreacting flows and to a lesser extent in flames.
Biacetyl vapor absorbs in the range 240–470 nm and exhibits blue fluorescence over the range 430–520 nm and green phosphorescence over the range 490–700 nm. The quantum yield, that is, the ratio of emitted to absorbed photons, is 15% for phosphorescence but only 0.2% for fluorescence. For this reason, biacetyl phosphorescence has been used to produce very high SNR imaging (29). Several different lasers have been used for biacetyl pumping, including dye lasers, excimers, and frequency-tripled Nd:YAGs. One drawback of using biacetyl is that O2 quenches the phosphorescence, which leads to a significantly lower SNR when biacetyl is seeded into air. Furthermore, the long lifetime of the phosphorescence (about 1 ms) can be severely limiting if high temporal resolution is required. Finally, biacetyl can be difficult to work with because it has a very strong odor (akin to butterscotch) that can rapidly permeate an entire building if not contained. Other seed species that have been used for species mole fraction measurements, primarily in supersonic mixing flows, include I2 and NO. Both species are difficult to work with because they are highly corrosive and toxic. One of the main advantages of diatomic molecules is that they tend to have many discrete absorption lines, in contrast to more complex polyatomic molecules, such as acetone, whose lines are very broad. Diatomic molecules thus give the user much greater ability to choose the temperature dependence of the LIF signal. This property has been
used in several supersonic mixing studies where relatively temperature-insensitive transitions were used so that the resulting LIF signal was approximately proportional to the mole fraction of the fluorescent species (69–71). A major issue in mixing studies is that unless the smallest scales of mixing are resolved, it is not possible to differentiate between fluid that is uniformly mixed at the molecular level and fluid that is simply ''stirred'' (i.e., intertwined, but without interdiffusion). An interesting application of NO PLIF is in ''cold chemistry'' techniques, which can differentiate between mixed and stirred fluid on a scale smaller than can be resolved. These techniques use the fact that NO fluorescence is rapidly quenched by O2 but is negligibly quenched by N2. Cold chemistry has been used to obtain quantitative statistical mixing properties of high Reynolds number shear layers where the smallest mixing scales were not resolved (72,73). The technique has also been extended to enable direct imaging of the degree of mixing/stirring at each pixel by simultaneously imaging the fluorescence from a quenched (NO) and a nonquenched (acetone) species (74). PLIF has proven extremely useful for investigating mixing and supersonic flows by seeding a tracer, and it is also important for imaging naturally present species, such as those that occur in chemically reacting flows. Because PLIF is a highly sensitive technique, it enables the imaging of trace species, such as combustion intermediates. Indeed, PLIF has been used to image an astounding number of species in flames. A limited list of major and intermediate species that have been imaged in flames by PLIF includes CH, OH, NO, NO2, C2, CN, NH, O2, CO, C2H2, H2CO, O, and H (11,57,75). The power of PLIF species imaging in combustion research is exemplified by Fig. 15, which shows a pair of simultaneously acquired images of CH and OH in a turbulent nonpremixed methane–oxygen jet flame (76).
The CH was pumped at a wavelength of about 390 nm, and the fluorescence was collected across the range of 420–440 nm; the OH was pumped at about 281 nm, and the fluorescence was collected across the range of 306–320 nm. The laser excitation was achieved by using two Nd:YAG lasers, two dye lasers, and frequency-doubling and mixing crystals; the images were captured
Figure 15. Sample of simultaneously acquired CH/OH PLIF images in a turbulent methane–oxygen jet flame. The CH field is shown at left, the OH field at center, and the superposition of the two at the right. The coordinates x and r refer to axial and radial distances, respectively, and d is the diameter of the jet nozzle. (Reprinted by permission of Elsevier Science from Reaction Zone Structure in Turbulent Nonpremixed Jet Flames — From CH-OH PLIF Images by J. M. Donbar, J. F. Driscoll and C. D. Carter, Combustion and Flame, 122, 1–19, copyright 2000 Combustion Institute.) See color insert.
on two intensified CCD cameras. The CH field is shown at the left, the OH in the middle, and the two images are shown superimposed at the right. Rayleigh scattering has also been used successfully to image the concentration field in a range of flows. Rayleigh scattering is defined as the elastic scattering from particles, including atoms and molecules, that are much smaller than the wavelength of the incident light (77). In molecular Rayleigh scattering, the differential Rayleigh scattering cross section at 90°, (dσ_Ray/dΩ), is given by the relationship (78)

dσ_Ray/dΩ = 4π²(n − 1)² / (N_d² λ⁴)    (34)
where N_d is the number density. Note that in a mixture of fluids, the Rayleigh scattering signal is proportional to the total cross section of the mixture, and thus it is not species specific (11), which greatly limits its utility for measuring concentration in reacting flows. Equation (34) shows that the differential scattering cross section scales as λ⁻⁴, which indicates a much greater scattering efficiency for short wavelengths of light. However, whether it is advantageous to work at UV rather than visible wavelengths must be determined from an analysis of the entire electro-optical system. For example, is it better to measure using a frequency-quadrupled (266 nm) or a frequency-doubled (532 nm) Nd:YAG laser? First, consider that the signal recorded by a detector is directly proportional to the number of incident photons (E_L/hν), as shown in Eq. (5). Because ν = c/λ, the number of photons per pixel for Rayleigh scattering scales as S_pp ∝ (E_L λ)λ⁻⁴ ∝ E_L λ⁻³; thus, the dependence of the signal on the wavelength is weaker on a per photon basis. Furthermore, the quantum efficiency of most detectors decreases in the UV, and there are few high-quality, fast (low f#) photographic lenses that operate at UV wavelengths. For example, consider scattering measured using a Nd:YAG laser that produces 500 mJ at 532 nm and 125 mJ at 266 nm. In this case, the number of photons scattered at 266 nm will be only twice as large as that at 532 nm. After accounting for the likely reduced quantum efficiency and f# of the collection optics, UV excitation may not improve the signal at all. UV excitation is more likely to be beneficial when using excimer lasers, which produce very high energies per pulse well into the UV. This example shows that it is necessary to account for all aspects of the electro-optical system, not just the scattering cross section, when determining the optimal wavelength to use.
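These scalings can be checked with a few lines of arithmetic. In the sketch below, the refractivity and number density are representative assumed values for air at STP, not figures taken from the cited references; the laser pulse energies are the ones quoted in the text.

```python
import numpy as np

# Eq. (34) with representative (assumed) values for air at STP:
n_minus_1 = 2.8e-4            # refractivity of air in the visible (approx.)
N_d = 2.69e25                 # number density at STP, m^-3 (Loschmidt number)

def rayleigh_cross_section(lam):
    """Differential Rayleigh cross section at 90 deg, m^2/sr, per Eq. (34)."""
    return 4 * np.pi**2 * n_minus_1**2 / (N_d**2 * lam**4)

print(f"{rayleigh_cross_section(532e-9):.1e} m^2/sr")  # of order 1e-32..1e-31,
                                                       # consistent with the
                                                       # 7e-32 quoted for N2

# Per-pixel signal scales as S_pp ∝ E_L * lambda^-3.  Compare the example in
# the text: 500 mJ at 532 nm versus 125 mJ at 266 nm from the same laser.
def relative_signal(E_laser, lam):
    return E_laser * lam**-3

ratio = relative_signal(0.125, 266e-9) / relative_signal(0.500, 532e-9)
print(round(ratio, 6))   # → 2.0: only twice the photons at 266 nm, before
                         # UV penalties in detector QE and lens speed
```

The factor of 2 here is the "only twice as large" result quoted in the text: halving the wavelength gains a factor of 8 in λ⁻³ but loses a factor of 4 in pulse energy.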
One of the most common uses of Rayleigh scattering is in nonreacting mixing studies, where it is used as a passive marker of fluid concentration. For example, jet mixing can be studied by imaging the Rayleigh scattering when a jet fluid that has a high Rayleigh cross section issues into an ambient fluid that has a low cross section (32,43,79,80). In this case, the mole fraction of jet fluid can be related to the scattering signal through the relationship χ_jet = [S_e − (S_e)_∞]/[(S_e)_0 − (S_e)_∞], where (S_e)_0 and (S_e)_∞ are the signals obtained at the jet exit and in the ambient fluid, respectively. An example of a Rayleigh scattering image is shown in
Figure 16. Example of a planar Rayleigh scattering image of a turbulent propane/acetone jet. The jet issued into a slow co-flow of filtered air, the field of view was 35 × 35 mm, and the local Reynolds number at the measuring station was 5600. The signal is proportional to the concentration of jet fluid. (Reprinted with permission from Planar Measurements of the Full Three-Dimensional Scalar Dissipation Rate in Gas-Phase Turbulent Flows by L. K. Su and N. T. Clemens, Experiments in Fluids 27, 507–521, copyright 1999 Springer-Verlag.) See color insert.
Fig. 16 (from Ref. (32)), which was acquired in a planar turbulent jet of local Reynolds number 5600 at a distance of 100 slot widths downstream. The jet fluid was propane, which was seeded with about 5% acetone vapor, and the jet issued into a slow co-flow of air. The jet was illuminated by 240 mJ of 532-nm light produced by a Nd •• YAG laser, and the images were captured by a slow-scan CCD camera that had a 58-mm focal length, f /1.2 lens and a laser line filter (50% maximum transmission) at a magnification of 0.28. In Fig. 16, the signal is proportional to the concentration of jet fluid, and the figure demonstrates that Rayleigh scattering can be used to obtain very high quality images of the jet concentration field. One of the main difficulties in such an experiment is the need to reduce all sources of unwanted elastic scattering, such as reflections from test section walls/windows and scattering from dust particles in the flow. The background scattering from windows and walls is particularly problematic because it can easily overwhelm the weak Rayleigh signals. Although theoretically these background signals can be removed as part of a background correction, such as obtained by filling the test cell with helium (81), this can be done only if the shot-to-shot variation in the background is substantially less than the Rayleigh signals of interest. In many cases, however, such as in a relatively small test section, this is not the case, and background interference is unacceptably high. When the background due to laser reflections from walls/windows is high, increasing the laser energy does not improve the signal-to-background ratio because the signal and background increase proportionately. In this case, the only recourse is to increase the signal by using a higher cross section or number density or to lower the background by reducing reflections by painting opaque surfaces flat black and by using antireflection coatings on all windows.
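The mole-fraction normalization χ_jet = [S_e − (S_e)_∞]/[(S_e)_0 − (S_e)_∞] quoted above is a one-line operation in practice. The signal counts below are synthetic, chosen only to show that the mapping sends pure jet fluid to 1 and pure ambient fluid to 0.

```python
import numpy as np

def jet_mole_fraction(S, S_jet_exit, S_ambient):
    """Convert a Rayleigh signal image to jet-fluid mole fraction via
    chi = (S - S_inf) / (S_0 - S_inf)."""
    return (S - S_ambient) / (S_jet_exit - S_ambient)

# Synthetic signal values (hypothetical counts): pure jet, half-mixed, ambient
S = np.array([10.0, 6.0, 2.0])
chi = jet_mole_fraction(S, S_jet_exit=10.0, S_ambient=2.0)
print(chi)   # pure jet fluid maps to 1, pure ambient to 0
```

In a real experiment, (S_e)_0 and (S_e)_∞ would be averages over known pure-fluid regions of the image, and a background correction would be applied first.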
In some cases, filtered Rayleigh scattering (FRS) can be used to greatly reduce the background reflections from walls and windows (5,82). In FRS, the Rayleigh scattering from a narrow bandwidth laser is imaged through a narrow line notch filter. For example, in one implementation used in high-speed flows, the Rayleigh scattering is induced by the narrow line light from an injection-seeded frequency-doubled Nd:YAG laser, and the scattering is imaged through a molecular iodine absorption filter. If the laser beam and camera are oriented in the appropriate directions, the light scattered by the moving molecules will be Doppler-shifted, whereas the reflections from stationary objects will not be shifted. Figure 17 illustrates the FRS technique. Because the scattering is imaged through an absorption filter, the unshifted light is absorbed by the filter, whereas the Doppler-shifted scattering is partially or completely transmitted. This same technique also forms the basis of a velocity diagnostic that will be discussed later. Spontaneous Raman scattering has also been employed for quantitative concentration measurements in turbulent flows. It is particularly useful in combustion research because it is linear, species specific (unlike Rayleigh scattering), and enables measuring multiple species using a single laser wavelength (11). Spontaneous Raman scattering is caused by the interaction of the induced-dipole oscillations of a molecule with its rotational and vibrational motions. In other words, the incident laser beam of frequency ν0 is shifted in frequency by the characteristic frequency of rotation/vibration. The frequency of Raman scattering is shifted either to lower frequencies (called Stokes-shifted) or to higher frequencies (called anti-Stokes-shifted). The photon that is Stokes-shifted has lower energy than the incident photon, and the energy difference is transferred to the energy of
Figure 17. Illustration of the filtered Rayleigh scattering technique. The scattering from walls and windows has the same line shape and line center frequency as the laser itself. The scattering from the flow is shown as molecular (Rayleigh–Brillouin) scattering, which may be broader than the laser line, owing to thermal and acoustic motions of the molecules. If the scattering medium is particles rather than molecules, then the Rayleigh scattered light will have the same line shape as the laser. When the scattering is imaged through a notch filter (shown as the dotted line), then the Doppler-shifted light is partially or completely transmitted, whereas the scattering from stationary objects is not transmitted.
vibration/rotation of the molecule. Similarly, anti-Stokes-shifted photons have higher energy, and thus energy is given up by the molecule. In most flow imaging studies, vibrational Raman scattering is used because the lines for different species are fairly well separated. For example, for excitation at 532 nm, the Stokes-shifted vibrational Raman scattering from N2, O2, and H2 occurs at wavelengths of 607, 580, and 683 nm, respectively. In contrast, owing to the smaller energies of rotation, the rotational Raman lines in a multispecies mixture tend to be grouped closely around the excitation frequency, making it very difficult to distinguish the scattering from a particular species. The main problem in spontaneous Raman scattering is that the signals tend to be very weak, owing to very small scattering cross sections. Typically, Raman scattering cross sections are two to three orders of magnitude smaller than Rayleigh cross sections (11). For example, for N2 at STP, (dσ/dΩ)_Ray = 7 × 10⁻³² m²/sr, whereas the vibrational Raman cross section (dσ/dΩ)_Ram = 4.6 × 10⁻³⁵ m²/sr, which is more than three orders of magnitude smaller. The low signals inherent in Raman scattering make it applicable in only a few very specialized cases, such as when only major species are of interest and when very high laser energies can be generated. For example, methane concentration has been imaged in jets and flames; however, this required a high-energy flashlamp-pumped dye laser (λ ≈ 500 nm, >1 J/pulse) combined with a multipass cell (19,83,84). The multipass cell increased the laser fluence by about 30 times over what could be achieved using only a cylindrical lens. A similar setup was used to image the Raman scattering from the C–H stretch vibrational mode in methane-air jet flames (85).
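The Rayleigh-to-Raman signal penalty quoted above follows directly from the two cross sections given in the text:

```python
# Cross sections for N2 at STP, as quoted in the text
dsigma_rayleigh = 7e-32    # m^2/sr, Rayleigh
dsigma_raman = 4.6e-35     # m^2/sr, vibrational Raman

ratio = dsigma_rayleigh / dsigma_raman
print(f"{ratio:.0f}")      # ≈ 1522: more than three orders of magnitude
```

All else being equal, a Raman imaging experiment therefore starts roughly 1500 times more photon-starved than the already-weak Rayleigh measurement, which is why multipass cells and joule-class pulses are needed.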
Despite the use of very high laser energies and multipass cells, the relatively low SNRs reported in these studies demonstrate the great challenge of applying Raman scattering imaging in flames.

Temperature/Pressure

Several techniques have been developed to image temperature in both liquid- and gas-phase flows. Most liquid-phase temperature imaging has been accomplished using either liquid crystals or PLIF of seeded organic dyes. For example, suspensions of small liquid crystal particles were used to image the temperature field in aqueous (86) and silicon oil flows (87,88). In these studies, the liquid crystal suspensions were illuminated by a white light sheet, and the reflected light was imaged by using a color CCD camera. The color of the crystals was then related to the local flow temperature using data from independent calibration experiments. The advantage of liquid crystals is that they can measure temperature differences as small as a fraction of a degree, but typically only within a range of a few degrees. Furthermore, they have a rather limited response time, and their spatial resolution is not as good as can be achieved by planar laser imaging. PLIF thermometry offers an improvement in some of these areas, but the minimum resolvable temperature difference tends to be inferior. The simplest PLIF technique is single-line
thermometry, where a temperature-sensitive dye is uniformly seeded into the flow and the signals are related to temperature using data from a calibration experiment. For example, rhodamine B dye has relatively good temperature sensitivity because its LIF signal decreases by 2–3% per °C. In (89), temperature fields were acquired by exciting rhodamine B using a frequency-doubled Nd:YAG laser and imaging the fluorescence through a color filter; the authors report a measurement uncertainty of about 1.7 °C. A potential source of error in flows that have large index of refraction gradients, such as occur in variable temperature liquid- or gas-phase flows, is the variation in the intensity of the laser beam owing to the shadowgraph effect. This can be a significant problem in liquid-phase flows where the temperature differences are of the order of several degrees or more or where fluids that have different indexes of refraction are mixed. In gas-phase flows, shadowgraph effects are less of a problem, but they may be significant when mixing gases, such as propane and air, that have very different indexes of refraction, and at high Reynolds numbers, where gradients tend to be large. For example, careful viewing of the mixing of propane and air shown in Fig. 16 reveals subtle horizontal striations that are caused by shadowgraph effects. In principle, it is possible to correct for shadowgraph effects (64,89), provided that the smallest gradients are resolved, by correcting the laser beam intensity along a ray path using Eq. (33). In the planar imaging of turbulent flow, however, it is not possible to correct for out-of-plane gradients, and thus the correction procedure will not be completely accurate. As an alternative to correcting for shadowgraph effects, two-line techniques have been developed where a mixture composed of a temperature-sensitive dye and a temperature-insensitive dye is seeded into the flow (63,90).
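The single-line calibration described above can be sketched in a few lines. The 2%-per-°C sensitivity is the figure quoted for rhodamine B; the reference temperature and the exponential form of the calibration are assumptions standing in for a real calibration curve.

```python
import numpy as np

SENSITIVITY = 0.02   # fractional LIF signal loss per °C (from the text)
T_REF = 20.0         # °C, hypothetical calibration temperature

def temperature(signal_ratio):
    """Invert an assumed calibration S/S_ref = (1 - SENSITIVITY)^(T - T_REF)
    to recover temperature from a normalized single-line PLIF signal."""
    return T_REF + np.log(signal_ratio) / np.log(1.0 - SENSITIVITY)

# A 10% signal drop corresponds to roughly a 5 °C temperature rise
print(round(temperature(0.9), 1))   # → 25.2
```

The inversion is only as good as the calibration and the uniformity of the dye seeding and laser sheet, which is exactly the limitation that motivates the two-dye ratio techniques.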
If dyes are chosen that fluoresce at different wavelengths (when excited by the same wavelength of light), then the ratio of the two LIF signals is related to the temperature but is independent of the excitation intensity. In some cases, it is desired to remove shadowgraph effects while maintaining density differences. In this case, it is possible to make a binary system of fluids that have different densities but the same index of refraction (e.g., 91). One of the simplest techniques for measuring temperature in gas-phase, constant-pressure flows is to measure density by schlieren deflectometry, interferometry, or Rayleigh scattering, from which the temperature can be inferred using an equation of state. For example, the rainbow schlieren (or deflectometry) technique discussed previously (51,52) enables imaging the temperature field under certain conditions, such as in constant pressure, steady, two-dimensional, laminar flows. However, because this technique is spatially integrated, it has limited applicability to 3-D, unsteady flows, particularly where the composition and temperature (hence, index of refraction) vary in space and time. Unfiltered Rayleigh scattering techniques typically require a constant pressure flow that has a uniform index of refraction (hence, Rayleigh scattering cross section). In this case, variations in the Rayleigh scattering signal are due only to temperature variations. In general, however, mixing and reacting flows exhibit
variations in fluid composition, which lead to variations in the index of refraction, even at constant temperature. It is for this reason that combustion researchers have used specialized fuel mixtures whose Rayleigh scattering cross section is approximately constant for all states of combustion, so that the Rayleigh scattering signal is inversely proportional to temperature (92,93). The main drawback of this technique is that it assumes equal molecular diffusivities of heat and species, which is a rather dubious assumption in many cases. FRS can be used for temperature imaging by relating changes in the scattered signal line shape to the temperature. In Rayleigh scattering from molecules, even if the incident light is essentially monochromatic, the scattered light will be spread over a range of frequencies due to thermal and acoustic motions, as illustrated in Fig. 17 (5,82). When the scattering combines thermal and acoustic broadening, it is sometimes called Rayleigh–Brillouin scattering. The resulting scattered light line shape, which is sensitive to the temperature, pressure, and composition of the gas, can be used to measure those quantities. For example, if the Rayleigh scattering is imaged through a notch filter that has a known transmission curve and the theoretical Rayleigh–Brillouin line shape is known, then it is possible to infer the temperature field under certain conditions. Techniques using this procedure have enabled imaging the mean pressure and temperature fields in a Mach 2 free air jet (94) and the instantaneous temperature field in premixed flames (95). In a related technique, Rayleigh–Brillouin scattering is imaged through a Fabry–Perot interferometer, which gives a more direct measure of the frequency and line shape of the scattered light (96). This technique has been used to measure temperature (and velocity) in high-speed free jet flows. PLIF has also been extensively used for temperature imaging in gas-phase flows.
The most commonly used technique is two-line PLIF of diatomic species (such as NO, I2 , and OH), where the ratio is formed from the fluorescence resulting from the excitation of two different transitions originating from different lower rotational levels (11,57). The advantage of the two-line technique is that the ratio of the signals is directly related to the rotational temperature but is independent of the local collisional environment, because the quenching affects the fluorescence from both lines similarly. The main difficulty in this technique is that if instantaneous temperature fields are desired, then two tunable laser sources and two camera systems are required. If only time-average measurements are required, then it is possible to use only a single laser/camera system. The two-line imaging technique has been used on a wide variety of flows for a range of fluorescent species. For example, measurements have been made in flows seeded by NO (97–99) and I2 (70,100) and by naturally occurring species such as OH (101). An example of a mean temperature image of a Mach 3 turbulent bluff trailing-edge wake is shown in Fig. 18. This image was obtained by seeding 500 ppm of NO into the main air stream and then capturing the fluorescence that results from the excitation of two different absorption lines (99). The figure clearly reveals the structure of the wake flow field, including
the warm recirculation region behind the base, the cool expansion fans, and the jump in temperature across the recompression shocks. Most PLIF thermometry has employed diatomic molecules, but the temperature dependence of acetone fluorescence has been used for single- and two-line temperature imaging in gas-phase flows (102,103). The advantage of using acetone is that it absorbs across a broad range of frequencies, and thus tunable lasers are not required. In the single-line technique, which is applicable to flows that have a uniform acetone mole fraction and constant pressure, relative temperature measurements can be made up to temperatures of about 1000 K. For example, pumping by a KrF excimer laser at 248 nm can provide an estimated 1 K measurement uncertainty at 300 K. When the acetone mole fraction is not constant (such as in a mixing or reacting flow), a two-line technique can be used that is based on measuring the ratio of the LIF signals resulting from excitation by two fixed-frequency lasers. For example, the ratio of PLIF images obtained from pumping by a XeCl excimer (308 nm) and a quadrupled Nd:YAG (266 nm) can be used to achieve a factor of 5 variation in the signal ratio across the range 300–1000 K. Compared to the single-laser technique, the two-laser technique is considerably harder to implement (particularly if both images are acquired simultaneously), and it exhibits substantially lower temperature sensitivity. None of the techniques developed for measuring pressure measures it directly; instead, they infer its value from an equation of state combined with measurements of the fluid density and temperature. For this reason, pressure is very difficult to infer in low-speed flows, because the pressure fluctuations result in only very small fluctuations in density and temperature. PLIF has seen the most extensive use in pressure imaging, although one technique based on FRS (94) has been developed and was described earlier.
For example, in (104), PLIF of seeded
iodine and known absorption line shapes were used to infer first-order accurate pressure information for an underexpanded jet. This technique requires an isentropic flow assumption, which makes it inapplicable in many practical situations. In a related iodine PLIF technique, the pressure field was obtained by measuring its effect on the broadening of the absorption line shape (105). A limitation of this technique is that it is not very sensitive to pressure at moderate to high pressures (e.g., near 1 atm and above). In (106), NO PLIF was used to infer the 2-D pressure field in a high-enthalpy shock tunnel flow using the ratio of NO PLIF signals from a pressure-insensitive B-X transition and a pressure-sensitive A-X transition. A correction for the temperature, measured in an earlier study, then allowed the authors to infer the static pressure. This technique may be more practical than I2 PLIF in some cases because NO occurs naturally in high-temperature air flows, but its disadvantages include the low fluorescent yield of the B-X transition and the need for accurate quenching rates. NO PLIF was also used to infer the pressure field of the bluff-body turbulent wake whose temperature field is shown in Fig. 18 (99). In this technique, trace levels of NO were seeded into a free stream of nitrogen. Because N2 is very inefficient in quenching NO fluorescence, the LIF signal is directly proportional to the static pressure and to a nonlinear function of temperature. However, the temperature dependence can be corrected for if the temperature is measured independently, such as by the two-line method. The resulting mean pressure field obtained by this technique is shown in Fig. 19. This figure shows the low-pressure expansion fans originating at the lip of the splitter plate, the pressure increase across the recompression shock, and the nearly constant-pressure turbulent wake.
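The two-line ratio thermometry used to produce Fig. 18 (and relied on above for the pressure correction) rests on a Boltzmann-fraction argument: the ratio of the two LIF signals varies as R = C exp(−ΔE/k_B T), where ΔE is the lower-state energy separation of the two transitions. The sketch below inverts that relation; ΔE and the lumped calibration constant C are hypothetical stand-ins for values that would come from spectroscopy and calibration.

```python
import numpy as np

K_B = 1.380649e-23   # Boltzmann constant, J/K

def two_line_temperature(R, delta_E, C):
    """Invert the two-line LIF ratio R = C * exp(-delta_E / (k_B * T)) for T.
    delta_E and C are assumed known from spectroscopy and calibration."""
    return delta_E / (K_B * np.log(C / R))

# Round trip at a known temperature (all values hypothetical)
delta_E, C, T_true = 4.0e-21, 2.5, 300.0
R = C * np.exp(-delta_E / (K_B * T_true))
print(round(two_line_temperature(R, delta_E, C), 3))   # → 300.0
```

Because the same quenching environment affects both lines, the collisional terms cancel in the ratio, which is the advantage the text attributes to the two-line method.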
Velocity

The most widely applied velocity imaging technique in fluid mechanics is particle image velocimetry (PIV). PIV is a very robust and accurate technique, which in its
Figure 18. The mean temperature field of a supersonic bluff-body wake derived from two-line NO PLIF imaging. The field of view is 63 mm wide by 45 mm high. (Reprinted with permission from PLIF Imaging of Mean Temperature and Pressure in a Supersonic Bluff Wake by E. R. Lachney and N. T. Clemens, Experiments in Fluids, 24, 354–363, copyright 1998 Springer-Verlag.) See color insert.
Figure 19. The mean pressure field of a supersonic bluff-body wake derived from NO PLIF imaging. The field of view is 63 mm wide by 45 mm high. (Reprinted with permission from PLIF Imaging of Mean Temperature and Pressure in a Supersonic Bluff Wake by E. R. Lachney and N. T. Clemens, Experiments in Fluids, 24, 354–363, copyright 1998 Springer-Verlag.) See color insert.
most common implementation enables imaging two components of velocity in a cross section of the flow. PIV measurements are now commonplace, and they have been applied to a wide range of gas- and liquid-phase flows, including microfluidics, large-scale wind tunnels, flames, plasmas, and supersonic and hypersonic flows. Excellent introductions to PIV can be found in several references (5,20,23,107). At its simplest, PIV involves measuring the displacement of particles that move with the fluid over a known time. The presumption, of course, is that the particles, which are usually seeded, have sufficiently low inertia to track the changes in the motion of the flow (108). Even a cursory review of the literature shows that there are myriad variations of PIV; therefore, for brevity, only one of the most commonly used configurations will be discussed here. In a typical PIV experiment, two spatially coincident laser pulses are used, separated by a known time. The coincident beams are formed into thin sheets and passed through a flow seeded with particles. The lasers used are usually frequency-doubled Nd:YAG lasers, and the two pulses can
originate from two separate lasers, from double-pulsing the Q-switch of a single laser, or from one of several dual-cavity lasers designed specifically for PIV applications. In two-component PIV, the scattering from the particles is imaged at 90° to the laser sheets using a high-resolution CCD camera or, less commonly today, a film camera. The particle pairs can be imaged either onto a single frame (i.e., a double exposure) or onto separate frames. A major issue in PIV is that if it is not possible to tell which particle image of the pair came first, then there is an ambiguity in the direction of the velocity vector. This is one of the main advantages of the two-frame method: it does not suffer from directional ambiguity. Several CCD cameras on the market are ideal for two-frame PIV. They are based on interline transfer technology and can ''frame-straddle,'' that is, capture two images a short time apart. An example of a two-frame particle field captured in a turbulent jet is shown in Fig. 20 (from Ref. 109). The camera used was a 1k × 1k frame-straddling camera (Kodak ES1.0), the field of view was 33 × 33 mm, and the time between pulses was 8 µs.
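Given the imaging parameters just quoted for Fig. 20, converting a measured pixel displacement to a velocity is straightforward; the 3-pixel example displacement below is hypothetical.

```python
# Imaging geometry from the Fig. 20 experiment: 33 mm imaged onto
# 1024 pixels, with 8 microseconds between laser pulses.
fov_m, n_px, dt = 33e-3, 1024, 8e-6
m_per_px = fov_m / n_px            # about 32 micrometers per pixel

def velocity(displacement_px):
    """Velocity (m/s) corresponding to a particle-image shift in pixels."""
    return displacement_px * m_per_px / dt

print(velocity(3.0))   # a 3-pixel shift corresponds to roughly 12 m/s
```

This arithmetic also shows why the pulse separation must be matched to the flow speed: it sets the displacement in pixels, which must be large enough to resolve but small enough to stay within the interrogation window.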
Figure 20. Sample PIV images. (a) Two-frame particle field images. The right image was captured 8 µs after the left image. (b) A two-component velocity vector field computed from a cross-correlation analysis of a two-frame particle image pair. (Reprinted with permission from Ref. 109.)
The particle displacements are obtained by dividing the image into smaller interrogation windows (usually ranging from 16 × 16 to 64 × 64 pixels), and a single velocity vector is computed for each window. Examples of interrogation windows are shown as white boxes in Fig. 20a. The displacement is determined by computing the spatial cross-correlation function for the corresponding windows in each image of the pair, as shown in Fig. 20a. The mean displacement and direction of the velocity vector can then be determined from the location of the peak in the cross-correlation function. This is then repeated for every interrogation window across the frame. A sample turbulent jet velocity field computed from this process is shown in Fig. 20b. For this vector field, the interrogation window size was 32 × 32 pixels, and the window was offset by 16 pixels at a time (50% overlap), which resulted in 62 × 62 vectors across the field. Because the velocity is averaged across the interrogation window, PIV resolution and DSR are important issues. For example, typically cited values of the resolution are about 0.5 to 1 mm. Perhaps a bigger limitation, though, is the DSR, N_p/N_w, where N_p and N_w are the linear sizes of the array and the interrogation window, respectively. For example, a 1k × 1k array that has a 32 × 32 window gives a DSR of only 32. If the minimum required resolution is 1 mm, then the maximum field of view that can be used is 32 mm. Limited DSR is one of the main reasons for using large-format film and megapixel CCD arrays. Several algorithms have been developed that use advanced windowing techniques (110) or a combination of PIV and particle tracking (111–113) to improve both the resolution and DSR of the measurements substantially. The PIV technique described here can measure only two components of velocity; however, several techniques have been developed that enable measuring all three components.
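The cross-correlation step described above can be sketched with an FFT-based circular correlation, as is standard in two-frame PIV; the synthetic "particle image" and the integer-pixel peak search are simplifications (real codes add sub-pixel peak fitting and window offsetting).

```python
import numpy as np

def window_displacement(win_a, win_b):
    """Estimate the integer-pixel displacement between two interrogation
    windows from the peak of their spatial cross-correlation (via FFT)."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    # Circular cross-correlation: corr[n] = sum_m a[m] * b[m + n]
    corr = np.fft.ifft2(np.fft.fft2(a).conj() * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map FFT indices to signed shifts (row, col)
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

# Synthetic check: shift a random 32x32 "particle image" by a known amount
rng = np.random.default_rng(0)
img = rng.random((32, 32))
shifted = np.roll(img, (3, -2), axis=(0, 1))
print(window_displacement(img, shifted))   # → (3, -2)
```

Repeating this for every window (with 50% overlap, as in Fig. 20b) and scaling by the magnification and pulse separation yields the vector field.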
Probably the most widely used technique to date is stereoscopic PIV, which uses two cameras that are separated laterally but share a common field of view (20,23). Particle displacement perpendicular to the laser sheet can be computed from the particle images of the two cameras by using simple geometric relationships. Although stereoscopic PIV is somewhat more difficult to implement than two-component PIV, much of the development burden can be avoided because complete systems are available from several companies. In another class of velocity imaging techniques, the scattered light signal is related to the Doppler shift imparted by the bulk motion of the flow. Both FRS and PLIF techniques that use this effect have been applied and may be preferable to PIV under some circumstances. For example, both FRS and PLIF velocimetry become easier to use in high-speed flows, owing to the increasing Doppler shift, whereas PIV becomes more difficult to use at high speeds because of problems in obtaining sufficient seeding density and in ensuring that the particles are small enough to track the fluid motion. Rayleigh scattering velocimetry has seen substantial development in recent years, and different researchers have implemented closely related techniques that go by the names global Doppler velocimetry, filtered Rayleigh scattering, and planar Doppler velocimetry. Here, the
less ambiguous term, planar Doppler velocimetry (PDV), will be used. A recent review of these techniques can be found in (114). All of these techniques operate on the basic principle that small changes in the frequency of the scattered light resulting from Doppler shifts can be inferred from the signal when the scattered light is imaged through a narrow-band notch filter. Two Doppler shifts affect the measurement. When molecules in the flow are illuminated by an incident laser beam, the radiation by the induced dipoles in the gas will be Doppler-shifted if there is a component of the bulk fluid velocity in the direction of laser beam propagation. Similarly, the detector will perceive a further Doppler shift in the induced radiation if there is a component of the bulk fluid velocity in the direction of the detector. The result is that the perceived Doppler shift fD measured by the detector is given by (82)

fD = (s − o) · V/λ,    (35)

where V is the bulk fluid velocity, λ is the laser wavelength, o is the unit vector in the laser propagation direction, and s is the unit vector originating from the probe volume and pointing toward the detector. In PDV, the Rayleigh scattering is induced by a tunable, narrow-line-width laser, and the flow is imaged through a notch filter. In the most common implementation, the laser source is an injection-seeded, frequency-doubled Nd:YAG laser, which has a line width of about 50–100 MHz and can be tuned over a range of several GHz (114). The notch filter is usually an iodine vapor cell. In one technique, the laser is tuned so that the non-Doppler-shifted light is centered on the edge of the absorption line, such as the right edge of the line shown in Fig. 17. Usually the scattering medium is an aerosol, such as a condensation fog, and thus the scattered line width is nearly the same as that of the laser. If the flow has a constant particle density, then the signal will increase as the Doppler shift increases. If the notch filter line shape is known, then the signal can be directly related to the velocity. In most cases, the density is not constant, and therefore a separate non-Doppler-shifted density measurement must be made. This can be accomplished by using another camera or a single-camera split-image configuration (114). Much of the recent work in this area has been aimed at improving the accuracy of the technique and at extending it to enable measuring three components of velocity. PLIF velocimetry is also a Doppler-shift-based technique, one that is particularly applicable in high-speed reacting flows, where seeding the flow with particles is not practical or where low gas densities preclude the use of Rayleigh scattering. In most PLIF velocimetry studies, the flow is seeded with a tracer, such as iodine or NO, although naturally occurring species, such as OH, have also been used successfully.
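The Doppler shift of Eq. (35) is easily evaluated for representative numbers. The helper below is our own sketch, with illustrative geometry and the 532-nm wavelength of a frequency-doubled Nd:YAG; it shows that even a 300 m/s flow produces a shift of only a few hundred MHz, i.e., a few times the 50–100 MHz laser line width quoted in the text, which is why such a narrow-line source and a sharp notch filter are required.

```python
import numpy as np

def doppler_shift(v, o_hat, s_hat, wavelength):
    """Perceived Doppler shift f_D = (s - o) . V / lambda  [Eq. (35)],
    where o is the unit vector along the laser propagation direction and
    s is the unit vector from the probe volume toward the detector."""
    o_hat = np.asarray(o_hat, float)
    o_hat = o_hat / np.linalg.norm(o_hat)
    s_hat = np.asarray(s_hat, float)
    s_hat = s_hat / np.linalg.norm(s_hat)
    return float(np.dot(s_hat - o_hat, np.asarray(v, float)) / wavelength)

# 300 m/s flow along the beam (x), detector viewing along y.
f_d = doppler_shift([300.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 532e-9)
print(f"{f_d / 1e6:.0f} MHz")  # -> -564 MHz
```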
PLIF velocimetry is accomplished by having the laser sheet propagate as nearly as possible in the direction of the bulk flow, which maximizes the Doppler shift seen by the absorbing molecules. The camera is usually oriented normal to the laser sheet, and the broadband fluorescence is collected (i.e., it is not spectrally resolved). Thus, unlike PDV, only the Doppler
shift induced by the motion of the flow along the laser beam is relevant. The difficulty in PLIF is that, in addition to velocity, the fluid composition, pressure, and temperature also affect the signal through number density, population, quenching, and line shape effects. Therefore, schemes must be devised that isolate the effects due to velocity alone. In an early PLIF velocimetry technique, a tunable, narrow-line CW laser (argon-ion) was scanned in frequency across an absorption line of I2 seeded into a high-speed flow, and several images were captured during the scan, which enabled reconstructing the line shape at each pixel (115). The measured Doppler-shifted line shapes were then compared to an unshifted line shape measured in a stationary reference cell. Although this technique worked well, it can provide only time-averaged measurements because scanning the laser takes a finite time. In another technique, also employing I2 PLIF, two discrete laser frequencies and four laser sheets were used to measure two components of mean velocity and the pressure in an underexpanded jet (104). In the techniques mentioned so far, the laser line must be much narrower than the absorption line. It can be an advantage, however, for the laser line width to be much larger than the absorption line width because this reduces the sensitivity of the signal to variations in the absorption line shape. For example, in (116), two counterpropagating laser sheets and two cameras were used to image one component of velocity in NO-seeded supersonic flows. The reason for using counterpropagating sheets is that the ratio of the LIF signals from the two sheets can be related to the velocity component but is independent of the local temperature and pressure. When the laser line width is of the same order of magnitude as the absorption line width, such two-sheet, fixed-frequency techniques require modeling the overlap integral of the absorption and laser line shapes (117).
Future Developments

Although new quantitative imaging techniques will certainly continue to be developed, it is likely that the greatest effort in the future will be directed at improving existing techniques by making them easier and cheaper to implement and by improving their accuracy, precision, resolution, and framing rate. A good example of the improvement that better technology can achieve is a comparison of one of the first OH PLIF images, captured in 1984 (118), with images that have been captured more recently (76): the difference in quality is dramatic, despite the use of the same technique in both cases. A major trend that started in the past decade, and will no doubt continue, is the application of two or more ‘‘established’’ techniques to obtain simultaneous images of several flow parameters (81). Multiple-parameter techniques include the simultaneous acquisition of multiple flow variables, such as velocity and scalars. Multiple-parameter imaging also includes imaging the same flow variable with a short time delay between images, to obtain the rate of change of a property, and acquiring two images of the same flow variable where the laser sheets are placed a small distance apart, to enable the computation of spatial gradients. Because multiple-parameter imaging
usually involves established techniques, its implementation is usually limited by the availability of the required equipment and by optical access for all of the laser beams and cameras. An obvious limitation of most of the techniques discussed is that the framing rates are typically limited to a few hertz. This limitation is imposed by the laser and camera systems that are currently available. Although there is no question that the power of high-repetition-rate commercial lasers will continue to increase with time, limited laser power will remain an obstacle to kilohertz imaging for many of the techniques discussed in this article. For example, the Rayleigh scattering image of Fig. 16 required about 300 mJ of light from a frequency-doubled Nd:YAG laser operating at 10 Hz, which corresponds to 3 W of average power. Acquiring images of the same SNR at 10 kHz, as is likely to be required in even a moderate-Reynolds-number gas-phase flow, would require a doubled Nd:YAG laser whose average power is 3 kW. This amount of continuous average power may not be large compared to that required for metal cutting or ballistic missile defense, but it is an astounding amount of power by flow diagnostics standards, and handling such a laser would pose many practical problems for the user. High-framing-rate imaging is also currently limited by camera technology; no camera is currently available that operates quasi-continuously at 10 kHz at the 10–20 e− rms noise per pixel necessary to obtain images of the quality of Fig. 16. The reason is that high framing rates require high readout bandwidths, which in turn lead to more noise. Thus, to keep the noise low, either the framing rate or the number of pixels must be reduced. Despite this caveat, progress toward higher framing rate imaging for all of the techniques discussed here will continue as the necessary technologies improve.
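The average-power scaling quoted above is simply pulse energy times repetition rate; a trivial check of the numbers in the text (the function name is ours):

```python
def average_power(pulse_energy_j, rep_rate_hz):
    """Average laser power (W) = energy per pulse (J) x repetition rate (Hz)."""
    return pulse_energy_j * rep_rate_hz

print(average_power(0.3, 10))      # 300 mJ at 10 Hz  -> 3.0 W
print(average_power(0.3, 10_000))  # 300 mJ at 10 kHz -> 3000.0 W
```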
Another major trend that will continue is the further development and refinement of three-dimensional techniques. The most commonly used three-dimensional techniques are classified as either tomography or reconstructions from stacks of planar images. Tomography is the reconstruction of a 3-D field of a fluid property from line-of-sight integrated data measured from several different directions through the flow. For example, both absorption (11) and interferometry (119,120) have been used, which enable reconstructing the 3-D concentration and index-of-refraction fields, respectively. A more popular technique is to reconstruct the 3-D field using a set of images that have been acquired by rapidly scanning a laser sheet through the flow and capturing several planar images during the sweep (5). This technique has been used effectively in many aqueous flows using PLIF excited by either continuous or pulsed lasers (6,59,63,121). However, because these techniques rely on sweeping a laser beam or sheet through the flow on a timescale that is shorter than the characteristic fluid timescales, such techniques are significantly more challenging in gas-phase flows. It is remarkable, however, that such experiments have been accomplished by sweeping a flashlamp-pumped dye laser sheet through the flow in only a few microseconds. In one case, the Rayleigh scattered light from a
Freon gas jet was imaged (122), and in another, the Mie scattering from a particle-laden supersonic mixing layer was imaged (123). Both studies used a high-speed framing camera that could acquire only a few frames (e.g., 10–20), and thus the resolution of the reconstructions was quite limited. The future of 3-D flow imaging is probably best exemplified by holographic PIV (HPIV), which provides accurate three-component velocity fields throughout a volume of fluid (124–127). HPIV is an intrinsically 3-D technique that begins with recording a hologram of the double-exposure 3-D particle field on high-resolution film. The image is then reconstructed, and the particle field is digitized by sequentially imaging planes of the reconstruction using a digital camera. HPIV enables the acquisition of an astounding amount of data, but because it is a challenging technique to implement and requires very high resolution, large-format chemical film, its framing rates will remain low for at least the near future. In conclusion, flow imaging is driving a revolution in fluid mechanics research that will continue well into the future. Continued advances in laser and digital camera technologies will one day make most of the imaging techniques described in this article possible at sufficient spatial resolution and framing rates to resolve virtually any flow spatial and temporal scale of interest. This is an exciting proposition as we enter a new century of experimental fluid dynamics research.

Acknowledgments

The author acknowledges the generous support of his research into flow imaging by the National Science Foundation, particularly under grants CTS-9319136 and CTS-9553124. In addition, the author thanks Michael Tsurikov and Yongxi Hou of UT-Austin for help in preparing this article.
ABBREVIATIONS AND ACRONYMS

2-D two-dimensional
3-D three-dimensional
CCD charge-coupled device
CTF contrast transfer function
CW continuous wave
DSR dynamic spatial range
FRS filtered Rayleigh scattering
FT Fourier transform
ICCD intensified charge-coupled device
IR infrared
LIF laser-induced fluorescence
LSF line spread function
MTF modulation transfer function
NA numerical aperture
Nd:YAG neodymium:yttrium-aluminum garnet
OPO optical parametric oscillator
OTF optical transfer function
PDV planar Doppler velocimetry
PIV particle image velocimetry
PLIF planar laser-induced fluorescence
PSF point spread function
PTF phase transfer function
SBR signal to background ratio
SNR signal to noise ratio
SRF step response function
STP standard temperature and pressure
TEM transverse electromagnetic modes
TV television
UV ultraviolet
BIBLIOGRAPHY 1. P. H. Paul, M. G. Garguilo, and D. J. Rakestraw, Anal. Chem. 70, 2459–2467 (1998). 2. J. G. Santiago et al., Exp. Fluids 25, 316–319 (1998). 3. L. M. Weinstein, High-Speed Research: 1995 Sonic Boom Workshop, Atmospheric Propagation and Acceptability Studies, NASA CP-3335, October, 1995. 4. M. Van Dyke, An Album of Fluid Motion, The Parabolic Press, Stanford, 1982. 5. A. J. Smits and T. T. Lim, eds., Flow Visualization: Techniques and Examples, Imperial College Press, London, 2000. 6. W. J. A. Dahm and K. B. Southerland, in A. J. Smits and T. T. Lim, eds., Flow Visualization: Techniques and Examples, Imperial College Press, London, 2000, pp. 289– 316. 7. L. K. Su and W. J. A. Dahm, Phys. Fluids 8, 1,883–1,906 (1996). 8. M. Gharib, J. Fluids Eng. 118, 233–242 (1996). 9. W. Merzkirch, Flow Visualization, 2nd ed., Academic Press, Orlando, 1987. 10. G. S. Settles, AIAA J. 24, 1,313–1,323 (1986). 11. A. C. Eckbreth, Laser Diagnostics for Combustion Temperature and Species, Abacus Press, Cambridge, 1988. 12. B. J. Kirby and R. K. Hanson, Appl. Phys. B 69, 505–507 (1999). 13. J. Hecht, The Laser Guidebook, 2nd ed., McGraw-Hill, NY, 1992. 14. P. Wu and R. B. Miles, Opt. Lett. 25, 1,639–1,641 (2000). 15. J. M. Grace et al., Proc. SPIE 3642, 133–141 (1999). 16. A. E. Siegman, Lasers, University Science Books, Mill Valley, CA, 1986. 17. M. W. Sasnett, in D. R. Hall and P. E. Jackson, ed., The Physics and Technology of Laser Resonators, Adam Hilger, Bristol, 1989, pp. 132–142. 18. W. J. Smith, Modern Optical Engineering: The Design of Optical Systems, 2nd ed., McGraw-Hill, NY, 1990. 19. M. B. Long, D. C. Fourguette, M. C. Escoda, and C. B. Layne, Opt. Lett. 8, 244–246 (1983). 20. R. J. Adrian, Ann. Rev. Fluid Mech. 23, 261–304 (1991). 21. A. Vogel and W. Lauterborn, Opt. Lasers Eng. 9, 274–294 (1988). 22. J. C. Lin and D. Rockwell, Exp. Fluids 17, 110–118 (1994). 23. M. Raffel, C. E. Willert, and J. 
Kompenhans, Particle Image Velocimetry: A Practical Guide, Springer, Berlin, 1998. 24. B. Lecordier et al., Exp. Fluids 17, 205–208 (1994). 25. N. T. Clemens, S. P. Petullo, and D. S. Dolling, AIAA J. 34, 2,062–2,070 (1996). 26. P. H. Paul, AIAA Paper 91–2315, June, 1991. 27. J. M. Seitzman and R. K. Hanson, in A. Taylor, ed., Instrumentation for Flows with Combustion, Academic Press, London, 1993. 28. RCA Staff, Electro-Optics Handbook, RCA, Lancaster, PA, 1974. 29. P. H. Paul, I. van Cruyningen, R. K. Hanson, and G. Kychakoff, Exp. Fluids 9, 241–251 (1990).
30. J. R. Janesick et al., Opt. Eng. 26, 692–714 (1987). 31. I. S. McLean, Electronic Imaging in Astronomy: Detectors and Instrumentation, John Wiley & Sons, NY, 1997. 32. L. K. Su and N. T. Clemens, Exp. Fluids 27, 507–521 (1999). 33. E. Hecht, Optics, 3rd ed., Addison-Wesley, Reading, MA, 1998. 34. W. K. Pratt, Digital Image Processing, 2nd ed., Wiley, NY, 1991. 35. T. L. Williams, The Optical Transfer Function of Imaging Systems, Institute of Physics Publishing, Bristol, 1999. 36. C. S. Williams and O. A. Becklund, Introduction to the Optical Transfer Function, John Wiley & Sons, NY, 1989. 37. W. Wittenstein, J. C. Fontanella, A. R. Newbery, and J. Baars, Optica Acta 29, 41–50 (1982). 38. R. N. Bracewell, The Fourier Transform and its Applications, 2nd ed., McGraw-Hill, NY, 1986. 39. F. Chazallet and J. Glasser, SPIE Proc. 549, 131–144 (1985). 40. H. Tennekes and J. L. Lumley, A First Course in Turbulence, MIT Press, Cambridge, 1972. 41. G. K. Batchelor, J. Fluid Mech. 5, 113–133 (1959). 42. K. A. Buch Jr. and W. J. A. Dahm, J. Fluid Mech. 317, 21–71 (1996). 43. K. A. Buch Jr. and W. J. A. Dahm, J. Fluid Mech. 364, 1–29 (1998). 44. C. A. Friehe, C. W. van Atta, and C. H. Gibson, in AGARD Turbulent Shear Flows CP-93, North Atlantic Treaty Organization, Paris, 1971, pp. 18-1–18-7. 45. J. C. Wyngaard, J. Phys. E: J. Sci. Instru. 1, 1,105–1,108 (1968). 46. J. C. Wyngaard, J. Phys. E: J. Sci. Instru. 2, 983–987 (1969). 47. R. A. Antonia and J. Mi, J. Fluid Mech. 250, 531–551 (1993). 48. C. J. Chen and W. Rodi, Vertical Turbulent Buoyant Jets: A Review of Experimental Data, Pergamon, Oxford, 1980. 49. R. J. Goldstein and T. H. Kuen, in R. J. Goldstein, ed., Fluid Mechanics Measurements, Taylor and Francis, London, 1996. 50. N. T. Clemens and P. H. Paul, Phys. Fluids A 5, S7 (1993). 51. P. S. Greenberg, R. B. Klimek, and D. R. Buchele, Appl. Opt. 34, 3,810–3,822 (1995). 52. A. K. Agrawal, N. K. Butuk, S. R. Gollahalli, and D. Griffin, Appl. Opt. 37, 479–485 (1998). 53. 
J. W. Goodman, Introduction to Fourier Optics, 2nd ed., McGraw-Hill, Boston, 1996. 54. G. F. Albrecht, H. F. Robey, and T. R. Moore, Appl. Phys. Lett. 57, 864–866 (1990). 55. D. Papamoschou and H. F. Robey, Exp. Fluids 17, 10–15 (1994). 56. L. M. Weinstein, AIAA J. 31, 1,250–1,255 (1993). 57. R. K. Hanson, J. M. Seitzman, and P. H. Paul, Appl. Phys. B 50, 441–454 (1990). 58. M. M. Koochesfahani and P. E. Dimotakis, J. Fluid Mech. 170, 83–112 (1986). 59. R. R. Prasad and K. R. Sreenivasan, J. Fluid Mech. 216, 1–34 (1990). 60. C. Arcoumanis, J. J. McGuirk, and J. M. L. M. Palma, Exp. Fluids 10, 177–180 (1990). 61. J. R. Saylor, Exp. Fluids 18, 445–447 (1995). 62. P. S. Karasso and M. G. Mungal, Exp. Fluids 23, 382–387 (1997). 63. J. Sakakibara and R. J. Adrian, Exp. Fluids 26, 7–15 (1999).
64. J. D. Nash, G. H. Jirka, and D. Chen, Exp. Fluids 19, 297–304 (1995). 65. A. Lozano, B. Yip, and R. K. Hanson, Exp. Fluids 13, 369–376 (1992). 66. S. H. Smith and M. G. Mungal, J. Fluid Mech. 357, 83–122 (1998). 67. D. Papamoschou and A. Bunyajitradulya, Phys. Fluids 9, 756–765 (1997). 68. D. Wolff, H. Schlüter, and V. Beushausen, Berichte der Bunsen-Gesellschaft für Physikalische Chemie 97, 1,738–1,741 (1993). 69. R. J. Hartfield Jr., J. D. Abbitt III, and J. C. McDaniel, Opt. Lett. 14, 850–852 (1989). 70. J. M. Donohue and J. C. McDaniel Jr., AIAA J. 34, 455–462 (1996). 71. N. T. Clemens and M. G. Mungal, J. Fluid Mech. 284, 171–216 (1995). 72. N. T. Clemens and P. H. Paul, Phys. Fluids 7, 1,071–1,081 (1995). 73. T. C. Island, W. D. Urban, and M. G. Mungal, Phys. Fluids 10, 1,008–1,021 (1998). 74. G. F. King, J. C. Dutton, and R. P. Lucht, Phys. Fluids 11, 403–416 (1999). 75. T. Parr and D. Hanson-Parr, in L. DeLuca, E. W. Price, and M. Summerfield, eds., Nonsteady Burning and Combustion Stability of Solid Propellants, Progress in Astronautics and Aeronautics, vol. 143, American Institute of Aeronautics and Astronautics, Washington, DC, 1992, pp. 261–323. 76. J. M. Donbar, J. F. Driscoll, and C. D. Carter, Combustion and Flame 122, 1–19 (2000). 77. C. F. Bohren and D. R. Huffman, Absorption and Scattering of Light by Small Particles, John Wiley & Sons, NY, 1983. 78. E. J. McCartney, Optics of the Atmosphere: Scattering by Molecules and Particles, Wiley, NY, 1976. 79. B. Yip, R. L. Schmitt, and M. B. Long, Opt. Lett. 13, 96–98 (1988). 80. D. A. Feikema, D. Everest, and J. F. Driscoll, AIAA J. 34, 2,531–2,538 (1996). 81. M. B. Long, in A. M. K. P. Taylor, ed., Instrumentation for Flows with Combustion, Academic, London, 1993, pp. 467–508. 82. R. B. Miles and W. R. Lempert, Ann. Rev. Fluid Mech. 29, 285–326 (1997). 83. M. Namazian, J. T. Kelly, and R. W. Schefer, Twenty-Second Symposium (Int.) on Combustion, The Combustion Institute, Pittsburgh, 1988, pp. 627–634. 84. M. Namazian et al., Exp. Fluids 8, 216–228 (1989). 85. J. B. Kelman, A. R. Masri, S. H. Starner, and R. W. Bilger, Twenty-Fifth Symposium (Int.) on Combustion, The Combustion Institute, Pittsburgh, 1994, pp. 1,141–1,147. 86. D. Dabiri and M. Gharib, Exp. Fluids 11, 77–86 (1991). 87. I. Kimura et al., in B. Khalighi, M. J. Braun, and C. J. Freitas, eds., Flow Visualization, vol. 85, ASME FED, 1989, pp. 69–76. 88. M. Ozawa, U. Müller, I. Kimura, and T. Takamori, Exp. Fluids 12, 213–222 (1992). 89. M. C. J. Coolen, R. N. Kieft, C. C. M. Rindt, and A. A. van Steenhoven, Exp. Fluids 27, 420–426 (1999). 90. J. Coppeta and C. Rogers, Exp. Fluids 25, 1–15 (1998). 91. A. Alahyari and E. K. Longmire, Exp. Fluids 17, 434–440 (1994). 92. D. C. Fourguette, R. M. Zurn, and M. B. Long, Combustion Sci. Technol. 44, 307–317 (1986).
93. D. A. Everest, J. F. Driscoll, W. J. A. Dahm, and D. A. Feikema, Combustion and Flame 101, 58–68 (1995). 94. J. N. Forkey, W. R. Lempert, and R. B. Miles, Exp. Fluids 24, 151–162 (1998). 95. G. S. Elliott, N. Glumac, C. D. Carter, and A. S. Nejad, Combustion Sci. Technol. 125, 351–369 (1997). 96. R. G. Seasholtz, A. E. Buggele, and M. F. Reeder, Opt. Lasers Eng. 27, 543–570 (1997). 97. B. K. McMillin, J. L. Palmer, and R. K. Hanson, Appl. Opt. 32, 7,532–7,545 (1993). 98. J. L. Palmer, B. K. McMillin, and R. K. Hanson, Appl. Phys. B 63, 167–178 (1996). 99. E. R. Lachney and N. T. Clemens, Exp. Fluids 24, 354–363 (1998). 100. T. Ni-Imi, T. Fujimoto, and N. Shimizu, Opt. Lett. 15, 918–920 (1990). 101. J. M. Seitzman and R. K. Hanson, Appl. Phys. B 57, 385–391 (1993). 102. M. C. Thurber, F. Grisch, and R. K. Hanson, Opt. Lett. 22, 251–253 (1997). 103. M. C. Thurber et al., Appl. Opt. 37, 4,963–4,978 (1998). 104. B. Hiller and R. K. Hanson, Appl. Opt. 27, 33–48 (1988). 105. R. J. Hartfield Jr., S. D. Hollo, and J. C. McDaniel, AIAA J. 31, 483–490 (1993). 106. P. Cassady and S. Lieberg, AIAA Paper No. 92-2962, 1992. 107. P. Buchhave, in L. Lading, G. Wigley, and P. Buchhave, eds., Optical Diagnostics for Flow Processes, Plenum, NY, 1994, pp. 247–269. 108. A. Melling, Meas. Sci. Technol. 8, 1,406–1,416 (1997). 109. J. E. Rehm, Ph.D. Dissertation, The University of Texas at Austin, 1999. 110. J. Westerweel, D. Dabiri, and M. Gharib, Exp. Fluids 23, 20–28 (1997). 111. R. D. Keane, R. J. Adrian, and Y. Zhang, Meas. Sci. Technol. 6, 754–768 (1995). 112. E. A. Cowen and S. G. Monismith, Exp. Fluids 22, 199–211 (1997). 113. J. E. Rehm and N. T. Clemens, Exp. Fluids 26, 497–504 (1999). 114. M. Samimy and M. P. Wernet, AIAA J. 38, 553–574 (2000). 115. J. C. McDaniel, B. Hiller, and R. K. Hanson, Opt. Lett. 8, 51–53 (1983). 116. P. H. Paul, M. P. Lee, and R. K. Hanson, Opt. Lett. 14, 417–419 (1989). 117. M. Allen et al., AIAA J. 32, 1,676–1,682 (1994). 118. G. Kychakoff, R. D. Howe, and R. K. Hanson, Appl. Opt. 23, 704–712 (1984). 119. L. Hesselink, in W.-J. Yang, ed., Handbook of Flow Visualization, Hemisphere, NY, 1987. 120. D. W. Watt and C. M. Vest, Exp. Fluids 8, 301–311 (1990). 121. M. Yoda, L. Hesselink, and M. G. Mungal, J. Fluid Mech. 279, 313–350 (1994). 122. B. Yip, R. L. Schmitt, and M. B. Long, Opt. Lett. 13, 96–98 (1988). 123. T. C. Island, B. J. Patrie, and R. K. Hanson, Exp. Fluids 20, 249–256 (1996). 124. H. Meng and F. Hussain, Fluid Dynamics Res. 8, 33–52 (1991). 125. D. H. Barnhart, R. J. Adrian, and G. C. Papen, Appl. Opt. 33, 7,159–7,170 (1994). 126. J. O. Scherer and L. P. Bernal, Appl. Opt. 36, 9,309–9,318 (1997). 127. J. Zhang, B. Tao, and J. Katz, Exp. Fluids 23, 373–381 (1997).

FORCE IMAGING

KIM DE ROY
RSscan International
Belgium

L. PEERAER
University of Leuven (FLOK) and University Hospitals Leuven (CERM)
Belgium

INTRODUCTION

The study of human locomotion has generated a substantial number of publications. Starting from evolutionary history, authors have tried to give some insight into the transition from quadrupedal to bipedal locomotion (1,2). Assuming an evolution through a semiaquatic mode of life, the human foot may have developed from its original versatile function for swimming (3), support, and gripping into a more specialized instrument that can keep the body in an upright position. This change of function enables all of the movements that are specific to humans, such as walking and running. Data from the literature show that the effects of walking speed on stride length and frequency are similar in bonobos, common chimpanzees, and humans, which suggests that within extant Hominidae, spatiotemporal gait characteristics are highly comparable (4) (Fig. 1). Despite these similarities, the upright position and erect walking are accepted as main characteristics that differentiate humans from animals. No wonder that, throughout the search for human origins, the footprints of the first humanoid creatures found are studied and discussed as much as their skulls.

Figure 1. Plantar pressure measurements of bonobo apes (a,b) (source: The Bonobo Project, Universities of Antwerp and Ghent (Belgium) and the Royal Zoological Society of Antwerp) and of a human being (c,d) (source: footscan). See color insert for (b) and (d).

Whatever the causes of this evolution to the erect position, the fact remains that static and dynamic equilibrium must be achieved during bipedal activities, which dramatically reduces the supporting area with respect to the quadrupedal condition, where the total area is formed by more than two feet. The available area of support during bipedal activities is restricted to that determined by one or both feet. The anatomical structure of the human foot, as well as its neuromuscular and circulatory control, must have evolved into a multijoint dynamic mechanism that determines the complex interaction between the lower limb and the ground during locomotion (5). Consequently, besides gravitation, the external forces acting on the human body act on the plantar surface of the foot and generate movement according to Newton's laws. Thus, studying the latter force, called the ground reaction force (GRF), is essential to our understanding of normal and pathological human locomotion. The GRF may, however, vary in point of application, magnitude, and orientation, necessitating the
measurement of its vertical, fore–aft, and mediolateral components for accurate analysis. Different authors have thoroughly investigated these components using different devices (6–8). Force plates measure the three components of the resultant force by four three-dimensional force sensors (Fig. 2c), commonly mounted near the corners underneath a stiff rectangular plate (Fig. 2a,b). Measuring the force distribution on the foot would, however, necessitate using a multitude of these three-dimensional sensors across the whole area of the plate, thus challenging the actual state of the art in technology as well as considerations of cost efficiency. Commercially available plantar force- or pressure-measurement systems are at present restricted to measuring forces normal to the sensor (9,10). We may expect, however, that accurate and reliable measurement of shear forces in the fore–aft and mediolateral directions will become available in the near future. Although several sensors, such as piezoelectric crystals and foils, photoelastic materials, and others, have been used in the past, most state-of-the-art systems provide a thin (approximately 1 mm or less) sensor array composed of resistive or capacitive force sensors of known area to calculate the mean pressure over the sensor area (10,11). These may be mounted in pressure-distribution platforms or in pressure-sensitive insoles (12), available in different foot sizes and developed by different manufacturers. This type of measuring equipment is used in biomechanical analysis of the normal, pathological, or even artificial foot during activities such as standing, walking, running, jumping, cycling, or any activity where knowledge of the forces acting on the foot is of interest. A subject-specific analysis of the dynamics of locomotion is necessary to distinguish between normal and pathological foot function, to differentiate between various levels of impairment, and to assess quantitatively the restoration of normal foot function after treatment (13). The field of application of these force-distribution measuring systems is very broad and includes medicine, rehabilitation, biomechanics, sports, ergonomics, engineering, and the manufacture of footwear and insoles. Similar measurement techniques are also used for evaluating pressure parameters in seating, residual limb–socket interfaces, and pressure garments.

Figure 2. (a) Image of the 9253 multicomponent Kistler force plate. (b) Image of the 9285 multicomponent glass-top Kistler force plate. (c) Graphic representation of the multicomponent quartz force sensors. See color insert for (c).

Because
the sensors have to conform to the sometimes irregular shape of the body, technical problems due to sensor bending or other factors occur. Nevertheless, several studies illustrate the clinical relevance of these systems in identifying areas of high pressure or in making comparative studies in seating (14–16), stump–socket interfaces (17), and pressure garments (18). Their final conclusions, however, point out the importance of further development of the existing sensor materials to improve the reliability and accuracy of the existing measurement systems. Foot-pressure measurements are of utmost importance in understanding the central role of biomechanical issues in treating the diabetic foot (19) and other foot pathologies, as well as in the design, manufacture, and assessment of insoles and footwear for these patients. To this end, foot-pressure distribution was at first assessed visually by using a so-called podobarograph, in which the plantar surface of the foot is observed while the subject stands on a glass plate. A camera and image-processing techniques enabled a gross quantification of pressure after calibration of the system. Because only barefoot images restricted to the equipment location could be studied, further developments aimed at systems that can measure the pressure distribution between the plantar surface of the foot and footwear during free dynamic activities, in particular during gait. These insole measurement systems evolved during the last few decades from a single, relatively large sensor placed under the foot location of interest to eight or more individual sensors placed under anatomical landmarks on the plantar surface of the foot. The measurement frequency was limited by the sampling rate of the analog-to-digital converters and computers available at the time. Current systems present an array of numerous small sensors distributed over the whole plantar surface of the foot and thus eliminate the need for precise anatomical positioning.
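The conversion performed by such sensor arrays, from normal-force readings to a pressure map, is simply force divided by the known sensor area. A minimal sketch with made-up numbers, not taken from any commercial system:

```python
import numpy as np

def pressure_map(forces_n, sensor_area_m2):
    """Mean pressure per sensor (Pa) from the normal forces (N)
    measured by an array of sensors of known, equal area."""
    return np.asarray(forces_n, float) / sensor_area_m2

forces = np.array([[0.0, 1.0],
                   [2.0, 1.0]])      # N, a 2 x 2 patch of the array
area = 25e-6                          # each sensor 5 mm x 5 mm
p = pressure_map(forces, area)
print(p.max())        # peak pressure on the patch, ~8.0e4 Pa
print(forces.sum())   # total normal force on the patch -> 4.0 N
```

Summing the per-sensor forces of a platform recovers the vertical component of the GRF measured by a force plate, which is why the two kinds of instrument are complementary.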
Sampling frequencies have increased to 500 Hz per sensor and higher, meaning that the system records a new value every 0.002 seconds or less, significantly improving temporal resolution. Simultaneously, the measurement platforms, which are built into a walkway, evolved accordingly in sensor technology and sampling equipment. The small measurement area of the first systems, which was not much larger than the maximal foot size, has given way to plates available in virtually any size that present a geometric array of relatively small sensors (more than one per square centimeter) sampled at several hundreds of hertz (≥500 Hz). THE IMAGED OBJECT The human foot is a complex, flexible structure composed of multiple bones and joints (Fig. 3). It must provide stability during support, as well as shock absorption and propulsion during the first and last phases of support, respectively, in gait and running. Angular displacements around three axes occur at the ankle joint and at numerous joints of the foot in various degrees during these activities. Foot movements are described in degrees of plantar and dorsal flexion, valgus and varus, ab- and adduction, inversion and eversion, and pronation–supination. These movements
Figure 3. Radiograph of the configuration of a normal foot.
occur simultaneously at different foot joints and in various degrees, which makes precise definition of major foot axes extremely difficult. Anatomical or functional aberrations obviously complicate the whole picture. Moreover, foot kinematics influence the movement of lower limb segments. Although several authors have investigated the possible relationships between them (13,20,21) by using three-dimensional kinematic measurement systems capable of tracking several segments simultaneously, often combined with measurement of ground reaction forces and foot-pressure distribution, the question remains whether it is preferable to study these relationships patient by patient (22). Yet foot kinematics are often related to frequently diagnosed lower limb injuries, in particular in sports. Numerous studies have focused on several typical injuries; some recent examples can be found in (21,23,24). Force- or pressure-distribution measurements are often part of the previously mentioned measurement equipment and can provide the user with specific information about the force normal to the sensor surface or with pressure-distribution data (pressure is perpendicular to the surface). These can be measured level with the floor by using platforms, or normal to the plantar surface of the foot by using force- or pressure-sensitive insoles. Measurement platforms are fixed to the floor. Thus, the measurements reflect the distribution of the vertical ground reaction force on the foot or the shoe during the various situations under study. Because the foot is usually shod, at least in industrialized countries, the foot can be studied as a functional entity with the shoe that is used. As a consequence, the influence of the shoe on the roll-off pattern of the foot can be studied, as well as the influence of insoles used in the shoe.
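To make the relationship between platform pressure data and the vertical ground reaction force concrete, here is a sketch (not from the article): summing pressure times sensor area over one frame approximates the vertical ground reaction force at that instant. The frame size, sensor dimensions, and units below are assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
pressure_kpa = rng.random((64, 32)) * 250.0   # one platform frame (assumed kPa)
sensor_area_m2 = 0.005 ** 2                   # assumed 5 mm x 5 mm sensors

# F = sum of p * A over the contact area; convert kPa to Pa first
grf_newton = (pressure_kpa * 1e3 * sensor_area_m2).sum()
print(grf_newton > 0)  # True
```

Repeating this per frame yields the familiar vertical force-time curve, which is why a pressure platform can substitute for a force plate when only the vertical component is needed.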
Depending on the goal of a specific research setup, the sole use of pressure-distribution measurements can provide the researcher with significant information on foot function, shoe–foot interaction, and even foot–insole interaction. Although the measurements do not provide information about foot kinematics, several attempts have been made to calculate approximate corresponding movements or parameters that reflect foot flexibility by using various mathematical models. An example is calculation of the radius of gyration based on the principal axis theory (25).
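A pressure-weighted radius of gyration of the kind referred to in (25) can be sketched from a single pressure frame. The second moments below are a generic illustration, not the cited principal-axis method, and the image is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((40, 20))                   # pressure image (arbitrary units)

ys, xs = np.indices(p.shape)
w = p / p.sum()                            # normalized pressure weights
cy, cx = (w * ys).sum(), (w * xs).sum()    # pressure-weighted centroid
# radius of gyration about the centroid, in sensor-grid units
rg = float(np.sqrt((w * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum()))
print(0.0 < rg < 40.0)  # True
```

A larger radius of gyration indicates pressure spread farther from the centroid, which is one way such models quantify how load is distributed over the plantar surface.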
FORCE IMAGING
FIELD OF APPLICATION Any activity where foot loading is important in understanding movement is a potential field of application for pressure- or force-distribution measurements on the foot. The following areas can be distinguished: • Orthopedics: examination of neuropathic foot ulceration, in particular, foot malformations, pre- and postoperative assessment in foot, knee, and hip surgery (e.g., hallux valgus (32)) • Pediatrics: the assessment of orthotics and the effects of drug administration (e.g., botulinum toxin) on children who have cerebral palsy • Sports and sport injuries: running, cycling, skiing • Orthopedic appliances: functional evaluation of orthopedic insoles, shoe inserts, orthopedic shoes, as well as lower limb orthotics for partial load bearing, prosthetic foot and knee systems, and stump socket interfacial pressure • Biology: roll-off patterns in bipedal and quadrupedal locomotion of different animals (e.g., bonobo apes, horses, cows) • Rehabilitation: assessment of different rehabilitation programs in stroke, Parkinson's disease, and amputation • Foot and finite element modeling • Footwear industries: design of shoes, industrial footwear, sports shoes Considering this diversity, it is clear that the same pressure-sensitive sensor material can be used in any field of application where contact pressure registration is important. Reference can also be made to handling tools and sitting and sleeping facilities. MAIN FACTORS INFLUENCING MEASUREMENT DATA AND THEIR REPRESENTATION Several factors influence the force or pressure data obtained. One of the factors discussed below, usually one related to the subject, will be the variable of interest to the researcher or clinician who examines foot function. As in all research, every possible attempt should be made to allow only this variable to vary during the research protocol. Therefore, knowledge of the different factors involved is needed.
These can be arbitrarily divided into three main groups: the factors related to (1) the subject, (2) the measurement equipment, and (3) the sensor environment (9,10). Factors Related to the Subject 1. the anatomical structure of the foot including malformations of the foot; 2. the functional deficiencies of the foot including all of the possible gait deviations, whatever the cause may be: limitation of range of motion in joints, insufficient muscular force, compensatory and pathological movements in all body segments, aberrations in neuromuscular control, etc.;
3. movement or gait characteristics that influence the total forces that act on the foot (such as gait velocity and age); and 4. anthropometric variables such as body weight and length. Factors Related to the Measurement Equipment These should reflect basic conditions of good measurement practice, instrumentation specifications, and statistical analysis. Therefore, the following list is restricted to the most important ones: validity of the measurement, reliability, accuracy, measurement range, signal resolution, geometric or spatial resolution (26), response frequency, hysteresis, time drift, temperature and humidity drift, sensor durability, and sensor bending radius or effect of curvature (10). Because the orientation of the force vectors on the sensors is not known, the sensitivity with respect to the measurement axis of the sensor must be known, as well as signal crossover between sensors, which is particularly important in the sensor arrays often used in this type of measurement equipment. In addition, a relatively simple procedure for calibrating the sensors is needed because most have a nonlinear characteristic between applied force or pressure and signal output. A lookup table that uses interpolation, or any appropriate curve-fitting procedure, can be implemented in the measurement system software. Factors Related to the Sensor Environment Sensor Embedment. Sensors are embedded in a sensor mat (platform or insole) and therefore should be flush with the mat surface, and remain so over the specified load range. This implies that the sensor mat should be homogeneous to prevent sensors from protruding from the loaded sensor mat. The latter would result in higher loads on the sensor due to the irregular mat surface; therefore, the measurements would not represent the actual loads. The Effective Sensor Area.
The total area covered by the sensors will be less than the total area of the measurement platform or insole due to the space separating sensors and the space needed for the leads in some systems. The ratio between the effective sensor area and this free space is important because it affects the representativeness of the measurement results, depending on the spatial resolution and the software procedures used to interpolate results between sensor locations. The measurement data from each sensor can mostly be interpreted as the mean force or pressure acting across the sensor, which results in a certain degree of smoothing across the area. Most software programs visualize force or pressure distribution on the foot using linear or other interpolative techniques to enhance the image produced. This may give the user the impression that the spatial resolution is higher than it actually is. Sensor Lead Arrangement. Point-to-point (two leads for each sensor) or matrix lead arrangements can influence measurement results significantly. In the first, crossover phenomena should be practically excluded, but it is
a challenge for sensor mat design when high spatial resolution is to be obtained, in particular when the dimensions of the sensor mat are kept to a minimum. In the latter, high spatial resolution can be achieved with relatively small sensor mat dimensions, often at the expense of crossover phenomena and consequently of accuracy. Sensor Placement
Platforms. Because forces are measured normal to the sensor surface, the results can be considered the distribution of the vertical ground reaction force acting on the foot. The advantages are that the spatial position of the foot during the supporting phase can be seen with respect to the line of progression. Multistep (large) platforms also allow calculating temporal and spatial parameters because several consecutive steps are measured. Drawbacks are that measurements are restricted to the walkway where the measurement platform is located and that the subject may target the plate, thus altering gait. The latter problem is mitigated by using large multistep platforms, which are readily available. On-line interpretation of foot roll-off may be difficult because the force or pressure sequence is difficult to relate to foot anatomy. Measurements are also restricted to barefoot walking or walking with shoes that do not damage the plate (e.g., sports shoes: spikes, soccer shoes, etc.). Insoles. Insole measurements in general enable measuring several consecutive steps made on different surfaces and in different situations. Because the sensor mat placed in the shoe has a fixed position with respect to the foot, force and pressure distribution can be viewed in relation to gross foot anatomy. The spatial position of the foot during the activity considered is unknown, however. It is important that insole measurements are not limited to certain types of shoes or walking surfaces (even cycling and skiing are possible) and that a large number of consecutive steps or cycles can be measured, depending on the data storage capacity or data transfer rate of the measuring unit. Three possible locations can be considered: 1. between the plantar surface of the foot and the insole or foot bed. Measured forces will be those normal to the plantar surface of the foot, without any reference to the orientation of these force vectors with respect to the vertical.
Nevertheless, such measurements are particularly important for detecting or estimating the force or pressure to which the skin and soft tissues of the foot are subjected. Their clinical relevance is obvious for feet liable to develop pressure ulceration, as in the diabetic foot. 2. between insole and shoe bottom. Measurements between the insole and the bottom of the shoe are often considered a possible solution for measuring the effectiveness of the insole and for limiting sensor curvature. It must be clear, however, that these measurements do not necessarily reflect the pressure distribution on the plantar surface of the foot.
Measurements will be largely influenced by the elastic properties and thickness of the insole used. Measurements under a stiff insole will reflect only the incongruence between the insole and the shoe bottom, whereas very elastic insoles may yield measurements of limited value for extrapolation. 3. between the midsole of the shoe and the floor. In this case, the sensor mat is attached to the midsole of the shoe. Any extrapolation of results with respect to plantar surface pressure distribution, if at all desirable, must be made with extreme care as a function of insole and shoe midsole elastic properties, as well as shoe construction or correction and sole profile geometry. The first and second types of measurement will be largely influenced by shoe construction, shoe fit, and the firmness of lacing. Firmly lacing the shoe will indeed create forces that are superimposed on those created by supporting the foot. The third type reflects the properties of a particular combination of foot, insole, and shoe on the roll-off characteristics during gait and is to a certain extent comparable to plate measurements. AVAILABLE SYSTEMS Throughout the past few decades, the search for objective data representing foot function led to the development of a variety of devices capable of measuring pressure under the foot. The previous section showed how different aspects influence the adequacy of plantar pressure measurement. Over the years, a variety of pressure measurement systems using different techniques to obtain objective documentation of foot function have been reviewed (27–30). The quantification of the multidimensional forces generated by a load applied to the sensor(s) is the result of an indirect measurement: the degree of deformation of the sensor(s) caused by a force applied to the sensor surface can be converted into pressure (kPa).
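The lookup-table calibration mentioned earlier, needed because most sensors respond nonlinearly, can be sketched with piecewise-linear interpolation. The calibration pairs below are invented for illustration; a real table would come from loading the sensor with a series of known pressures:

```python
import numpy as np

# hypothetical calibration table: raw output vs. applied pressure
raw_cal = np.array([0.0, 50.0, 120.0, 210.0, 330.0, 480.0])   # ADC counts
kpa_cal = np.array([0.0, 20.0, 50.0, 100.0, 200.0, 400.0])    # kPa

def to_pressure(raw):
    """Convert raw sensor output to kPa via the table (linear interpolation)."""
    return np.interp(raw, raw_cal, kpa_cal)

print(float(to_pressure(120.0)))  # 50.0, exactly at a calibration point
```

A spline or other curve fit could replace `np.interp` when the sensor characteristic is strongly curved between calibration points.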
Therefore, the objectivity of the measurement depends strongly on the specific electromechanical characteristics of the sensors. Most currently available systems use capacitive or resistive sensors, two sensor types that have specific characteristics. Capacitive sensors have the advantage that they conform well to a 3-D shaped surface, show a small temperature drift, and have little hysteresis (10). The disadvantages of capacitive sensors are the need for special leads, large-scale electronics, a limited scanning frequency (up to 100 Hz), and the thickness of these sensors (>1 mm). Resistive sensors are extremely thin (…). For t > 0, tS is convex if S is convex. Because the intersection of convex sets is convex, Eq. (15) yields the convexity of ε_B(S) whenever S is convex. Proposition 14. If A and B are convex sets, so are the dilation, erosion, opening, and closing of A by B. Proof. Suppose that z, w ∈ δ_B(A), r + s = 1, r ≥ 0, and s ≥ 0. According to Eq. (20), there exist a, a′ ∈ A and b, b′ ∈ B such that z = a + b and w = a′ + b′. Owing to convexity, rz + sw = (ra + sa′) + (rb + sb′) ∈ δ_B(A),
(60)
where B is a base for Ψ. The representation of Eq. (56) provides a filter-design paradigm. If an image is composed of a disjoint union of grains (connected components), then unwanted grains can be eliminated according to their sizes relative to the structuring elements in the base B. A key to good filtering is selection of appropriately sized structuring elements, because we wish to minimize elimination of signal grains and maximize elimination of noise grains. This leads to the theory of optimal openings. Because each opening in the expansion of Eq. (56) can, according to Theorem 1, be represented as a union of erosions, substituting the erosion representation of each opening into Eq. (56) ipso facto produces an erosion representation for Ψ. However, even if a basis expansion is used for each opening, there is redundancy in the resulting expansion. For a finite number of finite digital structuring elements, there exists a procedure to produce a minimal erosion expansion from the opening representation (41). As applied to a grain image according to Eq. (56), a τ opening Ψ passes certain components and eliminates others, but affects the passing grains. Corresponding to each τ opening Ψ is an induced reconstructive τ opening Λ(Ψ) defined in the following manner: a connected component is passed in full by Λ(Ψ) if it is not eliminated by Ψ; a connected component is eliminated by Λ(Ψ) if it is eliminated by Ψ. If the clutter image of Fig. 5 is opened reconstructively by the ball of Fig. 2, then the rectangle is perfectly passed and all of the clutter is eliminated. Reconstructive openings belong to the class of connected operators (42–44). These are operators that either pass or completely eliminate grains in both the set
establishing the convexity of δ_B(A). Convexity of the opening and closing follows, because each is an iteration of a dilation and an erosion. It is generally true that (r + s)A ⊂ rA ⊕ sA, but the reverse inclusion does not always hold. However, if A is convex, then we have the identity

(r + s)A = rA ⊕ sA.
(61)
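A quick numerical illustration of Eq. (61), not from the article, in one dimension, where the convex sets are exactly the intervals. The nonconvex two-point set at the end shows that only the inclusion (r + s)A ⊂ rA ⊕ sA survives without convexity:

```python
# Convex set in R: a closed interval [a, b], represented by its endpoints.
def scale_iv(iv, t):
    a, b = iv
    return (t * a, t * b)

def mink_iv(I, J):
    # Minkowski sum of two intervals: [a + c, b + d]
    return (I[0] + J[0], I[1] + J[1])

A = (-1.0, 2.0)
r, s = 2.0, 3.0
print(scale_iv(A, r + s) == mink_iv(scale_iv(A, r), scale_iv(A, s)))  # True

# Nonconvex set (two isolated points): the identity fails, only inclusion holds.
def scale_set(A, t):
    return {t * x for x in A}

def mink_set(A, B):
    return {a + b for a in A for b in B}

C = {-1.0, 2.0}
lhs = scale_set(C, r + s)                            # (r + s)C
rhs = mink_set(scale_set(C, r), scale_set(C, s))     # rC ⊕ sC
print(lhs < rhs)  # True: (r + s)C is a proper subset of rC ⊕ sC
```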
Proposition 15. If r ≥ s > 0 and B is convex, then γ_{rB}(S) ⊂ γ_{sB}(S) for any set S.

Proof. From the property of Eq. (61),

rB = sB ⊕ (r − s)B.
(62)
By Proposition 12, rB is sB-open, and the conclusion follows from Proposition 11. If t ≥ 1 and we replace r, s, and B in Eq. (62) by t, 1, and A, respectively, then Proposition 12 establishes the following proposition: if A is convex, tA is A-open for any t ≥ 1. The converse is not generally valid; however, a significant theorem of Matheron states that it is valid under the assumption that A is compact. The proof is quite involved, and we state the theorem without proof. Theorem 6 (1). Let A be a compact set. Then tA is A-open for any t ≥ 1 if and only if A is convex.
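The nesting of openings by scaled convex structuring elements (Proposition 15) is easy to check numerically. A sketch on the integer line, with B a digital segment playing the role of the convex structuring element; the grain sizes are arbitrary:

```python
def opening(S, k):
    """Opening of a finite S in Z by the digital segment B_k = {0, 1, ..., k}:
    the union of all translates of B_k contained in S."""
    out = set()
    for x in S:
        seg = {x + i for i in range(k + 1)}
        if seg <= S:          # this translate of B_k fits inside S
            out |= seg
    return out

# grains of lengths 4, 11, and 1
S = set(range(0, 4)) | set(range(10, 21)) | {30}

small = opening(S, 2)   # opening by sB, s = 2
large = opening(S, 5)   # opening by rB, r = 5 >= s

print(large <= small)   # True: the openings are nested, as Proposition 15 asserts
```

The larger opening removes the length-4 grain entirely while the smaller one passes it, so the inclusion here is proper.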
FOUNDATIONS OF MORPHOLOGICAL IMAGE PROCESSING
GRANULOMETRIES Granulometries were introduced by Matheron to model parameterized sieving processes operating on random sets (1). If an opening is applied to a binary image composed of a collection of disjoint grains, then some grains are passed (perhaps with diminution), and some are eliminated. If the structuring element is decreased or increased in size, then grains are more or less likely to pass. A parameterized opening filter can be viewed in terms of its diminishing effect on image volume as the structuring element(s) increase in size. The resulting size distributions are powerful image descriptors, especially for classifying random textures. For motivation, suppose that S = S_1 ∪ S_2 ∪ · · · ∪ S_n, where the union is disjoint. Imagine that the components are passed over a sieve of mesh size t > 0 and that a parameterized filter Ψ_t is defined componentwise according to whether or not a component passes through the sieve: Ψ_t(S_i) = S_i if S_i does not fall through the sieve; Ψ_t(S_i) = ∅ if S_i falls through the sieve. For the overall set,

Ψ_t(S) = ⋃_{i=1}^{n} Ψ_t(S_i).   (63)
Because the components of Ψ_t(S) form a subcollection of the components of S, Ψ_t is antiextensive. If T ⊃ S, then each component of T must contain a component of S, so that Ψ_t(T) ⊃ Ψ_t(S), and Ψ_t is increasing. If the components are sieved iteratively through two different mesh sizes, then the output after both iterations depends only on the larger of the mesh sizes. In accordance with these remarks, an algebraic granulometry is defined as a family of operators Ψ_t : P → P, t > 0, that satisfies three properties: (a) Ψ_t is antiextensive; (b) Ψ_t is increasing; (c) Ψ_rΨ_s = Ψ_sΨ_r = Ψ_{max{r,s}} for r, s > 0 [mesh property]. If {Ψ_t} is an algebraic granulometry and r ≥ s, then Ψ_r = Ψ_sΨ_r ⊂ Ψ_s, where the equality follows from the mesh property, and the inclusion follows from the antiextensivity of Ψ_r and the increasingness of Ψ_s. The granulometric axioms are equivalent to two conditions. Proposition 16. {Ψ_t} is an algebraic granulometry if and only if (i) for any t > 0, Ψ_t is an algebraic opening; (ii) r ≥ s > 0 implies Inv[Ψ_r] ⊂ Inv[Ψ_s] [invariance ordering]. Proof. Assuming that {Ψ_t} is an algebraic granulometry, we need to show idempotence and the invariance ordering of (ii). For idempotence, Ψ_tΨ_t = Ψ_{max{t,t}} = Ψ_t. For invariance ordering, suppose that S ∈ Inv[Ψ_r]. Then,

Ψ_s(S) = Ψ_s[Ψ_r(S)] = Ψ_{max{s,r}}(S) = Ψ_r(S) = S.
(64)
To prove the converse, we need show only condition (c), since conditions (a) and (b) hold because Ψ_t is an algebraic opening. Suppose that r ≥ s > 0. By idempotence and condition (ii), Ψ_r(S) ∈ Inv[Ψ_r] ⊂ Inv[Ψ_s]. Hence, Ψ_sΨ_r = Ψ_r. Consequently,

Ψ_r = Ψ_rΨ_r = Ψ_rΨ_sΨ_r ⊂ Ψ_rΨ_s ⊂ Ψ_r,
(65)
where the two inclusions hold because Ψ_s is antiextensive and Ψ_r is increasing. It follows that Ψ_r ⊂ Ψ_rΨ_s ⊂ Ψ_r and Ψ_rΨ_s = Ψ_r = Ψ_{max{s,r}}. Similarly to Eq. (65),

Ψ_r = Ψ_rΨ_r = Ψ_sΨ_rΨ_r ⊂ Ψ_sΨ_r ⊂ Ψ_r,
(66)
and it follows that Ψ_sΨ_r = Ψ_r = Ψ_{max{s,r}}, so that condition (c) is satisfied. If Ψ_t is translation-invariant for all t > 0, then {Ψ_t} is called a granulometry. For a granulometry, condition (i) of Proposition 16 is changed to say that Ψ_t is a τ opening. If Ψ_t satisfies the Euclidean property, Ψ_t(S) = tΨ_1(S/t) for any t > 0, then {Ψ_t} is called a Euclidean granulometry. In terms of sieving, translation invariance means that the sieve mesh is uniform throughout the space. The Euclidean condition means that scaling a set by 1/t, sieving by Ψ_1, and then rescaling by t is the same as sieving by Ψ_t. We call Ψ_1 the unit of the granulometry. The simplest Euclidean granulometry is an opening by a parameterized structuring element, γ_{tB}. For it, the Euclidean property states that

γ_{tB}(S) = tγ_B(S/t).   (67)

Through Proposition 9, x ∈ γ_{tB}(S) if and only if there exists y such that x ∈ (tB)_y ⊂ S, but (tB)_y = tB_{y/t}, so that x ∈ (tB)_y ⊂ S if and only if x/t ∈ B_{y/t} ⊂ S/t, which means that x/t ∈ γ_B(S/t). REPRESENTATION OF EUCLIDEAN GRANULOMETRIES There is a general representation for Euclidean granulometries; before giving it, we develop some preliminaries. A crucial point to establish is that not just any class of sets can serve as the invariant class of a granulometric unit. Proposition 17. If {Ψ_t} is a granulometry, then it satisfies the Euclidean condition if and only if Inv[Ψ_t] = t Inv[Ψ_1], which means that S ∈ Inv[Ψ_t] if and only if S/t ∈ Inv[Ψ_1]. Proof. Suppose that the Euclidean condition is satisfied and S ∈ Inv[Ψ_t]. Then, Ψ_1(S/t) = Ψ_t(S)/t = S/t, so that S/t ∈ Inv[Ψ_1]. Now suppose that S/t ∈ Inv[Ψ_1]. Then, Ψ_t(S) = tΨ_1(S/t) = t(S/t) = S, so that S ∈ Inv[Ψ_t]. To show the converse, let Ψ′_t(S) = tΨ_1(S/t). We claim that Ψ′_t is a τ opening. Antiextensivity, increasingness, and translation invariance follow at once from the corresponding properties of Ψ_1. For idempotence,

Ψ′_t[Ψ′_t(S)] = tΨ_1[tΨ_1(S/t)/t] = tΨ_1[Ψ_1(S/t)] = tΨ_1(S/t) = Ψ′_t(S).
(68)
Next, S ∈ Inv[Ψ′_t] if and only if S/t ∈ Inv[Ψ_1], which, according to the hypothesis, means that S ∈ Inv[Ψ_t]. Thus, Inv[Ψ′_t] = Inv[Ψ_t], and it follows from Theorem 5 that Ψ_t = Ψ′_t; thus, Ψ_t satisfies the Euclidean condition.
Theorem 7 (1). Let I be a class of subsets of R^n. There exists a Euclidean granulometry for which I is the invariant class of the unit if and only if I is closed under union, translation, and scalar multiplication by all t ≥ 1. Moreover, for such a class I, the corresponding Euclidean granulometry possesses the representation

Ψ_t(S) = ⋃_{B∈I} γ_{tB}(S),   (69)

where Inv[Ψ_1] = I.

Proof. First suppose that there exists a Euclidean granulometry {Ψ_t} for which I = Inv[Ψ_1]. To prove closure under unions, consider a collection of sets S_i ∈ Inv[Ψ_1], and let S be the union of the S_i. Because Ψ_1 is increasing,

S = ⋃_i S_i = ⋃_i Ψ_1(S_i) ⊂ Ψ_1(⋃_i S_i) = Ψ_1(S).   (70)

Because Ψ_1 is antiextensive, the reverse inclusion holds, S ∈ Inv[Ψ_1], and there is closure under unions. Because an algebraic opening is a τ opening if and only if its invariant class is invariant under translation, Inv[Ψ_1] is closed under translation. Now suppose that S ∈ Inv[Ψ_1] and t ≥ 1. By the Euclidean condition, tS ∈ Inv[Ψ_t], and

Ψ_1(tS) = Ψ_1[Ψ_t(tS)] = Ψ_{max{1,t}}(tS) = Ψ_t(tS) = tS.   (71)

Therefore, tS ∈ Inv[Ψ_1], and Inv[Ψ_1] is closed under scalar multiplication by t ≥ 1. For the converse of the proposition, we need to find a granulometry for which I is the invariant class of the unit. According to Theorem 5, Ψ_t defined by Eq. (69) is a τ opening whose base is tI. Because I is closed under unions and translations, Inv[Ψ_t] = tI, and I = Inv[Ψ_1]. To show that {Ψ_t} is a Euclidean granulometry, we must demonstrate invariance ordering and the Euclidean condition. Suppose that r ≥ s > 0 and S ∈ Inv[Ψ_r]. Then, S = rB for some B ∈ I. Because r/s ≥ 1, (r/s)B ∈ I, which implies that S = rB = sC for some C ∈ I, which implies that S ∈ sI = Inv[Ψ_s]. Finally, according to Proposition 17, the Euclidean condition holds because, by construction, Inv[Ψ_t] = t Inv[Ψ_1].

Taken together with Proposition 17, which states that invariant classes of Euclidean granulometries are determined by the invariant class of the unit, Theorem 7 characterizes the form and invariant classes of Euclidean granulometries. Nevertheless, as it stands, it does not provide a useful framework for filter design because one must construct invariant classes of units, and we need a methodology for construction. Suppose that I is a class of sets closed under union, translation, and scalar multiplication by t ≥ 1. A class G of sets is called a generator of I if the class closed under union, translation, and scalar multiplication by scalars t ≥ 1 that is generated by G is I. If {Ψ_t} is the Euclidean granulometry with Inv[Ψ_1] = I, then G is called a generator of {Ψ_t}. The next theorem provides a more constructive characterization of Euclidean granulometries.

Theorem 8 (1). An operator family {Ψ_t}, t > 0, is a Euclidean granulometry if and only if there exists a class of images G such that

Ψ_t(S) = ⋃_{B∈G} ⋃_{r≥t} γ_{rB}(S).   (72)

Moreover, G is a generator of {Ψ_t}.

Proof. First, we show that Eq. (72) yields a Euclidean granulometry for any class G. According to Theorem 5, Ψ_t is a τ opening with base B_t = {rB : B ∈ G, r ≥ t}. If u ≥ t, then B_u ⊂ B_t, which implies that Inv[Ψ_u] ⊂ Inv[Ψ_t]. To show that {Ψ_t} is a Euclidean granulometry, we apply Proposition 17 and Eq. (67). Indeed, S/t ∈ Inv[Ψ_1] if and only if

S/t = Ψ_1(S/t) = ⋃_{B∈G} ⋃_{s≥1} γ_{sB}(S/t) = (1/t) ⋃_{B∈G} ⋃_{s≥1} γ_{tsB}(S) = (1/t) ⋃_{B∈G} ⋃_{r≥t} γ_{rB}(S) = Ψ_t(S)/t,   (73)

which, upon canceling 1/t, says that S/t ∈ Inv[Ψ_1] if and only if S ∈ Inv[Ψ_t]. Because B_1 is a base for Ψ_1, the form of B_1 shows that G is a generator of {Ψ_t}. As for the converse, because {Ψ_t} is a Euclidean granulometry, Ψ_t has the representation of Eq. (69) and, because I = Inv[Ψ_1] is a generator of itself,

Ψ_1(S) = ⋃_{B∈I} ⋃_{r≥1} γ_{rB}(S).   (74)

By the Euclidean condition and Eq. (67),

Ψ_t(S) = t ⋃_{B∈I} ⋃_{r≥1} γ_{rB}(S/t) = ⋃_{B∈I} ⋃_{r≥1} γ_{rtB}(S) = ⋃_{B∈I} ⋃_{u≥t} γ_{uB}(S),   (75)

so that Ψ_t possesses a representation of the desired form.

Theorem 8 provides a methodology for constructing Euclidean granulometries: select a generator G and apply Eq. (72); however, such an approach is problematic in practice because it involves a union across all r ≥ t for each t. To see the problem, suppose that we choose a singleton generator G = {B}. Then, Eq. (72) yields the representation

Ψ_t(S) = ⋃_{r≥t} γ_{rB}(S),   (76)

which is an uncountable union. According to Theorem 6, if B is compact and convex, then rB is tB-open, so that γ_{rB}(S) ⊂ γ_{tB}(S), and the union reduces to the single opening Ψ_t(S) = γ_{tB}(S). Because Theorem 6 is an
equivalence, for compact B, we require the convexity of B to obtain the reduction. This reasoning extends to an arbitrary generator: for a generator composed of compact sets, the double union of Eq. (72) reduces to the single outer union over G,

Ψ_t(S) = ⋃_{B∈G} γ_{tB}(S),   (77)
if and only if G is composed of convex sets, in which case we say that the granulometry is convex. The single union represents a parameterized τ opening. The generator sets of a convex granulometry are convex, and therefore connected. Hence, if S_1, S_2, . . . are mutually disjoint compact sets, then

Ψ_t(⋃_{i=1}^{∞} S_i) = ⋃_{i=1}^{∞} Ψ_t(S_i),   (78)
that is, a convex granulometry is distributive, and it can be viewed componentwise. Although we have restricted our development to binary granulometries, as conceived by Matheron, the theory can be extended to gray-scale images (45,46), and the algebraic theory to the framework of complete lattices (10). RECONSTRUCTIVE GRANULOMETRIES The representation of Eq. (77) can be generalized by separately parameterizing each structuring element, rather than simply scaling each by a common parameter. To avoid cumbersome subscripts, we will now switch to the infix notation S°B for the opening of S by B. Assuming a finite number of convex structuring elements, individual structuring-element parameterization yields a family {r } of multiparameter τ openings of the form
441
need not be invariance ordered. As it stands, {r } is simply a collection of τ openings across a parameter space. The failure of the family of Eq. (79) and other operator families defined via unions and intersections of parameterized openings to be granulometries is overcome by openingwise reconstruction and leads to the class of logical granulometries (47). Regarding Eq. (79), the induced reconstructive family {r }, defined by r (S) =
n
(S°Bk [rk ]) =
k=1
n
S°Bk [rk ] ,
(80)
k=1
is a granulometry (because it is invariance ordered). As shown in Eq. (80), reconstruction can be performed openingwise or on the union. {r } is called a disjunctive granulometry. Although Eq. (79) does not generally yield a granulometry without reconstruction, a salient special case occurs when each structuring element has the form ti Bi . In this case, for any n-vector t = (t1 , t2 , . . . , tn ), ti > 0, for i = 1, 2, . . . , n, the filter takes the form t (S) =
n
S°ti Bi .
(81)
i=1
To avoid useless redundancy, we assume that no set in the base is open with respect to another set in the base, meaning that for i = j, Bi °Bj = Bi . For any t = (t1 , t2 , . . . , tn ) for which there exists ti = 0, we define t (S) = S. {t } is a multivariate granulometry (because it is a granulometry without reconstruction) (48). If the union of Eq. (79) is changed to an intersection and all conditions that qualify Eq. (79) are maintained, then the result is a family of multiparameter operators of the form n S°Bk [rk ]. (82) r (S) = k=1
r (S) =
n
S°Bk [rk ],
(79)
k=1
where r1 , r2 , . . . , rn are parameter vectors governing the convex, compact structuring elements B1 [r1 ], B2 [r2 ], . . . , Bn [rn ] that compose the base of r and r = (r1 , r2 , . . . , rn ). To keep the notion of sizing, we require (here and subsequently) the sizing condition that rk ≤ sk implies Bk [rk ] ⊂ Bk [sk ] for k = 1, 2, . . . , n, where vector order is defined by (t1 , t2 , . . . , tm ) ≤ (u1 , u2 , . . . , um ) if and only if tj ≤ uj for j = 1, 2, . . . , m. r is a τ opening because any union of openings is a τ opening; however, because the parameter is a vector now, the second condition of Proposition 16 does not apply as stated. To generalize the condition, we use componentwise ordering in the vector lattice. Condition (ii) of Proposition 16 becomes (ii ) r ≥ s > 0 ⇒ Inv[r ] ⊂ Inv[s ]. Condition (ii ) states that the mapping r → Inv[r ] is order reversing and we say that any family {r } for which it holds is invariance ordered. If r is a τ opening for any r and a family {r } is invariance ordered, then we call {r } a granulometry. The family {r } defined by Eq. (79) is not necessarily a granulometry because it
Each operator Ψr is translation-invariant, increasing, and antiextensive but, unless n = 1, Ψr need not be idempotent. Hence, Ψr is not generally a τ opening, and the family {Ψr} is not a granulometry. Each induced reconstruction Λ(Ψr) is a τ opening (it is idempotent), but the family {Λ(Ψr)} is not a granulometry because it is not invariance ordered. However, if reconstruction is performed openingwise, then the resulting intersection of reconstructions is invariance ordered and a granulometry. The family of operators

Ψr(S) = ⋂_{k=1}^{n} Λ[S∘Bk(rk)]    (83)
is called a conjunctive granulometry. In the conjunctive case, the equality of Eq. (81) is softened to an inequality: the reconstruction of the intersection is a subset of the intersection of the reconstructions. Conjunction and disjunction can be combined to form a more general form of reconstructive granulometry:

Ψr(S) = ⋃_{k=1}^{n} ⋂_{j=1}^{mk} Λ[S∘Bk,j(rk,j)].    (84)
FOUNDATIONS OF MORPHOLOGICAL IMAGE PROCESSING
If Si is a component of S and xi,k,j and yi are the logical variables determined by the truth values of the equations Si∘Bk,j(rk,j) ≠ ∅ and Ψr(Si) ≠ ∅ [or, equivalently, Λ[Si∘Bk,j(rk,j)] = Si and Ψr(Si) = Si], respectively, then yi possesses the logical representation

yi = ⋁_{k=1}^{n} ⋀_{j=1}^{mk} xi,k,j.    (85)
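The pass/fail logic of Eq. (85) can be exercised directly on a toy image. The sketch below is our own minimal illustration (the set-of-pixels representation, helper names, 4-connectivity, and the toy grains are illustrative assumptions, not constructs from the article): a connected component is kept in full if, for some k, every structuring element in the kth group fits somewhere inside it.

```python
def opening(S, B):
    """S∘B: union of all translates of B contained in S.
    S is a set of (row, col) pixels; B is a list of offsets."""
    out = set()
    b0 = B[0]
    for (y, x) in S:
        # candidate translate that places B's first offset at (y, x)
        T = {(y - b0[0] + dy, x - b0[1] + dx) for (dy, dx) in B}
        if T <= S:
            out |= T
    return out

def components(S):
    """4-connected components of a pixel set, found by flood fill."""
    left, comps = set(S), []
    while left:
        seed = left.pop()
        comp, stack = {seed}, [seed]
        while stack:
            y, x = stack.pop()
            for p in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if p in left:
                    left.remove(p)
                    comp.add(p)
                    stack.append(p)
        comps.append(comp)
    return comps

def logical_opening(S, bases):
    """Pass component Si iff y_i = OR over k of (AND over j of x_{i,k,j}),
    where x_{i,k,j} is true iff Si∘B_{k,j} is nonempty (Eq. 85)."""
    out = set()
    for Si in components(S):
        if any(all(opening(Si, B) for B in group) for group in bases):
            out |= Si  # passed components are reconstructed in full
    return out

square3 = [(dy, dx) for dy in range(3) for dx in range(3)]
bar4 = [(0, dx) for dx in range(4)]
grain_big = {(y, x) for y in range(5) for x in range(5)}            # 5×5 grain
grain_small = {(y + 8, x + 8) for y in range(2) for x in range(2)}  # 2×2 grain
S = grain_big | grain_small
# one conjunctive group: a component must admit BOTH a 3×3 square and a 1×4 bar
R = logical_opening(S, [[square3, bar4]])
```

Here the 5×5 grain passes (both elements fit) and survives intact, while the 2×2 grain fails the 3×3 test and is sieved out.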
We call {Ψr} a logical granulometry. Component Si is passed if and only if there exists k such that, for j = 1, 2, . . . , mk, there exists a translate of Bk,j(rk,j) that is a subset of Si. Disjunctive and conjunctive granulometries are special cases of logical granulometries, and the latter compose a class of sieving filters that locate targets among clutter based on the size and shape of the target and clutter structural components. For fixed r, we refer to Ψr as a disjunctive, conjunctive, or logical opening, based on the type of reconstructive granulometry from which it arises. Logical openings form a subclass of a more general class of reconstructive sieving filters called logical structural filters (49). These are not granulometric in the sense of Matheron; they need not be increasing.

SIZE DISTRIBUTIONS

For increasing t, a granulometry causes increasing diminution of a set. The rate of diminution is a powerful image descriptor. Consider a finite-generator convex Euclidean granulometry {Ψt} of the form given in Eq. (77) that has compact generating sets containing more than a single point. Fixing a compact set S of positive measure, letting ν denote area, and treating t as a variable, we define the size distribution as
Ω(t) = ν[S] − ν[Ψt(S)].    (86)
Ω(t) measures the area removed by Ψt. Ω is an increasing function for which Ω(0) = 0, and Ω(t) = ν(S) for sufficiently large t. When S is a random set, Ω(t) is a random function whose realizations are characteristics of the corresponding realizations of the random set. Taking the expectation of Ω(t) gives the mean size distribution (MSD), M(t) = E[Ω(t)]. The (generalized) derivative, H(t) = M′(t), of the mean size distribution is called the granulometric size density (GSD). The MSD is not a probability distribution function because M(t) → E[ν(S)] as t → ∞. Hence, the GSD is not a probability density. The MSD and GSD serve as partial descriptors of a random set in much the same way that the power spectral density partially describes a wide-sense-stationary random function, and they play a role analogous to the power spectral density in the design of optimal granulometric filters (12,50–53). The pattern spectrum of S is defined by Φ(t) = Ω(t)/ν(S), or its derivative, φ = dΦ/dt. Φ is a probability distribution function, and its derivative is a probability density. The expectation E[Φ(t)] is a probability distribution function, and we call its derivative, Π(t) = dE[Φ(t)]/dt, the pattern-spectrum density (PSD).
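Equation (86) and the pattern spectrum are straightforward to tabulate for a discrete binary image. The following sketch is our own minimal illustration (the granulometry by t × t squares, the pixel-count measure, and the toy image are illustrative assumptions, not the article's):

```python
def opening(S, B):
    """S∘B: union of all translates of B contained in the pixel set S."""
    out = set()
    b0 = B[0]
    for (y, x) in S:
        T = {(y - b0[0] + dy, x - b0[1] + dx) for (dy, dx) in B}
        if T <= S:
            out |= T
    return out

def square(t):
    """Offsets of a t×t square structuring element."""
    return [(dy, dx) for dy in range(t) for dx in range(t)]

def size_distribution(S, tmax):
    """Ω(t) = ν[S] − ν[Ψ_t(S)], where Ψ_t opens by a (t+1)×(t+1) square
    (so that Ω(0) = 0) and ν is the pixel count."""
    return [len(S) - len(opening(S, square(t + 1))) for t in range(tmax)]

# Toy image: a 3×3 grain and a 6×6 grain.
S = {(y, x) for y in range(3) for x in range(3)} \
    | {(y + 8, x + 8) for y in range(6) for x in range(6)}
omega = size_distribution(S, 8)
phi = [w / len(S) for w in omega]  # pattern spectrum Φ(t) = Ω(t)/ν(S)
```

Ω starts at 0, jumps by 9 pixels when the 3×3 grain is sieved out, and reaches ν(S) once the opening removes the 6×6 grain, so Φ increases from 0 to 1 like a distribution function.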
Π is a probability density and, under nonrestrictive regularity conditions, Π(t) = E[φ(t)]. Treating S as a random set, Φ(t) is a random function, and its moments, called granulometric moments, are random variables. The pattern spectrum and the granulometric moments are used to provide image characteristics in various applications, in particular texture and shape classification (54–61). Because the moments of Φ(t) are used for classification, three basic problems arise: (1) find expressions for the pattern-spectrum moments; (2) find expressions for the moments of the pattern-spectrum moments; (3) describe the probability distributions of the pattern-spectrum moments. In this vein, various properties of pattern spectra have been studied: asymptotic behavior (relative to grain count) of the pattern-spectrum moments (62–64), effects of noise (65), continuous-to-discrete sampling (66), and estimation (67,68). Granulometric classification has also been applied to gray-scale textures (69–71). Given a collection of convex, compact sets B1, B2, . . . , BJ, there exist granulometric moments for each granulometry {S∘tBj}, j = 1, 2, . . . , J. If we take the first q moments of each granulometry, then m = qJ features, µ(k)(Si; Bj), are generated for each Si, thereby yielding, for each Si, an m-dimensional feature vector. Size distributions can also be applied locally to classify individual pixels: the granulometries are applied to the whole image, but a size distribution is computed at each pixel by taking pixel counts in a window about each pixel.

BIBLIOGRAPHY

1. G. Matheron, Random Sets and Integral Geometry, Wiley, NY, 1975. 2. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, NY, 1983. 3. H. Minkowski, Math. Ann. 57, 447–495 (1903). 4. H. Hadwiger, Altes und Neues über konvexe Körper, Birkhäuser-Verlag, Basel, 1955. 5. H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Springer-Verlag, Berlin, 1957. 6. S.
Sternberg, Comput. Vision Graphics Image Process. 35(3), 337–355 (1986). 7. J. Serra, in J. Serra, ed., Image Analysis and Mathematical Morphology, vol. 2, Theoretical Advances, Academic Press, NY, 1988. 8. H. J. Heijmans and C. Ronse, Comput. Vision Graphics Image Process. 50, 245–295 (1990). 9. C. Ronse and H. J. Heijmans, Comput. Vision Graphics Image Process. 54, 74–97 (1991). 10. H. J. Heijmans, Morphological Operators, Academic Press, NY, 1995. 11. E. R. Dougherty, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999. 12. E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999. 13. E. R. Dougherty and Y. Chen, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999.
14. E. R. Dougherty, An Introduction to Morphological Image Processing, SPIE Press, Bellingham, 1992.
42. J. Crespo and R. Schafer, Math. Imaging Vision 7(1), 85–102 (1997).
15. P. Soille, Morphological Image Analysis, Springer-Verlag, NY, 1999. 16. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London, 1982. 17. E. R. Dougherty, ed., Mathematical Morphology in Image Processing, Marcel Dekker, NY, 1993.
43. J. Crespo, J. Serra, and R. Schafer, Signal Process. 47(2), 201–225 (1995). 44. H. Heijmans, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999. 45. E. R. Dougherty, Math. Imaging Vision 1(1), 7–21 (1992).
18. P. Maragos and R. Schafer, IEEE Trans. Acoust. Speech Signal Process. 35, 1153–1169 (1987). 19. C. R. Giardina and E. R. Dougherty, Morphological Methods in Image and Signal Processing, Prentice-Hall, Englewood Cliffs, NY, 1988. 20. G. J. F. Banon and J. Barrera, SIAM J. Appl. Math. 51(6), 1782–1798 (1991). 21. P. Maragos and R. Schafer, IEEE Trans. Acoust. Speech Signal Process. 35, 1170–1184 (1987). 22. E. R. Dougherty and D. Sinha, Signal Process. 38, 21–29 (1994). 23. G. J. F. Banon and J. Barrera, Signal Process. 30, 299–327 (1993). 24. E. R. Dougherty and D. Sinha, Real-Time Imaging 1(1), 69–85 (1995).
46. E. Kraus, H. J. Heijmans, and E. R. Dougherty, Signal Process. 34, 1–17 (1993). 47. E. R. Dougherty and Y. Chen, in J. Goutsias, R. Mahler, and C. Nguyen, eds., Random Sets: Theory and Applications, Springer-Verlag, NY, 1997. 48. S. Batman and E. R. Dougherty, Opt. Eng. 36(5), 1518–1529 (1997). 49. E. R. Dougherty and Y. Chen, Opt. Eng. 37(6), 1668–1676 (1998). 50. E. R. Dougherty et al., Signal Process. 29, 265–281 (1992). 51. R. M. Haralick, P. L. Katz, and E. R. Dougherty, Comput. Vision Graphics Image Process. Graphical Models Image Process. 57(1), 1–12 (1995). 52. E. R. Dougherty, Math. Imaging Vision 7(2), 175–192 (1997).
25. E. R. Dougherty and D. Sinha, Real-Time Imaging 1(4), 283–295 (1995). 26. E. R. Dougherty and J. Barrera, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999. 27. J. Barrera, E. R. Dougherty, and N. S. Tomita, Electron. Imaging 6(1), 54–67 (1997). 28. E. J. Coyle and J.-H. Lin, IEEE Trans. Acoust. Speech Signal Process. 36(8), 1244–1254 (1988). 29. E. R. Dougherty, CVGIP: Image Understanding 55(1), 36–54 (1992). 30. E. R. Dougherty and R. P. Loce, Opt. Eng. 32(4), 815–823 (1993). 31. E. R. Dougherty and R. P. Loce, Signal Process. 40(3), 129–154 (1994). 32. E. R. Dougherty and R. P. Loce, Electron. Imaging 5(1), 66–86 (1996). 33. E. R. Dougherty, Y. Zhang, and Y. Chen, Opt. Eng. 35(12), 3495–3507 (1996). 34. M. Gabbouj and E. J. Coyle, IEEE Trans. Acoust. Speech Signal Process. 38(6), 955–968 (1990). 35. P. Kuosmanen and J. Astola, Signal Process. 41(3), 165–211 (1995). 36. R. P. Loce and E. R. Dougherty, Visual Commun. Image Representation 3(4), 412–432 (1992). 37. R. P. Loce and E. R. Dougherty, Opt. Eng. 31(5), 1008–1025 (1992). 38. R. P. Loce and E. R. Dougherty, Enhancement and Restoration of Digital Documents: Statistical Design of Nonlinear Algorithms, SPIE Press, Bellingham, 1997. 39. P. Salembier, Visual Commun. Image Representation 3(2), 115–136 (1992). 40. J. Serra, in J. Serra, ed., Image Analysis and Mathematical Morphology, vol. 2, Theoretical Advances, Academic Press, NY, 1988. 41. E. R. Dougherty, Pattern Recognition Lett. 14(3), 1029–1033 (1994).
53. Y. Chen and E. R. Dougherty, Signal Process. 61, 65–81 (1997). 54. P. Maragos, IEEE Trans. Pattern Analy. Mach. Intelligence 11, 701–716 (1989). 55. E. R. Dougherty and J. Pelz, Opt. Eng. 30(4), 438–445 (1991). 56. E. R. Dougherty, J. T. Newell, and J. B. Pelz, Pattern Recognition 25(10), 1181–1198 (1992). 57. L. Vincent and E. R. Dougherty, in E. Dougherty, ed., Digital Image Processing Methods, Marcel Dekker, NY, 1994. 58. B. Li and E. R. Dougherty, Opt. Eng. 32(8), 1967–1980 (1993). 59. R. Sobourin, G. Genest, and F. Preteux, IEEE Trans. Pattern Anal. Mach. Intelligence 19(9), 989–1003 (1997). 60. G. Ayala, M. E. Diaz, and L. Martinez-Costa, Pattern Recognition 34(6), 1219–1227 (2001). 61. Y. Balagurunathan et al., Image Anal. Stereology 20, 87–99 (2001). 62. E. R. Dougherty and F. Sand, Visual Commun. Image Representation 6(1), 69–79 (1995). 63. F. Sand and E. R. Dougherty, Visual Commun. Image Representation 3(2), 203–214 (1992). 64. F. Sand and E. R. Dougherty, Pattern Recognition 31(1), 53–61 (1998). 65. B. Bettoli and E. R. Dougherty, Math. Imaging Vision 3(3), 299–319 (1993). 66. E. R. Dougherty and C. R. Giardina, SIAM J. Appl. Math. 47(2), 425–440 (1987). 67. K. Sivakumar and J. Goutsias, in J. Serra and P. Soille, eds., Mathematical Morphology and its Applications to Image Processing, Kluwer Academic, Boston, 1994. 68. K. Sivakumar and J. Goutsias, Electron. Imaging 6(1), 31–53 (1997). 69. Y. Chen and E. R. Dougherty, Opt. Eng. 33(8), 2713–2722 (1994). 70. Y. Chen, E. R. Dougherty, S. Totterman, and J. Hornak, Magn. Resonance Med. 29(3), 358–370 (1993). 71. S. Baeg et al., Electron. Imaging 8(1), 65–75 (1999).
GRAVITATION IMAGING
G. RANDY KELLER
University of Texas at El Paso, El Paso, TX

FUNDAMENTAL THEORY AND PRACTICE

Studies of the earth's gravity field (and those of other planetary bodies) are a prime example of modern applications of classical Newtonian physics. We use knowledge of the earth's gravity field to study the details of the earth's shape (geodesy), to predict the orbits of satellites and the trajectories of missiles, and to determine the earth's mass and moment of inertia. However, gravitational imaging as defined here refers to geophysical mapping and interpretation of features in the earth's lithosphere (the relatively rigid outer shell that extends to depths of ∼100 km beneath the surface). In fact, the emphasis is on the earth's upper crust, which extends to depths of about 20 km, because it is in this region that gravity data can best help delineate geologic features related to natural hazards (faults, volcanoes, landslides), natural resources (water, oil, gas, minerals, geothermal energy), and tectonic events such as the formation of mountain belts. Such studies provide elegantly straightforward demonstrations of the applicability of classical physics and digital processing to the solution of a variety of geologic problems. These problems vary in scale from very local investigations of features such as faults and ore bodies to regional investigations of the structure of mountain belts and tectonic plates. Mass m is a fundamental property of matter, and density ρ is the mass per unit volume v; thus, m = ρv. The variation in density in the lithosphere produces mass variations and thus changes in the gravity field that we seek to image. To produce images, we must first apply corrections to our gravity measurements that remove known variations in gravity with respect to elevation and latitude. Our goal is to derive gravity anomalies that represent departures from what we know about the gravity field and to construct images of these values.

The lithosphere constitutes only about 5% of the earth's volume, and because density generally increases with depth in the earth, the lithosphere is a very small portion of the earth's mass. However, below the lithosphere, the earth can be thought of as concentric shells of material whose density is relatively constant. Thus, the vast majority of the earth's gravity field is due to the material below the lithosphere and varies in a subtle and very long-wavelength fashion. Within the lithosphere, however, rocks vary in density from less than that of water (pumice, a volcanic rock, can actually float in water) to more than 4000 kg m⁻³. A contrast in density between adjacent bodies of rock produces a gravity anomaly that is positive across the denser body and likewise negative across the other body. In fact, gravitational imaging is in essence the method we use to detect and map the extent of such density contrasts. However, the lithosphere is composed of many heterogeneous bodies of rock; thus, images of gravity anomalies are often complex, which makes it difficult to separate anomalies due to different density contrasts. Another consideration is the fundamental ambiguity in gravity studies that arises because the variation in mass that causes a particular gravity anomaly can be represented by many geologically reasonable combinations of volume and density. Thus, we can be confident that anomalies in gravitational images locate anomalous masses, but we require independent information, such as data from drill holes, to determine the geometry and density of the body that causes the anomaly. Even though it is a first-order approximation of reality, Newton's law of gravitation should form the basis for our intuitive understanding of most aspects of gravitational imaging. If the earth were a perfect sphere consisting of concentric shells of constant density and were not rotating, Newton's law of gravitation would predict the gravitational attraction g between the earth (mass = Me) and a mass m1 on its surface as
g = γ Me m1 / Re²,    (1)
where Re is the radius of the earth and γ is the International gravitational constant (γ = 6.67 × 10⁻¹¹ m³ kg⁻¹ s⁻²). In actuality, the earth's gross shape departs slightly from spherical, there is topography on the continents that can be thought of as variations in Re, the density within the earth varies (and varies complexly in the lithosphere), and a slow rotation is present. However, all of these complications are second order. In studies of lithospheric structure, the search is for gravity anomalies (differences between observed gravity values and what is expected based on first principles and planetary-scale variations). With respect to the total gravity field of the earth, these anomalies are at most only a few parts per thousand in amplitude. Images (maps) of the values of these anomalies are used to infer the earth's structure and are well suited to be integrated with other data such as satellite images and digital elevation models. For example, a simple overlay of gravity anomalies on a Landsat image provides an easy and effective depiction of the way subsurface mass distributions correlate with surface features. Qualitative interpretation of gravity anomalies is no more complex than calling upon Newton's law to tell us that positive anomalies indicate the presence of a local mass excess, while negative anomalies indicate local mass deficiencies. As discussed later, several different types of anomalies have been defined based on the known variations in the
Figure 1. An example of an idealized reference spheroid that is used to predict gravity values at sea level: an ellipsoid whose major axis is a and minor axis is b; the flattening is f = (a − b)/a.
earth’s gravity field that are considered before calculating the anomaly value. However, we start from a basic formula for the gravitational attraction of a rotating ellipsoid (Fig. 1) with flattening f , derived by Clairaut in 1743. This formula predicts the value of gravity (Gt ) at sea level as a function of latitude (φ). In the twentieth century, higher order terms were added so that the formula takes the form (2) Gt = Ge (1 + f2 sin2 φ − f4 sin4 φ), where Ge = global average value of the gravitational acceleration at the equator. = angular velocity of the Earth’s rotation. m = 2 a/Ge , ( 2 a = centrifugal force at the equator, a = equatorial radius of the ellipsoid, b = polar radius of the ellipsoid) and
f = (a − b)/a,
f2 = −f + (5/2)m + (1/2)f² − (26/7)fm + (15/4)m²,
f4 = (1/2)f² − (5/2)fm.

The values of Ge, a, b, and f (flattening) are known to a considerable level of precision but are constantly being refined by a variety of methods. Occasionally, international scientific organizations agree on revised values for these quantities. Thus, all calculated values of gravity anomalies need to be adjusted when these revisions are made. As of 2000, practitioners commonly use the following equation, which is based on the Geodetic Reference System 67 (1):

Gt (mGal) = 978031.846(1 + 0.005278895 sin²φ + 0.000023462 sin⁴φ).    (3)

Due to the advent of the Global Positioning System (GPS), the World Geodetic System 1984 (2) is being widely adopted, and the reduction equation for this system will become the standard. The National Imaging and Mapping Agency maintains a web site (http://164.214.2.59/GandG/pubs.html) that has the latest information on geodetic systems and models of the earth's gravity field. The units for gravity measurements are cm/s², or Gals in honor of Galileo; Eq. (3) produces values whose units are milliGals (mGal). We learn that the value of the earth's gravitational attraction is about 980 cm/s², but this formula shows that the gravitational attraction of the earth at sea level varies from about 978 cm/s² at the equator to about 983 cm/s² at the poles. Gravity surveys on land routinely detect anomalies that have amplitudes of 0.1 mGal and thus have the rather remarkable relative precision of roughly 1 part in 10 million. Surveys whose precision is 0.01 mGal are common. By merely subtracting Gt from an observed value of the gravitational acceleration (Gobs), we calculate the most fundamental type of gravity anomaly. However, the effects of elevation are so large that such an anomaly value means little except at sea level. Instead, the Free Air, Bouguer, and residual anomaly values described later are calculated and interpreted. Maps of these anomaly values have been constructed and interpreted for decades, and using modern techniques, it is these values that are imaged.

LANGUAGE

The intent is to introduce only those terms and concepts necessary to understand the basics of gravitational imaging. The Gravity and Magnetics Committee of the Society of Exploration Geophysicists maintains a web site that includes a dictionary of terms (http://seg.org/comminfo/grav_mag/gm_dict.html) and a link to a glossary (http://www.igcworld.com/gm_glos.html) that is maintained by the Integrated Geophysics Corporation.

Gravity
Technically, gravity g is the gravitational acceleration due to the sum of the attraction of the earth's mass and the effects of its rotation. However, it is common practice for geophysicists to say that they are measuring gravity and to think of it as a force (a vector) that represents the attraction of the earth on a unit mass (F = Me g). In most cases, geophysicists tacitly assume that g is directed toward the center of the earth, which is true to much less than 1° in most places. This practice results in treating gravity effectively as a scalar quantity.

Geoid

The theoretical treatment of the earth's gravity field is based on potential theory (3). The gravitational potential at a point is the work done by gravity as a unit mass is brought from infinity to that point. One differentiates the potential to arrive at the gravitational attraction. Thus, it is important to remember that equipotential surfaces are not equal gravity surfaces. This concept is less abstract if we realize that mean sea level is an equipotential surface that we call the geoid. For the continents, the geoid can be thought of as the surface sea level would assume if canals connected the oceans and the water were allowed to flow freely and reach its equilibrium level. Another important consideration is that a plumb bob (a weight on a string) always hangs in a direction perpendicular to the geoid. This is the very definition of vertical, which is obviously important in surveying topography. The technical definition of elevation is also height above the geoid. However, reference spheroids
that approximate the geoid, at least locally, are employed to create coordinate systems for constructing maps (see http://www.Colorado.EDU/geography/gcraft/notes/notes.html for a good primer on geodetic data). Thus, mapping the geoid is a key element in determining the earth's shape.

Gravimeter

The measurement of absolute gravity values is a very involved process that usually requires sophisticated pendulum systems. However, gravimeters that measure differences in gravity are elegantly simple and accurate. These instruments were perfected in the 1950s, and although new designs are being developed, most instruments work on the simple principle of measuring the deflection of a suspended mass as a result of changes in the gravity field. A system of springs suspends this mass, and it is mechanically easier to measure the change in tension on the main spring required to bring the mass back to a position of zero deflection than to measure the minute deflection of the mass. If gravity increases from the previously measured value, the spring is stretched, and the tension must be increased to return it to zero deflection. If gravity decreases, the spring contracts, and the tension must be decreased. The gravimeter must be carefully calibrated so that the relative readings it produces can be converted into differences in mGal. Thus, each gravimeter has its own calibration constant, or table of constants if the springs do not behave linearly over the readable range of the meter. Instruments that measure on land are the most widely used and can easily produce results that are correct to 0.1 mGal. Meters whose precision is almost 0.001 mGal are available. Specially designed meters can be lowered into boreholes or placed in waterproof vessels and lowered to the bottom of lakes or shallow portions of the ocean.
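Converting counter readings to mGal with a table of constants amounts to piecewise-linear interpolation. The sketch below is hypothetical throughout: the breakpoints and factors are invented for illustration and do not describe any real meter.

```python
def reading_to_mgal(reading, table):
    """Convert a gravimeter counter reading to mGal using a calibration
    table of (counter units, equivalent mGal) pairs, interpolating
    linearly within each calibrated interval."""
    for (r0, g0), (r1, g1) in zip(table, table[1:]):
        if r0 <= reading <= r1:
            return g0 + (g1 - g0) * (reading - r0) / (r1 - r0)
    raise ValueError("reading outside the calibrated range of the meter")

# Hypothetical calibration table; the factor drifts slightly across the
# range, as it would for springs that are not quite linear.
TABLE = [(0.0, 0.0), (1000.0, 1021.3), (2000.0, 2043.1), (3000.0, 3065.2)]

# Difference between two readings taken at different stations, in mGal:
delta = reading_to_mgal(2541.6, TABLE) - reading_to_mgal(2534.2, TABLE)
```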
Considering the accelerations that are present on moving platforms such as boats and aircraft and the precision required for meaningful measurements, it is surprising that gravity measurements are regularly made in such situations. The gravimeters are placed on platforms that minimize accelerations, and successive measurements must be averaged. Nevertheless, a precision of 1–5 mGal is obtained in this fashion.
Figure 2. A diagram that shows how elevation corrections (Free Air and Bouguer) are used in gravity studies. The Free Air correction compensates for variations in the distance from the datum chosen for the procedure, which is usually sea level. The Bouguer correction compensates for the mass between the location of an individual gravity reading and the datum.
Observed Gravity Value

By reading a gravimeter at a base station and then at a particular location (usually called a gravity station; Fig. 2), we can convert the difference into an observed gravity value by first multiplying this difference by the calibration constant of the gravimeter. This converts the difference from instrument readings into mGal. Then, this difference is corrected for meter drift and earth tides (see later) and is added to the established gravity value at the base station to obtain the observed gravity value (Gobs) at the station. This process is to some degree analogous to converting electromagnetic sensor readings from a satellite to radiance values.

Corrections of Gravity Measurements

To arrive at geologically meaningful anomaly values, a series of "corrections" is made to raw observations of differences between gravity measured at a station and a base station. The use of this term is misleading because most of these "corrections" are really adjustments that compensate (at least approximately) for known variations in the gravity field that do not have geologic meaning.

Drift Correction

Gravimeters are simple and relatively stable instruments, but they do drift (i.e., the reading varies slightly with time). Because of the sensitivity of these instruments, they are affected by temperature variations, fatigue in the internal springs, and minor readjustments in their internal workings, and these factors are the primary cause of instrument drift. In addition, earth tides cause periodic variations in gravity that may be as large as ∼0.3 mGal during about 12 hours. In field operations, these factors cause small changes in gravity readings with time. The gravity effects of earth tides can be calculated (4), or they can be considered part of the drift. One deals with drift by making repeated gravity readings at designated stations, at time intervals that shorten as the desired precision of the measurement increases.
It is assumed that the drift is linear between repeated occupations of the designated stations, and over a period of a few hours this is usually a valid assumption. The repeated values are used to construct a drift curve, which is used to estimate the drift for readings that were made at times between those of the repeated readings. Because one encounters so many different situations in real field operations, it is hard to generalize about how one proceeds. However, the key concern is that no period of time should occur that is not spanned by a repeated observation; that is, the drift curve must be continuous. If the meter is jarred, a tare (an instantaneous shift in reading) may occur. If one suspects that this has happened, one simply returns to the last place a reading was made. If there is a significant difference, a tare has occurred, and a simple constant shift (the difference in readings) is applied to all subsequent readings.
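The drift bookkeeping just described can be sketched in a few lines. Every number below (times, readings, calibration constant, base value) is invented for illustration, and a real reduction would also remove the earth-tide signal:

```python
def linear_drift(t, t0, r0, t1, r1):
    """Drift (in counter units) at time t, assuming the meter drifts
    linearly between base readings (t0, r0) and (t1, r1)."""
    return (r1 - r0) * (t - t0) / (t1 - t0)

def observed_gravity(t, reading, base_value, k, t0, r0, t1, r1):
    """Gobs at a station: remove the drift at time t, difference against
    the first base reading, convert to mGal with the calibration
    constant k, and add the established base-station value."""
    delta_units = (reading - linear_drift(t, t0, r0, t1, r1)) - r0
    return base_value + k * delta_units

T0, R0 = 9.0, 2534.2   # first base occupation (hours, counter units)
T1, R1 = 12.0, 2534.8  # repeat occupation; +0.6 units of drift over 3 h
K = 1.0210             # hypothetical calibration constant, mGal per unit
G_BASE = 979342.10     # hypothetical established base value, mGal

g_obs = observed_gravity(10.5, 2541.6, G_BASE, K, T0, R0, T1, R1)
```

At 10.5 h the interpolated drift is 0.3 units, so the station difference is 7.1 units, which the calibration constant converts to mGal before adding the base value.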
Tidal Correction

As discussed earlier, variations in the gravity field due to the earth's tides can be calculated if one makes assumptions about the rigidity of the lithosphere (4). In fact, the rigidity of the lithosphere can be estimated by studying the earth's tides. If the effects of tides are calculated separately, the correction for this effect is called the tidal correction.

Latitude Correction

The international gravity formula predicts that gravity increases by about 5300 mGal from the equator to the poles. The rate of this increase varies as a function of latitude (φ), but it is about 0.8 mGal/km at midlatitudes. The preferable approach to correcting for this effect is to tie repeat stations to the IGSN 71 gravity net (1,5). The IGSN 71 base stations are available in digital form (files whose prefix is dmanet93) in Dater et al. (6). Then, one can use the international gravity formula to calculate the expected value of gravity, which will vary with latitude. Thus, the first-level calculation of the gravity anomaly at the station (Ganomaly = Gobs − Gt) will have the adjustment for latitude built into the computation. For a local survey of the gravity field, one can derive the formula for the N–S gradient of the gravity field (1.3049 sin 2φ mGal/mile or 0.8108 sin 2φ mGal/km) by differentiating the international gravity formula with respect to latitude. Then, a base station is chosen, and all gravity readings are corrected for latitude by multiplying the distance by which a gravity station is north or south of this base station by this gradient. Stations located closer to the pole than the base station have higher readings just because of their geographic position; thus, the correction is negative. The correction is positive for stations nearer the equator than the base station.

Free Air Correction

In a typical gravity survey, the elevation of the various stations varies considerably (Fig.
2) and produces significant variations in observed gravity because Newton's law of gravitation predicts that gravity varies with distance from the center of the earth (7). The vertical gradient of gravity is about −0.3086 mGal/m. The amplitudes of the gravity anomalies that we seek to detect (image) are often less than 1 mGal, so the magnitude of this gradient requires high-precision vertical control of the locations of our gravity stations. This requirement was once the major barrier to conducting gravity surveys because traditional surveying methods to establish locations are costly and time-consuming, and the number of established benchmarks and other accurately surveyed locations in an area is usually small. However, the emergence of the Global Positioning System (GPS) has revolutionized gravity studies from the viewpoint of data acquisition. Thanks to GPS, a land gravity station can be located almost anywhere. However, the care that must be exercised to obtain GPS locations routinely with submeter vertical accuracy should not be underestimated. One aspect of the variation of gravity with elevation is called the Free Air effect. This effect is due only
to the change of elevation, as if the stations were suspended in free air, not sitting on land. The vertical gradient of gravity is derived by differentiating with respect to Re. Higher order terms are usually dropped, yielding gradients that are not a function of latitude or elevation (0.3086 mGal/m or 0.09406 mGal/ft). However, this approach does pose some problems (8). Once gravity values have been established and their locations are accurately determined, the Free Air correction can be calculated by choosing an elevation datum and simply applying the following equation:

Free Air Correction (FAC) = 0.3086h,    (4)

where h = (elevation − datum elevation) in meters. Then, the Free Air anomaly (FAA) is defined as

FAA = Gobs − Gt + FAC,    (5)
where Gobs is the observed gravity corrected for drift and tides.

Bouguer Correction

The mass of material between the gravity station and the datum also causes a variation of gravity with elevation (Fig. 2). This mass effect makes gravity at higher stations higher than that at stations at lower elevations and thus partly offsets the Free Air effect. To calculate the effect of this mass, a model of the topography must be constructed, and its density must be estimated. The traditional approach is crude but has proven to be effective. In this approach, each station is assumed to sit on a slab of material that extends to infinity laterally and to the elevation datum vertically (Fig. 2). The formula for the gravitational attraction of this infinite slab is derived by employing a volume integral to calculate its mass. The resulting correction is named for the French geodesist Pierre Bouguer:

Bouguer Correction (BC) = 2πγρh,    (6)
where γ is the gravitational constant (γ = 6.67 × 10−11 m3 kg−1 s−2), ρ is the density, and h = (elevation − datum elevation). As discussed later, the need to estimate density for the calculation of the Bouguer correction is a significant source of uncertainty in gravity studies. Then, the Bouguer anomaly (BA) is defined as

BA = Gobs − Gt + FAC − BC,    (7)
where Gobs is the observed gravity corrected for drift and tides. If terrain corrections (see later) are not applied, the term simple Bouguer anomaly is used. If terrain corrections have been applied, the term complete Bouguer anomaly is used. A second-order correction to account for the curvature of the earth is often added to this calculation (9).
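As a concrete illustration, the reduction chain of Eqs. (4)–(7) can be sketched in a few lines. This is a minimal sketch, not production reduction software: the function names are hypothetical, Gobs and Gt are assumed to be in mGal, and drift, tide, and terrain corrections are assumed to have been handled elsewhere.

```python
import math

GAMMA = 6.67e-11    # gravitational constant, m^3 kg^-1 s^-2
MGAL_PER_MS2 = 1e5  # 1 m/s^2 = 1e5 mGal

def free_air_correction(h):
    """Eq. (4): FAC in mGal; h = elevation - datum elevation, in meters."""
    return 0.3086 * h

def bouguer_correction(h, density=2670.0):
    """Eq. (6): infinite-slab correction in mGal (density in kg/m^3)."""
    return 2.0 * math.pi * GAMMA * density * h * MGAL_PER_MS2

def free_air_anomaly(g_obs, g_t, h):
    """Eq. (5): FAA = Gobs - Gt + FAC (all values in mGal)."""
    return g_obs - g_t + free_air_correction(h)

def simple_bouguer_anomaly(g_obs, g_t, h, density=2670.0):
    """Eq. (7) without terrain correction: BA = Gobs - Gt + FAC - BC."""
    return free_air_anomaly(g_obs, g_t, h) - bouguer_correction(h, density)
```

For the standard density of 2670 kg m−3, the slab term works out to about 0.1119 mGal per meter of elevation, so the net elevation effect (FAC − BC) is roughly 0.1967 mGal/m.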
GRAVITATION IMAGING
Terrain Correction

Nearby topography (hills and valleys) attracts the mass in the gravimeter (valleys are considered to have negative density with respect to the surrounding rocks) and reduces the observed value of gravity. The terrain correction is the calculated effect of this topography, and it is always positive (a hill pulls up on the mass in the gravimeter, and a valley is a mass deficiency). In mountainous regions, these corrections can be as large as tens of mGal. The corrections have traditionally been made by using Hammer charts (10) to estimate the topographic relief by dividing it into compartments. There have been a number of refinements to this approach as it has been increasingly computerized (8,11,12), but the basic idea has remained unchanged. The use of digital terrain models to calculate terrain corrections has led to a nomenclature of inner zone corrections (calculated by hand using Hammer charts) and outer zone corrections (calculated using terrain models). Unfortunately, the radius from the gravity reading that constitutes the divide between the inner and outer zones has not been standardized, but this distance has typically been 1–5 km. However, the increasing availability of high-resolution digital terrain data is on the verge of revolutionizing the calculation of terrain corrections. Many new approaches are being developed, but the general goal is the same: to construct a detailed terrain model and calculate the gravitational effect of this terrain on individual gravity readings. These approaches can also be considered as having replaced the Bouguer slab approximation by a more exact calculation because the goals of the Bouguer and topographic corrections are to estimate the gravitational effect of the topography above the elevation datum to a large radius from the gravity station. This radius is commonly chosen to be 167 km.
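The compartment calculation behind Hammer charts rests on the closed-form attraction of a flat-topped annular ring centered on the station; one compartment then contributes its angular fraction of the ring. A minimal sketch follows (the function names and the example radii, height, and density are hypothetical illustrations, not values taken from the original Hammer tables):

```python
import math

GAMMA = 6.67e-11  # gravitational constant, m^3 kg^-1 s^-2

def ring_effect_mgal(r_inner, r_outer, dh, density=2670.0):
    """Effect (mGal) at the station of a full annular ring of height |dh|
    relative to the station elevation, inner radius r_inner and outer
    radius r_outer (all in meters). The result is always positive,
    consistent with the terrain correction's sign."""
    g = 2.0 * math.pi * GAMMA * density * (
        (r_outer - r_inner)
        + math.sqrt(r_inner**2 + dh**2)
        - math.sqrt(r_outer**2 + dh**2)
    )
    return g * 1e5  # m/s^2 -> mGal

def compartment_effect_mgal(r_inner, r_outer, dh, n_compartments, density=2670.0):
    """One Hammer-chart compartment's share of the full ring effect."""
    return ring_effect_mgal(r_inner, r_outer, dh, density) / n_compartments
```

Summing compartment effects over all zones of a chart, with dh read from the topographic map for each compartment, reproduces the traditional hand calculation.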
Eötvös Correction

Technological advances have made it possible to measure gravity in moving vehicles such as boats and aircraft. However, the motion of the gravimeter causes variations in the centrifugal acceleration and thus the gravitational attraction. This variation is linearly related to the velocity ν of the gravimeter. The correction for this effect, named for the Hungarian geophysicist R. Eötvös, is positive when the gravimeter is moving westward and negative when it is moving eastward. Navigational data from the survey are used to calculate ν, and the equation for the correction is as follows:

Eötvös Correction (EC) (mGal) = 7.503 ν cos λ sin α + 0.004154 ν²,    (8)

where ν is in knots, α is the heading with respect to true north, and λ is the latitude (3).

Isostatic Correction

Isostasy can be thought of as the process in the earth that causes the pressure at some depth (most studies place this depth at 30 to 100 km) to be approximately equal over most regions. If this pressure is equal, isostatic balance has been achieved, and we say that the area is compensated. Thus, we think that the excess mass represented by a mountain range is compensated for by a mass deficiency at depth. The tendency toward isostatic balance makes regional Bouguer gravity anomalies substantially negative over mountains and substantially positive over oceanic areas. These large-scale anomalies mask anomalies due to shallow (upper crustal) geologic features (13). The delineation of upper crustal features is often the goal of gravity studies. Thus, various techniques have been proposed to separate and map the effects of isostatic equilibrium. The isostatic corrections calculated by these techniques attempt to estimate the gravitational effects of the masses that compensate for topography and remove them from the Free Air or Bouguer anomaly values. A popular approach is calculation of the isostatic residual (14).

Wavelength and Fourier Analysis

Gravitational imaging usually involves digital processing based on Fourier analysis (3). Most texts on this subject deal with time series [one-dimensional, f(t)], so it is important to clarify the terminology used in the 2-D spatial domain [f(x, y)] of gravitational imaging. Thus, instead of dealing with period and frequency, we deal with wavelength λ in the spatial domain and wave number (k = 2π/λ) in the spatial frequency domain.

Regional Gravity Anomaly

Gravity anomalies whose wavelengths are long relative to the dimensions of the geologic objectives of a particular investigation are called regional anomalies. Because some shallow geologic features such as broad basins have large lateral dimensions, one has to be careful, but it is thought that regional anomalies usually reflect the effects of relatively deep features.

Local (Residual) Gravity Anomaly

Gravity anomalies whose wavelengths are similar to the dimensions of the geologic objectives of a particular investigation are called local anomalies. In processing gravity data, it is usually preferable to attempt to separate the regional and local anomalies before interpretation. A regional anomaly can be estimated by employing a variety of analytical techniques. The simple difference between the observed gravity anomalies and the interpreted regional anomaly is called the residual anomaly.

IMAGING CAPABILITIES AND LIMITATIONS

Availability of Gravity Data
An advantage of gravitational imaging is that a considerable amount of regional data is freely available from universities and governmental agencies throughout the world. However, the distribution of these data is often less organized than that of satellite imagery. As more detailed data are needed, commercial firms can provide this product in many areas of the world. Finally, relative to most geophysical techniques, the acquisition of
land gravity data is very cost-effective. The determination of the precise location of a gravity station is in fact more complicated than making the actual gravity measurement. Thus, a good approach to producing a database of gravity measurements for a project is to use a public domain data set such as the U.S. data provided by Dater et al. (6; http://www.ngdc.noaa.gov/seg/fliers/se-0703.shtml) or the Australian data set provided by the Australian Geological Survey Organization (http://www.agso.gov.au/geophysics/gravimetry/ngdpage.html). Then, if more detailed data are needed, one should contact a commercial geophysical firm via organizations such as the Society of Exploration Geophysicists and the European Association of Geoscientists and Engineers. Finally, field work to obtain new data in the area of a specific target may be required.

The Role of Density

Knowledge of the density of various rock units is essential in gravity studies for several reasons. In fact, a major limitation in the quantitative interpretation of gravity data is the need to estimate density values and to make simplifying assumptions about the distribution of density in the earth. The earth is complex, and the variations in density in the upper few kilometers of the crust are large. Thus, the use of a single average density in the Bouguer and terrain corrections is a major source of uncertainty in calculating values for these corrections. This fact is often overlooked as we worry about making very precise measurements of gravity and then calculate anomaly values whose accuracy is limited by our lack of detailed information on density. A basic step in reducing gravity measurements to interpretable anomaly values is calculating the Bouguer correction, which requires an estimate of density. At any specific gravity station, one can think of the rock mass whose density we seek as a slab extending from the station to the elevation of the lowest gravity reading in the study area (Fig. 2).
If the lowest station is above the datum (as is usually the case), each station shares a slab that extends from this lowest elevation down to the datum, so that this portion of the Bouguer correction is a constant shared by all of the stations (Fig. 2). No one density value is truly appropriate, but when using the traditional approach, it is necessary to use one value when calculating Bouguer anomaly values. When in doubt, the standard density value for upper crustal rocks is 2670 kg m−3. To make terrain corrections, a similar density estimate is needed. However, in this case, the value sought is the average density of the topography near a particular station. It is normal to use the same value as used in the Bouguer correction, but this need not necessarily be so for complex topography and geology. As mentioned in the discussion of the Bouguer correction, modern digital elevation data make it possible to construct realistic models of topography that include laterally varying density. Although preferable, this approach still requires estimating the density of the column of rock between the earth's surface and the reduction datum. From a traditional viewpoint, this
approach represents merging the Bouguer and terrain corrections and then applying them to Free Air anomaly values. One can also extend this approach to greater depths, vary the density laterally, and consider it a geologic model of the upper crust that attempts to predict Free Air anomaly values. Then, the Bouguer and terrain corrections become unnecessary because the topography simply becomes part of the geologic model that is being constructed. When one begins to construct computer models based on gravity anomalies, densities must be assigned to all of the geologic bodies that make up the model. Here, one needs to use all of the data at hand to come up with these density estimates. Geologic mapping, drill hole data, and measurements on samples from the field are examples of information one might use to estimate density.

Measurements of Density

Density can be measured (or estimated) in many ways. In general, in situ measurements are better because they produce average values for fairly large bodies of rock that are in place. Using laboratory measurements, one must always worry about the effects of porosity, temperature, saturating fluids, pressure, and small sample size as factors that might make the values measured unrepresentative of rock in place. Many tabulations of typical densities for various rock types have been compiled (15,16) and can be used as guides to estimate density. Thus, one can simply look up the density value for a particular rock type (Table 1). Samples can be collected during field work and brought back to the laboratory for measurement. The density of cores and cuttings available from wells in the region of interest can also be measured. Most wells that have been drilled while exploring for petroleum, minerals, and water are surveyed by down-hole geophysical logging techniques, and these geophysical logs are a good source of density values. Density logs are often
Table 1. Typical Densities of Common Types of Rocka

Type of rock                   Density (kg m−3)
Volcanic ash                   1800
Salt                           2000
Unconsolidated sediments       2100
Clastic sedimentary rocks      2500
Limestone                      2600
Dolomite                       2800
Granite                        2650
Rhyolite                       2500
Anorthosite                    2750
Syenite                        2750
Gabbro                         2900
Eclogite                       3400
Crystalline upper crust        2750
Lower crust                    3000
Upper mantle                   3350

a The effects of porosity, temperature, saturating fluids, and pressure cause variations in these values of at least ±100 kg m−3.
available and can be used directly to estimate the density of rock units encountered in the subsurface. However, in many areas, sonic logs (seismic velocity) are more common than density logs. In these areas, the Nafe–Drake or a similar relationship between seismic velocity and density (17) can be used to estimate density values. The borehole gravity meter is an excellent (but rare) source of density data. This approach is ideal because it infers density from down-hole measurements of gravity. Thus, these measurements are in situ averages based on a sizable volume of rock, not just a small sample. The Nettleton technique (18) involves picking a place where the geology is simple and measuring gravity across a topographic feature. Then, one calculates the Bouguer gravity anomaly profile using a series of density values. If the geology is truly simple, the gravity profile will be flat when the right density value is used in the Bouguer and terrain corrections. One can also use a group of gravity readings in an area and simply find the density value at which the correlation between topography and Bouguer anomaly values disappears.
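The last variant of the Nettleton technique lends itself to a simple computation: recalculate the Bouguer anomaly for a range of trial densities and keep the density at which the anomaly no longer correlates with topography. The following is a hedged sketch under simplifying assumptions (slab-only reduction, no terrain correction, hypothetical function name and inputs):

```python
import numpy as np

# Infinite-slab coefficient 2*pi*gamma, in mGal per (kg/m^3 * m)
TWO_PI_GAMMA_MGAL = 2.0 * np.pi * 6.67e-11 * 1e5

def nettleton_density(free_air, elevation, trial_densities):
    """Return the trial density whose Bouguer anomaly profile shows the
    smallest absolute correlation with the topography (elevation).
    free_air: Free Air anomaly values (mGal); elevation: meters."""
    best_rho, best_r = None, np.inf
    for rho in trial_densities:
        # Subtract the slab term to form a trial Bouguer anomaly profile.
        bouguer = free_air - TWO_PI_GAMMA_MGAL * rho * elevation
        r = abs(np.corrcoef(bouguer, elevation)[0, 1])
        if r < best_r:
            best_rho, best_r = rho, r
    return best_rho
```

For a synthetic profile whose Free Air anomaly is due entirely to a 2670 kg m−3 slab, trial densities bracketing that value return 2670, because only at the true density does the residual anomaly decorrelate from the topography.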
Construction and Enhancement of Gravitational Images

The techniques used to separate regional and local gravity anomalies take many forms and can all be considered filtering in a general sense (3). Many of these techniques are the same as those employed in enhancing traditional remote sensing imagery. The process usually begins with a data set consisting of Free Air or, more likely, Bouguer anomaly values, and the first step is to produce an anomaly map such as that shown in Figure 3.

Gridding

The initial step in processing gravity data is creating a regular grid from the irregularly spaced data points. This step is required even to create a simple contour map, and in general purpose software, it may not receive the careful attention it deserves because all subsequent results depend on the fidelity of this grid as a representation of the actual data. On land, gravity data tend to be very irregularly spaced and have areas of dense data and areas of sparse data. This irregularity is often due to topography; mountainous areas are generally more difficult to enter than valleys and plains. It may also be due to difficulty in gaining access to private property and sensitive areas. Measurements of marine data are dense along the tracks that the ships follow but have relatively large gaps between tracks. Airborne and satellite gravity measurements involve complex processing that is beyond the scope of this discussion. However, once these data are processed, the remainder of the analysis is similar to that of land and marine data. A number of software packages have been designed for processing gravity (and magnetic) data, and several gridding techniques are available in these packages. The minimum curvature technique (19) works well and is illustrative of the desire to respect individual data points as much as possible while realizing that gravitational images have an inherent smoothness due to the behavior of the earth's gravity field.
In this technique, the data points that surround a particular grid node are selected. A surface satisfying the criterion of minimum curvature is fitted to these data, and then the value of this surface at the node is determined. One can intuitively conclude that the proper grid interval is approximately the mean spacing between readings in an area. A good gridding routine should respect individual gravity values and not produce spurious values in areas of sparse data. Once the gridding is complete, the grid interval (usually 100s of meters) can be thought of as analogous to the pixel interval in remote sensing imagery.

Filtering
Figure 3. Bouguer gravity anomaly image of a portion of central Colorado. The colors produce an image in which the lowest anomalies are violet and the highest ones are red. As in this case, contour lines are often superimposed on the colors to provide precise anomaly values. The red dashed line is the gravity profile modeled in Fig. 6. See color insert.
The term filtering can be applied to any of the various techniques (quantitative or qualitative) that attempt to separate anomalies on the basis of their wavelength and/or trend (3) and even on the basis of their geologic origin, such as isostatic adjustment [i.e., the isostatic residual anomaly; (14)]. The term separate is a good intuitive one because the idea is to construct an image (anomaly map) and then use filtering to separate anomalies of interest to the interpreter from other interfering anomalies (see regional versus local anomalies earlier). In fact, fitting a low-order polynomial surface (third order is used often) to a grid to approximate the regional anomaly is a common practice. Subtracting the values that represent this surface from the original grid values creates a residual grid that represents the local anomalies. In gravity studies, the familiar concepts of high-pass, low-pass, and band-pass filters are applied in either the frequency or spatial domains. In Figs. 4 and 5, for example, successively longer wavelengths have been removed from the Bouguer anomaly map shown in Fig. 3. At least to some extent, these maps enhance anomalies due to features in the upper crust at the expense of anomalies due to deep-seated features. Directional filters are also used to select anomalies on the basis of their trends. In addition, a number of specialized techniques developed to enhance images of gravity data based on the physics of gravity fields are discussed later.

Figure 4. Gravity anomaly image formed by applying a 10–150 km (wavelength) band-pass filter to the values shown in Fig. 3. See color insert.

Figure 5. Gravity anomaly image formed by applying a 10–75 km (wavelength) band-pass filter to the values shown in Fig. 3. See color insert.

The various approaches to filtering can be sophisticated mathematically, but the choice of filter parameters or design of the convolution operator always involves a degree of subjectivity. It is useful to remember the basic steps in enhancing an image of gravity anomalies to emphasize features in the earth's crust: (1) First, remove a conservative regional trend from the data. The choice of a regional trend is usually not critical but may greatly help in interpretations (14). The goal is to remove long wavelength anomalies, so this step consists of applying a broad high-pass filter.
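A wavelength band-pass filter of the kind used to produce Figs. 4 and 5 reduces, in the spatial frequency domain, to masking wave numbers outside the desired band. The following is a minimal sketch using the FFT; the function name, grid spacing, and cutoff wavelengths are hypothetical, and a production version would also pad and taper the grid edges as discussed in the text.

```python
import numpy as np

def wavelength_bandpass(grid, dx_km, wl_min_km, wl_max_km):
    """Keep wavelengths between wl_min_km and wl_max_km in a regular grid
    of anomaly values with spacing dx_km. The mean is removed first to
    limit edge effects."""
    ny, nx = grid.shape
    spec = np.fft.fft2(grid - grid.mean())
    # Spatial frequencies (cycles/km) along each axis, broadcast to 2-D.
    kx, ky = np.meshgrid(np.fft.fftfreq(nx, d=dx_km),
                         np.fft.fftfreq(ny, d=dx_km))
    kr = np.hypot(kx, ky)  # radial spatial frequency
    # Pass only frequencies between 1/wl_max and 1/wl_min.
    keep = (kr >= 1.0 / wl_max_km) & (kr <= 1.0 / wl_min_km)
    return np.real(np.fft.ifft2(spec * keep))
```

On a synthetic grid containing one short-wavelength and one long-wavelength component, a 10–150 km band-pass returns the short-wavelength component and suppresses the long one, which is the behavior illustrated by Fig. 4.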
Over most continental areas, Bouguer anomaly values are large negative numbers; thus, the usual practice of padding the edges of a grid with zeros before applying a Fourier transform and filtering will create large edge effects. One way to avoid this effect is first to remove the mean of the data and to grid an area larger than the image to be displayed. However, in areas where large regional anomalies are present, it may be best to fit a low-order polynomial surface to the gridded values and then continue the processing using the residual values with respect to this surface. (2) Then, one can apply additional filters, as needed, to remove unwanted wavelengths or trends. In addition to the usual wavelength filters, potential theory (3) has been used to derive a variety of specialized filters.

Upward continuation. A process (low-pass filter) through which a map is constructed, simulating the result as if the survey had been conducted on a plane at a higher elevation. This process is based on the physical fact that the farther the observation is from the body that causes the anomaly, the broader the anomaly. It is mathematically stable because
it involves extracting long-wavelength anomalies from short-wavelength ones.

Downward continuation. A process (high-pass filter) through which a map is constructed, simulating the result as if the survey had been conducted on a plane at a lower elevation (and nearer the sources). In theory, this process enhances anomalies due to relatively shallow sources. However, care should be taken when applying this process to anything but very clean, densely sampled data sets because of the potential for amplifying noise due to mathematical instability.

Vertical derivatives. In this technique, the vertical rate of change of the gravity field is estimated (usually the first or second derivative). This is a specialized high-pass filter, but the units of the resulting image are not milligals, and the image cannot be modeled without special manipulations of the modeling software. As in downward continuation, care should be taken when applying this process to anything but very clean data sets because of the potential for amplifying noise. This process has some similarities to the nondirectional edge-enhancement techniques used to analyze remote sensing images.

Strike filtering. This technique is directly analogous to the directional filters used to analyze remote sensing images. In gravity processing, the goal is to remove the effects of some linear trend along a particular azimuth. For example, in much of the central United States, the ancient processes that formed the earth's crust created a northeast-trending structural fabric that is reflected in gravity maps of the area and can obscure other anomalies. Thus, one might want to apply a strike-reject filter that deletes linear anomalies whose trends (azimuths) range from N30°E to N60°E.

Horizontal gradients. In this technique, faults and other abrupt geologic discontinuities (edges) are detected from the high horizontal gradients that they produce.
Simple difference equations are usually employed to calculate the gradients along the rows and columns of the grid. A linear maximum in the gradient is interpreted as a discontinuity such as a fault. These features are easy to extract graphically for use as an overlay on the original gravity image or on products such as Landsat images.

Computer Modeling

Image processing and qualitative interpretation in most applications of gravitational imaging are followed by quantitative interpretation, in which a profile (or grid) of anomaly values is modeled by constructing an earth model whose calculated gravitational effect closely approximates the observed profile (or grid). Modeling profiles of gravity anomalies has become commonplace and should be considered a routine part of any subsurface investigation. For example, a model for a profile across Fig. 3 is shown in Fig. 6.

Figure 6. Computer model for a profile of gravity values that is shown in Fig. 3. Observed and calculated Bouguer anomaly values (mGal) are plotted above a depth cross section (km) of the South Park Basin, modeled as basin fill (2300 kg m−3) over crystalline basement (2750 kg m−3) and a deep crustal feature (3050 kg m−3).

In its simplest form, the process of constructing an earth model is one of trial-and-error iteration in which one's knowledge of the local geology, data from drill holes, and other data such as seismic surveys are valuable constraints. As the modeling proceeds, one must make choices about the density and geometry of the rock bodies that make up the model. In the absence of any constraints (which is rare), the process is subject to considerable ambiguity because many subsurface structural configurations can fit the observed data. Using some constraints, one can usually feel that the process has yielded a very useful interpretation of the subsurface. However, ambiguities will always remain, just as they do in all other geophysical techniques aimed at studying subsurface structure. Countless published articles document a wide variety of mathematical approaches to computer modeling of gravity anomalies (3). However, a very flexible and easy approach is used almost universally for the two-dimensional case (i.e., modeling profiles drawn perpendicular to the structural grain in the area of interest). This technique is based on the work of Hubbert (20), Talwani et al. (21), and Cady (22), although many groups have written their own versions of this software that have increasingly effective graphical interfaces and output. The original computer program was published by Talwani et al. (21), and Cady (22) was among the first to introduce an approximation (called 2 1/2-D) that allows a certain degree of three dimensionality. In the original formulation by Hubbert (20), the earth model was composed of bodies of polygonal cross section that extended to infinity in and out of the plane of the profile of gravity readings. In the 2 1/2-D formulation, the bodies can be assigned finite strike
lengths in both directions. Today, anyone can have a 2 1/2-D model running on his or her PC. The use of three-dimensional approaches is not as common as it should be because of the complexity of constructing and manipulating the earth model. However, there are many 3-D approaches available (3). As discussed earlier, full 3-D calculation of the gravitational attraction of the topography using a modern digital terrain model is the ultimate way to calculate Bouguer and terrain corrections and to construct earth models. This type of approach will be employed more often in the future as terrain data and the computer software needed become more readily available. Gravity modeling is an ideal field in which to apply formal inverse techniques. This is a fairly complex subject mathematically. However, the idea is to let the computer automatically make the changes in a starting earth model that the interpreter constructs. Thus, the interpreter is saved from tedious ‘‘tweaking’’ of the model to make the observed and calculated values match. In addition, the thinking is that the computer will be unbiased compared to a human. The process can also give some formal estimates of the uncertainties in the interpretation. Inverse modeling packages are readily available and can also run on PCs. A free source of programs for modeling gravity anomalies by a variety of PC-based techniques is at http://crustal.usgs.gov/crustal/geophysics/index.html.
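The essence of trial-and-error forward modeling can be conveyed with an even simpler body than a Talwani polygon: an infinite horizontal cylinder, for which a closed-form anomaly expression exists. This is a hedged sketch, not a reimplementation of the programs cited above; the function names and the example radius, depth, and density contrast are hypothetical, and real packages assemble many polygonal bodies and sum their effects.

```python
import numpy as np

GAMMA = 6.67e-11  # gravitational constant, m^3 kg^-1 s^-2

def cylinder_anomaly_mgal(x, depth, radius, delta_rho):
    """Gravity anomaly (mGal) along a profile x (m) over an infinite
    horizontal cylinder at the given depth (m), radius (m), and density
    contrast delta_rho (kg/m^3): g = 2*gamma*(pi*R^2*drho)*z/(x^2+z^2)."""
    line_mass = np.pi * radius**2 * delta_rho  # excess mass per unit length
    return 2.0 * GAMMA * line_mass * depth / (x**2 + depth**2) * 1e5

def rms_misfit(observed, calculated):
    """Trial-and-error step: RMS difference between observed and model."""
    return float(np.sqrt(np.mean((observed - calculated) ** 2)))
```

In a trial-and-error loop, the interpreter adjusts depth, radius, and density contrast until rms_misfit between the calculated and observed profiles is acceptably small; formal inversion automates exactly this adjustment.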
APPLICATIONS

As discussed earlier, gravity data are widely available and relatively straightforward to gather, process, and interpret. A particularly nice aspect of the gravity technique is that the instrumentation and interpretative approaches employed are mostly independent of the scale of the investigation. Thus, gravitational imaging can be employed in a wide variety of applications. Very small changes in gravity are even being studied as indicators of the movement of fluid in the subsurface. In addition, images of gravity anomalies are ideal candidates for layers in a Geographic Information System (GIS), and typical image processing software provides a number of techniques to merge gravity anomaly data with data sets such as Landsat images. For example, Fig. 7 was constructed by merging Landsat and Bouguer gravity anomaly images for a portion of the Sangre de Cristo Mountains region in northern New Mexico. The merged image shows that low gravity anomaly values are found under a portion of the range in which dense rocks are exposed at the surface in a structural high, a situation that should produce a gravity high. This contradiction poses a major geologic and geophysical challenge for efforts to understand the tectonic evolution of this portion of the Rocky Mountains. The regional geophysics section of the Geological Survey of Canada and the western and central regions of the U.S. Geological Survey maintain the following web sites, which include case histories demonstrating applications of gravitational imaging, data sets, and free software:

http://gdcinfo.agg.emr.ca/toc.html?/app/bathgrav/introduction.html
http://wrgis.wr.usgs.gov/docs/gump/gump.html
http://crustal.usgs.gov/crustal/geophysics/index.html
Figure 7. Merged Landsat and Bouguer anomaly images of a portion of the Sangre de Cristo Mountains in northern New Mexico. The most negative values are less than −255 mGal and are colored purple. See color insert.
One example of the application of gravitational imaging is the use of gravity anomalies to delineate the geometry and lateral extent of basins that contain groundwater. The sedimentary rocks that fill a basin have low densities and thus produce a negative gravity anomaly that is most intense where the basin is deepest (Fig. 6). Gravity modeling is used to provide quantitative estimates of the depth of the basin and thus the extent of the potential water resource. In the search for hydrocarbons, gravity data are used in conjunction with other geophysical data to map out the extent and geometry of subsurface structures that form traps for migrating fluids. For example, salt is a low-density substance whose tendency to flow often creates traps. Regions where salt is thick thus represent mass deficiencies and are associated with negative gravity anomalies. Gravity data have been used to detect and delineate salt bodies from the very early days of geophysical exploration to the present. A fault is another example of a structure that often acts as a trap for hydrocarbons, and these structures can be located by the gravity gradients that they produce. Gravity data can often delineate ore bodies in the exploration for mineral resources because these bodies often have densities different from the surrounding rock. Images of gravity anomalies also reveal structures and trends that may control the location of ore bodies, even when the bodies themselves produce little or no gravity anomaly. Studies of geologic hazards often rely on images of gravity anomalies to detect faults and evaluate their sizes. Gravity anomalies can also be used to help study the internal plumbing of volcanoes. High precision surveys can even detect the movement of magma in some cases.
ABBREVIATIONS AND ACRONYMS

BA   Bouguer anomaly
BC   Bouguer correction
EC   Eötvös correction
FAA  Free Air anomaly
FAC  Free Air correction
GIS  Geographic Information System
GPS  Global Positioning System
PC   Personal computer
6. D. Dater, D. Metzger, and A. Hittelman, compilers, Land and Marine Gravity CD-ROMs: Gravity 1999 Edition on 2 CDROMs: U.S. Department of Commerce, National Oceanic and Atmospheric Administration, National Geophysical Data Center, Boulder, CO. Web site http://www.ngdc.noaagov/seg/fliers/se-0703.shtml 1999. 7. W. A. Heiskanen and H. Moritz, Physical Geodesy, W. H. Freeman, New York 1967. 8. T. R. LaFehr, Geophysics 56, 1170–1178 (1991). 9. T. R. LaFehr, Geophysics 56, 1179–1184 (1991). 10. S. Hammer, Geophysics 4, 184–194 (1939). 11. L. J. Barrows and J. D. Fett, Geophysics 56, 1061–1063 (1991). 12. D. Plouff, Preliminary documentation for a Fortran program to compute gravity terrain corrections based on topography digitized on a geographic grid: U.S. Geological Survey OpenFile Report 77-535, Menlo Park, CA 1977. 13. T. D. Bechtel, D. W. Forsyth, and C. J. Swain, Geophys. J. R. Astron. Soc. 90, 445–465 (1987). 14. R. W. Simpson, R. C. Jachens, R. J. Blakely, and R. W. Saltus, J. Geophys. Res. 91, 8348–8372 (1986). 15. W. M. Telford, L. P. Geldart, and R. E. Sheriff, Applied Geophysics, 2nd ed., Cambridge University Press, Cambridge, 1990, pp. 6–61. 16. R. S. Carmichael, ed., CRC Handbook of Physical Properties of Rocks, vol. III. CRC Press, Boca Raton, FL, 1984. 17. P. J. Barton, Geophys. J. R. Astron. Soc. 87: 195–208 (1986). 18. L. L. Nettleton, Geophysics 4, 176–183 (1939). 19. I. C. Briggs, Geophysics 39, 39–48 (1974). 20. M. K. Hubbert, Geophysics 13, 215–225 (1948). 21. M. Talwani, J. L. Worzel, and M. Landisman, J. Geophys. Res. 64, 49–59 (1959). 22. W. J. Cady, Geophysics 45, 1507–1512 (1980). 23. J. Milsom, Field Geophysics: Geological Society of London Handbook, Halsted Press, NY, 1989. 24. E. S. Robinson and C. Coruh, Basic Exploration Geophysics, John Wiley, NY, 1988. 25. A. E. Mussett and M. A. Khan, Looking into the Earth, Cambridge University Press, Cambridge, 2000.
GRAVURE MULTI-COPY PRINTING

BARRY LEE
Rochester Institute of Technology, School of Printing Management and Sciences
Rochester, NY
INTRODUCTION

Gravure, also known as rotogravure, is an intaglio (from the Italian word "intagliare," meaning to engrave) printing process. The intaglio printing processes are characterized by printing plates (image carriers) whose images have been etched or engraved into a hard surface. To print from an intaglio image carrier, the recessed images must be flooded, or filled, with ink, and the surface of the image carrier must be cleared of ink, usually by a metal wiping blade known as a "doctor blade." Then, the paper or other substrate is pressed against the intaglio image carrier,
and the resulting contact between substrate and ink-filled image areas causes the ink to transfer from the image carrier to the substrate. Intaglio images are engraved or etched into a hard, flat surface and then filled with ink. The ink used for intaglio printing often has a high viscosity, about the consistency of paste. The nonimage surface of the intaglio image carrier is then cleared of all excess ink by the wiping action of a thick metal doctor blade. Gravure printing has evolved from the early intaglio printing processes and has been adapted to a rotary printing process capable of higher resolution, higher speeds, and greater production capacity than the traditional intaglio processes. The primary differences between gravure and printing from a conventional intaglio plate involve the type of ink used for each process and the design of the image carrier. The gravure printing process uses a much more fluid, low-viscosity ink, which has been adapted to dry quickly on a variety of substrates. To contain and control this fluid ink, all images on a gravure image carrier are etched or engraved as a series of tiny cells (Fig. 1). Typically, the widths of the individual cells that make up a gravure image are between 30 and 300 microns, depending on the type of image being reproduced. A single gravure image cylinder may carry type matter, halftone graphics, and solid block images, each composed of hundreds of thousands of cells. In the gravure press, these cells are filled with ink, and the nonimage surface areas of the gravure cylinder are cleared of ink by a thin metal doctor blade. The low viscosity of the gravure ink also allows easy cell filling in a rotary process.

HISTORICAL DEVELOPMENT OF THE GRAVURE PROCESS

The history of intaglio printing begins in Germany in the year 1446 A.D., when the first engraved plates were used for printing playing cards and other illustrations.
Figure 1. The five essential elements are the impression roller, web and controls, gravure cylinder, doctor blade, and ink fountain (courtesy of GAA).

These intaglio images were hand engraved into the surface of wood and, later, copper. In the mid-fifteenth century, as
Johannes Gutenberg was developing the relief process, letterpress, intaglio was being developed as a method of applying illustrations other than type, or textual matter. The invention of chemical etching in 1505 made intaglio plate imaging much easier and also improved the quality of the engraved images. Copper plates were covered with an acid-resistant coating, and the intaglio-imaging artist would simply scrape the intended images into the resist coating, exposing the copper in the image areas. Eventually, for many applications, the high-quality images printed by intaglio were combined with the text pages of books printed by the letterpress process. The most famous example of this technique is the French Encyclopédie, published from 1751 to 1755 by Denis Diderot. The Encyclopédie included several volumes of text and several volumes of intaglio illustrations (1). In 1783, Thomas Bell, a British textile printer, was granted the first patent for a rotary intaglio press. This press, the first of its kind, marked the beginning of automated intaglio printing and the evolution toward the rotogravure process. Before the press patented by Bell, intaglio printing was a much more manual process. By contrast, the first rotary intaglio press allowed a continuous, nonstop sequence of all of the imaging functions necessary for intaglio printing: (1) the image areas are filled with ink; (2) a doctor blade cleans the nonimage areas; (3) the image is transferred to the substrate. Interestingly, and as evidence of the simplicity of the gravure process, this early press contained all of the components found on gravure presses built today. Once the press had become automated, it remained for cylinder imaging techniques to improve. After all, scraping images by hand into an etch-resistant coating layer and then etching copper with acid does not lend itself to fast-turnaround, high-quality imaging.
"In 1826 Joseph Nicéphore Niépce produced the first photo-mechanical etched printing plate. He covered a zinc plate with a light sensitive bitumen and exposed through a copper engraving, which was made translucent by the application of oil" (1). The unexposed coating on the zinc was developed, and then the plate image areas were etched. The invention of photography in the late 1820s led to further improvements in the photomechanical image transfer process. The photomechanical intaglio imaging technique was revised and improved by William Henry Fox Talbot and the English engraver J. W. Swan. Fox Talbot was responsible for two important discoveries that led directly to improved gravure imaging techniques: the halftone screening process and the light sensitivity of chrome colloids. The halftone screening process allowed printers to reproduce continuous tone images by converting those images to halftone dot patterns. In the 1860s, Swan, using the light sensitivity of chrome colloids, became the first to use carbon tissue as a resist for gravure etching. Carbon tissue ". . . was a gelatin resist coating on a light-sensitive material applied to the surface of the paper. After exposure the paper could be removed and the exposed coating applied to another surface, such as a metal plate — or plate cylinder" (2). After application to the gravure cylinder, the exposed coating became the "stencil" for etching by iron perchloride solutions.
In 1860, French publisher Auguste Godchaux patented the first reel-fed gravure perfector press for printing on a paper web. Although the cylinders for this press were hand engraved, the patent and the 1871 English trade paper, The Lithographer, indicated another departure from traditional intaglio toward a gravure image: "The engraving of these copper cylinders. . . is not lined by the engraver. . . nor etched with acid. . . but is accomplished by a series of minute holes hardly discernable by the naked eye, but forming together the outline of the letters. . . to be printed" (1). The Godchaux press design, in combination with the cellular ("minute holes") structure of the images on the gravure cylinder, represents the first application of what is known today as gravure printing. It is difficult to determine who can be credited with developing that most recognizable characteristic of the gravure image carrier, the gravure crosshatch screen. "Using dichromate gelatin as the sensitive coating on a copper or steel plate, Fox Talbot placed objects such as leaves and pieces of lace on the coating and exposed it to daylight" (3). No doubt the lace pattern would have broken the images into small "cells." Another early application of the gravure screen pattern may have been developed in France. An 1857 French patent by M. Bechtold describes an opaque glass plate inscribed with fine lines. This plate was exposed against the metal plate and then turned 90° and exposed again. Finally, the cylinder (or plate) was exposed through a diapositive. This technique produces a crosshatch pattern on gravure plates, which provides noncell, or land, areas on gravure images. Karl Klietsch, also known as Karel Klic, was among the first to realize and take advantage of gravure technology for production. Klic was a pioneer of the gravure printing process and often served as a high-priced consultant to those interested in his printing techniques.
In March of 1886, Klic left his home in Vienna for England where he met Samuel Fawcett, a gravure cylinder engraver in the textile decorating business. Fawcett had a number of years of experience in engraving gravure cylinders; he had also spent years refining photographic imaging and engraving technology. Klic and Fawcett combined all of the components of the Thomas Bell textile press (which by this time had no doubt been refined) with the latest cylinder imaging techniques that had been patented by Fawcett. Together, they founded the Rembrandt Intaglio Printing Company in 1895. The Rembrandt Intaglio Printing Company was the most respected and progressive gravure printer of the era. The printing and cylinder imaging techniques were closely guarded secrets. In 1895, beautiful Rembrandt prints were sold in London art shops where they created quite a stir. Londoners were curious to know more about the process capable of producing high quality prints that sold at such low prices. Klic and the Rembrandt Intaglio Printing Company continued their policy of secrecy. Although the Rembrandt Company was printing from cylinders, representatives of the company always referred to the prints as coming from plates — yet another attempt to keep the true nature of their gravure printing techniques secret.
Inevitably, the closely guarded secrets of the Rembrandt Company became known to the outside printing world. In the early 1900s, an employee of the Rembrandt Company moved to the United States and brought the secrets of the Rembrandt Intaglio Printing Company with him. The techniques used for high-quality gravure printing may have first come to the United States with the Rembrandt employee, but the first gravure equipment was manufactured in England and installed at the Van Dyke Company in New York in 1903. Soon thereafter, the Englishmen Hermann Horn and Harry Lythgoe installed another gravure press in Philadelphia. Horn and Lythgoe had been experimenting with gravure printing with fellow Englishman Theodore Reich. Horn sent samples of their gravure prints to art dealers in America. One dealer in particular was so impressed with the prints that he invited Horn and Lythgoe to come to the United States with their gravure printing equipment. In 1904, Hermann Horn and Harry Lythgoe moved to Philadelphia and brought with them from England a small two-color gravure press. In 1913, the New York Times became the first newspaper to use gravure for printing Sunday's The New York Times Magazine. Today, Sunday supplements like Parade and USA Weekend remain a stronghold of gravure printing. In the 1920s, the gravure printing method was first used to apply graphics to packaging. As the gravure printing process continued to grow in Europe and the United States, chemical cylinder imaging techniques improved. Tone variation was accomplished on the original gravure cylinders by a diffusion transfer method known as conventional gravure (Fig. 2). This method reproduced various tones by varying only the depths of etch of the cells. Later, the two-positive method of cylinder etching made it possible to vary both the depths of etch and the sizes of the cells.
This technique provides an extended tone range because two-positive cylinders deposit a dot of variable size and variable ink film thickness. The only chemical etching methods still in use are called direct transfer systems. Direct transfer chemical etching is used for only a small percentage of the gravure cylinders produced today. The direct transfer method produces variable-width cells. To a great extent, electromechanical cylinder imaging technology has replaced chemical imaging methods. Early in the 1950s, the German company Hell began developing an electromechanical method of engraving copper cylinders with a diamond stylus. The Hell machine, first called the Scan-a-Graver and later the Helio-Klischograph, was the forerunner of modern graphic arts scanners. On an electromechanical engraver, the gravure cylinder is mounted in a precision lathe equipped with an engraving head that carries the engraving stylus. On early model engravers, the lathe also carried a "copy drum" on which the continuous tone black-and-white copy to be reproduced was mounted. The Helio-Klischograph converts light energy reflected from the black-and-white copy to voltage. The resulting voltage is amplified and passed to an electromagnet that, in turn, vibrates a diamond-tipped stylus. The force of the diamond stylus
is controlled electronically to cut cells of various sizes and depths, depending on the copy requirements. Modern engravers accept all forms of digital input. Presently, the majority of gravure cylinders are imaged by electromechanical diamond engraving technology; however, several recent developments may gain acceptance in the future, including laser engraving (used on metals other than copper), electron-beam engraving in copper, and the use of polymer materials to replace the copper image cylinder.

Figure 2. Conventional gravure cylinder making using carbon tissue. The image is transferred directly to the cylinder surface and etched into it (courtesy of GAA).

Gravure Printing Components

A gravure printing press includes the following components (Fig. 1):

• A gravure cylinder
• An ink fountain and, on some presses, an ink applicator
• A doctor blade, a doctor blade holder, and a doctor blade oscillating system
• An impression roll
• A dryer
• Web handling equipment and tension systems

Gravure Cylinders
Sheet-fed gravure presses print from engraved copper plates, and web-fed gravure printing is done from a cylinder. In most gravure applications, the gravure printing unit can handle cylinders of varying circumference. This feature allows an infinitely variable repeat length in printing. There are two types of gravure cylinders: sleeve (or mandrel) cylinders and integral shaft cylinders. Sleeve cylinders are stored as shaftless hollow tubes and must be mounted on shafts before going to press, engraving, or copper plating. Although cheaper, lighter, and easier to handle than their counterpart, the integral shaft cylinder, sleeve cylinders are not as accurate. Consequently, sleeve cylinders are used predominantly in packaging gravure and are more common on presses of 40 in. or less in web width. Some gravure presses are designed to print from lightweight copper-plated phenolic resin sleeves. Integral cylinders are used on larger presses and when high-quality printing is important. The integral cylinder is heavier and more expensive than the sleeve cylinder; however, it is much more accurate. All publication gravure presses use integral cylinders. Both types of cylinders are reusable and continue through the cylinder life cycle: electroplating and surface finishing, imaging, chrome plating, proofing and printing, and stripping of old images. Copper is by far the most common material used for gravure cylinder imaging and gravure printing. In the manufacture of gravure cylinders, after the steel cylinder body is formed, a thin (0.0002–0.0004 in.) base coat of copper is plated onto it. This coat of copper is permanent and remains on the cylinder. In preparation for cylinder imaging, a face coat of copper is electroplated onto the base coat. The thickness of the face coat varies from 0.004–0.050 in., depending on the requirements of the print job.
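The coat thicknesses given here, combined with the deposition rates quoted in this section for the copper and chrome plating tanks (0.003–0.005 in. per hour and 0.0003–0.0005 in. per hour, respectively), imply rough plating times. The sketch below is illustrative only; the helper function and the particular thickness/rate pairings are ours, chosen from within the quoted ranges:

```python
# Rough plating-time estimate from the deposition rates quoted in the text.
# All thicknesses and rates come from this article; the function itself is
# an illustrative helper, not an industry-standard tool.

def plating_hours(thickness_in: float, rate_in_per_hr: float) -> float:
    """Hours needed to deposit a given thickness at a given rate."""
    return thickness_in / rate_in_per_hr

# Copper face coat: 0.004-0.050 in. at 0.003-0.005 in./h
copper_min = plating_hours(0.004, 0.005)   # thinnest coat, fastest rate
copper_max = plating_hours(0.050, 0.003)   # thickest coat, slowest rate

# Chrome layer: 0.0002-0.0007 in. at 0.0003-0.0005 in./h
chrome_min = plating_hours(0.0002, 0.0005)
chrome_max = plating_hours(0.0007, 0.0003)

print(f"copper face coat: {copper_min:.1f}-{copper_max:.1f} h")   # 0.8-16.7 h
print(f"chrome layer:     {chrome_min:.1f}-{chrome_max:.1f} h")   # 0.4-2.3 h
```

The wide span for the copper face coat reflects the wide range of face-coat thicknesses quoted; a heavy 0.050 in. coat takes an order of magnitude longer to build than a thin one.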
Copper plating tanks use acid-based electrolytes that consist of copper sulfate, sulfuric acid, deionized water, and additives designed to affect the surface and hardness of the copper. The anode is copper, which dissolves into the electrolyte during the plating process; the gravure cylinder is the cathode. The rate of deposition in a copper plating tank is from 0.003–0.005 in. per hour. After plating, the copper must be polished. Gravure cylinders may require grinding and polishing, or they may require only polishing. The cylinder is ground to correct the diameter, and it is polished to prepare the surface without changing its diameter. Cylinders are polished with polishing paste, polishing paper, lapping film, or polishing stones. Once the cylinder has been plated and
polished, it is ready to be imaged by electromechanical, laser, or chemical means (see Gravure Cylinder Imaging). After imaging, the cylinder is chrome plated to increase its life and to reduce the coefficient of friction of its surface. The electrolyte for chrome plating consists of chromic acid, sulfuric acid, deionized water, and small amounts of other additives. The rate of deposition for chrome plating is from 0.0003–0.0005 in. per hour, and the typical thickness of the chrome plating is from 0.0002–0.0007 in. After chrome plating, the cylinder may be polished again and is then ready for cylinder proofing on the press. Following imaging and chrome application, a cylinder is sometimes proofed. If the cylinder proof reveals any inaccuracies, some corrections of the images can be made. If the imaged cylinder is accurate or if it is not proofed, it is ready for the press. On press, the gravure cylinder may last for 10 million or more impressions. Commonly, the chrome cylinder covering will begin to show signs of wear after three to five million impressions, depending on the abrasiveness of the ink. When signs of wear are noticed, the cylinder is removed from the press and can be rechromed and reinserted in the press. When the print run is completed, the cylinder is removed from the press and stored for future use, or it may reenter the life cycle. If the cylinder reenters the cycle, the chrome plating and the engraved copper images have to be removed before a fresh copper layer is replated. The chrome layer can be removed by reverse electrolysis and the copper image layer then removed by lathe cutting, or both layers may be lathe cut.

Ink Fountains/Ink Applicators

The gravure ink fountain includes a reservoir of ink, a pumping and circulating system, and, on some presses, temperature control devices, filtration, and/or an ink applicator. The function of the ink fountain is to provide a continuous supply of ink to the gravure cylinder.
Gravure ink is a low-viscosity combination of dissolved resin (the vehicle), solvent, and dispersed colorants (usually pigments). The fountain at each gravure unit contains a reservoir of ink that is pumped into the printing unit. The ink pumped to the printing unit is held close to the gravure cylinder, and the gravure cylinder is partially submerged in the ink. The ink fountain and the printing unit must be designed to allow complete filling of the gravure cells as the cylinder rotates in the ink bath. Some gravure presses are equipped with ink applicators. On presses using ink applicators, the ink is pumped to the applicator, which is designed to flow ink onto and across the entire width of the cylinder. Excess ink flowing from the cylinder is captured by a "belly pan" and then flows back to the ink reservoir. Gravure ink is continuously circulated and kept under constant agitation to ensure proper blending of the components.

Doctor Blade

There is some disagreement about the origin of the term "doctor blade." According to the textbook Gravure Process and Technology, "The name 'doctor blade' is derived
from wiping blades used on 'ductor' rolls on flatbed letterpress equipment, and in common usage ductor became doctor" (3). An excerpt from a nineteenth-century London publication, The History of Printing, by the Society for Promoting Christian Knowledge, suggests a different origin:

This important appendage to the machine is called the "doctor," a name which has been thus oddly accounted for in Lancashire: when one of the partners in the firm by whom cylinder printing was originally applied was making experiments on it, one of the workmen, who stood by, said, "Ah! This is very well, sir, but how will you remove the superfluous colour from the surface of the cylinder?" The master took up a common knife which was near, and placing it horizontally against the revolving cylinder, at once showed its action in removing the colour, asking the workman, "What do you say to this?" After a little pause, the man said, "Ah, sir, you've doctored it," thus giving birth to a name for the piece of apparatus.

The most common material used for doctor blades is tempered spring steel. Gravure printers who use water-based ink often use stainless steel to counteract the corrosiveness of water. Other materials such as brass, bronze, rubber, and plastic have also been used. Blade thickness ranges from 0.002–0.020 in. The doctor blade is mounted in a doctor blade holder on press. The holder oscillates the blade back and forth across the gravure cylinder. The oscillation stroke is designed to minimize blade wear and to distribute blade wear evenly during the press run. Doctor blade oscillation also helps to remove any foreign particles that may be trapped between the blade and the cylinder. The doctor blade holder (Fig. 3) allows adjustment of four aspects of blade-to-cylinder contact: (1) blade to impression roll nip distance, (2) angle of contact, (3) parallelism with the cylinder, and (4) running contact pressure.
A doctor blade is pressurized by mechanical, pneumatic, or hydraulic means. The rule of thumb when setting a doctor blade is that the pressure should be as light as possible and equal across the face of the cylinder.

Impression Rolls

The impression roll on a gravure print station is a hollow or solid steel roll covered with a seamless sleeve of synthetic rubber. The primary function of the impression roll is to provide the pressure necessary for the substrate to contact the ink in the gravure cells at the printing nip. The impression roll is pressed against the gravure cylinder by either pneumatic or hydraulic force. Because the surface of many substrates is rough or irregular, the impression roll pressure is sometimes as high as 250 pounds per square inch. The pressure on the impression roll is high enough to cause the rubber covering to compress and flatten as it travels through the printing nip. During normal press adjustments, the compressed area of the printing nip, appropriately called the flat, is measured, and the nip pressure is adjusted accordingly. For most applications, the flat should be one-half inch wide and uniform across the width of the impression roll.
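The figures above imply a substantial total load on the nip, since total force is simply nip pressure times contact area. In the sketch below, the 250 psi pressure and the half-inch flat width come from the text, while the 40 in. roll face width is an assumed example (the article mentions 40 in. only as a common packaging-press web width):

```python
# Back-of-envelope nip load: total force = nip pressure x contact ("flat") area.
# 250 psi and the 0.5 in. flat width are from the text; the 40 in. face width
# is an assumed example, not a figure given in the article.

def nip_load_lbf(pressure_psi: float, flat_width_in: float, face_width_in: float) -> float:
    """Total force carried by the printing nip, in pounds-force."""
    return pressure_psi * flat_width_in * face_width_in

load = nip_load_lbf(250.0, 0.5, 40.0)
print(f"approximate nip load: {load:.0f} lbf")  # 5000 lbf across a 40 in. face
```

A load of this magnitude is why the rubber covering visibly compresses into a measurable flat, and why pneumatic or hydraulic loading is needed rather than simple mechanical pressure.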
Figure 3. Doctor blade setup (courtesy of GAA).
Under ideal conditions, the rubber covering of the gravure impression roll is changed whenever the substrate is changed. For example, ink transfer from the cells to smooth substrates like polyethylene and polypropylene is maximized by using a softer rubber covering, say 65 durometer on the Shore A scale. By comparison, a paperboard substrate that has a hard, rough, porous surface is best printed using a harder rubber covering, perhaps 90 durometer on the Shore A scale. A printing defect unique to the gravure process, known as "cell skipping," "skip dot," or "snowflaking," occurs when the substrate is so rough that some of the smaller gravure cells fail to contact some of the "valley" areas of the substrate. This problem arises partly because the ink in a gravure cell immediately forms a meniscus after being metered by the doctor blade. This phenomenon prevents contact between the ink in the gravure cell and the rough areas of the substrate. To address this problem, the Gravure Research Institute (today known as the Gravure Association of America) worked with private industry to develop Electrostatic Assist (ESA). ESA was introduced in the 1960s and has become standard equipment on many gravure presses. The concept of ESA is to employ the principle of polarity and the attraction of opposite charges. ESA applies a charge to the impression roll and an opposite charge to the gravure cylinder. Just before the printing nip, the top of the ink's meniscus is pulled slightly above the surface of the cell. As the cell enters the printing nip, the ink contacts the substrate, and capillary action completes the ink transfer. Unlike electrostatic printing, ESA does not cause the fluid ink to "jump" across an air gap; it merely aids transfer by providing contact between ink and substrate in the printing nip.

Dryers

The inks used for gravure printing are fast drying, and multicolor gravure printing is accomplished by dry trapping.
Dry trapping means that even at the highest press speeds, one ink film must be applied and dried before the next ink film can be printed over it. To dry
trap while maintaining high press speed, the web travels through a dryer immediately after leaving the printing unit. The ink film must be dry enough so that it does not offset on any rolls with which it comes in contact. A dryer includes some type of heating element and high-velocity air supply and exhaust units. The dryer temperature setting depends on the substrate being printed, the air flow on the web, the type of ink being dried, and the speed of the press. The dryer air flow is adjusted to create a slight negative pressure by setting the exhaust air volume to exceed the supply volume slightly. By creating negative air pressure, the gravure printer eliminates the possibility that ink volatiles escape into the pressroom.

Web Handling Equipment and Tension Systems

Most gravure printing is done on web-fed presses; limited production is done on sheet-fed gravure presses. In any web printing method, the substrate to be printed must be drawn from the supply roll and fed into the printing units at a controlled rate and under precise tension. Gravure press web handling is often separated into zones: the unwind, the infeed, the printing units, the outfeed, the finishing units, and the rewind. The unwind zone of a gravure press may be a single reel stand, a fixed-position dual reel unwind, or a rotating turret dual reel stand. To prepare a roll for mounting on a single reel stand, the reel tender inserts a shaft into the roll's core or positions the roll between two conical tapered reel holders. The roll is lifted by hoist and set into the frame of the reel stand. The reel stand includes an unwind brake that is linked to the shaft. The braking mechanism is designed to act against the inertia of the roll while the press is running to provide resistance against web pull and, in turn, tension on the web. Unwind brake systems are usually closed-loop systems that work in combination with a dancer roll.
As the dancer roll moves up or down because of tension variations, the dancer’s movement controls the amount of force applied by the unwind brake.
When the roll of material on a single-position reel stand has run out, the press must be stopped while a new roll of material is positioned and spliced to the tail end of the expired roll. Fixed-position dual reel stands and rotating turret reel stands are designed to increase productivity by allowing continuous operation of the press by a "flying splice" mechanism. While a roll is running in the active position of these types of dual unwind stands, the idle position is prepared with another roll of substrate. As the active roll of substrate runs out, the roll in the second position is automatically spliced to the moving web, thus providing a continuous press run. The infeed zone of a gravure press isolates the unwind from the printing sections. The infeed section is designed to pull substrate from the reel on the unwind and to provide a consistent flow of web to the printing units. The tension from the unwind is provided by a pull against the brake tension as the web path travels through a nip between a rubber-covered draw roll and a metal driven roll. The speed of the metal driven roll can be varied to provide more or less tension to the printing units. Often the tension on the web is measured by transducers as it leaves the infeed, and the speed of the driven roll in the infeed is electronically tuned to match the printing unit tension requirements. The printing units of a gravure press influence web tension because of the significant amount of pressure in the printing nip. Consequently, the speeds of the printing cylinders, the infeed drive roll, and the driven rolls on the outfeed must be adjusted to maintain uniform web tension. It is common practice to adjust the driven rolls at the infeed to turn at a speed that is 1/2% slower than that of the printing cylinders. The speed of the driven rolls at the rewind, or finishing zone, is commonly adjusted to be 1/2–1% faster than that of the printing cylinders.
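The draw-roll rules of thumb above can be put into numbers. In the sketch below, the 1/2% and 1/2–1% offsets come from the text, while the 2,000 ft/min printing-cylinder surface speed is an assumed example press speed:

```python
# Draw-roll speed targets from the rules of thumb in the text: infeed rolls
# run about 0.5% slower than the printing cylinders; rewind/finishing rolls
# run 0.5-1% faster. The 2000 ft/min press speed is an assumed example.

press_speed_fpm = 2000.0                          # printing-cylinder surface speed

infeed_speed = press_speed_fpm * (1 - 0.005)      # 0.5% slower  -> 1990 ft/min
rewind_low = press_speed_fpm * (1 + 0.005)        # 0.5% faster  -> 2010 ft/min
rewind_high = press_speed_fpm * (1 + 0.010)       # 1.0% faster  -> 2020 ft/min

print(f"infeed: {infeed_speed:.0f} ft/min")
print(f"rewind: {rewind_low:.0f}-{rewind_high:.0f} ft/min")
```

The small, deliberate speed differences keep the web in slight tension everywhere: the infeed holds the web back against the printing cylinders, and the rewind pulls it taut as it leaves them.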
These adjustments help eliminate slack throughout the press despite the natural tendency of the web to elongate under tension. As the web exits the printing section of the press, it may be rewound, or it may go through one or more finishing units. Publication gravure presses include large folders that are designed to slit the web into multiple ribbons. The ribbons are then folded and trimmed to form the individual signatures of magazines or catalogs. A press used for folding carton printing might have a rotary or flatbed die-cutting unit and a delivery section for finished materials.

SHEET-FED GRAVURE

Sheet-fed gravure presses are in limited use for printing high-quality folding cartons, art reproductions, posters, and proofs. The substrate supply for sheet-fed gravure is not in web form but in cut and stacked sheets. Most, but not all, gravure production is done on web-fed presses. A sheet-fed gravure press includes an ink fountain, a gravure cylinder or a gravure plate, and a doctor blade mechanism. The press also includes a feeding section called a "feed table," transfer cylinders with grippers for transporting individual sheets through the press, and a delivery table designed to restack printed sheets.
OFFSET GRAVURE

The offset gravure process is used to apply graphics to products such as medicine capsules, metal cans, and candies. An offset gravure printing unit includes all of the components of a typical gravure unit, plus a rubber blanket image transfer cylinder. In offset gravure printing, images are printed from a gravure plate or cylinder onto a soft rubber blanket or transfer medium; the images are then offset from the blanket to the material to be printed. A variation of this process is called pad transfer. The transfer medium in pad transfer printing is a soft silicone "ball" that has been shaped to conform to the item being printed. A pad transfer printing unit includes a gravure plate image carrier, an ink flooding mechanism, and a doctor blade. Once the ink has filled the recesses of the image carrier and the nonimage areas have been cleaned by the doctor blade, the silicone ball is pressed against the image carrier, contacts the inked images, and picks up the images. Then, the silicone ball is pressed against the material being printed, transferring the images.
GRAVURE CYLINDER IMAGING
The methods used to etch or engrave images into copper for gravure cylinders have changed little since their inception (see The Historical Development of the Gravure Process). Today, there are two methods for placing cellular images in copper for gravure cylinders: chemical etching, and engraving by an electromechanically operated diamond stylus or, less frequently, by a laser. The original chemical imaging process, known as conventional gravure or diffusion etch, is no longer used. Conventional gravure used a diffusion transfer resist commonly referred to as carbon tissue. The stencil used for chemical etching was formed by first exposing the carbon tissue to a continuous tone film positive for tone work or vignettes, or to a 100% black positive for line copy and text. The second exposure was to a gravure screen. For tone copy or vignettes, the carbon tissue was light-hardened to an infinitely variable degree: from little or no exposure effect in the shadow areas of the film positive, where little light passes during exposure, to nearly complete hardness in the highlights, where most of the exposure light reaches the carbon tissue. After exposure to the continuous tone positive, the land areas that define the gravure cells are formed by exposure to a special gravure screen. The gravure screen is a crosshatch that contains between 100 and 200 lines per inch. When the carbon tissue is exposed through this screen, the areas of the stencil that will protect the land areas of the image and the cell walls are light-hardened. After exposure, the carbon tissue is inverted, applied to the surface of the gravure cylinder, and developed. The highlight areas of the carbon tissue are the hardest and therefore inhibit diffusion of the etchant; the shadow areas are less hard and provide less of a barrier to the etchant. Consequently, during the etching
process, the etchant reaches the copper in the shadow areas first, the midtones later in the process, and finally the highlights, resulting in cells of varying depth but constant width. All cells were large and square for all percentages of tone, from highlight to shadow; however, they varied in depth to create variable cell volume and, thus, variable printed density. The depths of etch ranged from 5–10 microns in the highlight cells to 40–45 microns in the shadow cells. A second chemical imaging method, the two-positive method, sometimes called lateral hard dot (also no longer in use), was developed in the United States. Imaging for the two-positive method was very similar to imaging for conventional gravure but had one important difference: rather than a gravure screen, a halftone positive was used for the second exposure of the carbon tissue. Consequently, the two-positive method yielded cells that varied in both depth (due to the exposure to the continuous tone positive) and width (due to the exposure to the halftone positive). Highlight cells were shallow, small, and round; shadow cells were large, deep, and square. Compared to the conventional method, the two-positive method provided increased land area between the highlight cells and deeper highlight cells; both of these factors added to the life of the gravure cylinder. The two-positive method never gained favor in Europe, probably because of the complexity of registering two film positives during exposure of the carbon tissue. It was the major method of gravure cylinder imaging in the United States before electromechanical methods were introduced in the early 1970s. The introduction of photographic polymer films in the 1950s marked the beginning of the direct transfer, sometimes called single positive, method of gravure cylinder etching. In the direct transfer process, a cylinder is completely coated with a liquid photopolymer resist.
The photopolymer dries on the cylinder, eventually to become the stencil for the etching process. After application, the light-sensitive photopolymer coating is exposed to a special gravure-screened film positive consisting of halftones where required, screened solids, and text. Exposure to ultraviolet light hardens the photopolymer (it becomes insoluble) in the exposed nonimage areas. One direct transfer system offered by the Japanese company Think Laboratories is fully automated and can include a laser exposure unit that eliminates film exposure. Following exposure, the photopolymer is developed with water or dyed solvent, and the image areas are cleared of the photopolymer resist. The next step is etching, usually in a ferric chloride bath for 3 to 5 minutes. The direct transfer method of chemical imaging produces cells of variable width and constant depth (approximately 40–45 microns). It remains the only chemical imaging process presently used for gravure cylinders; the percentage of cylinders imaged chemically is estimated at less than 5% in the United States and slightly higher worldwide.
ELECTROMECHANICAL ENGRAVING
Early in the 1950s, the German company Hell began developing an electromechanical method of engraving copper cylinders with a diamond stylus. The Hell machine, first called the Scan-a-Graver and later the Helio-Klischograph, was the forerunner of modern graphic arts scanners. On an electromechanical engraver, the gravure cylinder is mounted in a precision lathe equipped with an engraving head that includes the engraving stylus. On early model engravers, the lathe also carried a ‘‘copy drum’’ on which the continuous tone black-and-white copy to be reproduced was mounted. The Helio-Klischograph converts light energy reflected from the black-and-white copy to voltage. The resulting voltage is amplified and passed to an electromagnet that, in turn, vibrates a diamond-tipped stylus. The force of the diamond stylus is controlled electronically to cut cells of various sizes and depths, depending on the copy requirements. Modern engravers accept all forms of digital input scanned from original copy. The cells, cut by a true diamond, resemble the shape of the diamond and so vary in width and depth (Fig. 4). A highlight cell is cut by the very tip of the diamond and is relatively shallow and narrow. A shadow cell is cut by nearly the entire diamond and is relatively deep and wide. Cells made by an electromechanical engraver vary in width between 20 and 220 microns. The depth of a cell is always a function of the shape of the diamond that cut it. By varying the speed of cylinder rotation during engraving and the amplitude of the pulsating diamond stylus, the engraver can cut cells of a compressed, normal (diamond-shaped), or elongated shape. By varying the cell shape, the engraver also varies the angle at which the cells are aligned on the cylinder.
The approximate cell angles of 30° for a compressed cell, 45° for a normal cell, and 60° for an elongated cell help the engraver and printer avoid moiré patterns in multicolor printing. Because the cells must be consistently precise in size and shape, control of the engraving stylus is critical. This requirement has limited the speed of the engraver to approximately 5,000 cells per second. The cell-cutting speed of the engraver varies with the type of image elements and the screen ruling being cut; average production speeds for an electromechanical engraver range from 3,200–4,500 cells per second. Presently, new engraving heads are being developed to double the rate of cell generation. To speed the imaging process when imaging a cylinder for publication gravure printing, an engraver is equipped with several engraving heads. In some applications, as many as 12 individual engraving heads, driven simultaneously by separate digital information, are used to image the various pages on one gravure cylinder. This technique can significantly reduce the time required to image a gravure cylinder; however, it can be used only when the image elements to be printed are separated by nonimage boundaries, as in magazine printing, where each page is void of images in its margins. When gravure cylinders are imaged for applications
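For a rough sense of what these engraving rates imply, the following back-of-envelope sketch estimates the total cell count on a cylinder and the time needed to engrave it. The cylinder dimensions and screen ruling are illustrative assumptions, not figures from this article; the 4,000 cells-per-second rate falls within the production range quoted above.

```python
# Back-of-envelope estimate of electromechanical engraving time.
# Assumed values: a publication-size cylinder of ~76 in. circumference
# and ~100 in. face width, engraved at an assumed 150-line screen.

circumference_in = 76        # assumed cylinder circumference (inches)
face_width_in = 100          # assumed cylinder face width (inches)
screen_lpi = 150             # assumed screen ruling (lines per inch)
cells_per_second = 4000      # mid-range production engraving speed

# One cell per screen intersection: lpi^2 cells per square inch.
cells_total = circumference_in * face_width_in * screen_lpi ** 2
hours_one_head = cells_total / cells_per_second / 3600
hours_twelve_heads = hours_one_head / 12  # heads engraving in parallel

print(f"{cells_total:,} cells; {hours_one_head:.1f} h with one head, "
      f"{hours_twelve_heads:.1f} h with twelve heads")
```

Under these assumptions a single head would need roughly half a day, which illustrates why multiple simultaneous engraving heads matter for publication work.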
GRAVURE MULTI-COPY PRINTING
When the roll of material on a single-position reel stand has run out, the press must be stopped while a new roll of material is positioned and spliced to the tail end of the expired roll. Fixed-position dual reel stands and rotating turret reel stands are designed to increase productivity by allowing continuous operation of the press through a ‘‘flying splice’’ mechanism. While a roll is running in the active position of these dual unwind stands, the idle position is prepared with another roll of substrate. As the active roll of substrate runs out, the roll in the second position is automatically spliced to the moving web, providing a continuous press run. The infeed zone of a gravure press isolates the unwind from the printing sections. The infeed section is designed to pull substrate from the reel on the unwind and to provide a consistent flow of web to the printing units. Tension from the unwind is provided by a pull against the brake tension as the web travels through a nip between a rubber-covered draw roll and a driven metal roll. The speed of the driven metal roll can be varied to provide more or less tension to the printing units. Often, the tension on the web is measured by transducers as it leaves the infeed, and the speed of the driven roll in the infeed is electronically tuned to match the printing unit tension requirements. The printing units of a gravure press influence web tension because of the significant amount of pressure in the printing nip. Consequently, the speeds of the printing cylinders, the infeed drive roll, and the driven rolls on the outfeed must be adjusted to maintain uniform web tension. It is common practice to adjust the driven rolls at the infeed to turn at a speed that is 0.5% slower than that of the printing cylinders. The speed of the driven rolls at the rewind, or finishing zone, is commonly adjusted to be 0.5–1% faster than that of the printing cylinders.
European companies have worked with partners in the United States to develop laser-engraving systems for cutting cells into copper. Theoretically, a laser could generate up to 100,000 cells per second, a great improvement over electromechanical methods. In practice, however, the reflectivity of the copper cylinder surface significantly reduced the efficiency of the laser engraver: the laser light energy intended to ablate copper and engrave a cell is instead reflected from the cylinder. Attempts to replace copper with plastics or hardened epoxy for laser imaging have proven unsuccessful. To avoid the reflectivity problems associated with laser engraving of copper, the Max Daetwyler Corporation (MDC) developed a metal alloy that has a ‘‘laser type-specific light absorption’’ (4). This system can reportedly engrave 70,000 cells per second.
• Countertops
• Vinyl flooring
• Candy and pill trademarking
• Stamps
• Cigarette filter tips
• Lottery tickets
The strength of gravure derives from the simplicity of its operation: fewer moving parts allow a more stable and consistent production process. Improvements in gravure prepress have shortened cylinder lead times. Any successful breakthrough in plastic cylinder technology will make gravure competitive in short-run markets.
BIBLIOGRAPHY
OVERVIEW OF TODAY’S GRAVURE INDUSTRY
The gravure process is currently the third most common printing process in the United States and the second most common in Europe and Asia. Three distinctly different market segments use the gravure process: publication, packaging, and specialty. Publication gravure presses are designed to print web widths up to 142 inches and run at speeds up to 3,000 feet per minute. The maximum cylinder circumference used in a publication gravure press is 76 inches. A typical publication gravure press consists of eight printing units, four for each side of the web. Gravure-printed publications include magazines, Sunday newspaper supplements, catalogs, and newspaper advertising inserts. Gravure packaging presses are designed to handle the specific substrates used in the packaging industry. Consequently, a packaging press is narrower and runs at a slower speed than a publication press. A packaging gravure press usually includes eight or more printing stations and runs at a speed between 450 and 2,000 feet per minute. The type of substrate printed or the speed of in-line finishing often limits press speed. The gravure printing process can handle a wider range of substrates than any other printing process (with the possible exception of flexography). Gravure-printed packaging products include folding cartons, usually printed on paperboard; flexible packaging, usually printed on polyethylene or polypropylene; and labels and wrappers. The most interesting and diverse segment of gravure printing is known as the product or specialty segment. The press speeds and web widths used for the various products manufactured by specialty gravure printers vary widely: depending on the substrate, speeds range from 30 to 1,000 feet per minute, and widths from less than 20 inches to 12 feet. The following list includes many of the specialty products printed by gravure:
• Gift wrap
• Wallcoverings
• Swimming pool liners
• Shower curtains
1. M. O. Lilien, History of Industrial Gravure Printing up to 1920, Lund Humphries, London, 1972, pp. 3–24.
2. J. F. Romano and M. Richard, Encyclopedia of Graphic Communications, Prentice-Hall, Englewood Cliffs, NJ, 1998, pp. 361–368.
3. Gravure Process and Technology, Gravure Education Foundation and Gravure Association of America, 1998, pp. 17, 182–196, 259.
4. Max Daetwyler Corporation, LASERSTAR, www.daetwyler.com.
GROUND PENETRATING RADAR
LAWRENCE B. CONYERS
University of Denver
Denver, CO
INTRODUCTION
Ground-penetrating radar (GPR) is a geophysical method that can accurately map the spatial extent of near-surface objects or changes in soil media and produce images of those features. Data are acquired by reflecting radar waves from subsurface features in a way that is similar to radar methods used to detect airplanes in the sky (1). Radar waves are propagated in distinct pulses from a surface antenna; reflected from buried objects, features, or bedding contacts in the ground; and detected back at the source by a receiving antenna. As radar pulses are being transmitted through various materials on their way to the buried target feature, their velocity changes, depending on the physical and chemical properties of the material through which they are traveling. When the travel times of the energy pulses are measured and their velocity through the ground is known, distance (or depth in the ground) can be accurately measured, producing a three-dimensional data set. In the GPR method, radar antennas are moved along the ground in transects, and two-dimensional profiles of a large number of periodic reflections are created, producing a profile of subsurface stratigraphy and buried
adapted to many differing site conditions. In the past, it has been assumed that GPR surveys would be successful only in areas where soils and underlying sediment are extremely dry and nonconductive (28). Although radar wave penetration and the ability to reflect energy back to the surface are enhanced in a dry environment, recent work has demonstrated that dryness is not necessarily a prerequisite for GPR surveys, as good data have been collected in swampy areas, peat bogs, rice paddies, and even freshwater lakes. Modern methods of computer enhancement and processing have also proven that meaningful data can be obtained, sometimes even in these very wet ground conditions.
Figure 1. GPR reflection profile showing a vertical slice in the ground to 2.5 meters depth.
features along lines (Fig. 1). When data are acquired in a series of transects within a grid and the reflections are correlated and processed, an accurate three-dimensional picture of buried features and associated stratigraphy can be constructed. Ground-penetrating radar surveys allow wide areal coverage in a short period of time and have excellent subsurface resolution of buried materials and geological stratigraphy. Some radar systems can resolve stratigraphy and other features at depths in excess of 40 meters when soil and sediment conditions are suitable (2). More typically, GPR is used to map buried materials at depths from a few tens of centimeters to 5 meters. Radar surveys can identify buried objects for possible future excavation and can also interpolate between excavations, projecting subsurface knowledge into areas that have not yet been, or may never be, excavated. GPR surveys are most typically used by geologists, archaeologists, hydrologists, soil engineers, and other geoscientists. Ground-penetrating radar was initially developed as a geophysical prospecting technique to locate buried objects or cavities such as pipes, tunnels, and mine shafts (3). The GPR method has also been used to define lithologic contacts (4–6), faults (7), and bedding planes and joint systems in rocks (8–11). Ground-penetrating radar technology can also be employed to investigate buried soil units (12–17) and the depth to groundwater (14,18,19). Archaeological applications range from finding and mapping buried villages (20–25) to locating graves, buried artifacts, and house walls (26,27).
ENVIRONMENTS WHERE GROUND-PENETRATING RADAR IS SUCCESSFUL
The success of GPR surveys depends to a great extent on soil and sediment mineralogy, clay content, ground moisture, depth of burial, surface topography, and vegetation.
It is not a geophysical method that can be immediately applied to any geographic or archaeological setting, although with thoughtful modifications in acquisition and data processing methodology, GPR can be
GROUND-PENETRATING RADAR EQUIPMENT AND DATA ACQUISITION
The GPR method involves transmitting high-frequency electromagnetic radio (radar) pulses into the earth and measuring the time elapsed between transmission, reflection from a buried discontinuity, and reception at a surface radar antenna. A pulse of radar energy is generated on a dipole transmitting antenna that is placed on, or near, the ground surface. The resulting wave of electromagnetic energy propagates downward into the ground, where portions of it are reflected back to the surface at discontinuities. The discontinuities where reflections occur are usually created by changes in the electrical properties of the sediment or soil, variations in water content, lithologic changes, or changes in bulk density at stratigraphic interfaces. Reflection can also occur at interfaces between anomalous archaeological features or buried pipes and the surrounding soil or sediment. Void spaces in the ground, which may be encountered in burials, tombs, or tunnels, will also generate significant radar reflections because of the large change in radar wave velocity they produce. The depth to which radar energy can penetrate and the amount of definition that can be expected in the subsurface are partially controlled by the frequency of the radar energy transmitted. The frequency controls both the wavelength of the propagating wave and the amount of weakening, or attenuation, of the waves in the ground. Standard GPR antennas propagate radar energy that varies in bandwidth from about 10 megahertz (MHz) to 1,200 MHz. Antennas usually come in standard frequencies; each antenna has one center frequency but produces radar energy that spans about two octaves, from roughly one-half to twice the center frequency. Radar antennas are usually housed in a fiberglass or wooden sled that is placed directly on the ground (Fig. 2) or supported on wheels a few centimeters above the ground.
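The two-octave band around a center frequency can be made concrete with a small helper; the listed center frequencies are examples of common antennas, not an exhaustive set.

```python
# Approximate radiated band for a GPR antenna: the energy spans roughly
# two octaves around the center frequency, i.e., from one-half to twice
# the center frequency.

def antenna_band(center_mhz):
    """Return the approximate (low, high) band edges in MHz."""
    return center_mhz / 2, center_mhz * 2

for center in (100, 300, 500):   # example center frequencies (MHz)
    low, high = antenna_band(center)
    print(f"{center} MHz antenna: ~{low:.0f}-{high:.0f} MHz")
```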
When two antennas are employed, one is used as a transmitting antenna and the other as a receiving antenna. Antennas can also be placed separately on the ground without being housed in a sled. A single antenna can also be used as both a sender and receiver in what is called a monostatic system. In monostatic mode, the same antenna is turned on to transmit a radar pulse and then immediately switched to receiving mode to receive and measure the returning reflected energy.
Figure 2. Typical GPR field acquisition setup. A 500 MHz antenna is on the left, connected to the radar control unit and computer by a cable. A screen and keyboard for analysis in the field during collection are on the packing box on the right.
Antennas are usually hand-towed along survey lines within a grid at an average speed of about 2 kilometers per hour, or they can be pulled behind a vehicle at speeds of 10 kilometers per hour or greater. In this fashion, energy is continuously transmitted and received as the antennas move over the ground. They can also be moved in steps along a transect instead of being moved continuously. During step acquisition, the smaller the spacing between steps, the greater the subsurface coverage. In the last few years, radar equipment manufacturers have been building their systems so that data can be collected by either method, depending on the preference of the user or on site characteristics. The most efficient method of subsurface radar mapping is to establish a grid across the survey area before acquiring the data. Usually, rectangular grids are established with a line spacing of 50 centimeters or greater; rectangular grids produce data that are easier to process and interpret. Other grid acquisition patterns may be necessary because of surface topography or other obstructions. Survey lines that radiate outward from one central area have sometimes been used, for instance, to define a moat around a central fort-like structure (27). A rhomboid grid pattern has also been used with success within a sugarcane field on the side of a hill (20), where the antennas had to be pulled between planted rows. Data from nonrectangular surveys are just as useful as those acquired in rectangular grids, although more field time may be necessary for surveying, and the reflection data must be manipulated differently during computer processing and interpretation. Occasionally, GPR surveys have been carried out on the frozen surfaces of lakes or rivers (2,6,14,28). Radar waves easily pass through ice and freshwater into the underlying sediment, revealing features on lake or river bottoms and in the subsurface.
A radar sled can also be easily floated across the surface of a lake or
river and onto the shore, all the while collecting data from the subsurface (29). These techniques, however, do not work in salt water because the high electrical conductivity of the saline water quickly dissipates the electromagnetic energy before it can be reflected to the receiving antenna. If the antennas are pulled continuously along a transect line within a presurveyed grid, continuous pulses of radar energy are sent into the ground, reflected from subsurface discontinuities and then received and recorded at the surface. The movable radar antennas are connected to the control unit by cable. Some systems record the reflective data digitally directly at the antenna, and the digital signal is sent back through fiber optic cables to the control module (2). Other systems send an analog signal from the antennas through coaxial copper cables to the control unit where it is then digitized. Older GPR systems, without the capability of digitizing the reflected signals in the field, must record reflective data on magnetic tape or paper records. The two-way travel time and the amplitude and wavelength of the reflected radar waves derived from the pulses are then amplified, processed, and recorded for immediate viewing or later postacquisition processing and display. During field data acquisition, the radar transmission process is repeated many times per second as the antennas are pulled along the ground surface or moved in steps. The distance along each line is also recorded for accurate placement of all reflections within a surveyed grid. When the composite of all reflected wave traces is displayed along the transect, a cross-sectional view of significant subsurface reflective surfaces is generated (Fig. 1). In this fashion, two-dimensional profiles that approximate vertical ‘‘slices’’ through the earth are created along each grid line. Radar reflections are always recorded in ‘‘two-way time’’ because that is the time it takes a radar wave
to travel from the surface antenna into the ground, reflect from a discontinuity, travel back to the surface, and be recorded. One of the advantages of GPR surveys over other geophysical methods is that the subsurface stratigraphy and archaeological features at a site can be mapped in real depth. This is possible because the two-way travel time of radar pulses can be converted to depth if the velocity of radar wave travel through the ground is known (1). The propagation velocity of radar waves projected through the earth depends on a number of factors; the most important is the electrical properties of the material through which they pass (30). Radar waves in air travel at the speed of light, approximately 30 centimeters per nanosecond (one nanosecond is one billionth of a second). When radar energy travels through dry sand, its velocity slows to about 15 centimeters per nanosecond. If the radar energy were then to pass through a water-saturated sand unit, its velocity would slow further, to about 5 centimeters per nanosecond or less. Reflections would be generated at each interface where the velocity changes.
Type of Data Collected
The primary goal of most GPR investigations is to differentiate subsurface interfaces. All sedimentary layers in the earth have particular electrical properties that affect the rate of electromagnetic energy propagation, as measured by the relative dielectric permittivity. The reflectivity of radar energy at an interface is primarily a function of the magnitude of the difference in electrical properties between the two materials on either side of that interface. The greater the contrast in electrical properties between the two materials, the stronger the reflected signal (31). The inability to measure the electrical parameters of buried units precisely usually precludes accurate calculation of specific amounts of reflectivity in most contexts, and usually only estimates can be made.
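The time-to-depth conversion and a reflectivity estimate can both be sketched with the standard low-loss GPR relations — velocity v = c/√εr and the normal-incidence amplitude reflection coefficient R = (√ε1 − √ε2)/(√ε1 + √ε2) — which are conventional approximations, not formulas given explicitly in this article; the permittivity values below are illustrative assumptions chosen to match the velocities quoted above.

```python
import math

C_CM_PER_NS = 30.0  # radar-wave speed in air, ~30 cm per nanosecond

def velocity(rel_permittivity):
    """Approximate radar velocity (cm/ns) in a low-loss material."""
    return C_CM_PER_NS / math.sqrt(rel_permittivity)

def depth_from_twt(two_way_time_ns, rel_permittivity):
    """Convert two-way travel time to depth (cm); halve the time
    because the pulse travels down to the reflector and back."""
    return velocity(rel_permittivity) * two_way_time_ns / 2

def reflection_coefficient(eps_upper, eps_lower):
    """Normal-incidence amplitude reflection coefficient between
    two low-loss, nonmagnetic layers."""
    a, b = math.sqrt(eps_upper), math.sqrt(eps_lower)
    return (a - b) / (a + b)

# Dry sand (assumed relative permittivity ~4) gives ~15 cm/ns,
# matching the velocity quoted in the text.
v_sand = velocity(4)
# A reflection returning after 40 ns in dry sand comes from ~3 m depth.
d = depth_from_twt(40, 4)
# Dry sand over water-saturated sand (assumed ~25) reflects strongly;
# the negative sign marks a polarity reversal at the wetter, slower layer.
r = reflection_coefficient(4, 25)
```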
The strongest radar reflections in the ground usually occur at the interface of two thick layers whose electrical properties vary greatly. The ability to ‘‘see’’ radar reflections on profiles is related to the amplitude of the reflected waves. The higher the amplitude, the more visible the reflections. Lower amplitude reflections usually occur when there are only small differences in the electrical properties between layers. Radar energy becomes both dispersed and attenuated as it radiates into the ground. When portions of the original transmitted signal are reflected toward the surface, they will suffer additional attenuation in the material through which they pass before finally being recorded at the surface. Therefore, to be detected as reflections, important subsurface interfaces must have sufficient electrical contrast at their boundaries and also must be located at shallow enough depths where sufficient radar energy is still available for reflection. As radar energy is propagated to increasing depths and the signal becomes weaker and spreads out over more surface area, less is available for reflection, and it is possible that only very low-amplitude waves will be recorded. The maximum depth of resolution for every site will vary with the geologic
conditions and the equipment being used. Data filtering and other amplification techniques can sometimes be applied to reflective data after acquisition to enhance very low amplitude reflections and make them more visible. Reflections received from deeper in the ground are usually gained, either during data collection in the field or during postacquisition processing. This data processing method exponentially increases the amplitudes of reflections from deeper in the ground and makes them visible in reflective profiles. The gaining process enhances otherwise invisible reflections, which have very low amplitude because the energy has traveled to a greater depth in the ground and become attenuated and spread out as the waves radiate away from the transmitting antenna, leaving less energy to be reflected to the surface.

Production of Continuous Reflective Images

Most radar units used for geologic and archaeological investigation transmit short discrete pulses into the earth and then measure the reflected waves derived from those pulses as the antennas are moved along the ground. A series of reflected waves are then recorded as the antennas are moved along a transect. The amount of spatial resolution in the subsurface depends partially on the density of reflections along each transect. This spatial density can be adjusted within the control unit to record a greater or lesser number of traces along each recorded line, depending on the speed of antenna movement along the ground. If a survey wheel is being used for data acquisition, the number of reflective traces desired per unit distance can also be adjusted for greater or lesser resolution. If the step method of acquisition is used, the distance between steps can be lengthened or shortened, depending on the subsurface resolution desired. As reflections from the subsurface are recorded in distinct traces and plotted together in a profile, a two-dimensional representation of the subsurface can be made (Fig. 1).
One ‘‘trace’’ is a complete reflected wave that is recorded from the surface to whatever depth is being surveyed. A series of reflections that make up a horizontal or subhorizontal line (either dark or light in standard black-and-white or gray-scale profiles) is usually referred to as ‘‘a reflection.’’ A distinct reflection visible in profiles is usually generated from a subsurface boundary such as a stratigraphic layer or some other physical discontinuity such as a water table. Reflections recorded later in time are usually those received from deeper in the ground. There can also be ‘‘point source reflections’’ that are generated from one feature in the subsurface. These are visible as hyperbolas on two-dimensional profiles. Due to the wide angle of the transmitted radar beam, the antenna will ‘‘see’’ the point source before arriving directly over it and continue to ‘‘see’’ it after it is passed. Therefore, the resulting recorded reflection will create a reflective hyperbola (Fig. 3), sometimes incorrectly called a diffraction, on two-dimensional profiles. These often can be produced from buried pipes, tunnels, walls, or large rocks.
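The geometry behind point-source hyperbolas can be sketched as follows: the two-way travel time is shortest when the antenna is directly above the target and grows with horizontal offset, tracing a hyperbola on the profile. The velocity and depth used here are illustrative assumptions.

```python
import math

# Two-way travel time to a buried point target as the antenna moves along
# a transect. The slant distance from antenna to target controls the time,
# which is why the recorded reflection is a hyperbola.

def two_way_time_ns(offset_m: float, depth_m: float, v_m_per_ns: float) -> float:
    """Two-way travel time (ns) to a point target at horizontal offset x."""
    slant = math.sqrt(depth_m**2 + offset_m**2)
    return 2.0 * slant / v_m_per_ns

# Times along a transect over a pipe 1 m deep, with v = 0.1 m/ns:
times = [two_way_time_ns(x, depth_m=1.0, v_m_per_ns=0.1)
         for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
# The apex of the hyperbola (20 ns) occurs directly over the target.
```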
GROUND PENETRATING RADAR
Physical Parameters that Affect Radar Transmission The maximum effective depth of GPR wave penetration is a function of the frequency of the waves that are propagated into the ground and the physical characteristics of the material through which they are traveling. The physical properties that affect the radar waves as they pass through a medium are the relative dielectric permittivity (RDP), the electrical conductivity, and the magnetic permeability (32). Soils, sediment or rocks that are ‘‘dielectric’’ will permit the passage of most electromagnetic energy without actually dissipating it. The more electrically conductive a material, the less dielectric it is. For maximum radar energy penetration, a medium should be highly dielectric and have low electrical conductivity. The relative dielectric permittivity of a material is its capacity to store and then allow the passage of electromagnetic energy when a field is imposed upon it (33). It can also be thought of as a measure of a material’s ability to become polarized within an electromagnetic field and therefore respond to propagated electromagnetic waves (30). It is calculated as the ratio of a material’s electrical permittivity to the electrical permittivity in a vacuum (that is, one). Dielectric permittivities of materials vary with their composition, moisture content, bulk density, porosity, physical structure, and temperature (30). The relative dielectric permittivity in air, which exhibits only negligible electromagnetic polarization, is approximately 1.0003 (34), usually rounded to one. In volcanic or other hard rocks, it can range from 6 to 16, and in wet soils or clay-rich units, it can approach 40 or 50. In unsaturated sediment, where there is little or no clay, relative dielectric permittivities can be 5 or lower. In general, the higher the RDP of a material, the slower the velocity of radar waves passing through it. 
In general, the higher the RDP of a material, the poorer its ability to transmit radar energy (1). If data are not immediately available about field conditions, the RDP can only be estimated, but if the actual depth of objects or interfaces visible in reflective profiles is known, the RDP can be easily calculated using Eq. 1:

K^(1/2) = C/V    (1)
where
K = relative dielectric permittivity (RDP) of the material through which the radar energy passes
C = speed of light (0.2998 meters per nanosecond)
V = velocity at which the radar passes through the material (measured in meters per nanosecond)

The relative dielectric permittivity of some common materials is shown in Table 1. These of course can be highly variable due to changes in clay content and type, the amount and type of salts, and especially moisture. The greater the difference between the relative dielectric permittivity of materials in the subsurface, the larger the amplitude of the reflection generated. To generate a significant reflection, the change in dielectric permittivity between two materials must occur over a
Table 1. Relative Dielectric Permittivities of Common Materials

Material                 Relative Dielectric Permittivity
Air                      1
Ice                      3–4
Salt water               81–88
Dry sand                 3–5
Saturated sand           20–30
Volcanic ash/pumice      4–7
Limestone                4–8
Shale                    5–15
Granite                  5–15
Coal                     4–5
Dry silt                 3–30
Saturated silt           10–40
Clay                     5–40
Permafrost               4–5
Asphalt                  3–5
Concrete                 6
short distance. When the RDP changes gradually with depth, only small differences in reflectivity will occur every few centimeters, and therefore only weak or nonexistent reflections will be generated. Magnetic permeability is a measure of the ability of a medium to become magnetized when an electromagnetic field is imposed upon it (35). Most soils and sediments are only very slightly magnetic and therefore have low magnetic permeability. The higher the magnetic permeability, the more the electromagnetic energy will be attenuated during its transmission. Media that contain magnetite minerals, iron oxide cement, or iron-rich soils can have high magnetic permeability and therefore transmit radar energy poorly. Electric conductivity is the ability of a medium to conduct an electric current (35). When a medium through which radar waves pass has high conductivity, radar energy will be highly attenuated. In a highly conductive medium, the electric component of the electromagnetic energy is essentially conducted away into the earth and becomes lost. This occurs because the electric and magnetic fields are constantly ‘‘feeding’’ on each other during transmission. If one is lost, the total field dissipates. Highly conductive media include those that contain salt water and those that have high clay content, especially if the clay is wet. Any soil or sediment that contains soluble salts or electrolytes in the groundwater will also have high electrical conductivity. Agricultural runoff that is partially saturated with soluble nitrogen and potassium can raise the conductivity of a medium, as will wet calcium carbonate impregnated soils in desert regions. Radar energy will not penetrate metal. A metal object will reflect 100% of the radar energy that strikes it and will shadow anything directly underneath it.
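Equation (1) can also be applied in reverse: if a reflector's true depth is known (for example, from excavation), the average RDP of the overburden can be back-calculated from the recorded travel time. The numbers below are illustrative.

```python
# Sketch of Eq. (1): K**(1/2) = C / V, so K = (C / V)**2.
# The velocity V follows from a known depth and the recorded two-way time.

C = 0.2998  # speed of light, meters per nanosecond

def rdp_from_known_depth(depth_m: float, two_way_time_ns: float) -> float:
    """Back-calculate the average RDP above a reflector of known depth."""
    v = 2.0 * depth_m / two_way_time_ns  # meters per nanosecond
    return (C / v) ** 2

# A reflector known to be 1.5 m deep, recorded at 30 ns:
# V = 0.1 m/ns, so K = (0.2998 / 0.1)**2, approximately 9.
```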
RADAR ENERGY PROPAGATION

Many ground-penetrating radar novices envision the propagating radar pattern as a narrow pencil-shaped
beam that is focused directly down from the antenna. In fact, GPR waves from standard commercial antennas radiate energy into the ground in an elliptical cone (Fig. 3) whose apex is at the center of the transmitting antenna (36–38). This elliptical cone of transmission occurs because the electric field produced by the antenna is generated parallel to its long axis and therefore usually radiates into the ground perpendicular to the direction of antenna movement along the ground surface. This radiative pattern is generated from a horizontal electric dipole antenna to which elements called shields are sometimes added that effectively reduce upward radiation. Sometimes, the only shielding mechanism is a metal plate that is placed above the antenna to re-reflect upward radiating energy. Because of considerations of cost and portability (size and weight), the use of more complex radar antennas that might be able to focus energy more efficiently into the ground in a narrower beam has been limited to date. When an electric dipole antenna is located in air (or supported within the antenna housing), the radiative pattern is approximately perpendicular to the long axis of the antenna. When this dipole antenna is placed on the ground, a major change in the radiative pattern occurs due to ground coupling (39). Ground coupling is the ability of the electromagnetic field to move from transmission in the air to the ground. During this process, refraction that occurs as the radar energy passes through surface units changes the directionality of the radar beam, and most of the energy is channeled downward in a cone from the propagating antenna (32). The higher the RDP of the surface material, the lower the velocity of the transmitted radar energy, and the more focused (less broad) the conical transmission pattern becomes (24). This focusing effect continues as radar waves travel into the ground and material of higher and higher RDP is encountered.
The amount of energy refraction that occurs with depth, and therefore the amount of focusing, is a function of Snell’s law (35). In Snell’s law, the amount of reflection or refraction that occurs at a boundary between two media depends on the angle of incidence and the velocity of
A = λ/4 + D/√(K + 1)    (3)

where
A = approximate radius of the long dimension of the footprint
λ = center frequency wavelength of the radar energy
D = depth from the ground surface to the reflection surface
K = average relative dielectric permittivity (RDP) of the material from the ground surface to depth D

Figure 3. The conical transmission of radar energy from a surface antenna into the ground. The footprint of illumination at any depth can be calculated with Eq. 3.
the incoming waves. In general, the greater the increase in RDP with depth, the more focused the cone of transmission becomes. The opposite can also occur if materials of gradually lower RDP are encountered as radar waves travel into the ground. Then, the cone of transmission would gradually expand outward as refraction occurs at each interface. Radiation fore and aft from the antenna is usually greater than to the sides, making the ‘‘illumination’’ pattern on a horizontal subsurface plane approximately elliptical (Fig. 3); the long axis of the ellipse is parallel to the direction of antenna travel (1). In this way, the subsurface radiative pattern on a buried horizontal plane is always ‘‘looking’’ directly below the antenna and also in front, behind, and to the sides as the antenna travels across the ground. The radiative pattern in the ground also depends on the orientation of the antenna and the resulting polarization of the electromagnetic energy as it travels into the ground. If a standard GPR antenna is used, where the transmitting and receiving antennas are perpendicular to the direction of transport along the ground, the elliptical pattern of illumination will tend to be elongated somewhat in the direction of transport. A further complexity arises due to polarization of the waves as they leave the antenna and pass through the ground. The electric field generated by a dipole antenna is oriented parallel to the long axis of the antenna, which is usually perpendicular to the direction of transport across the ground. A linear object in the ground that is oriented parallel to this polarization would therefore produce a very strong reflection, as much of the energy is reflected. In contrast, a linear object in the ground perpendicular to the polarized electric field has little surface area parallel to the field with which to reflect energy; it will therefore reflect little energy and may be almost invisible.
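The focusing produced by refraction can be quantified with Snell's law, using v = C/√K for the velocity in each layer. This is a sketch under idealized conditions; the RDP values below are illustrative assumptions.

```python
import math

# Snell's law sketch: sin(t2) / sin(t1) = v2 / v1, with v = C / sqrt(K).
# Passing into a higher-RDP (slower) layer bends the ray toward vertical,
# narrowing the cone of transmission, as described in the text.

C = 0.2998  # meters per nanosecond

def refracted_angle_deg(incident_deg: float, rdp_upper: float,
                        rdp_lower: float) -> float:
    """Refracted angle (degrees from vertical) at a layer boundary."""
    v1 = C / math.sqrt(rdp_upper)
    v2 = C / math.sqrt(rdp_lower)
    sin_t2 = math.sin(math.radians(incident_deg)) * v2 / v1
    return math.degrees(math.asin(sin_t2))

# A ray at 30 degrees in RDP-5 material entering RDP-20 material bends
# to about 14.5 degrees from vertical: the beam narrows (focuses).
```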
To minimize the amount of reflective data derived from the sides of a survey line, the long axes of the antennas are aligned perpendicular to the survey line. This elongates the cone of transmission in an in-line direction. Various other antenna orientations achieve different subsurface search patterns, but most of them are not used in standard GPR surveys (1). Some antennas, especially those in the low-frequency range from 80–300 MHz, are sometimes not well shielded and therefore radiate radar energy in all directions. Using unshielded antennas can generate reflections from a nearby person pulling the radar antenna or from any other objects nearby such as trees or buildings (40). Discrimination of individual targets, especially those of interest in the subsurface, can be difficult if these types of antennas are used. However, if the unwanted reflections generated from unshielded antennas all occur at approximately the same time, for instance from a person pulling the antennas, then they can be easily filtered out later, if the data are recorded digitally. If reflections are recorded from randomly located trees, surface obstructions, or people moving about near the antenna, usually they cannot easily be discriminated from important subsurface reflections, and interpreting the data is much more difficult.
If the transmitting antenna is properly shielded so that energy is propagated in a mostly downward direction, the angle of the conical radiative pattern can be estimated, depending on the center frequency of the antenna used (1). An estimate of this radiative pattern is especially important when designing line spacing within a grid, so that all subsurface features of importance are ‘‘illuminated’’ by the transmitted radar energy and therefore can potentially generate reflections. In general, the angle of the cone is defined by the relative dielectric permittivity of the material through which the waves pass and the frequency of the radar energy emitted from the antenna. An equation that can be used to estimate the width of the transmission beam at varying depths (footprint) is shown in Fig. 3. This equation (Eq. 3) can usually be used only as a rough approximation of real-world conditions because it assumes a consistent dielectric permittivity of the medium through which the radar energy passes. Outside of strictly controlled laboratory conditions, this is never the case. Sedimentary and soil layers within the earth have variable chemical constituents, differences in retained moisture, compaction, and porosity. These and other variables can create a complex layered system that has varying dielectric permittivities and therefore varying energy transmission patterns. Any estimate of the orientation of transmitted energy is also complicated by the knowledge that radar energy propagated from a surface antenna is not of one distinct frequency but can range in many hundreds of megahertz around the center frequency. If one were to make a series of calculations on each layer, assuming that all of the variables could be determined and assuming one distinct antenna frequency, then the ‘‘cone’’ of transmission would widen in some layers, narrow in others, and create a very complex three-dimensional pattern. 
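The footprint estimate of Eq. 3 in Fig. 3 can be sketched as a simple calculation. As stressed above, it assumes a single uniform RDP from the surface to depth D, so the result is only a planning approximation (for example, for choosing grid line spacing); the inputs below are illustrative.

```python
import math

# Rough footprint estimate from Eq. (3): A = lambda/4 + D / sqrt(K + 1).
# Valid only as an approximation, since real ground has layered, variable RDP.

def footprint_radius_m(wavelength_m: float, depth_m: float, rdp: float) -> float:
    """Approximate long-dimension radius of the illuminated footprint."""
    return wavelength_m / 4.0 + depth_m / math.sqrt(rdp + 1.0)

# A 1 m center-frequency wavelength, a target 2 m deep, average RDP of 5:
# A = 0.25 + 2 / sqrt(6), approximately 1.07 m, suggesting grid lines
# spaced about 1 m apart so every target is illuminated.
```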
The best one can usually do for most field applications is to estimate the radar beam configuration based on estimated field conditions (1).

Antenna Frequency Constraints

One of the most important variables in ground-penetrating radar surveys is selecting antennas that have the correct operating frequency for the depth necessary and the resolution of the features of interest (41). The center frequencies of commercial GPR antennas range from about 10 to 1200 megahertz (MHz) (15,37). Variations in the dominant frequencies of any antenna are caused by irregularities in the antenna’s surface or other electronic components located within the system. These types of variations are common in all antennas; each has its own irregularities and produces a different pulse signature and different dominant frequencies. This somewhat confusing situation with respect to transmission frequency is further complicated when radar energy is propagated into the ground. When radar waves move through the ground, the center frequency typically ‘‘loads down’’ to a lower dominant frequency (39). The new propagative frequency, which is almost always lower, will vary depending on the electric properties of near-surface soils and sediment that change the velocity of propagation
and the amount of ‘‘coupling’’ of the propagating energy with the ground. At present, there are few hard data that can be used to predict accurately what the ‘‘downloaded’’ frequency of any antenna will be under varying conditions. For most GPR applications, it is only important to be aware that there is a downloading effect that can change the dominant radar frequency and affect calculations of subsurface transmission patterns, penetration depth, and other parameters. In most cases, proper antenna frequency selection can make the difference between success and failure in a GPR survey and must be planned for in advance. In general, the greater the necessary depth of investigation, the lower the antenna frequency that should be used. Lower frequency antennas are much larger, heavier, and more difficult to transport to and within the field than high-frequency antennas. One 80-MHz antenna used for continuous GPR acquisition is larger than a 42-gallon oil drum cut in half lengthwise and weighs between 125 and 150 pounds. It is difficult to transport to and from the field and usually must be moved along transect lines by some form of wheeled vehicle or sled. In contrast, a 500-MHz antenna is smaller than a shoe box, weighs very little, and can easily fit into a suitcase (Fig. 2). Lower frequency antennas used for acquiring data by the step method are not nearly as heavy as those used in continuous data acquisition but are equally unwieldy. Low-frequency antennas (10–120 MHz) generate long wavelength radar energy that can penetrate up to 50 meters in certain conditions but can resolve only very large subsurface features. In pure ice, antennas of this frequency have been known to transmit radar energy for many kilometers. Dry sand and gravel or unweathered volcanic ash and pumice are media that allow radar transmission to depths that approach 8–10 meters when lower frequency antennas are used.
In contrast, the maximum depth of penetration of a 900-MHz antenna is about 1 meter or less in typical soils, but its generated reflections can resolve features as small as a few centimeters. Therefore, a trade-off exists between depth of penetration and subsurface resolution. The depth of penetration and the subsurface resolution are actually highly variable and depend on many site-specific factors such as overburden composition, porosity, and the amount of retained moisture. If large amounts of clay, especially wet clay, are present, then attenuation of the radar energy with depth will occur very rapidly, irrespective of radar energy frequency. Attenuation can also occur if sediment or soils are saturated with salty water, especially seawater.

SUBSURFACE RESOLUTION

The ability to resolve buried features is determined mostly by the frequency, and therefore the wavelength, of the radar energy transmitted into the ground. The wavelength necessary for resolution varies, depending on whether a three-dimensional object or an undulating surface is being investigated. For GPR to resolve three-dimensional objects, reflections from at least two surfaces, usually a top and bottom interface, need to be distinct. Resolution
of a single buried planar surface, however, needs only one distinct reflection, and therefore wavelength is not as important in resolving it. An 80-MHz antenna generates an electromagnetic wave about 3.75 meters long when transmitted in air. When the wavelength in air is divided by the square root of the RDP of the material through which it passes, the subsurface wavelength can be estimated. For example, when an 80-MHz wave travels through material whose RDP is 5, its wavelength decreases to about 1.6 meters. A 300-MHz antenna generates a radar wave whose wavelength is 1 meter in air and decreases to about 45 centimeters in material whose RDP is 5. To distinguish reflections from two parallel planes (the top and bottom of a buried object, for instance), they must be separated by at least one wavelength of the energy that is passing through the ground (1,2). If the two reflections are not separated by one wavelength, then the resulting reflected waves from the top and bottom will either be destroyed or will be unrecognizable due to constructive and destructive interference. When two interfaces are separated by more than one wavelength, however, two distinct reflections are generated, and the top and bottom of the feature can be resolved. If only one buried planar surface is being mapped, then the first arrival reflected from that interface can be accurately resolved, independent of the wavelength. This can be more difficult when the buried surface is highly irregular or undulating. Subsurface reflections of buried surfaces that have been generated by longer wavelength radar waves tend to be less sharp when viewed together in a standard GPR profile, and therefore many small irregularities on the buried surface are not visible. This occurs because the conical radiation pattern of an 80-MHz antenna is about three times broader than that of a 300-MHz antenna (1).
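The wavelength arithmetic above can be sketched directly: the wavelength in air is the speed of light divided by the frequency, and in the ground it shrinks by the square root of the RDP.

```python
import math

# Sketch of subsurface wavelength estimation, as described in the text.

C_M_PER_NS = 0.2998  # speed of light, meters per nanosecond

def wavelength_in_ground_m(freq_mhz: float, rdp: float) -> float:
    """Estimated wavelength in the ground for a given center frequency."""
    freq_ghz = freq_mhz / 1000.0            # cycles per nanosecond
    air_wavelength = C_M_PER_NS / freq_ghz  # wavelength in air
    return air_wavelength / math.sqrt(rdp)

# 80 MHz in material of RDP 5: about 3.75 m in air -> about 1.68 m below.
# 300 MHz in the same material: about 1.0 m in air -> about 0.45 m below.
```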
Therefore, the reflected data that are received at the surface from the lower frequency antenna have been reflected from a much greater subsurface area, which results in averaging out the low percentage of reflections from the smaller irregular features. As a result, a reflective profile produced by an 80-MHz antenna gives an averaged and less accurate representation of a buried surface. In contrast, a 300-MHz transmission cone is about three times narrower than an 80-MHz radar beam, and its resolution of subsurface features on the same buried surface is much greater. Radar energy that is reflected from a buried subsurface interface that slopes away from a surface transmitting antenna is reflected away from the receiving antenna and will be lost. This sloping interface would go unnoticed in reflective profiles. A buried surface of this orientation is visible only if an additional traverse is located in an orientation in which the same buried interface slopes toward the surface antennas. This is one reason that it is important always to acquire lines of data within a closely spaced surface grid. The amount of reflection from a buried feature is also determined by the ratio of the object’s dimension to the wavelength of the radar wave in the ground. Short wavelength (high-frequency) radar waves can resolve very small features but will not penetrate to a great depth.
Longer wavelength radar energy will resolve only larger features but will penetrate deeper in the ground. Some features in the subsurface may be described as ‘‘point targets,’’ and others are similar to planar surfaces. Planar surfaces can be stratigraphic or soil horizons, or large flat archaeological features such as pit-house floors. Point targets are features such as tunnels, voids, artifacts, or any other nonplanar object. Depending on a planar surface’s thickness, reflectivity, orientation, and depth of burial, it is potentially visible with any frequency data, constrained only by the conditions discussed before. Point sources, however, often have little surface area with which to reflect radar energy and therefore are usually difficult to identify and map. They are sometimes indistinguishable from the surrounding material. Many times they are visible only as small reflective hyperbolas on one line within a grid (Fig. 1). In most geologic and archaeological settings, the materials through which radar waves pass may contain many small discontinuities that reflect energy. These can be described only as clutter (that is, if they are not the target of the survey). Clutter depends entirely on the wavelength of the radar energy propagated. If both the features to be resolved and the discontinuities that produce the clutter are of the order of one wavelength, then the reflective profiles will appear to contain only clutter, and there can be no discrimination between the two. Clutter can also be produced by large discontinuities, such as cobbles and boulders, but only when a lower frequency antenna is used that produces a long wavelength. In all cases, the feature to be resolved, if not a large planar surface, should be much larger than the clutter and greater than one wavelength. Buried features, whether planar or point sources, that are too small relative to their depth of burial will also be undetectable.
As a basic guideline, the cross-sectional area of the target to be illuminated within the ‘‘footprint’’ of the beam should approximate the size of the footprint at the target depth (Eq. 3 in Fig. 3). If the target is much smaller than the footprint size, then only a fraction of the reflected energy that is returned to the surface will have been reflected from the buried feature. Any reflections returned from the buried feature in this case may be indistinguishable from background reflections and will be invisible on reflective profiles.

Frequency Interference

Ground-penetrating radar employs electromagnetic energy at frequencies that are similar to those used in television, FM radio, and other radio communication bands. If there is an active radio transmitter in the vicinity of the survey, then there may be some interference with the recorded signal. Most radio transmitters, however, have quite a narrow bandwidth and, if known in advance, an antenna frequency can be selected that is as far away as possible from any frequencies that might generate spurious signals in the reflected data. The wide bandwidth of most GPR systems usually makes it difficult to avoid such external transmitter effects completely, and any major adjustments in antenna frequency may affect the survey objectives. Usually, this
Figure 4. Ground-penetrating radar ray paths reflected from an undulating surface and a deep ditch: (a) minor scattering from a subsurface horizon, (b) major scattering, and (c) focusing; the vertical axes show two-way travel time. Convex upward surfaces scatter radar energy, while concave upward surfaces focus it. Very deep features tend to scatter most energy and are hard to detect using GPR.
becomes a problem only if the site is located near a military base, airport, or radio transmission antennas. Cellular phones and walkie-talkies that are in use nearby during the acquisition of GPR data can also create noise in recorded reflective data and should not be used during data collection. This type of radio ‘‘noise’’ can usually be filtered out during postacquisition data processing. Focusing and Scattering Effects Reflection from a buried surface that contains ridges or troughs can either focus or scatter radar energy, depending on its orientation and the location of the antenna on the ground surface. If a subsurface plane is slanted away from the surface antenna location or is convex upward, most energy will be reflected away from the antenna, and no reflection or a very low amplitude reflection will be recorded (Fig. 4). This is termed radar scatter. The opposite is true when the buried surface is tipping toward the antenna or is concave upward. Reflected energy in this case will be focused, and a very high-amplitude reflection derived from the buried surface would be recorded. Figure 4 is an archaeological example of the focusing and scattering effects when a narrow buried moat is bounded on one side by a trough and on the other side by a mound. Both convex and concave upward surfaces would be ‘‘illuminated’’ by the radar beam as the antenna is pulled along the ground surface. When the radar antenna is located to the left of the deep moat (Fig. 4) some of the reflections are directed to the surface antenna, but there is still some scattering, and a weak reflection will be recorded from the buried surface. When the antenna is located directly over the deep trough, there will be a high degree of scattering, and much of the
radar energy, especially that which is reflected from the sides of the moat, will be directed away from the surface antenna and lost. This scattering effect will make the narrow moat invisible in GPR surveys. When the antenna is located directly over the wider trough to the right of the moat, there will be some focusing of the radar energy that creates a higher amplitude reflection from this portion of the subsurface interface.

TWO-DIMENSIONAL GPR IMAGES

The standard image for most GPR reflective data is a two-dimensional profile that shows the depth on the ordinate and the distance along the ground on the abscissa. These image types are constructed by stacking many reflective traces together that are obtained as the antennas are moved along a transect (Figs. 1 and 5). Profile depths are usually measured in two-way radar travel time, but time can be converted to depth, if the velocity of radar travel in the ground is obtained. Reflective profiles are most often displayed in gray scale, and variations in the reflective amplitudes are measured by the depth of the shade of gray. Color palettes can also be applied to amplitudes in this format. Often, two-dimensional profiles must be corrected to reflect changes in ground elevation. Only after this is done will images correctly represent the real world. This process, which is usually important only when topographic changes are great, necessitates detailed surface mapping of each transect within the data grid and then reprocessing each transect by adjusting all reflective traces for surface elevation.
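The elevation adjustment described above can be sketched as a simple static shift of each trace. This is a minimal illustration, not any particular GPR package's algorithm; the parameter names and values are hypothetical.

```python
# Sketch of a topographic (static) correction: each trace is shifted down
# by the elevation drop relative to the highest point on the transect,
# converted from meters to two-way time samples.

def elevation_correct(traces, elevations_m, v_m_per_ns, sample_interval_ns):
    """Pad trace tops so a flat reflector appears flat after correction."""
    datum = max(elevations_m)
    corrected = []
    for trace, elev in zip(traces, elevations_m):
        drop_ns = 2.0 * (datum - elev) / v_m_per_ns  # two-way time offset
        shift = round(drop_ns / sample_interval_ns)  # offset in samples
        corrected.append([0.0] * shift + list(trace))
    return corrected

# Two traces where the second was collected 0.5 m lower (v = 0.1 m/ns,
# 1 ns sampling): the second trace is padded by 10 samples at the top.
traces = [[1.0, 2.0], [1.0, 2.0]]
corrected = elevation_correct(traces, [10.0, 9.5],
                              v_m_per_ns=0.1, sample_interval_ns=1.0)
```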
Figure 5. A vertical GPR profile perpendicular to a buried tunnel illustrating the hyperbolic reflection generated from a point source.
Standard two-dimensional images can be used for most basic data interpretation, but analysis can be tedious if many profiles are in the database. In addition, the origins of each reflection in each profile must sometimes be defined before accurate subsurface maps can be produced. Accurate image definition comes only with a good deal of interpretive experience. As an aid to reflection interpretation, two-dimensional computer models of expected buried features or stratigraphy can be produced, creating images of what things should look like in the ground for comparative purposes (1,24).

THREE-DIMENSIONAL GPR IMAGING USING AMPLITUDE ANALYSIS

The primary goal of most GPR surveys is to identify the size, shape, depth, and location of buried remains and related stratigraphy. The most straightforward way to accomplish this is by identifying and correlating important reflections within two-dimensional reflective profiles. These reflections can often be correlated from profile to profile throughout a grid, which can be very time-consuming. Another, more sophisticated type of GPR data manipulation is amplitude slice-map analysis, which creates maps of reflected wave amplitude differences within a grid. The result can be a series of maps that illustrate the three-dimensional location of reflective anomalies derived from a computer analysis of the two-dimensional profiles.
This method of data processing can be accomplished only with a computer, using GPR data that are stored digitally. The raw reflective data collected by GPR are nothing more than a collection of many individual traces along two-dimensional transects within a grid. Each of those reflective traces contains a series of waves that vary in amplitude, depending on the amount and intensity of energy reflection that occurred at buried interfaces. When these traces are plotted sequentially in standard two-dimensional profiles, the specific amplitudes within individual traces that contain important reflective information are usually difficult to visualize and interpret. The standard interpretation of GPR data, which consists of viewing each profile and then mapping important reflections and other anomalies, may be sufficient when the buried features are simple and interpretation is straightforward. In areas where the stratigraphy is complex and buried materials are difficult to discern, different processing and interpretive methods, one of which is amplitude analysis, must be used. In the past, when GPR surveys yielded reflective data that had no discernible reflections or recognizable anomalies of any sort, the survey was usually declared a failure, and little if any interpretation was conducted. With the advent of more powerful computers and sophisticated software programs that can manipulate large sets of digital data, important subsurface information in the form of amplitude changes within the reflected waves has been extracted from these types of GPR data. An analysis of the spatial distribution of the amplitudes of reflected waves is important because it is an indicator of subsurface changes in lithology or other physical properties. The greater the velocity contrast at a buried interface, the greater the amplitude of the reflected wave.
If amplitude changes can be related to important buried features and stratigraphy, the location of higher or lower amplitudes at specific depths can be used to reconstruct the subsurface in three dimensions. Areas of low-amplitude waves indicate uniform matrix material or soils, and those of high amplitude denote areas of high subsurface contrast such as buried archaeological features, voids, or important stratigraphic changes. To be correctly interpreted, amplitude differences must be analyzed in ‘‘time slices’’ that examine only changes within specific depths in the ground. Each time slice consists of the spatial distribution of all reflected wave amplitudes, which are indicative of these changes in sediments, soils, and buried materials. Amplitude time slices need not be constructed horizontally or even in equal time intervals. They can vary in thickness and orientation, depending on the questions being asked. Surface topography and the subsurface orientation of features and the stratigraphy of a site may sometimes necessitate constructing slices that are neither uniform in thickness nor horizontal. To compute horizontal time slices, the computer compares amplitude variations within traces that were recorded within a defined time window. When this is done, both positive and negative amplitudes of reflections are compared to the norm of all amplitudes within that window. No differentiation is usually made between
positive or negative amplitudes in these analyses, only the magnitude of amplitude deviation from the norm. Low-amplitude variations within any one slice denote little subsurface reflection and therefore indicate the presence of fairly homogeneous material. High amplitudes indicate significant subsurface discontinuities and in many cases detect the presence of buried features. An abrupt change between an area of low and high amplitude can be very significant and may indicate the presence of a major buried interface between two media. Degrees of amplitude variation in each time slice can be assigned arbitrary colors or shades of gray along a nominal scale. Usually, there are no specific amplitude units assigned to these color or tonal changes.
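The windowed amplitude comparison described above can be sketched as follows. This is a plausible minimal implementation (mean absolute amplitude per trace within the window), not the algorithm of any particular GPR software package:

```python
import numpy as np

def amplitude_time_slice(traces, sample_interval_ns, t_start_ns, t_end_ns):
    """Mean absolute reflection amplitude within a time window.

    traces: 2-D array (n_traces, n_samples) of reflected amplitudes.
    Returns one value per trace; the sign of each amplitude is
    ignored, since only the magnitude of deviation matters.
    """
    i0 = int(t_start_ns / sample_interval_ns)
    i1 = int(t_end_ns / sample_interval_ns)
    return np.mean(np.abs(traces[:, i0:i1]), axis=1)
```

Repeating the computation for successive windows and posting each trace's value at its surveyed grid position yields the stack of slice maps the text describes; slices of varying thickness or non-horizontal orientation simply use different index ranges per trace.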
EXAMPLES OF THREE-DIMENSIONAL GPR MAPPING USING TIME SLICES

Archaeological applications of GPR mapping have been expanding in the last decade as the prices of data acquisition and processing systems have decreased and image-producing software has improved. One area of archaeological success with GPR is the high plateau and desert country of Colorado, Utah, New Mexico, and Arizona, an area of abundant buried archaeological remains, including pit houses, kivas (semisubterranean circular pit features used for ceremonial activities), and storage pits. The climate and geological processes active in this area produce an abundance of dry sandy sediments and soil, an excellent medium for GPR energy penetration. Traditional archaeological exploration and mapping methods used for discovering buried sites include visual identification of artifacts in surface surveys, random test pit excavation, and analysis of subtle topographic features, all of which may indicate the presence of buried features. These methods can sometimes be indicative of buried sites, but they are haphazard and often lead to misidentification or nonidentification of features. At a site near Bluff, Utah, a local archaeologist used some of these techniques to map what he considered to be a large pit-house village. The area is located in the floodplain of the San Juan River, which was subjected to repeated floods during prehistoric time that often buried low-lying structures in fluvial sediment. In a grid roughly 50 × 30 meters in dimension, surface surveys had located four or five topographic depressions that appeared to be subtle expressions of pit houses in what was presumably a small buried village. Lithic debris from stone tool manufacture as well as abundant ceramic sherds were found in and around these depressions and further enhanced this preliminary interpretation.
A GPR survey was conducted over this prospective area, using paired 500-MHz antennas that recorded reflections to a maximum depth of about 2 meters (42). While data were being acquired, reflection profiles were viewed on a computer monitor and were recorded digitally. A preliminary interpretation of the raw data in the field showed no evidence of pit-house floors in the areas that contained the depressions. Surprisingly, a large distinct floor was located in one corner of the grid, an area
Figure 6. Amplitude slice-map of a layer from 1.2–1.5 meters in the ground. The anomalous high amplitude reflections in the bottom right are from the floor and features on the floor of a buried pit-house covered in river sediment.
not originally considered prospective (Fig. 6). Velocity information, obtained in a nearby pit being dug for a house foundation, was used to convert radar travel time to depth. An amplitude time-slice map was then constructed for a slice from about 1.2–1.5 meters deep, which would encompass the pit-house floor and all subfloor features. A map of the high amplitudes in this slice shows an irregularly shaped floor that has a possible antechamber and an entrance at opposite sides of the pit structure (Fig. 6). To confirm this interpretation, derived only from the GPR maps, nine core holes were dug on and around the feature. All holes dug within the mapped feature encountered a hard-packed floor covered with fire-cracked rock, ceramic sherds, and even a small bone pendant at exactly the depth predicted from the GPR maps. Those cores drilled outside the pit house, in the area of the shallow depressions originally considered the location of the houses, encountered only hard, partially cemented fluvial sediment without archaeological remains. This GPR survey demonstrates the advantages of performing GPR surveys in conjunction with typical surface topography and artifact distribution mapping. The standard methods of site exploration indicated the presence of nearby pit houses, but both the artifact distributions and the subtle depressions pointed to the wrong area. If only these indicators had been used as a guide to subsurface testing, it is doubtful that any archaeological features would have been discovered; the pit house was found only when they were used in conjunction with the GPR data. It is not known at this time what may have created the subtle depressions that were originally interpreted as pit houses. It is likely that the artifact
and lithic scatters noticed on the surface were produced by rodent burrowing, which brought these materials from depth and then concentrated them randomly across the site. A cautionary lesson about how changing conditions can affect GPR mapping was learned at this site when a second GPR survey over the known pit house was conducted a few months later, after a large rainstorm. This survey produced no significant horizontal reflections in the area of the confirmed pit house, but many random nonhorizontal reflections throughout the grid; none of them looked like house floors. These anomalous reflections were probably produced by pockets of rainwater that had been differentially retained in the sediments. At a well-known archaeological site, also near Bluff, Utah, a second GPR survey was performed in an area where a distinct surface depression indicated the presence of a Great Kiva, a large semisubterranean structure typical of Pueblo II sites in the American Southwest (42). A 30 × 40 meter GPR survey, using both 300- and 500-MHz antennas, was conducted over this feature for use as a guide to future excavation. Individual GPR profiles of both frequencies showed only a bowl-shaped feature, which appeared to be filled with homogeneous material that produced no significant reflection (Fig. 7). There were no discernible features within the depression that would correspond to floor features or possible roof support structures.
Amplitude time-slice maps were then produced for the grid in the hope that subtle changes in amplitude, not visible to the human eye in normal reflection profiles, might be present in the data. When this was completed, the slice from 1.33 to 1.54 meters in depth (Fig. 8) showed a square feature deep within the depression, which, as was later confirmed in two excavation trenches, was the wall of a deeper feature within the depression (42). The origin and function of this feature are not yet known. What can be concluded from this exercise in GPR data processing is that the computer can produce images of
Figure 7. A vertical GPR profile across a kiva (semisubterranean pit structure) in Utah, USA.
Figure 8. Two amplitude slice-maps across the buried kiva shown in Figure 7. The slice from 0.5–1.0 meters shows the circular wall of the kiva as high amplitude reflections. It is a discontinuous circle because the wall is partially collapsed. The slice from 1.25–1.5 meters shows a more nearly square feature within the kiva that was found during excavations to be interior walls.
subtle features that cannot be readily processed by the human brain. Without this type of GPR processing, this deep feature would most likely not have been discovered or excavated.

The most straightforward application of three-dimensional amplitude analysis is the search for buried utilities. Near-surface metal water pipes or electrical conduits can often be discovered by using metal detectors or magnetometers, but these methods will not work if the buried utilities are made of clay, plastic, or other nonmagnetic material. Because GPR reflections are produced at the contact between any two types of buried materials, reflections will occur from many buried nonmagnetic pipes and conduits. Tunnels and tubes filled with air are especially visible and produce very high-amplitude reflections. In Fig. 9, two amplitude slices are shown in an area where it was thought that a buried electrical conduit existed. Records were available giving the approximate depth of burial, which had taken place about 5 years before the GPR data were acquired. The actual location of the buried pipe and its orientation were not known. The conduit was immediately visible as a point-source hyperbola on the computer screen during acquisition when the antennas crossed it. Using the approximate depth from the old records and the radar travel time measured in the field, an average velocity of radar travel through the ground was calculated. Amplitude slices were then constructed from the reflective data; the lowest slice was the one most likely to include the buried conduit. The upper slices show only minor changes in amplitude, relating to changes in soil character. The pipe is easily discerned in the slice from 14 to 28 nanoseconds, and each bend is imaged. The image from this depth is somewhat complicated by reflections from the side of the trench within which the conduit was placed.
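The velocity calculation used here follows from the same round-trip geometry as depth conversion: a target at known depth d that reflects at two-way time t implies v = 2d/t. A sketch with hypothetical numbers (the survey's actual values are not given in the text):

```python
def velocity_from_known_target(depth_m, two_way_time_ns):
    """Average radar velocity inferred from a target of known depth.

    The wave travels down to the target and back, so v = 2 * d / t.
    """
    return 2.0 * depth_m / two_way_time_ns


# Hypothetical calibration: a conduit recorded in old utility plans
# at 1.0 m depth that reflects at 20 ns two-way time implies a
# velocity of about 0.1 m/ns.
v_m_per_ns = velocity_from_known_target(1.0, 20.0)
```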
Figure 9. Two amplitude slice-maps showing a buried electrical cable. The slice from 2–14 nanoseconds shows only soil changes near the surface. From 14–28 nanoseconds the cable is clearly visible as high amplitude reflections.
CONCLUSIONS AND PROSPECTS FOR THE FUTURE

Ground-penetrating radar imaging of near-surface features is still very much in its infancy. Accurate three-dimensional maps can be made from two-dimensional reflective data, both manually and using the amplitude slice-map method, but these maps are really constructed only from a series of two-dimensional data sets. Because only one transmitting and one receiving antenna are used (the typical acquisition method today), and because subsurface energy refraction, reflection, and scatter are abundant, it can sometimes be difficult to determine the exact source in the ground of many of the reflections recorded. A number of data acquisition and processing methods are being developed that may alleviate some of these problems. One simple data processing method that offers a way to remove the unwanted tails of reflective hyperbolas is data migration. If the velocity of radar travel in the ground can be calculated, the limbs of each hyperbola can be collapsed back to their source before amplitude slice maps are produced. This process is standard procedure in the seismic processing used by petroleum exploration companies. Good velocity analysis and a knowledge of the origin of hyperbolic reflections are necessary for this type of processing. A very sophisticated data acquisition method presently under development will allow acquiring true three-dimensional reflective data. Also a modification of seismic petroleum exploration methods, this procedure would place many receiving antennas on the ground within a grid; each would simultaneously ‘‘listen’’ for reflected waves and record the received signals on its own channel. One transmitting antenna would then be moved around the grid in an orderly fashion while the receiving antennas record the reflected waves from many different locations. In this way, real three-dimensional data are acquired in a method called tomography. The processing procedure, which can
manipulate massive amounts of multichannel data, is still being developed. Ever more powerful computers and the advancement of true three-dimensional data acquisition and processing will soon make it possible to produce rendered images of buried features. A number of prototype rendering programs have been developed, all of which show much promise. In the near future, clear images of buried features in the ground will be produced from GPR reflections soon after data are acquired in the field; this will allow researchers to modify acquisition parameters, recollect data while the equipment is still on location, and produce very precise maps of subsurface materials.

BIBLIOGRAPHY

1. L. B. Conyers and D. Goodman, Ground-Penetrating Radar: An Introduction for Archaeologists, AltaMira Press, Walnut Creek, CA, 1997.
2. J. L. Davis and A. P. Annan, in J. A. Pilon, ed., Ground Penetrating Radar, Geological Survey of Canada Paper 90-4, 49–56, 1992.
3. P. K. Fullagar and D. Livleybrooks, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 883–894.
4. U. Basson, Y. Enzel, R. Amit, and Z. Ben-Avraham, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 777–788.
5. S. van Heteren, D. M. Fitzgerald, and P. S. McKinlay, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 869–881.
6. H. M. Jol and D. G. Smith, Can. J. Earth Sci. 28, 1939–1947 (1992).
7. S. Deng, Z. Zuo, and W. Huilian, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 1,115–1,133.
8. L. Bjelm, Geologic Interpretation of SIR Data from a Peat Deposit in Northern Sweden, Lund Institute of Technology, Dept. of Engineering Geology, Lund, Sweden, 1980.
9. J. C. Cook, Geophysics 40, 865–885 (1975).
10. L. T. Dolphin, R. L. Bollen, and G. N. Oetzel, Geophysics 39, 49–55 (1974).
11. D. L. Moffat and R. J. Puskar, Geophysics 41, 506–518 (1976).
12. M. E. Collins, in H. Pauli and S. Autio, eds., Fourth International Conference on Ground-Penetrating Radar, June 8–13, Rovaniemi, Finland, Geological Survey of Finland Special Paper 16, 125–132, 1992.
13. J. A. Doolittle, Soil Surv. Horizons 23, 3–10 (1982).
14. J. A. Doolittle and L. E. Asmussen, in H. Pauli and S. Autio, eds., Fourth International Conference on Ground-Penetrating Radar, June 8–13, Rovaniemi, Finland, Geological Survey of Finland Special Paper 16, 139–147, 1992.
15. C. G. Olson and J. A. Doolittle, Soil Sci. Soc. Am. J. 49, 1,490–1,498 (1985).
16. R. W. Johnson, R. Glaccum, and R. Wotasinski, Soil Crop Sci. Soc. Proc. 39, 68–72 (1980).
17. S. F. Shih and J. A. Doolittle, Soil Sci. Soc. Am. J. 48, 651–656 (1984).
18. L. Beres and H. Haeni, Groundwater 29, 375–386 (1991).
19. R. A. van Overmeeren, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 1,057–1,073.
20. L. B. Conyers, Geoarchaeology 10, 275–299 (1995).
21. T. Imai, S. Toshihiko, and T. Kanemori, Geophysics 52, 137–150 (1987).
22. D. Goodman and Y. Nishimura, Antiquity 67, 349–354 (1993).
23. D. Goodman, Y. Nishimura, R. Uno, and T. Yamamoto, Archaeometry 36, 317–326 (1994).
24. D. Goodman, Geophysics 59, 224–232 (1994).
25. D. Goodman, Y. Nishimura, and J. D. Rogers, Archaeological Prospection 2, 85–89 (1995).
26. C. J. Vaughan, Geophysics 51, 595–604 (1986).
27. B. W. Bevan, Ground-Penetrating Radar at Valley Forge, Geophysical Survey Systems Inc., North Salem, NH, 1977.
28. A. P. Annan and J. L. Davis, in J. A. Pilon, ed., Ground Penetrating Radar, Geological Survey of Canada Paper 90-4, 49–55, 1992.
29. D. C. Wright, G. R. Olhoeft, and R. D. Watts, in Proceedings of the National Water Well Association Conference on Surface and Borehole Geophysical Methods, 1984, pp. 666–680.
30. G. R. Olhoeft, in Y. S. Touloukian, W. R. Judd, and R. F. Roy, eds., Physical Properties of Rocks and Minerals, McGraw-Hill, New York, 1981, pp. 257–330.
31. P. V. Sellman, S. A. Arcone, and A. J. Delaney, Cold Regions Research and Engineering Laboratory Report 83-11, 1–10 (1983).
32. A. P. Annan, W. M. Waller, D. W. Strangway, J. R. Rossiter, J. D. Redman, and R. D. Watts, Geophysics 40, 285–298 (1975).
33. A. R. von Hippel, Dielectrics and Waves, MIT Press, Cambridge, MA, 1954.
34. M. B. Dobrin, Introduction to Geophysical Prospecting, McGraw-Hill, New York, 1976.
35. R. E. Sheriff, Encyclopedic Dictionary of Exploration Geophysics, Society of Exploration Geophysicists, Tulsa, OK, 1984.
36. S. A. Arcone, J. Appl. Geophys. 33, 39–52 (1995).
37. A. P. Annan and S. W. Cosway, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 747–760.
38. J. L. Davis and A. P. Annan, Geophysical Prospecting 37, 531–551 (1989).
39. N. Engheta, C. H. Papas, and C. Elachi, Radio Sci. 17, 1,557–1,566 (1982).
40. E. Lanz, L. Jemi, R. Muller, A. Green, A. Pugin, and P. Huggenberger, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 1,261–1,274.
41. P. Huggenberger, E. Meier, and M. Beres, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 805–815.
42. L. B. Conyers and C. M. Cameron, J. Field Archaeology 25, 417–430 (1998).
H

HIGH RESOLUTION SECONDARY ION MASS SPECTROSCOPY IMAGING
charged secondary ions can be collected directly, energy-analyzed, and mass-separated according to their mass-to-charge ratio by a mass filter or spectrometer to provide mass-resolved signals adequate for SIMS microanalysis in the form of mass spectra, depth profiling, and imaging. The last of these creates two-dimensional compositional maps of the analyzed surface. The neutral sputtered atoms can also be collected and identified by various methods of postionization. Secondary electrons are also abundantly emitted in the primary ion bombardment process and provide signals suitable for imaging the surface topography or for yielding material contrast. When the primary ion probe is rastered across the sample, these signals yield images analogous to those obtained by the scanning electron microscope (SEM). It is customary to classify the conditions under which this form of ion analysis is performed into two categories: ‘‘static’’ and ‘‘dynamic’’ SIMS (1). The former refers to primary ion bombardment conditions that affect, essentially, only the topmost monolayer of the sample surface. The latter refers to conditions that perturb the near-surface layers of the sample by complex interactions of the impinging ions with the target material and lead to rapid sputter erosion of the sample. Dynamic SIMS is generally carried out at primary ion fluences higher than those employed for static SIMS. Reactive ion species are often used in dynamic SIMS to enhance secondary ion yields. Under the dynamic SIMS conditions generally necessary for SIMS imaging, the surface of the material is continually eroded at a controllable rate, and new layers of the sample are sequentially exposed. This process, which is effectively analytical tomography on a microscale, permits three-dimensional reconstruction of the chemical and structural constitution of a volume of the target object.
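The "analytical tomography" described above, in which controlled erosion exposes successive layers, maps naturally onto stacking the per-layer ion images into a volume. A minimal sketch assuming a constant, separately calibrated erosion rate; the names and numbers here are illustrative, not from the article:

```python
import numpy as np

def stack_sims_layers(layer_images, erosion_rate_nm_per_s, frame_time_s):
    """Stack sequential 2-D SIMS maps into a 3-D volume.

    layer_images: equally sized 2-D arrays, one per acquisition
    frame, ordered from the surface downward.  Returns the volume
    and the nominal depth (nm) at which each layer was acquired,
    assuming a constant erosion rate.
    """
    volume = np.stack(layer_images, axis=0)
    n = len(layer_images)
    depths_nm = np.arange(n) * erosion_rate_nm_per_s * frame_time_s
    return volume, depths_nm
```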
Before describing the methods used for SIMS image formation and discussing issues of analytical sensitivity and image resolution, it is important to recall the fundamental physical processes that govern secondary ion emission and that ultimately contribute to the availability of signals suitable for forming analytical images or maps. The feasibility of SIMS depends primarily on the sputtering yields (2) and on the ionization probabilities of the sputtered atoms (3,4). In fact, for a species A present in a sample at concentration CA, the detected secondary ion current IA can be expressed as
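The expression this sentence introduces is missing from this copy of the text. In standard treatments of SIMS, the relation among the quantities named here takes the form (a reconstruction from standard sources, not necessarily the authors' exact notation):

```latex
I_A = I_p \, Y \, \alpha^{\pm} \, C_A \, \eta
```

where \(I_p\) is the primary ion current, \(Y\) the sputtering yield, \(\alpha^{\pm}\) the ionization probability of the sputtered species A, \(C_A\) its fractional concentration, and \(\eta\) the transmission and detection efficiency of the instrument.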
RICCARDO LEVI-SETTI
KONSTANTIN L. GAVRILOV
The University of Chicago
Chicago, IL
INTRODUCTION

Several modern microanalytical techniques strive to describe the chemical composition of materials by images at ever-increasing spatial resolution and sensitivity. Digital micrographs that depict the two-dimensional distribution of selected constituents (analytical maps) across small areas of a sample are one of the most effective vehicles for recording the retrieved information quantitatively. One technique in particular, secondary ion mass spectroscopy or, more appropriately, spectrometry (SIMS) imaging, has been advanced during the last two decades to reach analytical image resolution of a few tens of nanometers. High-resolution SIMS imaging has become practical due to the development of finely focused scanning ion probes of high brightness, incorporated in scanning ion microscopes or microprobes (SIM), as will be illustrated here. SIMS images will be shown that embody a wealth of correlative interwoven information in a most compact form. These can be thought of as two-dimensional projections of an abstract multidimensional space that encompasses the spatial dimensions (physical structure), the mass or atomic number dimension (chemical structure), and, as a further variable, the concentration of each constituent in a sample (quantification). In this paper, we will review the fundamental principles that underlie the formation of SIMS images, the requirements and limitations in attaining high spatial resolution
filariasis prevalence rate versus average available surface soil moisture (Fig. 12) suggests that a critical interval of soil moisture between 0.2 and 0.5 is favorable for transmitting filariasis. Infection rates in villages whose average available soil moisture is less than 0.2 are negligible. Villages surrounded by areas of very high soil moisture availability (0.5 to 0.6) also have low rates of infection, suggesting that a critical window of soil moisture is necessary for moderate to high rates of filariasis infection (>5%). The critical soil moisture window is between 0.2 and 0.5 average available surface soil moisture, and the highest incidence of disease falls between 0.3 and 0.44. It is possible that this critical
IMAGING APPLIED TO THE GEOLOGIC SCIENCES
Figure 11. NDVI plotted against ground temperature for the Delta study area, with soil moisture contours overlaid. The parameter fr is the fractional area covered by vegetation, and Mo is the measure of available moisture.
Figure 12. Prevalence of filariasis plotted against soil moisture for 173 villages within the Nile delta study area.
available soil moisture interval reflects habitat conditions most favorable for breeding and survival of Culex pipiens mosquitoes. The prevalence rate is variable within the critical soil moisture interval, which suggests that other factors such as socioeconomic conditions, availability of pesticides to reduce mosquito populations, and availability of antifilarial drugs to treat infected humans modulate the prevalence rate. Low infection prevalence in villages whose soil moisture availability is less than 0.2 may also reflect conditions in which low humidity interferes with the transmission of infective larvae from the mosquito to the human host.

IMAGING REQUIREMENTS FOR THE NEXT DECADE

In this article, a number of application areas for imaging in geology are covered, including specific examples.
The material focuses on imaging requirements for geologic applications during the next decade for resource, hazard, and disaster assessments, monitoring effects of global change, and better understanding of relationships between pathogen dispersal and geologic controls. A key requirement for optimum use of imaging for these application areas is to continue to implement the planned program of earth-orbiting satellites. These include the Earth Observing System spacecraft that are designed to provide systematic and long-term monitoring of the earth’s atmosphere, surface, and oceans. TERRA, the first of these spacecraft, is beginning to make systematic observations after a period of instrument checkout and calibration. Key instruments on board include MODIS, an imaging spectrometer that covers the 0.41- to 14.4-micrometer wavelength region at a spatial resolution of 250–1000 m. MODIS data will be processed to a suite of standard
products that will be of great use in monitoring global change. A second instrument is ASTER, a multispectral imaging system that operates at wavelengths from 0.52–11.7 micrometers at spatial resolutions ranging from 15–30 m for the visible and reflected infrared and 90 m for the thermal infrared portions of the spectrum. ASTER data will be acquired in stereo so that detailed topographic maps can be generated. The data will be of use in a variety of studies because the standard products pertain to mineral and rock mapping, moisture mapping, and mapping vegetation at a larger scale than is possible using MODIS. A second important NASA mission is LightSAR, a planned interferometric radar system that will map the topography of earth’s landforms. These data will complement the interferometric data recently obtained during the Shuttle Radar Topography Mission (SRTM). The Tropical Rainfall Measuring Mission (TRMM) is an Earth Probe mission that will map rainfall and be of major use in hydrogeologic studies. High-altitude aerial photography will continue to be acquired and will remain a very important component of many geologic applications. What is missing from the suite of planned missions is a platform and associated instrumentation for acquiring imaging spectrometric data at high spatial resolution, where pixel sizes are 20 m or smaller. The original Earth Observing System Program had such an instrument, termed HIRES, that was meant to be complementary to MODIS and focused on mapping for geologic applications.

Acknowledgments

We thank the NASA Solid Earth Sciences and Natural Hazards Program for Goddard Space Flight Center Grant NAG 5-7613 to Washington University.
ABBREVIATIONS AND ACRONYMS

AIRSAR    airborne synthetic aperture radar
ASTER     advanced spaceborne thermal emission and reflection radiometer
AVHRR     advanced very high resolution infrared radiometer
AVIRIS    airborne visible infrared imaging spectrometer
HIRES     high resolution imaging spectrometer
HRV       high resolution visible
IR        infrared
ITCZ      intertropical convergence zone
MDC       missouri department of conservation
MODIS     moderate resolution imaging spectroradiometer
NASA      national aeronautics and space administration
NDVI      normalized difference vegetation index
NOAA      national oceanic atmospheric administration
SAR       synthetic aperture radar
SPOT      satellite probatoire pour l’observation de la terre
SRTM      shuttle radar topography mission
SVAT      soil vegetation atmosphere transfer
TIMS      thermal infrared multispectral scanner
TM        thematic mapper
TOPSAR    topographic synthetic aperture radar
TRMM      tropical rainfall measuring mission
USACE     united states army corps of engineers
USFWS     united states fish and wildlife service
UTM       universal transverse mercator
IMAGING SCIENCE IN ART CONSERVATION 25. G. K. Schalk and R. B. Jacobson, USGS Water-Resources Investigations Report 97-4110, 1997. 26. N. Izenberg et al., JGR-Planets, SIR-C Special Issue, 101, 23 149–23 167 (1996). 27. U. S. Fish and Wildlife Service (USFWS), Proposed Big Muddy National Fish and Wildlife Refuge Jameson Island and Lisbon Bottoms Units Final Environmental Assessment. Puxico, MO, 1994. 28. T. H. Schmudde, Anns. Assoc. Am. Geogr. 53, 60–73 (1963). 29. R. B. Jacobson et al., in Initial Biotic Survey of Lisbon Bottom, Big Muddy National Fish and Wildlife Refuge, V. J. Burke and D. D. Humburg, eds., U.S. Geological Survey, Biological Resources Division, Biological Science Report USGS/BRD/BSR–20000-0001, 1999. 30. J. R. Jensen, Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice-Hall, Upper Saddle River, NJ, 1996. 31. A. F. H. Goetz, B. N. Rock, and L. C. Rowan, Econ. Geol. 78, 573–590 (1983). 32. D. L. Galat et al., Bioscience 48, 721–733 (1998). 33. G. L. Miller, Monitoring Interannual Changes in the Arrow Rock Bottoms/Jameson Island/Lisbon Bottoms Missouri River Floodplain as a Result of the Floods of 1993, 1995, 1996, and 1997, M. A. Thesis. Washington University, St. Louis, MO, 1997. 34. T. E. Graedel and P. J. Crutzen, Atmosphere, Climate, and Change, Scientific American Library, New York, 1995. 35. J. Charney, P. H. Stone, and W. J. Quirk, Science 187, 434–435 (1975). 36. J. Otterman, Science 186, 531–533 (1975). 37. C. J. Tucker, H. E. Dregne, and W. W. Newcomb, Science 253, 299–301 (1991). 38. M. K. Crombie et al., Photogrammetric Eng. Remote Sensing 65, 1,401–1,409 (1999). 39. E. A. Ottesen and C. P. Ramachandran, Parasitology Today 11, 129–131 (1995). 40. R. R. Gillies and T. N. Carlson, J. Appl. Met. 34, 745–756 (1995).
IMAGING SCIENCE IN ART CONSERVATION J. S. ARNEY L. E. ARNEY Rochester Institute of Technology Rochester, NY
Museum professionals regularly apply a variety of imaging technologies to artistic and historic objects. The particular imaging technique applied is determined in part by the intended use of the image. Ordinary photography, for example, is used to illustrate the approximate appearance of objects in a museum collection, and optical microscopy is commonly used to identify pigments in an oil painting. The choice of an imaging technique is also influenced by the anatomic structure of the object. For example, traditional drying oils used as vehicles and varnishes in oil paintings fluoresce more strongly as they age. Thus, a photograph taken under ultraviolet illumination is often used to reveal and document discontinuities between original and
restored or altered regions of a painting. On occasion, imaging techniques of this kind have been used to uncover fakes and forgeries, but such applications of imaging to museum studies are rare. By far, the most important use of imaging technologies is to archive information about the object and to extract information about the structure of the object. Archiving preserves information and makes information more readily available to scholars, and structural analysis provides guidance to museum professionals in conserving and restoring objects. This article reviews imaging techniques intended for these two purposes. ARCHIVAL IMAGING TECHNIQUES Chemical Imaging Systems The primary objective of an archive is to preserve information. Making that information readily available is a secondary objective. Often this means carefully maintaining the original book, newspaper, or other object that contains the information. Careful maintenance of objects often means limiting access to the information. Limiting access is particularly important if the original object is fragile or is subject to rapid environmental degradation. Moreover, a single copy of an original document is vulnerable to catastrophic loss, as demonstrated by the conflagration of the library at Alexandria in 641 A.D. Thus, preservation of information often requires that the archivist make a copy of the original information, a process often called reformatting by archivists, or backing up by computer technologists. The most important properties of imaging technologies for archival copying are permanence, accuracy, and retrievability (1,2). The development of photographic technology during the past century has provided several techniques to the archivist for archival photocopying. 
A camera using ordinary monochrome film, processed by archival techniques (3), can make a very stable negative image of a document, for example, and multiple prints on photographic paper provide convenient access to the information and cause no additional stress on the original object. The convenience and relative stability of photographic processes has led to their widespread use by archivists. Moreover, the ability to compress the data in a document by optically reducing the size of the copy image has led to the development and extensive use of microfilm technology. Silver Halide Microfilm. Microfilms used for capturing and archiving images directly from the original object are called ‘‘source-document’’ or ‘‘first-generation’’ microfilms. They are silver halide films that have a single emulsion layer (4) and are exposed, developed, and fixed by using conventional photographic reagents. The international standard ISO 9848 describes sensitometric procedures for characterizing films used for first-generation microfilming. The standard defines the D–Log(H) curve in terms of diffuse visual density D and exposure H, defined as the time integral of luminance from a black body radiator at 2650 ± 100 K. An index of sensitivity S and an index
of gamma, called gradient G, are defined by Eqs. (1) and (2):
S = 45/Hm, (1)

G = (Dm − Dn)/[log(Hm) − log(Hn)]. (2)
The density values Dm and Dn are defined as 1.20 and 0.10 above base plus fog, and the exposures Hm and Hn are the exposures required to produce Dm and Dn, respectively. Microfilms generally have high contrast for maximum image resolution. The resulting narrow exposure latitude requires multiple test exposures when critical exposures are to be made. The high contrast of microfilm also imposes practical problems in manuscript imaging. Fragile old books and manuscripts, particularly from rare book collections, cannot be flattened easily for photographic copying. This requires illumination from almost directly overhead and makes eliminating specular reflections more difficult. Moreover, nonflattened originals require a greater depth of field, and this dictates a higher level of illumination, which is undesirable when dealing with fragile historic manuscripts. Secondary Microfilming Using Vesicular Film. Many source-document microfilms are read directly by using projection readers, but they are also used to print multiple copies of second-generation microfilms for general distribution and use. These secondary copies are often made by contact printing in a nonsilver imaging process called vesicular imaging, which is based on the photoinduced decomposition of aryl diazonium salts (4). The diazonium salt is dispersed in a polymer binder and coated on a transparent substrate. Exposure is made using an ultraviolet source, and the aryl diazonium salt decomposes to generate a latent image of nitrogen microbubbles:
ArN2+ X− + hν → ArX + N2

The latent image is developed by heat. The heat softens the polymer and expands the gas. On cooling, the expanded voids in the polymer, called vesicles, remain and act as centers for light scattering. The final image is viewed in transmitted light by a projection microfilm reader. Exposed regions are highly scattering to light and appear dark. Unlike transparent materials based on absorption of light, vesicular materials have an image density that varies significantly with the aperture of the projection optical system. For this reason, measurements of visual density of vesicular images must include the optical aperture used for the measurement. The standard is f/4.5 for vesicular microfilms. The sensitometry of vesicular microfilms is based on a radiometric D–Log(H) curve, where H is in units of joules/meter², because exposure is by ultraviolet rather than visible radiation. The sensitivity and contrast are defined in ISO 9378 in terms of Eqs. (3) and (4):

S = 1000/Hm, (3)

LER = log(Hm) − log(Hn). (4)
LER (log exposure ratio) is the practical range across which image formation occurs. In particular, Hm is the exposure to produce a visual density of 1.20 above Dmin (base plus fog density), where visual density is defined as an ISO standard visual density measured through an f/4.5 projection optical system. Similarly, Hn is the exposure to produce a density of 0.1 more than Dmin. Specialized Digital Imaging Techniques Digital electronic imaging is gradually replacing chemical imaging as an archiving tool. Digital cameras and scanners are the most common tools used to generate digital archives, and these devices are described in detail elsewhere in this encyclopedia. The imaging devices and techniques used for digital archiving are not unique, but digital archives present many new problems to the archivist in managing the archive. These problems include file security, provenance of a digital copy, copyright, and most important of all, the permanence of a digital file (5). The advantages of a digital format are driving significant development activity to solve these problems. Most museums maintain a catalog of photographs of the objects in the museum collection. Such pictorial archives serve a variety of needs such as teaching, preliminary art historical research, insurance documentation, and museum advertising. Often a photograph of an object assists art conservators in treating and restoring the object. In addition, a copy image of an object can be manipulated to illustrate the way the object may have originally looked or to illustrate the probable appearance following proposed restorative treatment. Such surrogate restorations have been done by photographic techniques (6–8), but the advent of digital imaging has significantly improved the quality of such simulated restorations. For example, fading studies of commercial colorants used in early twentieth-century color photographic processes have led to digital simulations of fading kinetics.
The digital reversal of the kinetic curves resulted in surrogate restorations of faded images (6,9,10). The ease and widespread availability of digitally archived images are clear advances over conventional photographic archives (1,2,5,11–14). Moreover, pictorial images and associated text may be archived to allow search engines to locate images by complex stylistic attributes (17–19). Permanence and long-term retrievability of digitally archived information are still a practical concern, but the need for reliable and permanent archives of digital information in nearly every technical and economic field of endeavor will drive the development of permanent material formats and long-term standards for digital storage. Moreover, because the 1's and 0's of digitally archived information are intrinsically permanent, a digital archive is not subject to fading and degradation in appearance as are traditional photographic archives.
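The fading-reversal idea mentioned above can be sketched in a few lines. The first-order kinetic model, the rate constant, and the patch density below are illustrative assumptions, not values taken from the cited fading studies (6,9,10).

```python
# Hedged sketch of "digital reversal" of fading kinetics: assume a dye
# channel fades by hypothetical first-order kinetics, D(t) = D0 * exp(-k t),
# and invert the model to estimate the original density D0.
import math

def restore_density(faded, k, years):
    """Invert first-order fading, D(t) = D0 * exp(-k * t), for D0."""
    return faded * math.exp(k * years)

k_cyan = 0.012   # hypothetical fade-rate constant, 1/year
faded = 0.85     # measured density of an aged cyan dye patch
original = restore_density(faded, k_cyan, years=60)
```

Applied per pixel and per dye channel, the same inversion yields a surrogate restoration of a faded image.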
Thus digital archiving provides an accurate and reliable record of the condition and appearance of an object, and this in turn provides more accurate and reliable information for art historical scholarship and for conservation and restoration. The attributes of an imaging system that is appropriate for capturing pictorial images of museum objects will always be at the limit of practical achievability (11). An ideal surrogate image of a museum object would have infinite spatial resolution, for example, and each pixel would contain the entire electromagnetic spectrum of that point on the image (20). Each pixel of an ideal pictorial copy of a museum object would contain all information about each point on the original object (gloss, elastic modulus, chemical composition, etc.). This is clearly not achievable, but there is no certain limit to how far one can go in developing approximations of the ideal. R&D efforts continue in the museum profession to develop techniques for pictorial archives, and often these efforts push the available imaging technology to the limit. Three examples are described following. The Charters of Freedom System. Beginning in the early 1980s, the National Archives and Records Administration of the United States began a $3 million project with the Jet Propulsion Laboratory and the Perkin-Elmer Company to develop the Charters of Freedom Monitoring System to monitor the condition of the U.S. Declaration of Independence, Constitution, and Bill of Rights, collectively called the ''Charters of Freedom'' (21). The system was completed in 1987 using the best CCD array available at the time. The CCD was mounted in a specially built camera and was thermoelectrically maintained at 18.1 °C. Optics focused the CCD array onto a 3-cm field of view, thus providing a sampling frequency of 33 mm−1. The camera was mounted on a high-precision mechanical mount, as illustrated in Fig.
1, and a highly reproducible illumination system was installed to achieve the highest possible radiometric reproducibility for capturing monochrome images. Each of the Charters of Freedom is mounted in a hermetically sealed glass case under a helium atmosphere. The glass cases are periodically removed from display at the National Archives and mounted under the camera for examination. Two illuminators project focused, rectangular slits of light on the document. The illuminators are inclined 25° from the vertical camera axis, on opposite sides. This geometry of illumination, coupled with extensive baffling within the camera barrel, was selected to minimize flare light from multiple reflections involving the several glass surfaces of the document case. The entire system was engineered to provide the highest reproducibility achievable at that time for mechanical, optical, radiometric, and electronic subsystems to monitor the Charters of Freedom and detect the possibility of minute changes in the documents. The VASARI Scanner. Visual Arts: System for Archiving and Retrieval of Images (VASARI) was a project begun in the late 1980s and is another example of a major effort to engineer an imaging system for digital archiving to
Figure 1. Approximate illustration of the U.S. National Archives Charters of Freedom monitoring system (21). (Diagram labels: granite uprights, mechanical positioner, camera, glass document case, optics table, pneumatic vibration dampers.)
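The monitoring goal of such a system, detecting minute changes between highly reproducible captures, can be sketched as thresholded frame differencing. The tiny "captures" and the threshold below are synthetic assumptions, not data from the Charters of Freedom system.

```python
# Hedged sketch of change detection between two registered monochrome
# captures: flag pixels whose gray value changed beyond a noise threshold.

def changed_pixels(before, after, threshold):
    """Coordinates of pixels whose gray value changed beyond threshold."""
    flags = []
    for i, (row_b, row_a) in enumerate(zip(before, after)):
        for j, (b, a) in enumerate(zip(row_b, row_a)):
            if abs(a - b) > threshold:
                flags.append((i, j))
    return flags

# Synthetic 4 x 4 captures of the same document region
before = [[120, 121, 119, 120] for _ in range(4)]
after = [row[:] for row in before]
after[2][1] = 104   # a simulated local change (e.g., ink loss)

flags = changed_pixels(before, after, threshold=5)
```

Only pixels exceeding the threshold are reported, which is why the radiometric and mechanical reproducibility described above matters: any capture-to-capture variation raises the threshold needed to avoid false alarms.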
Figure 2. Photo of VASARI Scanner (23).
the limit of technical capability (22,23). VASARI was led primarily by the National Gallery of London but involved a consortium of European museums. The VASARI scanner, shown in Fig. 2, was developed around a commercially available camera using a 500 × 290 CCD array and a
computer using a 386 processor. The camera was used in conjunction with an accurate mechanical stage to capture a series of subimages of regions of the original object. The subimages were then combined as a mosaic to produce a single digital image. This technique achieved a sampling frequency between 10 and 20 mm−1 across more than a square meter of the original object. The VASARI scanner also achieved very high radiometric accuracy. This was required to produce a seamless mosaic of the subimages, but it also was required to capture colorimetrically accurate data at each pixel location in the final image. Rather than using traditional trichromatic imaging in red, green, and blue, the VASARI system employed polychromatic imaging (23). Seven filters were used to control the power spectrum of the light illuminating the object during image capture. The seven filters were used in sequence to capture seven images (eight bits each) at each location of the CCD camera. The seven images provided data for an empirical but highly accurate calibration to standard CIE color coordinates. The resulting system reportedly achieved, pixel by pixel, colorimetric accuracy of ΔE∗ab = 1.1 in CIE-L∗a∗b∗ units. This is at the threshold of a color difference just noticeable in a side-by-side color comparison by the average observer under optimum viewing conditions. By comparison, conventional color photography cannot achieve better than ΔE∗ab = 10 under the best of circumstances (24). The VASARI polychromatic technique was described as a spectral imaging technique (23), but it appears to be only a trichromatic imaging system. However, Tzeng and Berns (25) showed that as few as seven spectral bands of information can provide quantitative estimates of reflection spectra for broad-spectrum colorants typically encountered in museum paintings. The MARC Process.
Methodology for Art Reproduction in Colour (MARC) was also developed at the National Gallery in London and used the same 500 × 290 CCD array used in VASARI (22). However, the MARC system was designed to be easier and faster to use without a significant loss in image information. The system focused a single image of the original object onto an image plane inside the camera. High spatial resolution was achieved by moving a metal mask that has microholes in front of the CCD array, as illustrated in Fig. 3. The mask placed one square hole in front of each pixel, so that only 1/48th of the area of each CCD detector was exposed. In this way, the CCD captured exposure information corresponding to the focused detail of the original image within the CCD. Piezoelectric positioners then shifted the metal screen to the left and/or down to place the hole in a different position in the optical image projected on the CCD. The image of this portion of the focused image was then captured. The mask was shifted, so that the holes were positioned at six horizontal and eight vertical positions on the CCD to increase the sampling frequency of the CCD array from 500 × 290 to 3,000 × 2,320. This technique is called micropositioning and is half of the procedure used in the MARC system. The second procedure used by MARC is called macropositioning. Following the 46 image captures in
Figure 3. The MARC technique of increasing sampling frequency. (Diagram labels: original painting, lens, part of metal mask, CCD array.)
the micropositioning sequence, the entire CCD array and mask are shifted to a new region of the image plane within the camera, and 46 additional images are captured. The CCD array is moved to seven horizontal and nine vertical positions to record a total of 63 frames, each frame of 46 images. Using sufficient overlap to achieve a seamless mosaic, the final image is approximately 20,000 × 20,000 pixels. From the perspective of the operator, the system is much easier to use than VASARI because the camera remains in a single position and can be pointed in any direction like a typical tripod mounted camera. However, an extremely stable mount is required to minimize loss of spatial resolution through motion artifacts. Color information is obtained from the MARC system by placing a movable RGB filter array over the CCD array, so that each image is captured in red, green, and blue. The system is calibrated against a Macbeth Color Checker Chart, and a set of polynomial coefficients is determined to convert from the camera RGB values to estimated values of CIE-XYZ. Accuracy of ΔE∗ab = 3.5 was claimed for the system. Archival Object Imaging. In addition to images of paintings, three-dimensional objects such as sculpture and ethnographic objects are included in pictorial archives in museums. Multiple photographs are often required to archive such objects. A three-dimensional, full color imaging process developed by the Canadian National Research Council (CNRC), Autonomous System Lab, can provide quantitative three-dimensional documentation of a colored object (15,16). Through a laser scanning technique, the three spatial dimensions of an object are measured, and by scanning using red, green, and blue laser beams, trichromatic color information is obtained at each point on the object. The technique, called Optical Ranging, has been adopted and used extensively by the Canadian Conservation Institute. The system is illustrated schematically in Fig. 4.
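The MARC-style color calibration mentioned above, fitting polynomial coefficients that map camera RGB values to estimated CIE XYZ, can be sketched with ordinary least squares. The chart patches below are synthetic stand-ins, not Macbeth chart measurements, and the second-order term set is an illustrative choice.

```python
# Hedged sketch of polynomial color calibration: fit coefficients mapping
# camera RGB to CIE XYZ over chart patches by least squares.
import numpy as np

def expand(rgb):
    """Second-order polynomial terms of one RGB triple."""
    r, g, b = rgb
    return [1.0, r, g, b, r * g, r * b, g * b, r * r, g * g, b * b]

rng = np.random.default_rng(0)
true_m = rng.normal(size=(10, 3))        # hidden polynomial relation
rgb_patches = rng.uniform(size=(24, 3))  # 24 synthetic chart patches
A = np.array([expand(p) for p in rgb_patches])
xyz_patches = A @ true_m                 # synthetic "measured" CIE XYZ

# Least-squares fit of the calibration coefficients
coeffs, *_ = np.linalg.lstsq(A, xyz_patches, rcond=None)
max_err = np.abs(A @ coeffs - xyz_patches).max()
```

In practice the fit residual over real chart patches, expressed as ΔE*ab after converting XYZ to L*a*b*, is the figure of merit quoted for such systems.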
The CNRC range scanner uses an RGB, He–Cd laser to allow recording color information. Range z and height y are horizontal and vertical in Fig. 4. The x dimension is determined by rotating the object around the vertical axis
or translating the object in the direction perpendicular to y, z. A single scan in the y, z plane involves moving mirror A through one scan of angle θ. Mirror A is not a beam splitter but is fully silvered on both sides. The laser beam is reflected by mirror A and by the fixed mirror B and comes to a focus on the object. The laser point is reflected diffusely from the object. An image of the laser spot is reflected through fixed mirror C and then movable mirror A and is brought to focus on the linear CCD array E. The position of the spot image r is correlated with the mirror angle θ. The position vector (y, z) maps uniquely onto the instrument vector (θ, r). By geometric calculation or empirical calibration, the mapping from (θ, r) to (y, z) can be obtained. The scan is carried out at 20 kHz as the object is translated or rotated through the y, z plane. Beam splitting into RGB signals then provides the color information, and the result is a three-dimensional color image of the object.

Figure 4. The NRC depth color scanner (15,16). (Diagram labels: laser source; mirrors A, B, and C; object; linear CCD array E.)

DIAGNOSTIC IMAGING TECHNIQUES The analysis of art and ethnographic objects can be divided into two distinct categories. The first is esthetic analysis, the traditional domain of curators and art historians. The second category is diagnostic analysis of the object, often called scientific analysis in the museum profession (26). Diagnostic techniques such as chemical analysis, viscoelastic analysis, and optical analysis provide objective information to museum technologists about the physical condition of the object. This information is used in diagnosing and treating physical and chemical conditions that threaten the permanence of museum objects. In addition, diagnostic analysis can provide information about the provenance of an object and the methodology of the artist. This review describes imaging systems and techniques used in museum diagnostic imaging.

Examination by Visible Light The technical analysis of a museum object begins with visual inspection by an art conservator. This is analogous to a clinical examination carried out by a physician. Pictorial documentation of the examination, using film or electronic cameras, is common practice among art conservators. Thus, the most straightforward technology for museum analytical imaging involves the methodologies for visual inspection. In general, the examining conservator searches for discontinuities in the object as indicators of its physical condition. Film cameras have long been used to document these examinations, but now digital imaging significantly enhances the utility of examination techniques as illustrated following.

Illumination Angle. Most museum objects are meant to be viewed in diffuse conditions of illumination, and under such conditions the object appears most pleasing esthetically. However, by examining the object under alternative geometries of viewing and illumination, clinically significant information can be obtained. The term raking illumination is used to describe the examination of objects illuminated at a low angle relative to the horizon. As illustrated in Fig. 5, raking illumination can reveal both topographic and gloss variations that are not easily seen under ordinary diffuse illumination. In this example, a damaged region that had been repaired by early restoration is seen in region (a), but raking-angle examination shows that the damage is more widely distributed and cracks and buckles are as far down as region (b). The topographic feature revealed at (c) suggests yet another small region that was restored and in-painted, leaving a slightly raised discontinuity in the paint layer.

Figure 5. Detail of oil painting on leather (a) in diffuse light and (b) in raking specular light.
Histogram Analysis. The ability to capture digital images offers a means of documenting a museum object and also of monitoring its condition and of making quantitative measurements. An early example of histogram segmentation analysis was reported by the Folger Shakespeare Library in Washington, D.C. (27). Figure 6 is an example analysis. The histogram segmentation shown in Fig. 6 was developed by the Folger Library as a quantitative tool for measuring irregular areas of loss in printed illustrations from old manuscripts. The void area can be repaired by a technique called leaf casting, in which new pulp is used to fill the voids and restore mechanical integrity. The key to a successful leaf casting treatment is to use a volume
of pulp as close as possible to that required to fill the losses exactly. Too much or too little pulp results in a less pleasing appearance and lower mechanical strength. The problem, then, is to measure the area of the loss and the thickness of the remaining leaf. The technique shown in Fig. 6 involves capturing the image of the damaged leaf using a black background that shows through the irregular void. Then a simple histogram segmentation provides a reliably accurate measure of the areas of the irregular voids. Calibration is done easily by using a black square of known area. Two-Dimensional Histogram Analysis. Two-dimensional histogram analysis has been used to measure more complex forms of degradation in daguerreotypes. This is illustrated in Fig. 7, which shows a daguerreotype illuminated diffusely by using a black background (a) and at an angle equal and opposite to the angle of viewing (b) (28). The daguerreotype photographic process involves the formation of a highly scattering, white material in regions exposed to light. The substrate for the process is a polished silver mirror. Thus, specular illumination produces a negative image (Fig. 7b) and diffuse illumination produces a positive image (Fig. 7a). This positive/negative behavior is characteristic of a healthy daguerreotype, but as shown in Fig. 8, regions of the daguerreotype that have progressively more chemical tarnish vary from this positive/negative behavior. Segmentation analysis of the two-dimensional histogram, therefore, provides a quantitative index of the degree of tarnishing suffered by the object. Moreover, the tarnished regions can be mapped back from the histogram to show the regions of the object most damaged by tarnish. Similar techniques have been applied to extract quantitative topographic information from works of art on paper and to analyze watermarks in historic papers (29,30). Microscopy
Figure 6. Intaglio print where the loss region is shown over a black background. The gray level histogram shows that the black region is 12% of the image area.
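The loss-area measurement of Fig. 6 can be sketched as a simple threshold on the gray-level histogram: pixels showing the black background count as loss, and a black square of known size calibrates the area per pixel. The tiny image and the calibration factor below are synthetic.

```python
# Hedged sketch of histogram-segmentation loss measurement: count pixels
# at or below a dark threshold and convert the count to physical area.

def loss_area(image, threshold, mm2_per_pixel):
    """Area (mm^2) of pixels at or below the dark threshold."""
    dark = sum(1 for row in image for v in row if v <= threshold)
    return dark * mm2_per_pixel

# 4 x 5 synthetic image: 0 = black background showing through the void,
# 200 = intact paper
image = [
    [200, 200, 200, 200, 200],
    [200,   0,   0, 200, 200],
    [200,   0,   0, 200, 200],
    [200, 200, 200, 200, 200],
]
# Calibration: a black square of 100 mm^2 covered 400 pixels
area = loss_area(image, threshold=50, mm2_per_pixel=100 / 400)
```

The measured area then fixes the pulp volume needed for the leaf-casting repair once the leaf thickness is known.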
Optical microscopy has long been used for materials identification and continues to be a major tool for investigating the materials in museum objects (31). A stereomicroscope that has a magnification between 10× and 100× is useful for preliminary examination of the surface characteristics of museum objects (32) and for taking microsamples for closer examination under a polarized light microscope (33–35). Samples smaller than 100 µm in diameter can generally be taken from a painting or other museum object without impacting the visual appearance of the object at all. A skilled polarized light microscopist who has museum experience can generally identify a pigment sample after a few minutes of inspection. Often a pigment, a binder, or a fiber can be identified immediately by its shape and microstructure. In addition, quantitative measurements can be made of the number of sides of a crystalline material and the angles of intersections of sides and edges. Examination by crossed polarizers often adds significantly to the observed microstructure of crystalline pigments and fiber materials, and a heated stage provides a direct measurement of melting and decomposition points. If identification is still uncertain, the
Figure 7. Daguerreotype histogram analysis: (a) image in diffuse light; (b) image in specular light.

Figure 8. Two-dimensional histogram of the images in Fig. 7 showing reflectance of the positive image Rpos versus reflectance of the negative image Rneg. The histogram population is displayed in a gray scale; dark is for high population, and white for zero.

Figure 9. Photomicrographs of a cross section of paint from an oil painting. Images are in visible light and in several infrared wavelengths (38a).
literature on microscopic analysis is filled with a battery of selective staining reagents, refractive index oils, and microchemical spot tests. Electron microscopy is a technique capable of much higher magnification than optical microscopy and provides additional information about the structure of materials on a submicron scale. Still higher magnification, approaching molecular dimensions, is achievable by atomic force microscopy. In addition, supplemental microprobes capable of chemical or elemental analysis extend the utility of microscopy for materials analysis. Optical microscopes, for example, can be combined with Fourier transform infrared
(FTIR) microprobes to analyze organic compounds (36,37). Scanning electron microscopes are often configured with an energy-dispersive X-ray fluorescence microprobe for elemental analysis (36,38). These techniques then can provide a map of the microdistribution of a material in a museum object. For example, Fig. 9 illustrates digital photomicrographs of a cross section of a paint sample taken from an oil painting. The images were captured in visible light and at a series of infrared wavelengths, and principal component analysis can be applied to extract spatially resolved information about the materials and their changes during aging. A major advantage of microscopy as an analytical tool is its general applicability. Microscopy can be used to identify inorganic as well as organic pigments, fibers,
resins, metals and corrosion products, ceramics, glass, etc. However, a major drawback of optical microscopy is the need for a trained microscopist. Several years of training and practice are required to become even a moderately skilled microscopist, and often sophisticated instrumentation is more available to art conservators than a trained microscopist.
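The principal component analysis mentioned for multiband photomicrograph stacks such as Fig. 9 can be sketched via the singular value decomposition: each pixel is treated as a vector of band values, and the components rank the spectral variation across the sample. The band images below are synthetic random data, not measurements.

```python
# Hedged sketch of PCA over a registered multiband image stack.
import numpy as np

rng = np.random.default_rng(1)
bands, h, w = 6, 32, 32
stack = rng.normal(size=(bands, h, w))   # six registered band images

pixels = stack.reshape(bands, -1).T      # one 6-vector per pixel
centered = pixels - pixels.mean(axis=0)
u, s_vals, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T                 # principal-component values
pc1_image = scores[:, 0].reshape(h, w)   # first component as an image

# Fraction of total variance carried by each component
explained = s_vals ** 2 / (s_vals ** 2).sum()
```

On real data, the leading component images often separate pigment layers or aging products that no single band shows clearly.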
Examination Using Ultraviolet and Infrared Energy Ultraviolet Illumination and Fluorescent Discontinuities. Museum objects often have a complex material composition, and examination of fluorescent patterns while illuminating an object by a mineral light is a common technique used by art conservators (39,40). Some adhesive materials fluoresce, revealing otherwise undetectable restorations in ceramic objects (41). A signature on a wooden panel, for example, may be enhanced and identified by ultraviolet examination. Drying oils, commonly used as pigment vehicles and as varnishes, dry by concurrent chemical oxidation that often produces fluorescent chromophores. Thus, oil paintings generally become more fluorescent as they age, and regions of a painting that have been damaged and repaired often appear much darker by UV examination, as illustrated in Fig. 10 (52). The most common technique is to use a color film or digital color camera to record the fluorescent image. A UV-blocking filter is placed over the camera lens to confine exposure of the film or digital sensor to visible fluorescence (42,43). A technique called ultraviolet fluorescence microscopy has been used to identify binders in different paint layers in an oil painting (44). The technique involves the application of fluorescent staining reagents to the paint cross sections. Figure 11 illustrates a paint cross section that has a canvas support, an acrylic gesso ground, zinc white in linseed oil, and egg white. Image (a) was captured while the sample was under UV illumination. Other samples were prepared using fluorescent reagents known to bind specifically to various kinds of binders. Image (b) illustrates a sample from the same painting stained with rhodamine B. From this and other stained samples, the overall composition of the painting was interpreted, as shown in drawing (c) (44). Spectral Characteristics of Pigments in the Near-Infrared.
Imaging using near-infrared radiation was introduced into museum laboratories in the 1930s when commercial IR-sensitive film became available. The limited sensitivity of early black-and-white IR films required illumination by a strong incandescent source. Nevertheless, the different near-IR spectral characteristics of the pigments in many paintings proved very useful. For example, ultramarine blue and azurite can appear very similar under visible light. However, azurite absorbs near IR strongly and thus appears very dark on positive prints made from IR-sensitive film. Ultramarine blue, however, is a poor absorber of near IR and produces much lighter images on the same positive prints. In the 1950s, the Eastman Kodak Company developed false color, near-IR film for use by the military to detect camouflage (45). This film has three emulsion layers,
Figure 10. Oil painting on a wooden panel from the workshop of Rogier van der Weyden, Netherlandish. Image (a) is by ordinary light and (b) is by UV illumination (52). The Dream of Pope Sergius, Workshop of Rogier van der Weyden (Netherlandish, 1399/1400–1464), Number 72.PB.20, The J. Paul Getty Museum, Los Angeles, CA.
each sensitive to a different part of the spectrum. Green light exposes the layer that controls yellow dye, red light exposes the layer that controls magenta dye, and IR radiation from 700 to 900 nm exposes the layer that controls cyan dye. The film was made commercially available in the 1960s and had immediate use as a tool for nondestructive identification of pigments in art. The film was much more sensitive than older black-and-white film, thus requiring much less light to capture an image, an important consideration when handling light-sensitive
IMAGING SCIENCE IN ART CONSERVATION
Figure 11. Photomicrograph illustrating the use of material-specific stains in examining the construction of an oil painting. Images (a) and (b) are by ultraviolet radiation. Image (b) was stained by rhodamine B. Image (c) was drawn based on the interpretation of fluorescent patterns in these and other stained samples (44).
museum objects. Figure 12 illustrates the use of IR false color film in examining an oil painting. Figure 12 shows a detail of a polyptych by Simone Martini, ca. 1321 (46). The images are both the red channel of digital RGB copies of color images originally captured on film. Image (a) was photographed using traditional color film, and image (b) was captured on false color IR film. The red channel illustrates primarily the difference in the behavior of the pigments in the near IR, and a damaged region is easily observed in the upper left of image (b). Examination revealed that the damaged region had undergone an early restoration and been inpainted using a pigment chosen to match the visual appearance of the painting. The pigment used in the inpainting, however, does not match the original in the near infrared. The difference can be seen easily as a large hue difference between color prints made from color film and from false color film (47).
Near-Infrared Radiation and Substructures of Oil Paintings. The optical scattering power of many materials decreases as wavelength increases. This is the reason that the sky is blue, and it is also why near-IR radiation often penetrates deeper into paint layers than visible light (48).
Figure 12. Red channel images of a detail of a polyptych by Simone Martini. (a) Image captured on traditional color film. (b) Image captured on color IR film (46).
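The hiding behavior described in the Kubelka–Munk treatment that follows (Eq. 7) can be sketched numerically. The sketch below is illustrative only: the K, S, X, and Rg values are assumptions chosen to mimic a high-scattering paint in the visible and a weakly scattering, weakly absorbing paint in the near IR, not measured paint data.

```python
import math

def km_reflectance(K, S, X, Rg):
    """Kubelka-Munk reflectance, Eq. (7): paint layer of thickness X,
    absorption K and scattering S (per unit length), over a substrate
    of reflectance Rg."""
    a = 1.0 + K / S
    b = math.sqrt(a * a - 1.0)
    coth = 1.0 / math.tanh(b * S * X)
    return (1.0 - Rg * (a - b * coth)) / (a - Rg + b * coth)

# Assumed, illustrative coefficients: high scattering in the visible,
# lower scattering and absorption in the near IR.
grounds = (0.9, 0.05)   # white gesso vs. carbon-black underdrawing
visible = [km_reflectance(K=0.5, S=50.0, X=1.0, Rg=Rg) for Rg in grounds]
near_ir = [km_reflectance(K=0.2, S=1.0, X=1.0, Rg=Rg) for Rg in grounds]

# In the visible, the two reflectances nearly coincide (complete hiding);
# in the near IR, the underdrawing substrate shows through.
print(visible)
print(near_ir)
```

At high S the computed reflectance approaches the complete-hiding limit a − b regardless of the substrate, which is exactly why an underdrawing invisible at 400–700 nm can reappear in an IR reflectogram.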
The lower scattering power of paint layers in the IR reveals an underlying structure in the painting that is not visible in the 400–700 nm range. This phenomenon is described quantitatively by the Kubelka–Munk model (49) of paint reflectance, illustrated in Fig. 13. In this model, I is the flux of light falling onto a paint surface, and J is the flux of light that returns to the surface as reflected light. The reflectance of the paint system is R = J/I and is a function of the layer thickness X, an absorption coefficient K, and a scattering coefficient S. Within a differential layer dx of the paint at any distance x from the surface, there is a flux of light I in the downward direction and a flux J in the upward direction. These fluxes are modeled as first-order absorptive and scattering processes according to Eqs. (5) and (6):

−dI/dx = KI + SI − SJ, (5)

−dJ/dx = KJ + SJ − SI. (6)

Figure 13. The Kubelka–Munk model of paint reflection.

The sign convention in Eqs. (5) and (6) is such that the distances x and X are positive, the flux values I and J are positive, and the constants K and S are positive first-order constants for absorption and scattering, respectively. These expressions show a first-order decrease in both I and J as distance increases as a result of both absorption K and scattering S. Note that scattering converts some of the downward flux I into upward flux J, as shown in the right-hand terms of these equations. Equations (5) and (6) can be solved using the boundary condition that at x = X the ratio of the flux values equals the reflectance of the substrate material, J/I = Rg. This leads to Eq. (7) for the system reflectance R = J/I:

R = [1 − Rg(a − b coth(bSX))]/(a − Rg + b coth(bSX)), (7)

where a ≡ 1 + K/S and b ≡ √(a² − 1). The absorption coefficient K is related to the concentration c of the chromophore in the paint layer and to the molar extinction coefficient ε of the chromophore by Eq. (8):

K = 2.303εc. (8)

The reflectance R of any point in the painting is a function of S, ε, c, X, and Rg. In many cases the spatial design R(y, z) is controlled by the artist by varying the amount of pigment, c(y, z), and the type of pigment, ε(y, z), in the paint layer. The artist may be guided by an underdrawing Rg(y, z) formed from carbon black applied to the top surface of the substrate. In this case, the artist intends to hide the underdrawing underneath the paint layer. If the scattering coefficient S of the paint goes to zero, Eq. (7) converges to the Beer–Lambert equation for a transparent absorbing layer on a reflective substrate. However, if the paint has a sufficiently high scattering coefficient, Eq. (7) converges to the ''complete hiding'' limit of Eq. (9):

R = a − b. (9)

The terms a and b are functions only of K and S, as defined following Eq. (7), so a high scattering paint has a reflectance that is independent of the underlying substrate Rg. Scattering materials are added to formulate a paint for complete hiding. The scattering material must have a high index of refraction compared to the surrounding medium and be present in sufficient concentration to make the overall scattering coefficient S high enough for complete hiding in a thin paint layer. Some commonly used scattering materials found in paint are rutile (TiO2), flake white (PbO2), and gypsum (CaSO4). To formulate the paint economically and to maintain good working properties, a minimum amount of scattering material is used, sufficient to achieve complete hiding in visible light. In the near IR, however, S is lower, and the paint layer often no longer completely hides. Moreover, many paints absorb less strongly in the infrared. These two effects often combine to make the underdrawing, Rg(y, z), a very significant part of the observed reflectance of the system in the near IR, so infrared imaging of paintings has become a major tool for examining painting substructures. The types of substructures in oil paintings examined by near-IR imaging techniques include underdrawing, underpainting, pentimenti, and original paint that has been masked by the accumulation of soot and grime (26,50). Underdrawings were often executed using a charcoal pigment, which is highly IR absorptive, applied to an IR-reflective, white gesso ground. The underdrawing served as a guide to the artist in executing the painting. Examination of underdrawings by art historians has led to insights into the painting styles of artists. An underpainting is a paint layer underneath another paint layer. Sometimes an artist would paint over another artist's work, either as an economy measure to recycle the support or to hide work not considered appropriate. Art patrons have on occasion hired artists to overpaint objectionable features of paintings. During the Victorian period, many a bare breast of Eve was covered in this way. A pentimento, on the other hand, is a painted detail that was overpainted by the original artist as an intentional design change during the creation of the painting. Analysis of these substructures can provide insights to art historians into the artist's working process and can provide guidance to art conservators in choosing the appropriate treatment for a painting.
Near-Infrared Radiation Imaging Techniques. Near-IR photography, also known as IR reflectography, has been
Figure 14. Spectral sensitivity of (a) typical silicon CCD camera, (b) Hamamatsu model N2606-A lead sulfide vidicon, (c) EG&G SO#16471 germanium scanning detector, and (d) Kodak model 310-21-X, PtSi thermal Schottky barrier camera (50).
used only to a limited degree in analyzing the substructures of oil paintings. With the commercial availability of vidicon cameras in the early 1960s, however, a technique called ''infrared reflectography'' became widespread (39,40,50,51). Several systems have been used extensively in museum studies, and a review of four systems has been reported (50,51). Figure 14 shows the spectral sensitivities of these systems. The simplest system to use is an ordinary silicon CCD camera that has spectral sensitivity to about 0.9 µm. Most inexpensive CCD cameras are manufactured for
use in the visible wavelengths by placing IR blocking filters over the CCD array. CCD video cameras are also readily available without IR blocking filters and can provide useful sensitivity out to 1 µm. Generally, these cameras use an IR band-pass filter, such as a Wratten 87A, to eliminate visible light. The spectral sensitivity of CCD cameras is similar to that of IR film, but the added versatility of real-time video, the flexibility of filters and illumination techniques, and the ease of digital processing make CCD video a very useful and inexpensive tool for museum applications. A lead sulfide vidicon system provides sensitivity to 1.7 µm, and in this range, many paints found on museum objects scatter radiation significantly less than they do visible light. Thus, the lead sulfide vidicon is used extensively as a tool for examining the understructures of paintings. Most infrared reflectography is carried out using CCD or vidicon cameras in real time with an ordinary video monitor. Individual images are captured from the video stream by conventional video frame grabbers. However, underdrawings often have relatively fine detail, so multiple close-up images are usually captured to document a large area. These close-up images are then combined in a mosaic to form a high resolution image of the entire region under examination, as illustrated in Fig. 15 (52). Examination of an enlarged detail in
Figure 15. Infrared reflectographic examination of ‘‘Portrait of a Man,’’ a panel painting by a master of the 1540s. Courtesy of the Fogg Art Museum, Harvard University Art Museums, The Kate, Maurice R. and Melvin R. Seiden Special Purchase Fund in honor of Ron Spronk. (a) In visible light, (b) individual IR images, (c) mosaic constructed of the individual IR images, and (d) enlarged detail (52).
Fig. 15(d) reveals underdrawing marks by the artist. In this illustration, the underdrawing appears to be in a dry medium such as black chalk. Other infrared systems are also in limited use in museum studies. The Kodak model 310-21-X PtSi thermal Schottky barrier camera, for example, is sensitive from 2.3 to 5 µm. Ambient thermal noise becomes much more significant beyond 2 µm, thus requiring a cooled detector. The Kodak system uses liquid nitrogen and thus is less convenient for use in museums. Other systems using single detectors and a scanning mode of image capture have been explored (50,53). Such systems offer greater versatility in the selection of detectors, but the experimental complexity of scanning systems compared to the ease of use of video systems limits scanning systems to major museum laboratories.
Beta (β) Radiography. β rays are energetic electrons; they have lower energy than X rays and γ rays. Because they are charged particles, β rays do not penetrate far into materials before they are captured. β rays are typically produced by placing a thin sheet of lead over the object to be irradiated and then irradiating the lead with hard X rays. The irradiated lead emits β rays that enter the object. Only very thin objects such as paper (stamps, prints, money, etc.) are examined this way. One example of the utility of β radiography is imaging watermarks in papers. Although watermarks may often be observed in transmitted visible light, they are often obscured by dark inks applied by the printer or artist. However, a β ray is absorbed in direct proportion to the total mass of material it encounters. Inks add negligible mass to paper, but a watermark significantly varies the mass of the paper. Thus, β radiography produces an image only of the watermark and the formation of the paper.
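The mass-proportional absorption of β rays can be illustrated with a toy one-dimensional model. The attenuation coefficient and areal masses below are assumed values chosen only to show why the watermark images strongly while the ink barely registers:

```python
import math

# Toy 1-D beta radiograph of a printed, watermarked sheet of paper.
# Areal masses (mg/cm^2) and the attenuation coefficient mu (cm^2/mg)
# are assumed, illustrative values.
mu = 0.05
paper = [8.0] * 10
paper[3] = paper[4] = 5.0        # watermark: locally thinner paper
ink = [0.0] * 10
ink[6] = ink[7] = 0.02           # printed ink: negligible added mass

# Beta transmission falls exponentially with the total mass traversed.
transmission = [math.exp(-mu * (p + i)) for p, i in zip(paper, ink)]

watermark_contrast = transmission[3] / transmission[0]   # clear step
ink_contrast = transmission[6] / transmission[0]         # nearly 1.0
print(watermark_contrast, ink_contrast)
```

Because transmission depends only on total mass, the thinned watermark region produces a distinct contrast step while the nearly massless ink layer changes the signal by a fraction of a percent.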
Examination Using High Energy Radiation
Several imaging techniques based on high-energy radiation have been used extensively in analyzing museum objects. Some of these techniques, such as X radiography, are well known in medical imaging, and museums often prevail upon local hospitals or dental offices for help in carrying out X-ray examinations of objects. Children's
Figure 17. ‘‘The Virgin and Child’’ from the workshop of Dirck Bouts, Netherlandish, c. 1460, oil on wood. Courtesy of the Fogg Art Museum, Harvard University Art Museums, Gift of Mrs. Jesse Isidor Straus in memory of her husband, Jesse Isidor Straus, class of 1893. (a) Visible light photograph and (b) X radiograph of the Virgin’s face (52).
Figure 16. Typical arrangement for X-radiographic analysis of museum objects.
Hospital in San Diego, for example, is well known for performing computed tomographic (CT) X-ray imaging of ethnographic and paleontological objects (54). In addition, because museum objects are not harmed by high-energy radiation to the same extent as humans, there are several X-ray and other high-energy methods of imaging that
Figure 18. Illustration of an X radiographic examination of a painting on a wood panel. (a) A black-and-white image in visible light of ‘‘The Deposition’’, attributed to Jan Erasmus Quellinus (Los Angeles County Art Museum). (b) X-radiographic image of ‘‘The Deposition’’ (55).
are useful for materials analysis and imaging of museum objects but would not be applicable to medical imaging. These include imaging using high doses of X rays, very high energy X rays, gamma rays, neutrons, and beta rays.
X Radiography of Paintings. The arrangement of museum X radiographic systems is similar to that used in medical and industrial X radiography. X rays pass through the object under study to form a shadow image on a photographic plate, as shown in Fig. 16. Unlike medical X radiography, museum X radiography does not require using a fluorescent screen to expose the film. The increased dose of X rays required to expose the film directly is not harmful to the painting, and increased resolution is obtained. Figure 17 illustrates an X radiograph captured using the arrangement shown in Fig. 16 (52). Flake white is a lead oxide used extensively as a white pigment in European and American paintings. In the example shown in Fig. 17, lead white was applied locally in the undermodeling of the Virgin's face. The ridge of the nose, the upper lip, and the eye sockets appear as highlights in the X radiograph because the lead white strongly absorbed X rays. Other pigments and cracks in the painting absorbed X rays to a lesser extent and thus appear dark on the X radiograph. Other pigments containing heavy metals, such as lead-tin yellow and vermilion, which contains mercury, absorb X rays in a manner similar to lead white. Table 1 lists the major pigments found in oil paintings and their X radiographic characteristics.
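The qualitative absorption rankings of Table 1 follow from Beer–Lambert attenuation: transmission falls exponentially with the product of the mass attenuation coefficient and the areal density of the paint layer. The coefficients below are order-of-magnitude assumptions for illustration, not tabulated values:

```python
import math

# Rough mass-attenuation coefficients (cm^2/g) at typical radiographic
# energies; order-of-magnitude assumptions, not tabulated values.
MU_RHO = {"lead white": 10.0, "iron oxide": 1.5, "organic lake": 0.25}

def transmitted_fraction(pigment, areal_density):
    """Beer-Lambert transmission through a paint layer whose areal
    density (density times thickness) is given in g/cm^2."""
    return math.exp(-MU_RHO[pigment] * areal_density)

# A thin paint layer of about 0.05 g/cm^2: the lead-based pigment stops
# far more of the beam, so it prints as a highlight on the radiograph.
for pigment in MU_RHO:
    print(pigment, round(transmitted_fraction(pigment, 0.05), 3))
```

The exponential dependence is why a heavy-metal pigment a few tens of micrometers thick can dominate a radiograph while organic lakes of the same thickness are nearly invisible.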
Figure 18 illustrates the utility of X radiography in determining the history and structure of a painting (55). The image on the left is a black-and-white photograph of an oil painting of the crucifixion on a wood panel. The image on the right is an X radiograph of the panel painting. Both images show the same area of the painting, but the right image appears to be of a sitting room scene. Further examination indicated that the sitting room scene had been painted on the wood panel first. At a later date, the crucifixion was painted on paper and adhered over the first painting. Cases like this are not uncommon and present the art conservator with the problem of deciding which painting to conserve and restore. In this instance, both paintings were recovered and restored.
Magnification Radiography. A technique called magnification radiography (57) provides some added versatility to conventional X radiography. As illustrated in Fig. 19 and Eq. (10), a magnification factor M is obtained by placing the painting some fraction of the way between the film cassette and the X-ray source. Magnification radiography has been used to examine the early stages of adhesive separation of paint layers in oil paintings (57). Adhesive loss can lead to curling and loss of paint flakes. Art conservators treat these conditions by techniques ranging from gentle heating and pressure, to application of new adhesive, to complete relining and readhesion of the paint layer onto a new support.
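The magnification geometry of Fig. 19 reduces to a one-line calculation, M = (d1 + d2)/d1, where d1 is the source-to-painting distance and d2 the painting-to-film distance:

```python
def magnification(d1, d2):
    """Radiographic magnification, Eq. (10): M = (d1 + d2) / d1, where
    d1 is the source-to-painting distance and d2 the painting-to-film
    distance (geometry of Fig. 19)."""
    return (d1 + d2) / d1

# Painting halfway between source and film doubles the image scale:
print(magnification(50.0, 50.0))   # -> 2.0
# Painting in contact with the film cassette gives no magnification:
print(magnification(100.0, 0.0))   # -> 1.0
```

Moving the painting away from the cassette trades off magnification against geometric blur from the finite focal spot, which is why modest M values are used in practice.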
Table 1. Color, Chemical Composition, and X-Ray Absorption of Artists' Pigments (a)

Color | Name | Chemical Composition | X-Ray Absorption
White | Silver white | Silver-lead carbonate | Very high
White | Zinc white | | Very high
White | Flake white | | High
White | Chinese white | | High
Yellow and orange | Chrome yellow | Lead chromate | Very high
Yellow and orange | Cadmium yellow | Cadmium sulfide | High
Yellow and orange | Zinc yellow | Zinc chromate | High
Yellow and orange | Aurora yellow | Cadmium sulfide | High
Yellow and orange | Yellow ochre | Iron oxide, alumina | Medium–high
Yellow and orange | Gamboge | Organic | Low
Yellow and orange | Naples yellow | Lead antimonide | Very high
Yellow and orange | Mars yellow | Iron oxide | Medium
Yellow and orange | Yellow lake | Organic | Low
Red | Red lead | Lead oxide | Very high
Red | Vermilion-cinnabar | Mercury sulfide | Very high
Red | Vermilion red | Iron oxide | Medium
Red | Carmine lake | Organic | Low
Red | Madder (rose, brown, purple) | Organic | Low
Brown | Florence brown | Copper cyanide | High
Brown | Mars brown | Iron oxide | Medium
Brown | Prussian brown | Iron cyanide | Medium
Brown | Sepia | Organic | Low
Blue | Cerulean blue | Cobalt stannate | High
Blue | Cobalt blue | Cobalt aluminate | Medium
Blue | Light ultramarine | Sodium sulfide | Medium
Blue | Prussian blue | Iron cyanide | Medium–high
Blue | Indigo | Organic | Low
Violet | Cobalt violet | Cobalt phosphate | Medium
Violet | Mars violet | Iron oxide | Medium–high
Violet | Mineral violet | Manganese phosphate | Medium
Green | Emerald green | Copper arsenate | High
Green | Chrome green | Chrome oxide | Medium
Green | Cobalt green | Zinc, cobalt oxide | High
Green | Green lake | Organic | Low
Gray and black | Ivory black | Calcium phosphate & organic | Medium
Gray and black | Iron black | Iron oxide | Medium–high
Gray and black | Blue black | Organic | Low
Gray and black | Lamp black | Organic | Low
Gray and black | Carbon black | Organic | Low
All colors | Acrylic | Organic | Low

(a) Ref. 56.
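The qualitative absorption column of Table 1 can be put to work directly, for instance to ask whether two pigments of similar hue should be separable on a radiograph. A minimal sketch, transcribing a few of the red entries from the table:

```python
# Qualitative X-ray absorption ranks transcribed from Table 1 (Ref. 56).
RANK = {"very high": 4, "high": 3, "medium-high": 2.5, "medium": 2, "low": 1}

RED_PIGMENTS = {
    "red lead": "very high",
    "vermilion-cinnabar": "very high",
    "vermilion red": "medium",
    "carmine lake": "low",
    "madder": "low",
}

def distinguishable(p1, p2):
    """Two similarly colored pigments can be told apart on a radiograph
    when their qualitative absorption ranks differ."""
    return RANK[RED_PIGMENTS[p1]] != RANK[RED_PIGMENTS[p2]]

print(distinguishable("vermilion-cinnabar", "carmine lake"))  # -> True
print(distinguishable("carmine lake", "madder"))              # -> False
```

This is the same reasoning a conservator applies visually: a mercury-based red and an organic red lake of matching hue give very different radiographic densities, while two organic lakes do not.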
Detailed examination by magnification radiography can guide the conservator in the choice of treatment options.
The Sensitometry of X-Ray Imaging. Medical X rays are generally in the 20 to 90 kV range (58), but museum applications involve everything from 5 to 1,000 kV, depending on the type and thickness of the material under analysis (56). For example, very soft X rays of 5 to 10 kV border the ultraviolet region of the electromagnetic spectrum and penetrate only limited depths of material. Soft X rays are useful for analyzing thin objects such as prints on paper and money. The 10 to 40 kV range is more useful for analyzing paintings on canvas or wood that have thick gesso grounds. Energies of 80 to 250 kV are used for very thick wood pieces, mummies in heavy coffins, metal sculptures, and other metal objects.
M = (d1 + d2)/d1, (10)

where d1 is the distance from the X-ray source to the painting and d2 is the distance from the painting to the film cassette (Fig. 19).
Figure 19. Schematic illustration of magnification X radiography.
Industrial X rays of 250 to 1,000 kV are used for very thick stone and metal sculptures. The characteristic curve of an X-radiological system may be measured by placing a step wedge of metal on top of the film or cassette. The metal step tablet is machined so that it has a linear increase in metal thickness, as illustrated in Fig. 20. Exposure of the film through this step tablet produces image density values in the developed film that decrease as the thickness of the metal increases. Figure 21 illustrates the resulting sensitometric curves for soft and for hard X rays. It is evident from these curves that hard X rays are more penetrating and that they also provide a broader dynamic range and lower image contrast on film. One of the factors that decreases the contrast in hard X-ray radiographs is secondary radiation emitted by the irradiated object under study. Secondary radiation is lower in energy than the primary radiation from the X-ray source. Primary radiation can be scattered from
Figure 20. Illustration of an aluminum step tablet that has steps of 1 through 8 mm thickness.
the object in random directions. The amount of scattered primary radiation and of secondary radiation increases as both the thickness of the object and the energy of the primary radiation increase. Thus, archeological and paleontological objects that require hard X rays are subject to much more secondary radiation than oil paintings. Higher contrast film and processing techniques are generally used to compensate for the resultant decrease in contrast. The electrophotographic process, which has an intrinsically high gamma and a peaked MTF for edge enhancement, overcomes much of the loss of contrast caused by secondary radiation (59,60). Today, however, the ability to digitize radiographic images provides many additional options for contrast compensation and significantly reduces the need for intrinsically high contrast detection of X rays. Another device used to decrease the effects of secondary radiation and to increase image contrast is the antidiffusion grid (56), illustrated in Fig. 22. The grid lines are made of alternating strips of lead and a low-absorbing material such as paper or aluminum. The grid lines are placed at an angle so that the direct radiation from the X-ray source falls on the film and is absorbed minimally by the grid lines. However, the grid lines significantly increase the absorption of scattered secondary radiation and diffusely reflected primary radiation. The antidiffusion grid is characterized by (a) the focal distance f, as illustrated in Fig. 22, and (b) the grid ratio, defined as the grid height divided by the spacing interval. A grid ratio of 10 (for example, a grid height of 2 mm and a spacing of 0.20 mm) is commonly used for large museum objects.
Reflex Radiography. Before the advent of the office copy machine, office copies were often made on silver halide print paper using the reflex copy technique illustrated in Fig. 23. The reflex process involves exposing through the copy emulsion onto the original document.
The light is absorbed in dark regions of the original document. However, in white regions of the original document, the light is reflected back and forth between the document and
Figure 21. Developed density D of an X-ray film versus log(L), where L is millimeters of aluminum. Data are for hard (×) and soft (○) X rays using the aluminum step tablet technique shown in Fig. 20.
Figure 22. Antidiffusion grid (56).
the emulsion. This significantly increases the probability of image formation by the emulsion. Thus, a significant exposure differential is achieved between black and white regions of the original document. The same technique can be used to take an X ray of a very thick object, as illustrated in Fig. 24 (61). The artistic design on the surface of the column varies in X-ray transparency, thus modulating the X-ray dose received by the column material. This in turn modulates the amount of secondary radiation returned to the surface of the column. Then the artistic design again modulates the absorption of the secondary radiation reaching the film. The resulting radiographic image contains information about both the structure of the paint layers and the substructure of the column itself. Such information is useful for planning the restoration and conservation of architectural art.
Nonsilver Output for X Radiography. The fluoroscope was a popular X-radiographic technique used in routine medical practice through the early 1950s, until the harmful effects of X-ray exposure were fully realized. The fluoroscope was used for real-time X-ray examination, as illustrated in Fig. 25. The system uses a zinc-cadmium sulfide screen that fluoresces visible light when struck by X rays. The fluoroscope is no longer used because the X-ray exposure is excessive and also because the image viewed on the fluorescent screen has very low visual contrast and definition (56). To overcome the poor luminosity and low contrast of traditional fluoroscopy, image intensifiers have been developed that increase luminosity by three to four orders of magnitude (56). Figure 26 is a schematic representation of the image intensifier. When X rays strike the fluorescent screen, visible light is emitted. The light then stimulates a photoconductive
Figure 23. The reflex copy technique.
Figure 24. Schematic illustration of reflex radiography, also called X-ray emissiography, applied to an artistically decorated architectural column.
Figure 25. Schematic illustration of an X-ray fluoroscope.
Figure 26. Schematic representation of the image intensifier.
cathode. A high voltage accelerates the resulting electron current, amplifying the signal. The electron current is focused imagewise on the anode, which then functions much like a CRT screen. The light output from the anode screen is brighter by several orders of magnitude than the light from the input fluorescent screen (56). The output from the image intensifier is generally captured by a video camera for real-time display on a monitor. Video capture then produces a digital radiographic image, which allows application of a vast variety of digital enhancement tools. Higher spatial resolution can be achieved by drum scanning a film radiograph, but the convenience of real-time X-radiographic examination makes the video system a very versatile tool for museum analysis.
X-Ray Fluorescent Imaging. The diffuse secondary radiation that causes decreased contrast in radiographic images (vide supra) is actually the basis of an important technique for elemental analysis. The secondary radiation is a fluorescent emission from metal ions in the irradiated material, and the energy of the fluorescent X rays is a function of the atomic number of the metal atom. X rays can be formed in situ by using a scanning electron microscope. This is the principle of the SEM microprobe technique of analysis, called X-ray fluorescence (XRF), illustrated in Fig. 27 (62). X-ray fluorescent analysis does not have to be carried out on a microscale, nor does it need to be done in a vacuum. Commercial instruments are available that can be rolled up to a painting in a museum. An X-ray source irradiates a point on the painting. Then fluorescent X rays are detected by an energy dispersive X-ray spectrophotometer. The
IMAGING SCIENCE IN ART CONSERVATION
677
Figure 27. X-ray fluorescent (XRF) probe in an electron microscope.
Figure 29. Schematic drawing of an XRF spectral imaging system that has a scanning X-ray source and an X-ray spectrometer detector.
Figure 28. Schematic illustration from data reported on an XRF analysis of a drawing by Rembrandt (63).
result is a spectrum of intensity versus X-ray energy whose peaks correspond to specific elements. Figure 28 is an illustration from data reported in a study of the elemental composition of inks used in drawings by Rembrandt; the primary X irradiation was at 20 kV (63). Art conservation scientists at the Hermitage Laboratory in Russia have developed a scanning XRF system (64). An X-ray tube using electron scanning produces an X-ray beam that can be raster scanned across a 60 × 60 mm area of a painting, as illustrated in Fig. 29. This system reportedly can produce hyperspectral images that cover the X-ray energy range from 0 to 50 keV at a spatial resolution of 0.5 mm.
Gamma (γ) Radiography. Gamma (γ) radiation is electromagnetic radiation that has higher energy than X radiation. γ radiation has been used on occasion to image very large objects opaque even to hard X rays. A γ-ray source such as iridium-192 (300 to 600 keV), cesium-137 (660 keV), or cobalt-60 (1.3 to 1.7 MeV) is contained in a capsule that has a mechanical aperture and shutter. The γ-ray source is used much like an X-ray source to expose silver halide film. Figure 30 is an example of a γ radiograph of ''Crouching Aphrodite'' from the Greek, Etruscan, and Roman antiquities department of the Louvre (56). The underlying mechanical structure of the statue is clearly visible.
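Assigning XRF peaks to elements amounts to matching measured peak energies against tabulated fluorescence lines. A minimal sketch using approximate, rounded Kα energies for the elements labeled in Fig. 28; the peak list itself is hypothetical:

```python
# Approximate K-alpha fluorescence energies (keV, rounded standard
# values) for the elements labeled in Fig. 28.
K_ALPHA = {"Si": 1.74, "K": 3.31, "Ti": 4.51, "Fe": 6.40, "Cu": 8.05}

def identify(peak_kev, tolerance=0.15):
    """Assign an XRF peak to the element with the nearest K-alpha line,
    or return None if no line lies within the tolerance (keV)."""
    nearest = min(K_ALPHA, key=lambda el: abs(K_ALPHA[el] - peak_kev))
    return nearest if abs(K_ALPHA[nearest] - peak_kev) <= tolerance else None

# Hypothetical peak list read off a spectrum like Fig. 28:
peaks = [1.78, 3.30, 4.55, 6.38, 8.10]
print([identify(p) for p in peaks])   # -> ['Si', 'K', 'Ti', 'Fe', 'Cu']
```

Real energy-dispersive spectra also contain Kβ and L lines and detector escape peaks, so production software fits the whole spectrum rather than matching isolated peaks; the nearest-line lookup above only illustrates the principle.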
Figure 30. Visible light image and γ radiograph of ‘‘Crouching Aphrodite’’ from the Greek, Etruscan, and Roman antiquities department in the Louvre (56).
Neutron Radiography. Neutron radiography is an imaging process in only limited use in the analysis of oil paintings, although it can provide far more information than conventional X radiography (65a,b,c). The reason for the limited use of this technique is that it requires a nuclear reactor to provide a source of cold neutrons. The Ward Center for Nuclear Sciences at Cornell University, Ithaca, NY, provides neutron radiographic services for art conservation (65d). However, as illustrated in Fig. 31, the configuration of the nuclear reactor places constraints on the size and physical integrity of the painting. The Cold Neutron Source facility of the 10-MW research reactor BER II was constructed with a neutron guide tube designed to accommodate oil paintings up to 120 × 120 cm (65c). Transmission neutron radiography can be performed similarly to traditional X radiography by placing a film and a suitable fluorescent screen behind the painting. The neutrons passing through the painting activate the screen to expose the film. Neutron radiography offers a unique advantage over X radiography because neutron absorption occurs by a mechanism very different from the absorption of X rays. Neutrons are not well absorbed by high atomic number materials such as lead but are absorbed strongly by low atomic number atoms such as carbon, hydrogen, and oxygen. Thus transmission neutron
radiography provides a complement to X radiography by imaging organic material (65c). A more often used technique of neutron radiography is called neutron activation autoradiography (NAAR) (65a,b). This technique involves irradiating a painting using cold neutrons to a degree sufficient to produce radioactive isotopes of atoms used in pigments. The isotopes then decay, and beta and gamma radiation are emitted. Beta rays can be used to excite a fluorescent screen which, in turn, can expose a radiographic film. The relative intensity of beta-ray emission from different metals depends on the relative abundance of the metal, the neutron capture efficiency, and the half-life. The half-lives of isotopes formed from atoms in pigments can range from minutes (²⁷Al and ⁶⁰ᵐCo) to hours (⁵⁶Mn) to days (²⁰³Hg). Many atoms form multiple isotopes whose half-lives also range from minutes to days. Thus, the relative beta-ray intensity of different isotopes varies significantly over time as the painting loses its radioactivity, and radiographic images formed on film show very significant differences, depending on the delay time between activation and exposure of the film. An excellent didactic illustration of NAAR analysis of an oil painting was published by The Ward Center for Nuclear Sciences at Cornell University, Ithaca, NY (65d). An oil painting measuring 0.229 × 0.178 m was constructed as
Figure 31. Placing a painting in the TRIGA Nuclear Reactor at the Ward Laboratory, Cornell University, Ithaca, NY (65d).
Figure 32. Construction of a test oil painting to illustrate NAAR imaging (65d).
a test sample using known materials. A sheet of 1/8 Lucite was coated with a commercial flake white oil paint (containing both white lead and zinc) as a ground layer. Next, a head was painted on the ground layer using raw umber containing manganese, as illustrated in Fig. 32. Grid lines were then drawn over the head using cobalt blue, and the grid squares were painted in with cobalt blue to cover the head image. A cadmium red paint composed of both selenium and cadmium was then used to paint a mask over both the head and the grid. Once dried, the painting was inserted into the vertical aluminum pipe of the reactor, as illustrated in Fig. 31, where it was exposed to a thermal neutron flux of 10⁹ n · s⁻¹ · cm⁻² for 20 minutes. Following the activation, the radioactive painting was removed, and a sequence of autoradiographs was made by placing it in direct contact with Polaroid type AR positive transparency film. Four films were exposed at increasing delay times after neutron activation, as shown in Table 2. Note that as time passes, the radioactivity decays and requires much longer exposure times to capture an image. The four autoradiographs are shown in Fig. 33.

Table 2. Delay Times and Film Exposure Times

Time after Activation    Film Exposure Time
15 minutes               10 minutes
2 hours                  2 hours
4 days                   2 days
36 days                  21 days

The 15-minute image shows a predominance of cobalt in the
Figure 33. Autoradiographs of the test painting in Fig. 32 made (a) 15 minutes, (b) 2 hours, (c) 4 days, and (d) 36 days after neutron activation.
Figure 34. Gamma-ray spectra (relative signal strength vs. relative gamma-ray energy) of the painting of Fig. 32 taken at different delay times.
grid and blocks. Manganese in the head and cadmium in the mask are visible but less pronounced. The presence of the cadmium is also shown by its attenuation of beta rays from the cobalt. Gamma radiation emitted by the decay of isotopes provides another indication of the composition of the painting. The energy of a gamma ray from the decay of a particular isotope is diagnostic of the isotope. Immediately before exposing film to make an autoradiograph, the test painting in Fig. 32 was examined using a gamma-ray spectrometer. The results are shown in Fig. 34. The short-lived ⁶⁰ᵐCo in the grid and blocks, isotopes of Mn in the head, and of Cd in the mask are all evident. In addition, an isotope of Al is present. Aluminum stearate, present in most of the paints tested, is commonly used as a stabilizing agent in manufacturing modern oil paints.
Figure 33(b), after the 2-hour delay time, shows the disappearance of the grid/checkerboard image. The short-lived ⁶⁰ᵐCo is no longer active, as shown in the gamma-ray spectrum, but isotopes of manganese, cadmium, and zinc now predominate. The zinc was present in the white ground layer and produces a general fogging of most of the painting's surface. Figure 33(c) shows the radiograph after a 4-day delay, where the mask is the dominant feature of the image. As shown in the corresponding spectrum in Fig. 34, both cobalt and manganese have decayed and left the cadmium isotopes in the mask as the dominant species that forms the radiographic image. Zinc and cobalt are indicated in the spectrum but are not manifested in the radiographic image. After a delay of 36 days, the gamma spectrum shows complete decay of the Cd isotopes and the emergence of isotopes of selenium, which is a component of cadmium red pigment. Thus, the radiograph in Fig. 33(d) shows features of the grid, blocks, and mask, but the head is not visible. The versatility of NAAR imaging, coupled with gamma spectroscopy, is clearly evident.
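The succession of dominant emitters in Figs. 33 and 34 follows directly from the exponential decay law: for comparable numbers of activated atoms, the activity of each isotope is proportional to (1/t½)·2^(−t/t½), so short-lived species dominate early and long-lived species dominate late. The sketch below reproduces this behavior; the specific half-life values are approximate figures assumed here for illustration (consult a nuclide chart for precise data).

```python
# Relative activity of the key isotopes at the four delay times of Table 2.
# Half-lives in hours; approximate, assumed values.
HALF_LIFE_H = {
    "Co-60m": 10.5 / 60.0,  # ~10.5 minutes
    "Mn-56": 2.58,          # ~2.6 hours
    "Cd-115": 53.5,         # ~2.2 days
    "Se-75": 120.0 * 24.0,  # ~120 days
}

def relative_activity(half_life_h: float, t_h: float) -> float:
    """Activity at time t for equal initial atom counts:
    A(t) is proportional to (1 / t_half) * 2**(-t / t_half)."""
    return (1.0 / half_life_h) * 2.0 ** (-t_h / half_life_h)

# The four delay times of Table 2, in hours.
DELAYS_H = {"15 minutes": 0.25, "2 hours": 2.0, "4 days": 96.0, "36 days": 864.0}

for label, t in DELAYS_H.items():
    activities = {iso: relative_activity(hl, t) for iso, hl in HALF_LIFE_H.items()}
    dominant = max(activities, key=activities.get)
    print(f"{label:>10}: dominant emitter ~ {dominant}")
```

With these assumed half-lives, the dominant emitter moves from ⁶⁰ᵐCo (grid) to ⁵⁶Mn (head) to Cd (mask) to Se, matching the sequence of images described above.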
Examination by Digital Image Processing
Digital image processing has had as great an impact on museum imaging as it has had on other imaging applications. The ability to enhance contrast, segment histograms, apply convolution kernels, do Fourier filtering, do morphological analysis, and perform multi-image processing has significantly improved the utility of all of the analytical imaging technologies used in the study of museum objects (66–72,74). One example demonstrates the utility of image processing for replacing the inconvenience of β radiography by simple techniques of digital video capture and optical analysis (29). Figure 35 shows images of a nineteenth-century Spanish ledger captured (a) in transmitted light and (b) in reflected light. The images were calibrated, so that the pixel values correspond to reflectance values R and transmittance values T. The transmittance of image (a) is governed by the transmittance Tp of the paper and by the absorption coefficient K of the ink [T = f(Tp, K)]. Similarly, the reflectance of image (b) is a function of these same two variables [R = g(Tp, K)]. Kubelka–Munk theory provides the two functions f and g, so inversion allows processing the two images to generate an image showing only Tp, the transmittance of the paper. This reveals the watermark where the visually obscuring ink is digitally removed, shown as (c) in Fig. 35. A unique and somewhat controversial application of digital image processing is in the stylistic analysis and authentication of works of art. Expert art historians can examine a painting visually to determine the artistic style of the work and thus identify the artist. Technical analysis often provides essential confirmation, but stylistic analysis by experts remains of key importance in museums. Some attempts have been made to develop quantitative computer analytical techniques for stylistic analysis to supplement the work of the experts.
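The two-image watermark separation described above can be illustrated with a deliberately simplified model. The sketch below does not use the actual Kubelka–Munk functions f and g of Ref. (29); it assumes a toy model in which transmitted light crosses the ink layer once (T = Tp·(1 − a)) and reflected light crosses it twice (R = R0·(1 − a)², with R0 the reflectance of clean paper), which is enough to show how two calibrated images can be combined to cancel the ink and recover Tp.

```python
import numpy as np

def paper_transmittance(T, R, r0=0.8):
    """Recover the paper transmittance Tp from calibrated transmittance
    and reflectance images, under the toy model:
        T = Tp * (1 - a),  R = r0 * (1 - a)**2,
    so (1 - a) = sqrt(R / r0) and Tp = T / sqrt(R / r0)."""
    one_minus_a = np.sqrt(np.clip(R / r0, 1e-6, 1.0))
    return T / one_minus_a

# Synthetic 4x4 'images': a watermark (varying Tp) under uniform ink.
tp_true = np.full((4, 4), 0.5)
tp_true[1:3, 1:3] = 0.7          # watermark region is more transparent
a_ink = np.full((4, 4), 0.3)     # ink layer obscuring the watermark
T_img = tp_true * (1.0 - a_ink)
R_img = 0.8 * (1.0 - a_ink) ** 2

tp_rec = paper_transmittance(T_img, R_img)
assert np.allclose(tp_rec, tp_true)   # watermark recovered, ink removed
```

The real inversion differs in the functional forms, but the design principle is identical: two independent measurements of a two-variable system permit solving for either variable pixel by pixel.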
For example, optical character recognition (OCR) has been applied successfully as a supplemental tool used by
Figure 35. Watermark analysis (29): (a) Back lit; (b) Front lit; (c) Processed.
scribes to authenticate Hebrew manuscripts for which accurate laws of calligraphy have been established (72). However, other types of hand script, executed using less rigorously governed laws of calligraphy, are much more difficult to recognize by OCR software. One would also anticipate difficulties in attempts to quantify the stylistic features of artists. Nevertheless, several publications have made the attempt. Techniques applied to the analysis of painting style have included histogram analysis (1,74), geometric analysis and element juxtaposition (75–77), and statistical regression (78–81). Although a computer substitute for the expert art historian has not been achieved, studies of this kind, coupled with analytical imaging techniques, have provided insights into the working styles of many artists. BIBLIOGRAPHY 1. See, for example, the journal, A. Hamber, J. Miles, and W. Vaughn, eds., Computers and the History of Art, Mansell, London and New York. 2. H. Besser, J. Am. Soc. Inf. Sci. 42, 589 (1991).
3. C. Bard, in C. N. Proudfoot, ed., Handbook of Photographic Science and Engineering, 2nd ed., IS&T, Springfield, VA, 1999, p. 431. 4. T. Suga, in C. N. Proudfoot, ed., Handbook of Photographic Science and Engineering, 2nd ed., IS&T, Springfield, VA, 1999, pp. 586–587. 5. D. J. Waters, Report to The Commission on Preservation and Access, Washington, D.C., 1991. 6. J. Wallace, J. Imaging Technol. 17, 107 (1991). 7. Eastman Kodak Company, White Light Printing Methods, Kodak Technical Publication CIS-22, Eastman Kodak, Rochester, NY, 1979. 8. Eastman Kodak Company, Color Corrected Duplicates from Faded Color Transparencies Using Copy Negatives of Kodak Vericolor Internegative Films 4112 & 6011, Kodak Technical Publication CIS-28, Eastman Kodak, Rochester, NY, 1981. 9. R. Gschwind and F. Frey, J. Imag. Sci. Technol. 38, 513–519 (1994). 10. H. Besser, J. Am. Soc. Inf. Sci. 42, 589–596 (1991). 11. M. Ester, Visual Resourc. 7, 327–352 (1991). 12. H. Besser, Mus. Stud. J. 74–81 (Fall/Winter 1987). 13. L. MacDonald, Adv. Imaging 24–27 (Sept 1990). 14. A. Hamber, J. Miles, and W. Vaughan, eds., Comput. Hist. Art, Mansell, London and New York, 1989. 15. R. Baribeau, M. Rioux, and G. Godin, IEEE Trans. Pattern Anal. Mach. Intelligence 14, 263–269 (1992) and references therein. 16. J. M. Taylor and I. N. M. Wainwright, in Museums and Information, Manitoba Museum of Man and Nature, Winnipeg, 1990, pp. 151–154. 17. J. Trant, Inf. Serv. Use 15, 353–364 (1995). 18. A. E. Cawkell, Inf. Serv. Use 12, 301–325 (1992). 19. M. W. Smith, INSPEL 26, 212 (1992). 20. R. Thibadeau, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 21 April 1999, http://www.ecom.cmu.edu/preservation. 21. A. R. Calmes and E. A. Miller, SPIE 901, 61–64 (1988). 22. D. Saunders, Comput. Humanities 31, 153–167 (1998). 23. A. Hamber, Comput. Hist. Art 1, 3–19 (1991). 24. N. Ohta, J. Imaging Sci. Technol. 36, 63–72 (1992). 25. D. Y. Tzeng and R. S.
Berns, Proc. 6th IS&T/SID Color Imaging Conf., 1998, p. 112. 26. C. Lahanier, Mikrochimica Acta 2, 245–254 (1991). 27. J. F. Mowery, Restaurator 12, 110–115 (1991). 28. J. S. Arney, J. M. Mauer, J. Imaging Sci. Technol. 38, 145–153 (1994). 29. J. S. Arney and D. Stewart, J. Imaging Sci. Technol. 37, 504–509 (1993). 30. J. S. Arney, D. Stewart, and R. A. Scharf, J. Imag. Sci. Technol. 39, 261–267 (1995). 31. B. Ford, I. MacLeod, and P. Haydock, Stud. Conserv. 39, 57–69 (1994). 32. R. Freere, 8th Triennial Meet. ICOM Comm. Conserv. Getty Conservation Institute, Los Angeles, 1987. 33. W. C. McCrone, J. Am. Inst. Conserv. 33, 101–104 (1994). 34. M. J. D. Low and N. S. Baer, Stud. Art Conserv. 22, 116–128 (1977). 35. W. C. McCrone, J. Int. Inst. Conserv.-Can. Group 7, 11–34 (1982). 36. J. -S. Tsang and R. H. Cunningham, J. Am. Inst. Conserv. 30, 163–177 (1991).
37. M. R. Derreck, E. F. Doehne, A. E. Parker, and D. C. Stulik, J. Am. Inst. Conserv. 33, 171–84 (1994). 38. M. H. McCormick-Goodhart, Working Group 8 of ICOM Committee for Conservation, Photogr. Rec. 1, 262–267 (1990) (a) Images provided by MOLART Research Committee, NWO, P.O. Box 93138, 2509 AC Den Haag, The Netherlands. 39. J. R. J. Van Asperen de Boer, Sci. Technol. Eur. Cult., Proc. Symp. 1989, 1991, pp. 278–83. 40. J. R. J. Van Asperen de Boer, Infrared Reflectography, Central Research Laboratory for Objects of Art and Science, Amsterdam, 1970. 41. J. J. Rorimer, Ultraviolet Rays in the Examination of Art, The Metropolitian Museum of Art, NY, 1931. 42. A. M. De Wild, The Scientific Examination of Pictures, G. Bell and Sons, London, 1929. 43. Eastman Kodak Company, Infrared and Ultraviolet Photography, Kodak Publication No. M-3, Rochester, NY, 1953. 44. J. M. Messinger II, J. Am. Inst. Conserv. 31, 267–274 (1992). 45. Eastman Kodak Company, Applied Infrared Photography, Kodak Publication No. M-28, Rochester, NY, p. 36 ff. 46. C. Hoeniger, J. Am. Inst. Conserv. 30, 115–124 (1991). 47. C. H. Olin and T. G. Carter, in IIC-American Group Technical Papers from 1968 through 1970, International Institute for Conservation-American Group, 1970, pp. 83–88. 48. R. M. Johnson and R. L. Feller, in Application of Science in Examination of Works of Art, Museum of Fine Arts, Boston, 1965, pp. 86–95. 49. P. Kubelka and F. Munk, Z. Tech. Physik. 12, 593 (1931). 50. E. Walmsley, C. Metzger, C. Fletcher, and J. K. Delaney, Stud. Conserv. 39, 217–231 (1994). 51. E. Walmsley, C. Fletcher, and J. Delaney, Stud. Conserv. 37, 120–131 (1992). 52. Images from ‘‘Looking at Paintings: A Guide to Technical Terms’’, by Dawson W. Carr and Mark Leonard, The J. Paul Getty Museum in association with British Museum Press, 1992, pp. 74–75. Technical images courtesy of the Straus Center for Conservation, Harvard University Art Museums. 53. D. Bertani et al., Stud. Conserv. 35, 113–116 (1990). 54. 
B. Manz, Adv. Imaging 44–47 (Oct. 1992). 55. J. R. Druzik, D. L. Glackin, D. L. Lynn, and R. Quiros, J. Am. Inst. for Conserv. 22, 49–56 (1982). 56. Images provided by DAMRI, CEA Saclay, 91193 Gif-sur-Yvette Cedex, France, Tel: 33 (0)1 69 08 27 09, Fax: 33 (0)1 69 08 95 29,
[email protected] 57. Eastman Kodak Company, Medical Radiography and Photography, Kodak Publication, vol. 63-1, Rochester, NY, 1987. 58. A. A. Moss, Handbook for Museum Curators, B-4, Museums Association, London, 1954. 59. R. E. Alexander and R. H. Johnston, Int. Conf. Nondestructive Testing Conserv. Works Art I/3, 1–17, 1983. 60. M. E. Scharfe, D. M. Pai and R. L. Gruber, ‘‘Electrophotography’’, Imaging Processes and Materials, J. Sturg, V. Walworth and Allan Shepp ed., 8th Ed., Ch. 5 Published by Van Nostrand Reinhold, NY, 1989, pp. 135–180. 61. S. Miura, in 8th Triennial Meet. ICOM Comm. Conserv., Getty Conservation Institute, Los Angeles, 1987. 62. I. M. Watt, Electron Microscopy, Cambridge University Press, 1997, p. 288. 63. Joice Zucker, J. Am. Inst. Conserv. 38(3) (1999). 64. A. J. Kossolapov, 9th Triennial Meet. ICOM Comm. Conserv., Getty Conservation Institute, Los Angeles, Aug. 1990, pp. 44–46.
65 (a) E. V. Sayre and H. N. Lechtman, Stud. Conserv. 13, 161–185 (1968); (b) M. W. Ainsworth et al., Art and Autoradiography: Insights into the Genesis of Paintings by Rembrandt, Van Dyck, and Vermeer, The Metropolitan Museum of Art, NY, 1982; (c) C. O. Fischer et al., Nucl. Instrum. Methods A 424, 258–262 (1999); (d) Images and supporting text provided by Ward Laboratory, Cornell University, Ithaca, NY 14853-7701. 66. D. L. Glackin and E. P. Korsmo, Jet Propulsion Laboratory, Final Report 83-75, JPL Publications, Pasadena, 1983. 67. J. R. Druzik, D. Glackin, D. Lynn, and R. Quiros, 10th Annu. Meet. Am. Inst. Conserv., 1982, pp. 71–72. 68. E. J. Wood, Textile Res. J. 60, 212–220 (1990). 69. F. Heitz, H. Maitre, and C. DeCouessin, IEEE Trans. Acous., Speech Signal Process. 38, 695–704 (1990). 70. J. Sobus, B. Pourdeyhimi, B. Xu, and Y. Ulcay, Textile Res. J. 62, 26–39 (1992). 71. K. Knox, R. Johnston, and R. L. Easton Jr., Opt. Photonics News 8, 30–34 (1997). 72. L. Likforman-Sulem, H. Maitre, and C. Sirat, Pattern Recognition 24, 121–137 (1991). 73. E. Lang, and D. Watkinson, Conserv. News 47, 37–39 (1992). 74. F. Su, OE Reports SPIE 99, 1,8–(1992). 75. J. L. Kirsch and R. A. Kirsch, Leonardo 21, 437–444 (1988). 76. J. F. Asmus, Opt. Eng. 28, 800–804 (1989). 77. R. Sablatnig, P. Kammerer, and E. Zolda, Proc. 14th Int. Conf Pattern Recognition, 1998, pp. 172–174. 78. L. R. Doyle, J. J. Lorre, and E. B. Doyle, Stud. Conserv. 31, 1–6 (1986). 79. J. Asmus, Byte Magazine March (1987). 80. P. Clogg, M. Diaz-Andreu, and B. Larkman, J. Archaeological Sci. 27, 837 (2000). 81. P. Clogg and C. Caple, Imaging the Past, British Museum Occasional Paper, London, 1996, p. 114.
IMAGING SCIENCE IN ASTRONOMY JOEL H. KASTNER Rochester Institute of Technology Rochester, NY
INTRODUCTION The vast majority of information about the universe is collected via electromagnetic radiation. This radiation is emitted by matter distributed across tremendous ranges in temperature, density, and chemical composition. Thus, more than any other science, astronomy depends on innovative methods to extend image taking to new, unexplored regions of the electromagnetic spectrum. To bring sufficient breadth and depth to their studies, astronomers also require imaging capability across a vast range in spatial resolution and sensitivity, with emphasis on achieving the highest possible resolution and signal gain in a given wavelength regime. This simultaneous quest for better wavelength coverage and ever higher spatial resolution and sensitivity represents the driving
force for innovation and discovery in astronomical imaging. Classical astronomy — for example, the search for new solar system objects and the classification of stars — is still largely conducted in the optical wavelength regime (400–700 nm). This has been the case, of course, since humans first imagined the constellations, noted the appearance of ‘‘wandering stars’’ (planets), and recorded the appearance of transient phenomena such as comets and novae. During the latter half of the twentieth century, however, a revolution in astronomical imaging took place (1). This relatively brief period in recorded history saw the development and rapid refinement of techniques for collecting and detecting electromagnetic radiation across a far broader wavelength range, from the radio through γ rays. Just as these techniques have reached maturation, astronomers have also developed the means to surmount apparently fundamental physical barriers placed on image quality, such as the distorting effects of refraction by Earth’s atmosphere and diffraction by a single telescope of finite aperture. The accelerating pace of these innovations has resulted in deeper understanding of, and heightened appreciation for, both the rich diversity of astrophysical phenomena and the fundamental, unsolved mysteries of the cosmos. OPENING THE WINDOWS: MULTIWAVELENGTH IMAGING For most of us, our eyes provide our first, fundamental contact with the universe. It is interesting to ponder how humans would conceive of the universe if we had nothing more in the way of imaging apparatus at our disposal, as was the case for astronomers before Galileo. 
In contrast to the complex cosmologies currently pondered in modern physics, most of which involve an expanding universe shadowed by the afterglow of the Big Bang, the ‘‘first contact’’ provided by our eyes produces a model of the universe that is entirely limited to the Sun, Moon, and planets, the nearby stars, and the faint glow of the collective background of stars in our own Milky Way galaxy and a handful of other, nearby galaxies. From this simple thought experiment, it is clear that the bulk of the visible radiation arriving at Earth is emitted by stars. But the apparent predominance of visible light from the Sun and nearby stars is in fact merely an accident of our particular position in the universe, combined with the evolutionary adaptation that gave our eyes maximal sensitivity at wavelengths of electromagnetic radiation that are near the maximum of the Sun's energy output. The Sun provides by far the majority of the visible radiation arriving at Earth strictly by virtue of its proximity. The brightest star in the night sky, Sirius (in the constellation Canis Major), actually has an intrinsic luminosity about 50 times larger than that of the Sun, but is about 8.6 light years distant (a light year is the distance traveled by light in one year, 9 × 10¹² km; the Sun is about 8 light minutes from Earth). In turn, Sirius is only about one-ten-thousandth as luminous as the star Rigel (in the neighboring constellation Orion),
but Sirius appears several times brighter than Rigel because it is about 50 times closer to us. Like the Sun, which has a surface temperature of about 6,000 K, most of the brightest stars have surfaces within the range of temperatures across which hot objects radiate very efficiently (if not predominantly) in the visible region. Representative stellar surface temperatures are 3,000 K for reddish Betelgeuse, a red supergiant in Orion; 10,000 K for Sirius; and 15,000 K for the blue supergiant Rigel (Fig. 1). Thermal Continuum Emission The tendency of objects at the temperatures of the Sun and stars to emit in the visible can be understood to first order via Planck's Law, which describes the wavelength dependence of radiation emitted by a perfect blackbody. The peak of the Planck function lies within the visible regime for an object at a temperature of 6,000 K. This same fundamental physical principle tells us that objects much hotter or cooler than the Sun should radiate predominantly at wavelengths much shorter or longer than visible, respectively. Indeed, for a perfect blackbody, the peak wavelength of radiation is given by Wien's displacement law (2),

λ(cm) ∼ 0.51 / T(K).    (1)

This relationship between the temperatures of objects and the wavelengths of their emergent radiation allows us to understand why Betelgeuse appears reddish and Rigel appears blue (Fig. 2).
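A quick numerical check of Eq. (1), a sketch using the constant 0.51 cm·K as written above and the stellar temperatures just quoted:

```python
# Wien's displacement law, Eq. (1): lambda_peak(cm) ~ 0.51 / T(K).
WIEN_CONSTANT_CM_K = 0.51  # value used in Eq. (1)

def peak_wavelength_nm(temperature_k: float) -> float:
    """Peak blackbody wavelength in nanometers (1 cm = 1e7 nm)."""
    return WIEN_CONSTANT_CM_K / temperature_k * 1e7

for name, temp_k in [("Betelgeuse", 3_000), ("Sun", 6_000), ("Rigel", 15_000)]:
    print(f"{name}: T = {temp_k} K -> peak ~ {peak_wavelength_nm(temp_k):.0f} nm")
```

The inverse dependence on temperature is the whole point: the cool star Betelgeuse peaks at a wavelength five times longer than the hot star Rigel, which is why one appears reddish and the other blue.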
Figure 2. Wide-field photograph of Orion, illustrating the difference in color between the relatively cool star Betelgeuse (upper left) and the hot star Rigel (lower right). The large, red object at the lower center of the image, just below Orion’s belt, is the Orion Nebula (see Fig. 7). (Photo credit: Till Credner, AlltheSky.com) See color insert.
The same, simple relationship also provides powerful insight into astrophysical processes that occur across a very wide range of energy regimes (Fig. 3). The lowest energies and hence longest (radio) wavelengths reveal ‘‘cold’’ phenomena, such as emission from dust and gas in optically opaque clouds distributed throughout interstellar space in our galaxy. At the highest energies and hence shortest wavelengths (characteristic of X rays and γ rays), astronomers probe the ‘‘hottest’’ objects, such as the explosions of supermassive stars or the last vestiges of superheated material that is about to spiral into a black hole.
Figure 1. The Hertzsprung–Russell diagram, a plot of intrinsic brightness (Sun = 1) versus stellar surface temperature (K). The diagram shows the main sequence (Sun-like stars that are fusing hydrogen to helium in their cores), red giants, supergiants, and white dwarfs. In addition, the positions of the Sun, the twelve brightest stars visible from the Northern Hemisphere, and the white dwarf companions of Sirius and Procyon are indicated [Source: NASA (http://observe.ivv.nasa.gov/nasa/core.shtml.html)]. See color insert.
Nonthermal Continuum Emission Certain radiative phenomena in astrophysics do not strongly depend on the precise temperature of the material and are instead sensitive probes of material density and/or chemical composition (3,4). For example, the emission from ‘‘jets’’ ejected from supermassive black holes at the centers of certain galaxies (Fig. 4) is said to be
Figure 3. Schematic diagram showing various regimes of the electromagnetic spectrum in terms of temperatures corresponding to emission in that regime. The diagram also illustrates the wavelength ‘‘niches’’ of NASA’s four orbiting ‘‘Great Observatories.’’ [Source: NASA/Chandra X-Ray Center (http://chandra.harvard.edu)]. See color insert.
‘‘nonthermal’’ because its source is high-velocity electrons that orbit around magnetic field lines. Other, similar examples are the emission from filaments of ionized gas located near the center of our own galaxy and from the chaotic remnant of the explosion of a massive star in 1054 A.D. (the ‘‘Crab Nebula’’). Such so-called ‘‘synchrotron radiation’’ often dominates radiation emitted in the radio wavelength regime (Fig. 5). Indeed, if human eyes were sensitive to radio rather than to visible wavelengths, the early mariners probably would have navigated by the Galactic Center and the Crab because they appear from Earth as the brightest stationary radio continuum sources in the northern sky. The synchrotron emission from the Crab is particularly noteworthy; it can be detected across a very broad wavelength range from radio through X ray (Fig. 6). Monochromatic (‘‘Line’’) Emission and Absorption Deducing Chemical Compositions. Astronomers use electronic transitions of atoms (as well as electronic, vibrational, and rotational transitions of molecules) as Rosetta stones to understand the chemical makeup of gas in a wide variety of astrophysical environments. Because each element or molecule radiates (and absorbs radiation) at a discrete and generally well-determined set of wavelengths — specified by that element’s particular subatomic structure — detection of an excess (or deficit) of
emission at one of these specific wavelengths1 is both necessary and sufficient to determine the presence of that element or molecule. Hence, our knowledge of the origin and evolution of the elements that make up the universe is derived from astronomical spectroscopy (which might also be considered multiband, one-dimensional imaging). Spectra obtained by disparate means across a very broad range of wavelengths can be used to ascertain both chemical compositions and physical conditions (i.e., temperatures and densities) of astronomical sources because the emissive characteristics of a given element depend on the physical conditions of the gas or dust in which it resides. For example, cold (100 K), largely neutral hydrogen gas emits strongly in the radio at 21 cm, whereas hot (10,000 K), largely ionized hydrogen gas emits at a series of optical wavelengths (known as the Balmer series). The former conditions are typical of the gas that permeates interstellar space in our own galaxy and in external galaxies, and the latter conditions are typical of gas in the proximity of very hot stars, which are sources of ionizing ultraviolet light. Such ionized gas also tends to glow brightly in the emission lines of heavier elements such as oxygen, nitrogen, sulfur, and iron (Fig. 7). 1 Such spectral features are called ‘‘lines,’’ because they appeared as dark lines in early spectra of the Sun.
Figure 4. At a distance of 11 million light years, Centaurus A is the nearest example of a so-called ‘‘active galaxy.’’ This radio image shows opposing ‘‘jets’’ of high energy particles blasting out from its center [Source: National Radio Astronomy Observatory (NRAO)]. See color insert.
Figure 5. The Crab Nebula is the remnant of a supernova explosion that was seen from the earth in 1054 A.D. It is 6,000 light years from Earth. This radio image shows the complex arrangement of gas filaments left in the wake of the explosion (Source: NRAO). See color insert.
Deducing Radial Velocities from Spectral Lines. Atomic and molecular emission lines also serve as probes of bulk motion. If a given source has a component of velocity along our line of sight, then its emission lines will be
Figure 6. X-ray image of the innermost region of the Crab Nebula. This image covers a field of view about one-quarter that of the radio image in the previous figure. The image shows tilted rings or waves of high-energy particles that appear to have been flung outward across a distance of a light year from the central star (Source: Chandra X-Ray Center). See color insert.
Figure 7. Color mosaic of the central part of the Great Nebula in Orion, obtained by the Hubble Space Telescope. Light emitted by ionized oxygen is shown as blue, ionized hydrogen emission is shown as green, and ionized nitrogen emission as red. The sources of ionization of the nebula are the hot, blue-white stars of the young Trapezium cluster, which is embedded in nebulosity just left of center in the image (Source: NASA and C.R. O’Dell and S.K. Wong). See color insert.
Doppler shifted away from the rest wavelength. The absorption or emission lines of sources that approach
Figure 8. Plot of recession velocity vs. distance [in megaparsecs (Mpc); 1 Mpc ≈ 3 × 10¹⁹ km] for a sample of galaxies. This figure illustrates that, to very high accuracy, the recession velocity of a distant galaxy, as measured from its redshift, is directly proportional to its distance. This correlation was first established in 1929 by Edwin Hubble and underpins the Big Bang model for the origin of the Universe (Figure courtesy Edward L. Wright, 1996).
us are shifted to shorter wavelengths and are said to be ‘‘blueshifted,’’ whereas the lines of sources moving away from us are shifted to longer wavelengths and are said to be ‘‘redshifted.’’ The observation by Hubble in 1929 that emission lines of distant galaxies are uniformly redshifted and that these redshifts increase monotonically as the distances of the galaxies increase, underpins modern theories of the expansion of the universe2 (Fig. 8). Images obtained at multiple wavelengths that span the rest wavelength of a bright spectral line can allow astronomers to deduce the spatial variation of line-of-sight velocity for a source whose velocity gradients are large. Such velocity mapping, which is presently feasible at wavelengths from the radio through the optical, helps elucidate the three-dimensional structure of sources (Fig. 9).

2 In practice, all astrophysical sources that emit line radiation — even those within our solar system — will appear Doppler shifted, due for example, to the Earth's motion around the Sun. Hence it is necessary to account properly for ‘‘local’’ sources of Doppler shifts when deducing the line-of-sight velocity component of interest.

Multiwavelength Astronomical Imaging: An Example

Planetary nebulae represent the last stages of dying, Sun-like stars. These highly photogenic nebulae are formed after the nuclear fuel at the core of a Sun-like star has been spent, that is, the bulk of the core hydrogen has been converted to helium. The exhaustion of core hydrogen and the subsequent nuclear fusion, in concentric shells, of hydrogen into helium and helium into carbon around the spent core causes the atmosphere of the star to expand, forming a red giant. Although the extended atmospheres of red giants are ‘‘cool’’ enough (∼3,000 K) for dust grains to condense out of the stellar gas, red giant luminosities can be huge (more than 10,000 times that of the Sun). This radiant energy pushes dust away from the outer atmosphere of the star at speeds of 10–20 km s⁻¹. The outflowing dust then collides with and accelerates the gas away from the star, as well. Eventually enough of the atmosphere is removed so that the hot, inert stellar core is revealed. This hot core is destined to become a fossil remnant of the original star: a white dwarf. But before the ejected atmosphere departs the scene entirely, it is ionized by the intense ultraviolet light from the emerging white dwarf, which has cooled from core nuclear fusion temperatures (10⁷ to 10⁸ K) to a ‘‘mere’’ 10⁵ K or so. The ionizing radiation from the white dwarf causes the ejected gas to fluoresce, thereby producing a planetary nebula. Because the varied conditions that characterize the evolution of planetary nebulae result in a wide variety of phenomena in any given nebula, such objects demand a multiwavelength approach to imaging. A case in point is the young planetary nebula BD +30° 3639 (Fig. 10). This planetary nebula emits strongly at wavelengths ranging from radio through X ray. The Chandra X-ray image shows a region of X-ray emission that seems to fit perfectly inside the shell of ionized and molecular gas seen in Hubble Space Telescope images and in other high-resolution images obtained from the ground. The optical and X-ray emitting regions of BD +30° 3639, which lies about 5,000 light years away, are roughly 1 million times the volume of our solar system.
The X-ray emission apparently originates in thin gas that is heated by collisions between the ‘‘new’’ wind blown by the white dwarf, which is seen at the center of the optical and infrared images, and the ‘‘old,’’ photoionized red giant wind, which appears as a shell of ∼10,000 K gas surrounding the ‘‘hot bubble’’ of X-ray emission. REQUIREMENTS AND LIMITATIONS To understand the requirements placed on spatial resolution and sensitivity in astronomical imaging, we must consider the angular sizes and energy fluxes of astronomical objects and phenomena of interest. In turn, there are three fundamental sources of limitation on the resolution and limiting sensitivity (and hence quality) of astronomical images: the atmosphere, the telescope, and the detector. Spatial Resolution Requirements: Angular Size Scales of Astronomical Sources. Figure 11 shows schematically typical scales of physical size and distance from Earth for representative objects and phenomena studied by astronomers. Most of the objects of intrinsically small size, like the Sun, Moon, and the planets in our solar system, lie at small distances; we can study these small objects in detail only because they are relatively close, such that their angular sizes are substantial.
IMAGING SCIENCE IN ASTRONOMY
[Figure 9 panels: maps plotted in R.A. (2000.0) vs. Dec. (2000.0) coordinates, spanning roughly 21h02m18.0–21h02m19.0 in right ascension and 38°41′30″–38°41′45″ in declination.]
Figure 9. Radio maps of the Egg Nebula, a dying star in the constellation Cygnus, showing emission from the carbon monoxide molecule. At the lower left is shown blueshifted CO emission, and at the lower right redshifted emission; the upper right panel shows the total intensity of CO emission from the source. One interpretation for the localized appearance of the blueshifted and redshifted CO emission is that the Egg Nebula is the source of a complex system of ‘‘molecular jets,’’ shown schematically in the top left panel. Such jets may be quite common during the dying stages of Sun-like stars [Source: Lucas et al. 2000 (5)]. See color insert.
Within our own Milky Way galaxy, we observe objects that span a great range of angular size scales. The angular size of a Sun-like star at even a modest distance makes such stars a challenge to resolve spatially, even with the best available techniques. On the other hand, many structures of interest in our own Milky Way galaxy, such as star-forming molecular clouds and the expelled remnants of dying or expired stars, are sufficiently large that their angular sizes are quite large.3 Certain giant molecular clouds, planetary nebulae, and supernova remnants subtend solid angles similar to that of the Moon. Just as for stars, the angular sizes of external galaxies span a very wide range. The Magellanic Clouds, which are the nearest members of the Local Group of galaxies (of which the Milky Way is the most massive and luminous
3 The ejected envelopes of certain dying, Sun-like stars were long ago dubbed ‘‘planetary nebulae’’ because their angular sizes and round shapes resembled the planets Jupiter and Saturn.
member), are detectable and resolvable by the naked eye, whereas the Andromeda galaxy (a Local Group member that is a near-twin to the Milky Way) is detectable and resolvable with the aid of binoculars. The angular sizes of intrinsically similar galaxies in more distant galaxy clusters span a range similar to that of the planets in our solar system. The luminous cores of certain distant galaxies (‘‘quasars’’) — which can outshine their host galaxies — likely have sizes only on the order of that of our solar system; yet these are some of the most distant objects known, and hence quasars are exceedingly small in angular size. Galaxy clusters themselves are of relatively large angular size, simply by virtue of their enormous size scales; indeed, such clusters (and larger scale structures that consist of clusters of such clusters) probably represent the largest gravitationally bound structures in the universe. At still larger size scales lies the cosmic background radiation, the radiative remnant of the Big Bang itself. This radiation encompasses 4π steradians and has only very subtle variations in intensity with position across the sky.
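The angular scales discussed above follow from the small-angle relation θ ≈ size/distance; a brief sketch using rounded textbook values for the Moon and a nearby Sun-like star:

```python
import math

ARCSEC_PER_RAD = 180 / math.pi * 3600  # ~206,265 arcseconds per radian

def angular_size_arcsec(size_km, distance_km):
    """Small-angle approximation: theta [rad] = physical size / distance."""
    return (size_km / distance_km) * ARCSEC_PER_RAD

# The Moon (diameter ~3,476 km at ~384,400 km) subtends about half a degree:
moon = angular_size_arcsec(3.476e3, 3.844e5)     # ~1,865 arcseconds

# A Sun-like star (diameter ~1.4e6 km) at just 10 light years
# (1 ly ~ 9.46e12 km) subtends only ~0.003", far below typical ~1" seeing:
star = angular_size_arcsec(1.4e6, 10 * 9.46e12)
```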
Figure 10. Optical (left), infrared (center), and X-ray (right) images of the planetary nebula BD +30° 3639 [Source: Kastner et al. 2000 (6)]. The optical image was obtained by the Wide Field/Planetary Camera 2 aboard the Hubble Space Telescope in the light of doubly ionized sulfur at a wavelength of 9,532 Å. The infrared image was obtained by the 8-meter Gemini North telescope at a wavelength of 2.2 µm (also referred to as the infrared K band). The X-ray image was obtained by the Advanced CCD Imaging Spectrometer aboard the Chandra X-Ray Observatory, and covers the wavelength range from ∼7 Å to ∼30 Å. Images are presented at the same spatial scale. See color insert.
Of course, even within our solar system, there are sources of great interest (e.g., the primordial, comet-like bodies of the Kuiper Belt) that are sufficiently small that they are unresolvable by present imaging techniques. Sources of large angular sizes (such as molecular clouds, planetary nebulae, supernova remnants, and galaxy clusters) typically show a great wealth of structural detail when imaged at high spatial resolution. Thus, our knowledge of objects at all size and distance scales improves with any increase in spatial resolving power at a given wavelength. Limitations
Atmosphere. Time- and position-dependent refraction by turbulent cells in the atmosphere causes astronomical point sources, such as stars, to ‘‘scintillate’’; i.e., stars twinkle. Scintillation occurs when previously plane-parallel wave fronts from very distant sources encounter atmospheric cells and become distorted. Astronomers use the term ‘‘seeing’’ to characterize such atmospheric image distortion; the ‘‘seeing disk’’ represents the diameter of an unresolved (point) source that has been smeared by atmospheric distortion. Seeing varies widely from site to site, but optical seeing disks at visual wavelengths are typically not smaller than (that is, the seeing is not better than) ∼1″ at most mountaintop observatories. Telescope. The diameter of a telescope places a fundamental limitation on the angular resolution at a given wavelength. Specifically, the limiting angular resolution (in radians) is given by

θ ≈ 1.2 λ/d,      (2)
where θ is the angle subtended by a resolution element, λ is the wavelength of interest, and d is the telescope diameter. This relationship follows from consideration of simple interference effects of wave fronts incident
on a circular aperture, in direct analogy to plane-parallel waves of wavelength λ incident on a single slit of size d. The resulting intensity distribution for a point source (known as the ‘‘point-spread function’’) is in fact a classical diffraction pattern, a central disk (the ‘‘Airy disk’’) surrounded by alternating bright and dark annuli. In ground-based optical astronomy using large telescopes, atmospheric scintillation usually dominates over telescope diffraction (that is, the ‘‘seeing disk’’ is much larger than the ‘‘Airy disk’’), and such a diffraction pattern is not observed. However, in space-based optical astronomy or in ground-based infrared and radio astronomy, diffraction represents the fundamental limitation on spatial resolution.
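Equation (2) is easy to evaluate; the following sketch, using rounded illustrative telescope parameters, shows why diffraction matters in space while seeing dominates on the ground:

```python
import math

ARCSEC_PER_RAD = 180 / math.pi * 3600

def diffraction_limit_arcsec(wavelength_m, diameter_m):
    """Eq. (2): theta ~ 1.2 * lambda / d, converted from radians to arcseconds."""
    return 1.2 * (wavelength_m / diameter_m) * ARCSEC_PER_RAD

# A 2.4-m space telescope at 550 nm reaches ~0.06", with no atmosphere in the way:
space_24m = diffraction_limit_arcsec(550e-9, 2.4)

# A 10-m ground-based telescope could in principle reach ~0.014" at the same
# wavelength, but uncorrected atmospheric seeing (~1") dominates instead:
ground_10m = diffraction_limit_arcsec(550e-9, 10.0)
```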
Detector. Charge-coupled devices (CCDs) have been actively used in optical astronomy for more than two decades. During this period, CCD pixel sizes have steadily decreased, and array formats have steadily grown. As a result, CCDs have remained physically compact while providing ever better spatial coverage. Detector array development at other wavelength regimes lags behind the optical, to various degrees, in number and spacing of pixels. However, almost all regimes, from X ray to radio, now employ some form of detector array. Sizes range from the suite of ten 1,024 × 1,024 X-ray-sensitive CCDs aboard the orbiting Chandra X-Ray Observatory to the 37- and 91-element bolometer arrays used for submillimeter-wave imaging by the James Clerk Maxwell Telescope on Mauna Kea. These devices have a common goal of achieving a balance between optimal (Nyquist) sampling of the point-spread function and maximal image (field) size. Sensitivity Requirements: Energy Fluxes of Astronomical Sources. Astronomical sources span an enormous range of intrinsic luminosity. Figure 12 readily shows that the least luminous objects known tend to be close to Earth (e.g.,
[Figure 11: log–log plot of radius (astronomical units, 10⁻¹⁰ to 10¹⁵ AU) vs. distance (light years, 10⁻¹⁰ to 10¹⁰ ly) for the sources named in the caption.]
Figure 11. Physical radii vs. distances (from Earth) for representative astronomical sources (7). One astronomical unit (AU) is the Earth–Sun distance (1.5 × 10⁸ km). A light year is the distance traveled by light in one year (9 × 10¹² km). Represented in the figure are objects within our own solar system, the nearby Sun-like star α Cen, the red supergiant Betelgeuse, the pulsar at the center of the Crab Nebula supernova remnant, a typical circumstellar debris disk (‘‘CS disk’’), a typical planetary nebula (the Ring Nebula), the supernova remnant Cas A, the galactic giant molecular cloud located in the direction of the constellation Cygnus (‘‘Cygnus GMC’’), the nearby Andromeda galaxy (M31), the quasar 3C 273, and the Virgo cluster of galaxies. Diagonal lines represent lines of constant angular size, and angular size decreases from upper left to lower right.
small asteroids in the inner solar system), and the most luminous sources known (e.g., the central engines of active galaxies or the primordial cosmic background radiation) are also the most distant. This tendency to detect intrinsically more luminous sources at greater distances follows directly from the expression for energy flux received at Earth,

F = L/(4π D²),      (3)
where F is the flux, L is luminosity, and D is distance. Thus an astronomical imaging system that has a limiting sensitivity F_l (that is, one that detects sources of flux F ≥ F_l) penetrates to a limiting distance

D_l ≤ [L/(4π F_l)]^(1/2),      (4)
for sources of uniform luminosity L. Real samples (of, e.g., stars or galaxies), of course, may include a wide range of intrinsic luminosities. As a result, there tends to be strong selection bias in astronomy, such that the number and/or significance of intrinsically faint objects tends to be underestimated in any sample of sources selected on the basis of minimum flux. For this reason in particular, astronomers require increasingly sensitive imaging systems. To calibrate detected fluxes properly, such systems must still retain good dynamic range, so that the intensities of faint sources can be accurately referenced to the intensities of bright, well-calibrated sources. In addition, because a given source of extended emission may display a wide variation in surface brightness, a combination of high sensitivity and good dynamic range frequently is required to characterize
[Figure 12: log–log plot of total luminosity (solar units, 10⁻¹⁵ to 10¹⁵) vs. distance (light years, 10⁻¹⁰ to 10¹⁰ ly) for the sources named in the caption.]
Figure 12. Intrinsic luminosities vs. distances (from Earth) for representative astronomical sources; symbols are the same as in the previous figure. Luminosities are expressed in solar units, where the solar luminosity is 4 × 10³³ erg s⁻¹. Diagonal lines represent lines of constant apparent brightness, and apparent brightness decreases from upper left to lower right.
source morphology adequately and, hence, deduce intrinsic source structure. Limitations
Atmosphere. The Earth’s atmosphere attenuates the signals of most astronomical sources. Signal attenuation is a function of both the path length through the atmosphere between the source and telescope and the atmosphere’s intrinsic opacity at the wavelength of interest. Atmospheric attenuation tends to be smallest at optical and longer radio wavelengths, at which the atmosphere is essentially transparent. Attenuation is largest at very short (γ ray, X ray and UV) wavelengths, where the atmosphere is essentially opaque; attenuation is also large in the infrared. In the infrared regime especially, atmospheric transparency depends strongly on wavelength because the main source of opacity is absorption by molecules (in particular, water vapor). The atmosphere also is a source of ‘‘background’’ radiation at most wavelengths, particularly in the thermal infrared and far-infrared (2 µm ≤ λ ≤ 1 mm), at which
most of the blackbody radiation of the atmosphere emerges. This background radiation tends to limit the signal-to-noise ratio of infrared observations for which other noise sources (such as detector noise) are minimal. Elimination of thermal radiation from the atmosphere provides a primary motivation for the forthcoming Space Infrared Telescope Facility (SIRTF), the last in NASA’s line of Great Observatories.
Telescope. Sensitivity (or image signal-to-noise ratio) is directly proportional to the collecting area and efficiency of the telescope optical surfaces (‘‘efficiency’’ here refers to the fraction of photons incident on the telescope optical surface that are transmitted to the camera or detector).4 Reflecting telescopes supplanted refracting telescopes at the beginning of the twentieth century because large primary mirrors could be supported more easily than large
4 The product of telescope collecting area and efficiency is referred to as the effective area of the telescope.
objective lenses and the aluminized surface of a mirror provides nearly 100% efficiency at optical wavelengths. Furthermore, unlike lenses, paraboloid mirrors provide images that are free of spherical or chromatic aberrations. These same mirrors provide excellent efficiency and image quality in the near-infrared, as well. Parabolic reflectors are also used as the primary radiation collecting surfaces in the radio regime, where the requirements of mirror figure are less stringent (due to the relatively large wavelengths of interest).
Detector. The photon counting efficiency of a detector and sources of noise within the detector also dictate the image signal-to-noise ratio. Photon counting efficiency is usually referred to as detector quantum efficiency (QE). Detector QEs at or higher than 80% are now feasible in many wavelength regimes; however, such high QE often comes at the price of the introduction of noise. Typical image noise sources are read noise, the inherent uncertainty in the signal readout of the detector, and dark signal, the signal registered by the detector in the absence of exposure to photons from an external source. Surmounting the Obstacles Beating the Limitations of the Atmosphere: Adaptive Optics and Space-Based Imaging. Adaptive optics techniques have been developed to mitigate the effects of atmospheric scintillation. In such systems, the image of a fiducial point source — either a bright star or a laser-generated artificial ‘‘star’’ — is continuously monitored, and these data are used to drive a quasi-real-time image correction system (typically a deformable or steerable mirror). Naturally — as has been demonstrated by the spectacular success of the refurbished Hubble Space Telescope — placement of the telescope above the Earth’s atmosphere provides the most robust remedy for the effects of atmospheric image distortion. Beating the Limitations of Aperture: Interferometry. The diffraction limit of a single telescope can be surmounted by using two or more telescopes in tandem. This technique is referred to as ‘‘interferometry’’ because it uses the interference patterns produced by combination of light waves from multiple sources. Therefore, the angular resolution of such a multiple telescope system, at least in one dimension, is limited by the longest separation between telescopes, rather than by the aperture of a single telescope. 
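An interferometer’s resolution follows the same diffraction relation, with the longest baseline B taking the place of the single-telescope aperture d; a sketch with rounded, VLA-like values (the 2-cm wavelength and 36-km maximum baseline are illustrative assumptions):

```python
import math

ARCSEC_PER_RAD = 180 / math.pi * 3600

def interferometer_resolution_arcsec(wavelength_m, baseline_m):
    """Angular resolution set by the longest telescope separation: theta ~ lambda / B."""
    return (wavelength_m / baseline_m) * ARCSEC_PER_RAD

# A radio array observing at 2 cm with ~36-km maximum baselines resolves ~0.1",
# comparable to space-based optical imaging:
radio = interferometer_resolution_arcsec(0.02, 36e3)
```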
However, it is generally not possible to ‘‘fill in’’ the gaps between two telescopes at large separation by using many telescopes at smaller separation. As a result, interferometry is generally limited to relatively bright sources, and interferometric image reconstruction techniques necessarily sacrifice information at low spatial frequencies (i.e., large-scale structure) in favor of recovering information at high spatial frequency (fine spatial structure). Interferometry has long been employed at radio wavelengths because recombination of signals from multiple apertures is relatively easy at long wavelengths. Indeed, the angular resolution achieved routinely at centimeter wavelengths by NRAO’s Very Large Array in New Mexico rivals or
exceeds that of optical imaging by the Hubble Space Telescope. Recently, however, several optical and infrared interferometers have been developed and successfully deployed; examples include the Navy Prototype Optical Interferometer at Anderson Mesa and the Infrared Optical Telescope Array on Mt. Hopkins, both in Arizona, and the optical interferometer operated at Mt. Wilson, California, by the Center for High Angular Resolution Astronomy. Beating the Limitations of Materials: Mirror Fabrication. The sheer weight of monolithic, precision-ground mirrors and the difficulty of maintaining the requisite precise figures render them impractical for constructing telescope apertures larger than about 8 meters in diameter. Hence, during the late 1980s and early 1990s, two competing large mirror fabrication technologies emerged: spin-cast and segmented mirrors (Fig. 13). Both methods have yielded large mirrors that are far lighter and more flexible than previously feasible. The former method has yielded the 8-meter-class mirrors for facilities such as the twin Gemini telescopes, and the latter method has yielded the largest mirrors thus far, for the twin 10-meter Keck telescopes on Mauna Kea. It is not clear, however, that either technique can yield optical-quality mirrors larger than about 15 meters in diameter. An entirely different mirror fabrication approach is required at high energies because, for example, X rays are readily absorbed (rather than reflected) by aluminized glass mirrors when such mirrors are used at near-normal incidence. The collection and focusing of X-ray photons instead requires grazing incidence geometry to optimize efficiency and nested mirrors to optimize collecting surface (Fig. 14). The challenge now faced by high-energy astronomers is to continue to increase the effective area of such optical systems while meeting the strict weight requirements imposed by space-based observing platforms.
It is not clear that facilities larger than the present Chandra and XMM-Newton observatories are practical given present fabrication technologies; indeed, Chandra was the heaviest payload ever launched aboard a NASA Space Shuttle. THE SHAPE OF THINGS TO COME Projects in Progress At this time, several major new astronomical facilities are partially or fully funded and are either in design or under construction. All are expected to accelerate further the steady progress in our understanding of the universe. A comprehensive list is beyond the scope of this article; however, we mention a few facilities of note. • The Space Infrared Telescope Facility (SIRTF): SIRTF is a modest-aperture (0.8 m) telescope equipped with instruments of extraordinary sensitivity for observations in the 3 to 170 µm wavelength regime. SIRTF features a powerful combination of sensitive, wide-field imaging and spectroscopy at low to moderate resolution over this wavelength range. It is well equipped to study (among many other things) primordial galaxies, newborn
Figure 13. Photo of the segmented primary mirror of the 10-meter Keck telescope (Photo credit: Andrew Perala and W.M. Keck Observatory). See color insert.

[Figure 14 schematic labels: incoming X rays are doubly reflected, first off four nested paraboloids and then off four nested hyperboloids, onto the focal surface 10 meters away. Mirror elements are 0.8 m long and from 0.6 m to 1.2 m in diameter; field of view, 5°.]
Figure 14. Geometry of the nested mirrors aboard the orbiting Chandra X-Ray Observatory [Source: NASA/Chandra X-Ray Center (http://chandra.harvard.edu)]. See color insert.
stars and planets, and dying stars because all of these phenomena emit strongly in the mid- to far-infrared. SIRTF has a projected 5-year lifetime and is expected to be deployed into its Earth-trailing orbit in 2002. • The Stratospheric Observatory for Infrared Astronomy (SOFIA): SOFIA will consist of a 2.5-meter telescope and associated cameras and spectrometers installed aboard a Boeing 747 aircraft. SOFIA will be the largest airborne telescope in the world. Due to
its ability to surmount most of Earth’s atmosphere, SOFIA will make infrared observations that are impossible for even the largest and highest ground-based telescopes. The observatory is being developed and operated for NASA by a consortium led by the Universities Space Research Association (USRA). SOFIA will be based at NASA’s Ames Research Center at Moffett Federal Airfield near Mountain View, California. It is expected to begin flying in the year
2004 and will remain operational for two decades. Like SIRTF, SOFIA is part of NASA’s Origins Program, and hence its science goals are similar and complementary to those of SIRTF. • The Atacama Large Millimeter Array (ALMA): ALMA will be a large array of radio telescopes optimized for observations in the millimeter wavelength regime and situated high in the Atacama desert in the Chilean Andes. Using a collecting area of up to 10,000 square meters, ALMA will feature roughly 10 times the collecting area of today’s largest millimeter-wave telescope arrays. Its telescope-to-telescope baselines will extend to 10 km, providing angular resolution equivalent to that of a diffraction-limited optical telescope whose diameter is 4 meters. ALMA observations will focus on emission from molecules and dust from very compact sources, such as galaxies at very high redshift and solar systems in formation. Recommendations of the Year 2000 Decadal Review The National Research Council, the principal operating arm of the National Academy of Sciences and the National Academy of Engineering, has mapped out priorities for investments in astronomical research during the next decade (8). The NRC study should not be used as the sole (or perhaps even primary) means to assess future directions in astronomy, but this study, which was funded by NASA, the National Science Foundation, and the Keck Foundation, does offer insight into some potential groundbreaking developments in multiwavelength astronomical imaging. Highest priority in the NRC study was given to the Next Generation Space Telescope (NGST). This 8-meter-class, infrared-optimized telescope will represent a major improvement on the Hubble Space Telescope in both sensitivity and spatial resolution and will extend space-based infrared imaging into the largely untapped 2–5 µm wavelength regime. This regime is optimal for studying the earliest stages of star and galaxy formation. NGST presently is scheduled for launch in 2007.
Several other major initiatives were also deemed crucial to progress in astronomy by the NRC report. Development of the ground-based Giant Segmented Mirror Telescope was given particularly high priority. This instrument has as its primary scientific goal the study of the evolution of galaxies and the intergalactic medium. Other projects singled out by the NRC report include • Constellation-X Observatory, a next-generation X-ray telescope designed to study the origin and properties of black holes; • a major expansion of the Very Large Array radio telescope in New Mexico, designed to improve on its already unique contributions to the study of distant galaxies and the disk-shaped regions around stars where planets form; • a large ground-based survey telescope, designed to perform repeated imaging of wide fields to search for both variable sources and faint solar-system
objects (including near-Earth asteroids and some of the most distant, undiscovered objects in the solar system); and • the Terrestrial Planet Finder, a NASA mission designed to discover and study Earth-like planets around other stars. BIBLIOGRAPHY 1. A. Sandage, Ann. Rev. Astron. Astrophys. 37, 445–486 (1999). 2. K. R. Lang, Astrophysical Formulae, 3rd ed., Springer-Verlag, Berlin, 1999. 3. G. B. Rybicki and A. P. Lightman, Radiative Processes in Astrophysics, John Wiley & Sons, Inc., NY, 1979. 4. D. Osterbrock, Astrophysics of Gaseous Nebulae and Active Galactic Nuclei, University Science Books, Mill Valley, 1989. 5. R. Lucas, P. Cox, and P. J. Huggins, in J. H. Kastner, N. Soker, and S. Rappaport, eds., Asymmetrical Planetary Nebulae II: From Origins to Microstructures, vol. 199, Astron. Soc. Pac. Conf. Ser., 2000, p. 285. 6. J. H. Kastner, N. Soker, S. Vrtilek, and R. Dgani, Astrophys. J. (Lett.) 545, 57–59 (2000). 7. C. W. Allen and A. N. Cox, Astrophysical Quantities, 4th ed., Springer-Verlag, Berlin, 2000. 8. C. McKee et al., Astronomy and Astrophysics in the New Millennium, National Academy Press, Washington, 2001.
IMAGING SCIENCE IN BIOCHEMISTRY NICOLAS GUEX TORSTEN SCHWEDE MANUEL C. PEITSCH GlaxoSmithKline Research & Development SA Geneva, Switzerland
INTRODUCTION Research in biology, aimed at understanding the fundamental processes of life, is both an experimental and an observational science. During the last century, all classes of biomolecules relevant to life have been discovered and defined. Consequently, biology progressed from cataloging species and their life styles to analyzing their underlying molecular mechanisms. Among the molecules required by life, proteins represent certainly the most fascinating class because they are the actual ‘‘working molecules’’ involved in both the processes of life and the structure of living beings. Proteins carry out diverse functions, including signaling and chemical communication (for example kinases and hormones), structure (keratin and collagen), transport of metabolites (hemoglobin), and transformation of metabolites (enzymes). In contrast to modeling and simulation, observation and analysis are the main approaches used in biology. Early biology dealt only with the observation of macroscopic phenomena, which could be seen by the naked eye. The development of microscopes permitted observation of the smaller members of the living kingdom and hence of the cells and the organelles they contain (Fig. 1). Observing
[Figure 1 scale bar: atom (∼10⁻¹¹–10⁻¹⁰ m; Å), amino acid and protein (∼10⁻⁹ m; nm), virus (∼10⁻⁸–10⁻⁷ m), prokaryotic cell and eukaryotic cell nucleus (∼10⁻⁶ m; µm), Drosophila (∼10⁻³ m; mm), mouse (∼10⁻¹ m), human (∼1 m); visible light (violet through green to red) marked near the sub-µm range.]
Figure 1. Relative sizes (indicative only) of atoms, molecules, and organisms, shown on a log scale of lengths from meter to angstrom with the corresponding wavelength of electromagnetic radiation. Imaging an object requires electromagnetic radiation of wavelength equal to or smaller than the size of the object.
Figure 2. Representation of gene expression to produce a protein. A gene (top left) is transcribed into a corresponding mRNA (not shown). The latter is processed by a ribosome (not shown), which links amino acids together to form a polypeptide chain (top right). The polypeptide chain (bottom right) folds into a compact functional protein (bottom left).
macromolecules on an atomic level could not be achieved by visible light or electron microscopy, until recent advances allowed imaging the outer shapes of large molecular entities at low resolution (1,2). Consequently, indirect methods that reveal molecular organization in a crystal (X-ray crystallography) or interactions between atoms (NMR) have been developed and are routinely used. Thus, bear in mind that not all images are direct observations. Images in this chapter were carefully selected to give the reader a broad view of the various modes of representation used by scientists to help them unravel protein structure and function. Several programs to generate such images are available (see Software), and most of them can display or generate many of the different representations shown in this article. For a general introduction to the principles of protein structure, see Ref. (3). Proteins are linear polypeptides built from 20 different building blocks called amino acids. The information needed to make a protein is encoded by DNA, which is, for eukaryotes, located in the cell’s nucleus. A segment of a DNA molecule, called a gene, is transcribed into an intermediate (mRNA), which is then processed by ribosomes to produce a protein (for details about this process, refer to a biology (4) or biochemistry textbook (5–7)). During this process, individual L-α-amino acids are linked together by peptide (amide) bonds to form a polypeptide (a continuous chain of amino acid residues, in which the carbonyl carbon atom of an amino acid is covalently linked to the nitrogen atom of the next amino acid). Then, this polypeptide folds into its final 3-D conformation to assume a specific function (Fig. 2).
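The gene-to-protein mapping just outlined can be caricatured in a few lines of code; this toy sketch uses only a five-codon excerpt of the standard genetic code (the full table has 64 codons) and ignores transcription, splicing, and everything else a real cell does:

```python
# Tiny excerpt of the standard genetic code, DNA codon -> amino acid.
CODON_TABLE = {"ATG": "Met", "GCT": "Ala", "AAA": "Lys", "GCC": "Ala", "TAA": "STOP"}

def translate(dna):
    """Read a coding DNA strand three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGGCTAAAGCCTAA"))  # ['Met', 'Ala', 'Lys', 'Ala']
```

Note that the resulting Ala-Lys-Ala stretch (after the initial Met) is the same tripeptide drawn in Fig. 3.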
There is no freedom of rotation around the peptide bond [ω = 180° (trans) or 0° (cis)]; all of the conformational flexibility of the protein backbone is due to two rotatable bonds for each amino acid residue (the dihedral angles φ and ψ) (Fig. 3a). Most of the 20 natural amino acids have additional rotatable bonds in their side chains. These dihedral angles are named χ1–χ5. Figure 3a is an abstraction in which each atom is represented by its chemical symbol (N, C, O, . . .) connected by lines symbolizing covalent bonds. This flat representation of structure, frequently seen in textbooks, does not reflect the spatial arrangement of atoms. Therefore, more realistic representations are used to reflect the three-dimensional nature of the compound. In Fig. 3b the tripeptide segment of Fig. 3a is drawn in ball-and-stick style, where the positions of atoms are marked by small balls connected by sticks symbolizing the chemical bonds. This reveals the geometry (distances and angles), but it does not reflect the space occupied by the atoms accurately and gives the erroneous impression that the atoms of molecules are not tightly packed. Plotting dotted spheres using the van der Waals radii of the respective atoms can remedy this shortcoming (Fig. 3b). On the other hand, space-filling models (CPK, Corey–Pauling–Koltun) give a reasonable impression of the overall shape, even though the underlying chemical structure cannot be recognized easily (Fig. 3c). In all cases, those images are static projections and provide only a limited representation of the object from a single viewpoint. Therefore, the possibility of manipulating models of 3-D objects in real time proved essential for the development of the field. Indeed, the overall shape and orientation of an object are much easier to grasp when it is smoothly
Figure 3. Three residues (ala-lys-ala) in a polypeptide chain. (a) Schematic drawing of the chemical structure. Values of the ω dihedral angle are restricted to 180° or 0°, but conformational flexibility is possible through the freely rotatable bonds around the dihedral angles φ, ψ, and χ1–χ5. (b) Ball-and-stick model with the van der Waals radii of atoms shown as dotted spheres. (c) Space-filling model. Atoms are colored by chemical element (nitrogen, N, blue; carbon, C, gray; oxygen, O, red). See color insert.
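The backbone dihedral angles named in the caption can be computed from four consecutive atomic positions; below is a self-contained sketch of the standard construction (the coordinates are artificial, chosen to give a trans arrangement):

```python
import math

def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle defined by four sequential atom positions, in degrees.
    For example, phi uses C(i-1)-N(i)-CA(i)-C(i) and psi uses N(i)-CA(i)-C(i)-N(i+1)."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)      # normals to the two bond planes
    norm = math.sqrt(dot(b1, b1))
    m1 = cross(n1, [x / norm for x in b1])     # orthogonal frame vector
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

# Four coplanar atoms in a trans arrangement give a dihedral of 180 degrees:
omega = dihedral_degrees([1, 1, 0], [1, 0, 0], [2, 0, 0], [2, -1, 0])
```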
rotated in real time or rendered illuminated with lights casting shadows (Fig. 4), than when it is only drawn in thin lines. However, those representations cannot replace the realism achieved through stereoscopic vision (Figs. 5 and 6). Various computer display methods and tools have been developed to ease the manipulation of 3-D objects. Indeed, it is always difficult to work with 2-D projections because they lack the ‘‘feeling’’ obtained by observing a
Figure 4. A cube presented in a wire frame does not allow the viewer to determine the sphere location precisely. It could be on a face or at the center (left). The same cube properly lit and rendered with shadows (right) gives a much clearer picture of the relative positions of both sphere and cube.
3-D environment (for example, it is easier to evaluate a distance or the slope of a mountain in a real environment than by looking at a map). Because proteins and molecules are 3-D objects, their representation directly benefits from these technologies. Common practice is to draw stereo pairs, in which the left and right images are rotated by −3° and +3°, respectively, and are separated by 6 cm (about the distance between human eyes) (Fig. 6). This allows parallel viewing of small images (left-eye image on the left side), as in journal articles. For larger images, such as on computer displays or projection screens, the images are placed for cross-eyed viewing (right image on the left side). In either case, the brain reconstructs a 3-D image. This helps immensely in capturing the orientation of compounds in an active site or the overall topology of a protein. The disadvantage of this technique is that many people need some adaptation time to ‘‘switch’’ to stereo viewing. Indeed, the angle of view is naturally linked to the focusing distance, but stereo perception requires decoupling the two: for parallel viewing, one must ‘‘look at infinity’’ while focusing on the screen or paper plane (literally seeing ‘‘through’’ the paper or screen plane). Hence, specialized hardware has been developed that allows for true stereo display. The basic principle is always the same: two images taken at slightly
Figure 5. How stereo perception is achieved: an object (top) is displayed from two different points of view (middle), as it would be perceived by each eye. Various means are available to present the left image to the left eye and the right image to the right eye. For instance, both images can be alternately presented at the same screen position, provided that the left eye is obstructed when the right image is displayed, and conversely. This is achieved by using special LCD shutter glasses synchronized with the screen. Alternatively, images can be colored (in green for the left one and in red for the right one). Then, glasses tinted with the complementary colors (green for the left eye, red for the right eye) can be used to filter out the view that does not correspond to the proper eye. See color insert.
different points of view (to mimic human stereoscopic vision) are generated and displayed alternately on a screen at high frequencies (Fig. 5). Stereo glasses equipped with LCD shutters (the left lens is obstructed when the right image is displayed, and conversely) are synchronized with the screen, allowing the viewer to see in 3-D without effort. Other systems, in which LCD screens are built so that the left eye sees different pixels from the right one, also allow 3-D perception when appropriate images are presented to each eye. Such 3-D systems are especially useful during experimental structure elucidation by X-ray crystallography.

PROTEIN STRUCTURE DETERMINATION BY X-RAY CRYSTALLOGRAPHY

Understanding the microscopic details of life processes is a long-standing task for scientists. Unfortunately, the resolution of optical devices is limited to approximately the wavelength of the electromagnetic radiation used. Distances between bonded atoms in molecules are on the order of 0.15 nm, or 1.5 Å (the unit Å is commonly used in the crystallographic literature instead of the SI units nm or pm; 1 Å corresponds to 10⁻¹⁰ m) (Fig. 1). Therefore, exploration on a molecular or even atomic level cannot
Figure 6. Some stereo images. (a) The same cube as in Fig. 4. (b) ATP, an energy-transfer agent in all organisms. (c) A protein [retinoic acid transporter protein; PDB entry 1CBQ (40)]. To learn to view in stereo, start with the cube because it is easier to see in 3-D. (The trick is to view the left image with the left eye and the right image with the right eye; this gives the impression of ‘‘moving’’ the two spheres until they superpose. Sometimes it helps to place a postcard between the two pictures.) For detailed instructions in stereo viewing, see http://www.usm.maine.edu/∼rhodes/0Help/StereoView.html.
be achieved with microscopes using visible light of wavelengths 400–700 nm. Currently, two experimental methods can resolve details of large molecules at the level of individual atoms. The most commonly used method, single-crystal X-ray crystallography, can be applied to very large macromolecules but is limited by the availability of protein crystals. As of June 2001, about 11,630 structures determined by this method had been deposited in the Protein Data Bank, the repository for 3-D macromolecular structure data (8) (http://www.rcsb.org/pdb/). About 1,918 structures of smaller macromolecules (e.g., proteins up to a molecular weight of 40 kDa) have been solved in solution rather than in the crystalline state by NMR methods (see Refs. 9–11 for details on NMR methods). Neither experimental method produces a picture of the molecule directly; rather, both provide data from which a molecular model consistent with the experimental observations can be computed. This article schematically describes structure solution
by single-crystal X-ray crystallography. For a more complete introduction to protein crystallography, see Ref. 11.

X-Ray Diffraction by Protein Crystals

Electromagnetic radiation in the range of 0.1–100 Å is called X rays. X rays in the useful range for crystallography (0.5–3 Å) can be produced either by X-ray tubes (in which accelerated electrons strike a metal target) or from synchrotron radiation, a by-product of particle accelerators. In both cases, the resulting primary radiation must be filtered by monochromators to produce X rays of a single wavelength. X-ray tubes are limited to a fixed wavelength that is defined by the emission spectrum of the target metal (e.g., λ = 1.54 Å for the commonly used CuKα L → K transition). Modern synchrotron radiation facilities provide X-ray sources that have much higher brilliance and allow varying the selected wavelength. The availability of these powerful radiation sources contributed vitally to the success of structural determination projects in recent years (12,13). The electrons of the individual atoms of a molecule diffract X rays. Whereas for optical light a focused image of the object can be produced from the diffracted light using lenses, this is not possible for diffracted X rays: no lenses are available that can focus X rays, and, moreover, the interaction between X rays and a molecule is so weak that only a tiny fraction of the primary beam is diffracted. Therefore the diffracted radiation from a single molecule is too weak to be detected. The experimental solution for this dilemma is to examine the diffraction of a crystal, which contains a large number of highly ordered molecules in the same, or a small number of, orientations. The diffracted radiation from all of these molecules sums to strong, detectable X-ray beams.
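The amplification provided by coherent summation can be illustrated numerically: N scatterers radiating in phase give a summed amplitude proportional to N and hence an intensity proportional to N². The sketch below is a toy illustration only, not a crystallographic computation; the per-molecule amplitude is an arbitrary small number.

```python
import cmath

def summed_intensity(n_molecules, amplitude=1e-6, phase=0.0):
    """Coherently sum identical scattered waves from n_molecules aligned scatterers.

    For a Bragg reflection, all ordered molecules scatter in phase, so the
    summed amplitude grows as n and the detected intensity as n**2.
    (Toy illustration; 'amplitude' is an arbitrary small number.)
    """
    total = sum(amplitude * cmath.exp(1j * phase) for _ in range(n_molecules))
    return abs(total) ** 2

# One molecule alone scatters immeasurably weakly ...
i_single = summed_intensity(1)
# ... but 10**5 ordered molecules yield a 10**10-fold stronger signal.
i_crystal = summed_intensity(10**5)
print(i_crystal / i_single)  # ~1e10
```

This quadratic gain in intensity is why a well-ordered crystal, rather than a single molecule, is required for a measurable diffraction signal.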
Crystallographers can compute the image of the molecule from the directions, intensities, and phases of these diffracted X-ray beams, thereby simulating an X-ray lens. Crystals (Fig. 7) are three-dimensional arrays built by periodic repetition of so-called unit cells (which in simple cases contain one molecule each). Only part of the crystal volume is occupied by protein molecules; indeed, protein crystals contain 30–70% water (14,15). All handling of protein crystals has to be done in a humid atmosphere to preserve a nearly native environment for the protein. Modern X-ray
Figure 7. Photo of a protein crystal of recombinant histidine ammonia lyase from Pseudomonas putida using polarized light. The size of the crystal is ca. 250 × 250 × 700 µm3 .
diffraction facilities use nitrogen gas at about 100 K to flash-cool the protein crystals and preserve them during diffraction data collection. X-ray beams diffracted by a crystal form a highly regular pattern (Fig. 8), because for a given orientation of the crystal relative to the primary beam, constructive interference of the diffracted radiation is only observed at distinct angles. A simple geometric explanation for this relation is given by Bragg's law [Eq. (1), Fig. 9a]:

nλ = 2d sin θ,  (1)
where diffraction by a crystal is described as reflection of the beam by a stack of parallel planes with an interplanar spacing of d (Fig. 9b). In this picture, constructive interference is observed only when the difference in the path length of rays reflected by successive planes is equal to an integral number (n) of wavelengths (λ) of the impinging X rays. Each unit cell can be described by three cell edges (a, b, c) and the angles between them (α, β, γ ). The symmetry and geometry of the observed diffraction pattern are related to the symmetry and unit-cell dimensions of the diffracting crystal. Each of the measured reflections corresponds to a set of parallel planes and can be indexed by three small integral numbers h, k, and l, called Miller indexes, that indicate the number of parts into which this set of planes cuts the edges (a, b, c) of each unit cell (Fig. 9b). In the Bragg model, (h, k, l) can be interpreted as a vector perpendicular to the set of planes that is in reflecting position. The goal of crystallographic data collection is to measure the diffracted radiation for all possible orientations between the crystal and the impinging beam.
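Bragg's law is simple to evaluate numerically. The sketch below assumes the CuKα wavelength of 1.54 Å mentioned earlier; it computes the diffraction angle θ for a given interplanar spacing d and, conversely, the smallest spacing observable at a given maximum angle.

```python
import math

WAVELENGTH = 1.54  # Angstroms; CuK-alpha, assumed X-ray tube source

def bragg_angle(d, n=1, wavelength=WAVELENGTH):
    """Return the Bragg angle theta (degrees) for planes of spacing d (Angstroms).

    Solves n*lambda = 2*d*sin(theta); raises if no solution exists.
    """
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("n*lambda exceeds 2d: no diffraction possible")
    return math.degrees(math.asin(s))

def min_spacing(theta_max_deg, wavelength=WAVELENGTH):
    """Smallest resolvable interplanar spacing for a maximum Bragg angle."""
    return wavelength / (2.0 * math.sin(math.radians(theta_max_deg)))

print(bragg_angle(2.0))   # ~22.6 degrees for d = 2 Angstrom planes
print(min_spacing(50.4))  # ~1 Angstrom spacing at theta ~ 50 degrees
```

Note how finer spacings (smaller d) diffract at larger angles, which is why high-resolution data appear at the edge of the detector.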
Figure 8. X-ray diffraction pattern of the same protein as in Fig. 7. The positions of the reflections show a highly regular pattern, but the intensities vary significantly.
Figure 9. (a) Bragg's law is a simple geometric model that describes X-ray diffraction as reflection of a beam by a stack of parallel planes that have an interplanar spacing of d. (b) Different stacks of equivalent parallel planes of atoms in a two-dimensional section of a crystal lattice. Dots symbolize identical positions in the crystal (e.g., the position of a molecule). A set of parallel planes is identified by three indexes hkl, indicating the number of parts into which this set of planes cuts the edges a, b, c of each unit cell.

Electron Density as a Fourier Series

The symmetry of a diffracting crystal is responsible for the symmetry of the observed diffraction pattern, but the content of the unit cell determines the intensities of the diffracted radiation. X-ray diffraction takes place by interaction with the electrons of individual atoms. The electron density distribution within the unit cell (reflecting the positions of the atoms and therefore the structure of the molecule) can be described by a Fourier series:

ρel(x, y, z) = (1/V) Σh Σk Σl F(hkl) · e^−2πi(hx + ky + lz),  (2)

where ρel(x, y, z) represents the electron density at a position x, y, z in the unit cell, V is the volume of the unit cell, and h, k, l are the reflection indexes mentioned before. Equation (2) connects the observed diffraction pattern with the electron density, which is the image of the molecule in the form of a three-dimensional contour map. F(hkl), called the ‘‘structure factor,’’ represents the diffracted X-ray beam and is a periodic function that consists of wavelength, amplitude, and phase:

F(hkl) = |F(hkl)| · e^iϕhkl.  (3)

The wavelength is that of the X-ray source; the amplitude |F(hkl)| is amenable to experiment as the square root of the intensity of the diffracted radiation. Unfortunately, the phase ϕ cannot be measured directly but has to be determined for each reflection h, k, l. This is called the ‘‘phase problem’’ of crystallography. Due to the complexity of the subject, only a short overview of phase determination methods can be given in this article; for a more detailed discussion, see Refs. 11, 16, and 17. In crystallography of small molecules, phases can be determined computationally from the measured amplitudes directly, but these ‘‘direct methods’’ are not applicable to large protein molecules due to the high number of atoms (18,19). Three methods have been widely used to determine phases in protein crystallography.

The Heavy Atom Method (Multiple Isomorphous Replacement, MIR)

Each atom in the unit cell contributes to all observed reflection intensities. The basic principle of the MIR method is to collect diffraction data from several (multiple) crystals of the same protein that share the same crystal properties (isomorphous) but differ in a small number of heavy atoms. The experimental approach is normally to soak protein crystals in dilute solutions of heavy metal compounds (e.g., mercury or platinum derivatives) that often bind specifically to certain protein residues. These additional atoms cause a slight perturbation of the diffraction intensities. To achieve a perturbation large enough to be measured correctly, the added atoms must diffract strongly, so elements that have a high number of electrons (heavy atoms) are used (diffraction intensity increases with atomic number). The differences in the reflection intensities can be used to locate the positions of the heavy atoms within the unit cell, which allows a first estimate of the phases.

Anomalous Scattering (MAD, Multiple Wavelength Anomalous Dispersion)

The MAD method is based on the capacity of heavy atoms to absorb X rays of a specific wavelength. Near the characteristic absorption wavelength of the heavy atoms, the diffraction intensities of symmetry-related reflections (Friedel pairs: h, k, l and −h, −k, −l) are no longer equal. This effect is called anomalous scattering. The characteristic absorption wavelengths of the typical protein atoms (N, C, O) are not in the range of the X rays used in protein crystallography, so these atoms do not contribute to anomalous scattering. However, synchrotron X-ray sources with adjustable wavelengths allow collecting diffraction data under conditions where the heavy atoms exhibit strong anomalous scattering. In practice, several diffraction data sets are collected from the same protein crystal at different wavelengths. The locations of the heavy atoms can be determined from the small differences between the Friedel pairs, and the initial phases of the native data can be estimated.
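The decisive role of the phases in the synthesis of Eq. (2) can be demonstrated with a one-dimensional toy analogue: computing ‘‘structure factors’’ for a made-up density and resynthesizing it recovers the original peaks, whereas keeping the correct amplitudes but scrambling the phases destroys the image. This is a sketch only; real maps are triple sums over h, k, l.

```python
import cmath
import random

def structure_factors(density, hmax):
    """F(h) = sum_x rho(x) * exp(+2*pi*i*h*x/N) for a one-dimensional 'unit cell'."""
    n = len(density)
    return {h: sum(density[x] * cmath.exp(2j * cmath.pi * h * x / n)
                   for x in range(n))
            for h in range(-hmax, hmax + 1)}

def fourier_synthesis(factors, n):
    """Invert the sum, as in Eq. (2): rho(x) = (1/N) sum_h F(h) * exp(-2*pi*i*h*x/N)."""
    return [sum(f * cmath.exp(-2j * cmath.pi * h * x / n)
                for h, f in factors.items()).real / n
            for x in range(n)]

n = 32
density = [0.0] * n
density[5] = density[20] = 1.0           # two point 'atoms' in the cell

F = structure_factors(density, hmax=15)
restored = fourier_synthesis(F, n)       # peaks reappear at x = 5 and x = 20

# Same amplitudes but random phases -- the 'phase problem' in miniature:
F_bad = {h: abs(f) * cmath.exp(1j * random.uniform(0.0, 2.0 * cmath.pi))
         for h, f in F.items()}
scrambled = fourier_synthesis(F_bad, n)  # the two peaks are (almost surely) lost
```

The amplitudes alone thus carry surprisingly little of the image; this is why the phase-determination methods above are indispensable.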
Molecular Replacement

In some cases, it is known that the structure to be determined is very similar to another structure that has already been solved experimentally, for example, a homologous protein from another organism or a mutant of the same protein. Then, the phases computed from the known protein structure (the phasing model) can be used as initial estimates of the phases of the unknown protein.

Model Building and Refinement

As mentioned earlier, the primary experimental result of a crystallographic structure determination is a three-dimensional electron density map, computed [Eq. (2)] from the measured reflection intensities and the initial phases. Unfortunately, in most cases this initial three-dimensional image of the molecule is not detailed and accurate enough to identify atomic details. If the initial phase estimates are sufficiently good, the map will show gross features of the molecules, like continuous chains of electron density along the protein backbone. Then, knowing the chemical structure of the molecule (i.e., the sequence of amino acid residues), the three-dimensional conformation has to be modeled so that it accords with the observed electron density. Interactive computer programs that support hardware stereo display (see earlier) are used to build the first molecular model by placing the parts of the molecule manually into the observed map (map fitting) (Fig. 10a). This initial model represents only a rough cartoon of the real molecular structure and explains the observed diffraction intensities only partially. Computational methods are then used to modify the atomic parameters of the model to fit the observed diffraction data better; this process of improving the atomic model to reflect the observed data is called crystallographic structural refinement. Besides the observed experimental data, most refinement programs use a set of idealized chemical parameters derived from structures solved experimentally at very high resolution (20).
By restricting the deviations of the modeled bonds and angles from these idealized values, a stereochemically and conformationally reasonable model is ensured. This is especially important when only a small amount of diffraction data is available and the number of observations (reflections) is small compared to the number of parameters (coordinates of atoms) to be refined. The progress of the refinement procedure is monitored by comparing the measured structure factor amplitudes |Fobs| with amplitudes computed from the current model |Fcalc|. When converging to the correct molecular model, the measured and computed amplitudes should also converge. The primary measure for this is the residual index, or R factor:

R = Σ ||Fobs| − |Fcalc|| / Σ |Fobs|.  (4)

R factors range from zero, for a perfect match between model and diffraction data, to about 0.6, for a diffraction data set compared to a set of random amplitudes. Commonly, R factors of refined protein structures are in the range of 0.2 for data at 2.5 Å resolution, but R factors below 0.1 can occasionally be achieved for very high resolution data. The term ‘‘resolution’’ in X-ray crystallography, unlike in microscopy, refers only to the amount of data used during structural determination. A resolution of 2 Å means that diffraction data from planes whose interplanar spacing is as low as d = 2 Å (Fig. 9) has been included in the refinement procedure. For protein structures refined at 2 Å resolution, the precision of atom locations lies in the range of 0.15–0.40 Å; in comparison, a carbon–carbon bond is approximately 1.5 Å long. However, this is only an estimate of the average positional error resulting from the limits of data accuracy. Note that a crystallographic model represents the average of all molecules that diffracted X rays during the data collection. There are also two major physical reasons for uncertainty in crystal structures: dynamic disorder, the thermal vibration of atoms around their equilibrium positions, and static disorder, in which equivalent atoms do not occupy the same positions in different unit cells (Fig. 11). Therefore, the model description contains, for each atom position j, an occupancy value nj (ranging from 0.0 to 1.0) describing static disorder, and a temperature factor Bj reflecting the dynamic disorder or thermal motion of atom j. B factors are given in Å² and are related to the mean square displacement ⟨uj²⟩ of the atom from its rest position:

Bj = 8π² ⟨uj²⟩.  (5)

Temperature factors in protein structures usually vary from 5–35 Å² for main-chain atoms and from 5–60 Å² for side-chain atoms. B values greater than 60 Å² may indicate disorder or errors in the protein structure.

Structure Databases

Most experimentally solved macromolecular structures are deposited in the Protein Data Bank (PDB; URL: http://www.rcsb.org) or the Nucleic Acid Database (NDB; URL: http://ndbserver.rutgers.edu) and can be retrieved from their Internet sites. Structures of small molecules are usually collected at the Cambridge Crystallographic Data Centre (URL: http://www.ccdc.cam.ac.uk/). As of 26 June 2001, the PDB contained 13,836 different database entries: 11,630 structures solved by diffraction, 1,918 by NMR, and 288 theoretical models.
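PDB entries are distributed as plain-text files; in the classic fixed-column PDB format, the occupancy nj and the temperature factor Bj of Eq. (5) occupy fixed columns of each ATOM record. A minimal parsing sketch follows; the record string itself is fabricated for illustration, not taken from a real entry.

```python
def parse_atom_record(line):
    """Extract name, occupancy, and B factor from a PDB-format ATOM record.

    Fixed columns (1-based): atom name 13-16, occupancy 55-60, B factor 61-66.
    """
    return {
        "name": line[12:16].strip(),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }

# Illustrative ATOM record (fabricated coordinates, PDB fixed-column layout)
record = ("ATOM      1  CA  ALA A   1      11.104  13.207   2.100"
          "  1.00 17.93           C")
atom = parse_atom_record(record)
print(atom["name"], atom["occupancy"], atom["b_factor"])  # CA 1.0 17.93
```

Scanning all ATOM records of an entry this way gives, for example, the per-residue B-factor profile often used to spot flexible loops.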
However, bear in mind that numerous proteins have several very similar entries in the PDB, such as structures elucidated at different resolutions, in the presence or absence of ligands, in mutated forms, and so forth. Although these data on similar structures provide useful information about protein families or modes of function, they do not contribute to the diversity of elucidated proteins. Consequently, the number of distinct protein structures is approximately one-fourth of the number of entries.

REPRESENTATION OF BIOLOGICAL MACROMOLECULES

As explained in the previous section, proteins are not directly observable. Therefore, it should be emphasized that all pictures are abstractions based on the 3-D coordinates of the model and are not direct images of the molecule. These images also give only a static view of the molecule. From comparative analysis of protein structures in different states (for example, with or without bound substrate) and from NMR experiments, it is known that certain proteins can undergo significant conformational changes. See Ref. 21 for a discussion of protein motions.
Figure 10. Stereo pictures of small sections of three-dimensional electron density maps of the same protein structure [PDB entry 1B8F (41)] at different resolutions. The term ‘‘resolution’’ in crystallography, unlike in microscopy, refers only to the amount of data used during structural determination. Note that these maps are the only experimentally derived images of the molecule; all further representations are based on models derived from these direct observations. (a) Initial experimental map derived by multiple isomorphous replacement (MIR) at 3.3-Å resolution. (b) Refined protein model surrounded by a density map at 2.0-Å resolution and (c) at 1.1-Å resolution.
Backbone Representations

The first level of information about a protein is encoded in its amino acid sequence, also referred to as its primary structure. The amino acid sequence is directly dictated by the sequence of the gene coding for the protein (Figs. 2
and 3). The backbones of all amino acids (except for proline) have alternating hydrogen donor and acceptor capability. This allows a second level of organization, called the secondary structure (Fig. 12), which consists of a repetition of a basic motif where the backbone forms a regular pattern of hydrogen bonds and dihedral angles
Figure 10. (Continued)
φ and ψ (see Fig. 3a) adopt similar values for successive residues. It is noteworthy that those structures were predicted as possible stable folding units by Linus Pauling long before they were actually observed (22). The first
Figure 11. Two-dimensional section of a crystal lattice of histidine ammonia lyase from Pseudomonas putida [PDB identifier 1B8F (41)]. For simplification, only the backbone of the protein chain is represented as a so-called Cα plot, where the positions of the Cα atoms of individual amino acid residues are connected by a pseudobond. To give a better idea of the repetitive unit, one of the molecules is highlighted in dark gray. The unit cell of the crystal is indicated as a rectangle.
protein whose structure was elucidated (23) [myoglobin; PDB entry 1MBN (24)] indeed consists of an assemblage of α helices; the backbone of 70% of its residues adopts an α conformation.
Figure 12. Secondary structural elements extracted from an immunoglobulin binding protein [PDB entry 1IGD (42); see also Fig. 13]. (a) α helix. This structure is stabilized by hydrogen bonds between the amide nitrogen group of each residue (i) and the amide carbonyl group of the residue (i + 4). (b) β sheet composed of three strands. The top two strands are parallel because their C-terminal ends point in the same direction, whereas the bottom two strands are antiparallel because their C-terminal ends point in opposite directions. As in the α helix, the β-sheet structure is stabilized by a network of hydrogen bonds between amide nitrogen and carbonyl groups.
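The φ and ψ backbone torsions that define these secondary structural elements can be computed from four consecutive atom positions (for φ: C of residue i−1, then N, Cα, C of residue i). A minimal sketch follows; the example coordinates are artificial, chosen only so the expected angle is obvious.

```python
import math

def dihedral(p1, p2, p3, p4):
    """Dihedral angle (degrees) defined by four points, e.g., phi/psi backbone torsions."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals to the two bond planes
    m = cross(n1, b2)
    x = dot(n1, n2)
    y = dot(m, n2) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, x))

# Four points in a planar zigzag give a 180 degree (trans) torsion:
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # 180.0
```

Plotting φ against ψ for every residue of a model gives the familiar Ramachandran diagram, in which helical and strand residues cluster in distinct regions.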
On average, secondary structure accounts for about 60% of a protein structure. Although some of the remaining residues may be disordered and adopt flexible conformations, the majority remain in a fixed conformation in which the backbone dihedral angles φ and ψ do not adopt similar values for successive residues (see Ref. 25 for a detailed discussion of secondary structures). Because secondary structure elements are essentially repetitive, they allow a simplified representation that defines the overall fold and topology of the protein. In its crudest form, this can be an α-carbon trace, where pseudobonds that link the α carbons of successive residues are drawn instead of the backbone (Fig. 11). More elaborate representations, called ribbons, do not plot any backbone atoms but rather use them to guide a spline curve that illustrates the overall topology more smoothly (26). Solid representations of this curve (Fig. 13a), with arrows at the end of each secondary structure element, allow the viewer to identify the topology of a protein. However, even those simplified drawings tend to be difficult to follow for very large proteins, and even more abstract representations of the protein topology are useful. An example is given in Fig. 13b, where each secondary structure element is reduced to a circle or a triangle. Assigning a unique secondary structure to a stretch of residues is somewhat arbitrary because the boundaries between secondary structures depend largely on interpretation. Hence, different individuals or programs may assign different secondary structures to the same residue. A further complication is that secondary structures (especially β strands) are frequently (if not always) distorted compared to a ‘‘canonical’’ (idealized) structure. Therefore, the secondary structure assigned to a residue does not capture the subtle variations (tilt, bend, torsion)
Figure 13. Two different representations of the topology of an immunoglobulin binding protein [PDB entry 1IGD (42)]. (a) Ribbon representation, in which each secondary structure element is colored differently using a color gradient from blue (N terminal) to red (C terminal). (b) Further simplification of the topology, where each secondary structural element is represented by symbols connected by a line representing the polypeptidic chain. Circles symbolize α helices and triangles β sheets. Lines to the centers of figures connect in front; lines to the edges of figures connect in back. See color insert.
that are visible when secondary structure elements are compared. Of course, this has important implications for secondary structure prediction from the primary sequence alone.

Domains and Fold Families

The arrangement of all the secondary structure elements of a protein, which is essentially the way the polypeptidic chain is folded, is named the tertiary structure, or fold. The proteins elucidated so far adopt a limited number of folds. Several fold classifications based on different methods coexist [SCOP (27), CATH (28), Vast/MMDB (29), Dali/FSSP (30)]; each captures a slightly different aspect of the folding space. Therefore, it is difficult to estimate the total number of different folds that account for the proteome (the set of all expressed proteins of a given organism or tissue), and although estimates vary from hundreds to a few thousand (31,32), a recent analysis predicts that about 650 folds are enough to account for the total proteome (33). However, our current view of fold space is likely to be biased by the overrepresentation of certain common folds in the database, possibly those for which crystals could be obtained. Nevertheless, the total number of distinct folds in an organism is much lower than the total number
sequence level generally indicate high-level functional similarities, and hence fold recognition can be used to assist functional assignment of newly discovered proteins. However, this is not always the case, because fold is not always linked to protein function. Examples of proteins that have the same function but different folds are known, as well as examples of proteins that have different functions but share the same fold (Fig. 14). Most protein folds can be further subdivided into domains (Fig. 15), which are independently folding units that often provide different functions (such as regulatory or binding units). One can look at domains as building blocks of larger proteins. As with folds, there are frequently alternative ways of splitting proteins into domains or recurrent secondary structure elements (also commonly referred to as supersecondary structures) (35). It is noteworthy that proteins frequently assemble into larger functional units called quaternary structures. Dimers, trimers, or even larger oligomers of identical (homo) or different (hetero) protein chains occur frequently. These functional units are distinct from the unit cell observed during protein crystallization, which may represent only part of the biologically relevant structure or multiple copies of it (36).

Displaying Atomic Interactions

Many proteins are enzymes, that is, proteins that catalyze reactions as diverse as the digestion of starch or the modification of the activity of other proteins through phosphorylation. Therefore, it is not surprising that scientists are interested in the specific region of the protein responsible for the catalysis, the so-called active site. Indeed, the analysis of interactions between an active site and its natural substrate enables computational chemists to design small molecules that can alter enzyme activity. Such compounds can then be refined further and turned into medicines to cure disease.
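A first step toward such interaction diagrams is listing protein atoms within hydrogen-bonding distance of a bound ligand. The sketch below is illustrative only: the atom labels, coordinates, and the 3.5 Å cutoff are invented for the example, not taken from a real structure.

```python
import math

def contacts(protein_atoms, ligand_atoms, cutoff=3.5):
    """Return (protein_label, ligand_label, distance) triples within cutoff Angstroms."""
    found = []
    for p_label, p in protein_atoms.items():
        for l_label, l in ligand_atoms.items():
            d = math.dist(p, l)
            if d <= cutoff:
                found.append((p_label, l_label, round(d, 2)))
    return sorted(found, key=lambda t: t[2])   # closest contacts first

# Illustrative coordinates (not from a real structure)
protein = {"Ser195:OG": (1.0, 0.0, 0.0), "His57:NE2": (4.0, 4.0, 0.0)}
ligand = {"O1": (2.0, 2.0, 1.0)}
print(contacts(protein, ligand))
```

Schematic active-site diagrams such as Fig. 16b are essentially annotated versions of such a contact list, with the measured distances written on the dotted lines.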
Furthermore, it is expected that one can design enzyme variants to
Figure 14. Illustration that protein fold and function are not correlated. (a) Soybean (Glycine max.) beta amylase [EC 3.2.1.2; PDB entry 1BYA (43)] adopts a TIM-barrel fold (left); Aspergillus awamori 1,4-α glucosidase, [EC 3.2.1.3; PDB entry 1DOG (44)] adopts an α−α toroid fold (right). Both proteins hydrolyze glucoside linkages. (b) Trypsin inhibitor [1TIE (45)] and interleukin-1 receptor antagonist [1ILR (46)] both belong to the β-trefoil fold family. Nevertheless, their biological functions are different. (c) Proteins sometimes evolve new functions: the avian eye lens protein δ2-crystallin [right, 1DCN (47)] is closely related to the enzyme argininosuccinate lyase [left 1AOS, (48)], which is involved in the urea cycle. Even though δ-crystallin has no enzymatic function in the eye lens, it still can bind the former substrate argininosuccinate in vitro.
of distinct proteins. This has important implications for protein structure prediction: it is expected that the computational prediction of protein folding can be replaced by fold recognition. Indeed, it was recognized early that proteins whose primary structures are similar also share the same fold (34). Consequently, similarities at the
Figure 15. Porcine fibroblast (interstitial) collagenase [PDB entry 1FBL (49)] consists of two domains (blue for the catalytic N-terminal domain, red for the C-terminal domain). The fragment of the peptide chain colored green can be considered a connector between the two domains. Because the ribbon representation can mislead the reader into thinking that the two domains are widely separated, a transparent molecular surface (see section ‘‘Derived Properties’’) has been added to demonstrate that the two domains are actually in close contact. Note that, for clarity, only one peptide chain is represented here, although the biologically functional unit is a homodimer. See color insert.
improve on a given property, such as stability under extreme conditions (washing powders), faster catalysis (higher yields in industrial production processes), or even the capacity to catalyze new reactions. These applications frequently require comparing different active sites, for which various representations or abstractions are helpful. Because several amino acids of the protein interact with several atoms of the compound being analyzed, even stereoscopic representations become cumbersome (Fig. 16a). Therefore, schematic representations have been developed to help capture at a glance the interactions between a protein and its bound ligand (Fig. 16b).

Derived Properties

Proteins frequently bind to other large macromolecules such as DNA (e.g., to regulate gene expression), RNA (the ribosome), or other proteins (receptors). Therefore, scientists are interested in predicting which proteins or DNA sequences can interact with a specific protein, and also where the putative interaction zone is located. Prediction of binding commonly involves computing protein surfaces, because a surface helps one grasp the overall shape of a protein. Although van der Waals surfaces (CPK atom representation) (Fig. 3c) give a crude idea
of the protein shape, they do not really represent the actual shape complementarity that is possible between two molecules. Indeed, taking the inverse of this representation (Fig. 17a) does not reflect the space actually available to atoms of other molecules but only the spatial locations where an infinitely small atom or water molecule could be placed. Because water molecules have an approximate radius of 1.4 Å, some parts of the molecular surface are actually out of reach of the solvent (in narrow crevices, for instance). Hence, two other surface computational methods that address this problem have been developed (37,38): the solvent-accessible surface and the molecular surface (Fig. 17b). Analytical methods and numerical approximations (using rectangular grids) are available for both. The accessible surface is generated by adding the average radius of the solvent molecule to the radius of each atom and plotting every point that does not fall inside any other such radius-enlarged atom. This surface represents the limit beyond which the center of a solvent atom cannot be placed without penetrating the protein; such a representation can be useful for interactive docking of compounds. Computing a molecular surface is equivalent to rolling a solvent molecule over the protein atoms. It is automatically considered that the regions that are not accessible to
Figure 16. Two representations of a serine protease active site [PDB entry 8GCH (50)]. (a) Stereoscopic view allowing a stereochemically correct representation of the atoms in the active site. Hydrogen bonds between the enzyme and the substrate are represented by green dotted lines. (b) Schematic representation of the same active site. See color insert.
the solvent still belong to the protein, and the three-dimensional surface patch can be considered a boundary between the protein interior and the solvent. Both surfaces can be rendered transparently so that structural elements beneath the surface can still be identified (Fig. 17c).

Electrostatic Potentials

All of the properties one wishes to observe or compute (39) are ultimately linked to the atomic coordinates deduced from X-ray analysis, NMR experiments, or even theoretical models because they depend on the overall arrangement of atoms in the molecule. However, such properties need not be projected directly on top of atomic coordinates or bonds, as the previous section illustrated for the molecular surface. Another example of a property not mapped directly onto atoms, but still calculated from their positions, is the electrostatic potential of a protein. Although coloring amino acids or surface patches by charge (positive in blue, negative in red) can give an indication of the protein’s polarity, it does not allow one to draw any conclusion about the precise location and intensity of the electrostatic potential generated by the protein. The ability to generate 3-D isopotential contours around a protein permits the
viewer to visualize how a potential extends away from a protein (Fig. 18) to influence the approach of substrates or other ligands. Properties such as the electrostatic potential can also be mapped onto the molecular surface to help explain how and why substrates interact in an active site (Fig. 19). Comparing potentials between enzymes of the same family can also give useful insights into their activity.

Other Color-Coded Properties

It was mentioned earlier that amino acid properties can be color coded (Fig. 19), for example, to locate clusters of charges quickly at the protein surface or to verify during protein modeling that no charged residue is in direct contact with the hydrophobic core. The principle of coloring objects by a property of interest is quite general. It is applied quite extensively, for example, to geographic maps, where the basic properties of countries (borders) are retained and a third parameter, such as population density, is color coded and captured on the map. The same method can, of course, be applied to proteins. In this case, the basic property that is kept is the atomic coordinates (topology), and a fourth dimension (color) is used to represent additional parameters. Such parameters can be the similarity
between two proteins (rms deviation in distances between corresponding atoms), the crystallographic B factor, the standard deviation between similar atoms in multiple NMR solutions, or the degree of amino acid conservation among similar proteins. The last proves particularly useful in locating potential active sites, binding pockets, or regions involved in protein–protein interactions. Indeed, when several proteins are compared, residues conserved across the whole family usually indicate well what had to be conserved throughout evolution to maintain a specific function.

Figure 17. (a) Schematic representation of the van der Waals surface. Left: space actually occupied by the atoms of a molecule; right: space theoretically available for solvent atoms, although some crevices are actually too narrow to accommodate solvent atoms. (b) Schematic definition of the molecular surface and of the accessible surface (see text). (c) Practical illustration of the difference between the accessible surface (dotted) and the molecular surface (transparent) of a small peptide (wire frame).

CONCLUSION

Protein function depends on the three-dimensional conformation that amino acid residues adopt in a folded protein. This explains why considerable effort goes into elucidating protein structures. Several methods to extract, derive, and simplify information from protein structure have been developed. Proteins have been analyzed in terms of oligomeric interactions, surface properties, domains, secondary structure elements, backbone conformations, and side-chain properties. In summary, various levels of visualizing a protein are necessary, depending on the research application. However, bear in mind that these images are generated (as opposed to observed) to represent specific theoretical properties of a molecule. Due to the advent of large-scale sequencing projects, far more data are available on protein sequence than on protein structure (crystallographic or NMR data). Protein structure prediction methods are essential to bridge this gap. These methods must rely on the information already known and therefore rely strongly on the visualization methods mentioned in this chapter.

Software
Numerous software packages (53–64) currently available can be used to produce the kinds of images described in this chapter. The following list is by no means complete, but as a starting point, it provides some programs that are freely available to the scientific community and that were used to prepare images for this article. • GRASP (53) is a molecular visualization and analysis program. It is particularly useful for displaying and manipulating the surfaces of molecules and their electrostatic properties. URL: http://transfer.bioc.columbia.edu/grasp/ • LigPlot (54) automatically generates schematic diagrams of protein–ligand interactions from 3-D coordinates. It can also generate plots of residue interactions across a dimer or domain–domain interface. URL: http://www.biochem.ucl.ac.uk/bsm/ligplot/ligplot.html • MOLMOL (55) is a molecular graphics program for displaying, analyzing, and manipulating the three-dimensional structures of biological macromolecules, with special emphasis on protein and DNA structures determined by NMR. URL: http://www.mol.biol.ethz.ch/wuthrich/software/molmol/ • MolScript (56) is a program for displaying molecular 3-D structures, such as proteins, in both schematic and detailed representations. URL: http://www.avatar.se/molscript/ • MSMS (57) is Michel Sanner’s molecular surface computation package. URL: http://www.scripps.edu/pub/olson-web/people/sanner/
Figure 18. Electrostatic potential of superoxide dismutase [PDB entry 2SOD (51)] contoured at +1 kBT/e (blue) and −1 kBT/e (red), respectively. Because the functional unit is a dimer, two active sites (located at the positive electrostatic potential) are present. The isocontoured surfaces are rendered transparently to show the protein backbone, which is represented as a worm.
Figure 19. Electrostatic potential of a protein kinase [PDB entry 1YDR (52)] mapped onto its molecular surface. Negative and positive surface patches are colored in red and blue, respectively. A peptide inhibitor (wire frame) is bound to the protein. Amino acids that bear a negative or positive charge are colored in red or blue, respectively. See color insert.
• MSP is Michael Connolly’s Molecular Surface Package (see References 37 and 38). URL: http://www.biohedron.com/ • O (58) is a general-purpose macromolecular modeling environment developed by Alwyn Jones and Morten Kjeldgaard. The program is designed for scientists who need to model, build, and display macromolecules; O is used mainly in the field of protein crystallography. URL: http://origo.imsb.au.dk/∼mok/o/ • POV-Ray, the Persistence of Vision Raytracer, is a high-quality tool for creating three-dimensional graphics. It can be used to render pictures
composed in other programs such as Swiss-PdbViewer (DeepView). URL: http://www.povray.org/ • RasMol (59) is free, scriptable software written by Roger Sayle for viewing molecular structures. RasMol stimulated broad interest in molecular visualization for structural biology and education. URL: http://www.umass.edu/microbio/rasmol/ • Raster3D (60) is a set of tools for generating high-quality raster images of proteins and other molecules. It can also be used to render pictures composed in other programs, such as MolScript, with highlights, shadowing, etc.
URL: http://www.bmsc.washington.edu/raster3d/ • Swiss-PdbViewer (DeepView) (61,62) provides a user-friendly interface for displaying and analyzing several proteins simultaneously. It also provides a scripting language and various tools for structural analysis and model building. URL: http://www.expasy.org/spdbv/ • TOPS (63) computes two-dimensional schematic representations of protein folds, so-called protein topology cartoons. URL: http://www3.ebi.ac.uk/tops/ • WHAT IF (64) is a versatile protein structural analysis program that can be used for mutant prediction, structure verification, molecular graphics, etc. URL: http://www.cmbi.kun.nl/whatif/

Acknowledgments

The authors thank Roman Laskowski for providing the LigPlot output, Mathias Baedeker for the diffraction pattern and electron density maps, and the reviewers for carefully reading the manuscript and for their excellent suggestions.
ABBREVIATIONS AND ACRONYMS

CPK    Corey–Pauling–Koltun; a way of representing molecules
DNA    deoxyribonucleic acid
LCD    liquid crystal display
MAD    multiple wavelength anomalous dispersion
MIR    multiple isomorphous replacement
NMR    nuclear magnetic resonance
rmsd   root mean square deviation
RNA    ribonucleic acid
PDB    Protein Data Bank
BIBLIOGRAPHY 1. T. Walz, T. Hirai, K. Murata, J. B. Heymann, K. Mitsuoka, Y. Fujiyoshi, B. L. Smith, P. Agre, and A. Engel, Nature 387, 624–627 (1997). 2. M. H. Stowell, A. Miyazawa, and N. Unwin, Curr. Opinion Struct. Biol. 8, 595–600 (1998). 3. I. Branden and J. Tooze, Introduction to Protein Structure, Garland publishing, Inc., New York & London, 1998. 4. B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson, Molecular Biology of the Cell, Garland publishing, Inc., New York & London, 1989. 5. L. Lehninger, D. L. Nelson, and M. M. Cox, Principles of Biochemistry, W. H. Freeman and Company, New York, 1993. 6. L. Stryer, Biochemistry, W. H. Freeman and Company New York, 1995. 7. D. Voet, J. G. Voet, and C. A. Pratt, Fundamentals of Biochemistry, John Wiley & Sons, Inc., NY, 1998. 8. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne, Nucleic Acids Res. 28, 235–242 (2000). 9. D. G. Reid, ed., in Methods in Molecular Biology 60, Humana Pr. ISBN 0-89603-309-0, 1997. 10. H. N. Moseley and G. T. Montelione, Curr. Opinion Struct. Biol. 9, 635–642 (1999). 11. G. Rhodes, Crystallography Made Crystal Clear, Academic Press, San Diego, 1993, 2000.
12. M. A. Walsh, G. Evans, R. Sanishvili, I. Dementieva, and A. Joachimiak, Acta Crystallogr. D Biol. Crystallogr. 55, 1,726–1,732 (1999). 13. M. A. Walsh, I. Dementieva, G. Evans, R. Sanishvili, and A. Joachimiak, Acta Crystallogr. D Biol. Crystallogr. 55, 1,168–1,173 (1999). 14. B. W. Mathews, J. Mol. Biol. 33, 491–497 (1968). 15. B. W. Mathews, in H. Neurath and R. L. Hill, eds., The Proteins, 3rd ed., Academic Press, NY, 1977, pp. 404–590. 16. C. Giacovazzo, H. L. Monaco, D. Viterbo, F. Scordani, G. Gilli, G. Zanotti, and M.Catti, Fundamentals of Crystallography, International Union of Crystallography, Oxford University Press, Oxford, UK, 1992. 17. J. Drenth, Principles of Protein Crystallography, Springer, New York, 1994. 18. G. M. Sheldrick, J. Mol. Struct. 130, 9–16 (1985). 19. G. M. Sheldrick and T. R. Schneider, R. M. Sweet and C. W. Carter Jr., eds., SHELXL: High Resolution Refinement, Methods in Enzymology, Academic Press, Orlando, FL, 277, 1997, pp. 319–343. 20. R. A. Engh and R. Huber, Acta Crystallogr. A47, 392–400 (1991). 21. M. Gerstein and W. Krebs, Nucleic Acids Res. 26, 4,280–4,290 (1998). 22. L. N. Pauling, The Nature of the Chemical Bond, Cornell University, Ithaca, NY, 1939. 23. G. C. Kendrew, The Three-Dimensional Structure of a Protein Molecule, Sci. Am. 205, 96 (1962). 24. H. C. Watson, Prog. Stereochem. 4, 299 (1969). 25. J. S. Richardson, Adv. Protein Chem. 34, 167–339 (1981). 26. M. Carson, J. Mol. Graphics. 5, 103–106 (1987). 27. A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, J. Mol. Biol. 247, 536–540 (1995). 28. C. A. Orengo, A. D. Michie, D. T. Jones, M. B. Swindells, and J. M. Thornton, Structure 5, 1,093–1,108 (1997). 29. J. F. Gibrat, T. Madej and S. H. Bryant, Curr. Opinion Struct. Biol. 6, 377–385 (1996). 30. L. Holm and C. Sander, Nucleic Acids Res. 26, 316–319 (1998). 31. C. Chothia, Nature 357, 543–544 (1992). 32. C. A. Orengo, D. T. Jones, and J. M. Thornton, Nature 372, 631–634 (1994). 33. Z. X. Wang, Protein Eng. 
11, 621–626 (1998). 34. C. Chothia and A. M. Lesk, EMBO J 5, 823–826 (1986). 35. I. N. Shindyalov and P. E. Bourne, Proteins: Struct. Function Genet. 38, 247–260 (2000). 36. K. Henrick and J. M. Thornton, Trends Biochem. Sci. 23, 358–361 (1998). 37. M. L. Connolly, Science 221, 709–713 (1983). 38. M. L. Connolly, J. Appl. Crystallogr. 16, 548–558 (1983). 39. A. R. Leach, Molecular Modelling, Principles and Applications, Addison-Wesley, Harlow, UK, 1996. 40. G. J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T. A. Jones, Structure 2, 1,241–1,258 (1994). 41. T. F. Schwede, J. Retey, and G. E. Schulz, Biochemistry 38, 5,355–5,361 (1999). 42. J. P. Derrick and D. B. Wigley, J. Mol. Biol. 243, 906–918 (1994). 43. B. Mikami, M. Degano, E. J. Hehre, and J. C. Sacchettini, Biochemistry 33, 7,779–7,787 (1994).
44. E. M. Harris, A. E. Aleshin, L. M. Firsov, and R. B. Honzatko, Biochemistry 32, 1,618–1,626 (1993). 45. S. Onesti, P. Brick, and D. M. Blow, J. Mol. Biol. 217, 153–176 (1991). 46. H. A. Schreuder, J. M. Rondeau, C. Tardif, A. Soffientini, E. Sarubbi, A. Akeson, T. L. Bowlin, S. Yanofsky, and R. W. Barrett, Eur. J. Biochem. 227, 838–847 (1995). 47. F. Vallee, M. A. Turner, P. L. Lindley, and P. L. Howell, Biochemistry 38, 2,425–2,434 (1999). 48. M. A. Turner, A. Simpson, R. R. McInnes, and P. L. Howell, Proc. Natl. Acad. Sci. USA 94, 9,063–9,068 (1997). 49. J. Li, P. Brick, M. C. O’Hare, T. Skarzynski, L. F. Lloyd, V. A. Curry, I. M. Clark, H. F. Bigg, B. L. Hazleman, T. E. Cawston et al., Structure 3, 541–549 (1995). 50. M. Harel, C. T. Su, F. Frolow, I. Silman, and J. L. Sussman, Biochemistry 30, 5,217–5,225 (1991). 51. J. A. Tainer, E. D. Getzoff, K. M. Beem, J. S. Richardson, and D. C. Richardson, J. Mol. Biol. 160, 181–217 (1982). 52. R. A. Engh, A. Girod, V. Kinzel, R. Huber, and D. Bossemeyer, J. Biol. Chem. 271, 26,157–26,164 (1996). 53. A. Nicholls, K. Sharp, and B. Honig, Proteins: Struct. Function Genet. 11, 281 (1991). 54. A. C. Wallace, R. A. Laskowski, and J. M. Thornton, Protein Eng. 8, 127–134 (1995). 55. R. Koradi, M. Billeter, and K. Wüthrich, J. Mol. Graphics 14, 51–55 (1996). 56. P. J. Kraulis, J. Appl. Crystallogr. 24, 946–950 (1991). 57. M. F. Sanner, A. J. Olson, and J. C. Spehner, Biopolymers 38, 305–320 (1996). 58. T. A. Jones, J. Y. Zou, S. W. Cowan, and M. Kjeldgaard, Acta Crystallogr. A 47, 110–119 (1991). 59. R. A. Sayle and E. J. Milner-White, Trends Biochem. Sci. 20, 374–376 (1995). 60. E. A. Merritt and D. J. Bacon, Methods Enzymol. 277, 505–524 (1997). 61. N. Guex and M. C. Peitsch, Electrophoresis 18, 2,714–2,723 (1997). 62. N. Guex, A. Diemand, and M. C. Peitsch, Trends Biochem. Sci. 24, 364–367 (1999). 63. D. R. Westhead, D. C. Hutton, and J. M. Thornton, Trends Biochem. Sci.
23, 35–36 (1998). 64. G. Vriend, J. Mol. Graphics 8, 52–56 (1990).
IMAGING SCIENCES IN FORENSICS AND CRIMINOLOGY

RICHARD W. VORDER BRUEGGE
Federal Bureau of Investigation
Washington, D.C.
INTRODUCTION

Imaging and imaging technologies play an important role in forensics and criminology. Photographs and other visual recordings represent an efficient means for documenting the condition of a subject at an instant in time. The subjects of such images may include accident and crime scenes, items of evidence in a laboratory, victims, or suspects. Such visual records are easily preserved for later use by investigators, eyewitnesses, forensic scientists,
court officials, and juries. The thorough and complete photographic recording of a scene, person, or item of evidence also enables investigators and forensic scientists who were not present when the images were recorded to develop meaningful information long after the person, scene, or item was photographed and may have lost all traces of a prior condition. The recording of this prior condition can be critical in many investigations due to the transient nature of the subjects or scenes. For instance, victims’ wounds can heal, leaving no visible manifestation of the violence done to them. Similarly, a highway that is the scene of an automobile accident cannot be closed indefinitely to preserve the evidence but must be reopened in a timely fashion to permit the flow of traffic. Likewise, a footprint left in snow at a crime scene will not remain once the snow has melted. Photographs may also help eyewitnesses remember relevant facts they might otherwise discount or forget. In other instances, imaging technologies may provide the only means for revealing or recording evidence. Surveillance imagery depicting the robbery of a bank may be used to document the actions of the robbers, which might not otherwise have been witnessed. In the laboratory, forensic imaging techniques may be used to produce a visible record of writings that are not apparent to the naked eye through the selective use of lighting, specialized photographic papers, and filters. Image processing may be used to extract critical details or reduce extraneous noise from an image, thus helping clarify the picture. Finally, images may also be used as the subject of forensic analysis to derive information above and beyond that immediately apparent to the untrained eye. Such analyses may include comparing photographs of a latent impression with fingerprints of a known suspect or determining the height of a robber depicted in a surveillance image. 
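As a toy illustration of the kind of image-based measurement mentioned above (estimating a robber's height from a surveillance image), the sketch below scales the subject's extent in pixels by a reference object of known height. All names and numbers here are hypothetical, and real forensic photogrammetry must additionally model camera geometry, perspective, and lens distortion; this sketch assumes the subject and the reference stand at the same distance from the camera.

```python
# Toy sketch (not an actual forensic procedure): estimate a subject's height
# from an image by comparison with a reference object of known height.
# Assumes subject and reference are at the same depth and distortion is
# negligible -- real casework must model the full camera geometry.

def estimate_height(ref_height_cm, ref_pixels, subject_pixels):
    """Scale the subject's pixel extent by the cm-per-pixel ratio of the reference."""
    if ref_pixels <= 0:
        raise ValueError("reference must span a positive number of pixels")
    cm_per_pixel = ref_height_cm / ref_pixels
    return subject_pixels * cm_per_pixel

# Hypothetical example: a door frame of known height (203 cm) spans 580 px
# in the image, and the robber spans 500 px.
print(round(estimate_height(203.0, 580, 500), 1))  # 175.0 (cm)
```

The same proportional reasoning underlies reverse-projection techniques, where the reference measurements are taken at the scene itself.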
The range of imaging technologies used in law enforcement and forensic applications encompasses a broad field from traditional silver-based film to analog video, and now includes digital still and digital video imaging. Of particular importance to law enforcement today are technologies such as scanners and digitizers (or ‘‘frame grabbers’’) that convert traditional film and analog video imagery to a digital format. Once in a digital format, such images may be analyzed using digital processing techniques. The range of situations in forensics and criminology in which imaging technologies play a role is wide. For simplicity, these situations may be divided into several general categories: field-based photography and video, laboratory-based photography, forensic image processing, and laboratory image analysis. These categories are rarely independent of one another, however, because images acquired in the field or in a laboratory frequently become the subject of forensic image processing or image analysis.

FORENSIC FIELD-BASED PHOTOGRAPHY AND VIDEO

Images convey information that is not easily recreated through words alone. Photography provides a way of
recording complex information that is easily shared with others who were not firsthand witnesses of a scene or object. Crime scenes, accident scenes, and the victims of such incidents are among the most commonly documented subjects in forensic and criminological imaging. Basic documentation is critical to provide those who were not at the scene with a picture of the end result of the events under investigation. The category of ‘‘forensic field-based photography and video’’ is defined herein as that category of forensic imaging in which the subject of the photography is perishable or otherwise relatively unstable. Most of the time, the forensic photographer in the field will have only a short time to capture the scene before the conditions of the scene or the subject of the photography change and alter its fundamental visual characteristics. For example, if a crime or accident occurs in a public place, authorities will have only a limited amount of time to restrict public access to the area before they must remove victims and evidence and allow public access to the scene. In such cases, the photographer has only one chance to document the subject thoroughly. Once the victims and evidence have been removed, they are not easily returned for further photographic documentation.

Basic Crime Scene Photography

Forensic photographers encounter a wide variety of situations, including accident scenes, murders, fires, bombings, property crimes, and mass disasters. In every case, investigators are interested in finding out what happened, how it happened, and who was responsible. The specific means of photographing a crime scene may vary depending upon the nature of the crime but for the most part involves the same process. One of the primary goals of crime scene photography is to document the scene so that it is as accessible to later viewers as it was to those on the scene.
Therefore, forensic photographers attempt to photograph the scene to approximate how an on-site investigator might view it. This typically involves a three-step process in which the viewer is provided with a sequence of images that progress from a relatively broad view of an entire scene down to a narrow focus on specific items of interest. In every case, attempts should be made to complete the photography before any of the evidence or victims are moved so that the photographs reflect a crime scene that is as close to pristine as possible. The first step involves relatively wide field-of-view shots of a scene, sometimes referred to as ‘‘establishing views.’’ An entire scene is captured within a single frame so that the viewer can observe the relationship of the scene to the immediate neighborhood. In some cases, for example, if the crime scene is located inside a house, overall shots will be taken from multiple angles or sides of the house to document possible paths of entrance and egress. If the crime scene is an automobile, photographs of each side of the vehicle will be taken before moving to the interior of the vehicle. In a continuation of the wide-shot process, the approach to the crime scene within a building may include a progression of photographs documenting doorways, staircases, and hallways leading to the room or rooms
in which the main crime scene is located. Once at the specific room in which the crime took place or in which the evidence is located, the process of documenting the scene by overall views continues with photographs of the entire room. Such wide-angle views are generally taken from every corner of the room to provide as thorough documentation as possible. Following the wide shots, medium-range shots are taken to help viewers identify the spatial relationships between individual items of evidence; these are therefore sometimes called ‘‘relational views.’’ This process frequently involves interaction between the photographer and the personnel at a scene who are responsible for investigating the crime. The investigative personnel may point out to the photographer specific items of evidence in a scene that are suspected of being important to an investigation, such as a possible murder weapon or wounds on a deceased victim. The photographer documents the relationship between the specific item of interest and its surroundings by focusing on the object of interest while also including recognizable objects in the immediate vicinity to help viewers relate the object in question to the overall scene. Once the position of an object of evidence is documented relative to its surroundings, the photographer can proceed to the third level of documentation, in which the object of interest is the sole subject of the photographs. Such photographs are called ‘‘close-up views.’’ This process typically involves two steps: photographing the subject first as it was found, then photographing it with an identifier and/or scale inserted into the field of view. Exquisite care must be taken in placing the scale relative to the item of evidence, as well as in placing and directing the lighting used to illuminate the object of interest.
This is because, in many cases, the photographs taken at this time will be subjected to later forensic analysis to extract additional information that is not immediately apparent to those on the scene. This is especially important for fingerprints, footwear impressions, and tire tread impressions, as discussed later. Macrophotography is frequently used in this third step of the process. In macrophotography, the image recorded on the medium is approximately the same size as the object itself. This permits anyone viewing the photograph of the object to detect as much detail as they would if they were viewing the object itself. It also simplifies measurement, because an object can be measured from the photographs instead of directly.

Equipment for Forensic Field-Based Photography. Unlike many other forensic fields, basic crime scene photography has no formally mandated minimum standards. This may be due, in part, to the extreme range of conditions in which crime scene photographers find themselves, as well as the requirements or restrictions that many agencies place on their photographers. Although it was not uncommon for police photographers to use 4 × 5 black-and-white film cameras several decades ago, the most frequently used cameras for major crime scene documentation today are 35-mm single-lens reflex cameras, although some agencies continue to use larger
format 2 1/4 film cameras. For lesser crimes, perhaps involving only minor property damage or limited physical harm to a victim, many agencies will limit their photographic documentation to point-and-shoot or ‘‘instant’’ film cameras or digital still cameras. Both black-and-white and color films may be used, although black-and-white films are preferred for close-up photography, in which maximizing the visibility of small details is most critical. A variety of wide-angle, macro, and zoom lenses should also be available to give the photographer flexibility in capturing the area of interest. Finally, a separable, or ‘‘off-camera,’’ flash unit is also frequently included in a field photographer’s kit because it is often necessary to provide lighting from a direction other than that of the camera. Many agencies use video documentation to augment their film documentation of crime scenes. However, the limited resolution of video relative to film makes video alone insufficient for completely documenting major crime scenes and providing the type of forensic photographic evidence that can be most useful in successfully investigating a crime. High-definition video offers a chance to improve on the quality of crime scene video, but higher costs and the current lack of an industry standard (similar to analog VHS) prevent its widespread use in law enforcement. Nevertheless, video documentation can be very useful in providing ‘‘walk-through’’ views and giving the viewer a sense of the spatial relationships between one part of a scene and other, widely separated parts. In some cases, investigators will use video to document the paths individuals took when walking or driving from one location to another, such as the suspects’ paths from their homes or places of work to the crime scene. A recent development now being marketed to law enforcement agencies is systems in which 360° views are generated from a single position by using multiple cameras.
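The way such a multi-camera system partitions the full 360° view can be sketched as follows; the camera count and field of view here are illustrative assumptions, not the specification of any actual product.

```python
# Illustrative sketch (hypothetical parameters): a ring of evenly spaced
# cameras covers 360 degrees, each owning an equal angular slice. Given a
# requested viewing azimuth, return the camera whose slice contains it.

def camera_for_azimuth(azimuth_deg, n_cameras=8):
    """Index (0..n_cameras-1) of the camera covering the given azimuth."""
    fov = 360.0 / n_cameras                  # each camera's angular slice
    return int((azimuth_deg % 360.0) // fov) # wrap, then pick the slice

print(camera_for_azimuth(0))      # camera 0 covers 0-45 degrees
print(camera_for_azimuth(100))    # camera 2 covers 90-135 degrees
print(camera_for_azimuth(359.9))  # last camera covers 315-360 degrees
```

In a real system the slices overlap and the software blends adjacent views; this sketch only shows the basic angular bookkeeping behind a ‘‘virtual walk-through.’’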
The separate views generated by each camera are related to one another in real time through computer software, making it possible for viewers to generate their own ‘‘virtual walk-throughs’’ of a scene. As with high-definition video systems, the high cost of such systems has so far prevented their widespread distribution, but it is anticipated that they, too, will become a useful tool for law enforcement in years to come. In recent years, as digital still photography has become more widely available to the consumer market, a number of law enforcement agencies have become interested in using this technology to replace traditional film technologies. Unfortunately, the perception among many is that because digital technology is newer than film, it must be better. This is not yet the case. For forensic applications, the single most important factor, besides actually producing a photographic record of a subject, is the resolution, or amount of detail one may detect in an image. Digital camera systems comparable in cost to 35-mm film camera systems cannot yet match the resolution that is possible using traditional 35-mm film cameras under optimal conditions. A simple calculation demonstrates this. A single frame of 35-mm film measures 36 × 24 mm. The technical specifications for a major film
manufacturer’s ISO 100 black-and-white film state a resolution of between 63 and 200 lines per millimeter, depending upon the contrast exhibited by the subject. Using the simplifying assumption that one line may be equated to one and a half pixels, the worst-case film scenario (resolution = 63 lines per millimeter) translates into an image size of 3,402 × 2,268 pixels, a total of 7.7 million pixels. This is more than two and a half times the size of the detectors used in 3-megapixel consumer digital cameras (approximately 2,000 × 1,500 pixels), where one megapixel equals approximately one million pixels. Under the best possible conditions for this film (resolution = 200 lines per millimeter), the resulting image size would be 10,800 × 7,200 pixels, a total of 77.8 million pixels, more than 25 times the number of pixels available in a 3-megapixel digital camera. Some researchers state that the assumption of one and a half pixels per line is too low and that a value of two pixels per line would be more appropriate. Using this value, a frame of the 35-mm film discussed before would have a size ranging from 4,536 × 3,024 pixels (a total of 13.7 million pixels) to 14,400 × 9,600 pixels (a total of 138.2 million pixels). Thus, the film would contain 4 to 40 times more pixels than the 3-megapixel digital camera. Regardless of whether one selects a conversion factor of 1.5 or 2 pixels per line, 35-mm film can record far more detail in a scene than most digital cameras. Larger format film cameras, such as 2 1/4 cameras, use the same types of film at the same high resolutions but capture an even larger frame, resulting in even more pixels per picture. Another factor that favors traditional film cameras over current digital cameras is the homogeneous distribution of the light-sensitive silver grains within film, compared with the rigid arrangement of the detectors into horizontal and vertical lines in digital cameras.
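The film-equivalent pixel arithmetic above is easy to verify; this sketch simply recomputes the quoted figures from the frame dimensions, the resolution in lines per millimeter, and the assumed pixels-per-line conversion factor.

```python
# Recompute the pixel-equivalent size of a 36 x 24 mm frame of 35-mm film
# for the resolutions and conversion factors quoted in the text.

def film_megapixels(lines_per_mm, pixels_per_line=1.5, width_mm=36, height_mm=24):
    """Return (width_px, height_px, megapixels) for the given film resolution."""
    w = width_mm * lines_per_mm * pixels_per_line
    h = height_mm * lines_per_mm * pixels_per_line
    return w, h, w * h / 1e6

for lpmm in (63, 200):            # worst- and best-case film resolution
    for ppl in (1.5, 2.0):        # assumed pixels needed per resolvable line
        w, h, mp = film_megapixels(lpmm, ppl)
        # matches the 7.7 / 13.7 / 77.8 / 138.2 Mpix figures quoted above
        print(f"{lpmm} lines/mm, {ppl} px/line: {w:.0f} x {h:.0f} = {mp:.1f} Mpix")
```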
This fixed alignment of detectors within digital cameras reduces the accuracy with which straight lines or edges that are oriented at an angle may be rendered in the final image. Straight lines on an angle will be rendered with jagged edges in a digital image, whereas the silver grains within film can create a more accurate smooth edge for such lines because they are not restricted to a horizontal or vertical grid position. A primary goal of forensic imaging is to provide the most accurate recording possible, so film would again be favored over digital imaging. Although based on some very broad assumptions, these conclusions support the underlying concept that 35-mm film cameras remain the most effective means of capturing forensic images in the field. Besides ignoring the actual contrast sensitivity of digital camera detectors (a consideration that favors digital cameras in the preceding calculations), a number of other factors may lead one to favor 35-mm (and larger format) cameras for forensic field applications. These factors include the greater availability of interchangeable lenses for SLR film cameras, as well as the ability to use off-camera flash units. Such accessories are critical for applications such as close-up photography and latent impression photography. Although such accessories are becoming more widely available for professional digital cameras, they are not yet
IMAGING SCIENCES IN FORENSICS AND CRIMINOLOGY
generally available for consumer-grade digital cameras that many agencies are now considering for use in law enforcement. Despite the current limitations described before, digital still imaging is being used by a number of law enforcement agencies to augment film documentation of crime scenes. These digital images can be examined and used immediately by investigators on the scene, which is not possible when using traditional 35-mm film images. They may also be transmitted back to one’s office or to other agencies for rapid dissemination to other investigators. Most of the time, these digital images are not intended for use in later forensic analyses but are, instead, used as ‘‘recognition-type’’ photographs, as discussed below.
Specialized Crime Scene Photography
The photographs whose acquisition is described in the preceding section are primarily intended to document the condition of a scene as it was found, so that an accurate representation of the scene and the items within it may be presented to later observers. In many cases, however, specialized photographic techniques are used to acquire evidence for further forensic investigation. The subject of such photographs typically includes latent fingerprints or other impressions left on objects or victims at the scene. The use of fingerprint, tire tread, and footwear impression evidence is described later. For such photographic evidence to have forensic value, it must be acquired in a manner that preserves its utility as a meaningful piece of evidence. To an extent, the difference between basic crime scene photography and specialized crime scene photography comes down to a question of recognition versus identification. Recognition is a subjective process through which individuals perceive that an object or person is the same object or person that was known to them beforehand.
Identification involves an objective process through which an object or person is uniquely distinguished from all other objects or individuals based on specific, observable characteristics. A primary purpose of basic crime scene photographs is to enable those examining them to recognize the places, objects, and people depicted therein, as well as their physical locations within the scene and relative to one another. In most cases, the establishing views and relational views are intended for use in recognizing the physical evidence collected from a scene. Close-up views may also aid in recognition, but in many instances, they serve a more critical purpose linked to the forensic identification process. Latent fingerprint impressions, tire tread impressions, and footwear impressions left at crime scenes provide strong physical evidence that can be used to link suspects to crimes. This is due to the fact that the objects that create these impressions can be individualized to a single source based on unique, observable characteristics. The characteristics that make an individual fingerprint unique involve the arrangement of friction ridges, including features such as bifurcations, ridge endings, and islands (short ridges that are unconnected to other ridges on the skin). These features appear repeatedly on the fingers of
all persons and are arranged in different, unique patterns on different fingers. In tire treads and footwear, differences in the tread design may enable one to differentiate tires or footwear made by different manufacturers, but the key to identifying individual shoes or tires depends upon individual tiny nicks, gouges, or embedded particles that may have been created during the manufacturing process or through normal wear and tear. These individual identifying features result from random processes, thereby making them useful in differentiating one object from another. When a latent impression is made by a finger, shoe, or tire, the pattern of ridges, nicks, gouges, and embedded particles on the object is transferred to the receiving surface, leaving a record of those individual identifying characteristics. When the forensic photographer encounters such evidence, the critical task is to capture those details photographically, so that those characteristics can be meaningfully compared with the object or objects suspected of having left the impression. Such details are typically small and may not be immediately apparent to the untrained eye, so a great deal of care must be exercised to capture them. The first key to capturing such detail is to use a medium that can record sufficient resolution across the entire field of interest so that the features of interest may be detected, identified, and compared against known samples. Latent fingerprints recovered from crime scenes are compared against known fingerprint samples taken from individuals who have been arrested in the past. These known samples usually consist of inked ‘‘ten-print cards’’ that record an individual’s fingerprint for each digit. These ten-print cards define a standard against which latent print photographs may be compared. 
The National Institute of Standards and Technology (NIST), working with the Federal Bureau of Investigation (FBI), has developed a standard for exchanging fingerprint data via electronic media in which ten-print cards (those that depict inked fingerprints of a suspect’s ten digits) are scanned at a resolution of 500 samples per inch or approximately 19.7 samples per millimeter. Because Nyquist sampling theory holds that one must sample a feature at a rate of at least two pixels per feature length to depict that feature accurately, the NIST standard permits one to resolve features that are approximately one-tenth of a millimeter across. Note, however, that if one defines the feature of interest as a paired set of dark and light lines (or a fingerprint ridge and its adjacent furrow), then one must use four pixels — two each for the dark and light lines — and the resolvable feature would be only one-fifth of a millimeter across. NIST recommends capturing latent impressions (those impressions from an unknown subject that are recovered at a crime scene) at a resolution of 1,000 samples or pixels per inch (ppi). This higher sampling rate is intended to ensure that the features visible at 500 samples per inch in the ten-print card can be detected in the latent impression. If one were attempting to photograph the latent impression left by an average adult male’s hand that is approximately 7 inches long by 5 inches wide, one would need 7,000 × 5,000 pixels.
Figure 1. Photograph depicting part of an average adult male’s hand. Shading indicates the area covered by a camera with 6 megapixels (3,000 × 2,000 pixels) when the resolution is fixed at 1,000 pixels per inch (ppi), the resolution recommended for acquiring latent impressions. See color insert. (FBI Laboratory Photograph)
Such an image size can be achieved by using 35-mm film under good conditions as described earlier, but is beyond the detector size of most digital cameras today. Figure 1 further demonstrates this by superimposing the area encompassed by a 6-megapixel camera (3,000 × 2,000 pixels) on an average adult male’s hand at a resolution of 1,000 ppi. It would be possible to photograph the entire hand depicted in Fig. 1 by using the 6-megapixel digital camera at 1,000 ppi, but it would require taking multiple photographs and then stitching them together to create a single image of the hand. In fact, to cover fully a rectangular area 7 inches long by 5 inches wide at a resolution of 1,000 ppi would require eight photographs using the 6-megapixel camera. For 3-megapixel cameras, which are far more common in law enforcement than 6-megapixel cameras, the number of photographs needed to capture the same area is almost double. The size of features needed for positive identification of footwear and tire tread impressions is not as well defined as for fingerprints, because such features consist of random nicks or gouges that can originate from a variety of sources and may be as small as a pinprick. For purposes of discussion, however, one can consider that the smallest feature is comparable to those needed for fingerprint analysis — or a tenth of a millimeter across. Across a distance of 1 foot — a reasonable assumption for the length of most footwear impressions and the width of most tire tread impressions — this translates into a total of 6,000 pixels in the long dimension. This number of pixels is achievable within a single frame when using 35-mm high-resolution film under the best conditions, but, as for the hand print, is not possible using commercially available digital cameras, whose maximum pixel count is no more than 3,000 in the long dimension.
As in other situations, larger size film formats, including 2 1/4-inch films and 4 × 5-inch films, provide an even larger area over which high resolution imagery may be recorded.
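The sampling arithmetic above can be sketched in a few lines of code. This is an illustrative calculation using only figures quoted in the text (the 500 and 1,000 ppi standards, a 7 × 5 inch hand, and 3,000 × 2,000 or 2,000 × 1,500 pixel sensors):

```python
import math

MM_PER_INCH = 25.4

def resolvable_feature_mm(samples_per_inch, samples_per_feature=2):
    """Smallest feature resolvable under the Nyquist criterion."""
    return samples_per_feature * MM_PER_INCH / samples_per_inch

def frames_needed(area_inches, sensor_pixels, ppi):
    """Minimum frames needed to tile an area at a fixed sampling
    resolution, allowing the camera to be rotated between landscape
    and portrait orientations between frames."""
    need_w, need_h = area_inches[0] * ppi, area_inches[1] * ppi
    counts = [math.ceil(need_w / sw) * math.ceil(need_h / sh)
              for sw, sh in (sensor_pixels, sensor_pixels[::-1])]
    return min(counts)

print(f"{resolvable_feature_mm(500):.2f} mm")     # ten-print card standard
print(frames_needed((7, 5), (3000, 2000), 1000))  # 6-megapixel camera
print(frames_needed((7, 5), (2000, 1500), 1000))  # 3-megapixel camera
```

This reproduces the figures given above: features of roughly one-tenth of a millimeter resolvable at 500 ppi, eight frames for a 6-megapixel camera covering the hand at 1,000 ppi, and nearly double that count for a 3-megapixel camera.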
After resolution, the next most significant factor to consider when acquiring forensic images of latent impressions is proper lighting and exposure. In black-and-white photography, one objective is to acquire a wide range of density or brightness within the scene, from darkest black to brightest white. When photographing impression evidence, however, the type and direction of the light source can be even more important, especially when the subject is a footwear or tire tread impression in soil or other material that has resulted in a marked indentation. Tire tread and footwear sole patterns create impressions that consist of both raised and sunken features that can be highlighted by placing a light source low to the ground and across the impression. This permits identifying both raised and sunken features by creating highlights on those features that face the light source and shadows on those features that face away from the light source. If the light source is placed only directly over the features of interest, they may be less apparent and therefore less useful as a tool of forensic identification. Other steps that must be taken to acquire the best possible impression photographs include ensuring that the film plane is parallel to the impression so that distortions are not created in the dimensions of the impression. Inclusion of a scale or ruler is also of critical importance to ensure that accurate comparisons of the impression can take place. When including a scale, it should be placed as close as possible to the same height (or depth) as the impression, so that the scale correctly represents the scale of features within the impression. As a final consideration, care must be taken to ensure that the depth of field is selected so that the entire impression is in focus.
Autopsy and Victim Photography
One specialized type of forensic photography that falls between field-based and laboratory-based photography is autopsy and victim photography.
The conditions under which an autopsy photographer works may be somewhat more controlled from an environmental standpoint — one need not worry about weather conditions, and the lighting should be completely within the control of the photographer — but once the photographs have been taken and each step of the autopsy has been completed, there is little or no chance to go back and take additional photographs. Therefore, autopsy photographs must be taken correctly the first time. When a living victim is being photographed, it is best to photograph the wounds as soon as possible because their outward appearance will change as they heal. As in photography of crime scenes, autopsy and victim photography should begin by taking shots depicting the entire body, moving in to medium-range, relational views to provide the context of specific wounds or features, followed by close-up, wound-specific photographs. The close-up photographs should include frames depicting the wound alone, followed by frames depicting the wound with a ruler or scale in close proximity. It is best to place the ruler immediately adjacent to the wound, so that the most accurate measurements of the wound may be acquired from the photographs, if necessary. In most cases, it is
preferable to measure the wounds directly at the time of autopsy and record them then; the photographs merely document these measurements. Close-up photographs of wounds may be important in an investigation by showing such characteristics as powder stippling from a gunshot wound or a jagged-edged wound created by a serrated knife. Likewise, photographs of gunshot wounds can be used to document whether they represent entrance or exit wounds. In some cases, autopsy or victim photographs can be of special importance in identifying the implement, device, or object that was responsible for creating the wound. Ropes or cords used as ligatures may leave striping or lineations in the skin of the victim, which may then be compared later with items suspected of having been their source. Likewise, burns from hot objects such as lighters or irons may be documented photographically. Because such wounds may heal unevenly, causing a part of the pattern to disappear earlier than the rest of the wound, it is especially important to photograph the wounds when fresh to ensure the most accurate documentation. A final consideration in autopsy and victim photography is the importance of color. The color or change of color in some wounds may be critically important to the investigating pathologist, so the photographs used to depict these injuries must be capable of documenting those observations. Therefore, the use of color films is appropriate for most autopsy photography. Care must be taken in such cases, however, to take into account the available lighting conditions and to correct for them when necessary. Fluorescent lighting can generate a green tint in photographs taken under it, so the use of a flash unit or other light source is necessary in such situations. Including a color chart along with a scale in such photographs is always useful for calibrating the colors at a later date.
SURVEILLANCE IMAGING
Surveillance imaging is any activity designed to record photographically the activities of individuals engaged in, or suspected of, illegal or improper activities. Surveillance may be active or passive and may be conducted by law enforcement agencies in ongoing investigations or as part of an organization’s security activities. The events being documented through surveillance imaging are typically one-time events that will not be replicated. There is only one opportunity to obtain the images, so this type of photography falls within the field-based category. Law enforcement agencies must often engage in active surveillance to document the movements or actions of individuals who are suspected of illegal activities. In such circumstances, surveillance is usually covert, so that the suspects are unaware of the surveillance and will not alter their activities. Most frequently, still images may be used to document the fact that a suspect or vehicle was observed in a specific location or that the suspect met with a specific individual or was carrying a specific object. Traditionally, 35-mm film cameras that have telephoto lenses are used for still imagery. It is not unusual to use lenses whose focal lengths are 300 mm or more, and zoom lenses are
frequently used as well, especially if the range to the subject will vary during the surveillance. Black-and-white film is usually selected as the recording medium in covert surveillance for the higher resolution it offers over color films, digital cameras, and video cameras. In addition, the lighting conditions in many surveillance operations are such that a high-speed film, rated at ISO 800 or higher, is necessary to obtain properly exposed images of sufficient quality that they may be useful to the investigation. Surveillance images are of little use if they are blurry or too poorly exposed to permit one to recognize the individuals, vehicles, or objects depicted therein. Video cameras are used in covert situations in which it is desirable to record the complete actions of individuals under surveillance. Analog video systems remain the most widely used systems in law enforcement, but digital video systems are beginning to be used more frequently. The most widespread use of video cameras in surveillance involves their use in security systems. Banks and other commercial operations such as convenience stores and fast food restaurants, as well as schools and other public facilities, use video camera systems that provide some of the most important imaging evidence used in law enforcement today. Imaging technologies developed specifically for video surveillance include time-lapse recorders and multiplexers. ‘‘Time-lapse’’ refers to video systems that can record video imagery at a rate of less than 60 fields per second. (For simplicity, the discussion of video in this section and throughout this entry focuses on National Television System Committee (NTSC) video, the standard for video systems used primarily in the Western Hemisphere and Japan.
For the most part, this discussion is also valid for video imagery recorded in the Phase Alternating Line (PAL) and Séquentiel Couleur à Mémoire (SECAM) formats, which predominate throughout the rest of the world and record fewer images per second, but have a larger image size than that of NTSC.) The utility of time-lapse systems lies in their ability to record imagery during a much longer period of time than would be possible if the recording were made at a normal rate. Whereas a normal, commercial-grade video cassette tape will record approximately two hours of NTSC video at a rate of 60 fields per second, time-lapse systems permit one to record images during periods of time up to 240 hours in some cases. The means for accomplishing this is simple. The rate at which images are recorded is simply reduced from 60 fields per second to as few as one field every two seconds, which permits recording images during a period of 240 hours. In such instances, the cameras are usually pointed at locations in which the subject will be present for a period of time in excess of the period between sequential images. For example, if a typical bank transaction takes 1 minute to occur, then a single camera pointed at a teller station on a time-lapse setting of 240 hours would record 30 images during a transaction. This is usually more than enough to obtain a good view of the subject. In many surveillance applications, images from multiple video cameras are recorded on a single tape by a multiplexer. In the simplest form, a multiplexer accepts
input signals from multiple sources and generates a single output signal. In video surveillance systems, multiple cameras enable an organization to monitor and record images of more than one location at a time. Such systems are particularly useful to banks that choose to record transactions at multiple teller stations at one time and to retail stores that wish to monitor multiple aisles or locations in a store. The rate at which a video multiplexer switches from one input camera signal to the next can be varied down to as little as one field per camera before switching to the next camera. Multiplexers can be combined with time-lapse recorders to create a multitude of recording options in surveillance systems, and, in fact, some surveillance time-lapse video recorders have built-in multiplexers. Another functionality included in some multiplexers is the ability to display the inputs from multiple cameras simultaneously. The most common configuration used in this scenario displays the incoming signals from four cameras at once, although systems that display sixteen cameras at once are also frequently encountered. In a four-camera system, the output image generated by the multiplexer is divided into four quadrants. This is commonly referred to as ‘‘quad video.’’ To display four signals simultaneously on a regular video monitor, however, it is necessary to compress the incoming signals so that, together, they make a single video image. The resulting output image has only one-half the width and one-half the height of the input image, so this reduces the amount of information available in any one camera’s image to one-quarter of what it was before it was routed through the multiplexer. Finally, a conversion is underway in the surveillance industry in which video will no longer be stored in analog format, but rather in digital format.
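The time-lapse and quad-multiplexer arithmetic described above can be sketched briefly. The figures are those quoted in the text: a two-hour tape at 60 NTSC fields per second, and an approximate 640 × 480 NTSC frame:

```python
# Time-lapse and quad-video arithmetic, using figures quoted above.
TAPE_FIELDS = 2 * 3600 * 60        # fields on a 2-hour tape at 60 fields/s

def recording_hours(seconds_per_field):
    """Total recording span when one field is kept every N seconds."""
    return TAPE_FIELDS * seconds_per_field / 3600

def images_of_event(event_seconds, seconds_per_field):
    """Fields captured while an event of the given duration occurs."""
    return event_seconds // seconds_per_field

print(recording_hours(2))          # 1 field every 2 s -> 240.0 hours
print(images_of_event(60, 2))      # 1-minute teller transaction -> 30

# Quad display: four cameras share one frame, so each camera's image
# is halved in width and height, keeping one-quarter of its pixels.
full = 640 * 480
quadrant = (640 // 2) * (480 // 2)
print(quadrant / full)             # -> 0.25
```

Recording one field every two seconds stretches the same tape from 2 hours to the 240 hours cited above, while still capturing 30 images of a one-minute transaction.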
The analog video signal is captured, then converted to a digital signal, which can then be stored locally on a computer hard drive or transmitted to a remote server for storage. Conversion of the video to a digital format also makes it possible to enhance the capabilities of the surveillance system. In some security applications, the video signals are linked to systems that can detect motion, track people or objects, and even compare the faces of individuals depicted in the video with faces contained in a database. If someone who is not supposed to be present is detected, the system can be configured to alert security personnel. Another benefit of these systems is that by storing the images in a digital format, it is far easier to process the images after the fact using nonlinear editing techniques, as discussed later. From a forensic perspective, the only major drawback to such digital video systems is the frequent use of compression to permit faster transmission and storage of the video images. As discussed later, excessive compression can severely reduce the forensic utility of imagery, so care must be taken to minimize its deleterious effects. Although video now represents the most common imaging technology used in private surveillance systems, some banks and other institutions still use black-and-white film cameras in conjunction with time-lapse video systems in some of their security systems. The video systems run continuously in time-lapse mode, but the film
systems usually do not. Instead, they may be triggered to start taking pictures when an alarm is set off or when a set of ‘‘bait’’ bills is pulled during a robbery. Once activated, these cameras take one or two pictures per second until deactivated or until a set period of time elapses. Usually, such cameras are located over the main entrance/exit of the bank to ensure that any robbers exiting the bank are photographed as they leave. Several different film formats are used in bank surveillance cameras, including 16-mm, 35-mm, 35-mm half-frame, and 70-mm, as well as some panoramic films, among others. Most commonly, the film used in these cameras is rated at approximately ISO 400 and has a resolving power far superior to that of video — anywhere from 50 to 125 lines per millimeter. This translates to an effective resolution of between 3,600 × 2,400 pixels and 9,000 × 6,000 pixels, versus a standard NTSC video resolution of approximately 640 × 480 pixels. Although the resolution of ISO 400 film is less than that of one rated at ISO 100, its faster speed permits higher shutter speeds so that blurring due to subject motion can be reduced while maintaining suitable exposure. Subjects depicted in bank robbery footage often move faster than individuals in banks at other times, so a higher shutter speed is useful.
MUG SHOTS
In 1841, a few years after the invention of photography by Niépce, the police in Paris began collecting a rogues’ gallery — the first mug shots. In 1874, the use of photographs to identify people was permitted in court for the first time. The use of identification photographs in law enforcement has grown ever since. The greatest boost in taking mug shots came in the 1880s when the Bertillon method of identification was established. This method quickly became the preferred means of personal identification in law enforcement until supplanted by fingerprint identification in the early 1900s.
The Bertillon method is a biometric system that relies on careful measurements of the craniofacial characteristics of individuals. To assist the practitioners of this method in their work, mug shots were taken to depict full-facial and profile views of individuals. The general practice of mug shot photography has not changed much since that time, except for the technologies used. As in other areas of forensic imaging, the current tendency is to migrate toward digital or electronic imaging technologies to acquire and store these images. Although the quality of these digital images lags behind that of 35-mm film for identification purposes, they are suitable for recognition. The digital format also simplifies transmitting them via telecommunications channels. Today, the greatest benefit of digital imaging of mug shots is the promise it holds for allowing automated searches of mug shot databases. A number of automated facial recognition technologies have become highly capable, although there is still room for improvement. Perhaps the greatest challenge facing those who seek to develop national and international criminal mug shot databases is the lack of standards in acquiring mug shots.
Until a standard is established, many law enforcement agencies are likely to continue to shoot mug shots on film.
LABORATORY-BASED PHOTOGRAPHY
Once a crime scene has been photographed, it is frequently necessary to photograph items of evidence removed from the scene in a controlled, laboratory setting. Such photography may be undertaken merely for additional documentation, such as recording the condition of the evidence before it is altered by destructive testing. Alternatively, laboratory photography may be undertaken to support forensic laboratory examinations, where it is easier for the examiner to use photographs of the evidentiary object, rather than the object itself. Likewise, documentary photographs can be easier for investigators, court officials, and juries to handle than the actual item. Laboratory photography may also be used to extract further information of forensic value from the evidence.
Evidence Documentation
A fundamental advantage of laboratory-based evidence documentation is the opportunity it affords the photographer to select the specific lighting conditions and angle of view that will be used to depict the object. Furthermore, it enables the photographer to focus solely on the object under consideration, without having to worry about other objects that may fall within the field of view and distract from the subject. Unless the object being photographed is particularly fragile or delicate, the photographer also has far more time for photographing the subject than in the field. A common purpose of laboratory photography is to document the condition of an item of evidence before it is modified for further forensic testing. For example, an item of clothing bearing blood stains or other bodily fluids will be photographed before and after swatches of the stained cloth are cut out and removed for serological testing. This is done to document the presence of the fluid on the item, as well as the location from which the swatches were removed.
In this way, the integrity of the evidence and the resulting forensic analyses may be ensured. Microscopic evidence is one type of evidence that requires specialized laboratory photographic documentation. Hair, fiber, glass, or other trace evidence recovered at a crime scene can provide useful information in the investigation if it can be associated with a known individual or source. Likewise, bullets and cartridge cases can provide strong evidence in a case if they can be associated with a particular weapon. In each of these instances, the item of evidence can be distinguished from other similar items based on the characteristics visible under microscopic examination. Scientists who examine trace evidence and ballistics rely on comparison microscopes for conducting side by side comparisons of evidence recovered from a scene with items from a known source. A comparison microscope consists of two separate microscopes joined by an optical bridge. By using a comparison microscope, characteristics of hairs and fibers such as color and diameter may be compared
Figure 2. Side by side comparison of hair. See color insert. (FBI Laboratory Photograph)
directly. Figure 2 depicts a side by side comparison of a hair recovered from a crime scene with a hair obtained from a known source. By aligning the known hair with the questioned hair, similarities in the dimensions and optical characteristics may be demonstrated. In bullet or cartridge case comparisons, impressions or striations imparted to the bullet by the barrel rifling or to the cartridge case by the firing pin can be compared directly with those of known weapons by using bullets and cases generated from test firings of the weapon. Once the visual comparison has been performed by the examiner, it can be documented by mounting a camera on the microscope to recreate the examiner’s view. In addition to the one-to-one comparisons that a firearms expert may conduct using the comparison microscope, the images of recovered bullets and shell casings can also be digitally scanned and compared with images of bullets and casings recovered across the United States by using several available image databases. It is expected that by the year 2003, the major ballistics image databases in the United States will be combined into a single national database, known as the National Integrated Ballistics Information Network (NIBIN). Because weapons are frequently used in more than one jurisdiction, this system will enhance the ability of law enforcement agencies to connect
previously unconnected crimes that occur across the country.
Advanced Laboratory Techniques
Although much of forensic photography consists of documenting items and characteristics visible to the naked eye, one of the greatest contributions of forensic photography to law enforcement is its ability to document evidence that cannot normally be seen. Forensic photographers take advantage of lighting and light sources, filters, and films not normally used by recreational photographers to document otherwise invisible evidence. Some of these techniques rely simply on the use of filters to block out unwanted portions of the visible spectrum, and others depend on the reflectance or emission of ultraviolet and infrared light by the object under scrutiny. Filtration and Nonvisible Reflectance. One of the simplest techniques in laboratory forensic photography involves using filters to block out certain wavelengths of visible light, so that the subject of interest may be seen more clearly. Figure 3a depicts an aluminum can bearing a dark fingerprint that extends from the white portion of the can to the red portion. The low contrast between the dark fingerprint dust and the red background makes it difficult to distinguish the details of the print. However, by applying filtration that allows only red light to pass to the film, it is possible to lighten any areas that are red (Fig. 3b). In this filtered view, the red portions appear brightest, and the blue and green portions are dark. The fingerprint, which contains blue and green components
as well as red, remains dark against the previously red background, permitting detection of details of the print. A similar technique may be used when attempting to decipher writing that has been obliterated through the application of additional ink. In much the same way that inks come in different colors, differences in chemistry may affect their properties at infrared and ultraviolet wavelengths. Two inks that appear identical in visible light may be very different when viewed in infrared or ultraviolet light. Figure 4 depicts an example in which the infrared reflectances of two inks are found to differ. Figure 4a depicts an obliteration seen in visible light. When photographed so that only the reflected infrared is recorded (Fig. 4b), the signature underneath the obliteration is clearly seen, indicating that the obliterating ink is transparent to infrared light, whereas the underlying ink is not. The same holds true for the ultraviolet region seen in Fig. 5. Figure 5a depicts two pages from a passport under normal viewing conditions. The passport is suspected of being counterfeit, and one way counterfeit passports are produced is to make previously canceled passports appear valid by staining the pages to obliterate the ‘‘CANCELED’’ stamps. Figure 5b depicts the same pages photographed in reflected ultraviolet light, revealing that the passport is actually a counterfeit that has been treated to obscure the ‘‘canceled’’ stamp. In some cases, the application of reflected infrared or ultraviolet lighting may be combined with high-contrast film to clarify latent impressions further. Figure 6a depicts a blood-stained sheet of paper recovered from a crime scene that was photographed in visible light. The right side of the page appears blank. However, when photographed using reflected ultraviolet and high-contrast film, a pair of footwear impressions is revealed in the previously blank area (Fig. 6b). Luminescence. 
The use of infrared and ultraviolet reflectance to decipher obliterated writing, as described before, depends on the tendency of some inks to be transparent to these wavelengths whereas others are
Figure 3. The use of filtration to drop red portions of the soda can, so that the latent print is more easily detected. (a) View of can with no filtration. (b) View after filtration is inserted to pass only the red component of light, making the red portion lighter. See color insert. (FBI Laboratory Photograph)
Figure 4. Example of infrared reflectance used to reveal obliterated signature. (a) Visible light view of obliteration. (b) View of signature photographed in reflected infrared. (FBI Laboratory Photograph)
Figure 5. Counterfeit passport revealed via reflected ultraviolet. (a) Visible light view of passport. (b) View of passport photographed in reflected ultraviolet. (FBI Laboratory Photograph)
opaque. Another property of some inks that can be used to great effect in forensic photography is their tendency to exhibit luminescence when exposed to various light sources. Figure 7a depicts a portion of a check suspected of having been altered from $9 to $99. Figure 7b depicts the same check after being photographed by infrared luminescence. The glowing writing in Fig. 7b represents the original writing on the check in an ink different from
Figure 7. Check alteration revealed via infrared luminescence. (a) Visible light view of check. (b) Check photographed in infrared reveals original writing. (FBI Laboratory Photograph)
Figure 6. Footwear impressions revealed via reflected ultraviolet and high contrast film. (a) Visible light view of page. (b) Page photographed in reflected ultraviolet and recorded on high-contrast film. See color insert. (FBI Laboratory Photograph)
that used to alter the check. Close examination reveals that the individual who altered the check attempted to hide this fact by overwriting all of the original writing as well as adding the ‘‘Ninety’’ and ‘‘9’’ to the dollar value. One can also note in Fig. 7b that the luminescence is so strong that the ink stamp on the reverse side of the check is visible to the left of the signature in this image.
Figure 8. Alteration of postal meter stamp detected through ultraviolet luminescence. (a) Visible light view of stamp. (b) Stamp viewed in ultraviolet reveals original meter number. (FBI Laboratory Photograph)
Ultraviolet luminescence likewise can be used to reveal secret writing and to detect alterations, such as the change in the postal meter number depicted in Fig. 8. In each of the images that reveal the alteration, the object in question is being illuminated by an ultraviolet source, but the reflected ultraviolet is filtered out. In some cases, a combination of filtration, contrast, and luminescence must be used to ensure that the maximum amount of information is extracted. Figure 9 provides one such example. Figure 9a depicts the skin removed from the palm of a murder victim. Visual inspection of the hand revealed what appeared to be some writing on the hand. When photographed using a red filter and high-contrast film, the image in Fig. 9b was produced. The writing was discovered to be a phone number. The palm was next photographed using ultraviolet fluorescence techniques, resulting in the photograph in Fig. 9c, which appears to depict a name written above the phone number. Both the phone number and the name proved to be useful investigative leads. Polarization and Directional Lighting. Whereas the previous examples demonstrated the selection or removal of specific wavelengths of light, differences in the polarization of light reflected from a surface may also be used to enhance the visibility of details on a surface. Figure 10a depicts the appearance of a jacket under natural light. The surface of the jacket appears unremarkable. However, upon the application of cross-polarized light, combined with a polarizing filter and a high-contrast film, the dusty impression of a shoe print on the jacket is recorded (Fig. 10b). Finally, sometimes the proper application of light is all that is needed to make latent evidence more apparent. Indented writing is one such type of evidence. If an individual writes a holdup note or some other writing related to a crime on a piece of paper that is the top sheet of a stack of papers, then the pressure of the writing
implement is transmitted through the top sheet, leaving indentations in the sheets below. Figure 11a depicts a page torn from a desk calendar upon which a name and number have been written. Figure 11b depicts the page immediately below that in Fig. 11a, photographed under ambient lighting. This photograph does not reveal any indentations in this second sheet. However, when the second sheet is photographed in light striking the page at a very low angle (‘‘side lighting’’), the indentations from the original writing are revealed (Fig. 11c).
FORENSIC IMAGE PROCESSING

‘‘Image processing’’ is defined by the Scientific Working Group on Imaging Technologies (SWGIT) as any activity that transforms an input image into an output image. By this definition, image processing encompasses a range of activities from simple format conversions to complex image enhancements. The increased availability of low-cost digital image processing hardware and software has accelerated the use of these technologies in law enforcement and forensic applications. In some cases, image processing simply involves converting an image to a form that will make it easier for the end user to view it or otherwise use it. An example of this type of processing is the chemical development of a roll of film after it has been removed from a camera to create a fixed roll of negatives. Those negatives can then be processed further using standard darkroom techniques to create prints. For film used in forensic applications, these development and printing processes are, for the most part, identical to those used for commercial and consumer photography. The processes used for forensic video analysis, however, are more specific to law enforcement applications than those for commercial or consumer uses.
Figure 9. Use of filtration and ultraviolet luminescence to detect writing on palm of hand. (a) Visible light view of palm. (b) View of palm using red filter and high-contrast film enhances phone number. (c) View of palm under ultraviolet luminescence reveals the presence of a name. See color insert. (FBI Laboratory Photograph)
Video Format Conversions

The consumer-grade players for video evidence that are most commonly available to investigators and court officials use the video home system (VHS) format. In many instances, however, recordings taped by crime scene personnel, witnesses, suspects, or victims are made using consumer camcorders that use smaller, more compact formats such as VHS-C or 8-mm. VHS-C tapes may be played in a standard VHS machine by using an adapter, but recordings made on 8-mm tapes must be copied onto
Figure 10. Use of polarization to detect footwear impression on jacket. (a) Visible light view of jacket. (b) View of jacket under polarized light enhances visibility of footwear impression. (FBI Laboratory Photograph)
VHS tapes to be played in a VHS player. Such simple processing can be considered the video equivalent of making photographic prints from film negatives. A more involved type of format conversion more specific to forensic applications involves multiplexed and timelapse video recording systems. As described before, the use of time-lapse systems and multiplexers is common in video surveillance. Such systems permit users to record images from multiple cameras and over a longer period of time than possible using consumer-grade video recorders. However, the output of time-lapse and multiplexed surveillance recorders is usually difficult to view on consumer-grade video players because a recording made in time-lapse mode (less than 60 fields per second) that is played in ‘‘real-time’’ mode (approximately 60 fields per second) appears at an accelerated rate. If multiple camera views were also recorded on such a time-lapse tape (i.e., multiplexed), then the ability to view the video on consumer-grade equipment is further complicated. The simplest operation to improve the utility of such surveillance tapes is to reduce the playback rate of the tape, preferably using the same type of time-lapse recorder that was used to record the tape. This reduced-speed playback can then be used to record a copy of the tape at a rate that is viewable on a consumer brand video player. If the time period of interest depicted in the time-lapse
Figure 11. Use of side lighting to reveal indented writing. (a) Page from desk calendar with writing. (b) Page from desk calendar immediately beneath page in (a) viewed in ambient light. (c) Same page as in (b) illuminated via sidelighting reveals indented writing from upper page. (FBI Laboratory Photograph)
tape exceeds two hours, it may be necessary to record the reduced-speed copy on more than one tape.
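The relationship between time-lapse recording rate and viewable playback time described above is simple arithmetic; a minimal sketch follows, in which the function name and the 5 fields/s figure are illustrative rather than tied to any particular recorder.

```python
# Illustrative arithmetic only: relates a time-lapse recording rate to
# the duration of the footage when played back at the real-time rate.
# The 5 fields/s value below is a hypothetical recorder setting.

def playback_duration_hours(recorded_hours, record_rate, playback_rate=60.0):
    """Hours of playback when a tape recorded at record_rate fields/s
    is played back at playback_rate fields/s."""
    return recorded_hours * record_rate / playback_rate

# A 24-hour tape recorded at 5 fields/s plays in 2 hours at ~60 fields/s:
print(playback_duration_hours(24, 5))  # -> 2.0
```

This is why a reduced-speed copy of a long surveillance period may not fit on a single two-hour tape.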
In some multiplexed recordings, however, only a single camera view is of investigative value, making it desirable to remove all but the view of interest from the reduced-speed copy. Using analog editing systems, such a procedure can be accomplished only after a time-consuming operation in which an operator reviews the segment of interest to find the camera view to be copied. The operator then locates each individual field or frame of interest, copies it, and rewrites it out to a new analog tape depicting only the single view. In some cases, rather than write out the individual images directly to another analog tape before copying, the operator will generate a digital file of each image of interest, which is then stored on a computer hard drive or other digital storage medium. The device needed to convert the analog video images to digital image files is an analog-to-digital converter (ADC), commonly referred to as a digitizer or ‘‘frame grabber.’’ Once all of the images of interest have been converted to digital files, then, the individual digital image files may be read out in sequence back to analog videotape by digital-to-analog conversion (DAC) to create the final copy for investigative use. Further discussion of digitizers and frame grabbers is provided later. Like the fully analog method described before, the method of converting each frame of interest to a digital image file can be very time-consuming if the operator must select each image to be converted and perform the analog-to-digital (A-to-D) conversion separately on each one. A far less time-consuming means of accomplishing this task is now available to the forensic imaging community in the form of nonlinear digital editing systems. Such systems begin by automatically converting the entire video sequence to a digital file that consists of individual fields or frames that can then be treated as individual images. 
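Once a multiplexed sequence has been converted to individual images this way, extracting a single camera's view can, in the simplest case, be reduced to slicing out every Nth image. A minimal sketch, in which the function name, frame labels, and four-camera layout are hypothetical:

```python
# Minimal sketch: pull one camera's view out of a strictly interleaved
# multiplexed sequence by taking every Nth image. Frame labels and the
# four-camera layout are hypothetical stand-ins for digitized fields.

def extract_camera_view(frames, num_cameras, camera_index):
    """Return every Nth image (N = num_cameras) from an interleaved
    sequence, starting at the chosen 0-based camera_index."""
    return frames[camera_index::num_cameras]

# A four-camera multiplexer records views A, B, C, D in rotation:
sequence = ["A0", "B0", "C0", "D0", "A1", "B1", "C1", "D1", "A2", "B2"]
print(extract_camera_view(sequence, 4, 0))  # -> ['A0', 'A1', 'A2']
```

Note that this simple slice assumes a fixed rotation; it breaks down when the multiplexer reorders its views in response to alarms or other cues, a complication taken up below.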
In many cases, such digitizers use compression to reduce the size of the digital file created through this process. As discussed later, the application of such compression must be given special consideration in forensic applications because it can result in a loss or modification of image data unless care is taken in selecting system settings. Once the video sequence has been converted to a set of digital image files, then, it is a relatively simple procedure to extract the camera view of interest through automated procedures. The simplest means of accomplishing this is to select or extract every Nth image, where N equals the number of camera views depicted in the multiplexed sequence. This process may not always succeed for several reasons. First, the process by which the analog video sequence is played back and read out is not always perfect due to differences in the devices that record and play back the tapes. Some fields or frames on the original tape may not be played back on a device other than the one that made the recording, due to differences in the alignment of the playback heads or due to a physical process in which the tape being played back does not remain completely in physical contact with the playback heads. Such events can occur randomly, so that one only need rewind the tape and play it back again over the area of interest to retrieve the missing field or frame. In some cases, however, it may be
necessary to select a different playback device to acquire all of the fields of interest. In a worst-case scenario, one may need to play back the tape on the same device that was used to make the original recording to achieve successful playback. An additional challenge that may arise in using automated processes to select a single camera view from a multiplexed recording is that some surveillance systems will alter the sequence by which the camera views are recorded, based on predetermined cues that are tied to other parts of a surveillance system. For example, if a motion detector is set off by movement in an area that is under the observation of a particular surveillance camera, the multiplexer may be programmed to select that camera’s view over those of the other cameras, as long as motion is detected in that area. Another cue frequently used in multiplexed surveillance systems is the activation of a cash register. Such a signal may induce the multiplexer to select only those views that depict the area of the cash register. Finally, some surveillance system multiplexers are programmed to alter the rate at which time-lapse images are recorded when an alarm is triggered; they typically increase the rate to the ‘‘real-time’’ standard 60 fields per second mode, while the rate at which the multiplexer switches between different camera views remains unchanged. When such events prevent one from using a simple time-based technique to extract the relevant images of interest, there are other, more complex automated techniques that can be applied. The most common of these is one in which pattern recognition algorithms are used to permit automatic selection of the camera views depicting the area of interest. Such processes rely on the operator to select fixed objects or areas depicted within the field of view of the camera of interest. 
Then, the nonlinear system may be programmed so that anytime the predetermined objects or area are detected within an image, that image will be extracted from the sequence and saved. In this way, all of the images that depict a single camera’s view may be extracted. One caution regarding this technique, however, is that care must be taken to avoid selecting objects that may become obstructed from view by the movement of people, vehicles, or objects within the scene. Furthermore, although such techniques are technically feasible at present, they require advanced processing systems that are expensive and are unlikely to be widely used by the general forensic imaging community in the near future. Once the entire set of images depicting views from the camera of interest has been copied to a digital format, then, it is a straightforward process to write them out in sequence, either in real-time (actual speed) or reduced-speed (slow-motion) mode. Perhaps more importantly, though, once these images have been converted to digital format, it becomes possible to use digital image enhancement operations more easily to improve their forensic value further.

Image Enhancement and Restoration

The power of images in forensic science and criminology has been greatly increased in recent years due to progress
in computer technology that now permits individuals to perform complicated image enhancement at their desktops. The SWGIT defines image enhancement as any process intended to improve the visual appearance of an image. Traditional photographic darkroom techniques used to enhance images include brightness (exposure) and contrast adjustments, dodging and burning (or localized exposure variations), color balancing, and cropping. These enhancements can increase the visibility of image details previously obscured by shadows or highlights in an image. Such enhancements are now routinely conducted using digital image processing, which has the benefit of enabling the user to perform enhancements more quickly than in a traditional darkroom without the negative environmental impact created by the chemicals and excess paper used in a traditional darkroom. In addition, digital image processing also permits the use of enhancement and ‘‘image restoration’’ operations not available in the traditional darkroom — operations that originated in signal processing theory. Conversion to Digital Imagery. Digital images used in forensic applications are captured in a number of ways. The quickest method is direct capture using a digital camera. Digital cameras are used in a number of law enforcement and forensic applications to document minor traffic accidents, victim wounds in nonfatal assaults, and gang member tattoos, among other subjects. In some forensic laboratories, digital cameras attached to microscopes are used to document trace evidence such as hairs and fibers and soil samples. Rarely, however, are digital cameras used as the primary means to document major investigations or to generate images that will be subject to detailed forensic analysis due to the current limitations on digital camera technology discussed before. Instead, in many forensic applications, images first captured on film are converted to digital format by film or print scanners. 
Both devices scan the original image, using either a linear array charge-coupled device (CCD) imager or an area array CCD imager. By scanning an image, rather than simply projecting the entire image onto a fixed area CCD, it is possible to achieve higher resolution and record finer detail from the image. Film scanners that can achieve an optical resolution of 1,200 pixels per inch or greater and flatbed (print) scanners that can achieve an optical resolution of 1,000 pixels per inch or greater are used in many forensic applications. Some film scanners in widespread use can achieve optical resolutions in excess of 3,000 pixels per inch, which is sufficient to resolve individual grains of silver in some black-and-white film images. Videotapes are another source of digital images used in forensic applications. As described before, video signals are recorded electronically in an analog format, so the conversion to digital format is accomplished by an analog-to-digital converter (ADC) or ‘‘frame grabber’’ that converts each line of analog image data to a line of pixels. Video signals in the United States follow the RS-170 standard, in which each frame contains approximately 486 usable visible lines of data. To maintain the proper width-to-height aspect ratio of 4:3, a digitized video image that
is 486 lines (or pixels) high is sampled at least 648 times to create a digitized video image size of 486 pixels high by 648 pixels wide. Some frame grabbers permit a user to sample each video line more than 648 times and may also sample additional lines above and below the usable visible area. To maintain the proper aspect ratio in such cases, additional horizontal lines must be added to the digital image, either by duplicating entire lines or by interpolating between alternate lines. A related factor that complicates the processing of video images is that each video frame actually consists of two interlaced fields scanned sequentially at a rate of 60 fields per second. By interlacing these fields, each line is scanned approximately 1/60th of a second before or after the lines immediately above and below it, which provides the appearance of continuous motion. A drawback of this system occurs when time-lapse video systems are used. Interlacing can result in extreme discontinuities between subsequent fields, for example, when an object or person moves across the field of view in the one second or more that it may take to record sequential fields. In such cases, a technique known as deinterlacing is used to eliminate one field while maintaining the image height of 486 lines or pixels. This is accomplished either by duplicating every other line or by interpolating between alternate lines and is demonstrated later. Histogram Analysis. Once an image has been rendered in a digital format, it becomes possible to analyze the image using numerical methods because a digital image simply represents a two-dimensional grid of equally spaced locations, each having a specific number corresponding to the brightness or luminance assigned to it. In color images, a set of three numbers is typically used; each number designates the brightness of the red, green, or blue component. 
The brightness values range from a minimum of zero (0), corresponding to black, up to a maximum of 255, corresponding to white, a total of 256 brightness values. In most cases, 256 values are sufficient to produce the appearance of a continuum within a single scene, and this range has become the de facto standard in digital imaging. This range of values is represented by 8 bits of information (2⁸ = 256 values). Color values consist of three sets of 8-bit data (one each for red, green, and blue), so color images are usually said to contain 24-bit data. Once the image has been converted into a purely numerical format, a variety of mathematical operations can be performed to analyze the content of the image and adjust the specific values of individual pixels to enhance the visibility of some features (such as small details) or reduce the visibility of other features (such as noise). Histograms provide a simple way of analyzing the content of a digital image that permits one to assess what steps should be taken to adjust the pixel values of an image to improve its overall quality. A histogram represents the brightness content of an image in a format that is similar to a bar chart. It is a plot of the number of pixels that have a given luminance value versus the luminance values. Figure 12a shows the original image of a sidewalk scene, and Fig. 12b shows the histogram of this image. The horizontal axis is labeled
‘‘DN’’ for ‘‘digital number,’’ which corresponds to the brightness or luminance values ranging from 0 to 255. The vertical axis represents the number of pixels that have a given DN value. Several observations can be made from analysis of Fig. 12. First, the overall darkness of the image is reflected in the concentration of pixel values at the left end of the histogram. Second, a number of peaks are present in the histogram. Such peaks usually represent large portions of an image that correspond to individual objects or areas depicted in the image. The brightest parts of the image in Fig. 12a, the left faces of the planters in the foreground, are represented by the histogram peak in the vicinity of DN = 130. The second peak around DN = 90 corresponds to the sidewalk upon which the planters rest, and the third peak in the vicinity of DN = 40 corresponds to the right sides of the planters. The less prominent peak below DN = 40 corresponds to the remaining dark parts of the image. Using histogram analysis, one can determine objectively, without inspecting the image itself, whether an image is too bright, too dark, or has too little contrast. The concentration of pixels at the low end of the histogram in Fig. 12b reflects the fact that the image is dark and has few bright, nearly white areas. Once one has assessed the condition of an image’s brightness and contrast variations, then, one can determine what steps should be taken to improve the overall appearance of the image through brightness and contrast adjustments. Brightness and Contrast Adjustments. Brightness and contrast adjustments represent the simplest processing operations that can be applied to images in any setting, whether or not it is forensic. Identifying the proper exposure setting when taking photographs or shooting video is the first consideration for the photographer or videographer. 
Aperture and shutter speed settings must be selected to ensure that a sufficient amount of light strikes the detector — whether it is film or a CCD — to create an adequate exposure. Most cameras used in forensic (and other) applications use sensors that can detect the exposure within a scene and automatically set the aperture and shutter speed to create a satisfactory exposure for the scene as a whole. Likewise, film processing machines used to generate photographic prints also use automated settings to generate output photographs that will create a satisfactory exposure for the entire image. Such automated processes are geared to producing a pleasing outcome to the casual viewer, and they are based on the assumption that all parts of the scene in the image (including all objects and people throughout the image) are of equal interest to the viewer. In forensic situations, however, the opposite frequently holds true in that a single person or object within a larger scene may be the subject of interest and care must be taken to ensure the proper exposure of that subject, regardless of the effect on the surrounding scene. Properly trained forensic photographers and videographers are aware of these factors and take steps to ensure that the proper exposure and contrast are achieved by using different shutter speeds and aperture settings, as well as different films, photographic papers, and light sources, as necessary.
Figure 12. (a) Sidewalk scene and (b) the histogram of this image; the horizontal axis is the DN value (brightness) and the vertical axis the number of pixels. (FBI Laboratory Photograph)
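A histogram of the kind shown in Fig. 12b can be computed directly from an image's DN values. The sketch below assumes a flat list of 8-bit grayscale values standing in for the image; the function name and toy data are illustrative.

```python
# Sketch of computing the histogram of an 8-bit grayscale image. A flat
# Python list of DN values stands in for the two-dimensional pixel grid.

def luminance_histogram(pixels):
    """Count the number of pixels at each DN (digital number) 0-255."""
    counts = [0] * 256
    for dn in pixels:
        counts[dn] += 1
    return counts

# A dark image concentrates counts at the low end of the histogram:
dark_image = [12, 12, 40, 40, 40, 90, 130]
hist = luminance_histogram(dark_image)
print(hist[40])  # -> 3
```

Inspecting where the counts cluster reproduces the kind of objective brightness assessment described in the text, with no need to view the image itself.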
However, many images examined in a forensic setting are not acquired in a manner that ensures proper exposure and contrast of the subject when the images are taken. Instead, images under forensic examination frequently contain only a small portion that is of interest to the investigator. For instance, a bank robbery surveillance film image will depict the interior of the bank, mostly the lobby region, with only a small portion depicting the bank robbers themselves. Using standard automated processing techniques, the overall exposure of the print will be selected to make the details of the lobby most apparent, and the exposure of the robber will be either too great or too little. Because the details of the bank robbers, their faces, clothing, and footwear are of most interest, adjustments geared to improve the visibility of details on the robbers’ persons are appropriate. Traditional Darkroom Techniques. In a traditional photographic darkroom, the brightness of an image or part of an image is adjusted through changes in the exposure. The overall brightness of an image is increased by decreasing the amount of light that is allowed to strike the photographic paper used to print the image. Darker prints
are produced by increasing the amount of light striking the paper. Such changes in exposure are made either by changing the exposure time or by changing the aperture of the lens between the film and the paper. Dodging and burning are traditional darkroom techniques used to decrease or increase the exposure of a particular portion of an image. If a portion of an image appears too dark in one exposure (such as a shadow region), then the photographer will reduce the amount of light striking the paper in that area, making it lighter. If the area is too light (such as a region of highlights), then the area will be subjected to an increased amount of light. Simple masks made of paper or cardboard are frequently used to block the excess light or allow the passage of more light to the areas that are being dodged or burned. Contrast is adjusted in a traditional darkroom in one of two ways. In the first, differences in the chemical makeup of the photographic paper can result in differences in the contrast generated in the images. Silver bromide is the primary reactant in photographic papers. However, silver chloride also reacts to light in a well-controlled manner and may be mixed in different quantities with silver bromide to create a range of light-sensitive surfaces. Pure silver
chloride produces a higher-contrast image than pure silver bromide, and by combining the silver chloride with silver bromide in varying amounts, it is possible to produce a range of contrasts in different papers. The second way by which traditional darkrooms achieve higher-contrast images involves a single type of paper but uses different filters to adjust the wavelengths of light striking the paper during exposure. These papers typically contain both high- and low-contrast emulsions; the high-contrast emulsions are sensitive to blue wavelengths, and the low-contrast emulsions are sensitive to green wavelengths. By adjusting the filtration to increase or decrease the relative amounts of blue and green light, the photographer can achieve variations in contrast. Digital Processing Techniques. Brightness and contrast adjustments using digital image processing are accomplished in the same way as other digital operations: through numerical operations that act on the digital number (DN) values of each pixel. In the sidewalk scene depicted in Fig. 12a, an increase in brightness could be useful in improving the visibility of details behind the planters, in the street, or on the far sidewalk. By increasing the overall brightness of this image, as in Fig. 13, the visibility of details such as the vehicles is markedly improved. Such improvements do not come without a cost, however. The brightness-adjusted scene depicted in Fig. 13 reveals a characteristic of image processing activities that is frequently discounted. Specifically, when changes are made to improve the visibility of features in one part of an image, the visibility of other features in the image may be degraded. In this case, the details of the planter face have become somewhat washed out by this simple operation. To maintain the ability to detect these differences, one must also adjust the contrast of the image. 
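The per-pixel DN arithmetic described above can be sketched for a grayscale image as follows. The linear formula and clipping shown are one common formulation, not any particular system's implementation, and the function name and toy values are illustrative.

```python
# Sketch of brightness and contrast as per-pixel arithmetic on 8-bit
# DN values. Values of `contrast` above 1.0 spread DNs toward black
# and white; `brightness` shifts all values up or down.

def adjust_brightness_contrast(pixels, brightness=0, contrast=1.0):
    """Apply out = contrast * (DN - 128) + 128 + brightness to each
    pixel and clip the result to the 0-255 range."""
    out = []
    for dn in pixels:
        v = contrast * (dn - 128) + 128 + brightness
        out.append(max(0, min(255, round(v))))
    return out

print(adjust_brightness_contrast([40, 90, 130], brightness=60))  # -> [100, 150, 190]
print(adjust_brightness_contrast([40, 90, 130], contrast=2.0))   # -> [0, 52, 132]
```

Note how clipping at 255 can flatten already-bright regions when brightness is raised, the numerical analogue of the washed-out planter faces in Fig. 13.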
Contrast adjustments in digital images typically permit easy differentiation of areas that exhibit relatively similar brightness values. Contrast enhancements are intended to provide the viewer with a clearer distinction between two brightness values. From the standpoint of histogram analysis, such operations work by increasing the difference between
Figure 13. Sidewalk scene from Fig. 12 after overall brightness of image has been increased. Note loss of detail in planter faces. (FBI Laboratory Photograph)
adjacent brightness values and driving them toward the extremes in the image — toward darkest black and lightest white. Equalization. Although adjusting brightness and contrast separately can be very effective in improving the forensic utility of an image, they are especially effective when used together. Equalization is a particularly effective digital image processing operation that combines these two adjustments. Equalization is an operation in which the distribution of pixel DN values is adjusted so that it is relatively uniform across the entire range from black to white. The result is a histogram whose peaks are spread out more widely than in the original image. Such a histogram reflects an image that exhibits a wide range of brightness values from black to white, as well as good contrast across the image. Figure 14a depicts the sidewalk scene from Fig. 12 and its histogram (Fig. 14b) after histogram equalization. As in other digital operations, histogram equalization does not come without some trade-off. Instead of a smoothly varying histogram, as depicted in Fig. 12b, the equalized image in Fig. 14 has a histogram made up of relatively widely spaced spikes. Whereas this image provides a good range of brightness values from black to white, it does not permit one to detect fine variations between areas that have slight differences in brightness. Unsharp Mask. When brightness and contrast adjustments are applied to an entire image, they are referred to as global operations because they work across the entire image and pixel values, rather than in specific, local areas. These global operators adjust the DN values of individual pixels based solely on the overall statistics of the entire image. Other image processing operations that can improve the overall forensic utility of images are more focused in their application and rely on analysis of the pixels within a given region for their adjustments. 
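The equalization operation described above can be sketched in Python with NumPy. The fragment below implements the common cumulative-distribution formulation; it is illustrative only (the function name is hypothetical, and it assumes a non-constant 8-bit grayscale image), not the specific algorithm of any particular forensic package.

```python
import numpy as np

def equalize(image):
    """Spread an 8-bit image's DN values across the full 0-255 range."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()                      # cumulative pixel counts
    cdf_min = cdf[cdf > 0][0]                # count at the darkest occupied bin
    # Build a lookup table mapping each DN to its equalized value
    # (assumes the image is not a single constant value)
    lut = np.round((cdf - cdf_min) / float(cdf[-1] - cdf_min) * 255)
    return lut.astype(np.uint8)[image]

# Example: a low-contrast patch occupying only DNs 0-3 is stretched to 0-255
patch = np.array([[0, 0, 1, 1], [2, 2, 3, 3]], dtype=np.uint8)
stretched = equalize(patch)
```

As the text notes, the output histogram consists of widely spaced spikes: the four occupied DN bins are driven apart toward black and white, but no new intermediate levels are created.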
Among the most valuable of these operations is the unsharp mask or Laplacian operation. The unsharp mask operation is used to increase the visibility of fine detail or edges in an image. It accomplishes this by increasing the contrast of the edges by making the dark side of an edge darker and the bright side of an edge brighter. The degree to which the edge contrast is increased, as well as the distance from the edge across which the operation acts, may be adjusted, depending on the effect desired. This operation can be performed by using traditional darkroom techniques, but it requires a labor-intensive process where an out-of-focus copy negative is combined with a high-contrast copy negative to produce the desired effect (hence the name ‘‘unsharp mask’’). Digital image processing greatly simplifies this operation through automation. An example of the unsharp mask operation at work is shown in Fig. 15 where the camouflage pattern becomes more clearly defined when the unsharp mask is applied. Deinterlacing. One of the most persistent problems in improving the overall quality of forensic images obtained by using video cameras derives from the interlaced nature
Figure 14. (a) Sidewalk scene from Fig. 12 after histogram equalization. (b) Histogram of this image (number of pixels vs. DN value). Detail on planter face remains visible. (FBI Laboratory Photograph)
of RS-170 video. Although each video frame consists of approximately 486 useful lines of image data, these 486 lines actually consist of two interlaced fields of 243 lines each. The fields are read out sequentially so that the lines from one field are displayed alternately between the lines from the previous field. By reading out 60 fields per second, a frame rate of 30 frames per second is achieved. When video is recorded and played back at a speed of 60 fields per second, the human eye is unaware of the interlacing effect because changes in the scene between subsequent fields occur too quickly to be detected and give the appearance of continuous motion. However, when time-lapse systems are used to record video images, the movement of objects and individuals in the scene may be sufficient to display detectable variations from field to field. Thus, when a frame grabber is used to convert time-lapse analog video images to digital stills, the result may be that two fields that have detectable variations between them are read out and produce a blurred image. Under other circumstances, a frame grabber actually acquires only a single field of video but creates an entire frame by duplicating each line. This results in an image in which any straight lines that are
oriented on a diagonal exhibit a jagged stair-step pattern, instead of a smoothly varying one. This is demonstrated in Fig. 16a. In either case, an operation called deinterlacing can be used to improve the image quality. Deinterlacing reduces the effect of interlacing by replacing every other line with interpolated values based on the values of the two lines above and below it. In some cases, the interpolated value for each pixel in a line may consist merely of the average of the pixel values immediately above and below the interpolated pixel. In other cases, the interpolation may incorporate weighted values of all of the pixels immediately adjacent. For images resulting from a single field, deinterlacing creates a relatively smooth transition from one line to the next and reduces the stair-step pattern exhibited by diagonal lines. When the interlaced images consist of two separate fields, it becomes possible to generate two separate images by deinterlacing to remove either the odd lines or the even lines. Figure 16b provides the result when the image in Fig. 16a is deinterlaced by replacing every other line with the average of the lines above and below. Note the relative smoothness of the diagonal lines in these images.
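The interpolation scheme just described can be sketched as follows (illustrative Python/NumPy; the function name is hypothetical). It keeps one field and replaces the other field's interior lines with the average of the lines above and below.

```python
import numpy as np

def deinterlace(frame, keep_field='even'):
    """Replace one field's scan lines with averages of the lines above and below."""
    out = frame.astype(float)  # astype returns a copy; the input is untouched
    start = 1 if keep_field == 'even' else 2
    for row in range(start, frame.shape[0] - 1, 2):
        # The neighboring lines belong to the kept field, so they are
        # unmodified originals
        out[row] = (out[row - 1] + out[row + 1]) / 2.0
    return np.round(out).astype(np.uint8)

# Frame whose odd lines carry stale field data (100 vs. 0): interpolation
# from the kept (even) field replaces lines 1 and 3
frame = np.array([[100, 100], [0, 0], [100, 100], [0, 0], [100, 100]],
                 dtype=np.uint8)
smoothed = deinterlace(frame)
```

A weighted interpolation over all immediately adjacent pixels, as mentioned in the text, would follow the same pattern with a different neighborhood expression.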
Figure 15. Unsharp mask operator. (a) Image of bank robber jacket before application of unsharp mask and (b) after. (FBI Laboratory Photograph)
Noise Reduction. Noise in film and video images represents one of the most persistent problems in forensic image analysis. In film images, the film grain itself is frequently referred to as noise because graininess can interfere with the observer’s ability to detect details in an image. Noise in video images may arise from a variety of sources. Noise that has an immediate effect on the output of an imaging sensor includes dark current, thermal noise, and readout noise. Once the video signal
Figure 16. Deinterlacing applied to bank robber image. (a) Before deinterlacing and (b) after deinterlacing. (FBI Laboratory Photograph)
leaves the sensor, it is also degraded by interactions with associated electronic components to which it is exposed. The electronics of the camera itself, including clock signals, can generate interference noise, as can the wiring from the camera to the recorder. Once the signal arrives at the recorder, it is subjected to interference from the recorder’s electronics and any other nearby electronics. Dust, dirt, and mechanical problems in the recording device can also have a deleterious effect on the recorded signal.
In surveillance applications, the problems of noise may be amplified by the environment in which the cameras and recording devices are placed. In contrast to broadcast quality recording environments, surveillance cameras and recorders may be placed in locations that lack adequate ventilation and that are not electronically shielded from one another. Such conditions can test the thermal limits of these devices and result in images that have excessive noise. If an analog video signal is subsequently converted to a digital format, then further noise is generated during the recorder readout and in the A-to-D conversion. As for the other components in this process, the electronics of the frame grabber and other hardware used to implement A-to-D conversion also generate interference that degrades the signal. Noise reduces the overall quality of the image. Although horizontal and vertical banding from a repeating noise signature frequently occurs in video images, the most common type of noise in forensic images is the random pattern introduced by film grain or by electronic noise in video images. This random noise generates a ‘‘speckled’’ or ‘‘salt and pepper’’ appearance in these images. When these images have been converted to digital format, the noise
manifests itself most prominently as marked variations in the digital number (DN) value of individual pixels or groups of pixels within regions that should have uniform values. Such variations can have a great effect on the ability to distinguish the small, individual characteristics used to identify a person or object — either by masking such characteristics or by making a characteristic appear on an object depicted in an image when, in fact, it is not there. When the noise in an image is random speckling, the most straightforward way of minimizing its effect is to set each pixel value to an average of that pixel’s value and its neighbors’ values. This has the drawback, however, of reducing the overall sharpness of the image by averaging together those areas where large pixel value variations represent actual physical boundaries such as edges. This results in an image that appears less sharp or more blurred compared to the original, thus reducing the effective resolution of the image. Some filtering techniques compensate for this problem by adjusting the values of adjacent pixels by varying weights when averaging (e.g., neighborhood ranking or median filtering), but these techniques still tend to reduce the effective resolution of the image.
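A simple neighborhood-ranking (median) filter of the kind mentioned can be sketched as follows (illustrative Python/NumPy; the function name is hypothetical).

```python
import numpy as np

def median_filter3(image):
    """Replace each pixel with the median of its 3x3 neighborhood."""
    h, w = image.shape
    padded = np.pad(image, 1, mode='edge')
    # Nine shifted views of the image, one per neighborhood position
    stack = np.stack([padded[r:r + h, c:c + w]
                      for r in range(3) for c in range(3)])
    return np.median(stack, axis=0).astype(image.dtype)

# A single bright speck in a uniform region never reaches the neighborhood
# median, so it is removed entirely
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
cleaned = median_filter3(noisy)
```

Unlike plain averaging, which would smear the speck into adjacent pixels, the median rejects it outright; even so, as the text notes, ranking filters still soften genuine fine detail to some degree.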
Figure 17. Four images from video surveillance tape to be averaged together to reduce the visibility of noise. (FBI Laboratory Photograph)
Another approach used to minimize the effect of noise is to take multiple images of the same scene or object at different times and average them together on a pixel-by-pixel basis. Such pixel-by-pixel averaging can be conducted if the target object is perfectly aligned or ‘‘registered’’ from one image to the next. This results in no loss of spatial resolution and eliminates the blurring generated in single-image neighborhood averaging. When the object–camera geometry is fixed, for example, when the camera and target object are both stationary, then the object should be in registration in every image and multiple images may be averaged together on a pixel-by-pixel basis without any need to align the images. Real-time video is an ideal medium for generating multiple images from a fixed camera–object geometry if the camera and object are stationary, as is the case in the following example. In this example, multiple images of a minivan were recorded during a surveillance operation. Upon examining the surveillance images, it was observed that the side of the minivan exhibited a number of irregular features that appeared to represent scrapes, scratches, dents, and other marks that could have resulted from normal wear and tear. Such marks, if matched to those on a known vehicle, would be sufficient to individualize the known vehicle as the one in the surveillance video. In this way, it might be possible to associate the owner of the minivan with the activities being monitored by the surveillance operation. Four images from the surveillance video tape depicting the questioned vehicle are included in Fig. 17. It is possible to recognize scrape marks, scratches, and dents in any one of the images shown in Fig. 17. Close inspection, though, reveals a grainy texture (or speckle) from camera/recorder noise that overwhelms the fine detail, particularly when various contrast enhancement operations are performed.
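Pixel-by-pixel averaging of registered frames can be sketched as follows (illustrative Python/NumPy; names and values are hypothetical). A direct equal-weight mean is shown; for uncorrelated random noise, averaging N registered frames reduces the noise standard deviation by a factor of roughly the square root of N.

```python
import numpy as np

def average_frames(frames):
    """Average a list of registered frames pixel by pixel with equal weight."""
    acc = np.zeros(frames[0].shape, dtype=float)
    for f in frames:
        # Accumulate in float so the sum cannot overflow an 8-bit type
        acc += f.astype(float)
    return np.round(acc / len(frames)).astype(np.uint8)

# Four noisy observations of the same static scene (true value 100 DN)
frames = [np.full((2, 2), v, dtype=np.uint8) for v in (120, 80, 110, 90)]
averaged = average_frames(frames)
```

Averaging the frames pairwise and then averaging the pair results, as described in the example that follows, gives the same equal weighting when the number of frames is a power of two.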
To reduce this noise, the four digitized images were averaged together using commercially available image processing software. The images were averaged in pairs to ensure that each image is given equal weighting. Then the two averaged images were themselves averaged, resulting in the image shown in Fig. 18. Because the surveillance
Figure 18. Image that results from the average of four images in Fig. 17. (FBI Laboratory Photograph)
camera and target object (the minivan) remained fixed relative to one another in all four images, no image resizing or image shifts were necessary to bring the minivan images into registration. The averaging operation has a significant effect in reducing the amount of speckle and in improving the quality of the image. The improvement in the image quality generated by the averaging operation is demonstrated in Fig. 19. Figure 19a shows a portion of one image from Fig. 17, and the same portion from Fig. 18 is shown in (b). Both images in Fig. 19 have
Figure 19. (a) Comparison of images before averaging and (b) after averaging reveals a reduction in the level of noise visible. (FBI Laboratory Photograph)
undergone histogram equalization to increase the visibility of fine details. The speckle dominating image (a) is greatly reduced in the averaged image (b), indicating a successful overall reduction in noise in the averaged image. Furthermore, close inspection of the window area
in the averaged image reveals that it is now possible to detect the presence of the window frame/support strut on the far side of the vehicle. The pattern of the rear wheel rim and cap is also more sharply defined in the averaged image. This detail is not visible in image (a).
Figure 20. Difference image demonstrates transient noise removed from surveillance image by the averaging operation. (a) Unprocessed difference image. (b) Histogram of difference image (number of pixels vs. DN value, offset to gray) shows concentration of values in middle-gray region. (c) Difference image after histogram equalization. (FBI Laboratory Photograph)
As a way of assessing the amount of noise removed from the unprocessed images, a difference operation was performed using the averaged image of Fig. 18 and one of the images from Fig. 17. The difference operation compares the digital number (DN) values of a single pixel location in the two images and assigns a new DN value to that pixel location in an output image. The new DN value is determined by the difference between the value in the first image and that in the second image. If the value in the first image is 10 DN greater than that in the second image, then the output value is +10. If the first value is 5 DN less than the second value, then the output value is −5. To ensure that the output image has a range of DN values from 0 to 255 (and will therefore display a range from black to white), the difference values are offset by a fixed DN level of +127. Thus a difference value of −10 results in a DN value of 117 (= 127 − 10). A difference value of 0 (i.e., the same value in both images) results in a DN value of 127 or middle gray. Extreme differences are represented as black or white. Figure 20a shows the result of the difference operation, and Fig. 20b shows the histogram of this difference image. Through histogram analysis, it can be determined that 95% of the pixels in the difference image have DN values in the range between 106 and 142, which corresponds to middle gray. Thus, Fig. 20a is dominated by middle gray values. Figure 20c shows the image that results when the brightness and contrast are enhanced in Fig. 20a to emphasize the details in this difference image. Two primary observations can be made about the noise signature in Fig. 20c. First, the difference image contains a high degree of speckle, consistent with random noise
in the single image that has been removed from the averaged image. Second, a regular pattern of noise in the form of horizontal bars is also present. This noise is likely to be from clocking signals or other harmonic signals generated in the surveillance system or recorder. Based on this difference image (Fig. 20c), it is possible to conclude that the averaging operation was useful in removing both random and harmonic noise from the original images. Fourier Analysis. Another image processing technique that has been of great use in forensic applications is Fourier analysis, which is based on theories developed in signal processing. The fundamental basis of Fourier analysis is that any signal or waveform may be expressed as a combination of a series of sinusoidal signals. Furthermore, once a signal has been broken down into a series of sinusoids, it is possible to extract individual sinusoids from the signal. An example of a one-dimensional signal that results from the combining of two separate signals is provided in Fig. 21. In this case, the signal represents a variation in voltage, but it could just as easily represent a variation in brightness, such as the variation in brightness across a single line of video. Fourier analysis is not restricted to one-dimensional signals. Through Fourier analysis it is possible to examine simultaneously the variations in brightness in both the horizontal and vertical directions in an image and identify any repeating patterns in the brightness variations. The most common technique used in Fourier analysis of forensic images is the Fast Fourier Transform (FFT). Through application of the FFT, the contents of a two-dimensional image are converted into a frequency space
Figure 21. Combination of two one-dimensional signals to create a third signal. See color insert. (FBI Laboratory Figure)
representation in which repeating brightness patterns can be more easily detected and deleted. The method for converting two-dimensional images to frequency space is demonstrated in Fig. 22. In a frequency-domain representation, the brightness of a feature corresponds to the strength of a given component, and the location corresponds to the signal wavelength. The longer wavelength components are represented closer to the center of the frequency-domain representation, and shorter wavelength components are located farther from the center. An additional characteristic of frequency-domain representations is that repeating features are displayed in frequency space at an angle that is 90° away from their orientation in the actual image. This is reflected in Fig. 22 where the vertical bars in Fig. 22a are represented in frequency space by the horizontal line in Fig. 22b. An example of using FFTs to improve the clarity of a surveillance video image is shown in Fig. 23. The original video image (Fig. 23a) reveals an unknown suspect whose shirt pattern is obstructed by electronic interference (Fig. 23b). The signature of this interference consists of repeating patterns oriented vertically and on diagonals from upper left to lower right and upper right to lower left. Figure 23c depicts the frequency space representation of Fig. 23b. Among the dominant characteristics of this frequency representation are the bright vertical bars to the left and right of the center (Fig. 23c). The bars in the upper left and lower right quadrants correspond to the diagonal noise patterns that extend from the upper right to lower left in Fig. 23b. By suppressing these noise patterns, as shown in Fig. 24a and then converting this image back to an image space representation (Fig. 24b), one may observe that the upper right to lower left diagonals have been removed. Further noise suppression and the resulting image are shown in Figs. 24c,d. 
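The frequency-space editing described above is a form of notch filtering. The sketch below (illustrative Python/NumPy; the function name, notch coordinates, and synthetic noise pattern are all hypothetical) zeroes small notches in the shifted spectrum, along with their conjugate-symmetric partners, and then inverts the transform.

```python
import numpy as np

def notch_filter(image, notch_centers, radius=2):
    """Suppress periodic noise by zeroing notches in the shifted FFT spectrum."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(float)))
    rows, cols = np.indices(image.shape)
    for (r0, c0) in notch_centers:
        # Zero each notch and its conjugate-symmetric partner so the
        # inverse transform remains real-valued
        for rr, cc in ((r0, c0), (h - r0, w - c0)):
            spectrum[(rows - rr) ** 2 + (cols - cc) ** 2 <= radius ** 2] = 0
    cleaned = np.fft.ifft2(np.fft.ifftshift(spectrum)).real
    return np.clip(np.round(cleaned), 0, 255).astype(np.uint8)

# A flat 100-DN scene corrupted by vertical bars (8 cycles across 32 columns)
cols32 = np.arange(32)
noisy = np.round((100 + 30 * np.cos(2 * np.pi * 8 * cols32 / 32))
                 * np.ones((32, 1))).astype(np.uint8)
restored = notch_filter(noisy, [(16, 24)])
```

The vertical bars appear as a symmetric pair of peaks on the horizontal frequency axis (consistent with the 90° rotation noted above); zeroing the pair restores the flat scene.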
Although the poor resolution of the original video image prevents one from clearly identifying the pattern in the shirt, the resulting image allows the viewer to differentiate this shirt more easily from others that might be recovered. As in the example of noise reduction provided before, it is possible to analyze the harmonic noise removed by applying the FFT in this case by using a difference image.
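The difference operation and middle-gray offset described earlier (Fig. 20) can be sketched as follows (illustrative Python/NumPy; names and values are hypothetical).

```python
import numpy as np

def difference_image(img_a, img_b, offset=127):
    """Signed per-pixel difference of two registered images, offset to gray."""
    # Widen the type so negative differences are not lost before offsetting
    diff = img_a.astype(np.int16) - img_b.astype(np.int16) + offset
    return np.clip(diff, 0, 255).astype(np.uint8)

# Identical pixels map to 127 (middle gray); a pixel 10 DN lower in img_a
# maps to 117, and one 10 DN higher maps to 137
a = np.array([[100, 90, 110]], dtype=np.uint8)
b = np.full((1, 3), 100, dtype=np.uint8)
diff = difference_image(a, b)
```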
Figure 22. Demonstration of fast Fourier transform. (a) Image depicting a repeating wave pattern. (b) Frequency space representation of image in (a). (FBI Laboratory Photograph)
Fig. 24e documents the parts of the original image that have been removed. The example provided demonstrates the utility of FFTs in eliminating electronic noise signatures; Fourier analysis has also been particularly useful in separating latent fingerprint or friction ridge patterns from background weave patterns in fabrics. In some cases it is not necessary to remove the repeating pattern completely. Simply reducing the intensity of some background patterns may be sufficient to permit detecting the patterns of interest. LABORATORY IMAGE ANALYSIS Although a majority of the images used in forensic applications are taken for documentation, in many cases, the images themselves become the subject of scrutiny to advance an investigation or to implicate or exonerate individuals suspected of crimes. Three major categories of forensic analysis to which images may be subjected are forensic photographic comparisons, forensic photogrammetry, and forensic image authentication. Forensic Photographic Comparisons Whenever an item recovered from a suspect can be associated with a crime, it can provide important
widely available during the last 30 years. It was envisioned that such images would be used to compare the faces of the individuals depicted in the surveillance film or video with known suspects. Unfortunately, such facial comparisons cannot be conducted when a felon’s face is obscured by a mask or hood. In such cases, it is frequently possible to make an association between the mask or hood (or other item of apparel) worn by the felon and one recovered from the suspect’s home or vehicle based on the visible characteristics of the item. Other such items may include trousers, shoes, hats, firearms, luggage, and vehicles. Photographic comparisons are based upon the ‘‘principle of individualization’’ (sometimes called the ‘‘principle of identification’’), which states:
Figure 23. FFT applied to the video image of a robber’s shirt. (a) Original video image. (b) Close-up of robber’s shirt. (c) Frequency space representation of image (b). (FBI Laboratory Photograph)
circumstantial evidence in prosecuting that suspect. Photographs and videotape images taken by using surveillance cameras during the commission of crimes are one source of circumstantial evidence that has become
When any two items contain a combination of corresponding or similar and specifically oriented characteristics of such number and significance as to preclude the possibility of their occurrence by mere coincidence, and there are no unaccounted for differences, it may be concluded that they are the same, or their characteristics attributed to the same cause (1).
Two types of characteristics are examined in side-by-side comparisons: class and individual identifying characteristics. Class characteristics are those that identify a person or item as belonging to a specific group or set of objects. The make and model of an automobile or the color of an individual's hair are examples of class characteristics. Individual identifying characteristics are those characteristics used to differentiate a person or object from others within a class. Individual identifying characteristics frequently used to identify people include freckle patterns, moles, scars, chipped teeth, and ear patterns. Individual identifying characteristics frequently used in identifying clothing and other objects include wear marks, rips, nicks, stains, and other damage that can arise from wear and tear resulting from the regular use of an item. Clothing can also exhibit individual identifying characteristics generated during the manufacturing process, such as the unique alignment of patterns across seams. Clothing manufactured from materials that bear plaid or camouflage patterns exhibits unique characteristics that are frequently highly visible in surveillance images due to the high-contrast features of these patterns. For automobiles, individual identifying characteristics may include dents, scrapes, rust spots, and even bumper stickers. License plate numbers and vehicle identification numbers represent unique identifying characteristics of the manufacturing process. The mere presence of such features alone is not usually sufficient for individualization, however. Consideration must also be given to the location and orientation of these characteristics, as well as placement relative to one another. As an example, many individuals exhibit moles or scars, but the location of these features varies from person to person. Likewise, many automobiles have dents and scratches, but their frequency and placement vary from vehicle to vehicle.
Even when a person or object has individual identifying characteristics, a more critical factor in forensic photographic comparisons is often the quality of the surveillance images. To make a positive identification, one must be able
Figure 24. Application of FFT to a robber’s shirt. (a) Frequency space representation of robber’s shirt edited to remove diagonals that are oriented from upper right to lower left. (b) Image that results from editing in frequency space (a). (c) Frequency space representation edited to remove additional repeating noise in image. (d) Image that results from editing in (c). (e) Difference image reflects the noise removed from image in Fig. 23b to generate the image in Fig. 24d. (FBI Laboratory Photograph)
to distinguish a set of individual identifying characteristics of such number or significance that it separates the item or person from all others. In many cases involving video surveillance images, the quality of the unprocessed images is too poor to allow an examiner to distinguish even a single individual identifying characteristic. Two major factors contribute to the relatively poor quality of most video surveillance imagery. The first is that the inherent spatial resolution of video is far less than that of film, making it more difficult to see fine detail in a single video image than in a single film image. The second major factor that reduces the
quality of video images is noise. Reducing the effect of noise in video images is one of the primary tasks in forensic image enhancement, as discussed earlier. However, despite the poor quality of most surveillance images, quite frequently the quality is sufficient to provide valuable evidence. Examples of Forensic Photographic Comparisons. Figure 25 provides an example of a facial comparison. The photographs (Fig. 25a,b) are arrest photographs taken several years apart in two different jurisdictions in the United States. The photographs reportedly depict
Figure 25. Facial comparison. Two mug shots (a,b) thought to depict the same individual. (c) Comparison of region above eyes. (d) Comparison of right ear region. (FBI Laboratory Photograph)
two different individuals, but authorities suspected they were actually of the same person. Fingerprint records corresponding to the first photograph were not available, so a facial comparison was conducted to assess whether the photographs depict the same individual. Despite differences in the lighting conditions and quality of the images, multiple similarities were noted between the two images, particularly in the region above the eyes (Fig. 25c) and in the right ear (Fig. 25d). Based on the correspondence of these multiple characteristics, it was concluded that the two photographs depict a single individual. Tattoos provide another means of identifying individuals. The surveillance images depicted in Fig. 26a,b were obtained as part of a covert surveillance. The images of the individual’s face did not show individual identifying characteristics such as moles, scars or ear patterns that would be sufficient to identify the suspect. However, the images in Fig. 26a,b depict what appear to be tattoos on the inside of the left arm of the individual. Law enforcement authorities identified a suspect and obtained photographs depicting the tattoos on the suspect’s arms. As shown in Fig. 26c,d, the tattoos exhibited by the suspect correspond directly with the markings on the known individual. Based on these characteristics, it was possible to identify the suspect as the individual depicted in the surveillance images.
The techniques of photographic comparison may also be applied to comparisons of clothing worn by individuals. As an example, a comparison was conducted between the camouflage jacket worn by the bank robber depicted in Figs. 15 and 16 and a camouflage jacket that was recovered during the investigation of a suspect in the crime (Fig. 27b). The individual pieces of material that make up the camouflage jacket are cut at random from continuous rolls of material upon which the camouflage pattern has already been printed. To reduce manufacturing costs, the individual pieces are cut to maximize the amount of material used per roll. No attempt is made during the manufacturing process to align patterns from one jacket to the next because that would require additional time and effort from the individuals constructing the garments. As a result, each seam can be expected to exhibit a different juxtaposition of individual elements of the camouflage pattern. In the camouflage jacket in Fig. 27, more than a half dozen separate pieces are pictured including the right sleeve, the right front panel, the left front panel, the left sleeve, the back panel, the right front breast pocket, and the right front breast pocket flap. The arrows in Figs. 27a,b indicate some of the similarities identified between the jacket worn by the bank robber and the suspect’s jacket. The correspondence of these multiple
Figure 26. Personal identification by tattoos. (a,b) Images from surveillance video depicting tattoos on subject’s left arm. (c,d) Tattoos on suspect’s left arm. (FBI Laboratory Photograph)
Figure 27. Photographic comparison of camouflage jacket. (a) Surveillance image depicting jacket worn by robber. (b) Jacket recovered from suspect. Arrows indicate some of the points of similarity. (FBI Laboratory Photograph)
similarities is sufficient to identify the suspect’s jacket as the one worn in the bank. Other objects such as vehicles may also be identified through photographic comparison. The minivan in Figs. 17 through 19 that was the subject of noise reduction was compared with the minivan owned by an individual suspected of involvement in the case (Fig. 28). Recall that the noise was reduced in the video images to clarify small characteristics on the side of the minivan that were scratches and dents. Figure 28a documents similarities in the class characteristics of the questioned and known vehicles, and Fig. 28b documents similarities in the individual identifying characteristics including multiple scratches and dents. Based on these similarities, the suspect’s vehicle could be identified as the one in the surveillance video.
Figure 28. Comparison of questioned vehicle (left images) with suspect’s vehicle. (a) Corresponding class characteristics. (b) Corresponding individual identifying characteristics. (FBI Laboratory Photograph)
Forensic Photogrammetry

Photogrammetry, another primary type of forensic image analysis, involves extracting dimensional information from images. A frequent purpose of forensic photogrammetric examinations is to determine the approximate height of bank robbers depicted in surveillance imagery. This type of analysis provides circumstantial evidence against a known suspect if it is determined that the bank robber is approximately the same height as the suspect. In some cases, however, this analysis can provide a crucial means of eliminating suspects or differentiating among several suspects of different heights. It can be especially useful in corroborating the testimony of a cooperating witness when multiple bank robbers are involved and the cooperating witness was one of them. Finally, this type of analysis may also generate investigative leads for the profile describing an unknown subject; it is particularly useful when eyewitnesses offer different estimates of an unknown subject's height. Many engineering and scientific activities use multi-image photogrammetry to derive accurate measurements of a scene or location. In bank surveillance imagery, however, photogrammetry most frequently involves single-image analysis, because there may be only one camera within a bank or, if there is more than one, the cameras are not synchronized to capture simultaneous images of the robbery. Single-image photogrammetric techniques are therefore the most commonly used in forensic applications. One technique often used in forensic photogrammetry is perspective analysis. Generally, when perspective is present in an image, lines that are parallel in the original scene (such as the two sides of a door frame, the top and bottom of a service counter, or the outlines of the tiles that make up the floor of a bank lobby) appear to converge, or vanish, at a single point in the image.
Such points are called ‘‘vanishing points.’’ The perspective properties of an image can be defined by the two-dimensional coordinates of three vanishing points that correspond to three mutually orthogonal sets of parallel lines in the scene. These three points define the ‘‘perspective triangle.’’ Once the perspective triangle is defined for an image and a single true linear
738
IMAGING SCIENCES IN FORENSICS AND CRIMINOLOGY
dimension is known within the scene, then it becomes possible to measure any linear dimension within that scene. The bank robbery surveillance image in Fig. 29a is an example. The floor tiles upon which the robber is standing provide a known orthogonal grid pattern on the floor of the bank. The teller counter and the customer service counters in the lobby are also aligned with these tiles in an orthogonal arrangement. By extending the edges of the tiles and counters to the left and right, it is possible to locate these vanishing points (Fig. 29b). Likewise, it is possible
Figure 29. Photogrammetric analysis to determine bank robber’s height. (a) Bank robbery surveillance image. (b) Vanishing point determination as part of photogrammetric analysis. (c) Projection of bank robber height and counter height to measuring line. (FBI Laboratory Photograph)
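The vanishing-point construction described above can be sketched numerically. In homogeneous coordinates, the line through two image points is the cross product of the points, and the intersection of two such lines (the vanishing point of a set of parallel scene edges) is the cross product of the lines. The pixel coordinates and the 42-inch counter height below are hypothetical illustration values, not data from the actual case:

```python
# Sketch of vanishing-point estimation and direct scaling.
# All coordinates and measurements here are hypothetical.

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersect(l1, l2):
    """Intersection of two homogeneous lines, as an (x, y) point."""
    x, y, w = cross(l1, l2)
    return (x / w, y / w)

# Two edges that are parallel in the scene (e.g., two rows of floor
# tiles) but converge in the image; their intersection is one
# vanishing point of the perspective triangle.
edge_a = line_through((0.0, 100.0), (400.0, 140.0))
edge_b = line_through((0.0, 300.0), (400.0, 220.0))
vx = intersect(edge_a, edge_b)

# Direct scaling on the measuring line: a reference of known height
# (here a 42-inch counter) and the robber project to pixel lengths at
# a common location, so their true heights scale with those lengths.
counter_in, counter_px, robber_px = 42.0, 168.0, 279.0
robber_height = counter_in * robber_px / counter_px  # inches
```

With these hypothetical numbers, the two edges meet well outside the 400-pixel-wide image frame, as vanishing points typically do.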
to determine the third vanishing point in the downward direction by using the vertical edges of the counter face panels and the edges of door frames and wall corners, thus completing the perspective triangle. Once a known scale dimension is located within the scene, the perspective triangle can be used to project that dimension and the robber's height to a common location, or ‘‘measuring line,’’ as shown in Fig. 29c. Once these dimensions have been projected to the measuring line, direct scaling may be used to calculate the robber's height. The dimensions of fixed objects within the scene are also calculated by using this technique to assess the accuracy of the solution. In this case, the calculated height of the robber was approximately 69.75 inches, with an accuracy of approximately 1%, or about 3/4 inch. The true height of the robber, it was later found, was 70 inches, within the calculated accuracy. (Note: English units are used in many photogrammetric analyses in the United States because American law enforcement officials, court officials, and juries are most familiar with these units. The use of, or conversion to, metric units is made whenever necessary.) Reverse projection is a second technique frequently used in forensic photogrammetry. In many cases, traditional analytical single-image techniques cannot be used because of a lack of photogrammetric control, either within the scene or for the camera. When this is so, reverse projection can frequently be used, first to locate the position of the camera from which an image was taken. Once this camera station has been determined, it is possible from that position to ‘‘project’’ the location of people or objects depicted in a photograph. The camera station is determined through an iterative process in which a photograph is first examined to determine the approximate location depicted in the scene and the relationship of the camera to that scene.
Once the approximate camera location has been determined, the type of camera and the focal length of the lens are estimated, and an attempt is made to locate the exact position of the camera at the time the photograph was taken. This is accomplished by adjusting the camera position and focal length during an on-site examination until the field of view depicted in the image is reproduced in the viewfinder of the camera. Various fixed objects within the scene are used as points of reference to align the camera's field of view until it exactly replicates that of the image. When the image is on film, a fine-grain positive of the image may be placed in the viewfinder to aid in the alignment. In video cases, a mixing board is used along with video ‘‘wipes’’ and ‘‘dissolves’’ to compare the live and recorded images. A video wipe is a process that replaces part of the overlying video image with the corresponding part of the underlying image. Wipes are typically applied horizontally or vertically, allowing the viewer to track the alignment of horizontal and vertical features in a scene between the two images. A video dissolve is a process that reduces the opacity of the overlying image and increases that of the underlying image so that the first image appears to ‘‘dissolve’’ into the second.
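The wipe and dissolve operations described above amount to per-pixel selection and blending, respectively. A minimal sketch, using toy grayscale ‘‘frames’’ stored as lists of pixel rows (illustration only, not an actual video-mixing workflow):

```python
# Toy implementation of a video dissolve (alpha blend) and a
# horizontal wipe on grayscale frames; values are hypothetical.

def dissolve(frame_a, frame_b, alpha):
    """Blend two frames; alpha=0 shows only frame_a, alpha=1 only frame_b."""
    return [[(1.0 - alpha) * a + alpha * b
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def horizontal_wipe(frame_a, frame_b, split_col):
    """Show frame_b left of split_col and frame_a to the right."""
    return [row_b[:split_col] + row_a[split_col:]
            for row_a, row_b in zip(frame_a, frame_b)]

live = [[100, 100, 100, 100]] * 3   # stand-in for the live camera view
tape = [[40, 40, 40, 40]] * 3       # stand-in for the surveillance frame

half = dissolve(live, tape, 0.5)        # every pixel blends to 70.0
wiped = horizontal_wipe(live, tape, 2)  # rows become [40, 40, 100, 100]
```

Sliding `split_col` across the frame, or sweeping `alpha` from 0 to 1, lets an examiner watch whether scene features in the two images stay registered.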
Figure 30. Bank robber height determination through reverse projection photogrammetry. (a) Image of bank robber in lobby. (b) Reconstruction of scene with height chart at bank robber position. (FBI Laboratory Photograph)
After the alignment is achieved, the crime scene is reconstructed by determining the position at which the individual or object was located within the scene and then placing a height chart or other scale device at that location. Once the scale device is in place, additional photographs depicting it are taken, and the individual or object may be measured through direct comparison with the scale. Figure 30 depicts a bank robber and a height chart located at the robber's position, from which the individual's height could be determined.

Forensic Image Authentication

A third type of forensic image analysis involves situations in which the veracity or authenticity of an image is questioned. Photographs have been manipulated to change the content or the appearance of people and objects practically since the invention of photography. In many cases, the alterations are for artistic purposes; in others, they have served political ends. Many high-ranking individuals in the Nazi and Soviet regimes of the mid-twentieth century disappeared from photographs after they fell out of favor with the leaders of their nations. Today, given the widespread availability of digital image processing software, just about anyone can alter, or ‘‘manipulate,’’ a digital image file. Images originally recorded on film may even be digitized, altered, and then
transferred back to film to create an original film image that cannot be identified as a manipulated image. The implications of such capabilities are quite worrisome to members of the criminal justice system. If the veracity of images cannot be trusted, then their value as evidence is greatly reduced or eliminated altogether. Fortunately, if the veracity of an image or set of images is questioned, techniques may be used to analyze the images to detect artifacts left over from a poorly performed manipulation. Given a sufficient amount of time and sufficient resources, it should be possible for a well-trained individual to create an undetectable image manipulation. The key to detecting manipulations, therefore, rests on the fact that few individuals have the time, training, and resources needed to make a perfect manipulation. A number of factors can be examined to detect a manipulation. The content of an image is the first feature examined. If objects or people depicted in the questioned image could not possibly have been present at the time and location shown, then one can conclude that an alteration has taken place. Likewise, it may be possible through photogrammetric analysis to determine that the size and shape of objects depicted in an image are different from their true dimensions, in which case an alteration is also indicated. Other characteristics that are routinely examined include the lighting and shadowing, contrast, color, sharpness, grain structure or resolution, and focus of the various
Figure 31. Image manipulation detection. (a) Manipulated image. (b) Original image from which manipulated image was derived. Irregularities in image content, focus, and resolution in the manipulated image can be observed as described in the text. (FBI Laboratory Photograph)
objects and people in an image. Any sudden, unexplained changes in any one of these characteristics can indicate a manipulation. For example, if two people are depicted in a face-on view standing side by side and one of the individuals appears out of focus whereas the other is in sharp focus, then it is possible one of them has been inserted into the image. Likewise if a shadow is present underneath the nose of one of the individuals and the other’s nose casts no shadow, then a manipulation is also indicated. Figure 31 depicts a manipulated image (Fig. 31a) and the original image from which it was derived (Fig. 31b). The image content provides the first indication of manipulation. The fish depicted in the image is a bass, but the size of the one depicted in the manipulated image is far in excess of the largest bass on record. Furthermore, the individual holding the bass does not appear to be straining in any way to support the weight of the fish, as indicated by his right biceps. In addition, although the fisherman’s hands and leg are sharply in focus, the fish is out of focus or blurred. In contrast, the outline of the fish is quite sharply defined. If the interior of an object is out of focus, the edges of the object should also be out of focus. All of these observations indicate a manipulation. Video Authentication. In addition to the authentication of single images discussed before, the authenticity
of videotapes is often questioned. In addition to the techniques of single-image analysis used earlier, analog videotapes may be authenticated through electronic signal analysis. Such analysis can be used to detect record stops, starts, and erasures, because each of these activities leaves electronic signals and physical signal traces on the magnetic tape that may be detected. These analyses are not restricted to videotapes but are also commonly used to authenticate audiotapes. In some cases, it is possible to record an image of the physical appearance of these traces on the magnetic tape. However, the details of this type of analysis are beyond the scope of this article. For further information, the reader is directed to Koenig (2).
ADDITIONAL FACTORS IN FORENSIC IMAGING APPLICATIONS

The increasing use of digital imaging media in forensic applications presents a challenge to the law enforcement community. Because digital images are easy to alter, many consider the integrity of images acquired and stored electronically to be more open to suspicion than that of traditional film. Likewise, image compression represents a challenge that law enforcement has yet to address adequately.
Image Integrity

Images offered in a court of law can be subjected to the same rigorous standards as living witnesses. If an image cannot be verified as a ‘‘true and accurate representation’’ of what it purports to represent, it may not be accepted by the court. Evidence showing that the image has been altered may also lead to its suppression. Even if allowed into the court proceedings, an image whose integrity is challenged may have reduced value to the jury that must consider it. Therefore, the issue of image integrity is critical for law enforcement agencies. Before the advent of digital imaging in forensic applications, the veracity of film images was rarely challenged. It was recognized that film images could be altered, but the means for generating fakes was considered beyond the reach of the layperson. ‘‘Chain of custody’’ procedures, in which police follow strict processes to ensure the proper handling of film evidence, helped ensure its integrity by treating film the same as other pieces of physical evidence. Some fear that digital imaging and the widespread availability of commercial image processing software may change the perceived integrity of forensic images. As a result, many in law enforcement are seeking ‘‘foolproof’’ technological fixes for ensuring the integrity of their digital images. As yet, however, no standard technologies for dealing with this issue are in widespread use throughout the law enforcement community. One reason is that, in much the same way that images stored on film negatives can be copied, altered, and rephotographed, images stored in ‘‘secure’’ digital formats can be copied, altered, and then stored once again in ‘‘secure’’ formats, defeating these technologies. One common ‘‘secure’’ technology now being offered to law enforcement agencies is digital watermarking.
Such systems are designed to deter the alteration of images by incorporating a unique image signature within the content of the image so that any alteration can be detected. A critical drawback of watermarking systems is that superimposing the watermark itself alters the image, replacing some of the original image content. Another drawback is that a watermarking system could be used to place a watermark on a manipulated image, after which the manipulated image would appear to be authenticated. Given the current lack of rigorously tested systems for maintaining the integrity of digital images, the Scientific Working Group on Imaging Technologies (SWGIT) has recommended that agencies protect their images in the same way that they protect any other evidence: by saving the images on a physical medium such as film or CDROM and using traditional physical security measures to protect that physical evidence. As long as a sound chain of custody is maintained from the acquisition of an image through its presentation in court, the chances that the image has been altered are severely limited.

Compression

A second major challenge facing law enforcement agencies that use digital imaging is compression. Compression reduces the size of digital files so that they take up
less room on storage devices, permitting more images to be stored on a single piece of media. This reduction in size also enables an image to be transmitted more quickly over available communications channels. Compression may be either ‘‘lossless’’ or ‘‘lossy.’’ In lossless compression, redundant information is eliminated from an image in a way that permits exact re-creation of the original, precompressed content. ‘‘Lossy’’ compression is less benign: some of the information contained in an image, usually the least significant bits, is removed. When a lossy image is decompressed, the missing data are filled in using estimated values. The removal of data from forensic images through lossy compression is a major challenge because some of the most critical data may reside in the parts of an image that compression removes. Another drawback of lossy compression is that, in some cases, the act of compressing an image introduces visible artifacts that were not in the original image. One such artifact is the ‘‘blocking’’ frequently seen in images compressed using the JPEG standard. JPEG compression breaks an image into 8 × 8-pixel regions, which are then compressed individually. This can create relatively large changes in luminance and color from one 8 × 8 block to the next, changes that can be exaggerated when enhancement operations such as unsharp masking are applied. Likewise, lossy compression can alter the colors in an image. This could be particularly dangerous in victim wound photography, where color shifts might reduce the visibility of existing bruises or, alternatively, create the appearance of bruises where none exist. In the absence of further guidelines for the appropriate use of compression in law enforcement applications, the SWGIT strongly recommends avoiding lossy compression unless sufficient cause is shown by those who compress the images.
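The lossless/lossy distinction can be demonstrated with a short sketch using only the Python standard library: a lossless codec such as zlib recovers its input exactly, whereas discarding least-significant bits, as lossy schemes may do, is irreversible. This illustrates the principle only; JPEG itself quantizes transformed 8 × 8 blocks rather than simply truncating bits.

```python
# Lossless vs. lossy compression on stand-in "image" data.
import zlib

pixels = bytes(range(256))  # hypothetical stand-in for image samples

# Lossless: the compressed stream decompresses to exactly the original.
packed = zlib.compress(pixels, level=9)
assert zlib.decompress(packed) == pixels

# "Lossy": zero the two least-significant bits of every sample.
# No decompressor can restore them; the data are gone for good.
lossy = bytes(b & 0b11111100 for b in pixels)
assert lossy != pixels
```

Note that the lossless round trip is bit-for-bit exact, which is why lossless formats are generally preferred for evidentiary images.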
ABBREVIATIONS AND ACRONYMS

ADC      Analog to Digital Converter (or Conversion)
A-to-D   Analog to Digital
CCD      Charge-Coupled Device
CDROM    Compact Disk Read Only Memory
DAC      Digital to Analog Converter (or Conversion)
DN       Digital Number
FBI      Federal Bureau of Investigation
FFT      Fast Fourier Transform
ISO      International Standards Organization
JPEG     Joint Photographic Experts Group
mm       millimeters
NIBIN    National Integrated Ballistics Information Network
NIST     National Institute of Standards and Technology
NTSC     National Television Systems Committee
PAL      Phase Alternation by Line
ppi      pixels per inch
SECAM    Sequentiel Couleur a Memoire
SLR      Single Lens Reflex
SWGIT    Scientific Working Group on Imaging Technologies
VHS      Video Home System
VHS-C    Video Home System - Compact
BIBLIOGRAPHY (BY CATEGORY)

1. H. Tuthill, Individualization: Principles and Procedures in Criminalistics, Lightning Powder Company, Salem, OR, 1994.
2. B. E. Koenig, J. Audio Eng. Soc. 38(1/2), 3–33 (1990).
Forensic Field-Based Photography and Video

American National Standards Institute (ANSI), American National Standard for Information Systems — Data Format for the Interchange of Fingerprint, Facial, & Scar Mark & Tattoo (SMT) Information, ANSI/NIST-ITL 1-2000, 2000.
Association for Information and Image Management (AIIM), Technical Report for Information and Image Management — Resolution as it Relates to Photographic and Electronic Imaging, AIIM TR26-1993, 1993.
E. C. Berg, Forensic Sci. Commun. 2(4) (2000). (http://www.fbi.gov/programs/hq/lab/fsc/backissu/oct2000/berg.htm).
H. Blitzer, Law Enforcement Technol. 27(2), 16–19 (2000).
H. Blitzer, C. Garcia, and A. Leitch, Law Enforcement Technol. 27(6), 52–55 (2000).
H. Blitzer, Law Enforcement Technol. 27(6), 58–62 (2000).
W. J. Bodziak, Footwear Impression Evidence, 2nd ed., CRC Press, Boca Raton, 2000.
J. E. Duckworth, Forensic Photography, Charles C. Thomas, Springfield, 1983.
D. R. Redsicker, The Practical Methodology of Forensic Photography, 2nd ed., CRC Press, Boca Raton, 2001.
Scientific Working Group on Imaging Technologies (SWGIT), Forensic Sci. Commun. 3(1) (2001). (http://www.fbi.gov/programs/hq/lab/fsc/backissu/july2001/swgitltr.htm).
Scientific Working Group on Imaging Technologies (SWGIT), Forensic Sci. Commun. 2(1) (2000). (http://www.fbi.gov/programs/hq/lab/fsc/backissu/jan2000/swigit.htm).
Laboratory Based Photography

R. K. Bunting, The Chemistry of Photography, Photoglass Press, Normal, IL, 1987.
W. R. Harrison, Suspect Documents, Nelson-Hall, Chicago, 1981.
G. M. Mokrzycki, Forensic Sci. Commun. 1(3) (1999). (http://www.fbi.gov/programs/lab/fsc/backissu/oct1999/mokrzyck.htm).
S. A. Schehl, Forensic Sci. Commun. 2(2) (2000). (http://www.fbi.gov/programs/lab/fsc/backissu/april2000/schehl1.htm).
L. Stroebel and R. Zakia, eds., The Focal Encyclopedia of Photography, 3rd ed., Focal Press, London, 1993.
Forensic Image Processing

G. A. Baxes, Digital Image Processing: Principles and Applications, John Wiley & Sons, Inc., New York, 1994.
A. Bovik, ed., Handbook of Image and Video Processing, Academic Press, San Diego, 2000.
K. R. Castleman, Digital Image Processing, Prentice Hall, Upper Saddle River, 1996.
J. C. Russ, Forensic Uses of Digital Imaging, CRC Press, Boca Raton, 2001.
J. C. Russ, The Image Processing Handbook, 3rd ed., CRC Press, Boca Raton, 1998.
R. W. Vorder Bruegge, Proc. SPIE 3,576, 185–194 (1999).
P. Warrick, J. Forensic Identification 50(1), 20–32 (2000).

Laboratory Image Analysis

D. A. Brugioni, Photo Fakery, Brassey's, Dulles, VA, 1999.
J. Cadigan and W. J. Stokes, AFTE J. 25(4), 242–245 (1993).
E. A. Craig and W. M. Bass, J. Forensic Identification 43(1), 27–34 (1993).
A. Criminisi et al., Proc. SPIE 3,576, 227–238 (1999).
C. Fields, H. C. Falls, C. P. Warren, and M. Zimberoff, Obstet. Gynecol. 16(1), 98–102 (1960).
C. N. Frechette, J. Forensic Identification 44(4), 410–429 (1994).
M. Houts, ed., Photographic Misrepresentation, Matthew Bender, New York, 1969.
R. A. Huber, Criminal Law Q. 2, 276–296 (1959–1960).
A. Iannarelli, Ear Identification, Paramount, Fremont, CA, 1989.
M. Y. Iscan and R. P. Helmer, eds., Forensic Analysis of the Skull, John Wiley & Sons, Inc., NY, 1993.
A. Jaubert, Making People Disappear, Pergamon-Brassey's, Washington, 1986.
D. King, The Commissar Vanishes, Metropolitan Books, Henry Holt, NY, 1997.
W. J. Mitchell, Sci. Am. 270(2), 68–73 (1994).
F. H. Moffitt and E. M. Mikhail, eds., Photogrammetry, 3rd ed., Harper & Row, NY, 1980.
E. Muybridge, The Human Figure in Motion, Dover, NY, 1955.
E. Muybridge, The Male and Female Figure in Motion, Dover, NY, 1984.
J. K. Nickell, J. Forensic Identification 46(6), 702–714 (1996).
C. C. Slama, ed., Manual of Photogrammetry, 4th ed., American Society of Photogrammetry, Falls Church, VA, 1980.
W. J. Stokes and J. R. Williamson, Annu. Proc. Photogrammetry Remote Sensing, Falls Church, VA, vol. 2, August 1992, 8 pp.
C. Van der Lugt, Int. Assoc. Identification Annu. Educ. Conf., Milwaukee, WI, July 11–17, 1999, 30 pp.
P. Vanezis and C. Brierley, Sci. Justice 36(1), 27–33 (1996).
R. W. Vorder Bruegge and T. M. Musheno, Proc. Am. Defense Preparedness Assoc. 12th Annu. Joint Gov.-Ind. Security Technol. Symp. Exhibition, Williamsburg, VA, 1996, 8 pp.
R. W. Vorder Bruegge, MAAFS Newsl. 23(2), 5–6 (1995).
R. W. Vorder Bruegge, J. Forensic Sci. 44(3), 613–622 (1999).
J. Whitnall and K. Millen-Playter, Photomethods, 36–39, November 1985.
J. Whitnall, K. Millen-Playter, and F. H. Moffitt, Functional Photography, 32–38, January/February 1988.
J. Whitnall and F. H. Moffitt, in Non-Topographic Photogrammetry, 2nd ed., American Society for Photogrammetry and Remote Sensing, 1989, pp. 389–393.
J. R. Williamson and M. H. Brill, Dimensional Analysis Through Perspective: A Reference Manual, American Society for Photogrammetry and Remote Sensing, 1990, 233 pp.

IMAGING SCIENCE IN MEDICINE

WILLIAM R. HENDEE
Medical College of Wisconsin
Milwaukee, WI

INTRODUCTION

Natural science is the search for ‘‘truth’’ about the natural world. In this definition, truth is defined by
principles and laws that have evolved from observations and measurements about the natural world that are reproducible through procedures that follow universal rules of scientific experimentation. These observations reveal properties of objects and processes in the natural world that are assumed to exist independently of the measurement technique and of our sensory perceptions of the natural world. The purpose of science is to use these observations to characterize the static and dynamic properties of objects, preferably in quantitative terms, and to integrate these properties into principles and, ultimately, laws and theories that provide a logical framework for understanding the world and our place in it.

As a part of natural science, human medicine is the quest to understand one particular object, the human body, and its structure and function under all conditions of health, illness, and injury. This quest has yielded models of human health and illness that are immensely useful in preventing disease and disability, detecting and diagnosing conditions of illness and injury, and designing therapies to alleviate pain and suffering and restore the body to a state of wellness or, at least, structural and functional capacity. The success of these efforts depends on our depth of understanding of the human body and on the delineation of effective ways to intervene successfully in the progression of disease and the effects of injuries.

Progress in understanding the body and intervening successfully in human disease and injury has been so remarkable that the average life span of humans in developed countries is almost twice that expected a century ago. Greater understanding has occurred at all levels, from the atomic through the molecular, cellular, and tissue levels to the whole body, and extends to the social influences on disease patterns.
At present, a massive research effort is focused on acquiring knowledge about genetic coding (the Human Genome Project) and the role of genetic coding in human health and disease. This effort is progressing at an astounding rate and gives rise to the belief among many medical scientists that genetics and bioinformatics (mathematical modeling of biological information, including genetic information) are the major research frontiers of medical science for the next decade or longer.
The human body is an incredibly complex system. Acquiring data about its static and dynamic properties yields massive amounts of information. One of the major challenges to researchers and clinicians is the question of how to acquire, process, and display vast quantities of information about the body so that the information can be assimilated, interpreted, and used to yield more useful diagnostic methods and therapeutic procedures. In many cases, the presentation of information as images is the most efficient approach to this challenge. As humans, we understand this efficiency; from our earliest years we rely more heavily on sight than on any other perceptual skill in relating to the world around us. Physicians also increasingly rely on images to understand the human body and intervene in the processes of human illness and injury. The use of images to manage and interpret information about biological and medical processes is certain to continue to expand in clinical medicine, as well as in the biomedical research enterprise that supports it.

Images of a complex object such as the human body reveal characteristics of the object such as its transmissivity, opacity, emissivity, reflectivity, conductivity, and magnetizability, and changes in these characteristics with time. Images that delineate one or more of these characteristics can be analyzed to yield information about underlying properties of the object, as depicted in Table 1. For example, images (shadowgraphs) created by X rays transmitted through a region of the body reveal intrinsic properties of the region such as its effective atomic number Z, physical density (grams/cm³), and electron density (electrons/cm³).
Nuclear medicine images, including emission computed tomography (ECT) where pharmaceuticals release positrons [positron emission tomography (PET)] and single photons [single photon emission computed tomography (SPECT)], reveal the spatial and temporal distribution of target-specific pharmaceuticals in the human body. Depending on the application, these data can be interpreted to yield information about physiological processes such as glucose metabolism, blood volume, flow and perfusion, tissue and organ uptake, receptor binding, and oxygen utilization. In ultrasonography, images are produced by capturing energy reflected from interfaces in the body that separate tissues that have different acoustic impedances, where the acoustic impedance is the product
Table 1. Energy Sources and Tissue Properties Employed in Medical Imaging

Image Sources
• X rays
• γ rays
• Visible light
• Ultraviolet light
• Annihilation radiation
• Electric fields
• Magnetic fields
• Infrared
• Ultrasound
• Applied voltage

Image Influences
• Mass density
• Electron density
• Proton density
• Atomic number
• Velocity
• Pharmaceutical location
• Current flow
• Relaxation
• Blood volume/flow
• Oxygenation level of blood
• Temperature
• Chemical state

Image Properties
• Transmissivity
• Opacity
• Emissivity
• Reflectivity
• Conductivity
• Magnetizability
• Resonance absorption
of the physical density and the velocity of ultrasound in the tissue. Magnetic resonance imaging (MRI) of relaxation characteristics following magnetization of tissues can be translated into information about the concentration, mobility, and chemical bonding of hydrogen and, less frequently, other elements present in biological tissues. Maps of the electrical field (electroencephalography) and the magnetic field (magnetoencephalography) at the surface of the skull can be analyzed to identify areas of intense electrical activity in the brain. These and other techniques that use the energy sources listed in Table 1 provide an array of imaging methods useful for displaying structural and functional information about the body that is essential to improving human health by detecting and diagnosing illness and injury.

The intrinsic properties of biological tissues that are accessible by acquiring and interpreting images vary spatially and temporally in response to structural and functional changes in the body. Analysis of these variations yields information about static and dynamic processes in the human body. These processes may be changed by disease and disability, and identification of the changes through imaging often permits detecting and delineating the disease or disability.

Medical images are pictures of tissue characteristics that influence the way energy is emitted, transmitted, reflected, etc., by the human body. These characteristics are related to, but not the same as, the actual structure (anatomy), composition (biology and chemistry), and function (physiology and metabolism) of the body. Part of the art of interpreting medical images is to bridge the gap between imaging characteristics and clinically relevant properties that aid in diagnosing and treating disease and disability.

ADVANCES IN MEDICAL IMAGING

Advances in medical imaging have been driven historically by the ‘‘technology push’’ principle.
Especially influential have been imaging developments in other areas, particularly in the defense and military sectors, that have been imported into medicine because of their potential applications in detecting and diagnosing human illness and injury. Examples include ultrasound developed initially for submarine detection (sonar), scintillation detectors and reactor-produced isotopes (including ¹³¹I and ⁶⁰Co) that emerged from the Manhattan Project (the United States World War II effort to develop the atomic bomb), rare-earth fluorescent compounds synthesized initially in defense and space research laboratories, electrical conductivity detectors developed to detect rapid blood loss on the battlefield, and the evolution of the microelectronics and computer industries from research funded initially for security and surveillance, defense, and military purposes.

Basic research laboratories have also provided several imaging technologies that have migrated successfully into clinical medicine. Examples include reconstruction mathematics for computed tomographic imaging and nuclear magnetic resonance techniques that evolved into magnetic resonance imaging and spectroscopy. The migration of technologies from other arenas into medicine has not always been successful. For example, infrared detection devices
developed for night vision in military operations have so far not proven useful in medicine, despite early enthusiasm for infrared thermography as an imaging method for early detection of breast cancer. Today the emphasis in medical imaging is shifting from a ‘‘technology push’’ approach toward the concept of ‘‘biological/clinical pull.’’ This shift in emphasis reflects a deeper understanding of the biology underlying human health and disease and a growing demand for accountability and proven usefulness of technologies before they are introduced into clinical medicine. Increasingly, unresolved biological questions important in diagnosing and treating human disease and disability are used as an incentive for developing new imaging methods. For example, the function of the human brain and the causes and mechanisms of various mental disorders such as dementia, depression, and schizophrenia are among the greatest biological enigmas that confront biomedical scientists and clinicians. A particularly fruitful method for penetrating this conundrum is the technique of functional imaging that employs tools such as ECT and MRI. Functional magnetic resonance imaging (fMRI) is especially promising as an approach to unraveling some of the mysteries of human brain function in health and in various conditions of disease and disability. Another example is the use of X-ray computed tomography and magnetic resonance imaging as feedback mechanisms to shape and guide the optimized deployment of radiation beams for cancer treatment. The growing use of imaging techniques in radiation oncology reveals an interesting and rather recent development. Until about three decades ago, the diagnostic and therapeutic applications of ionizing radiation were practiced by a single medical specialty. In the late 1960s, these applications began to separate into distinct medical specialties, diagnostic radiology and radiation oncology, that have separate training programs and clinical practices. 
Today, imaging is used extensively in radiation oncology to characterize the cancers to be treated, design the plans of treatment, guide the delivery of radiation, monitor the response of patients to treatment, and follow patients over the long term to assess the success of therapy, the occurrence of complications, and the frequency of recurrence. The process of accommodating this development in the training and practice of radiation oncology is encouraging a closer working relationship between radiation oncologists and diagnostic radiologists.

EVOLUTIONARY DEVELOPMENTS IN IMAGING

Six major developments are converging today to raise imaging to a more prominent role in biological and medical research and in the clinical practice of medicine (1):

• The ever-increasing sophistication of the biological questions that can be addressed as knowledge expands and understanding grows about the complexity of the human body and its static and dynamic properties.

• The ongoing evolution of imaging technologies and the increasing breadth and depth of the questions
IMAGING SCIENCE IN MEDICINE
that these technologies can address at ever more fundamental levels.

• The accelerating advances in computer technology and information networking that support imaging advances such as three- and four-dimensional representations, superposition of images from different devices, creation of virtual reality environments, and transportation of images to remote sites in real time.

• The growth of massive amounts of information about patients that can best be compressed and expressed by using images.

• The entry into research and clinical medicine of young persons who are amazingly facile with computer technologies and comfortable with images as the principal pathway to acquiring and displaying information.

• The growing importance of images as an effective means to convey information in visually oriented developed cultures.
A major challenge confronting medical imaging today is the need to exploit this convergence of evolutionary developments efficiently to accelerate biological and medical imaging toward the realization of its true potential. Images are our principal sensory pathway to knowledge about the natural world. To convey this knowledge to others, we rely on verbal communications that follow accepted rules of human language, of which there are thousands of varieties and dialects. In the distant past, the acts of knowing through images and communicating through languages were separate and distinct processes. Every technological advance that brought images and words closer, even to the point of convergence in a single medium, has had a major cultural and educational impact. Examples of such advances include the printing press, photography, motion pictures, television, video games, computers, and information networking. Each of these technologies has enhanced the shift from using words to communicate information toward a more efficient synthesis of images to provide insights and words to explain and enrich insights (2). Today, this synthesis is evolving at a faster rate than ever before, as evidenced, for example, by the popularity of television news and documentaries and the growing use of multimedia approaches to education and training. A two-way interchange of information is required to inform and educate individuals. In addition, flexible means are needed for mixing images and words and their rate and sequence of presentation to capture and retain the attention, interest, and motivation of persons engaged in the educational process. Computers and information networks provide this capability. In medicine, their use in association with imaging technologies greatly enhances the potential contribution of medical imaging to resolving patient problems in the clinical setting. 
At the beginning of the twenty-first century, the six evolutionary developments listed above provide the framework for major advances in medical imaging and its contributions to improvements in the health and well-being of people worldwide.
Molecular Medicine

Medical imaging has traditionally focused on acquiring structural (anatomic) and, to a lesser degree, functional (physiological) information about patients at the organ and tissue levels. This focus has nurtured the correlation of imaging findings with pathological conditions and led to enhanced detection and diagnosis of human disease and injury. At times, however, detection and diagnosis occur at a stage in the disease or injury where radical intervention is required and the effectiveness of treatment is compromised. In many cases, detection and diagnosis at an earlier stage in the progression of disease and injury are required to improve the effectiveness of treatment and enhance the well-being of patients. This objective demands that medical imaging refocus its efforts from the organ and tissue levels to the cellular and molecular levels of human disease and injury. Many scientists believe that medical imaging is well positioned today to experience this refocusing as a benefit of knowledge gained at the research frontiers of molecular biology and genetics. This benefit is often characterized as the entry of medical imaging into the era of molecular medicine. Examples include the use of magnetic resonance to characterize the chemical composition of cancers, emission computed tomography to display the perfusion of blood in the myocardium, and microfocal X-ray computed tomography to reveal the microvasculature of the lung. Contrast agents are widely employed in X-ray, ultrasound, and magnetic resonance imaging techniques to enhance the visualization of properties correlated with patient anatomy and physiology. Agents in wide use today localize in tissues either by administration into specific anatomic compartments such as the gastrointestinal or vascular systems or by reliance on nonspecific changes in tissues such as increased capillary permeability or alterations in the extracellular fluid space.
These localization mechanisms frequently do not yield a sufficient concentration differential of the agent to reveal subtle tissue differences associated with the presence of an abnormal condition. New contrast agents are needed that exploit growing knowledge about biochemical receptor systems, metabolic pathways, and ‘‘antisense’’ (variant DNA) molecular technologies to yield concentration differentials sufficient to reveal subtle variations among various tissues that may reflect the presence of pathological conditions. Another important imaging application of molecular medicine is using imaging methods to study cellular, molecular, and genetic processes. For example, cells may be genetically altered to attract metal ions that (1) alter the magnetic susceptibility, thereby permitting their identification by magnetic resonance imaging techniques; or (2) are radioactive and therefore can be visualized by nuclear imaging methods. Another possibility is to transfect cells with genetic material that causes expression of cell surface receptors that can bind radioactive compounds (3). Conceivably this technique could be used to tag affected cells and monitor the progress of gene therapy. Advances in molecular biology and genetics are yielding new knowledge at an astonishing rate about the molecular and genetic infrastructure that underlies the static and
dynamic processes of human anatomy and physiology. This new knowledge is likely to yield increasingly specific approaches to using imaging methods to visualize normal and abnormal tissue structure and function at increasingly microscopic levels. These methods will in all likelihood lead to further advances in molecular medicine.

Human Vision

Images are the product of the interaction of the human visual system with its environment. Any analysis of images, including medical images, must include at least a cursory review of the process of human vision. This process is outlined here; a more detailed treatment of the characteristics of the ‘‘end user’’ of images is provided in later sections of this Encyclopedia.

Anatomy and Physiology of the Eye

The human eye, diagrammed in Fig. 1, is an approximate sphere that contains four principal features: the cornea, iris, lens, and retina. The retina contains photoreceptors that translate light energy into electrical signals that serve as nerve impulses to the brain. The other three components serve as focusing and filtering mechanisms to transmit a sharp, well-defined light image to the retina.
Tunics. The wall of the eye consists of three layers (tunics) that are discontinuous in the posterior portion where the optic nerve enters the eye. The outermost tunic is a fibrous layer of dense connective tissue that includes the cornea and the sclera. The cornea comprises the front curved surface of the eye, contains an array of collagen fibers and no blood vessels, and is transparent to visible light. The cornea serves as a coarse focusing element to project light onto the observer’s retina. The sclera, or white of the eye, is an opaque and resilient sheath to which the eye muscles are attached. The second layer of the wall is a vascular tunic termed the uvea. It contains the choroid,
ciliary body, and iris. The choroid contains a dense array of capillaries that supply blood to all of the tunics. Pigments in the choroid reduce internal light reflection that would otherwise blur the images. The ciliary body contains the muscles that support and focus the lens. It also contains capillaries that secrete fluid into the anterior segment of the eyeball. The iris is the colored part of the eye that has a central aperture termed the pupil. The diameter of the aperture can be altered by the action of muscles in the iris to control the amount of light that enters the posterior cavity of the eye. The aperture diameter can vary from about 1.5 to 8 mm.
Chambers and Lens. The anterior and posterior chambers of the eye are filled with fluid. The anterior chamber contains aqueous humor, a clear plasma-like fluid that is continually drained and replaced. The posterior chamber is filled with vitreous humor, a clear viscous fluid that is not replenished. The cornea, aqueous and vitreous humors, and the lens serve collectively as the refractive media of the eye. The lens of the eye provides the fine focusing of incident light onto the retina. It is a convex lens whose thickness can be changed by action of the ciliary muscles. The index of refraction of the lens is close to that of the surrounding fluids in which it is suspended, so it serves as a fine-focusing adjustment to the coarse focusing function of the cornea. The process of accommodation by which near objects are brought into focus is achieved by contraction of the ciliary muscles. This contraction causes the elastic lens to bow forward into the aqueous humor, thereby increasing its thickness. Accommodation is accompanied by constriction of the pupil, which increases the depth of field of the eye. When the lens loses its flexibility from aging and is unable to accommodate, near objects can no longer be focused onto the retina. This is the condition of presbyopia, in which reading glasses are needed to supplement the focusing ability of the lens. Clouding of the lens by aging results in diminution of the amount of light that reaches the retina. This condition is known as a lens cataract; when severe enough, it makes the individual a candidate for surgical replacement of the lens, often with an artificial lens.
Figure 1. Horizontal section through the human eye (from Ref. 4, with permission).
Retina. The innermost layer of the eye is the retina, which is composed of two components, an outer monolayer of pigmented cells and an inner neural layer of photoreceptors. Because considerable processing of visual information occurs in the retina, it often is thought of more as a remote part of the brain rather than as simply another component of the eye. There are no photoreceptors where the optic nerve enters the eye, creating a blind spot. Near the blind spot is the macula lutea, an area of about 3 mm2 over which the retina is especially thin. Within the macula lutea is the fovea centralis, a slight depression about 0.4 mm in diameter. The fovea is on the optical axis of the eye and is the area where the visual cones are concentrated to yield the greatest visual acuity. The retina contains two types of photoreceptors, termed rods and cones. Rods are distributed over the entire retina, except in the blind spot and the fovea centralis. The
retina contains about 125 million rods, or about 10^5/mm2. Active elements in the rods (and in the cones as well) are replenished throughout an individual’s lifetime. Rods have a low but variable threshold to light and respond to very low intensities of incident light. Vision under low illumination levels (e.g., night vision) is attributable almost entirely to rods. Rods contain the light-sensitive pigment rhodopsin (visual purple), which undergoes chemical reactions (the rhodopsin cycle) when exposed to visible light. Rhodopsin consists of a lipoprotein called opsin and a chromophore (a light-absorbing chemical compound called 11-cis-retinal) (5). The chemical reaction begins with the breakdown of rhodopsin and ends with the recombination of the breakdown products into rhodopsin. The recovery process takes 20–30 minutes, which is the time required to adapt to low levels of illumination (dark adaptation). The process of viewing with rods is known as ‘‘scotopic’’ vision. The rods are maximally sensitive to light of about 510 nm in the blue–green region of the visible spectrum. Rods have no mechanisms to discriminate different wavelengths of light, and vision under low illumination conditions is essentially ‘‘colorblind.’’ More than 100 rods are connected to each ganglion cell, and the brain has no way of discriminating among these photoreceptors to identify the origin of an action potential transmitted along the ganglion. Hence, rod vision is associated with relatively low visual acuity in combination with high sensitivity to low levels of ambient light. The retina contains about 7 million cones that are packed tightly in the fovea and diminish rapidly across the macula lutea. The density of cones in the fovea is about 140,000/mm2. Cones are maximally sensitive to light of about 550 nm in the yellow–green portion of the visible spectrum.
Cones are much less sensitive to light than rods (by a factor of about 10^4), but in the fovea there is a 1 : 1 correspondence between cones and ganglions, so that visual acuity is very high. Cones are responsible for color vision through mechanisms that are imperfectly understood at present. One popular theory of color vision proposes that three types of cones exist; each has a different photosensitive pigment that responds maximally to a different wavelength (450 nm for ‘‘blue’’ cones, 525 nm for ‘‘green’’ cones, and 555 nm for ‘‘red’’ cones). The three cone pigments share the same chromophore as the rods; their different spectral sensitivities result from differences in the opsin component.

Properties of Vision

For two objects to be distinguished on the retina, light rays from the objects must define at least a minimum angle as they pass through the optical center of the eye. The minimum angle is defined as the visual angle. The visual angle, expressed in units of minutes of arc, determines the visual acuity of the eye. A rather crude measure of visual acuity is provided by the Snellen chart that consists of rows of letters that diminish in size from top to bottom. When viewed from a distance of 20 feet, a person who has normal vision can just distinguish the letters in the eighth row. This person is said to have 20 : 20 vision (i.e., the person can see from 20 feet what a normal person can
see at the same distance). At this distance, the letters on the eighth row form a visual angle of 1 minute of arc. An individual who has excellent vision and who is functioning in ideal viewing conditions can achieve a visual angle of about 0.5 minutes of arc, which is close to the theoretical minimum defined by the packing density of cones on the retina (6). A person who has 20 : 100 vision can see at 20 feet what a normal person can see at 100 feet. This individual is considered to have impaired visual acuity. Other more exact tests administered under conditions of uniform illumination are used for actual clinical diagnosis of visual defects. If the lettering of a Snellen chart is reversed (i.e., white letters on a black chart, rather than black letters on a white chart), the ability of observers to recognize the letters from a distance is greatly impaired. The eye is extremely sensitive to small amounts of light. Although the cones do not respond at illumination levels below a threshold of about 0.001 cd/m2, rods are much more sensitive and respond to just a few photons. For example, as few as 10 photons can generate a visual stimulus in an area of the retina where rods are at high concentration (7). Differences in signal intensity that can just be detected by the human observer are known as just noticeable differences (JND). This concept applies to any type of signal, including light, that can be sensed by the observer. The smallest difference in signal that can be detected depends on the magnitude of the signal. For example, we may be able to discern the brightness difference between one and two candles, but we probably cannot distinguish the difference between 100 and 101 candles. This observation was quantified by the work of Weber, who demonstrated that the JND is directly proportional to the intensity of the signal. Fechner expressed this finding mathematically as
dS = k (dI/I),     (1)

where I is the intensity of stimulus, dS is an increment of perception (termed a limen), and k is a scaling factor. The integral form of this expression is known as the Weber–Fechner law:

S = k(log I) + C,     (2)

or, by setting C = −k(log I0),

S = k log(I/I0).     (3)
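As a brief illustrative sketch (not part of the original text; the scaling factor k, the base-10 logarithm, and the intensity values are arbitrary choices for the example), Eq. (3) implies that equal intensity *ratios* produce equal perceptual increments:

```python
import math

def weber_fechner(intensity, i0=1.0, k=1.0):
    """Perceived signal S = k * log(I / I0) -- the Weber-Fechner law, Eq. (3)."""
    return k * math.log10(intensity / i0)

# Going from 1 to 10 candles feels like the same perceptual step
# as going from 10 to 100 candles:
step_low = weber_fechner(10.0) - weber_fechner(1.0)
step_high = weber_fechner(100.0) - weber_fechner(10.0)
print(step_low, step_high)  # 1.0 1.0
```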
This expression states that the perceived signal S varies with the logarithm of the relative intensity. The Weber–Fechner law is similar to the equation for expressing the intensity of sound in decibels and provides a connection between the objective measurement of sound intensity and the subjective impression of loudness. A modification of the Weber–Fechner law is known as the power law (6). In this expression, the relationship between a stimulus and the perceived signal can be stated as
dS/S = n (dI/I),     (4)
748
IMAGING SCIENCE IN MEDICINE
Table 2. Exponents in the Power Law for a Variety of Psychophysical Responses (from Ref. 7)

Perceived Quantity   Exponent   Stimulus
Loudness             0.6        Binaural
Brightness           0.5        Point source
Smell                0.55       Coffee
Taste                1.3        Salt
Temperature          1.0        Cold on arm
Vibration            0.95       60 Hz on finger
Duration             1.1        White noise stimulus
Pressure             1.1        Static force on palm
Heaviness            1.45       Lifted weight
Electric shock       3.5        60 Hz through fingers
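As a hedged numerical sketch (not in the original text), the exponents in Table 2 can be plugged into the integrated form of the power law, S = (I/I0)^n (Eq. 6 below), to compare how different sensations scale; the reference intensity I0 = 1 is an arbitrary choice:

```python
def stevens_power_law(intensity, n, i0=1.0):
    """Perceived magnitude S = (I / I0)**n -- the power law, Eq. (6)."""
    return (intensity / i0) ** n

# Doubling the stimulus intensity affects sensations very differently:
print(round(stevens_power_law(2.0, n=0.5), 2))  # brightness (n=0.5): 1.41, compressive
print(round(stevens_power_law(2.0, n=3.5), 2))  # electric shock (n=3.5): 11.31, expansive
```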
which, when integrated, yields

log S = n(log I) + K     (5)

and, when K is written as −n(log I0),

S = (I/I0)^n,     (6)
where I0 is a reference intensity. The last expression, known as the power law, states that the perceived signal S varies with the relative intensity raised to the power n. The value of the exponent n has been determined by Stevens for a variety of sensations, as shown in Table 2.

Image Quality

The production of medical images relies on intercepting some form of radiation that is transmitted, scattered, or emitted by the body. The device responsible for intercepting the radiation is termed an image receptor (or radiation detector). The purpose of the image receptor is to generate a measurable signal as a result of energy deposited in it by the intercepted radiation. The signal is often, but not always, an electrical signal that can be measured as an electrical current or voltage pulse. Various image receptors and their uses in specific imaging applications are described in the following sections. In describing the properties of a medical image, it is useful to define certain image characteristics. These characteristics and their definitions change slightly from one type of imaging process to another, so a model is needed to present them conceptually. X-ray projection imaging is the preferred model because this process accounts for more imaging procedures than any other imaging method used in medicine. In X-ray imaging, photons transmitted through the body are intercepted by an image receptor on the side of the body opposite from the X-ray source. The probability of interaction of a photon of energy E in the detector is termed the quantum detection efficiency η (8). This parameter is defined as

η = 1 − e^(−µ(E)t),     (7)
Figure 2. Attenuation curves for three materials used in X-ray intensifying screens (from Ref. 9, with permission).
where µ(E) is the linear attenuation coefficient of the detector material that intercepts X rays incident on the image receptor and t is the thickness of the material. The quantum detection efficiency can be increased by making the detector thicker or by using materials that absorb X rays more readily (i.e., have a greater attenuation coefficient µ(E) because they have a higher mass density or atomic number). In general, η is greater at lower X-ray energies and decreases gradually with increasing energy. If the absorbing material has an absorption edge in the energy range of the incident X rays, however, the value of η increases dramatically for X-ray energies slightly above the absorption edge. Absorption edges are depicted in Fig. 2 for three detectors (Gd2O2S, YTaO4, and CaWO4) used in X-ray imaging.

Image Noise

Noise may be defined generically as uncertainty in a signal due to random fluctuations in the signal. Noise is present in all images. It is a result primarily of forming the image from a limited amount of radiation (photons). This contribution to image noise, referred to as quantum mottle, can be reduced by using more radiation to form the image. However, this approach also increases the exposure of the patient to radiation. Other influences on image noise include the intrinsically variable properties of the tissues represented in the image, the type of receptor chosen to acquire the image, the processing and display electronics, and the amount of
scattered radiation that contributes to the image. In most instances, quantum mottle is the dominant influence on image noise. In an image receptor exposed to N0 photons, the image is formed with ηN0 photons, and the photon image noise σ can be estimated as σ = (ηN0)^1/2. The signal-to-noise ratio (SNR) is

SNR = ηN0/(ηN0)^1/2 = (ηN0)^1/2.     (8)

A reduction in either the quantum detection efficiency η of the receptor or the number of photons N0 used to form the image yields a lower signal-to-noise ratio and produces a noisier image. This effect is illustrated in Fig. 3. A complete analysis of signal and noise propagation in an imaging system must include consideration of the
spatial-frequency dependence of both the signal and the noise. The propagation of the signal is characterized by the modulation transfer function (MTF), and the propagation of noise is described by the Wiener (noise power) spectrum W(f). A useful quantity for characterizing the overall performance of an imaging system is its detective quantum efficiency DQE(f), where (f) indicates that the DQE depends on the spatial frequency of the signal. This quantity describes the efficiency with which an imaging system transfers the signal-to-noise ratio of the radiative pattern that emerges from the patient into an image to be viewed by an observer. An ideal imaging system has DQE(f) = η at all spatial frequencies. In actuality, DQE(f) is invariably less than η, and the difference between DQE(f) and η becomes greater at higher spatial frequencies. If DQE = 0.1η at a particular frequency, then the imaging system performs at that spatial frequency as if the number of photons were reduced to 1/10. Hence, the noise would increase by a factor of 10^1/2 at that particular frequency.
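A short sketch (illustrative only; the attenuation coefficient, detector thickness, and photon count below are invented for the example, not values from the text) ties together Eqs. (7) and (8) and the DQE argument above:

```python
import math

def quantum_detection_efficiency(mu, t):
    """Eq. (7): eta = 1 - exp(-mu*t), for a linear attenuation coefficient
    mu (per cm) and detector thickness t (cm)."""
    return 1.0 - math.exp(-mu * t)

def snr(eta, n0):
    """Eq. (8): SNR = eta*N0 / (eta*N0)**0.5 = (eta*N0)**0.5."""
    return math.sqrt(eta * n0)

eta = quantum_detection_efficiency(mu=20.0, t=0.02)
print(round(eta, 3))                  # 0.33: fraction of incident photons detected
print(round(snr(eta, n0=10_000), 1))  # 57.4

# If DQE = 0.1*eta at some spatial frequency, the system behaves there as if
# only one-tenth of the photons were used, so noise rises by 10**0.5:
print(round(snr(eta, 10_000) / snr(0.1 * eta, 10_000), 3))  # 3.162
```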
Figure 3. Illustration of quantum mottle. As the illumination of the image increases, quantum mottle decreases and the clarity of the image improves, as depicted in these classic photographs (from Ref. 10).
Spatial Resolution

The spatial resolution of an image is a measure of the smallest visible interval in an object that can be seen in an image of the object. Greater spatial resolution means that smaller intervals can be visualized in the image, that is, greater spatial resolution yields an image that is sharper. Spatial resolution can be measured and expressed in two ways: (1) by a test object that contains structures separated by various intervals, and (2) by a more formal procedure that employs the modulation transfer function (MTF). A simple but often impractical way to describe the spatial resolution of an imaging system is by measuring its point-spread function (PSF; Fig. 4). The PSF(x, y) is the acquired image of an object that consists of an infinitesimal point located at the origin, that is, for an object defined by the coordinates (0,0). The PSF is the function that operates on what would otherwise be a perfect image to yield an unsharp (blurred) image. If the extent of unsharpness is the same at all locations in the image, then the PSF has the property of being ‘‘spatially invariant,’’ and the relationship of the image to the object (or perfect image) is

Image(x, y) = PSF(x, y) ⊗ object(x, y),     (9)

where the ‘‘⊗’’ indicates a mathematical operation referred to as ‘‘convolution’’ between the two functions. This operation can be stated as

Image(x, y) = ∫∫ PSF(x − u, y − v) object(u, v) du dv.     (10)

The convolution effectively smears each value of the object by the PSF to yield the image. The convolution (blurring) operation can be expressed by a functional operator, S[. . .], such that

PSF(x, y) = S[point(x, y)],     (11)

where S[. . .] represents the blurring operator, referred to as the linear transform of the system. The modulation transfer function MTF(m, n) is obtained from the PSF(x, y) by using the two-dimensional Fourier transform F:

MTF(m, n) = F[PSF(x, y)],     (12)

where (m, n) are the conjugate spatial frequency variables for the spatial coordinates (x, y). This expression of MTF is not exactly correct in a technical sense. The Fourier transform of the PSF is actually termed the system transfer function, and the MTF is the normalized absolute value of the magnitude of this function. When the PSF is real and symmetrical about the x and y axes, the absolute value of the Fourier transform of the PSF yields the MTF directly (11). MTFs for representative X-ray imaging systems (intensifying screen and film) are shown in Fig. 5. The PSF and the MTF, which is in effect the representation of the PSF in frequency space, are important descriptors of spatial resolution in a theoretical sense. From a practical point of view, however, the PSF is not very helpful because it can be generated and analyzed only approximately. The difficulty with the PSF is that the source must be essentially a singular point (e.g., a tiny aperture in a lead plate exposed to X rays or a minute source of radioactivity positioned at some distance from a receptor). This condition allows only a few photons (i.e., a very small amount of radiation) to strike the receptor, and very long exposure times are required to acquire an
Figure 4. The point-spread function PSF(x, y).

Figure 5. Point-spread (top) and modulation-transfer (bottom) functions for fast and medium-speed CaWO4 intensifying screens (from Ref. 9, with permission).
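The convolution model of Eq. (9) and the PSF-to-MTF relationship of Eq. (12) can be sketched with NumPy (the Gaussian PSF, its width, and the grid size below are arbitrary choices for this illustration, not values from the text):

```python
import numpy as np

# A hypothetical Gaussian point-spread function on a 64 x 64 grid.
n = 64
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
psf = np.exp(-(x**2 + y**2) / (2 * 2.0**2))
psf /= psf.sum()  # normalize so that blurring preserves total intensity

# Eq. (9): image = PSF convolved with object, computed here via FFTs.
obj = np.zeros((n, n))
obj[n // 2, n // 2] = 1.0  # an ideal point object at the center
image = np.real(np.fft.ifft2(np.fft.fft2(psf) * np.fft.fft2(obj)))

# Eq. (12): the MTF is the normalized magnitude of the PSF's Fourier transform.
mtf = np.abs(np.fft.fft2(np.fft.ifftshift(psf)))
mtf /= mtf[0, 0]  # MTF = 1 at zero spatial frequency, falling off at higher ones

print(round(float(image.sum()), 6))  # 1.0: blurring redistributes intensity, it does not remove it
```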
image without excessive noise. In addition, measuring and characterizing the PSF present difficult challenges. One approach to overcoming the limitations of the PSF is to measure the line-spread function (LSF). In this approach, the source is represented as a long line of infinitesimal width (e.g., a slit in an otherwise opaque plate or a line source of radioactivity). The LSF can be measured by a microdensitometer that is scanned across the slit in a direction perpendicular to its length. As for the pointspread function, the width of the line must be so narrow that it does not contribute to the width of the image. If this condition is met, the width of the image is due entirely to unsharpness contributed by the imaging system. The slit (or line source) is defined mathematically as
Input = line(x) = ∫_{−∞}^{+∞} point(x, y) dy        Output = LSF(x) = S[line(x)]        (13)

Figure 6. The line-spread function is the image of an ideal line object, where S represents the linear transform of the imaging system (from Ref. 11, with permission).

The line-spread function LSF(x) results from the blurring operator for the imaging system operating on a line source (Fig. 6), that is,

LSF(x) = S[line(x)] = S[ ∫_{−∞}^{+∞} point(x, y) dy ]        (14)

This function can also be written as

LSF(x) = ∫_{−∞}^{+∞} S[point(x, y)] dy = ∫_{−∞}^{+∞} PSF(x, y) dy,        (15)

that is, the line-spread function LSF is the point-spread function PSF integrated over the y dimension. The MTF of an imaging system can be obtained from the Fourier transform of the LSF:

F[LSF(x)] = ∫_{−∞}^{+∞} LSF(x) exp(−2πimx) dx
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} PSF(x, y) exp(−2πimx) dy dx
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} PSF(x, y) exp[−2πi(mx + ny)] dy dx, evaluated at n = 0
          = F[PSF(x, y)]_{n=0} = MTF(m, 0),        (16)

that is, the Fourier transform of the line-spread function is the MTF evaluated in one dimension. If the MTF is circularly symmetrical, then this expression describes the MTF completely in the two-dimensional frequency plane.

One final method of characterizing the spatial resolution of an imaging system is by using the edge-response function. In this approach, the imaging system is presented with a source that transmits radiation on one side of an edge and attenuates it completely on the other side. The transmission is defined as

STEP(x, y) = 1 if x > 0, and 0 if x < 0        (17)

This function can also be written as

STEP(x, y) = ∫_{−∞}^{x} line(x′) dx′        (18)

The edge-spread function ESF(x) can be computed as

ESF(x) = S[STEP(x, y)] = S[ ∫_{−∞}^{x} line(x′) dx′ ] = ∫_{−∞}^{x} S[line(x′)] dx′ = ∫_{−∞}^{x} LSF(x′) dx′        (19)

This relationship, illustrated in Fig. 7, shows that LSF(x) is the derivative of the edge-spread function ESF(x):

LSF(x) = d[ESF(x)]/dx        (20)

Figure 7. The edge-spread function is derived from the image of an ideal step function (Input: STEP(x) = 0 if x < 0, 1 if x > 0; Output: ESF(x) = S[STEP(x)]), where S represents the linear transform of the imaging system (from Ref. 11, with permission).
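The measurement chain described here (scan an edge to obtain an ESF, differentiate to obtain the LSF, then Fourier transform the LSF to obtain a one-dimensional MTF) can be sketched numerically. The sampled ESF below is hypothetical, not data from this article:

```python
import cmath

def lsf_from_esf(esf, dx=1.0):
    """LSF(x) = d[ESF(x)]/dx (Eq. 20), approximated by finite differences."""
    return [(b - a) / dx for a, b in zip(esf, esf[1:])]

def mtf_from_lsf(lsf):
    """One-dimensional MTF as the magnitude of the discrete Fourier
    transform of the LSF (Eq. 16), normalized to 1 at zero frequency."""
    n = len(lsf)
    ft = [sum(v * cmath.exp(-2j * cmath.pi * m * i / n)
              for i, v in enumerate(lsf)) for m in range(n)]
    dc = abs(ft[0])
    return [abs(c) / dc for c in ft]

# A hypothetical microdensitometer scan across a blurred edge:
esf = [0.0, 0.0, 0.02, 0.1, 0.3, 0.7, 0.9, 0.98, 1.0, 1.0]
lsf = lsf_from_esf(esf)
mtf = mtf_from_lsf(lsf)
```

For a blurred edge, `mtf[0]` is 1 and the higher-frequency terms fall off, as expected for an imaging system whose unsharpness spreads the LSF over several samples.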
The relationship between the ESF and the LSF is useful because one can obtain a microdensitometric scan of the edge to yield an edge-spread function. The derivative of the ESF yields the LSF, and the Fourier transform of the LSF provides the MTF in one dimension.

Contrast

Image contrast refers conceptually to the difference in brightness or darkness between a structure of interest and the surrounding background in an image. Usually, information in a medical image is presented in shades of gray (levels of ''grayness''). Differences in gray shades are used to distinguish various types of tissue, analyze structural relationships, and sometimes quantify physiological function. Contrast in an image is a product of both the physical characteristics of the object being studied and the properties of the image receptor used to form the image. In some cases, contrast can be altered by the exposure conditions chosen for the examination [for example, selection of the photon energy (kVp) and use of a contrast agent in X-ray imaging]. Image contrast is also influenced by perturbing factors such as scattered radiation and the presence of extraneous light in the detection and viewing systems. An example of the same image at different levels of contrast is shown in Fig. 8. In most medical images, contrast is a consequence of the types of tissue represented in the image. In X-ray imaging, for example, image contrast reveals differences in the attenuation of X rays among various regions of the body, modified to some degree by other factors such as the properties of the image receptor, exposure technique, and the presence of extraneous (scattered) radiation. A simplified model of the human body consists of three different body tissues: fat, muscle, and bone. Air is also present in the lungs, sinuses, and gastrointestinal tract, and a contrast agent may have been used to accentuate the attenuation of X rays in a particular region.
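How attenuation differences produce subject contrast can be sketched with the Beer-Lambert law. The attenuation coefficients below are assumed round numbers for the illustration, not values tabulated in this article:

```python
import math

def transmitted_fraction(mu_per_cm, thickness_cm):
    """Fraction of a narrow X-ray beam transmitted through a uniform slab,
    I/I0 = exp(-mu * x) (Beer-Lambert attenuation)."""
    return math.exp(-mu_per_cm * thickness_cm)

# Assumed linear attenuation coefficients (1/cm), for the sketch only:
mu_soft, mu_bone = 0.5, 1.0
x = 2.0  # path length in cm

i_soft = transmitted_fraction(mu_soft, x)
i_bone = transmitted_fraction(mu_bone, x)

# One simple definition of subject contrast between the two beam paths:
contrast = (i_soft - i_bone) / i_soft
```

The bony path transmits fewer photons than the soft-tissue path, and the normalized difference between the two transmitted intensities is the contrast the image receptor must capture.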
The chemical composition of the three body tissues, together with their percentage mass composition, is shown in Table 3. Selected physical properties of the tissues are included in Table 4, and the mass attenuation coefficients for different tissues as a function of photon energy are shown in Fig. 9.
Table 3. Elemental Composition of Tissue Constituents^a (% Composition by Mass)

Element       Adipose Tissue   Muscle (Striated)   Water   Bone (Femur)
Hydrogen      11.2             10.2                11.2     8.4
Carbon        57.3             12.3                        27.6
Nitrogen       1.1              3.5                         2.7
Oxygen        30.3             72.9                88.8    41.0
Sodium                          0.08
Magnesium                       0.02                         0.2
Phosphorus                      0.2                          7.0
Sulfur         0.06             0.5                          0.2
Potassium                       0.3
Calcium                         0.007                       14.7

^a From Ref. 11, with permission.
Table 4. Properties of Tissue Constituents of the Human Body^a

Material   Effective Atomic Number   Density (kg/m^3)     Electron Density (electrons/kg)
Air        7.6                       1.29                 3.01 × 10^26
Water      7.4                       1.00 × 10^3          3.34 × 10^26
Muscle     7.4                       1.00 × 10^3          3.36 × 10^26
Fat        5.9–6.3                   0.91 × 10^3          3.34–3.48 × 10^26
Bone       11.6–13.8                 1.65–1.85 × 10^3     3.00–3.10 × 10^26

^a From Ref. 11, with permission.
In Table 4, the data for muscle are also approximately correct for other soft tissues such as collagen, internal organs (e.g., liver and kidney), ligaments, blood, and cerebrospinal fluid. These data are very close to the data for water, because soft tissues, including muscle, are approximately 75% water, and body fluids are 85% to nearly 100% water. The similarity of these tissues suggests that conventional X-ray imaging yields poor discrimination among them, unless a contrast agent is used to accentuate the differences in X-ray attenuation. Because of the presence of low atomic number (low Z) elements, especially hydrogen, fat has a lower density and effective atomic number compared with muscle and other soft tissues. At less than 35 keV, X rays interact in fat and soft tissues predominantly by photoelectric interactions that vary with Z^3 of the tissue. This dependence provides higher image contrast among tissues of slightly different composition (e.g., fat and muscle) when low energy X rays are used, compared with that obtained from higher energy X rays that interact primarily by Compton interactions that do not depend on atomic number. Low energy X rays are used to accentuate subtle differences in soft tissues (e.g., fat and other soft tissues) in applications such as breast imaging (mammography), where the structures within the object (the breast) provide little intrinsic contrast. When images are desired of structures of high intrinsic contrast (e.g., the chest, where bone, soft tissue, and air are present), higher energy X rays are used to suppress X-ray attenuation in bone, which otherwise would create shadows in the image that could hide underlying soft-tissue pathology. In some accessible regions of the body, contrast agents can be used to accentuate tissue contrast. For example, iodine-containing contrast agents are often injected into the circulatory system during angiographic imaging of blood vessels. The iodinated agent is water-soluble and mixes with the blood to increase its attenuation compared with surrounding soft tissues. In this manner, blood vessels can be seen that are invisible in X-ray images without a contrast agent. Barium is another element that is used to enhance contrast, usually in studies of the gastrointestinal (GI) tract. A thick solution of a barium-containing compound is introduced into the GI tract by swallowing or enema, and the solution outlines the borders of the GI tract to permit visualization of ulcers, polyps, ruptures, and other abnormalities.

Figure 8. Different levels of contrast in an image (from Ref. 12, with permission).

Figure 9. Mass attenuation coefficients (cm^2/g) of muscle, fat, and bone as a function of photon energy (keV).
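The approximate Z^3 (and roughly 1/E^3) scaling of photoelectric interactions can be made concrete with a small sketch. The effective atomic numbers are taken from Table 4 (fat at mid-range); the 20 keV and 60 keV energies are illustrative choices, and the Z^3/E^3 form is the rule-of-thumb dependence described in the text, not an exact cross section:

```python
def photoelectric_weight(z_eff, energy_kev):
    """Relative photoelectric interaction probability, using the
    rule-of-thumb scaling ~ Z^3 / E^3."""
    return z_eff ** 3 / energy_kev ** 3

# Effective atomic numbers from Table 4 (fat taken mid-range):
z_fat, z_muscle = 6.1, 7.4

# The muscle-to-fat ratio of photoelectric interaction is fixed by Z^3
# alone in this approximation:
ratio = photoelectric_weight(z_muscle, 20.0) / photoelectric_weight(z_fat, 20.0)

# But the absolute difference, which drives subject contrast, falls
# steeply as the beam energy rises from a mammographic 20 keV to 60 keV:
diff_20 = photoelectric_weight(z_muscle, 20.0) - photoelectric_weight(z_fat, 20.0)
diff_60 = photoelectric_weight(z_muscle, 60.0) - photoelectric_weight(z_fat, 60.0)
```

This is why low-energy beams are chosen when subtle soft-tissue differences must be accentuated: the photoelectric difference between fat and muscle is much larger at 20 keV than at 60 keV.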
Contrast agents have also been developed for use in ultrasound (solutions that contain microscopic gas bubbles that reflect sound energy) and magnetic resonance imaging (solutions that contain gadolinium that affects the relaxation constants of tissues).

Integration of Image Noise, Resolution and Contrast — The Rose Model

The interpretation of images requires analyzing all of the image's features, including noise, spatial resolution, and contrast. In trying to understand
the interpretive process, the analysis must also include the characteristics of the human observer. Collectively, the interpretive process is referred to as ''visual perception.'' The study of visual perception has captured the attention of physicists, psychologists, and physicians for more than a century — and of philosophers for several centuries. A seminal investigation of visual perception, performed by Albert Rose in the 1940s and 1950s, yielded the Rose model of human visual perception (13). This model is fundamentally a probabilistic analysis of detection thresholds in low-contrast images. Rose's theory states that an observer can distinguish two regions of an image, called ''target'' and ''background,'' only if there is enough information in the image to permit making the distinction. If the signal is assumed to be the difference in the number of photons used to define each region and the noise is the statistical uncertainty associated with the number of photons in each region, then the observer needs a certain signal-to-noise ratio to distinguish the regions. Rose suggested that this ratio is between 5 and 7. The Rose model can be quantified by a simple example (11) that assumes that the numbers of photons used to image the target and background are Poisson distributed and that the target and background yield a low-contrast image in which

N = number of photons that define the target ≈ number of photons that define the background
ΔN = signal = difference in the number of photons that define target and background
A = area of the target = area of the background region
C = contrast of the signal compared with background

The contrast between target and background is related to the number of detected photons N and the difference ΔN between the number of photons that define the target and the background:

C = ΔN/N        (21)

Signal = ΔN = CN        (22)
For Poisson-distributed events, noise = (N)^1/2, and the signal-to-noise ratio (SNR) is

SNR = signal/noise = CN/(N)^1/2 = C(N)^1/2 = C(ΦA)^1/2        (23)
where Φ is the photon fluence (number of photons detected per unit area) and A is the area of the target or background, so that N = ΦA. Using the experimental data of his predecessor Blackwell, Rose found that the SNR has a threshold in the range of 5–7 for differentiating a target from its background. The Rose model is depicted in Fig. 3. A second approach to integrating resolution, contrast, and noise in image perception involves using contrast-detail curves. This method reveals the threshold contrast needed to perceive an object as a function of its diameter. Contrast-detail curves are shown in Fig. 10 for two sets
Figure 10. Results of tests using a contrast-detail phantom for high-noise and low-noise cases, plotted as contrast versus object size. Each dashed line indicates combinations of size and contrast of objects that are just barely visible above the background noise; combinations on one side of each curve are visible, and those on the other side are invisible (from Ref. 9, with permission).
of images; one was acquired at a relatively high signal-to-noise ratio (SNR), and the other at a lower SNR. The curves illustrate the intuitively obvious conclusion that images of large objects can be seen at relatively low contrast, whereas smaller objects require greater contrast to be visualized. The threshold contrast curves begin in the upper left corner of the graph (large objects [coarse detail], low contrast) and end in the lower right corner (small objects [fine detail], high contrast). Contrast-detail curves can be used to compare the performance of two imaging systems or the same system under different operating conditions. When the performance data are plotted, as shown in Fig. 10, the superior imaging system is one that encompasses the most visible targets or the greatest area under the curve.

Image Display/Processing

Conceptually, an image is a two- (or sometimes three-) dimensional continuous distribution of a variable such as intensity or brightness. Each point in the image is an intensity value; for a color image, it may be a vector of three values that represent the primary colors red, green, and blue. An image includes a maximum intensity and a minimum intensity and hence is bounded by finite intensity limits as well as by specific spatial limits. For many decades, medical images were captured on photographic film, which provided a virtually continuous image limited only by the discreteness of image noise and film granularity. Today, however, many if not most medical images are generated by computer-based methods that yield digital images composed of a finite number of numerical values of intensity. A two-dimensional medical image may be composed of J rows of K elements, where each element is referred to as a picture element or pixel, and a three-dimensional image may consist of L slices,
where each slice contains J rows of K elements. The three-dimensional image is made up of volume elements or voxels; each voxel has an area of one pixel and a depth equal to the slice thickness. The size of pixels is usually chosen to preserve the desired level of spatial resolution in the image. Pixels are almost invariably square, whereas the depth of a voxel may not correspond to its width and length. Interpolation is frequently used to adjust the depth of a voxel to its width and length. Often pixels and voxels are referred to collectively as image elements or elements. Digital images are usually stored in a computer so that each element is assigned to a unique location in computer memory. The elements of the image are usually stored sequentially, starting with elements in a row, then elements in the next row, etc., until all of the elements in a slice are stored; then the process is repeated for elements in the next slice. There is a number for each element that represents the intensity or brightness at the corresponding point in the image. Usually, this number is constrained to lie within a specific range starting at 0 and increasing to 255 [8 bit (1 byte) number], 65,535 [16 bit (2 byte) number], or even 4,294,967,295 [32 bit (4 byte) number]. To conserve computer memory, many medical images employ an 8-bit intensity number. This decision may require scaling the intensity values, so that they are mapped over the available 256 (0–255) numbers within the 8-bit range. Scaling is achieved by multiplying the intensity value I_i in each pixel by 255/I_max to yield an adjusted intensity value to be stored at the pixel location. As computer capacity has grown and memory costs have decreased, the need for scaling to decrease memory storage has become less important and is significant today only when large numbers of high-resolution images (e.g., X-ray planar images) must be stored.
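The 255/I_max scaling described above can be sketched in a few lines; the 12-bit pixel values are hypothetical:

```python
def scale_to_8bit(intensities, i_max=None):
    """Map raw intensities onto 0-255 by multiplying each value by
    255 / I_max, the scaling scheme described in the text."""
    if i_max is None:
        i_max = max(intensities)
    return [round(v * 255 / i_max) for v in intensities]

# Hypothetical 12-bit pixel values (0-4095):
pixels = [0, 512, 1024, 2048, 4095]
scaled = scale_to_8bit(pixels)
```

Relative intensities are preserved while the full dynamic range of the original data is compressed into the single byte available per pixel.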
To display a stored image, the intensity value for each pixel in computer memory is converted to a voltage that is used to control the brightness at the corresponding location on the display screen. The intensity may be linearly related to the voltage. However, the relationship between voltage and screen brightness is a function of the display system and usually is not linear. Further, it may be desirable to alter the voltage-brightness relationship to accentuate or suppress certain features in the image. This can be done by using a lookup table to adjust voltage values for the shades of brightness desired in the image; that is, voltages that correspond to intensity values in computer memory are adjusted by using a lookup table to other voltage values that yield the desired distribution of image brightness. If a number of lookup tables are available, the user can select the table desired to illustrate specific features in the image. Examples of brightness-voltage curves obtained from different lookup tables are shown in Fig. 11. The human eye can distinguish brightness differences of about 2%. Consequently, a greater absolute difference in brightness is required to distinguish bright areas in an image compared with dimmer areas. To compensate for this limitation, the voltage applied to the display screen may be modified by the factor e^kV to yield an adjusted brightness that increases with the voltage V. The constant k can be chosen to provide the desired contrast in the displayed images.
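A lookup table of the kind described above can be sketched as a precomputed array indexed by the stored intensity. The exponential curve and the choice of k are illustrative of the e^(kV)-type adjustment mentioned in the text, not a specific display system's table:

```python
import math

def make_lut(transform, levels=256):
    """Precompute a lookup table that maps each stored intensity (0-255)
    to a display value, so display mapping is a single table index."""
    return [min(255, max(0, round(transform(v)))) for v in range(levels)]

# A hypothetical contrast curve of the e^(kV) type, with k chosen so
# that the maximum input (255) still maps to 255:
k = math.log(255) / 255
exp_lut = make_lut(lambda v: math.exp(k * v))

# Applying the table to a row of stored pixel values is an indexed lookup:
row = [0, 64, 128, 255]
displayed = [exp_lut[v] for v in row]
```

Because the table is computed once, any curve (linear, exponential, or hand-drawn) costs the same per pixel at display time, which is why lookup tables are the standard mechanism for selectable display mappings.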
Figure 11. (a) A linear display mapping; (b) a nonlinear display to increase contrast (from Ref. 6, with permission).
Image Processing

Often pixel data are mapped onto a display system in the manner described before. Sometimes, however, it is desirable to distort the mapping to accentuate certain features of the image. This process, termed image processing, can be used to smooth images by reducing their noisiness, accentuate detail by sharpening edges in the image, and enhance contrast in selected regions to reveal features of interest. A few techniques are discussed here as examples of image processing. In many images, the large majority of pixels have intensity values that are clumped closely together, as illustrated in Fig. 12a. Mapping these values onto the display, either directly or in modified form as described earlier, is inefficient because there are few bright or dark pixels to be displayed. The process of histogram equalization improves the efficiency by expanding the contrast range within which most pixels fall, and by compressing the range at the bright and dark ends where few pixels have intensity values. This method can make subtle differences in intensity values among pixels more visible. It is useful when the pixels at the upper and lower ends of the intensity range are not important. The process of histogram equalization is illustrated in Fig. 12b. Histogram equalization is also applicable when the pixels are clumped at the high or low end of intensity values. All images contain noise as an intrinsic product of the imaging process. Features of an image can be obscured by noise, and reducing the noise is sometimes desired to make such features visible. Image noise can be reduced by image smoothing (summing or averaging intensity values) across adjacent pixels. The selection and weighting of pixels for averaging can be varied among several patterns;
Figure 12. (a) Representative image histogram; (b) intensity-equalized histogram.
representative techniques are included here as examples of image smoothing. In the portrayal of a pixel and its neighbors shown in Fig. 13, the intensity value of the pixel (j, k) can be replaced by the average intensity of the pixel and its nearest neighbors (6). This method is then repeated for each pixel in the image. The nearest neighbor approach is a ''filter'' to reduce noise and yield a smoothed image. The nearest neighbor approach is not confined to a set number of pixels to arrive at an average pixel value; for example, the array of pixels shown in Fig. 13 could be reduced from
j − 1, k + 1    j, k + 1    j + 1, k + 1
j − 1, k        j, k        j + 1, k
j − 1, k − 1    j, k − 1    j + 1, k − 1

Figure 13. The nearest neighbor approach to image smoothing.
9 to 5 pixels or increased from 9 to 25 pixels in arriving at an average intensity value for the central pixel. An averaging of pixel values, in which all of the pixels are averaged by using the same weighting to yield a filtered image, is a simple approach to image smoothing. The averaging process can be modified so that the intensity values of some pixels are weighted more heavily than others. Weighted filters usually emphasize the intensity value of the central pixel (the one whose value will be replaced by the averaged value) and give reduced weight to the surrounding pixels in arriving at a weighted average. An almost universal rule of image smoothing is that when smoothing is employed, noise decreases, but unsharpness increases (i.e., edges are blurred). In general, greater smoothing of an image to reduce noise leads to greater blurring of edges as a result of increased unsharpness. Work on image processing often is directed at achieving an optimum balance between increased image smoothing and increased image unsharpness to reveal features of interest in specific types of medical images. The image filtering techniques described before are examples of linear filtering. Many other image-smoothing routines are available, including those that employ ''nonlinear'' methods to reduce noise. One example of a nonlinear filter is replacement of a pixel intensity by the median value, rather than the average intensity, in a surrounding array of pixels. This filter removes isolated noise spikes and speckle from an image and can help maintain sharp boundaries. Images can also be smoothed in frequency space rather than in real space, often at greater speed.

Image Restoration

Image restoration is a term that refers to techniques to remove or reduce image blurring, so that the image is ''restored'' to a sharper condition that is more representative of the object.
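A minimal sketch of this kind of Fourier-domain restoration follows. The one-dimensional transform values are assumed for the illustration (real restoration would operate on the two-dimensional transforms of an actual image and PSF), and the guard against near-zero transform values anticipates the limitation noted later in this section:

```python
def restore(image_ft, psf_ft, eps=1e-6):
    """Fourier-domain restoration O = I / P, with a guard for the
    known failure mode: frequencies where P is (nearly) zero leave
    O undetermined, and dividing by a tiny P amplifies noise."""
    out = []
    for i_val, p_val in zip(image_ft, psf_ft):
        if abs(p_val) < eps:
            out.append(0j)  # undetermined; zeroing is one crude choice
        else:
            out.append(i_val / p_val)
    return out

# Assumed transform samples of a blurred image and of the system PSF:
psf_ft = [1.0, 0.5, 1e-9, 0.25]
image_ft = [4.0, 1.0, 3.0, 0.5]
restored = restore(image_ft, psf_ft)
```

The third frequency, where the PSF transform is essentially zero, cannot be recovered at all; this is exactly the first limitation of restoration discussed in the text.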
This technique is performed in frequency space by using Fourier transforms for both the image and the point-spread function of the imaging system. The technique is expressed as

O(j, k) = I(j, k)/P(j, k)        (24)
where I (j,k) is the Fourier transform of the image, P (j,k) is the Fourier transform of the point-spread function, and O (j,k) is the Fourier transform of the object (in three-dimensional space, a third spatial dimension l would be involved). This method implies that the unsharpness (blurring) characteristics of the imaging device can be removed by image processing after the image has been formed. Although many investigators have pursued image restoration with considerable enthusiasm, interest has waned in recent years because two significant limitations of the method have surfaced (6). The first is that the Fourier transform P (j,k) can be zero for certain values of (j,k), leading to an undetermined value for O (j,k). The second is that image noise is amplified greatly by the restoration process and often so overwhelms the imaging data that the restored image is useless. Although methods have been developed to reduce these limitations,
the conclusion of most attempts to restore medical images is that it is preferable to collect medical images at high resolution, even if sensitivity is compromised, than to collect the images at higher sensitivity and lower resolution, and then try to use image-restoration techniques to recover image resolution.

Image Enhancement

The human eye and brain act to interpret a visual image principally in terms of boundaries that are presented as steep gradients in image brightness between two adjacent regions. If an image is processed to enhance these boundary (edge) gradients, then image detail may be more visible to the observer. Edge-enhancement algorithms function by disproportionately increasing the high-frequency components of the image. This approach also tends to enhance image noise, so that edge-enhancement algorithms are often used together with an image-smoothing filter to suppress noise.

CONCLUSION

The use of images to detect, diagnose, and treat human illness and injury has been a collaboration among physicists, engineers, and physicians since the discovery of X rays in 1895 and the first applications of X-ray images to medicine before the turn of the twentieth century. The dramatic expansion of medical imaging during the past century and the ubiquitous character of imaging in all of medicine today have strengthened the linkage connecting physics, engineering, and medical imaging. This bond is sure to grow even stronger as imaging develops as a tool for probing the cellular, molecular, and genetic nature of disease and disability during the first few years of the twenty-first century. Medical imaging offers innumerable challenges and opportunities to young physicists and engineers interested in applying their knowledge and insight to improving the human condition.

ABBREVIATIONS AND ACRONYMS

DQE     detective quantum efficiency
ECT     emission computed tomography
ESF     edge-spread function
fMRI    functional magnetic resonance imaging
GI      gastrointestinal
JND     just noticeable differences
LSF     line-spread function
MRI     magnetic resonance imaging
MTF     modulation transfer function
PET     positron emission tomography
PSF     point-spread function
SNR     signal-to-noise ratio
SPECT   single photon emission computed tomography
BIBLIOGRAPHY

1. W. R. Hendee, Rev. Mod. Phys. 71(2), Centenary, S444–S450 (1999).
2. R. N. Beck, in W. Hendee and J. Trueblood, eds., Digital Imaging, Medical Physics, Madison, WI, 1993, pp. 643–665.
3. J. H. Thrall, Diagn. Imaging (Dec.), 23–27 (1997).
4. W. R. Hendee, in W. Hendee and J. Trueblood, eds., Digital Imaging, Medical Physics, Madison, WI, 1993, pp. 195–212.
5. P. F. Sharp and R. Philips, in W. R. Hendee and P. N. T. Wells, eds., The Perception of Visual Information, Springer, NY, 1997, pp. 1–32.
6. B. H. Brown et al., Medical Physics and Biomedical Engineering, Institute of Physics Publishing, Philadelphia, 1999.
7. S. S. Stevens, in W. A. Rosenblith, ed., Sensory Communication, MIT Press, Cambridge, MA, 1961, pp. 1–33.
8. J. A. Rowlands, in W. R. Hendee, ed., Biomedical Uses of Radiation, Wiley-VCH, Weinheim, Germany, 1999, pp. 97–173.
9. A. B. Wolbarst, Physics of Radiology, Appleton and Lange, Norwalk, CT, 1993.
10. A. Rose, Vision: Human and Electronic, Plenum Publishing, NY, 1973.
11. B. H. Hasegawa, The Physics of Medical X-Ray Imaging, 2nd ed., Medical Physics, Madison, WI, 1991, p. 127.
12. W. R. Hendee, in C. E. Putman and C. E. Ravin, eds., Textbook of Diagnostic Imaging, 2nd ed., W. B. Saunders, Philadelphia, 1994, pp. 1–97.
13. A. Rose, Proc. Inst. Radio Engineers 30, 293–300 (1942).
IMAGING SCIENCE IN METEOROLOGY

ROBERT M. RAUBER
LARRY DI GIROLAMO
University of Illinois at Urbana-Champaign
Urbana, IL
INTRODUCTION

Meteorologists draw on a wealth of data from measurements and numerical weather prediction models to understand the atmosphere and forecast its evolution. A dramatic increase in the type and quantity of data available to meteorologists during the latter half of the twentieth century has required worldwide computing resources to manage efficient data processing, dissemination, and storage. Imaging and animation have become essential tools of meteorologists as they attempt to assess the current state of the atmosphere and forecast its future behavior. Meteorological measurements of atmospheric properties such as temperature, pressure, winds, and moisture content are made by weather stations in towns and airports worldwide at regular time intervals, typically every 1–3 hours. Some of these stations use balloon-borne instruments to measure vertical profiles of atmospheric properties every 12 hours. Few observations exist in many parts of the world, particularly over oceans, deserts, and mountainous regions. Even in populated areas, the distances between stations are large enough that significant weather phenomena such as thunderstorms may not be reported. Therefore, satellite and radar imagery are key sources of information used by meteorologists to detect and analyze these weather phenomena. The high temporal and spatial resolution of these images permits meteorologists to monitor the progress of weather systems, analyze
the structure and evolution of storms, and determine where dangerous conditions are likely to occur. The data used to construct these images are also used to initialize numerical forecast models and to determine the quality of forecasts. The first successful meteorological satellite to acquire images of the earth was the Television and Infrared Observational Satellite 1 (TIROS 1). TIROS 1 was launched into a 48° inclination orbit on 1 April, 1960. The first image acquired by TIROS 1 is shown in Fig. 1. Although its 79-day lifetime was brief and the technology crude by today's standards, the acquired images demonstrated the powerful utility that satellite imagery would bring to the field of meteorology. Many meteorological satellites followed, and on 7 December, 1966, a landmark event in satellite meteorology took place when the Applications Technology Satellite 1 (ATS 1) was launched. ATS 1 was the first meteorological satellite placed into geostationary orbit. From this vantage point, images of nearly an entire hemisphere can be acquired at high enough temporal resolution to make animation possible. For the first time, meteorologists could visually monitor large weather disturbances such as cyclones and hurricanes. Today, animations routinely appear on television weather broadcasts. By the end of the 1960s, information acquired by satellites was also used quantitatively in numerical weather prediction models for both initialization and verification. Radar has been used for meteorological studies since World War II. Between the 1940s and 1970s, meteorologists routinely examined radar data on cathode-ray-tube displays where image brightness was a measure of echo strength and likely precipitation intensity. The
Figure 1. The first TIROS-1 image of earth (courtesy of the National Oceanic and Atmospheric Administration).
development of scanning Doppler radar in the early 1970s, which provides information on wind fields, was quickly followed by the development of color images of radar data. The first displays were developed in 1974 by the Air Force Geophysics Laboratory, the Raytheon Corporation, and the National Center for Atmospheric Research. The use of color by researchers progressed slowly in the 1970s, limited primarily by computing power, but accelerated in the 1980s as computer technology advanced. In 1979, several U.S. government agencies collaborated to establish a network of Doppler radars in the United States. The network was designed and tested in the 1980s and finally deployed in the 1990s. Rapid data processing and dissemination of data from this radar network via the World Wide Web now make it possible for meteorologists and the public to obtain radar images in near real time. Meteorologists require images of satellite and radar data at both fine and coarse resolution to examine the many scales of motion in the atmosphere. Consider, for example, the midlatitude cyclonic storm shown in Fig. 2. A cyclone’s cloud pattern typically takes the shape of a comma. Cyclones can influence weather across an area roughly two-thirds the size of the United States. In Fig. 2, a 1,500-kilometer long line of showers and thunderstorms is present along the ‘‘tail’’ of the comma cloud. Individual thunderstorms within this line are typically about 15 kilometers in diameter. One or more of these thunderstorms may be rotating, and the scale of rotation is typically about 1–5 kilometers. A tornado whose diameter ranges from 0.1–1 kilometer may develop within one of the rotating thunderstorms. A forecaster
Figure 2. Infrared satellite image of a cyclone over the central United States on 10 November 1998. Annotations on the image mark the warm front (near the middle cloud boundary), the surface cold front (marked by the position of the low cloud boundary), and the upper-level front (marked by the west edge of the high clouds). The shading is indicative of the temperature of the cloud tops and ground surfaces viewed by the satellite. The blue enhanced regions denote the coldest cloud tops, light gray shades are warm-topped clouds, and dark gray shades are the earth's surface. See color insert.
responsible for issuing weather forecasts and severe weather warnings must quickly assimilate, interpret, and update information about all of these scales of motion as the cyclone evolves. To facilitate this process, meteorologists require that satellite and radar data be easily remapped into various projections, marked with geographical features such as state and county boundaries, overlaid with other meteorological data, and animated to observe the growth or decay of various features within the weather systems.

SATELLITE IMAGERY IN METEOROLOGY

Satellite Measurements

Meteorological satellites measure electromagnetic radiation, usually in several spectral bands. Virtually all of the radiation measured by satellites originates from the sun or the earth. About 95% of the radiative energy emitted by the sun lies between the wavelengths of 0.3 µm and 2.4 µm, which includes the ultraviolet, visible, and near-infrared portion of the electromagnetic spectrum. The sun's peak radiative output occurs around 0.46 µm in the visible portion of the spectrum. The solar radiation that reaches earth can be scattered or absorbed by atmospheric constituents (air molecules, aerosols, and clouds) and the earth's surface. Some of the scattered solar radiation makes its way back to space, where it can be measured by satellites. The earth's surface and atmosphere also emit radiation, primarily in the infrared and microwave portion of the spectrum. The earth's peak radiative output occurs at around 10 µm in the infrared portion of the spectrum. Some of the radiation emitted by the earth makes its way to space where it can be measured by satellites. Satellite infrared imagery is often gray-scale inverted so that cold clouds, which emit smaller amounts of infrared radiation, appear white and warmer surface features, which emit larger amounts of radiation, appear dark.
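The peak-emission wavelengths quoted here follow from Wien's displacement law (a corollary of the Planck function): lambda_max = b/T, with b approximately 2898 µm·K. A quick check, using assumed representative temperatures of 288 K for the earth's surface and 5800 K for the sun's photosphere:

```python
def wien_peak_wavelength_um(temp_k):
    """Peak emission wavelength in micrometers from Wien's displacement
    law, lambda_max = b / T with b ~ 2898 um*K."""
    return 2898.0 / temp_k

earth = wien_peak_wavelength_um(288)   # near 10 um: the infrared window
sun = wien_peak_wavelength_um(5800)    # near 0.5 um: the visible window
```

The two results land in the two atmospheric windows exploited by meteorologists, which is why visible channels see scattered sunlight while 10–12 µm channels see radiation emitted by the earth.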
To interpret satellite-measured radiation properly, basic knowledge of the transmission properties of the earth's atmosphere is required. Figure 3 shows the vertical transmission properties of the earth's cloud- and aerosol-free atmosphere as a function of the wavelength of the electromagnetic radiation. Note that there are several spectral regions where the transmittance is large. These regions are called ''windows'' because the atmosphere is nearly transparent to radiation at these wavelengths. Two window regions are commonly exploited by meteorologists. The first is the visible window region that lies between 0.4 µm and 0.7 µm. In this region, a small reduction in the transmittance from unity is caused primarily by scattering of visible light by atmospheric gases. On a clear day, most of the visible sunlight that reaches the top of the earth's atmosphere is transmitted to the earth's surface. Some of the transmitted sunlight is then scattered by the surface back to space, where it can be measured by satellites and analyzed by using ''visible-channel'' imagery. The second window region of interest lies in the infrared portion of the spectrum between 10 µm and 12 µm. In this spectral region, a small reduction of transmittance from unity is caused primarily by water
IMAGING SCIENCE IN METEOROLOGY
759
vapor absorption. The peak emission of the earth occurs at around 10 µm, so that on a clear day, a satellite that has an infrared channel in this window region can measure radiation primarily emitted by the earth's surface. This measurement can be considered a coarse measure of the earth's surface temperature because the intensity of radiation emitted by a surface is a function of temperature as described by Planck's function. In practice, direct inversion of Planck's function is not used to retrieve surface temperature because a correction for the small reduction in atmospheric transmittance and surface emittance is needed to retrieve the accurate surface temperatures required by meteorologists. One meteorological feature that truly stands out in satellite imagery is clouds. Clouds scatter radiation very efficiently in the visible portion of the spectrum, and only a small amount is absorbed. As a result, clouds appear bright in visible imagery. How bright a cloud appears depends largely on the amount, type, and spatial distribution of the particles that make up the cloud. As a rule of thumb, thick clouds appear brighter than thin clouds. In the infrared window, by contrast, clouds are very efficient at absorbing radiation and not very efficient at scattering it. Thus, clouds absorb much of the infrared radiation emitted by the earth's surface in the window region. However, according to Kirchhoff's law, good absorbers of radiation at a particular wavelength are also good emitters of radiation at that wavelength. The amount of radiation emitted decreases with decreasing temperature, according to the Planck function. Thus, in infrared imagery, clouds that are at a higher altitude emit less radiation to space than lower clouds because the average temperature in the lower atmosphere, where nearly all clouds occur, decreases with increasing altitude. As an example, Figs.
4a and b show visible (0.67 µm) and infrared (11 µm) images taken by the Advanced Very High Resolution Radiometer (AVHRR) onboard the National Oceanic and Atmospheric Administration (NOAA)-11 satellite on 6 June 1992 over the Bering Strait. Several important features stand out in these images. In the infrared image, the landmasses of Alaska and the Chukchi Peninsula appear much warmer than the surrounding waters. In the visible image, ice covers much of the waters north of the Bering Strait (D) and in Kotzebue Sound (E). In the visible image, open waters appear dark (F) compared to ice-covered waters (D, E). However, in
Figure 3. The vertical transmittance of the earth’s atmosphere between 0.2 µm and 15 µm. The transmittance was calculated using a radiative transfer model at a spectral resolution of 5 cm−1 . The effects of aerosols and clouds have not been included in the calculations.
the infrared, points D, E, and F show very little contrast, indicating little difference in surface temperature between open and ice-covered waters around the strait. The waters to the south are warmer, which can be determined by contrasting points F and C. Snow-covered mountaintops stand out in both visible and infrared images (e.g., H). Roughly one-third of the AVHRR scene in Fig. 4 is covered with clouds. In the visible image, the clouds at points A and B are brighter than the waters at point C. More information can be ascertained in the infrared. The temperature of clouds at A is lower than that of clouds at B, indicating that the cloud tops at A are at a higher altitude than the cloud tops at B. Clouds are more difficult to identify over ice. There is little contrast in both the visible and infrared images between the low-altitude cloud at point G and the surrounding ice field. However, higher altitude clouds over the ice tend to be easier to identify in the infrared image. Overall, by studying infrared and visible imagery in tandem, meteorologists can characterize the scene better than by studying visible or infrared imagery alone. In addition to visual inspection of satellite spectral imagery, remote sensing techniques exist that use measurements from a number of spectral channels to infer quantitatively a wide variety of properties of the earth's surface and atmosphere, including surface temperature, vertical temperature and humidity profiles, cloud liquid water content, and surface and cloud albedo.

Operational Imaging Satellites

Satellite imaging instruments used in the daily operations of weather services around the world are called operational satellites. They are found in two types of orbits: geostationary and low earth orbits. A geostationary orbit is a circular orbit whose angular velocity matches the earth's rotation and whose plane lies in the earth's equatorial plane.
Consequently, the satellite remains essentially motionless relative to a point on the earth’s equator. A geostationary orbit must be about 36,000 km above the earth’s surface for the satellite to maintain the same angular velocity as that of the earth. A low earth orbit lies between several hundred and several thousand kilometers in altitude. Many meteorological satellites in a low earth orbit are in a near-polar orbit that is sun-synchronous. A satellite in a sun-synchronous orbit crosses the equator at the same local time every day.
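The ~36,000-km geostationary altitude follows from Kepler's third law for a circular orbit, GM = ω²r³, with ω set to one revolution per sidereal day. A quick check:

```python
import math

# Kepler's third law for a circular orbit: GM = omega^2 * r^3.
# A geostationary satellite completes one revolution per sidereal
# day (~86,164 s), so its orbital radius is r = (GM / omega^2)^(1/3).
GM_EARTH = 3.986e14      # earth's gravitational parameter, m^3/s^2
EARTH_RADIUS = 6.378e6   # equatorial radius, m
SIDEREAL_DAY = 86164.0   # s

omega = 2.0 * math.pi / SIDEREAL_DAY
orbit_radius = (GM_EARTH / omega**2) ** (1.0 / 3.0)
altitude_km = (orbit_radius - EARTH_RADIUS) / 1000.0

print(f"Geostationary altitude: {altitude_km:.0f} km")  # ~35,786 km
```

The result, about 35,786 km above the surface, is the "about 36,000 km" cited in the text.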
Figure 4. (a) A visible (0.67 µm) image and (b) an infrared (11 µm) image of the Bering Strait between Alaska and Siberia taken from the AVHRR instrument on board the NOAA-11 satellite platform on 6 June 1992. The letter labels are used as reference in the text.
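Satellite infrared radiances such as those in Fig. 4b are commonly expressed as temperatures by inverting Planck's function; as noted above, operational retrievals also correct for atmospheric transmittance and surface emittance. A minimal sketch of the idealized, correction-free inversion:

```python
import math

# Planck's function and its inverse (brightness temperature).
# This is the idealized inversion only; real surface-temperature
# retrievals add transmittance and emittance corrections.
H = 6.626e-34    # Planck constant, J*s
C = 2.998e8      # speed of light, m/s
K_B = 1.381e-23  # Boltzmann constant, J/K

def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Planck spectral radiance B(lambda, T), W m^-2 sr^-1 m^-1."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K_B * temp_k)
    return a / math.expm1(b)

def brightness_temperature(wavelength_m: float, radiance: float) -> float:
    """Invert Planck's function for the emitting temperature."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K_B)
    return b / math.log(1.0 + a / radiance)

wl = 11e-6  # the 11-um infrared window channel
radiance = planck_radiance(wl, 280.0)          # a 280-K surface
print(brightness_temperature(wl, radiance))    # recovers 280 K
```

The round trip recovers the surface temperature exactly; colder (higher) cloud tops yield smaller radiances and hence lower brightness temperatures, which is why they appear white in inverted gray-scale imagery.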
Several countries around the globe maintain geostationary and low earth orbiting imaging meteorological satellites. The United States operates the Geostationary Operational Environmental Satellite (GOES) imager and the Advanced Very High Resolution Radiometer (AVHRR)
in the NOAA series of polar low earth orbiting satellites. These imagers are typical of satellite imagers maintained worldwide, such as the instruments on the Meteorological Satellite (METEOSAT), operated by a European consortium, and the Geostationary Meteorological Satellite (GMS), operated by Japan. The first of the GOES series of satellites was placed in orbit in 1976. Ten GOES satellites have been operational through the year 2000. The latest, GOES-10, carries five spectral channels centered on 0.65, 3.9, 6.7, 10.7, and 12 µm. Table 1 summarizes the GOES-10 imager characteristics. At any given time, the United States operates two GOES satellites, GOES-East (currently GOES-8) and GOES-West (currently GOES-10), located around 75 °W and 135 °W, respectively. Most of the Western Hemisphere can be observed from these two locations. For example, Figures 5a, b, and c show the 0.65, 6.7, and 10.7 µm full disk images from GOES-10 taken on 13 August 1999. These three spectral channels are the channels most commonly exploited in constructing meteorological satellite imagery. The 0.65-µm (visible) imagery gives a measure of the visible reflectivity of objects in the image. Ocean observations, outside of sun-glint, tend to have the lowest reflectivity, and clouds and snow-covered surfaces tend to have the highest reflectivity. The 10.7-µm window (infrared) imagery gives a measure of surface and cloud-top temperatures; clouds are normally colder than the earth's surface. The 6.7-µm (infrared) channel is in a spectral region of strong water vapor absorption (see Fig. 3). As a result, no surface features can be seen in the 6.7-µm imagery. Instead, the features observed are humidity and clouds, typically in the range of 2–18 km in altitude. Because water vapor is ubiquitous but variable in concentration in the earth's atmosphere, animation of water vapor imagery allows meteorologists to monitor the atmospheric circulation in both clear and cloudy conditions.
The two infrared channels provide useful imagery during day and night, whereas visible imagery provides useful imagery only during the day. Figure 5 was taken around local noon of the subsatellite point, a time when the sun is illuminating the entire scene viewed by GOES-10. The images show two hurricanes and a tropical depression at about 20 °N latitude. The westernmost hurricane is Eugene, a weakening hurricane. Directly to the east of Eugene is Hurricane Dora, near its peak intensity. GOES-10 can acquire an image of the full earth's disk viewed by the satellite every 25 minutes. Full disk imaging can be halted at any time to perform a rapid-scan image, whereby a small section of the earth can be imaged at higher temporal resolution. A full disk image from GOES is not a complete view of the hemisphere. Latitudes up to only 72.6° can be observed by GOES because the earth subtends a half angle of 17.4° at the geostationary altitude. Thus, polar orbiting instruments, like the AVHRR, are needed to monitor meteorological conditions at polar latitudes. The first of the AVHRR series of imagers was on board TIROS-N, launched in 1978. All of the NOAA series of satellites that followed carried AVHRR imagers. The characteristics of the most recent AVHRR on board NOAA-15 are shown in
Table 1. Specifications of the GOES-10 Imager

  Channel   Spectral band        Instantaneous field   Ground instantaneous
            (half-power, µm)     of view (µrad)        field of view at nadir
  1         0.55–0.75            28                    1 km
  2         3.8–4.0              112                   4 km
  3         6.5–7.0              224                   8 km
  4         10.2–11.2            112                   4 km
  5         11.5–12.5            112                   4 km

  Scanning rate: 500 × 500-km region, 20 s; 3,000 × 3,000-km region, 3.1 min; full disk, 25 min
  Instrument mass: 120 kg
  Instrument size: sensor, 115 × 80 × 75 cm; electronics, 67 × 43 × 19 cm; power supply, 29 × 20 × 16 cm
  Power consumption: 119 W
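The ground footprints in Table 1 are consistent with the instantaneous fields of view via the small-angle relation footprint ≈ IFOV × altitude, using the ~35,786-km geostationary altitude:

```python
# Ground footprint at nadir ~ IFOV (in radians) * satellite altitude,
# a small-angle approximation for a geostationary imager.
ALTITUDE_KM = 35786.0  # geostationary altitude

# Instantaneous fields of view from Table 1, in microradians
ifov_urad = {1: 28, 2: 112, 3: 224, 4: 112, 5: 112}

for channel, ifov in ifov_urad.items():
    footprint_km = ifov * 1e-6 * ALTITUDE_KM
    print(f"Channel {channel}: {footprint_km:.1f} km")
```

The computed footprints (1, 4, 8, 4, and 4 km) match the nadir values in the table.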
Table 2. Specifications of the AVHRR on Board NOAA-15

  Instantaneous field of view: 1.3 mrad
  Ground instantaneous field of view (850-km orbit): 1.1 km at nadir; 2.3 × 6.4 km at the swath edge
  Number of samples across scan line: 2,048
  Maximum scan angle: ±55.3° from nadir
  Swath width: 3,000 km
  Spectral channels (half-power points, µm): 1, 0.58–0.68; 2, 0.725–1.00; 3A, 1.58–1.64; 3B, 3.55–3.93; 4, 10.3–11.3; 5, 11.5–12.5
  Instrument mass: 30 kg
  Instrument size: 27 × 37 × 79 cm
  Power consumption: 29 W
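The 850-km orbit altitude in Table 2 implies a repeat period of roughly 100 minutes (about 14 orbits per day), which can be checked with the circular-orbit period formula T = 2π√(r³/GM):

```python
import math

# Circular-orbit period for the 850-km AVHRR orbit quoted in Table 2.
GM_EARTH = 3.986e14     # earth's gravitational parameter, m^3/s^2
EARTH_RADIUS = 6.371e6  # mean radius, m

r = EARTH_RADIUS + 850e3                  # orbital radius, m
period_min = 2.0 * math.pi * math.sqrt(r**3 / GM_EARTH) / 60.0
orbits_per_day = 24.0 * 60.0 / period_min

print(f"Period: {period_min:.1f} min")         # ~102 min
print(f"Orbits per day: {orbits_per_day:.1f}")  # ~14
```

This reproduces the "about 14 times per day" and "about every 100 minutes" figures given in the text.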
Table 2. AVHRR is a scanning radiometer that sweeps out a swath approximately 3,000 km wide. Thus, from a given satellite pass, the AVHRR views only a small fraction of the earth. Because NOAA-15 orbits the earth about 14 times per day, a different portion of the polar region is viewed about every 100 minutes as the earth rotates under the satellite's orbit. Although this information is much less frequent and covers a much smaller spatial extent than information from GOES, the information is nonetheless crucial for monitoring the weather in polar regions.

Examples of Meteorological Satellite Image Interpretation

Cloud Classification. Meteorologists use a standard scheme for classifying clouds that has remained nearly unchanged for more than a century. In this classification,
clouds that have vertical development are called cumulus; layer clouds, stratus; high clouds, cirrus; and precipitating clouds, nimbus. The terms are combined to characterize particular cloud types, such as stratocumulus (a layer of clouds that have vertical development) and cumulonimbus (a thunderstorm), and further refined to take into account a cloud's altitude and shape. The satellite meteorologist attempts to identify cloud types because the clouds' type, location, and organization can be related to the type of weather that occurs in their vicinity. An experienced meteorologist can use satellite imagery to identify different cloud types. For example, Fig. 4 shows a wide variety of cloud types belonging to the high and low cloud categories. Point A is a cirrostratus cloud, and points B and G are stratus clouds. Stratocumulus clouds lie to the west of point C, and cumulus clouds tend to dominate the Alaskan landmass to the east and south of Norton Sound. Figure 5 shows several examples where the clouds' type and organization can be related to the type of weather occurring in their vicinity. The obvious example is Hurricane Dora, where cumulonimbus clouds make up the hurricane's center and the surrounding cumulus shows the hurricane's spiral flow. Figure 5 also shows a long, narrow band of clouds whose cirrus tops in the far southern Pacific are associated with a cold front. The marine stratocumulus clouds off the coast of Baja California indicate two regions of large-scale descent and ascent of air. This can be better seen in Figure 6, which is a higher resolution image of the marine stratocumulus region indicated in the boxed region on Fig. 5a. To the right of the image is a region of open-cell convection, characterized by cells whose centers are clear and whose edges are cloudy. To the left of the image, closed-cell convection dominates. Clouds in this region fill the center of the cell, and the edges of the cell are clear.
Open-cell convection occurs mainly in regions of large-scale descending motion, where heating from the surface drives the convection; closed-cell convection occurs primarily in regions of large-scale ascending motion, where infrared cooling of the cloud tops drives the convection.
Figure 5. GOES-10 full disk images for (a) the 0.65 µm channel, (b) the 6.7 µm channel, and (c) the 10.7 µm channel taken on 13 August 1999 at 14:00 Greenwich mean time. The boxed region in (a) is enlarged in Fig. 6 (courtesy of Mike Garay and Roger Davies, Department of Atmospheric Sciences, University of Arizona, Tucson, AZ).
Fronts. A front is the boundary between two air masses that have distinctly different thermal or moisture properties. When cold (typically dry) air is advancing into a region of warm (typically moist) air, the boundary between the two air masses is called a cold front. Similarly, when warm air is advancing into a region of colder air, the boundary between the two air masses is called a warm front. If the boundary between the two air masses is not moving, the front is called a stationary front. Lifting of air is the driving mechanism in cloud and precipitation formation along fronts. As cold air advances toward a warmer air mass, the warm air, which is less dense than the cold air, is forced upward. Similarly, when
warm air advances toward a cold air mass, the warm air will rise over the colder air. As the warm air rises, it cools by expansion. If the warm air is sufficiently moist and enough cooling takes place for the air to reach saturation, clouds form. The cloud types that form along fronts depend on the physical properties of the air masses and the amount of lifting that takes place. Fortunately, some generalizations of cloud types associated with warm and cold fronts can be made, which helps meteorologists to identify fronts on satellite images. Cold fronts tend to produce stronger lifting over a narrower distance than warm fronts. As a result, the clouds along a cold front tend to be cumuliform such as
Figure 6. GOES-10 high-resolution visible-channel image of the marine stratocumulus field off the coast of Baja California shown in the box in Fig. 5a.

Figure 7. GOES-8 visible-channel image of the northeast coast of North America taken on 23 February 1999.
cumulus, stratocumulus, and cumulonimbus. If the clouds grow tall enough, strong winds aloft may carry ice crystals at the top of the clouds ahead of the front, producing a broad shield of cirrus clouds. Cold fronts can sometimes be identified in satellite imagery by a long, narrow band of clouds, where the rear edge of the cloud band is very sharp. For example, Fig. 5 prominently shows a long, narrow band of clouds in the far southern Pacific that is associated with a cold front. Figure 2 shows a special kind of cold front associated with the ''comma cloud'' of the cyclone, a ''split cold front.'' A split cold front occurs when dry air aloft overruns the warm, moist air ahead of the surface cold front. Under these conditions, deep clouds develop at the leading edge of the advancing dry air aloft (the upper level cold front), and shallow clouds occur below the dry air and ahead of the surface cold front. Infrared satellite images can clearly distinguish the adjacent bands of clouds at the two altitudes. Severe thunderstorms are often found just ahead of the upper level front. Warm fronts tend to be associated with gentle lifting across a wide area. As a result, the clouds along a warm front tend to be stratiform. Far ahead of the warm front, often out to 1,000 km, cirrus and cirrostratus clouds form. Closer to the warm front, deeper clouds typically produce a wide area of precipitation that is generally less intense than the precipitation along a cold front. Figure 2, which shows an example of a cloud shield north of the warm front, illustrates a basic difficulty in frontal analysis using satellite data. In this image, the position of the surface warm front is mostly hidden below a deck of cirrus clouds (indicated by the blue enhancement). The lower cloud decks associated with air rising over the warm front can be seen only across Pennsylvania and New York, east of the high cirrus cloud deck.

Air-mass Modification.
Air masses can be roughly categorized as polar (cold), tropical (warm), continental
(dry), and maritime (humid). Air masses can cover several million square kilometers because the source regions (the underlying continents and oceans) are very large. As an air mass moves away from its source region across a surface that has different thermal or moisture properties, air-mass modification takes place. Figure 7 is an example of air-mass modification easily identified in a satellite image. A high-pressure center lies north of Lake Ontario, forcing continental polar air, which originated over northern Canada, southeastward toward the Atlantic coast and over the ocean. When this air mass moves over the warm waters of the Gulf Stream, the lowest layer of the air mass quickly warms, moistens, and destabilizes. Warm, moist updrafts develop, producing the cloud bands seen in the image. Cold, dry downdrafts between the cloud bands produce gusty winds at the surface, making the seas choppy and potentially hazardous. Note how the air blowing off the coast of Maine and New Brunswick must cross the Bay of Fundy and Nova Scotia before reaching the Atlantic. As a result, the stability of the air is modified more than once en route to the Atlantic, as can be seen in the image.

Animation. Images of nearly an entire hemisphere can be acquired from geostationary orbit at a high enough temporal resolution to make animation possible. Animated images are a valuable tool in meteorology because the life cycle of weather systems, from small cumulus clouds to large-scale cyclones, can be monitored. Winds at cloud level can be estimated from cloud motions. The large-scale horizontal motion of air can be monitored from water vapor imagery. Animations allow meteorologists to study the progress of significant weather events and make more reliable forecasts. Nowhere is this more important than in forecasting the movement of hurricanes. Hurricanes cause more damage in the United States than any other natural disaster.
In terms of lives lost, the hurricane that hit Galveston, Texas, in 1900 was the largest natural disaster in United States history and left more than 6,000 dead.
Today, such disasters are mitigated by advance hurricane warnings to the public, allowing early preparation and evacuation. Satellite imagery is an important tool used to issue these advance warnings. Once advance warnings became available, the number of lives lost to hurricanes in the United States dropped dramatically, despite rapid expansion of the population in coastal areas affected by hurricanes.
RADAR IMAGERY IN METEOROLOGY

Radar Measurements

Meteorological radars transmit short pulses of electromagnetic radiation at microwave or radio frequencies and detect energy backscattered toward the radar's antenna by scattering elements in the atmosphere. Radiation emitted by radar is scattered by water and ice particles, insects, other objects in the path of the beam, and refractive index heterogeneities associated with air density and humidity variations. The returned signal is the combination of radiation backscattered toward the radar by each scattering element within the volume illuminated by a radar pulse. This volume is determined from the angular beam width, the pulse duration, and the distance of the pulse from the radar, as it travels outward at the speed of light. Typically, the beam width is about 1° and the pulse duration 1 microsecond. At a range of 50 km, the scattering volume equals about 10⁸ m³. In moderate rain, this volume may contain more than 10¹¹ raindrops. The contributions of each scattering element in the volume add in phase to create the returned signal. The returned signal fluctuates from pulse to pulse as the scattering elements move. For this reason, the returned signals from many pulses are averaged to determine the average received power. Meteorologists use the average received power to estimate the intensity of precipitation. Doppler radars also measure the pulse-to-pulse phase change due to the average motion of the scattering elements along the radar beam and use it to determine the wind speed toward or away from the radar. Meteorologists image and animate radar data to track and predict the movement of storms, identify dangerous weather situations, estimate winds and amounts of precipitation, and relate rainfall distributions to atmospheric phenomena such as fronts.

The radar range equation for meteorological targets such as raindrops is given by Eq. (1):

    P_r = \frac{P_t G^2 \lambda^2}{(4\pi)^3 R^4} V \eta,    (1)

where P_r is the average received power, P_t the transmitted power, G the antenna gain, λ the wavelength of the transmitted radiation, R the range, V the scattering volume, and η the reflectivity. The radar cross section σ of a spherical water or ice particle whose diameter D is small compared to the wavelength λ is given by the Rayleigh scattering law, Eq. (2):

    \sigma = \frac{\pi^5}{\lambda^4} |K|^2 D^6,    (2)

where |K|² is a dimensionless factor that depends on the dielectric properties of the particle; it is approximately equal to 0.93 for water and 0.18 for ice at radar wavelengths. The radar reflectivity of clouds and precipitation is obtained by summing the cross sections of all the particles in the scattering volume and is written as Eq. (3):

    \eta = \frac{\pi^5}{\lambda^4} |K|^2 Z.    (3)

The radar reflectivity factor Z is defined by Eq. (4), where the summation is across the scattering volume:

    Z = \frac{\sum D^6}{V}.    (4)
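The scattering-volume estimate quoted earlier (about 10⁸ m³ at a 50-km range) can be reproduced by modeling the pulse volume as a cylinder whose radius is the beam half-width at that range and whose depth is cτ/2:

```python
import math

# Scattering volume of a radar pulse, modeled as a cylinder of
# radius R*theta/2 (beam half-width at range R) and depth c*tau/2.
# The 1-degree beam width and 1-microsecond pulse duration are the
# typical values quoted in the text.
C = 3.0e8                      # speed of light, m/s
beamwidth = math.radians(1.0)  # angular beam width, rad
tau = 1e-6                     # pulse duration, s
R = 50e3                       # range, m

radius = R * beamwidth / 2.0   # beam half-width at range R, m
depth = C * tau / 2.0          # pulse-volume depth, m
volume = math.pi * radius**2 * depth

print(f"Scattering volume: {volume:.2e} m^3")  # ~1e8 m^3
```

The cτ/2 factor arises because the transmitted pulse and the returning echo share the round-trip path.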
It is customary to use meters cubed as the unit for volume and to measure the particle diameters in millimeters, so that Z has conventional units of mm⁶/m³. Typical values of the radar reflectivity factor range from 10⁻⁵ to 10 mm⁶/m³ in nonprecipitating clouds, 10 to 10⁶ mm⁶/m³ in rain, and as high as 10⁷ mm⁶/m³ in large hail. Because of the sixth-power weighting on diameter in Eq. (4), raindrops dominate the returned signal in a mixture of rain and cloud droplets. The radar reflectivity factor Z is important because it is related to the raindrop size distribution within the scattering volume. In meteorological applications, the averaged returned power is measured and converted to values of Z using Eqs. (1)–(4). Because Z varies over many orders of magnitude, a logarithmic scale, defined by Eq. (5),

    \mathrm{dBZ} = 10 \log_{10}\left(\frac{Z}{1\ \mathrm{mm^6/m^3}}\right),    (5)

is used to display the radar reflectivity factor. For example, an increase from 20 to 40 dBZ represents a hundredfold increase in the radar reflectivity factor. Radar images of weather systems commonly seen in the media (e.g., Fig. 8) show the radar reflectivity factor in logarithmic units. Images of the radar reflectivity factor overlain with regional maps permit meteorologists to determine the location and intensity of precipitation. Meteorologists often interchangeably use the terms ''radar reflectivity factor'' and ''reflectivity,'' although radar experts reserve the term ''reflectivity'' for η. A small change in the frequency of the returned signal, called the Doppler shift, occurs when scattering elements move toward or away from the radar with the ambient wind. Doppler radar determines this small frequency shift by measuring the pulse-to-pulse change in the phase of the returned signal. Doppler measurements provide an estimate of the along-beam, or radial, wind component, called the radial velocity. Images of the radial velocity across the full spatial domain of the radar allow meteorologists to estimate the direction and speed of the winds. When thunderstorms move across the radar viewing area, the average motion of the storms can be determined from animation of the radar reflectivity factor. This motion is often subtracted from the measured radial velocities to obtain the storm-relative radial velocity. Images of the storm-relative radial velocity are particularly useful in identifying rotation and strong winds that may indicate severe conditions.
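The dBZ scale of Eq. (5) and a pulse-pair Doppler velocity estimate can be sketched as follows; the 10-cm wavelength and 1-ms pulse interval in the Doppler part are assumed values typical of an S-band weather radar, not figures from the text:

```python
import math

def to_dbz(z_mm6_m3: float) -> float:
    """Radar reflectivity factor on the logarithmic dBZ scale, Eq. (5)."""
    return 10.0 * math.log10(z_mm6_m3)

def from_dbz(dbz: float) -> float:
    """Inverse of Eq. (5): linear reflectivity factor in mm^6/m^3."""
    return 10.0 ** (dbz / 10.0)

# An increase from 20 to 40 dBZ is a hundredfold increase in Z:
ratio = from_dbz(40.0) / from_dbz(20.0)
print(ratio)  # 100.0

# Pulse-pair Doppler estimate: radial velocity from the pulse-to-pulse
# phase change, v = lambda * dphi / (4 * pi * Ts). Wavelength and
# pulse interval are assumed S-band values, not from the text.
WAVELENGTH = 0.10        # m
PULSE_INTERVAL = 1.0e-3  # s between pulses

def radial_velocity(phase_change_rad: float) -> float:
    """Radial velocity (m/s) implied by a pulse-to-pulse phase change."""
    return WAVELENGTH * phase_change_rad / (4.0 * math.pi * PULSE_INTERVAL)

print(radial_velocity(0.4 * math.pi))  # 10 m/s
```

The factor of 4π (rather than 2π) in the velocity formula reflects the two-way path: the phase shifts on both the outbound and return legs.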
Figure 8. Composite radar images of the radar reflectivity factor (dBZ) from several radars in the midwestern United States during the 19 April 1996 tornado outbreak. Panel (a) is at 4 P.M. local time. Successive panels are each one hour later.
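The rise of the beam with range, discussed next for PPI displays, is usually computed with the 4/3-effective-earth-radius refraction model; that model and the 0.5° elevation angle below are standard assumptions, not values given in the text:

```python
import math

# Beam height above the radar under the 4/3-effective-earth-radius
# refraction model (an assumed standard model):
#   h = sqrt(R^2 + (ke*a)^2 + 2*R*ke*a*sin(theta)) - ke*a
EARTH_RADIUS = 6.371e6  # mean radius, m
KE = 4.0 / 3.0          # effective earth-radius factor

def beam_height_m(range_m: float, elevation_deg: float) -> float:
    """Height of the beam center above the radar, in meters."""
    ka = KE * EARTH_RADIUS
    theta = math.radians(elevation_deg)
    return math.sqrt(range_m**2 + ka**2
                     + 2.0 * range_m * ka * math.sin(theta)) - ka

# A 0.5-degree beam is roughly 0.6 km high at 50 km but about 5 km
# high at 230 km, so distant echoes on a PPI display come from much
# higher altitudes than nearby ones.
print(beam_height_m(50e3, 0.5))
print(beam_height_m(230e3, 0.5))
```

This height difference is the source of the altitude ambiguity in composite images assembled from several radars.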
Images from individual radars are typically plotted in a map-like format called the ''plan-position indicator,'' or PPI, display; the radar is at the center, north is at the top, and east is at the right. Earth curvature and atmospheric refraction cause a radar beam to rise above the earth's surface as the beam propagates away from the radar. For this reason, distant radar echoes on a PPI display are at higher altitudes than those near the radar. Composite radar images constructed from several radars are projections of data in PPI format onto a single, larger map. As a result, there is ambiguity concerning the altitude of the echoes on composite images. Sometimes, a radar beam is swept between the horizon and the zenith at a constant azimuth. In this case, data are plotted in a ''range-height indicator,'' or RHI, display, which allows the meteorologist to view a vertical cross section through a storm.

Radar Applications
Storm Tracking. Images and animations of the radar reflectivity factor allow meteorologists to observe storm development, determine whether storms are intensifying or weakening, and determine their speed and direction. Storm tracking allows forecasters to estimate the time of storm passage several hours in advance. Meteorologists animate images from single radars to determine the movement, intensity change, and dimensions of individual storms and storm complexes. Composite images from radar networks, combined with other meteorological data, permit meteorologists to determine how precipitation regions organize within their parent weather systems, which can extend over thousands of kilometers. Figure 8 shows a series of images that are composites of radar reflectivity data from radars located in the midwestern United States during a tornado outbreak on 19 April 1996. A line of storms can be seen developing near the western Illinois border and moving eastward into central Illinois. Thunderstorms appear on these radar images as isolated regions of high reflectivity or as small cores of high reflectivity embedded in widespread precipitation.

Severe Thunderstorm Detection. Severe thunderstorms can produce large hail, damaging straight-line winds, and
Figure 9. Radar images of (a) the radar reflectivity factor (dBZ) and (b) the storm-relative radial wind velocity (knots) for a supercell thunderstorm that produced a tornado near Jacksonville, IL on 19 April 1996. On the radial velocity display, red (green) colors represent air motion away from (toward) the radar. White lines are county boundaries. See color insert.
tornadoes. Radars, particularly those that have Doppler capability, allow meteorologists to issue timely warnings when severe thunderstorms threaten. Figures 9a and b are high-resolution images of the reflectivity and storm-relative radial velocity fields from a tornadic thunderstorm in the 19 April 1996 outbreak (see arrow in Fig. 8c) and illustrate the signature features of tornadic thunderstorms. Rotation on Fig. 9b appears as a tight couplet of adjacent strong inbound (green) and outbound (red) radial motions. Severe thunderstorms often develop a hook-like appendage in the reflectivity field (Fig. 9a). Tornadoes typically develop near the center of the hook. Animations of the reflectivity and radial velocity images, used to track hook positions and radial velocity couplets, allow meteorologists to estimate the arrival time of dangerous conditions and issue specific warnings. A second indicator of a storm's severity and potential destructiveness is the reflectivity intensity. Values of reflectivity that exceed 55 dBZ are often associated with hail. Large hail that can reach diameters of 1–6 cm appears on radar images as reflectivity approaching 70 dBZ.

Precipitation Measurement and Hydrology. The radar reflectivity factor Z is only a general indicator of precipitation intensity. An exact relationship between Z and the precipitation rate R does not exist. Research has shown that Z and R are approximately related by Eq. (6):

Z = aR^b,  (6)

where the coefficient a and the exponent b take on different values that depend on the precipitation type. For example, in widespread rain, a is about 200 and b is 1.6 if R is measured in mm/h and Z is in mm^6/m^3. In general, radar estimates of the short-term precipitation rate across an area can deviate by more than a factor of 2 from surface rain gauge measurements made at a point. These differences are due to uncertainties in the values of a and b, radar calibration uncertainties, sampling uncertainties, and other sources of error. Some of these errors are random, so radar estimates of total accumulated rainfall over larger areas and longer times tend to be more accurate. Figure 10, the radar-estimated precipitation during the landfall of Hurricane Georges along the Gulf Coast of Louisiana, Alabama, and Florida in 1998, illustrates the high resolution with which rainfall patterns can be determined from radar imagery. Precipitation estimates

Figure 10. Total precipitation (radar-estimated rainfall, in inches) through 6 A.M. local time on 29 September 1998, as measured by the National Weather Service radar at Mobile, Alabama, during the landfall of Hurricane Georges.
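The Z–R relation of Eq. (6) can be inverted to turn a measured reflectivity factor into an approximate rain rate. The sketch below assumes the widespread-rain coefficients quoted in the text (a ≈ 200, b = 1.6); the function names are illustrative, and operational systems apply many corrections (calibration, hail contamination, beam geometry) beyond this simple conversion.

```python
import math

def dbz_to_z(dbz):
    """Convert logarithmic reflectivity (dBZ) to linear Z in mm^6/m^3."""
    return 10.0 ** (dbz / 10.0)

def rain_rate(dbz, a=200.0, b=1.6):
    """Estimate rain rate R (mm/h) by inverting Z = a * R**b."""
    z = dbz_to_z(dbz)
    return (z / a) ** (1.0 / b)

# A reflectivity of 40 dBZ corresponds to moderate-to-heavy rain:
print(round(rain_rate(40.0), 1))   # about 11.5 mm/h
```

Because Z spans many orders of magnitude, radar displays quote it logarithmically in dBZ; the conversion above undoes that before inverting Eq. (6).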
IMAGING SCIENCE IN METEOROLOGY
such as those in Fig. 10 allow hydrologists to estimate the total precipitation across a watershed, which can be used to determine stream runoff. For this reason, radar is an important tool for issuing flash flood warnings. Aviation Meteorology. Thunderstorms, hail, lightning, and strong winds pose a direct threat to aviation. In particular, a phenomenon called wind shear, a sudden sharp change in wind speed or direction along the flight path of an aircraft, can pose an extreme hazard for an aircraft that is approaching or departing an airport. Wind shear occurs in downbursts, localized intense downdrafts that induce an outburst of potentially damaging winds on or near the ground. Wind shear also occurs along gust fronts, the leading edge of fast moving rain-cooled air that rushes outward from precipitating areas of thunderstorms. Radar imagery is routinely used to identify downbursts and track the position of gust fronts. Figure 11 shows an example of gust fronts located southwest of three severe thunderstorms in South Dakota on 27 July 1999. The leading edge of the strong winds appears as thin lines of weak reflectivity in the radar reflectivity field. The thin lines are associated with cloud lines that form at the leading edge of the advancing cool air. Gust fronts can also appear as sharp gradients in the radial velocity field. Doppler radars are now installed near larger airports to provide warnings of wind shear. Storm Structure Studies. Doppler radar has made it possible to examine the kinematic and thermodynamic structure of storms in fine detail. When two or more Doppler radars simultaneously observe a storm from different viewing angles, the wind fields within the storm can be retrieved at a spatial resolution as high as 200–500 m. Using appropriate processing techniques, the entire wind field, including updrafts and downdrafts,
Figure 11. Composite radar images of the radar reflectivity factor during a thunderstorm outbreak in South Dakota on 27 July 1999. The thin lines of higher reflectivity indicate gust fronts, the leading edges of cool air that rush outward from thunderstorms.
can be determined. Images of the wind and reflectivity fields help meteorologists understand 3-D flow patterns within storms and address questions about storm structure and evolution. For example, Fig. 12a shows an infrared satellite image of a storm system over the central United States. Radar data superimposed on the satellite image show a thunderstorm over northeast Kansas. Figure 12b shows a vertical cross section of the radar reflectivity and winds across this thunderstorm derived from measurements from two Doppler radars located near the storm (Fig. 12a). The forward speed of the storm has been subtracted from the winds to illustrate the circulations better within the storm. The vertical scale on Fig. 12b is stretched to illustrate the storm structure better. A 15-km wide updraft appears in the center of the storm, where central updraft speeds approach 5 m s−1 . The updraft coincides with the heaviest rainfall, indicated by the high reflectivity region in the center of the storm. Techniques have also been developed to estimate pressure and temperature fields within storms from the wind fields, once they are deduced. Since the mid-1970s, atmospheric scientists have deployed networks of Doppler radars to investigate the three-dimensional structure of thunderstorms, hurricanes, fronts, and many other weather phenomena. Precipitation Physics. Precipitation forms by two processes. The first involves the growth (and possible melting) of ice particles, and the second involves the collision and coalescence of cloud droplets. Radar provides one of the best methods for detecting precipitation particles in clouds because of the sixth-power relationship between particle diameter and reflectivity (Eq. 4). Radar measurements, combined with aircraft samples and other measurements of cloud properties, have been used to study the relative importance of these two processes in different cloud types. Radar can identify regions of new precipitation formation in clouds. 
The subsequent growth of precipitation particles can be inferred from images and animations of the reflectivity, as precipitation falls to the earth's surface. For example, in cold stratiform clouds, streams of ice particles are often detected forming near a cloud top and falling into the lower part of the cloud. The reflectivity increases as the streams of particles descend, allowing meteorologists to estimate the rate of growth of the ice particles. Radar also provides information about the fall speed of precipitation particles. The fall speed is related to the size and shape of particles and to the motion of the air in which the precipitation is embedded. In stratiform clouds, which often have vertical motions of a few cm s−1 or less, the distribution of fall speeds measured by a vertically pointing Doppler radar can be related to the particle size distribution. A prominent feature on radar images from widespread clouds is a "bright band" of high reflectivity at the altitude of the melting level. When snowflakes melt, their reflectivity increases because the dielectric factor |K|^2 of water exceeds that of ice (Eq. 2). Snowflakes become more compact and finally collapse into raindrops as they continue to melt and descend through the cloud. The drops
Figure 12. (a) Infrared satellite image of a storm system over the Central Plains of the United States on 14 February 1992. Composite radar data are superimposed on the image. The location of the two Doppler radars used to create the image in panel B and the location of the cross section in panel B are shown on the image. (b) Vertical cross section of the radar reflectivity factor and storm-relative wind in the plane of the cross section through the thunderstorm in northeast Kansas that appears on panel A.
have a smaller radar cross section and fall faster than snowflakes, leading to reduced reflectivity below the melting layer. Wind Measurements. Some radars are designed to operate at longer wavelengths, so that they are sensitive
to turbulent irregularities in the radio refractive index associated with temperature and humidity variations on a scale of half the radar’s wavelength. These radars, called wind profilers, measure a vertical profile of the wind fields above the radar. Doppler frequency measurements enable estimating the drift velocities of the scattering elements,
from which wind velocity can be obtained. Wind profilers operate best in precipitation-free conditions. These radars can measure a vertical profile of the wind speed and direction above the profiler site at time intervals as short as 5 minutes. Shorter wavelength radars used for storm detection can also generate vertical wind profiles. The technique involves analyzing the radial (along-beam) wind component on a circle at a fixed distance from the radar. More distant circles correspond to higher elevations. Echoes must cover a sufficient area around the radar to make these measurements. For this reason, the technique works best when precipitation is present. Figure 13 shows an image that depicts hourly vertical wind profiles measured by a profiler in Fairbury, Nebraska, on 20 January 2000 during a cold air outbreak. The winds at the site were from the northwest and increased with height to the level of the jetstream, which was located 8 km above the surface (red barbs). A shift in the low-level winds near the earth's surface from northwesterly to southwesterly marked the passage of the center of the high pressure system associated with the cold air. Together, wind profilers and shorter wavelength radars provide wind measurements aloft in most atmospheric conditions.

Figure 13. Hourly profiles of winds as a function of height above a wind profiler on 20 January 2000. The wind profiler is located at Fairbury, Nebraska. Short barbs, long barbs, and flags on the staff denote winds of 5, 10, and 50 knots, respectively. The orientation of the staff denotes the wind direction. For example, a vertical staff that has barbs on top denotes wind from the north, and a horizontal staff that has barbs on the left denotes a wind from the west. Most winds on this figure are from the northwest. Color permits easier visualization of the wind speed (blue: ≤20 knots; green: 20–50 knots; yellow: 50–60 knots; orange: 60–70 knots; red: >70 knots). See color insert.

ADVANCED USES OF IMAGERY
Advances in computing capabilities now allow meteorologists to superimpose meteorological data analyses, obtained from either measurements or numerical simulations, on radar and satellite imagery. Animations of images with superimposed data permit meteorologists to associate cloud and precipitation patterns better with fronts, jetstreams, or other meteorological phenomena that are responsible for creating the patterns. Figures 14a–d, for example, show a four-hour sequence of infrared satellite images taken early on the same day that the thunderstorm illustrated in Fig. 12b developed. Superimposed on Fig. 14 are radar data, terrain contours, and analyses of the surface wind and relative humidity based on measurements taken at weather stations. The earliest evidence of the developing thunderstorm in Fig. 12b actually appeared 7 hours before the time of Fig. 12 when, at 8 A.M. local time (panel a of Fig. 14), a small bow-like cloud band first developed over southwest Kansas (see arrow). The cloud band developed south of a low pressure center, which in Fig. 12a is located at the center of the counterclockwise wind circulation. A region of very dry air appears in the relative humidity field, extending southeastward from the east slope of the Rockies and then northeastward around the low pressure center. The dry air, which had descended the east slope of the Rocky Mountains, was associated with strong winds. The cloud band that eventually developed into the thunderstorm shown in Fig. 12 was located at the leading edge of this advancing dry air mass (Fig. 14a–d). This simple illustration shows how meteorologists use images and superimposed data to interpret dynamic and physical processes in the atmosphere and associate them with clouds and storm systems. Atmospheric scientists use imaging techniques to display other types of meteorological data in addition to satellite and radar data.
For example, lidars, devices that transmit laser light and measure light backscattered by aerosol and cloud particles, can be used to determine atmospheric circulations impossible to detect by satellite and radar. For example, Fig. 15 shows a high-resolution image from the University of Wisconsin’s Volume Imaging Lidar, a scanning lidar that was located in Sheboygan, Wisconsin, approximately 20 meters from the west shore of Lake Michigan in winter. The image shows patterns produced by very light snow (not visible to the eye) falling into the convective atmospheric surface layer over the lake. The patterns result from combinations of patterns in the snowfall and the subsequent organization caused by surface layer convection. Animated time sequences of images such as Fig. 15 allow researchers to observe wind circulations as they track the movements of particles and aerosols. Animations of vertical cross sections, produced by scanning the lidar between the horizon and zenith, show convection and turbulent circulations. Forecasting today begins with predictions generated by computer models. These models, systems of mathematical equations that describe the behavior of the atmosphere, are based primarily on Newton’s laws of motion and conservation of mass and energy. Additional equations
describe the behavior of ideal gases; heat and moisture transfer from the earth's surface to the atmosphere; the phase changes of water between vapor, liquid, and ice; and solar and terrestrial radiation transfer within the atmosphere. When the equations are packaged together in a computer code, they are called a "numerical model" of the atmosphere.

Figure 14. Surface relative humidity (%, white lines) and wind fields (white vectors) overlaid on infrared satellite images and radar echoes at 8, 9, 10, and 11 A.M. Central Standard Time, 14 February 1992, for a region of the Central Plains of the United States centered on Kansas. The black lines are elevation contours; the outer contour is at 1,400 m, and the contour interval is 600 m. The arrows on each panel point to the small cloud band discussed in the text. This small cloud band developed into the thunderstorm illustrated in Figs. 12a and 12b.

Numerical models have been developed to study a broad range of atmospheric processes, from long-term climate change to the growth and evolution of individual clouds. The models are typically formulated on a three-dimensional grid whose resolution depends on the phenomena to be simulated. Numerical models generate huge volumes of data. For example, a model that produces a 48-hour weather forecast might generate a trillion pieces of data. The ability to employ imaging and visualization techniques to analyze these data has been a major development in the atmospheric sciences. For example, Fig. 16 shows a three-dimensional rendering of the thunderstorms associated with the 19 April 1996 tornado outbreak (Figs. 8 and 9). The data used to create this image are from a numerical simulation. The "thunderstorms" in this image are actually isosurfaces of the rainwater concentration calculated by the model. The coloring denotes the surface water vapor field, and the arrows show the surface wind. By examining the relationship of the thunderstorms to
Figure 15. Lidar image of very light snow (invisible to the eye) falling into the convective atmospheric surface layer over Lake Michigan, taken by the University of Wisconsin Volume Imaging Lidar at 15:19:13 UT on 20 December 1997. The data have a range resolution of 15 m and an azimuthal resolution of 25 m at 18 km from the lidar. The lidar beam was 5 m above the lake surface (courtesy of the University of Wisconsin-Madison Lidar Group and Dr. Ed Eloranta).
various other quantities calculated by the model, the cause of the thunderstorms and the reasons for their severity can be investigated.

Figure 16. Numerical simulation of tornadic thunderstorms from the 19 April 1996 tornado outbreak (see Figs. 8 and 9). The viewpoint is from an altitude well above the storms, looking north. The image shows surface wind vectors, rainwater (the isosurface where the rainwater concentration = 0.5 grams of liquid water/kilogram of dry air), and ground-level water vapor content (colored and contoured every 1 gram of water vapor/kilogram of dry air). Green (orange) shading represents moist (dry) air (courtesy of Dr. Brian Jewett, University of Illinois at Urbana/Champaign). See color insert.

Three-dimensional rendering and animation of numerical model data, a relatively new tool in meteorological research and forecasting, are contributing to both better understanding and prediction of atmospheric processes.

FUTURE DIRECTIONS

New instrumentation, analytic techniques, and data visualization methods continue to revolutionize the field of meteorology. In recent years, the advance in computer processing power has been matched by our ability to collect information from satellites. Satellite data collection at ever better spatial, temporal, and spectral resolution, along with growing computing power, will continue into the future, yielding more information and better analytic techniques for meteorology. This anticipated growth of information must be matched by our ability to assimilate the data into an operational forecasting environment. Operational satellite imaging technologies must first pass an experimental phase, and several new concepts in experimental satellites have recently emerged. For example, the Tropical Rainfall Measuring Mission (TRMM), launched in November 1997, carries the first precipitation radar on an earth-orbiting satellite platform. For the first time, the three-dimensional distribution of precipitation particles can be studied from space. The data from the precipitation radar are already finding operational demand, and the images constructed from the
data are giving meteorologists new insight into tropical precipitation processes. Placing active instruments in space has been hampered by high costs and bulky technologies. However, these problems are being resolved, and the future of space-based active remote sensing looks promising. During the next several years, the National Aeronautics and Space Administration (NASA) and other space agencies around the world will be launching several active meteorological instruments, including cloud radar and lidar instruments. These are not imagers; rather, they collect a vertical cross section of data along the orbital path, thus giving us a 2-D vertical cross-sectional view of the underlying meteorology, as opposed to a 2-D horizontal view from imagers. Multiangle viewing instruments are also generating new ways of gathering meteorological information from space. Traditionally, satellite instruments have been single viewing, or in the case of many scanners, have tended to scan across the orbital track, so that different view angles correspond to different scenes. These instruments use spectral signatures to sense scene properties remotely. Multiangle viewing instruments are designed to view the same scene from different view angles, typically within several minutes of one another. These instruments take advantage of the angular anisotropy of the upwelling radiation field to sense scene properties remotely; thus, they combine both spectral and angular signature techniques for remote sensing. Three multiangle viewing instruments have been placed in orbit thus far: the Along Track Scanning Radiometer (ATSR), the Polarization and Directionality of the Earth's Reflectance (POLDER), and the Multiangle Imaging SpectroRadiometer (MISR). In addition to angular signatures, instruments such as MISR offer stereoscopic capabilities that allow fusing images from different view angles to give depth information about the scene. This essentially allows us to view the world in 3D.
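The stereoscopic capability described above rests on simple parallax geometry: a feature at height h is displaced on the ground by h·tan θ for a camera viewing at angle θ from nadir, so two views on opposite sides of nadir separate its apparent positions by h(tan θ1 + tan θ2). The sketch below is only this flat-earth trigonometry with invented numbers, not the actual MISR retrieval, which must also handle earth curvature, image registration, and feature motion.

```python
import math

def height_from_parallax(parallax_m, theta_fwd_deg, theta_aft_deg):
    """Estimate feature height (m) from the along-track parallax (m)
    between a forward- and an aft-viewing camera whose view angles are
    measured from nadir on opposite sides of the track.
    Simplified flat-earth geometry; illustrative only."""
    t1 = math.tan(math.radians(theta_fwd_deg))
    t2 = math.tan(math.radians(theta_aft_deg))
    return parallax_m / (t1 + t2)

# A cloud feature displaced 12,000 m between symmetric 45-degree views
# lies about 6,000 m above the surface:
print(height_from_parallax(12000.0, 45.0, 45.0))
```

The same relation explains why steeply oblique camera pairs give the most sensitive height estimates: the larger the tangents, the larger the parallax produced by a given feature height.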
For example, Fig. 17 shows a MISR stereoanaglyph image of Hurricane Alberto taken on 19 August 2000. The three-dimensional structure of the hurricane stands out when the image is viewed through red-blue 3-D glasses. New visualization tools are being developed to take full advantage of these new 3-D images from space.

Figure 17. MISR stereoanaglyph image of Hurricane Alberto taken on 19 August 2000. At this time, Alberto was located about 1,700 km west of the Azores. Viewing this stereoanaglyph image with red/blue 3-D glasses (red lens over the left eye) clearly shows the three-dimensional structure of this storm. The hurricane's eye is about 60 km in diameter. The thunderstorms in the hurricane's spiral arms and the funnel-shaped eye wall are apparent when viewed in 3D (courtesy of the National Aeronautics and Space Administration, Goddard Space Flight Center and Jet Propulsion Laboratory MISR Science Team). See color insert.

No one satellite instrument can gather all the information needed from space. A well-coordinated suite of instruments and platforms is required to monitor the earth's weather and climate systems properly. This need has recently been reinforced by NASA's introduction of the Earth Observing System (EOS). EOS is a long-term coordinated effort through NASA's Earth Science Enterprise program. It is designed to provide a suite of earth-orbiting satellites for long-term monitoring of the earth's land surface, biosphere, solid earth, ocean, and atmosphere. By providing synergy among instruments onboard a satellite and across satellite platforms, EOS will give us a more complete picture of our natural environment. Similar advances are occurring in radar meteorology. Current operational meteorological radars are designed to radiate and receive electromagnetic waves that have fixed polarization, a single, fixed orientation of the electric field. Special research radars called polarization diversity radars allow varying the polarization state of the transmitted and/or received signal. Polarization measurements are sensitive to particle orientation. As large raindrops fall, they tend to flatten into an oval shape. Hail tumbles as it falls and generally has no preferred orientation. Ice crystals may or may not have a preferred orientation, depending on their shape. Polarization measurements can discriminate hail from rain and identify the predominant types of ice particles in clouds. Researchers are also exploring methods for using polarization techniques to estimate rainfall better. Polarization diversity measurements are expected to be incorporated into operational Doppler radars within the first decade of the new century. Bistatic radar systems have one transmitting antenna but several receivers located 10–50 km away from the radar. Researchers are currently examining the potential of these systems to retrieve high-resolution wind fields in storms. Bistatic radars measure microwave energy scattered toward each receiver. The additional receivers measure the pulse-to-pulse phase change associated with the Doppler shift, from which they determine the wind component toward or away from the receiver. Because the receivers view a storm from different directions, all wind components are measured, making it possible to retrieve wind fields within a storm in a manner similar to that currently done with two or more Doppler radars. Future networks of bistatic radars may make it possible to image detailed wind fields within storms in near-real time, giving forecasters a powerful tool to determine storm structure and severity.
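The multiple-receiver wind retrieval can be illustrated with the same algebra used in dual-Doppler analysis: each viewing direction measures the projection of the horizontal wind onto that direction, and two independent projections determine the wind vector. A minimal sketch under that simplification (the function and numbers are ours; real bistatic processing, which measures components along transmitter–receiver bisectors, is considerably more involved):

```python
import math

def retrieve_wind(vr1, az1_deg, vr2, az2_deg):
    """Recover the horizontal wind (u eastward, v northward) from two
    radial-velocity measurements along different azimuths (degrees
    clockwise from north). Each satisfies vr = u*sin(az) + v*cos(az)."""
    s1, c1 = math.sin(math.radians(az1_deg)), math.cos(math.radians(az1_deg))
    s2, c2 = math.sin(math.radians(az2_deg)), math.cos(math.radians(az2_deg))
    det = s1 * c2 - s2 * c1   # singular when the two views are (anti)parallel
    if abs(det) < 1e-6:
        raise ValueError("viewing directions too close to colinear")
    u = (vr1 * c2 - vr2 * c1) / det
    v = (s1 * vr2 - s2 * vr1) / det
    return u, v

# A purely eastward wind (u = 10 m/s, v = 0) measured along
# azimuths 90 deg and 0 deg is recovered exactly:
u, v = retrieve_wind(10.0, 90.0, 0.0, 0.0)
print(round(u, 6), round(v, 6))   # 10.0 0.0
```

The determinant test captures the geometric requirement stated in the text: the receivers must view the storm from sufficiently different directions, or the two projections carry no independent information.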
New data visualization methods continue to be developed to image, animate, and manipulate meteorological data. Techniques now exist to render surfaces in three dimensions (e.g. Fig. 16); superimpose data and air trajectories on those surfaces; and animate, rotate, and view the displayed data from various perspectives. These new computational capabilities and new data sets are improving meteorologists’ understanding of weather and storms and are leading to better forecasts and warnings of severe weather for the public.
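The surface rendering described above (e.g., the 0.5 g/kg rainwater isosurface of Fig. 16) begins with a simple operation: selecting the model grid cells where a field crosses a threshold; rendering software then triangulates the boundary of that cell set. A toy sketch of just the selection step, using an invented 2 × 2 × 2 "model" field rather than real output:

```python
def storm_cells(field, threshold=0.5):
    """Return the (i, j, k) indices of grid cells whose rainwater
    concentration (g/kg) meets or exceeds the isosurface threshold."""
    cells = []
    for i, plane in enumerate(field):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if value >= threshold:
                    cells.append((i, j, k))
    return cells

# Invented 2x2x2 "model output"; only one cell exceeds the 0.5 g/kg
# threshold used for the rainwater isosurface in Fig. 16:
toy = [[[0.0, 0.1], [0.2, 0.7]],
       [[0.0, 0.3], [0.4, 0.2]]]
print(storm_cells(toy))   # [(0, 1, 1)]
```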
ABBREVIATIONS AND ACRONYMS

ATS applications technology satellite
ATSR along track scanning radiometer
AVHRR advanced very high resolution radiometer
EOS earth observing system
GMS geostationary meteorological satellite
GOES geostationary operational environmental satellite
METEOSAT meteorological satellite
MISR multiangle imaging spectroradiometer
NASA national aeronautics and space administration
NOAA national oceanic and atmospheric administration
POLDER polarization and directionality of the earth's reflectance
PPI plan position indicator
RHI range height indicator
TIROS television and infrared observational satellite
BIBLIOGRAPHY

1. D. Atlas, ed., Radar in Meteorology, American Meteorological Society, 1990.
2. M. J. Bader et al., Images in Weather Forecasting: A Practical Guide for Interpreting Satellite and Radar Imagery, Cambridge University Press, Cambridge, U.K., 1995.
3. R. J. Doviak and D. S. Zrnić, Doppler Radar and Weather Observations, 2nd ed., Academic Press, San Diego, CA, 1993.
4. S. Q. Kidder and T. H. Vonder Haar, Satellite Meteorology: An Introduction, Academic Press, San Diego, CA, 1995.

IMAGING SCIENCE IN OVERHEAD SURVEILLANCE
ROBERT D. FIETE
Eastman Kodak Company
Rochester, NY

Figure 1. Gaspard Felix Tournachon, also known as Nadar, collected photographs from a hot-air balloon over Paris in 1859.
BRIEF HISTORY Overhead surveillance has come a long way since the days when Gaspard Felix Tournachon, also known as "Nadar," collected the first known aerial photographs from a hot-air balloon 1,200 feet (370 m) over Paris in 1858 (Fig. 1). The first successful aerial photo taken in the United States and the earliest existing aerial photograph was
taken on October 13, 1860, by James Wallace Black from a balloon above Boston (Fig. 2). Thaddeus Lowe urged the creation of the U.S. Army Balloon Corps in 1862 to survey the Confederate positions during the U.S. Civil War. Unfortunately, the balloons made easy targets for the Confederate soldiers to shoot down, so the unit was deactivated in 1863. In 1906, George Lawrence used a camera carried aloft by seventeen kites to capture an aerial view of San Francisco, California, several weeks after the devastating earthquake (Fig. 3). In 1907, Alfred Maul patented a gyroscopically stabilized camera for taking overhead pictures (1) using rockets that could reach an altitude of 2,600 feet (700 m) (Fig. 4). Using breast-mounted cameras for pigeons, patented by Julius Neubronner in 1903, the Bavarian Pigeon Corps captured overhead images of Germany in 1909 by using a timing mechanism to take a picture every 30 seconds during the pigeon's flight (Fig. 5). The invention of the airplane in 1903 made overhead surveillance more practical than kites, pigeons, or hot-air balloons. Wilbur Wright's passenger, L. P. Bonvillain, took the first photographs from an airplane in 1908 near Le Mans, France. In 1909, Wilbur Wright took photographs from an airplane using a motion picture camera over Centocelle, Italy (Fig. 6). France and Germany were quick to develop aerial reconnaissance cameras and photointerpretation methods during World War I. Cameras were eventually mounted on the side of the biplane to relieve the photographer of holding the bulky
camera and to acquire consistent vertical views (Fig. 7). These images provided invaluable information on troop positions and a visual map of the vast network of trenches through the battlefields (Fig. 8).

Figure 2. Earliest existing aerial photograph, taken in 1860 by J. Wallace Black from a balloon above Boston.

Figure 3. George Lawrence used seventeen kites to capture an aerial view of San Francisco after the devastating earthquake in 1906.

The first nighttime aerial photograph was taken by George W. Goddard in 1925 when he ignited an 80-pound flash powder bomb over Rochester, New York. The science of photoreconnaissance and photointerpretation advanced dramatically during World War II due to the development of color infrared film, radar systems, and the collection of stereo images. Harold Edgerton developed a powerful flash system that was used for collecting
nighttime aerial photographs of Normandy before the D-Day invasion. Innovative camera designs were developed to operate at altitudes above 20,000 feet (6 km), well out of range of enemy guns. These high altitude images were used primarily for strategic purposes, such as locating and identifying production facilities. Figure 9 shows an aerial photograph taken in 1944 over Peenemunde that captured an image of the V-2 rocket being developed by the Germans. Unfortunately, tactical information still required higher resolution images, which required pilots to fly high-speed, low-altitude missions, generally putting their lives at great risk (2).
Figure 4. Overhead image from Alfred Maul’s rocket.
As the cold war between the United States and the Soviet Union intensified in the 1950s, the need for high-altitude reconnaissance was answered by the development of the U-2 at Lockheed’s ‘‘Skunk Works’’
under the leadership of C. L. "Kelly" Johnson. The U-2 (Fig. 10) had a wingspan of 80 feet (24 meters) and could fly at an altitude of 80,000 feet (24 km), well out of reach of Soviet missiles at the time. It had a range of 3,000 miles (4,800 km) and a maximum speed of more than 500 mph (220 m/s). The U-2 B camera could take images at 2.5 feet (0.8 m) resolution from an altitude of 65,000 feet (20 km). The first U-2 mission was flown in 1955, and modified versions are still in operation 45 years later. The images collected by the U-2 provided the U.S. military with information that could not have been provided by any other method. This was most dramatically demonstrated during the Cuban missile crisis in 1962. Although intelligence reports suspected that the Soviet Union intended to deploy missiles in Cuba, Soviet officials denied this. On 14 October, two U-2 aircraft flew over Cuba, and the photographs provided undeniable proof that the Soviets were constructing bases for intermediate-range missiles. Figure 11 shows an aerial image of a missile launch site in Cuba during the crisis. On 28 October, Soviet Premier Khrushchev agreed to remove the missiles. Several years after the U-2 became operational, advances in Soviet radar capabilities allowed them to track the U-2 reliably. Although the Soviets did not
Figure 5. Carrier pigeons were used to collect overhead images of Germany in 1909.
Figure 6. Photograph taken from an airplane by Wilbur Wright over Centocelle, Italy in 1909.
Figure 7. Reconnaissance camera attached to the side of a plane during World War I.
Figure 8. Aerial image taken of Fort Douaumont near Verdun, France during World War I.
possess a reliable means of shooting down the U-2 aircraft, the tracking data could be used to protest the overflights. The SR-71 (Fig. 12), also known as the "Blackbird," was developed from the Lockheed A-12 and YF-12A aircraft as an advanced reconnaissance aircraft to replace the U-2. Like the U-2, it was developed and built
at Lockheed's "Skunk Works." The SR-71 aircraft could fly at more than 2,200 mph (980 m/s), could reach cruising altitudes of more than 85,000 ft (26 km), and could fly more than 2,000 miles (3,220 km) without refueling. The first operational SR-71 was flown in 1964, and the aircraft continued in operation until the U.S. Air Force retired them in 1990. Several SR-71s are still in service today, but only for research purposes. Although more than 1,000 antiaircraft missiles were fired at the SR-71s during their lifetime, no SR-71 was ever hit. On 6 March 1990, an SR-71 flew from Los Angeles, California, to Washington, D.C., in 1 hour and 4 minutes. SR-71s are still the world's fastest and highest flying production aircraft.

Figure 9. Aerial image of a German V-2 rocket (circled) at Peenemunde during World War II.

Figure 10. U-2 in flight (courtesy of USAF).

After Gary Powers's U-2 was shot down in 1960, the United States pledged that it would cease overflights of the Soviet Union. This pledge made the SR-71, which was then under development, useless for collecting overhead surveillance of the Soviet Union. Another method for collecting reconnaissance images of the Soviet Union was needed. In 1945, the Air Force commissioned the RAND Corporation to investigate the feasibility of reconnaissance
Figure 11. Aerial image of a missile launch site in Cuba on 23 October, 1962 (courtesy of USAF).
Figure 12. SR-71 in flight (courtesy of USAF).
from space. Their recommendation was for a television-type system that had severe limitations in spatial resolution. A new program was started that was based on collecting images on film and then returning the film to earth for processing. This program became the CORONA program (3,4) and was classified until 1995. The program was masked as a scientific mission called DISCOVERER. The CORONA camera acquired images on film, which was spooled into a capsule and returned to earth for processing. The first successful launch and retrieval of the film capsule finally occurred with DISCOVERER XIII on 10 August, 1960, but the mission was a diagnostic flight and carried diagnostic equipment rather than a camera and film.
The recovery of the film capsule was historic in that it was the first man-made object recovered from space. The next mission, DISCOVERER XIV, launched on 18 August, 1960, achieved an orbit whose perigee was 190 km and apogee was 810 km and returned the first overhead reconnaissance images from space. The first CORONA image taken of the Soviet airfield at Mys Shmidta on 18 August, 1960, is shown in Fig. 13. The ground resolution was approximately 10 meters and the image quality was poor, but the images provided significant intelligence information on missile sites and airfields. The early CORONA missions provided evidence that the Soviet Union had fewer intercontinental ballistic missiles than it had claimed, thus disproving the ‘‘missile gap.’’ With improvements in the film, optics, and the reduction of system vibrations, the final CORONA system design, the KH-4B, could achieve ground resolution as good as 2 meters and had an orbital life of 18 days. Figure 14 shows a CORONA image of the Pentagon on 25 September, 1967 from the first KH-4B system. The CORONA program was discontinued in 1972. The first image of the earth from space was transmitted from Explorer VI, launched on 7 August, 1959. Figure 15 shows an image of the northern Pacific Ocean taken from an altitude of 27,000 km by Explorer VI on 14 August, 1959. Although crude, it demonstrated the capability of photographing the earth’s cloud cover using a television camera in space. NASA developed the Television Infrared Observation Satellite program (TIROS) as the world’s first meteorological satellite. TIROS-1 was launched on 1 April, 1960 and carried two television cameras. Although it collected data for only 78 days, it
Figure 13. First CORONA image taken of the Soviet airfield at Mys Shmidta on 18 August, 1960.
Figure 15. First image of the earth taken from space, by Explorer VI on 14 August, 1959 (courtesy of NASA).
Figure 14. CORONA image taken of the Pentagon on 25 September, 1967.
successfully demonstrated the utility of meteorological satellites. The first television picture taken from space by TIROS-1 is shown in Fig. 16. The next generation of meteorological satellites consisted of the NIMBUS satellites, first launched on 28 August, 1964. These satellites were placed in sun-synchronous orbits and carried more sophisticated television cameras as well as infrared systems that allowed capturing images at night. The Geostationary Operational Environmental Satellite (GOES) program, first launched in 1975, operates two geostationary weather satellites for the National Oceanic and Atmospheric Administration (NOAA). The two satellites give continuous coverage of both the Western and Eastern Hemispheres and track large storms, such as hurricanes. Figure 17 shows a GOES satellite image of hurricane Diana on 11 September, 1984.
Figure 16. First TIROS-1 image and the first television picture taken from space on 1 April, 1960 (courtesy of NASA).
Due to the success of the weather satellites, NASA developed satellites to collect scientific data on the earth’s resources. The first Landsat satellite, originally known as the Earth Resources Technology Satellite (ERTS), was
Figure 17. GOES satellite image of hurricane Diana, 11 September, 1984 (courtesy of NOAA).
launched on 23 July, 1972 and collected digital image data that was downlinked to several ground stations. Landsat 1 had a return beam vidicon (RBV) and a multispectral scanner (MSS) that collected images at a resolution of 80 meters (5). Landsat 4, launched on 16 July, 1982, replaced the RBV with the Thematic Mapper (TM), which collected multispectral data at a resolution of 30 meters. Landsat 7, launched on 15 April, 1999, added
the capability to collect panchromatic (black-and-white) images at a 15-meter resolution. Landsat images can reveal the extent of natural disasters, such as the 1993 St. Louis flooding captured by Landsat 5 TM images from an altitude of 705 km (Fig. 18). The French Space Agency launched its first Systeme Probatoire d’Observation de la Terre (SPOT) satellite on 22 February, 1986. The sensors on the SPOT satellites can collect multispectral images
Figure 19. One-meter resolution IKONOS image of the San Diego Convention Center, taken on 4 April, 2000 (courtesy of Space Imaging).
Figure 18. Landsat 5 TM images of St. Louis, Missouri, show the extent of the 1993 flood (courtesy of EOSAT).
at a 20-meter resolution and panchromatic images at a 10-meter resolution. Presidential Decision Directive 23 and the Land Remote Sensing Policy Act of 1992 liberalized licensing of commercial remote sensing systems in the United States. Space Imaging’s IKONOS satellite, launched on 24 September, 1999, was the first commercial satellite to capture imagery at a resolution of 1 meter. IKONOS has a panchromatic sensor that can capture black-and-white images at a resolution of 1 meter and a multispectral sensor that can capture color images at a resolution of 4 meters from an altitude of 680 km. The multispectral image can be combined with the panchromatic image to generate a 1-meter resolution color image (Fig. 19). The image quality and collection capabilities of satellite surveillance systems continue to improve dramatically. Figure 20 shows a comparison of a CORONA image of the Washington Monument collected in 1967 and an IKONOS image collected in 1999. Note that the scaffolding on the monument is clearly visible in the IKONOS image. The image quality of the commercial image in 1999 is far superior to the image quality of the then classified image captured 32 years earlier. Today, an increasing amount of overhead surveillance imagery is collected by systems that do not require pilots. Satellites are used to collect data high above the earth, and unmanned aerial vehicles (UAVs) are being used to collect information at lower altitudes. Overhead
Figure 20. Comparison of CORONA (25 September, 1967) and IKONOS (30 September, 1999) images of the Washington Monument.
surveillance systems continue to provide environmental and meteorological data to the scientific community as well as important intelligence data from areas that are difficult to access. Even human atrocities cannot escape the images captured by overhead surveillance cameras. Aerial images collected in 1944 showed the existence of Nazi concentration camps (Fig. 21) in countries controlled by Germany. Satellite images showed the extent of the oil fires in Kuwait set by retreating Iraqi soldiers in 1991 (Fig. 22). In early 1999, Kosovo refugees described mass killings that were being committed by Serbian forces, but these accounts could not be confirmed. Overhead images taken in April 1999 indicated scores of new graves where none had appeared a month earlier (Fig. 23), offering compelling evidence of Serbian atrocities.
Figure 21. Aerial image of the Auschwitz-Birkenau extermination camp in Poland on 25 August, 1944 (courtesy of National Archives).
Figure 22. Satellite images taken in 1991 show the extent of the oil fires in Kuwait: NOAA (4-km resolution) and Landsat (30-m resolution) (courtesy of NOAA and EROS Data Center).
Figure 23. Overhead images taken in April 1999 indicated scores of new graves near Izbica, Kosovo (courtesy of NATO).

IMAGE CHAIN OF OVERHEAD SURVEILLANCE SYSTEMS

Linking together many steps in an image chain creates the final image product produced by an overhead surveillance system. Each link plays a vital role in the final quality of the image. Figure 24 illustrates the key components of the imaging chain of a remote sensing satellite. The image chain begins with the electromagnetic energy from a radiant source, for example, the sun. Surveillance systems are generally categorized by the wavelength region of the electromagnetic spectrum (Fig. 25) that the sensor images. The visible spectrum is the portion to which the human eye is sensitive and extends from approximately 0.4 to 0.7 µm.

The radiometry determines the strength of the signal that will be produced at the detector. The characteristics of the ground target, the atmosphere, and the camera system play critical roles in determining the signal. The spectral radiant exitance of a blackbody for a given wavelength of light λ is given by Planck’s equation (5–7),

$$M_{BB}(\lambda, T) = \frac{2\pi hc^2}{\lambda^5}\,\frac{1}{e^{hc/\lambda kT} - 1}\;\left(\frac{\mathrm{watts}}{\mathrm{m^2\,\mu m}}\right),\tag{1}$$

where T is the temperature of the source in K, h = 6.63 × 10⁻³⁴ J·s, c = 3 × 10⁸ m/s, and k = 1.38 × 10⁻²³ J/K. For a Lambertian surface, the spectral radiance from a blackbody is given by

$$L_{BB}(\lambda) = \frac{M_{BB}(\lambda, T)}{\pi}\;\left(\frac{\mathrm{watts}}{\mathrm{m^2\,\mu m\,sr}}\right).\tag{2}$$

If the exitance from the sun is approximated by a blackbody, then the solar spectral irradiance on a target on the ground can be approximated as

$$E_{target}(\lambda) \approx M_{BB}(\lambda, T_{sun})\,\cos(\phi_{zenith})\,\frac{r_{sun}^2}{r_{earth\text{–}sun}^2}\,\tau_{atm}^{sun\text{–}targ}(\lambda)\;\left(\frac{\mathrm{watts}}{\mathrm{m^2\,\mu m}}\right),\tag{3}$$

where r_sun is the radius of the sun, r_earth–sun is the distance from the earth to the sun, τ_atm^sun–targ is the atmospheric transmittance along the path from the sun to the target, φ_zenith is the solar zenith angle, and T_sun is approximately 5,900 K. The atmosphere has a significant effect on the radiometry. Various molecules in the atmosphere absorb radiation at different wavelengths (Fig. 26) and allow using only certain atmospheric windows for imaging. The spectral radiance from a Lambertian target on earth at the entrance aperture of a remote sensing satellite can be calculated from (5–7)

$$L_{target}(\lambda) = \tau_{atm}^{targ\text{–}sat}(\lambda)\left[\frac{\rho_{target}(\lambda)}{\pi}\left(E_{target}(\lambda) + E_{skylight}(\lambda)\right) + \varepsilon_{target}(\lambda)\,L_{BB}(\lambda, T_{target})\right],\tag{4}$$
Figure 24. Illustration of an image chain for a satellite surveillance system.

Figure 25. Electromagnetic spectrum: gamma rays, X-rays, UV, visible, IR, microwave/radar, and radio, spanning wavelengths from 10⁻⁶ to 10¹⁰ µm.

Figure 26. Atmospheric transmission from 0.2 to 10 µm calculated by MODTRAN v4.
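Eqs. (1)–(3) are straightforward to evaluate numerically. The sketch below computes the spectral radiant exitance of a 5,900 K blackbody and a rough estimate of the ground-level solar irradiance; the solar radius, earth–sun distance, zenith angle, and flat band transmittance are illustrative assumptions, not values taken from the text.

```python
import math

# Physical constants used in Planck's equation, Eq. (1)
h = 6.63e-34   # Planck's constant (J*s)
c = 3.0e8      # speed of light (m/s)
k = 1.38e-23   # Boltzmann's constant (J/K)

def spectral_exitance(wavelength_um, T):
    """Blackbody spectral radiant exitance M_BB of Eq. (1), in W/(m^2*um)."""
    lam = wavelength_um * 1e-6  # micrometers -> meters
    M = (2 * math.pi * h * c**2 / lam**5) / (math.exp(h * c / (lam * k * T)) - 1)
    return M * 1e-6             # per meter of wavelength -> per micrometer

def solar_irradiance(wavelength_um, zenith_deg=30.0, tau_atm=0.75):
    """Approximate ground irradiance E_target of Eq. (3) (assumed geometry)."""
    r_sun = 6.96e8          # solar radius (m), assumed
    r_earth_sun = 1.496e11  # earth-sun distance (m), assumed
    M = spectral_exitance(wavelength_um, 5900.0)  # sun as a 5,900 K blackbody
    return M * math.cos(math.radians(zenith_deg)) * (r_sun / r_earth_sun)**2 * tau_atm

M_sun = spectral_exitance(0.55, 5900.0)   # exitance at 0.55 um (green light)
E_ground = solar_irradiance(0.55)
print(f"M_BB(0.55 um, 5900 K) = {M_sun:.3e} W/(m^2*um)")
print(f"E_target(0.55 um)     = {E_ground:.1f} W/(m^2*um)")
```

The result for E_target is on the order of 10³ W/(m²·µm) in the visible band, consistent with the scale of the solar curve in Fig. 38.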
where τ_atm^targ–sat is the atmospheric transmittance along the path from the target to the satellite, ρ_target is the reflectance of the target, ε_target is the emissivity of the target, and E_skylight is the irradiance on the target due to skylight from atmospheric scattering. Radiometry models, such as MODTRAN, are generally used to calculate L_target because the radiometric calculations depend on the acquisition geometry and can be complicated (6). Note that the spectral radiance L_target is a combination of the solar irradiance that is reflected from the ground target, including any radiance scattered directly into the aperture from the atmosphere, as well as the blackbody radiance from the target.

Figure 27. An imaging system at a distance R_target from the target where the focal plane is at a distance R_image from the camera optics.

Assume that an imaging system is at a distance R_target from the target and that the focal plane is at a distance R_image from the camera optics, as shown in Fig. 27. For a polychromatic remote sensing camera where the aperture is small compared to the focal length f, the radiant flux within the spectral bandpass that reaches the entrance aperture of the camera from the target is

$$\Phi_{aperture} = \frac{A_{target}A_{aperture}}{R_{target}^2}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,d\lambda = A_{target}\,\Omega\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,d\lambda\ (\mathrm{watts}),\tag{5}$$

where λ_min and λ_max define the spectral bandpass, A_target is the area of the target, A_aperture is the area of the camera aperture, and Ω is the solid angle that encompasses the aperture area. The area of the image, A_image, is given by

$$A_{image} = m^2 A_{target},\tag{6}$$

where m is the magnification given by

$$m = \frac{R_{image}}{R_{target}}.\tag{7}$$

Thus, A_target can be written as

$$A_{target} = A_{image}\,\frac{R_{target}^2}{R_{image}^2}.\tag{8}$$

Rewriting the Gaussian lens formula,

$$\frac{1}{R_{target}} + \frac{1}{R_{image}} = \frac{1}{f},\tag{9}$$

in terms of m and R_image, where f is the focal length of the optical system,

$$R_{image} = f(m + 1).\tag{10}$$

Using Eq. (8) and Eq. (10), and multiplying by the transmittance of the optics τ_optics, the radiant flux reaching the image plane is

$$\Phi_{image} = \frac{A_{image}A_{aperture}}{f^2(m+1)^2}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda.\tag{11}$$

If the size of the target is large compared to the ground-instantaneous field of view (GIFOV), then the target is an extended source and A_image ≫ A_detector, where A_detector is the area of the detector. The radiant flux on the detector for an extended source is

$$\Phi_{detector} = \frac{A_{detector}}{A_{image}}\,\Phi_{image} = \frac{A_{detector}A_{aperture}}{f^2(m+1)^2}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda.\tag{12}$$

For overhead surveillance cameras that are large distances from their targets, such as those used in satellites, a telescope is required and R_target ≫ R_image; therefore, m + 1 ≅ 1 and f ≅ R_image. If the overhead surveillance camera uses a telescope design that has primary and secondary mirrors, such as a Ritchey–Chretien or a Cassegrain, as shown in Fig. 28, then the radiant flux on the detector can be written as

$$\Phi_{detector} = \frac{A_{detector}\,\pi\left(D_{ap}^2 - D_{obs}^2\right)}{4f^2}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda,\tag{13}$$

or

$$\Phi_{detector} = \frac{A_{detector}\,\pi(1 - \varepsilon)}{4(f\#)^2}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda,\tag{14}$$

where D_ap is the diameter of the optical aperture, D_obs is the diameter of the central obscuration, ε is the fraction of the optical aperture area obscured, and f# is the system f-number given by

$$f\# = \frac{f}{D_{ap}}.\tag{15}$$

Figure 28. Telescope design that has primary and secondary mirrors.

The quality of the optics is critical to the final image quality, and the optics must be manufactured and built to very tight
specifications. Small deviations in the curvature of the mirror can cause a large degradation in image quality. Light that is imaged by the optics will spread out; the point-spread function (PSF) describes the spreading of the light for a point object. The optical transfer function (OTF) is the Fourier transform of the optics PSF, and the magnitude of the OTF is the modulation transfer function (MTF) of the optics (8,9). The MTF measures the output modulation divided by the input modulation for each spatial frequency ν, where modulation is defined as

$$\mathrm{Modulation} = \frac{\mathrm{maximum\ signal} - \mathrm{minimum\ signal}}{\mathrm{maximum\ signal} + \mathrm{minimum\ signal}}.\tag{16}$$

The MTF for an incoherent diffraction-limited optical system that has a circular aperture of diameter D and a circular central obscuration of diameter D₀ is given by

$$MTF_{optics}(\nu') = \frac{2(A + B + C)}{\pi(1 - \varepsilon^2)},\tag{17}$$

where

$$\varepsilon = \frac{D_0}{D},\tag{18}$$

$$\nu' = \frac{\nu}{\nu_c},\tag{19}$$

$$\nu_c = \frac{D}{\lambda f} = \frac{1}{\lambda(f\#)},\tag{20}$$

$$A = \begin{cases}\cos^{-1}(\nu') - \nu'\sqrt{1 - \nu'^2}, & 0 \le \nu' \le 1\\[4pt] 0, & \nu' > 1\end{cases}\tag{21, 22}$$

$$B = \begin{cases}\varepsilon^2\left[\cos^{-1}\!\left(\dfrac{\nu'}{\varepsilon}\right) - \dfrac{\nu'}{\varepsilon}\sqrt{1 - \left(\dfrac{\nu'}{\varepsilon}\right)^{2}}\,\right], & 0 \le \nu' \le \varepsilon\\[6pt] 0, & \nu' > \varepsilon\end{cases}\tag{23, 24}$$

$$C = \begin{cases}-\pi\varepsilon^2, & 0 \le 2\nu' \le (1 - \varepsilon)\\[4pt] \varepsilon\sin\phi + \dfrac{\phi}{2}(1 + \varepsilon^2) - (1 - \varepsilon^2)\tan^{-1}\!\left[\dfrac{(1 + \varepsilon)}{(1 - \varepsilon)}\tan\dfrac{\phi}{2}\right] - \pi\varepsilon^2, & (1 - \varepsilon) \le 2\nu' \le (1 + \varepsilon)\\[6pt] 0, & 2\nu' \ge (1 + \varepsilon)\end{cases}\tag{25, 26}$$

and

$$\phi = \cos^{-1}\left[\frac{1 + \varepsilon^2 - 4\nu'^2}{2\varepsilon}\right].\tag{27}$$

Figure 29. The optics MTF for an incoherent diffraction-limited Ritchey–Chretien optical system that has a circular aperture and circular central obscurations of 0% (ε = 0), 5% (ε = 0.22), 10% (ε = 0.32), and 25% (ε = 0.50).

Figure 30. The individual MTF curves (optics, optics quality, detector aperture, detector diffusion, jitter, and smear) for a notional design and the final system MTF after the individual MTF curves have been multiplied together.

Figure 29 shows MTF plots for central obscurations of 0, 5, 10, and 25%. Note that the MTF decreases as the spatial frequency increases, which has a blurring effect on the image. Note also that there is a cutoff spatial frequency at ν_c, which corresponds to the resolution limit of the optics. Increasing the size of the central obscuration does not
change νc , but it does decrease the MTF, which results in additional blurring. Other factors will also blur the image, for example, vehicle motion; each has its own MTF (10). The MTF of the actual optics will be lower than the MTF given in Eq. (17) due to imperfections in manufacturing the optics. The MTF of the optics is multiplied by an optical quality MTF to achieve the actual MTF. Other MTF contributors, such as the jitter and smear caused by camera motion, can be cascaded with the MTF of the optics to yield a system MTF. Figure 30 shows the individual MTF curves for a notional design of a digital camera and the final system MTF after the individual MTF curves have been multiplied together. The MTF of the optics is usually the most significant component of the system MTF. Although the traditional detector for surveillance systems has been film, today, most systems use digital charge-coupled devices (CCDs) for the detectors (10). The CCD detectors generate an electric charge that is proportional to the number of photons that reach the
detector, given by

$$n_{detector} = \frac{A_{detector}\,\pi(1 - \varepsilon)}{4(f\#)^2}\int_{\lambda_{min}}^{\lambda_{max}} \frac{\lambda\,t_{int}}{hc}\,L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda\ (\mathrm{photons}),\tag{28}$$

where t_int is the integration time of the imaging system. The signal from the ground target, measured in electrons, that is generated at the detector is

$$s_{target} = \frac{A_{detector}\,\pi(1 - \varepsilon)\,t_{int}}{4(f\#)^2\,hc}\int_{\lambda_{min}}^{\lambda_{max}} \eta(\lambda)\,L_{target}(\lambda)\,\tau_{optics}(\lambda)\,\lambda\,d\lambda\ (\mathrm{electrons}),\tag{29}$$
where η is the quantum efficiency, which is the average number of photoelectrons generated per incident photon. The CCD detector allows the image data to be electronically transmitted in near real time to the user, rather than waiting for the film to be returned to a lab, processed, and then shipped to the user. Push-broom sensors use a linear array of CCDs that scan the ground in one direction to build up a two-dimensional image, as shown in Fig. 31. Linear arrays allow imaging larger areas in less time than framing cameras that use two-dimensional CCD arrays. These sensors typically have time delay and integration (TDI) stages to improve the signal. A single, continuous, long array can be difficult to build, so arrays are generally built in staggered segments. The two-dimensional scene can be reconstructed from these staggered segments through ground processing. Every detector in an electronic image sensor, such as a CCD image sensor, may have a different response function that relates the target radiance to the number of photoelectrons generated. This response function can change with time or operating temperature. For a linear CCD detector array, this can lead to streaking in the image. The sensors are calibrated to reduce this nonuniformity, so that each detector has approximately the same response for the same illumination radiance. The calibration is generally performed by illuminating each detector with a given radiance from a calibration lamp and then recording the signal from each detector to
estimate the response function for each detector. Then, the response function for each detector is used in the ground processing to equalize the output of all of the detectors, so that uniform illumination across all of the detectors will produce a uniform output. Even when the detectors are calibrated, some errors from the calibration process are unavoidable. Each detector is sensitive to a slightly different spectrum of light, but they are all calibrated using the same calibration lamp that has a broad, nonuniform spectrum. Because the scene spectrum is unknown, the calibration process assumes that the spectra of the calibration lamp and the scene are identical. The spectrum of the calibration lamp will usually be somewhat different from the spectrum of the scene being imaged; hence, calibration errors will occur. Calibration errors also occur because the calibration process includes an incomplete model of the complete optical process and because the response function for each detector changes over time and operating temperature.

Figure 31. Push-broom sensors use a linear array of CCDs that scan the ground in one direction to build up a two-dimensional image.

Random noise in the signal arises from elements that add uncertainty to the signal level of the target and is quantified by the standard deviation of its statistical distribution. If the distribution of each of the different noise contributors follows a normal distribution, then the variance of the total noise is the sum of the variances of each noise contributor (10). For N independent noise contributors, the standard deviation of the noise is

$$\sigma_{noise} = \sqrt{\sum_{n=1}^{N}\sigma_n^2}.\tag{30}$$

For images with high signals, the primary noise contributor is the photon noise, which arises from random fluctuations in the arrival rate of photons. The photon noise follows a Poisson distribution; therefore, the variance of the signal equals the expected signal level s:

$$\sigma_{photon} = \sqrt{s}.\tag{31}$$

When s > 10, the Poisson distribution approximates a normal distribution. The radiance from the target is not the only light that reaches the detector. Scattered radiance from the atmosphere, as well as any stray light within the camera, will produce a background signal with the target signal at the detector. The background contribution adds an additional photon noise factor to the noise term; thus the photon noise, measured in electrons, is

$$\sigma_{photon} = \sqrt{\sigma_{photon\,target}^2 + \sigma_{photon\,background}^2} = \sqrt{s_{target} + s_{background}}.\tag{32}$$
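The signal and photon-noise relations of Eqs. (29)–(32) can be sketched numerically by approximating the band integral with a band-averaged radiance at the band-center wavelength. All camera parameters below (pixel size, f-number, integration time, efficiencies, and the background fraction) are notional assumptions, not values from the text.

```python
import math

def target_signal(L_band, detector_area_m2, f_number, t_int_s,
                  eta=0.6, tau_optics=0.8, eps=0.1,
                  lambda_min_um=0.5, lambda_max_um=0.7):
    """Approximate s_target of Eq. (29), in electrons, using a band-averaged
    radiance L_band in W/(m^2*um*sr) evaluated at the band-center wavelength."""
    h, c = 6.63e-34, 3.0e8
    lam = 0.5 * (lambda_min_um + lambda_max_um) * 1e-6   # band center (m)
    bandwidth_um = lambda_max_um - lambda_min_um
    geom = detector_area_m2 * math.pi * (1 - eps) / (4 * f_number**2)
    # flux * t_int / (h*c/lambda) converts collected energy to photoelectrons
    return geom * t_int_s * eta * tau_optics * L_band * bandwidth_um * lam / (h * c)

# Notional camera: 10-um pixel, f/10 telescope, 1-ms integration time
s_t = target_signal(L_band=50.0, detector_area_m2=(10e-6)**2,
                    f_number=10.0, t_int_s=1e-3)
s_b = 0.2 * s_t                      # assumed background (path radiance) signal
sigma_photon = math.sqrt(s_t + s_b)  # photon noise, Eq. (32)
snr = s_t / sigma_photon
print(f"s_target = {s_t:.0f} e-, photon noise = {sigma_photon:.1f} e-, SNR = {snr:.1f}")
```

With these assumed values the signal is on the order of 10⁴ electrons, giving a photon-noise-limited SNR near 100; the dark and quantization terms of Eq. (34) would reduce this slightly.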
As in calculating Ltarget , calculating the atmospheric contribution to the signal is a complicated process (6), therefore, radiometry models, such as MODTRAN, are generally used to calculate the background radiance component of sbackground . When no light is incident on the CCD detector, electrons may still be generated due to the dark noise σdark .
Although many factors contribute to the dark noise (10), the principal contributor to σ_dark at nominal operating integration times of less than one second is the CCD read noise, caused by variations in the detector voltage. The value of σ_dark for a digital sensor is usually obtained from test measurements of the detector at a given temperature.

The analog-to-digital converter quantizes the signal when it is converted to digital counts. This produces an uncertainty in the target signal because a range of target signals can produce the same digital count value. The standard deviation of a uniform distribution is 1/√12; therefore, if the total number of electrons that can be stored at each detector, N_well depth, is divided into N_DR digital counts, where N_DR is the dynamic range in digital counts, then the quantization noise is

$$\sigma_{quantization} = \frac{N_{well\,depth}}{N_{DR}\sqrt{12}} = \frac{QSE}{\sqrt{12}},\tag{33}$$

where QSE is the quantum step equivalence in electrons per count. Combining Eq. (32) and Eq. (33) with the dark noise, the system noise in electrons can be written as

$$\sigma_{noise} = \sqrt{s_{target} + s_{background} + \sigma_{quantization}^2 + \sigma_{dark}^2}\ (\mathrm{electrons}).\tag{34}$$

The digital count value from the signal and noise at each detector represents a pixel, or picture element, in the image data that is then downlinked to the
Figure 32. Reducing the number of bits per pixel by simply quantizing the count values of each pixel: 11, 5, 4, 3, 2, and 1 bpp.
ground station using a transmitter that has a maximum transmission rate in bits per second. If the number of bits per pixel can be reduced or ‘‘compressed,’’ then more data can be transmitted within the limited bandwidth of the transmitter. For an 11-bpp (bits per pixel) detector, the dynamic range is N_DR = 2¹¹ = 2,048. Reducing the number of bits per pixel by simply quantizing the count values of each pixel throws away image information and also produces unacceptable quantization of the gray levels at compression levels of 4 bpp and less, as shown in Fig. 32. Bandwidth compression (BWC) algorithms attempt to reduce the average dynamic range of the image, that is, reduce the number of bits per pixel, while maintaining the image quality. Lossless BWC algorithms allow an exact reconstruction of the original image data, whereas ‘‘lossy’’ BWC algorithms trade off this ability to allow greater compression of the image data (5). Figure 33 shows the effect on image quality as an 11-bit image is compressed to fewer bits per pixel using an adaptive differential pulse code modulation (DPCM) algorithm. The 4:1 compression does not allow exact reconstruction of the image data, but the compression is visually lossless, that is, there is no impact on image quality. The image quality loss from the 6:1 and 11:1 compression, however, is significant. Many strategies exist for compressing image data, such as wavelets, fractals, and vector quantization. After the image data has been downlinked to the ground station and decompressed, it may be hard to interpret the image due to low contrast, blurred edges, streaks from
sensitivity differences between detectors, failed detectors, and discontinuities at the segment boundaries. Image processing techniques can be used to optimize the visual interpretability of the image data. Calibration removes the streaks, and the data from each detector segment are processed to synthesize an image collected by a single continuous detector array. The image is also processed to remove any image motion effects, such as vehicle oscillation, and geometric distortions that might occur. The image data can also be processed to a specific mapping geometry, such as orthorectification. Edge sharpening and contrast enhancement processes may be applied to generate the final image. The image chain ends with the human visual system if the image is exploited by an image analyst or alternatively by a computer for machine processing, such as feature classification (7). Figure 34 shows a series of images that illustrate the effects of the image chain as the ground scene is imaged and processed.

Figure 33. An 11-bit image compressed to fewer bits per pixel: 11 bpp (no compression), 2.8 bpp (4:1 compression), 1.8 bpp (6:1 compression), and 1.0 bpp (11:1 compression).

Sensor Types
As mentioned previously, surveillance systems are generally categorized by the wavelength region of the electromagnetic spectrum that they image. The earliest cameras used black-and-white film that captured panchromatic images, that is, film that is sensitive to the visible portion of the electromagnetic spectrum. Even with CCD detectors today, black-and-white images are the most common because the light is captured over a broad spectrum of wavelengths, which allows a high enough signal to collect high-resolution images. Visible imaging systems generally acquire images by using the sun as the illumination source. Multispectral imaging collects several images within different narrow bands of the electromagnetic spectrum that collectively span a broader portion of the spectrum. The most common multispectral image is a true color image; three images are collected, where one images the red portion of the visible spectrum, another images the green portion, and the third images the blue portion. When these three images are viewed together on a color monitor, a true color image is produced. Figure 35 shows an example of the way multispectral images can improve the detection of military vehicles. Panchromatic (black-and-white), true color (blue, green, and red), and false color (green, red, and near infrared) images were generated using the digital image and remote sensing image generation (DIRSIG) (6) process developed at Rochester Institute of Technology (RIT). Note that the vehicles are not easily discernible from the background in the panchromatic image, but they are more readily detected in the true color image and are even more apparent in the false color image, where the difference in the spectrum is greater. Hyperspectral images have much finer spectral resolution than multispectral images.
A multispectral collection may divide a spectral region into a few bands, whereas a hyperspectral collection may divide the same spectral region into tens to hundreds of bands. Each pixel in a hyperspectral image contains spectral characteristics about the surface material imaged within that pixel. Hyperspectral imagery is collected to help identify and discriminate different materials in the scene and is generally used with machine vision algorithms. Figure 36 shows a hyperspectral cube of Moffett Field, California, where the spectral images are stacked in order of the spectral wavelength so that the z axis shows the spectrum of each pixel. The hyperspectral data were collected using NASA’s Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor, which has 224 channels from 0.4 to 2.5 µm, each having a spectral bandwidth of approximately 10 nm. Figure 37 compares the 224 hyperspectral bands imaged by AVIRIS with the multispectral bands imaged by the Landsat Thematic Mapper.
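The text does not prescribe a particular discrimination algorithm; one common measure for comparing a hyperspectral pixel against a reference material is the spectral angle between the two spectra treated as vectors (smaller angle means more similar, independent of illumination level). A minimal sketch with invented five-band toy spectra:

```python
import math

def spectral_angle(pixel, reference):
    """Angle (radians) between two spectra treated as vectors; smaller
    angles indicate more similar materials, regardless of brightness."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    return math.acos(max(-1.0, min(1.0, dot / (norm_p * norm_r))))

# Toy 5-band reflectance spectra: vegetation rises sharply in the near IR
vegetation_ref = [0.04, 0.08, 0.05, 0.40, 0.45]
soil_ref       = [0.10, 0.15, 0.20, 0.25, 0.30]
pixel          = [0.05, 0.09, 0.06, 0.38, 0.42]   # a shaded vegetation pixel

ang_veg  = spectral_angle(pixel, vegetation_ref)
ang_soil = spectral_angle(pixel, soil_ref)
print(f"angle to vegetation: {ang_veg:.3f} rad, to soil: {ang_soil:.3f} rad")
# The pixel would be classified as the material with the smallest angle.
```

Because the angle ignores overall scale, the dimmer (shaded) vegetation pixel still matches the vegetation reference more closely than the soil reference.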
Figure 34. Series of images illustrating the effects of the image chain (radiometry of ground scene, optics, sensor, ground processing, and image enhancements) as the ground scene is imaged and processed.
Figure 35. Panchromatic (black-and-white), true color (blue, green, and red), and false color (green, red, and near infrared) images generated using RIT’s DIRSIG process to simulate the detection of military vehicles (courtesy of RIT). See color insert.
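A false color composite like the one in Fig. 35 is produced by remapping bands to display channels; the conventional mapping sends near infrared to the red channel, red to green, and green to blue. A minimal sketch, where the 2 × 2 band arrays are invented values:

```python
def false_color_composite(nir, red, green):
    """Map (NIR, red, green) bands to (R, G, B) display channels.
    Each band is a 2-D list of reflectance values in [0, 1]; output
    pixels are 8-bit (R, G, B) tuples."""
    def scale(v):
        return max(0, min(255, int(round(v * 255))))
    rows, cols = len(nir), len(nir[0])
    return [[(scale(nir[r][c]), scale(red[r][c]), scale(green[r][c]))
             for c in range(cols)] for r in range(rows)]

# 2x2 toy scene: the top-left pixel is vegetation (high NIR, low visible)
nir   = [[0.50, 0.10], [0.12, 0.08]]
red   = [[0.04, 0.20], [0.18, 0.22]]
green = [[0.08, 0.18], [0.20, 0.25]]
rgb = false_color_composite(nir, red, green)
print(rgb[0][0])  # the vegetation pixel renders bright red in false color
```

This remapping is why vegetation, which reflects strongly in the near infrared, appears red in false color imagery and why targets can stand out more than they do in true color.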
Figure 38. Blackbody spectral curves of the solar irradiance and earth exitance for the sun (T = 5,900 K) and the earth (T = 300 K).
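The two curves of Fig. 38 peak where Eq. (1) is maximized; Wien's displacement law (λ_max ≈ 2,898 µm·K / T, a standard result that the text does not derive) locates those peaks and shows why the earth's emission falls in the thermal infrared:

```python
# Wien's displacement law: peak wavelength of a blackbody, in micrometers
WIEN_B_UM_K = 2898.0  # Wien's displacement constant (um*K)

def peak_wavelength_um(T_kelvin):
    return WIEN_B_UM_K / T_kelvin

sun_peak = peak_wavelength_um(5900.0)    # sun modeled as a 5,900 K blackbody
earth_peak = peak_wavelength_um(300.0)   # earth modeled as a 300 K blackbody
print(f"sun peak:   {sun_peak:.2f} um (visible)")
print(f"earth peak: {earth_peak:.2f} um (thermal infrared)")
```

The sun peaks near 0.5 µm in the visible, while the 300 K earth peaks near 10 µm, consistent with the statement that the earth emits in the infrared above 3 µm.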
Figure 36. Hyperspectral cube of Moffett Field, California, taken on 20 August, 1992, using the AVIRIS sensor (courtesy of NASA JPL).
Figure 37. The 224 hyperspectral bands imaged by AVIRIS compared with the multispectral bands imaged by the Landsat Thematic Mapper.

Infrared images are collected by using sensors that detect the electromagnetic spectrum between 0.7 and 100 µm, outside the range visible to the human eye. Figure 38 compares the spectral irradiance of the sun (T = 5,900 K) above the earth’s atmosphere and the earth’s spectral radiant exitance (T = 300 K), where each is modeled as a blackbody distribution using Eq. (1). The earth emits radiation at wavelengths in the infrared part of the spectrum above 3 µm. For this reason, infrared surveillance systems can be used to acquire overhead images at night. Infrared images are also used to detect and measure differences in the temperature of objects. Underground features that cause a temperature difference at the surface, such as hot pipes, can be detected, as shown in Fig. 39. Trucks or planes that have their engines running can be discriminated from those that are idle in an infrared image, whereas no differences may appear in a visible image.

Synthetic aperture radar (SAR) imaging emits high-frequency radio signals and converts the returned signal into imagery. The motion of the platform is used to synthesize a larger antenna and produce higher resolution imagery. Because SAR supplies its own illumination source and the radio waves are not scattered by clouds, SAR surveillance systems can acquire images at any time of the day or night and under cloudy conditions. Figure 40 shows an example of an SAR image and illustrates the speckle characteristic of SAR images due to the coherent nature of the radar signal.

Image Quality
Image quality is a broad term that encompasses many factors and has many measures. Image quality may have different meanings to different users; for example, a user of hyperspectral data will require high spectral resolution, whereas a user of visible panchromatic imagery may require high spatial resolution. The utility of an image should not be equated with the quality of the image. For example, geographic surveys can be performed better with overhead images that trade off lower resolution for a larger area of coverage. Figure 41 is a simulation of two overhead images acquired from the same camera but at different altitudes. The image acquired at the lower altitude has more detail that can be resolved on the ground, but much of the surrounding information is lost, which may be very important for understanding the context of the objects on the ground. The image quality depends on each element of the image chain, illustrated in Fig. 24. Assuming that all elements of the image chain have been optimized to maximize image quality, then the primary limitations on image quality for most overhead surveillance systems will be the spatial resolution and the SNR (signal-to-noise ratio). Spatial Resolution The highest spatial resolution, that is, the resolving power, of an imaging system is the highest spatial frequency that can be resolved in the final image. In film systems, the limiting resolution was determined by imaging ground resolution targets and then determining
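The blackbody comparison of Fig. 38 can be reproduced from the Planck distribution referenced as Eq. (1). A short sketch using standard physical constants; Wien's displacement law locates the two peaks (the function name and units are illustrative choices, not the article's notation):

```python
import math

# Planck blackbody spectral exitance M(lambda, T), returned in W m^-2 um^-1.
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def planck_exitance(wavelength_um: float, temp_k: float) -> float:
    lam = wavelength_um * 1e-6  # convert to meters
    num = 2.0 * math.pi * H * C**2 / lam**5
    den = math.exp(H * C / (lam * KB * temp_k)) - 1.0
    return num / den * 1e-6     # per micrometer instead of per meter

# Wien's displacement law gives the peak wavelengths seen in Fig. 38:
b_wien = 2897.8  # um K
print(f"solar peak ~{b_wien / 5900:.2f} um")  # visible, near 0.5 um
print(f"earth peak ~{b_wien / 300:.2f} um")   # thermal infrared, near 10 um
```

The earth's curve dwarfs the reflected-solar curve beyond a few micrometers, which is why thermal imaging works at night.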
The performance of the optics, the film, and the stability of the collection system primarily limited the GRD. Most digital overhead surveillance systems use the ground sampled distance (GSD) as the measure of spatial resolution. The GSD, however, refers only to the detector sampling projected onto the ground and, unlike the GRD, ignores any effects that the optical system may have on spatial resolution. Even if the detector sampling is the limiting factor in spatial resolution, the interaction between detector sampling and the performance of the optics plays an important role in determining final image quality. The GSD is usually the only figure of merit used to communicate the image quality of an overhead surveillance system that uses a digital camera. Figure 42 shows image simulations of various imaging system designs at the same GSD, but each has a different system MTF, SNR, and amount of image motion, resulting in very different image quality. The detector resolution and the optical resolution of any camera system must both be understood to assess image quality.

Figure 39. Visible and infrared images of the same scene. Note that the hot underground pipe is visible in the infrared image.

Figure 40. SAR image of the U.S. Capitol building (courtesy of Sandia National Laboratories).

Figure 41. Simulation of two overhead images acquired from the same camera but at different altitudes.

Detector Resolution

Assuming that the digital detector has good signal performance, the detector sampling pitch, that is, the distance between the centers of adjacent detector elements, limits the highest spatial frequency that can be sampled without aliasing. Spatial frequencies higher than the Nyquist frequency, defined by (9)

νN ≡ 1/(2p), (35)

where p is the detector sampling pitch, will be aliased and will appear as spatial frequencies less than the Nyquist frequency in the image.
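The folding of frequencies above νN back below it can be illustrated numerically. A small sketch of Eq. (35) and frequency folding (the 10-µm pitch is a hypothetical value):

```python
def nyquist_frequency(pitch: float) -> float:
    """Highest spatial frequency sampled without aliasing, Eq. (35): nu_N = 1/(2p)."""
    return 1.0 / (2.0 * pitch)

def aliased_frequency(nu: float, pitch: float) -> float:
    """Apparent frequency after sampling at interval p (frequency folding)."""
    nu_s = 1.0 / pitch                       # sampling frequency
    nu_folded = nu % nu_s                    # wrap into [0, nu_s)
    return min(nu_folded, nu_s - nu_folded)  # fold into [0, nu_N]

p = 10e-6  # hypothetical 10-um detector pitch
print(f"Nyquist: {nyquist_frequency(p):.0f} cycles/m")
# A 70,000 cycles/m pattern sampled at this pitch appears at 30,000 cycles/m:
print(f"aliased: {aliased_frequency(70e3, p):.0f} cycles/m")
```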
Figure 43 illustrates the effect of aliasing on an image as the sampling pitch is increased. For this discussion, the imaging detector is modeled as a linear array of detector elements that is imaged in a push-broom fashion, as shown in Fig. 31. The GSD is calculated by projecting the detector sampling pitch through the imaging system onto the ground. If the fill factor is 100% for each detector element, then the GSD is equal to the ground instantaneous field of view (GIFOV), which is the linear extent of each detector element projected onto the ground. The GSD in one dimension for a nadir viewing geometry is

GSD = p h/f, (36)

where p is the detector pitch, h is the altitude of the satellite, and f is the effective focal length of the optical system. In terms of the GSD, the highest spatial frequency on the ground that can be sampled without aliasing is

νN = h/(2f GSD). (37)

Figure 42. Images at the same GSD but of different image quality.
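Eqs. (36) and (37) can be evaluated directly. A minimal sketch using hypothetical system parameters (700-km altitude, 10-m focal length, 10-µm pitch; these values are invented for illustration):

```python
def gsd_nadir(pitch_m: float, altitude_m: float, focal_length_m: float) -> float:
    """Ground sampled distance at nadir, Eq. (36): GSD = p * h / f."""
    return pitch_m * altitude_m / focal_length_m

def nyquist_from_gsd(altitude_m: float, focal_length_m: float, gsd_m: float) -> float:
    """Nyquist frequency expressed via the GSD, Eq. (37): nu_N = h / (2 f GSD)."""
    return altitude_m / (2.0 * focal_length_m * gsd_m)

p, h, f = 10e-6, 700e3, 10.0   # hypothetical sensor
gsd = gsd_nadir(p, h, f)
print(f"GSD = {gsd:.2f} m")                                   # 0.70 m
print(f"nu_N = {nyquist_from_gsd(h, f, gsd):.0f} cycles/m")   # recovers 1/(2p)
```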
Referring to Fig. 44, the slant range H is the distance from the satellite to the ground target being imaged and is given by

H = [(R + h)² − R² cos²(ISEL)]^(1/2) − R sin(ISEL), (38)

where R is the radius of the earth. The imaging satellite elevation angle (ISEL) is the angle between the line from the ground target to the satellite and the line tangent to the earth at the ground target. The ISEL is related to the look angle L, which is the angle between the line from the satellite to the ground target and the line from the satellite to the center of the earth, by

sin(L) = [R/(R + h)] cos(ISEL). (39)

Figure 44. Satellite acquisition geometry used to calculate the off-nadir GSD.

In two dimensions, the GSD is commonly defined as the geometric mean of the x and y detector pitch projected to the ground. The x axis is associated with the cross-scan direction of the array, and the y axis is associated with the along-scan direction. Because the GSD is the sampling distance projected onto the ground plane, the sampling distance along the line of sight will be increased by a factor of 1/sin(ISEL). If β is the angle between the x direction of the digital array and the line perpendicular to the line of sight, then the GSD in the x direction will be

GSDx = p (H/f) [cos²(β) + (sin(β)/sin(ISEL))²]^(1/2). (40)

Then, the geometric mean GSD is

GSD = (GSDx GSDy)^(1/2) = p (H/f) {[cos²(β) + (sin(β)/sin(ISEL))²] [sin²(β) + (cos(β)/sin(ISEL))²]}^(1/4). (41)

Figure 43. The effect of aliasing on an image when the sampling pitch is increased (original image; sampling pitch increased 2×, 3×, 4×, 5×, and 6×).

Diffraction Resolution

The diffraction resolution of a diffraction-limited incoherent imaging system is determined by the optics. The diffraction of light imaged through the optics causes a point of light to spread out, as described by the point-spread function (PSF) (8,9). The diffraction-limited incoherent optics PSF for a circular aperture is given by

PSFoptics(r) = [2J1(πDr/λf) / (πDr/λf)]², (42)

where r = (x² + y²)^(1/2) and J1 is the first-order Bessel function. The modulus of the Fourier transform of the optics PSF is the MTF. The MTF of a diffraction-limited incoherent optical system that has a circular aperture has a distinct spatial frequency cutoff νc, given by (8,9)

νc = D/(λf) = 1/(λ(f#)), (43)

where D is the diameter of the optics aperture, f is the focal length, λ is the wavelength of light, and f# is the system f-number, which equals f/D. This spatial frequency cutoff limits the spatial resolution that can be imaged with the optical system. The cutoff does not change even if a central obscuration is placed in the optical aperture. Although it is common in the optics community to define the width of the diffraction-limited incoherent optics PSF for a circular aperture as the diameter of the first zero in PSFoptics from Eq. (42), given by 2.44 λ(f#), it will be convenient in this analysis to define the PSF width as λ(f#), which is approximately the full width at half maximum (FWHM) of the PSF for a circular aperture. If the optics ground spot size (GSSoptics) is defined as the width of the optics PSF projected through the imaging system onto the ground, as illustrated in Fig. 45, then the GSSoptics at a nadir viewing geometry for a diffraction-limited incoherent optical system that has a circular aperture is

GSSoptics = λ h/D = λ(f#) h/f. (44)

Figure 45. The optics PSF projected onto the ground.

For viewing geometries off-nadir, the GSSoptics can be calculated as the geometric mean of the projections onto the ground in the x and y directions, given by

GSSoptics = (GSSoptics,x GSSoptics,y)^(1/2) = λ (H/D) {[cos²(β) + (sin(β)/sin(ISEL))²] [sin²(β) + (cos(β)/sin(ISEL))²]}^(1/4). (45)
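The geometry of Eqs. (38), (40), (41), and (45) can be chained numerically. A sketch with hypothetical orbit and sensor values (Earth radius taken as 6378 km; all other parameters are invented for illustration):

```python
import math

R_EARTH = 6378e3  # m, approximate equatorial radius

def slant_range(h: float, isel_rad: float) -> float:
    """Slant range H, Eq. (38)."""
    return (math.sqrt((R_EARTH + h)**2 - (R_EARTH * math.cos(isel_rad))**2)
            - R_EARTH * math.sin(isel_rad))

def angular_factor(beta_rad: float, isel_rad: float) -> float:
    """Geometric-mean projection factor shared by Eqs. (41) and (45)."""
    a = math.cos(beta_rad)**2 + (math.sin(beta_rad) / math.sin(isel_rad))**2
    b = math.sin(beta_rad)**2 + (math.cos(beta_rad) / math.sin(isel_rad))**2
    return (a * b) ** 0.25

def gsd_offnadir(p, f, h, beta_rad, isel_rad):
    """Geometric-mean GSD, Eq. (41)."""
    return p * slant_range(h, isel_rad) / f * angular_factor(beta_rad, isel_rad)

def gss_offnadir(lam, d, h, beta_rad, isel_rad):
    """Geometric-mean optics ground spot size, Eq. (45)."""
    return lam * slant_range(h, isel_rad) / d * angular_factor(beta_rad, isel_rad)

p, f, d, lam, h = 10e-6, 10.0, 1.0, 0.55e-6, 700e3  # hypothetical system
nadir = math.radians(90.0)  # at nadir, ISEL = 90 degrees
print(f"H at nadir = {slant_range(h, nadir) / 1e3:.0f} km")   # equals the altitude
print(f"GSD at nadir = {gsd_offnadir(p, f, h, 0.0, nadir):.2f} m")
isel45 = math.radians(45.0)
print(f"GSD at ISEL = 45 deg: {gsd_offnadir(p, f, h, 0.0, isel45):.2f} m")
```

At nadir the slant range collapses to the altitude and the angular factor to 1, recovering Eq. (36); off-nadir both the longer slant range and the projection factor enlarge the GSD.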
Optimizing Detector and Diffraction Resolution

A digital imaging system can be designed so that the system resolution is either detector-limited or diffraction-limited. The two resolution limits can be compared to one another by calculating the ratio of the detector sampling frequency to the optical bandpass limit of the optical system. For a diffraction-limited incoherent optical system, this ratio is

detector sampling frequency/optical bandpass limit = (1/p)/[1/(λ(f#))] = λ(f#)/p. (46)

In terms of the Nyquist frequency and the cutoff frequency,

λ(f#)/p = 2νN/νc. (47)

λ(f#)/p can also be interpreted as a measure of how finely the detector samples the diffraction-limited optics PSF. Thus, in remote sensing, λ(f#)/p is interpreted as a measure of how finely the GSD samples the ground scene with respect to the GSSoptics, given by

GSSoptics/GSD = (λ (H/D) {[cos²(β) + (sin(β)/sin(ISEL))²] [sin²(β) + (cos(β)/sin(ISEL))²]}^(1/4)) / (p (H/f) {[cos²(β) + (sin(β)/sin(ISEL))²] [sin²(β) + (cos(β)/sin(ISEL))²]}^(1/4)) = λ(f#)/p. (48)

Figure 46 shows the sampling of the optics PSF from Eq. (42) for λ(f#)/p equal to 1, 2, and 3. For panchromatic systems, λ represents the mean wavelength. Note that λ(f#)/p is independent of the acquisition geometry; that is, it is a fundamental design parameter of the imaging system. The diffraction resolution equals the detector resolution when λ(f#)/p = 2 because the optics MTF falls to zero at the Nyquist frequency and νN = νc. If λ(f#)/p < 2, then the spatial resolution is limited by the detector because νN < νc. Spatial frequencies captured by the optics that are above νN will be manifested as aliasing artifacts. If λ(f#)/p > 2, then the spatial resolution is limited by the diffraction of the optics because νc < νN. This would imply that an overhead surveillance system designed to λ(f#)/p = 2 by reducing the GSD or the GSSoptics will provide the best image quality, yet remote sensing systems are generally designed so that λ(f#)/p < 2. It is important to note that spatial resolution alone is not synonymous with image quality. Although more aliasing occurs as λ(f#)/p is reduced, the higher MTF below νN causes edges in the image to appear sharper. Figure 47 shows image simulations that illustrate the improvement in image sharpness as λ(f#)/p decreases (11). Stronger sharpening filters could be used to improve the edge sharpness of the higher λ(f#)/p images, but the resulting noise gain and edge overshoot would be undesirable. Another important benefit of reducing λ(f#)/p is that the SNR improves.

Figure 46. λ(f#)/p as a measure of how finely the detector samples the optics PSF (λ(f#)/p = 1, 2, and 3; the first zero of the PSF falls at a diameter of 2.44 λ(f#)).
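Because the angular factors cancel in Eq. (48), the design trade reduces to the single number λ(f#)/p. A short sketch classifying designs by this ratio (the example wavelength, f-number, and pitch values are invented):

```python
def q_ratio(wavelength_m: float, f_number: float, pitch_m: float) -> float:
    """Eq. (46): ratio of detector sampling frequency (1/p) to the
    optical bandpass limit (1/(lambda*(f#))), i.e. lambda*(f#)/p."""
    return wavelength_m * f_number / pitch_m

def resolution_limit(wavelength_m: float, f_number: float, pitch_m: float) -> str:
    """Classify a design per the lambda*(f#)/p = 2 balance point."""
    q = q_ratio(wavelength_m, f_number, pitch_m)
    if q < 2.0:
        return "detector-limited (nu_N < nu_c, aliasing possible)"
    if q > 2.0:
        return "diffraction-limited (nu_c < nu_N)"
    return "balanced (nu_N = nu_c)"

# Hypothetical panchromatic design: 0.55-um mean wavelength, f/10, 10-um pitch.
lam, fnum, p = 0.55e-6, 10.0, 10e-6
print(f"lambda*(f#)/p = {q_ratio(lam, fnum, p):.2f}")
print(resolution_limit(lam, fnum, p))
```

As the text notes, most remote sensing systems land below 2, trading some aliasing for sharper edges and better SNR.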
Figure 47. Changing λ(f#)/p while maintaining a GSD of 1 meter at constant SNR (λ(f#)/p = 0.3, 1.0, 1.5, and 2.0).

Figure 48. The visibility of the atmosphere and the look angle influence the SNR and the quality of the image (horizontal visibility = 48 km, 24 km, and 8 km at a nadir viewing geometry; 8 km at a 60° look angle).

Signal-to-Noise Ratio (SNR)

The quality of an overhead surveillance image depends on the amount of signal captured from the desired ground target compared to the noise level in the image. Many different metrics have surfaced in the remote sensing community over the years to define the SNR. In their basic form, all of the metrics relate a signal level to a noise level, that is,

SNR ≡ signal/noise, (49)

but the differences arise in what is considered signal and what is considered noise. Most SNR metrics compare the mean signal with the standard deviation of the noise, such that

SNR = (mean target signal)/(noise standard deviation) = starget/σnoise, (50)

where starget is given by Eq. (29) and σnoise is given by Eq. (34). This SNR calculation for a remote sensing system design would be straightforward, except for the calculation of the target spectral radiance Ltarget(λ). The target spectral radiance depends on the imaging collection parameters, that is, the solar angle, the atmospheric conditions, and the viewing geometry of the remote sensing system, as well as the target reflectance ρtarget(λ). Figure 48 shows the effects of the atmosphere on image quality: as the visibility of the atmosphere decreases, the contrast of the image decreases. Image processing used to improve the image contrast also enhances the noise in the image. Assuming that "typical" imaging collection parameters will be used to calculate the SNR, the most common difference between SNR metrics is the value used for
ρtarget(λ). A common assumption is to use the signal from a 100% reflectance target, given by

SNRρ=100% = starget/σnoise |ρtarget=100%, (51)

where the vertical line "|" means "evaluated at". This metric is not very realistic for remote sensing purposes, so values closer to the average reflectance of the earth are used instead. For land surfaces, the average reflectance of the earth between λmin = 0.4 µm and λmax = 0.9 µm is approximately 15% but varies with the terrain type, such as soil and vegetation. Two targets cannot be distinguished from one another in an image if the difference between their reflectance values is less than the signal differences caused by noise. Therefore, it is beneficial to define the signal using the difference in reflectance between two targets (or a target and its background):

Δρ = ρhigh − ρlow. (52)

The SNR metric for the reflectance difference between the two targets is

SNRΔρ = (starget|ρtarget=ρhigh − starget|ρtarget=ρlow)/σnoise = starget|ρtarget=Δρ/σnoise. (53)
This SNR metric is often used in remote sensing, but the value of SNRΔρ depends on the values chosen for ρhigh and ρlow. The value for ρhigh is typically used to calculate the photon noise in σnoise. Another commonly used metric is the noise equivalent change in reflectance, or NEΔρ, which is the reflectance difference between two targets that produces a signal difference equal to the standard deviation of the noise. Therefore, it will be difficult to differentiate two targets whose reflectance difference is less than the NEΔρ. The NEΔρ can be calculated by solving the expression for SNRΔρ for Δρ. If Δρ is independent of λ, then the NEΔρ is simply

NEΔρ = Δρ/SNRΔρ = 1/SNRρ=100% = σnoise/starget|ρtarget=100%. (54)
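Under the assumption behind Eq. (54) that the signal is linear in reflectance, SNRΔρ and NEΔρ follow directly from the 100%-reflectance signal. A sketch with hypothetical electron counts (the 40,000-electron signal is invented; the 75-electron noise echoes the dark-noise value quoted for Fig. 49):

```python
def snr_delta_rho(s_target_100: float, delta_rho: float, sigma_noise: float) -> float:
    """Eq. (53) for a signal linear in reflectance:
    SNR_drho = drho * s_target|100% / sigma_noise."""
    return delta_rho * s_target_100 / sigma_noise

def ne_delta_rho(s_target_100: float, sigma_noise: float) -> float:
    """Noise-equivalent change in reflectance, Eq. (54):
    the drho at which SNR_drho falls to 1."""
    return sigma_noise / s_target_100

# Hypothetical camera: 40,000 electrons from a 100% reflectance target,
# 75 electrons of noise.
s100, sigma = 40_000.0, 75.0
print(f"NE_drho = {ne_delta_rho(s100, sigma) * 100:.2f}%")
print(f"SNR for drho = 2%: {snr_delta_rho(s100, 0.02, sigma):.1f}")
```

With these numbers, two targets differing by less than about 0.2% reflectance would be lost in the noise.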
Figure 49 shows image simulations for a camera design at different SNR and NEΔρ values; a 15% target reflectance and 75 electrons of dark noise were used.

National Imagery Interpretability Rating Scale

The National Imagery Interpretability Rating Scale (NIIRS) is a 0–9 scale developed by the U.S. government's Imagery Resolution Assessment and Reporting Standards (IRARS) Committee (12). The NIIRS is an exploitation task-based scale that quantifies the interpretability of
Figure 49. Image simulations for a camera design at different SNR and NEΔρ values (SNR(ρ = 15%) = 84, 12, 6, and 1; NEΔρ = 0.2%, 1.3%, 2.4%, and 11%).
Table 1. Visible NIIRS Criteria
0
Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution
1
Detect a medium-sized port facility and/or distinguish between taxiways and runways at a large airfield.
2
Detect large hangars at airfields. Detect large static radars (e.g., AN/FPS-85, COBRA DANE, PECHORA, HENHOUSE). Detect military training areas. Identify an SA-5 site based on road pattern and overall site configuration. Detect large buildings at a naval facility (e.g., warehouses, construction hall). Detect large buildings (e.g., hospitals, factories).
3
Identify the wing configuration (e.g., straight, swept, delta) of all large aircraft (e.g., 707, CONCORD, BEAR, BLACKJACK). Identify radar and guidance areas at a SAM site by the configuration, mounds, and presence of concrete aprons. Detect a helipad by the configuration and markings. Detect the presence/absence of support vehicles at a mobile missile base. Identify a large surface ship in port by type (e.g., cruiser, auxiliary ship, noncombatant/merchant). Detect trains or strings of standard rolling stock on railroad tracks (not individual cars).
4
Identify all large fighters by type (e.g., FENCER, FOXBAT, F-15, F-14). Detect the presence of large individual radar antennas (e.g., TALL KING). Identify, by general type, tracked vehicles, field artillery, large river crossing equipment, wheeled vehicles when in groups. Detect an open missile silo door. Determine the shape of the bow (pointed or blunt/rounded) on a medium-sized submarine (e.g., ROMEO, HAN, Type 209, CHARLIE II, ECHO II, VICTOR II/III). Identify individual tracks, rail pairs, control towers.

5

Distinguish between a MIDAS and a CANDID by the presence of refueling equipment (e.g., pedestal and wing pod). Identify radar as vehicle-mounted or trailer-mounted. Identify, by type, deployed tactical SSM systems (e.g., FROG, SS-21, SCUD). Distinguish between SS-25 mobile missile TEL and Missile Support Vans (MSVs) in a known support base, when not covered by camouflage. Identify TOP STEER or TOP SAIL air surveillance radar on KIROV-, SOVREMENNY-, KIEV-, SLAVA-, MOSKVA-, KARA-, or KRESTA-II-class vessels.

6
Distinguish between models of small/medium helicopters (e.g., HELIX A from HELIX B from HELIX C, HIND D from HIND E, HAZE A from HAZE B from HAZE C). Identify the shape of antennas on EW/GCI/ACQ radars as parabolic, parabolic with clipped corners, or rectangular. Identify the spare tire on a medium-sized truck. Distinguish between SA-6, SA-11, and SA-17 missile airframes. Identify individual launcher covers (8) of vertically launched SA-N-6 on SLAVA-class vessels. Identify automobiles as sedans or station wagons.
7
Identify fitments and fairings on a fighter-sized aircraft (e.g., FULCRUM, FOXHOUND). Identify ports, ladders, vents on electronics vans. Detect the mount for antitank guided missiles (e.g., SAGGER on BMP-1). Detect details of the silo door hinging mechanism on Type III-F, III-G, and III-H launch silos and Type III-X launch control silos. Identify the individual tubes of the RBU on KIROV-, KARA-, KRIVAK-class vessels. Identify individual rail ties.
8
Identify the rivet lines on bomber aircraft. Detect horn-shaped and W-shaped antennas mounted atop BACKTRAP and BACKNET radars. Identify a hand-held SAM (e.g., SA-7/14, REDEYE, STINGER). Identify joints and welds on a TEL or TELAR. Detect winch cables on deck-mounted cranes. Identify windshield wipers on a vehicle.
9
Differentiate cross-slot from single-slot heads on aircraft skin panel fasteners. Identify small light-toned ceramic insulators that connect wires of an antenna canopy. Identify vehicle registration numbers (VRN) on trucks. Identify screws and bolts on missile components. Identify braid of ropes (1–3 inches in diameter). Detect individual spikes in railroad ties.
Table 2. Civilian NIIRS Criteria
0
Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.
1
Distinguish between major land use classes (e.g., urban, agricultural, forest, water, barren). Detect a medium-sized port facility. Distinguish between runways and taxiways at a large airfield. Identify large area drainage patterns by type (e.g., dendritic, trellis, radial).
2
Identify large (i.e., greater than 160-acre) center-pivot irrigated fields during the growing season. Detect large buildings (e.g., hospitals, factories). Identify road patterns, like clover leafs, on major highway systems. Detect ice-breaker tracks. Detect the wake from a large (e.g., greater than 300’) ship.
3
Detect large area (i.e., larger than 160 acres) contour plowing. Detect individual houses in residential neighborhoods. Detect trains or strings of standard rolling stock on railroad tracks (not individual cars). Identify inland waterways navigable by barges. Distinguish between natural forest stands and orchards.
4
Identify farm buildings as barns, silos, or residences. Count unoccupied railroad tracks along right-of-way or in a railroad yard. Detect basketball court, tennis court, volleyball court in urban areas. Identify individual tracks, rail pairs, control towers, switching points in rail yards. Detect jeep trails through grassland.

5

Identify Christmas tree plantations. Identify individual rail cars by type (e.g., gondola, flat, box) and locomotives by type (e.g., steam, diesel). Detect open bay doors of vehicle storage buildings. Identify tents (larger than two person) at established recreational camping areas. Distinguish between stands of coniferous and deciduous trees during leaf-off condition. Detect large animals (e.g., elephants, rhinoceros, giraffes) in grasslands.

6
Detect narcotics intercropping based on texture. Distinguish between row (e.g., corn, soybean) crops and small grain (e.g., wheat, oats) crops. Identify automobiles as sedans or station wagons. Identify individual telephone/electric poles in residential neighborhoods. Detect foot trails through barren areas.
7
Identify individual mature cotton plants in a known cotton field. Identify individual railroad ties. Detect individual steps on a stairway. Detect stumps and rocks in forest clearings and meadows.
8
Count individual baby pigs. Identify a USGS benchmark set in a paved surface. Identify grill detailing and/or the license plate on a passenger/truck type vehicle. Identify individual pine seedlings. Identify individual water lilies on a pond. Identify windshield wipers on a vehicle.
9
Identify individual grain heads on small grains (e.g., wheat, oats, barley). Identify individual barbs on a barbed wire fence. Detect individual spikes in railroad ties. Identify individual bunches of pine needles. Identify an ear tag on large game animals (e.g., deer, elk, moose).
an image and has become a principal measure of image quality for reconnaissance systems (13). An experienced image analyst can assign an NIIRS rating to an image. The specific objects listed in the NIIRS criteria do not need to be in the image, but the experience of the image analyst is required to decide if the NIIRS criteria would be
met if the information were present in the image. If more information can be extracted from the image, then the NIIRS rating will increase. Therefore, the NIIRS is directly related to the quality of the image being exploited and is an important tool for defining the imaging requirements of surveillance systems.
Table 3. Infrared NIIRS Criteria
0
Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.
1
Distinguish between runways and taxiways on the basis of size, configuration, or pattern at a large airfield. Detect a large (e.g., greater than 1 square kilometer) cleared area in dense forest. Detect large ocean-going vessels (e.g., aircraft carrier, supertanker, KIROV) in open water. Detect large areas (e.g., greater than 1 square kilometer) of marsh/swamp.
2
Detect large aircraft (e.g., C-141, 707, BEAR, CANDID, CLASSIC). Detect individual large buildings (e.g., hospitals, factories) in an urban area. Distinguish between densely wooded, sparsely wooded, and open fields. Identify an SS-25 base by the pattern of buildings and roads. Distinguish between naval and commercial port facilities based on type and configuration of large functional areas.
3
Distinguish between large (e.g., C-141, 707, BEAR, A300 AIRBUS) and small aircraft (e.g., A-4, FISHBED, L-39). Identify individual thermally active flues running between the boiler hall and smoke stacks at a thermal power plant. Detect a large air warning radar site based on the presence of mounds, revetments, and security fencing. Detect a driver training track at a ground forces garrison. Identify individual functional areas (e.g., launch sites, electronics area, support area, missile handling area) of an SA-5 launch complex. Distinguish between large (e.g, greater than 200-meter) freighters and tankers.
4
Identify the wing configuration of small fighter aircraft (e.g., FROGFOOT, F-16, FISHBED). Detect a small (e.g., 50 meter square) electrical transformer yard in an urban area. Detect large (e.g., greater than 10-meter diameter) environmental domes at an electronics facility. Detect individual thermally active vehicles in garrison. Detect thermally active SS-25 MSVs in garrison. Identify individual closed cargo hold hatches on large merchant ships.

5

Distinguish between single-tail (e.g., FLOGGER, F-16, TORNADO) and twin-tailed (e.g., F-15, FLANKER, FOXBAT) fighters. Identify outdoor tennis courts. Identify the metal lattice structure of large (e.g., approximately 75-meter) radio relay towers. Detect armored vehicles in a revetment. Detect a deployed transportable electronics tower (TET) at an SA-10 site. Identify the stack shape (e.g., square, round, oval) on large (e.g., greater than 200-meter) merchant ships.

6
Detect wing-mounted stores (i.e., ASM, bombs) protruding from the wings of large bombers (e.g., B-52, BEAR, Badger). Identify individual thermally active engine vents atop diesel locomotives. Distinguish between a FIX FOUR and FIX SIX site based on antenna pattern and spacing. Distinguish between thermally active tanks and APCs (Armored Personnel Carrier). Distinguish between a two-rail and four-rail SA-3 launcher. Identify missile tube hatches on submarines.
7
Distinguish between ground attack and interceptor versions of the MIG-23 FLOGGER based on the shape of the nose. Identify automobiles as sedans or station wagons. Identify antenna dishes (less than 3 meters in diameter) on a radio relay tower. Identify the missile transfer crane on an SA-6 transloader. Distinguish between an SA-2/CSA-1 and a SCUD-B missile transporter, when missiles are not loaded. Detect mooring cleats or bollards on piers.
8
Identify the RAM airscoop on the dorsal spine of FISHBED J/K/L. Identify limbs (e.g., arms, legs) on an individual. Identify individual horizontal and vertical ribs on a radar antenna. Detect closed hatches on a tank turret. Distinguish between fuel and oxidizer multisystem propellant transporters based on twin or single fitments on the front of the semitrailer. Identify individual posts and rails on deck edge life rails.
9
Identify access panels on fighter aircraft. Identify cargo (e.g., shovels, rakes, ladders) in an open-bed, light-duty truck. Distinguish between BIRDS EYE and BELL LACE antennas based on the presence or absence of small dipole elements. Identify turret hatch hinges on armored vehicles. Identify individual command guidance strip antennas on an SA-2/CSA-1 missile. Identify individual rungs on bulkhead-mounted ladders.
Table 4. Radar NIIRS Criteria
0
Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.
1
Detect the presence of aircraft dispersal parking areas. Detect a large, cleared swath in a densely wooded area. Detect a port facility based on the presence of piers and warehouses. Detect lines of transportation (either road or rail), but do not distinguish between them.
2
Detect the presence of large (e.g., BLACKJACK, CAMBER, COCK, 707, 747) bombers or transports. Identify large phased-array radars (e.g., HEN HOUSE, DOG HOUSE) by type. Detect a military installation by building pattern and site configuration. Detect road pattern, fence, and hardstand configuration at SSM launch sites (missile silos, launch control silos) within a known ICBM complex. Detect large noncombatant ships (e.g., freighters or tankers) at a known port facility. Identify athletic stadiums.
3
Detect medium-sized aircraft (e.g., FENCER, FLANKER, CURL, COKE, F-15). Identify an ORBITA site on the basis of a 12-meter dish antenna normally mounted on a circular building. Detect vehicle revetments at a ground forces facility. Detect vehicles/pieces of equipment at an SAM, SSM, or ABM fixed missile site. Determine the location of the superstructure (e.g., fore, amidships, aft) on a medium-sized freighter. Identify a medium-sized (approx. six-track) railroad classification yard.

4

Distinguish between large rotary-wing and medium fixed-wing aircraft (e.g., HALO helicopter versus CRUSTY transport). Detect recent cable scars between facilities or command posts. Detect individual vehicles in a row at a known motor pool. Distinguish between open and closed sliding roof areas on a single bay garage at a mobile missile base. Identify square bow shape of ROPUCHA class (LST). Detect all rail/road bridges.

5
Count all medium helicopters (e.g., HIND, HIP, HAZE, HOUND, PUMA, WASP). Detect deployed TWIN EAR antenna. Distinguish between river crossing equipment and medium/heavy armored vehicles by size and shape (e.g., MTU-20 vs T-62 MBT). Detect missile support equipment at an SS-25 RTP (e.g., TEL, MSV). Distinguish bow shape and length/width differences of SSNS. Detect the break between railcars (count railcars).
6
Distinguish between variable and fixed-wing fighter aircraft (e.g., FENCER vs FLANKER). Distinguish between the BAR LOCK and SIDE NET antennas at a BAR LOCK/SIDE NET acquisition radar site. Distinguish between small support vehicles (e.g., UAZ-69, UAZ-469) and tanks (e.g., T-72, T-80). Identify SS-24 launch triplet at a known location. Distinguish between the raised helicopter deck on a KRESTA II (CG) and the helicopter deck with main deck on a KRESTA I (CG).
7
Identify small fighter aircraft by type (e.g., FISHBED, FITTER, FLOGGER). Distinguish between electronics van trailers (without tractor) and van trucks in garrison. Distinguish, by size and configuration, between a turreted, tracked APC and a medium tank (e.g., BMP-1/2 vs T-64). Detect a missile on the launcher in an SA-2 launch revetment. Distinguish between bow-mounted missile system on KRIVAK I/II and bow-mounted gun turret on KRIVAK III. Detect road/street lamps in an urban residential area or military complex.
8
Distinguish the fuselage difference between a HIND and a HIP helicopter. Distinguish between the FAN SONG E missile control radar and the FAN SONG F based on the number of parabolic dish antennas (three vs. one). Identify the SA-6 transloader when other SA-6 equipment is present. Distinguish limber hole shape and configuration differences between DELTA I and YANKEE I (SSBNs). Identify the dome/vent pattern on rail tank cars.
9
Detect major modifications to large aircraft (e.g., fairings, pods, winglets). Identify the shape of antennas on EW/GCI/ACQ radars as parabolic, parabolic with clipped corners, or rectangular. Identify, based on presence or absence of turret, size of gun tube, and chassis configuration, wheeled or tracked APCs by type (e.g., BTR-80, BMP- 1/2, MT-LB, Ml 13). Identify the forward fins on an SA-3 missile. Identify individual hatch covers of vertically launched SA-N-6 surface-to-air system. Identify trucks as cab-over-engine or engine-in-front.
IMAGING SCIENCE IN OVERHEAD SURVEILLANCE
Separate NIIRS criteria have been developed for visible, infrared, radar, and multispectral sensor systems because the exploitation tasks for each sensor type can be very different. The visible NIIRS criteria are shown in Table 1. The NIIRS criteria were originally developed for military applications. Realizing the need to relate NIIRS to commercial systems, a civilian NIIRS for visible imagery was eventually developed; it is shown in Table 2. The infrared, radar, and multispectral NIIRS criteria are shown in Tables 3, 4, and 5, respectively. Although NIIRS is defined as an integer scale, ΔNIIRS (delta-NIIRS) ratings at fractional NIIRS values are used to measure small differences in image quality between two images. A ΔNIIRS of less than 0.1 NIIRS is usually not perceptible and does not impact the interpretability of the image, whereas a ΔNIIRS of more than 0.2 NIIRS is easily perceptible. The ΔNIIRS scale is designed so that ΔNIIRS ratings are independent of the NIIRS rating of the image; for example, a degradation that produces a 0.2 NIIRS loss in image quality on a NIIRS 6 image will also produce a 0.2 NIIRS loss on a NIIRS 4 image. An image quality equation (IQE) is a tool designed to predict the NIIRS rating of an image, given an imaging system design and collection parameters. The General Image Quality Equation (GIQE), version 4, for visible EO systems is (14)
NIIRS = 10.251 − a log10 GSDGM + b log10 RERGM − 0.656 HGM − 0.344 (G/SNR),   (55)
where GSDGM is the geometric mean ground sample distance (GSD), RERGM is the geometric mean of the normalized relative edge response (RER), HGM is the geometric mean of the height of the overshoot caused by edge sharpening, G is the noise gain from the edge sharpening, and SNR is the signal-to-noise ratio. Figure 50 illustrates the calculation of RER and H from a normalized edge response. The SNR calculations use Eq. (53), where ρhigh = 15% and ρlow = 7%. The coefficient a equals 3.32 and b equals 1.559 if RERGM ≥ 0.9; a equals 3.16 and b equals 2.817 if RERGM < 0.9. Figure 51 shows image simulations at one-NIIRS increments to illustrate the change in image quality as NIIRS increases from NIIRS 3 to NIIRS 6.

Figure 51. Image simulations at 1 NIIRS increments (predicted NIIRS 3, 4, 5, and 6).
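Equation (55) is simple to evaluate directly. The sketch below uses hypothetical function and variable names (they are not from this article) and assumes the customary GIQE-4 convention that GSD is expressed in inches:

```python
import math

def predicted_niirs(gsd_gm, rer_gm, h_gm, g, snr):
    """Predicted NIIRS from the GIQE, version 4 (Eq. 55).

    gsd_gm: geometric-mean ground sample distance,
    rer_gm: geometric-mean normalized relative edge response,
    h_gm:   geometric-mean overshoot height from edge sharpening,
    g:      noise gain from edge sharpening,
    snr:    signal-to-noise ratio.
    """
    # The coefficients a and b switch at RER_GM = 0.9, as stated in the text.
    if rer_gm >= 0.9:
        a, b = 3.32, 1.559
    else:
        a, b = 3.16, 2.817
    return (10.251 - a * math.log10(gsd_gm) + b * math.log10(rer_gm)
            - 0.656 * h_gm - 0.344 * g / snr)
```

Note that halving the GSD raises the predicted NIIRS by a log10(2), roughly one NIIRS level, consistent with the factor-of-two-per-level design of the scale.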
Figure 52 shows the decimal NIIRS predictions to illustrate the image quality change at 0.2 NIIRS increments.

ABBREVIATIONS AND ACRONYMS
Figure 50. RER and H calculation from a normalized edge response (RER is the slope of the normalized edge response; H is the height of the overshoot).

AVIRIS: airborne visible/infrared imaging spectrometer
bpp: bits per pixel
BWC: bandwidth compression
CCD: charge coupled device
DIRSIG: digital image and remote sensing image generation
DPCM: differential pulse code modulation
ERTS: earth resources technology satellite
FWHM: full width at half maximum
GIFOV: ground-instantaneous field of view
GIQE: general image quality equation
GOES: geostationary operational environmental satellite
GRD: ground resolvable distance
Table 5. Multispectral NIIRS Criteria

0: Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.

1: Distinguish between urban and rural areas. Identify a large wetland (greater than 100 acres). Detect meander flood plains (characterized by features such as channel scars, oxbow lakes, meander scrolls). Delineate coastal shoreline. Detect major highway and rail bridges over water (e.g., Golden Gate, Chesapeake Bay). Delineate extent of snow or ice cover.

2: Detect multilane highways. Detect strip mining. Determine water current direction as indicated by color differences (e.g., tributary entering larger water feature, chlorophyll, or sediment patterns). Detect timber clear-cutting. Delineate extent of cultivated land. Identify riverine flood plains.

3: Detect vegetation/soil moisture differences along a linear feature (suggesting the presence of a fence line). Identify major street patterns in urban areas. Identify golf courses. Identify shoreline indications of predominant water currents. Distinguish among residential, commercial, and industrial areas within an urban area. Detect reservoir depletion.

4: Detect recently constructed weapon positions (e.g., tank, artillery, self-propelled gun) based on the presence of revetments, berms, and ground scarring in vegetated areas. Distinguish between two-lane improved and unimproved roads. Detect indications of natural surface airstrip maintenance or improvements (e.g., runway extension, grading, resurfacing, bush removal, vegetation cutting). Detect landslide or rockslide large enough to obstruct a single-lane road. Detect small boats (15–20 feet in length) in open water. Identify areas suitable for use as light fixed-wing aircraft (e.g., Cessna, Piper Cub, Beechcraft) landing strips.

5: Detect automobile in a parking lot. Identify beach terrain suitable for amphibious landing operation. Detect ditch irrigation of beet fields. Detect disruptive or deceptive use of paints or coatings on buildings/structures at a ground forces installation. Detect raw construction materials in ground forces deployment areas (e.g., timber, sand, gravel).

6: Detect summer woodland camouflage netting large enough to cover a tank against a scattered tree background. Detect foot trail through tall grass. Detect navigational channel markers and mooring buoys in water. Detect livestock in open but fenced areas. Detect recently installed minefields in ground forces deployment area based on a regular pattern of disturbed earth or vegetation. Count individual dwellings in subsistence housing areas (e.g., squatter settlements, refugee camps).

7: Distinguish between tanks and three-dimensional tank decoys. Identify individual 55-gallon drums. Detect small marine mammals (e.g., harbor seals) on sand/gravel beaches. Detect underwater pier footings. Detect foxholes by ring of spoil outlining hole. Distinguish individual rows of truck crops.
GSD: ground sampled distance
GSS: ground spot size
IRARS: imagery resolution assessment and reporting standards
ISEL: imaging satellite elevation angle
MODTRAN: moderate resolution transmittance
MSS: multispectral scanner
MTF: modulation transfer function
NASA: national aeronautics and space administration
NIIRS: national imagery interpretability rating scale
NOAA: national oceanic and atmospheric administration
OTF: optical transfer function
PSF: point spread function
QSE: quantum step equivalence
RBV: return beam vidicon
RER: relative edge response
SAR: synthetic aperture radar
SNR: signal-to-noise ratio
SPOT: système probatoire d'observation de la terre
TDI: time delay and integration
TIROS: television and infrared observation satellite
Figure 52. Image simulations at 0.2 NIIRS increments (predicted NIIRS 4.1, 4.3, 4.5, 4.7, 4.9, and 5.1).
TM: thematic mapper
UAV: unmanned aerial vehicle
BIBLIOGRAPHY

1. F. J. Janza, ed., Manual of Remote Sensing, American Society of Photogrammetry, Falls Church, 1975.
2. R. Bailey, The Air War in Europe, Time-Life Books, Alexandria, 1979.
3. C. Peebles, The Corona Project, Naval Institute Press, Annapolis, 1997.
4. D. Day, J. Logsdon, and B. Latell, Eye in the Sky, Smithsonian Institution Press, Washington, 1998.
5. T. Lillesand and R. Kiefer, Remote Sensing and Image Interpretation, John Wiley & Sons, Inc., NY, 2000.
6. J. R. Schott, Remote Sensing: The Image Chain Approach, Oxford University Press, NY, 1997.
7. R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing, Academic Press, NY, 1997.
8. J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, NY, 1968.
9. J. D. Gaskill, Linear Systems, Fourier Transforms, and Optics, John Wiley & Sons, Inc., NY, 1978.
10. G. C. Holst, CCD Arrays, Cameras, and Displays, SPIE Optical Engineering Press, Bellingham, 1998.
11. R. D. Fiete, Opt. Eng. 38, 1,229–1,240 (1999).
12. J. C. Leachtenauer, ASPRS/ASCM Annual Convention and Exhibition Technical Papers: Remote Sensing and Photogrammetry 1, 262–272 (1996).
13. K. Riehl and L. Maver, Proc. SPIE 2829, 242–254 (1996).
14. J. C. Leachtenauer, W. Malila, J. Irvine, L. Colburn, and N. Salvaggio, Appl. Opt. 36, 8,322–8,328 (1997).

INFRARED THERMOGRAPHY

BRENT GRIFFITH
DANIEL TÜRLER
HOWDY GOUDEY
Lawrence Berkeley National Laboratory
Berkeley, CA
INTRODUCTION

Infrared (IR) thermographic systems, or IR imagers, provide images that represent surface temperatures, or thermograms, by measuring the magnitude of infrared radiation emanating from the surface of an object. Because IR imagers see the radiation naturally emitted by objects, imaging may be performed in the absence of any additional light source. Modern IR imagers resolve surface temperature differences of 0.1 °C or less. At this high sensitivity, they can evaluate subtle thermal phenomena that are revealed only as slight temperature gradients. Some applications that employ IR thermography include inspections for predictive maintenance, nondestructive evaluation of thermal and mechanical properties, building science, military reconnaissance and weapons guidance, and medical imaging.

Infrared thermography can be used both as a qualitative and a quantitative tool. Some applications do not require exact surface temperatures. In such cases, it is sufficient to acquire thermal signatures, characteristic patterns of relative temperatures of phenomena or objects. This method of qualitative visual inspection is expedient for collecting a large number of detailed data and conveying them so that they can be easily interpreted. In contrast, accurate quantitative thermography demands a more rigorous procedure to extract valid temperature maps from raw thermal images. However, the extra effort can produce large arrays of high-resolution temperature data that are unrivaled by contact thermal measurement techniques, such as using thermocouple wires.

A skilled operator of an IR thermographic system, or thermographer, must be conscious of the possibility that reflected or transmitted, rather than emitted, IR radiation may be emanating from an object. These additional sources manifest themselves as signals that appear to be, but are not actually, based exclusively on the temperature of the spot being imaged. To understand the challenges and possibilities of IR thermography, it is first necessary to review the principles of physics on which it relies.

THEORY OF OPERATION

The fundamental principles that make IR thermal imaging possible begin with the observation that all objects emit a distribution of electromagnetic radiation that is uniquely related to the object's temperature. Temperature is a measure of the internal energy within an object, a macroscopic average of the kinetic energy (the energy of motion) of the atoms or molecules of which the object is composed. Electromagnetic radiation arises from the oscillation of electrostatically charged particles, such as the charged particles found within an atom, the electron and the proton. Electromagnetic radiation propagates by the interaction between oscillating electric and magnetic fields and can sustain itself in the absence of any conveying media. The wavelength, the distance between successive peaks in the oscillations of the electric and magnetic fields, can vary across a wide range that represents a diverse range of phenomena, including radio transmissions, microwaves, infrared (IR), visible and ultraviolet (UV) light, X rays, and gamma rays. The IR portion of the electromagnetic spectrum, which is of primary interest to thermographers, includes wavelengths from about 1 to 100 µm.

The interaction of materials with radiation of different wavelengths is extremely varied. Electromagnetic radiation may be absorbed, reflected, or transmitted by a material, depending on the material properties with respect to the wavelength in question. The IR band of radiation is considered "thermal" mostly because it contains the wavelengths of radiation emitted by objects at ordinary temperatures. However, if it were not for the high absorption of IR by most objects, it would not be nearly as important in heat transfer. For example, human skin absorbs well in the IR; for this reason, we perceive IR radiation as heat more readily than we perceive radiation of other wavelengths, such as X rays, which are mostly transmitted. Visible radiation is also considered thermally important because visibly dark objects absorb it well and because it is a substantial component of the radiation emitted by the sun.

An object at a single temperature does not simply emit a single wavelength of electromagnetic radiation. Because temperature is a macroscopic average of molecular-scale oscillations, there is, in fact, a distribution of molecular kinetic energies underlying a single temperature. Correspondingly, there is a distribution of wavelengths and intensities of electromagnetic radiation emitted by an object at a single temperature, as a result of the varied oscillation rates of the charged particles within. Using a theoretical, idealized emitter termed a blackbody, Planck first derived a mathematical expression for the emissive power of radiation as a function of wavelength and temperature; hence, it is known as the Planck distribution. Qualitatively, at low temperatures, the shape of the distribution is broad and does not have a well-defined peak. At higher temperatures, the distribution is narrower, and the peak is very well defined. Figure 1 shows Planck distributions, or blackbody curves, for objects at 5,800 K (the temperature of the sun), 2,000 K, 1,273 K (1,000 °C), 373 K (100 °C), and 293 K (20 °C, room temperature). It can also be seen in Fig. 1 that the wavelength emitted with the most intensity, or peak wavelength, is also a function of temperature. The Wien displacement law expresses this relationship of temperature T (in kelvin) to the peak wavelength λpeak (in µm) of the radiation emitted by a body:

λpeak = 2897.8/T.   (1)

Using this expression, it is clear that most commonly encountered temperatures correspond to radiation in the IR band (1 to 100 µm). For example, a temperature of 300 K (room temperature) corresponds to a wavelength of about 10 µm. In contrast, the surface of the sun is 5,800 K; hence, the peak wavelength of solar emission is in the visible portion of the spectrum (about 0.5 µm). Because
Figure 1. Blackbody emissive power (W/m²) vs. wavelength (µm) for 5,800 K, 2,000 K, 1,273 K, 373 K, and 293 K.
most terrestrial temperatures yield emission in the IR band, it is important to recognize that, in IR imaging, unlike in common experience in visible imaging, almost every object in the field of view is a source, not just a reflector of a source. Integrating the Planck distribution over all wavelengths for a given temperature yields the blackbody emissive power Eb. Equation (2) is a simple expression for Eb called the Stefan–Boltzmann law (σ is the Stefan–Boltzmann constant, 5.67 × 10⁻⁸ W/m²·K⁴):

Eb = σT⁴.   (2)
This relationship is at the core of IR thermography because it is the emissive power that an IR imager physically measures, whereas temperature is the parameter of interest. Furthermore, the success of IR thermography is highly dependent on the fortuitous fourth-power relationship, which makes emissive power a very strong function of temperature. The large response makes it possible to achieve excellent temperature resolution. Because it is derived from Planck's distribution, the Stefan–Boltzmann law is limited to describing an idealized blackbody emitter. In reality, no body emits exactly as much power as a blackbody. The ratio of the power actually emitted by a given body to the power emitted by a blackbody at the same temperature is called the emissivity and is represented by a nondimensional number between zero and one. Emissivity is a wavelength-, temperature-, and incidence-angle-dependent property specific to each material. The term emittance, or e, is more useful to the thermographer because it is used to describe the emissivity of an object in aggregate across a range of wavelengths, temperatures, and incidence angles. Usually this simplified quantity is valid only across a limited range of these parameters, within which the variation in emissivity with respect to each parameter is not highly significant. A surface that meets the criterion of nearly constant emissivity with respect to wavelength and incidence angle within a certain regime is termed a gray surface and can be characterized by a single emittance for simplified calculations. Published values of emittance should be used only when the wavelength, temperature, and angular parameters agree closely with those for the IR imaging arrangement at hand. Surface emittance needs to be well understood for successful operation of IR thermographic systems. Introducing emittance into Eq. (2) yields Eq. (3), the Stefan–Boltzmann law modified by emittance e:

E = eσT⁴.   (3)
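Equations (1)–(3) are simple enough to check numerically; the following sketch uses hypothetical function names (they are not from this article):

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def wien_peak_um(T):
    """Peak emission wavelength in micrometers for temperature T in kelvin, Eq. (1)."""
    return 2897.8 / T

def emissive_power(T, emittance=1.0):
    """Emitted flux (W/m^2) of a gray surface at temperature T in kelvin, Eq. (3).

    With emittance = 1 this reduces to the blackbody Stefan-Boltzmann law, Eq. (2).
    """
    return emittance * SIGMA * T**4

# Room temperature peaks near 10 um (long-wave IR); the sun, at 5,800 K,
# peaks near 0.5 um (visible), as discussed in the text.
```

The fourth-power dependence is what gives thermography its temperature resolution: doubling the absolute temperature multiplies the emitted power by sixteen.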
Thermal radiation is largely a surface phenomenon because most materials are not transparent to IR. As a result, it is the material properties of the surface of an object that determine emittance. Polished metals have low emittances, but a thin layer of paint can transform them to high emittance. For materials that are opaque to IR radiation, emittance can be considered the complement of reflectance (the amount of incident radiation that is reflected by a surface), expressed by 1 − e. It is important to realize that because no real surface has an emittance
e = 1, the radiation that is viewed coming from an object is always a combination of emitted and reflected radiation, termed radiosity, and it contains information regarding both the temperature of the object and its surroundings. Special techniques described within this article are necessary to distinguish the two components and obtain accurate surface temperatures for the object of interest. When imaging outdoors, it should be taken into account that the sun is a significant source of IR radiation, particularly in the shorter wavelengths of the IR band.

TYPES OF IMAGING SYSTEMS

General Characteristics

Infrared thermographic systems are essentially imaging IR radiometers. Often, they provide IR images continuously, in real time, similar to the TV image provided by conventional video cameras. The imager itself contains, at a minimum, a detector and an image formation component. Complete thermographic systems also integrate an image processing and display system. An IR imager is often called radiometric when it is designed for measuring temperatures. Nonradiometric IR imagers are used in applications that do not require measuring quantitative temperature differences, but rather are satisfied by a qualitative image display. For example, this type of imager is used for night vision and surveillance. Nonradiometric imagers do not need extensive calibration, thermal stability, or image processing capabilities, which makes them less expensive. The two basic types of IR imagers are focal plane arrays (FPA) and scanners. There are also two categories of detector technologies used in these systems: photon absorbers and photon detectors. They are summarized in Table 1.
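As a minimal illustration of separating the two radiosity components (this is a simplifying sketch, not the article's procedure), an opaque gray surface in a uniform blackbody surround can be corrected as follows; all names and the surround model are assumptions:

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def surface_temperature(radiosity, emittance, T_background):
    """Invert M = e*sigma*T^4 + (1 - e)*sigma*Tb^4 for the true surface temperature T.

    radiosity:    measured exitance from the surface (W/m^2),
    emittance:    surface emittance e (0..1), opaque gray surface assumed,
    T_background: effective temperature (K) of the reflected surroundings.
    """
    # Reflected component: reflectance of an opaque surface is (1 - e).
    reflected = (1.0 - emittance) * SIGMA * T_background**4
    return ((radiosity - reflected) / (emittance * SIGMA)) ** 0.25
```

The lower the emittance, the larger the reflected term, and the more sensitive the recovered temperature is to errors in the assumed background, which is why low-emittance surfaces are often painted or taped before quantitative measurement.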
System Selection

The choice of an imaging system for a particular application depends on a number of variables, including the temperatures of the specimens to be measured, the amount of money available to purchase thermographic equipment, the necessary measurement accuracy, the ease of use, and the appropriate wavelength for the application. Table 1 lists typical applications for the various types of IR imagers. Shorter wavelength IR imagers are also referred to as near-IR imagers, because the wavelengths are near the visible range (1 to ∼8 µm). IR imagers that are sensitive in the longer wavelengths (∼8 to 14 µm) of the IR band are typically called long-wave imagers. Near-IR imagers work well for high-temperature subjects, but there is almost always enough thermal radiation for high-temperature objects to be imaged using long-wave imagers as well. Near-IR imagers are often used to image lasers, such as in the development of LIDAR systems. Long-wave IR imagers experience fewer problems when measuring in sunlight, because the solar spectrum peaks in the visible and has very little power at longer IR wavelengths. More materials are transparent to near-IR than to long-wave IR. To measure the surface temperature of an object, it is important to choose a detector that
uses wavelengths to which the object of interest is opaque. Conventional glass is a common material that is transparent to near-IR yet opaque to long-wave IR. IR imagers that have large array sizes tend to be more expensive to produce, largely because of their increased complexity and the challenge of making large arrays without excessive numbers of nonoperational pixels. For many applications, it will be important to consider the imager's instantaneous field of view (IFOV), measured in milliradians, a combination of array size and optical arrangement that determines the physical dimensions represented by any one pixel at a given distance from the imager. Special optics are available to allow IR imaging on the micron length scale; however, most IR imagers have a wide field of view, resulting in pixels whose physical dimensions are of the order of millimeters for object distances in meters. Some detectors require cooling to low temperatures using liquid nitrogen or a closed-cycle Stirling cooler. The need for cooling can add inconvenience, expense, and start-up delay, depending on the cooling method used. Stirling coolers are expensive and require several minutes of operation before the imager can be used. Liquid nitrogen cooling is inexpensive; however, it can be inconvenient for field use, where it may not be readily available. There are also detector technologies that do not require cooling.

Table 1. Summary of Specifications of Various Types of IR Imaging Systems

FPA, indium gallium arsenide (InGaAs): resolutions 320 × 256 and 640 × 512 pixels; response range 0.9 to 1.68 µm; NETD (noise equivalent temperature difference) not applicable; typical uses: IR laser/LED development, high temperatures.

FPA, indium antimonide (InSb): resolutions 320 × 256 and 640 × 512 pixels; response ranges 1.0 to 5.4 and 3 to 5 µm; NETD 25 mK; typical uses: general, military imaging, fast imaging.

FPA, microbolometer (absorber): resolutions 160 × 128 and 320 × 240 pixels; response range 8 to 14 µm; NETD 100 mK; typical uses: general, predictive maintenance, fire fighting.

FPA, quantum well infrared photodetectors (QWIP): resolutions 320 × 256 and 640 × 512 pixels; response range narrow band within 8 to 10 µm; NETD 30 mK; typical uses: R&D, laboratory.

Scanner, mercury cadmium telluride (HgCdTe or MCT): resolutions variable, e.g., 173 × 240 and 1,024 × 600 pixels; response ranges 8 to 14 and 3 to 5 µm; NETD 50 mK; typical uses: R&D, laboratory.
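The IFOV consideration above is a small-angle product of angular pixel subtense and working distance; a minimal sketch (function name assumed):

```python
def pixel_footprint_m(ifov_mrad, distance_m):
    """Approximate linear size (m) of the scene patch seen by one pixel.

    Small-angle approximation: footprint = IFOV (radians) * distance.
    """
    return ifov_mrad * 1e-3 * distance_m

# A 1-mrad IFOV at a 3-m working distance covers about 3 mm per pixel,
# consistent with the millimeter-scale pixels at meter distances noted in the text.
```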
Focal Plane Arrays

Focal plane array (FPA) imagers are the most common types of systems today. They are analogous to the CCD arrays found in handheld video cameras but are sensitive to IR, rather than to visible radiation. In an FPA, each pixel that makes up an IR image is measured by an individual detector. The detectors are arranged in a flat, two-dimensional array. One-dimensional arrays are also used to provide line imaging. The array is placed in the focal plane of the optical system (lens) of the imager, as diagrammed in Fig. 2. Typically, array size does not exceed 1,024 × 1,024 pixels. However, larger arrays are produced on a custom basis for specialized uses, such as astronomical imaging. Operability describes the percentage of functional pixels in an array. Most fabrication processes yield operability greater than 99%. Still, most arrays inevitably contain bad pixels. Software usually masks this defect by interpolating missing data from neighboring pixels. In contrast to scanners, FPA imagers are more mechanically robust and provide improved depth of field and field of view as a result of a simple optical design. FPAs do not have any moving parts, other than a simple focus adjustment.

Figure 2. Focal plane array detectors. See color insert.

Scanners

Figure 3. Scanning system configurations. See color insert.

In a scanning imaging system, one or several detectors are combined with a single or multiple-axis mirror system. Images are acquired sequentially by combining
individual measurements. Scanning mirror imagers were used before large FPAs became readily available in the 3–12 µm range. Multiple detector units are used in multispectral imaging systems; each detector records a certain band of the spectrum. Old Landsat satellites are equipped with multispectral scanning systems. Figure 3 is a diagram of two scanning designs. So-called whisk-broom scanning uses a two-axis mirror system. Single-point measurements are combined into a line; many lines then compose the final image. Interlaced whisk-broom scanning acquires every other line initially and then fills in the remaining lines in a second pass. Noninterlaced whisk-broom scanning acquires all lines sequentially. Older IR scanners are usually of the whisk-broom type. Push-broom scanning acquires an entire line at once using a linear array detector. A single-axis mirror configuration scans perpendicularly to the linear array to compose the image. Some systems can be set to different scanning speeds; 60 and 30 Hz frame rates are common. Scanning systems have the advantage that they can acquire image arrays of any size. Often the manufacturer fixes the image array size; however, there are also systems that dynamically configure the scanning head to acquire the desired image array. Mercury cadmium telluride detectors are the most frequently used single-element detectors in handheld scanning systems, although any detector that has a sufficiently fast dynamic response could be used. Scanners are delicate mechanical instruments that do not always tolerate vibration or high instrument acceleration. Another disadvantage is that the dynamic response of the detector limits the maximum gradient obtainable in an image at a given scanning speed. As a result, frame rates are relatively low compared to those of FPAs. The acquisition time delay across the image may also pose a problem when capturing rapid transient events.

Photon Absorbers

A microbolometer is an example of a photon absorber.
The absorber is made of a passive energy-absorbing material that is simply warmed by the IR radiation. Incoming thermal radiation results in an increase in absorber temperature that is proportional to the radiosity of the surface being imaged. The detector temperature is then determined by measuring a temperature dependent material property such as electrical resistance. The absorber has to be thermally decoupled from the substrate and the environment for maximum performance. Absorbing materials perform well across a wide range of the spectrum. Microbolometer arrays are temperature
stabilized, but do not need to be cooled to cryogenic temperatures, as is necessary for some photon detectors discussed in the following section. Pixels are typically 30 × 30 µm and are micromachined in monolithic wafers that also incorporate signal acquisition and processing. Individual pixels are suspended and electrically connected by two arms. Heat exchange through gas convection is suppressed by packing the array in a hard vacuum housing. The broadband absorption intrinsic to the detector is thus limited to 6–14 µm by the transmission of the vacuum housing window. This technology is likely to see further development in the near future that will increase sensitivity (NETD less than 50 mK).

Photon Detectors

Photon detectors are active elements. A photon striking the detector triggers a free charge, which is collected and amplified by an electronic circuit. Detectors and readout circuits are constructed on different substrates and are electrically connected into a hybrid assembly by indium bump bonding. A variety of detector materials is used today; each has a specific spectral range and specialized application. Some detectors require cryogenic cooling to reduce the dark current (the amount of current passed by the detector in the absence of any photon signal) to acceptable levels. The detectors in Table 2 are the most commonly used today.

Indium Gallium Arsenide (InGaAs). Indium gallium arsenide detectors work best mounted on a lattice-matched substrate. Unfortunately, this arrangement also limits the spectral response to 0.9–1.6 µm, which makes this detector material unsuitable for room temperature applications. However, InGaAs is well suited for measuring temperatures above 500 °C. Uncooled FPA and linear arrays are produced in large quantities and many size variations. This is the detector of choice for military applications such as heat-seeking missiles. InGaAs arrays employed in general purpose IR imagers are commonly 320 × 240 pixels.
Indium Antimonide (InSb). Indium antimonide detectors are functional in the 3–5 µm range, so they are useful in room temperature IR imagers. Glass and certain plastics are partially transparent in this range of the spectrum and need to be painted or covered by an opaque adhesive tape to collect accurate radiometric data. Some materials exhibit larger reflectivity in this spectral range compared to the 8–12 µm range, so distinguishing emitted radiation from reflected background radiation may prove more challenging. Arrays up to 1,024 × 1,024 pixels and NETD
Table 2. Focal Plane Array (FPA) Devices

InGaAs: wavelength 0.9–1.6 µm; FPA size 320 × 240; sensing by detection; operability >99%.

InSb: wavelength 3.0–5.0 µm; FPA size 1,024 × 1,024; sensing by detection; operability >99%.

QWIP (GaAs/AlGaAs): wavelength 8.0–9.0 µm; FPA size 512 × 512; sensing by detection; operability >99%.

Microbolometer: wavelength 7.5–13.5 µm (limited by optics); FPA size 320 × 240; sensing by absorption; operability typical of Si process.

HgCdTe: wavelength 1.0–20.0 µm (limited by optics); FPA size 1,024 × 1,024; sensing by detection; operability 85–95%.
Cooling: Uncooled; Cooled to 70 K; Cooled to ∼ …

… 0.5, there is a one-to-one relationship between the wave-front variance and the Strehl ratio (29). However, for lower values, where the PSF may not have a well-defined single maximum, its peak value is no longer relevant. In such a case, the energy spread may be characterized by the encircled energy, as in the next section. The wave-front variance remains a useful way of specifying the departure of the wave front from sphericity, especially because it can be readily measured by interferometric techniques. However, it is not necessarily a good way of specifying image quality for all applications. For example, if the final detector is the human eye, it is known that the wave-front variance is poorly correlated with subjective image quality (30).

Encircled Energy. There is no single criterion akin to the Strehl ratio for encircled energy. Although such criteria could be devised (e.g., the radius encircling a fixed amount of energy or the amount of energy inside a fixed radius), no particular need for doing so has arisen. Encircled energy becomes important in the presence of a photodetector of limited area or a photodetector array such as a CCD. Then, it is important to know how much of the PSF energy is contained in the pixel. Because pixels are typically square, we may also speak of "ensquared" energy. Figure 34 compares the encircled energy between the PSFs of Fig. 29 (diffraction-limited system with and without 12% central obscuration). Various cases have been examined by Mahajan (31).

Line- and Edge-Spread Functions

Line-Spread Function. The line-spread function (LSF) is the irradiance (incoherent) or the complex amplitude (coherent) distribution in the image of an ideal line object. Image formation can be described in terms of the LSF for the orientation perpendicular to the line.
In other words, we can write the irradiance distribution in any direction in the image as a convolution of the irradiance distribution in the object (in the same direction) and the LSF. The LSF has two advantages: it reduces all two-dimensional integrals to one-dimensional ones, and it is generally easier to measure the irradiance distribution in the image of a line than in the image of a point. The LSF is obtained from the PSF by simply integrating in the
OPTICAL IMAGE FORMATION
Figure 34. Encircled energy comparison between an obscured (squares) and unobscured (triangles) system, corresponding to the PSFs of Fig. 29.
direction of the line:

LSF ≡ G(u) = ∫_{−∞}^{+∞} G(u, v) dv   (47)

In the incoherent case and for a system that has no aberrations and a uniformly illuminated circular aperture, the LSF is given by integrating Eq. (34), expressed in rectangular coordinates. The form of the function is given in Fig. 35. The behavior of the LSF in the presence of aberrations can also be readily predicted numerically. Aberration tolerance theory based on the LSF has been developed (32).

Figure 35. Incoherent LSF (diamonds) for a diffraction-limited system that has a circular pupil, compared with the corresponding PSF (squares).

If the line object is coherently and cophasally illuminated, the complex amplitude distribution in the image is given simply by

A(u) = (const) sin(2πu)/(πu)   (48)

and the corresponding irradiance distribution by the square of this function. The two functions are shown in Fig. 36. The irradiance distribution in the coherent LSF has zeros, unlike the incoherent case. The location of the first zero is given by u = 0.5, or, for nonnormalized coordinates (Eq. 44),

ξ = 0.5λ/(n sin a)   (49)

Figure 36. Amplitude (diamonds) and irradiance (squares) distribution in the image of the coherent LSF of a system that has no aberrations and a circular pupil.

Edge-Spread Function. The edge-spread function (ESF) is the irradiance (incoherent) or complex amplitude (coherent) distribution in the image of an edge. In the incoherent case, there is no closed-form expression even for a diffraction-limited system. The ESF is obtained by convolving the LSF with a step function that represents the edge. Alternatively, we may express the LSF [G(u)] as the derivative of the ESF [E(u)]:

G(u) = dE(u)/du.   (50)

This formula is more useful in practice because the most common use of the ESF is to determine the LSF and, through it, the modulation transfer function (MTF; see next section). The incoherent ESF is a monotonic function for any combination of aberrations. It is shown in Fig. 37. For the coherent case in the absence of aberrations, the complex amplitude distribution in the image of an edge is given by

A(u) = 0.5 + (1/π) Si(2πu)   (51)

(assuming cophasal illumination), where Si(z) = ∫_0^z (sin x/x) dx, the so-called sine integral. The
The Image as a Fourier Transform
Figure 37. Incoherent ESF (squares) for a system that has no aberration and a circular pupil, compared with the corresponding LSF (diamonds).
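The chain PSF → LSF → ESF can be verified numerically: integrating the PSF along the line direction gives the LSF [Eq. (47)], the running integral of the LSF gives the ESF, and differentiating the ESF recovers the LSF [Eq. (50)]. A minimal sketch, using a Gaussian stand-in for the PSF (an assumption made only so that the result is easy to check analytically; the relations hold for any PSF):

```python
import numpy as np

# Sample a 2-D PSF on a grid; a Gaussian stands in for the diffraction PSF.
n = 512
x = np.linspace(-8.0, 8.0, n)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
psf = np.exp(-(X**2 + Y**2))                 # G(u, v)

# Eq. (47): LSF = integral of the PSF along the line direction (v).
lsf = psf.sum(axis=0) * dx                   # analytically sqrt(pi)*exp(-u^2)

# ESF = LSF convolved with a unit step, i.e. the running integral of the LSF.
esf = np.cumsum(lsf) * dx

# Eq. (50): the LSF is recovered as the derivative of the ESF.
lsf_back = np.gradient(esf, dx)

err = np.max(np.abs(lsf - lsf_back))         # small, limited by grid spacing
```

For a measured edge trace, the same two steps (cumulative sum and derivative) connect the recorded ESF to the LSF, and a Fourier transform of the LSF then yields the transfer function along one azimuth, as discussed later in the article.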
Any periodic function f (x) that satisfies certain simple conditions (see Appendix) can be written as a summation of sinusoidal functions whose frequencies are equal to integral multiples of the frequency of the original function. Thus an image (or irradiance distribution) can be analyzed in terms of its harmonic (Fourier) components, or the components can be added (synthesized) to reconstruct the image. Fourier synthesis is illustrated by an example in Fig. 39, where we show a periodic function (sawtooth wave) and the function that results from summing the first four harmonic components (whose periods are p, p/2, p/3, p/4). A remarkably good representation of the function is obtained, even with only four harmonic components. The (ir)radiance distribution that corresponds to the sawtooth function of Fig. 39 is shown in Fig. 40. A sinusoidal irradiance distribution, whose period is the same as the sawtooth, is shown in Fig. 41. It is the addition of sinusoidal patterns (or gratings) similar to this that can be used to synthesize the distribution of Fig. 40, but there is a subtlety that must be appreciated.
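The four-term synthesis of Fig. 39 can be reproduced directly. The sketch below sums the truncated Fourier series of the sawtooth (period p = 1) and checks that adding more harmonics improves the fit away from the discontinuity:

```python
import numpy as np

def sawtooth(x):
    # Sawtooth of slope -1 and period 1 (mean value 1/2).
    return 1.0 - np.mod(x, 1.0)

def partial_sum(x, n_terms):
    # Truncated Fourier series of the sawtooth, Eq. (52) with p = 1:
    # f(x) ~ 1/2 + (1/pi) sum_{n=1}^{N} (1/n) sin(2 pi n x)
    s = np.full_like(x, 0.5)
    for n in range(1, n_terms + 1):
        s += np.sin(2.0 * np.pi * n * x) / (np.pi * n)
    return s

# Compare away from the discontinuity, where the series converges well.
x = np.linspace(0.05, 0.95, 181)
err4 = np.max(np.abs(partial_sum(x, 4) - sawtooth(x)))
err50 = np.max(np.abs(partial_sum(x, 50) - sawtooth(x)))
```

Even the four-term sum tracks the sawtooth to within roughly a tenth of its range on this interval, consistent with the ‘‘remarkably good representation’’ noted above.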
Figure 38. Amplitude (diamonds) and irradiance (squares) distribution in the coherent image of an edge for a system that has no aberrations and a circular pupil.
Figure 39. A sawtooth function whose slope is −1 and period 1, and its harmonic approximation limited to the first four Fourier coefficients: f(x) ≅ 1/2 + (1/π) Σ_{n=1}^{4} (1/n) sin(2πnx).
corresponding irradiance pattern, B(u) = |A(u)|², is shown in Fig. 38. It can be seen that the coherent image of an edge shows ringing, unlike the incoherent ESF. The maximum value of the ESF is 1.18 (if the geometric edge is unity). This maximum value occurs at u = 0.5, or ξ = 0.5λ/(n sin a).

IMAGE FORMATION IN THE FREQUENCY DOMAIN (FILTERING). THE OPTICAL TRANSFER FUNCTION

The necessary mathematical background for this section is summarized in the Appendix. A reader unfamiliar with the basics of Fourier transforms will need to consult the Appendix before proceeding further.
Figure 40. A sawtooth irradiance distribution. This function may appear to differ visually from a sawtooth near the discontinuity due to the property of the human eye that exaggerates luminance discontinuities, as well as imperfections in the printing process.
proceeding further, the variables x and y used in the Appendix will be replaced by u and v to keep a consistent notation with the section ‘‘Image Formation in the Spatial Domain,’’ where the image coordinates were (u, v) and the pupil coordinates were (x, y). If f (u, v) is an irradiance distribution, it may be written as
f(u, v) = ∫∫_{−∞}^{+∞} F(σ, τ) exp[i2π(σu + τv)] dσ dτ,   (53)

where

F(σ, τ) = ∫∫_{−∞}^{+∞} f(u, v) exp[−i2π(σu + τv)] du dv.   (54)
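The discrete analogue of Eqs. (53) and (54) is the two-dimensional fast Fourier transform. The sketch below decomposes an arbitrary irradiance array into its frequency components and back, and isolates a single conjugate pair of components to show that each term of the sum is a sinusoidal grating:

```python
import numpy as np

# An arbitrary real "irradiance distribution" f(u, v).
rng = np.random.default_rng(0)
f = rng.random((32, 32))

F = np.fft.fft2(f)            # Eq. (54): spectrum F(sigma, tau)
f_back = np.fft.ifft2(F)      # Eq. (53): synthesis from the components
round_trip_err = np.max(np.abs(f - f_back.real))

# Keep a single conjugate pair of frequency components: the result is a
# pure sinusoidal grating (here varying along one axis only).
F1 = np.zeros_like(F)
F1[0, 3] = F[0, 3]
F1[0, -3] = F[0, -3]
grating = np.fft.ifft2(F1).real
```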
Figure 41. An approximately sinusoidal irradiance distribution.
A function that represents a physically plausible (ir)radiance distribution cannot contain negative values. The full Fourier series expansion of the sawtooth function of Fig. 39 is given by [see Eqs. (A1)–(A4)] f (x) =
1/2 + (1/π) Σ_{n=1}^{∞} (1/n) sin[2π(n/p)x],   (52)
where the coefficients of the cosine terms are all zero. It will be seen that the sinusoidal components contain negative values, unlike the irradiance distribution of Fig. 41. Thus, the harmonic components of an image are not necessarily themselves true images (or physically possible irradiance distributions). In the example of Eq. (52), the addition of the constant term 1/2 makes f(x) overall positive, but this is not the same as adding a series of wholly positive sinusoids. Such a series would be represented as Σ_{n=1}^{∞} [1/n + (1/n) sin(2π(n/p)x)] because the minimum constant that can be added to a sinusoid to make it positive is its own amplitude. This last series would give the same result as Eq. (52) if the series Σ_{n=1}^{∞} (1/n) converged to π/2. However, this series does not converge; hence, the addition of fully positive sinusoids does not reproduce f(x). The fact that there is no physically negative (ir)radiance¹ ultimately does not affect the utility of Fourier analysis in describing an image, nor does it diminish the significance of sinusoidal (ir)radiance distributions as test objects that have special properties. Now, we wish to consider the image in two dimensions and extend the description to nonperiodic functions. Before
¹ There are certain special cases, for example, the phenomenon of lateral inhibition in vision or of adjacency in photography, where the response of the recording medium can be described through a point-spread function that has negative irradiance. The physical interpretation of such negative irradiance is that the excitation of a point reduces the response of neighboring points. This exception has no effect on this discussion, and in any case, it is associated with recording media (retina, film) rather than purely optical systems.
The last two equations show that we can consider an image as the summation (or integral) of an infinite number of sinusoidal components of various periods (frequencies) and orientations. Fourier analysis provides us with an alternative way of thinking about an image: instead of being concerned with its visible features, we are concerned with its spatial frequency content. This shift of viewpoint proves to be of great utility. For example, one may perform operations that directly affect the spatial frequency spectrum of an image; some examples are given in the following sections.

The Optical Transfer Function

Definitions. The fundamental significance of the sinusoidal distribution in image formation is that it remains sinusoidal when transmitted by an optical system, except for a change in amplitude and phase. This is a cornerstone fact of image formation, and it is an immediate consequence of the linearity property of optical systems. Linearity means that no higher harmonics are generated by the system. When this fact is coupled with the Fourier method of describing an image, the general problem of image formation is reduced to the problem of imaging a sinusoid (or better, a set of sinusoids of various frequencies and orientations), just as in the section ‘‘Image Formation in the Spatial Domain,’’ the problem of image formation was reduced to determining the image of a point. And although substituting the image of a point by that of many sinusoids might appear a more laborious way of describing image formation, it ultimately leads to a simpler way of describing the action of an optical system; or, if not simpler, at least complementary and of equal significance.

The concept of the transfer function is perhaps more familiar in audio engineering. Audio amplifiers are designed to have a flat frequency response over the 20 Hz to 20 kHz band.
This means that they must transmit all pure tones (sinusoids) across that band in undiminished (or uniformly amplified) amplitude, independent of frequency. Similarly, they are not supposed to introduce any phase shift, and harmonic distortion (extraneous frequencies not present in the original signal) must be kept very low. An incoherently illuminated optical
system can be thought of as an amplifier with a gain ≤1 that is inherently incapable of introducing harmonic distortion but may introduce a nonlinear, frequency-dependent phase shift. Let

B(u, v) = a + b cos[2π(σu + τv)]   (55)

be a cosinusoidal radiance distribution (grating) or object. We assume incoherent illumination for now. The period d and the orientation θ of this cosinusoidal grating are given by

d = (σ² + τ²)^{−1/2},   θ = arctan(τ/σ),   (56)

where the first of these two relationships arises simply from Pythagoras' theorem. We also define the contrast or modulation of the grating as the ratio b/a. (If both a and b are positive and b ≤ a, then the radiance distribution B is physically possible, that is, it contains no negative values; however, the derivation that follows is independent of that assumption.) Now, apply the convolution Eq. (40) to determine the image of this grating. If G(u′, v′) is the system PSF, the irradiance distribution in the image will be (from Eq. 40)

B′(u′, v′) = ∫∫ {a + b cos[2π(σu + τv)]} G(u′ − u, v′ − v) du dv,   (57)

where it is understood that the integration extends across the image area (or the area across which the PSF remains stationary). Performing the integration and after appropriate substitutions of variables that are not detailed here, we obtain the following result:

B′(u′, v′) = const.{a + bM(σ, τ) cos[2π(σu′ + τv′) + φ(σ, τ)]}   (58)

This equation shows that the irradiance distribution is also cosinusoidal and has the same frequency and orientation as the object, but has different contrast (bM/a) and a different phase (φ). Notice that to derive Eq. (58), we only made the assumptions of linearity and stationarity that are inherent in the convolution Eq. (57). No additional assumptions were needed; hence, the fact that a sinusoid is imaged as a sinusoid is inherent in those two assumptions. The reader may object at this point that the image of the sinusoid cannot be another sinusoid of the same period unless the system has unity magnification. This is true of course, but Eqs. (57) and (58) are strictly correct as written; the normalized coordinates we are using [defined through Eq. (44)] actually account for the magnification because they contain the numerical aperture [see also Eqs. (10) and (24)]. Thus, when we say that the image is a sinusoid of the same frequency, we mean the same frequency in the normalized coordinate space. In real coordinate space, the grating period scales according to the magnification. When numerical values for the frequency are quoted in a system that has nonunity magnification, they usually refer to the image space. The functions M and φ in Eq. (58) are the modulus and phase of a single function:

D(σ, τ) = M(σ, τ) exp[iφ(σ, τ)],   (59)

where D(σ, τ) is given (from the derivation of Eq. 58) as

D(σ, τ) = ∫∫ G(u, v) exp[−i2π(σu + τv)] du dv / ∫∫ G(u, v) du dv,   (60)
where the limits of integration are the same as in Eq. (57). This last equation may explain how G(u, v) seemed to have disappeared in going from Eq. (57) to (58). It was in fact absorbed in D(σ, τ). The function M(σ, τ) is the ratio of the image contrast or modulation (bM/a) to the object contrast (b/a). It is called the modulation transfer function (MTF). The function φ(σ, τ) is the phase shift between image and object. It is called the phase transfer function (PTF). The function D(σ, τ), which has the MTF as its modulus and the PTF as its phase, is called the optical transfer function (OTF). Under the assumptions of linearity and stationarity, the OTF fully characterizes image formation by an optical system. Eq. (60) shows that the OTF is given by the normalized (inverse) Fourier transform of the PSF G(u, v). The two ways of describing image formation (from the PSF and from the OTF) are intimately related. This may first be appreciated from Eq. (60), which relates the PSF and OTF. The convolution theorem [Eqs. (A25), (A26) in the Appendix] also states that if a function can be written as the convolution of two others, its Fourier transform is given by the product of the Fourier transforms of the two functions. From the convolution Eq. (40) that relates the (ir)radiance distribution in the object and the image through the PSF, we obtain the following relationship:

b′(σ, τ) = b(σ, τ)g(σ, τ),   (61)

where g(σ, τ) is the nonnormalized version of the OTF. This equation states that the Fourier (spatial frequency) spectrum of the image [b′(σ, τ)] is given by the Fourier spectrum of the object, multiplied by the OTF. Thus the OTF acts as a spatial frequency filter on the frequency spectrum of the object.

Calculation of the OTF. The steps for calculating the OTF have already been outlined. First, the pupil function f(x, y) must be established, which involves computing (by ray tracing) the wave-front aberration W(x, y) as well as the pupil transmittance [Eq. (42)].
Next, the Fourier transform of the pupil function gives the complex amplitude distribution F(u, v) in the image of a point [Eq. (43)]. The squared modulus of F is the incoherent PSF, G(u, v) [Eq. (45)]. Finally, the
(normalized) inverse Fourier transform of the PSF gives the OTF [Eq. (60)]. This method of computing the OTF is occasionally employed because of the availability of fast Fourier transform (FFT) numerical routines that speed up the calculations considerably. However, there is another, more direct method of computation, which is preferable in many ways (33). This alternative is based on a mathematical theorem that allows us to relate the OTF directly to the pupil function. The theorem may be stated as follows: If a function A(x, y) is the Fourier transform of another function a(u, v), then the autocorrelation of a(u, v) is given by the inverse Fourier transform of the squared modulus of A(x, y) (see also Appendix). In this case, the OTF is the inverse Fourier transform of the squared modulus of the complex amplitude distribution in the image, which in turn is the Fourier transform of the pupil function. Hence, we may write the OTF directly as the autocorrelation of the pupil function:

D(σ, τ) = (1/A) ∫∫ f(x, y)f*(x − σ, y − τ) dx dy   (62)

Figure 42. Illustration of the autocorrelation of a square pupil in the x direction. The area of overlap S is shown shaded.
where A is the pupil area and * denotes complex conjugate. The limits of integration can be considered infinite, but in practice the integral is nonzero only over the area of overlap of f (x, y) and the shifted function f ∗ (x − σ, y − τ ). In writing this equation, it would appear that the physical units are wrong if we are subtracting σ (spatial frequency or inverse length) from x (length). It is important to understand that all variables are normalized to be unitless [see Eq. (44) and associated discussion]. Without the appropriate normalization, all of these equations expressing the PSF, the OTF and the Fourier transformations become a lot more cumbersome. Some simple examples of the OTF are given below, for zero aberrations and unity pupil transmittance, that is, f (x, y) = 1 inside the pupil and zero outside. In that case, the OTF depends solely on the shape of the pupil. Along the axis of a rotationally symmetrical optical system, the pupil shape is typically circular, but off-axis it may be elliptical or more complicated. Of course, it is also possible to have a noncircular aperture stop that would then define the pupil shape. Because there is no phase term, the OTF is the same as the MTF, and the integral, Eq. (62), simplifies to D(σ, τ ) = M(σ, τ ) =
(1/A) ∫∫_S dx dy = S(σ, τ)/A,   (63)
where S is the overlap area between the pupil function and its shifted version. For an example that provides the simplest analytical expression, suppose that the system has a square pupil shape, as in Fig. 42. The pupil area A is 4, and the MTF (S/A) is given by 1 − σ/2. Thus, the MTF is linear in σ and reaches zero at σ = 2. Clearly, the autocorrelation can proceed in directions other than the x or y axes, for example, along the diagonal. In the latter case, the cutoff frequency would be 2√2. The MTF in the x direction is shown in Fig. 43.
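The geometric autocorrelation just described is easy to carry out on a sampled pupil. The following sketch (grid discretization is the only approximation) reproduces the square-pupil result S/A = 1 − σ/2:

```python
import numpy as np

# Square pupil of half-width 1 sampled on a grid; f(x, y) = 1 inside.
n = 512
x = np.linspace(-2.0, 2.0, n)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
pupil = ((np.abs(X) <= 1.0) & (np.abs(Y) <= 1.0)).astype(float)
A = pupil.sum() * dx * dx                    # pupil area, close to 4

def mtf_x(sigma):
    # Eq. (63): overlap area of the pupil and a copy shifted by sigma
    # along x, divided by the pupil area.
    shifted = ((np.abs(X - sigma) <= 1.0) & (np.abs(Y) <= 1.0)).astype(float)
    return (pupil * shifted).sum() * dx * dx / A

m_half = mtf_x(1.0)                          # expect 1 - 1/2 = 0.5
```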
Figure 43. MTF (or OTF) shape for a system that has a square pupil and no aberration, corresponding to the autocorrelation operation of Fig. 42.
In the more common circular pupil of unit radius, the overlap area between two circles has a somewhat more complicated analytical expression. The MTF in this case is given by

D(σ) = M(σ) = (4/π){(1/2) arccos(σ/2) − (σ/4) sin[arccos(σ/2)]},   (64)

and the MTF is the same in any direction. It is shown in Fig. 44. The cutoff frequency is σ = 2 because of the unit radius pupil normalization scheme. In other words, the pupil overlap goes to zero when the shifted function has moved by two radii. By inverting Eq. (44), we can obtain the denormalized spatial frequency N as

N = (n sin a)σ/λ,   (65)
where all quantities refer to the image space. The OTF in the Presence of Aberrations. In the presence of aberrations, the MTF and the OTF are no longer the same. The simplest phase term or aberration is defocus. The effect of defocus on the OTF was determined analytically by Hopkins (34), although nowadays similar computations are always performed numerically. Like the PSF, the OTF in the presence of defocus is a wholly
Figure 44. MTF (or OTF) for a system that has a circular pupil and no aberrations. The normalized spatial frequency σ relates to the nonnormalized frequency N through σ = λN/(n sin a), where n sin a is the numerical aperture in the image space.
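Equation (64) can be checked against a direct overlap computation on a sampled circular pupil. A minimal sketch (pixel-counting accuracy only):

```python
import numpy as np

def mtf_circular(sigma):
    # Eq. (64): diffraction-limited MTF of a unit-radius circular pupil
    # (normalized cutoff at sigma = 2).
    s = np.clip(sigma / 2.0, 0.0, 1.0)
    return (4.0 / np.pi) * (0.5 * np.arccos(s) - (s / 2.0) * np.sin(np.arccos(s)))

# Numerical check: overlap area of two unit circles whose centers are
# sigma apart, divided by the circle area (pixel counting).
n = 801
x = np.linspace(-2.0, 2.0, n)
X, Y = np.meshgrid(x, x)
pupil = (X**2 + Y**2 <= 1.0)
A = pupil.sum()

def mtf_overlap(sigma):
    shifted = ((X - sigma)**2 + Y**2 <= 1.0)
    return (pupil & shifted).sum() / A
```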
real function, but may contain negative values. This is illustrated in Fig. 45. The negative values imply a phase shift of 180° which produces a contrast reversal in practice; the bright grating bars become dark and vice versa. This is depicted in Fig. 46 using a target that has two different spatial frequencies. When the MTF goes to zero, the corresponding spatial frequency is not transmitted. This is a form of incoherent spatial filtering; certain frequencies are filtered out of the image, and others are modified in different ways. Such incoherent spatial filtering can also be performed by pupil apodization, that is, by modifying the pupil transmittance. For an extreme example, suppose that the pupil is obstructed by a mask that allows light to pass through only two squares as shown in Fig. 47. By performing the autocorrelation of the pupil function in the absence of aberrations [Eq. (63)] in the +x direction, we obtain the MTF shape in Fig. 48 that shows an entire band of missing frequencies. A less extreme example is given in Fig. 49, where the MTF of a centrally obscured system such as an astronomical telescope is compared with the MTF of an unobscured system of the same aperture. It can
Figure 46. (a) A square wave object that has two different frequencies, such that the low frequency falls in the positive part of the MTF of Fig. 45, and the high frequency falls in the negative part. (b) The image of the square wave object through a system that has an OTF, as shown in Fig. 45. Notice that the bright and dark bars are reversed for the high-frequency grating (contrast reversal).
be seen that the effect of the obscuration is to suppress low frequencies and to enhance higher frequencies. In the presence of aberrations other than defocus, the MTF can take a variety of shapes that are constrained by the following conditions (35):

1. The MTF has a maximum of unity at zero spatial frequency³ (this says in effect that the best imaged object is no object at all; any contrast in the object will be attenuated).

2. At any given spatial frequency, the aberrated MTF is always lower than the unaberrated MTF of a system that has the same aperture.

3. The MTF slope at the origin is independent of aberration.
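The contrast reversal of Figs. 45 and 46 can be reproduced by autocorrelating a defocused pupil function. The sketch below builds f(x, y) = exp[i2πW20(x² + y²)] on a unit-radius pupil (W20 in waves) and evaluates its autocorrelation through the FFT theorem quoted earlier; with one wave of defocus, the real OTF dips below zero:

```python
import numpy as np

# Unit-radius pupil on a zero-padded grid, with a defocus phase
# W(x, y) = W20 * (x^2 + y^2), W20 in waves.
n = 256
x = np.linspace(-2.0, 2.0, n)
X, Y = np.meshgrid(x, x)
rho2 = X**2 + Y**2
aperture = rho2 <= 1.0

def otf(w20):
    f = np.where(aperture, np.exp(2j * np.pi * w20 * rho2), 0.0)
    # Autocorrelation theorem: autocorr(f) = IFFT(|FFT(f)|^2).
    ac = np.fft.fftshift(np.fft.ifft2(np.abs(np.fft.fft2(f))**2))
    return ac / ac[n // 2, n // 2]           # unity at zero frequency

otf_ideal = otf(0.0).real      # nonnegative everywhere
otf_defocused = otf(1.0).real  # goes negative: contrast-reversal band
```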
Figure 45. OTF for a system suffering from one wave of defocus, compared with the diffraction-limited case (top curve).
³ Some nonoptical systems like photographic film can have an MTF that peaks at a frequency >0. This is intimately related to the so-called adjacency or inhibition effects, mentioned previously.
Figure 47. An apodized pupil that has two square transmitting areas, each of width a and separated edge to edge by 2a.
Figure 48. MTF (or OTF) in the x direction of a system that has a pupil shape such as shown in Fig. 47. The response is nonzero only for frequencies between 0 and a and between 2a and 4a, so an entire band of frequencies is missing.
Figure 49. Comparison between obscured and unobscured diffraction-limited MTFs, corresponding to the PSFs of Fig. 29.
The MTF is a useful tool, but it is not always applied with the care it requires. If the MTF is to be used successfully, one must keep in mind the following:
1. The MTF is a two-dimensional function. When shown as one-dimensional, one must always be aware that it represents only one object (grating) orientation, unless the system is either diffraction-limited or suffers from spherical aberration but no other asymmetrical aberrations.
2. Several aberrations, like spherical or astigmatism and their higher order counterparts, interact with defocus in a complicated way. In the presence of such aberrations, the optimum focal plane is not universally defined. A different choice of focus will yield a different MTF. In general, the effect of aberrations on the MTF is to suppress certain frequency bands more than other bands, as a function of focus. Thus it is often impossible to define a ‘‘best’’ MTF for an optical system suffering from significant aberration, unless a specific criterion for choosing the best MTF is applied.

3. It is dangerous to use only the MTF and neglect the PTF because the latter may cause significant image distortion. If the PTF is zero or linear, then it has no effect on the image. In the linear case, all sinusoidal components are shifted by an amount proportional to their frequency, resulting in a bodily image shift but no distortion because all sinusoidal components still add in phase. The effect of the nonlinear part of the PTF on image quality has been examined by Hopkins (36). Asymmetrical aberrations like coma have the greatest effect on the nonlinear components of the PTF. Examples have been shown by Hopkins (36) and Mahajan (10). Finally, the PTF may also be neglected when the MTF is high, as, for example, at low frequencies. If an optical system is followed by a sampling detector that cannot reproduce the high frequencies transmitted by the optics, then only the low frequencies matter, and in such a case the PTF may be entirely negligible.

The OTF can be measured in a number of ways, and a considerable body of literature exists on this topic. In brief, we may state that although the OTF can be computed from a measurement of the PSF, such a measurement is not always feasible due to the small amount of light that passes through a pinhole (although the introduction of low-noise CCD detector arrays has facilitated such measurements).
More often, the LSF is used, whose Fourier transform yields the OTF in the direction perpendicular to the line. The procedure may then be repeated for additional azimuths. When employing this process, a correction must generally be applied for the finite width of the object slit, and for the partial coherence in the slit illumination (37). An edge may also be used as an object, although the differentiation needed to derive the LSF (and hence the MTF) tends to increase the computational error. Interferometric techniques can provide a direct representation of the pupil function and hence, the OTF by autocorrelation. A detailed account of techniques and methods has been provided by Williams (38). The frequency response techniques outlined here can be generalized to handle optical systems and also the detector of the optical radiation or even the transmitting medium, such as the atmosphere. So, for example, it is possible to model a complete Earth-orbiting remote sensing system by taking the product of the transfer functions of the atmosphere, the optics, and the detector (such transfer functions are discussed in other articles
in this encyclopedia). The multiplication of the transfer functions is valid if the effects of the various system components are decoupled. This means, for example, that the atmosphere cannot correct the effects of the lens aberrations, or that the lens aberrations cannot correct the effect of the detector. This condition is normally satisfied for physically different systems such as a detector and a lens, but it is not satisfied for two cascaded lenses because the aberrations of the first may cancel the aberrations of the second. Two lenses in cascade may be decoupled if the image of the first lens is formed on a diffusing screen that is then reimaged by the second lens. In such a case, the transfer functions of the two lenses may be multiplied because the diffusing screen destroys the coupling between the lenses and does not permit the second lens to correct the aberrations of the first.

Coherent Image Formation

Fundamentals. Coherent image formation has several subtleties as well as conceptual differences from the incoherent case. A detailed treatment of coherent image formation is outside the scope of this article because the entire subject holds, for now, relatively little interest outside the field of optics. Thus we will only hint at the subtleties and broadly describe the conceptual differences and similarities. The first broad similarity between incoherent and coherent imaging is that the latter can also be described as a convolution with the coherent PSF, as shown by Eq. (41). The linearity condition is satisfied for a coherently illuminated system, and the stationarity condition merits some more discussion later. Equation (41) relates the image and object complex amplitude distributions through a convolution with the PSF. The convolution theorem of the Appendix then tells us that the Fourier transforms of those functions will be related through a mere multiplication.
Specifically,

a′(x, y) = a(x, y)f(x, y),   (66)

where f(x, y), the Fourier transform of the PSF, is the pupil function, as we have established previously [Eq. (42)]. Therefore, Eq. (66) tells us that the Fourier (frequency) spectrum of the image is given by the Fourier spectrum of the object multiplied by the pupil function, which plays the role of the coherent transfer function (CTF). Thus, for example, in the absence of aberrations and apodization, the CTF is the simple top-hat function of Fig. 23, for a system that has a circular pupil shape. In summary, then, coherent image formation can be described as follows: From the complex amplitude distribution in the object A(u, v), we pass to the amplitude distribution across the entrance pupil a(x, y) through a Fourier transformation. Multiplication of a(x, y) by the pupil function f(x, y) gives the amplitude distribution across the exit pupil (a′). Finally, an (inverse) Fourier transformation gives the complex amplitude distribution in the image (A′). Based on this description, we can predict the images of several typical objects. Consider first a low-frequency object, whose spatial frequency spectrum does not extend outside the pupil radius (taken as unity). This means
that a(x, y) = 0 for any x² + y² > 1. If the pupil function contains no apodizing term or aberrations, all spatial frequencies inside the pupil will be transmitted with undiminished amplitude, and the image will be a perfect reconstruction of the object. This contrasts sharply with the incoherent case, where the only perfectly imaged object was no object at all (zero spatial frequency). The images of objects that contain frequencies that fall outside the pupil will appear low-pass filtered. Take, for example, a sawtooth function similar to that of Fig. 39, where the ordinate is understood as amplitude rather than irradiance. If the extent of the pupil allows only the first four terms to pass, then the resulting image will have an amplitude distribution similar to that shown by synthesizing the first four harmonics. However, a photodetector such as the eye records only the irradiance and not the amplitude of the wave. Therefore, the final appearance of the image would correspond to the square of the function that arises from the superposition of the first four harmonics. Mills and Thompson (39) described in detail the effect of aberrations and pupil apodization in the images of coherently illuminated systems for standard test objects, including a point, slit, edge, and two adjacent points. It is common to compare coherent and incoherent transfer functions, as shown in Fig. 50. However, this figure, a version of which appears in several texts, is conceptually misleading because the spatial frequency axis refers to two different physical quantities, amplitude and irradiance. A sinusoidal amplitude distribution (grating), when squared, will yield an irradiance distribution of twice the frequency [through the trigonometric identity cos²x = 1/2 + (1/2) cos(2x)]. Thus the sinusoidal grating that corresponds to the cutoff point of the CTF would appear visually as a grating of the same frequency as the cutoff point of the incoherent OTF.
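The frequency doubling that makes Fig. 50 misleading can be seen directly: squaring a sinusoidal amplitude grating, which is what a detector does, doubles its spatial frequency. A minimal sketch:

```python
import numpy as np

n = 512
u = np.arange(n) / n
k = 10                                   # grating frequency, cycles per window
amplitude = np.cos(2.0 * np.pi * k * u)  # amplitude grating
irradiance = amplitude**2                # what a detector records

def dominant_frequency(signal):
    # Strongest positive-frequency component (mean removed).
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    return int(np.argmax(spectrum))

f_amp = dominant_frequency(amplitude)    # k
f_irr = dominant_frequency(irradiance)   # 2k, since cos^2 x = 1/2 + (1/2)cos 2x
```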
It is important to understand in more detail the nature of the ‘‘object’’ in coherent imaging. Up to this point, we have assumed that the object is a spatial
Figure 50. A common but somewhat misleading comparison between the diffraction-limited coherent and incoherent transfer functions for a system that has a circular pupil (transfer function plotted against normalized spatial frequency). In the coherent case, all frequencies up to unity (the pupil radius) pass through without attenuation. However, the spatial frequency axis refers to different types of objects in the two cases (amplitude or irradiance gratings).
OPTICAL IMAGE FORMATION
(a) (b) Figure 51. Schematic of an amplitude (a) and a phase object (b). The latter is like a perfectly transparent piece of glass that has a surface relief pattern as shown.
distribution of complex amplitude, without giving any further details. Such a complex amplitude distribution will have the general form f(x, y) = U(x, y) exp[iφ(x, y)]. In this expression, U is the real amplitude (transmittance or reflectivity) variation, and φ is the variation in optical thickness or phase (Fig. 51). Physically, the object may be taken as an illuminated transparency. The illuminating wave itself may have amplitude and phase variations and is thus represented by a function fi(x, y) = Ui(x, y) exp[iφi(x, y)]. Under the assumption that the spatial detail in all four of these functions is considerably coarser than the wavelength, the complex amplitude distribution emanating from the object plane is given by the product of the two distributions:

fo(x, y) ≡ fi(x, y)f(x, y) = Ui(x, y)U(x, y) exp{i[φi(x, y) + φ(x, y)]} ≡ Uo(x, y) exp[iφo(x, y)].   (67)
Details in the object comparable to the wavelength are generally of interest for their diffractive, rather than imaging properties (in any case, high frequencies are likely to be cut off by the system aperture, as Fig. 50 indicates). Thus the term ‘‘object,’’ as used so far, implies the complex amplitude distribution fo (x, y) that contains the characteristics of the object and of the illuminating wave. Now, consider the question of stationarity of the amplitude PSF that was postponed previously. In addition to the requirement for the invariance of aberrations with field, as for incoherent imaging, we also need to ensure that there are no phase terms that vary with field and would not appear in the incoherent PSF. In the normal imaging situation for a planar object and image and finite pupil locations, there are two such terms. One is a linear obliquity term, and the other is a quadratic optical path variation term (40). Space prohibits further details, but it is noted that both of these terms can be eliminated if instead of a planar object and image surfaces, we consider spherical surfaces centered on the entrance and exit pupil, respectively. Then, the illuminating wave must also be a spherical wave focusing on the entrance pupil. Under these conditions, the illumination is cophasal, and the additional phase terms disappear, ensuring stationarity of the amplitude PSF and the validity of the convolution integral, Eq. (41). Of course, a planar object can also be handled if illuminated by a plane wave. This places the entrance pupil at infinity, and therefore the optical system must be appropriately designed. An alternative way of handling planar objects is by matched illumination. In this case, the planar object is illuminated by a spherical wave focusing at the entrance pupil. Then, there is an exact Fourier transform relationship between the object distribution and the
1099
entrance pupil, provided the entrance pupil surface is taken as a sphere. (If the illuminating wave is not perfectly spherical, that is, it has aberrations, then the FT of the product of the object function and the aberration terms is obtained, as noted previously.) If, further, the exit pupil and the image surfaces are also taken as spherical, then the stationarity requirement is satisfied exactly (40). However, many practical situations can be handled without concern about the difference between spherical and plane surfaces because the irradiance distribution is very similar in the two cases, and several operations can be performed on the irradiance alone. Finally, we devote a little space to the intuitive understanding of Fourier analysis for a complex amplitude object. To recall the incoherent case, we saw that the fundamental building blocks of any incoherent image were sinusoidal irradiance distributions (albeit fictitious ones, that contain negative values). What are the corresponding building blocks of a complex amplitude distribution? The answer is that any complex amplitude distribution can be analyzed as a set of interfering plane waves at various directions of propagation. This is the so-called angular spectrum. This may be seen as follows: The Fourier transform relationship

a(x, y) = ∫∫ A(u, v) exp[i2π(ux + vy)] du dv   (68)

holds between the complex amplitude distribution in the entrance pupil a(x, y) and the object A(u, v). The inverse relationship,

A(u, v) = ∫∫ a(x, y) exp[−i2π(ux + vy)] dx dy,   (69)
expresses the object distribution as a superposition of plane waves. The plane waves are represented by the exponential term and have a direction of propagation given by the direction cosines α = xλ and β = yλ [see also (17)]. This is illustrated in Fig. 52, where we consider a simple, one-dimensional (coarse) diffraction grating, which is simply a one-dimensional periodic distribution of amplitude and/or phase. The grating is illuminated by a plane wave, as shown. It is a well-known result of wave optics [e.g., (12)] that after the grating, there will be plane waves emanating at various angles. Each such wave is called a grating order and corresponds to a harmonic term in the Fourier series of the grating profile [e.g., Eq. (A21) or (52) for a specific example]. This is the physical implementation of Eq. (68), which, however, refers to a general nonperiodic object. Now, if the direction of propagation is reversed, then the amplitude distribution at the plane of the grating is obtained as the superposition of the plane waves. This is the physical meaning of Eq. (69). Coherent and incoherent illumination are two extreme cases. The general case is partially coherent illumination. Fortunately, most practical imaging problems and systems satisfy the coherence or incoherence conditions quite well.
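The discreteness of the angular spectrum of a periodic object can be sketched numerically (illustrative grating and sampling, not from the article): the Fourier spectrum of a coarse binary grating has energy only at the grating orders, i.e., at integer multiples of the fundamental frequency 1/d.

```python
import numpy as np

# Sketch: the angular spectrum of a periodic (grating) object is discrete.
N, d = 512, 32                             # samples, grating period in samples
x = np.arange(N)
grating = (x % d < d // 2).astype(float)   # binary amplitude grating

spec = np.abs(np.fft.fft(grating))
orders = np.nonzero(spec > 1e-9)[0]        # occupied frequency bins
# Every occupied bin is a multiple of N/d = 16: these are the grating orders,
# each corresponding to a plane wave propagating at a distinct angle.
print(np.all(orders % (N // d) == 0))      # -> True
```

Each occupied bin plays the role of one diffracted order of Fig. 52; reversing the propagation direction (the inverse FFT) superposes these plane waves to rebuild the grating, the physical content of Eq. (69).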
Figure 52. A plane wave incident on a periodic object (grating) and the resulting angular spectrum of diffracted plane waves, each corresponding to a grating order.
In general, partial coherence is significant in systems that incorporate illumination subassemblies, most commonly microscopes or, more recently, photolithographic projection systems. Hopkins (40) showed that the problem of partially coherent imaging can be reduced to the superposition of the image intensities produced by waves emanating from each point of a properly defined ‘‘effective source’’ that simulates the partial coherence conditions across the illuminated object plane.

Spatial Filtering: An Example and Application of Coherent Imaging

All of the previously discussed concepts of coherent imaging can be illustrated by the following example of a system, shown in Fig. 53. A laser beam is focused to a point and then expanded to illuminate the object. The system uses a plane wave to illuminate the planar object and therefore has the entrance pupil at infinity. If the image is to be obtained on a plane, then the exit pupil must also be at infinity. The plane noted as ‘‘filter plane’’ is also the image of the entrance and exit pupils (i.e., the aperture stop). The Fourier transform of the object amplitude distribution is formed at the entrance pupil by the angular spectrum of plane waves that emanate from
Figure 53. A system for illustrating coherent imaging concepts and for performing spatial filtering experiments. The two lenses have equal power and are separated by twice their focal length. Object and image are one focal length away from the first and second lens, respectively.
the object. The lens immediately after the object will image the entrance pupil at the aperture stop, where a scaled version of the Fourier transform is obtained. The following lens performs the inverse Fourier transformation, and the image appears at the image plane. If the lenses are well corrected for aberrations, then the Fourier transformations are exact and the stationarity conditions are satisfied across the entire image. This system becomes more interesting if one places various filters at the filter plane. Then, the amplitude leaving the filter plane will be the product of the incident amplitude and the filter transmittance, as in Eq. (67). This allows us to perform operations in the Fourier domain and thus alter the appearance of the image in specified ways. To illustrate the process using an experiment of some historic importance, consider a square grid as an object (Fig. 54). The scaled Fourier transform of the object transmittance will appear at the filter plane and will have a form similar to that shown in Fig. 55. In a landmark experiment (although performed in a somewhat different setup), Porter (41) took a transparent strip as a filter that allowed only the vertical or horizontal orders to pass. In each case, the image turned into a vertical or horizontal one-dimensional grating (Fig. 56). Although the outcome of this experiment might sound obvious after the introduction of the Fourier transform and the imaging equations of the previous sections, it was performed at a time when none of these concepts were in place and led to a revolutionary understanding of the process of image formation, as described briefly in the next section. Several other filtering operations can be performed by using the same setup. For example, a still picture of a TV frame displays the characteristic raster scan pattern in the form of fine horizontal lines.
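The grid-filtering experiment just described can be sketched with discrete transforms (an assumed, idealized stand-in for the 4f system of Fig. 53; grid size and period are illustrative): a slit in the filter plane passes only the orders along one axis, and the filtered image varies along one direction only.

```python
import numpy as np

# Sketch of Porter-type spatial filtering on a separable square-grid object.
N, d = 128, 16
x = np.arange(N)
grid = ((x % d < d // 2)[:, None] & (x % d < d // 2)[None, :]).astype(float)

F = np.fft.fft2(grid)                 # spectrum at the filter plane
slit = np.zeros((N, N))
slit[0, :] = 1.0                      # pass only orders with zero vertical frequency
image = np.fft.ifft2(F * slit).real   # inverse transform forms the image

# The filtered image is constant along the vertical direction: a 1-D grating.
print(np.allclose(image, image[0:1, :]))   # -> True
```

Rotating the slit by 90° would instead select the vertical orders and yield a grating in the other orientation, mirroring Fig. 56.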
The spectrum of these lines is a set of dots in the perpendicular direction that are quite clearly distinguishable from the spectrum of the rest of the image. The dots may be described mathematically as a series of delta functions (if the grating is considered of infinite extent) that correspond to the Fourier transform of a periodic object [Eq. (A21)]. A well-constructed filter will eliminate the spectrum of the lines and will only minimally impact the rest of the image. Matched filters can also be constructed to aid in optical recognition of specific features within the image. All of these applications are described in several texts (11,13,17,20).

THE ABBE THEORY OF IMAGE FORMATION AND RESOLUTION LIMITS

Ernst Abbe (42) was the first to understand image formation as a diffraction and interference process. He was particularly interested in image formation by a
Figure 54. A square grid object.
Figure 55. Approximate appearance of the spectra corresponding to the square grid object of Fig. 54. The size of the circles attempts to indicate the relative strength of the diffracted orders, except for the central (undiffracted) one, which would be considerably brighter than indicated.
microscope, so we use his theory as a way of summarizing the various types of resolution limits. Abbe's view of image formation can be described as follows: Consider a periodic object, such as a grating, illuminated by a monochromatic plane wave. An angular spectrum of plane waves is diffracted from the grating. These waves are brought to a focus at the exit pupil plane, where the spectrum (or Fourier transform) of the grating is formed (this is shown in Fig. 53, where the Fourier transform appears at the filter plane). The spectra form a series of point sources that give rise to another set of waves in the image plane. The interference of those waves produces the image. To understand this last step, aided by Fig. 53, notice that the image would be formed at infinity without the help of the second lens, or, if the object was not exactly one focal length away, the image would be formed at a finite distance. The second lens is not essential to this description and in this case serves merely to form the image at a finite distance. We may generalize the situation shown in Fig. 53 if we allow the refractive indexes of the object and image space to be different (n, n′) and the magnification to be other than unity (so that the second lens may have a focal length different from the first). We borrow from physical optics the result that the first-order diffracted beam from the grating is at an angle θ given by d sin θ = λ/n, where n is the refractive index of the object (grating) space. From this equation, the grating spacing d may be expressed as

d = λ/(n sin θ).   (70)

Figure 56. Spatial frequency filtering of the square grid object. A rectangular aperture (shown in dashed lines) allows only the vertical or horizontal orders, giving the image the appearance of a vertical or horizontal grating, respectively. Although not shown, other orientations of the aperture (e.g., 45°) would produce a similar effect.
The spectra formed at the plane of the aperture stop are points [or, more precisely, functions depending on the aperture shape, such as given by Eq. (34)] separated by a distance proportional to the focal length (F tan θ). Now, suppose that the radius of the aperture stop is larger than this distance, so that all orders pass through. Each of the orders generates a plane wave at the image plane. The interference of the three plane waves (the zero and ±1 orders of a sinusoidal grating) gives rise to a sinusoidal amplitude distribution or interference fringes, just as in Young's interference experiment (12,13). The period of this grating is given by d′ = λ/(n′ sin θ′). The ratio d′/d = (n sin θ)/(n′ sin θ′) is equal to the geometric magnification [see Eqs. (8), (24)], as expected. The limit of resolution of a given system is a function of the aperture size, as we have already seen. In this case, the resolution limit is obtained when the diffraction angle is such that the point spectra corresponding to the ±1 orders are formed exactly at the rim of the aperture. If the diffraction angle is larger and the first-order spectra are cut off by the aperture, only the zero-order spectrum remains and gives rise to a single plane wave that results in uniform illumination of the image plane. Therefore, the image detail has been lost. Thus, if a is the maximum angle in the object space for which the first-order spectra just pass, the resolution limit of a coherent system can be expressed as λ/(n sin a). It is possible to obtain higher resolution from a coherent system if oblique illumination is used. If, on the same system shown in Fig. 53, the illumination forms an angle
a with the object plane, then the undiffracted beam will be focused at the rim of the aperture. Only one more grating order is needed to pass through the aperture to create interference fringes (and thus visible detail) at the image plane. Thus, the angle between the zero and the first order can now be 2a, corresponding to a grating of twice the frequency. Using oblique illumination, then, the resolution limit can be halved to 0.5λ/(n sin a). It is worth appreciating the difference between an amplitude and a phase object relative to this discussion. The sinusoidal grating employed in the example of this section represents a sinusoidal variation in real amplitude (and hence transparency) and has a transmittance of the form a + b cos(2π x/x0 ). As the expression shows, this type of grating has a Fourier transform that contains only the first and zeroth orders. The appearance of the negative first order in the diffraction pattern justifies the preferential use of the complex exponential form [Eq. (A6)] in writing the Fourier series, because a sinusoidal grating in that form has both a positive and negative first order. However, the Fourier (or diffracted) spectrum of a sinusoidal phase grating contains higher harmonics, as can be seen by taking the Fourier transform of the transmittance of such a grating, which has the form exp[i cos(2π x/x0 )]. A phase object is invisible to a detector such as the human eye that responds to irradiance. Mathematically, the squared modulus of the phase object is a constant; thus, it contains no spatial variation that would render it visible to the eye. If the aperture of the imaging system is such that even the highest spatial frequency present in the object is transmitted, then the corresponding image irradiance distribution is uniform, and the image is invisible. 
Departures from that condition, whether in the form of truncated high spatial frequencies or defocus/aberrations, can render the image visible by altering the phase relationship of the Fourier components and hence the order of their interference at the image plane. This is also the principle of phase contrast microscopy, which places a phase mask at the Fourier plane to retard the phase of a chosen frequency band (typically the undiffracted zero-order light) relative to the rest of the spectrum.
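The resolution-limit formulas summarized in Table 3 are simple enough to evaluate directly; a sketch with an assumed wavelength and numerical aperture (both values chosen only for illustration):

```python
# Sketch: evaluating the four resolution-limit formulas of Table 3.
wavelength_um = 0.55                 # green light, in micrometers (assumed)
na = 0.95                            # n*sin(a), an assumed numerical aperture

rayleigh  = 0.61 * wavelength_um / na   # two mutually incoherent points
mtf_limit = 0.50 * wavelength_um / na   # incoherent sinusoidal grating
abbe      = 1.00 * wavelength_um / na   # coherent, normal illumination
oblique   = 0.50 * wavelength_um / na   # coherent, oblique illumination

for name, d in [("Rayleigh", rayleigh), ("MTF limit", mtf_limit),
                ("Abbe", abbe), ("Oblique", oblique)]:
    print(f"{name:10s} d = {d:.3f} um")
```

Note how oblique coherent illumination recovers the same numerical limit as the incoherent MTF cutoff, half the Abbe value for normal incidence.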
Table 3. Various Forms of Resolution Limit

Formula                   Name        Condition      Resolved object
d′ = 0.61λ/(n′ sin a′)    Rayleigh    Image space    Two mutually incoherent points
d′ = 0.5λ/(n′ sin a′)     MTF limit   Image space    Incoherent sinusoidal grating
d = λ/(n sin a)           Abbe        Object space   Sinusoidal grating illuminated coherently by normally incident wave
d = 0.5λ/(n sin a)        —           Object space   Sinusoidal grating illuminated coherently and obliquely
The various types of resolution limits shown in Table 3 summarize the entire wave theory of image formation for both incoherent and coherent imaging.

APPENDIX: SOME CONCEPTS OF FOURIER ANALYSIS

The Fourier Series for Periodic Functions

According to the Fourier theorem, any periodic function that satisfies certain simple conditions (to be stated later) can be written as a summation of sinusoidal functions whose frequencies are equal to integral multiples of the frequency of the original function. Let f(x) be a periodic function of period p, that is, f(x) = f(x + np), where n is an integer; then

f(x) = a0/2 + Σ_{n=1}^{∞} [an cos(2πnx/p) + bn sin(2πnx/p)],   (A1)
where

a0 = (2/p) ∫_{x1}^{x2} f(x) dx,   (A2)

an = (2/p) ∫_{x1}^{x2} f(x) cos(2πnx/p) dx,   (A3)

bn = (2/p) ∫_{x1}^{x2} f(x) sin(2πnx/p) dx,   (A4)
and x2 − x1 = p. A form of the Fourier theorem is employed when, for example, one breaks down a certain musical tone into pure tones (sinusoidal vibrations). In the example of sound, all pure tones have positive frequencies. However, in certain cases in optics, there is plausible physical meaning assigned to negative frequencies. For this reason, the preferred way of expressing Fourier's theorem in optics (at least most of the time) is through the complex exponential notation, in which sines and cosines are substituted by complex exponentials through

cos x = (1/2)[exp(ix) + exp(−ix)],
sin x = (1/2i)[exp(ix) − exp(−ix)].   (A5)
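The analysis and synthesis prescribed by Eqs. (A1)–(A4) can be checked numerically; a sketch with an illustrative sawtooth (function, period, and sampling are all assumptions of the example, and the integrals are approximated by Riemann sums over one period):

```python
import numpy as np

# Sketch: Fourier analysis and synthesis per Eqs. (A1)-(A4) for a sawtooth.
p = 1.0
x = np.linspace(0.0, p, 4096, endpoint=False)
f = x / p                        # sawtooth over one period

# Coefficients: (2/p) * integral over one period ~ 2 * mean over the samples
a0 = 2 * np.mean(f)                                         # Eq. (A2)
def an(n): return 2 * np.mean(f * np.cos(2 * np.pi * n * x / p))   # Eq. (A3)
def bn(n): return 2 * np.mean(f * np.sin(2 * np.pi * n * x / p))   # Eq. (A4)

# Synthesis, Eq. (A1), keeping the first 50 harmonics
g = a0 / 2 + sum(an(n) * np.cos(2 * np.pi * n * x / p)
                 + bn(n) * np.sin(2 * np.pi * n * x / p)
                 for n in range(1, 51))

# The partial sum approaches the sawtooth away from the discontinuity
mid = slice(1024, 3072)
print(np.max(np.abs(g[mid] - f[mid])) < 0.05)   # -> True
```

With only four harmonics retained instead of fifty, the same synthesis reproduces the low-pass-filtered sawtooth discussed for coherent imaging (Fig. 39).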
Now, Fourier’s theorem for periodic functions may be restated as follows:
f(x) = Σ_{n=−∞}^{+∞} Fn exp(i2πnx/p),   (A6)

where

Fn = (1/p) ∫_{x1}^{x2} f(x) exp(−i2πnx/p) dx   (A7)
and x2 − x1 = p. Sufficient conditions (called the Dirichlet conditions) for the validity of Eqs. (A1) and (A6) are:

1. f(x) has only a finite number of discontinuities in any finite interval of x within the range x1 < x < x2.
2. f(x) has only a finite number of extrema in any finite interval of x in the range x1 to x2.
3. The integral of |f(x)| over p converges (is finite).

The coefficients Fn are the spectrum of f(x), also called the Fourier spectrum, the Fourier coefficients, or the harmonics. When we use Eq. (58) or (53)–(55) to determine the strength of the Fourier coefficients, we speak of Fourier analysis. When we add the sinusoidal (or complex exponential) terms to approximate f(x) [i.e., when using Eq. (52) or (57)], we speak of Fourier synthesis. Fourier synthesis is illustrated by an example in Fig. 39.

Nonperiodic Functions: Fourier Transforms

The Fourier theorem can be extended to nonperiodic functions, thus allowing Fourier analysis to represent an arbitrary (ir)radiance distribution. In the general case, as p goes to infinity to represent a nonperiodic function, the variable n/p in Eq. (A6) becomes continuous, and the Fourier series becomes a Fourier integral, called the Fourier transform:
f(x) = ∫_{−∞}^{+∞} F(σ) exp(i2πσx) dσ,   (A8)

where

F(σ) = ∫_{−∞}^{+∞} f(x) exp(−i2πσx) dx.   (A9)
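A concrete check of the pair (A8)/(A9): evaluating Eq. (A9) numerically for a rectangle of width a reproduces the analytic form sin(πaσ)/(πσ) derived later in this appendix [Eqs. (A12), (A13)]. The discretization below is an illustrative choice, not part of the article.

```python
import numpy as np

# Sketch: Eq. (A9) evaluated by a Riemann sum for f = 1 on |x| < a/2, else 0.
a = 2.0
x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]
f = (np.abs(x) < a / 2).astype(float)

def ft(sigma):
    # F(sigma) = integral of f(x) exp(-i 2 pi sigma x) dx
    return np.sum(f * np.exp(-1j * 2 * np.pi * sigma * x)) * dx

for sigma in (0.1, 0.3, 0.7):
    analytic = np.sin(np.pi * a * sigma) / (np.pi * sigma)
    assert abs(ft(sigma) - analytic) < 5e-3   # tolerance covers the edge sampling
print("rect <-> sinc transform pair confirmed numerically")
```

Swapping the sign conventions discussed above would only conjugate the computed values; for this real, even f the result is unchanged, illustrating why the sign carries no physical relevance here.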
The reader should be aware that several different ways exist for expressing the Fourier transform: the negative sign in the exponent may appear in Eq. (A8) instead of Eq. (A9), a constant 1/√(2π) may appear in front of both integrals when 2π is removed from the exponents, or a constant 1/(2π) may appear in front of one integral and not the other. All of these ways of expressing the Fourier transform are equivalent and lead to the same physics. The equation with the minus sign is called the inverse Fourier transform, and the equation with the positive sign is called the Fourier transform. But this can be cause for confusion because it is acceptable to call either function f(x) or F(σ) the Fourier transform of the other. In fact, the terminology ‘‘inverse Fourier transform’’ need be employed only when Fourier transformations are cascaded, in which case it helps to ensure that the correct sign is used in successive operations. Otherwise, the sign of the Fourier transform has no physical relevance. The function F(σ) is the Fourier or frequency spectrum of f(x). Notice that if x has dimensions of length, then σ has dimensions of inverse length and is called spatial frequency, although in the theory of image formation, it pays to normalize these variables and make them
dimensionless, as we have already done in Eq. (44).4 The variable σ is analogous to the discrete variable n/p in Eqs. (A6), (A7). The two functions f(x) and F(σ) are called a Fourier transform pair to emphasize the symmetry between them. In other words, the two functions are mathematically similar objects. Notice that there is no such symmetry between a periodic function and its Fourier spectrum, as defined by Eqs. (A6) and (A7): the function and its spectrum are mathematically different objects because the spectrum is a set of integrals, rather than a single function. These two facts pose a problem. If the Fourier transform of a periodic function were to exist, then, according to Eqs. (A8) and (A9), it would have to be a mathematical object similar to the original function (it might not be a periodic function, but it should at least be a function). That would mean that it could not be written in the form of Eq. (A7). Therefore, either the Fourier transform of a periodic function could not exist, or it would have to represent something different from the spectrum given by Eq. (A7). The full resolution of this mathematical conundrum would fill several pages (43,44) and can be only hinted at here. It is noted that this dilemma is intimately connected with the conditions for the existence of the Fourier transform, because as p → ∞, the Dirichlet conditions would exclude many simple functions, starting with the function f(x) = const, whose integral from −∞ to +∞ does not converge. However, it is worth noting that if f(x) is to represent a physical quantity such as a function of distance (an image) or time (a signal), then it cannot extend from −∞ to +∞ but must necessarily be defined within finite limits. This would mean that the function is not truly periodic; hence, the Fourier transform would be applicable, but the Fourier series would not be. Thus, if we allow the mathematics to be tempered by physical reality, the conflict disappears.
Nevertheless, when a quasi-periodic function (periodic within an interval) has very many periods, it approximates a fully periodic function so well that it is economical to treat it as such. This is another way of saying that the mathematical abstraction represented by a periodic function is useful because it simplifies many concepts and calculations. Thus, it is desirable to define the Fourier transform of periodic functions, and of course, it is desirable that it should have the same meaning as the discrete Fourier spectrum, Eq. (A7). To define the Fourier transform of a periodic function, we actually need to define a new class of functions, called generalized functions (43,44), which contain ordinary functions as a subset. Within this new set of functions, the Fourier transform and its inverse always exist. However, only one generalized function need concern us here, the so-called delta function; other generalized functions of interest can be considered derived from it. The delta function is defined through the following relationships:

δ(σ) = ∞ if σ = 0,
       = 0 if σ ≠ 0,   (A10)
4 The variable σ can also represent the wave number (or inverse wavelength) when the Fourier transform is used in coherence theory. Fourier transforms are ubiquitous in optics!
and

∫_{−∞}^{+∞} δ(σ) dσ = 1.   (A11)

The delta function can be given physical meaning as, for example, the idealized concept of an electrical point charge. The charge is considered concentrated at a single point (taken here as zero), but the integral of the function that represents it must equal the charge value (taken here as unity). The simplest visualization of the delta function can be obtained as a rectangle of height b and width 1/b, centered at zero. The area (integral) of the rectangle remains unity for all values of b. As b → ∞, we obtain a delta function. Now, consider the function f(x) = 1, if −a/2 < x < a/2, and zero elsewhere. The Fourier transform of this function is

F(σ) = ∫_{−∞}^{+∞} f(x) exp(−i2πσx) dx = ∫_{−a/2}^{+a/2} exp(−i2πσx) dx,   (A12)

from which, after performing the integration, we obtain

F(σ) = sin(πaσ)/(πσ).   (A13)

Now, as a → ∞, f(x) becomes the unit constant function, f(x) = 1. It may also be shown that

lim_{a→∞} ∫_{−∞}^{+∞} [sin(πaσ)/(πσ)] dσ = 1   (A14)

and

lim_{a→∞} sin(πaσ)/(πσ) = 0 for σ ≠ 0, = ∞ for σ = 0.   (A15)

Upon comparing Eqs. (A14) and (A15) with Eqs. (A10) and (A11), we see that

lim_{a→∞} sin(πaσ)/(πσ) = δ(σ)   (A16)

[where the operations of taking the limit and integrating are defined as meaningful only if carried out in the order indicated in Eq. (A14)]. Therefore, the functions f(x) = 1 and δ(σ) are a Fourier transform pair:

f(x) = 1 = ∫_{−∞}^{+∞} δ(σ) exp(i2πσx) dσ   (A17)

and

δ(σ) = ∫_{−∞}^{+∞} exp(−i2πσx) dx.   (A18)

The function F in Eq. (A13) is encountered frequently enough to be given a name. It is called a sinc function [see also Eq. (48)].

The Fourier Transform of a Periodic Function

Consider first the simplest periodic function, a sinusoid or simple harmonic function, written in exponential form:

f(x) = exp(i2πσ0x).   (A19)

The Fourier transform of this function is [from Eq. (A9)]

F(σ) = ∫_{−∞}^{+∞} exp[i2π(σ0 − σ)x] dx,

and by virtue of Eq. (A18),

F(σ) = δ(σ − σ0).   (A20)

Thus the Fourier spectrum of a simple harmonic function of frequency σ0 is a delta function centered at σ = σ0. This is one of the reasons that the negative sign appears in the exponent of the Fourier spectrum [Eq. (A9)] instead of in Eq. (A8). Had this alternative way been followed, the spectrum of the harmonic function of frequency σ0 would be centered at σ = −σ0, which is less satisfactory. There is, however, no actual physical difference between the two representations. Now, consider a general periodic function that can be expressed as a Fourier series, per Eq. (A6). Without the need for introducing new mathematics, it can be shown that the Fourier transform of that function is

F(σ) = Σ_{n=−∞}^{+∞} Fn δ(σ − n/p),   (A21)

by virtue of Eqs. (A9) and (A13). Thus, the Fourier transform of a periodic function is a series of equally spaced delta functions, whose strengths are equal to Fn, located at frequencies n/p. This is the way that the Fourier spectrum of a periodic function may be described in terms of the Fourier transform. The function described by Eq. (A21) is shown schematically in Fig. 57.

Figure 57. Schematic of the Fourier transform of a periodic function. Each arrow represents a delta function, and the corresponding harmonic strength is indicated by the height of the arrow.

The Two-Dimensional Fourier Transform

The definition of the Fourier transform can be readily extended to two dimensions. The two-dimensional Fourier
ABBREVIATIONS AND ACRONYMS
transform may be written as +∞ +∞ f (x, y) = F(σ, τ ) exp[i2π(σ x + τ y)] dσ dτ ,
(A22)
−∞ −∞
where +∞ +∞ f (x, y) exp[−i2π(σ x + τ y)] dx dy F(σ, τ ) =
(A23)
−∞ −∞
Similar extension can apply to functions of three or more variables, which, however, are not normally encountered in the theory of image formation.
We give these theorems in one-dimensional form for simplicity of notation, but the extension to two dimensions is straightforward. Proofs can be found in (43–45). Parseval’s Theorem. If f (x) and F(σ ) are a Fourier transform pair, then, +∞ +∞ |F(σ )|2 dσ = |f (x)|2 dx.
(A24)
−∞
This theorem expresses, in general, the conservation of energy (or irradiance). Thus, it ensures, for example, that the integral of the PSF always converges for a system that has a finite pupil size.
Convolution. A function A (u ) is said to be the convolution of two functions A(u) and F(u) if A (u ) =
+∞ −∞
A(u)F(u − u) du.
(A25)
If a (x), a(x), and f (x) are the Fourier transforms of A , A, and F, respectively, then, a (x) = a(x)f (x).
(A26)
This theorem can be extended to an arbitrary number of functions (repeated convolution). Autocorrelation. The autocorrelation function of F(x) is defined as +∞ F ∗ (x)F(x + ξ ) dx, (A27) C(ξ ) = −∞
where ∗ denotes complex conjugate. The (inverse) Fourier transform of C(ξ ) is given by c(u) = |f (u)|2 , where f (u) is the inverse Fourier transform of F(x).
CTF EM ESF FT LSF MTF OTF PMR PPR PSF PTF
coherent transfer function electromagnetic edge spread function fourier transform line spread function modulation transfer function optical transfer function paraxial marginal ray paraxial pupil ray point spread function phase transfer function
BIBLIOGRAPHY 1. D. S. Goodman, in M. Bass, ed., Handbook of Optics, vol. I, McGraw-Hill, 1995.
Some Important Definitions and Theorems
−∞
1105
2. P. Mouroulis and J. Macdonald, Geometrical Optics and Optical Design, Oxford University Press, 1997. 3. W. J. Smith, Modern Optical Engineering, McGraw-Hill, 1990. 4. L. Levi, Applied Optics: A Guide to Optical System Design, Wiley, 1968. 5. R. Kingslake, Optical System Design, Academic Press, 1983. 6. D. C. O’Shea, Elements of Modern Optical Design, Wiley, 1985. 7. W. J. Smith, in M. Bass, ed., Handbook of Optics, vol. I, McGraw-Hill, 1995. 8. R. S. Longhurst, Geometrical and Physical Optics, Longman, UK, 1967. 9. W. T. Welford, Aberrations of Optical Systems, Adam Hilger, 1986. 10. V. N. Mahajan, Aberration Theory Made Simple, SPIE Press, Bellingham, WA, 1991. 11. R. Guenther, Modern Optics, Wiley, 1990. 12. M. Born and E Wolf, Principles of Optics, 7th ed., Cambridge University Press, 1999. 13. E. Hecht, Optics, Addison-Wesley, 1987. 14. G. H. Spencer and M. V. R. K. Murty, J. Opt. Soc. Am. 52, 672–678 (1962). 15. H. H. Hopkins, The Wave Theory of Aberrations, Oxford University Press, 1950. 16. V. N. Mahajan, Optical Imaging and Aberrations Part 1: Ray Geometrical Optics, SPIE Press, 1999. 17. J. W. Goodman, Introduction to Fourier Optics, 2nd ed., McGraw-Hill, 1996. 18. J. D. Gaskill, Linear Systems, Fourier Transforms, and Optics, Wiley, 1978. 19. H. H. Hopkins, in R. Shannon and J. Wyant, eds., Applied Optics and Optical Engineering I, vol. IX, Academic Press, 1983. 20. G. O. Reynolds, J. B. DeVelis, G. B. Parrent Jr., and B. J. Thompson, The New Physical Optics Notebook, SPIE Press, Bellingham, WA, 1989. 21. V. N. Mahajan, J. Opt. Soc. Am. 72, 1,258–1,266 (1982). V. N. Mahajan, J. Opt. Soc. Am. A 3, 470–485 (1986). 22. B. R. A. Nijboer, Physica 10, 679–692 (1943).
23. B. R. A. Nijboer, Physica 13, 605–620 (1947).
24. K. Nienhuis and B. R. A. Nijboer, Physica 14, 590–603 (1949).
25. M. Cagnet, M. Francon, and J. C. Thrierr, Atlas of Optical Phenomena, Springer-Verlag, 1962.
26. K. Strehl, Zeitschrift f. Instrumentenkunde 22, 213–217 (1902).
27. See sect. 8.3 in ref. 10; V. N. Mahajan, J. Opt. Soc. Am. 73, 860–861 (1983); P. Mouroulis, in P. Mouroulis, ed., Visual Instrumentation: Optical Design and Engineering Principles, McGraw-Hill, 1999.
28. A. Maréchal, Revue d’Optique 26, 257–277 (1947).
29. W. B. King, J. Opt. Soc. Am. 58, 655–661 (1968).
30. P. Mouroulis and H. Zhang, J. Opt. Soc. Am. A 9, 34–42 (1992).
31. V. N. Mahajan, Appl. Opt. 17, 964–968 (1978).
32. H. H. Hopkins and B. Zalar, J. Mod. Opt. 34, 371–406 (1987).
33. J. Macdonald, Optica Acta 18, 269–290 (1971).
34. H. H. Hopkins, Proc. R. Soc. A 231, 91–103 (1955).
35. H. H. Hopkins, in K. J. Habell, ed., Proceedings of the Conference on Optical Instruments and Techniques, Chapman and Hall, London, 1961, pp. 480–514.
36. H. H. Hopkins, Optica Acta 31, 345–368 (1984).
37. A. P. Shaw, Optica Acta 33, 1389–1396 (1986).
38. T. L. Williams, The Optical Transfer Function of Imaging Systems, Institute of Physics, 1999.
39. J. P. Mills and B. J. Thompson, J. Opt. Soc. Am. A 3, 694–716 (1986).
40. H. H. Hopkins, Photogr. Sci. Eng. 21, 114–123 (1977).
41. A. B. Porter, Philos. Mag. 11(6), 154 (1906).
42. E. Abbe, Archiv Mikroskopische Anat. 9, 413–468 (1873).
43. M. J. Lighthill, Introduction to Fourier Analysis and Generalized Functions, Cambridge University Press, 1958.
44. P. Dennery and A. Krzywicki, Mathematics for Physicists, Harper and Row, 1967.
45. R. Bracewell, The Fourier Transform and Its Applications, 2nd ed., McGraw-Hill, 1965.
OPTICAL MICROSCOPY

MICHAEL W. DAVIDSON, The Florida State University, Tallahassee, FL
MORTIMER ABRAMOWITZ, Olympus America, Inc., Melville, NY
INTRODUCTION

The past decade has witnessed enormous growth in the application of optical microscopy to micron- and submicron-level investigations in a wide variety of disciplines (1–5). Rapid development of new fluorescent labels has accelerated the expansion of fluorescence microscopy in laboratory applications and research (6–8). Advances in digital imaging and analysis have also enabled microscopists to acquire quantitative measurements quickly and efficiently on specimens that range from photosensitive caged compounds and synthetic ceramic superconductors to real-time fluorescence microscopy of living cells in their natural environment (2,9). Optical microscopy, helped by digital video, can also be used to image very thin optical sections,
and confocal optical systems are now operating at most major research institutions (10–12). Early microscopists were hampered by optical aberration, blurred images, and poor lens design, which restricted high-resolution observations until the nineteenth century. Aberrations were partially corrected by the mid-nineteenth century through the introduction of Lister and Amici achromatic objectives, which reduced chromatic aberration and raised numerical apertures to around 0.65 for dry objectives and up to 1.25 for homogeneous immersion objectives (13). In 1886, Ernst Abbe’s work with Carl Zeiss led to the production of apochromatic objectives based for the first time on sound optical principles and lens design (14). These advanced objectives provided images that had reduced spherical aberration and were free of color distortions (chromatic aberration) at high numerical apertures. Several years later, in 1893, Professor August Köhler reported a method of illumination, which he developed to optimize photomicrography, that allowed microscopists to take full advantage of the resolving power of Abbe’s objectives. The last decade of the nineteenth century saw innovations in optical microscopy, including metallographic microscopes, anastigmatic photolenses, binocular microscopes that had image-erecting prisms, and the first stereomicroscope (14). Early in the twentieth century, microscope manufacturers began parfocalizing objectives, which allowed the image to remain in focus when the microscopist exchanged objectives on the rotating nosepiece. In 1924, Zeiss introduced a LeChatelier-style metallograph that had infinity-corrected optics, but this method of correction was not used widely for another 60 years. Shortly before World War II, Zeiss created several prototype phase contrast microscopes based on optical principles advanced by Frits Zernike.
Several years later, the same microscopes were modified to produce the first time-lapse cinematography of cell division photographed with phase-contrast optics (14). This contrast-enhancing technique did not become universally recognized until the 1950s and is still a method of choice for many cell biologists today. Physicist Georges Nomarski introduced improvements in Wollaston prism design for another powerful contrast-generating microscopy technique in 1955 (15). This technique is commonly referred to as Nomarski interference or differential interference contrast (DIC) microscopy and, along with phase contrast, has allowed scientists to explore many new arenas in biology using living cells or unstained tissues. Robert Hoffman (16) introduced another method of increasing contrast in living material by taking advantage of phase gradients near cell membranes. This technique, now termed Hoffman Modulation Contrast, is available as optional equipment on most modern microscopes. The majority of microscopes manufactured around the world had fixed mechanical tube lengths (ranging from 160 to 210 mm) until the late 1980s, when manufacturers largely changed over to infinity-corrected optics. Ray paths through both finite-tube-length and infinity-corrected microscopes are illustrated in Fig. 1. The upper portion
Figure 1. Optical trains of finite-tube and infinity-corrected microscope systems. (a) Ray traces of the optical train representing a theoretical finite-tube-length microscope. The object O is a distance a from the objective Lob and projects an intermediate image O’ at the finite tube length b, which is further magnified by the eyepiece Ley and then projected onto the retina at O’’. (b) Ray traces of the optical train that represents a theoretical infinity-corrected microscope system.
of the figure contains the essential optical elements and ray traces that define the optical train of a conventional finite-tube-length microscope (17). An object O of height h is imaged on the retina of the eye at O″. The objective lens Lob projects a real and inverted image of O, magnified to height h′, into the intermediate image plane of the microscope at O′. This occurs at the eyepiece diaphragm, at the fixed distance fb + z′ behind the objective, where fb represents the back focal length of the objective and z′ is the optical tube length of the microscope. The aerial intermediate image at O′ is further magnified by the microscope eyepiece Ley, which produces an erect image of the object at O″ on the retina; this image appears inverted to the microscopist. The magnification factor of the object is calculated by considering the distance a between the object O and the objective Lob, and the front focal length f of the objective lens. The object is placed a short distance z outside of the objective’s front focal length f, such that z + f = a. The intermediate image of the object, O′, is located at distance b, which equals the back focal length of the objective fb plus z′, the optical tube length of the microscope. Magnification of the object at the intermediate image plane equals h′/h. The image height h′ is derived by multiplying the microscope tube length b by the object height h and dividing by the distance of the object from the objective: h′ = (h × b)/a. From this argument, we can conclude that the lateral
or transverse magnification of the objective is equal to a factor of b/a (also equal to f/z and z′/fb). The image at the intermediate plane, of height h′, is further magnified by a factor of 25 centimeters (called the near distance to the eye) divided by the focal length of the eyepiece. Thus, the total magnification of the microscope is equal to the magnification by the objective multiplied by that of the eyepiece. The visual image (virtual) appears to the observer as if it were 10 inches (25 cm) away from the eye. Most objectives are corrected to work within a narrow range of image distances, and many are designed to work only in specifically corrected optical systems that have matching eyepieces. The magnification inscribed on the objective barrel is defined for the tube length of the microscope for which the objective was designed. The lower portion of Fig. 1 illustrates, using ray traces, the optical train of an infinity-corrected microscope system. The components of this system are labeled similarly to the finite-tube-length system for easy comparison. Here, the magnification of the objective is the ratio h′/h, which is determined by the tube lens Ltb. Note the infinity space that is defined by parallel light beams in every azimuth between the objective and the tube lens. This is the space used by microscope manufacturers to add accessories such as vertical illuminators, DIC prisms, polarizers, and retardation plates, whose designs are much simpler here and introduce little distortion of the image (18). The magnification of the objective in the infinity-corrected system equals the focal length of the tube lens divided by the focal length of the objective.
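These magnification relationships can be made concrete with a short numerical sketch (Python purely for illustration; the function names and all focal lengths below are hypothetical example values, not taken from any particular instrument):

```python
# Illustrative magnification arithmetic for the two optical trains above.
# All focal lengths and distances are hypothetical example values.

def finite_tube_magnification(f_front, z):
    """Objective magnification in a finite-tube system, Newtonian form
    M = f / z (equivalent to b / a in the text)."""
    return f_front / z

def infinity_magnification(f_tube, f_objective):
    """Objective magnification in an infinity-corrected system:
    tube-lens focal length divided by objective focal length."""
    return f_tube / f_objective

def eyepiece_magnification(f_eyepiece_mm):
    """Eyepiece magnification: 250 mm (the near distance of the eye)
    divided by the eyepiece focal length."""
    return 250.0 / f_eyepiece_mm

# A nominal 40x objective (f = 4.5 mm) with the object 0.1125 mm
# beyond the front focus: M = 4.5 / 0.1125 = 40.
m_obj = finite_tube_magnification(f_front=4.5, z=0.1125)

# A 10x eyepiece (f = 25 mm), so total visual magnification is 400x.
m_eye = eyepiece_magnification(25.0)
print(f"objective {m_obj:.0f}x * eyepiece {m_eye:.0f}x = {m_obj * m_eye:.0f}x")

# Infinity-corrected equivalent: a 180 mm tube lens (one common
# manufacturer convention among several) with the same 4.5 mm
# objective also gives a 40x objective magnification.
print(infinity_magnification(f_tube=180.0, f_objective=4.5))
```

Note that commercial objectives are computed for a specific tube length or tube-lens focal length, so these formulas describe idealized thin-lens behavior rather than any catalog objective.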
FUNDAMENTALS OF IMAGE FORMATION

When light from the microscope lamp passes through the condenser and then through the specimen (assuming that the specimen is light absorbing), some of the light passes undisturbed in its path around and through the specimen. Such light is called direct light or undeviated light. The background light (often called the surround) that passes around the specimen is also undeviated light. Some of the light that passes through the specimen is deviated when it encounters parts of the specimen. Such deviated light (called, as you will subsequently learn, diffracted light) is rendered one-half wavelength, or 180°, out of phase with the direct light that has passed through undeviated. This one-half wavelength phase difference, caused by the specimen itself, enables the diffracted light to interfere destructively with the direct light when both arrive at the intermediate image plane at the fixed diaphragm of the eyepiece. The eye lens of the eyepiece further magnifies this image, which finally is projected onto the retina, the film plane of a camera, or the surface of a light-sensitive computer chip. What has happened is that the direct or undeviated light is projected by the objective and spread evenly across the entire image plane at the diaphragm of the eyepiece. The light diffracted by the specimen is brought to focus at various localized places on the same image plane,
Figure 2. Diffraction spectra seen at the rear focal plane of the objective through a focusing telescope when imaging a closely spaced line grating. (a) Image of the condenser aperture diaphragm with an empty microscope stage. (b) Two diffraction spectra from a 10X objective when a finely ruled line grating is placed on the microscope stage. (c) Diffraction spectra of the line grating from a 40X objective. (d) Diffraction spectra of the line grating from a 60X objective.
where the diffracted light causes destructive interference, reduces intensity, and results in more or less dark areas. These patterns of light and dark are what we recognize as an image of the specimen. Because our eyes are sensitive to variations in brightness, the image becomes a more or less faithful reconstitution of the original specimen. To help understand the basic principles, readers are encouraged to try the following exercise, using an object of known structure, such as a stage micrometer or a similar grating of closely spaced dark lines, as a specimen. To proceed, place the finely ruled grating on the microscope stage and bring it into focus using first a 10X and then a 40X objective (18). Remove the eyepiece and insert a phase telescope in its place so that the rear focal plane of the objective can be observed. If the condenser aperture diaphragm is closed most of the way, a bright white central spot of light, the image of the aperture diaphragm, will appear at the back of the objective. To the right and left of the central spot will be a series of spectra (also images of the aperture diaphragm), each colored blue on the part closest to the central spot and red on the part farthest from it (as illustrated in Fig. 2). The intensity of these colored spectra decreases with distance from the central spot (17,18). Those spectra nearer the periphery of the objective are dimmer than those closer to the central spot. The diffraction spectra illustrated in Fig. 2 were produced at three different magnifications. In Fig. 2b, the diffraction pattern visible at the rear focal plane of the 10X objective contains two diffraction spectra. If the grating is removed from the stage, as illustrated in Fig. 2a, these spectra disappear, and only the central image of the aperture diaphragm remains. If the grating is reinserted, the spectra reappear.
Note that the spaces between the colored spectra appear dark. Only a single pair of spectra can be observed if the grating is examined with the 10X objective: one diffraction spot appears to the left and one to the right of the central aperture opening. If the line grating is examined with a 40X objective, as shown in Fig. 2c, several diffraction spectra appear to the left and right of the central aperture. When the magnification is increased to 60X (assuming that this objective has a higher numerical aperture than the 40X objective), more spectra appear to the right and left (Fig. 2d) than are visible with the 40X objective. Because the colored spectra disappear when the grating is removed, it can be assumed that the specimen itself affects the light passing through, thus producing the
colored spectra. Further, if the aperture diaphragm is closed down, we will observe that objectives of higher numerical aperture grasp more of these colored spectra than objectives of lower numerical aperture. The crucial importance of these two statements for understanding image formation will become clear in the ensuing paragraphs. The central spot of light (image of the condenser aperture diaphragm) represents the direct or undeviated light that passes undisturbed through or around the specimen as illustrated in Fig. 3b. It is called the zeroth order. The fainter images of the aperture diaphragm on each side of the zeroth order are called the first, second, third, fourth, etc. orders, respectively, as represented by the simulated diffraction pattern in Fig. 3a that would be observed at the rear focal plane of a 40X objective. All of
Figure 3. Diffraction spectra generated at the rear focal plane of the objective by undeviated and diffracted light. (a) Spectra visible through a focusing telescope at the rear focal plane of a 40X objective. (b) Schematic diagram of light both diffracted and undeviated by a line grating on the microscope stage.
Figure 4. Diffraction patterns generated by narrow and wide slits and by complex grids. (a) Conoscopic image, at the rear focal plane of the objective, when focused on the narrow (lower) portion of the line grating in (b). (b) Line grating that has a greater slit width at the top and a lesser width at the bottom. (c) Conoscopic image when focused on the wide (upper) portion of the grating in (b). (d) and (f) Orthoscopic images of grid lines arranged in a square pattern (d) and a hexagonal pattern (f). (e) and (g) Conoscopic images corresponding to the patterns in (d) and (f), respectively.
the captured orders in this case represent the diffraction pattern of the line grating seen at the rear focal plane of the objective (18). The fainter diffracted images of the aperture diaphragm are caused by light deviated or diffracted, spread out in fan shape, at each of the openings of the line grating (Fig. 3b). The blue wavelengths are diffracted at a lesser angle than the green wavelengths, which are diffracted at a lesser angle than the red wavelengths. At the rear focal plane of the objective, the blue wavelengths from each slit interfere constructively to produce the blue area of the diffracted image of each spectrum or order; similarly for the red and green areas (Fig. 3a). Where the diffracted wavelengths are one-half wave out of phase for each of these colors, the waves interfere destructively; hence the dark areas between the spectra or orders. All wavelengths from each slit add constructively at the position of the zeroth order. This produces the bright white light you see as the zeroth order at the center of the rear focal plane of the objective (Figs. 2, 3, and 4). The closer the spacing of a line grating, the fewer the spectra that will be captured by a given objective, as illustrated in Fig. 4a–c. The diffraction pattern illustrated in Fig. 4a was captured by a 40X objective imaging the lower portion of the line grating in Fig. 4b, where the slits are closer together (17,18). In Fig. 4c, the objective is focused on the upper portion of the line grating (Fig. 4b), where the slits are farther apart, and more spectra are captured by the objective. The direct light and the light from the diffracted orders continue on, focused by the objective, to the intermediate image plane at the fixed diaphragm of the eyepiece. Here, the direct and diffracted light rays interfere and are thus reconstituted into the real, inverted image that is seen by the eye lens of the eyepiece and further magnified. This is illustrated in Fig.
4d–g by two types of diffraction gratings. The square grid illustrated in Fig. 4d represents the orthoscopic image of the grid (i.e., the usual specimen image) as seen through the full aperture of the objective. The diffraction pattern derived from this grid is shown as a conoscopic image that would be seen at the rear focal plane of the objective (Fig. 4e). Likewise, the orthoscopic image of a hexagonally arranged grid (Fig. 4f) produces a corresponding hexagonally arranged conoscopic image of first-order diffraction patterns (Fig. 4g).
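The wavelength and spacing dependence described above follows from the grating equation, n·d·sin θ = mλ (λ the vacuum wavelength, n the refractive index of the medium between grating and objective). A sketch follows, with assumed wavelengths, grating pitch, and apertures (all illustrative values of my own); it shows that blue light is diffracted at a smaller angle than red, that a larger numerical aperture admits more orders, and that an immersion medium compresses the diffraction fan:

```python
import math

def diffraction_angle_deg(wavelength_um, pitch_um, order, n=1.0):
    """Angle of diffracted order m from a line grating of pitch d,
    from n * d * sin(theta) = m * lambda (lambda = vacuum wavelength).
    Returns None when the order cannot propagate."""
    s = order * wavelength_um / (n * pitch_um)
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))

def orders_captured(wavelength_um, pitch_um, na):
    """Number of positive orders accepted by an objective of numerical
    aperture NA: order m is collected when n*sin(theta_m) = m*lambda/d <= NA."""
    return int(na * pitch_um // wavelength_um)

pitch = 2.0  # assumed grating pitch in micrometers (illustrative only)

# Blue is diffracted at a smaller angle than green, and green than red:
for name, wl in [("blue", 0.45), ("green", 0.55), ("red", 0.65)]:
    print(f"{name}: first order at {diffraction_angle_deg(wl, pitch, 1):.1f} deg")

# A higher-NA objective captures more orders of green light
# (zero captured orders beyond the zeroth means an unresolved grating):
for na in (0.25, 0.65, 1.25):
    print(f"NA {na}: {orders_captured(0.55, pitch, na)} orders")

# Immersion oil (n ~ 1.515) compresses the diffraction fan:
print(diffraction_angle_deg(0.55, 1.0, 1))            # in air
print(diffraction_angle_deg(0.55, 1.0, 1, n=1.515))   # in oil: smaller angle
```

Numerical apertures above 1 are reachable only with immersion objectives, which is exactly the point made later in connection with Fig. 5.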
Microscope specimens can be considered complex gratings that have details and openings of various sizes. This concept of image formation was largely developed by Ernst Abbe, the famous German microscopist and optics theoretician of the nineteenth century. According to Abbe (his theories are widely accepted now), the details of a specimen will be resolved if the objective captures the zeroth order of the light and at least the first order (or any two adjacent orders, for that matter). The greater the number of diffracted orders that gain admittance to the objective, the more accurately the image represents the original object (2,14,17,18). Further, if a medium of refractive index higher than that of air (such as immersion oil) is used in the space between the front lens of the objective and the top of the coverslip, as shown in Fig. 5a, the angle of the diffracted orders is reduced, and the fans of diffracted light are compressed. As a result, an oil immersion objective can
Figure 5. Effect of refractive index of imaging medium on diffracted orders captured by the objective. (a) Orthoscopic image of objective back focal plane diffraction spectra when air is the medium between the coverslip and the objective front lens. (b) Diffraction spectra when immersion oil of refractive index similar to glass is used in the space between the coverslip and the objective front lens.
capture more diffracted orders and yield better resolution than a dry objective (Fig. 5b). Moreover, because blue light is diffracted at a lesser angle than green or red light, a lens of a given aperture may capture more orders of light when the wavelengths are in the blue region of the visible light spectrum. These two principles explain the classic Rayleigh equation often cited for resolution (2,18–20):

d = 1.22(λ/2NA)    (1)
where d is the space between two adjacent particles (that still allows perceiving the particles as separate), λ is the wavelength of illumination, and NA is the numerical aperture of the objective. The greater the number of higher diffracted orders admitted into the objective, the smaller the details of the specimen that can be clearly separated (resolved); hence the value of using high numerical aperture for such specimens. Likewise, the shorter the wavelength of visible light used, the better the resolution. These ideas explain why apochromatic lenses of high numerical aperture can separate extremely small details in blue light. Placing an opaque mask at the back of the objective blocks the outermost diffracted orders. This either reduces the resolution of the grating lines (or of any other object details) or destroys the resolution altogether, so that the specimen is not visible. Hence the usual caution not to close down the condenser aperture diaphragm below the suggested two-thirds to nine-tenths of the objective’s aperture. Failure of the objective to grasp any of the diffracted orders results in an unresolved image. In a specimen that has very minute details, the diffraction fans are spread at a very large angle and require a high numerical aperture objective to capture them. Likewise, because the diffraction fans are compressed in immersion oil or in water, objectives designed for such use can resolve better than dry objectives. If alternate diffracted orders are blocked out (still assuming the grating as our specimen), the number of lines in the grating appears doubled (a spurious resolution). The important caveat is that actions introduced at the rear of the objective can have a significant effect on the eventual image produced (18).
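The Rayleigh expression above is straightforward to evaluate. The following sketch (the function name and the sample wavelength and aperture values are illustrative assumptions, not values from the text) tabulates the resolvable spacing for several numerical apertures and shows the gain from a shorter wavelength:

```python
def rayleigh_limit_um(wavelength_um, na):
    """Smallest resolvable spacing, d = 1.22 * lambda / (2 * NA)."""
    return 1.22 * wavelength_um / (2.0 * na)

# Green light (0.55 um assumed) with dry, high-dry, and oil-immersion NAs:
for na in (0.25, 0.65, 0.95, 1.4):
    print(f"NA {na}: d = {rayleigh_limit_um(0.55, na):.3f} um")

# A shorter (blue) wavelength resolves finer detail at the same aperture:
print(rayleigh_limit_um(0.45, 1.4))  # smaller d than with green light
```

As the text notes, this is why a high-NA apochromat used in blue light separates the finest details.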
For small details in a specimen (rather than a grating), the objective projects the direct and diffracted light onto the image plane of the eyepiece diaphragm as small, circular diffraction disks known as Airy disks (illustrated in Fig. 6). High numerical aperture objectives capture more of the diffracted orders and produce smaller disks than low numerical aperture objectives. In Fig. 6, Airy disk size is shown steadily decreasing from Fig. 6a through Fig. 6c. The larger disk sizes in Figs. 6a, b are produced by objectives that have lower numerical apertures, whereas the very sharp Airy disk in Fig. 6c is produced by an objective of very high numerical aperture (2,18). The resulting image at the eyepiece diaphragm level is actually a mosaic of Airy disks that are perceived as light and dark regions of the specimen. When two disks are so close together that their central black spots overlap considerably, the two details represented by these
Figure 6. Airy disks and resolution. (a–c) Airy disk size as a function of objective numerical aperture; the disk size decreases from (a) to (c) as numerical aperture increases. (d) Two Airy disks so close together that their ring structures overlap. (e) Airy disks at the limit of resolution.
overlapping disks are not resolved or separated and thus appear as one (illustrated in Fig. 6d). The Airy disks shown in Fig. 6e are just far enough apart to be resolved. The basic principle to remember is that the combination of direct and diffracted light (or the manipulation of direct or diffracted light) is critically important in image formation. The key places for such manipulation are the rear focal plane of the objective and the front focal plane of the substage condenser. This principle is fundamental to most of the contrast improvement methods in optical microscopy (18; and see the section on Contrast-enhancing techniques); it is of particular importance at high magnification of small details close in size to the wavelength of light. Abbe was a pioneer in developing these concepts to explain image formation of light-absorbing or amplitude specimens (2,18–20).

Köhler Illumination

Proper illumination of the specimen is crucial in achieving high-quality images in microscopy and critical photomicrography. An advanced procedure for microscope illumination was first introduced in 1893 by August
Köhler of the Carl Zeiss corporation to provide optimum specimen illumination. All manufacturers of modern laboratory microscopes recommend this technique because it produces specimen illumination that is uniformly bright and free from glare and thus allows the user to realize the microscope’s full potential. Most modern microscopes are designed so that the collector lens and other optical components built into the base project an enlarged and focused image of the lamp filament onto the plane of the aperture diaphragm of a properly positioned substage condenser. Closing or opening the condenser diaphragm controls the angle of the light rays that emerge from the condenser and reach the specimen from all azimuths. Because the light source is not focused at the level of the specimen, illumination at the specimen level is essentially grainless and extended and does not suffer deterioration from dust and imperfections on the glass surfaces of the condenser. The opening size of the condenser aperture diaphragm, along with the aperture of the objective, determines the realized numerical aperture of the microscope system. As the condenser diaphragm is opened, the working numerical aperture of the microscope increases, resulting in greater light transmittance and resolving power. Parallel light rays that pass through and illuminate the specimen are brought to focus at the rear focal plane of the objective, where the images of the variable condenser aperture diaphragm and of the light source are observed in focus simultaneously. The light pathways illustrated in Fig. 7 are schematically drawn to represent separate paths taken by the
specimen-illuminating light rays and the image-forming light rays (17). This is not a true representation of any real segregation of these pathways but a diagrammatic representation presented for visualization and discussion. The left-hand diagram in Fig. 7 demonstrates that the ray paths of illuminating light produce a focused image of the lamp filament at the plane of the substage condenser aperture diaphragm, the rear focal plane of the objective, and the eye point (also called the Ramsden disk) above the eyepiece. These areas in common focus are often referred to as conjugate planes, a principle that is critical in understanding the concept of Köhler illumination (2,17–21). By definition, an object that is in focus at one plane is also in focus at the other conjugate planes of that light path. Four separate planes in each light pathway (both image-forming and illuminating) together make up a conjugate plane set. Conjugate planes in the path of the illuminating light rays in Köhler illumination (left-hand diagram in Fig. 7) include the lamp filament, the condenser aperture diaphragm (at the front focal plane of the condenser), the rear focal plane of the objective, and the eye point of the eyepiece. The eye point is located approximately one-half inch (1 cm) above the top lens of the eyepiece, at the point where the observer places the front of the eye during observation. Likewise, the conjugate planes in the image-forming light path in Köhler illumination (right-hand diagram in Fig. 7) include the field diaphragm, the focused specimen, the intermediate image plane (i.e., the plane of the fixed diaphragm of the eyepiece), and the retina of the eye or the film plane of the camera. The presence of conjugate focal planes is often useful in troubleshooting a microscope
Figure 7. Light paths in Köhler illumination. The illuminating ray paths are illustrated on the left side and the image-forming ray paths on the right. Light emitted from the lamp passes through a collector lens and then through the field diaphragm. The aperture diaphragm in the condenser determines the size and shape of the illumination cone on the specimen plane. After passing through the specimen, the light is focused at the back focal plane of the objective and then enters and is magnified by the ocular before passing into the eye.
for contaminating dust, fibers, and imperfections in the optical elements. When such artifacts are in sharp focus, it follows that they must reside on or near a surface that is part of the image-forming set of conjugate planes. Members of this set include the glass element at the microscope light port, the specimen, and the graticule (if any) in the eyepiece. Alternatively, if these contaminants are out of focus, then they occur near the illuminating set of elements that share conjugate planes. Suspects in this category are the condenser top lens (where dust and dirt often accumulate), the exposed eyepiece lens element (contaminants from eyelashes), and the objective front lens (usually fingerprint smudges). In Köhler illumination, light emitted from the tungsten-halide lamp filament passes first through a collector lens located close to the lamp housing and then through a field lens that is near the field diaphragm. A sintered or frosted glass filter is often placed between the lamp and the collector lens to diffuse the light and ensure even intensity of illumination. In this case, the image of the lamp filament is focused onto the front focal plane of the condenser while the diffuser glass is temporarily removed from the light path. The focal length of the collector lens must be carefully matched to the lamp filament dimensions to ensure that a filament image of the appropriate size is projected into the condenser aperture. For proper Köhler illumination, the image of the filament should completely fill the condenser aperture. The field lens is responsible for bringing the image of the filament into focus at the plane of the substage condenser aperture diaphragm. A first-surface mirror (positioned at a 45° angle to the light path) reflects focused light that leaves the field lens through the field diaphragm and into the substage condenser.
The field diaphragm iris opening serves as a virtual light source for the microscope, and its image is focused onto the specimen plane by raising or lowering the condenser. Optical designs for the arrangement of these elements may vary by microscope manufacturer, but the field diaphragm should be positioned at a sufficient distance from the field lens to keep dust and lens imperfections from being imaged in the plane of the specimen. The field diaphragm in the base of the microscope controls only the width of the bundle of light rays that reaches the condenser — it does not affect the optical resolution, numerical aperture, or intensity of illumination. Proper adjustment of the field diaphragm (i.e., focused by adjusting the height of the condenser and centered in the optical path, then opened to lie just outside the field of view) is important for preventing glare, which can reduce contrast in the observed image. Elimination of unwanted light is particularly important when attempting to image specimens that have inherently low contrast. When the field diaphragm is opened too far, scattered light originating from the specimen and light reflected at oblique angles from optical surfaces can act to degrade image quality. The substage condenser is typically mounted directly beneath the microscope stage in a bracket that can be raised or lowered independently of the stage. The aperture diaphragm opening size is controlled by a swinging arm, a lever, or a rotating collar on the
condenser housing. The most critical aspect of achieving proper Köhler illumination is correct adjustment of the substage condenser. Condenser misalignment and an improperly adjusted condenser aperture diaphragm are the main sources of image degradation and poor quality photomicrography (19). When properly adjusted, light from the condenser fills the rear focal plane of the objective and projects a cone of light into the field of view. The condenser aperture diaphragm is responsible for controlling the angle of the illuminating light cone and, consequently, the working numerical aperture of the condenser. With respect to the size and shape of condenser light cones, it is important to note that reducing the size of the field diaphragm only slightly decreases the size of the lower portions of the light cone. The angle and numerical aperture of the light cone remain essentially unchanged as field diaphragm size is reduced (21). Illumination intensity should not be controlled by opening and closing the condenser aperture diaphragm or by shifting the condenser up and down or axially with respect to the optical center of the microscope. It should be controlled only by using neutral density filters placed into the light path or by reducing voltage to the lamp (although the latter is not usually recommended, especially for photomicrography). To ensure maximum performance of the tungsten-halide lamp, refer to the manufacturer's instrument manual to determine the optimum lamp voltage (usually 5–10 volts) and use that setting. Then, adding or removing neutral density filters can easily control the brightness of the illumination without affecting color temperature. The size of the substage condenser aperture diaphragm opening should coincide with the desired numerical aperture, and the quality of the resulting image should also be considered. 
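The neutral density filter approach recommended above follows simple arithmetic: optical densities (OD) add when filters are stacked, and the fractional transmittance of a stack is 10 raised to the negative total OD. A minimal sketch:

```python
# Controlling illumination brightness with stacked neutral density filters,
# as recommended in the text, rather than by changing lamp voltage.
# Optical densities add; transmittance T = 10**(-total OD).

def transmittance(optical_densities):
    """Fractional transmittance of a stack of ND filters."""
    total_od = sum(optical_densities)
    return 10.0 ** (-total_od)

# Example: stacking an ND 0.3 and an ND 0.6 filter gives OD 0.9,
# passing about 12.6% of the full lamp intensity.
t = transmittance([0.3, 0.6])
```

Because the filters attenuate all visible wavelengths nearly equally, the lamp can stay at its optimum voltage and the color temperature is unaffected.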
In general, the diaphragm should be set to a position that allows two-thirds to nine-tenths (60 to 90%) of the entire light disk size (visible at the rear focal plane of the objective after removing the eyepiece or by using a phase telescope or Bertrand lens). These values may vary due to extremes in specimen contrast. The condenser aperture diaphragm should be set to an opening size that provides a compromise between resolution and contrast that depends, to a large degree, on the absorption, diffraction, and refraction characteristics of the specimen. This adjustment must be accomplished without overwhelming the image with artifacts that obscure detail and present erroneous enhancement of contrast. The amount of image detail and contrast necessary to produce the best photomicrograph also depends on refractive index, optical characteristics, and other specimen-dependent parameters. When the aperture diaphragm is erroneously closed too far, resulting diffraction artifacts cause visible fringes, banding, and/or pattern formation in visual images and photomicrographs. Other problems, such as refraction phenomena, can also produce apparent structures in the image that are not real (21). Alternatively, opening the condenser aperture too wide causes unwanted glare and light scattering from the specimen and optical surfaces within the microscope and leads to a significant loss of contrast and washing out of image detail. The correct
OPTICAL MICROSCOPY
setting will vary from specimen to specimen, and the experienced microscopist will soon learn to adjust the condenser aperture diaphragm (and numerical aperture of the system) accurately by observing the image without necessarily viewing the diaphragm in the rear focal plane of the objective. In fact, many microscopists (including the authors) believe that critical adjustment of the numerical aperture of the microscope system to optimize image quality is the single most important step in photomicrography. When the illumination system of the microscope is adjusted for proper Köhler illumination, it must satisfy several requirements. The illuminated area of the specimen plane must be no larger than the field of view for any given objective/eyepiece combination. The light must also have uniform intensity, and the numerical aperture may vary from a maximum (equal to that of the objective) to a lesser value that depends on the optical characteristics of the specimen. Table 1 contains a list of objective numerical apertures versus the field of view diameter (for an eyepiece of field number 22 with no tube lens present; see discussion on field number) for each objective, from very low to very high magnifications. Many microscopes are equipped with specialized substage condensers that have a swing-out top lens, which can be removed from the optical path for use with lower power objectives (2–5X). This action changes the performance of the remaining components in the light path, and some adjustment is necessary to achieve the best illumination conditions. The field diaphragm can no longer be used for aligning and centering the substage condenser and is now ineffective in limiting the area of the specimen under illumination. Much of the unwanted glare once removed by the field diaphragm is also reduced because the top lens of the condenser produces a light cone that has a much lower numerical aperture and allows light rays to pass through the specimen at much lower angles. 
Most important, the optical conditions for Köhler illumination no longer apply. For low power objectives (2–5X), aligning the microscope optical components and establishing Köhler illumination conditions should always be undertaken
Table 1. View-field Diameters (FN 22) (SWF 10X Eyepiece)a

Objective Magnification    Diameter (mm)
1/2X                       44.0
1X                         22.0
2X                         11.0
4X                          5.5
10X                         2.2
20X                         1.1
40X                         0.55
50X                         0.44
60X                         0.37
100X                        0.22
150X                        0.15
250X                        0.088

a Source: Nikon.
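The entries in Table 1 follow a standard relation: with no tube lens factor present, the view-field diameter equals the eyepiece field number divided by the objective magnification. The snippet below simply reproduces the tabulated values from that relation.

```python
# View-field diameter (mm) = eyepiece field number / objective magnification
# (no tube lens factor), reproducing Table 1 for field number 22.

def view_field_diameter(field_number, objective_magnification):
    return field_number / objective_magnification

# Tabulated diameters (mm) keyed by objective magnification, from Table 1.
table1 = {0.5: 44.0, 1: 22.0, 2: 11.0, 4: 5.5, 10: 2.2, 20: 1.1,
          40: 0.55, 50: 0.44, 60: 0.37, 100: 0.22, 150: 0.15, 250: 0.088}

# Every table entry agrees with the formula to within rounding.
check = all(abs(view_field_diameter(22, m) - d) < 0.01
            for m, d in table1.items())
```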
at a higher (10X) magnification before removing the swing-out condenser lens for work at lower (5X and less) magnifications. Then, the height of the condenser should not be changed. Condenser performance is radically changed when the swing-out lens is removed (18,21). The image of the lamp filament is no longer formed in the aperture diaphragm, which ceases to control the numerical aperture of the condenser and the illumination system. In fact, the aperture diaphragm should be opened completely to avoid vignetting, a gradual fading of light at the edges of the view field. Contrast adjustment in low magnification microscopy is achieved by adjusting the field diaphragm (18,19,21). When the field diaphragm is wide open (more than 80%), specimen details are washed out, and a significant amount of scattering and glare is present. Closing the field diaphragm to a position between 50 and 80% yields the best compromise in specimen contrast and depth of field. This adjustment is now visible at the rear focal plane of the objective when the eyepiece is removed or when a phase telescope or Bertrand lens is inserted into the eye tube. Objectives designed for low magnification have significantly simpler designs than their higher magnification counterparts. This is due to the smaller angles of illuminating light cones produced by low magnification condensers, which require objectives of lower numerical aperture. Measurement graticules, which must be sharply focused and simultaneously superimposed on the specimen image, can be inserted into any of several conjugate planes in the image-forming path. The most common eyepiece (ocular) measuring and photomicrography graticules are placed in the intermediate image plane, which is positioned at the fixed diaphragm within the eyepiece. It is also theoretically possible to place graticules in any image-forming conjugate plane or in the plane of the illuminated field diaphragm. 
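The stakes behind the condenser-aperture adjustments discussed above can be put in numbers. Under partially coherent brightfield illumination, lateral resolution is often estimated as d = 1.22 λ / (NA_objective + NA_condenser), so stopping down the condenser (lowering its working numerical aperture) coarsens the smallest resolvable spacing. This is a standard estimate, not a formula taken from this article; the objective and condenser values below are illustrative.

```python
# Estimated lateral resolution in brightfield microscopy:
#   d = 1.22 * wavelength / (NA_objective + NA_condenser)
# Closing the condenser aperture lowers NA_condenser and so increases d.

def resolution_um(wavelength_um, na_objective, na_condenser):
    return 1.22 * wavelength_um / (na_objective + na_condenser)

# Example at 550 nm (green light) with a 0.95-NA dry objective:
full = resolution_um(0.550, 0.95, 0.95)     # condenser matched to objective
stopped = resolution_um(0.550, 0.95, 0.60)  # condenser stopped down
# `stopped` is larger (coarser) than `full`, illustrating the trade-off
# between resolution and contrast described in the text.
```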
Stage micrometers are specialized graticules placed on microslides, which are used to calibrate eyepiece graticules and to make specimen measurements. Color and neutral density filters are often placed in the optical pathway to reduce light intensity or alter the color characteristics of the illumination. There are several locations within the microscope stand where these filters are usually placed. Some modern laboratory microscopes have a filter holder sandwiched between the lamp housing and collector lens, which is an ideal location for these filters. Often, neutral density filters along with color correction filters and a frosted diffusion filter are placed together in this filter holder. Other microscope designs provide a set of filters built internally into the body, which can be toggled into the light path by levers. A third common location for filters is a holder mounted on the bottom of the substage condenser, below the aperture diaphragm, that will accept gelatin or glass filters. It is important not to place filters in or near any of the image-forming conjugate planes to avoid imaging dirt or surface imperfections on the filters, along with the specimen (22). Some microscopes have an attachment for placing filters near the light port at the base (near the field diaphragm). This placement is probably too close to the
field diaphragm, and surface contamination may be either in sharp focus or appear as blurred artifacts superimposed on the image. For the same reasons, it is also not wise to place filters directly on the microscope stage. MICROSCOPE OBJECTIVES, EYEPIECES, CONDENSERS, AND OPTICAL ABERRATIONS Finite microscope objectives are designed to project a diffraction-limited image at a fixed plane (the intermediate image plane) that is dictated by the microscope tube length and located at a prespecified distance from the rear focal plane of the objective. Specimens are imaged at a very short distance beyond the front focal plane of the objective through a medium of defined refractive index, usually air, water, glycerin, or specialized immersion oils. Microscope manufacturers offer a wide range of objective designs to meet the performance needs of specialized imaging methods (2,6,9,18–21; and see the section on Contrast-enhancing techniques), to compensate for cover glass thickness variations, and to increase the effective working distance of the objective. All of the major microscope manufacturers have now changed their design to infinity-corrected objectives. Such objectives project emerging rays in parallel bundles from every azimuth to infinity. They require a tube lens in the light path to bring the image into focus at the intermediate image plane. The least expensive (and most common) objectives are achromatic objectives, which are corrected for axial chromatic aberration at two wavelengths (red and blue) that are brought into the same focus. Further, they are corrected for spherical aberration in the color green, as described in Table 2. The limited correction of achromatic objectives leads to problems in color microscopy and photomicrography. When focus is chosen in the red–blue region of the spectrum, images will have a green halo (often termed residual color). 
Achromatic objectives yield their best results when light is passed through a green filter (often an interference filter) and black-and-white film is used when these objectives are employed for photomicrography. The lack of correction for flatness of field (or field curvature) further hampers achromat objectives. In the past few years, most manufacturers have begun providing flat-field corrections for achromat Table 2. Objective Lens Types and Correctionsa
                    Corrections for Aberrations
Type                Spherical    Chromatic
Achromat            *b           2c
Plan achromat       *b           2c
Fluorite            3d
Plan fluorite       3d
Plan apochromat     4e

100 dB) of the CCD is an invaluable property for identifying weak spectral lines. Furthermore, the inherent integration of many highly sensitive photodetector elements in a small area has, in some instances, allowed revolutionary experimental designs (56). Several visible light spectrometer systems that use CCDs as the imaging device are commercially available. In the X-ray portion of the spectrum, scientific CCDs (57,58) have been used as imaging spectrometers for astronomical mapping of the sun (59), the galactic diffuse X-ray background (60), and other X-ray sources. Additionally, scientific CCDs designed for X-ray detection are also used in X-ray diffraction, materials analysis, medicine, and dentistry. CCD focal planes designed for infrared photon detection have also been demonstrated in InSb (61) and HgCdTe (62) but are not available commercially. Fabrication. Although CCDs have been fabricated in many semiconducting materials such as Ge (63), InP (64), and HgCdTe (65), by far the most readily available devices are those that use Si as the semiconductor. There are several common types of silicon CCDs. All share certain processing steps, and most utilize p-type silicon as the semiconducting material. Single silicon crystals are grown (66) in conventional Czochralski vertical pullers using a single-crystal silicon seed dipped and rotated in a silicon melt. The large boules of silicon are sawed into wafers. The following fabrication discussion describes a
Figure 11. Cutaway view of a CCD shift register where the φi represent gate electrodes. Voltage pulses applied to the phase gates move a photogenerated charge in the charge-transfer direction. The channel stops confine the charge during integration and transfer. See text.
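The phase clocking depicted in Fig. 11 can be sketched as a toy model: the shift register is a list of charge packets, one entry per gate electrode, and sequencing the four phase clocks advances each packet one gate per phase step, so a full four-phase clock cycle moves a packet by one pixel (four gates). This is a conceptual sketch of the transfer scheme, not a device simulation.

```python
# Toy model of four-phase CCD charge transfer (cf. Fig. 11).
# Each list element is the charge (electrons) stored under one gate electrode.

def shift_one_gate(wells):
    """Advance every charge packet by one gate position; the last packet
    exits the register (toward the output amplifier in a real CCD)."""
    return [0] + wells[:-1]

def shift_one_pixel(wells):
    """One complete four-phase clock cycle: four gate-by-gate steps."""
    for _ in range(4):
        wells = shift_one_gate(wells)
    return wells

# Two-pixel register (eight gates) holding one packet of 1,000 electrons:
register = [0, 1000, 0, 0, 0, 0, 0, 0]
moved = shift_one_pixel(register)  # the packet now sits one pixel to the right
```

In a physical device, the channel stops confine each packet laterally while the phase gates perform exactly this kind of bucket-brigade transfer toward the output amplifier.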
process for creating a generic four-phase CCD from p-type silicon (67). A cross-sectional view of a four-phase Si-based CCD is shown in Fig. 11. The starting wafer is a thin p-type silicon layer grown epitaxially on a degenerately doped p+ silicon substrate roughly 500 µm thick. The epitaxial layer is typically 8–15 µm thick and is doped with boron to a nominal resistivity of 10 ohm cm. The initial step of the fabrication process consists of a cleaning procedure designed to ensure a nearly defect-free surface. Next, a thin layer of protective SiO2 is grown on the silicon surface by using elevated temperatures in a steam and oxygen ambient. Channel stops, thin stripes that laterally contain the stored charge, are created by high energy implantation of p-type ions into appropriate regions of the silicon. Most CCDs are designed with a thin layer of n-type dopant at the silicon surface, which holds the stored charge physically away from the silicon surface. This dopant layer, called the buried channel, is produced by implanting n-type dopant ions such as P or As into the silicon. After the implantation of the channel stops and buried channel, the original SiO2 layer is removed, and a fresh, undamaged layer of SiO2 is regrown. Next, a layer of heavily impurity-doped polysilicon is deposited and patterned to form the φ2 and φ4 gate electrodes. An additional layer of SiO2 is then grown to provide an effective insulating layer atop the first polysilicon layer. The φ1 and φ3 gate electrodes are formed from a second layer of heavily impurity-doped polysilicon. A low resistance material such as aluminum is then used to form electrical connections between appropriate parts of the CCD. Typically, this same metal layer is used to form bond pads that connect the CCD to external control signals from off-chip electronics. However, some process sequences require including a second metal layer for this purpose. 
Finally, the CCD is covered with an overcoat of protective material such as SiO2 or borophosphosilicate glass. This 500–1,000-Å layer is a barrier between the environment and the contamination-sensitive CCD. Device Type Versus Application. The application for which the CCD is designed dictates the variants in the process that are used to provide the desired performance enhancements. Consider as an example the effect of front-side illumination of the CCD on photon collection efficiency. Applications that require very high photon collection efficiency in the visible blue, extreme UV, or X-ray spectral bands are not well served by front-side illumination of the CCD. A useful CCD variant for such applications is the back-side illuminated CCD. Back-side illuminated CCDs undergo additional processing to remove the underlying p+ substrate from the p-type epitaxial layer. As the name implies, photons impinge upon the device from the back side, thereby avoiding the absorbing layers of gate electrodes present in front-side illuminated devices. In this manner, the photon collection efficiency of the device can be improved in the blue, UV, and low-energy X-ray photon regimes. The substrate is removed by using an etchant whose etch rate is highly dependent on the concentration of boron dopant in the silicon. The process of substrate removal is well understood and highly repeatable. However, thinned (to ca. 25 µm)
PHOTODETECTORS
back-side illuminated CCDs are typically more costly than thick front-side illuminated devices. The additional cost is associated with the reduced mechanical rigidity produced by the substrate removal. Other CCDs have special processing steps that lower the rate at which surface dark current is generated during the interval in which signal charge is collected. One such device is known as the virtual phase CCD (VPCCD) (68). The VPCCD uses a series of implanted dopant layers near the surface of the silicon to eliminate one of the polysilicon deposition steps required for electrode formation. The implanted layers serve as virtual electrodes, which force the surface potential of the silicon in these regions to a constant bias during operation. These virtual electrodes replace two of the four gate phases described in the generic CCD process. The two remaining gate control phases are incorporated into a single physical gate electrode, again by ion-implanted layers in the silicon. Thus, the top side of the VPCCD pixel is only partially covered by the polysilicon gate electrode. The ion-implanted layers provide storage capability and unidirectional charge transfer as the single polysilicon electrode phase is clocked. The virtual electrodes also greatly reduce the rate of surface-related dark current generated in the CCD. At 25 °C, the nominal dark current value of a VPCCD is 0.1 nA/cm², which is nearly five times lower than the dark current in a typical buried channel CCD. A second device that has additional specialized ion-implanted layers is the multipinned phase (MPP) CCD (69). The MPP device merges the best performance features of multiphase and virtual phase CCD technologies and is becoming the CCD of choice for scientific applications. As in the generic multiphase CCD, the entire charge storage region is covered by polysilicon gates. However, before the electrode deposition, additional dopant layers are implanted into the silicon. 
These implanted dopant layers enable the MPP CCD to integrate the charge with the applied gate bias set so as to attract opposite polarity charges to the Si–SiO2 interface. This method of operation produces the lowest possible dark currents, which rival or exceed the performance of the virtual phase CCD. Operation of the CCD in the MPP mode does, however, reduce the total charge storage capacity to typically 20% of that of an equivalent, non-MPP device. Although front-side illuminated devices are available, the MPP CCD is typically back-side illuminated to achieve state-of-the-art photon collection performance. Even with this stipulation, the MPP CCD is still a popular detector due to its availability in various array sizes and formats specifically designed for scientific applications. Silicon Photodiodes The popularity of silicon photodiodes is directly related to their ability to detect photons over a spectral range that spans the near-infrared to low-energy X-ray regimes. Typical spectral response characteristics are given in Fig. 9a. The fast response time of less than 1 µs is attractive compared to the response times of photoconductive or bolometer-based devices. The silicon photodiode has a responsivity of about 0.4 A/W at the peak response wavelength. Some manufacturers list
NEP values of less than 2 × 10^-15 W/Hz^1/2, D* values above 1 × 10^14 cm·Hz^1/2/W, and optical areas of 1–10 mm². The photodiode has proven to be a useful tool for photon-counting and imaging applications across the entire range of spectral sensitivity and has been used for power generation in the visible portion of the spectrum (70,71). Silicon photodiodes are available in both discrete and array formats. The simplicity of the discrete photodiode makes this device one of the least expensive photon detectors available. Discrete photodiodes are merely p–n homojunctions and are available from numerous manufacturers. Typical, commercially available devices have one to four discrete diode detectors per package, and frequently, device manufacturers include operational amplifier-based readout electronics inside the packaged device for ease of use. Discrete photodiodes can be used in a myriad of applications, including high-speed optical switching, intensity determination for automatic exposure control circuitry in film cameras, and photon counting for spectroscopic analyses (72). Photodiode arrays are more complex than their discrete counterparts due to the difficulty of directing the signal information from each diode to off-chip electronics. The more common linear arrays contain internal multiplexing circuitry located on the periphery of the imaging area. This circuitry amplifies and buffers the signal from each diode and presents the information from each pixel through a single-output amplifier in a controlled time sequence. The performance characteristics of linear photodiode arrays typically rival those obtained from discrete diodes. Arrays ranging in size from 1 × 64 pixels up to 1 × 4,096 pixels and larger are available from commercial sources. Linear photodiode arrays are commonly found in high-resolution image scanning applications such as photocopiers, facsimile (FAX) machines, and handheld scanners (73). 
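The NEP and D* figures quoted above are linked by the standard definition of specific detectivity, D* = sqrt(A)/NEP, with the detector area A in cm² and NEP in W/Hz^1/2, giving D* in cm·Hz^1/2/W. A quick consistency check with illustrative values in the quoted ranges:

```python
# Specific detectivity from NEP and detector area:
#   D* = sqrt(A) / NEP   [cm * Hz^(1/2) / W]
# Values below are illustrative, chosen from the ranges quoted in the text.

import math

def d_star(area_cm2, nep_w_per_rthz):
    return math.sqrt(area_cm2) / nep_w_per_rthz

area = 0.01   # a 1-mm^2 detector expressed in cm^2
nep = 2e-15   # W/Hz^(1/2)
ds = d_star(area, nep)  # 5e13 cm*Hz^(1/2)/W, the same order of magnitude
                        # as the quoted D* figure of 1e14
```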
As in linear photodiode arrays, two-dimensional photodiode arrays require internally integrated circuitry to mediate the signal information from each pixel. However, except for the outermost rows and columns, the pixels in two-dimensional arrays are surrounded on all sides by other pixels. Thus the required circuitry cannot reside solely on the periphery of the array but must be integrated into the actual pixel site. In a two-dimensional array, each pixel consists of both a photodiode and electronic circuitry designed to attach or detach that diode from the readout electronics located at the chip periphery. The voltage pulses that control the time-sequenced readout are generated from two multiplexing circuits, one each for the x and y chip dimensions, which are also located at the chip periphery. In its simplest form, the two-dimensional photodiode array uses a large capacitor that is common to all pixels in a given row to convert the signal charge sequentially from each pixel to a voltage. The resulting signal voltage is very small and results in a signal-to-noise ratio roughly one-half that of an equivalent format CCD array. Generally, device performance varies inversely with the number of pixels in the array. Prior to about 1984, two-dimensional photodiode arrays were heavily used in commercial imaging applications such as video cameras. Recently,
CCDs and metal oxide semiconductor (MOS) arrays, two-dimensional imaging arrays that use p–n diodes as photosites, and complementary metal oxide semiconductor (CMOS)-based components for readout circuitry have emerged as strong competitors in this arena. Nevertheless, some two-dimensional standard video format photodiode arrays are still manufactured. These devices are most useful in situations unsuited for CCD and CMOS-based imagers, such as ionizing radiation environments. In addition to use as a popular image detector, the silicon homojunction photodiode and avalanche photodiode (74) can also be used for power generation by operating the device in the photovoltaic mode. In this mode, incident photons produce a voltage drop across the device that is proportional to the number of absorbed photons. When a finite load resistance is placed across the diode leads, a current is produced. Thus, silicon homojunction diodes can be used to convert optical energy into electrical energy. The energy conversion efficiency for common silicon photocells is in the 3–15% range, depending on the specifics of the device design. High levels of illumination are required to produce useful output power. For typical photosensitive cells, an illumination of 10 lux can produce an open-circuit output voltage of about 0.5 volts. However, any desired voltage or current can be generated by appropriate series or parallel interconnection of multiple individual elements. Although silicon photodiodes are not as efficient in generating power as more exotic Group 2–16 (II–VI) and Group 13–15 (III–V) materials, these devices are commonly used as power sources for both terrestrial and space-borne applications. Fabrication. Photodiodes can be made by manufacturing processes resembling those for constructing CMOS as well as bipolar integrated circuits. For a typical p-on-n diode formation process, the starting wafer is an n-type silicon substrate roughly 500 µm thick. 
The silicon has been doped with either P, As, or Sb during the wafer formation process to a nominal resistivity of 10 ohm cm. The initial step of the fabrication process consists of growing a thick layer of SiO2 on the top surface of the silicon wafer by using elevated temperatures in a steam and oxygen ambient. Next, circular windows are etched in the SiO2 layer. The wafer is then placed in a high-temperature diffusion furnace that introduces boron dopant into the silicon through the open circular windows. Ion implantation is a common alternative to high temperature diffusion for boron doping. The result of the doping procedure is the formation of a p–n junction, where one diode is formed for every window in the SiO2. Next, a low resistance material such as aluminum is used to form ohmic contacts to the p-type silicon regions. The aluminum layer is also used to form distinct electrodes, one per diode, to which external connections can be made. Finally, a single common ohmic contact is formed on the back surface of the n-type silicon wafer by an additional metal layer. The method of diode formation as well as the density and profile of the impurity ions determines the specific optical and electrical performance parameters of the photodiode. When photon absorption occurs in
the depletion region of the diode, the resulting carriers are quickly swept from the diode and measured by the readout circuitry. The same behavior occurs for an optically induced charge formed within a diffusion length of the depletion region. A charge generated at a distance greater than one diffusion length recombines in the undepleted silicon and consequently cannot be detected as a signal charge. A similar fate befalls photogenerated carriers produced in heavily doped or heavily lattice-damaged regions. Heavy surface doping and lattice damage are common by-products of the homojunction formation process. Therefore, the diode fabrication process balances the reduction in the rate of surface dark current generation against the charge collection loss produced by heavily doping the silicon surface. Similarly, the improvement in photon collection efficiency obtained by increasing the thickness of the depletion region must be weighed against the associated increase in bulk-generated dark current. CMOS Image Sensors A relatively new entry in the field of visible photon detection is the complementary metal oxide semiconductor (CMOS) image sensor (75). Sometimes referred to as active pixel sensors (APS), these devices are variants of readout electronics commonly used with infrared photodiode arrays (76,77) that have been modified to allow detection of visible photons. Unlike CCD or photodiode imaging arrays, CMOS image sensors have active transistor elements that reside in each pixel. These elements are configured into amplification circuitry that can convert an integrated photogenerated charge to a signal voltage inside each pixel. Before conversion to a voltage, a signal charge is collected and stored in either a diode or MIS capacitor region. Amplifier and multiplexer circuitry residing at the periphery of the imaging area polls each pixel in an image line in parallel and transmits the signal voltage measurements to an on-chip analog-to-digital converter (ADC). 
This readout process continues until every line in the image has been polled and output. In theory, the use of CMOS fabrication technology affords a number of advantages over CCD technology. CMOS provides a very high degree of circuit integrability, thereby allowing complete camera systems to be constructed on a single silicon chip (78). The combination of signal charge detection in the pixel of generation, without the need for transfer through the array, and the use of complementary n- and p-channel field-effect transistors (FETs) provides very low power operation. Finally, because CMOS fabrication is currently the most popular and most readily available semiconductor process for silicon, imagers can be manufactured at very low cost. At this time (2000), CMOS image sensors are still in their infancy. Compared with CCDs, CMOS imagers suffer from several performance problems, including poor quantum efficiency, fixed pattern noise, and image cross talk (79). Quantum efficiency is hampered in two ways. First, including active circuitry decreases the photon-sensitive area of the pixel and thereby diminishes the total collection area for photons. Second, compared to a CCD, standard CMOS processes reduce the depth
in silicon from which the signal charge is collected, thereby retarding the collection efficiency for photons of about 600-nm and longer wavelengths. Fixed pattern noise is introduced into the image through variations in the threshold voltages of the in-pixel amplifiers across the imaging array, as well as through variations in dark current generation on a pixel-by-pixel basis. Image cross talk occurs through the diffusion of photon charge generated beneath the depletion region of the pixel array. Cross talk also results from changes in pixel operating voltages across the array, which are produced by current flow through the resistance of critical bias lines that feed the pixel circuitry. Much improvement in performance is anticipated during the next few years and will be required to strengthen the prospects for CMOS imagers in scientific usage. Cadmium Sulfide Photoconductor CdS photoconductive films are prepared by both evaporating bulk CdS and settling fine CdS powder from aqueous or organic suspension followed by sintering (80,81). Evaporated CdS is deposited to a thickness from 100 to 600 nm on ceramic substrates. The evaporated films are polycrystalline and are heated to 250 °C in oxygen at low pressure to increase photosensitivity. Copper or silver may be diffused into the films to lower the resistivity and reduce contact rectification and noise. The copper acceptor energy level is within 0.1 eV of the valence band edge. Sulfur vacancies produce donor levels, and cadmium vacancies produce deep acceptor levels. Settling can be accomplished from an ink that contains 1-µm crystallites of CdS and selected concentrations of CdCl2. The coating is fired in a restricted volume of air at 500–700 °C. During sintering, the CdCl2 acts as a flux and forms a solution with the surface of the grains of CdS. A few ppm of chloride and copper (from impurities) enter the crystal lattice and act as activator centers for trapping. 
Excess chloride evaporates and leaves a continuous polycrystalline photoconductive film. The layers are 5–30 µm thick and have a linear I–V relationship. The films have an area resistance of 100,000–300,000 Ω/square. Most applications, such as switching on outdoor lights at twilight, require a detector resistance near 1,000 ohms to operate the switching circuit without
Figure 12. CdS film detector in a package showing interdigitation to reduce resistance. See text.
the need for impedance-matching electronics. This is accomplished by depositing the contacts in an interdigitated geometry, as shown in Fig. 12. A protective film is deposited over the detector and contacts to provide long-term stability, or the detector structure is mounted in a hermetic package as shown. The spectral sensitivity of a thin-film CdS detector is shown in Fig. 9a. The response shape is similar to that of the human eye (see Table 1). GaAs, GaAsP, and InGaAs Photodiodes Gallium arsenide (GaAs) and gallium arsenide phosphide (GaAsP) diodes are fabricated as photodiodes as well as light emitters. Fabrication is typically by mesa etch technology on films of GaAsP or InGaAs grown (82) by the vapor-phase epitaxial process using metal organic chemical vapor deposition (MOCVD). This growth technique results in impurity densities of less than 1 × 10¹⁴ atoms/cm³. The spectral cutoff range extends from 500 to 900 nm, depending on the phosphorus content. These detectors can be used for color discrimination and do not require expensive interference filters. Emitter–diode pairs are used for very high impedance signal coupling in high-speed integrated circuits. The spectral sensitivity is shown in Fig. 9a for typical GaAs and GaAsP photodiodes. Surface leakage, caused by the lack of a native surface-passivation technology such as the SiO2 available for Si detectors, is the most significant performance limitation for diodes made from this material system. The indium gallium arsenide (InGaAs) photodiode is becoming a very popular detector for very near-infrared and short-wavelength infrared radiation. InGaAs devices are made by improved epitaxial growth techniques, and their performance rivals or surpasses that of HgCdTe. The cutoff wavelength of commercially available discrete InGaAs photodiodes extends to 2.5 µm.
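Returning to the interdigitated CdS geometry of Fig. 12, the resistance reduction can be sketched by counting the "squares" of film between the electrodes. The sheet resistance and dimensions below are illustrative assumptions, not values from a specific device:

```python
# Sketch of why interdigitated contacts bring a CdS cell's resistance down.
# All numeric values are illustrative assumptions, not from the text.

def cell_resistance(sheet_res_ohm_sq, gap_mm, finger_len_mm, n_gaps):
    """Resistance of a photoconductive film between interdigitated contacts.

    The film between the electrodes amounts to gap/width "squares" in
    series, where the effective width is the total finger length facing
    the gaps; resistance = sheet resistance x number of squares.
    """
    total_width_mm = finger_len_mm * n_gaps
    n_squares = gap_mm / total_width_mm
    return sheet_res_ohm_sq * n_squares

# A plain two-contact cell: one 1-mm gap, 1 mm wide, 200 kOhm/sq film
print(cell_resistance(200_000, gap_mm=1.0, finger_len_mm=1.0, n_gaps=1))
# Interdigitated: 0.1-mm gaps, ten gaps of 2-mm fingers -> ~1 kOhm
print(cell_resistance(200_000, gap_mm=0.1, finger_len_mm=2.0, n_gaps=10))
```

The second call lands near the 1,000 Ω target mentioned in the text, which is the point of the interdigitation.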
Linear InGaAs arrays that have cutoff wavelengths as high as 2.3 µm (see Table 1) are available, enabling research efforts in room-temperature atmospheric spectroscopy (83) and environmental monitoring (84). Uncooled InAsSbP/InGaAs photodiodes that have good performance characteristics out to a 3.4-µm cutoff wavelength have also been reported (85). PbS and PbSe Photoconductors The lead chalcogenides, PbS, PbSe, and PbTe, were among the first infrared-detector materials investigated. Although photovoltaic effects are observed from p–n junctions in single-crystal material, the response is quite poor and not reproducible. However, very sensitive photoconductors are prepared as polycrystalline thin films about 1 µm thick, which are deposited on glass or quartz substrates between gold or graphite electrodes. Detector elements are prepared either by sublimation in the presence of a small partial pressure of O2 or by chemical deposition from an alkaline solution that contains a lead salt and thiourea or selenourea (86). Lead sulfide and lead selenide deposit from solutions as mirrorlike coatings made up of cubic crystallites 0.2–1 µm on a side. The reaction may be represented nominally by the following: Pb2+ + SC(NH2)2 + 2OH− → PbS + C(=NH)2 + 2H2O
PHOTODETECTORS
The actual reaction probably is more complex. The photoconductive behavior depends on the pH of the solution from which deposition occurs. It is likely that oxygen-containing compounds are present in the deposited films. For either method of preparation, the effect of oxygen, which is introduced during preparation or by subsequent heat treatment in air or oxygen, is critical for developing optimum sensitivity. Maximum sensitivity is obtained near the point at which the film conductivity type changes from n to p. The long response times are suggestive of trapping. It is likely that deep trapping states are located at the oxidized surface of the micrograins (87,88). The evaporation technique produces the best results, especially for PbSe and the now-obsolete PbTe. The spectral sensitivity is shown in Fig. 9b for different operating temperatures (see Table 1). Platinum Silicide Schottky Barrier Arrays The Pt:Si detector (89–91) is essentially a metal–semiconductor barrier in which the platinum silicide is a quasi-metal that presents a small energy barrier to electrons. The effective photons are absorbed in a very thin region of the silicide next to the barrier and generate free electrons that flow over the barrier and tunnel through it into the n-type silicon. The efficiency of this process is only a few percent, even at high photon energies (short wavelengths), because of the low electron diffusion coefficient in the silicide. At wavelengths beyond 1 µm, the efficiency drops off dramatically because of the decreasing tunneling probability for the lower energy electrons. The effective quantum efficiency is about 0.1% at a wavelength of 4.8 µm. Techniques of platinum deposition vary, but sputtering and annealing of a very thin layer of platinum produce a uniform platinum silicide film that has less than 0.3% variation in responsivity over an area of 2 × 2 cm. Details of the deposition and annealing processes are considered trade secrets by the manufacturers.
Deposition is typically onto silicon integrated circuit (IC) chips that have a readout structure. Small regions of the IC chip at each pixel are dedicated to the silicide detector element. The readout is typically an in-line charge-coupled device. The spectral sensitivity is shown in Fig. 9b, and array information is given in Table 1. Although the photon efficiency is quite low, focal plane performance for infrared imaging of ambient temperature scenes is acceptable because of the long television display frame time, reasonably low detector noise, and the excellent uniformity of responsivity, which allows high on-chip input gain without offset correction. The responsivity is about 10 mV per degree of scene-temperature difference using f/2 optics, and the scene sensitivity of large arrays is about 0.1 °C. The Pt:Si detector finds typical use in security and defense-related applications but is also useful in spectroscopy (92) and other applications where device cooling is not a significant systems issue. InSb Photodiode Detectors and Arrays Sensitive photodiodes (93,94) have been fabricated from single-crystal InSb using cadmium or zinc to form a p-type
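The ~0.1% Pt:Si quantum efficiency quoted above can be converted to a current responsivity with the standard relation R = ηqλ/hc; the conversion is textbook, and only the two inputs come from the text:

```python
def responsivity_A_per_W(qe, wavelength_um):
    """Photodiode current responsivity R = eta*q*lambda/(h*c),
    which reduces to eta * lambda[um] / 1.2398 in A/W."""
    return qe * wavelength_um / 1.2398

# Pt:Si figures from the text: eta ~ 0.1% at 4.8 um
r = responsivity_A_per_W(qe=0.001, wavelength_um=4.8)
print(f"{r*1e3:.2f} mA/W")  # a few mA/W, i.e., a very weak responder
```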
region in bulk n-type material. High-quality InSb crystals can be grown by the infinite-melt process (95,96), in which an InSb film is grown epitaxially (from the liquid phase) on a slice of InSb that was prepared in a conventional Czochralski vertical puller. The diode formation process typically is a closed-tube diffusion. Cleaned and etched samples of InSb are placed in a quartz ampule with a limited amount of zinc or cadmium. After evacuation and sealing, the ampule is heated to about 50 °C below the crystal melting point. The metal vaporizes partially or completely, depending on the amount, the ampule volume, and the temperature, and diffuses into the crystal over a few hours. The impurity diffusion profile approximates the error-function law, and the p–n junction lies 1–5 µm below the surface. Diode arrays (16) are formed by etching mesas about 50 × 50 µm. Typical performance of InSb detectors is given in Table 1, and the spectral sensitivity is shown in Fig. 9c. Matrix arrays of InSb photodiodes in a mesa configuration as large as 480 × 640 have been demonstrated. Commercial 256 × 256 units are available. The mesa detector array is mated to a silicon chip that has an array of amplifiers and multiplex circuitry. Each diode is connected to an amplifier input. The hybridization process consists of forming indium bumps on each diode mesa and on each amplifier input, using a photolithographic process and In evaporation, and pressing the detector array chip to the silicon integrated circuit chip. The infrared radiation must pass through the InSb chip to reach the photodiode junction. To improve quantum efficiency, the InSb is grown as a thin layer on GaAs or GaAsSb substrates. Quantum efficiency is greater than 50% for wavelengths greater than 2 µm and less than the 5.3-µm cutoff of InSb. A protective (passivating) coating that allows stable operation of the InSb photodiode for several hours without frequent signal normalization has not been found.
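The closed-tube diffusion step above produces an error-function impurity profile, and the junction sits where that profile crosses the background doping. A hedged sketch, with entirely hypothetical surface concentration, background doping, diffusivity, and time, shows how a 1–5 µm depth arises:

```python
import math

def junction_depth_um(Ns, Nd, D_cm2_s, t_s):
    """Depth where an erfc diffusion profile N(x) = Ns*erfc(x / 2*sqrt(Dt))
    crosses the background doping Nd, found by bisection on math.erfc."""
    L = 2.0 * math.sqrt(D_cm2_s * t_s)   # diffusion length scale, cm
    target = Nd / Ns
    lo, hi = 0.0, 10.0                   # in units of L; erfc(10) << target
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > target:      # still above background: go deeper
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * L * 1e4     # cm -> um

# Hypothetical numbers: 1e18 surface, 1e15 background, D = 1e-12 cm^2/s, 3 h
print(junction_depth_um(1e18, 1e15, 1e-12, 3 * 3600))  # a few um deep
```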
However, infrared imaging cameras using hybrid InSb focal planes are a commercial reality. In a real-time imaging configuration, the scene sensitivity of an InSb infrared camera is about 0.04 °C. As a result of their commercial availability, InSb arrays are finding increased use in infrared astronomy (97) and other scientific applications. Mercury Cadmium Telluride HgCdTe has proven to be an excellent infrared detector material (2); the CdTe content can be readily adjusted to obtain cutoff wavelengths from 2–20 µm. The benefits are the high spectral sensitivity of a photon detector, low defect density, and high cooling efficiency. The dependence of energy gap on mole fraction is linear. For Hg1−xCdxTe, the x values of most interest lie between 0.17 and 0.50. The need for large focal planes, up to 2 × 2 cm, has dramatically changed the direction of single-crystal HgCdTe growth technology. Crystal Growth. The method of solid-state recrystallization (quench and anneal) was marginally adequate during the 1970s and 1980s for the linear photoconductor arrays used in military night vision systems, but quenching kinetics restricted sample size to 6 mm × 2 cm. The crystals had many low-angle grain boundaries, defect densities were
Figure 13. Furnace reactor for growing HgCdTe films on CdZnTe substrates using the liquid-phase epitaxial process. The melt is tellurium in a quartz crucible. A mercury reservoir in a cooler zone maintains the required Hg overpressure.
high, and there were large nonuniformities in composition and carrier concentration. Therefore, epitaxial growth technologies were developed. Crystal growers were hampered, however, by the lack of suitable substrate material until methods were developed to grow HgCdTe films onto large-area silicon and GaAs single crystals. The problem of misfit dislocations has not yet been satisfactorily solved except for detector cutoff wavelengths of less than 5 µm. Thus, CdTe single crystals are used to obtain large, low defect density substrates for growth from the liquid phase. Lattice matching is achieved by adding about 3% Zn. Large-area CdZnTe substrates form the basis for liquid-phase epitaxy (LPE) growth of mid- and long-wavelength IR HgCdTe detector material. Substrates are obtained from 3.5-kg CdZnTe ingots grown in graphite boats in sealed quartz ampules (98). The horizontal Bridgman growth process yields 41 × 6.4 × 5.0 cm semicylindrical ingots that have large single-crystal portions, which are then sectioned, sawed into slabs, and diced to the required dimensions. The substrate surfaces are diamond turned and etched to assure flat, damage-free surfaces. Standard substrate sizes are 3.6 × 2.0 cm and 3.6 × 1.5 cm, but the ability to obtain single-crystal ⟨111⟩-oriented CdTe substrates as large as 5.1 × 7.6 cm has already been demonstrated. The material typically exhibits dislocation densities of 1 × 10⁵/cm² and high purity, as judged by Hall measurements made on evaluation samples of n-type LPE films grown on these substrates. Liquid-phase epitaxial films (98,99) are grown in production prototype dipping reactors, as shown in Fig. 13, using CdZnTe substrates. Film growth is from both tellurium and mercury solutions. Phase diagrams for the Te and Hg corners are shown in Fig. 14. Growth is from lightly (ca. 5 × 10¹⁴/cm³) indium-doped 4,000-g tellurium solutions. The mercury vapor pressure is maintained by a mercury reservoir positioned in an independently
controlled furnace zone. Up to 54 cm² of material can be grown in a single run from the largest reactors. Multiple furnace zones in the vicinity of the melt crucible permit adjusting the melt temperature profile. The substrates are positioned horizontally during growth and rotated to promote uniformity of composition. The holder design allows reorienting the substrates vertically for withdrawal to facilitate melt drainage. LPE films are also grown in mercury solutions of several kilograms to which small amounts of tellurium and cadmium have been added. In both cases, the cutoff wavelength varies less than 0.1 µm across the entire film. The films grown in tellurium are annealed in Hg vapor. Mercury atoms diffuse into the films and reduce the density of Hg vacancies that act as acceptor sites. The carrier concentration is shown in Fig. 15 as a function of the annealing temperature. For photoconductive detectors, the films are annealed so that the indium donor density is dominant; this results in n-type material that has a 1 × 10¹⁵/cm³ excess electron density. Minority carrier lifetime is typically 1–5 µs at 77 K. For photodiodes, the films are annealed to an acceptor (via Hg vacancies)
Figure 14. Phase diagrams of HgCdTe used to define the liquid-phase epitaxial growth process, where composition is in mole fraction X and the numbers represent temperatures in °C: (a) Te-rich corner, where the dotted lines A–F correspond to XTe values of 0.1, 0.2, 0.3, 0.5, 0.8, and 0.9, respectively; (b) Hg-rich corner, where A–F correspond to XHg values of 0.9, 0.8, 0.6, 0.4, 0.2, and 0.1, respectively.
of HgCdTe are removed by careful etching to avoid generating defects. The arrays are defined by photoetching and passivated using a ZnS film; indium electrical contacts are applied. The contacts define the optically active area of about 2 × 10⁻⁵ cm². The arrays are mounted in a vacuum dewar for cryogenic cooling. Each element is connected to a series resistance bias circuit and ac coupled to an amplifier. Detector resistance is nominally 100 Ω, and bias current is about 2 mA. The signal voltage is given by Eq. (13) and the responsivity by Eq. (34).
RV = ητV/(hνnv).   (34)
Figure 15. Excess carrier concentration in HgCdTe annealed in a saturated Hg vapor as a function of temperature, where the dashed line represents Hg vacancies. The extrinsic impurity concentration can be adjusted in the growth process from the low-10¹⁴ to the mid-10¹⁷ range. Low-temperature annealing reduces the Hg vacancy concentration and the acceptor density.
The responsivity becomes independent of the bias voltage V when the electric-field-induced sweep time of the holes equals the hole lifetime. Generation–recombination (g–r) noise is dominant in a well-designed, well-made HgCdTe photoconductive detector (103,104) and may be expressed in terms of a minority carrier density p and majority carrier density n. Semiconductor noise analysis for the HgCdTe photoconductor yields

Vg–r² = 4pτV²Δf/[n(n + p)v].   (35)
The responsivity and g–r noise may be analyzed to obtain the background photon flux and the temperature dependence of responsivity, noise, and detectivity. Typically, n > p, and both are determined by shallow impurity levels. The minority carrier density is the sum of thermal and optical contributions,

p = ni²/n + ηφBτ/t,   (36)
density of 3 × 10¹⁶/cm³. Carrier lifetimes are less than 50 ns at 77 K. Extrinsic acceptor doping is being developed to replace vacancy doping to achieve longer minority carrier lifetime and lower dark current density. Film thickness is 40–80 µm. Films grown in Hg (99,100) are usually structured to make heterojunction photodiode arrays. The first, or base, layer is narrower band gap HgCdTe grown on a CdZnTe substrate, doped with indium for excess electrons (n-type) in the 3 × 10¹⁶/cm³ range, and is 10 µm thick. The second, or cap, layer is wider band gap HgCdTe, doped with arsenic for excess holes (p-type) in the 5 × 10¹⁵/cm³ range, and is 4 µm thick. The composite HgCdTe film is photolithographically etched partway into the base layer to form an array of mesas; each one is a photodiode detector element. The p–n junction is close to or coincident with the metallurgical heterojunction. For infrared detection in the 8–12 µm atmospheric spectral window, the base layer CdTe content is about 20%, and the cap layer CdTe content is about 30%.
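The composition/cutoff relationship for Hg1−xCdxTe can be sketched with the conversion λc = 1.24/Eg and the linear Eg(x) dependence the article mentions. The linear coefficients below are rough, assumed values for ~77 K chosen only for illustration; they are not taken from the article:

```python
# Illustrative sketch of cutoff wavelength vs. CdTe mole fraction x in
# Hg(1-x)Cd(x)Te. The linear band-gap coefficients are assumptions.

def cutoff_wavelength_um(x):
    """lambda_c = 1.24/Eg, with an assumed linear Eg(x) ~ 1.65*x - 0.21 eV."""
    eg_ev = 1.65 * x - 0.21
    return 1.24 / eg_ev

for x in (0.2, 0.3, 0.5):
    print(f"x = {x}: cutoff ~ {cutoff_wavelength_um(x):.1f} um")
```

With these assumed coefficients, x ≈ 0.2 lands in the 8–12 µm atmospheric window and x ≈ 0.5 near 2 µm, consistent with the composition range of interest quoted in the text.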
where, typically, the background flux φB is much greater than the signal flux φs. The lifetime τ is about 1 µs, but for narrow band gap, defect-free detectors it becomes the Auger lifetime and can be calculated readily from basic semiconductor properties (101). The cooling requirement is determined according to Eq. (36) because ni is an exponential function of band gap energy and temperature. Combining Eqs. (35) and (36), detectivity can be expressed as follows:

D*λ = (η/2hν)(τ/pt)^1/2.   (37)
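A hedged numerical evaluation of Eqs. (36) and (37) illustrates the magnitudes involved; every input value below is an assumed illustration for a 10-µm HgCdTe photoconductor at 77 K, not data from the article:

```python
import math

H = 6.626e-34   # Planck constant, J*s
C = 2.998e8     # speed of light, m/s

def minority_density(ni, n, eta, phi_B, tau, t):
    """Eq. (36): p = ni^2/n + eta*phiB*tau/t (thermal + optical holes)."""
    return ni**2 / n + eta * phi_B * tau / t

def detectivity(eta, wavelength_m, tau, p, t):
    """Eq. (37): D*_lambda = (eta/2*h*nu) * sqrt(tau/(p*t)), nu = c/lambda."""
    h_nu = H * C / wavelength_m
    return (eta / (2.0 * h_nu)) * math.sqrt(tau / (p * t))

# Assumed values; cm-based units so D* comes out in cm*Hz^(1/2)/W:
p = minority_density(ni=5e13, n=1e15, eta=0.7,
                     phi_B=1e17,        # background flux, photons/(cm^2*s)
                     tau=1e-6, t=8e-4)  # 1-us lifetime, 8-um thickness in cm
d_star = detectivity(eta=0.7, wavelength_m=10e-6, tau=1e-6, p=p, t=8e-4)
print(f"p ~ {p:.2e} /cm^3, D* ~ {d_star:.2e} cm*Hz^0.5/W")
```

With these assumptions the optical term dominates p, and D* lands in the 10¹⁰–10¹¹ range typical of background-limited long-wavelength photoconductors.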
Photoconductive Detector Arrays. Since 1972, more than 50,000 HgCdTe linear arrays (101,102) have been produced in the United States for Department of Defense infrared systems, ranging from night vision for M1 tanks to targeting sights for smart weapons. These common module detector arrays consist of 180 elements photoetched on a 50-µm pitch. Virtually all of the material used for these arrays was prepared by solid-state recrystallization. Small slabs of material were epoxied to sapphire or ceramic and were thinned to 8 µm. The newer epitaxial HgCdTe is also epoxied to ceramic, film side down. The material is thinned by diamond turning, and the CdZnTe substrate is removed. In each case, an 8-µm thick layer of HgCdTe remains. The last few micrometers
At low background flux, this gives the temperature dependence of D*λ shown in Fig. 4. At high flux, the D*λ equation (Eq. 37) reduces to Eq. (12), except for a factor of √2 that results from the random recombination process not present in diodes. The scene sensitivity of a scanning photoconductor array infrared camera is about 0.15 °C. The long minority carrier lifetime in n-type HgCdTe has been exploited to perform signal processing in the element (SPRITE) (105). The bias field is adjusted to sweep the photogenerated holes along the HgCdTe detector element synchronously with the scanned image, creating a time delay and integration (TDI) enhancement of the signal-to-noise ratio.
Photovoltaic Detectors, Arrays, and Focal Planes. The two popular types of photodiode arrays in HgCdTe are based on homojunction (106) and heterojunction (107,108) technologies. Homojunction diode arrays are fabricated from p-type epitaxial HgCdTe. The surface of the grown HgCdTe film is flattened by diamond milling and chemical lapping using a weak solution of bromine in methanol. The epitaxial film is then epoxied to a silicon chip that has an array of amplifiers (typically on 50-µm centers) and readout multiplexer (mux) circuitry. The CdZnTe substrate is milled away, and the HgCdTe is thinned to about 10 µm. An array of small (10–20 µm) holes is etched in the HgCdTe film so that each hole is located over an amplifier input pad of the silicon IC chip. The diodes are made by implanting boron (150 kV, 1 × 10¹⁴/cm²) through a 500-nm ZnS layer. Planar diodes are generated by using a patterned photoresist to define the implanted regions located between the holes. The implant process disrupts the lattice, creating Hg interstitials that cause the formation of very shallow donor states. The damage layer extends about 150 nm from the surface, and the n–p junction lies from 2–8 µm deep, depending on the implant and annealing conditions. The fabrication is completed by applying a CdTe passivation layer (about 1 µm thick) directly to the HgCdTe surface, forming a ZnS layer for antireflection and electrical isolation, and forming a metal film lead to connect each implanted n region to its amplifier input. Infrared illumination is directed onto the HgCdTe film, and quantum efficiency typically exceeds 75%. Because infrared focal planes are typically cooled to less than 100 K, the differential coefficient of thermal expansion causes the shrinking silicon to put tensile stress on the HgCdTe film. However, the thinness of the film and the presence of the holes allow the HgCdTe to strain and retain the integrity of the epoxy film and the electrical contacts.
Array dimensions up to 2.5 cm have proven reliable. Heterojunction diode arrays utilize the grown p–n junction and mesa etch technology. The mesa arrays are passivated by a thin layer (about 500 nm) of CdTe using an evaporation or molecular beam epitaxial process. Annealing at 250 °C in a Hg atmosphere creates a short (about 100 nm) graded layer from the CdTe to the HgCdTe. The benefits of CdTe passivation are efficient isolation of the p–n junction from the surface, low dark current noise, and immunity to ionizing radiation. As in InSb hybrid arrays, indium bumps are formed by evaporation at the center of each mesa and on each amplifier contact pad of the silicon integrated circuit chip. The two chips are pressed together in a bump bonding process called hybridization (109). Sometimes the space between the chips is filled with epoxy. Infrared illumination is through the CdZnTe substrate, which has been coated with ZnS for antireflection. Quantum efficiency typically exceeds 75%. HgCdTe photodiode performance depends, for the most part, on high quantum efficiency and low dark current density (110,111), as expressed by Eqs. (23) and (25). Typical values of R0A at 77 K are shown as a function of cutoff wavelength in Fig. 16 (95). HgCdTe diodes sensitive out to a wavelength of 10.5 µm have shown ideal diffusion current limitation down to 50 K. Values of R0A have
Figure 16. Resistance–area (R0A) product for HgCdTe photodiodes cooled to 77 K. The solid line represents the theoretical limit; the dashed lines (– –) and (– · –) represent high and low performance, respectively. Dark current caused by defects lowers R0A and detector sensitivity. In the high-performance range, dark current is an exponential function of cutoff wavelength and temperature.
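The R0A product of Fig. 16 ties to detectivity through the standard Johnson-noise-limited estimate D*λ = (ηqλ/hc)·√(R0A/4kT). This formula is a textbook result, not derived in this article, and the input values below are illustrative:

```python
import math

K_B = 1.381e-23  # Boltzmann constant, J/K

def d_star_from_r0a(eta, wavelength_um, r0a_ohm_cm2, temp_k):
    """Johnson-noise-limited detectivity of a photodiode from its R0A
    product: D* = (current responsivity) * sqrt(R0A / 4*k*T)."""
    current_resp = eta * wavelength_um / 1.2398   # eta*q*lambda/(h*c), A/W
    return current_resp * math.sqrt(r0a_ohm_cm2 / (4.0 * K_B * temp_k))

# Assumed 10-um HgCdTe diode at 77 K with R0A = 100 Ohm*cm^2
print(f"{d_star_from_r0a(0.75, 10.0, 100.0, 77.0):.2e}")  # cm*Hz^0.5/W
```

With these assumptions the result is of order 10¹² cm·Hz½/W, in line with high-performance cooled photodiodes.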
exceeded 1 × 10⁶ Ω·cm². Spectral sensitivities for three compositions of HgCdTe detectors are shown in Fig. 9a,c. More information is listed in Table 1. The scene sensitivity of a HgCdTe diode area array cooled to 80 K is about 0.02 °C for ambient temperature scenes. Doped Germanium and Silicon Photoconductors Extrinsic photoconductors are typically single-crystal germanium doped with zinc, cadmium, mercury, boron, or gold (112,113) and silicon doped with indium, gallium, or arsenic (114). The doping density ranges from 1 × 10¹⁵ to 1 × 10¹⁷ impurity atoms/cm³, giving absorption coefficients from low (1 cm⁻¹) to higher (50 cm⁻¹) values, respectively. Information on ionization energies, solubilities, diffusion coefficients, and solid–liquid distribution coefficients is available for many impurities from nearly all columns of the periodic table (113). Extrinsic Ge and Si have been used almost exclusively for infrared detector applications. Of the impurities, Cu, Au, Zn, Cd, Hg, and some of the elements of Groups 13 (III) and 15 (V) have been used in detectors. The germanium and silicon used to prepare detectors must be of high purity before they are doped with the desired activator impurity, to avoid unwanted compensation by impurities whose ionization energies are smaller than that of the activator. The required purity can be achieved by zone refining, in which a short molten zone is repeatedly passed from end to end of an ingot of impure Ge or Si. Impurities that have distribution coefficients larger than unity collect near the seed. The concentration of electrically active residual impurities in the center portion of the ingot can be reduced to 10¹²–10¹³/cm³.
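The purification step above follows Pfann's classic single-pass zone-refining profile, a textbook result consistent with the description; the distribution coefficient and starting concentration below are illustrative:

```python
import math

def single_pass_conc(c0, k, x_over_l):
    """Solute concentration after one molten-zone pass (Pfann):
    C(x) = C0 * [1 - (1-k)*exp(-k*x/l)], where k is the distribution
    coefficient and x is measured in zone lengths l from the start end."""
    return c0 * (1.0 - (1.0 - k) * math.exp(-k * x_over_l))

# An impurity with k = 0.1 in a 1e15 /cm^3 ingot: the early zone lengths
# are strongly depleted, and repeated passes sweep impurities to one end.
for x in (0, 1, 5):
    print(f"x = {x} zone lengths: C ~ {single_pass_conc(1e15, 0.1, x):.2e} /cm^3")
```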
Single crystals can be grown by using the Czochralski method, in which an oriented seed crystal is brought into contact with the melt and then is withdrawn slowly while being rotated, or by using a horizontal zone melting method, in which a seed crystal is melted onto one end of a polycrystalline ingot. A molten zone is produced at the junction of the ingot and seed and is moved slowly along
the ingot, leaving behind a single crystal. All of these operations must be carried out in an inert or reducing atmosphere to prevent oxidation of the germanium or silicon. In most cases, the activator impurity must be incorporated during crystal growth. An appropriate amount of the impurity element is dissolved in the molten Ge and, as crystal growth proceeds, enters the crystal at a concentration that depends on the magnitude of the distribution coefficient. For volatile impurities, for example, Zn, Cd, and Hg, special precautions must be taken to maintain a constant impurity concentration in the melt. Growth occurs in a sealed tube to prevent escape of the impurity vapor or in a flow system in which loss caused by vaporization from the melt is replenished from an upstream reservoir. Some impurities, for example, Cu, Ag, Ni, Co, and Fe in Ge, and In, Ga, and As in Si, have diffusion coefficients that are large enough to permit doping by solid-state diffusion well below the melting point of the host crystal. A thin layer of the diffusant, deposited on the surface of the crystal by electroplating, vacuum evaporation, or electrochemical replacement, serves as the source for diffusion. After homogeneity is achieved, the sample is quenched. The alloyed surface layer must be removed by lapping and etching before the electrical contacts are applied. The impurity concentration should be as large as possible, within limits, to maximize the absorption coefficient. In some cases, the concentration is limited by the impurity solubility, which may be small, depending on the element. For impurities of high solubility, the upper limit is set by the onset of impurity banding. When the average separation of impurity atoms in the lattice is small enough, conduction by direct transfer of carriers from atom to atom can occur. Impurity banding limits the extent to which cooling can reduce the dark current, and therefore the noise, in Ge and Si.
This is significant above impurity concentrations of about 10¹⁶/cm³. Impurities other than those from Groups 13 (III) and 15 (V) generally exhibit two or more impurity levels in Ge. If an activator level other than the lowest lying one is used, the lower lying levels must be compensated for by adding an impurity of the opposite conductivity type. For example, if the second Zn acceptor level at 0.095 eV is to be used, the lower lying level at 0.035 eV must be compensated for by adding a donor impurity, for example, As, in a concentration slightly greater than that of Zn. Electrons from the As donors fall into the low-lying Zn level and render it inactive. A special case is the lowest lying Au level; Au acts as a donor at 0.045 eV above the valence band. If a shallow acceptor, for example, Ga, is added in a concentration slightly less than that of Au, electrons from the donor level fall into the Ga level. At low temperatures, holes that are bound to the compensated Au centers can be photoexcited to yield photoconductivity that has a long wavelength threshold at about 25 µm. Single-detector elements and arrays are formed by dicing and etching and attaching electrical contacts. Linear arrays of several hundred elements have been made. The spectral sensitivities of doped Si and Ge detectors are shown in Fig. 9c,d. See Table 1 for other information. Monolithic and hybrid extrinsic focal planes have also been
demonstrated (115,116). Such arrays play an important role in areas that require detecting very long wavelength infrared radiation and can observe at the sensitivity limits imposed by natural sky background radiation (117). GaAs–AlGaAs Quantum Well Arrays The quantum well infrared detector (118–120) is a technology based on the artificial structure called a superlattice. Infrared detection out to a 15-µm wavelength and beyond has been achieved using engineered material that has a controlled energy gap. The technique utilizes multiple stacked layers of semiconducting material, each with a tailored band gap energy, such that a series of potential wells exists in the direction normal to the layers. Typically, very thin layers of GaAs and AlGaAs are grown by molecular beam epitaxy (MBE). The MBE process is conducted in a stainless-steel vacuum system and consists of simple, controlled evaporation of the elements from side chambers into a main chamber that contains a wafer of the substrate material, typically GaAs. The electron wave function is confined in the potential wells when the photon-generated electron wavelength is nominally the layer thickness (about 10 nm). The electrons are freed from the wells by photons, and a signal is detected by applying an external bias voltage. GaAs–AlGaAs quantum well infrared photodetector (QWIP) focal planes have achieved sufficient sensitivity out to a 10-µm wavelength to yield a scene temperature sensitivity of about 0.04 °C when the focal plane is cooled to 70 K (121). Spectral sensitivity is shown in Fig. 9c, and array information is given in Table 1. To date, QWIP devices have been used in a number of scientific applications, including geology (122), medicine (120), and astronomy (123), as well as in public service applications such as fire fighting. Semiconductor Bolometer Arrays The use of bolometers to sense infrared radiation is not new.
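The ~10-nm well thickness quoted above can be checked with the elementary infinite-square-well levels En = n²h²/(8m*L²); real QWIP wells are finite and the GaAs effective mass below is an assumption, so this is only a back-of-envelope sketch:

```python
# Infinite-square-well estimate of QWIP intersubband energies.
# m* = 0.067 m_e (GaAs) is assumed for illustration; wells are really finite.
H = 6.626e-34      # Planck constant, J*s
M_E = 9.109e-31    # electron mass, kg
Q = 1.602e-19      # J per eV

def well_level_ev(n, width_nm, m_eff=0.067):
    """n-th bound-state energy of an infinite well of the given width."""
    L = width_nm * 1e-9
    return n**2 * H**2 / (8.0 * m_eff * M_E * L**2) / Q

e1, e2 = well_level_ev(1, 10.0), well_level_ev(2, 10.0)
lam_um = 1.2398 / (e2 - e1)   # photon wavelength matching the 1->2 spacing
print(f"E2-E1 ~ {(e2 - e1)*1e3:.0f} meV -> lambda ~ {lam_um:.1f} um")
```

Even this crude model puts the intersubband transition of a 10-nm well in the long-wavelength infrared, consistent with the ~10-µm response described in the text.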
What is different is that, rather than using a single large cell, a matrix of miniature cells, or microbolometers, is created. Rather than a single externally connected amplifier, a custom integrated circuit is built under each cell to form a totally integrated focal plane array, as indicated in Fig. 17. The key elements of the structure (124,125) are the thin (100-nm) amorphous silicon (α:Si) or vanadium oxide (VO) thermally sensitive membrane, the thermally insulating support arms, and the integrated circuit underlying this structure. Infrared energy focused on the individual pixels heats the membrane, as given by Eq. (24). The support arms act as electrical contacts but are sufficiently thin and narrow to prevent significant conduction of thermal energy from the membrane to the surroundings. The temperature-dependent electrical resistance of the semiconductor film is monitored by the circuitry and is converted into an electrical signal proportional to the incident radiance. This signal is displayed on a standard television monitor, thus forming the image of the scene. Two technological advances have occurred that make such a structure feasible to build. The first is the development of microetching techniques that can be used
Figure 17. Cross-sectional schematic of a microbolometer photodetector. Micromachining is used to construct small, very low mass detectors. The dimension of a detector–amplifier cell is 50 µm. Mux = multiplexer; IC = integrated circuit.
small circuit for each cell of the array. By doing this, the noise bandwidth of each cell can be minimized, thereby maximizing performance. This device is a thermal detector in the infrared, and its response is therefore independent of wavelength. Measured responsivity is about 5 × 10⁵ V/W, and the NEP is about 50 pW. The focal plane operates at ambient temperature. Imaging arrays that have scene sensitivity better than 0.1 °C have been demonstrated (see Table 1).
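The quoted microbolometer figures can be related through two standard definitions: the smallest resolvable signal voltage is responsivity × NEP, and the normalized detectivity is D* = √(A·Δf)/NEP. The pixel area and noise bandwidth below are assumptions, not values from the text:

```python
import math

def min_signal_voltage(resp_v_per_w, nep_w):
    """Output voltage at the noise floor: responsivity times NEP."""
    return resp_v_per_w * nep_w

def d_star(area_cm2, bandwidth_hz, nep_w):
    """Normalized detectivity D* = sqrt(A * delta_f) / NEP."""
    return math.sqrt(area_cm2 * bandwidth_hz) / nep_w

R, NEP = 5e5, 50e-12          # responsivity and NEP quoted in the text
A = (50e-4) ** 2              # assumed 50-um pixel -> 2.5e-5 cm^2
print(f"min signal ~ {min_signal_voltage(R, NEP)*1e6:.0f} uV")
print(f"D* ~ {d_star(A, 30.0, NEP):.1e} cm*Hz^0.5/W (30-Hz bandwidth assumed)")
```

The resulting D* of a few ×10⁸ cm·Hz½/W is typical of uncooled thermal detectors, well below cooled photon detectors but adequate for the 0.1 °C scene sensitivity cited.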
(a)
HEALTH AND SAFETY FACTORS
(b) Active area
Support pillars
2.5-µm Gap
Figure 18. Microbolometer (a) array portion showing pixels on a 50-µm pitch. Each pixel is connected to a readout amplifier in the supporting silicon IC chip. (b) Detector has a 35 ð 40 µm active area. The serpentine arms give excellent thermal isolation, and the low mass results in a 10-ms response time, ideal for thermal imaging.
to form microscopic structures in silicon and its coatings. This advance makes forming the membrane and support arms possible, and thickness control is in the 10-nm range. Cell dimensions of less than 50 µm have been demonstrated. Scanning electron microscope photos of an array and a test pixel are shown in Fig. 18. The second critical factor is the increased circuit density in silicon integrated circuits that makes it possible to build a
The completed photodetector is usually packaged hermetically in inert glass or plastic or is enclosed in an evacuated metal or glass container. Although most detector materials are toxic, the means taken to passivate and isolate these materials are often adequate to protect the user. However there are exceptions. The preparation of detector materials and detector fabrication can present considerable hazards. Some crystal preparation techniques require using a toxic substance, for example, Hg at vapor pressures above 1 MPa (10 atm). Ampule explosions do occur. The electrical circuitry required to operate photodetectors almost always couples to detector devices at low voltages, for example, Z2 , and λ/2 when Z1 < Z2 .
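The responsivity and NEP figures quoted above for the microbolometer array imply an output-referred noise floor that can be checked by simple arithmetic. The snippet below uses only the two values given in the text; the 1-nW signal is an illustrative assumption, not a figure from the text:

```python
responsivity = 5e5   # V/W, measured responsivity quoted in the text
nep = 50e-12         # W, noise-equivalent power quoted in the text

# Output voltage produced by a signal exactly at the noise floor:
noise_floor_v = responsivity * nep
print(f"noise-floor output voltage: {noise_floor_v * 1e6:.0f} uV")

# Signal-to-noise ratio for an illustrative 1-nW absorbed signal:
snr = 1e-9 / nep
print(f"SNR for a 1-nW signal: {snr:.0f}")
```

A readout chain whose own noise is well below this 25-µV level therefore does not limit the detector.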
where d is the thickness of the transducer, n is a natural number, and c is the velocity of the longitudinal wave in the transducer. Therefore, by modifying Eq. (1), the thickness d of the transducer is determined from

d = (2n − 1)c/(2fr) = [(2n − 1)/2]λ. (2)

The near-field length is

N = D²/(4λ), (3)

where N is the near-field region, D is the diameter of the transducer, and λ is the wavelength in the material used for the buffer rod. Considering acoustic energy loss, a material that has high velocity compared to that of a coupling medium and low attenuation (e.g., fused quartz, sapphire, or the like) is generally selected. The selection of the material directly relates to the design of the lens in terms of the spherical aberration that needs to be reduced. The radius r of the lens aperture is related to the paraxial focal distance F0 and the aperture angle θα by

r = F0 sin(θα/2). (6)
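Equations (2) and (3) can be sketched numerically. In this example the material velocities and the transducer diameter are illustrative assumptions (they are not values given in the text):

```python
def transducer_thickness(c_t, f_r, n=1):
    """Eq. (2): d = (2n - 1) c / (2 f_r) -- resonance thickness."""
    return (2 * n - 1) * c_t / (2 * f_r)

def near_field_length(D, c_rod, f):
    """Eq. (3): N = D^2 / (4 lambda), with lambda = c_rod / f."""
    lam = c_rod / f
    return D ** 2 / (4 * lam)

f = 200e6           # Hz, operating frequency
c_linbo3 = 7330.0   # m/s, assumed longitudinal velocity in LiNbO3
c_quartz = 5960.0   # m/s, assumed longitudinal velocity in fused quartz
D = 1.5e-3          # m, assumed transducer diameter

d = transducer_thickness(c_linbo3, f)   # fundamental (n = 1) thickness
N = near_field_length(D, c_quartz, f)   # near-field length in the rod
print(f"d = {d * 1e6:.1f} um, N = {N * 1e3:.1f} mm")
```

The fundamental-mode thickness at 200 MHz is about 18 µm, which is why high-frequency transducers are fabricated as thin films.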
The spherical aberration of an acoustic lens is illustrated in Fig. 4. When the acoustic wave from the lens is emitted into the specimen, Snell's law gives

C2 sin θ = C1 sin θ′, (7)

where θ is the incident angle of the acoustic wave and θ′ is the refracted angle of the acoustic wave. Then, the zonal focal distance (denoted as ''F'') is expressed as

F = R(1 − cos θ) + R sin θ/tan(θ − θ′). (8)

Table 2. Range of Radius due to Frequency

Frequency        Radius
100 MHz          1–2 mm
200 MHz          500 µm–1 mm
400 MHz          500 µm
800–1,000 MHz    125 µm
1.5–3 GHz        50 µm
SCANNING ACOUSTIC MICROSCOPY
Figure 4. Calculation of the spherical aberration of the acoustic lens. A(θ) is the spherical aberration, F0 is the paraxial focal distance of the acoustic lens, F is the zonal focal distance, R is the radius of curvature of the surface of the lens, θ is the incident angle of the acoustic wave, and θ′ is the refracted angle of the acoustic wave.
Therefore, the spherical aberration, denoted as ''A(θ)'', is expressed as

A(θ) = F0 − F. (9)
For example, when sapphire and water are used for the buffer rod and the coupling medium, respectively, A(θ) is calculated as 0.003R, where R is the radius of curvature of the surface of the lens. For an acoustic lens operating at a frequency of 1.0 GHz, R is approximately 100 µm; therefore, A(θ) is calculated as 0.3 µm. This is small compared to the wavelength of a 1.0-GHz ultrasonic wave in water (1.5 µm), so a single spherical lens can form a well-focused image. Therefore, sapphire (z-cut) is considered the best material, although it is difficult to make a spherical recess by mechanical polishing. For reference, a comparison of the ratio of the spherical aberration to the radius (denoted as ''A(θ)/R'') for an acoustic lens and an optical lens is shown in Fig. 5. The spherical aberration of an acoustic lens is much smaller than that of an optical lens.
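Under the assumption that the paraxial focal distance F0 is the small-angle limit of Eq. (8), F0 = R/(1 − C2/C1), the aberration of Eq. (9) can be evaluated numerically. The sapphire and water velocities below are assumed handbook values chosen to reproduce the ratio C2/C1 = 0.135 quoted in Fig. 5; at moderate aperture angles the result is of the order of the 0.003R quoted above:

```python
import math

C1 = 11100.0  # m/s, assumed longitudinal velocity in sapphire (buffer rod)
C2 = 1500.0   # m/s, assumed longitudinal velocity in water (C2/C1 = 0.135)

def zonal_focal_distance(theta, R=1.0):
    """Eq. (8), expressed in units of the radius of curvature R."""
    theta_r = math.asin((C2 / C1) * math.sin(theta))   # Eq. (7), Snell's law
    return R * (1 - math.cos(theta)) + R * math.sin(theta) / math.tan(theta - theta_r)

# Paraxial focal distance F0 as the small-angle limit of Eq. (8):
F0 = 1.0 / (1.0 - C2 / C1)   # in units of R

for deg in (10, 20, 30):
    A = F0 - zonal_focal_distance(math.radians(deg))   # Eq. (9)
    print(f"theta = {deg:2d} deg: A(theta)/R = {A:.4f}")
```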
Acoustic Anti-Reflective Coating. When acoustic waves are emitted directly from sapphire into water used as the coupling medium, 88% of the acoustic energy is reflected back from the interface between the lens and water because of the impedance mismatch. Therefore, it is necessary to coat the lens with an acoustic anti-reflective coating (AARC). The thickness of the AARC should be λ/4, where λ is the wavelength of the acoustic wave in the coating, for an optimized result, and the acoustic impedance Z of the coating should be

Z = √(Z1·Z2), (10)

where Z1 is the impedance of the lens rod and Z2 is the impedance of the coupling medium.
Figure 5. Spherical aberration of optical and acoustic lenses. A(θ) is the spherical aberration, R is the radius of curvature of the surface of the lens, C1 is the longitudinal wave velocity of the buffer rod, and C2 is the longitudinal wave velocity of the coupling medium.
Evaporated silicon dioxide (SiO2) is typically used as the material for the AARC, although it does not completely satisfy Eq. (10). Acoustic Lens for Pulse-Wave Mode. Most of the configurations of acoustic lenses for the pulse-wave mode are the same as those for the burst-wave mode. The significant differences are described here.
Piezoelectric Transducer. Lead zirconate titanate ceramic (PZT), lithium niobate (LiNbO3), and polyvinylidene fluoride (PVDF) are typically used as transducer materials for operation at frequencies of 100 MHz or less. Buffer Rod and Lens. This mode is used mainly for observing interior portions of a specimen. Therefore, it is important that the working distance (denoted as ''W.D.'') be long and that the aperture angle of the lens be small. The working distance is defined as the distance the lens travels from the position at which it touches the surface of the specimen to the position at which it is focused on the surface of the specimen. The maximum depth at which the material can be imaged is determined by the working distance. Table 3 shows the range of working distances corresponding to the range of frequencies. In the pulse-wave mode, the focal distance is designed to be long, so the buffer rod is long. Considering its high cost, sapphire is usually not used for the buffer rod; fused quartz, polymers, and ceramics are typically used.
Table 3. Range of Working Distances due to Frequency

Frequency            Working Distance
5–30 MHz             15–50 mm
30–50 MHz            10–20 mm
50–100 MHz           5–15 mm
More than 100 MHz    Less than 10 mm
Contrast and Application

Acoustic properties (i.e., reflection coefficient, attenuation, and velocity of the acoustic wave) and the surface condition (i.e., surface roughness and discontinuities) of the specimen are factors in forming acoustic images. Reflection Coefficient. The reflection coefficient at the surface of a specimen differs depending on the material. The difference in the reflection coefficient is the basic factor behind the formation of an acoustic image. The reflection coefficient is obtained from the reflectance function, which is defined by the acoustic impedance and is determined by the type of material. The acoustic impedance Z can be determined from

Z = ρc, (11)
where ρ is the density and c is the longitudinal wave velocity. When acoustic waves are incident from a liquid onto a solid at angle θ, some reflect from the solid, and some transmit into the solid (see Fig. 6). Let Z1, Z2l, and Z2s be the acoustic impedances of the liquid, the solid for a longitudinal wave, and the solid for a shear wave, respectively. Then, Z1, Z2l, and Z2s are expressed by

Z1 = ρ1c1/cos θ, (12)

Z2l = ρ2cl/cos θl, (13)

Z2s = ρ2cs/cos θs, (14)
where ρ1 is the density of the liquid, ρ2 is the density of the solid, c1 is the velocity of the longitudinal wave in the liquid, cl is the velocity of the longitudinal wave in the solid, cs is the velocity of the shear wave in the solid, θ is the reflection angle of the longitudinal wave in the liquid,
Figure 6. Reflection and transmission of a wave incident from a liquid to a solid. K, KR , KL , and KS are wave vectors that represent the incident longitudinal wave, the reflected wave, the refracted longitudinal wave, and the mode-converted shear wave, respectively. θ, θR , θL , and θS are angles corresponding to the wave vectors.
θl is the refraction angle of the longitudinal wave in the solid, and θs is the refraction angle of the shear wave in the solid. The following equations are obtained by Snell's law:

sin θl = (cl/c1) sin θ, (15)

sin θs = (cs/c1) sin θ. (16)
The reflectance function R(θ) is expressed as

R(θ) = (Z2l cos² 2θs + Z2s sin² 2θs − Z1)/(Z2l cos² 2θs + Z2s sin² 2θs + Z1). (17)
When an acoustic wave is vertically incident from a liquid onto a solid, the reflection coefficient R is determined as

R = (Z2l − Z1)/(Z2l + Z1). (18)
From this equation, it can be seen that the larger the difference in acoustic impedance between two contacting materials, the greater the reflection coefficient at the interface. Normally, water is used as the coupling medium between the lens and a specimen. The acoustic impedance of water is lower than that of a solid; materials that have greater acoustic impedance show a higher reflection coefficient at the surface of a specimen, thus forming image contrast. For example, suppose that an acoustic wave is perpendicularly incident onto fused quartz and sapphire immersed side by side in water. They have different elastic properties (their reflection coefficients are 0.632 and 0.873, respectively), but it is difficult to distinguish between them optically. However, the acoustic images show a significant difference in accordance with their reflection coefficients. In addition, the acoustic impedance of a gas, such as air, is significantly lower. Thus, if a material has a void, a debonding, or a delamination, the acoustic wave reflects strongly from the solid–gas interface. Many applications of this contrast mechanism have been reported for visualizing voids, debondings, delaminations, inclusions, adhesion strength, powder distribution, deformation, and residual stress distribution. This nondestructive evaluation (NDE) technique is applicable to thin and thick layered films (60–62), electrical/electronic materials including integrated circuits (63–67), polymers (68–73), composite materials (74–76), ceramics (77–79), and other materials (80,81). For example, Figs. 7 and 8 show debonding of reinforcing fibers from matrices. Figure 9 shows delaminations at the interfaces of carbon-fiber-reinforced plastics (C-FRP). Figure 10 shows delaminations and adhesion differences in a layered thin film. Figure 11 shows gaps around inclusions located at the interface of the substrate and the urethane coating. Figures 12 and 13 show deformations of composites.
Figure 14 shows the powder distribution in composites.
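Equations (10), (11), and (18) can be illustrated with a short calculation. The densities and velocities below are assumed handbook values; the reflection coefficients quoted in the text (0.632 and 0.873) were evidently computed from somewhat different material constants:

```python
import math

materials = {                # (density kg/m^3, longitudinal velocity m/s)
    "water":        (1000.0,  1500.0),
    "fused quartz": (2200.0,  5960.0),
    "sapphire":     (3980.0, 11100.0),
}

def impedance(name):
    """Eq. (11): Z = rho * c."""
    rho, c = materials[name]
    return rho * c

Z_w = impedance("water")
for name in ("fused quartz", "sapphire"):
    Z = impedance(name)
    R = (Z - Z_w) / (Z + Z_w)   # Eq. (18), normal incidence
    print(f"{name}: Z = {Z / 1e6:.1f} MRayl, R = {R:.3f}")

# Eq. (10): ideal quarter-wave (AARC) matching impedance for a sapphire lens
Z_aarc = math.sqrt(impedance("sapphire") * Z_w)
print(f"ideal AARC impedance: {Z_aarc / 1e6:.1f} MRayl")
```

Whatever the exact constants, sapphire always reflects more strongly than fused quartz, which is the contrast mechanism described above.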
Figure 7. Debonding shown in metal matrix composites (MMC; fiber: carbon; matrix: aluminum alloy). The debondings are observed as bright regions at interfaces. Frequency: 200 MHz; defocusing distance: z = −200 µm.
Attenuation. When an acoustic wave travels within a specimen, its amplitude diminishes with travel distance until it is finally not measurable. This phenomenon, called attenuation, is unique to the specimen and is described mathematically by

Pt = P0 exp(−αz), (19)
where Pt is the amplitude of a transmitted wave, P0 is the amplitude of an incident acoustic wave, α is the attenuation coefficient, and z is the travel distance. Figures 15 and 16 are acoustic images of living mouse cells (3T3) and a thinly sliced biological tissue (heart muscle). The acoustic impedance of these is close to that of water (or culture liquid). Therefore, virtually no contrast caused by the difference in reflection coefficient is displayed in these images. The contrast in the acoustic images of living cells or biological tissue is generated primarily from the difference in attenuation. Therefore, it is necessary to use a background composed of highly reflective material to make maximum use of the difference in attenuation. For example, when two types of substrates (one made of glass that has a longitudinal velocity of 5.6 km/s, a density of 2.49 × 103 kg/m3 , and a reflection coefficient of 0.581, and another of polystyrene that has corresponding values of 2.4 km/s, 1.05 × 103 kg/m3 , and 0.232, respectively) are used to image a biological specimen, the glass substrate theoretically provides better results. A variety of biological applications have been reported that use this contrast mechanism (82–93).
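A minimal sketch of Eq. (19); the attenuation coefficient here is an arbitrary illustrative value, not one from the text:

```python
import math

def transmitted_amplitude(P0, alpha, z):
    """Eq. (19): Pt = P0 * exp(-alpha * z)."""
    return P0 * math.exp(-alpha * z)

P0 = 1.0        # incident amplitude (normalized)
alpha = 2.0e3   # 1/m, assumed attenuation coefficient (illustrative)

for z_um in (0, 100, 500, 1000):
    Pt = transmitted_amplitude(P0, alpha, z_um * 1e-6)
    print(f"z = {z_um:4d} um: Pt/P0 = {Pt:.3f}")
```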
Figure 8. Debonding shown in plastic matrix composites (PMC; fiber: carbon; matrix: glass ceramics). The debondings are observed as bright regions at interfaces.
Figure 9. Delaminations at the interfaces of carbon-fiber-reinforced plastics (C-FRP). (a) Schematic diagram showing locations of delaminations. (b) Superimposed pulse-wave-mode images showing delaminations at interfaces. Delaminations were induced by the impact of energy of 1.71 J. Directions of fibers are shown in [0°n /90°n ]SYM .
Velocity of Surface Acoustic Waves (SAW). This parameter is very important for images obtained in the tone-burst-wave mode. Figure 17a is the acoustic image of polycrystalline manganese zinc ferrite at an operating frequency of 400 MHz with the acoustic lens focused on the surface (z = 0 µm). Figures 17b,c are acoustic images of the same specimen in which the acoustic lens is focused into the interior (z = −25 µm); the same areas were scanned. Acoustic lenses that have aperture angles of 120° and 60°, respectively, were used. Grains are not clearly seen in Figs. 17a,c but are observed with significant contrast in Fig. 17b. Furthermore, the contrast in the images formed by the acoustic lens that has the aperture angle of 120° changes in accordance with the position of the focal point, controlled by the movement of the acoustic lens along the Z axis. This contrast mechanism is explained by the change of the receiving voltage in the transducer associated with the excitation of the SAW. When the aperture angle of the acoustic lens is large, surface acoustic waves are generated because the incident angle goes beyond the Rayleigh critical angle. Therefore, the contrast in Fig. 17b is caused by the generation of the SAW. The aperture angle of the acoustic lens used for forming the image shown in Fig. 17c is not large enough to generate surface acoustic waves. Details are given below.

Figure 10. Delaminations and adhesion differences in a thin layered film. Frequency: 800 MHz. See color insert.

Figure 11. Inclusion at the interface of the urethane coating and the substrate. Frequency: 200 MHz.

Figure 12. Deformation shown in plastic matrix composites (PMC; fiber: glass; matrix: nylon 6). Frequency: 200 MHz.

Figure 13. Deformation shown in metal matrix composites (MMC; whisker: SiC; matrix: aluminum alloy). Frequency: 600 MHz; defocusing distance: Z = −10 µm.

Figure 14. Powder distribution in metal matrix composites (MMC; powder: SiC; matrix: aluminum alloy). Frequency: 600 MHz; defocusing distance: Z = −4 µm.

Figure 16. Human heart muscle. Frequency: 400 MHz. To prepare thin specimens, the tissue was embedded in paraffin and sectioned at 5 µm by the microtome. For scanning acoustic microscopy, the thinly sectioned specimen was affixed to the glass substrate; the specimen was deparaffinized in xylene but not stained.

V(z) Curve. Figure 18 shows a schematic diagram of an acoustic lens for expressing the V(z) curve using an angular-spectrum approach (6). When a plane acoustic wave that has an angular frequency ω is incident on the transducer plane, the acoustic field at the transducer plane is expressed as u0+(x, y)e^−iωt. Hereafter, we omit e^−iωt. Acoustic fields at the planes (denoted as i, where i = 0, 1, 2, and 3) are expressed by a spatial distribution ui±(x, y) or a frequency distribution Ui±(kx, ky); the superscript + or − indicates the direction of field travel, that is, in the +z or −z direction, respectively. The angular spectrum at the transducer plane (z = z0) is expressed as
Figure 15. Living mouse cells (3T3). Frequency: 600 MHz; defocusing distance: Z = −5 µm; culturing medium: Dulbecco's modified Eagle's medium; temperature of the culturing medium: 37.5 °C. Cells grown on the glass substrate provide sufficient acoustic reflectivity from the substrate.

U0+(kx, ky) = F[u0+(x, y)] = ∫∫ u0+(x, y) exp[−i(kx x + ky y)] dx dy, (20)

where the double integral extends from −∞ to ∞ (as do all subsequent integrals). Then, the relationships between the spatial and frequency distributions are generalized by the following equations:

Ui±(kx, ky) = F{ui±(x, y)}, (21)

ui±(x, y) = F−1{Ui±(kx, ky)}. (22)

When ultrasonic waves travel from the transducer plane (z0) to the back focal plane (z1), only the phase related to the distance traveled (|z1 − z0|) is changed:

U1+(kx, ky) = U0+(kx, ky) exp[ikz(z1 − z0)]. (23)

The pupil function P(x, y), which shows the shape of the lens, is expressed as

P(x, y) = circ[(x² + y²)^1/2/R], (24)

where R is the radius of the pupil and circ is expressed as

circ(r) = 1 for r < 1; 0 for r > 1. (25)

Using the Fresnel approximation and the thin lens model, u2+(x2, y2) is expressed as

u2+(x2, y2) = {exp[ik0 f(1 + c²)]/(iλ0 f)} ∫∫ u1+(x1, y1) P1(x1 + x2, y1 + y2) exp[−i(2π/λ0 f)(x1x2 + y1y2)] dx1 dy1, (26)

where k0 is the wave number in the coupling medium, f is the front focal distance, λ0 is the wavelength in the coupling medium, P1 is the pupil function from the lens to the specimen, and c is the ratio expressed by

c = C2/C1, (27)
z Figure 18. Schematic diagram of an acoustic lens for expressing the V(z) curve by an angular-spectrum approach. ‘Plane 0’ is the transducer plane, ‘Plane 1’ is the back focal plane, ‘Plane 2’ is the front focal plane, ‘Plane 3’ is the surface of the specimen, and u± i (x, y), where i = 0, 1, 2, and 3, is a spatial distribution.
where C1 is the longitudinal wave velocity of the buffer rod and C2 is the longitudinal wave velocity of the coupling medium. Supposing that x2 ≪ x1 and y2 ≪ y1 at the front focal plane, the pupil function from the lens to the specimen is expressed as

P1(x1 + x2, y1 + y2) ≅ P1(x1, y1). (28)

Then Eq. (26) is rewritten as

u2+(x, y) = {exp[ik0 f(1 + c²)]/(iλ0 f)} F[u1+(x, y)P1(x, y)]|kx = k0 x/f, ky = k0 y/f, (29)

U2+(kx, ky) = F[u2+(x, y)]. (30)

Combining Eqs. (29) and (30) gives

U2+(kx, ky) = F{{exp[ik0 f(1 + c²)]/(iλ0 f)} F[u1+(x, y)P1(x, y)]|kx = k0 x/f, ky = k0 y/f}
= iλ0 f exp[ik0 f(1 + c²)] u1+(−(f/k0)kx, −(f/k0)ky) P1(−(f/k0)kx, −(f/k0)ky). (31)

Suppose that kx² + ky² ≪ k0². Then, the following approximation is obtained:

kz = (k0² − kx² − ky²)^1/2 ≅ k0 − (kx² + ky²)/(2k0). (32)

Figure 17. Manganese zinc ferrite. Frequency: 400 MHz; (a) z = 0 µm (aperture angle: 120°); (b) z = −25 µm (aperture angle: 120°); (c) z = −25 µm (aperture angle: 60°).
Then U3+(kx, ky) is written as

U3+(kx, ky) = U2+(kx, ky) exp[ik0 z] exp[−i((kx² + ky²)/(2k0)) z], (33)

z = |z3 − z2|, (34)

where z2 and z3 are values along the Z axis at the front focal and the specimen planes, respectively. U3−(kx, ky) is the angular spectrum of the reflection wave at the interface between the specimen and the coupling medium. Therefore, it is written as

U3−(kx, ky) = U3+(kx, ky) R(kx/k0, ky/k0), (35)

where R is the reflectance function [see Eq. (17)].
where R is the reflectance function [see Eq. (17)]. The angular spectrum of the reflection wave at the front focal plane is written as U2− (kx , ky )
=
U3− (kx , ky ) exp[ik0 z] exp
−i
k2x + k2y 2k0
2 f (1 + c )] exp[ik 0 − − u1 (x, y) = P1 (x, y)F [u2 (x, y)] k0 x , kx = iλ0 f kf y 0 ky = f
u− 1 (x, y) =
exp[ik0 f (1 + c )] P2 (x, y)U2− iλ0 f 2
k0 k0 x, y , f f
(41) (42)
2 + u− 1 (x, y) = − exp{i2k0 [z + f (1 + c )]}u1 (−x, −y)P1 (−x, −y) k0 x y . × P2 (x, y) exp −i 2 z(x2 + y2 ) R , f f f (43)
The variation of transducer voltage, as the position of the focal point varies is controlled by the movement of the acoustic lens along the z axis and is expressed as
z .
∞ V(z) =
(36)
− u+ 0 (x1 , y1 )u0 (x1 , y1 ) dx1 dy1 .
(44)
−∞
From Eqs. (33), (35), and (36), we obtain The wave number kl in the buffer rod is expressed as
U2− (kx , ky )
f f − k ky , − = iλ0 f exp[ik0 f (1 + c2 )]u+ x 1 k0 k0 f f kx , − ky exp(i2k0 z) × P1 − k0 k0 k2x + k2y kx ky . , × exp −i z R k0 k0 k0 (37)
kz = (k2l − k2x − k2y )1/2 ,
(45)
+ ∗ u+ 1 (x1 , y1 ) = u0 (x1 , y1 ) F [exp(ikz d)],
(46)
u− 0 (x1 , y1 )
=
∗ u− 1 (x1 , y1 )
∞ V(z) =
u− 1 (ξ, η)
−∞
−1
[U2− (kx , ky )].
u− 1 (x, y) =
×
F u− 2 (x2 , y2 )P2 (x1 + x2 , y1 + y2 )
−∞
2π × exp −i (x1 + x2 , y1 + y2 ) dx2 dy2 , (39) λ0 f where P2 is the pupil function from the coupling medium to the lens. We use the same approximation as in Eq. (28): P2 (x1 + x2 , y1 + y2 ) ∼ = P2 (x1 , y1 ).
(48)
u+ 0 (x1 , y1 )
−∞
Equation (45) is an even function. Therefore, the following relationship is obtained:
exp[ik0 f (1 + c2 )] iλ0 f ∞
∞
(47)
−1 × F [exp(ikz d)] x=x1 −ξ dx1 dx2 dξ dη. (49) y=y2 −η
(38)
Similarly to obtaining Eq. (26), we obtain
[exp(ikz d)],
d = |z1 − z0 |,
From Eq. (22), we obtain u− 2 (x, y) = F
F
−1
−1
[exp(ikz d)] x=x1 −ξ = F y=y1 −η
[exp(ikz d)] x=ξ −x1 .
(50)
y=η−y1
Using Eq. (50), Eq. (49) is rewritten as ∞ V(z) =
+ ∗ dξ dη, u− 1 (ξ, η) u0 (ξ, η) F [exp(ikz d)] x=ξ y=η
−∞
∞ V(z) =
(40)
−1
−∞
(51) + u− 1 (ξ, η)u1 (ξ, η) dξ dη.
(52)
Then, finally, we obtain

V(z) = −exp{i2k0[z + f(1 + c²)]} ∫∫ u1+(−x, −y) u1+(x, y) P1(−x, −y) P2(x, y) R(x/f, y/f) exp[−i(k0/f²) z(x² + y²)] dx dy. (53)

Omitting the constant exp{i2k0[z + f(1 + c²)]}, Eq. (53) is rewritten as

V(z) = ∫∫ u1+(−x, −y) u1+(x, y) P1(−x, −y) P2(x, y) R(x/f, y/f) exp[−i(k0/f²) z(x² + y²)] dx dy. (54)
Using r–θ coordinates, the transducer voltage as a function of the position of the focal point, controlled by the movement of the acoustic lens along the Z axis, is rewritten as

V(z) = 2π ∫ [u1+(r)]² P1(r) P2(r) R(r/f) exp[−i(k0/f²) zr²] r dr, (55)

where the integration runs over 0 ≤ r < ∞.
Equation (55) shows that the acoustic field at the back focal plane, the lens, the reflectance function, and the defocus distance are the main factors in the output voltage. Although the shape of the amplitude distribution of the acoustic field at the back focal plane is determined by the size of the transducer and the frequency of the acoustic wave, generally speaking, its shape is approximately Gaussian (30). The pupil function is substantially constant. Therefore, the reflectance function, which depends on the elastic properties of the material, governs V(z). Furthermore, V(z) changes in accordance with the defocusing distance. Figure 19 shows the V(z) curve for fused quartz. An acoustic lens that has an aperture angle of 120° and a
working distance of 310 µm was used at an operating frequency of 400 MHz. The specimen was located in a water tank. The temperature of the coupling medium (distilled water), 22.3 °C, was measured by a thermocouple and was substantially stable (change less than ±0.1 °C). The movement of the acoustic lens along the Z axis was monitored by a laser-based measuring instrument. The transducer output voltage was periodic in the axial motion as the acoustic lens advanced from the focal plane toward the specimen. The contrast changed in accordance with the period.
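The behavior of a V(z) curve such as Fig. 19 can be sketched numerically. The snippet below uses a wide-aperture angular form of the V(z) integral (equivalent in spirit to the paraxial Eq. (55)) together with the reflectance function of Eqs. (12)–(17); the material constants, the Gaussian pupil weighting, and the lens parameters are illustrative assumptions, not the parameters of the experiment described above:

```python
import cmath
import math

# Water and fused-quartz constants (assumed handbook values).
rho1, c1 = 1000.0, 1500.0                 # liquid density, velocity
rho2, cl, cs = 2200.0, 5960.0, 3760.0     # solid density, longitudinal, shear

def reflectance(theta):
    """Reflectance function of Eqs. (12)-(17); complex square roots
    handle angles beyond the critical angles."""
    st = math.sin(theta)
    sin_s = cs / c1 * st                          # Eq. (16)
    cos_l = cmath.sqrt(1 - (cl / c1 * st) ** 2)   # from Eq. (15)
    cos_s = cmath.sqrt(1 - sin_s ** 2)
    Z1 = rho1 * c1 / math.cos(theta)              # Eq. (12)
    Z2l = rho2 * cl / cos_l                       # Eq. (13)
    Z2s = rho2 * cs / cos_s                       # Eq. (14)
    cos2s = 1 - 2 * sin_s ** 2                    # cos(2 theta_s)
    sin2s = 2 * sin_s * cos_s                     # sin(2 theta_s)
    num = Z2l * cos2s ** 2 + Z2s * sin2s ** 2 - Z1   # Eq. (17)
    den = Z2l * cos2s ** 2 + Z2s * sin2s ** 2 + Z1
    return num / den

def V(z, f0=400e6, theta_max=math.radians(60), n=2000):
    """Wide-aperture angular form of the V(z) integral (a sketch
    equivalent in spirit to Eq. (55))."""
    k0 = 2 * math.pi * f0 / c1
    dtheta = theta_max / n
    total = 0j
    for i in range(n):
        th = (i + 0.5) * dtheta
        pupil = math.exp(-(th / theta_max) ** 2)   # assumed Gaussian weighting
        total += (pupil * reflectance(th)
                  * cmath.exp(2j * k0 * z * math.cos(th))
                  * math.sin(th) * math.cos(th) * dtheta)
    return total

# Dip period predicted by Eqs. (59) and (62) for an assumed Rayleigh
# velocity of 3,430 m/s (fused quartz):
theta_R = math.asin(c1 / 3430.0)
dz = (c1 / 400e6) / (2 * (1 - math.cos(theta_R)))
print(f"predicted dip period: {dz * 1e6:.1f} um")
for z_um in (0, -10, -20, -30):
    print(f"z = {z_um:3d} um: |V| = {abs(V(z_um * 1e-6)):.4f}")
```

For fused quartz in water at 400 MHz, Eq. (62) predicts a dip period of about 18.6 µm, which sets the scale of the oscillations expected in the computed |V(z)|.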
Phase Change. Figures 20a,b show, respectively, the amplitude and the phase of the reflectance function as a function of the incident angle of the acoustic waves from water onto fused quartz. Figure 20a shows the relationship between the incident angle and the amplitude for an ultrasonic beam emitted from an acoustic lens operating at a frequency of 400 MHz onto the specimen (fused quartz) via the coupling medium (water). As the incident angle approaches the critical angle of the longitudinal wave (14.58°), the amplitude of the reflectance function rises sharply and reaches its maximum at the critical angle. Past the critical angle of the longitudinal wave, the amplitude decreases to about 0.8 as the incident angle increases. As the incident angle approaches the critical angle of the shear wave (23.49°), the amplitude again rises sharply and reaches its maximum at the critical angle. Past the critical angle of the shear wave, the amplitude is constant. Figure 20b shows the corresponding relationship between the incident angle and the phase. The phase changes little at the critical angle of the longitudinal wave but changes significantly in the neighborhood of the Rayleigh critical angle. This significant phase change is the key factor in contrast change. Using a ray-tracing technique, this mechanism is explained as follows. The period of the V(z) variation results from interference between two components. Figure 21 shows that one component is specularly reflected at normal incidence; the second undergoes a lateral shift on incidence and reradiates at the critical phase-matching angle
Figure 19. V(z) curve for fused quartz; Specimen: fused quartz; coupling medium: distilled water; temperature of the coupling medium: 22.3 ° C (change less than ±0.1 ° C). Parameters of the acoustic lens are as follows: frequency: 400 MHz; aperture angle: 120° ; and working distance: 310 µm.
Figure 21. Cross-sectional geometry of spherical acoustic lens, explaining the mechanism of the V(z) curves.
Figure 20. Reflectance function (a) Amplitude. (b) Phase. Specimen: fused quartz; coupling medium: distilled water; temperature of the coupling medium: 22.3 ° C (change less than ±0.1 ° C). Parameters of the acoustic lens are as follows: frequency: 400 MHz, aperture angle: 120° ; and working distance: 310 µm.
for the SAW (also referred to as leaky Rayleigh waves). When the acoustic wave is focused onto the surface of the specimen, the phases of the acoustic waves that travel path (I) and path (II) are the same. Let this phase be denoted as φf. When the acoustic lens is defocused toward the specimen by a distance z, the phase changes of the waves that travel paths (I) and (II) are expressed as

φ(I) = φf − 2π(2OC/λW) = φf − 4πz/λW, (56)

φ(II) = φf − 2π(2AC/λW) + 2π(2AB/λR) + π = φf − 4πz/(λW cos θR) + 4πz tan θR/λR + π, (57)

where λW is the wavelength in the coupling medium (water), λR is the wavelength of the SAW, and θR is the Rayleigh critical angle. The phase difference is calculated by

Δφ = φ(II) − φ(I) = 4πz[1/λW − 1/(λW cos θR) + tan θR/λR] + π. (58)

By Snell's law, we obtain the following equation:

sin θR = VW/VR = λW/λR, (59)

where VR is the SAW (e.g., Rayleigh wave) velocity on the surface of the specimen, whose penetration depth is about one wavelength:

λR = λW/sin θR. (60)

Then, putting Eq. (59) into Eq. (58), we obtain the quantitative contrast factor as

Δφ = 4πz(1 − cos θR)/λW + π. (61)
Theory of the SAW Velocity Measurement. Using the phase difference of the two waves, the velocity of a surface acoustic wave (SAW), such as a leaky Rayleigh wave, that propagates within a small area of a specimen can be obtained (4,5). When (φ(II) − φ(I)) = (2n − 1)π, where n is a natural number, the V(z) curve is at a local minimum. Therefore, the period of the V(z) curve is obtained as

Δz = λW/[2(1 − cos θR)]. (62)
By rewriting Eq. (60), we obtain

VR = VW/sin θR. (63)
λW is expressed as

λW = VW/f, (64)
where f is the frequency of the acoustic wave. Therefore, finally, from Eqs. (62)–(64) we obtain the following equation to calculate the SAW velocity:

VR = VW/√{1 − [1 − VW/(2Δz·f)]²}. (65)
Many applications using this result have been reported for biological materials (94), thin films (95–101), coated materials (102–106), ceramics (107,108), and other materials (109–113).
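Equations (59), (62), and (65) form a consistent loop that can be checked numerically: generate a dip period Δz from an assumed SAW velocity, then recover that velocity with Eq. (65). The Rayleigh velocity used for fused quartz below is an assumed handbook value; the error estimate at the end illustrates Eq. (67):

```python
import math

V_W = 1500.0        # m/s, assumed velocity in water (coupling medium)
f = 400e6           # Hz, operating frequency
V_R_true = 3430.0   # m/s, assumed SAW velocity (fused quartz)

lam_w = V_W / f                               # Eq. (64)
theta_R = math.asin(V_W / V_R_true)           # Eq. (59)
dz = lam_w / (2 * (1 - math.cos(theta_R)))    # Eq. (62), dip period

# Eq. (65): SAW velocity recovered from the measured period dz
V_R = V_W / math.sqrt(1 - (1 - V_W / (2 * dz * f)) ** 2)
print(f"dz = {dz * 1e6:.2f} um, recovered V_R = {V_R:.1f} m/s")

# Eq. (67): 1% errors in V_W and dz each contribute 0.5% to V_R
rel_err = 0.5 * 0.01 + 0.5 * 0.0 + 0.5 * 0.01
print(f"estimated relative error in V_R: {rel_err * 100:.1f}%")
```

Because Eq. (65) is the exact algebraic inverse of Eqs. (59) and (62), the recovered velocity matches the assumed one; in practice the accuracy is limited by the error budget of Eq. (67).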
Optimizing Measurement Precision. Assuming that VW/(2fΔz) ≪ 1, Eq. (65) is expressed approximately as

VR ≅ √(VW f Δz). (66)

To enhance the precision of the SAW velocity measurement by the V(z) curve technique, both sides of Eq. (66) are differentiated after taking the logarithm of both sides. Then, the following equation is obtained:

dVR/VR = (1/2)(dVW/VW) + (1/2)(df/f) + (1/2)(dΔz/Δz). (67)

Equation (67) shows that the error in measuring the SAW velocity is the sum of the errors in the values of the velocity of the coupling medium, the frequency of the acoustic wave, and the distance of the period. Therefore, to minimize the measurement error, it is necessary to maintain a constant temperature for the coupling medium, to stabilize the frequency of the acoustic wave, and to measure the movement of the acoustic lens along the Z axis accurately. Many refinements in techniques for precision measurement have been reported (114–116).

Scattering. The scattering of acoustic waves can be divided into two categories: scattering caused by the particular condition of the specimen surface and scattering caused by nonhomogeneous particles inside the specimen. The former covers discontinuities such as surface roughness. Although minor roughness (generally λ/5 or less) is acceptable (117), large degrees of roughness affect the image directly (see Fig. 22); the image formed gives the impression that it is being viewed through frosted glass. For scanning acoustic microscopy, it is necessary either to consider the roughness in relation to the size of the area under observation or to lower the frequency to minimize the influence of any roughness; some techniques for removing the effects of surface roughness have been reported (14,118,119). The same applies to wave scattering caused by nonhomogeneous particles and must be kept in mind, especially for subsurface imaging. A specimen that contains a large number of scattering bodies is difficult to observe at high resolution and necessitates lowering the observation frequency.
Discontinuities. When a specimen includes an elastic discontinuity, such as an edge, a step, a crack, or a joint interface, an acoustic image of the elastically discontinuous and peripheral portions visualized by the scanning acoustic microscope shows unique contrast, for example, fringes or black stripes. As modeled in Fig. 23, this type of contrast appears as an interference effect of surface acoustic waves incident on and reflected from elastic discontinuities (120–125). Figure 24a is an image obtained by the scanning electron microscope (SEM) of a Cu/Si3N4 interface after the Si3N4 portion was given a micro-Vickers indentation. Figure 24b is the magnified SEM image of the portion that has a vertical radial crack that reached the Cu/Si3N4 interface. The SEM at higher magnification did not reveal any delamination. The assumption was that an opened
Figure 22. Surface roughness. (a) Incident acoustic waves are scattered by surface roughness of the specimen; therefore, the interior portions are difficult to observe. (b) SAM image of a specimen that has a polished surface. Specimen: stainless steel (SUS 304); frequency: 800 MHz. Scale bars: 50 µm.
SCANNING ACOUSTIC MICROSCOPY
Figure 23. Model of Rayleigh wave excitation at a disconnected boundary. The acoustic lens scans the sample through the coupler.
delamination at the surface closed soon after the indentation was completed, owing to the passage of time and the relief of stress. Figures 25a,b are SAM images; a frequency of 400 MHz was chosen for the visualization, and the scanning width was 1.0 mm. In Figs. 25a,b, the acoustic lens was defocused by 40 and 100 µm, respectively. Without further preparation, the SAM gave an image of the same specimen at similar magnification and revealed the delamination (see Fig. 25a). Moreover, the delamination was observed with enhanced contrast (Fig. 25b). This can be explained by the SAW scattering mechanism: when an acoustic lens that has a high numerical aperture is defocused, an ultrasonic beam is incident at a certain range of angles onto the specimen. In this experiment, the aperture angle of the acoustic lens was 120°. At a certain incident angle, the SAW is excited, travels on the surface of the specimen, and is reflected back from the interface (see Fig. 23). These reflected SAWs give the enhanced contrast of discontinuities in a specimen in acoustic images.

Resolution

Two types of resolution must be considered for the SAM. One is the lateral resolution (denoted r), and the other is the vertical resolution (denoted ρ). These are
Figure 24. Scanning electron microscope (SEM) images. (a) Schematic view. (b) Magnified image of the joint interface. The micro-Vickers indentation was made in the Si3 N4 portion to create the delamination at the interface between Cu and Si3 N4 .
Figure 25. SAM images. (a) Z = −40 µm; (b) Z = −100 µm; frequency: 400 MHz. Scale bars: 200 µm.
Figure 26. Resolution. (a) Optical image of a standard specimen that has patterns for measuring resolution for the scanning acoustic microscope in the C-scan mode using a tone-burst wave; (b) frequency: 1.0 GHz; (c) frequency: 400 MHz.
expressed as

r = Fλ = F(vw /f),   (68)

ρ = 2F²λ = 2(fo /D)²(vw /f) = 2[1/(2 tan θ)]²(vw /f) = vw /[2(tan θ)² f],   (69)

where F is a constant related to the lens geometry, λ is the wavelength in the coupling medium (water), f is the frequency of the wave generated by the transducer, vw is the longitudinal wave velocity in the coupling medium, fo is the focal distance of the lens, D is the diameter of the lens aperture, and θ is one-half of the aperture angle of the lens. Figure 26 shows surface resolutions at different frequencies. The resolutions in the images tend to be higher than the calculated resolutions.

CONCLUSION

The features of the SAM are summarized as follows:

1. capability of observing both the surface and subsurface of a specimen
2. high resolution
3. high contrast
4. capability of quantitative data acquisition

Although the SAM has unique features, it (especially the tone-burst-wave mode) has not been used in industry as much as conventional microscopes. It is hoped that the dedicated work of acoustic microscopists will clear the obstacles to acceptance and allow the SAM to mature and prosper fully, and that this article, which describes the fundamentals of scanning acoustic microscopy, will be useful to those who wish to join their ranks.
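As a closing numerical illustration, the resolution formulas, Eqs. (68) and (69), can be evaluated directly. The operating frequency and aperture angle below are assumed example values, and F = 1/(2 tan θ) is taken from the chain of equalities in Eq. (69):

```python
import math

# Lateral (r) and vertical (rho) resolution from Eqs. (68) and (69),
# for water coupling; frequency and aperture half-angle are assumed examples.
v_w = 1500.0              # longitudinal wave velocity in water, m/s
f = 400e6                 # frequency, Hz
theta = math.radians(60)  # one-half the aperture angle (120° aperture)

F = 1.0 / (2.0 * math.tan(theta))   # lens-geometry constant per Eq. (69)
wavelength = v_w / f                # wavelength in the coupling medium
r = F * wavelength                            # Eq. (68)
rho = v_w / (2.0 * math.tan(theta) ** 2 * f)  # Eq. (69)
print(f"r = {r * 1e6:.3f} µm, rho = {rho * 1e6:.3f} µm")
```

At 400 MHz this gives a lateral resolution on the order of 1 µm, consistent in scale with the micrometer-range patterns of the standard specimen in Fig. 26.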
ABBREVIATIONS AND ACRONYMS

AARC    acoustic antireflective coating
A/D     analog-to-digital signal
C-FRP   carbon-fiber-reinforced plastic
D/A     digital-to-analog signal
DMB     double balanced mixer
FRP     fiber-reinforced plastic
IC      integrated circuits
LiNbO3  lithium niobate
mV      millivolt
NDE     nondestructive evaluation
PVDF    polyvinyldifluoride
PZT     lead zirconate titanate ceramic
RF      radio frequency
SAM     scanning acoustic microscope
SAW     surface acoustic wave
SEM     scanning electron microscope
SPDT    single pole double throw
V       volt
W.D.    working distance

BIBLIOGRAPHY

1. S. Y. Sokolov, Academia Nauk SSSR Doklady 64(3), 333–335 (1949).
2. R. A. Lemons and C. F. Quate, Appl. Phys. Lett. 24, 163–165 (1974).
3. A. Atalar, C. F. Quate, and H. K. Wickramasinghe, Appl. Phys. Lett. 31, 791 (1977).
4. R. D. Weglein, Appl. Phys. Lett. 34, 179–181 (1979).
5. W. Parmon and H. L. Bertoni, Electron. Lett. 15, 684–686 (1979).
6. A. Atalar, J. Appl. Phys. 49, 5,130–5,139 (1978).
7. B. Hadimioglu and C. F. Quate, Appl. Phys. Lett. 43, 1,006–1,007 (1983).
8. J. S. Foster and D. Rugar, Appl. Phys. Lett. 42, 869–871 (1983).
9. M. S. Muha, A. A. Moulthrop, G. C. Kozlowski, and B. Hadimioglu, Appl. Phys. Lett. 56, 1,019–1,021 (1990).
10. N. Chubachi, J. Kushibiki, T. Sannomia, and Y. Iyama, Proc. IEEE Ultrasonics Symp., 1979, pp. 415–418.
11. D. A. Davids, P. Y. We, and D. Chizhik, Appl. Phys. Lett. 54(17), 1,639–1,641 (1989).
12. A. Atalar, H. Koymen, and L. Degertekin, Proc. IEEE Ultrasonics Symp., 1990, pp. 359–362.
13. B. T. Khuri-Yakub, C. Cinbis, and P. A. Reinholdtsen, Proc. IEEE Ultrasonics Symp., 1989, pp. 805–807.
14. C. Miyasaka, B. R. Tittmann, and M. Ohno, Res. Nondestructive Evaluation 11, 97–116 (1999).
15. C. W. W. Daft, J. M. R. Weaver, and G. A. D. Briggs, J. Microsc. 139(3), RP3–RP4 (1985).
16. B. T. Khuri-Yakub and C.-H. Chou, Proc. IEEE Ultrasonics Symp., 1986, pp. 741–744.
17. A. Atalar, I. Ishikawa, Y. Ogura, and K. Tomita, Proc. IEEE Ultrasonics Symp., 1993, pp. 613–616.
18. S. Ujihashi, K. Skanoue, T. Adachi, and H. Matsumoto, Proc. Mech. Eng. Fourth Int. Conf. FRC'90, 1990, pp. 157–162.
19. H. Matsumoto, T. Adachi, and S. Ujihashi, Proc. Oji International Seminar on Dynamic Fracture, Chou Technical Drawing Co., Ltd., 1990, pp. 174–182.
20. K. Kasano, T. S. Bea, and C. Miyasaka, Proc. Japan-U.S. CCM-VII, 1995, pp. 601–608.
21. A. M. Howard, IEE Colloquium on 'NDT Evaluation of Electronic Components and Assemblies', 1990, p. 3.
22. L. Revay, G. Lindblad, and L. Lind, CERT'90: Component Engineering, Reliability and Test Conference (Electron Components Inst.), 1990, pp. 115–122.
23. R. Bauer, M. Luniak, and L. Rebenklau, Proc. Int. Symp. Microelectronics (IMAPS), 1997, pp. 659–664.
24. T. Adams, Electron. Bull. 16(6), 51–55 (1998).
25. J. C. Bernier and L. Teems, Proc. Int. Symp. Testing Failure Anal. (ISFTA), 1998, pp. 393–398.
26. B. Smith, Am. Lab. 31(22), 18–21 (1999).
27. J. Kushibiki, K. Horii, and N. Chubachi, Electron. Lett. 19, 404–405 (1983).
28. J. Kushibiki, K. Horii, and N. Chubachi, Electron. Lett. 19, 404–405 (1983).
29. K. Liang, G. S. Kino, and B. T. Khuri-Yakub, IEEE Trans. Sonics Ultrasonics SU-32, 213–224 (1985).
30. T. Endo, Y. Sasaki, T. Yamagishi, and M. Sakai, Jpn. J. Appl. Phys. 31, 160–162 (1992).
31. A. Kulik, G. Gremaud, and S. Sathish, in Acoustic Imaging, vol. 17, Plenum Press, New York, 1989, pp. 71–78.
32. Y. Tsukahara, E. Takeuchi, E. Hayashi, and Y. Tani, Proc. IEEE 1984 Ultrasonics Symp., 1984, pp. 992–996.
33. M. Nikoonahad, Electron. Lett. 32(10), 489–490 (1987).
34. S. W. Meeks et al., Appl. Phys. Lett. 55(18), 1,835–1,837 (1989).
35. J. Attal, in E. A. Ash, ed., Scanned Image Microscopy, Academic Press, London, 1980, pp. 97–118.
36. H. K. Wickramasinghe and C. R. Petts, in E. A. Ash, ed., Scanned Image Microscopy, Academic Press, London, 1980, pp. 57–100.
37. K. Karaki and M. Sakai, in K. Toda, ed., Toyohashi Int. Conf. Ultrasonic Technol., Toyohashi, Japan, MYU Research, Tokyo, 1987, pp. 25–30.
38. K. Yamanaka, Y. Nagata, and T. Koda, Ultrasonics Int. 89, 744–749 (1989).
39. J. D. Fox, B. T. Khuri-Yakub, and G. S. Kino, Proc. IEEE 1983 Ultrasonics Symp., 1983, pp. 581–592.
40. T. Yano, M. Tone, and A. Fukumoto, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 34(2), 222–236 (1987).
41. M. I. Haller and B. T. Khuri-Yakub, Proc. IEEE 1992 Ultrasonics Symp., pp. 937–939.
42. D. Reilly and G. Hayward, Proc. IEEE 1991 Ultrasonics Symp., pp. 763–766.
43. D. W. Schindel, D. A. Hutchins, L. Zou, and M. Sayer, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 42, 42–51 (1995).
44. I. Ladabaum, B. T. Khuri-Yakub, and D. Spoliansky, Appl. Phys. Lett. 68, 7–9 (1996).
45. M. C. Bhardwaj, I. Neeson, M. E. Langron, and L. Vandervalk, Proc. 24th Int. Conf. Eng. Ceram. Struct., American Ceramic, Materials, and Structures: A, 2000, pp. 163–172.
46. C. Miyasaka and B. R. Tittmann, Proc. IEEE 1998 Ultrasonics Symp., pp. 1,265–1,268.
47. I. Ihara, C.-K. Jen, and D. R. Franca, Proc. IEEE 1998 Ultrasonics Symp., pp. 803–807.
48. R. Kompfner and R. A. Lemons, Appl. Phys. Lett. 28(6), 295 (1976).
49. C. E. Yeack, M. Chodorow, and C. C. Cuttler, Appl. Phys. 51(9), 4,631–4,636 (1977).
50. L. Germain and J. D. N. Cheek, J. Acoust. Soc. Am. 83(3), 942–949 (1988).
51. L. Germain, R. Jacques, and J. D. N. Cheek, J. Acoust. Soc. Am. 86(4), 1,560–1,565 (1989).
52. M. R. T. Tan, H. L. Ransom Jr., C. C. Cutler, and M. Chodorow, Appl. Phys. 57, 4,931–4,935 (1985).
53. H. K. Wickramasinghe and C. E. Yeack, Appl. Phys. 48(12), 4,951–4,954 (1977).
54. D. Rugar, Appl. Phys. 56, 1,338–1,346 (1984).
55. K. Kraki, T. Saito, K. Matsumoto, and Y. Okuda, Physica B 165/166, 131–132 (1990).
56. K. Kraki, T. Saito, K. Matsumoto, and Y. Okuda, Appl. Phys. Lett. 59(8), 908–910 (1991).
57. V. A. Burov, I. E. Gurinovich, O. V. Rudenko, and E. Y. Tagunov, in Acoustic Imaging, vol. 22, Plenum, NY, 1996, pp. 125–130.
58. X. Gong and D. Zhang, in Acoustic Imaging, vol. 23, Plenum, NY, 1997, pp. 601–605.
59. J. Synnevag and S. Holm, Proc. IEEE 1998 Ultrasonics Symp., 1998, pp. 1,885–1,888.
60. R. C. Bray and C. F. Quate, Thin Solid Films 74, 295–302 (1980).
61. C. C. Lee, C. S. Tsai, and X. Cheng, IEEE Trans. Sonics Ultrasonics SU-32(2), 248–258 (1985).
62. R. C. Addison, M. C. Somekh, J. M. Rowe, and G. A. D. Briggs, in L. A. Ferrari, ed., Proc. SPIE 768, 275–284 (1987).
63. R. D. Weglein, IEEE Trans. Sonics Ultrasonics SU-30, 40–42 (1983).
64. I. Ishikawa, H. Kanda, and K. Ktakura, IEEE Trans. Sonics Ultrasonics SU-32(2), 325–331 (1985).
65. C. S. Tsai and C. C. Lee, Proc. SPIE 768, 260–266 (1987).
66. J. Onuki, M. Koizumi, and I. Ishikawa, Mater. Trans. JIM 37(9), 1,492–1,496 (1996).
67. I. Ishikawa, H. Kanda, K. Ktakura, and T. Semba, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control UFFC-36(6), 587–592 (1989).
68. R. L. Hollis, R. Hammer, and M. Y. Al-Jaroudi, Appl. Phys. 54, 7,016–7,019 (1984).
69. M. Hoppe and J. Bereiter-Hahn, IEEE Trans. SU-32, 289–301 (1985).
70. J. Y. Duquesne et al., Mat. Res. Soc. Symp. Proc. 142, 253–259 (1989).
71. A. F. Fagan, J. M. Bell, and G. A. D. Briggs, in A. C. Roulin-Moloney, ed., Fractography and Failure Mechanisms of Polymers and Composites, Elsevier Applied Science, London, 1989, pp. 213–230.
72. A. M. Sinton, G. A. D. Briggs, and Y. Tsukahara, in H. Shimizu, N. Chubachi, and J. Kushibiki, eds., Acoustical Imaging, vol. 17, Plenum, NY, 1989, pp. 87–95.
73. R. C. Addison, M. W. Kendig, and S. J. Jeanjaquet, in H. Shimizu, N. Chubachi, and J. Kushibiki, eds., Acoustical Imaging, vol. 17, Plenum, NY, 1989, pp. 143–152.
74. T. Adachi, M. Okazaki, S. Ujihashi, and H. Matsumoto, JSME Int. J. Series A 38(3), 370–377 (1995).
75. H. Morita, T. Adachi, and H. Matsumoto, J. Reinforced Plastics Composites 16(2), 131–143 (1997).
76. N. Takeda, C. Miyasaka, and K. Nakata, Proc. 5th Int. Symp. Nondestructive Characterization Mater., Karuizawa, 1991, pp. 813–824.
77. K. Yamanaka, Appl. Phys. 24, 184–186 (1984).
78. A. Okada, C. Miyasaka, and T. Nomura, JIM 33(1), 73–79 (1992).
79. B. Nongaillard et al., NDT Int. 19(2), 77–82 (1986).
80. R. D. Weglein, Acoust. Imaging 17, 51–59 (1988).
81. H. Vetters, A. Meyyappan, A. Schlz, and P. Mayr, Mater. Sci. Eng. A122, 9–14 (1989).
82. J. Bereiter-Hahn, C. H. Fox, and B. Thorell, J. Cell Biol. 82, 767–779 (1979).
83. J. A. Hildebrand and D. Rugar, J. Microsc. 134, 245–260 (1984).
84. N. Chubachi et al., Acoust. Imaging 16, 1–9 (1987).
85. J. Bereiter-Hahn et al., Acoust. Imaging 17, 27–38 (1988).
86. N. Akashi, J. Kushibiki, N. Chubachi, and F. Dunn, Acoust. Imaging 17, 183–191 (1989).
87. J. Litniewski and J. Bereiter-Hahn, J. Microsc. 158, 95–107 (1990).
88. J. Bereiter-Hahn and H. Luers, Eur. J. Cell Biol. 53(Suppl. 31), 85 (1990).
89. H. Luers, K. Hillmann, J. Litniewski, and J. Bereiter-Hahn, Cell Biophys. 18, 279–293 (1992).
90. P. A. N. Chandraratna et al., Am. Heart J. 124, 1,358–1,364 (1992).
91. G. A. D. Briggs, J. Wang, and R. Gundle, J. Microsc. 172, 3–12 (1993).
92. J. Bereiter-Hahn and H. Luers, in N. Akkas, ed., Mechanics of Actively Locomoting Cells, ASI series 84, Springer, Heidelberg, New York, Berlin, 1994, pp. 181–230.
93. P. A. N. Chandraratna et al., Acoust. Imaging 21, 559–564 (1995).
94. T. Kundu, J. Bereiter-Hahn, and K. Hillmann, Biophys. J. 59, 1,194–1,207 (1991).
95. J. Kushibiki, T. Ishikawa, and N. Chubachi, Appl. Phys. Lett. 57(19), 1,967–1,969 (1990).
96. T. Endo, C. Abe, M. Sakai, and M. Ohono, Proc. Ultrasonics Int. 1993 Conf., pp. 45–48.
97. D. Achenbach, J. D. Kim, and Y. C. Lee, in A. Briggs, ed., Advances in Acoustic Microscopy, vol. 1, Plenum, NY, 1995, pp. 153–208.
98. S. Parthasarathi, B. R. Tittmann, and R. J. Ianno, Thin Solid Films 300, 42–52 (1997).
99. R. D. Weglein, IEEE Trans. Sonics Ultrasonics SU-27(2), 82–86 (1980).
100. J. Kushibiki and N. Chubachi, Electron. Lett. 23(12), 652–654 (1987).
101. Y. Sasaki, T. Endo, T. Yamagishi, and M. Sakai, IEEE Trans. UFFC-39(5), 638–642 (1992).
102. R. D. Weglein, IEEE Trans. Sonics Ultrasonics SU-32(2), 225–234 (1985).
103. R. D. Weglein and A. K. Mal, Surf. Coat. Technol. 47, 667–686 (1991).
104. J. Kushibiki, M. Miyashita, and N. Chubachi, IEEE Photonics Technol. Lett. 8(11), 1,516–1,518 (1996).
105. J. Kushibiki, T. Kobayashi, H. Ishiji, and N. Chubachi, Appl. Phys. Lett. 61(18), 2,164–2,166 (1992).
106. J. Kushibiki and M. Miyashita, Jpn. J. Appl. Phys. 36(7B), 959–961 (1997).
107. T. Narita, K. Miura, I. Ishikawa, and T. Ishikawa, Jpn. Inst. Met. 1,142–1,146 (in Japanese) (1990).
108. S. Tanaka and C. Miyasaka, Residual Stress III, 1991, pp. 278–283.
109. J. Kushibiki and N. Chubachi, IEEE Trans. Sonics Ultrasonics SU-32(2), 189–212 (1985).
110. M. Obata, H. Shimada, and T. Mihara, Exp. Mech. 30, 34–39 (1990).
111. Y. C. Lee, J. D. Kim, and D. Achenbach, Ultrasonics 32(5), 359–365 (1995).
112. K. Yamanaka, Electron. Lett. 18(14), 587–589 (1982).
113. S. M. Gracewski, R. C. Waag, and E. A. Schenk, J. Acoust. Soc. Am. 83(6), 2,405–2,409 (1988).
114. Z. L. Li, IEEE Trans. UFFC-40(6), 680–686 (1993).
115. M. Okade and K. Kawashima, Proc. QNDE 14, 1,883–1,889 (1994).
116. J. Kushibiki and M. Arakawa, IEEE Trans. UFFC-45(2), 421–430 (1998).
117. B. T. Khuri-Yakub, P. Reinholdtsen, and C. H. Chou, Proc. IEEE 1985 Ultrasonics Symp., pp. 746–749.
118. P. Reinholdtsen and B. T. Khuri-Yakub, Proc. IEEE 1986 Ultrasonics Symp., pp. 759–763.
119. M. Ohno, Appl. Phys. Lett. 55, 832–823 (1989).
120. R. D. Weglein, Electron. Lett. 14(20), 656–657 (1978).
121. C. Iett, M. G. Somekh, and G. A. D. Briggs, Proc. R. Soc. London A393, 171–183 (1984).
122. S. Kojima, Jpn. J. Appl. Phys. 26, 233–235 (1987).
123. K. Yamanaka and Y. Enomoto, J. Appl. Phys. 53, 846–850 (1982).
124. T. Ghosh, K. I. Maslov, and T. Kundu, Ultrasonics 35, 337–366 (1997).
125. M. Ohno, C. Miyasaka, and B. R. Tittmann, Wave Motion 33, 309–320 (2001).
SCANNING ELECTROCHEMICAL MICROSCOPY

DAVID O. WIPF
Mississippi State University
Mississippi State, MS
INTRODUCTION

Scanning electrochemical microscopy (SECM) is one of a number of scanned probe microscopy (SPM) techniques invented after the demonstration of the scanning tunneling microscope (1). The use of an electrochemical process for image formation defines SECM. In most applications of the method, an ultramicroelectrode (UME) is used as the probe, and the probe signal is the faradaic current that arises from electrolysis of solution species (2,3). In other applications, using an ion-selective electrode (ISE) as the probe provides a probe signal proportional to the logarithm of the activity of an ion in solution (e.g., pH). In SECM, the primary interaction between the probe tip and the sample is mediated by diffusion of solution species between the sample and the tip of the probe; this distinguishes SECM from other SPM methods that may use an electrochemically active probe. An electrochemically active probe permits a versatile range of experiments, whose essential aspect is chemical sensitivity or control of chemical processes that occur at a substrate surface (4–6). Forming an image in an SPM technique requires reproducibly perturbing the probe signal by some aspect of the
imaged substrate. There are two principal image-forming modes in SECM: feedback and generation/collection (GC). The feedback mode uses the faradaic current that flows from electrolysis of an intentionally added or naturally present mediator species (e.g., Ru(NH3 )6 3+ or O2 ) at an UME probe (4,7,8). Far from the substrate surface, the electrolytic current assumes a characteristic steady-state value, but moving the probe close to the surface disturbs the current. An electrochemical, chemical, or enzymatic reaction at the substrate can restore the mediator to its original oxidation state and produce a positive-feedback signal. The cartoon representation in Fig. 1a illustrates the process where an oxidized mediator undergoes reduction at the probe tip, diffuses to the substrate surface, and then is reoxidized by the substrate. The regeneration of the mediator in the tip–substrate gap causes the probe current to increase as the separation decreases and suggests the term feedback as a description. In contrast, negative feedback occurs when the substrate surface cannot regenerate the mediator and mediator diffusion is blocked. Depletion of mediator by tip electrolysis in the tip–substrate gap causes the probe signal to decrease as the gap width decreases (Fig. 1b). Imaging in the feedback mode provides topographic images of blocking or conducting surfaces. Images of many types of surfaces have been obtained in the feedback mode, including images of electrodes (9–11), polymer films (12–16), and immiscible liquid interfaces (17). It is possible to manipulate the SECM imaging conditions to produce images that represent chemical and electrochemical activity. In the reaction-rate feedback mode (Fig. 1c), the overall mediator turnover rate at the substrate is limited by the rate of substrate–mediator electron transfer (ET) (18–20). Thus, the probe current signal contains kinetic and tip–substrate separation information. 
Images in the reaction-rate mode can be used to map variations in ET rate between the mediator and metallic electrodes (18,21) or enzymes in biological materials (22). The generation/collection (GC) mode uses the probe to detect changes in the concentration of a chemical species at the surface of the imaged material (Fig. 1d). Ideally, the probe acts as a passive sensor to produce concentration maps of a particular chemical species near the substrate surface. GC mode imaging is described further by the type of sensing probe used. In amperometric GC imaging, the probe is an UME that detects species by electrolysis. Amperometric GC, first reported as a method for mapping electrochemically active areas on electrodes (23), is used to make high-resolution chemical concentration maps of corroding metal surfaces (24–28), biological materials (29–33), and polymeric materials (15,34,35). In addition, measurements of ion fluxes through porous materials (36,37) such as skin (38–40) and dental material (37,41,42) are useful applications. In potentiometric GC imaging, the probe is an ISE, which has the advantages of increased sensitivity to nonelectroactive ions and improved selectivity for imaging a desired ion concentration. The literature contains descriptions of tip electrodes and experiments for measuring many types of ions, such as H+ (i.e., pH) (43–45), K+ (46), NH4 + , and Zn2+ (45).
Figure 1. Schematic diagram of the image modes in SECM: (a) positive-feedback mode (conductive substrate), (b) negative-feedback mode (blocking substrate), (c) reaction-rate mode, and (d) generation/collection mode.

SIGNAL TRANSDUCTION

The probe in SECM is an electrode in an electrolyte solution, and the tip electrode response can be quantitatively described under most measurement conditions. The most well-understood response arises from a probe geometry in which a disk is embedded in an insulating plane. Other tip electrode geometries are also used, but the experimental data are generally treated semiquantitatively or empirically.

A description of the electrochemical behavior of the UME tip is required to understand the probe signal in the SECM feedback or amperometric GC mode. Voltammetry or amperometry is the basis of a large number of electrochemical techniques in which a potential is applied between an auxiliary and a working electrode (47). Control of the potential is aided by a third, reference, electrode that maintains a constant reference voltage. In either voltammetry or amperometry, the signal is the faradaic current that flows as solution species are oxidized or reduced at the working electrode. A cyclic voltammogram (CV) illustrates this at an embedded-disk UME working electrode (Fig. 2, inset). The CV wave shown is a plot of the reduction current versus applied potential for a commonly used mediator ion, Ru(NH3 )6 3+ , at a 10-µm diameter tip electrode in unstirred solution. The reduction current reaches a limiting value at potentials negative of the reduction potential, and the forward curve is retraced as the tip electrode potential is swept back to the starting potential. The current limit iT,∞ occurs when the rate of mediator diffusion to the electrode surface is at maximum velocity (i.e., when the mediator concentration is zero at the electrode surface) and, for a microdisk electrode embedded in an insulator, is given by the following equation (3):

iT,∞ = 4nFDCa,   (1)

where F is the faraday, C and D are the mediator concentration and diffusion coefficient, n is the number of electrons transferred in the tip electrode reaction, and a is the disk radius. A characteristic of an UME is that the region of solution perturbed by diffusion is much larger than the electrode itself. This results in an efficient form of diffusional transport in which the diffusion field assumes a nearly hemispherical zone around the electrode (3) and is the basis for the feedback mode of SECM imaging.

Figure 2. Theoretical and experimental current vs. distance curves for positive and negative feedback using an embedded-disk tip electrode. Experimental conditions for positive feedback: 2 mM Ru(NH3 )6 3+ , pH 4.0 phosphate-citrate buffer, 10-µm diameter Pt tip, Pt substrate, tip potential (Etip ) −0.3 V vs. Ag/AgCl reference, and substrate potential (Esub ) 0.0 V vs. Ag/AgCl. Negative-feedback conditions are as in positive feedback except for use of a microscope slide substrate. Experimental current data are normalized by dividing by iT,∞ , and the distance scale is divided by the tip radius. (Inset) Cyclic voltammogram for a 10-µm diameter Pt disk in 2 mM Ru(NH3 )6 3+ , pH 4.0 phosphate-citrate buffer at a scan rate of 100 mV/s.
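Equation (1) can be checked against the conditions of the Fig. 2 inset. The diffusion coefficient below is an assumed, typical literature-scale value for Ru(NH3)6 3+, so the result is an order-of-magnitude sketch rather than the measured current:

```python
# Limiting current at an embedded-disk UME, Eq. (1): iT,inf = 4 n F D C a.
# Values mimic the Fig. 2 inset; D is an assumed, typical value.
n = 1          # electrons transferred in the Ru(NH3)6(3+) reduction
F = 96485.0    # the faraday, C/mol
D = 7e-6       # mediator diffusion coefficient, cm^2/s (assumed)
C = 2e-6       # 2 mM mediator expressed as mol/cm^3
a = 5e-4       # 5-µm disk radius in cm (10-µm diameter tip)

i_T_inf = 4 * n * F * D * C * a    # amperes
print(f"iT,inf ≈ {i_T_inf * 1e9:.1f} nA")
```

The result is a few nanoamperes, consistent with the nanoampere current scale of the voltammogram in the Fig. 2 inset.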
Feedback Mode

In the feedback mode, the tip electrode potential is adjusted so that the electrode current for a mediator is at the limiting current value. As the UME probe approaches the surface, the hemispherical diffusion field of the mediator is disturbed, and the imaging signal iT changes from that observed at infinite distance (i.e., iT,∞ ). Two limiting cases, referred to as positive and negative feedback, are observed when the substrate either permits or blocks, respectively, ET to the mediator. Although no physical contact between probe and sample occurs in normal feedback imaging, a chemical interaction occurs because a small region of the substrate is subjected to a chemical environment different from the bulk. Imaging conditions are tuned by choosing a mediator that will produce a desirable interaction between the probe and substrate. However, a mediator must also be chosen to avoid undesirable interactions with the sample, such as destruction of a sample by an oxidizing mediator. The relationship between probe current and tip–surface separation can be calculated quantitatively for tip electrodes in the shapes of embedded disks (8), cones, and spherical sections (11,48–50). An approximate analytical equation of the current–distance relationship is available for an embedded-disk probe (49,51). For positive feedback,

IT (L) = 0.68 + 0.78377/L + 0.3315 exp(−1.0672/L).   (2)

The probe current and distance are presented in normalized form: IT = iT /iT,∞ , and L = d/a is the normalized distance d between the probe tip electrode and the substrate. For negative feedback,

IT (L) = 1/[0.292 + 1.515/L + 0.6553 exp(−2.4035/L)].   (3)

Figure 3. Schematic diagram for probes that have (a) an embedded disk and (b) a conical (etched) electrode geometry.

Comparing I –L curves between experimental and theoretical data for disk-shaped electrode probes generally produces excellent agreement, and curve fitting can be used to verify proper instrument operation. Figure 2 is a plot of a portion of the theoretical I –L curve compared to experimental data recorded at a 10-µm diameter microdisk probe. Note the increased distance sensitivity in positive feedback compared to negative feedback. Probe geometry strongly influences the probe current in the feedback mode experiment (52). For accurate measurements, the probe generally has the shape of a truncated cone. Insulating material (e.g., glass or polymer) forms the shape of the cone, and the disk electrode is located in the center of the truncated end (Fig. 3). This shape allows a closer probe tip approach to the sample surface than would be otherwise possible with
a large insulating radius. A truncated cone tip gives a different I –L curve than, say, a hemispherical tip. Thus, no single equation for an I –L curve includes all geometric possibilities. Equations (2) and (3) are strictly valid only for disk electrodes whose insulating radius is 10 times the electrode radius (a common configuration). For negative feedback, a smaller or larger insulating radius causes the probe I –L curve to descend less or more steeply, respectively (52,53) because both the sample surface and the insulating material provide the mediator blocking necessary for negative feedback. For example, at L = 1, IT = 0.534, 0.302, and 0.213 for insulator-to-electrode radius ratios of 10, 100, and 1,000, respectively (8). The effect of insulating radius is less critical in positive feedback because the probe current is mainly due to regeneration of the mediator in the region of the substrate directly below the probe tip. Probes that have a tip electrode size less than 1 µm are easier to make in a cone or spherical-segment (e.g., hemispherical) geometry (10,49,54–56). Unfortunately, the I –L sensitivity of these probes in the feedback mode is much less than that of the disk-shaped electrodes. Further, most construction methods for very small probes (e.g., electrochemically etched tips) do not produce a reproducible tip electrode shape (57). Good estimates of the tip electrode shape can be made by fitting the experimental to simulated I –L curves (49), but it is unlikely that fitting will be used for general imaging purposes. Thus, at present, non-disk-shaped tip electrodes should be used only to provide qualitative SECM feedback images. Rastering the probe across the sample surface and recording the probe current produces topographic images of surfaces in the feedback mode. Both resolution and sensitivity increase with a decrease in tip–sample distance. Typically, the probe tip is positioned within one tip electrode radius of the sample. 
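The approach-curve expressions, Eqs. (2) and (3), are easy to evaluate directly; a minimal sketch is below. At L = 1 it yields a negative-feedback current close to the IT ≈ 0.53 quoted above for a 10:1 insulator-to-electrode radius ratio:

```python
import math

def i_t_positive(L):
    """Normalized positive-feedback current, Eq. (2)."""
    return 0.68 + 0.78377 / L + 0.3315 * math.exp(-1.0672 / L)

def i_t_negative(L):
    """Normalized negative-feedback current, Eq. (3)."""
    return 1.0 / (0.292 + 1.515 / L + 0.6553 * math.exp(-2.4035 / L))

# At one tip radius of separation (L = 1) the positive- and negative-feedback
# currents differ by roughly a factor of three, which is the source of the
# strong contrast between conducting and blocking regions in feedback images.
print(i_t_positive(1.0))   # ≈ 1.58
print(i_t_negative(1.0))   # ≈ 0.54
```

Note that these are approximate fits valid near the surface; both expressions deviate slightly from IT = 1 at very large L.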
Because the probe response is different for insulating (blocking) and conducting surfaces, areas of both types can be
distinguished in feedback mode imaging. Conducting areas of a surface have a positive-feedback response and are identified by currents larger than iT,∞ ; insulating areas have a negative-feedback response when tip currents are less than iT,∞ . Note that the nonlinear I –L relationship distorts the relationship of the feedback-mode image to the actual topography. However, a true topographic image can be made from the current image by employing theoretical I –L equations [e.g., Eqs. (2) and (3)] (7,58). The feedback image in Fig. 4a illustrates the concept of positive and negative feedback in a test image taken of an interdigitated electrode array. Au electrodes 3 µm wide are separated by 5-µm wide SiO2 insulators. The primary contrast is the transition from positive to negative feedback; the Au electrodes exhibit positive feedback and the largest current. Topographic features such as damage to the Au electrodes are visible as slight depressions on the second and third bands from the right. Lateral and vertical resolution in the feedback mode depend on the probe electrode size and tip–sample separation (52,59). The ability to discriminate between insulating and conducting regions can be used as one estimate of lateral resolution. Scanning the probe over a conducting–insulating boundary ideally produces an instantaneous change in probe current (from positive to negative feedback). Experimentally, the change occurs across a region of finite width Δx. For zero tip–sample separation (L = 0), Δx equals the tip electrode diameter. For larger separations, the lateral resolution degrades as diffusion broadens the mediator concentration profile. Unfortunately, no comprehensive theoretical answer for lateral resolution has been published. An experimental study reported that the resolution function has the form (9)

Δx/a = 4.6 + 0.111 L².   (4)
This equation is probably overconservative because it implies that the best achievable resolution is 2.3 times the tip electrode diameter, which is contrary to intuition and other published data [see (60) for resolution superior to that predicted by Eq. (4)]. However, the functional form of resolution versus distance seems reasonable. Regions smaller than a tip electrode diameter cannot be resolved, but small active regions embedded in an insulating matrix can be recognized if the regions are separated by at least a tip electrode diameter. The vertical resolution of the SECM depends primarily on the probe tip–sample separation and the dynamic range of the probe current measurement. Equations (2) and (3) allow estimating the minimum resolvable vertical change in distance at a disk-shaped probe tip electrode. For example, a 1% change in I occurs from a change from L = 1 to L = 1.0137. If this difference in current is measurable, it corresponds to a theoretical 0.068-µm change at a 5-µm radius tip electrode and 5-µm tip–substrate separation, or a theoretical 0.014-µm change at a 1-µm radius tip electrode and a 1-µm separation. Increasing the resolution of SECM feedback images is, in principle, possible by removing diffusional blurring by applying digital image processing techniques. Unfortunately, the nonlinear response in the SECM feedback mode has prevented a general formulation of a deconvolution function. It has been shown that approximate deblurring functions improve image contrast (61,62). The ultimate resolution in SECM may have been demonstrated by Fan and Bard, who reported a resolution approaching 1 nm for imaging biological molecules (63). To achieve this high resolution, they used an etched tungsten tip electrode and a very thin layer of water naturally present on the sample in a humid atmosphere. The thin layer of solution minimizes the blurring effect of diffusion.

Reaction-Rate Mode
(a)
1 nA
50 pA (b)
y (µm)
40 20 0 0
20 x (µm)
40
Figure 4. SECM images of an interdigitated array electrode that consists of 3-µm wide Au electrodes separated by 5-µm wide strips of SiO2 insulator. Images acquired by using a 1.3-µm tip–substrate distance at a scan rate of 5 µm/s. Images are presented as shaded surface plots and larger tip currents appear as higher points on the image. (a) Feedback image experimental conditions: 2 mM Ru(NH3 )6 3+ , pH 3.2 phosphate-citrate buffer, 2-µm diameter Pt tip, Etip = −0.3 V vs. Ag/AgCl reference, Esub = 0.0 V, cathodic currents are positive. (b) GC image, experimental conditions: as in (a) except Etip = 0.0V, Esub = −0.090 V, and anodic currents are positive.
The general case of feedback imaging, known as reaction-rate imaging, exists when the rate of ET at the probe tip electrode or sample is considered. Positive and negative feedback are limiting cases that occur when the ET rate is at the diffusion limit or zero, respectively. Between those limits, the I–L curves change from curves similar to positive feedback to what appears to be negative feedback. Note that deviations from the theoretical positive-feedback curve may arise even when mediators that have very rapid ET kinetics are used. This is made clear by considering that the SECM response for a disk-electrode tip at very small tip–sample separations is approximately that of a twin-electrode, thin-layer cell (64,65). For this configuration, the probe current is

iT = 4nFDCA/d    (5)
where A is the tip electrode area. The very small separations in SECM produce impressively large values of the mass transfer coefficient D/d (47). As an example, for d = 0.2 µm and D = 1 × 10−5 cm2 s−1, the mass transfer coefficient is 0.5 cm s−1. Heterogeneous rate constants for mediator ETs at the tip electrode or substrate must be about 10 times larger to preserve the theoretical response for positive feedback. Thus, the SECM can measure or detect changes in heterogeneous rates of ET reactions at the tip electrode or the substrate, as long as the opposing electrode is not allowed to limit the overall reaction. This is not difficult for mediators with reasonably rapid ET rates because the opposing substrate or tip electrode can be poised at a potential sufficiently past the oxidation or reduction potential to drive the reaction to its diffusion-controlled limit. A beneficial consequence of rapid diffusional transport in the positive-feedback mode is that the probe is less sensitive to convection caused by, for example, probe movement during SECM imaging. Because of the high mass-transport rate and the ease of making measurements at steady state, the SECM is a very promising method for fundamental investigation of rapid interfacial ET rate constants. Calculations suggest that it is possible to use the SECM tip electrode to measure a rate constant as high as 10–20 cm s−1 under steady-state conditions (66). When the mediator and sample are chosen to produce a kinetic limitation of the feedback process, a number of interesting experiments become possible. Calculations of I–L curves for quasi-reversible and irreversible ET kinetics are available (67–69) that permit quantitative measurements of ET rates at micron-sized regions of surfaces. At electronically conducting surfaces, kinetic information can be extracted in tip-sized regions, allowing direct imaging of active or passive sites on surfaces. For example, greater ET activity is easily observed on the Au region of a composite carbon/Au electrode (18). Feedback current also supplies potential-independent kinetic information for chemical reactions that involve ET at the interface, such as immobilized enzymatic activity or oxidative metal dissolution. An example of a reaction-rate image is presented in Fig. 5. The SECM was used to prepare a pattern of oxidized regions on a glassy-carbon electrode using a direct-mode oxidation process (see later). Reduction of Fe3+ ions is catalyzed on oxidized carbon regions, and the reaction-rate image is based on the increased reaction rate on oxidized carbon. Successive images of the same region were collected when the carbon electrode potential was at −200 (A), −400 (B), −600 (C), and −800 mV (D). At potentials less negative than −800 mV, the ‘‘face’’ pattern appears on the image due to a higher rate of ET in the oxidized regions compared to the native carbon. At −800 mV, the pattern disappears because the rate of ET in all regions of the carbon surface is at the diffusion limit and the contrast disappears. Note that topography is not the predominant contrast mechanism in the images, although a small pit in the electrode surface is clear in (D) and less so in the other images.

SCANNING ELECTROCHEMICAL MICROSCOPY

Generation/Collection Mode

Using the GC mode is most appropriate when the sample itself produces an electrochemically detectable material. GC is ideally a passive mode in which the probe is used as a microscopic chemical sensor. Examples include using the GC mode to investigate corroding metal surfaces (25,70), ionic and molecular transfer through porous material (71–73), and oxygen generation and
Figure 5. Reaction-rate SECM images of an oxidized carbon pattern on a glassy-carbon electrode. Imaging conditions: 10-µm Pt tip, Etip = 700 mV vs. Hg/Hg2SO4 reference, 2 mM Fe2+/1 M H2SO4, 2.5-µm tip–substrate distance, scan rate of 10 µm/s. Images are presented as gray-scale plots, and larger cathodic tip currents are lighter shades. (A) Esub = −200 mV, S = 3.6–5.0 nA. (B) Esub = −400 mV, S = 5.0–7.3 nA. (C) Esub = −600 mV, S = 6.4–8.7 nA. (D) Esub = −800 mV, S = 7.7–9.3 nA. Conditions for SECM production of oxidized pattern: 10-µm Pt tip, distilled water solution, tip–substrate separation
SILVER HALIDE DETECTOR TECHNOLOGY

4D(C − Ci)/(ρL)    (11)

where D is the diffusivity, ρ is the crystal density, L is the crystal diameter, C is the bulk solute concentration, and
Ci is the solute concentration at the surface. Because of its rapid incorporation into the crystal, the concentration of new material at the surface is always less than that in the bulk solution and provides a concentration gradient that acts as a driving force for growth.

Precipitation Scheme
Figure 12 shows the basic arrangement of a double-jet precipitation. The reactants are typically at 1 to 4 M concentrations, so that the supersaturation is very high just at the point of emergence from the supply tubes near the mixing head. For example, at 70 °C and pAg 7.5, the equilibrium solubility of silver ion species is 10−6 M. If the AgNO3 concentration in the supply tube is 1 M, then the supersaturation ratio is 10^6, which provides a large driving force for crystal formation. Thus, transient nuclei are continuously formed in this highly supersaturated region of the reactor and then are fed into the bulk solution by the mixing head, where their survival depends on the prevailing level of supersaturation (24). In the early stage (less than one minute), the bulk supersaturation is high, and the nuclei survive. These initial nuclei grow rapidly and reduce the bulk supersaturation so that additional nuclei cannot survive. Newly generated nuclei from the mixing head region dissolve by Ostwald ripening and provide growth material for the crystals initially formed. Nucleation ceases when the rate of Ostwald ripening equals the rate of reactant addition. A model has been developed to understand quantitatively the parameters that control the outcome of the precipitation (32). Figure 13 shows a qualitative view of the precipitation. In the quasi-steady-state region, a dynamic mass balance between material added and the total mass of the microcrystals formed leads to expressions for the number of stable growing microcrystals. In the usual diffusion-controlled growth mode, the number of growing microcrystals Z is given by

Z = ARTr/(Cs rc)    (12)

where R is the reactant addition rate, T is the absolute temperature, r is the number-average crystal radius, Cs is the solubility, rc is the critical radius below which the crystal will tend to dissolve due to the Gibbs–Thomson effect, and A is a constant. This equation accounts for the observed effects of reactant addition rate, pAg, solubility, and temperature on the number of microcrystals formed. In many applications of silver halide technology, the grains contain more than one type of halide ion. In some cases, the second halide ion is added at a selected point during the precipitation, or even after all the first halide is added. In such situations, recrystallization will occur, and the system will tend toward a uniform composition, but usually cannot achieve it due to kinetic factors (24). The driving force for recrystallization is increased entropy, which is fundamentally different from that of Ostwald ripening.

Figure 12. Diagram of an emulsion precipitation apparatus based on the double-jet scheme. The computer-controlled pumps allow the reactants to enter the reaction zone near the mixing head at the proper rate for microcrystal size control. The pAg electrode signal is fed to the computer and is part of a feedback loop for adjusting the bromide pump speed to maintain the desired pAg.

Figure 13. Qualitative representation of supersaturation ratio [Eq. (10)] and concentration of microcrystals vs. precipitation time. Axes are nonlinear. (From Ref. 32. Reprinted with permission of the American Chemical Society.)

Twinning
In earlier applications of silver halide technology, the grains had a more or less three-dimensional structure. But in more recent applications, tabular grains have been used. To form tabular grains, a phenomenon called ‘‘twinning’’ must occur. To understand twinning, we must first discuss how ions are placed within an individual layer during crystal growth. Consider a crystal growing by successive (111) layers — one layer of silver ions, the next bromide ions, the next layer silver ions, etc. Figure 14 shows the position of the ions in the first layer A. All of the ions in the next layer
Figure 14. Layer arrangement for successive (111) planes. The bottom layer is denoted by A. The second layer is placed at the B position, the third layer at the C position, and the fourth layer would adopt the A position to start the sequence again.
can go in either position B or position C. Let us assume it is B. Then, the ions in the third layer would have to go into position C. The fourth layer would repeat the pattern, where the ions go directly over those in position A in the first layer. Because there are two different ions in a growing AgBr crystal, we can represent one type of ion by an uppercase letter and the other type by a lowercase letter. Then, the stacking pattern described above can be expressed as AbCaBcAbC… Twinning consists of a reversal of the usual stacking sequence within the crystal, as shown in Fig. 15 for twinning on a (111) plane, which is the most common case in silver halide technology. The twinned portion may be thought of as the mirror image of the parent crystal. In practice, twinning often occurs when the precipitation is carried out at pAg > 10. A possible explanation of twinning suggests that occasionally a Br− (111) layer is added in the wrong (twin) position due to the predominance of complex ions such as AgBr3^2− (33). Other mechanisms that involve supersaturation-induced dominance of a surface integration growth mechanism (34) or coalescence of colloidal nuclei (35) have also been suggested to explain twinning. Tabular morphology requires that at least two twinning events occur in a microcrystal. In a hexagonal tabular grain, double twinning would result in six troughs, as shown in Fig. 16. It is thought that addition of new
Figure 15. Normal stacking sequence of (111) planes (left) and stacking fault (right) in AgBr crystals. The stacking fault leads to the formation of a twin plane.
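The stacking bookkeeping described above is easy to mechanize. In the illustrative sketch below (the helper is not from the article), positions cycle A, B, C; ion type alternates as uppercase/lowercase; and a twin reverses the positional cycle at a chosen layer, so the twinned portion mirrors the parent crystal:

```python
def stacking(n_layers, twin_at=None):
    """Return a (111) stacking string such as 'AbCaBcAbC'.

    Positions cycle A->B->C; the two ion types alternate via
    upper/lower case. If twin_at is given, the positional cycle runs
    backward from that layer on, producing the mirror-image sequence.
    """
    positions = "ABC"
    seq, idx, step = [], 0, 1
    for layer in range(n_layers):
        if twin_at is not None and layer == twin_at:
            step = -1  # stacking fault: reverse the cycle direction
        letter = positions[idx % 3]
        # even layers hold one ion type (uppercase), odd the other
        seq.append(letter if layer % 2 == 0 else letter.lower())
        idx += step
    return "".join(seq)

print(stacking(9))              # AbCaBcAbC (normal sequence)
print(stacking(9, twin_at=5))   # AbCaBcBaC (twin plane at layer 5)
```

In the twinned output, the positional sequence reads the same in both directions away from the twin plane, which is exactly the mirror-image property described in the text.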
Figure 16. A view of a hexagonal tabular grain that shows four of the troughs (arrows) created at the edges by double twinning. The other two troughs are hidden at the rear of the grain.
material to these trough regions occurs much more readily than addition to the large flat faces, resulting in much higher growth rates in the lateral direction (36). As a result, grains that have a high aspect ratio (diameter divided by thickness) are formed. The photographic advantages of tabular morphology will be discussed later.

Controlling Crystal Shape and Size

As already mentioned, a variety of crystal shapes are used in silver halide imaging. The shape is controlled primarily by pAg during precipitation (37). Cubic grains are formed at very low [Br−] (10−3 to 10−4 M), octahedra are formed at somewhat higher concentrations (10−2 M), and tabular grains are formed at very high concentrations (10−1 M). The explanation for the pAg effect on shape is based on the silver-containing species in solution at the prevailing pAg. It is known that Br− is more strongly adsorbed to (111) faces than to (100) faces (38,39). At relatively high pAg, say 9, the predominant form of soluble silver is AgBr3^2−. As more Br− is adsorbed to the (111) faces, these faces will repel the negatively charged complexes and therefore will grow more slowly than the (100) faces. The (100) faces will grow themselves out of existence, leaving only (111) faces (octahedra). At even higher pAg, twinning will occur, and tabular grains are formed, as discussed before. Other crystal shapes are possible for the face-centered cubic system (40) but, as discussed before, they are unstable and rapidly convert to cubes, octahedra, or cubooctahedra, depending on the pAg. Growth modifiers can be used to stabilize these other shapes (40), but such compounds are not practical because they interfere with subsequent sensitization steps. There are two basic classes of growth modifiers. The first class is known as ‘‘restrainers’’ because they slow crystal growth, sometimes preferentially on a particular face, as just mentioned.
They adsorb on the crystal surface and typically form complexes with silver ions that have very small Ksp values. The second class of growth modifier is known as ‘‘ripeners.’’ These form soluble complexes with silver ions and are primarily used to increase grain size. We see from Eq. (12) that if the solubility of the system is increased, then the number of growing crystals is decreased. For a fixed mass of silver halide added during the precipitation, each grain will be larger if the solubility is increased by
using a ripener. Studies of ripeners have shown that they, indeed, decrease the number of growing crystals (41). In addition to ripeners, other parameters of the emulsion precipitation can be used to adjust grain size. We see, by referring again to Eq. (12), that increasing the reactant addition rate will increase the number of crystals and thereby decrease grain size for a fixed mass of silver halide added. Factors that affect solubility, such as pAg or halide type, will also affect the number of crystals and ultimately the final grain size. Many of these variables will also affect crystal shape (42,43). One obvious method for increasing grain size is increasing the amount of material added. However, in a scheme of constant reactant flow rate, the grain volume increases linearly with time, so that the edge length increases as the cube root of time. Therefore, doubling the edge length requires an increase in precipitation time by a factor of 8. Under practical conditions, an upper limit to grain diameter is about 0.4 µm. Larger grain sizes can be achieved in several ways. One is to use a seeding technique in which the reactor is charged with small seeds that are then grown to a desired size. Another method is to use ripeners, as mentioned before. A third technique is to use accelerated flow rates. For example, if the flow rates are changed continuously so that a constant flow rate per grain surface area is maintained, grain diameter will vary linearly with time.

Emulsion Washing

At the end of the precipitation, the reactor contains the desired emulsion grains but also a large concentration of counterions, usually NO3− and Na+. If these counterions are not removed, the emulsion will gradually dissolve. To stabilize the emulsion, it is necessary to remove these counterions. Several techniques are possible. One technique is called ‘‘flocculation washing.’’ In this procedure, modified gelatin is added to the emulsion, making the solubility pH-dependent.
By lowering the pH, the grains and associated gelatin flocculate and settle to the bottom of the reactor, whereupon the counterions in the supernatant can be readily removed. After two or three cycles, the emulsion is essentially free of counterions. A variant of this technique uses inert salts instead of modified gelatin to increase the ionic strength until flocculation occurs, and the counterions can be removed by decanting. Another method common in industrial operations is ultrafiltration. In this technique, the emulsion is pumped through a membrane cell. The membrane pore size is such that the counterions can be transmitted, but the grains and gelatin cannot. Circulation of the emulsion is stopped when the desired conductivity is reached, indicating that the ion concentration has been reduced to the desired level.

Emulsion Characterization

Once the precipitation is complete and the counterions are removed, it is necessary to characterize the emulsion to determine if the desired properties were achieved. To determine the morphology of the grains, they can be examined by a scanning electron microscope or an
optical microscope if the grains are large enough. Next, the grain size must be determined. Grains in calibrated micrographs can be measured directly. There are also a variety of particle size distribution methods that can be employed for this purpose. It is also possible to use electrolytic reduction of the grains; by counting the number of electrons required to reduce the grains, the volume can be calculated (44,45). It is also useful to know the thickness of tabular grains. Optical micrographs can sometimes be used for this purpose qualitatively because the grains are thin enough to cause interference effects, so that different grain thicknesses will have different colors. Quantitative measurements can be made by obtaining SEM micrographs from tilted images. Another possibility is using shadowed micrographs from a transmission electron microscope (see Fig. 7). Knowing the length of the shadow and the angle at which the shadowing was done, it is possible to calculate the thickness. However, for very thin grains, the thickness of any adsorbed gelatin layer can obscure the true thickness. The composition of the grains is another property of interest. There are several bulk methods for characterizing the grain as a whole, such as neutron activation analysis, X-ray powder diffraction, spark source mass spectrometry, atomic absorption, and plasma emission. Each technique has its own limitations in detection limits and sample preparation. In some cases, it is useful to do compositional analysis in different parts of the grains to test for homogeneity or, more usually, to see if a desired heterogeneity has been achieved. Techniques here include analytical electron microscopy coupled with cross-sectioning techniques (46), luminescence microscopy at low temperatures (47,48), and secondary ion mass spectrometry (49,50).

PHYSICS OF THE SILVER HALIDE MICROCRYSTAL

Crystal Defects

Similar to other crystalline materials, silver halides can possess crystal defects.
One defect of photographic importance is the dislocation. An edge dislocation occurs when a partial plane of ions is missing from the crystal, creating a line imperfection (17). Dislocations can be introduced during crystal growth or can be induced by applying pressure to a perfect crystal. The defect induces inefficiencies in the image formation stage of the photographic process because it leads to internal latent-image formation, which is inaccessible to the developer during image processing. Thus, great care is taken to avoid this defect. Other crystal defects are known as point defects because they involve just one of the lattice ions, rather than a plane of ions. An example is the formation of an interstitial silver ion, as shown in Fig. 17. Note that a complementary species, a vacancy, is also formed and is assigned a negative charge to take into account the surrounding six Br− ions. The process is reversible and allows interstitials to recombine with vacancies to re-form lattice silver ions. Thus, an equilibrium is set up between lattice silver ions
Figure 17. Two-dimensional view of a silver halide crystal showing the formation of an interstitial silver ion and a vacancy. Also shown is the equilibrium between lattice silver ions and interstitials and vacancies. Once created, the interstitial migrates throughout the crystal, driven by thermal energy.
and interstitial silver ions and vacancies. Such a defect is also known as a Frenkel disorder, after one of the pioneers in studying them. The energy requirement for such a process is very low, so it is quite favorable in AgBr, but less so in AgCl. As will be discussed in detail later, much of image formation and detection occurs at the surface of the silver halide microcrystals, making the structure of the surface very important. A smooth surface plane would not be realizable under practical crystal growth and storage conditions. Rather, a more realistic view of the (100) crystal surface consists of steps and terraces and jogs along the steps. An example of these jogs, known as ‘‘kink’’ sites, is shown in Fig. 18. Because of the ionic nature of silver halides, the kink site has a partial charge associated with it because it is missing three of its normal six neighbors of opposite sign. The kink site illustrated in Fig. 18 has a formal charge of +0.5 e, where e is the elementary unit of charge, and is known as a ‘‘positive-kink site.’’ By interchanging the positive and negative charges in Fig. 18, one obtains a negative-kink site whose formal charge is −0.5 e. However, the ions around these sites of excess
charge can move to some extent to accommodate these excess charges and thereby reduce the charge to less than the indicated formal charge. This movement by the lattice ions is called ‘‘lattice relaxation.’’ These sites of partial charge will take on importance when we discuss image formation later. A microcrystal bounded by (111) planes would be very unstable. It would have a large excess of charge because all of the surface ions would be the same; they carry uncompensated charge because, like kink sites on (100) surfaces, they are missing three of their normal oppositely charged neighbors. As a result, the crystal surface reconstructs to achieve a lower energy configuration. The exact nature of the reconstructed surface is not known. Models of the surface suggest a partial outermost layer of the ions of the same type where ions form either a hexagonal (51) or alternate row (52–54) pattern, as illustrated in Fig. 19. Some of these surface ions may be missing, thus forming ‘‘kink-like’’ sites.
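Returning to the Frenkel disorder discussed above, an order-of-magnitude estimate of the defect population follows from the formation enthalpy and entropy listed in Table 2. The sketch below is illustrative, not from the article: it uses the relation ni/N = √2·exp(ΔS/2k)·exp(−ΔH/2kT), which follows from the Frenkel equilibrium ni·nv = 2N²·exp(−GF/kT) with ni = nv, and it assumes the tabulated ‘‘e.u.’’ entropies are expressed in units of k:

```python
import math

K_B_EV = 8.617e-5  # Boltzmann constant, eV/K

def frenkel_fraction(dH_eV, dS_over_k, T=298.0):
    # ni*nv = 2*N**2*exp(-G/kT) with G = dH - T*dS and ni = nv
    # gives ni/N = sqrt(2)*exp(dS/2k)*exp(-dH/(2kT)).
    return math.sqrt(2.0) * math.exp(dS_over_k / 2.0) \
                          * math.exp(-dH_eV / (2.0 * K_B_EV * T))

# AgBr values from Table 2: dH = 1.16 eV, dS = 7.28 e.u. (assumed
# to be in units of k for this illustration).
x = frenkel_fraction(1.16, 7.28)
print(f"{x:.1e}")  # about 8e-9, the same order of magnitude as the
                   # mole fraction 4.71e-9 quoted in Table 2
```

The residual factor of roughly two reflects the units assumption and the exact temperature used; the exponential dominance of the 1.16-eV enthalpy is the main point.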
Ionic Properties

The interstitial silver ion discussed previously is not a static entity but can repeatedly move from one interstitial position to another because the free energy requirement for such a process is very low. On the other hand, the complementary vacancy formed with the interstitial ion has a large free energy requirement for motion that reduces its mobility at room temperature by several orders of magnitude relative to the interstitial silver ion. The movement of interstitial silver ions and vacancies through silver halide crystals corresponds to a movement of charge and imparts electrical conductivity to the material. Because silver halides are electrical insulators in the dark, measurements of conductivity under such conditions relate to the ionic properties of the material.
Figure 18. Illustration of a positive-kink site on a AgBr (100) surface.
Figure 19. Two possible models of a reconstructed AgBr (111) surface. (a) hexagonal model. (b) alternate-row model. Shaded symbols are the top layer; open symbols are the second layer. The shaded circles can be either Ag+ or Br− , and the crosshatched squares are ion vacancies. (From Ref. 54. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
Ionic conductivity σ is determined by the product of mobility µ and concentration n and is expressed as

σ = e(µi ni + µv nv)    (13)

where the subscripts i and v refer to the interstitial and vacancy, respectively. At equilibrium,

KF = ni nv = 2N² exp(−GF/kT)    (14)

where KF is the Frenkel equilibrium constant, N is the number of lattice sites, GF is the free energy of formation of the Frenkel pair, k is Boltzmann's constant, and T is the temperature in absolute units. Table 2 summarizes the thermodynamic data related to Frenkel pair formation in AgCl and AgBr, which can be used to calculate the concentrations of defects, also given in Table 2. The mobility part of the ionic conductivity can be calculated as

µ = (el²ν/6kT) exp(−Gµ/kT)    (15)
where l is the jump distance, ν is the vibrational frequency of the defect, and Gµ is the free energy of the jump. Table 3 summarizes the thermodynamics of the vacancy and interstitial jumps, as well as the calculated mobilities.

Table 2. Thermodynamic Constants for Frenkel Defect Formation in Silver Halides (Ref. 7)

Property                         AgCl             AgBr
Formation enthalpy (eV)          1.49 ± 0.02      1.16 ± 0.02
Formation entropy (e.u.)         11.13 ± 0.20     7.28 ± 0.58
Mole fraction defects            4.92 × 10^−11    4.71 × 10^−9
Defect concentration (cm−3)      1.15 × 10^12     9.80 × 10^13

Table 3. Thermodynamic Constants for Frenkel Defect Motion in Silver Halides (Ref. 7)

Property                         AgCl             AgBr
Interstitial
  Enthalpy (eV)                  0.018 ± 0.008    0.042 ± 0.011
  Entropy (e.u.)                 −3.83 ± 0.12     −3.34 ± 0.28
  Mobility (cm2 V−1 s−1)         3.3 × 10^−3      7.7 × 10^−4
Vacancy
  Enthalpy (eV)                  0.306 ± 0.008    0.325 ± 0.011
  Entropy (e.u.)                 −0.65 ± 0.12     1.01 ± 0.28
  Mobility (cm2 V−1 s−1)         6.1 × 10^−7      1.7 × 10^−6

Figure 20. Schematic of the movement of an interstitial silver ion in a AgBr crystal. The ions are not drawn to scale. The interstitial silver ion I displaces a lattice silver ion into another interstitial position, taking its place in the lattice. The silver ion must undergo quadrupolar deformation to squeeze through the triad of halide ions (crosshatched).

A model consistent with experimental data has been developed that describes the mechanism by which the interstitial and vacancy move (55–57). Figure 20 illustrates that both the interstitial silver ion and the lattice ion that it will eventually replace move in a concerted process. The lattice silver ion moves into an
interstitial position at the same time that the interstitial silver ion is moving into the lattice ion position. The energetics involved in the movement of both ions is the same. Key to the operation of this mechanism is the ability of the interstitial silver ion to squeeze through the triad of bromide ions (crosshatched ions in Fig. 20). The actual effective radii of the bromide ions are much larger than those illustrated in Fig. 20 to the point that they are almost touching each other. The remaining space for the silver ion to squeeze through the bromides is smaller than the free silver ion. What saves the day is that the silver ion can rather easily undergo quadrupolar deformation to form an ellipsoidal ion that can squeeze through the available space between the bromide triad. The mechanism for vacancy motion is illustrated in Fig. 21. In the simple diagonal movement at the bottom of the figure, the silver ion moves from its lattice site toward the vacancy, so that when the jump is completed, the lattice silver ion and the vacancy have merely switched places. In this case, even after the quadrupolar deformation of the silver ion, there is significant spatial overlap between the deformed ion and the two bromide ions, leading to the higher enthalpies of motion indicated in Table 3. However, Fig. 21 also shows an alternative, presumably lower energy pathway that involves two sequential noncollinear jumps, where the normal interstitial position acts as an intermediate state. But such a low energy pathway is not consistent with experimental data, and the reasons for this discrepancy have been discussed (56,58). Impurities can be incorporated within the grain, and this is sometimes intentionally done as in doping. Whether impurities or dopants, these species are most often incorporated in their ionic form, and when they are polyvalent ions, the Frenkel equilibrium can be altered due to the requirement for overall electrical neutrality. 
So, for example, if an impurity in a +2 valence
state is incorporated, a vacancy must also be present to compensate for the extra charge on the impurity. Depending on the impurity, the vacancy may be bound to it or may dissociate from the impurity to migrate elsewhere in the crystal. Ionic conductivity is an important property that affects both the storage and image capture properties of the silver halide detector. Thus, it is important to be able to measure this property in photographic materials. A contactless technique is required because of the small size of the microcrystal. This is done by measuring the dielectric properties of thick emulsion layers and then using a theory based on ionic relaxation of conducting particles in an insulating medium (59–68). In this technique, the imaginary part of the dielectric constant of the coating is measured at different frequencies of an ac field. The theory predicts that the frequency at which the peak occurs is proportional to the ionic conductivity. An example of the data obtained by this technique is shown in Fig. 22 which pertains to AgCl grains doped with CdCl2 (61). As the Cd2+ concentration increases, the peak first shifts to a lower frequency that indicates lower ionic conductivity. This can be attributed to a lowering of the interstitial silver ion concentration and a corresponding increase in the less mobile vacancy concentration. However, a further increase in the Cd2+ concentration shifts the peak to higher frequencies which indicates an increase in ionic conductivity. In this region, the Frenkel equilibrium is dominated by vacancies and, even though they are less mobile than interstitials, their concentration is high enough that conductivity increases. Problems with this theory and associated theories (69) have been noted when multiple peaks are observed in the dielectric loss spectra of silver halide (70,71). Nevertheless, the reasonable correlation between observed and expected trends makes the technique at least qualitatively useful.
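Equation (13) can be checked against the tabulated defect data. The sketch below is illustrative (the function name is not from the article); it takes the room-temperature AgBr values from Tables 2 and 3 and assumes ni = nv, as for pure material:

```python
E_CHARGE = 1.602e-19  # elementary charge, C

def ionic_conductivity(n_i, mu_i, n_v, mu_v):
    # Eq. (13): sigma = e*(mu_i*n_i + mu_v*n_v), with n in cm^-3 and
    # mu in cm^2 V^-1 s^-1, giving sigma in ohm^-1 cm^-1.
    return E_CHARGE * (mu_i * n_i + mu_v * n_v)

# AgBr: n = 9.80e13 cm^-3 (Table 2); mobilities from Table 3.
sigma = ionic_conductivity(9.80e13, 7.7e-4, 9.80e13, 1.7e-6)
print(f"{sigma:.1e}")  # ~1.2e-8 ohm^-1 cm^-1, close to the bulk AgBr
                       # value of 2e-8 listed in Table 4
```

Because the interstitial mobility exceeds the vacancy mobility by more than two orders of magnitude, the interstitial term dominates the sum, consistent with the discussion above.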
Figure 21. Schematic illustrating the movement of a vacancy V in a AgBr crystal. The lattice silver ion can move diagonally to a new lattice position (dashed arrow), thereby exchanging places with the vacancy. The ion must undergo quadrupolar deformation to squeeze through the halide dyad (crosshatched). Also shown is a presumably lower energy pathway that involves two sequential noncollinear jumps through the normal interstitial position (solid arrows).
Figure 22. Dielectric loss spectra for CdCl2 doped AgCl. Curve labels indicate the mole fraction of dopant. Curves are displaced vertically for clarity. (From Ref. 61. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
Table 4. Ionic Conductivity of Silver Halides at Room Temperature (Ref. 61)

                 Ionic conductivity (Ω⁻¹ cm⁻¹)
              Microcrystal        Bulk crystal
   AgCl       0.8 × 10⁻⁸          2 × 10⁻⁹
   AgBr       1 × 10⁻⁶            2 × 10⁻⁸
It is of interest to compare those results obtained by techniques for measuring ionic conductivity in emulsion grains with measurements on large crystals using more conventional techniques. This comparison, given in Table 4, shows that microcrystals have 10 to 100 times higher ionic conductivity than large crystals. These results are explained by the large surface-area-to-volume ratio in microcrystals and the realization that interstitial silver ions can be created independently of vacancies at the surface via positive-kink sites, as illustrated in Fig. 23. This mechanism lowers the free energy requirements relative to the bulk process (7,72) and leads to higher concentrations of interstitial silver ions. The vast majority
SILVER HALIDE DETECTOR TECHNOLOGY
Figure 23. An edge-on view of the formation of an interstitial silver ion from a surface positive-kink site.
of the interstitial ions in emulsion grains are created by this surface generation process, and the following equilibrium is appropriate:

Ag+ (kink) ⇌ Ag+ (interstitial)   (16)

The increase in the density of negative kinks at the surface is another important feature associated with the surface generation process of interstitials. In fact, there is an increase of one negative kink for every interstitial silver ion created by the surface generation process and a corresponding decrease of one positive kink. Thus, an excess negative charge is built up on the surface, which is balanced by a subsurface region of interstitial silver ions. This spatially inhomogeneous charge distribution is known as the ‘‘ionic space charge region.’’ The concentration gradients are illustrated in Fig. 24. Note that the concentration of interstitials and vacancies is quite different near the surface, but they gradually approach equal concentrations as one moves toward the center of the crystal where the normal bulk process dominates. The potential profile that arises from the ionic space charge region is also shown in Fig. 24. Theories that treat the ionic space charge region have been developed (73–77), and they predict how the concentration of interstitials and, based on knowledge of mobilities (Table 3), ionic conductivity will vary with grain size (78). These calculations agree reasonably with experimental data and show that ionic conductivity decreases with increasing microcrystal diameter. Using space charge theory, it is possible to calculate the number of interstitials per grain. A 1-µm diameter spherical AgBr grain would contain about 5,000 interstitial silver ions (78). Because of the higher energy requirements in AgCl (see Table 2), the number of interstitials in the same size AgCl grain would be two to three orders of magnitude lower.

Electronic Properties
The electronic properties of a material are controlled by the energy levels accessible to its electrons. The relevant energy levels in molecules are those of the highest filled and lowest vacant molecular orbitals. In crystalline solids, these energy levels broaden into bands called the ‘‘conduction band’’ and ‘‘valence band.’’ These bands arise because the Pauli exclusion principle dictates that all of the electrons in the material be distinguishable (i.e., have a set of four quantum numbers that is unique). These bands in AgBr are derived from the Ag+ 4d and 5s and the Br− 4p and 5s atomic orbitals (79,80), as shown schematically in Fig. 25. Of special interest is the energy gap between the two bands, called the ‘‘forbidden gap’’ or the ‘‘band gap.’’ Electrons are quantum mechanically forbidden to have energies encompassed by this gap. Insulators like AgBr and AgCl are characterized by a filled valence band and an empty conduction band. Light absorption occurs when a photon transfers its energy
Figure 24. Concentration profiles (a) of interstitials and vacancies and electric potential (b) as a function of distance x from the surface. The space charge region is indicated. ‘‘Bulk levels’’ are the concentrations expected in the absence of the ionic space charge effect.
Figure 25. Energy vs. interionic distance schematic illustrating the formation of the conduction and valence bands in AgBr. On the right are the orbital energies of the isolated ions. These orbitals mix together as the interionic distance decreases to form the indicated bands.
to an electron in the valence band and promotes it to the conduction band. Once in the conduction band, the electron is free to migrate throughout the crystal, whereas those electrons in the valence band are confined to a very localized region. The energy of the band gap defines the lowest photon energy that can cause light absorption. Based on photographic data, the effective band gap is about 3 eV (410 nm) for AgCl and about 2.6 eV (475 nm) for AgBr. Mixed halide crystals are often used in practical applications. In AgClBr crystals, the photon threshold is intermediate between that for AgCl and AgBr, whereas the threshold wavelength in AgIBr crystals extends to somewhat longer wavelengths than those in AgBr. Typical absorption spectra of large crystals shown in Fig. 26 (81) indicate that absorption extends to a slightly longer wavelength than indicated by the photographic data. But this absorption is too weak to be photographically detected. The absorption spectra shown in Fig. 26 indicate a rather significant problem in recording color images. The AgBr detector responds only with sufficient sensitivity in the UV and blue regions of the spectrum, and incorporation of iodide extends the absorption only into the short green region. This deficiency is resolved by adsorbing spectral sensitizing dyes on the silver halide surface, which extends
their absorption range through the red and even into the near IR. We will discuss this feature in a later section. Light absorption also leads to an empty electron position in the valence band, which is referred to as a ‘‘hole.’’ Neighboring electrons can move to occupy the empty electron position, but this creates a vacant position elsewhere; the hole appears to move. Although the electrons in the valence band are the actual charge carriers, it is mathematically valid and simpler to focus on the hole as the moving entity. The hole has a characteristic mobility and mass, as well as a positive charge to complement the negative charge on the electron. Electrons and holes are confined to the energy bands, but localized regions in the crystal lattice, known as ‘‘traps,’’ can exist at energies within the forbidden gap. These traps are sites that may localize the electron or hole, as illustrated in Fig. 27. An electron is localized within a trap by losing enough energy that its energy becomes coincident with that of the trapping level. The excess energy is dissipated to the surrounding lattice ions as thermal energy. Escape from the trap requires the trapped electron to acquire sufficient thermal energy that its energy is again coincident with the bottom of the conduction band. Note that the holes in this diagram are trapped ‘‘upward’’ because the energy direction refers to that of the electron. Bear in mind that when a hole moves either spatially or energetically in one direction, it can be viewed as an electron moving in the opposite direction. Traps in silver halide materials are impurities (such as dopants) and lattice defects (such as dislocations and kink sites). Lattice defects generally have an associated uncompensated charge and interact with charged carriers (electron or hole) via Coulomb's law. The situation is analogous to that of a hydrogen atom with a central charge and an orbiting particle of opposite charge.

An equation to calculate the binding energy of the carrier to the trapping site comes from the Bohr treatment of the H atom, modified for the effective mass m* of the carrier and the dielectric constant ε of the material (82,83):

E = qt² e² m*/(2ℏ² ε²)   (17)

where qt is the trap charge and ℏ is Planck's constant divided by 2π. The effective mass arises because the carriers within the crystal appear to move with a mass different from their real mass. This effect is induced by the carrier's interaction with the surrounding lattice ions as it moves through the crystal. An analogous equation is available for calculating the orbital radius r:

r = ε ℏ²/(qt e m*)   (18)

Figure 26. Absorption spectra of pure and mixed halides at room temperature: (1) AgCl; (2) AgBr; (3) AgCl containing 10 mole% AgBr; (4) AgBr containing 3 mole% AgI. (From Ref. 81.)

Figure 27. Illustration of electron and hole trapping levels in a crystal. The energy arrow applies only to electrons; an oppositely directed one exists for holes, but is not shown.
Typical binding energies and orbital radii are given in Table 5. The binding energies for the electron are small (kT or less), and the radii are correspondingly large (tens of lattice spacings). Because of its much higher effective mass, the hole has a much higher binding energy and smaller orbital radius than the electron. However, the trapped carrier in both cases is in a delocalized state characteristic of coulombic traps. Impurity traps are derived from the atomic energy levels of the specific impurity but are modified by their environment within the crystal lattice. Examples of such traps are I− and Cu+. These are isoelectronic with the lattice ions that they replace and therefore provide no long-range coulombic effects. However, their atomic energy levels are different from the surrounding lattice ions and provide the requisite trapping level. In contrast with the coulombic traps, these impurity traps provide high localization of the trapped carrier. In addition to these two types of trapping sites in the silver halide lattice, a third possibility is derived by combining these two; an example is Ir3+. The probability of trapping by a particular trap is determined by its effective trapping radius or cross-sectional area (84). The traps present a ‘‘sphere of influence’’ to a carrier diffusing through the crystal, and trapping occurs if the carrier passes through this sphere. The effective trapping radius is determined by the amount of energy that must be dissipated as a carrier falls from the free band to the trapping level. If the amount of energy to be dissipated is only on the order of a few kT or less, then the transition has unit probability of occurring, and the trapping radius is determined solely by the charge on the trap. This is typical for coulombic traps such as lattice defects.
Then, the capture cross section of these so-called ‘‘shallow traps’’ is determined by equating the binding energy to the average thermal energy available (kT), and the resulting trap cross section for a trap with charge 0.5 e is of the order of 10−13 cm2 . Although obviously a small number, this cross section is about 100 times larger than that calculated for an uncharged site, which would have a trapping radius of the order of a lattice spacing.
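The entries in Table 5 follow directly from Eqs. (17) and (18). A minimal numerical check (a sketch of ours, writing the two formulas with the hydrogen Rydberg energy and Bohr radius as prefactors; the function names are assumptions):

```python
RYDBERG_MEV = 13606.0   # hydrogen binding energy e^2/(2*a0), in meV
BOHR_ANGSTROM = 0.529   # hydrogen orbital radius hbar^2/(m*e^2), in angstroms

def trap_binding_energy_meV(m_ratio, eps, q_t=1.0):
    """Eq. (17): E = q_t^2 e^2 m* / (2 hbar^2 eps^2), scaled from hydrogen."""
    return RYDBERG_MEV * q_t ** 2 * m_ratio / eps ** 2

def trap_orbital_radius_A(m_ratio, eps, q_t=1.0):
    """Eq. (18): r = eps hbar^2 / (q_t e m*), scaled from the Bohr radius."""
    return BOHR_ANGSTROM * eps / (q_t * m_ratio)

# Electron column with m*/m = 0.29 and eps = 13.1 (Table 5 lists 23 meV, 24 A):
print(round(trap_binding_energy_meV(0.29, 13.1)))   # -> 23
print(round(trap_orbital_radius_A(0.29, 13.1)))     # -> 24
# Halving the trap charge quarters E and doubles r (the qt = 0.5 columns):
print(round(trap_binding_energy_meV(0.29, 13.1, 0.5)))  # -> 6
print(round(trap_orbital_radius_A(0.29, 13.1, 0.5)))    # -> 48
```

The same two-line calculation reproduces the other Table 5 columns from their m*/m and ε inputs, which is a useful consistency check on the hydrogenic picture.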
When the amount of energy to be dissipated is larger than a few kT (>0.05 eV), a multistep process is required in which the energy is transferred to the surrounding lattice in small increments (85). This process requires the presence of intermediate energy levels that act like rungs on a ladder. These so-called ‘‘deep traps’’ are usually also associated with a shallow trapping coulombic level. For example, a silver atom located at a positive-kink site would be characterized by a shallow trapping level due to the coulombic nature of the kink site and by a deep trapping level associated with the 5s atomic energy level of the silver atom. This is depicted schematically in Fig. 28. The effective trapping cross section σt is given by (86)

σt = σc η = σc [t/(t + i)]   (19)
where σc is the trapping cross section of the coulombic trap and η is a ‘‘sticking coefficient’’ that describes the probability that the carrier will make transition t to the deep level, as opposed to transition i back to the conduction band. Trapping of the electron (or hole) causes a lattice distortion (lattice relaxation) around the localized carrier because the charge density has been altered (87). Basically, the ions rearrange themselves to accommodate the higher (lower) charge, thereby lowering the energy and deepening the trap relative to the case in which there is no lattice relaxation. These processes are represented as configuration coordinate diagrams. An example of such a diagram is given in Fig. 29 for electron trapping at a deep trap such as a silver atom, which also has an associated coulombic level. The three-dimensional nature of the crystal is collapsed into a one-dimensional lattice distortion Q, where Q = 0 is the equilibrium configuration in the absence of the trapped carrier. Once the electron is trapped in the coulombic level, it can move to the relaxed state, provided that the energy barrier Ea is less than the coulombic trap depth Ec. Transition to the deep level occurs where the energy curves of the relaxed coulombic level and the deep level intersect (point C in Fig. 29). As drawn, this transition is favored, but, if the crossing point C lies above the barrier for thermal escape (point B), then detrapping is favored.
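Equation (19) is easy to exercise numerically. In the sketch below, the coulombic cross section of roughly 10⁻¹³ cm² quoted above is reused as an illustrative σc, and the relative rates t and i are invented values; the function name is ours:

```python
def effective_cross_section(sigma_c, t, i):
    """Eq. (19): sigma_t = sigma_c * t / (t + i), where t is the rate of
    transit to the deep level and i the rate of return to the conduction
    band; t / (t + i) is the 'sticking coefficient' eta."""
    return sigma_c * t / (t + i)

sigma_c = 1e-13  # cm^2, shallow coulombic trap (order of magnitude from text)
# If transit to the deep level is 4x slower than thermal release (an assumed
# ratio), only 20% of shallow captures stick at the deep level:
print(effective_cross_section(sigma_c, t=1.0, i=4.0))  # ~2e-14 cm^2
```

When thermal release is negligible (i → 0), η → 1 and the deep trap captures with the full coulombic cross section.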
Table 5. Trap Characteristics of Charged Centers in Silver Halides

                         AgBr                AgCl
                     e−        h+            e−
  m*/m              0.29      1.71          0.43
  ε                 13.1      13.1          12.3
  qt = 1:
    E (meV)          23        140           39
    r (Å)            24          4           15
  qt = 0.5:
    E (meV)           6         45           10
    r (Å)            48          8           30

Figure 28. Schematic showing the capture of an electron at a deep trap. Transition t represents the movement of the electron to the deep level, whereas transition i represents thermal release to the conduction band from the shallow trapping level.
Table 6. Trap Escape Times

  Trap depth (eV)    Escape time(a)    Comments
  0.01               0.4 ps            Intrinsic defect traps
  0.05               1.8 ps            Intrinsic defect traps
  0.10               13.6 ps           Intrinsic defect traps
  0.20               0.75 ns           Chemical sensitization traps
  0.30               41 ns             Chemical sensitization traps
  1.0                67 h              Latent image

  (a) Calculated with ν = 10¹² s⁻¹.
Escape of a carrier from a trap involves transferring thermal energy from the surrounding lattice ions to the carrier. The rate of transfer depends on the lattice vibrational frequency, which is about 10¹² s⁻¹ in AgBr. The probability that a certain amount of energy will be available is given by the usual Boltzmann factor, so that the overall rate of escape of a carrier is given by

rate_escape = ν exp(−E/kT)   (20)
where ν is the attempt-to-escape frequency, synonymous with the lattice vibrational frequency, and E is the trap depth, which is measured from the edge of the free band to the trap energy level. The inverse of the release rate gives the time for escape of the carrier from the trap, so that

τ = ν⁻¹ exp(E/kT)   (21)
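A quick numerical check of Eq. (21), assuming ν = 10¹² s⁻¹ (the lattice vibrational frequency quoted above) and kT ≈ 0.025 eV for room temperature; the function name is ours. It reproduces the qualitative spread of Table 6, picosecond lifetimes for shallow traps versus many hours for a 1 eV latent-image trap, although the exact tabulated entries depend on the precise ν and T used:

```python
import math

KT_ROOM_EV = 0.025  # thermal energy near room temperature (assumed round value)

def escape_time_s(depth_eV, nu_hz=1e12):
    """Eq. (21): tau = nu**-1 * exp(E / kT), in seconds."""
    return math.exp(depth_eV / KT_ROOM_EV) / nu_hz

for depth in (0.01, 0.10, 0.30, 1.0):
    print(depth, escape_time_s(depth))
# A 0.01 eV trap empties in about a picosecond, while a 1.0 eV trap holds
# its carrier for roughly e**40 / 1e12 s, i.e., about 2e5 s (tens of hours),
# which is why the latent image is stable.
```

The exponential in Eq. (21) is the whole story: each additional 0.06 eV of trap depth lengthens the escape time by about an order of magnitude at room temperature.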
Table 6 summarizes some typical escape times for a range of trap depths and indicates the kinds of entities that are associated with these trap depths. A major loss process in the photochemistry of silver halides is recombination between electrons and holes. Quantum mechanical selection rules prevent recombination between free electrons and free holes (7). Nevertheless, recombination does occur, provided that one of the carriers is trapped, as illustrated in Fig. 30.
Q Figure 29. Configuration coordinate diagram for trapping at a deep trap. Following capture of the electron at the shallow trap (coulombic level), a thermally assisted process allows formation of the relaxed state. From this state, the transition to the deep level occurs at crossing point C, also requiring thermal assistance. Ec is the coulombic trap depth of the shallow trap, Ea is the energy barrier at point B to relaxation, and Eb is the thermal trap depth of the relaxed state. Q is a one-dimensional simplification of the three-dimensional rearrangement of lattice ions as they accommodate the extra charge of the electron.
Figure 30. Electron-hole recombination pathways in AgBr. Dotted lines depict trapping levels for carriers. Dashed lines indicate the recombination step. In the right-hand diagram, the hole is undergoing trapping at a filled level of the center that is providing the electron trapping level, after which the recombination occurs.
Under most conditions, the pathway on the left is favored because holes have a higher effective mass than electrons and therefore are more deeply trapped and because the electron’s mobility is much higher than that of the hole. The recombination often leads to luminescence at low temperatures (88,89), but this process is quenched at room temperature where nonradiative processes dominate. The photochemistry of silver halides involves the formation of atomic species — silver and halogen. The silver atom is formed when an interstitial silver ion migrates to the site of a trapped electron. The energy level scheme associated with this transformation is shown in Fig. 31. The electron is initially trapped in a shallow coulombic trap that is characteristic of a positive kink. Following this initial trapping, the lattice surrounding the trapped electron relaxes to accommodate this new added charge, thus deepening the trap. Then, the interstitial silver ion is attracted to this site which now bears a partial negative charge. Finally, the electron is localized at the deep level corresponding to the 5s level of the silver atom. There is a corresponding process for the hole, except that now a lattice silver ion must leave the site
Figure 31. Sequence of events leading to the formation of a silver atom. Circled numbers indicate the sequence of events. The arrow depicts the ionic step in which a silver ion interstitial is captured at the relaxed trap.
Figure 32. Schematic representation of the three-way equilibria between the free, trapped, and atom states for both the hole and electron.
adjacent to a trapped hole at a negative-kink site and move into an interstitial position, leaving behind a halogen atom (90). Both the silver and halogen atoms are unstable species. Most of the evidence for the instability of the silver atom will be discussed in later chapters. For now, we need only note that for the silver cluster formation process to be efficient, which it often is, there must be a pathway for collecting all of the photoproduced silver atoms in one or a few sites on the grain surface. Although diffusion of silver atoms on the grain surface is one way of accomplishing this, the general consensus is that silver clusters form by alternate addition of electrons and interstitial silver ions, as discussed in the next section. Thus, the silver atoms must decay and re-form at a preferred growth site. There is no direct way to measure the lifetime of the silver atom, but indirect methods provide an upper limit of about 1 s (91–96). In the case of halogen atoms, experiments in which large silver halide crystals are placed in a chamber of halogen gas have shown increased conductivity of the crystals; this could happen only if the adsorbed halogen gas injected a hole into the valence band, resulting in the formation of a halide ion (97). In this section, we have described three effective states for the electron: free, trapped, and atom. Similar states exist for the hole. These states are all transient; it is useful to use the schematic in Fig. 32 to describe the transitions between the different states. The fraction of time spent in each state is symbolized by Greek letters, so that φ + β + α = 1 and similarly for the three hole states. Because of the high interstitial silver ion concentration in silver bromide, the electron spends most of its time in the atom state. The low mobility of the hole, coupled with its higher effective mass, results in a deeply trapped state, which means that the hole spends most of its time in the trapped state. We will return to this partitioning of the electron and hole into transitory states in the next section.

LATENT-IMAGE FORMATION

Nucleation-and-Growth Model

A number of models have been developed to explain latent-image formation in silver halides (7,98–114). The most successful of them is based on ideas first proposed by Gurney and Mott in 1940 (99,100), shortly after the new ideas of quantum mechanics had been applied to crystalline solids. The Gurney–Mott model is based on the following assumptions:

• Silver atoms form when photoelectrons combine with preexisting mobile interstitial silver ions.
• Silver atoms are unstable, and their diffusion to give silver clusters is not possible.
• Aggregation of silver atoms proceeds by the addition of photoelectrons and mobile interstitial silver ions to the initial trapping site.
• Trapping of a number of electrons occurs at a given site in the microcrystal, and then a like number of interstitial silver ions is added.

It was recognized rather quickly that the Gurney–Mott picture had a serious flaw. After trapping the first electron, a trapping site would acquire a negative charge that would repel subsequent electrons. To avoid this problem, it was suggested that each trapping of an electron should be followed by the addition of a silver ion (115). Thus, the charge should alternate between 0 and −e. Later, it was pointed out that the sites where the silver clusters form are most likely to be physical defects, such as kink sites, that have a partial electrostatic charge (116). The charge on the site should alternate between partial negative and partial positive as electrons and silver ions are added alternately. These ideas have become the basis of a model first proposed by Hamilton and Bayer (117) and subsequently referred to as the nucleation-and-growth model (7,106) because it includes a two-stage concept of nucleation and growth (118–123). It also considers that the trapping of electrons at trapping sites and atom
(a) e− + (+kink)+1/2 ⇌ (trapped e−)−1/2

(b) (trapped e−)−1/2 + Agi+ ⇌ (Ag0)+1/2

Figure 33. The initial electronic (a) and ionic (b) events in the formation of the latent image. The figures give a band-diagram view, whereas the inset shows the chemical equilibrium between the two states.
formation are reversible, as shown in Fig. 33. These two reversible events form the basis for the three-way cycle diagrams seen earlier in Fig. 32. Following the formation of a silver atom, it is possible for a second electron to be trapped at this site followed by the addition of a second silver ion. These two events constitute the nucleation stage and result in a two-atom center. Unlike the silver atom, the two-atom center is considered a permanently stable species during latent-image formation. Because nucleation cannot occur without a silver atom, nucleation requires at least two electrons in the grain. Now, we can add nucleation to the other events depicted in Fig. 31 to get Fig. 34. Growth is the continuation of this scheme of alternate addition of electrons and interstitial silver ions at the trapping site. The growth stage continues as long as both reactants are available.
Interstitial silver ions are continually replenished via the Frenkel equilibrium (Fig. 17), so the terminating step of the growth sequence is exhaustion of the electron supply. A basic premise of the model is that the various states alternate in charge and provide an electrostatic driving force for each step (7), as shown in Fig. 35. As the state of the grain moves from left to right along this diagram, larger and larger silver clusters are produced. As we will see shortly, a certain size cluster is required to promote the development of the grain, so that growth need only achieve this threshold size for the grain to be developable and thereby contribute to the macroscopic image. From what has been said so far, one might conclude that, for example, if a five-atom silver cluster is required to initiate the development reaction, then the grain need absorb only five photons to produce such an occurrence. However, we have not yet included the major inefficiency in the model that will dramatically increase the number of absorbed photons required for the grain to achieve developability. This inefficiency involves recombination between electrons and holes, and, as discussed earlier, there are two possible pathways. For free electrons recombining with trapped holes, we recognize that a negative-kink site would be a possible hole trap that would then act as a recombination center whose formal charge is +0.5 e. Likewise, for free holes recombining with trapped electrons, we recognize that an electron trapped at a positive-kink site would be a recombination center whose charge is −0.5 e. Thus, both pathways have an electrostatic driving force that allows them to compete readily with nucleation and growth. As discussed earlier, the dominant pathway is recombination between free electrons and trapped holes. Figure 36 summarizes all of the events in latent-image formation that we have discussed. 
On the left are the two three-way cycles that characterize the transient states of the electron and hole. Also included within this region are the two recombination pathways. To the right are the irreversible events of nucleation and growth. Note that nucleation involves a free electron combining with an electron in an atom state to produce an electron-atan-atom state. Then, moving diagonally from this state represents the addition of a silver ion to form a twoatom center. Similar processes occur to produce larger
e− e−
Trap Trap Trap Ag empty filled relaxed
1277
Ag + e− 6
e−
Unstable
Nucleation
−e/2
−e/2
Growth
Ag2
Agi+
8
7
4
VB Figure 34. A continuation of Fig. 31 showing the formation of Ag2 .
e− Kink +e/2
Ag+ e− Ag + e/2
Ag+ e− Ag2 +e/2
−e/2
Ag+ e− Ag3 +e/2
−e/2
Ag+ e− Ag4 +e/2
Figure 35. The sequence of events that lead to the formation of a developable silver cluster. Note the alternation of charge on the site which is given at the top and bottom of the diagram. (From Ref. 106. Reprinted with permission from World Scientific Publishing Co. Pte. Ltd.)
1278
SILVER HALIDE DETECTOR TECHNOLOGY
and
Free hole y
Trapped hole g
Recombination
hn Free electron f
Nucleation
Growth
= (e)(e − 1)n where
Trapped electron b
Electron at atom
n = φαve σAg V −1
Electron at two atoms etc.
Bromine atom D
(26)
Two silver atoms
Silver atom a
Three silver atoms
Figure 36. A schematic representation of the essential events in the nucleation-and-growth model. Light absorption is symbolized by hν. Additional events may be needed to treat specific cases. (From Ref. 124. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
silver clusters. The scheme depicted in Fig. 36 is the minimum number of events needed to explain latentimage formation. The sensitometric behavior of silver halides can be explained largely by understanding how the forward moving processes of nucleation and growth compete with the backward moving recombination process, as well as how nucleation and growth compete with each other. There will be many examples in the remaining sections to illustrate these concepts. In some situations, additional events may be needed to explain latent-image formation. The ideas expressed before concerning latent-image formation can be put into equations that can be used as a basis for a simulation program to predict the photographic consequences of various factors (117,125–127). The three-way cycles can be treated with a steadystate approximation. Equations for the irreversible events can be derived by assuming that the ionic event — the addition of an interstitial silver ion — is a fast process and electron or hole capture is rate-limiting. Generically, the rates λct of the irreversible processes can be expressed as λct = nc nt vc σt V −1 (22)
rate = [number of carriers][number of traps](rate constant), (22)

= nc nt ct, (23)

where nc is the number of carriers, nt is the number of traps, vc is the thermal velocity of the carrier, σt is the capture cross section of the trap, ct is the rate constant, and V is the grain volume, which is used to put the rates in per-grain terms. If we let e be the number of electrons partitioned among the free, trapped, and atom states, and let h be the number of holes similarly partitioned, then, using the template expressed by Eqs. (22,23), the rate of nucleation can be expressed as

λn = [free electrons][silver atoms](rate constant), (24)

= (φe)α(e − 1)ve σAg V⁻¹, (25)

= e(e − 1)n, (26)

where

n = φα ve σAg V⁻¹. (27)

Note how the concentration of silver atoms is expressed. The (e − 1) factor is used so that the nucleation rate cannot be greater than zero unless there are two or more electrons in the three-way electron cycle. Physically, this factor indicates that an electron can participate with all other electrons in silver cluster formation, but not with itself. Similarly, a growth rate at an n-atom silver cluster can be expressed as

λg(n) = [free electrons][n-atom centers](rate constant), (28)

= (φe)(Agn)ve σAgn V⁻¹, (29)

= e Agn g(n), (30)

where

g(n) = φve σAgn V⁻¹, (31)

and Agn is the number of n-atom centers. Equation (30) gives the rate of growth at just one size silver cluster. The total growth rate is the sum of rate expressions for all of the silver clusters present. Note that the rate constant for growth differs from that for nucleation primarily in the capture cross section used — that for a silver atom versus that for a silver cluster. We will shortly see that this is a very fundamental difference. We can develop similar equations for recombination. However, now there are two rates — one for free-electron/trapped-hole recombination and the other for free-hole/trapped-electron recombination. For the first pathway,

λr1 = [free electrons][trapped holes](rate constant), (32)

= (φe)(γh)ve σγ V⁻¹, (33)

= eh r1, (34)

where σγ is the cross section for electron capture at a trapped-hole site. For the second pathway,

λr2 = [free holes][trapped electrons](rate constant), (35)

= (ψh)(βe)vh σβ V⁻¹, (36)

= eh r2, (37)

where σβ is the cross section for hole capture at a trapped-electron site,

r1 = φγ ve σγ V⁻¹, (38)

and

r2 = ψβ vh σβ V⁻¹. (39)
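The per-grain rate template above can be sketched in code. This is an illustrative sketch, not from the source: all numerical parameter values are hypothetical placeholders, chosen only to show the form of Eqs. (24)–(39).

```python
# Sketch of the per-grain rate expressions for nucleation, growth, and the
# two recombination pathways, each built from the template
#   rate = [species 1][species 2](rate constant).
# All parameter values used below are hypothetical.

def nucleation_rate(e, phi, alpha, v_e, sigma_ag, V):
    """lambda_n = (phi*e) * alpha*(e - 1) * v_e * sigma_Ag / V  [Eqs. (24)-(25)].
    The (e - 1) factor makes the rate zero unless at least two electrons are
    present: an electron cannot pair with itself."""
    return (phi * e) * alpha * (e - 1) * v_e * sigma_ag / V

def growth_rate(e, Ag_n, phi, v_e, sigma_agn, V):
    """lambda_g(n) = (phi*e) * Ag_n * v_e * sigma_Agn / V  [Eqs. (28)-(31)]."""
    return (phi * e) * Ag_n * v_e * sigma_agn / V

def recombination_rate(e, h, phi, gamma, psi, beta, v_e, v_h, sg, sb, V):
    """lambda_r1 + lambda_r2  [Eqs. (32)-(39)]."""
    r1 = phi * gamma * v_e * sg / V   # free electron + trapped hole
    r2 = psi * beta * v_h * sb / V    # free hole + trapped electron
    return e * h * (r1 + r2)

# A single electron cannot nucleate a cluster:
assert nucleation_rate(1, 0.5, 0.2, 1e7, 1e-15, 1e-14) == 0.0
```

Note how the (e − 1) pair-counting factor falls out automatically: with e = 1 the nucleation rate is exactly zero, while growth and recombination remain possible.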
SILVER HALIDE DETECTOR TECHNOLOGY
1279
The rate constant for the total recombination is

r = r1 + r2 = [φγ ve σγ + ψβ vh σβ]V⁻¹. (40)

Equation (40) can be simplified by the following definition:

f = vh σβ/(ve σγ), (41)

so that now

r = φ[γ + (fψβ/φ)]ve σγ V⁻¹. (42)

Setting

ω = γ + fψβ/φ (43)

then leads to

r = φω ve σγ V⁻¹. (44)
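The factored form involving ω is algebraically identical to the sum r1 + r2 of Eqs. (38) and (39), which can be confirmed numerically. The parameter values below are arbitrary, hypothetical choices used only to exercise the identity.

```python
# Consistency check of Eqs. (38)-(44): the factored recombination rate
# constant r = phi*omega*v_e*sigma_gamma/V must equal r1 + r2.
# All values are hypothetical placeholders.
phi, gamma, psi, beta = 0.3, 0.95, 0.05, 0.4   # partition fractions
v_e, v_h = 1e7, 1e6                            # thermal velocities
sigma_gamma, sigma_beta = 2e-15, 1e-15         # capture cross sections
V = 1e-14                                      # grain volume

r1 = phi * gamma * v_e * sigma_gamma / V        # Eq. (38)
r2 = psi * beta * v_h * sigma_beta / V          # Eq. (39)

f = (v_h * sigma_beta) / (v_e * sigma_gamma)    # Eq. (41)
omega = gamma + f * psi * beta / phi            # Eq. (43)
r = phi * omega * v_e * sigma_gamma / V         # Eq. (44)

assert abs(r - (r1 + r2)) < 1e-9 * r            # factored form equals the sum
```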
For N ≥ 3, δ becomes progressively greater as both N and ηn increase. It may be thought that recombination might be the cause of HIRF because high-irradiance exposures lead to large concentrations of electrons and holes. However, Eqs. (45) and (47) show that both nucleation and recombination have a second-order dependence on carrier density. Thus, it is generally believed that competition between nucleation and growth is the cause of HIRF (105). However, Eq. (46) shows that growth has a first-order dependence on carrier density, indicating that recombination will dominate over growth at high irradiance. So, the extent of recombination will influence the magnitude of HIRF. This can be seen in Fig. 47 where an increase in the recombination index ω produces higher δ values.
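The order-of-dependence argument can be made concrete. Lumping all rate-constant factors into hypothetical constants (not from the source), nucleation scales as e(e − 1) while growth scales as e, so the nucleation-to-growth ratio rises linearly with the instantaneous free-electron population — which is why high irradiance produces a dispersed population of small clusters.

```python
# Toy comparison (hypothetical lumped constants): nucleation is second order
# in the electron population e, growth first order, so their ratio grows
# with e.  Growth is assumed intrinsically faster (K_G >> K_N), as for an
# unsensitized grain.
K_N, K_G = 1.0, 50.0

def nucleation_to_growth_ratio(e):
    """lambda_n / lambda_g = K_N*e*(e - 1) / (K_G*e), for e >= 1."""
    return K_N * e * (e - 1) / (K_G * e)

low = nucleation_to_growth_ratio(2)     # low irradiance: few electrons at a time
high = nucleation_to_growth_ratio(100)  # high irradiance: many electrons coexist
assert high > low  # nucleation gains on growth as irradiance rises
```

Under these toy numbers the ratio grows from 0.02 at e = 2 to about 2 at e = 100: at high irradiance new-cluster formation competes effectively with growth, dispersing the photoelectrons among many subdevelopable centers.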
Figure 47. The dependence of the degree of HIRF δ on nucleation efficiency ηn and minimum developable size N (curves for N = 2 through 6; δ versus ηn). Solid line, ω = 1; dashed line, ω = 4. (From Ref. 124. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)

Figure 48. An illustration of the results of double-exposure experiments using a reciprocity-failure plot (speed versus log time, with curves for the normal exposure alone, a normal exposure followed by a uniform exposure, and a uniform exposure followed by a normal exposure). ‘‘Normal exposure’’ means that a step tablet was used. When the uniform exposure follows the normal exposure, it has low irradiance, but when it precedes the normal exposure, it has high irradiance.
Double-Exposure Experiments

Now that we know something about the mechanistic causes of LIRF and HIRF, we can discuss double-exposure experiments (118–123). These experiments, summarized in Fig. 48, involve reciprocity studies but use either a uniform pre- or postexposure in conjunction with the usual step-tablet exposure. In the absence of limitations due to exposure-induced fog, these experiments can eliminate LIRF by a short, high-irradiance preexposure and eliminate HIRF by a long, low-irradiance postexposure. In the less than ideal case when the pre- or postexposure creates fog centers, there is still a substantial reduction in the degree of LIRF and HIRF. These results indicate that there are two distinct stages in latent-image formation, each subject to a different inefficiency, and they are among the foundational experiments that led to applying the nucleation-and-growth concept to latent-image formation. The results of the double-exposure experiments can easily be explained by the nucleation-and-growth model. At low irradiance, nucleation is inefficient relative to growth. Thus, a low-irradiance uniform exposure following a high-irradiance step-tablet exposure will primarily provide electrons that can be used to enlarge a latent subimage to a developable size, with minimal formation of new clusters (fog centers). This procedure, known as ‘‘light latensification,’’ will eliminate HIRF. At high irradiance, nucleation is efficient, whereas growth is not. Therefore,
a uniform preexposure at high irradiance will produce many small subdevelopable centers. When followed by a low-irradiance step-tablet exposure, these centers will be made developable by a few growth steps. The normally inefficient nucleation at low irradiance is avoided, and LIRF is removed.

CHEMICAL SENSITIZATION

Chemical sensitization (5,7) involves treating the grain surface with low levels (100 ppm or less) of chemical reagents to increase the efficiency of latent-image formation. In some cases, chemical sensitization may have an alternative goal such as modifying the shape of the D–log E curve, reciprocity failure, or the response to changes in development time. Because chemical sensitization occurs at the grain surface, this process must necessarily occur after the emulsion has been washed but before the coating operation. In contrast, dopants, which are the subject of the next section, are placed in the emulsion reactor so that they can be incorporated into the grain during crystal growth. There are two principal inefficiencies to be minimized in improving the efficiency of latent-image formation. The dominant one that is always present is recombination. In addition, some grains may have a tendency to form an internal latent image in the unsensitized state, but the surface modifications by chemical sensitization are intended to favor surface over internal latent-image formation. It will be useful first to review the state of the emulsion before any chemical sensitization. Such unsensitized grains are characterized by very poor quantum sensitivity of 100 or more photons/grain (159,175). This is explained by postulating inefficient nucleation, as done in an earlier section. Using the quantum sensitivity equation [Eq. (66)], taking the recombination index ω = 1, and assuming that
the minimum developable size is five atoms,

QS = 5 + 2/ηn + 12. (70)

Now assume that ηn is 0.01, which leads to

QS = 5 + 200 + 12 = 217 photons/grain. (71)

Thus, the low efficiency agrees with experimental observations. Furthermore, this simple analysis shows that most of the inefficiency occurs in nucleation. This calculation assumes that recombination is the only inefficiency. Internal latent-image formation would simply increase the number of photons required. Unsensitized emulsions display no HIRF but have significant LIRF. They have primarily one latent-image center/grain at all exposure irradiances and show little speed change when the development time is varied over a wide range. All of these effects can be explained by inefficient nucleation. Such inefficiency would prevent nucleation from competing with growth, which avoids the dispersity inefficiency and the associated HIRF. Because there is no dispersity inefficiency, there is only one silver cluster/grain, even at high irradiance. It follows that, if there is only one cluster/grain, increasing the development time will not increase speed because there are no grains that contain only subdevelopable centers that might be revealed by lengthening the development time and effectively decreasing N.

Sulfur Sensitization

Historically, sulfur sensitization is probably the first type of sensitization used for silver halides. Early gelatin materials had many impurities, including those containing sulfur. Simply heating the emulsion would produce a speed increase and a fog increase if overdone (176,177). Modern gelatin is quite pure, and sulfur sensitization is accomplished by intentionally adding reagents that contain labile sulfur atoms. Examples are thiourea derivatives and thiosulfate; we will use the latter as an example in our discussions because it is a commonly used reagent. The schematic in Fig. 49 illustrates the practical aspects of chemical sensitization. The emulsion is in a gelled state at room temperature, so typically it is heated to 35–40 °C to produce a homogeneous liquid state.

Figure 49. Schematic illustrating the practical steps involved in the chemical sensitization of an emulsion. (The flow chart plots temperature versus time: bring the emulsion to 40 °C; add sensitizer at the chosen pH and pAg; raise the temperature at 5 °C/3 min to 70 °C; hold for the desired time at the desired temperature; reduce the temperature; add coating doctors; coat; chill set to store.)

Then, the sulfur-sensitizing reagent is added after adjusting the pH and pAg to the desired levels. Most sulfur-sensitizing reagents react slowly at this temperature, so the emulsion is heated to accelerate the reaction. Typical reaction temperatures are 50–70 °C. After the desired reaction time at the elevated temperature, the emulsion is cooled to 40 °C, and stabilizers are added. The emulsion is now ready for coating. The overall reaction between thiosulfate and AgBr is shown in Fig. 50:

Na2S2O3 + 2AgBr + H2O → Ag2S + HSO4− + 2Na+ + 2Br− + H+

Figure 50. The chemistry involved in silver sulfide formation on a AgBr surface using sodium thiosulfate.

Only the outer sulfur atom is involved in the reaction. It is thought that the reaction occurs in three stages (178–183). The first stage involves adsorption of the thiosulfate anion. The second stage involves the actual reaction to form sulfide anions
adsorbed on the grain surface. Formally, we denote the species Ag2 S, although the exact structure of the formed species is unknown. Experiments that limit the reaction to these first two stages result in little, if any, speed increase (178–180,184). A third stage occurs upon normal sulfur sensitization, which involves formation of the active species. The exact nature of the active species is unknown, but it is often assigned to an aggregate of silver sulfide. That is, the site contains two or more S2− which somehow improve the efficiency of nucleation and increase the sensitivity of the emulsion. Some experiments, as well as modeling, suggest that the active site contains two S2− (184–186). It has been suggested that the number of these sites needed to reach maximum speed is about 1,000/µm2 (184), based on radiotracer determination of the deposited S2− and the resulting speed. The energy levels of the sulfur-containing species on the silver halide surface obviously play a critical role in their sensitometric effects. Because bulk silver sulfide is black, we know that its band gap is much smaller than that of silver halides. We might surmise that there is a gradual decrease in the energy gap as one goes from very small centers containing just a few sulfide ions to very large centers whose structure approaches that of bulk silver sulfide. Quantum mechanical calculations on free single- and double-sulfide molecules show this effect (187). Consistent with these ideas is the knowledge that sulfur sensitization induces photographic sensitivity at longer wavelengths where silver halide does not absorb (188–191). Assuming a general pattern of decreasing energy gap as the number of sulfides increases, the electronic properties of the sulfide centers will depend on the way their vacant and filled levels align with the silver halide conduction and valence bands.
Unfortunately, direct physical measurement of these levels is not possible. Indirect measurements combining photoconductivity and dielectric loss measurements have indicated that the lowest vacant level of the sulfide centers is about 0.3 eV below the conduction band (192). Studies using luminescence modulation spectroscopy have also suggested a similar value (185,186). Photographic measurements of the temperature dependence of the previously mentioned long wavelength sensitivity also indicate trapping levels of several tenths of an eV (189–191). Some researchers have attributed a hole trapping property to the sulfide centers (101), and the measured and calculated energy levels are consistent with this (185–187,189–191). However, it is likely that sulfide centers are formed at positive-kink or kink-like sites, and would, as such, provide a very small cross section for hole trapping. The relationship between speed and sensitizer level or reaction time is shown schematically in Fig. 51. There is a clear optimum sensitizer level or reaction time, beyond which speed decreases. Associated with the speed decrease is an increase in fog, which in unfavorable situations begins to increase before the maximum speed is reached. These two effects are correlated. The speed loss at high sensitizer concentration or long reaction times is called ‘‘oversensitization,’’ but we will postpone a discussion of the mechanism of this effect until after a general discussion of the sulfur-sensitization mechanism. The source of fog upon sulfur sensitization is often attributed to overly large aggregates of Ag2 S (193,194). The proposed energy-level scheme given in Fig. 52 shows that small aggregates provide electron-trapping levels below the CB that will improve nucleation, as will be discussed shortly. However, it is hypothesized that large aggregates have even lower lying electron-accepting levels that can accept electrons from the developer, much like that of the latent image. 
As a result, a grain that contains such a cluster will develop without exposure and lead to an increase in fog.
Figure 51. Schematic showing how speed and fog depend on either thiosulfate concentration at a fixed reaction time or reaction time at a fixed thiosulfate concentration. (Speed and fog are plotted against [S] or reaction time; speed peaks at Sopt and falls at Sover while fog continues to rise.)

Figure 52. Energy-band diagram that shows the suggested difference between sensitization centers [(Ag2S)small] and fog centers [(Ag2S)large] from sulfur sensitization of AgBr. Shaded rectangles depict the range of electron-accepting levels between the CB and VB. The energy level in the developer solution phase is that of the highest filled molecular orbital of the developing agent.

Figure 53. Changes in reciprocity failure due to sulfur sensitization (speed versus log time). U denotes the unsensitized emulsion, Sopt the optimum level of sensitizing reagent, and Sover an excessive amount of sensitizing reagent.
Sulfur sensitization has profound effects on the reciprocity characteristics of silver halides (131,195–198). As demonstrated schematically in Fig. 53, sulfur sensitization, relative to the unsensitized emulsion, shifts the onset of LIRF to longer exposure times but also introduces HIRF. The extent of these changes increases with increasing sulfur level. The speed of sulfur-sensitized emulsions exposed for short times is very sensitive to the development conditions (time, temperature, activity of the developer) (195), as discussed earlier in connection with dispersity inefficiency. Centers arising from sulfur sensitization direct the location of the latent-image centers (197). It is possible to
interrupt the emulsion precipitation, carry out chemical sensitization, and then to continue precipitation so as to ‘‘bury’’ the sulfide centers. Emulsion grains in which the sulfide centers are located internally form an internal latent image (detected with a special developer for developing an internal latent image). Emulsion grains in which the sulfide centers are located on the surface form a surface latent image. This latent-image directing property is most consistent with the idea that sulfide centers act as electron traps. An understanding of the many sensitometric effects of sulfur sensitization can be found in the nucleation-andgrowth model (127). All of the effects can be explained by enhanced electron-trapping ability of the grain. From the QS analysis of the unsensitized emulsion, we know that nucleation is very inefficient, so it is reasonable to suggest that sulfur sensitization somehow enhances this stage of latent-image formation. This enhancement is due to the creation of a deeper trapping level associated with the silver sulfide center, as shown on the righthand side of Fig. 54 (106). Here, we see that the sulfide center increases the trap depth and thereby lowers the crossing point on the configuration coordinate diagram, so that the transition to the deep level associated with the silver atom becomes more likely. As a result, nucleation becomes more efficient. Experimental quantum sensitivity measurements yield a value of about 30 photons/grain for optimally sulfur-sensitized emulsions (131). Using Eq. (66) and a nucleation efficiency of 0.2 yields a QS of 27 photons/grain. The effect of sulfur sensitization on reciprocity failure follows similar reasoning. As discussed earlier, the probability of the dispersity inefficiency, which is the cause of HIRF, increases with increasing nucleation efficiency. 
As nucleation becomes more efficient, this stage competes more favorably with growth at high irradiance and leads to a dispersed silver cluster distribution and HIRF. The effect on LIRF is understood by realizing that the onset of LIRF is proportional to the fraction of time that the electron spends free in the CB (127). Sulfur sensitization causes the electron to spend more time in the deeper traps
associated with the sulfide centers and less time free in the CB, requiring a longer exposure time to observe LIRF. When the sulfur-sensitizer concentration is increased beyond the optimum level, oversensitization results, as illustrated in Fig. 51. Recalling Eq. (43), we see that the recombination index ω depends on γ, ψ, β, and φ. We have already argued that the hole spends most of its time in the trapped state, so γ ≈ 1, which means that ψ ≪ 1. In the unsensitized state, the electrons spend most of their time in the atom state, but the remaining time is partitioned more toward the free state than the trapped state because the intrinsic defect traps are very shallow. Thus, β < φ and the second term of Eq. (43) is negligible, leading to ω ≈ 1, as we assumed before. As more and more traps are created by the increased sulfur-sensitizer concentration, the electron spends less and less time free in the conduction band, causing φ to decrease and β to increase, so that the second term in Eq. (43) eventually becomes larger than one. At this point, ω begins to increase (131,198). This latter parameter is present in both the second and third term of the quantum sensitivity equation [Eq. (66)], so we see that the efficiency decreases. This happens because now the second pathway for recombination involving free holes and trapped electrons, whose extent is determined by the second term of Eq. (43), has become important. Thus, the optimum level of sensitizer or optimum sensitization time that leads to maximum speed corresponds to the minimization of the rate of both recombination pathways. These considerations show why an optimum sensitizer level exists. Increasing sensitizer concentration leads to more traps or deeper traps or some combination of the two. As a result, nucleation efficiency increases and quantum sensitivity improves. However, at some sensitizer concentration, the parameter ω may start to increase, leading to poorer efficiency.
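The two quantum-sensitivity figures used in this section can be checked numerically. The sketch below assumes, following the worked examples of Eqs. (70) and (71), that for ω = 1 the quantum sensitivity equation [Eq. (66)] reduces to QS = N + 2/ηn + 12, with the constant 12 being the recombination contribution at ω = 1; that reduced form is an inference from the two worked examples, not a formula quoted in full by the source.

```python
def quantum_sensitivity(eta_n, N=5):
    """QS in photons/grain for omega = 1, in the reduced form implied by the
    worked examples of Eqs. (70)-(71): QS = N + 2/eta_n + 12."""
    return N + 2.0 / eta_n + 12

unsensitized = quantum_sensitivity(0.01)  # Eq. (71): 5 + 200 + 12
sulfur_opt = quantum_sensitivity(0.2)     # optimally sulfur-sensitized case

assert abs(unsensitized - 217) < 1e-9  # 217 photons/grain, as quoted
assert abs(sulfur_opt - 27) < 1e-9     # 27 photons/grain, near the measured ~30
```

The arithmetic makes the mechanistic point directly: raising the nucleation efficiency ηn from 0.01 to 0.2 collapses the dominant 2/ηn term from 200 to 10 photons/grain, which is why sulfur sensitization is attributed almost entirely to improved nucleation.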
The possible speed versus sensitizer level behavior is shown in Fig. 55. C corresponds to the case in which ω remains at unity for any sensitizer level, whereas B corresponds to the optimum sensitization condition in which ηn and ω are both one at the sensitizer level that leads to the maximum sensitivity point. When ω increases before
Figure 54. (Configuration-coordinate diagrams comparing the unsensitized and sulfur-sensitized grain; the levels shown include the conduction band, a shallow state, and the silver atom level.)

Figure 55. (Speed versus sensitizer level, showing curves A, B, and C as discussed in the text.)
To avoid charge buildup, the cation must diffuse into the film at the same rate as the anion. It turns out that NH4+ can accelerate the fixing process when added to a sodium thiosulfate solution, although there is an optimum ratio, beyond which the acceleration effect can be lost. The final stage of image processing is the wash step. Washing removes salts and other species accumulated during other stages in image processing. A major component to be removed is thiosulfate. If it remains in the dried film or print, it will attack the silver image over time. The principle of washing is to provide a driving force for thiosulfate to diffuse out of the film or paper by providing a solution of very low thiosulfate concentration. This principle can be met by continually providing changes of water by using a flowing water setup. The degree of washing can be predicted by using the Bunsen–Ostwald dilution law (279),

Xn = [v/(V + v)]ⁿ Xo, (96)
where Xo is the initial concentration of thiosulfate, Xn is the thiosulfate concentration after n washings, v is the volume of washing fluid in and on the materials between stages, and V is the volume used in each wash bath. Commercial acceptability requires Xo/Xn = 100, but archival acceptability requires Xo/Xn = 1,000.

DETECTOR DESIGN CONSIDERATIONS

What has been discussed so far focuses entirely on how the silver halide grain is optimized for the desired (usually maximum) sensitivity. But the performance of any detector is based on its achievable signal-to-noise ratio (SNR). Noise in the silver halide detector is observed as graininess in an image (4,280,281). The image of an object whose radiance is uniform may or may not display the expected uniformity, depending on the characteristics of the detector. When the image does not reproduce the object’s uniform radiance faithfully, it has a granular appearance. There are fluctuations in the image density that should not be present. The appearance of graininess has to do with the randomness by which the silver halide grains are placed within the detector during the coating operation. Following development, the ‘‘silver grains’’ are also randomly placed and lead to areas of image density fluctuations. The human visual system is very sensitive to this randomness. A similar degree of density fluctuation but in an ordered array would be less noticeable. Although the human visual system cannot resolve the individual silver grains, their random placement does translate into the appearance of graininess in the final image. A further feature of this randomness in silver halide detectors is that it increases with grain size. Consider an
emulsion coating that contains small grains and another that contains large grains. If both coatings are made so that they have comparable maximum density, then the large grain coating will have higher graininess because the larger grains can achieve the required maximum density with fewer grains. As a result, the intergrain distance will be larger with correspondingly larger fluctuations in density. The correlation of graininess with grain size leads to a fundamental problem in designing photographic systems whose SNR is optimum. As discussed before, higher speed is achieved most often by increasing the grain size. But, now we see that the noise also increases with the sensitivity increase, so that the improvement in SNR will be less than expected. The exact relationship between SNR and grain size is related to a host of emulsion, coating, and design factors, so no simple rule of thumb can be given. Nevertheless, it is a fundamental issue in optimizing photographic systems for image capture. Yet another feature of the silver halide detector is how well it reproduces the sharpness of the object being imaged (4,280,281). If we imagine that the object reflects a very narrow beam of light, we are concerned whether this beam is blurred in the final image. Imaging scientists use the concept of a ‘‘point-spread function’’ to characterize the amount of blurring seen in the image. Grain size is an important factor in determining the point-spread function in a silver halide detector. Because the sizes of the grains are on the order of the wavelength of light used in image capture, there is considerable scattering of light. The point of light is spread out laterally so that there is blurring. Although a bit of oversimplification, the tendency is for the spread to be greater for larger grains. Taking these three system design factors — sensitivity, graininess, and sharpness — we can relate them by reference to the ‘‘photographic space’’ shown in Fig. 70 (282). 
Figure 70. Schematic that illustrates detector design space. The area under the triangle is the ‘‘photographic space.’’ (The three axes are sensitivity, increased sharpness, and less noise.)

The volume under the triangle represents the space available to the system designer. If higher sensitivity is needed, then the grain size can be increased. This will move the apex of the triangle further up
the vertical axis, but will also pull the points inward where the triangle intercepts the horizontal axes — the volume under the triangle remains constant, and one design attribute is optimized at the expense of others. Equal but opposite effects occur when grain size is decreased to reduce graininess and increase sharpness. One way to overcome these design constraints is to improve the efficiency of latent-image formation. Using this approach, the sensitivity can be increased without increasing the grain size. Thus, the volume under the triangle increases, and the system designer has more degrees of freedom. If higher sensitivity is not needed for a particular application, then the efficiency-increasing technology can be applied to smaller grains to increase their sensitivity to the design requirements, and the smaller grain size ensures improved sharpness and less graininess. For this reason, photographic film manufacturers are constantly looking for ways to improve the efficiency of latent-image formation.

IMAGING SYSTEMS

This section briefly describes imaging systems based on silver halide detector technology; much more information can be found in other articles. The detector in systems designed to produce black-and-white images would be prepared much as described earlier (4). Because the detector must have spectral sensitivity that spans the major part of the visible spectrum, both green and red dyes must be adsorbed on the grain surface. The emulsion would have a polydisperse grain-size distribution, so that the characteristic curve has a low-contrast (about 0.6), long-latitude response to minimize the risk of over- or underexposure. The final print image is made by exposing through the negative onto a negative-working silver halide emulsion on a reflective support.
The characteristic curve of the emulsion intended for print viewing must have high contrast (about 1.8) to produce a positive image that has the correct variation of image gray value (tone scale) with scene luminance (4,14). Systems designed to produce color prints also use a low-contrast emulsion to capture the image and a highcontrast emulsion to print the image. However, such systems are designed to produce dye images rather than silver images (271–273). As mentioned in the Image Processing section, this is done by allowing the oxidized developer to react with dye precursors to produce the dye image and then bleaching the developed silver back to silver halide for removal during fixing. Movie films are designed similarly to still camera films, except that the final-stage ‘‘print’’ is made using a transparent support, so that the image can be projected on a screen in a darkened room. For color reproduction, the image must be separated into its blue, green, and red components using separate layers in the film that are sensitive to these spectral regions. This color separation is accomplished by using spectral sensitization to produce separate layers that are sensitive to blue through a combination of the silver halide
grain and an adsorbed blue spectral sensitizing dye (B), blue plus green by using a green spectral sensitizing dye (G), and blue plus red by using a red spectral sensitizing dye (R), as shown in Fig. 71. Representative spectral sensitivity curves for the different layers are shown in Fig. 72. By coating the B layer on the top and coating underneath it a yellow filter layer (which absorbs blue light), the G and R layers receive only minus-blue light. Then, by coating the G layer on top of the R layer, the latter receives only red light. Thus, the image has been separated into its B, G, and R components. Systems that produce slides also use similar color technology, but now the image capture medium and the image display medium are the same film. Positive images are obtained by first developing the negative image in a black-and-white developer, then chemically treating the remaining undeveloped grains to make them developable, and finally developing those grains with a color developer. Then all the developed silver is bleached back to silver halide and fixed out. To produce the right tone scale in the final image, the film latitude must be considerably shorter than that in systems designed to produce prints and requires much more care in selecting optimum exposure conditions. Systems designed for medical diagnosis using Xrays produce negative black-and-white images. Because film is a very poor absorber of X rays, these systems
use a screen–film combination to minimize patient exposure (283). The screens are composed of heavyelement particles that are good X-ray absorbers and also emit radiation in the near UV or visible region. In these systems, the film need be sensitive only to the particular wavelength region where the screen emits. Films are designed with different contrasts to optimize the image for the particular diagnosis performed.
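The tone-scale requirement quoted earlier for negative–positive systems can be expressed as a product of stage contrasts. This is the standard tone-reproduction argument (the product form itself is not derived in this article): densities are linear in log exposure over the straight-line region, so cascading a capture stage of contrast about 0.6 with a print stage of contrast about 1.8 gives an end-to-end gamma near unity, which is what makes image tones track scene luminance.

```python
# End-to-end contrast of a two-stage negative-positive system: each stage is
# (approximately) linear in log exposure, so the overall gamma is the
# product of the stage gammas.  Values are those quoted in this section.
gamma_capture = 0.6   # low-contrast, long-latitude camera negative
gamma_print = 1.8     # high-contrast print emulsion

gamma_system = gamma_capture * gamma_print
assert abs(gamma_system - 1.08) < 1e-9  # close to 1: correct tone scale
```

A system gamma much above 1 would exaggerate scene contrast in the print; much below 1 would flatten it. The slight excess over unity is tolerated in practice because prints are viewed under conditions that reduce apparent contrast.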
Figure 71. Arrangement of layers in a simple color film. (From top to bottom: Blue; Yellow filter layer; Green (+Blue); Red (+Blue); Coating support.)

Figure 72. Spectral sensitivity of layers in color film. Data were obtained on separately coated layers. (Spectral sensitivity (log E) is plotted against wavelength λ from 400–700 nm for the blue, green, and red layers.)

ABBREVIATIONS AND ACRONYMS

E  exposure
t  exposure time
T  transmittance
R  reflectance
I  irradiance
D  image density
ISO  international standards organization
γ  contrast
AgCl  silver chloride
AgBr  silver bromide
AgI  silver iodide
Ag+  silver ion
Br−  bromide ion
I−  iodide ion
Ksp  solubility product
pAg  negative logarithm of the silver ion concentration
pBr  negative logarithm of the bromide ion concentration
S  supersaturation
G  free energy
Q  one-dimensional representation of a lattice distortion
ηn  nucleation efficiency
ω  recombination index
N  minimum number of silver/gold atoms in the developable latent image
QS  quantum sensitivity
F  fraction of grains developable
P  mean absorbed photons/grain
LIRF  low-irradiance reciprocity failure
HIRF  high-irradiance reciprocity failure
S  sulfur sensitized
S + Au  sulfur plus gold sensitized
LV  lowest vacant molecular orbital
HF  highest filled molecular orbital
RQE  relative quantum efficiency
Ered  electrochemical reduction potential
Eox  electrochemical oxidation potential
Dred  reduced form of the developing agent
Dox  oxidized form of the developing agent
EAg  electrochemical silver potential
Edev  electrochemical developer potential
Ecell  difference between EAg and Edev (= EAg − Edev)
SNR  signal-to-noise ratio
BIBLIOGRAPHY

1. E. Ostroff, ed., Pioneers of Photography. Their Achievements in Science and Technology, SPSE, The Society for Imaging Science and Technology, Springfield, VA, 1987.
SINGLE PHOTON EMISSION COMPUTED TOMOGRAPHY (SPECT)

MARK T. MADSEN
University of Iowa
Iowa City, IA
INTRODUCTION

Single photon emission computed tomography (SPECT) is a diagnostic imaging modality that produces tomographic slices of internally distributed radiopharmaceuticals. It is routinely used in diagnosing coronary artery disease and in tumor detection. Projection views of the radiopharmaceutical distribution are collected by one or more scintillation cameras mounted on a gantry designed to rotate about a patient lying horizontally on a pallet. The projection information is mathematically reconstructed to obtain the tomographic slices. Most clinical SPECT studies are qualitative and have simplistic corrections for attenuation and scattered radiation. Quantitative SPECT requires corrections for attenuation, scatter, and spatial resolution, although these have not been routinely implemented in the past because of their computational load. SPECT instrumentation has evolved to include coincidence imaging of positron-emitting radiopharmaceuticals, specifically 18F-fluorodeoxyglucose.

RADIOTRACERS

Much of medical imaging depends on anatomic information. Examples include radiographs, X-ray computed tomography (CT), and magnetic resonance imaging (MRI). In SPECT imaging, functional information is obtained about tissues and organs from specific chemical compounds labeled with radionuclides that are used as tracers. These radiotracers, or radiopharmaceuticals, are nearly ideal tracers because they can be externally detected and because they are injected in such small quantities that they do not perturb the physiological state of the patient. A radionuclide is an unstable atomic nucleus that spontaneously emits energy (1). As part of this process, it may emit some or all of the energy as high-energy photons called gamma rays. Because gamma rays are energetic, a significant fraction of them are transmitted from their site of origin to the outside of the body, where they can be detected and recorded. Only a small number of radionuclides are suitable as radiotracers. The most commonly used radionuclides in SPECT imaging are summarized in Table 1. The phrase ‘‘single photon’’ refers to the fact that in this type of imaging, gamma rays are detected as individual events. The term distinguishes SPECT from positron emission tomography (PET), which also uses radionuclides but relies on coincidence imaging. The radionuclides used in PET emit positrons, which quickly annihilate with electrons to form two collinear 511-keV photons. Both of the annihilation photons have to be detected simultaneously by opposed detectors to record a true event, as discussed in more detail in the SPECT/PET Hybrid section. Diagnostic information is obtained from the way the tissues and organs of the body process the radiopharmaceutical. For example, some tumor imaging uses radiopharmaceuticals that have an affinity for malignant tissue. In these scans, abnormal areas are characterized by an increased uptake of the tracer. In nearly all instances, the radiopharmaceutical is administered to the patient by intravenous injection and is carried throughout the body by the circulation, where it localizes in tissues and organs. Because SPECT studies require 15–30 minutes to acquire, we are limited to radiopharmaceuticals whose distribution will remain relatively constant over that interval or longer. Ideally, we also want the radiopharmaceutical to distribute only in abnormal tissues. Unfortunately, this is never the case, and the abnormal concentration of the radiotracer is often obscured by normal uptake of the radiopharmaceutical in surrounding tissues. This is why tomographic imaging is crucial. It substantially increases the contrast of the abnormal area, thereby greatly improving the likelihood of detection. The widespread distribution of the radiopharmaceutical in the body has other implications; the most important is radiation dose.
The radiation burden limits the amount of radioactivity that can be administered to a patient, and for most SPECT studies, this limits the number of detected emissions and thereby, the quality of the SPECT images. SPECT studies are performed for a wide variety of diseases and organ systems (2–6). Although myocardial perfusion imaging (Fig. 1) and tumor scanning (Fig. 2) are by far the most common SPECT applications, other studies include brain perfusion for evaluating stroke (Fig. 3) and dementias, renal function, and the evaluation of trauma.
Table 1. SPECT Radionuclides

Radionuclide   Decay Mode   Production Method    Half-Life   Principal Photon Emissions (keV)
99mTc          IT           99Mo generator       6 h         140
67Ga           EC           68Zn(p,2n)67Ga       78 h        93, 185, 296
111In          EC           111Cd(p,n)111In      67 h        172, 247
123I           EC           124Te(p,5n)123I      13 h        159
131I           β−           Fission by-product   8 days      364
133Xe          β−           Fission by-product   5.3 days    30 (X rays), 80
201Tl          EC           201Hg(d,2n)201Tl     73 h        60–80 (X rays), 167
18F            β+           18O(p,n)18F          110 min     511
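The half-lives in Table 1 determine how fast an administered dose decays during and after a study. As a minimal sketch (standard radioactive-decay arithmetic, not a procedure from this article), the remaining activity after time t is A = A0 · exp(−ln 2 · t / T½); the 20 mCi dose below is an illustrative value:

```python
import math

def remaining_activity(a0_mci, t_hours, half_life_hours):
    """Activity left after t_hours, given the initial activity a0_mci
    and the radionuclide half-life (exponential decay law)."""
    return a0_mci * math.exp(-math.log(2) * t_hours / half_life_hours)

# 99mTc (half-life 6 h, Table 1): a 20 mCi dose after a 30-minute acquisition
activity = remaining_activity(20.0, 0.5, 6.0)
print(f"{activity:.2f} mCi remain")
```

For 99mTc, roughly 6% of the activity is lost over a 30-minute acquisition, whereas after one full half-life (6 h) exactly half of the original dose remains.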
Figure 1. Myocardial perfusion SPECT. This common SPECT procedure is used to evaluate coronary artery disease. SPECT images show regional blood flow in the heart muscle under resting and stress conditions. Both bullet (a) and bull’s-eye (b) displays are used to compare the 3-D rest and stress images. Myocardial SPECT studies can also be gated to evaluate wall motion (c) and ejection fraction (d).
Figure 2. Tumor scanning. These images are from a scan for prostate cancer. Because of the difficulty in distinguishing abnormal uptake from circulating tracer, additional studies using a nonspecific radiotracer are acquired simultaneously. The upper set of images (a) shows tumor uptake (arrows) and no corresponding uptake in the blood pool image in the lower set (b).
Table 2 summarizes several of the most common SPECT studies along with radiation dose estimates.

Gamma Ray Interactions

To understand the detection and imaging of gamma rays, we must first review gamma ray interactions with different materials (7). The intensity of a gamma ray beam decreases as it traverses a material because of interactions between the gamma rays and the electrons in the material. This is referred to as attenuation. Attenuation is an exponential process described by

I(x) = Io exp(−µx),    (1)
Figure 3. Brain perfusion study. These images show the asymmetric distribution of blood flow in the brain resulting from a stroke. Selected transverse (a), sagittal (b), and coronal (c) views are shown.
Table 2. SPECT Radiopharmaceuticals

Radiopharmaceutical                          Application                                Effective Dose (rem)
99mTc Medronate (MDP), Oxidronate (HDP)      Bone scan                                  0.75
99mTc Exametazime (HMPAO), Bicisate (ECD)    Brain perfusion                            1.2
99mTc Arcitumomab (CEA Scan)                 Colon cancer                               0.75
99mTc Sestamibi, Tetrofosmin                 Myocardial perfusion, breast cancer        1.2
123I Metaiodobenzylguanidine (MIBG)          Neuroendocrine tumors                      0.7
67Ga Citrate                                 Infection, tumor localization              2.5
111In Capromab Pendetide (ProstaScint)       Prostate cancer                            2.1
111In Pentetreotide (OctreoScan)             Neuroendocrine tumors                      2.1
201Tl Thallous Chloride                      Myocardial perfusion                       2.5
18F Fluoro-2-deoxyglucose (FDG)              Tumor localization, myocardial viability   1.1

where Io is the initial intensity, I(x) is the intensity after traveling a distance x through the material, and µ is the linear attenuation coefficient of the material. Over the range of gamma ray energies used in radionuclide imaging, the two primary interactions that contribute to
the attenuation coefficient are photoelectric absorption and Compton scattering. Photoelectric absorption refers to the total absorption of the gamma ray by an inner shell atomic electron. It is not an important interaction in body tissues, but it is the primary interaction in high
Z materials such as sodium iodide (the detector material used in the scintillation camera) and lead. Photoelectric absorption is inversely proportional to the cube of the gamma ray energy, so that the efficiency of detection falls sharply as photon energy increases. Compton scattering occurs when the incoming gamma ray interacts with a loosely bound outer shell electron. A portion of the gamma ray energy is imparted to the electron, and the remaining energy is left with the scattered photon. The amount of energy lost in scattering depends on the angle between the gamma ray and the scattered photon. The cross section of Compton scattering is inversely proportional to the gamma ray energy, and it is the dominant interaction in body tissues.
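As a numerical illustration of the attenuation law in Eq. (1), the sketch below computes the fraction of 140-keV photons (99mTc) surviving a given depth of soft tissue. The coefficient µ ≈ 0.15 cm⁻¹ is an assumed representative value for water-like tissue at this energy, not a figure taken from this article:

```python
import math

def transmitted_fraction(mu_per_cm, thickness_cm):
    """Fraction of a narrow gamma ray beam surviving attenuation:
    I(x)/Io = exp(-mu * x), per Eq. (1)."""
    return math.exp(-mu_per_cm * thickness_cm)

MU_TISSUE_140KEV = 0.15  # assumed linear attenuation coefficient, 1/cm

for depth_cm in (5.0, 10.0, 15.0):
    frac = transmitted_fraction(MU_TISSUE_140KEV, depth_cm)
    print(f"{depth_cm:4.1f} cm of tissue: {100 * frac:5.1f}% transmitted")
```

Under this assumption, only about 22% of the photons emitted 10 cm deep escape unattenuated, which is why attenuation correction matters so much for quantitative SPECT.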
Scintillation Cameras

All imaging studies in nuclear medicine (SPECT and conventional planar) are acquired on scintillation cameras (also referred to as Anger or gamma cameras), invented by H. O. Anger in 1953 (8,9). The detector of the scintillation camera is a large, thin sodium iodide crystal (Fig. 4). Typical dimensions of the crystal are 40 × 50 cm and 9.5 mm thick. Sodium iodide, NaI(Tl), is a scintillator; it converts absorbed gamma ray energy into visible light. The magnitude of the light flash is proportional to the energy absorbed, so that information about the event energy as well as its location is available. Photomultiplier tubes, which convert the scintillation into an electronic pulse, are arranged in a close-packed array that covers the entire sensitive area of the crystal. Approximately sixty 7.5-cm photomultiplier tubes are required for the scintillation camera dimensions given above. The location of the detected event is determined by the position-weighted average of the electronic pulses generated by the photomultiplier tubes in the vicinity of the event. This approach yields an intrinsic spatial resolution in the range of 3–4 mm. In addition to estimating the position of the event, the photomultiplier tube signals are also combined to estimate the energy absorbed in the interaction. The energy signal is used primarily to discriminate against Compton-scattered radiation arising in the patient and to normalize the position signals so that the size of the image does not depend on the gamma ray energy. It also makes it possible to image distributions of radiotracers labeled with different radionuclides simultaneously. This is often referred to as dual isotope imaging; however, modern gamma cameras can acquire simultaneous images from four or more energy ranges. Because the response of the crystal and photomultiplier tubes is not uniform, additional corrections are made for position-dependent shifts in the energy signal (referred to as Z or energy correction) and in determining the event location (referred to as L or spatial linearity correction). Thus, when a gamma ray is absorbed, the scintillation camera must determine the position and energy of the event, determine whether the energy signal falls within a selected pulse height analyzer window, and apply the spatial linearity correction. At that point, the location within the image matrix corresponding to the event has its count value increased by one. A scintillation camera image is generated from the accumulation of many (10^5–10^6) detected events.
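The position-weighted averaging described above (Anger logic) can be sketched as follows. This is an illustrative toy model, not any manufacturer's implementation: the PMT coordinates, signal values, calibration factor, and 20% energy window are all assumed numbers:

```python
def anger_event(pmt_xy, pmt_signals, photopeak_kev=140.0, window_frac=0.20,
                kev_per_signal=1.0):
    """Estimate event position and energy from PMT signals (Anger logic).

    The event location is the signal-weighted centroid of the PMT
    positions; the event energy is the calibrated sum of all signals.
    Returns (x, y, energy_keV), or None when the energy falls outside
    the pulse height analyzer window (e.g., Compton-scattered events).
    """
    total = sum(pmt_signals)
    x = sum(s * px for s, (px, py) in zip(pmt_signals, pmt_xy)) / total
    y = sum(s * py for s, (px, py) in zip(pmt_signals, pmt_xy)) / total
    energy = kev_per_signal * total
    lo = photopeak_kev * (1.0 - window_frac / 2.0)
    hi = photopeak_kev * (1.0 + window_frac / 2.0)
    return (x, y, energy) if lo <= energy <= hi else None

# Three neighboring PMTs (positions in cm) sharing one scintillation's light:
tubes = [(0.0, 0.0), (7.5, 0.0), (3.75, 6.5)]
signals = [80.0, 40.0, 20.0]        # signals sum to 140, i.e., in the photopeak
print(anger_event(tubes, signals))  # centroid lies inside the PMT triangle
```

An event whose summed signal corresponds to, say, 100 keV would be rejected by the same window test, which is how scatter discrimination falls out of the energy estimate.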
Figure 4. Scintillation camera. Virtually all SPECT imaging is performed with scintillation cameras. The scintillation camera determines the location and energy of each gamma ray interaction through the weighted averaging of photomultiplier signals. A collimator is required to form the image on the NaI(Tl) detector.
The time it takes to process an event is ultimately limited by the scintillation relaxation time [t = 250 ns for NaI(Tl)]. For most SPECT imaging, this does not present any problem. However, it becomes a severe constraint for coincidence imaging, discussed in detail later. Typical performance specifications of scintillation cameras are given in Table 3. Gamma rays cannot be focused because of their high photon energy. Therefore, a collimator must be used to project the distribution of radioactivity within the patient onto the NaI(Tl) crystal (10). A collimator is a multihole lead device that selectively absorbs all gamma rays except those that traverse the holes (Fig. 5). This design severely restricts the number of gamma rays that can be detected. Less than 0.05% of the gamma rays that hit the front
Table 3. Scintillation Camera Specifications

Parameter                               Specification
Crystal size                            40 × 50 cm
Crystal thickness                       9.5 mm
Efficiency at 140 keV                   0.86 (photopeak); 0.99 (total)
Efficiency at 511 keV                   0.05 (photopeak); 0.27 (total)
Energy resolution                       10%
Intrinsic spatial resolution            3.5 mm
System count sensitivityᵃ               250 counts/min/µCi
System spatial resolution at 10 cmᵃ     8.0 mm
Maximum count rate (SPECT)              250,000 counts/s
Maximum count rate (coincidence)        >1,000,000 counts/s

ᵃ Specified for high-resolution collimator.
SINGLE PHOTON EMISSION COMPUTED TOMOGRAPHY (SPECT)
[Figure 5 panels: parallel, fan-beam, and cone-beam collimator geometries; typical hole dimensions on the order of 1.5 mm across and 25 mm long, with 0.2-mm septa.]
Figure 5. Collimators. Collimators are the image-forming apertures of the scintillation camera. They can be configured in parallel, fan-beam, and cone-beam geometries.
surface of the collimator are transmitted through to the crystal. Several parameters enter into the design of collimators. Most collimators have parallel holes that map the gamma ray distribution one-to-one onto the detector. Trade-offs are made in optimizing the design for count sensitivity and spatial resolution. The sensitivity ε of the collimator is proportional to the square of the ratio of the hole size d to the hole length l (ε ∝ d²/l²). The spatial resolution (Rcol), characterized by the full-width-at-half-maximum (FWHM) of the line spread function, is proportional to d/l. The desire is to maximize ε while minimizing Rcol. Because the optimal design often depends on the specific imaging situation, most clinics have a range of collimators available. Some typical examples are given in Table 4. For low-energy studies (Eγ < 150 keV), either high-resolution or ultrahigh resolution collimators are typically used. Because the lead absorption of gamma rays is inversely proportional to the gamma ray energy, the design of collimators is influenced by the gamma ray energies that are imaged. As the photon energy increases, thicker septa are required, and to maintain count sensitivity, the size of the holes is increased, which compromises spatial resolution. Parallel hole geometry is not the most efficient arrangement. Substantial increases in count sensitivity are obtained by using fan- and cone-beam geometries (11) (Fig. 5). The disadvantage of these configurations
is that the field of view becomes smaller as the source-to-collimator distance increases. This presents a problem for SPECT imaging in the body, where portions of the radiopharmaceutical distribution are often truncated. Fan-beam collimators are routinely used for brain imaging, and hybrid cone-beam collimators are available for imaging the heart. In addition to the dependence on hole size and length, the spatial resolution of a collimator depends on the source-to-collimator distance, as shown in Fig. 6. The overall system spatial resolution Rsys can be estimated from

Rsys = √(Rcol² + Rint²).   (2)
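The quadrature combination in Eq. (2) can be sketched directly (function name is illustrative):

```python
import math

# Sketch of Eq. (2): system resolution is the quadrature sum of the
# collimator and intrinsic resolutions. The default R_int = 3.5 mm is the
# typical intrinsic value quoted in the text.
def system_resolution(r_col_mm, r_int_mm=3.5):
    return math.hypot(r_col_mm, r_int_mm)

# Collimator-dominated case: R_col = 7.1 mm at depth.
print(round(system_resolution(7.1), 1))  # 7.9 (mm)
```

Because the two terms add in quadrature, the larger (collimator) term dominates, which is why minimizing the source-to-collimator distance matters so much.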
For most imaging, the collimator resolution is substantially larger than the intrinsic spatial resolution (Rint ∼ 3.5 mm) and is the dominant factor. Therefore, it is very important for the collimator to be as close to the patient as possible.

SPECT Systems

A SPECT system consists of one or more scintillation cameras mounted on a gantry that can revolve about a fixed axis in space, the axis of rotation (8,9,12,13) (Fig. 7).
Table 4. Collimator Specifications

Collimator                     Energy (keV)  Hole Size (mm)  Hole Length (mm)  Septal Thickness (mm)  Relative Sensitivity  Rcol at 10 cm (mm)
Parallel general purpose       140           1.4             24                0.20                   1.0                   8.3
Parallel high-sensitivity      140           2.0             24                0.20                   2.0                   11.9
Parallel high-resolution       140           1.1             24                0.15                   0.7                   6.3
Parallel ultrahigh resolution  140           1.1             36                0.15                   0.3                   4.5
Ultrahigh fan-beam             140           1.4             35                0.20                   0.8                   7.1
Cone-beam                      140           1.9             41                0.25                   1.2                   7.1
Parallel medium-energy         300           3.5             50                1.30                   0.8                   11.6
Parallel high-energy           364           4.2             63                1.30                   0.5                   13.4
Parallel ultrahigh energy      511           4.2             80                2.40                   0.3                   12
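The d/l trade-off behind these specifications can be sketched with simple geometry. The effective-length correction for septal penetration (l_eff = l − 2/µ) and the lead attenuation value used here are assumptions for illustration, so the numbers only approximate the table:

```python
# Geometric collimator trade-off sketch, following the proportionalities in
# the text: sensitivity scales as (d/l)^2 and resolution FWHM grows linearly
# with source distance z. The effective-length correction and the value of
# mu for lead at 140 keV are illustrative assumptions, not table data.
def collimator_fwhm_mm(d_mm, l_mm, z_mm, mu_per_mm=2.2):
    l_eff = l_mm - 2.0 / mu_per_mm           # effective hole length
    return d_mm * (l_eff + z_mm) / l_eff

def relative_sensitivity(d_mm, l_mm):
    return (d_mm / l_mm) ** 2                # proportionality only

# High-resolution collimator (d = 1.1 mm, l = 24 mm) at a 10-cm source distance:
print(round(collimator_fwhm_mm(1.1, 24.0, 100.0), 1))
```

This geometric estimate (about 5.9 mm) is of the same order as the 6.3 mm quoted in Table 4 for the high-resolution collimator; the exact specification also depends on details not modeled here.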
[Figure 6(a): images of a point source at distances of 5 to 40 cm; (b): plot of the calculated FWHM (mm) versus source distance (0–550 mm).]
Figure 6. Collimator spatial resolution as a function of source distance. The spatial resolution of a collimator degrades continuously as the source distance increases. This is shown in the quality of the images (a) and the plot of the calculated FWHM (b).
SPECT studies are usually acquired over a full 360° arc. This yields better quality images than 180° acquisitions because it tends to compensate somewhat for the effects of attenuation. One exception to this practice is myocardial perfusion studies, which are acquired using views from
only 180° (see later). SPECT acquisitions are performed either in step-and-shoot mode or in a continuous rotational mode. In the step-and-shoot mode, the detector rotates to each angular position, stops, and collects data for a preselected frame duration. In the
Figure 7. SPECT systems. SPECT systems consist of one or more scintillation cameras mounted on a gantry that allows image collection from 360° around a patient. The most common configuration has two scintillation cameras. To accommodate the 180° sampling of myocardial perfusion studies, many systems can locate the scintillation cameras at either 90° or 180°.
continuous rotational mode, the duration of the entire study is selected, and the detector rotational speed is adjusted to complete one full orbit. Data are collected continually and are binned into a preselected number of projections. Typically 60 to 120 projection views are acquired over 360° . Another feature of SPECT acquisition is body contouring of the scintillation cameras. Because spatial resolution depends on the source-to-collimator distance, it is crucial to maintain close proximity to the body as the detector rotates about the patient. Although a number of different approaches have been used to accomplish this, the most common method moves the detectors radially in and out as a function of rotational angle. Myocardial perfusion studies are the most common SPECT procedures. Because the heart is located in the left anterior portion of the thorax, gamma rays originating in the heart are highly attenuated for views collected from the right lateral and right posterior portions of the arc. For this reason, SPECT studies of the heart are usually collected using the 180° arc that extends from the left posterior oblique to the right anterior oblique view (14) (Fig. 7c). This results in reconstructed images that have the best contrast, although distortions are often somewhat more pronounced than when 360° data are used (15). Because of the widespread use of myocardial perfusion imaging, many SPECT systems have been optimized for 180° acquisition by using two detectors arranged at ∼90° (Fig. 7c). This reduces the acquisition time by a factor of 2 compared to single detectors and is approximately 30% more efficient than triple-detector SPECT systems. Positioning the detectors at 90° poses some challenges for maintaining close proximity. Most systems rely on the motion of both the detectors and the SPECT table to accomplish this.
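Continuous-mode binning of events into projection views, as described above, can be sketched as follows (the event stream and names are illustrative):

```python
import numpy as np

# Sketch of continuous-rotation acquisition: each detected event is tagged
# with the gantry angle at detection and histogrammed into a preselected
# number of projection views spanning 360 degrees.
def bin_events_by_angle(event_angles_deg, n_views=60):
    edges = np.linspace(0.0, 360.0, n_views + 1)
    counts, _ = np.histogram(np.mod(event_angles_deg, 360.0), bins=edges)
    return counts  # counts per projection view

# Hypothetical uniform event stream over one full orbit.
rng = np.random.default_rng(1)
views = bin_events_by_angle(rng.uniform(0, 360, 60000), n_views=60)
print(views.size, int(views.sum()))  # 60 60000
```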
The heart is continually moving during the SPECT acquisition, and this further compromises spatial resolution. Because the heart beats many times per minute, it is impossible to acquire a stop-action SPECT study directly. However, the heart's motion is periodic, so it is possible to obtain this information by gating the SPECT acquisition (16). In a gated SPECT acquisition, the cardiac cycle is subdivided, and a set of eight images that span the ECG R–R interval is acquired for each angular view. These images are placed into predetermined time bins based on the patient's heart rate, which is monitored by the ECG R wave interfaced to the SPECT system. As added benefits of gating, the motion of the heart walls can be observed, and ventricular volumes and ejection fractions can be determined (17) (Fig. 1). Although most SPECT imaging samples a more or less static distribution of radionuclides, some SPECT systems can perform rapid sequential studies to monitor tracer clearance. An example of this is determining regional cerebral blood flow from the clearance of 133Xe (18). Multiple 1-minute SPECT studies are acquired over a 10-minute interval. When one acquisition sample is completed, the next begins automatically. To minimize time, SPECT systems that perform these studies can alternately reverse the acquisition direction, although at least one SPECT system uses slip-ring technology, so that the detectors can rotate continuously in the same direction.

SPECT Image Reconstruction

The projection information collected by the SPECT system has to be mathematically reconstructed to obtain tomographic slices (8,19–21). The information sought is the distribution of radioactivity for one selected transaxial plane, denoted by f(x, y). A projection through this distribution consists of a set of parallel line integrals,
p(s, θ), where s is the independent variable of the projection and θ is the angle along which the projection is collected. Ignoring attenuation effects for the moment,

p(s, θ) = ∫₋∞^∞ f(x′, y′) dt,   (3)

where

x′ = s cos θ − t sin θ;  y′ = s sin θ + t cos θ.   (4)
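Equations (3)–(4) can be sketched numerically; the nearest-pixel sampling and the disk phantom below are illustrative simplifications:

```python
import numpy as np

# Numerical sketch of Eqs. (3)-(4): a projection sample p(s, theta) is the
# line integral of f(x, y) along t, using the rotated coordinates
# x' = s cos(theta) - t sin(theta), y' = s sin(theta) + t cos(theta).
def projection_sample(f, s, theta, extent=1.0, n_t=501):
    n = f.shape[0]
    t = np.linspace(-extent, extent, n_t)
    x = s * np.cos(theta) - t * np.sin(theta)
    y = s * np.sin(theta) + t * np.cos(theta)
    # nearest-pixel lookup; contributions outside the grid are zero
    i = np.round((y + extent) / (2 * extent) * (n - 1)).astype(int)
    j = np.round((x + extent) / (2 * extent) * (n - 1)).astype(int)
    inside = (i >= 0) & (i < n) & (j >= 0) & (j < n)
    dt = t[1] - t[0]
    return float(f[i[inside], j[inside]].sum() * dt)

# Uniform disk phantom: its central projection is the chord length, and it
# is the same at every angle (the basis of the sinogram's symmetry here).
n = 101
yy, xx = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
disk = (xx**2 + yy**2 <= 0.5**2).astype(float)
p0 = projection_sample(disk, 0.0, 0.0)
p1 = projection_sample(disk, 0.0, np.pi / 3)
print(round(p0, 2), round(p1, 2))  # both ≈ 1.0, the central chord of a r = 0.5 disk
```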
If a complete set of projections can be collected over θ, then one can analytically reconstruct f(x, y) by using several different equivalent methods. The most common approach used in SPECT is filtered backprojection:

f(x, y) = ∫₀^2π p∗[x cos(θ) + y sin(θ)] dθ,   (5)

where p∗ is the measured projection altered by a reconstruction filter [R(ω)]:

p∗(s, θ) = FT⁻¹{FT[p(s, θ)] × R(ω)}.   (6)

For an ideal projection set (completely sampled and no noise), R(ω) = |ω| and is commonly referred to as a ramp filter. The amplification of high-frequency information by this filter requires adding a low-pass filter when real projections are reconstructed, as discussed later. Operationally, SPECT imaging proceeds as follows. Projection views are collected with a scintillation camera at multiple angles about the patient. The field of view of the scintillation camera is large, so that information is acquired from a volume where each row of the projection view corresponds to a projection from a transaxial plane. To reconstruct a tomographic slice, the projections associated with that plane are gathered together, as shown in Fig. 8. This organization of the projections by angle is often referred to as a sinogram because each source in the plane completes a sinusoidal trajectory. The reconstruction filter is usually applied to the projections in the frequency domain, and the filtered projections are then backprojected to generate the tomographic slice (Fig. 9). The noise level of the acquired projections is typically high. When the ramp reconstruction filter is applied, amplification of the noise-dominant higher frequencies overwhelms the reconstructed image. To prevent this, the reconstruction filter is combined with a low-pass filter (apodization). Many different low-pass filters have been used in this application. One common example is the Butterworth filter:

B(ω) = 1 / [1 + (ω/ωc)^2N],   (7)

and the apodized reconstruction filter is

R(ω) = |ω| / [1 + (ω/ωc)^2N].   (8)
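The apodized ramp filter of Eqs. (6)–(8) can be sketched in the frequency domain; the cutoff is expressed as a fraction of the Nyquist frequency, and the names are illustrative:

```python
import numpy as np

# Sketch of Eqs. (6)-(8): filter one projection in the frequency domain with
# a Butterworth-apodized ramp. np.fft.fftfreq returns frequencies in
# cycles/sample, so Nyquist is 0.5.
def filter_projection(p, cutoff_nyquist=0.4, order=5):
    n = len(p)
    w = np.abs(np.fft.fftfreq(n))              # |omega|: the ramp filter
    wc = 0.5 * cutoff_nyquist                  # cutoff as a fraction of Nyquist
    r = w / (1.0 + (w / wc) ** (2 * order))    # apodized ramp, Eq. (8)
    return np.real(np.fft.ifft(np.fft.fft(p) * r))

# A noisy flat projection: a lower cutoff passes less high-frequency noise.
rng = np.random.default_rng(0)
p = 100.0 + rng.normal(0.0, 10.0, 128)
smooth = filter_projection(p, cutoff_nyquist=0.2)
sharp = filter_projection(p, cutoff_nyquist=0.8)
print(np.std(smooth) < np.std(sharp))  # True
```

Because the low-cutoff transfer function is pointwise smaller than the high-cutoff one, the filtered output is smoother, at the cost of detail, exactly the trade-off Fig. 10 illustrates.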
The adjustable parameters, the cutoff frequency (ωc) and the order (N), allow configuring the reconstruction filter for different imaging situations. A low cutoff frequency is desirable when the projections are noisy. When the count density is high, a low cutoff yields an overly smoothed result, as shown in Fig. 10. An accurate description of SPECT imaging requires including attenuation. A more appropriate model of the measured projections (excluding scatter and resolution
[Figure 8 shows projection images at 0°, 90°, 180°, and 270° and the corresponding sinogram.]

Figure 8. Projection data sets. The scintillation camera collects information from multiple projections in each view. A projection set consists of a stack of image rows from each of the angular views. The organization of projections by angle is commonly referred to as a sinogram. Madsen, M.T. Introduction to emission CT. Radiographics 15:975–991, 1995.
[Figure 9 flowchart: projection data set (sinogram) → transform each projection → multiply by the ramp filter → transform back → filtered projection data set → backprojection → reconstructed image.]

Figure 9. Filtered backprojection reconstruction. The projections are modified by the reconstruction filter and then backprojected to yield the tomographic image. For an ideal projection set, the ramp filter provides an exact reconstruction.
[Figure 10: reconstructions using Butterworth cutoff frequencies of 0.2, 0.4, 0.6, and 0.8 × Nyquist.]

Figure 10. SPECT reconstruction filters and noise. Because of the statistical fluctuations in the projection views, it is necessary to suppress the high-frequency components of the ramp reconstruction filter by selecting appropriate filter parameters. Because noise suppression also reduces detail, the optimal filter choice depends on the organ system imaged and the count density. Madsen, M.T. Introduction to emission CT. Radiographics 15:975–991, 1995.
effects) is

p(s, θ) = ∫₋∞^∞ f(x′, y′) exp[−∫ₜ^∞ µ(x′, y′) dt′] dt.   (9)
This formulation is known as the attenuated Radon transform. Unfortunately, there is no analytic solution to this problem. Until recently, there were two ways of handling this problem for clinical studies. The first, and still very common, is simply to reconstruct the acquired projections using filtered backprojection and accept the
artifacts that accompany the inconsistent data set. The second is to apply a simple attenuation correction. The most commonly used attenuation correction is the first-order Chang method, in which a calculated correction map is applied to the reconstructed images (22). The correction factors are calculated by assuming that activity is uniformly distributed in a uniformly attenuating elliptical contour. The size of the ellipse is determined from the anterior and lateral projections. This attenuation correction is fairly adequate for parts of the body such as the abdomen and head but is not useful for the thorax, where the assumptions are far from valid.
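A minimal sketch of a first-order Chang-style correction map, assuming uniform attenuation inside a circular (rather than elliptical) contour for brevity; the geometry and names are illustrative:

```python
import numpy as np

# First-order Chang sketch: each pixel's correction factor is the reciprocal
# of its attenuation factor averaged over all projection angles, assuming
# uniform attenuation mu inside a convex contour (a circle here; clinical
# systems fit an ellipse from the anterior and lateral projections).
def chang_correction_map(n, radius_pix, mu_per_pix, n_angles=64):
    yy, xx = np.mgrid[-(n // 2):n - n // 2, -(n // 2):n - n // 2]
    cmap = np.ones((n, n))
    inside = xx**2 + yy**2 <= radius_pix**2
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    atten = np.zeros((n, n))
    for a in angles:
        # path length from each pixel to the circle edge along direction a
        proj = xx * np.cos(a) + yy * np.sin(a)       # position along the ray
        perp = -xx * np.sin(a) + yy * np.cos(a)      # impact parameter
        half_chord = np.sqrt(np.maximum(radius_pix**2 - perp**2, 0.0))
        d = np.maximum(half_chord - proj, 0.0)       # distance to exit
        atten += np.exp(-mu_per_pix * d)
    cmap[inside] = n_angles / atten[inside]          # reciprocal mean attenuation
    return cmap

cm = chang_correction_map(65, 28, 0.03)
# The central pixel has the longest average path, so it gets the largest boost.
print(cm[32, 32] > cm[32, 52] > 1.0)  # True
```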
The other approach to image reconstruction uses iterative algorithms (23,24) (Fig. 11). Iterative algorithms are more time-consuming than filtered backprojection, but they have several important advantages. These include the elimination of radial streak artifacts, often seen in images reconstructed using filtered backprojection; accurate correction of physical degradations such as scatter, attenuation, and spatial resolution; and better performance where a wide range of activities is present or where limited angle data are available. Iterative algorithms for image reconstruction were introduced in the 1970s, following the advent of X-ray computed tomography. These algorithms were extensions of general approaches to solving linear systems by using sparse matrices. Significant progress in iterative algorithms for emission computed tomography was made in 1982 when the maximum likelihood expectation maximization (ML-EM) algorithm of Shepp and Vardi was introduced (25). In the ML-EM approach, the Poisson nature of the gamma ray counts is included in the derivation. The likelihood that the measured projections are consistent with the estimated emission distribution is maximized to yield

λ_j^new = (λ_j / Σ_i c_ij) Σ_i c_ij (y_i / b_i),   (10)

where

b_i = Σ_k c_ik λ_k.   (11)
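Equations (10)–(11) can be sketched directly, where lam is the image estimate λ, y the measured projections, b the calculated projections, and c the matrix of weighting factors; the tiny system below is illustrative:

```python
import numpy as np

# Sketch of the ML-EM update, Eqs. (10)-(11). c[i, j] is the weight of image
# pixel j in projection bin i. The multiplicative update keeps the estimate
# positive and conserves total counts, as noted in the text.
def mlem(c, y, n_iter=50):
    lam = np.ones(c.shape[1])               # uniform initial estimate
    sens = c.sum(axis=0)                    # sum_i c_ij (sensitivity)
    for _ in range(n_iter):
        b = c @ lam                         # Eq. (11): calculated projections
        lam = lam / sens * (c.T @ (y / b))  # Eq. (10): multiplicative update
    return lam

# 2-pixel, 2-bin toy system whose noiseless data come from lam = [3, 1].
c = np.array([[1.0, 0.0],
              [0.5, 0.5]])
y = c @ np.array([3.0, 1.0])
print(np.round(mlem(c, y), 3))  # converges toward [3, 1]
```

OS-EM applies the same update to ordered subsets of the projections in turn, which is why it reaches a comparable estimate in far fewer full iterations.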
In this formulation, λj is the emission distribution (i.e., the SPECT image), y is the set of measured projections, and b is the set of calculated projections from the current estimate of λ. The cij are backprojection weighting factors that can also encompass appropriate factors for other physical effects such as attenuation, spatial resolution, and scatter. This yields an algorithm with several nice features. First, it is easy to see that because the updating of the estimate in each iteration depends on a ratio, it automatically restricts results to positive numbers. Second, the algorithm conserves the total image counts in each iteration. Unfortunately, the ML-EM algorithm converges slowly, and 20–50 iterations are often required for a satisfactory result. One reason for the slow convergence of the ML-EM algorithm is that the SPECT estimate is updated only at the end of each iteration. One way of significantly reducing the number of iterations is the ordered subset (OS-EM) approach introduced by Hudson and Larkin (26). Using OS-EM, the projection set is split into multiple equal-sized projection sets. For example, a projection set of 64 angular samples might be split into eight subsets of eight samples each. The members of each set are
[Figure 11 flowchart: measured projections are compared with projections calculated from the current estimate; the ratio is backprojected to update the estimate (ML-EM/OS-EM), in contrast to one-pass filtered backprojection.]

Figure 11. Iterative reconstruction. In iterative reconstruction, the initial uniform estimate of the tomographic slice is continually updated by backprojecting the ratio of the measured and calculated projections from the latest estimate. Although computationally intensive, iterative reconstructions allow accurate correction for attenuation and other physical degradations. They also reduce streak artifacts and perform better than filtered backprojection when the projection set is undersampled.
chosen to span the angular range. Set one would consist of projections 1, 9, 17, 25, 33, 41, 49, and 57; set two would have projections 2, 10, 18, 26, 34, 42, 50, and 58; and so on. The ML-EM algorithm is applied sequentially to each subset, and one iteration is completed when all the subsets have been operated on. Thus, the estimated emission distribution is updated multiple times in each iteration. This approach decreases the number of required iterations by a factor approximately equal to the number of subsets. As a result, five or fewer iterations of the OS-EM algorithm are sufficient for most SPECT reconstructions. Although strict convergence has not been demonstrated for the OS-EM algorithm, it is now the most commonly used iterative algorithm for emission tomographic applications. The reconstruction process produces a set of contiguous transaxial slices. These slices can be viewed individually, in a group, or even in a cine format. However, often the best way to visualize the information is by using views that are parallel to the long axis of the patient. These views can be generated directly from the transaxial slices. Sagittal slices are oriented at 0° (the side of the body) and proceed laterally from the right to the left side. Coronal slices are oriented at 90° (the front of the body) and proceed from posterior to anterior. These orientations are useful because many of the organs are aligned with the long axis of the body. An exception is the heart, which points down and to the left. Oblique views parallel and perpendicular to the long axis of the heart are generated for myocardial perfusion studies (see Figs. 1 and 14). Because myocardial
SPECT is common, automatic routines exist to generate these views. The transverse, sagittal, and coronal views are very useful, but they require that the observer view multiple slices. Another useful display is the maximum pixel intensity reprojection. Projections through the reconstructed slice volumes are calculated for typically 20–30 viewing angles over 360° . Instead of summing the information, the highest count pixel value is projected for each ray. Often this value will also be distance weighted. Then, the set of reprojected images is viewed in a cine format yielding a high-contrast, three-dimensional display. The maximum pixel reprojection displays are most useful for radiopharmaceuticals that accumulate in abnormal areas. Examples of these SPECT displays are shown in Fig. 12. SPECT imaging is susceptible to many artifacts if not performed carefully (27,28). Many of the artifacts are a direct consequence of the fundamental assumptions of tomography. The primary assumption is that the external measurements of the distribution reflect true projections, i.e., line integrals. It has already been noted that attenuation and scatter violate this assumption. In addition, it is critical that an accurate center of rotation is used in the backprojection algorithm. The center of rotation is the point on a projection plane that maps the center of the image field, and it must be known to within one-half pixel. Errors larger than this distort each reconstructed point into a ‘‘doughnut’’ shape (Fig. 13a). It
[Figure 12(a): transverse, sagittal, and coronal slices; (b): maximum pixel reprojection images.]

Figure 12. SPECT tomographic displays. Because SPECT uniformly samples a large volume, multiple transverse slices are available. These data can be resliced (a) to yield sagittal (side views parallel to the long axis of the body) and coronal (front views parallel to the long axis of the body) slices, or any oblique view. Another useful display is (b) the maximum pixel reprojection set that is often viewed in cine mode.
[Figure 13(a): reconstructions with an accurate COR and with the COR off by 3 pixels.]
is also a fundamental assumption of emission tomography that the detector field is uniform. Nonuniformities in the detector result in ring-shaped artifacts (Fig. 13b). This problem is especially acute for nonuniformities that fall near the center of rotation, where deviations as small as 1% can result in a very distinct artifact. A change of the source distribution from radiotracer kinetics or patient motion also induces artifacts (Fig. 13c). Although much of SPECT imaging today uses the methods described before, the equations presented earlier oversimplify the actual imaging problem. Both the spatial resolution of the SPECT system and the scatter contributions correlate information from other planes. An accurate description of SPECT requires a 3-D formulation such as (29)

p(s, θ) = c ∫ ∫₋∞^∞ h(t, ω; r) f(r) exp[−∫ᵣ^∞ µ(u) du] dt dω.   (12)

Here h(t, ω; r) represents a three-dimensional system transfer function that includes both the effects of spatial resolution and scattered radiation, f(r) is the emission distribution, and µ(u) is the attenuation distribution. This more accurate model has not been routinely implemented for clinical situations because of high computational costs. However, investigations have shown measurable improvements in image quality, and it is likely that the 3-D formulation will be standard in the near future. This is discussed in greater detail in the Quantitative SPECT section.

SPECT SYSTEM PERFORMANCE
Figure 13. SPECT artifacts. SPECT images are susceptible to a variety of artifacts. (a) Inaccurate center of rotation values blur each image point. (b) Nonuniformities in the scintillation camera cause ring artifacts. (c) Motion during SPECT acquisition can cause severe distortions.

Table 5. SPECT System Performance (High-Resolution Collimator)

Parameter                           Specification
Number of scintillation cameras     1, 2, or 3
Count sensitivity per camera        250 cpm/µCi per detector
Matrix size                         64 × 64; 128 × 128
Pixel size                          6 mm; 3 mm
Spatial resolution (brain studies)  8 mm
Spatial resolution (heart studies)  14 mm
SPECT uniformity                    15%
Contrast of 25.4-mm sphereᵃ         0.45

ᵃ Measured in a cylindrical SPECT phantom of 22-cm diameter at a detector orbit radius of 20 cm (58).

The system performance of SPECT is summarized in Table 5. The scintillation cameras and the associated collimation determine the count sensitivity of a SPECT system. SPECT spatial resolution is generally isotropic and has a FWHM of 8–10 mm for brain imaging and 12–18 mm for body imaging. Spatial resolution is affected by the collimation, the organ system imaged, and the radiopharmaceutical used. This becomes clear when the components of the spatial resolution are examined. SPECT
spatial resolution is quantified by

R_SPECT = √(Rcol² + Rfilter² + Rint²).   (13)
The intrinsic spatial resolution is a relatively minor factor in this calculation. The collimator resolution depends on the type of collimator selected and how close the scintillation camera can approach the patient during acquisition. The type of collimation depends on the expected gamma ray flux and the energy of the gamma rays emitted. If a 99mTc radiopharmaceutical that concentrates with high uptake in the organ of interest is used, then a very high-resolution collimator can be used to minimize the Rcol component. However, if a high-energy gamma emitter such as 131I is used, the appropriate collimator will perform significantly more poorly. Keeping the collimator close to the patient during acquisition is extremely critical for maintaining good spatial resolution.
The best results are obtained for head imaging, where a radius of less than 15 cm is possible for most studies. In the trunk of the body, it is difficult to maintain close proximity, and there is a corresponding loss of spatial resolution. In addition, count density is a major consideration. Low count density requires more smoothing within the reconstruction filter, and this imposes additional losses in spatial resolution.

QUANTITATIVE SPECT

As stated before, until recently, most SPECT imaging relied on either filtered backprojection with no corrections or corrections that use simple physical models that often are poor descriptors of the actual imaging situation (Fig. 14a,b). Routines are available in clinical SPECT systems for enhancing contrast and suppressing noise by using Metz or Wiener filters in conjunction with
[Figure 14 panels: (a), (b) conventional attenuation-correction assumptions versus reality; (c) collimated line source transmission geometry, end and side views; (d) myocardial slices with and without attenuation correction.]

Figure 14. SPECT attenuation correction. Accurate attenuation correction is important for myocardial perfusion imaging because of the heterogeneous distribution of tissues that are different from the assumptions used in the simplified correction schemes (a) and (b). Accurate attenuation correction requires an independent transmission study using an external gamma ray source such as that shown in (c). Attenuation correction removes artifacts that mimic coronary artery disease (d). Photos courtesy of GE Medical Systems.
is generally sufficient information to provide useful attenuation compensation. It is desirable to minimize noise in the attenuation map, but because the correction is integral, the propagation of noise is considerably less than in multiplicative corrections such as in PET studies. Compton scattered radiation accounts for about 30–40% of the acquired counts in SPECT imaging. This occurs despite the energy discrimination available in all SPECT systems. This is illustrated in Fig. 15a which shows a plot of the energy of a Compton scattered photon as a function of the scattering angle for different energy gamma rays. Future SPECT systems may have substantially better energy resolution than the 9–10% that is available from NaI(Tl) detectors, but for now, it is necessary to correct for this undesirable information. Scattered
Energy of compton scattered photons as a functiion of angle
(a)
70
% Energy loss
60 50 40 30 20 10 0 0
(b)
50 100 150 Scattering angle (degrees)
Scatter image
200
Photopeak image
290 k
280 k
# of Events
filtered backprojection (30,31). These filters are most often applied to the projection set before reconstruction and have a resolution-restoration component. Then, the resulting projections can be reconstructed by using a ramp filter because the noise is already suppressed. This prefiltering improves image quality significantly, but it still does not accurately correct for attenuation, distance-dependent spatial resolution, or scatter. An iterative approach is required to accomplish that (14,32,33). In the past, iterative algorithms have not been used because of the computational load required to implement the corrections accurately. However, the combination of faster computers and improved reconstruction algorithms in recent years has made these corrections feasible. Gamma ray attenuation by the body destroys the desired linear relationship between the measured projections and the true line integrals of the internally distributed radioactivity. Reconstructing of the measured projections without compensating for attenuation results in artifacts (34). This is especially a big problem in the thorax where the artifacts from diaphragmatic and breast attenuation mimic the perfusion defects associated with coronary artery disease (Fig. 14d). To correct accurately for attenuation, the attenuation distribution needs to be known for each slice. Many different approaches have been investigated to obtain attenuation maps. These range from using information in the emission data to acquiring transmission studies (35,36). Transmission acquisition is the standard approach used today. In this approach, an external source (or sources) is mounted on the gantry opposite a detector, and transmission measurements are acquired at the same angles as the emission data (Fig. 14c). 
All of the manufacturers of SPECT systems have options for obtaining transmission studies by using external sources to measure the attenuation distribution of the cardiac patients directly using the scintillation camera as a crude CT. Most of the transmission devices allow simultaneous acquisition of emission and transmission information. Therefore, the transmission sources must have energy emissions different from the radiotracers used in the clinical study. Radionuclides that have been used as sources for transmission studies include Am-241, Gd-153, Ba-133 and Cs-137, and at least one vendor uses an X-ray tube. Different source configurations have been used to collect the transmission studies (37), but all of the commercial systems use one or more line sources. In some systems, the line source is translated across the camera field of view (38). One vendor uses an array of line sources that spans the field of view (39). The information collected from these transmission measurements is corrected for cross talk by using the emission gamma rays, and the transmission views are reconstructed to yield an attenuation map. If the photon energy of the transmission source is significantly different from that of the emission radionuclide, the map has to be scaled to the appropriate attenuation coefficients. This is a relatively easy mapping step and can be done with sufficient accuracy. Then, the scaled attenuation map can be used in the iterative algorithm. Because of time, count rate and sensitivity constraints, the quality of the attenuation maps is poor. However, there
Figure 15. Scattered radiation. Compton scattered radiation degrades contrast and compromises attenuation correction. (a) A plot of the energy loss of scattered photons as a function of angle and energy. (b) Scattered radiation can be compensated for by acquiring additional data simultaneously from an energy window below the photopeak.
SINGLE PHOTON EMISSION COMPUTED TOMOGRAPHY (SPECT)
radiation decreases contrast and can impact other corrections. For example, when attenuation correction is applied without also correcting for scattered radiation, the heart walls near the liver may be overenhanced. Scatter has been corrected in several different ways (33,40–44). The easiest to implement is the subtraction method, in which information is simultaneously acquired in a second energy window centered below the photopeak in the Compton scatter region of the energy spectrum (Fig. 15b). After establishing an appropriate normalization factor, the counts from the scatter window are subtracted from the photopeak window. Then, the corrected projections are used in the reconstruction algorithm. The disadvantages of this approach are that it increases noise and that it is difficult to establish an accurate normalization factor. An alternate approach is to model the scatter as part of the forward projection routine using a more or less complicated photon-transport model (30,31,42,45,46). This requires information about tissue densities, which is available from transmission measurements. This approach has the potential to provide the best correction; however, it is computationally intensive. It is likely that this problem will be overcome by improvements in computer performance and by more efficient algorithms. For example, it has already been shown that the scatter fractions can be calculated on a coarser grid than the emission data because scatter has inherently low resolution (47). In addition, because of its low spatial frequency, the scatter estimate converges rapidly, and it is not necessary to update the calculated scatter data at every iteration. Correction must also be made for the distance-dependent spatial resolution discussed earlier (Fig. 6). Although a number of approaches have been investigated for applying this correction by using filtered backprojection, the best results have been achieved from iterative algorithms.
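The energy-window subtraction described above can be sketched in a few lines. The counts and the normalization factor k here are illustrative; calibrating k is the hard part in practice:

```python
import numpy as np

# Hypothetical projection views (counts) acquired simultaneously in the
# photopeak window and in a lower "scatter" energy window.
photopeak = np.array([[120.0, 135.0], [150.0, 160.0]])
scatter = np.array([[40.0, 42.0], [55.0, 50.0]])

k = 0.5  # assumed normalization factor between the two windows
corrected = np.clip(photopeak - k * scatter, 0.0, None)  # no negative counts
print(corrected)
```

The corrected projections would then be passed to the reconstruction algorithm; clipping at zero is one common guard against the added noise driving bins negative.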
Like scatter correction, accurate modeling of spatial resolution requires a three-dimensional approach. This is computationally intensive because specific convolution kernels are required for each projection distance. The Gaussian diffusion method is a simpler and faster alternative (48). In Gaussian diffusion, a convolution kernel is chosen that is applied sequentially at each row of the forward projection matrix. The repeated convolutions reproduce the distance dependence of the collimation fairly accurately.

SPECT/PET HYBRID SYSTEMS

As previously stated, the motivating force behind SPECT imaging is the availability of radiopharmaceuticals that provide crucial diagnostic information. In recent years, it has become apparent that the premier cancer imaging agent is 18F-fluorodeoxyglucose (18F-FDG). Fluorodeoxyglucose is a glucose analog that reveals metabolic activity, and it has high sensitivity and specificity for detecting a large number of cancers, including lung, colon, breast, melanoma, and lymphoma. However, 18F is a positron emitter. This makes it ideally suited for positron-emission tomography but unfortunately much less suited for SPECT. The main reason for this is the
high energy of the annihilation radiation resulting from the positron emission. High-energy photons are a problem in SPECT for two reasons. First, the relatively thin NaI(Tl) crystals have low efficiency for detecting them. At 511 keV, the photopeak efficiency is less than 10% for a 9.6-mm crystal. The second problem is that it is difficult to collimate these high-energy photons (49–51). Because thicker septa are required, the count sensitivity is very low. As a result, the spatial resolution is 30–50% worse than that of collimators used with 99mTc. This poor spatial resolution reduces the sensitivity of the test. There is one SPECT application where 18F-FDG performs adequately, and that is heart imaging. Fluorodeoxyglucose provides information about the metabolic activity of the heart muscle and is a good indicator of myocardial viability. However, the imaging of 18F-FDG in tumors is substantially worse with SPECT than with PET tomographs. As stated previously, the dual-detector SPECT system is the most common configuration. One obvious solution to the problem of collimated SPECT using positron-emitting radiotracers is to resort to coincidence imaging (Table 6 and Fig. 16). When a positron is emitted from a nucleus during radioactive decay, it dissipates its energy over a short distance and captures an electron. The electron-positron pair annihilates almost immediately, producing two collinear 511-keV photons. This feature of annihilation radiation can be exploited in coincidence detection, where simultaneous detection by opposed detectors is required. Two opposed scintillation cameras whose collimators have been removed can have additional electronics added to enable coincidence detection, essentially turning a SPECT system into a PET tomograph (52,53). Although this may sound easy, there are many problems to overcome. Detection of the annihilation photons at the two scintillation cameras represents independent events.
The overall efficiency for detection is equal to the product of the individual efficiencies. With a singles efficiency of about 10% (i.e., the efficiency for either detector to register one event), the coincidence efficiency drops to 1%. Although this is very low compared to the detection efficiency at 140 keV (86%), the overall coincidence efficiency is actually very high compared to system efficiency using collimators. But there are still problems. The detection efficiency for detecting only one photon (singles efficiency) is an order of magnitude higher than the coincidence efficiency. This leads to problems with random coincidences. Random coincidences are registered coincidence events that do not result from a
Table 6. SPECT/PET Hybrid System Performance

Parameter                               Specification
Number of scintillation cameras         2
NaI(Tl) thickness (mm)                  15.9–19
Matrix size                             128 × 128
Pixel size                              3 mm
Maximum singles rate (counts/s)         1,000,000–2,000,000
Maximum coincidence rate (counts/s)     10,000
Spatial resolution (mm)                 5
[Figure 16 schematic: two opposed NaI(Tl) scintillation cameras with shields and absorbers, camera electronics and position processors, a coincidence board that issues the coincidence gate pulse, and a computer acquisition station.]
single annihilation. These randoms present a background that must be subtracted. Because of the low coincidence efficiency, the random rates in coincidence scintillation cameras are quite high and further compromise image quality. One way to mitigate this problem is to increase efficiency by using thicker crystals. All of the vendors have done this. The crystals have been increased from 9.6 mm to at least 15.9 mm and to as much as 19 mm. As NaI(Tl) crystals increase in thickness, there is a loss of intrinsic spatial resolution, and this limits the crystal thickness that can be used. In most SPECT imaging studies, there are essentially no count-rate losses from finite temporal resolution. The amount of radioactivity that can be safely administered and the low sensitivity of the collimation put the observed count rate well within the no-loss range. In fact, most SPECT studies would benefit from a higher gamma-ray flux. This is not so in coincidence imaging. Once the collimators are removed, the wide-open NaI(Tl) crystals are exposed to very large count rates. These count rates are so high that count-rate losses are unavoidable, and they become the limiting factor in performance. Much effort has been devoted to improving this situation. In the early 1990s, the maximum observed counting rate for a scintillation camera was in the 200,000–400,000 counts/second range. As SPECT systems were redesigned for coincidence imaging, this rate was extended to more than 1,000,000 counts/second by shortening the integration time on the pulses and implementing active baseline restoration. The limiting count-rate factor in the scintillation camera is the persistence of the scintillation. The 1/e scintillation time for NaI(Tl) is 250 nanoseconds. At the lower energies at which the scintillation camera typically operates, it is advantageous to capture the entire scintillation signal to optimize energy resolution and intrinsic spatial resolution.
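The count-rate losses described above are often summarized with a paralyzable dead-time model. The model and the effective dead time below are assumptions for illustration, not taken from the text:

```python
import numpy as np

def observed_rate(true_rate, tau):
    """Paralyzable dead-time model: m = n * exp(-n * tau)."""
    return true_rate * np.exp(-true_rate * tau)

# Assumed effective dead time per event, of the order of the 250-ns
# NaI(Tl) scintillation persistence mentioned in the text.
tau = 250e-9

for n in (2e5, 1e6, 4e6):  # hypothetical true event rates (counts/s)
    print(f"true {n:9.0f}/s -> observed {observed_rate(n, tau):9.0f}/s")
```

In this model, the observed rate peaks at n = 1/tau and then falls, which is why shortening the pulse integration time raises the usable maximum count rate.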
In coincidence imaging, shortening the pulse is mandatory. Fortunately, the
Figure 16. Coincidence imaging. Opposed scintillation cameras can acquire PET studies by adding coincidence electronics to record the locations of simultaneously detected events. Photos courtesy of GE Medical Systems.
increased signal obtained from a 511-keV interaction (compared to the typical 140 keV) allows shortening the pulse integration without extreme degradation. Note that, even with these efforts, the coincidence cameras are still count-rate limited, and the amount of activity that can be in the camera field of view at the time of imaging is restricted to less than 3 mCi. Thus, one cannot make up for the loss in sensitivity by giving more activity to the patient. Other measures have been taken to help reduce the count-rate burden. One of these is a graded absorber. The scintillation camera has to process every event that the crystal absorbs. The only events of interest are the 511-keV photons, but many low-energy photons that result from scatter within the subject also interact with the detector. The photoelectric cross section is inversely proportional to the cube of the gamma-ray energy. This means that a thin lead shield placed over the detector will freely pass most 511-keV photons but will strongly absorb low-energy photons. If one uses only lead, there is a problem with the lead characteristic X rays that are emitted as part of the absorption process. These can be absorbed by a tin filter. In turn, the tin characteristic X rays are absorbed by a copper filter, and the copper characteristic X rays by an aluminum filter. Even though the detectors are thin, the uncollimated detectors present a large solid angle to the annihilation photons. To achieve maximum sensitivity, it is desirable to accept all coincidences, even those at large angles. Several problems are associated with this. First, the camera sensitivity becomes highly dependent on position. Sources on the central axis of the detectors subtend a large solid angle, whereas those at the edge subtend a very small one. In addition, including the large-angle coincidence events drastically increases the scatter component to more than 50%.
Because of this problem, many manufacturers use lead slits aligned perpendicular
to the axial direction to restrict the angular extent of the coincidences. This reduces the scatter component to less than 30% and also reduces the solid-angle variability. The intrinsic spatial resolution of hybrid systems is comparable to that of dedicated PET systems, whose FWHM is 4–5 mm. However, the count sensitivity is at least an order of magnitude lower. This, along with the maximum count-rate constraint, guarantees that the coincidence camera data will be very count poor and will therefore require substantial low-pass filtering when reconstructed. As a result, the quality of the reconstructed images is perceptibly worse than that of dedicated PET images (Fig. 17). In head-to-head comparisons, it has been found that the hybrid systems perform well on tumors greater than 2 cm in diameter located in the lung (54–56). Tumors smaller than 1.5 cm and those located in high-background areas are detected with much lower sensitivity. These results are important because they provide a guide for the useful application of the coincidence camera. Other improvements have also been made in scintillation camera performance for coincidence imaging. As discussed before, the conventional method for determining the location of an interaction on the detector is a weighted average of PMT signals. At high count rates, this produces unacceptable positioning errors. For example, if two gamma rays are absorbed simultaneously in opposing corners of the detector, the Anger logic will place a single event in the center of the camera. New positioning algorithms have been developed that use maximum-likelihood calculations and can correctly handle that scenario. The projection information collected by coincidence cameras requires correction for random coincidences, scatter, and attenuation to yield accurate tomographic images. Typically, randoms are either monitored in a separate time window or are calculated from the singles count rate and are subtracted.
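The estimate from singles rates conventionally takes the form R = 2·tau·S1·S2, where tau is the coincidence timing window. The formula is standard, but the numbers below are assumed for illustration:

```python
# Standard accidental-coincidence estimate (not given in the article):
# R = 2 * tau * S1 * S2, with tau the coincidence timing window and
# S1, S2 the singles rates on the two opposed cameras.
tau = 10e-9        # assumed 10-ns coincidence window; illustrative
s1 = s2 = 1.0e6    # singles rates near the camera maximum (counts/s)

randoms_rate = 2.0 * tau * s1 * s2
print(randoms_rate)  # accidental coincidences per second
```

With these assumed numbers, the randoms rate (20,000/s) already exceeds the maximum coincidence rate listed in Table 6, which is why randoms subtraction matters so much for these cameras.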
Scatter correction is sometimes ignored or is accomplished by subtracting data collected in a separate scatter energy window, as discussed for SPECT imaging. Attenuation correction requires information about the transmission of gamma rays through the body along the coincidence lines of response. Some systems ignore this correction and simply reconstruct the randoms- and scatter-corrected projections. This creates rather severe artifacts
Figure 17. Comparison of PET images from a SPECT/PET hybrid system and a dedicated PET system. (a) The top row of images shows several slices obtained by a dedicated PET system from a patient who has lung cancer. (b) The bottom row shows the corresponding images obtained from a SPECT/PET hybrid system. The intensified body edge seen in both the dedicated PET and SPECT/PET hybrid system is an attenuation artifact.
but still shows the accumulation of 18F-FDG in tumor sites (Fig. 17). When attenuation is corrected, a separate transmission study is performed using an external source, and attenuation maps are formed in a manner similar to that discussed for myocardial SPECT. Cs-137 has been used for this purpose in coincidence cameras. Attenuation correction factors for coincidence imaging are very large for annihilation radiation passing through thick portions of the body, where they can approach a value of 100.

SUMMARY

SPECT imaging is expected to play a continuing and important role in medical imaging. Future improvements in SPECT instrumentation are likely to include new detectors and collimation schemes. The coincidence scintillation cameras will also continue their evolution by adding more cameras and multidetector levels optimized for SPECT and coincidence imaging. Improvements in reconstruction algorithms will include prior information about the tomographic image, such as smoothness constraints and anatomic distributions. The primary motivating factor in SPECT imaging will continue to be the creation and implementation of new radiopharmaceuticals. SPECT will continue in wide use for myocardial perfusion imaging, but SPECT use in tumor imaging will probably experience the largest growth. Applications will include treatment planning for internal radiation therapy as well as diagnostic studies (57).

ABBREVIATIONS AND ACRONYMS

FWHM    full-width-at-half-maximum
MRI     magnetic resonance imaging
ML-EM   maximum-likelihood expectation maximization
OS-EM   ordered-subsets expectation maximization
PMT     photomultiplier tube
PET     positron emission tomography
SPECT   single photon emission computed tomography
CT      X-ray computed tomography
BIBLIOGRAPHY

1. J. A. Patton, Radiographics 18, 995–1,007 (1998).
2. T. R. Miller, Radiographics 16, 661–668 (1996).
3. R. Hustinx and A. Alavi, Neuroimaging Clin. N. Am. 9, 751–766 (1999).
4. B. L. Holman and S. S. Tumeh, JAMA 263, 561–564 (1990).
5. R. E. Coleman, R. A. Blinder, and R. J. Jaszczak, Invest. Radiol. 21, 1–11 (1986).
6. J. F. Eary, Lancet 354, 853–857 (1999).
7. D. J. Simpkin, Radiographics 19, 155–167; quiz 153–154 (1999).
8. R. J. Jaszczak and R. E. Coleman, Invest. Radiol. 20, 897–910 (1985).
9. F. H. Fahey, Radiographics 16, 409–420 (1996).
10. S. C. Moore, K. Kouris, and I. Cullum, Eur. J. Nucl. Med. 19, 138–150 (1992).
11. B. M. Tsui and G. T. Gullberg, Phys. Med. Biol. 35, 81–93 (1990).
12. J. M. Links, Eur. J. Nucl. Med. 25, 1,453–1,466 (1998).
13. W. L. Rogers and R. J. Ackermann, Am. J. Physiol. Imaging 7, 105–120 (1992).
14. B. M. Tsui et al., J. Nucl. Cardiol. 5, 507–522 (1998).
15. K. J. LaCroix, B. M. Tsui, and B. H. Hasegawa, J. Nucl. Med. 39, 562–574 (1998).
16. M. P. White, A. Mann, and M. A. Saari, J. Nucl. Cardiol. 5, 523–526 (1998).
17. D. S. Berman and G. Germano, J. Nucl. Cardiol. 4, S169–171 (1997).
18. P. Bruyant, J. Sau, J. J. Mallet, and A. Bonmartin, Comput. Biol. Med. 28, 27–45 (1998).
19. H. Barrett, in A. Todd-Pokropek and M. Viergever, eds., Medical Images: Formation, Handling and Evaluation, Springer-Verlag, NY, 1992, pp. 3–42.
20. M. T. Madsen, Radiographics 15, 975–991 (1995).
21. T. F. Budinger et al., J. Comput. Assist. Tomogr. 1, 131–145 (1977).
22. L. Chang, IEEE Trans. Nucl. Sci. NS-25, 638–643 (1978).
23. J. W. Wallis and T. R. Miller, J. Nucl. Med. 34, 1,793–1,800 (1993).
24. B. F. Hutton, H. M. Hudson, and F. J. Beekman, Eur. J. Nucl. Med. 24, 797–808 (1997).
25. L. Shepp and Y. Vardi, IEEE Trans. Med. Imaging 1, 113–122 (1982).
26. H. Hudson and R. Larkin, IEEE Trans. Med. Imaging 13, 601–609 (1994).
27. L. S. Graham, Radiographics 15, 1,471–1,481 (1995).
28. H. Hines et al., Eur. J. Nucl. Med. 26, 527–532 (1999).
29. T. Budinger et al., Mathematics and Physics of Emerging Biomedical Imaging, National Academy Press, Washington, D.C., 1996.
30. J. M. Links et al., J. Nucl. Med. 31, 1,230–1,236 (1990).
31. M. A. King, M. Coleman, B. C. Penney, and S. J. Glick, Med. Phys. 18, 184–189 (1991).
32. M. A. King et al., J. Nucl. Cardiol. 3, 55–64 (1996).
33. B. M. Tsui, X. Zhao, E. C. Frey, and W. H. McCartney, Semin. Nucl. Med. 24, 38–65 (1994).
34. S. L. Bacharach and I. Buvat, J. Nucl. Cardiol. 2, 246–255 (1995).
35. M. T. Madsen et al., J. Nucl. Cardiol. 4, 477–486 (1997).
36. A. Welch, R. Clack, F. Natterer, and G. T. Gullberg, IEEE Trans. Med. Imaging 16, 532–541 (1997).
37. M. A. King, B. M. Tsui, and T. S. Pan, J. Nucl. Cardiol. 2, 513–524 (1995).
38. R. Jaszczak et al., J. Nucl. Med. 34, 1,577–1,586 (1993).
39. A. Celler et al., J. Nucl. Med. 39, 2,183–2,189 (1998).
40. I. Buvat et al., J. Nucl. Med. 36, 1,476–1,488 (1995).
41. F. J. Beekman, C. Kamphuis, and E. C. Frey, Phys. Med. Biol. 42, 1,619–1,632 (1997).
42. F. J. Beekman, H. W. de Jong, and E. T. Slijpen, Phys. Med. Biol. 44, N183–192 (1999).
43. D. R. Haynor, M. S. Kaplan, R. S. Miyaoka, and T. K. Lewellen, Med. Phys. 22, 2,015–2,024 (1995).
44. M. S. Rosenthal et al., J. Nucl. Med. 36, 1,489–1,513 (1995).
45. A. Welch et al., Med. Phys. 22, 1,627–1,635 (1995).
46. A. Welch and G. T. Gullberg, IEEE Trans. Med. Imaging 16, 717–726 (1997).
47. D. J. Kadrmas, E. C. Frey, S. S. Karimi, and B. M. Tsui, Phys. Med. Biol. 43, 857–873 (1998).
48. V. Kohli, M. A. King, S. J. Glick, and T. S. Pan, Phys. Med. Biol. 43, 1,025–1,037 (1998).
49. W. E. Drane et al., Radiology 191, 461–465 (1994).
50. J. S. Fleming and A. S. Alaamer, J. Nucl. Med. 37, 1,832–1,836 (1996).
51. P. K. Leichner et al., J. Nucl. Med. 36, 1,472–1,475 (1995).
52. T. K. Lewellen, R. S. Miyaoka, and W. L. Swan, Nucl. Med. Commun. 20, 5–12 (1999).
53. P. H. Jarritt and P. D. Acton, Nucl. Med. Commun. 17, 758–766 (1996).
54. P. D. Shreve et al., Radiology 207, 431–437 (1998).
55. P. D. Shreve, R. S. Steventon, and M. D. Gross, Clin. Nucl. Med. 23, 799–802 (1998).
56. R. E. Coleman, C. M. Laymon, and T. G. Turkington, Radiology 210, 823–828 (1999).
57. P. B. Zanzonico, R. E. Bigler, G. Sgouros, and A. Strauss, Semin. Nucl. Med. 19, 47–61 (1989).
58. L. S. Graham et al., Med. Phys. 22, 401–409 (1995).

STEREO AND 3-D DISPLAY TECHNOLOGIES

DAVID F. MCALLISTER
North Carolina State University, Raleigh, NC

INTRODUCTION

Recently, there have been rapid advancements in 3-D techniques and technologies. Hardware has improved and become considerably cheaper, making real-time and interactive 3-D available to the hobbyist, as well as to the researcher. There have been major studies in areas such as molecular modeling, photogrammetry, flight simulation, CAD, visualization of multidimensional data, medical imaging, teleoperations such as remote vehicle piloting and remote surgery, and stereolithography. In computer graphics, the improvements in speed, resolution, and economy make interactive stereo an important capability. Old techniques have been improved, and new ones have been developed. True 3-D is rapidly becoming an important part of computer graphics, visualization, virtual-reality systems, and computer gaming. Numerous 3-D systems are granted patents each year, but very few systems move beyond the prototype stage and become commercially viable. Here, we treat the salient 3-D systems. First, we discuss the major depth cues that we use to determine depth relationships among objects in a scene.

DEPTH CUES

The human visual system uses many depth cues to eliminate the ambiguity of the relative positions of objects in a 3-D scene. These cues are divided into two categories: physiological and psychological.
Physiological Depth Cues

Accommodation. Accommodation is the change in focal length of the lens of the eye as it focuses on specific regions of a 3-D scene. The lens changes thickness due to a change in tension of the ciliary muscle. This depth cue is normally used by the visual system in tandem with convergence.

Convergence. Convergence, or simply vergence, is the inward rotation of the eyes to converge on objects as they move closer to the observer.

Binocular Disparity. Binocular disparity is the difference in the images projected on the retinas of the left and right eyes in viewing a 3-D scene. It is the salient depth cue used by the visual system to produce the sensation of depth, or stereopsis. Any 3-D display device must be able to produce a left- and right-eye view and present them to the appropriate eye separately. There are many ways to do this, as we will see.

Motion Parallax. Motion parallax provides different views of a scene in response to movement of the scene or the viewer. Consider a cloud of discrete points in space in which all points are the same color and approximately the same size. Because no other depth cues (other than binocular disparity) can be used to determine the relative depths of the points, we move our heads from side to side to get several different views of the scene (called look around). We determine relative depths by noticing how much two points move relative to each other: as we move our heads from left to right or up and down, the points closer to us appear to move more than points further away.

Psychological Depth Cues

Linear Perspective. Linear perspective is the change in image size of an object on the retina in inverse proportion to the object's change in distance. Parallel lines moving away from the viewer, like the rails of a train track, converge to a vanishing point. As an object moves further away, its image becomes smaller, an effect called perspective foreshortening.
This is a component of the depth cue of retinal image size.

Shading and Shadowing. The amount of light from a light source that illuminates a surface is inversely proportional to the square of the distance from the light source to the surface. Hence, the surfaces of an object that are further from the light source are darker (shading), which gives cues of both depth and shape. Shadows cast by one object on another (shadowing) also give cues to relative position and size.

Aerial Perspective. Distant objects tend to be less distinct and appear cloudy or hazy. Blue has a shorter wavelength and penetrates the atmosphere more easily than other colors. Hence, distant outdoor objects sometimes appear bluish.

Interposition. If one object occludes, hides, or overlaps (interposes) another, we assume that the object doing the hiding is closer. This is one of the most powerful depth cues.

Retinal Image Size. We use our knowledge of the world, linear perspective, and the relative sizes of objects to determine relative depth. If we view a picture in which an elephant is the same size as a human, we assume that the elephant is further away because we know that elephants are larger than humans.

Textural Gradient. We can perceive detail more easily in objects that are closer to us. As objects become more distant, the textures become blurred. Texture in brick, stone, or sand, for example, is coarse in the foreground and grows finer as distance increases.

Color. The fluids in the eye refract different wavelengths at different angles. Hence, objects of the same shape and size and at the same distance from the viewer often appear to be at different depths because of differences in color. In addition, bright-colored objects will appear closer than dark-colored objects.

The human visual system uses all of these depth cues to determine relative depths in a scene. In general, depth cues are additive; the more cues, the better the viewer can determine depth. However, in certain situations, some cues are more powerful than others, and this can produce conflicting depth information. Our interpretation of the scene and our perception of the depth relationships result from our knowledge of the world and can override binocular disparity.

A TECHNOLOGY TAXONOMY

The history of 3-D displays is well summarized in several works. Okoshi (1) and McAllister (2) each present histories of the development of 3-D technologies. Those interested in a history beginning with Euclid will find (3) of interest. Most 3-D displays fit into one or more of three broad categories: stereo pair, holographic, and multiplanar or volumetric. Stereo pair-based technologies distribute left and right views of a scene independently to the left and right eyes of the viewer.
Often, special viewing devices are required to direct the appropriate view to the correct eye and block the incorrect view to the opposite eye. If no special viewing devices are required, then the technology is called autostereoscopic. The human visual system processes the images, and if the pair of images is a stereo pair, described later, most viewers will perceive depth. Only one view of a scene is possible per image pair, which means that the viewer cannot change position and see a different view of the scene. We call such images ‘‘virtual.’’ Some displays include head-tracking devices to simulate head motion or ‘‘look around.’’ Some technologies allow displaying multiple views of the same scene, providing motion parallax as the viewer moves the head from side to side. We discuss these technologies here. In general, holographic and multiplanar images produce ‘‘real’’ or ‘‘solid’’ images, in which binocular disparity, accommodation, and convergence are consistent with the
apparent depth in the image. They require no special viewing devices and hence are autostereoscopic. Holographic techniques are discussed elsewhere. Multiplanar methods are discussed later.

Stereo Pairs

The production of stereoscopic photographs (stereo pairs or stereographs) began in the early 1850s. Stereo pairs simulate the binocular disparity depth cue by projecting distinct (normally flat) images to each eye. There are many techniques for viewing stereo pairs. One of the first was the stereoscope, which used stereo card images that can still be found in antique shops. A familiar display technology, which is a newer version of the stereoscope, is the ViewMaster and its associated circular reels (Fig. 1). Because some of the displays described are based on the stereo pair concept, some stereo terminology is appropriate.

Terminology. Stereo pairs are based on presenting two different images, one for the left eye (L) and the other for the right eye (R). Stereo images produced photographically normally use two cameras that are aligned horizontally and have identical optics, focus, and zoom. To quantify what the observer sees on the two images, we relate each image to a single view of the scene. Consider a point P in a scene being viewed by a binocular viewer through a window (such as the film plane of a camera). The point P is projected on the window surface, normally a plane perpendicular to the observer's line of sight, such as the camera film plane, the face of a CRT, or a projection screen. This projection surface is called the stereo window or stereo plane. We assume that the y axis lies in a plane that is perpendicular to the line through the observer's eyes.
The distance between the eyes is called the interocular distance. If we assign a Cartesian coordinate system to the plane, the point P appears in the left-eye view at coordinates (xL, yL) and in the right-eye view at coordinates (xR, yR). These two points are called homologous. The horizontal parallax of the point P is the distance xR − xL between the left- and right-eye views; the vertical parallax is yR − yL (Fig. 2). Positive parallax occurs if the point appears behind the stereo window because the left-eye view is to the left of the right-eye view. Zero parallax occurs if the point is at the same depth as the stereo window; zero parallax defines the stereo window. Negative parallax occurs if the point lies in front of the stereo window (Fig. 3). Given the previous geometric assumptions, the vertical parallax or vertical disparity should always be zero. Misaligned cameras can produce nonzero vertical parallax. Observers differ about the amount they can tolerate before getting side effects such as headache, eye strain, nausea,
Figure 2. Horizontal parallax.
Figure 1. ViewMaster.

Figure 3. Positive/negative parallax. P1 appears behind the stereo window, P3 at the plane of the window, and P2 in front of the window.
or other uncomfortable physical symptoms. Henceforth, the term parallax will mean horizontal parallax. If the horizontal parallax is too large and exceeds the maximum parallax, our eyes must go wall-eyed to view the points, a condition in which each eye rotates outward. After lengthy exposure, this can produce disturbing physical side effects. Images in which the parallax is reversed are said to have pseudostereo. Such images can be very difficult to fuse; the human visual system has difficulty recognizing the binocular disparity, and other depth cues compete with and overwhelm it. Parallax and convergence are the primary vehicles for determining perceived depth in a stereo pair; the observer focuses both eyes on the plane of the stereo window. Hence, accommodation is fixed. In such cases, accommodation and convergence are said to be ‘‘disconnected,’’ and the image is ‘‘virtual’’ rather than ‘‘solid’’ (see the section on volumetric images later). This inconsistency between accommodation and convergence can make stereo images difficult for some viewers to fuse. If you cannot perceive depth in a stereo pair, you may be a person who is ‘‘stereo-blind’’ and cannot fuse stereo images (interpret them as a 3-D image rather than two separate 2-D images). There are many degrees of stereo-blindness, and the ability or inability to see stereo may depend on the presentation technique, whether the scene is animated, color consistency between the L/R pair, and many other considerations.

Computation of Stereo Pairs. Several methods have been proposed for computing stereo pairs in a graphics environment. Certain perception issues eliminate some techniques from consideration. A common technique for computing stereo pairs involves rotating a 3-D scene about an axis parallel to the sides of the viewing screen, followed by a perspective projection.
This process can cause vertical displacement because of the foreshortening that occurs in a perspective projection. Hence, the technique is not recommended. Although parallel projection will not produce vertical displacement, the absence of linear perspective can create a ‘‘reverse’’ perspective as the result of a perceptual phenomenon known as Emmert’s law: objects that do not obey linear perspective can appear to get larger as the distance from the observer increases. The preferred method for computing stereo pairs is to use two off-axis centers of perspective projection (corresponding to the positions of the left and right eyes). This method simulates the optics of a stereo camera in which both lenses are parallel. For further details, see (2).

OVERVIEW OF DISPLAY TECHNOLOGIES

Separating Left- and Right-Eye Views

When viewing stereo pairs, a mechanism is required so that the left eye sees only the left-eye view and the right eye sees only the right-eye view. Many mechanisms have been proposed to accomplish this. The ViewMaster uses two images, each directed to the appropriate eye by lenses. The images are shown in parallel, and there is no way one eye can see any part of the other eye’s view.
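Returning to the computation of stereo pairs, the recommended off-axis construction can be sketched in a few lines. This is a minimal pinhole model, not any particular package's API; the function names, the coordinate convention (the stereo window at z = 0, positive z behind it), and the 6.5-cm eye separation are illustrative assumptions.

```python
# A minimal sketch of the off-axis method: each eye is a separate center
# of perspective projection, and both project onto the same plane (the
# stereo window).  Units are assumed to be centimeters.

def project(point, eye_x, d):
    """Perspective-project a 3-D point onto the plane z = 0 as seen from
    an eye at (eye_x, 0, -d).  Positive z is behind the window."""
    x, y, z = point
    t = d / (d + z)                      # similar triangles (z > -d assumed)
    return (eye_x + (x - eye_x) * t, y * t)

def stereo_pair(point, eye_sep=6.5, viewing_dist=60.0):
    half = eye_sep / 2.0
    return (project(point, -half, viewing_dist),
            project(point, +half, viewing_dist))

# A point behind the window (z > 0) produces positive parallax:
L, R = stereo_pair((0.0, 0.0, 30.0))
parallax = R[0] - L[0]                   # > 0: appears behind the window
```

Note that no rotation appears anywhere in this construction, so no vertical parallax is introduced.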
It is common in display technologies to use a single screen to reflect or display both images, either simultaneously (time parallel) or in sequence (time multiplexed or field sequential). The technologies used to direct the appropriate image to each eye while avoiding mixing of the left- and right-eye images require sophisticated electro-optics or shuttering. Some of the more common methods are described here.

Cross Talk

Stereo cross talk occurs when a portion of one eye's view is visible to the other eye. The image can appear blurred, or a second or double image appears in regions of the scene being viewed, a phenomenon called ghosting. Cross talk can create difficulty in fusing L/R views, and it can be a problem whenever the same display surface is used to project both eye views. When stereo displays are evaluated, cross talk should be addressed.

Field-Sequential Techniques

A popular method for viewing stereo on a single display device is the field-sequential or time-multiplexed technique. The L/R views are alternated on the display device, and a blocking mechanism is required to prevent the left eye from seeing the right-eye view and vice versa. The technology for field-sequential presentation has progressed rapidly. Historically, mechanical devices were used to occlude the appropriate eye view during display refresh. A comparison of many of these older devices can be found in (4). Newer technologies use electro-optical methods such as liquid-crystal plates. These techniques fall into two groups: those that use active and those that use passive viewing glasses. In a passive system, a polarizing shutter is attached to the display device, as in a CRT, or the screen produces polarized light automatically, as in an LCD panel. The system polarizes the left- and right-eye images in orthogonal directions (linear or circular), and the user wears passive polarized glasses whose polarization axes are also orthogonal.
The polarizing lenses of the glasses combine with the polarized light from the display device to act as blocking shutters to each eye. When the left eye view is displayed, the light is polarized along an axis parallel to the axis of the left-eye lens and the left eye sees the image on the display. Because the axis is orthogonal to the polarizer of the right eye, the image is blocked to the right eye. The passive system permits several people to view the display simultaneously and allows a user to switch viewing easily from one display device to another because no synchronization with the display device is required. It also permits a larger field of view (FOV). The drawback is that the display device must produce a polarized image. Projector mechanisms must have polarizing lenses, and a CRT or panel display must have a polarizing plate attached to or hanging in front of the screen or the projector. When projecting an image on a screen, the screen must be coated with a material (vapor-deposited aluminum) that does not depolarize the light (the commercially available ‘‘silver’’
screen). Polarization has the added disadvantage that the efficiency, or transmission, is poor; the intensity of the light reaching the viewer compared to the light emitted from the display device is very low, often in the range of 30%. Hence, images appear dark. LCDs can also be used as blocking lenses. An electronic pulse provided by batteries or a cable causes the lens to ‘‘open’’ or admit light from the display device. When no electronic pulse is present, the lens is opaque and blocks the eye from seeing the display device. The pulses are alternated for each eye while the display device alternates the image produced. The glasses must be synchronized to the refresh of the display device, normally using an infrared signal or a cable connection. For CRT-based systems, this communication is accomplished using the stereo-sync or Left/Right (L/R) signal. In 1997, the Video Electronics Standards Association (VESA) called for the addition of a standard jack that incorporates the L/R signal along with a +5 volt power supply output. Using this standard, stereo equipment can be plugged directly into a stereo-ready video card that has this jack. Active glasses have the advantage that the display device does not have to polarize the light before it reaches the viewer. Hence, efficiency is higher, and back-projection can be used effectively. The disadvantage is obviously the requirement for synchronization. Though the initial cost of the passive system is higher, the cost to add another user is low. This makes the passive system a good choice for theaters and trade shows, for example, where one does not want to expose expensive eyewear to abuse. If the images in both systems are delivered at a sufficiently fast frame rate (120 Hz) to avoid flicker, the visual system will fuse the images into a three-dimensional image. Most mid- to high-end monitors can do this. A minimum of 100 Hz is acceptable for active eyewear systems.
One may be able to use 90 Hz for a passive system without perceiving flicker, even in a well-lit room.
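The refresh and brightness figures quoted above reduce to simple arithmetic. The sketch below restates them; the numbers are those from the text, and the helper name is illustrative.

```python
# Back-of-envelope refresh and brightness figures for field-sequential
# stereo, using the rates and ~30% polarizer transmission quoted above.

def per_eye_rate(field_rate_hz):
    """In a field-sequential system, each eye sees half the fields."""
    return field_rate_hz / 2

assert per_eye_rate(120) == 60    # flicker-free for most viewers
assert per_eye_rate(100) == 50    # quoted minimum for active eyewear
assert per_eye_rate(90) == 45     # sometimes acceptable for passive systems

# Polarized (passive) systems often deliver only ~30% of the emitted
# light to the viewer, which is why the images appear dark:
emitted_lumens = 1000.0
viewer_lumens = emitted_lumens * 0.30
```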
Time-Parallel Techniques

Time-parallel methods present both eye views to the viewer simultaneously and use optical techniques to direct each view to the appropriate eye. Early 3-D movies often used the anaglyph method, which requires the user to wear glasses that have red and green lenses or filters. Both images were presented on the screen simultaneously; hence, it is a time-parallel method. Many observers suffered headaches and nausea when leaving the theater, which gave 3-D, and stereo in particular, a bad reputation. (A phenomenon called ghosting or cross talk was a significant problem. Colors were not adjusted correctly, and the filters did not completely eliminate the opposite-eye view, so that the left eye saw its image and sometimes part of the right-eye image as well. Other problems included poor registration of the left- and right-eye images, which caused vertical parallax, and projectors out of sync.) The ViewMaster is another example of a time-parallel method. An early technique for viewing stereo images on a CRT was the half-silvered mirror originally made for viewing microfiche (4). The device had polarizing sheets, and the user wore polarized glasses that distributed the correct view to each eye. Polarizing filters can also be attached to glass-mounted slides. Incorrect positioning of the projectors relative to the screen can cause keystoning, in which the image is trapezoidal because of foreshortening, resulting in vertical parallax. If more than one projector is used, as is often the case when projecting 35-mm stereo slides, for example, orthogonal polarizing filters are placed in front of each projector, and both left- and right-eye images are projected simultaneously onto a nondepolarizing screen. Hence, the technique is time parallel. The audience wears passive glasses in this case. Using more than one projector always brings with it the difficulty of adjusting the images: L/R views should be correctly registered, and there must be minimal luminosity differences, size differences, keystoning, vertical parallax, ghosting, and so forth. Most nonautostereoscopic display systems use one of these methods; in what follows, we indicate which.

3-D DISPLAYS: VIEWING DEVICES REQUIRED

Hard Copy

Anaglyphs. The anaglyph method has been used for years to represent stereo pairs, and it was a salient technique in old 3-D movies and comic books. Colored filters cover each eye; red/green, red/blue, or red/cyan filters are the most common. One eye's image is displayed in red and the other in green, blue, or cyan, so that the appropriate eye sees the correct image. Because both images appear simultaneously, it is a time-parallel method. The technique is easy to produce using simple image processing, and the cost of viewing glasses is very low. Gray-scale images are most common, although pseudocolor and polychromatic anaglyphs are becoming more common. If correctly done, anaglyphs can be an effective method for presenting stereo images.
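The simplicity of anaglyph construction can be seen in a short sketch: take the red channel from the left-eye image and the green/blue (cyan) channels from the right-eye image. The helper below is hypothetical and uses plain nested lists of (r, g, b) tuples to stay dependency-free; a real implementation would use an imaging library.

```python
# Minimal red/cyan anaglyph from a stereo pair: red from the left-eye
# image, green and blue (cyan) from the right-eye image, pixel by pixel.

def anaglyph(left, right):
    """Combine a stereo pair into a single red/cyan anaglyph image."""
    out = []
    for row_l, row_r in zip(left, right):
        out.append([(l[0], r[1], r[2]) for l, r in zip(row_l, row_r)])
    return out
```

Viewed through the red filter, only the left-eye content survives; through the cyan filter, only the right-eye content, so each eye receives its intended view.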
Vectographs. Polaroid’s Vectograph process was introduced by Edwin Land in 1940. The earliest Vectograph images used extensively were black-and-white polarizing images formed by iodine ink applied imagewise to oppositely oriented polyvinyl alcohol (PVA) layers laminated to opposite sides of a transparent base material. The iodine forms short polymeric chains that readily align with the oriented polymeric molecules and stain the sheet. The chemistry is analogous to that of uniformly stained iodine polarizers, such as Polaroid H-sheet, used in polarizing filters for stereo projection and in 3-D glasses used for viewing stereoscopic images [see (2) for more details]. In 1953, Land demonstrated three-color Vectograph images formed by successive transfer of cyan, magenta, and yellow dichroic dyes from gelatin relief images to Vectograph sheet. Unlike StereoJet digital ink-jet printing described next, preparation of Vectograph color images required lengthy, critical photographic and dye transfer steps. Although the process produced excellent images, it was never commercialized.
StereoJet. The StereoJet process, developed at the Rowland Institute for Science in Cambridge, Massachusetts, provides stereoscopic hard copy in the form of integral, full-color polarizing images. StereoJet images are produced by ink-jet printing that forms polarizing images using inks formulated from dichroic dyes. Paired left-eye and right-eye images are printed onto opposite surfaces of a clear multilayer substrate, as shown in Fig. 4. The two outer layers, formed of an ink-permeable polymer such as carboxymethylcellulose, meter the ink as it penetrates the underlying image-receiving layers. The image-receiving layers are formed of polyvinyl alcohol (PVA) molecularly oriented at 45° to the edge of the sheet. As the dye molecules are adsorbed, they align with the oriented polymer molecules and assume the same orientation. The two PVA layers are oriented at 90° to one another, so that the images formed have orthogonal polarization. StereoJet transparencies are displayed directly by rear illumination or projected by overhead projector onto a nondepolarizing screen, such as a commercially available lenticular ‘‘silver’’ screen. No attachments to the projector are needed because the images themselves provide the polarization. StereoJet prints for viewing by reflected light have an aluminized backing laminated to the rear surface of the transparency.

ChromaDepth. Chromostereoscopy is a phenomenon in optics commercialized by Richard Steenblik (2). The technique originally used double-prism glasses that slightly deflect different colors in an image, laterally displacing the visual positions of differently colored regions by different amounts. The prisms are oriented in opposite directions for each eye, so that different images are presented to each eye, thereby creating a stereo pair (Fig. 5). Production chromostereoscopic glasses, marketed under the name
Figure 4. StereoJet imaging. (Each image appears in full contrast to its intended eye and is invisible to the other.)
Figure 5. Superchromatic glasses.
ChromaDepth 3-D, use a unique micro-optic film that performs the same optical function as double-prism optics without the attendant weight and cost. Images designed for viewing with ChromaDepth 3-D glasses use color to encode depth information. A number of color palettes have been successfully employed; the simplest is the RGB on Black palette: on a black background, red will appear closest, green in the middleground, and blue in the background. Reversal of the optics results in the opposite depth palette: BGR on Black. A peculiar feature of the ChromaDepth 3-D process is that the user does not have to create a stereo pair. A single ChromaDepth 3-D color image contains X, Y, and Z information by virtue of the image contrast and the image colors. The stereo pair seen by the user is created by the passive optics in the ChromaDepth 3-D glasses. The primary limitation of the ChromaDepth 3-D process is that the colors in an image cannot be arbitrary if they are to carry the image’s Z dimension; so the method will not work on arbitrary images. The best effects are obtained from images that are specifically designed for the process and from natural images, such as underwater reef photographs, that have natural coloring fitting the required palette. Another limitation is that some color ‘‘fringing’’ can occur when viewing CRT images. The light emitted from a CRT consists of different intensities of red, green, and blue; any other color created by a CRT is a composite of two or more of these primary colors. If a small region of a composite color, such as yellow, is displayed on a CRT, the optics of the ChromaDepth 3-D glasses may cause the composite color to separate into its primary components and blur the region. ChromaDepth 3-D high definition glasses reduce this problem by placing most of the optical power in one eye, leaving the other eye to see the image clearly. The ChromaDepth 3-D technique can be used in any color medium. 
It has found wide application in laser shows and in print, video, television, computer graphic, photographic slide, and Internet images. Many areas of research have benefited from ChromaDepth 3-D, including interactive visualization of geographic and geophysical data.
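The ‘‘RGB on Black’’ palette described above amounts to a mapping from depth to hue: nearest depths red, middle depths green, farthest blue. The sketch below is one such mapping under assumed names and a simple linear interpolation scheme; it is not the commercial ChromaDepth encoding.

```python
# Hedged sketch of an RGB-on-Black depth palette: red = nearest,
# green = middle ground, blue = background.  The linear ramps are an
# assumption for illustration only.

def depth_to_rgb(z, z_near, z_far):
    """Map depth z in [z_near, z_far] to a red -> green -> blue ramp."""
    t = (z - z_near) / (z_far - z_near)    # 0 = nearest, 1 = farthest
    t = min(max(t, 0.0), 1.0)
    if t < 0.5:                            # red -> green
        u = t / 0.5
        return (int(255 * (1 - u)), int(255 * u), 0)
    u = (t - 0.5) / 0.5                    # green -> blue
    return (0, int(255 * (1 - u)), int(255 * u))
```

Reversing the ramp (blue nearest, red farthest) gives the opposite ‘‘BGR on Black’’ palette mentioned in the text, for use with reversed optics.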
Transparency Viewers. Cheap plastic and cardboard slide viewers are available from many companies, such as Reel 3-D Enterprises (http://stereoscopy.com/reel3D/index.html), for viewing 35-mm stereo slides. The user places the left-eye view in the left slot and the right-eye view in the right slot and then holds the viewer up to the light. This is a standard technique for checking the mounting of slides for correct registration.

Field-Sequential Devices

StereoGraphics Systems. Although there are many manufacturers of active and passive glasses systems, StereoGraphics is a well-known company that has produced high-quality CRT and RGB projector based stereo systems for years. The quality of their hardware is excellent, and we report on it here. Active StereoGraphics shutters, called CrystalEyes (Fig. 6), are doped, twisted-nematic devices. They ‘‘open’’ in about 3 ms and ‘‘close’’ in about 0.2 ms. The shutter transition occurs within the vertical blanking period of the display device and is all but invisible. The principal figure of merit for such shutters is the dynamic range: the ratio of the transmission of the shutter in its open state to that in its closed state. The CrystalEyes system has a ratio in excess of 1000 : 1. The transmission of the shutters is commonly 32%, but because of the 50% duty cycle, the effective transmission is half that. The transmission should be neutral and impart little color shift to the image being viewed. The field of view (FOV) also varies; 97° is typical. The shutters can operate at speeds up to 200 fields per second. The cost for eyewear and emitter is $1000. Passive systems have a lower dynamic range than active eyewear systems. The phosphor afterglow on the CRT causes ghosting, or image cross talk, in this type of system. Electrode segmentation can be used to minimize the time during which the modulator is passing an unwanted image.
The modulator’s segments change state moments before the CRT’s scanning beam arrives at that portion of the screen. The consequence of this action is a modulator that changes state just as the information is changing. This increases the effective dynamic range of the system and produces a high-quality stereo image.
Figure 6. Active glasses CrystalEyes system.
Figure 7. Passive glasses ZScreen system.
This technique is used by StereoGraphics in their ZScreen system (Fig. 7). A monitor ZScreen system costs $2200. The above-and-below format is used on personal computers that do not have a stereo sync output. The left image is placed on the top half of the CRT screen and the right image on the bottom half, thus reducing the resolution of the image. Chasm Graphics makes a software program called Sudden Depth that formats the images this way. The stereo information then exists but needs an appropriate way to send each L/R image to the proper eye. The StereoGraphics EPC-2 performs this task. The EPC-2 connects to the computer’s VGA connector and intercepts the vertical sync signal. When enabled, the unit adds an extra vertical sync pulse halfway between the existing pulses, causing the monitor to refresh at twice the original rate. In effect, this stretches the two images to fill the whole screen and show field-sequential stereo. The EPC-2 acts as an emitter for CrystalEyes or can be used to create a left/right signal to drive a liquid-crystal modulator or other stereo product. The EPC-2 is the same size as the other emitters and has approximately the same range. Its cost is $400.

The Pulfrich Technique. Retinal sensors require a minimum number of light photons to fire and send a signal to the visual system. By covering one eye with a neutral-density filter (like a lens in a pair of sunglasses), the light from a source is slightly delayed to the covered eye. Hence, if an object is in motion in a scene, the eye that has the filter cover sees the position of the object later than the uncovered eye. Therefore, the images perceived by the left and right eyes will be slightly different, and the visual system will interpret the result as a stereo pair.
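The size of the delay-induced disparity is roughly the object's screen velocity multiplied by the extra latency of the filtered eye. The numbers below (a 15-ms delay, 300 px/s motion) are illustrative assumptions; the actual delay depends on how strongly the filter dims the image.

```python
# Pulfrich-effect disparity sketch: disparity ~ velocity x delay.
# Positive velocity means left-to-right motion on the screen.

def pulfrich_disparity(velocity_px_per_s, delay_s):
    """Apparent horizontal offset (pixels) between the two eyes' views
    of a moving object when one eye's signal is delayed."""
    return velocity_px_per_s * delay_s

# Object moving right to left at 300 px/s, filtered eye delayed 15 ms:
# the delayed eye sees the object about 4.5 px behind its true position.
offset = pulfrich_disparity(-300.0, 0.015)
```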
If the motion of an object on a display device is right to left and the right eye is covered by the filter, then a point on the object will be seen by the left eye before the right eye. This will be interpreted by the visual system as positive parallax, and the object will appear to move behind the stereo window. Similarly, an object moving from left to right will appear in front of the display device. The reader can implement the technique easily using one lens of a pair of sunglasses while watching TV.

The Fakespace PUSH Display. Fakespace Labs’ PUSH desktop display uses a box-shaped binocular viewing device that has attached handles and is mounted on a triad of cylindrical sensors (Fig. 8). The device allows the user to move the viewing device and simulate limited movement within a virtual environment. The field of view can be as large as 140° on CRT-based systems. The cost is US $25,000 for the 1024 × 768 CRT version and US $9,995 for the 640 × 480 LCD version. A variation that permits more viewer movement is the Boom (Fig. 9). The binocular viewing device is attached to a large arm configured like a 3-D digitizer that signals the position of the viewer using sensors at the joints of the arm. The viewer's motion is extended to a circle 6 ft in diameter; vertical movement is limited to 2.5 ft. The Boom sells for US $60,000. A hands-free version is available for US $85,000.
Figure 9. Fakespace Lab’s Boom.
Workbench Displays. Smaller, adjustable table-based systems such as the Fakespace ImmersaDesk R2 (Fig. 10) and ImmersaDesk M1 are available. The systems use the active glasses stereo technique. The fully portable R2 sells for approximately US $140,000, including tracking. The M1 sells for US $62,995.
Figure 10. Fakespace ImmersaDesk R2.
Figure 8. The Fakespace Lab’s PUSH desktop display.
VREX Micropolarizers. VREX has patented what they call µPol (micropolarizer) technology, an optical device that can change the polarization of an LCD display line by line. It is a periodic array of microscopically small polarizers that spatially alternate between mutually perpendicular polarizing states. Each micropolarizer can be as small as 10 millionths of a meter. Hence, a µPol can have more than 6 million micropolarizers of alternating polarization states per square inch in a checkerboard configuration, or more than 2,500 lines per inch in a one-dimensional configuration. In practice, the µPol encodes the left-eye image on even lines and the right-eye image on odd lines. Passive polarized glasses are needed to view the image.
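The line-by-line encoding just described can be sketched as a simple row interleave. The helper below is hypothetical (it is not VREX's software); images are represented as plain lists of rows.

```python
# Row interleaving for a micropolarizer display: left-eye image on even
# lines, right-eye image on odd lines, forming one composite frame.

def interleave_rows(left, right):
    """Build the composite single-frame stereoscopic image."""
    return [l if i % 2 == 0 else r
            for i, (l, r) in enumerate(zip(left, right))]

composite = interleave_rows(["L0", "L1", "L2", "L3"],
                            ["R0", "R1", "R2", "R3"])
# composite == ["L0", "R1", "L2", "R3"]
```

Each eye therefore receives only half the vertical resolution, which is the trade-off noted below for running at lower refresh rates.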
The format requires a single-frame stereoscopic image that combines a left-eye perspective view with a right-eye perspective view to form a composite image containing both left- and right-eye information alternating line by line. VREX provides software to combine left- and right-eye views into a single image, and all VREX hardware supports this image format. The advantages of µPol include the ability to run at lower refresh rates, because both eyes are presented with a (lower resolution) image simultaneously; hence, the presentation is time parallel.

LARGE FORMAT DISPLAYS

One of the objects of virtual reality is to give the user the feeling of immersion in a scene. This has been accomplished in various ways. Head-mounted displays are a common solution, but in general they have a limited field of view and low resolution. In addition, allowing the user to move in space requires position tracking, which has been a difficult problem to solve. Position tracking introduces image lag: the time required to sense that the viewer’s position has changed, signal the change to the graphics system, render the scene change, and transmit it to the head-mounted display. Any system that must track the viewer and change the scene accordingly must address this problem; the lag can produce motion sickness in some people. Projection systems have been developed that use large projection surfaces to simulate immersion. In some cases, the user is permitted to move about; in others, the user is stationary, and the scene changes.
Figure 11. Fakespace CAVE, front view.
IMAX

Most readers are familiar with the large-screen IMAX system, which employs a large flat screen to give the illusion of peripheral vision. When projecting stereo, IMAX uses the standard field-sequential polarized projection mechanism in which the user wears passive glasses. Similar techniques are used in the Kodak flat-screen 3-D movies at Disney.

Fakespace Systems Displays

Fakespace Systems markets immersive displays that are similar to immersive technologies produced by several other companies. The walk-in, fully immersive CAVE is an extension of flat-screen stereo. The CAVE system was developed at the Electronic Visualization Laboratory of the University of Illinois; the user stands in a 10 × 10 ft room that has flat walls (Figs. 11 and 12). A separate stereo image is back-projected onto each wall, the floor, and possibly the ceiling to give the user the feeling of immersion. Image management is required so that the scenes on each wall fit together seamlessly to replicate a single surrounding environment. Because the system uses back-projection, it requires active shuttering glasses. The user can interact with the environment using 3-D input devices such as gloves and other navigational tools. The system sells for
Figure 12. Fakespace CAVE, inside.
approximately US $325,000 to $500,000, depending on the projection systems used. Fakespace also produces an immersive WorkWall whose screen size is up to 8 × 24 ft (Fig. 13). The system uses two or more projectors whose images blend to create a seamless image. As in the CAVE, the user can interact with the image using various 2-D and 3-D input devices. The cost is approximately US $290,000 for an 8 × 24 ft three-projector system.

The VisionDome

Elumens Corporation Vision Series displays (5–8) use a hemispherical projection screen that has a single projection lens. Previous dome-based systems relied on multiple projectors and seamed-together output from multiple computers, making them both complicated to configure and prohibitively expensive. The high cost, complexity, and nonportability of these systems made them suitable for highly specialized military and training applications, but they were impractical and out of reach for most corporate users. Available in sizes from 1.5 to 5 meters in diameter, which accommodate from one to forty
people, the VisionDome systems range in price from US $15,000 to US $300,000. The projector is equipped with a patented ‘‘fish-eye’’ lens that provides a 180° field of view. This single projection source completely fills the concave screen with light. Unlike other fish-eye lenses, whose projections produce focal ‘‘hot spots’’ and nonlinear distortions, the Vision Series lens uses linear angular projection to provide uniform pixel distribution and uniform pixel size across the entire viewing area. The lens also provides an infinite depth of field, so images remain in focus on screens from 0.5 meters away to theoretical infinity at all points on the projection surface. The single-user VisionStation displays 1024 × 768 pixels at 1000 lumens; larger 3- to 5-meter VisionDomes display up to 1280 × 1024 pixels at 2000 lumens (Fig. 14). Elumens provides an application programming interface called SPI (Spherical Projection of Images). Available for both OpenGL and DirectX applications, SPI is an image-based methodology for displaying 3-D data on a curved surface. It enables off-axis projection that permits arbitrary placement of the projector on the face plane of the screen. The image-based projection depends on the viewer position; if the viewer moves, the image must change accordingly, or straight lines become curved. The number of viewers within the viewing ‘‘sweet spot’’ increases as the screen diameter increases. Field-sequential stereo imaging with synchronized shutter glasses is supported on Elumens products. The maximum refresh rate currently supported is 85 Hz (42.5 Hz stereo pair). Passive stereo that projects left- and right-eye images simultaneously but of opposite polarization is currently under development.

Figure 13. The Fakespace WorkWall.

AUTOSTEREOSCOPIC DISPLAYS: NO VIEWING DEVICES REQUIRED
Hard Copy

Free Viewing. With practice, most readers can view stereo pairs without the aid of blocking devices by using a technique called free viewing. There are two types of free viewing, distinguished by the way the left- and right-eye images are arranged. In parallel, or uncrossed, viewing, the left-eye image is to the left of the right-eye image. In transverse, or cross, viewing, they are reversed, and crossing the eyes is required to form an image in the center. Some people can do both types of viewing, some only one, some neither. In Fig. 15, the eye views have been arranged in left/right/left order. To parallel view, look at the left two images. To cross view, look at the right two images. Figure 16 is a random dot autostereogram in which the scene is encoded in a single image, as opposed to a stereo pair (9). There are no depth cues other than binocular disparity. Using cross viewing, merge the two dots beneath the image to view the functional surface. Crossing your eyes even further will produce other images. [See (10) for a description of the method for generating these interesting images.]

Holographic Stereograms. Most readers are familiar with holographic displays, which reconstruct solid images. Normally, a holographic image of a three-dimensional scene has the ‘‘look around’’ property. A popular combination of holography and stereo pair technology, called a holographic stereogram, involves recording a set of 2-D images, often perspective views of a scene, on a piece of holographic film. The film can be bent to form a cylinder, so that the user can walk around the cylinder to view the scene from any aspect. At any point, the left eye sees one view of the scene and the right eye another; that is, the user is viewing a stereo pair.
Figure 14. The VisionDome.

Figure 15. Free viewing examples (left-eye view, right-eye view, left-eye view).
Figure 16. A random dot autostereogram of cos[(x^2 + y^2)^(1/2)] for −10 ≤ x, y ≤ 10.
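A minimal generator for single-image random-dot autostereograms in the spirit of Fig. 16 can be sketched as follows. The constraint scheme (each pixel repeats the pixel one depth-dependent separation to its left) is a simplified version of published methods such as those referenced in (10); all parameter names and values are illustrative assumptions.

```python
# Single-image random-dot autostereogram sketch: rows are seeded with
# random dots, and pixels repeat at a separation that shrinks where the
# encoded surface is nearer, so fusing adjacent repeats reveals depth.

import random

def autostereogram(depth, max_sep=60, depth_scale=20):
    """depth: 2-D list of values in [0, 1], where 1 = nearest.
    Returns a 2-D list of 0/1 pixels."""
    rows = []
    for drow in depth:
        row = []
        for x, z in enumerate(drow):
            sep = max_sep - int(depth_scale * z)   # nearer => smaller gap
            if x >= sep:
                row.append(row[x - sep])           # repetition constraint
            else:
                row.append(random.randint(0, 1))   # seed with random dots
        rows.append(row)
    return rows

# A flat depth map yields rows that are periodic with period max_sep.
img = autostereogram([[0.0] * 200])
```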
Conventional display holography has long been hampered by many constraints such as limitations with regard to color, view angle, subject matter, and final image size. Despite the proliferation of holographic stereogram techniques in the 1980s, the majority of the constraints remained. Zebra Imaging, Inc. expanded on the developments in one-step holographic stereogram printing techniques and has developed the technology to print digital full-color reflection holographic stereograms that have a very wide view angle (up to 110° ), are unlimited in size, and have full parallax. Zebra Imaging’s holographic stereogram technique is based on creating an array of small (1- or 2-mm) square elemental holographic elements (hogels). Much like the pixels of two-dimensional digital images, hogel arrays can be used to form complete images of any size and resolution. Each hogel is a reflection holographic recording on panchromatic photopolymer film. The image recorded in each hogel is of a two-dimensional digital image on a spatial light modulator (SLM) illuminated by laser light in the three primary colors: red, green, and blue (Fig. 17).
Parallax Barrier Displays. A parallax barrier (2) consists of a series of fine vertical slits in an otherwise opaque medium. The barrier is positioned close to an image that has been recorded in vertical slits and is backlit. If the vertical slits in the image have been sampled at the correct frequency relative to the slits in the parallax barrier and the viewer is at the required distance from the barrier, then the barrier occludes the appropriate image slits to the right and left eyes, respectively, and the viewer perceives an autostereoscopic image (Fig. 18). The images can be made panoramic to some extent by recording multiple views of a scene. As the viewer changes position, different views of the scene are directed by the barrier to the visual system. The number of views is limited by the optics; hence, moving horizontally beyond a certain point produces ‘‘image flipping,’’ or cycling of the different views of the scene. High-resolution laser printing has made it possible to produce very high quality images: the barrier is printed on one side of a transparent medium and the image on the other. This technique was pioneered by Artn in the early 1990s to produce hard-copy displays and is now being used by Sanyo for CRT displays.

Lenticular Sheets. A lenticular sheet (1,2) consists of a series of semicylindrical vertical lenses called ‘‘lenticles,’’ typically made of plastic. The sheet is designed so that parallel light entering the front of the sheet is focused onto strips on the flat rear surface (Fig. 19). By recording an image in strips consistent with the optics of the lenticles, as in the parallax barrier display, an autostereoscopic panoramic image can be produced. Because these displays depend on refraction rather than occlusion, the brightness of a lenticular sheet display is usually superior to that of a parallax barrier display, and no backlighting is required. Such displays have been mass-produced for many years for such hard-copy media as postcards.
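The required slit geometry follows from similar triangles. The sketch below is an idealized model under assumed viewing conditions (65-mm eye separation, 600-mm viewing distance, 3-mm barrier-to-image gap); the names and numbers are illustrative, not figures from the article.

```python
# Idealized parallax barrier geometry: image strips lie a gap g behind
# the barrier, and the viewer's eyes, separated by e, are a distance D
# from the barrier.  Each slit must show adjacent left/right strips to
# the two eyes.

def barrier_design(eye_sep_mm, view_dist_mm, gap_mm):
    # The two eyes' sight lines through one slit diverge by one strip
    # width at the image plane (similar triangles):
    strip_mm = eye_sep_mm * gap_mm / view_dist_mm
    # Slits repeat every two strips (one left + one right), slightly
    # foreshortened by the barrier-to-image gap:
    pitch_mm = 2 * strip_mm * view_dist_mm / (view_dist_mm + gap_mm)
    return strip_mm, pitch_mm

strip, pitch = barrier_design(65.0, 600.0, 3.0)
# strip = 0.325 mm; the slit pitch is slightly less than two strip widths
```

The model shows why the viewer must sit at the design distance: both strip width and slit pitch depend directly on D.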
In these two techniques, the image is recorded in strips behind the parallax barrier or the lenticular sheet. Although the techniques are old, recent advances in printing and optics have increased their popularity for both hard-copy and autostereoscopic CRT devices. In both the lenticular and parallax barrier cases, multiple views of a scene can be included to provide
Figure 17. Zebra Imaging holographic stereogram recording ("hogel" exposure: spatial light modulator, reference beam, converging lens, film image plane).
STEREO AND 3-D DISPLAY TECHNOLOGIES
motion parallax as viewers move their heads from side to side, creating what is called a panoramagram. Recently, parallax barrier liquid-crystal imaging devices have been developed that can be driven by a microprocessor and used to view stereo pairs in real time without glasses. Some of these techniques are discussed later.

Figure 18. Parallax barrier display (left-eye and right-eye image strips occluded by the barrier).

Figure 19. Lenticular sheet display (alternating left- and right-eye image strips).

Alternating Pairs
The output from two vertically mounted video cameras is combined. An integrating circuit merges the two video streams by recording a fixed number of frames from one camera, followed by the same number of frames from the other camera. The technique imparts a vertical rocking motion to the image. If the scene has sufficient detail and the speed of the rocking motion and the angle of rotation are appropriate for the individual viewing the system, most viewers will fuse a 3-D image. The system was commercialized under the name VISIDEP. The technique can be improved using graphical and image processing methods. More details can be found in (2).

Moving Slit Parallax Barrier
A variation of the parallax barrier is a mechanical moving-slit display popularized by Homer Tilton, who called it the Parallactiscope (2). A single vertical slit is vibrated horizontally in front of a point-plotting output display such as a CRT or oscilloscope. The image on the display is synchronized with the vibration to produce an autostereoscopic image. Many variants have been proposed, but to date the author knows of no commercially viable products using the technique.

The DTI System
The Dimension Technologies, Inc. (DTI) illuminator is used to produce what is known as a multiperspective autostereoscopic display. Such a display produces multiple images of a scene; each is visible from a well-defined region of space called a viewing zone. The images are all 2-D perspective views of the scene as it would appear from the centers of the zones. The viewing zones are of such a size and position that an observer sitting in front of the display always has one eye in one zone and the other eye in another. Because the two eyes see views of different perspective, a 3-D image is perceived. The DTI system is designed for use with an LCD or other transmissive display. The LCD is illuminated from behind, and the amount of light passing through individual elements is controlled to form a full-color image. The DTI system uses an LCD backlight technology that the company calls parallax illumination (11). Figures 20 and 21 illustrate the basic concept.

Figure 20. DTI illuminator (illumination plate at distance d behind the liquid-crystal display; light lines and pixels).

Figure 21. Viewing zones.

As shown in Fig. 20, a special illuminator is located behind the LCD. The illuminator generates a set of very thin, very bright, uniformly spaced vertical lines. The lines are spaced with respect to the pixel columns such that (because of parallax) the left eye sees all the lines through the odd columns of the LCD and the right eye sees them through the even columns. There is a fixed relation between the distance from the LCD to the illumination plate and the distance of the viewer from the display; this in part determines the extent of the viewing zones. As shown in Fig. 21, viewing zones are diamond-shaped areas in front of the display where all of the light lines are seen behind the odd or even pixel columns of the LCD. To display 3-D images, the left- and right-eye images of a stereoscopic pair are placed in alternate columns of elements: the left image appears in the odd columns, and the right image is displayed in the even columns. Both left and right images are displayed simultaneously, and hence
the display is time parallel. Because the left eye sees the light lines behind the odd columns, it sees only the left-eye image displayed in the odd columns. Similarly, the right eye sees only the right-eye image displayed in the even columns.

The 2-D/3-D Backlight System. There are many ways to create the precise light lines described before. One method used in DTI products is illustrated in Fig. 22 (12,13). The first component is a standard off-the-shelf backlight of the type used for conventional 2-D LCD monitors. This type of backlight uses one or two miniature fluorescent lamps as light sources in combination with a flat, rectangular light guide. Two straight lamps along the top and bottom of the guide are typically used for large displays; a single U-shaped lamp is typically used for smaller displays. An aluminized reflector is placed around the lamp(s) to reflect light into the light guide.
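The parallax-illumination geometry of Figs. 20 and 21 also reduces to similar triangles, but mirrored: because the light lines sit behind the pixel plane, the line pitch comes out slightly greater than twice the column pitch. A minimal sketch with hypothetical numbers (the gap uses the standard small-pitch approximation p·D/e of the exact p·D/(e − p)):

```python
def light_line_design(column_pitch_mm, eye_sep_mm=65.0, view_dist_mm=600.0):
    """Rear parallax illumination: light lines BEHIND the pixel plane.

    column_pitch_mm: pitch of one pixel column (odd-to-even spacing).
    Returns (gap, line_pitch): plate-to-LCD gap and light-line pitch.
    Mirroring the front-barrier case, the line pitch comes out slightly
    GREATER than twice the column pitch.
    """
    gap = column_pitch_mm * view_dist_mm / eye_sep_mm
    line_pitch = 2 * column_pitch_mm * (view_dist_mm + gap) / view_dist_mm
    return gap, line_pitch
```

For 0.1-mm columns viewed from 600 mm with a 65-mm eye base, the plate sits roughly 0.9 mm behind the pixel plane and the line pitch is just over 0.2 mm; these numbers are illustrative, not DTI specifications.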
The flat, rectangular light guide is typically made of acrylic or some other clear plastic. Light from the lamp enters the light guide from the sides and travels along it by total internal reflection at the front and back surfaces of the guide. The side of the light guide facing away from the LCD possesses a pattern of reflective structures designed to redirect light out of the guide through its front surface. Several possible choices for such structures exist, but current manufacturers usually use a simple pattern of white ink dots applied to the rear surface of the light guide in combination with a white reflective sheet placed behind the light guide. The second component is a simple secondary LCD which, in the "on" state, displays a pattern of dozens of thin, transparent lines that have thicker opaque black stripes between them. These lines are used for 3-D imaging as described in the previous section. The third major component is a lenticular lens, again shown in Fig. 22. This lens consists of a flat substrate on whose front surface hundreds of vertical, parallel cylindrical lenslets are molded. Light coming through the dozens of thin transparent lines on the secondary LCD is reimaged into thousands of very thin, evenly spaced vertical lines by the lenticular lens array spaced apart from and in front of the secondary LCD. The lines can be imaged onto an optional front diffuser located in a plane one focal length from the lenticular lenslets. The pitch (center-to-center distance) of the lines on the light guide and the lenticular lenses must be chosen so that the pitch of the light lines reimaged by the lenticular lenslets bears a certain relationship to the pitch of the LCD pixels.
Because the displays are likely to be used for conventional 2-D applications (such as word processing and spreadsheets) as well as 3-D graphics, the system must be capable of generating illumination such that each eye sees all of the pixels of the LCD, so that a conventional full-resolution 2-D image can be displayed by conventional software. When the secondary LCD is off, in other words in the clear state where no lines are generated, the even, diffuse light from the backlight passes through it freely and remains even and diffuse after passing through the lenticular lens. No light lines are imaged, and the observer sees even, diffuse illumination behind all of the pixels of the LCD. Each of the observer's eyes can therefore see all of the pixels on the LCD, and full-resolution 2-D images can be viewed. DTI sells two displays: a 15-inch at $1,699 (optional video input, $300 extra) and an 18.1-inch at $6,999 (video included). Both have 2-D and 3-D modes and accept the standard stereo formats (field sequential, frame sequential, side by side, top/bottom).

Seaphone Display
Figure 22. Backlight system (lamp and reflector, light guide, secondary LCD, lenticular lens, front diffuser).
Figure 23 shows a schematic diagram of the Seaphone display (14–16). A special transparent µPol-based color liquid-crystal imaging plate (LCD : SVGA 800 × 600) that has a lenticular sheet and a special backlight unit is used to produce a perspective image for each eye. The lenticular
Figure 23. A schematic of the Seaphone display (Fresnel lens, LCD, white LED array, lenticular sheet, polarizers, mirror, diffuser, infrared camera, image circuit, infrared illuminator).

Figure 25. Plan view of a backlight unit.
sheet creates vertical optical scattering. Horizontal strips of two types of micropolarizers that have orthogonal polarization axes overlie the odd and even lines of the LCD. The backlight unit consists of a large-format convex lens and a white LED array filtered by polarizers whose axes of polarization are the same as those of the µPol array. The large-format convex lens is arranged so that an image of the viewers is focused on the white LED array. The light from the white LED array illuminates the right half of a viewer's face via the odd (or even) field of the LCD when the geometrical condition is as indicated in Fig. 25. The viewer's right eye perceives the large convex lens as a full-size bright light source, and the viewer's left eye perceives it as a dark one; the complementary arrangement serves the left eye (see Fig. 24). In the head-tracking system, the viewers' infrared image is focused on the diffuser by the large-format convex lens and is captured by the infrared camera. An image circuit processes the infrared image and produces binary half-right- and half-left-face images of each viewer. The binary half-face images are displayed on the appropriate cells of the white LED array. The infrared image is captured by using the large-format convex lens. There is no parallax
in the captured infrared image when the image is focused on the white LED array. Hence, the displayed binary half-right-face image (on the appropriate cells) and the viewers' image focused by the large-format convex lens are automatically superimposed on the surface of the white LED array. The bright areas of the binary half-face images (the appropriate cells) are distributed to the correct eye of each viewer. On the Seaphone display, several viewers can perceive a stereo pair simultaneously, and they can move independently without special attachments. The display currently costs 1,492,000 yen.

The Sanyo Display
The Sanyo display uses LC technology for both image presentation and the parallax barrier (17). Because the thermal expansion coefficients are the same, registration is maintained under different operating conditions. Sanyo calls the parallax barrier part of the display the "image splitter." Two image splitters are used, one on each side of the LC (image presentation) panel (Fig. 26). The splitter on the backlight side consists of two-layer thin films of evaporated aluminum and chromium oxide; the vertical stripes are produced by etching. The stripe pitch is slightly larger than twice the dot pitch of the LC panel. The viewer-side splitter is a low-reflection layer whose stripe pitch is slightly smaller than twice the dot pitch of the LC image presentation panel. Each slit corresponds to a column of the LC panel. They claim that the technique produces no
Figure 24. Each perspective backlight (white LED array, polarizers, backlight, LCD; eye separation 65 mm).

Figure 26. A double image splitter (image splitter 1, LC panel, image splitter 2; ex. 580 mm viewing distance, ex. 0.9 mm gap in air).
ghosting. They also have a head-tracking system in which the viewer does not have to wear any attachments.

The HinesLab Display
An autostereoscopic display using motion parallax (18–20) has been developed by HinesLab, Inc. (www.hineslab.com) of Glendale, California. The display uses live or recorded camera images, or computer graphics, and displays multiple views simultaneously (Fig. 27). The viewer stands or sits in front of the display, where the eyes fall naturally into two of multiple viewing positions. If the viewer shifts position, the eyes move out of the two original viewing positions into two different positions where views that have the appropriate parallax are prepositioned. This gives a natural feeling of motion parallax as the viewer moves laterally. An advantage of this approach is that multiple viewers can use the display simultaneously. The technology provides from 3 to 21 eye positions that give lateral head freedom and look-around ability, confirming the positions and shapes of objects. The device is NTSC compatible, and all images can be projected on a screen simultaneously in full color without flicker. The display is built around a single liquid-crystal panel, from which multiple images are projected to a screen where they form the 3-D image. The general approach used to create the autostereo display was to divide the overall area of the display source into horizontal rows. The rows were then filled by the maximum number of images, maintaining the conventional 3:4 aspect ratio; no two images have the same lateral position (Fig. 28). The optical design for these configurations is very straightforward. Identical projection lenses are mounted
Figure 27. HinesLab autostereoscopic computer display (video arcade games).
on a common surface in the display housing, and they project each image to the back of a viewing screen from unique lateral angles. Working in conjunction with a Fresnel field lens at the viewing screen, multiple exit pupils, or viewing positions, are formed at a comfortable viewing distance in front of the display. Figure 29 shows an arrangement of seven images displayed in three horizontal rows on the LCD panel.

VOLUMETRIC DISPLAYS
A representation technique used in computer visualization describes a 3-D object by parallel planar cross sections of the object, for example, CAT scans in medical imaging. We call such a representation a multiplanar image. Volumetric or multiplanar 3-D displays normally depend on moving mirrors, rotating LEDs, or other optical techniques to project or reflect light at points in space. Indeed, aquariums full of Jell-O that have images drawn in ink inside have also been used for such displays. A survey of such methods can be found in (2,21). A few techniques are worth mentioning. First, we discuss the principle of the oscillating mirror.

Oscillating Planar Mirror
Imagine a planar mirror that can vibrate or move back and forth rapidly along a track perpendicular to the face of a CRT, and assume that we can flash a point (pixel) on the CRT that decays very rapidly (Fig. 30). Let the observer be on the same side of the mirror as the CRT, so that the image of the CRT can be seen reflected in the mirror. If a point is rendered on the surface of the CRT when the mirror reaches a given location in its vibration, and the rate of vibration of the mirror is at least fusion frequency (30 Hz), the point will appear continuously in the same position in space. In fact, the point produces a solid image in the sense that, as we change our position, our view of the point also changes accordingly.
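The timing behind this principle can be sketched directly: because a mirror image lies twice the mirror displacement away, a voxel intended for a given depth must be flashed at the instants a sinusoidally driven mirror crosses half that depth. The amplitude and drive rate below are hypothetical; only the 30-Hz fusion figure comes from the text:

```python
import math

def flash_times(depth_mm, amplitude_mm=10.0, freq_hz=30.0):
    """Instants within one period at which to flash a CRT point so that
    its reflection appears at depth_mm.  Mirror position x(t) =
    A*sin(2*pi*f*t); the reflected image moves twice the mirror
    displacement, so the mirror must sit at x = depth/2 at the flash.
    The mirror passes any x twice per cycle, once in each direction."""
    x = depth_mm / 2.0
    if abs(x) > amplitude_mm:
        raise ValueError("depth outside the view volume (|depth| <= 2A)")
    period = 1.0 / freq_hz
    phase = math.asin(x / amplitude_mm)   # radians on the rising pass
    t1 = phase / (2 * math.pi * freq_hz)  # rising pass
    t2 = period / 2 - t1                  # falling pass
    return t1 % period, t2 % period
```

A point flashed at both instants every cycle appears fixed in space; the view volume depth of 2A per excursion matches the factor-of-two statement in the text.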
If the point is not extinguished as the mirror vibrates, then the mirror would reflect the point at all positions on its track, and the viewer would see a line in space perpendicular to the face of the CRT. Any point plotted on the surface of the CRT would appear at a depth depending on the position of the mirror at the instant the point appears on the CRT. The space that contains all possible positions of points appearing on the CRT defines what is called the view volume. All depth cues would be consistent, and there would be no ‘‘disconnection’’ of accommodation and vergence as for stereo pairs. The optics of the planar mirror produce a view volume depth twice that of the mirror excursion or
Figure 28. Possible image arrangements on the liquid-crystal projection panel: 2 rows (75% efficiency), 3 rows (78% efficiency), and 4 rows (81% efficiency).
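The row counts and efficiencies quoted in Fig. 28 are consistent with a simple counting rule: r staggered rows of 4:3 sub-images on a 4:3 panel, no two sharing a lateral position, hold r² − r + 1 views and fill that fraction of r² of the panel. The formula is an inference from the figure's numbers, not one stated in the article:

```python
def hineslab_layout(rows):
    """Views and panel-area efficiency for r staggered rows of 4:3
    sub-images on a 4:3 panel (each sub-image is 1/r of the panel
    height, and no two images share a lateral position)."""
    views = rows * rows - rows + 1
    efficiency_pct = round(100.0 * views / rows**2, 1)
    return views, efficiency_pct

# 2 rows -> (3, 75.0); 3 rows -> (7, 77.8); 4 rows -> (13, 81.2)
```

The three-row case gives seven views, matching the seven-image arrangement of Fig. 29.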
Figure 29. The seven-lens autostereo display (HinesLab 3DTV, U.S. Pats. 5,430,474 and 5,614,941): seven-lens camera, liquid-crystal projection panel, seven projection lenses, Fresnel lens and screen, seven eye positions.
Figure 30. Vibrating mirror (CRT, mirror displacement, image of CRT, volume in which the image will appear, limit of prime viewing).

Figure 31. A varifocal mirror (CRT at distance d, mirror displacement extremes, resulting image displacement, volume in which the image will appear).
displacement depth. If the focal length of the mirror is also changed during the oscillation, a dramatic improvement in view volume depth can be obtained.

Varifocal Mirror
The varifocal mirror was a commercially available multiplanar display for several years. The technique uses
a flexible circular mirror anchored at the edges (Fig. 31). A common woofer driven at 30 Hz is used to change the focal length of the mirror. A 3-D scene is divided into hundreds of planes, and a point-plotting electrostatic CRT plots a single point from each. The mirror reflects these points, and the change in the focal length of the mirror affects their apparent distance from the viewer. A software
Figure 32. Omniview volumetric display (display control computer, input synchronization electronics, RGB layers and modulator, X,Y scanners, multiplanar display surface).
program determines which point from each plane is to be rendered, so that lines appear continuous and uniform in thickness and brightness. The resulting image is solid. The view volume depth is approximately 72 times the mirror displacement depth at its center. The images produced by the CRT have to be warped to compensate for the varying focal length of the mirror. Such a mirror was produced by several companies in the past. At that time, only a green phosphor existed that had a sufficiently fast decay rate to prevent image smear.
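The depth amplification of the varifocal mirror follows from the ordinary mirror equation, 1/s_o + 1/s_i = 2/R: a small flexure-induced change in the radius R swings the image distance through a far larger range. The numbers below are illustrative only, not the geometry of any commercial unit (which the text says reached roughly 72×):

```python
def image_distance(object_mm, radius_mm):
    """Mirror equation 1/s_o + 1/s_i = 2/R; returns s_i for the CRT's
    reflection (negative = virtual image behind the mirror).
    R > 0 is taken as concave, R < 0 as convex."""
    focal = radius_mm / 2.0
    return 1.0 / (1.0 / focal - 1.0 / object_mm)

# A nearly flat mirror 500 mm from the CRT, flexed by the woofer between
# slightly convex and slightly concave (|R| = 40 m):
near = image_distance(500.0, -40000.0)  # about -488 mm (behind the mirror)
far = image_distance(500.0, 40000.0)    # about -513 mm
# A sub-millimeter change in mirror sag sweeps the virtual image through
# roughly 25 mm -- the depth-amplification effect described in the text.
```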
Rotating Mirror
A variant of this approach, developed by Texas Instruments using RGB lasers for point plotting and a double-helix mirror rotating at 600 rpm as the reflecting device, was also commercially available for a time under the name Omniview (Fig. 32). Some recent efforts have included LCD displays, but the switching times are currently too slow to produce useful images.

Problems and Advantages
A major advantage of multiplanar displays is that they are "solid." Accommodation and convergence are not disconnected, as they are in viewing stereo pairs, where the visual system always focuses at the same distance. Users who are stereo-blind can see the depth, and the image is viewable by several people at once. The primary problem of these mirror-oriented technologies is that the images they produce are transparent. The amount of information they can represent before the user becomes confused is small because of the absence of hidden surface elimination, although head trackers could be implemented for single-viewer use. In addition, they are limited to showing computer-generated images. Another major disadvantage of multiplanar displays has been that the electro-optics and point-plotting devices used to produce the image are not sufficiently fast to produce more than a few points at a time on a 3-D object, and laser grids are far too expensive to generate good raster displays. Hence, multiplanar or volumetric displays have been limited to wire-frame renderings.

Acknowledgments
The author thanks the following individuals who contributed to this article: Marc Highbloom, Denise MacKay, VREX; Shihoko Kajiwara, Seaphone, Inc.; Jesse Eichenlaub, Dimension Technologies, Inc.; Jeff Wuopio, StereoGraphics, Inc.; Richard Steenblik, Chromatek; David McConville, Elumens Corporation; Vivian Walworth, The Rowland Institute for Science; Shunichi Kishimoto, Sanyo Corporation; Jeff Brum, Fakespace Systems, Inc.; Michael Starks, 3-DTV Corp.; Stephen Hines, HinesLab, Inc.; and Mark Holzbach, Zebra Imaging, Inc.
BIBLIOGRAPHY
1. T. Okoshi, Three-Dimensional Imaging Techniques, Academic Press, New York, 1976.
2. D. F. McAllister, ed., Stereo Computer Graphics and Other True 3-D Technologies, Princeton University Press, Princeton, NJ, 1993.
3. H. Morgan and D. Symmes, Amazing 3-D, Little, Brown, Boston, 1982.
4. J. Lipscomb, Proc. SPIE: Non-Holographic True 3-D Display Techniques, Vol. 1083, Los Angeles, 1989, pp. 28–34.
5. Multi-pieced, portable projection dome and method of assembling the same, US Pat. 5,724,775, March 10, 1998, R. W. Zobel Jr. et al.
6. Tiltable hemispherical optical projection systems and methods having constant angular separation of projected pixels, US Pat. 5,762,413, June 9, 1998, D. Colucci et al.
7. Systems, methods and computer program products for converting image data to nonplanar image data, US Pat. 6,104,405, August 15, 2000, R. L. Idaszak et al.
8. Visually seamless projection screen and methods of making same, US Pat. 6,128,130, October 3, 2000, R. W. Zobel Jr. et al.
9. C. W. Tyler and M. B. Clarke, Proc. SPIE: Stereoscopic Displays and Applications, Vol. 1256, Santa Clara, 1990, p. 187.
10. D. Bar-Natan, Mathematica 1(3), 69–75 (1991).
11. Autostereoscopic display with illuminating lines and light valve, US Pat. 4,717,949, January 5, 1988, J. Eichenlaub.
12. Autostereoscopic display illumination system allowing viewing zones to follow the observer's head, US Pat. 5,349,379, September 20, 1994, J. Eichenlaub.
13. Stroboscopic illumination system for video displays, US Pat. 5,410,345, April 25, 1995, J. Eichenlaub.
14. T. Hattori, T. Ishigaki, et al., Proc. SPIE, Vol. 3639, San Jose, 1999, pp. 66–75.
15. D. Swinbanks, Nature 385(6616), 476 (1997).
16. Stereoscopic display, US Pat. 6,069,649, May 30, 2000, T. Hattori.
17. K. Mashitani, M. Inoue, R. Amano, S. Yamashita, and G. Hamagishi, Asia Display 98, 151–156 (1998).
18. S. Hines, J. Soc. Inf. Display 7(3), 187–192 (1999).
19. Autostereoscopic imaging system, US Pat. 5,430,474, July 4, 1995, S. Hines.
20. Multi-image autostereoscopic imaging system, US Pat. 5,614,941, March 25, 1997, S. Hines.
21. B. Blundell and A. Schwarz, Volumetric Three Dimensional Display Systems, Wiley, New York, 2000.
STILL PHOTOGRAPHY
RUSSELL KRAUS
Rochester Institute of Technology, Rochester, NY
INTRODUCTION
Camera still imaging: the use of a light-tight device that holds a light-sensitive detector and permits controlled exposure to light by means of a lens that has a diaphragm control and a shutter, a device that controls the length of exposure. Controlled exposure is simply an amount of light admitted during a continuous period of time that produces a desired density on the film after development. The desired amount of exposure is typically determined either by experimental methods known as sensitometry or through trial and error. The shutter range for exposure can be between several hours and 1/8,000th of a second, excluding the use of a
stroboscopic flash that permits exposure times shorter than a millionth of a second. The image has a size, or format, of standard dimensions: 16-mm (subminiature), 35-mm (miniature), and 60-mm (medium format). These are commonly referred to as roll-film formats. Four by five inches, 5 × 7 inches, and 8 × 10 inches are three standard sheet-film sizes. A view camera, either a monorail camera or a folding type, is used to expose each sheet one at a time. Folding-type technical cameras that have a baseboard can expose single sheets or can adapt to a roll-film back of reduced dimensions. All current camera systems can either replace their film detectors with digital detectors or are themselves replaced entirely by a digital version. The nature of current still photography can be seen in its applied aspects: documentary, reportage, scientific recording, commercial/advertising, and fine-art shooting are the primary realms of the professional photographer. In each activity, care is given to the materials, equipment, and processes by which a specific end is achieved. Scientific and technical photographers use the photographic process as a data-collection tool where accuracy in time and space is of paramount importance. Photography for the commercial, documentary, and fine-arts photographer has never been an objective and simple recording of an event or subject. Documentary photography, even before the Farm Security Administration's reporting of the Dust Bowl of the 1930s, illustrated a point of view held by the photographer. A photograph was not simply a moment of captured time, an opening of a window blind to let in the world through a frame; rather, it was the result of a complex social and political view held by the image maker. The documentary/reportage photography of Hine, Evans, and Peress represents their unique vision and understanding of their times, not a naïve recording of events.
This type of photography has as much purpose and artifice as commercial/advertising shooting. The visual selling of product by imagery designed to elicit an emotional response has been an integral part of advertising for more than 100 years. The psychology of photography has remained relatively the same during this time, albeit photocriticism has had many incarnations, but the technology that has made photography the tool it is has changed very rapidly and innovatively during the last century. The current digital evolution in photography will further advance the tools available and alter the way images are captured and displayed.
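The reciprocal trade between light intensity and time in the introduction's definition of controlled exposure is usually written H = E·t, and the familiar stop system groups aperture/shutter pairs by exposure value, EV = log2(N²/t). These are standard photographic relations; the particular numbers below are only illustrative:

```python
import math

def exposure(illuminance_lux, time_s):
    """Photographic exposure H = E * t (lux-seconds)."""
    return illuminance_lux * time_s

def exposure_value(f_number, time_s):
    """EV = log2(N**2 / t); aperture/shutter pairs of equal EV admit
    the same amount of light."""
    return math.log2(f_number ** 2 / time_s)

ev1 = exposure_value(8.0, 1 / 125)  # about 13.0
ev2 = exposure_value(11.0, 1 / 60)  # about 12.8; nominal f-numbers are
                                    # rounded, hence the small gap
```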
A BRIEF HISTORY
The history of photography traces the coincidence of two major technical tracks. The first is the optical track. It includes the history of the camera obscura. Mentioned by Aristotle in the fourth century B.C., the camera obscura is basically a dark chamber for projecting the world through a pinhole into a closed, dark space. First described for viewing a solar eclipse, this pinhole device remained the basis for "camera imaging" for more than 1,000 years. By the thirteenth century A.D., a lens had been added to the
STILL PHOTOGRAPHY
device, at least in theory. Three hundred years later, at the height of the Italian Renaissance, a number of scientist-artisans (Cardano and della Porta) mention or describe the camera obscura used with a lens. By the mid-sixteenth century, Barbaro had described the use of a lens in conjunction with a diaphragm. The diaphragm allows the camera to project a finely sharpened image by virtue of stopping down to minimize aberrations and to control the intensity of the light passing through the lens. The diaphragm, or stop, and the lens limit the width of the beam of light passing through the lens. The changeover from a double-convex lens to a meniscus-type lens at the end of the eighteenth century limited the aberrations produced by the camera obscura. The prototype of the modern camera was born. Other modifications gradually came into existence, and optical achievements in astronomy and microscopy contributed toward contemporary photographic equipment. Sturm, a mathematician, produced the forerunner of the modern single-lens reflex camera by the latter part of the seventeenth century. Photography had to wait approximately 150 years before the idea of permanently capturing an image became a practicality and modern photography began its journey into the twenty-first century. The idea of permanently fixing an image captured by the camera obscura must have been in the ether for hundreds of years. Successful fixing took the independent efforts of Louis Daguerre and Joseph Niepce, who approached the problem of permanent imaging from different vantage points. Both Daguerre and Niepce followed the trail broken by others to permanent pictures. The chemist Johann Heinrich Schulze is credited with the discovery of the light sensitivity of silver nitrate (AgNO3), and later in England, Thomas Wedgwood of the Wedgwood china family employed AgNO3 to make temporary photograms. These silhouettes were made by exposing silver-nitrate-coated paper to sunlight.
Silver nitrate darkened upon exposure to sunlight, and those areas of the coated paper covered by an object remained white. Unfortunately, Wedgwood could not prevent the paper from darkening in toto over time. While Niepce and Daguerre worked in France, Sir John Herschel and William Henry Fox Talbot contributed to the emerging science of photography from across the Channel. Talbot is credited with the invention of the negative-positive process, resulting in the calotype. A coating of silver iodide and potassium iodide is applied to heavy paper and dried. Before exposure, a solution of silver nitrate, acetic acid, and gallic acid is applied. Immediately after exposure to bright sunlight, the paper is developed in a silver nitrate and gallic acid solution. Washing, fixing in sodium thiosulfate, and drying completed the processing of a calotype paper negative. Herschel's chemical tinkering led to his formulation of "hyposulfite." "Hypo" was a misnomer for sodium thiosulfate; however, the term hypo has remained the photographer's shorthand for fixer. The invention of this chemistry permitted fixing the image: the removal of the remaining light-sensitive silver salts by this solution prevented the
image from darkening totally under continued exposure to light. Daguerre continued his investigations into the direct positive process soon to be known by his name, the daguerreotype, and by 1837 he had produced permanent direct positive images. The daguerreotype depended on exposing silvered metal plates, generally copper, that had previously been held in close proximity to iodine vapors. The sensitized plate is loaded into a camera for a long exposure, up to one hour. Development was managed by subsequent fuming of the plate in hot mercury vapors (60 °C). The plate is finally washed in a solution of hot salts that removes the unexposed silver iodide, permanently fixing the image (Fig. 1). For 15 years, the daguerreotype achieved worldwide recognition, and during these years major improvements were made in the process. Higher sensitivity was achieved through faster lenses and bromine-chlorine fuming. Images that formerly required 30 minutes or more of exposure could be made in 2 minutes. The process produced amazing interest in photography, and an untold number of one-of-a-kind images were produced, but the daguerreotype eventually gave way to the negative-positive process of Talbot and Herschel. Modern still photography had been born. Contemporary still photography begins where the daguerreotype ends: the ascendancy of the negative-positive process is the fundamental basis of film-based photography.

IMAGE RECORDING
Photography is distinguished by the idea that the image recorded is of a real object. The recording of the image is the result of the optics involved in collecting the reflected light, and the recorded density range is a relative record of the tones of the subject created by the source light falling, as illuminance (luminous flux per unit area), on the subject. The establishment of a viewpoint controls perspective. The photographer must first choose
Figure 1. A Giroux camera, designed specifically for the daguerreotype.
the location from which the photograph will be taken. This establishment of perspective is essential to creating the appearance of a three-dimensional scene on a two-dimensional surface. The image projected by the lens is seen on a ground glass placed at the focal plane or reflected by a mirror arrangement, as in a single-lens reflex camera. The ground glass organizes or frames the image. In miniature and medium format photography, the photographer looks at the image through a viewfinder. The image produced in the viewfinder shows the objects in the scene in the same relative positions as the images produced by the lens. Distance is suggested in the image by the relative image sizes of objects in the field of view.

PERSPECTIVE
Photographic perspective is also controlled by the choice of optics. Using a short focal length lens, a wide-angle view of the scene is imaged in the viewfinder. The angle of view determines how much of the scene will be projected by the lens onto the frame. In practice, the frame is considered to be the film size, horizontal × vertical dimensions. The lens focal length and the film size determine the angle of view (Fig. 2). The wider angle permits projecting a greater area, and objects in the scene appear smaller in image size. A change to a longer than normal focal length lens limits the area projected due to a narrower angle of view, but the relative size of the image is greater than normal. In both cases, the change in relative size of the images depends solely on the distance of the objects from the camera. Both the position of the camera and the lens focal length control perspective. Image size is directly proportional to focal length and inversely proportional to the distance from the viewpoint and camera position. For a given lens, a change in film size alters the angle of view; for example, a 4 × 5 inch reducing back used in place of a 5 × 7 inch film plane decreases the angle of view. Perspective creates the illusion of depth on the two-dimensional surface of a print. Still photography employs several tools to strengthen or weaken perspective. Control of parallel lines can be considered part of the issue of perspective. Linear perspective is illustrated by the convergence of parallel lines that give visual cues to the presentation of implied distance in the photograph. This
is shown through the classic example of the illusion of converging railroad tracks that seem to come together at a vanishing point in the distance. The image size of the cross ties becomes smaller as their distance from the camera increases. We are more used to seeing horizontal receding lines appear to converge than vertical lines. In photographing a building from a working distance that requires the camera to be tilted upward to capture the entire building, the film plane is no longer parallel to the building's vertical lines, and those parallel lines converge toward a point beyond the film frame; the image of the building appears to fall away from the viewer. The resultant effect is known as keystone distortion. The photographer can control this by using a view or field camera whose front and rear standards can tilt or rise. On a smaller camera, 35-mm or 6-cm, a perspective control lens can be used for correction. This type of lens allows an 8 to 11-mm shift (maximum advantage occurs when the film format is rotated so that the longer dimension is in the vertical direction) and a rotation of 360°. Small and medium format camera lenses permit tilting the lens relative to the film plane. At a relatively short object distance, the tilt-shift or perspective control (PC) lens is very helpful. The PC lens has a greater angle of view than the standard lens of the same focal length. This larger angle of view allows the photographer to photograph at a closer working distance. The angle of view can be calculated as 2 tan⁻¹[d/(2f)], where d is the dimension of the film format and f is the focal length of the lens. This larger angle of view is referred to in relation to its diagonal coverage. These lenses have a greater circle of good definition.
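The angle-of-view formula above is easy to verify numerically. A minimal Python sketch (the function name and example dimensions are illustrative, not from the source):

```python
import math

def angle_of_view(d_mm: float, f_mm: float) -> float:
    """Angle of view in degrees: 2 * arctan(d / (2f)), where d is the
    film-format dimension and f is the lens focal length (both in mm)."""
    return math.degrees(2 * math.atan(d_mm / (2 * f_mm)))

# A 50-mm "normal" lens over the 43.3-mm diagonal of the 35-mm frame
# yields the familiar ~47-degree diagonal angle of view.
print(round(angle_of_view(43.3, 50), 1))
```

Running the same function with a shorter focal length confirms the text's point that shorter lenses widen the angle of view over the same format.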
This larger circle of good definition permits shifting the lens by an amount equal to the difference between the standard circle of good definition (which is usually equal to the diagonal of the image frame) and the circle of good definition of the PC lens that has the greater angle of view (Fig. 3). For example, assume that the image of a building has a 0.001 scale of reproduction for a 35-mm lens at a working distance of 35 m. Then every 1-mm upward shift of the PC lens causes a 1-m downward movement of the imaged object. The effect generated by the perspective control lens is the same as that of the rising front on a view/field camera. Objects far away subtend a smaller angle and create an image so small that the details of the object are unresolvable. The photographer can also use this lack
Figure 2. Angle of view of a lens–film format combination.
Figure 3. Format and circle of good definition.
of detail by controlling the depth of field (DOF). A narrow DOF causes objects beyond a certain point of focus to appear unsharp, thus fostering a sense of distance. Camera lenses focus on only one object plane but can render other objects acceptably sharp. Objects in front of and behind the plane of focus are not as sharp; the zone of acceptable sharpness is referred to as the depth of field. Acceptable sharpness depends on the ability of the eye to accept a certain amount of blur. The lens does not image spots of light outside the focused plane as sharply as objects that are in focus. These somewhat out-of-focus spots are known as circles of confusion. If small enough, they are acceptable to the average viewer in terms of perceived sharpness. When the sharpness of objects in the zones before and after the plane of focus cannot be made any sharper, the circles of confusion are commonly referred to as permissible circles of confusion. In the 35-mm format, these permissible circles of confusion typically have a diameter of 0.03 mm in the negative. This size permits magnifying the negative 8 to 10× while maintaining acceptable sharpness when the print is viewed at a normal viewing distance. When the lens renders permissible circles of confusion in front of and behind the focused plane, the limits of these zones are referred to as the near distance sharp and the far distance sharp. At long camera-to-subject distances, the near and far distances sharp determine the depth of field (far distance sharp minus near distance sharp). In practice, the hyperfocal distance, the near distance rendered sharp when the lens is focused at infinity for a given f#, is used to determine the near distance sharp and the far distance sharp. The hyperfocal distance is the focal length squared divided by the f# times the circle of confusion, H = fl²/(f# × c).
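As an illustration, the hyperfocal relationship can be turned into a few lines of Python. This is a sketch: it assumes the standard near/far-sharp formulas Dn = Hu/[H + (u − f)] and Df = Hu/[H − (u − f)], and all names and example values are illustrative.

```python
def hyperfocal(f_mm: float, f_number: float, coc_mm: float = 0.03) -> float:
    """Hyperfocal distance H = f^2 / (f# * c), all lengths in mm."""
    return f_mm ** 2 / (f_number * coc_mm)

def near_far_sharp(f_mm: float, f_number: float, u_mm: float,
                   coc_mm: float = 0.03):
    """Near/far limits of acceptable sharpness for a focus distance u (mm),
    using Dn = H*u/(H + (u - f)) and Df = H*u/(H - (u - f))."""
    h = hyperfocal(f_mm, f_number, coc_mm)
    near = h * u_mm / (h + (u_mm - f_mm))
    # Focused at or beyond the hyperfocal distance, the far limit is infinity.
    far = h * u_mm / (h - (u_mm - f_mm)) if (u_mm - f_mm) < h else float("inf")
    return near, far

# 50-mm lens at f/8, circle of confusion 0.03 mm, focused at 3 m:
near, far = near_far_sharp(50, 8, 3000)
print(round(hyperfocal(50, 8)))  # hyperfocal distance, roughly 10,400 mm
print(round(near), round(far))   # roughly 2.3 m to 4.2 m
```

Note how stopping down (a larger f#) shrinks H and therefore widens the zone of acceptable sharpness, which matches the qualitative description in the text.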
Therefore, the near and far distances sharp can be calculated as Dn = Hu/[H + (u − fl)] and Df = Hu/[H − (u − fl)], where u is the distance from the camera lens to the subject. In studio photography, where space is limited, a sense of depth can be fostered by the placement of lighting. The casting of foreground and background shadows and the creation of tonality on curved surfaces suggest greater depth than is actually there. In outdoor landscape photography, the inclusion of near-camera objects, such as a tree branch, in relationship to a distant scene cues the viewer to the distances recorded.

LENS CHOICE AND PERSPECTIVE

Photographs have either a strong or weak perspective that is permanently fixed. In three-dimensional viewing of a scene, the perspective and image size change in response to a change in viewing distance. This does not occur in photographs of two-dimensional objects. But in photographs containing multiple objects at different image planes or converging parallel lines, viewing distance can have an effect. Changes in viewing distance influence the sense of perspective. The correct viewing distance is equal to the focal length of the camera's lens when the print is made by contact exposure with the negative. When an enlarger is used, the focal length of the camera lens must be multiplied by the amount of magnification. Thus, if a 20-mm lens is used on a 35-mm camera and the
negative is magnified by projection to an image size of 4 × 6 inches, the correct viewing distance is 80 mm. This has the effect of changing the perspective from a strong to a normal perspective. People tend to view photographs from a distance that is approximately the diagonal measurement of the print being viewed, thereby accepting the perspective determined by the photographer. In the previous example, 80 mm is too close for people to view a photograph. Seven to 8 inches is most likely to be chosen as the correct viewing distance. This maintains the photographer’s point of view and the strong perspective chosen. The choice of a wide-angle lens to convey strong perspective carries certain image limitations. Wide-angle close-up photography of a person’s face can present a distorted image of the person. The nose will appear unduly large in relation to the rest of the face. This would occur if the photographer wishes to fill the film frame with the face and does so by using a short object distance and a wide-angle lens. To maintain a more naturalistic representation, an experienced photographer will use a telephoto lens from a greater distance. Other distortions arise when using wide-angle lenses. In large group portraits where the wide-angle lens has been chosen because of the need to include a large group and object distance is limited by physical constraints, the film receives light rays at an oblique angle. Spherical objects such as the heads of the group portrait will be stretched. The amount of stretching is determined by the angle of the light from the lens axis that forms the image. The image size of the heads changes relative to the reciprocal of the cosine of the angle of deviation from the lens axis. 
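Both effects just described, the correct viewing distance and the off-axis stretching of spherical objects, reduce to one-line computations. A minimal sketch (the function names are illustrative, not from the source):

```python
import math

def correct_viewing_distance(focal_mm: float, magnification: float) -> float:
    """Viewing distance that reproduces the taking lens's perspective:
    camera focal length times the print magnification."""
    return focal_mm * magnification

def off_axis_stretch(theta_deg: float) -> float:
    """Relative elongation of a spherical object imaged at angle theta
    from the lens axis: the reciprocal of cos(theta)."""
    return 1.0 / math.cos(math.radians(theta_deg))

print(correct_viewing_distance(20, 4))  # the text's example: 80 (mm)
print(round(off_axis_stretch(30), 2))   # about 15% stretch at 30 degrees off-axis
```

The stretch factor grows rapidly toward the edges of a wide-angle frame, which is why heads near the corners of a group portrait look noticeably elongated.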
This type of distortion also occurs when using normal focal length lenses, but the amount of stretch is much less, and because the final print is viewed at a normal viewing distance, the viewer's angle of view approximates the original lens' angle of view and corrects for the distortion.

NEGATIVE-POSITIVE PROCESS

The process of exposing silver halide light-sensitive materials, in camera, relies on the range of density formed on a developed negative. Light reflected from a subject, its luminance (intensity per unit area), is collected by a lens and focused as an image at a given distance behind the lens, the focal plane. The purposeful act of exposure and the concomitant development given to the exposed film achieve a tonal response, or range of density, that captures the relative relationships of tones in the original scene, albeit in a reversed or negative state. Whites of uniform brightness in a scene, considered highlights, appear as uniform areas of darkness, or relatively greater density, in the negative. Blacks of uniform darkness in a scene, considered shadows, appear as uniform areas of lightness, or lesser density, in the negative. Other tones between whites and blacks are rendered as relative densities between the extremes of black and white. This is the continuous tone or gray scale of photography. The negative's density range must be exposed onto other sensitized silver material to convert the reversed tones into a positive. The negative is made on a transparent base
that permits its subsequent exposure to a positive, either by contact or projection. There are obvious exceptions to the general practice of the negative-positive photographic approach in image making. Positives can be made directly on film for projection, direct duplication, or scanning. These transparencies may be either black-and-white or color. Likewise, direct positives can be made on paper for specific purposes. This process of representing real-world objects by a negative image on film and then by a positive image on paper is basic to traditional still photography.

CAMERAS

Comments on the breadth of available camera styles, formats, features, and advantages and disadvantages are beyond the scope of this article. However, certain specific cameras will be mentioned in terms of their professional photographic capabilities. The miniature camera uses the 35-mm format. This format is represented by two types: compact and ultracompact. There are fixed-lens and interchangeable-lens 35-mm cameras. The latter type, unlike the former, constitutes the basis of an imaging system and can be further subdivided into two basic types: rangefinder and single-lens reflex cameras. Both single-lens reflex and rangefinder style cameras are available as medium and special format cameras as well (Fig. 4). Professional 35-mm camera systems are characterized by their interchangeable lens systems, focal plane shutters, motorized film advance and rewind, electronic flash systems, extra-length film magazines, and highly developed autofocusing and complex exposure metering systems. Lenses from ultrawide-angle focal lengths, fish-eye (6-mm), to extremely long focal lengths (2,000-mm) can replace the typical normal (50-mm) focal length lens. Special purpose lenses, such as zoom lenses of various focal lengths and macro lenses that permit close-up photography at image-to-object ratios greater than 1:1, are two of the most common specialty lenses.
Figure 4. Film/camera formats showing relative differences among them.
Among the specialty lenses that are of interest is the fish-eye lens. A wide-angle lens can be designed to give a greater angle of view if the diagonal format covered is reduced relative to the standard format. This unique lens projects a circular image whose diameter is 21 to 23 mm onto the 43-mm diagonal dimension of the 35-mm film format. Hence, it provides a circular image of the scene. The true fish-eye angle of view is between 180 and 220°. A normal wide-angle lens that has a rectilinear projection can achieve a very short focal length (15-mm) before its peripheral illumination noticeably decreases by the cos⁴ law. A change in lens design to a retrofocus type does not alter this loss of illumination. Other projection geometries are used to permit recording angles of view greater than 180°. Equidistant, orthographic, and equisolid angle projections are used to increase angles of view. The image projection formula for a rectilinear projection is y = f tan θ; for an equidistant projection, y = fθ; and for an equisolid angle projection, y = 2f sin(θ/2).

Motor Drive

A motor drive permits exposing individual frames of film at relatively high film advance speeds. Unlike motion picture cameras, which expose film at a continuous rate of 24 frames per second, motor drives typically advance film at three to eight frames per second. An entire roll of 35-mm, 36-exposure film may be rewound by the motor in approximately 4 seconds. State-of-the-art 35-mm camera systems provide built-in film advance systems and do not require separate add-on hardware. When coupled with advanced fast autofocusing, the photographer has an extraordinary tool for recording sports action, nature, and surveillance situations.

Electronic Flash

Specialized flash systems are another feature of the advanced camera system. The fundamental components are the power supply, electrolytic capacitor, reflector design, triggering system, and flash tube.
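Returning to the projection formulas quoted above for fish-eye lenses: they are easy to compare numerically. A minimal sketch (the function name and example focal length are illustrative):

```python
import math

def image_height(f_mm: float, theta_deg: float, projection: str) -> float:
    """Image height y for a ray at angle theta off-axis, using the
    projection formulas quoted in the text."""
    t = math.radians(theta_deg)
    if projection == "rectilinear":
        return f_mm * math.tan(t)          # y = f tan(theta)
    if projection == "equidistant":
        return f_mm * t                    # y = f theta
    if projection == "equisolid":
        return 2 * f_mm * math.sin(t / 2)  # y = 2f sin(theta/2)
    raise ValueError(f"unknown projection: {projection}")

# At 90 degrees off-axis, a rectilinear projection grows without bound,
# while an 8-mm fish-eye projection stays finite:
print(round(image_height(8, 90, "equidistant"), 2))  # ~12.57 mm
print(round(image_height(8, 90, "equisolid"), 2))    # ~11.31 mm
```

This is why fish-eye geometries can record a full 180° (or more) within a finite image circle, whereas a rectilinear design cannot.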
Some triggering systems control the electronic flash exposure through the lens and are referred to as TTL. TTL cameras permit evaluating the light that reaches the film plane. Variations of the basic TTL approach are the A and E versions of TTL. The A-TTL approach uses a sensor on the flash unit to evaluate the flash in conjunction with the aperture and shutter settings of the camera. E-TTL uses the camera's internal sensor to evaluate the flash and to set the camera aperture. Another variant uses the camera-to-subject distance determined by the camera's focusing system to control the flash duration. Modern flash systems communicate with the camera via a hot shoe and/or cable (PC) connection. Professional cameras can program electronic flash to allow for high-speed flash at a shutter speed of 1/300 of a second. This is achieved by dividing the current discharged by the capacitor into a series of very short pulses. There are occasions for flash exposure at the end of the shutter travel. Rear-curtain synchronization allows continuous illumination of a moving subject so that any associated movement blur trails the subject. X-synchronization flash occurs when the shutter is
first fully opened. In X synchronization, the flash occurs when the first curtain is fully open and the following curtain has not begun to move; the blur of a moving image occurs in front of the image. Electronic flash systems produce high-intensity, ultrashort bursts of light from the discharge of an internal capacitor. This discharge occurs in a quartz envelope filled with xenon gas. Typical effective flash excitation occurs within 1 millisecond. When used with a thyristor that controls exposure, the flash duration can be as short as 0.02 milliseconds (Fig. 5). The thyristor switches off the flash when a fast photodiode indicates that sufficient exposure has occurred. This system allows fast recharging of the capacitor to the appropriate charge if an energy-efficient design is used. This type of design uses only the exact level of charge in the capacitor needed for a specific exposure. Fast recycling times are available, albeit at power levels well below the maximum available. The quenching tube design permits dumping excess current to a secondary, low-resistance flash tube after the primary flash tube has been fired. While the accuracy of the exposure is controlled, the current is completely drained from the capacitor; recycling times and battery life are fixed. The output of an electronic flash is measured in lumen-seconds, a measure of luminous flux over time. A lumen is the amount of light from a one-candela source falling on a uniform surface of 1 ft² at a distance of 1 foot. Because an electronic flash emits light in a defined direction, a lumen can be considered the amount of light given off into a solid angle. A more photographic measure is beam-candlepower-seconds (BCPS), a measure of the flash's output at the beam position of the tube. Beam-candlepower-seconds are used to determine the guide number for flash effectiveness.
An electronic flash's guide number can be expressed as GN = K(ISO × BCPS)^0.5, where K is a constant: 0.25 if distance is in feet or 0.075 if distance is measured in meters. Guide numbers express the relationship between object distance and aperture number. If the guide number is 88 for a given film speed/flash combination, the photographer can use an aperture of f/8 at a camera-to-subject distance of 11 ft or f/11 at 8 feet. At 16 feet, the aperture would need to be set two stops wider, or f/5.6. The inverse square law governs this relationship. The law states that illumination varies inversely as the square of the distance
Figure 5. Flash curve and time.
(E = I/d²) from the subject to the point source of light. Guide numbers are approximations or initial exposure recommendations. The environment of the subject can alter the guide number, plus or minus, by as much as 30%. The type of electronic flash and the placement of the flash have a significant effect on the photographic image. Flash units mounted on the reflex prism of the camera are also on-axis (2° or less) with the lens. The combination of this location and the camera-to-subject distance is the source of ''red-eye'', the term given to the reflection of the flash by the subject's retina. The reddish hue is caused by the blood vessels. This particularly unpleasing photographic effect is also accompanied by flat and hard lighting. Moving the flash head well off center and bouncing the flash off a reflector or nearby wall can provide more pleasing lighting. When shadowless lighting is desired, a ring flash can be used. A ring flash uses a circular flash tube and reflector that fits around the front of the lens. Ring flash systems generally have a modeling light to assist in critical focusing. Scientific, technical, and forensic applications use this device.

Single-Lens Reflex Camera

The single-lens reflex (SLR) camera is the most widely used miniature format professional camera system. Having supplanted the rangefinder as the dominant 35-mm system by the late 1970s, its distinctive characteristic is direct viewing and focusing of the image by a mirror located behind the lens and in front of the film gate. This mirror reflects the image formed by the lens to a ground glass viewing screen. The mirror is constructed of silver- or aluminum-coated thin glass or metal. The coatings prevent optical distortions and increase the illuminance reflected to the viewing screen. The mirror is characterized by its fast return to its rest position after each exposure. The viewing screen is viewed through a viewfinder containing a pentaprism that corrects the lateral reversal of the image.
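Returning to the guide-number relationship given earlier: the arithmetic can be sketched in a few lines of Python, using the constants quoted in the text (function names and the BCPS example value are illustrative):

```python
def guide_number(iso: float, bcps: float, metric: bool = False) -> float:
    """GN = K * sqrt(ISO * BCPS); K = 0.25 for feet, 0.075 for meters."""
    k = 0.075 if metric else 0.25
    return k * (iso * bcps) ** 0.5

def aperture_for(gn: float, distance: float) -> float:
    """Required f-number: guide number divided by flash-to-subject distance."""
    return gn / distance

# The text's example, GN 88 in feet:
print(aperture_for(88, 11))  # 8.0  -> use f/8 at 11 ft
print(aperture_for(88, 8))   # 11.0 -> use f/11 at 8 ft
print(aperture_for(88, 16))  # 5.5  -> about f/5.6 at 16 ft
```

Note that the two K constants differ by the feet-to-meters ratio (0.075/0.25 ≈ 0.3048), which is what makes the same BCPS rating produce consistent guide numbers in either unit.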
The image is in view in correct vertical and horizontal relationship to the subject. Mirror shape and length are two important design considerations that affect performance. Maximum reflectance depends on the intersection of the light path exiting the lens and the mirror location. Mirrors can be trapezoidal, rectangular, or square and are designed to best intersect the exiting cone of light. Mirror length can affect the overall size of the camera housing and/or the viewing of the image. When the mirror is too short, images from telephoto lenses are noticeably darker at the top and bottom of the viewing screen. Overly long mirrors necessitate deep camera bodies or a lens system that uses a retrofocus design. Some manufacturers use a hinged mirror that permits upward and rearward movement during exposure. Secondary mirrors that are hinged to the main mirror are used to permit autofocusing and exposure metering through the lens (Fig. 6). Almost all professional SLR systems have a mirror lockup feature that eliminates vibrations during exposure. This is very useful for high-magnification photography, where slight vibration can cause a loss of image quality, and for very long telephoto use at slow shutter speeds, where a similar loss of image quality can occur. It is noteworthy that long focal lengths (telephoto)
Figure 6. Lens, mirror, and pentaprism arrangement.
from 150 to 2,000 mm are more accurately focused in an SLR system than in a rangefinder system.

The Rangefinder

In 1925, Leica introduced a personal camera system that established the 35-mm roll format and the rangefinder-type camera as a professional photographic tool. The modern-day rangefinder camera is compact and easy to focus, even in low light. It is compact and quiet because it does not require a reflex mirror mechanism, and it permits using wide-angle lenses of nonretrofocus design. The rangefinder establishes the focal distance to the subject by viewing the subject from two separate lines of sight. These lines converge at different angles, depending on the working distance; at near distances, the angle is greater than at far distances. The subject is viewed through one viewing window and through a mirror that sees the subject through a second window. The distance between the two windows is the base length. By virtue of a sliding or rotating mirror, the subject is viewed as two images that can be made to coincide. The mirror or mirrors may be fully or half silvered and can be made to image the subject in two halves, one image half (upper) above the other image half (lower), or one image half alongside the other. This vertical or horizontal coincidence is the basis for focusing the camera lens. There are several design variations for constructing a rangefinder system. One variation maintains both mirrors in fixed positions, and the appropriate deviation is achieved by inserting a sliding prism in the light path between the two mirrors (Fig. 7). Given that tan x = b/d, where b is the base length of the rangefinder, d is the distance of the object, and x is the angle of rotation of the mirror, it is evident that a minimal rotation can accommodate focusing from near to distant.
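The geometry above implies that the mirror sweeps only a few degrees over the camera's entire focusing range. A minimal sketch (the function name and the 60-mm base length are illustrative):

```python
import math

def mirror_angle_deg(base_mm: float, distance_mm: float) -> float:
    """Rangefinder mirror angle x, from tan(x) = b/d."""
    return math.degrees(math.atan(base_mm / distance_mm))

# With a 60-mm base, the rotation needed from 1 m out to (near) infinity:
near = mirror_angle_deg(60, 1_000)  # ~3.43 degrees at 1 m
far = mirror_angle_deg(60, 1e9)     # essentially 0 degrees at "infinity"
print(round(near - far, 2))         # the entire sweep is only ~3.43 degrees
```

The tiny sweep is exactly why the mechanical coupling between the rangefinder and the lens must be so precise.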
When the rangefinding system is mechanically coupled to the lens (cams and gears), visual focusing through the eyepiece of the viewfinder and lens focusing of the subject now are in unison. When the superimposed images of the rangefinder are coincident, the lens is correctly focused on the subject. Focusing accuracy depends on a number of factors: base length, focusing error tolerance, mechanical couplings, and
Figure 7. Drawing of rangefinder types.
image scale of reproduction in the viewfinder. Accuracy of focus is defined by the limits of the eye (using a standard of 1 minute of arc) as an angle. Because the acuity of the eye can be influenced by a magnifier in the eyepiece of the optical viewfinder, rangefinder error is described as 2D²a/(Rb), where Rb is the scale of reproduction × base length, D is the distance, and a is the angle of 1 minute of arc. Rb is usually referred to as the true base length. Therefore, if the base length of a rangefinder camera is 150 mm and the scale of reproduction is 0.8, the true base length is 120 mm. Note that the scale of reproduction in all miniature cameras and smaller formats is less than one. This is necessary to fit the rangefinder image within the viewfinder window. Some systems permit attachments that raise the scale of reproduction above one. This extends the true base length without expanding the physical dimensions of the camera. Telephoto lenses can focus on a subject less than 5 feet from the camera. The coupled rangefinder permits focusing a 50-mm lens on a 35-mm camera from infinity to 30 inches (0.75 m). However, if the focusing error is too
great, it will exceed the DOF for a given lens, aperture, and distance. The appropriate base length has a practical limit because there is a maximum focal length for a given base length. The base length becomes unwieldy, or the camera would require large-magnification viewfinders, for focal lengths in excess of 150 mm in a 35-mm system. This is expressed in the formula Rb = fl²/(f# × C), where C is the permissible circle of confusion in the negative. At the end of the 1900s, the rangefinder camera enjoyed a resurgence in the issuance of several new 35-mm systems, in the marketing of a large number of new APS (Advanced Photo System, a sub-35-mm format) roll film cameras, and in the establishment of ''prosumer'' digital cameras.

Medium Format Cameras

These cameras are chiefly considered professional by virtue of their extensive accessories, specialized attachments, and larger film format. The larger film format has distinct advantages in that the magnification used to reach final print output is generally less than that for the smaller format. Consequently, microimage characteristics of the film that may detract from overall print quality (definition) are limited by the reduced magnification required. The film format is generally unperforated roll type in sizes of 120 or 220, but specialized cameras in this classification using 120 roll film can produce images that range from 2¼ × 6¾ inches down to 6 × 4.5 cm. Other film formats derived from 120 or 220 film are 6 × 6 cm, 6 × 7 cm, 6 × 8 cm, 6 × 9 cm, and 6 × 12 cm. Seventy-millimeter perforated film stock used in long-roll film magazines is considered medium format. Two-twenty film essentially provides double the number of exposures of 120 film and has no backing paper. Nonperforated 120 roll film is backed by yellow and black paper. Exposure numbers and guide markings printed on the yellow side are visible through a viewing area (red safety window) on the camera back.
When there is no viewing window, the guide marks are aligned with reference marks on the camera's film magazine, the film back is shut, and the film is advanced by a crank until the advancing mechanism locks into the first frame exposure position. The film is acetate, approximately 3.6 mils thick, and must be held flat in the film gate by a pressure plate similar to that employed in 35-mm cameras. Medium format cameras are generally SLR types, but rangefinder types and the twin-lens reflex are also available. Many of the available medium format cameras offer only a few lenses, viewfinders, and interchangeable backs, and a few offer complete and unique systems that provide all of the advantages of a large format camera within the smaller and more transportable design of the medium format camera. More recently introduced in the 2¼ × 2¼-inch format are systems that provide tilt and shift perspective control, a modified bellows connection between the front lens board and the rear film magazine, and interchangeable backs for different film formats and for Polaroid film. In some cameras, the backs can rotate to either portrait or landscape mode. These backs protect the film by a metal slide that is placed between the film gate and the camera body. The slide acts as an interlock, thereby preventing exposure or accidental fogging of the film when in place.
This feature also permits exchanging film backs in mid roll. Other forms of interlocks prevent accidental multiple exposures. The twin-lens reflex camera is a unique 2¼-inch square, medium format camera design. This design permits viewing the image at actual film format size. The image is in constant view, and it is unaffected by the action of the shutter or the advancing of the film. However, its capabilities are affected by parallax error when it is used for close-up photography. The viewing lens of the camera is directly above the picture-taking lens. The mirror that reflects the image for viewing to a ground glass focusing screen is fixed. The camera is designed to frame the subject from a low viewpoint, generally at the photographer's waist level. The shutter is located within the picture-taking lens elements and is typical of other medium format shutter mechanisms.

Shutters

All types of shutter mechanisms may be found in all format cameras, although typically one type of shutter is mostly associated with a specific format. Focal plane shutters are found mostly in miniature format cameras, although some medium format cameras use this design. Likewise, leaf shutters, so-called between-the-lens shutters because of the leaflike design of their blades, are used mostly in medium and large format cameras, although they are found in some fixed-lens 35-mm cameras and in smaller format cameras as well (Fig. 8). Specialized shutters, such as the revolving shutters generally considered typical of aerial technical cameras, have been adapted to 35-mm systems. Leaf shutters are located between lens elements or behind the lens itself. Ideally, the shutter should be at the optical center of the lens, parallel with the diaphragm. The blades of the shutter can be made to function as an iris diaphragm as well. The shutter and aperture are usually part of one combined mechanism, the compound shutter, exemplified by manufacturers such as Compur, Copal, and Prontor.
These shutters are numbered from 00 to 3, reflecting an increasing size necessitated by the increasing exit pupil diameter of the lens. For the most part, these shutters are limited to a top speed of 1/500th of a second. X synchronization is available at all shutter speeds. When fired, the shutter opens from the center to the outer edges of the lens. Because the shutter does not move infinitely fast, the center of the aperture is uncovered first and stays uncovered the longest. The illuminance changes as the shutter opens and closes. Consequently, if exposure times were based on the full travel path of the shutter blades, the
Figure 8. Illustration of leaf shutter design: the shutter opening, and an enlarged view of the shutter closed.
STILL PHOTOGRAPHY
resulting image would be underexposed. To correct for this, the shutter speed is measured from the one-half open position to the one-half closed position. This is known as the effective exposure time. Note that small apertures are fully uncovered sooner and remain uncovered longer than larger apertures; effective exposure time is therefore longer for smaller apertures. This poses no problem at slower shutter speeds; however, as shutter speeds increase and apertures decrease, exposures move in the direction of overexposure. In a test of a Copal shutter, a set shutter speed of 1/500th of a second at f/2.8 produced a measured exposure time of 1/400th of a second. This difference between the set and measured exposure times was minimal, less than one-third of a stop. However, when the diaphragm was stopped down to f/32 and the 1/500th of a second shutter speed was left untouched, the measured shutter speed was 1/250th of a second. This difference is attributed solely to the fact that the smaller aperture is left uncovered for a longer time. This one-stop difference is significant (Fig. 9). Between the lens shutters increase the cost of lenses for medium and large format cameras because a shutter is built into each lens. These shutters may be fully mechanical, using clockwork-type gears and springs, or electromechanical, using a battery, resistor-based timing circuits, and even quartz crystal timers. Resistors can be coupled in circuit with a photocell, creating an autoexposure system in which the shutter speed is determined by the amount of light sensed by the photocell relative to the aperture setting, that is, aperture priority. Focal plane shutters have a distinct advantage because they are part of the camera; the cost of lenses does not reflect the need for a built-in shutter.
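The stop differences in the Copal shutter test above can be checked numerically. This short Python sketch (the function name is my own; the exposure times are those quoted in the text) converts a ratio of exposure times into stops:

```python
from math import log2

def stop_difference(t_measured, t_set):
    """Difference, in stops, between a measured and a set exposure time."""
    return log2(t_measured / t_set)

# Values from the Copal shutter test described in the text.
print(round(stop_difference(1/400, 1/500), 2))  # at f/2.8: about 1/3 stop
print(round(stop_difference(1/250, 1/500), 2))  # at f/32: a full stop
```

A stop is a factor of 2 in exposure, so the logarithm base 2 of the time ratio gives the error directly.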
Higher shutter speeds are possible, up to 1/8,000th of a second, and advanced electronics can synchronize flash at shutter speeds as fast as 1/300th of a second. The heart of modern focal plane shutter design is the Copal vertical travel
Figure 9. Oscilloscope trace of a leaf shutter at a constant setting of 1/500th of a second for two apertures, an example of effective exposure time: the total open time is 4 ms at f/32 but 2.5 ms at f/2.8.
shutter. Horizontal travel shutters are still used. Focal plane shutters derive their name from the fact that they are located close to the focal plane at the film or detector. This closeness avoids the effective exposure time problems for small apertures that occur with between the lens shutters. Historically, the shutter was a slit in a curtain, composed of a leading and a trailing edge; the film was exposed by the action of the slit scanning the film as the curtain traveled horizontally. Exposure time is the width of the slit in the curtain divided by the velocity of the traveling curtain. The slit used for the exposure may in fact not be a slit in a curtain, but rather a set of titanium or metal alloy blades or the travel difference between two metallic curtains or blinds. High-speed shutters require blades made of durable and lightweight materials such as carbon fibers, and they require complex systems to control vibrations (shutter brakes and balancers) and to prevent damage to the blades. State-of-the-art shutters may include a self-monitoring system to ensure accuracy of speed and reliability. These systems are made even more complex by electronic components that are used to control exposure. The Hasselblad 200 series camera systems offer electronically controlled focal plane shutters that permit exposure times from 34 minutes to 1/2,000th of a second and flash synchronization up to 1/90th of a second. These electronic shutters permit aperture priority mode exposures that couple an atypical shutter speed (e.g., 1/325th of a second) to a selected aperture. Normal shutter speeds are established on a ratio scale whereby the speeds increase or decrease by a constant factor of 2. Unique to focal plane shutters is the capability of metering changing illuminance values for exposure control off the film plane (OTF). Focal plane shutters are identified mostly with professional 35-mm systems and with rarer medium format systems.
They are still rarer in large format camera systems but are found in barrel lens camera systems.

Large Format Cameras

The view camera, a large format camera, is the most direct approach to producing photographic exposures. A view camera is essentially a lens mounted on a board connected by a bellows to a ground glass for composing the image. The frame is further supported by a monorail or, for a field or technical camera, by folding flatbed guide rails. These field cameras may use rangefinder focusing in addition to direct viewing or an optical viewfinder. The view camera is capable of a series of movements; the lens plane and film plane can move independently. These movements are horizontal and vertical shifts of the lens and film planes, forward and rearward tilting of the lens and film planes, and clockwise and counterclockwise swinging of the lens and film planes about their vertical axes. These movements control image shape, sharpness of focus, and location of the image on the film plane. Simply shifting the film plane or lens plane places the image appropriately on the film. This movement can avoid the need to tilt the camera (upward/downward) to include the full object. Because the film and lens planes can be shifted independently, a shift of the lens plane in one direction
is equivalent to shifting the film plane an equal distance in the opposite direction. This type of shift, as well as the other movements, functions purposefully as long as the diagonal of the image formed by the lens covers the diagonal of the film plane format. The projection of the image circle in which sharpness is maintained is known as the circle of good definition. The circle of good definition increases in size when the lens to film-plane distance is increased to maintain good focus. Stopping down, that is, reducing the aperture, can increase the circle of good definition. The larger the circle of good definition, the greater the latitude of view camera movement. Often the angle of coverage is used as another measure of the covering power of the lens; the angle of coverage is unaffected by changes in the image distance. Movements can be done in tandem. When shifting the lens plane to include the full object being photographed, the shift may not be sufficient. Tilting the lens plane back and tilting the film plane forward can position the image within the area of good definition. Swinging the film plane around its vertical axis, so that it parallels the object being photographed, can eliminate the convergence of horizontal lines. This has the effect of making the scale of reproduction equal at the ends of all horizontal lines. This image effect is explained by the formula R = V/U, where R is the scale of reproduction, V is the image distance, and U is the object distance. Therefore, given an object whose horizontal lines end at object distances of 20 feet (left side) and 10 feet (right side), the image distance V must be adjusted by swinging the film plane so that Vl /Ul equals Vr /Ur . Swinging the lens plane will not produce the same effect: both U and V change together at each end, so the ratio V/U, and hence R, remains the same (Fig. 10). Film plane movements can be used to improve the plane of sharp focus.
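The equal-scale condition described above can be illustrated numerically. The image distances below are hypothetical, chosen only to satisfy Vl /Ul = Vr /Ur for the 20-ft and 10-ft object distances given in the text:

```python
def reproduction_scale(image_distance, object_distance):
    """Scale of reproduction, R = V/U."""
    return image_distance / object_distance

# Ends of the object at 20 ft (left) and 10 ft (right), as in the text.
# Swinging the film plane gives the far end twice the image distance of
# the near end (illustrative values, roughly a 1:20 scale).
R_left = reproduction_scale(1.0, 20.0)    # VL = 1.0 ft, UL = 20 ft
R_right = reproduction_scale(0.5, 10.0)   # VR = 0.5 ft, UR = 10 ft
print(R_left == R_right)  # True: equal scale, so horizontal lines do not converge
```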
However, when using film-plane swing capabilities to control image shape, the lens plane swing must be used to control sharp focus because adjusting the film plane around its vertical axis affects sharpness and shape simultaneously. Swinging the lens plane does not affect image shape. Focus control by the lens plane is limited by the covering power of the lens. Movement of the lens plane moves the image location relative to the film plane. Excessive movement can push the image
beyond the circle of good definition and beyond the circle of illumination, that is, optical vignetting. An oblique pencil of light is reduced in illumination compared to an axial beam from the same source. Physical features of the lens, such as the length of the lens barrel, can further increase optical vignetting. Vignetting can be reduced by using smaller apertures, but some loss of illumination will occur as a matter of course, natural vignetting, because the illumination falls off as the distance from the lens to the film increases: off-axis illuminance falls with the fourth power of the cosine of the off-axis angle, the cos⁴ law. It has been calculated that the cos⁴ law affects a normal focal length lens that has an angle of view of 60°, so that there is a 40 to 50% loss of illumination at the edges of the film plane. For lenses of greater angle of view, such as wide angle lenses, a 90° lens could have as much as a 75% loss of illumination at the edges. The reverse telephoto lens, a design that permits a greater lens to film-plane distance than a normal lens design of the same focal length, is one remedy. The swings, shifts, and tilts achieved by the view camera provide a powerful tool for capturing a sharp image. When these tools are unavailable, the photographer can focus only at a distance where DOF can be employed to achieve overall acceptable sharpness in the negative. The relationship of the lens to film-plane distance, expressed by the formula 1/f = (1/U) + (1/V), determines that objects at varying distances U are brought into focus as the lens to film-plane distance V is adjusted. The view camera that has independent front and back movements may use either front focusing, when the lens to film-plane distance is adjusted by moving the lens, or rear focusing, when the film plane is moved closer to or further from the lens. Back focusing controls image focus, and front focusing alters image size as well as focus. It can be seen from the previous formula that U and V are conjugates and vary inversely.
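The quoted edge-of-field losses follow from the cos⁴ law. This sketch takes the half-angle of view as the off-axis angle at the edge of the field (a simplification; the helper name is my own):

```python
from math import cos, radians

def cos4_relative_illuminance(off_axis_deg):
    """Relative illuminance at an off-axis angle under the cos^4 law."""
    return cos(radians(off_axis_deg)) ** 4

# Normal lens, 60-degree angle of view -> 30 degrees off axis at the edge.
print(round(1 - cos4_relative_illuminance(30), 2))  # ~0.44, i.e., a 40-50% loss
# Wide-angle lens, 90-degree angle of view -> 45 degrees at the edge.
print(round(1 - cos4_relative_illuminance(45), 2))  # 0.75, i.e., a 75% loss
```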
Back focus is required when copying to form an image of exact size. The plane of the lens, the plane of the film, and the plane of the object are related by the Scheimpflug rule (Fig. 11). When the object is at an oblique angle to the film plane, the lens plane can be swung about its vertical axis or tilted around its horizontal axis, so that the three planes, object, film, and lens planes, meet at a common line. When the lens plane is parallel to the object plane, the film plane is swung or tilted in the
Figure 10. Diagram of the film plane swing: (a) view camera not parallel to the object; (b) view camera with its back parallel to the object, so that UL /VL : UR /VR = 1 : 1.
Figure 11. Illustration of the Scheimpflug rule: the object plane, lens plane, and back plane meet at a common line.
opposite direction to achieve the three-plane convergence. The correct order of action is to adjust the back plane first to ensure correct image shape and then to adjust the lens plane to ensure sharpness. It is apparent that the image plane has different foci, left to right when swung and top to bottom when tilted. DOF must be used to further the creation of an overall sharp image. There are limitations in using DOF. DOF calculations depend on the formula C = f²/(NH), where C is the permissible circle of confusion, f the focal length, N the aperture number, and H the hyperfocal distance; both f and N alter the DOF. Doubling the size of the permissible circle of confusion would require a change of one stop of aperture. Depth of field is directly proportional to the f-number. DOF increases as the object distance U increases, expressed as D1 /D2 = (U1 )²/(U2 )²; this holds provided that the object distance does not exceed the hyperfocal distance. DOF increases as focal length decreases for a given image format. When comparing the DOF of two lenses for the same image format, the DOF ratio is equal to the inverse square of the focal length ratio.

Lenses and Image Forming Principles

Focal length is defined by the basic formula 1/f = (1/U) + (1/V). This is the foundation for a series of equations that describe basic image formation. I (image size) is equal to O (object size) × V/U; for all but near objects (closer than about twice the focal length), V ≈ f, so practically I = O × f/U, I/O = f/U, and the scale of reproduction R is equal to f/U. R is determined by focal length for any given distance, and for a specific focal length, R is determined by distance. Focal length determines the size of the image for an object located at infinity for any given film/detector size. The measured distance between the lens and the focused image of a distant object is the focal length. In camera systems where the lens is focused by varying the location of an element within the lens, the focal length is dynamic.
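The lens formula and the distance-squared DOF relationship can be illustrated with a short sketch. The 50-mm lens and 5-m object distance are illustrative values, and the helper names are my own:

```python
def image_distance(f, u):
    """Solve the lens formula 1/f = 1/u + 1/v for the image distance v."""
    return 1 / (1/f - 1/u)

def dof_ratio_by_distance(u1, u2):
    """D1/D2 = u1^2 / u2^2, valid while u stays below the hyperfocal distance."""
    return (u1 / u2) ** 2

# A 50-mm lens focused on an object 5 m away (all lengths in metres).
v = image_distance(0.050, 5.0)
print(round(v * 1000, 2))               # 50.51 mm: slightly beyond the focal length
print(round(v / 5.0, 4))                # 0.0101: scale of reproduction R = v/u
print(dof_ratio_by_distance(4.0, 2.0))  # 4.0: doubling the distance quadruples DOF
```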
The ability of the lens to project a cone of light of differing brightness is a function of the aperture control or iris diaphragm. The ratio of the focal length to the maximum diameter of the diaphragm (entrance pupil) is the lens's f-number (f#): f# = focal length/D, where D is the diameter of the entrance pupil. f-numbers form a ratio scale in steps of 1 : 1.4, and at each step the
image illuminance changes by a factor of 2. In photographic parlance, this factor of change is referred to as a stop. The smaller the f#, the brighter the image. The intensity of the image will be less than the intensity of the light falling on the lens; the transmission of the light depends on a number of lens properties, such as absorption and reflection factors. When the focal length of the lens equals the diagonal of the image format of the camera, the focal length is considered ‘‘normal.’’ The normal focal length lens has an angle of view of approximately 47 to 53°, akin to the angle of view of the human eye. The 50-mm lens is the ‘‘normal’’ standard for 35-mm format photography. Medium format lenses have been standardized at 75 or 80 mm, and 4 × 5-inch cameras have a standardization range between 180 and 210 mm. There is a difference between the calculated normal lens determined by the diagonal of the film format and those actually found on cameras; the actual normal lenses are of longer focal length than the diagonal would require. Wide-angle lenses are characterized by a larger angle of view. The focal length of these lenses is much less than the diagonal of the image format that they cover. Because of the short lens to focal-plane distance, problems of camera function may occur. In the SLR camera, the mirror arrangement may be impeded by short focal lengths, and in view cameras, camera movement may be hindered. Reverse-telephoto wide-angle lenses (retrofocus designs) overcome such problems. The design places a negative element in front of a positive element, thereby spacing the lens at a greater distance from the image plane. When the focal length of a lens is much greater than the diagonal of the film, the term telephoto is applied. The angle of view is narrower than that of the normal focal length. The telephoto design lens should not be confused with a long focus lens (Fig. 12).
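The f-number definition and the one-stop illuminance factor given above can be expressed directly (illustrative values; the function names are my own):

```python
from math import log2, sqrt

def f_number(focal_length, pupil_diameter):
    """f# = focal length / entrance-pupil diameter."""
    return focal_length / pupil_diameter

def stops_between(n1, n2):
    """Change in stops going from f-number n1 to n2 (illuminance halves per stop)."""
    return 2 * log2(n2 / n1)

print(f_number(50, 25))                         # 2.0: a 50-mm lens with a 25-mm pupil is f/2
print(round(stops_between(2, 2 * sqrt(2)), 1))  # 1.0: f/2 to f/2.8 is one stop
```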
In the telephoto design, the placement of a negative element/group behind the positive objective brings the cone of light to a focus as if it had come from a positive objective of greater focal length. The back nodal plane is located to give a shorter lens to film distance than that of a lens of normal design of the same focal length. The opposite applies to short focal length wide-angle designs, in which the distance is increased by placing a negative element in front of the objective. This greater lens to film distance permits full use of the SLR mirror and more compact 35-mm camera designs. Macro lenses, primarily used in scientific and medical photography, have found their way into other photographic venues. This has been made possible by the availability of ‘‘telemacro’’ lenses. These are not true macro lenses;
Figure 12. A normal design lens and a telephoto design lens of the same focal length f1: the telephoto design gives a shorter back focal distance (BFD) between the lens and the focal plane.
although they allow a close working distance, the image delivered is generally of the order of 0.5× magnification. True photomacrography ranges from 1 to 40×. The ‘‘telemacros’’ and some normal macros permit close-up photography at a working distance of approximately 3 to 0.5 feet and at a magnification of 0.10 to 1.0×. These are close-up photography lenses, although they are typically misnamed macro-zoom lenses. Close-up photography requires racking out the lens to achieve a unit of magnification. The helical focusing mount limits the near focusing distance, so other alternatives must be found to extend the lens to film distance, increase magnification, and shorten working distance. Many of these close-up lenses require supplemental lenses or extension tubes to achieve magnification beyond 0.5×. The classic Micro Nikkor 55-mm lens for the 35-mm format can magnify by 0.5×; an extension tube permits the lens to render 1× magnification. The 60-mm, 105-mm, and 200-mm Micro Nikkors achieve 1 : 1 reproduction without extension tubes. Long focal length macro lenses for miniature and medium format cameras require supplemental lenses to achieve magnification up to 3×. Positive supplemental lens focal lengths are designated in diopters. Diopter (D) power can be converted into focal length by the formula f = 1 (meter)/D. It is common practice to stack supplemental lenses to increase their power and increase magnification. The useful focal length is then derived from all of the focal lengths in the optical system, expressed by the formula 1/f = (1/f1 ) + (1/f2 ) + · · · + (1/fn ). When used with rangefinder or twin-lens reflex cameras, an optional viewfinder that corrects for parallax must be used. The working distance relationship to the focused distance is determined by the formula uc = ufs /(u + fs ), where uc is the close-up working distance, fs is the focal length of the system, and u is the focused distance of the main lens.
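The diopter conversion, the lens-stacking formula, and the working distance formula above can be combined in a short sketch. The 50-mm prime with a +2 diopter supplemental lens is a hypothetical combination, and the helper names are my own:

```python
def diopter_focal_length(d):
    """f (metres) = 1/D for a supplemental lens of D diopters."""
    return 1.0 / d

def combined_focal_length(*focal_lengths):
    """1/f = 1/f1 + 1/f2 + ... + 1/fn for a stack of thin lenses (metres)."""
    return 1 / sum(1 / f for f in focal_lengths)

def working_distance(u, fs):
    """uc = u*fs / (u + fs): close-up working distance from the text's formula."""
    return u * fs / (u + fs)

# A 50-mm prime plus a +2 diopter supplemental lens (all lengths in metres).
fs = combined_focal_length(0.050, diopter_focal_length(2))
print(round(fs * 1000, 1))                  # 45.5 mm combined focal length
print(round(working_distance(1.0, fs), 3))  # working distance when focused at 1 m
```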
Extension tubes are placed between the lens and the camera body and may be replaced by a bellows that provides variable magnification. The bellows system offers continuous magnification, an option to attach different lenses to the bellows to achieve different results, and the ability to use a reversing ring that mounts the lens so that the front element faces toward the camera. As with lenses that use an internal floating element/group to achieve increased magnification, the bellows attachment makes good use of TTL metering for optimum exposure control.

Autoexposure

Automated systems for exposure and focusing are the hallmarks of modern camera systems. Photographic exposure is defined as H = E × T, where H is exposure in meter-candle-seconds, E is illuminance in meter-candles, and T is time in seconds. Autoexposure systems are designed to determine the optimum H, range of apertures, and choice of shutter speed for a given film speed. When the aperture is predetermined by the photographer, the camera's autoexposure system selects the appropriate shutter speed. When the image at a selected speed may show camera shake, a warning signal may occur, or a flash is activated in those cameras that incorporate a built-in flash. This aperture priority system is found in
many autoexposure cameras. The nature of autoexposure depends on the calculated relationship between the luminance of the subject and the sensitivity of the film. Film sensitivity is defined by the International Organization for Standardization (ISO). The ISO designation has two parts, a logarithmic Deutsche Industrie Norm (DIN) value and an arithmetic (ASA) value. Both designations represent the same minimum exposure necessary to produce a density level above the base + fog of the developed film for a given range of exposure. The relationship between the two components is described by the formula DIN = 10 × log(ASA) + 1. Thus an ASA of 100 is also a DIN of 21 [10 × log(100) + 1 = 21]. The most advanced autoexposure systems measure the subject luminances passed by the lens aperture and determine the shutter speed. Conversely, a shutter speed may be set, and the aperture of the camera is automatically determined. Illuminance is measured by a photometer located within the light path projected by the lens. Because focusing and viewing are done through a wide open aperture and metering is for a working aperture, a stop-down method is required if a full aperture method is not offered. Off-the-film metering is a very useful stop-down approach. A number of devices, from secondary reflex mirrors, beam splitters, prisms, and multiple (segmented) photocells to specially designed reflex mirrors, allow a photocell to measure the light that reaches the film plane. The ideal location of the cell for measurement is the film plane. Photocell choices for measurement have specific advantages and certain disadvantages. For example, a selenium photocell does not require a battery but is slow and has low sensitivity, although its spectral sensitivity matches that of the eye. A gallium arsenide phosphide cell requires a battery and an amplifier but is fast and very sensitive to low light; its spectral sensitivity is limited to the visual spectrum.
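The ASA-DIN relationship above can be verified with a few lines (the `asa_to_din` helper is my own naming):

```python
from math import log10

def asa_to_din(asa):
    """DIN = 10 * log10(ASA) + 1, rounded to the nearest degree."""
    return round(10 * log10(asa) + 1)

print(asa_to_din(100))  # 21, as in the example in the text
print(asa_to_din(400))  # 27
print(asa_to_din(25))   # 15
```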
The calculation of exposure depends on the assumption that the scene reflects 18% of the incident light, the integrated value of an average scene. This is not always the case. Consequently, in-camera metering systems apply certain patterns that vary the nature of the calculation for exposure. The patterns are selectable and can cover the range from a 3° spot to an overall weighted average. These metering approaches are found in miniature and medium format cameras; view cameras, however, can use direct measurement by a special fiber optic or other type of probe directly on the ground glass. Without such a device, large format camera users must resort to handheld meters. Unique to the handheld meter is the incident or illuminance meter, in which an integrating, hemispheric diffuser covers the photocell. The meter is
Figure 13. Illustration of metering patterns: eight-segment meter cell patterns with spot metering (Pentax and Nikon).
held at the subject and aimed at the camera. It is assumed that the subject is not very dark or light; the meter is designed on the assumption that the subject approximates a normal range of tones. When the subject is very dark or very light, exposure must be adjusted by one-half to one stop. The location of the light source is also of importance. Studio lighting close to the subject requires that the photographer compensate for any loss of luminance that results from an increase in the subject to light source distance, following the inverse square law. Handheld meters do not compensate for other photographic exposure influences, such as reciprocity law failure or the use of filters for contrast or color control; nor can handheld illuminance meters be used for emitting sources. The relationship among the various factors, film speed, shutter speed, f#, and illuminance, is expressed in the formula foot-candles = 25 (a constant) × f#²/(arithmetic film speed × shutter speed). A measurement of the illuminance in foot-candles can therefore be used in this formula to solve for f# or shutter speed.

Autofocusing

Coupled with autoexposure systems are advanced autofocusing systems. Autofocusing can be applied to a number of photographic systems, and a number of approaches can be used: phase detection, sound ranging, and image contrast comparisons have all been used in different camera systems. Electromechanical coupling racks the lens forward or rearward for correct focus. By using a linear array containing up to 900 photosites, adjacent photosites on the array are compared. Distance may be measured by correlation based on the angle subtended between two zones. Multiple zone focusing, in which a number of fixed zones are preset for focusing distances from infinity to less than 2 ft, is found in a number of current prosumer digital cameras and nonprofessional miniature cameras.
Other similar camera systems offer infrared (IR) focusing. IR focusing involves scanning an IR beam emitted through the lens by using a beam splitter. The returned IR reflection is read by a separate photo array through a nearby lens, and this array sets the focus. IR beam-splitting focusing fails for subjects that have very low or very high IR reflectance. An IR lock can be set when one is photographing through glass. If the camera is equipped with autozooming, the IR diode detector can drive the zoom and maintain constant focus at a given magnification. The use of CCD arrays, photodiodes, and electronic focusing controls (micromotors) is made possible by the incorporation of high-quality, miniaturized analog–digital circuitry and attendant CPUs and ROM chips. Such advanced technology permits focusing by measuring image contrast or phase difference. The state-of-the-art Nikon F5 and its digital equivalent use a phase detection system that has a specifically designed array. Phase, or image shift, is measured from the illuminance exiting the pupil in two distinct zones: two images are projected onto the focal plane, and their displacement is measured. This is very much like a rangefinder system, but instead of a visual split image seen by the photographer, a digital equivalent is detected by the CCD array (Fig. 14).
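Image contrast comparison, one of the autofocusing approaches mentioned above, can be sketched with a toy model: the lens position that maximizes contrast across a linear photosite array is taken as best focus. The blur model and all names here are illustrative, not any manufacturer's implementation:

```python
def contrast(signal):
    """Sum of squared differences between adjacent photosites."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))

def defocus(signal, width):
    """Crude defocus model: moving average over a window that grows with misfocus."""
    half = width // 2
    return [sum(signal[max(0, i - half):i + half + 1]) /
            len(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]

scene = [0] * 10 + [1] * 10   # a sharp edge seen by a 20-photosite line array
true_focus = 5                # hypothetical lens position of best focus

# Hill-climb: try each lens position; blur grows with distance from true focus.
best = max(range(11), key=lambda p: contrast(defocus(scene, abs(p - true_focus) + 1)))
print(best)  # 5: contrast peaks at the in-focus position
```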
Figure 14. Nikon autofocus CCD arrays: a spot array of 150 CCDs and line arrays of 50 CCDs.
CAMERA FILM

Camera film is a light-sensitive material that upon exposure creates a latent image whose susceptibility to development (amplification) is proportional to the exposure received. Camera film is made of a colloidal suspension commonly referred to as an emulsion: a polyester, acetate, or other substrate is coated with a suspension of a compound of silver and one or more halides, as well as other addenda such as spectral sensitizing dyes. A film may have several different emulsions coated on it; the differences among the emulsions give rise to the characteristics of the film. The pictorial contrast of a film is the result of coating the substrate with a number of emulsions that have varying sizes of silver halide grains. Silver halides, denoted AgX, can be made of silver (Ag) and any or all of the three halides bromide, iodide, and chloride. The combination of three halides extends the spectral response of the film beyond the film's inherent UV–blue sensitivity. The other two halides, astatide and fluoride, are not used because of radioactivity and water solubility, respectively. The size of the grain, a microscopic speck of AgX, is the primary determinant of the speed, or sensitivity, of the film: the larger the grain, the greater the response to a photon. Other addenda are added to increase the speed of the film further. Photographic speed (black-and-white) is determined by the formula ASA = 0.8/Hm , where Hm is the exposure in lux-seconds that produces a density of 0.1 above base + fog and 0.8 is a safety factor to guard against underexposure. Overall, films exhibit a variety of properties such as speed, spectral sensitivity, exposure latitude, and a unique characteristic curve response (nonlinear) to exposure and development combinations. The overall image quality of a film is referred to as definition. Definition comprises three components: resolution, graininess, and sharpness.
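The black-and-white speed formula can be illustrated directly (the Hm values below are hypothetical, chosen to give round speeds):

```python
def bw_film_speed(h_m):
    """ASA = 0.8 / Hm, with Hm in lux-seconds giving density 0.1 above base + fog."""
    return 0.8 / h_m

print(round(bw_film_speed(0.008)))  # 100: an Hm of 0.008 lux-second gives ASA 100
print(round(bw_film_speed(0.002)))  # 400: four times the sensitivity
```

The more sensitive the film, the smaller the exposure Hm needed to reach the criterion density, hence the reciprocal relationship.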
Black-and-white (panchromatic) films used in pictorial photography can be made of three to nine emulsion layers, have an exposure latitude of 1,000 : 1, and have a spectral sensitivity from UV to deep red. Black-and-white pan films render all visible wavelengths as various tones of gray; they record color information from the object in terms of luminance rather than chromaticity. Films can also be designed to limit their spectral response to only the UV or IR portions of the spectrum (Fig. 15). Color films are essentially monochrome emulsions that record the blue, green, and red records of an object or scene on three discrete superimposed emulsions, a ‘‘tripack.’’ Development causes these layers to hold the
image as layers of yellow, magenta, and cyan dyes from which a positive or full color print may be made. Negative color film requires reexposing the negative to a similar negative emulsion coated on paper; this negative-to-negative process produces a full color positive image. If the film selected for exposure is a transparency (reversal) film, then the image captured after processing will be positive, that is, the colors of the image will be the same as the colors of the object.
Figure 15. Cross section of panchromatic film: topcoat; emulsion (AgX, Au₂, and dye addenda); antihalation layer; base; and noncurl coat. Addenda include hardeners, antistatic agents, antiscatter dyes, and wetting agents.
Density The most useful approach to determining the effect of exposure and development on a given film is measuring the film’s density. The nature of film and light produces three interactive effects during exposure: scattering, reflection, and absorption. Film density is the result of these three exposure outcomes; it is defined as D = log(1/T) = −log T, where T is transmittance, the ratio of transmitted light to incident light. Density is thus a measurable, logarithmic quantity. The relationship between density and log exposure, as an outcome of photographic exposure and development, is graphically expressed in the D–log H curve. The curve typically illustrates four zones of interest to the photographer: the base + fog region, the toe, the straight line, and the shoulder region, each of which provides useful information about exposure and development. The aim of exposure is to produce the shadow detail of the subject as a density in the negative that is 0.1 above the density of the film’s base + fog. The midtone and highlight reflectances of the scene follow the placement of the shadow on the curve. This correct placement, or exposure, permits proper transfer of these densities in the negative to paper during the printing process. Because a film’s density range can greatly exceed the capacity of paper’s density range, the correct placement of shadow (exact exposure) is the fundamental first step in tone reproduction. Underexposure results in a lower than useful density in the shadow detail. Graphically, this would place the shadow density below 0.1 and possibly into the base + fog of the film. Consequently, no tonal detail would be captured. In a severe overexposure, the shadow detail would be placed further to the right, up the curve. Though shadow detail would still exist further up the curve at greater density, it is quite likely that the highlights would move onto the shoulder zone, and loss of highlight detail would result.
The permissible error range between the shadow–highlight shift is known as exposure latitude. Consider a subject whose luminance range is 1 : 160; its log luminance range is 2.2. If the film–developer combination’s exposure range is log 2.8, the difference of 0.6 is the exposure latitude, or two stops. This margin for adjusting exposure, or for error, is not equidistant on the curve. The two stops of adjustment are primarily in favor of overexposure because the distance up the curve is greater than that toward the base + fog region. The exposure latitude of negative films is much greater than that of transparency (reversal) films. The characteristic curve that illustrates these relationships is not fixed but can exhibit different slopes, or measures of contrast, depending on development factors such as time, agitation, dilution, and type of developer. Sensitometric studies and densitometric plots graphically illustrate for the photographer the possible outcomes of film and developer combinations and their exposure latitude.
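As a numeric check on the definitions above, a short sketch (the function names and the 10% transmittance sample are illustrative, not from the text; one stop corresponds to log 2 ≈ 0.3 in log-exposure units):

```python
import math

def density(transmittance: float) -> float:
    """Optical density: D = log10(1/T) = -log10(T)."""
    return math.log10(1.0 / transmittance)

def latitude_stops(luminance_ratio: float, film_log_range: float) -> float:
    """Exposure latitude in stops; one stop is a factor of 2 in exposure,
    i.e., log10(2) (about 0.3) in log-exposure units."""
    subject_log_range = math.log10(luminance_ratio)
    return (film_log_range - subject_log_range) / math.log10(2)

# An area transmitting 10% of the incident light has a density of 1.0.
print(density(0.10))  # 1.0
# The worked example: a 1:160 subject range (log 2.2) on a film-developer
# combination with a log-exposure range of 2.8 leaves about two stops.
print(round(latitude_stops(160, 2.8), 1))  # 2.0
```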
MEASURES OF IMAGE QUALITY The overall measure of image quality is termed definition. It comprises three major components: graininess, sharpness, and resolution. The transfer of tonal information from the original scene or subject to the negative and through to the final print is of great importance, but a lack of definition, in whole or in part, can spoil an otherwise good image. Sharpness and resolution are attributes of the optical system and the detector. Graininess (noise) is a characteristic of the detector. Sharpness is a subjective evaluation of the quality of an edge. When measured on a microlevel as a change of density across an edge, it is known as acutance. The rate of change of density across the edge, or slope, determines the image’s apparent sharpness. Many factors can influence sharpness: imprecise focusing, low contrast, poor optics, camera vibration, and bad developing technique can all cause a loss of sharpness. After exposure, silver halides are transformed during development into silver grains whose structure and size change. The increase in size causes the grains to overlap and clump into an irregular pattern that is detectable at higher magnifications, such as enlargement printing. This noticeable pattern is referred to as graininess. It is not objectionable in its own right, but it can obscure detail. As density increases in the negative, the perception of graininess decreases; graininess is most visible in the midtone region. It is inherent in the material and cannot simply be eliminated. Options to minimize graininess are to use film sizes that require minimum magnification and to print on lower contrast or matte paper. A microdensitometer can measure graininess and provide a trace of density fluctuations across distance. This granularity is expressed as the standard deviation of the readings about the mean density.
Because the standard deviation is a root-mean-square measure, it is known as rms granularity. A manufacturer’s rms values correlate well with perceived graininess, but granularity figures do not compare well among various manufacturers.
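The rms-granularity measurement described above can be sketched as follows (the density trace is hypothetical):

```python
import math

def rms_granularity(densities):
    """Standard deviation of microdensitometer readings about their mean:
    the rms granularity of a uniformly exposed sample."""
    mean = sum(densities) / len(densities)
    variance = sum((d - mean) ** 2 for d in densities) / len(densities)
    return math.sqrt(variance)

# Hypothetical density readings across a uniformly exposed patch:
trace = [1.02, 0.98, 1.05, 0.95, 1.00]
print(round(rms_granularity(trace), 4))  # 0.0341
```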
STILL PHOTOGRAPHY
Resolution is the ability of the optical and detection systems to reproduce fine detail. All of the components in the imaging system combine to produce an overall measure of resolution known as resolving power. Resolving power is expressed as 1/RS = 1/RL + 1/RD. Greater accuracy can be achieved by taking the second moment, that is, 1/RS² = 1/RL² + 1/RD². Every component in the system contributes to the overall measure, and the result can be no higher than that of the weakest component. Overall photographic definition describes the total image and can consist of many isolated measures that affect the image. Such measures are the point-spread function, as indicated by the size of the Airy disk or diffraction effects; the emulsion spread function of a particular film; and the line spread function, which measures the ability of the image to separate adjacent lines. It would be onerous for the photographer to collect these various measures and attempt to correlate them. There is an overall measure, made available by manufacturers, that eliminates such a task. The modulation transfer function (MTF) represents the overall contrast transfer from the object to the image (Fig. 16). If the contrast of the object were matched totally by the image, the transfer would be 100%: all detail would be maintained at a 1 : 1 contrast ratio regardless of how fine the detail, or how high the spatial frequency. Modulation is determined as M = (Emax − Emin)/(Emax + Emin), evaluated for the object (MO) and for the image (MI); the modulation transfer is MI/MO = Mimage/Mobject. Individual MTF measures for the various imaging components can be multiplied to produce one MTF factor for the system. DIGITAL PHOTOGRAPHY The application of CCD or CMOS detectors in place of film at the plane of focus has quickly changed photography.
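The resolving-power and MTF relationships above can be sketched numerically (the 100 line/mm and percentage figures are hypothetical illustrations):

```python
def system_resolving_power(r_lens: float, r_detector: float) -> float:
    """1/Rs = 1/Rl + 1/Rd: the system resolves less than its weakest part."""
    return 1.0 / (1.0 / r_lens + 1.0 / r_detector)

def system_mtf(*component_mtfs: float) -> float:
    """At a given spatial frequency, component MTF factors multiply."""
    product = 1.0
    for m in component_mtfs:
        product *= m
    return product

# A 100 line/mm lens on 100 line/mm film yields only 50 line/mm overall.
print(system_resolving_power(100.0, 100.0))  # 50.0
# A lens transferring 90% modulation and a film transferring 80%
# give a system transfer of 72% at that frequency.
print(round(system_mtf(0.9, 0.8), 2))  # 0.72
```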
As the resolving power of the detectors has improved and the firmware in the digital camera/back has improved its algorithms for image reconstruction, the usefulness and availability of digital hardware have increased as well. The basic principles of image formation, lens types, and image quality also hold true for digital imaging. Because binary image data are easily manipulated by computer-based software, digital pictures offer enormous advantages. After exposure, photographic data can be deleted, improved, edited, or added to. Images may be sent directly to video monitors, satellites, and remote sites, or may be printed as hard copy on a number of devices that do not require any darkroom or projection device. Ink-jet printers, thermal dye imagers, and other devices can produce images that are virtually indistinguishable from traditional photographic images. The crossover to
Figure 16. Modulation transfer function. [Modulation transfer curves for an ideal system (constant 100%) and two films, A and B, plotted against spatial frequency.]
digital from analog in the professional arena began with the photojournalist and soon extended into catalog photography. This was driven by improved quality, ease of application, and cost effectiveness compared to film. Publication (matrix reproduction) of images in magazines, newspapers, journals, and other media has become more digital than analog. Hybrid approaches that use film as the capture medium and scanners that convert the image to digital data have almost totally ended the use of the process camera in the printing industry. Large format scanning backs are readily available for the view camera. Medium format camera manufacturers provide digital back options for all major systems. Surveys of digital camera sales at the consumer and professional levels show a steady upward trend. The indications are that digital photography will not disappear and may become the preferred method of basic still photography.
T
TELEVISION BROADCAST TRANSMISSION STANDARDS
ALAN S. GODGER
JAMES R. REDFORD
ABC Engineering Lab 30 W/7 New York, NY
Since the invention of television, the images and sound have been captured, processed, transmitted, received, and displayed using analog technology, where the picture and sound elements are represented by signals that are proportional to the image amplitude and sound volume. More recently, as solid-state technology has developed, spurred primarily by the development of computers, digital technology has gradually been introduced into handling the television signal, both for image and sound. The digital electric signal representing the various elements of the image and sound is composed of binary numbers that represent the image intensity, color, and so on, and the sound characteristics. Many portions of television systems are now hybrid combinations of analog and digital, and it is expected that eventually all television equipment will be fully digital, except for the transducers, cameras, and microphones (whose inputs are analog) and the television displays and loudspeakers (whose outputs are analog). The currently used broadcast television transmission standards [National Television Systems Committee (NTSC), phase alternating line (PAL), and sequential color with memory (SECAM)] for 525- and 625-line systems were designed around analog technology, and although significant portions of those broadcast systems are now hybrid analog/digital or digital, the ‘‘over the air’’ transmission system is still analog. Furthermore, other than for ‘‘component’’ processed portions of the system, the video signals take the same ‘‘encoded’’ form from studio camera to receiver and conform to the same standard. The recently developed ATSC Digital Television Standard, however, uses digital technology for ‘‘over the air’’ transmission, and the digital signals used from the studio camera to the receiver represent the same image and sound, but differ in form in portions of the transmission system. This variation is such that in the studio, maximum image and sound information is coded digitally, but during recording, special effects processing, distribution around a broadcast facility, and transmission, the digital signal is ‘‘compressed’’ to an increasing extent as it approaches its final destination at the home. This permits practical and economical handling of the signal.
ANALOG TELEVISION SYSTEMS
Black-and-White Television
The purpose of all conventional broadcast television systems is to provide instantaneous vision beyond human sight, a window into which the viewer may peer to see activity at another place. Not surprisingly, all of the modern systems evolved to have similar characteristics. Basically, a sampling structure is used to convert a three-dimensional image (horizontal, vertical, and temporal variations) into a continuous time-varying broadband electrical signal. This modulates a high-frequency carrier with the accompanying sound, and it is broadcast over the airwaves. Reasonably inexpensive consumer television sets recover the picture and sound in the viewer’s home.
Image Representation. The sampling structure first divides the motion into a series of still pictures to be sequenced rapidly enough to restore an illusion of movement. Next, each individual picture is divided vertically into sufficient segments so that enough definition can be retrieved in this dimension at the receiver. This process is called scanning. The individual pictures generated are known as frames; each contains scanning lines from top to bottom. The number of scanning lines necessary was derived from typical room dimensions and practical display size. Based on the acuity of human vision, a viewing distance of four to six picture heights is intended. The scanning lines must be capable of enough transitions to resolve comparable definition horizontally. The image aspect ratio (width/height) of all conventional systems is 4 : 3, from the motion picture industry ‘‘academy aperture.’’ All systems sample the picture from top left to bottom right. In professional cinema, the projection rate of 48 Hz is sufficient to make flicker practically invisible. Long-distance electric power distribution networks throughout the world use slightly higher rates of 50–60 Hz alternating current. To minimize the movement of vertical ‘‘hum’’ in the picture caused by marginal filtering in direct current power supplies, the picture repetition rate was made equal to the power line frequency. A variation of this process used by all conventional systems is interlaced scanning, whereby every other line is scanned to produce a picture of half the vertical resolution, known as a field. The following field ‘‘fills in’’ the missing lines to form the complete frame. Each field illuminates a sufficient portion of the display so that flicker is practically invisible, yet only half the information is being generated. This conserves bandwidth in transmission.
For both fields to start and stop at the same point vertically, one field must have a half scanning line at the top, and the other field must have a half scanning line at the bottom of the picture. This results in an odd number of scanning lines for the entire frame.
Mechanical systems using rotating disks that have spiral holes to scan the image were investigated in the 1920s and 1930s, but these efforts gave way to ‘‘all electronic’’ television. Prior to World War II, developers in the United States experimented with 343-line and 441-line systems. Developers in Great Britain began a 405-line service, and after the war, the French developed an 819-line system, but these are no longer in use. Synchronization. In most of North and South America and the Far East, where the power line frequency is 60 Hz, a 525-line system became the norm. This results in an interlaced scanning line rate of 15.750 kHz. The development of color television in Europe led to standardization of 625 lines in much of the rest of the world. The resulting line frequency at a 50 Hz field rate is 15.625 kHz. The similar line and field rates enable the use of similar picture tube deflection circuitry and components. Horizontal and vertical frequencies must be synchronous and phase-locked, so they are derived from a common oscillator. Synchronization pulses are inserted between each scanning line (Fig. 1) and between each field to enable the television receiver to present the picture details in the same spatial orientation as at the camera. The sync pulses are of opposite polarity from the picture information, permitting easy differentiation in the receiver. The line sync pulses, occurring at a faster rate, are narrower than the field sync pulses, which typically are the duration of several lines. Sync separation circuitry in the receiver discriminates between the two time constants. Sync pulses cause the scanning to retrace rapidly from right to left and from bottom to top.
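The quoted line rates follow directly from the scanning parameters; a quick check (monochrome values, with a 2:1 interlace):

```python
def line_rate_hz(lines_per_frame: int, field_rate_hz: float) -> float:
    """With 2:1 interlace, the frame rate is half the field rate, so the
    line rate is lines-per-frame times half the field rate."""
    return lines_per_frame * field_rate_hz / 2.0

# 525-line, 60 Hz field rate (monochrome system M) and 625-line, 50 Hz.
print(line_rate_hz(525, 60.0))  # 15750.0
print(line_rate_hz(625, 50.0))  # 15625.0
```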
Blanking. To provide time for the scanning circuits to reposition and stabilize at the start of a line or field, the picture signal is blanked, or turned off. This occurs just before (front porch) and for a short time after (back porch) the horizontal sync pulse, as well as for several lines before and after vertical sync. During vertical sync, serrations are inserted to maintain horizontal synchronization. Shorter equalizing pulses are added in the several blanked lines before and after vertical sync (Fig. 2). All of these pulses occur at twice the rate of normal sync pulses, so that the vertical interval of both fields (which are offset by one-half line) can be identical, simplifying circuit design. Additional scanning lines are blanked before the active picture begins; typically there is a total of 25 blanked lines per field in 625-line systems and 21 blanked lines per field in the 525-line system M. Modern television receivers complete the vertical retrace very soon after the vertical sync pulse is received. The extra blanked lines now contain various ancillary signals, such as those for short-time and line-time distortion and noise measurement, ghost cancellation, source identification, closed captioning, and teletext. Fields and lines of each frame are numbered for technical convenience. In the 625-line systems, field 1 is that which begins the active picture with a half line of video. In the 525-line system M, the active picture of field 1 begins with a full line of video. Lines are numbered sequentially throughout the frame, beginning at the vertical sync pulse in the 625-line systems. For the 525-line system M, the line numbering begins at the first complete line of blanking for each field. Field 1 continues halfway through line 263, at which point field 2 begins and continues through line 262.
Figure 1. The 525-line system M: Line-time signal specifications. [Waveform diagram: blanking level defines 0 IRE (0 V ± 50 mV); setup (picture black) 7.5 ± 2 IRE; maximum luminance 100 +0/−2 IRE; sync level −40 ± 2 IRE; maximum chrominance excursions +120 and −20 IRE; horizontal blanking 10.9 ± 0.2 µs; front porch 1.5 ± 0.1 µs; horizontal sync 4.7 ± 0.1 µs, rise time 140 ± 20 ns (10–90%); breezeway 0.6 ± 0.1 µs; color burst 5.3 ± 0.1 µs after the sync leading edge, 9 ± 1 cycles at 3.58 MHz (≈2.5 µs), 40 ± 2 IRE p–p; color back porch 1.6 ± 0.1 µs; active line time 52.7 µs; total line time 63.6 µs.]
Figure 2. The 525-line system M: Field-time and vertical interval signal specifications. [Diagram: vertical blanking interval of 19–21 lines per field, including preequalizing and postequalizing pulses, the vertical sync pulse with serrations (9H interval), and burst blanking (9H); subsequent vertical-interval lines carry ancillary signals such as ghost cancellation (GCR), closed captioning/extended data service, NABTS teletext, Intercast, source ID, and test, cue, and control signals (VIRS permitted; maximum 70–80 IRE).]
Signal Levels. During blanking, the video signal is at 0 V, the reference used to measure picture (positive-going) and sync (negative-going) levels. Signal amplitudes are measured directly in millivolts, except that, because of changes made during the conversion to color, the 525-line system uses IRE units. A specialized oscilloscope is used to monitor the characteristics of the signal amplitude and period. The waveform monitor has its voltage scale calibrated in millivolts (or IRE units for 525-line applications), and its time base is calibrated to scanning line and picture field rates, as well as in microseconds (Fig. 3).
Figure 3. Typical NTSC-M waveform monitor graticule. A number of additional markings are for measuring various signal distortions.
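Because system M maps its 1 V peak-to-peak composite signal onto a 140 IRE scale, IRE-to-millivolt conversion is a fixed ratio; a small illustrative helper (the function name is an assumption):

```python
def ire_to_millivolts(ire: float) -> float:
    """System M scales the 1 V peak-to-peak composite signal as
    140 IRE units, so 1 IRE = 1000/140 mV."""
    return ire * 1000.0 / 140.0

print(round(ire_to_millivolts(100.0), 1))  # 714.3  (peak white)
print(round(ire_to_millivolts(-40.0), 1))  # -285.7 (sync tip)
print(round(ire_to_millivolts(7.5), 1))    # 53.6   (setup/black level)
```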
Originally, the 525-line system used an amplitude of 1 V peak-to-peak (p–p) for the picture information, and it used 0.4 V for sync. So that color information (modulated onto a subcarrier that can extend above peak white level) could be accommodated within the same dynamic range of existing equipment, the 1.4 V p–p scaling was compressed in amplitude to 1 V p–p. This created fractional voltage levels for peak white (714.3 mV) and sync (−285.7 mV), so a 1 V scale of 140 IRE units was adopted to simplify measurement. The 625-line standards did not have this historical complication. The peak white level is 700 mV, and the sync level is −300 mV. Another anachronism of the 525-line system is the use of a direct-current (dc) offset of picture black from the blanking level. This was done to ensure that during retrace, the electron beam in the display tube was completely cut off, so retrace lines did not appear in the picture. This setup level originally varied between 5 and 10 IRE units above blanking but was standardized at 7.5 IRE for color TV, although it has been discarded altogether in Japan. Setup, or ‘‘lift,’’ was used to some extent in earlier systems but had been abandoned by the advent of 625-line services. The electrical-to-optical transfer characteristic (gamma) of the cathode-ray picture tube is nonlinear. Doubling the video signal level applied to the control grid of the picture tube does not cause the light output to double; rather, it follows a power law of approximately 2.5. To correct for this, the video signal itself is made nonlinear by applying an opposite transfer characteristic of approximately 0.4 (the reciprocal of 2.5). This correction is applied at the camera in all systems. Resolution. Resolution in the vertical direction is determined by taking the total number of scanning lines and subtracting those used for vertical blanking.
This is multiplied by 0.7, a combination of the Kell factor (a correction for slight overlap between adjacent lines) and an additional correction for imperfect interlace. By convention, television resolution is expressed in television (TV) lines per picture height, in contrast to photographic ‘‘line pairs per millimeter.’’ Because the resolution is fixed by the scanning system, picture size is immaterial. Note that a vertical ‘‘line pair’’ in television requires two scanning lines. To compute the bandwidth necessary for equal horizontal resolution, the vertical resolution is multiplied by the aspect ratio of 4 : 3 and by the ratio of total scanning line time to active picture (unblanked) line time. This number is halved because an electric cycle defines a line pair, whereas a ‘‘TV line of resolution’’ is really only one transition. Multiplying the number of cycles per scanning line by the total number of scanning lines in a field, and then by the number of fields per second, gives the bandwidth of the baseband video signal. Broadcasting Standards. The various systems have been assigned letter designations by the International Telecommunications Union (ITU). The letters were assigned as the systems were registered, so alphabetical order bears no relation to system differences (Table 1), but a rearrangement highlights similarities (Table 2).
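Following the recipe above for the 525-line system M (21 blanked lines per field and a 52.7 µs active line time, as given in the line-time specifications) lands near the nominal 4.2 MHz video bandwidth. A sketch, computed equivalently by fitting the required cycles into the active line time:

```python
def video_bandwidth_hz(total_lines: int, blanked_lines_per_field: int,
                       kell: float = 0.7, aspect: float = 4.0 / 3.0,
                       active_line_us: float = 52.7) -> float:
    # Vertical resolution: active lines times the ~0.7 Kell/interlace factor.
    active_lines = total_lines - 2 * blanked_lines_per_field
    v_res = active_lines * kell
    # Matching horizontal resolution over a 4:3 picture, halved because one
    # electrical cycle spans a line pair (two TV lines of resolution).
    cycles_per_active_line = v_res * aspect / 2.0
    # Those cycles must fit within the active (unblanked) line time.
    return cycles_per_active_line * 1e6 / active_line_us

# 525-line system M: about 4.3 MHz, near the nominal 4.2 MHz video bandwidth.
print(round(video_bandwidth_hz(525, 21) / 1e6, 1), "MHz")
```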
Only scanning parameters and radio-frequency (RF) characteristics are defined; color encoding is not specified. Systems A, C, E, and F are no longer used. Portions of the very high frequency (VHF) and ultrahigh-frequency (UHF) RF spectrum are divided into channels for television broadcast. Modern channel assignments are 6 MHz wide in the Americas and the Far East. Elsewhere, they are generally 7 MHz in VHF and 8 MHz in UHF, and the carrier is a fixed distance from one edge. Because the picture carrier in most systems is near the lower edge and the audio signals are at the upper end, when the opposite is true, the channels are called inverted. As a bandwidth-saving technique, the amplitude-modulated RF signal is filtered so that only one sideband is fully emitted; the other sideband is vestigial, or partially suppressed, which aids in fine-tuning to the correct carrier frequency at the receiver. The full sideband, which represents the video bandwidth, extends in the direction of the audio carrier(s), but sufficient guard band is included to prevent interference. The bandwidth of the vestigial sideband varies among systems, as does the placement of the audio carrier in relation to the picture carrier (Fig. 4). These factors complicate receiver design in areas where signals of two or more systems may exist. The main audio signal is sent via an amplitude-modulated or, more commonly, frequency-modulated carrier. Peak deviation in frequency modulation (FM) is ±50 kHz with 50 µs preemphasis, except ±25 kHz and 75 µs for systems M and N. Preemphasis for improving the signal-to-noise ratio is common in FM systems; it was used in some amplitude-modulation (AM) systems to simplify receivers that could accommodate both modulation schemes. Amplitude modulation is used in all systems for the video waveform, which, unlike audio, is not sinusoidal.
The majority of systems employ a negative sense of modulation, such that negative excursions of the baseband signal produce an increase in the amplitude of the modulated carrier. This allows the constant-amplitude sync pulses to serve as an indication of the received RF signal level for automatic gain control. Interfering electrical energy also tends to produce less noticeable black flashes in the received picture, and the duty cycle of the signal is reduced, which consumes less power at the transmitter. Multichannel Sound Early attempts to provide stereo sound for special TV events involved simulcasting, whereby an FM radio station in the same coverage area broadcast the stereo audio for the program. Due to the high profitability of FM radio today, this scheme has become impractical. For the 525-line system M, which has channels of only 6 MHz bandwidth, a multiplexing scheme is used on the existing single audio carrier. Due to the wider channel bandwidths of 625-line systems, multiple sound carriers emerged as the solution. Multiplex Sound Systems. In the United States, Zenith Electronics developed multichannel television sound (MTS), a pilot-tone system in which the sum of the
Table 1. Principal Characteristics of World Television Systems. Each entry lists, in order: frequency band; channel bandwidth and visual-carrier offset from channel edge (MHz); video bandwidth (MHz); vestigial sideband (MHz); visual modulation polarity; picture/synchronization ratio; black pedestal (%); visual/aural separation (MHz); aural modulation, peak deviation (kHz), and preemphasis (µs); lines per frame; field blanking (lines); field synchronization (lines); equalization pulses (lines); vertical × horizontal resolution (L/ph); field rate (Hz); line rate (Hz); line blanking, front porch, and line synchronization (µs).
System A (United Kingdom; obsolete): VHF; 5, −1.25; 3; +0.75; positive; 7/3; 0; −3.5; AM; 405; 13; 4; none; 254 × 270; 50; 10,125; 18, 1.75, 9.0.
System B/G (Western Europe): VHF/UHF; 7/8, +1.25; 5; −0.75; negative; 7/3; 0; +5.5; FM, ±50, 50; 625; 25; 2.5; 2.5; 413 × 400; 50; 15,625; 12, 1.5, 4.7.
System C (Luxembourg; obsolete): VHF; 7, +1.25; 5; −0.75; positive; 7/3; 0; +5.5; AM, 50 µs preemphasis; 625; 25; 2.5; 2.5; 413 × 400; 50; 15,625; 12, 1.4, 5.0.
System D/K (Eastern Europe): VHF/UHF; 8, +1.25; 6; −0.75; negative; 7/3; 0; +6.5; FM, ±50, 50; 625; 25; 2.5; 2.5; 413 × 450; 50; 15,625; 12, 1.5, 4.7.
System E (France; obsolete): VHF; 14, ±2.83; 10; ±2.0; positive; 7/3; 0; ±11.15; AM; 819; 33; 20 µs; none; 516 × 635; 50; 20,475; 9.5, 0.6, 2.5.
System F (Belgium; obsolete): VHF; 7, +1.25; 5; −0.75; positive; 7/3; 0; +5.5; AM, 50 µs preemphasis; 819; 33; 3.5; 3.5; 516 × 400; 50; 20,475; 9.2, 1.1, 3.6.
System H (Belgium): UHF; 8, +1.25; 5; −1.25; negative; 7/3; 0; +5.5; FM, ±50, 50; 625; 25; 2.5; 2.5; 413 × 400; 50; 15,625; 12, 1.5, 4.7.
System I (United Kingdom): VHF/UHF; 8, +1.25; 5.5; −1.25; negative; 7/3; 0; +6.0; FM, ±50, 50; 625; 25; 2.5; 2.5; 413 × 430; 50; 15,625; 12, 1.5, 4.7.
System K (French Overseas Territories): VHF/UHF; 8, +1.25; 6; −1.25; negative; 7/3; 0; +6.5; FM, ±50, 50; 625; 25; 2.5; 2.5; 413 × 450; 50; 15,625; 12, 1.5, 4.7.
System L (France, VHF band I; inverted channel): VHF-1; 8, −1.25; 6; +1.25; positive; 7/3; 0; −6.5; AM; 625; 25; 2.5; 2.5; 413 × 450; 50; 15,625; 12, 1.5, 4.7.
System L (France, VHF band III/UHF): VHF-3/UHF; 8, +1.25; 6; −1.25; positive; 7/3; 0; +6.5; AM; 625; 25; 2.5; 2.5; 413 × 450; 50; 15,625; 12, 1.5, 4.7.
System M (North America/Far East): VHF/UHF; 6, +1.25; 4.2; −0.75; negative; 10/4; 7.5; +4.5; FM, ±25, 75; 525; 21; 3; 3; 350 × 320; 59.94; 15,734; 10.9, 1.5, 4.7.
System N (South America): VHF/UHF; 6, +1.25; 4.2; −0.75; negative; 7/3; 0; +4.5; FM, ±25, 75; 625; 25; 2.5; 2.5; 413 × 320; 50; 15,625; 12, 1.5, 4.7.
Figure 4. Television RF channel spectra for the various world television systems. For simplicity, only a single illustration of the lower and upper portions of the channel spectra is shown for the 8 MHz wide channels. Therefore, for systems D, H, I, and K, the lower and upper illustrations are not adjacent to each other. [Spectrum diagrams for the 5, 6, 7, 8, and 14 MHz channels; frequencies are in MHz, and parentheses denote obsolete standards.]
two stereo channels (L + R) modulates the main TV audio FM carrier and provides a monophonic signal to conventional receivers. A difference signal (L − R) is dbx-TV companded and suppressed-carrier amplitude-modulated onto an audio subcarrier at twice the line scanning frequency (2fH), and a pilot is sent at the line frequency as a reference for demodulation. A second audio program (SAP) may be frequency modulated at 5fH, and nonpublic voice or data may be included at 6.5fH. Japan developed a similar FM/FM stereo system using FM of the L − R subchannel at 2fH. A control subcarrier at 3.5fH is tone-modulated to indicate whether stereo or second audio programming is being broadcast. Multiple Carrier Systems. In Germany, IRT introduced Zweiton, a dual-carrier system for transmission standards B and G. In both standards, the main carrier is frequency-modulated at 5.5 MHz above the vision carrier. For stereo, this carrier conveys the sum of the two channels (L + R). The second frequency-modulated sound carrier is placed 15.5 times the line scanning frequency above the main carrier, or 5.742 MHz above the vision carrier. For transmission system D, a similar relationship exists to the 6.0 MHz main channel. For stereo, this carrier conveys only the right audio channel. A reference carrier at 3fH is tone-modulated, and the particular tone frequency switches receivers for stereo or second audio channel sound. A variant is used in Korea, where the second carrier is placed 4.72 MHz above the vision carrier and conveys L − R information. The British Broadcasting Corporation (BBC) developed near-instantaneous companded audio multiplex (NICAM), a digital sound carrier system. Both audio channels are sampled at 32 kHz at 14-bit resolution. Each sample is compressed to 10 bits, then arranged into frame packages of 728-bit length. These are rearranged, and then the data are scrambled to disperse the effect of noise at the receiver.
Finally, two bits at a time are fed to a QPSK modulator for transmission. Either stereo or two independent sound signals may be transmitted. Other possible combinations are (1) one sound and one data channel or (2) two data channels. The original analog-modulated single-channel sound carrier is retained for older receivers. The digital carrier is 6.552 MHz above the vision carrier for transmission system I or 5.85 MHz for systems B, G, and L. NTSC Color Television It has long been recognized that color vision in humans results from three types of photoreceptors in the eye, each sensitive to a different portion of the visible spectrum. The ratio of excitation creates the perception of hue and saturation, and the aggregate evokes the sensation of brightness. Stimulating the three types of photoreceptors using three wavelengths of light can produce the impression of a wide gamut of colors. For television, the image is optically divided into three primary color images, and this information is delivered to the receiver, which spatially integrates the three pictures, something like tripack color film and printing.
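Stepping back to the multichannel sound systems described above, the Zweiton carrier placement and the NICAM sample budget follow from the quoted figures (variable names are illustrative):

```python
# Zweiton (systems B/G): the second sound carrier sits 15.5 * fH above
# the main sound carrier, which is itself 5.5 MHz above the vision carrier.
fh_625 = 15_625.0  # 625-line horizontal scanning frequency, Hz
second_carrier_mhz = 5.5 + 15.5 * fh_625 / 1e6
print(round(second_carrier_mhz, 3))  # 5.742

# NICAM: two channels sampled at 32 kHz, companded to 10 bits per sample.
audio_payload_kbps = 2 * 32_000 * 10 / 1000.0
print(audio_payload_kbps)  # 640.0 (parity and framing raise the total)
```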
Sequential Color Systems. The first attempts at commercial color TV involved transmitting the three color images sequentially. Compatibility with existing transmitters was essential from an economic standpoint. Because line-sequential transmission caused crawl patterning, field-sequential transmission was preferred. However, there was no separate luminance signal for existing black-and-white receivers to use. To reduce flicker caused by the apparent brightness differences among the three primary colors, the field rate had to be increased, and maintaining an equivalent channel bandwidth required that the number of scanning lines be reduced. These changes aggravated the compatibility problem with existing sets. A field-sequential system developed by the Columbia Broadcasting System (CBS) network was briefly commissioned in the United States during 1951. To be compatible, a color television system must have the same channel bandwidth as existing monochrome transmitters and receivers, use equivalent scanning parameters, and supply the same luminance signal, as if the picture were black-and-white. An all-industry body, the National Television Systems Committee (NTSC), was set up in the United States to devise such a color TV system. Separate Luminance and Mixed Highs. The human visual system senses shapes and edges from brightness variations. Color fills in only the larger areas, much like a child’s coloring book. At the suggestion of Hazeltine Electronics, the existing wide-bandwidth luminance signal of black-and-white television was retained. The color information is limited to a much narrower bandwidth, of the order of 1 MHz, which restricts its resolution in the horizontal direction. This first led to a dot-sequential system that sampled the three colors many times along each scanning line to form a high-frequency chrominance signal. The frequency of sampling may be likened to a subcarrier signal whose amplitude and phase change according to color variations along the line.
At the receiver, the ‘‘dot’’ patterns of each primary resulting from sampling are averaged in low-pass frequency filters. The result is a continuous but low-resolution full-color signal. Equal amounts of their higher frequency components are summed to form a mixed-highs signal for fine luminance detail (Y = 1/3R + 1/3G + 1/3B), an idea from Philco.
Quadrature Modulation. The dot-sequential concept formed the basis for a more sophisticated simultaneous system. The luminance signal contains both high- and low-frequency components. Only two lower resolution color signals are needed (the third can be derived by subtracting their sum from the low-frequency portion of luminance). The spectral composition of green is nearest to that of luminance, so transmitting the red and blue signals improves the signal-to-noise performance. These low-frequency color signals are sampled by using a time-multiplexing technique proposed by Philco, known as quadrature modulation. The chrominance signal is formed by the addition of two subcarriers, which are locked at the same frequency
TELEVISION BROADCAST TRANSMISSION STANDARDS
but differ in phase by 90°. The two subcarriers are modulated by separate baseband signals such that each is sampled when the other carrier is at a null. This results in modulating the subcarrier in both amplitude and phase. The amplitude relates to the saturation of the color, and the phase component corresponds to the hue (Fig. 5).
Figure 5. Quadrature modulation. Addition of two amplitude-modulated signals whose carrier frequencies are in phase quadrature (same frequency but offset in phase by 90°) produces an output signal whose carrier is modulated in both amplitude (AM) and phase (PM) simultaneously. This method of combining two baseband signals onto a single carrier is called quadrature modulation. In the case of NTSC or PAL encoding for color television, the two baseband components represent the two chrominance signals (I and Q for NTSC, U and V for PAL). The resulting amplitude of the subcarrier relates to the saturation, and the phase conveys the hue information. The frequency of the subcarrier is unchanged.
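Quadrature modulation can be verified numerically: the sum of two equal-frequency carriers 90° apart is a single carrier whose amplitude and phase carry the two baseband values. A minimal sketch in Python (the I and Q values here are arbitrary illustrations, not taken from the article):

```python
import math

I, Q = 0.4, -0.2                      # illustrative baseband chrominance values
fsc = 3.579545e6                      # NTSC color subcarrier frequency, Hz

amplitude = math.hypot(I, Q)          # subcarrier amplitude -> saturation
phase = math.atan2(Q, I)              # subcarrier phase -> hue

# Sample two subcarrier cycles and confirm that the sum of the two
# quadrature carriers equals one carrier modulated in AM and PM at once.
for k in range(1000):
    t = k * (2 / fsc) / 1000
    wt = 2 * math.pi * fsc * t
    chroma = I * math.cos(wt) + Q * math.sin(wt)   # two carriers in quadrature
    single = amplitude * math.cos(wt - phase)      # one AM/PM carrier
    assert abs(chroma - single) < 1e-9
```

The identity A·cos(ωt − φ) = A·cosφ·cos ωt + A·sinφ·sin ωt is what lets the receiver recover I and Q independently by sampling at the appropriate subcarrier phases.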
Frequency Multiplexing. The sampling rate is more than twice the highest frequency of the color signals after low-pass filtering, so the chrominance information shares the upper part of the video spectrum with luminance. This frequency-multiplexing scheme was put forward by General Electric. The scanning process involves sampling the image at line and field rates; therefore, energy in the video signal is concentrated at intervals of the line and field frequencies. These sidebands leave pockets between them where very little energy exists. The exact subcarrier frequency was made an odd multiple of one-half the line scanning frequency. This causes the sidebands containing the color information to fall likewise in between those of the existing luminance signal (Fig. 6). Therefore, the phase of the subcarrier signal is opposite line-to-line. This prevents the positive and negative excursions of the subcarrier from lining up vertically in the picture, and it results in a less objectionable ‘‘dot’’ interference pattern between the subcarrier and luminance signal. Comb filtering to separate luminance and chrominance may be employed by examining the phase of information around the subcarrier frequency on adjacent lines. The dot pattern is further concealed because the subcarrier phase is also opposite frame-to-frame. A four-field sequence is thus established by the two interlaced picture fields together with the alternating phase of the subcarrier on sequential frames, and this sequence must be maintained. Sources to be intercut or mixed must be properly timed, and editing points must be chosen to preserve the sequence of the four color fields.
Figure 6. Frequency spectrum of composite NTSC-M color television signal showing relationships between the baseband and channel spectra and between sidebands of the picture carrier and color subcarrier.
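The interleaving rules above fix the NTSC-M frequencies exactly. A quick numerical check (assuming, as is standard for NTSC-M, that the 4.5 MHz sound intercarrier is the 286th harmonic of the line frequency):

```python
# NTSC-M frequency relationships.
sound_intercarrier = 4.5e6           # Hz, FM sound offset retained from B&W system M
fH = sound_intercarrier / 286        # line frequency, lowered ~0.1% from 15,750 Hz
fV = fH / (525 / 2)                  # field frequency (two interlaced fields per frame)
fsc = (455 / 2) * fH                 # subcarrier: odd multiple (455) of half the line rate

print(round(fH, 3))   # 15734.266 Hz
print(round(fV, 2))   # 59.94 Hz
print(round(fsc))     # 3579545 Hz, i.e., the 3.579545 MHz color subcarrier
```

Because 455 is odd, the subcarrier phase reverses on successive lines, which is exactly the interleaving property described in the text.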
A slight modification of the line and field scanning frequencies was necessary because one of the sidebands of the new color subcarrier fell right at 4.5 MHz, the rest frequency of the FM sound carrier for system M. Existing black-and-white receivers did not have adequate filtering to prevent an annoying buzz when the program sound was low and color saturation was high. By reducing the scanning frequencies by a mere 0.1%, the sidebands of luminance and chrominance remained interleaved, but shifted to eliminate the problem. Hence, the field frequency became 59.94 Hz, and the line frequency became 15.734 kHz. Color-Difference Signals. Another suggestion came from Philco: Interference with the luminance signal is minimized by forming the two color signals as the difference between their respective primary and luminance (i.e., R − Y, B − Y). This makes the color-difference signals smaller in amplitude because most scenes have predominantly pale colors. The subcarrier itself is suppressed, so that only the sidebands are formed. When there is no color in the picture, the subcarrier vanishes. This necessitates a local oscillator at the receiver. A color-burst reference is inserted on the back porch of the horizontal sync pulse that synchronizes the reference oscillator and provides an amplitude reference for color saturation automatic gain control. Constant Luminance. In the constant-amplitude formulation (Y = 1/3R + 1/3G + 1/3B), the luminance signal does not represent the exact scene brightness. Part of the brightness information is carried by the chrominance channels, so unwanted irregularities in them, such as noise and interference, produce brightness variations. Also, the gray-scale rendition of a color broadcast on a black-and-white receiver is not correct. Hazeltine Electronics suggested weighting the contributions of the primaries to the luminance signal according to their actual addition to the displayed brightness. 
The color-difference signals will then represent only variations in hue and saturation because they are ‘‘minus’’ the true brightness (R − Y, B − Y). A design based on this principle is called a constant-luminance system. For the display phosphors and white point originally specified, the luminance composition is Y = 30%R + 59%G + 11%B. Scaling Factors. The two low-bandwidth color-difference signals modulate a relatively high-frequency subcarrier superimposed onto the signal level representing luminance. However, the peak subcarrier excursions for some hues could reach far beyond the original black-and-white limits, where the complete picture signal is restricted between levels representing blanking and peak white picture information. Overmodulation at the transmitter may produce periodic suppression of the RF carrier and/or interference with the synchronizing signals. If the overall amplitude of the composite (luminance level plus superimposed subcarrier amplitude) signal were simply lowered, the effective power of the transmitted signal would be significantly reduced.
A better solution was to reduce the overall amplitude of only the modulated subcarrier signal. However, such an arbitrary reduction would severely impair the signal-to-noise ratio of the chrominance information. The best solution proved to be selective reduction of each of the baseband R − Y and B − Y signal amplitudes to restrict the resulting modulated subcarrier excursions to ±4/3 of the luminance signal levels. The R − Y signal is divided by 1.14, and B − Y is divided by 2.03. It was found that the resulting 33.3% overmodulation beyond both peak white and blanking levels was an acceptable compromise because the incidence of highly saturated colors is slight (Fig. 7).
Proportioned Bandwidths. RCA proposed shifting the axes of modulation from R − Y, B − Y to conform to the greater and lesser acuity of human vision for certain colors. The new coordinates, called I and Q, are along the orange/cyan and purple/yellow-green axes. This was done so that the bandwidths of the two color signals could be proportioned to minimize cross talk (Fig. 8). Early receivers used the wider bandwidth of the I signal; however, it became evident that a very acceptable color picture could be reproduced when the I bandwidth is restricted to the same as that of the Q channel. Virtually all NTSC receivers now employ ‘‘narrowband’’ I-channel decoding. A block diagram of NTSC color encoding is shown in Fig. 9. These recommendations were adopted by the U.S. Federal Communications Commission in late 1953, and commercial color broadcasting began in early 1954.
Sequential and Memory (SECAM)
The economic devastation of World War II delayed the introduction of color television to Europe and other regions. Because differences between 525- and 625-line scanning standards made video tapes incompatible anyway and satellite transmission was unheard of, there seemed little reason not to explore possible improvements to the NTSC process.
Sequential Frequency Modulation. The most tenuous characteristic of NTSC proved to be its sensitivity to distortion of the phase component of the modulated subcarrier. Because the phase component imparts color hue information, errors are quite noticeable, especially in skin tones. Also of concern were variations in the subcarrier amplitude, which affect color saturation. Most long-distance transmission circuits in Europe did not have the phase and gain linearity to cope with the added color subcarrier requirements. A solution to these drawbacks was devised by the Compagnie Française de Télévision in Paris. By employing a one-line delay in the receiver, quadrature modulation of the subcarrier could be discarded and the color-difference signals (called DR and DB in SECAM) sent sequentially on alternate lines. This halves the vertical resolution in color; however, it is sufficient to provide only coarse detail vertically, as is already the case horizontally. In early development, AM was contemplated; however, the use of FM also eliminated the effects of subcarrier
Figure 7. (a) 100% NTSC color bars (100/7.5/100/7.5). (b) 100% PAL color bars (100/0/100/0). (c) Standard 75% ‘‘EIA’’ color bars (75/7.5/75/7.5). (d) Standard 75% ‘‘EBU’’ color bars (100/0/75/0).
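The luminance weights and scaling factors described in the text reproduce the color-bar levels of Fig. 7. A sketch for the NTSC 100/7.5/100/7.5 bars (chroma amplitude is computed on the equivalent scaled R − Y/B − Y axes, which have the same magnitude as I/Q):

```python
# Peak composite excursions for 100% NTSC color bars, from the luminance
# weights (30/59/11) and the subcarrier scaling divisors (1.14, 2.03).
SETUP = 7.5                          # IRE; NTSC-M black-level setup

def bar_levels(r, g, b):
    y = 0.30 * r + 0.59 * g + 0.11 * b
    v = (r - y) / 1.14               # scaled R - Y
    u = (b - y) / 2.03               # scaled B - Y
    chroma = (u * u + v * v) ** 0.5  # subcarrier amplitude (fraction of video range)
    luma_ire = SETUP + (100 - SETUP) * y
    swing = (100 - SETUP) * chroma
    return luma_ire, luma_ire + swing, luma_ire - swing

luma, peak, trough = bar_levels(1, 1, 0)   # yellow bar
print(round(luma, 1))    # 89.8 IRE
print(round(peak, 1))    # 131.3 IRE, well beyond 100 IRE peak white (cf. Fig. 7a)
```

The yellow bar's peak excursion of about 131 IRE is the overmodulation that the scaling factors were chosen to keep within acceptable limits.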
amplitude distortion. In addition, FM allowed recording the composite signal on conventional black-and-white tape machines because precise time-base correction was not necessary.
Compatibility. In FM, the subcarrier is always present, superimposed on the luminance signal at constant amplitude (unlike NTSC, in which the subcarrier produces noticeable interference with the luminance only on highly saturated colors). To reduce its visibility, a number of techniques are employed. First, preemphasis is applied to the baseband color-difference signals to lessen their amplitudes at lower saturation but preserve an adequate signal-to-noise ratio (low-level preemphasis; see Fig. 10). Second, different subcarrier frequencies are employed that are integral multiples of the scanning line frequency; foB is 4.25 MHz (272 H), and foR is 4.40625 MHz (282 H). The foR signal is inverted before modulation so that the maximum deviation is toward a lower frequency, reducing the bandwidth required for the dual subcarriers. Third, another level of preemphasis is applied to the modulated subcarrier around a point between the two rest frequencies, known as the ‘‘cloche’’ frequency of 4.286 MHz (high-level preemphasis, the so-called ‘‘antibell’’ shaping shown in Fig. 11). Finally, the phase of the modulated subcarrier is reversed on consecutive
fields and, additionally, on every third scanning line or, alternately, every three lines.
Line Identification. Synchronizing the receiver to the alternating lines of color-difference signals is provided in one of two ways. Earlier specifications called for nine lines of vertical blanking to contain a field identification sequence formed by truncated sawteeth of the color-difference signals from the white point to the limiting frequency (so-called ‘‘bottles’’; see Fig. 12). This method is referred to as ‘‘SECAM-V.’’ As use of the vertical interval increased for ancillary signals, receiver demodulators were fashioned to sample the unblanked subcarrier immediately following the horizontal sync pulse, providing an indication of line sequence from the rest frequency. Where this method is employed, it is called ‘‘SECAM-H.’’ An advantage of this method is near-instantaneous recovery from momentary color field sequence errors, whereas SECAM-V receivers must wait until the next vertical interval.
Issues in Program Production. High-level preemphasis causes the chrominance subcarrier envelope to increase in amplitude at horizontal transitions, as can be seen on a waveform monitor (Fig. 13). Unlike NTSC, the subcarrier amplitude bears no relation to saturation, so, except for testing purposes, a luminance low-pass filter is employed
Figure 8. Vector relationship among chrominance components and corresponding dominant wavelengths. 75% color bars with 7.5% setup. Linear NTSC system, NTSC luminophors, illuminant C. Hexagon defines maximum chrominance subcarrier amplitudes as defined by 100% color bars with 7.5% setup. Caution: The outer calibration circle on vectorscopes does not represent exactly 100 IRE P–P.
on the waveform monitor. A vectorscope presentation of the saturation and hue is implemented by decoding the FM subcarrier into baseband color-difference signals and applying them to an X, Y display. Unfortunately, the choice of FM for the color subcarrier means that conventional studio production switchers cannot be employed for effects such as mixing or fading from one scene to another, because reducing the amplitude of the subcarrier does not reduce the saturation. This necessitates using a component switcher and encoding afterward. When the signal has already been encoded to SECAM (such as from a prerecorded video tape), it must be decoded before the component switcher and then reencoded. Like NTSC, program editing must be done in two-frame increments. Although the subcarrier phase is reversed on
a field-and-line basis, establishing a ‘‘12-field sequence,’’ it is the instantaneous frequency, not the phase, that defines the hue. However, the line-by-line sequence of the color-difference signals must be maintained. The odd number of scanning lines means that each successive frame begins with the opposite color-difference signal. As described before, mixes or fades are never done by using composite signals. Because the instantaneous frequency of the subcarrier is not related to the line scanning frequency, it is impossible to employ modern comb-filtering techniques to separate the chrominance and luminance in decoding. Increasingly, special effects devices rely on decoding the composite TV signal to components for manipulation and then reencoding. Every operation of this sort impairs luminance resolution because a notch filter must be used around
Figure 9. Block diagram of RGB to NTSC encoding (and related circuitry).
Figure 10. SECAM baseband (low-level) preemphasis.
the subcarrier frequency. These concerns have led many countries that formerly adopted SECAM to switch to PAL for program production and transcode to SECAM only for RF broadcasting.
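The two SECAM rest frequencies follow directly from the harmonic relationships given above:

```python
# SECAM subcarrier rest frequencies as harmonics of the 625-line rate
# (625 lines x 25 frames/s = 15,625 Hz line frequency).
fH = 15_625.0                 # Hz, line frequency of 625/50 systems

foB = 272 * fH                # D'B subcarrier rest frequency
foR = 282 * fH                # D'R subcarrier rest frequency

print(foB)   # 4250000.0 Hz -> 4.25 MHz
print(foR)   # 4406250.0 Hz -> 4.40625 MHz
```

The ‘‘cloche’’ frequency of 4.286 MHz, around which the high-level preemphasis is centered, lies between these two rest frequencies.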
Phase Alternate Line (PAL)
To retain the ease of the NTSC in program production, yet correct for phase errors, the German Telefunken Company developed a system more comparable to the NTSC that retains quadrature modulation. Because of the wider channel bandwidth of 625-line systems, the color subcarrier could be positioned so that the sidebands from both color-difference signals have the same bandwidth. This means that the R − Y and B − Y signals could be used directly, rather than I and Q as in the NTSC. Identical scaling factors are used, and the signals are known as V and U, respectively.
Color Phase Alternation. In the PAL system, the phase of the modulated V component of the chrominance signal is reversed on alternate lines to cancel chrominance phase distortion acquired in equipment or transmission. A reference indicating which lines have +V or −V phase is provided by also shifting the phase of the color burst signal by ±45° on alternate lines. Any phase shift encountered has the opposite effect on the displayed hue on adjacent lines in the picture. If the phase error is limited to just a few degrees, the eye integrates the error, because more chrominance detail is provided in the vertical direction than can be perceived at normal viewing distances. Receivers based on this principle are said to have ‘‘simple PAL’’ decoders. If the phase error is more than a few degrees, the difference in hue produces a venetian-blind effect, called Hanover bars. Adding a one-line delay permits integrating chrominance information from adjacent scanning lines electrically; there is then only a slight reduction in saturation for large errors. Color resolution is reduced by half in the vertical direction but more closely matches the horizontal resolution, owing to band-limiting in the encoder. This technique of decoding is called ‘‘deluxe PAL.’’
Compatibility. The line-by-line phase reversal results in an identical phase on alternate lines for hues on or near the V axis. To maintain a low-visibility interference pattern in PAL, the subcarrier frequency is made an odd multiple of one-quarter of the line frequency (creating eight distinct color fields). This effectively offsets the excursions of the V component by 90° line-to-line and offsets those of the U component by 270°. Because this 180° difference would cause the excursions of one component to line up vertically with those of the other in the next frame, an additional 25 Hz offset (fV/2) is added to the PAL subcarrier frequency to reduce its visibility further. In early subcarrier oscillator designs, the reference was derived from the mean phase of the alternating burst signal. Interlaced scanning causes a half-line offset between fields with respect to the vertical position, so that
Figure 11. SECAM RF (high-level) preemphasis.
Figure 12. SECAM field identification ‘‘bottles.’’
Figure 13. SECAM color bar waveforms.
the number of bursts actually blanked during the 7½ H vertical interval would be different for the odd versus even fields. Because the phase of the burst alternates line-to-line, the mean phase would then appear to vary in this region, causing disturbances at the top of the picture. This is remedied by a technique known as ‘‘Bruch blanking.’’ The burst blanking is increased to a total of nine lines and repositioned in a four-field progression to include the 7½ H interval, such that the first and last burst of every field has a phase corresponding to (−U + V), or +135°. The burst signal is said to ‘‘meander,’’ so that color fields 3 and 7 have the earliest bursts.
Issues in Program Production. Because the subcarrier frequency in PAL is an odd multiple of one-quarter the line frequency, each line ends on a quarter-cycle. This, coupled with the whole number plus one-half lines per field, causes the phase of the subcarrier to be offset in each field by 45°. Thus, in PAL, the subcarrier phase repeats only every eight fields, creating an ‘‘eight-field sequence.’’
This complicates program editing because edit points occur only every four frames, which is slightly less than 1/6 s. Comb filtering to separate chrominance and luminance in decoding is somewhat more complicated in PAL; however, it has become essential for special picture effects. On a waveform monitor, the composite PAL signal looks very much like NTSC, except that the 25 Hz offset causes a slight phase shift from line to line, so that, when viewing the entire field, the sine-wave pattern is blurred. Because of the reversal of the phase of the V component on alternate lines, the vectorscope presentation has a mirror image about the U axis (Fig. 14).
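The quarter-line relationship plus the 25 Hz offset gives the PAL-B/G/H/I subcarrier frequency exactly:

```python
# PAL-B/G/H/I subcarrier: an odd multiple (1135) of one-quarter the line
# frequency, plus a 25 Hz (half the field frequency) offset.
fH = 15_625.0                     # Hz, line frequency of 625-line systems
fV = 50.0                         # Hz, field frequency

fsc = (1135 / 4) * fH + fV / 2
print(fsc)                        # 4433618.75 Hz = 4.43361875 MHz

# 1135 is odd, so each line ends on a quarter-cycle and the subcarrier
# phase repeats only after eight fields.
assert 1135 % 2 == 1
```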
Variations of PAL. The differences between most 625-line transmission standards involve only RF parameters (such as sound-to-picture carrier spacing). For 625-line PAL program production, a common set of technical specifications may be used. These standards are routinely referred to in the production environment as ‘‘PAL-B,’’ although the baseband signals may be used with any 625-line transmission standard.
Several countries in South America have adopted the PAL system. The 6 MHz channel allocations in that region meant that the color subcarrier frequency had to be suitably located, about 1 MHz lower in frequency than for 7 MHz or 8 MHz channels. The exact frequencies are close to, but not the same as, those for the NTSC. The 625-line system is known as PAL-N. Studio production for this standard is done in conventional ‘‘PAL-B,’’ then converted to PAL-N at the transmitter. The 525-line PAL is known as PAL-M, and it requires studio equipment unique to this standard, although the trend is toward using conventional NTSC-M equipment and transcoding to PAL-M at the transmitter. Unlike all other PAL systems, PAL-M does not employ a 25 Hz offset of the subcarrier frequency.
Similarities of Color Encoding Systems
The similarities of the three basic color television encoding systems are notable (see Table 3). They all rely on the concept of a separate luminance signal that provides compatibility with black-and-white television receivers. The ‘‘mixed-highs’’ principle combines high-frequency information from the three color primaries into luminance, where the eye is sensitive to fine detail; only the relatively low-frequency information is used for the chrominance channels. All three systems use the concept of a subcarrier, located in the upper frequency spectrum of luminance, to convey the chrominance information (Fig. 15). All systems use color-difference signals, rather than the color primaries directly, to minimize cross talk with the luminance signal, and all derive the third color signal by subtracting the other two from luminance. The constant-luminance principle is applied in all systems, based on the original NTSC picture tube phosphors, so the matrix formulations for luminance and color-difference signals are identical (some recent NTSC encoders use
Figure 14. Typical PAL vectorscope graticule.
equal-bandwidth R − Y and B − Y signals instead of proportioned-bandwidth I and Q signals). All of the systems use scaling factors to limit excessive subcarrier amplitude (NTSC/PAL) or deviation (SECAM) excursions. Finally, all three systems use an unmodulated subcarrier sample on the back porch of the horizontal sync pulse for reference information in the decoding process. Because of these similarities, conversion of signals between standards for international program distribution is possible. Early standards converters were optical, essentially using a camera of the target standard focused on a picture tube operating at the source standard. Later, especially for color, electronic conversion became practical. The most serious issue in standards conversion involves motion artifacts due to the different field rates of the 525- and 625-line systems. Simply dropping or repeating fields and lines creates disturbing discontinuities, so interpolation must be done. In modern units, the composite signals are decoded into components, using up to three-dimensional adaptive comb filtering, converted using motion prediction, and then reencoded to the new standard. Table 4 lists the transmission and color standards used in various territories throughout the world.
Component Analog Video (CAV)
The advent of small-format video tape machines that recorded luminance and chrominance on separate tracks led to interest in component interconnection. Increasingly, new equipment decoded and reencoded the composite signal to perform manipulations that would be impossible, or would cause significant distortions, if done in the composite environment. It was reasoned that if component signals (Y, R − Y, B − Y) could be taken from the camera and if encoding to NTSC, PAL, or SECAM could be done just before transmission, then technical quality would be greatly improved.
Table 3. Principal Characteristics of Color Television Encoding Systems

Display primaries. NTSC: FCC. PAL: EBU. SECAM: EBU.
White reference. NTSC: CIE Illuminant C. PAL: CIE Illuminant D65. SECAM: CIE Illuminant C.
Display gamma. NTSC: 2.2. PAL: 2.8. SECAM: 2.8.
Luminance. NTSC: EY = +0.30ER + 0.59EG + 0.11EB. PAL and SECAM: EY = +0.299ER + 0.587EG + 0.114EB.
Chrominance signals. NTSC: Q = +0.41(B − Y) + 0.48(R − Y), I = −0.27(B − Y) + 0.74(R − Y). PAL: U = 0.493(B − Y), V = 0.877(R − Y). SECAM: DB = +1.505(B − Y), DR = −1.902(R − Y).
Chrominance baseband video preemphasis. NTSC: none. PAL: none. SECAM: D∗B = A × DB, D∗R = A × DR, where A = (1 + jf/85)/(1 + jf/255) and f is in kHz.
Modulation method. NTSC and PAL: amplitude modulation of two suppressed subcarriers in quadrature. SECAM: frequency modulation of two line-sequential subcarriers.
Axes of modulation. NTSC: Q = 33°, I = 123°. PAL: U = 0°, V = ±90°. SECAM: not applicable.
Chroma bandwidth/deviation (kHz). NTSC: Q = 620, I = 1,300. PAL: U + V = 1,300. SECAM: foB = ±230 (+276/−120), foR = ±280 (+70/−226).
Vestigial sideband (kHz). NTSC: +620. PAL: +570 (PAL-B,G,H), +1,070 (PAL-I), +620 (PAL-M,N). SECAM: not applicable.
Composite color video signal (CCVS). NTSC: EM = EY + EQ sin(ωt + 33°) + EI cos(ωt + 33°), where GSC = √(EQ² + EI²). PAL: EM = EY + EU sin ωt ± EV cos ωt, where GSC = √(EU² + EV²). SECAM: EM = EY + GSC cos 2π(foB + D∗B ΔfoB)t or EY + GSC cos 2π(foR + D∗R ΔfoR)t on alternate lines.
Modulated subcarrier amplitude/preemphasis. NTSC and PAL: none. SECAM: GSC = 0.115EY (peak-to-peak) × (1 + j16F)/(1 + j1.26F), where F = f/f0 − f0/f and f0 = 4.286 ± 0.02 MHz.
SC/H frequency relationship. NTSC: fSC = (455/2)fH. PAL: fSC = (1,135/4)fH + fV/2 (PAL-B,G,H,I), (909/4)fH (PAL-M), (917/4)fH + fV/2 (PAL-N). SECAM: foB = 272fH, foR = 282fH.
Subcarrier frequency. NTSC: 3.579545 MHz ± 10 Hz. PAL: 4.43361875 MHz ± 5 Hz (PAL-B,G,H), ± 1 Hz (PAL-I); 3.57561149 MHz ± 10 Hz (PAL-M); 3.58205625 MHz ± 5 Hz (PAL-N). SECAM: foB = 4.250000 MHz ± 2 kHz, foR = 4.406250 MHz ± 2 kHz.
Phase/deviation of SC reference. NTSC: 180°. PAL: +V = +135°, −V = −135°. SECAM: DB = −350 kHz, DR = +350 kHz.
Start of SC reference (µs). NTSC: 5.3 ± 0.1. PAL: 5.6 ± 0.1 (PAL-B,G,H,I,N), 5.2 ± 0.5 (PAL-M). SECAM: 5.7 ± 0.3.
SC reference width (cycles). NTSC: 9 ± 1. PAL: 10 ± 1 (PAL-B,G,H,I), 9 ± 1 (PAL-M,N). SECAM: not applicable.
SC reference amplitude (mV). NTSC: 286 (40 ± 4 IRE). PAL: 300 ± 30. SECAM: DB line = 167, DR line = 215.

However, component signal distribution required that some equipment, such as switchers and distribution amplifiers, have three times the circuitry, and interconnection required three times the cable and connections of composite systems. This brought about consideration of multiplexed analog component (MAC) standards, whereby the luminance and chrominance signals are time-multiplexed into a single, higher bandwidth signal. No single standard for component signal levels emerged (Table 5), and the idea was not widely popular. Interest soon shifted to the possibility of digital signal distribution.
Digital Video
Devices such as time-base correctors, frame synchronizers, and standards converters process the TV signal in the digital realm but use analog interfaces. The advent of digital video tape recording set standards for signal sampling and quantization to the extent that digital interconnection became practical.
Component Digital. The European Broadcasting Union (EBU) and the Society of Motion Picture and Television Engineers (SMPTE) coordinated research and conducted
demonstrations in search of a component digital video standard that would lend itself to the exchange of programs worldwide. A common sampling rate of 13.5 MHz, line-locked to both the 525- and 625-line standards, was chosen. This allows analog video frequencies of better than 5.5 MHz to be recovered and is an exact harmonic of the scanning line rate of both standards, enabling great commonality in equipment. A common static orthogonal image structure is also employed, whereby the sampling instants on every line coincide with those on previous lines and fields and also overlay the samples from previous frames. There are 858 total luminance samples per line for the 525-line system and 864 samples for the 625-line system, but 720 samples during the picture portion of the line for both systems. This image structure facilitates filter design, special effects, compression, and conversion between standards.
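The line-locked sampling arithmetic can be checked directly; 13.5 MHz divides evenly into both line rates:

```python
# 13.5 MHz sampling yields a whole number of samples per total line
# in both 525/59.94 and 625/50 systems (858 and 864, with 720 active).
FS = 13.5e6                       # Hz, common luminance sampling rate

fH_525 = 4.5e6 / 286              # ~15,734.266 Hz line rate (525-line)
fH_625 = 15_625.0                 # Hz line rate (625-line)

print(round(FS / fH_525, 6))      # 858.0 samples per total line (525-line)
print(FS / fH_625)                # 864.0 samples per total line (625-line)
```

The 525-line count works out exactly because 13.5 MHz is three times the 4.5 MHz intercarrier, and the line rate is that intercarrier divided by 286, so 13.5 MHz/fH = 3 × 286 = 858.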
Figure 15. Four-stage color television frequency spectrum, showing the compression of three wideband color-separation signals from the camera through bandwidth limiting and frequency multiplexing into the same channel bandwidth used for black-and-white television.
For studio applications, the color-difference signals are sampled at half the rate of luminance, or 6.75 MHz, cosited with every odd luminance sample, yielding a total data rate of 27 Mbps. This provides additional resolution for the chrominance signals, enabling good special effects keying from color detail. The sampling ratio for luminance and the two chrominance channels is designated ‘‘4 : 2 : 2.’’ Other related ratios are possible (Table 6). Quantization is uniform (not logarithmic) for both luminance and color-difference channels. Eight-bit quantization, providing 256 discrete levels, provides adequate signal-to-noise ratio for videotape applications. However, the 25-pin parallel interface selected can accommodate two extra bits, because 10-bit quantization was foreseen as desirable in the future. Only the active picture information is sampled and quantized, allowing better resolution of the signal amplitude. Sync and blanking are coded by special signals (Figs. 16 and 17).
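The sampling arithmetic above can be checked in a few lines (a Python sketch; the variable names are illustrative):

```python
# Sampling and bit-rate arithmetic for 4:2:2 component digital video.
FS_Y = 13.5e6            # luminance sampling rate, Hz
FS_C = FS_Y / 2          # each color-difference channel: 6.75 MHz

words_per_second = FS_Y + 2 * FS_C               # 27 million samples/s total
parallel_8bit_mbps = words_per_second * 8 / 1e6  # 216 Mbps at 8 bits/sample
serial_10bit_mbps = words_per_second * 10 / 1e6  # 270 Mbps at 10 bits/sample

print(words_per_second / 1e6, parallel_8bit_mbps, serial_10bit_mbps)
# 27.0 216.0 270.0
```

The 10-bit figure, 270 Mbps, is the serial interface rate discussed below.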
Table 4. National Television Transmission Standards
Territory
Territory
VHF
Eritrea Estonia Ethiopia Faeroe Islands Falkland Islands Fernando Po Fiji Finland France French Guyana French Polynesia Gabon Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guadeloupe Guam Guatemala Guinea Guinea-Bissau Guyana, Republic of Haiti Honduras Hong Kong Hungary Iceland India Indonesia Iran Iraq Ireland Israel Italy Ivory Coast = Côte d'Ivoire Jamaica Japan Johnston Islands Jordan Kampuchea = Cambodia Kazakhstan Kenya Korea, Democratic People's Republic of (N) Korea, Republic of (S) Kuwait Kyrgyzstan Laos Latvia Lebanon Leeward Islands = Antigua Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg Macao Macedonia Madagascar
B D B B I B M B L K K K B D B B B B B M K M M K I M M M
VHF
UHF
Afars and Issas = Djibouti Afghanistan D Albania B G Algeria B Andorra B Angola I Antigua and Barbuda M Argentina N Armenia D K Ascension Islands I Australia B B Austria B G Azerbaijan D K Azores B Bahamas M Bahrain B G Bangladesh B Barbados M Belarus D K Belgium B H Benin K Bermuda M Bolivia M M Bosnia and Herzegovina B G Botswana I Brazil M Brunei Darussalam B Bulgaria D K Burkina Faso K Burma = Myanmar Burundi K Cambodia B Cameroon B Canada M M Canary Islands B G Cape Verde Islands I Cayman Islands M Central African Republic K Ceylon = Sri Lanka Chad K Channel Islands I Chile M China D D Colombia M Commonwealth of Independent States: see state Comoros K Congo K Costa Rica M Côte d'Ivoire K Croatia B G Cuba M Curacao M M Cyprus B G Czech Republic D K Dahomey = Benin Denmark B G Diego Garcia M Djibouti K Dominican Republic M Ecuador M Equatorial Guinea = Fernando Po Egypt B G El Salvador M
Color
PAL PAL PAL PAL PAL NTSC PAL SECAM PAL PAL SECAM PAL NTSC PAL PAL NTSC SECAM PAL SECAM NTSC NTSC PAL PAL PAL PAL P&S SECAM SECAM PAL PAL NTSC PAL PAL NTSC SECAM SECAM PAL NTSC PAL NTSC
SECAM NTSC SECAM PAL NTSC NTSC P&S SP PAL NTSC SECAM NTSC NTSC P&S NTSC
D B B B B B I B B M M M B D B D M B D B D B I B B B D B B K
UHF
K G
G L
K G G G
M
I K G
I G G
M G K
M G K K G
G K G/L I G
Color PAL SECAM PAL PAL PAL PAL NTSC PAL SECAM SECAM SECAM SECAM PAL SECAM PAL PAL PAL SECAM PAL NTSC SECAM NTSC NTSC PAL NTSC NTSC NTSC PAL P&S PAL PAL PAL SECAM SECAM PAL PAL PAL NTSC NTSC NTSC PAL SECAM PAL PAL NTSC PAL SECAM PAL SECAM SECAM PAL PAL SECAM PAL SECAM P P/S PAL PAL SECAM
Table 4. (Continued)
Table 4. (Continued) Territory
VHF
Madeira B Malawi B Malaysia B Maldives B Mali K Malta B Martinique K Mauritania B Mauritius B Mayotte K Mexico M Micronesia M Moldova D Monaco L Mongolia D Montserrat M Morocco B Mozambique Myanmar M Namibia I Nepal B Netherlands B Netherlands Antilles M New Caledonia K New Zealand B Nicaragua M Niger K Nigeria B Norway B Oman B Pakistan B Palau M Panama M Papua New Guinea B Paraguay N Peru M Philippines M Poland D Portugal B Puerto Rico M Qatar B Réunion K Romania D Russia D Rwanda K St. Helena I St. Pierre et Miquelon K St. Kitts and Nevis M Samoa (American) M Samoa (Western) B B São Tomé e Príncipe San Andres Islands M San Marino B Saudi Arabia B Senegal K Serbia B Seychelles B Sierra Leone B Singapore B Slovakia B Slovenia B Society Islands = French Polynesia Somalia B
UHF
M K G/L
B
G
G
G G
G M K G M G G/K K K
G G G
G/K G
Color PAL PAL PAL PAL SECAM PAL SECAM SECAM SECAM SECAM NTSC NTSC SECAM S P/S SECAM NTSC SECAM PAL NTSC PAL PAL PAL NTSC SECAM PAL NTSC SECAM PAL PAL PAL PAL NTSC NTSC PAL PAL NTSC NTSC P&S PAL NTSC PAL SECAM PAL SECAM
SECAM NTSC NTSC PAL PAL NTSC PAL S/P S SECAM PAL PAL PAL PAL PAL PAL PAL
Territory
VHF
UHF
South Africa S. West Africa = Namibia Spain Sri Lanka Sudan Suriname Swaziland Sweden Switzerland Syria Tahiti = French Polynesia Taiwan Tajikistan Tanzania Thailand Togo Trinidad and Tobago Tunisia Turks and Caicos Turkey Turkmenistan Uganda Ukraine USSR: see independent state United Arab Emirates United Kingdom United States Upper Volta = Burkina Faso Uruguay Uzbekistan Venezuela Vietnam Virgin Islands Yemen Yugoslavia: see new state Zaire Zambia Zanzibar = Tanzania Zimbabwe
I
I
PAL
B B B M B B B B
G
PAL PAL PAL NTSC PAL PAL PAL P&S
G G G G
Color
M D B B K M B M B D B D
M K I M
K
NTSC SECAM PAL PN SECAM NTSC PS NTSC PAL SECAM PAL SECAM
B
G I M
PAL PAL NTSC
M N D M D/M M B
M G G K
K
PAL SECAM NTSC S/N NTSC PAL
K B
SECAM PAL
B
PAL
These specifications were standardized in ITU-R BT.601; hence the abbreviated reference, ‘‘601 Video.’’ The first component digital tape machine standard was designated ‘‘D1’’ by SMPTE, and this term has come to be used in place of more correct designations. For wide-screen applications, a 360-Mbps standard scales up the number of sampling points for a 16 : 9 aspect ratio. Interconnecting digital video equipment is vastly simplified by using a serial interface. Originally, an 8/9 block code was devised to facilitate clock recovery by preventing long strings of ones or zeros in the code; this would have resulted in a serial data rate of 243 Mbps. To permit serializing 10-bit data, scrambling is employed instead, and complementary descrambling is performed at the receiver. The serial bit rate is 270 Mbps; because NRZI coding is used, the fundamental frequency is half the bit rate.

Composite Digital. Time-base correctors for composite 1 in. videotape recorders had been developed that had several lines of storage capability. Some early devices sampled at three times the color subcarrier frequency
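The NRZI channel code mentioned above is simple to illustrate. In this Python sketch (illustrative only; the real serial interface also scrambles the data first), a one toggles the line level and a zero leaves it unchanged, which makes the code polarity-insensitive:

```python
def nrzi_encode(bits, initial=0):
    """NRZI: a '1' toggles the line level, a '0' leaves it unchanged."""
    level = initial
    out = []
    for b in bits:
        if b:
            level ^= 1          # transition on every one
        out.append(level)
    return out

def nrzi_decode(levels, initial=0):
    """Recover bits by detecting transitions between successive levels."""
    prev = initial
    out = []
    for lv in levels:
        out.append(lv ^ prev)   # 1 where the level changed
        prev = lv
    return out

data = [1, 0, 1, 1, 0, 0, 1]
enc = nrzi_encode(data)
assert nrzi_decode(enc) == data
# An inverted signal decodes identically (given the inverted reference):
assert nrzi_decode([1 - x for x in enc], initial=1) == data
```

Because at most one transition occurs per bit, the highest fundamental frequency on the wire is half the bit rate, as noted above.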
Table 5. Component Analog Video Format Summary Color Bar Amplitudes (mV)
Format R/G/B/Sa G/B/R Y/I/Q (NTSC) Y/Q/I (MI ) Y/R - Y/B - Y∗ Y/U/V (PAL) Betacam 525 2 CH Y/CTDM Betacam 625 2 CH Y/CTDM MII 525 2 CH Y/CTCM MII 625 2 CH Y/CTCM SMPTE/EBU (Y/PB /PR ) a
Peak Excursions (mV)
Channel 1
Channel 2
Channel 3
100% 75%
100% 75%
100% 75%
+1V/+750 +700/+525 +714/+549 +934/+714 +700/+525 +700/+525 +714/+549 +714/+549 +700/+525 +700/+525 +700/+538 +714/+549 +700/+525 +700/+525 +700/+525
+1V/+750 +700/+525 ±393/ ± 295 ±476/ ± 357 ±491/ ± 368 ±306/ ± 229 ±467/ ± 350 ±467/ ± 350 ±467/ ± 350 ±467/ ± 350 ±324/ ± 243 ±350/ ± 263 ±350/ ± 263 ±350/ ± 263 ±350/ ± 263
+1V/+750 +700/+525 ±345/ ± 259 ±476/ ± 357 ±620/ ± 465 ±430/ ± 323 ±467/ ± 350 ±467/ ± 350 ±324/ ± 243 ±350/ ± 263 ±350/ ± 263
Synchronization Channels/Signals S = −4V G, B, R = −300 Y = −286 Y = −286 Y = −300 Y = −300 Y = −286 Y = ±286 Y = −300 Y = ±300 Y = −300 Y = −286 Y = −300 Y = −300 Y = −300
Setup
I = −600
No No Yes Yes No No Yes
C = −420 No C = −420 Yes C = −650 No C = −650 No
Other levels possible with this generic designation.
Table 6. Sampling Structures for Component Systems(a)

4 : 4 : 4 (every pixel on every line carries Y, Cb, and Cr):
    YCbCr  YCbCr  YCbCr  YCbCr  YCbCr
    YCbCr  YCbCr  YCbCr  YCbCr  YCbCr

4 : 2 : 2 (Cb and Cr cosited with every other Y sample on every line):
    YCbCr  Y  YCbCr  Y  YCbCr
    YCbCr  Y  YCbCr  Y  YCbCr

4 : 2 : 0 (Cb and Cr at every other sample position horizontally, interpolated vertically so that each chroma sample sits between two adjacent lines):
    Y      Y  Y      Y  Y
     CbCr      CbCr
    Y      Y  Y      Y  Y

4 : 1 : 1 (Cb and Cr cosited with every fourth Y sample):
    YCbCr  Y  Y  Y  YCbCr
    YCbCr  Y  Y  Y  YCbCr

(a) Y = luminance sample; Cb Cr = chrominance samples; pixels shown as YCbCr are cosited. Boldfaced type in the printed table indicates the bottom field, if interlaced.
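The 4 : 2 : 0 structure in Table 6 is obtained by filtering 4 : 2 : 2 chroma vertically. The following Python sketch uses a simple two-tap average (real encoders use longer interpolation filters; `to_420` and the sample values are illustrative):

```python
# Derive 4:2:0 chroma from a 4:2:2 chroma channel by averaging
# vertically adjacent samples, siting the result between the two lines.
def to_420(chroma_422):
    """chroma_422: list of rows (already half horizontal resolution).
    Returns half as many rows of vertically averaged samples."""
    out = []
    for r in range(0, len(chroma_422) - 1, 2):
        top, bot = chroma_422[r], chroma_422[r + 1]
        out.append([(a + b) // 2 for a, b in zip(top, bot)])
    return out

cb = [[100, 110], [102, 112], [120, 130], [122, 132]]
print(to_420(cb))   # [[101, 111], [121, 131]]
```

Each output row carries half the vertical chroma resolution, matching the halved horizontal chroma resolution inherited from 4 : 2 : 2.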
(3fSC ); however, better filter response could be obtained by 4fSC sampling. The sampling instants correspond to peak excursions of the I and Q subcarrier components in NTSC. The total number of samples per scanning line is 910 for NTSC and 1,135 for PAL. To accommodate the 25 Hz offset in PAL, lines 313 and 625 each have 1,137 samples. The active picture portion of a line consists of 768 samples in NTSC and 948 samples in PAL. These specifications are standardized as SMPTE 244M (NTSC) and EBU Tech. 3,280 (PAL).
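The 910-sample figure for NTSC follows directly from the subcarrier-to-line-rate relationship; a short Python check (illustrative names):

```python
# Composite digital (4x subcarrier) sampling arithmetic for NTSC.
FSC = 5_000_000 * 63 / 88      # NTSC color subcarrier, ~3.579545 MHz
FS = 4 * FSC                   # 4 fsc sampling, ~14.318 MHz
line_rate = FSC / 227.5        # the subcarrier is 227.5x the line rate

samples_per_line = FS / line_rate   # = 4 * 227.5 = 910 exactly
print(round(samples_per_line))      # 910
```

The result is exactly 910 total samples per line, as stated above.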
Unlike component digital, nearly the entire horizontal and vertical blanking intervals are sampled and quantized, which degrades the amplitude resolution (Fig. 18). However, in PAL, no headroom is provided for sync level, and the sampling instants are specified at 45 ° from the peak excursions of the V and U components of the subcarrier (Fig. 19). This allows ‘‘negative headroom’’ in the positive direction. Thus, an improvement of about 0.5 dB in the signal-to-noise ratio is obtained.
Waveform location   Voltage level   8-bit (Dec / Hex / Binary)   10-bit (Dec / Hex / Binary)
Excluded            766.3 mV        -                            1023 / 3FF / 11 1111 1111
Excluded            763.9 mV        255 / FF / 1111 1111         1020 / 3FC / 11 1111 1100
Peak                700.0 mV        235 / EB / 1110 1011         940 / 3AC / 11 1010 1100
Black               0.0 mV          16 / 10 / 0001 0000          64 / 040 / 00 0100 0000
Excluded            −48.7 mV        -                            3 / 003 / 00 0000 0011
Excluded            −51.1 mV        0 / 00 / 0000 0000           0 / 000 / 00 0000 0000

Figure 16. Quantizing levels for component digital luminance.
Waveform location   Voltage level   8-bit (Dec / Hex / Binary)   10-bit (Dec / Hex / Binary)
Excluded            399.2 mV        -                            1023 / 3FF / 11 1111 1111
Excluded            396.9 mV        255 / FF / 1111 1111         1020 / 3FC / 11 1111 1100
Max positive        350.0 mV        240 / F0 / 1111 0000         960 / 3C0 / 11 1100 0000
Black               0.0 mV          128 / 80 / 1000 0000         512 / 200 / 10 0000 0000
Max negative        −350.0 mV       16 / 10 / 0001 0000          64 / 040 / 00 0100 0000
Excluded            −397.7 mV       -                            3 / 003 / 00 0000 0011
Excluded            −400.0 mV       0 / 00 / 0000 0000           0 / 000 / 00 0000 0000

Figure 17. Quantizing levels for component digital color difference.
Waveform location   Voltage level   IRE units   8-bit (Dec / Hex / Binary)   10-bit (Dec / Hex / Binary)
Excluded            998.7 mV        139.8       -                            1023 / 3FF / 11 1111 1111
Excluded            994.9 mV        139.3       255 / FF / 1111 1111         1020 / 3FC / 11 1111 1100
100% chroma         907.7 mV        131.3       244 / F4 / 1111 0100         975 / 3CF / 11 1100 1111
Peak white          714.3 mV        100         200 / C8 / 1100 1000         800 / 320 / 11 0010 0000
Blanking            0.0 mV          0           60 / 3C / 0011 1100          240 / 0F0 / 00 1111 0000
Sync tip            −285.7 mV       −40         4 / 04 / 0000 0100           16 / 010 / 00 0001 0000
Excluded            −302.3 mV       −42.3       -                            3 / 003 / 00 0000 0011
Excluded            −306.1 mV       −42.9       0 / 00 / 0000 0000           0 / 000 / 00 0000 0000

Figure 18. Quantizing levels for composite digital NTSC.
Rate conversion between component and composite digital television signals involves different sampling points and quantizing levels. Each conversion degrades the picture because exact levels cannot be reproduced in each pass. An important advantage of digital coding is thereby lost. In addition, decoding composite signals requires filtering to prevent cross-luminance and cross-color effects. This forever removes a part of the information; therefore, this process must be severely limited in its use. Ancillary data may be added to digital component and composite video signals. AES/EBU-encoded digital audio can be multiplexed into the serial bit stream. Four channels are possible in the composite format, and 16 channels are possible in component digital video.

Component Video Standards

The video signals from a camera before encoding to NTSC, PAL, SECAM, or the ATSC Digital Standard are normally green (G), blue (B), and red (R). These are described as component signals because they are parts or components of the whole video signal. For distribution, and sometimes for processing, it has been found more bandwidth-efficient to convert these signals into a luminance signal (Y) and two color-difference signals, blue minus luminance (B − Y) and red minus luminance (R − Y), where the color-difference signals use 1/2 or 1/4 of the bandwidth of the luminance signal. The adopted SMPTE/EBU Standard N10 provides a uniform signal specification for all 525/60 and 625/50 television systems. When the color-difference signals in this standard are digitally formatted, they are
termed Cb and Cr , respectively. At the same time, because the human eye is less sensitive to fine detail in color, it is possible to reduce the bandwidth of the color-difference signals compared to that of the luminance signal. When these signals are digitized according to International Telecommunication Union, Radiocommunication Sector (ITU-R) Recommendation 601, for both 525/60 and 625/50 systems, several modes of transmission may be used, all based on multiples of a 3.375 MHz sampling rate. For the ATSC standard, 4 : 2 : 0 is used (see below and the ATSC digital television standard). Either 8 or, more frequently, 10 bits per sample are used.

4 : 4 : 4 Mode. The G, B, R or Y, Cb , Cr signal at an equal sampling rate of 13.5 MHz for each channel is termed the 4 : 4 : 4 mode of operation, and it yields 720 active samples per line for both 525/60 and 625/50 standards. This mode is frequently used for postproduction. If a (full-bandwidth) key signal must also be carried with the video, this combination is known as a 4 : 4 : 4 : 4 signal.

4 : 2 : 2 Mode. The 4 : 2 : 2 mode is more frequently used for distribution; Y is sampled at 13.5 MHz, and the color-difference signals are each sampled at a 6.75 MHz rate, corresponding to 360 active samples per line.

4 : 1 : 1 Mode. The 4 : 1 : 1 mode is used where bandwidth is at a premium; the color-difference signals are each sampled at a 3.375 MHz rate, corresponding to 180 samples per line.
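The active chroma sample counts quoted for these modes follow directly from the mode ratios. A small Python sketch (illustrative names):

```python
# Active samples per line for the Rec. 601 family of sampling modes,
# derived from the 720 active luminance samples per line.
ACTIVE_Y = 720

def chroma_samples(mode):
    # mode is a tuple like (4, 2, 2): the second figure scales each
    # color-difference channel horizontally relative to luminance.
    y, c1, _ = mode
    return ACTIVE_Y * c1 // y

for mode in [(4, 4, 4), (4, 2, 2), (4, 1, 1)]:
    print(mode, chroma_samples(mode))
# (4, 4, 4) 720
# (4, 2, 2) 360
# (4, 1, 1) 180
```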
Waveform location              Voltage level   8-bit (Dec / Hex / Binary)   10-bit (Dec / Hex / Binary)
Excluded                       913.1 mV        -                            1023 / 3FF / 11 1111 1111
Excluded                       909.5 mV        255 / FF / 1111 1111         1020 / 3FC / 11 1111 1100
100% chroma (highest sample)   886.2 mV        250 / FA / 1111 1010         1000 / 3E8 / 11 1110 1000
Peak white                     700.0 mV        211 / D3 / 1101 0011         844 / 34C / 11 0100 1100
Blanking                       0.0 mV          64 / 40 / 0100 0000          256 / 100 / 01 0000 0000
Sync tip                       −300.0 mV       1 / 01 / 0000 0001           4 / 004 / 00 0000 0100
Excluded                       −301.2 mV       -                            3 / 003 / 00 0000 0011
Excluded                       −304.8 mV       0 / 00 / 0000 0000           0 / 000 / 00 0000 0000

Figure 19. Quantizing levels for composite digital PAL.
4 : 2 : 0 Mode. A further alternative, the 4 : 2 : 0 mode, whose structure is not self-evident, is derived from a 4 : 2 : 2 sampling structure but reduces the vertical resolution of the color-difference information by 2 : 1 to match the reduced color-difference horizontal resolution. Cosited Cb , Cr samples from four lines (field-sequential, if interlaced) are vertically interpolated, weighted toward the closest samples, and the resultant sample is located between two adjacent scanning lines. This mode is used in MPEG bit-rate-reduced digital signal distribution formats and hence in the ATSC digital television standard. These four modes are illustrated in Table 6.

ADVANCED TELEVISION SYSTEMS, CURRENT AND FUTURE

ATSC Digital Television Standard

Overview. From 1987 to 1995, the Advisory Committee on Advanced Television Service (ACATS) to the Federal Communications Commission, with support from Canada and Mexico, developed a recommendation for an Advanced Television Service for North America. The ACATS enlisted the cooperation of the best minds in television manufacturing, broadcasting, the cable industry, the film industry, and federal regulation to develop an advanced television system that would produce a substantial improvement in video images and in audio performance over the existing NTSC 525-line system. The primary video goal was at least a doubling of horizontal and vertical resolution and a widening of the picture aspect ratio from the current 4 (W) × 3 (H) to 16 (W) × 9 (H); this was named ‘‘high-definition television.’’ Also included was a digital audio system consisting of five channels plus a low-frequency channel (5.1). Twenty-one proposals were made for terrestrial transmission systems for extended-definition television (EDTV) or high-definition television (HDTV), using varying amounts of the RF spectrum.
Some systems augmented the existing NTSC system by an additional channel of 3 MHz or 6 MHz, some used a separate simulcast channel of 6 MHz or 9 MHz bandwidth, and all of the early systems used hybrid analog/digital signal processing with an analog RF transmission system. Later proposals changed the RF transmission system to digital, along with all-digital signal processing. It was also decided that the signal would be transmitted in a 6 MHz RF channel, one for each current broadcaster of the (6 MHz channel) NTSC system, and that this new channel would eventually replace the NTSC channels. The additional channels were created within the existing UHF spectrum by improved design of TV receivers, so that the previously taboo channels, of which there were many, could now be used. In parallel with this effort, the Advanced Television Systems Committee (ATSC) documented and developed the standard known as the ATSC Digital Television Standard, and it is subsequently developing related implementation standards. In countries currently using 625-line, 4 : 3 aspect ratio television systems, plans are being developed to use a 1,250-line, 16 : 9 aspect ratio system eventually, and
the ITU-R has worked successfully to harmonize and provide interoperability between the ATSC and 1,250-line systems. Figure 20 shows the choices by which the signals of the various television standards will reach the consumer. Other articles detail satellite, cable TV, and asynchronous transfer mode (ATM) common carrier networks. The ATSC and the ITU-R have agreed on a digital terrestrial broadcasting model, which is shown in Fig. 21. Video and audio sources are coded and compressed in separate video and audio subsystems. The compressed video and audio are then combined with ancillary data and control data in a service multiplex and transport, in which form the combined signals are distributed to the terrestrial transmitter. The signal is then channel-coded, modulated, and fed at appropriate power to the transmission antenna. The receiver reverses the process, demodulates the RF signal to the transport stream, then demultiplexes the audio, video, ancillary, and control data into their separate but compressed modes, and the individual subsystems then decompress the bit streams into video and audio signals that are fed to display screen and speakers, and the ancillary and control data are used if and as appropriate within the receiver. Information Service Multiplex and Transport System. These subsystems provide the foundation for the digital communication system. The raw digital data are first formatted into elementary bit streams, representing image data, sound data, and ancillary data. The elementary bit streams are then formed into manageable packets of information (packetized elementary stream, PES), and a mechanism is provided to indicate the start of a packet (synchronization) and assign an appropriate identification code (packet identifier, PID) within a header to each packet. The packetized data are then multiplexed into a program transport stream that contains all of the information for a single (television) program. 
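The packetization step can be sketched as follows (illustrative Python; the 4-byte header here carries only the sync byte, PID, and a fixed control nibble, whereas a real packetizer fills in all the header fields of the MPEG-2 systems specification):

```python
# Segment one PES packet into fixed-length 188-byte transport packets.
TS_SIZE, HEADER = 188, 4
PAYLOAD = TS_SIZE - HEADER        # 184 payload bytes per transport packet

def packetize(pes: bytes, pid: int):
    packets = []
    for i in range(0, len(pes), PAYLOAD):
        chunk = pes[i:i + PAYLOAD]
        chunk += b"\xff" * (PAYLOAD - len(chunk))     # stuffing bytes
        header = bytes([0x47,                          # sync byte
                        (pid >> 8) & 0x1F,             # PID high 5 bits
                        pid & 0xFF,                    # PID low 8 bits
                        0x10])                         # payload-only flag
        packets.append(header + chunk)
    return packets

pkts = packetize(bytes(400), pid=0x31)
print(len(pkts), len(pkts[0]))   # 3 188
```

A 400-byte PES packet occupies three transport packets, and the last one is padded with stuffing bytes, as described below.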
Multiple program transport streams may then be multiplexed to form a system level multiplex transport stream. Figure 22 illustrates the functions of the multiplex and transport system and shows its location between the application (e.g., audio or video) encoding function and the transmission subsystem. The transport and demultiplex subsystem functions in the receiver in the reverse manner, being situated between the RF modem and the individual application decoders.
Fixed-Length Packets. The transport system employs the fixed-length transport stream packetization approach defined by the Moving Picture Experts Group (MPEG), which is well suited to the needs of terrestrial broadcast and cable television transmission of digital television. The use of moderately long, fixed-length packets matches well with the needs for error protection, and it provides great flexibility for initial needs of the service to multiplex audio, video, and data, while providing backward compatibility for the future and maximum interoperability with other media (MPEG-based). Packet Identifier. The use of a PID in each packet header to identify the bit stream makes it possible to have a mix
[Figure 20 diagram: standard TV, wide-screen, and HDTV program sources feed their respective encoders; MPEG-2 packets pass through terrestrial, satellite, or cable modulators, switched-network (ATM) distribution, or physical delivery (disks, tapes) to standard TV receivers, HDTV receivers, and consumer recorders.]
Figure 20. Television service model.
of audio, video, and auxiliary data that is not specified in advance.
Scalability and Extensibility. The transport format is scalable in that more elementary bit streams may be added at the input of the multiplexer or at a second multiplexer. Extensibility for the future could be achieved without hardware modification by assigning new PIDs for additional elementary bit streams.
Figure 21. Block diagram showing ATSC and ITU-R terrestrial television broadcasting model.
Robustness. After detecting errors during transmission, the data bit stream is recovered starting from the first good packet. This approach ensures that recovery is independent of the properties of each elementary bit stream. Transport Packet Format. The data transport packet format, shown in Fig. 23, is based on fixed-length packets (188 bytes) identified by a variable-length header, including a sync byte and the PID. Each header identifies a
[Figure 22 diagram: application encoders produce elementary bit streams, each assigned a PID (video, audio, data, and a program_map_table); a multiplexer forms program transport streams, which a system-level multiplex (with a program_association_table at PID 0) combines for the modem; the receiver demultiplexes and depacketizes in reverse, with error signaling and clock control.]
Figure 22. Organization of functionality within a transport system for digital TV programs.
Figure 23. Transport packet format: 188-byte packets consisting of a 4-byte ‘‘link’’ header, a variable-length adaptation header, and the payload (not to scale).
particular application bit stream (elementary bit stream) that forms the payload of the packet. Applications include audio, video, auxiliary data, program and system control information, and so on.
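The fixed 188-byte packet and its link header can be illustrated with a short parser sketch (Python; bit layout per the MPEG-2 systems specification, function name illustrative):

```python
# Parse the 4-byte link header of an MPEG-2 transport packet.
# Layout: sync byte 0x47; transport_error_indicator (1 bit);
# payload_unit_start_indicator (1); transport_priority (1); PID (13);
# scrambling control (2); adaptation field control (2);
# continuity counter (4).
def parse_ts_header(packet: bytes):
    assert len(packet) == 188, "transport packets are fixed at 188 bytes"
    assert packet[0] == 0x47, "lost sync"
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": pid,
        "continuity_counter": packet[3] & 0x0F,
    }

pkt = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)  # PID 0x100, PUSI set
print(parse_ts_header(pkt))
# {'payload_unit_start': True, 'pid': 256, 'continuity_counter': 7}
```

The 13-bit PID field allows up to 8,192 distinct bit streams per multiplex.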
PES Packet Format. The elementary bit streams themselves are wrapped in a variable-length packet structure called the packetized elementary stream (PES) before transport processing (Fig. 24). Each PES packet for a particular elementary bit stream then occupies a variable number of transport packets, and data from the various elementary bit streams are interleaved with each other at the (fixed-length) transport packet layer. New PES packets always start a new transport packet, and stuffing bytes (i.e., null bytes) are used to fill partially filled transport packets.

Channel Capacity Allocation. The entire channel capacity can be reallocated to meet immediate service needs. As an example, ancillary data can be assigned fixed amounts depending on a decision as to how much to allocate to video; or, if the data transmission time is not critical, it can be sent as opportunistic data during periods when the video channel is not fully loaded.
Figure 24. Structural overview of the packetized elementary stream (PES) packet: a 3-byte packet start code prefix, a 1-byte stream ID, a 2-byte PES packet length, flags, variable-length PES header fields, and the PES packet data block.
Figure 25. Variable-length PES packets and fixed-length transport packets.
Figure 25 illustrates how the variable-length PES packets relate to the fixed-length transport packets. The transport system provides other features, including decoder synchronization, conditional access, and local program insertion. Issues relating to the storage and playback of programs are also addressed, and the appropriate hooks are provided to support the design of consumer digital products based on recording and playback of these bitstreams, including the use of ‘‘trick modes’’ such as slow motion and still frame, typical of current analog video cassette recorders (VCRs).
Local Program Insertion. This feature is extremely important because it permits local broadcast stations to insert video, audio, or data unique to that station. As shown in Fig. 26, to splice local programs it is necessary to extract (by demultiplexing) the transport packets, identified by the PIDs of the individual elementary bit streams, that make up the program to be replaced, including the program map table, which identifies the individual bit streams constituting that program. Program insertion can then take place on an individual PID basis, using the fixed-length transport packets.
Figure 26. Example of program insertion architecture.
Presentation Time Stamp and Decoding Time Stamp. Both of these time stamps occur within the header of the PES packet, and they are used to determine when the data within the packet should be read out of the decoder. This process ensures the correct relative timing of the various elementary streams at the decoder relative to the timing at which they were encoded. Interoperability with ATM. The MPEG-2 transport packet size (188 bytes) is such that it can easily be partitioned for transfer in a link layer that supports asynchronous transfer mode (ATM) transmission (53 bytes per cell). The MPEG-2 transport layer solves MPEG-2 presentation problems and performs the multimedia multiplexing function, and the ATM layer solves switching and network adaptation problems.

Video Systems
Compressed Video. Compression in a digital HDTV system is required because the bit rate of an uncompressed HDTV signal approximates 1 Gbps (even when the luminance/chrominance sampling is already reduced to the 4 : 2 : 2 mode). The total transmitted data rate over a 6 MHz channel in the ATSC digital television standard is approximately 19.4 Mbps. Therefore, a compression ratio of 50 : 1 or greater is required. The ATSC Digital Television Standard specifies video compression using a combination of compression techniques which, for compatibility, conform to the algorithms of MPEG-2 Main Profile, High Level. The goal of the compression and decompression process is to produce an approximate version of the original image sequence, such that the reconstructed approximation is imperceptibly different from the original for most viewers, for most images, and for most of the time. Production Formats. A range of production format video inputs may be used. These include the current NTSC format of 483 active lines, 720 active samples/line, 60 fields, 2 : 1 interlaced scan (60I); the Standard Definition format of 480 active lines, 720 active samples/line, 60 frames progressive scan (60P); and high-definition formats of 720 active lines, 1,280 active samples/line, 60P, or 1,080 active lines, 1,920 active samples/line, 60I. Compression Formats. A set of 18 compression formats is included to accommodate all of those production formats. The 30P and 24P formats are included primarily to provide efficient transmission of film images associated with these production formats. The VGA graphics format is also included at 480 lines and 640 pixels
Table 7. ATSC Compression Formats: A Hierarchy of Pixels and Bits(a)

Active  Pixels    Total Pixels   Uncompressed Payload Bit Rate in Mbps      Aspect Ratio
Lines   per Line  per Frame      (8-bit 4 : 2 : 2 sampling) at Frame Rate   and Notes
                                 60P     60I    30P    24P
1,080   1,920     2,073,600      Future  995    995    796                  16 : 9 only
720     1,280     921,600        885     —      442    334                  16 : 9 only
480     704       337,920        324     162    162    130                  16 : 9 & 4 : 3
480     640       307,200        295     148    148    118                  4 : 3 only (VGA)

(Vertical and horizontal resolution increase toward the top of the table; temporal resolution increases from 24P toward 60P.)
(a) Data courtesy of Patrick Griffis, Panasonic, NAB, 1998.
(see later for pixel definition). Details of these compression formats are found in Table 7. Colorimetry. The Digital Television Standard specifies SMPTE 274M colorimetry (same as ITU-R BT.709, 1990) as the default and preferred colorimetry. This defines the color primaries, transfer characteristics, and matrix coefficients. Sample Precision. After preprocessing, the various luminance and chrominance samples will typically be represented using 8 bits per sample of each component. Film Mode. In the case of 24 fps film which is sent at 60 Hz rate using a 3 : 2 pull-down operation, the processor may detect the sequences of three nearly identical pictures followed by two nearly identical pictures and may encode only the 24 unique pictures per second that existed in the original film sequence. This avoids sending redundant information and permits higher quality transmission. The processor may detect similar sequencing for 30 fps film and may encode only the 30 unique pictures per second. Color Component Separation and Processing. The input video source to the video compression system is in the form of RGB components matrixed into luminance (Y) (intensity or black-and-white picture) and chrominance (Cb and Cr ) color-difference components, using a linear transformation. The Y, Cb , and Cr signals have less correlation with each other than R, G, and B and are thus easier to code. The human visual system is less sensitive to high frequencies in the chrominance components than in the luminance components. The chrominance components are low-pass-filtered and subsampled by a factor of 2 in both horizontal and vertical dimensions (4 : 2 : 0 mode) (see section entitled Component Video Standards). Representation of Picture Data. Digital television uses digital representation of the image data. 
The process of digitization involves sampling the analog signals and their components in a sequence corresponding to the scanning raster of the television format and representing each sample by a digital code. Pixels. The individual samples of digital data are referred to as picture elements, ‘‘pixels’’ or ‘‘pels.’’ When
the ratio of active pixels per line to active lines per frame is the same as the display aspect ratio, the format is said to have ‘‘square pixels.’’ The term refers to the spacing of samples, not the shape of the pixel. Blocks, Macroblocks, and Slices. For further processing, pixels are organized into 8 × 8 blocks, representing either luminance or chrominance information. Macroblocks consist of four blocks of luminance (Y) and one each of Cb and Cr. Slices consist of one or more macroblocks in the same row, and they begin with a slice start code. The number of slices affects compression efficiency; a larger number of slices provides better error recovery but uses bits that could otherwise be used to improve picture quality. The slice is the minimum unit for resynchronization after an error. Removal of Temporal Information Redundancy: Motion Estimation and Compensation. A video sequence is a series of still pictures shown in rapid succession to give the impression of continuous motion. This usually results in much temporal redundancy (picture sameness) among adjacent pictures. Motion compensation attempts to delete this temporal redundancy from the information transmitted. In the standard, the current picture is predicted from the previously encoded picture by estimating the motion between the two adjacent pictures and compensating for it. This ‘‘motion-compensated residual’’ is encoded rather than the complete picture, eliminating repetition of the redundant information.
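The block-matching idea behind motion estimation can be illustrated with a small sketch: an exhaustive search over a ±4-pel window using a sum-of-absolute-differences (SAD) criterion. The search strategy and range here are illustrative assumptions; the standard does not mandate a particular search method.

```python
import numpy as np

# Block matching: find the displacement in the previous frame that best
# predicts one 8x8 block of the current frame (minimum SAD), then form
# the motion-compensated residual that would be encoded instead.
np.random.seed(0)

def best_match(prev, block, top, left, search=4):
    best = (float("inf"), 0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if 0 <= y0 <= prev.shape[0] - 8 and 0 <= x0 <= prev.shape[1] - 8:
                sad = np.abs(prev[y0:y0 + 8, x0:x0 + 8] - block).sum()
                if sad < best[0]:
                    best = (sad, dy, dx)
    return best

prev = np.random.randint(0, 256, (32, 32)).astype(float)
curr = np.roll(prev, (1, 2), axis=(0, 1))       # whole scene shifted by (1, 2)
block = curr[8:16, 8:16]
sad, dy, dx = best_match(prev, block, 8, 8)
residual = block - prev[8 + dy:16 + dy, 8 + dx:16 + dx]
print((dy, dx), sad)                            # a pure shift gives SAD 0
```

For a pure translation the residual is all zeros, so almost nothing beyond the motion vector needs to be transmitted; real scenes leave a small residual that is then coded.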
Pictures, Groups of Pictures, and Sequences. The primary coding unit of a video sequence is the individual video frame or picture, which consists of the collection of slices constituting the active picture area. A video sequence consists of one or more consecutive pictures, and it commences with a sequence header that can serve as an entry point. One or more pictures or frames in sequence may be combined into a group of pictures (GOP), optional within MPEG-2 and the ATSC Standard, to provide boundaries for interpicture coding and registration of a time code. Figure 27 illustrates a time sequence of video frames consisting of intracoded pictures (I-frames), predictive
TELEVISION BROADCAST TRANSMISSION STANDARDS

Figure 27. Video frame order, group of pictures, and typical I-frames, P-frames, and B-frames. (Source and display order 99–108 versus encoding and transmission order, in which each anchor frame precedes the B-frames that reference it.)

Figure 28. Video structure hierarchy: block (8 pels × 8 lines), macroblock, slice, picture (frame), group of pictures (GOP), and video sequence.
coded pictures (P-frames), and bidirectionally predictive coded pictures (B-frames).
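The difference between source/display order and encoding/transmission order shown in Fig. 27 can be sketched as follows. The GOP pattern is illustrative; the rule applied is simply that each B-frame is held back until both of its anchor frames (the surrounding I- or P-frames) have been sent.

```python
# Display order in, transmission order out: anchors (I/P) are sent first,
# and the B-frames that reference them follow.
def transmission_order(display):
    out, held_b = [], []
    for frame in display:
        if frame[0] in "IP":      # anchor frame: emit it, then the held B-frames
            out.append(frame)
            out.extend(held_b)
            held_b = []
        else:                     # B-frame: hold until its next anchor is sent
            held_b.append(frame)
    return out + held_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(transmission_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

The decoder reverses this reordering before display, which is one source of the added latency mentioned below.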
I-, P-, and B-Frames. Frames that do not use any interframe coding are referred to as I-frames (where I denotes intraframe coded). All of the information for a complete image is contained within an I-frame, and the image can be displayed without reference to any other frame. (The preceding frames may not be present or complete at initialization or acquisition, and the preceding or following frames may not be present or complete when noncorrectable channel errors occur.) P-frames (where P denotes predicted) are frames where the temporal prediction is only in the forward direction (formed only from pixels in the most recently decoded I- or P-frame). Interframe coding techniques improve the overall compression efficiency and picture quality. P-frames may include portions that are only intraframe coded. B-frames (where B denotes bidirectionally predicted) include prediction from a future frame as well as from a previous frame (always I- or P-frames). Some of the consequences of using future frames in the prediction are as follows: the transmission order of frames is different from the displayed order of frames, and the encoder and decoder must reorder the video frames, thus increasing the total latency. B-frames are used to increase compression efficiency and perceived picture quality. Figure 28 illustrates the components of pictures, as discussed before. Removal of Spatial Information Redundancy: The Discrete Cosine Transform. As shown in Fig. 29, 8 × 8 blocks of spatial intensity that show variations of luminance and chrominance pel information are converted into 8 × 8 arrays of coefficients relating to the spatial frequency content of the original intensity information. The transformation method used is the discrete cosine transform (DCT). As an example, in Fig. 29a, an 8 × 8 pel array representing a black-to-white transition is shown as increasing levels of a gray scale. In Fig.
29b, the grayscale steps have been digitized and are represented by pel amplitude numerical values. In Fig. 29c, the grayscale block is represented by its frequency transformation
Figure 29. Discrete cosine transform. (a) An 8 × 8 pel array representing a black-to-white transition, shown as increasing levels of a gray scale; (b) the digitized pel amplitudes, identical rows stepping from 0 to 87.5 in increments of 12.5; (c) the resulting DCT coefficient array, in which only the top row is nonzero (DC component 43.8, then −40, 0, −4.1, 0, −1.1, 0, . . .) and all remaining coefficients are zero.
coefficients, appropriately scaled. The DCT compacts most of the energy into only a small number of the transform coefficients. To achieve higher decorrelation of the picture content, two-dimensional (along two axes) DCT coding is applied. The (0,0) array position (top left) represents the DC coefficient, or average value, of the array. Quantizing the Coefficients. The goal of video compression is to maximize the video quality at a given bit rate. Quantization is the process of dividing the coefficients by a value N greater than 1 and rounding the answer to the nearest integral value. This allows scaling the coefficient values according to their importance in the overall image. Thus high-resolution detail, to which the human eye is less sensitive, may be more heavily scaled (coarsely coded). The quantizer may also include a dead zone (an enlarged interval around zero) to core to zero small noise-like perturbations of the element value. Quantization in the compression algorithm is a lossy step (information is discarded that cannot be recovered). Variable-Length Coding, Codeword Assignment. The quantized values could be represented using fixed-length codewords. However, greater bit-rate efficiency can be achieved by employing what is known as entropy coding, which attempts to exploit the statistical properties of the signal to be encoded. It is possible to assign a shorter codeword to those values that occur more frequently and a longer codeword to those that occur less frequently. The Morse code is an example of this method. One optimal codeword design method, the Huffman code, is used in the Standard. Note that many zero-value coefficients are produced, and these may be prioritized into long runs of zeros by zigzag scanning or a similar method. Channel Buffer. Motion compensation, adaptive quantization, and variable-length coding produce highly variable amounts of compressed video data as a function of time.
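A minimal sketch of the DCT, quantization, and zigzag-scan chain applied to the black-to-white ramp of Fig. 29 follows. The orthonormal DCT normalization and the uniform quantizer step of 16 are illustrative assumptions, not the MPEG-2 quantizer matrices, so the numeric values differ from those printed in the figure.

```python
import numpy as np

# DCT -> quantize -> zigzag on an 8x8 black-to-white ramp block.
N = 8
k = np.arange(N)
M = np.sqrt(2 / N) * np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * N))
M[0, :] = np.sqrt(1 / N)                       # orthonormal DCT-II basis

block = np.tile(np.arange(N) * 12.5, (N, 1))   # each row ramps 0, 12.5, ..., 87.5
coeffs = M @ block @ M.T                       # 2-D DCT
q = np.round(coeffs / 16)                      # coarse uniform quantization (lossy)

# A diagonal (zigzag-style) scan groups the many zero coefficients into runs.
order = sorted(((i, j) for i in range(N) for j in range(N)),
               key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
scanned = [int(q[i, j]) for i, j in order]
print(int(np.count_nonzero(q)), "nonzero coefficients of 64")
```

Only the top row of coefficients survives (echoing Fig. 29c), and the scan turns the rest into one long run of zeros that entropy coding represents very cheaply.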
A buffer is used to regulate the variable input bit rate into a fixed output bit rate for transmission. The fullness of the buffer is controlled by adjusting the amount of quantization error in each image block (a rate controller driven by a buffer-state sensor adjusts the quantization level). Buffer size is constrained by the maximum tolerable delay through the system and by cost. Audio System Overview. The audio subsystem used in the ATSC Digital Television Standard is based on the AC-3 digital audio compression standard. The subsystem can encode from one to six channels of source audio from a pulse-code modulation (PCM) representation (requiring 5.184 Mbps for the 5.1-channel mode) into a serial bit stream at a normal data rate of 384 kbps. The 5.1 channels are left (front), center (front), right (front), left surround (rear), right surround (rear) (all 3 Hz to 20 kHz), and the low-frequency subwoofer (normally placed centrally) (which
represents the 0.1 channel, 3 Hz to 120 Hz). The system conveys digital audio sampled at a frequency of 48 kHz, locked to the 27 MHz system clock. In addition to the 5.1-channel input, monophonic and stereophonic inputs and outputs can be handled. Monophonic and stereophonic outputs can also be derived from a 5.1-channel input, permitting backward compatibility. The audio subsystem, illustrated in Fig. 30, comprises the audio encoding/decoding function and resides between the audio inputs/outputs and the transport system. The audio encoder(s) is (are) responsible for generating the audio elementary stream(s), which are encoded representations of the baseband audio input signals. The transport subsystem packetizes the audio data into PES packets, which are then further packetized into (fixed-length) transport packets. The transmission subsystem converts the transport packets into a modulated RF signal for transmission to the receiver. Transport system flexibility allows transmitting multiple audio elementary streams. The encoding, packetization, and modulation process is reversed in the receiver to produce reconstructed audio. Audio Compression. Two mechanisms are available for reducing the bit rate of sound signals. The first uses statistical correlation to remove redundancy from the bit stream. The second uses psychoacoustical characteristics of the human hearing system, such as spectral and temporal masking, to reduce the number of bits required to recreate the original sounds. The audio compression system consists of three basic operations, as shown in Fig. 31. In the first stage, the representation of the audio signal is changed from the time domain to the frequency domain, which is a more efficient domain in which to perform psychoacoustic audio compression.
The frequency-domain coefficients may be coarsely quantized because the resulting quantizing noise will be at the same frequency as the audio signal, and relatively low signal-to-noise ratios (SNRs) are acceptable due to the phenomenon of psychoacoustic masking. The bit allocation operation determines the actual SNR acceptable for each individual frequency coefficient. Finally, the frequency coefficients are coarsely quantized to the necessary precision and formatted into the audio elementary stream. The basic unit of encoded audio is the AC-3 sync frame, which represents six audio blocks of 256 frequency coefficient samples (derived from 512 time samples), a total of 1,536 samples. The AC-3 bit stream is a sequence of AC-3 sync frames. Additional Audio Services. Additional features are provided by the AC-3 subsystem. These include loudness normalization, dynamic range compression that has an override for the listener, and several associated services: dialogue, commentary, emergency, voice-over, help for the visually impaired and the hearing impaired (captioning), and multiple languages. Some of these services are mutually exclusive, and multilanguage service requires up to an extra full 5.1-channel service for each language (up to an additional 384 kbps).
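The bit-rate figures quoted above are mutually consistent, as a quick calculation shows. The 18 bits per PCM sample is inferred here from the quoted 5.184 Mbps rather than stated explicitly in the text.

```python
# Consistency check on the AC-3 figures quoted above.
fs = 48_000                       # sampling frequency, Hz
channels = 6                      # 5.1-channel mode
bits_per_sample = 18              # inferred from the quoted 5.184 Mbps
pcm_rate = fs * channels * bits_per_sample
assert pcm_rate == 5_184_000      # the quoted PCM rate

coded_rate = 384_000              # normal AC-3 data rate, bits/s
ratio = pcm_rate / coded_rate     # 13.5 : 1 compression

frame_samples = 6 * 256           # six blocks of 256 coefficients per sync frame
frame_seconds = frame_samples / fs
print(ratio, frame_samples, frame_seconds)   # 13.5 1536 0.032
```

Each AC-3 sync frame therefore carries 32 ms of audio, and the coder achieves roughly 13.5 : 1 compression in the 5.1-channel mode.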
Figure 30. Audio subsystem within the digital television system.
Ancillary Data Services. Several data services have been included in the ATSC Standard, and other services can be added in the future. Currently, program subtitles (similar to closed captioning in NTSC), emergency messages (mixed into baseband video in NTSC), and program guide information are included. Possible Future Data Services. Data related to the following may be desired: conditional access, picture structure, colorimetry, scene changes, local program insertion, field/frame rate and film pull-down, pan/scan, multiprogram, and stereoscopic image. Transmission Characteristics. The transmission subsystem uses a vestigial sideband (VSB) method: (1) 8-VSB for the simulcast terrestrial
Figure 31. Overview of audio compression system.
broadcast mode and (2) 16-VSB for a high data rate mode. VSB includes a small part of the lower sideband and the full upper sideband. Sloped filtering at the transmitter and/or the receiver attenuates the lower end of the band. The 8-VSB coding maps three bits into one of eight signal levels. The system uses a symbol rate of 10.76 Msymbols/s, capable of supporting a data stream payload of 19.39 Mb/s. See Fig. 32 for VSB in a 6 MHz channel. Modulation techniques for some other planned broadcast systems use orthogonal frequency division multiplexing (OFDM) or coded OFDM (COFDM), a form of multicarrier modulation in which the carrier spacing is selected so that each subcarrier within the channel is orthogonal to the other subcarriers; this mathematically ensures that during the sampling time for one carrier, all other carriers are at a zero point.
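The orthogonality claim can be checked numerically: complex exponentials spaced at integer multiples of 1/T average to zero against one another over one symbol period T. This is an idealized sketch that ignores guard intervals, coding, and channel effects.

```python
import numpy as np

# Numeric check of OFDM subcarrier orthogonality over one symbol period T.
T = 1.0
t = np.linspace(0.0, T, 10_000, endpoint=False)

def subcarrier(k):
    # k-th subcarrier: frequency k/T
    return np.exp(2j * np.pi * k * t / T)

def correlation(a, b):
    # normalized inner product over the symbol period
    return abs(np.mean(a * np.conj(b)))

print(correlation(subcarrier(3), subcarrier(3)))   # 1.0 (same carrier)
print(correlation(subcarrier(3), subcarrier(7)))   # ~0  (orthogonal carriers)
```

Because distinct subcarriers complete whole numbers of cycles in T, their products integrate to zero, which is what lets a COFDM receiver separate thousands of overlapping carriers.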
Figure 32. Vestigial sideband (VSB) in a 6 MHz channel for digital transmission. The pilot sits at the suppressed-carrier frequency; the flat data band is 5.38 MHz wide within the 6.00 MHz channel, and the band-edge transition regions d are 0.31 MHz each.
The 8-VSB subsystem takes advantage of a pilot, segment sync, and a training sequence for robust acquisition and operation. To maximize service area, an NTSC rejection filter (in the receiver) and trellis coding are used. The system can operate in a signal-to-additive-white-Gaussian-noise (S/N) environment of 14.9 dB. The transient peak power to average power ratio, measured on a low-power transmitted signal that has no nonlinearities, is no more than 6.3 dB for 99.9% of the time. A block diagram of a generic transmitter subsystem is shown in Fig. 33. The incoming data (19.39 Mb/s) are randomized and then processed for forward error correction (FEC) in the form of Reed–Solomon coding (20 RS parity bytes are added to each packet, known as outer error correction). Data interleaving then reorganizes the data stream so that it is less vulnerable to bursts of errors; the interleaver operates to a depth of about 1/6 data field (4 ms). The second stage, called inner error correction, consists of 2/3-rate trellis coding. This encodes one bit of each two-bit pair into two output bits, using a 1/2 convolutional code, whereas the other input bit is retained as precoded. Along with the trellis encoder, the data packets are precoded into data frames and mapped into a signaling waveform using an eight-level (3 bit), one-dimensional constellation (8-VSB). Data segment sync (4 symbols = 1 byte) at the beginning of a segment of 828 data plus parity symbols, and data field sync at the beginning of a data field of 313 segments (24.2 ms), are then added. The data field sync includes the training signal used for setting the receiver equalizer.
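The segment and field structure just described is consistent with the 19.39 Mb/s payload quoted earlier, as the following check shows. The symbol-rate value and the convention that the payload counts 188-byte transport packets (187 data bytes plus a sync byte) are assumptions of this sketch, not figures taken from the text above.

```python
# Reconstructing the quoted 19.39 Mb/s from the 8-VSB segment/field structure.
symbol_rate = 10.762e6              # 8-VSB symbols per second (assumed value)
segment_symbols = 832               # 828 data + parity symbols plus 4 segment sync
field_segments = 313                # 312 data segments plus 1 field sync segment

# Rate-2/3 trellis coding leaves 2 information bits per 3-bit symbol:
segment_bits = 828 * 3 * 2 // 3           # 1656 bits = 207 bytes per segment
assert segment_bits == (187 + 20) * 8     # 187 payload bytes + 20 RS parity bytes

segment_time = segment_symbols / symbol_rate
payload = (188 * 8 / segment_time) * (field_segments - 1) / field_segments
print(round(payload / 1e6, 2))            # 19.39
```

The two overhead ratios (the field sync segment and the RS/trellis redundancy) account for the difference between the raw 32.28 Mb/s symbol stream and the 19.39 Mb/s delivered to the transport layer.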
A small in-phase pilot is then added to the data signal at a power of 11.3 dB below the average data signal power. The data are then modulated onto an IF carrier, which is the same frequency for all channels. The RF up-converter then translates the filtered, flat IF data signal spectrum to the desired RF channel. It is then amplified to the appropriate power for the transmitting antenna. For the same approximate coverage as an NTSC transmitter (at the same frequency), the average power of the ATV signal is approximately 12 dB less than the NTSC peak sync power. The frequency of the RF up-converter oscillator will typically be the same as that for NTSC (except for offsets). For extreme cochannel situations, precise RF carrier frequency offsets with respect to the NTSC cochannel carrier may be used to reduce interference into the ATV signal. The ATV signal is noise-like, and its interference into NTSC does not change with precise offset. The ATV cochannel pilot should be offset in the RF upconverter from the dominant NTSC picture carrier by an odd multiple of half the data segment rate. An additional offset of 0, +10 kHz, or −10 kHz is required to track the principal NTSC interferer. For ATV-into-ATV cochannel interference, precise carrier offset prevents the adaptive equalizer from misinterpreting the interference as a ghost. The Japanese High-Definition Television Production System This television production system was developed by the Japanese Broadcasting Corporation (NHK). It was standardized in 1987 by the Broadcast Technology Association (BTA), now renamed the Association of Radio Industries and Business (ARIB), in Japan and in the United States by SMPTE (240M and 260M Standards). It uses a total of 1,125 lines (1,035 active lines), is interlaced at a field rate of 60 Hz, and has an aspect ratio of 16 : 9. 
It requires a bandwidth of 30 MHz for the luminance signal (Y) and 15 MHz for each of the two color-difference signals, PB and PR. When digitized at eight bits per sample, it uses 1,920 pixels per line and requires a total bit rate of 1.2 Gbps. Note that this production system is similar to the interlaced system used in the ATSC standard, except that the latter uses 1,080 active lines.
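The quoted 1.2 Gbps total can be checked approximately. The figure of 2,200 total samples per line (active samples plus blanking) is an assumption of this sketch; the text above gives only the 1,920 active pixels.

```python
# Rough consistency check of the ~1.2 Gbps production bit rate:
# 1,125 total lines at 30 frames/s and an assumed 2,200 total samples
# per line give the luminance sampling rate, and Y plus two half-rate
# color-difference signals at 8 bits each give the total.
lines, fps, samples_per_line = 1125, 30, 2200
y_rate = lines * fps * samples_per_line      # luminance samples per second
chroma_factor = 1 + 0.5 + 0.5                # Y plus half-rate PB and PR
total_bps = y_rate * chroma_factor * 8       # 8 bits per sample
print(y_rate, total_bps / 1e9)               # 74250000 1.188
```

The result, about 1.19 Gbps, matches the quoted 1.2 Gbps once blanking intervals are included in the count.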
Figure 33. 8-VSB transmitter subsystem block diagram (transport layer interface, outer error correction, data interleaver, inner error correction and pre-coder, multiplexing with the synchronization signals, pilot insertion, optional pre-equalizer filter, modulator, RF up-converter, and amplifier).
Japanese MUSE Transmission Systems
A range of transmission systems was developed by NHK based on the multiple sub-Nyquist encoding (MUSE) transmission scheme (see Table 8). MUSE (8.1 MHz bandwidth) was developed for DBS broadcasting, and MUSE-T (16.2 MHz bandwidth) was developed for satellite transmission. MUSE-6 was designed to be compatible with a 6 MHz channel and NTSC receivers. MUSE-9 uses a 3 MHz augmentation channel in addition to the standard 6 MHz channel and is NTSC receiver-compatible.

Table 8. MUSE Transmission Systems

Transmission System | Type of Transmission | Bandwidth | Channel Compatible | Compatible with NTSC
MUSE | Direct broadcast by satellite (DBS) | 8.1 MHz | NA | No
MUSE-T | Satellite | 16.2 MHz | NA | No
MUSE-6 | Terrestrial broadcast | 6 MHz | Yes | Yes
MUSE-9 | Terrestrial broadcast | 6 + 3 MHz augmentation | Yes, with 2nd 3 MHz channel | Yes
MUSE-E | Terrestrial broadcast | 8.1 MHz | No | No

Japanese Hi-Vision System
This system incorporates the 1,920 × 1,035 television production system and the MUSE-E transmission system. MUSE-E uses an 8.1 MHz bandwidth and is incompatible with standard NTSC receivers and channel allocations. Four audio channels are time-division-multiplexed with the video signals in the blanking intervals. The encoding and decoding processes are both very complex and require many very large scale integration (VLSI) chips. This system requires a MUSE-E receiver, or a set-top box equipped with a MUSE decoder that feeds either a 16 : 9 display or a 4 : 3 aspect ratio conventional receiver. In the near term, NHK will use simultaneous Hi-Vision/NTSC program production. The MUSE systems are not receiver-compatible with either the North American ATSC system or the European DVB system (see later).

The Japanese Enhanced Definition Television System (EDTV-II)
EDTV-II is an NTSC-compatible letterbox analog transmission system standardized by the ARIB in Japan. The input signal is a 525-line, 60-frame progressive scan (525P) that has a 16 : 9 aspect ratio. A 525-line, 30-frame interlaced scan (525I) can be up-converted as an input signal. Note that the 525P signal is one of the SDTV signal formats defined in the ATSC Standard (720 × 480 at 60 P). It is also defined as a production format in the SMPTE 293M and SMPTE 294M standards documents. Compared with the current 525I standard, the frame rate has been doubled from 30 to 60, the sampling frequency has been doubled to 27 MHz, and the aspect ratio has been changed from 4 : 3 to 16 : 9. This increase of sampling frequency permits maintaining comparable resolution along the horizontal and vertical axes. The production system is effectively an 8 : 4 : 4 digital system that has production interfaces at 540 Mbps. A 4 : 2 : 0 system can also be used in production and would require interfacing at 360 Mbps. Horizontal blanking is shrunk to achieve this bit rate. The EDTV-II analog transmission system is used for both terrestrial and satellite broadcasting. It requires the same bandwidth as the NTSC system, and no changes are needed in transmitter implementations. The image is displayed on an EDTV-II receiver progressively, using 480 lines and a 16 : 9 aspect ratio. It is compatible with existing NTSC receivers, except that the displayed image has a 16 : 9 aspect ratio and so appears in a letterbox format that has black bars at top and bottom. The 525P signal requires a video bandwidth of approximately 6.2 MHz. The EDTV-II system creates three enhancement signals in addition to an NTSC signal, with which they are then frequency-domain-multiplexed.

Main Picture (MP). The 525P 16 : 9 signal is reduced from 6.2 MHz to 4.2 MHz bandwidth, and the 480 lines are decimated to 360 lines to produce a letterbox display on the NTSC 4 : 3 receiver. Black bars at top and bottom are each 60 lines wide. Thus, horizontal and vertical resolution are reduced to conform to the NTSC format but maintain the 16 : 9 aspect ratio. Horizontal High (HH, 4.2 MHz to 6.2 MHz). A frequency enhancement signal is extracted from the original 525P image and is multiplexed into the MP signal to increase the horizontal bandwidth to 6.2 MHz in the EDTV-II receiver. For transmission, the HH signal is downshifted to 2 to 4 MHz and frequency-division-multiplexed into an unused vertical temporal frequency domain in the conventional NTSC system called the Fukinuki hole. The Fukinuki hole may be used only for correlated video information, which applies in this case. In the EDTV-II receiver, a motion detector multiplexes the HH signal only onto the still parts of the picture, where there is more need for high resolution to satisfy human vision characteristics. Two enhancement signals are frequency-division-multiplexed together into the top
and bottom panels, which together occupy one-third as much area as the main picture. As these are generated in a 360-line format, they must be compressed by a 3-to-1 pixel downsampling decimation process to fit into the 120 lines of the top and bottom panels. Vertical High Frequency (VH). The VH signal enhances the vertical still-picture resolution back up to 480 lines. The signal is transmitted only for stationary areas of the image, and temporal averaging is applied. Vertical Temporal Frequency (VT). The VT enhancement signal is derived from the progressive-to-interlace scan conversion at the encoder and improves the interlace-to-progressive scan (360/2 : 1 to 360/1 : 1) conversion in the receiver. The EDTV-II receiver performs the reverse of the encoding process. The NTSC receiver uses the MP signal directly. The European DVB System The Digital Video Broadcast (DVB) system has been designed for MPEG-2-based digital delivery systems for satellite, cable, community cable, multichannel multipoint distribution (MMDS), and terrestrial broadcasting. Service information, conditional access, and teletext functions are also available. All DVB systems are compatible. DVB-T, the terrestrial broadcasting standard, is similar in many respects to the ATSC standard. However, there are a number of significant differences. DVB-T uses coded orthogonal frequency division multiplexing (COFDM), a technique already being used for digital audio broadcasting (DAB). Either 1,704 (2k) or 6,816 (8k) individual carriers may be used. The 8k system is more robust but increases receiver complexity and cost. Some broadcasters have already adopted the 2k system, although it will not be compatible with the 8k system. DVB-T uses the MPEG-2 Layer II Musicam audio standard, a 50 Hz frame rate, and aspect ratios of 4 : 3, 16 : 9, or 20 : 9. The European PALplus System This is an analog delivery system that uses a current TV channel to transmit an enhanced wide-screen version of the PAL signal.
A conventional receiver displays the PALplus picture as a letterbox in a 4 : 3 aspect ratio. A wide-screen receiver shows the same transmitted picture in a 16 : 9 format at higher resolution. European broadcasters are divided on whether to use this format. The PALplus concept is similar to the Japanese EDTV-II format described before. Acknowledgments The authors sincerely thank the following for permission to use portions of their work in this article: • The Advanced Television Systems Committee (ATSC) and its Executive Director, Craig Tanner, for text and figures from Standards A/52, A/53, and A/54. • Mr. Stanley N. Baron for text and figures from his book Digital Image and Audio Communications, Toward a Global Information Infrastructure.
• Mr. Patrick Griffis for the data in a figure from his article Bits = Bucks, Panasonic, paper presented at NAB, 1998, unpublished.
BIBLIOGRAPHY
1. M. Ashibe and H. Honma, A wide-aspect NTSC compatible EDTV system, J. SMPTE, Mar. 1992, p. 130.
2. ATSC Digital Television Standard, Advanced Television Systems Committee, Doc. A/53, 16 Sept. 1995.
3. S. N. Baron, ed., Composite Digital Television: A Primer, Society of Motion Picture and Television Engineers, White Plains, NY, 1996.
4. S. N. Baron and M. I. Krivocheev, Digital Image and Audio Communications, Toward a Global Information Infrastructure, Van Nostrand Reinhold, New York, 1996.
5. K. B. Benson, ed., Television Engineering Handbook, revised by J. Whitaker, McGraw-Hill, New York, 1992.
6. Digital Audio Compression Standard (AC-3), Advanced Television Systems Committee, Doc. A/52, 20 Dec. 1995.
7. A. Dubec, The SECAM Colour Television System, Compagnie Française de Télévision, Paris, 1976.
8. P. Griffis, Bits = Bucks, Panasonic, paper at NAB, 1998.
9. Guide to the Use of the ATSC Digital Television Standard, Advanced Television Systems Committee, Doc. A/54, 4 Oct. 1995.
10. A. Itoh, 525 line progressive scan signal digital interface standard and system, J. SMPTE, Nov. 1997, p. 768.
11. R. W. G. Hunt, The Reproduction of Colour, 5th ed., Fountain Press, Kingston-upon-Thames, 1995.
12. G. Hutson, P. Shepherd, and J. Brice, Colour Television, McGraw-Hill, London, 1990.
13. A. F. Inglis and A. C. Luther, Video Engineering, 2nd ed., McGraw-Hill, New York, 1996.
14. ISO/IEC IS 13818-1, International Standard MPEG-2 Systems, 1994.
15. ISO/IEC IS 13818-2, International Standard MPEG-2 Video, 1994.
16. ISO/IEC IS 13818-2, Section 8.
17. ITU-R BT.470-4, Characteristics of television systems, International Telecommunications Union, Geneva, 1995.
18. ITU-R Document 11-3/15, MPEG digital compression systems, 9 Aug. 1994.
19. K. Jack, Video Demystified, 2nd ed., HighText Interactive, San Diego, 1996.
20. K. Jackson and B. Townsend, eds., TV & Video Engineer's Reference Book, Butterworth-Heinemann, Oxford, England, 1991.
21. H. Y. Kim and S. Naimpally, Digital EDTV, compatible HDTV, J. SMPTE, Feb. 1993, p. 119.
22. B. Marti et al., Problems and perspectives of digital terrestrial TV in Europe, J. SMPTE, Aug. 1993, p. 703.
23. R. Mäusl, Refresher Topics - Television Technology, Rohde & Schwarz, Munich, 1992.
24. R. S. O'Brien, ed., Color Television, Selections from the Journal of the SMPTE, Society of Motion Picture and Television Engineers, New York, 1970.
25. G. Pensinger, ed., 4 : 2 : 2 Digital Video Background and Implementation, Society of Motion Picture and Television Engineers, White Plains, NY, 1989.
26. D. H. Pritchard and J. J. Gibson, Worldwide color television standards - similarities and differences, J. Soc. Motion Picture and Television Eng. 89, 111–120 (1980).
27. Proc. IRE, Color Television Issue, 39(10) (1951).
28. V. Reimer, Advanced TV systems, Germany and Central Europe, J. SMPTE, May 1993, p. 398.
29. M. Robin, Addendum to Worldwide color television standards - similarities and differences, J. Soc. Motion Picture and Television Eng. 89, 948–949 (1980).
30. M. Robin and M. Poulin, Digital Television Fundamentals, McGraw-Hill, New York, 1997.
31. T. Rzeszewski, ed., Color Television, IEEE Press, New York, 1983.
32. T. S. Rzeszewski, ed., Television Technology Today, IEEE Press, New York, 1984.
33. H. V. Sims, Principles of PAL Colour Television and Related Systems, Newnes Technical Books, London, 1969.
34. SMPTE 274M Standard for Television, 1,920 × 1,080 Scanning and Interface, 1995.
35. SMPTE S17.392 Proposed Standard for Television, 1,280 × 720 Scanning and Interface, 1995.
36. V. Steinberg, Video Standards, Signals, Formats, and Interfaces, Snell & Wilcox, Durford Mill, England, 1997.
37. N. Suzuki et al., Matrix conversion VT resolution in letterbox, J. SMPTE, Feb. 1991, p. 104.
38. N. Suzuki et al., Experiments on proposed multiplexing scheme for vertical-temporal and vertical high helper signals in EDTV-II, J. SMPTE, Nov. 1994, p. 728.
39. Television Operating and Interface Standards, Society of Motion Picture and Television Engineers, 595 W. Hartsdale Ave., New York, NY 10607-1824.
40. Television Measurements Standards, Institute of Electrical and Electronics Engineers, Inc., Broadcast Technology Society, c/o IEEE Service Center, 445 Hoes Lane, Box 1331, Piscataway, NJ 08855.
41. J. Watkinson, Television Fundamentals, Focal Press, Oxford, England, 1996.
TERAHERTZ ELECTRIC FIELD IMAGING
X.-C. ZHANG
Rensselaer Polytechnic Institute, Troy, NY
INTRODUCTION TO THE TERAHERTZ WAVE
The frequently used electromagnetic spectrum spans many frequency bands, including microwaves,
infrared, visible light, and X rays. Terahertz radiation lies between the microwave and infrared frequencies (Fig. 1). In the electromagnetic spectrum, radiation at 1 THz has a period of 1 ps, a wavelength of 300 µm, a wave number of 33 cm−1, a photon energy of 4.1 meV, and an equivalent temperature of 47.6 K. In the same way that visible light can create a photograph, radio waves can transmit sound, and X rays can see shapes within the human body, terahertz waves (T-rays) can create pictures and transmit information. Until recently, however, the very large terahertz portion of the spectrum has not been particularly useful because there were neither suitable emitters to send out controlled terahertz signals nor efficient sensors to collect them and record information. Recent developments in terahertz time-domain spectroscopy and related terahertz technologies now lead us to view the world in a new way. As a result of developing research, terahertz radiation now has widespread potential applications in medicine, microelectronics, agriculture, forensic science, and many other fields. Three properties of THz wave radiation triggered research to develop this frequency band for applications:

• Terahertz waves have low photon energies (4 meV at 1 THz) and thus cannot lead to photoionization in biological tissues.

• Many molecules exhibit strong absorption and dispersion at terahertz frequencies, due to dipole-allowed rotational and vibrational transitions. These transitions are specific to the molecule and therefore enable terahertz wave fingerprinting.

• Coherent terahertz wave signals can be detected in the time domain by mapping the transient of the electrical field in amplitude and phase. This gives access to absorption and dispersion spectroscopy.
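The equivalences quoted above for 1 THz follow directly from physical constants, as a short check shows; the equivalent temperature computes to about 48 K from hf/kB, and the quoted 47.6 K corresponds to dividing the rounded 4.1 meV photon energy by kB.

```python
# Checking the quoted equivalences for 1 THz against physical constants.
h = 6.62607e-34      # Planck constant, J s
c = 2.99792e8        # speed of light, m/s
kB = 1.38065e-23     # Boltzmann constant, J/K
eV = 1.60218e-19     # joules per electron volt
f = 1.0e12           # 1 THz

period_ps = 1e12 / f                  # 1 ps
wavelength_um = c / f * 1e6           # ~300 um
wavenumber_cm = f / (100 * c)         # ~33 cm^-1
photon_meV = h * f / eV * 1e3         # ~4.1 meV
temperature_K = h * f / kB            # ~48 K
print(period_ps, round(wavelength_um), round(wavenumber_cm), round(photon_meV, 1))
```

These conversions are handy throughout the terahertz literature, where the same quantity is quoted interchangeably in picoseconds, micrometers, wave numbers, millielectronvolts, or kelvins.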
Coherent terahertz time-domain spectroscopy that has an ultrawide bandwidth provides a new method for characterizing the electronic, vibronic, and compositional properties of solid, liquid, and gas phase materials, as well as flames and flows. In theory, many biological and chemical compounds have distinct signature responses to terahertz waves due to their unique molecular vibrational and rotational energy levels; this implies that their chemical compositions might be examined
Figure 1. The terahertz gap: a scientifically rich but technologically limited frequency band between microwave and optical frequencies.
using a terahertz beam. Such capability could be used for diagnosing disease, detecting pollutants, sensing biological and chemical agents, and controlling the quality of food products. It is also quite possible that plastic explosives could be distinguished from suitcases, clothing, common household materials, and equipment based on molecular structure. Detecting the binding state of genetic materials (DNA and RNA) directly using terahertz waves, without requiring markers, provides a label-free method of genetic analysis for future biochip technologies. A T-ray imaging modality would produce images that have ‘‘component contrast,’’ enabling analysis of the water content and composition of tissues in biological samples. Such capability presents tremendous potential for identifying early changes in composition and function as a precursor to specific medical investigations and treatment. Moreover, in conventional optical transillumination techniques that use near-infrared pulses, large amounts of scattering can spatially smear out the objects to be imaged. T-ray imaging techniques, due to their longer wavelengths, can provide significantly enhanced contrast as a result of low (Rayleigh) scattering.
TECHNICAL BACKGROUND

The development of terahertz time-domain spectroscopy has recently stimulated applications of this unexplored frequency band. Hu and Nuss first applied THz pulses to imaging applications (1). Far-infrared images (T-ray imaging) of tree leaves, bacon, and semiconductor integrated chips have been demonstrated. In an imaging system that has a single terahertz antenna, the image is obtained by pixel-scanning the sample in two dimensions (2–4). As a result, the time for acquiring an image is typically of the order of minutes or hours, depending on the total number of pixels and the lowest terahertz frequency components of interest. Although it is highly desirable to improve the data acquisition rate further for real-time imaging by fabricating a focal plane antenna array, technical issues such as high optical power consumption and limits on the antenna packaging density would hinder such a device (5). Recently, a free-space electro-optic sampling system has been used to characterize the temporal and 2-D spatial distribution of pulsed electromagnetic radiation (6–8). A T ray can be reflected as a quasi-optical beam, collimated by metallic mirrors, and focused by a plastic or high-resistivity silicon lens. The typical powers of a T-ray sensing (single pixel) and an imaging (2-D array) system are microwatts and milliwatts, respectively. Thus a terahertz imaging system based on the electro-optic sampling technique shows promise for 2-D real-time frame imaging using terahertz beams.
GENERATION OF TERAHERTZ BEAMS

Currently, photoconduction and optical rectification are the two basic approaches for generating terahertz beams by using ultrafast laser pulses. The photoconductive approach uses high-speed photoconductors as transient current sources for radiating antennas (9). These antennas include elementary hertzian dipoles, resonant dipoles, tapered antennas, transmission lines, and large-aperture photoconducting antennas. The optical rectification approach uses electro-optic crystals as a rectification medium (10). The rectification can be a second-order (difference frequency generation) or higher order nonlinear optical process, depending on the optical fluence. The physical mechanism for generating a terahertz beam by photoconductive antennas is the following: a laser pulse (ħω ≥ Eg) creates electron–hole pairs in the photoconductor, the free carriers then accelerate in the static field to form a transient photocurrent, and the fast time-varying current radiates electromagnetic waves. In the far field, the electrical component of the terahertz radiation is proportional to the first time derivative of the photocurrent. The waveform is measured by a 100-µm photoconducting dipole that can resolve subpicosecond electrical transients. Because the radiating energy comes mainly from stored surface energy, the terahertz radiation energy can scale up with the bias and the optical fluence (11). Optical rectification is the inverse process of the electro-optic effect (12). In contrast to photoconducting elements, where the optical beam functions as a trigger, the energy of terahertz radiation during transient optical rectification comes from the exciting laser pulse. The conversion efficiency depends on the value of the nonlinear coefficient and the phase-matching condition. In the optical rectification mode, the terahertz pulse duration is comparable to the optical pulse duration, and the frequency spectrum is limited mainly by the spectral broadening of the laser pulse, as determined by the uncertainty principle.
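The far-field relation stated above (the radiated field follows the time derivative of the photocurrent) can be sketched numerically. The Gaussian current transient below is an illustrative stand-in for a real photoconductor response, not measured data:

```python
# Sketch: far-field terahertz emission E(t) ~ dJ/dt for a transient photocurrent.
import numpy as np

t = np.linspace(-2e-12, 2e-12, 4001)     # time axis, s
tau = 0.3e-12                            # assumed carrier-response time, s
current = np.exp(-(t / tau) ** 2)        # transient photocurrent J(t), a.u.
e_field = np.gradient(current, t)        # far-field E(t) proportional to dJ/dt, a.u.
```

The derivative of a unipolar current transient is bipolar, which is why radiated terahertz waveforms appear as single-cycle transients.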
Materials used for terahertz sources have been adapted from conventional electro-optic crystals and include semiconductor and organic crystals. Enhancement and sign change of second-order susceptibility by optically exciting electronic resonance states have been reported (13).
FREE-SPACE ELECTRO-OPTIC DETECTION

Fundamentally, the electro-optic effect is a coupling between a low-frequency electrical field (terahertz pulse) and a laser beam (optical pulse) in the sensor crystal. Free-space electro-optic sampling via the linear electro-optic effect (Pockels effect) offers a flat frequency response across an ultrawide bandwidth. Because field detection is purely an electro-optic process, the system bandwidth is limited mainly by either the pulse duration of the probe laser or the lowest transverse optical (TO) phonon frequency of the sensor crystal. Furthermore, because electro-optic sampling is purely an optical technique, it does not require electrode contact or wiring on the sensor crystal (14,15). Figure 2 is a schematic of the experimental setup for using optical rectification and electro-optic effects. Nonlinear optics forms the basis of the terahertz system. A
mode-locked Ti:sapphire laser is used as the optical source. Several different gigahertz/terahertz emitters can be used, including photoconductive antennas (transient current sources) and a ⟨111⟩ GaAs wafer at normal incidence (optical rectification source) (16–18). Generally, an optical rectification source emits terahertz pulses whose duration is comparable to that of the optical excitation pulse, and a transient current source radiates longer terahertz pulses.

[Figures 2 and 3: schematic of the experimental setup and details of the electro-optic sampling setup — fiber laser, terahertz emitter, time-delay stage, terahertz beam, ZnTe sensor, λ/4 plate, Wollaston polarizer, and balanced detector.]

Figure 3 shows the details of the sampling setup. Simple tensor analysis indicates that using a (110)-oriented zincblende crystal as a sensor gives the best sensitivity. The polarizations of the terahertz beam and the optical probe beam are parallel to the [1,−1,0] crystal direction. Modulating the birefringence of the sensor crystal via an applied electrical field (terahertz) modulates the polarization ellipticity of the optical probe beam that passes through the crystal. The ellipticity modulation of the optical beam can then be polarization-analyzed to provide information on both the amplitude and phase of the applied electrical field. The detection system analyzes the polarization change from the electro-optic crystal and correlates it with the amplitude and phase of the electrical test field. For weak field detection, the power of the laser beam Pout(E) modulated by the electrical field of the terahertz pulse (E = V/d) is

Pout(E) = P0[1 + πE/Eπ],    (1)
where P0 is the output optical probe power at zero applied field and Eπ is the half-wave field of a sensor crystal of a certain thickness. By measuring Pout from a calibrated voltage source as a function of the time delay between the terahertz pulse and the optical probe pulse, the time-resolved sign and amplitude of V can be obtained, and a numerical FFT provides frequency information. For a 3-mm thick ZnTe sensor crystal, the shot-noise limit gives a minimum detectable field of 100 nV cm−1 Hz−1/2 and a frequency range from near dc to 4 THz. Figure 4 is a plot of the temporal electro-optic waveform of a terahertz pulse whose half-cycle duration is 1 ps, as measured by a balanced detector using a (110) ZnTe sensor crystal. The time delay is provided by changing the relative length of the optical beam path between the terahertz pulses and the optical probe pulses. Detection sensitivity is significantly improved by increasing the interactive length of the pulsed field and the optical probe beam within the crystal. The dynamic range can exceed 10,000:1 using unfocused beams, 100,000:1 using unamplified focused beams, and 5,000,000:1 using focused amplified beams and a ZnTe sensor crystal. Figure 5 is a plot of the signal and noise spectra, where the SNR exceeds 50,000 from 0.1 to 1.2 THz, corresponding to the waveform in Fig. 4. A linear response in both generating and detecting the terahertz pulses is crucial. Figure 6 is a plot of the electro-optic signal versus peak terahertz field strength. Excellent linearity is achieved. By increasing the optically illuminated area of the photoconductor on the terahertz emitter, the total emitted terahertz power scales linearly with the illumination area (assuming a nonsaturating optical fluence). A shorter laser pulse further extends the detection bandwidth (> 40 THz for pulses shorter than 30 fs).
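A minimal numerical sketch of this detection scheme, using Eq. (1) and a numerical FFT of the sampled waveform (all waveform parameters below are illustrative, not the article's data):

```python
# Electro-optic sampling sketch: probe power encodes the terahertz field via
# Eq. (1); inverting Eq. (1) recovers E(t), and an FFT gives the spectrum.
import numpy as np

dt = 50e-15                                   # probe delay step, 50 fs
t = (np.arange(200) - 100) * dt               # 10-ps delay scan
field = (1 - 2 * (t / 1e-12) ** 2) * np.exp(-(t / 1e-12) ** 2)  # ~1-ps transient, a.u.

p0, e_pi = 1.0, 1e5                           # zero-field probe power; half-wave field, a.u.
p_out = p0 * (1 + np.pi * field / e_pi)       # Eq. (1)

recovered = (p_out / p0 - 1) * e_pi / np.pi   # invert Eq. (1) to recover E(t)
spectrum = np.abs(np.fft.rfft(recovered))     # numerical FFT -> frequency information
freqs_thz = np.fft.rfftfreq(t.size, dt) / 1e12
```

With a 10-ps scan the frequency resolution is 0.1 THz, and the spectrum of this single-cycle transient peaks at a few hundred gigahertz.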
Figure 13. Setup for converting a terahertz image into an optical image. The 2-D field distribution in the sensor crystal is converted into a 2-D optical intensity distribution after the readout beam passes through a crossed analyzer.
Figure 14. One of the ZnTe crystals used in terahertz imaging. The useful area is more than 3 × 3 cm2.
Figure 15. Terahertz images of fibers, specks, and masses. Small structures less than 0.5 mm thick and less than 0.24 mm in diameter can be resolved.
Figure 16. Photo of human breast tissue and a T-ray image of a 0.6 mm abnormal structure (shadow).
between 30 GHz and 0.2 THz. The electro-optic imaging system can image fast-moving subjects, permitting, for example, real-time imaging of living insects. Figure 18 demonstrates real-time in vivo terahertz images of insects, such as a fly, worm, ant, and ladybug. The antennae and legs of the ant and the organs in the ladybug can be resolved. Terahertz radiation has no ionizing effect, and the spectrum of a
Figure 17. A breast tissue sample for terahertz measurement. The light spot near the center is a cancerous tumor. Transmitted terahertz waveforms from normal tissue and cancerous tissue are shown.
Figure 18. In vivo terahertz images of insects, such as a fly, worm, ant, and ladybug. An image rate as fast as 30 frames/s is achieved.
terahertz wave falls within the thermal range; therefore it is safe for medical applications. Figure 19 shows the schematic for terahertz imaging of currency watermarks. Unlike intensity images viewed by a visible beam, the watermark images in Fig. 19 are obtained purely from the phase difference of the terahertz pulse transmitted through the watermarks. The maximum phase shift is less than 60 fs. The terahertz absorption is less than 1%. Clearly, these watermark images in the terahertz spectrum offer an alternative method for detecting counterfeiting. Electro-optic imaging makes it possible to see terahertz wave images of electrical fields, diseased tissue, the chemical composition of plants, and much more that is undetectable by other imaging systems. Real-time monitoring of a terahertz field supports real-time diagnostic techniques.
THZ WAVE TRANSCEIVER

In a conventional experimental setup of terahertz time-domain spectroscopy, a separate terahertz transmitter and terahertz receiver are used to generate and detect the terahertz signal. However, because electro-optic detection
is the reverse of rectified generation, the transmitter and the receiver can be the same crystal (31). Therefore, a terahertz transceiver, which alternately transmits pulsed electromagnetic radiation (optical rectification) and receives the returned signal (electro-optic effect), is feasible. The use of a transceiver has advantages for terahertz-range remote sensing and tomographic imaging. It has been demonstrated, theoretically and experimentally, that the working efficiency of an electro-optic transceiver constructed from a (110) zincblende crystal is optimized when the pump beam polarization is 26° counterclockwise from the crystallographic z-axis of the crystal. An experimental setup of a terahertz imaging system using an electro-optic transceiver is shown in Fig. 20. Compared to the traditional terahertz tomographic setup in reflective geometry, this imaging system using an electro-optic transceiver is simpler and easier to align. In addition, normal incidence of the terahertz beam on the sample can be maintained. Greater than 50 meters of free-space terahertz generation, propagation, and detection has been demonstrated by using this transceiver. Terahertz tomographic imaging using an electro-optic transceiver is illustrated by using a razor pasted on a metal
Figure 19. Terahertz image based on the phase difference in a currency watermark structure. A phase shift as small as a few femtoseconds can be resolved.
Figure 21. Terahertz waveforms reflected from (a) the metal handle of a razor, (b) the razor surface, and (c) the metal mirror.
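In this reflective geometry, a difference in peak timing maps directly to a depth difference, since a round trip adds 2d/c of delay. A small sketch (illustrative numbers, not the article's measurements):

```python
# Convert a peak-timing difference between two reflective layers into a depth
# difference, per the round-trip relation d = c * dt / 2.
C_LIGHT = 2.99792458e8   # speed of light, m/s

def depth_difference_mm(dt_ps):
    """Depth separation, in mm, for a round-trip timing difference in ps."""
    return C_LIGHT * dt_ps * 1e-12 / 2.0 * 1e3

# e.g., two peaks separated by ~6.67 ps are ~1 mm apart in depth
```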
mirror. There are three different reflective metal layers in this sample; the first is the metal handle of the razor, the second is the razor surface, and the third is the metal mirror. Figure 21 shows the terahertz waveforms reflected from these three different layers; the timing differences in the peak intensity spatially separate these layers, which can be used to construct a three-dimensional tomographic image of a razor, as shown in Fig. 22. Using the same imaging system, terahertz tomographic images of a quarter dollar and a 50-pence piece are shown in Fig. 23. The image contrast is limited by the terahertz beam focal size and the flatness of the background metal surface. The width of the short timing window is determined by the degree of ‘‘unflatness’’ of the target. If two images are from two different reflective layers and their spatial separation (depth) is large enough, the images can be displayed in this fashion at two different timing positions; the timing difference is proportional to the depth difference between the two layers. Three-dimensional terahertz imaging can still be realized
Figure 20. Schematic experimental setup of an electro-optic terahertz transceiver. The terahertz signal is generated and detected by the same ZnTe crystal.
Figure 22. Terahertz tomographic image of a razor; the gray level represents the timing of the peak intensity.
without displaying the image in terms of the timing of peak intensity.

TERAHERTZ WAVE NEAR-FIELD IMAGING

The terahertz wave near-field imaging technique can greatly improve the spatial resolution of a terahertz wave sensing and imaging system (32). Dr. Klass Wynne
Figure 23. Terahertz images of (a) a quarter dollar and (b) a fifty-pence piece; the gray level represents the peak intensity within a certain temporal window.
in the United Kingdom has demonstrated 110-µm and 232-µm spatial resolution for λ = 125 µm and λ = 1 mm, respectively (33). The improvement factor is about 2 to 4. O. Mitrofanov and John Federici at the New Jersey Institute of Technology and Bell Laboratory reported the use of collection-mode near-field imaging to improve spatial resolution (34–36). The best result reported is 7-µm imaging resolution using 0.5-THz pulses. This is about 1/100 of the wavelength. The limitation of such a system is the low throughput of the terahertz wave past the emitter tip; the transmitted terahertz field is inversely proportional to the third power of the aperture size of the emitter tip. A newly developed dynamic-aperture method that introduces a third gating beam can image objects at a
subwavelength resolution (λ/100); however, the drawback of this method is the difficulty in coating a gating material on the surface of biomedical samples such as cells and tissues (37). Dr. Wynne’s method (the use of an electro-optic crystal as a near-field emitter) led to the development of the terahertz wave microscope. Biomedical samples are mounted directly on the surface of the crystal. Figure 24 shows the terahertz wave near-field microscope for 2D microscopic imaging. In this case, terahertz waves are generated in the crystal by optical rectification and are detected by a terahertz wave detector crystal by the electro-optic effect. The spatial resolution is limited only by the optical focal size of the laser on the crystal (less than 1 µm due to the large refractive index of 2.8 for ZnTe) under moderate optical power, and it is independent of the wavelength of the terahertz wave. A coated thin ZnTe plate (antireflective coating for the bottom surface and highly-reflective coating for the top surface) is placed at the focal plane of the microscope as the terahertz wave emitter. The coating prevents optical loss in the crystal and leakage of the optical beam into the tissue sample. The tissue can be monitored by the optical microscope. A laser beam is guided from the bottom of the microscope into the terahertz emitter. Terahertz waves generated by the emitter can be detected in the transmitted mode (a terahertz wave sensor is mounted on top of the microscope) and/or reflection mode (transceiver). The emitter, the sample, or the terahertz beam can be scanned laterally to obtain a 2-D image. Submicron spatial resolution is expected, even though the imaging wavelength is about 300 µm at 1 THz. In the transmitted mode shown in Fig. 25, a separate ZnTe sensor crystal (terahertz detector) is required, and a probe beam is required to sample the terahertz wave in the sensor crystal. 
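The focal-spot estimates in this discussion follow the standard diffraction limit, a ≈ 1.22λ/NA, reduced by the refractive index n of the medium being focused into. A small sketch (the function name and defaults are ours):

```python
# Diffraction-limited focal-spot estimate: a ~ 1.22 * lambda / NA in air,
# reduced by the refractive index n when focusing inside a crystal.
def spot_size_um(wavelength_um, na=1.0, n=1.0):
    """Diffraction-limited focal spot, in micrometers."""
    return 1.22 * wavelength_um / (na * n)

air = spot_size_um(0.8)            # ~0.98 um for a 0.8-um beam at NA = 1
znte = spot_size_um(0.8, n=2.8)    # ~0.35 um inside ZnTe (n = 2.8)
```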
Figure 24. The concept of 2-D near-field inverted terahertz wave microscope imaging (left) and the schematic of a terahertz wave microscope system (right). A tissue sample is placed on top of a terahertz wave emitter.
Figure 25. The T ray is generated and detected in one ZnTe crystal in the reflected geometry. The T-ray imaging spot on the tissue is comparable to the focal spot of the optical beam.

This construction also applies the concept of the terahertz wave transceiver, which combines the emitter and receiver in one crystal in the near-field range, as shown in Fig. 26. Both transmitted and reflected terahertz wave microscopic images can therefore be obtained from the proposed system. When a Ti:sapphire laser (λ = 0.8 µm) is used as the optical source, the smallest optical focal spot
a in air is calculated from the standard equation a = 1.22λ(2f/D) (the 1.22 factor comes from the diffraction limit under the Gaussian beam approximation), where λ is the wavelength, f is the focal length, D is the beam diameter, and D/2f is the numerical aperture NA of the microscope's objective lens. Assuming the ideal case where NA = 1, then a ≈ 1 µm. A possible way of achieving submicron lateral resolution is to focus the optical beam into a high refractive index medium. The refractive index of ZnTe is greater than 1; therefore, the focal spot in ZnTe must be smaller than that in air by the factor of the refractive index value. However, when directly focusing a laser beam from air into a ZnTe plate, as shown in Fig. 25, it is difficult to achieve a much smaller focal spot because of the change in the numerical aperture after optical refraction at the ZnTe interface by Snell's law. This can be improved by using a high-index hemispherical lens, as shown in Fig. 26. The numerical apertures of the first focal lens and the hemispherical lens must be identical. A thin ZnTe plate is placed on the top of the hemispherical lens, which has the same refractive index as that of ZnTe (n = 2.8). The optical beam is focused in the ZnTe through the index-matched lens to a spot size comparable to 1.22λ/n (assuming NA = 1). If λ = 0.8 µm and n = 2.8, in theory, a focal spot can be as small as 0.35 µm. By using a shorter optical wavelength, such as the second-harmonic wave from a Ti:sapphire laser, a smaller focal spot is expected. An electro-optic terahertz transceiver can be used in the reflective mode of the near-field terahertz wave microscope. In this case, the terahertz wave is generated and detected at the same focal spot within a thin crystal (ZnTe). The target sample (biomedical tissue) is placed on top of the crystal (terahertz transceiver). The measured area of the tissue is comparable to the optical focal size. Due to the intense power density at an optical focal spot (micron or submicron), higher order nonlinear phenomena other than optical rectification have to be considered. Some effects may limit T-ray generation and detection. For example, two-photon absorption (a third-order nonlinear optical effect) in ZnTe generates free carriers. In a tight focal spot, extremely high free-carrier density changes the ZnTe local conductivity, screens the T rays, and saturates the T-ray field. A possible solution is to reduce the optical peak power and increase the pulse repetition rate. This method can maintain the same average power.

Figure 26. The T ray is generated and detected in one ZnTe crystal in the reflected geometry.

CONCLUSION
The terahertz band occupies an extremely broad spectral range between the infrared and microwave bands. However, due to the lack of efficient terahertz emitters and sensors, far less is known about spectra in the terahertz band than in the rest of the electromagnetic spectrum. Recently developed photoconductive antennas and free-space electro-optic sampling provide measurement sensitivity several orders of magnitude better than conventional bolometer detection, but this is still far from the detection resolution achieved in other frequency bands. The development of instruments impacts physics and basic science; recent examples are the scanning tunneling microscope and the near-field optical microscope, each of which opened new fields to the physics community.

Terahertz System for Spectroscopy

The powerful capabilities of the time-domain terahertz spectroscopic technique result from the ability to provide 20-cm diameter real-time images at a variable frame rate up to 2,000 frames/second and the ability to image moving objects, turbulent flows, and explosions noninvasively. Furthermore, the imaging system will also have subwavelength spatial resolution (1/1,000 λ), 50-femtosecond temporal resolution, and sub-mV/cm field sensitivity, and it will be capable of single-shot measurements.

New Terahertz Sources

New terahertz beam sources emphasize tunable narrow-band terahertz lasers using novel semiconductor structures. For example, a terahertz laser was recently developed using p-type Ge, which operates at the temperature of liquid nitrogen. In this laser, a novel unipolar-type
population inversion is realized by the streaming motion of carriers in the semiconductor. It may be possible to improve these solid-state terahertz laser sources further by using strained SiGe.

Terahertz BioChip and Spectrometer

The detection of nanolayers using terahertz techniques is a challenge that cannot be solved by conventional transmission techniques. The fact that the thickness of the layer is orders of magnitude smaller than the wavelength of the terahertz radiation leads to layer-specific signatures that are so small that they are beyond any reasonable detection limit. An alternative method for detecting nanolayers is grating couplers. Evanescent waves that travel on the grating enlarge the interactive length between the nanolayers and the terahertz radiation from nanometers to several tens of micrometers.

Quantum Terahertz Biocavity Spectroscopy

The concept is to design and fabricate photonic band-gap structures in the terahertz regime and place materials such as DNA, biological and chemical agents, or quantum dots in the active region of the cavity. This configuration will be useful for enhanced absorption as well as enhanced spontaneous emission spectroscopy. Previous research has indicated that DNA (and other cellular materials) possess large numbers of unique resonances due to localized phonon modes that arise from DNA base-pair interactions, which are absent from far-infrared data.
than relying on transit and/or tunneling phenomena for individual electrons. Just as sound waves propagating in air achieve velocities that are orders of magnitude higher than the velocities of individual molecules propagating from the sound source to a sound detector, such as the human ear, electron plasma waves can propagate at much higher velocities and can generate and detect terahertz radiation. A preliminary theoretical foundation of this approach to terahertz generation and detection has already been established, and the first experimental results have been obtained for detecting and generating terahertz radiation by a two-dimensional electron gas.

Acknowledgments

This work was supported by the U.S. Army Research Office and the U.S. National Science Foundation.
ABBREVIATIONS AND ACRONYMS

THz: terahertz
GaAs: gallium arsenide
ZnTe: zinc telluride
2-D: two-dimensional
IR: infrared
DNA: deoxyribonucleic acid
RNA: ribonucleic acid
CCD: charge-coupled device
ps: picosecond
fs: femtosecond
Terahertz Molecular Biology Spectroscopy

Understanding how genes are regulated is one of the grand challenges in molecular biology. Recent reports indicate that pulsed terahertz spectroscopy provides a new handle on the interactions of biopolymers. The use of pulsed terahertz spectroscopy in the study of the binding of transcription factors to their cognate DNA binding sites will be explored.

Near-Field Terahertz Imaging

The current near-field terahertz imaging system has a spatial resolution of 1/50 λ. The microscopic imaging system will achieve submicron spatial resolution.

BIBLIOGRAPHY

WAVELET TRANSFORMS

ψa,b(t) ≜ (1/√a) ψ((t − b)/a),

where a > 0 and b are real-valued and the symbol ≜ stands for ‘‘defined as.’’ The function ψa,b(t) is a mother wavelet ψ(t) dilated by a factor a and shifted in time by an amount b.
Figure 2. Wavelet dilations (the Mexican hat wavelet ψa,0(t) at a = 1.5 and a = 0.5).
Figure 3. Chirp function sin(0.4πt²).
Figure 4. The Mexican hat wavelet.
Figure 5. Scalogram of chirp. See color insert.
(shown in Fig. 4),

ψ(t) = (1 − 2t²) exp[−t²],    (6)

is displayed in Fig. 5.

The scale variable bears an inverse relationship to frequency. As dilation increases, the zero crossing rate of the daughter wavelet decreases. Therefore, the wavelet transform at large values of a provides information about the lower frequency spectrum of the analyzed signal. Conversely, information contained at the higher end of the frequency spectrum is obtained for very small values of a. Thus, as seen in Fig. 5, the scalogram peaks occupy continually lower positions along the scale axis, as time progresses. This corresponds to increasing frequency of the chirp with time, as seen in Fig. 3. An elegant expression exists for the inverse wavelet transform if a wavelet satisfies the admissibility condition. Suppose that

Ψ(ω) = ∫_{−∞}^{∞} ψ(t) exp(−iωt) dt    (7)

is the Fourier transform of a wavelet ψ(t). Then, the wavelet is said to satisfy the admissibility condition if

C ≜ ∫_{−∞}^{∞} (|Ψ(ω)|²/|ω|) dω < ∞.    (8)

When the admissibility condition is satisfied, the inverse continuous wavelet transform is given by

s(t) = (1/C) ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/|a|²) S(a, b) ψa,b(t) da db.    (9)
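A brute-force numerical sketch (ours) of the continuous wavelet transform, using the Mexican hat wavelet of Eq. (6) on the chirp of Fig. 3; a practical implementation would use FFT-based convolution instead of direct integration:

```python
# Continuous wavelet transform by direct numerical integration:
# S(a, b) = (1/sqrt(a)) * integral of s(t) * psi((t - b)/a) dt.
import numpy as np

def mexican_hat(t):
    """Mexican hat wavelet, Eq. (6)."""
    return (1.0 - 2.0 * t**2) * np.exp(-t**2)

t = np.linspace(0.0, 5.0, 1001)
dt = t[1] - t[0]
signal = np.sin(0.4 * np.pi * t**2)          # chirp of Fig. 3

scales = np.linspace(0.1, 1.0, 46)
scalogram = np.empty((scales.size, t.size))
for i, a in enumerate(scales):
    for j, b in enumerate(t):
        scalogram[i, j] = np.sum(signal * mexican_hat((t - b) / a)) * dt / np.sqrt(a)
```

As the text observes, the ridge of |S(a, b)| drifts toward smaller scales as time progresses, tracking the rising instantaneous frequency of the chirp.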
DISCRETE WAVELET TRANSFORM

It is possible to find mother wavelets so that one can synthesize functions from a set of their daughter wavelets whose dilation and shift parameters are indexed by the set of integers. In particular, dilations and translations from a dyadic set,

ψa,b(t) : a = 2^k, b = 2^k i, i and k integers,    (10)

provide the daughter wavelets.

Consider finite-norm functions φ(t) and φ̃(t) that have nonvanishing integrals satisfying

φ(t) = 2 Σ_k h(k) φ(2t − k),    (11)

φ̃(t) = 2 Σ_k h̃(k) φ̃(2t − k),    (12)

and

⟨φ(t), φ̃(t − k)⟩ = δ(k),    (13)

where δ(k) ≜ 1 for k = 0 and 0 otherwise.

The functions φ(t) and φ̃(t) are called scaling functions. Equations (11) and (12), called the dilation equations or two-scale relations, indicate that the scaling functions are generated as linear combinations of their own dyadic shifts and dilations. Equation (13) imposes orthogonality between one scaling function and integral translations of the other. Two conditions that follow from Eqs. (11)–(13) are

Σ_k h(k) = Σ_k h̃(k) = 1    (14)

and

Σ_k h(k) h̃(k + 2n) = δ(n)/2.    (15)

Defining

H(ω) ≜ Σ_n h(n) exp(−iωn),  H̃(ω) ≜ Σ_n h̃(n) exp(−iωn),    (16)

in the frequency domain, Eqs. (14) and (15) become

H(0) = H̃(0) = 1,    (17)

H(ω) H̃*(ω) + H(ω + π) H̃*(ω + π) = 1.    (18)

If the sequences h(·) and h̃(·) are viewed as impulse responses of discrete-time filters, then H(ω) and H̃(ω) are their frequency responses. It follows from Eqs. (17) and (18) that the product H(π) H̃*(π) = 0. Although this condition is satisfied by letting just one of the frequency responses be zero at a frequency of π radians, the interesting and practical cases require that both frequency responses be zero at π radians. Consequently, the filters assume low-pass characteristics. Wavelets are constructed from scaling functions by forming sequences g(n) and g̃(n) as

g(n) = (−1)^{1−n} h̃(1 − n),  g̃(n) = (−1)^{1−n} h(1 − n),    (19)

and then using the two-scale relations given by Eqs. (11) and (12):

ψ(t) = 2 Σ_k g(k) φ(2t − k),  ψ̃(t) = 2 Σ_k g̃(k) φ̃(2t − k).    (20)

The resulting wavelet ψ(t) is orthogonal to φ̃(t) and its integer translates. Similarly, the wavelet ψ̃(t) is orthogonal to φ(t) and its integer translates. The wavelet pair ψ(t) and ψ̃(t) are said to be duals of each other. Along with their dyadic dilations and translations, they form a biorthogonal basis for the space of square-integrable functions. Given a square-integrable function f(t),

f(t) = Σ_k Σ_i a_{k,i} ψ_{2^k, 2^k i}(t),    (21)

where

a_{k,i} = ⟨f(t), ψ̃_{2^k, 2^k i}(t)⟩.    (22)

The special case of an orthogonal wavelet basis arises when the filter coefficients and scaling functions are their own biorthogonal duals, that is, h(·) = h̃(·) and φ(t) = φ̃(t). This results in a single wavelet ψ(t) that is orthogonal to its own integer translates, as well as all of its dyadic dilations. The earliest known orthogonal wavelet is the Haar wavelet, which appeared in the literature several decades before wavelet transforms (5) and is given by

ψ(t) = 1 for 0 ≤ t < 1/2;  −1 for 1/2 ≤ t < 1;  0 otherwise.    (23)

The Haar wavelet is a special case of an important class of orthogonal wavelets due to Daubechies (6). The coefficient sequence h(k) in the dilation equation for this class of wavelets consists of a finite and even number of coefficients. The resulting scaling function and wavelet are
WAVELET TRANSFORMS
lim Vk = L2 (R),
Scaling function
k→−∞
1.4
∞
Vk = 0(t),
1.2
1447
(26) (27)
k=−∞
1
where 0(t) is the function whose value is identically zero everywhere. The properties given by Eqs. (24)–(27) make the set of vector spaces {Vk : k = 0, ±1, ±2, . . .}, a multiresolution analysis (MRA). Obviously, an MRA is also generated by {V˜ k : k = 0, ±1, ±2, . . .} where V˜ k is the vector space ˜ −k t − ) : integer}. spanned by {φ(2 Suppose that xk (t) is some function in the vector space Vk . Expanded in terms of the basis functions of Vk .
0.8 0.6 0.4 0.2 0
−0.2 −0.4
0
0.5
1
1.5
2
2.5
3
xk (t) =
(28)
n
Time
where, (by Eq. (13)),
Wavelet 2
˜ −k t − n) . x(k, n) = xk (t), 2−k φ(2
1.5
Notice how the coefficients of the expansion are generated by projecting xk (t) not onto Vk but rather onto V˜ k . Now, suppose that we have a square-integrable signal f (t) that is not necessarily contained in any of the vector spaces Vk for a finite k. Approximations to this signal in each vector space of the MRA can be obtained as projections of the f (t) on these vector spaces as follows. First, inner products fk,n are formed as ˜ −k t − n) . fk,n = f (t), 2−k φ(2 (29)
1 0.5 0 −0.5 −1 −1.5
x(k, n)φ (2−k t − n)
0
0.5
1
1.5
2
2.5
3
Time Figure 6. Daubechies four-tap filter. Top: scaling function. Bottom: wavelet.
Then, the approximation fk (t) of f (t) in Vk , referred to as the approximation at level k, is constructed as fk (t) =
fk,n φ (2−k t − n).
(30)
n
compactly supported. A particularly famous√example1 is one involving four coefficients, h(0) = (1 + 3/8), √ √ h(1) = √ (3 + 3/8), h(2) = (3 − 3/8), and h(3) = (1 − 3/8). The corresponding scaling function and wavelet are shown in Fig. 6. MULTIRESOLUTION ANALYSIS AND DIGITAL FILTERING IMPLEMENTATION Let Vk denote the vector space spanned by the set {φ (2−k t − ) : } integer for an integer k. By virtue of Eq. (11), the vector spaces display the nesting . . . ⊂ V1 ⊂ V0 ⊂ V−1 .
(24)
1
gk (t) = fk−1 (t) − fk (t).
(31)
The detail represents the information lost in going from one level of approximation to the next coarser level and can be represented as a wavelet expansion at dilation 2k as gk (t) =
ak,i ψ2k ,2k i (t),
(32)
i
There are other interesting properties: x(t) ∈ Vk ⇔ x(2t) ∈ Vk−1 ,
For any k, fk (t) is a coarser approximation to the signal f (t) than fk−1 (t). Thus, there is a hierarchy of approximations to the function f (t). As k decreases, we get increasingly finer approximations. Hence, the term multiresolution analysis. The detail function gk (t) at level k is defined as the difference between the approximation at that level and the next finer level
(25)
The Haar wavelet has two coefficients h(0) = 1/2 and h(1) = 1/2.
where the coefficients ak,i are exactly as in Eq. (22). Because ∞ gk (t), (33) fk (t) = j=k+1
1448
WAVELET TRANSFORMS
as can be seen by repeated application of Eq. (31), and f (t) = lim fk (t),
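The filter conditions above are easy to check numerically. The following Python sketch (an illustration, not part of the original article) verifies Eqs. (14) and (15) for the Haar and Daubechies four-tap coefficients and then performs one level of the corresponding filter-bank split of a discrete signal into approximation and detail coefficients and back, confirming perfect reconstruction:

```python
import numpy as np

sqrt3 = np.sqrt(3.0)

# Filters in the article's normalization, where sum_k h(k) = 1 (Eq. 14).
haar = np.array([0.5, 0.5])
daub4 = np.array([1 + sqrt3, 3 + sqrt3, 3 - sqrt3, 1 - sqrt3]) / 8.0

def check_conditions(h):
    """Check Eqs. (14) and (15) for an orthogonal filter (h-tilde = h)."""
    assert np.isclose(h.sum(), 1.0)                  # Eq. (14)
    assert np.isclose(np.dot(h, h), 0.5)             # Eq. (15), n = 0
    for n in range(1, len(h) // 2):                  # Eq. (15), n != 0
        assert np.isclose(np.dot(h[:-2 * n], h[2 * n:]), 0.0)

check_conditions(haar)
check_conditions(daub4)

# Eq. (19) in the orthogonal Haar case: g(0) = -h(1), g(1) = h(0).
g = np.array([-haar[1], haar[0]])

def analyze(f0, h, g):
    """Filter by 2h(-n) and 2g(-n), keep every other sample (decimation)."""
    pairs = f0.reshape(-1, 2)
    return 2 * pairs @ h, 2 * pairs @ g

def synthesize(a1, d1, h, g):
    """Upsample by two, filter by h(n) and g(n), and add (interpolation)."""
    return (np.outer(a1, h) + np.outer(d1, g)).ravel()

f0 = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 0.0, 3.0, 3.0])
a1, d1 = analyze(f0, haar, g)
assert np.allclose(synthesize(a1, d1, haar, g), f0)  # perfect reconstruction
```

The pairwise shortcut in `analyze` works only for the two-tap Haar filter; for the four-tap filter the convolutions overlap neighboring samples and a full filtering routine would be needed.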
Eqs. (32)–(34) lead to the result in Eq. (21).

There is a simple digital filtering scheme available to determine the approximation and detail coefficients, once the approximation coefficients are known at a given resolution. Suppose that we have determined the coefficients f_{0,n} at level 0. Then, the next coarser or lower² level approximation coefficients f_{1,n} are found by passing the sequence f_{0,n} through a digital filter whose impulse response is given by the sequence 2h̃(−n) and then retaining every other sample of the output.³ Here h̃(n) is the coefficient sequence in the dilation equation (12). Similarly, the detail coefficients a_{1,n} are found by passing the sequence f_{0,n} through a digital filter whose impulse response is given by the sequence 2g̃(−n) and once again retaining every other sample of the filter output; g̃(n) is the coefficient sequence in Eq. (19). The block diagram in Fig. 7 illustrates these operations. The circle containing the down arrow represents downsampling by two. Given an input sequence x(n), this block generates the output sequence x(2n); that is, it retains only the even-indexed samples of the input. The combination of filtering and downsampling is called decimation.

² Lower because of the lower resolution in the approximation. Thus, approximation levels go lower as k gets higher.
³ The reader unfamiliar with digital filtering terminology may refer to any of a number of textbooks, for example, Mitra (7).

Figure 7. Block diagram of decimation.

It is also possible to obtain the approximation coefficients at a finer level from the approximation and detail coefficients at the next lower level of resolution by the digital filtering operation of interpolation, for which the block diagram is shown in Fig. 8. The circle containing the up arrow represents upsampling by a factor of 2. For an input sequence x(n), this operation results in an output sequence that has the value zero for odd n and the value x(n/2) for even n. The process of upsampling followed by filtering is called interpolation.

Figure 8. Block diagram of interpolation.

In most signal processing applications, where the data samples are in discrete time, wavelet decomposition has come to mean the filtering of the input signal at multiple stages of the arrangement in Fig. 7. An N-level decomposition uses N such stages, yielding one set of approximation coefficients and N sets of detail coefficients. Wavelet reconstruction means the processing of the approximation and detail coefficients of a decomposition through multiple stages of the arrangement in Fig. 8. A
block diagram of a two-stage decomposition-reconstruction scheme is shown in Fig. 9.

Application to Image Processing

Wavelet applications in image processing are based on exploiting the localization properties of the wavelet transform in space and spatial frequency. Noise removal, or what has come to be known as image denoising, is a popular application of the wavelet transform, as is image compression. Other types of two-dimensional (2-D) wavelet constructs are possible, but most applications involve separable wavelet basis functions that are relatively straightforward extensions of 1-D basis functions along the two image axes. For an orthogonal system, these basis functions are

ψa(x, y) = ψ(x)φ(y),  ψb(x, y) = ψ(y)φ(x),  ψc(x, y) = ψ(x)ψ(y).    (35)

The scaling function

φd(x, y) = φ(x)φ(y)    (36)

also comes in handy when the image is transformed or decomposed across a small number of scales. Because images are of finite extent, there are a finite number of coefficients associated with the 2-D wavelet expansion on any dyadic scale. The number of coefficients on a given scale is one-quarter the number of coefficients on the next finer scale. This permits arranging the wavelet coefficients in pyramidal form, as shown in Fig. 10. The top left corner of the transform in an N-level decomposition is a projection of the image on φd(x, y) at a dilation of 2^N and is called the low-resolution component of the wavelet decomposition. The other coefficients are projections on the wavelets ψa(x, y), ψb(x, y), and ψc(x, y) on various scales and are called the detail coefficients. At each level, the coefficients with respect to these three wavelets are seen in the bottom-left, top-right, and bottom-right sections, respectively. As can be seen from the figure, the detail coefficients retain edge-related information in the input image. Wavelet expansions converge faster around edges than Fourier or discrete cosine expansions, a fact exploited in compression and denoising applications. Most types of image noise contribute to wavelet expansions principally in the high-frequency detail coefficients. Thus, wavelet transformation followed by a suitable threshold zeroes
out many of these coefficients. A subsequent image reconstruction results in image denoising and minimal edge distortion. An example is shown in Fig. 11.

Figure 9. Two-level wavelet decomposition and reconstruction.

Figure 10. Top: image ‘‘Barbara.’’ Bottom: its two-level wavelet transform.

Figure 11. Illustration of image reconstruction using a fraction of its wavelet transform coefficients.

Wavelets in Image Compression
Wavelet transforms have made their impact in the world of image compression, and this section provides insight
into the reason. The fast convergence rate of wavelet expansions is the key to the success of wavelet transforms in image compression. Among known linear transforms, wavelet transforms provide the fastest convergence in the neighborhood of point singularities. Although they do not necessarily provide the fastest convergence along edges, they still converge faster there than the discrete Fourier or discrete cosine transforms (DCT) (8). Consequently, good image reconstruction can be obtained by retaining a small number of wavelet coefficients (9). This is demonstrated in Fig. 11, where a reconstruction is performed using only 10% of the wavelet coefficients. These coefficients were chosen by sorting them in descending order of magnitude and retaining the first 10%. This method of reducing the number of coefficients and, consequently, reducing the number of bits used to represent the data is called zonal sampling (10). Zonal sampling is only one component in achieving high compression rates. Quantization and entropy coding are additional components. An examination of the wavelet transform in Fig. 10 reveals vast areas that are close to zero in value, especially in the detail coefficients. This is typical of wavelet transforms of most natural scenes and is what wavelet transform-based compression algorithms exploit the most. An example of a compression technique that demonstrates this is the Set Partitioning in Hierarchical Trees algorithm due to Said and Pearlman (10,11). Yet another approach is to be found in the FBI fingerprint image compression standard (12), which uses run-length encoding. Most values in the detail regions can be forced to zero by applying a small threshold. Contiguous sections of zeros can then be coded simply as the number of zeros in that section. This increases the compression ratio because we do not have to reserve a certain number of bits for each coefficient that has a value of zero.
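The zonal-sampling idea can be sketched in a few lines. The Python below is illustrative only: it uses a one-level separable Haar transform (rather than the deeper decompositions and more sophisticated coders used in practice), transforms a small image, keeps only the 10% of coefficients that are largest in magnitude, and inverts:

```python
import numpy as np

def step(x):
    """One Haar analysis step along the last axis (orthonormal form)."""
    s = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)  # smooth half
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)  # detail half
    return np.concatenate([s, d], axis=-1)

def istep(y):
    """Invert step()."""
    n = y.shape[-1] // 2
    s, d = y[..., :n], y[..., n:]
    x = np.empty_like(y)
    x[..., 0::2] = (s + d) / np.sqrt(2.0)
    x[..., 1::2] = (s - d) / np.sqrt(2.0)
    return x

def haar2d(img):
    """One-level separable 2-D transform: rows, then columns (Eqs. 35-36)."""
    return step(step(img).swapaxes(-1, -2)).swapaxes(-1, -2)

def ihaar2d(c):
    t = istep(c.swapaxes(-1, -2)).swapaxes(-1, -2)  # undo column step
    return istep(t)                                 # undo row step

rng = np.random.default_rng(1)
img = rng.random((8, 8))
c = haar2d(img)
assert np.allclose(ihaar2d(c), img)  # the transform is invertible

# Zonal sampling: keep only the 10% of coefficients largest in magnitude.
k = max(1, int(0.10 * c.size))
cutoff = np.sort(np.abs(c), axis=None)[-k]
kept = np.where(np.abs(c) >= cutoff, c, 0.0)
approx = ihaar2d(kept)  # approximate reconstruction from 10% of the data
```

A random array has no edge structure to exploit, so the 10% reconstruction here is crude; natural images concentrate energy in far fewer coefficients, which is the point of the technique.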
For example, suppose that we estimate the maximum number of contiguous zeros ever to appear in the wavelet transform of an image at about 32,000. Then, we can represent the number of zeros in a section of contiguous zeros using a 16-bit binary number. Contrast this with the situation where every coefficient, including those that have a value of zero after thresholding, is individually coded by using binary digits. Then, we would require at least one bit per zero-valued coefficient. Thus, a section of 1,000 contiguous zeros would require 1,000 bits to represent it as opposed to just 16 bits. This approach of coding contiguous sections of zeros is called run-length encoding. Run-length encoding is used as part of the FBI’s wavelet transform-based fingerprint image compression scheme. After wavelet transformation of the fingerprint image, the coefficients are quantized and coefficients close to zero are forced to zero. Run-length encoding followed by entropy coding, using a Huffman code, is performed. Details of the scheme can be found in various articles, for example, (12). The advantages of wavelet transform in compression have made it the basis for the new JPEG-2000 image compression standard. Information regarding this standard may be found elsewhere in the encyclopedia.
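A minimal sketch of the run-length idea follows (Python, illustrative only; the FBI standard's actual symbol formats and codebooks differ). Near-zero coefficients are forced to zero, and each run of zeros is replaced by a single count that fits in 16 bits, as in the text's example:

```python
def rle_zero_runs(coeffs, threshold=0.05):
    """Force near-zero values to zero, then code each run of zeros as a
    single ('ZERO_RUN', count) symbol with a 16-bit count."""
    out = []
    run = 0
    for c in coeffs:
        c = 0 if abs(c) < threshold else c
        if c == 0:
            run += 1
            if run == 0xFFFF:            # 16-bit counter is full: flush it
                out.append(("ZERO_RUN", run))
                run = 0
        else:
            if run:                      # close out any pending zero run
                out.append(("ZERO_RUN", run))
                run = 0
            out.append(("VALUE", c))
    if run:
        out.append(("ZERO_RUN", run))
    return out

data = [0.9, 0.01, 0.0, 0.02, 0.7, 0.0, 0.0, 0.0, 0.0, 0.3]
print(rle_zero_runs(data))
# [('VALUE', 0.9), ('ZERO_RUN', 3), ('VALUE', 0.7), ('ZERO_RUN', 4), ('VALUE', 0.3)]
```

In a complete coder the symbols would then be entropy coded (e.g., Huffman coded), as the text describes for the fingerprint scheme.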
BIBLIOGRAPHY

1. A. Grossmann and J. Morlet, SIAM J. Math. Anal. 15, 723–736 (1984).
2. J. Morlet, Proc. 51st Annu. Meet. Soc. Exploration Geophys., Los Angeles, 1981.
3. C. W. Helstrom, IEEE Trans. Inf. Theory 12, 81–82 (1966).
4. I. Daubechies, Proc. IEEE 84(4), 510–513 (1996).
5. A. Haar, Math. Annal. 69, 331–371 (1910).
6. I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
7. S. K. Mitra, Digital Signal Processing: A Computer-Based Approach, McGraw-Hill Irwin, Boston, 2001.
8. D. L. Donoho and M. R. Duncan, Proc. SPIE 4056, 12–30 (2000).
9. R. M. Rao and A. S. Bopardikar, Wavelet Transforms: Introduction to Theory and Applications, Addison-Wesley Longman, Reading, MA, 1998.
10. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice-Hall, Englewood Cliffs, NJ, 1984.
11. A. Said and W. A. Pearlman, IEEE Trans. Circuits Syst. Video Technol. 6(3), 243–250 (1996).
12. C. M. Brislawn, J. N. Bradley, R. J. Onyshczak, and T. Hopper, Proc. SPIE 2847, 344–355 (1996).
WEATHER RADAR

ROBERT M. RAUBER
University of Illinois at Urbana-Champaign
Urbana, IL

This article contains a brief overview of the history of meteorological radars, presents the operating principles of these radars, and explains important applications of radar in the atmospheric sciences.

Meteorological radars transmit short pulses of electromagnetic radiation at microwave or radio frequencies and detect energy backscattered toward the radar's antenna by scattering elements in the atmosphere. Radiation emitted by radar is scattered by water and ice particles, insects, other objects in the path of the beam, and refractive index heterogeneities in air density and humidity. The returned signal is the combination of radiation backscattered toward the radar by each scattering element within the volume illuminated by a radar pulse. Meteorologists use the amplitude, phase, and polarization state of the backscattered energy to deduce the location and intensity of precipitation, the wind speed in the direction of the radar beam, and precipitation characteristics, such as rain versus hail.

HISTORICAL OVERVIEW

Radar, an acronym for radio detection and ranging, was initially developed to detect aircraft and ships remotely.
During the late 1930s, military radar applications for aircraft detection were adopted by Britain, Germany, and the United States, but these radars were limited to very low frequencies (0.2–0.4 GHz) and low power output. In 1940, the highly secret British invention of the cavity magnetron permitted radars to operate at higher frequencies (3–10 GHz) and high power output, allowing the Allies of World War II to detect aircraft and ships at long ranges. Studies of atmospheric phenomena by radar began almost as soon as the first radars were used. These studies were initiated because weather and atmospheric echoes represented undesirable clutter that hampered detection of military targets. Particularly problematic atmospheric echoes were caused by storms and the anomalous propagation of radar beams. Because of extreme security, studies of these phenomena went unpublished until after the war. A key advance during World War II was the development of the theory relating the magnitude of echo intensity and attenuation to the type and size of drops and ice particles illuminated by the radar beam. Theoretical analyses, based on Mie scattering principles, predicted a host of phenomena that were subsequently observed,
such as the bright band at the melting level and the high reflectivity of very large hailstones. The first weather radar equations were published in 1947. Slightly modified, these equations remain today as the foundation of radar meteorology. The first large weather-related field project for a nonmilitary application, the ‘‘Thunderstorm Project,’’ was organized after the war to study coastal and inland thunderstorms. Data from this and other projects stimulated interest in a national network of weather radars. Enthusiasm for a national network was spurred by efforts to estimate precipitation from radar measurements. The discovery of the ‘‘hook echo’’ and its association with a tornado (Fig. 1) led to widespread optimism that tornadoes may be identified by radar. Following the war, several surplus military radars were adapted for weather observation. Beginning in 1957, these were replaced by the Weather Surveillance Radar (WSR-57), which became the backbone of the U.S. National Weather Service radar network until the WSR-88D Doppler radars were installed three decades later. Radar meteorological research in the decades between 1950 and 1970 focused primarily on studies of the physics of precipitation formation, precipitation measurement,
Figure 1. First published photograph of a thunderstorm hook echo observed on a radarscope. The tornado, which was located at the southern end of the hook, occurred just north of Champaign, IL, on 9 April 1953. (From G. E. Stout and F. A. Huff, Radar records Illinois tornadogenesis, Bulletin of the American Meteorological Society 34, 281–284 (1953). Courtesy of Glenn E. Stout and the American Meteorological Society.)
storm structure, and severe storm monitoring. The first coordinated research flights through large cyclonic storms were conducted in conjunction with weather radar observations. These studies led to investigations of the physics of the ‘‘bright band,’’ a high reflectivity region in stratiform clouds associated with melting particles, ‘‘generating cells,’’ regions near cloud tops from which streams of ice particles appeared, and other cloud physical processes. Films of radarscopes were first used to document the evolution of storms. Doppler radars, which measure the velocity of scattering elements in the direction of the beam, were first developed. The advent of digital technology, the rapid growth in the number of scientists in the field of radar meteorology, and the availability of research radars to the general meteorological community led to dramatic advances in radar meteorological research beginning around 1970. The fundamental change that led the revolution was the proliferation of microprocessors and computers and associated advances in digital technology. A basic problem that hampers radar scientists is the large volume of information generated by radars. For example, a typical pulsed Doppler radar system samples data at rates as high as three million samples per second. This volume of data is sufficiently large that storage for later analysis, even today, is impractical — the data must be processed in real time to reduce its volume and convert it to useful forms. Beginning in the early 1970s, advances in hardware, data storage technology, digital displays, and software algorithms made it possible to collect, process, store, and view data at a rate equal to the rate of data ingest. A key advance was the development of efficient software to process the data stream from Doppler radars. This development occurred at about the same time that the hardware became available to implement it and led to rapid advances in Doppler measurements. 
Doppler radars were soon developed whose antennas rotate in azimuth and elevation so that the full hemisphere around the radar could be observed. A network of Doppler radars was installed throughout the United States in the early 1990s to monitor severe weather. Other countries are also using Doppler radars now for storm monitoring. Mobile, airborne, and spaceborne meteorological research radars were developed in the 1980s and 1990s for specialized applications. Scientists currently use these radars and other types of modern meteorological radar systems to study a wide range of meteorological phenomena.
BASIC OPERATING PRINCIPLES OF RADAR

Radars transmit brief pulses of microwave energy. Each pulse lasts about 1 microsecond, and pulses are separated by a few milliseconds. The distance r to a target, determined from the time interval t between transmission of the microwaves and reception of the echo, is given by

r = ct/2,    (1)

where c is the speed of light.
The pulse repetition frequency determines the maximum unambiguous range across which a radar can detect targets. After a pulse has been transmitted, the radar must wait until echoes from the most distant detectable target of interest return before transmitting the next pulse. Otherwise, echoes of the nth pulse will arrive from distant targets after the (n + 1)-th pulse has been transmitted. The late-arriving information from the nth pulse will then be interpreted as echoes of the (n + 1)-th pulse. Echoes from the distant targets will then be folded back into the observable range and will appear as weak elongated echoes close to the radar. Echoes from distant targets that arrive after the transmission of a subsequent pulse are called second-trip or ghost echoes. The maximum unambiguous range r_max for a radar is given by

r_max = c/(2F),    (2)

where F is the pulse repetition frequency. Depending upon the application, the F chosen is generally from 400–1,500 s⁻¹, leading to r_max between 375 and 100 km. Although a low F is desirable for viewing targets far from the radar, there are other issues that favor a high value of F. Accurate measurement of echo intensity, for example, requires averaging information from a number of pulses from the same volume. The accuracy of the measurement is directly related to the number of samples in the average, so a high F is desired. In Doppler measurements, F determines the range of velocities observable by the radar; a greater range of velocities can be observed by using a higher F.

The four primary parts of a typical pulsed Doppler radar, a transmitter, antenna, receiver, and display, are contained in the simplified block diagram of a Doppler radar shown in Fig. 2. The transmitter section contains a microwave tube that produces power pulses. Two kinds of transmitter tubes, magnetrons and klystrons, are in general use. The magnetron is an oscillator tube whose frequency is determined mainly by its internal structure. The klystron, illustrated in Fig. 2, is a power amplifier. Its microwave frequency is established by mixing lower frequency signals from a stable low-power oscillator, termed the STALO, and a coherent local oscillator, termed the COHO. Microwaves are carried from the klystron to the antenna through waveguide plumbing that is designed to minimize energy loss. The transmitter and receiver normally share a single antenna, a task accomplished by using a fast switch called a duplexer that connects the antenna alternately to the transmitter and the receiver. The size and shape of the antenna determine the shape of the microwave beam. Most meteorological radars use circular-parabolic antennas that form the beam into a narrow cone that has a typical half-power beam width between about 0.8 and 1.5°.
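Equations (1) and (2) can be checked directly. The small Python sketch below (not from the article) reproduces the quoted r_max values for F = 400 and 1,500 s⁻¹:

```python
C = 3.0e8  # speed of light (m/s)

def target_range(echo_delay_s):
    """Eq. (1): range from the round-trip time of an echo."""
    return C * echo_delay_s / 2

def max_unambiguous_range(prf_hz):
    """Eq. (2): maximum unambiguous range for pulse repetition frequency F."""
    return C / (2 * prf_hz)

# The F values quoted in the text, 400 and 1,500 per second:
print(max_unambiguous_range(400) / 1000)   # 375.0 km
print(max_unambiguous_range(1500) / 1000)  # 100.0 km

# An echo arriving 1 ms after transmission comes from a target at
print(target_range(1e-3) / 1000)           # 150.0 km
```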
Radar antennas typically can be rotated in both azimuth and elevation, so that the entire hemisphere around the radar can be observed. Wind profiler radars employ ‘‘phased-array’’ antennas, whose beam is scanned by electronic means rather than by moving the antenna. Side lobes, peaks of energy transmission that occur outside the main beam, as shown in Fig. 2, complicate the radiation pattern.
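The article does not give the quantitative link between antenna size and beam width. A standard diffraction estimate for a circular aperture, θ ≈ kλ/D with k ≈ 1.22 (an assumed textbook value, not a figure from this text), suggests why a 10-cm-wavelength radar needs a dish several meters across to achieve a beam width of about 1°:

```python
import math

def half_power_beamwidth_deg(wavelength_m, dish_diameter_m, k=1.22):
    """Approximate beam width of a circular aperture. The factor k
    depends on the antenna illumination taper; 1.22 is the classical
    uniform-illumination value and is an assumption here."""
    return math.degrees(k * wavelength_m / dish_diameter_m)

# A 10-cm wavelength and a 7-m parabolic dish give a beam width of
# roughly one degree, inside the 0.8-1.5 degree range quoted above.
print(round(half_power_beamwidth_deg(0.10, 7.0), 2))
```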
Figure 2. Simplified block diagram of a radar showing the key hardware components, the antenna radiation pattern, and a microwave frequency pulse.
Targets that lie off the beam axis may be illuminated by power transmitted into the side lobes. Echoes from these targets cannot be distinguished from echoes from targets in the main lobe. Echoes from side lobes introduce confusion and error into radar observations and are undesirable. Other undesirable echoes occur when the side lobes, or the main lobe, strike ground targets such as trees. The echoes from these objects, called ground clutter, sometimes make it difficult to interpret meteorological echoes in the same vicinity.

When the microwaves strike any object in their path, a small part of the energy is reflected or scattered back toward the antenna. The antenna receives backscattered waves from all scatterers in a volume illuminated by a pulse. These waves superimpose to create the received waveform, which passes along the waveguide through the duplexer into the receiver. The power collected by the antenna is small. Whereas a typical peak transmitted power might be a megawatt (10⁶ watts), the typical received power might be only a nanowatt (10⁻⁹ watts).

The receiver first amplifies the signal and then processes it to determine its amplitude, which is used to calculate the radar reflectivity factor. The radar reflectivity factor is proportional to the sum of the sixth powers of the diameters of all of the raindrops in the radar volume and is related to the precipitation intensity. Doppler radar receivers also extract information about the phase of the returned wave, which is used to determine the radial velocity, the velocity of the scatterers in the direction of the radar beam. The radial velocity is related to the component of the wind in the direction of the beam and, when the beam is pointed above the horizontal, to the terminal fall velocity of the particles. Polarization
diversity radars determine polarization information by comparing either the intensity or phase difference between pulses transmitted at different polarization states. This information is used to estimate the shapes and types of particles in clouds. Finally, pulse-to-pulse variations in radial velocity are used to estimate the velocity spectral width, which provides a rough measure of the intensity of turbulence.

Weather radars typically operate at microwave frequencies that range from 2.8–35 GHz (wavelengths of 10.7–0.86 cm). A few research radars operate at frequencies that exceed 35 GHz, but their use is limited because of extreme attenuation of the beam in clouds. Radar wind profilers operate in ultrahigh frequency (UHF; 0.3–3.0 GHz) and very high frequency (VHF; 0.03–0.3 GHz) bands. Meteorologists typically use radar wavelength rather than frequency because wavelengths can be compared more directly to precipitation particle sizes. This comparison is important because radar-derived meteorological quantities such as the radar reflectivity factor are based on Rayleigh scattering theory, which assumes that particles are small relative to the wavelength. The factors that govern the choice of the wavelength include sensitivity, spatial resolution, the nature of the targets (e.g., thunderstorms, cirrus clouds), the effects of attenuation, as well as equipment size, weight, and cost. For shorter wavelengths, higher sensitivity can be achieved with smaller and cheaper radar systems; however, shorter wavelengths suffer severe attenuation in heavy precipitation, which limits their usefulness.

PROPAGATION OF RADAR WAVES THROUGH THE ATMOSPHERE

As electromagnetic pulses propagate outward from a radar antenna and return from a target, they pass through air that contains water vapor and may also contain water drops and ice particles.
Refraction and absorption of the electromagnetic energy by vapor, water, ice, and air affect the determination of both the location and the properties of meteorological targets that comprise the returned signal.

The height of a radar beam transmitted at an angle φ above the earth's surface depends on both the earth's curvature and the refractive properties of the earth's atmosphere. Due to the earth's curvature, a beam propagating away from a radar will progress to ever higher altitudes above the earth's surface. Refraction acts to oppose the increase in beam altitude. Electromagnetic waves propagate through a vacuum at the speed of light, c = 3 × 10⁸ m s⁻¹. When these same waves propagate through air or water drops, they no longer propagate at c but at a slower velocity v that is related to properties of the medium. The refractive index of air, defined as n = c/v, is related to atmospheric dry air density, water vapor density, and temperature. The value of n varies from approximately 1.003 at sea level to 1.000 at the top of the atmosphere, a consequence of the fact that dry air density and water vapor density decrease rapidly as height increases. In average atmospheric conditions, the vertical gradient of n is about −4 × 10⁻⁸ m⁻¹. According to Snell's law of refraction, radar beams that pass through an atmosphere where n decreases with height will bend earthward. To calculate the height of a radar beam, radar meteorologists must consider both the earth's curvature and the vertical profile of n. Normally, the vertical profile of n is estimated from tables that are based on climatologically average conditions for the radar site. Figure 3 shows examples of beam paths for various values of φ. It is obvious from Fig. 3 that the earth's curvature dominates refraction under average conditions because the radar beam's altitude increases as distance from the radar increases. An important consequence of the earth's curvature is that radars cannot detect storms at long distances because the beam will pass over the distant storm tops. For example, a beam pointed at the horizon (φ = 0°) in Fig. 3 attains a height of 9.5 km at a 400-km range.

Any deviation of a radar beam from the standard paths shown in Fig. 3 is termed anomalous propagation. Severe anomalous propagation can occur in the atmosphere when n decreases very rapidly with height. Ideal conditions for severe refraction of microwaves exist when a cool moist layer of air is found underneath a warm dry layer, and temperature increases with altitude through the boundary between the layers. Under these conditions, which often occur along some coastlines (e.g., the U.S. West Coast in summer), beams transmitted at small φ can bend downward and strike the earth's surface. In these cases, echoes from the surface appear on the radar display and cause uncertainty in the interpretation of meteorological echoes.

Figure 3. The height of a radar beam above the earth's surface as a function of range and elevation angle considering earth curvature and standard atmospheric refraction.
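Curves like those in Fig. 3 are commonly reproduced with the 4/3-earth-radius model, in which standard refraction is absorbed into an effective earth radius (the model is an assumption here; the article says only that tables of n are used). A short Python sketch:

```python
import math

R_E = 6371.0        # earth radius (km)
K = 4.0 / 3.0       # effective-earth-radius factor for standard refraction
R_EFF = K * R_E

def beam_height_km(range_km, elev_deg):
    """Beam centerline height above the surface under the 4/3-earth model,
    for a beam transmitted at elevation angle phi over range r."""
    phi = math.radians(elev_deg)
    return math.sqrt(range_km ** 2 + R_EFF ** 2
                     + 2.0 * range_km * R_EFF * math.sin(phi)) - R_EFF

# A horizon-pointing beam (phi = 0 deg) at 400 km range, as in the text:
print(round(beam_height_km(400.0, 0.0), 1))  # about 9.4 km
```

The result, roughly 9.4 km, matches the approximately 9.5 km quoted for Fig. 3.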
Although atmospheric gases, cloud droplets, fog droplets, and snow contribute to attenuation, the most serious attenuation is caused by raindrops and hail. Attenuation depends strongly on wavelength; shorter wavelength radars suffer the most serious attenuation. For example, a 3-cmwavelength radar can suffer echo power losses 100 times that of a 10-cm-wavelength radar in heavy precipitation. This is a particular problem when a second storm lies
along the beam path beyond a closer storm. Two-way attenuation by precipitation in the first storm may make the distant storm appear weak or even invisible on the radar display. Were it not for attenuation, short-wavelength radars would be in general use because of their greater sensitivity, superior angular resolution, small size, and low cost. Instead, radars whose wavelengths are shorter than 5 cm are rarely used. One exception is a class of radars called ''cloud'' radars, which are typically pointed vertically to study the structure of clouds that pass overhead. Aircraft meteorological radars also employ short wavelengths, despite attenuation: on aircraft, weight and space constraints limit antenna size, and the beam-width requirements for meteorological measurements with a small antenna normally limit usable wavelengths to 3 cm or less. Fortunately, radars whose wavelengths are near 10 cm suffer little from attenuation in nearly all meteorological conditions. For this reason, the wavelength chosen for U.S. National Weather Service WSR-88D radars was 10 cm.

In the past, a few research radar systems have been equipped with two radars of different wavelength; two antennas point in the same direction and are mounted on the same pedestal. These dual-wavelength systems have been used for hail detection, improved rainfall measurement, and understanding cloud processes. Dual-wavelength techniques depend on the fact that energy of different wavelengths is attenuated differently as it passes through a field of precipitation particles and is scattered in different ways by both particles and refractive index heterogeneities. Dual-wavelength radars have received less attention since the advent of polarization diversity radars, which provide better techniques for discriminating particle types and estimating rainfall.
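Power ratios such as the factor of 100 quoted above are conventionally expressed in decibels. A one-line helper (illustrative only; the function name is ours):

```python
import math

def power_ratio_db(ratio):
    """Express an echo power ratio in decibels: dB = 10 log10(ratio)."""
    return 10.0 * math.log10(ratio)

# The 100-fold extra two-way loss of a 3-cm radar relative to a
# 10-cm radar corresponds to 20 dB.
print(power_ratio_db(100.0))
```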
THE WEATHER RADAR EQUATION AND THE RADAR REFLECTIVITY FACTOR

Radar is useful in meteorology because the echo power scattered back to the radar by meteorological targets such as raindrops and snowflakes is related, with some caveats, to meteorologically significant quantities such as precipitation intensity. The basis for relating the physical characteristics of the targets to the received echo power is the weather radar equation. The radar range equation for meteorological targets, such as raindrops, is obtained by first determining the radiated power per unit area (the power flux density) incident on the target, next determining the power flux density scattered back toward the radar by the target, and then determining the amount of backscattered power collected by the antenna. For meteorological targets, the radar range equation is given by

$$P_r = \frac{P_t G^2 \lambda^2}{64 \pi^3 r^4}\, V \eta, \qquad (3)$$

where P_r is the average received power, P_t the transmitted power, G the antenna gain, λ the wavelength of transmitted radiation, r the range, V the scattering volume, and η the reflectivity. The volume V illuminated by a pulse is determined from the pulse duration τ, the speed of light c, the range r, and the angular beam width θ, which is normally taken as the angular distance in radians between the half-power points from the beam center (see Fig. 2). For antennas that have a circular beam pattern, the volume is given by

$$V = \frac{\pi c \tau \theta^2 r^2}{8}. \qquad (4)$$
Typically the beam width is about 1° and the pulse duration 1 microsecond, so at a range of 50 km the scattering volume equals about 10⁸ m³. In moderate rain, this volume may contain more than 10¹¹ raindrops. The contributions of each scattering element in the volume add in phase to create the returned signal. The returned signal fluctuates from pulse to pulse as the scattering elements move. For this reason, the returned signals from many pulses must be averaged to determine the average received power.

The radar cross section σ of a spherical water or ice particle whose diameter D is small compared to the wavelength λ is given by the Rayleigh scattering law

$$\sigma = \frac{\pi^5}{\lambda^4}\, |K|^2 D^6, \qquad (5)$$
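The volume estimate quoted above follows directly from Eq. (4). A quick numerical check (a sketch; the function name is ours):

```python
import math

def scattering_volume_m3(range_m, beamwidth_rad, pulse_duration_s, c=3.0e8):
    """Pulse scattering volume V = pi * c * tau * theta^2 * r^2 / 8 (Eq. 4)."""
    return math.pi * c * pulse_duration_s * beamwidth_rad ** 2 * range_m ** 2 / 8.0

# 1-degree beam, 1-microsecond pulse, 50-km range: about 1e8 cubic meters,
# as stated in the text.
v = scattering_volume_m3(50.0e3, math.radians(1.0), 1.0e-6)
print(f"{v:.1e}")
```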
where |K|² is a dimensionless factor that depends on the dielectric properties of the particle and is approximately equal to 0.93 for water and 0.18 for ice at radar wavelengths. The radar reflectivity of clouds and precipitation is obtained by summing the cross sections of all of the particles in the scattering volume and is written as

$$\eta = \frac{\pi^5}{\lambda^4}\, |K|^2 Z. \qquad (6)$$

The radar reflectivity factor Z is defined as
$$Z = \frac{\sum D^6}{V}, \qquad (7)$$

where the summation is across all of the particles in the scattering volume. The quantity Z is of prime interest to meteorologists because it relates the diameter of the targets (e.g., raindrops), and therefore the raindrop size distribution, to the power received at the radar. The radar equation for meteorological targets is obtained by combining Eqs. (3), (4), and (6) and solving for Z to obtain

$$Z = \frac{512\,(2 \ln 2)}{\pi^3 c}\, \frac{\lambda^2}{P_t \tau G^2 \theta^2}\, \frac{r^2 P_r}{|K|^2}. \qquad (8)$$
The term (2 ln 2) in the numerator was added to account for the fact that most antenna systems are designed for tapered, rather than uniform, illumination to reduce the effects of side lobes. Equation (8) is valid provided that the raindrops and ice particles illuminated by the radar beam satisfy the Rayleigh criterion. For long-wavelength radars (e.g., 10 cm), the Rayleigh criterion holds for all particles except large hail. However, for shorter wavelength radars, the Rayleigh criterion is sometimes violated. Equation (8) also assumes a single value of |K|. The true value of Z will not be measured for radar volumes that contain both water and ice particles, or in volumes where water is assumed to exist but ice is actually present. Finally, Equation (8) was derived under the assumption that attenuation can be neglected. As pointed out earlier, this assumption is reasonable only for longer wavelength radars. When one or more of the assumptions used to derive Eq. (8) are invalid, the measured quantity is often termed the equivalent radar reflectivity factor (Ze). Most of these problems are minimized by selecting radars that have long (e.g., 10 cm) wavelengths.

It is customary to use m³ as the unit for volume and to measure particle diameters in millimeters, so Z has conventional units of mm⁶/m³. Typical values of the radar reflectivity factor range from 10⁻⁵ mm⁶/m³ to 10 mm⁶/m³ in nonprecipitating clouds, 10 to 10⁶ mm⁶/m³ in rain, and as high as 10⁷ mm⁶/m³ in large hail. Because of the sixth-power weighting on diameter in Eq. (5), raindrops dominate the returned signal in a mixture of rain and cloud droplets. Because Z varies over orders of magnitude, a logarithmic scale, defined as
$$\mathrm{dBZ} = 10 \log_{10}\!\left(\frac{Z}{1\ \mathrm{mm^6/m^3}}\right), \qquad (9)$$
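Eq. (9) is a simple unit conversion. A sketch of both directions (the function names are ours):

```python
import math

def to_dbz(z):
    """Convert Z in mm^6/m^3 to logarithmic units (Eq. 9)."""
    return 10.0 * math.log10(z)

def from_dbz(dbz):
    """Invert Eq. (9): recover Z in mm^6/m^3."""
    return 10.0 ** (dbz / 10.0)

# Typical values quoted in the text: 10^6 mm^6/m^3 (heavy rain) is 60 dBZ;
# 10^-5 mm^6/m^3 (nonprecipitating cloud) is -50 dBZ.
print(to_dbz(1.0e6), to_dbz(1.0e-5))
```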
is used to display the radar reflectivity factor. Radar images of weather systems commonly seen in the media (e.g., Fig. 4) show the radar reflectivity factor in logarithmic units. Images of the radar reflectivity factor overlaid on regional maps permit meteorologists to determine the location and intensity of precipitation. Meteorologists often use the terms ''radar reflectivity factor'' and ''radar reflectivity'' interchangeably, although radar experts reserve the term ''reflectivity'' for η. Radar data are typically collected on a cone formed as the beam is swept through 360° of azimuth at a constant angle of elevation. A series of cones taken at several angles of elevation constitutes a radar volume. Images of the radar reflectivity factor from individual radars are typically projected from the conical surface onto a map-like format called the ''plan-position indicator,'' or PPI display,
Figure 4. Plan-position indicator scan of the radar reflectivity factor at 0.5° elevation from the Lincoln, IL, radar on 19 April 1996 showing tornadic thunderstorms moving across the state. The red areas denote the heavy rainfall and hail, and green and blue colors denote lighter precipitation. See color insert.
where the radar is at the center, north at the top, and east at the right. Because of the earth’s curvature, atmospheric refraction, and beam tilt, distant radar echoes on a PPI display are at higher altitudes than those near the radar. Sometimes, a radar beam is swept between the horizon and the zenith at a constant azimuth. In this case, data are plotted in a ‘‘range-height indicator’’ or RHI display, which allows the meteorologist to view a vertical cross section through a storm. Meteorologists broadly characterize precipitation as convective when it originates from storms such as thunderstorms that consist of towering cumulus clouds that have large vertical motions. Convective storms produce locally heavy rain and are characterized by high values of the reflectivity factor. Convective storms typically appear on PPI displays as small cores of very high reflectivity. Weaker reflectivity values typically extend downwind of the convective cores as precipitation particles are swept downstream by the wind. Figure 4 shows an example of the radar reflectivity factor measured by the Lincoln, IL, Doppler radar during an outbreak of tornadic convective storms on 19 April 1996. The red areas in the image, which denote heavy rain, correspond to the convective area of the storm, and the green areas northeast of the convective regions denote the lighter rain downstream of the convective regions. Meteorologists characterize precipitation as stratiform when it originates from clouds that have a layered structure. These clouds have weak vertical motions, lighter
precipitation, and generally lower values of the radar reflectivity factor. Echoes from stratiform precipitation in a PPI display appear in Fig. 5, a radar image of a snowstorm over Michigan on 8 January, 1998. Stratiform echoes are generally widespread and weak, although often they do exhibit organization and typically have narrow bands of heavier precipitation embedded in the weaker echo. On RHI scans, convective storms appear as cores of high reflectivity that extend from the surface upward high into the storm. Divergent winds at the top of the storm carry precipitation outward away from the convective region. In strong squall lines, this precipitation can extend 50–100 km behind the convective region, creating a widespread stratiform cloud. These features are all evident in Fig. 6, an RHI scan through a squall line that occurred in Kansas and Oklahoma on 11 June 1985. The radar bright band is a common characteristic often observed in RHI displays of stratiform precipitation. The bright band is a local region of high reflectivity at the melting level (see BB in Fig. 6). As ice particles fall from aloft and approach the melting level, they often aggregate into snowflakes that can reach sizes of a centimeter or more. When these particles first begin to melt, they develop a water coating on the ice surfaces. The reflectivity increases dramatically due to both the larger particle sizes and the change in the dielectric properties of the melting snow (K in Eq. 6). Because of the dependence of Z on the sixth power of the particle diameters, these
Figure 5. Plan-position indicator scan of the radar reflectivity factor at 0.5° elevation from the Grand Rapids, MI, radar on 8 January, 1998 showing a snowstorm. Bands of heavier snowfall can be seen embedded in the generally weaker echoes. See color insert.
Figure 6. Range-height indicator scan of the radar reflectivity factor taken through a mature squall line on 11 June, 1985 in Texas. The bright band is denoted by the symbol BB in the trailing stratiform region of the storm on the left side of the figure. (Courtesy of Michael Biggerstaff, Texas A & M University, with changes.) See color insert.
large water-coated snowflakes have very high reflectivity. As they melt further, the snowflakes collapse into raindrops, reducing in size. In addition, their fall speed increases by a factor of 6, so they quickly fall away from the melting layer, reducing particle concentrations. As a result, the reflectivity decreases below the melting level, and the band of highest reflectivity occurs locally at the melting level (see Fig. 6). The bright band can also appear in PPI displays as a ring of high reflectivity at the range where the beam intersects the melting level (see Fig. 7).

Another common feature of stratiform clouds is precipitation streamers, regions of higher reflectivity that begin at cloud top and descend through the cloud. Figure 8 shows an example of a precipitation streamer observed in a snowstorm in Michigan by the NCAR ELDORA airborne radar. Streamers occur when ice particles form in local regions near a cloud top, descend, grow as they fall through the cloud, and are blown downstream by midlevel winds.

The PPI and RHI displays are the most common displays used in meteorology. In research applications, radar data are often interpolated to a constant altitude and displayed in a horizontal cross section to better visualize a storm's structure at a specific height. Similarly, interpolated data can be used to construct vertical cross sections, which appear much like the RHI display. Composite radar images are also constructed by combining reflectivity data from several radars. These composites are
typically projections of data in PPI format onto a single larger map. For this reason, there is ambiguity concerning the altitude of the echoes on composite images.

The radar reflectivity factor Z is a general indicator of precipitation intensity. Unfortunately, an exact relationship between Z and the precipitation rate R does not exist. Research has shown that Z and R are approximately related by

$$Z = a R^b, \qquad (10)$$

where the coefficient a and the exponent b take different values that depend on the precipitation type. For example, in widespread stratiform rain, a is about 200 and b is 1.6 if R is measured in mm/h and Z is in mm⁶/m³. In general, radar estimates of the short-term precipitation rate at a point can deviate by more than a factor of 2 from surface rain gauge measurements. These differences are due to uncertainties in the values of a and b, radar calibration uncertainties, and other sources of error. Some of these errors are random, so radar estimates of total accumulated rainfall over larger areas and longer times tend to be more accurate. Modern radars use algorithms to display total accumulated rainfall by integrating the rainfall rate, determined from Eq. (10), across selected time periods. For example, Fig. 9 shows the accumulated rainfall during the passage of a weather system over eastern Iowa and Illinois. Radar-estimated rainfall exceeded one inch locally near
Davenport, IA, and Champaign, IL, and other locations received smaller amounts.

Figure 7. Plan-position indicator scan of the radar reflectivity factor through a stratiform cloud. The bright band appears as a ring of strong (red) echoes between the 40 and 50 km range. Melting snowflakes, which have a high radar reflectivity, cause the bright band. (Courtesy of R. Rilling, National Center for Atmospheric Research). See color insert.

Figure 8. Range-height indicator scan of the radar reflectivity factor through a snowstorm on 21 January, 1998. The data, taken with the ELDORA radar on the National Center for Atmospheric Research Electra aircraft, show a precipitation streamer, a region of heavier snow that develops near the cloud top and is carried downstream by stronger midlevel winds. See color insert.

DOPPLER RADARS

Doppler Measurements

A Doppler frequency shift occurs in echoes from targets that move along the radar beam. The magnitude and direction of the frequency shift provide information about the targets' motion along the beam, toward or away from the radar. In meteorological applications, this measurement, termed the radial velocity, is used primarily to estimate winds. The targets' total motion consists of four components: two horizontal components of air motion, vertical air motion, and the targets' mean fall velocity in still air. Because only one component is observed, additional processing must be done or assumptions made to extract information about the total wind field.

To obtain the Doppler information, the phase information in the echo must be retained. Phase, rather than frequency, is used in Doppler signal processing because of the timescales of the measurements. The period of the Doppler frequency is typically between about 0.1 and 1.0 millisecond. This is much longer than the pulse duration, which is typically about 1 microsecond, so only a fraction of a cycle occurs within the pulse period. Consequently, for meteorological targets, one cannot measure the Doppler frequency from just one transmitted pulse. The Doppler frequency is estimated instead by measuring the phase φ of the echo at a specific range for each pulse in a train of pulses. Each adjacent pair of sampled phase values of the returned wave, for example, φ₁ and φ₂, φ₂ and φ₃, etc., can be used to obtain an estimate of the Doppler frequency f_d from

$$f_d = \frac{(\phi_{n+1} - \phi_n)\, F}{2\pi}. \qquad (11)$$
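The phase-difference estimate of Eq. (11), and the autocorrelation ("pulse pair") shortcut used in practice, can be sketched on synthetic I/Q samples. Everything below — the sample generation, PRF, and the sign convention that positive velocity is away from the radar — is an illustrative assumption, not an operational implementation:

```python
import cmath
import math

def pulse_pair_velocity(iq_samples, prf, wavelength):
    """Estimate radial velocity at one range gate from a train of complex
    (I/Q) echo samples.  The lag-1 autocorrelation sums the per-pulse phase
    progression, and a single arctangent (cmath.phase) then yields the mean
    Doppler frequency, i.e., Eq. (11) averaged over all pulse pairs."""
    r1 = sum(b * a.conjugate() for a, b in zip(iq_samples, iq_samples[1:]))
    f_d = cmath.phase(r1) * prf / (2.0 * math.pi)
    return f_d * wavelength / 2.0  # v_r = f_d * lambda / 2

# Synthetic scatterer receding at 15 m/s, 10-cm radar, F = 1000 pulses/s
lam, prf, v_true = 0.10, 1000.0, 15.0
f_true = 2.0 * v_true / lam  # 300 Hz, inside the 500-Hz Nyquist limit
iq = [cmath.exp(2j * math.pi * f_true * n / prf) for n in range(64)]
print(round(pulse_pair_velocity(iq, prf, lam), 2))  # -> 15.0
```

Because the arctangent is taken once, on the summed autocorrelation, this is far cheaper than computing a phase for every individual sample, which is the motivation given in the text.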
Conceptually, the average value of f_d from a number of pulse pairs determines the final value of the Doppler frequency. In actual signal processing, each phase measurement must be calculated from the returned signal by performing an arctangent calculation. This type of calculation is computationally demanding for radars that collect millions of echo samples each second. In practice, a more computationally efficient technique called the Pulse Pair Processor, which depends on the signal autocorrelation function, is normally used to extract the Doppler frequency.

Unfortunately, inversion of the sampled phase values to determine the Doppler frequency and the target radial velocity is not unique. As a result, velocity ambiguity problems exist for Doppler radar systems. In the inversion process, Doppler radars normally use the lowest frequency that fits the observed phase samples to determine the target radial velocity. Using this approach, the maximum unambiguous observable radial velocity v_{r,max} is given by

$$|v_{r,max}| = \frac{\lambda F}{4}. \qquad (12)$$

Doppler frequencies that correspond to velocities higher than |v_{r,max}| are aliased, or folded, back into the observable range. For example, if |v_{r,max}| = 20 m s⁻¹, then the range of observable velocities will be −20 m s⁻¹ ≤ v_r ≤ 20 m s⁻¹,
Figure 9. Total precipitation through 4:30 P.M. local time on 14 June, 2000, as measured by the National Weather Service radar at Lincoln, IL, during the passage of a storm system. See color insert.
and a true velocity of 21 m s⁻¹ would be recorded by the radar system as −19 m s⁻¹. Folding of Doppler velocities is related to a sampling theorem known as the Nyquist criterion, which requires that at least two samples of a sinusoidal signal per cycle be available to determine the frequency of the signal. In a pulse Doppler radar, f_d is the signal frequency, and F is the sampling rate. The velocity v_{r,max}, corresponding to the maximum unambiguous Doppler frequency, is commonly called the Nyquist velocity. The Nyquist velocity depends on wavelength (Eq. 12), so long-wavelength radars (e.g., 10 cm) have a larger range of observable velocities for the same pulse repetition frequency. For example, at F = 1,000 s⁻¹, the Nyquist velocity will be 25 m s⁻¹ at λ = 10 cm, but only 7.5 m s⁻¹ at λ = 3 cm. Velocities greater than 25 m s⁻¹ commonly occur above the earth's surface, so velocities recorded by the shorter wavelength radar could potentially be folded multiple times. This is another reason that shorter wavelength radars are rarely used for Doppler measurements.

From Eq. (12), it is obvious that long wavelength and high F are preferable for limiting Doppler velocity ambiguity. The choice of a high value of F to mitigate Doppler velocity ambiguity is directly counter to the need for a low value of F to mitigate range ambiguity [Eq. (2)]. Solving for F in Eq. (12) and substituting the result for F in Eq. (2), we find that

$$r_{max}\, v_{max} = \frac{c \lambda}{8}. \qquad (13)$$
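Eqs. (2), (12), and (13), together with the folding example above, can be put together in a few lines (a sketch; the function names are ours):

```python
def nyquist_velocity(wavelength, prf):
    """Maximum unambiguous radial velocity, |v_r,max| = lambda * F / 4 (Eq. 12)."""
    return wavelength * prf / 4.0

def max_unambiguous_range(prf, c=3.0e8):
    """Maximum unambiguous range, r_max = c / (2F) (Eq. 2)."""
    return c / (2.0 * prf)

def fold(v_true, v_max):
    """Alias a true radial velocity into the Nyquist interval [-v_max, +v_max)."""
    return (v_true + v_max) % (2.0 * v_max) - v_max

# At F = 1000/s: 25 m/s at 10 cm but only 7.5 m/s at 3 cm, as in the text
print(nyquist_velocity(0.10, 1000.0), nyquist_velocity(0.03, 1000.0))
# A true 21 m/s velocity observed with v_max = 20 m/s is recorded as -19 m/s
print(fold(21.0, 20.0))
# The Doppler dilemma: the product r_max * v_max = c * lambda / 8 (Eq. 13)
# is fixed by the wavelength, no matter which F is chosen.
```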
This equation, which shows that the maximum unambiguous range and radial velocity are inversely related, is called the Doppler dilemma because a choice good for one parameter will be a poor choice for the other. Figure 10 shows the limits imposed by the Doppler dilemma for several commonly used wavelengths. There are two ways in common use to avoid the restrictions imposed on range and radial velocity measurements by the Doppler dilemma. The first, which is implemented in the U.S. National Weather Service radars, involves transmitting a series of pulses at a small F, followed by another series at large F. The first set is used to measure the radar reflectivity factor out to longer range, whereas the second set is used to measure radial velocities across a wider Nyquist interval, but at a shorter range. An advantage of this approach is that erroneous velocities introduced from range-folded echoes (i.e., from targets beyond the maximum unambiguous range) during the large F pulse sequence can be identified and removed. This is accomplished by comparing the echoes in the small and large F sequences and deleting all data that appears
Figure 10. Relationship between the maximum unambiguous range and velocity for Doppler radars of different wavelength (λ).
only in the high F sequence. This approach has the disadvantage of increasing the dwell time (i.e., slowing the antenna rotational rate) because a sufficiently large number of samples is required to determine both the radar reflectivity factor and the radial velocity accurately. A second approach involves transmitting a series of pulses, each at a slightly different value of F, in an alternating sequence. From these measurements, it is possible to calculate the number of times that the velocity measurements have been folded. Algorithms can then unfold and correctly record and display the true radial velocities.
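The alternating-PRF idea can be sketched as a small search over fold numbers: each candidate fold count shifts a measured velocity by a multiple of twice its Nyquist velocity, and the true velocity is the one on which the two PRFs agree. This is a toy illustration of the principle, not the operational algorithm:

```python
def unfold_dual_prf(v1, v2, vmax1, vmax2, max_folds=3):
    """Given aliased velocities v1, v2 measured at two PRFs whose Nyquist
    velocities are vmax1, vmax2, try small fold counts and return the
    unfolded velocity for which the two measurements agree best."""
    best_v, best_err = None, float("inf")
    for n1 in range(-max_folds, max_folds + 1):
        for n2 in range(-max_folds, max_folds + 1):
            c1 = v1 + 2.0 * n1 * vmax1  # candidate true velocity from PRF 1
            c2 = v2 + 2.0 * n2 * vmax2  # candidate true velocity from PRF 2
            if abs(c1 - c2) < best_err:
                best_v, best_err = 0.5 * (c1 + c2), abs(c1 - c2)
    return best_v

# A true 30 m/s velocity aliases to -20 m/s when v_max = 25 m/s
# and to -10 m/s when v_max = 20 m/s; the search recovers 30 m/s.
print(unfold_dual_prf(-20.0, -10.0, 25.0, 20.0))
```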
The Doppler Spectrum

The discussion of Doppler radar to this point has not considered the fact that meteorological targets contain many individual scatterers that have a range of radial speeds. The variation of radial speeds is due to a number of factors, including wind shear within the pulse volume (typically associated with winds increasing with altitude), turbulence, and differences in terminal velocities of the large and small raindrops and ice particles. The echoes from meteorological targets contain a spectrum of Doppler frequencies that superimpose to create the received waveform. The way that these waves superimpose changes from pulse to pulse because, in the intervening time, the particles in the pulse volume change position relative to one another.

This can best be understood by considering a volume containing only two raindrops of the same size that lie on the beam axis. Consider that these raindrops are illuminated by a microwave beam whose electrical field oscillates in the form of a sine wave. If the raindrops are separated by an odd number of half-wavelengths, then the round-trip distances of the backscattered waves from the two drops will differ by a whole number of wavelengths. The returned waves, when superimposed, will be in phase and increase the amplitude of the signal. On the other hand, consider a case where the two drops are separated by an odd number of quarter-wavelengths. In this case, the superimposed returned waves will be 180° out of phase and will destructively interfere, eliminating the signal.

In the real atmosphere, where billions of drops and ice particles are contained in a single pulse volume, the amplitude of the signal and the Doppler frequency (and therefore the radial velocity) vary from pulse to pulse due to the superimposition of the backscattered waves from each of the particles as they change position relative to each other. The returned signal from each pulse will have a different amplitude and frequency (and radial velocity), which depend on the relative size and position of each of the particles that backscatters the waves. The distribution of velocities obtained from a large number of pulses constitutes the Doppler power spectrum. Figure 11, a Doppler power spectrum, shows the power S returned in each velocity interval Δv across the Nyquist interval ±v_{r,max}. The spectrum essentially represents the reflectivity-weighted distribution of particle radial speeds; more echo power appears at those velocities (or frequencies) whose particle reflectivities are greater. Some power is present at all velocities due to noise generated from the electronic components of the radar, the sun, the cosmic background, and other sources. The meteorological signal consists of the peaked part of the spectrum.

Full Doppler spectra have been recorded for limited purposes, such as determining the spectrum of fall velocities of precipitation by using a vertically pointing radar or estimating the maximum wind speed in a tornado. However, recording the full Doppler spectrum routinely requires such enormous data storage capacity that, even today, it is impractical. In nearly all applications, the full Doppler spectrum is not recorded. More important to meteorologists are the moments of the spectrum, given by

$$P_r = \sum_{-v_{r,max}}^{+v_{r,max}} S(v_r)\, \Delta v, \qquad (14)$$

$$\bar{v}_r = \frac{\displaystyle\sum_{-v_{r,max}}^{+v_{r,max}} v_r\, S(v_r)\, \Delta v}{P_r}, \qquad (15)$$

$$\sigma_v = \left[\frac{\displaystyle\sum_{-v_{r,max}}^{+v_{r,max}} (v_r - \bar{v}_r)^2\, S(v_r)\, \Delta v}{P_r}\right]^{1/2}. \qquad (16)$$

Figure 11. An idealized Doppler velocity spectrum. The average returned power is determined from the area under the curve, the mean radial velocity as the reflectivity-weighted average of the velocity in each spectral interval (typically the peak in the curve), and the spectral width as the standard deviation normalized by the mean power [see Eqs. (14)–(16)].
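Eqs. (14)–(16) translate directly into a discrete-moment calculation over the spectrum's velocity bins. A sketch on a synthetic Gaussian spectrum (the bin spacing and spectrum shape are assumptions):

```python
import math

def spectral_moments(velocities, spectrum):
    """Zeroth, first, and second moments of a discrete Doppler spectrum:
    total power (Eq. 14), mean radial velocity (Eq. 15), and spectral
    width (Eq. 16).  Velocity bins are assumed evenly spaced."""
    dv = velocities[1] - velocities[0]
    p = sum(s * dv for s in spectrum)
    vbar = sum(v * s * dv for v, s in zip(velocities, spectrum)) / p
    var = sum((v - vbar) ** 2 * s * dv for v, s in zip(velocities, spectrum)) / p
    return p, vbar, math.sqrt(var)

# Gaussian spectrum centered at +5 m/s with 2 m/s width, inside a
# +/-25 m/s Nyquist interval; the moments recover the center and width.
vels = [-25.0 + 0.5 * i for i in range(101)]
spec = [math.exp(-0.5 * ((v - 5.0) / 2.0) ** 2) for v in vels]
power, v_mean, width = spectral_moments(vels, spec)
print(round(v_mean, 2), round(width, 2))  # -> 5.0 2.0
```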
The quantity P_r, the area under the curve in Fig. 11, is the average returned power, from which the radar reflectivity factor can be determined by using Eq. (8). The mean radial velocity v̄_r, typically the velocity near the peak in the curve, represents the average motion of the precipitating particles along the radar beam. The spread in velocities is represented by the spectral width σ_v, which gives a rough estimate of the turbulence within the pulse volume. Processing techniques used in Doppler radars extract these parameters, which are subsequently displayed and recorded for future use.

Doppler Radial Velocity Patterns in PPI Displays

Doppler radial velocity patterns appearing in radar displays are complicated by the fact that a radar pulse moves higher above the earth's surface as it recedes from the radar. Because of this geometry, radar returns originating from targets near the radar represent the low-level wind field, and returns from distant targets represent winds at higher levels. In a PPI radar display, the distance away from the radar (at the center of the display) represents both a change in horizontal distance and a change in vertical distance. To determine the wind field at a particular elevation above the radar, radar meteorologists must examine the radial velocities on a ring at a fixed distance from the radar. The exact elevation represented by a particular ring depends on the angle of elevation of the radar beam.

Figure 12 shows two examples illustrating the relationship between radial velocity patterns observed on radar images and corresponding atmospheric wind profiles. The examples in Fig. 12 are simulated rather than real images. Doppler velocity patterns (right) correspond to vertical wind profiles (left), where the wind barbs indicate wind speed and direction from the ground up to 24,000 feet (7,315 m). Each tail on a wind barb represents 10 knots (5 m s⁻¹).
The direction in which the barb is pointing represents the wind direction. For example, a wind from the south at 30 knots would be represented by an upward pointing barb that has three tails, and a 20-knot wind from the east would be represented by a left pointing barb that has two tails. Negative Doppler velocities (blue-green) are toward the radar and positive (yellow–red) are away. The radar location is at the center of the display. In the top example in Fig. 12, the wind speed increases from 20 to 40 knots (10 to 20 m s−1 ) between zero and 12,000 feet (3,657 m) and then decreases again to 20 knots at 24,000 feet (7,315 m). The wind direction is constant. The radar beam intersects the 12,000-foot level along a ring halfway across the radar display, where the maximum inbound and outbound velocities occur. The bottom panels of Fig. 12 show a case where the wind speed is constant at 40 knots, but the wind direction varies from southerly to westerly between the ground and 24,000 feet. The
innermost rings in the radar display show blue to the south and orange to the north, representing a southerly wind. The outermost rings show blue to the west and orange to the east, representing westerly winds. Intermediate rings show a progressive change from southerly to westerly as one moves outward from the center of the display. In real applications, wind speed and direction vary with height and typically vary across the radar viewing area at any given height. The radial velocity patterns are typically more complicated than the simple patterns illustrated in Fig. 12. Of particular importance to radar meteorologists are radial velocity signatures of tornadoes. When thunderstorms move across the radar viewing area and tornadoes are possible, the average motion of a storm is determined from animations of the radar reflectivity factor or by other means and is subtracted from the measured radial velocities to obtain the storm-relative radial velocity. Images of the storm-relative radial velocity are particularly useful in identifying rotation and strong winds that may indicate severe conditions. Tornadoes are typically less than 1 kilometer wide. When a tornado is present, it is usually small enough that it fits within one or two beam widths. Depending upon the geometry of the beam, the distance of the tornado from the radar, and the location of the beam relative to the tornado, the strong winds of the tornado will typically occupy one or two pixels in a display. Adjacent pixels will have sharply different storm-relative velocities, typically one strong inbound and one strong outbound. Figure 13b shows a small portion of a radar screen located north of a radar (see location in Fig. 13a). Winds in this region are rotating (see Fig. 13c), and the strongest rotation is located close to the center of rotation, as would occur in a tornado. The radial velocity pattern in Fig. 13b is characteristic of a tornado vortex signature. 
Often, the winds will be so strong in a tornado that the velocities observed by the radar will be folded in the pixel that contains the tornado. Tornado vortex signatures take on slightly different characteristics depending on the position of individual radar beams relative to the tornado and whether or not the velocities are folded. Single Doppler Recovery of Wind Profiles A single Doppler radar provides measurements of the component of target motion along the beam path. At low elevations, the radial velocity is essentially the same as the radial component of the horizontal wind. At high elevation angles, the radial velocity also contains information about the targets’ fall velocity and vertical air motion. Meteorologists want information about the total wind, not just the radial velocity. The simplest method for recovering a vertical profile of the horizontal wind above a radar is a technique called the velocity-azimuth display (VAD), initially named because the wind was estimated from displays of the radial velocity versus azimuth angle at a specific distance from the radar, as the radar scanned through 360° of azimuth. In the VAD technique, the radar antenna is rotated through 360° at a fixed angle of elevation. At a fixed value of range (and elevation above the radar), the sampled volumes lie on a circle centered on the radar.
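On a single VAD ring at low elevation, a uniform horizontal wind (u, v) produces a sinusoidal radial velocity, v_r(az) ≈ [u sin(az) + v cos(az)] cos(elevation), so projecting the measurements onto sine and cosine in azimuth recovers the wind. A sketch under that idealization (uniform wind, evenly spaced azimuths, negligible particle fall speed — all assumptions of ours); operational VAD algorithms fit additional harmonics and quality-control the data:

```python
import math

def vad_wind(azimuths_deg, radial_velocities, elev_deg=0.5):
    """Recover horizontal wind speed and meteorological ('from') direction
    from radial velocities on one VAD ring by projecting onto the first
    Fourier harmonic in azimuth.  Assumes evenly spaced azimuths covering
    the full circle and the convention that receding flow is positive."""
    ce = math.cos(math.radians(elev_deg))
    n = len(azimuths_deg)
    u = 2.0 / n * sum(vr * math.sin(math.radians(az))
                      for az, vr in zip(azimuths_deg, radial_velocities)) / ce
    v = 2.0 / n * sum(vr * math.cos(math.radians(az))
                      for az, vr in zip(azimuths_deg, radial_velocities)) / ce
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0  # direction wind blows FROM
    return speed, direction

# Uniform 10 m/s wind from the northwest (315 degrees), sampled every 5 degrees
az = list(range(0, 360, 5))
u0 = -10.0 * math.sin(math.radians(315.0))
v0 = -10.0 * math.cos(math.radians(315.0))
vr = [(u0 * math.sin(math.radians(a)) + v0 * math.cos(math.radians(a)))
      * math.cos(math.radians(0.5)) for a in az]
speed, direction = vad_wind(az, vr)
print(round(speed, 1), round(direction, 1))  # -> 10.0 315.0
```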
1462
WEATHER RADAR
Figure 12. (Top) Doppler radial velocity pattern (right) corresponding to a vertical wind profile (left) where wind direction is constant and wind speed is 20 knots at the ground and at 24,000 feet altitude and 40 knots at 12,000 feet. Negative Doppler velocities (blues) are toward the radar, which is located at the center of the display. (Bottom) Same as top, except the wind speed is constant and the wind direction varies from southerly at the ground to westerly at 24,000 feet. (Courtesy of R. A. Brown and V. T. Woods, National Oceanic and Atmospheric Administration, with changes). See color insert.
The circles become progressively larger and higher in altitude as range increases, as shown in Fig. 14. The standard convention used for Doppler radars is that approaching radial velocities are negative and receding
velocities are positive. To understand how VAD works, assume initially that the wind is uniform at 10 m s−1 from the northwest across a circle scanned by the radar and that the particles illuminated by the radar along this
circle are falling at 5 m s−1, as shown in Fig. 14a. Under these conditions, the radial velocity reaches its maximum negative value when the antenna is pointed directly into the wind and its maximum positive value when pointed directly downwind. The radial velocity is negative when the antenna is pointed normal to the wind direction because of the particle fall velocity. When plotted as a function of azimuthal angle, the radial velocity traces out a sine wave (Fig. 14b); the amplitude of the sine wave is a measure of the wind speed, the phase shift in azimuth from 0° is a measure of the wind direction, and the displacement of the center axis of the sine wave from 0 m s−1 is a measure of the vertical motion of the particles (Fig. 14b–d). In reality, the flow within a circle may not be uniform. However, with some assumptions, the winds and the properties of the flow related to the nonuniformity, such as air divergence and deformation, can be estimated by mathematically determining the values of the fundamental harmonics from the plot of radial velocity versus azimuthal angle. These properties are used in research applications to obtain additional information about atmospheric properties, such as vertical air motion. National Weather Service radars used for weather monitoring and forecasting in the United States use the VAD scanning technique routinely to obtain vertical wind profiles every 6 minutes. Figure 15 shows an example of VAD-determined wind profiles from the Lincoln, IL, radar during a tornado outbreak on 19 April 1996. The data, which are typical of an environment supporting tornadic thunderstorms, show 30-knot southerly low-level winds just above the radar, veering to westerly at 90 knots at 30,000 feet. Profiles such as those in Fig. 15 provide high-resolution data that also allow meteorologists to identify the precise position of fronts, wind shear zones that may be associated with turbulence, jet streams, and other phenomena. The VAD technique works best when echoes completely surround the radar, when the sources of the echoes are clouds that are stratiform rather than convective, and when the echoes are deep. In clear air, winds can be recovered only at lower elevations, where the radar may receive echoes from scatterers such as insects and refractive index heterogeneities.

Figure 13. (a) Location of the 27 × 27 nautical mile radial velocity display window shown in the bottom panels. The window is located 65 nautical miles north of the radar. (b) Radial velocity pattern corresponding to a tornado vortex signature (peak velocity = 60 kt, core radius = 0.5 nautical miles). One of the beams is centered on the circulation center. (c) Wind circulation corresponding to the radial velocity pattern. Arrow length is proportional to wind speed, and the curved lines represent the overall wind pattern. (Courtesy of R. A. Brown and V. T. Woods, National Oceanic and Atmospheric Administration, with changes.) See color insert.

Single Doppler Recovery of Wind Fields
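The VAD harmonic fit described above can be illustrated with a short numerical sketch; the function name and the ordinary-least-squares formulation below are our own choices, not prescribed by the VAD literature:

```python
import numpy as np

def vad_fit(az_deg, vr, el_deg):
    """Fit vr(az) = a0 + a1*sin(az) + b1*cos(az) to one VAD ring and
    convert the harmonics to wind components (a sketch of the harmonic
    analysis of a velocity-azimuth display ring)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    A = np.column_stack([np.ones_like(az), np.sin(az), np.cos(az)])
    a0, a1, b1 = np.linalg.lstsq(A, vr, rcond=None)[0]
    u = a1 / np.cos(el)            # west-east wind from the sine harmonic
    v = b1 / np.cos(el)            # south-north wind from the cosine harmonic
    w_particle = a0 / np.sin(el)   # offset gives mean particle vertical motion
    return u, v, w_particle

# Synthetic ring at 30° elevation: wind toward the southeast
# (u = +7.07, v = -7.07 m/s) and particles falling at 5 m/s.
az = np.arange(0.0, 360.0, 10.0)
vr = (7.07 * np.sin(np.radians(az)) - 7.07 * np.cos(np.radians(az))) \
     * np.cos(np.radians(30.0)) - 5.0 * np.sin(np.radians(30.0))
u, v, w = vad_fit(az, vr, 30.0)
print(round(u, 2), round(v, 2), round(w, 2))  # → 7.07 -7.07 -5.0
```

Because the harmonic model fits a uniform-wind ring exactly, the recovered components match the synthetic inputs; real rings require quality control for folded or missing gates.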
The VAD technique permits meteorologists to obtain only a vertical profile of the winds above the radar. In research, scientists are often interested in obtaining estimates of the total three-dimensional wind fields within a storm. Normally, such wind fields can be obtained only by using two or more Doppler radars that simultaneously view a storm from different directions. However, under certain conditions, it is possible to retrieve estimates of horizontal wind fields in a storm from a single Doppler radar. In the last two decades, a number of methodologies have been developed to retrieve single Doppler wind estimates; two of them are briefly described here. The first method, called tracking radar echoes by correlation, or TREC, employs pattern recognition, cross-correlating arrays of radar reflectivities measured several minutes apart to determine the translational motion of the echoes. One advantage of the TREC method is that it does not rely on Doppler measurements and therefore can be used by radars that do not have Doppler capability. Typically, fields of radar reflectivity are subdivided into arrays of dimensions from 3–7 km, which, for modern radars, consist of 100–500 data points. The wind vector determined from each array is combined with those from other arrays to provide images of the horizontal winds across the radar scan, which can be superimposed on the radar reflectivity. Figure 16 shows an example of TREC-derived winds from the Charleston, SC, WSR-57 radar, a non-Doppler radar, during the landfall of Hurricane Hugo in 1989. The TREC method in this case captured the salient features of the hurricane circulation, including the cyclonic circulation about the eye, the strongest winds near the eyewall (seen in the superimposed reflectivity field), and a decrease in the magnitude of the winds with distance outside the eyewall.
The strongest winds detected, 55–60 m s−1 , were consistent with surface wind speeds observed by other instruments.
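The cross-correlation step at the heart of TREC can be sketched as a brute-force search for the shift that maximizes the correlation coefficient between two reflectivity subarrays. This is a toy version; operational TREC works on many subarrays and converts each best shift, divided by the time separation, into a wind vector:

```python
import numpy as np

def trec_shift(z_old, z_new, max_shift):
    """Estimate echo translation between two reflectivity arrays by
    maximizing the correlation coefficient over integer pixel shifts
    (a simplified sketch of the TREC idea)."""
    best, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(z_old, dy, axis=0), dx, axis=1)
            r = np.corrcoef(shifted.ravel(), z_new.ravel())[0, 1]
            if r > best:
                best, best_shift = r, (dy, dx)
    return best_shift  # (rows, cols): displacement of the echo pattern

# Synthetic test: a Gaussian reflectivity blob displaced by (2, 3) pixels.
y, x = np.mgrid[0:32, 0:32]
z0 = np.exp(-((x - 10.0)**2 + (y - 10.0)**2) / 20.0)
z1 = np.roll(np.roll(z0, 2, axis=0), 3, axis=1)
print(trec_shift(z0, z1, 5))  # → (2, 3)
```

Dividing the recovered pixel displacement by the grid spacing and the time between scans gives the echo-motion (wind) estimate for that subarray.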
Figure 14. (a) Geometry for a radar velocity-azimuth display scan. The horizontal wind vector is denoted by H, the vertical fall speed of the precipitation by V, and the radial velocity by Vr. (b) Measured radial velocity as a function of azimuthal angle corresponding to H = 10 m s−1 and V = 5 m s−1; the radial velocity is minimum looking into the wind and maximum looking downwind. (c) Same as (b), but only for that part of the radial velocity contributed by the fall speed of the precipitation. (d) Same as (b), but only for that part of the radial velocity contributed by the horizontal wind.
A second method, called the synthetic dual Doppler (SDD) technique, uses the Doppler radial velocity measurements from two times. This method can be used if the wind field remains nearly steady state in the reference frame of the storm between the times and the storm motion results in a significant change in the radar viewing angle with time. Figure 17 shows a schematic of the SDD geometry for (a) radar-relative and (b) storm-relative coordinates. In radar-relative coordinates, the storm is first viewed at t − Δt/2, where Δt is the time separation
of the two radar volumes used for the SDD analysis and t is the time of the SDD wind retrieval. At some later time, t + Δt/2, the storm has moved a distance d to a new location, and the radar viewing angle β from the radar to the storm changes. Using radial velocity measurements from these two time periods, an SDD horizontal wind field can be retrieved for an intermediate time period t, when the storm was located at an intermediate distance d/2. The geometry of the storm-relative coordinates in Fig. 17b is identical to a conventional dual-Doppler (i.e., two-radar)
Figure 15. Wind speed and direction as a function of height and time derived using the velocity-azimuth display (VAD) technique. The data were collected by the Lincoln, IL, National Weather Service radar on 19 April 1996 between 22:51 and 23:44 Greenwich Mean Time. Long tails on a wind barb represent 10 knots (5 m s−1), short tails 5 knots, and flags 50 knots. The direction in which the barb is pointing represents the wind direction. For example, a wind from the north at 20 knots would be represented by a downward-pointing barb that has two tails, and a 60-knot wind from the west would be represented by a right-pointing barb that has a flag and a long tail. See color insert.
Figure 16. TREC-determined wind vectors for Hurricane Hugo overlaid on radar reflectivity. A 50 m s−1 reference vector is shown on the lower right. (From J. Tuttle and R. Gall, A single-radar technique for estimating the winds in a tropical cyclone. Bulletin of the American Meteorological Society, 80, 653–688, 1998. Courtesy of John Tuttle and the American Meteorological Society.) See color insert.
system that is viewing a single storm during the same time period. However, when using the SDD technique for a single radar, it is necessary to ''shift the position'' of the radar a distance d/2 for both time periods by using the storm propagation velocity. Figure 17b shows that using data collected at two time periods and shifting the radar position can, in essence, allow a single radar to obtain measurements of a storm from two viewing geometries at an intermediate time and location. Figure 18 shows an example of SDD winds recovered for a vortex that developed over the southern portion of Lake Michigan during a cold air outbreak and moved onshore in Michigan. The forward speed of the vortex has been subtracted from the wind vectors to show the circulation better. The bands of high reflectivity are due to heavy snow. The SDD wind retrieval from the WSR-88D radar at Grand Rapids, MI, clearly shows the vortex circulation and the convergence of the wind flows into the radial snowbands extending from the vortex center, which corresponds with the position of a weak reflectivity ''eye.''

Multiple Doppler Retrieval of 3-D Wind Fields

Three-dimensional wind fields in many types of storm systems have been determined during special field campaigns in which two or more Doppler radars have been deployed. In these projects, the scanning techniques
are optimized to cover the entire storm circulation from cloud top to the ground in as short a time as feasible.

Figure 17. Schematic diagrams of the SDD geometry for (a) radar-relative and (b) storm-relative coordinates. Δt is the time separation of the two radar volumes used for the SDD analysis, and t is the time of the SDD retrieval. The distance d is analogous to the radar baseline in a conventional dual-Doppler system. The solid circles represent (panel a) the observed storm locations and (panel b) the shifted radar locations. The open circles denote the location of the radar and the SDD-retrieved storm position. (Courtesy of N. Laird, Illinois State Water Survey.)

The techniques to recover the wind fields from radial velocity measurements from more than one Doppler radar are termed multiple Doppler analysis. Radar data are collected in spherical coordinates (radius, azimuth, elevation). Multiple Doppler analyses are normally done in Cartesian space, particularly because the calculation of derivative quantities, such as divergence of the wind field, and integral quantities, such as air vertical velocity, is required. For this reason, radial velocity and other data are interpolated from spherical to Cartesian coordinates. Data are also edited to remove nonmeteorological echoes such as ground clutter and second-trip echoes, and are unfolded to correct the velocity ambiguities discussed previously. The data must also be adjusted spatially to account for storm motion during sampling. The equation relating the radial velocity measured by a radar to the four components of motion of particles in a Cartesian framework is

vr = u sin a cos e + v cos a cos e + (w + wt) sin e    (17)
where u, v, and w are the west–east, north–south, and vertical components of air motion in the Cartesian system, a and e are the azimuthal and elevation angles of the radar, wt is the mean fall velocity of the particles, and vr is the measured radial velocity. For each radar viewing a specific location in a storm, vr , a, and e are known and u, v, w, and wt are unknown. In principle, four separate measurements are needed to solve for the desired four unknown quantities. In practice, four measurements of radial velocity from different viewing angles are rarely available. Most field campaigns employ two radars, although some have used more. The remaining unknown variables are estimated by applying constraints imposed by mass continuity in air flows, appropriate application of boundary conditions at storm boundaries, estimates of particle fall velocities based on radar reflectivity factors, and additional information available from other sensors.
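With radial velocities from two radars at low elevation angles and an estimate of w + wt, Eq. (17) reduces to a small linear system for (u, v) at each grid point. A sketch (the function name is ours; a full analysis also enforces mass continuity and boundary conditions):

```python
import numpy as np

def dual_doppler_uv(obs, w_plus_wt):
    """Solve Eq. (17) for the horizontal wind (u, v) at one grid point
    given radial velocities from two or more radars.

    obs       : list of (vr, az_deg, el_deg) tuples, one per radar
    w_plus_wt : assumed vertical particle motion w + wt (m/s)
    """
    A, b = [], []
    for vr, az_deg, el_deg in obs:
        a, e = np.radians(az_deg), np.radians(el_deg)
        A.append([np.sin(a) * np.cos(e), np.cos(a) * np.cos(e)])
        b.append(vr - w_plus_wt * np.sin(e))  # move known term to RHS
    (u, v), *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return u, v

# Two radars viewing the same point from directions 90° apart;
# true wind u = 10, v = 5 m/s, no vertical particle motion.
obs = [(10.0 * np.sin(np.radians(40)) * np.cos(np.radians(2))
        + 5.0 * np.cos(np.radians(40)) * np.cos(np.radians(2)), 40.0, 2.0),
       (10.0 * np.sin(np.radians(130)) * np.cos(np.radians(3))
        + 5.0 * np.cos(np.radians(130)) * np.cos(np.radians(3)), 130.0, 3.0)]
u, v = dual_doppler_uv(obs, 0.0)
print(round(u, 2), round(v, 2))  # → 10.0 5.0
```

The least-squares form extends naturally to three or more radars; the retrieval degrades as the viewing angles of the radars become nearly parallel.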
Figure 18. Radar reflectivity factor and winds (relative to the forward speed of the vortex) at a 2-km altitude within a vortex over Lake Michigan derived from two radar volumes collected by the Grand Rapids, MI, WSR-88D Doppler radar at 1023 and 1123 UTC on 5 December 1997. The winds were derived using the synthetic dual-Doppler technique. (Courtesy of N. Laird, Illinois State Water Survey.) See color insert.
The details of multiple Doppler analysis can become rather involved, but the results are often spectacular and provide exceptional insight into the structure and dynamics of storm circulations. For example, Fig. 19a shows a vertical cross section of the radar reflectivity and winds across a thunderstorm derived from measurements from two Doppler radars located near the storm. The forward speed of the storm has been subtracted from the winds to illustrate the circulations within the storm better. The vertical scale on Fig. 19a is stretched to illustrate the storm structure better. A 15-km wide updraft appears in the center of the storm, and central updraft speeds approach 5 m s−1 . The updraft coincides with the heaviest rainfall, indicated by the high reflectivity region in the center of the storm. Figure 19b shows the horizontal wind speed in the plane of the cross section. The sharp increase in wind speed marks
the position of an advancing front, which is lifting air to the east of the front (right side of the figure), creating the updraft appearing in Fig. 19a. Three-dimensional Doppler wind analyses in severe storms have been used to determine the origin and trajectories of large hailstones, examine the origin of rotation in tornadoes, study the circulations in hurricanes, and investigate the structure of a wide variety of other atmospheric phenomena such as fronts, squall lines, and downbursts.

Retrieval of Thermodynamic Parameters from 3-D Doppler Wind Fields

Wind in the atmosphere is a response to variations in atmospheric pressure. Pressure variations occur on many scales and for a variety of reasons but are closely tied on larger scales to variations in air temperature. Newton's second law of motion describes the acceleration of air
Figure 19. Vertical cross sections through a thunderstorm derived from measurements from two Doppler radars. The storm occurred in northeast Kansas on 14 February 1992. The panels show (a) radar reflectivity (dBZ) and winds (vectors, m s−1) in the plane of the cross section. The forward speed of the storm has been subtracted from the wind vectors to illustrate the vertical circulations within the storm; (b) the horizontal wind speed (m s−1) in the plane of the cross section. Positive values of the wind speed denote flow from left to right; (c) the perturbation pressure field (millibars) within the storm. The perturbation pressure is the pressure field remaining after the average pressure at each elevation is subtracted from the field. See color insert.
and its relation to atmospheric pressure, thermal fields, the earth's rotation, and other factors. For atmospheric processes, Newton's law is expressed in the form of three equations describing the acceleration of the winds as a response to forces in the three cardinal directions. These momentum equations, together with derived three-dimensional multiple Doppler wind fields, have been used to retrieve perturbation pressure and buoyancy information. More recently, retrieval techniques have also been developed that incorporate the thermodynamic equation, which relates temperature variations in air parcels to heating and cooling processes in the atmosphere. These retrieval techniques have extended multiple Doppler radar analyses from purely kinematic descriptions of wind fields in storms to analysis of the dynamic forces that create the wind fields. As an example, the retrieved pressure perturbations associated with the storm in Fig. 19b are shown in Fig. 19c. The primary features of the pressure field include a positive pressure perturbation in the upper part of the storm located just to the left of the primary updraft, a large area of negative pressure perturbation both in and above the sharp wind gradient in the middle and upper part of the storm, a strong positive perturbation at the base of the downdraft to the left of the main updraft, and a weak negative pressure perturbation in the low levels ahead of the advancing front. The pressure perturbations are associated with two physical processes: (1) horizontal accelerations at the leading edge of the front and within the outflow at the top of the updraft, and (2) positive buoyancy in the updraft and negative buoyancy in the downdraft. Analyses such as these help meteorologists understand how storms form and organize and the processes that lead to their structures.
POLARIZATION DIVERSITY RADARS

As electromagnetic waves propagate away from a radar antenna, the electrical field becomes confined to a plane
that is normal to the propagative direction. The orientation of the electrical field vector within this plane determines the wave's polarization state. For radars, the electrical field vector either lies on a line or traces out an ellipse in this plane, which means that radar waves are polarized. If the electrical field lies on a line, the condition for most meteorological radars, the waves are linearly polarized. A radar wave that propagates toward the horizon is vertically polarized if the electrical field vector oscillates in a direction between the zenith and the earth's surface, and is horizontally polarized if the vector oscillates in a direction parallel to the earth's surface. Polarization diversity radars measure echo characteristics at two orthogonal polarizations, typically horizontal and vertical. This is done either by alternating the polarization of successive pulses or by transmitting one polarization and receiving both. At this time, polarization diversity radars are used only in meteorological research. However, the U.S. National Weather Service is planning to upgrade its Doppler radar network in the near future to include polarization capability. Polarization diversity radars take advantage of the fact that precipitation particles have different shapes, sizes, orientations, dielectric constants, and number densities. For example, raindrops smaller than about 1 mm in diameter are spherical, but larger raindrops progressively flatten due to air resistance and take on a ''hamburger'' shape as they become large. Hailstones are typically spherical or conical but may take on more diverse shapes depending on how their growth proceeds. Hailstones sometimes develop a water coating while growing at subfreezing temperatures, due to heat deposited on the hailstone surface during freezing. The water coating changes the dielectric constant of the hail surface.
Small ice crystals typically are oriented horizontally but become randomly oriented as they grow larger. Eventually, individual crystals form loosely packed, low-density snowflakes as they collect one another during fall. When snowflakes fall through the melting level, they develop
wet surfaces and a corresponding change in dielectric constant. Horizontally and vertically polarized waves are closely aligned with the natural primary axes of falling precipitation particles and therefore are ideal orientations to take advantage of the particle characteristics to identify them remotely. Linearly polarized waves induce strong electrical fields in precipitation particles in the direction of electrical field oscillation and weak fields in the orthogonal direction. For particles that have a large aspect ratio, such as large raindrops, a horizontally polarized wave induces a larger electrical field and subsequently, a larger returned signal than a vertically polarized wave. In general, the two orthogonal fields provide a means of probing particle characteristics in the two orthogonal dimensions. Differences in particle characteristics in these dimensions due to shape or orientation will appear as detectable features in the returned signal. These are backscatter effects related to particles in the radar scattering volume. Propagation effects such as attenuation, which are associated with particles located between the radar and the scattering volume, also differ for the two orthogonal polarizations. Measurement of the differences in propagation at orthogonal polarization provides further information about the characteristics of particles along the beam path. There are six backscatter variables and four propagation variables that carry meaningful information provided by polarization diversity radars that employ linear polarization. Other variables are derived from these basic quantities. 
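In practice, several of these quantities reduce to simple arithmetic on the received powers and phases. A sketch of two of them, the differential reflectivity and the specific differential phase (function names are ours; operational estimators add smoothing and quality control):

```python
import numpy as np

def zdr_db(p_hh, p_vv):
    """Differential reflectivity: copolar horizontal over copolar
    vertical returned power, expressed in decibels."""
    return 10.0 * np.log10(p_hh / p_vv)

def specific_differential_phase(range_km, phidp_deg, window=9):
    """KDP (deg/km) as half the local range derivative of the two-way
    differential propagation phase, via a sliding least-squares slope."""
    half = window // 2
    kdp = np.full(len(range_km), np.nan)
    for i in range(half, len(range_km) - half):
        slope = np.polyfit(range_km[i - half:i + half + 1],
                           phidp_deg[i - half:i + half + 1], 1)[0]
        kdp[i] = 0.5 * slope  # half the two-way phase accumulation rate
    return kdp

# Oblate raindrops return roughly twice the horizontal power: ZDR ≈ +3 dB.
print(round(zdr_db(2.0, 1.0), 2))  # → 3.01
# A steady 4 deg/km growth in differential phase implies KDP = 2 deg/km.
rng = np.arange(0.0, 10.0, 0.25)
kdp = specific_differential_phase(rng, 4.0 * rng)
print(round(float(kdp[20]), 2))  # → 2.0
```

The sliding-window slope illustrates why KDP estimates are noisy in light rain: the differential phase accumulates slowly, so small phase fluctuations dominate the local derivative.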
The most important backscatter variables are (1) the reflectivity factor Z for horizontal polarization; (2) the differential reflectivity ZDR, which is the ratio of the reflected power at horizontal and vertical polarization; (3) the linear depolarization ratio (LDR), which is the ratio of the cross-polar power (transmitted horizontally, received vertically) to the copolar power (transmitted and received horizontally); (4) the complex correlation coefficient ρhv ejδ between copolar horizontally and vertically polarized echo signals; and (5) the phase of the correlation coefficient δ, which is the difference in phase between the horizontally and vertically polarized fields caused by backscattering. In the expression for the correlation coefficient, j = √−1. Propagative effects that influence polarization measurements include (1) attenuation of the horizontally polarized signal, (2) attenuation of the vertically polarized signal, (3) depolarization, and (4) the differential phase shift ΦDP. A differential phase shift, or lag, occurs in rain because horizontally polarized waves propagate more slowly than vertically polarized waves. This occurs because larger raindrops are oblate and present a larger cross section to horizontally polarized waves. The variable of most interest is the specific differential phase KDP, which is the range derivative of ΦDP. The specific differential phase has been shown to be an excellent indicator of liquid water content and rain rate and may be superior to rain rates derived from standard Z–R relationships based on Eq. (10). An example of the capabilities of polarimetric radars in identifying precipitation type appears in Fig. 20. This figure shows a cross section of (a) Z, the radar
reflectivity factor for horizontally polarized radiation; (b) ZDR, the differential reflectivity; and (c) a particle classification based on polarization variables. Note in panel (c) that the hail shaft, in yellow, has high Z and low ZDR, and the rain to the right of the hail shaft has low Z and high ZDR. These variables, combined with other polarimetric measurements, allow meteorologists to estimate the other types of particles that populate a storm. For example, the upper part of the storm in Fig. 20 contained hail and graupel (smaller, softer ice spheres), and the ''anvil'' top of the storm extending to the right of the diagram was composed of dry snow and irregular ice crystals. Identifying boundaries between different particle types relies on techniques invoking ''fuzzy logic'' decisions, which take into account the fact that the various polarization parameters overlap for various precipitation types. The use of KDP for precipitation measurements is especially promising. The major advantages of KDP over Z are that KDP is independent of receiver and transmitter calibrations, unaffected by attenuation, less affected by beam blockage, unbiased by ground clutter cancelers, less sensitive to variations in the distributions of drops, biased little by the presence of hail, and can be used to detect anomalous propagation. For these reasons, research efforts using polarization diversity radars have focused particularly on verification measurements of rainfall using KDP. Quantitative predictions of snowfall may also be possible using polarization diversity radars, but this aspect of precipitation measurement has received less attention in meteorological research because of the greater importance of severe thunderstorms and flash flood forecasting.

WIND PROFILING RADARS

A wind profiling radar, or wind profiler, is a Doppler radar used to measure winds above the radar site, typically to a height of about 15 km above the earth's surface.
Wind profilers are low-power, high-sensitivity radars that operate best in clear air conditions. Profilers, which operate at UHF and VHF frequencies, detect fluctuations in the radio refractive index at scales of half the radar wavelength. These fluctuations arise from variations in air density and moisture content, primarily from turbulence. Radar meteorologists assume that the fluctuations in the radio refractive index are carried along with the mean wind, and therefore the Doppler frequency shift of the scattering elements can be used to estimate the wind. Wind profilers use fixed phased-array antennas. In the 404.37-MHz (74-cm wavelength) profilers used by the U.S. National Weather Service, an antenna is made up of a 13 × 13 meter grid of coaxial cables, and the antenna itself consists of many individual radiating elements, each similar to a standard dipole antenna. If a transmitted pulse arrives at each of these elements at the same time (in phase), a beam propagates away from the antenna vertically. If the pulses arrive at rows of elements at slightly different times (out of phase), a beam propagates upward at an angle to the zenith. The phasing is controlled
Figure 20 data: SPOL radar range-height indicator (RHI) scan at 118.0° azimuth, 06/13/97 01:39:06. Particle classification categories (bottom panel): ground clutter, birds, insects, supercooled liquid water droplets, irregular ice crystals, ice crystals, wet snow, dry snow, graupel/rain, graupel/small hail, rain/hail, hail, heavy rain, moderate rain, light rain, drizzle, and cloud.
Figure 20. Range-height indicator scans of (top) radar reflectivity factor for horizontal polarization Z; (middle) differential reflectivity ZDR ; (bottom) particle classification results based on analysis of all polarimetric parameters. (From J. Vivekanandan, D. S. Zrnic, S. M. Ellis, R. Oye, A. V. Ryzhkov, and J. Straka, Cloud microphysics retrieval using S band dual-polarization radar measurements. Bulletin of the American Meteorological Society 80, 381–388 (1999). Courtesy of J. Vivekanandan and the American Meteorological Society.) See color insert.
by changing the feed cable lengths. Profilers typically use a three-beam pattern that has a vertically pointing beam and beams pointing in orthogonal directions (e.g., north and east). In the absence of precipitation, the radial velocities measured by a profiler along each beam are

vre = u cos e + w sin e,    (18)
vrn = v cos e + w sin e,    (19)
vrv = w,    (20)
where vre , vrn , and vrv are the radial velocities measured by the beams in the east, north, and vertical pointing positions; u, v, and w are the wind components in the west–east, south–north, and upward directions; and e is the angle of elevation of the east and north beams above
the earth's surface. Because the vertical beam measures w directly, and e and the radial velocities along the oblique beams are measured, these equations can be solved for u and v, which can easily be converted to wind speed and direction. Profilers are pulsed radars, so the round-trip travel time of the pulse gives the height at which the wind is measured. Precipitation particles also scatter energy and therefore contribute to the measurement of the radial velocity. When precipitation occurs, w in Eqs. (18)–(20) must be replaced by w + wt, where wt is the terminal fall velocity of the precipitation particles. Problems arise if all three beams are not filled by the same size precipitation particles; in this case, the winds may not be recoverable. The radars also require a recovery time after the pulse is transmitted before accurate data can be received by the receiver, so information in the lowest 500 meters of the atmosphere is typically not recoverable.
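The inversion of Eqs. (18)–(20) is direct. A minimal sketch for one range gate in clear air (the function name is ours):

```python
import numpy as np

def profiler_winds(vre, vrn, vrv, el_deg):
    """Invert Eqs. (18)-(20) for u, v, w at one range gate (assumes
    clear air, so no particle fall-speed term is needed)."""
    e = np.radians(el_deg)
    w = vrv                                # vertical beam measures w directly
    u = (vre - w * np.sin(e)) / np.cos(e)  # east beam gives u
    v = (vrn - w * np.sin(e)) / np.cos(e)  # north beam gives v
    return u, v, w

# Oblique beams at 75° elevation; true wind u = 10 m/s, v = 0,
# with a gentle 0.5 m/s updraft.
e = np.radians(75.0)
u, v, w = profiler_winds(10.0 * np.cos(e) + 0.5 * np.sin(e),
                         0.5 * np.sin(e), 0.5, 75.0)
print(round(u, 2), round(v, 2), round(w, 2))  # → 10.0 0.0 0.5
```

Repeating this inversion gate by gate up the three beams produces the vertical wind profile displayed by the instrument.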
Wind profiles can be obtained at high time resolution, often at intervals as short as 6 minutes, and at a vertical resolution of the order of 250 meters. The 6-minute interval, compared with the standard 12-hour interval between weather balloon wind measurements, represents a dramatic increase in a meteorologist's capability of sampling upper atmospheric winds. Wind profilers are used for a wide range of research applications in meteorology, often in combination with other instruments, such as microwave radiometers and acoustic sounding devices that remotely measure moisture and temperature profiles. The data from profilers are presented in a format essentially identical to Fig. 15. Currently, the U.S. National Oceanic and Atmospheric Administration operates a network of 30 profilers called the Wind Profiler Demonstration Network. Most of these profilers are located in the central United States and are used for weather monitoring by the National Weather Service and in forecasting applications such as the initialization and verification of numerical weather forecasting models.

MOBILE RADAR SYSTEMS

Mobile radar systems consist of three distinct classes of instruments: rapidly deployable ground-based radars, airborne radars, and satellite-borne radars. Each of these radar systems is well suited to address particular research problems that are either difficult or impossible to carry out from fixed ground-based systems. Rapidly deployable ground-based radars are used to study tornadoes, land-falling hurricanes, and other atmospheric phenomena that have small space scales and timescales and that must be sampled at close range. These phenomena are unlikely to occur close to a fixed radar network, particularly during relatively short research field campaigns.
At this time, six mobile radars are available to the meteorological research community: two truck-mounted 3-cm wavelength radars called the ‘‘Doppler on Wheels,’’ or DOWs, operated jointly by the University of Oklahoma and the National Center for Atmospheric Research; two truck-mounted 5-cm wavelength radars called the Shared Mobile Atmospheric Research and Teaching (SMART) radars operated by the National Severe Storms Laboratory, Texas A&M University, Texas Tech University, and the University of Oklahoma; a 3-mm wavelength truck-mounted Doppler system operated by the University of Massachusetts; and a 3-mm wavelength trailer-mounted Doppler radar operated by the University of Miami. The DOW and SMART radars are used in dual-Doppler arrangements to measure wind fields near and within tornadoes and hurricanes. The millimeter wavelength radars have been used to study tornadoes and cloud and precipitation processes. Figure 21 shows a DOW image of the radar reflectivity and radial velocity of a tornado near Scottsbluff, Nebraska. The tornado is located at the position of the tight inbound/outbound radial velocity couplet near the center of the image in the left panel. The reflectivity factor in the right panel of the figure shows a donut-shaped reflectivity region that has a minimum in reflectivity at
the tornado center. It is thought that this occurs because debris is centrifuged outward from the tornado center. In a tornado, the backscattered energy comes primarily from debris. Although great progress has been made using ground-based Doppler radars to study storm structure, remote storms such as oceanic cyclones and hurricanes cannot be observed by these systems. In addition, small-scale phenomena such as tornadoes rarely occur close enough to special fixed dual-Doppler networks, so detailed data, aside from new data obtained by the DOWs, are hard to obtain. Airborne meteorological radars provide a means to measure the structure and dynamics of these difficult-to-observe weather systems. Currently, three research aircraft flown by the meteorological community have scanning Doppler radars: two P-3 aircraft operated by the National Oceanic and Atmospheric Administration and a Lockheed Electra aircraft operated by the National Science Foundation through the National Center for Atmospheric Research (NCAR). The scanning technique used by these radars to map wind fields is shown in Fig. 22. In the P-3 aircraft, a single radar uses an antenna designed to point alternately at angles fore and then aft of the aircraft. In the Electra, two radars are used; the first points fore, and the second points aft. In both cases, the scan pattern consists of an array of beams that cross at an angle sufficient to sample both components of the horizontal wind. Therefore, the data from the fore and aft beams can be used as a dual-Doppler set that permits recovery of wind fields. Three other aircraft, an ER-2 high-altitude aircraft and a DC-8 operated by NASA, and a King Air operated by the University of Wyoming, also have radars used for meteorological research. Airborne radars have significant limitations imposed by weight, antenna size, and electrical power requirements.
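At each point where a fore beam and an aft beam cross, the two radial velocities form a 2 x 2 linear system for the horizontal wind. A minimal sketch under simplified assumptions (flat geometry, vertical particle motion already removed; the azimuth convention is hypothetical, not the actual airborne processing chain):

```python
import math

def dual_doppler_wind(vr_fore, vr_aft, az_fore_deg, az_aft_deg):
    """Solve for the horizontal wind (u, v) at a fore/aft beam crossing.

    Each radial velocity is the projection of the wind onto the beam's
    horizontal unit vector (azimuth clockwise from north):
        vr = u*sin(az) + v*cos(az)
    Inverted here by Cramer's rule.
    """
    a1, a2 = math.radians(az_fore_deg), math.radians(az_aft_deg)
    det = math.sin(a1) * math.cos(a2) - math.cos(a1) * math.sin(a2)
    if abs(det) < 1e-6:
        raise ValueError("beams nearly parallel; wind not recoverable")
    u = (vr_fore * math.cos(a2) - vr_aft * math.cos(a1)) / det
    v = (math.sin(a1) * vr_aft - math.sin(a2) * vr_fore) / det
    return u, v
```

The determinant check mirrors the requirement in the text that the fore and aft beams cross at an angle sufficient to sample both components of the horizontal wind.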
Aircraft flight capabilities in adverse weather, stability in turbulence, and speed all impact the quality of the measurements. The aircraft’s precise location, instantaneous three-dimensional orientation, and beam pointing angle must be known accurately to position each sample in space. At altitudes other than the flight altitude, the measured radial velocity contains a component of the particle fall velocity, which must be accounted for in processing data. This component becomes progressively more significant as the beam rotates toward the ground and zenith. A time lag exists between the forward and aft observations, during which storm evolution degrades the accuracy of the wind recovery. Short wavelengths (e.g., 3 cm) must be used because of the small antenna size and narrow beam width required, which makes attenuation and radial velocity folding significant concerns. Despite these limitations, airborne Doppler radars have provided unique data sets and significant new insight into storm structure. For example, Fig. 23 shows an RHI scan of the radar reflectivity factor through a tornadic thunderstorm near Friona, TX, on 2 June 1995 measured by the ELDORA radar on the NCAR Electra aircraft. The data, which was collected during the Verification of the Origins of Rotations in Tornadoes experiment (VORTEX), shows a minimum in the reflectivity of the Friona tornado extending from the ground to the cloud top, the first
Figure 21. Image of radial velocity (m s−1 ; left) and reflectivity factor (dBZ; right) taken by a Doppler on Wheels radar located near a tornado in Scottsbluff, Nebraska, on 21 May 1998. Blue colors denote inbound velocities in the radial velocity image. (Courtesy of J. Wurman, University of Oklahoma.) See color insert.
Figure 22. (a) ELDORA/ASTRAIA airborne radar scan technique showing the dual-radar beams tilted fore and aft of the plane normal to the fuselage. The antennas and radome protective covering rotate as a unit about an axis parallel to the longitudinal axis of the aircraft. (b) Sampling of a storm by the radar. The flight track past a hypothetical storm is shown. Data are taken from the fore and aft beams to form an analysis of the velocity and radar reflectivity field on planes through the storm. The radial velocities at beam intersections are used to derive the two-dimensional wind field on the analysis planes. (From P. H. Hildebrand et al., The ELDORA-ASTRAIA airborne Doppler weather radar: High-resolution observations from TOGA-COARE. Bulletin of the American Meteorological Society 77, 213–232 (1996) Courtesy of American Meteorological Society.)
time such a feature has ever been documented in a tornadic storm. Space radars include altimeters, scatterometers, imaging radars, and most recently, a precipitation radar whose
measurement capabilities are similar to other radars described in this article. Altimeters, scatterometers and imaging radars are used primarily to determine properties of the earth’s surface, such as surface wave height,
Figure 23. RHI cross section of the radar reflectivity factor through a severe thunderstorm and tornado near Friona, TX, on 2 June 1995 measured by the ELDORA radar on board the National Center for Atmospheric Research Electra aircraft. The data was collected during VORTEX, the Verification of the Origins of Rotations in Tornadoes experiment. (From R. M. Wakimoto, W. -C. Lee, H. B. Bluestein, C. -H. Liu, and P. H. Hildebrand, ELDORA observations during VORTEX 95. Bulletin of the American Meteorological Society 77, 1,465–1,481 (1996). Courtesy of R. Wakimoto and the American Meteorological Society.) See color insert.
the location of ocean currents, eddies and other circulation features, soil moisture, snow cover, and sea ice distribution. These parameters are all important to meteorologists because the information can be used to initialize numerical forecast and climate models. Surface winds at sea can also be deduced because short gravity and capillary waves on the ocean surface respond rapidly to the local near-instantaneous wind, and the character of these waves can be deduced from scatterometers. The first precipitation radar flown in space was launched aboard the Tropical Rainfall Measuring Mission (TRMM) satellite in November 1997. The TRMM radar, jointly funded by Japan and the United States, is designed to obtain data concerning the three-dimensional structure of rainfall over the tropics, where ground-based and ocean-based radar measurements of precipitation are almost nonexistent. Figure 24, for example, shows the reflectivity factor measured in Hurricane Mitch over the Caribbean during October 1998. A unique feature of the precipitation radar is its ability to measure rain over land, where passive microwave instruments have more difficulty. The data are being used in conjunction with other instruments on the satellite to examine the atmospheric energy budget of the tropics. The radar uses a phased array antenna that operates at a frequency of 13.8 GHz. It has a horizontal resolution at the ground of about 4 km and a swath width of 220 km. The radar measures vertical profiles of rain and snow from the surface to a height of about 20 kilometers at a vertical resolution of 250 m and can detect rain rates as low as 0.7 millimeters per hour. The radar echo of the precipitation radar consists of three components: echoes due to rain; echoes from the surface; and mirror image echoes, rain echoes received through double reflection at the surface.
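Reflectivity factors such as those measured by the TRMM radar are related to rain rate through empirical Z-R power laws. The sketch below uses the classic Marshall-Palmer coefficients (a = 200, b = 1.6) purely for illustration; the actual TRMM processing applies its own attenuation-corrected algorithms:

```python
import math

def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Convert reflectivity factor (dBZ) to rain rate (mm/h)
    through the power law Z = a * R**b."""
    z = 10.0 ** (dbz / 10.0)       # Z in mm^6 m^-3
    return (z / a) ** (1.0 / b)

def dbz_from_rain_rate(r, a=200.0, b=1.6):
    """Inverse relation: rain rate (mm/h) to dBZ."""
    return 10.0 * math.log10(a * r ** b)
```

With these (assumed) coefficients, a 20-dBZ echo corresponds to roughly 0.65 mm/h, which gives a feel for why the instrument's 0.7 mm/h detection floor matters for light rain.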
At intense rain rates, where the attenuation effects can be strong, new methods of data processing using these echoes have been developed to correct for attenuation. The Precipitation Radar is the first spaceborne instrument that provides three-dimensional views of storm structure. The measurements are currently being analyzed and are expected to yield
Figure 24. Radar reflectivity factor within Hurricane Mitch in 1998 over the Caribbean Sea measured by the Precipitation Radar on board the Tropical Rainfall Measuring System (TRMM) satellite in October 1998. The viewing swath is 215 kilometers wide. (Courtesy of the National Aeronautics and Space Administration.) See color insert.
invaluable information on the intensity and distribution of rain, rain type, storm depth, and the height of the melting level across tropical latitudes.

FUTURE DEVELOPMENTS

A number of advancements are on the horizon in radar meteorology. Scientists now know that polarimetric radars have the potential to obtain superior estimates of rainfall compared to nonpolarimetric radars and to give far better information concerning the presence of hail. Because of these capabilities, polarization diversity should eventually become incorporated into the suite of radars used by the U.S. National Weather Service for weather monitoring and severe weather warning. Experimental bistatic radars, systems that have one transmitter but
many receivers distributed over a wide geographic area, were first developed for meteorological radars in the 1990s and are currently being tested as a less expensive way to retrieve wind fields in storms. Bistatic radar receivers provide a means to obtain wind fields by using a single Doppler radar. Each of the additional receivers measures the pulse-to-pulse phase change of the Doppler shift, from which it determines the wind component toward or away from the receiver. Because these receivers view a storm from different directions, all wind components are measured, making it possible to retrieve wind fields within a storm similarly to what is currently done by using two or more Doppler radars. Future networks of bistatic radars may make it possible to create images of detailed wind fields within storms in near-real time, providing forecasters with a powerful tool to determine storm structure and severity. At present, the operational network of wind profilers in the United States is limited primarily to the central United States. Eventual expansion of this network will provide very high temporal monitoring of the winds, leading to more accurate initialization of numerical weather prediction models and, ultimately, better forecasts. Mobile radars are being developed that operate at different wavelengths and may eventually have polarization capability. One of the biggest limitations in storm research is the relatively slow speed at which storms must be scanned. A complete volume scan by a radar, for example, typically takes about 6 minutes. Techniques developed for military applications are currently being examined to reduce this time by using phased array antennas that can scan a number of beams simultaneously at different elevations. Using these new techniques, future radars will enhance scientists’ capability to understand and predict a wide variety of weather phenomena.

ABBREVIATIONS AND ACRONYMS

COHO: coherent local oscillator
DOW: Doppler on Wheels
ELDORA: Electra Doppler radar
LDR: linear depolarization ratio
NASA: National Aeronautics and Space Administration
NCAR: National Center for Atmospheric Research
PPI: plan position indicator
RHI: range-height indicator
SDD: synthetic dual Doppler
SMART: Shared Mobile Atmospheric Research and Teaching
STALO: stable local oscillator
TREC: objective tracking of radar echoes with correlations
TRMM: Tropical Rainfall Measuring Mission
UHF: ultrahigh frequency
VAD: velocity azimuth display
VHF: very high frequency
VORTEX: Verification of the Origins of Rotations in Tornadoes experiment
WSR-57: Weather Surveillance Radar 1957
WSR-88D: Weather Surveillance Radar 1988-Doppler
BIBLIOGRAPHY

1. D. Atlas, ed., Radar in Meteorology, American Meteorological Society, Boston, 1990.
2. L. J. Battan, Radar Observation of the Atmosphere, University of Chicago Press, Chicago, 1973.
3. T. D. Crum, R. E. Saffle, and J. W. Wilson, Weather and Forecasting 13, 253–262 (1998).
4. R. J. Doviak and D. S. Zrnić, Doppler Radar and Weather Observations, 2nd ed., Academic Press, San Diego, CA, 1993.
5. T. Matejka and R. C. Srivastava, J. Atmos. Oceanic Technol. 8, 453–466 (1991).
6. R. E. Rinehart, Radar for Meteorologists, Rinehart, Grand Forks, ND, 1997.
7. F. Roux, Mon. Weather Rev. 113, 2,142–2,157 (1985).
8. M. A. Shapiro, T. Hample, and D. W. Van De Kamp, Mon. Weather Rev. 112, 1,263–1,266 (1984).
9. M. Skolnik, Introduction to Radar Systems, 2nd ed., McGraw-Hill, New York, 1980.
10. J. Tuttle and R. Gall, Bull. Am. Meteorol. Soc. 79, 653–668 (1998).
11. R. M. Wakimoto et al., Bull. Am. Meteorol. Soc. 77, 1,465–1,481 (1996).
12. B. L. Weber et al., J. Atmos. Oceanic Technol. 7, 909–918 (1990).
13. J. Vivekanandan et al., Bull. Am. Meteorol. Soc. 80, 381–388 (1999).
14. D. S. Zrnić and A. V. Ryzhkov, Bull. Am. Meteorol. Soc. 80, 389–406 (1999).
X

X-RAY FLUORESCENCE IMAGING
the K, L, . . . absorption edges of the atom, or in terms of wavelength, λK, λL, . . ., respectively (2–4). The absorption at/near E ∼= Wγ differs from the common phenomenon of absorption, because the absorptive cross section and the linear absorption coefficient increase abruptly at absorption edges. This phenomenon is the so-called anomalous dispersion, due to its resonant nature (2,3). In normal absorption, the intensity of a transmitted X-ray beam through a material is attenuated exponentially as the material increases in thickness, according to a normal linear absorption coefficient µ. Usually, the normal linear absorption coefficient µ is related to the photoelectric absorption cross section of the electron σe by µ = σe n0 ρ, where n0 is the number of electrons per unit volume in the material and ρ is the mass density. The atomic absorption coefficient is µa = (A/NO)(µ/ρ), where A is the atomic weight of the element in question and NO is Avogadro’s number. For λ ≠ λγ (i.e., E ≠ Wγ), µa is approximately proportional to λ3 and Z4, according to quantum mechanical considerations (5,6), where Z is the atomic number. When E increases (i.e., λ decreases), µa decreases according to λ3. When E = Wγ (i.e., λ = λγ), µa increases abruptly because X rays are absorbed in the process of ejecting γ electrons. For E > Wγ, the absorption resumes a decreasing trend as λ3 (2,3). According to the electron configuration of atoms, the K, L, M electrons, etc. are specified by the principal quantum number n, orbital angular momentum quantum number ℓ, magnetic quantum number mℓ, and spin quantum number ms, or by n, ℓ, j, m, where j = ℓ ± ms and m = ±j (3–6). Following the common notation, the s, p, d, f, . . . subshells and the quantum numbers n, ℓ, j, m are also employed to indicate the absorption edges or to relate to the ejection of electrons in the subshell.
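The exponential attenuation law and the approximate λ3 Z4 scaling above can be sketched numerically. A minimal illustration; the proportionality constant k is arbitrary (the true value depends on the shell and requires tabulated data):

```python
import math

def transmitted_fraction(mu, thickness):
    """Normal (nonresonant) attenuation: I/I0 = exp(-mu * t)."""
    return math.exp(-mu * thickness)

def mu_a_scaled(lmbda, z, k=1.0):
    """Atomic absorption coefficient away from an edge, using the
    approximate proportionality mu_a ~ lambda**3 * Z**4."""
    return k * lmbda ** 3 * z ** 4
```

Doubling the wavelength therefore raises µa roughly eightfold, until the next absorption edge interrupts the trend with an abrupt jump.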
For example, there are three absorption edges for L electrons, LI, LII, LIII, corresponding to the three energy levels (states) specified by n, ℓ, and j: two electrons in (2, 0, 1/2), two electrons in (2, 1, 1/2), and four electrons in (2, 1, 3/2). The LI state involves the 2s electrons, and LII and LIII involve the six 2p electrons. The difference between LII and LIII is that j < ℓ for the former and j > ℓ for the latter. X-ray emission, the other process, involves the allowed transition of an atom from a high-energy state to a lower one. Usually, a high-energy state has a vacancy in the tightly bound inner shell. During the transition, the vacancy in the inner shell is filled by an electron coming from the outer shell. The energy of the fluorescence emitted is the energy difference between the two states involved. According to quantum mechanics (5–7), the allowed transition, the transition whose probability is high, is governed by selection rules. In the electric dipole approximation (5–7), the changes in quantum numbers in going from one state to the other follow the conditions Δℓ = ±1 and Δj = 0 or ±1. For example, in the transition from a K state to an L state, the atom can only go from the K state to either the LII or LIII state. The X rays emitted are the familiar Kα2 and Kα1 radiation, respectively. The
SHIH-LIN CHANG
National Tsing Hua University, Hsinchu, Taiwan
Synchrotron Radiation Research Center, Hsinchu, Taiwan
INTRODUCTION

In Wilhelm Conrad Röntgen’s cathode-ray-tube experiments, the lighting up of a screen made from barium platinocyanide crystals due to fluorescence led to his discovery of X rays in 1895. Subsequent investigations by Röntgen and by Straubel and Winkelmann showed that the radiation that emanated from a fluorspar, or fluorite, crystal CaF2 excited by X rays is more readily absorbed than the incident X rays. This so-called fluorspar radiation is X-ray fluorescence (1). Nowadays, it is known that X-ray fluorescence is one of the by-products of the interaction of electromagnetic (EM) waves with the atoms in matter in the X-ray regime (0.1–500 Å). During the interaction, the incident X rays have sufficient energy to eject an inner electron from an atom. The energy of the atom is raised by an amount equal to the work done in ejecting the electron. Therefore, the atom is excited to states of higher energies by an incident EM wave. The tendency to have lower energy brings the excited atom back to its stable initial states (of lower energies) by recapturing an electron. The excess energy, the energy difference between the excited and the stable state, is then released from the atom in the form of EM radiation, or photons; this is fluorescence (2–4). The elapsed time between excitation and emission is only about 10−16 second. The energy of X-ray fluorescence is lower than that of the incident radiation. In general, the distribution of the X-ray fluorescence generated is directionally isotropic; the fluorescence is radiated in all directions at energies characteristic of the emitting atoms. Thus, X-ray fluorescence is element-specific. X-ray fluorescence is a two-step process: X-ray absorption (the ejection of an inner electron) and X-ray emission (the recapture of an electron) (4). Both involve the electron configuration of atoms in the irradiated material.
To eject an electron, the atom needs to absorb sufficient energy from the incident X rays of energy E, where E ≥ Wγ and Wγ is the energy required to remove an electron from the γ shell. The γ shell corresponds to the K, L, M electrons. An atom is said to be in the K quantum state if a K electron is ejected; it is similar for the L and M quantum states. Because a K electron is more tightly bound to the nucleus than the L and M electrons, the energy of the K quantum state is higher than that of the L quantum state. The energies corresponding to the work WK, WL, WM done by the EM wave to remove K, L, M electrons are called
transition from the K state to the LI state is forbidden by selection rules. Moreover, an initial double ionization of an atom can also result in the emission of X rays. For example, the Kα3,4 emission is the satellite spectrum produced by a transition from a KL state to an LL state, where the KL state means that the absorbed X-ray energy is sufficient to eject a K and an L electron (3,4). The absorption and emission of X-ray photons by atoms in fluorescent processes can be rigorously described by using the quantum theory of radiation (7). The vector potential A(r, t) that represents the electromagnetic radiation (8) is usually expressed as a function of the position vector r and time t (5–7):

$$ \mathbf{A}(\mathbf{r},t) = c\left(\frac{2\pi N\hbar}{\omega V}\right)^{1/2}\hat{\varepsilon}\,e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)} + c\left(\frac{2\pi (N+1)\hbar}{\omega V}\right)^{1/2}\hat{\varepsilon}\,e^{-i(\mathbf{k}\cdot\mathbf{r}-\omega t)}, \qquad (1) $$

where the first term is related to the absorption of a photon quantum and the second term to the emission of a photon quantum by an electron. N is the photon occupation number in the initial state, ε̂ is the unit vector of polarization, and k and ω are the wave vector and angular frequency of the EM wave. c, ħ, and V are the speed of light in vacuum, Planck’s constant h/2π, and the volume irradiated, respectively. The absorption probability, the transition probability, and the differential cross section can be calculated quantum mechanically by considering the fluorescence as the scattering of photons by atomic electrons (4–7). Before scattering, the photon, whose wave vector is k, angular frequency is ω, and polarization vector is ε̂, is incident on the atom in its initial state A. After scattering, the atom is left in its final state B, and the scattered photon of polarization vector ε̂′ and angular frequency ω′ propagates along the wave vector k′. In between are the excited intermediate states. The Kramers–Heisenberg formula, modified to include the effects of radiation damping, gives the differential cross section, the derivative of the cross section σ with respect to the solid angle Ω, as follows (7):

$$ \frac{d\sigma}{d\Omega} = r_O^2\,\frac{\omega'}{\omega}\left|\hat{\varepsilon}\cdot\hat{\varepsilon}'\,\delta_{AB} - \frac{1}{m}\sum_I\left[\frac{(\mathbf{p}\cdot\hat{\varepsilon}')_{BI}(\mathbf{p}\cdot\hat{\varepsilon})_{IA}}{E_I - E_A - \hbar\omega - i\Gamma_I/2} + \frac{(\mathbf{p}\cdot\hat{\varepsilon})_{BI}(\mathbf{p}\cdot\hat{\varepsilon}')_{IA}}{E_I - E_A + \hbar\omega'}\right]\right|^2, \qquad (2) $$

where rO is the classical radius of the electron, rO = e2/mc2, and the subscript I stands for the excited intermediate state. p is the electric dipole moment, and ϕ is the wave function of the corresponding state. δAB is a Kronecker delta that involves the wave functions ϕA and ϕB of states A and B. The matrix elements (p · ε̂′)BI and (p · ε̂)IA involve the wave functions ϕA, ϕB, and ϕI (7). The probability of finding the intermediate state I is proportional to exp(−ΓI t/ħ), where ΓI = ħ/τI and τI is the lifetime of state I. The first term in Eq. (2) represents the nonresonant amplitude of scattering (ω ≠ ωIA = (EI − EA)/ħ). The second and
third terms have appreciable amplitude in the resonant condition, that is, ω = ωIA. This phenomenon is known as resonant fluorescence. The sum of the nonresonant amplitudes is usually of the order of rO, which is much smaller than the resonant amplitude (of the order of c/ω). By ignoring the nonresonant amplitudes, the differential cross section of a single-level resonant fluorescent process in the vicinity of a nondegenerate resonance state R takes the form (7)

$$ \frac{d\sigma}{d\Omega} = \frac{r_O^2}{m^2}\,\frac{\omega'}{\omega}\left[\frac{|(\mathbf{p}\cdot\hat{\varepsilon})_{AR}|^2}{(E_R - E_A - \hbar\omega)^2 + \Gamma_R^2/4}\right]|(\mathbf{p}\cdot\hat{\varepsilon}')_{RB}|^2. \qquad (3) $$

This expression is related to the product of the probability of finding state R formed by the absorption of a photon characterized by (k, ε̂) and the spontaneous emission probability per solid angle for the transition from state R to state B characterized by the emitted photon (k′, ε̂′). The term in the square bracket is proportional to the absorption probability, and the matrix element after the bracket is connected to the emission probability. For a conventional X-ray source and a synchrotron radiation (SR) source, the energy resolution ΔE of an incident X-ray beam is around 10−2 to 10 eV (2,3,9,10), except that for some special experiments, such as those involving Mössbauer effects, 10−9 eV is needed (11,12). Under normal circumstances, ΔE ≫ ΓR (∼10−9 eV), according to the uncertainty principle. This means that the temporal duration of the incident photon (X-ray) beam is shorter than the lifetime of the resonance. In other words, the formation of the metastable resonance state R via absorption and the subsequent X-ray emission can be treated as two independent quantum mechanical processes. In such a case, the emission can be thought of as spontaneous emission; that is, the atom undergoes a radiative transition from state R to state B as if there were no incident electromagnetic wave.
Then, the corresponding dσ/d takes the following simple form under the electric dipole approximation (7): αω3 −→ 2 dσ ∼ V|XAB | sin2 θ, = PAB d 2π c3
(4)
where PAB is the absorption probability equal to the term in the square bracket given in Eq. (3), and |XAB |2 is the transmission probability equal to the term after the bracket of Eq. (3). α is the fine-structure constant (∼1/137). and θ is the angle between the electric dipole moment p the direction of the X-ray fluorescence emitted from the atom. As is well known, this type of fluorescence has the following distinct properties: (1) In general, fluorescence is directionally isotropic, so that its distribution is uniform in space. (2) The intensity of fluorescence is maximum and is when the emission direction is perpendicular to p . (3) X-ray fluorescence emitted from different zero along p atoms in a sample is incoherent, and the emitted photons will not interfere with each other. (4) Whenever the energy of an incident photon is greater than an absorption edge, the energy of the emitted fluorescence is determined by the energy difference between the two states involved in the transition, which is independent of the incident energy. X-ray fluorescence has diverse applications, including identifying elements in materials and quantitative and
qualitative trace-element analyses from X-ray fluorescent spectra. Together with other imaging techniques, X-ray fluorescent imaging can, in principle, provide both spectral and spatial distributions of elements in matter in a corresponding photon-energy range. Unlike other fluorescent imaging techniques, such as optical fluorescence and laser-induced fluorescent imaging, the development and application of X-ray fluorescent imaging is still in its infancy, though X-ray fluorescence and related spectroscopy have been known for quite some time. Yet, owing to the advent of synchrotron radiation, new methods and applications using this particular imaging technique have been developed very recently. An X-ray fluorescent image is usually obtained by following these steps: The incident X-ray beam, after proper monochromatization and collimation, impinges on a sample of interest. An absorption spectrum is acquired by scanning the monochromator to identify the locations of the absorption edges of the element investigated. The incident beam is tuned to photon energies higher than the absorption edges so that the constituent elements of the sample are excited. Then, X-ray fluorescence emanates. The spatial distribution of the fluorescent intensity, the fluorescent image, is recorded on a two-dimensional detector or on a point detector using a proper scan scheme for the sample. The following items need to be considered for good quality images of high spatial and spectral resolution: (1) appropriate X-ray optics for shaping the incident beam and for analyzing the fluoresced X rays and (2) a suitable scanning scheme for the sample to cross the incident beam. In addition, image reconstruction of intensity data versus position is also important for direct mapping of the trace elements in a sample.

TECHNICAL ASPECTS

An X-ray fluorescent imaging setup consists of an X-ray source, an X-ray optics arrangement, a sample holder, an imaging detector, and an image processing system.
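The acquisition steps above can be condensed into a point-scan loop. In this sketch, `mono`, `detector`, and `stage` (with methods `set_energy`, `count`, and `move_to`) are hypothetical instrument interfaces standing in for real beamline control software; the edge energy and offset are likewise illustrative:

```python
def acquire_fluorescence_image(mono, detector, stage, edge_energy_ev,
                               offset_ev=100.0, nx=64, ny=64, step_um=10.0):
    """Point-scan acquisition loop following the steps in the text.

    `mono`, `detector`, and `stage` are placeholder instrument
    interfaces, not a real beamline API.
    """
    # Tune the incident beam above the absorption edge located from the
    # absorption spectrum, so the element of interest fluoresces.
    mono.set_energy(edge_energy_ev + offset_ev)
    image = [[0.0] * nx for _ in range(ny)]
    for iy in range(ny):
        for ix in range(nx):
            stage.move_to(ix * step_um, iy * step_um)  # translate the sample
            image[iy][ix] = detector.count()           # fluorescent intensity
    return image
```

Because each pixel is measured at a known sample position, a point scan of this kind gives the one-to-one spatial correspondence discussed later and needs no mathematical reconstruction.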
These components are briefly described here.

X-ray Sources

Conventional X-ray sources are sealed X-ray tubes and rotating-anode X-ray generators. Synchrotron radiation emitted from accelerated relativistic charged particles is another X-ray source. The spectra of conventional sources consist of characteristic lines according to specific atomic transitions and a white radiation background due to the energy loss of the electrons bombarding the metal target used in X-ray generators (2,3). The energy resolution of the characteristic lines is about ΔE/E ≈ 10−4. In contrast, the spectrum of synchrotron radiation is continuous and has a cutoff at high energies, resulting from the emission of a collection of relativistic charged particles (9). High directionality (parallelism), high brilliance, well-defined polarization (linear polarization in the orbital plane and elliptical polarization off the orbital plane), pulsed time structure (usually the pulse width is in picoseconds and the period in nanoseconds), and a clean environment (in vacuum of 10−9 torr) are the advantages of synchrotron
X-ray sources (9). Other sources, such as plasma-generated X rays, can also be used.

Beam Conditioning

Proper X-ray monochromators need to be employed for monochromatic incident X rays. Different incident beam sizes for different imaging purposes can be obtained by using a suitable collimation scheme. Beam conditioning for X-ray fluorescent imaging is similar to that for X-ray diffraction and scattering. Single- or double-crystal monochromators (10) are usually employed for beam monochromatization. For synchrotron radiation, grazing incidence mirrors before or after the monochromator are used for focusing or refocusing, respectively (13,14). Pinhole or double-slit systems and an evacuated beam path have been found useful for beam collimation. To obtain a microbeam, two double-crystal monochromators (one vertical and the other horizontal) or multilayered mirrors such as the Kirkpatrick–Baez (K–B) configuration (15) are reasonable choices for the collimation system.

Sample Holder

Usually, a sample holder is designed to facilitate the positioning of the sample with respect to the incident X-ray beam. In general, the sample holder should provide the degrees of freedom needed to translate the sample left and right and also up and down. In addition, in some cases, the sample needs to be rotated during the imaging process. Therefore, the azimuthal rotation around the sample-surface normal and rocking about the axis perpendicular to both the surface normal and the incident beam are indispensable. Accuracy in translation is of prime concern in high spatial resolution fluorescent imaging. If soft X-ray fluorescent imaging is pursued, a sample holder in a high vacuum (HV) or ultrahigh vacuum (UHV) condition is required to reduce air absorption.

Detector

A suitable detector is essential for recording X-ray fluorescent images. Area detectors or two-dimensional array detectors are usually employed to obtain two-dimensional images.
They include imaging plates (IP) (a resolution of 100 µm per pixel), charge-coupled device (CCD) cameras (a resolution of about 70–100 µm), microchannel plates (MCP) (25 µm per pixel), and solid-state detectors (SSD) (an energy resolution of about 100–200 eV). Traditionally, point detectors such as the gas proportional counter and the NaI(Tl) scintillation counter have been used frequently. However, to form images, proper scan schemes for the sample as well as the detector are required. For hard X-ray fluorescent imaging, an imaging plate, a CCD camera, and point detectors serve the purpose. For soft X-ray fluorescence imaging, a vacuum CCD camera, a microchannel plate, and a semiconductor pin-diode array may be necessary.

Computer-Aided Image Processor

The fluorescent signals collected by a detector need to be stored according to the sample position. Usually, analog signals are converted into digital signals, and
X-RAY FLUORESCENCE IMAGING
this digital information is then stored in a computer. Image processing, which includes background subtraction, image-frame addition, and cross-sectional display, can be carried out on a computer. In addition, black-and-white contrast as well as color images can be displayed on a computer monitor or as a hard copy. In some cases, when the rotating-sample technique is employed to collect a fluorescent image, corrections to avoid spatial distortion need to be considered (16,17). Image reconstruction depends on the scan mode chosen. There are three different scan modes: the point scan, line scan, and area scan. Usually, the data from a point scan give a one-to-one spatial correspondence. Therefore, there is no need to reconstruct the image using mathematical transformation techniques, provided that high-resolution data are collected. Because the areas covered by the incident beam during a line scan (involving a line beam and sample translation and rotation) and an area scan are larger than the sample area, these two types of scans require image reconstruction from the measured data. The estimated relative errors σrel in the reconstructed images as a function of the percentage a% of the sample area that provides fluorescent intensity are shown in Fig. 1 for the three different scan types without (Fig. 1a) and with (Fig. 1b) a constant background. Except for very small fluorescent areas, the line scan provides the lowest error (18,19). Images from both the line scan and area
scan can be reconstructed by employing appropriate filter functions and convolution techniques (16,20). The following representative cases illustrate in detail the basic experimental components necessary for X-ray fluorescent imaging. The experimental conditions specific to each investigation are given, and the actual images obtained are shown. CASE STUDIES Nondestructive X-ray Fluorescent Imaging of Trace Elements by Point Scanning The two-dimensional spatial distribution of multiple elements in a sample is usually observed by X-ray fluorescent imaging using synchrotron radiation, mainly because of the tunability of the synchrotron photon energy (21). The proper photon energy can be chosen for the specific element investigated. In addition, the brightness and directionality of synchrotron radiation provide superb capability for microbeam analysis, which can improve the spatial resolution of imaging approximately 100-fold. The minimal radiation damage of synchrotron radiation to specimens is another advantage over charged particles (22) such as ions and electrons. A typical experimental setup (23) for this purpose is shown schematically in Fig. 2. Polychromatic white X rays from a synchrotron storage ring are monochromatized to a selected wavelength using either a Si(111) double-crystal monochromator (DCM) or a Si–W synthetic multilayer monochromator (SMM). Using the DCM, the incident photon energy can be varied continuously by rotating the crystal around the horizontal axis perpendicular to the incident beam. Using the SMM, the photon energy can be changed by tilting the multilayer assembly about the vertical axis perpendicular to the incident beam; however, the entire SMM system may then need rearrangement. Cooling the SMM may also be necessary; the multilayer substrate is usually attached to a copper block, which can be water-cooled. The energy resolution ΔE/E is about 10−4 for the
Figure 1. Estimated error σrel in reconstructed image vs. signal area a% for point, line, and area scans: (a) with, (b) without constant background (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Synchrotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1988, Elsevier Science with permission).
Figure 2. Schematic representation of the experimental setup for synchrotron radiation (SR) fluorescent imaging: The incident SR is monochromatized by either a double-crystal monochromator (DCM) or a synthetic multilayer monochromator (SMM) and is then focused by the K–B mirror system (M1 and M2). An ionization chamber (IC) and a Si(Li) SSD are used to monitor the incident beam and the fluorescence (23) (courtesy of A. Iida et al.; reprinted from Nucl. Instrum. Methods B82, A. Iida and T. Noma, Synchrotron X-ray Microprobe and its Application to Human Hair Analysis, pp. 129–138, copyright 1993, Elsevier Science with permission).
DCM and 10−2 for the SMM. A double-slit system is usually used to trim the incident beam to a desired size, from 750 × 750 µm down to as small as 3 × 4 µm for point scans. The intensity of the incident beam is monitored by an ionization chamber (IC), as usual in synchrotron experiments. A Kirkpatrick–Baez mirror system that consists of a vertical mirror M1 and a horizontal mirror M2 is placed after the IC to modulate the beam size for spatial resolution and the photon flux for sensitivity. Usually, elliptical mirrors are used to reduce spherical aberration (14,24). The sample is located on a computer-controlled X, Y translation stage. The translation step varies from 5 to 750 µm; smaller steps can be achieved by special design of the translation stage. The fluorescence emitted from the sample at a given position is detected by a Si(Li) solid-state detector. The fluorescent intensity as a function of the sample position (x, y) is recorded and displayed on a personal computer. The following examples are X-ray fluorescent images obtained by using this specific setup.
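The point-scan acquisition described above (translate the stage, count fluorescence at each position, store the count under that position) can be sketched as follows. The `measure` callback is a hypothetical stand-in for reading the Si(Li) detector; it is not part of the published setup.

```python
import numpy as np

def raster_scan_image(measure, x_steps, y_steps, step_um=5.0):
    """Acquire a fluorescence map by point scanning.

    measure(x_um, y_um) is a placeholder for reading the detector
    count at a given stage position (an assumption for illustration).
    """
    image = np.zeros((y_steps, x_steps))
    for iy in range(y_steps):
        for ix in range(x_steps):
            # Move the stage to (ix*step, iy*step) and record the count.
            image[iy, ix] = measure(ix * step_um, iy * step_um)
    return image

# Usage with a dummy "detector" that sees a bright spot at (40, 40) um:
dummy = lambda x, y: np.exp(-((x - 40) ** 2 + (y - 40) ** 2) / 200.0)
img = raster_scan_image(dummy, x_steps=16, y_steps=16)   # 16 x 16 at 5 um/step
print(img.shape)                                          # (16, 16)
print(np.unravel_index(img.argmax(), img.shape))          # (8, 8)
```

A 16 × 16 grid at 5 µm/step matches the hair-scan conditions quoted later in this section.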
metal contamination in mammals. The following example involves the Hg distribution in rat hair. Hair samples from rats endogenously exposed to methylmercury (MeHg) by internal administration were subjected to synchrotron X-ray fluorescent measurement at an incident photon energy of 14.28 keV, using an experimental setup similar to that shown in Fig. 2. The beam size was 5 × 6 µm². Figure 3 shows the fluorescent spectrum of the hair, collected in 50 minutes. The presence of Hg, Zn, and S is clearly seen from the Hg Lα, Hg Lβ, Zn Kα, and S Kα lines in the spectrum. The elemental images of Hg Lα, Zn Kα, and S Kα from a hair cross section of the MMC (methylmercury chloride)-treated rat 13.9 days after the first administration are shown in Fig. 4. The sample was obtained at 1 mm
X-ray Fluorescent Images of Biological Tissues
Two-dimensional Distribution of Trace Elements in a Rat Hair Cross Section (25). Trace element analysis of hair samples has been widely employed for biological monitoring of health conditions and for environmental investigations of heavy metal exposure and contamination (26,27). Consider a heavy toxic metal such as Hg, which is taken up from the blood into the hair of animals and humans after exposure to methylmercury (MeHg), a commonly encountered and widely used form of environmental mercury. MeHg is known to damage the central nervous system (28) by penetrating the blood–brain barrier (29). The effects on biological systems and the dynamics of the distribution of Hg in the organs of animals exposed to MeHg have long been investigated (21,30). Because hair specimens provide a historical record of trace elements and are easily accessed, hair is a good bioindicator of
Figure 3. X-ray fluorescence spectrum obtained from the hair cross section of a rat given MMC (25) (courtesy of N. Shimojo et al.; reprinted from Life Sci. 60, N. Shimojo, S. Homma-Takeda, K. Ohuchi, M. Shinyashiki, G. F. Sun, and Y. Kumagi, Mercury Dynamics in Hair of Rats Exposed to Methylmercury by Synchrotron Radiation X-ray Fluorescence Imaging, pp. 2,129–2,137, copyright 1997, Elsevier Science with permission).
Figure 4. (a) Optical micrograph and X-ray fluorescent images of (b) Hg Lα, (c) Zn Kα, and (d) S Kα from a hair of an MMC-treated rat (25) (courtesy of N. Shimojo et al.; reprinted from Life Sci. 60, N. Shimojo, S. Homma-Takeda, K. Ohuchi, M. Shinyashiki, G. F. Sun, and Y. Kumagi, Mercury Dynamics in Hair of Rats Exposed to Methylmercury by Synchrotron Radiation X-ray Fluorescence Imaging, pp. 2,129–2,137, copyright 1997, Elsevier Science with permission).
from the root end of the hair. The sample thickness was about 25 µm. The scanning condition was 16 × 16 steps at 5 µm/step, and the counting time was 10 s per step. An optical micrograph of the hair cross section is also shown for comparison (Fig. 4a). The black-and-white contrast, classified into 14 levels from maximum to minimum, represents the element concentrations linearly. Figure 4c clearly shows that Zn is localized in the hair cortex rather than the medulla after MMC administration. In addition, endogenous exposure to MMC results in preferential accumulation of Hg in the hair cortex rather than in the medulla or cuticle (Fig. 4b). However, for hair exposed exogenously to MMC (external administration), detailed analysis of the fluorescent images, together with the fluorescent spectra across the hair cross section, indicates that Hg is distributed on the surface of the hair cuticle rather than in the cortex (25). This result is consistent with element characterization by flameless atomic absorption spectrometry (FAAS). Similarly, the Hg concentration profile along the length of a single hair can also be determined by using this imaging technique. Human hair can also be analyzed by using the same X-ray fluorescent imaging technique (23). The distribution of the elements S, Ca, Zn, Fe, and Cu in a hair cross section, similar to Fig. 3, can be determined in the same way. The concentrations of these elements measured by quantitative analysis of the fluorescent intensity are comparable with the values obtained by other techniques (31–33). This imaging technique can also be applied to dynamic studies of metal contamination in hair and blood because the prepared samples can be used repeatedly owing to negligibly small radiation damage.
For example, fluorescent images of hair cross sections cut every 1 mm from the root end of MMC-treated rats show that the Hg concentration, distributed in the center of the cortex, first increases to about 1,000 µg/g during roughly 10 days after MMC administration and then decreases to about 600 µg/g over another 10 days (25). Thus, the history of Hg accumulation in animals can be clearly revealed. In summary, the X-ray fluorescent imaging technique is useful in revealing the distributions and concentrations of metal elements in hair and in providing dynamic information about the pathway of metal exposure in animals and the human body.
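The 14-level linear gray mapping used to display the hair images can be sketched as follows; this is a generic linear quantization assumed for illustration, not the authors' display software.

```python
import numpy as np

def to_gray_levels(counts, n_levels=14):
    """Linearly map fluorescence counts onto n_levels display levels,
    0 = minimum concentration, n_levels - 1 = maximum."""
    counts = np.asarray(counts, dtype=float)
    lo, hi = counts.min(), counts.max()
    if hi == lo:
        # Flat image: everything maps to the lowest level.
        return np.zeros(counts.shape, dtype=int)
    levels = np.floor((counts - lo) / (hi - lo) * n_levels).astype(int)
    # The maximum count would land on n_levels; clip it to the top level.
    return np.clip(levels, 0, n_levels - 1)

print(to_gray_levels([0, 50, 100]).tolist())  # [0, 7, 13]
```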
Distribution of Cu, Se, and Zn in Human Kidney Tumors. Copper (Cu), selenium (Se), and zinc (Zn) are important metal cofactors in metalloenzymes and metalloproteins. Their presence in these enzymes and proteins directly influences many biochemical and physiological functions. As is well known, ceruloplasmin (34) and dopamine-β-hydroxylase (35) are involved in iron metabolism and neurotransmitter biosynthesis, respectively; both contain Cu. Se is an essential component of glutathione peroxidase, which plays an important role in protecting organisms from oxidative damage via the reduction of lipoperoxides (R–O–O–R) and hydrogen peroxide (36). Zn is also an important metal element in
DNA polymerase and appears in many enzymes such as carbonic anhydrase, alcohol dehydrogenase, and alkaline phosphatase (37,38). Recent studies have emphasized the physiological roles of these essential trace elements in connection with possible causes of cancer. For example, investigations of the blood serum levels of these trace elements in cancer patients show their possible involvement in many cancerous conditions (39–44). In particular, the serum levels of Cu and Zn and the Cu/Zn ratios of patients who have malignant neoplasms have been used as indicators for assessing disease activity and prognosis. Increased serum Cu levels and decreased serum Zn levels have been found in patients who have sarcomas (39), lung cancer (40), gynecologic tumors (41), and carcinoma of the digestive system (stomach) (42). However, only a limited number of works in the literature are concerned with the distribution of Cu, Se, and Zn in malignant neoplasms. In the following, two-dimensional distributions of Cu, Se, and Zn in human kidney tumors were determined and visualized using nondestructive synchrotron radiation X-ray fluorescent imaging (45). The distributions of Cu, Se, and Zn in cancerous and normal renal tissues and the correlations among these distributions were studied. The experimental setup shown in Fig. 2 and an incident photon energy of 16 keV were used. The experimental conditions were 750 × 750 µm for the beam size, 750 µm/step for the sample translation, and 10 s counting time for each position. The Cu Kα, Zn Kα, and Se Kα fluorescence lines were identified in the X-ray fluorescent spectra. Compare the optical micrograph shown in Fig. 5a to the spatial distributions of the fluorescent intensities of Cu Kα, Zn Kα, and Se Kα for normal (N) and cancerous (C) renal tissues shown in Fig. 5b, c, and d, respectively. The frame of
Figure 5. Chemical imaging of trace elements in normal (N) and cancerous (C) renal tissue from an aged female: (a) Optical micrograph of a sliced sample and the distributions of (b) Zn, (c) Cu, and (d) Se (45) (courtesy of S. Homma-Takeda et al.; translated from J. Trace Elements in Exp. Med. 6, S. Homma, A. Sasaki, I. Nakai, M. Sagal, K. Koiso, and N. Shimojo, Distribution of Copper, Selenium, and Zinc in Human Kidney Tumours by Nondestructive Synchrotron X-ray Fluorescence Imaging, pp. 163–170, copyright 1993, Wiley-Liss Inc. with permission of John Wiley & Sons, Inc. All rights reserved).
the fluorescent images covers the entire tissue sample. The upper left portion of the frame corresponds to the cancerous tissue, and the lower right portion corresponds to the normal tissue. Darker pixels indicate positions of higher metal concentration. The images reveal that Cu, Zn, and Se accumulate more densely in normal tissue than in cancerous tissue. The concentrations of these metals in the samples determined by ICP-AES (inductively coupled plasma atomic emission spectrometry) agree qualitatively with those estimated from the intensities of the fluorescent images. The average Zn concentration in the cancerous tissues is 12.30 ± 5.05 µg/g, compared to 19.10 ± 10.19 µg/g for the normal tissues. The average concentration of Cu is 0.991 ± 0.503 µg/g in the cancerous tissues, compared to 17.200 ± 0.461 µg/g in the normal tissues. The decreases in the Zn and Cu concentrations in the cancerous tissues are statistically significant across hundreds of measured data sets. Moreover, the correlation coefficients among the distributions of the trace elements can be calculated from the X-ray intensity data at each analytical point. In general, the correlation coefficients among the metal elements Cu, Se, and Zn in cancerous tissues are lower than those in normal tissues. In particular, the correlation between Cu and Zn in the cancerous tissue investigated is decreased by more than 30% compared with that in the normal tissues. It should be noted that the change in trace element levels in cancerous tissues is not the same for all types of tumors. Moreover, Zn levels in cancerous tissues vary in different organs. According to (46), tumors located in organs that normally exhibit a low Zn concentration have a Zn accumulation that is similar to or greater than that in the tissue around the tumor.
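The per-pixel correlation analysis mentioned above can be sketched as follows; the 2 × 2 element maps are toy values for illustration, not the measured data.

```python
import numpy as np

def element_correlations(maps):
    """Pearson correlation coefficients between per-pixel fluorescence
    maps of different elements (dict of element name -> 2-D array)."""
    names = list(maps)
    # Flatten each map so every pixel becomes one observation.
    stacked = np.vstack([np.asarray(maps[n], dtype=float).ravel()
                         for n in names])
    return names, np.corrcoef(stacked)

# Hypothetical maps: Zn and Cu vary together; Se varies oppositely.
maps = {
    "Zn": [[1, 2], [3, 4]],
    "Cu": [[2, 4], [6, 8]],
    "Se": [[4, 3], [2, 1]],
}
names, r = element_correlations(maps)
print(round(r[names.index("Zn"), names.index("Cu")], 3))  # 1.0
print(round(r[names.index("Zn"), names.index("Se")], 3))  # -1.0
```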
Tumors located in organs that normally have a high Zn concentration, by contrast, exhibit a lower uptake of Zn than tissues not involved in the growth of the tumor. The kidney is an example of an organ that has a high Zn concentration; therefore, the observed decrease in Zn concentration in the kidney tumor is expected. The age dependence of the distribution of these metal elements in the human kidney can also be observed in X-ray fluorescent images (47). Figures 6a–d and 6e–h show the X-ray fluorescent images of Zn, Cu, and Se in adult human kidneys from a 22-year-old and a 61-year-old man, respectively. The same experimental setup and conditions as those used for imaging the kidney tumor were employed. The exposure time was 6 s/point for the sample from the 22-year-old man and 10 s/point for that from the 61-year-old man. Clearly, Cu, Zn, and Se are more concentrated in the renal cortex than in the medulla. This result agrees with those reported in (48–50) and is consistent with the functions of each tissue: the kidney cortex contains the glomeruli, responsible for filtering waste materials from the blood, and the proximal tubules, responsible for reabsorbing metal ions. The elemental concentrations of Cu and Zn in the kidney tissue determined by both ICP-AES and X-ray fluorescent analysis are 1.67 ± 0.22 µg/g (Cu) and 13.3 ± 2.6 µg/g (Zn) for the 22-year-old man and 1.06 ± 0.34 µg/g (Cu) and 4.42 ± 1.52 µg/g (Zn) for the 61-year-old man. The correlation coefficients between the two trace elements calculated from the X-ray intensity data indicate that
Figure 6. X-ray fluorescent images of trace elements in adult human kidney: (a) photograph of a sliced sample and the distributions of (b) Zn, (c) Cu, and (d) Se for a 22-year-old man; (e) photograph and the distributions of (f) Zn, (g) Cu, and (h) Se for a 61-year-old man (field width 750 × 750 µm²; medulla: the upper central region; cortex: the periphery) (47) (courtesy of S. Homma-Takeda et al.; reprinted from Nucl. Instrum. Methods B103, S. Homma, I. Nakai, S. Misawa, and N. Shimojo, Site-Specific Distribution of Copper, Selenium, and Zinc in Human Kidney by Synchrotron Radiation Induced X-ray Fluorescence, pp. 229–232, copyright 1995, Elsevier Science with permission).
the correlation between Zn and Cu is higher for the 22-year-old man than for the 61-year-old man. Moreover, the correlation between Cu and Se is the lowest among the Zn–Cu, Zn–Se, and Cu–Se correlations (47). In summary, the X-ray fluorescent imaging technique is well suited for evaluating levels of trace elements in human organs that may be influenced by smoking, cancer, or other diseases. Consequently, the physiological roles of trace elements could be better understood. Furthermore, using synchrotron radiation, this X-ray fluorescent
imaging technique causes negligible damage to samples. As a result, samples can be examined histologically after analysis. This makes X-ray fluorescent imaging a useful technique for histochemical analysis of biological specimens. The X-ray fluorescent imaging technique can be used in combination with other techniques, such as isoelectric focusing agarose gel electrophoresis (IEF-AGE), histochemical staining, and X-ray microprobes, to obtain functional and structural information about biological substances. For example, using IEF-AGE, direct detection of structural or functional alterations of proteins attributed to the binding of mercury (51) and efficient monitoring of changes in mRNA levels, protein contents, and enzyme activities for brain Cu,Zn- and Mn-SOD (superoxide dismutase) after MMC administration (52) are feasible. Furthermore, detailed distributions of metal elements and morphological changes in histochemically stained biological specimens (53), as well as high-resolution fluorescent images without tedious sample preparation and treatment using X-ray microprobes (54), can also be obtained. X-ray Fluorescent Imaging by Line Scanning This example demonstrates the X-ray fluorescence imaging of a Ni-wire cross by synchrotron radiation using line scans. Images different from those obtained by point scans are expected. A schematic of the experimental setup for line scanning is shown in Fig. 7 (19). The incident beam is collimated by a beam aperture (1) and focused by a mirror (2). An additional guard aperture (3) is placed between the linear slit (4) and the mirror to cut down the parasitic scattering from the first aperture and the mirror. A sample aperture (5) in front of the sample holder (6) prevents X rays scattered by the linear slit from participating in the sample excitation. The sample holder can be rotated around the incident beam and translated across the line-shaped incident beam.
A solid-state detector (7) is used to monitor the fluorescence that emanates from the sample. During excitation, the sample, a Ni-wire cross, is translated vertically and then rotated around the incident beam. Because a monochromator is not used, both Ni Kα and
Figure 7. Experimental setup for X-ray fluorescent imaging using line scan: (1) aperture, (2) mirror, (3) antiscattering stop, (4) slit-aperture, (5) aperture-stop, (6) sample holder, (7) solid-state detector (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Synchrotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1988, Elsevier Science with permission).
Figure 8. Measured Ni K fluorescent intensity distribution of a Ni-wire cross (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Synchrotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1988, Elsevier Science with permission).
Ni Kβ are excited. Figure 8 shows the two-dimensional Ni Kα and Ni Kβ fluorescent distribution measured at a translation of 9 µm per step and a rotation of 3° per step. Clearly, the measured distribution is far from the real image of the cross, which can usually be obtained by point scans. The spatial resolution of Fig. 8 depends on the sizes of the apertures and slits used and on the distances between the sample, the apertures, and the detector. The accuracy in rotating the sample is also an important factor. Using this kind of setup, 10-µm resolution can easily be achieved. By applying the filtered back-projection technique (20) for image reconstruction from the line-scanned distribution, the final reconstructed image of the Ni cross shown in Fig. 9 is obtained. This line-scanning fluorescent imaging method is equally applicable to imaging trace elements in various types of specimens (16). Soft X-ray Fluorescent Imaging at Submicron Scale Resolution The spatial resolution of soft X-ray fluorescent imaging depends in some cases on the fluorescent material used in making the screen for image recording and display. Among the various materials available for making fluorescent screens, polycrystalline phosphor is most frequently used to detect X rays and electrons. The required features of a material such as phosphor for imaging are high conversion efficiency and high spatial resolution. However, polycrystalline phosphor (powder) usually suffers from its particle size and from the scattering of fluorescent light among powder particles, which limit the spatial resolution (≥1 µm) and the resultant conversion efficiency. To improve the optical quality of fluorescent
Figure 9. Reconstructed image of a Ni-wire cross (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Synchrotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1988, Elsevier Science with permission).
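The filtered back-projection reconstruction applied to line-scanned data can be sketched as follows. This is a minimal numpy sketch, not the authors' code, and the single-point test object is an assumption for illustration rather than the Ni-wire data.

```python
import numpy as np

def fbp_reconstruct(sinogram, angles_deg):
    """Filtered back-projection of a parallel-beam sinogram.

    sinogram: (n_angles, n_pos) array of line-scan profiles taken at
    the given sample rotation angles (degrees).
    """
    n_angles, n = sinogram.shape
    # Ramp filter applied along the translation axis in Fourier space.
    ramp = np.abs(np.fft.fftfreq(n))
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp,
                                   axis=1))
    # Back-project each filtered profile across the image plane.
    c = (n - 1) / 2.0
    y, x = np.mgrid[0:n, 0:n] - c
    image = np.zeros((n, n))
    for profile, theta in zip(filtered, np.deg2rad(angles_deg)):
        s = x * np.cos(theta) + y * np.sin(theta) + c  # detector coordinate
        image += np.interp(s, np.arange(n), profile, left=0.0, right=0.0)
    return image * np.pi / n_angles

# Hypothetical test object: a single bright point 10 pixels right of
# center; each line-scan profile is then a shifted delta function.
n = 65
angles = np.arange(0, 180, 3)        # 3 degrees per step, as in the text
sino = np.zeros((len(angles), n))
for i, th in enumerate(np.deg2rad(angles)):
    sino[i, int(round(10 * np.cos(th) + 32))] = 1.0
rec = fbp_reconstruct(sino, angles)
print(np.unravel_index(rec.argmax(), rec.shape))  # peak near row 32, col 42
```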
screen materials, the use of plastic scintillators, or doped polymer films, has been suggested to provide better spatial uniformity. However, the efficiency of this material is lower than that of phosphor powder (55). Recently, phosphor single crystals have been grown by liquid-phase epitaxy (LPE) (56,57). The resulting monocrystalline phosphor possesses excellent optical quality and high conversion efficiency. The short absorption depth of soft X rays in the phosphor (200 Å for λ = 139 Å) provides a well-defined and localized fluorescent image without any smearing. The high conversion efficiency and the high sensitivity of the phosphor crystal over the emission spectrum in the soft X-ray regime afford low-light-level detection. Moreover, because the single-crystal phosphor is transparent to the emitted light and possesses a superpolished crystal surface, deleterious scattering effects are eliminated, and a well-defined X-ray image plane is achieved. Collectively, these superb optical properties of the single-crystal phosphor lead to soft X-ray fluorescent imaging at submicron resolution, which is described here (58). The imaging experiment was performed by using soft X rays from a synchrotron storage ring. The schematic representation of the experimental setup is given in Fig. 10. It comprises a 20X reduction Schwarzschild camera working at 139 Å, a fluorescent crystal, an optical microscope, and an intensified charge-coupled device (ICCD) camera. The Schwarzschild camera contains two highly reflecting mirrors that have multilayer coatings. The main component of the optical microscope is an oil immersion 100X objective that has a numerical aperture of 1.25. Referring to Fig. 10, an X-ray image
Figure 10. Experimental setup for submicron soft X-ray fluorescent imaging (58) (courtesy of B. LaFontaine et al.; reprinted from Appl. Phys. Lett. 63, B. La Fontaine, A. A. MacDowell, Z. Tan, D. L. Taylor, O. R. Wood II, J. E. Bjorkholm, D. M. Tennant, and S. L. Hulbert, Submicron, Soft X-ray Fluorescence Imaging, pp. 282–284, copyright 1995, American Institute of Physics with permission).
of a transmission mask is projected onto the image plane of the Schwarzschild soft X-ray camera, at which a commercially available phosphor crystal [an STI-F10G crystal, manufactured by Star Tech Instruments Inc. (58)] is positioned to convert the soft X-ray image into the visible. This visible image is well localized and easily magnified by the optical microscope. The magnified image, located about 11 cm from the crystal outside the vacuum chamber, is then magnified further by a 5X long-working-distance microscope and viewed by the ICCD camera. The X-ray fluorescent spectrum of the fluorescent crystal STI-F10G excited by soft X rays from 30 to 400 Å is shown in Fig. 11. The broad fluorescence in the emission spectrum is useful for uniform image mapping. Figure 12 illustrates the submicron-resolution fluorescent image (Fig. 12a) of a mask, which can be compared with the actual mask shown in Fig. 12b. As can be seen, all of the lines, whose widths range from 0.5 to 0.1 µm after the reduction by the Schwarzschild camera, can be clearly identified in the fluorescent image. To improve the resolution of the fluorescent imaging system further, the diffraction limit (0.1–0.2 µm, the Rayleigh criterion) of the microscope used needs to be considered. According to the Rayleigh criterion, the smallest size that can be resolved by a lens is proportional to the ratio between the wavelength used and the numerical aperture. The latter is related to the refractive index of the lens. Therefore, using fluorescent crystals that have a high index of refraction and a shorter wavelength emitter may put the resolution below a tenth of a micron.
This high-resolution X-ray fluorescent imaging technique has potential in a variety of applications, especially in optimizing designs for deep ultraviolet and extreme ultraviolet lithographic exposure tools.
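The Rayleigh limit discussed above can be evaluated directly. The 550 nm emission wavelength below is an assumed value near the middle of the phosphor's visible emission band (Fig. 11), not a figure from the study.

```python
# Rayleigh diffraction limit of the light-collecting objective:
# smallest resolvable size d = 0.61 * wavelength / NA.
def rayleigh_limit_um(wavelength_nm, numerical_aperture):
    """Return the Rayleigh resolution limit in micrometers."""
    return 0.61 * wavelength_nm * 1e-3 / numerical_aperture

# Assumed 550 nm emission through the NA = 1.25 oil-immersion objective:
print(round(rayleigh_limit_um(550, 1.25), 3))  # 0.268
```

Shortening the emission wavelength or raising the numerical aperture lowers this limit, which is the design argument made in the text.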
X-ray Fluorescent Holographic Images at Atomic Resolution
Figure 11. X-ray fluorescent spectrum of the STI-F10G crystal (58) (courtesy of B. LaFontaine et al.; reprinted from Appl. Phys. Lett. 63, B. La Fontaine, A. A. MacDowell, Z. Tan, D. L. Taylor, O. R. Wood II, J. E. Bjorkholm, D. M. Tennant, and S. L. Hulbert, Submicron, Soft X-ray Fluorescence Imaging, pp. 282–284, copyright 1995, American Institute of Physics with permission).
Figure 12. Soft X-ray fluorescent mask (a) and image (b) (58) (courtesy of B. LaFontaine et al.; reprinted from Appl. Phys. Lett. 63, B. La Fontaine, A. A. MacDowell, Z. Tan, D. L. Taylor, O. R. Wood II, J. E. Bjorkholm, D. M. Tennant, and S. L. Hulbert, Submicron, Soft X-ray Fluorescence Imaging, pp. 282–284, copyright 1995, American Institute of Physics with permission).
The relative positions of atoms in a crystal unit cell, the so-called single-crystal structure, are valuable structural information for a crystalline material. Diffraction methods are usually applied to determine crystal structure. However, because a diffraction pattern provides intensity but not phase information for the diffracted beams, the so-called phase problem (59), unique determination of crystal structure cannot usually be achieved. Several methods have been developed to solve this phase problem: the Patterson method (60), molecular replacement (60) and isomorphous replacement (61), direct methods (62,63), multiwavelength anomalous dispersion (MAD) (64), and multiple diffraction (65,66). It is most desirable, however, to image the crystallographic structure and to visualize the atomic arrangement in a crystal directly. The holographic approach is certainly a promising candidate that is well suited to obtain phase information for a diffracted beam. X-ray fluorescent holographic imaging has been proposed to attain atomic resolution and element specificity (67–79). Very recently, effective methods for mapping the structure of single crystals and crystal surfaces at atomic resolution have been developed, and they are described here. Normal (single-energy) X-ray Fluorescent Holographic Imaging (NXFH). The formation of a hologram involves an objective beam (wave) and a reference beam (wave) (80). The phase information (or contrast) can be retained (or revealed) by the interference between an objective (scattered) beam and a reference beam. For X-ray fluorescence, the internal X-ray source inside a crystal is the key to holographic formation. This internal X-ray source is the emitted fluorescence from atoms inside the crystal excited by incident X radiation.
Figure 13a shows the schematic of single-energy X-ray fluorescent holography, where externally excited fluorescence from a target atom A is treated as a holographic reference beam. Fluorescence scattered from neighboring atoms serves as the objective beam. The interference between the two modulates the scattered fluorescent intensity, which is monitored by a detector located at a large distance from the crystal (73). By moving the detector around the crystal, is recorded, where k is the wave an intensity pattern I(k) vector of the scattered beam. The Fourier transform of I(k) yields an atomic resolution hologram. The basic formula that describes the scattered intensity is given by (72). I(k) = Io /R2 [1 + | I(k)
refractive index of the lens. Therefore, using fluorescent crystals that have a high index of refraction and a shorter wavelength emitter may put the resolution below a tenth of a micron.
aj |2 + 2Re(
aj )],
(5)
where I0 is the intensity of the source atom A, aj is the amplitude of the objective wave scattered by the jth atom inside the crystal, and R is the distance between the sample and the detector. Re means the real part. The first term in the square bracket represents the intensity of the reference wave without interaction with neighboring atoms. The second term corresponds to the intensity of the scattered objective waves. The last term results from the interference between the reference and objective waves. Equation (5) also holds for multiple scattering. Because the scattering cross section of X rays is relatively small compared with that of electrons, the main contribution to the interference is from the holographic process. Usually, for weak scattering of photons, the second term is about three orders of magnitude smaller than the interference term. For a crystal that has multiple atoms, the amplitudes of the objective and the interference waves are much more complicated to estimate, and separating the two is difficult. In addition, because many atoms radiate independently, the many holograms thus formed complicate the image. The basic requirements for obtaining a single hologram are the following: (1) The crystal size needs to be small; the crystal size illuminated by the incident X rays has to be smaller than Rλ/r, where λ is the wavelength of the X-ray fluorescence and r is the atomic resolution expected. (2) All of the irradiating atoms must have the same environment, and isotropy of the irradiation centers distributed in the sample is essential. (3) Due to the translational symmetry imposed by the crystal, the value of the second term of Eq. (5) can be substantial. As pointed out in (68), the spatial frequency of this term is much larger than that of the holographic information provided by the nearest neighbor radiating atoms; therefore, a low-pass filter (68) can be used to remove the contribution of the second term. (4) The effect of absorption by the crystal on the scattered intensity, which depends on the crystal shape and the scattering geometry, needs to be corrected to see the expected 0.3% oscillatory signals in the hologram. (5) Correcting for the dead time of the detector is necessary. (6) Reducing the noise by using a high-pass filter is required (78). In addition to these requirements, a large angular scanning range must be covered to obtain a three-dimensional image that has isotropic atomic resolution. Some diffraction effects, such as the formation of Kossel lines (65,81) generated by the widely distributed fluorescence in the crystal, need to be suppressed.

X-RAY FLUORESCENCE IMAGING

Figure 13. Schematic of (a) normal X-ray fluorescent holographic (NXFH) imaging and (b) inverse X-ray fluorescent holographic (IXFH) imaging (73) (courtesy of T. Gog et al.; reprinted from Phys. Rev. Lett. 76, T. Gog, P. M. Len, G. Materlik, D. Bahr, C. S. Fadley, and C. Sanchez-Hanke, Multiple-Energy X-Ray Holography: Atomic Images of Hematite (Fe2O3), pp. 3,132–3,135, copyright 1996, American Physical Society, with permission).

A typical experimental setup for normal X-ray fluorescent holographic (NXFH) imaging is shown in Fig. 14 (72). The incident X rays, from a conventional sealed X-ray tube or a synchrotron radiation facility, are monochromatized by a graphite crystal and impinge at 45° on the crystal sample, which is mounted on a goniometer that provides θ-angle tilting and azimuthal φ rotation about the normal to the crystal surface. A Ge solid-state detector cooled by liquid nitrogen is used to monitor the scattered fluorescence. The two-dimensional hologram is measured by turning the sample around the φ axis and moving the detector by rotating the θ axis in the plane of incidence defined by the incident beam and the φ axis. The upper panel of Fig. 15 shows the 2,402-pixel hologram of a SrTiO3 single crystal after absorption correction and removal of the incident-beam contribution. The crystal is plate-like, 30 mm in diameter, and 0.5 mm thick. It crystallizes in the perovskite-type SrTiO3 structure, whose lattice constant is a = 3.9 Å. The Sr atoms form a simple cubic lattice. The large surface is parallel to the (110) plane. Mo Kα radiation (E ∼ 17.425 keV) is used to excite Sr Kα, where the Sr K edge is 16.105 keV. The photon-counting statistic for each pixel is about 0.05%, and the overall anisotropy, that is, the spatial distribution of interference signals in the measured intensities, is 0.3%, in agreement with the theoretical expectations described by Eq. (5). The atomic arrangement is reconstructed by using the Helmholtz–Kirchhoff formula (78,82) and the proper absorption and dead-time corrections mentioned. Three atomic planes parallel to the crystal surface are depicted in the lower panel of Fig. 15. Only the Sr atoms, which have a large atomic scattering factor, can be observed. Twin images are known to occur in traditional holography. However, due to the centrosymmetry of the crystal lattice, the twin and real images of different atoms occur at the same position. Because these two waves (one of the twin image and the other of the real image) may have different phases, the interference between the two can cause an intensity modulation close to the atomic positions. This effect can lead to an appreciable shift in the positions of the atoms in question or to cancellation of a given atomic image if the two waves are out of phase. The different sizes of the atoms in Fig. 15 are due mainly to the flat-plate geometry of the crystal, which gives different resolutions in the in-plane (parallel to the crystal surface) and out-of-plane (perpendicular) directions.

Figure 14. Experimental setup for NXFH imaging using a conventional X-ray source: the monochromator (graphite), the sample (SrTiO3), and the detector (liquid nitrogen (LN2) cooled SSD) (72) (courtesy of M. Tegze et al.; reprinted with permission from Nature 380, M. Tegze and G. Faigel, X-Ray Holography with Atomic Resolution, pp. 49–51, copyright 1996 Macmillan Magazines Limited).

Figure 15. (a) The normal X-ray fluorescent hologram of SrTiO3 and (b) the three-dimensional reconstructed image of the SrTiO3 structure (only Sr atoms are revealed) (72) (courtesy of G. Faigel et al.; reprinted with permission from Nature 380, M. Tegze and G. Faigel, X-Ray Holography with Atomic Resolution, pp. 49–51, copyright 1996 Macmillan Magazines Limited).

Inverse (Multiple-Energy) X-ray Fluorescent Holographic Imaging (IXFH)
Multiple-energy holographic imaging (83–85) is well established for photoelectrons, Auger electrons, backscattered Kikuchi lines, and diffuse low-energy electrons and positrons. The idea of using multiple energies for X-ray fluorescence has been adopted in inverse X-ray fluorescent holographic imaging (IXFH). A schematic representation of IXFH is shown in Fig. 13b (73). The radiation source and detector are interchanged compared to their positions in normal X-ray fluorescent holography (NXFH) (Fig. 13a). The detector used in NXFH is now replaced by the incident monochromatic radiation of energy EK, which produces an incident plane wave at the sample that propagates along the wave vector k. As the plane wave moves toward atom A, a holographic reference wave is formed. The scattered radiation from the neighboring atoms serves as the holographic objective wave. The overlap and interaction of the reference and the objective waves at atom A excite it, which results in the emission of fluorescence of energy EF. The intensity of the fluorescence is proportional to the strength of the interference. Thus, atom A, formerly the source of radiation in NXFH, now serves as a detector. By scanning the incident beam direction k and changing its energy EK, the fluorescent intensity distribution I(k) emitted at atom A is collected as a function of k. The intensity distribution I(k) is then Fourier transformed to produce an atomic holographic image. The two holographic schemes, NXFH and IXFH, are, in principle, equivalent according to the reciprocity theorem of optics (80). The main difference between the two schemes is that NXFH uses monochromatic fluorescence from internally excited atoms to produce a holographic scattering field, which is measured in the far field (8). The IXFH scheme, on the other hand, utilizes energy-tuned external radiation to generate a holographic scattering field, which is detected in the near field (8).
Therefore, the fluorescence of the latter plays no role in establishing the scattering field for the holographic process. Because an energy-tunable source is required for IXFH, the use of synchrotron radiation is essential. Figure 16 shows the experimental configuration for IXFH measurements using synchrotron radiation. Monochromatic radiation, after the double-crystal monochromator, is focused onto the sample, which is mounted on a six-circle diffractometer. The sample can be tilted (θ angle) in the plane of incidence and rotated through the azimuthal angle φ around an axis normal to the crystal surface. Fluorescence emitted from the atoms in the sample excited by the incident radiation is collected by a cylindrical graphite analyzer and detected by a proportional counter. The inset of Fig. 16 shows a typical scan at E = 9.65 keV and θ = 55° for a (001)-cut hematite (Fe2O3) natural crystal slab (73). The effect of the detector dead time has been corrected. The mosaic spread of the crystal is 0.01°. The angular acceptance of the curved analyzer is approximately 14° in the direction of curvature and 0.5° along the straight width. As can be seen, the signal modulation is about 0.5% of the averaged background. The local structural images, such as Kossel lines, are suppressed by the large angular acceptance of the curved analyzer.

Figure 16. Experimental setup for multiple-energy IXFH imaging. The inset shows the intensity of fluorescence, corrected for detector dead time, versus the azimuthal angle (73) (courtesy of T. Gog et al.; reprinted from Phys. Rev. Lett. 76, T. Gog, P. M. Len, G. Materlik, D. Bahr, C. S. Fadley, and C. Sanchez-Hanke, Multiple-Energy X-Ray Holography: Atomic Images of Hematite (Fe2O3), pp. 3,132–3,135, copyright 1996, American Physical Society, with permission).

Hematite forms a hexagonal crystal that has lattice constants a = 5.038 Å and c = 13.772 Å. Figure 17a is the projection of the Fe atoms onto the basal plane (001) perpendicular to the c axis, where the Fe positions in two different kinds of stacking order are shown. For the first kind, the black circles represent iron atoms in an upper plane, and the gray circles are atoms in the plane 0.6 Å below the black circles. For the second kind, white circles denote upper-plane atoms, and black circles are again 0.6 Å under the white ones. Figure 17b shows the calculated fluorescent image of the Fe sites. Using the experimental arrangement mentioned, the two different Fe atom stackings are not distinguishable in the IXFH images. A superposition of Fe atom images in a holographic reconstruction of these layers occurs, as described in the previous section. The reconstructed image (Fig. 17c) of the (001) Fe layer from the experimental data indeed shows the effect of superposition (see Fig. 17a for comparison). As can be seen, six nearest neighbor Fe atoms, three for each kind of stacking order, are located 2.9 Å from the center atom. The diagonal distance of this figure corresponds to 8.7 Å. The experimental data consist of the fluorescent intensities for three incident energies, E = 9.00, 9.65, and 10.30 keV, measured in 150 hours within the ranges 45° ≤ θ ≤ 85° and −60° ≤ φ ≤ 60°; the scan window is Δθ = 5° and Δφ = 5°. The measured intensity I(k) is mapped onto the entire 2π hemisphere above the sample by considering the threefold symmetry of the c axis. The background I0(k), derived from a Gaussian low-pass convolution (86), is subtracted from the measured intensity. The theoretically reconstructed image (Fig. 17b) calculated for the clusters of Fe atoms shows good agreement with the experiment.
Figure 17. (a) The projection of Fe atom positions on the (001) plane of hematite. (b) Image calculated for the Fe sites. (c) Holographic reconstruction of the (001) Fe layer (73) (courtesy of T. Gog et al.; reprinted from Phys. Rev. Lett. 76, T. Gog, P. M. Len, G. Materlik, D. Bahr, C. S. Fadley, and C. Sanchez-Hanke, Multiple-Energy X-Ray Holography: Atomic Images of Hematite (Fe2O3), pp. 3,132–3,135, copyright 1996, American Physical Society, with permission).
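The single-scattering forward model of Eq. (5) and the Helmholtz–Kirchhoff-style reconstruction used in these experiments can be illustrated numerically. The sketch below is a deliberately simplified two-dimensional toy model, not the published analysis: the wavelength, the neighbor positions, and the unit scattering amplitude are arbitrary assumptions, the hologram is sampled on a circle of directions rather than a sphere, and the absorption, dead-time, and Kossel-line corrections are omitted.

```python
import numpy as np

# Toy NXFH model: a fluorescing emitter at the origin plus four point
# scatterers. Hologram follows Eq. (5): I/I0 - 1 = |sum a_j|^2 + 2 Re(sum a_j).
lam = 1.54                      # X-ray wavelength in angstroms (arbitrary)
k = 2.0 * np.pi / lam
atoms = np.array([[3.9, 0.0], [-3.9, 0.0], [0.0, 3.9], [0.0, -3.9]])  # neighbors (A)
f = 1.0                         # scattering amplitude (arbitrary units)

# Detector directions: unit wave vectors on a full circle (2-D toy model).
phis = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
khat = np.stack([np.cos(phis), np.sin(phis)], axis=1)

# Objective-wave amplitudes: phase is k times the path difference
# (r_j - khat . r_j) between scattered and reference waves.
r_j = np.linalg.norm(atoms, axis=1)
path = r_j[None, :] - khat @ atoms.T              # shape (ndet, natoms)
a = (f / r_j)[None, :] * np.exp(1j * k * path)
s = a.sum(axis=1)
chi = np.abs(s) ** 2 + 2.0 * s.real               # hologram, per Eq. (5)

# Helmholtz-Kirchhoff-type back-propagation onto a real-space grid.
xs = np.linspace(-6.0, 6.0, 121)
X, Y = np.meshgrid(xs, xs)
grid = np.stack([X.ravel(), Y.ravel()], axis=1)
phase = k * (grid @ khat.T - np.linalg.norm(grid, axis=1)[:, None])
U = np.abs(np.exp(1j * phase) @ chi).reshape(X.shape)

# Locate the brightest pixel away from the central emitter.
mask = np.hypot(X, Y) > 2.0
iy, ix = np.unravel_index(np.argmax(np.where(mask, U, 0.0)), U.shape)
peak_xy = np.array([X[iy, ix], Y[iy, ix]])
```

Because the toy atom set is centrosymmetric, the twin images coincide with the real ones, and the brightest off-center pixel of the reconstruction falls at a neighbor site.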
Isotropic resolution in every direction for three-dimensional reconstruction can be improved by combining the NXFH and IXFH techniques (79). Figure 18 shows the experimental setup suitable for this combined NXFH and IXFH experiment. The arrangement is similar to that shown in Figs. 14 and 16, except that the three-circle diffractometer comprises two coaxial vertical rotation (θ and θ′) goniometers. The upper one (θ′) is stacked on top of the θ circle. A sample can be mounted on the horizontal φ axis situated on the lower θ circle. The sample can be rotated as usual by varying the incident angle θ (tilting angle) and the azimuthal angle φ. A detector, for example, a high-purity germanium solid-state detector (SSD), is mounted on the θ′ circle. Again, a doubly focused curved graphite monochromator is used for synchrotron sources (see the inset in Fig. 18), and an avalanche photodiode (APD) is employed to handle the high counting rate (87). The θ and θ′ circles facilitate both the NXFH and IXFH experiments without changing the diffractometer. Special care must be taken with respect to the detector and sample motions to avoid losing any information between pixels and to maintain uniform resolution throughout the holographic image.

Figure 18. Experimental setup for combined NXFH and IXFH imaging. The inset shows a curved graphite analyzer used in the synchrotron experiments in place of the detector in the main picture (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5 Å Resolution, pp. 4,847–4,850, copyright 1999, American Physical Society, with permission).

Figures 19, 20, and 21 present the holograms and reconstructed atomic images of a circular platelet of CoO crystal (79). The large face of the crystal is parallel to the (111) plane, and the mosaic spread of the crystal is 0.3°. CoO forms a face-centered cubic (fcc) crystal whose lattice constant is a = 4.26 Å. The sample was subjected to both NXFH and IXFH imaging experiments at several photon energies. The time for data collection ranged from three hours for synchrotron sources to some 20 days for conventional sources. The measured Co Kα fluorescent intensity distribution, normalized to the incident intensity and shown in Fig. 19a, is the projection of the hologram mapped onto the surface of a sphere defined by the coordinates θ and φ, where θ ≤ 70°. The picture was obtained by a synchrotron measurement in the IXFH mode at the photon energy E = 13.861 keV, sufficient to excite Co Kα. The dominant feature in this picture is due to the strong θ-dependent absorption of the emitted fluorescence by the sample, according to its shape (78). This absorption effect can easily be corrected by a theoretical calculation (78). Figure 19b is the hologram after this correction. The wide radial stripes and the narrow conic lines (the Kossel lines) are the two main features.
The former originate from the change of the crystal orientation relative to the detector during the crystal scan, needed to keep the incident radiation source on the hemisphere above the crystal. This relative change in crystal orientation introduces a modulation of the fluorescent intensity, which can be precisely measured and subtracted. Figure 19c is the corrected hologram. The Kossel lines, sometimes called XSW (X-ray standing wave) lines, can be filtered out by a low-pass spatial filter, as already mentioned. This filtering is carried out by calculating the convolution of the hologram with a Gaussian on the surface of the sphere. The width of the Gaussian is σ ≈ λ/2πrmax, where rmax ≈ 5 Å is the maximum radius of the region for imaging and λ is the X-ray wavelength. The hologram after filtering is shown in Fig. 19d. Threefold symmetry due to the (111) surface is clearly seen in the figure. The atomic image in the plane that contains the source atom, parallel to the crystal surface, was reconstructed by using the Helmholtz–Kirchhoff formula (78,82) and is shown in Fig. 19e. The six nearest neighbor Co atoms appear at approximately the proper positions relative to the central fluorescent atom (not shown).

Figure 19. Holograms obtained at various stages: (a) the normalized intensity distribution of the measured Co Kα fluorescence; (b) intensity distribution after correction for sample absorption; (c) image after correction for the detector position and crystal orientation; (d) image after removal of the Kossel lines, using filter techniques; (e) reconstructed atomic images in the plane containing the source atom and parallel to the crystal surface. The dotted lines indicate the crystal lattice (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5 Å Resolution, pp. 4,847–4,850, copyright 1999, American Physical Society, with permission).

Clearly, Fig. 19e is merely a two-dimensional reconstructed atomic image. For three-dimensional imaging, the sampling angular range should be increased up to 4π in solid angle, and the structural information provided by the Kossel lines (or XSW lines) needs to be included. Figure 20a shows the extended hologram after accounting for these two factors; the images of the conic Kossel lines become much clearer. The low-pass filtered image is shown in Fig. 20b for comparison. The corresponding three-dimensional image reconstructed from Fig. 20b and the outline of the crystal lattice are given in Fig. 20c. Both the nearest and next nearest neighbor Co atoms are clearly seen. However, the nearest neighbor sites are slightly shifted toward the central atom due to the twin-image interference and the angular dependence of the atomic scattering factor (78). This distortion can be corrected by summing holograms taken at different energies (82); the properly phased summing of the reconstructed wave amplitudes suppresses the twin images and reduces the unwanted intensity oscillations. Therefore, the combination of NXFH and IXFH (multiple energy) could provide better image quality. Figures 21a–d show the holograms of CoO measured at E = 6.925, 13.861, 17.444, and 18.915 keV, respectively. The first and third were taken in the normal mode (NXFH) using a conventional X-ray source, and the second and fourth in the inverse mode (IXFH) using a synchrotron source. The combination of these four measurements leads to the three-dimensional atomic structure of the Co atoms shown in Fig. 21e. A slight improvement of the image quality in Fig. 21e, compared with the single-energy image in Fig. 20c, can be detected: the atomic positions of the Co atoms are closer to their real positions in the known CoO crystal, and the background noise is substantially reduced.

Figure 20. (a) Extended hologram before and (b) after employing a low-pass filter; (c) reconstructed atomic image of the Co atoms. The first nearest neighbors are shown by dashed lines, and the next nearest neighbors by dotted lines (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5 Å Resolution, pp. 4,847–4,850, copyright 1999, American Physical Society, with permission).
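The Gaussian low-pass filtering used above to suppress the sharp Kossel (XSW) lines while keeping the slow holographic oscillations can be sketched in one dimension. The kernel width follows the σ ≈ λ/2πrmax rule quoted in the text; the fringe and line amplitudes and widths below are invented purely for illustration.

```python
import numpy as np

lam = 1.54                                # X-ray wavelength (A), illustrative
rmax = 5.0                                # radius of the imaged region (A)
sigma = lam / (2.0 * np.pi * rmax)        # Gaussian width in radians (~0.049)

theta = np.linspace(0.0, np.pi / 2, 2000)
k = 2.0 * np.pi / lam
# Slow holographic oscillation (neighbor at ~3 A) plus a sharp Kossel-like line.
fringe = 0.003 * np.cos(k * 3.0 * theta)
kossel = 0.010 * np.exp(-((theta - 0.8) / 0.002) ** 2)
holo = fringe + kossel

# Build a normalized Gaussian kernel on the angular grid and convolve.
dt = theta[1] - theta[0]
half = int(4 * sigma / dt)
t = np.arange(-half, half + 1) * dt
kernel = np.exp(-0.5 * (t / sigma) ** 2)
kernel /= kernel.sum()

smooth = np.convolve(holo, kernel, mode="same")      # low-pass filtered hologram
fringe_s = np.convolve(fringe, kernel, mode="same")  # fringe alone, for comparison
kossel_residual = smooth - fringe_s                  # what survives of the sharp line
```

Smoothing with σ ≈ 0.05 rad leaves the low-frequency fringe almost untouched while flattening the narrow line by more than an order of magnitude, which is why the filtered holograms in Figs. 19d and 20b retain the atomic information.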
The spatial resolution of the atomic positions, estimated from the full width at half-maximum (FWHM) of the fluorescent intensity, is approximately 0.5 Å. The deviation of the intensity maxima from the expected real positions of the nearest Co neighbors is less than 0.1 Å.

In summary, X-ray fluorescent holographic imaging can be used to map three-dimensional local atomic structures at isotropic resolution and without any prior knowledge of the structures. The combination of the experimental techniques of NXFH and IXFH, single energy and multiple energy, together with mathematical evaluation methods, reinforces the three-dimensional capabilities of X-ray fluorescent holographic imaging. Very recently, improved experimental conditions and a data-handling scheme have made possible the imaging of light atoms (88), as well as atoms in quasi-crystals (89). It is hoped that this X-ray fluorescent imaging method can be applied in the future to more complex crystal systems, such as macromolecules.

Figure 21. Holograms of CoO measured at E = (a) 6.925, (b) 13.861, (c) 17.444, and (d) 18.915 keV. (e) Three-dimensional images obtained from the combination of the four measurements (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5 Å Resolution, pp. 4,847–4,850, copyright 1999, American Physical Society, with permission).

X-ray Fluorescence Images from Solids in Electric Fields

X-ray fluorescence emitted from atoms usually exhibits a homogeneous spatial intensity distribution, except for the direction along a polarization axis. The familiar doughnut-shaped distribution that has zero intensity along the polarization axis, as expected from Eq. (4), is commonly observed in many experiments (8,90–96). In contrast to this common feature, different X-ray fluorescent images can be formed if a solid sample is placed in an external electric field during photon excitation. In recent experiments, discrete ring-like fluorescent images have been observed from solids excited by synchrotron radiation (97). The semiangles of the rings found are less than 20° and are closely related to the atomic number Z and to the principal and orbital angular momentum quantum numbers n and ℓ. The details of this unusual observation are described here.

Synchrotron radiation of energies ranging from 15 eV to 9 keV is used as the X-ray source to cover the excitation energies of the elements of 4 < Z < 27. The formation of X-ray fluorescent images in amorphous, single-, and polycrystalline materials that contain these elements was investigated. The samples were placed on a holder at the center of a UHV chamber (10⁻⁹ Torr) (see Fig. 22). Incident radiation whose beam size was 1.5 × 1.5 mm and photon energy E0 hit the sample, and the fluorescent radiation generated was recorded on a five-plate MCP (microchannel plate) detector placed 6 to 10 cm from the sample (Fig. 22). The MCP detector is operated at −4,050 volts during fluorescent detection. The diameter of the MCP was 30 mm, and the detecting area contained 1,022 × 1,022 pixels. The spatial resolution was 25 µm, the width of a pixel. A copper grid with a +150 volt bias was mounted on the front face of the detector to stop positive ions. The −4,050 volt bias of the MCP is sufficient to prevent negatively charged particles from entering the detector. The fluorescent signals, amplified by an electron gain of 5 × 10⁷, were digitized and displayed on a color monitor. The sample current Is can be measured, and a variable bias Vs can be applied to the sample.
ϕ and θ are the angles of the sample surface and of the detection direction with respect to the incident beam, respectively. The detector is placed at θ = 45° or θ = 90°. Total fluorescence yield (TFY) and total electron yield (TEY) versus E0, measured from the MCP and from the sample current Is, respectively, gave absorption spectra that indicate the energy positions of the absorption edges.

Figure 22. Schematic of the experimental setup for X-ray fluorescence in an electric field. The monochromatic SR excites the sample and generates fluorescence measured by the MCP area detector. The electric field results from the working bias of the MCP (97).

Figures 23a and b show the fluorescent images obtained at θ = 45° and ϕ = 45° from a sapphire sample at photon energies slightly above (E0 = 545 eV) and below (E0 = 500 eV) the oxygen K edge (EK = 543 eV). A bright ring and a dark hole are observed for E0 > EK (Fig. 23a) and E0 < EK (Fig. 23b), respectively. The ring images remain the same for various ϕ angles, as long as E0 ≥ EK (the energies of the absorption edges), irrespective of the crystalline form of the samples. For example, the fluorescent rings of oxygen 1s from single-crystal Al2O3, a piece of glass, and amorphous LaNiO3 and SiO2 samples are nearly the same. Fluorescent rings are repeatedly observed for various elements of atomic numbers 4 < Z < 27 at θ = 45° and 90°. For illustration, Figs. 23c–e display the images of the excitations of B1s, C1s, and Ti1s in color. Compared with fluorescence, photoelectron ring images (98) can also be clearly observed by applying a −2,500 volt sample bias. Figure 23f shows such a ring image of Mn1s photoelectrons [EK(Mn) = 6,539 eV]. The experimental conditions for recording Fig. 23 are the following: the energy resolutions are ΔE = 100 meV for Figs. 23a and b, 50 meV for Fig. 23c, and 800 meV for Figs. 23e and f; the sample-detector distances are 9.4 cm for Figs. 23a–d and 7.9 cm for Figs. 23e and f; the angle ϕ is set at 45°, and the counting time is 300 s for each image. Fluorescence involving 2p and 3d electrons also shows ring images (not shown here). In addition, excitation of the samples by photons of higher harmonics can also give ring images (not shown). The emitted photon spectra, measured by using a seven-element solid-state detector (ΔE = 100 eV), proved to be fluorescent radiation of energies E < EK.

Figure 23. X-ray fluorescent ring images at θ = 45° and ϕ = 45° of (a) O1s of sapphire excited at E0 = 545 eV, (b) sapphire at E0 = 500 eV, (c) B1s of boron nitride (BN) powder excited at E0 = 194 eV > EK(B1s), (d) C1s of graphite at E0 = 285 eV > EK(C1s), (e) Ti1s excited at E0 = 5,000 eV > EK(Ti1s), and (f) the photoelectron image of Mn1s excited at E0 = 6,565 eV > EK(Mn1s), where EK(O1s) = 543 eV, EK(B1s) = 192 eV, EK(C1s) = 284 eV, EK(Ti1s) = 4,966 eV, and EK(Mn1s) = 6,539 eV. (A scratch is always observed on the MCP screen.) (97)

The semiangles δ corresponding to the ring radii follow the E−1/2 relationship (Fig. 24a), where E is the emission energy between the energy levels of the atoms. The δ value remains the same for the same element involved, irrespective of its crystalline form. The origin of the formation of the fluorescent rings is the following: the linearly polarized monochromatic radiation polarizes the atoms of the samples by the excitation of a given state (n, ℓ). The excited atoms then emit fluorescence in all directions when returning to their stable states. Because fluorescent rings are observed at θ = 45° and 90°, irrespective of the rotation of the sample, this implies that wherever the detector is placed, the fluorescent ring is observed along that direction. The high working bias
(b) 0.04
ER
0.12
b b
0.08
C1s
0.03
0.12
0.02
0.08
0.06 1s
0.04
0.01
0.04
Cu2p
Ca1s
Cr1s
Fe1s
0
1
Au3d
0.02 0
1
2 3 4 5 Emission energy (kev)
6
2 3 4 5 Emission energy (kev)
6
7
(c) 0.12
2p
0.7
Cu b (radian)
d (radian)
b (radian)
B1s 0.10
0.16
0.10
0.6
Mn
0.5 0.08
Cr Sr
0.4 0.3
0.06 0.4
0.6
0.8 1.0 1.2 1.4 Emission energy (kev)
1.6
1.8
Figure 24. (a) The semiangles δ of the fluorescent cones versus the emission energy E of the various elements investigated; B1s , C1s , N1s , O1s , Cr2p , Mn2p , Cu2p , Al1s , Sr2p , P1s , Au3d , S1s , Cl1s , K1s , Ca1s , Ti1s , V1s , Cr1s , Mn 1s , Fe1s in the order of emission energy. The fitting curve is the function 0.058(±0.0016) × 1/E + 0.012(±0.0017). The standard deviation in δ is about 5%. (b) and (c) The corresponding opening angles β (the cross) and σ F /σ a (the triangle) for the elements investigated involving 1s and 2p fluorescence, respectively (see the text). The inset: Schematic of the redistributed dipoles (97).
7
s F/ s a
(a)
of the MCP detector may also affect the polarized atoms of the sample during photon excitation. The influence of the electric field between the detector and the sample on the fluorescent image from changing the sample bias Vs is shown in Figs. 25a, b, and c for Cr1s at Vs = 0, 2,000 and −2,500 volts, respectively. As can be seen, the intensity of the fluorescence (Fig. 25b) increases when the electric potential difference between the detector and the sample increases. The image becomes blurred when the electric potential difference decreases (Fig. 25c). The latter is also due partly to the presence of photoelectrons. This R result indicates that the presence of the electric field E between the detector and the sample clearly causes the , to align to polarized atoms, namely, the emitting dipoles p R . Thus, the spherically distributed some extent with the E dipoles originally in random fashion are now redistributed as shown in the inset of Fig. 24a. The preferred orientation R , and there are no dipoles in the of the dipoles is along E R field area defined by the opening angle 2β. Because the E is very weak, angle β is very small. The emitted fluorescent intensity distribution IF (θ ) R for the dipole versus the angle θ with respect to E
s F/ s a
1492
X-RAY FLUORESCENCE IMAGING
(a)
Image Trak 2−D
0.
Channel
Image Trak 2−D
(b)
0.
Channel Image Trak 2−D
(c)
0.
Channel
Cursor: 510 X 510 Y Z 14.00 R 14.00 30.00 28.00 26.00 24.00 22.00 20.00 18.00 16.00 14.00 12.00 10.00 8.000 6.000 4.000 2.000 1020.
Cursor: X 510 Y 510 Z 14.00 R 14.00 30.00 28.00 26.00 24.00 22.00 20.00 18.00 16.00 14.00 12.00 10.00 8.000 6.000 4.000 2.000 1020. Cursor: X 510 Y 510 Z 13.00 R 13.00 30.00 28.00 26.00 24.00 22.00 20.00 18.00 16.00 14.00 12.00 10.00 8.000 6.000 4.000 2.000 1020.
Figure 25. The fluorescent images of Cr1s, where the sample bias Vs equals (a) 0, (b) 2,000, and (c) −2,500 volts, in a 200-s exposure. The photoelectron ring appears near the center of the screen in (a) and (c), and the fluorescent ring is slightly below the center (97).
distribution shown in the inset of Fig. 24a can be calculated by considering the absorption of the incident and the fluorescent beams by the sample and the usual doughnut-shaped distribution sin2 (θ ) of the dipoles (see,
also Eq. 4):
$$I_F(\theta) = \int_0^\infty dz \int_{\gamma=-a}^{a} d\gamma\, \frac{Bn}{\cos\varphi}\, \sin^2(\theta+\gamma)\, e^{-\mu_a z/\cos\varphi}\, e^{-\mu_F z/\cos\theta}, \qquad (6)$$
where B = I₀AP_AB(αω³/2πc³)|X_AB|²V and a = π/2 − β. μa and μF are the linear absorption coefficients of the incident and the fluoresced beams, respectively, where μa(ω) = n₀σa and μF(ω) = n₀σF; the σ's are the total absorption/fluorescence cross sections, and n₀ is the total number of the dipoles per unit volume. γ covers all of the angular range except for the β range. z is along the inward surface normal. I₀, A, V, and P_AB are the intensity and the cross section of the incident beam, the irradiated volume, and the probability of absorption of the incident photons by the atoms in the transition from state A to state B, respectively. X_AB is the corresponding transition matrix element (see Eq. 4). Integration of Eq. (6) leads to
$$I_F(\theta) = B\, \frac{\sec\varphi\,\left(\pi/2 - \beta - \tfrac{1}{2}\sin 2\beta \cos 2\theta\right)}{\sigma_F \sec\theta - \sigma_a \sec\varphi}, \qquad (7)$$
which describes a ring-like distribution. From the values of the σ's given in Refs. 99 and 100, the β angles for the various emission energies are determined by fitting the measured fluorescent profile IF(θ), where β is the only adjustable variable. Figures 24b and c show the β angles determined and the ratios σF/σa versus emission energy for the 1s and 2p fluorescence. Clearly, both β and σF/σa behave very similarly. Because the mean free path of X rays in the sample equals 1/μ, the β angle is closely related to the mean-free-path ratio ℓa/ℓF.

The ring centers of Figs. 23a,c–e and the dark hole in Fig. 23b indicate the direction of the E field. The strength of the E field affects only the fluorescence intensity, not the ring size. The formation of a bright ring or a dark hole on the detector depends only on whether the incident photon energy is greater or smaller than the absorption edges. This is analogous to the photoelectric effect. Based on the experimental data, the semiangle δ of the fluorescent ring for a given Z, n, and ℓ can be predicted by using the curve-fitting relationship δ = 0.058(±0.0016) × 1/E + 0.012(±0.0017), where E is the emission energy between the energy levels of the atoms.

In summary, the observed ring-like discrete spatial distributions of X-ray fluorescence result from the collective alignment of the dipoles in solids exposed to an external electric field. The small opening of the dipole distribution induced by this alignment effect and the self-absorption by the samples, related to the X-ray mean free paths, are responsible for the formation of a fluorescent ring, which differs drastically from our common knowledge about emitted radiation. An empirical formula predicts the dimension of the fluorescent rings for a given Z, n, and ℓ. This interesting feature, in turn, provides an alternative for characterizing materials according to their fluorescent images.
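The empirical relationship can be evaluated directly. Note that the units below are an assumption (E in keV, δ in radians), inferred from the magnitudes in Fig. 24a; the text does not state them, so treat this as an illustrative sketch only.

```python
# Empirical semiangle of the fluorescent ring, delta = 0.058/E + 0.012, from the
# curve fit quoted in the text. Units assumed: E in keV, delta in radians.
def ring_semiangle(E_keV):
    return 0.058 / E_keV + 0.012

# The semiangle shrinks toward the 0.012 constant term as the emission energy
# rises, i.e., harder fluorescence lines produce tighter cones.
for E in (0.5, 1.0, 5.0):
    print(f"E = {E} keV -> delta = {ring_semiangle(E):.4f}")
```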
X-ray fluorescence is a useful imaging technique for trace element analysis in biomedical and environmental applications and material characterization. It has the
potential for crystallographic structural studies, such as crystal-structure determination of surfaces, interfaces, and bulk materials, using X-ray fluorescent holographic imaging. The availability of synchrotron radiation, the excellent soft and hard X rays obtained from this source, and the increasing number of synchrotron facilities in the world will certainly enlarge the applicability of this fluorescent imaging technique. It is anticipated that in the near future this technique will develop into a major imaging tool for investigating various kinds of materials at submicron or even smaller scales.

Acknowledgments

The author is indebted to G. Faigel, T. Gog, N. Gurker, S. Homma-Takeda, A. Iida, Y. Kumagai, N. Shimojo, the American Institute of Physics, the American Physical Society, the publisher of Nature, Elsevier Science B. V., and John Wiley and Sons for permission to reproduce figures and photographs from their published materials.
ABBREVIATIONS AND ACRONYMS

APD      avalanche photodiode
CCD      charge-coupled device
DCM      double-crystal monochromator
EM wave  electromagnetic wave
FAAS     flameless atomic absorption spectrometry
FWHM     full width at half-maximum
HV       high vacuum
IC       ionization chamber
ICCD     intensified charge-coupled device
ICP-AES  inductively coupled plasma atomic emission spectrometry
IEF-AGE  isoelectric focusing agarose gel electrophoresis
IP       imaging plate
IXFH     inverse (multiple-energy) X-ray fluorescent holography
K-B      Kirkpatrick-Baez
LN2      liquid nitrogen
LPE      liquid-phase epitaxy
MAD      multiwavelength anomalous dispersion
MCP      microchannel plate
MeHg     methylmercury
MMC      methylmercury chloride
NXFH     normal (single-energy) X-ray fluorescent holography
SMM      synthetic multilayer monochromator
SOD      superoxide dismutase
SR       synchrotron radiation
SSD      solid-state detector
TEY      total electron yield
TFY      total fluorescence yield
UHV      ultrahigh vacuum
XSW      X-ray standing wave
BIBLIOGRAPHY

1. P. P. Ewald, Fifty Years of X-Ray Diffraction, N.V.A. Oosthoek's Uitgeversmaatschappij, Utrecht, The Netherlands, 1962.
2. A. H. Compton and S. K. Allison, X-Rays in Theory and Experiment, 2nd ed., Van Nostrand, Princeton, 1935.
3. L. V. Azaroff, Elements of X-Ray Crystallography, McGraw-Hill, NY, 1968.
4. T. Åberg and J. Tulkki, in B. Crasemann, ed., Atomic Inner-Shell Physics, Plenum Press, New York, 1985, pp. 419–463, and the references therein.
5. E. Merzbacher, Quantum Mechanics, Wiley, NY, 1961.
6. A. Messiah, Quantum Mechanics, North-Holland, Amsterdam, 1962.
7. J. J. Sakurai, Advanced Quantum Mechanics, Addison-Wesley, NY, 1967.
8. J. D. Jackson, Classical Electrodynamics, Wiley, NY, 1967.
9. E.-E. Koch, D. E. Eastman, and Y. Farge, in E.-E. Koch, ed., Handbook on Synchrotron Radiation, vol. 1a, North-Holland, Amsterdam, 1983, pp. 1–63.
10. T. Matsushita and H. Hashizume, in E.-E. Koch, ed., Handbook on Synchrotron Radiation, vol. 1a, North-Holland, Amsterdam, 1983, pp. 261–314.
11. R. L. Mössbauer, Z. Phys. 151, 124–143 (1958).
12. D. P. Siddons et al., Rev. Sci. Instrum. 60, 1,649–1,654 (1989).
13. S. Hayakawa et al., Nucl. Instrum. Methods B49, 555–560 (1990).
14. Y. Suzuki and F. Uchida, Rev. Sci. Instrum. 63, 578–581 (1992).
15. P. Kirkpatrick and A. V. Baez, J. Opt. Soc. Am. 38, 766–774 (1948).
16. A. Iida, M. Takahashi, K. Sakurai, and Y. Gohshi, Rev. Sci. Instrum. 60, 2,458–2,461 (1989).
17. N. Gurker, X-Ray Spectrom. 14, 74–83 (1985).
18. N. Gurker, Adv. X-Ray Anal. 30, 53–60 (1987).
19. M. Bavdaz et al., Nucl. Instrum. Methods A266, 308–312 (1988).
20. G. T. Herman, Image Reconstruction from Projections, Academic Press, NY, 1980.
21. N. Shimojo, S. Homma, I. Nakgi, and A. Iida, Anal. Lett. 24, 1,767–1,777 (1991).
22. W. J. M. Lenglet et al., Histochemistry 81, 305–309 (1984).
23. A. Iida and T. Norma, Nucl. Instrum. Methods B82, 129–138 (1993).
24. Y. Suzuki, F. Uchida, and Y. Hirai, Jpn. J. Appl. Phys. 28, L1,660 (1989).
25. N. Shimojo et al., Life Sci. 60, 2,129–2,137 (1997).
26. S. A. Katz and R. B. Katz, J. Appl. Toxicol. 12, 79–84 (1992).
27. W. M. A. Burgess, L. Diberardinis, and F. E. Speizer, Am. Ind. Hyg. Assoc. J. 38, 184–191 (1977).
28. W. E. Atchison and M. F. Hare, FASEB J. 8, 622–629 (1994).
29. M. Aschner and J. L. Aschner, Neurosci. Biobehav. Rev. 14, 169–176 (1990).
30. Y. Kumagai, S. Homma-Takeda, M. Shinyashiki, and N. Shimojo, Appl. Organomet. Chem. 11, 635–643 (1997).
31. A. J. J. Bos et al., Nucl. Instrum. Methods B3, 654–659 (1984).
32. I. Orlic, J. Makjanic, and V. Valkovic, Nucl. Instrum. Methods B3, 250–252 (1984).
33. K. Okmoto et al., Clin. Chem. 31, 1,592–1,597 (1985).
34. S. Osaki, D. A. Johnson, and E. Freiden, J. Biol. Chem. 241, 2,746–2,751 (1966).
35. T. L. Sourkes, Pharmacol. Rev. 24, 349–359 (1972).
36. S. H. Oh, H. E. Ganther, and W. G. Hoekstra, Biochemistry 13, 1,825–1,829 (1974).
37. D. Keilin and T. Mann, Biochem. J. 34, 1,163–1,176 (1940).
38. C. G. Elinder, in L. Friberg, G. F. Nordberg, and V. B. Vouk, eds., Handbook on Toxicology of Metals, vol. 2, Oxford, Amsterdam, 1986, p. 664.
39. G. L. Fisher, V. S. Byers, M. Shifrine, and A. S. Levin, Cancer 37, 356–363 (1976).
40. B. F. Issel et al., Cancer 47, 1,845–1,848 (1981).
41. N. Cetinkaya, D. Cetinkaya, and M. Tuce, Biol. Trace Element Res. 18, 29–38 (1988).
42. S. Inutsuka and S. Araki, Cancer 42, 626–631 (1978).
43. M. Hrgovcic et al., Cancer 31, 1,337–1,345 (1973).
44. E. Huhti, A. Poukkula, and E. Uksila, Respiration 40, 112–116 (1980).
45. S. Homma et al., J. Trace Elements Exp. Med. 6, 163–170 (1993).
46. B. Rosoff and H. Spence, Nature 207, 652–654 (1965).
47. S. Homma, I. Nakai, S. Misawa, and N. Shimojo, Nucl. Instrum. Methods B103, 229–232 (1995).
48. K. Julshamn et al., Sci. Total Environ. 84, 25–33 (1989).
49. R. Scott et al., Urol. Res. 11, 285–290 (1983).
50. N. Koizumi et al., Environ. Res. 49, 104–114 (1989).
51. S. Homma-Takeda et al., Anal. Lett. 29, 601–611 (1996).
52. M. Shinyashiki et al., Environ. Toxicol. Pharmacol. 2, 359–366 (1996).
53. S. Homma-Takeda, Y. Kumagai, M. Shinyashiki, and N. Shimojo, J. Synchrotron Radiat. 5, 57–59 (1998).
54. N. Shimojo et al., J. Occup. Health 39, 64–65 (1997).
55. J. A. R. Samson, Techniques of Vacuum Ultraviolet Spectroscopy, Wiley, NY, 1967.
56. G. W. Berkstresser, J. Shmulovich, D. T. C. Huo, and G. Matulis, J. Electrochem. Soc. 134, 2,624–2,628 (1987).
57. G. W. Berkstresser et al., J. Electrochem. Soc. 135, 1,302–1,305 (1988).
58. B. La Fontaine et al., Appl. Phys. Lett. 63, 282–284 (1995).
59. H. A. Hauptman, Phys. Today 42, 24–30 (1989).
60. G. H. Stout and L. H. Jensen, X-ray Structure Determination, 2nd ed., Wiley, NY, 1989.
61. R. G. Rossmann, ed., The Molecular Replacement Method, Gordon and Breach, NY, 1972.
62. H. Schenk, ed., Direct Methods for Solving Crystal Structures, Plenum Press, NY, 1991.
63. M. M. Woolfson and H.-F. Han, Physical and Non-Physical Methods of Solving Crystal Structures, Cambridge University Press, Cambridge, 1995.
64. W. A. Hendrickson, Science 254, 51–58 (1991).
65. S.-L. Chang, Multiple Diffraction of X-Rays in Crystals, Springer-Verlag, Berlin, 1984.
66. S.-L. Chang, Acta Crystallogr. A54, 886–894 (1998); also in H. Schenk, ed., Crystallography Across the Sciences, International Union of Crystallography, Chester, 1998, pp. 886–894.
67. A. Szoke, in D. T. Attwood and J. Boker, eds., Short Wavelength Coherent Radiation: Generation and Applications, AIP Conf. Proc. No. 147, American Institute of Physics, NY, 1986, pp. 361–467.
68. M. Tegze and G. Faigel, Europhys. Lett. 16, 41–46 (1991).
69. G. J. Maalouf et al., Acta Crystallogr. A49, 866–871 (1993).
70. A. Szoke, Acta Crystallogr. A49, 853–866 (1993).
71. P. M. Len, S. Thevuthasan, and C. S. Fadley, Phys. Rev. B50, 11,275–11,278 (1994).
72. M. Tegze and G. Faigel, Nature 380, 49–51 (1996).
73. T. Gog et al., Phys. Rev. Lett. 76, 3,132–3,135 (1996).
74. P. M. Len, T. Gog, C. S. Fadley, and G. Materlik, Phys. Rev. B55, 3,323–3,327 (1997).
75. P. M. Len et al., Phys. Rev. B56, 1,529–1,539 (1997).
76. B. Adams et al., Phys. Rev. B57, 7,526–7,534 (1998).
77. S. Y. Tong, C. W. Mok, H. Wu, and L. Z. Xin, Phys. Rev. B58, 10,815–10,822 (1998).
78. G. Faigel and M. Tegze, Rep. Prog. Phys. 62, 355–393 (1999).
79. M. Tegze et al., Phys. Rev. Lett. 82, 4,847–4,850 (1999).
80. R. J. Collier, C. B. Burckhardt, and L. H. Lin, Optical Holography, Academic Press, NY, 1971.
81. W. Kossel, Ann. Phys. (Leipzig) 26, 533–553 (1936).
82. J. J. Barton, Phys. Rev. Lett. 61, 1,356–1,359 (1988).
83. J. J. Barton, Phys. Rev. Lett. 67, 3,106–3,109 (1991).
84. S. Y. Tong, H. Huang, and C. M. Wei, Phys. Rev. B46, 2,452–2,459 (1992).
85. S. Thevuthasan et al., Phys. Rev. Lett. 70, 595–598 (1993).
86. G. P. Harp et al., J. Electron Spectrosc. Relat. Phenom. 70, 331–337 (1991).
87. A. Q. R. Baron, Nucl. Instrum. Methods A352, 665–667 (1995).
88. M. Tegze et al., Nature 407, 38–40 (2000).
89. S. Marchesini et al., Phys. Rev. Lett. 85, 4,723–4,726 (2000).
90. J. Muller et al., Phys. Lett. 44A, 263–264 (1973).
91. U. Fano and J. H. Macek, Rev. Mod. Phys. 45, 553–573 (1973).
92. C. H. Greene and R. N. Zare, Ann. Rev. Phys. Chem. 33, 119–150 (1982).
93. D. W. Lindle et al., Phys. Rev. Lett. 60, 1,010–1,013 (1988).
94. S. H. Southworth et al., Phys. Rev. Lett. 67, 1,098–1,101 (1991).
95. Y. Ma et al., Phys. Rev. Lett. 74, 478–481 (1995).
96. J. A. Carlisle et al., Phys. Rev. Lett. 74, 1,234–1,237 (1995).
97. C. K. Chen et al., Paper 14P9, Third Conf. Asian Crystallogr. Assoc., Kuala Lumpur, 1998.
98. H. Helm et al., Phys. Rev. Lett. 70, 3,221–3,224 (1993).
99. J. J. Yeh, Atomic Calculation of Photoionization Cross-sections and Asymmetry Parameters, Gordon & Breach, NY, 1993.
100. E. B. Saloman, J. H. Hubbel, and J. H. Scofield, Atom. Data Nucl. Data Tables 38, 1–51 (1988).

X-RAY TELESCOPE
WEBSTER CASH
University of Colorado
Boulder, CO
An X-ray telescope is an optic used to focus and image X rays, like a conventional telescope for visible-light astronomy. X-ray telescopes are exclusively the domain of X-ray astronomy, the discipline that studies high-energy emissions from objects in space. A telescope, by its very nature, is for studying objects at large distances, concentrating their radiation, and magnifying their angular extent. Because X rays cannot travel large distances through the earth's atmosphere, they can be
used only in spacecraft, observing targets through the vacuum of space. X rays penetrate matter and are used to image the interior of the human body. Unfortunately, this means that X rays also tend to penetrate telescope mirrors, making the mirror worthless for astronomy. But, by using a specialized technique called grazing incidence, telescopes can be made to reflect, and astronomy can be performed. In this article, we describe the basic techniques used to build X-ray telescopes for X-ray astronomy. Because X-ray astronomy must be performed above the atmosphere, it is a child of the space program. Telescopes are carried above the atmosphere by rockets and used to study the sun, the planets, and objects in the depths of space. These telescopes have provided a different view of the hot and energetic constituents of the universe and have produced the key observations to establish the existence of black holes.

X-RAY ASTRONOMY

Spectra of Hot Objects

X rays are created by the interactions of energetic charged particles. A sufficiently fast-moving electron that impacts an atom or ion, or is accelerated in a magnetic or electric field, can create high-frequency radiation in the X-ray band. Thus X rays tend to be associated with objects that involve high-energy phenomena and high temperatures. In general, if radiation is generated by thermal processes, the characteristic frequency emitted will be given by

$$h\nu \approx kT. \qquad (1)$$
The X-ray band stretches from 10¹⁶ to 10¹⁸ Hz, which indicates that the characteristic temperatures of objects range from 10⁶ K up to 10⁸ K. At these extreme temperatures, matter does not exist in the forms we see in everyday life. The particles are moving so fast that atoms become ionized to form plasma. The plasma can have very low density, as in the blast wave of a supernova, or very high density, under the surface gravity of a neutron star. Many stars, including the sun, have a hot, X-ray-emitting gas around them, called a corona. For example, the X-ray spectrum of the corona of the star HR 1099 is shown in Fig. 1.
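The order-of-magnitude temperatures quoted here follow directly from Eq. (1); a minimal sketch:

```python
# The characteristic thermal temperature behind Eq. (1), h*nu ~ k*T, evaluated
# at the two ends of the X-ray band quoted in the text.
h = 6.626e-34        # Planck constant, J*s
k = 1.381e-23        # Boltzmann constant, J/K

def characteristic_temperature(nu_hz):
    return h * nu_hz / k

T_soft = characteristic_temperature(1e16)    # ~5e5 K, of order 1e6 K
T_hard = characteristic_temperature(1e18)    # ~5e7 K, of order 1e8 K
print(f"{T_soft:.1e} K to {T_hard:.1e} K")
```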
Figure 1. X-ray spectrum of the star HR 1099.
Figure 2. Image of the X rays emitted from the 300-year-old supernova explosion remnant named Cas-A. The picture shows an expanding shock front of interstellar gas and the collapsed remnant star in the middle. This image was acquired using the 1.2-m diameter X-ray telescope on the Chandra Satellite. See color insert.
Dramatic events in the universe, such as supernovae, also generate X rays. The shock waves from an exploding star heat the interstellar gas as they pass through it, creating a supernova remnant, an expanding shell of hot plasma, as shown in Fig. 2. Another way X rays are created is through the acceleration of matter near the surfaces of collapsed stars, including white dwarfs, neutron stars, and black holes. As matter spirals into these extreme objects, it heats to high temperatures and emits X rays. X-ray telescopes, through imaging, timing, and spectroscopy, have played a central role in proving the existence of black holes and in studying the physics of the most extreme objects in the universe.

Interstellar Absorption

The space between stars is a vacuum, better than the best vacuum ever created in a laboratory on earth. But in the immense stretches of interstellar space, there is so much volume that the quantities of gas become large enough to absorb X rays. In our part of the galaxy, there is an average of about one atom of gas for every 10 cc of space. The composition of this gas is similar to that of the sun, mostly hydrogen with some helium mixed in. There is also a small but significant quantity of heavier elements, such as oxygen and carbon. Across the distances between the stars, the absorption caused by this gas can become significant in the soft X-ray band. In Fig. 3, we show a graph of the transmission of interstellar gas as a function of X-ray energy. It shows that low-energy X rays suffer higher absorption, so that one cannot see as far through the galaxy at 0.1 keV as at 1 keV. It is for this reason that X-ray telescopes are usually
designed to function above 0.5 keV and up to 10 keV, if possible.

Figure 3. The thin gas between the stars can absorb X rays, particularly at low energies. This graph shows the transmission of the interstellar medium as a function of X-ray wavelength for the amount of interstellar gas expected at 2,000 parsecs.

Atmospheric Absorption

Because of absorption, X rays cannot penetrate the earth's atmosphere. The primary interaction is by the photoelectric effect in the oxygen and nitrogen that are the main constituents of our atmosphere. Thus, to use an X-ray telescope, we must be above most of these gases. In Fig. 4, we show the transmission of the atmosphere at an altitude of 110 km, where it becomes partially transparent. A soft X-ray telescope requires a rocket to gain sufficient altitude. Whether by suborbital rocket to view above the atmosphere quickly for five minutes, by a larger missile carrying a satellite to orbit, or by a larger launcher yet, carrying the telescope to interplanetary space, the rocket is essential.

For hard X rays, however, there is a more modest means of achieving altitude: the balloon. Large balloons can carry telescopes to altitudes above 100,000 feet, which is sufficient to observe the harder X rays. Balloons can stay up for hours, days, and even weeks, compensating for the lower flux of observable signal.

Figure 4. This shows the fraction of X rays transmitted from overhead down to an altitude of 110 km above sea level as a function of X-ray wavelength.

Signal-to-Noise Issues

Telescopes were not used in the early days of X-ray astronomy. The first observations of the sky were performed using proportional counters. These are modified Geiger counters that have large open areas and no sensitivity to the direction from which the photon came. Grid and slat collimators were used to restrict the solid angle of sky to which the detector was sensitive, so that individual sources could be observed. Collimators were built that achieved spatial resolution as fine as 20 arcminutes, which was adequate for the first surveys of the sky and for studying the few hundred brightest X-ray sources in the sky. However, the detectors had to be large to collect much signal, and that led to high detector background. The large angular extent caused signal from the sky to add to the background as well. Additionally, many faint sources could be unresolved in the field of view at the same time, leading to confusion. If X-ray astronomy were to advance, it needed telescopes to concentrate the signal and resolve weak sources near each other in the sky.

GRAZING INCIDENCE

X rays are often referred to as "penetrating radiation" because they pass easily through matter. The medical applications of this property revolutionized medicine. However, when the task is to build a telescope, it becomes necessary to reflect the radiation rather than transmit or absorb it. The fraction of 1-keV X rays reflected from a mirror at normal incidence can be as low as 10⁻¹⁰, effectively killing its utility as an optic. Another problem of conventional mirrors is their roughness. If a mirror is to reflect radiation specularly, it needs to have surface roughness substantially lower than the wavelength of the radiation. In the X ray, this can be difficult, considering that the wavelength of a 10-keV X ray is comparable to the diameter of a single atom.

These problems would have made X-ray telescopes impossible, except for the phenomenon of grazing incidence, in which an X ray reflects off a mirror surface at a very low angle, like a stone skipping off a pond (Fig. 5).

Figure 5. An X ray approaches a mirror at a very low graze angle, reflecting by the property of total external reflection.

Grazing Incidence

The graze angle is the angle between the direction of the incident photon and the plane of the mirror surface. In
most optical notation, the incidence angle is the angle between the direction of the ray and the normal to the plane of the mirror. Thus, the graze angle is the complement of the incidence angle. At any given X-ray wavelength, there is a critical angle below which the X rays reflect. As the graze angle drops below the critical angle, the efficiency of the reflection rises. As the energy of the X ray rises, the critical angle drops, so hard X-ray optics feature very low graze angles. In general, the critical angle θc is given by

$$\sin\theta_c = \lambda \left( \frac{e^2 N}{\pi m c^2} \right)^{1/2}, \qquad (2)$$

where λ is the wavelength of the X ray and N is the number of electrons per unit volume (1). This behavior comes about because the index of refraction inside a metal is less than one to an X ray. The X ray interacts with the electrons in the reflecting metal as if it had encountered a plasma of free electrons. The wave is dispersed and absorbed as it passes through the metal, allowing the index of refraction to fall below one. This process can be described by assigning a complex index of refraction to the material (2). The index of refraction n of a metal is given by

$$n = 1 - \delta - i\beta, \qquad (3)$$

where the complex term β is related to the absorption coefficient of the metal. If β is zero, then δ cannot be positive. A well-known optical effect in the visible is total internal reflection, which leads to the mirror-like reflection of the glass in a fish tank. If the radiation approaches the glass at an angle for which there is no solution to Snell's law, it is reflected instead of transmitted. This happens when the radiation attempts to pass from a medium of higher index of refraction to one of lower index. In the X ray, where the index of refraction in metals is less than one, total external reflection is experienced when the ray tries to pass into a metal from air or vacuum, where the index of refraction is closer to unity.

Fresnel Equations

The equations for the reflection of X rays are the same as those for longer wavelength radiation and are known as the Fresnel equations (2). In the X ray, evaluation of the equations differs from the usual because the index of refraction has an imaginary component, so the algebra becomes complex. If radiation approaches at an angle ϕi with respect to the normal (ϕi, the incidence angle, is the complement of θ, the graze angle), then some of the radiation will reflect at an angle ϕr, which is equal to ϕi. Some of the power will be transmitted into the material at an angle ϕt with respect to the normal, where ϕt is given by Snell's law:

$$\sin\varphi_t = \frac{\sin\varphi_i}{n}, \qquad (4)$$

but now ϕt is complex, and evaluation involves complex arithmetic. The reflectance of the transverse electric wave (one of the two plane polarizations) is given by

$$R_E = \left( \frac{\cos\varphi_i - n\cos\varphi_t}{\cos\varphi_i + n\cos\varphi_t} \right) \left( \frac{\cos\varphi_i - n\cos\varphi_t}{\cos\varphi_i + n\cos\varphi_t} \right)^{*}, \qquad (5)$$

and the transverse magnetic wave (the other polarization) is given by

$$R_M = \left( \frac{\cos\varphi_i - \cos\varphi_t/n}{\cos\varphi_i + \cos\varphi_t/n} \right) \left( \frac{\cos\varphi_i - \cos\varphi_t/n}{\cos\varphi_i + \cos\varphi_t/n} \right)^{*}, \qquad (6)$$

where the star on the second term represents the complex conjugate. Extensive work has been done over the years to tabulate the index of refraction of X rays in a wide variety of optical materials and elements. Tabulations can be found in the literature (3). Certainly, the most common material used as a coating in the X ray is gold, which is stable and easy to deposit. A few other materials can be better, including platinum and osmium at high energies and nickel and uranium below 1 keV. In Fig. 6, we show the reflectivity of gold as a function of X-ray energy. From perusal of the chart, one can see that in the quarter-kilovolt band (0.1–0.25 keV), graze angles as high as 5° are possible. At 1 keV, angles close to 1° are required, and at 10 keV, angles below half a degree are necessary.

Figure 6. Plot of the fraction of X rays reflected off a polished gold surface as a function of the wavelength of the X ray. Three graze angles are shown. One can see the drop in efficiency as the angle rises.

Mirror Quality

The principles of grazing incidence provide a technique for the efficient reflection of X rays, but the mirrors must be of adequate quality. Analogous to all telescope mirrors, they must have good figure, good polish, and adequate size to suppress diffraction (4). In Fig. 7, we show the reflection of the ray in three dimensions. The direction of the ray can be deflected to the side by a slope error in the "off-plane" direction.
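The critical-angle and Fresnel formulas above can be sketched numerically for gold at 10 keV. The δ and β values below are illustrative assumptions of roughly the right order of magnitude, not tabulated optical constants, and the free-electron critical angle ignores absorption:

```python
import cmath
import math

# Eq. (2), free-electron critical angle: sin(theta_c) = lambda*sqrt(N*r_e/pi),
# using r_e = e^2/(m*c^2), the classical electron radius. For gold (Z = 79,
# A = 197, density 19.3 g/cm^3) at 10 keV this comes out near half a degree,
# consistent with the graze angles quoted in the text.
r_e = 2.818e-15                               # classical electron radius, m
N = 79 * (19.3 / 197.0) * 6.022e23 * 1e6      # electron density of gold, m^-3
lam = 1.2398e-10                              # wavelength of a 10-keV X ray, m
theta_c = math.asin(lam * math.sqrt(N * r_e / math.pi))

# Eqs. (3)-(5): transverse-electric Fresnel reflectance at graze angle theta
# for n = 1 - delta - i*beta. delta is tied to theta_c (theta_c ~ sqrt(2*delta));
# beta is an assumed illustrative value.
delta = theta_c ** 2 / 2
beta = 2e-6

def reflectance_te(theta):
    n = 1 - delta - 1j * beta
    kz_vac = math.sin(theta)                            # cos(phi_i) for graze theta
    kz_med = cmath.sqrt(n * n - math.cos(theta) ** 2)   # n*cos(phi_t), complex
    r = (kz_vac - kz_med) / (kz_vac + kz_med)
    return abs(r) ** 2                                  # r times its conjugate

print(f"critical angle ~ {math.degrees(theta_c):.2f} deg")
print(f"R at theta_c/2: {reflectance_te(theta_c / 2):.3f}")   # near-total reflection
print(f"R at 2*theta_c: {reflectance_te(2 * theta_c):.4f}")   # reflection collapses
```

The sharp drop in reflectance once the graze angle exceeds the critical angle is exactly the behavior shown in Fig. 6.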
Figure 7. Symmetry is broken by a grazing incidence reflection. Scatter is much worse in the plane of the incident and reflected rays. There is scatter off-plane, but it is much lower.

However, as one can see from the diagram, to change the direction of the ray that reflects at angle θ by an angle close to θ requires a slope error of the order of 30°. The effective angle through which the ray is thrown is reduced by a factor of sin θ in the off-plane direction. This means that errors in mirror quality lead to greater error in the in-plane direction. The resulting image is blurred anisotropically, creating a long, narrow image in the vertical direction (4). The height of the image blur is roughly 1/sin θ times the width.

Microscopic roughness can also degrade the image by scattering the X rays. Scatter is a problem at all wavelengths, but it is particularly severe in the X-ray region, where the radiation has a short wavelength. Scatter is caused by small deviations in the wave fronts of the reflected light. A deviation in height on the surface of the mirror of size δ will create a deviation in phase of size 2δ sin θ, where θ is the graze angle. This means that the amount of scattering drops with the sine of the graze angle, which allows relaxation of polish requirements. If one assumes that the surface roughness can be described as a probability function that has a Gaussian distribution, then the total amount of scatter will be given by

$$S = e^{-\left( \frac{4\pi\sigma\sin\theta}{\lambda} \right)^{2}}, \qquad (7)$$

where σ is the standard deviation of the surface roughness (5). This means, for example, that for a surface polish of 10 Å, a mirror at a graze angle of 1° can suppress scatter to less than 5%. The angle through which the X ray is scattered is given by

$$\varphi = \lambda/\rho, \qquad (8)$$

where ρ is the "correlation length," the characteristic distance between the surface errors. This equation is similar to the grating equation that governs diffraction and also leads to scattering preferentially in the plane of reflection.

Because X rays are electromagnetic radiation, they are subject to diffraction just like longer wavelength photons. However, because of their very short wavelengths, the effects are not usually noticeable. Visible light telescopes have a diffraction limit that is given by

$$\varphi = 0.7\,\frac{\lambda}{D}, \qquad (9)$$

where λ is the wavelength of the light, D is the diameter of the aperture through which the wave passes, and ϕ is the width of the diffracted beam in radians. The same equation holds true for the X ray, but now D represents the projected aperture of the optic, as viewed by the incoming beam. A grazing incidence optic of length L would create a projected aperture of size L sin θ, where θ is the graze angle. A large grazing incidence optic has a length of 50 cm, which indicates that D will typically be about 1 cm in projection. Because an X ray has λ = 1 nm, ϕ will have a value of about 10⁻⁷ radians, which is 0.02 arcseconds. This resolution is higher than that of the Hubble Space Telescope, so diffraction has not yet become a limiting factor in X-ray telescopes.

WOLTER TELESCOPES

The previous section describes how one can make a mirror that reflects X rays efficiently, but it does not provide a plan for building a telescope. The geometry of grazing incidence is so different from the geometry of conventional telescopes that the designs become radically different in form.

At the root of the design of a telescope is the parabola of rotation. As shown schematically in Fig. 8, parallel light from infinity that reflects off the surface of a parabola comes to a perfect focus. However, this is a mathematical fact for the entire parabola, not just the normal-incidence part near the vertex.

Figure 8. A conventional telescope features a parabolic surface that focuses parallel rays from infinity onto a single point.

Figure 9 shows a full parabola. The rays that strike the part of the parabola where the slope is large compared to one reflect at grazing incidence but also pass through the same focus. The mirror can be a figure of rotation about the axis of symmetry of the parabola, just as at normal incidence. Such a figure of revolution is called a paraboloid. Thus, a paraboloidal mirror can reflect at grazing incidence near the perimeter and at normal incidence near the center. But, because the reflectivity is low near the center, in practice the mirrors are truncated, as shown in Fig. 9. The resulting shape resembles a circular wastepaper basket that has a polished interior. The aperture of the telescope is an annulus.

Figure 9. Extending the parabola to a grazing geometry does not interfere with its focusing properties.

Because the telescope is a simple parabola, the surface can be described by

$$Z(r) = \frac{r^2}{2\rho} - \frac{\rho}{2}, \qquad (10)$$

where Z(r) is the height of the telescope above the focal position at a radius r. The parameter ρ is the radius of curvature at the vertex, which is never part of the fabricated paraboloid. As such, it is merely a formal parameter of the system. In a normal incidence telescope, the radius of curvature is approximately twice the focal length, but in a grazing parabola, the focal length is the distance from the point of reflection to the focal point. This causes a major problem for the performance of the telescope. Radiation that enters the telescope on the optic axis is theoretically concentrated into a perfect focus, but radiation approaching off-axis is not. The effective focal length of a ray is the distance between the point of reflection and the focus, which is approximately equal to Z. Because the paraboloid usually is long in the Z direction, the focal length is a highly variable function of the surface position. This is called comatic aberration. It is so severe that it limits the value of the paraboloid as an imaging telescope. Such paraboloids have been used, but mostly for photometric and spectroscopic work on bright, known targets, where field of view is unimportant.
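The focusing property claimed for Eq. (10), at normal and grazing incidence alike, is easy to confirm numerically; a minimal two-dimensional sketch:

```python
import math

# Numerical check of Eq. (10): the surface Z(r) = r^2/(2*rho) - rho/2 sends any
# vertical (on-axis) ray through the focus at the origin, whether it strikes
# near the vertex (normal incidence) or far out on the surface (grazing).
def focus_miss(r, rho):
    z = r * r / (2 * rho) - rho / 2        # surface height, Eq. (10)
    slope = r / rho                        # dZ/dr
    m = math.hypot(slope, 1.0)
    nx, nz = -slope / m, 1.0 / m           # unit surface normal
    # Reflect the downward ray d = (0, -1): d' = d - 2 (d.n) n.
    dot = -nz
    dx, dz = -2 * dot * nx, -1.0 - 2 * dot * nz
    # Perpendicular distance from the origin (the focus) to the reflected ray.
    return abs(r * dz - z * dx) / math.hypot(dx, dz)

for r in (0.01, 1.0, 50.0):                # near-normal through strongly grazing
    assert focus_miss(r, rho=2.0) < 1e-9
print("all reflected rays pass through the focus at the origin")
```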
The severity of this problem was recognized early in the history of X-ray optics and was solved by adding a second reflection. In 1952, Wolter (6) showed that two optical surfaces in sequence could remove most of the coma, allowing for the design of a true imaging telescope. It is a mathematical property of the hyperbola that light converging to one focus will be refocused onto the other. Thus, by placing the focal point of the paraboloid at one of the foci of a hyperboloid, the light will be refocused, as shown schematically in Fig. 10. In the process, the distances traveled by the rays after reflection become close to equalized, and the comatic aberration is reduced. Wolter described three types of these paraboloid-hyperboloid telescopes, as shown in Fig. 11. Each has a paraboloidal surface followed by a hyperboloidal surface. The first reflection focuses parallel light to a point. The second reflection redirects the light to a secondary focus. In X-ray astronomy, we use mostly Type I because the angles at which the rays reflect are additive and create a shorter distance to the focal plane. When the graze angles are low, this can be very important. Sometimes, in the extreme ultraviolet, where graze angles can be in the range of 5–15°, the Wolter Type II has been used (7). The equations for the paraboloid-hyperboloid telescopes are

z1 = r1²/(2ρ) − ρ/2 − 2√(a² + b²)  (11)

and

z2 = (a/b)√(b² + r2²) − √(a² + b²).  (12)
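The two surfaces can be evaluated directly from Eqs. (11) and (12). A sketch with arbitrary illustrative parameters (the printed equations are partially garbled, so the exact algebraic form coded here is a reconstruction):

```python
import math

def z_paraboloid(r, rho, a, b):
    """Eq. (11): paraboloid surface, displaced so that its focus sits at the
    far focus of the hyperboloid (the system focus is at the origin)."""
    return r**2 / (2 * rho) - rho / 2 - 2 * math.hypot(a, b)

def z_hyperboloid(r, a, b):
    """Eq. (12): hyperboloid surface with its near focus at the origin."""
    return (a / b) * math.sqrt(b**2 + r**2) - math.hypot(a, b)

# Illustrative parameters (m): vertex radius rho and hyperboloid semi-axes a, b.
print(z_paraboloid(0.35, 0.01, 3.0, 4.0))
print(z_hyperboloid(0.0, 3.0, 4.0))   # vertex of the hyperboloid
```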
Figure 10. A Wolter Type I telescope, also known as a paraboloid-hyperboloid telescope, features two reflections. The first focuses the light to a distant point; the second reflection, off the hyperboloid, refocuses to a nearer point and provides a wider field of view.
Figure 11. The three types of Wolter telescopes. The Wolter Type I is dominant because it shortens the focal length and allows nesting. Type II is used for spectroscopic experiments at the longest X-ray wavelengths, where graze angles are higher.
Thus, three parameters define the surfaces of the optics, and the designer must additionally define the range of z1 and z2 over which the optic will be built. During the design process, care must be taken to ensure that the proper range of graze angles is represented in the optic, so that the spectral response of the telescope is as expected. This is usually done by computer ray tracing for field of view and throughput. In 1953, Wolter extended the theoretical basis for grazing incidence telescopes by applying the Abbe sine condition in the manner of Schwarzschild to create the Wolter–Schwarzschild optic (8). This optic is a double reflection that comes in three types analogous to the paraboloid-hyperboloids but, theoretically, has no coma on-axis. The Wolter–Schwarzschild optic is described by the parametric equations

r1 = F sin α,  (13)

z1 = −F/C + (FC sin² α)/4 + (F/R)[1 − C sin²(α/2)]^((2−C)/(1−C)) cos^(−2C/(1−C))(α/2),  (14)

z2 = d cos α,  (15)

r2 = d sin α,  (16)

where

1/d = (C/F) sin²(α/2) + (R/F)[1 − C sin²(α/2)]^(−C/(1−C)) cos^(2/(1−C))(α/2).  (17)
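The parametric prescription above can be evaluated numerically. A sketch (F, C, R, and α are illustrative values, and because the printed exponents are partially garbled, the algebra below should be treated as indicative rather than definitive):

```python
import math

def ws_surfaces(alpha, F, C, R):
    """Evaluate the Wolter-Schwarzschild parametric equations for the
    primary point (r1, z1) and secondary point (r2, z2) reached by a ray
    arriving at the focus (the origin) at angle alpha off-axis."""
    s2 = math.sin(alpha / 2) ** 2
    base = 1 - C * s2
    r1 = F * math.sin(alpha)                                    # Eq. (13)
    z1 = (-F / C
          + F * C * math.sin(alpha) ** 2 / 4
          + (F / R) * base ** ((2 - C) / (1 - C))
          * math.cos(alpha / 2) ** (-2 * C / (1 - C)))          # Eq. (14)
    inv_d = ((C / F) * s2
             + (R / F) * base ** (-C / (1 - C))
             * math.cos(alpha / 2) ** (2 / (1 - C)))            # Eq. (17)
    d = 1.0 / inv_d
    return r1, z1, d * math.sin(alpha), d * math.cos(alpha)     # Eqs. (16), (15)

r1, z1, r2, z2 = ws_surfaces(alpha=0.001, F=10.0, C=0.5, R=0.4)
```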
For this system, a ray that approaches on-axis strikes the primary at (r1, z1), reflects, strikes the secondary at (r2, z2), and then approaches the focus (located at the origin) at an angle of α off-axis. The parameter F is the effective focal length of the system and has units of length. C and R are dimensionless shape parameters. In practice, as the graze angle becomes small, the performance advantage of the Wolter–Schwarzschild over the paraboloid-hyperboloid becomes small. So the added complexity is often avoided, and simple paraboloid-hyperboloids are built. One advantage of the Type I Wolter telescopes is that their geometry allows nesting. Because the effective focal length of each optic is approximately the distance from the plane at the intersection of the paraboloids and hyperboloids to the focus, a series of Wolters of different diameter can be nested, one inside the other, to increase the effective area of the telescope. This implies that the outer nested pairs must reflect X rays at higher graze angles. The actual design process for X-ray telescopes is usually performed by computer ray tracing. The resolution as a function of angle off-axis is difficult to estimate in closed form. Furthermore, the reflectivity of the X rays is a function of angle, and the angle can change substantially across the aperture, meaning that not all parts of the aperture have equal weight and that the response is a function of the energy of the incident radiation. By ray tracing, the resolution and throughput can be evaluated as a function of X-ray energy across the field of view of the telescope.

THIN MIRROR TELESCOPES

For the astronomer, the ability to study faint objects is crucial, meaning that a large collecting area is of primary importance. Wolter telescopes can be nested to enhance the total collecting area, as described in the previous section. However, the thickness of the mirrors employed occults much of the open aperture of a typical Wolter.
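The aperture lost to mirror walls is easy to estimate with a little geometry. A sketch, assuming an illustrative nested-shell layout (the shell radii and wall thicknesses are invented for the example, not taken from the text):

```python
import math

def blocked_fraction(radii, t):
    """Fraction of the annular aperture (innermost to outermost shell)
    occulted by mirror walls of thickness t."""
    total = math.pi * (radii[-1] ** 2 - radii[0] ** 2)
    walls = sum(math.pi * ((r + t) ** 2 - r ** 2) for r in radii)
    return walls / total

radii = [0.30 + 0.01 * i for i in range(20)]   # 20 shells, 1 cm apart (m)
for t in (0.0005, 0.002):                      # 0.5-mm vs 2-mm walls
    print(f"t = {t * 1e3:.1f} mm: "
          f"{100 * blocked_fraction(radii, t):.1f}% of the aperture blocked")
```

In this toy geometry, thinning the walls from 2 mm to 0.5 mm cuts the blockage from roughly 21% to roughly 5% of the annular aperture, which is why thin, densely packed shells pay off.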
The ideal Wolter telescope would have mirrors of zero thickness, so that all of the aperture could be used. Thus, mirrors made from thin, densely packed shells can significantly enhance capability. This is also important because thin mirrors allow the maximum collecting area for a given weight. Because these mirrors are to be launched into space aboard rockets, maximizing the ratio of collecting area to weight can be of central concern. This is especially true at high energies, where graze angles are low and the mass of mirror needed to collect X rays rises. Unfortunately, as mirrors are made thinner, they become less rigid and less able to hold a precise optical form. Thus a telescope made of thin, closely packed mirrors is likely to have lower quality. Additionally, paraboloid and hyperboloid mirrors are expensive to fabricate in large numbers on thin backings. One solution has been to make the mirrors out of thin, polished foils. When a thin (often well below a millimeter) foil is cut to the right shape and rolled up, it can form a conical shape that approximates a paraboloid or hyperboloid, as shown in Figure 12. Two of these, properly
configured, then approximate a Wolter. Because there is no sag to the mirrors, the rays do not come to a theoretically perfect point focus. Instead, they form a spot of diameter equal to the projected width of the conical mirror annulus. If the goal of the builder is to create a focus of modest quality that has a large collecting area, especially at high energies, then this is an acceptable trade. A number of such telescopes have been built (Fig. 13), usually for spectroscopy. In X-ray spectroscopy, the targets that are bright enough to be studied tend to be far apart in the sky, so low resolution is not an issue. Also, because the background signal from the sky is very low, there is no loss of signal-to-noise by letting the focal spot size grow.

Figure 12. The paraboloid of Fig. 9 can be approximated by a cone. This allows fabricating low-cost, low-quality optics.

An alternative to simple cones has been to use electroforming techniques to make thin shell mirrors. In the electroforming process, one builds a ‘‘mandrel,’’ which is like a mold, the inverse of the desired optic. The mandrel is machined from a piece of metal and then polished. When placed in an electroforming bath, a shell of metal (usually nickel) is built up around the mandrel by electrochemical processes. When the shell is sufficiently thick, it is removed from the bath and separated from the mandrel. The disadvantage of this process is that many mandrels must be made. The advantage is that many copies are possible at low cost. Because the mandrel is machined, a sag can easily be added to approximate the ideal Wolter shape more closely. Metals (such as nickel) that are used in electroforming tend to have high density and lead to heavy telescopes. Sometimes, replication is used to reduce weight. A shell of some lightweight material like carbon-fiber epoxy is built that approximates the desired shape. The mandrel is covered by a thin layer of epoxy, and the shell placed over it. The epoxy dries to the shape and polish of the mandrel. When the replica is pulled away, it provides a mirror that has low weight and good imaging properties.

Figure 13. Photograph of an X-ray telescope that has densely nested thin shell mirrors to build a large collecting area.

KIRKPATRICK–BAEZ TELESCOPES

The first imaging optic built using grazing incidence was an X-ray microscope built by Kirkpatrick and Baez in 1948 (9). (At the time, the space program had not really started, and only the sun was known to be a source of X rays. So there was no immediate application for X-ray telescopes.) The optic consisted of two standard, spherically shaped mirrors in sequence, which, together, formed an imaging optic, as shown in Fig. 14. Parallel light incident onto a mirror that has a concave, spherically shaped surface of curvature R and graze angle θ will come to a line focus a distance (R/2) sin θ from the mirror, parallel to the mirror surface. There is very little focusing in the other dimension. Because we usually want a two-dimensional image, a second reflection is added by placing the second mirror, oriented orthogonally, beyond the first. If the two mirrors are properly placed, then both dimensions focus at the same point. Kirkpatrick and Baez used spheres because they are readily available, but spheres have severe on-axis coma, and thus can have poor focus. The usual solution for
Figure 14. The Kirkpatrick–Baez optic is the first focusing X-ray optic made. It features two flat (or nearly flat) mirrors that reflect X rays in sequence. Each mirror provides focus in a different dimension.
Figure 15. To achieve a large collecting area, the Kirkpatrick–Baez telescopes require many co-aligned mirrors.
making a telescope is to replace the sphere by a one-dimensional paraboloid. The paraboloid, which can be made by bending a sheet of glass or metal, has the geometric property of focusing the on-axis light to a perfect line focus. The paraboloids can then be nested to add extra signal into the line focus. To achieve a two-dimensional image, another set of co-aligned paraboloids must be placed after the first set, rotated 90° around the optic axis, as shown in Fig. 15. This creates a two-dimensional image. Using this ‘‘Kirkpatrick–Baez’’ geometry, one can build telescopes that have very large collecting areas, suitable for studying faint sources. The disadvantages of these telescopes are that they have relatively poor resolution, typically no better than 20 arcseconds, and a fairly small field of view due to comatic aberration.

MULTILAYER TELESCOPES

A multilayer coating can be considered a synthetic version of a Bragg crystal. It consists of alternating layers of two materials deposited on a smooth substrate. Typically, one material has a high density and the other a low density to maximize the change in the index of refraction at the material interface. If the radiation is incident at an angle θ on a multilayer of thickness d for each layer pair, then constructive interference is experienced if the Bragg condition,

mλ = 2d cos θ,  (18)

is met. This creates a narrow-wavelength band where the reflectance of the surface is much higher than using a metal coating alone. In Fig. 16, we show the reflectance of a multilayer mirror that consists of alternating layers of tungsten and silicon as a function of incident energy. This narrow energy response can be tuned to the strong emission lines of an object like a star but leads to the absorption of most of the flux from a continuum source. By placing the multilayer on the surface of a conventional, normal incidence mirror, it becomes possible
Figure 16. Multilayer-coated mirrors can provide high reflectivity, where otherwise there would be none. In this graph, we show the response of a multilayer-coated mirror at a 3° graze. To the left, at low energy, the X rays reflect as usual. The multilayer provides narrow bands of high reflectivity at higher energies. Three Bragg orders are visible in this plot.
to create a narrow band of good reflectivity at normal incidence, where before there was none. This is usually applied in the extreme ultraviolet, but multilayers now work effectively up to 0.25 keV and even 0.5 keV. This approach has been used to excellent advantage in the study of the sun (10). The spectral band in which most X-ray telescopes have functioned is between 0.1 and 3.0 keV. Above 3 keV, the required graze angle becomes so small that the aperture annulus becomes small as well. The problem is compounded by the relatively low flux of the sources. Thus, to build a telescope, either we live with low efficiency or find a better way to improve the collecting area. Multilayers can be used to enhance reflectivity at grazing incidence. Higher energy radiation can be reflected at any given graze angle, but the narrow spectral response is difficult to match to the changing graze angles of the surface of a Wolter telescope. It has now been shown (11) that by varying the thickness of the multilayers as a function of depth, a broadband response can be created, making the multilayers more useful for spectroscopy and the study of continuum sources. This effect works particularly well for hard X-ray telescopes, where the penetrating power of the X rays is higher, leading to interaction with a larger number of layers. The next generation of hard X-ray telescopes will probably use this effect.

X-RAY INTERFEROMETRIC TELESCOPES

Telescope resolution is limited by diffraction. Diffraction in radio telescopes is so severe that most new major observatories are interferometric. By linking together the signals from the telescopes without losing the phase information, synthetic images may be created to match the resolution of a single giant telescope whose diameter is equal to the baseline of the interferometer. The need for
interferometry comes from engineering realities. At some point, a telescope simply becomes so large that either it cannot be built or the expenses cannot be met. The largest practical mirrors of quality sufficient for X rays are about 1 m long. At a graze angle of 2°, the entrance aperture of a 1-m mirror is about 3 cm; this means that a 1-keV signal will be diffraction-limited at about 10 milliarcseconds (0.01''). Thus, conventional X-ray telescopes can achieve very high resolution, ten times that of the Hubble Space Telescope, before the diffraction limit becomes a serious problem. But there is much exciting science beyond the X-ray diffraction limit. For example, at 10⁻³ arcseconds, it is possible to image the corona of Alpha Centauri, and at 10⁻⁶ arcseconds, it will be possible to resolve the event horizons of black holes in nearby active galactic nuclei. To build an X-ray interferometer that can accomplish these goals may seem impossible at first, but the properties of grazing incidence, coupled with the properties of interferometers, provide a pathway. First, one must achieve diffraction-limited performance in two or more grazing incidence mirrors and then combine the signals in a practical way to achieve a synthetic aperture. Such a system has now been built and demonstrated in the laboratory (12), but, although there are plans to fly an X-ray interferometer in the 2010 time frame, none has yet been launched. In theory, paraboloid-hyperboloid telescopes can reach the diffraction limit but in practice are too expensive. So, flat mirrors are used to ease the mirror figure and polish requirements. Figure 17 shows a schematic of a practical X-ray interferometer that has produced fringes in the laboratory and is being scaled up for flight. The idea is to provide two flat, grazing incidence mirrors set at an arbitrary separation that direct the X rays into a beam combiner. The separation of these two mirrors sets the resolution of the interferometer.
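The figures quoted above can be checked with the small-angle diffraction estimate θ ≈ λ/D. A sketch (the 2° graze and 1-m mirror length come from the text; the constant 12.398 Å·keV converting photon energy to wavelength is standard):

```python
import math

ARCSEC_PER_RAD = 180 / math.pi * 3600

def diffraction_limit_arcsec(energy_keV, aperture_m):
    """~lambda/D diffraction limit in arcseconds."""
    wavelength_m = 12.398e-10 / energy_keV   # lambda(Angstrom) = 12.398 / E(keV)
    return wavelength_m / aperture_m * ARCSEC_PER_RAD

# Projected entrance aperture of a 1-m mirror at a 2-degree graze angle.
aperture = 1.0 * math.sin(math.radians(2.0))
print(f"projected aperture: {aperture * 100:.1f} cm")
print(f"1-keV diffraction limit: {diffraction_limit_arcsec(1.0, aperture) * 1000:.1f} mas")
```

The result is a few centimeters of aperture and a diffraction limit of order 10 milliarcseconds at 1 keV, in line with the numbers in the text.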
The two beams are then mixed by a beam combiner. This has traditionally been accomplished for X rays by using Laue crystals (13), but the thickness of the crystals, coupled with their low efficiency, makes them impractical for astronomy.
A solution that provides broad spectral response and high efficiency is simply to use two more flat mirrors. The beams cross and then strike the second set of flat mirrors at a graze angle just slightly higher than the initial reflection. This brings the two beams (still plane parallel) back together at a low angle. As Fig. 17 shows, if the two beams are coherent when they cross, fringes will appear. The spacing s of the fringes is given by

s = Lλ/d,  (19)

where d is the separation of the second set of mirrors and L is the distance from the mirrors to the focal plane where the beams cross. If L/d is sufficiently large, then the fringes can be resolved by a conventional detector. For example, if L/d is 100,000, then the fringe spacing from 10-Å X rays will be 100 µm, easily resolved by most detectors. This approach to beam combination is highly practical because it uses flat mirrors and has high efficiency. It also works in a panchromatic way. Each wavelength of radiation creates fringes at its own spacing, so, if the detector can resolve the energy of each photon, the individual sine waves will be resolved, and the interferometer will function across a wide spectral band. A single pair of mirror channels is inadequate in an interferometer for X-ray astronomy. In the early days of radio interferometry, a single pair was used, and the UV plane was sampled as the source drifted across the sky. However, many of the most interesting X-ray sources are highly variable, and it may not be possible to wait for a change of orientation. Thus, a substantial portion of the UV plane needs sampling simultaneously, effectively requiring that more than two channels mix at the focal plane. One attractive geometry that is being pursued is to use a ring of flats, as shown in Fig. 18. This can be considered a dilute aperture telescope.
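Equation (19) makes the detector requirement concrete; a sketch reproducing the worked example in the text:

```python
def fringe_spacing_m(L_over_d, wavelength_m):
    """Fringe period s = L * lambda / d, Eq. (19)."""
    return L_over_d * wavelength_m

s = fringe_spacing_m(1e5, 1e-9)   # L/d = 100,000 and 10-Angstrom X rays
print(f"fringe spacing: {s * 1e6:.0f} micrometers")
```

This prints a fringe spacing of 100 micrometers, matching the figure in the text and comfortably within the pixel scale of conventional detectors.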
Because each of the many mirror paths interferes against all of the others, there is a large range of sampling in frequency space, and the beam pattern starts to resemble that of a telescope.

GRATINGS
The analysis of spectra is a central tool of the X-ray astronomer, just as it is in other bands of the spectrum.
Figure 17. A simple X-ray interferometer, capable of synthetic aperture imaging may be built in this configuration. A pair of flat mirrors at grazing incidence causes the wave fronts to cross. Another pair of flat, grazing incidence mirrors redirects the light to an almost parallel geometry, where the beams cross at a large distance. Because of the low angles at which the wave fronts cross, fringes much larger than the wavelength of the X rays can be created.
Figure 18. The interferometer of Fig. 17 can be made more powerful by placing multiple sets of flat mirrors in a ring that feeds a common focus.
X-ray astronomy differs from other fields in that much of the spectroscopy is performed by using energy-sensitive, photon-counting detectors. The first X-ray observatories had no optics but could still perform low-resolution spectroscopy by using proportional counters. These devices measured the number of secondary ionization events caused by an X ray and thus led to an energy estimate of each photon. However, the spectral resolution (R = λ/δλ) was only about 5 at 1 keV. Solid-state systems, as exemplified by CCDs, now reach R = 20 at 1 keV, and quantum calorimeters have a resolution of several hundred and are still improving. To achieve very high spectral resolution, the X-ray astronomer, just like the visible light astronomer, needs a diffraction grating. Similarly, diffraction gratings come in two categories. Transmission gratings allow the photons to pass through and diffract them in the process; reflection gratings reflect the photons at grazing incidence and diffract them in the process.

Transmission

Transmission gratings are, in essence, a series of thin, parallel wires, as shown in Fig. 19. The space between the wires is ideally empty to minimize absorption of X rays. The wires should be optically thick to absorb the flux that strikes them. This creates a wave-front amplitude on the far side of the grating that is shaped like a repeating square wave. At a large distance from the grating, constructive interference can be experienced where the wave satisfies the ‘‘grating equation’’ (1),

nλ = d sin α,  (20)
where d is the groove spacing, λ the wavelength of the radiation, and α is defined as in Fig. 19. The value of α is the dispersion angle, which is limited to about ±1° in the X ray. Because the grating consists of simple wires that cast shadows, the grating cannot be blazed in the same way as a reflection grating. If the wires are fully opaque and they cover about half of the area, then the plus and minus first orders of diffraction each carry about 10% of the beam, for a maximum signal of 20%. This is a low efficiency but is still much higher than crystals, because the entire spectrum is dispersed at that efficiency. To disperse the short wavelength X rays through a substantial angle, the grooves must be very close together, typically 100–200 nm. This in turn means that the gratings must be very thin, substantially under 1 µm. Making a single grating of this groove density and thickness is currently impractical, so the transmission grating used in a telescope consists of many small facets, each about a square centimeter in extent. These are arranged behind the Wolter telescope. The aberrations introduced off-axis by the grating array are smaller than those from the Wolter telescope, so a better spectrum is gained by placement behind the optic in the converging beam.

Reflection

A reflection grating is an alternative to transmission gratings. The same reflection gratings that are used in the ultraviolet and longer wavelength bands can be used in the X ray as well, as long as they are used at grazing incidence. When X rays approach a grating at grazing incidence, the symmetry of the interaction is broken. The rays can approach the grating in the plane that lies perpendicular to the grooves, and the diffracted light will also emerge in that plane, as shown in Fig. 20. This is known as the ‘‘in-plane’’ mount.
However, it is also possible for the radiation to approach the grating quasi-parallel to the grooves, in which case it diffracts into a cone about the direction of the grooves, as shown in Fig. 21. This is known as conical diffraction in the ‘‘extreme off-plane’’ mount because the diffracted light no longer lies in the same plane as the incident and zero-order reflected light. The grating equation can be written in the general form

nλ = d sin γ (sin α + sin β),
(21)

where n is the order number, λ the wavelength, and d the groove spacing. α and β are the azimuthal angles of the incident and diffracted rays about the direction of the grooves, and γ is the angle between the direction of the grooves and the direction of the incident radiation. For an in-plane mount, γ is 90°, and the grating equation simplifies to

nλ = d(sin α + sin β),  (22)

which is the usual form of the equation. In the off-plane mount, γ is small and makes the change of azimuthal angle larger at a given groove density.

Figure 19. An X-ray transmission grating is made of many finely spaced, parallel wires. Diffraction of X rays results at wire densities of up to 10,000 per millimeter.
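The two mounts can be compared numerically via the general grating equation (21), solving for the diffracted azimuth β. A sketch (the wavelength, groove spacing, and angles are illustrative values, not from the text):

```python
import math

def beta_deg(n, wavelength_m, d_m, alpha_deg, gamma_deg):
    """Solve n*lambda = d*sin(gamma)*(sin(alpha) + sin(beta)) for beta."""
    g = math.radians(gamma_deg)
    a = math.radians(alpha_deg)
    s = n * wavelength_m / (d_m * math.sin(g)) - math.sin(a)
    return math.degrees(math.asin(s))

lam = 1.24e-9   # ~1 keV
d = 200e-9      # 5,000 grooves per millimeter

# In-plane mount: gamma = 90 deg, rays near grazing, so alpha is close to 90 deg.
print(beta_deg(1, lam, d, alpha_deg=88.0, gamma_deg=90.0))
# Off-plane mount: gamma = 2 deg; sin(gamma) is small, so the azimuthal
# dispersion is much larger at the same groove density.
print(beta_deg(1, lam, d, alpha_deg=0.0, gamma_deg=2.0))
```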
Figure 20. In the conventional grating geometry, a ray approaches in the plane that lies perpendicular to the groove direction. The reflected and diffracted rays emerge in the same plane. α measures the direction of the reflected, zero-order light, and β measures the diffracted ray. At grazing incidence, α and β must both be close to 90° .
Figure 21. In the off-plane geometry, the ray approaches the grating nearly parallel to the grooves, separated by the angle γ. α and β are defined as in Fig. 20 but now are azimuthal angles. The diffracted light lies in the cone of half angle γ centered on the direction of the rulings in space.
Both approaches have been used in X-ray astronomy. The advantage of the in-plane mount is that the groove density can be low, easing the difficulty of fabrication. One advantage of the off-plane mount is that its dispersion can be greater at a given graze angle because all diffraction is at the same angle. Additionally, the efficiency of diffraction tends to be higher in the off-plane mount. In both mounts, as with transmission gratings, it is preferable to place the gratings in the converging beam
Figure 22. A varied line space grating features parallel grooves that become closer together along the length of the grating. This removes the coma from the spectrum.
behind the Wolter optic. If the gratings are placed ahead of the optic (14), then the aberrations of the telescope degrade the spectral resolution. However, a major advantage is that conventional, plane gratings can be used to disperse the light. If the gratings are used in the in-plane mount, then the grooves may be parallel. To compensate for the changing distance between the point of reflection and the focus, the dispersion must increase as one goes farther down the grating (Fig. 22). This leads to a grating whose grooves are parallel, but change density (15). Such grating arrays were used on XMM-Newton (16). An alternative that can lead to higher efficiency and greater resolution is to use the off-plane mount in the converging beam (17). Again, the gratings stretch from right behind the mirror to closer to the focal plane. Near the bottom of the gratings, the dispersion needs to be higher, so the grooves must be closer together. Such a geometry requires the grooves to be radial, fanning outward from a hub which is near the focus (Fig. 23). Such gratings can be more difficult to fabricate, given their requirement for higher groove density, but can give much better performance (18). Such a grating was used to obtain an extreme ultraviolet spectrum of a hot white dwarf in a suborbital rocket experiment (19). When the gratings are placed in the converging beam, behind the optics, severe aberrations result if plane gratings are used. So, the grooves must not be parallel. Additionally, because there is a range of angles eight times the graze angle represented in the converging beam, eight or more gratings are required in an array, as shown in Fig. 24.

MAJOR MISSIONS

The history of X-ray telescopes can be tracked in the major missions that have flown and performed new astronomy
Figure 23. A radial groove grating features grooves that radiate from a hub off the edge of the grating. This removes the coma in the spectrum of the converging beam used in the off-plane mount.
Figure 24. The radial groove and varied line space gratings must be placed in arrays behind the telescope to cover the entire beam.

using optics. There are too many missions to discuss in detail, but a few stand out as landmarks and show the evolution of X-ray telescopes.

Apollo Telescope Mount

The first high-quality X-ray images captured by telescopes were taken in an experiment called the Apollo Telescope Mount, which was flown on Skylab, the orbiting space station of the 1970s. This featured a small Wolter telescope that had a few arcseconds of resolution for imaging the sun. This telescope was built before there was general access to digital imaging detectors, and so it used film at the focal plane. Astronauts on the station operated the telescope and returned the film to earth for analysis. Using this instrument, scientists could track the changing complexity of the solar corona in detail for the first time. In Fig. 25, we show an X-ray image of the sun captured by the Yohkoh satellite in 1991 (20).

Figure 25. An image of the sun was taken at 1 keV using a paraboloid-hyperboloid telescope on the Yohkoh mission. See color insert.

Einstein

The first major observatory for X-ray astronomy that featured a telescope was flown in 1978 (21). Dubbed the Einstein Observatory, the mission was based on a set of nested Wolter Type I telescopes. Using a 4-m focal length and a maximum diameter of 60 cm, this observatory captured the first true images of the X-ray sky at a resolution as fine as 6 arcseconds. It had a low-resolution imaging proportional counter and a higher resolution microchannel plate at the focal plane. A transmission grating array was available for the study of bright sources. This observatory operated for just 3 years, but through the use of grazing incidence telescopes, moved X-ray astronomy onto the central stage of astrophysics.

Rosat

The next major advance came from ROSAT (22), which featured nested Wolter telescopes of 3-arcsecond quality. Unlike the Einstein Observatory, which was used in a conventional observatory mode of pointing at selected targets, Rosat was first used to survey the entire sky. By sweeping continuously around the sky, it was able to build up an image of the entire sky in the X-ray region at unprecedented resolution. Then, it spent the ensuing years studying individual targets in a pointed mode.

Trace

Remarkable new images of the X-ray emission from the sun became available from the TRACE mission (10).
Figure 26. An image of the hot plasma on the limb of the sun was taken using a multilayer-coated telescope on the TRACE satellite. See color insert.
This satellite used a multilayer coating on a normal incidence optic to image the detailed structure of the sun’s corona (Fig. 26). Because of its use of normal incidence optics, the telescope achieved exceptionally high resolution, but across a limited band of the spectrum. The structure of the X rays shows previously unsuspected detail in the sun’s magnetic fields.
Figure 27. The Chandra Satellite before launch. The 10-meter focal length of the X-ray mirror created a long, skinny geometry.
Chandra

NASA launched the Chandra Observatory in July 1999. This was the natural successor to the Einstein Observatory (23). Chandra features high-resolution Wolter Type I optics nested six deep. The telescope is so finely figured and polished that it achieves resolution better than one-half arcsecond, not far from the performance of the Hubble Space Telescope. Chandra, which is shown in Fig. 27, uses a CCD detector or a microchannel plate at the focal plane. It performs spectroscopy through the energy sensitivity of the CCD and by using a transmission grating array. The improved resolution is providing dramatic new results. For example, the very first image taken was of the supernova remnant Cas-A. The resolution is so fine that the telescope could immediately identify a previously unremarkable feature near the center as a stellar remnant of the supernova explosion. In Fig. 28, we compare images of the Crab Nebula from Rosat and Chandra, illustrating the importance of improved resolution from telescopes in X-ray astronomy.
Figure 28. X-ray telescopes are improving. To the left is an image of the Crab Nebula acquired with ROSAT in 1990. It shows the synchrotron emission in green and the bright spot, which is the pulsar, in the center. To the right is an image captured with Chandra in 1999. The resolution has improved from 3 arcseconds to one-half arcsecond, and the level of detail is much higher. See color insert.
XMM-Newton

In December 1999, the European Space Agency launched a major X-ray telescope into orbit. Called XMM-Newton, it is an X-ray Multi-Mirror Mission named in honor of Sir Isaac Newton (16). It features three high collecting area mirrors of the thin mirror variety. At the cost of some resolution (15 arcseconds), it can achieve a very high collecting area for studying faint objects. Spectroscopic studies using the CCDs and an array of reflection gratings are now starting to generate unique new information about the physics of X-ray sources.

Future Missions

Considerable effort has gone into the definition of future missions for X-ray astronomy. They appear to be splitting into two varieties, similarly to visible light astronomy. The first class of observatory will feature modest resolution
(1–30 arcseconds) but very high collecting area using thin mirror telescopes. NASA is planning a mission called Constellation-X (24), which will provide an order of magnitude more collecting area for spectroscopy than we currently enjoy with Chandra. The European Space Agency is studying a mission called XEUS (X-ray Evolving Universe Satellite), which will feature a huge X-ray telescope of modest resolution using thin mirrors in orbit near the Space Station Freedom (25). The other area of development is toward high resolution. NASA is now studying a mission called MAXIM (Micro-Arcsecond X-ray Imaging Mission), which has the goal of using X-ray interferometry to improve our resolution of X-ray sources by more than a factor of 1,000,000 (26). At resolutions below a microarcsecond, it should actually be possible to image the event horizons of black holes in the centers of our galaxy and others.
BIBLIOGRAPHY
1. J. A. R. Samson, Techniques of Vacuum Ultraviolet Spectroscopy, Cruithne, Glasgow, 2000.
2. M. Born and E. Wolf, Principles of Optics, 7th ed., Cambridge University Press, Cambridge, 1999, pp. 292–263.
3. B. L. Henke, E. M. Gullikson, and J. C. Davis, At. Data Nucl. Data Tables 54, 181 (1993).
4. W. Cash, Appl. Opt. 26, 2,915–2,920 (1987).
5. D. K. G. de Boer, Phys. Rev. B 53, 6,048–6,064 (1996).
6. H. Wolter, Ann. Phys. 10, 94–114 (1952).
7. M. Lampton, W. Cash, R. F. Malina, and S. Bowyer, Proc. Soc. Photo-Opt. Instrum. Eng. 106, 93–97 (1977).
8. H. Wolter, Ann. Phys. 10, 286–295 (1952).
9. P. Kirkpatrick and A. V. Baez, J. Opt. Soc. Am. 38, 766–774 (1948).
10. http://vestige.lmsal.com/TRACE/
11. D. L. Windt, Appl. Phys. Lett. 74, 2,890–2,892 (1999).
12. W. Cash, A. Shipley, S. Osterman, and M. Joy, Nature 407, 160–162 (2000).
13. U. Bonse and M. Hart, Appl. Phys. Lett. 6, 155–156 (1965).
14. R. Catura, R. Stern, W. Cash, D. Windt, J. L. Culhane, J. Lappington, and K. Barnsdale, Proc. Soc. Photo-Opt. Instrum. Eng. 830, 204–216 (1988).
15. M. C. Hettrick, Appl. Opt. 23, 3,221–3,235 (1984).
16. http://sci.esa.int/home/xmm-newton/index.cfm
17. W. Cash, Appl. Opt. 22, 3,971 (1983).
18. W. Cash, Appl. Opt. 30, 1,749–1,759 (1991).
19. E. Wilkinson, J. C. Green, and W. Cash, Astrophys. J. (Suppl.) 89, 211–220 (1993).
20. http://www.lmsal.com/SXT/
21. http://heasarc.gsfc.nasa.gov/docs/einstein/heao2-about.html
22. http://heasarc.gsfc.nasa.gov/docs/rosat/rosgof.html
23. http://chandra.harvard.edu
24. http://constellation.gsfc.nasa.gov
25. http://sci.esa.int/home/xeus/index.cfm
26. http://maxim.gsfc.nasa.gov
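The microarcsecond figure quoted for MAXIM in the Future Missions discussion can be checked with a rough order-of-magnitude estimate (a sketch added here, not part of the article; the mass of roughly 4 × 10⁶ solar masses and distance of roughly 8 kpc for the galactic-center black hole are assumed round values). The event horizon of a black hole of mass M at distance d subtends an angle of about θ ≈ 4GM/(c²d), twice its Schwarzschild diameter over distance:

```python
# Order-of-magnitude check (illustrative, not from the article) that
# sub-microarcsecond resolution suffices to image a black hole event horizon.
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8              # speed of light, m/s
M_SUN = 1.989e30         # solar mass, kg
PC = 3.086e16            # parsec, m
RAD_TO_UAS = 206264.8e6  # radians to microarcseconds

M = 4e6 * M_SUN          # assumed mass of the galactic-center black hole
d = 8.0e3 * PC           # assumed distance, 8 kpc

r_s = 2 * G * M / c**2                  # Schwarzschild radius, m
theta_uas = (2 * r_s / d) * RAD_TO_UAS  # angular diameter of the horizon

print(f"Schwarzschild radius: {r_s:.2e} m")
print(f"Horizon angular diameter: {theta_uas:.0f} microarcseconds")
```

The result, roughly 20 microarcseconds, is comfortably larger than a sub-microarcsecond resolution limit (Chandra's one-half arcsecond improved by the quoted factor of 1,000,000 is 0.5 microarcsecond), consistent with the claim that such an instrument could image event horizons.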
INDEX A A or B wind, 1040 A roll, 1034 A scan, in ultrasonography, 1413 A&B cutting, 1040 Abbe (V) number, 234–235, 1081–1082, 1100–1102, 1109, 1122–1123 Abbe sine condition of light optics, in charged particle optics and, 88 Abbe, Ernst, 261, 1100, 1106, 1109 Aberrations, 91–92, 234, 559, 1095, 1098, 1114–1124 accommodation and, 547 in charged particle optics, 92–94, 98 chromatic aberration, 1081–1082 human eye, 547–548 human vision and, 554–555 microscopy and, 1106 monochromatic, 1083–1085 optical transfer function (OTF) and, 1095–1098 PSF and, 1088–1089 wave theory of, 1083–1084 Absolute fluorescence (ABF), 863 Absorptance, 527–529 Absorption, 253–256, 527–529, 803 astronomy science and, 684–686 atmospheric, 1497 interstellar, 1496–1497 microscopy and, 1126 near resonance, 228–233 photodetectors and, 1184–1187 silver halide, 1273 Absorption band, 229 Absorption edge, 255 Absorption lines, 235 Academy aperture, 1040, 1359 Academy leader, 1040 Accommodation, 547, 1328 Acetate film, 1039, 1040 Acetone, PLIF and, 409 Achromatic doublet, 235 Acoustic antireflective coatings (AARC) in scanning acoustic microscopy (SAM), 1231 Acoustic impedance, in ultrasonography, 1415 Acoustic microscopy, 1128–1148 Acoustic reciprocity theorem, 1 Acoustic sources or receiver arrays, 1–9 Active centers, 1137 Active glasses, in three-dimensional imaging, 1331 Active matrix liquid crystal displays, 374, 857–858, 956
Active MATrix coating (AMAT), in electrophotography, 301 Active pixel sensors (APS), 1199–1200 Active snakes, thresholding and segmentation in, 644–645 Acuity, of human vision, 558–560 Acutance, 1357 Addition, image processing and, 590 Addition, Minkowski, 433, 612 Additive color films, instant photography and, 847–849 Additive color matching, 102 Additive color mixing, 126–127 Additive printing, 1040 Additivity, Grassmann’s, 531 Addressability, cathode ray tube (CRT), vs. resolution, 32 Addressing displays, liquid crystal, 955–959 Admissibility condition, wavelet transforms in, 1446 Advance, 1040 Advanced Photo System (APS) in, 141 Advanced Technology Materials Inc. (ATMI), 377 Advanced Very High Resolution Radiometer (AVHRR), 759, 760–761 Advanced Visible Infrared Imaging Spectrometer (AVIRIS) geologic imaging and, 648, 649 in overhead surveillance systems, 787 Aerial imaging, 350, 463–476, 1040 Aerial perspective, 1328 Aerosol scattering, lidar and, 880–882 Afocal systems, 1074, 1079 Agfa, 1024 Agfa Geaert, 839 Agfachrome Speed, 839 Agfacolor Neue process in, 128 Aging and human vision, 541, 549, 560 AgX (See Silver halide) Airborne and Topographic SAR (AIRSAR/TOPSAR), 650 Airborne radar, 1471–1473 Aircraft in overhead surveillance, 773–802 Airy disk, 249 flow imaging and, 399 microscopy and, 1110, 1124 telescopes, 688 Airy function, 94, 399 Airy pattern, 1082 1511
Albedo, 610, 611 AlGaN/GaN, scanning capacitance microscope (SCM) analysis, 22 Algebraic opening/closing, 437 Algebraic reconstruction technique (ART), in tomography, 1407–1408 Algebraic theory, in morphological image processing, 430 Aliasing, 50, 59, 60, 69, 75, 84 flow imaging and, 401 in magnetic resonance imaging (MRI), 986 Along Track Scanning Radiometer (ATSR), 772 Alternating current scanning tunneling microscopy (ACSTM), 28 Alternating pairs, in three-dimensional imaging, 1338 Alternating sequential filters, 437 Alumina dopants, secondary ion mass spectroscopy (SIMS) in analysis, 487–489 Aluminum lithium alloys, secondary ion mass spectroscopy (SIMS) in analysis, 484 Ambient light, vs. cathode ray tube (CRT), 182–183 American National Standards Institute (ANSI), 1041 American Standards Association (ASA), 1023 Amines, 1179 Amino acids (See Biochemistry) Ampere’s law, 211, 218 Amplifiers, SQUID sensors, dc array, 12 Amplifying medium, radar, 223 Amplitude, 212, 226, 1098 beam tilting, 5 nonsymmetrical, 4 Amplitude modulation (AM), 383, 1362 Amplitude reflection coefficient, 235 Amplitude resolution, 151 Amplitude time slice imaging, ground penetrating radar and, 472–475, 472 Amplitude transmission coefficient, 235 Amplitude weighting, 3 Anaglyph method, in three-dimensional imaging, 1331
Analog technology, 1040 in endoscopy, 334 SQUID sensors using, 10–12 Analog to digital conversion (ADC), 49, 61 in field emission displays (FED), 382 in forensic and criminology research, 721 in overhead surveillance systems, 786 Analytical density, 1040 Anamorphic image, 1040 Anamorphic lens, 1031, 1040 Anamorphic release print, 1040 AND, 590 Anger cameras, neutron/in neutron imaging, 1062–1063 Angle, 1040 Angle alpha, 541 Angle of incidence, 234, 468 Angle of reflection, 234 Angle of refraction, 234 Angular dependency, in liquid crystal displays (LCDs), 184 Angular field size, 1080 Angular frequency, 213 Angular magnification, 1076–1077 Angular resolution, High Energy Neutral Atom Imager (HENA), 1010 Angular spectrum, 1099 Animation, 1022, 1040–1042 cel, 1042 meteor/in meteorological research, 763–764, 769 in motion pictures, 1035 Anisotropic coma, 94 Anisotropic distortion, 94 Anisotropic astigmatism, 94 Anode, in field emission displays (FED), 380, 386, 387 Anomalies, gravitational, 444–445 Anomalous dispersion, 228 Anomalous propagation, radar, 1454 Anomalous scattering, in biochemical research, 698 Ansco Color process in, 128 Answer print, 1041 Antennas, 220, 242 ground penetrating radar and, 464–465, 468–469 in magnetic resonance imaging (MRI), 979 radar and over the horizon (OTH) radar, 1142, 1151, 1450 terahertz electric field imaging and, 1394 Antifoggants, in photographic color display technology, 1216 Antihalation backing, 1041
Antireflective coatings (ARCs), 381 Aperture, 54, 57–59, 243, 247–249, 1041, 1101, 1352 academy, 1040, 1359 Fraunhofer diffraction in, 247–249 microscopy and, 1108, 1122, 1124, 1128 in motion pictures, 1028 numerical, 1081, 1115 in overhead surveillance systems, 783 radar and over the horizon (OTH) radar, 1147 relative, 1081 Aperture plate, 1041 Aperture stop, 1080 Aperture transmission function, 248 Aplanatic condensers, 1123 Apodization, 1086, 1088 Apollo Telescope Mount, 1507 Appliances, orthopedic, force imaging and, 422 Applications Technology Satellite (ATS), 757 Aqueous humor, 512 Arc lamps, 1041 Archaeology, ground penetrating radar and, 464 Archiving systems, 661–682 art conservation and analysis using, 661–682 in motion pictures, 1038–1039 Argon plasma coagulation (APC), in endoscopy, 340 Armat, Thomas, 1022 Array amplifiers, SQUID sensors, 12 Array of Low Energy X Ray Imaging Sensors (ALEXIS), lightning locators, 905, 929 Array theorem, 249 Arrival Time Difference (ATD), lightning locators, 890–904 Art conservation, 661–682 Artifacts quality metrics and, 598–616 tomography/in tomography, 1410 Artificial intelligence, 371 feature recognition and object classification in, 351 in search and retrieval systems, 632 Artificial vision systems, 352 Arylazo, 135 ASA/ISO rating, film, 1023, 1041 ASOS Lightning Sensor (ALS), 907, 922 Aspect ratio, 1031, 1041 in motion pictures, 1022 television, 147, 1359
Associated Legendre functions, 253 ASTER, 660 Astigmatism, 92, 94, 1084–1085, 1089 in charged particle optics, 93 electron gun, 40–41 microscopy and, 1119 Astronomy (See also Telescopes), 682–693 Apollo Telescope Mount, 1507 Chandra Observatory, 1508 Constellation X mission, 1509 Einstein Observatory Telescope, 1507 magnetospheric imaging, 1002–1021 ROSAT telescopes, 1507 TRACE telescopes, 1507–1508 X-ray Evolving Universe Satellite, 1509 X-ray telescope, 1495–1509 XMM Newton telescope, 1508 Asynchronous transfer mode (ATM), 1382 Atacama Large Millimeter Array (ALMA), 693 Atmospheric pressure chemical vapor deposition (APCVD), 384 Atmospherics, 890 Atomic force microscope (AFM), 16 Atomic transitions, 215 ATSC Digital Television Standard, 1359 ATSC Digital Television Standard, 1382–1389, 1382 Attenuation, 229 gamma ray, 260 ground penetrating radar and, 466 human vision and, 559, 562 radar and over the horizon (OTH) radar, 1147 in ultrasonography, 1416–1417 Audio (See Sound) Auroras, far ultraviolet imaging of, 1016–1020 Authentication digital watermarking and, 161 in forensic and criminology research, 739–740 Autoassemble, 1041 Autochrome Plate process, color photography, 127 Autocorrelation, 1105 Autoexposure, 1355–1356 Autofluorescence, 1136 Autofocus, 1356 Automated Surface Observing System (ASOS) lightning locators, 907, 922
Autonomous System Lab, art conservation and analysis using, 664 Autostereoscopic displays, 1328, 1336–1341 Avalanche photoconductors, 1173–1174 Aviation meteorology, 767 Axis, camera, 1041 Azimuth range Doppler (ARD), radar and over the horizon (OTH) radar, 1148 Azohydroquinones, instant photography and, 834 Azomethines, instant photography and, 835 Azos, 1179 Azosulfones, instant photography and, 839
B B frame, video, 1387 B roll, 1034 B scan, in ultrasonography, 1413, 1417, 1428–1429 Babinet’s principle, 249 Background, 568–569 Background, in infrared imaging, 807, 810, 813 Background haze, 604 Background limited infrared photodetector (BLIP), 1189 Background uniformity, 605 Backing, 1041 Backlight systems, in three-dimensional imaging, 1339 Backpropagation algorithm, in neural networks, 372 Backscatter, 873, 1450, 1469 Backscattered electron (BE) imaging, in scanning electron microscopes (SEM), 276 Baffled circular piston, 8 Baffled rectangular piston, 8–9 Balance stripe, 1041 Balance, color, 114–116, 118 Ballistics analysis, in forensic and criminology research, 716 Balloons in overhead surveillance, 773 Band theory of matter, photoconductors and, 1170–1171 Bandpass filters, 593, 611 Bandwidth, 50 cathode ray tube (CRT), 179–180 digital watermarking and, 149–150 expansion of, 149–150 lightning locators, 905
in magnetic resonance imaging (MRI), 985 radar and over the horizon (OTH) radar, 1147 television, 1362, 1367 Bandwidth compression, in overhead surveillance systems, 786 Barn door, 1041 Barney, 1041 Barrel distortion, 93 Baryta paper, 1209–1210 Base, for film, 1022–23, 1041, 1045 BASIC, 186 Batchelor scale, flow imaging and, 404 Beam candlepower seconds (BCPS) rating, 492 Beam conditioning, X-ray fluorescence imaging and, 1477 Beam expanders, holography, 509 Beam patterns and profiles, 1, 1426–1427 Beam splitters, in holography, 509 Beam tilting, amplitude, 5 Bechtold, M., 456 Beer-Lambert absorption law, 409 Beer’s law, 344 Bell, Thomas, 455, 456 Benchmark comparison, 607 Bent contours, 282 Benzisoxazolone, in instant photography, 838 Benzoylacetanilides, 134 Bertrand polarization lens, 1132 Bessel function, 94, 1082 Best optical axis, in human vision, 540–541 Beta decay, 220 Bethe–Bloch equation, 1156 Biacetyl, PLIF and, 409 Bias voltage, scanning capacitance microscope (SCM) vs., 20 Biased transfer roll (BTR), in electrophotography, 301 Bidirectional reflectance distribution function (BRDF), 51, 528 Binary imaging, 584–589 Binder or dead layer, in field emission displays (FED), 381 Binocular disparity, 1328 Binoculars, 1074 Biochemistry and biological research, 693–709 force imaging and, 422 in scanning acoustic microscopy (SAM), 1228 in scanning electrochemical microscopy (SECM), 1255–1256
secondary ion mass spectroscopy (SIMS) in analysis, 484 X-ray fluorescence imaging and, 1479–1482 Biochips, terahertz electric field imaging and, 1403 Bioorthogonal wavelet basis, 1446 Bipack filming, 1041 Bipedal locomotion, force imaging and analysis of, 419–430 Bipolar junction transistors, 1173 Birefraction, 1131–1132 Birefringence, 233, 1134 Bisazos, 1179–1180 Bistatic radar, 772 Bit rates, compressed vs. uncompressed, 152 Black-and-white film, 1023, 1041, 1356 Black-and-white images, 584–589, 830–833, 1347 Black-and-white TV, 1359 Black light, 1041 Black-white vision, 567 Black, James Wallace, 773 BLACKBEARD lightning locators, 905, 929 Blackbody radiation, 103, 211, 222, 525, 690, 803, 804, 813 in overhead surveillance systems, 782, 789 photodetectors and, 1184–1187 Blanking, television, 1360 Bleaches, 138–139, 1217 Blimp, 1041 Blind spot, human eye, 514, 516 Blobs, 646 Blobworld, 624, 630 Block based search and retrieval systems, 623 Blocking lens, in three-dimensional imaging, 1331 Blooping, 1041 Blow up, 1041 Blue light, 101 Blue screen, 1041 Blueshift (See also Doppler shift), 686, 772 Blur function, 596 Blurring, 50, 56, 58, 59, 72, 81, 82, 83, 84, 544, 577, 604 digital watermarking and, 167 flow imaging and, 399–400 high-speed photography and, 491–492 image processing and, 579, 581, 595–596 in medical imaging, 756 in overhead surveillance systems, 784 quality metrics and, 598–616
Bohr magnetron, 288 Bolometers, 1193–1194, 1204–1206 Boltzmann’s constant, 222 Bonvillan, L.P., 773 Boolean operations, 590 Boom, 1041 Borehole gravity meter, 450 Bormann effect, 282 Born Oppenheimer approximation, 216 Bouguer anomaly/correction, 447, 453 Bounce light, 1041 Boundaries, thresholding and segmentation in, 644–645 Boundary conditions, 233–234 Boundary hugging, thresholding and segmentation in, 642 Bragg reflection, 244 Bragg’s law, 267 Brain and human vision, 513–514, 558, 569 Braking radiation (bremsstrahlung), 223 breakdown, 1041 Breast cancer detection, in infrared imaging, 812 Bremsstrahlung, 219, 223–224 Brewster angle, 237 Brewster window, 237 Bright field image, 268 Bright fringes, 243, 269–270 Brightness, 102, 618, 1037 cathode ray tube (CRT), 34–35, 180 in charged particle optics, electron gun, 88 feature measurement and, 344–345 in field emission displays (FED), 382 in forensic and criminology research, 723–724 image processing and, 580–584 Broad light, 1041 Broadatz textures, 622 Broadcast transmission standards, television, 1359–1393 Browsing, in search and retrieval systems, 617 Bucky diagrams, 572 Buffer rods, in scanning acoustic microscopy (SAM), 1233, 1234 Building block structures, in transmission electron microscopes (TEM), 271 Buried heterostructure laser, scanning capacitance microscope (SCM) analysis, 22–23 Butt splice, 1041
C C scan ultrasonography, 1413 Cadmium sulfide photodetectors and, 1190, 1200 Calibration color image, 116–117 DPACK, 27 FASTC2D, 27 force imaging and, 424 Hall generators, 974 scanners, 603 scanning capacitance microscope (SCM), 27–28 TSUPREM4, 28 Calotype, 1259–1309, 1345 Cambridge Display Technology (CDT), 818 CAMECA, 478, 479 Camera axis, 1041 Camera film, in motion pictures, 1026–1027 Camera log, 1041 Camera obscura, 1344–1345 Camera operator/animator, 1042 Cameras Anger, 1062–1063 animation, 1041 aperture in, 1028 autoexposure, 1355–1356 autofocus, 1356 Captiva, 847 digital imaging, 854–855 dollies, 1030 electronic flash in, 1348–1349 energetic neutral atom (ENA) imaging, 1006–1010 flow imaging and, 393–394 in forensic and criminology research, 710–714 frame and film gate in, 1028 frame rates in, 495 handheld cameras, 1031 high-speed, 494–498, 1047 I Zone, 847 image dissection cameras, 498 image formation in, 571 instant photography and, 827–859 intermittent action high-speed cameras for, 495 large format, 1352–1354 lens in, 1029, 1348 medium format, 1351 microimaging, 1351 mirrors in, 1074, 1346, 1349–1350 for motion pictures, 1022, 1027–1029, 1042 motor drive in, 1348 in overhead surveillance systems, 773–802
photoconductors and, 1174 pinhole, 1072–1073 Pocket Camera, 847 Polachrome, 848 Polacolor, 843–844 Polaroid, 844–847 Polavision, 848 pull down claw in, 1027 rangefinder in, 1350–51 reflex type, 1345, 1349–1350 rotating drum and mirror cameras, 496–497 rotating mirror framing cameras, 497–498 rotating prism cameras, 495–496 scintillation, 1313–1314 shutter in, 1027–28, 1351–1352 single lens reflex (SLR), 1349–1350 speed of, 1028–1029 Steadicam, 1031 still photography, 1344–1358 streak cameras, 499–500 strip cameras, 500 tripods, 1030 trucks, 1030 video assist, 1031 video, 1029–1031, 1174 viewfinders in, 1029 Campisi, George J., 375 Canadian Lightning Detection Network (CLDN), 890–904, 935 Cancer detection, using infrared imaging, 812 Candela, 1042 Candescent, 377–378 Canon Image Runner, 300 Capacitance electronic disk (CED), 16 Capacitance sensors, 17–18, 423–424 Capacitance–voltage (C–V) curve, scanning capacitance microscope (SCM), 21, 25 Capacitive probe microscopy (See also Scanning capacitance microscope), 16–31 Capacity, information, 1082–1083 Captiva, 847 Captured orders, microscopy and, 1109 Carbon arc lamps, in projectors, 1037 Carlson, Chester F., 299, 1174 Cascade development, in electrophotography, 312 Cassegrain mirrors, 783 Cassini Saturn Orbiter, 1020 Cataract, human eye, 548 Category scaling, quality metrics and, 608–609
Cathode cathode ray tube (CRT), 44–45 electron gun, 39 in field emission displays (FED), 378–380, 384–387 Cathode ray direction finder (CRDF) lightning locators, 890, 912, 935 Cathode ray tube (CRT), 31–43, 44–48 in three-dimensional imaging, 1330 electron microscope use of, 262 in field emission displays (FED) vs., 374 graphics cards and, 174–175 meteor/in meteorological research, 757 oscilloscope using, 47 radar tubes using, 47 in three-dimensional imaging, 1333 Cathodluminescence (CL), 374 CAVE three-dimensional display, 1335 Cavity resonators, 1223–27 CD ROM compression, 151, 156–157 CD-2/3/4 developers, 131 Cel animation, 1042 Cellulose triacetate, 1042 Cement splice, 1042 Ceramics, secondary ion mass spectroscopy (SIMS) analysis, 484 Cerenkov radiation/counters, 1158, 1162 CERN Accelerator, 970 CGR3 flash counters, 907 Chain codes, in search and retrieval systems, 625 Chambers of human eye, 746 Chandra Observatory, 1508 Changeover, 1042 Channel buffer, 1388 Channel capacity, 65–66 Channel constancy, 173–174, 181, 184 Channel independence, 176, 181 Channeling, electron, 285–286 Channels, 1362, 1388 Character animation, 1042 Characterization of image systems, 48–86 Charge carrier, photoconductors and, 1170 Charge coupled devices (CCD), 1090 art conservation and analysis using, 663, 664, 671 astronomy science and, 688 color photography and, 141 digital photomicrography, 1138–1140
in endoscopy, 334 flow imaging and, 393–394 in forensic and criminology research, 722–723 high-speed photography and, 498–499 human vision and vs., 552–553 image formation in, 571 instant photography and, 827 multipinned phase (MPP), 1198 noise in, 396–397 in overhead surveillance systems, 784–787 particle detector imaging and, 1167 photoconductors, 1173–1174 photodetectors and, 1184, 1187, 1190, 1193, 1194–1198 radiography/in radiographic imaging, 1067–1069 terahertz electric field imaging and, 1398 virtual phase CCD (VPCCD), 1198 X-ray fluorescence imaging and, 1477 Charge cycle, in electrophotography, 300–304 Charge exchange process, 1004 Charge injection devices (CID), 1192–1193 Charge mode detectors, 1191–1193 Charge-to-mass ratio, in electrophotography, 313–315 Charge transfer devices (CTD), 1192 Charge transport layer (CTL), 1176 Charged area development (CAD), in electrophotography, 317 Charged particle optics, 86–100 Charging cycle, 1176 Charters of Freedom project, art conservation, 663 Chemical composition analysis, 684–686 Chemical sensitization, 1288–1293 Chemical shift imaging, MRI, 995–1000 Chemical shift saturation, MRI, 994–995 Chemical vapor deposition (CVD), 383–384 Chemiluminescence, 255 Chen operator, image processing and, 582 Chirped probe pulse, terahertz electric field imaging and, 1396–1397 Cholesteric liquid crystal displays, 965–966 Chomratic aberration, 92 Choppers, lidar, 873
Chroma in, 103 ChromaDepth three-dimensional imaging, 1332 Chromakey, 1042 Chromatic aberration, 94, 234, 1081–1082 in charged particle optics, 93, 98 in human vision, 544–545, 554–555 in microscopy, 1117–1118, 1123 Chromatic adaptation, human color vision and, 520 Chromatic difference of refraction, 544 Chromatic filters, human vision and, 560 Chromaticity, 119, 148, 380, 512, 534–535 Chromaticity diagrams, 107–108, 534–535 Chrome, in gravure printing, 457–458 Chrominance, 103, 109, 1042 Chromogenic chemistry, 129–139 Chromophores, instant photography and, 836 Cibachrome process, 127 CIECAM97, 538 CIELAB, 521, 533, 535, 536, 537, 619 CIELUV, 535, 537, 619 CIGRE flash counter lightning locators, 907 Cinch marks, 1042 Cinemascope, 1031, 1042 Cinematographe, 1022 Cinemiracle, 1042 Cineon digital film system, 1042 Cineorama, 1031 Cinerama, 1031, 1042 Cinex strip, 1042 Circ function, 542 Circaram, 1042 Circles of confusion, 1347 Circular pistons, 8, 1424–1426 Circular polarization, 231–232 Circularity, 347–348, 627 Clapsticks, 1042 Clarity, 71, 75 Classical electron radius, 250 Clausius–Mossotti relationship, 229 Claw, 1042 Clay animation, 1042 Climatic change, geologic imaging and, 655–656 Close up, 1042 Closed captioning, 1389 Closed interval, 434, 435 Closing, morphological, 430, 436–438, 585 Cloud classification, 761–764
Cloud to Ground Lightning Surveillance System (CGLSS), 890–904, 940–941 Clustering, in feature recognition and object classification, 366–370 Clutter, radar vs., 470–472 CMOS image sensors, 1139–1140, 1199–1200 Coated lens, 1042 Coaterless film, 832 Coatings, 528 Coded OFDM (COFDM), 1389 Codewords, 1388 Coding, retinex, 75–83 Codoping, secondary ion mass spectroscopy (SIMS) analysis of, 489 Coefficient of thermal expansion (CTE), 382, 384, 387 Coherent image formation, 242, 258, 504, 507, 874, 1086, 1098–1100 Coherent PSF, 1087–1088 Coherent transfer functions, 1098 Coils, deflection yoke, 41–42 Cold field emitter (CFE), in charged particle optics, 89 Cold pressure transfer, in electrophotography, 323 Collimation, 247, 255, 509, 1042 Collinear sources, in-phase signals, 2–3 Colonoscopes, 332 Color analyzer, 1042 Color appearance models, 537–538 Color balance, 114–116, 118, 1042 Color blindness, 522–523 Color burst reference, television, 1367 Color characterization, in field emission displays (FED), 380 Color circle in, 109–110 Color codes, 705–706, 1116 Color compensation in, 116–117 Color constancy, in human vision, 520–521 Color coordinate systems, 105–114, 531–536, 619, 537, 630, 641 Color correction, 114–116, 1042 Color difference signals, 106–107, 536–537, 1367 Color duplicate negative, 1035, 1043 Color electrophotography, 325–329 Color film, 1023, 1356–1357 Color image calibration in, 116–117 Color image fidelity assessor (CIFA) model, 612 Color image processing, 100–122, 325–329 Color imaging, 51, 56, 75, 219, 524, 641 in biochemical research, 705–706
cathode ray tube (CRT), 35, 46, 47 in endoscopy, 335–336 in forensic and criminology research, 714 geologic imaging and, 654 in infrared imaging, 809 instant photography and, 833–842 liquid crystal displays, 967–968 quality metrics and, 598–616 in search and retrieval systems, 618–622 in ultrasonography, 1431–1432 Color internegative, 1043 Color lookup tables (CLUTs), 575 Color master, 1034 Color matching, 102, 104–105, 531–534 Color negative, 1043 Color photography, 122–145, 1208–22 Color positive, 1043 Color print film, 1043 Color purity, 388 Color reproduction, in dye transfer printing, 194 Color reversal film, 1043 Color reversal intermediate film, 1043 Color saturation, 1043 Color separation negative, 1043 Color shifted dye developers, 835 Color space, 512, 521, 535–536, 619, 641 Color television, 1365–67 Color temperature, 103, 525 Color ultrasound, 1431–1432 Color vision, 512–514, 518–521, 551–552, 560–561, 564, 567, 747, 1328 deficiencies of, 522–523 Coloray, 377 Colored masking couplers for, 136–137 Colorimetry, 103–105, 512, 523–524 Coma, 92–94, 1084, 1089, 1119 Comb filters, television, 1366 Combined feature, in search and retrieval systems, 630–631 Combined images, image processing and, 589–591 Combined negative, 1043 Commission Internationale de l’Eclariage (CIE), 102, 618 Common intermediate format (CIF), 156–157 Communication theory, 49, 75–83, 161, 168–171 Compensating eyepieces, microscopy and, 1115, 1120 Compensation, color, 116–117
Complementary color, 1043 Complex cells, 567–568 Component analog video (CAV), 1373–74 Component analysis for dimensionality reduction, 363–364 Component digital television 1374–75 Component TV systems, 149–150 Component video standards, 1380–1382 Composite digital television, 148–149, 1377–1380 Composite materials, in scanning acoustic microscopy (SAM) analysis for, 1228 Composite print, 1043 Composite video, 1043 Compound, 1043 Compressed video, 1385–1386 Compression, 151, 156–157, 519 advantages of, 151 audio, 1388 bit rate, vs. uncompressed, 152 CD ROM, 151, 156–157 common intermediate format (CIF) in, 156–157 comparison of techniques in, 155 complexity/cost of, 156 digital compression standards (DCS) in, 156 digital watermarking and, 150–157 discrete cosine transform (DCT) in, 154 displaced frame difference (DFD) in, 155 entropy coding in, 154–155 error handling in, 156 in forensic and criminology research, 740, 741 high definition TV (HDTV), 151, 153, 157 human vision and color vision, 521 interactivity and, 152 MPEG and, 156–157 multiple encoding and, 152–153 noise in, 152 in overhead surveillance systems, 786 packet communications and, 151, 152 packetization delay and, 152 perceptual factors and, 155 perceptual redundancy and, 151 predictive coding (DPCM) in, 153, 155
quality of, 151 quantization in, 154 redundancy in, 155 requirements of, 151–153 resolution vs., 151 robustness of, 152 sampling in, 153 scalability of, 153 in search and retrieval systems, 631 standard definition TV (SDTV), 157 standards for, 156–157 statistical redundancy and, 150 subband/wavelet coding in, 154 symmetry of, 152 temporal processing in, 155 transform coding in, 153–154 transformation in, 153 vector quantization in, 154 video, 1385–86 Videophone standard for, 156–157 wavelet transforms in, 154, 1449–1450 Compton scattering, 211, 250, 256–258, 260, 1323–1324 Compton, Arthur, 211 Compur shutters, 1351 Computed tomography (CT), 219, 1057–1071, 1404 image formation in, 571–572 in magnetic resonance imaging (MRI) vs., 983 in medical imaging, 743 single photon emission computed tomography (SPECT), 1310–1327 Computer modeling, gravity imaging and, 452–453 Computer simulations of human vision, 50 Computers in biochemical research, 695–696 holography in, 510–511 in magnetic resonance imaging (MRI), 999–1000 meteor/in meteorological research, 769–771 in search and retrieval systems, 616–637 X-ray fluorescence imaging and, 1477–1478 Concentration density, flow imaging and, 408–411 Concentration measurements, planar laser induced fluorescence (PLIF) in, 863–864 Condensers, microscope, 1112–1124 Conduction electrons, 230
Conductivity, 220–231 developer, in electrophotography, 315–317 photoconductors and, 1171–1172 Cone coordinates, 176–178 Cone opsins, 517, 561 Cones, of human eye, 122, 513–517, 551–554, 558, 560–563, 746–747 Conform, 1043 Conjugate images, holography in, 505–506 Conjunctive granulometry, 441 Constant deltaC or closed loop mode, SCM, 21 Constant deltaV or open loop mode, SCM, 20 Constant luminance, television, 1367 Constant separation imaging, SECM, 1254–1255 Constellation X Observatory, 693, 1509 Constructive interference, 241, 243, 245, 507 Contact heat transfer, in electrophotography, 324–325 Contact print, 1043 Content based indexing, 617–618 Continuity, 1032–1033, 1043 Continuous contact printer, 1043 Continuous line sources, 6–7 Continuous planar sources, 7–8 Continuous wave (CW), 288, 293, 391 Continuous wavelet transform (CWT), 1444–1446 Contours, 615, 644–645 Contrast, 81, 117, 270, 605, 606, 611, 612, 623, 1094 in cathode ray tube (CRT), 34–35, 183 in field emission displays (FED), 387–388 in forensic and criminology research, 723–724 human vision and, 521, 558–560 liquid crystal displays, 967 in magnetic resonance imaging (MRI), 983, 988–991 in medical imaging, 752–755 microscopy and, 1110, 1113–1124, 1127, 1132–1134 in motion pictures, 1043 Rose model and medical imaging, 753–754 in scanning acoustic microscopy (SAM), 1229, 1235–1244 silver halide, 1261–1262 in transmission electron microscopes (TEM), 269
in ultrasonography, 1418, 1423–1424, 1432–33 Contrast agents, in ultrasonography, 1418 Contrast mechanism, scanning capacitance microscope (SCM), 17–18 Contrast resolution, in ultrasonography, 1423–1424 Contrast sensitivity, 167, 521–522, 558–560, 605 Contrast transfer function (CTF), 401 Control strip, 1043 Conventional fixed beam transmission electron microscopy (CTEM), 277 Convergence, 34, 1328, 1330 Convergent beam diffraction, 268 Convex, 438 Convexity, 438 Convolution, 594–597, 1085–1092, 1105 Cooling, of photodetectors, 1188–1190 Coordinate registration, radar, 1144, 1147–1148 Copal shutters, 1351 Copiers, 574 electrophotographic/xerographic process in, 299 photoconductors, 1174–1183 quality metrics and, 598–616 Copper, in gravure printing, 457–458, 463 Copper phthalocyanines, 301 Copy control, digital watermarking and, 159 Copycolor CCN, 839, 855 Core, film, 1043 Corey Pauling Koltun (CPK) models, 694 Cornea, 512, 539, 541, 547 Corona, in electrophotography, 302–303 CORONA project, 777, 780 Corotron charging, in electrophotography, 302–303 Corpuscles, in light theory, 211 Correction filter, 1043 Corrections, optical, microscopy and, 1115 Correlated color temperature (CCT), 525 Correlation, image processing and, 594–597 Correspondence principle, 215 Corrosion science, scanning electrochemical microscopy (SECM) in, 1255 Coulombic aging, 380
INDEX
Couplers, coupling, 1043 in color photography, 129, 131–137 in photographic color display technology, 1216 scanning capacitance microscope (SCM) vs., 20 in scanning acoustic microscopy (SAM), 1230 Coupling displacement, in instant photography dyes, 839 Cover glass, microscope, 1115–1116 Coverslip, microscope, 1115–1116, 1118 Crane, 1043 Credits, 1043 Criminology (See Forensics and criminology) Critical angle, 238 Cronar films, 1023 Cropping, 1043 Cross talk, in three-dimensional imaging, 1330, 1333 Cross viewing, in three-dimensional imaging, 1336 Cryptography, digital watermarking and, 161–164 Crystal growth and structure of silver halide, 125–126, 1262–1268 Crystal spectrometers, 244 Crystal structure, Bragg reflection and, 244 Crystallography, 282–285, 696–699 Cube root systems in, 107–108 Cubic splines, in tomography, 1405 Current density, in charged particle optics, 87, 93 Curvature of field, 92, 93, 1085, 1118–1119 Cut, cutting, 1043 Cyan, 1043 Cyan couplers, color photography, 133–135 Cycles of wave, 213 Cyclic color copying/printing, 328 Cycolor, 827 Cylinders, in gravure printing, 457–463 Cylindrical color spaces in, 109–110 Cylindrical coordinates in, 110–111 Czochralski vertical puller, 1197
D D log E, 1043 D log H curve, 1043 Daguerre, Louis, 1345 Daguerreotype, 1259–1309, 1345 Dailies, 1033–1034, 1044
Damping force, 226, 231 Dancer roll, in gravure printing, 459–460 Dark adaptation, 519 Dark current, 396, 1188–1190 Dark discharge, 311–312 Dark field image, 269 in scanning transmission electron microscopes (STEM), 277 in transmission electron microscopes (TEM), 264 Dark fringes, 243, 269–270 Dark noise, 785–786 Dark signal, 397 Darkfield microscopy, 1127–1128 Darkness, overall, 604 Darkroom techniques, in forensic and criminology research, 724–725 Darwin–Howie–Whelan equations, 282, 284 Data acquisition in infrared imaging, 807 in scanning acoustic microscopy (SAM), 1230 Data rate, 50, 63–64, 67, 81, 84 Data reduction processes, in feature recognition, 353–358 Data transmission, over TV signals, 1389 Databases in biochemical research, 699 in search and retrieval systems, 617–637 Dating, EPR imaging for, 296–297 Daylight, 524, 525, 1044 DC restoration, cathode ray tube (CRT), 180 De Broglie, Louis, 261 De-excitation of atoms, 253 De Morgan’s law, 431, 435 Dead layer, 381 Decay, 254, 260, 980–981 Decibels, 1044 Decision theory, 161, 359–363 Decomposition, in search and retrieval systems, 627 Deconvolution, 595–596 Defect analysis and detection electron paramagnetic resonance (EPR) imaging for, 295–296 feature recognition and object classification in, 355 microscopy and, 1116–1117 scanning capacitance microscope (SCM) for, 22 in scanning acoustic microscopy (SAM) analysis for, 1228 silver halide, 1268–1269 Defense Meteorological Satellite Program (DMSP), 890–904, 929
Definition, 1044 Deflection angle, cathode ray tube (CRT), 35–36 Deflection yoke, CRT, 31, 41–43, 46, 48 Deflectometry, flow imaging and, 412 Defocus, 272, 555, 1088 Deinterlacing, in forensic and criminology research, 725–726 Delphax Systems, 301, 322 Delrama, 1044 Delta function, 1103–1104 Delta rays, particle detector imaging and, 1156 Denisyuk, N., 504 Denoising, wavelet transforms in, 1448 Densitometer, 1044 Density film, 1357 gravity imaging and, 449–450 human vision and, 513, 598–616 in motion pictures, 1044 Density gradients, flow imaging and, 405 Depth cues, in three-dimensional imaging, 1327–1328 Depth of field (DOF), 403, 1044, 1124, 1347 Depth of focus, 403, 1044, 1124 Derived features, 357–358 Desertification, geologic imaging and, 655–656 Designer, 1044 Desktop publishing, 300 Destructive interference, 241 Detection theory, digital watermarking and, 161 Detectivity, photodetector, 1188 Detector arrays, photoconductive, 1203–1205 Detectors background limited infrared photodetector (BLIP), 1189 charge mode detectors, 1191–1193 diamond, 1168 fabrication and performance of, 1194–1198 in infrared imaging, 805, 806 lidar, 873 in magnetospheric imaging, 1007 in neutron imaging, 1058–1062 in overhead surveillance systems, 785, 790–793 particle detector imaging, 1154–1169 photoconductor, 1169–1183
photodetectors, 1183–1208 photoelectric, 1169 pixel, 1167–1168 in radiographic imaging, 1067 scintillator, 1062–1064 semiconductor, 1064–1065, 1163–1165, 1168 silicon drift, 1167 silver halide, 1259–1309 strip type, 1165–1167 in ultrasonography, 1418–1419 X-ray fluorescence imaging and, 1477 Deuteranopia, 522–523 Developers and development processes, 1356–1357 color photography, 129–131 in electrophotography, 300, 312–322 in forensic and criminology research, 724–725 holography in, 509 instant photography and, 827, 830, 834–842 in motion pictures, 1034, 1044 silver halide, 1299–1302 Development inhibitor releasing (DIR) couplers, 129, 137–139 Diagnostic imaging, art conservation and analysis using, 665–680 Dialogue, 1044 Diamond detectors, 1168 Diaphragm, 1044, 1080 Dichroic coatings, 1044 Dichroism, 233 Dicorotron charging, 303 Diderot, Denis, 455 Dielectric constants, 225 Dielectrics, speed of light in, 224–225 Difference of Gaussian (DOG) function, 80 Difference threshold, 612 Differential absorption lidar (DIAL), 869, 874, 882 Differential interference contrast (DIC), 1106, 1116, 1134–1135 Differential pulse code modulation (DPCM), 67, 786 Differential scattering, 250, 410 Diffraction, 3, 58, 246–253, 278–286, 1073, 1082, 1083, 1087 in charged particle optics, 94 convergent beam, 268 flow imaging and, 403 image formation in, 571 microscopy and, 1107–1108, 1110, 1126, 1128 in motion pictures, 1044 in overhead surveillance systems, 792–794
in transmission electron microscopes (TEM), 266–274 in ultrasonography, 1424–1428 Diffraction contrast imaging, TEM, 268–270 Diffraction limited devices, 249 Diffraction limited PSF, 1087 Diffraction theory, 3 Diffractometers, 244 Diffuse reflection, 234 Diffusers, 509, 528 Diffusion, 404, 525, 528, 1044 Diffusion coefficients, electron paramagnetic resonance (EPR), 294 Diffusion etch, gravure printing, 460 Diffusion transfer process, 456 Diffusion transfer reversal (DTR), 827, 833 Diffusivity, 404 Digital cameras, 854–855 Digital cinema, 1039–1040 Digital compression standards (DCS), 156 Digital effects, 1044 Digital imaging, 49–50, 516 art conservation and analysis using, 662–665, 680 CMOS sensors and, 1139–1140 in forensic and criminology research, 722–723 instant photography and, 827, 854–855 in medical imaging, 754 microscopy and, 1106 in overhead surveillance systems, 786 in photographic color display technology, 1216–1217 photography, 1358 in photomicrography, 1137–1140 quality metrics and, 602–603 in search and retrieval systems, 616–637 television, 1374–1380 video, 1374–1380 wavelet transforms in, 1447–1450 Digital intermediate, in motion pictures, 1035 Digital Library Initiative (DLI), 616 Digital light processor (DLP), 183, 185 Digital photography, 141–142, 827 Digital photomicrography, 1138–1140 Digital processing Cineon digital film system, 1042 in forensic and criminology research, 725
in motion pictures, 1039–1040, 1044 in ultrasonography, 1429–1430 Digital rights management (DRM), 159 Digital sound, 1031, 1033, 1037, 1044 Digital television, 1374–1380 Digital to analog conversion (DAC), 721 Digital versatile disk (DVD), 159 Digital video (See also Compression), 146–157 Digital Video Broadcast (DVB), 1392 Digital watermarking, 158–172 Digitizers, in forensic and criminology research, 709, 722–723 Dilation, 430, 433, 584–589, 1446 Dimension Technologies Inc. (DTI), 1338–1339 Dimensionality reduction, in feature recognition, 363–364 Dimmer, 1044 Dipole moment matrix element, 254 Dirac functions, 57 Direct image holography, 505 Direct mapping, in RF magnetic field mapping, 1125–1126 Direct thermal process, 196, 853 Direct transfer imaging, gravure printing, 461 Direction finding (DF) lightning locators, 890, 906–907 Directional response characteristics, 1 Directivity patterns, 1 Director, 1044 Dirichlet conditions, 1103 Disaster assessment, geologic imaging and, 651–655 Discharge cycle, 310–312, 1176 Discharge lamps, 222 Discharged area development (DAD), 317 Discontinuities, SAM, 1243–1244 Discounting of illuminant, 520–521, 558, 561–562 DISCOVERER mission, 777 Discrete cosine transform (DCT) compression, 154 digital watermarking and, 171 in search and retrieval systems, 631–632 video, 1387–1388 Discrete element counters, 1060 Discrete Fourier transform (DFT), 57, 61, 67–68, 80, 263, 1405, 1448 Discrete wavelet transform (DWT), 1446–1447
Discrimination, 607 Disjunctive granulometry, 441 Dislocation contrast, TEM, 270 Dispersion, 225–229, 234–235, 466, 1081 Dispersive power, 234 Displaced frame difference (DFD), 155 Displacement current, 212 Displays, 172–199 autostereoscopic, 1336–1341 characterization of, 172–199 field emission display (FED) panels, 374–389 flat panel display (FPD), 374 holography in, 510 liquid crystal (LCD), 955–969 in medical imaging, 754 photographic color technology, 1208–1222 secondary ion mass spectroscopy (SIMS) in, 482–484 in three-dimensional imaging, 1331–1335 in ultrasonography, 1413–1415 Dissolves, 1034, 1044 Distortion, 82, 92–94, 398, 1085, 1106 Distributors, 1044 DNA (See Genetic research) Doctor blade, in gravure printing, 454–455, 458 Documentation, in forensic and criminology research, 716–717 Dolby Digital Sound, 1033, 1044 Dollies, 1030, 1044 Dolph–Chebyshev shading, 3 Dopants and doping in field emission displays (FED), 384 germanium photoconductors, 1204–1205 in photographic color display technology, 1214 photoconductors and, 1171 secondary ion mass spectroscopy (SIMS) in analysis, 487–489 silver halide, 1293–1294 Doppler lidar, 870 Doppler radar, 223, 758, 764, 766, 767, 768, 772, 1142, 1458–1468 Doppler shift, 764, 772 astronomy science and, 685–686 flow imaging and, 415–416 lidar and, 879–881 in magnetospheric imaging, 1017 planar laser induced fluorescence (PLIF) in, 863 radar and over the horizon (OTH) radar, 1142, 1145
in ultrasonography, 1418, 1430–1432 Doppler ultrasound, 1430–1432 Dot matrix printers, 300 Dots per inch (DPI), 602 Double exposures, silver halide, 1288 Double frame, 1044 Double refraction, 1131–1132 Double system recording, 1033, 1044 Doublet, achromatic, 235 Downward continuation, gravity imaging and, 452 DPACK calibration, 27 DPCM, 83 Drag force, 226 DRAM, scanning capacitance microscope (SCM), 22 Drift correction, gravity imaging and, 446 Drift tube scintillation tracking, 1161 Drive mechanisms, display, 382–383 printer, 193 Driven equilibrium, in magnetic resonance imaging (MRI), 992 Driving force, 226 Dry developers, in electrophotography, 300 Dry process printers, 195 Dryers, gravure printing, 459 Dual color polymer light emitting diodes (PLED), 820–822 Dual mapping, 431 Dubbing, 1044 Dupe, dupe negative, 1044 Duplexers, radar, 1452 DuraLife paper, 1211 Dwell coherent integration time, radar and over the horizon (OTH) radar, 1147 Dwells, radar, 1146 Dyadic set, 1446 Dye bleach, 1217 Dye desensitization, 1298 Dye diffusion thermal transfer (D2T2), 853 Dye lasers, 885 Dye sublimation printers, 189–190, 194–195 Dye transfer printing, 188–197, 827 Dyes, 1356 in color photography, 123–125, 131–133, 140 in dye transfer printing, 188–197 in instant photography, 827, 834–842 in motion pictures, 1045 silver halide, 1295–1296 Dynamic astigmatism, electron gun, 40–41
Dynamic spatial range (DSR), 404–405, 415
E E field Change Sensor Array (EDOT), 890–904 Earth Observing System (EOS), 659, 772 Earth Probe, 660 Earth Resources Technology Satellite (ERTS), 778–779 Earthquake imaging, 453 Eastman Color film, 1024 Eastman Kodak, 498 Eastman, George, 1022 Echo, in magnetic resonance imaging (MRI), 981–983 Echo planar imaging (EPI), 989, 992 Echo signal processing, in ultrasonography, 1419–1420 Edge detection, 517 Edge enhancement, 580–583 Edge finding, 582 Edge following, 642 Edge histograms, 626 Edge numbering, edge codes, film, 1026–1027, 1045 Edge sharpness, 1357 Edge spread function (ESF), 751–752, 1091–1092 Edgerton, Harold, 774 Edison, Thomas, 1022 Edit decision list (EDL), 1035, 1045 Editing, motion pictures, 1034–1035, 1045 Effective/equivalent focal length (EFL), 1078 EI number, film, 1023 Eigenfunctions, eigenstates, 285–286 8 mm film, 1025 Einstein Observatory Telescope, 1507 Einstein, Albert, 211 Einstein’s coefficient of spontaneous emission, 253 Einstein’s Theory of Special Relativity, 228 Ektapro process, 498 Elastic scattering, 249, 338 Elastic theory, liquid crystal displays, 959 Elasticity imaging, in ultrasonography, 1433–1434 ELDORA radar, 1457, 1471 Electric dipole radiation, 218, 220 Electric discharge sources, 222–223 Electric field imaging, X-ray fluorescence imaging and, 1489–1494 Electric Field Measurement System (EFMS), 890–904, 908 Electric permittivity, 212
Electric polarization, 227 Electrical conductivity, 230 Electrical discharge, 910–911 Electrical fields, magnetospheric imaging, 1002–1021 Electrical Storm Identification Device (ESID), 907, 922–924 Electro–optic effect, 233 Electro optics, terahertz electric field imaging and, 1394–1396 Electrocardiogram (ECG), 198 Electrochemical microscopy (SECM), 1248–1259 Electroencephalogram (EEG), 198–210, 744 Electroluminescent display, 817–827 Electromagnetic pulse (EMP), 909, 941 Electromagnetic radiation, 210–261, 682, 803, 1072, 1393 Electromagnetic spectrum, 218–220, 1072 Electromechanical engraving, 461–462 Electromyography, force imaging and, 424 Electron beam, cathode ray tube (CRT), 32–34 Electron beam gravure (EBG), 462 Electron beam induced current (EBIC), SEM, 276 Electron channeling, 285–286 Electron gun, 31, 39–45, 48, 87–88, 173 Electron magnetic resonance (EMR), 287 Electron microscopes, 87, 261–287, 573, 590, 594 Electron paramagnetic resonance (EPR) imaging, 287–299, 1223–1227 Electron positron annihilation, 220 Electron radius, classical, 250 Electron sources, in charged particle optics, 87–91 Electronic endoscopes, 334 Electronic flash, 1348–1349 Electrophotography, 299–331, 574, 598–616, 1174–1183 Electroplating, gravure printing, 457–458 Electrosensitive transfer printers, 195 Electrostatic energy analyzers (ESA), 482 Electrostatic image tube cameras, 498 Electrostatic transfer, 323 Electrostatics and lightning locators, 911–912 Elementary bit streams, 1382
Elliptical polarization, 232 Emission, 223, 253–256, 259, 379, 383–384, 387, 804 Emission computed tomography (ECT), 743 Emission electron microscope (EMM), 479 Emission ion microscope (EIM), 478 Emissivity, 804, 1072 Emittance, 804 in charged particle optics, electron gun, 88 in infrared imaging, 807–810, 813–814 Emitters, in charged particle optics, 89–91 Emmert’s law, 1330 Empty magnification, microscopy and, 1121 Emulsion speed, 1045 Emulsions, 1023, 1039, 1045, 1208–1222, 1268 Encircled energy, 1090 Encoding, 50, 61–62, 152–153, 1045 Encryption (See also Steganography), 160 Endogenous image contrast, MRI, 988 Endoscopy, 331–342 Energetic neutral atom (ENA) imaging, 1003–1004, 1006–1016 Energy and momentum transport by, 213–214 Energy density, in transmission electron microscopes (TEM), 213 Energy exploration, geologic imaging and, 650 Energy flux, astronomy science and, 688–690 Energy levels and transitions in, 215 Engraving, 454–462 Enhanced definition TV (EDTV), 1391–1392 Enhancement of image in color image processing, 117–119 feature recognition and object classification in, 351–353 in forensic and criminology research, 722 in medical imaging, 756 in overhead surveillance systems, 787 SPECT imaging, 1316–1321 Enteroscopes, 331–332 Entrance pupil, 1080, 1354 Entropy, in tomography, 1408 Entropy coding, 83, 154–155
Environmental issues, color photography, 141 Eötvös correction, 448 Equalization, 725, 755, 1360 Equatorial anomaly, 1144 Ergonomics, force imaging and, 424 Erosion, 430, 432–434, 584–589 Error correction, television, 1390 Error handling, 156, 174–175 Estar films, 1023, 1045 ETANN neural network, 371–373 Etching in field emission displays (FED), 384 gravure printing, 456, 460–461 Ethylenediaminodisuccinic acid, 141 Euclidean distance functions, 624, 646 Euclidean granulometries, 439–442 Euclidean mapping, 586–587 Euclidean properties, 439 Euclidean set theory, 430 Euler’s formula, 226 European Broadcasting Union (EBU), 1374 European Radar Satellite, 649 European Spallation Source (ESS), 1057 Evanescent waves, 238 Evaporation, 383 Event frequency, high-speed photography and, 493 Evoked brain activity, evoked potential (EEG), 199, 201–202 Ewald diffraction, 267 Ewald’s sphere, 267, 279, 280 Exchange, 1045 Excimer lasers, 391 Excitation, in magnetic resonance imaging (MRI), 979–980 Excitation error, 268 Excitation of atoms, 253 Existing light, 1045 Exit pupil, 1080 Expanders, holography in, 509 Expectation maximization (EM), in tomography, 1408–1409 Exposure, 1045, 1352 autoexposure, 1355–1356 in electrophotography, 304–310 electrophotography, 1176–1177 in forensic and criminology research, 723–724 gravure printing, 460–461 high-speed photography and, 492 in motion pictures, 1043–1045 silver halide, 1288 Exposure latitude, 1045 Extended definition TV (EDTV), 1382 External reflection, 236
Extinction contours, 282 Extinction distance, 280 Extraction, in search and retrieval systems, 622, 625 Extraction of features, 353–358 Extraneous marks, 604, 605 Extreme ultraviolet imaging (EUV), 1005–1006 Eye (See Human vision) Eye tracker systems, 522 Eyepieces, microscope, 1108–1109, 1114–1124
F f-number, 1045 f-stops, 1352, 1354 Fabry–Perot devices, 246 Fade, 1034, 1045 Fakespace CAVE three-dimensional imaging, 1335 Fakespace PUSH three-dimensional imaging, 1334 Far distance sharp, 1347 Far field, 220, 911–912, 1425–1426 Far field approximation, 251 Far field diffraction, 246 Far ultraviolet imaging of proton/electron auroras, 1016–1020 Far Ultraviolet Spectrographic Imager (FUV-SI), 1017–1020 Faraday effect, 975 Faraday, Michael, 1022 Faraday’s laws, 211, 212 Fast, 1045 Fast Fourier transform (FFT), 1095, 1151, 1405 Fast On Orbit Recording of Transient Events (FORTE), 890–904, 929 Fast spin echo, MRI, 992 FASTC2D calibration, 27 Fawcett, Samuel, 456 Feature extraction, in feature recognition, 353–358 Feature Index Based Similar Shape Retrieval (FIBSSR), 626 Feature measurement (See also Measurement), 343–350 Feature recognition and object classification, 350–374 Fermat’s principle of least time, 234 Fermi energy levels, 1170–1171 Ferric ethylenediaminetetraacetic acid (ferric EDTA), 138, 141 Ferric propylenediaminetetraacetic acid (ferric PDTA), 141 Ferricyanide, 138 Ferroelectric liquid crystal displays (FLC), 964–965 Feynman diagrams, 259
Fiber optics, 333–334, 509, 1063–1064 Fidelity, 50, 71–74, 81, 84, 598, 611, 612, 615 Field angle, 1080 Field curvature, 1085 Field effect transistors (FET), 1173, 1199 Field emission, in charged particle optics, 89 Field emission display (FED) panels, 374–389 Field emission guns (FEG), 277 Field emitter arrays (FEA), 89, 375, 376 Field number (FN), 1121 Field of view (FOV), 60 lidar and, 870 in magnetic resonance imaging (MRI), 986, 996 microscopy and, 1121 in motion pictures, 1045 scanning capacitance microscope (SCM), 19 in three-dimensional imaging, 1330, 1333 Field points, 251 in three-dimensional imaging, 1330–1331, 1333 Field stop, 1080 Fields, television, 1359 Figures of merit (See also Quality metrics), 50, 62–64, 1409 Filaments, in charged particle optics, 87–88 Fill, 604 Fill light, 1045 Filling-in phenomenon, 516 Film acetate, 1039, 1040 additive color films, 847–849 Agfa, 1024 antihalation backing, 1041 art conservation and analysis using, 661–662 ASA/ISO rating for, 1023, 1041 aspect ratio, 1022 backing, 1041 balance stripe, 1041 base, 1041 black-and-white, 1023, 1041, 1356 camera type, 1026–1027 cellulose triacetate, 1042 coaterless, in instant photography, 832 color photography, 124, 139–142 color reproduction in, 139 color reversal intermediate, 1043 color reversal, 1043 color, 1023, 1043, 1356–1357 containers for, 1039
core for, 1043 density of, 1357 dichroic coatings, 1044 dye stability in, 140 Eastman Color, 1024 edge numbering, edge codes in, 1026–1027 EI number for, 1023 8 mm, 1025 emulsion in, 1023, 1039 Fuji instant films, 849–851 Fuji, 1024 Fujix Pictrography 1000, 851–852 granularity in, 140 gravure printing, 461 high-speed photography and, 498 holography in, 509 image formation in, 571 image structure and, 140–141 imbibition (IB) system for, 1024 instant photography and, 827–829 integral, for instant photography, 828 intermediate, 1026, 1047 Kodachrome, 1024 Kodak instant films, 849 Kodak, 1024 laboratory, 1048 length of, in motion pictures, 1025 magazines of, 1026 in medical imaging, 754 modulation transfer function (MTF) in, 140–141 in motion pictures, 1022–1023, 1045 negative and reversal, 1023 negative, 1049 negative, intermediate, and reversal, 1023–1024 in neutron imaging, 1065 nitrate, 1023, 1039, 1049 orthochromatic, 1050 in overhead surveillance systems, 790 Panchromatic, 1050 peel apart, for instant photography, 828 perforations in, in motion pictures, 1025–1026, 1050 photographic color display technology, 1208–1222 photomicrography, 1137–1138 Pictrography 3000/4000, 852–853 Pictrostat 300, 852–853 Pictrostat Digital 400, 852–853 pitch in, 1026 Pocket Camera instant films, 847 Polachrome, 848 Polacolor, 843–844 Polavision, 848
polyester, 1023, 1039, 1050 print type, 1024 projection type, 1023 quality metrics and, 598–616 rem jet backing, 1051 reversal type, 139, 1052 root mean square (rms) granularity in, 140 safety acetate, 1023, 1024, 1052 sensitivity or speed of, 124, 139 seventy/70 mm, 1025 sharpness and, 140 silver halide and, 140 sixteen/16 mm, 1024, 1025, 1052 sixty/65 mm film, 1024 Spectra instant film, 847 speed of, 1023 still photography, 1344–1358 Super xxx, 1025 SX70 instant film, 844–847 Technicolor, 1024 thirty/35 mm, 1022, 1024, 1054 Time Zero, 846–847 Type 500/600 instant films, 847 vesicular, 662 width of, in motion pictures, 1024–1025 Film base, 1045 Film can, 1045 Film cement, 1045 Film gate, 1028, 1036, 1045 Film gauge, 1045 Film identification code, 1045 Film perforation, 1045 Film to tape transfer, 1045 Filtered backprojection (FBP), 1405 Filtered Rayleigh scattering (FRS), 411, 412, 415 Filters and filtering, 55, 56, 59, 65, 68, 69, 70, 71, 73, 76, 80, 100, 262, 437, 1092–1100 alternating sequential, 437 in color image processing, 118–119 comb, 1366 digital watermarking and, 150, 167 downward continuation, 452 extreme ultraviolet imaging (EUV), 1006 flow imaging and, 394 in forensic and criminology research, 717 Gabor, 623 gravity imaging and, 450–452 haze, 1047 holography in, 509 human vision and, 548, 558–560, 566–567 image processing and, 578, 579, 589, 593–597
incoherent spatial, 1096 lidar and, 872 light, 1048 linear, 756 liquid crystal displays, 968 logical structural, 442 in medical imaging, 755–756 microscopy and, 1113–1114 in motion pictures, 1043, 1045 multispectral image processing, 101 neutral density, 1049 open-close, 437 polarizing, 1050 quadrature mirror filter (QMF), 622 quality metrics and, 611 Ram–Lak, 1405 in search and retrieval systems, 623 spatial, 1100 SPECT imaging, 1322 strike, 452 upward continuation in, 451 vertical derivatives, 452 wavelet transforms in, 1447–1450 Final cut, 1046 Fine grain, 1046 Fine structure, 218, 254 Fingerprinting, digital watermarking and, 159 Finite ray tracing, 1083 First hop sky waves, 912 First order radiative processes, 253 First print, 1046 Fisher’s discriminant, 364–366 Fixers, 1345 Fixing bath, 1046 Fixing process, 138–139, 324–325 Flaking, 1046 Flame imaging, 409 Flange, 1046 Flare, in cathode ray tube (CRT), 182–183 Flash photography, 104, 492, 1348–1349 Flashing, 1046 Flat, 458–459, 1046 Flat-bed editing tables, 1034 Flat panel display (FPD), 374 Flat, motion pictures, 1031 Floating wire method, in magnetic field imaging, 975 Flooding disasters, geologic imaging and, 651–655 Flow imaging, 390–419, 501, 989–991 Fluid dynamics flow imaging and, 390
gravity imaging and, 453 in magnetic resonance imaging (MRI), 991 Fluorescence and fluorescence imaging, 210, 223, 255, 259, 529 absolute fluorescence (ABF), 863 art conservation and analysis using, 661, 676–677 flow imaging and, 397 laser induced fluorescence (LIF), 408, 861–869 phosphor thermography, 864–867 planar laser induced (PLIF), 391, 408–409, 411–416, 861–864 pressure sensitive paint, 867–868 thermally assisted fluorescence (THAF), 863 X-ray fluorescence imaging, 1475–1495 Fluorescence microscopy, 1106, 1135–1137 Fluorescent lifetime, 254 Fluorescent sources, 524, 525, 529 Fluorescent yield, 255 Fluorochrome stains, microscopy and, 1137 Flux density, 214 Flux measurement, 972–973 Flux, lens-collected, 1081 Flux, photon, 215 Flux, reflected, 527–529 Fluxgate magnetometer, 974–975 Fluxmeters, 971–973 Focal length, 1073, 1078, 1347, 1354–1355 Focal plane, 1046, 1352 Focal plane array (FPA), 804, 805 Focal plane shutters, 1352 Focal point, 54 Focus, 1073, 1347 autofocus, 1356 flow imaging and, 403 ground penetrating radar and, 471 human vision and, 513, 555 in infrared imaging, 809–810 microscopy and, 1122, 1124 in ultrasonography, 1427–1428 Focus variation, holographic, 272 Focused ion beam (FIB) imaging, 90–91, 93, 479 Fog, 1046 Fold degeneration, 217 Foley, 1046, 1053 Follow focus, 1046 Foot, human, force imaging and analysis of, 421 Footage, 1046 Footlambert, 1046 Force imaging, 419–430
Force process, 1046 Forecasting and lightning locators, 909 Foreground, 1046 Forensics and criminology, 709–742, 1393 Foreshortening, 1328 Forgery detection, art conservation and analysis using, 661 Format, 1046 Format conversion, video, 720–722 Formation of images (See Image formation) Forward error correction (FEC), 1390 Foundations of morphological image processing, 430–443 Four field sequence, television, 1366–1367 Fourier analysis, 1102–1106 in forensic and criminology research, 731–732 gravity imaging and, 448 Fourier descriptors, in search and retrieval systems, 625–626 Fourier series, 698, 1102–1103 Fourier transform infrared (FTIR) microscope, 667 Fourier transforms, 50–58, 77, 280, 285, 1073, 1088, 1092–1095, 1098–1099, 1102–1104, 1448 human vision and, 542 image processing and, 591–594, 596 in magnetic resonance imaging (MRI), 985, 987 in medical imaging, 751, 756 periodic functions and, 1104–1105 in tomography, 1405, 1406 in transmission electron microscopes (TEM), 263 two dimensional, 1104–1105 Fovea, 513, 515, 522, 561, 566, 746–747 Fowler–Nordheim plot, 379 Fox-Talbot, William Henry, 455, 492, 1345 Foxfet bias, 1166 Fractals, feature measurement and, 349–350 Fractional k space, in magnetic resonance imaging (MRI), 992 Frame, 1046 Frame and film gate, 1028 Frame by frame, 1046 Frame grabbers, 709, 722–723 Frame line, 1046 Frame rates, high-speed photography and cameras, 495 Frame transfer CCD, 393
Frames, television, 1359 Frames per second (FPS), 1046 Fraud detection, digital watermarking and, 159 Fraunhofer diffraction, 246, 247–249, 264, 280 Fraunhofer lines, 235 Free air correction, gravity imaging and, 447 Free electron gas, 230 Free electron lasers, 223 Free induction decay, in magnetic resonance imaging (MRI), 980–981 Free viewing, in three-dimensional imaging, 1336 Freeze frame, 1046 Frei operator, image processing and, 582 Frenkel equilibrium, silver halide, 1270 Frequency, 213, 227, 293, 559 Frequency band power mapping, 200–201 Frequency domain, image processing and, 591–595 Frequency interference, ground penetrating radar and, 470–471 Frequency modulation (FM), television, 1362 Frequency multiplexing, television, 1366–1367 Frequency response, 50, 1046 Frequency spectrum, 1103 Fresnel approximation, 285 Fresnel diffraction, 246 Fresnel equations, 235–236, 1498 Fresnel rhomb, 239 Fresnel sine integral, 1091 Fringes, 243, 1102 Frit, in field emission displays (FED), 387 Frustrated total internal reflection (FTIR), 238 Fuji, 1024 Fuji Colorcopy, 840 Fuji instant films, 849–851 Fujix Pictrography 1000, 851–852 Full frame CCD, 393 Fuming, 1345 Functional MRI (fMRI), 744 Functional parallelism, 562, 563, 565 Fundamental tristimulus values, 534 Fur brush development, in electrophotography, 312 Fusing process, in electrophotography, 324–325
Futaba field emission displays (FED), 377
G G strings, in search and retrieval systems, 628 Gabor filters, 623 Gabor, Dennis, 504 Gabriel graph (GG), 370 Gain, screen, 1046 Gallium arsenic phosphorus (GaAsP) photodiodes, 1200 Gallium arsenide (GaAs) photodiodes, 1172, 1200 Gamma, 1046 in cathode ray tube (CRT), 176, 177, 179 in liquid crystal displays (LCDs), 184 in motion pictures, 1043 silver halide and, 1261–1262 in television, 1362 Gamma radiation, 218–220, 257, 803 art conservation and analysis using, 677–680 astronomy science and, 682, 683, 688 attenuation in, 260 neutron imaging and, 1057–1058 photoconductors, 1169 SPECT imaging, 1311–1313, 1323 Ganglion cells, human vision and, 517–518, 562, 563 Gastroscopes, 331 Gate, frame and film, in camera, 1028, 1046 Gauge, 1046 Gauss’ law, 25, 211, 212 Gaussian distribution, 1157 Gaussian operator/function, 542, 581, 593, 637 Gaussian optics, 1078–1079 Gaussian rays, 1083–1084, 1085 Gelatin filter, 1046 Generalized functions, 1103 Generation, 220–224 Genetic research (See also Biochemistry and biological research), 694, 745–746 Geneva movement, 1046–1047 Geodesy, gravity imaging and, 444 Geodetic Reference System, 445 Geographic Information System (GIS), 453 Geoid, 445–446 Geological Survey, 453
Geology, 647–661 ground penetrating radar and, 464 in magnetic field imaging, 970 instant photography and, 855 magnetospheric imaging, 1002–1021 radar and over the horizon (OTH) radar, 1149 Geostationary Meteorological Satellite (GMS), 760 Geostationary Operational Environmental Satellite (GOES), 760, 778 Germanium photoconductors, 1190, 1197, 1204–1205 Gettering materials, in field emission displays (FED), 380 Ghosting in radar, 1452 in three-dimensional imaging, 1330, 1333 Giant Segmented Mirror Telescope, 693 Gibbs phenomenon, 71 Gladstone–Dale relationship, flow imaging and, 406 Global change dynamics, 655–656 Global circuit and lightning locators, 910 Global field power (GFP), EEG, 203 Global Position and Tracking System (GPATS), 890–904 Global Positioning System (GPS), 445 Global processing, in human vision, 567–568 Glossy surfaces, 528 Gobo, 1047 Godchaux, Auguste, 456 Gold sensitization, in silver halide, 1292–1293 Gradient, in dye transfer printing, 193–194 Gradient echo, MRI, 981, 992–993, 998–999 Grain boundary segmentation, SIMS analysis, 487–489 Graininess, 598–616, 1047, 1303–1304, 1357 Granularity, 75, 140, 439, 604–605 Granulometric size density (GSD), 442 Granulometries, 439–442 Graphics cards, cathode ray tube (CRT), 174–175 Grasp, microscopy and, 1108 GRASP software, 706 Grassmann’s law, 531 Graticules, microscope, 1113, 1121 Gratings, 244–246, 560, 1092, 1094, 1099, 1102
holography in, 508 human vision and, 565, 567 lobes of, 5 microscopy and, 1108–1109 X-ray telescopes, 1504–1506 Gravimeter, 446 Gravitation imaging, 444–454 Gravity, 444, 445 Gravity anomalies, gravity imaging and, 444–445 Gravure multicopy printing, 454–463 Gravure Research Institute, 459 Gray card, 1047 Gray levels, 68, 618, 622 color/in color image processing, 114–115 in medical imaging, 752 monochrome image processing and, 100 multispectral image processing and, 101 thresholding and segmentation in, 638–641 Gray scale, 589–591, 961, 1347 Gray surface, 804 Gray value, 103, 646, 1421 Gray, Henry F., 375 Graybody radiation, 222 Grazing incidence, X-ray telescopes, 1497–1499 Great Plains 1 (GP-1) lightning locators, 924–928 Green print, 1047 Greene, Richard, 376 Gridding, gravity imaging and, 450 Ground clutter, radar, 1453 Ground coupling, ground penetrating radar and, 468 Ground instantaneous field of view (GIFOV) overhead surveillance systems, 783, 791 Ground penetrating radar, 463–476 Ground reaction force (GRF), force imaging and, 419–420 Ground resolvable distance (GRD), 790 Ground sampled distance (GSD), 790, 791–792 Groundwater detection, 454, 464 Group of pictures (GOP), 1386 Group range, 1144 Group velocity, 228 Guide rails, 1047 Guide roller, 1047 Guillotine splice, 1047 Gutenberg, Johannes, 455 Gyromagnetic ratio, 217, 978
H Haar wavelet, 1446 Hadronic cascades, 1158
Halation, 1047 Half-wave antennas, 220 Half-wave plates, 233 Halftone screening, 455 Halide, 1047 Hall effect/Hall generators, in magnetic field imaging, 973–974 Halogen, 135, 222 Halos, 1135 Hamiltonians, 285 Hamming windows, 611 Handheld cameras, 1031 Hanover bars, 1371 Hard, 1047 Hard light, 1047 Harmonic imaging, 1432 Harmonics, 212, 225–226, 250, 972, 1092, 1098, 1104 Hartmann–Shack sensors, 547 Hazard assessment, geologic imaging, 651–655 Haze, 604 Haze filters, 1047 Head-end, 1047 Head recording, 1047 Head-up displays (HUD), 509 Heat transfer, in infrared imaging, 809 Heater, electron gun, 39 Heidelberg Digimaster, 300 Heisenberg uncertainty principle, 215 Helio Klischograph, 456–457, 461 Helium neon lasers, 508 Hell machine, 456, 461 Helmholtz invariant, 1080 Helmholtz–Kirchhoff formula, 1486 Helmholtz–Lagrange relationship, 481 Hermitian transforms, 987 Herschel, John, 1022, 1345 Hertz, 213, 1047 Hertz, Heinrich, 211 Heterojunction photoconductors, 1173 Hewlett-Packard Laserjet, 302 Heyl, Henry, 1022 Hi Vision television, 1391 High-definition TV (HDTV), 41, 42, 47, 147, 151, 153, 157, 1039, 1047, 1382, 1390 High-Energy Neutral Atom Imager (HENA), 1007–1010 High-energy radiation (See X-ray; Gamma radiation) High-frequency voltage, scanning capacitance microscope (SCM) vs., 20
INDEX
High-frequency waves, radar and over-the-horizon (OTH) radar, 1142 High-pass filters, image processing, 593–597 High-resolution electron microscopy (HREM), 273 High-resolution images, TEM, 270 High-resolution secondary ion mass spectroscopy, 477–491 High-resolution visible (HRV) imaging systems, 649, 655 High-speed cameras, 1047 High-speed photographic imaging, 491–504 High-voltage regulation, cathode ray tube (CRT), 180 Highlights, 1047 Hindered amine stabilizers (HAS), 296 HinesLab three-dimensional imaging, 1341 HIRES geologic imaging, 660 Histograms art conservation and analysis using, 666 in forensic and criminology research, 723 image processing and, 584 in search and retrieval systems, 619–620, 626, 629 in medical imaging, 755 thresholding and segmentation in, 637–638, 640 Hit or miss transform, 434–436 HLS coordinate system, 619 HMI lights, 1047 Hoffman Modulation Contrast, 1106, 1132–1134 Hoffman, Robert, 1106, 1133 Hold, 1047 Holographic optical elements (HOE), 509, 510 Holographic PIV (HPIV), 417 Holography, 223, 262, 504–512, 1328 flow imaging and, 417 image formation in, 571 inverse X-ray fluorescent holographic (IXFH) imaging, 1486–1489 normal X-ray fluorescent holographic (NXFH) imaging, 1484–1486 stereograms using, 1336–1337 in three-dimensional imaging, 1336–1337 in transmission electron microscopes (TEM), 272 X-ray fluorescence imaging and, 1484–1489
Homogeneity, in cathode ray tube (CRT), 181–182 Homologous points, in three-dimensional imaging, 1329 Horizontal gradients, gravity imaging, 452 Horn, Hermann, 456 Hot, 1047 HSB color coordinate system, 641 HSI color coordinate system, 641 HSI coordinate system, 112–114 HSV color coordinate system, 619, 641 Hubble telescope, 595 Hue, 103, 111, 117–119, 578, 580, 618, 1047 Human vision, 49–51, 84, 122–142, 512–570, 746–748 color vision in, 122–142 in color image processing, 101 display characterization and, 186 feature recognition and object classification in, 353–358 image processing and, 583 optical geometry of, 54 persistence of vision, 1021–1022, 1050 in three-dimensional imaging, depth cues and, 1327–1328 Humidity, 1047 Huygens’ principle, 242–243, 246 Huygenian eyepieces, 1120 Huygens, Christian, 243 Hybrid image formation, 571, 574 Hybrid ink jet printing technology (HIJP), 818–819 Hybrid scavengeless development (HSD), electrophotography, 312, 320–321 Hydrazone, 1179 Hydrology, ground penetrating radar, 464 Hydroquinone, instant photography, 834–839 Hydroxylamine, instant photography, 839 Hyperfine splitting, electron paramagnetic resonance (EPR), 288 Hyperfocal distance, 1347 Hyperspectral imaging, in overhead surveillance systems, 787 Hypo, 1046, 1047 Hyposulfite, 1345
I I frame, video, 1387 iWERKS, 1031
I-Zone camera, 847 Idempotent operators, 436–437 Identification systems, instant photography, 855–856 Idle roller, 1047 IHS coordinate system, 111–112 Infrared light, 356 IKONOS satellite, 780 Ilfochrome, in photographic color display technology, 1217 Illuminance, 1345 Illuminants, 103–104, 524–527, 558, 561, 610–611 discounting, in human vision, 520–521 Illumination, 243, 528, 558, 1072, 1101 in art conservation and analysis, 665 in endoscopy, 335–336 in holography, 504 Kohler, 1126 matched, 1099 in microscopy and, 1110–1114, 1125–1128 in monochrome image processing, 100 Rheinberg, 1128 standard illuminants and, CIE, 103–104 Illuminators, microscope, 1107, 1125–1127 Image, 1047 Image aspect ratio (See Aspect ratio) Image authentication, digital watermarking, 161 Image chain, in overhead surveillance systems, 782–800 Image combination, image processing, 589–591 Image correction, flow imaging, 397 Image dissection cameras, 498 Image enhancement (See Enhancement of image) IMAGE EUV, 1006 Image fidelity, 598, 611, 612, 615 Image formation, 571–575 in human eye, 541–544 in microscope, 1107–1114 Image gathering, 49, 54–67, 84 Image integrity, in forensic and criminology research, 740–741 Image manipulation, instant photography, 856 Image-on-image or REaD color printing, 328–329, 328 Image plates, in neutron imaging, 1065–1066
Image processing, 575–598 color (See Color image processing) in dye transfer printing, 193–194 in endoscopy, 336–338 feature recognition and object classification in, 351–353 in forensic and criminology research, 719–732 in infrared imaging, 807 instant photography and, 828, 829–830, 833–842 in magnetic resonance imaging (MRI), 987–988 in medical imaging, 754–756 monochrome, 100–101 morphological, 430–443 multispectral, 101 in overhead surveillance systems, 787 in scanning acoustic microscopy (SAM), 1231–1233 silver halide, 1299–1303 wavelet transforms in, 1448 X-ray fluorescence imaging and, 1477–1478 Image processing and pattern recognition (IPPR), 351 Image quality metrics (See Quality metrics) Image restoration, 49, 51, 60, 67–75, 84, 118–119, 167, 722–756 IMAGE satellite, 1018 Image search and retrieval (See Search and retrieval systems) IMAGE SEEK, 618, 621 Image sensors, 1199–1200 Image space, 1075 Image vectors, in tomography, 1407 Imagebase, 630 Imagery Resolution Assessment and Reporting Standards (IRARS), 795 ImageScape, 633 Imaging arrays, photodetector, 1194–1198 Imaging satellite elevation angle (ISEA), 791–792 IMAX, 1031, 1335 Imbibition (IB) system, 1024, 1047 Immersion medium, microscopy, 1116 Impedance, boundaries, non-planar radiators, 9 Impedance, acoustic, in ultrasonography, 1415 Impression rolls, gravure printing, 458–459 Improved Accuracy from Combined Technology (IMPACT) sensors, 935, 937, 941
Impulse response, in transmission electron microscopes (TEM), 265 In-phase signals, 1–3 In-plane switching (IPS), liquid crystal displays, 963–964 In the can, 1048 In vivo imaging electron paramagnetic resonance (EPR) imaging for, 297–298 endoscopy, 331–342 fluorescence microscopy, 1136–1137 terahertz electric field imaging and, 1398–1399 INCA, 1020 Incandescent sources, 222, 242, 524, 1037 Incoherent light, 232, 242, 1085 Incoherent spatial filtering, 1096 Incoherent transfer functions, 1098 Index of refraction, 225, 1075, 1079 in charged particle optics, 88 complex, 227 flow imaging and, 405, 412 human vision and, 551 microscopy and, 1109 radar, 1453 Indexed color, cathode ray tube (CRT), 176 Indexing, in search and retrieval systems, 617–618 Indium antimonide (InSb), 806–807, 1201 Indium gallium arsenide (InGaAs), 806, 1200 Indium tin oxide (ITO), 381, 819, 957 Indoaniline, 138 Indophenol, instant photography, 835 Induced dipole moment, 225 Induction coils, in magnetic field imaging, 971–972 Inelastic scattering, 249 Infinity corrected microscopes, 1106–1107 Infinity space, microscopy, 1107 Information capacity, 1082–1083 Information efficiency, 50, 64, 84 Information hiding, digital watermarking vs., 160 Information processing, holography, 510–511 Information rate, 50, 62–67, 72–74, 81, 84 Information theory, 99, 161, 168–171 Infrared imaging, 218–219, 230, 1393 art conservation and analysis using, 668–672 astronomy science and, 690, 691–693
geologic imaging and, 648, 660 in overhead surveillance systems, 789 National Imagery Interpretability Rating Scale (NIIRS), 795–800 phosphor thermography vs., 866–867 pigments and paints in, 668–672 quantum well infrared photodetector (QWIP), 1190, 1205 satellite imaging systems and, 758 Space Infrared Telescope Facility (SIRTF), 690, 691–692 Stratospheric Observatory for Infrared Astronomy (SOFIA), 692–693 Television and Infrared Observational Satellite (TIROS), 757, 777 thermography, 802–817 Initial phase, 213 Ink-jet printers hybrid systems, 818–819 instant photography and, 853 organic electroluminescent display and, 817–827 quality metrics and, 598–616 shadow masks in, 825–826 Inks in electrophotography, 321–322 in gravure printing, 454, 455, 458, 459 InP lasers, 22 InP/InGaAsP buried hetero structure laser, 22–23 Instant color photography, Polaroid Corp., 127 Instant photography, 827–859 Instantaneous field of vision (IFOV), 54, 805 Intaglio, gravure printing, 454–463 Integrity, image (See Image integrity) Intensified CCD (ICCD), 393–394, 862 Intensity, 102, 110–111, 214, 241, 610, 611, 1047 in cathode ray tube (CRT), 177, 178–179 in charged particle optics, 89 feature measurement and, 344–345 flow imaging and, 397–398 image processing and, 578, 580 in television, 147–148 Interface contrast, in transmission electron microscopes (TEM), 269
Interference, 239–246 in cathode ray tube (CRT), 38–39 ground penetrating radar and, 470–471 Interferometry, 242, 246 astronomy science and, 691 flow imaging and, 405, 412 geologic imaging and, 648 holography in, 510 lightning locators, 907, 946–947 X-ray interferometric telescopes, 1503–1504 Interlace, 146–147, 725–726, 1048, 1359 Interline transfer CCD, 393 Interlock, 1047 Intermediate atomic state, 258 Intermediate films, 1026, 1047 Intermediate sprocket, 1047 Intermittent, 1048 Intermittent action high-speed cameras, 495 Intermittent contact SCM, 21–22 Intermittent movement, 1048 Internal conversion, 255 Internal reflection, 236, 237–239 International Color Consortium (ICC) standards, 172 International Commission on Illumination (CIE), 523 International Electrotechnical Commission (IEC), 602 International Standards Organization (ISO), 139, 602, 1048 International Telecommunications Union (ITU), 102, 1362 Internegative, 1043, 1048 Interocular distance, 1329 Interplanetary magnetic fields (IMF), magnetospheric imaging, 1002–1021 Interposition, 1328 Interpositive films, 1034, 1048 Intersystem crossing, 255 Intertropical Convergence Zone (ITCZ), 655–656 Invariant class, in morphological image processing, 437 Inverse continuous wavelet transform, 1446 Inverse Fourier transform (IFT), 1095 Inverse square law, 214 Inverse X-ray fluorescent holographic (IXFH) imaging, 1486–1489 Inversion recovery, in magnetic resonance imaging (MRI), 993–994 Iodate, 1213 Iodide, 1212–1214
Ion beam-induced chemistry, in charged particle optics, 87 Ion fraction, secondary ion mass spectroscopy (SIMS), 478 Ion-induced secondary electrons (ISE), 482–483 Ion-induced secondary ions (ISI), 482–483 Ion selective electrodes (ISE), 1248–1259 IONCAD, 482 Ionic conductivity, silver halide, 1271 Ionization, 1154–1157, 1159–1162, 1173 Ionographic process, 301, 322 Ionospheric analysis, 219, 1141, 1149 Ionospheric plasma outflow, ENA imaging, 1012–1016 Ionospheric propagation, radar and over-the-horizon (OTH) radar, 1143–1145 IRE units, television, 1361 Iridium, in photographic color display technology, 1214 Iris, 112–113, 512 Irradiance, 50–55, 75, 80–83, 214, 524–525, 530, 782, 1079–1081, 1085, 1092, 1094, 1102, 1287–1288 Isostatic correction, gravity imaging, 448 Isotope decay, art conservation and analysis using, 679–680 Isaacs, John D., 1022
J Jacobian transformation, 97, 98 Jagged edges, 75 Johnson, C.L., 775 Joint motion, force imaging, 420 Josephson tunnel junctions, SQUID sensors, 9–15 JPEG, 154, 171, 519, 521, 631, 741 Jumping development, in electrophotography, 301, 312, 318–320 Just noticeable difference (JND), 747
K K, 1048 K nearest neighbor classification, 366–370 K space, 574, 987–988, 992 Kell factor, television, 1362 Kernel operations, image processing, 577–578, 580 Kerr cells, high-speed photography, 492–493 Kerr effect, 233 Ketocarboxamide, 134
Keykode number, 1048 Keystoning, 1048, 1331 Kinematical theory, 286, 278–281 Kinematics, 421, 1154–1169 Kinematoscope, 1022 Kinescope, 1048 Kinetics, force imaging, 424 Kinetograph, 1022 Kinetoscope, 1022 Kirchhoff’s laws, 759 Kirkpatrick–Baez telescopes, 1502–1503 Klein–Nishina formula, 257 Klietsch, Karl, 456 Klystrons, radar, 1452 Knife edge technique, 402–403, 501 Knock-on electrons, 1156 Kodachrome process, 127–128, 1024 Kodacolor process, 127–128 Kodak, 165–168, 829–830, 849, 1024 Kohler illumination, 1110–1114, 1126 Kohler, August, 1106, 1110–1111 Kolmogorov scale, flow imaging, 404 KONTRON, 483 Kramers–Heisenberg formula, 1476 Kronecker delta, 78 Kuwahara filter, 582
L L*a*b* coordinate system, 108–109, 630 L*u*v* coordinate system, 109, 537 Laboratory film, 1048 Laboratory image analysis, in forensic and criminology research, 732–740 Lagrange invariant, 1079–1081 Lambertian diffusers, 525 Lambertian operators, 525, 528, 782 Lambertian surfaces, 51 Laminar flow, resolution, 402–403 Lamps, in projectors, 1037 Land cameras, 827 Land, Edwin, 1331 Landau distributions, 1155–1156, 1159 Landmark-based decomposition, 627 Landsat, 353, 444, 453, 648, 778, 787 Laplacian of Gaussian (∇²G) operator, 71 Laplacian operator, 580–581, 593 Large area density variation (LADV), 604 Large format cameras, 1352–1354 Larmor frequency, 979 Larmor’s formula, 220, 221 Laser engraving systems, 462–463 Laser-induced fluorescence (LIF), 338, 408, 861–869
Laser printers, 195–196, 302, 306–310 Laser pumps, 223 Lasers, 223, 255, 1177 continuous wave (CW) lasers, 391 dye lasers, 885 in electrophotography, 306–310 engraving systems using, 462–463 excimer lasers, 391 flow imaging and, 391–394, 416 free electron, 223 gravure printing, 462–463 helium neon lasers, 508 in holography, 507–508, 507 laser-induced fluorescence imaging, 861–869 lidar, 869–889 liquid crystal displays, 958 NdYAG, 391, 416 in phosphor thermography, 866–867 Q switching in, 391–392 terahertz electric field imaging and, 1402–1403 yttrium aluminum garnet (YAG), 391 Latent image, 1047, 1048, 1276–1284 Latent impressions, in forensic and criminology research, 712–714 Lateral geniculate nucleus (LGN), 516, 518, 563, 564 Lateral hard dot imaging, gravure printing, 461 Lateral inhibition, 1093 Latitude, 1048 Latitude correction, gravity imaging, 447 Lattice, sampling, 50, 54, 57, 58, 59, 65, 71 Laue patterns, 244, 279 Launch Pad Lightning Warning System (LPLWS), 890–904, 908, 921–922 Law of reflection, 234 Law of refraction, 234 Law, H.B., 31 Layout, 1048 Lead chalcogenide photoconductors, 1200–1201 Leader, 1048 Leaf shutters, 1351 Least mean square (LMS) algorithm, 372 Least time, Fermat’s principle, 234 Legendre functions, 253 Leith, E.N., 504
Lens, 54, 58, 59, 65, 91–100, 541, 1073, 1345, 1347, 1348, 1354–1355 anamorphic, 1031, 1040 Bertrand polarization lens, 1132 charged particle optics, 86–100 coatings on, 1042 electron gun, 40 eyepiece, 1119–1122 flow imaging and, 392–394, 399 flux in, 1081 Gaussian optics in, 1078–1079 in holography, 505–506, 508–509 in human eye, 54, 512–514, 540, 547, 560, 746 in microscopy, 1106, 1107, 1117 for motion pictures, 1029, 1048 in overhead surveillance systems, 783 in projectors, 1036–1037 in scanning acoustic microscopy (SAM), 1233–1234 in secondary ion mass spectroscopy (SIMS), 480–481 in terahertz electric field imaging, 1397 for wide-screen motion pictures, 1031 zoom, 1029 Lenticular method, color photography, 127 Lenticular sheets, in three-dimensional imaging, 1337–1338 Letter press, 455 Leyden jars, 492 Lichtenberg, 299 Lidar, 223, 804, 869–889 LiF converter, 1064–1065 Lifetime, radiative, 254 Light speed of, 224–225 wave vs. particle behavior of light in, 210–211 Light adaptation, human vision, 519–520 Light amplification by stimulated emission of radiation (See Lasers) Light axis, 1048 Light detection and ranging (See Lidar) Light emitting diodes (LED), 1177 in electrophotography, 300 in electrophotography, 305–306, 305 multicolor organic, 822–825, 822 in three-dimensional imaging displays, 1340
Light filter, 1048 Light in flight measurement, holography, 510 Light intensity, 1048 Light levels, human vision, 558 Light meter, 1048 Light microscope, 261 Light output, 1048 Light sensitive microcapsule printers, 195 Light sources, 524–527 Light valve, 1048 Lighting monochrome image processing and, 100–101 in motion pictures, 1033 Lighting ratio, 1048 Lightness, 102–103, 618 Lightning Detection and Ranging (LDAR), 890–904, 914, 941–945 lightning direction finders, 906–907 Lightning Imaging Sensor (LIS), 890–904, 929, 932–935 Lightning locators, 572, 890–955 Lightning Mapping System (LMS), 890–904, 929 Lightning Position and Tracking System (LPATS), 890–904, 935 Lightning warning systems, 907 LightSAR, 660 Ligplot software, 706 Line metrics, 603–604 Line of sight, 541 Line scan image systems, 60, 1482 Line sources, continuous, 6–7 Line spread function (LSF), 402, 751–752, 1090–1091, 1097 Line spread profiles, cathode ray tube (CRT), 33–34 Linear attenuation coefficient, 260 Linear filtering, in medical imaging, 756 Linear operations, image processing, 577–578 Linear perspective, 1328 Linear polarization, 231, 233 Linearity, 1085 Lines per picture height, television, 1362 Linogram method, in tomography, 1406 Lip sync, 1048 Liquid crystal display (LCD), 176, 184–186, 955–969 in field emission displays (FED) vs., 374 in three-dimensional imaging, 1331, 1338–1339 response time in, 387 Liquid gate, 1048
Liquid immersion development (LID), 312, 321–322 Liquid metal ion sources (LMIS), 90–91, 93, 478, 479 Liquid mirror telescopes (LMT), 872 Lithography, 456 electron beam, 87 quality metrics for, 598–616 secondary ion mass spectroscopy (SIMS) in, 479 Lithosphere, gravity imaging, 444 Live action, 1048 Lobes, grating, 5 Local (residual) gravity anomaly, 448 Local motion signal, human vision, 566 Local processing, human vision, 567–568 Localization, 1074–1077 in human vision and, 569 in magnetic resonance imaging (MRI), 984–987 Locomotion, force imaging and analysis, 419–430 Log, camera, 1041 Logical granulometry, 442 Logical structural filters, 442 Long Range Lightning Detection Network (LRLDN), 890–904 Long shot, 1048 Longitudinal magnification, 1076 Lookup table (LUT), 575, 612 Loop, 1048 Lorentz electromagnetic force, 1157 Lorentzian, 229 Lossless coding, 50 Lossless compression, 741 Lossy compression, 741 Low Energy Neutral Atom Imager (LENA), 1013–1016 Low key, 1048 Low-pass filters, 593, 594–597, 611 Low resolution electromagnetic tomography (LORETA), 204–208 Lowe, Thaddeus, 773 LPE, scanning capacitance microscope (SCM) analysis, 22 Lubrication, 1048 Luma, 103 Lumen, 1048 Lumiere, Louis and Auguste, 1022 Luminance, 102, 109, 530, 611, 618, 1081 in cathode ray tube (CRT), 173, 183 in field emission displays (FED), 380, 387 human vision and, 521, 564 in liquid crystal displays, 957, 968 in motion pictures, 1048
in television, 148, 1365, 1367, 1375 Luminescence, 255 in field emission displays (FED), 375 in forensic and criminology research, 717–719 Luminiferous ether, 211 Luminosity, flow imaging, 397–398 Luminous efficiency function, 529–531 Luminous flux, 1345 Lux, 1048 Lyman alpha signals, 1017 Lythoge, Harry, 456
M M scan ultrasonography, 1414–1415 M way search, in search and retrieval systems, 626 Mach bands, 81 Macro lenses, 1354–1355 Macula, 513, 514, 549, 746–747 Macular degeneration, 549 Magazines, film, 1026, 1049 Magenta, 1049 Magenta couplers, color photography, 135–136 Magic lasso, thresholding and segmentation, 644–645 Magic wand, thresholding and segmentation, 643–645 Magnetic brush developer, in electrophotography, 312–318 Magnetic dipole moment, 217 Magnetic direction finders (MDF), 906 Magnetic field imaging, 211–212, 803, 970–977 cathode ray tube (CRT), 38 electron paramagnetic resonance (EPR) imaging for, gradients in, 289–292 energetic neutral atom (ENA) imaging, 1006–1010 magnetospheric imaging, 1002–1021 particle detector imaging and, 1157 RF magnetic field mapping, 1223–1227 Magnetic permeability, 467 Magnetic permittivity, 212 Magnetic quantum number, 217 Magnetic resonance imaging (MRI), 210, 977–1002 electron paramagnetic resonance (EPR) imaging for vs., 293–294 image formation in, 571, 573, 574 in magnetic field imaging, 970
in medical imaging, 744, 745 Magnetic resonance measurement, 970–971 Magnetic sector mass spectrometer, 481–482 Magnetic sound, 1037, 1049 Magnetic splitting, atomic, 217 Magnetic storms, 1145 Magnetic striping, 1049 Magnetic susceptibility effects, 297 Magnetic tape, 1049 Magnetic toner touchdown development, 312 Magnetic track, 1049 Magnetization transfer imaging, 991 Magnetoencephalograms, 12–15, 744 Magnetoinductive technology, 976 Magnetometers, 1015, 974–975 Magnetopause to Aurora Global Exploration (See IMAGE) Magnetoresistivity effect, 975 Magnetospheric imaging, 1002–1021 Magnetron, 288, 1452 Magnets, for MRI, 980–981, 997–998 Magnification, 1076, 1079, 1080 angular, 1076–1077 longitudinal, 1076 in microscopy, 1107, 1121, 1124 in overhead surveillance systems, 783 transverse, 1076 visual, 1077 Magnification radiography, 673–674 Magnitude, 226, 241 Magoptical, 1049 MagSIMS, 479–484 Mahalanobis distance measure, 624 Makeup table, 1049 Manhattan Project, 744 Mapping in biochemical research, 699 geologic imaging and, 647–661 gravity imaging and, 444–454 in infrared imaging, 812–815 lightning locators, 909 in magnetic field imaging, 970 RF magnetic field mapping, 1223–1227 secondary ion mass spectroscopy (SIMS) in, 477 Marcus electron transfer theory, 1297 Markov random field, in search and retrieval systems, 623 Mars Express mission, 1020–1021 MARS systems, 622, 632 Masking, 129, 136–137, 139, 1049 Mass, gravity imaging, 444 Mass attenuation coefficient, 260
Mass spectrometry, secondary ion (SIMS), 477–491 Master, 1034, 1049 Master positive, 1049 Matched illumination, 1099 MATLAB, 186, 542 Matte, 1049 Matte surfaces, 528 Mauguin condition, 960 Maxima, 245 Maximum realizable fidelity, 73, 81, 84 Maxwell pairs, in magnetic resonance imaging (MRI), 984 Maxwell, James Clerk, 123, 211 Maxwell’s equations, 212, 225, 233, 234 Maxwell’s theory, 211–214 Mean size distribution (MSD), 442 Mean square restoration error (MSRE), 69–70 Measurement, 343–350 in astronomy science, 686–688 in color image processing, 120 in colorimetry, 528 in display characterization, 185–186 in endoscopy, lesion size, 336–338 in gravity imaging, 453–454 in holography, 510 in infrared imaging, 814–815 in magnetic field imaging, 970–976 in meteorological research, 758–759 in planar laser-induced fluorescence (PLIF), 863 quality metrics and, 598–616 in radar imagery, 764–765 in satellite imaging systems, 758–759 in terahertz electric field imaging, 1396–1397 Media relative colorimetry, 533 Medial axis transform (MAS), 625 Medical imaging, 742–757 infrared imaging for, 812 instant photography and, 856 lens in, 1354 magnetic resonance imaging (MRI) in, 977–1002 scanning acoustic microscopy (SAM) in, 1228 single photon emission computed tomography (SPECT) in, 1310–1327 terahertz electric field imaging, 1393–1404 three-dimensional imaging in, 1327
ultrasound, 1412–1435 X-ray fluorescence, 1479–1482 Medium Energy Neutral Atom Imager (MENA), 1010 Medium format cameras, 1351 Medium shot, 1049 MegaSystems, 1031 MEH PPV polymer, 819 Memory color, 520–521, 612 Mercury cadmium telluride (HgCdTe or MCT), 807, 1190, 1201–1205 Mercury discharge tubes, 222 Mercury vapor lamps, 222 Mesopic vision, 515 Metal alloys, SAM analysis for, 1228 Metal halide lamps, 1037 Metal matrix composites, SIMS analysis, 484, 487 Metal oxide semiconductors (MOS), 18–19 Metal semiconductor diodes, 1173 Meteorology, 757–773 airborne radar, 1471–1473 ASOS Lightning Sensor (ALS), 907, 922 Automated Surface Observing System (ASOS), 907, 922 Defense Meteorological Satellite Program (DMSP), 890–904, 929 Doppler radar, 1458–1468 Electrical Storm Identification Device (ESID), 907, 922–924 lidar and, 869–870 lightning locators, 890–955 magnetospheric imaging, 1002–1021 mobile radar, 1471–1473 Thunderstorm Sensor Series, 907 Tropical Rainfall Measuring Mission (TRMM), 660, 771–772, 890–904, 929, 932–935, 1473 weather radar, 1450–1474 wind profiling radar, 1469–1471 Meter candle, 1049 Meteorological Satellite (METEOSAT), 760 Methodology for Art Reproduction in Color (MARC), 664 Methyliminodiacetic acid (MIDA), 141 Metz filters, 1322 Mexican hat wavelet, 1444 Michelson interferometer, 242 Michelson, Albert, 211 Micro dry process printers, 195 Microchannel plates (MCP), in neutron imaging, 1066–1067 Microcrystal preparation, silver halide, 1262–1268
Microcrystals, SIMS analysis, 484–486 Microfilm, art conservation and analysis using, 661–662 Microimaging, 1351 Micromachining, 87 Micron Display, 377 Micropattern gas detector scintillation tracking, 1162–1163 Micropolarizers, in three-dimensional imaging, 1334–1335 Microreagent action in SECM, 1256–1257 Microscopy, 210, 261, 1072, 1106–1141 alternating current scanning tunneling microscopy (ACSTM), 28 art conservation and analysis using, 666–668 atomic force microscope (AFM), 16 biochemistry research and, 693–709 capacitive probe microscopy, 16–31 in charged particle optics and, 87 conventional fixed beam transmission electron microscopy (CTEM), 277 darkfield microscopy, 1127–1128 differential interference contrast (DIC) microscopy, 1106, 1134–1135 digital photomicrography, 1138–1140 electron microscopes, 261–287 emission electron microscope (EMM), 479 emission ion microscope (EIM), 478 fluorescence, 1106, 1135–1137 Fourier transform infrared (FTIR) microscope, 667 high-resolution electron microscopy (HREM), 273 Hoffman Modulation Contrast, 1106, 1132–1134 infinity corrected, 1106–1107 instant photography and, 855 Nomarski interference microscopy, 1106 phase contrast microscopy, 265–266, 1106, 1128–1130 photomicrography, 1106, 1124, 1137–1138 reflected light microscopy, 1124–1127 scanned probe microscopy (SPM), 1248
Microscopy (continued) scanning acoustic microscopy (SAM), 1228–1248 scanning capacitance microscope (SCM), 16–31 scanning electrochemical microscopy (SECM), 1248–1259 scanning electron microscope (SEM), 23, 87–88, 262, 274–278, 477, 1243 scanning evanescent microwave microscope (SEMM), 28 scanning ion microscope (SIM), 477 scanning Kelvin probe microscope (SKPM), 16, 28 scanning microwave microscope (SMWM), 16, 28 scanning transmission electron microscope (STEM), 87, 93, 262, 276–278 scanning transmission ion microscope (STIM), 479 secondary ion mass spectroscopy (SIMS), 477–491 stereomicroscope, 1106 transmission electron microscope (TEM), 23, 87, 93, 262–274, 262 Microstrip gas chambers (MSGC), 1060–1062 Microstrip, silicon/Gd, 1065 Microtips, molybdenum, in field emission displays (FED), 375–376 Microwave radar ducting, 1141 Microwaves, 218, 220, 230, 242, 288, 803, 1223–1227, 1393, 1452 Mid-foot key number, 1049 Mie scattering, 253, 548, 875, 1451 Mineral resource exploration, geologic imaging, 650 Mineralogy, electron paramagnetic resonance (EPR) imaging for, 296–297 Minima, 245 Minimum density, 1049 Minimum ionizing particles (MIPs), 1155 Minkowski addition/subtraction, 433, 612 Mirrors, 1073, 1074, 1075 astronomy science and, 691 camera, 1346, 1349–1350 extreme ultraviolet imaging (EUV), 1006 Giant Segmented Mirror Telescope, 693 in holography, 508–509
liquid mirror telescopes (LMT), 872 in overhead surveillance systems, 783–784 quadrature mirror filter (QMF), 622 rotating drum and mirror cameras, 496–497 rotating mirror framing cameras, 497–498 thin mirror telescopes, 1501–1502 in three-dimensional imaging, 1341–1343 X-ray telescopes, 1498–1499 Mix, 1049 Mixed high-signals, television, 1365 Mixing studies, flow imaging, 409, 410 Mobile radar, 1471–1473 Modeling in biochemical research, 699 quality metrics and, 610–615 three-dimensional imaging and, 1327 MODIS, 659–660 MODTRAN, 783, 785 Modulation, 1094 amplitude modulation (AM), 383, 1362 differential pulse code modulation (DPCM), 67, 786 frequency modulation (FM), 1362 pulse code modulation (PCM), 150 pulse width modulation (PWM), 383 quadrature amplitude (QAM), 149 quadrature modulation, 1365–1366 quantization index modulation (QIM), 170–171 sequential frequency modulation, 1367–1368 Modulation transfer function (MTF), 57–58, 95–96, 99, 140–141, 749, 750, 1091, 1094–1098, 1358 cathode ray tube (CRT), 34, 182 flow imaging and, 400–402 human vision and, 521, 544, 555 Moiré patterns, 38–39, 75 Molecular biology spectroscopy, 1403 Molecular medicine, 745–746 Molecular polarization, 227, 252 Molecular replacement, in biochemical research, 698–699 MOLMOL software, 706 MolScript software, 706 Molybdenum, in field emission displays (FED), 375–376
Moment-based search and retrieval systems, 626 Mondrian values, 52–53 Monitors (See Displays) Monochromatic aberrations, 545–548, 1083–1085 Monochromatic primaries, 104 Monochrome image processing, 100–101 Morley, Edward, 211 Morphological image processing, 430–443, 584–589 MOSFETs, 22–23 Motion analysis, 424, 494 Motion detection, human vision, 566–567 Motion parallax, 1328 Motion pictures (See also Television; Video), 1021–1056 color in, 142 electrostatic image tube cameras, 498 high-speed photography and, 494–498 image dissection cameras, 498 intermittent action high-speed cameras for, 495 photographic color display technology, 1208–1222 Polachrome, 848 Polavision, 848 rotating drum and mirror cameras, 496–497 rotating mirror framing cameras, 497–498 rotating prism cameras, 495–496 Motion studies in meteorological research, 769–771 in overhead surveillance systems, 784 Motorboating, 1049 Motorola, 377 Mottle, 604 Moving slit parallax barrier, 1338 Moviola, 1034, 1049 MOVPE, 22 MPEG, 156–157, 171, 631 MPS PPV, 821–822 MSMS software, 706 MSP software, 707 Mug shots, 715–716 Multi-up processing, 384 Multiangle Imaging Spectroradiometer (MISR), 772 Multiangle viewing instruments, 772 Multichannel scaler (MCS), 873 Multichannel sound, 1362, 1365 Multicolor organic light emitting diodes (OLED), 822–825
Multihop operations, radar and over-the-horizon (OTH) radar, 1141 Multilayer perceptron for pattern recognition, 371–373 Multilayer telescopes, 1503 Multipinned phase (MPP) CCD, 1198 Multiplanar imaging, 1328, 1341 Multiple beam interference, 244–246 Multiple carrier systems, television, 1365 Multiple coulombic scattering (MCS), 1157 Multiple isomorphous replacement (MIR), 698 Multiple wavelength anomalous dispersion (MAD), 698 Multiplex sound, television, 1362, 1365 Multiplexed analog component (MAC), television, 1374 Multiplexing, 1049 in liquid crystal displays, 956, 957 in television, 1366–1367, 1382–1385, 1389 Multislice method for crystals, 284–285 Multispectral imaging, 56, 101 geologic imaging and, 660 National Imagery Interpretability Rating Scale (NIIRS), 801 in overhead surveillance systems, 787 Multispectral scanner (MSS), 352, 360, 779 Multispectral sensors, 356 Multivariate granulometry, 441 Multiwave proportional chamber (MWPC), 1160 Multiwavelength imaging, 682–686 Multiwire proportional chambers (MWPC), 1059–1060 Munsell system, 110 MUSE TV transmission systems, 1391 Muybridge, Eadweard, 1022 Mylar, in electrophotography, 301
N Naphthol, 133, 134 Narration, 1049 National Imagery and Mapping Agency, 445 National Imagery Interpretability Rating Scale (NIIRS), 795–800 National Integrated Ballistics Information Network (NIBIN), 716 National Lightning Detection Network (NLDN), 890–904, 924, 925, 935–941
National Television System Committee (NTSC) standards, 102, 146–149, 382, 519, 521, 619, 1049–1050, 1359–1393 Nd:YAG lasers, 391, 416, 867 Near distance sharp, 1347 Near field diffraction, 246 Near field imaging, 1400–1403, 1425–1426 Near field lightning locators, 911–912 Near instantaneous companded audio multiplex (NICAM), 1365 Near point, 1077 Nearest neighbor classification, 366–370 Negative and reversal film, 1023 Negative film, 1049 Negative image, 1043, 1049 Negative positive process, 1049, 1347–1348 Negative, intermediate, and reversal film, 1023–1024 Neighborhood operations in image processing, 576–578, 583–584 in medical imaging, 755 in search and retrieval systems, 623 in thresholding and segmentation, 640 Network locator lightning locators, 905, 908 Network mapper lightning locators, 905, 909 Neural activity, electroencephalogram (EEG), 198–199 Neural networks, 371–373, 510–511, 632 Neurons and human vision, 513, 560, 563, 565 Neutral density filters, 1049 Neutral test card, 1049 Neutron activation autoradiography (NAAR), 678–680 Neutron imaging, 1057–1071 Neutron radiography, art conservation and analysis using, 678–680 New York Times, 456 Newton’s law of gravitation, 444 Newton’s rings, 1049 Next Generation Space Telescope (NGST), 693 Niépce, Joseph, 455, 1345 Nier–Johnson geometry, 481 NIMBUS satellites, 778 Niobium, in SQUID magnetometers, 10 Nitrate film, 1023, 1039, 1049
Nitride oxide silicon (NOS), 23 Nitrogen, 135 Nitroxyl radicals, 289, 296 NMOS, scanning capacitance microscope (SCM), 22 Noise, 50, 60, 61, 62, 63, 64, 68, 69, 72, 74, 75, 80, 1049, 1357 compression, 152 in forensic and criminology research, 727–731 in human vision and color vision, 517 image processing and, 593–594 in infrared imaging, 809 in magnetic resonance imaging (MRI), 996 in medical imaging, 748–749 in overhead surveillance systems, 785 photodetectors and, 1187–1188 quality metrics and, 598–616 radar and over-the-horizon (OTH) radar, 1142 Rose model and medical imaging of, 753–754 SPECT imaging, 1322 thresholding and segmentation in, 639 in ultrasonography, 1416–1417 wavelet transforms in, 1448 Noise equivalent power (NEP), 1188 Noise equivalent temperature difference (NETD), 814 Noise factor, flow imaging, 395, 396 Nomarski interference microscopy, 1106 Nomarski, Georges, 1106, 1134 Nondestructive evaluation (NDE) infrared imaging for, 811–812 ultrasonography for, 1412 X-ray fluorescence imaging for, 1478–1479 Nonlinear editing, 1035, 1230–1231 Nonlinearity, in ultrasonography, 1417–1418 Nonparametric decision theoretic classifiers, 359–363 Nonradiometric IR imaging systems, 804 Nonsymmetrical amplitude, 4 Normal dispersion, 227 Normal X-ray fluorescent holographic (NXFH) imaging, 1484–1486 Normalization, 532, 604–605, 1156 Normalized colorimetry, 533 Normally black (NB) liquid crystal displays, 963 Normally white (NW) liquid crystal displays, 960
North American Lightning Detection Network (NALDN), 890–904, 935 Notching, 1049 Nuclear magnetic resonance (NMR) in biochemical research, 694, 699, 705, 706 electron paramagnetic resonance (EPR) imaging for vs., 287, 293–294 in magnetic field imaging, 971 RF magnetic field mapping, 1223–1227 Nuclear systems, 215 Nucleation and growth method, for silver halide, 1276–1280 Nucleic Acid Database (NDB), 699 Numerical aperture, 1081, 1115 Nyquist frequency, 790 Nyquist limit, 401, 561 Nyquist’s criterion, MRI, 986 Nyquist’s sampling theorem, 552
O O software, 707 Object classification, human vision, 569 Object recognition, 520–521 Object shaped search and retrieval systems, 625–628 Object space, 1075 Objective quality metrics, 602–606 Objectives, microscope, 1114–1124 Observed gravity value, 446 Ocean wave height, radar and over-the-horizon (OTH) radar, 1149 Oceanography, 760 Oculars (See Eyepieces) Odd and even field scanning, 146 Off axis aberration, 547 Off axis holography, 272 Off line editing, 1050 Offset gravure, 460 Oil exploration, 454, 650 Oil immersion microscopy, 1109–1110 Omnimax, 1031 On axis aberration, 545–546 Online editing, 1050 OPAL jet chamber, 1155 Opaque, 1050 Open cascade development, 312 Open-close filters, 437 Openings, morphological, 430, 436–438, 585 Optic nerve, 513, 552, 562 Optical activity, 233 Optical axis, 54, 540–543 Optical character recognition (OCR), 680
Optical coherence tomography (OCT), 339 Optical density, 234 Optical design, 1074 Optical disk storage, 483 Optical effects, 1050 Optical fibers, 509 Optical geometry, 54 Optical image formation, 571, 1072–1106 Optical microscopy, 1106–1141 Optical power, 1074 Optical printer, 1050 Optical resonators, 223 Optical sectioning, microscopy, 1133–1134 Optical sound, 1031, 1037, 1050 Optical thickness, 1135 Optical transfer function (OTF), 57, 1092–1100 aberrations and, 1095–1098 flow imaging and, 399–402 in overhead surveillance systems, 784 Optical Transient Detector (OTD), 890–904, 929–932 Optics (See also Lens) astronomy science and, 691 flow imaging and, 392–393 holography in, 508–509 in human vision, 558–560 microscopy and, 1106–1107 paraxial, 1073–1074 secondary ion mass spectroscopy (SIMS) in, 480–481 terahertz electric field imaging and, 1394 Optimization criterion, in tomography, 1407 Optimum classification, 364–366 OR, 590 Orbital angular momentum, 216 Orders, microscopy, 1108 Organic electroluminescent display, 817–827 Organic ligands, in photographic color display technology, 1214 Organic light emitting diodes (OLED), 817, 822–825 Orientation, human vision, 565 Original, 1050 Original negative, 1050 Orthochromatic film, 1050 Orthogonal frequency division multiplexing (OFDM), 1389 Orthogonal wavelet basis, 1446 Orthopedic medicine, force imaging, 422 Oscillating planar mirror displays, 1341–1342 Oscillations of waves, 226, 803
Oscillator strength, 229 Oscillators, 242 Oscilloscope, cathode ray tube (CRT), 47 Ostwald system, 110 Out of phase signals, 2 Out take, 1050 Outgassing, in field emission displays (FED), 370–380 Over-the-horizon (OTH) radar, 1141–1153 Overall darkness, 604 Overexposure, in electrophotography, 318 Overhead surveillance (See also Aerial imaging), 773–802 Overlap splice, 1050 Overlay, 1050 Overshoot tolerance, 912–914 Oxichromic developers, 835 Oxidation of dyes, 836–839 Oximetry, EPR imaging for, 297–298
P P-frame, video, 1387 p–n photodiodes, 1172 Packet communications, 151, 152, 1382–1385 Packetization delay, compression, 152 Packetized elementary streams (PES), 1382–1385 Paints infrared light and, 668–672 Pressure sensitive paint, 867–868 Pair production, 256, 260 PAL standard, 146–149, 382, 1050, 1371–1373 PALplus television systems, 1392 Pan, 1050 Panavision 35, 1050 Panchromatic film, 1050 Panchromatic imaging, 55, 61, 85 Papers, photographic, 141, 605–606, 1209–1211 Parallacticscope, 1338 Parallax, 1050, 1330 Parallax barrier displays, 1337 Parallax, motion, 1328 Parallel viewing, 1336 Parallelism, functional, 562, 563 Parallelism, spatial, 562, 564 Paraxial marginal ray (PMR), 1080, 1083, 1085 Paraxial optics, 1073–1074 Paraxial pupil ray (PPR), 1080, 1083, 1085 Paraxial ray tracing, 1074–1075, 1078, 1083, 1085 Paraxial rays, 97
Parietal lobe, 569 Parseval’s theorem, 1105 Particle accelerators, 970 Particle beam measurement, 975 Particle detector imaging, 1154–1169 Particle form factor, 252 Particle image velocimetry (PIV), 391, 413–416 Particle polarization, 252 Particle sizing, in holography, 510 Passband sampling, 50, 56–61, 65 Passive matrix liquid crystal displays, 956 Pathogen dispersal, geologic imaging, 656–659 Pattern classification, 358–371 Pattern recognition, 350–353, 371–373 human vision and, 522, 559, 565 in search and retrieval systems, 625, 633 Pattern scanning, 573–574 Pattern spectrum, 442 Pattern spectrum density (PSD), 442 Pauli exclusion, in silver halide, 1272 Pediatric medicine, force imaging, 422 PEDOT polymer, 819–820, 823 Peel apart films, 828 Pendellösung effect, 282, 286 Penetration depth, in scanning acoustic microscopy (SAM), 1129–1130 Perceptron, pattern recognition using, 371–373 Perceptual redundancy and compression, 151 Perfect reflecting diffuser, 528 Perforations, film, 1025–1026, 1045, 1050 Period of wave, 212 Periodic functions, 1102–1105 Permeability, ground penetrating radar, 467 Permittivity, 212, 225, 466, 467 Persistence of vision, 1021–1022, 1050 Perspecta sound, 1050 Perspective, 1328, 1330, 1345–1347 Persulfate bleaches, 139 Perturbing spheres, in RF magnetic field mapping, 1223–1224 Perylenes, 1179, 1181 Phase, 241 Phase contrast imaging, in scanning acoustic microscopy (SAM), 1230 Phase contrast microscope, 265–266, 1106, 1116, 1128–1130 Phase object approximation, 278 Phase of wave, 213
Phase retarders, 233 Phase shift, 238–239, 1094, 1130 Phase transfer function (PTF), 1094 Phase velocity, 213, 228 Phasor diagrams, 241 Phasors, 246 Phenols, 133 Phenylenediamine (PPD) developer, color photography, 130 Phosphor primaries, 104 Phosphor screen, CRT, 31, 44–46, 173, 380 Phosphor thermography, 864–867 Phosphorescence, 255 Photo-induced discharge curve (PIDC), 301–302, 318 Photobook, 623, 627 Photoconductivity, 1169–1183 Photoconductors, 300–302, 310–312, 1169–1183, 1190, 1200–1205 Photodetectors, 49–61, 68, 70, 75, 80, 1087, 1090, 1172–1174, 1190–1191, 1198–1201 quantum well infrared photodetectors (QWIP), 807 Photodetectors, 1183–1208 Photodiodes, 1172–1174, 1190–1191, 1198–1201 Photodynamic therapy (PDT), in endoscopy, 339 Photoelastic materials, force imaging, 420 Photoelectric detectors, 1169 Photoelectric effect, 211, 255–256 Photoemission, 1169–1170 Photoemulsion microcrystals, SIMS analysis, 484–486 Photofinish photography, 500–501 Photogrammetry, 737–739, 1327 Photographic color display technology, 1208–1222 Photography, 456, 1344–1358 art conservation and analysis using, 661–682 color (See Color photography) digital, 141–142 electro- (See Electrophotography) in forensic and criminology research, 709–714 high-speed imaging (See High-speed photographic imaging) instant (See Instant photography) in medical imaging, 754 motion picture (See Motion pictures) mug shots, 715–716 in overhead surveillance systems, 773–802
photofinish photography, 500–501 process screen photography, 1051 quality metrics and, 598–616 in search and retrieval systems, 616–637 silver halide detectors and, 1259–1309 still, 1344–1358 stroboscopic, 493–494 surveillance imaging using, 714–715 synchroballistic photography, 500–501 Photointerpretation, 774 Photolithography, 383, 384 Photoluminescence, 255 Photometer, 185–186, 1050 Photomicrography, 855, 1106, 1124, 1137–1138 Photomultiplier, 1062, 1183–1208 Photomultiplier tubes (PMT), 873 Photon absorbers, in infrared imaging, 806 Photon counting, lidar, 873–874 Photon flux, 215 Photons, 211, 214–215, 223 Photopic luminous efficiency function, 555 Photopic vision, 515 Photopigments, 101, 563 Photopolymers, holography, 509 Photoreceptors, 49, 50, 58, 65, 71, 72, 513–517, 549–552, 558–562, 746–747, 1178–1183 in copiers, 1175–1183 Photoreconnaissance, 774 Photostat, 299 Photothermographic imaging, 851–853 Phototransistors, 1172–1174 Photovoltaics, 1190, 1204 Phthalocyanines, 1179, 1181 Physical vapor deposition (PVD), 383 PicHunter, 632 Pickup coils, in RF magnetic field mapping, 1224 Pictorial interface, in search and retrieval systems, 618 Pictrography 3000/4000, 852–853 Pictrostat 300, 852–853 Pictrostat Digital 400, 852–853 Piezoelectric effect, 420 Piezoelectric sensors, 423–424 Piezoelectric transducers, 1233, 1234 Pigment epithelium, 513 Pigments, infrared light, 668–672 Pin photodiodes, 1173 Pin registration, 1050 Pin scorotron, in electrophotography, 303
Pincushion distortion, 42–43, 93 Pinhole camera, 1072–1073 Piston transducers, 8–9, 1424–1426 Pitch, 1050 Pitch, film, 1026 Pivaloylacetanilides, 134–135 Pixel detectors, particle detector imaging, 1167–1168 Pixel independence, cathode ray tube (CRT), 176, 179–181 Pixels, 1050 in cathode ray tube (CRT), 173 feature extraction using, 357 feature measurement and, 343–344 image processing and, 575–578 PixTech, 376 Planar Doppler velocimetry (PDV), flow imaging, 415–416 Planar imaging, flow imaging, 390–391, 397 Planar laser-induced fluorescence (PLIF), 391, 408–409, 411–416, 861–864 Planar neutron radiography, 1067 Planar sources, continuous, 7–8 Planck distribution, 804 Planck, Max, 211 Planck’s constant, 287, 288, 394, 1184, 1273, 1476 Planck’s equation, 525, 782 Planck’s function, 759 Planck’s law, 683 Planckian radiation (See also Blackbody radiation), 525 Plane of incidence, 233, 234 Plane of vibration, 231 Plasma displays, display characterization, 185 Plasma enhanced chemical vapor deposition (PECVD), 384 Plasma frequency, 227, 231 Plasma sheet and cusp, ENA imaging, 1010–1012 Plasma wave generators, terahertz electric field imaging, 1403 Plasmasphere, extreme ultraviolet imaging (EUV), 1005–1006 Plasmatrope, 1022 Plateau, Joseph A., 1022 Platinum silicide Schottky barrier arrays, 1201 PMOS, scanning capacitance microscope (SCM), 22 p–n junctions, scanning capacitance microscope (SCM) analysis, 22 Pockels cells, high-speed photography, 492–493 Pockels effect, 233, 1394, 1397 Pocket Camera instant films, 847 Podobarograph, force imaging, 421
Point of denotation (POD), in search and retrieval systems, 621–622, 630 Point of subjective equality (PSE), 609 Point spread function (PSF), 55, 1082, 1085–1092, 1097, 1098, 1099 flow imaging and, 399, 402 human vision and, 542, 543 image processing and, 596 in magnetic resonance imaging (MRI), 987 in medical imaging, 750 in overhead surveillance systems, 784 telescopes, 688 in ultrasonography, 1421–1423 Poisson probability, 53 Poisson’s equation, 26–27 Polachrome, 848 Polacolor, 834, 840, 843–844 Polarization, 212, 223, 226, 227, 231–233, 235, 241, 250, 251, 252, 255, 257 art conservation and analysis using, 666–668 in forensic and criminology research, 719 in ground penetrating radar, 468 in high-speed photography, 492–493 in holography, 509 in human vision, 550–551 in lidar, 873 in magnetic resonance imaging (MRI), 978, 996 in meteorological research, 772 microscopy and, 1116, 1131–1132, 1134 modifying materials vs., 233 in radar, 1453, 1468 reflectance and, 237 in terahertz electric field imaging, 1397 in three-dimensional imaging, 1330, 1331 Polarization and Directionality of Earth’s Reflectance (POLDER), 772 Polarization angle, 237 Polarization diversity radar, 772, 1468–1469 Polarization rotators, holography, 509 Polarizing filter, 1050 Polaroid, 127, 829, 831–833, 844–849, 1331 Polaroid sheet, 233 Polavision, 848 Polyaromatic hydrocarbons, 1179
Polyester film, 1023, 1039, 1050 Polyethylene terephthalate films, 1023 Polymer degradation, EPR imaging, 296 Polymer films, gravure printing, 461 Polymer light emitting diodes (PLED), 817, 820–822 Polymer light emitting logo, 819–820 Polymeric research, scanning electrochemical microscopy (SECM) for, 1255–1256 Polyvinylcarbazole (PVK), 301, 823–825, 1178 Population inversion, 223 Position of objects, feature measurement, 345–347 Positive film, 1051 Positron emission tomography (PET), 210, 220, 743, 1324–1326, 1407 Postdevelopment processes, silver halide, 1303 Postproduction phase, in motion pictures, 1034, 1051 Potential well (POW) scorotron, in electrophotography, 303 POV Ray software, 707 Powder cloud development, in electrophotography, 312 Power amplification, radar and over-the-horizon (OTH) radar, 1151 Power consumption, in field emission displays (FED), 388 Power K, 1078 Power law, human vision, 747 Power spectral density (PSD), 52, 53, 61–75, 80 PPP NEt3 polymer, 820–822 Prandtl number, flow imaging, 404 Precipitation, in silver halide, 1266 Predictive coding (DPCM), compression, 153, 155 Preproduction, in motion pictures, 1031–1032 Presbyopia, 540 Pressure effects flow imaging and, 411–416 force imaging and, 420, 421, 424–426 planar laser-induced fluorescence (PLIF) in, 863 pressure sensitive paint, 867–868 Pressure garments, force imaging, 420 Pressure sensitive paint, 867–868 Primary colors, 102, 104, 126–127, 358, 531–534, 1051 Principal maxima, 245
Principal planes, 1078 Principle of superposition, 239–240 Print film, 1024, 1051 Printed edge numbers, 1051 Printer lights, 1051 Printers, 300 cyclic color copier/printers, 328 direct thermal printers, 196 display characterization in, 183 dots per inch (DPI) in, 602 drive mechanisms in, 193 dye sublimation, 189–190, 194–195 in dye transfer printing, 188–197 electrosensitive transfer printers, 195 image-on-image or REaD color printing, 328–329 laser printers, 195–196, 302, 306–310 light sensitive microcapsule printers, 195 micro dry process printers, 195 in motion pictures, 1022 quality metrics and, 598–616 REaD color printing, 328–329 tandem color printers, 328 thermal transfer printers, 189 wax melt, 190–191, 195 wet gate, 1055 Printing, 454–463, 1051 Printing press, 455 Prisms, 495–496, 1073, 1106 Prisoner’s problem, 160 Process screen photography, 1051 Processing, 1051 Processing chemicals (See also Developers), in holography, 509 Processing of images (See Image processing) Product theorem, 4 Production phase, in motion pictures, 1032 Production supervisor, 1051 Progressive scanning, 146–147 Projection, image formation, 571–574 Projection film, 1023 Projection matrix, in tomography, 1407 Projection speed, 1051 Projection theorem, in tomography, 1406 Projectors, 184–185, 1022, 1035–1038, 1330 Prontor shutters, 1351 Propagation of waves, 220–231 in ground penetrating radar, 466, 467–469
in radar and over-the-horizon (OTH) radar, 1147, 1453–1454 Propagative speed, in ultrasonography, 1415 Proportionality, Grassmann’s, 531 Proportioned bandwidth, television, 1367 Protanopia, 522–523 Protein Data Bank (PDB), 699 Proteins (See Biochemistry) Proton/electron auroras, far ultraviolet imaging, 1016–1020 Proximity sensors, for lightning locators, 907 Pseudocolor display, 120–121 Pseudostereo imaging, 1330 Psychological quality metrics, 614–615 Psychometrics in quality metrics, 609–610 Psychophysical quality metrics, 607, 614–615 Ptychography, 262 Pulfrich technique, in three-dimensional imaging, 1333–1334 Pull down claw, 1027, 1051 Pulse code modulation (PCM), 67, 150 Pulse echo imaging, in ultrasonography, 1412 Pulse wave mode, in scanning acoustic microscopy (SAM), 1232–1234 Pulse width modulation (PWM), 383 Pulsed EPR imaging, 293 Pulsed repetition frequency (PRF), 872, 1452 Pump, laser, 223, 391 Punch-through bias, particle detector imaging, 1166 Pupil, 112–113, 514, 519, 540–543, 547, 553–554, 558 Pupil functions, 1086–1087 Pupil rays, 1081 Purple Crow lidar, 871 Push-broom scanners, 806 PUSH three-dimensional imaging, 1334 Push processing, 1051 Pyrazolin, 135 Pyrazolinone, 135 Pyrazolotriazole, 133, 135–136 Pythagoras’ theorem, 1094
Q Q switching, 391–392, 508 QBIC, 618–630 Quad tree, in search and retrieval systems, 629
Quad tree method, thresholding and segmentation, 645–646 Quadrature amplitude modulation (QAM), 149 Quadrature mirror filter (QMF), 622 Quadrature modulation, 1365–1366 Quality metrics and figures of merit, 598–616, 1081–1085, 1357–1358 human vision and, 543–544 in medical imaging, 748–754 in overhead surveillance systems, 789 PSF and, 1089–1090 in tomography, 1409–1410 in ultrasonography, 1420–1424 Quantitative imaging, flow imaging, 391, 397 Quantitative SPECT, 1322–1324 Quantitative structure determination, in transmission electron microscopes (TEM), 273–274 Quantization, 61, 64, 211 compression, 154 digital watermarking and, 150 electron paramagnetic resonance (EPR) imaging for, 287 flow imaging and, 402 liquid crystal displays (LCDs), 184 television, 1375 video, 1388 Quantization index modulation (QIM), digital watermarking, 170–171 Quantization noise, 62, 63, 64, 68, 69, 80 Quantum detection efficiency/quantum efficiency, 691, 748, 1190 Quantum electrodynamics (QED), 256, 259 Quantum mechanical cross section, 255–256 Quantum nature, 214–218 Quantum sensitivity, in silver halide, 1282, 1284–1285 Quantum step equivalence (QSE), 786 Quantum terahertz biocavity spectroscopy, 1403 Quantum theory, 211, 214–218, 253, 1170 Quantum well arrays, 1205 Quantum well infrared photodetector (QWIP), 807, 1190, 1205 Quantum yield, 255, 862 Quarter wave plates, 233 Quasi 3D, scanning capacitance microscope (SCM), 25–26
Query by color, in search and retrieval systems, 621–622 Query by example, in search and retrieval systems, 620–621, 624, 628, 630 Query by sample, in search and retrieval systems, 624 Query by sketch, in search and retrieval systems, 618, 627, 630 Quinone, instant photography, 837–839 Quinonediimine (QDI) developer, color photography, 130
R R 190 spool, 1051 R 90 spool, 1051 Radar, 1450–1474 airborne, 1471–1473 bistatic, 772 cathode ray tube (CRT) in, 47 coordinate registration, 1144 Doppler (See Doppler radar) Doppler shift, 1145 dwells, 1146 equatorial anomaly, 1144 geologic imaging and, 648 ground penetrating, 463–476 group range, 1144 lidar vs., 869–870 magnetic storms vs., 1145 measurements in, 764–765 meteorology, 757–773 microwave radar ducting, 1141 mobile systems, 1471–1473 National Imagery Interpretability Rating Scale (NIIRS), 795–800 over-the-horizon (OTH) radar, 1141–1153 in overhead surveillance systems, 775–802 polarization diversity radar, 772, 1468–1469 range folding, 1145 scattering, 1145–1146 shortwave fadeout, 1145 skip zone, 1141 sporadic E, 1144 spotlights, 1146 storm tracking, 765–769 surface wave radar, 1141 synthetic aperture radar (SAR), 356, 789 terminators, 1145 traveling ionospheric disturbance (TID), 1144 weather radar, 1450–1474 wind profiling radar, 1469–1471 Radar cross section (RCS), 1145
Radar reflectivity factor, 765, 1453, 1454–1458 RADARSAT, 649 Radial velocity, 764 Radiance, 61, 524–525, 530, 785, 968, 1081, 1092 Radiance field, 51–54 Radiant heat transfer, in electrophotography, 325 Radiated fields, lightning locators, 909–914 Radiation, lightning locators, 911–912 Radiation belts, energetic neutral atom (ENA) imaging, 1006–1010 Radiation damping, 226 Radiation oncology, in medical imaging, 744 Radiation pressure, in transmission electron microscopes (TEM), 214 Radiation zone, 220 Radiative lifetime, 254 Radiators, non-planar, impedance boundaries, 9 Radio astronomy, 210 Radio frequency (RF), 220, 909 Radio interferometry, lightning locators, 946–947 Radio waves, 218, 230, 242, 803 astronomy science and, 682, 693 Atacama Large Millimeter Array (ALMA), 693 lightning locators, 890 RF magnetic field mapping, 1223–1227 Radiography, 1057–1071 Radiometric IR imaging systems, 804 Radiometry, 524, 810 Advanced Very High-Resolution Radiometer (AVHRR), 759, 760–761 Along Track Scanning Radiometer (ATSR), 772 Multiangle Imaging Spectroradiometer (MISR), 772 in overhead surveillance systems, 783 Radiosity, 804 Radiotracers, SPECT imaging, 1310–1314 Radon transform, in tomography, 1404–1406 Raggedness, 604 Rainbow schlieren, flow imaging, 412 Ram-Lak filters, in tomography, 1405 Ramsden disk, 1111, 1120
Ramsden eyepiece, 1120 Random dot autostereogram, in three-dimensional imaging, 1336 Range folding, 1145 Range normalized signal strength (RNSS), lightning locators, 937 Range of use magnification, 1121 Rangefinders, 1350–1351 Rank ordering, quality metrics, 607–608 Ranking operations, image processing, 578–580, 583 RasMol software, 707 Raster, 31, 35, 42–43, 1051 Raster effect, 72 Raster3D, 707 Rate conversion, television, 1380 Rating techniques, quality metrics, 608–609 Raw stock, 1051 Ray tracing, 1076 finite, 1083 paraxial, 1074–1075, 1078, 1083, 1085 Rayleigh criterion, 245–246, 249 Rayleigh function, 94 Rayleigh–Gans–Debye scattering, 251–252 Rayleigh range, flow imaging and, 392 Rayleigh resolution limit, 1082, 1087 Rayleigh scattering, 252–253, 257–259, 391 flow imaging and, 410, 411, 412, 415, 416 human vision and, 548 lidar and, 874, 876–880 Reach-through bias, particle detector imaging, 1166 REaD color printing, 328–329 Read noise, flow imaging, 395, 396 Reagents, instant photography, 828–829 Real time, 1051 Real time imaging, ultrasonography, 1428–1429 Receivers lidar and, 872–873 radar and over-the-horizon (OTH) radar, 1147, 1151 Reception, in magnetic resonance imaging (MRI), 979–980 Receptive field, human vision, 563 Reciprocity failure, silver halide, 1286–1288 Reciprocity law, 1051 Recombination, in silver halide, 1275, 1281 Recommended practice, 1051
Reconstruction of image algebraic reconstruction technique (ART), 1407–1408 in magnetic resonance imaging (MRI), 987–988 SPECT imaging, 1316–1321 in tomography, 1407–1408 wavelet transforms in, 1448 Reconstructive granulometries, 441–442 Recording systems, 504–505, 873 Rectangular piston, baffled, 8–9 Red green blue (RGB) system, 105–106, 531–534 in cathode ray tube (CRT), 173, 174 in color image processing, 101–102 in color photography, 123–142 in feature recognition and object classification in, 358 HSI conversion in, 112–114 image processing and, 578, 580 in motion pictures, 1052 in search and retrieval systems, 619 in television, 147–148 in thresholding and segmentation, 641 Redshift (See also Doppler shift), 686, 772 Reduction dye release, in instant photography, 839–840 Reduction printing, 1051 Reduction sensitization, silver halide, 1293 Redundancy, compression, 155 Reed Solomon coding, 1390 Reel 3D Enterprises, 1333 Reel band, 1051 Reference carriers, television, 1365 Reference white, 102 Reflectance, 50–53, 61, 68, 236–237, 527–530, 558, 610, 611, 803, 1072 in forensic and criminology research, 717 in scanning acoustic microscopy (SAM), 1235–1244 Reflected light microscopy, 1124–1127 Reflecting prisms, 1073 Reflection, 233–239, 267, 527–529, 1072, 1075 Bragg, 244 conducting surface, 239 ground penetrating radar and, 464–466, 468 holography in, 508 human vision and, 553–554, 561
in scanning acoustic microscopy (SAM), 1235–1244 in ultrasonography, 1415–1416 Reflection coefficient, in scanning acoustic microscopy (SAM), 1235–1244 Reflection grating, X-ray telescopes, 1505–1506 Reflection holograms, 508 Reflective liquid crystal displays, 965–966 Reflectivity factor, radar, 1453, 1454–1458 Reflex cameras, 1345 Reflex radiography, art conservation and analysis using, 675–676 Refraction, 233–239, 1075, 1131–1132 dispersion vs., 234–235 ground penetrating radar and, 468 index of (See Index of refraction) in motion pictures, 1051 Refractive error, 548 Refractive index (See also Index of refraction), 512, 1075, 1079, 1109, 1453 Region growing, thresholding and segmentation, 643–645 Regional gravity anomaly, gravity imaging, 448 Regions of influence, feature recognition and object classification, 370–371 Register tolerances, cathode ray tube (CRT), 37 Registration of image, in three-dimensional imaging, 1331 Regularization parameters, in tomography, 1407 Regularized least squares, in tomography, 1408 Rehabilitation, force imaging, 422, 424 Rehalogenating bleaches, 138 Reich, Theodore, 456 Relative aperture, 1081 Relative colorimetry, 533 Relative dielectric permittivity (RDP), 467 Relative edge response (RER), 800 Relative neighborhood graph (RNG), 370–371 Relative quantum efficiency (RQE), silver halide, 1296–1298 Relaxation parameters, in tomography, 1408 Release print, 1023, 1051 Rem jet backing, 1051 Rembrandt Intaglio Printing Co., 456
Remote sensing, 210, 356 geologic imaging and, 648–650 lidar and, 869, 870 in overhead surveillance systems, 780, 782 in three-dimensional imaging, 1327 Remote Sensing Act, 1992, 780 Rendering, 1051 Repetitive flash (stroboscopic) photography, 493–494 Reset noise, flow imaging, 396 Residual gravity anomaly, 448 Residual index, in biochemical research, 699 Resin coated (RC) paper, 1210–1211 Resolution, 60, 71, 75, 84, 1073, 1082, 1087, 1100–1103, 1357, 1358 Abbe numbers in, 1100–1102 amplitude, 151 astronomy science and, 686–688 cathode ray tube (CRT), 32, 33–34 charged particle optics, 86–100 in charged particle optics, 94–100 compression and, 151 detector, 790–793 electron paramagnetic resonance (EPR) imaging for, 292 in endoscopy, 334–336 flow imaging and, 398–405 in forensic and criminology research, 712–713 ground penetrating radar and, 469–471 High-Energy Neutral Atom Imager (HENA), 1010 human vision and, 561 liquid crystal displays (LCDs), 184, 858–859 in magnetic resonance imaging (MRI), 987 in magnetospheric imaging, 1010 in medical imaging, 750–752 microscopy and, 1106 in motion pictures, 1051 in overhead surveillance systems, 789–790, 792–794 photoconductors, 1181–1182 Rose model and medical imaging, 753–754 scanning capacitance microscope (SCM), 19 in scanning acoustic microscopy (SAM), 1229, 1244–1245 spatial, 151 television, 1362 temporal, 151 in tomography, 1410
in transmission electron microscopes (TEM), 263, 266 in ultrasonography, 1421–1424 wavelet transforms in, 1448 X-ray fluorescence imaging and, 1482–1484 Resolution limit, 1073, 1087 Resolving power, 1051, 1358 Resonance, 226, 228–233, 287 Resonance curve, 226 Resonance lidar, 883–885 Resonant coils, RF magnetic field mapping, 1223–1227 Resonant fluorescence, 259 Resonant frequency, 228 Resonators, 223 Response time, in field emission displays (FED), 387 Responsivity, 54–56 human vision and color vision, 529–531 photodetectors and, 1187 Restoration of images (See Image restoration) Restoring force, 225 Reticulation, 1051 Retina, 65, 513–519, 522, 543, 547, 549, 552, 558–564, 746–747 Retinal image size, 1328 Retinex coding, 61, 65, 75–83 Retrace lines, television, 1362 Retrofocus, 1354 Return beam vidicon (RBV), in overhead surveillance systems, 779 Reversal film, 139, 1052 Reversal process, 1052 Reverse modeling, scanning capacitance microscope (SCM), 27 Reverse perspective, 1330 Reynolds number, flow imaging, 404, 410, 412 RF coils, in magnetic resonance imaging (MRI), 999 RF magnetic field mapping, 1223–1227 RF spoiling, in magnetic resonance imaging (MRI), 992–993 Rheinberg illumination, 1128 Rhodopsin, 517, 561, 747 Ring current, energetic neutral atom (ENA) imaging, 1006–1010 Ring imaging, particle detector imaging, 1162 Ring imaging Cerenkov counters (RICH), 1162 Ring opening single electron transfer (ROSET) dye release, 840, 851 Ringing, 75 Rise time, in lightning locators, 913
Ritchey–Chrétien mirrors, 783 Ritter von Stampfer, Simon, 1022 Robustness, 74 Rocking curve, 280, 282 Rods, 122, 513–519, 530–531, 551–554, 558, 560–561, 562, 746–747 Roget, Peter Mark, 1022 Roller charging, in electrophotography, 303 Ronalds, 299 Röntgen, Wilhelm C., 1475 Root mean square (rms) aberration, 1090 Root mean square (rms) granularity, 140 ROSAT telescopes, 1507 Rose model, in medical imaging, 753–754 Rotating drum and mirror cameras, 496–497 Rotating mirror framing cameras, 497–498 Rotating mirrors, in three-dimensional imaging, 1343 Rotating prism cameras, 495–496 Rotation, 167, 1052 Rotation, molecular, 216 Rotational frequency, in magnetic resonance imaging (MRI), 979 Rotational mapping, in RF magnetic field mapping, 1225 Rotogravure, 454–463 Rough cut, 1052 RS strings, in search and retrieval systems, 629 Run-length encoding, 516 Ruska, Ernst, 261 Rutherford scattering, particle detector imaging, 1157
S Saddle coils, deflection yoke, 41–42 Safety acetate film, 1023, 1024, 1052 SAFIR lightning locators, 914, 945–950 Sampling, 56, 61 compression, 153 digital watermarking and, 150 human vision and, 552, 554, 560 lightning locators, 918–921 in magnetic resonance imaging (MRI), 986–987 passband, 50, 56–61, 65 thresholding and segmentation in, 640 Sampling lattice, 50, 54, 57, 58, 59, 65, 71 Sanyo three-dimensional display, 1340–1341
Satellite imaging systems, 350, 356 Advanced Very High-Resolution Radiometer (AVHRR), 759, 760–761 Advanced Visible Infrared Imaging Spectrometer (AVIRIS), 650, 787 Along Track Scanning Radiometer (ATSR), 772 Applications Technology Satellite (ATS), 757 Array of Low Energy X Ray Imaging Sensors (ALEXIS), 905, 929 cloud classification, 761–764 Defense Meteorological Satellite Program (DMSP), 890–904, 929 Earth Observing System (EOS), 772 Earth Resources Technology Satellite (ERTS), 778–779 Fast On Orbit Recording of Transient Events (FORTE), 890–904, 929 feature recognition and object classification in, 352–353 geologic imaging and, 647–661 Geostationary Meteorological Satellite (GMS), 760 Geostationary Operational Environmental Satellite (GOES), 760, 778 gravity imaging and, 444–454 IKONOS satellite, 780 image processing and, 589 IMAGE, 1018 imaging satellite elevation angle (ISEA) in, 791 in overhead surveillance systems, 773–802 infrared in, 758 Landsat, 778, 787 Lightning Imaging Sensor (LIS), 890–904, 929, 932–935 lightning locators, 890, 928–935 magnetospheric imaging, 1002–1021 measurement in, 758–759 meteorology, 757–773 Meteorological Satellite (METEOSAT), 760 Multiangle Imaging Spectroradiometer (MISR), 772 multiangle viewing instruments, 772 multispectral image processing and, 101 NIMBUS, 778 oceanography, 760
INDEX
Optical Transient Detector (OTD), 890–904, 929–932 Polarization and Directionality of Earth’s Reflectance (POLDER), 772 Systeme Probatoire d’Observation de la Terre (SPOT), 779–780 Television and Infrared Observational Satellite (TIROS), 757, 777 Thematic Mapper, 648, 653, 654, 657, 779 time delay integration (TDI), 1018 Tropical Rainfall Measuring Mission (TRMM), 660, 771–772, 890–904, 929, 932–935, 1473 X-ray Evolving Universe Satellite, 1509 Saturation, 103, 111, 117, 119, 578, 580, 618, 994–995, 1043, 1052 Sawtooth irradiance distribution, 1092 Scaling, digital watermarking, 167 Scaling factors, television, 1367 Scaling methods, quality metrics, 607–608 Scalograms, 1444 Scan a Graver, 456, 461 Scanned probe microscopy (SPM), 1248 Scanners, 574 art conservation and analysis using, 663 calibration of, 603 in forensic and criminology research, 709 in infrared imaging, 804, 805–806 in meteorological research, 769 multispectral, 360 in overhead surveillance systems, 779 push broom type, 806 quality metrics and, 602–603 whisk broom type, 806 Scanning, 1072 high-definition TV (HDTV), 147 image formation in, 571, 573–574 interlaced, 146–147 k space, 574 odd and even field, 146 pattern, 573–574 progressive, 146–147 television, 146–148, 1359, 1362, 1366–1367 ultrasonography, 1413–1415 X-ray fluorescence imaging and, 1478–1479, 1482 Scanning acoustic microscopy (SAM), 1128–1148
Scanning capacitance microscope (SCM), 16–31 Scanning capacitance spectroscopy, 21 Scanning electrochemical microscopy (SECM), 1248–1259 Scanning electron microscope (SEM), 23, 87–88, 262, 274–278, 477, 1243 Scanning evanescent microwave microscope (SEMM), 28 Scanning ion microscopy (SIM), 477 Scanning Kelvin probe microscope (SKPM), 16, 28 Scanning lines, television, 1359 Scanning microwave microscope (SMWM), 16, 28 Scanning transmission electron microscope (STEM), 87, 93, 262, 276–278 Scanning transmission ion microscope (STIM), 479 Scattering, 51, 242, 244, 249–253, 256–260, 282, 283, 285, 286, 391, 1072 in biochemical research, 698 flow imaging and, 397–398, 410, 411, 415–416 ground penetrating radar and, 471 holography in, 504 human vision and, 548, 549 image formation in, 571 lidar and, 874–875 multiple coulombic scattering (MCS), 1157 in overhead surveillance systems, 785 particle detector imaging and, 1157 radar, 1145–1146, 1451 Rutherford scattering, 1157 in scanning acoustic microscopy (SAM), 1243 SPECT imaging, 1323–1324 in transmission electron microscopes (TEM), 269 in ultrasonography, 1415–1416 Scattering angle, 250 Scattering plane, 251 Scene, 1052 Scherzer defocus, 272 Schlieren images, 405–408, 412, 501–504 Schottky diode, 19, 1172–1174, 1190, 1201 Schottky thermal field emitter, in charged particle optics, 90 Schulze, Johann Heinrich, 1345
Science (See also Astronomy; Biochemistry; Medicine and medical research), 742 Scientific Working Group on Imaging Technologies (SWGIT), 719, 741 Scintillation, 688, 1158, 1313–1314 Scintillation cameras, SPECT imaging, 1313–1314 Scintillation tracking devices, particle detector imaging, 1158–1168 Scintillator detectors, in neutron imaging, 1062–1064 Scintillators, in radiographic imaging, 1067, 1068 Scope, 1052 Scorotron charging, in electrophotography, 303 Scorotrons, 1176 Scotopic (rod) vision, human, 122, 515, 747 Scotopic luminous efficiency function, 555 Scrambling, 60 Screened images, 455 Screw threads, microscopy, 1116 Scrim, 1052 Script, 1052 Sealing glass (frit), in field emission displays (FED), 387 Seaphone three-dimensional display, 1339–1340 Search and retrieval systems, 616–637 Search engines, in search and retrieval systems, 633 SECAM standards, 146–148, 1052, 1367–1371 Second order radiative process, 256 Secondary electron (SE) imaging, in scanning electron microscopes (SEM), 275–276 Secondary ion mass spectroscopy (SIMS), 477–491 Secondary maxima, 245 Security, digital watermarking, 159–161 Segmentation, 615 in color image processing, 119–120 human vision and, 568–569 image processing and, 587 in search and retrieval systems, 622, 625 Seidel polynomials, 542 Selection rules, atomic, 253 Selectivity, human vision, 565–567 Selenium, 300, 301, 1170 Sellers, Coleman, 1022
Semiconductor detectors, 1064–1065, 1163–1165, 1168 Semiconductors, 22–23, 1183–1208 Semigloss surfaces, 528 Sensitivity, 1052 astronomy science and, 688–690 in magnetic resonance imaging (MRI), 983 in radar and over-the-horizon (OTH) radar, 1147 silver halide, 1261, 1282, 1284–1285 in SQUID sensors, 14 Sensitivity or speed of film, 124, 139 Sensitization, in photographic color display technology, 1215–1216, 1288–1293 Sensitizers, color photography, 124–125 Sensitometer, 1052 Sensitometry, silver halide, 1262 Sensors, 101, 356 active pixel sensors (APS), 1199–1200 capacitance, 17–18 CMOS image sensors, 1199–1200 force imaging and, 420, 422–424 monochrome image processing and, 100 in overhead surveillance systems, 787–789 scanning capacitance microscope (SCM), 17–18 Separation light, 1052 Separation masters, 1052 Sequence, 1052 Sequential color TV, 1365 Sequential frequency modulation, 1367–1368 Series expansion methods, in tomography, 1406–1409 Serrations, television, 1360 Set, 1052 Set theory, 430 Setup level, television, 1362 70 mm film, 1025 Sferics, 890 Shading, 3, 58, 65, 82, 1328 Shadow mask, 31, 36–38, 44, 47, 173, 825–826 Shadowgraphs, 405–408, 501, 743 Shadowing, 1328 Shadows, 51, 53, 58, 75, 82 Shallow electron trapping (SET) dopants, 1215 Shannon’s law, 49, 63 Shannon’s theory of information, 99 Shape analysis-based search and retrieval systems, 625–628 Shape factor, 357–358
Shape of objects, in feature measurement, 347–350 Sharpness, 71, 75, 81, 140, 1052, 1347, 1357 in color photography, 137 in high-speed photography, 492 in image processing, 590–591 quality metrics and, 598–616 silver halide and, 1304 Sheet-fed gravure, 460 Shielding, ground penetrating radar, 468 Shore A scale, gravure printing, 459 Short, 1052 Shortwave broadcast, radar and over-the-horizon (OTH) radar, 1142 Shortwave fadeout, 1145 Shot, 1052 Shot noise, flow imaging, 395 Show Scan, 1031 Shutter, 492–493, 1027–1028, 1036, 1052, 1351–1352 Shuttle Imaging Radar, 649 Shuttle Radar Topographic Mapping Mission (SRTM), 660 Sibilance, 1052 SiC, scanning capacitance microscope (SCM) analysis, 22 Sidebands, television, 1362, 1366, 1389 Signal coding (See also Encoding), 49–51, 61–62, 84 Signal detection, particle detector imaging, 1168 Signal-induced noise (SIN), lidar, 873 Signal levels, television, 1361–1362 Signal processing, in digital watermarking, 161, 171 in human vision and color vision, 516–518 in radar and over-the-horizon (OTH) radar, 1147, 1151–1152 Signal propagation model, lightning locators, 937 Signal-to-noise ratio (SNR), 50, 60, 64–66, 74, 81, 84 in charged particle optics, 99 electron paramagnetic resonance (EPR) imaging for, 289 in flow imaging, 393, 394–397 in magnetic resonance imaging (MRI), 987, 996 in medical imaging, 749 in overhead surveillance systems, 794–795 in radar and over-the-horizon (OTH) radar, 1147, 1150 silver halide, 1303
in sound systems, 1388 in tomography, 1410 in X-ray telescopes, 1497 Signal transduction, in scanning electrochemical microscopy (SECM), 1249–1253 Silicon dioxide, scanning capacitance microscope (SCM) analysis, 23 Silicon drift detectors, particle detector imaging, 1167 Silicon nitride, scanning capacitance microscope (SCM) analysis, 23 Silicon photoconductors, 1204–1205 Silicon technology, 384–385 Silicon transistors, scanning capacitance microscope (SCM) analysis, 23 Silicon Video Corp. (SVC), 377 Silver assisted cleavage dye release, instant photography, 840, 851 Silver clusters and development, silver halide, 1280–1281 Silver Dye Bleach, 127 Silver halide, 140, 1259–1309, 1345, 1356–1357 art conservation and analysis using, 661–662 in color photography, 123, 125–126, 129–130 detectors using, 1259–1309 in holography, 509 in instant photography, 827, 830–833 in motion pictures, 1052 in photographic color display technology, 1208–1222 secondary ion mass spectroscopy (SIMS) analysis of, 484–486 Silver nitrate, 1345 Silver oxide, 381 SIMION, 482 Simulated images, in transmission electron microscopes (TEM), 271–273 Simulations, 769–771, 1282, 1327 Simultaneous autoregressive (SAR) model, 623–624 Single electron transfer dye release, 840, 851 Single frame exposure, 1052 Single hop mode, radar and over-the-horizon (OTH) radar, 1141 Single lens reflex cameras, 1349–1350 Single perforation film, 1052 Single photon emission computed tomography (SPECT), 743, 1310–1327 Single pixel image processing, 575–576
Single positive imaging, gravure printing, 461 Single station lightning locators, 907–908 Single system sound, 1052 16 mm film, 1024–1025, 1052 Size distributions, 442 Size of image, 1074–1077 Size of objects, 343–344, 686–688 Sketch interface in search and retrieval systems, 618 Skin depth, 229, 230 Skip frame, 1052 Skip zone, 1141 Skunk Works, 775, 776 Sky waves, 912 Slew rate, cathode ray tube (CRT), 179–180 Slides, microscope, 1124 Slitting, 1052 Slow motion, 1052 Smith, Willoughby, 1170 Smoothing, 577, 580–583, 593, 598–616, 755–756 Snell’s law, 234, 238, 468, 1076 Snellen chart, human vision, 747 Sobel operator, image processing, 582 Society for Information Display (SID), 818 Society of Motion Picture and Television Engineers (SMPTE), 102, 1052, 1374 Sodium arc lamps, 222 Sodium double, 222 Sodium lidar, 884–885 Soft, 1053 Soft light, 1053 Soil moisture mapping, geologic imaging, 656–659 Solar wind, magnetospheric imaging, 1002–1021 Solid state detectors (SSD), 1007, 1477 SOLLO lightning locators, 905, 908, 909 Sound drum, 1053 Sound editing, in motion pictures, 1035 Sound effects, 1053 Sound gate, 1053 Sound head, 1037–1038, 1053 Sound navigation and ranging (SONAR), 1412 Sound pressure level (SPL), 3 Sound recorder, 1053 Sound speed, in motion pictures, 1028–1029 Sound sprocket, 1053 Sound systems in motion pictures, 1031, 1033, 1037
in television, 1362, 1365, 1388–1389 Sound track, 1053 Sounding, radar and over-the-horizon (OTH) radar, 1142 Source points, 251 Space-based imaging technology, astronomy science, 691 Space exploration, magnetospheric imaging, 1002–1021 Space Infrared Telescope Facility (SIRTF), 690, 691–692 Spacers, in field emission displays (FED), 381–382, 386 Spallation Neutron Source (SNS), 1057 Sparrow resolution limit, 1087 Spatial domain, image processing, 577, 594 Spatial filters, 509, 1100 Spatial frequency, 248, 559–562, 565–566, 1098, 1103 Spatial frequency response (SFR), 50, 56–61, 62, 63, 66, 68, 70–74, 79, 80 Spatial homogeneity, in cathode ray tube (CRT), 181–182 Spatial parallelism, 562, 564 Spatial relationship, in search and retrieval systems, 628–630 Spatial resolution, 151 in medical imaging, 750–752 in microscopy, 1136 in overhead surveillance systems, 789–790 in scanning capacitance microscope (SCM), 19 in X-ray fluorescence imaging, 1482–1484 Spatial response (SR), 50, 54–56, 68, 78, 79, 80 Spatial uniformity, in ultrasonography, 1424 Spatial visual processing, human vision, 558–570 Special effect, 1034, 1053 Specimens, microscopy, 1108–1109 Speckle, in ultrasonography, 1420–1421 Spectra, microscopy, 1108 Spectra instant film, 847 Spectral filters, 55, 872 Spectral imaging, feature recognition and object classification, 356–357 Spectral lines, velocity analysis using, 685–686 Spectral luminosity function (SLF), 554–555 Spectral power density/distribution (SPD), 100, 524–527, 618
Spectral purity, radar and over-the-horizon (OTH) radar, 1147 Spectral radiant exitance, 222 Spectral radiant flux density, 222 Spectral response, in feature recognition and object classification, 358 Spectral sensitization, silver halide, 1294–1299 Spectrometer, 239, 244, 481–482, 650, 787, 970, 1403 Spectroradiometer, 185–186, 524 Spectroscopy, 571 Constellation X mission, 1509 electron paramagnetic resonance (EPR) imaging for, 289 in endoscopy, 338 high-resolution secondary ion mass, 477–491 quantum terahertz biocavity spectroscopy, 1403 terahertz electric field imaging and, 1393–1404 Spectrum, 57, 1053 Specular surfaces, 234 Speed, 1053 Speed of film, 124, 139, 1023 Speed of light, 224–225 Spherical aberration, 92, 98, 1088–1089, 1117, 1123 Spherical waves, 214 Spherics, 890 Spider stop, microscopy, 1128 Spin angular momentum, 217 Spin density, in magnetic resonance imaging (MRI), 983–984 Spin echo in magnetic resonance imaging (MRI), 981, 992 in RF magnetic field mapping, 1125–1126 Spin states, atomic, 217 Spindt emitters, 385 Spindt technique, 385 Splice, 1053 Splicer, 1053 Splicing tape, 1053 Spline fits, thresholding and segmentation, 644–645 Split and merge techniques, thresholding and segmentation, 645–646 Splitters, holography, 509 Spontaneous emission, 253–254 Spontaneous Raman scattering, 411 Spool, 1053 Sporadic E, 1144 Sports medicine, force imaging, 422 SPOT high-resolution visible (HRV) imaging systems, 649, 655
Spotlight, 1053 Spotlights, radar, 1146 Sprockets, in projectors, 1036, 1053 Sputtering, 383, 478 Square root integral (SQRI), quality metrics, 606 Squarilium, 1179 SQUID sensors, analog and digital, 9–15 Stabilization of dyes, 831, 841–842 Staircase patterns, 75 Standard definition TV (SDTV), compression, 157 Standard illuminants, CIE, 103–104 Standard Observer, 618 Standards converters, television, 1373 Stanford Research Institute, 375 Stanford, Leland, 1022 Static electricity, 1053 Stationarity, 1085–1086 Statistical redundancy and compression, 150–156 Steadicam, 1031 Steering, 4–5 Stefan–Boltzmann law/constant, 222, 804 Steganography vs. digital watermarking, 160 Step response function (SRF), 402 Stereo display technologies, 1327–1344 Stereo pairs, in three-dimensional imaging, 1329–1330 Stereo window, 1329 StereoGraphics three-dimensional imaging, 1333 StereoJet three-dimensional imaging, 1332 Stereolithography three-dimensional imaging, 1327 Stereomicroscope, 1106 Stereophonic, 1053 Stereoscopic vision, 566 Stiffness matrix, 627 Stiles–Crawford effect, 58, 513, 542, 552–554 Stiles–Holladay approximation, 548 Still photography, 491–494, 1344–1358 Stimulated echo, in magnetic resonance imaging (MRI), 983 Stimulated emission, 223, 253, 254–255 Stock, 1053 Stop, 1053 Stop motion, 1053 Stops, 1354 Storage and retrieval systems art conservation and analysis using, 661–682
in forensic and criminology research, 716–717 Methodology for Art Reproduction in Color (MARC), 664 in motion pictures, 1038–1039 Visual Arts System for Archiving and Retrieval of Images (VASARI), 663–664 Storage systems, secondary ion mass spectroscopy (SIMS), 482–484 Storm tracking, 655–656, 765–769 Storyboard, 1053 Stratospheric Observatory for Infrared Astronomy (SOFIA), 692–693 Streak cameras, 499–500 Strehl ratio, 543, 544, 1090 Strike filtering, gravity imaging, 452 Strip cameras, 500 Strip scintillation detectors, particle detector imaging, 1165–1167 Stripe, magnetic, 1053 Stroboscopic photography, 492, 493–494 Structural dilation, 433 Structural erosion, 432–433 Subband/wavelet coding, compression, 154 Subcarriers, television, 1362, 1365 Subclustering, in feature recognition and object classification, 367–370 Subjective quality metrics, 602, 606–610 Sublimation printers, 189–190, 194–195 Subtraction, image processing, 590 Subtraction, Minkowski, 432, 612 Subtractive color, 833–841, 1053 Subtractive color matching, 102 Subtractive color mixing, 127–128, 139 Sulfite developer, color photography, 130 Sulfonamidonaphthol, 837, 838 Sulfonamidophenol, 838 Sulfoselenide, 301 Sulfur plus gold sensitization, silver halide, 1292–1293 Sulfur sensitization, silver halide, 1289–1292 Super xxx films, 1025 Super Panavision, 1053 Superadditivity, silver halide, 1303 Superconducting quantum interference devices (SQUIDs), 9–15, 976 Superconductors, 9–15, 484, 486–487, 1106 Superposition, 239–240 Superscope, 1053
Supersensitization, silver halide, 1298–1299 Supersonic flow, flow imaging, 409 Supertwisted nematic (STN) liquid crystal displays, 961–962 Surface acoustic waves (SAW), in scanning acoustic microscopy (SAM), 1236–1243 Surface stabilized ferroelectric LCD (SSFLC), 965 Surface wave radar, 1141 Surround function, 77, 79 Surround speakers, 1054 Surveillance et Alerte Foudre par Interférométrie Radioélectrique (See SAFIR) Surveillance imaging in forensic and criminology research, 709, 714–715 overhead, 773–802 radar and over-the-horizon (OTH) radar, 1141–1153 SVGA video, in field emission displays (FED), 382 Swan, J.W., 455 Sweetening, 1054 Swell, 1054 Swiss PdbViewer, 708 SX-70 instant film, 844–847 Symmetry, in compression, 152 Sync pulse, 1054, 1360 Sync sound, in motion pictures, 1033 Synchroballistic photography, 500–501 Synchronization high-speed photography and, 493 in motion pictures, 1054 in television, 1360, 1375 Synchronizer, 1054 Synchrotron radiation (SR), 221, 1476 Synthetic aperture radar (SAR), 356, 648, 789 Systeme Probatoire d’Observation de la Terre (SPOT), 779–780
T T-grain emulsion, 1054 T1/T2 relaxation, MRI, 983–984, 988–991 Tail ends, 1054 Take, 1054 Tamper detection, digital watermarking, 159 Tandem color printing, 328 Tape splice, 1054 Tapetum, 513 Taylor procedure, 3 Taylor series, 227 Technicolor, 1024 Technirama, 1031, 1054
Techniscope, 1054 Telecine, 1054 Telemacro lens, 1354–1355 Telephotography, 59 Telephoto lens, 1347, 1354 Telescopes (See also Astronomy), 210, 1072 Apollo Telescope Mount, 1507 astronomy science and, 682–693 Atacama Large Millimeter Array (ALMA), 693 Chandra Observatory, 1508 Constellation X Observatory, 693, 1509 Einstein Observatory Telescope, 1507 Giant Segmented Mirror Telescope, 693 Kirkpatrick Baez telescopes, 1502–1503 lidar and, 871–872 limitations on, 688, 690–691 liquid mirror telescopes (LMT), 872 mirrors for, 691 multilayer telescopes, 1503 Next Generation Space Telescope (NGST), 693 in overhead surveillance systems, 783 ROSAT telescopes, 1507 Space Infrared Telescope Facility (SIRTF), 690, 691–692 Stratospheric Observatory for Infrared Astronomy (SOFIA), 692–693 Terrestrial Planet Finder, 693 thin mirror telescopes, 1501–1502 TRACE telescopes, 1507–1508 Very Large Array Radio Telescope, 693 Wolter, 1499–1501 X-ray Evolving Universe Satellite, 1509 X-ray interferometric telescopes, 1503–1504 X-ray telescope, 1495–1509 XMM Newton telescope, 1508 Television (See also Motion pictures; Video), 59, 1021 ATSC Digital Television Standard for, 1359, 1382–1389 black-and-white, 1359 broadcast transmission standards, 1359–1393 cathode ray tube (CRT) using, 47 chromaticity in, 148 component systems in, 148–150 compression in, 150–157
digital watermarking and, 146–148 digitized video and, 149–150 high-definition (HDTV), 41, 42, 47, 147, 151, 153, 157, 1039, 1047, 1382, 1390 image aspect ratio in, 147 image intensity in, 147–148 interlaced scanning in, 146–147 luminance in, 148 National Television System Committee (NTSC) standards for, 146–149 National Television Systems Committee (NTSC), 1359–1393 PAL standard for, 146–149, 1359–1393 progressive scanning in, 146–147 red green blue (RGB) system in, 147–148 scanning in, 146–148 SECAM standard for, 146–148, 1359–1393 standard definition TV (SDTV), 157 trichromatic color systems in, 147–148 Television and Infrared Observational Satellite (TIROS), 757, 777 Temperature photodetectors and, 1188–1190 in scanning acoustic microscopy (SAM), 1230 Temperature calculation, in infrared imaging, 814–815 Temperature effects, flow imaging, 411–416 Temperature mapping, in infrared imaging, 812–815 Temperature measurement, planar laser-induced fluorescence (PLIF), 863 Temperature, color, 103, 525 Tempone, 289 Temporal dependencies, 180, 184 Temporal homogeneity in cathode ray tube (CRT), 182 Temporal lobe, 569 Temporal resolution, 151, 1424 Terahertz electric field imaging, 1393–1404 Terminator, radar, 1145 TERRA, 659 Terrain correction, gravity imaging, 448 Terrestrial Planet Finder, 693 Test patterns, quality metrics, 603 Tetramethyl ammonium hydroxide (TMAH), 384
Texas Instruments, 376–377 Text tag information, in search and retrieval systems, 617 Textile presses, 456 Textural gradient, 1328 Texture, in search and retrieval systems, 622–625 Texture processing, image processing, 583–584 Theatres, 1038 Thematic Mapper, 648, 653, 654, 657, 779 Thermal emission, 1176, 1184–1187 Thermal field emitter (TFE), 90 Thermal head, in dye transfer printing, 191–193 Thermal imaging, 810–811 Thermal Infrared Multispectral Scanner (TIMS), 650 Thermal radiation, 356 Thermal signatures, 803 Thermal sources, 222 Thermal transfer process, 189, 853 Thermally assisted fluorescence (THAF), 863 Thermionic emission, 223 Thermofax, 299 Thermograms, 802–817 Thermographic imaging, 851–854 Thermography, 802–817, 864–867 Thermoplastics, 509 Thiazolidine, 840, 841 Thickness extinction contours, 282 Thin-film technology, 383–384 in field emission displays (FED), 377, 379, 383–384 in liquid crystal displays, 957 in scanning acoustic microscopy (SAM) analysis for, 1228 Thin lens conjugate equation, 1078 Thin mirror telescopes, 1501–1502 Thin objects, in transmission electron microscopes (TEM), 270–271 Think Laboratories, 461 Thinker ImageBase, 617 Thiols, 135 Thiopyrilium, 1179 Thiosulfate bleaches, 139 35 mm film, 1022, 1024, 1054 Thomson scattering, 249–250, 256 Thread, 1054 Three-dimensional imaging, 1054, 1072, 1327–1344 in biochemical research, 694–708 Doppler radar and, 1465–1468 flow imaging and, 416–417 force imaging and, 424 ground penetrating radar and, 472–475, 476 human vision and, 566
Three-dimensional imaging (continued) in meteorological research, 772 in ultrasonography, 1433 Thresholding, 584–590, 637–638 Thresholds, quality metrics, 609–610 Throw, 1054 Thunderstorm Sensor Series (TSS), 907, 922 Thunderstorms (See Lightning locators) Thyristor flash systems, 1348–1349 Tidal correction, gravity imaging, 447 Tight wind, 1054 Tilting, 4–5 Time delay and integration (TDI), 785, 1018 Time domain waveform analysis, 912–914 Time–energy uncertainty principle, 259 Time lapse, 1054 Time of arrival (TOA) lightning locators, 906–907, 935, 941–945 Time of flight imaging, 989–991, 1015–1016 Time parallel techniques, in three-dimensional imaging, 1331 Time projection chamber (TPC), particle detector imaging, 1160 Time sequence maps, electroencephalogram (EEG), 201–204 Time slice imaging, ground penetrating radar, 472–475 Time Zero film, 846–847 Timing, 184, 1054 Timing layer, in instant films, 832 Titanyl phthalocyanine (TiOPc), 1180–1181 Todd AO, 1031, 1054 Toe, 1054 Tomography, 1404–1411 flow imaging and, 416 ground penetrating radar and, 475–476 image formation in, 571 low resolution electromagnetic tomography (LORETA), 204–208 in medical imaging, 743 in radiographic imaging, 1068 single photon emission computed tomography (SPECT), 1310–1327 terahertz electric field imaging and, 1399–1400 Tone, 598–616, 1054
Tone burst wave mode, in scanning acoustic microscopy (SAM), 1231, 1233 Toner, in electrophotography, 301, 312, 313–315, 325–329 Top hat transforms, 430 Topographic imaging technology, 199–201 TOPS software, 708 Toroidal coils, deflection yoke, 41–42 Total internal reflectance (TIR), 238–239 Total scattering cross section, 250 Tournachon, Gaspard Felix, 773 TRACE telescopes, 1507–1508 TRACKERR, 1158 Tracking, radar and over-the-horizon (OTH) radar, 1148, 1152 Trailer, 1055 Trajectories, particle detector imaging, 1157 Trajectory effects, gravity imaging, 444 Transceivers in magnetic resonance imaging (MRI), 999 in terahertz electric field imaging, 1399–1400 Transducers, 1, 1418–1419, 1424–1429 Transfer function, 264, 575 Transfer process, in electrophotography, 322–324 Transverse electromagnetic modes (TEM), 392 Transform coding, compression, 153–154 Transformation, compression, 153 Transfusion, in electrophotography, 322 Transistors, scanning capacitance microscope (SCM) analysis, 23 Transition, 1055 Transitions, atomic, 215 Translation invariant operators, 431–436 Transmission, 527–529, 548–550, 561, 1404 Transmission electron microscopes (TEM), 23, 87, 93, 262–274 Transmission grating, X-ray telescopes, 1505 Transmission holograms, 507–508 Transmission line model (TLM), 937 Transmittance, 51, 54, 58, 236–237, 527–529, 783, 803, 1072, 1095 Transmitters, lidar, 871–872 Transparency views, in three-dimensional imaging, 1333
Transverse chromatic aberration, 545 Transverse electric or magnetic waves, 235 Transverse magnification, 1076 Transverse viewing, in three-dimensional imaging, 1336 Transverse waves, 212 Trapping, scanning capacitance microscope (SCM) analysis, 23 Traps, silver halide, 1273–1275 Traveling ionospheric disturbance (TID), 1144 Traveling matte, 1055 Trellis coding, 1390 TREMBLE lightning locators, 909 Triangle, 1055 Triangulation, 571, 572–573, 908 Triarylmethyl radicals, 289 Triboelectrification, 944 Trichromatic color theory, television, 147–148 Trichromatic color vision, 567 Trichromatic receptors, human vision and color vision, 519 Tricolor image processing systems, 101–102 Triiodide, 1213 Trims, 1055 Trinitron electron gun, 40 Triphenylamine, 1179 Tripods, 1030 Tristimulus values, 102, 148, 531–534, 537 Tritanopia, 522–523 Tropical Rainfall Measuring Mission (TRMM), 660, 771–772, 890–904, 929, 932–935, 1473 Truck, 1055 Trucks, 1030 True color mode, cathode ray tube (CRT), 174 TSUPREM4 calibration, 28 Tube length, microscopy, 1115 Tungsten filaments, 222 Tungsten light, 1055 Tunics of human eye, 746 Turbulent flow, flow imaging, 405 Twinning, silver halide, 1266–1267 Twisted nematic (TN) liquid crystal displays, 959–961 Two-beam dynamic theory for crystals, 281–284 Two-dimensional Fourier transforms, 1104–1105 Two-dimensional imaging backlight systems for, 1339 ground penetrating radar and, 471–472 in infrared imaging, 809
in magnetic resonance imaging (MRI), 983, 987–988 in search and retrieval systems, 623, 628–630 terahertz electric field imaging and, 1398 Two-point resolution limit, 1087 Two-positive imaging, gravure printing, 461 Two-scale relations, 1446 Two-slit interference, 242–243 Type 500/600 instant films, 847 Type C videotape, 1055 Type K/T/U/Y or Z core, 1055
U
U space representation, 5–6 U2 aerial surveillance planes, 775–776 UC SIM, 478–484 Ultra high-frequency (UHF) television, 1362 Ultra Panavision, 1031 Ultramicroelectrodes (UME), 1248–1259 Ultrasonic cleaner, 1055 Ultrasonography, ultrasound, 1412–1435 in endoscopy, 338–340 image formation in, 571, 573 in magnetic resonance imaging (MRI) vs., 983 in medical imaging, 745 in scanning acoustic microscopy (SAM), 1228 Ultraviolet radiation, 218, 219, 239, 356, 1055 art conservation and analysis using, 661, 668–672 electron paramagnetic resonance (EPR) imaging for, 296 extreme ultraviolet imaging (EUV), 1005–1006 far ultraviolet imaging of proton/electron auroras, 1016–1020 fluorescence microscopy, 1135–1137 gravure printing, 461 photodetectors and, 1196 radar and over-the-horizon (OTH) radar, 1143 Ultraviolet catastrophe, 211 Uncertainty principle, 215, 259 Uncrossed viewing, in three-dimensional imaging, 1336 Undulator magnet, 221 Uniform Chromaticity Scale (UCS), 535 Universal leader, 1055 University of Chicago, secondary ion mass spectroscopy (SIMS) in (UC SIM), 478–479 Unmanned aerial vehicles (UAVs), 780 Unsharp masks, in forensic and criminology research, 725 Unsqueezed print, 1055 Upatnieks, J., 504 Upward continuation, gravity imaging, 451 Useful yield, secondary ion mass spectroscopy (SIMS), 477 UVW and U*V*W* coordinate systems, 108
V V number (See also Abbe number), 234 Vacuum and wave equation, 212 Value, 103 Valve rollers, 1055 Van Allen belts, energetic neutral atom (ENA) imaging, 1006–1010 Van Dyke Company, 456 Variable area sound track, 1055 Variable density sound track, 1055 Variable length coding, 1388 Varifocal mirrors, in three-dimensional imaging, 1342–1343 Vectograph three-dimensional imaging, 1331 Vector quantization, compression, 154, 633 Vegetation, geologic imaging, 653 Velocimetry, 413–416 Velocity effects, 228, 764 flow imaging and, 413–416 planar laser-induced fluorescence (PLIF) in, 863 spectral line analysis of, 685–686 Verifax, 299 Versatec, 299 Vertical derivatives, gravity imaging, 452 Vertical disparity, in three-dimensional imaging, 1329 Vertical interval time code (VITC), 1055 Very high-frequency (VHF) television, 1362 Very Large Array Radio Telescope, 693 Vesicular films, art conservation and analysis using, 662 Vestigial sideband (VSB) television, 1362, 1389
VGA video, in field emission displays (FED), 382 Vibration, molecular, 216 Vibrational relaxation, 255 Video (See also Motion pictures; Television), 1385–1388 authentication techniques, 740 cameras for, 1029–1031 component video standards, 1380–1382 compressed video, 1385–1386 digital (See Digital video) Digital Video Broadcast (DVB), 1392 in forensic and criminology research, 709–714 format conversion in, 720–722 group of pictures (GOP) in, 1386 high-speed photography and, 498–499 I, P, and B frames in, 1387 photoconductor cameras, 1174 Polachrome, 848 Polavision, 848 surveillance imaging using, 714–715 Video assist, 1031 VideoDisc, 16 Videophone, 156–157 Videotape editing, 1035 Viewer, 1055 Viewfinders, 1029, 1346 Viewing angle, 387, 967 Viewing distance, 1347 ViewMaster, 1330, 1331 Vignetting, 1055 Virtual image, 1040, 1328, 1330 Virtual phase CCD (VPCCD), 1198 Virtual states, 259 Virtual transitions, 258–259 Visible Human Project, search and retrieval systems, 616–617 Visible light, 218, 219, 356, 665–666, 782, 803, 1072, 1393 Vision tests, 747 VisionDome, 1335–1336 VistaVision, 1031 Visual angle, human vision, 747 Visual areas, 516, 518, 563, 565, 569 Visual Arts System for Archiving and Retrieval of Images (VASARI), 663–664 Visual cortex, 65, 72, 563–570 Visual field, human vision, 566 Visual field mapping, in magnetic field imaging, 975 Visual information rate, 50, 73–74 Visual magnification, 1077 Visual quality, 74–75 Visualization technology, 773, 1327
VisualSEEk, 618, 621–622, 624, 627, 629, 630 Vitascope, 1022 Voice over, in motion pictures, 1033, 1055 Voids, 604 Volcanic activity, geologic imaging, 651 Volume grating, holography, 508 Volume imaging, holography, 507–508 Volume Imaging Lidar, in meteorological research, 769 Volumetric displays, in three-dimensional imaging, 1341–1343 von Ardenne, Manfred, 262 von Laue interference function, 279 von Uchatius, Franz, 1022 VORTEX radar, 1471 VREX micropolarizers, in three-dimensional imaging, 1334–1335
W Wall eyed, in three-dimensional imaging, 1330 Warm up, liquid crystal displays (LCDs), 184 Waste management, geologic imaging, 656–659 Water Cerenkov counters, particle detector imaging, 1162 Watermarking, digital (See Digital watermarking) Watershed transforms, 430, 587, 646 Watts, 524 Wave aberration, 542–544 Wave equation, in transmission electron microscopes (TEM), 212 Wave fronts, 212, 243, 1083, 1084, 1086, 1090 Wave number, 213 Wave propagation, 220–231 in ground penetrating radar, 466, 467–469 Wave vector transfer (Q), 251 Wave vs. particle behavior of light, 210–211 Waveform monitors, television, 1361 Waveform repetition frequency (WRF), radar and over-the-horizon (OTH) radar, 1147, 1150 Waveforms, 212, 285, 1147, 1150–1151 Waveguides, radar, 1452 Wavelength, 212, 1072, 1109 Wavelength analysis, 448 Wavelet coding, compression, 154
Wavelet transforms, 1444–1450 Wavelets, 243, 622 Wax melt printers, 190–191, 195 Weak beam images, in transmission electron microscopes (TEM), 270 Weather radar, 1450–1474 Weave, 1055 Web-fed presses, gravure printing, 459–460 Weber–Fechner law, 747 Weber fractions, quality metrics, 611 Weber’s law, 611 Wedgwood, Josiah, 1345 Weighting, amplitude, 3 Wet-gate printer, 1055 WHAT IF software, 708 Whisk broom scanners, 806 White, 219 White balance, 520 White field response, flow imaging, 397 White light, 219 White point normalization, 533 White uniformity, cathode ray tube (CRT), 35 White, reference, 102 Whole field imaging, 1072 Wide-angle lens, 1347, 1354 Wide-screen, 1031, 1055 Wien displacement law, 222, 803 Wiener filters, 49, 69, 71, 73, 76, 80, 167, 1322 Wiener matrix, 60 Wiener restoration, 68, 70–71, 74–75, 82 Wiggler magnet, 221 Wild, 1055 Wind profiling radar, 1149, 1469–1471 Winding, 1055 Window thermal testing, using infrared imaging, 815 Wipe, 1055 Wire chamber scintillation tracking, 1159–1162, 1168 WKB approximation, in charged particle optics, 89 Wold components, in search and retrieval systems, 623 Wollaston prisms, 1106 Wolter telescopes, 1499–1501 Work print, 1056 Working distance, microscopy, 1116 WorkWall three-dimensional imaging, 1335 World Geodetic System, 445 Wright, Wilbur, 773 Write black, in electrophotography, 317
Write gates, SQUID sensors, 13–14 Write white, in electrophotography, 317 Wynne, Klaas, 1400–1401
X X-ray analysis (EDX), 262, 478 X-ray astronomy, 219 X-ray crystallography, 696–699 X-ray diffractometers, 244 X-ray Evolving Universe Satellite, 1509 X-ray fluorescence (XRF), 676–677, 1475–1495 X-ray interferometric telescopes, 1503–1504 X-ray telescopes, 239, 1495–1509 X-ray transform, 1404 X-rays, 210–211, 218, 219, 221, 224, 239, 242, 249, 256–260, 272, 350, 590, 803, 1067, 1393 Array of Low Energy X Ray Imaging Sensors (ALEXIS), 905, 929 art conservation and analysis using, 672–680 astronomy science and, 683, 688 in biochemical research, 694, 696–699, 705 Bragg reflection in, 244 image formation in, 572 in medical imaging, 743, 745, 748, 752–753, 756 non-silver output in, 676 phosphor thermography, 865 photodetectors and, 1197 radar and over-the-horizon (OTH) radar, 1143 sources of, 1477 in tomography, 1404 X-ray Evolving Universe Satellite, 1509 X-ray fluorescence imaging, 1475–1495 X-ray interferometric telescopes, 1503–1504 X-ray telescope, 1495–1509 Xenon arc, 1056 Xenon lamps, 1037 Xerography (See also Electrophotography), 299, 1174 Xeroradiography, 312 Xerox copiers, 574 Xerox Corporation, 299 Xerox Docu Tech, 300, 302, 303 XMM Newton telescope, 1508 XYZ coordinate system, 107, 532, 537, 619
Y
Yellow, 1056 Yellow couplers, color photography, 134 YIQ coordinate system, 106–107, 149 Young, Thomas, 242–243 Yttrium aluminum garnet (YAG) laser, 391 YUV coordinate system, 106–107, 149
Z
Z contrast imaging, 277 Z dependence, 478 Zeeman effect, 217, 218 Zeeman states, 217 Zeiss, Carl, 1106 Zernike moments, 627 Zernike polynomials, 542–543 Zernike, Frits, 1106, 1128
Zero frame reference mark, 1056 Zero padding, 68 Zero power (afocal) systems, 1074, 1079 Zinc oxide, 381 Zoetropic effect, 1022 Zoom in/out, 1056 Zoom lens, 1029 Zoylacetanilides, 134 Zweiton, 1365 Zwitterions, 124–125