Signal Processing and Performance Analysis for Imaging Systems
S. Susan Young, Ronald G. Driggers, and Eddie L. Jacobs
artechhouse.com
Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the U.S. Library of Congress.
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library.
ISBN-13: 978-1-59693-287-6
Cover design by Igor Valdman
© 2008 ARTECH HOUSE, INC. 685 Canton Street Norwood, MA 02062 All rights reserved. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher. All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
To our families
Contents

Preface

PART I  Basic Principles of Imaging Systems and Performance

CHAPTER 1  Introduction
1.1  “Combined” Imaging System Performance
1.2  Imaging Performance
1.3  Signal Processing: Basic Principles and Advanced Applications
1.4  Image Resampling
1.5  Super-Resolution Image Reconstruction
1.6  Image Restoration—Deblurring
1.7  Image Contrast Enhancement
1.8  Nonuniformity Correction (NUC)
1.9  Tone Scale
1.10 Image Fusion
References

CHAPTER 2  Imaging Systems
2.1  Basic Imaging Systems
2.2  Resolution and Sensitivity
2.3  Linear Shift-Invariant (LSI) Imaging Systems
2.4  Imaging System Point Spread Function and Modulation Transfer Function
     2.4.1  Optical Filtering
     2.4.2  Detector Spatial Filters
     2.4.3  Electronics Filtering
     2.4.4  Display Filtering
     2.4.5  Human Eye
     2.4.6  Overall Image Transfer
2.5  Sampled Imaging Systems
2.6  Signal-to-Noise Ratio
2.7  Electro-Optical and Infrared Imaging Systems
2.8  Summary
References

CHAPTER 3  Target Acquisition and Image Quality
3.1  Introduction
3.2  A Brief History of Target Acquisition Theory
3.3  Threshold Vision
     3.3.1  Threshold Vision of the Unaided Eye
     3.3.2  Threshold Vision of the Aided Eye
3.4  Image Quality Metric
3.5  Example
3.6  Summary
References

PART II  Basic Principles of Signal Processing

CHAPTER 4  Basic Principles of Signal and Image Processing
4.1  Introduction
4.2  The Fourier Transform
     4.2.1  One-Dimensional Fourier Transform
     4.2.2  Two-Dimensional Fourier Transform
4.3  Finite Impulse Response Filters
     4.3.1  Definition of Nonrecursive and Recursive Filters
     4.3.2  Implementation of FIR Filters
     4.3.3  Shortcomings of FIR Filters
4.4  Fourier-Based Filters
     4.4.1  Radially Symmetric Filter with a Gaussian Window
     4.4.2  Radially Symmetric Filter with a Hamming Window at a Transition Point
     4.4.3  Radially Symmetric Filter with a Butterworth Window at a Transition Point
     4.4.4  Radially Symmetric Filter with a Power Window
     4.4.5  Performance Comparison of Fourier-Based Filters
4.5  The Wavelet Transform
     4.5.1  Time-Frequency Wavelet Analysis
     4.5.2  Dyadic and Discrete Wavelet Transform
     4.5.3  Condition of Constructing a Wavelet Transform
     4.5.4  Forward and Inverse Wavelet Transform
     4.5.5  Two-Dimensional Wavelet Transform
     4.5.6  Multiscale Edge Detection
4.6  Summary
References

PART III  Advanced Applications

CHAPTER 5  Image Resampling
5.1  Introduction
5.2  Image Display, Reconstruction, and Resampling
5.3  Sampling Theory and Sampling Artifacts
     5.3.1  Sampling Theory
     5.3.2  Sampling Artifacts
5.4  Image Resampling Using Spatial Domain Methods
     5.4.1  Image Resampling Model
     5.4.2  Image Rescale Implementation
     5.4.3  Resampling Filters
5.5  Antialias Image Resampling Using Fourier-Based Methods
     5.5.1  Image Resampling Model
     5.5.2  Image Rescale Implementation
     5.5.3  Resampling System Design
     5.5.4  Resampling Filters
     5.5.5  Resampling Filters Performance Analysis
5.6  Image Resampling Performance Measurements
5.7  Summary
References

CHAPTER 6  Super-Resolution
6.1  Introduction
     6.1.1  The Meaning of Super-Resolution
     6.1.2  Super-Resolution for Diffraction and Sampling
     6.1.3  Proposed Nomenclature by IEEE
6.2  Super-Resolution Image Restoration
6.3  Super-Resolution Image Reconstruction
     6.3.1  Background
     6.3.2  Overview of the Super-Resolution Reconstruction Algorithm
     6.3.3  Image Acquisition—Microdither Scanner Versus Natural Jitter
     6.3.4  Subpixel Shift Estimation
     6.3.5  Motion Estimation
     6.3.6  High-Resolution Output Image Reconstruction
6.4  Super-Resolution Imager Performance Measurements
     6.4.1  Background
     6.4.2  Experimental Approach
     6.4.3  Measurement Results
6.5  Sensors That Benefit from Super-Resolution Reconstruction
     6.5.1  Example and Performance Estimates
6.6  Performance Modeling and Prediction of Super-Resolution Reconstruction
6.7  Summary
References

CHAPTER 7  Image Deblurring
7.1  Introduction
7.2  Regularization Methods
7.3  Wiener Filter
7.4  Van Cittert Filter
7.5  CLEAN Algorithm
7.6  P-Deblurring Filter
     7.6.1  Definition of the P-Deblurring Filter
     7.6.2  Properties of the P-Deblurring Filter
     7.6.3  P-Deblurring Filter Design
7.7  Image Deblurring Performance Measurements
     7.7.1  Experimental Approach
     7.7.2  Perception Experiment Result Analysis
7.8  Summary
References

CHAPTER 8  Image Contrast Enhancement
8.1  Introduction
8.2  Single-Scale Process
     8.2.1  Contrast Stretching
     8.2.2  Histogram Modification
     8.2.3  Region-Growing Method
8.3  Multiscale Process
     8.3.1  Multiresolution Analysis
     8.3.2  Contrast Enhancement Based on Unsharp Masking
     8.3.3  Contrast Enhancement Based on Wavelet Edges
8.4  Contrast Enhancement Image Performance Measurements
     8.4.1  Background
     8.4.2  Time Limited Search Model
     8.4.3  Experimental Approach
     8.4.4  Results
     8.4.5  Analysis
     8.4.6  Discussion
8.5  Summary
References

CHAPTER 9  Nonuniformity Correction
9.1  Detector Nonuniformity
9.2  Linear Correction and the Effects of Nonlinearity
     9.2.1  Linear Correction Model
     9.2.2  Effects of Nonlinearity
9.3  Adaptive NUC
     9.3.1  Temporal Processing
     9.3.2  Spatio-Temporal Processing
9.4  Imaging System Performance with Fixed-Pattern Noise
9.5  Summary
References

CHAPTER 10  Tone Scale
10.1  Introduction
10.2  Piece-Wise Linear Tone Scale
10.3  Nonlinear Tone Scale
     10.3.1  Gamma Correction
     10.3.2  Look-Up Tables
10.4  Perceptual Linearization Tone Scale
10.5  Application of Tone Scale to Enhanced Visualization in Radiation Treatment
     10.5.1  Portal Image in Radiation Treatment
     10.5.2  Locating and Labeling the Radiation and Collimation Fields
     10.5.3  Design of the Tone Scale Curves
     10.5.4  Contrast Enhancement
     10.5.5  Producing the Output Image
10.6  Tone Scale Performance Example
10.7  Summary
References

CHAPTER 11  Image Fusion
11.1  Introduction
11.2  Objectives for Image Fusion
11.3  Image Fusion Algorithms
     11.3.1  Superposition
     11.3.2  Laplacian Pyramid
     11.3.3  Ratio of a Lowpass Pyramid
     11.3.4  Perceptual-Based Multiscale Decomposition
     11.3.5  Discrete Wavelet Transform
11.4  Benefits of Multiple Image Modes
11.5  Image Fusion Quality Metrics
     11.5.1  Mean Squared Error
     11.5.2  Peak Signal-to-Noise Ratio
     11.5.3  Mutual Information
     11.5.4  Image Quality Index by Wang and Bovik
     11.5.5  Image Fusion Quality Index by Piella and Heijmans
     11.5.6  Xydeas and Petrovic Metric
11.6  Imaging System Performance with Image Fusion
11.7  Summary
References

About the Authors

Index
Preface

In today's consumer electronics market, where a 5-megapixel camera is no longer considered state-of-the-art, signal and image processing algorithms run in real time and are widely used. They stabilize images, provide super-resolution, adjust for detector nonuniformities, reduce noise and blur, and generally improve camera performance for those of us who are not professional photographers. Most of these signal and image processing techniques are company proprietary, and their details are never revealed to outside scientists and engineers. In addition, it is not necessary for the performance of these systems (including the algorithms) to be determined, since the metric of success is whether the consumer likes the product and buys the device.

In other imaging communities, such as military imaging systems (which, at a minimum, include visible, image intensifier, and infrared systems) and medical imaging devices, it is extremely important to determine the performance of the imaging system, including the signal and image processing techniques. In military imaging systems that involve target acquisition and surveillance/reconnaissance, the performance of an imaging system determines how effectively the warfighter can accomplish his or her mission. In medical systems, the imaging system performance determines how accurately a diagnosis can be provided. Signal and image processing plays a key role in the performance of these imaging systems and, in the past 5 to 10 years, has become a key contributor to increased imaging system performance. There is a great deal of government funding in signal and image processing for imaging system performance, and the literature is full of algorithms developed by universities and government laboratories. There are still a great number of industry algorithms that, overall, are considered company proprietary. We focus on those in the literature and those algorithms that can be generalized in a nonproprietary manner.

There are numerous books in the literature on signal and image processing techniques, algorithms, and methods. The majority of these books emphasize the mathematics of image processing and how it is applied to image information. Very few of the books address the overall imaging system performance when signal and image processing is considered a component of the imaging system. Likewise, there are many books in the area of imaging system performance that consider the optics, the detector, and the displays in the system and how the system performance behaves with changes or modifications of these components. There is very little book content where signal and image processing is included as a component of the overall imaging system performance. This is the gap that we have attempted to fill with this book. While algorithm development has exploded in the past 5 to 10 years,
the system performance aspects are relatively new and not quite fully understood. While the focus of this book is to help the scientist and engineer begin to understand that these algorithms are really an imaging system component, and to help in the system performance prediction of imaging systems with these algorithms, the performance material is new and will undergo dramatic improvements in the next 5 years. We have chosen to address signal and image processing techniques that are not new, but their real-time implementation in military and medical systems is relatively new, and the performance prediction of systems with these algorithms is definitely new. Some algorithms, such as electronic stabilization and turbulence correction, are not addressed. There are current programs in algorithm development that will provide great advances in algorithm performance in the next few years, so we decided not to spend time on these particular areas.

It is worth mentioning that there is a community called "computational imaging" where, instead of using signal/image processing to improve the performance of an existing imaging system approach, signal processing is an inherent part of the electro-optical design process for image formation. The field includes unconventional imaging systems and unconventional processing, where the performance of the collective system design is beyond any conventional system approach. In many cases, the resulting image is not important. The goal of the field is to maximize system task performance for a given electro-optical application using nonconventional design rules (with signal processing and electro-optical components) through the exploitation of various degrees of freedom (space, time, spectrum, polarization, dynamic range, and so forth). Leaders in this field include Dennis Healey at DARPA, Ravi Athale at MITRE, Joe Mait at the Army Research Laboratory, Mark Mirotznick at Catholic University, and Dave Brady at Duke University. These researchers and others are forging a new path for the rest of us and have provided some very stimulating experiments and demonstrations in the past 2 or 3 years. We do not address computational imaging in this book, as the design and approach methods are still a matter of research and, as always, it will be some time before system performance is addressed in a quantitative manner.

We would like to thank a number of people for their thoughtful assistance in this work. Dr. Patti Gillespie at the Army Research Laboratory provided inspiration and encouragement for the project. Rich Vollmerhausen has contributed more to military imaging system performance modeling over the past 10 years than any other researcher, and his help was critical to the success of the project. Keith Krapels and Jonathan Fanning both assisted with the super-resolution work. Khoa Dang, Mike Prarie, Richard Moore, Chris Howell, Stephen Burks, and Carl Halford contributed material for the fusion chapter. There are many others who worked signal processing issues and with whom we collaborated through research papers, including Nicole Devitt, Tana Maurer, Richard Espinola, Patrick O'Shea, Brian Teaney, Louis Larsen, Jim Waterman, Leslie Smith, Jerry Holst, Gene Tener, Jennifer Parks, Dean Scribner, Jonathan Schuler, Penny Warren, Alan Silver, Jim Howe, Jim Hilger, and Phil Perconti. We are grateful for the contributions that all of these people have provided over the years. We (S. Susan Young and Eddie Jacobs) would like to thank our coauthor, Dr. Ronald G. Driggers, for his suggestion of writing this book and his encouragement in this venture. Our understanding and appreciation of the significance of system performance started from collaborating with him. S. Susan Young would like to thank Dr.
Hsien-Che Lee for his guidance and help early in her career in signal and image processing. On a personal side, we authors are very thankful to our families for their support and understanding.
PART I
Basic Principles of Imaging Systems and Performance
CHAPTER 1
Introduction

1.1 “Combined” Imaging System Performance

The “combined” performance of imaging system hardware (the sensor) and software (the signal processing) is extremely important. Imaging system hardware is designed primarily to form a high-quality image from source emissions under a large variety of environmental conditions. Signal processing is used to help highlight or extract information from the images that are generated by an imaging system. This processing can be automated for decision-making purposes, or it can be used to enhance the visual acuity of a human looking through the imaging system. Performance measures of an imaging system have long been excellent tools for better design and understanding of the imaging system. However, the performance of an imaging system aided by signal processing has not been widely considered as a means of improving the image quality delivered jointly by imaging systems and signal processing algorithms. Imaging systems can generate images with low contrast, high noise, blurring, or corrupted or lost high-frequency details, among other degradations. How does the performance of a low-cost imaging system with the aid of signal processing compare with that of an expensive imaging system? Is it worth investing in higher image quality by improving the imaging system hardware or by developing the signal processing software? The topic of this book is relating the ability to extract information from an imaging system with the aid of signal processing to the overall performance of the imaging system.
1.2 Imaging Performance

Understanding the image formation and recording process helps in understanding the factors that affect image performance and therefore helps the design of imaging systems and signal processing algorithms. The image formation process and the sources of image degradation, such as loss of useful high-frequency details, noise, or a low-contrast target environment, are discussed in Chapter 2. Methods of determining image performance are important tools in determining the merits of imaging systems and signal processing algorithms. Image performance determination can be performed via subjective human perception studies or image performance modeling. Image performance prediction and the role of image performance modeling are also discussed in Chapter 3.
1.3 Signal Processing: Basic Principles and Advanced Applications

The basic signal processing principles, including the Fourier transform, the wavelet transform, finite impulse response (FIR) filters, and Fourier-based filters, are discussed in Chapter 4. In an image formation and recording process, many factors affect sensor performance and image quality, and these can result in loss of high-frequency information or low contrast in an image. Several common causes of low image quality are the following:

• Many low-cost visible and thermal sensors spatially or electronically undersample an image. Undersampling results in aliased imagery in which subtle/detailed information (high-frequency components) is lost.
• An imaging system's blurring function (sometimes called the point spread function, or PSF) is another common factor in the reduction of high-frequency components in the acquired imagery and results in blurred images.
• Low-cost sensors and environmental factors, such as lighting sources or background complexities, result in low-contrast images.
• Focal plane array (FPA) sensors have detector-to-detector variability from the FPA fabrication process, which causes fixed-pattern noise in the acquired imagery.
There are many signal processing applications for the enhancement of imaging system performance. Most of them attempt to enhance the image quality or remove degradation phenomena. Specifically, these applications try to recover the useful high-frequency components that are lost or corrupted in the image and attempt to suppress the undesired high-frequency components, which are noise. In Chapters 5 to 11, the following classes of signal processing applications are considered:

1. Image resampling;
2. Super-resolution image reconstruction;
3. Image restoration—deblurring;
4. Image contrast enhancement;
5. Nonuniformity correction (NUC);
6. Tone scale;
7. Image fusion.

1.4 Image Resampling

The concept of image resampling originates from the sampled imager. The discussion in this chapter relates image resampling to image display and reconstruction from the sampled points of a single image. These topics provide the reader with a fundamental understanding that the way an image is processed and displayed is just as important as the blur and sampling characteristics of the sensor. They also provide a background on undersampled imaging for the discussion of super-resolution image reconstruction in the following chapter. In signal processing, image resampling is
also called image decimation or image interpolation, according to whether the goal is to reduce or enlarge the size (or resolution) of a captured image. It can provide image values that were not recorded by the imaging system but are calculated from the neighboring pixels. Image resampling does not increase the inherent information content in the image, but a poor image display reconstruction function can reduce the overall imaging system performance. Image resampling algorithms include spatial domain and spatial-frequency domain (Fourier-based windowing) methods. The important considerations in image resampling include the image resampling model, the image rescale implementation, and the resampling filters, especially the antialias image resampling filter. These algorithms, examples, and image resampling performance measurements are discussed in Chapter 5.
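As a concrete illustration of spatial-domain resampling, the sketch below enlarges an image by bilinear interpolation, computing each new pixel from its four recorded neighbors. It is a minimal example of the general idea only, not one of the resampling filters developed in Chapter 5; the function names and the use of NumPy are our own choices.

```python
import numpy as np

def bilinear_resample(img, out_rows, out_cols):
    """Resample a 2-D image to a new size by bilinear interpolation.

    Each output pixel is computed from its four nearest input neighbors,
    i.e., values not recorded by the sensor are estimated from neighboring
    pixels (no new information is created).
    """
    in_rows, in_cols = img.shape
    # Map output pixel centers back into input coordinates.
    r = np.linspace(0, in_rows - 1, out_rows)
    c = np.linspace(0, in_cols - 1, out_cols)
    r0 = np.floor(r).astype(int); c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, in_rows - 1); c1 = np.minimum(c0 + 1, in_cols - 1)
    fr = (r - r0)[:, None]   # fractional row offsets (column vector)
    fc = (c - c0)[None, :]   # fractional column offsets (row vector)
    top = (1 - fc) * img[np.ix_(r0, c0)] + fc * img[np.ix_(r0, c1)]
    bot = (1 - fc) * img[np.ix_(r1, c0)] + fc * img[np.ix_(r1, c1)]
    return (1 - fr) * top + fr * bot

if __name__ == "__main__":
    small = np.arange(16, dtype=float).reshape(4, 4)
    big = bilinear_resample(small, 8, 8)   # 2x interpolation
    print(big.shape)                       # (8, 8)
```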
1.5 Super-Resolution Image Reconstruction

The loss of high-frequency information in an image can be due to many factors. Many low-cost visible and thermal sensors spatially or electronically undersample an image. Undersampling results in aliased imagery in which the high-frequency components are folded into the low-frequency components of the image. Consequently, subtle/detailed information (high-frequency components) is lost in these images. Super-resolution image reconstruction can produce high-resolution images using existing low-cost imaging devices from a sequence, or a few snapshots, of low-resolution images. Since the undersampled images have subpixel shifts between successive frames, they represent different information from the same scene. Therefore, the information contained in an undersampled image sequence can be combined to obtain an alias-free (high-resolution) image. Super-resolution image reconstruction from multiple snapshots provides far more detailed information than any interpolated image from a single snapshot. Figure 1.1 shows an example of a high-resolution (alias-free) infrared image that is obtained from a sequence of low-resolution (aliased) input images having subpixel shifts among them.
Figure 1.1 Example of super-resolution image reconstruction: (a) input sequence of aliased infrared images having subpixel shifts among them; and (b) output alias-free (high-resolution) image in which the details of tree branches are revealed.
The first step in a super-resolution image reconstruction algorithm is to estimate the subpixel shifts of each frame with respect to a reference frame. The second step is to increase the effective spatial sampling by operating on the sequence of low-resolution subpixel-shifted images. There are both spatial domain and spatial frequency domain methods for the subpixel shift estimation and for the generation of the high-resolution output images. These algorithms, examples, and the image performance are discussed in Chapter 6.
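The two steps can be illustrated with a minimal sketch: frame-to-frame shifts are estimated with Fourier-domain phase correlation, and the registered frames are then accumulated onto a finer grid (shift-and-add). This idealized sketch assumes noise-free frames whose shifts land on the upsampled grid; it is not the reconstruction algorithm of Chapter 6, and all function names are ours.

```python
import numpy as np

def phase_correlation_shift(ref, frame):
    """Whole-pixel (row, col) shift of `frame` relative to `ref`, taken as the
    peak of the phase-correlation surface. (Chapter 6 develops true subpixel
    estimators; this integer version only illustrates the registration step.)"""
    R = np.fft.fft2(frame) * np.conj(np.fft.fft2(ref))
    corr = np.fft.ifft2(R / (np.abs(R) + 1e-12)).real
    dr, dc = np.unravel_index(np.argmax(corr), corr.shape)
    if dr > ref.shape[0] // 2:
        dr -= ref.shape[0]
    if dc > ref.shape[1] // 2:
        dc -= ref.shape[1]
    return dr, dc

def shift_and_add(frames, shifts, factor):
    """Place each low-resolution frame onto a grid `factor` times finer,
    offset by its (row, col) shift in low-resolution pixels, then average
    the accumulated samples (simple shift-and-add reconstruction)."""
    rows, cols = frames[0].shape
    acc = np.zeros((rows * factor, cols * factor))
    cnt = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, shifts):
        r0 = int(round(dy * factor)) % factor
        c0 = int(round(dx * factor)) % factor
        acc[r0::factor, c0::factor] += frame
        cnt[r0::factor, c0::factor] += 1
    cnt[cnt == 0] = 1          # grid phases never sampled are left at zero
    return acc / cnt

# Example: four frames with half-pixel shifts fill a 2x finer grid.
# high = shift_and_add(frames, [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5)], 2)
```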
1.6 Image Restoration—Deblurring

An imaging system's blurring function, also called the point spread function (PSF), is another common factor in the reduction of high-frequency components in the image. Image restoration tries to invert this blurring degradation, but within the bandlimit of the imager (i.e., it enhances the spatial frequencies within the imager band). This includes deblurring images that are degraded by the limitations of a sensor or the environment. An estimate or knowledge of the blurring function is essential to the application of these algorithms. One of the most important considerations in designing a deblurring filter is controlling noise, since noise is likely to be amplified at high frequencies. The amplification of noise results in undesired artifacts in the output image. Figure 1.2 shows examples of image deblurring. One input image [Figure 1.2(a)] contains blur, and the deblurred version of it [Figure 1.2(b)] removes most of the blur. Another input image [Figure 1.2(c)] contains both blur and noise; the noise effect is evident in the deblurred version of it [Figure 1.2(d)]. Image restoration tries to recover the high-frequency information below the diffraction limit while limiting the noise artifacts.
Figure 1.2 Examples of image deblurring: (a) blurred bar image; (b) deblurred version of (a); (c) blurred bar image with noise added; and (d) deblurred version of (c).
The designs of deblurring filters, the noise control mechanisms, examples, and image performance are discussed in Chapter 7.
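One standard deblurring filter with built-in noise control is the Wiener filter, which appears among the methods listed for Chapter 7. The sketch below assumes a known, image-sized, centered PSF and a constant noise-to-signal ratio; both are simplifications of ours, not the book's design.

```python
import numpy as np

def wiener_deblur(blurred, psf, nsr=0.01):
    """Wiener deconvolution: G = conj(H) / (|H|^2 + nsr), O = G * B.

    `psf` must be the same size as the image with its center at the array
    center; `nsr` is a (here constant) noise-to-signal power ratio that
    keeps the filter from amplifying noise where |H| is small.
    """
    H = np.fft.fft2(np.fft.ifftshift(psf / psf.sum()))
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.fft.ifft2(np.fft.fft2(blurred) * G).real
```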
1.7 Image Contrast Enhancement

Image details can also be enhanced by image contrast enhancement techniques in which certain image edges are emphasized as desired. As an example from a medical application, in diagnosing breast cancer from mammograms, radiologists follow the ductal networks to look for abnormalities. However, the number of ducts and the shape of the ductal branches vary with individuals, which makes the visual process of locating the ducts difficult. Image contrast enhancement provides the ability to enhance the appearance of the ductal elements relative to the fatty-tissue surroundings, which helps radiologists visualize abnormalities in mammograms.

Image contrast enhancement methods can be divided into single-scale and multiscale approaches. In the single-scale approach, the image is processed in the original image domain, for example with a simple look-up table. In the multiscale approach, the image is decomposed into multiple resolution scales, and processing is performed in the multiscale domain. Because the information at each scale is adjusted before the image is reconstructed back to the original image intensity domain, the output image contains the desired detail information. The multiscale approach can also be coupled with dynamic range reduction, so that the detail information at different scales can be displayed in one output image.

Localized contrast enhancement (LCE) is the process in which these techniques are applied on a local scale for the management of dynamic range in the image. For example, the sky-to-ground interface in infrared imaging can include a huge apparent temperature difference that occupies most of the image dynamic range. Small targets with smaller signals can be lost, while LCE can reduce the large sky-to-ground interface signal and enhance small target signals (see Figure 8.10 later in this book). Details of the algorithms, examples, and image performance are discussed in Chapter 8.
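A minimal sketch of the detail-emphasis idea is unsharp masking (contrast enhancement based on unsharp masking is treated in Section 8.3.2): a lowpass local mean is subtracted to isolate detail, the detail is amplified, and the result is added back. The single-level form, box-filter size, and gain below are illustrative choices of ours.

```python
import numpy as np

def unsharp_mask(img, box=9, gain=1.5):
    """Boost local detail: out = lowpass + gain * (img - lowpass).

    The lowpass image is a box-filter local mean computed with a separable
    moving average; gain > 1 amplifies the detail (edge) band.
    """
    pad = box // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    kernel = np.ones(box) / box
    # Separable box filter: filter rows, then columns.
    lowpass = np.apply_along_axis(lambda r: np.convolve(r, kernel, "valid"), 1, padded)
    lowpass = np.apply_along_axis(lambda c: np.convolve(c, kernel, "valid"), 0, lowpass)
    return lowpass + gain * (img - lowpass)
```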
1.8 Nonuniformity Correction (NUC)

Focal plane array (FPA) sensors have been used in many commercial and military applications, including both visible and infrared imaging systems, since they have wide spectral responses, compact structures, and cost-effective production. However, each individual photodetector in the FPA has a different photoresponse, due to detector-to-detector variability in the FPA fabrication process [1]. Images that are acquired by an FPA sensor therefore suffer from a common problem known as fixed-pattern noise, or spatial nonuniformity. The technique to compensate for this distortion is called nonuniformity correction (NUC). Figure 1.3 shows an example of a nonuniformity-corrected image obtained from an input image with fixed-pattern noise.

There are two main categories of NUC algorithms, namely, calibration-based and scene-adaptive algorithms. A conventional, calibration-based NUC is the standard two-point calibration, which is also called linear NUC.
Figure 1.3 Example of nonuniformity correction: (a) input image with the fixed-pattern noise shown in the image; and (b) nonuniformity corrected image in which the helicopter in the center is clearly illustrated.
This algorithm estimates the gain and offset parameters by exposing the FPA to two distinct and uniform irradiance levels. The scene-adaptive NUC uses the data acquired in the video sequence and a motion estimation algorithm to register each point in the scene across all of the image frames. In this way, continuous compensation can be applied adaptively for individual detector responses and background changes. These algorithms, examples, and imaging system performance are discussed in Chapter 9.
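The two-point (linear) calibration just described can be sketched as follows: from frames of two uniform irradiance levels, a per-detector gain and offset are computed so that every detector maps the calibration levels to the same output. This is only a minimal sketch of the calibration-based approach; the variable names and target output levels are ours.

```python
import numpy as np

def two_point_nuc(frames_low, frames_high, target_low, target_high):
    """Per-pixel gain/offset from two uniform-irradiance calibrations.

    frames_low / frames_high: stacks of frames (N, rows, cols) viewing
    uniform sources at a low and a high irradiance level.
    target_low / target_high: desired corrected output at those levels.
    """
    mean_low = frames_low.mean(axis=0)      # average out temporal noise
    mean_high = frames_high.mean(axis=0)
    gain = (target_high - target_low) / (mean_high - mean_low)
    offset = target_low - gain * mean_low
    return gain, offset

def apply_nuc(frame, gain, offset):
    """Correct a raw frame with the per-pixel linear map."""
    return gain * frame + offset
```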
1.9 Tone Scale

Tone scale is a technique that improves the image presentation on an output display medium (a softcopy display or a hardcopy print). Tone scale is a mathematical mapping of the image pixel values from the sensor to a region of interest on the output medium. Note that tone scale transforms improve only the appearance of the image, not the image quality itself; the gray-value resolution is still the same. However, a proper tone scale allows the characteristic curve of a display system to match the sensitivity of the human eye and thereby enhance image interpretation task performance. There are various tone scale techniques, including piece-wise linear tone scale, nonlinear tone scale, and perceptual linearization tone scale. These techniques and a tone scale performance example are discussed in Chapter 10.
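As a small example of the nonlinear tone scale techniques listed above (gamma correction implemented through a look-up table), the sketch below remaps 8-bit code values for display. The gamma value is a common display figure used purely for illustration.

```python
import numpy as np

def gamma_lut(gamma, levels=256):
    """Look-up table mapping input code values to gamma-corrected outputs."""
    x = np.arange(levels) / (levels - 1)                 # normalize to [0, 1]
    return np.round((x ** (1.0 / gamma)) * (levels - 1)).astype(np.uint8)

def apply_tone_scale(img8, lut):
    """Remap an 8-bit image through the tone scale LUT (appearance only;
    the underlying gray-level resolution is unchanged)."""
    return lut[img8]

lut = gamma_lut(2.2)                        # typical display gamma, as an example
# corrected = apply_tone_scale(img8, lut)   # img8: uint8 image
```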
1.10 Image Fusion

Because different sensors provide different signature cues from a scene, image fusion has been receiving increasing attention in signal processing, and many applications have been shown to benefit from fusing the images of multiple sensors. Imaging sensor characteristics are determined by the wavebands of the electromagnetic spectrum to which they respond. Figure 1.4 is a diagram of the electromagnetic spectrum with wavelength indicated in metric length units [2]. The most familiar classifications of wavebands are the radiowave, microwave, infrared, visible, ultraviolet, X-ray, and gamma-ray wavebands. Figure 1.5 shows further subdivided wavebands for broadband sensors [3].
Figure 1.4 Electromagnetic spectrum.

Figure 1.5 Subdivided infrared wavebands.
For example, the infrared waveband is divided into near infrared (NIR), shortwave infrared (SWIR), midwave infrared (MWIR), longwave infrared (LWIR), and far infrared. The sensor types are driven by the type of image information that can be exploited within these bands. X-ray sensors can view human bones for disease diagnosis. Microwave and radiowave sensors have good weather penetration in military applications. Infrared sensors detect both temperature and emissivity and are beneficial for night-vision applications. Different subband sensors within the infrared waveband can also provide different information. For example, MWIR sensors respond better to hotter-than-terrestrial objects, while LWIR sensors have a better response to overall terrestrial object temperatures, which are around 300 Kelvin (K). Solar clutter is high in the MWIR during the daytime and is negligible in the LWIR. Figure 1.6 shows an example of fusing MWIR and LWIR images. The road cracks are visible in the LWIR image, but not in the MWIR image. Similarly, the Sun glint is visible in the MWIR image, but not in the LWIR image. The fused image shows both the Sun glint and the road cracks.
Figure 1.6 Example of fusing MWIR and LWIR images. The road cracks are visible in LWIR, but not in MWIR. Similarly, the Sun glint is visible in MWIR image, but not in LWIR. The fused image shows both Sun glint and road cracks.
Many questions of image fusion remain unanswered and open to new research opportunities. Some of the questions involve how to select different sensors to provide better image information from the scene; whether different imaging information can be effectively combined to provide a better cue in the scene; and how to best combine the information. These issues are presented and examples and imaging system performance are provided in Chapter 11.
References

[1] Milton, A. F., F. B. Barone, and M. R. Kruer, "Influence of Nonuniformity on Infrared Focal Plane Array Performance," Optical Engineering, Vol. 24, No. 5, 1985, pp. 855–862.
[2] Richards, A., Alien Vision—Exploring the Electromagnetic Spectrum with Imaging Technology, Bellingham, WA: SPIE Press, 2001.
[3] Driggers, R. G., P. Cox, and T. Edwards, Introduction to Infrared and Electro-Optical Systems, Norwood, MA: Artech House, 1999.
CHAPTER 2
Imaging Systems

In this chapter, basic imaging systems are introduced and the concepts of resolution and sensitivity are explored. This introduction presents helpful background information that is necessary to understand imaging system performance, which is presented in Chapter 3. It also provides a basis for later discussions on the implementation of advanced signal and image processing techniques.
2.1 Basic Imaging Systems

A basic imaging system can be depicted as a cascaded system in which the input signal is optical flux from a target and background and the output is an image presented for human consumption. A basic imaging system is shown in Figure 2.1. The system can begin with the flux leaving the target and the background. For electro-optical systems, and for more sophisticated treatments of infrared systems, the system can even begin with the illumination of the target by external sources. Regardless, the flux leaving the source traverses the atmosphere as shown. This path includes blur from turbulence and scattering and a reduction in the flux due to atmospheric extinction mechanisms such as scattering and absorption. The flux that makes it to the entrance of the optics is then blurred by optical diffraction and aberrations. The flux is also reduced by the optical transmission. The flux is imaged onto a detector array, either scanning or staring. Here, the flux is converted from photons to electrons. There is a quantum efficiency that reduces the signal, and the finite size of the detector imposes a blur on the image. The electronics further reduce, or in some cases enhance, the signal. The display also provides a signal reduction and a blur, due to the finite size of the display element. Finally, the eye consumes the image. The eye has its own inherent blur and noise, which are considered in overall system performance. In some cases, the output of the electronics is processed by an automatic target recognizer (ATR), which is an automated process for detecting and recognizing targets. An even more common process is an aided target recognizer (AiTR), which is more of a cueing process for a human to view the resultant cued image "chips" (small areas containing an object).

All source and background objects above 0K emit electromagnetic radiation associated with the thermal activity on the surface of the object. For terrestrial temperatures (around 300K), objects emit a good portion of their electromagnetic flux in the infrared part of the electromagnetic spectrum. This emission of flux is sometimes called blackbody thermal emission. The human eye views energy only in the visible portion of the electromagnetic spectrum, where the visible band spans wavelengths from 0.4 to 0.7 micrometer (µm).
Figure 2.1 Basic imaging system.
Infrared imaging devices convert energy in the infrared portion of the electromagnetic spectrum into displayable images in the visible band for human use. The infrared spectrum begins at the red end of the visible spectrum, where the eye can no longer sense energy, and spans from 0.7 to 100 µm. The infrared spectrum is, by common convention, broken into five different bands (this may vary according to the application or community). The bands are typically defined in the following way: near-infrared (NIR) from 0.7 to 1.0 µm, shortwave infrared (SWIR) from 1.0 to 3.0 µm, midwave infrared (MWIR) from 3.0 to 5.0 µm, longwave infrared (LWIR) from 8.0 to 14.0 µm, and far infrared (FIR) from 14.0 to 100 µm. These bands are depicted graphically in Figure 2.2, which shows the atmospheric transmission for a 1-kilometer horizontal ground path on a "standard" day in the United States. These types of transmission graphs can be tailored for any condition using sophisticated atmospheric models, such as MODTRAN (from http://www.ontar.com). Note that there are many atmospheric "windows," so that an imager designed with an appropriate band selection can see through the atmosphere.
Figure 2.2 Atmospheric transmission for a 1-kilometer path on a standard U.S. atmosphere day.
The primary difference between a visible-spectrum camera and an infrared imager is the physical phenomenology of the radiation from the scene being imaged. The energy used by a visible camera is predominantly reflected solar energy or some other illuminating energy in the visible spectrum. The energy imaged by infrared imagers, commonly known as forward looking infrareds (FLIRs) in the MWIR and LWIR bands, is primarily self-emitted radiation. From Figure 2.2, the MWIR band has an atmospheric window in the 3- to 5-µm region, and the LWIR band has an atmospheric window in the 8- to 12-µm region. The atmosphere is opaque in the 5- to 8-µm region, so it would be pointless to construct a camera that responds to this waveband. Figure 2.3 illustrates the difference in the source of the radiation sensed by the two types of cameras. The visible image on the left is formed entirely by light that was provided by the Sun, propagated through Earth's atmosphere, reflected off the objects in the scene, traversed a second atmospheric path to the sensor, and was then imaged with a lens and a visible-band detector. A key point here is that the objects in the scene are represented by their reflectivity characteristics. The image characteristics can also change with any change in the atmospheric path or in the source characteristics. The atmospheric path characteristics from the Sun to the objects change frequently because the Sun's angle changes throughout the day and the weather and cloud conditions change. The visible imager characterization model is therefore an extremely difficult multipath problem. The LWIR image on the right side of Figure 2.3 is obtained primarily from the emission of radiation by objects in the scene. The amount of electromagnetic flux depends on the temperature and emissivity of the objects: a higher temperature and a higher emissivity correspond to a higher flux. The image shown is white hot—a whiter point in the image corresponds to a higher flux leaving the object. It is interesting to note that trees have a natural self-cooling process, since a high temperature can damage foliage. Objects that have absorbed a large amount of solar energy are hot and emit large amounts of infrared radiation; this is sometimes called solar loading.
Figure 2.3 Visible image on the left side: reflected flux; LWIR infrared image on the right side: emitted flux. (Images courtesy of NRL Optical Sciences Division.)
The characteristics of the infrared radiation emitted by an object are described by Planck's blackbody law in terms of spectral radiant emittance:

M_λ = ε(λ) c_1 / {λ^5 [e^{c_2/(λT)} − 1]}   W/(cm^2-µm)   (2.1)

where c_1 and c_2 are constants of 3.7418 × 10^4 W-µm^4/cm^2 and 1.4388 × 10^4 µm-K, the wavelength λ is given in micrometers, and ε(λ) is the emissivity of the surface. A blackbody source is defined as an object with an emissivity of 1.0 and is considered a perfect emitter. Source emissions of blackbodies at typical terrestrial temperatures are shown in Figure 2.4. Often, in modeling and system performance assessment, the terrestrial background temperature is assumed to be 300K. The source emittance curves are shown for other temperatures for comparison: one curve corresponds to an object colder than the background, and two curves correspond to temperatures hotter than the background. Planck's equation describes the spectral shape of the source as a function of wavelength. It is readily apparent that the peak shifts to the left (shorter wavelengths) as the body temperature increases. If the temperature of a blackbody were increased to that of the Sun (5,900K), the peak of the spectral shape would decrease to 0.55 µm, or green light (note that this is in the visible band). This peak wavelength is described by Wien's displacement law

λ_max = 2,898 / T   µm   (2.2)
For a terrestrial temperature of 300K, the peak wavelength is around 10 µm. It is important to note that the difference between the blackbody curves is the “signal” in the infrared bands. For an infrared sensor, if the background is at 300K and the target is at 302K, the signal is the difference in flux between the blackbody curves. Signals in the infrared sensor are small riding on very large amounts of background flux. In the visible band, this is not the case. For example, consider the case of a white target on a black background. The black background is generating no signal, while the white target is generating a maximum signal, given the sensor gain has
Blackbody curves for four temps from 290K to 320K
0.005
0.004 0.003
0.002 0.001 0 0
5
10
15
20
25
30
Wavelength (micrometers) 2
Figure 2.4
290
Planck’s blackbody radiation curves.
300
310
320
35
40
2.2 Resolution and Sensitivity
15
been adjusted. Dynamic range may be fully utilized in a visible sensor. For the case of an infrared sensor, a portion of the dynamic range is used by the large background flux radiated by everything in the scene. This flux is never a small value; hence, sensitivity and dynamic range requirements are much more difficult to satisfy in infrared sensors than in visible sensors.
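The quantities in (2.1) and (2.2), and the small target-to-background signal just described, can be checked numerically. The sketch below evaluates Planck's spectral radiant emittance, the Wien peak, and the in-band emittance difference between a 302K target and a 300K background over the 8- to 12-µm LWIR window mentioned earlier, assuming unity emissivity; the numerical details are ours.

```python
import numpy as np

C1 = 3.7418e4   # W-um^4/cm^2
C2 = 1.4388e4   # um-K

def planck_emittance(wavelength_um, temp_k, emissivity=1.0):
    """Spectral radiant emittance M_lambda in W/(cm^2-um), per (2.1)."""
    lam = np.asarray(wavelength_um, dtype=float)
    return emissivity * C1 / (lam**5 * (np.exp(C2 / (lam * temp_k)) - 1.0))

def wien_peak_um(temp_k):
    """Peak wavelength in micrometers, per (2.2)."""
    return 2898.0 / temp_k

# In-band LWIR (8-12 um) emittance for the background and a slightly hotter target.
lam = np.linspace(8.0, 12.0, 2000)

def band_emittance(temp_k):
    return np.trapz(planck_emittance(lam, temp_k), lam)   # W/cm^2 in band

print(wien_peak_um(300.0))                  # ~9.66 um, near 10 um as stated
print(band_emittance(302.0) - band_emittance(300.0))
# The 2K "signal" is a small difference riding on the large 300K background flux.
```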
2.2 Resolution and Sensitivity

There are three general categories of infrared sensor performance characterization. The first is sensitivity and the second is resolution. When end-to-end, or human-in-the-loop (HITL), performance is required, the third type of performance characterization describes the visual acuity of an observer looking through a sensor; it is discussed in Chapter 3. The first two are related to the hardware and software that comprise the system, while the third includes both the sensor and the observer.

Sensitivity is determined through radiometric analysis of the scene/environment and the quantum electronic properties of the detectors. Resolution is determined by analysis of the physical optical properties, the detector array geometry, and the other degrading components of the system in much the same manner as complex electronic circuit/signal analysis. Sensitivity describes how the sensor performs with respect to input signal level. It relates the noise characteristics, the responsivity of the detector, the light gathering of the optics, and the dynamic range/quantization of the sensor. Radiometry describes how much light leaves the object and background and is collected by the detector. Optical design and detector characteristics are of considerable importance in sensor sensitivity analysis. In infrared systems, the noise equivalent temperature difference (NETD) is often a first-order description of the system sensitivity. The three-dimensional (3-D) noise model [1] provides more detailed representations of sensitivity parameters. In visible systems, the noise equivalent irradiance (NEI) is a similar term that is used to determine the sensitivity of the system.

The second type of measure is resolution. Resolution is the ability of the sensor to image small targets and to resolve fine detail in large targets. The modulation transfer function (MTF) is the most widely used resolution descriptor in infrared systems. Alternatively, resolution may be specified by a number of descriptive metrics, such as the optical Rayleigh criterion or the instantaneous field-of-view (IFOV) of the detector. While these metrics are component-level descriptions, the system MTF is an all-encompassing function that describes the system resolution.

Sensitivity and resolution can be competing system characteristics, and they are the most important issues in initial studies for a design. For example, given a fixed sensor aperture diameter, an increase in focal length can provide an increase in resolution, but it may decrease sensitivity [1]. Typically, visible-band systems have plenty of sensitivity and are resolution-limited, while infrared imagers have been more sensitivity-limited. With staring infrared sensors, the sensitivity has seen significant improvements. Quite often metrics such as NETD and MTF are considered to be separable. However, in an actual sensor, sensitivity and resolution performance are not independent. As a result, the minimum resolvable temperature difference (MRT or MRTD)
or the sensor contrast threshold function (CTF) has become the primary performance metric for infrared systems. MRT and MRC (minimum resolvable contrast) are quantitative performance measures in terms of both sensitivity and resolution. A simple MRT curve is shown in Figure 2.5. The performance is bounded by the sensor's limits and the observer's limits. The temperature difference, or thermal contrast, required to image details in a scene increases as the detail size decreases. The inclusion of observer performance yields a single sensor performance characterization that describes sensitivity as a function of resolution and includes the human visual system.
Figure 2.5 Sensor resolution and sensitivity.

2.3 Linear Shift-Invariant (LSI) Imaging Systems

A linear imaging system requires two properties [1, 2]: superposition and scaling. Consider an input scene, i(x, y), and an output image, o(x, y). Given that a linear system is described by L{}, then

o(x, y) = L{i(x, y)}   (2.3)

The superposition and scaling properties are satisfied if

L{a i_1(x, y) + b i_2(x, y)} = a L{i_1(x, y)} + b L{i_2(x, y)}   (2.4)

where i_1(x, y) and i_2(x, y) are input scenes and a and b are constants. Superposition, simply described, means that the image of two scenes, such as a target scene and a background scene, is the sum of the individual scenes imaged separately. The simplest example here is that of a point source, as shown in Figure 2.6. The left side of the figure shows the case where a single point source is imaged, then a second point source is imaged, and the two results are summed to give an image of the two point sources.
Figure 2.6 Superposition principle.
The superposition principle states that this sum of point source images would be identical to the resultant image if both point sources were included in the input scene. The second property simply states that an increase in input scene brightness increases the image brightness: doubling a point source brightness would double the image brightness. The linear systems approach is extremely important in imaging systems, since any scene can be represented as a collection of weighted point sources. The output image is the collection of the imaging system responses to the point sources.

In continuous (nonsampled) imaging systems, another property is typically assumed: shift-invariance. Sometimes a shift-invariant system is called isoplanatic. Mathematically stated, the response of a shift-invariant system to a shifted input, such as a point source, is a shifted output; that is,

o(x − x_o, y − y_o) = L{i(x − x_o, y − y_o)}   (2.5)
where x_o and y_o are the coordinates of the point source. It does not matter where the point source is located in the scene; the image of the point source will appear to be the same, only shifted in space. The image of the point source does not change with position. If this property is satisfied, the shifting property of the point source, or delta function, can be used:

i(x_o, y_o) = ∫_{y_1}^{y_2} ∫_{x_1}^{x_2} i(x, y) δ(x − x_o, y − y_o) dx dy   (2.6)
where x_1 ≤ x_o ≤ x_2 and y_1 ≤ y_o ≤ y_2. The delta function, δ(x − x_o, y − y_o), is nonzero only at (x_o, y_o) and has an area of unity. The delta function is used frequently to describe infinitesimal sources of light. Equation (2.6) states that the value of the input scene at (x_o, y_o) can be written in terms of a weighted delta function. We can substitute i(x, y) in (2.6):

i(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} i(α, β) δ(α − x, β − y) dα dβ   (2.7)
which states that the entire input scene can be represented as a collection of weighted point sources. The output of the linear system can then be written using (2.7) as the input, so that

o(x, y) = L{ ∫_{−∞}^{∞} ∫_{−∞}^{∞} i(α, β) δ(α − x, β − y) dα dβ }   (2.8)
Since the linear operator, L{}, does not operate on α and β, (2.8) can be rewritten as

o(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} i(α, β) L{δ(α − x, β − y)} dα dβ   (2.9)
If we call the point source response of the system the impulse response, defined as

h(x, y) = L{δ(x, y)}   (2.10)
then the output of the system is the convolution of the input scene with the impulse response of the system; that is,

o(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} i(α, β) h(α − x, β − y) dα dβ = i(x, y) ** h(x, y)   (2.11)

where ** denotes the two-dimensional (2-D) convolution. The impulse response of the system, h(x, y), is commonly called the point spread function (PSF) of the imaging system.
Figure 2.7 Simplified LSI imaging system.

The significance of (2.11) is that the system impulse response is a spatial filter that is convolved with the input scene to obtain an output image. The simplified LSI imaging system model is shown in Figure 2.7. The system described here is valid for LSI systems only. This analysis technique is a reasonable description for continuous and well-sampled imaging systems. It is not a good description for an undersampled or a well-designed sampled imaging system. These sampled imaging systems do satisfy the requirements of a linear system, but they do not follow the shift-invariance property. The sampling nature of these systems is described later in this chapter, and the representation of sampled imaging systems is a modification of this approach.

For completeness, we take the spatial domain linear systems model and convert it to the spatial frequency domain. Spatial filtering can be accomplished in both domains. Given that x and y are spatial coordinates in units of milliradians, the spatial frequency domain has independent variables f_x and f_y in cycles per milliradian. A spatial input or output function is related to its spectrum by the Fourier transform

F(f_x, f_y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(f_x x + f_y y)} dx dy   (2.12)
where the inverse Fourier transform converts an image spectrum to a spatial function:

f(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(f_x, f_y) e^{j2π(f_x x + f_y y)} df_x df_y   (2.13)
The properties and characteristics of the Fourier transform are provided in [2–4]. A function and its spectrum are collectively described as a Fourier transform pair. We will use the notation of the Fourier transform operator,

G(f_x, f_y) = \mathcal{F}[g(x, y)]  and  g(x, y) = \mathcal{F}^{-1}[G(f_x, f_y)]   (2.14)
in order to simplify the analysis descriptions. One very important property of the Fourier transform is that the Fourier transform of a convolution results in a product. Therefore, the spatial convolution described in (2.11) results in a spectrum of

O(f_x, f_y) = I(f_x, f_y) H(f_x, f_y)   (2.15)
Here, the output spectrum is related to the input spectrum by the product of the Fourier transform of the system impulse response. Therefore, the Fourier transform of the system impulse response is called the transfer function of the system. Multiplication of the input scene spectrum by the transfer function of an imaging system provides the same filtering action as the convolution of the input scene with the imaging system PSF. In imaging systems, the magnitude of the Fourier transform of the system PSF is the modulation transfer function (MTF).
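Equations (2.11) and (2.15) can be verified numerically: convolving a scene with a PSF in the spatial domain gives the same image as multiplying their spectra and inverse transforming. The sketch below uses circular (FFT-style) convolution on a small periodic scene as a stand-in for the continuous convolution of (2.11); the Gaussian PSF is an illustrative choice of ours.

```python
import numpy as np

N = 16
rng = np.random.default_rng(0)
scene = rng.random((N, N))                        # i(x, y)

# Normalized Gaussian PSF h(x, y), centered at (N//2, N//2) (illustrative).
x = np.arange(N) - N // 2
X, Y = np.meshgrid(x, x)
psf = np.exp(-np.pi * (X**2 + Y**2) / 4.0)
psf /= psf.sum()                                  # unit area: no gain applied

# Spatial domain: o(x, y) = i(x, y) ** h(x, y), done here as a circular
# convolution by summing shifted copies of the scene weighted by the PSF.
out_space = np.zeros_like(scene)
for m in range(N):
    for n in range(N):
        out_space += psf[m, n] * np.roll(scene, (m - N // 2, n - N // 2), axis=(0, 1))

# Frequency domain: O(fx, fy) = I(fx, fy) H(fx, fy), per (2.15).
H = np.fft.fft2(np.fft.ifftshift(psf))            # transfer function H(fx, fy)
out_freq = np.fft.ifft2(np.fft.fft2(scene) * H).real

print(np.allclose(out_space, out_freq))           # True: same filtering action
print(np.abs(H)[0, 0])                            # MTF = |H|; 1.0 at zero frequency
```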
2.4 Imaging System Point Spread Function and Modulation Transfer Function

The system impulse response, or point spread function (PSF), of an imaging system is comprised of component impulse responses, as shown in Figure 2.8. Each of the components in the system contributes to the blurring of the scene. In fact, each of the components has an impulse response that can be applied in the same manner as the system impulse response. The blur attributed to a component may be comprised of a few different physical effects. For example, the optical blur is a combination of the diffraction and aberration effects of the optical system. The detector blur is a combination of the detector shape and the finite time of detector integration as it traverses the scene. It can be shown that the PSF of the system is a combination of the individual impulse responses

h_system(x, y) = h_atm(x, y) ** h_optics(x, y) ** h_det(x, y) ** h_elec(x, y) ** h_disp(x, y)   (2.16)
so that the total blur, or system PSF, is a combination of the component impulse responses. The Fourier transform of the system impulse response is called the transfer function of the system. Each of the component impulse responses given in (2.16) has a component transfer function, and when these are cascaded (multiplied), the result is the overall system transfer function; that is,

O(f_x, f_y) = I(f_x, f_y) H_atm(f_x, f_y) H_optics(f_x, f_y) H_det(f_x, f_y) H_elec(f_x, f_y) H_disp(f_x, f_y) H_eye(f_x, f_y)   (2.17)
Note that the system transfer function is the product of the component transfer functions. A large number of imaging spatial filters are accounted for in the design and/or analysis of imaging system performance. These filters include effects from optics, detectors, electronics, displays, and the human eye. We use (2.16) and (2.17) as our spatial filtering guidelines, where we know that the treatment can be applied in either the spatial or frequency domain. We present the most common of these filters
beginning with the optical effects. Also, the transfer function of a system, as given in (2.17), is frequently described without the eye transfer function.

Figure 2.8 Imaging system components.

2.4.1 Optical Filtering
Two filters account for the optical effects in an imaging system: diffraction and aberrations. The diffraction filter accounts for the spreading of the light as it passes an obstruction or an aperture. The diffraction impulse response for an incoherent imaging system with a circular aperture of diameter D is

h_diff(x, y) = (D/λ)^2 somb^2(D r / λ)   (2.18)

where λ is the average band wavelength and r = \sqrt{x^2 + y^2}. The somb (for sombrero) function is given by Gaskill [3] to be

somb(r) = J_1(π r) / (π r)   (2.19)
where J_1 is the first-order Bessel function of the first kind. The filtering associated with the optical aberrations is sometimes called the geometric blur. There are many ways to model this blur, and there are numerous commercial programs for calculating geometric blur at different locations on the image. However, a convenient method is to consider the geometric blur collectively as a Gaussian function

h_geom(x, y) = (1/σ_gb^2) Gaus(r/σ_gb)   (2.20)

where σ_gb is an amplitude that best describes the blur associated with the aberrations. The Gaussian function, Gaus, is

Gaus(r) = e^{−π r^2}   (2.21)
Note that the scaling values in front of the somb and Gaus functions are intended to provide a functional area (under the curve) of unity so that no gain is applied to the scene. Examples of the optical impulse responses are given in Figure 2.9, corresponding to a wavelength of 10 µm, an optical diameter of 10 centimeters, and a geometric blur of 0.1 milliradian. The overall impulse response of the optics is the combined blur of both the diffraction and aberration effects:

h_optics(x, y) = h_diff(x, y) ** h_geom(x, y)   (2.22)
Figure 2.9 Spatial representations of optical blur.

The transfer functions corresponding to these impulse responses are obtained by taking the Fourier transform of the functions given in (2.18) and (2.20). The Fourier transform of the somb is given by Gaskill [3], so that the transfer function is
H_diff(f_x, f_y) = (2/π)[cos⁻¹(ρλ/D) − (ρλ/D)√(1 − (ρλ/D)²)]    (2.23)

where ρ = √(f_x² + f_y²) is plotted in cycles per milliradian and D is the entrance aperture diameter. The Fourier transform of the Gaus function is simply the Gaus function [4], with care taken on the scaling property of the transform. The transfer function corresponding to the aberration effects is

H_geom(f_x, f_y) = Gaus(σ_gb ρ)    (2.24)

For the example described here, the transfer functions are shown in Figure 2.10. Note that the overall optical transfer function is the product of the two functions.

Figure 2.10 Optical transfer functions of optical blur: H_diff(f_x, f_y) and H_geom(f_x, f_y), plotted over ±10 cycles per milliradian.
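To make (2.23) and (2.24) concrete, the short Python sketch below evaluates the diffraction and geometric transfer functions for the example values in the text (10-µm wavelength, 10-cm aperture, 0.1-milliradian geometric blur) and forms the overall optical MTF as their product. The frequency grid is an arbitrary choice.

import numpy as np

wavelength_mm = 10e-3                       # 10 µm expressed in millimeters
D_mm = 100.0                                # 10-cm aperture in millimeters
sigma_gb = 0.1                              # geometric blur parameter in milliradians
cutoff = D_mm / wavelength_mm / 1000.0      # diffraction cutoff D/λ in cycles per milliradian (10 cyc/mrad)

def H_diff(rho):
    """Diffraction MTF of (2.23); rho in cycles per milliradian."""
    u = np.clip(rho / cutoff, 0.0, 1.0)     # rho*lambda/D, clipped at the cutoff
    return (2.0 / np.pi) * (np.arccos(u) - u * np.sqrt(1.0 - u ** 2))

def H_geom(rho):
    """Aberration (geometric) MTF of (2.24): Gaus(sigma_gb * rho)."""
    return np.exp(-np.pi * (sigma_gb * rho) ** 2)

rho = np.linspace(0.0, 12.0, 121)           # cycles per milliradian
H_optics = H_diff(rho) * H_geom(rho)        # overall optical transfer function
print(np.round(H_optics[::20], 3))          # a few sample values; zero beyond 10 cyc/mrad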
2.4.2 Detector Spatial Filters
The detector spatial filter is also comprised of a number of different effects, including spatial integration, sample-and-hold, crosstalk, and responsivity, among others. The two most common effects are spatial integration and sample-and-hold; that is,

h_det(x, y) = h_det_sp(x, y) ** h_det_sh(x, y)    (2.25)
The other effects can be included, but they are usually considered negligible unless there is good reason to believe otherwise (i.e., the detector responsivity varies dramatically over the detector). The detector spatial impulse response is due to the spatial integration of the light over the detector. Since most detectors are rectangular in shape, the rectangle function is used as the spatial model of the detector

h_det_sp(x, y) = [1/(DAS_x DAS_y)] rect(x/DAS_x, y/DAS_y) = [1/DAS_x] rect(x/DAS_x) [1/DAS_y] rect(y/DAS_y)    (2.26)

where DAS_x and DAS_y are the horizontal and vertical detector angular subtenses in milliradians. The detector angular subtense is the detector width (or height) divided by the sensor focal length. The transfer function corresponding to the detector spatial integration is determined by taking the Fourier transform of (2.26)

H_det_sp(f_x, f_y) = sinc(DAS_x f_x, DAS_y f_y) = sinc(DAS_x f_x) sinc(DAS_y f_y)    (2.27)

where the sinc function is defined as [2]

sinc(x) = sin(πx)/(πx)    (2.28)
The impulse response and the transfer function for a detector with a 0.1 by 0.1 milliradian detector angular subtense are shown in Figure 2.11.

Figure 2.11 Detector spatial impulse response h_det_sp(x,y) and transfer function H_det_sp(f_x, f_y).

The detector sample-and-hold function is an integration of the light as the detector scans across the image. This sample-and-hold function is not present in staring arrays, but it is present in most scanning systems where the output of the integrated signal is sampled. The sampling direction is assumed to be the horizontal, or x, direction. Usually, the distance, in milliradians, between samples is smaller than the detector angular subtense by a factor called samples per IFOV or samples per DAS, spdas. The sample-and-hold function can be considered a rectangular
function in x where the size of the rectangle corresponds to the distance between samples. In the spatial domain y direction, the function is an impulse function. Therefore, the impulse response of the sample-and-hold function is

h_det_sh(x, y) = [spdas/DAS_x] rect(spdas x/DAS_x) δ(y)    (2.29)

The Fourier transform of the impulse response gives the transfer function of the sample-and-hold operation

H_det_sh(f_x, f_y) = sinc(DAS_x f_x / spdas)    (2.30)
Note that the Fourier transform of the impulse function in the y direction is 1. The impulse response and the transfer function for sample-and-hold associated with the detector given in Figure 2.11, with a two-sample-per-DAS sample-and-hold, are shown in Figure 2.12.
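As an illustration of (2.25), (2.27), and (2.30), the Python sketch below evaluates the detector spatial and sample-and-hold transfer functions for the 0.1-by-0.1-milliradian detector and the two-samples-per-DAS case used in Figures 2.11 and 2.12; the frequency grid is an arbitrary choice.

import numpy as np

DAS_x = 0.1      # horizontal detector angular subtense in milliradians
DAS_y = 0.1      # vertical detector angular subtense in milliradians
spdas = 2.0      # samples per DAS in the scanned (horizontal) direction

fx = np.linspace(0.0, 40.0, 401)                   # cycles per milliradian
fy = 0.0                                           # evaluate along the horizontal axis

# numpy.sinc(x) is sin(pi x)/(pi x), the same normalization as (2.28).
H_det_sp = np.sinc(DAS_x * fx) * np.sinc(DAS_y * fy)   # detector spatial MTF, (2.27)
H_det_sh = np.sinc(DAS_x * fx / spdas)                 # sample-and-hold MTF, (2.30)
H_det = H_det_sp * H_det_sh                            # combined detector transfer, (2.25)

print(np.round(H_det[::100], 3))                   # values at 0, 10, 20, 30, 40 cyc/mrad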
2.4.3 Electronics Filtering
The electronics filtering function is one of the more difficult to characterize and one of the more loosely applied functions. First, it involves the conversion of temporal frequencies to spatial frequencies; usually, this involves some scan rate or readout rate. Second, most impulse response functions in space are even functions, whereas with electronic filtering the impulse response can be a one-sided function. Finally, most engineers apply a two-sided impulse response that violates the rules of causality. This gross approximation does not usually have a heavy impact on sensor performance estimates, since the electronics are not typically the limiting component of the sensor. Holst [5] and Vollmerhausen and Driggers [6] provide electronic filter (digital and analog) approximations that can be used in transfer function estimates. Digital filters also provide a spatial blur and a corresponding transfer function.
Figure 2.12 Detector sample-and-hold impulse response h_det_sh(x,y) and transfer function H_det_sh(f_x, f_y).
Finite impulse response (FIR) filters are common in electro-optical and infrared systems with such functions as interpolation, boost, and edge enhancements. These are filters that are convolved with a digital image, and so they have a discrete "kernel" that is used to process the spatial image. The transfer function associated with these FIR filters is a summation of sines and cosines, where the filter is not band-limited. The combination of these filters with a display reconstruction provides for an overall output filter (and corresponding transfer function). Chapter 4 discusses finite impulse response filters and the transfer functions associated with them.
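Since an FIR kernel's transfer function is a summation of sines and cosines, the brief Python sketch below computes the transfer function of a hypothetical symmetric three-tap boost kernel by evaluating its discrete-space Fourier transform. The tap values and sample spacing are illustrative assumptions, not taken from the text.

import numpy as np

# Hypothetical symmetric three-tap boost kernel; the taps sum to 1 so the DC gain is unity.
kernel = np.array([-0.25, 1.5, -0.25])
sample_spacing = 0.05                                   # assumed sample spacing in milliradians

fx = np.linspace(0.0, 10.0, 201)                        # cycles per milliradian
n = np.arange(len(kernel)) - (len(kernel) - 1) / 2.0    # tap positions centered on zero

# Discrete-space Fourier transform: for a symmetric kernel this reduces to a sum of cosines.
H_fir = np.array([np.sum(kernel * np.cos(2.0 * np.pi * f * n * sample_spacing)) for f in fx])

print(round(H_fir[0], 3), round(H_fir.max(), 3))        # unity at DC, greater than 1 in the boosted band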
2.4.4 Display Filtering
The finite size and shape of the display spot also corresponds to a spatial filtering of the image. Usually, the spot, or element, of a display is either Gaussian in shape, like a cathode ray tube (CRT), or rectangular in shape, like a flat-panel display. Light-emitting diode (LED) displays are also rectangular in shape. The PSF of the display is simply the size and shape of the display spot. The only difference is that the finite size and shape of the display spot must be converted from a physical dimension to the sensor angular space. For the Gaussian spot, the spot size dimension in centimeters must be converted to an equivalent angular space in the sensor's field of view (FOV)

σ_disp_angle = σ_disp_cm (FOV_v / L_disp_v)    (2.31)
where L_disp_v is the length in centimeters of the display vertical dimension and FOV_v is the FOV of the sensor in milliradians. For the rectangular display element, the height and width of the display element must also be converted to the sensor's angular space. The vertical dimension of the rectangular shape is obtained using (2.31), and the horizontal dimension is obtained similarly with the horizontal display length and sensor FOV. Once these angular dimensions are obtained, the PSF of the display spot is simply the size and shape of the display element

h_disp(x, y) = (1/σ_disp_angle²) Gaus(r/σ_disp_angle)  for a Gaussian spot    (2.32)

or

h_disp(x, y) = [1/(W_disp_angle_h H_disp_angle_v)] rect(x/W_disp_angle_h, y/H_disp_angle_v)  for a flat panel    (2.33)
where the angular display element shapes are given in milliradians. These spatial shapes are analogous to those shown in Figures 2.9 and 2.11. The transfer functions associated with these display spots are determined by taking the Fourier transform of the earlier PSF equations; that is,

H_disp(f_x, f_y) = Gaus(σ_disp_angle ρ)  for a Gaussian display    (2.34)

or

H_disp(f_x, f_y) = sinc(W_disp_angle_h f_x, H_disp_angle_v f_y)  for a flat-panel display    (2.35)

Again, these transfer functions are analogous to those shown in Figures 2.9 and 2.11.
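As a small worked example of (2.31), (2.34), and (2.35), the Python sketch below converts a hypothetical CRT spot size and flat-panel element size into sensor angular space and evaluates the two display transfer functions. The display dimensions, spot sizes, and sensor FOV are assumed values, not taken from the text.

import numpy as np

# Assumed display and sensor geometry (illustrative values only).
FOV_v = 40.0          # vertical sensor field of view in milliradians
FOV_h = 53.3          # horizontal sensor field of view in milliradians
L_disp_v = 20.0       # display height in centimeters
L_disp_h = 26.7       # display width in centimeters

# Gaussian (CRT-like) spot: convert the physical spot size to sensor angular space, per (2.31).
sigma_disp_cm = 0.03
sigma_disp_angle = sigma_disp_cm * FOV_v / L_disp_v        # milliradians

# Flat-panel element: convert element width and height to angular space the same way.
W_elem_cm, H_elem_cm = 0.05, 0.05
W_disp_angle_h = W_elem_cm * FOV_h / L_disp_h
H_disp_angle_v = H_elem_cm * FOV_v / L_disp_v

fx = np.linspace(0.0, 20.0, 201)     # cycles per milliradian, horizontal cut (fy = 0)
fy = 0.0
rho = np.hypot(fx, fy)

H_crt = np.exp(-np.pi * (sigma_disp_angle * rho) ** 2)                  # Gaussian display, (2.34)
H_flat = np.sinc(W_disp_angle_h * fx) * np.sinc(H_disp_angle_v * fy)    # flat-panel display, (2.35)
print(np.round(H_crt[::50], 3), np.round(H_flat[::50], 3))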
2.4.5 Human Eye
Note that the human eye is not part of the system performance MTF as shown in Figure 2.8, and the eye MTF should not be included in the PSF of the system. In Chapter 3, the eye CTF is used to include the eye sensitivity and resolution limitations in performance calculations. It is, however, useful to understand the PSF and MTF of the eye so that the eye blur can be compared to sensor blur. A system with much higher resolution than the eye is a waste of money, and a system with much lower resolution than the eye is a poorly performing system. The human eye has a PSF that is a combination of three physical components: optics, retina, and tremor [7, 8]. In terms of these components, the PSF is

h(x, y) = h_eye_optics(x, y) ** h_retina(x, y) ** h_tremor(x, y)    (2.36)

Therefore, the transfer function of the eye is

H_eye(f_x, f_y) = H_eye_optics(f_x, f_y) H_retina(f_x, f_y) H_tremor(f_x, f_y)    (2.37)

The transfer function associated with the eye optics is a function of display light level, because the pupil diameter changes with light level. The number of foot-Lamberts (fL) at the eye from the display is L_d/0.929, where L_d is the display luminance in millilamberts. The pupil diameter is then

D_pupil = −9.011 + 13.23 exp{−log₁₀(fL)/21.082}  [mm]    (2.38)
This equation is valid if one eye is used, as in some targeting applications. If both eyes view the display, the pupil diameter is reduced by 0.5 millimeter. Two parameters, i_o and f_o, are required for the eye optics transfer function. The first parameter is

i_o = 0.7155 + 0.277/(D_pupil)²    (2.39)

and the second is

f_o = exp{3.663 − 0.0216 (D_pupil)² log(D_pupil)}    (2.40)

Now, the eye optics transfer function can be written as

H_eye_optics(ρ) = exp{−[43.69(ρ/M)/f_o]^i_o}    (2.41)
where ρ is the radial spatial frequency, √(f_x² + f_y²), in cycles per milliradian, and M is the system magnification (angular subtense of the display to the eye divided by the sensor FOV). The retina transfer function is

H_retina(ρ) = exp{−0.375(ρ/M)^1.21}    (2.42)

Finally, the transfer function of the eye due to tremor is

H_tremor(ρ) = exp{−0.4441(ρ/M)²}    (2.43)
which completes the eye model. For an example, let the magnification of the system equal 1. With a pupil diameter of 3.6 mm, corresponding to a display brightness of 10 fL at the eye (with one viewing eye), the combined MTF of the eye is shown in Figure 2.13. The i_o and f_o parameters were 0.742 and 27.2, respectively. All of the PSFs and transfer functions given in this section are used in the modeling of infrared and electro-optical imaging systems. We covered only the more common system components; there may be many more that must be considered when they are part of an imaging system.
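The eye model of (2.38) through (2.43) is easy to evaluate numerically. The Python sketch below reproduces the example in the text (unity magnification, 10 fL at the eye, one viewing eye) and prints the resulting pupil diameter and the i_o and f_o parameters. Small differences from the quoted 0.742 and 27.2 can occur depending on whether the logarithm in (2.40) is taken as natural or base-10 (natural log is assumed here), so treat this as an approximation of the model rather than a definitive implementation.

import numpy as np

def eye_mtf(rho, display_fL=10.0, magnification=1.0, one_eye=True):
    """Combined eye MTF of (2.38)-(2.43); rho in cycles per milliradian."""
    # Pupil diameter in millimeters, (2.38); reduced by 0.5 mm for two-eye viewing.
    D = -9.011 + 13.23 * np.exp(-np.log10(display_fL) / 21.082)
    if not one_eye:
        D -= 0.5
    i_o = 0.7155 + 0.277 / D ** 2                          # (2.39)
    f_o = np.exp(3.663 - 0.0216 * D ** 2 * np.log(D))      # (2.40), natural log assumed
    H_optics = np.exp(-(43.69 * (rho / magnification) / f_o) ** i_o)   # (2.41)
    H_retina = np.exp(-0.375 * (rho / magnification) ** 1.21)          # (2.42)
    H_tremor = np.exp(-0.4441 * (rho / magnification) ** 2)            # (2.43)
    return D, i_o, f_o, H_optics * H_retina * H_tremor

rho = np.linspace(0.0, 2.0, 5)
D, i_o, f_o, H_eye = eye_mtf(rho)
print(round(D, 2), round(i_o, 3), round(f_o, 1))           # about 3.6 mm, ~0.74, ~27
print(np.round(H_eye, 3))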
2.4.6 Overall Image Transfer
To quantify the overall system resolution, all of the spatial blurs are convolved and all of the transfer functions are multiplied. The system PSF is the combination of all the blurs, and the system MTF is the product of all the transfer functions. In the roll-up, the eye is typically not included to describe the resolution of the system. Also, the system can be described as "limited" by some aspect of the system. For example, a diffraction-limited system is one in which the diffraction cutoff frequency is smaller than that of all the other components in the system (and the diffraction spatial blur is larger). A detector-limited system would be one in which the detector blur is larger, and the detector transfer cutoff frequency is smaller, than those of the other system components.
Figure 2.13 Eye transfer function H_eye(f_x, f_y), plotted over ±2 cycles per milliradian.
The MTF for a typical MWIR system is shown in Figure 2.14. The pre-MTF shown is the rollup transfer function for the optics diffraction blur, aberrations, and the detector shape. The post-MTF is the rollup transfer function for the electronics (many times negligible) and the display. The system MTF (system transfer) is the product of the pre- and post-MTFs, as shown. In Figure 2.14, the horizontal MTF is shown. In most sensor performance models, the horizontal and vertical blurs, and the corresponding MTFs, are considered separable. That is,

h(x, y) = h(x)h(y)    (2.44)

and the corresponding Fourier transform is

H(f_x, f_y) = H(f_x)H(f_y)    (2.45)
This approach usually provides for small errors (a few percent) in performance calculations even when some of the components in the system are circularly symmetric.
Figure 2.14 System transfer function: horizontal pre-MTF, post-MTF, and system transfer, plotted to 15 cycles/mrad.

2.5 Sampled Imaging Systems

In the previous sections, we described the process of imaging for a continuous or well-sampled imager. In this case, the input scene is convolved with the imager PSF (i.e., the impulse response of the system). With sampled imaging systems, the process is different. As an image traverses a sampled imaging system, it undergoes a three-step process. Figure 2.15 shows this process as a presample blur, a sampling action, and a postsample blur (reconstruction). The image is blurred by the optics, the detector angular subtense, the spatial integration scan, if needed, and any other effects appropriate to presampling. This presample blur, h(x,y), is applied to the image in the manner of an impulse response, so the response is convolved with the input scene

o₁(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} i(x − α, y − β) h(α, β) dα dβ = i(x, y) ** h(x, y)    (2.46)
Figure 2.15 Three-step imaging process: the input i(x,y) passes through the presample blur h(x,y) to give o₁(x,y), is sampled by s(x,y) to give o₂(x,y), and is reconstructed by d(x,y) to give the output o(x,y).
where o₁(x, y) is the presample blurred image, or the output of the presample blur process. The convolution is denoted by *, so ** denotes the two-dimensional convolution. The sampling process can be modeled as the multiplication of the presample blurred image with the sampling function. For convention, we use Gaskill's comb function [9]

comb(x/a, y/b) = a b Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} δ(x − ma) δ(y − nb)    (2.47)
which is a two-dimensional separable function. Now the output of the sampling process can be written as the product of the presample blurred image with the sampling function (note that a and b are the distances in milliradians or millimeters between samples)

o₂(x, y) = o₁(x, y) (1/ab) comb(x/a, y/b) = [i(x, y) ** h(x, y)] (1/ab) comb(x/a, y/b)    (2.48)
At this point, all that is present is a set of discrete values that represent the presample blurred image at discrete locations. This output can be thought of as a weighted "bed of nails" that is meaningless to look at unless the "image" is reconstructed. The display and the eye, if applied properly, reconstruct the image to a function that is interpretable. This reconstruction is modeled as the convolution of the display and eye blur (and any other spatial postsample blur) with the output of the sampling process; that is,

o(x, y) = o₂(x, y) ** d(x, y) = {[i(x, y) ** h(x, y)] (1/ab) comb(x/a, y/b)} ** d(x, y)    (2.49)
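The three-step process of (2.46), (2.48), and (2.49) can be sketched numerically in one dimension. The Python example below applies a presample Gaussian blur to a test scene, samples it on a coarse grid, and reconstructs with a simple display blur; the blur widths, sample spacing, and test scene are assumed values chosen only to illustrate the flow.

import numpy as np

dx = 0.01                                                    # fine grid spacing, milliradians
x = np.arange(0.0, 10.0, dx)
scene = (np.sin(2.0 * np.pi * 0.8 * x) > 0).astype(float)    # illustrative bar-pattern scene

def gaus_kernel(sigma):
    t = np.arange(-4.0 * sigma, 4.0 * sigma + dx, dx)
    k = np.exp(-np.pi * (t / sigma) ** 2)
    return k / k.sum()

# Step 1: presample blur, o1 = i ** h, as in (2.46).
o1 = np.convolve(scene, gaus_kernel(0.25), mode="same")

# Step 2: sampling, multiplication by a comb with spacing a = 0.5 mrad, as in (2.48).
a = 0.5
comb = np.zeros_like(x)
comb[::int(round(a / dx))] = 1.0
o2 = o1 * comb

# Step 3: reconstruction, convolution with a display/eye blur d(x), as in (2.49);
# the factor a/dx rescales for the sample density on the fine grid.
o = np.convolve(o2, gaus_kernel(0.4), mode="same") * (a / dx)

print(round(float(o1.max()), 3), round(float(o.max()), 3))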
While (2.49) appears to be a simple spatial process, there is a great deal that is inherent in the calculation. We have simplified the equation with the aggregate presample blur effects and the aggregate postsample reconstruction effects. The frequency analysis of the three-step process shown in Figure 2.15 can be presented simply by taking the Fourier transform of each process step. Consider the first step in the process, the presample blur. The transform of the convolution in space is equivalent to a product in spatial frequency
O₁(f_x, f_y) = I(f_x, f_y) H_pre(f_x, f_y)    (2.50)
where fx and fy are the horizontal and vertical spatial frequencies. If x and y are in milliradians, then the spatial frequencies are in cycles per milliradian. Hpre(fx, fy) is the Fourier transform of the presample blur spot. Note that the output spectrum can be normalized to the input spectrum so that Hpre(fx, fy) is a transfer function that follows the linear systems principles. Consider the presample blur spectrum (i.e., the presample blur transfer function given in Figure 2.16). Note that this is the image spectrum on the output of the blur that would occur if an impulse were input to the system. Next, we address the sampling process. The Fourier transform of (2.48) gives
O₂(f_x, f_y) = [I(f_x, f_y) H_pre(f_x, f_y)] ** comb(a f_x, b f_y)    (2.51)
where

comb(a f_x, b f_y) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} δ(f_x − k f_xs) δ(f_y − l f_ys),   f_xs = 1/a and f_ys = 1/b    (2.52)
If an impulse were input to the system, the response would be

O₂(f_x, f_y) = H_pre(f_x, f_y) ** comb(a f_x, b f_y)    (2.53)
which is a replication of the presample blur at sample spacings of 1/a and 1/b. Consider the case shown in Figure 2.17. The sampled spectrum shown corresponds to a Gaussian blur of 0.5-milliradian radius (to the 0.043 cutoff) and a sample spacing of 0.5 milliradian. Note that the reproduction in frequency of the presample blur is at 2 cycles per milliradian. The so-called Nyquist rate of the sensor (the sensor half-sample rate) is at 1 cycle per milliradian. Any frequency from the presample blur baseband that is greater than the half-sample rate is also present as a mirror signal, or classical aliasing, under the half-sample rate by the first-order reproduction. The amount of classical aliasing is easily computed as the area of this mirrored signal. However, this is not the aliased signal seen on the output of the display, as the display transfer has not been applied to the signal.

Figure 2.16 Presample blur transfer function H_pre(f_x, f_y), plotted from −3 to 3 cycles/mrad.

Figure 2.17 Output of sampling, O₂(f_x, f_y), plotted from −5 to 5 cycles/mrad.
The higher-order replications of the baseband are real-frequency components. The curves to the left and the right of the central curve are the first- and second-order replications at the positive and negative positions. The current state of the sampled signal is tiny infinitesimal points weighted with the presample blurred image values. These points have spectra that extend in the frequency domain to very high frequencies. The higher-order replications are typically filtered with a reconstruction function involving the display and the eye. There is no practical way to implement the perfect reconstruction filter; however, the perfect rectangular filter would eliminate these higher-order replications and result only in the classical aliased signal. The reconstruction filter usually degrades the baseband and allows some of the signal of the higher-order terms through to the observer. The final step in the process corresponds to the reconstruction of the sampled information. This is accomplished simply by blurring the infinitesimal points so that the function looks nearly like that of the continuous input imagery. The blur is convolved in space, so it is multiplied in frequency
O(f_x, f_y) = [H_pre(f_x, f_y) ** comb(a f_x, b f_y)] D(f_x, f_y)    (2.54)
where this output corresponds to a point source input. Note that the postsampling transfer function is multiplied by the sampled spectrum to give the output of the whole system. Consider the sampled spectrum and the dashed display transfer function shown in Figure 2.18. The postsampling transfer function, shown in the graph as the display, passes part of the first-order replications. However, this display degrades the baseband signal relatively little. The tradeoff here is baseband resolution versus spurious response content. Given that all of the signals traverse through the postsample blur transfer function, the output is shown in Figure 2.19. There is classical aliasing on the output of the system, but a large signal corresponds to higher-order replication signals that passed through the display. Aliasing and the higher-order signals are collectively the spurious response of the sensor. These signals were not present on the input imagery, but they appear as artifacts on the output imagery. Without sampling, these spurious signals would not be present.

Figure 2.18 Sampled signal and display transfer, plotted from −5 to 5 cycles/mrad.

Figure 2.19 System output signal (system output spectrum), plotted from −5 to 5 cycles/mrad.
The higher-order replicated signals are combined at each spatial frequency in terms of a vector sum, so it is convenient to represent the magnitude of the spurious signals as the root-sum-squared (RSS) of the spurious orders that make it through the reconstruction process. Three aggregate quantities have proven useful in describing the spurious response of a sampled imaging system: total integrated spurious response as defined by (2.55), in-band spurious response as defined by (2.56), and out-of-band spurious response as defined by (2.57) [10]; that is,

SR = ∫_{−∞}^{∞} (Spurious Response) df_x / ∫_{−∞}^{∞} (Baseband Signal) df_x    (2.55)

SR_in-band = ∫_{−f_s/2}^{f_s/2} (Spurious Response) df_x / ∫_{−∞}^{∞} (Baseband Signal) df_x    (2.56)
SR_out-of-band = SR − SR_in-band    (2.57)
where f_s is the sampling frequency. Examples of total and in-band spurious response are illustrated in Figures 2.20 and 2.21, respectively. The spurious responses of the higher-order replications can be constructive or destructive in nature, depending on the phase and frequency content of the spurious signals. The combination in magnitude is identical to that of a vector sum; on average, this magnitude is the quadrature sum of the signals. The integrated RSS spurious response value is normalized by the integral of the baseband area, and this ratio is the spurious response ratio. There is good experimental and theoretical evidence to generalize the effects of in-band and out-of-band spurious response. In-band spurious response is the same as classical aliasing in communication systems. These are signals that are added to real signals and can corrupt an image by making the image look jagged, misplaced (spatially misregistered), or even the wrong size (wider or thinner than the original object). The only way to decrease the amount of in-band spurious response is to increase the sample rate or increase the amount of presample blur. Increasing presample blur, or reducing the MTF, only reduces the performance of the imaging system; blur causes more severe degradation than aliased signals. The effects of out-of-band spurious response are manifested as display artifacts. Common examples are raster, where the display spot is small compared to the line spacing, and pixelization, where the display looks blocky. Pixelization occurs on flat-panel displays where the display elements are large or in cases where pixel replication is used as a reconstruction technique.

Figure 2.20 Example of total spurious response: transfer response and spurious response of G_δ(f_x).

Figure 2.21 Example of in-band spurious response: transfer response and spurious response of G_δ(f_x).
The sampling artifacts associated with out-of-band spurious response can be removed by the display or image reconstruction process. Multiple display pixels per sensor sample can be used, and the sensor samples are interpolated to provide the intensity values for the added pixels. It is possible to remove essentially all of the out-of-band spurious response without degrading the transfer response of the sensor. That is, image interpolation can remove much of the bad without affecting the good; there is no performance down side to interpolation [11].
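The spurious response ratios of (2.55) through (2.57) can be estimated numerically once the presample MTF, sample spacing, and reconstruction MTF are specified. The Python sketch below does this for a one-dimensional Gaussian presample blur and Gaussian display blur; the blur widths and sample spacing are assumed values, and only the first few replicated orders are summed in quadrature, per the RSS convention described in the text.

import numpy as np

a = 0.5                                     # sample spacing in milliradians
fs = 1.0 / a                                # sampling frequency, cycles per milliradian
f = np.linspace(-5.0, 5.0, 2001)

H_pre = np.exp(-np.pi * (0.5 * f) ** 2)     # assumed Gaussian presample MTF
H_disp = np.exp(-np.pi * (0.4 * f) ** 2)    # assumed Gaussian display (reconstruction) MTF

# Baseband signal and the RSS of the replicated (spurious) orders after reconstruction, per (2.54).
baseband = np.abs(H_pre * H_disp)
orders = [np.exp(-np.pi * (0.5 * (f - k * fs)) ** 2) * H_disp for k in (-3, -2, -1, 1, 2, 3)]
spurious = np.sqrt(np.sum(np.square(np.abs(orders)), axis=0))

SR_total = np.trapz(spurious, f) / np.trapz(baseband, f)                    # (2.55)
in_band = np.abs(f) <= fs / 2.0
SR_in = np.trapz(spurious[in_band], f[in_band]) / np.trapz(baseband, f)     # (2.56)
SR_out = SR_total - SR_in                                                   # (2.57)
print(round(SR_total, 3), round(SR_in, 3), round(SR_out, 3))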
2.6 Signal-to-Noise Ratio

In the past few sections, we have discussed primarily resolution. Just as important as resolution is sensitivity. The foundation for sensitivity is the signal-to-noise ratio (SNR). SNR is of importance in both electro-optical and infrared systems, but in this section we focus on the calculation of SNR in the infrared. A very similar analysis is applied to electro-optical systems, in which the signal is reflected light instead of emitted light; the noise concepts are the same. The differences between electro-optical and infrared systems are briefly discussed in the next section. Using Figure 2.22 as a guide, we first consider calculating the signal leaving the source object. In this case, the source object is resolved; that is, it is larger than a single detector footprint on the object. We start with the emittance in W/(cm²-µm) leaving the object. The emittance is the flux being emitted from the object, while an electro-optical system would use the quantity of exitance in the same units. For most sources in the infrared, the object is Lambertian, so the radiance of the object is related to the emittance by

L_source = M_source/π  [W/(cm²-µm-sr)]    (2.58)
where Msource is the emittance of the source.
Figure 2.22 System radiometry: an extended (resolved) source at range R, the area of the source seen by the detector, the detector instantaneous field of view (IFOV), the solid angle Ω_sensor subtended by the sensor aperture, and the sensor collecting optic of diameter D and focal length f_optics with detector size d_det.
To calculate the intensity of the source associated with the footprint of the detector (i.e., the only part of the source from which the detector can receive light), we multiply (2.58) by the effective source area

I_source = (M_source/π) A_source = (M_source/π) A_det (R²/f²)  [W/(µm-sr)]    (2.59)

where the area of the source seen by the detector, A_source, is related to the area of the detector, A_det, the focal length, f, and the range to the target, R. The intensity can be used to determine the amount of power that enters the sensor aperture; that is,

P = I_source Ω_sensor = (M_source/π) A_det (R²/f²) (πD²/4R²) = M_source A_det/[4(F#)²]  [W/µm]    (2.60)
so that the power entering the aperture is related to the area of the detector and the sensor f-number (F/#). The f-number is written in various forms, such as f/4, f4, F/4, and the like. This power entering the aperture is the power on the detector (since we only considered the source seen by the detector), except that the power is reduced by the optical transmission of the optical system, τ_optics

P = M_source A_det τ_optics/[4(F#)²]  [W/µm]    (2.61)

This power on the detector must be integrated with wavelength to provide the overall power. The noise on the detector is described by the noise equivalent power (NEP). The detector NEP is related to the area of the detector, the detector bandwidth, and the normalized detectivity; that is,

NEP = √(A_det ∆f)/D*(λ)  [W]    (2.62)
where D*(λ) is the detectivity of the detector in cm·(Hz)^0.5/W and ∆f is the temporal bandwidth associated with the detector in hertz. The SNR is determined by taking the ratio of the signal in (2.61) to the noise in (2.62); that is,

SNR = ∫_λ M_source(λ) A_det τ_optics(λ) D*(λ) / [4(F#)² √(A_det ∆f)] dλ  (unitless)    (2.63)
This SNR determines how "noisy" the image appears. An SNR of 10 appears very noisy, an SNR of 100 looks acceptable, and an SNR of 1,000 appears pristine. In fact, an image with an SNR of 1,000 appears with no perceivable noise, since most displays have a dynamic range of 7 or 8 bits (less than 256 gray levels). In this case, the noise is smaller than a single gray-level step of the display.
Equation (2.63) can be rearranged for infrared systems so that, when the SNR is set to 1, the blackbody temperature difference that creates an emittance difference for this condition can be determined. This temperature difference that creates an SNR of 1 is called the NETD, or sometimes the random spatio-temporal noise. The derivation is provided in [12]. An NETD of 50 millikelvins (mK) means that a differential scene temperature (in equivalent blackbody temperature) of 50 millikelvins will create an SNR of 1. Band-averaged detectivity is usually specified by detector manufacturers, so a useful form of NETD is

NETD = 4(F#)² √(∆f) / [π τ_optics √(A_d) D* (∆L/∆T)]    (2.64)
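As a check on (2.64), the Python sketch below evaluates the NETD for the LWIR example given in the following paragraph (F/1.75, 0.7 optical transmission, D* = 5 × 10¹⁰ cm·Hz^0.5/W, 49 × 10⁻⁶ cm² detector area, 55.9-kHz bandwidth, and ∆L/∆T of roughly 6.7 × 10⁻⁵ W/(cm²-sr-K) for the 8- to 12-µm band); the result is about 56 mK, consistent with the quoted 0.06K.

import numpy as np

# LWIR example values from the text (units consistent with D* in cm·Hz^0.5/W).
F_number = 1.75
bandwidth_Hz = 55.9e3
tau_optics = 0.7
A_d_cm2 = 49e-6
D_star = 5e10              # cm·Hz^0.5/W
dL_dT = 6.7e-5             # W/(cm^2-sr-K), 8-12 µm band near 300 K

netd_K = (4.0 * F_number ** 2 * np.sqrt(bandwidth_Hz)) / (
    np.pi * tau_optics * np.sqrt(A_d_cm2) * D_star * dL_dT)   # (2.64)
print(round(netd_K * 1000.0), "mK")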
As an example, an LWIR system with an F/# of 1.75, a 35-cm focal length, an optical transmission of 0.7, a detectivity of 5 × 10¹⁰ cm·(Hz)^0.5/W, a detector area of 49 × 10⁻⁶ cm², and a bandwidth of 55.9 kHz yields an NETD of 0.06K, or 60 mK. The LWIR band is from 8 to 12 µm, and ∆L/∆T is 6.7 × 10⁻⁵ W/(cm²-sr-K) for this band. With the development of advanced scanning arrays, including line scanners, and focal plane arrays (FPA), a single-valued temporal noise could no longer characterize imaging system noise in an adequate manner. The nonuniformities of the detector arrays contributed significantly to the overall noise of the system. The nonuniformity noise values are not represented in the classical NETD. In 1989 and 1990, the U.S. Army Night Vision and Electronic Sensor Directorate (NVESD) developed the 3-D noise technique, along with a laboratory procedure, to address these problems. The concept of directional averaging allows the characterization of complex noise patterns. Consider the 3-D noise coordinate system shown in Figure 2.23.
Figure 2.23 3-D noise coordinate system: T (time), V (rows), and H (columns).
The 3-D noise method applies directional averages at the system output port to produce eight noise components. These components are described in Table 2.1. Note that the subscripts of the noise components indicate the dimensions in which the noise components fluctuate. The symbol σ_tvh is the parameter that resembles NETD. References by D'Agostino [13] and Webb et al. [14, 15] provide the measurement and calculation of 3-D noise. In Table 2.1, σ_vh is the most common fixed-pattern noise seen in advanced FPAs. In many cases, this random spatial noise is the only significant noise inherent in the sensor other than the random spatio-temporal noise. Therefore, the random spatial noise is given as the imager fixed-pattern noise. The 3-D noise is not the only method for characterizing fixed-pattern noise in FPA imagers. Inhomogeneity equivalent temperature difference (IETD) is defined as the blackbody temperature difference that produces a signal equal to the signal caused by the different responses of the detectors. It is important in staring arrays because it can be the main source of noise. In terms of 3-D noise, IETD is the collective noise attributed to σ_vh, σ_v, and σ_h; that is,

IETD = √(σ_vh² + σ_v² + σ_h²)    (2.65)
Again, in many advanced FPAs, the random spatial noise is the only significant factor, so IETD is approximately the random spatial noise. Note that IETD is small when nonuniformity correction (NUC) has been applied to the sensor under test. Finally, correctability describes the residual spatial noise after the calibration and NUC of the sensor and is normalized to the random spatio-temporal noise, σ_tvh. A value of one means that the spatial noise after correction is equal to the random spatio-temporal noise of the system; that is,

C = √(σ_vh² + σ_v² + σ_h²) / σ_tvh    (2.66)
The most desirable situation occurs when the sensor is limited by the random spatio-temporal noise (i.e., the correctability is less than one).

Table 2.1 3-D Noise Components

Noise Component | Description | Potential Source
σ_tvh | Random spatio-temporal noise | Detector temporal noise
σ_tv | Temporal row bounce | Line processing, 1/f, readout
σ_th | Temporal column bounce | Scan effects
σ_vh | Random spatial noise | Pixel processing, detector nonuniformity
σ_v | Fixed row noise | Detector nonuniformity, 1/f
σ_h | Fixed column noise | Detector nonuniformity, scan effects
σ_t | Frame-to-frame noise | Frame processing
S | Mean of all components |

In modeling the effect of noise on the sensor in which the sensor noise is only comprised of random spatio-temporal and random spatial noise, the contributions of the noise parameters are

Ω(f_x) = σ_tvh² E_t E_v(f_x) E_h(f_x) + σ_vh² E_v(f_x) E_h(f_x)    (2.67)
where E_t, E_v(f_x), and E_h(f_x) are the temporal, vertical, and horizontal integrations associated with the eye/brain, and f_x is the spatial frequency in cycles per milliradian. For uncorrelated noise, the temporal integration can be estimated by

E_t = 1/(FR τ_e)    (2.68)
where FR is the frame rate of the sensor and τ_e is the integration time constant of the eye. Note that the denominator of (2.68) gives the number of frames that the eye integrates in a time constant. Therefore, the noise contribution of the random spatio-temporal component in (2.67) is reduced by the number of frames that are integrated by the eye. The random spatial noise contribution remains constant with time.
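The directional-averaging idea behind Table 2.1 can be sketched numerically. The Python example below estimates the three fixed-pattern terms used in (2.65) and (2.66), plus the random spatio-temporal term, from a simulated T × V × H data cube. The decomposition follows the averaging logic described in the text rather than the full NVESD laboratory procedure of [13-15], and the simulated noise amplitudes are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
T, V, H = 32, 64, 64

# Simulated noise cube: random spatio-temporal noise plus fixed row, column, and pixel patterns.
cube = (0.05 * rng.standard_normal((T, V, H))          # sigma_tvh-like term
        + 0.02 * rng.standard_normal((1, V, 1))        # fixed row noise (sigma_v)
        + 0.02 * rng.standard_normal((1, 1, H))        # fixed column noise (sigma_h)
        + 0.03 * rng.standard_normal((1, V, H)))       # random spatial noise (sigma_vh)
cube = cube - cube.mean()

# Directional averages isolate components that fluctuate only in certain dimensions.
time_avg = cube.mean(axis=0)                           # V x H frame with temporal noise averaged out
sigma_v = time_avg.mean(axis=1).std()                  # fixed row noise estimate
sigma_h = time_avg.mean(axis=0).std()                  # fixed column noise estimate
sigma_vh = (time_avg
            - time_avg.mean(axis=1, keepdims=True)
            - time_avg.mean(axis=0, keepdims=True)).std()   # residual random spatial noise
sigma_tvh = (cube - time_avg).std()                    # residual random spatio-temporal noise

ietd = np.sqrt(sigma_vh ** 2 + sigma_v ** 2 + sigma_h ** 2)   # (2.65)
correctability = ietd / sigma_tvh                             # (2.66)
print(round(sigma_tvh, 3), round(ietd, 3), round(correctability, 2))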
2.7 Electro-Optical and Infrared Imaging Systems

There are numerous engineers and scientists who consider electro-optical imaging systems as those that view reflected light and infrared imaging systems as those that view emitted light. In Figure 2.3, the image on the left was formed when the imager viewed light that was completely reflected by the target and background. For the image on the right, the light imaged was emitted by the target and the background. Electro-optical systems cover the 0.4- to 3.0-µm bands. The infrared band certainly covers the 8- to 12-µm band (LWIR), and most of the time it is taken to include the 3- to 5-µm band (MWIR). At night, the MWIR band provides a mostly emissive target and background signature, while in the daytime, the signature is a combination of emitted light and solar light that is reflected by the target and the background. This MWIR daytime case, as well as the case for electro-optical systems with all reflected light, is extremely difficult to characterize. In both measurements and performance modeling, the reflected light case associated with electro-optical systems is a more difficult problem because it is a two-path problem. The first path is from the illuminator (i.e., the sun in most cases) to the target or the background. The light from the first path is multiplied by the target or background reflectivities, and the second path is from the target, or background, to the imager. The second path is the exitance of reflected flux through the atmosphere into the sensor aperture and onto the focal plane, where it is converted to electrons by the detector and then processed and displayed for human consumption. The infrared case is much simpler and is a single-path problem. The light is emitted from the target, traverses the single atmospheric path, enters the optics, is converted to electrons by the detector, and is processed and displayed for human consumption. In Chapter 3, the overall system performance is considered to include sensitivity and resolution.
2.8 Summary

This chapter introduced the basic imaging system and its components. The concepts of resolution and sensitivity have been discussed. The imaging system components have been introduced, and their contributions to overall system resolution and sensitivity were presented. These discussions, along with the presented sampling theory, aid in the formation of the overall system performance metrics that are developed in Chapter 3.
References

[1] Driggers, R. G., P. Cox, and T. Edwards, Introduction to Infrared and Electro-Optical Systems, Norwood, MA: Artech House, 1999, p. 8.
[2] Goodman, J., Introduction to Fourier Optics, New York: McGraw-Hill, 1968, pp. 17–18.
[3] Gaskill, J., Linear Systems, Fourier Transforms, and Optics, New York: Wiley, 1978, p. 72.
[4] Gaskill, J., Linear Systems, Fourier Transforms, and Optics, New York: Wiley, 1978, p. 47.
[5] Holst, G., Electro-Optical Imaging System Performance, Orlando, FL: JCD Publishing, 1995, p. 127.
[6] Vollmerhausen, R., and R. Driggers, Analysis of Sampled Imaging Systems, Ch. 4, Bellingham, WA: SPIE Press, 2001.
[7] Overington, I., Vision and Acquisition, New York: Crane and Russak, 1976.
[8] Vollmerhausen, R., Electro-Optical Imaging System Performance Modeling, Chapter 23, Bellingham, WA: ONTAR and SPIE Press, 2000.
[9] Gaskill, J., Linear Systems, Fourier Transforms, and Optics, New York: Wiley, 1978, p. 60.
[10] Vollmerhausen, R., and R. Driggers, Analysis of Sampled Imaging Systems, Bellingham, WA: SPIE Press, 2001, pp. 68–69.
[11] Vollmerhausen, R., and R. Driggers, Analysis of Sampled Imaging Systems, Bellingham, WA: SPIE Press, 2001, pp. 73–85.
[12] Lloyd, M., Thermal Imaging Systems, New York: Plenum Press, 1975, p. 166.
[13] D'Agostino, J., "Three Dimensional Noise Analysis Framework and Measurement Methodology for Imaging System Noise," Proceedings of SPIE, Vol. 1488, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing II, Orlando, FL, April 3, 1991, pp. 110–121.
[14] Webb, C., P. Bell, and G. Mayott, "Laboratory Procedures for the Characterization of 3-D Noise in Thermal Imaging Systems," Proceedings of the IRIS Passive Sensors Symposium, Laurel, MD, March 1991, pp. 23–30.
[15] Webb, C., "Approach to 3-D Noise Spectral Analysis," Proceedings of SPIE, Vol. 2470, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing VI, Orlando, FL, April 19, 1995, pp. 288–299.
CHAPTER 3
Target Acquisition and Image Quality

3.1 Introduction

In Chapter 2, we reviewed the basic principles of imaging systems. In this chapter, we study methods of determining image performance, including target acquisition theory and an image quality metric. Signal or image processing is often used to enhance the amount of information in an image available to an observer. If an observer is using imagery to perform a task, then an enhancement in the information content available to the observer will result in an improvement in observer performance. It is logical, then, to assess image processing techniques based on observer performance with and without the application of the image processing technique being assessed. One application in which this technique has been used extensively is military target acquisition with imaging sensors. These sensors often operate in conditions in which the image formed is significantly degraded by blur, noise, or sampling artifacts. Image processing is used to improve the image quality for the purpose of improving target acquisition performance. An example of this type of assessment is given by Driggers et al. [1]. In this chapter, a theory of target acquisition is reviewed. First, a brief history of the military development of target acquisition is presented, followed by a discussion of human threshold vision. A metric based on threshold vision is then provided and related to the statistical performance of human observers. The chapter concludes with a discussion of how these results can be used in the assessment of image processing techniques.
3.2 A Brief History of Target Acquisition Theory

In 1958, John Johnson of the U.S. Army Engineer Research and Development Laboratories, now NVESD, presented a methodology for predicting the performance of observers using electro-optic sensors [2]. The Johnson methodology proceeds from two basic assumptions:

1. Target acquisition performance is related to perceived image quality;
2. Perceived image quality is related to threshold vision.

The first assumption is recognition of the fact that our ability to see a target in an image is strongly dependent on the clarity of the image. A blurred and noisy
image degrades target acquisition performance. The second assumption acknowledges that the only information that affects the target acquisition process is that which transits the human visual system. Details in the image that are imperceptible to the observer have no bearing on target acquisition performance. Together, these two assumptions encompass limitations on target acquisition performance imposed by both the sensor and the observer.

The Johnson methodology used a measure of image quality based on the limiting spatial frequency that an observer could see. Limiting frequencies were determined empirically by observers viewing bar patterns with the sensor. By adjusting the frequency and intensity of the bar pattern, a curve was traced out defining the threshold values for the observer using the sensor. After averaging over multiple observers, this curve became the characterization of the threshold response of a "typical" observer. This curve is the minimum resolvable contrast (MRC) curve for electro-optic sensors. The equivalent for thermal infrared sensors is the minimum resolvable temperature difference (MRTD). Image quality measured in this way encompasses two key concepts: the threshold response of the observer at all spatial frequencies (MRC or MRTD) and a metric of image quality based on this threshold response (limiting frequency).

With the assumption that image quality influenced target acquisition task performance, Johnson was able to relate limiting frequency to the range performance of observers performing target acquisition tasks. This was done by empirically determining the number of line pairs (period of the limiting frequency) of a bar target that could be placed across the height of a complex target at range and at the same contrast as the bars. By multiplying the limiting frequency for a given target contrast by the target's critical dimension and dividing by the range to that target, the number of line pairs across the target could be determined and was given the value N. Johnson then performed experiments to determine the number of line pairs (N) needed for observers to perform a task at a 50 percent probability of success. Tabulations of the number of line pairs for various acquisition tasks were created empirically and became known as the Johnson criteria, or the N50 values for task performance. A tabulation of some typical values of N50 for military vehicle target acquisition tasks is shown in Figure 3.1.

Later, researchers would develop analytic models for predicting MRC and MRTD [3]. The best known of these models was FLIR92 [4], which provided sensor designers with a tool for predicting the MRTD of thermal sensors. They would also develop an empirically determined function for the probability of task performance based on the Johnson criteria, known as the target transfer probability function (TTPF), and would incorporate the effects of atmospheric transmission into the process. These developments would be packaged in a separate model known as ACQUIRE [5]. The entire process summarized in the Johnson methodology is illustrated in Figure 3.1. Over the past few years, this methodology has been reinterpreted in light of psychophysical research to form a new view of the theory of target acquisition, which differs in substance from Johnson's approach but not in principle.
Figure 3.1 Johnson's methodology relating image quality to target acquisition performance: the inherent target temperature difference ∆T is reduced by atmospheric transmission to ∆T′; the MRT curve converts ∆T′ to a resolvable frequency f′; the number of resolvable cycles is N = f′ × h/R for a target critical dimension h at range R; and the target transfer probability function (TTPF), anchored by the N50 values for each task (detect 1.0, aim 2.5, recognize 4.0, identify 8.0), converts N to a probability of task performance versus range.

Developing a model based on Johnson's assumptions requires prediction or measurement of a threshold vision function of the observer through the sensor, the use of an appropriate image quality metric based on the predicted or measured threshold vision function, and a methodology for relating that metric to task performance.
The new view uses these same procedures to create a model better grounded in current psychophysical theory. The main element of the new approach is the replacement of the limiting frequency metric for image quality with the target task performance (TTP) metric. This new element has dramatically improved the accuracy of predictions made using the Johnson methodology. An attendant benefit of this new approach is its greater ability to predict the performance of sensors using advanced image processing techniques. A description of the elements of this approach follows.
3.3 Threshold Vision

3.3.1 Threshold Vision of the Unaided Eye
Threshold vision of the observer in the absence of a sensor can be defined in terms of the contrast threshold function (CTF) of the eye, which defines the observer's threshold contrast for a sine wave grating. Contrast for a sine wave is defined by

C = ∆L_peak-to-peak/(2 L_avg) = (L_max − L_min)/(L_max + L_min)    (3.1)
where L_max and L_min are the maximum and minimum luminance of the sine wave, respectively. The expression on the right is an equivalent statement.
In repeated observations, an observer's probability of detecting the presence of a sine wave will increase from a guessing rate to unity as the contrast of the sine wave increases. The threshold is normally defined as the contrast value for a probability level between the guessing rate and unity, and it is dependent upon the psychophysical technique used to measure the threshold. A common method of measuring the CTF uses a two-alternative, forced-choice procedure with a guessing rate of 50 percent. The threshold contrast is measured as the contrast at which 75 percent of the responses are correct. In general, CTF varies between observers, light levels, and display sizes. In Figure 3.2, a chirped-frequency sine wave with variation in the horizontal direction only is shown. The chart, known as a Campbell-Robson chart [6], is formed by varying frequency in the horizontal direction by some power, in this case 2.5, and varying the peak-to-peak amplitude in the vertical direction by some power, in this case 2. In doing this, accepting the limitations of the display, the peak-to-peak amplitude measured at any point along any horizontal line is constant. Despite this fact, human observers will see horizontal variation in the amplitude of the sine wave, especially toward the bottom of the chart. This is due to the early vision limits of the eye and forms a fundamental limit to the amount of visual information available to an observer. Although the threshold behavior of individuals will vary, for the purposes of modeling, an average contrast threshold response can be used. Barten has provided an empirical fit for CTF to data from an ensemble of observers [7]. Barten's equation is given by

CTF(f) = [a f e^(−bf) √(1 + c e^(bf))]^(−1)    (3.2a)
where f is the spatial frequency in cycles per degree,

a = 540(1 + 0.7/L)^(−0.2) / [1 + 12/(w(1 + f/3)²)]    (3.2b)

w is the root area of the image in degrees, L is the display luminance in candelas per square meter,

b = 0.3(1 + 100/L)^0.15    (3.2c)

and c is 0.06.

Figure 3.2 Campbell-Robson chart demonstrating the contrast threshold characteristics of the eye.

In Figures 3.3–3.5, the Barten CTF in (3.2) is compared with measured CTF data from three sources. The first is data taken by Van Meeteren as reported by Beaton and Farley [8]. The second source is data taken from a project known as Modelfest [9], which consists of a series of visual psychophysical experiments conducted by a large group of researchers over the course of a few years. The third source is previously unpublished measurements conducted by the author at the NVESD in 2005–2006. The Barten equation approximates well the measurements conducted by Van Meeteren (see Figure 3.3), but this should not be surprising, since these measurements formed a large part of the basis for Barten's approximation. However, the measurements conducted at NVESD (see Figure 3.5) are similar to the Van Meeteren measurements and are well represented by the Barten equation. The Barten equation predicts a lower threshold contrast than measured in the Modelfest
Figure 3.3 Measured threshold contrast from Van Meeteren compared with the Barten prediction, plotted from 0 to 2.5 cycles per milliradian. The display had a root area of 13.67 degrees and an average luminance of 10 candelas per square meter.
Figure 3.4 Threshold contrast as measured during Modelfest and the Barten prediction, plotted from 0 to 2.5 cycles per milliradian. The root area of the display was 2.133 degrees and the average luminance was 30 candelas per square meter.
Figure 3.5 Threshold contrast measured at NVESD compared to the Barten prediction, plotted from 0 to 2.5 cycles per milliradian. The root area of the display was 13.41 degrees and the average luminance was 34 candelas per square meter.
data (see Figure 3.4). A possible reason for this is that both Van Meeteren's measurements and the NVESD measurements were conducted using unlimited presentation time, while the Modelfest data used a two-interval, forced-choice measurement. The Modelfest stimuli were presented in one of two short sequential time intervals, and the observer was required to identify the interval in which the stimuli occurred. As a result, the task was somewhat more difficult, which would lead to higher threshold levels. Nevertheless, the shapes of the measured curves are in good agreement with the Barten equation and, given that an unlimited presentation time is assumed for most target acquisition tasks, the Barten equation appears to be a good approximation to the observer CTF.
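Equation (3.2) is straightforward to implement. The Python sketch below evaluates the Barten CTF for a display with an assumed root area and luminance (values chosen to resemble the Figure 3.3 conditions). Note that (3.2) takes spatial frequency in cycles per degree, so a conversion from cycles per milliradian is included.

import numpy as np

def barten_ctf(f_cpd, L_cdm2, w_deg):
    """Barten CTF approximation of (3.2); f_cpd in cycles per degree."""
    a = 540.0 * (1.0 + 0.7 / L_cdm2) ** -0.2 / (1.0 + 12.0 / (w_deg * (1.0 + f_cpd / 3.0) ** 2))
    b = 0.3 * (1.0 + 100.0 / L_cdm2) ** 0.15
    c = 0.06
    return 1.0 / (a * f_cpd * np.exp(-b * f_cpd) * np.sqrt(1.0 + c * np.exp(b * f_cpd)))

# Assumed viewing conditions similar to Figure 3.3: 10 cd/m^2 and a 13.67-degree root area.
f_cpm = np.linspace(0.1, 2.5, 6)                 # cycles per milliradian
f_cpd = f_cpm * (np.pi / 180.0) * 1000.0         # 1 degree = (pi/180)*1000 milliradians

ctf = barten_ctf(f_cpd, L_cdm2=10.0, w_deg=13.67)
print(np.round(ctf, 4))                          # rises from a few thousandths toward 1 near 2.5 cyc/mrad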
3.3.2 Threshold Vision of the Aided Eye
The Barten equation allows a prediction to be made of the threshold contrast of an observer viewing a target with the naked eye. This equation must be modified to account for the use of a sensor by the observer. It must be modified to account for both the benefits and the attendant penalties of using a sensor. There are two primary reasons an observer uses a sensor to aid visual acuity. The first reason is to gain vision in a portion of the electromagnetic spectrum where there would normally be no vision. An example is the use of an image intensification device. This device amplifies photons in the near infrared (NIR) portion of the spectrum and displays them in the visible region, allowing the user to see under conditions in which vision would normally be severely limited or impossible. The second reason is to gain magnification. Optical binoculars, telescopes, and microscopes are all examples of devices primarily designed to provide magnification. Magnification has the effect of linearly expanding the CTF along the spatial frequency axis, or

f_sensor = M f_eye    (3.3)

where f_sensor is spatial frequency measured from the sensor to the target, f_eye is spatial frequency measured from the eye to the target, and M is the magnification. The magnification is a function of the angular size of the displayed image as measured from the observer, compared with the angular size of the object as measured from the sensor, and is given by

M = θ_display/θ_object    (3.4)
Often in target acquisition, spatial frequencies are referenced to the target, which may be several kilometers from the observer. Equations (3.3) and (3.4) allow these spatial frequencies to be converted into equivalent spatial frequencies of the eye viewing a display. Ideally, a sensor would be a transducer of invisible photons into the visible region of the spectrum and provide magnification without any loss of information. Unfortunately, no “real” sensors can do this. The absorption of photons in detectors is not perfectly efficient, physical limits on the size of optics and detectors limit the spatial frequencies that can be seen resulting in blur, and many of the components used to make the sensor introduce noise. Indeed, the process of reflection, transmission, and absorption of photons in the environment is a noisy process. Real sensors can be characterized by the amount of blur (spatial frequency bandwidth limitations) and noise that they introduce. These effects exist even if a “pristine” image is presented on an electronic display as the display introduces
noise and blur. Both of these degradations involve a loss of information. With blur, the loss is with respect to modulation of an incoming signal. The modulation transfer function (MTF) describes the amount of contrast modulation, as expressed by (3.1), lost at a particular spatial frequency. For most common components of a sensor, high frequencies are attenuated more than low frequencies. If the observer and sensor are considered together as a system, the resulting CTF, in the presence of a blurry sensor, is increased at the higher spatial frequencies over what would be expected from a perfect sensor. The amount of increase in CTF is proportional to the amount of contrast lost as characterized by the MTF. Noise introduced at a given spatial frequency introduces uncertainty into the perceptual process. The intensity of the sine wave stimulus must be raised in order to overcome the uncertainty introduced by the noise, resulting in a higher contrast threshold. Both of these effects can be incorporated into a general equation for threshold vision of an observer through an electro-optic sensor, including both noise and MTF [10]. This equation characterizes the observer and the sensor as a system. It can be expressed as

CTF_SYS(f) = [CTF(f/M)/H_sys(f)] {1 + k² σ_N²(f)}^(1/2)    (3.5)
where f is the spatial frequency of a sine wave target measured from the sensor, M is the magnification of the sensor, H_sys is the MTF of the sensor system, and σ_N is the standard deviation of the perceived noise modulation in units of squared contrast per root hertz (Hz). The constant k is an empirically determined calibration factor having a value of 169.6 root Hz. The perceived noise is found by taking the spatio-temporal noise spectrum present at the eye and applying a linear filter representative of the perceptual process. The human visual system is believed to use spatial frequency and temporal channels tuned to the signal to mediate stimuli to the centers of perception in the brain [11]. These channels can be modeled as linear bandpass filters centered on the frequency of the stimulus. Treated as a linear system, the perceived noise for a stimulus consisting of noise and horizontally varying sine waves is given by

σ_N²(f) = ∫_{−∞}^{+∞} df_Y′ ∫_{−∞}^{+∞} df_X′ S_N(f_X′, f_Y′) |H_eye(f_X′/M) H_eye(f_Y′/M) H_S(f_X′; f)|²    (3.6)
where f_X′, f_Y′, and f are variables of integration associated with the horizontal spatial, vertical spatial, and temporal frequencies, respectively. The power spectral density (PSD) is the spatio-temporal power spectrum of the noise contrast at the eye. It will have units of intensity and inverse spatial and temporal frequency, such as watts-square milliradians-seconds. The spatial channel response for a horizontal sine wave is given by

H_S(f′; f) = exp{−2.2 [log(f′/f)]²}    (3.7)
The transfer function of the eye is given by (2.37) through (2.43) in Chapter 2. The response of the eye for a vertically oriented sine wave is found similarly. The PSD S_N is found by computing the power spectrum of the noise signal from the sensor converted to contrast. Noise in a sensor typically arises during the detection process. Subsequent elements of the sensor, such as the display and any electronic processing, then filter this noise. The display is assumed to linearly map the noise into luminance. The spatial power spectrum of the noise at the eye can then be found by

S_N(f_X, f_Y) = S_sensor(f_X, f_Y) |H_sensor(f_X, f_Y)|² / I_scn²    (3.8)
where S_sensor is the spatial noise spectrum at the source within the sensor, H_sensor is the combined transfer function of all elements, including the display, between the noise source and the eye, and I_scn is an intensity (measured in the same units as the noise) that maps to the average display luminance. For a monochromatic and linear display with I_min mapping to black and I_max mapping to white, this intensity is given by I_scn = (I_max + I_min)/2.

The eye-brain has a temporal channel and integrates stimuli over time. This effect is captured in the calibration constant k of (3.5) and is reflected in the fact that it is given in root hertz. Temporal integration (the equivalent of lowpass filtering) in the eye averages the temporal variations in the noise and reduces its effect on the observer. For typical frame rates in cameras, the temporal integration of the eye is the limiting factor, and this effect is adequately modeled by the integration inherent in the constant k. Therefore, (3.8) can be used, neglecting the temporal aspects of the sensor. However, if the temporal character of the noise is changing at a scale below the sensor frame time, it can limit the effect of the temporal integration of the eye. An example in which this condition can occur is with fixed-pattern noise. Fixed-pattern noise sources have spatial variation, but no or very slow temporal variation. As shown in Section 3.5, the sensor transfer function in (3.8) will still yield a temporal response and will then be the limiting temporal factor in the system. Multiplying by the constant k in (3.5) inappropriately reduces the noise by a factor related to the temporal bandwidth of the eye. To compensate for this, the noise PSD in (3.8) can be increased by a similar factor. The temporal bandwidth of the eye is inversely related to the eye integration time. An empirical curve fit to data obtained by Schnitzler [12] provides a good estimate of the eye integration time. It is an inverse function of L, the average luminance in candelas per square meter, and in equation form is given by

τ_eye = 0.0192 + 0.078 L^(−0.17)  (seconds)    (3.9)
The power spectrum of fixed-pattern noise must be multiplied by this quantity to compensate for the temporal integration inherent in the factor k.
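To make the compensation concrete, the following Python sketch evaluates the eye integration time of (3.9) and applies it to a fixed-pattern-noise PSD as described above. It is a minimal illustration, not part of the model code; the function names and the example luminance (the 15 cd/m^2 value used later in Section 3.5) are our own choices.

```python
def eye_integration_time(luminance_cd_m2):
    """Eye integration time in seconds from the empirical fit of (3.9);
    luminance_cd_m2 is the average display luminance in candelas per
    square meter."""
    return 0.0192 + 0.078 * luminance_cd_m2 ** (-0.17)

def compensate_fixed_pattern_psd(psd_fpn, luminance_cd_m2):
    """Scale a fixed-pattern-noise PSD by the eye integration time so that
    the temporal-integration credit built into the calibration constant k
    is not applied to noise that does not vary in time."""
    return psd_fpn * eye_integration_time(luminance_cd_m2)

# Example: the 15 cd/m^2 display assumed in the example of Section 3.5.
tau = eye_integration_time(15.0)
print(f"eye integration time at 15 cd/m^2: {tau * 1e3:.1f} ms")
```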
3.4 Image Quality Metric

In the context of target acquisition, an ideal image quality metric measures the amount of information in a scene that is available to an observer. Since the scene typically cannot be known or specified a priori, the measure of information in the scene must be made in some statistically meaningful sense. The results from the previous section give lower limits on visible contrast due to limitations of the sensor and the human observer. A measure of information in a scene then might be the amount of contrast available above this lower limit. While there are many possible ways of forming such a metric, the following target task performance (TTP) metric has been shown to have a high degree of correlation with observer performance [13]:

V = \frac{\sqrt{A_t}}{R} \int_{f_1}^{f_2} \sqrt{\frac{C_t}{CTF_{SYS}(f)}} \, df  (3.10)
Equation (3.10) defines a quantity that is proportional to the amount of information beyond threshold at the frequency f. Ct is the value of contrast associated with an ensemble of targets. The limits of the integral f1 and f2 are the frequencies where the contrast threshold CTFSYS(f) is equal to Ct. The area At is the area of the target measured in the same plane as spatial frequency, and R is the range to that plane. Two common planes used in imaging are the target plane, as measured from the sensor, and the display plane, as measured from the observer. Equation (3.10) can be used in either case as long as care is taken to measure the target area, range, and spatial frequencies in the same plane. The contrast Ct is used to represent the average information content of an ensemble of targets with respect to their local background. It is taken as a constant over frequency. Ideally, this representation should be consistent with (3.1) and have a strong correlation with the psychophysical response of observers. For complex scenes, the sine wave contrast definition given in (3.1) becomes problematic, and no satisfactory definition for all types of tasks of interest yet exists [14, 15]. The measure of contrast most often used in the target acquisition community is the RSS contrast. RSS contrast is defined by [16]

C_t = \frac{\sqrt{(\mu_t - \mu_b)^2 + \sigma_t^2}}{2 I_{sc}}  (3.11)
where µt and µb are the average target and background intensities in the scene. Intensity can be measured in terms of physical quantities, such as temperature or luminance, or as integer quantities, such as photon counts or pixel values. The symbol σt is the standard deviation of the target intensity. The symbol Isc is the scene contrast intensity, which represents the intensity needed to generate the average luminance of the display. Often for thermal sensors, only the numerator of (3.11) measured in Kelvin is reported as a contrast. To use these quantities in (3.10), they must be normalized by an appropriate value of the scene contrast intensity (in Kelvins) as in (3.11).
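The RSS contrast of (3.11) is straightforward to evaluate from target and background statistics. The sketch below is one possible Python implementation; the function and argument names are ours, and the intensities may be temperatures, luminances, or pixel values as discussed above.

```python
import numpy as np

def rss_contrast(target_pixels, background_pixels, scene_contrast_intensity):
    """RSS target-to-background contrast of (3.11).

    target_pixels and background_pixels are arrays of intensities;
    scene_contrast_intensity is the quantity I_sc that maps to the average
    display luminance."""
    mu_t = np.mean(target_pixels)
    mu_b = np.mean(background_pixels)
    sigma_t = np.std(target_pixels)
    return np.sqrt((mu_t - mu_b) ** 2 + sigma_t ** 2) / (2.0 * scene_contrast_intensity)
```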
Assuming that the contrast is constant with frequency implies that the frequency content of the ensemble average of targets under consideration is spectrally white. This is a reasonable assumption for target sets having a wide variety of sizes and levels of detail. However, if a target set consists of highly similar or identical targets in both size and detail, perhaps differing only in orientation, then this assumption may not be valid. A modification of the definition of contrast has been developed for this situation [17]. In this formulation of contrast, the constant contrast is replaced with a term proportional to the Fourier transform of the target. The reader is referred to [17] for further details.
The image quality measure defined in (3.10) has a one-dimensional (1-D) dependence on frequency. Images are two-dimensional (2-D) quantities so they are 2-D in frequency. Equation (3.10) could be reformulated in terms of a 2-D definition of the CTF. This requires a 2-D definition of the eye CTF, along with a 2-D definition of the noise terms. Instead, a 2-D form of (3.10) is constructed, assuming separability of the horizontal and vertical directions. Many imaging systems have a high degree of separability, and this approximation does not lead to significant errors in most cases. The 2-D form of (3.10) is then given by

V_{2D} = \sqrt{V^H V^V} = \left[ \frac{A_t}{R^2} \int_{f_{Y1}}^{f_{Y2}} \int_{f_{X1}}^{f_{X2}} \frac{C_t}{\sqrt{CTF_{SYS}^H(f_X)\, CTF_{SYS}^V(f_Y)}} \, df_X \, df_Y \right]^{1/2}  (3.12)
where the superscript H (or V) indicates that the quantity is calculated with respect to the horizontal (or vertical) dimension. The limits of integration are calculated as before, namely, where the target-to-background contrast is equal to the CTF in that dimension. Insight into the nature of this image quality metric can be gained from rewriting (3.12) in a different form. First, define the minimum discernable contrast as

C_0 = \min \sqrt{CTF_{SYS}^H(f_X)\, CTF_{SYS}^V(f_Y)}  (3.13)
and the target solid angle as

\Omega_t = \frac{A_t}{R^2}  (3.14)
Factoring these terms out of (3.12) allows the equation to be written as

V_{2D} = \left[ \frac{C_t}{C_0}\, \Omega_t \int_{f_{Y1}}^{f_{Y2}} \int_{f_{X1}}^{f_{X2}} \frac{C_0}{\sqrt{CTF_{SYS}^H(f_X)\, CTF_{SYS}^V(f_Y)}} \, df_X \, df_Y \right]^{1/2}  (3.15)
The integral in (3.15) now represents a measure of the spatial bandwidth associated with the observer/sensor system. This bandwidth is inversely proportional to some solid angle in space, essentially, the minimum spatial detail the observer can resolve. Define
\Omega_0 = \left[ \int_{f_{Y1}}^{f_{Y2}} \int_{f_{X1}}^{f_{X2}} \frac{C_0}{\sqrt{CTF_{SYS}^H(f_X)\, CTF_{SYS}^V(f_Y)}} \, df_X \, df_Y \right]^{-1}  (3.16)
Then, (3.15) becomes

V_{2D} = \left[ \frac{C_t}{C_0} \cdot \frac{\Omega_t}{\Omega_0} \right]^{1/2}  (3.17)
The ratios in (3.17) can be interpreted as the number of degrees of freedom in both contrast and resolution, which are fundamental measures of information content. This metric captures variations in the contrast of the scene, the amount of blur in sensor and display devices, and the amount of noise introduced in the sensing process. Most modern imaging systems use detectors configured in a focal plane array (FPA). The FPA produces a discrete sampled representation of the scene imaged. This sampling process produces artifacts in the image that can interfere with target acquisition tasks. In the context of (3.17), sampling can reduce the usable information content of the image. The mathematics of describing sampling was described in Chapter 2. As shown there and repeated here, in an expanded form, the response of a linear system in the presence of sampling is given by

H_{SAMP}(f) = H_{post}(f) \sum_{n=-\infty}^{+\infty} H_{pre}(f - n f_s) = H_{post}(f) H_{pre}(f) + H_{post}(f) \sum_{n \neq 0} H_{pre}(f - n f_s)  (3.18)
where Hpre is the linear product of all transfer functions prior to the sampling process, Hpost is the linear product of all transfer functions posterior to the sampling process, and fs is the sampling frequency, which is the inverse of the sample spacing. The second term in (3.18) represents replicas of the presample transfer function filtered by the postsample transfer function. It is scene information filtered by these replicas that interferes with the target acquisition process. As discussed in Chapter 2 [see discussion of (2.55)-(2.57)], three aggregate quantities have been used to describe the effects of sampling in imagers: total integrated spurious response, in-band spurious response, and out-of-band spurious response. Of these three, the out-of-band spurious response has been shown to be the most deleterious to target acquisition performance in many cases of interest. The out-of-band spurious response metric defined by (2.57) can be calculated in many ways. In practice, this quantity is calculated as

SR_{out\text{-}of\text{-}band} = \frac{\int_{f_s/2}^{2.5 f_s} \left| \sum_{n=-2,-1,1,2} H_{pre}(f - n f_s)\, H_{post}(f) \right|^2 df}{\int_{0}^{2.5 f_s} \left| H_{pre}(f)\, H_{post}(f) \right|^2 df}  (3.19)
In (3.19) the sum is limited to the first four replicas and the integration is truncated at 2.5 times the sample frequency. This is done for ease of calculation. The image quality metric is then modified by the out-of-band spurious response by an empirically derived relationship given by

V_{2D} = \sqrt{ \left(1 - 0.58\, SR_{out\text{-}of\text{-}band}^H\right) V^H \left(1 - 0.58\, SR_{out\text{-}of\text{-}band}^V\right) V^V }  (3.20)
The second modification is to integrate the image quality metric up to the lesser of the contrast limit or the half-sample frequency.
The TTP metric has been related to human performance of target acquisition tasks. The relationship is obtained by empirically fitting a parameterized function to results from carefully controlled perception experiments [13]. The relationship is given by

P_{task}(V_{2D}) = \frac{(V_{2D}/V_{50})^{\alpha + \beta (V_{2D}/V_{50})}}{1 + (V_{2D}/V_{50})^{\alpha + \beta (V_{2D}/V_{50})}}  (3.21)

where Ptask is the probability of performing a defined target acquisition task, and V50 is an empirically determined constant representing the value of (3.12) corresponding to a 50 percent task performance probability. The difficulty of a task depends on the set of objects under consideration. Identifying highly similar items is a more difficult task than identifying highly disparate items. Values for V50, along with characteristic dimensions and contrast values for a variety of discrimination tasks, are shown in Tables 3.1 and 3.2 for LWIR, MWIR, and visible sensors. The exponents α and β are also empirically determined constants. Commonly used values for military target acquisition tasks are 1.54 for α and 0.24 for β.
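The chain from spurious response to task probability can be sketched numerically as follows. The Python fragment below estimates the out-of-band spurious response of (3.19) by quadrature, applies the degradation of (3.20) as reconstructed above, and evaluates the probability function of (3.21). It is an illustrative sketch, not the NVESD model code; the function names, frequency grid, and default exponents are our own choices (the exponents follow the commonly used values quoted above).

```python
import numpy as np

def out_of_band_sr(h_pre, h_post, fs, n_pts=4096):
    """Numerical estimate of the out-of-band spurious response of (3.19).

    h_pre and h_post are callables returning the pre- and postsample transfer
    functions at frequency f (they must accept negative arguments, since the
    replicas extend below zero frequency); fs is the sampling frequency."""
    f = np.linspace(0.0, 2.5 * fs, n_pts)
    replicas = sum(h_pre(f - n * fs) for n in (-2, -1, 1, 2)) * h_post(f)
    baseband = h_pre(f) * h_post(f)
    out_band = f >= fs / 2.0
    num = np.trapz(np.abs(replicas[out_band]) ** 2, f[out_band])
    den = np.trapz(np.abs(baseband) ** 2, f)
    return num / den

def degraded_v2d(v_h, v_v, sr_h, sr_v):
    """Combine directional metrics into V2D with the spurious-response
    degradation of (3.20)."""
    return np.sqrt((1 - 0.58 * sr_h) * v_h * (1 - 0.58 * sr_v) * v_v)

def p_task(v2d, v50, alpha=1.54, beta=0.24):
    """Target transfer probability function of (3.21)."""
    exponent = alpha + beta * (v2d / v50)
    ratio = (v2d / v50) ** exponent
    return ratio / (1.0 + ratio)
```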
3.5 Example

As an example of how the equations of the previous sections can be used, consider a simple imaging system composed of a lens, an FPA of detectors, and a display device. This system is illustrated schematically in Figure 3.6. The lens is assumed circularly symmetric and without aberrations. The detectors in the FPA are assumed to be square and have an 80 percent linear fill factor in each dimension.

Table 3.1 Discrimination Data for Armored and Tracked Vehicles

Discrimination | Band | Object Set | Char. Dim. (Meters) | Contrast (as Given) | V50 (Cyc/tgt)
Armored vehicle recognition | LWIR | Tracked, wheeled armored, wheeled soft | 3.0 | 3.4K | 16.9
Tracked armored vehicle identification | LWIR | 2S3, BMP, M1A, M2, M60, M109, M113, M551, T55, T62, T72, ZSU2 | 3.0 | 4.7K | 23.3
Tracked armored vehicle identification | Visible | 2S3, BMP, M1A, M2, M60, M109, M113, M551, T55, T62, T72, ZSU2 | 3.0 | 0.28 (unitless) | 22
Table 3.2 Discrimination Data for Vehicles, Humans, and Handheld Objects

Discrimination | Band | Object Set | Char. Dim. (Meters) | Contrast (as Given) | V50 (Cyc/tgt)
Commercial and paramilitary vehicle ID | LWIR | Ford, sedan, HMMV, Panhard M3, SUV, ambulance, pickup, SUMB, van, ferret, police car, pickup with RPG | 2.2 | 7.3K | 27
Commercial and paramilitary vehicle ID | MWIR | Ford, sedan, HMMV, Panhard M3, SUV, ambulance, pickup, SUMB, van, ferret, police car, pickup with RPG | 2.2 | 7.0K | 29
Human identification | LWIR | Armed civilian, unarmed civilian, soldier, contractor, armed contractor, police officer | 0.7 | 5.0K | 19
Human identification | MWIR | Armed civilian, unarmed civilian, soldier, contractor, armed contractor, police officer | 0.7 | 5.4K | 14
Single handheld object ID | LWIR | Rock, camcorder, PDA, gun, knife, radio, mug, brick, grenade, flashlight, cell phone, soda | 0.1 | 3.0K | 17
Single handheld object ID | MWIR | Rock, camcorder, PDA, gun, knife, radio, mug, brick, grenade, flashlight, cell phone, soda | 0.1 | 3.3K | 18
Single handheld object ID | Visible | Rock, camcorder, PDA, gun, knife, radio, mug, brick, grenade, flashlight, cell phone, soda | 0.1 | 0.3 (unitless) | 9.5
Two handheld object ID | MWIR | Ax, pipe, broom, shovel, stick, AK47, M16, RPG, RPK, sniper rifle | 0.25 | 4.1K | 16
The linear fill factor is the width of a detector divided by the distance between adjacent detectors (the pitch). The display is assumed to be a CRT-type display with a Gaussian spatio-temporal response function. The performance of an observer viewing a static frame of imagery will be considered. The magnification of this system is determined by the angular size of the objects on the display and the angular size of objects at range. It can be determined by making use of the one-to-one correspondence between detectors and display pixels in this simple imaging system. The angular subtense associated with a single detector in object space is a function of the detector spacing and the optical focal length. It is given by

\theta_{object} = \frac{\Delta_{pitch}}{fl}  (3.22)
Figure 3.6 Simple imaging system (optics, focal plane array, and display).

where Δpitch is the pitch or spacing of the detectors and fl is the focal length of the lens. The sampling frequency will be the inverse of θobject. The angular subtense of a display pixel in eye space is found in a similar manner from the display pixel size and the observer viewing distance, or

\theta_{display} = \frac{\Delta_{pixel}}{R_{viewing}}  (3.23)

where Δpixel is the size of a display pixel and Rviewing is the viewing distance. The magnification can be found from (3.4). The MTF of the lens is given by (2.23) in Chapter 2 as

H_{optics}(\rho) = \begin{cases} \dfrac{2}{\pi}\left[ \cos^{-1}\!\left(\dfrac{\lambda\rho}{D}\right) - \dfrac{\lambda\rho}{D}\sqrt{1 - \left(\dfrac{\lambda\rho}{D}\right)^2} \right], & \rho \le \dfrac{D}{\lambda} \\ 0, & \text{otherwise} \end{cases}  (3.24)

where \rho = \sqrt{f_X^2 + f_Y^2}, λ is the diffraction wavelength, and D is the diameter of the lens. The spatial frequencies are measured in object space (cycles per milliradian on the object). The detectors are square with an angular subtense in object space of θdetector = Δdetector/fl. The temporal response is assumed to be rectangular with an integration time given by τdetector. The MTF of a detector is then given by

H_{detector}(f_X, f_Y) = \mathrm{sinc}(\theta_{detector} f_X)\, \mathrm{sinc}(\theta_{detector} f_Y)  (3.25)

The display is assumed to have a symmetric Gaussian spatial response. The MTF of the display is given by

H_{display}(f_X, f_Y) = \mathrm{Gauss}\!\left(\frac{\theta_{display} f_X}{M}\right) \mathrm{Gauss}\!\left(\frac{\theta_{display} f_Y}{M}\right)  (3.26)
where M is the magnification. The temporal response of the system is assumed to be limited by the detector integration time. Noise is described in the model as a PSD. In practice, noise is characterized by a measured noise power. For a zero mean noise process, this is the variance of the
noise. In Chapter 2, the 3-D noise model was described and is characterized in terms of the variance of the various directional noise terms. Assume that only spatio-temporal noise is present. The spatio-temporal noise measured at the output of the detectors is given by σ²tvh and arises as a result of a white-noise process filtered by the detector. The spatio-temporal noise σ²tvh is related to the PSD of the noise by the following:

\sigma_{tvh}^2 = S_{detector} \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} \left| H_{detector}(f_X, f_Y) \right|^2 df_X \, df_Y = \frac{S_{detector}}{\theta_{detector}^2}  (3.27)
The detector is assumed to have a temporal bandwidth of τ⁻¹detector and is assumed to be the limiting temporal filter in the system. The calibration constant k adds an additional temporal filter that for a static frame of imagery is not appropriate. The effects of the temporal filters can be accounted for by multiplying by a factor of (τeye/τdetector), where τeye is given by (3.9). The noise PSD at the eye is then found using (3.8) as

S_N(f_X, f_Y) = \frac{\tau_{eye}}{\tau_{detector}} \cdot \frac{\sigma_{tvh}^2 \, \theta_{detector}^2}{I_{scn}^2} \left| H_{detector}(f_X, f_Y)\, H_{display}(f_X, f_Y) \right|^2  (3.28)
To complete the description of the simple imaging system, parameters for (3.23)–(3.30) need to be assigned. Table 3.3 gives the assumed parameters. In addition to the values in Table 3.3, the image size is needed for the calculation of the CTF. This value was calculated using the display pixel size and the number of detectors in the array and assumes that each detector is mapped to a displayed pixel. The system described in Table 3.3 is typical of inexpensive LWIR microbolometer sensors. From these parameters, several intermediate results can be calculated. For instance, the angular subtense of the detector pitch is 0.5 milliradian in object space. The angular subtense of a display pixel is 1 milliradian in eye space. By (3.4), this results in a magnification of 2.

Table 3.3 Parameters for a Simple Example

Parameter | Value
Lens diameter | 5 cm
Focal length | 5 cm
Diffraction wavelength | 10 micrometers (µm)
Detector size | 20 µm
Detector pitch | 25 µm
Detector integration time | 16.7 milliseconds
Array size | 320 horizontal × 240 vertical
Display pixel size | 0.0381 cm
Viewing distance | 38.1 cm
Average display luminance | 15 candelas per square meter
σtvh | 50 mK
Average scene intensity (Isc) | 1.5K
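Before computing MTFs and CTFs, the intermediate geometric quantities quoted above can be checked directly from Table 3.3. The short Python sketch below reproduces them; the variable names are ours, and the magnification is taken as the ratio of the display-pixel angle to the detector-pitch angle, which is our reading of (3.4).

```python
# Intermediate quantities for the example system of Table 3.3 (a sketch).
lens_diameter_cm = 5.0
focal_length_cm = 5.0
wavelength_um = 10.0
detector_pitch_um = 25.0
display_pixel_cm = 0.0381
viewing_dist_cm = 38.1

# Angular subtense of the detector pitch in object space, (3.22): 0.5 mrad.
theta_object_mrad = detector_pitch_um / (focal_length_cm * 1e4) * 1e3
# Angular subtense of a display pixel in eye space, (3.23): 1.0 mrad.
theta_display_mrad = display_pixel_cm / viewing_dist_cm * 1e3
# System magnification: 2.0.
magnification = theta_display_mrad / theta_object_mrad

# Diffraction check: Airy-disk diameter = 2.44 * lambda * F/#.
f_number = focal_length_cm / lens_diameter_cm            # 1.0
airy_diameter_um = 2.44 * wavelength_um * f_number       # about 24 µm
nyquist_pitch_um = airy_diameter_um / 2.0
undersampled = detector_pitch_um > nyquist_pitch_um      # True

print(theta_object_mrad, theta_display_mrad, magnification,
      airy_diameter_um, undersampled)
```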
Figure 3.7 shows the MTFs for the various components in this system. The limiting MTF of the system is the display MTF. Figure 3.8 shows the MTF of the overall system compared with the MTF of the eye. It can be seen from this figure that the system is display limited at spatial frequencies above 0.7 cycle per milliradian and eye limited below that frequency. Figure 3.9 shows the CTF of the unaided eye with the parameters assumed for this sensor and the system CTF. This simple system is limited by the MTFs of the system elements. This system is also undersampled. The size of the Airy disk in an image plane can be calculated by

D_{Airy} = 2.44 \, \lambda \, F/\#

where λ is the diffraction wavelength of the imager and F/# is the f-number of the optical system. The f-number is given by the ratio of the focal length to the diameter of the lens. For this simple system, F/# has a value of unity. Therefore, the smallest size of resolved image features at the FPA is around 24.4 µm. For a well-sampled image, the FPA must obtain at least two samples of the Airy disk. Since the pitch of the detectors in this FPA is 25 µm, the image is undersampled. To relate these curves to a target acquisition probability, a task must be assumed. From Table 3.2, the assumed task is the identification of commercial and paramilitary vehicles in the LWIR band. The targets have an average characteristic dimension (square root of area) of 2.2 m. The numerator of (3.11) is given in Table 3.2 as 7.3K. A scene contrast temperature of three times this value is assumed to give a target contrast value of 0.33. For the detector parameters given, the out-of-band spurious response has a value of 0.084. Figure 3.10 shows the resulting values of the image quality metric described in (3.20) as a function of range to the target. The probability of identification can now be calculated using (3.21) and yields the values shown in Figure 3.11.
Figure 3.7 MTFs of system components (lens, detector, and display) versus cycles per milliradian in object space.

Figure 3.8 MTFs of system and eye versus cycles per milliradian in object space.

Figure 3.9 Contrast threshold functions for the simple system: the unaided eye, and the eye with system blur and noise, versus cycles per milliradian in object space.

The main limiting factors of this simple imager are the amount of blur introduced by the display and undersampling by the FPA. At this point, many forms of image processing could be employed to improve the performance of the imager. An interpolation could be done to increase the magnification. This would shift the transfer function of the display out in frequency relative to object space as shown in (3.26). A similar effect would take place for the eye transfer function used in the calculation of the noise in (3.6).

Figure 3.10 V2D as a function of range to the target (km).

Figure 3.11 Probability of identifying commercial and paramilitary vehicles with a simple imager, as a function of range (km).
The main consequences of shifting these transfer functions out in frequency are an increase in the amount of noise and an increase in the amount of out-of-band spurious response. The latter results because the increase in bandwidth of the postsample filters allows more spurious response through the imager. In addition, the process of interpolation introduces a new transfer function (see [18]) that limits the high-frequency response of the imager. The performance implications of this type of processing are discussed by Burks et al. [19].
It has been shown that the performance of undersampled imagers can be improved through an image processing technique known as super-resolution image reconstruction [20–22]. The details of this technique are discussed at length in Chapter 6. In short, this technique uses a time sequence of images with subpixel motion to reconstruct a single image at a higher sampling rate. The motion between each image can be obtained by either random vibrations inherent in the sensor or deliberate motion induced by a gimbal or other mounting apparatus of the sensor. For the purposes of this example, assume that a number of frames with sufficient motion have been obtained to reconstruct a single image that has twice the number of samples as the original. This will have the effect of halving the detector pitch in (3.22), subsequently doubling the sampling frequency. Since there are twice as many samples, either the image displayed will be larger, assuming the display can accommodate the increase in image size, or the FOV displayed will be smaller, assuming it cannot. In either case, the magnification of the imager will be doubled. This will result in an increase in noise for the reasons mentioned earlier in the discussion of interpolation. However, no interpolation filter will be introduced. The increase in sampling frequency will result in a decrease in spurious response. Figure 3.12 shows the performance results from linearly interpolating twice as many samples and for obtaining them through super-resolution reconstruction. It is assumed that all pixels (in this case 640 × 480) are displayed on a CRT having the same resolution and viewing distance as before. The impact of the increase in noise and spurious response results in a decrease in performance for the bilinear interpolation. These effects are overcome by super-resolution reconstruction, leading to an attendant increase in performance.
Figure 3.12 Comparison of system performance using interpolation and super-resolution reconstruction: probability of identification versus range (km) for the baseline system, bilinear interpolation, and super-resolution reconstruction.
This simple example was designed to show how image quality for an imaging system relates to target acquisition and to illustrate the impact of image processing in terms of target acquisition performance. To keep the example simple, many important factors have been neglected, such as atmospheric transmission and turbulence, the presence of fixed-pattern noise, aberrations in the optics, vibration of the imager, and display interpolation. NVESD produces models that take into account all of the parameters just mentioned, among others [23]. The use of these models provides system and algorithm developers with an advanced tool for gauging the impact of image processing on target acquisition performance.
3.6 Summary

Target acquisition performance by human observers is highly dependent on the quality of the image mediated by the human visual system. It is also dependent on the degradations and enhancements in image quality added through an imaging sensor system. This chapter advanced a performance modeling methodology that accounts for both observer ability and sensor capabilities. This model was developed over the last 50 years and is useful in the assessment of signal and image processing algorithms that are designed to enhance the target acquisition process. The applicability of the model was demonstrated by a simple example using super-resolution image reconstruction.
References

[1] Driggers, R. G., et al., "Superresolution Performance for Undersampled Imagers," Optical Engineering, Vol. 44, January 2005, pp. 014002-1–9.
[2] Johnson, J., "Analysis of Image Forming Systems," Proc. Image Intensifier Symposium, 1958, pp. 249–273.
[3] Ratches, J. A., "Static Performance Model for Thermal Imaging Systems," Optical Engineering, Vol. 15, No. 6, December 1976, pp. 525–530.
[4] Scott, L. B., and L. Condiff, "C2NVEO Advanced FLIR Systems Performance Model," Proc. of SPIE Technical Symposium on Optical Engineering & Photonics in Aerospace Sensing, Orlando, FL, April 1990, pp. 16–20.
[5] D'Agostino, J. A., et al., "ACQUIRE Range Performance Model for Target Acquisition Systems," USA CECOM Night Vision & Electronic Sensors Directorate Report, Version 1, User's Guide, Fort Belvoir, VA, May 1995.
[6] Campbell, F. W., and J. G. Robson, "Application of Fourier Analysis to the Visibility of Gratings," J. Physiol., Vol. 197, 1968, pp. 551–566.
[7] Barten, P. G. J., "Evaluation of Subjective Image Quality with the Square-Root Integral Method," J. Opt. Soc. Am., Vol. A7, 1990, pp. 2024–2031.
[8] Beaton, R. J., and W. W. Farley, "Comparative Study of the MTFA, ICS, and SQRI Image Quality Metrics for Visual Display Systems," DTIC Report, No. AD-A252 116, September 1991.
[9] Carney, T., et al., "Development of an Image/Threshold Database for Designing and Testing Human Vision Models," Proc. of SPIE, Vol. 3644, Human Vision and Electronic Imaging IV, San Jose, CA, January 25, 1999, pp. 542–551.
[10] Vollmerhausen, R. H., "Incorporating Display Limitations into Night Vision Performance Models," 1995 IRIS Passive Sensors, Vol. 2, 1995, pp. 11–31.
[11] Klein, S. A., "Channels: Bandwidth, Channel Independence, Detection vs. Discrimination," in Channels in The Visual Nervous System: Neurophysiology, Psychophysics and Models, B. Blum, (ed.), London: Freund, 1991, pp. 447–470.
[12] Schnitzler, A., "Image-Detector Model and Parameters of the Human Visual System," J. Opt. Soc. Am., Vol. A63, 1973, pp. 1357–1368.
[13] Vollmerhausen, R. H., E. Jacobs, and R. G. Driggers, "New Metric for Predicting Target Acquisition Performance," Optical Engineering, Vol. 43, No. 11, 2004, pp. 2806–2818.
[14] Peli, E., "Contrast in Complex Images," J. Opt. Soc. Am., Vol. A7, 1990, pp. 2032–2040.
[15] Bex, P. J., and W. Makous, "Spatial Frequency, Phase, and the Contrast of Natural Images," J. Opt. Soc. Am., Vol. A19, No. 6, June 2002, pp. 1096–1106.
[16] Whalen, M. R., and E. J. Borg, "Thermal Contrast Definition for Infrared Imaging Sensors," Proc. of SPIE, Vol. 1967, Characterization, Propagation, and Simulation of Sources and Backgrounds III, Orlando, FL, April 12, 1993, pp. 220–227.
[17] Vollmerhausen, R., and A. Robinson, "Modeling Target Acquisition Tasks Associated with Security and Surveillance," Applied Optics, Vol. 46, No. 20, June 2007, pp. 4209–4221.
[18] Vollmerhausen, R., and R. Driggers, Analysis of Sampled Imaging Systems, Bellingham, WA: SPIE Press, 2001.
[19] Burks, S. D., et al., "Electronic Zoom Functionality in Under-Sampled Imaging Systems," Proc. of SPIE, Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XVIII, Orlando, FL, April 11–13, 2007, pp. 65430M-1–12.
[20] Kang, M. G., and S. Chaudhuri, "Super-Resolution Image Reconstruction," IEEE Signal Processing Magazine, Vol. 20, No. 3, May 2003, pp. 19–20.
[21] Krapels, K., et al., "Characteristics of Infrared Imaging Systems That Benefit from Superresolution Reconstruction," Applied Optics, Vol. 46, No. 21, July 20, 2007, pp. 4594–4603.
[22] Young, S. S., and R. G. Driggers, "Super-Resolution Image Reconstruction from a Sequence of Aliased Imagery," Applied Optics, Vol. 45, No. 21, July 2006, pp. 5073–5085.
[23] https://www.sensiac.gatech.edu/external/.
PART II
Basic Principles of Signal Processing
CHAPTER 4
Basic Principles of Signal and Image Processing

4.1 Introduction

In previous chapters, we discussed the basic theory for imaging systems and performance. In this chapter, we provide a review of some of the basic principles of signal and image processing that will be useful throughout the book. The Fourier analysis method is reviewed first, followed by finite impulse response (FIR) filters and Fourier-based filters. Finally, the wavelet transform, which is derived from the time-frequency windowed Fourier transform and is useful for multiscale (multiresolution) signal and image processing, is reviewed.
4.2 The Fourier Transform

The study of an imaging system is largely based on processing measured signals and their Fourier properties. The Fourier transform is the main tool to achieve the translation of information contents of an imaging system's measured signals into the desired image information. In this section, we briefly review the Fourier analysis method for 1-D and 2-D signals. For further details on this type of signal processing, the reader is referred to [1–8].

4.2.1 One-Dimensional Fourier Transform

4.2.1.1 Fourier Integral

A one-dimensional spatial signal g(x) has the following Fourier decomposition in terms of complex sinusoidal signals (inverse Fourier transform equation):

g(x) = \int_{-\infty}^{\infty} G(f_x) \exp(j 2\pi f_x x) \, df_x = \mathcal{F}^{-1}_{(f_x)}[G(f_x)]  (4.1)
where fx is the spatial frequency domain (units = cycle/meter or cycle/milliradian), j = \sqrt{-1}, and G(fx) is called the Fourier transform of g(x). The symbol G(fx) is found via (forward Fourier transform equation):

G(f_x) = \int_{-\infty}^{\infty} g(x) \exp(-j 2\pi f_x x) \, dx = \mathcal{F}_{(x)}[g(x)]  (4.2)
The inverse Fourier transform is a representation/decomposition of g(x) in terms of a linear combination (view the Fourier integral as a sum) of G(fx), fx ∈ (−∞, ∞); the weight of G(fx) in this linear combination is exp(j2πfxx)dfx. Similarly, the forward Fourier transform provides a decomposition of G(fx) in terms of a linear combination of g(x), x ∈ (−∞, ∞). Both integrals in (4.1) and (4.2) perform a reversible linear transform of information: one from the spatial domain to the spatial frequency domain, and the other from the spatial frequency domain to the spatial domain. The Fourier transform is an information-preserving linear transformation.

4.2.1.2 Properties of Fourier Transform

In this section, some of the properties of the Fourier transform that are used to analyze linear systems are provided.

Linearity
We define s(x) to be the following linear combination of g1(x) and g2(x): s( x ) ≡ a1 g 1 ( x ) + a 2 g 2 ( x )
(4.3)
where a1 and a2 are constants. Then, S(fx) is related to G1(fx) and G2(fx) via the same linear combination in the fx domain; that is, S( f x ) = a1 G1 ( f x ) + a 2 G2 ( f x )
(4.4)
where S(fx), G1(fx), and G2(fx) are the Fourier transforms for s(x), g1(x), and g2(x), respectively. Example 4.1
Let g1(x) and g2(x) be the two pulse functions that are defined in the following:

g_1(x) = \begin{cases} 1, & \text{for } |x + 1| \le 1 \\ 0, & \text{otherwise} \end{cases}  (4.5)

g_2(x) = \begin{cases} 1, & \text{for } |x - 0.5| \le 0.5 \\ 0, & \text{otherwise} \end{cases}  (4.6)

Let s(x) = a1g1(x) + a2g2(x), where a1 = 1 and a2 = 2. The figures of g1(x), g2(x), and s(x), and their Fourier transforms, are illustrated in Figures 4.1, 4.2, and 4.3, respectively. The reader can verify that S(fx) and a1G1(fx) + a2G2(fx) are identical.

Shifting
A function s(x) is said to be a linear-shifted version of g(x) by x0 in the spatial domain, if s( x ) = g( x − x 0 )
(4.7)
Figure 4.1 Illustration of a1g1(x) and its Fourier transform a1G1(fx), where a1 = 1.

Figure 4.2 Illustration of a2g2(x) and its Fourier transform a2G2(fx), where a2 = 2.
where x0 is a constant. Then, S( f x ) = exp( − j2 πf x x 0 )G( f x )
(4.8)
Thus, a shift in one domain results in the addition of a linear phase function in the Fourier domain.
Figure 4.3 Function s(x) and its Fourier transform S(fx). Illustration of linearity property.
Example 4.2
Let g(x) be a pulse function and s(x) = g(x − x0) where x0 = 1, which are illustrated in Figure 4.4. From the Fourier transform definition, the magnitudes of their Fourier transforms are equal, as shown in Figure 4.4. In addition, the angle of the multiplication of S(fx) and the conjugate of G(fx) is expressed as

\mathrm{Angle}\left[ S(f_x)\, G^*(f_x) \right] = -2\pi f_x x_0 = -2\pi x_0 f_x  (4.9)

where G*(fx) is the conjugate of G(fx). Actually, this angle is a linear function of fx and the slope of this linear function is −2πx0, which is illustrated in Figure 4.5.

Scaling
If s( x ) = g( ax )
(4.10)
where a is a constant, then

S(f_x) = \frac{1}{a} G\!\left(\frac{f_x}{a}\right)  (4.11)
Example 4.3
Let g(x) be a pulse function shown in Figure 4.6, and s(x) = g(ax), a = 2 shown in Figure 4.7. Their Fourier transforms are shown at the bottom of the previous figures, respectively. The scaling effect in spatial domain and spatial frequency domain are illustrated.
Figure 4.4 Rectangular pulse, its shifted version, and magnitudes of their Fourier transforms.
Differentiation and Integration
If

s(x) = \frac{d}{dx} g(x)  (4.12)

then

S(f_x) = j 2\pi f_x G(f_x)  (4.13)

Conversely, if

s(x) = \int_{-\infty}^{x} g(x) \, dx  (4.14)

then

S(f_x) = \frac{1}{j 2\pi f_x} G(f_x)  (4.15)
Figure 4.5 Illustration of the shift property of the Fourier transform. Angle of the multiplication of S(fx) and the conjugate of G(fx) is a linear function of fx. The slope is −2πx0.

Figure 4.6 Illustration of a pulse function and its Fourier transform.
Example 4.4
Let g(x) be a pulse function and s(x) = (d/dx) g(x), the derivative of g(x) with respect to x. They are shown in Figure 4.8. The Fourier transform of a pulse function g(x) is a sinc function that is written as

Figure 4.7 Illustration of scaling property in spatial domain and spatial frequency domain, where a = 2.

Figure 4.8 Illustration of differentiation property.
G(f_x) = 2 x_p \, \frac{\sin(2\pi f_x x_p)}{2\pi f_x x_p}  (4.16)

When both sides of the previous equation are multiplied by j2πfx and the result is called H(fx), it gives

H(f_x) = j 2\pi f_x G(f_x) = j 2\pi f_x \cdot 2 x_p \, \frac{\sin(2\pi f_x x_p)}{2\pi f_x x_p} = j 2 \sin(2\pi f_x x_p) = \exp(j 2\pi f_x x_p) - \exp(-j 2\pi f_x x_p)  (4.17)

The inverse Fourier transform of H(fx) is the same as s(x), which is shown in Figure 4.8. This proves the differentiation property.

Duality
Let G(fx) be the Fourier transform of g(x). If s( x ) = G( x )
(4.18)
[note that G( . ) can be a function of any variable] then, S( f x ) = g( − f x )
(4.19)
Proof
This can be proven from the Fourier transform that is defined in (4.1). Replacing s(x) with G(x) in the Fourier transform of s(x), we have

S(f_x) = \int_{-\infty}^{\infty} s(x) \exp(-j 2\pi f_x x) \, dx = \int_{-\infty}^{\infty} G(x) \exp(-j 2\pi f_x x) \, dx  (4.20)

By writing the inverse Fourier transform of G(fx), we also have

g(x) = \int_{-\infty}^{\infty} G(f_x) \exp(j 2\pi x f_x) \, df_x = \int_{-\infty}^{\infty} G(f_x) \exp(-j 2\pi (-x) f_x) \, df_x  (4.21)

Now, if we exchange the variable x with fx in the previous equation, we have

g(f_x) = \int_{-\infty}^{\infty} G(x) \exp(-j 2\pi (-f_x) x) \, dx  (4.22)

This is equivalent to

g(-f_x) = \int_{-\infty}^{\infty} G(x) \exp(-j 2\pi f_x x) \, dx  (4.23)
By comparing this with the equation of S(fx) in (4.20), we yield that S(fx) = g(−fx). The property reveals that whatever tools, principles, flexibilities, and other conditions that Fourier analysis brings to the study of signals and systems in a given domain, such as spatial, there are dual forms of those same conditions for the study of signals and systems in the Fourier counterpart domain (spatial frequency). Therefore, if there are any operational properties for a signal in any domain, there are also analogous forms for the signal in the Fourier counterpart domain.
Convolution

Convolution of two signals in the spatial domain, for example, g1(x) and g2(x), is defined in the following equation:

s(x) \equiv g_1(x) * g_2(x) = \int_{-\infty}^{\infty} g_1(u)\, g_2(x - u) \, du  (4.24)
where * denotes convolution. In the spatial frequency domain, this translates to the following multiplication relationship: S( f x ) = G1 ( f x )G 2 ( f x )
(4.25)
This is the most popular property of the Fourier transform. The utility of the convolution property is in simplifying the analysis of inverse as well as forward problems of linear shift invariant (LSI) systems. The dual form of the convolution property is that if s( x ) ≡ g 1 ( x )g 2 ( x )
(4.26)
then,

S(f_x) = G_1(f_x) * G_2(f_x) = \int_{-\infty}^{\infty} G_1(\alpha_x)\, G_2(f_x - \alpha_x) \, d\alpha_x  (4.27)

where * denotes convolution in the spatial frequency domain.

Example 4.5
Let g1(x) and g2(x) be two identical pulse functions shown in Figure 4.9. The convolution of these two pulse functions is the triangle function that is also shown in Figure 4.9. The Fourier transforms of the two pulse functions [G1(fx) and G2(fx)] and the triangle function [S(fx)] are shown in Figure 4.10. Indeed, S(fx) equals the product of the two sinc functions, G1(fx) and G2(fx).

Parseval's Identity
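The convolution property is easy to verify numerically. The following Python sketch convolves two discrete pulses and checks that the DFT of the result equals the product of the individual DFTs; the grid size and pulse widths are arbitrary choices, and zero padding is used so that the circular convolution implied by the DFT matches the linear convolution.

```python
import numpy as np

# Verify (4.24)-(4.25) numerically: the FFT of the linear convolution of two
# pulses equals the product of their FFTs.
n = 256
x = np.arange(n)
g1 = np.where(x < 32, 1.0, 0.0)      # pulse 1
g2 = np.where(x < 32, 1.0, 0.0)      # pulse 2 (identical, as in Example 4.5)

s = np.convolve(g1, g2)              # linear convolution -> triangle function
nfft = 2 * n                         # zero-pad so circular = linear convolution
S = np.fft.fft(s, nfft)
G1G2 = np.fft.fft(g1, nfft) * np.fft.fft(g2, nfft)

print(np.allclose(S, G1G2))          # True, up to floating-point error
```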
Parseval’s Identity says that the inner product of two functions is equal to the inner product of their Fourier transforms; that is, I: =
∫
∞
∞
g 1 ( x )g *2 ( x )dx = ∫ G1 ( f x )G*2 ( f x )df x
−∞
−∞
(4.28)
*
where g (x) is the conjugate of g(x), and G1(fx) and G2(fx) are the Fourier transforms for g1(x) and g2(x), respectively. If g1(x) = g2(x), the Parseval energy conservation equation is obtained:
∫
∞
−∞
g1 ( x )
2
dx =
∫
∞
−∞
G1 ( f x )
2
df x
(4.29)
That is, the total energy in the function is the same as in its Fourier transform.
Figure 4.9 Illustration of convolution property. The convolution of two pulse functions, g1(x) and g2(x), is the triangle function, s(x).
Figure 4.10 The Fourier transform of the triangle function, S(fx), equals the product of the two sinc functions G1(fx) and G2(fx), which are the Fourier transforms of the two pulse functions shown in Figure 4.9, respectively.
Example 4.6

Let g1(x) = e^{-x} u(x), where u(x) = 1 for x ≥ 0, and u(x) = 0 for x < 0. The Fourier transform of g1(x) is

G_1(f_x) = \frac{1}{1 + j 2\pi f_x}  (4.30)

The total energy of the function g1(x) is

I = \int_{-\infty}^{\infty} \left( e^{-x} \right)^2 u(x) \, dx = \int_{0}^{\infty} e^{-2x} \, dx = \frac{1}{2}  (4.31)

The total energy of its Fourier transform G1(fx) is

I = \int_{-\infty}^{\infty} \left| \frac{1}{1 + j 2\pi f_x} \right|^2 df_x = \int_{-\infty}^{\infty} \frac{1}{1 + j 2\pi f_x} \cdot \frac{1}{1 - j 2\pi f_x} \, df_x = \int_{-\infty}^{\infty} \frac{1}{1 + 4\pi^2 f_x^2} \, df_x = \frac{1}{2}  (4.32)
Periodic Signals

Let g(x) be a periodic signal; that is,

g(x) = \sum_{l=-\infty}^{\infty} g_0(x - 2 l X_0)  (4.33)

where 2X0 is one period of the signal, and g0(x) is the base signal (a single period that is centered at the origin). Then, its Fourier transform G(fx) is a sampled version of the Fourier transform G0(fx) of the base signal, g0(x); that is,

G(f_x) = f_{x0}\, G_0(f_x) \sum_{l=-\infty}^{\infty} \delta(f_x - l f_{x0})  (4.34)

where f_{x0} = 1/(2X_0).

Example 4.7

An example of a periodic pulse signal and its Fourier transform is shown in Figure 4.11. The upper figure shows a periodic pulse function, g(x). The base signal g0(x), a rectangular pulse function with duration 6 units, is repeated every 20 units from −∞ to ∞. The lower figure shows the Fourier transform of g(x). It is a sampled version of the sinc function, which is the Fourier transform of the base rectangular pulse signal, g0(x).

Figure 4.11 Illustration of a periodic signal and its Fourier transform. The upper figure shows a periodic pulse function g(x). The base signal g0(x), a rectangular pulse function with duration 6 units, is repeated every 20 units from −∞ to ∞. The lower figure shows the Fourier transform of g(x). It is a sampled version of the sinc function, which is the Fourier transform of the base rectangular pulse signal g0(x).

Sampling

Let gδ(x) be a delta-sampled signal of an arbitrary signal, g(x):

g_\delta(x) = \sum_{n=-\infty}^{\infty} g(x_n)\, \delta(x - x_n)  (4.35)

where xn = n∆x, and ∆x is the sample spacing in the spatial domain. Then, the Fourier transform of the delta-sampled signal, Gδ(fx), is composed of repeated versions of the Fourier transform of the continuous signal G(fx); that is,

G_\delta(f_x) = \frac{1}{\Delta_x} \sum_{n=-\infty}^{\infty} G\!\left(f_x - \frac{n}{\Delta_x}\right)  (4.36)
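The replication in (4.36) can be observed numerically by comparing the spectrum of a densely sampled pulse with the spectrum of a coarsely sampled copy of it. The Python sketch below is one way to set this up; the grid, pulse width, and sample spacing are arbitrary choices made for illustration.

```python
import numpy as np

# Illustrate (4.36): sampling in x replicates the spectrum at multiples of 1/dx.
x = np.linspace(-10, 10, 2001)            # dense ("continuous") grid
g = np.where(np.abs(x) <= 3, 1.0, 0.0)    # pulse of duration 6 units

dx_coarse = 1.0                           # coarse sample spacing of 1 unit
step = int(round(dx_coarse / (x[1] - x[0])))
g_sampled = np.zeros_like(g)
g_sampled[::step] = g[::step]             # delta-sampled signal g_delta(x)

f = np.fft.fftshift(np.fft.fftfreq(x.size, d=x[1] - x[0]))
G = np.fft.fftshift(np.fft.fft(g))
G_delta = np.fft.fftshift(np.fft.fft(g_sampled))
# |G_delta| shows copies of |G| centered at 0, +/-1/dx_coarse, +/-2/dx_coarse, ...
```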
Example 4.8
Figure 4.12 shows a sampled pulse function and a portion of its Fourier transform. The upper figure shows a sampled signal gδ(x), which is a sampled version of a rectangular pulse function g(x) with duration 6 units; the sample spacing is 1 unit. The lower figure shows the Fourier transform of gδ(x). It is the repeated versions of the continuous sinc function, which is the Fourier transform of the pulse function, g(x). See similar results in Figures 2.15 through 2.17 of Chapter 2, in which the comb function is used for sampled imaging systems. In that case, the presample blur process is equivalent to g(x) and its Fourier transform, G(fx), is the presample blur function shown in Figure 2.15. The sampling process is performed by the comb function. The postsample reconstruction function, Gδ(fx) shown in Figure 2.17, is a replication of the presample blur function.

Periodic Sampled Signal: Discrete Fourier Transform
Let g(x) be a periodic signal composed of evenly spaced delta functions; that is,
g(x) = \sum_{l=-\infty}^{\infty} \sum_{n=-N/2}^{N/2-1} g_n\, \delta\!\left[ x - (n + N l)\,\Delta_x \right]  (4.37)

Figure 4.12 Illustration of a sampled signal and its Fourier transform. The upper figure shows a sampled signal gδ(x), which is a sampled version of a rectangular pulse function g(x) with duration 6 units; the sample spacing is 1 unit. The lower figure shows the Fourier transform of gδ(x). It is the repeated versions of the continuous sinc function, which is the Fourier transform of the pulse function g(x).
where ∆x is the spacing between two consecutive delta functions and N∆x is the period. The second summation represents one period. The first summation represents all l periods. This model provides a link between a discrete sequence and evenly spaced samples of a continuous signal [6]. Using the forward and inverse Fourier integrals, it can be shown that

G(f_x) = \sum_{l=-\infty}^{\infty} \sum_{m=-N/2}^{N/2-1} G_m\, \delta\!\left[ f_x - (m + N l)\,\Delta_{f_x} \right]  (4.38)

where

G_m = \sum_{n=-N/2}^{N/2-1} g_n \exp\!\left( -j \frac{2\pi}{N} m n \right)  (4.39)

and

g_n = \frac{1}{N} \sum_{m=-N/2}^{N/2-1} G_m \exp\!\left( j \frac{2\pi}{N} m n \right)  (4.40)

N \Delta_x \Delta_{f_x} = 1  (4.41)
Equations (4.38) and (4.39) are called the discrete Fourier transform (DFT) equations. It should be noted that G(fx) is also a periodic signal that is composed of evenly spaced delta functions.
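The DFT pair (4.39) and (4.40) can be evaluated directly and compared with a library FFT. The Python sketch below does this; note that the book indexes n and m from −N/2 to N/2 − 1, whereas numpy.fft indexes from 0 to N − 1, so ifftshift/fftshift reorderings are used to line the two conventions up. The function name and test signal are ours.

```python
import numpy as np

def dft_book(g):
    """Direct evaluation of (4.39) with indices n, m = -N/2, ..., N/2-1.
    The array g is stored in index order n = -N/2, ..., N/2-1."""
    N = g.size
    n = np.arange(-N // 2, N // 2)
    m = n[:, None]
    return (g * np.exp(-1j * 2 * np.pi * m * n / N)).sum(axis=1)

rng = np.random.default_rng(0)
g = rng.standard_normal(16)

G_direct = dft_book(g)
# Same result via numpy's FFT after reordering between index conventions.
G_fft = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g)))
print(np.allclose(G_direct, G_fft))      # True

# The inverse DFT (4.40) recovers the original samples.
N = g.size
n = np.arange(-N // 2, N // 2)
g_back = (G_direct * np.exp(1j * 2 * np.pi * n[:, None] * n / N)).sum(axis=1) / N
print(np.allclose(g_back, g))            # True
```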
Example 4.9

Figure 4.13 shows a periodic sampled pulse function and a portion of its Fourier transform. The upper figure shows a periodic sampled pulse function, g(x). The base signal, a sampled rectangular pulse function with duration 4 units and the sample spacing 1 unit, is repeated every 16 units from −∞ to ∞. The lower figure shows the Fourier transform of g(x), also a periodic signal, that is the repeated versions of the sampled sinc function, which is the Fourier transform of a pulse function.

Figure 4.13 Illustration of a periodic sampled signal and its Fourier transform. The upper figure shows a periodic sampled pulse function g(x). The base signal, a sampled rectangular pulse function with duration 4 units and the sample spacing 1 unit, is repeated every 16 units from −∞ to ∞. The lower figure shows the Fourier transform of g(x), which is also a periodic signal composed of repeated versions of the sampled sinc function, the Fourier transform of a pulse function.

4.2.2 Two-Dimensional Fourier Transform

4.2.2.1 Two-Dimensional Continuous Fourier Transform
G(f_x, f_y) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} g(x, y) \exp\!\left[ -j 2\pi (f_x x + f_y y) \right] dx \, dy  (4.42)

where (fx, fy) represents the spatial frequency domain for (x, y) and j = \sqrt{-1}. The inverse Fourier transform of G(fx, fy) with respect to (fx, fy) is defined via

g(x, y) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} G(f_x, f_y) \exp\!\left[ j 2\pi (f_x x + f_y y) \right] df_x \, df_y  (4.43)
If g(x, y) is a separable 2-D signal, that is, g( x , y) = g 1 ( x ) g 2 ( y)
(4.44)
then, its 2-D Fourier transform becomes G(f x , f y ) = G1 ( f x ) G 2 (f y )
(4.45)
which is also a separable 2-D signal.

Example 4.10: 2-D Separable Functions

A separable pulse function is written as:

g(x, y) = \begin{cases} 1, & \text{for } |x| \le X_0 \text{ and } |y| \le Y_0 \\ 0, & \text{otherwise} \end{cases} = \mathrm{rect}\!\left(\frac{x}{2X_0}\right) \mathrm{rect}\!\left(\frac{y}{2Y_0}\right)  (4.46)

where

\mathrm{rect}(x) = \begin{cases} 1, & \text{for } |x| \le 0.5 \\ 0, & \text{otherwise} \end{cases}  (4.47)

Its Fourier transform becomes

G(f_x, f_y) = 2X_0 \frac{\sin(2\pi f_x X_0)}{2\pi f_x X_0} \cdot 2Y_0 \frac{\sin(2\pi f_y Y_0)}{2\pi f_y Y_0} = 4 X_0 Y_0\, \mathrm{sinc}(2\pi f_x X_0)\, \mathrm{sinc}(2\pi f_y Y_0)  (4.48)

An example of a separable rectangular function and its Fourier transform is shown in Figure 4.14.
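Separability, (4.44) and (4.45), can be checked numerically: the 2-D FFT of g1(x)g2(y) equals the outer product of the 1-D FFTs. The Python sketch below does this for two rectangular pulses; the grid size and pulse widths are arbitrary choices.

```python
import numpy as np

# Check the separability property (4.44)-(4.45) numerically.
n = 64
x = np.arange(n) - n // 2
g1 = np.where(np.abs(x) <= 8, 1.0, 0.0)    # rect in x
g2 = np.where(np.abs(x) <= 4, 1.0, 0.0)    # rect in y
g = np.outer(g2, g1)                        # rows index y, columns index x

G_2d = np.fft.fft2(g)
G_sep = np.outer(np.fft.fft(g2), np.fft.fft(g1))
print(np.allclose(G_2d, G_sep))             # True
```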
Marginal Fourier Transform
The marginal (1-D) Fourier transform of g(x, y) with respect to x is defined by
80
Basic Principles of Signal and Image Processing Separable 2-D pulse
Fourier transform of 2-D pulse
Figure 4.14
Two-dimensional separable rectangular function and its Fourier transform.
[
]
Gx ( f x , y) ≡ ᑤ ( x ) g( x , y) =
∫
∞
-∞
(4.49)
g( x , y ) exp( − j2πf x x )dx
Also, the marginal (1-D) Fourier transform of g(x, y) with respect to y is defined by
[
] = ∫ g( x , y ) exp(− j2πf y)dy
Gy ( x , f y ) ≡ ᑤ ( y ) g( x , y)
(4.50)
∞
y
−∞
Note that
]} = ᑤ [G ( f , y)] { [ {ᑤ [ g( x, y)]} = ᑤ [G ( x, f )]
G(f x , f y ) = ᑤ ( y ) ᑤ ( x ) g( x , y) = ᑤ ( x)
4.2.2.3
(y )
(y )
( x)
x
y
x
(4.51)
y
Polar Representation of Fourier Transform
The polar transformation of the domain (x, y) for a 2-D signal g(x, y) may be represented by g p ( θ, r ) ≡ g( x , y)
where
(4.52)
4.2 The Fourier Transform
81
y θ ≡ arctan x
(4.53)
r≡
(4.54)
x 2 + y2
Note that g(x, y) ≠ gp(x, y). This equation is called the polar function mapping of g(x, y). The polar function mapping of G(fx, fy) in the spatial frequency domain may be defined via Gp ( φ, ρ) ≡ G(f x , f y )
(4.55)
fy φ ≡ arctan fx
(4.56)
where
ρ≡
f x2 + f y2
(4.57)
Note that Gp(., .) is not the Fourier transform of gp(., .). Using the polar mapping functions, the forward Fourier transform integral, that is, G(f x , f y ) =
∫ ∫ g( x , y) exp[− j 2π(f
]
x + f y y) dxdy
∞
∞
−∞
−∞
∞
2π
0
0
r g P ( θ, r ) exp − j 2 πρr cos( θ − φ) dθ dr
x
(4.58)
can be rewritten as: Gp ( φ, ρ) ≡
∫ ∫
[
]
(4.59)
Example 4.11: 2-D Radially Symmetric Functions
A radially symmetric disk function is written as 1 for x 2 + y 2 ≤ R g( x , y) = otherwise 0,
(4.60)
It is shown that [7] G(f x , f y ) = 2 πR
2
(
J1 2 πR f x2 + f y2
)
2 πR f x2 + f y2
where J1 is the Bessel function of the first kind, first order. Here, Airy pattern with
(4.61)
J 1 ( x) is called the x
82
Basic Principles of Signal and Image Processing
lim
J1 ( x )
x→ 0
x
=
1 2
Figure 4.15 shows an example of disk function and its Fourier transform. 4.2.2.4
Two-Dimensional Discrete Fourier Transform and Sampling
Let the 2-D signal be g(xn, ym), with n = −N/2, …, N/2−1, and with m = −M/2, …, M/2−1, where N and M are the number of samples in x and y domains, respectively. The sample spacing ∆x and ∆y are obtained by X0 Y and ∆ y = 0 N M
∆x =
(4.62)
where X0 and Y0 are the physical size of the signal (for example, an image), as shown in Figure 4.16. Then, each sample (xn, ym) can be represented by
( x n , y m ) = (n∆ x , m∆ y )
(4.63)
where n = −N/2, …, N/2−1, and m = −M/2, …, M/2−1. The discrete 2-D Fourier transform of g(xn, ym) is obtained by the following:
(
)
G f xk , f yl =
N 2 −1
∑
M 2 −1
∑ g( x
n =− N 2 m =− M 2
n
2π 2π , y m ) exp − j kn + lm M N
2-D radially symmetric disk
Fourier transform of 2-D radially symmetric disk
Figure 4.15
Two-dimensional radially symmetric function and its Fourier transform.
(4.64)
4.3 Finite Impulse Response Filters
83 ∆x
∆y
Y0
X0
Figure 4.16
Illustration of 2-D sampling grid.
where
(f
xk
) (
, f yl = k∆ f x , l∆ f y
)
(4.65)
with k = −N/2−1, …, N/2–1, and l = −M/2, ..., M/2−1. (fxk, fyl) are samples of 2-D spatial frequency domain—that is, (fx, fy). The sample spacing of the spatial frequency domain ∆ f x and ∆ f y are related to the spatial domain sample spacing by the following [5]: ∆ fx =
4.3
1 1 and ∆ f y = N∆ x M∆ y
(4.66)
Finite Impulse Response Filters 4.3.1
Definition of Nonrecursive and Recursive Filters
A finite impulse response (FIR) filter belongs to the simplest kinds of filters, which are called nonrecursive filters [9]. Let {gn} be a set of evenly spaced measurements of a signal g(x), where n is an integer and x is a continuous variable. The nonrecursive filter is defined by the linear equation: qn =
∞
∑h
k
k =−∞
g n−k
(4.67)
where the coefficients hk are the constants of the filter, the gn−k are the input data, and the qn are the outputs. This process is basic and is called a convolution of the data with the coefficients. In practice, the number of products that can be handled must be finite. Equation (4.67) becomes qn =
N
∑h
k =−∞
k
g n−k
(4.68)
where N is the finite number. A simple example of an FIR filter is a smooth filter in which all of the coefficients have the same value; that is,
84
Basic Principles of Signal and Image Processing
qn =
1 ( g n − 2 + g n −1 + g n + g n + 1 + g n + 2 ) 5
(4.69)
In this case, N = 2. This is the familiar smoothing equation. When not only data values are used to compute the output values qn but also other values of the output are used, the filters are called recursive filters. If the finite number is used in practice, the filter is defined as: qn =
N
∑h
k=0
M
k
g n − k + ∑ rk q n − k
(4.70)
k =1
where both the hk and the rk are constants. The counterpart of FIR, the infinite impulse response (IIR) filter, belongs to recursive filters. 4.3.2
Implementation of FIR Filters
A conventional FIR filter is developed in a way such that the filter (window) is generated in the frequency domain, and then it is inversely transformed into the spatial domain. The actual filtering is implemented by convolving the window and the signal in the spatial domain. Example 4.12
Figure 4.17 shows an example of FIR filter design for a lowpass filter. The lowpass filter in the frequency domain is constructed such that the cutoff frequency is chosen to be 0.3 (maximum frequency is normalized to one) as shown in Figure 4.17(a). The lowpass FIR filter coefficients in the spatial domain are displayed in Figure 4.17(b); the number of samples is chosen to be 20. Figure 4.18 shows an example of FIR filter design for a highpass filter. Again, the highpass filter in the frequency domain is formed by choosing the cutoff frequency to be 0.6 (maximum frequency is normalized to one) as shown in Figure
FIR filter frequency response
FIR filter 0.3
1
0.25 0.2
Filter
Magnitude
0.8 0.6
0.15 0.1
0.4
0.05 0.2 0
0 −1
−0.5 0 0.5 Normalized frequency (a)
1
−0.05
0
5
10 15 20 Sample number (b)
25
Figure 4.17 Lowpass FIR filter: (a) lowpass filter in the frequency domain, cutoff = 0.3 (maximum frequency is normalized to one), dashed line is the ideal filter (rectangular window); and (b) FIR filter coefficients in the spatial domain, number of samples = 20.
4.3 Finite Impulse Response Filters
85
FIR filter frequency response
FIR filter 0.5 0.4 0.3 0.2
0.8
Filter
Magnitude
1
0.6 0.4
0.1 0 −0.1 −0.2
0.2
−0.3
0 −1
−0.5 0 0.5 Normalized frequency (a)
1
−0.4
0
10
20
30
40
Sample number (b)
Figure 4.18 Highpass FIR filter: (a) highpass filter in the frequency domain, cutoff = 0.6 (maximum frequency is normalized to one), dashed line is the ideal filter (rectangular window); and (b) FIR filter coefficients in the spatial domain, number of samples = 40.
4.18(a). The highpass FIR filter coefficients in the spatial domain are displayed in Figure 4.18(b); the number of samples is chosen to be 40. This kind of FIR filtering was considered to be preferable to the Fourier-based methods (see Section 4.4), due to their simplicity and ease of implementation on the older computing environments. However, with the new digital processing architectures for signal and image processing, the Fourier-based methods not only provide more accurate solutions than the FIR filtering techniques, but they are also less computationally intensive. 4.3.3
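A lowpass FIR filter of the kind just described (an ideal response specified in the frequency domain, inverse transformed, then truncated to a finite number of spatial-domain coefficients) can be sketched in Python as follows. This is an illustration under our own parameter choices, not the exact procedure used to generate Figures 4.17 and 4.18; in practice a library routine such as scipy.signal.firwin is normally used. A Hamming taper is applied to the truncated coefficients to reduce the passband ripple discussed in Section 4.3.3.

```python
import numpy as np

def fir_lowpass(num_taps=20, cutoff=0.3):
    """Lowpass FIR coefficients: specify an ideal response in the frequency
    domain (cutoff given as a fraction of the Nyquist frequency), inverse
    transform it, and keep num_taps samples tapered by a Hamming window."""
    n_freq = 512
    f = np.fft.fftfreq(n_freq)                     # -0.5 ... 0.5 cycles/sample
    ideal = (np.abs(f) <= 0.5 * cutoff).astype(float)
    h_full = np.real(np.fft.ifft(ideal))           # ideal impulse response
    h_full = np.fft.fftshift(h_full)               # center the main lobe
    mid = n_freq // 2
    h = h_full[mid - num_taps // 2 : mid + num_taps // 2]
    h *= np.hamming(h.size)                        # taper the truncation
    return h / h.sum()                             # unit DC gain

h = fir_lowpass()
# Apply by convolving with the signal in the spatial domain, for example:
# filtered = np.convolve(signal, h, mode="same")
```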
Shortcomings of FIR Filters
Lowpass FIR filtering is commonly used in digital signal and image processing. It is a fairly simple scheme to implement and design. The main shortcoming of FIR filtering is that it cannot completely remove the higher-frequency contents of the signal. An FIR filter has a finite support in the spatial domain. Thus, its spectral support is infinite in the frequency domain. This implies that an FIR filter cannot remove all of the higher-frequency contents beyond the desired cutoff frequency. Therefore the FIR lowpass-filtered signal still suffers from some aliasing. This occurs especially when the FIR lowpass filter is used to obtain a subsampled version of the input image. Moreover, an FIR filter contains undesirable “ripples” in its passband [9]. Recent research has emphasized the variations of implementing the FIR filter using feedforward neural networks [10, 11], morphological filters [12], and different subsampling lattices and structure [13, 14]. In these implementations (e.g., in training the feedforward neural networks), the desired output values are designed as the average value for a smooth region (i.e., a lowpass filter). Therefore, the fundamental idea is still using lowpass FIR filtering prior to directly subsampling an image in the spatial domain. An alternative to the FIR filtering method is to use Fourier or frequency domain filtering, which is a form of infinite impulse response (IIR) filtering. Fourier-based filters are described in the following section.
86
4.4
Basic Principles of Signal and Image Processing
Fourier-Based Filters In the Fourier-based filtering, the user applies a filter in the frequency domain that is zero outside the desired band. Then, the bandwidth of the resultant image is reduced by the desired factor from the bandwidth of the original image. That is, the spatial frequency filter is denoted by H(fx, fy). Then the spectrum of the resultant image is obtained by Gs (f x , f y ) = G(f x , f y )H(f x , f y )
(4.71)
The simplest IIR filter H(f_x, f_y) is a rectangular one. However, this filter contains large sidelobes (ringing effect, or Gibbs phenomenon) [9] in the spatial domain, which is not desirable. Many filter design methods have been implemented to yield a smooth transition band at the edge of the cutoff frequency and alleviate the ringing effect by windowing. Examples of these windows include the Gaussian window, Hamming window, Hanning window, and Butterworth window, among others [3, 9]. Because these windows lack a sharp cutoff, they attenuate the higher-frequency components near the cutoff frequency of the signal. This results in a widening of the transition band of the impulse response, that is, the point spread function (PSF). Moreover, it introduces nonuniform attenuation of the image and allows the eye to see the high-frequency signals introduced by the display sampling processing in the softcopy environment.

Cavallerano and Ciacci [15] teach a method of modifying a filter to deemphasize one portion of the frequency components and emphasize the other portion; the combination results in a flat frequency response across the baseband of the signal. In reality, it is not possible to physically realize such a perfect filter. Drisko and Bachelder [16] suggest running a lowpass filter according to the Nyquist theorem to avoid aliasing. Another method, described by Zhu and Stec [17], is to use interpolation post-processing to alleviate the ringing distortion caused by this filtering.

One filter is the power window [5], which is developed as a frequency domain, Fourier-based filter that can:

• Reduce the ringing effect in the spatial domain;
• Preserve most of the spatial frequency contents of the resultant image.
In this section, we describe the filter H(f_x, f_y) as a 2-D radially symmetric filter with a smooth window; that is,

H(f_x, f_y) = H_r(f_x, f_y) W(ρ)   (4.72)

The 2-D radially symmetric filter H_r(f_x, f_y) is defined as:

H_r(f_x, f_y) = { 1, if ρ(f_x, f_y) ≤ ρ_s0;  0, otherwise }   (4.73)

where

ρ(f_x, f_y) = √(f_x² + f_y²)

is the radial spatial frequency. The parameter ρ_s0 is the cutoff frequency in the radial spatial frequency domain. Four radial spatial frequency windows, W(ρ), are described in the following:

1. Radially symmetric filter with a Gaussian window;
2. Radially symmetric filter with a Hamming window at a transition point;
3. Radially symmetric filter with a Butterworth window at a transition point;
4. Radially symmetric filter with a power window.

4.4.1 Radially Symmetric Filter with a Gaussian Window
A Gaussian window is defined as:

W(ρ) = exp(−ρ² / (2σ²))   (4.74)

where σ is the standard deviation of the window. However, the Gaussian window imposes a relatively heavy attenuation on the higher-frequency components in the resultant image.

4.4.2 Radially Symmetric Filter with a Hamming Window at a Transition Point
A Hamming window is defined as:

W(ρ) = 0.54 + 0.46 cos(πρ / ρ_s0)   (4.75)

where ρ_s0 is the cutoff frequency. A Hamming window is similar to a Gaussian window in that it imposes a heavy attenuation on the higher-frequency components in the resultant image. A better alternative is to apply a Hamming window at a transition point ρ_t close to the cutoff frequency. This window is constructed by:

W(ρ) = { 1, if ρ ≤ ρ_t;  0.54 + 0.46 cos[π(ρ − ρ_t)/(ρ_s0 − ρ_t)], if ρ_t < ρ ≤ ρ_s0;  0, otherwise }   (4.76)

The transition point ρ_t can be selected as a certain percentage of the cutoff frequency based on the user's specification; that is,

ρ_t = γρ_s0,  0 < γ < 1   (4.77)
It can be shown that the 3-dB point of this filter (i.e., the point at which the magnitude of this filter drops by 3 dB) is at

ρ_1 = 0.62ρ_t + 0.38ρ_s0 = (0.62γ + 0.38)ρ_s0   (4.78)

Once the user selects ρ_t (or γ), then ρ_1 can be determined, or vice versa. We also notice that when ρ = ρ_s0, we have

W(ρ_s0) = 0.08   (4.79)

That is,

20 log_10 W(ρ_s0) = −21.9 dB   (4.80)
In this way, the filter does not significantly distort higher-frequency components in the resultant image. However, this window is not differentiable at the point where the Hamming window is applied (i.e., at ρ = ρ_t). A nondifferentiable window is known to cause undesirable ringing (sidelobes). The Hamming window with the parameter ρ_t = 0.7ρ_s0 is shown in Figure 4.19.

4.4.3 Radially Symmetric Filter with a Butterworth Window at a Transition Point
A Butterworth window is defined as:

W(ρ) = 1 / [1 + (ρ/ρ_1)^n]^(1/2)   (4.81)

where ρ_1 is the 3-dB point, which can be selected as a certain percentage of the cutoff frequency ρ_s0. From this Butterworth window, when ρ = ρ_1, we have

20 log_10 W(ρ_1) = 20 log_10 (1/√2) = −3 dB   (4.82)
Figure 4.19 Power window (solid line), Hamming window at a transition point (dashed line), and Butterworth window at a transition point (dotted line).
Then one can select W(ρ) at ρ = ρ_s0 to be

20 log_10 W(ρ_s0) = −21.9 dB   (4.83)
Based on this criterion, the parameter n can be solved, and then the Butterworth window can be determined. The Butterworth window with the parameter ρ_1 = 0.7ρ_s0 is shown in Figure 4.19.

4.4.4 Radially Symmetric Filter with a Power Window
The radial spatial frequency power window is defined as:

W(ρ) = exp(−αρ^n)   (4.84)

where the parameters (α, n) are chosen by the user based on certain specifications, and n is an even integer. Since the parameter n is the power of the smooth window, this window is called the power window. The selection of the parameters (α, n) is discussed next.

The Gaussian window is a special case of the power window when n = 2 and α = 1/(2σ²), where σ is the standard deviation of the signal. However, the Gaussian window imposes a relatively heavy attenuation on the higher-frequency components in the resultant image. Therefore, we will consider another variation of the power window. By specifying two distinct values of W(ρ), such as

W(ρ) = { W_1, for ρ = ρ_1;  W_2, for ρ = ρ_2 }   (4.85)
one can solve for the parameters (α, n). For example, one can select:

20 log_10 W_1 |_(ρ_1 = 0.7ρ_s0) = −3 dB
20 log_10 W_2 |_(ρ_2 = ρ_s0) = −21.9 dB   (4.86)

That means that the magnitude of the spectrum drops by 3 dB at 70 percent of the band, and the magnitude drops by 21.9 dB at the cutoff frequency ρ_s0. In this case, the solution for the parameters (α, n) is

α = 2.5 × 10⁻⁸,  n = 6   (4.87)
The power window with these parameters is shown in Figure 4.19. This filter has a desired property: it is differentiable at the transition point. Therefore, it achieves both of the desired filter design requirements:

1. Reduce the ringing effect in the spatial domain.
2. Preserve most of the spatial frequency components.
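As a small illustration of the parameter selection just described, the following Python sketch solves (4.85) and (4.86) for (α, n). The cutoff value ρ_s0 used below is an assumed example, not a value taken from the text.

    import numpy as np

    def power_window_params(rho1, rho2, db1=-3.0, db2=-21.9):
        # W(rho) = exp(-alpha * rho**n)  =>  alpha * rho**n = -ln(10**(db/20))
        a1 = -np.log(10.0 ** (db1 / 20.0))
        a2 = -np.log(10.0 ** (db2 / 20.0))
        n = np.log(a2 / a1) / np.log(rho2 / rho1)   # exponent satisfying both dB specifications
        n = 2 * round(n / 2)                        # the text requires n to be an even integer
        alpha = a1 / rho1 ** n
        return alpha, n

    rho_s0 = 22.0                                   # assumed cutoff, in radial-frequency units
    alpha, n = power_window_params(0.7 * rho_s0, rho_s0)
    print(alpha, n)                                 # n = 6 and alpha on the order of 1e-8 here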
4.4.5 Performance Comparison of Fourier-Based Filters
It is desirable to use a 2-D radially symmetric filter with a smooth window that yields a smooth transition at the cutoff frequency. The three types of windowing methods used conventionally are Gaussian, Hamming, and Butterworth. A 2-D radially symmetric filter using the Gaussian window imposes a relatively heavy attenuation on the higher-frequency components in the resultant image (see Section 4.4.1). For a Hamming window or a Butterworth window, a better alternative is to apply either window at a transition point close to the cutoff frequency (see Sections 4.4.2 and 4.4.3). This way, the filter does not significantly distort the higher-frequency components in the resultant image. However, this type of window is not differentiable at the point where the window is applied, and a nondifferentiable window is known to cause undesirable ringing (sidelobes).

Figure 4.19 depicts the power window with the parameters (n = 6, α = 2.5 × 10⁻⁸). A Hamming window at the 3-dB point ρ_1 = 0.7ρ_s0 (dashed line) and a Butterworth window at the 3-dB point ρ_1 = 0.7ρ_s0 (dotted line) are also shown for comparison. Figure 4.20 shows the impulse responses of the power, Hamming, and Butterworth windows. The impulse responses show that the power window has the smoothest and narrowest transition band among the three windows. Both the Hamming and Butterworth windows have unexpected shapes in the transition band. The Hamming window is not differentiable at the transition point, and the Butterworth window introduces a phase relation between the input and output signals that is not the same for all frequencies. These are among the reasons that the impulse responses of the Hamming and Butterworth windows behave unexpectedly.
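The comparison in this section can be reproduced qualitatively with a short sketch such as the one below, which builds 1-D versions of the three windows and compares their impulse responses by inverse FFT. The grid size, cutoff, and transition point are illustrative assumptions, not values from the text.

    import numpy as np

    N = 512
    rho = np.abs(np.fft.fftfreq(N, d=1.0 / N))       # radial frequency (1-D cross section)
    rho_s0 = 0.25 * N                                 # assumed cutoff bin
    rho1 = 0.7 * rho_s0                               # transition / 3-dB point

    hamming_tp = np.where(rho <= rho1, 1.0,
                          0.54 + 0.46 * np.cos(np.pi * (rho - rho1) / (rho_s0 - rho1)))
    hamming_tp[rho > rho_s0] = 0.0
    butterworth = 1.0 / np.sqrt(1.0 + (rho / rho1) ** 12)                  # n = 6
    alpha = -np.log(10 ** (-3 / 20)) / rho1 ** 6                           # -3 dB at rho1
    power = np.exp(-alpha * rho ** 6)                                      # power window, n = 6

    for name, W in [("Hamming@tp", hamming_tp), ("Butterworth", butterworth), ("Power", power)]:
        psf = np.abs(np.fft.fftshift(np.fft.ifft(W)))                      # impulse response
        sidelobe = psf[np.argmax(psf) + 20:].max() / psf.max()             # crude ringing measure
        print(name, f"relative sidelobe level ~ {20 * np.log10(sidelobe):.1f} dB")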
Figure 4.20 Impulse response of the power window (solid line), Hamming window at a transition point (dashed line), and Butterworth window at a transition point (line with crosses).

4.5 The Wavelet Transform

Wavelets have been viewed as a useful technique for many purposes [18–22]. For example, wavelets have been developed for signal-representing functions that are used in image compression; wavelets have also been developed to extend Fourier analysis to time-frequency analysis in signal and image processing.

Fourier transforms reveal the spectral information of the signal. In some applications, one only wants to investigate the spectrum of a localized time observation; this is called time-frequency analysis. However, this cannot be done easily without acquiring full knowledge of the signal in the time domain and performing the Fourier transform of the entire signal. A technique called the short-time Fourier transform (STFT) [23] was introduced, in which a time-localization window is applied to the signal centered around the desired time T and a Fourier expansion is applied to the windowed signal. The Gabor transform is a special case of the STFT in which a Gaussian function is used as the window. The time-frequency window of any STFT is rigid, however, and is not very effective for detecting high-frequency signals and investigating low-frequency signals at the same time. Therefore, the idea of wavelets was introduced by Grossman and Morlet [24] to overcome this inconvenience of the STFT: a scaling parameter (dilation) narrows and widens the time-frequency window according to high and low frequencies.

A related idea is multiscale (multiresolution) analysis in image processing. Generally, the desired structures have very different sizes; hence, it is not possible to define an a priori optimal resolution for analyzing images. The idea is to choose a function called the mother wavelet, which often has small spatial support (a small wave), and hence the name wavelet. A signal is then expanded (decomposed) into a family of functions that are the dilations (scales) and translations of the mother wavelet.

Note: We commonly use Z and R to denote the set of integers and real numbers, respectively. L²(R) denotes the space of measurable, square-integrable functions.

4.5.1 Time-Frequency Wavelet Analysis

4.5.1.1 Window Fourier Transform
The forward Fourier transform, which is defined in (4.2),

G(f_x) = ∫_{−∞}^{∞} g(x) exp(−j2πf_x x) dx = ℱ_(x)[g(x)]   (4.88)
provides a measure of spectral information of the signal g(x). The Fourier transform is defined through an integral that covers the whole spatial domain. In order to extract information of the spectrum G(f_x) from a local observation of the signal g(x), Gabor [23] defined a new transform using a special window in the Fourier integral:

ℛ(f_x, u) = ∫_{−∞}^{∞} [g(x) exp(−j2πf_x x)] z(x − u) dx = ℛ_(f_x, u)[g(x)]   (4.89)

where u is the desired location of the local observation. In the original Gabor transform, the window z(x) is a Gaussian function. The Gabor transform localizes the Fourier transform of g(x) around x = u. This can be seen in the following window Fourier transform,
ℛ_(f_x, u)[g(x)] = ∫_{−∞}^{∞} g(x) [exp(−j2πf_x x) z(x − u)] dx   (4.90)

We denote

w_(f_x, u)(x) ≡ exp(−j2πf_x x) z(x − u)   (4.91)
then a window Fourier transform is interpreted as the inner product of the signal g(x) with the family of window functions (w_(f_x, u)(x))_((f_x, u) ∈ R²):

ℛ_(f_x, u)[g(x)] = ⟨g(x), w_(f_x, u)(x)⟩ = ∫_{−∞}^{∞} g(x) w̄_(f_x, u)(x) dx   (4.92)

In general, for any window w(x) ∈ L²(R) that satisfies

x w(x) ∈ L²(R)   (4.93)
the center x_c and width Δ_w of the window w(x) are defined by

x_c ≡ (1 / ‖w(x)‖²) ∫_{−∞}^{∞} x |w(x)|² dx   (4.94)

and

Δ_w ≡ (1 / ‖w(x)‖) { ∫_{−∞}^{∞} (x − x_c)² |w(x)|² dx }^(1/2)   (4.95)

The norm of w(x) is ‖w(x)‖² = ∫_{−∞}^{∞} |w(x)|² dx, so that the Gabor transform ℛ_(f_x, u)[g(x)] gives the local information of g(x) in the spatial window:

[x_c + u − Δ_w, x_c + u + Δ_w]   (4.96)

Suppose that the Fourier transform W_(f_x, u)(f_x) of the window function w_(f_x, u)(x) also satisfies (4.93). Then, the center f_xc and width Δ_W of the window function W_(f_x, u)(f_x) are determined by using formulas analogous to (4.94) and (4.95). By setting [18]:

V_(f_x, u)(f_x) ≡ exp(j2πf_xu u) exp(−j2πf_x u) W_(f_x, u)(f_x − f_xu)   (4.97)

the Parseval identity yields

ℛ_(f_x, u)[g(x)] = ⟨g(x), w_(f_x, u)(x)⟩ = ⟨G(f_x), V_(f_x, u)(f_x)⟩   (4.98)
Therefore, the Gabor transform also gives local information of g(x) in the frequency window:

[f_xc + f_xu − Δ_W, f_xc + f_xu + Δ_W]   (4.99)

The width of the spatial-frequency window remains unchanged for localizing signals with both high and low frequencies.
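A minimal numerical sketch of the window Fourier (Gabor) transform of (4.89) follows; the Gaussian window width, grid spacing, and test signal are assumptions chosen only for illustration.

    import numpy as np

    def gabor_transform(g, dx, sigma, u_positions):
        """Windowed Fourier transform: multiply by a shifted Gaussian window, then FFT."""
        x = np.arange(len(g)) * dx
        rows = []
        for u in u_positions:
            z = np.exp(-0.5 * ((x - u) / sigma) ** 2)   # Gaussian window centered at u
            rows.append(np.fft.rfft(g * z) * dx)        # discretized windowed Fourier integral
        return np.array(rows), np.fft.rfftfreq(len(g), d=dx)

    x = np.arange(1024) * 0.01
    g = np.where(x < 5.0, np.sin(2 * np.pi * 2 * x), np.sin(2 * np.pi * 8 * x))  # 2 Hz then 8 Hz
    R, freqs = gabor_transform(g, dx=0.01, sigma=0.5, u_positions=[2.5, 7.5])
    print(freqs[np.abs(R).argmax(axis=1)])   # ~2 Hz near u = 2.5 and ~8 Hz near u = 7.5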
4.5.1.2 Wavelet Transform
A window Fourier transform is better suited for analyzing signals where all of the patterns appear at approximately the same scale. In order to analyze signals with patterns at various resolutions in the spatial and frequency domains, the wavelet transform was introduced by Grossman and Morlet [24]. A basic function is selected and is called the mother wavelet, ψ(x). The corresponding wavelet family is given by the translations and scales (dilations) of this unique function, ψ(x) (i.e., ψ[s(x − u)], where s is the scale parameter and u is the translation parameter). Then the wavelet transform of a signal g(x) ∈ L²(R) is defined by

W(s, u)[g(x)] = ∫_{−∞}^{∞} g(x) √s ψ[s(x − u)] dx   (4.100)

If we denote:

ψ_s(x) = √s ψ(sx)   (4.101)

where s is a scale parameter, the wavelet transform can also be written as an inner product, as in (4.92):

W(s, u)[g(x)] = ⟨g(x), ψ_s(x − u)⟩   (4.102)
where u is a translation parameter. Let us assume that the mother wavelet ψ(x) and its Fourier transform Ψ(f_x) both satisfy (4.93). Then, if the center and width of the window function ψ(x) are given by x_c and Δ_ψ, respectively, the wavelet transform gives local information of a signal g(x) within a spatial window

[u + x_c/s − Δ_ψ/s, u + x_c/s + Δ_ψ/s]   (4.103)

This window narrows for large values of s and widens for small values of s. Now, consider and reset the Fourier transform of the mother wavelet as follows [18]:

Ψ_(s, u)(f_x) = √s ∫_{−∞}^{∞} exp(−j2πf_x x) ψ[s(x − u)] dx = (1/√s) exp(−j2πf_x u) Ψ(f_x / s)   (4.104)
Let the center and width of the mother wavelet window function be f_xc and Δ_Ψ, respectively. Then, by setting

Θ(f_x) ≡ Ψ(f_x + f_xc)   (4.105)

and similarly using Parseval's identity, the following results:

W(s, u)[g(x)] = (1/√s) ∫_{−∞}^{∞} G(f_x) exp(j2πf_x u) Θ((f_x − s f_xc)/s) df_x   (4.106)
This equation expresses that, with the exception of a multiple of 1/√s and a linear phase shift of exp(j2πf_x u), the wavelet transform W(s, u)[g(x)] also gives local information of G(f_x) within a frequency window

[s f_xc − sΔ_Ψ, s f_xc + sΔ_Ψ]   (4.107)

This means that the center of the passband of Ψ(f_x) is s f_xc and that its bandwidth is sΔ_Ψ. With the spatial window and the frequency window, a rectangular spatial-frequency window results:

[u + x_c/s − Δ_ψ/s, u + x_c/s + Δ_ψ/s] × [s f_xc − sΔ_Ψ, s f_xc + sΔ_Ψ]   (4.108)
As opposed to a window Fourier transform, the shape of the resolution cell varies with the scale s. When the scale s is small, the resolution is coarse in the spatial domain and fine in the frequency domain. If the scale s increases, then the resolution increases in the spatial domain and decreases in the frequency domain. Therefore, this window automatically narrows for detecting high-frequency patterns and widens for investigating low-frequency behaviors. Figure 4.21 shows this phenomenon. In this figure, for a mother wavelet,

ψ_s(x) = √s ψ(sx)   (4.109)

its Fourier transform is

Ψ_s(f_x) = (1/√s) Ψ(f_x / s)   (4.110)

where the scale parameter is set to be s = 2^(−m), m = 1, 2, …, M.

Mallat [25] shows that when we denote ψ̃_s(x) = ψ_s(−x), we can rewrite the wavelet transform at a scale s and a point u as a convolution product with ψ̃_s(x):

W(s, u)[g(x)] = g(x) * ψ̃_s(u)   (4.111)
Figure 4.21 Frequency response for the wavelet function from various scale parameters, s = 2^(−m), m = 1, 2, ….
Therefore, a wavelet transform can be viewed as a filtering of g(x) with a bandpass filter whose impulse response is ψ̃_s(x). The Fourier transform of ψ̃_s(x) is given by

Ψ_s(f_x) = (1/√s) Ψ(f_x / s)   (4.112)

Hence, the wavelet transform can decompose the signal into a family of functions that are the translations and dilations of the mother wavelet. This family of functions is a set of frequency bands; therefore, the signal can be analyzed (localized) at various resolutions.

Example 4.13
Figure 4.22(a) shows a wavelet that is a quadratic spline function and is the derivative of a smoothing function, the cubic spline function shown in Figure 4.22(b) [26]. The Fourier transform of the smoothing function is expressed as

S(f_x) = [e^(jπf_x) cos(πf_x)]³   (4.113)

The Fourier transform of the wavelet function is written as

Ψ(f_x) = 4j e^(jπf_x) sin(πf_x)   (4.114)

A given signal and its wavelet transform computed on three scales according to (4.111) are shown in Figure 4.23. The numerical implementation of the wavelet transform is described in Section 4.5.5, after the discrete wavelet transform (DWT) is introduced in the next section. Since the wavelet is the derivative of a smoothing function, the wavelet transform at each scale, denoted by W_s[g(x)], is proportional to the derivative of the original signal smoothed at the scale s. This is used to analyze the edges of the signal, as described in later sections.
Figure 4.22 (a) The Fourier transform of the wavelet, which is a quadratic spline. It is the derivative of a smooth function, which is the cubic spline shown in (b).
Figure 4.23 Input signal g(x) and its wavelet transform computed on three scales, W1[g(x)], W2[g(x)], and W3[g(x)]. Since the wavelet is the derivative of a smoothing function, the wavelet transform at each scale denoted by Ws[g(x)] is proportional to the derivative of the original signal smoothed at the scale s.
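The convolution form (4.111) can be illustrated with the short sketch below. A derivative-of-Gaussian wavelet is used here in place of the quadratic-spline wavelet of Example 4.13 (an assumption made for brevity), so the output at each scale is proportional to the derivative of a smoothed version of the signal, as stated above.

    import numpy as np

    def dog_wavelet(scale, half_width=6.0, dx=1.0):
        """Derivative of a Gaussian, dilated by `scale` (larger scale = heavier smoothing)."""
        x = np.arange(-half_width * scale, half_width * scale + dx, dx)
        return -x / scale ** 3 * np.exp(-0.5 * (x / scale) ** 2)

    g = np.zeros(512)
    g[200:320] = 1.0                                  # step-up / step-down test signal
    for m in (1, 2, 3):                               # dyadic scales s = 2**m
        w = np.convolve(g, dog_wavelet(2.0 ** m), mode="same")
        # The extrema of w mark the two edges of the rectangle, smoothed more at larger scales.
        print(f"scale 2^{m}: extrema near samples", np.argmax(w), np.argmin(w))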
4.5.2 Dyadic and Discrete Wavelet Transform
The discrete wavelet transform is formed by sampling both the scale parameter s and the translation parameter u. When the scale parameter is sampled as s = 2^m, this is called dyadic sampling, as shown in Figure 4.24. In dyadic sampling, the signal is sampled at a sampling rate of 2^(−m) (a sampling space of 2^m) at the scale 2^m, in order to characterize the signal at each scale. Hence, at the scale 2^m, the translation parameter u is sampled at a rate proportional to 2^m (i.e., with spacing 2^(−m)). When the scale decreases, the density of samples decreases. The scaled mother wavelet is denoted as

ψ_(2^m)(x) = 2^(−m) ψ(2^(−m) x)   (4.115)

The discrete wavelet transform is defined by

W(m, n)[g(x)] = W(2^m, 2^(−m)n)[g(x)] = ∫_{−∞}^{∞} g(x) ψ_(2^m)(x − 2^(−m)n) dx   (4.116)

where 2^m is the sample of the scale and 2^(−m)n is the sample of the translation in the spatial domain. The wavelet transform using ψ_(2^m)(x − 2^(−m)n) is referred to as the dyadic wavelet transform.

Figure 4.24 Dyadic sampling.
4.5.3 Condition of Constructing a Wavelet Transform

What kind of function can be a mother wavelet? In order to reconstruct g(x) from its wavelet transform, an admissibility condition is described by Chui [18] as follows. A function ψ(x) is called a mother wavelet if its Fourier transform Ψ(f_x) satisfies the admissibility condition:

C_ψ ≡ ∫_{−∞}^{∞} |Ψ(f_x)|² / |f_x| df_x < ∞   (4.117)

In addition, if the Fourier transform Ψ(f_x) is a continuous function, then the finiteness of C_ψ in (4.117) implies Ψ(0) = 0, or equivalently,

∫_{−∞}^{∞} ψ(x) dx = 0   (4.118)

This equation says that a requirement for a function to serve as a mother wavelet is that its integral be zero. In fact, it is often chosen to have a small spatial support; this is the reason why ψ(x) is called a wavelet. Because the Fourier transform is zero at the origin and the spectrum decays at high frequencies, the wavelet has a bandpass behavior, as seen in Figure 4.21. The frequency response of the wavelet function at each scale is a bandpass filter with a varying center frequency and bandwidth.
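A small numerical check of the admissibility condition, using the Mexican-hat wavelet as an assumed example mother wavelet, might look like the following sketch.

    import numpy as np

    x = np.linspace(-20, 20, 4001)
    psi = (1 - x ** 2) * np.exp(-x ** 2 / 2)            # Mexican-hat wavelet
    print("integral of psi ~", np.trapz(psi, x))         # ~0, as (4.118) requires

    f = np.fft.rfftfreq(len(x), d=x[1] - x[0])
    Psi = np.abs(np.fft.rfft(psi)) * (x[1] - x[0])       # sampled |Psi(f_x)|
    C_psi = 2 * np.trapz(Psi[1:] ** 2 / f[1:], f[1:])    # finite value -> admissible, per (4.117)
    print("C_psi ~", C_psi)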
4.5.4 Forward and Inverse Wavelet Transform

As defined in Section 4.5.1, the forward dyadic wavelet transform is computed by convolving the signal with a dilated wavelet. To simplify the notation, we denote the wavelet transform of g(x) at the scale 2^m and position x, computed with respect to the wavelet ψ_(2^m)(x), as follows:

W(2^m)[g(x)]_(m ∈ Z) = g(x) * ψ_(2^m)(x)   (4.119)

Equation (4.119) generates a sequence of functions for the scales 2^m, m ∈ Z, where Z is the set of integers. A function χ(x) is called an inverse wavelet if its Fourier transform satisfies [26]

Σ_(m = −∞)^(∞) Ψ(2^m f_x) χ(2^m f_x) = 1   (4.120)

By applying the inverse wavelet, the signal g(x) is recovered from its dyadic wavelet transform with the summation

g(x) = Σ_(m = −∞)^(∞) W(2^m)[g(x)] * χ_(2^m)(x)   (4.121)
Example 4.14
For the example shown in Section 4.5.1, the inverse wavelet that satisfies (4.120) is obtained as follows [26]:

χ(f_x) = (1 − |S(f_x)|²) / Ψ(f_x)   (4.122)

where S(f_x) and Ψ(f_x) are expressed in (4.113) and (4.114), respectively. Figure 4.25 shows the reconstructed signal from its wavelet transforms.

Figure 4.25 Signal g(x) is reconstructed from its dyadic wavelet transform.

4.5.5 Two-Dimensional Wavelet Transform
The 2-D wavelet transform can be derived from the 1-D wavelet transform. Mallat and Zhong developed a wavelet transform for 2-D signals [26] in which the wavelet functions are selected as the first derivatives of a smooth function. If the smooth function is denoted as θ(x, y), two wavelet functions are defined as:

ψ^x(x, y) = ∂θ(x, y)/∂x  and  ψ^y(x, y) = ∂θ(x, y)/∂y   (4.123)

Let the scaled functions be

ψ_s^x(x, y) = (1/s²) ψ^x(x/s, y/s)  and  ψ_s^y(x, y) = (1/s²) ψ^y(x/s, y/s)   (4.124)

Then the wavelet transform of the 2-D signal g(x, y) at the scale s has two components:

W_s^x[g(x, y)] = g(x, y) * ψ_s^x(x, y)  and  W_s^y[g(x, y)] = g(x, y) * ψ_s^y(x, y)   (4.125)
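A hedged sketch of (4.123) through (4.125) is given below. A Gaussian is used as the smoothing function θ(x, y), which is an assumption made for illustration rather than the spline of [26]; the two outputs approximate the smoothed partial derivatives of the image.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def wavelet_components(image, scale):
        smoothed = gaussian_filter(image.astype(float), sigma=scale)   # g * theta_s
        Wx = np.gradient(smoothed, axis=1)   # partial derivative along x (columns) ~ W_s^x[g]
        Wy = np.gradient(smoothed, axis=0)   # partial derivative along y (rows)    ~ W_s^y[g]
        return Wx, Wy

    img = np.zeros((128, 128))
    img[32:96, 32:96] = 1.0                  # a bright square
    Wx, Wy = wavelet_components(img, scale=2.0)
    modulus = np.hypot(Wx, Wy)               # large values mark the edges of the square
    print(modulus.max(), modulus[64, 32], modulus[0, 0])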
4.5.6 Multiscale Edge Detection
A multiscale edge detection algorithm for 2-D signals is developed in [26] using the two components of the wavelet transform from (4.125) (see Chapter 8). A third-degree, cardinal B-spline function, as shown in (4.113), is selected as the smoothing filter and its first derivative as the mother wavelet. LaRossa and Lee [27] extend Mallat and Zhong's design to use the fourth-order, cardinal B-spline function. The forward edge-wavelet transform is formulated in the spatial domain, according to Figure 4.26. The filters used in the wavelet transform in Figure 4.26 can all be x-y separable, and the subscript x or y indicates the direction in which the filter is applied. For example, the smoothing filter is written as SxSy = S(f_x)S(f_y). In Figure 4.26, Sx (or Sy) is a smoothing filter, Dx (or Dy) is a differentiation filter, and Kx (or Ky) is an all-pass filter. Sx = S(f_x) is chosen as the fourth-order, cardinal B-spline function:

S(f_x) = cos⁴(πf_x)   (4.126)
The corresponding impulse responses of the forward wavelet functions [filter coefficients S(n) and D(n)] are given in Table 4.1. According to Figure 4.26, the input image, i(x, y), is convolved with the smoothing filter SxSy (designed in Table 4.1) to produce a first blurred version of the input image, b1(x, y). The input image is also convolved with the differentiation filter DxKy to produce a horizontal partial derivative h1(x, y), and with the differentiation filter KxDy to produce a vertical partial derivative v1(x, y). Thus, h1(x, y) and v1(x, y) are the horizontal and vertical partial derivatives (wavelet coefficients) of the input image at the finest resolution; they are the representations of the edge signals of the input image at the finest resolution.

The process is then repeated using b1(x, y) in place of the input image, with the same filters. In Mallat's implementation, the filters at the scale 2^m are obtained by putting 2^m − 1 zeros between each of the coefficients of the filters. LaRossa and Lee [27] taught a more computationally efficient method to implement the wavelet transform: instead of putting zeros between the coefficients of the filters, the convolution at the scale 2^m is done by skipping every 2^m − 1 pixels. Each time a coarser resolution is computed, the convolution kernels are the same, but the pixel intervals are increased by a factor of 2. Thus, this algorithm reduces the resolution to produce the further blurred image, bm(x, y), and the horizontal and vertical wavelet coefficients hm(x, y) and vm(x, y). The filtered output images have the same size as the input image. Although these images use more memory in the computer, they have advantages for edge enhancement because each pixel has a corresponding pixel at each resolution scale.

Figure 4.26 Block diagram showing the forward edge-wavelet transform.

Table 4.1 Filter Coefficients for the Forward Edge-Wavelet Transform

n           −4   −3   −2   −1    0    1    2    3    4
16 × S(n)    0    0    1    4    6    4    1    0    0
2 × D(n)     0    0    0   −1    0    1    0    0    0

For the decomposition to be useful, the inverse transform is necessary so that we can modify the edge information at each scale and reconstruct the image back to the intensity domain. Figure 4.27 illustrates the inverse edge-wavelet transform in the spatial domain. The filter coefficients of the inverse edge-wavelet transform are computed according to (4.122) and are listed in Table 4.2. According to Figure 4.27, the reconstruction of the image starts with the lowest-resolution blurred image, b3(x, y), and the partial derivatives h3(x, y) and v3(x, y). Each of them is convolved with a different reconstruction filter (PxPy, QxLy, and LxQy, respectively, designed in Table 4.2), and the filtered results are summed together to produce a reconstructed image.
Figure 4.27 Block diagram showing the inverse edge-wavelet transform.

Table 4.2 Filter Coefficients for the Inverse Edge-Wavelet Transform

n            −4   −3   −2   −1     0    1    2    3    4
16 × P(n)     0    0    1    4     6    4    1    0    0
128 × Q(n)    0   −1   −9  −37   −93   93   37    9    1
512 × L(n)    1    8   28   56   326   56   28    8    1

At level three, the convolution is computed by multiplying the kernel elements with every eighth pixel in the image being convolved. As shown in Figure 4.27, the same process is repeated until the output image is reconstructed at the finest resolution. For each level up, the convolution pixel interval is decreased by a factor of 2. It should be apparent that if b3′(x, y) and all of the partial derivatives (wavelet coefficients hm′ and vm′) are unmodified, the reconstructed image will be identical to the input image.

Example 4.15

An example of a three-level wavelet transform of a FLIR image, in which the forward (Figure 4.26) and inverse (Figure 4.27) edge-wavelet transforms are applied, is shown in Figure 4.28.

Figure 4.28 An example of a 2-D wavelet transform. The original input image is at the upper left, and the reconstructed image is at the upper right. They are identical. The first column from the left shows the bm(x, y) images; the second column, the hm(x, y) images; and the third column, the vm(x, y) images, m = 1, 2, 3.
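One level of the forward edge-wavelet transform of Figure 4.26 can be sketched as follows. The smoothing coefficients come from Table 4.1; the differentiation coefficients D = [−1, 0, 1]/2 are this editor's reading of the garbled table and should be treated as an assumption. Mallat's zero-insertion dilation of the filters is used here rather than the pixel-skipping variant of [27].

    import numpy as np
    from scipy.ndimage import convolve1d

    S = np.array([1, 4, 6, 4, 1]) / 16.0      # smoothing filter (Table 4.1)
    D = np.array([-1, 0, 1]) / 2.0            # differentiation filter (assumed reading)

    def atrous_kernel(h, m):
        """Insert 2**m - 1 zeros between coefficients (Mallat's dilation of the filter)."""
        k = np.zeros((len(h) - 1) * 2 ** m + 1)
        k[::2 ** m] = h
        return k

    def forward_level(b, m):
        Sm, Dm = atrous_kernel(S, m), atrous_kernel(D, m)
        smooth = convolve1d(convolve1d(b, Sm, axis=1, mode="mirror"), Sm, axis=0, mode="mirror")
        h = convolve1d(b, Dm, axis=1, mode="mirror")   # DxKy: differentiate in x, all-pass in y
        v = convolve1d(b, Dm, axis=0, mode="mirror")   # KxDy: differentiate in y, all-pass in x
        return smooth, h, v

    img = np.random.rand(64, 64)
    b1, h1, v1 = forward_level(img, m=0)
    b2, h2, v2 = forward_level(b1, m=1)       # coarser level; outputs keep the input size
    print(b2.shape, h2.shape, v2.shape)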
4.6 Summary

In this chapter, we reviewed the theory of some basic signal and image processing techniques, including Fourier transforms, FIR filters, and wavelet transforms, and their properties. The Fourier transform is useful for analyzing signal spectrum characteristics and for filter design. The FIR filter is a useful compromise to Fourier-based filtering in certain circumstances. The wavelet transform is useful for analyzing a signal at various resolutions and for designing tools for image enhancement and image compression. Applications of this theory to many signal and image processing problems are found in subsequent chapters.
References

[1] Goodman, J., Introduction to Fourier Optics, New York: McGraw-Hill, 1968.
[2] Bracewell, R. N., The Fourier Transform and Its Applications, 2nd ed., New York: McGraw-Hill, 1978.
[3] Carlson, A. B., Communication Systems, New York: McGraw-Hill, 1986.
[4] Friedman, B., Principles and Techniques of Applied Mathematics, New York: Wiley, 1956.
[5] Soumekh, M., Synthetic Aperture Radar Signal Processing, New York: Wiley, 1999.
[6] Oppenheim, A., and A. S. Wilsky, Signals and Systems, Englewood Cliffs, NJ: Prentice-Hall, 1983.
[7] Papoulis, A. V., Systems and Transforms with Applications in Optics, New York: McGraw-Hill, 1968.
[8] Jain, A. K., Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[9] Hamming, R. W., Digital Filters, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[10] Dumitras, A., and F. Kossentini, "High-Order Image Subsampling Using Feedforward Artificial Neural Networks," IEEE Transactions on Image Processing, Vol. 10, No. 3, 2001, pp. 427–435.
[11] Kim, J.-O., et al., "Neural Concurrent Subsampling and Interpolation for Images," Proceedings of the IEEE Region 10 Conference, TENCON 99, Vol. 2, Cheju Island, South Korea, September 15–17, 1999, pp. 1327–1330.
[12] Casas, J. R., and L. Torres, "Morphological Filter for Lossless Image Subsampling," Proceedings of the International Conference on Image Processing, ICIP'94, Vol. 2, Austin, TX, November 13–16, 1994, pp. 903–907.
[13] Landmark, A., N. Wadstromer, and H. Li, "Hierarchical Subsampling Giving Fractal Regions," IEEE Transactions on Image Processing, Vol. 10, No. 1, 2001, pp. 167–173.
[14] Belfor, R. A. F., et al., "Spatially Adaptive Subsampling of Image Sequences," IEEE Transactions on Image Processing, Vol. 3, No. 5, 1994, pp. 492–500.
[15] Cavallerano, A. P., and C. Ciacci, "Minimizing Visual Artifacts Caused by the Pixel Display of a Video Image," World Patent 99/08439, February 18, 1999.
[16] Drisko, R., and I. A. Bachelder, "Image Processing System and Method Using Subsampling with Constraints Such As Time and Uncertainty Constraints," U.S. Patent 6157732, December 5, 2000.
[17] Zhu, Q., and J. Stec, "System and Method for Imaging Scaling," European Patent 1041511A2, April 10, 2000.
[18] Chui, C. K., An Introduction to Wavelets, San Diego, CA: Academic Press, 1992.
[19] Daubechies, I., Ten Lectures on Wavelets, Philadelphia, PA: Society for Industrial and Applied Mathematics, 1992.
[20] Vetterli, M., and J. Kovacevic, Wavelets and Subband Coding, Englewood Cliffs, NJ: Prentice-Hall, 1995.
[21] Kaiser, G., A Friendly Guide to Wavelets, Boston, MA: Birkhauser, 1994.
[22] Burrus, C. S., R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms: A Primer, Upper Saddle River, NJ: Prentice-Hall, 1998.
[23] Gabor, D., "Theory of Communication," J. IEE (London), Vol. 93, 1946, pp. 429–457.
[24] Grossman, A., and J. Morlet, "Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape," SIAM J. Math. Anal., Vol. 15, 1984, pp. 723–736.
[25] Mallat, S. G., "Multifrequency Channel Decomposition of Images and Wavelets Models," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 12, 1989, pp. 2091–2110.
[26] Mallat, S. G., and S. Zhong, "Characterization of Signals from Multiscale Edges," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 7, 1992, pp. 710–732.
[27] LaRossa, G. N., and H.-C. Lee, "Digital Image Processing Method for Edge Shaping," U.S. Patent 6611627, August 26, 2003.
PART III
Advanced Applications

CHAPTER 5
Image Resampling

5.1 Introduction

Image resampling is the first advanced application of signal processing to be explored in this book. The concept of image resampling originates from the sampled imager discussed in Chapter 2. Here, the discussion relates image resampling to image display and reconstruction from the sampled points of a single image, and then covers spatial domain image resampling as well as Fourier-based, alias-free image resampling. Image resampling performance measurements are also discussed. These topics provide the reader with a fundamental understanding that the way an image is processed and displayed is just as important as the blur and sampling characteristics of the sensor. They also provide the background on undersampled imaging for the discussion of super-resolution image reconstruction in Chapter 6.
5.2 Image Display, Reconstruction, and Resampling

As mentioned in Chapter 2, a digital image obtained from a digital imager is a set of discrete sample values. In order to view it in a meaningful way, the image is presented on a softcopy display monitor, such as a cathode-ray tube (CRT), which displays an array of closely spaced light spots whose intensities are proportional to the image sample magnitudes [1]. Image quality depends on the spot size, shape, and spacing. The shape and size (intensity distribution) of the display pixels determine the postsample blur (see Section 2.5), and they contribute to the performance of the system.

In the simplest case, the number and location of pixels in the display correspond to the number and location of pixels in the image, which is equivalent to the number and location of detectors in the sensor. The brightness of each display pixel is proportional to the signal from the corresponding detector. However, in most cases, more than one display pixel is used per sensor sample. Therefore, interpolation is required to obtain the values between samples. This process is called image reconstruction. The interpolation process should be thought of as performing a near-optimum reconstruction so that more displayed pixels are used in the final image [2]. This technique can be used to increase the size of the displayed image and improve the image reconstruction.

Although the image size is enlarged and more values between image sample points are generated from the image, the image reconstruction does not increase the
image resolution. However, a poor display reconstruction function can reduce the overall imaging system performance. In Chapter 6, a process to increase the image resolution from sequential input images, called super-resolution image reconstruction, is discussed. In this chapter, image resampling for a single image is emphasized.

In certain image applications, it is desirable to reduce the resolution of a captured image. Although the image resolution is reduced, the output image size can be larger or smaller than the original image using the interpolation process. This process is called image resampling. It is also called image resolution conversion or image change of scale. Reducing the resolution of an image is equivalent to increasing a sensor's spot size and spot spacing. It is similar to resampling the original image at a lower sampling rate (with a larger resolution spot). Resampling a single image is analogous to the image reconstruction interpolation process. Specific examples of image resampling include the following:

1. Image softcopy display: When the size of an image is larger than the addressable pixel count of a monitor, the image size or resolution is reduced to display the entire image on the monitor. This situation is often encountered in diagnostic medical imaging systems.
2. Image analysis: In some automatic target recognition (ATR) systems, optical images are acquired by spaceborne (satellite) cameras for intelligence/surveillance/reconnaissance (ISR) purposes. Such optical images are relatively large databases that slow down the speed of the ATR algorithms. By exploiting the fact that these optical images contain a significant amount of redundant data, one can reduce the resolution of the acquired image prior to processing with an ATR algorithm. This decreases the execution time of the ATR algorithm without sacrificing its accuracy. Applications of image resampling in removing noise in synthetic aperture radar (SAR) images and estimating speckle motion in ultrasonic images are also found in [3, 4].
3. Image transmission/image browser/Quick Look: Lossless and lossy data compression methods have been used to transfer image information. In certain applications, the user might not require high-resolution versions of the image for decision making, such as in Quick Look [5], image browsers [6], and image transmission [7], among others. In the case of image transmission, a resampled image followed by a conventional image compression algorithm can significantly reduce the size of the data that needs to be transferred. For accessing and visualizing a large database, such as Earth science data, resampled, low-resolution images allow users to browse (or evaluate) the large database before using and ordering images.

From the examination of sampled imaging systems in Chapter 2 and the review of the sampled image Fourier transform in Chapter 4, it can be shown that the direct resampling of an image results in aliased data [1, 8, 9]. Direct resampling causes another phenomenon called a Moiré effect (or Moiré pattern), which is also due to aliasing [4, 10]. In the following sections, the sampling theory and sampling artifacts are revisited to better understand the source of aliasing or spurious response. Then, the methods of common image resampling in the spatial domain and antialias resampling in the spatial frequency domain based on Fourier windowing are presented.
5.3 Sampling Theory and Sampling Artifacts

5.3.1 Sampling Theory
Sampling theory was originally studied by Whittaker [11] and Shannon [12] in information theory. Sampling theory provides conditions under which the sampled data are an accurate representation of the original function, in the sense that the original function can be reconstructed from the samples. Before we state Shannon's sampling theorem, we first define the bandwidth of the image. Let the image be g(x, y). If its Fourier transform G(f_x, f_y) is zero outside a bounded region in the frequency domain, that is,

G(f_x, f_y) = 0,  |f_x| > f_x0, |f_y| > f_y0   (5.1)

the quantities f_x0 and f_y0 are called the x and y bandwidths of the image (see Figure 5.1). If the spectrum is radially symmetric, then the single spatial frequency ρ_0 is called the bandwidth, where ρ_0 = √(f_x0² + f_y0²). Such an image is called a bandlimited image.

Both (2.53) and (4.35), and the examples in Figures 2.17 and 4.12 of Chapters 2 and 4, respectively, show that the Fourier transform of a sampled function, G_δ(f_x), is composed of repeated versions of the Fourier transform of the original function, G(f_x). Intuitively, if the sample spacing of the original function is small enough, the replicas of G(f_x) in the spatial frequency domain do not overlap, as shown in Figure 5.2(a). Therefore, the original function can be reconstructed from the samples by lowpass filtering, so that only the first-order reproduction in the Fourier transform of the sampled function is used to recover the original function [see Figure 5.2(a)].

Shannon's sampling theorem states that a bandlimited image g(x, y) can be reconstructed without error from its sample values g(x_m, y_n) [see (4.63)] if the sampling frequencies are greater than twice the bandwidths; that is,

f_xs > 2f_x0,  f_ys > 2f_y0   (5.2)
Figure 5.1 Bandwidth illustration.
Figure 5.2 Illustration of (a) nonoverlapping and (b) overlapping of the spectrum of a sampled signal.
or, equivalently, if the sampling spaces (intervals) are smaller than half of the reciprocal of the bandwidths; that is, Δx < 1/(2f_x0) and Δy < 1/(2f_y0).

1. For the case of magnification by L (L > 1) times, where L is an integer magnification factor, the interpolation function interpolates the image on a grid L times finer than the original sampling grid.
2. For the case of noninteger magnification with a factor of L/M, step 1 is performed and every Mth grid point is retained to form the output image [18].
3. For the case of decimation, the procedure is not straightforward. Hou and Andrews [19] suggested a method in which the roles of input and output in the interpolation formula are interchanged and the transfer function is inverted.

Section 5.5 shows that image rescale implementation in the spatial frequency domain is more straightforward.

5.4.3 Resampling Filters
A perfect resampling filter requires an infinite-order interpolation between image samples. Such an interpolation is impractical to implement. In the following, several commonly used interpolation functions for 1-D signals are presented; their extension to 2-D images is straightforward.

1. Sinc interpolation: The sinc function in (2.28) provides an exact reconstruction, but it is usually difficult to produce in the spatial domain. Section 5.5 demonstrates that the implementation of sinc interpolation in the spatial frequency domain is much simpler.
2. Nearest neighbor interpolation: Nearest neighbor interpolation uses a pulse function (see Figure 4.1) that results in a zero-order linear interpolation.
3. Linear interpolation: Linear interpolation uses a triangle function (see Figure 4.9) that provides first-order linear interpolation. In this case, the values of the neighbors are weighted by their distance to the opposite point of interpolation.
4. Cubic interpolation: Cubic interpolation uses cubic polynomials to approximate the curve among the image sample points. In addition to the interpolation condition in (5.8), an interpolation kernel that has a continuous first derivative at the samples is also considered in the boundary conditions. Keys [20] provided a cubic interpolation function:

h(x) = { 2|x|³ − 3|x|² + 1,  0 ≤ |x| ≤ 1;  0, otherwise }   (5.9)
5. Spline interpolation: As mentioned in Figure 4.9, the triangle function is the result of convolving a pulse function with itself; that is,

h_2(x) = h_1(x) * h_1(x)   (5.10)

where h_1(x) is the pulse function and h_2(x) represents the triangle function. Basic splines, which are called B-splines, are among the most commonly used family of spline functions [19]. B-splines h_N(x) of order N are constructed by convolving the pulse function N − 1 times; that is,

h_N(x) = h_1(x) * h_1(x) * ⋯ * h_1(x)   (N − 1 times)   (5.11)
For N → ∞, this process converges to a Gaussian-shaped function h_∞(x). For N = 3, we obtain the quadratic B-spline, which is a bell-shaped function. For N = 4, we obtain the cubic B-spline, which has properties of continuity and smoothness. Hou and Andrews [19] provided a cubic B-spline:

h(x) = { (1/2)|x|³ − |x|² + 2/3,  0 ≤ |x| < 1;  −(1/6)|x|³ + |x|² − 2|x| + 4/3,  1 ≤ |x| ≤ 2;  0, otherwise }   (5.12)
For a 2-D image, the previous 1-D functions are extended to 2-D separable functions to perform 2-D image resampling by interpolating the rows and then the columns of an image. Such 2-D interpolation functions include bilinear interpolation (associated with 1-D linear), bicubic interpolation (associated with 1-D cubic), and bicubic spline interpolation (associated with 1-D spline), among others [1, 21]. The spatial domain interpolations are popular since the interpolation kernels are small and they produce acceptable results. However, the interpolation kernel has a finite and small support in the spatial domain; thus, its frequency domain characteristics are not adequate. The spectral domain support is infinite, and the interpolation kernel cannot remove all of the frequencies beyond the desired cutoff frequency. Therefore, the resampled image suffers a noticeable degradation resulting from aliasing. In the next section, an antialias image resampling method based on improving the interpolation filters in the frequency domain is discussed.
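A minimal 1-D sketch of spatial-domain resampling with the cubic kernel of (5.9) follows; it is illustrative only, and the test signal and output length are arbitrary choices.

    import numpy as np

    def cubic_kernel(x):
        x = np.abs(x)
        return np.where(x <= 1.0, 2 * x ** 3 - 3 * x ** 2 + 1, 0.0)   # h(x) from (5.9)

    def resample_1d(signal, new_len):
        n = len(signal)
        pos = np.linspace(0, n - 1, new_len)     # output positions on the input grid
        idx = np.arange(n)
        out = np.empty(new_len)
        for k, p in enumerate(pos):
            w = cubic_kernel(idx - p)            # support covers the two nearest input samples
            out[k] = np.sum(w * signal) / np.sum(w)
        return out

    sig = np.sin(2 * np.pi * np.arange(32) / 32)
    print(resample_1d(sig, 48)[:4])              # magnified (interpolated) by a factor of 1.5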
5.5 Antialias Image Resampling Using Fourier-Based Methods

In digital image processing, there are various methods to form lower-resolution images. As explained in Section 4.4, Fourier-based filtering methods are suitable for antialias image resampling. With this approach, the user applies a filter in the frequency domain that is zero outside the desired band. The bandwidth of the resultant image is then reduced by the desired factor from the bandwidth of the original image. The resampled version of the image is obtained by taking the inverse Fourier transform of the filtered image in the spectral domain.

The simplest Fourier-based filter is a rectangular one. However, this filter contains large sidelobes (ringing effect, or Gibbs phenomenon) [8] in the spatial domain, which is not desirable. A Fourier-based filter, as discussed in Section 4.4, is utilized with a power window for application to the antialias image resampling problem. This Fourier-based windowing method has the following advantages:

• It reduces the ringing effect in the spatial domain.
• It preserves most of the spatial frequency content of the resampled image.

5.5.1 Image Resampling Model
The notations for the 2-D continuous and discrete Fourier transforms and for sampling are discussed in Section 4.2. Resampled versions of the image are lower-resolution images. The sample spacings in the spatial domain of the resampled image, Δx_s and Δy_s, are larger than those of the original image; that is,

Δx_s > Δx  and  Δy_s > Δy   (5.13)

To avoid aliasing in the resampled image, the higher-frequency components of the image are filtered to obtain the resampled version of the image. Let the ratio of the sample spacing of the resampled image to that of the original image be B; that is,

B = Δx_s / Δx,  or  Δx_s = B Δx   (5.14)

If the bandwidth of the original image is f_x0, then the bandwidth of the resampled version of the image, f_xs0, is effectively reduced by a factor of B; that is,

f_xs0 = f_x0 / B   (5.15)

The ratio B is called the bandwidth reducing factor. Figure 5.3 depicts this phenomenon. The same principle holds for the y-domain data. Thus, the Fourier transform of the resampled image, G_s(f_x, f_y), is generated via the following:

G_s(f_x, f_y) = { G(f_x, f_y),  for |f_x| < f_xs0 and |f_y| < f_ys0;  0, otherwise }   (5.16)
Figure 5.3 Illustration of bandwidth reduction. The bandwidth of the original image, f_x0, is reduced to f_xs0, the bandwidth of the resampled version of the image.
As mentioned earlier, straightforward filtering causes ringing or sidelobe effects. A practical solution to counter this problem is to use a smoother transition near the edge of the cutoff frequency. In most applications, the desired filter should not significantly attenuate the higher-frequency components in the resampled image.

5.5.2 Image Rescale Implementation

5.5.2.1 Output Requirements
In practical applications, the output size of the resampled image may be dictated by the output media. For example, optimal performance of a softcopy display requires that the output image size be compatible with the monitor's pixel count. The pixel count or display element varies with the type of monitor; usually, a high-resolution monitor has more pixels than a medium-resolution monitor. One desired feature of the resampled image is that its size can be selected according to the output requirement. The ratio of the resampled size to the original image size is not constrained to a power of two, which was the constraint in most FIR filter implementations. In fact, the ratio of resampled image size to original image size can be selected as any noninteger value. This is one of the desirable features of using frequency domain filters.

Let the bandwidth of the input image be ρ_i, the bandwidth of the resampled image be ρ_b, and the bandwidth of the output image be ρ_o. Figure 5.4 depicts the following four scenarios:

1. ρ_o = ρ_b < ρ_i. The output image has the same size as the resampled image and a smaller size than the original image. This scenario is called alias-free decimation or subsampling. This is achieved by removing all of the zeros in the band rejection region of the filter before constructing the inverse Fourier transform of the filtered spectrum to obtain the resampled image (see Section 5.5.3).
2. ρ_b < ρ_o < ρ_i. The output image has a larger size than the resampled image, but it has a smaller size than the original image. This scenario is also called alias-free decimation or subsampling. This is achieved by zero-padding part of the band rejection region of the filter.
3. ρ_b < ρ_o = ρ_i. The output image has a larger size than the resampled image, but it has the same size as the original image. This scenario is called alias-free resolution reduction. This is achieved by zero-padding the band rejection region of the filter.
4. ρ_b ≤ ρ_i < ρ_o. The output image has a larger size than the original image. This scenario is called alias-free resolution reduction and interpolation. This is achieved by zero-padding the band rejection region and beyond the input band region.

Scenarios 1–3 belong to the subsampling procedure, which is called decimation. Scenario 4 is an upsampling procedure, which is called interpolation. The interpolation implemented by the frequency domain filter is equivalent to sinc function interpolation.

Figure 5.4 Resampling output requirements. (a) Alias-free decimation/subsample. The output image has the same size as the resampled image but a smaller size than the original image. (b) Alias-free decimation/subsample. The output image has a larger size than the resampled image but a smaller size than the original image. (c) Alias-free resolution reduction. The output image has a larger size than the resampled image but the same size as the original image. (d) Alias-free resolution reduction and interpolation. The output image has a larger size than the original image.

5.5.2.2 Computational Efficiency
Another desirable feature of alias-free image resampling is computationally efficient processing of large images. Medical professionals utilize relatively large medical images in practice, such as 4K × 5K digitized mammograms. Large-image processing carries a heavy computational burden, so instead of calculating on a large image, the input image is first partitioned into smaller images. If the number of partitions is not an integer (i.e., the partitions at the boundaries do not possess sufficient pixels), then mirror pixel image values can be added to those partitions at the boundaries. Each image partition is processed via the alias-free image resampling algorithm using Fourier-based windowing. Finally, all resampled partitions are added to form the entire output image. A multiprocessor computer may be utilized to simultaneously process multiple image partitions; this may reduce the computational cost.

5.5.3 Resampling System Design
An overview of the system design for alias-free image resampling is illustrated in Figure 5.5(a). In the first step, a portion of the mirror image is added to the input image at the boundary to generate a partly mirrored image. Since Fourier-based filtering assumes that the signal is periodic, the user should provide padding at the boundaries of the input image to prevent circular convolution leakage effects, which can be observed at the resampled image's boundary in the spatial domain. In this approach, an image that contains a portion of the mirror image at the boundary of the input image in the spatial domain is generated before the Fourier transform; this is analogous to the cosine transform in image processing. After the Fourier transform, the image is passed through an alias-free resampling windowing. Zero-padding is performed before the inverse Fourier transform is applied to obtain the reconstructed image. The zero-padding procedure is applied according to the four scenarios listed in Section 5.5.2 and enables the output image size to be selected by the output requirement. In the last step, the added boundary is discarded to generate the output image. The output image is a reduced-resolution image with a different size relative to the original image and with minimal artifacts compared to conventional filters.

Figure 5.5(b) shows another implementation of the image resampling system, where the input image is first partitioned into many smaller images. Each image partition is processed via the alias-free image resampling algorithm. At the end, all resampled image partitions are combined to generate the output image. Here, the first step of the alias-free image resampling algorithm is called spatial boundary padding. For a partition located in the middle of the image, a portion of the adjacent image partition is added to prevent the circular convolution leakage effects. For a partition located at the boundary of the original image, a portion of the mirror image is formed before the Fourier transform is taken, making the output image appear more natural.

Figure 5.5 (a) Overview of alias-free resampling; and (b) second implementation of the resampling system.
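The pipeline of Figure 5.5(a) can be sketched in 1-D as follows: mirror padding, Fourier transform, smooth lowpass window, spectrum truncation (decimation scenario 1), inverse transform, and removal of the padded boundary. The pad length and power-window parameters below are illustrative assumptions, not values from the text.

    import numpy as np

    def alias_free_decimate(signal, B=2, pad=16):
        x = np.concatenate([signal[pad - 1::-1], signal, signal[:-pad - 1:-1]])   # mirror padding
        X = np.fft.rfft(x)
        cutoff = (len(X) - 1) / B                                # reduced band edge (FFT bins)
        alpha = -np.log(10 ** (-3 / 20)) / (0.7 * cutoff) ** 6   # power window, -3 dB at 0.7*cutoff
        W = np.exp(-alpha * np.arange(len(X)) ** 6)
        Xs = (X * W)[: int(cutoff) + 1]                          # keep only the reduced band
        y = np.fft.irfft(Xs, n=len(x) // B) / B                  # inverse FFT at the smaller size
        return y[pad // B: pad // B + len(signal) // B]          # discard the resampled padding

    n = np.arange(256)
    sig = np.cos(2 * np.pi * 0.05 * n) + 0.3 * np.cos(2 * np.pi * 0.4 * n)
    print(alias_free_decimate(sig).shape)                        # (128,): half length, alias-reduced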
Resampling Filters
As reviewed in Section 4.4, the commonly used Fourier-based filters are as follows: • • • •
• •
2-D separable filters; 2-D radially symmetric filters; 2-D radially symmetric filters with Gaussian windows; 2-D radially symmetric filters with Hamming windows applied at a transition point; 2-D radially symmetric filters with Butterworth windows at a transition point; 2-D radially symmetric filters with power windows.
The spatial frequency filter is denoted by H(fx, fy). The spectrum of the resampled image is then obtained by the equation: Gs (f x , f y ) = G(f x , f y )H(f x , f y )
(5.17)
The filter H(fx, fy) is defined as a 2-D radially symmetrical filter with a smooth window, as defined in (4.72); that is, H(f x , f y ) = H r (f x , f y ) W ( ρ)
(5.18)
5.5 Antialias Image Resampling Using Fourier-Based Methods
5.5.5 Resampling Filters Performance Analysis

The performance of the previously mentioned filters is examined with two typical test patterns: a resampling 2-D delta test pattern and a resampling 2-D chirp test pattern.

5.5.5.1 Resampling 2-D Delta Test Pattern

Let the image g(x, y) be a Delta function, as shown in Figure 5.6(a), and let the bandwidth reducing factor be B = 2. The resultant resampled image using a 2-D separable filter is shown in Figure 5.6(b). The ringing effect can be observed in the form of a distorted original shape of the Delta function. The cross-shaped point spread function (PSF) (impulse response) is also not desirable for images of natural scenes. The subsampled Delta image using a 2-D radially symmetric filter is shown in Figure 5.6(c). The PSF has a more natural-looking shape; however, due to the sharp transition at the edge of the cutoff frequency, the ringing effect is still prominent. Figure 5.6(d) shows the filtered Delta function using a 2-D radially symmetric filter with a power window. Figure 5.7 illustrates the cross sections of the Delta function reconstructed with a 2-D radially symmetric filter without windowing, with a Gaussian window, with a Hamming window at a transition point, and with the power window, using the previous parameters.
Figure 5.6 (a) Delta image; (b) the resultant resampled image at the bandwidth reducing factor B = 2 using a 2-D separable filter; (c) using a 2-D radially symmetric filter; and (d) using a 2-D radially symmetric filter with the power window.
Figure 5.7 Cross section of resampled Delta image (resampled at the bandwidth reducing factor B = 2) by using (a) 2-D radially symmetric filter; (b) 2-D radially symmetric filter with Gaussian window σ =0.7ρs0; (c) 2-D radially symmetric filter with Hamming window at a transition point (when ρ1 = 0.7ρs0); and (d) 2-D radially symmetric filter with a power window (ρ1 = 0.7ρs0).
5.5.5.2 Resampling 2-D Chirp Test Pattern
A chirp signal contains various frequency components that are spatially encoded (that is, spatially visible). These signals are suitable candidates for testing the filtering properties of a window at various resolution scales (levels), as well as the sidelobe effect. A 2-D chirp signal is expressed by

C(r) = cos(βr²/2)   (5.19)

where β is a constant and

r = √(x² + y²)   (5.20)

The instantaneous frequency is obtained by

d/dr (βr²/2) = βr   (5.21)
Then the maximum frequency ρ0 corresponds to the maximum distance r0 in the spatial domain; that is,

ρ0 = βr|r=r0 = βr0   (5.22)
The cutoff frequency ρs0 can be related to the cutoff position rs0 in the spatial domain; that is,

ρs0 = βr|r=rs0 = βrs0   (5.23)
This property of the chirp function is referred to as spatial encoding and is demonstrated in Figure 5.8. The results of processing a 2-D chirp test pattern using the 2-D radially symmetric filter without windowing, with a 2-D FIR filter, and with a power window, using the same parameters as provided in Section 4.4, are illustrated in Figure 5.9. The cross sections of the 2-D images are shown in Figure 5.10. The FIR filter applied here is a standard, five-point FIR digital lowpass filter of the kind used in most commercial digital signal processing (DSP) products [22].
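A chirp test pattern following (5.19) through (5.23) can be generated directly. The helper below is hypothetical, with β chosen so that the maximum instantaneous frequency stays below π rad/pixel.

import numpy as np

def chirp_pattern(n=512, beta=0.008):
    # 2-D chirp test pattern C(r) = cos(beta * r**2 / 2); see (5.19) and (5.20).
    y, x = np.indices((n, n)) - n // 2
    r = np.hypot(x, y)
    return np.cos(0.5 * beta * r ** 2), r

pattern, r = chirp_pattern()
beta, r0 = 0.008, r.max()
print("maximum instantaneous frequency beta * r0 =", round(beta * r0, 2), "rad/pixel")  # see (5.22)
# A lowpass filter with cutoff rho_s0 removes all rings beyond r_s0 = rho_s0 / beta,
# so the surviving spatial extent of the chirp directly displays the filter cutoff; see (5.23).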
Figure 5.8 Illustration of the spatially encoded property of the chirp function. The original chirp has spatial support r0 and bandwidth ρ0 = βr0; after filtering, the visible chirp support is rs0, corresponding to the resampled chirp bandwidth ρs0 = βrs0.
Figure 5.9 (a) 2-D chirp image; (b) 2-D chirp resampled at the bandwidth reducing factor B = 2 by using 2-D radially symmetric filter; (c) using 2-D FIR filter; and (d) using 2-D radially symmetric filter with a power window (ρ1 = 0.7ρs0).
After the image is convolved with the FIR filter, the output resampled image is obtained by directly resampling the filtered image using the band reduction factor; in this example, the band reduction factor is B = 2. The resampled image obtained using the 2-D radially symmetric filter [Figures 5.9(b) and 5.10(b)] shows severe distortion due to the sidelobe effect. The result using the FIR filter [Figures 5.9(c) and 5.10(c)] shows severe aliasing because the high-frequency components leak back into the passband. Figures 5.9(d) and 5.10(d) show the result of using the 2-D radially symmetric filter with a power window: the sidelobe effects are significantly reduced, while the high-frequency components are preserved.
5.5.5.3 Ripple Property
In the frequency-domain filter design with the power window, an analytical expression is used to specify the filter at a set of spatial frequency sample points. However, this design does not place any restrictions on the value of the filter at the points between the sampled points. Therefore, it is worthwhile to examine the points between the sampled points.
Figure 5.10 Cross section of (a) 2-D chirp image; (b) 2-D chirp subsampled at the bandwidth reducing factor B = 2 by using 2-D radially symmetric filter; (c) using 2-D FIR filter; and (d) using 2-D radially symmetric filter with a power window (ρ1 = 0.7ρs0).
These points provide information on the ripple properties of the power window filter. Since an FIR filter has finite support in the spatial domain, its distribution in the spatial frequency domain contains ripples (in both the passband and the stopband), and its spectral support is not finite. In general, an FIR filter is designed based on user-prescribed criteria, such as the ripple amplitude in the passband and stopband and the length of the transition band. One method of examining the ripple property at the points between the sampled points is as follows. To obtain the values of the points between the sampled points, the sample spacing needs to be reduced. In (4.66), the relationship between the spatial domain sample spacing and the spatial frequency domain sample spacing is given by

N Δx Δfx = 1   (5.24)
where N is the number of samples, Δx is the sample spacing in the spatial domain, and Δfx is the sample spacing in the spatial frequency domain. To obtain the value
of the window function at points in between the sampled points in the frequency domain, the spatial frequency sample spacing Δfx should be reduced. The reduced sample spacing can be obtained by increasing the number of samples, N. This can be achieved by zero-padding the data in the spatial domain. In this process, the number of samples is increased from N to, say, N2, where N2 = mN. In this case, the sample spacing is reduced from Δfx to Δfx2, where Δfx2 = Δfx/m. When the Fourier transform is applied to the zero-padded signal in the spatial domain, the transformed signal in the frequency domain contains N2 points with a sample spacing of Δfx2. As a result, one can observe the ripples between the original sampled points. Figure 5.11 shows the ripple property of the power window. In Figure 5.11(a), the power window is designed with the cutoff frequency at ρ2 = 0.7ρs0 and the 3-dB point at ρ1 = 0.7ρ2. The solid line shows the power window when the number of samples is N = 512. The dashed line shows the power window after spatial domain zero-padding with N2 = 4,096; that is, the original power window is inverse Fourier transformed to the spatial domain, zero-padded to N2 = 4,096, and then Fourier transformed back to the frequency domain.
Figure 5.11 Ripple property of the power window. (a) The power window with the number of samples N = 512 (solid line) and one after spatial domain zero-padding with N2 = 4,096 (dashed line). (b) The power window in the passband. Circles represent the original sampled points. Crosses represent the points in between the original sampled points. (c) The power window from the transition band to the stopband. The largest ripple appears at the cutoff frequency point, and its amplitude is less than 0.02 in magnitude.
Figures 5.11(b) and 5.11(c), respectively, show the power window in the passband region and in the region from the transition band to the stopband. The circles represent the original sampled points; the crosses represent the points in between the original sampled points. Because the number of samples is increased from N = 512 to N2 = 4,096 (i.e., N2 = 8N), the sample spacing is reduced by a factor of eight. Figure 5.11(b) shows that there are almost no ripples in the passband of the power window. In Figure 5.11(c), the biggest ripple appears at the cutoff frequency point, and its amplitude is less than 0.02 in magnitude.
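The zero-padding procedure used to reveal the ripples between the frequency samples, based on the relationship N Δx Δfx = 1 in (5.24), can be sketched as follows. A raised-cosine window is used here only as a stand-in for the power window, and the variable names are illustrative.

import numpy as np

N, m = 512, 8                                   # original and densified sample counts (N2 = m*N)
f = np.fft.fftfreq(N)
c1 = 0.35                                       # assumed cutoff (normalized frequency)
c0 = 0.7 * c1                                   # assumed transition point
W = 0.5 * (1 + np.cos(np.pi * np.clip((np.abs(f) - c0) / (c1 - c0), 0, 1)))   # stand-in window

w = np.fft.ifft(W)                              # inverse transform to the spatial domain
w_pad = np.zeros(m * N, dtype=complex)          # zero-pad N -> N2 = m*N samples,
w_pad[:N // 2] = w[:N // 2]                     # keeping the (wrapped) response centered
w_pad[-(N // 2):] = w[-(N // 2):]
W_dense = np.fft.fft(w_pad)                     # frequency samples are now m times closer; see (5.24)

stopband = np.abs(np.fft.fftfreq(m * N)) > c1
print("largest stopband ripple:", float(np.abs(W_dense[stopband]).max()))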
5.6 Image Resampling Performance Measurements
An example of the performance implications of image resampling is presented as follows. For this example, the system described in Chapter 3 (see Table 3.3) is modified by assuming that the display distance is 55.41 cm. Display distance directly affects the magnification of the system. Optimal magnification can be estimated using the rule of thumb given by Burks et al. [23]:

M_optimal ≈ 1.1 / θ_object
where M_optimal is the optimal system magnification and θ_object is the detector field of view in milliradians. The optimal magnification for this system is 2.75. The magnification at the display distance of 55.41 cm is half this, or 1.375. This situation can arise in practice when the display distance to the observer is constrained by other factors, such as the cockpit layout in an aircraft. By displaying twice the number of pixels, optimal magnification for this system can be achieved. This will result in an increase in performance if the method used does not introduce artifacts that degrade performance. Increasing the number of pixels can be achieved through image resampling (i.e., interpolation). Two methods of interpolation are examined here: bilinear interpolation and power window interpolation. The process of modeling the performance of this system is identical to the example in Chapter 3, with the aforementioned change in display distance. To model the effects of interpolation, the magnification must be doubled from the baseline case and a transfer function representing the frequency response of the interpolation filter added as an additional postsample filter. Figure 5.12 shows the transfer functions for bilinear interpolation and power window interpolation. The power window is assumed to be designed so that the magnitude of the transfer function drops by -3 dB at 70 percent of the sample frequency and -40 dB at the sample frequency. The sampling frequency for this system is 2 cycles per milliradian in object space. Performance as measured with and without interpolation is shown in Figure 5.13. In addition, a curve obtained by assuming double the number of real samples is shown. This curve represents magnification only, with no performance degradation due to interpolation.
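For concreteness, the magnification numbers quoted above follow directly from the rule of thumb; the 0.4-mrad detector field of view is inferred from the stated optimal magnification of 2.75 rather than given explicitly.

theta_object = 0.4               # detector field of view in mrad (inferred from M_optimal = 2.75)
m_optimal = 1.1 / theta_object   # rule of thumb of Burks et al. [23]
m_display = m_optimal / 2        # magnification at the 55.41-cm display distance
print(m_optimal, m_display)      # 2.75 1.375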
Figure 5.12 Transfer functions (modulation contrast versus spatial frequency in cycles/milliradian) for bilinear interpolation (BLI) and power window interpolation.
Figure 5.13 Range performance of the system (probability of identification versus range in kilometers) for the baseline case, bilinear interpolation (BLI), power window interpolation, and the ideal (magnification only) case.
Techniques for achieving this type of performance are discussed under super-resolution image reconstruction in Chapter 6. As in Chapter 3, the assumed task is the identification of commercial and paramilitary vehicles in the longwave infrared (LWIR) band. Both forms of interpolation provide an increase in performance due to the increase in magnification. However, power window interpolation extends the range performance of this sensor system further than bilinear interpolation does. Because of the interpolation, both transfer functions degrade performance relative to the ideal case (i.e., magnification only); however, bilinear interpolation degrades performance more than power window interpolation.
5.7 Summary
In this chapter, we examined the application of signal processing to image resampling. Image resampling using spatial domain methods and spatial frequency domain (Fourier-based) methods was reviewed, and a computationally efficient, alias-free image resampling method using a Fourier-based power window was introduced. The performance of the proposed resampling filter shows that the sidelobe effects are significantly reduced, while the high-frequency components are preserved in the resampled image, when compared with 2-D conventional filters. The ripple analysis of the power window filter shows that almost no ripples appear in the passband; the biggest ripple, which appears in the stopband, is less than 0.02 in magnitude and is almost invisible in the subsampled image.
A conventional FIR filter is developed so that the window is generated in the frequency domain and then inverse transformed into the spatial domain; the actual filtering is implemented by convolving the window with the signal in the spatial domain. This kind of FIR filtering was considered preferable to the Fourier-based methods because of its simplicity and ease of implementation in older computing environments. With today's digital processing architectures and filter designs, the Fourier-based methods not only provide more accurate solutions than the FIR filtering techniques, but they are also less computationally intensive.
Image resampling performance measurements were also discussed. The performance measurement is illustrated using the optimal magnification for the display system, based on the image quality principles described in Chapter 3. The range performance of an imaging system is presented with bilinear interpolation and with power window interpolation. Techniques to achieve the ideal case, in which there is no performance degradation due to interpolation, are discussed in the context of super-resolution image reconstruction in Chapter 6.
References
[1] Jain, A. K., Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[2] Vollmerhausen, R., and R. Driggers, Analysis of Sampled Imaging Systems, Bellingham, WA: SPIE Press, 2001.
[3] Foucher, S., G. B. Benie, and J. M. Boucher, "Multiscale MAP Filtering of SAR Images," IEEE Transactions on Image Processing, Vol. 10, No. 1, 2001, pp. 49-60.
[4] Geiman, B. J., et al., "A Novel Interpolation Strategy for Estimating Subsample Speckle Motion," Physics in Medicine and Biology, Vol. 45, No. 6, 2000, pp. 1541-1552.
[5] Guarnieri, A. M., C. Prati, and F. Rocca, "SAR Interferometric Quick-Look," Proc. IEEE International Geoscience and Remote Sensing Symposium, IGARSS '93, Vol. 3, Tokyo, Japan, August 18-21, 1993, pp. 988-990.
[6] Leptoukh, R., et al., "GES DAAC Tools for Accessing and Visualizing MODIS Data," Proc. IEEE International Geoscience and Remote Sensing Symposium, IGARSS '02, Vol. 6, Toronto, Ontario, Canada, June 24-28, 2002, pp. 3202-3204.
Image Resampling [7] Schmitz, B. E., and R. L. Stevenson, “The Enhancement of Images Containing Subsampled Chrominance Information,” IEEE Transactions on Image Processing, Vol. 6, No. 3, 1997, pp. 1052–1056. [8] Hamming, R. W., Digital Filters, Englewood Cliffs, NJ: Prentice-Hall, 1989. [9] Carlson, A. B., Communication Systems, New York: McGraw-Hill, 1986. [10] Roetling, P. G., “Halftone Method with Edge Enhancement and Moire Suppression,” J. Opt. Soc. Am., Vol. 66, 1976, pp. 985–989. [11] Whittaker, E. T., “On the Functions Which Are Represented by the Expansions of the Interpolation Theory,” Proc. Roy. Soc. Edinburgh, Sect. A, Vol. 35, 1915, p. 181. [12] Shannon, C. E., “Communication in the Presence of Noise,” Proc. Institute of Radio Engineers (IRE), Vol. 37, 1949, pp. 10. [13] Goodman, J., Introduction to Fourier Optics, New York: McGraw-Hill, 1968. [14] Lehmann, T. M., C. Gonner, and K. Spitzer, “Survey: Interpolation Methods in Medical Image Processing,” IEEE Transactions on Medical Imaging, Vol. 18, No. 11, 1999, pp. 1049–1075. [15] Maeland, E., “On the Comparison of Interpolation Methods,” IEEE Transactions on Medical Imaging, Vol. 7, No. 3, 1988, pp. 213–217. [16] Vaidyanathan, P. P., “Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial,” Proc. IEEE, Vol. 78, No. 1, 1990, pp. 56–95. [17] Pratt, W. K., Digital Image Processing, New York: Wiley, 1978. [18] Seidner, D., “Polyphase Antialiasing in Resampling of Images,” IEEE Transactions on Image Processing, Vol. 14, No. 11, 2005, pp. 1876–1889. [19] Hou, H. S., and H. C. Andrews, “Cubic Splines for Image Interpolation and Digital Filtering,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 6, 1978, pp. 508–517. [20] Keys, R. G., “Cubic Convolution Interpolation for Digital Image Processing,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-29, No. 6, 1981, pp. 1153–1160. [21] Press, W. H., et al., Numerical Recipes in C, Cambridge, U.K.: Cambridge University Press, 1992. [22] Signal Processing Toolbox User’s Guide, Natick, MA: The MathWorks, Inc., 1996. [23] Burks, S. D., et al., “Electronic Zoom Functionality in Under-Sampled Imaging Systems,” Proc. of SPIE, Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XVIII, Orlando, FL, April 11–13, 2007, pp. 65430M-1–12.
CHAPTER 6
Super-Resolution 6.1
Introduction One common factor for image degradation in an image formation and recording process is undersampled sensors. The use of these sensors results in aliased imagery in which the high-frequency components (subtle/detailed information) are lost. Super-resolution image reconstruction algorithms can recover the lost highfrequency detail information from the undersampled images. First, two concepts of super-resolution are introduced in this chapter. Then, several methods of each step in super-resolution image reconstruction algorithm are considered. Particularly, a frequency domain method for high-resolution output image reconstruction, the error-energy reduction method, is introduced. Super-resolution image performance measurements and performance modeling methods are also discussed in this chapter. 6.1.1
The Meaning of Super-Resolution
Regarding the super-resolution terminology, there is frequently confusion with respect to the meaning of the word. In fact, some say that the term has been “hijacked” or stolen from its original meaning and is being applied improperly to a newer area of work. The earlier work involved the estimation of spatial information beyond the modulation transfer function (MTF) bandlimit of an imaging system (typically the diffraction limit) [1]. The newer work involves the use of successive, multiple frames from an undersampled imager to collectively construct a higherresolution image [2]. The former area has to do with diffraction blur, and the latter area has to do with sampling. 6.1.2
Super-Resolution for Diffraction and Sampling
The original super-resolution terminology has been applied to the process of estimating spectral information beyond the diffraction limit of the sensor system (the system addressed here includes the atmosphere or transmission media). Professionals who work on these issues include astronomers, academics, and high-performance imager (such as space-based cameras) developers. Techniques for performing the estimation usually require some assumption about the target spectrum and some characterization of sensor system blur or point spread function (PSF). Typically, these techniques are applied to imagery with low noise and large
129
130
Super-Resolution
dynamic range. The goal of the process is to increase the human contrast sensitivity through the imager. The objective of super-resolution for sampling is an increase in the spatial resolution performance of the sensor, including the human observer, without physically altering the sensor (optics or focal plane). Specifically, it is to increase the effective spatial sampling of the sensor’s field of view (FOV) by operating on a sequence of undersampled images that have successive frames containing subpixel shifts. Therefore, each undersampled successive image can be exploited to obtain an overall improvement in sensor resolution. However, it is not beyond the diffraction limit. These professionals are primarily defense imaging specialists, with some component of researchers in the commercial sector. 6.1.3
Proposed Nomenclature by IEEE
The IEEE Signal Processing Magazine published a special issue on super-resolution in May 2003 (Vol. 20, No. 3). The guest editors, Moon Kang and Subhasis Chaudhui [2], provided the IEEE proposal for nomenclature clarification in the area of super-resolution. Super-resolution image restoration is the concept of recovering information beyond the diffraction limit of an imaging system by using its PSF and sometimes a priori target spectrum knowledge. Super-resolution image reconstruction increases the spatial resolution by combining sequences of lower-resolution images to form a higher-resolution image with less blur, noise, and aliasing. Super-resolution restoration is briefly discussed in Section 6.2. Descriptions of common practices in super-resolution reconstruction are the focus of the subsequent sections. Finally, performance analysis of super-resolution reconstruction, including performance measurements, sensors that benefit from super-resolution, and performance modeling and prediction, are presented in Sections 6.4 through 6.6.
6.2
Super-Resolution Image Restoration Super-resolution image restoration involves increasing the resolution of an imaging system beyond the diffraction limit through the estimation of higher spatial frequency content of the object. These techniques have been successfully applied to astronomy applications, such as the Hubble Space Telescope (HST); however, there has limited success with tactical applications associated with extended target and background objects. Super-resolution restoration can be achieved with nonlinear techniques. As an astronomy example, the CLEAN algorithm is one of the techniques in which the algorithm can develop image information at spatial frequencies where the imaging system’s optical transfer function (OTF) is equal to zero [3]. The CLEAN algorithm has worked well in a number of applications where there have been adequate signal-to-noise ratios (SNRs). There are a number of modifications to the CLEAN algorithm to improve its performance to include robust source-finding algorithms to account for noise, among other things. The performance of the CLEAN algorithm and other similar algorithms is ultimately related to the size of the object being imaged.
6.3 Super-Resolution Image Reconstruction
131
A number of other techniques are intended to provide imaging beyond the diffraction limit of an optical system. In microscopy, structured illumination (SI) is one of a number of techniques where different illumination methods are coupled with microscope imaging to obtain a source measurement that extends beyond the diffraction limit. Several techniques are under investigation at the Defense Advanced Research Projects Agency (DARPA), where imaging through turbulent media (the atmosphere) actually improves the resolution of an imaging system beyond the diffraction limit associated with the entrance aperture of a long-range imager, such as a spotter scope. In this case, the optical power associated with turbules in the atmosphere is used to increase the effective aperture of the overall system.
6.3
Super-Resolution Image Reconstruction 6.3.1
Background
Many low-cost visible and thermal sensors spatially or electronically undersample an image. Undersampling results in aliased imagery in which the high-frequency components are folded into the low-frequency components in the image. Consequently, subtle/detailed information (high-frequency components) is corrupted in these images. An image/signal processing method, called super-resolution image reconstruction, can increase image resolution without changing the design of the optics and the detectors, by using a sequence (or a few snapshots) of low-resolution images. The emphasis of the super-resolution image reconstruction algorithm is to de-alias the undersampled images to obtain an alias-free or, as identified in the literature, a super-resolved image and to increase the bandwidth of the imager. Super-resolution image reconstruction is especially important in infrared imaging systems. Typical focal planes in the infrared have detector sizes on the order of 20 to 50 µm and sample spacings slightly larger than the detector. In contrast, visible band CCD cameras are around 3 µm. The larger detector spacings are, in many cases, accompanied by a diffraction spot associated with the infrared optics that is smaller than the detector spacing. In this case, the imaging system is undersampled. Scanned systems, such as the second generation thermal imagers, are sampled at spacings smaller than the detector and the diffraction spot, so they are not undersampled. Super-resolution image reconstruction does not increase the performance of these systems. These systems are currently being replaced with large format staring focal planes that are undersampled. Today’s focal planes in the uncooled thermal imagers and, especially, the midwave infrared InSb and HgCdTe arrays, are significantly undersampled so that super-resolution image reconstruction can provide a substantial benefit. When undersampled images have subpixel shifts between successive frames, they represent different information from the same scene. Therefore, the information that is contained in an undersampled image sequence can be combined to obtain an alias-free (high-resolution) image. Super-resolution image reconstruction from multiple snapshots provides far more detail information than any interpolated image from a single snapshot.
132
Super-Resolution
6.3.2
Overview of the Super-Resolution Reconstruction Algorithm
An overview of the super-resolution image reconstruction algorithm is illustrated in Figure 6.1. There are three major steps in super-resolution image reconstruction methods [4–6], which are as follows: 1. Acquiring a sequence of images from the same scene with subpixel shifts (fraction pixel displacements) among the images. The undersampled low-resolution images are captured either by natural jitter or some kind of controlled motion of the camera. In some cases, the images of successive frames contain not only subpixel shifts but also integer pixel shifts (gross shifts). 2. Motion estimation (integer-pixel or subpixel): • Integer-pixel: The motion estimation algorithm estimates overall gross shift of each frame with respect to a reference frame with an integer pixel accuracy. To compensate the integer-pixel shifts among the input images, the algorithm realigns the input images. • Subpixel: The motion estimation algorithm estimates the subpixel (fraction pixel) shifts for each frame with respect to a reference frame. 3. High-resolution image reconstruction. The reconstruction algorithm is applied to the low-resolution input images with the estimated subpixel shifts among images to obtain the high-resolution (alias-free) output. The output is either a single high-resolution image that is generated from a collective sequence of low-resolution images or a sequence of high-resolution images, such as in a video sequence that is generated from multiple sequences of low-resolution images. There are many methods in each step, which are described in the following sections. 6.3.3
Image Acquisition—Microdither Scanner Versus Natural Jitter
In the first step of super-resolution image reconstruction, the input image sequence needs to have subpixel shifts among them. In general, two methods can acquire low-resolution images with subpixel shifts among them. One method is to have a
Subpixel shifted input low-resolution images
Motion estimation (integer-pixel or subpixel)
High-resolution image reconstruction
Output high-resolution image (single or sequence)
Figure 6.1
Overview of super-resolution image reconstruction.
6.3 Super-Resolution Image Reconstruction
133
controlled pixel displacement [7–9]. In this method, a special sensor or scanner (hardware) is designed to capture multiple images in a known pattern, where each image is captured by displacing the sensor in a known distance that is a multiple of a pixel, plus a known fraction of a pixel. This method is called micro-scanner, or mechanical dither. Another method is to have a noncontrolled pixel displacement, such as natural jitter. This method is more cost effective and practical. For example, in many applications, an imager is carried by a moving platform. In a rescue mission, the camera may be carried by a helicopter, a moving vehicle, or a moving ship. In a military reconnaissance situation, the camera may be carried by a person, an unmanned ground vehicle (UGV), or an unmanned aerial vehicle (UAV). In some environments, such as law enforcement and security, cameras might be stationary. However, when the scene changes between frames (a passing person or vehicle), the subpixel shifts could be generated by the motion of the entire scene or a portion of the scene. 6.3.4
Subpixel Shift Estimation
In the second step of super-resolution image reconstruction, the subpixel shift (or fraction pixel displacements) among the input images is estimated. Subpixel shift estimation is a method to estimate or calculate frame-to-frame motion within subpixel accuracy. In today’s digital computation, the smallest motion that can be obtained from the motion estimation method is one pixel. Obtaining subpixel motion accurately is not straightforward. We will describe the subpixel accuracy solution first in this section before we present the motion estimation methods in next section. 6.3.4.1
Signal Registration
Almost all of the motion estimation methods are based on the signal registration method. It is assumed that the signals to be matched differ only by an unknown translation. In practice, beside translation, there may be rotation, noise, and geometrical distortions. Nevertheless, the signal registration description reveals the fundamentals of the problem. The procedure for pixel-level registration is to first calculate the correlation between two images g1(x, y) and g2(x, y) as r( k, l ) =
N
M
∑ ∑ R[ g (n + k − 1, m + l − 1), g (n, m)] 1
2
n =1 m =1
where 1 ≤ k ≤ N, 1 ≤ l ≤ M and R is a similarity measure, such as a normalized cross-correlation or an absolute difference function. For the (N − 1, M − 1) samples of r(k, l), there is a maximum value of r(xs, ys), where (xs, ys) is the displacement in sampling intervals of g2(x, y) with respect to g1(x, y). Many motion estimation methods based on the variations of the signal registration are described in the next section. However, the actual displacement (xs, ys) is not necessarily an integer number of sampling intervals. We summarize three methods to achieve subpixel registration accuracy from the signal registration as follows:
134
Super-Resolution
• • •
Correlation interpolation; Resampling via intensity domain interpolation; Resampling via frequency domain interpolation.
6.3.4.2
Correlation Interpolation
Correlation interpolation is the most straightforward method to achieve subpixel registration accuracy. In this method, the first step is to calculate the discrete correlation function between two images, then to fit an interpolation surface to samples of this function, and finally to accurately search for the maximum of this surface [10–12]. When the images are sampled at a high enough frequency, the corresponding discrete correlation function is quite smooth and a second-order interpolation function can provide a relative accurate representation. An example of using a quadratic estimator to estimate the maximum of a second-order correlation function was shown in [10] for the analysis of a scan-line jitter that a mechanical forward looking infrared (FLIR) exhibits. The quadratic estimator for a one-dimension signal case can be expressed as x ss =
Pa − Pb 2 × (2Pm − Pb − Pa
)
where Pm is the maximum value of the sampled correlation function, Pb and Pa are the samples to the left and right of Pm, and xss is the estimated location of the peak in terms of the sample interval, referenced to Pm. Nevertheless, the accuracy of this estimator depends on how well a correlation function around the peak approximates a parabola [10]. Others use different criteria to fit the correlation surface around the peak to achieve the subpixel accuracy. Schaum and McHugh [13–15] use Pm and its highest neighbor to fit a sinc function; Abdou [16] fit a second-order polynomial to a local window centered around the peak; Stone et al. [17] find a least-square estimate of the points around the peak. 6.3.4.3
Resampling Via Intensity Domain Interpolation
Resampling is another method to achieve the subpixel level registration. The resampling process uses interpolation to create a much denser grid for selected parts of the input images. Resampling can be used accordingly in the intensity domain or the frequency domain for registration methods. For intensity domain interpolation, given a target image of M × M and a reference image of size N × N, if a registration accuracy of 0.1 pixel is desired, then the reference image should be interpolated to create a new version with dimensions of (10 × N) × (10 × N). Then, a search using the target image over these parts poten2 tially could cover all (10N − M + 1) positions [12]. The intensity domain interpolation methods include linear interpolation, spline interpolation, cubic interpolation, sinc interpolation, and the like. The success of the registration process depends not only on the accuracy of the displacement estimate but also on the fidelity of the resampling method [13]. Experiments on real images indicated in [12] show that
6.3 Super-Resolution Image Reconstruction
135
using a nearest neighbor bilinear interpolation should be excluded from consideration because it causes a maximum 0.5-pixel shift of the interpolated signal, which is not acceptable for subpixel accuracy. Also, the computation time is long. An iterative algorithm that decreases the computational cost is described in [12]. One consideration of the interpolation accuracy for the subpixel registration accuracy in super-resolution is aliasing. The input images are undersampled; therefore, they contain aliasing errors. For interpolation, this means that this method results in errors because the samples are too far apart in comparison with changes in the image signal. Therefore, interpolation to calculate intensities between these samples results in errors. Thus, a presmoothing of the sampled signals is often helpful in improving the accuracy, especially when the image to be sampled is very detailed [12]. 6.3.4.4
Resampling Via Frequency Domain Interpolation
Resampling via frequency domain interpolation can be used to achieve the subpixel registration accuracy when the frequency domain registration method is used. A Fourier windowing method [18] is an example for obtaining the resampling. In this Fourier windowing method (see Chapter 5), the zero-padding with a special window is applied to the Fourier transform of the input image to have an alias-free resampled image. This special window is called the power window, which is a Fourier-based smoothing window that preserves most of the spatial frequency components in the passband and attenuates quickly at the transition band. The power window is differentiable at the transition point, which gives a desired smooth property and limits the ripple effect. Then, the frequency domain correlation method is applied to two resampled Fourier transforms of two input images to estimate the shifts in subpixel accuracy. A computationally efficient frequency domain resampling method is summarized in Section 6.3.5.4. Similarly, as in the intensity domain, both images g1(x, y) and g2(x, y) are aliased. In general, for an aliased image, the image energy at lower-frequency components is stronger than the folded high-frequency components. Thus, the signal-toaliasing power ratio is relatively high around (fx, fy) ≈ 0. Therefore, a presmoothing using a lowpass filter is applied to the Fourier transforms before the correlation operation. To avoid the circular convolution aliasing in the spatial domain, the original images g1(x, y) and g2(x, y) are zero-padded prior to the Fourier transforms. 6.3.5
Motion Estimation
Among motion estimation methods, frame-to-frame motion detection based on gradient-decent methods is commonly used, as in [19–22]. The variations with these methods are to estimate the velocity vector field measurement based on the spatio-temporal image derivative [23]. Most of these methods need to calculate the matrix inversion or use an iterative method to calculate the motion vectors. Bergen et al. [21] describes a method to use warp information to obtain displacement. Stone et al. [17] and Kim and Su [24] present another method that estimates the phase difference between two images to obtain shifts. In this method, the minimum least square solution has to be obtained to find the linear Fourier phase relationship
136
Super-Resolution
between two images. Hendriks and van Vliet [25] compare different methods using the cross-correlation and Taylor series. Young and Driggers [26] utilize a correlation method without solving the minimum least square problem to explore the translational differences (shifts in x and y domains) of Fourier transforms of two images to estimate shifts between two low-resolution aliased images. In this section, we present three representative motion estimation methods: • • •
Gradient-based method; Optical flow method; Correlation method.
For more details on motion estimation, readers are referred to [27, 28]. 6.3.5.1
Gradient-Based Method
Two members of the image sequence are denoted by g1(x, y) and g2(x, y). The second image is a shifted version of the first image, such as g 2 ( x , y) = g 1 ( x − x s , y − y s )
(6.1)
where (xs, ys) are relative shifts between g1(x, y) and g2(x, y). As mentioned in Section 6.3.4.1, one might search for a peak in the cross-correlation N
M
∑ ∑ R[ g (n + k − 1, m + l − 1), g (n, m)] 1
2
n =1 m =1
or equivalently a minimum in the sum-of-squared-differences [20] N
M
∑ ∑ [ g (n, m) − g (n + k − 1, m + l − 1)] 2
2
1
n =1 m =1
In both cases, the summation is taken over an N × M block in the two frames. This is also called the block matching procedure. The gradient-based motion estimation method is based on the principle relationship between intensity derivatives, which is called the gradient constraint. One way that the gradient constraint is often derived is to assume that the time-varying image intensity is well approximated by a first-order Taylor series expansion—for example, g 2 ( x , y) = g 1 ( x , y) + x s g x ( x , y) + y s g y ( x , y)
(6.2)
where gx(x, y) and gy(x, y) are the partial derivatives of image intensity [gx(x, y) = ∂g1(x, y)/∂x and gy(x, y) = ∂g1(x, y)/ y]. Substituting this approximation into (6.1) gives g 2 ( x , y) − g 1 ( x , y) − x s g x ( x , y) − y s g y ( x , y) = 0
6.3 Super-Resolution Image Reconstruction
137
This equation offers only one linear constraint to solve for the two unknown components of the shift (xs, ys). Gradient-based methods solve for the shift by combining information over a spatial region (or block). A particularly simple rule for combining constraints from two nearby spatial positions is g 2 ( x i , yi ) − g1 ( x i , yi ) − x s g x ( x i , yi ) − ys g y ( x i , yi ) = 0
(
(
)
(
)
(
)
)
g 2 x j , y j − g1 x j , y j − x x g x x j , y j − ys g y x j , y j = 0
where the two coordinate pairs (xi, yi) and (xj, yj) correspond to the two spatial positions. Each row of this equation is the gradient constraint for one spatial position. Solving this equation simultaneously for both spatial positions gives the shift that is consistent with both constraints. A better option is to combine constraints from more than just two spatial positions (i.e., to select point pairs from a spatial region, or block). When we do this, we are not able to exactly satisfy all of the gradient constraints simultaneously because there will be more constraints than unknowns. To measure the extent to which the gradient constraints are not satisfied, the square of each constraint and sum is taken as E( x s , y s ) =
N
M
∑ ∑ [g (n, m) − g (n, m) − x 2
1
s
n =1 m =1
]
g x (n, m) − y s g y (n, m)
2
where N and M are number of points in x and y directions, respectively, within the block. The solution of (xs, ys) that minimizes E(xs, ys) is the least square estimate of the shift. The solution is derived by taking derivatives of this equation with respect to xs and ys, and setting them equal to zero: ∂ E( x x , y s ) ∂ xs ∂ E( x x , y s ) ∂ ys
=
2 x s g x (n, m) + y s g x (n, m)g y (n, m) =0 ∑ ∑ n =1 m =1 + g 2 (n, m) − g 1 (n, m) g x (n, m)
=
2 x s g x (n, m)g y (n, m) + y s g y (n, m) =0 ∑ ∑ n =1 m =1 + g 2 (n, m) − g 1 (n, m) g y (n, m)
N
M
[
N
]
M
[
]
These equations may be rewritten as a single equation in matrix notation: M⋅S = V
where S = [x s y s ] N M ∑ ∑ g x2 (n, m) n =1 m =1 M= N M ∑ g x (n, m)g y (n, m) ∑ n =1 m =1
T
∑ ∑ g (n, m)g (n, m) N
M
x
n =1 m =1 N
M
y
∑ ∑ g y2 (n, m) n =1 m =1
138
Super-Resolution
N M ∑ ∑ g 2 (n, m) − g 1 (n, m) g x (n, m) V = nN=1 mM=1 g n m − g n m g n m , , , ( ) ( ) ( ) y ∑ 2 1 ∑ n =1 m =1
[
]
[
]
The least squares solution is then given by S = M −1 V
presuming that M is invertible. Sometimes, M is not square, the pseudo-inverse technique is used to solve the equation, that is,
(
S = MT M
)
−1
MT V
When the matrix M is singular (ill-conditioned), there are not enough constraints to solve for the unknown shift (xs, ys). This situation corresponds to what has been called the aperture problem [20]. The gradient-based motion estimation method is also called the differential method [29] and works well only when the displacements of the objects are small with respect to their sizes. To incorporate the case involving larger shift values, the iterative technique [30] is developed and applied in Alam et al. [31]. At first, the initial registration parameters are estimated. The frame g2(x, y) is shifted by these estimated shifts so as to closely match g1(x, y). The resulting image is then registered to g1(x, y). Using this procedure, g2(x, y) is continuously modified until the registration estimates become sufficiently small. The final registration estimate is obtained by summing all of these partial estimates. 6.3.5.2
Optical Flow Method
Image motion between two images can also be obtained via the estimation of the optical flow vectors. The term optical flow was first used by Gibson et al. [32] in their investigation of motion parallax as a determinant of perceived depth. Optical flow is the apparent motion of the brightness pattern [33]. This allows us to estimate relative motion by means of the changing image. Let I(x, y, t) denote the image intensity at a point (x, y) in the image plane at time t. Then, two members of the image sequence g1(x, y) and g2(x, y) will be obtained as g1(x, y) = I(x, y, t1) and g2(x, y) = I(x, y, t2), respectively, where t1 and t2 are two time points when two images are acquired. The optical flow vector is defined as
[
]
O( x , y) = u( x , y), v( x , y)
T
for each pixel (x, y) in an image. Two components of the optical flow vector u(x, y) and v(x, y) at the point (x, y) (abbreviated as u and v, respectively) are defined as the velocity vector of that point in the following u=
dy dx ,v = dt dt
6.3 Super-Resolution Image Reconstruction
139
The computation of the optical flow vector similarly follows two constraints [19]: 1. The optical flow constraint equation; 2. The smoothness constraint. The optical flow constraint equation assumes that the image intensity of a particular point is constant when the image pattern moves, so that dI =0 dt
Using the chain rule for differentiation, it can be expressed as ∂ I dx ∂ I dy ∂ I + + =0 ∂ x dt ∂ y dt ∂ t
If we denote the partial derivatives of image intensity with respect to x, y, and t as Ix, Iy, and It, respectively, the optical flow equation has a single linear equation in the two unknowns u and v, I x u + I y v + It = 0
(6.3)
This equation is similar to the gradient constraint in (6.2), where the partial derivative of image intensity with respect to t is equal to the difference between two frames. Similarly, the optical flow equation cannot determine the velocity vector with two components, u and v. An additional constraint, called the smoothness constraint, is introduced. The smoothness constraint assumes that the motion field varies smoothly in most of the image. The pixel-to-pixel variation of the velocity vectors can be quantified by the sum of magnitude squares of the gradient of the velocity vectors. We attempt to minimize the following measure from smoothness: ∂u e s2 = ∂ x
2
∂ u + ∂ y
2
∂v + ∂ x
2
∂ v + ∂ y
2
The error of the optical flow equation, e c2 = (I x u + I y v + It )
2
should also be minimized. Then, Horn and Schunk [19] used a weighted sum of the error in the optical flow equation and a measure of the pixel-to-pixel variation of the velocity vector min e e = u,v
∫ ∫ (e B
2 c
+ α 2 e s2 dxdy
)
140
Super-Resolution T
to estimate the velocity vector [u v] at each point (x, y), where B denotes the continuous image support. The parameter α, usually selected heuristically, controls the strength of the smoothness constraint. Differentiating e with respect to u and v and setting them to zero, the velocity vector can be solved by the following two equations:
(1 + α α Iy 2
) I u + (1 + α 2
I x2 u + α 2 I x I y v = u − α 2 I x It
x
2
)
I y2 v = v − α 2 I y It
where u and v are local averages of u and v, respectively. The corresponding matrix in these equations is sparse and very large since the number of the rows and columns equals twice the number of pixels in the image. An iterative method from Gauss-Seidel method is obtained,
[ [I u
] (α + I ] (α
u n + 1 = u n − I x I x u n + I y v n + It
2
+ I x2 + I y2
v n+1 = v n − I y
2
+ I x2 + I y2
x
n
+ Iy v n
t
0
0
) )
The initial estimates of the velocities u and v are usually taken as zero. In the computer implementation, all spatial and temporal image partial derivatives are estimated numerically from the observed image samples. Horn and Schunk [19] proposed a simple averaging four finite differences to obtain the first three partial derivatives as follows [see Figure 6.2(a)]: Ix ≈
1 I(i, j + 1, k) − I(i, j, k) 4 + I(i + 1, j + 1, k) − I(i + 1, j, k)
[
+ I(i, j + 1, k + 1) − I(i, j, k + 1)
]
+ I(i + 1, j + 1, k + 1) − I(i + 1, j, k + 1) Iy ≈
1 I(i + 1, j, k) − I(i, j, k) 4 + I(i + 1, j + 1, k) − I(i, j + 1, k)
[
+ I(i + 1, j, k + 1) − I(i, j, k + 1)
]
+ I(i + 1, j + 1, k + 1) − I(i, j + 1, k + 1) It ≈
1 I(i, j, k + 1) − I(i, j, k) 4 + I(i + 1, j, k + 1) − I(i + 1, j, k)
[
+ I(i, j + 1, k + 1) − I(i, j + 1, k)
]
+ I(i + 1, j + 1, k + 1) − I(i + 1, j + 1, k)
6.3 Super-Resolution Image Reconstruction
141
i+1 i k+1 j+1
j
k
(a)
100
80
60
40
20 20
40
60
80
100
(b)
Figure 6.2 (a) The three partial derivatives of image brightness at the center of the cube are each estimated from the average of first differences along four parallel edges of the cube. Here, the column index j corresponds to the x direction in the image, the row index i to the y direction, while k lies in the time direction. (From: [19]. © 1981 North-Holland. Reprinted with permission.) (b) Two frames aerial imagery, separated by 1/25th second, from an undersampled sensor (top). The difference image between these two images is shown in the bottom left. The optical flow of this image sequence is shown in the bottom right. Note the spatially varying optical flow, with the greatest flow occurring at points closest to the sensor. (From: [5]. © 2000 The Society of Photo-Optical Instrumentation Engineers. Reprinted with permission.)
142
Super-Resolution
This means that the optical flow velocities were estimated at points lying between pixels and between successive frames. An example of optical flow estimation is shown in Figure 6.2(b) [5]. Variations of optical flow computation are proposed in [34, 35]. In these alternative approaches, the image intensity function is approximated locally by a linear combination of basis functions. The basis functions include some low-order polynomials, linear interpolation, separable splines, windowed sinc, or 2-D wavelets. 6.3.5.3
Correlation Method
The correlation method is another alternative for motion estimation between two images based on: (1) correlation theorem and (2) shift property of Fourier transform. This method begins with the cross-correlation between two images. We use the same notation as in Section 6.3.4.1. For two images, g1(x, y) and g2(x, y), the 2-D normalized cross-correlation function measures the similarity for each translation: N
C( k, l ) =
M
∑ ∑ g (n, m)g (n + k − 1, m + l − 1) 1
2
n =1 m =1
N
M
∑ ∑ g (n + k − 1, m + l − 1) 2 2
n =1 m =1
where 1 ≤ k ≤ N, 1 ≤ l ≤ M and (N, M) is the dimension of the image. If g1(x, y) matches g2(x, y) exactly, except for an intensity scale factor, at a translation of (xs, ys), the cross-correlation will have its peak at C(xs, ys). The detailed implementation and discussion can be found in [27, 36]. Correlation Theorem
A more efficient correlation method for motion estimation is in frequency domain by using the correlation theorem. The correlation theorem states that the Fourier transform of the correlation of two images is the product of the Fourier transform of one image and the complex conjugate of the Fourier transform of the other; that is,
[
]
Q12 (f x , f y ) = ᑤ C( x , y) = G 2 (f x , f y ) ⋅ G1* (f x , f y )
where G1(fx, fy) and G2(fx, fy) are Fourier transforms of two images, G1*(fx, fy) is the conjugate of G1(fx, fy). Since the Fourier transform can be computed efficiently with existing well-tested programs of FFT (and occasionally in hardware using specialized optics), the use of the FFT becomes most beneficial for cases where the images to be tested are large [27]. Shift Property of Fourier Transform
The shifts between two images are estimated using a correlation method to explore the translational differences (shifts in x and y domains) of Fourier transforms of two images. We denote two members of the image sequence by g1(x, y) and g2(x, y). The second image is a shifted version of the first image, such as,
6.3 Super-Resolution Image Reconstruction
143
g 2 ( x , y) = g 1 ( x − x s , y − y s )
where (xs, ys) are the relative shifts (translational differences) between g1(x, y) and g2(x, y). According to the correlation theorem and shift property of Fourier transform, Fourier transform of the cross-correlation, Q12(fx, fy), becomes: Q12 (f x , f y ) = G2 (f x , f y ) ⋅ G1* (f x , f y )
[
]
= G1 (f x , f y ) ⋅ G1* (f x , f y ) ⋅ exp − j(f x x s + f y y s )
[
]
= G1 (f x , f y ) ⋅ exp − j(f x x s + f y y s )
Then the shift (xs, ys) is found as the peak of the inverse Fourier transform of the cross-correlation, Q12(fx, fy). 6.3.5.4
Correlation Method Within Subpixel Accuracy
The operation of subpixel shift estimation is identical to the integer pixel shift estimation of the previous section. However, in order to achieve subpixel accuracy, the images are upsampled (resampled) first. The upsampling is obtained by a Fourier windowing method [18]. In this Fourier windowing method, the zero-padding with a window is applied to the Fourier transform of the input image to have an alias-free, upsampled image. Then, the correlation method is applied to two upsampled Fourier transforms of two input images to estimate the shifts in subpixel accuracy. A computationally efficient implementation is illustrated in Figure 6.3. Since the correlation of two upsampled images is equivalent to upsampling the correlated two original images, the upsampling operation is applied after the correlation operation in Figure 6.3. Also, because both images g1(x, y) and g2(x, y) are aliased, a lowpass filtering needs to be applied to the Fourier transforms before the correlation operation. Similarly, the correlation of two lowpass filtered images is equiva* lent to lowpass filtering the correlated two original images. (Note that W = W since W is a real signal.) The lowpass windowing is applied after the upsampling operation. Finally, the peak of inverse Fourier transform of the previous resultant image is found as the shift in subpixel accuracy. An example of subpixel shift estimation from 16 frames is shown in Figure 6.4 in which the subpixel shift of each frame with respect to a reference frame is illustrated. From this figure, we can see that the subpixel shifts among frames are randomly distributed in both x and y domains. The subpixel motion among frames is not controlled in a fixed pattern in this example.
6.3.6
High-Resolution Output Image Reconstruction
The third step in super-resolution image reconstruction is to utilize the ensemble information of successive input images, including the estimated subpixel shifts
144
Super-Resolution Correlation
g1(x,y)
G1 (fx,fy)
2D FFT
Q12(fx,fy)
Upsampling
Q12s(fx,fy)
G2* (fx,fy) g2(x,y)
G2 (fx,fy)
2D FFT
Lowpass windowing (WW*)
∼ Q12s(fx,fy)
2D IFFT
∼ q12s(fx,fy)
Find peak (xss,yss )
Subpixel shift estimation algorithm for two-dimensional images.
Vertical shift (# of processing array pixels)
Figure 6.3
Conjugate
30
·7
25
·14 ·8
20
·15 ·13
·10 ·4
·1 ·11 ·9
·16 ·12
·5 15 ··6 10
5
·3 00
5
10
15
20
·2
25
30
Horizontal shift (# of processing array pixels)
Figure 6.4 An example of subpixel shift estimation from 16 frames. The numbers on the figure represent the names of frames. The subpixel shifts of each frame with respect to a reference frame are illustrated at their corresponding subpixel locations.
among them to reconstruct the high-resolution output image. Several important issues in reconstructing the high-resolution image will be discussed in this section: •
•
•
The number of undersampled images that are required to recover the alias-free image; The factors that limit the resolution recovery of the high-resolution output image; The various methods that are used to reconstruct the high-resolution image.
6.3 Super-Resolution Image Reconstruction
6.3.6.1
145
Number of Input Images Required
One of the issues in super-resolution image reconstruction is the number of the undersampled images that are required to recover the alias-free image. The question is whether there exists a minimum or a maximum number of required input images. Here, it is appropriate to consider an example. To double the resolution, a minimum of four images are required with relative subpixel shifts of [0, 0], [0, 0.5], [0.5, 0], and [0.5, 0.5]. Generally, the low-resolution input images are offset by some number of integer pixels, plus a random subpixel remainder. When the subpixel shifts among images are random, it is likely that more than four images will be required to double the resolution in some reconstruction methods. However, an efficient reconstruction method would need only four input images to double the resolution (see following sections). Now, the question is posed from an information-theory point of view. Let the sampling space of the original undersampled images be ∆x, the sampling space of the output high-resolution image that is reconstructed from k frames (with distinct subpixel shifts) is ∆x/k in one-dimensional imaging systems, and ( ∆ x / k , ∆ y / k) in two-dimensional imaging systems. That means that it requires r input images for a resolution improvement factor of r for two-dimensional imaging systems. That is, doubling the resolution needs 4 images and quadrupling the resolution needs 16 images. Theoretically, more frames should provide more information and, thus, larger bandwidth and/or higher resolution. However, the super-resolution image quality does not improve when the number of processed undersampled images exceeds a certain value. This is due to the fact that the bandwidth recovery is limited by either bandwidth of the sensor or the target signature spectrum (see Section 6.3.6.2). Resolution is ultimately limited by the sensor bandwidth, and sensitivity is ultimately limited by noise or the human eye. 6.3.6.2
6.3.6.2 Factors Limiting the Resolution Recovery
In order to understand the factors limiting the resolution recovery in super-resolution image reconstruction, consider the image acquisition model first. Figure 6.5 shows the system model for acquiring an undersampled image in a one-dimensional case: the ideal target signature g(x) passes through the sensor PSF h(x) to give the measured target signature r(x), which is then sampled to give the undersampled image rδ(x). Let g(x) be the ideal target signature that is interrogated by the sensor. The measured target signature r(x) is modeled as the output of a linear shift-invariant (LSI) system whose impulse response is the sensor's point spread function (PSF), h(x); this is also known as the sensor blurring function. The relationship between the measured target signature and the original target signature in the spatial frequency fx domain is

R(fx) = G(fx) H(fx)
where R(fx), G(fx), and H(fx) are the Fourier transforms of r(x), g(x), and h(x), respectively. Figure 6.6 shows the factors that dictate the bandwidth of the measured target signature. The bandwidth of the sensor’s PSF is fixed and is determined by the support of the transfer function H(fx). The bandwidth of the ideal target signature depends on, for example, the range of the target in FLIR, radar, sonar, visible, and others. For a target at the short range, more details of the target are observable; in this case, the ideal target signature bandwidth is relatively large. For a target at the long range, the ideal target signature bandwidth is smaller. Two curves in Figure 6.6 illustrate the bandwidths of the sensor and the ideal target signature. The wider one could be the bandwidth of the target or the sensor, or vice versa. The output bandwidth of the measured target signature is the minimum bandwidth of the ideal target signature and the sensor; that is, B o = min( Bt , B s )
where Bo, Bt, and Bs are the bandwidths of the output measured target signature, the ideal target signature, and the sensor PSF, respectively. The super-resolution image reconstruction algorithm cannot produce an image whose bandwidth exceeds Bo. Suppose that the measurement system stores the incoming images at the sample spacing of ∆x. The measured target signature, r(x), is then sampled at the integer multiples of ∆x (e.g., xn = n∆x, n = 0, ±1, ±2, …, ±N) to obtain the undersampled (aliased) image; the resultant can be modeled as the following delta-sampled signal in the spatial domain (see Figure 6.5):

rδ(x) = Σ_{n=−N}^{N} r(xn) δ(x − xn) = r(x) Σ_{n=−N}^{N} δ(x − xn)
The Fourier transform of the delta-sampled signal is

Rδ(fx) = (1/∆x) Σ_{l=−N}^{N} R(fx − 2πl/∆x)

An acquired image is aliased when

∆x > 1/Bo (the Nyquist sample spacing)
Figure 6.6 Factors that dictate the bandwidth of the measured target signature. The output bandwidth of the measured target signature is Bo = min(Bs, Bt), where Bo, Bt, and Bs are the bandwidths of the output measured target signature, the ideal target signature, and the sensor PSF, respectively.
As we mentioned earlier, many low-resolution sensors possess a spatial sample spacing ∆x that violates the Nyquist rate. The emphasis of the super-resolution image reconstruction algorithm is to de-alias a sequence of undersampled (aliased) images to obtain the alias-free or, as identified in the literature, super-resolved image. We should emphasize again that the resolution of the alias-free output image is limited by the minimum bandwidth of the sensor and the ideal target signature. Many methods have been proposed to reconstruct the high-resolution image from multiple undersampled images. These can be divided into the following categories:
• Nonuniform interpolation method;
• Regularized inverse method;
• Error-energy reduction method.
6.3.6.3 Nonuniform Interpolation Method
In the nonuniform interpolation method, the samples of the low-resolution images are viewed as nonevenly spaced samples of the desired high-resolution image. The relative locations of these samples are determined by a subpixel shift estimation algorithm. Many works have described nonuniform interpolation in the spatial domain using biharmonic spline interpolation [37, 38], Delaunay triangulation interpolation [39], generalized multichannel sampling [40], warping procedures [41, 42], weighted interpolation [31], and convolution with a shifted fractional (polynomial) kernel [43]. A pictorial example is shown in Figure 6.7, and a minimal code sketch follows the figure. The advantage of the nonuniform interpolation method is its relatively low computational load. However, when the subpixel shifts among images are random, many images are likely to be required to obtain the desired resolution improvement in the output image.
Figure 6.7 Interpolation reconstruction (low-resolution images combined onto a high-resolution grid by interpolation).
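The scattered-data interpolation step can be sketched as follows, assuming SciPy's griddata as the interpolator and taking the estimated shifts as given (the function and parameter names here are illustrative, not the cited methods):

import numpy as np
from scipy.interpolate import griddata

def nonuniform_interp_sr(frames, shifts, factor):
    # frames : list of low-resolution 2-D arrays (same shape)
    # shifts : list of (dy, dx) subpixel shifts of each frame, in LR pixels
    # factor : resolution improvement factor r
    ny, nx = frames[0].shape
    ys, xs, vals = [], [], []
    for img, (dy, dx) in zip(frames, shifts):
        yy, xx = np.mgrid[0:ny, 0:nx]
        ys.append((yy + dy).ravel())
        xs.append((xx + dx).ravel())
        vals.append(img.ravel())
    points = np.column_stack([np.concatenate(ys), np.concatenate(xs)])
    values = np.concatenate(vals)

    # Regular high-resolution grid with spacing 1/factor LR pixels;
    # locations outside the convex hull of the samples come back as NaN.
    hy, hx = np.mgrid[0:ny:1.0 / factor, 0:nx:1.0 / factor]
    return griddata(points, values, (hy, hx), method='linear')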
6.3.6.4 Regularized Inverse Method
In the regularized inverse processing methods, an observation model is formulated to relate the original high-resolution image to the observed low-resolution images. One observation model proposed by Park et al. [4] is shown in Figure 6.8. The model includes the sensor blur, warping, downsampling, noise, and the like. The relation between the kth observed low-resolution image and the ideal high-resolution image is represented as

yk = D Bk Mk x + nk,  for 1 ≤ k ≤ p
where x is the ideal high-resolution image that is sampled at or above the Nyquist rate, yk is the kth low-resolution image, k = 1, 2, …, p, Mk is a warping matrix, Bk represents a blur matrix, D is a subsampling matrix, and nk represents noise. The high-resolution image solution is often obtained by inverting a matrix or by an iterative procedure. The observation model can be formulated either in the spatial or in the frequency domain, and the corresponding inverse methods are implemented in both domains. For the spatial domain inverse method, an initial guess of the high-resolution image is obtained first [44]. Then, an image acquisition procedure is simulated to obtain a set of low-resolution images. The differences between the simulated and observed low-resolution images are used to improve the initial guess of the high-resolution image. In this method, the initial guess of the high-resolution image is crucial for the convergence of the algorithm. For the frequency domain inverse method, the relationship between the low-resolution images and the high-resolution image is formulated in the frequency domain [45], in which the relation between the continuous Fourier transform (CFT) of the original high-resolution image and the discrete Fourier transforms (DFT) of the observed aliased low-resolution images is derived from the shifting property of the Fourier transform.
Figure 6.8 Observation model relating LR image to HR images. (From: [4]. © 2003 IEEE. Reprinted with permission.)
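A rough simulation of this forward model is sketched below. It is only an illustration: a pure translation stands in for the warp Mk and a Gaussian kernel for the blur Bk, which are not the specific operators of [4]:

import numpy as np
from scipy.ndimage import gaussian_filter, shift

def observe(x, dy, dx, blur_sigma=1.0, L=4, noise_std=0.01, rng=None):
    # Simulate one low-resolution observation y_k = D B_k M_k x + n_k.
    # x          : high-resolution scene (2-D array)
    # (dy, dx)   : translation used as the warp M_k (in HR pixels)
    # blur_sigma : Gaussian stand-in for the blur B_k (optics/motion/PSF)
    # L          : downsampling factor D in both directions
    rng = np.random.default_rng() if rng is None else rng
    warped = shift(x, (dy, dx), order=1, mode='nearest')      # M_k x
    blurred = gaussian_filter(warped, blur_sigma)              # B_k M_k x
    low = blurred[::L, ::L]                                    # D B_k M_k x
    return low + noise_std * rng.standard_normal(low.shape)    # + n_k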
When a priori knowledge about the high-resolution image and the statistical information of the noise can be defined, the inverse processing method can provide stabilized (regularized) estimates. These methods are called regularized super-resolution approaches [46–52].
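A minimal sketch of this idea follows, assuming a simple zero-order Tikhonov penalty and plain gradient descent rather than the specific regularizers of [46–52]; the forward operators reuse the translation-plus-Gaussian-blur stand-ins from the sketch above, and the step size is an arbitrary choice that may need tuning:

import numpy as np
from scipy.ndimage import gaussian_filter, shift

def regularized_sr(y_list, shifts, L, hr_shape, blur_sigma=1.0,
                   lam=0.01, step=0.1, n_iter=200):
    # Gradient descent on  sum_k ||D B M_k x - y_k||^2 + lam ||x||^2.
    def forward(x, s):
        return gaussian_filter(shift(x, s, order=1), blur_sigma)[::L, ::L]

    def adjoint(r, s):
        up = np.zeros(hr_shape)
        up[::L, ::L] = r                           # adjoint of decimation
        up = gaussian_filter(up, blur_sigma)       # Gaussian blur is self-adjoint
        return shift(up, (-s[0], -s[1]), order=1)  # approximate adjoint of shift

    x = np.zeros(hr_shape)
    for _ in range(n_iter):
        grad = lam * x
        for y, s in zip(y_list, shifts):
            grad += adjoint(forward(x, s) - y, s)
        x -= step * grad
    return x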
6.3.6.5 Error-Energy Reduction Method
In the error-energy reduction algorithm, the high-resolution image is reconstructed by applying nonuniform interpolation using constraints in both the spatial and spatial frequency domains. The algorithm is based on the concept that the error-energy is reduced by imposing both spatial and spatial frequency domain constraints: the samples from the low-resolution images and the bandwidth of the high-resolution, alias-free output image. The error-energy reduction algorithm has been utilized in other signal processing applications. Some of the well-known examples include the works by Papoulis [53], Gerchberg [54], De Santis and Gori [55], and Stark and Oskoui [56]. Papoulis utilized the available information in both the spatial and frequency domains to extrapolate the distribution of an autocorrelation for spectral estimation. In Gerchberg's work, in order to achieve resolution beyond the diffraction limit, the spectrum of the reconstructed object is enhanced (expanded) by reducing the energy of the error spectrum from one snapshot (not a sequence). Stark and Oskoui described the projection onto convex sets (POCS) approach, combining the spatial observation model from a controlled rotation or scan with spatial domain constraints to achieve a high-resolution image. In the following, we describe the implementation of the error-energy reduction reconstruction method. Let the bandwidth of the input undersampled (aliased) images be Bi. The bandwidth of the alias-free (high-resolution) output image is denoted as Bo. In order to recover the desired (alias-free) bandwidth, it is important to select a processing array with a sample spacing that is smaller than the sample spacing of the desired high-resolution image. In this case, the 2-D FFT of the processing array yields a spectral coverage, Bp, that is larger than the bandwidth of the output image. Figure 6.9 illustrates this phenomenon in the 2-D spatial frequency domain. Let p(x, y) be the high-resolution alias-free image on the processing array (grid). The goal of this process is to recover this image. The low-resolution (aliased) snapshots provide only a subset of samples of this image. The location of each aliased image on the processing grid is dictated by its corresponding subpixel shifts in the (x, y) domain. Figure 6.10 shows a diagram of the error-energy reduction algorithm. In the first step, the processing array is initialized by populating the grid using the input images according to the estimates of the subpixel shifts. In this procedure, the available image values (from multiple snapshots) are assigned to each subpixel-shifted grid location of the processing array. At the other grid locations of the processing array, zero values are assigned. This can be written as

p1(x, y) = { p(x, y),  (x, y) ∈ [XP, YP];  0,  otherwise }   (6.4)
Figure 6.9 Bandwidth phenomenon in the error-energy reduction method: the bandwidth Bi of a low-resolution input image, the bandwidth Bo of the desired high-resolution output image, and the bandwidth Bp of the processing array, with Bp larger than Bo.
Figure 6.10 Error-energy reduction algorithm: initialize the processing array from the input images and estimated subpixel shifts; 2-D FFT; apply the spatial frequency domain constraint; 2-D IFFT; apply the spatial domain constraint; repeat until the error-energy is sufficiently reduced or the maximum iteration is reached; output the high-resolution image.
where [XP, YP] represents the set of grid locations where samples of the images are available from the aliased (undersampled) input images. Then, we form the 2-D Fourier transform P1(fx, fy) of this processing array. Its spectrum has a wider bandwidth than the true (desired) output bandwidth. Therefore, the spatial frequency domain constraint is applied (i.e., the values outside the desired bandwidth are replaced with zeros). We form the function
P2(fx, fy) = P1(fx, fy) · Wp(fx, fy)   (6.5)
where Wp(fx, fy) is a windowing function realizing the spatial frequency domain constraint. The next step is to perform the inverse 2-D Fourier transform p2(x, y) of the resultant array P2(fx, fy). The resultant inverse array is not a true image because the image values at the known grid locations are not equal to the original image values. Then, the spatial domain constraint is applied (i.e., the image values at the known grid locations are replaced with the known original image values). We obtain the function

p3(x, y) = { p(x, y),  (x, y) ∈ [XP, YP];  p2(x, y),  otherwise }   (6.6)
We next form the Fourier transform P3(fx, fy) of p3(x, y). We apply the spatial frequency domain constraint to form the function P4(fx, fy) from P3(fx, fy) similar to (6.5). Next, we compute the inverse Fourier transform p4(x, y) and apply the spatial domain constraint to form the function p5(x, y) from p4(x, y) and p(x, y) similar to (6.6). The procedure continues until the nth iteration. The use of the available information in the spatial and spatial frequency domains results in a reduction of the energy of the error at each iteration step. At the odd steps of the iteration, the error is defined by

e_{2n+1} = Σ_{(x, y) ∈ [XP, YP]} [p_{2n+1}(x, y) − p(x, y)]²
Here, the condition of the error-energy reduction is examined by defining the following ratio:

SNR_{2n+1} = 10 log10 { Σ_{(x, y) ∈ [XP, YP]} [p(x, y)]² / Σ_{(x, y) ∈ [XP, YP]} [p(x, y) − p_{2n+1}(x, y)]² }   (6.7)
If SNR2n+1 < SNRmax (where SNRmax is a preassigned threshold value) and the maximum iteration is not reached, the iteration continues. If SNR2n+1 > SNRmax (i.e., the stopping criterion is satisfied), the iteration is terminated. Before the final super-resolution image is generated, the bandwidth of the output image is reduced from Bp to Bo using the Fourier windowing method [18]. Then, the final super-resolution image with the desired alias-free bandwidth is saved for the output.
An illustration of the error-energy reduction algorithm for a 1-D signal is shown in Figure 6.11. The original signal is plotted as the continuous solid line. The dots at the horizontal line are the grid sample locations of the big processing array, which has a sample spacing eight times smaller than the original sample spacing in this example. Two snapshots are simulated from the original (alias-free) signal. The samples from the first snapshot (the reference signal) are plotted as the dark wider bars and are at the 1, 9, 17, 25, … grid locations; there is no subpixel shift for this first snapshot. The samples from the second snapshot are plotted as the gray narrower bars. They possess a relative subpixel shift of three with respect to the first snapshot and are at the 4, 12, 20, 28, … grid locations. The processing array is populated with the samples of these two snapshots at the 1, 4, 9, 12, 17, 20, 25, 28, … grid locations. The other grid locations (e.g., 2, 3, 5, 6, 7, 8, 10, 11, …) are assigned zeros at the start of the algorithm. The iterative error-energy reduction algorithm is applied to this processing array by imposing the spatial and spatial frequency domain constraints at each iteration step. As the iteration continues, the error-energy between the output signal and the original signal is reduced. When the algorithm converges, the high-resolution output signal is bandlimited and is approximately equal to the original alias-free signal.
Figure 6.11 An example of the error-energy reduction algorithm for one-dimensional signals.
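The 1-D iteration just described can be sketched as follows. This is an illustrative implementation only, assuming an ideal lowpass window for the spatial frequency constraint and expressing the output bandwidth as a fraction of the processing-array bandwidth:

import numpy as np

def error_energy_reduction_1d(samples, known, band_frac=0.25,
                              snr_max=40.0, max_iter=200):
    # samples  : processing array with known values filled in, zeros elsewhere
    # known    : boolean mask of grid locations that hold measured samples
    # band_frac: fraction of the processing-array bandwidth kept by the
    #            spatial frequency domain constraint (roughly Bo/Bp)
    n = samples.size
    freqs = np.fft.fftfreq(n)                      # cycles per grid sample
    in_band = np.abs(freqs) <= band_frac / 2.0     # ideal lowpass window Wp

    p = samples.astype(float).copy()
    for _ in range(max_iter):
        P = np.fft.fft(p)
        P[~in_band] = 0.0                          # spatial frequency constraint
        p = np.fft.ifft(P).real
        err = np.sum((p[known] - samples[known]) ** 2)
        sig = np.sum(samples[known] ** 2)
        snr = 10.0 * np.log10(sig / max(err, 1e-20))   # ratio of (6.7)
        p[known] = samples[known]                  # spatial domain constraint
        if snr > snr_max:
            break
    return p

# Hypothetical setup mirroring the 1-D illustration (0-based indices):
# grid = np.zeros(64); known = np.zeros(64, bool)
# grid[0::8] = snapshot1; grid[3::8] = snapshot2
# known[0::8] = True;     known[3::8] = True
# hr = error_energy_reduction_1d(grid, known)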
6.3.6.6 Examples
In this section, several examples are provided to demonstrate the merits of the super-resolution image reconstruction algorithm using the error-energy reduction method. The first example uses simulated low-resolution undersampled images generated from a high-resolution FLIR image; because the subpixel shift information for these simulated low-resolution images is predetermined, the accuracy of the correlation-based subpixel shift estimation algorithm can be tested. Then, real-world FLIR and visible images are utilized to demonstrate the merits of the algorithm.
Example 6.1: Simulated, Low-Resolution FLIR Images
An original FLIR tank image is shown in Figure 6.12(a). The size of this image is 40 pixels in the first dimension and 76 pixels in the second dimension. First, we upsample this image by a factor of four using the Fourier windowing method [18] to obtain a simulated high-resolution image with a size of 160 × 304, as shown in Figure 6.12(b).
Figure 6.12 Simulated low-resolution FLIR images: (a) original FLIR tank image; (b) simulated high-resolution image; (c)–(f) four simulated low-resolution images; (g) simulated and correlation-estimated subpixel shifts (the horizontal shift represents the first dimension shift and the vertical shift represents the second dimension shift); (h) simulated and gradient-estimated subpixel shifts; (i) one of the simulated low-resolution images; (j) one of the bilinear interpolated low-resolution images; (k) high-resolution output image by the standard interpolation method based on the gradient-estimated subpixel shifts; and (l) super-resolved image.
Then, the low-resolution images are generated from this simulated high-resolution image by subsampling; the starting sample point (shift) is randomly generated for each low-resolution image. Let the size of the low-resolution image be 20 × 38. The resolution factor between the simulated high-resolution image and the low-resolution images is eight. The low-resolution images are formed by subsampling every 8 pixels in both dimensions from the simulated high-resolution image. The first samples of the 4 low-resolution images in the high-resolution upsampled image are at (0, 0), (2, 1), (1, 4), and (6, 2), which represent the subpixel shift locations. The low-resolution images are subsampled from the simulated high-resolution image based on these predescribed subpixel location points. For example, simulated low-resolution image number two is formed by subsampling the simulated high-resolution image starting at the second subpixel position in the first dimension and the first subpixel position in the second dimension. Four simulated low-resolution images are shown in Figure 6.12(c–f). After applying the correlation subpixel shift estimation algorithm to these four simulated low-resolution images, the subpixel shift values are obtained. The estimated subpixel shift values are all correct according to the actual shifts. Figure 6.12(g) shows both the simulated and correlation-estimated subpixel shifts (the horizontal shift represents the first dimension shift, and the vertical shift represents the second dimension shift). For comparison purposes, we applied a gradient-based method [57] to obtain the estimated subpixel shifts for the four simulated low-resolution images. Figure 6.12(h) shows both the simulated and gradient-estimated subpixel shifts. The mean absolute error between the simulated and gradient-estimated subpixel shifts is 1.67 out of 8 low-resolution pixels, or 0.21 of a whole low-resolution pixel. That shows that the gradient-estimated subpixel shifts contain a 20 percent discrepancy from the actual shifts for these four simulated low-resolution images. The reconstructed image from the error-energy reduction algorithm is shown in Figure 6.12(l). For comparison purposes, we also applied a standard interpolation algorithm based on the gradient-estimated subpixel shifts among these four low-resolution images. The resultant image is shown in Figure 6.12(k). This image shows artifacts that could be due to the fact that the subpixel shifts are not accurately estimated and the number of the input low-resolution images is small. (It is stated in the literature that these algorithms require a large number of snapshots [58].) The ratio of signal to error [defined in (6.7)] between the high-resolution output image in Figure 6.12(k) and the original image in Figure 6.12(a) is 0.56 dB. This indicates that the error between the high-resolution output image and the original image is large. Figures 6.12(i) and 6.12(j) show one of the low-resolution images and its bilinear interpolated image, respectively. The ratio of signal to error between the bilinear interpolated image in Figure 6.12(j) and the original image in Figure 6.12(a) is 0.71 dB. This indicates that the error between the interpolated low-resolution image and the original image is also large. The ratio of signal to error between the super-resolved image in Figure 6.12(l) and the original image in Figure 6.12(a) is 40.1 dB. This indicates that the image reconstructed via the error-energy reduction algorithm is a good estimate of the original image.
By observing Figures 6.12(l) and 6.12(j), we can see that the super-resolved image shows a significant improvement by exhibiting the detailed information, especially around the road wheel area of the tank.
Example 6.2: FLIR Images
A FLIR image sequence of a foliage area is acquired using an Indigo Systems Merlin LWIR uncooled microbolometer thermographic camera, which is undersampled. The subpixel shifts among the frames are due to the natural jitter of the camera. Sixteen low-resolution frames are used to reconstruct the high-resolution (de-aliased) image with a resolution improvement factor of four, as shown in Figure 6.13(c). For comparison, one of the original low-resolution images and one of the bilinear interpolated images are shown in Figures 6.13(a) and 6.13(b), respectively. The reconstructed image in Figure 6.13(c) shows a significant improvement by revealing high-frequency information on the tree branches.
Next, we consider a sequence of FLIR images of a ground vehicle from an undersampled, airborne FLIR sensor. Similarly, the natural jitter of the camera produces the subpixel shifts among the frames. Figure 6.14(a) shows the entire scene, and the circled area indicates the area to be super-resolved. Figure 6.14(b) shows one of the original low-resolution images of this area. Figure 6.14(c) shows the high-resolution ground vehicle image that is reconstructed using 16 low-resolution images. This reconstructed image again illustrates a significant improvement by exhibiting detailed information on the vehicle, especially in the tire area, which contains rich high-frequency information.
Example 6.3: Visible Images
A sequence of visible images of a test pattern is acquired using a Canon GL-1 Digital Video Camcorder, which is also undersampled. Natural jittering of the handheld camera provides the subpixel shifts among the frames.
Figure 6.13 FLIR images of a foliage area: (a) one of the original low-resolution images; (b) one of the bilinear interpolated low-resolution images; and (c) super-resolved image.
Figure 6.14 FLIR images of a ground vehicle from an airborne sensor: (a) entire scene; (b) one of the original low-resolution images; and (c) super-resolved image.
Figure 6.15 Visible images: (a) entire low-resolution test pattern image; (b) one of the bilinear interpolated images of a portion of the low-resolution image that is indicated from the box in part (a); and (c) super-resolved image of the same portion.
Figure 6.15(a) shows the entire low-resolution test pattern image; Figure 6.15(b) shows the bilinear interpolated image of a portion of this low-resolution image, indicated by the box in Figure 6.15(a); and Figure 6.15(c) is the reconstructed high-resolution image of the same portion, obtained from four low-resolution images. The vertical bars, which contain high-frequency information, are severely aliased in the low-resolution image, as shown in Figure 6.15(b). The vertical high-frequency bars in the reconstructed high-resolution image clearly demonstrate the resolution improvement: the high-frequency information is correctly extracted. Even the horizontal bars in the reconstructed high-resolution image show a moderate resolution improvement. In this case, the sensor prevents the high-frequency components in these horizontal bars from being revealed. The high-frequency components in the horizontal bars are not aliased by the image acquisition process; they are limited by the sensor resolution. The super-resolution image reconstruction algorithm can only correct for aliased high-frequency information; it cannot recover high-frequency information that is lost to the limited bandwidth of the sensor.
6.3.6.7 Practical Considerations
In this section, we will provide some insights on several practical issues for a successful super-resolution image reconstruction algorithm, including: (1) input image requirements, (2) noise, (3) output bandwidth determination, and (4) image warping effect.
Input Image Requirements
The subpixel shifts among the acquired image frames should be widely distributed in the 2-D spatial domain. Limited and/or constrained subpixel shifts limit the additional information about the scene. For example, subpixel shifts that exist only in one dimension do not provide the necessary information in the other dimension. Sufficient subpixel shifts distributed in both dimensions provide the necessary different "looks" of the scene to reveal the hidden (aliased) high-frequency detail information. Acquired images should possess relevant, though aliased, high-frequency components; it is the aliased high-frequency components that are extracted by the super-resolution image reconstruction algorithm. If the scene does not have enough relevant high-frequency signal, for example, for a long-range target whose high-frequency components are lost due to the sensor limitation, there is no high-frequency detail information that can be revealed. Also, if the scene contains no variation in intensity, such as a bland scene, then no shift estimation can be accomplished. Acquired images should possess good dynamic range. This requires that the imager is calibrated so that the signal is not saturated on the scene. A saturated signal appears as a step function. The super-resolution image reconstruction algorithm interpolates the signal using the spatial and spatial frequency domain constraints; for a step-function signal, the interpolation results in ripple artifacts similar to the Gibbs phenomenon.
Noise
In many practical scenarios, noise in the input low-resolution aliased images would limit the subpixel shift estimation and registration, which are critical to the entire super-resolution processing procedure. However, if lowpass filtering is applied to the input low-resolution aliased images to obtain the alias-free portion of the images, the high-frequency components, where typical noise is inherently present, are filtered out before the subpixel shift estimation. Therefore, the subpixel shift estimation algorithm is relatively insensitive to noise.
Output Bandwidth Determination
The selection of the output bandwidth in the error-energy reduction method is also a practical consideration. Too small an output bandwidth gives up resolution in the output high-resolution image. Too large an output bandwidth requires an unnecessarily large number of input images, and the output high-resolution image quality stops improving after a certain number of input images is used. Usually, the sensor system provides the sampling rate and optics information, so the severity of the undersampling phenomenon is known. Therefore, the output high-resolution bandwidth is easily determined from the sensor system that captures the input images.
Image Warping Effect
Because a warping effect might exist in the image acquisition process, the shifts among frames are not consistent across the image; the shifts in one region might differ from those in another region. In order to minimize the warping effect and estimate the shifts accurately, the low-resolution input images are divided into small regions (subpatches) in the spatial domain. The super-resolution image reconstruction algorithm is applied to each individual subpatch to obtain the high-resolution (alias-free) subpatch image. In the last step, all super-resolved subpatches are combined to form the entire high-resolution output image. A sketch of this strategy follows.
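The sketch below assumes a caller-supplied super_resolve routine (any of the reconstruction methods above, estimating shifts per patch) and uses simple averaging in the overlap regions; the patch size, overlap, and names are illustrative choices:

import numpy as np

def super_resolve_by_patches(frames, factor, super_resolve,
                             patch=64, overlap=8):
    # frames        : list of co-registered low-resolution images (same shape)
    # super_resolve : callable taking a list of LR patches and returning one
    #                 HR patch enlarged by `factor` in each dimension
    ny, nx = frames[0].shape
    out = np.zeros((ny * factor, nx * factor))
    weight = np.zeros_like(out)
    step = patch - overlap
    for y0 in range(0, ny, step):
        for x0 in range(0, nx, step):
            y1, x1 = min(y0 + patch, ny), min(x0 + patch, nx)
            lr_patches = [f[y0:y1, x0:x1] for f in frames]
            hr_patch = super_resolve(lr_patches)     # shifts estimated per patch
            ys = slice(y0 * factor, y1 * factor)
            xs = slice(x0 * factor, x1 * factor)
            out[ys, xs] += hr_patch
            weight[ys, xs] += 1.0
    return out / np.maximum(weight, 1.0)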
6.4 Super-Resolution Imager Performance Measurements
In this section, the performance enhancement of undersampled imagers using super-resolution techniques is demonstrated. There are many super-resolution algorithms, so it is important to determine a method of performance measurement. Here, the target acquisition performance of two super-resolution techniques is presented. One method by Schuler et al. [58], noted as Algorithm_1, computes the optical flow of the scene to calculate the scene motion, populates a high-resolution grid, and applies the interpolation algorithm. The other method by Young and Driggers [26], noted as Algorithm_2, computes the subpixel shifts using the frequency domain correlation method and uses the error-energy reduction method to reconstruct the high-resolution image.
6.4.1 Background
The introduction of staring array focal plane technology to succeed mechanically scanned detectors in military or commercial imagers was a major technological
advance of the last decade. It has, however, brought to light potential performance losses in the resulting sensor architecture. Sensors using staring arrays may spatially undersample the image space, and this characteristic may limit the sensor's long-range performance. Infrared detectors, historically, are large (20 to 30 µm) compared to visible detectors (2 to 5 µm), so infrared systems (especially in the midwave infrared) are good candidates for super-resolution performance increases. A focus in the military is on the potential for this type of technique to be used in real-time tactical sensors. For instance, the Army Thermal Weapon Sight is a small rifle-mounted FLIR, which is undersampled. Significant performance improvement could be achieved without changing the optics or detector array by implementing this type of algorithm. A perception experiment, presented in the next section, quantifies the performance gains achievable.
6.4.2 Experimental Approach
The basic scope of the perception experiment was to collect undersampled thermal imagery, present it to human observers with and without super-resolution, and quantify the improvement in performance for the super-resolution-processed imagery over unprocessed imagery. It was also decided to include both motion and still imagery to evaluate the impact of the human visual cortex's finite integration time and its processing of motion. In this experiment, the performance is compared for four cases: static image with undersampled imager, static image with super-resolution frame sequence, dynamic image with undersampled imager, and dynamic image with super-resolution frame sequence. A simple task was chosen to minimize observer response statistical spread and training time. We used a derivative of the Triangle Orientation Discrimination (TOD) methodology by the Netherlands TNO-FEL Laboratory [59–61]. This resulted in a basic four-alternative, forced-choice perception experiment. Details on these are included in subsequent sections.
6.4.2.1 Target—Triangle Orientation Discrimination (TOD)
One sensor performance measurement is the Triangle Orientation Discrimination (TOD) [59–61]. In TOD, the test pattern is a (nonperiodic) equilateral triangle in four possible orientations (apex up, down, left, or right), and the measurement procedure is a robust four-alternative, forced-choice (4AFC) psychophysical procedure. In this procedure, the observers have to indicate which triangle orientation they see, even if they are not sure. Variation of triangle contrast and/or size leads to a variation in the percentage correct between 25 percent (pure guess) and 100 percent. By interpolation, the exact 75 percent correct threshold can be obtained. A complete TOD curve [comparable to a minimum resolvable temperature difference (MRTD) curve] is obtained by plotting the contrast thresholds as a function of the reciprocal of the triangle angular size. A detailed description of the measurement of a TOD curve is given by Bijl and Valeton in [60]. The TOD method has a large number of theoretical and practical advantages. It is suitable for electro-optical and optical imaging systems in both the thermal and
visual domains; it has a close relationship to real target acquisition, and the observer task is easy. The results are free from observer bias and allow for statistical significance tests. The method may be implemented in current MRTD test equipment with little effort, and the TOD curve can be used easily in a target acquisition (TA) model. Two validation studies with real targets show that the TOD curve predicts TA performance for undersampled and well-sampled imaging systems. Currently, a theoretical sensor model to predict the TOD is under development. This approach was chosen due to its simplicity in analytical measurements. In this experiment, we used a modified version of TOD to determine the spatial resolution improvement of super-resolution imagery compared to undersampled imagery. The TOD was conducted in the field with a highly emissive triangle and a highly reflective background plate. One side of the triangle had a length of 18 inches, and the distance from the sensor to the triangle was varied to change the angular size of the triangle. The contrast of the triangle with respect to the background was measured to be around 45°C, so the experiment provided a high-contrast TOD measurement that corresponded to a high-contrast resolution limit.
6.4.2.2 Field Data Collection
The data collection was designed to provide a sufficient quantity of undersampled thermal images to test the effectiveness of super-resolution algorithms. The Indigo Systems Merlin uncooled microbolometer camera was selected as the sensor. This camera, as a calibrated radiometer, also provided temperature measurements. The camera was pointed at a resolution target, and various sensor motions were induced to create subpixel shifts. The resolution target used for this collection was derived from the TOD procedure. In the TOD procedure, the size, orientation, and thermal contrast of triangular test patterns are varied. The observer’s ability to resolve the orientation of the test targets is measured as a function of size and contrast. For the field data collection, a single triangular target was used. The orientation of the target was changed using a simple rotating mechanism. Its angular size varied by increasing the range of the target from the sensor. The thermal contrast was not varied. Instead, the test target was designed to provide a high-thermal contrast that resulted in a resolution limited point on the TOD curve. The test target consisted of a square piece of sheet aluminum (1.22m on a side) attached to some plywood backing and a separate triangular piece of sheet aluminum (0.457m on a side, or 18 inches). The triangle was painted matte black for maximum absorption of solar radiation and high emissivity. A small hole was made through the center of the aluminum square and plywood backing. A short rod was attached to the back of the triangular target through the hole. This allowed the triangle to be controlled from the back and allowed the triangle to “float” a small distance in front the larger square for thermal isolation. Figure 6.16 shows the triangle target. The aluminum square acted as a reflector in the longwave infrared region of the sensor. It was tilted back a few degrees on a support structure to reflect the cold sky back to the sensor. The temperature of the reflected sky, as measured by the Merlin Infrared radiometer, was less than 0°C, while the temperature of the triangle was
greater than 45°C. As a result, there was a high thermal contrast for the triangle at all times during the test. As a thermal reference, an Electro-Optic Industries 3-inch extended area blackbody with temperature controller was located in the foreground of the FOV. The triangle target was placed at 50m, 75m, 100m, 125m, and 150m from the sensor. The center of the triangle was approximately 1.5m high. At each range, the triangle was positioned at four orientations: left, up, right, and down. Twelve-bit digital thermal image data was collected for each of the 20 range-orientation combinations. For each range and orientation, four image sequences of 6 seconds each were collected at 60 frames per second while smoothly varying both the azimuth and elevation of the sensor on a tripod. The motion on the tripod was implemented by hand to give a natural, uncontrolled optical flow field. These image sequences were used to extract and compare five image types:
• Static undersampled images: These were taken by extracting a single frame from the image sequences.
• Static super-resolution images, Algorithm_1: These were image sequences that were processed with super-resolution using Algorithm_1 to result in a single static super-resolved image.
• Static super-resolution images, Algorithm_2: These were image sequences that were processed with super-resolution using Algorithm_2 to result in a single static super-resolved image.
• Dynamic undersampled images: These were the raw image sequences taken from the undersampled camera.
• Dynamic super-resolution images, Algorithm_1: These were image sequences that were processed with Algorithm_1 to result in a dynamic super-resolved image sequence.
6.4.2.3 Sensor Description
The sampled imaging sensor used for this investigation was an Indigo Systems Merlin LWIR uncooled, microbolometer thermographic camera. The thermographic camera was calibrated to absolute temperatures and has both an automatic and a manual nonuniformity correction mechanism to minimize fixed-pattern noise. For this investigation, the camera was fitted with a 25-mm lens so that there would be
only a small number of pixels on target at reasonably short ranges. Table 6.1 lists some of this camera’s key features and performance parameters. For the analysis that follows, the presample blur is comprised of the MTF from optical diffraction and the MTF from the detector size and shape [62–65]. Figure 6.17(a) shows the presample blur transfer response of the sensor used in this investigation as a function of spatial frequency in cycles per milliradian for one spatial dimension. From the sampled spectrum (sampling frequency is 0.49 cycles per milliradian) in Figure 6.17(b), it can be observed that this sensor is significantly undersampled, as evidenced by the large amount of overlap (aliasing) between the baseband spectra and the replica spectra generated from the sampling process. The postsample blur response of the system is determined by the digital signal processing, the display characteristics, the human eye response [65], the display observation distance, and the sensor FOV. Table 6.2 lists these key parameters. Note that, in the case of the undersampled imagers, a bilinear interpolation (three times) coupled to an upsample process provided for an 8 × increase in the image size. For the super-resolution images, the super-resolution process provided for a 4 × increase in the number of pixels in each direction followed by a bilinear interpolation, for a total of 8 × increase in the number of pixels. These processes maintained the same magnification for all sets of imagery. Figure 6.18 shows the sensor (baseband) postsample transfer response along with the total (in-band and out-of-band) spurious response. The baseband response is the result of multiplying the total postsample MTF by the total presample MTF centered at the frequency origin. The spurious response is defined as the Fourier transform of the sampling artifacts and is found by multiplying the total postsample MTF by all of the sample replicas of the total presample MTF not centered at the frequency origin. Figure 6.18 illustrates the effect of upsampling three times (which yields 8 × the number of pixels in each direction) using a bilinear interpolation function. Figure 6.18(a) gives the postsample blur response, which is composed of the display MTF, the human eye MTF, and the 8 × bilinear interpolation MTF. Figure 6.18(b) gives the associated transfer and spurious response functions. The responses given in Figure 6.18(b) represent the response functions for all of the nonsuper-resolved images
Table 6.1 Sensor Features and Specifications
Feature/Parameter Description        Specification
Spectral range                       7.5–13.5 µm
Field of view (FOV)                  25-mm lens: 36 × 27 degrees
Camera F/#                           1.3
Optical transmission coefficient     0.93
Array format                         320 × 240
Detector size                        41 µm × 41 µm
Detector pitch                       51 µm
NEdT                                 < 100 mK
Analog video                         NTSC at 30 Hz
Digital video                        Up to 60 Hz/12-bit resolution
Figure 6.17 (a) The MTF associated with presample blur; and (b) sampled spectrum (sampling frequency is 0.49 cycle per milliradian).
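A rough numeric check of this undersampling can be made from the parameters in Table 6.1, assuming a midband wavelength of about 10 µm (the wavelength value is an assumption; the other numbers come from the table):

# Angular sampling frequency and diffraction cutoff for the sensor above
# (25-mm lens, 51-um detector pitch, F/1.3), assuming lambda ~ 10 um.
focal_length = 25e-3        # m
pitch = 51e-6               # m
f_number = 1.3
wavelength = 10e-6          # m (approximate midband of 7.5-13.5 um)

ifov = pitch / focal_length                     # sample spacing, radians
f_sample = 1e-3 / ifov                          # cycles per milliradian
aperture = focal_length / f_number
f_cutoff = (aperture / wavelength) * 1e-3       # optics cutoff, cycles per mrad

print(f"sampling frequency ~ {f_sample:.2f} cy/mrad")   # ~0.49, as in Figure 6.17(b)
print(f"diffraction cutoff ~ {f_cutoff:.2f} cy/mrad")
print("undersampled" if f_cutoff > f_sample / 2 else "well sampled")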
Table 6.2 Postsample Blur Response Parameters and Specifications
Feature/Parameter Description                                       Specification
Display type                                                        CRT (Gaussian display spot)
Display pixel pitch (cm)                                            0.024
Display spot diameter at 1/e of the maximum output intensity (cm)   0.024/(π)^(1/2) = 0.01354
Display luminance (fL)                                              15
Observer viewing distance (inch/cm)                                 18/45.72
Sensor FOV (degrees)                                                18 × 13.5
Figure 6.18 (a) Postsample MTFs; and (b) total baseband and spurious response for the undersampled imager.
processed for this experiment. It is notable that many of the lost spatial frequency components have been recovered as a result of the upsampling process and that the spurious response outside of the half-sample rate (0.49/2 cycles per milliradian, the out-of-band spurious response) has been substantially limited by the effect of the bilinear interpolation function.
The desired result of super-resolution image enhancement is to effectively increase the spatial sampling frequency through fusion of a series of image frames that were captured with lower spatial sampling frequency but that are related to each other by subpixel shifts or spatial translations. Figure 6.19 illustrates the expected response when super-resolution enhancement is applied to the subject system. Figure 6.19(a) gives the new postsample blur response for the super-resolved images, which is composed of the display MTF, the human eye MTF, and the 2 × bilinear interpolation MTF. Figure 6.19(b) gives the transfer and spurious response functions for the super-resolved images processed for this experiment. It is notable that almost all of the lost spatial frequency components have now been recovered and that the ratio of the total area of the spurious response to the total area of the transfer response [the spurious response ratio (SRR)] has been reduced from 0.39 to 0.04. In addition, the overall baseband has been broadened (note that the scales are different).
6.4.2.4 Experiment Design
The form of perception experiment chosen for performance estimation was a four-alternative forced-choice (4AFC) format. This type of experiment is commonly used to reduce statistical spread in observer responses. In this case the choice was up, down, left, or right for the direction of the target. The target in this experiment was an equilateral triangle, which was highly emissive against a very low emissivity background. A series of 16 images were collected at each of five different ranges. The result was 25 cells of 16 images or movies each. These cells were arranged randomly for presentation to the observers. The images were presented to 12 observers to determine the performance difference as a result of super-resolution processing and motion (dynamic versus static imagery). Figure 6.20(a) is an undersampled static image of the triangle target at 50m. Figure 6.20(b) is the same target at 75m but with the triangle pointed downward. Figure 6.20(c) is the same target image sequence at 75m, but it is subsequent to super-resolution processing. Significant improvement in image quality is apparent.
Figure 6.19 (a) Postsample MTFs; and (b) total baseband and spurious response for the undersampled imager, after super-resolution enhancement.
Figure 6.20 Triangle target: (a) at 50m; (b) at 75m and undersampled; and (c) at 75m with super-resolution.
While it is very difficult to determine the triangle’s orientation prior to processing, it is readily apparent afterward. The baseline imagery was either static or dynamic (361 frames and 6 seconds duration), grouped into five cells to correspond to five different ranges. The range varied from 50m to 150m in 25-m increments. Ranges were selected to vary observer performance, with and without processing, from very high probability to very low probability of a correct answer. Table 6.3 illustrates the organization and naming convention for the image cells. Each row represents an alternative way of presenting the imagery to the observer. The rows each contain the 80 unique images or dynamic series of images. Each cell contains four images of each direction the equilateral triangle could point: up, down, left, and right. The performance of each row can be directly compared with that of the other rows, since they were collected from the same targets at the same time and range, and presented at the same magnification. During the testing, observers were shown one image at a time, in random presentations. The responses were timed, but the observers were given unlimited time to make their recognition choices. The menu consisted of four options presented in a forced-choice method. The test required approximately 70 minutes of observer time. The observers were briefed on the imagery and targets and trained to use the experimental interface. The dimensions of the display monitor were 12 × 16 inches, and the viewing distance was approximated at 15 inches. The target image size was 6 × 4.5 inches for 640 × 480 pixels. The contrast level for each monitor was set prior to the experiment, using a seven-level black-and-white graduated target, to a maximum value of 70 candela per square meter (Cd/m2) for the white pixels, a minimum of 0.5 Cd/m2, and a center level of 35 Cd/m2. Using these dimensions and the display brightness,
Table 6.3 Image Cell Layout and Naming Convention
Range (m):                        A (50)   B (75)   C (100)   D (125)   E (150)
Static undersampled images        AA       BA       CA        DA        EA
Static SR images—Algorithm_1      AB       BB       CB        DB        EB
Static SR images—Algorithm_2      AC       BC       CC        DC        EC
Dynamic undersampled imagery      AD       BD       CD        DD        ED
Dynamic SR imagery—Algorithm_1    AE       BE       CE        DE        EE
the contrast threshold at the monitor's limiting frequency was estimated to be 0.5 percent. In order to ensure that the monitor (display pixel) MTF was not a limiting MTF in the experiment, the undersampled images were interpolated up using a bilinear estimator. The super-resolved and undersampled images were processed so that image size was consistent for each row. These operations made the MTF of the monitor wide in the frequency domain compared to the other MTFs (pixel very narrow in space).
6.4.3 Measurement Results
The results of the experiment were tabulated and sorted into an Excel spreadsheet for analysis. These results were combined in ensemble form (observer responses were averaged to give ensemble probabilities) and then corrected for the random guess rate of 25 percent. The probability curves for the different cases are shown in Figure 6.21. The baseline probability case is the static undersampled data, which resulted in poor range results with a 50 percent probability of target discrimination at a range of 75m. The static super-resolution Algorithm_1 and super-resolution Algorithm_2 data should be compared with this baseline to determine the benefit of super-resolution algorithms for static images that result from super-resolution processing on sequences of undersampled imagery. Note that the Algorithm_2 provides a high probability of target discrimination with a 50 percent probability out to 120m, a significant improvement. The Algorithm_1 provides a 50 percent probability of target discrimination at 110m, also a significant improvement, but with a graceful degradation with range. For static imagery, the super-resolution processing provides between 40 percent (Algorithm_1) to 60 percent (Algorithm_2) improvement in range performance. The baseline for dynamic imagery is the dynamic undersampled data. When the dynamic super-resolution Algorithm_1 data are compared with this baseline, the range at 50 percent target discrimination improves from 100m to 120m for a 20 per-
cent increase in range performance. While some improvement is realized, the effect is not on the scale of the improvement seen for the static case. Such comparisons suggest that the human eye is providing some means for super-resolution processing. That is, the eye-brain combination is putting together sequential images in some limited sense to accomplish a super-resolution effect. The concept of human perceptual super-resolution is supported when comparing the two undersampled cases, the static versus dynamic. With no super-resolution processing, the static undersampled range at 50 percent target discrimination is 75m, whereas the dynamic undersampled range is 100m, a 33 percent increase in range performance. The idea of human perceptual super-resolution is also supported by the phenomenon of dynamic MRTD, whereby an observer can see four-bar patterns when they are moving and cannot see them without motion [66].
6.5 Sensors That Benefit from Super-Resolution Reconstruction
Well-sampled imagers do not benefit from super-resolution. Assuming a system with a diffraction-limited optical system and a rectangular detector with 100 percent fill factor, the system can be generalized into three different regions. The first region is definitely undersampled, where the optical cutoff frequency is equal to the sample rate (left side of Figure 6.22). Any smaller optical blur (or correspondingly higher optical cutoff frequency) makes a more undersampled sensor. The second region is definitely well sampled, where the optical cutoff is equal to the half-sample (or Nyquist) rate. Any larger optical blur (or smaller optical cutoff frequency) would result in a more well-sampled system. The region between these cases is a transition from an undersampled sensor to a well-sampled sensor.
Three regions can be identified for sensors that may benefit from super-resolution based on the previous discussion. For the MWIR (3 to 5 µm), Figure 6.23 shows three regions based on detector sample spacing in micrometers and sensor f-number. The interface between very beneficial and somewhat beneficial corresponds to the left side of Figure 6.22. The interface between somewhat beneficial and no benefit corresponds to the right side of Figure 6.22. The same type of graph for the LWIR (8 to 12 µm) is shown in Figure 6.24. Note that for a 20-µm MWIR detector system, a great benefit from super-resolution occurs at F/5 or below, whereas no benefit occurs at F/10 and above.
Figure 6.22 Undersampled and well-sampled imagers (optics and detector MTFs relative to the Nyquist frequency fs/2 and the sample rate fs).
Figure 6.23 MWIR regions of super-resolution benefit (detector sample spacing versus F-number: very beneficial, somewhat beneficial, and no benefit).
Figure 6.24 LWIR regions of super-resolution benefit (detector sample spacing versus F-number: very beneficial, somewhat beneficial, and no benefit).
The transition from a benefit to no benefit occurs between F/5 and F/10. For a 20-µm LWIR detector system, the transition occurs between F/2 and F/4.
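A rough classification helper consistent with the boundaries just described is sketched below; the band-midpoint wavelengths used in the examples are assumptions:

def sr_benefit_region(pitch_um, f_number, wavelength_um):
    # Undersampled boundary: optics cutoff equals the sample rate
    # (lambda * F/# = pitch).  Well-sampled boundary: optics cutoff equals
    # the Nyquist rate (lambda * F/# = 2 * pitch).
    f_lambda = f_number * wavelength_um
    if f_lambda <= pitch_um:
        return "very beneficial"        # definitely undersampled
    if f_lambda >= 2 * pitch_um:
        return "no benefit"             # well sampled
    return "somewhat beneficial"        # transition region

# 20-um MWIR detector (lambda ~ 4 um): beneficial at F/5, no benefit at F/10.
print(sr_benefit_region(20, 5, 4), sr_benefit_region(20, 10, 4))
# 20-um LWIR detector (lambda ~ 10 um): transition between F/2 and F/4.
print(sr_benefit_region(20, 2, 10), sr_benefit_region(20, 4, 10))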
6.5.1 Example and Performance Estimates
In order to understand the performance impact of super-resolution, a typical LWIR imaging sensor was modeled using NVThermIP (available at https://www.sensiac.gatech.edu/), a model based on the principles covered in Chapter 3. The relevant input parameters are listed in Table 6.4. The procedure for modeling super-resolution was to increase the number of detectors (samples) without changing the detector dimensions (sample spacing), effectively adding additional samples to the model.
Table 6.4 Example Model Inputs
Optics Parameters                Value             Units
Transmission                     1
F/#                              1
Optics blur                      0                 Milliradian in object space
Detector Parameters
Peak D*                          6.13E+11          cm·sqrt(Hz)/Watt
NETD                             0
Fill factor                      100 percent
Baseline detectors               1,280 × 720
Super-resolution detectors       2,560 × 1,440
Display Parameters
Baseline pixel size              25                µm
Super-resolution pixel size      12.5              µm
Display height                   15                cm
Display viewing distance         32.15             cm
Target Parameters
Target contrast                  1.25              °C
Target size                      3.2               m²
V50 identity                     18.63             Cycles on target
Scene contrast temperature       5                 °C
Gain                             Varies with range
Given that the reconstruction is formed from multiple images, there are impacts to noise/sensitivity that must be taken into account in the model. The samples in a given frame are either unique with respect to the high-resolution reconstruction grid or are nonunique. A unique sample can be used to reduce the sampling artifacts in the image and is accounted for by the increase in the number of detectors. A nonunique sample can be averaged with others to produce a lower-noise value. As an example, if a sequence of 180 frames is processed to produce a single highresolution frame having three times the resolution of one of the original frames, there are only nine unique sampling positions necessary to obtain that resolution. The rest of the samples at any particular position are nonunique and can be used to reduce noise. In this case, you can average 20 frames (180 divided by 9) per unique sample location. This is accounted for in the model by adding frame averaging for the number of nonunique frames for each sample position. It should be noted that this procedure only applies to temporal noise. Having addressed the resolution and noise aspects of the imager, the issue of display must then be discussed. If super-resolution is used to obtain more samples, then those samples must be displayed. There are several possible scenarios for doing this, but they can be broadly categorized as those that increase magnification and those that do not. If the existing display has sufficient resolution to display all of the additional samples, then the original low-resolution imagery was either shown on a small portion of the display or interpolated to fill the screen. In the first case, displaying all of the high-resolution samples results in a system magnification increase. This is accounted for in the model by changing the display height. In the latter case, there is no change in overall image size, so no magnification occurs. It is also possi-
ble to display only a portion of the high-resolution image (cropping or chipping), if the display cannot handle all of the additional samples. This also results in a magnification increase, even though the image size has not changed. In this case, the change in magnification must be calculated offline and input directly to the model. The model results are found in Figures 6.25 and 6.26 and summarized in Table 6.5. Both the system modulation transfer function (MTF) and the system contrast threshold function (CTF) grew outward in frequency space from the additional samples. This improvement in turn led to an increase in the overall range prediction. For this particular case of an F/1 system, the 41 percent calculated model range increase is substantial. Likewise, for a sensor in the midwave region, there is also great benefit in using super-resolution. For a sensor with similar model inputs with the exception of spectral characteristics and average optical transmission, there is a 43 percent modeled range increase between the baseline sensor and the superresolution sensor.
Baseline system contrast threshold function 1
0.1
0.01 Hor Pre-MTF Hor Post-MTF
0.001
System MTF Eye CTF Noise CTF
0.0001 0
2
4
6
8
10
Spatial frequency (cycles/milliradian)
Figure 6.25
Model outputs for baseline (without super-resolution) longwave FLIR.
SR system contrast threshold function 1
0.1
0.01 Pre-MTF Post-MTF System MTF
0.001
Eye CTF Noise CTF
0.0001 0
2
4
6
8
10
Spatial frequency (cycles/milliradian)
Figure 6.26
Model outputs for longwave FLIR with super-resolution reconstruction.
6.5 Sensors That Benefit from Super-Resolution Reconstruction Table 6.5
171
NVTherm Model Outputs—Range and Contrast
Model Parameter
Baseline
SR-REC
Detector resolution
640 × 360 1,280 × 720
F/#
1
1
50 percent identification range in km
3.38
4.772
Target contrast at 50 percent ID range
0.09
0.08125
V50
18.63
18.63
As the F/# increases, the benefit of applying super-resolution to both midwave and longwave imagers diminishes. After adjusting for the relevant optical parameters (field of view, focal length, and F/#) and lengthening the integration time, the same procedure was applied to a modeled optical system as was applied to the F/1 system. Figure 6.27 shows that as the F/# increases, the percentage increase in the modeled range gain between nonsuper-resolution and super-resolution sensors declines to negligible values at F/6 in the longwave and F/12 in the midwave. There is not enough information in the additional samples to warrant the use of superresolution techniques as the optics approach at high F/#. Figure 6.27 was calculated for 20-µm detectors and is a horizontal slice through the graphs of Figures 6.23 and 6.24. Any other size detector would yield different results; however, 20-µm detectors are common in current infrared systems. Holst et al. [67] improved the method for determining which sensors benefit from super-resolution and generalized the sensors more, as shown in Figure 6.28. The horizontal axis of the graph in Figure 6.28 generalized the sensor for all wavelengths with the product of the f-number and the wavelength that provided an improvement over Figures 6.23 and 6.24. It was also realized that while the region of diffraction-limited systems was reasonable, the region where the diffraction cutoff met the Nyquist rate was an arbitrary line, and the benefit could be quantified over a series of lines as shown. The benefit is small near the diffraction-limited line and increases as the line rotates to the left of the graph. This plot of Figure 6.28 can be used in general for super-resolution applied to many different sensors.
Percentage range improvement
50 Percentage range improvement between SR and BL in LW
40
Percentage range improvement between SR and BL in MW
30
20
10
0 0
2
4
6
8
10
12
F/number
Figure 6.27 Modeled range performance improvement for super-resolution (SR) as a function of F/# compared to the baseline (BL) sensor.
172
Super-Resolution Range improvement 50%
50
40% 40
30%
d (µm)
20% 30 10% 20 10
No aliasing
0 0
Figure 6.28
10
20
F λ (µm)
30
40
50
Holst’s super-resolution performance chart.
6.6 Performance Modeling and Prediction of Super-Resolution Reconstruction Fanning et al. [68] provide a summary for the modeling of super-resolution in performance modeling and prediction of super-resolution. Essentially, the performance modeling of super-resolution begins with increasing the sampling rate of the sensor. The detector size ends up greater than the distance between samples, so a sampling improvement is realized. This improvement decreases the aliasing and also makes the spatial reconstruction filter smaller in angular space (thus increasing the resolution of the system). The additional samples must be displayed to the observer, and there are two ways to model this. If the display height is unchanged, then the display spot size must be reduced by the increase in sampling rate. This approach does not change the magnification of the system. The second approach is to retain the display spot size, and the image display height (and width) is increased by the sampling rate increase. This approach does change the system magnification. Super-resolution also affects the sensitivity of the sensor. The frame averaging aspects of super-resolution reduces sensor temporal noise, σtvh. The motion can also reduce detector nonuniformity or fixed-pattern noise, σvh. Column noise σv and row noise σh can also be reduced with motion. Assuming that the samples are independent, the noise is reduced by the square root of the number of samples averaged. The temporally correlated signal adds linearly, while the random noise only adds in quadrature—thus increasing the signal-to-noise ratio. The fixed-pattern noise terms, the terms that do not involve time (σvh, σv, and σh) are not affected by the frame averaging with no motion, but they are reduced with the averaging that results from shift estimation and then averaging image sections. The sensor motion combined with frame averaging used for super-resolution has the effect of reducing the sensor fixed-pattern noise. Moving the sensor in a pattern and then shifting the resulting images such that the target is stationary has the effect of moving the fixed-pattern noise. The resulting fixed-pattern noise is no longer fixed or constant in time, so it can be reduced using frame averaging. The symbol nt is the number of independent time samples or the number of frames. The symbol nvh is the number of
6.7 Summary
173
independent spatial samples or (x, y) pairs. Symbols nv and nh are the numbers of independent vertical and horizontal samples, respectively. Then, the effective super-resolution noise components become
( σ tvh ) srr
=
σ tvh nt
σ and vh = σ tvh srr
nt
k 2 σ vh n vh σ tvh
(6.8)
k2 σ v ,h = σ tvh srr
nt
k2 σ v , h n v , h σ tvh
(6.9)
where k is the increase in sampling rate in the horizontal and vertical direction. One type of possible motion is circular motion. In one complete revolution, the number of independent (x, y) samples in σvh is equal to the lesser of π times the diameter of the circle and the number of frames. The number of independent y and x samples in σv and σh is equal to at most the diameter of the circle. Multiple revolutions of the same diameter circle does not give additional independent samples, limiting the maximum improvement possible for a given circle diameter. The improvement in σv and σh is somewhat optimistic because circular motion does not uniformly distribute x and y samples. Fanning’s procedure for modeling super-resolution is as follows: 1. Increase the number of horizontal and vertical detectors to model the sampling rate increase. A 640 × 480 native detector array oversampled at a 2:1 rate would be modeled as a 1,280 × 960 array. 2. The display resolution increase may be modeled using one of two approaches. First, the image display height is increased relative to the oversampling ratio. This increases the total magnification of the system. Alternatively, the display height is maintained and the display spot size is decreased by the oversampling rate. This approach maintains the system magnification. 3. Pixel averaging produces an increase in sensitivity, which is modeled as described in (6.8) and (6.9).
6.7
Summary In this chapter, we studied fundamentals, algorithms, and performance analysis for super-resolution image reconstruction. The meaning of the super-resolution and the sources of loss of high-frequency components due to sensor undersampling are introduced. For each of the three steps in super-resolution reconstruction, we presented several techniques in detail for subpixel shift estimation, integer-pixel shift estimation, and high-resolution output image reconstruction. The concept and implementation of super-resolution are accompanied by the real data from several undersampled imager systems. The image performance measurements, sensors that
174
Super-Resolution
benefit, and performance modeling and prediction were discussed. Such image performance analysis methods will be found useful in quantifying the performance gains and determining sensors that benefit from super-resolution image reconstruction. Techniques to improve image quality from another source of loss of high-frequency information due to sensor’s blurring function by image deblurring are discussed in the next chapter.
References [1] Bertero, M., and C. De Mol, “Super-Resolution by Data Inversion,” Progress in Optics, Vol. XXXVI, No. 36, 1996, pp. 129–178. [2] Kang, M. G., and S. Chaudhuri, “Super-Resolution Image Reconstruction,” IEEE Signal Processing Magazine, Vol. 20, No. 3, May 2003, pp. 19–20. [3] Fried, D., “Analysis of the CLEAN Algorithm and Implications for Superresolution,” JOSA A, Vol. 12, No. 5, May 1995, pp. 853–860. [4] Park, S. C., M. K. Park, and M. G. Kang, “Super-Resolution Image Reconstruction: A Technical Overview,” IEEE Signal Processing Magazine, May 2003, pp. 21–36. [5] Schuler, J. M., and D. A. Scribner, “Dynamic Sampling, Resolution Enhancement, and Super Resolution,” Ch. 6 in Analysis of Sampled Imaging Systems, R. H. Vollmerhausen and R. G. Driggers, (eds.), Bellingham, WA: SPIE Press, 2000. [6] Borman, S., and R. L. Stevenson, “Super-Resolution from Image Sequences—A Review,” Proc. of IEEE International Midwest Symposium on Circuits and Systems, Notre Dame, IN, August 9–12, 1998, pp. 374–378. [7] Zalevsky, Z., N. Shamir, and D. Mendlovic, “Geometrical Superresolution in Infrared Sensor: Experimental Verification,” Optical Engineering, Vol. 43, No. 6, June 2004, pp. 1401–1406. [8] Ben-Ezra, M., A. Zomet, and S. K. Nayar, “Video Super-Resolution Using Controlled Subpixel Detector Shifts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 6, June 2005, pp. 977–987. [9] Krapels, K., et al., “Performance Comparison of Rectangular (4-Point) and Diagonal (2-Point) Dither in Undersampled IRFPA Imagers,” Applied Optics, Vol. 40, No. 1, 2001, pp. 101–112. [10] Dvorchenko, V. N., “Bounds on (Deterministic) Correlation Functions with Applications to Registration,” IEEE Transactions on Pattern Anal. Mach. Intell., PAMI-5, No. 2, 1983, pp. 206–213. [11] Anuta, P. E., “Spatial Registration of Multispectral and Multitemporal Digital Imagery Using Fast Fourier Transform Techniques,” IEEE Transactions on Geoscience Electronics, Vol. GE-8, 1970, pp. 353–368. [12] Tian, Q., and M. Huhns, “Algorithms for Subpixel Registration,” Computer Vision, Graphics and Image Processing, Vol. 35, 1986, pp. 220–233. [13] Schaum, A., and M. McHugh, “Analytic Methods of Image Registration: Displacement Estimation and Resampling,” Naval Research Report 9298, February 28, 1991. [14] Seitz, P., “Optical Superresolution Using Solid-State Cameras and Digital Signal Processing,” Optical Engineering, Vol. 27, 1988, pp. 535–540. [15] Foroosh, H., J. B. Zerubia, and M. Berthod, “Extension of Phase Correlation to Subpixel Registration,” IEEE Transactions on Image Processing, Vol. 11, No. 3, March 2002, pp. 188–200. [16] Abdou, I. E., “Practical Approach to the Registration of Multiple Frames of Video Images,” Proceedings of IS&T/SPIE Vol. 3653 Conference on Visual Communications and Image Processing ’99, San Jose, CA, January 25, 1999, pp. 371–382.
6.7 Summary
175
[17] Stone, H. S., et al., “A Fast Direct Fourier-Based Algorithm for Subpixel Registration of Images,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 39, No. 10, October 2001, pp. 2235–2243. [18] Young, S. S., “Alias-Free Image Subsampling Using Fourier-Based Windowing Methods,” Optical Engineering, Vol. 43, No. 4, April 2004, pp. 843–855. [19] Horn, B. K. P., and B. G. Schunk, “Determining Optical Flow,” Artificial Intelligence, Vol. 17, 1981, pp. 185–203. [20] Heeger, D. J., “Model for the Extraction of Image Flow,” J. of the Optical Society of America A, Vol. 4, 1987, pp. 1455–1471. [21] Bergen, J. R., et al., “Dynamic Multiple-Motion Computation,” in Artificial Intelligence and Computer Vision: Proceedings of the Israeli Conference, Y. A. Feldman and A. Bruckstein, (eds.), Elsevier, 1991, pp 147–156. [22] Bierling, M., “Displacement Estimation by Hierarchical Blockmatching,” Proceedings of SPIE Vol. 1001, Visual Communications and Image Processing ’88, Cambridge, MA, November 1988, pp. 942–951. [23] Kleihorst, R. P., R. L. Lagendijk, and J. Biemond, “Noise Reduction of Image Sequences Using Motion Compensation and Signal Decomposition,” IEEE Transactions on Image Processing, Vol. 4, No. 3, March 1985, pp. 274–284. [24] Kim, S. P., and W. Y. Su, “Subpixel Accuracy Image Registration by Spectrum Cancellation,” Proceedings of ICASSP-93—1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 5, Minneapolis, MN, April 27–30, 1993, pp. 153–156. [25] Hendriks, C. L. L., and L. J. van Vliet, “Improving Resolution to Reduce Aliasing in an Undersampled Image Sequence,” Proceedings of SPIE, Vol. 3965, Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications, San Jose, CA, January 24, 2000, pp. 214–222. [26] Young, S. S., and R. G. Driggers, “Super-Resolution Image Reconstruction from a Sequence of Aliased Imagery,” Applied Optics, Vol. 45, No. 21, July 2006, pp. 5073–5085. [27] Brown, L. G., “A Survey of Image Registration Techniques,” ACM Computing Survey, Vol. 24, No. 4, December 1992, pp. 325–376. [28] Barron, J., “A Survey of Approaches for Determining Optic Flow, Environmental Layout, and Egomotion,” Technical Report RBCT-TR-84-5, Department of Computer Science, University of Toronto, 1984. [29] Huang, T. S., Image Sequence Analysis, Berlin/Heidelberg: Springer-Verlag, 1981. [30] Irani, M., and S. Peleg, “Improving Resolution by Image Registration,” CVGIP: Graphical Models and Image Processing, Vol. 53, 1991, pp. 231–239. [31] Alam, M. S., et al., “Infrared image Registration and High-Resolution Reconstruction Using Multiple Translationally Shifted Aliased Video Frames,” IEEE Transactions on Instrumentation and Measurement, Vol. 49, No. 5, October 2000, pp. 915–923. [32] Gibson, E. J., et al., “Motion Parallax as a Determinant of Perceived Depth,” Journal of Experimental Psychology, Vol. 8, No. 1, 1959, pp. 40–51. [33] Horn, B. K. P., Robot Vision, Cambridge, MA: MIT Press, 1986. [34] Rakshit, S., and C. H. Anderson, “Computation of Optical Flow Using Basis Functions,” IEEE Transactions on Image Processing, Vol. 6, No. 9, September 1997, pp. 1246–1254. [35] Lim, J. S., Two-Dimensional Signal and Image Processing, Englewood Cliffs, NJ: Prentice-Hall, 1990. [36] Rosenfeld, A., and A.C. Kak, Digital Picture Processing, Vols. I and II, Orlando, FL: Academic Press, 1982. [37] Bose, N. 
K., “Superresolution from Image Sequence,” Proceedings of IEEE International Conference on Image Processing 2004, Singapore, October 24–27, 2004, pp. 81–86. [38] Sandwell, D. T., “Biharmonic Spline Interpolation of Geo-3 and Seasat Altimeter Data,” Geophysics Research Letters, Vol. 14, No. 2, February 1987, pp. 139–142.
176
Super-Resolution [39] Lertrattanapanich, S., and N. K. Bose, “High Resolution Image Formation from Low Resolution Frames Using Delaunay Triangulation,” IEEE Transactions on Image Processing, Vol. 11, No. 12, December 2002, pp. 1427–1441. [40] Ur, H., and D. Gross, “Improved Resolution from Subpixel Shifted Pictures,” CVGIP: Graphical Models and Image Processing, Vol. 54, March 1992, pp. 181–186. [41] Chiang, M. C., and T. E. Boult, “Efficient Super-Resolution Via Image Warping,” Image and Vision Computing, Vol. 18, No. 10, July 2000, pp. 761–771. [42] Gonsalves, R. A., and F. Khaghani, “Super-Resolution Based on Low-Resolution, Warped Images,” Proceedings of SPIE, Vol. 4790, Applications of Digital Image Processing XXV, Seattle, WA, July 8, 2002, pp. 11–20. [43] Candocia, F. M., and J. C. Principe, “Super-Resolution of Images Based on Local Correlations,” IEEE Transactions on Neural Networks, Vol. 10, No. 2, March 1999, pp. 372–380. [44] Irani, M., and S. Peleg, “Image Sequence Enhancement Using Multiple Motions Analysis,” Proceedings of 1992 IEEE Computer Society Conference on Computer Vision and Pattern Analysis, Champaign, IL, June 15–18, 1992, pp. 216–221. [45] Tsai, R. Y., and T. S. Huang, “Multiple Frame Image Restoration and Registration,” in Advances in Computer Vision and Image Processing, Greenwich, CT: JAI Press, 1984, pp. 317–339. [46] Mateos, J., et al., “Bayesian Image Estimation from an Incomplete Set of Blurred, Undersampled Low Resolution Images,” Proceedings Lecture Notes in Computer Science 2652, Pattern Recognition and Image Analysis, Mallorca, Spain, June 4–6, 2003, pp. 538–546. [47] Schulz, R. R., and R. L. Stevenson, “Extraction of High-Resolution Frames from Video Sequences,” IEEE Transactions on Image Processing, Vol. 5, June 1996, pp. 996–1011. [48] Hardie, R. C., K.J. Barnard, and E. E. Armstrong, “Joint MAP Registration and High-Resolution Image Estimation Using a Sequence of Undersampled Images,” IEEE Transactions on Image Processing, Vol. 6, December 1997, pp. 1621–1633. [49] Elad, M., and Y. Hel-Or, “A Fast Super-Resolution Reconstruction Algorithm for Pure Translational Motion and Common Space-Invariant Blur,” IEEE Transactions on Image Processing, Vol. 10, No. 8, August 2001, pp. 1187–1193. [50] Rajan, D., and S. Chaudhuri, “Simultaneous Estimation of Super-Resolved Scene and Depth Map from Low Resolution Defocused Observations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, 2003, pp. 1102–1115. [51] Farsiu, S., et al., “Fast and Robust Multiframe Super Resolution,” IEEE Transactions on Image Processing, Vol. 13, No. 10, October 2004, pp. 1327–1344. [52] Lee, E. S., and M. G. Kang, “Regularized Adaptive High-Resolution Image Reconstruction Considering Inaccurate Subpixel Registration,” IEEE Transactions on Image Processing, Vol. 12, No. 7, July 2003, pp. 826–837. [53] Papoulis, A., “A New Algorithm in Spectral Analysis and Band-Limited Extrapolation,” IEEE Transactions on Circuits and Systems, Vol. 22, No. 9, 1975, pp. 735–742. [54] Gerchberg, R. W., “Super-Resolution Through Error Energy Reduction,” Optica Acta, Vol. 21, No. 9, 1974, pp. 709–720. [55] De Santis, P., and F. Gori, “On an Iterative Method for Super-Resolution,” Optica Acta, Vol. 22, No. 9, 1975, pp. 691–695. [56] Stark, H., and P. Oskoui, “High-Resolution Image Recovery from Image-Plane Arrays Using Convex Projections,” J. Opt. Soc. Am. A, Vol. 6, No. 11, November 1989, pp. 1715–1726. [57] http://www.cns.nyu.edu/~david/ftp/registration/. [58] Schuler, J. 
M., et al., “TARID-Based Image Superresolution,” Proc. of SPIE, Vol. 4719, Infrared and Passive Millimeter-Wave Imaging Systems; Design, Analysis, Modeling, and Testing; Orlando, FL, April 3–5, 2002, pp. 247–254.
6.7 Summary
177
[59] Bijl, P., and J. M. Valeton, “TOD, a New Method to Characterize Electro-Optical System Performance,” Proc. SPIE Vol. 3377, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing IX, Orlando, FL, April 15, 1998, pp. 182–193. [60] Bijl, P., and J. M. Valeton, “TOD, the Alternative to MRTD and MRC,” Optical Engineering, Vol. 37, No. 7, 1998, pp. 1984–1994. [61] Bijl, P. and J. M. Valeton, “Guidelines for Accurate TOD Measurement,” Proc. SPIE Vol. 3701, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing X, Orlando, FL, April 9, 1999, pp. 14–25. [62] Hecht, E., Optics, Upper Saddle River, NJ: Addison-Wesley and Benjamin Cummings, 1998. [63] Leachtenauer, J. C., and R. G. Driggers, Surveillance and Reconnaissance Imaging Systems—Modeling and Performance Prediction, Norwood, MA: Artech House, 2001. [64] Boreman, G., Basic Electro-Optics for Electrical Engineers, Ch. 2, Bellingham, WA: SPIE Optical Engineering Press, 1998. [65] Vollmerhausen, R., Electro-Optical Imaging System Performance Modeling, Bellingham, WA: ONTAR and SPIE Press, 2000. [66] Webb, C., and C. Halford, “Dynamic Minimum Resolvable Temperature Difference Testing for Staring Array Imagers,” Optical Engineering, Vol. 38, No. 5, 1999, pp. 845–851. [67] Holst, G. C., et al., “Super-Resolution Reconstruction and Local Area Processing,” Proc. SPIE, Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing, Orlando, FL, April 2007, pp. 65430E-1–7. [68] Fanning, J., et al., “IR System Field Performance with Superresolution,” Proc. SPIE Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing, Orlando, FL, April 11–13, 2007, pp. 65430Z-1–12.
CHAPTER 7
Image Deblurring An imaging system’s blurring function [sometimes called the point spread function (PSF)] is another common factor in the reduction of high-frequency components in the acquired imagery and results in blurred images. Deblur filtering algorithms (also called image restoration) can inverse the sensor blurring degradation by inversing the blurring function. However, noise is usually amplified at the high frequencies by this type of filtering. In this chapter, several algorithms commonly used for image deblurring are considered. A particular deblurring filter, called P-deblurring, is introduced. It uses the Fourier-based windowing method that compensates image blur due to the sensor’s PSF and is less sensitive to noise. Image deblurring image performance measurements are also provided.
7.1
Introduction The image acquisition model, as shown in Figure 7.1(a), demonstrates the measurement model for the measured signal or image r(x, y) via the output of a linear shift invariant (LSI) system whose impulse response is the sensor’s point spread function (PSF) b(x, y); the latter is also known as the sensor blurring function. The signal n(x, y) is the additive noise. The observed signal or image r(x, y) is the sum of the noise and the convolution of the sensor blurring function b(x, y) with the “true” signal or image g(x, y): r( x , y) = g ( x , y) ⊗ b( x , y ) + n( x , y )
(7.1)
The measured signal r(x, y) is degraded by the sensor blurring function (equivalent to a convolution process). The deblurring process is to recover g(x, y) from the measured signal r(x, y) by applying the deblurring filter h(x, y) [as shown in Figure 7.1(a)]. The deblurring algorithm is a solution to the problem of deconvolving the blurring function from the measured signal. Therefore, the deblurring process is also called deconvolution or restoration [1–3]. The convolution relation in the spatial frequency domain (fx, fy) is written as follows: R(f x , f y ) = G(f x , f y ) ⋅ B(f x , f y ) + N(f x , f y )
179
180
Image Deblurring
Gaussian blur g(x,y) Desired image
Deblurring filter
b(x,y)
r(x,y)
+
h(x,y)
Output image
Measured image n(x,y) (noise)
General blur PSF
(a)
B(ρ)
N(ρ) ρn
2
ρ
(b)
Figure 7.1
(a) Image acquisition model and (b) the noise spectrum support.
where R(fx, fy), G(fx, fy), B(fx, fy), and N(fx, fy) are the Fourier transforms of r(x, y), g(x, y), b(x, y), and n(x, y), respectively. The blurring function B(fx, fy) in most practical imaging systems is a radially symmetric function. In this case, if the radial spatial frequency domain is defined to be ρ=
f x2 + f y2
then the blurring function can be identified as B(ρ). The blurring function B(ρ) is typically a lowpass filter (such as Gaussian or an Airy disk). The noise n(x, y) may be 2 a white process—that is, the spectrum N( ρ) has a wide-bandwidth (flat) support as shown in Figure 7.1(b). The straightforward deblurring filter is the inverse of the blurring function. When we apply this kind of deblurring filter, we have R( ρ) B( ρ)
= G( ρ) +
N( ρ) B( ρ)
The second term usually becomes unstable at the high frequencies where the signal level G( ρ) is below the noise level N( ρ) . Several image deblurring techniques are presented in this chapter emphasizing their performance in the presence of noise.
7.2 Regularization Methods
7.2
181
Regularization Methods Several review papers [4–7] have summarized the research that attempts to solve image deblurring problem. As quoted in [4], “Everything was summed up in a single word: regularization.” Deblurring is an ill-posed inverse problem due to the inherent instability. For ill-posed (or ill-conditioned) problems, obtaining the true solution from imperfect data is impossible. Regularization is a general principle to obtain an acceptable solution by incorporating some extra or a priori information of the imaging system. Regularization is a method whereby a priori knowledge of the imaging system’s point spread function (PSF) or assumption of the true image [g(x, y)] is utilized to form a risk function. The risk function is then solved by different implementations associated with nonlinear optimization problems, such as the gradient-based method, expectation-maximization (EM) technique, prediction error-based technique, and least square method. Recent research varies in the assumptions about the true image and the PSF, and how these assumptions are imposed in forming different risk functions. An example is to search the solution of g(x, y) by maximizing the a posterior (MAP) distribution
[
]
g( x , y) = arg max p g( x , y) r( x , y)
(7.2)
The risk function depends on the actual object, which is usually unknown. An appropriate solution has to be chosen through proper initialization of the algorithm. Because of the nature of nonlinear optimization problems, there is a chance that the methods may converge to local minima. In addition, the convergence speed may be slow.
7.3
Wiener Filter Recent works [8, 9] have also used the various implementations of the Wiener filter approach. The Wiener filter is defined as H wiener ( ρ) =
B* ( ρ) B( ρ)
2
+ S n ( ρ) S f ( ρ)
(7.3)
where Sf (ρ) and Sn (ρ) are the signal and noise power spectral densities, respectively. Theoretically, the Wiener filter is the optimal linear filter [in the root mean square error (RMSE) sense]. The problem with the Wiener filter is that, practically, its parameters (degradation function and SNR) are not always known and it suffers from limitations. When the noise power is weak (Sn(ρ)/Sf (ρ) > 1), the Wiener filter becomes B*(ρ)/[Sn(ρ)/Sf(ρ)], which is proportional to the SNR. This is similar to a lowpass filter, but it is not effective for deblurring. The parametric Wiener filter is implemented using the parameter γ in the following definition: H wiener ( ρ) =
B* ( ρ) B( ρ)
2
+ γS n ( ρ) S f ( ρ)
The effect of γ on the filter tends to emphasize (γ > 1) or de-emphasize (γ < 1) the noise and signal statistics, respectively. The value of γ must be chosen carefully for reliable deblurring. The choice of γ is achieved through an iterative search [8]. The uniqueness and convergence are uncertain. In addition, the restoration is sensitive to the initial image estimate. Combining an inverse filter with a weighting filter in a geometric mean filter [10] and digital restoration in the wavelet domain approach [11] are also reported in the literature.
7.4
Van Cittert Filter In 1931, Van Cittert [12] devised a method to correct for sensor distortion due to sensor blurring function (instrument distortion) in spectroscopy. In this method, the measured signal is taken as a first approximation to the true signal. This approximation is then convolved with the sensor blurring function (it is known or is determined by other methods [13]) to yield a first approximation to the measured signal. The difference between the ordinates of the measured signal and its approximation of the true signal is then applied as a correction to the first approximation of the true signal to give a second approximation to the true signal. This procedure may be repeated a number of times, in principle, until no improvement of the representation of the true signal is obtained. Consider the image acquisition model (7.1) as a matrix form: r = Bg
where r and g are vectors of appropriate dimensions, and B is a rectangular (say, N × M) PSF matrix. The n + 1 approximation to the true signal is given by g n+1 = g n + d n ,
g1 = r
where dn = r − gnB
Further, the procedure is modified in [13–16] to include a relaxation parameter σ to control the magnitude of the correction introduced in each iteration, plus the convergence rate is increased; that is,
7.5 CLEAN Algorithm
183
g n+1 = g n + σ n d n
where σn is a scale quantity. This updated procedure in the original Van Cittert filter is further refined by using the gradient method in the least square estimation [1, 3, 17, 18]. The unconstrained least square estimate g$ minimizes the norm J = r − Gg$
2
Let dn be the gradient of J at iteration step n; that is, d n ≡ − B T ( r − Bg n ) = d n −1 − σ n −1 B T Bd n −1
The one-step gradient method is of the form: g n+1 = g n − σ n d n
Rearranging these two equations, the solution can also be written as g n+1 = On r
where On is a form of power series: n
(
O n = σ ∑ I − σB T B k=0
)
k
BT
(7.4)
where σ = σn. The Van Cittert filter can be conveniently implemented by a finite impulse response (FIR) filter. Unfortunately, methods based on the Van Cittert deconvolution filter are quite sensitive to noise and typically require filtering before being applied to noisy data; thus, this introduces filtering distortion as they try to correct for the noise [19].
7.5
CLEAN Algorithm The CLEAN algorithm was first introduced by Hogbom in 1974 [20] as an image processing method to “clean” sidelobes phenomena in the output maps of synthesis telescopes. Since then, the work has been extended as an image deconvolution method for multiple element interferometry in radio astronomy [21–24]. CLEAN algorithm applications on restoring synthetic aperture sonar images and radio-astronomical images are found in recent works [25, 26]. The CLEAN algorithm has an extensive history of applications in the astronomy and the radar fields, and its advantages and limitations are fairly well described in [22, 23, 27]. The original CLEAN algorithm is used in the case where one has an observed map, the dirty map (d), which is the convolution of the brightness distribution (t0) with an instrumental response, called the dirty beam (b). This nomenclature comes from the application of the algorithm in radio astronomy. In optics, these terms correspond to “image,” “object,” and “point-spread-function (PSF),” respectively.
184
Image Deblurring
Therefore, the basic CLEAN algorithm is an iterative solution to the deconvolution problem described in (7.1). In the CLEAN algorithm, a dirty image is the starting point where the image has all of the PSF artifacts associated with the optical system. The brightest pixel in the dirty image is determined. In the beginning, a blank image field (all pixels set to zero) is established as the clean image. The value of a clean image pixel is set as some fraction of the brightest pixel in the dirty image. A scaled PSF is subtracted from the dirty image so that the source that caused the brightest pixel is taken into account. The residual dirty image is then analyzed to determine the next brightest pixel. This process of constructing the clean image continues while the residual dirty image continues to be modified. This type of process continues until the noise level limits the performance of the algorithm. The positions of the sources are archived and at the end of the process the collected sources form a clean map. The steps in the procedure are given as follows: 1. Find the maximum absolute value in the image r(x, y) (brightest spot), Imax , at, say, (xi, yi). 2. Translate the PSF center to the brightest location (xi, yi). 3. Scale the translated PSF and subtract it from the image: rf ( x , y) = r( x , y) − γ b( x − x i , y − y i )
where γ is called loop gain to scale the PSF before the subtraction. The image rf (x, y) is called residual image. 4. Enter the value γ Imax as a δ-function in the corresponding location (xi , yi) on the deconvolved output image. 5. Replace the image r(x, y) by the residual image rf(x, y) and go to (1). Iterations stop when Imax is below a certain level, λ (i.e., the noise level is reached), or the maximum iteration is reached. The final residual image is usually added to the deconvolved output image to keep the noise level realistic in the final image. In practice, the algorithm needs a priori knowledge of the PSF. The loop gain γ, as suggested in [20], need not be a fixed quantity. It should be less than about 0.5 and it also should not be too small as to result in an unnecessarily large number of iterations. In cases in which the image is composed of noiseless well-separated, point-like targets, the CLEAN algorithm gives the right deconvolution result and the implementation is straightforward. In images with extended bright regions (or a combination of extended regions and point-like targets), the CLEAN algorithm may provide a completely distorted picture, even if the noise amplitude is very small (a fraction of a percent) [23].
7.6
P-Deblurring Filter After depicting several deblurring algorithms, a practical deblurring filter is provided by the authors that is less sensitive to noise and removes the blur by a desired
7.6 P-Deblurring Filter
185
amount. We call it the P-deblurring filter because we utilize a special window called a power window to construct the deblurring filter. The deblurring process applies the P-deblurring filter to the input image in the spatial frequency domain. The method does not need an iterative approach. We start with the definition of the P-deblurring filter. The P-deblurring filter design method is presented by utilizing the filter properties and by estimating the energy of the signal and noise of the input image. The P-deblurring filter’s performance is demonstrated by a human perception experiment that is described in the next section. 7.6.1
Definition of the P-Deblurring Filter
The blurring function is usually a Gaussian-like function with a bell shape. Figure 7.2(a) shows a Gaussian filter. Its inverse function is shown in Figure 7.2(b). The inverse function’s magnitude increases at the high frequencies where the noise can be magnified enormously. In order to suppress the noise effect, we apply a special window to the inverse blurring function. This window is called a power window [28], which is a Fourier-based smooth window that preserves most of the spatial frequency components in the passing band and attenuates sharply at the transition band. The power window is differentiable at the transition point, which gives a desired smooth property and limits the ripple effect. Figure 7.2(c) shows a power Gaussian filter
Inverse Gaussian filter
1
100
0.8
80
0.6
60
0.4
40
0.2
20
0−4
−2
0
2
0−4
4
−2
0
2
4
Spatial frequency domain f x (b)
Spatial frequency domain f x (a)
P-deblurring filter
Power window 1
14 12
0.8
10 0.6
8
0.4
6 4
0.2 0 −4
Figure 7.2
2 −2
0
2
4
0 −4
−2
0
2
4
Spatial frequency domain f x
Spatial frequency domain f x
(c)
(d)
(a–d) P-deblurring filter.
186
Image Deblurring
window and Figure 7.2(d) shows a deblurring filter when applying the power window to a Gaussian blurring filter. Because the deblurring filter is derived from the use of the power window, we call this deblurring filter the P-deblurring filter. For a given blurring function B(ρ), the P-deblurring filter P(ρ) is written as P( ρ) =
1 ⋅ W ( ρ) B( ρ)
(7.5)
where W(ρ) is the power window. The power window is defined as in (4.84):
(
W ( ρ) = exp − α ρ n
)
(7.6)
where the parameters (α, n) are obtained from certain specifications. This window is called the power window due to the fact that n is the power of the window. When we assume a Gaussian blurring function, the P-deblurring filter is defined as P( ρ) =
W ( ρ)
ρ2 = exp − α ρ n + B( ρ) 2σ 2
(7.7)
where σ is the standard deviation of the Gaussian function. 7.6.2 7.6.2.1
Properties of the P-Deblurring Filter The Peak Point
The P-deblurring filter has a maximum magnitude at the peak point, as shown in Figure 7.3. This peak point is solved by setting the derivative of the P-deblurring filter to zero. That is, ∂ P( ρ) ∂ρ
(
= 0 ⇒ ρp = αn σ 2 ρp
)
1 2 −n
(7.8)
The maximum magnitude of the P-deblurring filter is an important parameter for the P-deblurring filter design. Here, the peak point ρp is a function of three parameters, (α, n, σ). The filter design criteria are derived from these parameters. 7.6.2.2
The Noise Separation Frequency Point
Usually, the signal is stronger than noise at low frequencies and weaker than noise at higher frequencies. The noise is assumed here to be a white process across a wide range of frequencies. The noise separation frequency point ρn is the point where the signal and noise are equal, as shown in Figure 7.1(b). The noise is stronger than the signal for the frequencies greater than ρn. One of the filter design criteria is to let the magnitude of the P-deblurring filter to be 1 at the noise separation point ρn, as shown in Figure 7.3. In this way, the magnitude of the P-deblurring filter is less than one for the frequencies greater than the
7.6 P-Deblurring Filter
187 20 18 Inverse Gaussian
16 14 12 10 8 6 P-deblurring
4 Power window
2 0 −3
Figure 7.3
−2
2 ρp ρn 3 −1 0 1 Spatial Freq. Domain ρ cyc/mm
P-deblurring filter properties.
noise separation frequency point ρn where the noise is stronger than the signal. Meanwhile, the magnitude of the P-deblurring filter can be large for the frequencies less than the noise separation frequency point ρn where the signal is strong. This kind of deblurring filter provides the necessary deblurring; therefore, the P-deblurring filter satisfies the following condition: > 1, if ρ < ρ n ; P( ρ) = = 1, if ρ = ρ n ; < 1, if ρ > ρ . n
(7.9)
If the SNR at the noise separation frequency point ρn is estimated (see Section 7.6.3), the Gaussian function then has a value of 1/SNR. That is, B( ρ) ρ = n
ρ2 1 ⇒ exp − 2 SNR 2σ
= ρn
ρ 1 ⇒ n = 2 ln( SNR ) SNR σ
(7.10)
Therefore, we can solve the ratio of the noise separation point and the standard deviation of the Gaussian, ρn/σ. Furthermore, because P(ρn) = 1, we can also solve for ρn and σ in the following: ρ2 exp − α ρ n + 2 2σ
ρn
( ρ n σ ) 2 2 = 1 ⇒ ρn = α
1n
and σ = ρ n
( ρn σ)
where the parameters (α, n) are obtained from the power window.
(7.11)
188
Image Deblurring
7.6.2.3
The Cutoff Frequency Point
The cutoff frequency point of the power window, ρ0, is where the magnitude of the power window attenuates significantly. This point has an important impact on the filter design, since it controls when the inverse blurring function drops significantly. The cutoff frequency also determines the shape of the power window by solving the two parameters (α, n) [28]. The parameters (α, n) are solved by specifying two distinct values of the power window W(ρ); for example, W1 , W ( ρ) = W 2 ,
for ρ = ρ1 ; for ρ = ρ 2 .
(7.12)
Consider if one selects ρ2 to be the cutoff frequency point, and ρ1 to be 70 percent of the cutoff frequency point. The magnitude of the power window spectrum drops by −3 dB at 70 percent of the cutoff frequency point, and the magnitude drops by −21.9 dB at the cutoff frequency point. That is, 20 log10 W1
ρ 1 = 0. 7 ρ 0
20 log10 W 2
ρ2 = ρ0
= −3 dB
= −219 . dB
In this case, the solution for the parameters (α, n) is obtained α = 2.5 × 10 −8 n=6
7.6.3 7.6.3.1
P-Deblurring Filter Design Direct Design
In this section, a direct design method for the deblurring filter is described, which can benefit an imaging system where the sensor information and the noise characteristics are known. The direct design method is simple and easy to implement. Also, the performance of this deblurring filter is predictable. Assume that the blurring function is a Gaussian function. The magnitude of the inverse Gaussian, mg, can be specified at one frequency point, ρg. This frequency point ρg is calculated in the following: 1
B( ρ g )
= mg ⇒ ρ g = σ 2 ln( mg )
(7.13)
where σ is the standard deviation of the Gaussian function. The standard deviation of the Gaussian σ is either known or is estimated by applying a curve fitting procedure to the ensemble magnitude spectrum (see Section 7.6.3.2). If we choose the cutoff frequency point ρ0 as ρg, the power window is constructed. The P-deblurring filter is finally determined from (7.7). The cutoff frequency point ρ0 is a function of two parameters (σ, mg). For an input image where the Gaussian blurring function is unknown, the user can select two parameters, σ and mg, to derive the P-deblurring filter. For some input images,
7.6 P-Deblurring Filter
189
the Gaussian blurring function maybe known from the sensor information; then users only need to select one parameter, mg. Examples of the Direct Design
The synthetic bar pattern images are considered first. A synthetic bar pattern image is shown in Figure 7.4(a). There are eight bar pairs, each with 30-degree rotation. Figure 7.4(b) shows the Gaussian-blurred bar image. The 2-D Gaussian blurring function is shown in Figure 7.4(d). Since the Gaussian blurring function is known (i.e., the standard deviation σ is known), the magnitude of inverse Gaussian can be selected (i.e., mg = 100). The cutoff frequency is calculated according to (7.13). The resultant P-deblurring filter is shown in Figure 7.4(e). The resultant deblurred image is shown in Figure 7.4(c). Figure 7.5 shows the 1-D cross section of each image shown in Figure 7.4. Both Figures 7.4 and 7.5 illustrate that the resultant deblurred bar image is recovered from the Gaussian blurred image. Next, white noise is added to the Gaussian blurred bar image. The noise level is 40 percent of the standard deviation of the Gaussian. The original and Gaussian blur plus noise images are shown in Figures 7.6(a, b), respectively. Figure 7.6(c)
Spatial domain Y
Original
Spatial domain X (a) Gaussian blurred
Spatial domain Y
Spatial domain Y Gaussian filter
P-deblurring filter
Spatial freq. domain fy
Spatial domain X (c)
Spatial freq. domain fy
Spatial domain X (b)
Spatial freq. domain fx (d)
Figure 7.4
P-deblurred
Spatial freq. domain fx (e)
(a–e) Example of the direct design for a synthetic bar image.
190
Image Deblurring
Original signal
Original
Spatial domain X (a) Gaussian blurred
Blurred signal
Deblurred signal
P-deblurred
Spatial domain X (c)
Gaussian filter
P-deblurring filter
Filter signal
Filter signal
Spatial domain X (b)
Spatial freq. domain fx (d)
Figure 7.5
Spatial freq. domain fx (e)
(a–e) Cross sections of images in Figure 7.4.
shows the deblurred bar image using the same P-deblurring filter shown in Figure 7.4. Figure 7.7 illustrates the 1-D cross section of each image in Figure 7.6. The resultant bar image illustrates that the original bar image is recovered—even the noise is present. An example of applying the P-deblurring filter to one FLIR tank image chip is shown in Figure 7.8. Note that the number of pixels in the x and y domains are different; however, the spatial sampling spaces, ∆x and ∆y, are the same. From the discrete Fourier transform (DFT) in (4.5), N x ∆ x ∆ fx = 1 N y ∆ y ∆ fy = 1
where Nx and Ny are the numbers of pixels in the x and y domains, respectively; ∆ f x and ∆ f y are the sampling spaces in the spatial frequency domains, respectively. Thus, ∆ f x and ∆ f y are not equal to each other. However, the support bands in the spatial frequency domains are
7.6 P-Deblurring Filter
191
Spatial domain Y
Original
Spatial domain X (a) P-deblurred
Spatial domain Y
Spatial domain Y
Gaussian blurred+noise
Spatial domain X (c)
Spatial domain X (b)
Figure 7.6
(a–c) Example of the direct design for a synthetic bar image with noise.
Original signal
Original
Spatial domain X (a) P-deblurred
Blurred signal
Deblurred signal
Gaussian blurred+noise
Spatial domain X (c)
Spatial domain X (b)
Figure 7.7
(a–c) Cross sections of images in Figure 7.6.
±f x0 =
1 2∆ x
±f y 0 =
1 2∆ y
192
Image Deblurring
and they are equal to each other. For this reason, the spectral distributions that are shown in Figure 7.8 are square-shaped, not rectangular-shaped as in the spatial domain image. This image is assumed to be already blurred by a Gaussian function. The parameters are selected as σ = 025 . * ρmax ,
mg = 10
where ρmax is the maximum frequency of the Fourier transform of the input image. The original and P-deblurred tank images are shown Figure 7.8(a, b), respectively. The Gaussian blurring function and the resultant P-deblurring filter are shown in Figure 7.8(c, d). The deblurred tank shows clearly the deblurring improvement, especially around the tire area. Figure 7.9 illustrates the 1-D cross section of each image in Figure 7.8. The 1-D cross section of the tank image is a horizontal line through the center of the tires. The deblurring effects are similarly significant. 7.6.3.2
Adaptive Design
In this section, we describe a deblurring filter that is adaptively designed, based on the estimated energy of the signal and noise in the image. This approach can benefit an imaging system with noise. The adaptive design estimates the noise energy and the noise separation frequency point at which the energy of signal and noise are equal. The noise energy is stronger than the signal energy beyond the noise separation frequency point. The filter design criteria are defined to obtain a deblurring filter such that the filter magnitude is less than one at the frequencies above the noise separation frequency point, and the filter magnitude is larger than one at the fre-
Figure 7.8
Original (blurred)
P-deblurred
(a)
(b)
Gaussian filter
P-deblurring filter
(c)
(d)
(a–d) Example of the direct design for a FLIR tank image.
7.6 P-Deblurring Filter
Figure 7.9
193 Original (blurred)
P-deblurred
(a)
(b)
Gaussian filter
P-deblurring filter
(c)
(d)
(a–d) Cross sections of images in Figure 7.8.
quencies below the noise separation frequency point, where the signal is strong, as shown in (7.9). Therefore, the adaptively designed P-deblurring filter is able to deblur the image by a desired amount based on the known or estimated standard deviation of the Gaussian blur while suppressing the noise in the output image. 7.6.3.3
Estimating Noise Energy and Noise Separation Frequency Point
A method to estimate the noise energy level in the spectral domain of the acquired image is outlined in the following. Let g(x, y) be the continuous input image. We define its 1-D marginal Fourier transform via G
( x)
( f x , y) = ᑤ x [ g( x , y)]
Now consider the image that is the discrete version of g(x, y); that is, gij, i = 1, …, (x) N, j = 1, …, M. The 1-D DFT of this 2-D array are discrete samples of G (fx, y): (i ) G ij , i = 1, …, N, j = 1, … M. The ensemble magnitude spectrum, S = [S1, … SN], is computed si =
1 M (i) ∑ Gij M j =1
2
, i = 1,K N
An example of the ensemble magnitude spectrum from one input FLIR image is shown in Figure 7.10(a). Usually, the signal is stronger than the noise at the low frequencies and weaker than the noise at the high frequencies. From Figure 7.10(a), the ensemble magnitude
194
Image Deblurring
Smoothed 8000
12000
7000
10000
6000
Magnitude
Magnitude
Ensemble magnitude spectrum 14000
8000 6000 4000
5000 4000 3000 2000
2000
1000
0 −4
−2 0 2 Spatial freq. domain f x (a)
4
0
0
1 2 3 Spatial freq. domain f x (b)
4
Gradient
1000
Magnitude
0 −1000 −2000 −3000 −4000 −5000
Figure 7.10
0
1 2 3 Spatial freq. domain f x (c)
4
(a–c) Ensemble magnitude spectrum of a FLIR image.
spectrum is observed to remain unchanged after a high-frequency point. The unchanged spectrum is mainly noise spectrum, since the noise dominates the signal at the higher frequencies. Furthermore, the noise is commonly a white process over a wide range of frequencies. The noise separation frequency point is the frequency where the ensemble spectrum becomes unchanged (levels off). To accurately estimate the noise separation frequency point, we first smooth the ensemble spectrum by convolving it with an averaging filter; we denote the resultant smoothed spectrum by SM: SM = [ sm1 , K , sm N ]
The averaging filter is chosen to be
[1
1 1] 3
as an example. After this step, the gradient of the smoothed spectrum is computed as follows: Vi = sm i + 1 − sm i −1 , i = 2, K , N − 1
Examples of the smoothed spectrum and its gradient curve for the positive frequencies are shown in Figures 7.10(b, c), respectively. The gradient curve is searched from zero frequency through the positive frequencies. The search stops at a fre-
7.6 P-Deblurring Filter
195
quency point when the absolute value of the gradient is less than a threshold; we refer to this as the index frequency point. The value of the ensemble magnitude spectrum at the index frequency point represents the noise energy, and the value of the ensemble magnitude spectrum near the zero frequency point represents the signal energy. The SNR is computed as the ratio of the signal energy to the noise energy. The spectrum of an ideal (point) target is flat; however, the acquired spectrum of a target is decayed due to the blurring function. At the index frequency point, the acquired signal value is 1/SNR if the ideal signal is 1. The noise separation frequency point ρn is calculated as follows: ρ n = σ 2 ln( SNR )
(7.14)
where σ is the standard deviation of the Gaussian function. 7.6.3.4
The Procedure of the Adaptive Design
The overall goal of the deblurring filter is to provide the proper deblurring and the proper noise control. This goal can be achieved by adaptively determining the parameters of the deblurring filter based on the estimated noise energy of the input image. As stated earlier, the P-deblurring filter can be rewritten in the following form: P( ρ) =
W ( ρ)
ρ2 = exp − α ρ n + B( ρ) 2σ 2
(7.15)
To construct the P-deblurring filter, we need to determine three parameters (α, n, σ). The standard deviation of the Gaussian σ is either known or is estimated by applying a curve fitting procedure to the ensemble magnitude spectrum. By specifying two distinct values of P(ρ) as shown earlier, we can solve for the other two parameters (α, n). For example, one can select: P P
ρ1 = ρn
=1
ρ 2 = 0. 8 ρ n
= W2
That is, the magnitude of the P-deblurring filter is set to be 1 at the noise separation frequency point, and the magnitude is set to be W2 (W2 > 1) at 80 percent of the noise separation frequency point. The selection of W2 is not completely arbitrary. Because the power window is less than 1, the magnitude of the P-deblurring filter is less than the inverse Gaussian function; that is, ρ2 W 2 < exp 22 2σ
Therefore, W2 is chosen according to this condition.
(7.16)
196
Image Deblurring
Another way to determine the two parameters (α, n) is to specify W2; then the corresponding frequency point ρ2 is solved by the previous condition. One has to be cautious when selecting W2 via the previous procedure. Since W2 is chosen to be greater than 1, according to the shape of the P-deblurring filter shown in Figure 7.3, the corresponding frequency ρ2 is less than ρ1 (ρ1 = ρn). This condition needs to be met for the selection of W2. The adaptive design procedure is summarized as follows: • •
•
•
•
• •
Step 1: Compute the ensemble magnitude spectrum of the input image. Step 2: Smooth the ensemble magnitude spectrum and compute the gradient of the smoothed spectrum. Step 3: Search the gradient curve to find the index frequency point where the gradient is less than a predetermined threshold. Step 4: Compute the SNR as the ratio of the ensemble magnitude spectrum at near 0 frequency and at the index frequency. Step 5: If the standard deviation of the Gaussian blurring σ is known, skip this step. Otherwise, estimate σ by applying a curve fitting procedure to the ensemble magnitude spectrum. Step 6: Calculate the noise separation frequency point ρn from (7.14). Step 7: Select two points for the P-deblurring filter W , for ρ = ρ1 P( ρ) = 1 W 2 , for ρ = ρ 2
•
where ρ1 is selected as the noise separation point ρn, and W1 ≤ 1. The value ρ2 is selected as a frequency less than ρ1, and W2 is selected to meet the conditions in (7.16). Step 8: Construct the P-deblurring filter from (7.15).
Examples of the Adaptive Design
An example of the adaptive design P-deblurring filter is shown in Figure 7.11. For an original FLIR image, a Gaussian blur is added to the image. White noise is also added to the Gaussian-blurred image. A white noise with a standard deviation of 50 milliKelvins (mK) is considered as one noise level (Noise1) and 100 mK is another noise (Noise2). An image that was blurred and a blurred image with added noise are shown in the second row of Figure 7.11. The P-deblurred resultant images are shown in the third row of Figure 7.11. The P-deblurred images demonstrate deblurring improvement from their blurred input images. The P-deblurring filter shows robust results, especially for the noise that is added in the blurred images. The noise is suppressed instead of magnified in the deblurred image, which benefits from the adaptive design method. Figures 7.12–7.15 show four examples when applying P-deblurring filter to the outputs of the super-resolved images that were discussed in Chapter 6. The original images were acquired from real sensors ranging from FLIR to visible. The superresolution algorithm was applied to originals to obtain the super-resolved images to compensate for the sensor aliasing. The P-deblurred images of the super-resolved
7.6 P-Deblurring Filter
197
Original
Gaussian blurred
P-deblurred
Figure 7.11
Blurred+noise1 (50 mK)
Blurred+noise2 (100 mK)
P-deblurred
P-deblurred
Sample examples of blurred, blurred with noise, and P-deblurred images.
Low-resolution
Super-resolved
Figure 7.12
Super-resolved + deblur
P-deblurred image of the super-resolved image from an airplane FLIR sensor.
images show clearly deblurring improvements. These examples demonstrate the benefits of applying P-deblurring filter to compensate for the sensor blurring of sensor’s output images.
198
Image Deblurring
Low-resolution
Super-resolved
Figure 7.13
Super-resolved + deblur
P-deblurred image of the super-resolved image from a handheld FLIR sensor.
Low-resolution
Super-resolved
Figure 7.14
Super-resolved + deblur
P-deblurred image of the super-resolved image from a naval ship FLIR sensor.
Remarks
One needs to be cautious when applying any deblurring filter to images. If the images are undersampled (or aliased, as discussed in Chapter 6), the images need to
7.7 Image Deblurring Performance Measurements
199
Low-resolution
Super-resolved
Super-resolved + deblur
Figure 7.15 P-deblurred image of the super-resolved image from an airplane visible sensor. The super-resolved portion is the side view of a building.
be processed to obtain alias-free images (using super-resolution algorithms or other algorithms) before deblurring algorithms are applied. The deblurring filter intends to compensate the sensor blurring effect (i.e., to uncover the high-frequency components in the image that is reduced due to sensor blurring). However, the high-frequency components in the aliased images are lost or corrupted in another way—that is, the high-frequency components are folded into the low-frequency components (or mixed with low-frequency components). The de-aliasing process is used to “separate” hidden high-frequency components from the rest of the components in the image. Then, the deblurring filter is applied to the alias-free images and can provide more satisfied deblurring effect. In general, adaptive filtering refers to some iterative method that adaptively updates the filter. This is not the case in P-deblurring filter. Here, the filter is designed adaptively with respect to the estimated blur and noise, and then it is applied directly. The P-deblurring filter method also works well in the cases where the modulation transfer function (MTF) degradation is not monotonically decreasing as long as the inverse of the MTF is valid.
7.7
Image Deblurring Performance Measurements In this section, a method is presented to assess the performance of a P-deblurring filter through a perception experiment. This experiment will demonstrate the P-deblurring filters effect on a target identification task.
200
Image Deblurring
7.7.1
Experimental Approach
A perception experiment is described here in which human observers perform a target identification task on images before and after applying the P-deblurring filter. The results of comparing target identification probabilities using blurred, deblurred, adding noise to blurred, and deblurred noisy images are derived from the perception experiment. 7.7.1.1
Infrared Imagery Target Set
The image set consists of a standard 12-target set taken from a field test using a longwave infrared (LWIR) sensor. The targets in the set are well characterized in terms of predicting range performance based upon sensor characteristics. Each target with 12 aspect angles makes the total image set that contains 144 images. The side views of the12 targets are depicted in Figure 7.16. 7.7.1.2
Experiment Design
Experimental cell organization was based upon three factors. Images were subjected to six levels of blur to simulate range effects associated with identification task difficulty. Images were also subjected to three different noise treatments to test the effectiveness of the deblurring algorithm on noisy images. Finally, each of the cells was processed using the deblurring algorithm to quantify differences between blurred and deblurred image cases as a function of blur (i.e., range). The experimental matrix layout and naming convention is shown in Table 7.1. A total of 144 images in the first row are arranged into six different levels of blur cells. The aspect angles are rotated through the blur cells so that each cell has the same number of images at a particular aspect angle. Within each blur cell, there are
Figure 7.16 Thermal image examples for the 12-target set (Target_01 through Target_12).
Table 7.1 Experiment Matrix Cell Organization

                 Blur1   Blur2   Blur3   Blur4   Blur5   Blur6
Noise0 Blur       AA      AB      AC      AD      AE      AF
Noise0 Deblur     BA      BB      BC      BD      BE      BF
Noise1 Blur       CA      CB      CC      CD      CE      CF
Noise1 Deblur     DA      DB      DC      DD      DE      DF
Noise2 Blur       EA      EB      EC      ED      EE      EF
Noise2 Deblur     FA      FB      FC      FD      FE      FF
Within each blur cell, there are 24 target images at various aspect angles of the 12 different targets. It should be noted that the target images within a particular blur cell in the first row are the same target images used for that blur level in the noise-added and deblurring cells of the same column.

Figure 7.17 shows the Gaussian filters with six levels of blur that were applied to the imagery. These six levels were chosen to vary observer performance from a very high probability of a correct answer to a very low probability of a correct answer. The Blur1 Gaussian filter causes the least blur in the image, while the Blur6 function causes the most. Figure 7.18 shows examples of three blurred images (Blur1, Blur4, and Blur6) compared with their corresponding original and deblurred images.

Three noise treatments then follow. First, no additional noise is added to the blurred images. This case is considered a low-noise situation, denoted Noise0. For the other two noise treatments, two levels of white noise were added to the blurred images. White noise with a standard deviation of 100 mK is considered the upper bound of noise when it is added to a typical infrared image that has an SNR of 20K.
Figure 7.17 Gaussian functions (Blur1 through Blur6) in the spatial frequency domain ρ.
Figure 7.18 Sample examples of various blurred and P-deblurred images: originals, the Blur-1, Blur-4, and Blur-6 versions, and the corresponding P-deblurred results.
The second noise case, denoted Noise1, is white noise with a standard deviation of 50 mK, which represents a typical noise situation. The third noise case, denoted Noise2, is white noise with a 100-mK standard deviation, which represents an upper-bound noise situation.

7.7.1.3 Observer Training
Ten observers were trained using the ROC-V thermal recognition software [29] to identify the 12-target set as presented in the experiment. Each observer was trained to 96 percent proficiency in the identification of pristine images at various ranges and aspect angles. This proficiency requirement prevents potentially biased results stemming from subjects learning target-specific characteristics during the testing process.

7.7.1.4 Display Setting
Experimental trials were conducted using a high-resolution, medical-grade display to allow for accurate representation of image characteristics. For consistency, the displays were all calibrated between experimental setups using a Prichard Model 1980 Spot Photometer to ensure (realistically) identical minimum, mean, and peak luminance. Each display was calibrated to a mean level of 5.81 foot-Lamberts (fLs). Experimental trials were conducted in a darkened environment and observers were given ample time to adapt to ambient lighting conditions.
7.7.2 Perception Experiment Result Analysis
The average probabilities of identification from the experiment, P, were adjusted using the following equation:

Pid = (P − Pg) / (Ps − Pg),   Pg = 1/12,   Ps = 9/10
to remove the probability that an observer can select the proper response simply by guessing 1 out of 12 targets. Previous experimental analysis encourages the inclusion of a sweat factor, which normalizes probabilities to 90 percent to account for a 10 percent mistake rate. For the typical observer, this normalization provides a more accurate reflection of observer performance by accounting for mistakes and unfamiliarity with the testing environment.

The adjusted probability of identification Pid as a function of blur (range) for the different cases is shown in Figure 7.19. There was a significant improvement in human performance, as a function of range, when the blurred imagery was processed using the P-deblurring filter. The Noise0 Blur images are considered the baseline images. For Blur levels 1-3 (near range), the Pid of the baseline images is high (60-80 percent). However, for Blur levels 4-6 (long range), the Pid of the baseline images is very low (10-30 percent); that is where the P-deblurring filter's improvement takes place.
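As a quick numeric illustration of the guess-and-sweat-factor correction defined above, the following Python sketch (a minimal example of ours; the function and variable names are not from the text) converts a raw identification probability into the adjusted Pid:

```python
# Minimal sketch of the probability-of-identification correction
# Pid = (P - Pg) / (Ps - Pg), with Pg = 1/12 (chance) and Ps = 0.9 (sweat factor).
def corrected_pid(p_raw, n_targets=12, sweat=0.9):
    p_guess = 1.0 / n_targets
    return (p_raw - p_guess) / (sweat - p_guess)

# Example: a raw 35 percent correct-identification rate.
print(round(corrected_pid(0.35), 3))  # about 0.33 after correction
```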
Figure 7.19 Perception experiment results: corrected Pid versus blur level (and range, in percent units) for rows A-F (Noise0, Noise1, and Noise2, blurred and deblurred).
For Blur levels 4-6, the average increase in Pid is 33 percent using deblurred rather than blurred images in the Noise0 case; the average increase in Pid is 15 percent in the Noise1 case. In the Noise2 case, the improvement in Pid from using the deblurred images is negligible. In summary, Pid is increased by 15 to 33 percent at long range for the typical-noise to low-noise cases using the P-deblurring filter method.

In order to estimate the range performance, we assume that the range increases linearly with the blur level, since the chosen blurs produce equal increments of blur in the images. From Figure 7.19, for the Noise0 case, the blurred imagery had a 50 percent Pid at the 55 percent range unit (Blur level 3.3), while the P-deblurred imagery reached the 92 percent range unit (Blur level 5.5) at 50 percent Pid. That is equivalent to a 67 percent increase in range performance. Similarly, in the Noise1 case, the P-deblurred imagery yielded the 67 percent range unit (Blur level 4), compared with the 58 percent range unit (Blur level 3.5) for the blurred imagery at 50 percent Pid; that is about a 16 percent increase in range performance. In the Noise2 case, at 50 percent Pid, the range is at the 63 percent unit (Blur level 3.8) for the P-deblurred imagery, compared with the 58 percent unit (Blur level 3.5) for the blurred imagery, which is a 9 percent increase in range performance. In summary, the range performance is increased by 9, 16, and 67 percent for the cases of upper-bound noise, typical noise, and low noise in the input images, respectively, using the P-deblurring filter method. The potential benefit of this practical deblurring algorithm is that it can be implemented in a real imaging system to compensate for sensor blurring and noise effects.
7.8
Summary In this chapter, we have studied several deblurring filter techniques that operate in spatial or spatial frequency domains. One important consideration in deblurring filter design is to control noise. The spatial frequency domain algorithms are useful and easy to be implemented. The performance measurements by a perception experiment for one deblur filtering method are discussed. For image degradations from sensor blur as discussed in this chapter and undersampling in previous chapters, a priori knowledge of image acquisition processing is available since the modeling of blur and undersampling has been studied for a long time and can be applied. For other sources of image degradation, such as low-contrast images due to low-cost sensors, and/or environmental factors, for example, such as lighting sources or background complexities, it is difficult to model. Image contrast enhancement, coupled with human visual study, is discussed in Chapter 8.
References

[1] Andrews, H. C., and B. R. Hunt, Digital Image Restoration, Englewood Cliffs, NJ: Prentice-Hall, 1977.
[2] Frieden, B. R., "Image Enhancement and Restoration," in Picture Processing and Digital Processing, T. S. Huang, (ed.), Berlin: Springer-Verlag, 1975, pp. 177-248.
[3] Blass, W. E., and G. W. Halsey, Deconvolution of Absorption Spectra, New York: Academic Press, 1981.
[4] Demoment, G., "Image Reconstruction and Restoration: Overview of Common Estimation Structures and Problems," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 12, 1989, pp. 2024-2036.
[5] Kundur, D., and D. Hatzinakos, "Blind Image Deconvolution," IEEE Signal Processing Magazine, Vol. 5, 1996, pp. 43-64.
[6] Lagendijk, R. L., A. M. Tekalp, and J. Biemond, "Maximum Likelihood Image and Blur Identification: A Unifying Approach," Optical Engineering, Vol. 29, No. 5, 1990, pp. 422-435.
[7] Sezan, M. I., and A. M. Tekalp, "Tutorial Review of Recent Developments in Digital Image Restoration," Proc. SPIE Vol. 1360, Visual Communications and Image Processing, Lausanne, Switzerland, October 1, 1990, pp. 1346-1359.
[8] Gerwe, D. R., et al., "Regularization for Non-Linear Image Restoration Using a Prior on the Object Power Spectrum," Proc. SPIE Vol. 5896, Unconventional Imaging, San Diego, CA, July 31, 2005, pp. 1-15.
[9] Mateos, J., R. Molina, and A. K. Katsaggelos, "Approximation of Posterior Distributions in Blind Deconvolution Using Variational Methods," Proceedings of IEEE International Conference on Image Processing, Genoa, Italy, September 11-14, 2005, pp. II: 770-773.
[10] Stern, A., and N. S. Kopeika, "General Restoration Filter for Vibrated-Image Restoration," Applied Optics, Vol. 37, No. 32, 1998, pp. 7596-7603.
[11] Lam, E., "Digital Restoration of Defocused Images in the Wavelet Domain," Applied Optics, Vol. 41, No. 23, 2002, pp. 4806-4811.
[12] Van Cittert, P. H., "Zum Einfluß der Spaltbreite auf die Intensitätsverteilung in Spektrallinien II" (On the Influence of Slit Width on the Intensity Distribution in Spectral Lines II), Z. Phys., Vol. 69, 1931, pp. 298-308.
[13] Jansson, P. A., R. H. Hunt, and E. K. Plyler, "Response Function for Spectral Resolution Enhancement," Journal of the Optical Society of America, Vol. 58, No. 12, 1968, pp. 1665-1666.
[14] Jansson, P. A., R. H. Hunt, and E. K. Plyler, "Resolution Enhancement of Spectra," Journal of the Optical Society of America, Vol. 60, No. 5, 1970, pp. 596-599.
[15] Jansson, P. A., "Method for Determining the Response Function of a High-Resolution Infrared Spectrometer," Journal of the Optical Society of America, Vol. 60, No. 2, 1970, pp. 184-191.
[16] Herget, W. F., et al., "Infrared Spectrum of Hydrogen Fluoride: Line Position and Line Shapes II: Treatment of Data and Results," Journal of the Optical Society of America, Vol. 52, 1962, pp. 1113-1119.
[17] Jansson, P. A., (ed.), Deconvolution of Images and Spectra, New York: Academic Press, 1997.
[18] Jain, A. K., Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[19] De Levie, R., "On Deconvolving Spectra," American Journal of Physics, Vol. 72, No. 7, 2004, pp. 910-915.
[20] Hogbom, J. A., "Aperture Synthesis with a Non-Regular Distribution of Interferometer Baselines," Astronomy and Astrophysics Suppl., Vol. 15, 1974, pp. 417-426.
[21] Schwarz, U. J., "Mathematical-Statistical Description of the Iterative Beam Removing Technique (Method CLEAN)," Astronomy and Astrophysics, Vol. 65, 1978, pp. 345-356.
[22] Schwarz, U. J., "The Method 'CLEAN' Use, Misuse and Variations," in Image Formation from Coherence Functions in Astronomy, Proceedings of International Astronomical Union Colloquium 49, C. Van Schounveld, (ed.), Boston, MA: Reidel, 1979, pp. 261-275.
[23] Segalovitz, A., and B. R. Frieden, "A 'CLEAN'-Type Deconvolution Algorithm," Astronomy and Astrophysics, Vol. 70, 1978, pp. 335-343.
[24] Tsao, J., and B. D. Steinberg, "Reduction of Sidelobe and Speckle Artifacts in Microwave Imaging: The CLEAN Technique," IEEE Transactions on Antennas and Propagation, Vol. 36, No. 4, 1988, pp. 543-556.
[25] Leshem, A., and A. J. van der Veen, "Radio-Astronomical Imaging in the Presence of Strong Radio Interference," IEEE Transactions on Information Theory, Vol. 46, No. 5, 2000, pp. 1730-1747.
[26] Chick, K. M., and K. Warman, "Using the CLEAN Algorithm to Restore Undersampled Synthetic Aperture Sonar Images," Proceedings of IEEE Conference and Exhibition on OCEANS, MTS, Vol. 1, Honolulu, HI, November 5-8, 2001, pp. 170-178.
[27] Cornwell, T., and B. Braun, "Deconvolution," Synthesis Imaging in Radio Astronomy: Conference Series, Vol. 6, San Francisco, CA: Astronomical Society of the Pacific, 1989, pp. 167-183.
[28] Young, S. S., "Alias-Free Image Subsampling Using Fourier-Based Windowing Methods," Optical Engineering, Vol. 43, No. 4, April 2004, pp. 843-855.
[29] Recognition of Combat Vehicles (ROC-V), U.S. Army Night Vision & Electronic Sensors Directorate, Version 9.3, 2005.
CHAPTER 8
Image Contrast Enhancement

Due to certain sensor or environmental factors, some acquired imagery has low contrast, which is undesirable. The image model describing this type of degradation is often difficult or impossible to obtain. Image contrast enhancement, coupled with the study of human vision, can be used to improve image quality, for example, by emphasizing image edges. In this chapter, single-scale and multiscale processing approaches used in image contrast enhancement algorithms are considered. Contrast enhancement image performance measurements are also discussed.
8.1 Introduction

Contrast can be defined as the ratio of the difference to the sum of the maximum and minimum signal amplitudes (the Michelson contrast) [1, 2]; that is,

V = (Emax − Emin) / (Emax + Emin)

When this concept is applied to image contrast, it is often expressed as the ratio of the difference between two image intensity values to their sum; that is,

C = (Io − Ib) / (Io + Ib)
where Io and Ib are the image values of the object and the background, respectively. Many factors degrade image contrast during the capture and display process. One factor is image artifacts and noise. For example, in medical imaging, scatter radiation in X-ray projection radiography can strongly degrade the image contrast [3], and ultrasound images formed with coherent energy suffer from speckle noise [4]. Another factor is that many image capture devices can record a signal dynamic range that is wider than a typical display can effectively render. For example, consumer color negative films can record a scene luminance range of 1,000:1, but photographic color paper can only render a luminance range from 50:1 to 100:1 [2]. Another example is that the sky-to-ground interface in infrared imaging can include a huge apparent temperature difference that occupies most of the image dynamic range [5]. In all of these examples, images can suffer significant loss of detail for small objects with small signals.
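As a small numeric illustration of the two contrast definitions above (a minimal sketch with made-up intensity values, not data from the text):

```python
import numpy as np

def michelson_contrast(patch):
    """Michelson contrast V = (Emax - Emin) / (Emax + Emin) of an image patch."""
    e_max, e_min = float(patch.max()), float(patch.min())
    return (e_max - e_min) / (e_max + e_min)

def object_contrast(i_obj, i_bkg):
    """Object-to-background contrast C = (Io - Ib) / (Io + Ib)."""
    return (i_obj - i_bkg) / (i_obj + i_bkg)

patch = np.array([[90.0, 110.0], [100.0, 120.0]])
print(round(michelson_contrast(patch), 3))     # 0.143
print(round(object_contrast(120.0, 90.0), 3))  # 0.143
```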
Image contrast enhancement is a process of increasing the contrast of a chosen feature (or object). Clearly, contrast amplification also amplifies noise or unwanted features if no additional steps are taken to achieve good images. Therefore, the goal of an image contrast enhancement technique is to improve the visual appearance of an image (e.g., the details of the desired features) while limiting the artifacts introduced. The enhancement process does not increase the inherent information content in the data, as is achieved in super-resolution image reconstruction (see Chapter 6). It does, however, increase the dynamic range of the chosen features so that they can be easily detected. Note that in an image contrast enhancement process there is no conscious effort to improve the fidelity of a reproduced image with regard to some ideal form of the image, as is performed in the image deblurring (image restoration) process (see Chapter 7). In this chapter, we emphasize two categories of image enhancement algorithms that are widely used for image contrast enhancement: the single-scale approach and the multiscale approach. The single-scale approach is a process in which the image is processed in the original image domain. In the multiscale approach, the image is decomposed into multiple resolution scales, and the processing is performed in the multiscale domain. The output image is then formed through either combination or reconstruction processes from the results in the multiscale domain. We also present an image performance measurement for one of the contrast enhancement algorithms.
8.2 Single-Scale Process

In the single-scale process, the image is processed in the original image domain (e.g., a simple look-up table is used to transform the image). The most common single-scale methods are contrast stretching, histogram modification, and region growing, which are described in the following sections.

8.2.1 Contrast Stretching
The image presentation on an output display medium can sometimes be improved by applying a transfer function to rescale each pixel. This transfer function is called the contrast stretching transformation, which can be expressed as [6]

Io = α Ii,                     0 ≤ Ii < Iia
   = β (Ii − Iia) + Ioa,       Iia ≤ Ii < Iib
   = γ (Ii − Iib) + Iob,       Iib ≤ Ii < L

where the input gray level Ii ∈ [0, L] is mapped into an output gray level Io. The remaining parameters are defined in Figure 8.1. More sophisticated contrast stretching functions can be defined nonlinearly and incorporated to enhance human perception capabilities within a system; the reader is referred to the discussion of tone scale in Chapter 10.
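A minimal Python sketch of this piecewise-linear stretching is given below; the breakpoint and slope values are illustrative assumptions, not values from the text.

```python
import numpy as np

def contrast_stretch(img, i_a, i_b, o_a, o_b, L=255.0):
    """Piecewise-linear contrast stretching of gray levels in [0, L].

    The input range [i_a, i_b] is mapped to [o_a, o_b]; levels below i_a and
    above i_b are compressed, so mid-range detail is expanded."""
    alpha = o_a / i_a
    beta = (o_b - o_a) / (i_b - i_a)
    gamma = (L - o_b) / (L - i_b)
    return np.where(
        img < i_a, alpha * img,
        np.where(img < i_b, beta * (img - i_a) + o_a, gamma * (img - i_b) + o_b),
    )

img = np.array([[20.0, 80.0], [160.0, 240.0]])
print(contrast_stretch(img, i_a=50, i_b=200, o_a=20, o_b=235))
```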
Figure 8.1 Contrast stretching transformation (input gray level Ii versus output gray level Io).

8.2.2 Histogram Modification
The histogram of an image represents the relative frequency of occurrence of the various gray levels in the image. The histogram modification method is based on the idea that if the gray-level differences among neighboring pixels are stretched out by modifying the histogram of a local window, the contrast of the image details will be improved. Histogram modification can be coupled with contrast stretching: contrast stretching changes the histogram shape, and histogram modification changes the contrast. Examples are provided in Chapter 10. For more details on histogram modification and its variations, the reader is referred to [6, 7].
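As one common form of histogram modification (a global histogram equalization sketch of our own, not an algorithm given in the text):

```python
import numpy as np

def histogram_equalize(img, levels=256):
    """Global histogram equalization of an 8-bit grayscale image.

    The normalized cumulative histogram is used as the gray-level mapping,
    which spreads out heavily populated gray levels and stretches the
    gray-level differences among neighboring pixels."""
    hist, _ = np.histogram(img.ravel(), bins=levels, range=(0, levels))
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    mapping = np.round(cdf * (levels - 1)).astype(np.uint8)
    return mapping[img]

img = np.clip(np.random.normal(120, 10, (64, 64)), 0, 255).astype(np.uint8)
print(img.std(), histogram_equalize(img).std())  # gray-level spread increases
```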
8.2.3 Region-Growing Method
The region-growing method has been used for image contrast enhancement [8]. Morrow et al. applied the region-growing method to improve mammogram images. The growing starts at areas of interest in test mammogram images that are identified with the aid of an experienced radiologist. A tolerance parameter k is selected such that if a pixel value is between (1 − k)Ii and (1 + k)Ii, then the pixel belongs to the region, where Ii is the gray value of the starting (seed) pixel. A new contrast is then reassigned based on the properties of the region. Because the initial seed is selected manually, the practical use of this method is reduced. Other region-growing-based methods have also been introduced, such as retinex theory, which was originally proposed by Land and McCann [9] in the 1970s. The original idea was to adjust image intensities for images undergoing large lighting changes. Readers are referred to developments of the retinex method in [10-13].
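The following Python sketch illustrates the tolerance test described above (a simplified region grower of our own, not Morrow et al.'s implementation; the seed position and tolerance value are arbitrary):

```python
import numpy as np
from collections import deque

def grow_region(img, seed, k):
    """Grow a region from a seed pixel: a 4-connected neighbor joins the region
    if its value lies between (1 - k)*Ii and (1 + k)*Ii, where Ii is the gray
    value of the seed pixel."""
    i_seed = float(img[seed])
    lo, hi = (1.0 - k) * i_seed, (1.0 + k) * i_seed
    mask = np.zeros(img.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and not mask[nr, nc] and lo <= img[nr, nc] <= hi):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask  # a new contrast can then be assigned to the masked region

img = np.array([[100, 102, 180], [101, 99, 175], [98, 100, 170]], dtype=float)
print(grow_region(img, seed=(1, 1), k=0.05))
```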
8.3 Multiscale Process

In this section, we first give an overview of multiresolution analysis. Then, we describe two multiscale methods in detail. One is the contrast enhancement method based on unsharp masking (USM), which is a special case among multiscale approaches, since the image is decomposed into only two scales. The other is the contrast
enhancement method based on wavelet edges, which captures image details as edge signals at each resolution scale.

8.3.1 Multiresolution Analysis
Multiresolution analysis is a technique for studying image characteristics at different resolutions. Since one resolution is considered a frequency band, when an image is represented at different resolutions, the image details at different scales can be detected separately from the same acquired image. There is no need to acquire several images at different resolutions (i.e., with different lenses) to detect image details of different scales. Multiresolution analysis has been thoroughly studied in image processing since the early 1970s [14]. The process decomposes the image into multiple resolutions. This is called the multiresolution representation, where the decomposed image at each resolution is the approximation of the image at that resolution. The image details are then extracted as the difference of information between the approximations of an image at two different resolutions. For example, given a sequence of increasing resolutions, the details of f(x) at the resolution 2^m are defined as the difference of information between the approximation of f(x) at the resolution 2^m and the approximation at the resolution 2^(m+1). There are many multiresolution representation techniques. One hierarchical framework for interpreting the image information is the pyramidal multiresolution decomposition [15, 16]. It is a coarse-to-fine technique. At a coarse resolution, the image details are characterized by very few samples; the finer details are characterized by more samples. Burt and Adelson [17] and Tanimoto and Pavlidis [18] developed efficient algorithms to compute the approximation of a function at different resolutions and to compute the details at different resolutions. Because the Laplacian filter is used, this data structure is also called the Laplacian pyramid. The multiresolution representation can also be implemented as a subband coding scheme, in which a quadrature mirror filter (QMF) pair [19] gives the lowpass and detail signals. Vaidyanathan gives a good tutorial of QMF filters and applications in [20, 21]. Another implementation scheme is scale-space filtering, which was introduced by Witkin in the 1980s [22].
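As a concrete illustration of a pyramidal multiresolution decomposition (a minimal sketch in the spirit of the Laplacian pyramid, not the exact algorithm of [17]; the blur scale and level count are arbitrary choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def detail_pyramid(img, n_levels=3, sigma=1.0):
    """Decompose an image into bandpass detail levels plus a coarse residual.

    Each level stores the difference between the current approximation and a
    blurred, downsampled, and re-upsampled version of it, i.e., the image
    detail at that resolution."""
    levels, current = [], img.astype(float)
    for _ in range(n_levels):
        coarse = gaussian_filter(current, sigma)[::2, ::2]   # next (coarser) approximation
        upsampled = zoom(coarse, 2.0, order=1)[: current.shape[0], : current.shape[1]]
        levels.append(current - upsampled)                   # detail at this scale
        current = coarse
    levels.append(current)                                   # coarse approximation
    return levels

pyr = detail_pyramid(np.random.rand(64, 64))
print([lvl.shape for lvl in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```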
8.3.2 Contrast Enhancement Based on Unsharp Masking
Unsharp masking (USM) has a long history in the photography and printing industry, but its systematic exploration as a technique for enhancing image details [23] began around the 1940s. To present high-frequency image details without loss of overall image contrast, USM techniques decompose the image into two scales, as shown in Figure 8.2. The image is first passed through a lowpass filter to obtain a reduced-resolution (low-frequency) image. This lowpass-filtered image is considered an unsharp version of the image. The high-frequency image is obtained by subtracting the low-frequency image from the original image.
Figure 8.2 Banding artifact from the unsharp masking method (the low-frequency component is compressed and the high-frequency component is copied).
By modifying (or compressing) the low-frequency components and amplifying (or copying) the high-frequency components (unsharp masking), one can combine the two modified components to form a new image that has enhanced image detail. The USM operation can be expressed mathematically as

Io(x, y) = α IL(x, y) + β IH(x, y)

where α < 1, β > 1, and IH(x, y) = Ii(x, y) − IL(x, y). Here, Ii, IL, and IH denote the original image, the low-frequency components, and the high-frequency components of the image, respectively.

Many variations of USM have been developed. The method in [24] combined two different USM sizes (7 × 7 and 25 × 25) to achieve enhancement of high and medium frequencies. Chang and Wu [25] and Narendra and Fitch [26] described methods of adaptively adjusting the gain of the high-frequency components by using human visual properties. These three methods are variations of the well-known USM method, and all of them suffer from the same drawback as USM (i.e., dark and bright banding around high-contrast edges, as illustrated in Figure 8.2). This artifact is known as edge banding, or the Mach-band artifact.
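A minimal Python sketch of the two-scale USM operation above; the Gaussian lowpass filter and the gain values are illustrative assumptions, not parameters given in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, sigma=2.0, alpha=0.8, beta=2.0):
    """Two-scale unsharp masking: Io = alpha*IL + beta*IH, with IH = Ii - IL.

    alpha < 1 compresses the low-frequency (unsharp) component, and
    beta > 1 amplifies the high-frequency detail component."""
    i_low = gaussian_filter(img.astype(float), sigma)  # unsharp (lowpass) version
    i_high = img - i_low                               # high-frequency detail
    return alpha * i_low + beta * i_high

img = np.random.rand(128, 128) * 255.0
print(img.std(), unsharp_mask(img).std())  # the detail content is amplified
```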
8.3.3 Contrast Enhancement Based on Wavelet Edges

8.3.3.1 Multiscale Edges
The USM technique causes the Mach-band artifacts. These artifacts are objectionable in areas where there are rapid changes in the image. For example, this artifact is seen at the sharp boundaries between a bright target and a dark background. There are several ways to avoid or reduce the severity of the banding artifacts. One way that was taught in [27] is to decompose the image into more than two channels and fine-tune the frequency response of each channel carefully so that the banding artifacts are less visible. The ideal case is to selectively reduce the edges that
are not desired, and to preserve and even increase other edges. This ideal case is shown in Figure 8.3. Psychophysical and physiological evidence [28, 29] indicates that the human visual system is sensitive to high-frequency signals (local intensity changes). What the retinal ganglion cell does with the incoming signal is the result of a comparison of the amount of light hitting a certain spot on the retina with the average amount falling on the immediate surroundings. These local changes in the images are usually called edges. Another phenomenon in the human visual system is called the saccade (or jump). When exploring visual surroundings, the movement of the human eye is not smooth and continuous. The eyes fixate on an object for a short duration and then jump to a new position to fixate on a new object. The edges between the object of interest and its surroundings focus attention on the object. When the local changes are small compared to the low-frequency background, they are difficult to detect; therefore, properly enhancing local changes (edges) in an image improves the visualization. Capturing image details as edge signals at multiple scales helps enhance image details more effectively while reducing the artifacts. It is well known that the image edge is usually computed as the image gradient between neighboring pixels. This representation is not sufficient for two reasons: (1) there is noise, and the pixel gradients are not a reliable estimate of the scene change, and (2) the spatial extent of the scene change is not explicitly represented in the local gradient. This is basically a scale problem.
Figure 8.3 An example of the ideal case in edge amplitude adjustment: undesired edge amplitudes are compressed, while desired edge amplitudes are preserved or increased.
For example, an edge with a large spatial scale, such as the large temperature difference between a cold roof and a warm sky in an infrared image, and an edge with a small spatial scale, such as the small temperature difference between a person and a wall inside a building, cannot be detected at the same image resolution. Therefore, it is useful to examine edges at different scales. In the following section, we describe a method that captures image details as edge signals at each resolution; it is referred to as multiscale edge detection by Mallat and Zhong [30].

8.3.3.2 Multiscale Edge Detection
Mallat and Zhong developed a multiscale edge detection algorithm for 2-D signals based on the wavelet transform [30] described in Section 4.5.5. Two components, Wsx and Wsy, of the wavelet transform of the 2-D signal g(x, y) at the scale s are defined in (4.125). The image gradient amplitude is then computed from these two wavelet coefficients as

Gs = √[(Wsx)² + (Wsy)²]
The edge signals are characterized at the scale s by the gradient amplitude Gs(x, y). As described in Section 4.5.6, LaRossa and Lee [31] formulated block diagrams of the multiscale edge decomposition and reconstruction, as shown in Figures 4.26 and 4.27. The corresponding impulse responses of the forward and inverse wavelet functions are given in Tables 4.1 and 4.2. In the next section, a wavelet edge modification method is described to achieve the goal of selectively reducing the undesired edges while preserving, or even increasing, other edges, as shown in Figure 8.3.
8.3.3.3 Wavelet Edge Modification
One method to enhance the edge signals in an image is to modify the wavelet coefficients h′m and v′m before the reconstruction process, as shown in Figure 4.27. In the first step, multiscale wavelet coefficients, representing the partial derivatives of an input image, are computed by applying a forward edge-wavelet transform to the input image to produce its partial derivatives, hm and vm, at several spatial resolutions, m = 1, 2, …, N. Next, the partial derivatives (wavelet coefficients) are modified to produce new partial derivatives h′m and v′m. Finally, the modified wavelet coefficients h′m and v′m are used in an inverse transform to produce a processed (enhanced) output image. Hence, the edge information is modified, and the output image contains the important image details. An overview of the multiscale process method is illustrated in the block diagram of Figure 8.4. The goal of wavelet edge modification is to preserve or amplify the edges that are important and to suppress the edges that are not important. The "importance" of edges differs from image to image, and different applications have different needs for image contrast enhancement. For example, in remote sensing, large-scale edges, such as roads, need to be preserved while edges of small districts and streets need to be eliminated; in digital mammography, it is the opposite: large-scale edges of skin lines need to be eliminated while small-to-medium scale edges representing useful structures need to be preserved.
Figure 8.4 A block diagram of an overview of the multiscale process method: the input image passes through the forward edge-wavelet transform (hm, vm), which yields the mountain-view presentation output; the wavelet coefficients are then modified (h′m, v′m) and passed through the inverse edge-wavelet transform to produce the contrast-reconstruction presentation output.
A simple example is to let the user set a threshold and modify the coefficients using linear stretching, as proposed in [32] (see Figure 8.5):

h′m = λ hm,   if pK ≥ µ
    = 0,      if pK < µ

where µ is the threshold, λ is an adjustable constant, and pK is a defined criterion. In [32], this criterion is designed by exploiting the multiscale correlation of the desired signal [33]:

pK = ∏(m = 1 to K) hm
where K is the maximum number of scales for computing correlations. The choice of µ depends on the noise level of the image. The user has the option to input different values of λ for different degrees of enhancement.
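A minimal Python sketch of this correlation-thresholded linear stretching; the coefficient arrays and the values of λ, µ, and K below are illustrative assumptions.

```python
import numpy as np

def stretch_coefficients(h, lam=2.0, mu=0.1, K=2):
    """Modify wavelet coefficients by linear stretching with a multiscale
    correlation criterion: pK is the pointwise product of the first K scales;
    where pK >= mu the coefficient is amplified by lam, elsewhere it is zeroed."""
    p_k = np.ones_like(h[0])
    for m in range(K):
        p_k = p_k * h[m]                                  # multiscale correlation
    return [np.where(p_k >= mu, lam * hm, 0.0) for hm in h]

h = [np.random.randn(8, 8) for _ in range(3)]             # coefficients at 3 scales
h_mod = stretch_coefficients(h)
print(h_mod[0].shape, int(np.count_nonzero(h_mod[0])))
```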
Figure 8.5 A simple linear stretching function: modified coefficient h′m versus the criterion pK(hm), with threshold µ.
In this approach, the stretching is constant at all scales, and features of all sizes are equally enhanced. Another approach is scale-variable stretching [34], in which the stretching varies with scale so that features of different sizes are selectively enhanced:

h′m = λm hm,   m = 1, 2, …, N
where λm is a given constant for the scale index m.

Adaptive gain nonlinear processing is also used in various applications [3, 4, 35-37]. The adaptive gain operation is generalized to incorporate hard thresholding to avoid amplifying noise and to remove small noise perturbations. A generalized adaptive gain operator is defined as [4]

EGAG(v) = 0,                 if |v| < T1
        = sign(v) T2 + ū,    if T2 ≤ |v| ≤ T3
        = v,                 otherwise

where v ∈ [−1, 1], 0 ≤ T1 ≤ T2 < T3 < 1, b ∈ (0, 1), c is a gain factor, and the other parameters are computed as

ū = a (T3 − T2) {sigm[c(u − b)] − sigm[−c(u + b)]}
u = sign(v) (|v| − T2) / (T3 − T2)
a = 1 / {sigm[c(1 − b)] − sigm[−c(1 + b)]}
sigm(x) = 1 / (1 + e^−x)
EGAG is simply an enhancement operator, as shown in Figure 8.6, and T1, T2, and T3 are selected parameters. The interval [T1, T2] serves as a sliding window for feature selectivity; it can be adjusted to emphasize features within a specific range of variation. By selecting a gain, a window, and the other parameters, distinct enhancements can be achieved. Thus, through this nonlinear operator, the wavelet coefficients are processed for feature enhancement by

h′m = Mm EGAG(hm / Mm),   Mm = max(m, n) |hm(m, n)|

where the position (m, n) ∈ D, the domain of the input image f(m, n). Nonlinear operators for modifying the wavelet coefficients in color images are also proposed in [31, 38].
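A Python sketch of this generalized adaptive gain operation applied to one set of wavelet coefficients; the parameter values are illustrative assumptions, and the absolute-value terms follow the reconstruction of the definition above.

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def gag(v, c=20.0, b=0.35, T1=0.02, T2=0.1, T3=0.9):
    """Generalized adaptive gain operator EGAG(v) for v in [-1, 1]."""
    a = 1.0 / (sigm(c * (1.0 - b)) - sigm(-c * (1.0 + b)))
    u = np.sign(v) * (np.abs(v) - T2) / (T3 - T2)
    u_bar = a * (T3 - T2) * (sigm(c * (u - b)) - sigm(-c * (u + b)))
    return np.where(np.abs(v) < T1, 0.0,                       # suppress small (noise) values
           np.where((np.abs(v) >= T2) & (np.abs(v) <= T3),
                    np.sign(v) * T2 + u_bar,                   # enhance mid-range values
                    v))                                        # pass the rest unchanged

def enhance_coefficients(h_m):
    """h'_m = M_m * EGAG(h_m / M_m), with M_m = max |h_m|."""
    m_max = np.abs(h_m).max()
    return m_max * gag(h_m / m_max)

h = np.random.randn(16, 16) * 0.2
print(round(float(np.abs(h).mean()), 3),
      round(float(np.abs(enhance_coefficients(h)).mean()), 3))
```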
Figure 8.6 A simple generalized adaptive gain (GAG) function, EGAG plotted versus v, with the thresholds T1, T2, and T3 marked.

8.3.3.4 Output Contrast Enhancement Presentation
Figure 8.4 illustrates an overview of the multiscale contrast enhancement method. There are two output images. The first output image is called the mountain-view presentation. The input image is passed through the forward edge-wavelet transformation to produce the multiscale-edge presentation, which consists of the partial derivatives of the input image. This multiscale-edge presentation shows a mountain (or hill) presentation of the input image; the mountains are the areas containing high-contrast edges. Instead of displaying the image in the traditional intensity domain, the mountain-view presentation displays the image in a multiscale-edge domain. Therefore, the high-contrast edges are displayed in an enhanced manner. In Figure 8.7(a), an original infrared image shows an urban scene. Figure 8.7(b) shows the mountain-view presentation of this original input image. There is a helmet hiding inside one of the windows at the lower right side; however, the helmet is hardly visible in the original input image. In the mountain-view presentation, the helmet "stands out" from the surrounding background because the edges of the helmet are prominently displayed. The second output image is the contrast-reconstruction presentation, which is shown in the lower half of Figure 8.4. This is the output for which the wavelet coefficients are generated, modified, and reconstructed through the process described previously. Figure 8.7(c) shows the contrast-reconstruction presentation of the infrared image in Figure 8.7(a). Similarly, the helmet inside the lower window is displayed more clearly than in the original image because the edges of the helmet are emphasized, yet the overall image presentation is not distorted.
Figure 8.7 An infrared image showing an urban scene: (a) original image; (b) mountain-view presentation; and (c) contrast-reconstruction presentation.
8.4 Contrast Enhancement Image Performance Measurements

8.4.1 Background
The Combined Arms Operation in Urban Terrain (CAOUT) document [39] estimated that by 2010, 75 percent of the world’s population will live in urban areas. Urban areas are expected to be the future battlefield, and combat in urban areas cannot be avoided. Conflict in the urban environment has unique characteristics that are not found in the traditional battlefield. For example, instead of distinct battle lines, the fighting may surround the soldier, including from above and possibly below. Conditions can rapidly change from peacekeeping missions to high-intensity situations and back again. Buildings can provide cover for hostile forces as well as civilians. The direction of gunfire can be masked by echoes and by ricocheting ammunition. Combined with these confusing factors, the range of visibility for a ground patrol is greatly reduced in urban areas due to surrounding structures. “Only 5 percent of targets are more than 100 meters away. About 90 percent are located 50 meters or less from the soldier” [40]. This environment can be dangerous and chaotic. The
unique nature of the urban environment creates new requirements for the search-and-detection task. Soldiers are required to observe a greater area than in the traditional battlefield. Image contrast enhancement techniques could be useful tools in assisting the soldier's task of searching an area for threats. The U.S. Army Night Vision and Electronic Sensors Directorate (NVESD) has examined the field-of-view (FOV) search task in the urban environment and has shown that the Edwards-Vollmerhausen Time Limited Search (TLS) model [41] can be adapted or calibrated to accurately describe human performance while searching the FOV in an urban environment. The impact of the image enhancement processing on human performance is quantified by some of the calibration parameters of the TLS model, which is discussed in the next section.
8.4.2 Time Limited Search Model
The TLS model was used to represent the behavior of computer-generated forces in war games and other simulations. This model was developed to characterize the search task in rural areas. The TLS model predicts the drop in probability of detection (i.e., the decrease in the likelihood of finding the target) as a function of shortened observation time during a FOV search. For a vehicle-borne sensor used in a rural environment, an observer first scans the image for areas of interest (such as potential targets) using the wide FOV (WFOV) setting and then switches to the narrow FOV (NFOV) to further interrogate the area of interest. The observer then determines whether there is a target of military interest and continues the engagement process, or declares a false alarm (FA) and returns to the WFOV. The observer continues to switch between FOVs until a target is detected or the decision is made that no target is present in the FOV. This search process timeline is shown in Figure 8.8.

The TLS methodology was designed to represent the total time that an observer takes to detect a target when operating under time-constrained conditions. This included the initial time searching in the WFOV, time wasted on FAs, and any additional search time required until a real target is detected or the observer is convinced that a target does not exist in the FOV. The primary focus of the model was the initial time associated with the observer scanning the area in the WFOV before detecting a target, or finding something that would cause them to change to the NFOV (i.e., the dotted box in Figure 8.8). However, the overall impact could not be assessed and implemented without addressing other events, such as FAs, that occur under the same time-constrained conditions. It was assumed that the observers would make the effort to look at an area of interest as quickly as possible and not waste time during the search process; after all, there may be a threat vehicle in the FOV ready to engage.
Figure 8.8 Search process timeline: WFOV search, switch to NFOV, then either target detected, FA (return to WFOV), or no target present (move to the next FOV).
In the search task, the probability of detecting a target in a given time was modeled as [42]

P(t) = P∞ [1 − e^−(t − td)/τ]

where P(t) is the fraction of observers (percentage) who find the target in time t; t is the time allowed to view the image; P∞ is the asymptotic probability that observers would achieve given infinite time (derived from empirical measurement); td is the time delay associated with the experimental or hardware interface; and τ is the mean time required to find the target. The relationship between τ and P∞ in this model is [41]

τ = 3 − 2.2 P∞
This model was recently approved by the U.S. Army Materiel Systems Analysis Activity (AMSAA) for the traditional search task (the rural terrain scenario). NVESD has since applied the methodology to the search-and-detection task in the urban environment. The TLS model was chosen over other models because of its emphasis on the limited time available to detect a target; due to the speed at which conditions can change in the urban environment, the importance of time needed to be emphasized.

8.4.3 Experimental Approach

8.4.3.1 Field Data Collection
A set of imagery was collected from a U.S. Army Urban Operation training site. The Urban Operation training site used in these experiments is modeled after a small, European-style town. This town consists of approximately 20, predominantly two-story, buildings; some buildings also have access to flat roofs. Painted cinderblock was the dominant building material, as well as metal doors and Plexiglas windows (many of which were opened). For these data collections, most of the targets were military combatants dressed in combat fatigues and carrying a rifle. The rest were civilians in informal dress. The targets were instructed to hide between buildings, in doorways and windows, on rooftops, and so on. Often, only a part of the target was visible because it was obscured by the buildings. An objective was to get a set of images representative of different times of the day. Collection times were centered around 0200 and 1400 to avoid thermal crossover. The criteria for choosing the perception experiment imagery were based on the perceived difficulty of detection and how closely the scenes from different wavebands could be matched. The original imagery was collected with the midwave infrared (MWIR) sensor, Avio TVS-8500. It had a 256 × 236 InSb detector array and was cooled by an integrated stirling cooler. Internal filters limited the wavelengths to the 3.5- to 4.1-µm and 4.5- to 5.1-µm bands, which effectively filtered any photons generated due to CO2 photon emission in the atmosphere. The FOV was 14.6 degrees by 13.7
degrees, and the detector pitch was 30 µm, yielding an instantaneous FOV (IFOV) of 1 mrad. The minimum temperature resolution (MTR) was 0.02°C for a 30°C blackbody, and the temperature measurement accuracy was ±2°C. Examples of MWIR day and night imagery are shown in Figure 8.9. The original imagery was processed with the following gain-and-level processing:

Io = [(Ii − Imin) / (Imax − Imin)] · Bit
where Ii was the original image, Imin and Imax were the minimum and maximum values of the image, respectively, and Bit was the bit depth. The gain-and-level process approximates the gain control that is available to the sensor operator on most IR sensors. The original radiometric images were also processed using contrast enhancement based on multiscale wavelet edges, producing three output contrast enhancement presentations: mountain-view, contrast-reconstruction, and gradient. Referring to Section 8.3.3, the gradient image was calculated as
Gm = √[(hm)² + (vm)²]
where hm and vm were the horizontal and vertical wavelet coefficients, respectively, and m was the resolution scale. In this experiment, m = 2 was used. For comparison purposes, human observer performance on the contrast enhancement processed images was compared with the performance achieved using gain-and-level processing. For simplicity, we refer to the images processed using the gain-and-level processing as "original images." The four image types that were used in the perception experiment are illustrated in Figure 8.10. A sketch of the preprocessing is given below.
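In the following Python sketch, the gain-and-level step follows the equation above (with Bit taken here as the maximum output gray level), while the gradient presentation uses smoothed finite differences as a stand-in for the edge-wavelet coefficients hm and vm of Section 8.3.3, so it is only an approximation of the processing used in the experiment.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gain_and_level(img, bit=255.0):
    """Io = (Ii - Imin) / (Imax - Imin) * Bit (Bit taken as the max output gray level)."""
    i_min, i_max = float(img.min()), float(img.max())
    return (img - i_min) / (i_max - i_min) * bit

def gradient_presentation(img, scale=2):
    """Gradient image Gm = sqrt(hm^2 + vm^2) at resolution scale m.

    hm and vm are approximated here by horizontal and vertical derivatives of
    the image smoothed to scale m (a stand-in for edge-wavelet coefficients)."""
    smoothed = gaussian_filter(img.astype(float), sigma=2.0 ** scale)
    h_m = np.gradient(smoothed, axis=1)   # horizontal detail
    v_m = np.gradient(smoothed, axis=0)   # vertical detail
    return np.sqrt(h_m ** 2 + v_m ** 2)

img = np.random.rand(128, 128) * 4000.0   # radiometric counts (illustrative)
print(gain_and_level(img).max(), gradient_presentation(img).shape)
```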
8.4.3.2 Experiment Design
The search experiment was divided into four types of imagery: original, mountain-view, contrast-reconstruction, and gradient. There were 70 images in each experiment, divided into five cells with 14 images each. Only approximately 5 percent of the images contained no target. Observers were instructed to study a Microsoft PowerPoint presentation to familiarize themselves with the experimental procedure and the images. They were given the following instructions:
Figure 8.9 Examples of MWIR day and night imagery.

Figure 8.10 Examples of FLIR search images: (a) original; (b) contrast-reconstruction presentation; (c) mountain-view presentation; and (d) gradient presentation.
Observers will be taking four experiments with 70 images each. These experiments are designed to determine the effects of different image processing techniques on the soldier's ability to detect a target in the urban environment. The image processing techniques are as follows: mountain-view, contrast-reconstruction, original, and gradient. There are examples of each in the training experiments. Each image will be presented for a maximum of 9 seconds. Targets are defined as all humans that appear in the images and all military equipment (rifles). Soldiers are tasked to scan the image as quickly as possible and identify:

• Any region of the scene that contains a target;
• Any region of the scene you would investigate more closely;
• If more than one of these regions exists, the one most likely to contain the target;
• If there are no regions of the scene that fit these descriptions, click on the no-target button located to the right of the image or press the space bar.
Please ensure that the soldiers do not take all of the experiments in the same order. The soldiers should also not take more than one experiment per day. This should negate any learning effects. Included in this experiment is a PowerPoint presentation describing the experiment, four training experiments with 5 images each, and four experiments with 70 images each.
Observers performed one training experiment (using imagery different from the true experiment) with a 3-second limit before proceeding to the 9-second time limit. The 3-second portion was used to condition the observers to work quickly, while the 9-second portion was used for analysis. An 8-image practice experiment with the appropriate time limits was administered to the observers so they could become acquainted with the interface. A total of 24 military observers took these experiments.

8.4.4 Results
The results of these experiments are analyzed separately for day and night images. The two parameters used as measures of change are the probability of detection and the time to detect targets.

8.4.4.1 Results: Night Images
The probability of detection for each image that contains a target is provided in Figure 8.11, allowing a direct comparison between the image enhancement presentations. There is a systematic increase in the probability of detection for the contrast-reconstruction images as compared to all other images, including the original imagery used to replicate sensor controls. The numerical analysis is shown in the next section. However, for the gradient and mountain-view presentations, the original imagery provided higher probabilities of detection for more images. The same graph is created for the average time to detect the target, as shown in Figure 8.12. For half of the images, the original images require the observer to use more time to detect the targets than any of the other contrast enhancement processing presentations.
Figure 8.11 Probability of detection (Pd) for each of the 35 night images that contain a target (original, contrast-reconstruction, gradient, and mountain-view presentations).
Figure 8.12 The average time (in seconds) to detect a target for each night image (original, contrast-reconstruction, gradient, and mountain-view presentations).
For the night images, most of the contrast-reconstruction images had higher probabilities of detection and required less time for detection than the original images.

8.4.4.2 Results: Day Images
Similar plots were prepared for the day images to evaluate the probability of detection for each image, allowing a direct comparison between the image enhancement processing presentations. As shown in Figure 8.13, there is a slight increase in the probability of detection for the contrast-reconstruction images as compared to all other images. However, for the gradient and mountain-view presentations, the original images provided higher probabilities of detection for more images. Figure 8.14 displays the average time to detect a target for each day image. From Figure 8.14, it can be seen that the contrast-reconstruction and mountain-view presentations require less time than the original images. The gradient images require approximately the same amount of time as the original images.

8.4.5 Analysis
A more rigorous analysis is performed by calculating the average difference in the probability of detection and the time to detect a target between the original images and all other contrast enhancement presentation images. By calculating the difference between the probability of detection of the original images and that of each of the other contrast-enhancement presentation images, a confidence interval may be calculated.
Figure 8.13 Probability of detection (Pd) for each of the 35 day images that contain a target (original, contrast-reconstruction, gradient, and mountain-view presentations).
Figure 8.14 The average time (in seconds) to detect a target for each day image (original, contrast-reconstruction, gradient, and mountain-view presentations).
The difference of the probability is calculated as follows:

Pdiff = Porig − Pother

where Porig is the probability of detection for the original images, and Pother is the probability of detection for the same image using a different contrast enhancement presentation. Averaging the Pdiff values over all images represents the average change between the original images and the other presentation images.
The confidence interval is then calculated as a t statistic with n − 1 degrees of freedom, as shown in the following:

CI = ± t_(α/2)(n − 1) · σ / √n
where σ is the standard deviation of the performance difference values for the probability of detection, n is the sample size, and t is the t-distribution value for n − 1 degrees of freedom. The 95 percent confidence interval means that the difference between the parameters lies within the interval 95 percent of the time. A statistically significant result occurs when the confidence interval is applied to the average value and the resulting interval does not cross zero.
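A Python sketch of this difference-and-confidence-interval calculation (the probability values below are made up for illustration):

```python
import numpy as np
from scipy.stats import t

def mean_difference_ci(p_orig, p_other, alpha=0.05):
    """Average difference Porig - Pother and the half-width of the
    (1 - alpha) confidence interval CI = t_(alpha/2)(n - 1) * sigma / sqrt(n)."""
    diff = np.asarray(p_orig) - np.asarray(p_other)
    n = diff.size
    sigma = diff.std(ddof=1)                                 # sample standard deviation
    half_width = t.ppf(1.0 - alpha / 2.0, n - 1) * sigma / np.sqrt(n)
    return diff.mean(), half_width                           # significant if |mean| > half_width

p_orig = np.array([0.60, 0.70, 0.50, 0.80, 0.65])
p_cr = np.array([0.70, 0.75, 0.60, 0.85, 0.70])
mean_diff, ci = mean_difference_ci(p_orig, p_cr)
print(round(float(mean_diff), 3), round(float(ci), 3))
```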
8.4.5.1 Analysis: Night Images
The statistical values for the differences in the probability of detection for the night images are shown in Table 8.1. Both the contrast-reconstruction and gradient images show an improvement in the probability of detecting targets; however, the gradient images do not show a statistically significant improvement. The contrast-reconstruction images showed an average improvement of 5 percent over the original images. Performance with the gradient images was statistically the same as with the original images, and performance with the mountain-view images was worse than with the original images.

The same analysis was performed for the average time of detection, with the statistical values shown in Table 8.2. Considering the time required to detect targets, all contrast enhancement processed images required less time than the original images. However, the most significant average time difference among the processing presentations was for the contrast-reconstruction presentation. This contrast enhancement output presentation requires on average 0.4 second less than the original images.

8.4.5.2 Analysis: Day Images
The same analysis was repeated for the day images for both the probability of detection and the average time to detect the target. Table 8.3 shows the statistics of the difference in the probability of detecting a target between the original images and the other processed images. The difference in the probability of detection is only statistically significant when the original images are compared to the gradient and mountain-view presentation images.
Table 8.1 Statistical Values for the Difference Between the Probability of Detection of the Original Images and the Other Enhancement Presentations for Night Images

                                  Original −                 Original −    Original −
                                  Contrast Reconstruction    Gradient      Mountain View
Average                           −0.056                     −0.007        0.042
Standard deviation                0.131                      0.226         0.173
95 percent confidence interval    0.044                      0.076         0.058
Table 8.2 Statistical Values for the Difference Between the Average Time of Detection of the Original Images and the Other Enhancement Presentations for Night Images

                                  Original −                 Original −    Original −
                                  Contrast Reconstruction    Gradient      Mountain View
Average                           0.442                      0.212         0.222
Standard deviation                0.464                      0.597         0.679
95 percent confidence interval    0.158                      0.204         0.232
Table 8.3 Statistical Values for the Difference Between the Probability of Detection of the Original Images and the Other Enhancement Presentations for Day Images

                                  Original −                 Original −    Original −
                                  Contrast Reconstruction    Gradient      Mountain View
Average                           −0.024                     0.087         0.143
Standard deviation                0.159                      0.171         0.227
95 percent confidence interval    0.054                      0.058         0.076
Those comparisons show that the original images provide a higher probability than the other two types of images. When comparing the original to the contrast-reconstruction presentation images, a slight advantage exists for the contrast-reconstruction images, but it is not statistically significant.

Table 8.4 shows the statistical differences in the time to detect a target between the original images and the other three presentations. The difference in the time to detect a target is statistically significant in all of the comparisons. Comparing the original images to both the contrast-reconstruction and mountain-view presentations shows that the original images could take up to 0.5 second longer to find a target than the other two types of images. When comparing the original to the gradient images, a slight advantage exists for the original images in finding the target faster.

Table 8.4 Statistical Values for the Difference Between the Average Time of Detection of the Original Images and the Other Enhancement Presentations for Day Images

                                  Original −                 Original −    Original −
                                  Contrast Reconstruction    Gradient      Mountain View
Average                           0.455                      −0.155        0.330
Standard deviation                0.547                      0.473         0.619
95 percent confidence interval    0.184                      0.159         0.208

8.4.6 Discussion
In this research, the TLS methodology of experimentation was used to evaluate the effectiveness of three different types of image contrast enhancement processing presentations. Tables 8.5 and 8.6 summarize the results for both night and day imagery. The entries compare the new processing techniques to the original images.
The contrast-reconstruction presentation provides significant improvements in both the probability of detecting a target and the time required to detect a target for night imagery. For the day imagery, the results for this technique are inconclusive compared to the original images. This could be due to solar reflections in the MWIR imagery [43], which make the clutter level in day imagery much higher than in night imagery. However, this technique still provides faster detection times for the same number of detections. The gradient and mountain-view presentations, while providing faster detection times, either provided fewer detections or showed no statistically significant difference from the original images. Therefore, of these three contrast enhancement output presentations, only the contrast-reconstruction presentation imagery provided any measurable improvement. It is possible that, because the contrast-reconstruction presentation is in the same intensity domain as the input image, users might favor this presentation over the others. With more training and education, the mountain-view presentation may be found to be a useful tool in the search-and-detection task.
8.5 Summary

In this chapter, we have studied several image contrast enhancement algorithms, including single-scale and multiscale processing methods. Among the multiscale processing methods, wavelet edge-based contrast enhancement has the advantage of reducing enhancement artifacts. A method of image performance measurement that demonstrates the benefit of one image contrast enhancement algorithm in enhancing the detectability of targets in urban terrain was also discussed. Another image contrast enhancement approach, tone scale, which improves image quality by adjusting the dynamic range of the image, is discussed in Chapter 10.
Table 8.5 Summary of the Three Contrast Enhancement Presentations as Compared to the Original Processing Utilized for This Experiment for Nighttime MWIR Images

Contrast-enhancement presentation    Probability of detection    Time to detect
Contrast reconstruction              More detections             Faster
Gradient                             Inconclusive                Faster
Mountain view                        Fewer detections            Faster

Table 8.6 Summary of the Three Contrast Enhancement Presentations as Compared to the Original Processing Utilized for This Experiment for Daytime MWIR Images

Contrast-enhancement presentation    Probability of detection    Time to detect
Contrast reconstruction              Inconclusive                Faster
Gradient                             Fewer detections            Faster
Mountain view                        Fewer detections            Faster
References
[1] Michelson, A. A., Studies in Optics, New York: Dover, 1995.
[2] Lee, H. -C., Introduction to Color Imaging Science, Cambridge: Cambridge University Press, 2005.
[3] Fivez, C., et al., "Multi-Resolution Contrast Amplification in Digital Radiography with Compensation for Scattered Radiation," Proc. IEEE Int. Conf. on Image Processing, Lausanne, Switzerland, September 16–19, 1996, pp. 339–342.
[4] Zong, X., A. F. Laine, and E. A. Geiser, "Speckle Reduction and Contrast Enhancement of Echocardiograms Via Multiscale Nonlinear Processing," IEEE Transactions on Medical Imaging, Vol. 17, No. 4, 1998, pp. 532–540.
[5] Leachtenauer, J. C., and R. G. Driggers, Surveillance and Reconnaissance Imaging Systems—Modeling and Performance Prediction, Norwood, MA: Artech House, 2001.
[6] Jain, A. K., Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[7] Pratt, W. K., Digital Image Processing, New York: Wiley, 1978.
[8] Morrow, W. M., et al., "Region-Based Contrast Enhancement of Mammograms," IEEE Transactions on Medical Imaging, Vol. 11, No. 3, 1992, pp. 392–405.
[9] Land, E. H., and J. J. McCann, "Lightness and Retinex Theory," Journal of the Optical Society of America, Vol. 61, 1971, pp. 1–11.
[10] Land, E. H., "Recent Advances in Retinex Theory," Vision Research, Vol. 26, 1986, pp. 7–21.
[11] Funt, B., F. Ciurea, and J. McCann, "Retinex in Matlab," Journal of Electronic Imaging, Vol. 13, 2004, pp. 48–57.
[12] Jobson, D. J., Z. Rahman, and G. A. Woodell, "Properties and Performance of a Center/Surround Retinex," IEEE Transactions on Image Processing, Vol. 6, No. 3, 1997, pp. 451–462.
[13] Liu, Y., et al., "A Multi-Scale Retinex Algorithm for Image Enhancement," Proc. IEEE International Conference on Vehicular Electronics and Safety, Jiangsu, China, October 14–16, 2005, pp. 131–133.
[14] Rosenfeld, A., and M. Thurston, "Edge and Curve Detection for Visual Scene Analysis," IEEE Transactions on Computer, Vol. C-20, No. 5, 1971, pp. 562–569.
[15] Hall, E., J. Rough, and R. Wong, "Hierarchical Search for Image Matching," Proc. IEEE Conference of Decision and Control, Houston, TX, December 1–3, 1976, pp. 791–796.
[16] Koenderink, J., "The Structure of Images," in Biological Cybernetics, New York: Springer-Verlag, 1984.
[17] Burt, P. J., and E. H. Adelson, "The Laplacian Pyramid As a Compact Image Code," IEEE Transactions on Communication, Vol. COM-31, 1983, pp. 532–540.
[18] Tanimoto, S., and T. Pavlidis, "A Hierarchical Data Structure for Image Processing," Computer Vision, Graphics and Image Processing, Vol. 4, 1975, pp. 104–119.
[19] Johnson, J., "A Filter Family Designed for Use in Quadrature Mirror Filter Banks," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'80), New York, April 1980, pp. 291–294.
[20] Vaidyanathan, P. P., "Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial," Proc. IEEE, Vol. 78, No. 1, 1990, pp. 56–95.
[21] Vaidyanathan, P. P., Multirate Systems and Filter Banks, Englewood Cliffs, NJ: Prentice-Hall, 1993.
[22] Witkin, A., "Scale-Space Filtering: A New Approach to Multi-Scale Description," Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing (ICASSP'84), Vol. 9, Part 1, San Diego, CA, May 1984, pp. 150–153.
[23] Yule, J.A.C., "Unsharp Masks and a New Method of Increasing Definition in Prints," Photographic Journal, Vol. 84, 1944, pp. 321–327.
[24] Tahoces, P. G., et al., "Enhancement of Chest and Breast Radiography by Automatic Spatial Filtering," IEEE Transactions on Medical Imaging, Vol. 10, No. 3, 1991, pp. 330–335.
[25] Chang, D. C., and W. R. Wu, "Image Contrast Enhancement Based on a Histogram Transformation of Local Standard Deviation," IEEE Transactions on Medical Imaging, Vol. 17, No. 4, 1998, pp. 518–531.
[26] Narendra, P. M., and R. C. Fitch, "Real-Time Adaptive Contrast Enhancement," IEEE Transactions on Pattern Machine Intell., PAMI-3, 1981, pp. 655–661.
[27] Lee, H. -C., "Automatic Tone Adjustment by Contrast Gain-Control on Edges," U.S. Patent 6285798, September 4, 2001.
[28] Hubel, D. H., Eye, Brain, and Vision, New York: Scientific American Library, W. H. Freeman, 1988.
[29] Zeki, S., A Vision of the Brain, Oxford, U.K.: Blackwell Scientific Publications, 1993.
[30] Mallat, S. G., and S. Zhong, "Characterization of Signals from Multiscale Edges," IEEE Transactions on Pattern and Machine Intelligence, Vol. 14, No. 7, 1992, pp. 710–732.
[31] LaRossa, G. N., and H. -C. Lee, "Digital Image Processing Method for Edge Shaping," U.S. Patent 6611627, August 26, 2003.
[32] Wang, Y.-P., et al., "Image Enhancement Using Multiscale Differential Operators," Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing (ICASSP'01), Salt Lake City, UT, May 2001, pp. 1853–1856.
[33] Rosenfeld, A., "A Nonlinear Edge Detection Technique," Proc. IEEE, 1970, pp. 814–816.
[34] Lu, J., D. M. Healy, and J. B. Weaver, "Contrast Enhancement of Medical Images Using Multiscale Edge Representation," Optical Engineering, Vol. 33, No. 7, 1994, pp. 2151–2161.
[35] Laine, A., J. Fan, and W. Wang, "Wavelets for Contrast Enhancement of Digital Mammography," IEEE Eng. Med. Biol., Vol. 14, No. 5, 1995, pp. 536–550.
[36] Mencattini, A., et al., "Wavelet-Based Adaptive Algorithm for Mammographic Images Enhancement and Denoising," Proc. IEEE Int. Conf. on Image Processing, Genoa, Italy, September 11–14, 2005, pp. I: 1141–1144.
[37] Chang, C. -M., and A. Laine, "Enhancement of Mammography from Oriented Information," Proc. IEEE Int. Conf. on Image Processing, Washington, D.C., October 26–29, 1997, pp. 524–527.
[38] Toet, A., "Multiscale Color Image Enhancement," Proc. IEEE Int. Conf. Image Processing and Its Applications, Maastricht, the Netherlands, April 7–9, 1992, pp. 583–585.
[39] U.S. Army, Combined Arms Operations in Urban Terrain, 28, FM 3-06.11, 1–1, February 2002.
[40] U.S. Army, Combined Arms Operations in Urban Terrain, 28, FM 3-06.11, 1–18, February 2002.
[41] Edwards, T. C., et al., "NVESD Time-Limited Search Model," Proc. of SPIE, Vol. 5076, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XIV, Orlando, FL, April 2003, pp. 53–59.
[42] Howe, J. D., "Electro-Optical Imaging System Performance Prediction," in The Infrared & Electro-Optical Systems Handbook, M. C. Dudzik, (ed.), Vol. 4, Bellingham, WA: SPIE Press, 1993, p. 106.
[43] Driggers, R. G., P. Cox, and T. Edwards, Introduction to Infrared and Electro-Optical Systems, Norwood, MA: Artech House, 1999.
CHAPTER 9
Nonuniformity Correction

In this chapter, the correction of nonuniformity in pixel response in imagers is presented. The sources of nonuniformity in focal plane arrays (FPAs) and a standard model of its effects are described. A sampling of calibrator-based and adaptive techniques is presented with examples. The impact of residual noise after nonuniformity correction (NUC) is discussed.
9.1 Detector Nonuniformity

Nonuniformity can be a major limitation in the performance of focal plane array (FPA) systems. Any lack of uniformity in the detector response characteristics across the array is seen on the output image as spatial noise (see Figure 9.1). For infrared systems, particularly uncooled systems, the resulting spatial noise can be larger than temporal noise and can cause serious performance limitations to the system.

A primary benefit of FPAs is the large increase in the detector integration time applied to any point on the image plane. In other words, the detector dwell time is significantly increased over single or multiple pixel scanning systems. The increase in integration time provides a large increase in sensitivity (i.e., signal-to-noise ratio). To reap the benefits of this increased detector signal-to-noise ratio, the effects of detector nonuniformities and corresponding spatial noise must be corrected to a commensurately high level. Both staring and scanning arrays can suffer from the effects of detector nonuniformity. In staring arrays, the effect is a 2-D pattern noise, while in scanning arrays it is a 1-D pattern normal to the scan direction.

The sources of detector response nonuniformity are associated with both the detector array and the readout circuitry [1]. Process variations, both global and between individual unit cells in the array, can cause differences in key parameters of the detectors, such as dark current, gain, impedance, and sense node capacitance, among others [2]. These differences lead to a different response characteristic for each detector in the array. While manufacturers take great pains to reduce nonuniformity, the reality is that most arrays require some form of correction to achieve good image quality.

A mathematical description of nonuniformity can be formulated taking into account the physics of the scene and the device [3, 4]. These models are useful for understanding the interplay of device parameters on the noise in the array and the relative strength of spatial and temporal noise sources under various imaging scenarios. For our purposes, a simpler model adopted by most researchers developing algorithms for NUC is used [5–8]. This model assumes that the response of each detector is linear and described by a multiplicative gain and additive offset.
Figure 9.1 Longwave thermal infrared image without NUC (left) and with two-point NUC (right).
For the pixel index i, j in the array at frame index k, the response is given by

$$p_k(i,j) = g_k(i,j)\,\phi_k(i,j) + o_k(i,j) \qquad (9.1)$$
where p is the pixel value, g is the gain of the pixel, φ is the average photon flux on the pixel, and o is the offset of the pixel. While the model is linear, it can be applied to detector technologies having nonlinear responses by applying it in a piecewise fashion over the desired operating range. An alternate approach that circumvents nonlinear detector response characteristics is given by Schulz and Caldwell [9]. In (9.1), the differences in the gain and offset terms among pixels result in the spatial noise visible in the output image. The process of correction is accomplished by normalizing for all pixels the variations in gain and offset. Mathematically, the corrected output image is given by

$$p_k^c(i,j) = g_k^c(i,j)\,p_k(i,j) + o_k^c(i,j) \qquad (9.2)$$
where $p_k^c$ is the corrected pixel value, $g_k^c$ is the gain correction term, and $o_k^c$ is the offset correction term.
9.2 Linear Correction and the Effects of Nonlinearity

The most direct method of correcting for nonuniformity is to use known calibration sources to solve for the appropriate terms in (9.2). The calibrator can be internal to the imager or external and must produce a known (typically uniform) radiant power across the focal plane of the imager. A number of calibrator-based NUC algorithms have been developed. They can be as simple as a two-point calibration where the response of each detector is measured at two different uniform input flux levels [5]. A linear correction is then applied as a gain and offset value for each of the detectors. A more sophisticated approach is a multipoint calibration with a piecewise linear set of gain and offset values for each detector. This approach requires a significant increase in the capability of the NUC electronics and a precise calibration source, which is variable over a large dynamic range. An even more effective NUC algorithm involves a higher-order polynomial fit for each detector response [9, 10]. The simple two-point calibration method is described in the next section.
9.2.1 Linear Correction Model
A source having two flux levels, φL and φH, where it is assumed that φL < φH, is imaged at each flux level. The flux levels do not vary with time and are uniform across the pixels. The uncorrected pixel value at location i, j due to φL is $p_k^L(i,j)$ and that due to φH is $p_k^H(i,j)$. To determine the gain and offset correction terms in (9.2), a uniformity condition based on a spatial average of the uncorrected pixel values is imposed. This condition is given by

$$p_k^c(i,j) = K\,\langle p_k(i,j)\rangle + O \qquad (9.3)$$

where $\langle\cdot\rangle$ denotes a spatial average over all pixels in the frame and K and O are arbitrary constants controlling the global gain and offset of the output image. The right-hand sides of (9.2) and (9.3) evaluated at the two flux levels φL and φH form a system of two equations in two unknowns. Solving this system, the gain and offset correction terms are written as

$$g_k^c(i,j) = K\,\frac{\langle p_k^H(i,j)\rangle - \langle p_k^L(i,j)\rangle}{p_k^H(i,j) - p_k^L(i,j)}, \qquad o_k^c(i,j) = O + K\,\langle p_k^H(i,j)\rangle - g_k^c(i,j)\,p_k^H(i,j) \qquad (9.4)$$
By substituting (9.1) into (9.4), the corrected gain and offset can be found in terms of the unknown gain and offset of (9.1), or

$$g_k^c(i,j) = K\,\frac{\langle g_k(i,j)\rangle}{g_k(i,j)}, \qquad o_k^c(i,j) = O + K\left[\langle o_k(i,j)\rangle - \frac{\langle g_k(i,j)\rangle}{g_k(i,j)}\,o_k(i,j)\right] \qquad (9.5)$$
These last two expressions are useful in performing an error analysis.

Defects in the fabrication of detectors and tolerance variations in the read-out circuitry can lead to distinctly nonlinear detector responses to radiant flux. These nonlinearities are often not large. However, even with 0.1 percent nonlinearity, the spatial noise can be large in comparison to the temporal noise when the sensor is designed to cover over an order of magnitude in flux.
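As a concrete illustration of the two-point procedure, the following sketch computes the correction terms of (9.4) from two uniform-flux calibration frames and applies (9.2). It is a minimal Python/NumPy example under a noiseless detector model, with hypothetical array names and sizes; it is not a production calibration routine.

```python
import numpy as np

def two_point_nuc(p_low, p_high, K=1.0, O=0.0):
    """Per-pixel gain/offset correction terms from two uniform-flux frames, as in (9.4).

    p_low, p_high : 2-D arrays of uncorrected pixel values at the low and high flux levels.
    K, O          : arbitrary constants setting the global gain and offset.
    """
    mean_low = p_low.mean()                  # spatial average at the low flux
    mean_high = p_high.mean()                # spatial average at the high flux
    # Gain correction: frame-average response swing divided by each pixel's own swing.
    gain = K * (mean_high - mean_low) / (p_high - p_low)
    # Offset correction chosen so a corrected pixel at the high flux equals K*<p_H> + O.
    offset = O + K * mean_high - gain * p_high
    return gain, offset

def apply_nuc(frame, gain, offset):
    """Apply the linear correction of (9.2): corrected = gain*frame + offset."""
    return gain * frame + offset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rows, cols = 240, 320
    true_gain = 1.0 + 0.05 * rng.standard_normal((rows, cols))   # fixed-pattern gain
    true_offset = 10.0 * rng.standard_normal((rows, cols))       # fixed-pattern offset

    def detector(flux):
        # Noiseless instance of the linear response model (9.1).
        return true_gain * flux + true_offset

    g_c, o_c = two_point_nuc(detector(100.0), detector(1000.0))
    corrected = apply_nuc(detector(550.0), g_c, o_c)
    print("spatial std before:", detector(550.0).std(), "after:", corrected.std())
```

With a purely linear, noise-free detector the corrected frame is spatially flat; in practice the residual depends on temporal noise and on the nonlinearity discussed next.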
9.2.2 Effects of Nonlinearity

9.2.2.1 Residual Error
Nonlinear response is a departure from the model assumed in (9.1). The actual pixel response can be written as

$$p_k^a(i,j) = p_k(i,j) + p_k^{nl}(i,j;\phi) \qquad (9.6)$$
where $p_k^{nl}$ is the nonlinear portion of the response. Putting (9.6) into (9.2) gives the corrected pixel response as

$$p_k^c(i,j) = g_k^c(i,j)\,p_k(i,j) + o_k^c(i,j) + g_k^c(i,j)\,p_k^{nl}(i,j;\phi) \qquad (9.7)$$
The last term in (9.7) is the error term associated with departure of the pixel response from linearity. Residual error after the NUC can be characterized by the variance of this term [10], or

$$\sigma_{NUC}^2 = \mathrm{var}\left[\,g_k^c(i,j)\,p_k^{nl}(i,j;\phi)\,\right] \qquad (9.8)$$
where the symbol $\mathrm{var}[x] = \langle[x - \langle x\rangle]^2\rangle$. This expression can be simplified by assuming

$$g_k(i,j) = \langle g_k(i,j)\rangle\,\{1 + \delta_k(i,j)\} \qquad (9.9)$$
The term δk(i, j) describes the variability in the linear gain gk(i, j). Noting that $\langle\delta_k(i,j)\rangle = 0$, then using (9.9) and (9.5),

$$\sigma_{NUC}^2 = \mathrm{var}\left[\frac{p_k^{nl}(i,j;\phi)}{1 + \delta_k(i,j)}\right] \approx \mathrm{var}\left[\,p_k^{nl}(i,j;\phi)\,\bigl(1 - \delta_k(i,j)\bigr)\right] \qquad (9.10)$$

where the last expression results from a truncated Taylor series expansion of $[1 + \delta_k(i,j)]^{-1}$ for small δk(i, j).
$$P(I_i) = \begin{cases} 0, & I_i < I_l \\ M\,(I_i - I_a), & I_l \le I_i \le I_r \\ 255, & I_i > I_r \end{cases} \qquad (10.2)$$

where I_l and I_r represent the small gray shade value range shown in Figure 10.1(b). This operation maps the input image through a steep line from 0 to 255 only over a small gray shade value range (I_l, I_r), and to 0 below and 255 above the selected range. The operation stretches the small range of gray values over the full range from 0 to 255. By manipulating the previous equation, the values M and I_a are determined by
Figure 10.1 (a) An original thermal image that is too dark and of low contrast because the temperature of the lamp in the lower left corner is much higher than that of the rest of the scene. (b) Normalized histogram of part (a), where I_l and I_r represent the small gray value range in the histogram. (c) Tone scale mapping curve. This operation maps the input image through a steep line from 0 to 255 only over a small gray value range (I_l, I_r), and to 0 below and 255 above the selected range. (d) The resultant image after the tone scale is applied. (e) The histogram of the resultant image after the tone scale is applied.
$$M = \frac{255}{I_r - I_l}, \qquad I_a = I_l$$
Because the image values outside of the selected range are constrained (clipped), the process is called a piece-wise linear tone scale, as shown in Figure 10.1(c). The image after the tone scale operation is shown in Figure 10.1(d). The histogram of the resultant image is shown in Figure 10.1(e). The piece-wise linear tone scale is also called a window-level tone scale. It is similar to the application of a window operation to the image values where the level (center) of the window is at the middle of the selected range (I_l, I_r).
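A minimal Python/NumPy sketch of this window-level mapping is given below; the window limits in the usage line are hypothetical, and values outside (I_l, I_r) are clipped to 0 and 255 as in Figure 10.1(c).

```python
import numpy as np

def window_level_tone_scale(image, i_l, i_r):
    """Piece-wise linear (window-level) tone scale: gray values below i_l map to 0,
    values above i_r map to 255, and the window (i_l, i_r) is stretched linearly
    over the full 0-255 output range."""
    img = image.astype(np.float64)
    m = 255.0 / (i_r - i_l)              # M = 255 / (Ir - Il)
    out = m * (img - i_l)                # Ia = Il
    return np.clip(out, 0, 255).astype(np.uint8)

# Example with hypothetical window limits for a dark thermal image:
# stretched = window_level_tone_scale(thermal_image, i_l=40, i_r=90)
```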
10.3 Nonlinear Tone Scale

A nonlinear tone scale transform expands some portions of the dynamic range at the expense of others. Figure 10.2 shows an example with radar imagery. The nonlinear transform brings out details on the buildings. Such transforms can selectively enhance specific portions of the dynamic range. Another example of such nonlinear tone scale functions is a logarithmic transformation of the gray shade values that allows a larger dynamic range to be recognized at the expense of resolution in the bright regions of the image. The dark regions become brighter and provide more details. The third example is a cubic transformation of the gray shade values that allows the image to be better adapted to the characteristics of the human visual system, which can detect relative intensity differences over a wide range of intensities. In this transform, the image gray shade values are converted into the range (0, 1) before the cubic transformation is performed. Then, the values are mapped to the range of (0, 255). That is,

$$I_\rho = \left(\frac{I_i - I_{\min}}{I_{\max} - I_{\min}}\right)^{\rho} \times 255$$
where I_ρ is the output transformed pixel intensity value, I_min and I_max are the minimum and maximum pixel intensity values, respectively, I_i is the input original pixel intensity value, and ρ = 3.
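The power-law transform above can be written compactly as a point operation. The following Python/NumPy sketch is illustrative only, with a hypothetical input image; ρ = 3 gives the cubic transform described in the text.

```python
import numpy as np

def power_tone_scale(image, rho=3.0):
    """Normalize gray shades to (0, 1), apply the power law (rho = 3 is the cubic
    transform), and rescale the result to the 0-255 output range."""
    img = image.astype(np.float64)
    i_min, i_max = img.min(), img.max()
    norm = (img - i_min) / (i_max - i_min)   # assumes i_max > i_min
    return np.clip(255.0 * norm ** rho, 0, 255).astype(np.uint8)
```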
10.3.1 Gamma Correction
A special case of the tone scale in digital images that are intended for viewing on a CRT monitor is called gamma correction [4]. In this process, the tone scale has to compensate for the nonlinear characteristics of a CRT monitor. The CRT is well described by a power function, $L = A\,(V + B)^{\gamma}$, where L is the luminance on the monitor screen, V is the input value to the monitor (usually normalized to 1 at the peak input), and A and B are two constants. The value of γ varies from monitor to monitor, but it is around 2.0–3.0. The standard γ value is assumed to be 2.22 in most international standards. If a monitor is carefully configured and adjusted, B is usually set to zero; that is,
Figure 10.2 (a) Original radar image; (b) the radar image after applying a nonlinear tone scale; and (c) a nonlinear tone scale.
$$L = A\,(V)^{\gamma} = A\,(V)^{2.22}$$
The gamma correction curve is therefore the inverse function of the nonlinear tone scale of the CRT monitor; that is,

$$V = Y^{1/\gamma} = Y^{0.45}$$

where Y is the image illuminance, which is linearly proportional to scene luminance. With such a gamma correction, these two equations in combination give L = A·Y.
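A short sketch of this gamma correction is shown below, assuming the input illuminance has already been normalized to the range 0 to 1; the function name and the 8-bit usage line are illustrative, not part of any standard API.

```python
import numpy as np

def gamma_correct(y, gamma=2.22):
    """Gamma-correction curve V = Y**(1/gamma) applied to illuminance values
    normalized to 0-1, so that the displayed luminance L = A*V**gamma ends up
    proportional to Y."""
    y = np.clip(np.asarray(y, dtype=np.float64), 0.0, 1.0)
    return y ** (1.0 / gamma)

# For an 8-bit image:
# code_out = np.round(255.0 * gamma_correct(code_in / 255.0)).astype(np.uint8)
```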
Note: Illuminance is a photometry quantity and is a measure of the intensity of the incident light, wavelength-weighted by the luminosity function to correlate with human brightness perception. Illuminance is defined as the total luminous flux incident on a surface, per unit area, in photometry [6]. It is the corresponding quantity to irradiance in radiometry, where luminous flux converts to radiant flux. The unit of illuminance is lux (lx) or lumens per square meter (lm m⁻²). The lumen (lm) is the unit of luminous flux emitted in unit solid angle by an isotropic point source having a luminous intensity of 1 candela (cd). Luminance is a photometric measure of the density of luminous intensity in a given direction. It corresponds to radiance in radiometry. It describes the amount of light that passes through or is emitted from a particular area and falls within a given solid angle. The unit for luminance is candela per square meter (cd m⁻² or lm sr⁻¹ m⁻²).

When there is a need to produce very high-quality images that are viewed on a CRT monitor, to compensate for camera flare, viewing flare, and the viewing surround, a more complicated gamma correction curve is constructed. For the details of this gamma correction, readers are referred to [4].
10.3.2 Look-Up Tables
According to (10.1) and (10.2) and other tone scales that will be discussed in the following section, the direct computation of the tone scale is sometimes very intensive. For example, to present a 512 × 512 image in a logarithmic gray value scale with the point operation

$$P(I_i) = 25.5 \log I_i$$

the logarithm would have to be calculated 262,144 times. The key point for a more efficient implementation of tone scale lies in the observation that the definition range of any point operation consists of very few gray values, typically 256. Thus, the very same values would have to be calculated many times. This can be avoided if P(I_i) is calculated for all 256 possible gray values and the computed values are then stored in a 256-element table. Then, the computation of the point operation is reduced to a replacement of the gray value by the element in the table with an index corresponding to the gray value. Such a table is called a look-up table (LUT) [7].
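The following Python/NumPy sketch illustrates the LUT idea for an 8-bit image; the logarithmic point operation and its scaling constant are chosen only for illustration (255/ln 256 ≈ 45.99 maps gray value 255 to output 255) and are not the specific tone scale of the text.

```python
import numpy as np

def build_lut(point_op, levels=256):
    """Evaluate the point operation once for every possible gray value."""
    return np.array([point_op(g) for g in range(levels)])

def apply_lut(image, lut):
    """Replace each pixel by the table entry indexed by its gray value."""
    return lut[image]      # NumPy fancy indexing: one table look-up per pixel

# Example: a logarithmic tone scale for an 8-bit image, built once and reused.
# lut = np.clip(build_lut(lambda g: 45.99 * np.log1p(g)), 0, 255).astype(np.uint8)
# enhanced = apply_lut(image8, lut)      # image8 is a uint8 image array
```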
10.4 Perceptual Linearization Tone Scale

The eye does not respond to luminance changes in a linear fashion. At low-luminance levels, a larger luminance difference is required for detection than at high-luminance levels. How is it possible to obtain an optimal tone scale so that all gray-level differences are equally discriminable? This type of tone scale is called a perceptual linearization tone scale. Lee, Daly, and Van Metter [1] propose that a possible objective of good tone scales is "to produce equal perceived contrast for features having equal physical contrast, independent of the density of the surrounding area." What are perceived contrast and physical contrast? And why is it desirable to produce equal perceived
contrast for features having equal physical contrast? The answer was given in [1] as follows: The physical contrast is defined as log exposure difference (for computed radiography – CR). (The absorption and scattering of X-rays in CR produces a shadow image of the internal structures of an object that is not visible to our eyes). To a first approximation, equal thickness increments of a homogeneous object correspond to equal log exposure differences. A given object in the path of X-ray beams will absorb a certain percentage of the X-ray energy independent of what is in the front or in the back. That means that an object (say, a tumor) will roughly create a certain log exposure difference in the image no matter what density the surrounding area has. (This is, of course, only approximately true, because of the scattering and the polychromatic nature of the X-ray source.) Thus, the desired tone scale objective aims at making the same object equally visible independent of where it is located in the body part being examined. The most difficult aspect to quantify in this objective is the perceived contrast. Physical contrast as defined above is a function of the ratio between two exposures. Similarly, perceptual contrast is often defined as the ratio of two perceptual quantities. However, if our goal is to make equal log exposure difference equally visible, the desired perceptual quantity may be equal brightness (or lightness) difference.
Brightness is a perception quantity that corresponds to the strength of our sensation of the intensity of light [4]. Since brightness is a subjective measurement, many studies and experiments try to relate the luminance to perceived brightness. One common method of experimental observation of brightness perception of an isolated finite uniform area is described in [4]. The stimulus is an isolated patch of light in the dark. When the light is first turned on, the target initially appears brighter, and then it becomes a little darker as an observer looks at it for a while (i.e., for 2 seconds or more). However, its brightness remains quite stable. An estimate can be given of its subjective brightness magnitude. If the light is then turned off, the observer's eyes are allowed to rest in the dark for 20 seconds or more, and the light is then turned on again with a different luminance, another estimate of its brightness can be given. By repeating this for many different luminances, a relationship between the perceived brightness magnitude and the luminance can be developed. The relationship is called the brightness model.

This discussion is centered around very constrained stimuli. Since brightness perception depends on many factors, including the perceived spatial layout, and images are mostly of complex scenes that are often viewed in complex environments, our choice of the brightness perception model has to be very selective. Lee, Daly, and Van Metter [1] adapted the Michaelis-Menten function [8, 9] as the brightness model for their X-ray image viewing applications. The Michaelis-Menten (or Hill) function is written as

$$B = \frac{B_m L^n}{L^n + L_0^n}$$

where B is the perceived brightness, B_m is a scale factor, L is the luminance of the object, and n is the Hill coefficient. For their particular application, n = 0.7 is chosen. Let L_a be the adapting luminance (cd m⁻², candelas per square meter) of the visual field. L_0 can be calculated as

$$L_0 = 12.6 \times L_a^{0.63} + 1.083 \times 10^{-5}$$
Let L_w be the luminance of the reference white (the minimum density area of the X-ray film). Hunt [10] suggested that L_a = 0.2 × L_w. The brightness model takes into account the perceptivity of the human eye. Therefore, it provides a model transforming the luminance into brightness that is perceived by the human eye.

Barten's model of the human visual system [11] was adapted by a joint committee that was formed by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) to develop a standard for Digital Imaging and Communications in Medicine (DICOM)—the Grayscale Standard Display Function (GSDF) [12]. The model was validated by comparison to previous sets of empirical data. The DICOM GSDF allows us to calculate luminance, L (cd m⁻²), as a function of the just-noticeable differences (JNDs), which are defined as the luminance difference of a given target under given viewing conditions that the average human observer can just perceive. This allows us to describe the characteristic curve of a display system that matches the sensitivity of the human eye to changes in brightness (i.e., a perceptually linearized display system). The DICOM GSDF is defined in the following equation:

$$\log_{10} L(j) = \frac{a + c\,\ln(j) + e\,[\ln(j)]^2 + g\,[\ln(j)]^3 + m\,[\ln(j)]^4}{1 + b\,\ln(j) + d\,[\ln(j)]^2 + f\,[\ln(j)]^3 + h\,[\ln(j)]^4 + k\,[\ln(j)]^5}$$

where ln refers to the natural logarithm, j to the index (1 to 1,023) of the luminance levels L_j of the JNDs, and a = −1.3011877, b = −2.5840191E−2, c = 8.0242636E−2, d = −1.0320229E−1, e = 1.3646699E−1, f = 2.8745620E−2, g = −2.5468404E−2, h = −3.1978977E−3, k = 1.2992634E−4, m = 1.3635334E−3.

The DICOM GSDF model can be applied to a wide range of display devices, including CRT and flat-panel monitors (which are considered as a display system together with the graphic adapter to which they are matched), reflective hardcopy printers, and transmissive hardcopy printers. This has the effect of making different displays look the same, given the same input image. A DICOM calibration function is shown in Figure 10.3. A similar form is used for the IDEX system. The IDEX equation was developed on the basis of an empirical study and is defined as

$$L = 0.1 + 4.060264\,x - 6.226862\,x^2 + 48.145864\,x^3 - 60.928632\,x^4 + 49.848766\,x^5$$
where L = output luminance and x = normalized command level where 0 ≤ x ≤ 1.0 and L varies from 0.1 to 35 fL. No difference in the two forms has been seen for visible imagery; the NEMA/DICOM form was shown to be superior for radar imagery [13].
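The GSDF is straightforward to evaluate numerically. The Python/NumPy sketch below implements the rational polynomial above with the coefficients listed in the text; it is an illustrative calculation, not a display calibration tool.

```python
import numpy as np

# Coefficients of the DICOM Grayscale Standard Display Function (PS 3.14), as listed above.
a, b, c = -1.3011877, -2.5840191e-2, 8.0242636e-2
d, e, f = -1.0320229e-1, 1.3646699e-1, 2.8745620e-2
g, h, k, m = -2.5468404e-2, -3.1978977e-3, 1.2992634e-4, 1.3635334e-3

def gsdf_luminance(j):
    """Luminance in cd/m^2 for JND index j (1 to 1,023) from the DICOM GSDF."""
    x = np.log(j)
    num = a + c * x + e * x**2 + g * x**3 + m * x**4
    den = 1.0 + b * x + d * x**2 + f * x**3 + h * x**4 + k * x**5
    return 10.0 ** (num / den)

if __name__ == "__main__":
    j = np.arange(1, 1024)
    L = gsdf_luminance(j)
    print(L[0], L[-1])   # roughly 0.05 cd/m^2 at j = 1 and about 4,000 cd/m^2 at j = 1,023
```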
Figure 10.3 NEMA/DICOM calibration function.
The effect of tone scale mapping depends on the characteristics of the input image and the interpretation task involved. Probably the most significant effect is that of limiting dynamic range. From a study of softcopy interpretability under conditions of varying dynamic range, a loss of up to 0.8 level of discrimination in the National Image Interpretability Rating Scale (NIIRS) can be inferred [14]. The tone scale transforms discussed were developed for monochrome imagery. The same transforms are applicable to color imagery [multispectral imagery (MSI) and hyperspectral imagery (HSI)] for general viewing. The NEMA/DICOM perceptual linearization function has also been shown to apply to MSI displays [14]. The applications of this function to the medical imaging displays are summarized in [15–18].
10.5 Application of Tone Scale to Enhanced Visualization in Radiation Treatment

After describing the idea and various designs of tone scale, an application of tone scale to a real digital image processing problem is presented. In this problem, the enhanced image contrast and output appearance are desired for the radiation treatment of cancer patients.

10.5.1 Portal Image in Radiation Treatment
Figure 10.4 shows a diagrammatic view of a projection X-ray system, illustrating an X-ray source and an X-ray beam that can be used for the radiation treatment of a patient. Collimator blades are adjusted to shape the size of the beam. Shielding blocks are mounted in front of the collimator to shield the patient from radiation. The shielding blocks reduce the size of one beam (X-ray beam 1) to another beam (X-ray beam 2), which irradiates the region of the patient to be treated. The X-ray image is captured by a computed radiography (CR) device.
Figure 10.4 A diagrammatic view illustrating an imaging system that can be used for radiation treatment of a patient.
Figure 10.5 is a diagrammatic view illustrating the X-ray image captured by this device. The image includes a central region representing the radiation field, a region outward of the radiation field representing the collimation field region blocked by the shielding blocks, and a background region representing the region blocked by the collimator. The radiation field is also called a port, so the image is called a portal image. Portal images are used to evaluate the position of the radiation beam and the placement of X-ray radiation shielding blocks with respect to the patient's anatomy. When portal images are taken before the treatment, they give the radiation oncologists the opportunity to correct for minor patient positioning errors. When portal images are taken during treatment, they provide a means of monitoring patient movement.

A portal image may be acquired by two (or more) radiation exposures. A first exposure is taken with only the collimator mounted. Then, the shielding blocks are mounted in front of the collimator before the second (or following) exposure is taken. The interior of the shielding blocks is the treatment field, which is called the radiation field. The collimation field represents the interior of the collimator, where the useful patient anatomic locations are recorded. There is no treatment information exterior to the collimation field. The image portions inside and outside of the radiation field have different gray level concentrations due to the different radiation doses that are applied to the inside and outside of the radiation field. Therefore, the edges of the radiation field appear relatively strong, indicating the placement of the shielding blocks. Because of the high energy of the radiation, digital portal images suffer from systematic low contrast.
Figure 10.5 A diagrammatic view illustrating the X-ray image captured by a portal imaging device.
An original portal image is shown in Figure 10.6. The patient anatomy is hardly visible. It is desirable to increase the contrast of the patient's anatomy inside and outside of the radiation field. It is also desirable to reduce the differences in the gray level appearance inside and outside of the radiation field. This way, the image details of both the inside and outside of the radiation field are displayed using the full dynamic range of the output medium, such as film or a high-resolution display monitor. Moreover, the edges of the radiation field should be preserved and even highlighted. Therefore, the processed portal images can provide radiation oncologists with the opportunity to check the treatment setup accurately and reduce localization errors.
10.5.2 Locating and Labeling the Radiation and Collimation Fields
In order to produce the desired appearance and contrast of the portal image, the radiation field, collimation field, and background need to be considered as separated images. The first step is to locate and label the radiation and collimation fields. This step can be achieved by the edge detection and morphological method. Figure 10.7 shows this result for the image in Figure 10.6. After locating the radiation and collimation fields, the next step is to label them. The labeled radiation field image is a binary image whose pixels inside and outside of the radiation field are 1 and 0, respectively. The labeled collimation field image is also a binary image whose pixels inside and outside of the collimation field are 1 and 0, respectively.
10.5.3 Design of the Tone Scale Curves
Assume an input tone scale curve that is designed for a certain type of film or is computed by the visually optimized method. This section presents a method of formulating the given input tone scale curve into the appropriate tone scale curves to display the digital portal image effectively.
Figure 10.6 An original portal image.
Figure 10.7 Locating and labeling radiation field and collimation field for the image in Figure 10.6: (a) candidate points from the edge detection method; (b) located collimation field; (c) labeled collimation field image; (d) located radiation field; and (e) labeled radiation field image.
Therefore, one tone scale curve, which is called the radiation tone scale curve, is designed and maps the image inside the radiation field into the full dynamic range of the output medium. The other tone scale curve, which is called the collimation tone scale curve, is also designed and maps the image outside of the radiation field into the full dynamic range of the output medium.

Usually, one particular tone scale curve might not be designed for a new output medium; for instance, the new printer system might have a higher- or lower-density range than the range of the tone scale curve. The tone scale curve needs to be scaled selectively instead of linearly to meet the dynamic range of the output medium. This allows the image to be displayed in the designated output medium effectively. The scaled tone scale is then adjusted for contrast to produce a tone scale for desired contrast. In order to map a radiation field image and a collimation field image to match the output medium, two separated tone scale curves need to be constructed, and they are controlled by a shifting parameter called the speed point.

In summary, there are three steps for designing the desired tone scale: (1) scaling selectively the input tone scale curve; (2) adjusting the tone scale curve contrast; and (3) determining the speed point. These steps are described in the following section.
10.5.3.1 Scaling Selectively the Input Tone Scale Curve
By convention in X-ray images, the tone scale is defined as the log exposure to density conversion curve. The horizontal axis of a tone scale curve represents the log exposure code values (range: 0–4,095 for a 12-bit image). The vertical axis of the tone scale represents the density code values (range: 0–4,095), which are linearly related to the output density (film density, range: 0–4). Usually, the given input tone scale is defined as the log exposure to film density conversion curve, as shown in Figure 10.8(a), which is called the D-logE curve. The original D-logE curve (dashed line) in Figure 10.8(a) spans the dynamic range from 0.21 to 3.0 in density units. If the desired D-logE curve has a density range of 0.21 to 3.5, the new D-logE curve (solid line) is scaled selectively such that the tone scale curve is the same as the original one over the density range 0.21 to 3.0 and is expanded to the density 3.5 using a smoothing function. This operation keeps the printer's response the same as the original one for the lower density range and takes advantage of the higher density range of the new printer system. Figure 10.8(b) shows the linear relationship between the output film density and the density code values for the original (dashed line) and new (solid line) density ranges. The final selectively scaled tone scale curve representing the conversion between the log exposures and the density code values is shown in Figure 10.8(c) (solid line). For comparison, the original tone scale curve is shown in Figure 10.8(c) with a dashed line.
Figure 10.8 (a) A graphical view illustrating an example of selectively scaling the input tone scale to meet the dynamic range of the output medium; (b) linear relationship between the output density and the density code values for the original (dashed line) and new (solid line) density ranges; and (c) the original and the selectively scaled tone scale curve representing the relationship between the log exposure code values and the density code values.
10.5.3.2 Adjusting the Tone Scale Curve Contrast
For special applications, the contrast of the input tone scale curve may not be appropriate for the input image. For example, a digital portal image requires a tone scale curve with a higher contrast to effectively map the input image to the output medium. The crucial point of adjusting the contrast of a tone scale curve is to select a pivot point. At this pivot point, the contrast is unchanged. This pivot point is defined as a pair of points, pp(ep, dp), where ep represents the log exposure code value and dp represents the density code value. Here, the pivot point is determined such that dp is selected as the unchanged density code value and ep is the corresponding log exposure code value that is obtained from the inverse tone scale curve. Then, the contrast-adjusted input tone scale curve is produced by the following. In the case of increasing the contrast, the density code values for the points above dp are increased by a contrast factor, while the density code values for the points below dp are decreased by a contrast factor. In the case of decreasing the contrast, the operation is the opposite. Figure 10.9 shows this phenomenon for a contrast factor cs = 2.

This unchanged density code value can be calculated directly from the output density value. The unchanged density value can be selected between the minimum and maximum of the output medium. For a laser printer with a minimum density of 0.21 and a maximum density of 3.5, the middle density point is about 1.6. For the application of portal images in radiation treatment, the unchanged density point is selected as 1.6 for the collimation tone scale curve and 1.3 for the radiation tone scale curve. Because the image portion that is inside the radiation field is acquired with a higher radiation dose, it appears darker in the digital portal image. By selecting the unchanged density point at a lower density, the tone scale maps most of the image inside the radiation field to a lower density (it appears bright). The image inside the radiation field appears even brighter than the image outside the radiation field.
Figure 10.9 A graphical view showing a comparison of the new contrast-adjusted tone scale curve (solid line), and the original one (dashed line). The contrast factor cs = 2.
10.5.3.3 Determining the Speed Point
In the last step of the design of the tone scale curves, the algorithm determines the speed point of the tone scale curve. The concept of the speed point is inherited from film processing. The speed point is the point to which the input tone scale curve is shifted such that the span of the tone scale curve covers the dynamic range of the input image. The speed point controls the dark or light appearance of the output image. The speed point of the tone scale is determined by a pair of points, sp(es, ds), where es represents the log exposure code value and ds represents the density code value. Here, es is determined by the log exposure code value corresponding to the peak of the histogram of the region of interest in the input image, and ds is determined by the unchanged density code value. The initial point e0 is determined by the corresponding log exposure code value of ds, which is obtained from the inverse tone scale curve. The scaled and contrast-adjusted tone scale curve tsc is shifted to the speed point sp(es, ds) from the initial point sp(e0, ds) to generate the output tone scale curve to. Figure 10.10 illustrates this situation.

Figure 10.11 illustrates a block diagram determining the speed points for the radiation and collimation tone scale curves tsc. First, the histogram of the pixels inside the radiation field is computed, which is called the radiation histogram (port histogram). The speed point of the radiation tone scale curve, spr(er, dr), is determined from the peak of the radiation histogram, er, and the unchanged density code value dr. The scaled and contrast-adjusted input tone scale tsc is shifted to the selected speed point to output the radiation tone scale curve tr.
Figure 10.10 A graphical view showing a comparison of the new shifted tone scale curve (solid line) and the original one (dashed line).
Figure 10.11 Block diagram determining the speed points for the radiation and collimation tone scale curves.
Similarly, the histogram of the pixels outside the radiation field but inside the collimation field is calculated, which is called the collimation histogram. The speed point of the collimation tone scale curve, spc(ec, dc), is determined from the peak of the collimation histogram, ec, and the unchanged density code value dc. The collimation tone scale curve tc is obtained by shifting the scaled and contrast-adjusted input tone scale curve tsc to the selected speed point spc(ec, dc). Figure 10.12 shows an example of the radiation (solid line) and collimation (dashed line) tone scale curves designed using this method.
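A simplified numerical sketch of this speed-point step is given below; it assumes the tone scale curve is stored as a one-dimensional array of density code values indexed by log exposure code value (0 to 4,095), and all names are hypothetical.

```python
import numpy as np

def shift_to_speed_point(tsc, roi_logE, d_unchanged):
    """Shift a scaled, contrast-adjusted tone scale curve so that the log exposure
    at the peak of the region-of-interest histogram maps to the chosen unchanged
    density code value.

    tsc         : 1-D array of density code values indexed by log exposure code value.
    roi_logE    : 1-D array of log exposure code values of the pixels in the region of interest.
    d_unchanged : density code value d_s that should remain fixed.
    """
    # e_s: log exposure code value at the histogram peak of the region of interest.
    hist, edges = np.histogram(roi_logE, bins=tsc.size, range=(0, tsc.size))
    e_s = int(edges[np.argmax(hist)])

    # e_0: log exposure whose current output equals the unchanged density
    # (a discrete stand-in for the inverse tone scale curve).
    e_0 = int(np.argmin(np.abs(tsc.astype(np.float64) - d_unchanged)))

    # Shift the curve horizontally by (e_s - e_0), holding the end values.
    shift = e_s - e_0
    idx = np.clip(np.arange(tsc.size) - shift, 0, tsc.size - 1)
    return tsc[idx]
```

Applying the same function to the port histogram with d_r, and to the collimation histogram with d_c, yields the two output curves described above.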
10.5.4 Contrast Enhancement
As mentioned in Section 10.1, contrast enhancement algorithms play an important role in tone scale. Here, a contrast enhancement algorithm, such as the one based on wavelet edges that is discussed in Chapter 8, is applied to the portal image before the tone scale operation. When the radiation tone scale curve tr is applied to the contrast-enhanced image, the output is the contrast-enhanced radiation field image, Icep,r. When the collimation tone scale curve tc is used, the output image is the contrast-enhanced collimation field image, Icep,c. Figure 10.13 shows these two images.

From Figure 10.13(a), the image details inside the radiation field in the contrast-enhanced radiation field image appear prominent, while the portion outside of the radiation field appears "washed out" and less detailed. Similarly, in Figure 10.13(b), the image details outside of the radiation field in the contrast-enhanced collimation field image appear prominent, while the portion inside the radiation field appears dark and less detailed. Referring to Figure 10.12, this condition can be readily understood. Looking at the radiation (port) tone scale, it maps the image values inside the radiation field to the full range of output density code values (output medium), while it maps the image values outside the radiation field (which appear partially in the collimation histogram) to a small range of output density code values in the lower density range. That is why the image portion outside of the radiation field in Figure 10.13(a) looks "washed out" and less detailed.
Figure 10.12 A graphical view showing an example of the radiation and collimation tone scale curves designed using the method from Section 10.5.3.
Figure 10.13 (a) Contrast-enhanced radiation field image; and (b) contrast-enhanced collimation field image.
Similarly, the collimation tone scale maps the image values outside of the radiation field to the full range of output density code values, while it maps the image values inside the radiation field to a small range of output density code values in the higher density range.
Therefore, the image portion inside the radiation field in Figure 10.13(b) looks dark and less detailed.
10.5.5 Producing the Output Image
Figure 10.14 shows the diagram of combining the images Icep,r and Icep,c to produce the final output image Icep,o. First, the intermediate image I1 is produced by the sum of two signals, Icep,r × Ir and Icep,c × (1 − Ir), where Ir is the labeled radiation field image. This intermediate image I1 represents the image inside the radiation field using the enhanced radiation field image Icep,r and the image outside the radiation field using the enhanced collimation field image Icep,c. In the last stage, the black surround outside the collimation field is completed by summing two signals, I1 × Ic and V × (1 − Ic), where Ic is the labeled collimation field image and V is a constant value. For a 12-bit input image, V is selected as 4,095 to make the pixels outside the collimation field appear black. The combined output image is shown in Figure 10.15(b), and, for comparison, the original input portal image is shown in Figure 10.15(a). The output image shows the image details both inside and outside of the radiation field, and both fields appear to be in a similar brightness range.
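The combination just described reduces to two mask-weighted sums. The following sketch assumes the enhanced field images and the labeled binary masks are available as NumPy arrays; the function and argument names are hypothetical.

```python
import numpy as np

def combine_portal_images(i_cep_r, i_cep_c, mask_rad, mask_coll, v=4095):
    """Combine the contrast-enhanced radiation-field and collimation-field images
    using the labeled binary field masks (1 inside the field, 0 outside).

    i_cep_r, i_cep_c : contrast-enhanced radiation and collimation field images.
    mask_rad         : labeled radiation field image Ir.
    mask_coll        : labeled collimation field image Ic.
    v                : constant value outside the collimation field (4,095 for a 12-bit image).
    """
    i1 = i_cep_r * mask_rad + i_cep_c * (1 - mask_rad)   # inside vs. outside the radiation field
    return i1 * mask_coll + v * (1 - mask_coll)          # constant surround outside the collimation field
```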
10.6 Tone Scale Performance Example

It is extremely difficult to determine the imaging performance benefits associated with tone scale processes. In particular, the tone scale processes that are nonlinear lead to an image spectrum that does not represent the original image (e.g., a modified image spectrum). In many cases, a linear approximation is used to determine the benefits of tone scale processing. Here, a simple example of a linear stretch tone scale process is used to illustrate the performance benefits of tone scale processing.
Figure 10.14 Block diagram of producing the output image.
Figure 10.15 (a) Original input image; and (b) output combined image.
A 640 × 480 midwave (3 to 5 µm) infrared imager is used to search for targets in the wide field of view (WFOV). The F-number of the system is 4.0 and the aperture diameter is 15 cm. An optical transmission of 0.8 is assumed. The detector dimensions are 20 × 20 µm and the detectivity is 5 × 10¹¹ Jones with an integration time that yields a 27 mK random spatio-temporal noise level. The display height is 15.24 cm with a viewing distance of 38 cm for a magnification of 3.8. The maximum display brightness is 10 foot-Lamberts (fL). A Beer's Law atmospheric transmission of 0.9 per kilometer is assumed. The target contrast is 2K with a scene contrast temperature of 10K. A simulated image is shown on the upper-left side of Figure 10.16.

An issue with the imaging system in the search mode is that the dark sky-to-ground contrast consumes a large part of the system dynamic range. The information required for good target detection is contained in the range of 80 to 150 gray levels. The image is then processed such that a linear stretch maps the 80 to 150 gray level range into the 0 to 255 gray level output such that the information is mapped into the display range of 10 fL. This remapping is equivalent to lowering the scene contrast temperature (i.e., the temperature range that maps onto the display). Since 70 gray levels (150 minus 80) are now mapped to 256 gray levels, the dynamic range was changed by a factor of 0.27 so that the scene contrast temperature is 2.7K instead of 10K. This change in dynamic range provides for a higher probability of detection and increased range performance, as shown in Figure 10.17.

As described previously, there are many tone scale enhancements that can provide performance benefits. The primary benefit is a change in target contrast compared to: (1) the noise level of the system, (2) the contrast limit of the eye, or (3) the contrast of the background clutter (for detection cases). Many of these techniques are nonlinear, such as histogram equalization, where it is extremely difficult to judge the target contrast compared to the background contrast and eye contrast limits.
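The linear stretch used in this example is the same window-level operation discussed in Section 10.2; the short sketch below shows the remapping and the resulting change in scene contrast temperature, with the gray level window taken from the example above.

```python
import numpy as np

def linear_stretch(image, low=80, high=150):
    """Map the gray level window [low, high] linearly onto 0-255, clipping outside it."""
    out = (image.astype(np.float64) - low) * 255.0 / (high - low)
    return np.clip(out, 0, 255).astype(np.uint8)

# Equivalent reduction in scene contrast temperature for this example:
window = 150 - 80                                # 70 displayed gray levels
factor = window / 256.0                          # about 0.27
new_scene_contrast_temperature = 10.0 * factor   # about 2.7 K instead of 10 K
```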
Figure 10.16 Upper-left image is sensor output; lower-left graph is the corresponding histogram. Upper-right image uses a linear stretch tone scale process, and the lower-right graph is the corresponding histogram.
Figure 10.17 Sensor probability of detection as a function of range. Solid line is for the original image. Dashed line is for the modified image (with linear stretch).
In some cases, the performance has to be determined through perception experiments and other empirical means.
10.7 Summary

In this chapter, we studied the concepts of tone scale and various techniques, including the piece-wise linear, nonlinear, and perceptual linearization tone scales. Among
these techniques, the perceptual linearization tone scale is a more flexible technique to produce a desired output image when the scene contains a large dynamic range. An example of applying the tone scale to a radiation treatment illustrated that if the tone scale is coupled with other image contrast enhancement algorithms, as discussed in Chapter 8, it improves the image quality. The tone scale performance measurement was also provided in an example.
References
[1] Lee, H. -C., S. Daly, and R. L. Van Metter, "Visual Optimization of Radiographic Tone Scale," Proceedings of SPIE 3036, Medical Imaging 1997: Image Perception, Newport Beach, CA, February 26, 1997, pp. 118–129.
[2] Lee, H. -C., "Automatic Tone Adjustment by Contrast Gain-Control on Edges," U.S. Patent 6285798, September 4, 2001.
[3] Leachtenauer, J. C., and R. G. Driggers, Surveillance and Reconnaissance Imaging Systems—Modeling and Performance Prediction, Norwood, MA: Artech House, 2001.
[4] Lee, H.-C., Introduction to Color Imaging Science, Cambridge: Cambridge University Press, 2005.
[5] Central Imagery Office, USIS Standards and Guidelines, Appendix IV: Image Quality Guidelines, Vienna, VA: Central Imagery Office, 1995.
[6] CIE (Commission Internationale de l'Eclairage), Publication CIE 15.2, International Lighting Vocabulary, 4th ed., Vienna: CIE Central Bureau, 1987.
[7] Jahne, B., Digital Image Processing—Concepts, Algorithms, and Scientific Applications, 3rd ed., New York: Springer-Verlag, 1995.
[8] Michaelis, L., and M. Menten, "Die Kinetic der Invertinwerkung," Biochemische Zeitschrift, Vol. 49, 1913, p. 333.
[9] Hood, D. C., et al., "Human Cone Saturation as a Function of Ambient Intensity: A Test of Models of Shifts in the Dynamic Range," Vision Research, Vol. 18, 1978, pp. 983–993.
[10] Hunt, R. W. G., "Revised Colour-Appearance Model for Related and Unrelated Colours," COLOR Research and Application, Vol. 16, 1991, pp. 146–165.
[11] Barten, G. J., "Physics Model for the Contrast Sensitivity of the Human Eye," Proc. of SPIE Vol. 1666, Human Vision, Visual Processing, and Digital Display III, San Jose, CA, February 10, 1992, pp. 57–72.
[12] NEMA/DICOM, Grayscale Standard Display Function, PS 3.14–2004, National Electrical Manufacturers Association, 2004.
[13] Leachtenauer, J. C., G. Garney, and A. Biache, "Effects of Monitor Calibration on Imagery Interpretability," Proceedings of the PICS Conference, Portland, OR, 2000, pp. 124–129.
[14] Leachtenauer, J. C., and N. L. Salvaggio, "Color Monitor Calibration for Multispectral Displays," Digest of Technical Papers, Society for Information Display, Vol. XXVIII, Boston, MA, 1997, pp. 1037–1040.
[15] Muka, E., H. Blume, and S. Daly, "Display of Medical Images on CRT Soft-Copy Displays: A Tutorial," Proc. SPIE, Vol. 2431, Medical Imaging 1995: Image Display, San Diego, CA, February 26, 1995, pp. 341–359.
[16] Muka, E., and B. R. Whiting, "On the Human Visual System Intrascene Luminance Dynamic Range," Proceedings of SPIE 4686, Medical Imaging 2002: Image Perception, Observer Performance, and Technology Assessment, San Diego, CA, February 26, 2002, pp. 169–176.
[17] Eichelberg, M., et al., "Consistency of Softcopy and Hardcopy: Preliminary Experiences with the New DICOM Extensions for Image Display," Proc. of SPIE Vol. 3980, Medical Imaging 2000: PACS Design and Evaluation: Engineering and Clinical Issues, San Diego, CA, February 15, 2000, pp. 97–106.
[18] Bressan, M. C., et al., "Local Contrast Enhancement," Proc. of SPIE-IS&T Vol. 6493, Color Imaging XII: Processing, Hardcopy, and Applications, San Jose, CA, January 30, 2007, pp. 64930Y-1–12.
CHAPTER 11
Image Fusion

We conclude this book by discussing a relatively new advanced signal and image processing technique called image fusion. The concepts and advantages of image fusion are presented. Various image fusion algorithms and image quality metrics are considered. Imaging system performance methods with image fusion are also proposed.
11.1 Introduction

Image fusion is a method or technique for producing a single image from two or more images, where the fused image has more complete information. If this method is performed correctly, its information is useful to human or machine interpretation. While image fusion algorithms have been around for a number of years, the determination of imaging system performance with image fusion is relatively new. Interest is growing in the ability to quantify the additional performance offered by more than one sensor and by the addition of image fusion to an imaging system.

The Lehigh University Web site [1] describes the advantages of image fusion in two ways (consider Figure 11.1). The sensor information produced in the imagery by two sensors is shown in Figure 11.1. Some of the information is redundant, which can improve the reliability of the system. That is, if one of the sensors is degraded in performance, the other sensor can effectively provide the information. The output of the two sensors can also be altogether different and complementary. In this case, the additional information can improve the overall system capability. An example of this is fused television and infrared imagery. The television sensor provides outstanding daytime imagery from the visible spectrum, and the infrared sensor provides good night vision. These two sensors are complementary in that the visible image is from reflected light, while a thermal infrared image is dominated by self-emitted radiation, and the signatures of objects can be quite different in the respective bands.

Image fusion has a large number of applications and is used frequently when one sensor cannot satisfy the requirements associated with a specific system requirement. For example, military helicopter operations are often constrained by the numerous environmental conditions associated with low-light levels and poor weather. Low-altitude operations are extremely difficult when flying over featureless terrain with low scene contrast. In addition, low-altitude operations cause obscurations with wind-blown and recirculated dust that can cause a vision "brownout." These conditions yield poor situational awareness and can result in loss of aircraft control or, worse, in loss of crew and aircraft.
269
270
Image Fusion Sensor A
Sensor B
Redundant information
Complementary information
Figure 11.1
Sensor information.
loss of aircraft control or, worse, in loss of crew and aircraft. Even at higher altitudes, fog, clouds, rain, and snow can cause loss of image contrast and result in hazardous situations. In this particular application, image fusion has been used successfully in the combination of infrared imagers and low-light-level television sensors. The benefits of fusion have been demonstrated by flight trials conducted in these degraded environmental conditions [2].
11.2 Objectives for Image Fusion

There are a number of objectives to be accomplished with fusion algorithms. First, all of the useful information from the source images should be extracted and used. Second, any artifacts associated with the image fusion process that would impact system performance (whether the output is consumed by humans or by machine vision) should be minimized. Examples of such artifacts are misregistration blurring, mismatched dynamic range, and ripples due to edge enhancement. A fusion algorithm sometimes assumes that the input images are already registered spatially; in many cases, this assumption is not valid, and the misregistration compromises the performance of the fusion algorithm. Again, the goal of a good image fusion process is to combine the relevant information from two or more images into a single image of higher information content and fidelity. The relevance of the information depends on the application for which the fusion algorithm is being used. The optimal fusion approach is a subject that many researchers continue to investigate [3].
There have been a number of investigations to determine metrics for choosing an optimal fusion algorithm for a given application. These metrics provide methods for comprehensive, objective characterization of image fusion performance using a fusion evaluation framework. The framework allows for the quantification of the information contributed by each input sensor, the fusion gain, the fusion information loss, and fusion artifacts, and it has been demonstrated through the evaluation of an extensive set of multisensor images fused with a significant number of established fusion algorithms. There have also been subjective tests associated with the evaluation of fusion algorithms [4].
Relative fusion quality, fusion performance robustness, and personal preference were investigated as different aspects of fusion performance. In addition, these subjective tests were correlated with objective fusion metrics to determine the subjective relevance of the objective metrics.
Petrovic [5] states that while it is relatively straightforward to obtain a fused image (such as an average of the signals), assessing the performance of fusion algorithms is much more difficult. This is particularly true when the fused image is provided for display to a human operator. The most direct and reliable fusion evaluation method is human subjective testing, which is both expensive and time consuming. There have been some successful efforts toward modeling visible signal differences as an objective evaluation of fusion algorithm performance. One such approach is the visible difference prediction process, which evaluates the probability that a signal difference between each of the inputs and the fused image can be detected by the human visual system. Petrovic has shown that this modeling approach is more accurate in predicting algorithm performance than other objective measures.
As a guideline of what fusion algorithms should and should not do, Table 11.1 provides a summary of characteristics suggested by many of the fusion papers cited in this chapter.
11.3 Image Fusion Algorithms

A large number of image fusion algorithms can be successfully applied to two or more images to produce a fused image. The simplest methods operate at the pixel level on the two or more input images and produce a pixel-level fused output image. The superposition algorithm is one such case and is described later in this section.
Pyramid methods [6] are multiresolution by nature. Image pyramids are collections of lowpass or bandpass copies of an original image in which the passband and sample density are reduced in regular steps. Feature selection rules are used to construct a fused pyramid representation from the component images, and the composite image is obtained by taking the inverse pyramid transform.
Table 11.1 Important Fusion Algorithm Characteristics

Fusion should:
- Form a composite image from two or more images
- Preserve important details in the component images
- Improve image content
- Make it easier for a user to detect, recognize, and identify targets
- Increase situation awareness
- Increase system reliability (increase confidence) with redundant data
- Increase capability with complementary data
- Be consistent and stable with input image sequences

Fusion should not:
- Introduce artifacts or distortion
- Introduce spurious pattern elements
- Decrease sensitivity or resolution
- Distract the observer from executing important tasks
Some examples of pyramid fusion schemes are the Laplacian pyramid, the ratio of lowpass (ROLP) pyramid, the gradient pyramid, and the morphological pyramid.
Discrete wavelet transform (DWT) fusion algorithms are also multiscale. DWT coefficients are fused pixel by pixel, for example by taking the average of the approximation coefficients at the highest transform scale and the larger-absolute-value coefficient of the detail coefficients at the various transform scales. An inverse DWT is then taken to obtain the fused image. There are also advanced DWT (aDWT) approaches that incorporate principal component analysis or morphological processing into the coefficient fusion.
Here, we describe a number of fusion algorithms that can be used to fuse two or more images. The algorithms provided in this section are just a few of those available in the open literature that have good prospects for real-time implementation. There are many more algorithms in the literature that may provide good performance in particular applications.

11.3.1 Superposition
The simplest image fusion algorithm is the superposition method. The output image is a superposition of the input images,

I(x, y) = \alpha A(x, y) + \beta B(x, y)   (11.1)
where x and y are pixel locations and α and β are nonnegative scalars that add to unity. In its simplest form, α and β are both 0.5 and the output image is a 50 percent/50 percent combination of the two input images. This superposition algorithm is used frequently in image intensifier/infrared fused imagers, where the superposition is implemented with an optical beam splitter or electronically. When an optical beam splitter is used, α and β are sometimes fixed and cannot be changed to adapt to imaging conditions. However, there are implementations where α and β can be varied continuously so that the observer can adapt to scene conditions; one example is an analog image summation in which the amounts of image A and image B are controlled with a variable resistor. Such "image overlay" techniques offer the advantage of little or no computational cost for economical systems.
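For digital implementations, (11.1) is a one-line operation. The sketch below is a minimal illustration, assuming two registered, equally sized 8-bit input arrays; the function and variable names are ours, not taken from the literature.

```python
import numpy as np

def superposition_fuse(img_a, img_b, alpha=0.5):
    """Pixel-level superposition fusion of (11.1): I = alpha*A + beta*B, beta = 1 - alpha."""
    beta = 1.0 - alpha                                   # nonnegative weights that sum to unity
    fused = alpha * img_a.astype(np.float64) + beta * img_b.astype(np.float64)
    return np.clip(fused, 0, 255).astype(np.uint8)       # back to the 8-bit display range

# Example: a 50/50 blend of two registered frames.
# fused = superposition_fuse(swir_frame, lwir_frame, alpha=0.5)
```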
11.3.2 Laplacian Pyramid
There has been a great deal of success in representing images with localization in the spatial domain as well as in the spatial frequency domain. This is accomplished by decomposing images into spatial frequency component bandpass images. These component images represent the image at various levels of fineness or detail, in a manner similar to the function of the human eye. One particular approach to the decomposition of images, the Laplacian pyramid, is versatile, efficient, and convenient [7].
The first step in constructing a Laplacian pyramid is to lowpass filter and decimate the original image G_0(x, y) to obtain G_1(x, y), a "reduced" version of G_0(x, y) [8]. The reduction corresponds to a decrease in image resolution and sample density. Filtering is accomplished with a procedure similar to convolution with a family of local, symmetric weighting functions. The functions resemble the Gaussian probability distribution, so the sequence of images G_0(x, y), G_1(x, y), G_2(x, y), ..., G_N(x, y) is called the Gaussian pyramid.
Consider an example for an image G_0(x, y) with C columns and R rows of pixels. This image becomes the bottom, or zero, level of the Gaussian pyramid. The pyramid level 1 image G_1(x, y) is a lowpass, or reduced, version of G_0(x, y) in which each pixel value of G_1(x, y) is a weighted average of values from G_0(x, y) within a 5 × 5 pixel window. Each pixel value in level 2 is, in turn, a weighted average of values in level 1, and so on. The size of the window is not critical to the method, but a 5 × 5 window provides good filtering and is relatively efficient. The level-to-level process is performed by the REDUCE function

G_l(x, y) = \text{REDUCE}\left[ G_{l-1}(x, y) \right]

where, for levels 0 < l < N and 0 < x < C and 0 < y < R,

G_l(x, y) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m, n)\, G_{l-1}(2x + m,\, 2y + n)   (11.2)
The C and R are the number of columns and rows at the particular level. For simplicity, the weighting function, sometimes called the generating kernel, w(m, n), is chosen to be separable: w(m, n) = w'(m) w'(n), where convenient values are w'(0) = a, w'(1) = w'(-1) = 0.25, and w'(2) = w'(-2) = 0.25 - a/2. A typical value for a is 0.4. Pyramid construction is equivalent to convolving the original image with a set of weighting functions whose widths double from level to level. The convolution acts as a lowpass filter, and the bandlimit is reduced by one octave for each level. The series of lowpass images is called the Gaussian pyramid.
For some purposes, bandpass rather than lowpass images are desired; these can be obtained by subtracting image levels of the Gaussian pyramid. Since the levels differ in sampling density, it is necessary to interpolate between the low-density samples before the images are subtracted. Interpolation can be achieved by reversing the REDUCE process (a process called EXPAND). Let G_{l,k} be the image obtained by applying the EXPAND operation to G_l a total of k times. That is,
G_{l,k}(x, y) = \text{EXPAND}\left[ G_{l,k-1}(x, y) \right]   (11.3)

or

G_{l,k}(x, y) = 4 \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m, n)\, G_{l,k-1}\!\left( \frac{x + m}{2},\, \frac{y + n}{2} \right)
where terms contribute only when (x + m)/2 and (y + n)/2 are integers. The EXPAND process doubles the image size with each iteration. Now that the Gaussian pyramid has been defined and the EXPAND process provides interpolated images from the Gaussian pyramid, the Laplacian pyramid can be defined as the difference between Gaussian pyramid levels,
L_l(x, y) = G_l(x, y) - \text{EXPAND}\left[ G_{l+1}(x, y) \right]   (11.4)
The result is N levels of bandpass images in which the images are reduced in frequency content and sample density. A beneficial property of the Laplacian pyramid is that it is a complete image representation: the original image can be retrieved by reversing the Laplacian pyramid process. The Laplacian pyramid can be used for numerous applications, including data compression, image analysis, and image fusion.
Image fusion can be accomplished in a number of ways with the Laplacian pyramid, but a straightforward method is to first construct the Laplacian pyramids for the two images, LA_l(x, y) and LB_l(x, y). Next, the pyramid images are compared at all nodes, and the larger absolute pixel value at each (x, y) is taken to form a new Laplacian pyramid LC_l(x, y). The reverse Laplacian pyramid process is then performed to obtain the fused image.
An example of Laplacian pyramid fusion is shown in Figure 11.2. Two images of a high mobility military vehicle (HMMV) are shown, one in the shortwave infrared (SWIR) and one in the longwave infrared (LWIR). The SWIR image sees clearly through the windshield and shows a driver and a passenger. The LWIR image cannot see through the windshield, but it easily detects a soldier in the trees near the HMMV. The fused image shows the benefits of both SWIR and LWIR sensors in a single image. In Figure 11.3, the Gaussian pyramid image sequence and the Laplacian image sequence are shown for the SWIR image. The image sequence demonstrates the levels of resolution and/or bandpass that are used to combine the images.
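The sketch below illustrates this max-absolute-value Laplacian pyramid fusion under stated assumptions: the inputs are registered, same-size grayscale arrays; the generating kernel uses a = 0.4; and the coarsest (Gaussian) level is fused by simple averaging, a common but not unique choice. The helper names are ours, and this is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def generating_kernel(a=0.4):
    w = np.array([0.25 - a / 2.0, 0.25, a, 0.25, 0.25 - a / 2.0])
    return np.outer(w, w)                                    # separable 5 x 5 weighting function

def reduce_level(img, kernel):
    return convolve(img, kernel, mode="reflect")[::2, ::2]   # lowpass filter, then decimate by 2

def expand_level(img, kernel, out_shape):
    up = np.zeros(out_shape)
    up[::2, ::2] = img                                       # zero interleave
    return 4.0 * convolve(up, kernel, mode="reflect")        # interpolate; gain of 4 restores the mean

def laplacian_pyramid(img, levels, kernel):
    gauss = [img.astype(np.float64)]
    for _ in range(levels):
        gauss.append(reduce_level(gauss[-1], kernel))
    lap = [g - expand_level(g_next, kernel, g.shape)         # (11.4)
           for g, g_next in zip(gauss[:-1], gauss[1:])]
    return lap + [gauss[-1]]                                 # bandpass levels plus the top Gaussian level

def fuse_laplacian(img_a, img_b, levels=4):
    k = generating_kernel()
    pa = laplacian_pyramid(img_a, levels, k)
    pb = laplacian_pyramid(img_b, levels, k)
    fused = [np.where(np.abs(la) >= np.abs(lb), la, lb)      # keep the larger-magnitude detail
             for la, lb in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))                    # average the coarsest level (assumption)
    out = fused[-1]
    for level in reversed(fused[:-1]):                       # collapse: reverse of the pyramid process
        out = level + expand_level(out, k, level.shape)
    return out
```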
Figure 11.2 Laplacian pyramid example: SWIR, LWIR, and fused images. (Images courtesy of Mike Prairie and Roy Littleton, U.S. Army Night Vision and Electronic Sensors Directorate.)
Figure 11.3 Gaussian pyramid and Laplacian pyramid image sequences, showing the repeated reduce, expand, and difference steps applied to the original image. (Images courtesy of Mike Prairie, U.S. Army Night Vision and Electronic Sensors Directorate.)
11.3.3 Ratio of a Lowpass Pyramid
Toet et al. [9] developed a hierarchical image-merging scheme based on multiresolution contrast decomposition, the ratio of a lowpass pyramid (ROLP). The algorithm preserves the details from the input images that are most relevant to visual perception. Judgments on the relative importance of pattern segments are based on their local luminance contrast. Input images are first decomposed into sets of light and dark blobs at different levels of resolution. Afterward, the contrasts of blobs at corresponding locations (and corresponding resolutions) are compared. Image fusion is achieved by selecting the blobs with maximum absolute luminance contrast. Fused images are constructed from the selected sets of blobs. The result is that, perceptually, the important details with high local luminance contrast are preserved.
Constructing an ROLP pyramid is similar to constructing a Laplacian pyramid. A sequence of images is obtained in which each image is a lowpass filtered and subsampled version of its predecessor. Let G_0(x, y) be the original image, the zero level of the pyramid structure. The pyramid has levels l, where l runs from 1 to N and N is the top level of the pyramid. A node of pyramid level l is obtained as a Gaussian-weighted average of the nodes at level l - 1 that are positioned within a 5 × 5 window centered on that node. The Gaussian pyramid is constructed as outlined in the previous section. Toet describes a ratio of images,

R_i(x, y) = \frac{G_i(x, y)}{\text{EXPAND}\left[ G_{i+1}(x, y) \right]} \quad \text{for } 0 \le i \le N - 1

and

R_N(x, y) = G_N(x, y)   (11.5)
For every level, R_i(x, y) is the ratio of two successive levels in the Gaussian pyramid. Luminance contrast is defined as C = (L - L_b)/L_b = (L/L_b) - 1, where L is the luminance at a particular location and L_b is the luminance of the local background. If I(x, y) = 1 for all x, y, then the contrast can be written for the entire image as

C(x, y) = \frac{G_i(x, y)}{\text{EXPAND}\left[ G_{i+1}(x, y) \right]} - I(x, y)   (11.6)

so that

R_i(x, y) = C(x, y) + I(x, y)   (11.7)

Therefore, the sequence R_i(x, y) is called the ROLP or contrast pyramid. The original image can be recovered from the ROLP pyramid by

G_i(x, y) = R_i(x, y)\, \text{EXPAND}\left[ G_{i+1}(x, y) \right]   (11.8)

and

G_N(x, y) = R_N(x, y)   (11.9)
Toet describes the merging of two images as a three-step process. First, an ROLP pyramid is constructed for each of the source images. Given that the two source images are registered and have the same dimensions, an ROLP pyramid is then constructed for the composite image by taking values from the nodes of the component pyramids. The selection rules can vary, but a convenient one is to take the maximum absolute contrast,

RC_l(x, y) = \begin{cases} RA_l(x, y) & \text{if } \left| RA_l(x, y) - I(x, y) \right| \ge \left| RB_l(x, y) - I(x, y) \right| \\ RB_l(x, y) & \text{otherwise} \end{cases}   (11.10)

where RA and RB are the ROLP pyramids of the two source images and RC is the ROLP pyramid of the fused output. The composite image is obtained through the EXPAND-and-multiply procedure. An example of ROLP fusion is given in Figure 11.4, where an image intensifier image is fused with an LWIR image [10]. For objects with good thermal contrast, the LWIR image provides good detail, and the image intensifier provides good reflectivity contrast in the near infrared. In this case, the image intensifier image is saturated in some regions, but the best contrast is maintained in the composite image.
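As a sketch of how (11.5) through (11.10) fit together, the following uses OpenCV's pyrDown/pyrUp as stand-ins for the REDUCE and EXPAND operations (an assumed dependency; any Gaussian pyramid implementation would do). The inputs are assumed to be registered, nonnegative grayscale arrays, and fusing the coarsest level by averaging is a choice on our part rather than part of Toet's description.

```python
import numpy as np
import cv2  # OpenCV, assumed available

EPS = 1e-6  # guards against division by zero in the ratios

def rolp_pyramid(img, levels):
    """Contrast (ROLP) pyramid of (11.5): R_i = G_i / EXPAND[G_(i+1)], with R_N = G_N."""
    gauss = [img.astype(np.float64) + EPS]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    ratios = [g / (cv2.pyrUp(g_next, dstsize=(g.shape[1], g.shape[0])) + EPS)
              for g, g_next in zip(gauss[:-1], gauss[1:])]
    return ratios + [gauss[-1]]

def fuse_rolp(img_a, img_b, levels=4):
    ra = rolp_pyramid(img_a, levels)
    rb = rolp_pyramid(img_b, levels)
    fused = [np.where(np.abs(a - 1.0) >= np.abs(b - 1.0), a, b)   # (11.10): maximum absolute contrast
             for a, b in zip(ra[:-1], rb[:-1])]
    out = 0.5 * (ra[-1] + rb[-1])                                 # coarsest level: simple average
    for r in reversed(fused):                                     # (11.8): G_i = R_i * EXPAND[G_(i+1)]
        out = r * cv2.pyrUp(out, dstsize=(r.shape[1], r.shape[0]))
    return out
```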
11.3.4 Perceptual-Based Multiscale Decomposition
Jiang et al. [11] demonstrated a combination of multiple-scale fusion and perceptual-based fusion as a means of combining infrared and visual images for human detection. In this paper, Jiang described some of the problems associated with previous fusion algorithms.
Figure 11.4 ROLP pyramid example: image intensifier, LWIR, and fused images. (Images courtesy of Stephen Burks and Chris Howell, U.S. Army Night Vision and Electronic Sensors Directorate.)
Some of these algorithms were pyramid decompositions, such as the Laplacian pyramid, the ROLP pyramid, and the gradient pyramid. A multiresolution pyramid by Toet et al. [9] utilized the maximum contrast information in the ROLP pyramid. These pyramids tend to select the noisier parts of the image, since those parts are high in contrast. Jiang goes on to describe a number of approaches to reduce the noise and provide better SNR over a wide range of noise levels and backgrounds.
The approach offered by Jiang et al. combines multiscale fusion and perceptual-based fusion implemented in the spatial domain (operating on every pixel in the image). The algorithm is based on the contrast sensitivity function of the human visual system. It emphasizes how a human observer would perceive the differences between the two images and decides whether to average the image details or to choose the details from one image and discard the other. The perceptual contrast difference is described as

D = \frac{\left| S_{vis} - S_{IR} \right|}{S_{vis} + S_{IR}}   (11.11)
where S_{vis} and S_{IR} are the saliencies calculated from a pixel and its neighbors in the visible and infrared images, and saliency is the prominence of a pixel relative to its neighbors. The saliency can be described by

S(x, y) = \left| I(x, y) - I_{ave}(x, y) \right|   (11.12)
where I_{ave}(x, y) is the average intensity value of the pixel and its neighbors. It is calculated by summing the pixel and its neighbors and dividing by the total number of pixels summed. In an M × M scale fusion,

I_{ave}(x, y) = \frac{1}{M \times M} \sum_{s=-a}^{a} \sum_{t=-b}^{b} I(x + s,\, y + t)   (11.13)
where a = (M - 1)/2 and b = (M - 1)/2. The scale M is the fusion scale and takes values of 3, 5, 7, 9, 11, ..., resulting in the term multiple-scale fusion.
Once the perceptual contrast difference D is determined, it is compared to a threshold; in Jiang's work, the threshold was set to 0.25 for all fusion scales. A high threshold yields noisy images, and a low threshold yields a weak fusion algorithm. If D is greater than the threshold, the pixel with the higher saliency is retained and the pixel from the other image is discarded. If D is less than the threshold, then weights are computed for each pixel,

W_{vis} = 1 - 0.5\, \frac{S_{IR}}{S_{vis}} \quad \text{and} \quad W_{IR} = 1 - W_{vis}   (11.14)
and the fused image pixel becomes

I_{fused}(x, y) = W_{vis} I_{vis}(x, y) + W_{IR} I_{IR}(x, y)   (11.15)
To sharpen the features of the fused images, the Laplacian is used to enhance the fused image,

g(x, y) = f(x, y) - \nabla^2 f(x, y)   (11.16)
where g(x, y) is the enhanced image and f(x, y) is the image before enhancement. The Laplacian emphasizes discontinuities and deemphasizes slowly varying gray-level regions. The enhancement can be implemented directly as

g(x, y) = 8 f(x, y) - \left[ f(x+1, y) + f(x-1, y) + f(x, y-1) + f(x, y+1) + f(x-1, y-1) + f(x-1, y+1) + f(x+1, y-1) + f(x+1, y+1) \right]   (11.17)
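A single-scale sketch of this scheme is given below, using a box average for I_ave and the 8-neighbor kernel of (11.17) for the final sharpening. Jiang et al. run the procedure at multiple scales M = 3, 5, 7, ...; here a single scale is shown for brevity, and the clipping, epsilon guard, and parameter names are our assumptions rather than details taken from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter, convolve

LAPLACIAN_8 = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=np.float64)   # kernel form of (11.17)

def perceptual_fuse(vis, ir, scale=5, threshold=0.25, sharpen=True):
    vis = vis.astype(np.float64)
    ir = ir.astype(np.float64)
    eps = 1e-6
    # Saliency, (11.12)-(11.13): prominence of a pixel relative to its M x M neighborhood mean.
    s_vis = np.abs(vis - uniform_filter(vis, size=scale))
    s_ir = np.abs(ir - uniform_filter(ir, size=scale))
    # Perceptual contrast difference, (11.11).
    d = np.abs(s_vis - s_ir) / (s_vis + s_ir + eps)
    # Below threshold: weighted average of the two bands, (11.14)-(11.15).
    w_vis = np.clip(1.0 - 0.5 * (s_ir / (s_vis + eps)), 0.0, 1.0)
    weighted = w_vis * vis + (1.0 - w_vis) * ir
    # Above threshold: keep the more salient pixel, discard the other.
    selected = np.where(s_vis >= s_ir, vis, ir)
    fused = np.where(d > threshold, selected, weighted)
    if sharpen:
        fused = fused + convolve(fused, LAPLACIAN_8, mode="reflect")  # (11.16) with the kernel above
    return fused
```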
An example of multiscale decomposition fusion is provided in Figure 11.5. The lower-left image is an MWIR image and the lower-right image is an LWIR image; the image on the top is the fused MWIR/LWIR image. Note that both the solar glint in the MWIR image and the lines in the road in the LWIR image show up in the fused image. The fused image retains the high-contrast objects from each input, such as the manhole cover in the MWIR.

11.3.5 Discrete Wavelet Transform
The discrete wavelet transform is a multiresolution approach that can be used to decompose images into detail and average channels [12].
Figure 11.5 Multiscale decomposition example: MWIR, LWIR, and fused images. (Images courtesy of Khoa Dang, U.S. Army Night Vision and Electronic Sensors Directorate.)
The channels contain all of the image information, so sensor fusion can be achieved using the wavelet coefficients. Unlike the Fourier transform, the wavelet image decomposition is a multiresolution process that contains both frequency and spatial information. The detail (γ) channels of the transform are comprised of localized orientation information about the image at different scales, and the average (α) channel of the transform contains general information about the image at different scales. It has been demonstrated that selectively ignoring coefficients within the detail and average channels can lead to large compression factors (200:1 or more). Also, fast algorithms have been developed for real-time implementation.
Huntsberger and Jawerth [12] describe how the standard wavelet decomposition of a signal, f, can be written as a linear combination of translated dilates of a single function ψ,

f(x) = \sum_{v} \sum_{k} c_{v,k}\, \psi_{v,k}(x)   (11.18)

where

\psi_{v,k}(x) = 2^{v/2}\, \psi\!\left( 2^{v} x - k \right)   (11.19)
for integers v and k. The process of mapping f(x) into the coefficients c_{v,k} is called the wavelet transform of f(x). The function ψ_{v,k} varies on a scale of 2^v at the location 2^{-v}k, and the corresponding coefficient c_{v,k} depends on the function f(x) at that scale and location.
Wavelet transform sensor fusion is accomplished by using the wavelet coefficients of the two source images in the integration of the two images. AND and OR operations are performed in the channels of the wavelet transform and can be applied in uncompressed or compressed form. The wavelet transform approach also lends itself to real-time operation because the algorithm can be implemented on massively parallel processors.
The discrete wavelet transform is not shift-invariant, since the image is downsampled (each wavelet series image is downsampled by a factor of 4). That is, the output depends on shifts in the image, so artifacts can be present when the two images being fused are not perfectly aligned (registered). The shift-invariant discrete wavelet transform (SIDWT) is a solution to this shift-variance problem: the downsampling operation is removed from the wavelet decomposition process. However, since the images are not reduced in size, the operation is inefficient, as the number of calculations grows significantly compared to the DWT. The SIDWT is well suited to component images that are not exactly registered.
Figure 11.6 shows a DWT fusion example. Image 1 shows a scene with a target in the foreground, and image 2 shows the same scene with the target in the background. The DWT fusion provides an output image in which both target positions appear. A second DWT fusion example is shown in Figure 11.7, where a visible image is fused with an LWIR image. The fused image has characteristics associated with both the reflective properties of visible imagers and the emissive properties of LWIR imagers.
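A minimal DWT fusion sketch is shown below using the PyWavelets package (an assumed dependency) and the rule described earlier in this section: average the approximation coefficients and keep the larger-magnitude detail coefficients. The wavelet choice and decomposition depth are arbitrary illustration values, and the two inputs are assumed to be registered arrays of the same size.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def dwt_fuse(img_a, img_b, wavelet="db2", levels=3):
    """Fuse two registered, same-size grayscale images in the wavelet domain."""
    ca = pywt.wavedec2(img_a.astype(np.float64), wavelet, level=levels)
    cb = pywt.wavedec2(img_b.astype(np.float64), wavelet, level=levels)
    fused = [0.5 * (ca[0] + cb[0])]                          # average the coarsest approximation band
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)   # max-|coefficient| detail rule
                           for x, y in ((ha, hb), (va, vb), (da, db))))
    return pywt.waverec2(fused, wavelet)
```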
11.4 Benefits of Multiple Image Modes

Previous sections of this chapter discussed the general benefits of image fusion algorithms. This section presents the specific benefits of fusing images from two or more spectral bands.
Figure 11.6 Discrete wavelet transform example. Image 1 has the target in the foreground, image 2 has the target in the background, and the fused image contains both targets. (Images courtesy of Jonathan Fanning, U.S. Army Night Vision and Electronic Sensors Directorate.)
Figure 11.7 A second discrete wavelet transform example: visible, LWIR, and fused images. (Images courtesy of Richard Moore, University of Memphis.)
For example, the fusion of infrared and visible imaging sensor outputs into a single video stream for human consumption has been demonstrated to improve concealed weapon detection [13]. There are numerous benefits associated with the fusion of various bands that have been either quantified or demonstrated. Table 11.2 describes a wide array of benefits and the corresponding input imager modes, along with references for further information on the demonstration and/or performance quantification. The table is not exhaustive; it is a brief summary of some of the literature from fusion researchers working in applications.
11.5 Image Fusion Quality Metrics

One method for evaluating the effectiveness of fusion algorithms is image-quality metrics. A number of image metrics are used to determine the quality of the image-fusion process. These metrics have not been calibrated to human performance in a general sense, but they are quantitative calculations that can measure the additional contrast associated with the fusion of two or more input images. Many papers in the literature compare fusion algorithms with image-quality metrics, and a few papers provide calculations of image-quality metrics on various fusion algorithms. The assessments are, in most cases, qualitative judgments of whether a given image-quality metric is a good measure of fusion algorithm performance. In this section, we provide a few examples of image-quality metrics.
Table 11.2 Benefits of Image Fusion

Input Modes | Fusion Benefit | Reference
Infrared and visible | Day and night capability; feature enhancements; concealed weapon detection | [11, 13]
Infrared and low-light-level TV | Improved pilotage in poor weather/severe environments | [2]
MWIR and LWIR | Target/decoy discrimination; defeat of IR countermeasures; maritime clutter reduction | [14, 15]
Infrared and millimeter wave | Penetration of fog, snow, and smoke for aircraft; improved landing capability under low-contrast conditions | [16]
Infrared and image intensifier | Improved pilotage image interpretability; enhanced night driving | [17, 18]
Visible image sequences | Three-dimensional image reconstruction; increased depth of microscope focus | [19, 20]
Magnetic resonance and computed tomography | Improved spinal diagnostics | [21]
Landsat and SAR | Improved rainforest/vegetation mapping | [22]
Various-resolution panchromatic satellite images (IKONOS and Landsat) | Improved visual interpretation; more precise segmentation and classification | [23]
SAR and ISAR | Improved target detection | [24]
These examples are typical methods for algorithm assessment and are by no means exhaustive; no indication is given as to which provides the best method for algorithm assessment. We begin with a few metrics that require a reference image (a perfect image for comparison), which is usually an ideal fused image. We then describe a few image-quality metrics for which no reference is required and the goal is to determine the information that has been transferred from the input images to the fused image.

11.5.1 Mean Squared Error

The mean squared error (MSE) [25] between two images (the reference image R and the fused image F) is

\text{MSE}(R, F) = \frac{1}{M \times N} \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ R(i, j) - F(i, j) \right]^2   (11.20)

where the root mean squared error (RMSE) is the square root of (11.20).
11.5.2 Peak Signal-to-Noise Ratio

The peak signal-to-noise ratio (PSNR) [26] between the two images, in decibels (dB), is then computed using the RMSE,

\text{PSNR}(R, F) = 20 \log_{10} \left( \frac{255}{\text{RMSE}(R, F)} \right)   (11.21)
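Both metrics are direct to compute; the sketch below assumes 8-bit imagery (peak value 255) and grayscale arrays of equal size, and the function names are ours.

```python
import numpy as np

def mse(ref, fused):
    diff = ref.astype(np.float64) - fused.astype(np.float64)
    return np.mean(diff ** 2)                                  # (11.20)

def psnr(ref, fused, peak=255.0):
    return 20.0 * np.log10(peak / np.sqrt(mse(ref, fused)))    # (11.21), in dB
```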
11.5.3 Mutual Information
Mutual information (MI) measures the degree of dependence between two random variables, A and B. Mutual information is described by the Kullback-Leibler measure [27],

I_{AB}(a, b) = \sum_{a, b} p_{AB}(a, b) \cdot \log \frac{p_{AB}(a, b)}{p_A(a) \cdot p_B(b)}   (11.22)

where p_{AB}(a, b) is the joint distribution and p_A(a) \cdot p_B(b) is the case of completely independent distributions. For two input images A and B and a fused image F, the amount of information that F contains from A is

I_{F,A}(f, a) = \sum_{f, a} p_{FA}(f, a) \cdot \log \frac{p_{FA}(f, a)}{p_F(f) \cdot p_A(a)}   (11.23)

and the amount of information that F contains from B is

I_{F,B}(f, b) = \sum_{f, b} p_{FB}(f, b) \cdot \log \frac{p_{FB}(f, b)}{p_F(f) \cdot p_B(b)}   (11.24)

Finally, the image performance can be estimated from

M_F^{AB} = \frac{I_{F,A}(f, a) + I_{F,B}(f, b)}{2}   (11.25)
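The sketch below estimates the distributions in (11.22) through (11.25) from a joint gray-level histogram; the number of histogram bins and the use of the natural logarithm are implementation choices on our part.

```python
import numpy as np

def mutual_information(img_x, img_y, bins=64):
    """I(X;Y) estimated from the joint gray-level histogram of two equal-size images."""
    joint, _, _ = np.histogram2d(img_x.ravel(), img_y.ravel(), bins=bins)
    p_xy = joint / joint.sum()                       # joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)            # marginal of X
    p_y = p_xy.sum(axis=0, keepdims=True)            # marginal of Y
    nz = p_xy > 0                                    # avoid log(0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

def fusion_mutual_information(img_a, img_b, fused, bins=64):
    """M_F^AB of (11.25): average of the information F shares with A and with B."""
    return 0.5 * (mutual_information(fused, img_a, bins) +
                  mutual_information(fused, img_b, bins))
```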
11.5.4 Image Quality Index by Wang and Bovik
The image quality index introduced by Wang and Bovik [28] begins with two real-valued sequences, x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n), where

\sigma_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{and} \quad \sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})   (11.26)

In this case, \bar{x} is the mean of x, \sigma_x^2 is the variance of x, and \sigma_{xy} is the covariance of x and y. The following is then a measure between -1 and +1 that describes the similarity between x and y,

Q_0 = \frac{4 \sigma_{xy}\, \bar{x}\, \bar{y}}{\left( \bar{x}^2 + \bar{y}^2 \right) \left( \sigma_x^2 + \sigma_y^2 \right)}   (11.27)

which can be written

Q_0 = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2 \bar{x} \bar{y}}{\bar{x}^2 + \bar{y}^2} \cdot \frac{2 \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}   (11.28)
The first factor in (11.28) is the correlation coefficient between x and y. Given that x_i and y_i are image gray-level values, the second factor is the luminance distortion and the third factor is the contrast distortion. Q_0 is usually measured on a local scale and then combined into an overall measure; Q_0 = 1 when x and y are identical. Wang and Bovik used a sliding window approach starting from the top-left corner of the two images a and b, with a window size of n. The window is moved pixel by pixel over the entire image, to the bottom-right corner. Given a window w, the local image quality Q_0(a, b | w) is calculated for the pixels a(i, j) and b(i, j) within it, where (i, j) defines the position of the sliding window. The global image quality is taken as the average of the local qualities,

Q_0(a, b) = \frac{1}{|W|} \sum_{w \in W} Q_0(a, b\, |\, w)   (11.29)
11.5.5 Image Fusion Quality Index by Piella and Heijmans

Piella and Heijmans [29] took the image quality index and tailored it for fusion algorithm evaluation. Their method does not require a reference image, so it is easily applied to fusion. They define a saliency, s(a | w), of an image within some window w. The saliency describes the relevance of the image within the window and may depend on contrast, sharpness, or entropy. A local weight λ(w) between 0 and 1 is then computed from the relative importance of image a with respect to image b,

\lambda(w) = \frac{s(a\, |\, w)}{s(a\, |\, w) + s(b\, |\, w)}   (11.30)
λ(w) is high when image a is more important. The fusion quality index is then

Q(a, b, f) = \frac{1}{|W|} \sum_{w \in W} \left\{ \lambda(w)\, Q_0(a, f\, |\, w) + \left[ 1 - \lambda(w) \right] Q_0(b, f\, |\, w) \right\}   (11.31)
where a and b are the two input images and f is the fused output image. In regions where image a has a large saliency, the fusion quality index is determined largely by image a (and similarly for image b). This metric is useful for determining the amount of salient information transferred from the two input images to the fused image.
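A sketch of (11.30) and (11.31) is given below, with local variance standing in for the saliency s(.|w) (Piella and Heijmans allow contrast, sharpness, or entropy, so this is one choice among several) and a coarse window step used for speed; both are our assumptions.

```python
import numpy as np

def _q0_window(a, b, eps=1e-12):
    ma, mb = a.mean(), b.mean()
    va, vb = a.var(ddof=1), b.var(ddof=1)
    cov = np.sum((a - ma) * (b - mb)) / (a.size - 1)
    return 4.0 * cov * ma * mb / ((ma ** 2 + mb ** 2) * (va + vb) + eps)

def piella_fusion_quality(img_a, img_b, fused, window=8, step=8, eps=1e-12):
    """Fusion quality index Q(a, b, f) of (11.31), with local variance as the saliency."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    f = fused.astype(np.float64)
    rows, cols = a.shape
    vals = []
    for i in range(0, rows - window + 1, step):
        for j in range(0, cols - window + 1, step):
            wa = a[i:i + window, j:j + window]
            wb = b[i:i + window, j:j + window]
            wf = f[i:i + window, j:j + window]
            lam = wa.var(ddof=1) / (wa.var(ddof=1) + wb.var(ddof=1) + eps)   # (11.30)
            vals.append(lam * _q0_window(wa, wf) + (1.0 - lam) * _q0_window(wb, wf))
    return float(np.mean(vals))
```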
Piella and Heijmans also developed variants of this fusion quality index, including a weighted fusion quality index that considers the human visual system (HVS) and weights the saliency values for visual tasks. Another modification they provided is the inclusion of edge information; that is, the weighting for edge information is increased, resulting in the edge-dependent fusion quality index. Both of these modifications can be found in [29].

11.5.6 Xydeas and Petrovic Metric
The Xydeas and Petrovic [30] metric measures the amount of edge information transferred from the source images to the fused image. A Sobel edge operator is used to calculate the strength of the edge, g(n, m), and the orientation of the edge, α(n, m), for each pixel in the input and output images. The "changes" in the strength and orientation values, G^{AF}(n, m) and A^{AF}(n, m), of an input image A with respect to the fused image F are defined as

G^{AF}(n, m) = \begin{cases} \dfrac{g_F(n, m)}{g_A(n, m)} & \text{if } g_A(n, m) > g_F(n, m) \\ \dfrac{g_A(n, m)}{g_F(n, m)} & \text{otherwise} \end{cases}   (11.32)

A^{AF}(n, m) = \frac{\left| \left| \alpha_A(n, m) - \alpha_F(n, m) \right| - \pi/2 \right|}{\pi/2}   (11.33)

These measures are then used to calculate preservation values for both strength and orientation,

Q_g^{AF}(n, m) = \frac{\Gamma_g}{1 + e^{\, k_g \left[ G^{AF}(n, m) - \sigma_g \right]}}   (11.34)

and

Q_\alpha^{AF}(n, m) = \frac{\Gamma_\alpha}{1 + e^{\, k_\alpha \left[ A^{AF}(n, m) - \sigma_\alpha \right]}}   (11.35)

The constants Γ_g, k_g, σ_g, Γ_α, k_α, and σ_α determine the shape of the sigmoid used to form the strength and orientation quantities. The overall edge preservation values are then

Q^{AF}(n, m) = Q_g^{AF}(n, m) \cdot Q_\alpha^{AF}(n, m), \quad 0 \le Q^{AF} \le 1   (11.36)

A normalized metric for the fusion of A and B into F is calculated as

Q^{AB/F} = \frac{\sum_{n=1}^{N} \sum_{m=1}^{M} \left[ Q^{AF}(n, m)\, w^A(n, m) + Q^{BF}(n, m)\, w^B(n, m) \right]}{\sum_{n=1}^{N} \sum_{m=1}^{M} \left[ w^A(n, m) + w^B(n, m) \right]}   (11.37)
Note that the edge preservation values are weighted by coefficients that describe the importance of edge components in the input images. Also, note that this metric describes the edge information transfer and does not describe region information transfer.
11.6 Imaging System Performance with Image Fusion

There are two standard military imager performance models: one for target acquisition (currently called the target task performance [TTP] model) and one for surveillance and reconnaissance performance, the general image quality equation (GIQE). Both models are used to predict sensor performance for a given military task. In this section, we focus on the target acquisition approach; similar techniques can be implemented in the surveillance/reconnaissance models.
To understand the performance of an imaging system with fusion, we first review the performance prediction of a broadband imager, such as an infrared system. In Figure 11.8, a performance metric (the TTP value, V) is determined by calculating resolvable frequencies (cycles) on target. The integral gives cycles per milliradian, and the s/R term is the number of milliradians the target subtends. The product gives a "resolvable cycles on target" value that is compared with the value required for tasks such as target detection, recognition, and identification. The integral accumulates the square root of "how much contrast my target has" over "how much contrast my sensor needs" to see the target. The result of the TTP metric, when compared to the required value, provides a probability of task performance as a function of range, as shown in Figure 11.9. This approach is described in detail in Chapter 3.
When two sensors are used simultaneously, such as an image intensifier and an infrared imager, the range curves can be plotted on one graph, as shown in Figure 11.9. One method of predicting the performance of fused imagers is the following: when images from the two sensors are fused, the highest range prediction is taken as the performance of the fused imager. Under various conditions, such as different targets, different atmospheric transmission, and different turbulence values, the performance curves for each sensor change, and the highest performance curve is taken as the system performance.
Figure 11.8 Current method for calculating resolvable cycles on target to determine range performance. The plot shows the system contrast threshold function and the apparent target contrast versus spatial frequency (cycles/mrad); the apparent target contrast is assumed to be constant across frequency. The TTP value is V = (s/R) \int_{\xi_{low}}^{\xi_{cut}} \sqrt{C(\xi)/CTF(\xi)}\, d\xi.
Figure 11.9 Range performance curves for two sensors: probability of identification versus range (km) for Sensor 1 and Sensor 2.
This is a very cursory approach to the treatment of sensor fusion performance, and it does not account for frequency-dependent signal fusion. However, it is a simple method nevertheless.
Using the "best" sensor performance curve of Figure 11.9 does not account for the performance benefit provided by complementary data in the source imagery. Work by Moore [personal communication with Richard Moore, 2007] demonstrates that two images with forced complementary data can provide an increase in performance beyond the best source image. Consider the process shown in Figure 11.10. Two images of number sequences are generated in which the content of the images is completely complementary. A white noise image is filtered and thresholded, and the result is a noise mask. The complement of this noise mask is also formed, and the two masks are applied to the number sequence image, resulting in two number sequence images made up of completely complementary information. Observers are required to identify the numbers shown as a function of font size for the complementary images. Then, they are required to do the same for a fused image (of the two complementary images). The fusion process (DWT) is shown in Figure 11.11 along with the two source images. These images were run through a forced-choice experiment to determine an observer's ability to identify the numbers, and the probability was plotted as a function of font size (relative font size).
Figure 11.10 Image examples of forced complementary data: white noise is thresholded to form a noise mask, and the mask and its complement are applied to a clean number image to produce source bands 1 and 2. (Images courtesy of Richard Moore, University of Memphis.)
Figure 11.11 Two complementary source images and the DWT-fused image. (Images courtesy of Richard Moore, University of Memphis.)
The results are shown in Figure 11.12. While this was a "contrived" experiment in which the information in the images was forced to be complementary, it demonstrates a performance advantage with complementary image data. It also shows that taking the highest performance curve of Figure 11.9 does not provide accurate performance estimates for sensors with a significant complementary information component.
Note that the contrast of the target in Figure 11.8 is assumed to be constant over frequency. This approach to single-imager performance has worked well over the past 50 years, and it is possible that for small targets the frequency distribution is so broad that the assumption is valid. However, recent experiments, such as facial identification [31], have required a target contrast with spatial frequency dependency. Consider the case in which the apparent target contrast is a function of spatial frequency, as shown in Figure 11.13 and demonstrated as a new metric by Vollmerhausen [32]. In this case, the target spectrum is compared to the required sensor contrast and integrated to provide a task performance metric. This has been demonstrated to provide a more general approach to target acquisition performance prediction.
With the frequency-dependent performance metric, fusion performance can be better predicted. Reynolds proposed a metric that is calculated in the same manner that good fusion algorithms are implemented, on a frequency-dependent basis [personal communications with Dr. Joe Reynolds, 2007]. Consider Figure 11.14, where two sensors are very different in system contrast threshold performance (i.e., the CTF curves for the two sensors are different).
Figure 11.12 Probability of number identification as a function of font size for the two source images and the DWT-fused image. (Courtesy of Richard Moore, University of Memphis.)
Figure 11.13 Modification of the target task performance (TTP) metric allows the target spectrum (target contrast as a function of spatial frequency) to be included in the calculation.
Figure 11.14 Range performance for fusion can include the maximum target/sensor spatial frequency contributions to the output image: the larger contrast-to-CTF ratio of the two sensors is taken at each spatial frequency and integrated to produce the fused range performance curve.
Two different target signatures are represented in the emission and reflection characteristics by a contrast that is a function of spatial frequency (i.e., the target contrast as a function of spatial frequency is different for the two sensors). The TTP is calculated as a combination of the ratios of apparent target contrast to required sensor contrast for both sensors. The maximum ratio is taken at each spatial frequency, since the fusion algorithms retain the best contrast from each of the sensor images; because this maximum contrast is combined into a single output image, it is the quantity counted in the sensor performance prediction. While the maximum ratio is the combination rule used here, other rules could be implemented in the same manner as fusion rules. This method has not been proven, but it will be investigated in the future.
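A numerical sketch of this combination rule is shown below. The CTF curves and target contrast spectra are assumed to be supplied as sampled arrays over a common spatial frequency grid; the masking of frequencies where the target contrast falls below the threshold (standing in for the ξ_low to ξ_cut limits) and all example values are our assumptions, not parameters from the text.

```python
import numpy as np

def fused_ttp(xi, ctf_1, ctf_2, c_tgt_1, c_tgt_2, s, r):
    """TTP value using the larger contrast-to-threshold ratio at each spatial frequency.

    xi is in cycles/mrad; s and r are in consistent units so that s/r is the
    target's angular subtense in milliradians.
    """
    ratio = np.maximum(c_tgt_1 / ctf_1, c_tgt_2 / ctf_2)      # fused image keeps the better contrast
    integrand = np.where(ratio >= 1.0, np.sqrt(ratio), 0.0)   # count only where contrast exceeds the CTF
    return (s / r) * np.trapz(integrand, xi)                  # resolvable cycles on target

# Example with notional, placeholder curves (not measured data):
# xi = np.linspace(0.5, 15.0, 300)
# v = fused_ttp(xi, ctf_sensor1, ctf_sensor2, c_target1, c_target2, s=2.3, r=3000.0)
```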
11.7 Summary

Image fusion techniques are of significant practical interest. We have considered a number of image fusion algorithms and image quality metrics. Image performance modeling of image fusion is a new research area; we have discussed performance modeling using the TTP metric in the target acquisition approach and illustrated its merit as a possible method for determining fused-sensor performance.
In terms of military imaging systems, image fusion is in its infancy. Many systems have been fielded under rapid prototyping conditions, and there has been some work in understanding the performance of these systems. A great deal of work remains to answer many questions. How much does image fusion improve target detection, target identification, situational awareness, mobility, and the like? What is the best way to measure the performance benefit for situational awareness, mobility, and nontraditional sensor performance? Most researchers and system developers believe that image fusion will impact military systems significantly through increased task performance and decreased false alarms. This is an area of research that is fertile ground for significant advances in the near future.
References
[1] http://www.eecs.lehigh.edu/SPCRL/IF/image_fusion.html/. [2] Smith, M., and J. Sadler, “Flight Assessment of Real-Time Resolution Image Fusion System for Use in Degraded Visual Environments,” Proceedings of SPIE, Vol. 6559, Enhanced and Synthetic Vision 2007, Orlando, FL, April 2007, pp. 65590K-1–11. [3] Petrovic, V., and C. Xydeas, “Objective Image Fusion Performance Characterization,” Proceedings of 10th IEEE International Conference on Computer Vision, Vol. 2, Beijing, China, October 15–21, 2005, pp. 1866–1871. [4] Petrovic, V., “Subjective Tests for Image Fusion Evaluation and Objective Metric Validation,” Information Fusion, Vol. 8, No. 2, April 2007, pp. 208–216. [5] Petrovic, V., Evaluation of Image Fusion Performance with Visible Differences, Lecture Notes in Computer Science, Vol. 3023, Berlin/Heidelberg: Springer, 2004, pp. 380–391. [6] Zheng, Y., E. Essock, and B. Hansen, “Advanced Discrete Wavelet Transform Fusion Algorithm and Its Optimization by Using the Metric of Image Quality Index,” Optical Engineering, Vol. 44, No. 3, March 2005, pp. 037003-1–10. [7] Adelson, E., et al., “Pyramid Methods in Image Processing,” RCA Engineer, Vol. 29, No. 6, November/December 1984, pp. 33–41. [8] Burt, P. J., and E. H. Adelson, “The Laplacian Pyramid As a Compact Image Code,” IEEE Transactions on Communication, Vol. COM-31, 1983, pp. 532–540. [9] Toet, A., L. J. van Ruyven, and J. M. Valeton, “Merging Thermal and Visual Images by a Contrast Pyramid,” Optical Engineering, Vol. 28, No. 7, 1989, pp. 789–792. [10] Howell, C., et al., “An Evaluation of Fusion Algorithms Using Image Fusion Metrics and Human Identification Performance,” Proceedings of SPIE, Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XVIII, Orlando, FL, April 30, 2007, pp. 65430V-1–11. [11] Jiang, L., et al., “Perceptual-Based Fusion of IR and Visual Images for Human Detection,” Proceedings of 2004 International Symposium on Intelligent Multimedia, Video, and Speech Processing, Hong Kong, October 2004, pp. 514–517.
[12] Huntsberger, T., and B. Jawerth, “Wavelet-Based Sensor Fusion,” Proceedings of SPIE, Vol. 2059, Sensor Fusion VI, Boston, MA, September 1993, pp. 488–498. [13] Toet, A., “Color Image Fusion for Concealed Weapon Detection,” Proceedings of SPIE, Vol. 5071, Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Defense and Law Enforcement II, Orlando FL, April 2003, pp. 372–379. [14] Muller, M., O. Schreer, and M. Lopez Saenz, “Real-Time Processing and Fusion for a New High Speed Dual Band Infrared Camera,” Proceedings of SPIE, Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XVIII, Orlando, FL, April 30, 2007, pp. 654310-1–9. [15] Toet, A., “Detection of Dim Point Targets in Cluttered Maritime Backgrounds Through Multisensor Image Fusion,” Proceedings of SPIE, Vol. 4718, Targets and Backgrounds VIII: Characterization and Representation, Orlando, FL, August 2002, pp. 118–129. [16] Sweet, B., and C. Tiana, “Image Processing and Fusion for Landing Guidance,” Proceedings of SPIE, Vol. 2736, Enhanced and Synthetic Vision 1996, Orlando, FL, May 1996, pp. 84–95. [17] Smith, M., and G. Rood, “Image Fusion of II and IR Data for Helicopter Pilotage,” Proceedings of SPIE, Vol. 4126, Integrated Command Environments, San Diego, CA, July 2000, pp. 186–197. [18] Bender, E., C. Reese, and G. Van der Wal, “Comparison of Additive Image Fusion vs. Feature-Level Image Fusion Techniques for Enhanced Night Driving,” Proceedings of SPIE Vol. 4796, Low-Light-Level and Real-Time Imaging Systems, Components, and Applications, Seattle, WA, July 2003, pp. 140–151. [19] Jung, J., et al., “Fusion-Based Adaptive Regularized Smoothing for 3D Reconstruction from Image Sequences,” Proceedings of SPIE Vol. 4310, Visual Communications and Image Processing 2001, San Jose, CA, Jan. 2001, pp. 238–244. [20] Cheng, S., et al., “An Improved Method of Wavelet Image Fusion for Extended Depth of Field Microscope Imaging,” Proceedings of SPIE Vol. 6144, Medical Imaging 2006, San Diego, CA, March 2006, pp. 61440Q-1–8. [21] Hu, Y., et al., “MR and CT Image Fusion of the Cervical Spine: A Non-Invasive Alternative to CT-Myelography,” Proceedings of SPIE, Vol. 5744, Medical Imaging 2005: Visualization, Image-Guided Procedures, and Display, San Diego, CA, April 2005, pp. 481–491. [22] Moigne, J., N. Laporte, and N. Netanyahu, “Enhancement of Tropical Land Cover Mapping with Wavelet-Based Fusion and Unsupervised Clustering of SAR and Landsat Image Data,” Proceedings of SPIE, Vol. 4541, Image and Signal Processing for Remote Sensing VII, Toulouse, France, December 2001, pp. 190–198. [23] Schneider, M., O. Bellon, and H. Araki, “Image Fusion with IKONOS Images,” Proceedings of SPIE, Vol. 4881, Sensors, Systems, and Next-Generation Satellites VI, Agia Pelagia, Crete, Greece, September 2002, pp. 521–530. [24] Papson, S., and R. Narayanan, “Multiple Location SAR/ISAR Image Fusion for Enhanced Characterization of Targets,” Proceedings of SPIE, Vol. 5788, Radar Sensor Technology IX, Orlando, FL, May 2005, pp. 128–139. [25] Zhou, H., M. Chen, and M. Webster, “Comparative Evaluation of Visualization and Experimental Results Using Image Comparison Metrics,” Proceedings of IEEE VIS Visualization 2002, Boston, MA, November 2002, pp. 315–322. [26] Canga, E., et al., “Characterization of Image Fusion Quality Metrics for Surveillance Applications over Bandlimited Channels,” IEEE 2005 8th International Conference on Information Fusion, Philadelphia, PA, July 25–29, 2005, pp. 483–489. 
[27] Guihong, Q., Z. Dali, and Y. Pingfan, “Medical Image Fusion by Wavelet Transform Modulus Maxima,” Optics Express, Vol. 9, No. 4, 2001, p. 184. [28] Wang, Z., and A. Bovik, “A Universal Image Quality Index,” IEEE Signal Processing Letters, Vol. 9, No. 3, March 2002, pp. 81–84.
[29] Piella, G., and H. Heijmans, “A New Quality Metric for Image Fusion,” Proceedings of IEEE International Conference on Image Processing, Barcelona, Spain, September 14–17, 2003, pp. 173–176. [30] Xydeas, C., and V. Petrovic, “Sensor Noise Effects on Signal-Level Image Fusion Performance,” Information Fusion, Vol. 4, No. 3, September 2003, pp. 167–183. [31] Driggers, R., et al., “Direct View Optics Model for Facial Identification,” Proceedings of SPIE Vol. 6543, Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XVIII, Orlando, FL, April 30, 2007, pp. 65430F-1–9. [32] Vollmerhausen, R., and A. Robinson, “Modeling Target Acquisition Tasks Associated with Security and Surveillance,” Applied Optics, Vol. 46, No. 20, June 2007, pp. 4209–4221.
About the Authors S. Susan Young is a research scientist with the U.S. Army Research Laboratory. Previously, she was a research associate at the Department of Radiation Medicine at Roswell Park Cancer Institute. Later, she became a senior research scientist at Health Imaging Research Laboratory, Eastman Kodak Company. She received a B.S and an M.S. at South China University of Technology, Guangzhou, China, and a Ph.D. in electrical engineering at SUNY Buffalo. Dr. Young has published more than 50 technical papers and holds six patents for inventions that are related to medical diagnostic imaging, image compression, image enhancement, pattern analysis, computer vision, and image super-resolution. Ronald G. Driggers received a doctorate in electrical engineering from the University of Memphis in 1990. He is currently the director of the Modeling and Simulation Division at the U.S. Army’s Night Vision and Electronic Sensors Directorate (NVESD). Dr. Driggers is the author of three books on infrared and electro-optics systems and has published more than 100 papers. He is also a Fellow of the SPIE and MSS. Eddie L. Jacobs is an assistant professor in the Department of Electrical and Computer Engineering at the University of Memphis. He was previously at the U.S. Army Night Vision and Electronic Sensors Directorate, Fort Belvoir, Virginia, where he led a team of engineers and scientists in research into modeling the performance of passive and active imaging sensors. His research interests are in novel imaging sensor development, electromagnetic propagation and scattering, and human performance modeling. He received a B.S. and an M.S. in electrical engineering from the University of Arkansas, and a Doctor of Science in electrophysics from the George Washington University.
Index 2-D chirp test pattern, 120–22 2-D Delta test pattern, 119–20 2-D radically symmetric functions, 81–82 3-D noise, 37, 56 components, 37 coordinate system, 36 See also Noise
A ACQUIRE, 42 Adaptive design, 192–93 defined, 192 examples, 196–98 procedure, 195–99 procedure summary, 196 See also P-deblurring filter Adaptive filtering, 199 Adaptive NUC, 238–42 spatio-temporal processing, 240–42 temporal processing, 238–40 See also Nonuniformity correction (NUC) Aided target recognizer (AiTR), 11 Antialias image resampling, 114–25 model, 114 resampling filters, 118 resampling filters performance analysis, 119–25 resampling systems design, 117–18 rescale implementation, 115–17 Aperture problem, 138 Army Materiel Systems Analysis Activity (AMSAA), 219 Automatic target recognizer (ATR), 11 Average flux, 235
B Band-averaged detectivity, 36 Bandwidth reducing factor, 114 Barten CTF, 45 Basic imaging systems, 11–15 Blurring function, 145, 180 Brightness, 253, 254 distribution, 183 model, 254 perceived, 253
Butterworth window, 88–89
C Calibration error sources, 237–38 image quality metrics, 281 NEMA/DICOM function, 255 with second-order assumption, 237 Calibration-based NUCs, 7, 232, 237–38 Campbell-Robson chart, 44 Cathode-ray-tube (CRT), 107, 243 Chirp function spatially encoded, 121 test pattern, 120–22 Circular convolutional leakage effects, 118 CLEAN algorithm, 130 brightness distribution, 183 defined, 183 dirty beam, 183 dirty map, 183 procedure, 184 Collimation fields defined, 256 radiation, locating/labeling, 257 Collimation tone scale, 263 Collimation tone scale curve, 258 Comb function, 29 Combined Arms Operation in Urban Terrain (CAOUT) document, 217 Continuous Fourier transform, 78–79 Contrast factor, 260 increasing, 208 luminance, 276 perceived, 252 physical, 252–53 sketching, 208–9 tone scale curve, 260–61 See also Image contrast enhancement Contrast-reconstruction presentation, 216 Contrast threshold function (CTF), 16, 170 Barten, 45 defined, 43 measuring, 44 for simple system, 58
295
296
Convolution property, Fourier transform, 73, 74 Correlation interpolation, 134 Correlation motion estimation, 142–43 shift property of Fourier transform, 142–43 theorem, 142 within subpixel accuracy, 143 See also Motion estimation Cubic interpolation, 112–13 Cutoff frequency, 121 Cutoff frequency point, 188
D Day images analysis, 225–26 results, 223 Deblurring, 6–7, 179–204 CLEAN algorithm, 183–84 experimental approach, 200–202 introduction, 179–80 P-deblurring filter, 184–99 perception experimental result analysis, 103–4 performance measurements, 199–204 regulation methods, 181 Van Cittert filter, 182–83 Wiener filter, 181–82 Delta test pattern, 119–20 Detector nonuniformity, 231–32 Detector spatial filters, 22–24 impulse response, 23 sample-and-hold function, 23–24 transfer function, 23 DICOM GSDF model, 254 Differentiation, 69–72 example, 70–72 illustrated, 71 proof, 72 See also Fourier transform Diffraction filter, 21 super-resolution for, 129–30 Digital Imaging and Communications in Medicine (DICOM), 254 Digital signal processing (DSP) products, 121 Direct design, 188–92 defined, 188 examples, 189–92 for FLIR tank image, 192 for synthetic bar image, 189 See also P-deblurring filter Dirty beam, 183
Index
Dirty map, 183 Discrete Fourier transform (DFT), 190 defined, 78 two-dimensional, 82–83 Discrete wavelet transform (DWT), 96 defined, 95, 96 fused images, 288 fusion algorithms, 272 in image fusion, 278–80 shift invariant (SIDWT), 280 Display filtering, 25–26 Display setting, 202 Duality, Fourier transform, 72 Dyadic wavelet transform, 96
E Edge-dependent fusion quality index, 285 Electromagnetic spectrum, 9 Electronics filtering, 24–25 Electro-optical imaging systems, 38 Error-energy reduction, 149–52 algorithm, 149, 150 bandwidth phenomenon, 150 defined, 149 illustrated, 151–52 EXPAND process, 273 Experimental deblurring approach, 200–202 design, 200–202 display setting, 202 infrared imagery target set, 200 observer training, 202 Eye aided, 47–49 CTF, 43–45 MTF, 26, 58 unaided, 43–47 Eye-brain, 49
F Far infrared waveband, 9 Field of view (FOV), 25, 130 defined, 25 wide (WFOV), 218, 265 See also Sensors Filters deblurring, 180 detector spatial, 22–24 diffraction, 21 FIR, 24, 83–86, 121, 127 Fourier-based, 86–90 IIR, 84 nonrecursive, 83 radially symmetric, 87–89
Index
reconstruction, 31 recursive, 84 Van Cittert, 182–83 Wiener, 181–82 Finite impulse response (FIR) filters, 24, 83–86, 121, 127 highpass, 85 implementation, 84–85 lowpass, 84 as nonrecursive filter, 83 passband ripples, 85 shortcomings, 85 smooth filter example, 83–84 See also Filters Fixed-pattern noise, 243–44 defined, 243 performance levels, 243 reduction, 244 See also Noise FLIR sensors, 155 Focal plane array (FPA) sensors, 4 discrete sampled representation, 52 photodetectors, 7 uses, 7 Foldover frequencies, 111 Forward Fourier transform, 91 Forward looking infrared (FLIR) images, 13, 42 from airborne sensor, 156 direct design example, 192 example, 155 search, 221 simulated, low-resolution, 152–54 See also FLIR sensors Forward wavelet transform, 97–98, 100 Four-alternative forced-choice (4AFC) format, 164 Fourier-based filters, 86–90 defined, 86 performance, 90 power window, 86 radially symmetric with Butterworth window, 88–89 radially symmetric with Gaussian window, 87 radially symmetric with Hamming window, 87–88 radially symmetric with power window, 89 resampling, 118 See also Filters Fourier-based windowing method, 114 Fourier integral, 65–66
297
Fourier transform, 65–83 continuous, 78–79, 148 convolution property, 73, 74 delta-sampled signal, 146 differentiation, 69–72 discrete (DFT), 78, 148 duality, 72 forward, 91 of Gaus function, 22 of impulse response, 24 inverse, 66 linearity, 66, 68 marginal, 79–80 one-dimensional, 65–78 Parseval’s Identity, 73–75 periodic signals, 75, 76–78 polar representation, 80–81 properties and characteristics, 19 of pulse function, 70–71 sampling, 75–76, 77 scaling, 68, 71 shifting, 66–68, 69 as tool, 65 total energy, 75 transfer function, 19 triangle section, 74 two-dimensional, 78–83 wavelet function, 95 window, 91–92 Frequency domain interpolation, 135 Frequency domain inverse method, 148 Fused imagers, 286
G Gabor transform, 91, 92 Gamma correction, 250–52 curve, 251 defined, 250 illuminance, 252 luminance, 252 See also Tone scale Gaus function, 21, 22 Gaussian deblurring function, 186, 189 Gaussian probability distribution, 272–73 Gaussian pyramid defined, 273 image sequences, 275 Gaussian window, 87 Gauss-Seidel method, 140 General image quality equation (GIQE), 286 Generalized adaptive gain (GAG) function, 216
Ghosting, 239 Gradient-based motion estimation, 136–38 defined, 136 gradient constraint, 136, 137 least squares solution, 138 See also Motion estimation
H Half-sample rate, 30 Hamming window, 87–88 High mobility military vehicle (HMMV), 274 Highpass FIR filters, 85 High-resolution image reconstruction, 143–58 error-energy reduction method, 149–52 examples, 152–57 factors limiting resolution recovery, 145–47 image warping effect, 158 input image requirements, 157 noise, 158 nonuniform interpolation method, 147 number of input images required, 145 output bandwidth determination, 158 practical considerations, 157–58 regularized inverse method, 148–49 See also Super-resolution image reconstruction Histogram modification, 209 Human eye, 26–27 Human-in-the-loop (HITL), 15
I Illuminance, 252 Image acquisition model, 145, 180 with subpixel shifts, 132 Image analysis, 108 Image browser, 108 Image contrast enhancement, 7, 207–27 analysis, 223–26 day images, 223 experimental approach, 219–22 experiment design, 220–22 field data collection, 219–20 introduction, 207–8 multiscale process, 209–17 night images, 222–23 output presentation, 216–17 performance measurements, 217–27 in radiation treatment, 262–64 results, 222–23 single-scale process, 208–9 summary, 227 time limited search model, 218–19
Image deblurring, 179–204 Image display, 107–8 Image fusion, 8–10, 269–90 algorithm characteristics, 271 algorithms, 271–80 applications, 269 benefits, 282 DWT, 272 evaluation tests, 270–71 imaging system performance with, 286–89 introduction, 269–70 Laplacian pyramid, 272–75 multiple image modes and, 280–81 objectives, 270–71 optimal approach, 270 quality metrics, 281–86 range performance for, 289 ratio of lowpass pyramid (ROLP), 275–76 summary, 290 superposition, 272 Image fusion quality index (Piella and Heijmans), 284–85 Image quality index (Wang and Bovik), 283–84 Image quality metrics, 50–53 calibration, 281 image fusion, 281–86 image fusion quality index (Piella and Heijmans), 284–85 image quality index (Wang and Bovik), 283–84 mean square error (MSE), 282 mutual information (MI) measures, 283 peak signal-to-noise ratio (PSNR), 283 Xydeas and Petrovic metric, 285–86 Image resampling, 4–5, 107–27 antialias, 114–25 concept, 4–5 examples, 108 filters, 112–13 image analysis, 108 image softcopy display, 108 image transmission/image browser/Quick Look, 108 introduction to, 107 methods, 5 model, 111–12 performance measurements, 125–27 sampling theory and, 109–11 with spatial domain methods, 111–13 summary, 127 Image rescale
computational efficiency, 116–17 implementation, 112, 115–17 output requirements, 115–16 Image restoration, 6–7 Image size, 107 Image softcopy display, 108 Imaging systems, 11–39 basic, 11–15 combined performance, 3 components, 20 detector spatial filters, 22–24 display filtering, 25–26 electronics filtering, 24–25 electro-optical, 38 human eye and, 26–27 infrared, 38 LSI, 16–19 optical filtering, 21–22 overall image transfer, 27–28 performance, 3 performance with image fusion, 286–89 point spread function (PSF), 20 processing principles, 65–102 sampled, 28–34 simple illustration, 55 Infinite impulse response (IIR) filters, 84 Infrared imagery target set, 200 Infrared imaging systems, 38 Infrared sensors, performance characteristics, 15 Inhomogeneity equivalent temperature difference (IETD), 37 Input tone scale curve, 259 Instantaneous frequency, 120 Instrument distortion, 182 Intensity domain interpolation, 134–35 Interpolation bilinear, 126 cubic, 112–13 frequency domain, 135 functions, 111, 113 infinite-order, 112 intensity domain, 134–35 kernels, 112 linear, 112 nearest-neighbor, 112 nonuniform, 147 power window, 126 sinc, 112 spatial domain, 113 spline, 113 Inverse Fourier transform, 66
Inverse wavelet transform, 97–98, 100, 101
J Johnson methodology, 42–43 defined, 42 illustrated, 43 reinterpretation, 42–43 Just-noticeable differences (JNDs), 254
L Laplacian pyramid, 272–75 characteristics, 272 construction, 272–74 direct calculation, 278 example, 274 EXPAND process, 273 image sequences, 275 REDUCE function, 273 See also Image fusion Light emitting diode (LED) displays, 25 Linear correction model, 233 Linear interpolation, 112 Linearity, Fourier transform, 66 Linear shift-invariant (LSI) imaging systems, 16–19 as isoplanatic, 17 simplified model, 18, 19 spatial domain, 19 superposition principle, 16, 17 LMS neural network configuration, 240 Localized contrast enhancement (LCE), 7 Longwave infrared (LWIR) waveband, 9, 10, 12, 126 sensors, 274 super-resolution benefit, 168 See also LWIR images Look-up tables (LUTs), 252 Lowpass FIR filters, 84 Lowpass windowing, 143 Luminance, 252 LWIR images, 13
M Mach-band artifacts, 211 Marginal Fourier transform, 79–80 Mean square error (MSE), 282 Michaelis-Menten function, 253 Midwave infrared (MWIR) waveband, 9, 10, 12 day and night imagery, 220 super-resolution benefit, 168 Minimum resolvable contrast (MRC), 16, 42
Minimum resolvable temperature difference (MRTD), 42 Minimum resolvable temperature (MRT), 15, 16 Minimum temperature resolution (MTR), 220 Modelfest, 45–46 MODTRAN, 12 Modulation transfer function (MTF), 15, 19, 170 contrast modulation, 48 display, 55 eye, 26, 58, 164 lens, 55 limiting, 166 pre-/post-, 28, 164 system, 58 of system components, 57 Moiré effect, 108 Mother wavelet, 97 Motion estimation, 135–43 correlation, 142–43 frame-to-frame detection, 135 gradient-based, 136–38 optical flow, 138–42 Multiple scale function, 278 Multiresolution analysis, 210 Multiscale edge detection, 98–102, 213 Multiscale process multiresolution analysis, 210 unsharp masking (USM), 209, 210–11 wavelet edges, 210, 211–17 See also Image contrast enhancement Mutual information (MI) measures, 283
N National Electrical Manufacturers Association (NEMA), 254 National Image Interpretability Rating Scale (NIIRS), 255 Nearest neighbor interpolation, 112 Near infrared (NIR) waveband, 9, 12, 47 NEMA/DICOM calibration function, 255 Night images analysis, 225 results, 222–23 Night Vision and Electronic Sensors Directorate (NVESD), 36, 41 FOV search task, 218 models, 61 Noise 3-D, 36, 37, 56 fixed-pattern, 243–44
fixed-pattern, power spectrum, 49 separation frequency point, 193–95 spatial frequency introduction, 48 spatio-temporal, 56 spectrum support, 180 treatments, 201 Noise energy, 193–95 estimating, 193–95 signal energy and, 192 Noise equivalent power (NEP), 35 Noise separation frequency point, 186–87 Nonlinearity calibration error sources, 237–38 calibration with second-order assumption, 237 effects of, 233–38 residual error, 233–34 second order, error, 234–37 Nonlinear tone scale, 250–52 examples, 250 gamma correction, 250–52 look-up tables, 252 See also Tone scale Nonrecursive filters, 83 Nonuniform interpolation method, 147 Nonuniformity detector, 231–32 limitation, 231 mathematical description, 231 standard deviation, 236 Nonuniformity correction (NUC), 7–8 adaptive, 238–42 calibration-based, 7, 232, 237–38 effectiveness measure, 236 example, 8 IETD and, 37 LMS neural net, 242 residual noise impact after, 231, 236 scene-adaptive, 8, 9 single-point, 237 two-point, 234, 235, 236, 243, 244 Nyquist frequencies, 110 Nyquist intervals, 110 Nyquist rates, 30, 110, 171 Nyquist spaces, 110
O Observer training, 202 One-dimensional Fourier transform, 65–78 convolution property, 73, 74 differentiation, 69–72 duality, 72
Fourier integral, 65–66 linearity, 66 marginal, 80 Parseval’s Identity, 73–75 periodic sampled signal, 76–78 periodic signals, 75, 76 properties, 66–78 sampling, 75–76, 77 scaling, 68, 71 shifting, 66–68, 69 Optical filtering, 21–22 Optical flow motion estimation, 138–42 computation variations, 142 defined, 138 error, 139 vector constraints, 139 velocity estimates, 140 Out-of-band spurious response, 33, 52 Overall image transfer, 27–28
P Parseval’s identity, 73–75, 92, 93 P-deblurring filter, 185–99 adaptive design, 192–93, 195–99 benefits, 197 cutoff frequency point, 188 defined, 185 design, 188–92 direct design, 188–92 illustrated, 185 magnitude, 187 noise energy/ noise separation frequency point, 193–95 noise separation frequency point, 186–87 peak point, 186 properties, 186 Peak point, 186 Peak signal-to-noise ratio (PSNR), 283 Perceived contrast, 252 Perceptual linearization tone scale, 252–55 brightness, 253, 254 physical contrast, 252–53 See also Tone scale Performance “best” sensor curve, 287 combined imaging system, 3 Fourier-based filter, 90 image contrast enhancement measurements, 217–27 image deblurring measurements, 199–204 with image fusion, 286–89 image resampling measurements, 125–27
imaging system, 3, 243–44 infrared sensor characteristics, 15 measures, 3 range, 204 super-resolution imager, 158–67 target acquisition, 61 tone scale example, 264–66 Periodic signals Fourier transform, 75, 76 sampled, 76–78 Perceptual-based multiscale decomposition, 276–78 Physical contrast, 252–53 Piece-wise linear tone scale, 247, 248–50 defined, 248 mapping curve, 249 as window-level tone scale, 250 See also Tone scale Pixel averaging, 173 Pixelization, 33 Planck’s blackbody radiation curves, 14 Point spread function (PSF), 4, 20, 129, 179 Portal images, 255–57 Power spectral density (PSD), 48–49 Power window, 89 defined, 86, 135 interpolation, 126 ripple property, 124 Projection onto convex sets (POCS), 149 Pulse function, Fourier transform, 70–71
Q Quadrature mirror filter (QMF), 210 Quick Look, 108
R Radially symmetric filters with Butterworth window, 88–89 with Gaussian window, 87 with Hamming window, 87–88 with power window, 89 See also Filters Radiation tone scale curve, 258 Radiation treatment collimation fields, locating/labeling, 257 contrast enhancement, 262–64 output image production, 264 portal image, 255–57 tone scale curves, 257–62 tone scale in visualization enhancement, 255–64 Range performance, 204
Ratio of lowpass pyramid (ROLP), 275–76 constructing, 275–76 defined, 275 example, 277 luminance contrast, 276 See also Image fusion Rayleigh Criterion, 15 Reconstruction filters, 31 Rectangular filters, 31 Recursive filters, 84 REDUCE functions, 273 Region-growing method, 209 Regularized inverse method, 148–49 Regulation methods, 181 Resampling. See Image resampling Resampling filters, 112–13 2-D chirp test pattern, 120–22 2-D Delta test pattern, 119–20 Fourier-based, 118 interpolation functions, 112–13 performance analysis, 119–25 ripple property, 122–25 Residual error, 233–34 Resolution, sensor, 15, 173 Retinal processing, 239 Ripple property, 122–25 method, 123 power window, 124 Root-sum-squared (RSS), 32
S Sampled imaging systems, 28–34 defined, 28 frequency analysis, 29 output, 31 spurious response, 33 three-step imaging process, 29 Samples per IFOV, 23 Sampling 2-D grid, 83 artifacts, 34, 110–11 dyadic, 96 Fourier transform, 75–76, 77 mathematics, 52 output, 31 process, 30 super-resolution for, 130 theory, 109–10 Scaling, Fourier transform, 68, 71 Scene-adaptive NUCs, 8, 9 Second order nonlinearity, 234–37 Sensitivity, sensor, 15, 172–73
Sensors blur and noise characterization, 47 blurring function, 145 field of view (FOV), 25, 130, 218, 265 FLIR, 155 half-sample rate, 30 information, 270 LWIR, 274 MWIR, 219 Nyquist rate, 30 resolution, 15 sensitivity, 15, 172–73 super-resolution reconstruction and, 167 SWIR, 274 thermal, 50 thermographic camera, 161–62 Shannon’s sampling theorem, 109 Shielding blocks, 255 Shifting, Fourier transform, 66–68, 69 Shift invariant discrete wavelet transform (SIDWT), 280 Short-time Fourier transform (STFT), 91 Shortwave infrared (SWIR) waveband, 9, 12, 274 Signal processing advanced applications, 4 basic principles, 4, 65–102 Signal registration, 133–34 Signal-to-noise ratio (SNR), 34–38 of 1, 36 image noise, 35 peak (PSNR), 283 sensitivity and, 34 Sinc interpolation, 112 Single-scale process contrast sketching, 208–9 histogram modification, 209 region-growing method, 209 See also Image contrast enhancement Solar loading, 13 Spatial domain inverse method, 148 Spatial domain linear systems, 19 Spatial filters, 20, 22–24 Spatial frequency domain, 19 Spatio-temporal noise, 56 Spatio-temporal processing, 240–42 implementation, 240 LMS neural network configuration, 240 neural net configuration, 241 See also Adaptive NUC Speed point block diagram, 262
defined, 261 determining, 261–62 Spline interpolation, 113 Subpixel shift estimation, 133–35 correlation interpolation, 134 frequency domain interpolation, 135 intensity domain interpolation, 134–35 signal registration, 133–34 for two-dimensional images, 144 Superposition fusion method, 272 principle, 16, 17 Super-resolution, 129–74 background, 131 defined, 129 for diffraction, 129–30 human perceptual, 167 performance chart, 172 performance impact, 168 proposed IEEE nomenclature, 130 for sampling, 130 sensor sensitivity and, 172 Super-resolution image reconstruction, 5–6, 131–74 acquisition, 132 algorithm overview, 132 defined, 60 high-resolution, 132, 143–58 motion estimation, 132, 135–43 performance measurements, 158–67 performance modeling, 172–73 prediction, 172–73 sensors benefiting from, 167–72 steps, 132 subpixel shift estimation, 133–35 system performance comparison with, 60 Super-resolution image restoration, 130–31 Super-resolution imager performance measurements, 158–67 background, 158–59 display, 169–70 experimental approach, 159–66 field data collection, 160–61 noise, 169 resolution, 169 results, 166–67 sensor description, 161–64 TOD, 159–60 Super-resolved images, 131 obtaining, 147 quality, 145 Sweat factor, 203
T Target acquisition, 41–61 example, 53–61 human performance of, 53 image quality metric, 50–53 performance, 61 spatial frequencies, 47 theory, 41–43 threshold vision, 43–49 Target task performance (TTP) metric, 43, 50 calculation, 289 in human performance of target acquisition, 53 modification to, 289 result, 286 Target transfer probability function (TTPF), 42 Temporal processing, 238–40 correction term processing, 239 method of constant statistics, 239 temporal average, 238 temporal variance, 238–39 See also Adaptive NUC Thermal sensors, 50 Thermographic camera defined, 161 features and specifications, 162 postsample transfer response, 162, 163 presample blur transfer response, 162, 163 Threshold vision aided eye, 47–49 unaided eye, 43–47 Time-frequency wavelet analysis, 91–96 Time limited search (TLS) model, 218–19 Tonal transfer compensation (TTC), 247 Tone reproduction curve (TRC), 247 Tone scale, 8, 247–67 application to enhanced visualization, 255–64 collimation, 263 contrast enhancement and, 248 defined, 8, 247 introduction, 247–48 nonlinear, 247, 250–52 perceptual linearization, 247, 252–55 performance example, 264–66 piece-wise, 247, 248–50 processing, 247 summary, 266–67 techniques, 247 Tone scale curves, 257–62 collimation, 258 contrast, adjusting, 260–61
design of, 257–62 input, selective scaling, 259 radiation, 258 selective scaling, 258 speed point determination, 261–62 See also Tone scale Total energy, Fourier transform, 75 Transfer functions for bilinear interpolation, 126 corresponding to impulse responses, 21 defined, 19, 20 detector spatial filter, 23 eye, 27 of optical blur, 22 postsampling, 31 for power window interpolation, 126 presample blur, 30 Triangle Orientation Discrimination (TOD), 159–60 advantages, 159–60 curve, 160 defined, 159 modified version, 160 test pattern, 159 Turbules, 131 Two-dimensional Fourier transform, 78–83 continuous, 78–79 discrete, 82–83 marginal, 79–80 polar representation, 80–82 sampling, 82–83 See also Fourier transform Two-dimensional wavelet transform, 98
U Undersampled imagers, 167 Undersampling, 5 Unmanned aerial vehicle (UAV), 133 Unmanned ground vehicle (UGV), 133 Unsharp masking (USM), 209, 210–11 banding artifact, 211 defined, 210 variations, 211
V Van Cittert filter, 182–83
W Wavelet edges, 210 contrast enhancement based on, 211–17 modification, 213–15 multiscale, 211–13 multiscale detection, 213 Wavelets defined, 90 development, 90–91 mother, 97 time-frequency analysis, 91–96 Wavelet transform, 90–102 admissibility condition, 97 discrete (DWT), 95 dyadic, 96 forward, 97–98, 100 inverse, 97–98, 100, 101 multiscale edge detection, 98–102 sensor fusion, 280 signal decomposition, 95 three-level, 102 two-dimensional, 98 Weighted fusion quality index, 285 Well-sampled imagers, 167 White hot, 13 Wide FOV (WFOV), 218, 265 Wiener filter, 181–82 defined, 181 parametric, 182 weak noise power and, 181 Window Fourier transform, 91–92 Window-level tone scale. See Piece-wise linear tone scale
X X-ray images, 255, 256 Xydeas and Petrovic metric, 285–86
Z Zero-padding, 118
Recent Titles in the Artech House Optoelectronics Series
Brian Culshaw and Alan Rogers, Series Editors

Chemical and Biochemical Sensing with Optical Fibers and Waveguides, Gilbert Boisdé and Alan Harmer
Coherent and Nonlinear Lightwave Communications, Milorad Cvijetic
Coherent Lightwave Communication Systems, Shiro Ryu
DWDM Fundamentals, Components, and Applications, Jean-Pierre Laude
Fiber Bragg Gratings: Fundamentals and Applications in Telecommunications and Sensing, Andrea Othonos and Kyriacos Kalli
Frequency Stabilization of Semiconductor Laser Diodes, Tetsuhiko Ikegami, Shoichi Sudo, and Yoshihisa Sakai
Handbook of Distributed Feedback Laser Diodes, Geert Morthier and Patrick Vankwikelberge
Helmet-Mounted Displays and Sights, Mordekhai Velger
Introduction to Infrared and Electro-Optical Systems, Ronald G. Driggers, Paul Cox, and Timothy Edwards
Introduction to Lightwave Communication Systems, Rajappa Papannareddy
Introduction to Semiconductor Integrated Optics, Hans P. Zappe
LC3D: Liquid Crystal 3-D Director Simulator, Software and Technology Guide, James E. Anderson, Philip E. Watson, and Philip J. Bos
Liquid Crystal Devices: Physics and Applications, Vladimir G. Chigrinov
New Photonics Technologies for the Information Age: The Dream of Ubiquitous Services, Shoichi Sudo and Katsunari Okamoto, editors
Optical Document Security, Third Edition, Rudolf L. van Renesse
Optical FDM Network Technologies, Kiyoshi Nosu
Optical Fiber Amplifiers: Materials, Devices, and Applications, Shoichi Sudo, editor
Optical Fiber Communication Systems, Leonid Kazovsky, Sergio Benedetto, and Alan Willner
Optical Fiber Sensors, Volume Three: Components and Subsystems, John Dakin and Brian Culshaw, editors
Optical Fiber Sensors, Volume Four: Applications, Analysis, and Future Trends, John Dakin and Brian Culshaw, editors
Optical Measurement Techniques and Applications, Pramod Rastogi
Optical Transmission Systems Engineering, Milorad Cvijetic
Optoelectronic Techniques for Microwave and Millimeter-Wave Engineering, William M. Robertson
Reliability and Degradation of III-V Optical Devices, Osamu Ueda
Signal Processing and Performance Analysis for Imaging Systems, S. Susan Young, Ronald G. Driggers, and Eddie L. Jacobs
Smart Structures and Materials, Brian Culshaw
Surveillance and Reconnaissance Imaging Systems: Modeling and Performance Prediction, Jon C. Leachtenauer and Ronald G. Driggers
Understanding Optical Fiber Communications, Alan Rogers
Wavelength Division Multiple Access Optical Networks, Andrea Borella, Giovanni Cancellieri, and Franco Chiaraluce
For further information on these and other Artech House titles, including previously considered out-of-print books now available through our In-Print-Forever® (IPF®) program, contact:

Artech House
685 Canton Street
Norwood, MA 02062
Phone: 781-769-9750
Fax: 781-769-6334
e-mail: [email protected]

Artech House
46 Gillingham Street
London SW1V 1AH UK
Phone: +44 (0)20 7596-8750
Fax: +44 (0)20 7630-0166
e-mail: [email protected]

Find us on the World Wide Web at: www.artechhouse.com