Imaging for Detection and Identification
NATO Security through Science Series

This Series presents the results of scientific meetings supported under the NATO Programme for Security through Science (STS). Meetings supported by the NATO STS Programme are in security-related priority areas of Defence Against Terrorism or Countering Other Threats to Security. The types of meeting supported are generally "Advanced Study Institutes" and "Advanced Research Workshops". The NATO STS Series collects the results of these meetings. The meetings are co-organized by scientists from NATO countries and scientists from NATO's "Partner" or "Mediterranean Dialogue" countries. The observations and recommendations made at the meetings, as well as the contents of the volumes in the Series, reflect those of participants and contributors only; they should not necessarily be regarded as reflecting NATO views or policy.

Advanced Study Institutes (ASI) are high-level tutorial courses which convey the latest developments in a subject to an advanced-level audience.

Advanced Research Workshops (ARW) are meetings of experts where an intense but informal exchange of views at the frontiers of a subject aims at identifying directions for future actions.

Following a transformation of the programme in 2004 the Series has been re-named and re-organised. Recent volumes on topics not related to security, which result from meetings supported under the programme earlier, may be found in the NATO Science Series.

The Series is published by IOS Press, Amsterdam, and Springer, Dordrecht, in conjunction with the NATO Public Diplomacy Division.

Sub-Series
A. Chemistry and Biology                      Springer
B. Physics and Biophysics                     Springer
C. Environmental Security                     Springer
D. Information and Communication Security     IOS Press
E. Human and Societal Dynamics                IOS Press

http://www.nato.int/science
http://www.springer.com
http://www.iospress.nl

Series B: Physics and Biophysics
Imaging for Detection and Identification

edited by

Jim Byrnes
Prometheus Inc., Newport, U.S.A.

Published in cooperation with NATO Public Diplomacy Division
Proceedings of the NATO Advanced Study Institute on Imaging for Detection and Identification, Il Ciocco, Italy, 23 July–5 August 2006.

A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10: 1-4020-5618-4 (PB)
ISBN-13: 978-1-4020-5618-5 (PB)
ISBN-10: 1-4020-5619-2 (HB)
ISBN-13: 978-1-4020-5619-2 (HB)
ISBN-10: 1-4020-5620-6 (e-book)
ISBN-13: 978-1-4020-5620-8 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved. © 2007 Springer. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
CONTENTS
Preface

Multi-perspective Imaging and Image Interpretation
Chris J. Baker, H. D. Griffiths, and Michele Vespe

Radar Imaging for Combatting Terrorism
Hugh D. Griffiths and Chris J. Baker

Optical Signatures of Buried Mines
Charles A. Hibbitts

Chemical Images of Liquids
L. Lvova, R. Paolesse, C. Di Natale, E. Martinelli, E. Mazzone, A. Orsini, and A. D'Amico

Sequential Detection Estimation and Noise Cancelation
E. J. Sullivan and J. V. Candy

Image Fusion: A Powerful Tool for Object Identification
Filip Šroubek, Jan Flusser, and Barbara Zitová

Nonlinear Statistical Signal Processing: A Particle Filtering Approach
J. V. Candy

Possibilities for Quantum Information Processing
Štěpán Holub

Multifractal Analysis of Images: New Connexions between Analysis and Geometry
Yanick Heurteaux and Stéphane Jaffard

Characterization and Construction of Ideal Waveforms
Myoung An and Richard Tolimieri

Identification of Complex Processes Based on Analysis of Phase Space Structures
Teimuraz Matcharashvili, Tamaz Chelidze, and Manana Janiashvili

Time-resolved Luminescence Imaging and Applications
Ismail Mekkaoui Alaoui

Spectrum Sliding Analysis
Vladimir Ya. Krakovsky

Index
PREFACE
The chapters in this volume were presented at the July–August 2006 NATO Advanced Study Institute on Imaging for Detection and Identification. The conference was held at the beautiful Il Ciocco resort near Lucca, in the glorious Tuscany region of northern Italy. For the eighth time we gathered at this idyllic spot to explore and extend the reciprocity between mathematics and engineering. The dynamic interaction between world-renowned scientists from the usually disparate communities of pure mathematicians and applied scientists which occurred at our seven previous ASIs continued at this meeting.

The fusion of basic ideas in mathematics, radar, sonar, biology, and chemistry with ongoing improvements in hardware and computation offers the promise of much more sophisticated and accurate detection and identification capabilities than currently exist. Coupled with the dramatic rise in the need for surveillance in innumerable aspects of our daily lives, brought about by hostile acts deemed unimaginable only a few short years ago, the time is ripe for image processing scientists in these usually diverse fields to join together in a concerted effort to combat the new brands of terrorism. This ASI was one important initial step.

To encompass the diverse nature of the subject and the varied backgrounds of the participants, the ASI was divided into three broadly defined but interrelated areas: the mathematics and computer science of automatic detection and identification; image processing techniques for radar and sonar; and detection of anomalies in biomedical and chemical images. A deep understanding of these three topics, and of their interdependencies, is clearly crucial to meet the increasing sophistication of those who wish to do us harm. The principal speakers and authors of the following chapters include many of the world's leading experts in the development of new imaging methodologies to detect, identify, and prevent or respond to these threats.

The ASI brought together world leaders from academia, government, and industry, with extensive multidisciplinary backgrounds evidenced by their research and participation in numerous workshops and conferences. This forum provided opportunities for young scientists and engineers to learn more about these problem areas, and the crucial role played by new insights, from recognized experts in this vital and growing area of harnessing mathematics and engineering in the service of a world-wide public security interest. An ancillary benefit will be the advancement of detection and identification capabilities for natural threats such as disease, natural disasters, and environmental change.
The talks and the following chapters were designed to address an audience consisting of a broad spectrum of scientists, engineers, and mathematicians involved in these fields. Participants had the opportunity to interact with those individuals who have been on the forefront of the ongoing explosion of work in imaging and detection, to learn firsthand the details and subtleties of this exciting area, and to hear these experts discuss in accessible terms their contributions and ideas for future research. This volume offers these insights to those who were unable to attend.

The cooperation of many individuals and organizations was required in order to make the conference the success that it was. First and foremost I wish to thank NATO, and especially Dr. F. Pedrazzini and his most able assistant, Ms. Alison Trapp, for the initial grant and subsequent help. Financial support was also received from the Defense Advanced Research Projects Agency (Drs. Joe Guerci and Ed Baranoski), AFOSR (Drs. Arje Nachman and Jon Sjogren), ARO (Drs. Russ Harmon and Jim Harvey), EOARD (Dr. Paul Losiewicz), ONR (Mr. Tim Schnoor, Mr. Dave Marquis and Dr. Tom Swean) and Prometheus Inc. This additional support is gratefully acknowledged.

I wish to express my sincere appreciation to my assistants Parvaneh Badie, Marcia Byrnes and Ben Shenefelt and to the co-director, Temo Matcharashvili, for their invaluable aid. Finally, my heartfelt thanks to the Il Ciocco staff, especially Bruno Giannasi, for offering an ideal setting, not to mention the magnificent meals, that promoted the productive interaction between the participants of the conference. All of the above, the speakers, and the remaining conferees, made it possible for our Advanced Study Institute, and this volume, to fulfill the stated NATO objectives of disseminating advanced knowledge and fostering international scientific contacts.

August 6, 2006
Jim Byrnes Il Ciocco, Italy
MULTI-PERSPECTIVE IMAGING AND IMAGE INTERPRETATION
Chris J. Baker, H. D. Griffiths, and Michele Vespe Department of Electronic and Electrical Engineering, University College London, London, UK
Abstract. High resolution range profiling and imaging have been the principal methods by which more and more detailed target information can be collected by radar systems. The level of detail that can be established may then be used to attempt classification. However, this has typically been achieved using monostatic radar viewing targets from a single perspective. In this chapter methods for achieving very high resolutions will be reviewed. The techniques will include wide instantaneous bandwidths, stepped frequency and aperture synthesis. Examples showing the angular dependency of high range resolution profiles and two-dimensional imagery of real, full scale targets are presented. These data are examined as a basis for target classification, highlighting how the observed features relate to the structures that compose the target. A number of classification techniques will be introduced, including statistical, feature vector and neural based approaches. These will be combined into a new method of classification that exploits multiple perspectives. Results, again based upon analysis of real target signatures, are presented and used to examine the selection of perspectives to improve the overall classification performance.

Key words: ATR, radar imaging, multi-perspective classification
1. Introduction

Target classification by radar offers the possibility of remotely identifying objects at ranges well in excess of those of any other sensor. Indeed, radar has been employed in operational systems for many years. However, these systems use human interpretation of radar data and performance is generally unreliable and slow. Nevertheless, the benefits of fast and reliable classification are enormous and have the potential for opening huge areas of new application. Research in recent years has been intense but still, automated or semi-automated classification able to work acceptably well in all conditions
seems a long way off. The prime approach to developing classification algorithms has been to use higher and higher spatial resolutions, either one-dimensional range profiles (Hu and Zhu, 1997) or two-dimensional imagery (Novak et al., 1997). High resolution increases the level of detail in the data to be classified and this has generally been seen as providing more and better information. However, the performance of classifiers, whilst very good against a limited set of free space measurements, is much less satisfactory when applied to operationally realistic conditions.

In this chapter we will review methods for obtaining high resolution, use these to generate high resolution target signatures and subsequently illustrate some of their important aspects that require careful understanding if classification is to be successful. We then examine some typical classifiers before considering in more detail an approach to classification that uses a multiplicity of perspectives as the data input. This also enables much information about the nature of target signatures and their basis for classification to be evaluated.

Firstly though, the concepts of resolution and classification are discussed, as these terms are often used with imprecise or varying meanings. Most often resolution is defined as the ability of radar (or any sensor) to distinguish between two closely spaced scatterers. A measure of this ability is captured in the radar system point spread function or impulse response function, with resolving power being determined by the 3-dB points. This is a reasonable definition but care needs to be taken, as there is an implicit assumption that the target to be considered has point-like properties. This is well known not to be the case, but nevertheless it has proved a useful descriptor of radar performance. As will be seen later, if resolution is improved more scatterers on a target can be distinguished from one another and there is undoubtedly an improved amount of detail in the resulting signature.

Care also has to be taken with synthetic aperture imaging in two dimensions such as SAR and ISAR. These imaging techniques again carry an implicit assumption that targets have point-like properties and are continuously illuminated during formation of the image. Once again, most targets are not points and quite often there is occlusion of one part of a target by another. For example, high placed scatterers near the front of a target often place the scatterers further back in shadow and hence they are not imaged. Thus the resolution in a typical SAR image is not necessarily constant, a fact often overlooked by developers of classification algorithms. However, once again, these assumptions have still resulted in robust SAR and ISAR image generation, and as larger apertures are synthesised there is an undeniable increase in detail in the resulting imagery. We will return to these themes as we explore high resolution techniques and approaches to classification of the resulting target signatures.
2. High Down Range Resolution Techniques

Despite the reservations discussed above, we nevertheless begin by modelling the reflectivity from a target as the coherent sum of a series of spatially separated scatterers (Keller, 1962). Its simplicity offers valuable insight into some of the scattering processes observed. Thus the frequency domain reflectivity function ζθ(f) of a complex target illuminated at a given aspect angle θ by an incident field of wavelength λ ≪ D, where D is the physical dimension of the target, is given by:

$$\zeta_\theta(f) = \sum_{i=1}^{N} \zeta_\theta^i(f) \tag{1}$$

where

$$\zeta_\theta^i(f) = A_\theta^i(f)\,\exp\!\left(-j\,\vartheta_\theta^i(f)\right). \tag{2}$$
Thus we see that the amplitude and phase of the ith scatterer depend on both frequency and aspect angle. An inverse FFT will yield the complex reflectivity function or range profile, where the magnitude in each of the IFFT bins represents the magnitude of the reflections from the scatterers in a range resolution cell.

The resolution in the range dimension is related directly to the pulse width Tp of the waveform. Two targets of equal RCS are said to be resolved in range when they are separated from each other by a distance:

$$d = \frac{c\,T_p}{2}. \tag{3}$$
This equation tells us the range resolution of a pulsed radar system when the pulse is not modulated. Thus a very short pulse is needed if high range resolution is required. Indeed, to resolve all the scatterers unambiguously the pulse length has to be short enough that only one scatterer appears in each range cell. Normally as high a range resolution as possible is used and it is accepted that not all scatterers will be resolved. The resulting ambiguity is one source of ensuing difficulty in the next stage of classification.

In long range radar, a long pulse is needed to ensure sufficient energy to detect small targets. However, this long waveform has poor range resolution. The use of a short duration pulse in a long range radar system implies that a very high peak power is required. There is a limit on just how high the peak power of a short pulse can be: ultimately the voltage required will 'arc' or break down, risking damage to the circuitry and poor transmission efficiency.
Figure 1. Changing the frequency as a function of time across the duration of a pulse. B is the modulation bandwidth and T the pulse duration.
One method which is commonplace today that overcomes this limitation is pulse compression (Knott et al., 1985). Pulse compression is a technique that applies a modulation to a long pulse or waveform such that the bandwidth B of the modulation is greater than that of the un-modulated pulse (i.e. 1/Tp). On reception the long pulse is processed by a matched filter to obtain the equivalent resolution of a short pulse (of width 1/B). In the time domain the received signal is correlated with a time-reversed replica of the transmitted signal (delayed to the chosen range). This compresses the waveform into a single shorter duration equivalent to a pulse of length 1/B. In other words, the resolution is determined by the modulation bandwidth and not the pulse duration. Thus the limitation of poor range resolution using long pulses is overcome, and long range as well as high resolution can be achieved together. Pulse compression can be achieved by frequency, phase or amplitude modulation. The most common form of modulation used is to change the frequency from the start to the end of the pulse such that the required bandwidth B is swept out, as shown in Figure 1. This leads to a resolution given by:

$$d = \frac{c}{2B}. \tag{4}$$
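To make the compression gain concrete, the following is a minimal NumPy sketch of linear FM pulse compression and matched filtering; the pulse length, bandwidth and scatterer spacing are illustrative values, not parameters from the chapter.

```python
import numpy as np

# Illustrative parameters: a 10 us pulse with a 50 MHz linear FM sweep.
# Unmodulated, the pulse would resolve only c*Tp/2 = 1500 m; compressed,
# the resolution is c/(2B) = 3 m (equation 4).
c, Tp, B = 3e8, 10e-6, 50e6
fs = 2 * B                                   # sampling rate
t = np.arange(0, Tp, 1 / fs)
chirp = np.exp(1j * np.pi * (B / Tp) * (t - Tp / 2) ** 2)

# Two point scatterers 6 m apart: two resolution cells after compression.
delay = int(round((2 * 6 / c) * fs))         # two-way delay in samples
rx = np.zeros(len(t) + delay, dtype=complex)
rx[:len(t)] += chirp
rx[delay:delay + len(t)] += chirp

# Matched filter: correlate with the time-reversed conjugate replica.
compressed = np.abs(np.convolve(rx, np.conj(chirp[::-1])))
peaks = np.sort(np.argsort(compressed)[-2:])
print("peak separation: %.1f m" % ((peaks[1] - peaks[0]) / fs * c / 2))
```

Running this prints a 6.0 m separation: the two returns, hopelessly merged within the uncompressed 10 μs envelope, emerge as distinct peaks after matched filtering.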
To achieve even higher range resolutions a frequency modulated stepped-frequency compressed waveform may be employed. This reduces the instantaneous modulation bandwidth requirement while increasing the overall bandwidth. In other words, the necessary wider bandwidth waveform is synthesised using a number of pulses. Note, however, that this has the disadvantage of collecting the individual waveforms over a finite period of time, making the required coherency vulnerable to target motion. High range resolution (HRR) profiles are subsequently produced by processing a wideband reconstruction of a target's reflectivity spectrum in the frequency domain (Wilkinson et al., 1998). By placing side by side N narrower chirp waveforms or pulses of bandwidth B, with an inter-pulse frequency step increment of B/2, it is possible to synthesise a total bandwidth of Bt = (N + 1)B/2. This is illustrated in Figure 2. A range profile may then be defined as a time sequence of the vector sum of signals reflected back by different scatterers within a range cell. By matched filtering this stepped frequency waveform a range resolution d = c/(2NB) is achieved. The received signal is the transmitted pulse modulated by the corresponding sub-spectrum of the target reflectivity function. By adding the compressed individual portions of the reflectivity function, which result from time convolution of each received pulse with the complex conjugate of the corresponding transmitted pulse, the entire spectrum commensurate with the extended bandwidth is eventually obtained. The HRR profile may then be synthesised from an inverse FFT applied to each row of the time history of the target's frequency domain signature matrix.

Figure 2. Spectrum reconstruction of the target reflectivity function using a series of chirp pulses (a) and the corresponding synthesised bandwidth (b).
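The synthesis step can be sketched under the point-scatterer model of equations (1) and (2): sample the target frequency response across the synthesised band, then inverse-FFT to obtain the HRR profile. The scatterer geometry below is hypothetical; only the 2.25 GHz total bandwidth mirrors the experiments described later.

```python
import numpy as np

# Idealised stepped-frequency HRR synthesis (no target motion, which in
# practice must be compensated, as noted above).
c, f0, Bt, M = 3e8, 9.5e9, 2.25e9, 512       # X-band start freq, total band
freqs = f0 + np.arange(M) * Bt / M

# Coherent sum over hypothetical point scatterers (equations 1-2).
ranges = np.array([10.0, 10.4, 14.2])        # metres
amps = np.array([1.0, 0.8, 0.5])
H = (amps * np.exp(-1j * 4 * np.pi * freqs[:, None] * ranges / c)).sum(axis=1)

profile = np.abs(np.fft.ifft(H))             # HRR profile
dr = c / (2 * Bt)                            # range bin ~ 6.7 cm
top = np.sort(np.argsort(profile)[-3:]) * dr
print("bin size %.3f m; peaks at %s m" % (dr, top))
```

The three peaks reappear at 10.0, 10.4 and 14.2 m, each localised to within the synthesised ~6.7 cm range bin.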
3. High Cross Range Resolution

In real aperture radar the cross range resolution is determined by the antenna beamwidth. This is often thought of as being so large that it effectively produces no resolution of any value, which is why HRR profiles are often referred to as one-dimensional target signatures. In reality they are two-dimensional images in which one of the dimensions of resolution is very poor. The real beamwidth (or cross range resolving power) is defined as the width at the half power or 3-dB points from the peak of the main lobe. The beamwidths may be the same in both the elevation (vertical) and the azimuth (horizontal) dimensions, although this is by no means mandatory. The beamwidth in radians is a function of antenna size and transmission wavelength. For a circular aperture (i.e. a circular profile of a parabolic dish) the beamwidth in radians at the half power or 3-dB points is given approximately by:

$$B_{az} = \frac{\lambda}{D}. \tag{5}$$

This means that the cross-range extent of the beam at a range R is given by:

$$R_{az} = R\,\frac{\lambda}{D}. \tag{6}$$
Thus for a range of only 10 km and a wavelength of 3 cm, a 0.5 m diameter antenna will have a cross range resolution of 600 m. This contrasts with a frequency modulated pulse bandwidth of 500 MHz leading to a range resolution of 30 cm. It is typical to have equal down and cross range resolutions.

Two techniques by which much higher cross range resolution may be obtained are SAR and ISAR. These are essentially completely equivalent and rely on viewing a target over a finite angular ambit to create a synthetic aperture with a length in excess of that of the real radar aperture, and hence able to provide a higher resolution. The greater the angular ambit traversed, the greater the length of the synthesised aperture and the higher the cross range resolution. Here we will confine our discussion to ISAR imaging only; the interested reader is referred to excellent textbooks that cover SAR imaging (Curlander and McDonough, 1991; Carrara et al., 1995).

To form an ISAR image the HRR profiles assembled using the step frequency technique reviewed in the previous section are used. To obtain the magnitude of the ISAR image pixel in the lth range cell and jth Doppler cell (Dl,j) an FFT is applied to each column of the matrix representing the time history of the target's range profiles. This is demonstrated in Figure 3. For ISAR, the resolution achieved in cross-range depends upon the minimum resolvable frequency difference ΔfD between two adjacent scatterers (Wehner, 1995). Doppler resolution is also related to the available coherent time of integration T, which is equal to the time required to collect the N chirp returns.
Figure 3. Matrix decomposition for HRR profiles and ISAR imagery.
Therefore, consecutive reflectivity samples from the same range cells are taken every NΔT seconds:

$$\Delta f_D = \frac{1}{N\,\Delta T} \approx \frac{1}{T}. \tag{7}$$

As a consequence, the cross-range resolution rc can be written as:

$$r_c = \frac{c\,\Delta f_D}{2\,\omega_0 f_c} = \frac{\lambda}{2\,\omega_0 T} \tag{8}$$
where λ = c/fc is the illuminating wavelength and ω0 is the angular velocity of the target rotational motion. In ISAR image processing the motion of the target is usually unknown. The target motion can be seen as the superposition of rotational and translational motions with respect to the radar system. While the former contributes to the ability to resolve in cross-range, in order to obtain a focused image of the target it is necessary to compensate for phase errors due to the translational motion occurring during data collection. This is usually referred to as motion compensation. After this correction an ISAR image may be obtained by processing data collected over an arc of circular aperture whose dimensions depend on the rotational speed of the target ω0 and the integration time T. There may still be further residual motions that require correction using autofocus techniques. We now introduce experiments by which HRR profiles and ISAR images are generated.
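A minimal sketch of this range-Doppler mapping may help: the slow-time phase history in a single range cell is Fourier transformed and the Doppler peak converted to cross-range by inverting equation (8). The rotation rate, carrier frequency and dwell time are assumed values, not those of the experiments below.

```python
import numpy as np

c, fc = 3e8, 10e9
lam = c / fc
w0, T, Np = 0.05, 1.0, 128       # rotation rate (rad/s), dwell (s), profiles
t = np.arange(Np) * T / Np

# A scatterer at cross-range x from the rotation centre has radial speed
# w0*x, hence Doppler fD = 2*w0*x/lam (the basis of equation 8).
x = 1.5
cell = np.exp(1j * 4 * np.pi * w0 * x * t / lam)   # phase history, one cell

spec = np.fft.fftshift(np.fft.fft(cell))           # FFT down the slow-time column
fD = np.fft.fftshift(np.fft.fftfreq(Np, d=T / Np))
x_est = fD[np.argmax(np.abs(spec))] * lam / (2 * w0)
print("cross-range estimate: %.2f m (true 1.50 m)" % x_est)
```

With these values the achievable cross-range cell is λ/(2ω0T) = 0.3 m, exactly as equation (8) predicts.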
4. High Resolution Target Signatures

In this section we exploit the high resolution techniques introduced above to examine the form of the resulting signatures. To do this we use measurements of calibration and vehicle targets that have been mounted on a slowly rotating platform. Figure 4 shows the experimental set-up. Here the radar system views a turntable at a shallow grazing angle of a few degrees. The figure shows corner reflector calibration targets in place: two are located on the turntable and two are located in front of and behind the turntable. Additional experiments have also been performed with the corners removed and a vehicle target situated on the turntable instead. The profiles are generated from an X-band radar having an instantaneous bandwidth of 500 MHz. Eight frequency steps spaced by 250 MHz are used to synthesise a total bandwidth of 2.25 GHz. The turntable data is first enhanced by removing any stationary clutter (Showman et al., 1998); estimation and subtraction have been performed in the frequency domain.

Figure 5 shows the resulting range profiles and their variation as the turntable rotates through 360°. The two stationary trihedrals show a constant response at near and far range, as expected. For the two rotating trihedral targets, a peak of reflection occurs when the line-of-sight is on the trihedral bisector. This is consistent with the expected theoretical response. As the trihedral targets rotate, the backscattered field decreases progressively until a point is reached where there is a peak of specular reflection. This is a reflection from one of the sides making up the trihedral which is orthogonal to the illuminating radar system (i.e. it faces the radar beam and looks like a flat plate reflector). At increasing rotation angles the RCS of the target drops, since the orientation of the trihedral is such that it tends to reflect incident radiation away from the radar. This angular dependency of the RCS of a well known reflector such as a trihedral begins to illustrate how the backscattering properties of real targets may vary with the orientation of observation. For example, if a target has part of its structure that mimics a trihedral it may show this feature over a similarly limited angular range.
Figure 4. ISAR geometry: two stationary corner reflectors are in front of and behind the turntable, while two rotating ones are placed on the turntable.
Figure 5. History of HRR range profiles (30 cm range resolution) from four corner reflectors, two rotating and two stationary. The 0° angle corresponds to the two corners being at the same range from the system and facing directly into the radar beam.
Thus in a multi-perspective (M-P) environment, different angular samples of a target's signature should improve the likelihood of observing a corner or corner-like reflector and recognising it as such. Particular shapes such as flat plates and corners are common on many man-made structures and are often quite dominant features that may prove useful for classification.

In Figure 6, the complete angular ambit of range profiles spanning 360° from a Land Rover vehicle rotating on the turntable is shown. This highlights a number of different scattering behaviours: the strong peaks from specular reflections (0°, 90°, 180°) appear over a very limited angle range and obscure nearby point-like backscattering. Corner-like returns can be observed at far range (∼6 m) for two angular spans (∼[10°−60°] and [130°−180°]). These returns correspond to the trihedral-like structures formed at the rear of the Land Rover. This is a vehicle without the rear soft top; it has a metallic bench seat that makes a corner where it joins the rear bulkhead of the driver's cabin. At ∼8 m range there is a double bounce return corresponding to one of the corners. This type of effect increases the information that can be processed, which would otherwise be impossible to reconstruct by a traditional single perspective approach. It also illustrates the complexity and subtlety of radar target scattering.
Figure 6. History of HRR range profiles (range resolution less than 15 cm) from a series of X-band stepped frequency chirps illuminating a ground vehicle as it rotates over 360°. At 0° the target is broadside oriented, while at 90° it has its end-view towards the radar.
Note also that here we are dealing with targets on turntables, where there is no clutter or multipath and the signal-to-noise ratio is high. In more realistic scenarios the scattering from targets will in fact be even more complicated.

The turntable data can be processed using the ISAR technique to yield two-dimensional imagery. Figure 7 shows a series of ISAR images of a Ford Cougar car in which the number of perspectives used to form the imagery is slowly increased. It is clear that when only a single perspective is used there is considerable self-shadowing by the target and the shape cannot be determined with good confidence. When all eight perspectives are employed the effects of self-shadowing are largely eliminated and much more complete information is generated. Note, however, that the concept of resolving scatterers, as seemed to be observed in the range profiles of the Land Rover, is much less obvious. As with most modern vehicles the Cougar is designed to have low wind resistance and has little in the way of distinct scattering centres. This begins to place in question both the image formation process and classification techniques if they use a point target scatterer model assumption.
Figure 7. Multi-look image reconstruction: (a) is generated using a single perspective, (b) three perspectives, (c) five perspectives and (d) eight perspectives.
Figure 8 shows a multi-look ISAR image of the Land Rover target used to generate the HRR profiles seen earlier. There are two 'point-like' scatterers that are consistent with the area where the rear metal bench seat meets the bulkhead behind the driver's cabin and forms a corner-like structure. There are some 'pointish-like' scatterers at the rear of the vehicle, although these are rather less distinct. Otherwise the scattering appears quite 'un-point-like' and resembles in many ways the form of the image of the Cougar. Nevertheless, the shape of the Land Rover is easily discernible and is clearly quite different to that of the Cougar. In this way we start to appreciate the complexity of electro-magnetic backscatter and the challenge of reliably and confidently classifying real targets.

Figure 8. Multi-look image of the Land Rover target.
5. Target Classification

Having examined the detailed structure and composition of target signatures, we now consider operating on them to make classification decisions. Robust and reliable target classification has the potential to radically increase the importance and value of radar systems. Interest in radar target classification has intensified due to the increase in information available from advanced radar systems. As seen in the previous section, this is mainly due to the backscattered signature having high resolution in range and cross-range. Although the description of target scattering is detailed enough to attempt radar target classification, the reliability of this crucial task is influenced by a number of unpredictable factors and is generally poor. For example, one and two-dimensional signatures can both appear significantly different even though the measurements are collected with the same radar and a very similar target geometry (measurement noise, rotational and translational range migration, speckle, etc.). In addition, global or local shadowing effects also noticeably challenge attempts to classify targets reliably, as does multipath. As many sources of target signature variability depend on target orientation as viewed by the radar system, it may be possible to aid target recognition by using more than one perspective and hence average out the sources causing misclassification. Furthermore, the possible employment of bistatic modes of operation between the nodes of the network might additionally provide a valuable counter to stealth technology (Baker and Hume, 2003). In a monostatic netted radar scenario, the classification performance increases with the number of perspectives. Although the improvements are significant, the benefits depend on the location of the nodes with respect to the position and orientation of the target.
6. Classification Techniques

There are two main aspects to target classification. The first is to isolate the target returns from the clutter echoes (e.g. by filtering) and to extract the
features that can help to distinguish the class of the target. The second aspect is the method used for deciding to which class or target type the feature data belongs. When target classification is achieved through automatic computation it is usually referred to as automatic target recognition (ATR). In ATR the classification task requires complex techniques and there are a number of approaches that can be used. For example, in a model based technique a model of the target is made by computer aided design (CAD) and electro-magnetic simulations. This enables many simulated versions to be compared with the target signature to be classified. This is a computationally intensive technique. Alternatively, in a template matching based technique, many real versions of the target signatures (at a large number of geometries) are stored in a database and subsequently compared with the detected target in order to assign it to a class. Consequently, a very large database is needed. Further, if the target is altered in some way (e.g. a tank may carry some additional equipment) then the templates may no longer represent the modified signature and the classification can fail. Finally, pattern based techniques exploit features extracted from the input signature. These might include peak amplitudes and their locations in a HRR profile or ISAR image. These are then used to form a multi-dimensional feature vector which can be compared with the stored feature vectors from previous measurements of known targets in order to perform classification. This technique is less costly in terms of computation and it is consistent with a netted radar framework, as a number of perspectives of the same object are available.

Classification typically requires a high probability of declaration Pdec, which is the probability that a detected target will be classified, either correctly or incorrectly, as a member of the template training set. A target is known when its feature vector belongs to the training data set. A second performance requirement is a high probability of correct classification Pcc. These two parameters are often related to each other: if Pdec is low, the ATR system may declare just those cases of very high certainty of correct classification. As a result, the system is able to achieve a very high Pcc but may not classify all possible targets. Finally, a low probability of false alarm Pfa is required, i.e. a low probability that an unknown target is incorrectly classified as a different object in the ATR database. This final task is particularly difficult since it is impossible to include a specific class of unknown targets in the templates, because there is always a possibility that the classifier would need more information to make a correct decision.

The decision, which is usually made automatically, may be performed with two different data input types. The first is one-dimensional target classification by HRR profiles. The HRR profile can be thought of as representing the
projection of the apparent target scattering centers onto the range axis. Hence the HRR profile is a one-dimensional feature vector. Three classifiers are considered here; a more exhaustive treatment may be found in Tait (2006). These are (i) a naïve Bayesian classifier, (ii) a nearest neighbour classifier and (iii) a classifier using neural networks.

6.1. NAÏVE BAYESIAN MULTI-PERSPECTIVE CLASSIFIER
The naïve Bayesian classifier is a statistical pattern recognition method (Looney, 1998). The decision-making is reduced to pure calculations of feature probabilities. The training set is stored in different databases of templates and therefore the number of classes is known, as well as the sets of their representative vectors (i.e. the learning strategy is supervised). We consider a set of nc classes {Ci : i = 1, . . . , nc} and a single HRR profile X formed by a sequence of n elements x1, x2, . . . , xn as the feature vector. The classifier decides that X belongs to the class Ci showing the highest posterior probability P(Ci | X). This is done using Bayes' theorem to calculate the unknown posterior probability of the classes conditioned on the feature vector to be classified:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}. \tag{9}$$

The naïve Bayesian classifier is based on the assumption of class-conditional independence of each attribute of the feature vector. Furthermore, since the values x1, x2, . . . , xn are continuous, they are assumed Gaussian distributed. Their statistical parameters are deduced from the training set and used to calculate the conditional probability P(xi | Ci) for the attribute xi and, eventually, the likelihood ratio test:

$$\text{If}\quad \frac{P(X \mid C_i)}{P(X \mid C_j)} > \frac{P(C_j)}{P(C_i)} \;\Rightarrow\; X \in \text{class } i. \tag{10}$$

These concepts are integrated for the M-P naïve Bayesian classifier. Here we consider a network of N radars, and the sequence of 1D signatures {Xj : j = 1, . . . , N} is the information collected by the system. The posterior probability P(Ci | X1, . . . , XN) of the sequence of range profiles conditioned on Ci is the probability that the sequence belongs to that class and, by applying Bayes' theorem, can be expressed as follows:

$$P(C_i \mid X_1, \ldots, X_N) = \prod_{j=1}^{N} \frac{P(X_j \mid C_i)\,P(C_i)}{P(X_j)}. \tag{11}$$

Assuming the probability P(X1, . . . , XN) constant for a fixed number of perspectives, the final decision is made for the class Ci that maximises P(Ci | X1, . . . , XN). This procedure permits a distinct single-perspective stage in which all the conditional probabilities P(Xj | Ci) are computed separately for each perspective.
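A hedged sketch of this rule: per-class Gaussian attribute models are fitted from templates and log-posteriors are summed over perspectives, mirroring the product in equation (11). The class labels, feature dimension and synthetic data are illustrative assumptions, not the chapter's dataset.

```python
import numpy as np

def fit(templates):
    # templates: dict class -> (n_templates, n_features) array
    return {c: (X.mean(0), X.std(0) + 1e-6) for c, X in templates.items()}

def log_gauss(x, mu, sd):
    # sum of per-attribute Gaussian log-likelihoods (naive independence)
    return (-0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))).sum()

def classify_mp(models, priors, perspectives):
    # sum over perspectives of log[P(X_j|C_i) P(C_i)], cf. equation (11)
    scores = {c: sum(log_gauss(X, mu, sd) + np.log(priors[c])
                     for X in perspectives)
              for c, (mu, sd) in models.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
templates = {c: rng.normal(m, 1.0, (36, 14))   # 36 templates, 14 features
             for c, m in {"A": 0.0, "B": 2.0, "C": 4.0}.items()}
models, priors = fit(templates), {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
views = [rng.normal(2.0, 1.0, 14) for _ in range(3)]  # 3 perspectives of "B"
print(classify_mp(models, priors, views))             # -> B
```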
6.2. K-NN MULTI-PERSPECTIVE CLASSIFIER
The K-nearest neighbours (K-NN) classifier is a non-parametric approach to classification (Duda et al., 2001). It consists of measuring the distances from the feature vector to be classified to the templates of the different classes, and selecting the minimum K of them. Consider an input vector X and a population made up of a set of classes {Ci : i = 1, . . . , nc}. Then the distance di,j from X to Ti,j, the jth template vector of the ith class, can be computed and stored:

$$d_{i,j} = d(X, T_{i,j}) = \left\| X - T_{i,j} \right\|. \tag{12}$$
Subsequently, the K minimum scalars di,j are selected from each class, forming a K-dimensional vector D labelled in ascending order. The final decision is made on the basis of the largest number of votes over the K-dimensional vector obtained. The three stages of the M-P K-NN classifier are implemented as follows:

1. The mono-perspective stage: after the collection of the sequence of feature vectors {Xj : j = 1, . . . , N}, where N is the number of sensors in the network, the same number of single-perspective classifiers is implemented. The jth classifier computes a vector Dj consisting of the K minimum distances from the templates.

2. M-P processing: the whole set of vectors Dj is processed and the minimum K distances are selected, each giving a weight for the decision.

3. Classification: the input sequence of feature vectors is associated with the class with the greatest number of weights.

Different values of K have been tested for this problem: the best trade-off between complexity and classification performance suggests K = 5 minimum distances.
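A minimal sketch of the three-stage procedure, assuming Euclidean distances (equation 12) and templates stored as a dictionary of arrays; this is an illustration, not the authors' implementation.

```python
import numpy as np

def mp_knn(perspectives, templates, K=5):
    # Stage 1: per perspective, keep the K smallest labelled distances.
    pooled = []
    for X in perspectives:
        dists = [(np.linalg.norm(X - T), c)
                 for c, temps in templates.items() for T in temps]
        pooled += sorted(dists)[:K]
    # Stage 2: keep the K globally smallest of the pooled candidates.
    best = sorted(pooled)[:K]
    # Stage 3: majority vote over the surviving class labels.
    labels = [c for _, c in best]
    return max(set(labels), key=labels.count)
```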
6.3. FANN MULTI-PERSPECTIVE CLASSIFIER

Given a feature vector X, artificial neural networks (ANN) learn how to execute the classification task by means of examples (Christodoulou and Georgiopoulos, 2001). They are able to analyse an input and associate with it an output corresponding to a particular class of objects. A feed-forward ANN (FANN) supervised with a back-propagation strategy can be implemented. During the learning phase the training samples are used to set the internal parameters of the network,
that is, after giving the templates as inputs to the classifier, the weights are modified on the basis of the distance between the desired and actual outputs of the network. Considering an input vector X and a population made up of a set of nc classes, the execution mode consists of the calculation of the output vector Y = (y1, y2, . . . , ync). Since the unipolar sigmoid (or logistic) function was chosen as the activation function, the elements of the output vector range from zero to one. The ultimate decision is made for the ith class, where i is the index of the maximum value of Y.

If we assume N perspectives, the sequence of feature vectors {Xj : j = 1, . . . , N} is the input for the first stage. Each single-perspective network accepts a vector and the partial outputs {Yj : j = 1, . . . , N} are calculated. Subsequently, the output Y of the M-P stage is the mean value of the partial outcomes, and the classification decision is finally made on the basis of the maximum index of Y.
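The fusion stage itself reduces to a few lines. The sketch below assumes the trained single-perspective networks are available as callables returning sigmoid output vectors; this interface is an assumption, not the authors' code.

```python
import numpy as np

def mp_fann_decide(nets, perspectives, classes=("A", "B", "C")):
    # Mean of the partial sigmoid outputs Y_j, then argmax over classes.
    Y = np.mean([net(X) for net, X in zip(nets, perspectives)], axis=0)
    return classes[int(np.argmax(Y))]
```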
7. Multi-perspective Classification

We concentrate on the particular situation in which a single non-cooperative target has been previously detected and tracked by the radar system. Pre-processing of the raw data is necessary in order to increase the quality of the radar signatures: the target region is isolated and made more prominent by subtracting the noise level from the rest of the range profile (Zyweck and Bogner, 1996). The principal discriminating factors for classification purposes are range resolution, side-lobe level (SLL) and noise level. Higher resolution means better point scatterer separation, but the question of compromise regarding how much resolution is needed for a good cost–recognition trade-off is difficult to resolve. Generally, a high SLL means clearer range profiles but this also implies a deterioration in resolution. Finally, a low noise level means high quality range profiles for classification.

Real ISAR turntable data have been used to produce HRR range profiles and images. Three vehicles, classified as A, B and C, constitute the sub-population problem. Each class is described by a set of range profiles covering a 360° rotation of the turntable. Single chirp returns are compressed giving 30 cm range resolution. The grazing angle of the radar is 8°, and 2′ of turntable rotation is the angular interval between two consecutive range profiles. Therefore, approximately 10,000 range profiles are extracted from each data file over the complete rotation of 360°. The training set of representative vectors for each class is made up of 36 range profiles, taken approximately every 10° of rotation of the target. The testing set of each class consists of the remaining range profiles, excluding the templates. The features extracted after principal component analysis (PCA) are the input attributes to the classifier.
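A small sketch of this template/test split, assuming the profiles of one class are stored as an array in rotation order; the counts (36 templates out of roughly 10,000 profiles) follow the text.

```python
import numpy as np

def split_templates(profiles, n_templates=36):
    # Pick evenly spaced templates (~ every 10 degrees); test on the rest.
    n = len(profiles)
    idx = np.linspace(0, n, n_templates, endpoint=False).astype(int)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return profiles[mask], profiles[~mask]   # training set, testing set
```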
Figure 9. Multi-perspective environment deduced from ISAR geometry.
The three algorithms have been implemented and tested in both single and multi-perspective environments. In this way any bias introduced by a single algorithm should be removed. Figure 9 represents a possible approximation of the multi-perspective scenario: each node of the network is assumed to have a fixed position as the target rotates by an angle ψ. The perspective angle ϕ is the angular displacement between the lines-of-sight of two consecutive radars. From each of the radar positions either a series of range profiles can be generated as inputs to a one-dimensional classifier, or they can be processed into an ISAR image that can be input to a two-dimensional classifier. It is therefore possible to perform classification using multiple perspectives.

In Figure 10, the classification performance of the three previously described classifiers is plotted versus the number of perspectives used by the network. Because of the nature of the data and the small number of available targets, the classifiers present a high level of performance even when using only a single aspect angle. Improved performance is achieved by increasing the number of radars in the network, but the greatest improvement in performance is obtained with a small number of radars. Since the number of perspectives, and therefore the number of radars, is strictly related to the complexity, cost and time burden of the network, it may be a reasonable trade-off, for classification purposes, to implement networks involving a small number of nodes. However, this analysis is against a small number of target classes and these conclusions require further verification.
Figure 10. Multi-perspective classification rates using different numbers of perspectives.
The extent to which SNR affects classification, and whether multi-perspective scenarios are effective at different SNR levels, are now examined. The FANN classifier has been used for this particular task. The range profile I and Q components are corrupted with additive white Gaussian noise. The original data, after noise removal, has a 28.24 dB SNR. Subsequently, the classifier is tested with range profiles presenting progressively lower SNRs. In Figure 11, five different SNR levels of the same range profile from class A are represented. At this particular orientation the length of the object is 5.5 m (spanning almost 18 range bins). As the SNR decreases, some of the useful features become less distinct, making the range profile more difficult to classify. In Figure 12, the performance of the FANN classifier is plotted versus the number of perspectives used and the SNR level, showing how the enhancement in classification varies with different noise levels. The graph illustrates an increase in classification performance with the number of perspectives in each case, particularly valuable at the lowest SNR levels. However, below an SNR of 17 dB the performance quickly degrades, indicating that classifiers will be upset by relatively small amounts of noise.
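The corruption step can be sketched as follows, assuming complex baseband profiles; only the 28.24 dB figure comes from the text, and the profile here is a dummy.

```python
import numpy as np

def add_awgn(profile, snr_db, rng=np.random.default_rng(0)):
    # Scale complex white Gaussian noise to the requested SNR in dB.
    p_sig = np.mean(np.abs(profile) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(profile.shape)
                                    + 1j * rng.standard_normal(profile.shape))
    return profile + noise

profile = np.fft.ifft(np.exp(-2j * np.pi * np.arange(52) * 12 / 52))  # dummy
noisy = {snr: add_awgn(profile, snr) for snr in (28.24, 23, 17, 11)}
```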
Figure 11. Different SNR levels of the same range profile.
Figure 12. Correct classification rates for different SNR levels and netted radars.
8. Choice of Perspectives

We now examine the geometrical relationships of node and target location and orientation using full scale measurements of representative targets. We start by re-visiting the neural network classification process in a little more detail and examine the classification performance as a function of the angle between the two perspectives taken. In a two-perspective (2-P) scenario, the parameter that distinguishes the perspective node locations is their relative angular displacement φ1,2 = φ2 − φ1. Hence, after fixing φ1,2, the 2-P classifier is tested with all possible pairs of HRR profiles displaced by that angle, covering all possible orientations of the target. Having a test set consisting of N profiles, the same number of pairs can be formed to test the 2-P classifier. The training set of representative vectors for each class is made up of 36 range profiles, taken approximately every 10° of target rotation. The testing set of each class consists of the remaining range profiles, excluding the templates. The M-P classifier can be seen as the combination of N single-perspective classifiers (i.e. N nodes in the network), where the eventual decision is made by processing the outputs of each FANN. For this first stage of investigation, the angle φ1,2 is not processed as information by the M-P classifier.

Features from radar signatures have been extracted in order to reduce the intrinsic redundancy of the data and simplify the model. This methodology, to a certain extent, decreases the overall correct classification rates but, on the other hand, makes the representation model in the feature space less complex, yielding a classification process consistent with those features effectively characterising the object. A typical HRR profile is shown in Figure 13, where a threshold is applied to the HRR profile. The threshold is determined by measuring the mean intensity value, neglecting the maximum peak, after normalising the profile. This guarantees adaptive target area isolation and less sensitivity to the main scatterer reflection. The radar length of the target for that particular orientation is then measured as the distance between the first and last threshold crossings. This is the first component of the feature vector f. The second component is a measure of the average backscattering of the target, whilst the successive M triples contain the information of the M extracted peaks in terms of amplitude, location and width. If the number P of peaks above the threshold is less than M, the last M − P triples are set to zero (Huaitie et al., 1997). Different numbers of peaks were extracted until the classification process revealed a certain degree of robustness. For M = 4, the feature vector has a dimension of 14 elements, while 52 range bin values make up the raw echo profile. Next, PCA has been applied to the profiles in order to better separate the classes in the feature space. By choosing the largest K eigenvectors from the original 14 of the covariance matrix, the dimension of the new feature vector f′ is additionally reduced.
Figure 13. Feature extraction from HRR profile.
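A hedged sketch of this 14-element feature vector (radar length, mean backscatter, and M = 4 amplitude/location/width peak triples): the threshold rule follows the description above, while the peak picker and width measure are simplified assumptions.

```python
import numpy as np

def hrr_features(profile, M=4):
    p = np.abs(profile) / np.max(np.abs(profile))    # normalised profile
    thr = np.mean(np.delete(p, np.argmax(p)))        # mean excluding max peak
    above = np.where(p > thr)[0]
    length = above[-1] - above[0] if above.size else 0
    feats = [length, p.mean()]                       # radar length, mean level
    # crude local-maxima peak picker above the threshold
    peaks = [i for i in above[1:-1] if p[i] >= p[i - 1] and p[i] >= p[i + 1]]
    for i in sorted(peaks, key=lambda i: -p[i])[:M]:
        width = np.sum(p[max(i - 2, 0):i + 3] > thr)  # coarse width proxy
        feats += [p[i], i, width]
    feats += [0.0] * (14 - len(feats))               # zero-pad missing triples
    return np.array(feats)
```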
In Table I the resulting confusion matrix for K = 10 is shown for the four-class population problem for a single-perspective classifier.

Radar classification is highly dependent on the object orientation. Since radar targets are usually man-made objects, they often present a number of 3D symmetries. In this work, a low grazing angle is used to simulate a ground-based scenario, where the radar system, the target and their relative motion vectors lie on the same plane. This geometry allows us to consider the 2D problem, whereby the perspectives represented by 1D signatures collected by the measurement system are on the same plane. For example, for most of the possible 2D ground-vehicle orientations, with 180° between the two perspectives, the corresponding profiles might be expected to be quite highly correlated.

TABLE I. Single-perspective FANN confusion matrix on features after PCA using K = 10 (correct classification rate = 74.4%).

Input ↓ / Output →   Class A   Class B   Class C   Class D
Class A               75.85     11.19      9.57      3.39
Class B                4.75     80.91     13.26      1.08
Class C                2.81     18.98     69.94      8.27
Class D               12.91      0.76     15.26     71.07
This is due to the 180° symmetry typically exhibited by vehicles, and hence little extra information is added. If this is the case it will cause a reduction in target characterisation and, eventually, of the M-P correct classification rate (CRR) improvement. However, details such as rear-view mirrors, the antenna position and any non-symmetrical or moving parts will change the two signatures, producing CRR benefits when compared with single-perspective classifiers.

Figure 14. 2-P correct classification rates versus the angular displacement φ1,2 for the three-class problem.

We now consider the relationship between node locations and target orientation, investigated for the cases of both two and three-perspective classifiers. In a 2-P scenario the angular perspective displacement between radars is the discriminant factor for the combined CRR: as the two range profiles decorrelate (i.e. as φ1,2 increases) the information content of the pair increases. Figure 14 shows the classification performance as a function of the angular separation of the two perspectives. The equivalent monostatic CRR is shown at the 0° and 360° positions. The single target classification rates show how the global accuracy depends on the particular geometric features of the target. The drop at φ1,2 = 180°, as previously hypothesised, is due to the multiple axes of symmetry of the targets and is visible for all the classes. For targets A and B there is also a drop at 45°, indicating a possible further degree of symmetry. The relationship between M-P classification and range profile information content can be deduced from Figure 15, where the cross-correlation between profiles collected from different perspectives is represented for the class C target.
Figure 15. Non-coherent cross-correlation between profiles belonging to class C.
The regions of high cross-correlation influence the M-P classifier: when 90° of separation occurs between two perspectives (dotted line in Figure 15), the input profiles taken in the range [120–150]° are highly correlated with those belonging to the orientations [210–240]°. M-P classification maxima and minima are mainly related to geometric symmetries.

We now add to the dataset another sub-population, class D, and a new M-P FANN classifier. All the internal parameters of the neural network were changed by the new learning phase. As a result, the decision boundaries between the classes are modified. For this reason, as can be observed in Figure 16, class B is more likely to be misclassified than class C, whose CRR remains almost unaltered. The four-class CRR shows the overall performance deterioration in terms of classification. On the other hand, the internal symmetries that influence the classifier remain unchanged by adding elements to the population. For example, it is thought that the target belonging to class C has a number of multiple-bounce phenomena from corner-like scatterers. Their persistence is less than 15°, causing the number of relative maxima within less than 90° of separation. Every target shows different trends: 90° of angular perspective displacement can mean either an improvement or a reduction in information content from a backscattering point of view. This is due to the detailed make-up of the target and the way in which differential symmetries can be exhibited.
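The cross-correlation surface of Figure 15 can be sketched as follows, under an assumed (n_angles, n_bins) magnitude-profile layout; high off-diagonal bands correspond to the symmetry-induced correlations discussed above.

```python
import numpy as np

def xcorr_matrix(profiles):
    # Non-coherent correlation: magnitude profiles, mean-removed and
    # unit-normalised per aspect angle; entry (a, b) correlates aspects a, b.
    P = np.abs(profiles)
    P = P - P.mean(axis=1, keepdims=True)
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    return P @ P.T
```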
Figure 16. 2-P correct classification rates versus the angular displacement φ1,2 for the four-class problem.
This can make the overall M-P classification rates less sensitive to the angular perspective displacement of the nodes. The effects of symmetries on CRR performance can also be seen in the neighbourhood of φ1,2 = 90° and φ1,2 = 270°. This is verified when the relationship between two perspective displacements φ1,2 and φ′1,2 is:

$$\varphi'_{1,2} = \pi - \varphi_{1,2}. \tag{13}$$
When the perspective condition expressed in the above equation is verified, the 2-P performance is similar because of the intrinsic geometrical symmetries of classes A, B and C. This is verified in both the three and four-class problems shown in Figures 14 and 16, illustrating the M-P independence of the crossaffinity between different targets. The de-correlation rate of the HRR profiles also seems to vary depending on the particular target. In general, it appears proportional to the number of wave trapping features and their persistency. If a single main scatterer has a high persistency over a wide range of orientations, the cross-correlation of the profiles over the same range of target orientations is expected to be high and hence we conclude there is less extra information to improve classification. The more distinguishing features that exist, the greater the separation benefits achieved for single-aspect classification. This concept is amplified for the M-P environment since those crucial features appearing for few orientations affect a greater number of inputs now represented by the perspective
MULTI-PERSPECTIVE IMAGING AND IMAGE INTERPRETATION
25
Figure 17. 3-P correct classification rates versus the angular displacements φ1,2 and φ1,3 for the three-class problem.
combination. For example, the length of the target is an important classification feature. For the M-P environment those targets presenting lengths not common to the other objects have even greater benefits. The mean CRR of the 2-P classifier is increased from 74.4% to 85.7% (+11.3%) with respect to the single-perspective case, while for the single class D the probability of correct classification increases by 13.1%. However, this result is highly averaged but is indicative of the overall performance improvement. In Figure 17 the classification rates for three perspectives are shown. Here the first perspective is fixed whilst the other two slide around the target covering all 360◦ . Thus these are shown as a function of the angular displacement φ1,2 between the second and the first node, and φ1,3 between the third and the first node. The origin represents the mono-perspective case and again indicates the overall improvement offered by the multi-perspective approach. The bisector line corresponds to two radar systems at the same location, while the lines parallel to the bisector symbolise two perspectives displaced by 90◦ and 180◦ . The inherent symmetry in the radar signature of the vehicle gives rise to the relatively regular structure shown in Figure 17. In a 2-Perspective scenario, if the two nodes view the target from the same perspective (i.e. the two radars have the same LOS) then this gives the monoperspective classification performance. This is not the case for a 3-Perspective
Figure 18. 3-P correct classification rates versus the angular displacements φ1,2 and φ1,3 for the four-class problem.
This is a consequence of the coinciding perspective being weighted twice while the third node's perspective is weighted once. As a result, the 3-Perspective performance when φ1,2 = 0 and φ1,3 = 0 is worse than that of the 2-Perspective scenario, which simply neglects one of the coinciding perspectives. In Figure 18, the CRR of the four-class population problem is represented with respect to the angular displacements between the three nodes of the network. As was observed for the 2-Perspective case, the classification performance is less sensitive to the aspect angle when a greater number of classes is considered. This may suggest that in a real environment with a large number of classes, M-P classification could be equally effective for any target position, provided that each node's LOS is spaced far enough from the others to collect uncorrelated signatures. The 3-P classifier probability of correct classification is increased from 74.4% to 90.0% (+15.6%) with respect to the single-perspective case.
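As a hedged illustration of how classification-rate curves such as those in Figures 16–18 are assembled, the sketch below evaluates a deliberately simple nearest-template classifier (not the FANN used in this work) on synthetic point-scatterer profiles, as a function of the 2-P displacement φ1,2 and averaged over the absolute position of the first node.

```python
import numpy as np

# Three hypothetical targets, each described by an HRR profile at every
# one-degree orientation; no real radar data is used here.
def make_target(seed):
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-5, 5, size=(6, 2))
    bins = np.linspace(-8, 8, 128)
    def prof(t):
        los = np.array([np.cos(np.deg2rad(t)), np.sin(np.deg2rad(t))])
        return np.exp(-((bins[None, :] - (pts @ los)[:, None]) ** 2) / 0.05).sum(axis=0)
    return np.array([prof(t) for t in range(360)])

templates = [make_target(s) for s in (1, 2, 3)]        # noise-free library
rng = np.random.default_rng(9)
observed = [t + rng.normal(0, 0.4, t.shape) for t in templates]

def crr_2p(phi12, step=5):
    """CRR of a toy nearest-template 2-P classifier.
    The target orientation is assumed known, which overstates the absolute
    CRR but preserves the dependence on the displacement phi12."""
    correct = total = 0
    for theta in range(0, 360, step):                  # first node position
        for true_c in range(3):
            obs = np.concatenate([observed[true_c][theta],
                                  observed[true_c][(theta + phi12) % 360]])
            dists = [np.linalg.norm(obs - np.concatenate(
                        [templates[c][theta], templates[c][(theta + phi12) % 360]]))
                     for c in range(3)]
            correct += int(np.argmin(dists) == true_c)
            total += 1
    return correct / total

for phi in (0, 30, 90, 180):
    print(phi, round(crr_2p(phi), 3))
```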
9. Summary and Conclusions

After implementing a multi-perspective classifier for ATR, the results in terms of classification rates have been examined using features extracted from HRR profile signatures. The benefits of the M-P classifier implementation have
been analysed, showing a nonlinear but very clear CRR improvement with the number of perspectives. Furthermore, the correct classification gain from employing multiple perspectives is achievable for different SNRs and any radar node location. However, there is a small variation in the correct classification rate for a relatively constrained set of node locations. This is due to the inherent geometrical symmetries of man-made objects. The M-P classification performance has been described for two and three perspectives and applied to a three- and a four-class problem. The multiple perspectives affect the single-class probability of correct classification differently, depending mainly on the number, nature and persistency of the scattering centers appearing in the profiles. As a consequence, the node-location dependence of the global classification rate decreases when a greater number of classes is involved, making the classifier equally reliable for a wide range of angular displacements. The M-P classification improvements are reduced when the nodes are closely separated, since the perspectives exhibit a significant degree of correlation. Nevertheless, the overall probability of correct classification is well above the mono-perspective case (+11.3% and +15.6% using two and three perspectives respectively). In addition, the complexity and variability of reflectivity from real targets has been highlighted. Multiple-perspective classification does not necessarily offer a trouble-free route to acceptable classification and requires further testing under more realistic conditions. It also helps to indicate what information in the radar signature is important for classification. However, much further research remains before routine and reliable classification by radar becomes the norm.
Acknowledgement

The work reported herein was funded by the Electro-Magnetic Remote Sensing (EMRS) Defence Technology Centre, established by the UK Ministry of Defence and run by a consortium of Selex, Thales Defence, Roke Manor Research and Filtronic. The authors would also like to thank Thales Sensors for providing ADAS files to investigate HRR range profiles and ISAR images from real targets.
RADAR IMAGING FOR COMBATTING TERRORISM
Hugh D. Griffiths and Chris J. Baker
Department of Electronic and Electrical Engineering, University College London, London, UK
Abstract. Radar, and in particular imaging radar, has many and varied applications to counterterrorism. Radar is a day/night, all-weather sensor, and imaging radars carried by aircraft or satellites are routinely able to achieve high-resolution images of target scenes, and to detect and classify stationary and moving targets at operational ranges. Short-range radar techniques may be used to identify small targets, even buried in the ground or hidden behind building walls. Different frequency bands may be used: for example, high frequencies (X-band) may be used to support the high bandwidths that give high range resolution, while low frequencies (HF or VHF) are used for foliage penetration to detect targets hidden in forests, or for ground penetration to detect buried targets. The purpose of this contribution is to review the fundamental principles of radar imaging, and to consider the contributions that radar imaging can make in specific aspects of counterterrorism: through-wall radar imaging, radar detection of buried targets, tomography and the detection of concealed weapons, and passive bistatic radar. Key words: radar imaging, ground penetrating radar, passive bistatic radar
1. Introduction

Radar techniques and technology have been developed over many decades. Radar has the key attributes of day/night, all-weather performance, and also the ability to measure target range accurately and precisely. The techniques of imaging radar date back to the Second World War, when crude radar images were obtained on the displays of airborne radar systems, allowing features such as coastlines to be distinguished. A major advance took place in the 1950s, when Wiley in the USA made the first experiments with airborne synthetic aperture radar (Wiley, 1985), and nowadays synthetic aperture radar is an important part of the radar art, with radar imagery from satellites and from aircraft routinely used for geophysical remote sensing and for military surveillance purposes. The enormous advances that have been made in imaging radar
since the 1950s owe a great deal to the development of fast, high-performance digital processing hardware and algorithms. The objective of this chapter is to review the application of imaging radar systems to counterterrorism. In so doing, we need to bear in mind the things that radar is good at doing, and the things that it is not good at doing. At a fundamental level, then, we will be concerned with the electromagnetic properties of the targets of interest, and with their contrast with the surroundings. We will be concerned with how these properties vary with frequency, since targets may exhibit resonances (and hence enhanced signatures) at certain frequencies, and with the propagation characteristics through materials such as building walls and soil. Hence, we will be interested in techniques which allow us to distinguish targets from the background by exploiting differences in signature, wherever possible making use of prior knowledge. It is likely that additional information may be obtained from multiple perspective views of targets (Baker et al., 2006). Further information may be obtainable through the use of techniques such as radar polarimetry and interferometry. There are two distinct stages to this: (i) the production of high-quality, artefact-free imagery, and (ii) the extraction of information from imagery. It should not be expected that radar images will look like photographs. Firstly, radar frequencies are very different from those in the optical region. Secondly, radar is a coherent imaging device (just like a laser) and therefore exhibits multiplicative speckle noise (just like a laser). This makes for an extremely challenging image interpretation environment. The structure of the chapter is therefore as follows. Firstly, we provide a brief review of the fundamentals of radar imaging, establishing some of the fundamental relations for the resolution of an imaging radar system. This is followed by a discussion of five specific applications to counterterrorism: (i) the detection of buried targets; (ii) through-wall radar imaging; (iii) radar tomography and the detection of concealed weapons; (iv) passive mm-wave imaging; and (v) passive bistatic radar. This is followed by some discussion and some comments on future prospects.

2. Fundamentals of Radar Imaging

Firstly we can establish some of the fundamental relations for the resolution of an imaging radar system. In the down-range dimension, the resolution Δr is related to the signal bandwidth B by

Δr = c / (2B)    (1)

where c is the velocity of propagation. High range resolution may be obtained either with a short-duration pulse or by a coded wide-bandwidth signal, such
as a linear FM chirp or a step-frequency sequence, with the appropriate pulse compression processing. A short-duration pulse requires a high peak transmit power and instantaneously broadband operation; these requirements can be relaxed in the case of pulse compression. With the advances in digital processing power it is now relatively straightforward to generate wideband coded waveforms and to perform the pulse compression processing in the receiver in real time. In the first instance, cross-range resolution is determined by the product of the range r and the beamwidth θB (in radians). The beamwidth is determined by the dimension d of the antenna aperture, and thus the cross-range resolution is given by

Δx = r θB ≈ rλ / d    (2)

where λ is the wavelength. As the dimensions of most antennas are limited by practical considerations (such as fitting to an aircraft), the cross-range resolution is invariably much poorer than that in the down-range dimension. However, there are a number of techniques that can improve upon this. All of these are ultimately a function of the change in viewing or aspect angle. The cross-range resolution achieved in synthetic aperture radars is determined by the relative motion between the radar and the object. Consider the scenario in Figure 1. For this geometry, Brown (1967) defines the point target response of a monochromatic CW radar as

W(x) = A (x/r) exp[ j (4π/λ) √(r² + x²) ].    (3)
Figure 1. A target moving perpendicularly past a monochromatic CW radar.
Computation of the instantaneous frequency of the response allows the bandwidth to be written as

B = (4π/λ) · x / √(r² + x²) = (4π/λ) sin(θ/2)    (4)

and hence the cross-range resolution is given by

Δx = π / B = λ / (4 sin(θ/2)).    (5)

For a linear, stripmap-mode synthetic aperture, Equation (5) reduces to Δx = d/2, which is independent of both range and frequency. Even higher resolution can be obtained with a spotlight-mode synthetic aperture, steering the real-aperture beam (either mechanically or electronically) to keep the target scene in view for a longer period, and hence forming a longer synthetic aperture. Thus as θ → 180°, the cross-range resolution Δx → λ/4. A practical maximum value for θ is likely to be no more than 60°, leading to Δx ≈ λ/2, for real systems limited by practical considerations such as range ambiguity, range curvature, SNR, etc. The equivalent result for range resolution may be obtained by writing (1) as

Δr = c / (2B) = (λ/2) (f₀/B).    (6)

The fractional bandwidth B/f₀ could in principle be 200%, with the signal bandwidth extending from zero to 2f₀ and giving Δr = λ/4, but a practical maximum value is likely to be closer to 100%, giving Δr ≈ λ/2. In the last year or so, results have appeared in the open literature which approach this limit (Brenner and Ender, 2004; Cantalloube and Dubois-Fernandez, 2004). Figure 2 shows one example, from a recent conference, of an urban target scene. Critical to the ability to produce such imagery is the ability to characterise and compensate for platform motion errors, since in general the platform will not move with perfectly uniform motion in a straight line. Of course, motion compensation becomes most critical at the highest resolutions. This is conventionally achieved by a combination of inertial navigation (IN) and autofocus processing, and a number of different autofocus algorithms have been devised and evaluated (Oliver and Quegan, 1998). In practice IN sensors are sensitive to rapidly varying errors, and autofocus techniques are best able to correct errors which vary on the same spatial scale as the synthetic aperture length, so the two techniques are essentially complementary. It is possible to improve the resolution in images using so-called super-resolution image processing techniques.
Figure 2. High resolution aircraft-borne SAR image of a part of the university campus in Karlsruhe (Germany). The white arrow refers to a lattice in the left courtyard, which is shown in more detail in the small picture at the bottom left. The corresponding optical image is shown at the top left (after Brenner and Ender (2004)).
Under the most favourable conditions an improvement of a factor of between 2 and 3 in image resolution is achievable, though to achieve this it is necessary that the signal-to-noise ratio should be adequately high (at least 20 dB) to start with, and that the imagery should be free from artefacts. Other means of extracting information from the target scene, and hence of providing information to help discriminate the target from clutter and to classify the target, include (i) interferometric processing to provide high-resolution three-dimensional information on target shape; (ii) polarimetric radar, since the polarimetric scattering properties of targets and of clutter may be significantly different, especially if the target includes dihedral or trihedral-like features; and (iii) multi-aspect imaging, since views from different aspects of a target will almost certainly provide greater information to assist the classification process (Baker et al., 2006).
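Equations (1)–(6) are simple enough to exercise directly. The sketch below plugs in representative numbers; the carrier frequency, bandwidth, range and antenna dimension are illustrative values, not parameters of any system discussed in this chapter.

```python
import numpy as np

c = 3.0e8                          # propagation velocity, m/s

def range_resolution(B):
    """Equation (1): down-range resolution from signal bandwidth B (Hz)."""
    return c / (2.0 * B)

def real_beam_cross_range(r, wavelength, d):
    """Equation (2): real-beam cross-range resolution at range r for an
    antenna of dimension d."""
    return r * wavelength / d

def sar_cross_range(wavelength, theta_deg):
    """Equation (5): cross-range resolution for total aspect change theta."""
    return wavelength / (4.0 * np.sin(np.deg2rad(theta_deg) / 2.0))

f0 = 10e9                          # illustrative X-band carrier
lam = c / f0
print(range_resolution(500e6))                # 0.3 m for a 500 MHz bandwidth
print(real_beam_cross_range(10e3, lam, 1.0))  # 300 m at 10 km: why SAR is needed
print(sar_cross_range(lam, 60.0))             # lambda/2 = 1.5 cm at theta = 60 deg
```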
3. Applications to Counterterrorism

3.1. RADAR DETECTION OF BURIED TARGETS
An important application is the use of radar to detect and classify objects buried in the ground. Specifically in a counterterrorism context such objects may take the form of landmines and improvised explosive devices (IEDs), weapons caches and tunnels, though other applications include archaeology
and the detection of buried pipes and cables. Fundamental to such applications are the propagation characteristics of electromagnetic radiation through soil, and at the boundary between air and soil, and how these characteristics depend on frequency and on soil properties. In general it can be appreciated that a lower frequency may give lower propagation loss than a higher frequency, but will in general give poorer resolution, both in range and in azimuth. Daniels (2004) has provided a comprehensive account of the issues in ground penetrating radar (GPR) and examples of systems and results. He states that '. . . GPR relies for its operational effectiveness on successfully meeting the following requirements: (a) efficient coupling of electromagnetic radiation into the ground; (b) adequate penetration of the radiation through the ground having regard to target depth; (c) obtaining from buried objects or other dielectric discontinuities a sufficiently large scattered signal for detection at or above the ground surface; (d) an adequate bandwidth in the detected signal having regard to the desired resolution and noise levels.' Daniels provides a table of losses for different types of material at 100 MHz and 1 GHz (Table I) and presents a taxonomy of system design options (Figure 3). The majority of systems use an impulse-type waveform and a sampling receiver, processing the received signal in the time domain. More recently, however, FMCW and stepped frequency modulation schemes have been developed, which require lower peak transmit powers. Both types of system, though, require components (particularly antennas) with high fractional bandwidths, which are not necessarily straightforward to realise (Figure 4). As an example of the results that can be achieved, Figure 5 shows images of a buried antipersonnel mine at a depth of 15 cm, showing both the original image and the results after image processing techniques have been used to enhance the target.

TABLE I. Material loss at 100 MHz and 1 GHz (after Daniels (2004), © IET, 2004).

Material             Loss at 100 MHz (dB m⁻¹)   Loss at 1 GHz (dB m⁻¹)
Clay (moist)         5–300                      50–3000
Loamy soil (moist)   1–60                       10–600
Sand (dry)           0.01–2                     0.1–20
Ice                  0.1–5                      1–50
Fresh water          0.1                        1
Sea water            100                        1000
Concrete (dry)       0.5–2.5                    5–25
Brick                0.3–2                      3–20
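To see what the figures in Table I imply for system design, the sketch below computes the two-way path loss to a buried target. The per-metre loss values are mid-range picks from the table, and the 15 cm depth echoes the example of Figure 5; everything else is an assumption made for illustration.

```python
# Two-way propagation loss through soil, using representative per-metre
# loss figures from Table I (dB/m; illustrative mid-range picks).
LOSS_DB_PER_M = {
    ("clay (moist)", "100 MHz"): 150.0,
    ("clay (moist)", "1 GHz"): 1500.0,
    ("sand (dry)", "100 MHz"): 1.0,
    ("sand (dry)", "1 GHz"): 10.0,
}

def two_way_loss_db(material, band, depth_m):
    """Loss over the down-and-back path to a target buried depth_m deep."""
    return 2.0 * depth_m * LOSS_DB_PER_M[(material, band)]

# A mine at 15 cm depth, as in Figure 5:
for mat in ("sand (dry)", "clay (moist)"):
    for band in ("100 MHz", "1 GHz"):
        print(mat, band, f"{two_way_loss_db(mat, band, 0.15):.1f} dB")
```

The numbers make the frequency trade-off concrete: at 1 GHz the two-way loss through moist clay reaches hundreds of dB even at shallow depths, whereas dry sand is almost transparent at both frequencies.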
Figure 3. System design and processing options for ground penetrating radar (after Daniels (2004), © IET, 2004).
Figure 4. Physical layout of a ground penetrating radar system (after Daniels (2004), © IET, 2004).
Figure 5. Oblique antipersonnel mine at an angle of 30°: (a) B-scan of raw data; (b) after migration by deconvolution; (c) after Kirchhoff migration (after Daniels (2004), © IET, 2004).
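The migration results in Figure 5 can be mimicked with a toy diffraction-summation scheme: each image cell sums the B-scan along its own two-way travel-time hyperbola, so energy smeared along the hyperbolic signature collapses back onto the target. The sketch below is a drastic simplification of the Kirchhoff migration referred to in the caption, assuming a constant wave speed, a single ideal point target and noiseless data; all parameter values are invented.

```python
import numpy as np

v = 1.0e8                        # illustrative wave speed in soil (~c/3), m/s
dx, dt = 0.02, 0.1e-9            # trace spacing (m) and sample interval (s)
n_traces, n_samples = 101, 400

# Synthetic B-scan: hyperbolic response of a point target at x = 1.0 m, z = 0.3 m.
bscan = np.zeros((n_traces, n_samples))
xt, zt = 1.0, 0.3
for i in range(n_traces):
    t = 2.0 * np.hypot(i * dx - xt, zt) / v     # two-way travel time
    k = int(round(t / dt))
    if k < n_samples:
        bscan[i, k] = 1.0

# Migration: every image cell collects energy along its own hyperbola.
nz, dz = 200, 0.005
image = np.zeros((n_traces, nz))
for i in range(n_traces):
    for j in range(nz):
        x0, z0 = i * dx, j * dz
        for a in range(n_traces):
            t = 2.0 * np.hypot(a * dx - x0, z0) / v
            k = int(round(t / dt))
            if k < n_samples:
                image[i, j] += bscan[a, k]

# The peak should fall near trace 50, depth cell 60, i.e. (1.0 m, 0.3 m).
print(np.unravel_index(image.argmax(), image.shape))
```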
3.2. THROUGH-WALL RADAR IMAGING
The ability to image targets through building walls, to detect and classify personnel and objects within rooms, is of significant importance in counterterrorism operations, and there has been a great deal of work on the subject in the last decade. Essentially similar considerations to those for GPR apply, in that a lower frequency may give lower propagation loss than a higher frequency, but will in general give poorer resolution, both in range and in azimuth. The final line of Table I shows the attenuation of brick at frequencies of 100 MHz and 1 GHz, but many internal building walls may be made of material of lower attenuation. Police officers, search and rescue workers, urban-warfare specialists and counterterrorism agents may often encounter situations where they need to detect, identify, locate and monitor building occupants remotely. An ultra-high resolution through-wall imaging radar has the potential to supply this key intelligence in a manner not possible with other forms of sensor. High resolution and the ability to detect fine Doppler features such as heartbeat movements enable a detailed picture of activity to be built up, so that appropriate and
informed decisions can be made as to how to tackle an incident. For example, in a hostage scenario the layout of a room can be established and it may be possible to differentiate hostage from terrorist. An escaping suspect hidden behind a bush or wall can be detected and tracked. Alternatively, in the aftermath of a terrorist event, people buried in rubble but alive can be detected on the basis of radar reflections from their chest cavity, thus helping to direct rescue workers to best effect. Key enabling technology and techniques are:
• distributed aperture sensing for very high (∼λ or better) cross-range resolution in both azimuth and elevation dimensions;
• ultra-wideband pulsed waveforms (to give wavelength-equivalent range resolution);
• novel near-field focus tomographic processing;
• narrow-band adjunct waveforms for fine Doppler discrimination.
Whilst aspects of each of these have been reported individually (Aryanfar and Sarabandi, 2004; Song et al., 2005; Yang and Fathy, 2005), there has been little or no research combining them to generate wavelength-sized resolution in three dimensions, enabling fine discrimination of internal objects (e.g. an individual holding a gun could potentially be identified and his position determined against background furniture). Much of the technology necessary already exists and can be procured at low cost (of the order of a few thousand pounds, aided by the fact that transmitter powers need only be a fraction of a Watt). The key research challenges lie primarily in developing processing techniques and algorithms that allow a wavelength

(>10¹⁵ Ω/0.2 pF), with an input bias current as low as IB = 3 fA. The core of the Electronic Tongue is an ad-hoc programmed 8-bit RISC microcontroller (AT90LS8535), which is responsible for managing the measurement flow:
• control of the multiplexer's line switching;
• control of the A/D's functionalities;
• communication with the PC using an RS232 line.

Figure 4. Simplified schematic of the Electronic Tongue electronics.

Data A/D conversion is performed using a 12-bit (plus polarity) converter, whose functionalities are managed by the controller. An external low-noise precision voltage reference with an extremely low temperature coefficient (0.5 ppm/°C) has been used.

8. "Chemical Images of Liquids"—Selected Applications

A brief review of selected applications of electrochemical multisensor arrays is presented in this section, and several examples of Electronic tongue applications are discussed in detail. The ability to distinguish compounds responsible for basic tastes, and, by detecting such compounds in an analyzed sample, to evaluate the sample's "taste", was examined in several early works of Toko and coworkers (Iiyama et al., 1995, 2003) and of other authors (Habara et al., 2004; Riul et al., 2002). Despite the plethora of works related to the qualitative discrimination of waters, this target still attracts many researchers, since monitoring of water quality is an important task for food safety and environmental control (Di Natale et al., 1997; Rudnitskaya et al., 2001). Common methods permit a separate determination of several standard water parameters like pH, COD (chemical oxygen demand), BOD (biological
oxygen demand), water hardness expressed as Ca²⁺ and Mg²⁺ content, conductivity, turbidity, the content of several inorganic anions, etc. Nevertheless, some of these methods are rather complicated (for instance, COD and BOD detection demand sample pre-exposition and harmful dichromate oxidation). Often an integral overview is preferable for the classification of water samples, especially in ecological monitoring tasks for alarm-like purposes (Krantz-Rülcker et al., 2001). An application of multisensor systems can provide this type of information. "Images" of water samples provided by multisensor systems may be classified according to membership of a defined class (like pure standard water or polluted water). Moreover, several quantitative parameters can be derived. An interesting application of a potentiometric sensor array based on thick-film technology and composed of RuO₂, C, Ag, Ni, Cu, Au, Pt and Al micro-electrodes printed onto an alumina substrate for the analysis of six Spanish natural mineral waters was reported in Martínez-Máñez et al. (2005c). Based on the possibility of extracting chemical information from spontaneous polarization processes on the electrode surface, the authors could perform not only an identification of mineral water, but also quantitative analysis of HCO₃⁻, SO₄²⁻, Ca²⁺ and Mg²⁺ ion content. The calibration PLS models were built on the basis of the known content of the mentioned ions and the potential readings of the 12-element sensor array. The resulting correlation coefficients were 0.954, 0.913, 0.980 and 0.902 for bicarbonate, sulphate, calcium and magnesium ions respectively. Quantitative determination of copper, zinc, lead and cadmium activity at low ppb and ppt levels in a mixed artificial solution mimicking marine water (with a salinity of about 30‰) was reported in Legin et al. (2004b). The average relative error in prediction was 20 to 30%, which is an encouraging result for such ultra-low activities. Wine production is one of the biggest branches of the modern food industry, and control of wine production, aging and storage processes, as well as of taste properties, is in demand. Many applications of Electronic tongue systems have been reported for the recognition of red wines from different vineyards (Legin et al., 1999b, 2001). Results of wine analysis were correlated with taste panel and standard chemical analysis data. At the same time, few applications have been reported for white wine analysis. Even though white wine has a shorter life time (3 to 5 years, in comparison to 5 to 20 years for red wines), it occupies a considerable part of the wine market. In Lvova et al. (2004) a potentiometric electronic tongue with PVC plasticized membrane sensors doped with several porphyrins (H₂TPP, Co(II, III) and Pt(II, IV) porphyrinates) deposited on glassy carbon working electrodes was evaluated for the analysis of Italian 'Verdicchio' dry white wines. The porphyrin-based "electronic tongue" system discriminated between artificial and real wines and simultaneously distinguished wine samples from different cantinas and production years, Figure 5. The older wines have negative scores along the PC3 component, while a clear discrimination between real and artificial wines is directed along PC1.
Figure 5. ‘Verdicchio’ wines identification.
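A hedged sketch of the PLS calibration procedure used in studies such as the mineral-water and wine work above: the sensor potentials and ion concentrations below are synthetic placeholders, and scikit-learn's PLSRegression merely stands in for whatever chemometrics software the original authors used.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Placeholder data: 60 water samples, 12-electrode potential readings and
# 4 target ion concentrations (e.g. HCO3-, SO4 2-, Ca2+, Mg2+).
n_samples, n_sensors, n_ions = 60, 12, 4
sensitivity = rng.normal(0, 1, (n_ions, n_sensors))        # hidden mixing
conc = rng.uniform(0.1, 2.0, (n_samples, n_ions))          # say, mmol/L
potentials = conc @ sensitivity + rng.normal(0, 0.05, (n_samples, n_sensors))

X_tr, X_te, y_tr, y_te = train_test_split(potentials, conc, random_state=0)

pls = PLSRegression(n_components=6)
pls.fit(X_tr, y_tr)
y_hat = pls.predict(X_te)

# Per-ion correlation between true and predicted concentrations, analogous
# to the 0.954 / 0.913 / 0.980 / 0.902 figures quoted above.
for i in range(n_ions):
    print(f"ion {i}: R = {np.corrcoef(y_te[:, i], y_hat[:, i])[0, 1]:.3f}")
```

With real data, `potentials` would hold measured electrode readings and `conc` the reference analyses used for calibration.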
Several attempts at quantitative ethanol determination with artificial taste systems have been undertaken. Thus, in Arikawa et al. (1995), along with the discrimination of four kinds of sake, the concentration of ethanol in the analytes was determined by means of a taste sensor. An Electronic tongue array comprising eight potentiometric solid-state sensors with plasticized polymeric membranes based on metallo-porphyrins was applied for the "chemical imaging" of alcoholic beverages made of two different source materials, grape and barley, and for the quantitative determination of the alcoholic degree of these beverages (Lvova et al., in press b). Prior to the analysis of real samples, multivariate calibration of the Electronic tongue was performed using 36 calibration solutions containing four alcohols in various concentrations. The model was trained with PLS, and a good correlation between real and predicted ethanol content (R = 0.952) was obtained. The alcoholic strength of real alcoholic beverages was then evaluated on the derived one-dimensional scale of 'alcoholic degree', expressed in ethanol concentration, Figure 6. Except for the outlying samples of "Ceres" beer and Ballantine's whiskey, a good correlation was found between the alcohol content predicted by the electronic tongue and that labeled by the manufacturers. The ability of the porphyrin-based electronic tongue system to distinguish alcoholic beverages made of different source materials (grapes and barley) was also shown, Figure 7.
Figure 6. Evaluation of the alcoholic degree of beverages using the porphyrin-based electronic tongue.
Figure 7. Imaging of grape- and barley-based alcoholic drinks by means of the porphyrin-based electronic tongue system (PCA score plot; axes PC1, 69% and PC2, 21%; regions I and II). All the analyzed beverages were available in local stores in Rome, Italy. Grape-made alcoholic drinks were as follows: two kinds of white dry wine ("Velletri superiore" by Le Contrade, Italy, 11.5% vol, 2001; "Marino" by Gotto D'Oro, Italy, 11% vol, 2002); two kinds of red dry wine ("Castelli Romani" by Le Contrade, Italy, 11% vol, 2002; "Castelli Romani" by Gotto D'Oro, Italy, 10.5 ± 1.5% vol, 2003); and grappa "Colli del Frascati" by "IlahCoral" s.r.l., Italy, 38% vol, 2003. All the wines were of DOC quality, i.e. of denominated and controlled origin. Beverages made of barley were: two lager beers ("Moretti", 4.6% vol, and "Peroni", 4.7% vol, both from Italian manufacturers); red beer "Ceres", 6.5% vol, Denmark; amber strong beer "Leffe", 8.2% vol, Belgium; and Ballantine's scotch whiskey by G. Ballantine and Son Ltd, Scotland, 40% vol, 2003.
"Budweiser"** 100
"Cafri"* Light taste
PC2 (30%)
50
0
"OBLager"* can "Cass"* can
-50 "OBLager"*
"RedRock"* "Beck's" light
"Hite"* can
bottle
Mild taste
Rich taste
-100
"Beck's" dark -150
-100
-50
0
50
100
150
PC1(32%) Figure 8. A taste map (PCA score plot) of several beers available in Korea obtained from the all-solid-state electronic tongue chips.
Two areas, corresponding to the alcoholic drinks made of grape and barley (areas I and II along the PC1 axis in Figure 7 respectively), can be distinguished. Classification of spirits such as ethanol, vodka, eau-de-vie and cognac was performed by means of an electronic tongue system based on an array of potentiometric sensors in Legin et al. (2005b). The system distinguished different samples of cognac from each other and from any eau-de-vie, distinguished synthetic and alimentary ethanol and various sorts of spirit differing in ethanol quality, and detected the presence of contaminants in vodka. Samples of different brands of beer produced in different countries were analyzed with a lipid taste sensor (Toko, 2000). In Lvova et al. (2002) several beers were distinguished from each other by means of an all-solid-state potentiometric Electronic tongue; a correlation between beer taste characteristics and the multisensor system's output was found, and a "beer taste map" was derived, Figure 8. An impressive example of the chemical imaging of port wines according to wine age was demonstrated in Rudnitskaya et al. (2005). An electronic tongue comprising 28 potentiometric sensors was applied to the analysis of two sets of port wines (22 and 170 samples respectively): wines aged in oak casks for 10, 20, 30 and 40 years, and Vintage, LV and Harvest wines (set 2) of 2 to 70 years of age. The resulting PLS precision of port wine age prediction with the Electronic tongue system was 8%, i.e. one year.
Figure 9. PCA classification of diluted and pure vinegars.
Several quantitative parameters, such as pH (1%), total (8%), volatile (21%) and fixed (10%) acidity, as well as the content of organic acids (tartaric, 10%; malic, 15%), sulphate (9%) and sulphur dioxide (24%), were evaluated in port wines by means of a preliminary calibration of the Electronic tongue system and subsequent prediction for unknown samples with adequate accuracy. Another application closely related to wine analysis is the identification of vinegars. Vinegar production is connected to the wine industry (wine is utilized for vinegar acidification), and the identification of vinegars is important in the effort to achieve an adequate quality of production, to ensure uniformity within a brand, and to avoid falsifications (dilutions). An electronic tongue system based on an array of six metallic potentiometric sensors (Cu, Sn, Fe, Al, brass and stainless steel wires) was developed and utilized for the discrimination of foodstuffs: several types of vinegar and fruit juices. The result of the "imaging" of vinegars prepared from white and red wines, balsamic vinegar and diluted vinegar samples is given in Figure 9. According to the first two principal components (PC1, 86% and PC2, 13%), two main areas can be indicated on the plot: one of them includes all the pure vinegar samples, while the other embraces tap water and the diluted vinegar samples. Simultaneously with the qualitative discrimination, the metallic electronic tongue system was able to track the direction of quantitative dilution of the vinegars.
Figure 10. Imaging of Lavazza coffee, Lipton black tea and several sorts of Korean green tea.
Fruit juices, dairy products (Winquist et al., 2005), various drinks (black and green teas, coffee) and soft drinks are popular objects of electronic tongue applications. A multicomponent analysis of Korean green tea samples was performed in Lvova et al. (2003) with an all-solid-state 'electronic tongue' microsystem comprising polymeric sensors of different types, based on both PVC and aromatic polyurethane (ArPU) matrices doped with various membrane-active components, electrochemically deposited conductive films of polypyrrole (PPy) and polyaniline (PAn), and potentiometric glucose biosensors. The system successfully discriminated different kinds of tea (black and green) and natural coffee, Figure 10. The output of the electronic tongue for green teas correlated well with the manufacturer's specifications for (−)-EGC concentration, total catechin and sugar content, and l-arginine concentration determined by an enzymatic destruction method; the concentrations of the mentioned components were then determined in green tea samples with unknown manufacturer specifications. Clinical analysis of biological liquids is another area of electronic tongue application. In Legin et al. (1999c) a multisensor system comprising an array of 30 sensors with solvent polymeric membranes doped with various MAC, with a back-propagation artificial neural network as the data processing tool, was applied to the multicomponent analysis of solutions close in composition to biological liquids (blood plasma). It was found that such an approach allows Mg²⁺, Ca²⁺, pH, HCO₃⁻ and HPO₄²⁻ to be determined in typical ranges with an average precision of 24%, which suggests the method as a promising one for clinical analysis.
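The PCA "chemical images" that recur throughout this section (Figures 7–12) reduce to a few lines of linear algebra. The sketch below, on invented sensor-array data, computes the score coordinates and the explained-variance percentages that label the axes of those plots.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder sensor-array data: 30 samples x 8 electrodes, with two of the
# three groups shifted along hidden directions (standing in for sample classes).
X = rng.normal(0, 0.2, (30, 8))
X[:10] += rng.normal(0, 1, 8)        # group 1 offset
X[10:20] += rng.normal(0, 1, 8)      # group 2 offset

Xc = X - X.mean(axis=0)              # mean-centre each electrode
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s                       # PCA scores: columns are PC1, PC2, ...
explained = 100 * s**2 / np.sum(s**2)

print(f"PC1 {explained[0]:.0f}%, PC2 {explained[1]:.0f}%")
# Plotting scores[:, 0] against scores[:, 1] gives a score plot like Figure 11.
```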
Figure 11. Human urine classified by potentiometric electronic tongue based on metallic sensors.
Six potentiometric metallic sensors (Co, 99.9% pure; brass, an alloy of Cu and 20 wt% Zn; two silver alloys, Ag42-Cu17-Zn16-Cd25 and Ag60-Cu26-Sn14, where the numbers are in wt%; the two-component alloy Sn40-Pb60; and Cu-P, copper doped with 5 wt% of phosphorus) and two PVC-based electrodes doped with 3 wt% of monensin and 5 wt% of TpClPBK were utilized for the clinical analysis of human urine. A PCA discrimination score plot of 36 solutions mimicking human urine composition (points 1–131) and 14 real human urine samples (points 132 to 173) is shown in Figure 11. Simultaneously with the discrimination between artificial and real urine samples, groups corresponding to healthy persons and to persons affected by pathologies emerged at the PCA classification step. It is interesting to notice here that samples NN 91 and 92 were provided by a person following a strict vegetarian diet. Moreover, a blind sample (points 103 to 105) was classified as a non-urine sample (according to the legend, the sample was in fact a standard black tea preparation). In closing, several examples of less typical applications of electronic tongue systems should be reviewed. First of all, several impressive examples of the imaging of bacteria, mold and yeast species (Söderström et al., 2003, 2005), as well as of the observation of a fermentation process with Aspergillus niger (Legin et al., 2004a) and of mold growth (Söderström et al., 2003), were recently reported.
Figure 12. Identification of organic fertilizers and soil aqueous extracts with the metallic sensor array (PCA score plot; axes PC1, 59% and PC2, 36%; labelled samples include Agrocore, Aldrich, Humate 80, August, Green Belt, Nobel, Haplic Chernozem and Albic Luvisol).
The application of a potentiometric metallic sensor array composed of six metallic sensors of the first kind (Fe, Al, Cu, Sn, brass and stainless steel) to the discrimination of water-soluble humic substance (HS) preparations, organic fertilizers and various soil-type aqueous extracts has been performed. Such a classification may be a first step in the qualitative agricultural analysis of soils according to their actual fertility, and of fertilizers according to their effectiveness, since fertility is strictly connected to the amount of water-soluble humic substances. Classification was performed based on the complexation affinity of HS with transition metals. A PCA score plot for the discrimination of five organic fertilizers, a humic acid preparation and two soils (Haplic Chernozem and Albic Luvisol) is shown in Figure 12. Aqueous extracts of both soil samples are far separated from the group of organic fertilizers along the PC1 axis, which may be attributed to the lower content and different nature of HS in the soil extracts. The HC and AL soils are also well distinguished along PC2, which may be correlated to the pH of the samples as well as to the HS lability, since HC contains a smaller amount of labile fulvic acids. An interesting application to the imaging of vegetable oils was reported by Apetrei et al. (2005). Vegetable oils were used as the electroactive binder material of carbon paste electrodes. The electrochemical response of such electrodes immersed in a variety of electrolytic aqueous solutions was exploited to discriminate the oils. The features observed in the voltammograms reflect the redox properties of the electroactive compounds (mainly antioxidants) present in the oils inside the carbon paste matrix. The different polyphenol content allows olive oils to be easily discriminated from sunflower oil or corn oil. These parameters, together with the influence of pH on the voltammograms,
were used as the input variables of PCA. The results indicated that the suggested method allows a clear discrimination among oils of different vegetal origins and also allows olive oils of different quality (extra virgin, virgin, lampante and refined olive oil) to be discriminated. An electronic tongue multisensor system (ET) composed of 28 potentiometric chemical sensors, with both chalcogenide glass and plasticized polymeric membranes, was used for the discrimination of standard and Mancha Amarela (Yellow Spot) cork (Rudnitskaya et al., 2006). Extracts of cork in 10% ethanol–water solutions were analyzed. Two sets of cork samples that included both standard (S) and Mancha Amarela (MA) cork samples from two different factories were studied. It was found that the ET could reliably distinguish extracts made from S and MA cork regardless of the samples' origin. The ET could predict the total phenol content with an average precision of 9% when calibrated using reference data obtained by the Folin-Ciocalteu method. The composition of S and MA cork was also studied by the ET. The largest difference in concentration between the two types of cork extracts was found for the content of two acids: gallic and protocatechuic.
9. Non-Electrochemical Electronic Tongues

Although the majority of the electronic tongues reported in the literature are based on electrochemical transduction principles, in the last few years some examples of electronic tongues based on different sensing mechanisms have appeared. In this scenario, optical transduction has been exploited by the group working at the University of Texas at Austin (Lavigne et al., 1998). The electronic tongue developed by this group was based on polyethylene glycol–polystyrene resin beads functionalized with different dyes. These beads are positioned within silicon microcavities in order to mimic the natural taste buds, Figure 13. Both the colorimetric and the fluorescence property variations induced by the interaction of the immobilized dyes with target analytes can be exploited for the sensing mechanism. These changes are recorded by a CCD camera to obtain
Figure 13. Schematic presentation of the optical Electronic tongue based on polyethylene glycol–polystyrene resin beads functionalized with different dyes.
the chemical images of the analyzed environments. The analysis of the RGB patterns allows the identification, and potentially the quantification, of the analytes present in the analyzed sample. More recently the electronic tongue has been implemented with a micromachined fluidic structure for the introduction of the liquid samples (Sohn et al., 2005). As an extension of his work on the "Smell Seeing" approach, Suslick has recently reported a colorimetric array for the detection of organic compounds in water (Zhang and Suslick, 2005). The array was prepared by printing metalloporphyrin, solvatochromic and pH indicator dye spots on a hydrophobic surface. The array was saturated with a reference solution and imaged by an ordinary flatbed scanner to obtain the starting image. The interaction with target analytes induces color changes recorded by the scanner; subtraction of the original reference image provides the color change profile, which is the fingerprint of each analyte. In this configuration the array is not influenced by salts or hydrophilic compounds, and for this reason the authors noted that the array is not really an electronic tongue. Other than optical arrays, nanogravimetric sensors have also been exploited for the development of electronic tongues. The first report was by Hauptmann and co-workers, with the use of the quartz crystal microbalance (QCM) in liquids (Hauptmann et al., 2000). QCMs are more difficult to use in the liquid phase, because the high damping of the oscillator requires more electronics and care in the measurements, although the advantages of QCMs in terms of miniaturization and integration with a wide range of different sensing materials are preserved also in the liquid phase. While the use of the QCM has not been further developed, more recently Gardner and co-workers have reported the use of a dual shear horizontal surface acoustic wave (SH-SAW) device for the realization of a miniaturized electronic tongue (Sehra et al., 2004). The main advantage claimed by the authors lies in the fact that this device is based on physical rather than chemical or electrochemical transduction principles, and for this reason it is more robust and durable. The sensing mechanism is based on the measurement of both mechanical (physico-acoustic) properties and electrical (electro-acoustic) parameters of the analyzed liquid. The selectivity of the system can, however, be improved, if necessary, by the deposition of a selective membrane onto the sensing area of the device. The system has been tested on the classification of model analytes representing the basic human tastes, and also exploited for the discrimination of real samples, such as cow milk.
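The colour-change fingerprint described above is essentially an image subtraction. The sketch below assumes the scanner images have already been registered and reduced to one averaged RGB triplet per indicator spot; the array size and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical scanner data: 6 x 6 indicator spots, each reduced to its mean
# (R, G, B) value before and after exposure to the analyte.
reference = rng.uniform(0, 255, (6, 6, 3))          # saturated with reference solution
exposed = reference + rng.normal(0, 12, (6, 6, 3))  # after contact with the sample

# Colour-change profile: one signed (dR, dG, dB) triplet per spot, flattened
# into a single fingerprint vector for the whole array.
fingerprint = (exposed - reference).reshape(-1)     # length 6 * 6 * 3 = 108

def similarity(f1, f2):
    """Cosine similarity used to compare the fingerprints of two analytes."""
    return np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))

print(fingerprint.shape, round(similarity(fingerprint, fingerprint), 3))
```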
References

Albert, K. J., Lewis, N. S., Schauer, C. L., Sotzing, G. A., Stitzel, S. E., Vaid, T. P., and Walt, D. R. (2000) Cross-Reactive Chemical Sensor Arrays, Chemical Review 100, 2595–2626.
Apetrei, C., Rodriguez-Mendez, M. L., and de Saja, J. A. (2005) Modified Carbon Paste Electrodes for Discrimination of Vegetable Oils, Sensors and Actuators B 111/112, 403–409.
Arikawa, Y., Toko, K., Ikezaki, H., Shinha, Y., Ito, T., Oguri, I., and Baba, S. (1995) Sensor Materials 7, 261.
Arrieta, A., Rodriguez-Mendez, M. L., and de Saja, J. A. (2003) Sensors and Actuators B 95, 357–365.
Bard, A. J. and Faulkner, L. R. (2000) Electrochemical Methods: Fundamentals and Applications, 2nd ed., New York, Wiley, 856 pp.
Bühlmann, P., Pretsch, E., and Bakker, E. (1998) Carrier-Based Ion-Selective Electrodes and Bulk Optodes. 2. Ionophores for Potentiometric and Optical Sensors, Chemical Review 98, 1593–1687.
Buratti, S., Benedetti, S., Scampicchio, M., and Pangerod, E. C. (2004) Characterization and Classification of Italian Barbera Wines by Using an Electronic Nose and Amperometric Electronic Tongue, Analytica Chimica Acta 525, 133–139.
Cattrall, R. W. (1997) Chemical Sensors, Vol. 1, Oxford, Oxford University Press.
Ciosek, P., Sobanski, T., Augustyniak, E., and Wroblewski, W. (2006) ISE-based Sensor Array System for Classification of Foodstuffs, Measurement Science and Technology 17, 6–11.
D'Amico, A., Di Natale, C., and Paolesse, R. (2000) Portraits of Gasses and Liquids by Arrays of Nonspecific Chemical Sensors: Trends and Perspectives, Sensors and Actuators B 68, 324–330.
Di Natale, C., Macagnano, A., Davide, F., D'Amico, A., Legin, A., Rudnitskaya, A., Vlasov, Yu., and Seleznev, B. (1997) Multicomponent Analysis of Polluted Waters by Means of Electronic Tongue, Sensors and Actuators B 44, 423–428.
Di Natale, C., Paolesse, R., Macagnano, A., Mantini, A., D'Amico, A., Legin, A., Lvova, L., Rudnitskaya, A., and Vlasov, Yu. (2000a) Electronic Nose and Electronic Tongue Integration for Improved Classification of Clinical and Food Samples, Sensors and Actuators B 64, 15–21.
Di Natale, C., Paolesse, R., Macagnano, A., Mantini, A., D'Amico, A., Ubigli, M., Legin, A., Lvova, L., Rudnitskaya, A., and Vlasov, Yu. (2000b) Application of a Combined Artificial Olfaction and Taste System to the Quantification of Relevant Compounds in Red Wine, Sensors and Actuators B 69, 342–347.
Duda, R. O., Hart, P. E., and Stork, D. E. (2001) Pattern Classification, 2nd ed., New York, Wiley.
Duran, A., Cortina, M., Velasco, L., Rodriguez, J. A., Alegret, S., Calvo, D., and del Valle, M. (2005) Virtual Instrument as an Automated Potentiometric e-Tongue Based on SIA, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 316–319.
Gallardo, J., Alegret, S., and del Valle, M. (2004) A Flow-injection Electronic Tongue Based on Potentiometric Sensors for the Determination of Nitrate in the Presence of Chloride, Sensors and Actuators B 101, 72–80.
Geladi, P. and Kowalski, B. (1986) Partial Least Squares Regression: A Tutorial, Analytica Chimica Acta 185, 1–17.
Grl, M., Cortina, M., Calvo, D., and del Valle, M. (2005) Automated e-Tongue Based on Potentiometric Sensors for Determining Alkaline-earth Ions in Water, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 296–299.
Habara, M., Ikezaki, H., and Toko, K. (2004) Study of Sweet Taste Evaluation Using Taste Sensor with Lipid/Polymer Membranes, Biosensors and Bioelectronics 19, 1559–1563.
Harvey, D. (2000) Modern Analytical Chemistry, 1st ed., New York, McGraw-Hill, p. 468.
Hauptmann, P., Borngräber, R., Schröder, J., and Auge, J. (2000) IEEE/EIA International Frequency Control Symposium and Exhibition, p. 22.
Hayashi, K., Yamanaka, M., Toko, K., and Yamafuji, K. (1990) Sensors and Actuators B 2, 205.
Holmin, S., Krantz-Rülcker, C., and Winquist, F. (2004) Multivariate Optimization of Electrochemically Pre-treated Electrodes Used in a Voltammetric Electronic Tongue, Analytica Chimica Acta 519, 39–46.
Iiyama, S., Ezaki, S., Toko, K., Matsuno, T., and Yamafuji, K. (1995) Study of Astringency and Pungency with Multichannel Taste Sensor Made of Lipid Membranes, Sensors and Actuators B 24/24, 75–79.
Iiyama, S., Kuga, H., Ezaki, S., Hayashi, K., and Toko, K. (2003) Peculiar Change in Membrane Potential of Taste Sensor Caused by Umami Substances, Sensors and Actuators B 91, 191–194.
Izs, S. and Garnier, M. (2005) The Umami Taste and the IMP/GMP Synergetic Effects Quantified by the ASTREE Electronic Tongue, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 66–69.
Krantz-Rülcker, C., Stenberg, M., Winquist, F., and Lundström, I. (2001) Electronic Tongues for Environmental Monitoring Based on Sensor Arrays and Pattern Recognition: A Review, Analytica Chimica Acta 426, 217–226.
Kruse-Jarres, J. D. (1988) Ion-selective Potentiometry in Clinical Chemistry, Medical Progress Through Technology 13, 107–130.
Lavigne, J. J. and Anslyn, E. V. (2001) Angewandte Chemie International Edition 40, 3118.
Lavigne, J., Savoy, S., Clevenger, M. B., Ritchie, J. E., Bridget, M., Yoo, S. J., Anslyn, E. V., McDevitt, J. T., Shear, J. B., and Neikirk, D. (1998) Journal of the American Chemical Society 120, 6429.
Legin, A., Kirsanov, D., Rudnitskaya, A., Iversen, J. J. L., Seleznev, B., Esbensen, K. H., Mortensen, J., Houmøller, L. P., and Vlasov, Yu. (2004a) Multicomponent Analysis of Fermentation Growth Media Using the Electronic Tongue (ET), Talanta 64, 766–772.
Legin, A., Lvova, L., Rudnitskaya, A., Vlasov, Yu., Di Natale, C., Mazzone, E., and D'Amico, A. (2001) In Proceedings of the Second Symposium in Vino Analytical Science, Bordeaux, France, p. 165.
Legin, A., Makarychev-Mikhailov, S., Goryacheva, O., Kirsanov, D., and Vlasov, Yu. (2002) Cross-Sensitive Chemical Sensors Based on Tetraphenylporphyrin and Phthalocyanine, Analytica Chimica Acta 457, 297–303.
Legin, A. V., Rudnitskaya, A. M., Legin, K. A., Ipatov, A. V., and Vlasov, Yu. G. (2005a) Methods for Multivariate Calibrations for Processing of the Dynamic Response of a Flow-Injection Multiple-Sensor System, Russian Journal of Applied Chemistry 78, 89–95.
Legin, A., Rudnitskaya, A., Seleznev, B., Kirsanov, D., and Vlasov, Yu. (2004b) Chemical Sensor Arrays for Simultaneous Activity of Several Heavy Metals at Ultra Low Level, In Proceedings of Eurosensors XVIII, Rome, Italy, pp. 85–86.
Legin, A., Rudnitskaya, A., Seleznev, B., and Vlasov, Yu. (2005b) Electronic Tongue for Quality Assessment of Ethanol, Vodka and Eau-de-vie, Analytica Chimica Acta 534, 129–135.
Legin, A., Rudnitskaya, A., Smirnova, A., Lvova, L., and Vlasov, Yu. (1999a) Journal of Applied Chemistry (Russia) 72, 114.
Legin, A., Rudnitskaya, A., and Vlasov, Yu. (2003) In S. Alegret (ed.), Integrated Analytical Systems, Comprehensive Analytical Chemistry, Vol. XXXIX, Amsterdam, Elsevier, p. 437.
Legin, A., Rudnitskaya, A., Vlasov, Yu., Di Natale, C., Mazzone, E., and D'Amico, A. (1999b) Electroanalysis 11, 814.
Legin, A., Rudnitskaya, A., Vlasov, Yu., Di Natale, C., Mazzone, E., and D'Amico, A. (2000) Application of Electronic Tongue for Qualitative and Quantitative Analysis of Complex Liquid Media, Sensors and Actuators B 65, 232–234.
Legin, A., Smirnova, A., Rudnitskaya, A., Lvova, L., Suglobova, E., and Vlasov, Yu. (1999c) Chemical Sensor Array for Multicomponent Analysis of Biological Liquids, Analytica Chimica Acta 385, 131–135.
Lvova, L., Kim, S. S., Legin, A., Vlasov, Yu., Yang, J. S., Cha, G. S., and Nam, H. (2002) All Solid-state Electronic Tongue and its Application for Beverage Analysis, Analytica Chimica Acta 468, 303–314.
Lvova, L., Legin, A., Vlasov, Yu., Cha, G. S., and Nam, H. (2003) Multicomponent Analysis of Korean Green Tea by Means of Disposable All-solid-state Potentiometric Electronic Tongue Microsystem, Sensors and Actuators B 95, 391–399.
Lvova, L., Martinelli, E., Mazzone, E., Pede, A., Paolesse, R., Di Natale, C., and D'Amico, A. (in press a) Electronic Tongue Based on an Array of Metallic Potentiometric Sensors, Talanta.
Lvova, L., Paolesse, R., Di Natale, C., and D'Amico, A. (in press b) Detection of Alcohols in Beverages: An Application of Porphyrin-based Electronic Tongue, Sensors and Actuators B.
Lvova, L., Verrelli, G., Paolesse, R., Di Natale, C., and D'Amico, A. (2004) An Application of Porphyrin-based "Electronic Tongue" System for "Verdicchio" Wine Analysis, In Proceedings of Eurosensors XVIII, September 12–15, Rome, Italy, pp. 385–386.
Martens, H. and Naes, T. (1989) Multivariate Calibration, London, Wiley.
Martínez-Máñez, R., Soto, J., Garcia-Breijo, E., Gil, L., Ibáñez, J., and Gadea, E. (2005a) A Multisensor in Thick-film Technology for Water Quality Control, Sensors and Actuators A 120, 589–595.
Martínez-Máñez, R., Soto, J., Garcia-Breijo, E., Gil, L., Ibáñez, J., and Llobet, E. (2005b) An "Electronic Tongue" Design for the Qualitative Analysis of Natural Waters, Sensors and Actuators B 104, 302–307.
Martínez-Máñez, R., Soto, J., Gil, L., Garcia-Breijo, E., Ibáñez, J., Gadea, E., and Llobet, E. (2005c) Electronic Tongue for Quantitative Analysis of Water Using Thick-film Technology, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 136–137.
Massart, D. L., Vandeginste, B. G., Deming, S. N., Michotte, Y., and Kaufmann, L. (1988) Data Handling in Science and Technology. Vol. 2: Chemometrics: A Textbook, Amsterdam, The Netherlands, Elsevier.
Moreno, L., Bratov, A., Abramova, N., Jimenez, C., and Dominguez, C. (2005) Multi-sensor Array Used as an Electronic Tongue for Mineral Water Analysis, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 103–105.
Mortensen, J., Legin, A., Ipatov, A., Rudnitskaya, A., Vlasov, Yu., and Hjuler, K. (2000) A Flow Injection System Based on Chalcogenide Glass Sensors for the Determination of Heavy Metals, Analytica Chimica Acta 403, 273–277.
Nam, H., Cha, G. S., Jeon, Y. H., Kim, J. D., Seo, S. S., and Shim, J. H. (2005) Artificial Neural Network Analysis and Classification of Beverage Tastes with Solid-state Sensor Array, In Proceedings of Pittcon'05 Conference, Orlando, Florida, February 27–March 4, 2005, 2050–10.
Olsson, J., Winquist, F., and Lundström, I. (2005) A Self-polishing Electronic Tongue, In Proceedings of Eurosensors XIX, September 11–14, 2005, Barcelona, Spain, TA21.
Otto, M. and Thomas, J. D. R. (1985) Analytical Chemistry 57, 2647.
Paolesse, R., Di Natale, C., Burgio, M., Martinelli, E., Mazzone, E., Palleschi, G., and D'Amico, A. (2003) Porphyrin-based Array of Cross-selective Electrodes for Analysis of Liquid Samples, Sensors and Actuators B 95, 400–405.
Pearce, T. C., Schiffman, S. S., Nagle, H. T., and Gardner, J. W. (eds.) (2002) Handbook of Machine Olfaction: Electronic Nose Technology, New York, Wiley.
Riul Jr., A., dos Santos Jr., D. S., Wohnrath, K., Di Tommazo, R., Carvalho, A. C. P. L. F., Fonseca, F. J., Oliveira Jr., O. N., Taylor, D. M., and Mattoso, L. H. C. (2002) Artificial Taste Sensor: Efficient Combination of Sensors Made from Langmuir–Blodgett Films of Conducting Polymers and a Ruthenium Complex and Self-assembled Films of an Azobenzene-Containing Polymer, Langmuir 18, 239–245.
Riul Jr., A., Malmegrim, R. R., Fonseca, F. J., and Mattoso, L. H. C. (2003) Nano-Assembled Films for Taste Sensor Application, Artificial Organs 27, 469–472.
Rouessac, F. and Rouessac, A. (2002) Chemical Analysis. Modern Instrumental Methods and Techniques, New York, Wiley, 445 pp.
Rudnitskaya, A., Delgadillo, I., Legin, A., Rocha, S., da Costa, A.-M., and Simoes, T. (2005) Analysis of Port Wines Using the Electronic Tongue, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 178–179.
Rudnitskaya, A., Delgadillo, I., Rocha, S. M., Costa, A. M., and Legin, A. (2006) Quality Valuation of Cork from Quercus suber L. by the Electronic Tongue, Analytica Chimica Acta 563, 315–318.
Rudnitskaya, A., Ehlert, A., Legin, A., Vlasov, Yu., and Büttgenbach, S. (2001) Multisensor System on the Basis of an Array of Non-specific Chemical Sensors and Artificial Neural Networks for Determination of Inorganic Pollutants in a Model Groundwater, Talanta 55, 425–431.
Sakai, H., Iiyama, S., and Toko, K. (2000) Evaluation of Water Quality and Pollution Using Multichannel Sensors, Sensors and Actuators B 66, 251–255.
Sanz Alaejos, M. and Garcia Montelongo, F. J. (2004) Chemical Review 104, 3239–3265.
Sehra, G., Cole, M., and Gardner, J. W. (2004) Sensors and Actuators B 103, 233.
Söderström, C., Borén, H., Winquist, F., and Krantz-Rülcker, C. (2003) Use of an Electronic Tongue to Analyze Mold Growth in Liquid Media, International Journal of Food Microbiology 83, 253–261.
Söderström, C., Rudnitskaya, A., Legin, A., and Krantz-Rülcker, C. (2005) Differentiation of Four Aspergillus Species and One Zygosaccharomyces with Two Electronic Tongues Based on Different Measurement Techniques, Journal of Biotechnology 119, 300–308.
Söderström, C., Winquist, F., and Krantz-Rülcker, C. (2003) Recognition of Six Microbial Species with an Electronic Tongue, Sensors and Actuators B 89, 248–255.
Sohn, Y. S., Goodey, A., Anslyn, E. V., McDevitt, J. T., Shear, J. B., and Neikirk, D. P. (2005) Biosensors and Bioelectronics 21, 303–312.
Toko, K. (1996) Taste Sensor with Global Sensitivity, Materials Science and Engineering C 4, 69–82.
Toko, K. (2000) Sensors and Actuators B 64, 205.
Verrelli, G., Francioso, L., Paolesse, R., Siciliano, P., Di Natale, C., and D'Amico, A. (2005) Electronic Tongue Based on Silicon Miniaturized Potentiometric Sensors, In Proceedings of EUROSENSORS XIX, September 11–14, Barcelona, Spain, TA24.
Vlasov, Yu. and Legin, A. (1998) Fresenius Journal of Analytical Chemistry 361, 255.
Vlasov, Yu., Legin, A., and Rudnitskaya, A. (1997) Cross-sensitivity Evaluation of Chemical Sensors for Electronic Tongue: Determination of Heavy Metal Ions, Sensors and Actuators B 44, 532–537.
Wang, J. (2006) Analytical Electrochemistry, 3rd ed., New York, Wiley, 250 pp.
Winquist, F., Bjorklund, R., Krantz-Rülcker, C., Lundström, I., Östergren, K., and Skoglund, T. (2005) An Electronic Tongue in the Dairy Industry, Sensors and Actuators B 111/112, 299–304.
Winquist, F., Holmin, S., Krantz-Rülcker, C., Wide, P., and Lundström, I. (2000) A Hybrid Electronic Tongue, Analytica Chimica Acta 406, 147–157.
CHEMICAL IMAGES OF LIQUIDS
95
Winquist, F. and Lundstrm, I. (1997) An Electronic Tongue Based on Voltammetry, Analytica Chimica Acta 357, 21–31. Winquist, F., Lundstrm, I., and Wide, P. (1999) The Combination of an Electronic Tongue and an Electronic Nose, Sensors and Actuators B 58, 512–517. Wold, H. (1966) Estimation of Principal Components and Related Models by Iterative Least Squares. In P. R. Krishnaiaah (ed.), Multivariate Analysis, New York, Academic Press, pp. 391–420. Yoon, H. J., Shin, J. H., Lee, S. D., Nam, H., Cha, G. S., Strong, T. D., and Brown, R. B. (2000) Solid-state Ion Sensors with a Liquid Junction-free Polymer Membrane-based Reference Electrode for Blood Analysis, Sensors and Actuators B 64, 8–14. Yoshinobu, T., Iwasaki, H., Ui, Y., Furuichi, K., Ermolenko, Yu., Mourzina, Yu., Wagner, T., Nather, N., and Schning, M. J. (2005) The Light-addressable Potentiometric Sensor for Multi-ion Sensing and Imaging, Methods 37, 94–102. Zhang, C. and Suslick, K. S. (2005) Journal of American Chemical Society 127, 11548–11549. Hauptmann, P., Borngraeber, R., Schroeder, J., von-Guericke, O., and Auge, J. (2000) IEEE/EIA International Frequency Control Symposium and Exhibition, p. 22.
This page intentionally blank
SEQUENTIAL DETECTION ESTIMATION AND NOISE CANCELATION

E. J. Sullivan¹ ([email protected]) and J. V. Candy² ([email protected])

¹Prometheus Inc., Newport, Rhode Island 02840
²Lawrence Livermore National Laboratory, Livermore, California 94551
Abstract. A noise canceling technique based on a model-based recursive processor is presented. Beginning with a canceling scheme using a reference source, it is shown how to obtain an estimate of the noise, which can then be incorporated in a recursive noise canceler. Once this is done, recursive detection and estimation schemes are developed. The approach is to model the nonstationary noise as an autoregressive process, which can then easily be placed into a state-space canonical form. This results in a Kalman-type recursive processor where the measurement is the noise and the reference is the source term. Once the noise model is in state-space form, it is combined with the detection and estimation problems in a self-consistent structure. It is then shown how parameters of interest, such as a signal bearing or range, can be enhanced by including (augmenting) these parameters in the state vector, thereby jointly estimating them along with the recursive updating of the noise model.

Key words: sequential detection, Kalman filter, own-ship noise, noise canceling, recursive detection, nonstationary noise, Neyman–Pearson, likelihood ratio, signal model, noise model, Gauss–Markov
1. Introduction
In underwater acoustic signal processing, own-ship noise is a major problem plaguing the detection, classification, localization and tracking problems (Tolstoy and Clay, 1987; Clay and Medwin, 1977). For example, this noise is a major contributor to towed-array measurement uncertainties that can lead to large estimation errors in any form of signal processing problem aimed at extracting quiet target information. Many sonar processing approaches deal with this problem by relying on straightforward filtering techniques to remove these undesirable interferences, but at the cost of precious signal-to-noise ratio (SNR).
This chapter addresses the idea of noise cancelation by formulating it in terms of joint detection and estimation problems (Candy and Sullivan, 2005a,b). Since such problems are, in general, nonstationary in character, the solution must be adaptive, and therefore the approach taken here is to cast it as a recursive model-based problem. By model-based (Candy, 2006) we mean that as much a priori information as possible is included in the form of physical models, where the parameters of these models can be adaptively estimated. Such an approach leads naturally to state-space based recursive structures. In other words, we will be dealing with Kalman filter based processors. We begin in the next section with the development of a recursive detection structure. This is followed in Sections 3 and 4 with the development of the ship noise and signal models, respectively. Section 5 develops the detection/noise-cancelation scheme. Section 6 continues the development by showing how this structure can provide enhanced estimation. Finally, Section 7 contains a discussion.

2. Sequential Detection
The detector is configured as a binary hypothesis test based on the Neyman–Pearson criterion. Thus, the optimum solution is given by the likelihood ratio, i.e.,

$$L(t) = \frac{\Pr(P_t|H_1)}{\Pr(P_t|H_0)}, \qquad (1)$$
where $P_t$ is the set of time samples of the spatial vector of pressure measurements, i.e.,

$$P_t = \{p(1), p(2), \ldots, p(t)\} = \{P_{t-1}, p(t)\}, \qquad (2)$$
where $H_1$ and $H_0$, respectively, refer to the signal-present and null hypotheses, and $p(t)$ is the spatial vector of measurements at time $t$. Thus, using Bayes' rule, the probabilities in Equation (1) can be written as

$$\Pr(P_t|H_{1,0}) = \Pr(p(t)|P_{t-1}; H_{1,0})\,\Pr(P_{t-1}|H_{1,0}). \qquad (3)$$
Substitution into Equation (1) yields

$$L(t) = \frac{\Pr(p(t)|P_{t-1}; H_1)}{\Pr(p(t)|P_{t-1}; H_0)}\,\frac{\Pr(P_{t-1}|H_1)}{\Pr(P_{t-1}|H_0)}, \qquad (4)$$
but from Equation (1) we see that this can be written as

$$L(t) = L(t-1)\,\frac{\Pr(p(t)|P_{t-1}; H_1)}{\Pr(p(t)|P_{t-1}; H_0)}. \qquad (5)$$
Upon taking logarithms, we can now write the sequential log-likelihood ratio as

$$\Lambda(t) = \Lambda(t-1) + \ln \Pr(p(t)|P_{t-1}; H_1) - \ln \Pr(p(t)|P_{t-1}; H_0), \qquad (6)$$
which leads to the binary test implemented as

$$\begin{aligned}
\Lambda(t) &\ge \ln T_1 && \text{Accept } H_1\\
\ln T_0 < \Lambda(t) &< \ln T_1 && \text{Continue}\\
\Lambda(t) &\le \ln T_0 && \text{Accept } H_0
\end{aligned} \qquad (7)$$
Following Wald (1947), these thresholds are usually chosen to be

$$T_1 = \frac{\Pr_D}{\Pr_{FA}} = \frac{1 - \Pr_{miss}}{\Pr_{FA}} \qquad (8)$$

and

$$T_0 = \frac{1 - \Pr_D}{1 - \Pr_{FA}} = \frac{\Pr_{miss}}{1 - \Pr_{FA}}. \qquad (9)$$
In these thresholds, $\Pr_D$ and $\Pr_{FA}$ are the probabilities of detection and false alarm chosen by the user. Before we can continue with the detection problem, we must develop expressions for the relevant probability densities in Equation (6). In order to do this we must develop the models for the ship noise and the signal.
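To make the recursion of Equations (6)–(9) concrete, here is a minimal Python sketch of the sequential test; the Gaussian log-likelihoods in the usage example merely stand in for the model-based densities developed in the following sections, and all names (`wald_thresholds`, `sequential_test`) are illustrative rather than from the original chapter.

```python
import numpy as np

def wald_thresholds(pr_d, pr_fa):
    """Threshold pair of Equations (8)-(9): ln T0 (lower) and ln T1 (upper)."""
    t1 = pr_d / pr_fa                    # accept H1 when Lambda crosses ln T1
    t0 = (1.0 - pr_d) / (1.0 - pr_fa)    # accept H0 when Lambda falls below ln T0
    return np.log(t0), np.log(t1)

def sequential_test(samples, loglike_h1, loglike_h0, pr_d=0.95, pr_fa=0.05):
    """Sequential log-likelihood test, Equations (6)-(7)."""
    ln_t0, ln_t1 = wald_thresholds(pr_d, pr_fa)
    lam = 0.0
    for t, p in enumerate(samples, start=1):
        lam += loglike_h1(p) - loglike_h0(p)   # recursion of Equation (6)
        if lam >= ln_t1:
            return "H1", t
        if lam <= ln_t0:
            return "H0", t
    return "continue", len(samples)

# Toy usage: unit-variance Gaussian data, mean 0.5 under H1 and 0 under H0.
rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, 200)
print(sequential_test(data,
                      loglike_h1=lambda p: -0.5 * (p - 0.5) ** 2,
                      loglike_h0=lambda p: -0.5 * p ** 2))
# typically ("H1", t) after a handful of samples
```

Note the characteristic property of the sequential test: the number of samples needed for a decision is itself data dependent.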
3. Ship Noise Model
It can be shown (Candy, 2006; Widrow et al., 1975; Friedlander, 1982) that the optimum noise-canceling structure can be formulated as an identification problem with the noise reference input $z(t)$ related to the ship-noise estimate $\hat\eta(t)$ by a coloring filter $h(t)$, i.e.,

$$\hat\eta(t) = h(t) * z(t). \qquad (10)$$
Using a canonical form, Equation (10) can be written in Gauss–Markov (Jazwinski, 1970) form as

$$\eta(t) = \sum_{i=1}^{N_h} h_i\, z(t-i), \qquad (11)$$
with $z(t)$ being the reference. This leads to the state equation

$$\xi(t) = A_\xi\, \xi(t-1) + B_\xi\, z(t-1) + w_\xi, \qquad (12)$$
and the measurement equation

$$\eta(t) = C_\xi\, \xi(t) + v_\xi, \qquad (13)$$

where

$$A_\xi = \begin{bmatrix} 0_{1\times(N_h-1)} & 0 \\ I_{N_h-1} & 0_{(N_h-1)\times 1} \end{bmatrix}, \qquad B_\xi = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^T, \qquad (14)$$

and

$$C_\xi = \begin{bmatrix} h_1 & h_2 & \cdots & h_{N_h-1} & h_{N_h} \end{bmatrix}. \qquad (15)$$
The recursive solution is given by the following algorithm:

$$\begin{aligned}
\hat\xi(t|t-1) &= A_\xi\,\hat\xi(t-1|t-1) + B_\xi\, z(t-1) && \text{[Prediction]}\\
\hat\eta(t|t-1) &= C_\xi\,\hat\xi(t|t-1) && \text{[Measurement Prediction]}\\
\epsilon_\eta(t) &= \eta(t) - \hat\eta(t|t-1) && \text{[Innovation]}\\
R_{\epsilon_\eta\epsilon_\eta}(t|t-1) &= C_\xi\, \tilde P_{\xi\xi}(t|t-1)\, C_\xi^T + R_{\nu_\xi\nu_\xi} && \text{[Innovation Covariance]}\\
\hat\xi(t|t) &= \hat\xi(t|t-1) + K_\xi(t)\,\epsilon_\eta(t) && \text{[Correction]}\\
K_\xi(t) &= \tilde P_{\xi\xi}(t|t-1)\, C_\xi^T\, R_{\epsilon_\eta\epsilon_\eta}^{-1}(t|t-1) && \text{[Gain]}
\end{aligned} \qquad (16)$$
The $\tilde P_{\xi\xi}(t|t-1)$ term is the state error covariance, and is computed as an integral part of the Kalman algorithm. The notation in these equations has been generalized in order to explicitly indicate the predictive nature of the algorithm; that is, $\hat\xi(t|t-1)$ is the estimate of $\xi$ at time $t$ based on the data up to time $t-1$.
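A minimal Python sketch of the recursion (16) follows; the shift-register realization of $A_\xi$, the scalar reference, and the noise covariances `q_w` and `r_v` are illustrative assumptions, not values prescribed by the chapter.

```python
import numpy as np

def noise_canceler(eta, z, h, r_v=0.01, q_w=1e-4):
    """Recursive ship-noise estimator of Equations (12)-(16).

    eta : measured noise sequence; z : reference input;
    h   : coloring-filter coefficients (C_xi), length N_h."""
    n_h = len(h)
    A = np.zeros((n_h, n_h)); A[1:, :-1] = np.eye(n_h - 1)   # shift register A_xi
    B = np.zeros(n_h); B[0] = 1.0                            # B_xi = [1 0 ... 0]^T
    C = np.asarray(h, dtype=float)                           # C_xi holds the filter taps
    xi, P = np.zeros(n_h), np.eye(n_h)
    eta_hat = np.zeros(len(eta))
    for t in range(1, len(eta)):
        xi = A @ xi + B * z[t - 1]                 # prediction
        P = A @ P @ A.T + q_w * np.eye(n_h)
        eta_hat[t] = C @ xi                        # measurement prediction
        e = eta[t] - eta_hat[t]                    # innovation
        r_ee = C @ P @ C + r_v                     # innovation covariance (scalar)
        K = P @ C / r_ee                           # gain
        xi = xi + K * e                            # correction
        P = P - np.outer(K, C @ P)
    return eta_hat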
4. Signal Model
The measurement system is taken to be a towed array of $N$ receiver elements. The received pressure field at the array, as a function of space and time, is given by

$$p(x_n(t), t) = s(r_n(t), t) + \eta(x_n(t), t) + \nu(x_n(t), t), \qquad (17)$$
where $x_n$ is the spatial location of the $n$th receiver element on the array, $s$ is the source (target) signal, $\eta$ is the interfering own-ship noise and $\nu$ is the ambient noise. It is assumed that the signal is a weak narrow-band planar wave given by

$$s(x_n, t) = a_0\, e^{i(\omega_0 t - k[x_n(0) + vt]\sin\theta)}. \qquad (18)$$
The radian frequency at the source is $\omega_0$, $\theta$ is the bearing of the source, and $v$ is the speed of forward motion of the array; $k = \omega_0/c$ is the wavenumber, where $c$ is the speed of sound. Note that for this signal model, $x_n(t) = x_n(0) + vt$; that is, the array motion is modeled as a Galilean transformation on the receiver elements. The significance of this is that, in bearing estimation, the motion contributes information, contained in the Doppler, that enhances the estimate (Sullivan and Candy, 1997; Sullivan and Edelson, 2004). The broadband measurement noise is modeled as zero-mean, white Gaussian. Note that we are not restricting the statistics to be stationary, so we can accommodate the nonstationarities that occur naturally in the ocean environment.
5. Joint Detection/Noise Cancelation
We now have everything we need to construct our processor. Under the null hypothesis, we have that $\Pr(p(t)|P_{t-1}; H_0)$ is conditionally Gaussian, since $\eta$ is a consequence of the Gauss–Markov model of Equations (16). Thus

$$\Pr(p(t)|P_{t-1}; H_0) \sim N(\hat p(t|t-1),\, R_{\epsilon_p\epsilon_p}(t|t-1)). \qquad (19)$$

Here, $\hat p(t|t-1) = E\{p(t)|P_{t-1}; H_0\}$ is the conditional mean estimate at time $t$ based on the data up to time $t-1$, and $R_{\epsilon_p\epsilon_p}$ is the innovations covariance, both of which are available from the estimator of Equations (16). Including the ship noise term in the signal model, we see that for
the null hypothesis,

$$p_0(t) = \eta(t) + \nu(t), \qquad (20)$$

so that

$$\epsilon_{p_0}(t) = p_0(t) - \hat p(t|t-1) = [\eta(t) - \hat\eta(t|t-1)] + \nu(t), \qquad (21)$$

or

$$\epsilon_{p_0}(t) = \epsilon_\eta(t) + \nu(t), \qquad (22)$$

which has the corresponding covariance

$$R_{\epsilon_{p_0}\epsilon_{p_0}}(t) = R_{\epsilon_\eta\epsilon_\eta}(t) + R_{\nu\nu}(t). \qquad (23)$$
In the case of a deterministic signal, it now follows that for the alternate hypothesis

$$p_1(t) = s(t) + \eta(t) + \nu(t). \qquad (24)$$
From Equations (20) and (24), it now follows that the noise-canceled log-likelihoods are

$$\ln \Pr(p(t)|P_{t-1}; H_0) = [p_0(t) - \hat\eta(t)]^T R_{\epsilon_{p_0}\epsilon_{p_0}}^{-1}(t)\,[p_0(t) - \hat\eta(t)] \qquad (25)$$

and

$$\ln \Pr(p(t)|P_{t-1}; H_1) = [p_1(t) - \hat\eta(t)]^T R_{\epsilon_{p_0}\epsilon_{p_0}}^{-1}(t)\,[p_1(t) - \hat\eta(t)], \qquad (26)$$

where we have used the fact that

$$R_{\epsilon_{p_1}\epsilon_{p_1}}(t) = R_{\epsilon_{p_0}\epsilon_{p_0}}(t). \qquad (27)$$

Finally, upon substitution of Equations (25) and (26) into Equation (6), the recursive detection/cancelation processor is given by

$$\Lambda(t) = \Lambda(t-1) + [p_1(t) - \hat\eta(t)]^T R_{\epsilon_{p_0}\epsilon_{p_0}}^{-1}(t)\,[p_1(t) - \hat\eta(t)] - [p_0(t) - \hat\eta(t)]^T R_{\epsilon_{p_0}\epsilon_{p_0}}^{-1}(t)\,[p_0(t) - \hat\eta(t)]. \qquad (28)$$
6. Joint Estimation/Noise Cancelation
Consider a moving towed array of N elements. The signal at the nth element is given by Equation (18). It has been shown that by including the motion in this manner, the variance on the bearing estimate is reduced, as compared to that of the conventional (beamformer) estimator (Sullivan and Candy, 1997).
Here, we wish to further enhance the estimation results by incorporating the noise canceler into a recursive estimation scheme. As can be seen in Equation (18), there are three parameters in the signal model: the amplitude, the source frequency and the bearing. However, if we choose to work in the phase domain, the amplitude is eliminated and we are left with two parameters, $\omega_0$ and $\theta$. Including the ship noise, Equation (18) generalizes to¹

$$s(x_n, t) = a_0\, e^{i(\omega_0 t - k[x_n(0) + vt]\sin\theta)} + \eta(t). \qquad (29)$$

¹ Although this development is based on a narrow-band signal model, it can be easily generalized to accommodate a broadband model.
The hydrophone measurement is then

$$p_n = a_0\, e^{i(\omega_0 t - k[x_n(0) + vt]\sin\theta)} + \eta(t) + \nu(t), \qquad (30)$$
so that the canceled measurement for the Kalman filter is

$$p_n - \hat\eta(t) = a_0\, e^{i(\omega_0 t - k[x_n(0) + vt]\sin\theta)} + \nu(t). \qquad (31)$$
Although the source frequency appears to be a nuisance parameter, its inclusion is necessary in order to obtain the performance improvement from the bearing information contained in the Doppler. This is sometimes referred to as the passive synthetic aperture effect. Making the assumption that the bearing changes slowly in time, the recursive estimator can now be cast in the form of a Kalman filter with a "random walk" state equation (Sullivan and Candy, 1997). That is, with the parameter vector $\Theta := [\theta\;\; \omega_0]^T$,

$$\begin{bmatrix} \theta(t|t) \\ \omega_0(t|t) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \theta(t|t-1) \\ \omega_0(t|t-1) \end{bmatrix}. \qquad (32)$$

Since we wish to work in the phase domain, the measurement is based on the exponent of Equation (18), and is given by

$$y_n(\Theta) = (\omega_0/c)\,([x_n - x_{n-1}] + vt)\sin\theta, \qquad n = 1, 2, \ldots, N-1. \qquad (33)$$

Note that the $\omega_0 t$ term does not appear as it does in Equation (18). This is due to the fact that it is only the phase differences that are relevant to the problem; thus, there are $N-1$ phase-difference measurements for $N$ hydrophones. There is an auxiliary measurement equation that is based on the observed frequency. This is basically the Doppler relation
and is given by

$$y_N(\Theta) = \omega = \omega_0\,(1 + (v/c)\sin\theta), \qquad (34)$$
with $\omega$ being the observed radian frequency. Note that the measurement equations are nonlinear due to the appearance of the term $\omega_0 \sin\theta$. Thus, we will need to use the extended Kalman filter. The joint parametrically adaptive model-based processor (enhancer/estimator) is now given by

$$\begin{aligned}
\hat\xi(t|t-1) &= A_\xi\,\hat\xi(t-1|t-1) + B_\xi\, z(t-1) && \text{[Prediction]}\\
\hat\Theta(t|t-1) &= \hat\Theta(t-1|t-1) &&\\
\hat\eta(t|t-1) &= C_\xi\,\hat\xi(t|t-1) && \text{[Measurement Prediction]}\\
\hat y(t|t-1) &= c[\hat\Theta(t|t-1)] &&\\
\epsilon_\eta(t) &= \eta(t) - \hat\eta(t|t-1) && \text{[Innovation]}\\
\epsilon_y(t) &= y(t) - \hat y(t|t-1) - \hat\eta(t|t-1) &&\\
R_{\epsilon_\eta\epsilon_\eta}(t|t-1) &= C_\xi\,\tilde P_{\xi\xi}(t|t-1)\, C_\xi^T + R_{\nu\nu}(t) && \text{[Innovation Covariance]}\\
R_{\epsilon_y\epsilon_y}(t|t-1) &= J(t)\,\tilde P_{\Theta\Theta}(t|t-1)\, J^T(t) + R_{vv}(t) &&\\
\hat\xi(t|t) &= \hat\xi(t|t-1) + K_\xi(t)\,\epsilon_\eta(t) && \text{[Correction]}\\
\hat\Theta(t|t) &= \hat\Theta(t|t-1) + K_\Theta(t)\,\epsilon_y(t) &&\\
K_\xi(t) &= \tilde P_{\xi\xi}(t|t-1)\, C_\xi^T\, R_{\epsilon_\eta\epsilon_\eta}^{-1}(t) && \text{[Gain]}\\
K_\Theta(t) &= \tilde P_{\Theta\Theta}(t|t-1)\, J^T(t)\, R_{\epsilon_y\epsilon_y}^{-1}(t) &&
\end{aligned} \qquad (35)$$

The term $J(t)$ in Equations (35) is the Jacobian of the (nonlinear) measurements of Equations (33) and (34), which are collectively written as $c[\Theta]$ in the measurement prediction of Equations (35), i.e.,

$$J(t) = \frac{\partial c[\Theta]}{\partial \Theta}. \qquad (36)$$

That is, the nonlinearity of the measurement equations necessitates the use of the EKF (Candy, 2006), which in turn requires the Jacobian.
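For concreteness, here is a sketch of the stacked phase-domain measurement $c[\Theta]$ of Equations (33)–(34) and its Jacobian (36), as an EKF implementation would evaluate them; the element positions `x`, tow speed `v` and sound speed `c = 1500` m/s are illustrative assumptions.

```python
import numpy as np

def c_measure(theta, omega0, x, v, t, c=1500.0):
    """Stacked measurement c[Theta]: N-1 phase differences (33) plus Doppler (34)."""
    dx = np.diff(x)                                   # x_n - x_{n-1}
    y_phase = (omega0 / c) * (dx + v * t) * np.sin(theta)
    y_freq = omega0 * (1.0 + (v / c) * np.sin(theta))
    return np.append(y_phase, y_freq)

def jacobian(theta, omega0, x, v, t, c=1500.0):
    """J(t) = d c[Theta] / d [theta, omega0], Equation (36)."""
    dx = np.diff(x)
    d_theta = np.append((omega0 / c) * (dx + v * t) * np.cos(theta),
                        omega0 * (v / c) * np.cos(theta))
    d_omega0 = np.append((dx + v * t) * np.sin(theta) / c,
                         1.0 + (v / c) * np.sin(theta))
    return np.column_stack([d_theta, d_omega0])
```

The Jacobian is re-evaluated at the current parameter prediction $\hat\Theta(t|t-1)$ on every step, which is what makes the filter "extended" rather than linear.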
7. Discussion
We have shown theoretically that recursive model-based processing can provide adaptive processing schemes that are capable of enhancing both detection and estimation procedures through the proper application of
physical models. Here, we have used an all-pole model of own-ship noise and a recursive detector to enhance the detection of a signal. Further, we have shown how the use of such noise models, along with a realistic signal model, can enhance towed-array bearing estimation.

References

Candy, J. V. (2006) Model-Based Signal Processing, New York, John Wiley & Sons.
Candy, J. V. and Sullivan, E. J. (2005a) Joint Noise Canceling/Detection for a Towed Array in a Hostile Ocean Environment, presented at the International Conference on Underwater Acoustic Measurements, Heraklion, Crete, Greece, June 28–July 1.
Candy, J. V. and Sullivan, E. J. (2005b) Canceling Tow Ship Noise Using an Adaptive Model-based Approach, presented at the IEEE OES 8th Working Conference on Current Measurement Technology, Oceans 05 Europe, Southampton, UK, June 27–29.
Clay, C. S. and Medwin, H. (1977) Acoustical Oceanography, New York, John Wiley & Sons, pp. 110–134.
Friedlander, B. (1982) System Identification Techniques for Adaptive Noise Canceling, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-30(5), 699–709.
Jazwinski, A. (1970) Stochastic Processes and Filtering Theory, New York, Academic Press.
Sullivan, E. J. and Candy, J. V. (1997) Space-Time Array Processing: The Model-based Approach, Journal of the Acoustical Society of America 102(5), 2809–2820.
Sullivan, E. J. and Edelson, G. S. (2004) A Generalized Cramér–Rao Lower Bound for Line Arrays, presented at the 148th Meeting of the Acoustical Society of America, San Diego, CA, Nov. 15–19.
Tolstoy, I. and Clay, C. S. (1987) Ocean Acoustics: Theory and Experiment in Underwater Sound, Acoustical Society of America, pp. 3–12.
Wald, A. (1947) Sequential Analysis, New York, John Wiley & Sons.
Widrow, B., Glover, J. R., Jr., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., Zeidler, J. R., Dong, E., Jr., and Goodlin, R. C. (1975) Adaptive Noise Cancelling: Principles and Applications, Proceedings of the IEEE 63, 1692–1716.
IMAGE FUSION: A POWERFUL TOOL FOR OBJECT IDENTIFICATION

Filip Šroubek ([email protected]), Jan Flusser ([email protected]), and Barbara Zitová ([email protected])

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 4, Praha 8, 182 08, Czech Republic
Abstract. Due to imperfections of imaging devices (optical degradations, limited resolution of CCD sensors) and instability of the observed scene (object motion, media turbulence), acquired images are often blurred, noisy and may exhibit insufficient spatial and/or temporal resolution. Such images are not suitable for object detection and recognition. Reliable detection requires recovering the original image. If multiple images of the scene are available, this can be achieved by image fusion. In this chapter we review the respective methods of image fusion. We address all three major steps – image registration, blind deconvolution and resolution enhancement. Image registration brings the acquired images into spatial alignment, multiframe deconvolution estimates and removes the blur, and the spatial resolution of the image is increased by so-called superresolution fusion. Superresolution is the main topic of the chapter. We propose a unifying system that simultaneously estimates blurs and recovers the original undistorted image, all in high resolution, without any prior knowledge of the blurs and original image. We accomplish this by formulating the problem as constrained least squares energy minimization with appropriate regularization terms, which guarantees a close-to-perfect solution. We demonstrate the performance of the method on many examples, namely on car license plate recognition and face recognition. Both of these tasks are of great importance in security and surveillance systems.

Key words: image fusion, multichannel systems, blind deconvolution, superresolution, regularized energy minimization

1. Introduction

Imaging devices have limited achievable resolution due to many theoretical and practical restrictions. An original scene with a continuous intensity function o[x, y] warps at the camera lens because of the scene motion and/or
change of the camera position. In addition, several external effects blur images: atmospheric turbulence, camera lens, relative camera–scene motion, etc. We will call these effects volatile blurs to emphasize their unpredictable and transitory behavior, yet we will assume that we can model them as convolution with an unknown point spread function (PSF) v[x, y]. This is a reasonable assumption if the original scene is flat and perpendicular to the optical axis. Finally, the CCD discretizes the images and produces a digitized noisy image g[i, j] (frame). We refer to g[i, j] as a low-resolution (LR) image, since the spatial resolution is too low to capture all the details of the original scene. In conclusion, the acquisition model becomes

$$g[i,j] = D((v * o[W(n_1, n_2)])[x, y]) + n[i,j], \qquad (1)$$
where n[i, j] is additive noise and W denotes geometric deformation (spatial warping) of the image. Geometric deformations are partly caused by the fact that the image is a 2D projection of a 3D world, and partly by lens distortions and/or motion of the sensor during the acquisition. D(·) = S(g ∗ ·) is the decimation operator that models the function of the CCD sensors. It consists of convolution with the sensor PSF g[i, j] followed by the sampling operator S, which we define as multiplication by a sum of delta functions placed on an evenly spaced grid. The above model for one single observation g[i, j] is extremely ill-posed. Instead of taking a single image, we can take K (K > 1) images of the original scene and, in this way, partially overcome the equivocation of the problem. Hence we write

$$g_k[i,j] = D((v_k * o[W_k(n_1, n_2)])[x, y]) + n_k[i,j], \qquad (2)$$
where k = 1, ..., K and D remains the same in all the acquisitions. In the perspective of this multiframe model, the original scene o[x, y] is a single input and the acquired LR images g_k[i, j] are multiple outputs. The model is therefore called a single input multiple output (SIMO) formation model. To our knowledge, this is the most accurate, state-of-the-art model, as it takes all possible degradations into account. Because of the many unknown parameters of the model, it is hard to analyze (automatically or visually) the images g_k and to detect and recognize objects in them. A very powerful strategy is offered by image fusion. In general, the term fusion refers to an approach to information extraction that has been adopted in several domains. The goal of image fusion is to integrate complementary information from all frames into one new image containing information the quality of which cannot be achieved otherwise. Here, the term "better quality" means less blur and geometric distortion, less noise, and higher spatial resolution. We may expect that object detection and recognition will be easier and more reliable when performed on the fused image. Regardless of the particular fusion algorithm, it is unrealistic to assume that the
Figure 1. Image fusion in brief: acquired images (left), registered frames (middle), fused image (right).
fused image can recover the original scene o[x, y] exactly. A reasonable goal of the fusion is a discrete version of o[x, y] that has higher spatial resolution than the resolution of the LR images and that is free of the volatile blurs. In the sequel, we will refer to this fused image as a high-resolution (HR) image f [i, j]. Fusion of images acquired according to the model (2) is a three-stage process – it consists of image registration (spatial alignment), which should compensate for the geometric deformations W_k, followed by a multichannel (or multiframe) blind deconvolution (MBD) and superresolution (SR) fusion. The goal of MBD is to remove the impact of the volatile blurs and the aim of SR is to increase the spatial resolution of the fused image by a user-defined factor. While image registration is actually a separate procedure, we integrate both MBD and SR into a single step (see Figure 1), which we call blind superresolution (BSR). The approach presented in this chapter is one of the first attempts to solve BSR under realistic assumptions with only little a priori knowledge. Image registration is a very important step of image fusion, because all MBD and SR methods require either perfectly aligned channels (which is not realistic) or allow at most small shift differences. Thus, the role of registration methods is to suppress large and complex geometric distortions. Image registration in general is a process of transforming two or more images into a geometrically equivalent form. From the mathematical point of view, it consists of approximating W_k^{-1} and of resampling the image. For images which are not blurred, registration has been extensively studied in the recent literature (see Zitová and Flusser (2003) for a survey). However, blurred images require special registration techniques. They can be, as well as the general-purpose registration methods, divided into two groups – global and landmark-based ones. Regardless of the particular technique, all feature extraction methods,
similarity measures, and matching algorithms used in the registration process must be insensitive to image blurring. Global methods do not search for particular landmarks in the images. They try to estimate directly the between-channel translation and rotation. Myles and Lobo (1998) proposed an iterative method that works well if a good initial estimate of the transformation parameters is available. Zhang et al. (2000, 2002) proposed to estimate the registration parameters by bringing the channels into canonical form. Since blur-invariant moments were used to define the normalization constraints, neither the type nor the level of the blur influences the parameter estimation. Kubota et al. (1999) proposed a two-stage registration method based on hierarchical matching, where the amount of blur is considered as another parameter of the search space. Zhang and Blum (2001) proposed an iterative multiscale registration based on optical flow estimation at each scale, claiming that optical flow estimation is robust to image blurring. All global methods require considerable (or even complete) spatial overlap of the channels to yield reliable results, which is their major drawback. Landmark-based blur-invariant registration methods have appeared very recently, just after the first paper on the moment-based blur-invariant features (Flusser et al., 1996). Originally, these features could only be used for registration of mutually shifted images (Flusser and Suk, 1998; Bentoutou et al., 2002). The proposal of their rotational-invariant version (Flusser and Zitová, 1999), in combination with a robust detector of salient points (Zitová et al., 1999), led to registration methods that are able to handle blurred, shifted and rotated images (Flusser et al., 1999, 2003). Although the above-cited registration methods are very sophisticated and can be applied to almost all types of images, the results are rarely perfect. The registration error usually varies from subpixel values to a few pixels, so only MBD and SR methods sufficiently robust to between-channel misregistration can be applied to channel fusion. We will assume in the sequel that the LR images are roughly registered and that the W_k's reduce to small translations. During the last 20 years, blind deconvolution has attracted considerable attention as a separate image processing task. Initial blind deconvolution attempts were based on single-channel formulations, such as in Lagendijk et al. (1990), Reeves and Mersereau (1992), Chan and Wong (1998), and Haindl (2000). A good overview is in Kundur and Hatzinakos (1996a,b). The problem is extremely ill-posed in the single-channel framework and cannot be resolved in the fully blind form. These methods do not exploit the potential of multiframe imaging, because in the single-channel case the missing information about the original image in one channel cannot be supplemented by information obtained from the other channels. Research on intrinsically
multichannel methods has begun fairly recently; refer to Harikumar and Bresler (1999), Giannakis and Heath (2000), Pai and Bovik (2001), Panci et al. (2003), and Šroubek and Flusser (2003) for a survey and other references. Such MBD methods break the limitations of previous techniques and can recover the blurring functions from the degraded images alone. We further developed the MBD theory in Šroubek and Flusser (2005) by proposing a blind deconvolution method for images which might be mutually shifted by unknown vectors. A similar idea is used here as a part of the fusion algorithm to remove volatile blurs, and will be explained further in Section 3. Superresolution has been mentioned in the literature with an increasing frequency in the last decade. The first SR methods did not involve any deblurring; they just tried to register the LR images with subpixel accuracy and then to resample them on a high-resolution grid. A good survey of SR techniques can be found in Park et al. (2003) and Farsiu et al. (2004b). Maximum likelihood (ML), maximum a posteriori (MAP), the set theoretic approach using POCS (projection on convex sets), and fast Fourier techniques can all provide a solution to the SR problem. Earlier approaches assumed that subpixel shifts are estimated by other means. More advanced techniques, such as in Hardie et al. (1997), Segall et al. (2004), and Woods et al. (2006), include shift estimation in the SR process. Other approaches focus on fast implementation (Farsiu et al., 2004a), space–time SR (Shechtman et al., 2005) or SR of compressed video (Segall et al., 2004). Some of the recent SR methods consider image blurring and involve blur removal. Most of them assume only a priori known blurs. However, a few exceptions exist. Nguyen et al. (2001) and Woods et al. (2003) proposed BSR methods that can handle parametric PSFs with one parameter. This restriction is unfortunately very limiting for most real applications. Probably the first attempts at BSR with an arbitrary PSF appeared in Wirawan et al. (1999) and Yagle (2003), where polyphase decomposition of the images was employed. Current multiframe blind deconvolution techniques require no or very little prior information about the blurs, they are sufficiently robust to noise, and they provide satisfying results in most real applications. However, they can hardly cope with the downsampling operator, which violates the standard convolution model. On the contrary, state-of-the-art SR techniques achieve remarkable results in resolution enhancement in the case of no blur. They accurately estimate the subpixel shift between images but lack any apparatus for calculating the blurs. We propose a unifying method that simultaneously estimates the volatile blurs and the HR image without any prior knowledge of the blurs and the original image. We accomplish this by formulating the problem as a minimization of a regularized energy function, where the regularization is carried out in both the image and blur domains. Image regularization is based on variational
integrals, and a consequent anisotropic diffusion with good edge-preserving capabilities. A typical example of such regularization is total variation. However, the main contribution of this work lies in the development of the blur regularization term. We show that the blurs can be recovered from the LR images up to a small ambiguity. One can consider this as a generalization of the results proposed for blur estimation in the case of MBD problems. This fundamental observation enables us to build a simple regularization term for the blurs even in the case of the SR problem. To tackle the minimization task we use an alternating minimization approach, consisting of two simple linear equations. The rest of the chapter is organized as follows. Section 2 outlines the degradation model. In Section 3 we present a procedure for volatile blur estimation. This effortlessly blends in a regularization term of the BSR algorithm as described in Section 4. Finally, Section 5 illustrates the applicability of the proposed method to real situations.
2. Mathematical Model

To simplify the notation, we will assume only images and PSFs with square supports. An extension to rectangular images is straightforward. Let f[x, y] be an arbitrary discrete image of size F × F; then f denotes an image column vector of size F² × 1 and $C_A\{f\}$ denotes a matrix that performs convolution of f with an image of size A × A. The convolution matrix can have a different output size. Adopting the Matlab naming convention, we distinguish two cases: "full" convolution $C_A\{f\}$ of size (F + A − 1)² × A² and "valid" convolution $C^v_A\{f\}$ of size (F − A + 1)² × A². In both cases the convolution matrix is a Toeplitz-block-Toeplitz (TBT) matrix. In the sequel we will not specify the dimensions of convolution matrices if they are obvious from the size of the right argument. Let us assume we have K different LR frames {g_k} (each of size G × G) that represent degraded (blurred and noisy) versions of the original scene. Our goal is to estimate the HR representation of the original scene, which we denote as the HR image f of size F × F. The LR frames are linked with the HR image through a series of degradations similar to those between o[x, y] and g_k in (2). First f is geometrically warped (W_k), then it is convolved with a volatile PSF (V_k) and finally it is decimated (D). The formation of the LR images in vector-matrix notation is then described as

$$g_k = D V_k W_k f + n_k, \qquad (3)$$
where $n_k$ is additive noise present in every channel. The decimation matrix D = SU simulates the behavior of digital sensors by first performing
convolution with the U × U sensor PSF (U) and then downsampling (S). The Gaussian function is widely accepted as an appropriate sensor PSF and it is also used here. Its justification is experimentally verified in Capel (2004). A physical interpretation of the sensor blur is that the sensor is of finite size and integrates impinging light over its surface. The sensitivity of the sensor is highest in the middle and decreases towards its borders with Gaussian-like decay. Further, we assume that the subsampling factor (or SR factor, depending on the point of view), denoted by ε, is the same in both the x and y directions. It is important to underline that ε is a user-defined parameter. In principle, W_k can be a very complex geometric transform that must be estimated by image registration or motion detection techniques. We have to keep in mind that sub-pixel accuracy in the g_k's is necessary for SR to work. Standard image registration techniques can hardly achieve this and they leave a small misalignment behind. Therefore, we will assume that complex geometric transforms are removed in a preprocessing step and W_k reduces to a small translation. Hence V_k W_k = H_k, where H_k performs convolution with the shifted version of the volatile PSF v_k, and the acquisition model becomes

$$g_k = D H_k f + n_k = S U H_k f + n_k. \qquad (4)$$
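To make the acquisition model (4) concrete, the following Python sketch synthesizes LR frames from an HR image; the translations standing in for W_k, the supplied kernels, and the noise level are illustrative choices, not the chapter's data.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift
from scipy.signal import fftconvolve

def acquire_lr_frames(f_hr, shifts, psfs, eps=2, noise_std=0.01, seed=0):
    """Synthesize LR frames per (4): translate (W_k), blur (H_k), decimate (D = SU)."""
    rng = np.random.default_rng(seed)
    frames = []
    for t, v in zip(shifts, psfs):
        warped = shift(f_hr, t, mode="nearest")        # small translation W_k
        blurred = fftconvolve(warped, v, mode="same")  # volatile PSF v_k
        sensed = gaussian_filter(blurred, 0.34 * eps)  # Gaussian sensor PSF (U)
        lr = sensed[::eps, ::eps]                      # sampling (S)
        frames.append(lr + noise_std * rng.standard_normal(lr.shape))
    return frames
```

The sensor sigma is expressed here relative to the LR scale (hence the factor ε), matching the convention used later in the experimental section.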
The BSR problem then adopts the following form: we know the LR images {g_k} and we want to estimate the HR image f for the given S and the sensor blur U. To avoid boundary effects, we assume that each observation g_k captures only a part of f. Hence H_k and U are "valid" convolution matrices $C^v_F\{h_k\}$ and $C^v_{F-H+1}\{u\}$, respectively. In general, the PSFs h_k are of different sizes. However, we postulate that they all fit into an H × H support. In the case of ε = 1, the downsampling S is not present and we face a slightly modified MBD problem that has been solved elsewhere (Harikumar and Bresler, 1999; Šroubek and Flusser, 2005). Here we are interested in the case of ε > 1, when downsampling occurs. Can we estimate the blurs as in the case ε = 1? The presence of S prevents us from using the cited results directly. However, we will show that the conclusions obtained for MBD apply here in a slightly modified form as well.
3. Reconstruction of Volatile Blurs

Estimation of blurs in the MBD case (no downsampling) attracted considerable attention in the past. A wide variety of methods were proposed, such as in Harikumar and Bresler (1999) and Giannakis and Heath (2000), that provide a satisfactory solution. For these methods to work correctly, certain channel disparity is necessary. The disparity is defined as weak co-primeness of the channel blurs, which states that the blurs have no common factor except a
scalar constant. In other words, if the channel blurs can be expressed as a convolution of two subkernels, then there is no subkernel that is common to all blurs. An exact definition of weakly co-prime blurs can be found in Giannakis and Heath (2000). Many practical cases satisfy the channel co-primeness, since the necessary channel disparity is mostly guaranteed by the nature of the acquisition scheme and the random processes therein. We refer the reader to Harikumar and Bresler (1999) for a relevant discussion. This channel disparity is also necessary for the BSR case. Let us first recall how to estimate blurs in the MBD case, and then we will show how to generalize the results for integer downsampling factors. For the time being we will omit the noise term n, until Section 4, where we will address it appropriately.

3.1. THE MBD CASE
The downsampling matrix S is not present in (4) and only convolution binds the input with the outputs. The acquisition model is of the SIMO type with one input channel f and K output channels g_k. Under the assumption of channel co-primeness, we can see that any two correct blurs h_i and h_j satisfy

$$g_i * h_j - g_j * h_i = 0. \qquad (5)$$
Considering all possible pairs of blurs, we can arrange the above relations into one system

$$\mathcal{N} h = 0, \qquad (6)$$
where $h = [h_1^T, \ldots, h_K^T]^T$ and $\mathcal{N}$ consists of matrices that perform convolution with g_k. In most real situations the correct blur size (we have assumed square size H × H) is not known in advance, and therefore we can generate the above equation for different blur dimensions $\hat H_1 \times \hat H_2$. The nullity (null-space dimension) of $\mathcal{N}$ is exactly 1 for the correctly estimated blur size. By applying SVD (singular value decomposition), we recover precisely the blurs except for a scalar factor. One can eliminate this magnitude ambiguity by stipulating that $\sum_{x,y} h_k[x,y] = 1$, which is a common brightness-preserving assumption. For the underestimated blur size, the above equation has no solution. If the blur size is overestimated, then $\mathrm{nullity}(\mathcal{N}) = (\hat H_1 - H + 1)(\hat H_2 - H + 1)$.
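The null-space construction of (5)–(6) is easy to demonstrate numerically. The following toy sketch works in 1D so that scipy's `convolution_matrix` can build the system directly (the 2D TBT case is analogous); the sizes and random signals are illustrative.

```python
import numpy as np
from scipy.linalg import convolution_matrix

rng = np.random.default_rng(1)
f = rng.standard_normal(200)                  # unknown signal (1D stand-in for the image)
h1, h2 = rng.standard_normal(5), rng.standard_normal(5)
g1, g2 = np.convolve(f, h1), np.convolve(f, h2)

# Rows of N enforce g1 * h2 - g2 * h1 = 0 for the channel pair (1, 2), cf. (5)-(6).
C1 = convolution_matrix(g1, 5, mode="full")
C2 = convolution_matrix(g2, 5, mode="full")
N = np.hstack([-C2, C1])                      # N @ [h1; h2] = 0

_, s, vt = np.linalg.svd(N)
h_est = vt[-1]                                # null vector: the stacked blurs, up to scale
print(s[-1])                                  # ~0: nullity one for the correct size
print(h_est[:5] / h_est[0])                   # proportional to h1 / h1[0]
print(h1 / h1[0])
```

Normalizing the recovered vector so that its entries sum to one removes the scalar ambiguity, exactly as the brightness-preserving assumption above prescribes.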
3.2. THE BSR CASE

Before we proceed, it is necessary to define precisely the sampling matrix S. Let $S^1_\varepsilon$ denote a 1D sampling matrix, where ε is the integer subsampling factor. Each row of the sampling matrix is a unit vector whose nonzero element is at such a position that, if the matrix multiplies an arbitrary vector b, the result
of the product is every εth element of b starting from $b_1$. If the vector length is M, then the size of the sampling matrix is (M/ε) × M. If M is not divisible by ε, we can pad the vector with an appropriate number of zeros to make it divisible. A 2D sampling matrix is defined by

$$S_\varepsilon := S^1_\varepsilon \otimes S^1_\varepsilon, \qquad (7)$$

where ⊗ denotes the matrix direct product (Kronecker product operator). Note that the transposed matrix $(S_\varepsilon)^T$ behaves as an upsampling operator that interlaces the original samples with (ε − 1) zeros.
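A small numeric check of (7) and of the upsampling behavior just described (the sizes are illustrative):

```python
import numpy as np

def sampling_matrix_1d(m, eps):
    """S^1_eps: keeps every eps-th sample of a length-m vector (m divisible by eps)."""
    s = np.zeros((m // eps, m))
    s[np.arange(m // eps), np.arange(0, m, eps)] = 1.0
    return s

def sampling_matrix_2d(m, eps):
    """2D sampling matrix of (7): the Kronecker product S^1_eps x S^1_eps."""
    s1 = sampling_matrix_1d(m, eps)
    return np.kron(s1, s1)

S = sampling_matrix_2d(8, 2)              # acts on vectorized 8 x 8 images
x = np.arange(64.0)
print((S @ x).shape)                      # (16,): a vectorized 4 x 4 image
print((S.T @ (S @ x)).reshape(8, 8))      # S^T interlaces (eps - 1) zeros
```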
A naive approach, e.g., proposed in Šroubek and Flusser (2006) and Chen et al. (2005), is to modify (6) in the MBD case by applying downsampling and formulating the problem as

$$\min_h \|\mathcal{N}[I_K \otimes S_\varepsilon U]\, h\|^2, \qquad (8)$$
where $I_K$ is the K × K identity matrix. One can easily verify that the condition in (5) is not satisfied for the BSR case, as the presence of the downsampling operators violates the commutative property of convolution. Even more disturbing is the fact that minimizers of (8) do not have to correspond to the correct blurs. We are going to show that if one uses a slightly different approach, reconstruction of the volatile PSFs h_k is possible even in the BSR case. However, we will see that some ambiguity in the solution of h_k is inevitable. First, we need to rearrange the acquisition model (4) and construct from the LR images g_k a convolution matrix G with a predetermined nullity. Then we take the null space of G and construct a matrix $\mathcal{N}$, which will contain the correct PSFs h_k in its null space. Let E × E be the size of the "nullifying" filters. The meaning of this name will be clear later. Define $G := [G_1, \ldots, G_K]$, where $G_k := C^v_E\{g_k\}$ are "valid" convolution matrices. Assuming no noise, we can express G in terms of f, u and h_k as

$$G = S_\varepsilon F U H, \qquad (9)$$

where

$$H := [C_{\varepsilon E}\{h_1\}(S_\varepsilon)^T, \ldots, C_{\varepsilon E}\{h_K\}(S_\varepsilon)^T], \qquad (10)$$

$U := C_{\varepsilon E + H - 1}\{u\}$ and $F := C^v_{\varepsilon E + H + U - 2}\{f\}$. The convolution matrix U has more rows than columns and therefore it is of full column rank (see the proof in Harikumar and Bresler (1999) for general convolution matrices). We assume that $S_\varepsilon F$ has full column rank as well. This is almost certainly true for real images if F has at least ε²-times more rows than columns. Thus Null(G) ≡ Null(H) and the difference between the number of
columns and rows of H bounds from below the null-space dimension, i.e.,

$$\mathrm{nullity}(G) \ge K E^2 - (\varepsilon E + H - 1)^2. \qquad (11)$$
Setting $N := K E^2 - (\varepsilon E + H - 1)^2$ and $\mathbf{N} := \mathrm{Null}(G)$, we visualize the null space as

$$\mathbf{N} = \begin{bmatrix} n_{1,1} & \cdots & n_{1,N}\\ \vdots & \ddots & \vdots\\ n_{K,1} & \cdots & n_{K,N} \end{bmatrix}, \qquad (12)$$
where $n_{k,n}$ is the vector representation of the nullifying filter $\eta_{k,n}$ of size E × E, k = 1, ..., K and n = 1, ..., N. Let $\tilde\eta_{k,n}$ denote $\eta_{k,n}$ upsampled by the factor ε, i.e., $\tilde\eta_{k,n} := (S_\varepsilon)^T \eta_{k,n}$. Then we define

$$\mathcal{N} := \begin{bmatrix} C_H\{\tilde\eta_{1,1}\} & \cdots & C_H\{\tilde\eta_{K,1}\}\\ \vdots & \ddots & \vdots\\ C_H\{\tilde\eta_{1,N}\} & \cdots & C_H\{\tilde\eta_{K,N}\} \end{bmatrix} \qquad (13)$$

and conclude that

$$\mathcal{N} h = 0, \qquad (14)$$
where $h = [h_1^T, \ldots, h_K^T]^T$. We have arrived at an equation that is of the same form as (6) in the MBD case. Here we have the solution to the blur estimation problem for the BSR case. However, since $S_\varepsilon$ is involved, the ambiguity of the solution is higher. Without proofs, we provide the following statements. For the correct blur size, $\mathrm{nullity}(\mathcal{N}) = \varepsilon^4$. For the underestimated blur size, (14) has no solution. For the overestimated blur size $\hat H_1 \times \hat H_2$, $\mathrm{nullity}(\mathcal{N}) = \varepsilon^2(\hat H_1 - H + \varepsilon)(\hat H_2 - H + \varepsilon)$. The conclusion may seem to be pessimistic. For example, for ε = 2 the nullity is at least 16, and for ε = 3 the nullity is already 81. Nevertheless, Section 4 will show that $\mathcal{N}$ plays an important role in the regularized restoration algorithm and its ambiguity is not a serious drawback. It is interesting to note that a similar derivation is possible for rational SR factors ε = p/q. We downsample the LR images with the factor q, thereby creating q²K images, and apply thereon the above procedure for the SR factor p. Another consequence of the above derivation is the minimum necessary number of LR images for the blur reconstruction to work. The condition on the nullity of G in (11) implies that the minimum number is K > ε². For example, for ε = 3/2, 3 LR images are sufficient; for ε = 2, we need at least 5 LR images to perform blur reconstruction.
4. Blind Superresolution

In order to solve the BSR problem, i.e., determine the HR image f and the volatile PSFs h_k, we adopt a classical approach of minimizing a regularized energy function. This way the method will be less vulnerable to noise and better posed. The energy consists of three terms and takes the form

$$E(f, h) = \sum_{k=1}^{K} \|D H_k f - g_k\|^2 + \alpha\, Q(f) + \beta\, R(h). \qquad (15)$$
The first term measures the fidelity to the data and emanates from our acquisition model (4). The remaining two are regularization terms with positive weighting constants α and β that attract the minimum of E to an admissible set of solutions. The form of E very much resembles the energy proposed in Šroubek and Flusser (2005) for MBD. Indeed, this should not come as a surprise since MBD and SR are related problems in our formulation. Regularization Q(f) is a smoothing term of the form

$$Q(f) = f^T L f, \qquad (16)$$

where L is a high-pass filter. A common strategy is to use convolution with the Laplacian for L, which in the continuous case corresponds to $Q(f) = \int |\nabla f|^2$. Recently, variational integrals $Q(f) = \int \phi(|\nabla f|)$ were proposed, where φ is a strictly convex, nondecreasing function that grows at most linearly. Examples of φ(s) are s (total variation), $\sqrt{1+s^2}-1$ (hypersurface minimal function), log(cosh(s)), or nonconvex functions, such as log(1 + s²), s²/(1 + s²) and arctan(s²) (Mumford–Shah functional). The advantage of the variational approach is that, while in smooth areas it has the same isotropic behavior as the Laplacian, it also preserves edges in images. The disadvantage is that it is highly nonlinear. To overcome this difficulty one must use, e.g., the half-quadratic algorithm (Aubert and Kornprobst, 2002). For the purpose of our discussion it suffices to state that after discretization we arrive again at (16), where this time L is a positive semidefinite block tridiagonal matrix constructed of values depending on the gradient of f. The rationale behind the choice of Q(f) is to constrain the local spatial behavior of images; it resembles a Markov random field. Some global constraints may be more desirable but are difficult (often impossible) to define, since we develop a general method that should work with any class of images. The PSF regularization term R(h) directly follows from the conclusions of the previous section. Since the matrix $\mathcal{N}$ in (13) contains the correct PSFs h_k in its null space, we define the regularization term as a least-squares fit

$$R(h) = \|\mathcal{N} h\|^2 = h^T \mathcal{N}^T \mathcal{N} h. \qquad (17)$$
The product $\mathcal{N}^T \mathcal{N}$ is a positive semidefinite matrix. More precisely, R is a consistency term that binds the different volatile PSFs to prevent them from moving freely and, unlike the fidelity term (the first term in (15)), it is based solely on the observed LR images. A good practice is to include, with a small weight, a smoothing term $h^T L h$ in R(h). This is especially useful in the case of less noisy data, to overcome the higher nullity of $\mathcal{N}$. The complete energy then takes the form

$$E(f, h) = \sum_{k=1}^{K} \|D H_k f - g_k\|^2 + \alpha f^T L f + \beta_1 \|\mathcal{N} h\|^2 + \beta_2 h^T L h. \qquad (18)$$
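Before turning to the minimization, here is a minimal sketch of the smoothing term (16) with L realized as convolution with the discrete Laplacian; this is the simple quadratic variant, not the gradient-dependent matrix obtained from the half-quadratic scheme.

```python
import numpy as np
from scipy.signal import convolve2d

LAPLACIAN = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=float)

def q_smooth(f_img):
    """Q(f) = f^T L f with L realized as convolution with the Laplacian, cf. (16)."""
    return float(np.sum(f_img * convolve2d(f_img, LAPLACIAN, mode="same")))
```

With zero padding this quadratic form equals a sum of squared pixel differences and is therefore nonnegative, consistent with L being positive semidefinite.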
To find a minimizer of the energy function, we perform alternating minimizations (AM) of E over f and h. The advantage of this scheme lies in its simplicity. Each term of (18) is quadratic and therefore convex (but not necessarily strictly convex) and the derivatives w.r.t. f and h are easy to calculate. This AM approach is a variation on the steepest-descent algorithm. The search space is a concatenation of the blur subspace and the image subspace. The algorithm first descends in the image subspace and, after reaching the minimum, i.e., $\nabla_f E = 0$, it advances in the blur subspace in the direction $\nabla_h E$ orthogonal to the previous one, and this scheme repeats. In conclusion, starting with some initial $h^0$, the two iterative steps are:

step 1.
$$f^m = \arg\min_f E(f, h^m) \;\Leftrightarrow\; \Big(\sum_{k=1}^{K} H_k^T D^T D H_k + \alpha L\Big) f = \sum_{k=1}^{K} H_k^T D^T g_k, \qquad (19)$$

step 2.
$$h^{m+1} = \arg\min_h E(f^m, h) \;\Leftrightarrow\; \big([I_K \otimes F^T D^T D F] + \beta_1 \mathcal{N}^T \mathcal{N} + \beta_2 L\big)\, h = [I_K \otimes F^T D^T]\, g, \qquad (20)$$

where $F := C^v_H\{f\}$, $g := [g_1^T, \ldots, g_K^T]^T$ and m is the iteration step. Note that both steps consist of simple linear equations. Energy E as a function of both variables f and h is not convex due to the coupling of the variables via convolution in the first term of (18). Therefore, it is not guaranteed that the BSR algorithm reaches the global minimum. In our experience, convergence properties improve significantly if we add feasible regions for the HR image and PSFs, specified as lower and upper bound constraints. To solve step 1, we use the method of conjugate gradients (function cgs in Matlab) and then adjust the solution $f^m$ to contain values in the admissible range, typically the range of values of g. It is common to assume that the PSF is positive ($h_k \ge 0$) and that it preserves image brightness.
We can therefore write the lower and upper bound constraints for the PSFs as $h_k \in [0, 1]^{H^2}$. In order to enforce the bounds in step 2, we solve (20) as a constrained minimization problem (function fmincon in Matlab) rather than using the projection as in step 1. Constrained minimization problems are more computationally demanding, but we can afford it in this case since the size of h is much smaller than the size of f. The weighting constants α and β_i depend on the level of noise. If noise increases, α and β₂ should increase and β₁ should decrease. One can use parameter estimation techniques, such as cross-validation (Nguyen et al., 2001) or expectation maximization (Molina et al., 2003), to determine the correct weights. However, in our experiments we set the values manually according to a visual assessment. If the iterative algorithm begins to amplify noise, we have underestimated the noise level. On the contrary, if the algorithm begins to segment the image, we have overestimated the noise level.
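The following schematic Python sketch mirrors the AM loop, with scipy's `cgs` standing in for Matlab's cgs and simple projection used for both bound constraints (the chapter uses fmincon for the blur step); the functions assembling the normal equations (19)–(20) are assumed given.

```python
import numpy as np
from scipy.sparse.linalg import cgs

def bsr_am(f0, h0, image_system, blur_system, n_iter=10):
    """Alternating minimization of (18): step 1 solves (19), step 2 solves (20).

    image_system(h) -> (A, b): normal equations for f, Equation (19);
    blur_system(f)  -> (A, b): normal equations for h, Equation (20)."""
    f, h = f0.copy(), h0.copy()
    for _ in range(n_iter):
        A, b = image_system(h)                 # step 1, Equation (19)
        f, _ = cgs(A, b, x0=f, maxiter=50)     # conjugate gradients, as in the text
        f = np.clip(f, 0.0, 1.0)               # admissible image range (illustrative)
        A, b = blur_system(f)                  # step 2, Equation (20)
        h, _ = cgs(A, b, x0=h, maxiter=50)
        h = np.clip(h, 0.0, None)              # PSF positivity h_k >= 0
        h /= h.sum()                           # brightness-preserving normalization
    return f, h
```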
5. Experiments

This section consists of two parts. In the first one, a set of experiments on synthetic data evaluates the performance of the BSR algorithm with respect to the SR factor and compares the reconstruction quality with other methods. The second part demonstrates the applicability of the proposed method to real data. Results are not evaluated with any measure of reconstruction quality, such as mean-square error or peak signal-to-noise ratio. Instead we print the results and leave the comparison to the human eye, as we believe that in this case visual assessment is the only reasonable method. In all the experiments the sensor blur is fixed and set to a Gaussian function of standard deviation σ = 0.34 (relative to the scale of the LR images). One should underline that the proposed BSR method is fairly robust to the choice of the Gaussian variance, since it can compensate for insufficient variance by automatically including the missing factor of Gaussian functions in the volatile blurs. Another potential pitfall that we have to take into consideration is the feasible range of SR factors. Clearly, as the SR factor ε increases we need more LR images and the stability of BSR decreases. In addition, rational SR factors p/q, where p and q are incommensurable and large regardless of the effective value of ε, also make the BSR algorithm unstable. It is the numerator p that determines the internal SR factor used in the algorithm. Hence we limit ourselves to ε between 1 and 2.5, such as 3/2, 5/3, 2, etc., which is sufficient in most practical applications.
Figure 2. Simulated data: (a) original 150 × 230 image; (b) six 7 × 7 volatile PSFs used to blur the original image.
5.1. SIMULATED DATA
First, let us demonstrate the BSR performance with a simple experiment. The 150 × 230 image in Figure 2a, blurred with the six masks in Figure 2b and downsampled by a factor of 2, generated six LR images. In this case, registration is not necessary since the synthetic data are precisely aligned. Using the LR images as input, we estimated the original HR image with the proposed BSR algorithm for ε = 1.25 and 1.75. In Figure 3 one can compare the results, printed in their original size. The HR image for ε = 1.25 (Figure 3b) has improved significantly on the LR images due to deconvolution; however, some details on the column are still distorted. For the SR factor 1.75, the reconstructed image in Figure 3c is almost perfect.
Figure 3. BSR of simulated data: (a) one of six LR images with the downsampling factor 2; (b) BSR for ε = 1.25; (c) BSR for ε = 1.75.
In the case of BSR, Section 3 has shown that two distinct approaches exist for blur estimation. Either we use the naive approach in (8) that directly utilizes the MBD formulation, or we apply the intrinsically SR approach summarized in (14). Altogether we have thus four distinct methods for comparison: standard SR approach, MBD with interpolation, BSR with naive blur regularization and BSR with intrinsic blur regularization. Using the original image and PSFs in Figure 2, six LR images (see one LR image in Figure 3a) were generated as in the first experiment, only this time we added white Gaussian noise with SNR = 50 dB.1 Estimated HR images and volatile blurs for all four methods are in Figure 4. The standard SR approach in Figure 4a gives unsatisfactory results, since heavy blurring is present in the LR images and the method assumes only the sensor blur and no volatile blurs. (For this reason, we do not show volatile blurs in this case.) The MBD method in Figure 4b ignores the decimation operator and thus the estimated volatile blurs are similar to LR projections of the original blurs. Despite the fact that blind deconvolution in the first stage performed well, many details are still missing since interpolation in the second stage cannot properly recover high-frequency information. The signal-to-noise ratio is defined as SNR = 10 log(σ 2f /σn2 ), where σ f and σn are the image and noise standard deviations, respectively. 1
Figure 4. Comparison of four different SR approaches (ε = 2): (a) standard SR method, (b) MBD followed by bilinear interpolation, (c) naive BSR approach and (d) proposed intrinsic BSR approach. Volatile blurs estimated by each method, except in the case of standard SR, are in the top row. Due to blurring, the standard SR method in (a) failed to reconstruct the HR image. MBD in (b) provided a good estimate of the blurs at the LR scale and performed correct deconvolution, but the HR image lacks many details as simple interpolation increased resolution. Both BSR approaches in (c) and (d) gave close-to-perfect results. However, in the case of the naive approach, inaccurate blur regularization resulted in several artifacts in the HR image.
Both the naive and the intrinsic BSR methods outperformed the previous approaches, and the intrinsic one provides a close-to-perfect HR image. Due to the inaccurate regularization term in the naive approach, the estimated blurs contain tiny erroneous components that resulted in artifacts in the HR image (Figure 4c). However, the more strict and accurate regularization term in the case of the intrinsic BSR approach improved the results, which one can see in Figure 4d.

5.2. REAL DATA
The next two experiments demonstrate the true power of our fusion algorithm. We used real photos acquired with two different acquisition devices: a webcam and a standard digital camera. The webcam was a Logitech QuickCam for Notebooks Pro with maximum video resolution 640 × 480 and minimum shutter speed 1/10 s. The digital camera was a 5-Mpixel Olympus C5050Z equipped with 3× optical zoom. In both experiments we used cross-correlation to roughly register the LR images.
Figure 5. Reconstruction of images acquired with a webcam (ε = 2): (a) one of ten LR frames extracted from a short video sequence captured with the webcam, zero-order interpolation; (b) HR image and blurs estimated by the BSR algorithm. Note that many facial features, such as glasses, are not apparent in the LR image, but are well reconstructed in the HR image.
In the first experiment, we held the webcam in our hands and captured a short video sequence of a human face. Then we extracted 10 consecutive frames and considered a small section of size 40 × 50. One frame with zero-order interpolation is in Figure 5a; the other frames look similar. The long shutter speed (1/10 s), together with the inevitable motion of the hands, introduced blurring into the images. In this experiment, the SR factor was set to 2. The proposed BSR algorithm removed the blurring and performed SR correctly, as one can see in Figure 5b. Note that many facial features (eyes, glasses, mouth), indistinguishable in the original LR image, became visible in the HR image. The second experiment demonstrates a task of license plate recognition. With the digital camera we took eight photos, registered them with cross-correlation and cropped each to a 100 × 50 rectangle. All eight cuttings printed in their original size (no interpolation), including one image enlarged with zero-order interpolation, are in Figure 6a. As in the previous experiment, the camera was held in hands and, due to the longer shutter speed, the LR images exhibit subtle blurring. We set the SR factor to 5/3. In order to
Figure 6. Reconstruction of images acquired with a digital camera (ε = 5/3): (a) eight LR images, one enlarged with zero-order interpolation; (b) HR image estimated by the BSR algorithm; (c) image acquired with optical zoom 1.7×. The BSR algorithm achieved reconstruction comparable to the image with optical zoom.
better assess the obtained results, we took one additional image with optical zoom 1.7× (close to the desired SR factor 5/3). This image served as the ground truth; see Figure 6c. The proposed BSR method returned a well-reconstructed HR image (Figure 6b), which is comparable to the ground truth acquired with the optical zoom.

6. Conclusions

In this chapter we proposed a method for improving the visual quality and spatial resolution of digital images acquired by low-resolution sensors. The method is based on fusing several images (channels) of the same scene. It consists of three major steps – image registration, blind deconvolution and superresolution enhancement. We reviewed all three steps and paid special attention to superresolution fusion. We proposed a unifying system that simultaneously estimates image blurs and recovers the original undistorted image, all in high resolution, without any prior knowledge of the blurs and the original image. We accomplished this by formulating the problem as constrained least squares
energy minimization with appropriate regularization terms, which guarantees a close-to-perfect solution. By showing the good performance of the method on real data, we demonstrated its capability to improve image quality significantly and, consequently, to make the task of object detection and identification much easier for human observers as well as for automatic systems. We envisage the application of the proposed method in security and surveillance systems.
Acknowledgements

This work has been supported by the Czech Ministry of Education under project No. 1M0572 (Research Center DAR) and by the Grant Agency of the Czech Republic under project No. 102/04/0155.
NONLINEAR STATISTICAL SIGNAL PROCESSING: A PARTICLE FILTERING APPROACH
J. V. Candy ([email protected])
Lawrence Livermore National Laboratory & University of California, Santa Barbara
Livermore, CA 94551
Abstract. An introduction to particle filtering is presented, starting with an overview of Bayesian inference from batch to sequential processors. Once the evolving Bayesian paradigm is established, simulation-based methods using sampling theory and Monte Carlo realizations are discussed. Here the usual limitations of nonlinear approximations and non-Gaussian processes prevalent in classical nonlinear processing algorithms (e.g. Kalman filters) are no longer a restriction on performing Bayesian inference. It is shown how the underlying hidden or state variables are easily assimilated into this Bayesian construct. Importance sampling methods are then discussed, and it is shown how they can be extended to sequential solutions implemented using Markovian state-space models as a natural evolution. With this in mind, the idea of a particle filter, which is a discrete representation of a probability distribution, is developed, and it is shown how it can be implemented using sequential importance sampling/resampling methods. Finally, an application is briefly discussed comparing the performance of particle filter designs with classical nonlinear filter implementations.

Key words: particle filtering, Bayesian processing, Bayesian approach, simulation-based sampling, nonlinear signal processing
1. Introduction

In this chapter we develop the “Bayesian approach” to signal processing for a variety of useful model sets. It features the next generation of processor, recently enabled by the advent of high-speed/high-throughput computers. The emphasis is on nonlinear/non-Gaussian problems, but classical techniques are included as special cases to enable the reader familiar with such methods to draw a parallel between the approaches. The common ground is the model sets. Here the state-space approach is emphasized because of its inherent applicability to a wide variety of problems, both linear and nonlinear, as well as time-invariant and time-varying, including what
has become popularly termed “physics-based” models. Here we discuss the next generation of processors, which will clearly dominate the future of model-based signal processing for years to come (Candy, 2006). This chapter offers a unique perspective on signal processing from the Bayesian viewpoint, in contrast to the pure statistical approach. The underlying theme of this chapter is the Bayesian approach, which is uniformly developed and followed throughout.
2. Bayesian Approach to Signal Processing

In this section we motivate the idea of Bayesian estimation from the purely probabilistic perspective; that is, we do not consider underlying models at all, just densities and distributions. Modern statistical signal processing techniques evolve directly from a Bayesian perspective; that is, they are cast into a probabilistic framework using Bayes’ theorem as the fundamental construct. More specifically, the information about the random signal, x(t), required to solve a vast majority of estimation/processing problems is incorporated in the underlying probability distribution generating the process. For instance, the usual signal enhancement problem is concerned with providing the “best” (in some sense) estimate of the signal at time t based on all of the data available at that time. The filtering distribution provides that information directly in terms of its underlying statistics. That is, by calculating the statistics of the process directly from the filtering distribution, the enhanced signal can be extracted using a variety of estimators, such as the maximum a posteriori, maximum likelihood and minimum mean-squared error estimators, accompanied by a variety of performance statistics such as error covariances and bounds (Candy, 2006; Sullivan and Candy, 1997).

We cast this discussion into a dynamic variable/parameter structure by defining the “unobserved” signal or, equivalently, “hidden” variables as the set of N_x-vectors {x(t)}, t = 0, . . . , N. On the other hand, we define the observables or, equivalently, measurements as the set of N_y-vectors {y(t)}, t = 0, . . . , N, considered to be conditionally independent given the signal variables. The goal in recursive Bayesian estimation is to sequentially (in time) estimate the joint posterior distribution Pr(x(0), . . . , x(N)|y(0), . . . , y(N)). Once the posterior is estimated, many of the interesting statistics characterizing the process under investigation can be exploited to extract meaningful information. We start by defining two sets of random (vector) processes: X_t := {x(0), . . . , x(t)} and Y_t := {y(0), . . . , y(t)}. Here we can consider X_t to be the set of dynamic random variables or parameters of interest and Y_t as the
set of measurements or observations of the desired process.¹ In any case we start with Bayes’ theorem for the joint posterior distribution as

\[ \Pr(X_t|Y_t) = \frac{\Pr(Y_t|X_t) \times \Pr(X_t)}{\Pr(Y_t)}. \quad (1) \]

In Bayesian theory, the posterior defined by \(\Pr(X_t|Y_t)\) is decomposed in terms of the prior \(\Pr(X_t)\), its likelihood \(\Pr(Y_t|X_t)\) and the evidence or normalizing factor \(\Pr(Y_t)\). Each has a particular significance in this construct. It has been shown (Doucet et al., 2001) that the joint posterior distribution can be expressed sequentially as the joint sequential Bayesian posterior estimator

\[ \Pr(X_t|Y_t) = \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{\Pr(y(t)|Y_{t-1})} \times \Pr(X_{t-1}|Y_{t-1}). \quad (2) \]

This result is satisfying in the sense that we need only know the joint posterior distribution at the previous stage, t−1, scaled by a weighting function, to sequentially propagate the posterior to the next stage, that is,

\[ \underbrace{\Pr(X_t|Y_t)}_{\text{NEW}} = \underbrace{\mathcal{W}(t,t-1)}_{\text{WEIGHT}} \times \underbrace{\Pr(X_{t-1}|Y_{t-1})}_{\text{OLD}}, \quad (3) \]

where the weight is defined by

\[ \mathcal{W}(t,t-1) := \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{\Pr(y(t)|Y_{t-1})}. \]

Even though this expression provides the full joint posterior solution, it is not physically realizable unless the distributions are known in closed form and the underlying multiple integrals or sums can be determined analytically. In fact, a more useful solution is the marginal posterior distribution (Doucet et al., 2001) given by the update recursion² as

\[ \underbrace{\Pr(x(t)|Y_t)}_{\text{Posterior}} = \frac{\overbrace{\Pr(y(t)|x(t))}^{\text{Likelihood}} \times \overbrace{\Pr(x(t)|Y_{t-1})}^{\text{Prior}}}{\underbrace{\Pr(y(t)|Y_{t-1})}_{\text{Evidence}}}, \quad (4) \]

where we can consider the update or filtering distribution as a weighting of the prediction distribution, as in the full joint case above, that is,

\[ \underbrace{\Pr(x(t)|Y_t)}_{\text{UPDATE}} = \underbrace{\mathcal{W}_c(t,t-1)}_{\text{WEIGHT}} \times \underbrace{\Pr(x(t)|Y_{t-1})}_{\text{PREDICTION}}, \quad (5) \]

where the weight in this case is defined by

\[ \mathcal{W}_c(t,t-1) := \frac{\Pr(y(t)|x(t))}{\Pr(y(t)|Y_{t-1})}. \]

We summarize the sequential Bayesian processor in Table I. These two sequential relations form the theoretical foundation of many of the sequential particle filter designs. Next we consider the idea of Bayesian importance sampling.

TABLE I. Sequential Bayesian processor for the filtering posterior

Prediction: \( \Pr(x(t)|Y_{t-1}) = \int \Pr(x(t)|x(t-1)) \times \Pr(x(t-1)|Y_{t-1})\,dx(t-1) \)

Update/Posterior: \( \Pr(x(t)|Y_t) = \Pr(y(t)|x(t)) \times \Pr(x(t)|Y_{t-1}) \,/\, \Pr(y(t)|Y_{t-1}) \)

Initial conditions: \( x(0),\; P(0),\; \Pr(x(0)|Y_0) \)

¹ In Kalman filtering theory, the X_t are considered the states or hidden variables, not necessarily observable directly, while the Y_t are observed or measured directly.
² Note that this expression precisely satisfies Bayes’ rule, as illustrated in the equation.
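The recursion of Table I can be exercised numerically. The following Python sketch is an editor's illustration (it is not part of the original chapter): it implements the prediction and update steps on a discretized scalar state-space, where the Chapman–Kolmogorov integral becomes a sum. The Gaussian random-walk transition, the noise variances and the measurement values are arbitrary choices made only to keep the example self-contained.

```python
import numpy as np

# Grid-based sequential Bayesian filter implementing Table I.
grid = np.linspace(-10.0, 10.0, 201)           # discrete state grid
dx = grid[1] - grid[0]

def gauss(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Pr(x(t)|x(t-1)): rows index x(t), columns index x(t-1)
transition = gauss(grid[:, None], grid[None, :], 1.0)

posterior = gauss(grid, 0.0, 5.0)              # Pr(x(0)|Y_0), initial condition
posterior /= posterior.sum() * dx

for y in [0.3, -0.1, 0.8]:                     # a few synthetic measurements
    # Prediction (Chapman-Kolmogorov): the integral becomes a matrix-vector sum
    prior = transition @ posterior * dx        # Pr(x(t)|Y_{t-1})
    # Update: likelihood x prior / evidence
    likelihood = gauss(y, grid, 0.5)           # Pr(y(t)|x(t))
    unnorm = likelihood * prior
    posterior = unnorm / (unnorm.sum() * dx)   # divide by Pr(y(t)|Y_{t-1})

print("posterior mean:", np.sum(grid * posterior) * dx)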
3. Monte Carlo Approach

In signal processing, we are interested in some statistical measure of a random signal or parameter, usually expressed in terms of its moments. For example, suppose we have some signal function, say f(X), with respect to some underlying probabilistic distribution Pr(X); then a typical measure to seek is its performance “on the average,” which is characterized by the expectation

\[ E_X\{f(X)\} = \int f(X)\,\Pr(X)\,dX. \quad (6) \]

Instead of attempting to use numerical integration techniques, stochastic sampling techniques known as Monte Carlo (MC) integration have evolved as an alternative. The key idea embedded in the MC approach is to represent the required distribution as a set of random samples rather than a specific analytic function (e.g. a Gaussian). As the number of samples becomes large, they provide an equivalent representation of the distribution, enabling moments to be estimated directly.
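As a concrete preview of this idea (an editor's sketch, not from the chapter; the estimator is made precise in Equations 7–10 below), the fragment estimates \(E_X\{f(X)\}\) of Equation 6 by simple sample averaging. The Gaussian choice of Pr(X) and the test function f are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# MC integration of Equation 6: E_X{f(X)} = integral of f(X) Pr(X) dX.
# Pr(X) is taken as a standard Gaussian and f(X) = X^2, so the exact
# answer is 1.
f = lambda X: X**2
samples = rng.standard_normal(N)          # X(i) drawn from Pr(X)
print("MC estimate:", f(samples).mean())  # approximately 1.0
```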
MC integration draws samples from the required distribution and then forms sample averages to approximate the sought-after distributions; that is, it maps integrals to discrete sums. Thus, MC integration evaluates Equation 6 by drawing samples {X(i)} from Pr(X), with “→” defined as “drawn from”. Assuming perfect sampling, this produces the estimated or empirical distribution given by

\[ \hat{\Pr}(X) \approx \frac{1}{N}\sum_{i=1}^{N} \delta(X - X(i)), \]

which is a probability distribution of mass or weights 1/N and random variable or location X(i). Substituting the empirical distribution into the integral gives

\[ \hat{E}_X\{f(X)\} = \int f(X)\,\hat{\Pr}(X)\,dX \approx \frac{1}{N}\sum_{i=1}^{N} f(X(i)) \equiv \bar{f}, \quad (7) \]

which follows directly from the sifting property of the delta function. Here \(\bar{f}\) is said to be a MC estimate of \(E_X\{f(X)\}\).

A generalization of the MC approach is known as importance sampling, which evolves from

\[ I = \int_X g(x)\,dx = \int_X \frac{g(x)}{q(x)} \times q(x)\,dx \quad \text{for} \quad \int_X q(x)\,dx = 1. \quad (8) \]

Here q(x) is referred to as the sampling distribution or, more appropriately, the importance sampling distribution, since it samples the target distribution g(x) non-uniformly, giving “more importance” to some values of g(x) than others. We say that the support of q(x) covers that of g(x); that is, the samples drawn from q(·) overlap the same region (or more) corresponding to the samples of g(·). The integral in Equation 8 can be estimated by:

- Draw N samples from q(x): \( X(i) \to q(x) \) and \( \hat{q}(x) \approx \frac{1}{N}\sum_{i=1}^{N} \delta(x - X(i)) \);
- Compute the sample mean,

\[ I = E_q\left\{\frac{g(x)}{q(x)}\right\} \approx \int \frac{g(x)}{q(x)} \times \frac{1}{N}\sum_{i=1}^{N} \delta(x - X(i))\,dx = \frac{1}{N}\sum_{i=1}^{N} \frac{g(X(i))}{q(X(i))}. \]

Consider the case where we would like to estimate the expectation of the function of X given by f(X); then choosing an importance distribution q(x) that is similar to f(x), with covering support, gives the expectation estimator

\[ E_p\{f(x)\} = \int_X f(x) \times p(x)\,dx = \int_X f(x)\,\frac{p(x)}{q(x)} \times q(x)\,dx. \quad (9) \]

If we draw samples {X(i)}, i = 0, 1, . . . , N, from the importance distribution q(x) and compute the sample mean, then we obtain the importance sampling estimator

\[ E_p\{f(x)\} = \int_X f(x)\,\frac{p(x)}{q(x)} \times q(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} f(X(i))\,\frac{p(X(i))}{q(X(i))}, \quad (10) \]

demonstrating the concept. Note we are again assuming perfect (uniform) sampling with \( \hat{q}(x) \approx \frac{1}{N}\sum_{i=1}^{N}\delta(x - X(i)) \). The “art” in importance sampling is in choosing the importance distribution q(·) that approximates the target distribution p(·) as closely as possible. This is the principal factor affecting the performance of this approach, since variates must be drawn from q(x) that cover the target distribution.

Using the concepts of importance sampling, we can approximate the posterior distribution with a function on a finite discrete support. Since it is usually not possible to sample directly from the posterior, we use importance sampling coupled with an easy-to-sample proposal distribution, say q(X_t|Y_t); this is the crucial choice and design step required in Bayesian importance sampling methodology. Here X_t = {x(0), . . . , x(t)} represents the set of dynamic variables and Y_t = {y(0), . . . , y(t)} the set of measured data, as before. Therefore, starting with a function of the set of variables, say g(X_t), we would like to estimate its mean using the importance concept, that is,

\[ E\{g(X_t)\} = \int g(X_t) \times \Pr(X_t|Y_t)\,dX_t, \quad (11) \]

where Pr(X_t|Y_t) is the posterior distribution. Using the MC approach, we would like to sample from this posterior directly and then use sample statistics to perform the estimation. Therefore we insert the proposal importance distribution q(X_t|Y_t) as before:

\[ \hat{g}(t) := E\{g(X_t)\} = \int g(X_t)\,\frac{\Pr(X_t|Y_t)}{q(X_t|Y_t)} \times q(X_t|Y_t)\,dX_t. \quad (12) \]

Now applying Bayes’ rule to the posterior distribution and defining an unnormalized weighting function as

\[ W(t) := \frac{\Pr(X_t|Y_t)}{q(X_t|Y_t)} = \frac{\Pr(Y_t|X_t) \times \Pr(X_t)}{q(X_t|Y_t)} \quad (13) \]
and substituting gives

\[ \hat{g}(t) = \int g(X_t)\,\frac{W(t)}{\Pr(Y_t)} \times q(X_t|Y_t)\,dX_t. \quad (14) \]

The evidence or normalizing distribution, Pr(Y_t), is very difficult to estimate; however, it can be eliminated in this expression by first replacing it by the total probability and inserting the importance distribution to give

\[ \hat{g}(t) = \frac{E_q\{W(t) \times g(X_t)\}}{E_q\{W(t)\}}, \quad (15) \]

which is just a ratio of expectations with respect to the proposal importance distribution. Thus, drawing samples from the proposal, X_t(i) → X_t ∼ q(X_t|Y_t), and using the MC approach (integrals to sums) leads to the desired result. That is, from the “perfect” sampling distribution we have

\[ \hat{q}(X_t|Y_t) \approx \frac{1}{N}\sum_{i=1}^{N} \delta(X_t - X_t(i)), \quad (16) \]

and therefore, substituting, applying the sifting property of the Dirac delta function and defining the “normalized” weights by

\[ \mathcal{W}_i(t) := \frac{W_i(t)}{\sum_{i=1}^{N} W_i(t)} \quad \text{for} \quad W_i(t) = \frac{\Pr(Y_t|X_t(i)) \times \Pr(X_t(i))}{q(X_t(i)|Y_t)}, \quad (17) \]

we obtain the final estimate

\[ \hat{g}(t) \approx \sum_{i=1}^{N} \mathcal{W}_i(t) \times g(X_t(i)). \quad (18) \]

This importance estimator is biased, being the ratio of two sample estimators, but it can be shown that it asymptotically converges to the true statistic and that the central limit theorem holds (Tanner, 1993; West and Harrison, 1997; Liu, 2001). Thus as the number of samples increases (N → ∞), a reasonable estimate of the posterior is

\[ \hat{\Pr}(X_t|Y_t) \approx \sum_{i=1}^{N} \mathcal{W}_i(t) \times \delta(X_t - X_t(i)), \quad (19) \]

which is the goal of Bayesian estimation. This estimate provides a “batch” solution, but we must develop a sequential estimate from a more pragmatic perspective.

The importance distribution can be modified to enable a sequential estimation of the desired posterior distribution; that is, we estimate the posterior,
\(\hat{\Pr}(X_{t-1}|Y_{t-1})\), using the importance weights W(t−1). As a new sample becomes available, we estimate the new weights W(t), leading to an updated estimate of the posterior, \(\hat{\Pr}(X_t|Y_t)\). This means that in order to obtain the new set of samples, X_t(i) ∼ q(X_t|Y_t), sequentially, we must use the previous set of samples, X_{t−1}(i) ∼ q(X_{t−1}|Y_{t−1}). Thus, with this in mind, the importance distribution q(X_t|Y_t) must admit a marginal distribution q(X_{t−1}|Y_{t−1}), implying a Bayesian factorization

\[ q(X_t|Y_t) = q(X_{t-1}|Y_{t-1}) \times q(x(t)|X_{t-1}, Y_t). \quad (20) \]

This type of importance distribution leads to the desired sequential solution (Ristic et al., 2004). Recall the Bayesian solution to the batch posterior estimation problem (as before),

\[ \Pr(X_t|Y_t) = \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{\Pr(y(t)|Y_{t-1})} \times \Pr(X_{t-1}|Y_{t-1}), \]

and, recognizing the denominator as just the evidence or normalizing distribution and not a function of X_t, we have

\[ \Pr(X_t|Y_t) \propto \Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1)) \times \Pr(X_{t-1}|Y_{t-1}). \quad (21) \]

Substituting this expression for the posterior in the weight relation, we have

\[ W(t) = \frac{\Pr(X_t|Y_t)}{q(X_t|Y_t)} = \Pr(y(t)|x(t)) \times \frac{\Pr(x(t)|x(t-1))}{q(x(t)|X_{t-1}, Y_t)} \times \underbrace{\frac{\Pr(X_{t-1}|Y_{t-1})}{q(X_{t-1}|Y_{t-1})}}_{\text{Previous Weight}}, \quad (22) \]

which can be written as

\[ W(t) = W(t-1) \times \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{q(x(t)|X_{t-1}, Y_t)}, \quad (23) \]

giving us the desired relationship, a sequential updating of the weight at each time-step. These results then enable us to formulate a generic Bayesian sequential importance sampling algorithm:

1. Choose samples from the proposed importance distribution: x_i(t) ∼ q(x(t)|X_{t−1}, Y_t);
2. Determine the required conditional distributions: Pr(x_i(t)|x(t−1)), Pr(y(t)|x_i(t));
3. Calculate the unnormalized weights W_i(t) using Equation 23 with x(t) → x_i(t);
4. Normalize the weights, \(\mathcal{W}_i(t)\), as in Equation 17; and
5. Estimate the posterior distribution:

\[ \hat{\Pr}(X_t|Y_t) = \sum_{i=1}^{N} \mathcal{W}_i(t)\,\delta(x(t) - x_i(t)). \]
Once the posterior is estimated, the desired statistics follow directly. Next we consider using a model-based approach incorporating state-space models (Candy, 2006).
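A minimal Python rendering of steps 1–5 may help fix the bookkeeping. This is an editor's sketch, not code from the chapter: every distribution-evaluating callable (names and signatures included) is a hypothetical placeholder to be supplied by the user; the chapter prescribes only the mathematical recursion.

```python
import numpy as np

def sis_step(particles, weights, y, sample_proposal, trans_pdf, lik_pdf, prop_pdf):
    """One generic sequential importance sampling step (Equations 17 and 23).

    particles, weights: arrays for {x_i(t-1)} and normalized {W_i(t-1)}.
    """
    new = sample_proposal(particles, y)                     # step 1: x_i(t) ~ q(.)
    num = lik_pdf(y, new) * trans_pdf(new, particles)       # step 2: Pr(y|x_i) Pr(x_i|x(t-1))
    weights = weights * num / prop_pdf(new, particles, y)   # step 3: unnormalized W_i(t)
    weights = weights / weights.sum()                       # step 4: normalize
    return new, weights                                     # step 5: random measure = posterior
```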
4. Bayesian Approach to the State-Space

Bayesian estimation relative to state-space models is based on extracting the unobserved or hidden dynamic (state) variables from noisy measurement data. The Markovian state vector with initial distribution Pr(x(0)) propagates temporally throughout the state-space according to the probabilistic transition distribution Pr(x(t)|x(t−1)), while the conditionally independent measurements evolve from the likelihood distribution Pr(y(t)|x(t)). We see that the dynamic state variable at time t is obtained through the transition probability based on the previous state (Markovian property), x(t−1), and the knowledge of the underlying conditional probability. Once propagated to time t, the dynamic state variable is used to update or correct based on the likelihood probability and the new measurement, y(t). This evolutionary process is illustrated in Fig. 1. Note that it is the knowledge of these conditional distributions that enables the Bayesian processing. The usual model-based constructs of the dynamic state variables indicate that there is an equivalence between the probabilistic distributions and the underlying state/measurement transition models. The functional discrete state representation is given by

\[ x(t) = \mathcal{A}(x(t-1), u(t-1), w(t-1)) \]
\[ y(t) = \mathcal{C}(x(t), u(t), v(t)), \quad (24) \]

where w and v are the respective process and measurement noise sources, with u a known input. Here \(\mathcal{A}(\cdot)\) is the nonlinear (or linear) dynamic state transition function and \(\mathcal{C}(\cdot)\) the corresponding measurement function. Both conditional probabilistic distributions embedded within the Bayesian framework are completely specified by these functions and the underlying noise
Figure 1. Bayesian state-space probabilistic evolution.
distributions: Pr(w(t−1)) and Pr(v(t)). That is, we have the equivalence³

\[ \mathcal{A}(x(t-1), u(t-1), w(t-1)) \Rightarrow \Pr(x(t)|x(t-1)) \Leftrightarrow A(x(t)|x(t-1)) \]
\[ \mathcal{C}(x(t), u(t), v(t)) \Rightarrow \Pr(y(t)|x(t)) \Leftrightarrow C(y(t)|x(t)) \quad (25) \]

Thus, the state-space model, along with the noise statistics and prior distributions, defines the required Bayesian representation or probabilistic propagation model, describing the evolution of the states and measurements through the transition probabilities. This is a subtle point that must be emphasized, and it is illustrated in the diagram of Figure 1. Here the dynamic state variables propagate throughout the state-space specified by the transition probability \(A(x(t)|x(t-1))\) using the embedded process model. That is, the “unobserved” state at time t−1 depends on the transition probability distribution to propagate to the state at time t. Once evolved, the state combines with the corresponding measurement at time t through the conditional likelihood distribution \(C(y(t)|x(t))\), using the embedded measurement model, to obtain the required likelihood distribution. These events continue to evolve, with the states propagating through the state transition probability using the process model, and the measurements generated by the states and the likelihood using the measurement model. From the Bayesian perspective, the broad initial prior is scaled by the evidence and “narrowed” by the likelihood to estimate the posterior. With this in mind we can now return to the original Bayesian estimation problem, define it and show (at least conceptually) the solution.
³ We use this notation to emphasize the influence of both the process (A) and measurement (C) representations on the conditional distributions.
Using the state-space and measurement representation, the basic dynamic state estimation (signal enhancement) problem can now be stated in the Bayesian framework as:

GIVEN a set of noisy uncertain measurements, {y(t)}, and known inputs, {u(t)}, t = 0, . . . , N, along with the corresponding prior distributions for the initial state and the process and measurement noise sources, Pr(x(0)), Pr(w(t−1)), Pr(v(t)), as well as the conditional transition and likelihood probability distributions, Pr(x(t)|x(t−1)) and Pr(y(t)|x(t)), characterized by the state and measurement models A(x(t)|x(t−1)) and C(y(t)|x(t)), FIND the “best” (filtered) estimate of the state, x(t), say \(\hat{x}(t|t)\), based on all of the data up to and including t, Y_t; that is, find the best estimate of the filtering posterior, Pr(x(t)|Y_t), and its associated statistics.

Analytically, to generate the model-based version of the sequential Bayesian processor, we replace the transition and likelihood distributions with the conditionals of Equation 25. The solution to the signal enhancement or, equivalently, state estimation problem is given by the filtering distribution Pr(x(t)|Y_t), which was solved previously in Section 2 (see Table I). We start with the prediction recursion characterized by the Chapman–Kolmogorov equation, replacing the transition probability with the implied model-based conditional, that is,

\[ \Pr(x(t)|Y_{t-1}) = \int \underbrace{A(x(t)|x(t-1))}_{\text{Embedded Process Model}} \times \underbrace{\Pr(x(t-1)|Y_{t-1})}_{\text{Prior}} \, dx(t-1). \quad (26) \]

Next we incorporate the model-based likelihood into the posterior equation, with the understanding that the process model has been incorporated into the prediction:

\[ \Pr(x(t)|Y_t) = \underbrace{C(y(t)|x(t))}_{\text{Embedded Measurement Model}} \times \underbrace{\Pr(x(t)|Y_{t-1})}_{\text{Prediction}} \Big/ \Pr(y(t)|Y_{t-1}). \quad (27) \]

Thus, we see from the Bayesian perspective that the sequential Bayesian processor employing the state-space representation of Equation 24 is straightforward. Next let us investigate a simulation-based development of the processor – the particle filter.
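Before doing so, the equivalence of Equation 25 can be made concrete in code. The sketch below is an editor's illustration (not from the chapter): the scalar functions a, c and the variances are placeholders, chosen only to show how a state-space pair with Gaussian noise induces the transition and likelihood densities.

```python
import numpy as np

Rww, Rvv = 10.0, 1.0                          # assumed noise variances

def a(x_prev): return 0.9 * x_prev            # state transition function A(.)
def c(x):      return 0.05 * x**2             # measurement function C(.)

def gauss(z, mean, var):
    return np.exp(-0.5 * (z - mean)**2 / var) / np.sqrt(2 * np.pi * var)

# Equation 25: the models plus the noise densities determine the conditionals.
def A_pdf(x, x_prev):  return gauss(x, a(x_prev), Rww)   # Pr(x(t)|x(t-1))
def C_pdf(y, x):       return gauss(y, c(x), Rvv)        # Pr(y(t)|x(t))
```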
5. Bayesian Particle Filters

Particle filtering (PF) is a sequential MC method employing the recursive estimation of relevant probability distributions using the concepts of “importance sampling” and the approximation of distributions with discrete random measures (Ristic et al., 2004; Godsill and Djuric, 2002; Djuric et al., 2003; Haykin and de Freitas, 2004; Doucet and Wang, 2005). The key idea is to represent the required posterior distribution by a set of N_p random samples, the particles, with associated weights, {x_i(t), W_i(t)}, i = 1, . . . , N_p, and to compute the required MC estimates. Of course, as the number of samples becomes very large, the MC representation becomes an equivalent characterization of the analytical description of the posterior distribution. Thus, particle filtering is a technique to implement recursive Bayesian estimators by MC simulation. It is an alternative to approximate Kalman filtering for nonlinear problems (Candy, 2006; Sullivan and Candy, 1997; Ristic et al., 2004). In PF, continuous distributions are approximated by “discrete” random measures composed of these weighted particles or point masses, where the particles are actual samples of the unknown or hidden states from the state-space and the weights are the associated “probability masses” estimated using the Bayesian recursions, as shown in Fig. 2. From the figure we see that associated with each particle, x_i(t), is a corresponding weight or (probability)
Figure 2. Particle filter representation of posterior probability distribution in terms of weights (probabilities) and particles (samples).
mass, W_i(t). Therefore knowledge of this random measure, {x_i(t), W_i(t)}, characterizes the empirical posterior distribution – an estimate of the filtering posterior – that is,

\[ \hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} W_i(t)\,\delta(x(t) - x_i(t)) \]

at a particular instant of time t. Importance sampling plays a crucial role in state-space particle algorithm development. PF does not involve linearizations around current estimates, but rather approximations of the desired distributions by these discrete measures. In comparison, the Kalman filter recursively estimates the conditional mean and covariance, which can be used to characterize the filtering posterior Pr(x(t)|Y_t) under Gaussian assumptions (Candy, 2006).

In summary, particle filters are sequential MC based “point mass” representations of probability distributions. They require only a state-space representation of the underlying process to provide a set of particles that evolve at each time step, leading to an instantaneous approximation of the target posterior distribution of the state at time t given all of the data up to that time. Fig. 2 illustrates the evolution of the posterior at a particular time step. Here we see the estimated posterior based on 21 particles (non-uniformly spaced), and we select the 5th particle and weight to illustrate the instantaneous approximation at time t of x_i versus \(\hat{\Pr}(x(t)|Y_t)\). Statistics are calculated across the ensemble created over time to provide the inference estimates of the states. For example, the minimum mean-squared error (MMSE) estimate is easily determined by averaging over x_i(t), since

\[ \hat{x}_{\mathrm{MMSE}}(t) = \int x(t)\,\Pr(x(t)|Y_t)\,dx \approx \int x(t)\,\hat{\Pr}(x(t)|Y_t)\,dx = \int x(t) \sum_{i=1}^{N_p} W_i(t)\,\delta(x(t) - x_i(t))\,dx = \sum_{i=1}^{N_p} W_i(t)\,x_i(t), \]

while the maximum a posteriori (MAP) estimate is simply determined by finding the sample corresponding to the maximum weight of x_i(t) across the ensemble at each time step, that is,

\[ \hat{x}_{\mathrm{MAP}}(t) = \arg\max_{x_i} \hat{\Pr}(x(t)|Y_t). \quad (28) \]
The sequential importance sampling solution to the recursive Bayesian state estimation problem was given previously, starting with the recursive form for the importance distribution,

\[ q(X_t|Y_t) = q(X_{t-1}|Y_{t-1}) \times q(x(t)|X_{t-1}, Y_t), \]

and evolving to the recursive expression for the importance weights,

\[ W(t) = \frac{\Pr(X_t|Y_t)}{q(X_t|Y_t)} = \frac{\Pr(Y_t|X_t) \times \Pr(X_t)}{q(X_{t-1}|Y_{t-1}) \times q(x(t)|X_{t-1}, Y_t)} \]

\[ W(t) = W(t-1) \times \frac{\overbrace{\Pr(y(t)|x(t))}^{\text{Likelihood}} \times \overbrace{\Pr(x(t)|x(t-1))}^{\text{Transition}}}{q(x(t)|X_{t-1}, Y_t)}. \quad (29) \]
The state-space particle filter (SSPF) evolving from this sequential importance sampling construct follows directly after sampling from the importance distribution, that is,

\[ x_i(t) \to q(x(t)|x(t-1), y(t)) \]
\[ W_i(t) = W_i(t-1) \times \frac{C(y(t)|x_i(t)) \times A(x_i(t)|x_i(t-1))}{q(x_i(t)|x_i(t-1), y(t))} \quad (30) \]
\[ \mathcal{W}_i(t) = \frac{W_i(t)}{\sum_{i=1}^{N_p} W_i(t)}, \]
and the filtering posterior is estimated by

\[ \hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t) \times \delta(x(t) - x_i(t)). \quad (31) \]

Assuming that

\[ q(x(t)|X_{t-1}, Y_t) \to q(x(t)|x(t-1), y(t)), \quad (32) \]

the importance distribution is only dependent on [x(t−1), y(t)], which is common when performing filtering, Pr(x(t)|Y_t), at each instant of time. This completes the theoretical motivation of the state-space particle filter. Next we consider a pragmatic approach for implementation.
6. Bootstrap Particle Filter

In the previous section we developed the generic SSPF from the simulation-based sampling perspective. The basic design tool when developing these algorithms is the choice of the importance sampling distribution q(·). One of the most popular realizations of this approach uses the transition prior as the importance proposal (Doucet et al., 2001). This prior
is defined in terms of the state-space representation by \( A(x(t)|x(t-1)) \to \mathcal{A}(x(t-1), u(t-1), w(t-1)) \), which depends on the known excitation and the process noise statistics. It is given by

\[ q_{\text{prior}}(x(t)|x(t-1), Y_t) \to \Pr(x(t)|x(t-1)). \]

Substituting this choice into the expression for the weights gives

\[ W_i(t) = W_i(t-1) \times \frac{\Pr(y(t)|x_i(t)) \times \Pr(x(t)|x_i(t-1))}{q_{\text{prior}}(x(t)|x_i(t-1), Y_t)} = W_i(t-1) \times \Pr(y(t)|x_i(t)), \]

since the priors cancel. Note two properties of this choice of importance distribution. First, the weight does not use the most recent observation, y(t); second, this choice is easily implemented and updated by simply evaluating the measurement likelihood \( C(y(t)|x_i(t)) \), i = 1, . . . , N_p, for the sampled particle set. These weights require the particles to be propagated to time t before the weights can be calculated. This choice can lead to problems, since the transition prior is not conditioned on the measurement data, especially the most recent. Failing to incorporate the latest available information from the most recent measurement to propose new values for the states leads to only a few particles having significant weights when their likelihood is calculated. The transition prior is a much broader distribution than the likelihood, indicating that only a few particles will be assigned a large weight. Thus, the algorithm degenerates rapidly. The SSPF algorithm therefore takes the same generic form as before, with the importance weights much simpler to evaluate with this approach. It has been called the bootstrap PF, the condensation PF, or the survival-of-the-fittest algorithm (Doucet et al., 2001).

One of the major problems with importance sampling algorithms is the depletion of the particles; that is, they tend to increase in variance at each iteration. The degeneracy of the particle weights creates a problem that must be resolved before these particle algorithms can be of any pragmatic use in applications. The problem occurs because the variance of the importance weights can only increase in time (Doucet et al., 2001), thereby making it impossible to avoid this weight degradation. Degeneracy implies that a large computational effort is devoted to updating particles whose contribution to the posterior is negligible. Thus, there is a need to somehow resolve this problem to make the simulation-based techniques viable. This requirement leads to the idea of “resampling” the particles.

The main objective in simulation-based sampling techniques is to generate i.i.d. samples from the targeted posterior distribution in order to perform
statistical inferences, extracting the desired information. Thus, the importance weights are quite critical, since they contain probabilistic information about each specific particle. In fact, they provide us with information about “how probable a sample drawn from the target posterior is” (van der Merwe, 2004; Schoen, 2006). Therefore, the weights can be considered acceptance probabilities enabling us to generate (approximately) independent samples from the posterior, Pr(x(t)|Y_t). The empirical distribution \(\hat{\Pr}(x(t)|Y_t)\) is defined over a finite set of (N_p) random measures, {x_i(t), W_i(t)}, i = 1, . . . , N_p, approximating the posterior, that is,

\[ \hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - x_i(t)). \quad (33) \]

Resampling, therefore, can be thought of as a realization of enhanced particles, \(\hat{x}_k(t)\), extracted from the original samples, x_i(t), based on their “acceptance probability” \(\mathcal{W}_i(t)\) at time t; that is, statistically we have

\[ \Pr\{\hat{x}_k(t) = x_i(t)\} = \mathcal{W}_i(t) \quad \text{for } i = 1, \ldots, N_p, \quad (34) \]

or we write it symbolically as \(\hat{x}_k(t) \Rightarrow x_i(t)\), with the set of new particles, {\(\hat{x}_k(t)\)}, replacing the old set, {x_i(t)}. The fundamental concept in resampling theory is to preserve particles with large weights (large probabilities) while discarding those with small weights. Two steps must occur to resample effectively: (1) a decision, on a weight-by-weight basis, must be made to select the appropriate weights and reject the inappropriate ones; and (2) resampling must be performed to minimize the degeneracy. The overall strategy, when coupled with importance sampling, is termed sequential importance resampling (SIR) (Doucet et al., 2001). We illustrate the evolution of the particles through a variety of prediction–update time-steps in Fig. 3, where we see the evolution of each set of particles through prediction, resampling and updating. We summarize the bootstrap particle filter algorithm in Table II. This completes the algorithm; next we apply it to a standard problem.
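One common realization of Equation 34 is systematic resampling. The following Python sketch is an editor's illustration (the chapter does not commit to a particular resampling scheme); it keeps high-weight particles and discards low-weight ones, returning uniform weights afterwards.

```python
import numpy as np

def systematic_resample(particles, weights, rng):
    """Resample per Equation 34: draw each new particle x^_k = x_i
    with probability W_i, using stratified (systematic) positions."""
    Np = len(weights)
    positions = (rng.random() + np.arange(Np)) / Np     # one draw per stratum
    cumsum = np.cumsum(weights)
    idx = np.searchsorted(cumsum, positions)            # invert the weight CDF
    return particles[idx], np.full(Np, 1.0 / Np)        # uniform weights after
```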
7. Example: Nonlinear Non-Gaussian Prediction

We consider a well-known problem that has become a benchmark for many of the PF algorithms. It is highly nonlinear, non-Gaussian and nonstationary,
Figure 3. Evolution of particle filter weights and particles using the sequential state-space SIR algorithm: resampling, propagation (state-space transition model), update (state-space measurement likelihood), resampling, and so on.
and evolves from studies of population growth (Doucet et al., 2001). The state transition and corresponding measurement model are given by

\[ x(t) = \frac{1}{2}x(t-1) + \frac{25x(t-1)}{1 + x^2(t-1)} + 8\cos(1.2(t-1)) + w(t-1) \]
\[ y(t) = \frac{x^2(t)}{20} + v(t), \]

where Δt = 1.0, w ∼ N(0, 10) and v ∼ N(0, 1). The initial state is Gaussian distributed with x(0) ∼ N(0.1, 5). In terms of the state-space representation, we have

\[ a[x(t-1)] = \frac{1}{2}x(t-1) + \frac{25x(t-1)}{1 + x^2(t-1)} \]
\[ b[u(t-1)] = 8\cos(1.2(t-1)) \]
\[ c[x(t)] = \frac{x^2(t)}{20}. \]
TABLE II. Bootstrap SIR state-space particle filtering algorithm

INITIALIZE: \( x_i(0) \to \Pr(x(0)),\; W_i(0) = 1/N_p,\; i = 1, \ldots, N_p \)  [sample]

IMPORTANCE SAMPLING: \( x_i(t) \sim A(x(t)|x_i(t-1)),\; w_i \sim \Pr(w_i(t)) \)  [state transition]

Weight update: \( W_i(t) = W_i(t-1) \times C(y(t)|x_i(t)) \)  [weights]

Weight normalization: \( \mathcal{W}_i(t) = W_i(t) \,\big/ \sum_{i=1}^{N_p} W_i(t) \)

RESAMPLING: \( \hat{x}_i(t) \Rightarrow x_j(t) \)

DISTRIBUTION: \( \hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - \hat{x}_i(t)) \)  [posterior distribution]

STATE ESTIMATION: \( \hat{x}(t|t) = E\{x(t)|Y_t\} \approx \frac{1}{N_p}\sum_{i=1}^{N_p} \hat{x}_i(t) \)  [conditional mean]; \( \hat{X}_{\mathrm{MAP}}(t) = \max \hat{\Pr}(x(t)|Y_t) \)  [MAP]
In the Bayesian framework, we would like to estimate the instantaneous posterior filtering distribution,

\[ \hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i\,\delta(x(t) - x_i(t)), \quad (35) \]

where the unnormalized importance weight is given by

\[ W_i(t) = W_i(t-1) \times \frac{C(y(t)|x(t)) \times A(x(t)|x(t-1))}{q(x(t)|X_{t-1}, Y_t)}. \quad (36) \]

The weight recursion for the bootstrap case is \( W_i(t) = W_i(t-1) \times C(y(t)|x(t)) \). Therefore, for our problem, the Bayesian processor has its state transition probability given by

\[ \Pr(x(t)|x(t-1)) \to A(x(t)|x(t-1)) \sim \mathcal{N}(x(t) : a[x(t-1)], R_{ww}). \quad (37) \]

Thus, the SIR algorithm becomes:
- Draw samples (particles) from the state transition distribution:
\[ x_i(t) \to \mathcal{N}(x(t) : a[x(t-1)], R_{ww}), \qquad w_i(t) \to \Pr(w(t)) \sim \mathcal{N}(0, R_{ww}), \]
\[ x_i(t) = \frac{1}{2}x_i(t-1) + \frac{25x_i(t-1)}{1 + x_i^2(t-1)} + 8\cos(1.2(t-1)) + w_i(t-1); \]
- Estimate the likelihood, \( C(y(t)|x(t)) \to \mathcal{N}(y(t) : c[x(t)], R_{vv}(t)) \), with
\[ c[x_i(t)] = \frac{x_i^2(t)}{20}; \]
- Update and normalize the weights: \( \mathcal{W}_i(t) = W_i(t) \,\big/ \sum_{i=1}^{N_p} W_i(t) \);
- Resample: \( \hat{x}_i(t) \Rightarrow x_j(t) \);
- Estimate the instantaneous posterior:
\[ \hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i\,\delta(x(t) - \hat{x}_i(t)); \]
- Estimate the corresponding statistics:
\[ \hat{X}_{\mathrm{MAP}}(t) = \arg\max \hat{\Pr}(x(t)|Y_t), \]
\[ \hat{X}_{\mathrm{MMSE}}(t) = E\{x(t)|Y_t\} \approx \frac{1}{N_p}\sum_{i=1}^{N_p} \hat{x}_i(t), \]
\[ \hat{X}_{\mathrm{median}}(t) = \mathrm{median}\,\hat{\Pr}(x(t)|Y_t). \]

We show the simulated data in Fig. 4. In (a) we see the hidden state and in (b) the noisy measurement. The estimated instantaneous posterior distribution surface for the state is shown in Fig. 5a, while slices at selected instants of time are shown in (b). Here we see that the posterior is clearly not unimodal, and in fact we can see its evolution in time as suggested by Fig. 2 previously. The final state and measurement estimates are shown in Fig. 4b, demonstrating the effectiveness of the bootstrap PF processor for this problem. Various ensemble estimates are shown (e.g. median, MMSE, MAP). It is clear that the extended Kalman filter gives a poor MMSE estimate, since the posterior is not Gaussian (unimodal).
Figure 4. Population growth problem: (a) simulated state with mean; (b) simulated measurement with mean; (c) ensemble of state estimates: median, EKF, MMSE, MAP; (d) ensemble of measurement estimates: median, EKF, MMSE, MAP.
Figure 5. Population growth problem: (a) instantaneous posterior surface. (b) time slices of the posterior (cross-section) at selected time-steps.
8. Summary

In this chapter we have provided an overview of nonlinear statistical signal processing based on the Bayesian paradigm. We have shown that the next generation of processors is well founded on MC simulation-based sampling
techniques. We reviewed the development of the sequential Bayesian processor using the state-space models. The popular bootstrap algorithm was outlined and applied to a standard problem used within the community to test the performance of a variety of PF techniques.
References

Candy, J. V. (2006) Model-Based Signal Processing, New Jersey, John Wiley.
Djuric, P., Kotecha, J., Zhang, J., Huang, Y., Ghirmai, T., Bugallo, M., and Miguez, J. (2003) Particle Filtering, IEEE Signal Processing Magazine 20(5), 19–38.
Doucet, A., de Freitas, N., and Gordon, N. (2001) Sequential Monte Carlo Methods in Practice, New York, Springer-Verlag.
Doucet, A. and Wang, X. (2005) Monte Carlo Methods for Signal Processing, IEEE Signal Processing Magazine 24(5), 152–170.
Godsill, S. and Djuric, P. (2002) Special Issue: Monte Carlo Methods for Statistical Signal Processing, IEEE Transactions on Signal Processing 50, 173–449.
Haykin, S. and de Freitas, N. (2004) Special Issue: Sequential State Estimation: From Kalman Filters to Particle Filters, Proceedings of the IEEE 92(3), 399–574.
Liu, J. (2001) Monte Carlo Strategies in Scientific Computing, New York, Springer-Verlag.
Ristic, B., Arulampalam, S., and Gordon, N. (2004) Beyond the Kalman Filter: Particle Filters for Tracking Applications, Boston, Artech House.
Schoen, T. (2006) Estimation of Nonlinear Dynamic Systems: Theory and Applications, PhD Dissertation, Linköping University, Linköping, Sweden.
Sullivan, E. J. and Candy, J. V. (1997) Space-Time Array Processing: The Model-Based Approach, Journal of the Acoustical Society of America 102(5), 2809–2820.
Tanner, M. (1993) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 2nd ed., New York, Springer-Verlag.
van der Merwe, R. (2004) Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, PhD Dissertation, OGI School of Science & Engineering, Oregon Health & Science University.
West, M. and Harrison, J. (1997) Bayesian Forecasting and Dynamic Models, 2nd ed., New York, Springer-Verlag.
POSSIBILITIES FOR QUANTUM INFORMATION PROCESSING*

Štěpán Holub
Charles University, Prague, Czech Republic
Abstract. This tutorial introduces quantum information science, a quickly developing area of multidisciplinary research. The basic idea of the field is to study physical systems obeying the laws of quantum mechanics in order to obtain promising possibilities for computation, image processing, data transfer and/or high-precision detection. Quantum information science is an intersection of mathematics, physics and informatics. We explain the basic ingredients that contribute to the general picture. Although we also mention some up-to-date experimental results, the focus will be on the theoretical description of quantum data transfer and a possible quantum computer.

Key words: quantum mechanics, quantum information, quantum computer, Mach-Zehnder interferometer
1. Introduction

Automatic processing of information is one of the most striking features of our present civilization. The origins of what we call a “computer” date back to the first half of the twentieth century, when Alan Turing gave an ingenious mathematical description of a general algorithmic process (Turing, 1936). All our computers are in fact a kind of physical realization of the old-fashioned device known as a Turing machine. From the mathematical point of view computers do not evolve; they are just faster and faster, due to enormous technological improvement, which has obeyed, since 1965, the famous prediction of Gordon Moore: the power of computer processing doubles every eighteen months. There are, however, ultimate physical restrictions for classical computers, given by the speed of light (electromagnetic field) and the size of atoms. At the sub-atomic level, the laws of quantum physics begin to predominate, which profoundly challenges the classical approach to information. Quantum
* The work is a part of the research project MSM 0021620839 financed by MSMT.
mechanics, on the other hand, is not only a threat. More importantly, it offers possibilities which give rise to a new and fascinating field of research called quantum information science. The mathematical notion of information is usually based on symbols chosen from a finite alphabet, typically the famous zeros and ones. The physical part of the question is how to represent these symbols: which states of which physical system should distinguish between them. If the information is “encoded” by a quantum system, the situation seems to be quite analogous. The crucial difference, however, is the fact that quantum systems can exist in superpositions of basic states, a strange kind of mixture of them. Quantum information science is basically an attempt to take advantage of this property.
2. The Difference Between Classical and Quantum Physics

2.1. MACH-ZEHNDER INTERFEROMETER
The difference between classical and quantum physics can be illustrated by a simple experiment called the Mach-Zehnder interferometer. The device is depicted in Figure 1. A and B are half-silvered mirrors which split a beam of light in two parts; hence they are called beamsplitters. Half of the beam goes through the mirror, the second half is reflected. U and L are just ordinary mirrors. By the laws of optics, light changes its phase by π if reflected by a surface bounding an environment with a higher refractive index (i.e., a lower speed of light). Therefore, in our picture the only reflection not causing the phase shift is the reflection of the beam arriving from mirror U to mirror B, since the
Figure 1. Mach-Zehnder interferometer.
reflection occurs between glass (foreground) and air (background). (The phase shift caused by the transition through the glass is neglected in our description.) Numbers 1 to 4 refer to possible positions of detectors. If we measure at positions 1 and 2, the same energy is detected by each detector. Measurement by detectors 3 and 4 (with detectors 1 and 2 removed) shows that there is no signal on detector 3, only on 4. The explanation is simple: the beam traveling to 3 through U is shifted by π with respect to the beam traveling through L. Therefore destructive interference cancels the signal. On detector 4, on the other hand, the interference is constructive. This is the classical description of the experiment. When the amount of light is diminished below a certain quantum, the light starts to behave as a particle, the famous photon. This means that the whole photon is either reflected or passes through the mirror; it does not divide in two. The probability that the photon will be detected by detector 1 (resp. 2) is exactly one half. This is confirmed by the experiment. A strange thing happens when we let the photon go through the mirror B without a previous measurement. The classical probabilistic interpretation suggests that the probability will again be one half for both detectors 3 and 4. Indeed, with probability 1/4 the photon will be reflected both by A and B, and similarly for the other three combinations. But the experimental result is the same as in the interference model: the signal is detected always at 4, never at 3. It is now the task of quantum mechanics to explain this kind of phenomenon.
2.2. THE POSTULATES OF QUANTUM MECHANICS
Quantum mechanics is a mathematical tool for the description of quantum phenomena, and it is based on three postulates.

Postulate 1. A quantum system with m possible observable values is described by the complex vector space H_m = C^m with an inner product (called a Hilbert space). A state of the system is a one-dimensional subspace of H_m. It is always represented by a unit vector u, traditionally written as |u⟩.

From the point of view of quantum information science the basic quantum system is two-dimensional, and it is called a qubit, as an analog of the classical bit. To stress the analogy, the basis vectors of H_2 are usually denoted by |0⟩ and |1⟩. Note that even if we restrict ourselves to a unit representation of the one-dimensional state subspace, it is not given uniquely: all vectors of the form \( e^{i\alpha}|u\rangle \) are equivalent expressions of the same state. Note also that
the arithmetic expression of the basis vectors is the canonical

\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

The basic difference between bit and qubit, already mentioned above, is the possibility of the system existing in a superposition of the basis states, as for example the state

\[ |u\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}. \]
Postulate 2. The time evolution of a quantum system is given by a unitary operator U. Therefore, if a system is in state |u⟩ at time t₀, then it is in state U|u⟩ at time t₁.

Obviously, the operator U depends on what happened to the system. It may, for example, be affected by an electromagnetic field of given intensity. In quantum information science we usually consider operators U as black boxes. This is again analogous to the classical approach to bits, when we for example speak about a NOT-gate without examining its physical realization. An operator is unitary, by definition, if it satisfies U*U = Id, where U* is the adjoint operator of U. It is natural to work with the matrix representation of operators; then U* is the transposed complex conjugate matrix of U. An equivalent definition of a unitary operator is that it preserves the inner product, and therefore also the norm. This is a natural requirement on the time evolution of quantum systems, since then a unit vector evolves again to a unit vector. It is also important that unitary transformations are invertible; it is therefore always possible to reverse the process. Probably the most useful unitary operator in quantum computing, as we shall see, is the Hadamard transform

\[ H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \]

Verifying that it is unitary is a simple exercise.

Postulate 3. Measurement of a quantum system is always related to a certain orthonormal basis. Consider a measurement of H_m in the orthonormal basis |b₁⟩, . . . , |b_m⟩. After the measurement, the system collapses to one of the basis states |b_i⟩, and the outcome will be a value λ_i corresponding to that state. It is the eigenvalue of |b_i⟩ with respect to some operator related to the basis and called an observable. If the measured system is in the state

\[ |u\rangle = \sum_{i=1}^{m} \alpha_i |b_i\rangle, \]
then the eigenvalue of |b_i⟩ will be obtained with probability |α_i|². Note that the sum of the probabilities of all outcomes is one, since |u⟩ is a unit vector. The complex number α_i is called the probability amplitude. The measurement postulate is a very strange one. It claims that the result of a measurement is given only with a certain probability. Quantum physics has no means to say more about the outcome; it claims that this is not a fault of the theory: it is nature which is substantially random when it comes to measurements. Moreover, the collapse of the system is in general not a unitary operation. Therefore measurements do not obey Postulate 2. These facts have caused a lot of controversy. Albert Einstein, for example, insisted that the theory cannot be considered a complete description of physical reality (Einstein et al., 1935), and it seems that he never acquiesced to quantum mechanics. Nevertheless, the point of view we are describing here, advocated for example by Bohr (1935), became the mainstream. A very important fact is that the same system can be measured in different bases of our choice. In other words, we can measure different observables. To obtain the respective probabilities, it is then necessary to express the state in the chosen basis (or to use some other linear algebra tricks). Let us give an example. Measure the qubit

\[ |u\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \]

in the canonical basis |0⟩, |1⟩, with eigenvalues 1, −1, respectively. Then the outcome will be plus or minus one with probability 1/2, and the state of the system after the measurement will be |0⟩ or |1⟩. Note that the measurement is destructive: we will never be able to figure out which state has been measured. If, on the other hand, one measures the same state in the basis

\[ |b_1\rangle = |u\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |b_2\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}}, \]
then one obtains the eigenvalue of |b₁⟩ with probability one, and the state does not change.
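These hand computations are easy to check numerically. The following sketch is an illustration added here, not from the tutorial; it verifies the unitarity of the Hadamard transform and the two measurement examples above.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)       # Hadamard transform

print(np.allclose(H.conj().T @ H, np.eye(2)))      # True: H is unitary

u = (ket0 + ket1) / np.sqrt(2)
# Canonical basis: probabilities |<0|u>|^2 and |<1|u>|^2 are each 1/2
print(abs(ket0 @ u)**2, abs(ket1 @ u)**2)
# Basis {|u>, (|0>-|1>)/sqrt(2)}: outcome |b1> occurs with probability one
b2 = (ket0 - ket1) / np.sqrt(2)
print(abs(u @ u)**2, abs(b2 @ u)**2)               # 1.0, 0.0
```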
2.3. MACH-ZEHNDER: THE SOLUTION OF THE RIDDLE

Let us now show how quantum mechanics explains the experimental results of the Mach-Zehnder interferometer. Observe that the interferometer consists of an experiment repeated twice in the same way, namely an interaction of a photon with a half-silvered mirror. The elementary experiment has two possible inputs: the photon arrives either from the glass side or from the
silver side of the mirror, and similarly two possible outputs: it leaves the mirror on the glass side or on the silver side. (In Figure 1 only one possible input for the mirror A is shown.) Schematically, the elementary experiment is thus a box with two input beams and two output beams.
The detection of the photon on one of the two paths is an observable. Denote the basis for this observable by |0⟩, |1⟩. The photon is in state |0⟩ if it travels along the upper path (mirror U in Figure 1) and in state |1⟩ if traveling along the lower path (mirror L). The key to the riddle is superposition. The state of the photon can be a mixture of both basis states, and that is exactly what, according to quantum mechanics, happens after the interaction of the photon with the beamsplitter. The transformation is given by

\[ |0\rangle \to \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |1\rangle \to \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \]

Recall that |0⟩, |1⟩ are vectors of C²:

\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

The matrix of the interaction is then

\[ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \]

which is the above Hadamard transform. The action of the beamsplitter on a photon in a state α|0⟩ + β|1⟩ is therefore, by linearity, the superposition \( \big((\alpha+\beta)|0\rangle + (\alpha-\beta)|1\rangle\big)/\sqrt{2} \).

In our description of the experiment in Figure 1 we suppose that the photon begins in state |0⟩. Therefore after the interaction with the mirror A it will be in state \( (|0\rangle + |1\rangle)/\sqrt{2} \), and, by Postulate 3, the probability that the photon will be detected by detector 1 (resp. 2) is exactly one half. After detection the photon will be in a basis state corresponding to the detector by which it was detected.
QUANTUM INFORMATION
157
An inspection of Figure 1 shows that the entire Mach-Zehnder interferometer can be schematized as:
The matrix of the second beamsplitter is 1 1 1 , √ 2 −1 1 and the overall effect of the apparatus is given by 1 1 1 1 1 1 01 ·√ = , √ 10 2 −1 1 2 1 −1 which agrees with experimental results: the photon which enters along the upper path will be detected in the lower one (and vice versa).
3. Complexity and Circuits An algorithm is a method to solve a problem, which consists of many (typically infinitely many) instances. For example, an instance of the traveling salesman problem (TSP) is a weighted graph, and the solution is the cheapest path through all nodes. An algorithmic solution of the problem must describe a finite and uniform procedure which solves all possible instances. It is not clear, a priori, which instructions should be accepted in an algorithm. One will surely not allow an instruction “find the cheapest path” in the algorithm for TSP, since instructions have to be clear and elementary. On the other hand, restricting the set of tools can result in too weak a concept of the algorithm. During the twentieth century the Turing machine (TM) was accepted as the correct expression of what we mean by algorithm. This general agreement is based on the fact that all alternative models are either weaker or equivalent to TM. Whence the Church–Turing thesis: Any problem, which can be algorithmically solved, can be solved by the TM.
The thesis cannot be proved, of course, since it hinges on an intuitive notion of algorithmic solution. In fact, it is something between a claim and a definition. A natural question arises: what about possible quantum computers? Could they do more than TM? The answer is negative. It is possible to simulate all quantum processes by the conventional computer, it just is very complicated and slow. Consequently, the right question is whether quantum computers can
158
ˇ EP ˇ AN ´ HOLUB ST
do something substantially faster than TM. The answer to this question is not so simple. We first have to specify what is meant by “substantially faster”, and therefore how the complexity of an algorithm should be measured. The usual concept of algorithmic complexity deals with a function f (n), which gives an upper bound for the number of elementary steps leading to the solution of any instance of length n. It turns out that given an algorithm, various kinds of classical computers do not differ too much in their ability to execute the algorithm. More precisely, if the complexity of an algorithm is given by a polynomial on the traditional TM, then it is given by a (smaller) polynomial also on our best contemporary computers. Somehow, since Turing, we have not invented better computers, we have only constructed faster TM. It seems, therefore, that whether an algorithm is polynomial is independent of the machine we use. In this respect (hypothetical) quantum computers can beat TM. As we shall see, there is a problem—although a bit artificial—which cannot be solved in polynomial time by TM, but is easy for a quantum computer. Even here, however, the situation is not so clear if we allow probabilistic TMs, i.e. algorithms which give the right answer with some required probability close to one. Of course, deterministic algorithms are always better, but if the error probability of a probabilistic TM is lower than the probability that your computer will be destroyed by a meteorite, it does not seem very reasonable to insist on deterministic algorithms. It is not known whether quantum probabilistic algorithms are stronger than classical probabilistic ones, and it seems unlikely that this could be proved in the near future, since the fact would imply extremely hard theoretical claims. From the theoretical point of view, therefore, quantum computers are not convincingly better than classical ones. In practice, however, quantum algorithms have an ace in the hole: there is a practical probabilistic quantum algorithm which factors large numbers. The intractability assumption of such factorizations is the cornerstone of many presently used secure algorithms. 3.1. BOOLEAN AND QUANTUM CIRCUITS
Boolean circuits are a convenient way to describe an algorithm for a given input length, since each algorithm can be seen as a device computing a boolean function. The input is encoded as a sequence of 0s and 1s, and the output as well. In the previous chapter we represented the beamsplitter in a way which recalls the schematic notation of logical gates. This was not by chance. We want to view quantum phenomena as building blocks for construction of algorithms. There is, however, one important difference between classical logical gates and quantum ones: quantum processes are all reversible, while
159
QUANTUM INFORMATION
for example NAND, which is a universal boolean gate, is irreversible. It is impossible to find out whether the resulting 1 came from two zeros or from 0,1. So far we have discussed the possibility that quantum computers are stronger than classical ones. Now it turns out that the possibilities of quantum gates may be limited since they have to be reversible. In reality, every classical circuit can be simulated in a reversible way, perhaps with some auxiliary inputs. There are universal reversible gates, for example the Toffoli gate T , which has three input and three output bits. It is defined by T : (a, b, c) → (a, b, c ⊕ ab), where ⊕ denotes the logical sum (i.e. XOR). The Toffoli gate is reversible, since T ◦ T = Id: (a, b, c) → (a, b, c ⊕ ab) → (a, b, c ⊕ ab ⊕ ab) = (a, b, c), and it is universal, since NAND(a, b) = T (a, b, 1). In order to simulate boolean circuits by quantum circuits, it is therefore enough to construct a quantum Toffoli gate, which is in principle possible. Every quantum circuit will have a lot of auxiliary bits as is clear from the following comparison. a
a
a
b
b
1
ab
ab b
classical NAND
reversible NAND
4. Quantum Algorithms So far three basic quantum algorithms which have better performance than classical algorithms are known: r Deutch-Jozsa algorithm, which decides whether a given boolean function is constant or balanced; r Quantum Fourier transform with several applications, the most important of which is Shor’s factoring algorithm; r Grover’s algorithm for database search. We have already noted that the most spectacular achievement of hypothetical quantum computers would be their ability to factor large integers. The
ˇ EP ˇ AN ´ HOLUB ST
160
Shor’s algorithm is, from the quantum point of view, a direct application of the Fourier transform, which is a well known tool able to detect periodicity. It allows to find the order of a chosen element a in the cyclic group Z N , where N is the factored number, which allows, with a high probability, to find a factor. The key capability of quantum computers is therefore fast computation of the Fourier transform. While the classical Fast Fourier Transform works in time O(N log N ), the Quantum Fourier Transform is polynomial in log N , which means an exponential speedup. Here we explain in detail a simple version of Deutch-Jozsa algorithm. Although it has limited importance for practical problems, the algorithm makes clear why the quantum approach can have advantages over the classical one. First we need some more theory regarding composite systems. 4.1. COMPOSITE SYSTEMS
Consider a quantum system made up of n smaller systems. Then we have the following additional Postulate 4 The system V consisting of systems V1 , V2 , . . . , Vn is the tensor product V = V1 ⊗ V2 ⊗ · · · ⊗ Vn . This means that it is a n1 di dimensional system, where di is the dimension of Vi , and the basis of V is the set {|v j1 ⊗ · · · ⊗ |v jn | ji = 1, . . . , di , i = 1, . . . , n}. We usually consider only systems composed of several qubits. They have dimension 2n , where n is the number of qubits. The notation of basis vectors |k1 ⊗ · · · ⊗ |kn , where ki ∈ {0, 1} is often simplified into |k1 . . . kn . This is a very clever notation, since if sequences k1 . . . kn are understood as binary numbers, one obtains the basis {|0, |1, |2, . . . , |2n − 1} of the composite system. For example the basis of the four dimensional system is {|00, |01, |10, |11}, which indicates its possible decomposition into two qubits. It is important to note that, although the composite system is a tensor product of smaller systems, it also contains states which cannot be written as tensor product of vectors from those systems; it cannot be decomposed. For
QUANTUM INFORMATION
161
instance, the state |w =
|00 + |11 √ 2
of the four dimensional system is not a product of two qubits, as is easy to verify. The two qubits are said to be entangled. Note that if two qubits are in the state |w, they will with probability 1/2 have both values corresponding to |0, or both values corresponding to |1. This implies a very strange fact, which challenges basic intuitions of classical physics. It is experimentally possible to prepare entangled qubits (photons, for instance), which can be then separated to a distance of many kilometers. If one of the qubits is measured, the outcome will immediately determine the outcome of the measurement of the distant entangled qubit. In other words, quantum mechanics allows non-local effects, which was one of the motives for Einstein’s disconcertion.
4.2. DEUTSCH’S ALGORITHM
Deutsch’s problem is linked to origins of quantum algorithms. We are given a black box function f : {0, 1}2 → {0, 1}, and have to decide whether it is constant or not. In the classical setting it is obvious that we need to make two queries, i.e. to learn both values of f , in order to decide the question. On the other hand, the question asks for just one bit of information: constant yes, or no? Unfortunately there is no way to ask just this question. But that is exactly what a quantum computer can do. Deutch’s algorithm shows how to decide the question by a single query. First it is important to clarify how a quantum black box function is given. The function f is in general not injective, therefore irreversible, and we have seen that all quantum circuits have to be reversible. The standard way to represent such functions is the following gate, with an auxiliary input and output:
x
x
Uf y
y f(x)
ˇ EP ˇ AN ´ HOLUB ST
162
The U f transform acts on the two qubit system and is defined on the basis vectors |x ⊗ |y by Uf
|x ⊗ |y −→ |x ⊗ |y ⊕ f (x). It is easy to see that it just permutes the basis vectors |00, |01, |10, |11. Note that U f ◦ U f = Id, therefore the transform is unitary. This is the first two-qubit gate we see here. It should not be confused with the beamsplitter gate, which is one-qubit, and in this context should be depicted as
where |v = H |u, or even better as
to indicate that we do not care about whether the Hadamard gate is physically realised by the beamsplitter or otherwise. The quantum circuit solving Deutsch’s problem is quite simple. It consists, apart from the black box, of three Hadamard gates: H
0
x
H
x
Uf H
1
y
y f(x)
s2
s1
s3
s4
We have indicated the four stages of the algorithm by vertical lines. In the initial stage, the two qubit are in the state s1 = |01. In the second stage we obtain s2 =
|0 + |1 |0 − |1 ⊗ √ . √ 2 2
What happens in the black box depends, of course, on the function f . The simplest case is f (0) = f (1) = 0, when U f is the identity. Therefore s3 =
|0 + |1 |0 − |1 ⊗ √ √ 2 2
for f (0) = f (1) = 0.
163
QUANTUM INFORMATION
If f (0) = f (1) = 1, the action of U f is given by |00 → |01
|01 → |00
|10 → |11
|11 → |10.
Then 1 1 s3 = U f (|00 − |01 + |10 − |11) = (|01 − |00 + |01 − |10) = 2 2 |0 + |1 |0 − |1 =− √ ⊗ √ . 2 2 Similarly we can proceed with the other two possibilities to obtain altogether ⎧ |0+|1 |0−|1 ⎪ ⎨ ± √2 ⊗ √2 if f (0) = f (1), s3 = ⎪ ⎩ ± |0−|1 √ √ ⊗ |0−|1 if f (0) = f (1). 2 2 Finally, s4 =
⎧ ⎪ ⎨ ±|0 ⊗
|0−|1 √ 2
if f (0) = f (1),
⎪ ⎩ ±|1 ⊗
|0−|1 √ 2
if f (0) = f (1).
Now it suffices to measure the first qubit. The eigenvalue of |0 will disclose that f is constant, the eigenvalue of |1 the opposite. Deutch’s algorithm shows the basic idea of all quantum algorithms: the superposition of states allows, in a sense, to compute many values at the same time. Note that after the Hadamard transforms, a combination of all basis vectors enters the gate U f . On the other hand, this alone does not mean that after the evaluation of the combination we have a direct approach to any value we wish to know, since there is no obvious way to extract the desired information from the result. For example, if we measured the first qubit of the state s3 , we would get both eigenvalues with the same probability regardless of f .
5. Quantum Information Information science is not limited to algorithms and solving problems. An equally important task is sending information over a channel, and doing so efficiently and often also secretly, which gives rise to classical coding theory and cryptology. What happens if we consider a channel which has quantummechanical properties, instead of the classical one? Quantum mechanics suggest that quantum systems contain much more information than classical bits. Superposition of states allows parallel transmission of a lot of information.
164
ˇ EP ˇ AN ´ HOLUB ST
Suppose for example that we want to transmit four bits bi , i = 0, . . . , 3 of classical information. We can prepare a two qubit system in superposition 3 1 |00 |01 |10 |11 + (−1)b1 + (−1)b2 + (−1)b3 , (−1)bi |i = (−1)b0 2 i=0 2 2 2 2
which contains four bits of information encoded as signs of the basis states. This is a promising idea, but alas, it does not work. The irreparable reason is that it is impossible to gain the information from the system. There is no way to distinguish reliably between the sixteen possible states suggested above. By a measurement we obtain again just two bits of information, and maybe not even that, if the measurement is not reasoned: we have already seen that to measure the state |0 + |1 √ 2 yields no information at all, since the outcome is a completely random bit. The problem is that each measurement, by Postulate 3, is destructive. Some information is always lost as soon as we measure a superposition of basis states. 5.1. NO-CLONING THEOREM
One may think of the following solution. Make many copies of the received state, and then by many different experiments, for example by repeated measurements in different bases, learn what the original state was like. Unfortunately, this idea does not help either. Quantum mechanics prohibits copying states! More precisely, it allows to make a copy only of basis states, which corresponds to the classical copying of information, and hence does not yield an advantage. The result is known as the No-cloning theorem, and we are going to prove it. First we have to formulate the claim properly. Suppose we have an unknown state |u and want to copy it. This means we want to implement the transformation |u ⊗ |0
U
−→
|u ⊗ |u.
The choice of |0 as the “blank state” is arbitrary, but it has to be firm. We do not know anything about u, and therefore cannot choose the blank state to somehow fit the copied one. By Postulate 2, the transformation has to be unitary. The No-cloning theorem now claims that if U is a unitary transformation then the desired copying equality holds only for two states |a, |b.
QUANTUM INFORMATION
165
To prove the theorem let U (|a ⊗ |0) = |a ⊗ |a, U (|b ⊗ |0) = |b ⊗ |b. Consider now a general state α|a + β|b. Since U is a linear operator, we have U ((α|a + β|b) ⊗ |0) = α · U (|a ⊗ |0) + β · U (|b ⊗ |0) = α · |a ⊗ |a + β · |b ⊗ |b. Suppose that U copies α|a + β|b. Then also U (α|a + β|b) ⊗ |0 = (α|a + β|b) ⊗ (α|a + β|b) = α 2 · |a ⊗ |a + αβ|a ⊗ |b +βα|b ⊗ |a + β 2 · |b ⊗ |b. Comparing the two expressions it is obvious that either α or β is zero, which completes the proof. 5.2. QUANTUM CRYPTOGRAPHY
So far we have spoken about drawbacks of quantum information, which essentially follow from the destructive nature of the measurement. That very feature, however, turns out to be very useful from the security point of view. In short, if an eavesdropper intercepts a quantum channel he or she cannot escape unnoticed. We describe the cryptography protocol BB84, which exploits this basic idea. The name of the protocol comes from the fact that it was published by Bennett and Brassard (1984). The aim of the protocol is to exchange a secret key over a public quantum channel. The key can be subsequently used for other cryptography tasks. Suppose A and B want to exchange a secret key of length n. Then A generates two sequences b1 , . . . , bm and c1 , . . . , cm of m = δn random bits. The number δ has the following significance: during the protocol many transmitted qubits will be discarded. Therefore δ is chosen so that at least n usable bits remain with required high probability. A then encodes each bit by a qubit in the following way. Denote |+ =
|0 + |1 √ 2
|− =
|0 − |1 . √ 2
166
ˇ EP ˇ AN ´ HOLUB ST
There are now two possibilities to encode each bit bi by a qubit |u i : 0 → |0 1 → |1
or
0 → |+ . 1 → |−
A choses the encoding according to the values of ci . If ci = 0, the bit bi is encoded by |0 or |1, if ci = 1, by |+ or |−. All qubits are now transmitted through a public channel to B, which measures |u i in a randomly chosen basis di . If di = ci , the outcome is a useless random number. If, on the other hand, di = ci , the output is equal to the input. After the measurement, both A and B make their sequences of basis choices ci and di public. It is now clear which bits are usable for the shared secret. Suppose there are at least 2n of them (which happens with high probability depending on δ). In the control stage A and B randomly choose half of the usable bits to check publicly, seeing whether the values agree. They should agree, and if they do not, then the channel was intercepted (or is otherwise unreliable). Therefore, if a disagreement in some chosen number t of control bits appears, the exchange is discarded. The eavesdropper E is successful on the jth bit only if ci = di = ei , where ei is the basis in which E measured |u i , and only if the bit was not chosen for the control. Since the basis in which the bit is encoded is random and is disclosed only after the transmission, the interception by E will, with high probability, cause a detectable disturbance of the transmitted signal. Note that the no-cloning theorem plays an important role here. Even if the transmission of the signal is public, an eavesdropper cannot make a copy of it for further inspection. 5.3. PRACTICAL REALIZATIONS
Although the theory of quantum information is very nice, there is an obvious question: How much of the theory can be implemented? The effort to build quantum computers by such research giants as IBM, the U.S. Department of Defense and the NEC Electronic Corporation is extremely intense. The opinions on the practicality of quantum computers differ from scientist to scientist, and the situation is changing very quickly. The difficulties encountered are enormous, and it is hard to say whether they are substantial, or whether everything is just a question of technology development. The theoretical results described in this chapter are experimentally verifiable. The main problem is that quantum systems are very unstable. Quantum mechanics works for closed quantum systems and it is difficult to avoid an interaction of the system with the environment, and to keep the system under control. For reasonably large systems this is impossible so far.
QUANTUM INFORMATION
167
In 2001 IBM announced an experimental realization of 7 qubit computation which implemented Shor’s algorithm to verify that 15 = 3 · 5. At the moment, however, it seems that the approach used in this experiment cannot be extended beyond 10 qubits. The approach is called nuclear magnetic resonance (NMR), which handles large number of molecules (about 108 ), which produce a macroscopically detectable signal. An alternative approach is called ion trap, and deals with single atoms cooled to a very low temperature. This approach seems to be more promising at the moment. In December 2005 researchers from the University of Michigan announced that they constructed an ion trap using a technology similar to classical chips (Stick et al., 2006). In the area of quantum communication the situation is much more optimistic. The first computer network in which communication is secured with quantum cryptography was unveiled in 2004 in Cambridge, Massachusetts. It is based on optic fibers and covers a distance of about 10 km. Quantum key distribution described in this chapter was experimentally demonstrated in 1993 for a distance of several tens of kilometers. However, at the present time the maximum is about 50 km due to weakening of the signal.
References Bennett, C. H. and Brassard, G. (1984) Quantum Cryptography: Public Key Distribution and Coin Tossing, In Proceedings of IEEE International Conference on Computers, Systems, and Signal Processing, pp. 175–179. Bohr, N. (1935) Can Quantum-Mechanical Description of Physical Reality be Considered Complete?, Physics Review 48, 696–702. Einstein, A., Podolsky, B., and Rosen, N. (1935) Can Quantum-Mechanical Description of Physical Reality be Considered Complete?, Physics Review 47, 777–780. Stick, D., Hensinger, W. K., Olmschenk, S., Madsen, M. J., Schwab, K., and Monroe, C. (2006) Ion Trap in a Semiconductor Chip, Nature Physics 2, 36–39. Turing, A. M. (1936) On Computable Numbers, with an Application to the Entscheidungs Problem, Proceedings of London Mathematical Society. 43(2), 230–265.
This page intentionally blank
MULTIFRACTAL ANALYSIS OF IMAGES: NEW CONNEXIONS BETWEEN ANALYSIS AND GEOMETRY Yanick Heurteaux1 (
[email protected]) and St´ephane Jaffard2 (
[email protected]) 1 Laboratoire de Math´ematiques, Universit´e Blaise Pascal, 63177 Aubi`ere Cedex, France 2 Laboratoire d’Analyse et de Math´ematiques Appliqu´ees, Universit´e Paris XII, 61 Avenue du G´en´eral de Gaulle, 94010 Cr´eteil Cedex, France
Abstract. Natural images can be modelled as patchworks of homogeneous textures with rough contours. The following stages play a key role in their analysis: r Separation of each component r Characterization of the corresponding textures r Determination of the geometric properties of their contours. Multifractal analysis proposes to classify functions by using as relevant parameters the dimensions of their sets of singularities. This framework can be used as a classification tool in the last two steps enumerated above. Several variants of multifractal analysis were introduced, depending on the notion of singularity which is used. We describe the variants based on H¨older and L p regularity, and we apply these notions to the study of functions of bounded variation (indeed the BV setting is a standard functional assumption for modelling images, which is currently used in the first step for instance). We also develop a multifractal analysis adapted to contours, where the regularity exponent associated with points of the boundary is based on an accessibility condition. Its purpose is to supply classification tools for domains with fractal boundaries. Key words: pointwise exponents, fractal boundaries, multifractal analysis
1. Mathematical Modelling of Natural Images In order to develop powerful analysis and synthesis techniques in image processing, a prerequisite is to split the image into simpler components, and to develop some classification procedures for these components. Consider the example of a natural landscape: It consists of a superposition of different pieces 169 J. Byrnes (ed.), Imaging for Detection and Identification, 169–194. C 2007 Springer.
170
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
which present some homogeneity. Perhaps there will be a tree in the foreground, mountains in the background and clouds at the top of the picture. An efficient analysis procedure should first be able to separate these components which will display completely unrelated features and analyse each of them separately, since they will individually present some homogeneity. Considering an image as a superposition of overlapping components is referred to as the “dead-leaves” model, introduced by Matheron. See (Matheron, 1975) and (Bordenave et al., 2006) for recent developments. Each piece appears as a relatively homogeneous part which is “cut” along a certain shape . The homogeneous pieces are the “textures” and their modelling can be performed by interpreting them as the restriction on the “shape” of random fields of R2 with particular statistical properties (stationarity, . . . ). If a statistical model depending on a few parameters is picked, then one needs to devise a robust statistical test in order to estimate the values of these parameters. Afterwards, the test can be used as a classification tool for these textures. Furthermore, once the relevant values of the parameters have been identified, the model can be used for simulations. The procedure is now standard to classify and generate computer simulations of clouds for instance (Arneodo et al., 2002; Naud et al., 1997). Another problem is the modelling of the shape of ; indeed natural scenes often do not present shapes with smooth edges (it is typically the case for the examples of trees, mountains or clouds that we mentioned) and the challenge here is to develop classification tools for domains with non-smooth (usually “fractal”) boundaries. Until recently, the only mathematical tool available was the box-dimension of the boundary (see Definition 3.1) which is an important parameter but nevertheless very insufficient for classification (many shapes share the same box dimension for their boundary, but clearly display very different features). Let us come back to the separation of the image into “simpler” components that present some homogeneity. It can be done using a “global” approach: The image is initially stored as grey-levels f (x, y) and is approximated by a simple “cartoon” u(x, y). What is meant by “simple” is that textures will be replaced by smooth pieces and rough boundaries by piecewise smooth curves.1 The building of a mathematical model requires summarizing these qualitative assumptions by choosing an appropriate function space setting. In practice, the space BV (for “bounded variation”) is usually chosen. A function f belongs to BV if its gradient (in the distribution sense) is a bounded measure (the name BV refers to the one-dimensional case where a function f belongs to BV if the sums i | f (xi+1 ) − f (xi )| are uniformly bounded, no matter how
1
This kind of simplification was anticipated by Herg´e in his famous “Tintin” series and by his followers of the Belgian “la ligne claire” school.
MULTIFRACTAL ANALYSIS OF IMAGES
171
we chose the finite increasing sequence xi ). Indeed, the space BV presents several of the required features : It allows only relatively smooth textures but, on the other hand, it allows for sharp discontinuities along smooth lines (or hypersurfaces, in dimension > 2). It is now known that natural images do not belong to BV (Gousseau and Morel, 2001), but this does not prevent BV being used as a model for the “sketch” of an image: f is decomposed as a sum u + v where u ∈ BV and v is an approximation error (for instance it should have a small L 2 norm) and one uses a minimization algorithm in order to find u. In such approaches, we may expect that the discontinuities of the “cartoon” u will yield a first approximation of the splitting we were looking for. Such decompositions are referred to as “u + v” models and lead to minimization algorithms which are currently used; they were initiated by L. Rudin and S. Osher, and recent mathematical developments were performed by Meyer (2001) and references therein. Once this splitting has been performed, one can consider the elementary components of the image (i.e. shapes that enclose homogenous textures) and try to understand their geometric properties in order to obtain classification tools; at this point, no a priori assumption on the function is required; one tries to characterize the properties of the textures and of the boundaries by a collection of relevant mathematical parameters; these parameters should be effectively computable on real images in order to be used as classification parameters and hence for model selection. Multifractal analysis is used in this context: It proposes different pointwise regularity criteria as classification tools and it relates them to “global” quantities that are actually computable. The different pointwise quantities (regularity exponents) which are used in multifractal analysis are exposed in Section 2, where we also recall their relationships see Jaffard (2004). In Section 3, we deal with “global” aspects. The tools (fractional dimensions) used in order to measure the sizes of sets with a given pointwise regularity exponent are defined (they are referred to as spectra of singularities). We also draw a bridge between these local analysis tools and the global segmentation approach described above. The implications of the BV assumption on the multifractal analysis of a function are derived. The results of this section supply new tools in order to determine if particular images (or homogenous parts of images) belong to BV. In Sections 4 and 5 we concentrate on the analysis of domains with fractal boundaries. Section 4 deals with general results concerning the pointwise exponents associated with these domains and Section 5 deals with their multifractal analysis. Apart from image processing, there are other motivations for the multifractal analysis of fractal boundaries, e.g. in physics and chemistry: turbulent mixtures, aggregation processes, rough surfaces, see (Jaffard and Melot, 2005) and references therein.
172
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
2. Pointwise Smoothness Each variant of multifractal analysis is based on a definition of pointwise smoothness. In this section, we introduce the different definitions used, explain their motivations and their relationships. 2.1. POINTWISE EXPONENTS FOR FUNCTIONS AND MEASURES
The most simple notion of smothness of a function is supplied by C k differentiability. Recall that a bounded function f belongs to C 1 (Rd ) if it has everywhere partial derivatives ∂ f /∂ xi which are continuous and bounded; C k differentiability for k ≥ 2 is defined by recursion: f belongs to C k (Rd ) if it belongs to C 1 (Rd ) and each of its partial derivatives ∂ f /∂ xi belongs to C k−1 (Rd ). Thus a definition is supplied for uniform smoothness when the regularity exponent k is an integer. Taylor’s formula follows from the definition of C k differentiability and states that, for any x0 ∈ Rd , there exists C > 0, δ > 0 and a polynomial Px0 of degree less than k such that if |x − x0 | ≤ δ,
then
| f (x) − Px0 (x)| ≤ C|x − x0 |k .
This consequence of C k differentiability is just in the right form to yield a definition of pointwise smoothness which also makes sense for fractional orders of smoothness; following a usual process in mathematics, this result was turned into a definition. Definition 2.1. Let α ≥ 0, and x0 ∈ Rd ; a function f : Rd → R is C α (x0 ) if there exists C > 0, δ > 0 and a polynomial Px0 of degree less than α such that if |x − x0 | ≤ δ,
then | f (x) − Px0 (x)| ≤ C|x − x0 |α .
(1)
The H¨older exponent of f at x0 is h f (x0 ) = sup {α : f is C α (x0 )}. Remarks: The polynomial Px0 in (1) is unique and, if α > 0, the constant term of Px0 is f (x0 ); P is called the Taylor expansion of f at x0 of order α; (1) implies that f is bounded in a neighbourhood of x0 ; therefore, the H¨older exponent is defined only for locally bounded functions; it describes the local regularity variations of f . Some functions have a constant H¨older exponent: They display a “very regular irregularity”. A typical example is the Brownian motion B(t) which satisfies almost surely: ∀x, h B (x) = 1/2. H¨older regularity is the most widely used notion of pointwise regularity for functions. However, it suffers several drawbacks; one of them appeared at the very beginning of the introduction of multifractal analysis, in the mideighties; indeed, it was introduced as a tool to study the velocity of turbulent fluids, which is not necessarily a locally bounded function see Parisi and Frisch (1985); and, as mentioned above, H¨older regularity can only be applied
173
MULTIFRACTAL ANALYSIS OF IMAGES
to locally bounded functions. Several mathematical drawbacks were already discovered at the beginning of the sixties by Cald´eron and Zygmund (1961). Another one which appeared recently is that the H¨older exponent of a function which has discontinuities cannot be deduced from the size of its wavelet coefficients. This is a very serious drawback for image analysis since images always contain objects partly hidden behind each other (this is referred to as the “occlusion phenomenon”), and therefore necessarily display discontinuities. If B is a ball, let f B,∞ = supx∈B | f (x)|, and, if 1 ≤ p < ∞, 1/ p 1 p | f (x)| d x ; f B, p = Vol(B) B finally let Br = {x : |x − x0 | ≤ r } (not mentioning x0 in the notations won’t introduce ambiguities afterwards). A clue to understand how the definition of pointwise H¨older regularity can be weakened (and therefore extended to a wider setting) is to notice that (1) can be rewritten f − Px0 Br ,∞ ≤ Cr α . Therefore, one obtains a weaker criterion by substituting in this definition the local L ∞ norm by a local L p norm. The following definition was introduced by Cald´eron and Zygmund (1961). p
Definition 2.2. Let p ∈ [1, +∞); a function f : Rd −→ R in L loc belongs p to Tα (x0 ) if ∃R, C > 0 and a polynomial Px0 of degree less than α such that ∀r ≤ R
f − Px0 Br , p ≤ Cr α . p
(2)
p
The p-exponent of f at x0 is h f (x0 ) = sup{α : f ∈ Tα (x0 )}. It follows from the previous remarks that the H¨older exponent h f (x0 ) coincides with h ∞ f (x 0 ). Note that (2) can be rewritten | f (x) − Px0 (x)| p d x ≤ Cr αp+d . (3) ∀r ≤ R, Br
These p-smoothness conditions have several advantages when compared with the usual H¨older regularity conditions: They are defined as soon as f belongs locally to L p and the p-exponent can be characterized by conditions bearing on the moduli of the wavelet coefficients of f (Jaffard, 2006). Note that the p Tα (x0 ) condition gets weaker as p goes down, and therefore, for a given p x0 , p → h f (x0 ) is a decreasing function. Let us now focus on the weakest possible case, i.e. when p = 1. First, recall that, if f is a locally integrable function, then x0 is a Lebesgue point of f if 1 ( f (x) − f (x0 ))d x −→ 0 when r −→ 0. (4) Vol(Br ) Br
174
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
Therefore, one can see the Tα1 (x0 ) smoothness criterium as a way to quantify how fast convergence takes place in (4) when x0 is a Lebesgue point of f . The L 1 norm of f − Px0 expresses an average smoothness of f : How close (in the mean) are f and a polynomial. Sometimes one rather wants to determine how large f is in the neighbourhood of x0 ; then the relevant quantity is the rate of decay of the local L 1 norms B(x0 ,r ) | f (x)|d x when r → 0. This quantity can also be considered for a nonnegative measure μ instead of an L 1 function. In that case, one considers B(x0 ,r ) dμ = μ(B(x0 , r )). This leads us to the following pointwise size exponent. Definition 2.3. Let p ∈ [1, +∞); a nonnegative measure μ belongs to Sα (x0 ) if there exist positive constants R and C such that dμ ≤ Cr α . (5) ∀r ≤ R, Br
The size-exponent of μ at x0 is sμ (x0 ) = sup{α : μ ∈ Sα (x0 )} = lim inf r →0
log μ(B(x0 , r )) . log r
If f ∈ L 1 , then s f (x0 ) is the size exponent of the measure dμ = | f (x)|d x. If f ∈ L 1 and if Px0 in (3) vanishes, then the definitions of the 1-exponent and the size exponent of f coincide except for the normalization factor r d in (3) which has been dropped in (5); thus, in this case, s f (x0 ) = h 1f (x0 ) + d. This discrepancy is due to historical reasons: Pointwise exponents for measures and for functions were introduced independently. It is however justified by the following remark which is a consequence of two facts: μ((x, y]) = |F(y) − F(x)|, and the constant term of Px0 is F(x0 ). Remark: Let μ be a non-negative measure on R such that μ(R) < +∞ and let F be its repartition function defined by F(x) = μ((−∞, x]); if the polynomial Px0 in (3) is constant, then sμ (x0 ) = h 1F (x0 ). One does not subtract a polynomial in the definition of the pointwise exponent of a measure because one is usually interested in the size of a measure near a point, not its smoothness. Consider the very important case where μ is the invariant measure of a dynamical system; then the size exponent expresses how often the dynamical system comes back close to x0 , whereas a smoothness index has no direct interpretation in terms of the underlying dynamical system. p We will need to use Tα (x0 ) smoothness expressed in a slightly different form:
MULTIFRACTAL ANALYSIS OF IMAGES
175
p Proposition 2.4. Let f ∈ L loc , and α ∈ (0, 1]; let f r = 1 f (x)d x. Vol(Br ) Br Then 1/ p p 1 p f ∈ Tα (x0 ) ⇐⇒ f (x) − f r d x ≤ Cr α . (6) Vol(Br ) Br p
Proof. Suppose that f ∈ Tα (x0 ) and let A be the constant polynomial which p appears in the definition of Tα ; then 1 fr − A = ( f (x) − A)d x; Vol(Br ) Br H¨older’s inequality yields that | f r − A| is bounded by 1/ p 1 p | f (x) − A| d x (Vol(Br ))1/q Vol(Br ) Br ≤ C(Vol(Br ))1/q−1+1/ p r α = Cr α . Thus, f r = A + O(r α ). As a consequence, if we replace A by f r in the p quantity to be estimated in the definition of Tα , the error is O(r α ). Conversly, suppose that (6) is true. Let r, r be such that 0 < r ≤ r . We have f − f r L p (Br ) ≤ Cr α+d/ p
and f − f r L p (Br ) ≤ C(r )α+d/ p .
Since r ≤ r , f − f r L p (Br ) ≤ C(r )α+d/ p ; therefore f r − f r L p (Br ) ≤ C(r )α+d/ p , so that | f r − f r | ≤ C(r )α . It follows that f r converges to a limit f 0 when r goes to 0. Moreover f r = f 0 + O(r α ) and therefore one can take A = f 0. 2.2. POINTWISE EXPONENTS FOR BOUNDARY POINTS OF DOMAINS
We will show how to draw distinctions between points of the boundary of a domain , by associating to each of them an exponent, which may change from point to point along the boundary. This will allow us afterwards to perform a multifractal analysis of the boundary, i.e. to use as a discriminating parameter between different types of boundaries the whole collection of dimensions of the corresponding sets where this exponent takes a given value. Let us check if the exponents previously introduced can be used; the function naturally associated with a domain is its characteristic function 11 (x) which takes the value 1 on and 0 outside . The H¨older exponent of 11 cannot play the role we expect, since it only takes two values: +∞ outside ∂ and 0 on
176
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
∂. Let us now consider the p-exponents and the size exponent. We start by a toy-example: The domain α ⊂ R2 defined by (x, y) ∈ α if and only if |y| ≤ |x|α . At the point (0, 0) one immediately checks that, if α ≥ 1, the p-exponent takes the value (α − 1)/ p, and the size exponent takes the value α + 1. On the other hand, if 0 < α < 1, the p exponent takes the value (1 − α)/(αp) but the size exponent is always equal to 2. This elementary computation shows the following facts: The p-exponent of a characteristic function can take any nonnegative value, the size exponent can take any value larger than 2; the 1-exponent and the size exponent give different types of information. The following proposition, whose proof is straightforward, gives a geometric interpretation for the size exponent of 11 . Proposition 2.5. Let be a domain of Rd and let x0 ∈ ∂; 11 ∈ Sα (x0 ) if and only if ∃R > 0 and C > 0 such that ∀r ≤ R Vol( ∩ B(x0 , r )) ≤ Cr α . The following definition encapsulates this geometric notion. Definition 2.6. A point x0 of the boundary of is weak α-accessible if there exist C > 0 and r0 > 0 such that ∀r ≤ r0 , Vol ( ∩ B(x0 , r )) ≤ Cr α+d .
(7)
The supremum of all values of α such that (7) holds is called the weak accessibility exponent at x0 . We denote it by αw (x0 ). Thus αw (x0 ) is a non negative number and is nothing but the size exponent of the measure 11 (x)d x shifted by d. The following proposition of Jaffard and Melot (2005) shows that, for characteristic functions, all the p-exponents yield the same information and therefore one can keep only the 1-exponent. Proposition 2.7. Let be a domain of Rd and let x0 ∈ ∂; then 11 ∈ p Tα / p(x0 ) if and only if either 11 ∈ Sα+d (x0 ) or 11c ∈ Sα+d (x0 ), where c denotes the complement of . Following the same idea as above, one can also define a bilateral accessibility exponent of a domain which is the geometric formulation of the 1-exponent of the function 11 , see Jaffard and Melot (2005). Definition 2.8. A point x0 of the boundary ∂ is bilaterally weak α-accessible if there exist C > 0 and r0 > 0 such that ∀r ≤ r0 , c min Vol ( ∩ B(x0 , r )) , Vol ( ∩ B(x0 , r )) ≤ Cr α+d . (8) The supremum of all values of α such that (8) holds is called the bilateral weak accessibility exponent at x0 . We denote it by βw (x0 ).
MULTIFRACTAL ANALYSIS OF IMAGES
177
Remark 1: It follows immediately from the above definitions that the bilateral exponent βw (x0 ) is the supremum of the unilateral exponents αw (x0 ) associated with and its complement c . In practice, using unilateral or bilateral exponents as classification tools in multifracal analysis will be irrelevant when and c have the same statistical properties. It is the case when they are obtained by a procedure which makes them play the same role (for instance if ∂ is the edge of the fracture of a metallic plate). On the other hand, unilateral exponents should yield different types of information when the roles played by and its complement are very dissymetric (electrodeposition aggregates for instance). Remark 2: If ∈ BV , then, by definition grad(11 ) is a measure, and therefore one could also consider an additional exponent, which is the size exponent of |grad(11 )|. We won’t follow this idea because, in applications, one has no direct access to the measure grad(11 ), and we want to base our analysis only on information directly available from . We will also use the following alternative accessibility exponents. Definition 2.9. A point x0 of the boundary of is strong α-accessible if there exist C > 0 and r0 > 0 such that ∀r ≤ r0 , Vol ( ∩ B(x0 , r )) ≥ Cr α+d .
(9)
The infimum of all values of α such that (9) holds is called the strong accessibility exponent at x0 . We denote it by αs (x0 ). A point x0 of the boundary of is bilaterally strong α-accessible if there exist C > 0 and r0 > 0 such that ∀r ≤ r0 , c (10) min Vol ( ∩ B(x0 , r )) , Vol ( ∩ B(x0 , r )) ≥ Cr α+d . The infimum of all values of α such that (10) holds is called the bilateral strong accessibility exponent at x0 . We denote it by βs (x0 ). The following result yields alternative definitions of these exponents. Proposition 2.10. Let x ∈ ∂; then log Vol ( ∩ B(x, r )) , r →0 log r log Vol ( ∩ B(x, r )) . αs (x) + d = lim sup log r r →0 αw (x) + d = lim inf
Similar relations hold for the indices βw (x) and βs (x). Other exponents associated with boundaries have been introduced; they were based on the notion of density, which we now recall.
178
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
Definition 2.11. Let x0 ∈ ; the density of at x0 is Vol(B(x0 , r ) ∩ ) . r →0 Vol(B(x0 , r ))
D(, x0 ) = lim
(11)
This limit does not necessarily exist everywhere; thus, if one wants to obtain an exponent which allows a classification of all points of ∂, the upper density ¯ exponent D(, x0 ) or the lower density exponent D(, x0 ) should rather be used; they are obtained by taking in (11) respectively a lim sup or a lim inf. The set of points where D(, x0 ) differs from 0 and 1 is called the measure theoretic boundary, see Chap. 5 of Ziemer (1989). This allows to introduce topological notions which have a measure-theoretic content. The measure theoretic interior of is the set of points satisfying D(, x0 ) = 1; the measure theoretic exterior is the set of points satisfying D(, x0 ) = 0. See Chap. 5 of Ziemer (1989) for more on these notions which bear some similarities with the ones we will develop in Section 4.1. Note that points with a positive weakaccessibility exponent all have a vanishing density, so that density exponents are a way to draw a distinction between different points of weak-accessibility 0. This refinement has been pushed even further when has a finite perimeter (i.e. when 11 ∈ BV ). Points of density 1/2 can be classified by considering points where the boundary is “close” to a hyperplane (see (Ziemer, 1989) for precise definitions); such points constitute the “reduced boundary” introduced by de Giorgi. We will come back to these classifications in Section 4.2.
3. Fractional Dimensions, Spectra and Multifractal Analysis 3.1. FRACTIONAL DIMENSIONS
In order to introduce global parameters which allow to describe the “fractality” of the boundary of a domain, we need to recall the notions of dimensions that will be used. Their purpose is to supply a classification among sets of vanishing Lebesgue measure in Rd . The simplest notion of dimension of a set E (and the only one that is computable in practice) is the upper box-dimension. It can be obtained by estimating the number of dyadic cubes that intersect E. Recall that a dyadic cube of scale j is of the form k1 k1 + 1 kd kd + 1 λ= , × ··· × j , , where k = (k1 , . . . kd ) ∈ Zd ; 2j 2j 2 2j F j denotes the set of dyadic cubes of scale j. Definition 3.1. (Upper box-dimension) Let E be a bounded set in Rd and N j (E) be the number of cubes λ ∈ F j that intersect E. The upper
MULTIFRACTAL ANALYSIS OF IMAGES
179
box-dimension of the set E is defined by (E) = lim sup j→+∞
log(N j (E)) . log(2 j )
This notion of dimension presents two important drawbacks. The first one is that it takes the same value for a set and its closure. For example, the upper box-dimension of the set Q of rational numbers is equal to 1, but we would expect the dimension of a countable set to vanish. The second one is that it is not a σ -stable index, i.e. the dimension of a countable union of sets usually differs from the supremum of the dimensions of the sets. In order to correct these drawbacks, a very clever idea, introduced by Tricot (1982), consists in “forcing” the σ -stability as follows: Definition 3.2. (Packing dimension) Let E ⊂ Rd ; the packing dimension of E is
dim P (E) = inf sup[(E i )] ; E ⊂ Ei , i∈N
i∈N
where the infimum is taken on all possible “splittings” of E into a countable union. The Hausdorff dimension is the most widely used by mathematicians. Definition 3.3. (Hausdorff dimension) Let E ⊂ Rd and α > 0. Let us introduce the following quantities: Let n ∈ N; if = {λi } i∈N is a countable collection of dyadic cubes of scales at least n which forms a covering of E, then let
Hnα (E, ) = diam (λi )α , and Hnα (E) = inf Hnα (E, ) , i∈N
where the infimum is taken on all possible coverings of E by dyadic cubes of scales at least n. The α-dimensional Hausdorff measure of E is Hα (E) = lim Hnα (E). n→+∞
The Hausdorff dimension of E is dim H (E) = sup (α > 0 ; Hα (E) = +∞) = inf (α > 0 ; Hα (E) = 0) . Remark 1. Hausdorff measures extend to fractional values of d the notion of d-dimensional Lebesgue measure, indeed, Hd is the Lebesgue measure in Rd . The Hausdorff dimension is an increasing σ -stable index. Remark 2. The following inequalities are always true, see Falconer (1990). 0 ≤ dim H (E) ≤ dim P (E) ≤ (E) ≤ d.
180
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
3.2. SPECTRA OF SINGULARITIES
In all situations described in Section 2, a “pointwise smoothness function” is associated to a given signal (this may be for example the H¨older exponent, the p-exponent or the size exponent). In the case where the signal is irregular, it is of course impossible to describe this function point by point. That is why one tries to obtain a statistical description, by determining only the dimensions of the sets of points with a given exponent. This collection of dimensions, indexed by the smoothness parameter is called the spectrum of singularities. Actually, two kinds of spectra are used, depending on whether one picks the Hausdorff or the packing dimension, see Theorems 5.3 and 5.4 for estimates on such spectra. In the next section, we will estimate the pp spectrum of BV functions. This p-spectrum d f (H ) is the Hausdorff dimension of the set of points whose p-exponent is H . If p = ∞, d ∞ f (H ) is simply denoted by d f (H ). It denotes the Hausdorff dimensions of the sets of points where the H¨older exponent is H , and is called the spectrum of singularities of f .
3.3. MULTIFRACTAL ANALYSIS OF BV FUNCTIONS
We saw that the space BV is currently used in order to provide a simple functional setting for “sketchy” images, i.e. images which consist of piecewise smooth pieces separated by lines of discontinuities which are piecewise smooth. This approach is orthogonal to the multifractal point of view; indeed, multifractal analysis makes no a priori assumption on the function considered and, therefore, is relevant also in the analysis of non smooth textures and irregular edges. In order to go beyond this remark, it is important to understand the implications of the BV assumption on the multifractal analysis of a function. They strongly depend on the number of variables of f ; therefore, though our main concern deals with functions defined on R2 , considering the general case of functions defined on Rd will explain some phenomena which, if dealt with only for d = 1 or 2, might appear as strange numerical coincidences. We start by recalling the alternative definitions of the space BV(Rd ). Let be an open subset of Rd and f ∈ L 1 (Rd ). By definition, 1 d |D f | = sup f div g, g = (g1 , . . . , gd ) ∈ C0 (, R ) and g∞ ≤ 1 ,
d ∂gi where div g = i=1 . ∂ xi This notation is justified as follows: An integration by parts shows that if f ∈ C 1 (Rd ), |D f | = |grad f |d x where grad f = ( ∂∂xf1 , . . . , ∂∂xfd ).
MULTIFRACTAL ANALYSIS OF IMAGES
181
Definition 3.4. Let ⊂ Rd , and f ∈ L 1 (Rd ); f belongs to BV () if |D f | < +∞.
Recall that the alternative definition is: f ∈ BV () if f ∈ L 1 () and grad f (defined in the sense of distributions) is a Radon vector-measure of finite mass. What is the correct setting in order to perform the multifractal analysis of a BV function? In dimension 1, the alternative definition in terms of Radon measures immediately shows that a BV function is bounded (indeed a Radon measure is the difference of two positive measures and the primitive of a positive measure of finite mass is necessarily bounded). Therefore, one can expect that the BV assumption has a consequence on the “usual” spectrum d f (H ) based on the H¨older exponent. On the other hand, if d > 1, then a BV function need not be locally bounded (consider for instance the function 1/(xα ) in a neighborhood of 0, for α small enough). A simple superposition argument shows that it may even be nowhere locally bounded; therefore, we cannot expect the BV assumption to yield any information concerning the “usual” spectrum of singularities in dimension 2 or more. The following Sobolev embeddings precisely determine for which values of p a BV function locally belongs to L p (see (Giusti, 1984)). d Proposition 3.5. ((Giusti, 1984)) Let d = d−1 (d is the conjugate exponent of d). If f ∈ BV (Rd ) then f d ≤ C(d) |D f |. (12)
Moreover, if B = B(x0 , r ) and f B = f − f B L
d
(B)
1 Vol(B)
B
f (x)d x,
≤ C(d)
|D f |.
(13)
B
Since (12) states that BV(Rd ) is embedded in L d (Rd ), we can infer from this proposition that the “right” value of p in order to study the pointwise smooothness of functions in BV (Rd ) is p = d . The following result actually gives estimates of the d -spectrum of BV functions. Theorem 3.6. Let f ∈ BV(Rd ). The d -spectrum of f satisfies
d df (H ) ≤ H + (d − 1). Proof of Theorem 3.6. If d = 1, f is the difference of two increasing functions. The theorem is a consequence of the classical bound d(H ) ≤ H for probability measures, see Brown et al. (1992) and the remark that, if H ≤ 1, the
182
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
size exponent of a positive measure and the H¨older exponent of its primitive coincide. We can therefore assume that d ≥ 2. We can clearly suppose that H ≤ 1. Let us consider f on the unit cube [0, 1]d and let j ≥ 0. We split this cube into 2d j dyadic cubes of width 2− j . If λ is a dyadic cube √ in F j , let T V (λ) denote the total variation of f on the ball Bλ = B(μλ , d2− j ) where μλ is the center of λ, i.e. TV(λ) = Bλ |D f |. Let δ > 0 and denote by A(δ, j) the set of λ’s such that TV(λ) ≥ 2−δ j and by ˜ N (δ, j) its cardinal. Since only a finite number C(d) of balls Bλ overlap,
TV(λ) ≤ C(d) |D f |. N (δ, j)2−δ j ≤ λ∈A(δ, j)
Therefore N (δ, j) ≤ C2δ j .
(14)
Let x0 be such that it only belongs to a finite number of A(δ, j). Let λ j (x0 ) −j denote the dyadic cube of width 2√ which contains x0 . For j large enough, −δ j TV(λ j (x0 )) ≤ 2 . If B = B(x0 , d2−( j+1) ), (13) implies that f − f B L d (B) ≤ C |D f | ≤ C |D f | ≤ C2−δ j ; B
Bλ
d d thus, using Proposition 2.4, f ∈ Tδ−d/d (x 0 ) (= Tδ−d+1 (x 0 )). Denote
Aδ = lim sup A(δ, j). j→+∞
The set Aδ consists of points that belong to an infinite number of sets A(δ, j). Then, (14) implies that dim H (Aδ ) ≤ δ. If x0 ∈ Aδ , we just showed that f ∈ d Tδ−d+1 (x0 ). It follows that the set of points of d -exponent δ − d + 1 is of Hausdorff dimension at most δ. In other words, d df (δ − d + 1) ≤ δ, hence Theorem 3.6 holds. Remark: Let us pick δ > d − 1 but arbitrarily close to d − 1. We saw that Aδ has dimension less than δ and if x0 ∈ / Aδ , then x0 belongs to Tαd for an α > 0 so that x0 is a Lebesgue point of f . It follows that, if f is a BV function, then the set of points which are not Lebesgue points of f has Hausdorff dimension at most d − 1. Related results are proved in Section 5.9 of Evans and Gariepy (1992) (see in particular Theorem 3). Theorem 3.6 only gives information on the d -exponent and cannot give additional information on q-regularity for q > d since a function of BV(Rd ) may nowhere be locally in L q for such values of q. However, images are just grey-levels at each pixel and therefore are encoded by functions that take values between 0 and 1. Therefore, a more realistic modelling is supplied by
183
MULTIFRACTAL ANALYSIS OF IMAGES
the assumption f ∈ BV ∩ L ∞ . Let us now see if this additional assumption allows us to derive an estimate on the q-spectrum. Lemma 3.7. Let f ∈ Tα (x0 ) ∩ L ∞ (Rd ) for some p ≥ 1 and let q satisfy p < q q < +∞. Then f ∈ Tαp/q (x0 ). p
Proof. By assumption, f − f Br L p (Br ) ≤ Cr α+d/ p , where Br denotes the ball B(x0 , r ). Let ω = qp , so that 0 < ω < 1; since f is bounded, by interpolation, f − f Br L q (Br ) ≤ (2 f ∞ )(1−ω) f − f Br ωL p (Br ) . Therefore, if β = αp/q, then f − f Br L q (Br ) ≤ Cr (α+d/ p)ω = Cr β+d/q . Corollary 3.8. Let f ∈ BV (Rd ) ∩ L ∞ (Rd ), and q ≥ d . The q-spectrum of f satisfies q
d f (H ) ≤
q H + (d − 1). d
Remark: Of course this inequality is relevant only when H ≤
d . q
Proof. We come back to the proof of Theorem 3.6. We proved that outside the d set Aδ , f belongs to Tδ−d+1 (x0 ). It follows from the previous lemma that f also q belongs to Tγ (x0 ) for γ = δ − dd dq = δdq − qd . Since Aδ is of dimension at most δ, the corollary follows just as the end of Theorem 3.6.
4. Topological and Geometric Properties of the Essential Boundary 4.1. ESSENTIAL BOUNDARY AND MODIFIED DOMAIN
The geometric quantities introduced in Section 2 do not change if $\Omega$ is replaced by another set $\tilde\Omega$, as long as the two sets differ by a set of measure 0. This is clear when we consider the function $\mathbb{1}_\Omega$ (viewed as an $L^p_{loc}$-function), the measure $\mathbb{1}_\Omega(x)\,dx$ or the indices $\alpha_w$, $\alpha_s$, $\beta_w$ and $\beta_s$. Therefore, the only points of the boundary that are pertinent to analyse from a "measure" point of view are those for which
$$ \forall r > 0, \quad \mathrm{Vol}(B(x_0, r) \cap \Omega) > 0 \quad\text{and}\quad \mathrm{Vol}(B(x_0, r) \cap \Omega^c) > 0. $$
This motivates the following definition.

Definition 4.1. (Essential boundary) Let $\Omega$ be a Borel subset of $\mathbb{R}^d$. Denote by $\partial_{ess}\Omega$ the set of points $x_0 \in \mathbb{R}^d$ such that, for every $r > 0$,
$$ \mathrm{Vol}(B(x_0, r) \cap \Omega) > 0 \quad\text{and}\quad \mathrm{Vol}(B(x_0, r) \cap \Omega^c) > 0. $$
The set $\partial_{ess}\Omega$ is called the essential boundary of $\Omega$.
It is clear that $\partial_{ess}\Omega \subset \partial\Omega$. More precisely, we have the following characterization of $\partial_{ess}\Omega$; recall that, if $A$ and $B$ are subsets of $\mathbb{R}^d$, then $A \Delta B = (A \cup B) \setminus (A \cap B)$.

Proposition 4.2. Let $x \in \mathbb{R}^d$. Then $x \in \partial_{ess}\Omega$ if and only if $x$ is a boundary point of every Borel set $\Omega'$ such that $\mathrm{Vol}(\Omega \Delta \Omega') = 0$.

Remark: In particular, $\partial_{ess}\Omega$ is a closed subset of $\mathbb{R}^d$.

Proof of Proposition 4.2. Let
$$ A = \bigcap_{\mathrm{Vol}(\Omega \Delta \Omega') = 0} \partial\Omega' . $$
It is clear that $\partial_{ess}\Omega \subset A$. Conversely, suppose for example that there exists $r > 0$ such that $\mathrm{Vol}(\Omega \cap B(x, r)) = 0$. Define $\Omega'$ by $\Omega' = \Omega \setminus B(x, r)$. Then $\mathrm{Vol}(\Omega \Delta \Omega') = 0$ and $x \notin \partial\Omega'$.

The essential boundary can also be defined as the support of the distribution $\mathrm{grad}(\mathbb{1}_\Omega)$. According to Proposition 4.2, it is natural to ask if there exists a modified Borel set $\tilde\Omega$ which is minimal, in the sense that $\mathrm{Vol}(\Omega \Delta \tilde\Omega) = 0$ and $\partial_{ess}\Omega = \partial\tilde\Omega$.

Proposition 4.3. (Modified domain) Let $\Omega$ be a Borel set in $\mathbb{R}^d$. There exists a Borel set $\tilde\Omega$ such that
$$ \mathrm{Vol}(\Omega \Delta \tilde\Omega) = 0 \quad\text{and}\quad \partial_{ess}\Omega = \partial\tilde\Omega . $$
In particular $\partial\tilde\Omega \subset \partial\Omega'$ for every $\Omega'$ such that $\mathrm{Vol}(\Omega \Delta \Omega') = 0$. The Borel set $\tilde\Omega$ is called the modified domain of $\Omega$.

Remark: This notion is implicit in many books of geometric measure theory; see for instance (Giusti, 1984), page 42. We can suppose in the following that $\Omega = \tilde\Omega$ and $\partial_{ess}\Omega = \partial\Omega$.

Proof of Proposition 4.3. Let $(B_n)_{n \in \mathbb{N}}$ be a sequence of open balls which is a base for the usual topology of $\mathbb{R}^d$. Let
$$ I^- = \{ n \in \mathbb{N} \,;\ \mathrm{Vol}(B_n \cap \Omega) = 0 \} \quad\text{and}\quad I^+ = \{ n \in \mathbb{N} \,;\ \mathrm{Vol}(B_n \cap \Omega^c) = 0 \} . $$
Observe that if $p \in I^-$ and $q \in I^+$, then $B_p \cap B_q = \emptyset$. Define
$$ \tilde\Omega = \Big( \Omega \setminus \bigcup_{n \in I^-} B_n \Big) \cup \bigcup_{n \in I^+} B_n . $$
It is clear that $\mathrm{Vol}(\Omega \Delta \tilde\Omega) = 0$. It remains to prove that $\partial\tilde\Omega \subset \partial_{ess}\Omega$. Let $x \in \partial\tilde\Omega$ and $r > 0$. Let $n$ be such that $x \in B_n \subset B(x, r)$. Since $x$ is in the closure of $\tilde\Omega$, $B_n \cap \tilde\Omega \ne \emptyset$; so $n \notin I^-$ and $\mathrm{Vol}(B_n \cap \Omega) > 0$. In the same way, $x$ is in the closure of $\tilde\Omega^c$ and $B_n \cap \tilde\Omega^c \ne \emptyset$; thus $n \notin I^+$ and $\mathrm{Vol}(B_n \cap \Omega^c) > 0$.
Finally $\mathrm{Vol}(B(x, r) \cap \Omega) > 0$ and $\mathrm{Vol}(B(x, r) \cap \Omega^c) > 0$, so that $x \in \partial_{ess}\Omega$.

We can also define the essential interior and the essential closure of $\Omega$ by
$$ \mathring\Omega^{ess} = \{ x \in \mathbb{R}^d \,;\ \exists r > 0 \,;\ \mathrm{Vol}(B(x, r) \cap \Omega^c) = 0 \} $$
and
$$ \overline\Omega^{ess} = \{ x \in \mathbb{R}^d \,;\ \forall r > 0,\ \mathrm{Vol}(B(x, r) \cap \Omega) > 0 \} . $$
They are respectively open and closed subsets of $\mathbb{R}^d$ and satisfy
$$ \partial_{ess}\Omega = \overline\Omega^{ess} \setminus \mathring\Omega^{ess} . $$
4.2. BALANCED POINTS
We now explore the topological properties of the sets of points of the essential boundary $\partial_{ess}\Omega$ for which either $\beta_w$ or $\beta_s$ vanishes. We begin with a definition which identifies natural subsets of the sets of points with accessibility 0.

Definition 4.4. Let $\Omega \subset \mathbb{R}^d$ be a Borel set and $x_0 \in \partial_{ess}\Omega$.
1. The point $x_0$ is strongly balanced if there exist $0 < \eta < 1/2$ and $r_0 > 0$ such that
$$ \forall r \le r_0, \quad \eta \le \frac{\mathrm{Vol}(B(x_0, r) \cap \Omega)}{\mathrm{Vol}(B(x_0, r))} \le 1 - \eta . $$
2. The point $x_0$ is weakly balanced if there exists $0 < \eta < 1/2$ such that
$$ \forall r_0 > 0,\ \exists r \le r_0 \,;\quad \eta \le \frac{\mathrm{Vol}(B(x_0, r) \cap \Omega)}{\mathrm{Vol}(B(x_0, r))} \le 1 - \eta . $$
We denote by $SB(\Omega)$ (resp. $WB(\Omega)$) the set of strongly (resp. weakly) balanced points in $\partial_{ess}\Omega$. It is clear that
$$ SB(\Omega) \subset \{ x_0 \in \partial_{ess}\Omega \,;\ \beta_s(x_0) = 0 \} \quad\text{and}\quad WB(\Omega) \subset \{ x_0 \in \partial_{ess}\Omega \,;\ \beta_w(x_0) = 0 \} . $$
Recall that Baire's theorem asserts that, in a complete metric space, a countable intersection of open dense sets is dense. A set which contains such an intersection is called generic.

Proposition 4.5. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ and $\partial_{ess}\Omega$ its essential boundary. The set $WB(\Omega)$ of weakly balanced points is generic in $\partial_{ess}\Omega$ (for the induced topology). As a consequence, the set of points $x_0 \in \partial_{ess}\Omega$ such that $\beta_w(x_0) = 0$ is generic in $\partial_{ess}\Omega$.

Proposition 4.6. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ and $\partial_{ess}\Omega$ its essential boundary. The set $SB(\Omega)$ of strongly balanced points is dense in $\partial_{ess}\Omega$. As a consequence, the set of points $x_0 \in \partial_{ess}\Omega$ such that $\beta_s(x_0) = 0$ is dense in $\partial_{ess}\Omega$.

Remark: It would be interesting to determine whether $SB(\Omega)$ is generic in $\partial_{ess}\Omega$.

Proof of Proposition 4.5. We first remark that Baire's theorem can be applied in $\partial_{ess}\Omega$ (because it is a closed subset of $\mathbb{R}^d$). Let $x_0 \in \partial_{ess}\Omega$ and $\varepsilon > 0$. Lebesgue's differentiability theorem, applied to the Borel function $f = \mathbb{1}_\Omega$, asserts that, for almost every $x \in \mathbb{R}^d$,
$$ \frac{\mathrm{Vol}(B(x, r) \cap \Omega)}{\mathrm{Vol}(B(x, r))} \longrightarrow f(x) \quad\text{when } r \longrightarrow 0 . $$
Recall that $\mathrm{Vol}(\{ x \in B(x_0, \varepsilon/2) \,;\ f(x) = 1 \}) > 0$ and $\mathrm{Vol}(\{ x \in B(x_0, \varepsilon/2) \,;\ f(x) = 0 \}) > 0$. We can then find $y_0, y_1 \in B(x_0, \varepsilon/2)$ such that
$$ \frac{\mathrm{Vol}(B(y_0, r) \cap \Omega)}{\mathrm{Vol}(B(y_0, r))} \ge \frac34 \quad\text{and}\quad \frac{\mathrm{Vol}(B(y_1, r) \cap \Omega)}{\mathrm{Vol}(B(y_1, r))} \le \frac14 $$
when $r$ is small enough. Let $y_t = t y_1 + (1 - t) y_0$. The intermediate value theorem, applied to the continuous function
$$ t \longmapsto \frac{\mathrm{Vol}(B(y_t, r) \cap \Omega)}{\mathrm{Vol}(B(y_t, r))}, $$
allows us to construct a point $x_1 \in B(x_0, \varepsilon/2)$ (which is equal to $y_t$ for some value of $t$) such that
$$ \frac{\mathrm{Vol}(B(x_1, r) \cap \Omega)}{\mathrm{Vol}(B(x_1, r))} = \frac12 . $$
Such an open ball $B(x_1, r)$ will be called a "perfectly balanced" ball. The connectedness of the ball $B(x_1, r)$ implies that it intersects $\partial_{ess}\Omega$ (remember that $\partial_{ess}\Omega$ is the topological boundary of the modified domain $\tilde\Omega$; see Proposition 4.3). Let $O_n$ be the union of all the open balls of radius $r \le 1/n$ that are "perfectly balanced". We have just seen that $O_n \cap \partial_{ess}\Omega$ is an open dense subset of $\partial_{ess}\Omega$. So $\bigcap_{n \ge 1} O_n \cap \partial_{ess}\Omega$ is a countable intersection of open dense subsets of the essential boundary $\partial_{ess}\Omega$. Moreover, if $x \in \bigcap_{n \ge 1} O_n \cap \partial_{ess}\Omega$,
we can find a sequence of points $x_n \in \mathbb{R}^d$ and a sequence of positive real numbers $r_n \le 1/n$ such that, for every $n \ge 1$,
$$ x \in B(x_n, r_n) \quad\text{and}\quad \mathrm{Vol}(B(x_n, r_n) \cap \Omega) = \frac12\, \mathrm{Vol}(B(x_n, r_n)) . $$
We then have
$$ 2^{-(d+1)}\, \mathrm{Vol}(B(x, 2r_n)) \le \mathrm{Vol}(B(x, 2r_n) \cap \Omega) \le (1 - 2^{-(d+1)})\, \mathrm{Vol}(B(x, 2r_n)), $$
which proves that $x \in WB(\Omega)$.
Proof of Proposition 4.6. We develop the same idea as in the proof of Proposition 4.5. We use the norm $\|\cdot\|_\infty$ instead of the Euclidean norm in $\mathbb{R}^d$, and we denote by $B_\infty(x, r)$ the "balls" related to this norm (which are cubes!). Let $x_0 \in \partial_{ess}\Omega$ and $\varepsilon > 0$. Using the same argument as in Proposition 4.5, we can find $x_1 \in B_\infty(x_0, \varepsilon/2)$ and $r \le \varepsilon/2$ such that
$$ \frac{\mathrm{Vol}(\overline B_\infty(x_1, r) \cap \Omega)}{\mathrm{Vol}(\overline B_\infty(x_1, r))} = \frac12 . $$
The closed cube $\overline B_\infty(x_1, r)$ can be divided into $2^d$ closed cubes of radius $r/2$ whose interiors do not overlap. Suppose that none of them is "perfectly balanced". We can then find two points $z_0, z_1$ such that $\overline B_\infty(z_0, r/2) \subset \overline B_\infty(x_1, r)$, $\overline B_\infty(z_1, r/2) \subset \overline B_\infty(x_1, r)$,
$$ \mathrm{Vol}(\overline B_\infty(z_0, r/2) \cap \Omega) > \frac12\, \mathrm{Vol}(\overline B_\infty(z_0, r/2)) $$
and
$$ \mathrm{Vol}(\overline B_\infty(z_1, r/2) \cap \Omega) < \frac12\, \mathrm{Vol}(\overline B_\infty(z_1, r/2)) . $$
Using once again the intermediate value theorem, we can construct a point $x_2$ (which is a barycenter of $z_0$ and $z_1$) such that $\overline B_\infty(x_2, r/2) \subset \overline B_\infty(x_1, r)$ and the cube $\overline B_\infty(x_2, r/2)$ is "perfectly balanced". Iterating this construction, we obtain a sequence of "perfectly balanced" cubes $\overline B_\infty(x_n, r 2^{-(n-1)})$ such that $\overline B_\infty(x_{n+1}, r 2^{-n}) \subset \overline B_\infty(x_n, r 2^{-(n-1)})$. Let $x_\infty = \lim_{n \to \infty} x_n$ and $0 < \rho \le r$. Let us denote by $n$ the integer such that
$$ r 2^{-n} < \frac{\rho}{\sqrt d} \le r 2^{-(n-1)} . $$
We observe that
$$ B(x_\infty, \rho) \supset B_\infty\big(x_\infty, \rho/\sqrt d\,\big) \supset B_\infty\big(x_\infty, r 2^{-n}\big) \supset B_\infty\big(x_{n+2}, r 2^{-(n+1)}\big) . $$
In other words, the ball $B(x_\infty, \rho)$ contains a "perfectly balanced" cube with length at least $\rho/2\sqrt d$. We deduce that
$$ \mathrm{Vol}(B(x_\infty, \rho) \cap \Omega) \ge \frac12 \left( \frac{\rho}{2\sqrt d} \right)^{\!d} \quad\text{and}\quad \mathrm{Vol}(B(x_\infty, \rho) \cap \Omega^c) \ge \frac12 \left( \frac{\rho}{2\sqrt d} \right)^{\!d} ; \qquad (15) $$
(15) asserts that $x_\infty \in SB(\Omega)$. Moreover, $\| x_0 - x_\infty \|_\infty \le \varepsilon$, and the proof is finished.

4.3. THE FRACTAL DIMENSION OF THE SET OF BALANCED POINTS
We first consider the dimension of the set of points of accessibility 0.

Theorem 4.7. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega \ne \emptyset$. Then
$$ \dim_P(WB(\Omega)) \ge d - 1 . $$
Remark: In particular, $\dim_P(\partial_{ess}\Omega) \ge d - 1$.

Let us begin with a lemma which is a slight modification of a well known result (see (Falconer, 1990) or (Heurteaux, 2003)); recall that $\Delta$ denotes the upper box dimension and that $\dim_P(G) = \inf\{ \sup_n \Delta(E_n) \,;\ G \subset \bigcup_{n \in \mathbb{N}} E_n \}$ (Tricot, 1982).

Lemma 4.8. Let $G$ be a nonempty subset of $\mathbb{R}^d$ which satisfies Baire's property (for the induced topology) and let $\delta > 0$. Suppose that for every $x \in G$ and every $r > 0$, $\Delta(G \cap B(x, r)) \ge \delta$. Then $\dim_P(G) \ge \delta$.

Proof. Suppose that $G \subset \bigcup_{n \in \mathbb{N}} E_n$ and denote by $\overline{E_n}$ the closure (in $\mathbb{R}^d$) of $E_n$. Baire's property implies that one of the related closed sets $\overline{E_n} \cap G$ has an interior point in $G$. Thus there exist $x \in G$, $r > 0$ and $n_0 \in \mathbb{N}$ such that $G \cap B(x, r) \subset \overline{E_{n_0}} \cap G$, so that
$$ \Delta(E_{n_0}) = \Delta(\overline{E_{n_0}}) \ge \Delta(\overline{E_{n_0}} \cap G) \ge \Delta(G \cap B(x, r)) \ge \delta, $$
and Lemma 4.8 follows.
Proof of Theorem 4.7. As in Section 4.2, let $O_n$ be the union of all "perfectly balanced" open cubes of radius $r \le 1/n$ and let $G = \bigcap_{n \ge 1} O_n \cap \partial_{ess}\Omega$; $G$ is a dense $G_\delta$ subset of the Baire space $\partial_{ess}\Omega$, so that it satisfies Baire's property. Moreover, $G \subset WB(\Omega)$. According to Lemma 4.8, it is sufficient to prove that for every $x \in G$ and every $r > 0$, $\Delta(G \cap B(x, r)) \ge d - 1$. Let $x \in G$ and $r > 0$. We can find $y \in \mathbb{R}^d$ and $\rho > 0$ such that the cube $B_\infty(y, \rho)$ is "perfectly balanced" and $x \in B_\infty(y, \rho) \subset B(x, r)$. Let us split the cube $B_\infty(y, \rho)$ into $2^{dj}$ cubes of width $2^{-j+1}\rho$, which we call $C_i$. We want to estimate the number $N_j$ of cubes $C_i$ that intersect $G$. For each cube $C_i$, let $\omega(C_i) = \mathrm{Vol}(C_i \cap \Omega)/\mathrm{Vol}(C_i)$. The mean of the $\omega(C_i)$ is $1/2$. So, at least one third of the $\omega(C_i)$ are greater than $1/4$
and one third of the $\omega(C_i)$ are lower than $3/4$. Now, there are two possibilities. Either one sixth of the cubes $C_i$ are such that $1/4 \le \omega(C_i) \le 3/4$; all those cubes intersect $G$ (see the proofs of Propositions 4.5 and 4.6), and $N_j \ge 2^{dj}/6$. Else, there are at least one sixth of the cubes such that $\omega(C_i) \le 1/4$ and one sixth of the cubes such that $\omega(C_i) \ge 3/4$. Let $A$ be the union of all the closed cubes $\overline{C_i}$ such that $\omega(C_i) \le 1/2$. Then
$$ \frac16\, \mathrm{Vol}(B_\infty(y, \rho)) \le \mathrm{Vol}(A) \le \frac56\, \mathrm{Vol}(B_\infty(y, \rho)) . $$
Isoperimetric inequalities (see for example (Ros, 2005)) ensure that the "surface" of the boundary of $A$ is at least $C\rho^{d-1}$. In particular, there exist at least $C(\rho)\, 2^{j(d-1)}$ couples of cubes $(C, C')$ such that $\overline C \cap \overline{C'} \ne \emptyset$, $\omega(C) \le 1/2$ and $\omega(C') \ge 1/2$. It follows that $C \cap G \ne \emptyset$ or $C' \cap G \ne \emptyset$ (an intermediate cube is "perfectly balanced"), so that $N_j \ge C\, 2^{j(d-1)}$. In either case, $N_j \ge C\, 2^{j(d-1)}$; so $\Delta(G \cap B(x, r)) \ge d - 1$.
5. Multifractal Properties of the Essential Boundary

5.1. CONSTRUCTION OF THE SCALING FUNCTION
We will construct a multifractal formalism based on the dyadic grid, whose purpose is to derive the Hausdorff (or packing) dimensions of the level sets of the functions $\alpha_w$ and $\alpha_s$. Recall that $F_n$ is the set of dyadic (semi-open) cubes of scale $n$; denote by $\lambda_n(x)$ the unique cube in $F_n$ that contains $x$. The following proposition is a simple consequence of the inclusions $B(x, 2^{-n}) \subset 3\lambda_n(x) \subset B(x, 3\sqrt d\, 2^{-n})$.

Proposition 5.1. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ and $x \in \partial_{ess}\Omega$. Then
$$ \alpha_w(x) + d = \liminf_{n \to +\infty} \frac{\log \mathrm{Vol}(3\lambda_n(x) \cap \Omega)}{-n \log 2}, \qquad \alpha_s(x) + d = \limsup_{n \to +\infty} \frac{\log \mathrm{Vol}(3\lambda_n(x) \cap \Omega)}{-n \log 2}. $$
Proposition 5.1 suggests introducing a scaling function as follows. Let $\Omega$ be a Borel set such that $\partial_{ess}\Omega$ is bounded and not empty; let
$$ S(q, n) = \sum_{\lambda \in F_n^*} \big( \mathrm{Vol}(3\lambda \cap \Omega) \big)^q, \quad\text{where } F_n^* = \{ \lambda \in F_n \,:\ \lambda \cap \partial_{ess}\Omega \ne \emptyset \}, $$
and
$$ \tau(q) = \limsup_{n \to +\infty} \frac{1}{n \log 2} \log\big( S(q, n) \big) . \qquad (16) $$
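For a digitized set, (16) can be estimated by direct box counting. The sketch below is a naive discretization (ours, not the authors' procedure): the image is identified with a subset $\Omega$ of $[0,1]^2$, volumes are pixel fractions, membership of a cube in $F_n^*$ is approximated by requiring that $3\lambda$ meets both $\Omega$ and $\Omega^c$, and the lim sup is replaced by a least-squares slope over the available scales.

```python
import numpy as np

def tau_estimate(mask, q_values, n_min=2, n_max=6):
    """Box-counting estimate of the scaling function (16) for a binary
    image identified with a subset Omega of the unit square (d = 2)."""
    mask = np.asarray(mask, dtype=bool)
    d, side = 2, mask.shape[0]
    log_S = {q: [] for q in q_values}
    scales = [n for n in range(n_min, n_max + 1) if side // 2 ** n >= 1]
    for n in scales:
        k = side // 2 ** n                 # pixels per dyadic cell side
        S = {q: 0.0 for q in q_values}
        for i in range(2 ** n):
            for j in range(2 ** n):
                # 3*lambda: the cell enlarged by one cell on every side
                # (clipped at the image border)
                win = mask[max(0, (i - 1) * k):(i + 2) * k,
                           max(0, (j - 1) * k):(j + 2) * k]
                if win.any() and not win.all():   # crude test for F_n^*
                    vol = win.mean() * (3 * 2.0 ** -n) ** d  # ~ Vol(3λ ∩ Ω)
                    for q in q_values:
                        S[q] += vol ** q
        for q in q_values:
            log_S[q].append(np.log2(S[q]))
    ns = np.asarray(scales, dtype=float)
    # the slope of log2 S(q, n) versus n plays the role of the lim sup in (16)
    return {q: np.polyfit(ns, np.asarray(log_S[q]), 1)[0] for q in q_values}

# Example: a disc in a 256 x 256 image; expect tau(q) close to 1 - 2q.
y, x = np.mgrid[:256, :256]
disc = (x - 128) ** 2 + (y - 128) ** 2 < 100 ** 2
print(tau_estimate(disc, q_values=[0.0, 1.0, 2.0]))
```

For a set with a smooth boundary one expects $\tau(q) \approx (d-1) - dq$, the line on which the two bounds of Theorem 5.2 below coincide.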
The function $\tau$ is decreasing and convex. The standard justification of the multifractal formalism runs as follows. First, the contribution to $S(q, n)$ of the set of points where the (weak or strong) accessibility exponent takes a given value $\alpha$ is estimated: if the dimension of this set is $d(\alpha)$, then there are about $2^{d(\alpha) n}$ dyadic cubes in $F_n^*$ which cover this set, and such a cube satisfies $\mathrm{Vol}(3\lambda \cap \Omega) \sim 2^{-\alpha n}$. Therefore the order of magnitude of the contribution we look for is $2^{-(\alpha q - d(\alpha)) n}$. When $n \to +\infty$, the preponderant contribution is clearly obtained for the value of $\alpha$ that minimizes the exponent $\alpha q - d(\alpha)$; thus $\tau(q) = \sup_\alpha \big( d(\alpha) - \alpha q \big)$. If $d(\alpha)$ is a concave function, then this formula can be inverted, and $d(\alpha)$ is recovered from $\tau(q)$ by an inverse Legendre transform:
$$ d(\alpha) = \inf_q \big( \alpha q + \tau(q) \big) . $$
The multifractal formalism holds if, indeed, this relationship between the scaling function and the spectrum of singularities holds. We give in Section 5.3 some results in this direction.

Remark 1: The factor 3 in the definition of $S(q, n)$ is not always used in the derivation of the multifractal formalism for measures; however, it improves its range of validity, as shown by Riedi (1995). The novelty in our derivation is the restriction of the sum to the cubes $\lambda$ such that $\lambda \cap \partial_{ess}\Omega \ne \emptyset$; this allows us to eliminate all the points in $\mathring\Omega^{ess}$ and in $\mathbb{R}^d \setminus \overline\Omega^{ess}$.
Remark 2: In (Testud, 2006), Testud already introduced such a "restricted" scaling function. In the context of his paper, a strange Cantor set $K$ perturbs the multifractal analysis of the measure, and the multifractal formalism breaks down at different levels. Testud introduces the scaling function $\tau_K$, in which the sum is restricted to the dyadic intervals that meet the Cantor set $K$, and proves that for all the "bad exponents" the dimension of the level set is given by the Legendre transform $\tau_K^*$.

5.2. PROPERTIES OF THE SCALING FUNCTION
Theorem 5.2. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega$ is nonempty and bounded. Define $\tau(q)$ as in (16). The following properties hold.
1. $\tau(0) = \Delta(\partial_{ess}\Omega)$ and $\forall q \ge 0$, $\tau(q) \le \Delta(\partial_{ess}\Omega) - dq$.
2. $\forall q \ge 0$, $\tau(q) \ge d - 1 - dq$.
3. $\forall q \in \mathbb{R}$, $\dim_P(SB(\Omega)) \le \tau(q) + dq$.
4. $\forall q \in \mathbb{R}$, $\dim_H(WB(\Omega)) \le \tau(q) + dq$.
Proof of Theorem 5.2.
1. If $\lambda \cap \partial_{ess}\Omega \ne \emptyset$, then $\mathrm{Vol}(3\lambda \cap \Omega) > 0$ and $(\mathrm{Vol}(3\lambda \cap \Omega))^0 = 1$; thus $\tau(0) = \Delta(\partial_{ess}\Omega)$. More precisely, if $q > 0$, then
$$ \sum_{\lambda \in F_n^*} \big( \mathrm{Vol}(3\lambda \cap \Omega) \big)^q \le \mathrm{Card}(F_n^*)\, (3 \cdot 2^{-n})^{dq} ; $$
it follows that $\tau(q) \le \Delta(\partial_{ess}\Omega) - dq$.
2. If $n$ is large enough, using a similar argument as in Theorem 4.7, we can find at least $c\, 2^{(d-1)n}$ cubes in $F_n^*$ which are "quite balanced". These cubes satisfy $\mathrm{Vol}(3\lambda \cap \Omega) \sim 2^{-dn}$, and the inequality follows.
3. It is easy to see that $x_0 \in SB(\Omega)$ if and only if there exist $0 < \eta < 1/2$ and $n_0$ such that
$$ \forall n \ge n_0, \quad \eta \le \frac{\mathrm{Vol}(3\lambda_n(x_0) \cap \Omega)}{(3 \cdot 2^{-n})^d} \le 1 - \eta . \qquad (17) $$
Let $U_{n_0, \eta}$ denote the set of points that satisfy (17), and let $\alpha < \dim_P(SB(\Omega))$. We can find $p, n_0 \in \mathbb{N}^*$ such that $\Delta(U_{n_0, 1/p}) \ge \dim_P(U_{n_0, 1/p}) > \alpha$. If $N_k$ is the number of cubes $\lambda \in F_k$ needed to cover $U_{n_0, 1/p}$, then $N_k \ge 2^{k\alpha}$ infinitely often. Suppose $q > 0$ (the proof is similar if $q < 0$). We get
$$ \sum_{\lambda \in F_k^*} \big( \mathrm{Vol}(3\lambda \cap \Omega) \big)^q \ge N_k \left( \frac1p\, (3 \cdot 2^{-k})^d \right)^{\!q} \ge \frac{3^{dq}}{p^q}\, 2^{k(\alpha - dq)} $$
infinitely often. We conclude that $\tau(q) \ge \alpha - dq$.
4. Note that $x_0 \in WB(\Omega)$ if and only if there exists $0 < \eta < 1/2$ such that
$$ \forall n_0,\ \exists n \ge n_0 \,;\quad \eta \le \frac{\mathrm{Vol}(3\lambda_n(x_0) \cap \Omega)}{(3 \cdot 2^{-n})^d} \le 1 - \eta . \qquad (18) $$
Let $V_\eta$ denote the set of points that satisfy (18). Let $p \in \{2, 3, \ldots\}$, $n_0 \in \mathbb{N}^*$, and suppose that $q > 0$ (the proof is similar if $q < 0$). We can cover $V_{1/p}$ with cubes of scale $n \ge n_0$ such that $\mathrm{Vol}(3\lambda \cap \Omega) \ge \frac1p (3 \cdot 2^{-n})^d$. Let $R$ be such a covering and let $\tau > \tau(q)$. We have
$$ \sum_{\lambda \in R} \mathrm{diam}(\lambda)^{\tau + dq} \le C \sum_{\lambda \in R} \big( \mathrm{Vol}(3\lambda \cap \Omega) \big)^q\, \mathrm{diam}(\lambda)^{\tau} \le C \sum_{n \ge n_0} \sum_{\lambda \in F_n^*} \big( \mathrm{Vol}(3\lambda \cap \Omega) \big)^q\, 2^{-n\tau} . $$
Moreover, if $\tau > \tau' > \tau(q)$ and $n_0$ is sufficiently large, then
$$ \sum_{\lambda \in F_n^*} \big( \mathrm{Vol}(3\lambda \cap \Omega) \big)^q \le 2^{n\tau'} . $$
It follows that
$$ \sum_{\lambda \in R} \mathrm{diam}(\lambda)^{\tau + dq} \le C \sum_{n \ge n_0} 2^{n(\tau' - \tau)} \le \frac{C}{1 - 2^{\tau' - \tau}} . $$
We conclude that $\dim_H(V_{1/p}) \le \tau + dq$, and $\dim_H(WB(\Omega)) \le \tau + dq$.
5.3. THE MULTIFRACTAL FORMALISM ASSOCIATED WITH $\partial_{ess}\Omega$
The proofs of points 3 and 4 in Theorem 5.2 allow us to obtain estimates of the dimensions of the level sets of the accessibility indices.

Theorem 5.3. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega$ is nonempty and bounded. Define $\tau(q)$ as in (16). If $\alpha \ge 0$, let
$$ E^w_\alpha = \{ x \in \partial_{ess}\Omega \,;\ \alpha_w(x) \le \alpha \} \quad\text{and}\quad E^s_\alpha = \{ x \in \partial_{ess}\Omega \,;\ \alpha_s(x) \le \alpha \} . $$
For every $q > 0$,
$$ \dim_H(E^w_\alpha) \le (d + \alpha) q + \tau(q) \quad\text{and}\quad \dim_P(E^s_\alpha) \le (d + \alpha) q + \tau(q) . $$
In particular, if $\alpha + d \le -\tau'_-(0)$, then
$$ \dim_H(E^w_\alpha) \le \tau^*(\alpha + d) \quad\text{and}\quad \dim_P(E^s_\alpha) \le \tau^*(\alpha + d) . $$
The proof uses the same ideas as in Theorem 5.2 and requires introducing the set of points $x \in \partial_{ess}\Omega$ such that $\mathrm{Vol}(3\lambda_n(x) \cap \Omega) \ge 2^{-n(\alpha + d + \varepsilon)}$ infinitely often (resp. for $n$ large enough). In the same way, we can also prove the following twin result.

Theorem 5.4. Let $\Omega$ be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega$ is nonempty and bounded. Define $\tau(q)$ as in (16). If $\alpha \ge 0$, let
$$ F^w_\alpha = \{ x \in \partial_{ess}\Omega \,;\ \alpha_w(x) \ge \alpha \} \quad\text{and}\quad F^s_\alpha = \{ x \in \partial_{ess}\Omega \,;\ \alpha_s(x) \ge \alpha \} . $$
For every $q < 0$,
$$ \dim_P(F^w_\alpha) \le (d + \alpha) q + \tau(q) \quad\text{and}\quad \dim_H(F^s_\alpha) \le (d + \alpha) q + \tau(q) . $$
In particular, if $\alpha + d \ge -\tau'_+(0)$,
$$ \dim_P(F^w_\alpha) \le \tau^*(\alpha + d) \quad\text{and}\quad \dim_H(F^s_\alpha) \le \tau^*(\alpha + d) . $$
Remark 1: The set $E^s_\alpha$ (resp. $F^w_\alpha$) is quite similar to the set of strong $\alpha$-accessible points (resp. weak $\alpha$-accessible points).

Remark 2: The results in Theorems 5.3 and 5.4 are standard multifractal inequalities adapted to the context of boundaries (see (Brown et al., 1992)).
References

Arneodo, A., Audit, B., Decoster, N., Muzy, J.-F., and Vaillant, C. (2002) Wavelet-based Multifractal Formalism: Applications to DNA Sequences, Satellite Images of the Cloud Structure and Stock Market Data, In A. Bunde, J. Kropp, and H. J. Schellnhuber (eds.), The Science of Disasters, New York, Springer, pp. 27–102.
Bordenave, C., Gousseau, Y., and Roueff, F. (in press) The Dead Leaves Model: A General Tessellation Modelling Occlusion, Advances in Applied Probability.
Brown, G., Michon, G., and Peyrière, J. (1992) On the Multifractal Analysis of Measures, Journal of Statistical Physics 66, 775–790.
Calderón, A. P. and Zygmund, A. (1961) Local Properties of Solutions of Elliptic Partial Differential Equations, Studia Mathematica 20, 171–227.
Evans, L. and Gariepy, R. (1992) Measure Theory and Fine Properties of Functions, Boca Raton, FL, CRC Press.
Falconer, K. (1990) Fractal Geometry: Mathematical Foundations and Applications, New York, John Wiley & Sons Ltd.
Giusti, E. (1984) Minimal Surfaces and Functions of Bounded Variation, Basel, Birkhäuser.
Gousseau, Y. and Morel, J.-M. (2001) Are Natural Images of Bounded Variation?, SIAM Journal on Mathematical Analysis 33, 634–648.
Heurteaux, Y. (2003) Weierstrass Functions with Random Phases, Transactions of the American Mathematical Society 355, 3065–3077.
Jaffard, S. (2004) Wavelet Techniques in Multifractal Analysis, In M. Lapidus and M. van Frankenhuijsen (eds.), Fractal Geometry and Applications: A Jubilee of Benoît Mandelbrot, Vol. 72, Proceedings of Symposia in Pure Mathematics, AMS, pp. 91–152.
Jaffard, S. (2006) Wavelet Techniques for Pointwise Regularity, Annales de la Faculté des Sciences de Toulouse 15, 3–33.
Jaffard, S. and Melot, C. (2005) Wavelet Analysis of Fractal Boundaries, Part 1: Local Regularity and Part 2: Multifractal Formalism, Communications in Mathematical Physics 258, 513–565.
Jaffard, S., Meyer, Y., and Ryan, R. (2001) Wavelets: Tools for Science and Technology, S.I.A.M.
Matheron, G. (1975) Random Sets and Integral Geometry, New York, John Wiley and Sons.
Meyer, Y. (2001) Oscillating Patterns in Image Processing and Nonlinear Evolution Equations, University Lecture Series 22, AMS.
Naud, C., Schertzer, D., and Lovejoy, S. (1997) Radiative Transfer in Multifractal Atmospheres: Fractional Integration, Multifractal Phase Transitions and Inversion Problems, Vol. 85, New York, Springer, IMA Vol. Math. Appl., pp. 239–267.
Parisi, G. and Frisch, U. (1985) On the Singularity Spectrum of Fully Developed Turbulence, In Turbulence and Predictability in Geophysical Fluid Dynamics, Proceedings of the International Summer School in Physics Enrico Fermi, North Holland, pp. 84–87.
Riedi, R. (1995) An Improved Multifractal Formalism and Self-similar Measures, Journal of Mathematical Analysis and Applications 189, 462–490.
Ros, A. (2005) The Isoperimetric Problem, In Global Theory of Minimal Surfaces, Clay Mathematics Proceedings 2, Providence, RI, American Mathematical Society, pp. 175–209.
Testud, B. (2006) Phase Transitions for the Multifractal Analysis of Self-similar Measures, Nonlinearity 19, 1201–1217.
Tricot, C. (1982) Two Definitions of Fractional Dimension, Mathematical Proceedings of the Cambridge Philosophical Society 91, 57–74.
Ziemer, W. (1989) Weakly Differentiable Functions, New York, Springer.
CHARACTERIZATION AND CONSTRUCTION OF IDEAL WAVEFORMS
Myoung An and Richard Tolimieri Prometheus Inc., 17135 Front Beach Road, Unit 9, Panama City Beach, FL 32413
Abstract. Using the finite Zak transform (FZT), periodic polyphase sequences satisfying the ideal cyclic autocorrelation property and sequence pairs satisfying the optimum cyclic cross-correlation property are constructed. The Zak space correlation formula plays a major role in the design of signals having special correlation properties. Sequences defined by permutation matrices in Zak space always satisfy the ideal cyclic autocorrelation property. A necessary and sufficient condition on pairs of permutation matrices in Zak space, such that the corresponding pairs of sequences satisfy the optimum cyclic correlation property, is derived. This condition is given in terms of a collection of permutations, called ∗-permutations. The development is restricted to the case $N = L^2$, $L$ an odd integer. An algorithm for constructing ∗-permutations is provided.
1. Introduction

In this work we use the finite Zak transform (FZT) to construct periodic polyphase sequences satisfying the ideal cyclic autocorrelation property and sequence pairs satisfying the optimum cyclic cross-correlation property. In communication theory these sequences are called $Z_q$-sequences. Their correlation properties have been extensively studied in Chu (1972), Frank and Zadoff (1973), Heimiller (1961), Popovic (1992), and Suehiro and Hatori (1988). A detailed treatment of $Z_q$-sequences and their application to spread spectrum can be found in Mow (1995). The approach in these works has the merit that it is direct. The approach in this work is based on designing periodic sequences in Zak space (ZS). The ZS correlation formula plays a major role in the design of signals having special correlation properties. Sequences defined by permutation matrices in ZS always satisfy the ideal cyclic autocorrelation property. The analogous result, proved directly in communication theory, can be found in Frank and Zadoff (1973) and Heimiller (1961). We will derive a necessary and sufficient condition on pairs of
permutation matrices in ZS such that the corresponding pairs of sequences satisfy the optimum cyclic correlation property. This condition is given in terms of a collection of permutations, called ∗-permutations. The development is restricted to the case $N = L^2$, $L$ an odd integer. We will show examples and provide an algorithm for constructing ∗-permutations. We believe this concept to be new. The main consequence is that we can construct significantly larger collections of sequence pairs satisfying the optimal correlation property than those constructed in Popovic (1992).

The following notation and elementary results will be used throughout this work. A vector $x \in \mathbb{C}^N$ is written
$$ x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{bmatrix} = [x_n]_{0 \le n < N} . $$
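The autocorrelation fact recalled above can be checked numerically. The sketch below is ours, not the chapter's construction: it assumes the standard finite Zak transform convention for $N = L^2$, in which $Z_x(j, k) = \sum_{m=0}^{L-1} x(j + mL)\, e^{-2\pi i k m / L}$, so that a sequence whose ZS representation is $L$ times a permutation matrix, $Z(j, k) = L\,\delta_{k, \sigma(j)}$, is recovered row by row as $x(j + mL) = e^{2\pi i \sigma(j) m / L}$ (the identity permutation recovers the classical Frank sequence). The script verifies the ideal cyclic autocorrelation property, $a(0) = N$ and $a(\tau) = 0$ for $\tau \ne 0$.

```python
import numpy as np

def sequence_from_permutation(sigma):
    """Polyphase sequence of length N = L^2 whose finite Zak transform
    is L times the permutation matrix Z[j, k] = L * delta(k, sigma[j]);
    inverting Z row by row gives x[j + m L] = exp(2 pi i sigma[j] m / L)."""
    L = len(sigma)
    m = np.arange(L)
    x = np.empty(L * L, dtype=complex)
    for j in range(L):
        x[j::L] = np.exp(2 * np.pi * 1j * sigma[j] * m / L)
    return x

def cyclic_autocorrelation(x):
    # a[tau] = sum_n x[n] * conj(x[n - tau mod N]), computed via the FFT
    X = np.fft.fft(x)
    return np.fft.ifft(X * np.conj(X))

L = 5                                    # odd, N = L^2 = 25
sigma = np.random.default_rng(0).permutation(L)
x = sequence_from_permutation(sigma)
a = cyclic_autocorrelation(x)
assert abs(a[0] - L * L) < 1e-9          # a(0) = N
assert np.all(np.abs(a[1:]) < 1e-9)      # ideal: a(tau) = 0 for tau != 0
```

Any permutation passes this test, which is the numerical counterpart of the statement that sequences defined by permutation matrices in ZS always satisfy the ideal cyclic autocorrelation property.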
Figure 3. […] (M > 1.7) before the beginning of EM experiments (1975–1983), during experiments (1983–1988) and after experiments (1988–1992).

Figure 4. Recurrence plots analysis of waiting times sequences: (a) before experiments, (b) during experiments, (c) after experiments.

during experiments. On the other hand, opposite to what was mentioned above, after cessation of the experiments (Figure 3, triangles) the correlation dimension of the waiting times sequences noticeably increases (d2 > 5.0), exceeding the low-dimensional threshold (d2 = 5.0). That means that after termination of the experiments the extent of regularity, or extent of determinism, in the process of earthquake temporal distribution decreases. The process becomes much more random, both qualitatively (Figure 4) and quantitatively (Figure 3, triangles). For comparison, results for a random number sequence are also shown in Figure 3 (diamonds). The low correlation dimension found for the integral waiting time series in central Asia is in good agreement with the above results on the low-dimensional dynamical structure of earthquake temporal distribution for other seismoactive regions (Goltz, 1998; Matcharashvili et al., 2000). On the other hand, Figure 3 shows that a relatively short ordered sequence can decrease the correlation dimension of a long integral time series which contains much larger sequences of high dimension. Thus a low value of d2 for a long time series does not mean that the whole sequence is ordered; the ordering can be intermittent. The data on correlation dimension, together with the qualitative RP results shown in Figure 4, provide evidence that after the beginning of the EM discharges the dynamics of the temporal distribution of earthquakes around the IVTAN test area undergoes essential changes; it becomes more regular, or the events of the corresponding time series become functionally much more interdependent. These results were tested against the possible influence of noise, using a noise reduction procedure (Schreiber, 1993; Kantz and Schreiber, 1997), as well as against trends or non-stationarity in the interevent data sets (Goltz, 1998). The tests confirm our conclusions. Thus the changes before, during and after the experiments (Figure 3) are indeed related to the dynamics of the temporal
distribution of earthquakes caused by the external anthropic influence (MHD discharges) (Chelidze and Matcharashvili, 2003). Subsequently, in order to have a basis for a more reasonable rejection of spurious conclusions caused by possible linear correlations in the data sets considered, we have used the surrogate data approach to test the null hypothesis that our time series are generated by a linear stochastic process (Theiler et al., 1992; Rapp et al., 1993, 1994; Kantz and Schreiber, 1997). Precisely, PR and GSRP surrogate sets for the waiting times series were used (Chelidze and Matcharashvili, 2003). Surrogate testing of the waiting time sequences before (a) and during (b) experiments, using d2 as a discriminating metric for each of our data sequences, has been carried out; 75 PR and GSRP surrogates were generated. The significance criterion S for the time series analyzed before experiments is 22.4 ± 0.2 for PR and 5.1 ± 0.7 for GSRP surrogates. After the beginning of experiments the null hypothesis that the original time series is linearly correlated noise was rejected with significance criterion S: 39.7 ± 0.8 for PR and 6.0 ± 0.5 for GSRP surrogates. These results can be considered strong enough evidence that the analyzed time series are not linear stochastic noise. The above conclusion about the increase of regularity in the earthquakes' temporal distribution after the beginning of experiments (external influence on the seismic process) is confirmed also by the results of RQA; namely, RR(t) = 9.6 and DET(t) = 3.9 before experiments, RR(t) = 25 and DET(t) = 18 during, and RR(t) = 3 and DET(t) = 1.5 after experiments. It was shown in our previous research that small earthquakes play a very important role in the general dynamics of earthquake temporal distribution (Matcharashvili et al., 2000). This is why we have also carried out the analysis of time series containing all data available from the entire catalogue of waiting time sequences, including small earthquakes that are below the magnitude threshold. This test is also valid for checking the robustness of the results when a new, not necessarily complete, set of data is added to our original set. The total number of events in the whole catalogue increased to 14,100, while the complete catalogue contained about 4000 data for each of the three above-mentioned periods (before, during and after the MHD experiments). Both the complete and the whole catalogues of waiting time sequences reveal a low dimensional nonlinear structure in the temporal distribution of earthquakes before, and especially during, experiments. This means that our results on the influence of hot and cold EM runs on the general characteristics of the earthquakes' temporal distribution dynamics remain valid for small earthquakes too. Thus the conclusion drawn from this analysis is that the anthropogenic external influence, in the form of strong electromagnetic pulses, invokes distinct changes in the dynamics of the earthquakes' temporal distribution: the dynamical characteristics of regional seismic processes are changed. Though the energy released during these experiments
Figure 5. (a) Variation of water level in Enguri high dam reservoir above sea level in 1978–1995. (b) RQA %DET of daily number of earthquakes calculated for consecutive one year sliding windows, (c) Cumulative sum of released daily seismic energy. (d) RQA %DET of magnitude (black columns) and interearthquake time interval (grey columns) sequences (1) before impoundment, (2) during flooding and reservoir filling and (3) periodic change of water level in reservoir.
was very small compared to the energy of even small earthquakes, it still can be considered a strong enough man-made impact. It was recently found that it is possible to control the dynamics of complex systems through small external periodic influences. In this respect we have investigated the possible influence of periodic water level variation on the dynamics of regional seismic activity. For this purpose, the data sets of the water level variation in the Western Georgia Enguri high dam reservoir and the seismicity data sets for the surrounding area for 1973–1995 were investigated. The height of the dam is 272 m and the (average) volume of water in the reservoir is 1.1 × 10^9 m³. The Enguri reservoir was built in 1971–1983. Preliminary flooding of the territory started at the end of December 1977; since 15 April 1978 the reservoir was filled step by step up to the 510 m mark (above sea level). Since 1987 the water level in the reservoir has been changing seasonally, almost periodically. Thus we have defined three distinct periods for our analysis, namely: (i) before impoundment, (ii) flooding and reservoir filling, and (iii) periodic change of the water level. Figure 5a shows the daily record of the water level in the Enguri dam reservoir for the period 1978–1995. The relevant size of the region to be investigated, i.e. the area around the Enguri high dam which can be considered sensitive to the reservoir influence (90 km), was evaluated using the energy release acceleration analysis approach (Bowman et al., 1998), as described in (Peinke et al., 2006). The number of earthquakes which occurred above the representative threshold (1200) was too small to carry out a correct correlation dimension and Lyapunov exponent calculation for these three periods, so the RQA calculation was carried out instead. It was shown that when the external influence on the earth's crust caused by the reservoir water becomes periodic, the extent of regularity of the daily distribution of earthquakes essentially increases (see Figure 5a and b). It is also clear that during flooding and non-regular reservoir filling the amount of released seismic energy increases (Figure 5a and c), in complete accordance with well-known concepts of reservoir-induced seismicity (Talwani, 1997; Simpson et al., 1988). It is important to mention that the influence of an increasing amount of water and its subsequent periodic variation essentially affects the character of the earthquakes' magnitude and temporal distribution (see Figure 5d). In particular, the extent of order in the earthquakes' magnitude distribution substantially increases when the external influence becomes periodic (black columns). At the same time, the dynamics of the earthquakes' temporal distribution changed even under irregular influence, though not as much as under periodic influence. Similar conclusions are drawn from other RQA measures. All these results indicate that a slow periodic influence (loading and unloading) may change the dynamical properties of local seismicity. Assuming the possibility of controlling the dynamics of seismic processes by the small periodic influence of the water level variation, the following question might arise: if there exists the control
caused by periodic reservoir loading, why do earthquakes correlate only weakly with the solid earth tides (see Vidale et al., 1998; Beeler and Lockner, 2003)? A possible explanation is that the strength of the control depends not only on the amplitude of the forcing, but also on its frequency. If both parameters are varied, the synchronization area between the external periodic influence and the seismic activity forms the so-called Arnold's tongue, with some preferred frequency that needs minimal forcing (Pikovsky et al., 2003; Chelidze et al., 2005). At larger or smaller frequencies the forcing needed to achieve synchronization increases drastically, and at large deviations from the optimal frequency synchronization becomes impossible. According to (Beeler and Lockner, 2003), the daily tidal stress changes too fast in comparison with the characteristic time of moderate seismic event nucleation, which is of the order of months or years. On the other hand, the period of the loading exerted by the reservoir exploitation is one year, which is close to the characteristic time of significant earthquake preparation. Therefore, the synchronizing effect of periodic reservoir loading on seismic events can be accepted as a realistic physical mechanism for the related dynamical changes.
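The RR and DET percentages quoted in this section come from recurrence quantification analysis. As a minimal self-contained illustration (our sketch, not the implementation used for the figures; the embedding parameters, the recurrence threshold and the minimal line length are the usual free choices of the method), one can build the recurrence matrix of a scalar series and count recurrence points and diagonal line structures:

```python
import numpy as np

def embed(x, dim, tau):
    """Time-delay embedding (Packard et al., 1980; Takens, 1981)."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def rqa_rr_det(x, dim=3, tau=1, eps=None, lmin=2):
    """Recurrence rate (RR) and determinism (DET) of a scalar series."""
    y = embed(np.asarray(x, dtype=float), dim, tau)
    dist = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    if eps is None:                       # a common heuristic threshold
        eps = 0.1 * dist.max()
    R = dist < eps
    np.fill_diagonal(R, False)            # ignore the trivial main diagonal
    rr = R.mean()
    # DET: fraction of recurrence points on diagonals of length >= lmin
    n = len(R)
    on_lines = 0
    for k in range(-(n - 1), n):
        run = 0
        for v in list(np.diagonal(R, k)) + [False]:   # sentinel flushes run
            if v:
                run += 1
            else:
                if run >= lmin:
                    on_lines += run
                run = 0
    det = on_lines / max(R.sum(), 1)
    return 100 * rr, 100 * det            # percentages, as in the text

# A periodic signal scores a higher DET than a shuffled copy of itself.
t = np.linspace(0, 20 * np.pi, 400)
print(rqa_rr_det(np.sin(t)))
print(rqa_rr_det(np.random.default_rng(1).permutation(np.sin(t))))
```

Shuffling destroys the diagonal structures and DET drops, which is the behaviour exploited throughout this chapter.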
The above results provide serious arguments that, under external periodic influence, the dynamics of natural seismicity can be changed. This is very important from both the scientific and engineering points of view. At the same time, real field seismic data often are too short and incomplete to draw unambiguous conclusions on related complex dynamics. Therefore, in order to test our field data results, similar analysis on the acoustic emission data sets, during stick-slip, has been carried out (Chelidze and Matcharashvili, 2003; Chelidze et al., 2005). Acoustic mission accompanying stick-slip experiments is considered as a model of a natural seismic process (Johansen and Sornette, 1999; Rundle et al., 2000). Our laboratory setup consisted of two samples of roughly finished basalt. One of the samples (plates),the lower one, was fixed; the upper one was pulled with constant speed by the special mover, connected to the sample by the rope-spring system; acoustic and EM emissions accompanying the slip were registered. The following cases were studied: i) pulling the sample without any additional impact; ii) the slip with applied additional weak mechanical (periodic) impact; iii) slip with applied periodic EM field. Thus, experiments have been carried out under or without periodic mechanical or electromagnetic (EM) forcing, which simulates the external periodic influence. If the large pulling force can be modulated by a weak periodic force of EM or mechanical nature, this could show high sensitivity of critical or “nearly critical” systems to small external impact. As was mentioned the
Figure 6. Dynamical changes in temporal distribution of acoustic emission under increasing external periodic forcing.
aim was to prove experimentally the possibility of controlling the slip dynamical regime by a weak mechanical or EM impact. The elementary theoretical model of EM coupling with friction can be formulated in the following way. It is well known that the application of an EM field to a dielectric invokes forces acting upon the molecules of the body; their resultant is called the ponderomotive or electrostriction force Fp, and it affects the whole sample. The force is proportional to the gradient of the squared field intensity and pulls the sample in the direction of the largest intensity. Quite different regimes of the time distribution of the maximal amplitudes of acoustic emission were observed, depending on the intensity of the applied external weak (relative to the pulling force) perturbations. As shown in Figure 6, increased periodic forcing first leads to a more regular temporal distribution, while a further increase decreases the extent of regularity. This experiment confirms the above field results on the possible controlling effect of an external periodic influence on the seismic process.

3.2.1. Changes in Dynamics of Water Level Variation During Increased Regional Seismic Activity

As the next example of using phase space structure testing methods for the detection of dynamical changes in earthquake-related natural processes, we present results of our investigation of water level variation in deep boreholes. Variation of the water level in deep boreholes is caused by a number of endogenous and exogenous factors.
Figure 7. Variation of the monthly extent of regularity in water level data sets observed in deep boreholes, calculated for consecutive windows of 720 data at steps of 720 data. Triangles: Akhalkalaki borehole; circles: Lisi borehole; asterisks: Kobuleti borehole.
The most important factor is the strain change in the upper Earth crust. Deep boreholes represent a kind of sensitive volumetric strainmeter, in which the water level reacts to small ambient deformations. Hence, water level variations will obviously also reflect the inherent response of the aquifer to the earthquake-related strain redistribution in the earth's crust (Kumpel, 1994; Gavrilenko et al., 2000). Therefore the investigation of water level variations in deep boreholes may provide additional understanding of the dynamics of processes related to earthquake preparation in the earth's crust (King et al., 1999). We have investigated the dynamics of water level variation in deep boreholes in Georgia. The analysis was carried out on hourly water level time series from 3 boreholes (about 2000 m depth): Lisi (44.45 N, 21.45 E), Akhalkalaki (43.34 N, 41.22 E), and Kobuleti (41.48 N, 41.47 E), for 01.03.1990–29.02.1992. Several strong seismic events took place during this time period in the Caucasus, namely the M = 6.9 (29.04.1991), M = 6.3 (15.06.1991) and M = 6.3 (23.10.1992) earthquakes. The aim was to answer the question whether the strain redistribution in the earth's crust related to earthquake preparation may lead to dynamical changes of the water level variation in deep boreholes. As shown in Figure 7, in most cases the regularity of the variation of the
water level in deep boreholes essentially decreases several months prior to a strong earthquake. At the same time, the extent of this decrease differs from borehole to borehole and obviously depends on the geological structure of the area.

3.3. DETECTION OF DYNAMICAL CHANGES IN EXCHANGE RATE DATA SETS
Evidence of the increased interest of economists in the investigation of complex processes has often been demonstrated during the last decade (Bunde et al., 2002). In fact, though economists have postulated the existence of economic cycles, there are still serious problems in the practical detection and identification of different types of behavior of economic systems (Hodgson, 1993; McCauley, 2004). Here we show results of our analysis of time series of the Georgian Lari to US Dollar exchange rate. The data sets were provided by the National Bank of Georgia for the 03.01.2000–31.12.2003 time period. This time series of 7230 data points consists of GEL/USD exchange rate values calculated 5 times per day. It follows from our analysis that the dynamics of the GEL/USD exchange rate is characterized by a clear internal structure. Indeed, as seen in Figure 8a, the recurrence plot of the GEL/USD exchange rate time series reveals visible structure in the phase point distribution. The same time series after dynamical structure distortion (data shuffling), or after the PR and GSRP procedures, does not reveal visible structure (see e.g. Figure 8b). We conclude that the revealed nonrandom dynamical structure of the analyzed GEL/USD exchange rate time series is an inherent characteristic and is not caused by linear effects. Quantitatively, the GEL/USD
Figure 8. Recurrence plots of (a) the original and (b) the GSRP GEL/USD exchange rate time series.
Figure 9. RQA metrics of the Georgian Lari/US Dollar exchange rate time series. Percentage of recurrence points in 7-day sliding windows of the original (solid line) and phase randomised (thin line) time series. Grey peaks correspond to the percentage of determinism in the original time series.
exchange rate time series was analyzed using shifted sliding windows of 7 days span. Quantitative analysis of the recurrence plot structure shows that the dynamics of the GEL/USD exchange rate is a complex process with a variable share of deterministic components (Figure 9). The approximate period of this variation is one year (see Figure 9). The maximal extent of GEL/USD exchange rate regularity was detected at the beginning of 2001.
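The PR surrogate test used above (and in Section 3.1, where 75 surrogates were generated) rests on a simple recipe: keep the Fourier amplitudes of the series, randomize its phases, and compare a discriminating statistic computed on the original with its distribution over the surrogate ensemble (Theiler et al., 1992). A minimal sketch of this recipe follows (ours; the amplitude-adjusted GSRP variant is omitted); the discriminating statistic can be, e.g., d2 or the DET measure from the previous sketch.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum (hence linear autocorrelation)
    as x, but with Fourier phases drawn uniformly at random."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(X))
    phases[0] = 0.0                       # keep the mean
    if n % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=n)

def significance(x, statistic, n_surr=75, seed=0):
    """S = |stat(original) - mean(stat(surrogates))| / std(stat(surrogates))."""
    rng = np.random.default_rng(seed)
    s0 = statistic(x)
    s = np.array([statistic(phase_randomized_surrogate(x, rng))
                  for _ in range(n_surr)])
    return abs(s0 - s.mean()) / s.std()
```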
3.4. DETECTION OF DYNAMICAL CHANGES IN ARTERIAL PRESSURE DATA SETS

Detection and identification of dynamical aspects of physiological processes remain two of the main challenges of current clinical and experimental medicine. In this regard significant attention has been paid to changes in heart dynamics under different cardiac conditions (Elbert et al., 1994; Pikkujamsa et al., 1999; Weiss et al., 1999; Bunde et al., 2002). Physiological time series of different origins have been investigated. In previous years, based on the correlation integral approach, it was suggested that there is some evidence of nonlinear structure in normal heart dynamics (Lefebvre et al., 1993; Govindan et al., 1998). Moreover, it has been established for physiological data sets of different origins that a high degree of variability is often a feature of young healthy hearts, and that an increase in the variability of physiological characteristics is common in aging and disease (Zokhowski et al., 1997; Pikkujamsa et al., 1999). According to the demands of the GPA algorithm, these results were
obtained for long time series, e.g. 24-h Holter monitor recordings. At the same time, despite a number of important findings, from the practical point of view the method of nonlinear analysis of medical time series discussed above is problematic. This is conditioned by problems related to the absence of stability in the obtained data sequences, i.e., the measured signal depends on the patient's emotional and physical condition at a given moment (Zokhowski et al., 1997; Zhang and Thakor, 1999). Due to the difficulty of obtaining long time series, the RQA method has become increasingly popular among physiological researchers. In the present study, time series of indexes of the myocardial contractile function have been investigated, including time series of the maximal velocity of myocardial fibers' circular contraction, the time of intraventricular pressure increase, the mean velocity of myocardial fibers' circular contraction and the maximal rate of intraventricular pressure. These time series consist of apexcardiograph records (Mason et al., 1970; Antani et al., 1979). A total of 120 adult males were studied, including 30 healthy subjects and 30 patients with each of the first, second, and severe third stages of arterial hypertension (according to the classification of 1996). These time series correspond to a composite set of concatenated subsequences containing 15 to 20 observables (calculated contractile indexes) for each of the 30 people of the same group. These subsequences form time series of a total length of 500 data points for the separate indexes. A similar approach of multivariable reconstruction was successfully applied earlier for the calculation of the correlation dimension of different composite time series (Elbert et al., 1994; Rombouts et al., 1995; Yang et al., 1994). Besides systolic and diastolic pressure, the heart rate variability in healthy subjects and in patients with different stages of arterial hypertension was investigated. The analyzed time series consist of 24-h ambulatory blood pressure recordings from 160 patients, aged between 30 and 70. The patients under study were not given medicines for the 2–3 days preceding the examination. Blood pressure recording was carried out in a calm environment, in the sitting position, according to the standard method provided by hypertension guidelines. The 24-h monitoring of blood pressure was carried out from 11:00 a.m. to 11:00 a.m. of the next day, taking into consideration the physiological regime of the patients. The interval between measurements was 15 min. Systole, diastole and average tension, as well as heart rate, were recorded. The analyzed multivariable time series were compiled as consecutive sequences of the appropriate data sets of each patient from the considered groups. The integral time series contain about 1300 data points for each healthy and pathological group analyzed. As seen in Figure 10, the extent of regularity of the time series of myocardial indexes increases in pathology. A similar result was obtained for the systolic pressure time series (Figure 11, diamonds). An increase of regularity is also visible for the heart rate time series, although it is less essential. At the same time, the dynamics of diastolic pressure was practically unchanged.
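The correlation dimension d2 quoted throughout this chapter is estimated with the Grassberger and Procaccia (1983) approach: reconstruct the phase space by time-delay embedding (Packard et al., 1980; Takens, 1981), compute the correlation sum C(eps), and read d2 as the slope of log C(eps) against log eps in the scaling region. A compact sketch follows (ours; it omits the refinements, such as noise reduction and a careful choice of the scaling region, that the actual analyses require):

```python
import numpy as np

def correlation_sum(x, dim=4, tau=1, eps_values=None, theiler=10):
    """Grassberger-Procaccia correlation sum C(eps) after time-delay
    embedding; pairs closer than `theiler` steps in time are excluded
    to suppress purely temporal correlations."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    y = np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])
    dist = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=theiler)
    d = dist[i, j]
    if eps_values is None:
        eps_values = np.logspace(-2, 0, 20) * d.max()
    C = np.array([(d < e).mean() for e in eps_values])
    return np.asarray(eps_values), C

def d2_estimate(x, dim=4, tau=1):
    """Slope of log C(eps) versus log eps, read as the correlation
    dimension d2 (here fitted over all scales with C > 0; in practice
    the slope is taken in a visually identified scaling region)."""
    eps, C = correlation_sum(x, dim, tau)
    good = C > 0
    return np.polyfit(np.log(eps[good]), np.log(C[good]), 1)[0]
```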
Figure 10. RQA %DET of indexes of myocardial contractile function time series versus the state of health. Diamonds: maximal velocity of myocardial fibers' circular contraction; squares: time of intraventricular pressure increase; triangles: mean velocity of myocardial fibers' circular contraction; asterisks: maximal rate of intraventricular pressure.
It is important to mention that the time series used were multivariable, as they contain data from different patients of the same physiological group. Moreover, the myocardial indexes and the arterial pressure data sets were taken from different groups. Taking all these facts into account, it is very important that the dynamical changes accompanying an increasing extent of pathology are similar. In other words, in almost all cases considered, the dynamics of physiological processes in pathology became more regular than in healthy conditions. These results are in accord with our own and other authors' results (Elbert et al., 1994; Garfinkel et al., 1997; Pikkujamsa et al., 1999; Weiss et al., 1999; Matcharashvili and Janiashvili, 2001) and show that nonlinear time series analysis methods enable the detection of dynamical changes in complex physiological processes.

4. Conclusions

Modern methods of qualitative and quantitative analysis of the complexity of natural processes are able to detect tiny dynamical changes imperceptible with classical data analysis methods. In this chapter the main principles of modern time series analysis methods, based on reconstructed phase space
Figure 11. (a) RQA %DET and (b) laminarity of blood pressure time series versus the state of health. Diamonds: systolic pressure; squares: diastolic pressure; triangles: heart rate.
structure testing, have been briefly described. It was shown, using data sets of different origins, that the correct use of these methods may indeed provide a unique opportunity for the qualitative detection and quantitative identification of dynamical peculiarities of complex natural processes.

Acknowledgements

The authors gratefully acknowledge the director of the NATO ASI "Imaging for Detection and Identification", Dr. James S. Byrnes, for manuscript corrections and overall support. We also wish to thank Dr. Mikheil Tsiklauri from Tbilisi State University for technical assistance.

References

Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., and Tsimring, L.S. (1993) The Analysis of Observed Chaotic Data in Physical Systems, Rev. Mod. Phys. 65(4), 1331–1392.
Antani, J.A., Wayne, H.H., and Kuzman, W.J. (1979) Ejection Phase Indexes by Invasive and Noninvasive Methods: An Apexcardiographic, Echocardiographic and Ventriculographic Correlative Study, Am. J. Cardiol. 43(2), 239–247.
Arecchi, F.T. and Farini, A. (1996) Lexicon of Complexity, ABS, Sest. F., Firenze.
Argyris, J.H., Faust, G., and Haase, M. (1994) An Exploration of Chaos, North-Holland, Amsterdam.
Bak, P., Tang, C., and Wiesenfeld, K. (1988) Self-organized Criticality, Phys. Rev. A 38, 364–374.
Beeler, N.M. and Lockner, D.A. (2003) Why Earthquakes Correlate Weakly with the Solid Earth Tides: Effects of Periodic Stress on the Rate and Probability of Earthquake Occurrence, J. Geophys. Res. 108(8), ESE 1–17.
Bennett, C.H. (1990) In Complexity, Entropy and the Physics of Information, Addison-Wesley, Reading, MA.
Berge, P., Pomeau, Y., and Vidal, C. (1984) Order within Chaos, J. Wiley, NY.
Bhattacharya, J. (1999) Search of Regularity in Irregular and Complex Signals, PhD thesis (Nonlinear Dynamics), Indian Institute of Technology, India.
Boffetta, G., Cencini, M., Falcioni, M., and Vulpiani, A. (2002) Predictability: A Way to Characterize Complexity, Phys. Rep. 356, 367–474.
Bowman, D., Ouillon, G., Sammis, C., Sornette, A., and Sornette, D. (1998) An Observational Test of the Critical Earthquake Concept, J. Geophys. Res. 103, 24359–24372.
Bunde, A., Kropp, J., and Schellnhuber, H.J. (eds.) (2002) The Science of Disasters: Climate Disruptions, Heart Attacks, Market Crashes, Springer, Heidelberg.
Casdagli, M.C. (1997) Recurrence Plots Revisited, Physica D 108, 12–44.
Castro, R. and Sauer, T. (1997) Correlation Dimension of Attractors through Interspike Intervals, Phys. Rev. E 55(1), 287–290.
Chelidze, T. and Matcharashvili, T. (2003) Electromagnetic Control of Earthquake Dynamics?, Computers and Geosciences 29(5), 587–593.
Chelidze, T., Matcharashvili, T., Gogiashvili, J., Lursmanashvili, O., and Devidze, M. (2005) Phase Synchronization of Slip in Laboratory Slider System, Nonlin. Proc. in Geophysics 12, 1–8.
Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory, Wiley, New York.
Eckmann, J.-P. and Ruelle, D. (1985) Ergodic Theory of Chaos and Strange Attractors, Rev. Mod. Phys. 57(3), 617–656.
Elbert, T., Ray, W.J., Kowalik, Z.J., Skinner, J.E., Graf, E.K., and Birbaumer, N. (1994) Chaos and Physiology, Physiol. Rev. 74, 1–49.
Garfinkel, A., Chen, P., Walter, D.O., Karaguezian, H., Kogan, B., Evans, S.J., Karpoukhin, M., Hwang, C., Uchida, T., Gotoh, M., and Weiss, J.N. (1997) Quasiperiodicity and Chaos in Cardiac Fibrillation, J. Clin. Invest. 99(2), 305–314.
Gavrilenko, P., Melikadze, G., Chelidze, T., Gibert, D., and Kumsiashvili, G. (2000) Permanent Water Level Drop Associated with the Spitak Earthquake: Observations at Lisi Borehole and Modeling, Geophys. J. Int. 143, 83–98.
Geller, R.J. (1999) Earthquake Prediction: Is this Debate Necessary?, Nature, Macmillan Publishers Ltd.; http://helix.nature.com.
Gilmore, R. (1993) A New Test for Chaos, J. Econ. Behav. Organization 22, 209–237.
Gilmore, R. (1998) Topological Analysis of Chaotic Dynamical Systems, Rev. Mod. Phys. 70, 1455–1529.
Goltz, C. (1998) Fractal and Chaotic Properties of Earthquakes, Springer, Berlin.
Govindan, R.B., Narayanan, K., and Gopinathan, M.S. (1998) On the Evidence of Deterministic Chaos in ECG: Surrogate and Predictability Analysis, Chaos 8(2), 495–502.
Grassberger, P. and Procaccia, I. (1983) Estimation of the Kolmogorov Entropy from a Chaotic Signal, Phys. Rev. A 28(4), 2591–2593.
Hegger, R., Kantz, H., and Schreiber, T. (1999) Practical Implementation of Nonlinear Time Series Methods: The TISEAN Package, Chaos 9, 413–440.
Hodgson, G.M. (1993) Economics and Evolution: Bringing Life Back into Economics, University of Michigan Press, Ann Arbor.
Iwanski, J. and Bradley, E. (1998) Recurrence Plots of Experimental Data: To Embed or not to Embed?, Chaos 8, 861.
Johansen, A. and Sornette, D. (1999) Acoustic Radiation Controls Dynamic Friction: Evidence from a Spring-Block Experiment, Phys. Rev. Lett. 82, 5152–5155.
Jones, N. (2001) The Quake Machine, New Scientist 30(6), 34–37.
Kagan, Y.Y. (1994) Observational Evidence for Earthquakes as a Nonlinear Dynamic Process, Physica D 77, 160–192.
Kagan, Y.Y. (1997) Are Earthquakes Predictable?, Geophys. J. Int. 131, 505–525.
Kanamori, H. and Brodsky, E.E. (2001) The Physics of Earthquakes, Physics Today 6, 34–40.
Kantz, H. and Schreiber, T. (1997) Nonlinear Time Series Analysis, Cambridge University Press, Cambridge.
Keilis-Borok, V.I. (1994) Symptoms of Instability in a System of Earthquake-Prone Faults, Physica D 77, 193–199.
Kennel, M.B., Brown, R., and Abarbanel, H.D.I. (1992) Determining Minimum Embedding Dimension using a Geometrical Construction, Phys. Rev. A 45, 3403–3411.
King, C.Y., Azuma, S., Igarashi, G., Ohno, M., Saito, H., and Wakita, H. (1999) Earthquake-Related Water-level Changes at 16 Closely Clustered Wells in Tono, Central Japan, J. Geoph. Res. 104(6), 13073–13082.
Knopoff, L. (1999) Earthquake Prediction is Difficult But Not Impossible, Nature, Macmillan Publ. Ltd.; http://helix.nature.com.
Korvin, G. (1992) Fractal Models in the Earth Sciences, Elsevier, NY.
Kraskov, A., Stogbauer, H., and Grassberger, P. (2004) Estimating Mutual Information, Phys. Rev. E 69, 066138.
Kumpel, H. (1994) Evidence for Self-similarity in the Harmonic Development of Earth Tides, In Kruhl, J.H. (ed.), Fractals and Dynamic Systems in Geoscience, Springer, Berlin, pp. 213–220.
Lefebvre, J.H., Goodings, D.A., Kamath, and Fallen, E.L. (1993) Predictability of Normal Heart Rhythms and Deterministic Chaos, Chaos 3(2), 267–276.
Main, I. (1997) Earthquakes: Long Odds on Prediction, Nature 385, 19–20.
Marzocchi, W. (1996) Detecting Low-dimensional Chaos in Time Series of Finite Length Generated from Discrete Parameter Processes, Physica D 90, 31–39.
Marwan, N., Wessel, N., Meyerfeldt, U., Schirdewan, A., and Kurths, J. (2002) Recurrence-plot-based Measures of Complexity and their Application to Heart-rate-variability Data, Phys. Rev. E 66, 026702.
Marwan, N. (2003) Encounters with Neighborhood, PhD thesis, Theoretical Physics, University of Potsdam, Germany.
Mason, D.T., Spann, J.F., and Zelis, R. (1970) Quantification of the Contractile State of the Intact Human Heart, Am. J. Cardiol. 26(3), 248–257.
Matcharashvili, T., Chelidze, T., and Javakhishvili, Z. (2000) Nonlinear Analysis of Magnitude and Interevent Time Interval Sequences for Earthquakes of the Caucasian Region, Nonlinear Processes in Geophysics 7, 9–19.
Matcharashvili, T. and Janiashvili, M. (2001) Investigation of Variability of Indexes of Myocardial Contractility by Complexity Measure in Patients with Hypertension, In Sulis, W. and Trofimova, I. (eds.), Proceedings of the NATO ASI Nonlinear Dynamics in Life and Social Sciences, IOS Press, Amsterdam, pp. 204–214.
Matcharashvili, T., Chelidze, T., Javakhishvili, Z., and Ghlonti, E. (2002) Detecting Differences in Dynamics of Small Earthquakes Temporal Distribution before and after Large Events, Computers & Geosciences 28(5), 693–700.
McCauley, J.L. (2004) Dynamics of Markets: Econophysics and Finance, Cambridge University Press, Cambridge, UK.
Ott, E. (1993) Chaos in Dynamical Systems, Cambridge University Press.
Packard, N.H., Crutchfield, J.P., Farmer, J.D., and Shaw, R.S. (1980) Geometry from a Time Series, Phys. Rev. Lett. 45, 712–716.
Peinke, J., Matcharashvili, T., Chelidze, T., Gogiashvili, J., Nawroth, A., Lursmanashvili, O., and Javakhishvili, Z. (2006) Influence of Periodic Variations in Water Level on Regional Seismic Activity Around a Large Reservoir: Field and Laboratory Model, Physics of the Earth and Planetary Interiors 156(2), 130–142.
Pikkujamsa, S., Makikallio, T., Sourander, L., Raiha, I., Puukka, P., Skytta, J., Peng, C.K., Goldberger, A., and Huikuri, H. (1999) Cardiac Interbeat Interval Dynamics from Childhood to Senescence: Comparison of Conventional and New Measures Based on Fractals and Chaos Theory, Circulation 100, 383–393.
Pikovsky, A., Rosenblum, M.G., and Kurths, J. (2003) Synchronization: A Universal Concept in Nonlinear Science, Cambridge University Press.
Rapp, P.E., Albano, A.M., Schmah, T.I., and Farwell, L.A. (1993) Filtered Noise can Mimic Low-dimensional Chaotic Attractors, Phys. Rev. E 47(4), 2289–2297.
Rapp, P.E., Albano, A.M., Zimmerman, I.D., and Jumenez-Montero, M.A. (1994) Phase-randomized Surrogates Can Produce Spurious Identification of Non-random Structure, Phys. Lett. A 192(1), 27–33.
Rapp, P.E., Cellucci, C.J., Korslund, K.E., Watanabe, T.A., and Jimenez-Montano, M.A. (2001) Effective Normalization of Complexity Measurements for Epoch Length and Sampling Frequency, Phys. Rev. E 64, 016209.
Rombouts, S.A., Keunen, R.W., and Stam, C.J. (1995) Investigation of Nonlinear Structure in Multichannel EEG, Phys. Lett. A 202(5/6), 352–358.
Rosenstein, M.T., Collins, J.J., and De Luca, C.J. (1993) A Practical Method for Calculating Largest Lyapunov Exponents from Small Data Sets, Physica D 65, 117–134.
Ruelle, D. (1994) Where Can One Hope to Profitably Apply the Ideas of Chaos?, Physics Today 47(7), 24–32.
Rundle, J., Turcotte, D., and Klein, W. (2000) GeoComplexity and the Physics of Earthquakes, AGU, Washington.
Sato, S., Sano, M., and Sawada, Y. (1987) Practical Methods of Measuring the Generalized Dimension and the Largest Lyapunov Exponent in High Dimensional Chaotic Systems, Prog. Theor. Phys. 77, 1–5.
Scholz, C.H. (1990) Earthquakes as Chaos, Nature 348, 197–198.
Schreiber, T. (1993) Extremely Simple Nonlinear Noise-reduction Method, Phys. Rev. E 47(4), 2401–2404.
Schreiber, T. (1999) Interdisciplinary Application of Nonlinear Time Series Methods, Phys. Rep. 308, 1–64.
Schreiber, T. and Schmitz, A. (1999) Testing for Nonlinearity in Unevenly Sampled Time Series, Phys. Rev. E 59, 4044.
Shannon, C.E. (1948) A Mathematical Theory of Communication, Bell System Technical Journal 27, 623–656.
Shannon, C.E. (1964) The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL.
Shiner, J.S., Davison, M., and Landsberg, P.T. (1999) Simple Measure for Complexity, Phys. Rev. E 59(2), 1459–1464.
Sibson, R. (1994) Crustal Stress, Faulting and Fluid Flow, In Parnell, J. (ed.), Geofluids: Origin, Migration and Evolution of Fluids in Sedimentary Basins, The Geological Society, London, pp. 69–84.
Simpson, D.W., Leith, W.S., and Scholz, C. (1988) Two Types of Reservoir-induced Seismicity, Bull. Seism. Soc. Am. 78, 2025–2040.
Sivakumar, B., Berndtsson, R., Olsson, J., and Jinno, K. (2002) Reply to "Which Chaos in the Rainfall-runoff Process?", Hydrol. Sci. J. 47(1), 149–158.
Smirnov, V.B. (1995) Fractal Properties of Seismicity of Caucasus, J. of Earthq. Prediction Res. 4, 31–45.
Sprott, J.C. and Rowlands, G. (1995) Chaos Data Analyzer: The Professional Version, AIP, NY.
Takens, F. (1981) Detecting Strange Attractors in Fluid Turbulence, In Rand, D. and Young, L.S. (eds.), Dynamical Systems and Turbulence, Springer, Berlin, pp. 366–381.
Talwani, P. (1997) On the Nature of Reservoir-induced Seismicity, Pure and Appl. Geophys. 150, 473–492.
Tarasov, N.T. (1997) Crustal Seismicity Variation Under Electric Action, Transactions (Doklady) of the Russian Academy of Sciences 353(3), 445–448.
Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., and Farmer, J.D. (1992) Testing for Nonlinearity in Time Series: The Method of Surrogate Data, Physica D 58, 77–94.
Theiler, J. and Prichard, D. (1997) Using "Surrogate-surrogate Data" to Calibrate the Actual Rate of False Positives in Tests for Nonlinearity in Time Series, In Cutler, D. and Kaplan, D.T. (eds.), Nonlinear Dynamics and Time Series, Fields Institute Communications, pp. 99–113.
Turcotte, D. (1992) Fractals and Chaos in Geology and Geophysics, Cambridge University Press, Cambridge.
242
TEIMURAZ MATCHARASHVILI ET AL.
Vidale, J.E., Agnew, D.C., Johnston, M.J.S., and Oppenheimer, D.H. (1998) Absence of Earthquake Correlation with Earth Tides: An Indication of High Preseismic Fault Stress Rate, J. Geophys. Res. 103(24), 567–572. Volykhin, A.M., Bragin, V.D., and Zubovich, A.P. (1993) Geodynamic Processes in Geophysical Fields, Moscow, Nauka. Wackerbauer, R., Witt, A., Atmanspacher, H., Kurths, J., and Scheingraber, H. (1994) A Comparative Classification of Complexity Measures, Chaos, Solitons & Fractals 4, 133–137. Weiss, J.N., Garfinkel, A., Karaguezian, H., Zhilin, Q., and Chen, P. (1999) Chaos and Transition to Ventricular Fibrilation: A New Approach to Antiarrhythmic Drug Evaluation, Circulation 99(21), 2819–2826. Wolf, A., Swift, J., Swinney, H., and Vastano, J. (1985) Determining Lyapunov Exponents from a Time Series, Physica D 16, 285–317. Wyss, M. (1997) Cannot Earthquakes Be Predicted? Science 278, 487–488. Yang, P., Brasseur, G. P., Gille, J. C., and Madronich, S. (1994) Dimensionalities of Ozone Attractors and their Global Distribution, Physica D 76 (3310343). Yao, W., Essex, C., Yu, P., and Davison, M. (2004) Measure of Predictability, Phys. Rev. E 69, 110–123. Zbilut, J.P. and Weber, C, L. (1992) Embeddings and Delays as Derived from Quantification of Recurrence Plots, Phys. Lett. A 171, 199–203. Zbilut, J.P.A., Giuliani, C.L., and Webber Jr. (1998) Detecting Deterministic Signals in Exceptionally Noisy Environments Using Cross-recurrence Quantification, Phys. Lett. A 246, 122–128. Zhang, X. and Thakor, N.V. (1999) Detecting Ventricular Tachicardia and Fibrillation by Complexity Measure, IEEE Trans. on Biomed. Eng. 46(5), 548–555. Zokhowski. M., Winkowska-Nowak, K., and Nowak, A. (1997) Autocorrelation of R-R Distributions as a Measure of Heart Variability, Phys. Rev. E 56(3), 3725–3727.
TIME-RESOLVED LUMINESCENCE IMAGING AND APPLICATIONS
Ismail Mekkaoui Alaoui ([email protected])
Department of Physics, Faculty of Sciences Semlalia, Cadi Ayyad University, BP. 2390, Marrakech 40000, Morocco
Abstract. The registration of luminescence emission spectra at different decay times from the excitation pulse is called time-resolved (gated) luminescence spectroscopy. This technique makes it possible to distinguish between short and long decay components. In particular, a long-decay emission (phosphorescence) can be separated from the background signal, which most often consists of short-decay fluorescence and scattering, because measurements do not begin until a certain time has elapsed from the moment of excitation. Time-resolved spectroscopy finds applications in many areas, such as molecular phosphorescence, fingerprint detection, immunoassay, and enzyme activity measurements. In this presentation we focus on fingerprint detection with time-resolved luminescence imaging using Eu(III) and Tb(III). These two species react with Ruhemann's Purple, RP (the reaction product of ninhydrin with the amino acid glycine), and are suitable for this purpose, since they have a relatively long luminescence decay of millisecond order. Instrumentation and fingerprint samples developed by this technique are presented and discussed.

Key words: luminescence, time-resolved, imaging, laser, fingerprints
1. Introduction

The detection of latent fingerprints by laser (Menzel, 1989), or more generally, luminescence detection of fingerprints, provides great sensitivity. Fingerprint treatments such as rhodamine 6G staining and ninhydrin/zinc chloride are now routine. What has kept laser fingerprint detection from becoming a truly universal technique is that many surfaces fluoresce very intensely under the laser illumination, overwhelming the fingerprint luminescence. Such surfaces were until recently not amenable to the current routine procedures such as those mentioned above. To mitigate this deficiency, time-resolved luminescence imaging has been explored (Mitchell and Menzel, 1989).
Two general chemical strategies were explored. One involved the use of transition metal complexes that yield charge-transfer phosphorescence (long-lived luminescence) with microsecond-order lifetimes. These complexes would be used as staining dyes for smooth surfaces, much like rhodamine 6G, or be incorporated into dusting powders (Menzel, 1988). Alternatively, for porous surfaces such as paper, ninhydrin treatment followed by treatment with rare earth salts involving the lanthanides Eu3+ or Tb3+, an analog of the ninhydrin/zinc chloride treatment, was investigated (Mekkaoui Alaoui, 1992). The formed rare earth-RP complexes exhibit luminescence lifetimes of millisecond order (Mekkaoui Alaoui and Menzel, 1993). Some of the above transition metal complexes are amenable to excitation with blue-green light (the customary Ar-laser output), while others respond to near-ultraviolet excitation (also obtainable with Ar-lasers). Such wavelength switching, although not difficult, is clumsy nonetheless. Worse, for time-resolved imaging of luminescence with microsecond-order lifetimes, the laser chopping requires devices such as electro-optic modulators. These are not only expensive but demand delicate optical alignment and careful gain and bias adjustment. For luminescence of millisecond-order lifetimes, the laser chopping is much easier, since a cheap and easily usable mechanical light chopper suffices.

2. Time Resolved Luminescence Imaging

2.1. SENSITIZED LUMINESCENCE AND INTRAMOLECULAR ENERGY TRANSFER
Sensitized luminescence is the process whereby an element having no appreciable absorption band in the visible or ultraviolet (UV) spectrum is made to emit appreciable radiation upon excitation in this region, as a result of energy transfer from the absorbing ligand with which it is complexed. The luminescent rare earth ions (Eu3+ and Tb3+) are known for their narrow and weak absorption bands in the UV region, coupled with emission bands of narrow half-width in the visible region. The radiative transitions of these elements are weak because the absorption of the ion itself is very low. But the emission can be enhanced when Eu3+ or Tb3+ is bonded to appropriate organic ligands, via intramolecular energy transfer from the organic ligands (good absorbers) to the rare earth ions (good luminescers) under the right excitation (Weissman, 1942). Rare earth-RP complexes show such emission enhancement of the rare earth ions. So far, the only excitations that lead to energy transfer from the organic ligands, the RPs, to the Eu3+ or Tb3+ are in the near-UV range (200 to 400 nm) (Mekkaoui Alaoui, 1995). Eu-RP complexes are readily prepared in solution. If such solutions were effective for fingerprint staining, akin to rhodamine 6G, then one should be able to use
this chemical strategy for fingerprints on smooth and porous surfaces alike. This could reduce instrumentation cost and also make the instrumentation easier to operate. We should note that some fingerprint reagents, such as 1,2-indanedione (Joullie and Petrovskaia, 1998), fluoresce under Ar-laser excitation and develop fingerprints without requiring time-resolved luminescence imaging (Mekkaoui Alaoui et al., 2005).

2.2. PRINCIPLE OF TIME-RESOLVED LUMINESCENCE
The basic principle of the technique is as follows. The beam of an Ar-laser is chopped by a mechanical light chopper or an electro-optic modulator so that laser pulses with sharp cut-offs illuminate the article under scrutiny. The article is viewed with an imaging device (a CCD camera) synchronized to the laser pulses such that it turns on shortly after laser cut-off and turns off shortly before the onset of the next laser pulse. The imaging device is operated in this way because the offending background fluorescences have short lifetimes (nanosecond order), i.e., the background fluorescence decays very quickly after laser pulse cut-off. If fingerprint treatments can be developed such that much longer luminescence lifetimes result, then the imaging device will detect only this long-lived luminescence and will suppress the background. The principle of time-resolved luminescence imaging is sketched in Figure 1. Time-resolved spectroscopy finds other applications in many areas (Diamandis, 1988).

Figure 1. Principle of time-resolved luminescence imaging.

The system used in time-resolved luminescence imaging consists of a laser, a modulator, a microscope, a camera, a signal trigger discriminator, and a microcomputer. An argon-ion laser modulated by a mechanical chopper excites the samples. The sample luminescence is detected after the background fluorescence has died out. The image is taken by the CCD (charge-coupled device) camera through a microscope. The camera is connected to a microcomputer; the image taken by the camera is sent to the monitor and can be manipulated, stored, or printed.
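The gain from gating can be illustrated numerically. The sketch below is a minimal Python illustration, not the instrument itself: the 3 ms Eu-RP lifetime is taken from the text, while the nanosecond-order background lifetime, the amplitudes, and the gate delays are assumed values.

```python
# Minimal sketch of the gating principle: after laser cut-off, the fast
# background fluorescence vanishes within microseconds, while the long-lived
# fingerprint luminescence is essentially unchanged. Values are illustrative.
import numpy as np

tau_bg, tau_lum = 20e-9, 3e-3   # assumed background lifetime; 3 ms Eu-RP decay
A_bg, A_lum = 100.0, 1.0        # under illumination the background dominates

for delay in (0.0, 100e-9, 1e-6, 50e-6):        # gate delay after cut-off
    bg = A_bg * np.exp(-delay / tau_bg)         # short-lived background
    lum = A_lum * np.exp(-delay / tau_lum)      # Eu-RP phosphorescence
    print(f"delay {delay:8.1e} s: background {bg:10.3e}, fingerprint {lum:.3f}")
```

Opening the camera even a few tens of microseconds after cut-off therefore removes the background almost completely while sacrificing very little of the fingerprint signal.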
2.3. LASER MODULATIONS

CW argon-ion lasers are used to excite our samples. They deliver continuous-wave signals. For time-resolved luminescence purposes, we need a method of switching the excitation on and off (modulation), chosen according to the luminescence decay time and the background decay lifetime. Two kinds of laser modulation were used, depending on the decay-time range of the compounds: a mechanical light chopper for relatively long lifetimes (of millisecond order), and electro-optic modulation for relatively short decays (of nanosecond or microsecond order). For
Eu-RP compounds (3 ms decay), a mechanical chopper is sufficient when operating at 169 Hz (a period of about 6 ms).
3. Application in Fingerprint Development

3.1. FINGERPRINT STAINING
The stability of the RP-Eu3+ complexes when conserved at low temperature inspired us to use them for fingerprint staining. Fingerprints were deposited on surfaces that can be stained (aluminum cans, plastics, tapes, etc.), and Eu-RP complexes were used to stain these surfaces. The surfaces were chosen to show
high background fluorescence, and the time-resolved luminescence imaging system was used to suppress the background. The results are promising: a fingerprint from a soft-drink can (a highly fluorescent surface) stained with a Eu-nitrate-RP solution has been developed, as shown in Figure 2. A high background luminescence signal (left image) overwhelms the fingerprint luminescence signal, while in the gated image (right) the fingerprint signal shows up clearly.

Figure 2. Fingerprint, on a coke can, developed by the time-resolved luminescence technique: ungated camera (left), gated camera (right).

3.2. FINGERPRINT DEVELOPMENT
Fingerprints were deposited on porous surfaces (i.e., paper, cardboard) and were then treated with ninhydrin and its analogs. Some of the samples were left at ambient humidity and temperature, and some were incubated at about 50 °C and 60% relative humidity; the samples left at ambient conditions were slower to develop. The samples were then sprayed with solutions of EuCl3·6H2O. Under UV light, the red luminescence from the fingerprint was generally comparable to that of the rest of the surface, with only slight enhancement. Using the time-resolved luminescence imaging system to image these fingerprints was not always possible, because something quenches the energy transfer from the RP to Eu, or possibly enhances the emission of unreacted Eu (paper absorbs strongly in the near-UV and fluoresces as well). We concluded either that the problem comes from the surface itself or that the chloride salt is simply insufficient in many instances. We therefore tried different kinds of paper and different Eu salts. Some samples were developed in this way, but more work is needed in this area.
Acknowledgements

Most of this work was done at the Center for Forensic Studies, Texas Tech University, with a grant from the Moroccan-American Commission for Cultural and Educational Exchanges and the Fulbright Program.
References

Diamandis, E. P. (1988) Immunoassay with Time-resolved Fluorescence Spectroscopy: Principle and Applications, Clinical Biochemistry 21, 139–150.
Joullie, M. M. and Petrovskaia, O. (1998) A Better Way to Develop Fingerprints, ChemTech 28(8), 41–44.
Mekkaoui Alaoui, I. (1992) Optical Spectroscopic Properties of Ruhemann's Purple Complexes, Time Resolved Luminescence Imaging and Applications, Ph.D. dissertation, Texas Tech University.
Mekkaoui Alaoui, I. (1995) Non-Participation of the Ligand First Triplet State in Intramolecular Energy Transfer in Europium and Terbium Ruhemann's Purple Complexes, Journal of Physical Chemistry 99, 13280–13282.
Mekkaoui Alaoui, I. and Menzel, E. R. (1993) Spectroscopy of Rare Earth Ruhemann's Purple Complexes, Journal of Forensic Sciences 38(3), 506–520.
Mekkaoui Alaoui, I., Menzel, E. R., Farag, M., Cheng, K. H., and Murdock, R. H. (2005) Mass Spectra and Time-resolved Fluorescence Spectroscopy of the Reaction Product of Glycine with 1,2-Indanedione in Methanol, Forensic Science International 152, 215–219.
Menzel, E. R. (1988) Laser Detection of Latent Fingerprints: Tris(2,2′-bipyridyl) Ruthenium(II) Chloride Hexahydrate as a Staining Dye for Time-resolved Imaging, In E. R. Menzel (ed.), Fluorescence Detection II, SPIE Proceedings, Vol. 910, pp. 45–51.
Menzel, E. R. (1989) Detection of Latent Fingerprints by Laser-excited Luminescence, Analytical Chemistry 61, 557A–561A.
Mitchell, K. E. and Menzel, E. R. (1989) Time Resolved Luminescence Imaging: Application to Latent Fingerprint Detection, In E. R. Menzel (ed.), Fluorescence Detection III, SPIE Proceedings, Vol. 1054, pp. 191–195.
Weissman, S. I. (1942) Intramolecular Energy Transfer, the Fluorescence of Complexes of Europium, Journal of Chemical Physics 10, 214–217.
SPECTRUM SLIDING ANALYSIS
Vladimir Ya. Krakovsky (vladimir [email protected])
National Aviation University, Kiev, Ukraine
Abstract. Spectrum sliding analysis (SSA) is a dynamic spectrum analysis in which the next analysis interval differs from the previous one by including the next signal sample and excluding the first sample of the previous analysis interval. Such spectrum analysis is necessary for time-frequency localization of features of the analyzed signal. Using the well-known fast Fourier transform (FFT) for this purpose is not effective; recursive algorithms which use only one complex multiplication for computing one spectrum sample during each analysis interval are more effective. The author improved one such algorithm so that one complex multiplication can compute two, four, and even eight (for complex signals) spectrum samples simultaneously. Problems of realization and application of spectrum sliding analysis are also considered in the paper.

Key words: spectrum sliding analysis, dynamic spectrum analysis, recursive algorithms, sliding spectrum algorithms and analyzers, instant spectrum algorithms and analyzers, algorithms for instantaneous spectrum digital analyzers, multichannel filter
1. Introduction

Depending on the form of signal presentation, spectrum sliding analysis (SSA) may be implemented by analog, digital, or discrete-analog spectrum analyzers. Narrow-band analog filters can be used to implement SSA at some points of the working frequency band. However, such analyzers are usually designed to analyze the power spectrum and are not capable of analyzing the phase spectrum or the Cartesian constituents of the complex spectrum, which restricts their application (Chajkovsky, 1977). The discrete-analog method of spectrum analysis uses the discrete signal formed by sampling the signal to be analyzed with a non-overlapping pulse sequence whose magnitudes are set in accordance with the sample values. This type of analyzer permits spectrum analysis under compatible quality and quantity conditions, satisfying the information completeness condition (Chajkovsky, 1977), and it may be adapted for SSA (Chajkovsky and Krakovsky, 1979; Krakovsky, 1981). Spectrum digital
analyzers (Plotnikov et al., 1990) are the most popular. There are two SSA methods. In one, the origin of the reference function is matched with the analysis interval origin; this approach computes the sliding spectrum (Chajkovsky, 1976). In many applications (e.g., speech recognition) there is no need for such matching; in this case the computation is called the instant spectrum (Chajkovsky, 1976). Algorithms and devices implementing the two methods are considered in the following sections.
2. Sliding Spectrum Algorithms and Devices

The discrete sliding spectrum is given by the formula

$$F_q(p) = \frac{1}{N}\sum_{k=q-N+1}^{q} f(k)\,W_N^{-p(k-(q-N+1))},\qquad p \in \overline{0,P-1},\quad q = 0,1,2,\ldots,\tag{1}$$

where q is the analysis interval index; k is the signal sample index within the sliding window, $k \in \overline{q-N+1,\,q}$; N is the size of the processing extract; $F_q(p)$ is the complex spectrum sample at the frequency $p\Delta\omega$ at the instant $q\Delta t$; $f(k)$ is the analyzed signal sample value at the instant $k\Delta t$; and $W_N^{-pk}$ is the symbolic representation of the complex reference coefficient $\exp(-j\frac{2\pi}{N}pk)$. Here $\Delta t$ and $\Delta\omega$ are the discrete time and frequency intervals, determined by the information completeness condition (Chajkovsky, 1977).

Direct computation of the functional (1) requires NP complex multiplications and (N − 1)P additions. The fast Fourier transform (FFT) (Cooley and Tukey, 1965) reduces this number by a factor of N/log₂N. Further reduction is possible. The formula

$$F_q(p) = \Big[F_{q-1}(p) + \frac{1}{N}\big(f(q) - f(q-N)\big)\Big]\,W_N^{\,p},\qquad p \in \overline{0,P-1},\quad q = 0,1,2,\ldots\tag{2}$$

leads to a recursive algorithm (Halberstein, 1966) for computing the sliding spectrum samples. The computational requirements (number of complex multiplications) are considerably lower using (2) rather than (1): with (1), N complex multiplications are needed for each spectrum sample (log₂N if the FFT is used), whereas (2) requires only one complex multiplication for the same computation. This advantage has led to many sliding spectrum analyzers implementing (2).

The spectrum analyzer functional diagram implementing algorithm (2) is shown in Fig. 1.

Figure 1. The spectrum analyzer functional diagram for algorithm (2).

The following symbols are used in the diagram: ADC is the analog-digital converter; DR is the delay register; SU is the subtraction unit; AD is the adder; WU is the weighting unit; MU is the multiplier; RAM is the random-access memory; f(t) is the analog input signal; f_q are the digital readouts of the signal f(t); f_{q−N} are the readouts f_q delayed by N steps; Δf_q are the readouts of the difference signal f_q − f_{q−N}; $W_N^{p}$ are the readouts of the weighting function $\exp(j\frac{2\pi}{N}p)$; and F_q(p), F_{q−1}(p) are the sliding spectrum readouts in the current and the preceding steps, respectively.

A detailed description of the characteristics of the analyzers implementing (2) (Chajkovsky et al., 1977, 1978; Dillard, 1977) is given in Krakovsky (1980). Computational stability using algorithm (2) is achieved by appropriate rounding of the twiddle-factor constituents, whose factor modulus must not exceed unity (Krakovsky and Chajkovsky, 1984). The constituents' word size should be (1/2)log₂(number of sliding steps) greater than the word size of the signal samples (Krakovsky, 1983a). For an unlimited number of sliding steps it is possible to use the technical solution described in Perreault and Rich (1980). When algorithm (2) is used, the reference function is always matched with the analysis interval origin, thereby ensuring matched filtering (Chajkovsky and Krakovsky, 1983, 1985, 1986).
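As a concrete check of this one-multiplication-per-harmonic claim, the following minimal Python sketch (illustrative test data and sizes; it is not the hardware analyzer of Fig. 1) maintains the sliding spectrum by recursion (2) and compares it at every step with the directly computed spectrum (1):

```python
# Numerical sketch of recursion (2): one complex multiplication per harmonic
# per sliding step, verified against the direct windowed DFT of formula (1).
import numpy as np

rng = np.random.default_rng(0)
N = 32                                      # processing extract size (P = N)
x = rng.standard_normal(N + 100)            # test signal
W = np.exp(2j * np.pi * np.arange(N) / N)   # W_N^p = exp(j*2*pi*p/N)

F = np.fft.fft(x[:N]) / N                   # spectrum of the first window, per (1)
for q in range(N, len(x)):                  # slide by one sample per step
    F = (F + (x[q] - x[q - N]) / N) * W     # recursion (2)
    direct = np.fft.fft(x[q - N + 1 : q + 1]) / N   # formula (1) directly
    assert np.allclose(F, direct)
print("recursion (2) reproduces the direct sliding spectrum")
```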
3. Instant Spectrum Algorithms and Devices

The instant spectrum

$$F_q(p) = \frac{1}{N}\sum_{k=q-N+1}^{q} f(k)\,W_N^{-pk},\qquad p \in \overline{0,P-1},\quad q = 0,1,2,\ldots\tag{3}$$

can be implemented by the following simple recursive algorithm, given in Lejtes and Sobolev (1969) and Schmitt and Starkey (1973):

$$F_q(p) = F_{q-1}(p) + \frac{1}{N}\big(f(q) - f(q-N)\big)\,W_N^{-pq},\qquad p \in \overline{0,P-1},\quad q = 0,1,2,\ldots.\tag{4}$$
This algorithm is always stable, and the word size of the twiddle factors $W_N^{-pq}$ is the same as the word size of the signal samples. Moreover, it is possible to achieve matching with an additional multiplier and a complex conjugate device (Chajkovsky, 1976). Devices that implement this algorithm in conveyer mode are described in Krakovsky and Koval (1982, 1984).

Figure 2. The spectrum analyzer functional diagram for algorithm (4).

The spectrum analyzer functional diagram implementing algorithm (4) is shown in Fig. 2, where $\Delta F_q(p) = \frac{1}{N}\Delta f_q\,W_N^{-pq}$ are the increments of the spectrum readouts. The spectrum analyzer functional diagram implementing algorithm (4) with matching is shown in Fig. 3, where CCU and AMU are the complex conjugate unit and the additional multiplier, respectively (Chajkovsky, 1976).

Figure 3. Spectrum analyzer functional diagram with matching.

This algorithm has a remarkable property which permits one to organize the SSA so that one complex multiplication may be used for computing two, four, and even eight (for complex signals) spectrum harmonics at once (Krakovsky, 1990, 1997, 1998, 2000; Krakovskii, 1993). This may be done by presenting algorithm (4) as follows:

$$F_q(p) = F_{q-1}(p) + \Delta F_q(p),\qquad p \in \overline{0,P-1},\quad q = 0,1,2,\ldots,\tag{5}$$

$$\Delta F_q(p) = \frac{1}{N}\big(f(q) - f(q-N)\big)\exp\Big(-j\frac{2\pi}{N}pq\Big).\tag{6}$$
The spectrum increments ΔF_q(p) may be used not only for the spectrum harmonic p, but also for the spectrum harmonics

$$p_i = i\frac{N}{4} + p,\qquad i \in \overline{1,3},\tag{7}$$

and

$$p_k = k\frac{N}{4} - p,\qquad k \in \overline{1,4},\tag{8}$$
using known properties of the complex exponential function. A simplified summary of algorithm (5) can be described as follows:

(a) for harmonics of the form (7),

$$F_q(p_i) = F_{q-1}(p_i) + (-j)^{iq}\,\Delta F_q(p),\qquad q = 0,1,2,\ldots;\tag{9}$$

(b) for harmonics of the form (8),

$$F_q(p_k) = F_{q-1}(p_k) + (-j)^{kq}\,\Delta F_q(-p),\qquad q = 0,1,2,\ldots,\tag{10}$$
where ΔF_q(−p) are the complex conjugates of the spectrum increments ΔF_q(p) if the signal samples are real. If the signal samples are complex, the increments ΔF_q(−p) are generated by inverting the signs of the products of the signal increments $\Delta f_q = \frac{1}{N}\big(f(q) - f(q-N)\big)$ with the imaginary part of the weighting function, and then forming the appropriate algebraic sums.
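The following minimal Python sketch (assumed sizes and base harmonic; in hardware the factors (−j)^{iq} reduce to sign inversions and real/imaginary swaps, as Table I below makes explicit) verifies that the single complex multiplication forming ΔF_q(p) updates, via (9), the four harmonics p, p + N/4, p + N/2, and p + 3N/4 of a complex signal:

```python
# Sketch of the harmonic-sharing property (7)+(9): one complex multiplication
# per step yields the spectrum increments of four harmonics at once.
import numpy as np

rng = np.random.default_rng(1)
N = 32
x = rng.standard_normal(N + 64) + 1j * rng.standard_normal(N + 64)

F = np.fft.fft(x[:N]) / N                     # instant spectrum (3), first window
p = 3                                         # base harmonic in the first subrange
harmonics = [p + i * N // 4 for i in range(4)]

for q in range(N, len(x)):
    dF = (x[q] - x[q - N]) / N * np.exp(-2j * np.pi * p * q / N)  # one multiply
    for i, pi in enumerate(harmonics):
        F[pi] += (-1j) ** ((i * q) % 4) * dF  # recursion (9): sign flips/swaps only
    s = q - N + 1                             # window origin, for the direct check
    ref = np.fft.fft(x[s:s + N]) / N * np.exp(-2j * np.pi * np.arange(N) * s / N)
    assert np.allclose(F[harmonics], ref[harmonics])
print("four harmonics tracked with one complex multiplication per step")
```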
Devices implementing algorithms (9) and (10) are presented in Krakovsky and Koval (1988a, 1988b, 1989) and Krakovsky et al. (1991a, 1991b). Their generalized functional diagram is shown in Fig. 4, where SW are switching components and m is the number of additional subranges, m ∈ {1, 3, 7}.

Figure 4. Generalized functional diagram of the implementing device.

In the simplest case of complex signal analysis (P = N), the harmonic range is partitioned into two subranges, $p \in \overline{0,\,N/2-1}$, and by (7) for i = 2 we have p₂ = N/2 + p. By (9) the readouts for the harmonics p₂ are determined by

$$F_q(p_2) = F_{q-1}(p_2) + (-1)^{q}\,\Delta F_q(p),\tag{11}$$
TABLE I. Multiplexer control signal

q mod 4 | ΔF_q(p_1)                   | ΔF_q(p_2) | ΔF_q(p_3)
--------|-----------------------------|-----------|-----------------------------
0       | ΔF_q(p)                     | ΔF_q(p)   | ΔF_q(p)
1       | Im ΔF_q(p) − j Re ΔF_q(p)   | −ΔF_q(p)  | −Im ΔF_q(p) + j Re ΔF_q(p)
2       | −ΔF_q(p)                    | ΔF_q(p)   | −ΔF_q(p)
3       | −Im ΔF_q(p) + j Re ΔF_q(p)  | −ΔF_q(p)  | Im ΔF_q(p) − j Re ΔF_q(p)
That is, the spectrum increments ΔF_q(p) for harmonic p obtained at the MU output are fed through SW to the input of the adder for subrange p₂. In SW, as we see from (11), the signs of the increments ΔF_q(p) are either preserved or inverted, depending on the parity of the readout index q. To this end, it is sufficient to include in SW two operand sign control units (for the real and imaginary components of ΔF_q(p)), which are controlled by the binary signal from the output of the least significant bit of the q-counter (Krakovsky and Koval, 1988a). Operand sign control can be performed by exclusive-OR circuits. Simultaneous evaluation of two harmonics doubles the functional speed at a small additional hardware cost.

Using (7) and dividing the harmonic range into four (for a complex signal) or two (for a real signal) subranges, we can similarly design appropriate analyzer structures (Krakovsky and Koval, 1988b, 1989). In this case SW, in addition to the operand sign control units, also contains multiplexers for the operations (−j)^{iq} ΔF_q(p). The activation sequence of these multiplexers with the corresponding operand sign control circuits can be determined from the data in Table I, which gives the dependences of the increments ΔF_q(p₁), ΔF_q(p₂), and ΔF_q(p₃) on ΔF_q(p) and q. We see from Table I that the multiplexer control signal can be taken from the output of the least significant bit of the q-counter, while the control signals for the operand sign control circuits can be provided either by the outputs of the two least significant bits of this counter or by their logical XOR.

The SSA speed can be further doubled if, in addition to (7), we also use (8). In this case SW must also include components that generate the spectrum increments ΔF_q(−p) of (10). Moreover, the second SW input should receive from the SU the signal readout increment Δf_q (dashed line in Fig. 4). This is needed for organizing simultaneous processing, in the corresponding subranges, of the operands of the harmonics (2i + 1)N/8 and kN/4, i ∈ 1,3, k ∈ 1,4. To explain this technique, Tables II and III list the indices of simultaneously determined harmonics for real (Krakovsky et al., 1991a) and complex (Krakovsky et al., 1991b) signals.
TABLE II. Indices of simultaneously determined harmonics (real signal)

Subrange | Indices of simultaneously determined harmonics
p        | 1        | 2        | ··· | N/8 − 1  | N/8
p_1      | N/4 − 1  | N/4 − 2  | ··· | N/8 + 1  | N/4
p_2      | N/4 + 1  | N/4 + 2  | ··· | 3N/8 − 1 | 3N/8
p_3      | N/2 − 1  | N/2 − 2  | ··· | 3N/8 + 1 | 0
TABLE III. Indices of simultaneously determined harmonics (complex signal)

Subrange | Indices of simultaneously determined harmonics
p        | 1         | 2         | ··· | N/8 − 1  | N/8
p_1      | N/4 − 1   | N/4 − 2   | ··· | N/8 + 1  | N/4
p_2      | N/4 + 1   | N/4 + 2   | ··· | 3N/8 − 1 | 3N/8
p_3      | N/2 − 1   | N/2 − 2   | ··· | 3N/8 + 1 | N/2
p_4      | N/2 + 1   | N/2 + 2   | ··· | 5N/8 − 1 | 5N/8
p_5      | 3N/4 − 1  | 3N/4 − 2  | ··· | 5N/8 + 1 | 3N/4
p_6      | 3N/4 + 1  | 3N/4 + 2  | ··· | 7N/8 − 1 | 7N/8
p_7      | N − 1     | N − 2     | ··· | 7N/8 + 1 | 0
With the determination of the harmonic p = N/8, ΔF_q(p) is used to determine the harmonics (2i + 1)N/8, i ∈ 1,3, and Δf_q is used at the same time to calculate the harmonics kN/4, k ∈ 1,4. Structural diagrams of analyzers based on algorithms (9) and (10) are given in the invention descriptions (Krakovsky and Koval, 1988a, 1988b, 1989; Krakovsky et al., 1991a, 1991b). These analyzers use modern hardware components and require some additional hardware to expand the frequency range by two, four, or eight times; the additional hardware cost increases much more slowly than the incremental benefit produced by group complex multiplication.
4. Multichannel Matched Filtering on the Basis of SSA

SSA may be used for FIR filtering (Rabiner and Gold, 1975) in a shift-invariant (stationary) mode. Such filters perform signal convolution
with a given impulse response:

$$s(\tau) = \int_{\tau-T}^{\tau} f^{*}(t)\,g(t-\tau)\,dt = \int_{0}^{T} f^{*}(t+\tau)\,g(t)\,dt.\tag{12}$$
In accordance with Parseval's theorem, (12) is equivalent to the integral

$$s(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F_{\tau}(\omega)\,G^{*}(\omega)\,d\omega,\tag{13}$$
where G(ω) is the unshifted impulse-response spectrum and F_τ(ω) is the sliding spectrum of the signal in the analysis interval T. The discrete version of (13) is

$$r(q) = \sum_{p=0}^{P-1} F_q(p)\,G^{*}(p),\tag{14}$$
where F_q(p) and G*(p) are samples of the spectra F_τ(ω) and G*(ω), and P is the number of impulse-response spectrum readouts. Expression (14) is equivalent to the traditional discrete correlation

$$r(q) = \sum_{k=0}^{N-1} f(k+q)\,g(k).\tag{15}$$
Expression (14) is the formal description of the FIR-filtering algorithm on the basis of SSA (Chajkovsky and Krakovsky, 1986). Practical realization of the filtering procedure with a set of D different impulse responses may be performed by the device whose functional diagram is shown in Fig. 5, where SSA is a harmonic sliding analyzer; MUi are multipliers,
i ∈ 1, D; ROM G*_i are ROMs holding the spectral readouts of the complex conjugated transfer functions for the corresponding filter channels; and AA_i are accumulating adders of the pair products.

Figure 5. Varying impulse responses.

In such a device, mutually overlapping series of input signal readouts are used, with the help of the SSA, to form continuously in time the series of sliding spectrum readouts F_q(p), q = 0, 1, 2, .... These values are fed simultaneously to all the MU_i. The second inputs of the MU_i are connected to the outputs of the ROM G*_i, which, synchronously with each p-th harmonic readout of the sliding spectrum, present the respective p-th harmonic readouts of the complex conjugated transfer-function spectrum. The products from the MU_i outputs are fed to the AA_i inputs, and at the end of each sliding step the sums (14) of the readouts r_i(q) are formed. As follows from Fig. 5, each filtering channel consists of one accumulating adder, one ROM G*_i, and one multiplier connected to the common output of the SSA. The channel structure is uniform throughout; the channels differ only in the contents of the transfer-function spectral-readout ROMs G*_i. Concrete implementations of the corresponding devices for multichannel filtering on the basis of SSA are given in Chajkovsky and Krakovsky (1983) for real signals and in Chajkovsky and Krakovsky (1985) for complex signals.

A criterion of effectiveness of the multichannel filter based on the SSA is the number of complex multiplications used to realize one sliding step over the set of D filtering channels. In accordance with algorithm (14) and the structure of the complex multichannel filter, the number α_s of complex multiplications per sliding step is the sum of the operations in the first processing stage (computing the sliding spectrum) and those in the second stage (D weightings and summations of the spectral components); thus α_s = P + DP = P(1 + D). The corresponding computing expenditure for the standard correlation procedure of complex multichannel filtering (15) is α_c = DN. Consequently, the effectiveness gain of multichannel filtering on the basis of SSA can be estimated by the ratio of the computing expenditures

$$B = \frac{\alpha_c}{\alpha_s} = \frac{DN}{P(1+D)}.\tag{16}$$
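The two processing stages can be checked numerically. The sketch below is a minimal Python illustration with random test data and assumed sizes (it is not one of the cited devices): a common sliding spectrum, maintained by recursion (2), drives D channels that each apply (14), and every channel output is compared with the direct correlation (15).

```python
# Sketch of multichannel filtering via (14): one shared sliding spectrum,
# then D weight-and-sum channels, checked against direct correlation (15).
import numpy as np

rng = np.random.default_rng(2)
N, D = 16, 3
x = rng.standard_normal(N + 50)             # input signal
g = rng.standard_normal((D, N))             # D different impulse responses
G = np.fft.fft(g, axis=1)                   # transfer-function readouts G_i(p)
W = np.exp(2j * np.pi * np.arange(N) / N)   # W_N^p for recursion (2)

F = np.fft.fft(x[:N]) / N                   # sliding spectrum of the first window
for q in range(N, len(x)):
    F = (F + (x[q] - x[q - N]) / N) * W     # first stage: P multiplications
    r = (F * np.conj(G)).sum(axis=1)        # second stage: D*P multiplications
    direct = g @ x[q - N + 1 : q + 1]       # correlation (15), channel by channel
    assert np.allclose(r.real, direct)
print("all D spectral-domain channels match the direct correlations")
```

For example, with N = 256, P = 16, and D = 8 channels, (16) gives B = 8 · 256 / (16 · 9) ≈ 14.2, i.e., roughly a fourteen-fold saving in complex multiplications per sliding step.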
The larger N is, and the smaller P is compared with N, the more advantageous filtering on the basis of SSA becomes. It is possible to show that, in the case of matched
(optimal) filtering, the number P practically coincides with the base of the useful signal, Q = FT. The greatest benefit is realized in matched filtering with P = 1; in this case the filter degenerates into D independent digital resonators. If N ≫ D, then

$$B_{\max} = \frac{DN}{1+D} \approx N.\tag{17}$$
As the order of the realized filter or the base of the filtered signal increases (P increases), the relative benefit decreases. In the limit, when P = N,

$$B_{\min} = \frac{D}{1+D} \approx 1,\tag{18}$$
and the effectiveness of the digital filter on the basis of SSA does not essentially differ from that of the correlation filter. It is important to note that the decrease in computing expenditures when using SSA leads to an almost proportional decrease in the hardware volume used. For example, when realizing D-channel filtering on the basis of the correlation convolution (15), it is necessary to use D multipliers and adders working in parallel, the speed of each being determined by the need to perform N multiplications and additions per sliding step. When realizing multichannel filters on the basis of SSA, the number of multipliers and adders needed to achieve the same productivity may be decreased by approximately N/P times, because each channel performs only P multiplications and additions. In the extreme case of organizing D digital resonators (P = 1), it is sufficient to use only the multiplier in the SSA, since the secondary processing in each channel degenerates completely and the SSA itself serves as the multichannel filter. When filtering simple and low-complexity signals under a high degree of a priori frequency uncertainty, when N ≫ P, the multichannel filter on the basis of SSA has advantages in hardware expenditure not only over correlation FIR filtering but also over the very effective filtering on the basis of the FFT (Rabiner and Gold, 1975). Such FFT-based multichannel filtering devices contain a direct Fourier transform unit whose output is connected to D filtering channels, each consisting of a weighting unit and an inverse Fourier transform unit. Without special buffer memory units, these devices form the filtered signal readouts irregularly in time, which distorts the time scale and the regularity of the output information; achieving stationary mode with such devices requires additional hardware expenditure. Furthermore, the presence of an inverse Fourier transform unit in each channel increases the weight, volume, and cost of the hardware. The devices of Chajkovsky and Krakovsky (1983, 1985), which realize the discussed filtering algorithm, are free of these drawbacks.
5. Recommendation on Spectrum Sliding Analysis Realization

It is also possible to increase the response speed of instant spectrum digital analyzers by using table multiplication (Sobolev, 1991); this may be used if there is enough memory for the appropriate tables. An A-D converter may be used with the aim of increasing the accuracy. A converter (Chajkovsky et al., 1988) with a Sample-Hold circuit (Krakovsky, 1985) allows an increase in both the accuracy and the speed. The project "Computer System for Precise Measurement of Analog Signals" (Litovchenko, 2003) showed that it is possible to achieve an accuracy of 16 bits with a sampling frequency of up to 2 MHz. The projects "Real Signal Instant Spectrum Digital Analyzer in the Information Processing Systems" (Sergienko, 2003) and "Complex Signal Instant Spectrum Digital Analyzer in the Information Processing Systems" (Smetaniuk, 2003) showed that, using Xilinx Virtex-2 programmable logic (http://www.xilinx.com), it is possible to implement the inventions of Krakovsky et al. (1991a) and Krakovsky et al. (1991b) with a sampling frequency of up to 3.9 MHz for a processing extract size N = 256 and an accuracy of 8 bits. Spectrum digital analyzers output only the Cartesian constituents of the complex spectrum samples; if only the modulus of these samples is required, the response speed can be increased by using approximate methods of modulus calculation based on comparison, shift, and summation (Krakovsky, 1983b, 1992).
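As one illustration of this class of approximations, the sketch below implements the classical "max plus half min" magnitude estimate, which needs only a comparison, a one-bit shift, and an addition; it is given as an assumed example of the comparison-shift-summation approach, not necessarily the exact method of the cited papers.

```python
# Shift-and-add style modulus approximation (assumed illustrative method):
# |re + j*im| is estimated as max(|re|, |im|) + min(|re|, |im|) / 2.
import numpy as np

def approx_modulus(re, im):
    a, b = np.abs(re), np.abs(im)
    return np.maximum(a, b) + np.minimum(a, b) / 2   # /2 is a 1-bit right shift

z = np.exp(1j * np.linspace(0, 2 * np.pi, 721))      # unit-modulus test vectors
err = approx_modulus(z.real, z.imag) - 1.0
print(f"maximum overestimate: {err.max():.3f}")      # about 0.118 (11.8 %)
```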
Acknowledgments

The author would like to thank Myoung An, Richard Tolimieri, and Benjamin Shenefelt for their help in preparing the paper for publication. All the material presented in this paper has been reproduced from the text Spectrum Sliding Analysis, Vladimir Ya. Krakovsky, Psypher Press, 2006, with explicit permission from the author and the publisher.
References

Chajkovsky, V. I. (1976) Real Time Spectrum Digital Analyzers Functional Peculiarities, Kiev (preprint/Ukraine SSR NAS Institute of Cybernetics; 76-39) (in Russian).
Chajkovsky, V. I. (1977) Spectrum Experimental Analysis Metrology Peculiarities, Kiev (preprint/Ukraine SSR NAS Institute of Cybernetics; 77-6) (in Russian).
Chajkovsky, V. I., Koval, V. F., Krakovsky, V. Ya., and Pikulin, V. S. (1977) Fourier Spectrum Analyzer, USSR Author Certificate 560232, May 30.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1979) Sliding Spectrum Discrete-analog Analyzer, USSR Author Certificate 666488, June 5.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1983) Multichannel Filter, USSR Author Certificate 995282, February 7.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1985) Multichannel Filter, USSR Author Certificate 1193778 A, November 11.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1986) Signal Discrete Filtration on the Basis of Spectrum Sliding Analysis, Kibernetika i vychislitelnaja tekhnika (71), 55–58.
Chajkovsky, V. I., Krakovsky, V. Ya., and Koval, V. F. (1978) Fourier Spectrum Digital Analyzer, USSR Author Certificate 614440, July 5.
Chajkovsky, V. I., Krakovsky, V. Ya., and Koval, V. F. (1988) A-D Converter, USSR Author Certificate 1401608 A2, May 7.
Cooley, J. W. and Tukey, J. W. (1965) An Algorithm for the Machine Calculation of Complex Fourier Series, Mathematics of Computation 19, 297–301.
Dillard, G. M. (1977) Method and Apparatus for Computing the Discrete Fourier Transform Recursively, USA Patent 4023028, May 10.
Halberstein, J. H. (1966) Recursive, Complex Fourier Analysis for Real-time Applications, Proceedings of the IEEE 54, 903.
Krakovskii, V. Ya. (1993) Digital Analyzers for Dynamic Spectral Analysis, Measurement Techniques (USA) 36(12), 1324–1330 (translation of Izmer. Tekh. (Russia) 36(12), 13–16 (1993)).
Krakovskii, V. Ya. (1997) Generalized Representation and Implementation of Speed-improvement Algorithms for Instantaneous Spectrum Digital Analyzers, Cybernetics and System Analysis 32(4), 592–597.
Krakovsky, V. Ya. (1980) Development and Investigation of Specialized Computing Devices for Electrical Signals Spectrum Sliding Analysis, Eng. Sci. Cand. Dissertation, NAS Institute of Cybernetics, Kiev (in Russian).
Krakovsky, V. Ya. (1981) Instant Spectrum Discrete-analog Analyzer, USSR Author Certificate 834576, May 30.
Krakovsky, V. Ya. (1983a) Selection of Sliding Spectrum Digital Analyzer Twiddle Factors Word Size, Izmeritelnaja tekhnika 10, 13–14 (in Russian).
Krakovsky, V. Ya. (1983b) Complex Quantities Modulus Approximate Determination, Metrology (4), 7–10 (in Russian).
Krakovsky, V. Ya. (1985) Sample-Hold Circuit, USSR Author Certificate 1185398 A, October 15.
Krakovsky, V. Ya. (1990) Algorithms for Increase of Speed of Response for Digital Analyzers of the Instant Spectrum, Kibernetika (5), 113–115 (in Russian).
Krakovsky, V. Ya. (1992) Complex Quantities Modulus Approximate Determination Errors, Metrology (4), 25–30 (in Russian).
Krakovsky, V. Ya. (1998) Spectrum Sliding Analysis Algorithms and Device, In Proceedings of the 1st International Conference on Digital Signal Processing and Its Application, June 30–July 3, Vol. I, Moscow, Russia, pp. 104–107.
Krakovsky, V. Ya. (2000) Harmonic Sliding Analysis Problem, In J. S. Byrnes (ed.), Twentieth Century Harmonic Analysis—A Celebration, NATO Science Series, II. Mathematics, Physics and Chemistry, Vol. 33, pp. 375–377.
Krakovsky, V. Ya. and Chajkovsky, V. I. (1984) Spectrum Sliding Analysis Peculiarities, Autometrija 6, 34–37 (in Russian).
Krakovsky, V. Ya. and Koval, V. F. (1982) Instant Spectrum Digital Analyzer, USSR Author Certificate 932419, May 30.
Krakovsky, V. Ya. and Koval, V. F. (1984) Instant Spectrum Digital Analyzer, USSR Author Certificate 1095093 A, May 30.
Krakovsky, V. Ya. and Koval, V. F. (1988a) Instant Spectrum Digital Analyzer, USSR Author Certificate 1377762 A2, February 29.
Krakovsky, V. Ya. and Koval, V. F. (1988b) Complex Signal Instant Spectrum Digital Analyzer, USSR Author Certificate 1406507 A2, June 30.
Krakovsky, V. Ya. and Koval, V. F. (1989) Instant Spectrum Digital Analyzer, USSR Author Certificate 1456904 A2, February 7.
Krakovsky, V. Ya., Koval, V. F., Kuznetsova, Z. A., and Eresko, V. V. (1991a) Real Signal Instant Spectrum Digital Analyzer, USSR Author Certificate 1563408 A2 (permission for publication N 267/22, 28.11.91).
Krakovsky, V. Ya., Koval, V. F., Kuznetsova, Z. A., and Eresko, V. V. (1991b) Complex Signal Instant Spectrum Digital Analyzer, USSR Author Certificate 1568732 A2 (permission for publication N 267/22, 28.11.91).
Lejtes, R. D. and Sobolev, V. N. (1969) Synthetic Telephone Systems Digital Modeling, Moscow, Svjaz (in Russian).
Litovchenko, L. S. (2003) Computer System for Precise Measurement of Analog Signals (Diploma Project), V. Ya. Krakovsky (project leader), Kyiv, National Aviation University (in Ukrainian).
Perreault, D. A. and Rich, T. C. (1980) Method and Apparatus for Suppression of Error Accumulation in Recursive Computation of a Discrete Fourier Transform, USA Patent 4225937, September 30.
Plotnikov, V. N., Belinsky, A. V., Sukhanov, V. A., and Zhigulevtsev, Yu. N. (1990) Spectrum Digital Analyzers, Moscow, Radio i svjaz (in Russian).
Rabiner, L. R. and Gold, B. (1975) Theory and Application of Digital Signal Processing, Englewood Cliffs, NJ, Prentice-Hall.
Schmitt, J. W. and Starkey, D. L. (1973) Continuously Updating Fourier Coefficients Every Sampling Interval, USA Patent 3778606, December 11.
Sergienko, P. V. (2003) Real Signal Instant Spectrum Digital Analyzer in the Information Processing Systems (Diploma Project), V. Ya. Krakovsky (project leader), Kyiv, National Aviation University (in Ukrainian).
Smetaniuk, S. V. (2003) Complex Signal Instant Spectrum Digital Analyzer in the Information Processing Systems (Diploma Project), V. Ya. Krakovsky (project leader), Kyiv, National Aviation University (in Ukrainian).
Sobolev, V. N. (1991) Dynamic Spectral Analysis Fast Algorithm, In All-Union ARSO-16 Seminar Reports Abstracts, Moscow State University, pp. 232–233 (in Russian).
Tolimieri, R. and An, M. (1998) Time-Frequency Representations, Basel, Birkhäuser.
INDEX
algorithms for instantaneous spectrum digital analyzers 249
automatic target recognition (ATR) 1, 13
batch posterior estimation 136
Bayesian estimation 130, 135, 137, 138
Bayesian inference 129
blind deconvolution 107, 109–111, 121, 124
bootstrap 142–144, 146, 147, 150
central limit theorem 135
complexity 10, 12, 15, 17, 27, 44, 66, 157, 158, 207–209, 212, 215, 223, 236, 258
cyclic autocorrelation 195, 198, 200, 203
cyclic cross correlation 195, 198
dynamic spectrum analysis 249
dynamics 207–224, 226, 227, 229–236
electrochemical sensors 63–65, 67
electronic tongue 63–65, 72, 74–76, 78–87, 89, 90
expectation 119, 132–135
filtering 5, 12, 39, 97, 129–132, 139–142, 146, 215, 251, 255–258
fingerprints 90, 243–247
finite support 134
Finite Zak Transform (FZT) 195, 198, 199
fractal boundaries 169–171
Gauss-Markov 97, 100, 101
Golay pair 203
ground penetrating radar 29, 34–36
image fusion 44, 107–109
imaging 1, 2, 6, 29, 30, 33, 36–39, 41–45, 75, 82–88, 107, 108, 243–247
importance sampling 129, 132–134, 136, 141–144, 146
Improvised Explosive Devices (IED) 49, 50
instant spectrum algorithms & analyzers 249–251
joint research 64
Kalman filter 97, 98, 103, 104, 129, 139, 141, 147
landmines 33, 49
laser 30, 243–245
Latex 49
likelihood ratio 14, 97–99
luminescence 243–247
Mach-Zehnder interferometer 151, 152, 155, 157
Markov 117, 120
Monte Carlo 129, 132, 215
multichannel filter 249, 257, 258
multichannel systems 107
multifractal analysis 169, 171, 172, 175, 178, 180, 181, 190
multi-perspective classification 1, 16, 18
NATO 49
natural processes 207, 208, 219, 231, 236, 238
Neyman-Pearson 97, 98
noise canceling 97, 99
noise model 97, 99
non-Gaussian 131, 144, 215
nonlinear statistical signal processing 129, 147
nonlinear time series analysis 208, 213, 218, 225
nonstationary noise 97
orthonormal basis 154, 196
own-ship noise 97, 101, 105
particle filter 129, 132, 139–142, 144, 145
passive bistatic radar 29, 30, 44, 45
pattern recognition 14, 63
permutation matrix 195, 196, 199, 203
permutation waveforms 199
photometry 49
point mass 140, 141
pointwise exponents 169, 171, 172, 174, 175
posterior distribution 130, 131, 134, 135, 137, 140, 141, 143, 146, 147
prediction distribution 132
probability distribution 77, 129, 130, 133, 138–141, 211
quantum computer 151, 157–161, 166
quantum information 151–154, 163, 165, 166
quantum mechanics 151, 153, 155, 156, 161, 163, 164, 166
radar imaging 1, 29, 30, 36
recursive algorithms 249–251
recursive detection 97, 98, 102
regularized energy minimization 107
remote sensing 29, 49, 50
sequential detection 97, 98
signal model 97, 98, 101, 103, 105
sliding spectrum algorithms & analyzers 249, 250
spectroscopy 49, 64, 65, 243, 245
spectrum sliding analysis 249, 259
star permutation 195, 196, 200–205
state-space model 129, 137, 138, 150
superresolution 32, 107, 109, 111, 117, 124
time reversal matrix 196
time-resolved 243–247
waveforms 3–5, 31, 34, 37, 41, 46, 195, 199
Zak space (ZS) 195, 196, 198–200
Zak transform 195