ADVANCES AND CHALLENGES IN MULTISENSOR DATA AND INFORMATION PROCESSING
NATO Security through Science Series

This Series presents the results of scientific meetings supported under the NATO Programme for Security through Science (STS). Meetings supported by the NATO STS Programme are in security-related priority areas of Defence Against Terrorism or Countering Other Threats to Security. The types of meeting supported are generally “Advanced Study Institutes” and “Advanced Research Workshops”. The NATO STS Series collects together the results of these meetings. The meetings are co-organized by scientists from NATO countries and scientists from NATO's “Partner” or “Mediterranean Dialogue” countries. The observations and recommendations made at the meetings, as well as the contents of the volumes in the Series, reflect those of participants and contributors only; they should not necessarily be regarded as reflecting NATO views or policy.

Advanced Study Institutes (ASI) are high-level tutorial courses to convey the latest developments in a subject to an advanced-level audience. Advanced Research Workshops (ARW) are expert meetings where an intense but informal exchange of views at the frontiers of a subject aims at identifying directions for future action.

Following a transformation of the programme in 2004 the Series has been re-named and reorganised. Recent volumes on topics not related to security, which result from meetings supported under the programme earlier, may be found in the NATO Science Series. The Series is published by IOS Press, Amsterdam, and Springer Science and Business Media, Dordrecht, in conjunction with the NATO Public Diplomacy Division.

Sub-Series
A. Chemistry and Biology (Springer Science and Business Media)
B. Physics and Biophysics (Springer Science and Business Media)
C. Environmental Security (Springer Science and Business Media)
D. Information and Communication Security (IOS Press)
E. Human and Societal Dynamics (IOS Press)

http://www.nato.int/science
http://www.springer.com
http://www.iospress.nl
Sub-Series D: Information and Communication Security – Vol. 8
ISSN: 1574-5589
Advances and Challenges in Multisensor Data and Information Processing
Edited by
Eric Lefebvre Lockheed Martin Canada, Montreal, Quebec, Canada
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC Published in cooperation with NATO Public Diplomacy Division
Proceedings of the NATO Advanced Study Institute on Multisensor Data and Information Processing for Rapid and Robust Situation and Threat Assessment Albena, Bulgaria 16–27 May 2005
© 2007 IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-58603-727-7
Library of Congress Control Number: 2007922628

Publisher
IOS Press, Nieuwe Hemweg 6B, 1013 BG Amsterdam, Netherlands
fax: +31 20 687 0019
e-mail: [email protected]

Distributor in the UK and Ireland
Gazelle Books Services Ltd., White Cross Mills, Hightown, Lancaster LA1 4XS, United Kingdom
fax: +44 1524 63232
e-mail: [email protected]

Distributor in the USA and Canada
IOS Press, Inc., 4502 Rachael Manor Drive, Fairfax, VA 22032, USA
fax: +1 703 323 3668
e-mail: [email protected]

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Preface

From the 16th to the 27th of May 2005, a NATO Advanced Study Institute entitled Multisensor Data and Information Processing for Rapid and Robust Situation and Threat Assessment was held in Albena, Bulgaria. This ASI brought together 72 people from 13 European and North American countries to discuss, through a series of 48 lectures, the use of information fusion in the context of defence against terrorism, which is a NATO priority research topic.

Information fusion resulting from multi-source processing, often called multisensor data fusion when sensors are the main sources of information, is a relatively young (less than 20 years) technology domain. It provides techniques and methods for: 1) integrating data from multiple sources and using the complementarity of this data to derive maximum information about the phenomenon being observed; 2) analyzing and deriving the meaning of these observations; 3) selecting the best course of action; and 4) controlling the actions. Various sensors have been designed to detect some specific phenomena, but not others. Data fusion applications can synergistically combine information from many sensors, including data provided by satellites and contextual and encyclopedic knowledge, to provide an enhanced ability to detect and recognize anomalies in the environment, compared with conventional means. Data fusion is an integral part of multisensor processing, but it can also be applied to fuse non-sensor information (geopolitical, intelligence, etc.) to provide decision support for a timely and effective situation and threat assessment.

One special field of application for data fusion is satellite imagery, which can provide extensive information over a wide area of the electromagnetic spectrum using several types of sensors (Visible, Infra-Red (IR), Thermal IR, Radar, Synthetic Aperture Radar (SAR), Polarimetric SAR (PolSAR), Hyperspectral...).
Satellite imagery provides the coverage rate needed to identify and monitor human activities, from agricultural practices (land use, crop type identification...) to defence-related surveillance (land/sea target detection and classification). Remotely sensed imagery acquired over earth regions that land sensors cannot access yields valuable information for the defence against terrorism.

Developed around these themes, the ASI's program was subdivided into ten half-day sessions devoted to the following research areas:

• Target recognition/classification and tracking
• Sensor systems
• Image processing
• Remote sensing and remote control
• Belief functions theory
• Situation assessment
The lectures presented at the ASI contributed greatly to the research and development of the multisensor data fusion based surveillance systems used for rapid and robust situation and threat assessment. The ASI gave all the participants the opportunity to interact and exchange valuable knowledge and work experience to overcome challenging issues in various research areas. This book summarizes the lectures that were given at this ASI.

An Advanced Research Workshop (ARW) related to this ASI was held in Tallinn, Estonia from June 27th to July 1st 2005. This ARW addressed data fusion technologies for harbour protection. More information on this event can be found at http://www.canadiannatomeetings.com.

I would like to thank all the lecturers who accepted the invitation to participate in the ASI. The time they spent preparing their lectures and their active participation were key factors in the ASI's success. I would also like to thank them for the summary papers they provided to make this book happen. I extend my thanks to all the attendees of the ASI for their interest and participation. A special acknowledgement goes to Kiril Alexiev, the co-director of this ASI, who initiated this project and was always very supportive. His tremendous help in the coordination of all events and logistics was much appreciated. My warm thanks go to Gayane Malkhasyan and Masha Ryskin, my administrative assistants and interpreters, who ensured that everything ran smoothly during the course of the ASI. I would also like to thank the officers from the Albena Congress centre office, in particular Mrs. Galina Toteva, for her extra assistance. I would like to thank Pierre Valin and Erik Blasch, who did the technical reviews of this book. Their judicious comments were very helpful. Very special thanks go to Kimberly Nash, who reviewed the papers and formatted the book. Thank you for your patience and all the time you spent increasing the quality of the book.
Finally, I wish to express my gratitude to NATO, which supported this ASI along with Lockheed Martin Canada, the Institute for Parallel Processing of the Bulgarian Academy of Sciences, Defence Research and Development Canada, the European Office of Aerospace Research and Development of the USAF and the National Science Foundation, without whom it would have been impossible to organize this event.

Eric Lefebvre
Montreal, Canada
Contents

Preface / Eric Lefebvre ... v
Sensor Data Fusion: Methods, Applications, Examples / Wolfgang Koch ... 1
Simulation of Distributed Sensor Networks / Kiril Alexiev ... 24
Joint Target Tracking and Classification via Sequential Monte Carlo Filtering / Donka Angelova and Lyudmila Mihaylova ... 33
A Survey on Assignment Techniques / Felix Opitz ... 41
Non-Linear Techniques in Target Tracking / Thomas Kausch, Kaeye Dästner and Felix Opitz ... 48
Underwater Threat Source Localization: Processing Sensor Network TDOAs with a Terascale Optical Core Device / Jacob Barhen, Neena Imam, Michael Vose, Arkady Averbuch and Michael Wardlaw ... 56
On Quality of Information in Multi-Source Fusion Environments / Eric Lefebvre, Melita Hadzagic and Éloi Bossé ... 69
Polarimetric Features and Contextual Information Fusion for Automatic Target Detection and Recognition / Yannick Allard, Mickael Germain and Olivier Bonneau ... 78
Enhancing Efficiency of Dynamic Threat Analysis for Combating and Competing Systems / Edward Pogossian, Arsen Javadyan and Edgar Ivanyan ... 85
Evidence Theory for Robust Ship Identification in Airborne Maritime Surveillance Missions / Pierre Valin ... 92
Improved Threat Evaluation Using Time of Earliest Weapon Release / Eric Ménard and Jean Couture ... 99
Detection of Structural Changes in a Multivariate Data Using Change-Point Models / David Asatryan, Boris Brodsky and Irina Safaryan ... 106
Unification of Fusion Theories (UFT) / Florentin Smarandache ... 114
Belief Functions Theory for Multisensor Data Fusion / Patrick Vannoorenberghe ... 125
Dempster-Shafer Evidence Theory Through the Years: Limitations, Practical Examples, Variants Under Conflict and a New Adaptive Combination Rule / Mihai Cristian Florea, Anne-Laure Jousselme and Dominic Grenier ... 148
Decision Support and Information Fusion in the Context of Command and Control / Éloi Bossé ... 157
Fusion in European SMART Project on Space and Airborne Mined Area Reduction / Isabelle Bloch and Nada Milisavljević ... 164
The DSmT Approach for Information Fusion and Some Open Problems / Jean Dezert and Florentin Smarandache ... 171
Multitarget Tracking Applications of Dezert-Smarandache Theory / Albena Tchamova, Jean Dezert, Tzvetan Semerdjiev and Pavlina Konstantinova ... 179
Image Registration: A Tutorial / Pramod K. Varshney, Bhagavath Kumar, Min Xu, Andrew Drozd and Irina Kasperovich ... 187
Automated Registration for Fusion of Multiple Image Frames to Assist Improved Surveillance and Threat Assessment / Malur K. Sundareshan and Mohamed I. Elbakary ... 211
Data Fusion and Image Processing: A Few Application Examples / Olivier Goretta and Francis Celeste ... 221
Secondary Application Wireless Technologies to Increase Information Potential for Defence Against Terrorism / Christo Kabakchiev, Vladimir Kyovtorov and Ivan Garvanov ... 236
Adaptive Image Fusion Using Wavelets: Algorithms and System Design / Stavri G. Nikolov, Eduardo Fernández Canga, John J. Lewis, Artur Loza, David R. Bull and C. Nishan Canagarajah ... 243
Methods for Fused Image Analysis and Assessment / Artur Loza, Timothy D. Dixon, Eduardo Fernández Canga, Stavri G. Nikolov, David R. Bull, C. Nishan Canagarajah, Jan M. Noyes and Tom Troscianko ... 252
Object Tracking by Particle Filtering Techniques in Video Sequences / Lyudmila Mihaylova, Paul Brasnett, C. Nishan Canagarajah and David Bull ... 260
Wavelets, Segmentation, Pixel- and Region-Based Image Fusion / John J. Lewis, Richard J. O'Callaghan, Stavri G. Nikolov, David R. Bull and C. Nishan Canagarajah ... 269
Data Fusion and Quality Assessment of Fusion Products: Methods and Examples / Paolo Corna, Lorella Fatone and Francesco Zirilli ... 277
Information Management Methods in Sensor Networks / Lyudmila Mihaylova, Andy Nix, Donka Angelova, David R. Bull, C. Nishan Canagarajah and Alistair Munro ... 307
A Novel Method for Correction of Distortions and Improvement of Information Content in Satellite-Acquired Multispectral Images / Vyacheslav I. Voloshyn, Volodymyr M. Korchinsky and Mykola M. Kharytonov ... 315
Multisensor Data Fusion in the Processes of Weighing and Classification of the Moving Vehicles / Janusz Gajda, Ryszard Sroka and Tadeusz Zeglen ... 324
Sensor Performance Estimation for Multi-Camera Ambient Security Systems: A Review / Lauro Snidaro and Gian Luca Foresti ... 331
Principles and Methods of Situation Assessment / Alan N. Steinberg ... 339
Higher Level Fusion for Catastrophic Events / Galina L. Rogova ... 351
Ontology-Driven Knowledge Integration from Heterogeneous Sources for Operational Decision Making Support / Alexander Smirnov, Michael Pashkin, Nikolai Chilov and Tatiana Levashova ... 359
Evaluation of Information Fusion Techniques Part 1 – System Level Assessment / Erik Blasch and Susan Plano ... 366
Evaluation of Information Fusion Techniques Part 2 – Metrics / Erik Blasch ... 375
Rapid and Reliable Content Based Image Retrieval / Dimo T. Dimov ... 384
Subject Index ... 397
Author Index ... 401
Sensor Data Fusion: Methods, Applications, Examples
Wolfgang Koch
At discrete times t_l a sensor delivers a set of measurements Z_l = \{z_l^1, \ldots, z_l^{n_l}\} along with their number n_l; the data accumulated up to time t_k are denoted by Z^k = \{Z_k, n_k, Z^{k-1}\}. Tracking means computing the conditional densities p(x_l | Z^k) of the target state x_l: filtering for l = k, prediction for l > k, retrodiction for l < k.

Filtering (update at time t_k):

  p(x_k | Z^k) = \frac{p(Z_k, n_k | x_k)\, p(x_k | Z^{k-1})}{\int dx_k\, p(Z_k, n_k | x_k)\, p(x_k | Z^{k-1})}.

The sensor model enters via the likelihood function \ell(x_k; Z_k, n_k) \propto p(Z_k, n_k | x_k). For a single measurement z_k with measurement matrix H_k and error covariance R_k it is Gaussian: \ell(x_k; z_k) \propto \mathcal{N}(z_k; H_k x_k, R_k).

Prediction (from t_{k-1} to t_k):

  p(x_k | Z^{k-1}) = \int dx_{k-1}\, p(x_k | x_{k-1})\, p(x_{k-1} | Z^{k-1}),

with the Gauss-Markov dynamics model p(x_k | x_{k-1}) = \mathcal{N}(x_k; F_{k|k-1} x_{k-1}, D_{k|k-1}), where F_{k|k-1} is the evolution matrix and D_{k|k-1} the process noise covariance. Under these assumptions the recursion maps Gaussians onto Gaussians, p(x_k | Z^k) = \mathcal{N}(x_k; x_{k|k}, P_{k|k}) (the Kalman filter), initialized at time t_0 by p(x_0 | Z^0) = \mathcal{N}(x_0; x_{0|0}, P_{0|0}) with initial covariance P_{0|0}.
Retrodiction (smoothing, l < k) propagates the effect of the full data set Z^k back to earlier times t_l. From p(x_l | Z^k) = \int dx_{l+1}\, p(x_l, x_{l+1} | Z^k) one obtains the backward recursion

  p(x_l | Z^k) = \int dx_{l+1}\, \frac{p(x_{l+1} | x_l)\, p(x_l | Z^l)}{\int dx_l\, p(x_{l+1} | x_l)\, p(x_l | Z^l)}\; p(x_{l+1} | Z^k),

which starts from the filtering density at t_k and runs backwards using the stored filtering densities p(x_l | Z^l). Schematically:

  prediction:   p(x_{k-1} | Z^{k-1}) \longrightarrow p(x_k | Z^{k-1})
  filtering:    p(x_k | Z^{k-1}) \longrightarrow p(x_k | Z^k)
  retrodiction: p(x_{l-1} | Z^k) \longleftarrow p(x_l | Z^k).

All Gaussian manipulations involved rest on a product formula:

  \mathcal{N}(z; Hx, R)\, \mathcal{N}(x; y, P) = \mathcal{N}(z; Hy, S)\, \mathcal{N}(x; y + W\nu,\; P - WSW^\top)

with the innovation \nu = z - Hy, the innovation covariance S = HPH^\top + R, and the gain W = PH^\top S^{-1}. Equivalently, the second factor can be written as \mathcal{N}(x; Q(P^{-1} y + H^\top R^{-1} z), Q) with Q^{-1} = P^{-1} + H^\top R^{-1} H.
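The product formula is exactly the Kalman update. As an illustrative sketch (not from the chapter; the scalar case is used so the gain and covariance formulas stay visible), prediction and filtering read:

```python
def predict(x, P, F, D):
    """Prediction p(x_k | Z^{k-1}): mean F*x, variance F*P*F + D."""
    return F * x, F * P * F + D

def update(x, P, z, H, R):
    """Filtering update p(x_k | Z^k) via the product formula."""
    nu = z - H * x        # innovation nu = z - H y
    S = H * P * H + R     # innovation variance S = H P H' + R
    W = P * H / S         # gain W = P H' S^{-1}
    return x + W * nu, P - W * S * W
```

In the scalar case the information form gives the same posterior variance, Q = 1/(1/P + H*H/R), matching the equivalent expression above.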
[Figure: evolution of the probability densities over the times t_{k-1}, t_k, t_{k+1}, t_{k+2}, showing the filtering densities p(x_k | Z^k) and p(x_{k+1} | Z^{k+1}), the prediction p(x_{k+2} | Z^{k+1}), its update to p(x_{k+2} | Z^{k+2}), and the retrodicted densities p(x_{k+1} | Z^{k+2}) and p(x_k | Z^{k+2}).]
Track extraction can be formulated as a sequential test between two hypotheses given the accumulated sensor data Z^k = \{Z_i\}_{i=1}^k:

  h_1: the data contain a target,
  h_0: the data contain false returns only,

with the decision probabilities P_1 = P(h_1 | h_1) (correct detection) and P_0 = P(h_1 | h_0) (false track). The test statistic is the likelihood ratio LR(k) = p(Z^k | h_1) / p(Z^k | h_0), which is compared with two thresholds A < B derived from P_0 and P_1:

  LR(k) < A:      accept h_0 (drop the tentative track),
  LR(k) > B:      accept h_1 (confirm the track),
  A < LR(k) < B:  defer the decision and process the next data set Z_{k+1}, yielding LR(k+1).
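A minimal sketch of such a sequential test on log-likelihood-ratio increments follows; the threshold formulas A = log(beta/(1-alpha)) and B = log((1-beta)/alpha) are the classical Wald approximations, an assumption here since the text only names the thresholds A and B:

```python
import math

def sequential_test(llr_increments, alpha=0.05, beta=0.05):
    """Sequential likelihood ratio test on log-LR increments.
    Returns the accepted hypothesis ('h0', 'h1' or 'undecided')
    and the number of frames processed before the decision."""
    A = math.log(beta / (1.0 - alpha))    # lower threshold: accept h0
    B = math.log((1.0 - beta) / alpha)    # upper threshold: accept h1
    llr = 0.0
    for k, inc in enumerate(llr_increments, start=1):
        llr += inc                        # log LR(k) = log LR(k-1) + increment
        if llr <= A:
            return "h0", k
        if llr >= B:
            return "h1", k
    return "undecided", len(llr_increments)
```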
W. Koch / Sensor Data Fusion: Methods, Applications, Examples

When n sensors with revisit intervals \Delta T_i and single-scan detection probabilities P_D^i, i = 1, \ldots, n, observe the same target, their contributions can be referred to a common interval \Delta T_c, during which sensor i contributes \Delta T_c / \Delta T_i scans. The cumulative detection probability of the fused system is then

  P_D^c(n) = 1 - \prod_{i=1}^n (1 - P_D^i)^{\Delta T_c / \Delta T_i},

where the common interval may be chosen according to 1/\Delta T_c = \sum_{i=1}^n 1/\Delta T_i, i.e. from the combined scan rate of the sensors. Besides the gain in detection probability, fusing n comparable sensors improves the achievable track accuracy roughly in proportion to 1/\sqrt{n}.
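The cumulative detection probability is straightforward to evaluate; a sketch (parameter names are illustrative):

```python
def cumulative_pd(pd, dt, dt_c):
    """P_D^c = 1 - prod_i (1 - pd[i])^(dt_c / dt[i]) for sensors with
    per-scan detection probabilities pd[i] and revisit intervals dt[i],
    referred to the common interval dt_c."""
    prod = 1.0
    for p, t in zip(pd, dt):
        prod *= (1.0 - p) ** (dt_c / t)
    return 1.0 - prod
```

For example, two sensors with per-scan P_D = 0.5 and one scan each per common interval already yield a cumulative probability of 0.75.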
A radar measures in polar coordinates, z_k = (r_k, \varphi_k)^\top with error covariance R = \mathrm{diag}[\sigma_r^2, \sigma_\varphi^2], while tracking is done in Cartesian coordinates, t[z_k] = r_k (\cos\varphi_k, \sin\varphi_k)^\top. Linearizing around the prediction x_{k|k-1},

  t(z_k) \approx t(x_{k|k-1}) + \left. \partial t[x_k]/\partial x_k \right|_{x_k = x_{k|k-1}} (z_k - x_{k|k-1}),

the Jacobian factorizes into a rotation D_\varphi and a scaling S_r:

  \frac{\partial t[x_k]}{\partial x_k} = \begin{pmatrix} \cos\varphi_k & -r_k \sin\varphi_k \\ \sin\varphi_k & r_k \cos\varphi_k \end{pmatrix} = \underbrace{\begin{pmatrix} \cos\varphi_k & -\sin\varphi_k \\ \sin\varphi_k & \cos\varphi_k \end{pmatrix}}_{D_\varphi} \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & r_k \end{pmatrix}}_{S_r}.

Evaluated at the predicted quantities \varphi_{k|k-1}, r_{k|k-1}, the measurement error covariance transforms as D_\varphi S_r R S_r^\top D_\varphi^\top = D_\varphi\, \mathrm{diag}[\sigma_r^2, (r_{k|k-1}\sigma_\varphi)^2]\, D_\varphi^\top.

For a static target observed by k independent measurements z_i with covariances R_i, the fused estimate is obtained recursively starting from x_{1|1} = z_1, P_{1|1} = R_1, or in closed form:

  x_{k|k} = P_{k|k} \sum_{i=1}^k R_i^{-1} z_i, \qquad P_{k|k} = \Big( \sum_{i=1}^k R_i^{-1} \Big)^{-1}.
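A sketch of the converted measurement covariance D_phi diag[sigma_r^2, (r sigma_phi)^2] D_phi^T, written out element-wise to stay dependency-free:

```python
import math

def polar_to_cartesian(r, phi, sigma_r, sigma_phi):
    """Linearized polar-to-Cartesian conversion: returns the Cartesian
    position and the 2x2 covariance D_phi diag[a, b] D_phi^T with
    a = sigma_r^2 (radial) and b = (r sigma_phi)^2 (cross-range)."""
    c, s = math.cos(phi), math.sin(phi)
    a = sigma_r ** 2
    b = (r * sigma_phi) ** 2
    pos = (r * c, r * s)
    cov = [[c * c * a + s * s * b, s * c * (a - b)],
           [s * c * (a - b), s * s * a + c * c * b]]
    return pos, cov
```

At phi = 0 the ellipse axes align with the coordinate axes; rotating the measurement by 90 degrees swaps the radial and cross-range variances, as expected.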
[Figure: cross-bearing geometry. Two sensors S_1 and S_2 observe objects O_1, O_2, O_3 at range r under the bearing angles \varphi and \pi - \varphi; each measurement carries the error covariance R.] For k measurements of equal covariance, R_i = R for i = 1, \ldots, k, the fused covariance reduces to P_{k|k} = R/k.
Limited sensor resolution: two closely spaced targets x_k^1, x_k^2 may produce, at time t_k, a single merged measurement referring to the group centroid,

  z_k^g = H_g x_k + u_k^g, \qquad H_g x_k = \tfrac{1}{2} H (x_k^1 + x_k^2), \qquad u_k^g \sim \mathcal{N}(0, R_g),

where H x_k^i = (r_k^i, \varphi_k^i)^\top, i = 1, 2, is the individual measurement function and R_g the covariance of the merged measurement. Whether the pair is unresolved depends on the separations \Delta r, \Delta\varphi compared with the resolution cell sizes \alpha_r, \alpha_\varphi: the pair is resolved with probability P_r(\Delta r, \Delta\varphi) = 1 - P_u(\Delta r, \Delta\varphi) and unresolved with probability

  P_u(\Delta r, \Delta\varphi) = \exp\!\big(-\log 2\, (\Delta r/\alpha_r)^2\big)\, \exp\!\big(-\log 2\, (\Delta\varphi/\alpha_\varphi)^2\big),

so that P_r = P_u = \tfrac{1}{2} at the cell boundary and P_u \to 1 for \Delta r/\alpha_r \ll 1, \Delta\varphi/\alpha_\varphi \ll 1. In terms of the joint state x_k and the difference map H_u x_k = H(x_k^1 - x_k^2), P_u can itself be written as a Gaussian:

  P_u(x_k) = |2\pi R_u|^{1/2}\, \mathcal{N}(0; H_u x_k, R_u), \qquad R_u = \tfrac{1}{2 \log 2}\, \mathrm{diag}[\alpha_r^2, \alpha_\varphi^2].

The likelihood \ell(Z_k, n_k | x_k) is a sum over resolution and data-interpretation events E_k. If the pair is unresolved and z_k^i \in Z_k is the merged measurement (event E_k^{ii}):

  \ell(Z_k, n_k, E_k^{ii} | x_k) \propto P_u(x_k)\, \mathcal{N}(z_k^i; H_g x_k, R_g) \propto \mathcal{N}\!\left( \begin{pmatrix} z_k^i \\ 0 \end{pmatrix};\; \begin{pmatrix} H_g \\ H_u \end{pmatrix} x_k,\; \begin{pmatrix} R_g & 0 \\ 0 & R_u \end{pmatrix} \right).

If the unresolved group is not detected at all (event E_k^{00}): \ell(Z_k, n_k, E_k^{00} | x_k) \propto P_u(x_k) \propto \mathcal{N}(0; H_u x_k, R_u). If both targets are resolved and z_k^i, z_k^j \in Z_k are their measurements (event E_k^{ij}):

  \ell(Z_k, n_k, E_k^{ij} | x_k) \propto [1 - P_u(x_k)]\, \mathcal{N}\!\left( \begin{pmatrix} z_k^i \\ z_k^j \end{pmatrix};\; \begin{pmatrix} H & 0 \\ 0 & H \end{pmatrix} x_k,\; \begin{pmatrix} R & 0 \\ 0 & R \end{pmatrix} \right),

with 1 - P_u(x_k) = 1 - |2\pi R_u|^{1/2}\, \mathcal{N}(0; H_u x_k, R_u).
W. Koch / Sensor Data Fusion: Methods, Applications, Examples

GMTI radar and the Doppler clutter notch: let p_k be the sensor position at time t_k, r_k the target position, x_k = (r_k^\top, \dot r_k^\top)^\top the target state, and e_k^p = (r_k - p_k)/|r_k - p_k| the unit vector from sensor to target. The target's radial velocity is

  h_n(r_k, \dot r_k; p_k) = (r_k - p_k)^\top \dot r_k / |r_k - p_k|.

Ground clutter is suppressed around the clutter notch h_c(x_k; p_k) = 0: targets whose radial velocity falls below the minimum detectable velocity (MDV) are not detected. This is modeled by a state-dependent detection probability,

  P_D(x_k; p_k) = P_d \Big( 1 - e^{-\log 2\, (h_n(x_k; p_k)/\mathrm{MDV})^2} \Big),

which is processed together with p(x_k | Z^k); a missed detection then carries information of its own, namely that h_n(x_k; p_k) is probably small.
Road-map assisted tracking: for a vehicle moving on a known road, the state can be formulated in road coordinates, x_k^r = (l_k, \dot l_k)^\top, with l_k the position along the road at time t_k and \dot l_k the scalar speed. Prediction is performed in road coordinates and then mapped onto the ground coordinate system via the road map l \mapsto \mathcal{R}:

  p(x_{k-1}^r | Z^{k-1}) \longrightarrow p(x_k^r | Z^{k-1}) \longrightarrow p(x_k^s | Z^{k-1}) \longrightarrow p(x_k^s | Z^k),

i.e. road-coordinate prediction, transformation into the ground-coordinate state x_k^s, and measurement update.
[Figure: GMTI tracking example of a road target over an area of about 25 km by 25 km. Left: the target trajectory (x [km], y [km]) from Start to End, with marked events (1)-(4): terrain obscuration, target stops, and clutter notch combined with terrain obscuration. Right: position error semi-axes [m] versus time [s] (up to 3500 s) for a conventional Kalman filter (mean error 364 (109) m) and for a Kalman filter with track-generated road (mean error 250 (46) m).]
Sensor management for a phased-array radar: at time t_k the beam is pointed at the direction b_k = (u_{k|k-1}, v_{k|k-1})^\top predicted from the track, while the true target direction is d_k = (u_k, v_k)^\top. The signal-to-noise ratio depends on the target range r_k and on the beam-pointing error via the beam form,

  SN_k = SN_0\, (r_k/r_0)^{-4}\, e^{-\log 2\, |d_k - b_k|^2 / b^2},

with beam width b (|d_k - b_k| = b is the half-power point) and reference values SN_0, r_0. The resulting detection probability is

  P_D(d_k, r_k; b_k) = P_{FA}^{1/[1 + SNR(d_k, r_k; b_k)]}.

A missed detection at time t_k is itself information: the predicted direction density p(d_k | Z^{k-1}) is updated according to

  p(d_k | \lnot D_k^1, Z^{k-1}) \propto [1 - P_D(d_k, r_k; b_k)]\, p(d_k | Z^{k-1}),

where \lnot D_k^1 denotes the first missed beam; after a second missed beam \lnot D_k^2 the density p(d_k | \lnot D_k^1, \lnot D_k^2, Z^{k-1}) is obtained in the same way, guiding the choice of the subsequent beam positions.
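A sketch of the mispointing loss and detection probability model with the symbols used above (SN_0 and r_0 are the reference SNR and range):

```python
import math

def detection_probability(pfa, sn0, r, r0, pointing_error, beam_width):
    """P_D = P_FA^(1/(1+SNR)) with SNR = SN0 (r/r0)^-4, attenuated
    by the Gaussian beam-shape loss exp(-log2 (e/b)^2)."""
    snr = sn0 * (r / r0) ** -4 * math.exp(
        -math.log(2.0) * (pointing_error / beam_width) ** 2)
    return pfa ** (1.0 / (1.0 + snr))
```

A well-pointed beam at the reference range gives a high detection probability, while a badly mispointed beam drives SNR toward zero and P_D toward P_FA.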
Simulation of Distributed Sensor Networks
K. Alexiev
Department of Mathematical Methods for Sensor Information Processing, Institute for Parallel Processing, Bulgarian Academy of Sciences
Abstract. Sensor networks have emerged as fundamentally new tools for monitoring spatially distributed phenomena. They incorporate the most progressive ideas from several areas of research: computer networks, wireless communications, grid technologies, multiagent systems and network operating systems. In spite of the great interest centered on sensor data processing and information fusion, the simulation of the entire multisensor network remains very important for the optimal solution of many tasks relevant to joint data processing and data transmission between sensor nodes. The current state of design automation tools does not meet the needs of contemporary design. The main purpose of this paper is to outline the structure of a simulation tool for modeling dynamic self-organizing heterogeneous sensor networks. The focus is on the modeling of the different components and on data flow simulation in the sensor networks.
Keywords. Modeling, Sensor networks
Introduction

Sensor networks (SN) are useful tools for monitoring spatially distributed phenomena. A sensor network may consist of homogeneous or heterogeneous sensors spread over a global surveillance volume, which act jointly to optimally solve the required tasks. Video cameras, acoustic microphone arrays, thermal imaging systems, seismic or magnetic sensing devices, microwave or millimeter wave radar systems, laser radar systems, etc., based on different sensing principles such as mechanical, chemical, thermal, electrical, chromatographic, magnetic, biological, fluidic, optical, acoustic, ultrasonic and mass sensing, are used for monitoring. With advances in low-power processors, the Internet, mobile communication and micro-mechanical systems, the development of networks of numerous tiny, intelligent, wirelessly networked sensor nodes has become more cost effective, allowing these networks to penetrate deeply into the consumer market. SN are now used by the armed forces, the coast guard, the police, fire departments, environmental protection, etc. SN can be deployed on transport vehicles, in hospitals and in factories. The “Smart home” project envisions that nearly all home tools will be equipped with a variety of sensors, communicating with each other and with the outside world. SN could also help society through health care, by monitoring vital signs and other indicators of the aged and chronically ill. Using real-time controls combined with databases could enhance treatment and reduce medical costs. The quick proliferation of SN requires appropriate tools for SN research and development. Digital computer simulation gives a picture of expected system performance and helps optimize sensor deployment and collaborative information processing. Analysis of test results under different environmental conditions and scenarios increases the robustness of the overall system at a reasonable cost.

¹ The research reported in this paper is partially supported by the Bulgarian Ministry of Education and Science under grants No I-1202/2002 and No I-1205/2002.
1. Fundamentals of Sensor Networks

SN architecture can be presented as communicating sensor nodes. The appearance, development and fast growth of SN are based on the rapid progress of information and communication technologies, and the means for SN simulation are heirs of the modeling tools from these areas.

1.1. Computer Networks

The Atanasoff-Berry Computer was the world's first electronic digital computer, built at Iowa State University during 1937-1942 [15]. On October 19, 1973, US Federal Judge Earl R. Larson signed his decision declaring the Electronic Numerical Integrator and Computer (ENIAC) patent No 3120606 of Mauchly and Eckert invalid, and naming Atanasoff the inventor of the electronic digital computer. Atanasoff's invention had no patent rights, and the court deemed his device public knowledge. One important consequence of this decision is that it prevented patent rights from being applied to computer architecture, which in turn led to the quick growth of the computer industry. In 1972, Metcalfe developed the Ethernet system to interconnect several computers. In spite of its simplicity and robustness, Ethernet cannot serve large networks on the strength of its addressing scheme alone. Information flow in large networks can, however, be controlled at the third and fourth levels of the Open System Interconnection (OSI) model by devices called routers. The addressing scheme of the third level has a hierarchical structure and is called the Internet Protocol (IP). The Transmission Control Protocol (TCP) adds Layer 4 connection-oriented reliability services to connectionless IP communications. In a dynamically changing network, routers require special types of networking protocols, called routing protocols, which spread adequate and consistent information about the current state of the network topology between routing devices.
The progress of conventional computer networks has created all the necessary hardware and software prerequisites for sensor networks built on a fixed infrastructure.

1.2. Wireless Local Area Networks (WLAN)

The mass production and distribution of wireless phones has led to progress in wireless technology and created technical solutions that can easily be implemented in wireless computer networks. These networks inherit the traditional problems of wireless and mobile communications, such as bandwidth optimization, power control, and transmission quality enhancement. In addition, their multi-hop nature and the possible lack of a fixed infrastructure introduce new research problems, such as
network configuration, device discovery, and topology maintenance, as well as ad hoc addressing and self-routing. The most valuable characteristic of these networks is their ad hoc availability, a property well suited to sensor networks. Standard 802.11 defines two types of WLAN: infrastructure mode and ad-hoc mode [16].

1.3. New Protocols

SN collaboration assumes a specific protocol stack that concerns all layers of the OSI model [11,12,13,14]. The physical layer is responsible for frequency selection, carrier frequency generation, signal detection, modulation and data encryption. In WLAN the sensor transmitter antenna is often placed near the ground and works in a diffraction zone. As a result, long-distance wireless communication can be expensive, both in terms of energy and implementation complexity. The choice of a good modulation scheme is also critical for reliable communication in a sensor network. While an M-ary modulation scheme can reduce the transmit on-time by sending multiple bits per symbol, it results in complex circuitry and increased radio power consumption; the binary modulation scheme is more energy efficient. The data link layer is responsible for the multiplexing of data streams in the LAN, data frame detection, Medium Access Control (MAC), and error control. The traditional CSMA protocol is not optimal because of its assumption of stochastically distributed traffic and independent point-to-point flows, whereas SN traffic is highly correlated and dominantly periodic. TDMA-based communication dedicates the full bandwidth to a single sensor node, while FDMA-based communication allocates the minimum signal bandwidth per node. Several hybrid TDMA–FDMA schemes have been proposed to lower synchronization costs and optimize channel width: if the transmitter consumes more power, a TDMA scheme is favored, while the scheme leans toward FDMA when the receiver consumes more power.
Demand-based MAC schemes may be unsuitable for SN due to their large messaging overhead and link set-up delay. Network layer. IP is almost the only routed protocol used in contemporary computer networks and SN. The number of routing protocols used in computer networks is less than 10, but the situation with routing protocols in wireless networks is far more complicated, because of both their number and the different criteria for delivery optimization. Routing protocols can be classified into different groups according to their characteristics. The first classification scheme divides the routing protocols into flat-based groups of nodes with equal functionality, hierarchical-based groups, where the nodes play different roles in the network, and location-based groups, where the sensor nodes' positions are exploited to route data in the network. Another classification scheme divides routing protocols into multipath-based, query-based, negotiation-based, QoS-based, or coherent-based routing techniques, depending on the protocol operation. A third scheme classifies the routing protocols into three categories (proactive, reactive, and hybrid), depending on how the source finds a route to the destination. In proactive protocols, all routes are computed before they are really needed, while in reactive protocols, routes are computed on demand. Hybrid protocols use a combination of these two ideas. The number of proposed sensor routing protocols continuously increases. Transport layer. SN protocols do not always require reliable packet delivery (if guaranteed delivery is too energy prohibitive). One possible solution is to split TCP into two parts: one for the connection of sensor nodes with the stationary node, and
classical TCP for the information transmission between stationary nodes and to the querying node. Sometimes a simple UDP is used for the first part of the communication. Application layer. SN protocols like the Sensor management protocol set up the rules for data aggregation, sensor node clustering and the addressing scheme. The Task assignment and data advertisement protocol controls interest dissemination from the querying node to the whole SN or to a problem-specific subset of the sensors. The Sensor query and data dissemination protocol determines the interfaces and rules that an application has to use to issue queries, respond to queries and collect incoming replies.

1.4. Multiple Agent Systems

A Multiple Agent System (MAS) can be defined as a loosely coupled network of problem solvers (agents) that interact to solve problems beyond the individual capabilities or knowledge of a single problem solver [10]. These problem solvers are functionally specific, modular, autonomous components, usually heterogeneous in nature. An agent is considered to consist of different modules, such as a planning module, communication module, coordination module, task-reduction module, scheduling module, execution monitoring module, exception-handling module, etc. The agents also use different protocols to exchange information and to solve problems cooperatively. In a MAS there is no global control, so there is potential for disparities and inconsistencies; it is very important to detect conflicts and resolve them correctly and in time.

1.5. Grid Technology

When networking was in its incipient stage, communications were limited to a narrow bandwidth of 56 Kbps. Now standard (optic and cable) communications permit data transmission at 10 Gbps and more, and are no longer the bottleneck of distributed computing. Grid systems use the resources of remote nodes as the resources of one virtual scalable supercomputer [17].
The first grid system was developed in 1989 in the project CASA, where IP and the Message Passing Interface were used. The projects FAFNER, I-WAY, Globus, Legion, Condor, Nimrod, NEOS, NetSolve, Horus, etc. addressed various problems in communication, resource management and distributed data processing. The first standards for grid computing, the Open Grid Services Architecture and the Grid Remote Procedure Call, enabled the integration of services and resources across distributed, heterogeneous, dynamic virtual organizations.

1.6. Network Operating System (NOS)

A computer OS is a software package that enables computer hardware to be used, provides hardware integrity from a system point of view, runs user application software and ensures a suitable user interface. Usually an OS performs these functions for a single user at a time. A NOS distributes these functions over multiple networked devices and shares resources across the network. Most network operating systems are built around a client-server model: the server provides resources and services to one or more clients by means of a network. Microsoft Windows NT and 2000/2003, Linux, UNIX, and Mac OS are the most popular NOS today. The NOS and servers act as the central point of the network and the main repositories of resources. This creates a single point of failure
28
K. Alexiev / Simulation of Distributed Sensor Networks
in the network. The International Organization for Standardization (ISO) created organization, information, communication, and functional models for network management. The organization model describes the components of network management, such as the manager, the agent and their relationships. The information model concerns the structure and storage of network management information. The communication model deals with the communication between the agent and the manager. The functional model addresses the network management applications that reside on the network management station.
2. Synchronization of Sensor Data and Principal SN Components

Sensor data must be synchronized in time and space. This means that every sensor packet must be labeled with a time and a space stamp to determine when and where the data were measured. Geo-synchronization is also necessary for communication purposes: a source must know the geographical position of any destination to which it wishes to send data, and must label the packets for that destination with its position. Synchronization can be performed either locally or globally, with suitably chosen protocols or services. Examples of sensor data synchronization protocols are the Grid location service for ad hoc networks with dynamic nodes and the Network Time Protocol. The SN can be simulated using four principal components: models of the sensors used, an environment model, sensor signal and data processing, and communication modeling. These components are included in a common framework with suitable graphical user interfaces and a section for performance evaluation.
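To make the time and space stamping concrete, a sketch follows; the packet field names are our own illustrative choice, and the offset formula is the classical two-way exchange estimate used by NTP-style synchronization:

```python
from dataclasses import dataclass

@dataclass
class SensorPacket:
    """A measurement labeled with time and space stamps (hypothetical layout)."""
    sensor_id: int
    timestamp: float            # local measurement time [s]
    position: tuple             # (x, y, z) of the measuring node [m]
    payload: bytes              # raw or processed measurement data

def ntp_offset(t1, t2, t3, t4):
    """Two-way clock-offset estimate: t1/t4 are the client's send/receive
    times, t2/t3 the server's receive/send times (symmetric-delay assumption)."""
    return ((t2 - t1) + (t3 - t4)) / 2.0

# Example: the client clock lags the server by 5 s, with 0.1 s one-way delays
offset = ntp_offset(t1=100.0, t2=105.1, t3=105.2, t4=100.3)
assert abs(offset - 5.0) < 1e-9
```

Once each node knows its offset to a reference clock, the `timestamp` field can be corrected before fusion.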
3. Modeling

SN simulation is regarded as the only systematic tool for the detailed analysis of complex systems [2,4,6]. It solves problems arising from the dynamic allocation of sensors at random or a priori known moments and from the disappearance of sensors. But it is important not to overstate the significance of modeling. There is a very useful saying to return us to reality: “Computers can take you farther than you really are!”

3.1. Sensor Models

Sensors are sources of information about an ill-defined and chaotic reality and about targets of interest. The mathematical representation of a sensor includes generating received measurements in a time-surveillance volume domain, considering the technical sensor characteristics. A sensor node may also include communication and processing software and the platform on which the sensor is embedded (Figure 1). The most important sensor characteristics are the field of view (FOV), the maximal and minimal detection ranges D_max / D_min, the probability of correct/incorrect detection, the measurement rate F_m, and the measurement noise characteristics N = N(p_N, m_N, σ_N), where p_N is the noise probability density function with corresponding parameters m_N and σ_N. When the sensors are embedded on moving platforms, every measurement is taken at a different point in space. The parameters of a moving platform for trajectory-based modeling are initial platform position data
K = K(x_0, y_0, z_0, v_x0, v_y0, v_z0, a_x0, a_y0, a_z0) and the platform trajectory, which can be described by the turn points' positions and the corresponding velocities and accelerations at those points, K = K(x_i, y_i, z_i, v_xi, v_yi, v_zi, a_xi, a_yi, a_zi). The probability model of sensor measurements can then be expressed by the equation M = M(FOV, D_max, D_min, F_m, N, K, t), which implicitly includes all measured physical phenomena in the detection area of the sensor.
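A minimal sketch of such a probabilistic measurement model for a 2-D range-bearing sensor follows; the function and parameter names are our own illustrative choices, not prescribed by the text, and the platform motion K simply enters through the sensor position at each measurement time:

```python
import math
import random

def simulate_measurement(sensor_xy, target_xy, fov_deg=360.0,
                         d_min=100.0, d_max=50_000.0, p_d=0.9,
                         sigma_d=50.0, sigma_b=0.01, rng=random):
    """Return a noisy (range, bearing) measurement, or None for a missed
    detection; implements the M(FOV, Dmax, Dmin, ..., N, K, t) idea."""
    dx = target_xy[0] - sensor_xy[0]
    dy = target_xy[1] - sensor_xy[1]
    d = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Outside the range limits or the field of view: no measurement at all
    if not (d_min <= d <= d_max) or abs(math.degrees(bearing)) > fov_deg / 2:
        return None
    # Detection with probability p_d; additive Gaussian measurement noise
    if rng.random() > p_d:
        return None
    return (d + rng.gauss(0.0, sigma_d), bearing + rng.gauss(0.0, sigma_b))
```

Calling this once per measurement interval 1/F_m, with `sensor_xy` updated along the platform trajectory, yields the time-surveillance volume stream of measurements described above.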
Figure 1. Generalized architecture of sensor node
Figure 2. Radar simulator
3.2. Environment Modeling

We have to construct a versatile environment in which the SN can be studied. The environment employs a wide range of models to orchestrate and simulate realistic scenarios. The environment is the set of noise and useful signals (from the processes of interest) received by the sensors. There are two possible sources of this input information. The first uses data measured by real sensors in the original environment. The limitations caused by data modeling are avoided, but there is a strong drawback: it is very difficult, dangerous, expensive and sometimes impossible to exercise the algorithms under study in a complex scenario. Such a scenario is improbable, but not impossible, and can occur in real-life critical situations. A second drawback is that the information we have about the targets of interest is received by the same or other sensors with limited accuracy. As a result, inexact knowledge about the targets is used for the estimation of sensor characteristics and of the corresponding estimation algorithms. The true target state parameters are unknown in practice, or they are measured with limited accuracy, which is insufficient for evaluation. In this case the researcher does not have exact reference data for the accurate evaluation of the explored algorithm [2]. The second source of input information is data that are entirely generated. This approach has considerable flexibility in the generation of complex target and clutter scenarios and provides an a priori known reference input, but generated data only approximate real sensor data at an appropriate level of abstraction.
Clutter can have a natural or artificial origin. It is easy to explain clutter in the case of imaging sensors. The received image consists of objects of interest and a so-called background. The objects appear, move and disappear above a constant or variable background. Usually this background is natural. The environment clutter also includes weather conditions: rain, snowfall, the influence of the sun, moon or stars, water vapor, day or night, etc. Sometimes an artificial signal is generated to deceive the enemy sensors, and the corresponding clutter has specific characteristics. Another useful feature of the simulation program is the possibility of generating input sensor information with randomized parameters for the statistical evaluation of algorithms and their parameters.

3.3. Sensor Processing Algorithms

Processing algorithms can be grouped into signal processing algorithms, data processing algorithms and information processing algorithms. Sensor signal processing algorithms filter noisy raw data and locate, detect, or recognize objects of interest. They have to reject all unnecessary information. The choice of detection threshold and correlation detection algorithm determines the characteristics of the whole sensor system. For example, for a radar sensor system, a low detection threshold reduces the possibility of undetected targets, but the larger volume of data requires more effective track initiation and estimation algorithms. In the case of an imaging system, a low detection threshold essentially impacts subsequent feature extraction, but a high detection threshold increases the probability of losing feature information. Sensor data processing algorithms estimate object parameters. In the case of a single measured object, data processing algorithms are based on statistical estimation procedures [1-3]. The only problem to be solved is outlier detection. A more complicated case is when there are several measurements with unknown origin.
This problem is known as an assignment problem for point objects or a registration problem for imaging sensors. The complexity of the task grows combinatorially (it is NP-hard) with the number of measurements. Sometimes data processing algorithms are also regarded as fusion algorithms because they process a sample of measurements. The most popular algorithms are the α−β, α−β−γ, Kalman and extended Kalman filters, AR and ARMA estimators, different transformations, correlation algorithms, template matching algorithms, the Sobel, Prewitt, Canny, Roberts, Laplacian, Hough and Radon edge detection methods, the nearest-neighbor association algorithm, different kinds of probabilistic association rules, hybrid estimation procedures such as interacting multiple model estimation, etc. [1-9]. Sensor information processing algorithms serve as a source of object ID information. They are based on a priori knowledge about the objects of interest stored in a database and on a variety of decision rules, such as Bayesian or Dempster-Shafer rules. A SN provides an advantage if and only if it is able to dynamically use the received signals, data and information of all sensors as a whole, not just as a mere collection of individual sensors. This collaborative information processing is often called fusion [4-6]. Fusion can be very different and depends on the level of the source information and the type of sensors. Competitive sensors provide independent measurements of the same information regarding a physical phenomenon. Competing sensors can be identical or can use different methods to measure physical attributes. Sensors are generally put in a competitive configuration to provide greater reliability or fault tolerance. Competitive sensors are most often used for mission-critical components to provide a more robust and reliable system. Sensors are
complementary when they do not depend on each other directly but can be combined to give a more complete image of the phenomena being studied. In general, fusing complementary data is easy: the data from independent sensors can be appended to each other, providing a more complete mapping of the physical attributes being studied.

3.4. Communication

Distributed information processing is impossible without the timely delivery of two types of information. First, the sensors have to exchange measured signals, data and information for cooperative information processing; this is the main source of channel load. Efficient use of bandwidth requires the exchange of filtered information only, but the use of raw information can increase the detectability of objects of interest and can reduce estimation errors. Second, the processing nodes have to be provided, by routing protocols, with information giving a complete view of the SN topology. Event-driven updates allow efficient use of bandwidth and faster convergence. The sensors process the received routing information about SN states and build a database. When the routing process completes successfully, all sensors in the network have a consistent database.

3.5. Examples

The General Purpose Simulation System (GPSS) is a system that originates from the Geoffrey Gordon simulator (1959) and is still available off the shelf. GPSS models well statistical and control-flow based applications, where events can be modeled in discrete time units. Petri nets were introduced by Carl Adam Petri in his PhD thesis (1962) as a special class of generalized graphs or nets [18]. This was the first modeling and analysis tool well suited to the study of discrete parallel event systems. A Petri net is a mathematical description of the system structure that can then be investigated analytically. Prof. Atanasov from BAS and his group further developed this theory (Generalized Nets) and programmed a shell for complex system modeling [19].
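The basic Petri net firing rule on which such tools build can be sketched in a few lines (a toy illustration of plain place/transition nets, not the Generalized Nets shell of [19]):

```python
# Toy Petri net: places hold token counts; a transition fires when every
# input place has at least one token, consuming one token from each input
# place and producing one token in each output place.
def enabled(marking, transition):
    inputs, _ = transition
    return all(marking[p] >= 1 for p in inputs)

def fire(marking, transition):
    inputs, outputs = transition
    assert enabled(marking, transition), "transition not enabled"
    m = dict(marking)          # markings are immutable from the caller's view
    for p in inputs:
        m[p] -= 1
    for p in outputs:
        m[p] += 1
    return m

# Producer/consumer example: the transition moves a token from 'buffer'
# to 'consumed', modeling one discrete event in the system
marking = {"buffer": 2, "consumed": 0}
t = (["buffer"], ["consumed"])
marking = fire(marking, t)
assert marking == {"buffer": 1, "consumed": 1}
```

Concurrency appears when several transitions are simultaneously enabled on disjoint places, which is exactly what makes the formalism suitable for discrete parallel event systems.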
The radar simulator (Figure 2, [9]) has an embedded scenario generator and is used for radar signal and data processing. The Network Simulator (NS-2) is a discrete event simulator targeted at networking research, supported by DARPA. NS provides substantial support for the simulation of TCP, routing, and multicast protocols over wired and wireless networks. NetLogger is a methodology that enables the real-time diagnosis of performance problems in distributed systems. The methodology includes tools for generating time-stamped event logs that can be used to provide detailed end-to-end application and system level monitoring, and tools for visualizing the log data and the real-time state of the distributed system. The Maryland Routing Simulator (MaRS) is a flexible platform developed specifically for the evaluation and comparison of network routing algorithms. MaRS was previously used for the comparative evaluation of link-state and distance-vector routing protocols. QualNet WiFi was released by Scalable Network Technologies as a WLAN simulation tool for the interaction between the MAC and physical layers of wireless
networks. QualNet WiFi comes with a WiFi-specific library of network protocols as well as the Animator, Designer, Analyzer, Tracer and Simulator modules. GloMoSim builds a scalable simulation environment for WLAN and possesses parallel discrete-event simulation capability.

References

[1] Y. Bar-Shalom, Multitarget Multisensor Tracking: Advanced Applications, Artech House, 1990.
[2] A. Farina, F.A. Studer, Radar Data Processing, John Wiley & Sons, 1985.
[3] S. Blackman, Multiple Target Tracking with Radar Applications, Artech House, 1986.
[4] K. Alexiev, Modeling of Sensor Networks and Collaborative Information Processing in Time-Space Domain, International Conference Automatics and Informatics'03, 6-8 October 2003, Sofia, Bulgaria, pp. 33-36.
[5] R.R. Brooks, S.S. Iyengar, Multi-Sensor Fusion: Fundamentals and Applications with Software, Prentice Hall, New Jersey, 1998.
[6] T. Horney, M. Brännström, M. Tyskeng, J. Mårtensson, GeeWah Ng, M. Gossage, WeeSze Ong, HueyTing Ang, KheeYin How, Simulation framework for collaborative fusion research, Int. Conf. Fusion'04, Sweden, pp. 214-218.
[7] D. McMichael, G. Jarrad, S. Williams, M. Kennett, Modelling, simulation and estimation of situation histories, Int. Conf. Fusion'04, Sweden, pp. 928-935.
[8] K. Alexiev, E. Djerassi, L. Bojilov, Flight object modeling in radar surveillance volume, Sixth International Conference "Systems for Automation of Engineering and Research" (SAER'92), St. Konstantin, Varna, Bulgaria, 1992, pp. 316-320 (in Bulgarian).
[9] K. Alexiev, A MATLAB Tool for Development and Testing of Track Initiation and Multiple Target Tracking Algorithms, Information & Security: An International Journal, "Sensor Data Fusion", Vol. 9, 2002, pp. 166-174.
[10] M.N. Huhns, M.P. Singh, Agents and Multi-agent Systems: Themes, Approaches, and Challenges, in: Readings in Agents, M.N. Huhns, M.P. Singh (Eds.), Morgan Kaufmann, San Francisco, CA, 1998, pp. 1-23.
[11] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensor networks: a survey, Computer Networks 38 (2002) 393-422.
[12] L.P. Clare, G.J. Pottie, J.R. Agre, Self-Organizing Distributed Sensor Networks, Proc. SPIE, vol. 3713, April 1999, pp. 229-238.
[13] N. Bulusu, J. Heidemann, D. Estrin, GPS-less low cost outdoor localization for very small devices, IEEE Personal Communications Magazine 7(5):28-34, October 2000.
[14] J.-H. Chang, L. Tassiulas, Energy conserving routing in wireless ad-hoc networks, Proc. INFOCOM, 2000.
[15] http://www.britannica.com/eb/article-216038?hook=723621#723621.hook
[16] IEEE 802.11 Standard.
[17] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Data Sets, J. Network and Computer Applications, 2001.
[18] C.A. Petri, Fundamentals of a Theory of Asynchronous Information Flow, Proc. of IFIP Congress 62, North Holland, Amsterdam, 1963, pp. 386-390.
[19] K.T. Atanasov, Generalized Nets and Systems Theory, Academichno izdatelstwo, 1997.
Advances and Challenges in Multisensor Data and Information Processing
E. Lefebvre (Ed.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Joint Target Tracking and Classification via Sequential Monte Carlo Filtering¹

D. ANGELOVA ᵃ, L. MIHAYLOVA ᵇ

ᵃ Bulgarian Academy of Sciences, 25A Acad. G. Bonchev Str, 1113 Sofia, Bulgaria
ᵇ Department of Electrical and Electronic Engineering, University of Bristol, UK
Abstract. This paper addresses the problem of joint tracking and classification (JTC) of maneuvering targets via sequential Monte Carlo (SMC) techniques. A general framework for the problem is presented within the SMC setting. An SMC algorithm, namely a Mixture Kalman Filter (MKF), is developed which accounts for speed and acceleration constraints. The MKF is applied to airborne targets: commercial and military aircraft. The target class is modeled as an independent random variable which can take values over the discrete class space with equal probability. A multiple-model structure in the class space is implemented, which provides reliable classification. The performance of the proposed MKF is evaluated by simulation over typical target scenarios.

Keywords: joint tracking and classification, sequential Monte Carlo methods, mixture Kalman filtering
Introduction

Joint Target Tracking and Classification (JTC) deals with determining the identity of a target while tracking it. Classification or identification of the target involves determining the type of the target, e.g., bomber, commercial aircraft, fighter, or helicopter. Different methods have been developed to solve this problem: Kalman filtering, Interacting Multiple Model (IMM) techniques [1,2], Monte Carlo methods [3-5], and belief function techniques (e.g., based on the transferable belief model [6]). Many of the IMM and Kalman filtering techniques suffer from the limitations of this framework, which is restricted to linear models and Gaussian processes. The sequential Monte Carlo (SMC) techniques are much more general and can incorporate the nonlinear constraints that are typical of the JTC problem. However, these methods are computationally expensive. SMC methods [3-5,7] are very suitable for classification purposes because they can easily cope with the highly nonlinear relationships between state and class (feature) measurements and with non-Gaussian noise processes. An example of a successful application of this approach to littoral tracking is given in [4]. A multiple model particle filter and a Mixture Kalman Filter (MKF) for JTC are developed in [7]. The classification in [7] is carried out by processing kinematic measurements only, primarily in the air surveillance context. The features of the proposed algorithms include the following: for each target class a separate filter is

¹ Partially supported by the UK MOD Data and Information Fusion DT Center and the Bulgarian Foundation for Scientific Investigations under grants I-1202/02 and I-1205/02.
34
D. Angelova and L. Mihaylova / Joint Target Tracking and Classification
designed; the class filters operate in parallel, covering the class space; each class-dependent filter represents a switching multiple model filtering procedure, covering the kinematic state space of the maneuvering target. This kind of multiple model configuration provides precise and reliable tracking and correct class identification, but at the cost of a rather complex algorithm. In this paper, another MKF, with reduced computational complexity compared to the MKF developed in [7], is proposed for tracking and identification of air targets in two classes: commercial and military aircraft. The parallel work of the class filters is simulated by utilising a random class variable. The independent sets of particles for each class are replaced by two class-dependent sets of particles, randomly generated at each time step. The class variable is modeled as an independent random variable, taking values over a finite discrete class space with equal probability. The proposed filtering algorithm has a relatively simple structure and exhibits the same performance as the MKF proposed in [7]. The elimination of unlikely filters after a classification decision can be easily realized. Section 1 summarizes the Bayesian formulation of the JTC problem, and Section 2 presents mixture Kalman filtering for JTC. Section 3 deals with the implementation of the MKF for JTC. Simulation results are given in Section 4. Section 5 contains concluding remarks.

1. Bayesian Formulation of JTC

Consider the following model of a discrete-time jump Markov system:

x_k = F(λ_k) x_{k−1} + G(λ_k) u_k(λ_k) + B(λ_k) w_k,   (1)
z_k = H(λ_k) x_k + D(λ_k) v_k,   k = 1, 2, …,   (2)

where x_k ∈ R^{n_x} is the base (continuous) state vector, z_k ∈ R^{n_z} is the measurement vector, u_k ∈ R^{n_u} is a known control input and k is the discrete time. The input noise process w_k and the measurement noise v_k are assumed to be independent identically distributed Gaussian processes with w_k ∼ N(0, Q) and v_k ∼ N(0, R), respectively. The modal (discrete) state λ_k, characterising the different system modes (regimes), takes values in a finite set S, i.e. λ_k ∈ S ≜ {1, 2, …, s}. We assume that the mode λ_k evolves according to a first-order Markov chain with transition probabilities π_ij = Pr{λ_k = j | λ_{k−1} = i}, (i, j ∈ S), and initial probability distribution P_0(i) ≥ 0 with Σ_{i=1}^{s} P_0(i) = 1. Next, we suppose that the target belongs to one of M classes c ∈ C, where C ≜ {1, 2, …, M} is the set of target classes. Generally, the number of discrete states s = s(c), the initial probability distribution P_0^c(i) and the transition probability matrix π(c) = [π_ij^c], i, j ∈ S(c), are different for each target class.
Denote with ω_k = {z_k, y_k} the set of kinematic (z_k) and class/feature (y_k) measurements obtained at time instant k. Then Ω^k = {Z^k, Y^k} specifies the cumulative set of kinematic and feature measurements available up to time k. The goal of the joint tracking and classification task is to estimate simultaneously the base state x_k, the modal state λ_k and the posterior classification probabilities P(c | Ω^k), c ∈ C, based on the observation set Ω^k. The problem can be stated in the Bayesian framework of estimating the posterior joint state-mode-class probability density function (pdf) p(x_k, λ_k, c | Ω^k), which can be computed recursively from the joint pdf at the previous time step in two stages: prediction and measurement update [3]. The predicted state-mode-class pdf p(x_k, λ_k, c | Ω^{k−1}) at time k is given by the equation

p(x_k, λ_k, c | Ω^{k−1}) = Σ_{λ_{k−1} ∈ S(c)} ∫ p(x_k, λ_k | x_{k−1}, λ_{k−1}, c, Ω^{k−1}) p(x_{k−1}, λ_{k−1}, c | Ω^{k−1}) dx_{k−1},   (3)

where the state prediction pdf p(x_k, λ_k | x_{k−1}, λ_{k−1}, c, Ω^{k−1}) is obtained from the state transition equation (1). The form of the measurement likelihood function p(ω_k | x_k, λ_k, c) is usually known. When the measurement ω_k arrives, the update step can be completed:

p(x_k, λ_k, c | Ω^k) = p(ω_k | x_k, λ_k, c) p(x_k, λ_k, c | Ω^{k−1}) / p(ω_k | Ω^{k−1}).   (4)

The recursion (3)-(4) begins with the prior density P{x_0, λ_0, c}, assumed known, where x_0 ∈ R^{n_x}, c ∈ C, λ_0 ∈ S(c). The target classification probability is calculated by

P(c | Ω^k) = p(ω_k | c, Ω^{k−1}) P(c | Ω^{k−1}) / Σ_{i=1}^{M} p(ω_k | c_i, Ω^{k−1}) P(c_i | Ω^{k−1}),   (5)

with an initial prior target classification probability P_0(c), Σ_{c∈C} P_0(c) = 1. The state estimate

x̂_k^c = Σ_{λ_k ∈ S(c)} ∫ x_k p(x_k, λ_k, c | Ω^k) dx_k,  c ∈ C,   (6)
for each class c takes part in the calculation of the combined state estimate x̂_k = Σ_{c∈C} x̂_k^c P(c | Ω^k). It is obvious that the estimates needed for each class can be calculated independently of the other classes. Therefore, the JTC task can be accomplished by the simultaneous work of M filters. SMC methods provide a number of useful suboptimal algorithms to approximate the optimal JTC solution given by (3)-(6). In the general case of nonlinear state and measurement equations, particle filters represent the above complicated probability distributions by a set of N discrete samples weighted by W, {λ_k^{(j)}, x_k^{(j)}, W_k^{(j)}}_{j=1}^{N}, for each class c, and utilize importance sampling and weighted resampling to complete the filtering task [3].
2. Mixture Kalman Filtering for JTC
The dynamic system model (1)-(2) under consideration belongs to the class of Conditional Dynamic Linear Models (CDLM). The modal state variable λ_k is called the indicator variable. For a given trajectory of the indicator λ_k, k = 1, 2, …, the system is both linear and Gaussian and can be estimated by a Kalman filter (KF). The MKF exploits the conditional Gaussian property and utilizes a marginalization operation to improve the efficiency of the conventional SMC procedure. The samples are generated only in the indicator space, and the target state distribution is approximated by a mixture of Gaussian distributions. Let Λ_k = {λ_0, λ_1, λ_2, …, λ_k} be the set of indicator variables up to time instant k. By recursively generating a set of properly weighted random samples {Λ_k^{(j)}, W_k^{(j)}}_{j=1}^{N} to represent the pdf p(Λ_k | Ω^k) (for class c), the MKF approximates the state pdf p(x_k | Ω^k) by a random mixture of Gaussian distributions [8,7]:

Σ_{j=1}^{N} W_k^{(j)} N(μ_k^{(j)}, Σ_k^{(j)}),   (7)

where μ_k^{(j)} = μ_k(Λ_k^{(j)}) and Σ_k^{(j)} = Σ_k(Λ_k^{(j)}) are obtained by a KF designed with the system model (1)-(2) corresponding to class c. We denote by KF_k^{(j)} = {μ_k^{(j)}, Σ_k^{(j)}} the sufficient statistics that characterize the posterior mean and covariance matrix of the state x_k, conditional on the set of accumulated observations Ω^k and the indicator trajectory Λ_k^{(j)}. Then, based on the set of samples {Λ_{k−1}^{(j)}, KF_{k−1}^{(j)}, W_{k−1}^{(j)}}_{j=1}^{N} at the previous time (k−1), the MKF produces a respective set of samples {Λ_k^{(j)}, KF_k^{(j)}, W_k^{(j)}}_{j=1}^{N} at the current time k. The correctness of the procedure is proven in [8]. Using the likelihood function p(ω_k | c, Ω^{k−1}) of class c ∈ C at time k, the class probabilities are calculated according to (5).
Consider the set of N compound particles {c_k^{(j)}, λ_k^{(j)}, x_k^{(j)}}_{j=1}^{N}. The class variable c_k is assumed to be independent of c_{k−1}. It can take each possible value in C with an equal probability, i.e., P(c_k = c) = 1/M, c ∈ C. The indicator variable λ_k takes values from the set S and evolves according to a Markov chain with transition matrix π. The set of samples is initialized according to the known initial class, state and mode distributions. The initial weights are equal (1/N). At every time instant k = 1, 2, …, for each particle (j) = 1, …, N, first a class variable c_k^{(j)} is drawn. Then, depending on the realization of the class variable, λ_k^{(j)} and x_k^{(j)} are generated in the following way [6]: the MKF scheme runs s KF prediction steps, one for each λ ∈ S. The likelihoods of the predicted states are calculated based on the received measurement z_k. They form a trial sampling distribution, according to which the new λ_k^{(j)} is selected. Then the KF update step is accomplished only for the selected λ_k^{(j)}. The weight W_k^{(j)} is updated based on the factorized likelihood of the measurement ω_k. The sum of the weights pertaining to class c forms the likelihood of class c and takes part in the calculation of the posterior class probabilities (5). The state estimate x̂_k^c is updated based on the normalized weights of the particles corresponding to c, according to (7). Finally, the combined output state estimate x̂_k is evaluated according to the procedure from Section 1. The resampling scheme eliminates particles with small weights and replicates particles with higher weights.
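The per-particle step described above can be sketched for a scalar conditional dynamic linear model. The helper functions, drift values and the uniform-mode weight update below are deliberate simplifications of the full procedure, chosen only to keep the example self-contained:

```python
import math
import random

def kf_predict(mu, P, u, q):
    """Scalar KF prediction for x_k = x_{k-1} + u + w, w ~ N(0, q)."""
    return mu + u, P + q

def kf_update(mu, P, z, r):
    """Scalar KF update for z_k = x_k + v, v ~ N(0, r); returns the
    posterior mean, posterior variance and predictive likelihood of z."""
    s = P + r                                   # innovation variance
    k = P / s                                   # Kalman gain
    lik = math.exp(-0.5 * (z - mu) ** 2 / s) / math.sqrt(2 * math.pi * s)
    return mu + k * (z - mu), (1 - k) * P, lik

def mkf_particle_step(mu, P, w, z, drifts, q=0.01, r=0.04, rng=random):
    """One MKF step for a single particle: a KF prediction per mode, the
    indicator sampled from the predictive likelihoods (trial distribution),
    a KF update only for the selected mode, and a weight rescale.
    A uniform mode-transition probability is assumed for simplicity."""
    preds = [kf_predict(mu, P, u, q) for u in drifts]
    liks = [kf_update(m, p, z, r)[2] for (m, p) in preds]
    lam = rng.choices(range(len(drifts)), weights=liks)[0]
    mu_new, P_new, _ = kf_update(*preds[lam], z, r)
    return lam, mu_new, P_new, w * sum(liks) / len(drifts)

random.seed(1)
lam, mu, P, w = mkf_particle_step(mu=0.0, P=1.0, w=1.0, z=2.0,
                                  drifts=[0.0, 2.0, -2.0])
assert lam in (0, 1, 2) and 0.0 < P < 1.0
```

In the JTC algorithm this step runs once per compound particle, after the class variable has been drawn, with the drifts and noise levels taken from the class-dependent model.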
3. Model Parameters and Incorporation of the Constraints
Target model. In the two-dimensional target dynamics given by (1), the state vector x = (x, ẋ, y, ẏ)′ contains the target positions and velocities in the horizontal Cartesian coordinate frame. The control input vector u = (a_x, a_y)′ includes the target accelerations along the x and y coordinates. The matrices F and B = G have the well known form [7,10] corresponding to uniform motion. The target is assumed to belong to one of two classes (M = 2), representing either a lower speed commercial aircraft with limited maneuvering capability (c_1) or a highly maneuvering military aircraft (c_2). The flight envelope information comprises speed and acceleration constraints typical for each class. The speed V = √(ẋ² + ẏ²) of each class is limited respectively to the intervals {c_1 : V ∈ (100, 300)} and {c_2 : V ∈ (150, 650)} [m/s]. The control inputs are restricted to the following sets of accelerations: {c_1 : u ∈ (0, 2g, −2g)} and {c_2 : u ∈ (0, 5g, −5g)}, where g = 9.81 m/s² is the acceleration due to gravity. The acceleration process u_k is a Markov chain with five states (modes), s(c_1) = s(c_2) = 5. The following set of modes (a_x, a_y) is selected in the implementation: {(0, 0), (0, A), (A, 0), (−A, 0), (0, −A)}, where A = 2g stands for the class c_1 target and A = 5g refers to class c_2.
Measurement model. The measurement vector z = (D, β)′ consists of the distance D to the target and the bearing β, measured by the radar. For the purposes of the MKF design, a measurement conversion is performed from polar to Cartesian coordinates [7]. The following sensor parameters are selected in the simulations: σ_D = 120 [m], σ_β = 0.2 [deg]. The sampling interval is T = 5 s.

Speed constraints. Acceleration constraints are imposed on the filter operation by the use of a control input in the target model. The speed constraints are enforced through speed likelihood functions [7]. According to the problem formulation presented in Section 1, feature measurements y_k, k = 1, 2, …, are used for the purposes of classification. In our case we do not have feature measurements. The feature measurement likelihood functions g^c(y_k), c ∈ C, are replaced by speed likelihood functions, constructed from the speed envelope information. The speed likelihood functions, together with the speed estimates, form a virtual "feature measurement" set {Y^k}. At each time step k the filtering algorithm gives a combined state estimate x̂_k. Let us assume that the estimated speed from the previous time step, V̂_{k−1}, is a kind of "feature measurement". The likelihood is then factorized as p(ω_k | x_k, λ_k, c) = f(z_k | x_k, λ_k) g^c(y_k), where y_k = V̂_{k−1}. The normalized speed likelihoods represent speed-based class probabilities estimated by the filters.
4. Simulation Results
The simulated path of a second-class target is shown in Figure 1 (a). It performs four turn maneuvers with accelerations of 1g, 2g, 5g and 2g. The 5g turn provides insufficient true class information, since the maneuver is of short duration, and the next 2g turn can lead to a misclassification. The speed of 250 m/s, however, increases the probability that the target belongs to class 2, according to the speed constraints. The estimated speed probabilities assist in the proper class identification, as can be seen in Figure 1 (b). The Root-Mean Squared Errors (RMSEs) [10] on position (both coordinates combined) and speed (magnitude of the velocity vector) are presented in Figure 2. The RMSEs are shown for each separate class, together with the combined RMSE obtained after averaging with the class probabilities. The MKF is implemented with N = 200 particles. The results are based on 100 Monte Carlo runs. The experiments show that the filter provides reliable tracking of intensively maneuvering targets with accelerations up to 5g, with acceptable errors.
Figure 1. (a) Test trajectory; (b) Class probabilities

Figure 2. (a) MKF position RMSE [m]; (b) MKF speed RMSE [m/s]
The computational complexity of the proposed MKF allows for an on-line implementation. The experiments were performed on a PC with a 1.4 GHz AMD Athlon processor. The MKF computation time in the Matlab environment is 0.3 seconds per scan.
5. Conclusions
We propose a mixture Kalman filter for joint maneuvering target tracking and classification by accounting for acceleration and speed constraints. The operation of two multiple model class-dependent MKFs is simulated by a suitably determined random class variable. Thus a relatively simple structure of the algorithm is achieved. The filter performance is analyzed by simulation over typical target trajectories. The results show reliable tracking and correct class discrimination. Generalization to more target classes is straightforward.
References

[1] E. Blasch, C. Yang, Ten Methods to Fuse GMTI and HRRR Measurements for Joint Tracking and Identification, Proc. of the 7th Intl. Conf. on Information Fusion, pp. 1006-1013, 2004.
[2] B. Ristic, N. Gordon, A. Bessell, On Target Classification Using Kinematic Data, Information Fusion, Elsevier Science, 5, pp. 15-21, 2004.
[3] A. Doucet, N. de Freitas, N. Gordon (editors), Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001.
[4] N. Gordon, S. Maskell, T. Kirubarajan, Efficient particle filters for joint tracking and classification, Proc. SPIE Signal and Data Processing of Small Targets, 2002.
[5] M. Malick, S. Maskell, T. Kirubarajan, N. Gordon, Littoral Tracking Using Particle Filter, Proc. of the Fifth Int. Conf. on Information Fusion, 2002.
[6] P. Smets, B. Ristic, Kalman Filter and Joint Tracking and Classification in the TMB Framework, Proc. of the 7th Intl. Conf. on Multisensor Information Fusion, Sweden, pp. 46-53, 2004.
[7] D. Angelova, L. Mihaylova, Joint Target Tracking and Classification with Particle Filtering and Mixture Kalman Filtering Using Kinematic Radar Information, Digital Signal Processing, 2005 (to appear).
[8] R. Chen, J. Liu, Mixture Kalman Filters, J. Royal Statistical Society B, 62, pp. 493-508, 2000.
[9] R. Chen, X. Wang, J. Liu, Adaptive Joint Detection and Decoding in Flat-Fading Channels via Mixture Kalman Filtering, IEEE Trans. on Inform. Theory, 46, pp. 493-508, 2000.
[10] Y. Bar-Shalom, X.R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, Storrs, CT, 1995.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
A Survey on Assignment Techniques Felix OPITZ EADS Deutschland GmbH, Wörthstr. 85, 89077 Ulm, Germany
Abstract: We address the assignment problem by considering the relationship between single sensor measurements and real targets. Here, the class of multidimensional assignment is considered for multi-scan as well as unresolved measurement problems in multi-target tracking. Keywords: Data Fusion, Multi-Target-Tracking, Assignment, Data Association, Convex Analysis, Bundle Trust Region Method, Linear Programming, Interior Point.
Introduction

Assignment methods form an essential functionality of multisensor data fusion, which is applied in various areas: guidance of traffic flows, coastal surveillance, and air traffic control. These methods examine the relations between sensor plots (positions) and targets. A plot may be caused by one or multiple targets, or by the environment. On the other hand, a target may also go undetected by a sensor for several reasons. Assignment techniques generate, evaluate, and select hypotheses about the associations between plots and their origins.
1. Hypothesis Generation

Hypothesis generation produces the association hypotheses. Two-dimensional assignment considers the simultaneous relation between targets and sensor plots within a full sensor scan, under the assumption that each plot is associated with at most one target and each target with at most one plot. The mathematical description uses a 0-1 indicator $\chi_{ij}$, where $\chi_{ij} = 1$ denotes the association between target $i$ and plot $j$; plot $0$ expresses a false alarm, and $c_{ij}$ determines the weight contributed by the local association $\chi_{ij} = 1$ to the global assignment. The assignment problem, shown in Figure 1, becomes an integer optimisation problem of the form:

$$\min_{\chi_{ij}} \sum_{i=0}^{m} \sum_{j=0}^{n} c_{ij}\, \chi_{ij} \quad \text{with} \quad \sum_{j=0}^{n} \chi_{ij} = 1,\ i = 1,\dots,m \quad \text{and} \quad \sum_{i=0}^{m} \chi_{ij} = 1,\ j = 1,\dots,n \qquad (1)$$
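For very small problems, an integer program of this form can be solved by brute force. The sketch below uses made-up costs and omits the false-alarm index 0; a real tracker would use the Hungarian or auction algorithm discussed in Section 3:

```python
from itertools import permutations

def solve_2d_assignment(c):
    """Brute-force solver for a square 2-D assignment (no false alarms):
    enumerate all target-to-plot permutations and keep the cheapest.
    Exponential in n, so useful only as a reference implementation."""
    n = len(c)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):        # plot perm[i] goes to target i
        cost = sum(c[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

# hypothetical negative-log-likelihood costs c_ij for 3 targets x 3 plots
c = [[1.0, 4.0, 6.0],
     [2.0, 1.5, 5.0],
     [7.0, 3.0, 0.5]]
cost, perm = solve_2d_assignment(c)
```

Here the optimum assigns each target to its cheapest compatible plot, with total cost 3.0.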
F. Opitz / A Survey on Assignment Techniques
Figure 1. Two dimensional assignment
Higher-dimensional optimisation simultaneously establishes an optimal relation between the targets, the plots of a first scan, and those of a second one, shown in Figure 2. Using an indicator $\chi_{ij_1j_2} \in \{0,1\}$, this is expressed as [1]:

$$\min_{\chi_{ij_1j_2}} \sum_{i=0}^{m} \sum_{j_1=0}^{n_1} \sum_{j_2=0}^{n_2} c_{ij_1j_2}\, \chi_{ij_1j_2} \quad \text{subject to}$$

$$\sum_{j_1=0}^{n_1} \sum_{j_2=0}^{n_2} \chi_{ij_1j_2} = 1,\ i = 1,\dots,m; \quad \sum_{i=0}^{m} \sum_{j_2=0}^{n_2} \chi_{ij_1j_2} = 1,\ j_1 = 1,\dots,n_1; \quad \sum_{i=0}^{m} \sum_{j_1=0}^{n_1} \chi_{ij_1j_2} = 1,\ j_2 = 1,\dots,n_2 \qquad (2)$$
Figure 2. Three dimensional assignment
Another application of multidimensional assignment considers merged plots caused by the limited resolution capability of sensors [2]. Let $\chi_j^{i_1 i_2} \in \{0,1\}$ be an indicator between a pair of targets and a plot. An unresolved plot $j$ belongs to an unordered pair of targets $i_1, i_2$, i.e. $\chi_j^{i_1 i_2} = \chi_j^{i_2 i_1}$. A resolved plot is associated with at most one target $i_1$, i.e. $\chi_j^{i_1 i_1} = 1$:

$$\min_{\chi_j^{i_1 i_2}} \sum_{i_1, i_2 = 0}^{m} \sum_{j=0}^{n} c_j^{i_1 i_2}\, \chi_j^{i_1 i_2} \quad \text{s.t.} \quad \sum_{i_1=0}^{m} \sum_{i_2=i_1}^{m} \chi_j^{i_1 i_2} = 1,\ j = 1,\dots,n; \quad \sum_{j=0}^{n} \sum_{i_2=0}^{m} \chi_j^{i_1 i_2} = 1,\ i_1 = 1,\dots,m \qquad (3)$$
2. Hypothesis Evaluation

The hypothesis evaluation is computed via the likelihoods defined in filter theory, see [1, 2]:

$$L_{ij_1 j_2} = \prod_{k=1}^{2} \left[1 - P_D\right]^{\delta_{j_k,0}} \left[ P_D\, \rho^{-1}\, N\!\left( z_k^{j_k};\ H x^{i}_{k|k-1},\ S^{i}_{k|k-1} \right) \right]^{1 - \delta_{j_k,0}}, \qquad c_{ij_1 j_2} = -\ln\!\left( L_{ij_1 j_2} \right) \qquad (4)$$
3. Hypothesis Selection

3.1. Two Dimensional Optimisation

Methods to solve the two-dimensional optimisation problem are the Hungarian method, the Munkres algorithm, and the Jonker-Volgenant-Castanon or so-called Auction algorithms [3, 4], shown in Figure 3.

Figure 3. Forward-Reverse Auction Algorithm
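To convey the bidding idea, here is a minimal forward auction in maximisation form (use value = -cost for the minimisation of Eq. (1)); it is a simplification of the forward-reverse variant of Figure 3, which additionally runs reverse rounds on the prices, and the numbers are illustrative:

```python
def auction(value, eps=0.01):
    """Minimal forward auction for a square assignment problem that
    MAXIMISES total value. Each unassigned bidder bids its best plot up by
    (best payoff - second-best payoff + eps); eps > 0 guarantees termination
    and eps-optimality of the result."""
    n = len(value)
    prices = [0.0] * n
    owner = [None] * n                      # owner[j]   = target holding plot j
    assigned = [None] * n                   # assigned[i] = plot held by target i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        payoff = [value[i][j] - prices[j] for j in range(n)]
        j_best = max(range(n), key=lambda j: payoff[j])
        best = payoff[j_best]
        second = max(p for j, p in enumerate(payoff) if j != j_best)
        prices[j_best] += best - second + eps   # bid up the best plot
        if owner[j_best] is not None:           # outbid the previous owner
            assigned[owner[j_best]] = None
            unassigned.append(owner[j_best])
        owner[j_best] = i
        assigned[i] = j_best
    return assigned

result = auction([[10.0, 2.0], [8.0, 7.0]])
```

On this toy instance both targets initially prefer plot 0; the bidding drives its price up until target 1 switches to plot 1, giving the optimal total value 17.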
3.2. Higher Dimensional Optimisation: Convex Optimisation
The higher dimensional optimisation problem is NP-hard. One method of dealing with the optimisation problem Eq. (2) is to reduce it to a 2-dimensional one, using Lagrange multipliers, and an additional convex optimisation.
3.2.1. Relaxation and Lagrange Multipliers

Relaxation uses the last set of constraints and Lagrange multipliers $\vec{u} = (u_{j_2})_{j_2 = 1,\dots,n_2}$ to relax the 3-dimensional problem into a 2-dimensional one [1, 3, 5]. One obtains ($u_0 := 0$):

$$\varphi(\vec{u}) = \min_{z_{ij_1j_2}} \sum_{i=0}^{m} \sum_{j_1=0}^{n_1} \sum_{j_2=0}^{n_2} \left( c_{ij_1j_2} - u_{j_2} \right) z_{ij_1j_2} + \sum_{j_2=0}^{n_2} u_{j_2} \quad \text{subject to} \qquad (5)$$

$$\sum_{j_1=0}^{n_1} \sum_{j_2=0}^{n_2} z_{ij_1j_2} = 1,\ i = 1,\dots,m; \quad \sum_{i=0}^{m} \sum_{j_2=0}^{n_2} z_{ij_1j_2} = 1,\ j_1 = 1,\dots,n_1; \quad z_{ij_1j_2} \in \{0,1\}$$
Given a fixed $\vec{u}$, $\varphi(\vec{u})$ may be calculated via the Auction algorithm. With the optimal solution $\chi_{ij_1j_2}$ of the original problem in Eq. (2), the following inequality holds:

$$\varphi(\vec{u}) \le \sum_{i=0}^{m} \sum_{j_1=0}^{n_1} \sum_{j_2=0}^{n_2} c_{ij_1j_2}\, \chi_{ij_1j_2}, \quad \text{and hence} \quad \max_{\vec{u} \in \mathbb{R}^N} \varphi(\vec{u}) \le \sum_{i=0}^{m} \sum_{j_1=0}^{n_1} \sum_{j_2=0}^{n_2} c_{ij_1j_2}\, \chi_{ij_1j_2} \qquad (6)$$
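Evaluating the relaxed problem for a given multiplier vector can be sketched as follows; the sign convention follows the reconstruction of Eq. (5), the sizes are toy ones without the dummy index 0, and brute force stands in for the auction algorithm:

```python
from itertools import permutations

def relaxed_value(c, u):
    """Evaluate phi(u) for a small 3-D cost tensor c[i][j1][j2]: with the
    j2-constraints relaxed, each (i, j1) pair independently picks the j2
    minimising c - u[j2], leaving an ordinary 2-D assignment over (i, j1)."""
    m, n1, n2 = len(c), len(c[0]), len(c[0][0])
    # the best j2 choice absorbs the multiplier u[j2]
    d = [[min(c[i][j1][j2] - u[j2] for j2 in range(n2)) for j1 in range(n1)]
         for i in range(m)]
    best = min(sum(d[i][p[i]] for i in range(m))
               for p in permutations(range(n1), m))
    return best + sum(u)        # a lower bound on the 3-D optimum, cf. Eq. (6)

c = [[[4.0, 9.0], [6.0, 3.0]],
     [[5.0, 2.0], [8.0, 7.0]]]
lb = relaxed_value(c, u=[0.0, 0.0])
```

Maximising this bound over `u` (e.g. with the bundle method of the next section) tightens the duality gap before the final feasibility repair.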
3.2.2. Convex Analysis and Bundle-Trust-Region Method

The function $\psi = -\varphi$ is a convex (not differentiable) function, s.t. Eq. (6) is a convex optimization problem. Instead of the vector-valued gradient $\nabla\psi(\vec{u})$ of a differentiable function, convex analysis deals with the set-valued subgradient resp. $\varepsilon$-subgradient [6]:

$$\partial\psi(\vec{u}) := \left\{ \vec{g} \in \mathbb{R}^N : \psi(\vec{v}) \ge \psi(\vec{u}) + \vec{g}^{\,T}(\vec{v} - \vec{u}),\ \forall \vec{v} \right\}$$
$$\partial_\varepsilon\psi(\vec{u}) := \left\{ \vec{g} \in \mathbb{R}^N : \psi(\vec{v}) \ge \psi(\vec{u}) + \vec{g}^{\,T}(\vec{v} - \vec{u}) - \varepsilon,\ \forall \vec{v} \right\} \qquad (7)$$

An advanced and efficient method of solving the convex optimization problem iteratively defines bundles $B_k = \left\{ \vec{g}_i \mid \vec{g}_i \in \partial\psi(\vec{u}_i);\ i = 1,\dots,l_k \right\}$ of subgradients and thus establishes an approximation of $\psi$ by a sequence of so-called cutting planes $\psi_k$ (Figure 4):

$$\psi_k(\vec{u}) := \max_{\vec{g}_i \in B_k} \left\{ \psi(\vec{u}_i) + \vec{g}_i^{\,T}\left( \vec{u} - \vec{u}_i \right) \right\} \qquad (8)$$

Further, defining the linearization errors as $\alpha_k^i := \psi(\vec{u}_k) - \psi(\vec{u}_i) - \vec{g}_i^{\,T}\left( \vec{u}_k - \vec{u}_i \right)$, one obtains an inner approximation of the $\varepsilon$-subgradient of $\psi$ [7]:

$$S_{k,\varepsilon} := \left\{ \sum_{i=1}^{l_k} \beta_i \vec{g}_i \ \middle|\ \beta_i \ge 0,\ \sum_{i=1}^{l_k} \beta_i = 1,\ \sum_{i=1}^{l_k} \beta_i \alpha_k^i \le \varepsilon \right\} \subset \partial_\varepsilon\psi(\vec{u}_k) \qquad (9)$$
Figure 4. Cutting plane approximation
The cutting plane approximation $\psi_k$ is used (Figure 4) to find the descent direction $\vec{d} = \vec{u} - \vec{u}_k$ in the different iteration steps:

$$\psi_k(\vec{u}_k + \vec{d}) = \max_{i=1,\dots,l_k} \left\{ \psi(\vec{u}_i) + \vec{g}_i^{\,T}\left( \vec{u} - \vec{u}_i \right) \right\} = \max_{i=1,\dots,l_k} \left\{ \psi(\vec{u}_i) + \left\langle \vec{g}_i, \vec{d} \right\rangle - \left\langle \vec{g}_i, \vec{u}_i - \vec{u}_k \right\rangle \right\} \qquad (10)$$
Of course, one cannot trust this model far away from $\vec{u}_k$, so a penalty term is added as a further modification [7]. The determination of $\vec{d}_k$ for a known step size $t_k$ is done by:

$$\vec{d}_k := \vec{d}(t_k) = \arg\min_{\vec{d}} \left\{ \psi_k(\vec{u}_k + \vec{d}) + \frac{1}{2 t_k}\, \vec{d}^{\,T}\vec{d} \right\} \qquad (11)$$
i.e., the determination of the descent direction is a quadratic optimisation problem:

$$(v, \vec{d}) := (v(t_k), \vec{d}(t_k)) = \arg\min_{(v, \vec{d}) \in \mathbb{R}^{n+1}} \left\{ v + \frac{1}{2 t_k}\, \vec{d}^{\,T}\vec{d} \ \middle|\ v \ge \vec{g}_i^{\,T}\vec{d} - \alpha_k^i \right\}$$

or, using the duality [7, 8],

$$\vec{\beta} = \arg\min_{\vec{\beta}} \left\{ \frac{t_k}{2} \left\| \sum_{i=1}^{l_k} \beta_i \vec{g}_i \right\|^2 + \sum_{i=1}^{l_k} \beta_i \alpha_k^i \ \middle|\ \beta_i \ge 0,\ i = 1,\dots,l_k,\ \sum_{i=1}^{l_k} \beta_i = 1 \right\} \qquad (12)$$

with

$$\vec{d}(t_k) = -t_k \sum_{i=1}^{l_k} \beta_i \vec{g}_i, \qquad v(t_k) = -t_k \left\| \sum_{i=1}^{l_k} \beta_i \vec{g}_i \right\|^2 - \sum_{i=1}^{l_k} \beta_i \alpha_k^i \qquad (13)$$

Eq. (13) may be interpreted as the projection of the null vector $\vec{0}$ onto $S_{k,\varepsilon}$ for $\varepsilon = \sum_{i=1}^{l_k} \beta_i \alpha_k^i$.
Therefore, it may be used to realize a stopping criterion (Figure 6).

Figure 5. Bundle trust region algorithm (main)
Figure 6. Determination of the trust region parameter
The final modification of the bundle trust region method is to distinguish between so-called serious and null steps: a new iteration step is called a serious step if it delivers an improved approximation of the minimal point. The remaining steps, called null steps, do not improve the current iterate; nevertheless, they are used to improve the cutting plane model (Figures 5 and 6).
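The serious/null-step logic can be illustrated on a toy one-dimensional problem; a grid search replaces the quadratic subproblem of Eq. (12), and the trust-parameter management of Figure 6 is omitted, so this is a sketch of the idea rather than the full method:

```python
def bundle_min(f, g, x0, t=1.0, iters=30):
    """Toy proximal-bundle method for a 1-D convex function f with
    subgradient oracle g: minimise the cutting-plane model plus the
    quadratic penalty of Eq. (11); a serious step is taken only when the
    true decrease is a fraction of the predicted one, otherwise a null
    step merely enriches the bundle with a new cutting plane."""
    bundle = [(x0, f(x0), g(x0))]            # (point, value, subgradient)
    x = x0
    for _ in range(iters):
        ys = [x + (k - 50) * 0.02 for k in range(101)]   # crude grid
        model = lambda y: max(fi + gi * (y - xi) for xi, fi, gi in bundle) \
                          + (y - x) ** 2 / (2 * t)
        y = min(ys, key=model)
        pred = f(x) - model(y)               # predicted decrease (>= 0)
        if f(y) <= f(x) - 0.1 * pred:        # serious step: move
            x = y
        bundle.append((y, f(y), g(y)))       # null step still adds a plane
    return x

f = lambda x: abs(x - 1.0)                   # nondifferentiable convex test
g = lambda x: 1.0 if x >= 1.0 else -1.0
x_star = bundle_min(f, g, x0=0.0)            # approaches the minimiser 1.0
```

Even for this kinked function, where plain gradient descent would oscillate, the cutting planes pin down the minimiser quickly.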
3.3. Higher Dimensional Optimisation: Linear Programming

To be able to apply linear programming, one has to relax the integer constraint on the indicators. The optimisation problem in Eq. (2) or Eq. (3) is transformed via [2, 9]:

$$x_{(n_1+1)(m+1) j_1 + (m+1) j_2 + i + 1} \leftrightarrow \chi_{i j_1 j_2} \quad \text{resp.} \quad x_{(m+1)(m+1) j + (m+1) i_1 + i_2 + 1} \leftrightarrow \chi_j^{i_1 i_2} \qquad (14)$$

One obtains the linear program in standard form:

$$\min_{x}\ c^T x \quad \text{subject to} \quad A x = \vec{b}, \quad \vec{0} \le x \le \vec{1} \qquad (15)$$

Theoretically, this linear program may be solved by the simplex method. Unfortunately, the complexity of the simplex method may increase exponentially with the dimension. However, Khachiyan proved that a linear program is solvable in polynomial time, and a class of efficient algorithms are the Interior Point methods [10]. A solution $\chi_{i j_1 j_2}$ resp. $\chi_j^{i_1 i_2}$ for Eq. (2) resp. Eq. (3) found by these methods need not be integer; but, due to the constraints, it allows an interpretation as a (pseudo-)probability for an association [9].
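Setting up the data of Eq. (15) might look as follows; for brevity the sketch uses the two-dimensional problem of Eq. (1) instead of the three-dimensional one, omits the dummy index 0, and uses a plain row-major flattening rather than the exact enumeration of Eq. (14):

```python
def lp_data(c):
    """Build the (A, b, cost) data of a standard-form LP relaxation for a
    2-D assignment: one row per target constraint, one per plot constraint,
    with chi_ij flattened row-major as x[i * n + j]."""
    m, n = len(c), len(c[0])
    A, b = [], []
    for i in range(m):                       # each target gets one plot
        row = [0] * (m * n)
        for j in range(n):
            row[i * n + j] = 1
        A.append(row)
        b.append(1)
    for j in range(n):                       # each plot gets one target
        row = [0] * (m * n)
        for i in range(m):
            row[i * n + j] = 1
        A.append(row)
        b.append(1)
    cost = [c[i][j] for i in range(m) for j in range(n)]
    return A, b, cost

A, b, cost = lp_data([[1.0, 4.0], [2.0, 1.5]])
# feed (cost, A, b) with bounds 0 <= x <= 1 to any LP solver
```

The resulting constraint matrix is totally unimodular for the 2-D case, which is why the LP relaxation there happens to return integer solutions; for the 3-D problem this no longer holds, motivating the pseudo-probability interpretation above.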
References

[1] A.B. Poore, N. Rijavec, "A Lagrangian Relaxation Algorithm for Multidimensional Assignment Problems Arising from Multitarget Tracking," SIAM Journal on Optimization, Vol. 3, No. 3, pp. 544-563, August 1993.
[2] F. Opitz, Clustered multidimensional data association for limited sensor resolutions, Proc. 8th Int. Conf. on Information Fusion, Philadelphia, PA, USA, 2005.
[3] P.J. Shea, Lagrange-Relaxation-Based Methods for Data Association Problems in Tracking, PhD Thesis, Colorado State University, Fort Collins, 1995.
[4] D. Bertsekas, Network Optimization: Continuous and Discrete Models, Athena Scientific, Belmont, Massachusetts, 1998.
[5] F. Opitz, Data Association based on Lagrange Relaxation & Convex Analysis, NATO-RTO Symposium on "Target Tracking & Data Fusion for Military Observation Systems", Budapest, Hungary, 2003.
[6] J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms I and II, Grundlehren der mathematischen Wissenschaften 306, Springer-Verlag, Berlin Heidelberg, 1993.
[7] W. Alt, Numerische Verfahren der konvexen, nichtglatten Optimierung, Teubner, Stuttgart, 2004.
[8] H. Schramm, J. Zowe, A Version of the Bundle Idea for Minimizing a Nonsmooth Function, SIAM Journal on Optimization 2, pp. 121-152, 1992.
[9] X. Li, Z.-Q. Luo, K.M. Wong, E. Bossé, An Interior Point Linear Programming Approach to Two-Scan Data Association, IEEE Trans. on Aerospace and Electronic Systems, Vol. 35, No. 2, April 1999.
[10] Y. Ye, M.J. Todd, S. Mizuno, "An O(√nL)-Iteration Homogeneous and Self-Dual Linear Programming Algorithm," Mathematics of Operations Research, Vol. 19, No. 1, February 1994.
Non-linear Techniques in Target Tracking Thomas KAUSCH, Kaeye DÄSTNER and Felix OPITZ EADS Deutschland GmbH, Wörthstr. 85, D-89077 Ulm, Germany
Abstract. New classes of tracking algorithms combining Variable-Structure Interacting Multiple Model (VS-IMM) techniques, augmentation or dual estimation, and Unscented Kalman filtering are presented in this paper. These filter methods ensure significant self-adjusting and inherent manoeuvre detection capabilities. The algorithms are distinguished by their highly accurate course and speed estimates, even for manoeuvring targets. The performance of these techniques is demonstrated for targets performing turns of varying cross accelerations. Keywords. target tracking, sensor data fusion, interacting multiple model, augmentation, variable structure, Kalman Filter, Unscented Transformation, Unscented Kalman Filter.
Introduction

The challenge of the tracking problem is to identify the path of a manoeuvring target based on noisy radar measurements. To be more precise, the aim of modern filter techniques is to improve the sensor-measured position, to derive course and speed estimates, and to provide a measure of the estimation uncertainty, which is used in a succeeding data association process. Finally, modern filter techniques also allow the simultaneous classification of the target manoeuvres. The basic components needed for every filter are a mathematical expression for the assumed target dynamics and the relationship between the target state and the measurement, i.e., the measurement equation:

$$x_k = f(x_{k-1}) + n^x_{k-1}, \qquad y_k = h(x_k) + n^y_k \qquad (1)$$

Herein $x_k$ is the state vector, i.e., Cartesian position and velocity; the process noise $n^x_{k-1}$ and the measurement noise $n^y_k$ are zero-mean white Gaussian processes with covariances $R^x_{k-1}$ and $R^y_k$.
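A minimal instance of Eq. (1), assuming a constant-velocity dynamic model and a range/azimuth radar; the noise levels are illustrative values, not taken from the paper:

```python
import math
import random

def f_cv(x, T=1.0):
    """Constant-velocity propagation of the state x = (px, py, vx, vy)."""
    px, py, vx, vy = x
    return (px + T * vx, py + T * vy, vx, vy)

def h_radar(x):
    """Range / azimuth measurement of the Cartesian position."""
    px, py, _, _ = x
    return (math.hypot(px, py), math.atan2(py, px))

# one noisy step of Eq. (1); sigma_range = 10 m, sigma_azimuth = 1 mrad assumed
x1 = f_cv((1000.0, 2000.0, 30.0, -10.0))
z1 = tuple(v + random.gauss(0.0, s) for v, s in zip(h_radar(x1), (10.0, 0.001)))
```

Everything that follows in the paper replaces `f_cv` by manoeuvre-dependent (and parameterised) models and handles the nonlinearity of `h_radar`.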
T. Kausch et al. / Non-Linear Techniques in Target Tracking
1. Increase of maneuver spectrum
1.1. Interacting Multiple Model (IMM)

The challenge of tracking maneuvering targets is to find a suitable dynamic model with respect to the true but uncertain target behaviour, in order to improve radar system performance. The uncertainty in the choice of the propagation equation leads directly to the idea of the well-known Interacting Multiple Model (IMM) [1, 2, 3], an efficient estimation technique suitable for uncertain target maneuver hypotheses. Instead of a single dynamic model, the IMM contains a whole filter bank of different maneuver models. By statistically mixing the implemented models, the covered target maneuver spectrum is extended. To avoid an exploding growth of competing models, with the resulting increased CPU load, one searches for methods that limit the necessary model set within the IMM approach.
1.2. Variable Structure One such extension, called VS-IMM, is the real-time modification of the used model set, so that the number of models may be limited by concentrating on the most suitable ones [4].
1.3. Augmentation

Another possibility is to extend the reliability of a single maneuver model. The augmentation techniques [5, 6] are applicable when a maneuver model $m$ allows a parameterization $f^m(\cdot, \omega)$ with a parameter $\omega$. Prominent examples are coordinated turn models, with $\omega$ as turn rate, or the ballistic models [7], with $\omega$ as ballistic coefficient. Augmentation is realized by extending the state vector with the parameter $\omega_k$, which leads to the new propagation equation for model $m$:

$$\hat{x}_k = \hat{f}^m(\hat{x}_{k-1}) + \begin{pmatrix} n^x_{k-1} \\ n^\omega_{k-1} \end{pmatrix} = \begin{pmatrix} f^m(x_{k-1}, \omega_{k-1}) \\ \omega_{k-1} \end{pmatrix} + \begin{pmatrix} n^x_{k-1} \\ n^\omega_{k-1} \end{pmatrix} \qquad (2)$$

and

$$y_k = \hat{h}(\hat{x}_k) + n^y_k = h(x_k) + n^y_k \qquad (3)$$
instead of Eq. (1). Filtering is performed with the augmented states or within a dual estimation approach, where two separated estimators are used both for state and parameter estimation [8, 9].
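For a coordinated turn model, the augmented propagation of Eq. (2) can be sketched as follows; the standard CT equations are assumed, and the noise terms are left to the filter:

```python
import math

def f_ct_aug(x_aug, T=1.0):
    """Augmented coordinated-turn propagation: the kinematic state
    (px, py, vx, vy) is extended by the turn rate omega, which itself
    propagates trivially (process noise is added by the filter, not here)."""
    px, py, vx, vy, w = x_aug
    if abs(w) < 1e-9:                        # degenerates to constant velocity
        return (px + T * vx, py + T * vy, vx, vy, w)
    s, c = math.sin(w * T), math.cos(w * T)
    return (px + vx * s / w - vy * (1 - c) / w,
            py + vx * (1 - c) / w + vy * s / w,
            vx * c - vy * s,                 # velocity rotated by w * T
            vx * s + vy * c,
            w)                               # trivial parameter propagation

x = (0.0, 0.0, 100.0, 0.0, math.pi / 2)     # quarter turn per second
x1 = f_ct_aug(x)                             # heading rotated by 90 degrees
```

Dual estimation would instead keep `w` in a separate small filter and feed its estimate into the unaugmented CT propagation.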
2. Handling of Non-Linearity The most popular Kalman filter is a linear filter assuming linear relationships in both propagation of the state and projection onto the measurement space. To handle nonlinearities the extended Kalman filter and the newer Unscented Kalman filter (UKF) are possible choices.
2.1. Extended Kalman Filter

The extended Kalman filter [2, 3] handles the problem of non-linearity by linearization of the corresponding functions with respect to the covariance transformations. This results in the following changes w.r.t. the linear Kalman filter:

$$P^m_k = F^m_{k-1} P^m_{k-1} \left( F^m_{k-1} \right)^T + R^{x,m}_{k-1}, \quad \text{with} \quad F^m_{k-1} = \left. \frac{\partial f^m}{\partial x} \right|_{x^m_{k-1}} \qquad (4)$$

$$S^m_k = H^m_k P^m_k \left( H^m_k \right)^T + R^y_k, \quad \text{with} \quad H^m_k = \left. \frac{\partial h}{\partial x} \right|_{x^m_k} \qquad (5)$$

Finally, the Kalman gain matrix and the new state and covariance are calculated:

$$K^m_k = P^m_k \left( H^m_k \right)^T \left( S^m_k \right)^{-1} \qquad (6)$$

$$x^m_k = x^m_k + K^m_k \left( y_k - y^m_k \right) \qquad (7)$$

$$P^m_k = P^m_k - K^m_k S^m_k \left( K^m_k \right)^T \qquad (8)$$
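A scalar instance of Eqs. (4)-(8) makes the structure visible; in one dimension the Jacobians reduce to ordinary derivatives, and the model and numbers below are illustrative only:

```python
def ekf_step(x, P, y, f, F, h, H, Rx, Ry):
    """One scalar EKF cycle: F and H are the derivatives of f and h
    evaluated at the current estimate (Jacobian matrices in general)."""
    # prediction, Eq. (4)
    x_pred = f(x)
    P_pred = F(x) * P * F(x) + Rx
    # innovation covariance, Eq. (5)
    S = H(x_pred) * P_pred * H(x_pred) + Ry
    # gain and update, Eqs. (6)-(8)
    K = P_pred * H(x_pred) / S
    x_new = x_pred + K * (y - h(x_pred))
    P_new = P_pred - K * S * K
    return x_new, P_new

f, F = lambda x: x, lambda x: 1.0            # static state
h, H = lambda x: x ** 2, lambda x: 2 * x     # nonlinear sensor
x, P = 3.0, 1.0
x, P = ekf_step(x, P, y=10.0, f=f, F=F, h=h, H=H, Rx=0.01, Ry=0.5)
```

Starting from 3.0 with a measurement of 10.0, the update moves the estimate toward sqrt(10) and shrinks the covariance, exactly the behaviour the linearized equations predict.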
2.2. Unscented Kalman Filter To handle non-linear estimation problems, the UKF makes use of the Unscented Transformation. This means that a certain number of samples of the probability distribution are used to perform the non-linear transformation. After this the transformed samples are recombined to get the transformed mean and covariance of the probability distribution. A very clear description of this algorithm can be found in [10, 11, 12].
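A one-dimensional unscented transform illustrates the idea; kappa is a spread parameter, and the n-dimensional version uses 2n+1 sigma points and a matrix square root of the covariance:

```python
import math

def unscented_transform(mean, var, fn, kappa=2.0):
    """1-D unscented transform: deterministically sample sigma points,
    push them through the nonlinearity fn, and recombine by weights to
    recover the transformed mean and variance."""
    spread = math.sqrt((1.0 + kappa) * var)
    sigmas = [mean, mean + spread, mean - spread]
    weights = [kappa / (1.0 + kappa), 0.5 / (1.0 + kappa), 0.5 / (1.0 + kappa)]
    ys = [fn(s) for s in sigmas]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

m, v = unscented_transform(3.0, 1.0, lambda x: x ** 2)
# for a quadratic the bias is captured: E[x^2] = 3^2 + var = 10, whereas
# first-order linearization would report 9
```

This is why the UKF typically outperforms the EKF on strongly curved measurement functions such as range/azimuth conversion, at comparable cost.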
3. Combined Algorithms
3.1. Dual UKF-VS-IMM & Dual EKF-VS-IMM

The Dual UKF-VS-IMM (Dual EKF-VS-IMM) uses a bank of $M$ manoeuvre models realised by Unscented Kalman filters (extended Kalman filters). Every model is described by a parameterised propagation function $f^m(\cdot, \omega^m)$, $m = 1, \dots, M$. It is assumed that the model parameters are bounded, to avoid a degeneration of the model, which would decrease filter performance by allowing non-realistic states:

$$\omega^m \in \left[ \omega^m_{lower},\ \omega^m_{upper} \right] \qquad (9)$$
This approach starts with the interaction and mixing step of the standard IMM technique, using the mixing probabilities $\mu^{n|m}_{k-1}$, states $x^n_{k-1}$ and covariances $P^n_{k-1}$ of the previous step:

$$x^{0m}_{k-1} = \sum_{n=1}^{M} \mu^{n|m}_{k-1}\, x^n_{k-1} \qquad (10)$$

$$P^{0m}_{k-1} = \sum_{n=1}^{M} \mu^{n|m}_{k-1} \left[ P^n_{k-1} + \left( x^n_{k-1} - x^{0m}_{k-1} \right)\left( x^n_{k-1} - x^{0m}_{k-1} \right)^T \right] \qquad (11)$$
A filtering is executed by the dual estimation methodology for every individual model. Given $\omega^m_{k-1}$, a UKF (EKF) is applied to estimate the current dynamic parameter $\tilde{\omega}^m_k$, based on the new measurement $y_k$ and the last state estimate $x^{0m}_{k-1}$. In this step the propagation of the dynamic parameter is the trivial one, while the projection onto the measurement space is given by

$$\omega \mapsto h \circ f^m\!\left( x^m_{k-1}, \omega \right) \qquad (12)$$
One must prevent the model set from model coalescence by considering Eq. (9). This is done by a recovery step whenever the estimated parameter drifts out of its region:

$$\omega^m_k = \begin{cases} \omega^m_{lower} & \text{if } \tilde{\omega}^m_k < \omega^m_{lower} \\ \omega^m_{upper} & \text{if } \tilde{\omega}^m_k > \omega^m_{upper} \\ \tilde{\omega}^m_k & \text{else} \end{cases} \qquad (13)$$
Finally, the new state $x^m_k$ and covariance $P^m_k$ are estimated by a second UKF (EKF) assuming $\omega^m_k$. Simultaneously, the likelihood is calculated for each individual model:

$$\Lambda^m_k = N\!\left( y_k;\ y^m_k,\ S^m_k \right) \qquad (14)$$

After the above branching into the different models, the new mixing and model probabilities are determined via the transition probabilities $\tau^{n|m}$:

$$\mu^{n|m}_{k-1} = \frac{\tau^{n|m}\, \mu^n_{k-1}}{\sum_{i=1}^{M} \tau^{i|m}\, \mu^i_{k-1}} \qquad (15)$$

$$\mu^m_k = \frac{\Lambda^m_k \sum_{i=1}^{M} \tau^{i|m}\, \mu^i_{k-1}}{\sum_{i=1}^{M} \Lambda^i_k \sum_{j=1}^{M} \tau^{j|i}\, \mu^j_{k-1}} \qquad (16)$$
Finally, the model-specific states and covariances are combined into global ones:

$$x_k = \sum_{m=1}^{M} x^m_k\, \mu^m_k \qquad (17)$$

$$P_k = \sum_{m=1}^{M} \mu^m_k \left[ P^m_k + \left( x^m_k - x_k \right)\left( x^m_k - x_k \right)^T \right] \qquad (18)$$
3.2. UKF-VS-AIMM & EKF-VS-AIMM The UKF-VS-AIMM (EKF-VS-AIMM) applies VS-IMM methods to the augmented state spaces instead of the dual estimation approach above. The drawback is that parameter and state are highly coupled. This may be a disadvantage with respect to modularisation aspects and the capability to combine different types of manoeuvre models.
4. Example and Simulation Results

The following example illustrates the techniques developed above. It is applied to a model set consisting of two Coordinated Turn (CT) models (left- and right-handed turns) parameterised by turn rates, and a Constant Velocity (CV) model. The following diagrams show a comparison between the different techniques. The xy-plots (Figures 1 and 2) show a single run. The remaining plots (Figures 3 to 5) are the result of Monte Carlo simulations. The solid line denotes the Dual UKF-VS-IMM [13], the dashed line the UKF-VS-AIMM [14], and the dotted line the result of an EKF-VS-AIMM (VS-AIMM) [5]. The sensors are assumed to deliver range, azimuth and Doppler.
Figure 1. Measurements (left) and resulting track calculated with a Dual UKF VS-IMM (right)
Figure 2. Tracking result when using an UKF VS-AIMM (left) and EKF VS-AIMM (right)
Figure 3. Probabilities of the CT left model (left) and CV model (right)
54
T. Kausch et al. / Non-Linear Techniques in Target Tracking
Figure 4. Probability of the CT right model (right)
Figure 5. Course (left) and speed (right) accuracy (rms)
5. Conclusion

New classes of UKF-controlled IMM algorithms were introduced. These classes of algorithms realise synergies by combining recently developed techniques in a common scheme. For an example based on coordinated turn models, these algorithms were compared with classical, EKF-based techniques. The new algorithms are shown to possess excellent estimation accuracy and manoeuvre detection capabilities.
References

[1] Y. Bar-Shalom, X.-R. Li, Estimation and Tracking: Principles, Techniques, and Software, Artech House, 1993.
[2] Y. Bar-Shalom, X.-R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, 1995.
[3] S. Blackman, R. Popoli, Design and Analysis of Modern Tracking Systems, Artech House, 1999.
[4] X.R. Li, "Engineer's guide to variable-structure multiple-model estimation for tracking", in Y. Bar-Shalom and W.D. Blair (eds.), Multitarget-Multisensor Tracking: Applications and Advances, Volume III, pp. 499-567, Artech House, Boston, 2000.
[5] E. Semerdjiev, L. Mihaylova, X.R. Li, "Variable- and Fixed-Structure Augmented IMM Algorithms Using Coordinated Turn Model", Proc. Third Int. Conf. on Information Fusion, Paris, France, 2000.
[6] R.F. Stengel, Optimal Control and Estimation, Dover Publications, 1994.
[7] A. Farina, D. Benevenuti, B. Ristic, "Estimation accuracy of a landing point of a ballistic target", Proc. Fifth Int. Conf. on Information Fusion, Annapolis, Maryland, USA, 2002.
[8] C.K. Chui, G. Chen, Kalman Filtering, Springer-Verlag, 1987.
[9] E.A. Wan, A.T. Nelson, "Dual Extended Kalman Filter Methods", in S. Haykin (ed.), Kalman Filtering and Neural Networks, John Wiley & Sons, 2001.
[10] B. Ristic, S. Arulampalam, N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004.
[11] S. Julier, J.K. Uhlmann, H.F. Durrant-Whyte, "A New Method for the Nonlinear Transformation of Means and Covariances in Filters and Estimators", IEEE Transactions on Automatic Control, Vol. 45, No. 3, March 2000.
[12] S. Julier, J.K. Uhlmann, "Data Fusion in Nonlinear Systems", in D.L. Hall and J. Llinas (eds.), Handbook of Multisensor Data Fusion, pp. 13-1 to 13-21, CRC Press, Boca Raton, 2001.
[13] F. Opitz, T. Kausch, "UKF controlled Variable-Structure IMM Algorithms using Coordinated Turn Models", Proc. Seventh Int. Conf. on Information Fusion, Stockholm, Sweden, 2004.
[14] F. Opitz, "A Variable Structure Augmented IMM Algorithm based on Unscented Transformations", International Radar Symposium, Warszawa, Poland, 2004.
Underwater Threat Source Localization: Processing Sensor Network TDOAs with a Terascale Optical Core Device Jacob BARHEN a,1, Neena IMAM a, Michael VOSE a,b, Arkady AVERBUCH c, and Michael WARDLAW d a Oak Ridge National Laboratory, United States of America b University of Tennessee, United States of America c Lenslet Inc., Israel d Office of Naval Research, United States of America
Abstract. Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near–term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront Time-Difference-of-Arrival (TDOA). The corresponding algorithms are implemented on the EnLight™ processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimized for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight™ 64D prototype processor as compared to a dual Intel XeonTM processor. Keywords. Time-Difference-of-Arrival (TDOA), optical-core processor, sensor net, underwater source localization.
Introduction

In recent years, there has been a rapidly growing interest in near-real-time remote detection and localization of underwater threats using information provided by dynamically evolving sensor networks. This interest has been driven by the requirement to improve detection performance against stealthier targets using ever larger distributed sensor arrays under a variety of operational and environmental conditions. Figure 1 illustrates a typical mission, depicting a submerged threat (here a submarine), a patrol aircraft searching for it, and a field of Global Positioning System (GPS) capable sonobuoys.

1 Corresponding Author: J. Barhen, Center for Engineering Science Advanced Research, Computer Science and Mathematics Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN 37831-6016; E-mail: [email protected]

The buoys are passive omnidirectional sensors that provide
sound pressure measurements of the target signal perturbed by the ambient conditions. Once the buoys are placed, the aircraft monitors their transmissions and processes the data to detect, classify and localize the threat. The sonobuoys continuously monitor and transmit the measured signal via radio link with the aircraft. The position of the buoys is sampled periodically and also transmitted via radio link. A field of self-localizing sonobuoys provides a unique means for underwater target detection in terms of its deployment flexibility, signal acquisition speed, focused ranging, and capability for net-centric information fusion. However, demanding calculations need to be performed to achieve source localization, and their complexity is known to increase dramatically with the size of the sensor array. This, in turn, results in substantial processing power requirements that cannot readily be met with off-the-shelf computing hardware. In fact, it is generally recognized that the development of acoustic sensors for underwater detection is much less challenging than identifying and implementing, in near real-time, and often under severe power availability constraints, the appropriate signal processing and detection algorithms.

Figure 1. Patrol aircraft monitoring GPS-capable sonobuoys

Here, we will consider the implementation of an algorithm for signal wavefront Time-Difference-of-Arrival (TDOA) at each array element of a distributed sensor network. TDOA techniques are the cornerstone of modern source localization paradigms. Our implementation is carried out on the recently introduced EnLight platform. This revolutionary digital optical core processor offers tera-scale computing capabilities in a limited (native 8-bit) precision, fixed-point arithmetic architecture.
The specific objective of our effort was to (i) demonstrate the ability to reach a required accuracy in the TDOA computation, and (ii) estimate the speed-up achieved when using an EnLight device as compared to a leading-edge Intel XeonTM processor.
1. Underwater Threat Localization with a Sensor Network Over the past few decades, a great deal of effort has been devoted to the extraction of spatio-temporal information from an array of spatially distributed sensors [1, 2]. In the area of Anti-Submarine Warfare (ASW), much attention has focused on adaptive beamforming, primarily in the context of towed arrays [3, 4]. The basic emphasis of such a research was to achieve robust detection and Direction-of-Arrival (DOA) estimation under requirements for auto-calibration of the arrays [5, 6]. Notwithstanding the
J. Barhen et al. / Underwater Threat Source Localization
considerable progress reported over the years, today's leading paradigms still face substantial degradation in the presence of realistic ambient noise and clutter [7]. With the emergence of large-scale dynamic sensor networks (as depicted in Figure 1), where each individual sensor is subject to random motion, many previously postulated basic assumptions [8] (e.g., far-field geometry) are no longer valid. For instance, the sensors typically have arbitrary spacing between them, and the aperture of the distributed array may be considerable compared to the distance to the threat. Thus, different paradigms must be considered. One of the most robust is based on the concept of signal wavefront TDOA [9]. Several methodologies are available for localizing a threat source in the context of TDOAs. For illustrative purposes, we briefly mention only three interesting approaches. Each of them requires, as a necessary first step, that accurate estimates of the TDOAs for each combination of sensors in the network be obtained. The first methodology finds an estimate of the source location given the TDOAs and the sonobuoy positions, using either maximum likelihood [10] or iterative least-squares [11] optimization procedures. The second methodology attempts to obtain a closed-form solution for the source location directly. Recently reported results [12] indicate that excellent accuracy can be achieved under minimal operational constraints of sensor non-collinearity. The third approach is novel, and is introduced in a companion paper [13]. Its primary interest resides in the fact that it enables simultaneous estimation of the TDOAs and the threat source location. It builds upon the NOGA algorithms [14] that were developed for uncertainty analysis of nonlinear systems.
2. Time Differences of Arrival

Let τ_nm(p^(s)) denote the TDOA between sensor n and sensor m for a signal wavefront originating from a source with position coordinates p^(s) = (p_1^(s), p_2^(s), p_3^(s))~. Note that the superscript ~ refers to transposition. The TDOA is defined as

    τ_nm(p^(s)) = ‖p^(n) − p^(s)‖ / c − ‖p^(m) − p^(s)‖ / c        (1)
where p^(n) represents the position of the n-th sonobuoy, and c represents the speed of sound in the medium. Because of the absence of a timing reference on the unknown threat source, the most commonly used technique for TDOA computation is cross-correlation. One usually has to estimate the TDOA for each sensor pair (n, m) from signals x_n(t) and x_m(t) measured at sonobuoys n and m, respectively. Consider then a signal s(t) radiating from a remote source through a channel that is subject to possibly strong interference and noise. The simplest signal propagation model for estimating the TDOA between signals x_n(t) and x_m(t) is
    x_m(t) = A_m s(t − δ_m) + η_m(t)        (2a)

    x_n(t) = A_n s(t − δ_n) + η_n(t)        (2b)
where A_m and A_n are amplitudes scaling the signal, η_m(t) and η_n(t) represent noise and interfering signals, and δ_m and δ_n are signal delay times. Let m correspond to the sensor with the smaller delay. If we refer the delay times and scaling amplitudes to m, denote the amplitude ratio by A, and define τ_nm = δ_n − δ_m, the signal propagation model becomes
    x_m(t) = s(t) + η_m(t)                  (3a)

    x_n(t) = A s(t − τ_nm) + η_n(t)         (3b)
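As a side note, the geometric TDOA of Eq. (1) is straightforward to evaluate once a candidate source position is hypothesized. The sketch below illustrates this; the positions, the 1500 m/s sound speed, and all variable names are illustrative assumptions, not values from the paper.

```python
import math

def tdoa(p_n, p_m, p_s, c=1500.0):
    """Eq. (1): difference of the two source-to-sensor propagation
    delays, in seconds (positive if sensor m hears the wavefront first)."""
    return math.dist(p_n, p_s) / c - math.dist(p_m, p_s) / c

# Hypothetical geometry (metres): two drifting sonobuoys and a submerged source.
p1 = (0.0, 0.0, -10.0)
p2 = (1200.0, 300.0, -10.0)
src = (600.0, -400.0, -50.0)
print(tdoa(p1, p2, src))  # sign tells which sensor the wavefront reaches first
```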
To apply cross-correlation techniques, one assumes that η_m(t) and η_n(t) are real, jointly stationary, zero-mean random processes that are mutually uncorrelated, as well as uncorrelated with s(t). The cross-correlation between signals x_n(t) and x_m(t) measured at sonobuoys n and m is then defined as
    R_{x_m x_n}(τ) = ∫_{−∞}^{+∞} x_n(t) x_m(t + τ) dt        (4)
The argument τ that maximizes R_{x_m x_n} provides an estimate of the TDOA τ_nm. Such a technique enables the synchronization of all sensors participating in the localization process. However, the correlation R_{x_m x_n} can only be estimated from sequences of length N corresponding to discrete samples of the signals. Thus, an estimate of it is given by
    R̂_{x_m x_n}(μ) = (1/N) Σ_{ν=0}^{N−|μ|−1} x_n(ν) x_m(ν + μ)        (5)
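A minimal sketch of the discrete estimator of Eq. (5) follows, applied to a synthetic pulsed signal in the spirit of Section 4. The pulse period, delay, and (deliberately low) noise level are made-up test values; note that with this definition a positive delay of x_n relative to x_m shows up as a correlation peak at the corresponding negative lag.

```python
import numpy as np

def xcorr_estimate(xn, xm, max_lag):
    """Biased sample cross-correlation of Eq. (5):
    R_hat(mu) = (1/N) * sum_nu xn[nu] * xm[nu + mu], for |mu| <= max_lag."""
    N = len(xn)
    lags = np.arange(-max_lag, max_lag + 1)
    R = np.empty(len(lags))
    for i, mu in enumerate(lags):
        if mu >= 0:
            R[i] = np.dot(xn[:N - mu], xm[mu:]) / N
        else:
            R[i] = np.dot(xn[-mu:], xm[:N + mu]) / N
    return lags, R

# Pulsed signal with period 25 samples, delayed by 7 samples at sensor n;
# noise is kept low here so the example stays deterministic.
rng = np.random.default_rng(0)
s = np.zeros(1024)
s[::25] = 1.0
xm = s + 0.1 * rng.standard_normal(1024)              # reference sensor m
xn = np.roll(s, 7) + 0.1 * rng.standard_normal(1024)  # sensor n, lagging by 7
lags, R = xcorr_estimate(xn, xm, 12)
# Under this sign convention the 7-sample delay peaks at mu = -7.
print(lags[np.argmax(R)])
```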
Alternatively, the cross-correlation can be computed from the cross-power spectral density G_{x_m x_n}(f) of x_n(t) and x_m(t), i.e.,
    R_{x_m x_n}(τ) = ∫_{−∞}^{+∞} G_{x_m x_n}(f) e^{j2πfτ} df        (6)
This is of interest because G_{x_m x_n} can be obtained efficiently using Fourier transforms, which can be computed very fast by the optical core processor introduced in the sequel. In practice, one uses the concept of Generalized Cross-Correlation (GCC) [15], where a frequency weighting filter is introduced in order to sharpen the correlation peak. The GCC is defined as
    H_nm(t, τ) = ∫_{−∞}^{+∞} ψ(t, f) G_nm(t, f) e^{i2πfτ} df        (7)
where G_nm(t, f) is the cross-power spectrum at instant t and frequency f corresponding to signals x_n(t) and x_m(t), and ψ(t, f) is the frequency weighting filter. The GCC provides a coherence measure [16] that captures, for a hypothesized delay τ, the similarity between signal segments extracted from sensors n and m. For broad-band signals, the so-called phase transform technique was introduced in [15]. It translates into the following choice for the frequency weighting filter
    ψ(t, f) = 1 / |G_nm(t, f)|        (8)
In practice, of course, the signals at each sensor are sampled, and both H_nm and G_nm have to be obtained from finite-length sequences. To further increase the accuracy of the TDOA estimation and to achieve sub-sample precision, interpolation of the normalized cross-correlation with a windowed sinc filter can be performed, as suggested in [16], before finding the maximum.
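The GCC with the phase-transform weighting of Eqs. (7)-(8) can be sketched as follows. This is a generic FFT-based version, not the authors' EnLight implementation, and it uses spectral zero-padding for sub-sample interpolation rather than the windowed-sinc interpolation of [16]; all signal parameters are illustrative.

```python
import numpy as np

def gcc_phat(xn, xm, interp=1):
    """Generalized cross-correlation with the PHAT weighting (Eqs. 7-8):
    the cross-power spectrum is whitened by its magnitude before the
    inverse transform, which sharpens the correlation peak."""
    n = len(xn)
    nfft = 2 * n                                # zero-pad to avoid circular wrap
    Xn = np.fft.rfft(xn, nfft)
    Xm = np.fft.rfft(xm, nfft)
    G = Xm * np.conj(Xn)                        # cross-power spectrum
    G /= np.maximum(np.abs(G), 1e-12)           # PHAT filter: psi = 1/|G|
    r = np.fft.irfft(G, interp * nfft)          # spectral zero-padding interpolation
    max_shift = interp * n
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))  # lags -n..+n
    return (np.argmax(r) - max_shift) / interp  # TDOA estimate, in samples

rng = np.random.default_rng(1)
s = rng.standard_normal(4096)                   # broad-band source signal
xm = s + 0.1 * rng.standard_normal(4096)
xn = np.roll(s, 11) + 0.1 * rng.standard_normal(4096)
# A delay of xn by 11 samples appears at lag -11 under this sign convention.
print(gcc_phat(xn, xm))
```

With `interp > 1` the inverse transform is evaluated on a finer grid, giving fractional-sample lag estimates.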
3. The EnLight Optical Core Processor

To address the computational challenges raised by underwater threat source localization, revolutionary computing technologies are needed. These are defined in terms of technological breakthroughs, both at the device and algorithmic levels, which leapfrog over near-term projected advances in conventional hardware and software to (potentially) result in paradigm shifts in computational science. For maritime sensing applications, one of the most promising advances builds upon the emergence of digital optical-core devices, inherently capable of high parallelism that can be translated into very high performance computing.

Figure 2. Architecture of the EnLight optical core processor (high-speed input/output ports, vector memory and register file, vector processing unit with micro-program memory, optical-core vector-matrix multiplier with fast matrix buffer and matrix memory, scalar processing unit and control, and host interface).
Recently, Lenslet Inc. introduced the novel EnLight™ processing platform [17]. The EnLight™256 is a small-form-factor digital signal processing chip (5×5 cm²) with an optical core. The processor is optimized for array operations, which it can perform in fixed-point arithmetic at a rate of 16 TeraOPS at 8-bit precision. This is substantially faster than the fastest FPGA or DSP processors available today. The architecture of a computational node is shown in Figure 2. The optical core performs matrix-vector multiplications (MVM), where the nominal matrix size is 256×256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields the peak performance of 16 TeraOPS. The rationale for large matrices is the good scaling and parallelism of such an optical processor: the larger the scale, the faster the computation, with a relatively small scaling penalty compared to electronics.
Figure 3. The EnLight™ 64D demonstrator board
Before starting production of the EnLight™256 processor, Lenslet built the EnLight™64D board, shown in Figure 3. This is a prototype demonstrator for the optical processing technology, with a reduced-size 64×64 optical core. In our proof-of-concept effort focused on TDOA estimation, we used the 64D for all hardware tests. To project scale-up capabilities, we also tested our algorithms with the bit-exact simulator of the EnLight™256.
The EnLight™64D is specified as follows. Its clock operates at 60 MHz. The optical core has 64 input channels (configured as 256 vertical-cavity surface-emitting lasers, bundled in groups of 4 per channel). The size of the active matrix is 64×64; it is embedded in a larger multiple-quantum-well (MQW) spatial light modulator of size 264×288. There are 64 output channels (64 light detectors integrated with an array of analog-to-digital converters). The optical core performs the MVM function at a rate of 60×10⁶ × 64² × 2 = 492 Giga operations per second. Each of the 64 data components in the input and output channels has 8-bit accuracy, which results in a data stream of 60×10⁶ × 64 × 8 bit/s = 30.7 Gigabits per second. We have developed algorithms that not only specifically build upon the massive parallelism of the EnLight processor, but also exploit the physics of this unique device. What is meant here is that a Discrete Fourier Transform (implemented as a simple matrix-vector multiplication) can be performed by the EnLight in a single processor clock cycle, provided the matrix fits in the core. This has enabled us to develop new
hybrid FFT/DFT high-radix implementations of transforms. Details are given in a separate paper [18].
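The DFT-as-MVM idea can be illustrated numerically: build the 64×64 DFT matrix, quantize it and the input to emulate 8-bit fixed-point operands, and compare the single matrix-vector product against a floating-point FFT. The quantization scheme below is a simplified stand-in for the EnLight's actual fixed-point pipeline, which is not described in detail here.

```python
import numpy as np

N = 64                                       # matches the EnLight 64D core size
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
F = np.exp(-2j * np.pi * k * n / N)          # DFT matrix: one MVM per transform

def q8(a, scale=127.0):
    """Round to a signed-8-bit grid in [-1, 1] (an assumed, simplified
    model of the fixed-point core's operand precision)."""
    return np.round(a * scale) / scale

Fq = q8(F.real) + 1j * q8(F.imag)            # quantized operand matrix

rng = np.random.default_rng(2)
x = rng.standard_normal(N)
x_scaled = x / np.max(np.abs(x))             # scale input into [-1, 1]
X_ref = np.fft.fft(x_scaled)                 # floating-point reference
X_mvm = Fq @ q8(x_scaled)                    # the single "optical" MVM
err = np.max(np.abs(X_mvm - X_ref)) / np.max(np.abs(X_ref))
print(err)                                   # relative error due to quantization
```

For well-scaled inputs the 8-bit quantization error stays small relative to the spectrum magnitude, consistent with the accuracy results reported in Section 4.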
4. Results

In this study, we are interested in demonstrating the ability of the EnLight computing platform to accurately carry out the estimation of signal wavefront TDOAs. For the purpose of the numerical simulations and optoelectronic hardware implementation, a number of operational simplifications are made. In particular, we assume that: (i) only a single target is present during the detection process; (ii) the same speed of sound is experienced at each sensor location; (iii) each sonobuoy position is known exactly (via GPS) as it drifts; and (iv) detection opportunities are defined by the incidence of signals at the sonobuoy, and not all sonobuoys may detect the source.

Figure 4. Synthetic scenario for TDOA estimation (x-y and x-z projections of the sensor and target coordinates).
A synthetic scenario is illustrated in Figure 4. The sensor net comprises 10 sonobuoys, all identified by yellow icons. Only seven of these sensors detect a signal. The target is denoted by a red icon. Both x-y and x-z plane projections of their coordinates are shown. Because of symmetry, there are only 21 TDOAs to be estimated for the seven active sensors. They are labelled lexicographically. Thus, TDOA1 corresponds to sonobuoy pair (1,2), TDOA7 refers to pair (2,3), and TDOA21 is obtained from pair (6,7).
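The lexicographic pair labelling can be checked in a few lines; the ordering below reproduces the correspondences quoted in the text.

```python
from itertools import combinations

# Lexicographic labelling of sensor pairs for the 7 active sonobuoys:
# TDOA_k is the k-th pair (1-based) in this ordering.
pairs = list(combinations(range(1, 8), 2))
print(len(pairs))   # 21
print(pairs[0])     # (1, 2)  -> TDOA1
print(pairs[6])     # (2, 3)  -> TDOA7
print(pairs[20])    # (6, 7)  -> TDOA21
```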
For assessing the accuracy of EnLight computations, we consider here a very simple model. We assume that the target emits a periodic pulsed signal with unit nominal amplitude. Pulse duration is 1 SI, and the inter-pulse period is 25 SIs, where one Sampling Interval (SI) is 0.08 s. Noise and interference are taken as a Gaussian process with a varying power level (typically up to unity). Signal extinction is neglected. Each sensor stores sequences of measured signal samples. Sequence lengths can range between 1K and 80K samples. In Figure 5, we illustrate the first 250 samples of two such signal traces, for instance those recorded at sensors 1 and 2 for a Signal-to-Noise Ratio (SNR) of −11 dB. Clearly, the pulsed signal signature from the threat source is indistinguishable because of the strong noise and interference. This contributes to the rationale for using correlation techniques in the source localization process.

Figure 5. Synthetic data sequences (signal magnitude versus sample index) recorded at sensors #1 (orange) and #2 (green).
Figure 6. TDOA magnitude (in units of sampling intervals) versus sensor pairs (ordered lexicographically) for 7 active sensors. Exact (model) results are in blue; sensor-inferred results (computed using 64-bit floating-point FORTRAN) are in brown. SNR = −9 dB.
To assess the accuracy of computations performed with the EnLight processor, we have computed the TDOAs for all 21 sensor pairs using three different approaches.
First, exact results were obtained using the model specified by Eq. (1). The sensor and target positions were assumed to be exactly known, and the sonic velocity was taken to be identical at all sensor locations. Calculations were carried out using Intel Visual FORTRAN in 64-bit precision. In Figures 6 to 9, the corresponding magnitudes are coloured in blue. In the second approach, the TDOAs were estimated using noise-corrupted data samples collected at each sensor. The correlations were calculated in terms of Fourier transforms, and the computations were again carried out using 64-bit Intel Visual FORTRAN. The magnitudes of the corresponding TDOAs are coloured in brown in Figures 6 and 8. In the third approach, the sensor data processing was implemented on the EnLight™64D hardware prototype. The TDOAs are coloured in yellow in Figures 7 and 9. For benchmark purposes, two sets of data were used. Each set corresponds to a different SNR level. These levels were selected to show the break-point of correct TDOA estimation for signals buried in ever stronger noise, when calculations are performed in high precision (floating point). This allows us to illustrate (by comparison) the occurrence of potential additional discrepancies introduced by the fixed-point, limited-precision EnLight architecture.
Figure 7. TDOA magnitude (in units of sampling intervals) versus sensor pairs (ordered lexicographically) for 7 active sensors. Exact (model) results are in blue; sensor-inferred results (computed using EnLight™64D) are in yellow. SNR = −9 dB.
As observed from Figures 6 and 7, both the EnLight and the high-precision Visual FORTRAN computations from sensor data produce TDOA estimates that are identical to the exact model results for SNR = −9 dB. Similar-quality results were obtained for all sets of equal or higher SNR, and for sequence lengths of at least 2K samples. Consider now a target signal embedded in noise at SNR = −11 dB. Figure 8 illustrates
the emergence of discrepancies due to noise in the correlations computed in high precision. The TDOA for sensor pair (2,7) is estimated incorrectly (the wrong correlation peak is selected as a result of the noise).

Figure 8. TDOA magnitude (in units of sampling intervals) versus sensor pairs (ordered lexicographically) for 7 active sensors. Exact (model) results are in blue; sensor-inferred results (computed using Intel Visual FORTRAN) are in brown. SNR = −11 dB. The discrepancy at pair (2,7) is flagged.
Figure 9. TDOA magnitude (in units of sampling intervals) versus sensor pairs (ordered lexicographically) for 7 active sensors. Exact (model) results are in blue; sensor-inferred results (computed using EnLight™64D) are in yellow. SNR = −11 dB. The two discrepancies are flagged.
Figure 9 shows that two discrepancies appear in the EnLight computations at −11 dB SNR. The TDOA discrepancy for sensor pair (2,7) corresponds to the one noted in
Figure 8 for the 64-bit precision calculations. Here, another error (peak misclassification) is introduced for sensor pair (4,5). It is a direct consequence of the limited precision used in the EnLight. In summary, the above results indicate that excellent accuracy can be achieved with the EnLight processor for properly scaled signals sampled over broad dynamic ranges. In terms of processing speed, we have carried out benchmark calculations for Fourier transforms of long signal sequences. In particular, we have compared the execution speed of the EnLight™64D hardware to processing on dual Intel Xeon processors running at 2 GHz with 1 GB RAM. The benchmark involved the computation of transforms for 32 sets of 80K complex samples. For each set, both the forward and the inverse Fourier transforms were calculated, the latter following multiplication of the former by the transform of a reference (to estimate the correlation). The measured times were 9,626 ms on the dual-Xeon system, versus 1.42 ms on the EnLight. This corresponds to a speed-up of over 13,000 on a per-processor basis. We also carried out a capability projection using the EnLight™256 bit-exact simulator. The resulting time was 0.17 ms, yielding a speed-up of over 113,000 per processor. For the positive SNR used, perfect accuracy in determining the correlation peaks was obtained. More details on these computations can be found in [18].
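One plausible reading of the per-processor normalization (the dual-Xeon time credited to its two processors) reproduces the quoted speed-up figures; this interpretation is an assumption, not spelled out in the text.

```python
# Benchmark arithmetic: dual-Xeon wall time versus EnLight wall time,
# normalized per processor (the Xeon time is credited to two processors).
xeon_ms, n_xeon_cpus = 9626.0, 2
enlight64_ms, enlight256_ms = 1.42, 0.17

speedup_64 = xeon_ms * n_xeon_cpus / enlight64_ms
speedup_256 = xeon_ms * n_xeon_cpus / enlight256_ms
print(round(speedup_64))    # just over 13,500
print(round(speedup_256))   # just over 113,000
```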
5. Conclusions and Future Research

To achieve the real-time performance required for underwater threat source localization, many existing algorithms need to be revised and adapted to the emerging revolutionary computing technologies. These include field-programmable gate arrays (FPGA), processor-in-memory (PIM) architectures, and optical (or optoelectronic) devices. The EnLight terascale optical core processor represents one such revolutionary advance. In that context, our future efforts will focus on (i) demonstrating the ability to achieve the accuracy required (including, if necessary, higher than 8-bit) for other relevant maritime sensing applications; (ii) quantifying the speed-up achieved per processor as compared to a leading-edge conventional processor or DSP; (iii) determining the scaling properties per processor as a function of the number of sensors present in the detection, tracking, and discrimination network; and (iv) characterizing the SNR gain and detection improvement as a function of array size and geometry.

Thirty-five years ago, fast computational units were only present in vector supercomputers. Twenty-five years ago, the first message-passing machines (NCUBE, Intel) were introduced. Today, the availability of fast, low-cost processors has revolutionized the way calculations are performed in various fields, from personal workstations to terascale machines. An innovative approach to high-performance, massively parallel computing remains a key factor for progress in science and national defense applications. In
contrast to conventional approaches, one must develop computational paradigms that exploit, from the outset, (1) the concept of massive parallelism, and (2) the physics of the implementation device. This has been the guiding principle for our algorithm implementation on the EnLight processor. Ten to twenty years from now, asynchronous, optical, nanoelectronic, biologically inspired, and quantum technologies have the potential of further revolutionizing computational science and engineering by (a) offering unprecedented computational power for a wide class of demanding applications, and (b) enabling the implementation of novel information-processing paradigms.
Acknowledgments

Primary funding for this work was provided by the Office of Naval Research. Additional support was received from the Oak Ridge National Laboratory's LDRD program. Oak Ridge National Laboratory is managed for the United States Department of Energy by UT-Battelle, LLC under contract DE-AC05-00OR22725.
References

1. R. Klemm, Space-Time Adaptive Processing, The Institution of Electrical Engineers (UK) Press (1998).
2. R. Klemm, ed., Applications of Space-Time Adaptive Processing, The Institution of Electrical Engineers (UK) Press (2004).
3. W. Burdick, Underwater Acoustic System Analysis, Prentice Hall (1984).
4. P. Tichavsky and K. T. Wong, "Quasi-fluid-mechanics-based quasi-Bayesian Cramer-Rao bounds for deformed towed-array direction finding", IEEE Transactions on Signal Processing, 52(1), 36-47 (2004).
5. A. Van Buren, "Near-field transmitting and receiving properties of planar near-field calibration arrays", Journal of the Acoustical Society of America, 89(3), 1423-1427 (1991).
6. M. Viberg and A. L. Swindlehurst, "A Bayesian approach to auto-calibration for parametric array signal processing", IEEE Transactions on Signal Processing, 42(12), 3495-3507 (1994).
7. A. Nuttall and J. Wilson, "Adaptive beamforming at very low frequencies in spatially coherent, cluttered noise environments with low signal-to-noise ratio and finite-averaging times", Journal of the Acoustical Society of America, 108(5), 2256-2265 (2000).
8. P. Tichavsky and K. T. Wong, "Near-field / far-field azimuth and elevation angle estimation using a single vector hydrophone", IEEE Transactions on Signal Processing, 49(11), 2498-2510 (2001).
9. T. Ajdler, I. Kozintsev, R. Lienhart, and M. Vetterli, "Acoustic source localization in distributed sensor networks", Asilomar Conference on Signals, Systems and Computers, CD-ROM, IEEE Press (2004).
10. Y. Chan and K. Ho, "A simple and efficient estimator for hyperbolic location", IEEE Transactions on Signal Processing, 42(8), 1905-1915 (1994).
11. R. Schmidt, "Least squares range difference location", IEEE Transactions on Aerospace and Electronic Systems, 32(1), 234-241 (1996).
12. G. Mellen, M. Pachter, and J. Raquet, "Closed-form solution for determining emitter location using time difference of arrival measurements", IEEE Transactions on Aerospace and Electronic Systems, 39(3), 1056-1058 (2003).
13. J. Barhen, N. Imam, and M. Wardlaw, "Underwater Threat Source Localization: Uncertainty Reduction Algorithms for the EnLight Terascale Optical Core Processor", NATO ARW Data Fusion Technologies for Harbor Protection, Estonia 2005, IOS Publishers (in press, 2006).
14. J. Barhen, V. Protopopescu, and D. Reister, "Consistent uncertainty reduction in modeling nonlinear systems", SIAM Journal of Scientific Computing, 26, 653-665 (2004).
15. C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay", IEEE Transactions on Acoustics, Speech and Signal Processing, 24(4), 320-327 (1976).
16. M. Omologo and P. Svaizer, "Use of crosspower-spectrum phase in acoustic event location", IEEE Transactions on Speech and Audio Processing, 5(3), 288-292 (1997).
17. A. Sariel, A. Halperin, and S. Levit, at URL: www.lenslet.com.
18. J. Barhen, N. Imam, A. Averbuch, M. Berlin, and M. Wardlaw, "Implementation of an Active Sonar Matched Filter Algorithm for Broadband Doppler-Sensitive Waveforms on the EnLight Terascale Optical Core Processor", IEEE Transactions on Signal Processing (submitted, February 2006).
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
On Quality of Information in Multi-Source Fusion Environments Eric LEFEBVRE a,1 , Melita HADZAGIC b and Éloi BOSSÉ c a Lockheed Martin Canada, Montréal, QC, Canada b Dept. of Elec. and Comp. Engineering, McGill University, Montréal, QC, Canada c DRDC-RDDC, Val-Belair, QC, Canada Abstract. The effectiveness of a multi-source fusion process for decision making highly depends on the quality of information that is received and processed by the fusion system. This paper summarizes the existing quantitative analyses of different aspects of information quality in multi-source fusion environments. The summary includes definitions of four main aspects of information, namely, uncertainty, reliability, completeness and relevance. The quantitative assessment of quality of the information can facilitate evaluating how well the product of the fusion process represents the reality, hence contribute to improved decision making. Keywords. information quality, multi-source fusion, uncertainty, completeness, relevance, reliability
Introduction

The effectiveness of a multi-source fusion process for decision making highly depends on the quality of information that is received and processed by the fusion system. A quantitative assessment of the quality of this information can facilitate evaluating how well the product of the fusion process represents reality, and hence contribute to improved decision making. This paper summarizes the existing quantitative analyses of different aspects of information quality in multi-source fusion environments. The summary includes definitions of four main aspects of information, namely uncertainty, reliability, completeness and relevance, and descriptions of strategies and metrics for accounting for each aspect. In Section 1 we put these aspects of information quality in the context of an operational fusion environment. In Section 2, we define uncertainty, reliability, completeness and relevance, and present available quantitative methodologies for accounting for each information property within the process of information fusion. In Section 3, we provide the conclusions.

1. Fusion Environment

In the operational context, a fusion process includes several components which influence the quality of information produced by the fusion system. Figure 1 illustrates a simplified 1 Correspondence to: Eric Lefebvre, Lockheed Martin Canada, 6111 Royalmount Ave., Montréal, QC, H4P 1K6. Tel.: +1 514 340 8310 ext.8715; Fax: +1 514 340 8354; E-mail:
[email protected].
E. Lefebvre et al. / On Quality of Information in Multi-Source Fusion Environments
model of the operational fusion process. It shows the components that take part in real events, as well as in the fusion system representation. First, sensors detect events. This detection is subject to the sensor characteristics. The information obtained by the sensors is limited by the type of sensors, and it deviates from the real values depending on the sensor precision and accuracy.
Figure 1. Simplified model of the operational fusion process.
Next, the sensors' information is collected. A component that performs this task, the Collector, may be a sensor management system or an external fusion node, such as a collaborating agency (e.g. the Coast Guard providing information to the Navy would be considered a Collector in this representation). The Collector assesses the reliability of the information provided by the sensors it manages. The Fusion Engine is responsible for fusing the information and constructing the representation of the real world for a decision maker. Within this simplified model of the operational fusion process, it is possible to identify four main aspects (or properties) of information quality, namely uncertainty, reliability, completeness, and relevance. Each aspect can be loosely coupled with a different component of the fusion system. The uncertainty, in our view, relates to the detection ability of the sensor. The reliability of sensors, and hence of the information, relates to the sensor properties as well, but it is evaluated at a higher level, i.e. within the Collector component. The information completeness will depend on the fusion procedure; hence it is related to the Fusion Engine component. Finally, the relevance of information, in terms of added value and timeliness, will depend on the needs of the decision maker. Therefore, the overall quality of information produced by the fusion system may not be an absolute value, but rather depends on the situation, the choice of the system components, and the system/user's needs, i.e. it is context dependent. The assessment of the aforementioned aspects of information quality will allow one to evaluate how accurately the fusion system represents reality. Thus, it will also lead to improved decision making. Another important issue for a decision maker is situation awareness.
However, in this paper, we present only the methodologies that consider the objective part of information, without addressing the subjective opinion that a human may have about the information, hence excluding situation awareness from the representation of the quality of information.
2. Information Properties

The following information properties determine the quality of information in the information fusion process: uncertainty, reliability, completeness and relevance. To completely account for the quality of information in the fusion process, we need to assess each of these properties individually. In this Section, we present the definitions, the principal concepts and the strategies for accounting for uncertainty, reliability, completeness, and relevance of information in the process of information fusion.

2.1. Uncertainty

Various typologies of uncertainty exist in the literature and have been discussed in [1]. The typology proposed by Klir and Wierman [4] does not mention knowledge, and thus stays at a lower level of processing, i.e. at the information level. Its concept is closely related to quantitative theories, and leads to corresponding measures of uncertainty, or uncertainty-based information. This typology distinguishes three main types of uncertainty, namely fuzziness, nonspecificity, and discord. In this paper, we adopt the circular typology of uncertainty proposed in [5]. This typology is based on that of Klir and Wierman; see Figure 2.
Figure 2. Circular typology of uncertainty.
In the framework of evidence theory, the belief function can model both nonspecificity and discord. The fuzzy set theory, representing and managing vague information, deals with fuzziness and nonspecificity as the main kinds of uncertainty. The most adequate framework for representing uncertainty when dealing with all three kinds of uncertainty is the combination of evidence and fuzzy set theory, i.e. fuzzy evidence theory [5]. Here, we briefly provide the theoretical basics of fuzzy evidence theory; a more detailed description can be found in [5] and references therein.

Let X be a frame of discernment containing N distinct objects, P(X) the power set of X, and let x ∈ X be any element of X. Let Bel(A) = Σ_{B⊆A} m(B) be a belief function, and m : P(X) → [0, 1] a basic probability assignment (BPA), as defined in evidence theory. If the set A is defined as a fuzzy set Ã (with membership μ_Ã(x), ∀x ∈ Ã), then the BPA m becomes the fuzzy BPA m̃ in fuzzy evidence theory. The belief function in the fuzzy event is given by

    Bel(Ã) = Σ_{B̃⊆X} I(B̃ ⊆ Ã) m̃(B̃)        (1)

where I(B̃ ⊆ Ã) denotes the inclusion, taken over the power set P(X). The pignistic probabilities, when Ã reduces to a singleton (i.e. when Ã(x) = 1 for a single x ∈ X, and 0 elsewhere), are extended to

    BetP_m̃(x) = Σ_{x∈B̃⊆X} m̃(B̃) B̃(x) / |B̃|        (2)
The appropriate choice of framework to represent uncertainty may improve the quality of the information in a fusion process. Furthermore, the quality of information can be measured by the reduction of uncertainty [6]. Various theories in the field of generalized information theory provide methods for measuring uncertainty; they are summarized in [7]. We refer to the results in [5], where a general measure of uncertainty (GM) is used to quantify, in an aggregate fashion, the total uncertainty of a system based on fuzzy evidence theory. Moreover, the GM is used for artificially reducing the uncertainty of a fuzzy BPA.

Definition 1 Let m̃ be a fuzzy BPA defined on a finite frame of discernment X. The General Measure of Uncertainty of m̃ is defined by

GM(m̃) = − Σ_{x∈X} [BetP_m̃(x) log₂ BetP_m̃(x) + BetP̄_m̃(x) log₂ BetP̄_m̃(x)]    (3)

where

BetP_m̃(x) = Σ_{x∈Ã⊆X} m̃(Ã) Ã(x) / |Ã|    (4)

BetP̄_m̃(x) = Σ_{x∈Ã⊆X} m̃(Ã) (1 − Ã(x)) / |Ã|    (5)
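To make Definition 1 concrete, the pignistic quantities (4)-(5) and the GM can be evaluated for a small fuzzy BPA. The frame, the membership values, and the choice of |Ã| as the cardinality of the support are illustrative assumptions, not taken from [5]:

```python
import math

X = ["a", "b", "c"]  # frame of discernment

# Fuzzy BPA: each focal element is a membership function over X, with a mass.
# (Illustrative values only -- not taken from the paper.)
fuzzy_bpa = [
    ({"a": 1.0, "b": 0.6, "c": 0.0}, 0.7),
    ({"a": 0.2, "b": 1.0, "c": 0.4}, 0.3),
]

def card(A):
    """Cardinality |A| -- here taken as the size of the support (an assumption)."""
    return sum(1 for v in A.values() if v > 0)

def betp(x):       # Eq. (4): pignistic probability of x
    return sum(m * A.get(x, 0.0) / card(A) for A, m in fuzzy_bpa)

def betp_bar(x):   # Eq. (5): pignistic probability of the complement
    return sum(m * (1.0 - A.get(x, 0.0)) / card(A) for A, m in fuzzy_bpa)

def plogp(p):
    return p * math.log2(p) if p > 0 else 0.0

def gm():          # Eq. (3): general measure of uncertainty
    return -sum(plogp(betp(x)) + plogp(betp_bar(x)) for x in X)

print(gm())
```

A vacuous BPA (all mass on X) would drive the GM up, while a crisp singleton BPA would drive it toward zero, matching the intent of the reduction operations described next.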
Three basic operations for artificially reducing the uncertainty of a fuzzy BPA using the GM are proposed: (1) defuzzification, (2) specification, and (3) accordance. Defuzzification transforms a fuzzy BPA into a crisp one; applied to a fuzzy set it gives a crisp set, while applied to a fuzzy probability distribution it gives a classical probability distribution. Specification transforms a fuzzy BPA into a fuzzy probability distribution; applied to a fuzzy set it gives a nonspecific fuzzy set, while applied to a crisp set it gives a singleton. Accordance transforms a fuzzy BPA into a fuzzy set; applied to a fuzzy probability distribution it gives a nonspecific fuzzy set, while applied to a classical probability distribution it gives a singleton. The GM belongs to the group of quantitative methods for representing and measuring uncertainty. Uncertainty can also be represented using qualitative methods; see [8] and references therein.

2.2. Reliability

In reality, the information sources used in the fusion process (e.g. sensors) may not be completely reliable, or may not all have the same reliability. Hence, to obtain a better representation of the real world, one needs to account for reliability. The concepts and strategies for incorporating reliability into the fusion process have been summarized in [9]. According to [9], the reliability of information is closely related to
the modeling of uncertainty of a source of information and the difficulty of finding an adequate belief model to describe that uncertainty. Additionally, the models may come from different uncertainty theoretical frameworks, since different sources of information must be dealt with.

2.2.1. Reliability as a Higher Order of Uncertainty

The recommended metrics for accounting for the ranges of validity and other limitations of a chosen belief model for each information source are the reliability coefficients. In this context, the reliability coefficients represent the uncertainty of an evaluation of uncertainty, also called the second (or higher) order of uncertainty. They are considered as measures of the adequacy of the model in use and of the state of the observed environment. There are two approaches to accounting for reliability as the higher order of uncertainty: (1) representing reliability as relative stability, i.e. by measuring the performance of each information source, and (2) measuring the accuracy of the predicted beliefs, where the reliability coefficients represent the adequacy of each belief model with respect to reality. Here, as in [9], we present only the second approach. Let si, i = 1, . . . , I be the data produced by I sources, and Θ = {θ1, . . . , θN} be a set of events under consideration. It is assumed that there is a model M which utilizes the data and the prior information to provide the degree of belief xi in the event A ∈ Θ: xi(A), i = 1, . . . , I. These degrees of belief take values in a real interval and are modeled within a chosen theoretical framework for representing uncertainty. The degree of belief based on fused information which takes reliability into account is defined by the fusion operator FR(x1, . . . , xI, R1, . . . , RI), where Ri ∈ [0, 1], i = 1, . . . , I are the reliability coefficients. Ri is close to zero if source i is unreliable and close to 1 if it is reliable.
The reliability coefficients depend not only on the selected belief model, but also on the characteristics of the environment and the domain of the input. Hence, the reliability coefficients can be written as Ri = R(Mi, γ, Y), where Mi is the model chosen for source i, while γ and Y are the vectors of parameters that characterize the external and internal environments, respectively.

2.2.2. Fusion Strategies

The fusion operator, FR = F(x1, . . . , xI, R1, . . . , RI), depends on the global knowledge about the information sources, the environment and the chosen belief model, each possibly providing different information about the reliability. According to the level of knowledge we have about the information sources, the following situations can be distinguished [9]:
1. It is possible to assign a numerical degree of reliability to each source. In this case, each reliability value may be "relative" or "absolute", i.e. the reliability values may or may not be linked by an equation such as Σi Ri = 1.
2. The order of the reliabilities of the sources is known, but not their precise values.
3. A subset of the sources is reliable, but we do not know which one.
To exploit the knowledge available in these situations, the following strategies may be used:
a. Explicitly utilizing the reliability of the sources.
b. Identifying the quality of the input data of the fusion process and eliminating data of poor reliability.
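Strategy (a) is often realized with Shafer's classical discounting operation, which weights each source's BPA by its reliability coefficient Ri and moves the remaining mass onto total ignorance. A minimal sketch; the frame and mass values are illustrative, not taken from the survey [9]:

```python
FRAME = frozenset(["t1", "t2"])  # hypothetical frame of events Theta

def discount(bpa, R):
    """Shafer discounting: scale every focal mass by the reliability R in
    [0, 1] and assign the remaining 1 - R to total ignorance (the frame)."""
    out = {A: R * m for A, m in bpa.items()}
    out[FRAME] = out.get(FRAME, 0.0) + (1.0 - R)
    return out

# A fully reliable source keeps its beliefs; an unreliable one becomes vacuous.
bpa = {frozenset(["t1"]): 0.8, FRAME: 0.2}
print(discount(bpa, 1.0))   # unchanged
print(discount(bpa, 0.0))   # all mass on the frame
```

The discounted BPAs can then be combined by any chosen fusion operator, so the reliability enters the process before combination rather than after.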
2.2.3. Reliability Coefficients

The major issue in building the fusion operator, FR, is modeling the reliability coefficients, Ri. Their models may be constructed using domain knowledge from external sources and contextual information, learned from training data (e.g. in neural networks), or expressed as a function of the agreement between different sources, or between the sources and the fusion results.

2.3. Completeness

Several descriptions of completeness as an aspect of imperfect information can be found in the literature, see [11], and they all relate to the deficiency of information. Deficiency is a property that results from the incompleteness of information that concerns the user (or the fusion system). Even incomplete information can sometimes be sufficient from the user's point of view. In the structured thesaurus of imperfection of Smets [11], the word incomplete is mentioned in the context of imprecision and of data without error but missing (i.e. not present, although expected). The problem of completeness in evidence theory (i.e. updating beliefs with incomplete information) was raised in [12]. The process of updating probabilities with observations that are incomplete, or set-valued, requires knowledge of the incompleteness mechanism, or so-called protocol, which turns a complete observation into an incomplete one. The results in [12] show that neglecting the incompleteness mechanism leads to naive conditioning, which is generally prone to failure. Nevertheless, it is also observed that such protocols do not always exist in practical applications of probability or evidence theory. More recently, it has been shown in [13] that commonly used strategies for updating beliefs fail, except under very special assumptions. It has also been confirmed that the incompleteness mechanism may be unknown, or difficult to model, and that the condition of coarsening at random (CAR), which guarantees that naive updating produces the correct result, frequently does not hold.
In [14], a new method for updating beliefs with incomplete observations, which makes no assumptions about the incompleteness mechanism, is proposed. The ignorance about the probability of the missing measurement A is modeled by a vacuous lower prevision, a tool from the theory of imprecise probabilities. Without loss of generality, the vacuous lower prevision can be considered equivalent to the set of all distributions (i.e. it makes all incompleteness mechanisms possible a priori). Only coherence arguments are used to update the probabilities. The model of the incompleteness mechanism is applied to the special case of a classification problem using Bayesian networks. A definition of completeness more confined to the case of information fusion for situation awareness and decision making is given in [15]. Completeness is defined as the degree to which the information is not missing with respect to the relevant ground truth. In this context, having the information about all the relevant features of interest means that the information is complete. Here, the term relevant implies that completeness depends on the situation, the command level, and the scale of the operation. Completeness is assessed within a so-called information domain, which includes the information obtained from the sensor sources, the fusion, and the communication networks. The completeness of information obtained from the sensors is related to sensor detection and to the ability of the sensor suite to cover the area of operation (AO).
2.4. Relevance

A problem may arise when the amount of available information in a fusion system grows beyond its capacity. Too much information usually results in a degradation of system performance, typically due to the computational complexity of reasoning. Poor performance, in turn, undermines the fusion process. Hence, it is important to be able to determine which information is relevant to a particular fusion task and what can be ignored without compromising the resulting fused information. A significant amount of research on the relevance of information has been reported in the fields of information retrieval (IR) systems and query answering (QA) tasks, see [11] and references therein. The question of relevance was raised in [18] as one of the major problems in upgrading a search engine to a QA system; the latter is considered a very complex problem and far from solution. Nevertheless, it is suggested that relevance should be treated as a matter of degree, i.e. as a fuzzy concept. In [15], relevance is defined as the proportion of collected information that is related to the task at hand, meaning that, like completeness, it is context dependent. Despite the amount of published work on relevance, there seems to be no cointensive definition of relevance in the literature [18]. In the presence of uncertainty, the uncertainty representation will determine the approach used to assess relevance. In the following paragraphs, we present two methods for assessing relevance within the quantitative uncertainty framework. They address two important issues in information fusion systems: temporal relevance (e.g. the time of measurement arrival) and the added value of sensor reports. A description of methods for representing conditional ignorance and informational relevance in symbolic entropy theory, and of methods for extracting the best relevant information within the qualitative uncertainty framework, can be found in [19].
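The proportion-style definitions from [15], completeness (Section 2.3) and relevance, admit a simple set-based sketch; the item identifiers below are hypothetical:

```python
def completeness(collected, relevant):
    """Fraction of the relevant ground-truth items present in the
    collected information (completeness in the spirit of [15])."""
    relevant = set(relevant)
    return len(set(collected) & relevant) / len(relevant)

def relevance(collected, relevant):
    """Fraction of the collected information that is related to the
    task at hand (relevance in the spirit of [15])."""
    collected = set(collected)
    return len(collected & set(relevant)) / len(collected)

# Hypothetical report identifiers for a surveillance task.
task_relevant = {"convoy", "bridge", "radar_site", "airstrip"}
collected = {"convoy", "bridge", "fishing_boat"}
print(completeness(collected, task_relevant))  # 2 of 4 relevant items collected
print(relevance(collected, task_relevant))    # 2 of 3 collected items task-related
```

The two measures are complementary: a system can be highly relevant (little clutter collected) while still incomplete (much of the ground truth missed), and vice versa.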
2.4.1. Relevance Measures

In dynamic uncertain domains, two classes of irrelevant information, namely mutually independent beliefs and conditionally independent beliefs, are treated as independent information and accounted for accordingly. A third class of irrelevant information includes information that becomes obsolete with time due to the uncertain dynamic nature of change, i.e. the relevance of such information degenerates in time. It has been shown in [16] that this degeneration occurs in probabilistic temporal reasoning. A weaker temporal relevance criterion, representing a degree-of-relevance measure called temporal extraneousness, which captures this kind of relevance, is defined in [16].

Definition 2 If the maximum effect of information Θ at time tj on belief ql at time ti is less than a small value δ, then ti and tj are temporally extraneous with respect to ql. The extraneousness level δ is met when the inequality |P(qli|Θj) − P(qli|¬Θj)| ≤ δ holds.

The strength of the degree of relevance as measured by the temporal extraneousness can change according to the value of δ. A δ value of zero yields the strong irrelevance notion of probabilistic independence. It has been shown that the efficiency of probabilistic temporal reasoning can be improved by ignoring irrelevant and weakly relevant information. Another notion of relevance is provided in [17]. This notion of relevance corresponds to the uncertainty defined in Section 2.1. If the amount of uncertainty
of information is large, then the information is considered irrelevant. The amount of uncertainty remaining about the measurement x after the measurement y is observed can be represented by the conditional entropy, H(x|y). Therefore, the relevance of y can be measured using the conditional entropy, which is defined as

H(x|y) = H(x, y) − H(y)    (6)

where 0 ≤ H(x|y) ≤ H(x), and H(x, y) is the joint entropy of the observations x and y. To distinguish which sensor gives more accurate observations, another measure, the mutual information I(x, y), is used. The mutual information represents the amount of uncertainty that is resolved by observing y and is defined as

I(x, y) = H(x) − H(x|y)    (7)
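Equations (6) and (7) can be checked numerically on a small joint distribution; the distribution below is hypothetical:

```python
import math

def H(p):
    """Shannon entropy (bits) of a distribution given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical joint distribution p(x, y) over two binary observations.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x = [sum(v for (x, _), v in joint.items() if x == i) for i in (0, 1)]
p_y = [sum(v for (_, y), v in joint.items() if y == j) for j in (0, 1)]

H_xy = H(list(joint.values()))
H_cond = H_xy - H(p_y)          # Eq. (6): H(x|y) = H(x, y) - H(y)
I_xy = H(p_x) - H_cond          # Eq. (7): I(x, y) = H(x) - H(x|y)

print(H_cond, I_xy)
```

For this distribution y carries real information about x, so H(x|y) falls below H(x) and the mutual information is strictly positive; an uninformative y would give I(x, y) = 0.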
The methodology described in [17] has been applied to the tracking of multiple targets by a network of radar sensors, where it was shown to improve the decision accuracy at the current network node. This further helps in determining whether a sensor is functional, and how much weight to give to the decision of a neighboring node, both of which are important issues for data fusion. This is done using the lower and upper bounds and the probabilistic fusion framework described in [17].
3. Conclusions

To fully account for the quality of information in an operational multi-source fusion environment for decision making, one needs to account for all aspects of the information as it moves through the informational value chain of the fusion process. It is possible to identify four main aspects of information quality, namely uncertainty, reliability, completeness, and relevance. Each aspect can be loosely coupled with a different component of the fusion system. In this paper, we provided a summary of existing descriptions of the quantitative measures and metrics for these four aspects of information. In designing such measures, one may assume a particular theoretical framework for representing the processes that produce the information (e.g. sensors, fusion operators, networks), such as fuzzy evidence theory and the corresponding GM [5]. However, as reported in [15], this assumption may not be necessary, and one may not be concerned with how the information is transformed as it moves through the fusion system. In the latter case, the quality of the information processing is expressed through quality metrics in the form of parameters and alternative probability functions. The choice of measure for each individual information property, and of the strategy for incorporating that property into the fusion process, depends on the particular fusion application. Accounting for the uncertainty, reliability, completeness, and relevance of information in a multi-source fusion environment will contribute to a better understanding of the information and hence improve the representation of reality. A more accurate representation of reality will, in turn, contribute to improved decision making. The measures of individual aspects of information quality within the fusion environment may also facilitate the development of measures of performance for a fusion system.
References

[1] A.-L. Jousselme, P. Maupin, and E. Bossé, Uncertainty in a situation analysis perspective, Proceedings of the 6th International Conference on Information Fusion, pp. 728–736, July 2003.
[2] D. L. Hall and S. A. H. McMullen, Mathematical Techniques in Multisensor Data Fusion, Artech House, 2004.
[3] J. Y. Halpern, Reasoning about Uncertainty, MIT Press, 2003.
[4] G. J. Klir and M. J. Wierman, Uncertainty-Based Information, Vol. 15 of Studies in Fuzziness and Soft Computing, Physica-Verlag, NY, 2nd edition, 1999.
[5] C. Liu, A.-L. Jousselme, D. Grenier and E. Bossé, A general measure of uncertainty framed into the fuzzy evidence theory, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 2005. In review.
[6] G. J. Klir and T. A. Folger, Fuzzy Sets, Uncertainty and Information, Prentice Hall, NJ, 1988.
[7] C. Liu, A general measure of uncertainty-based information, Ph.D. thesis, Dept. of Electrical and Software Engineering, Laval University, 2004.
[8] S. Parsons, Qualitative Methods for Reasoning under Uncertainty, MIT Press, 2001.
[9] G. L. Rogova and V. Nimier, Reliability in information fusion: literature survey, Proceedings of the 7th International Conference on Information Fusion, pp. 1158–1165, 2004.
[10] D. Dubois and H. Prade, Combination of fuzzy information in the framework of possibility theory, in Data Fusion in Robotics and Machine Learning, M. A. Abidi and R. C. Gonzalez, editors, Academic Press, 1992.
[11] A. Motro and P. Smets, editors, Uncertainty Management in Information Systems: From Needs to Solutions, Kluwer Academic Publishers, 1997.
[12] G. Shafer, Conditional probability, International Statistical Review, 53:261–277, 1985.
[13] P. D. Grünwald and J. Y. Halpern, Updating probabilities, Journal of Artificial Intelligence Research, pp. 243–278, 2003.
[14] G. de Cooman and M. Zaffalon, Updating beliefs with incomplete observations, Artificial Intelligence, Vol. 159, Issues 1–2, November 2004.
[15] W. Perry, D. Signori, and J. Boon, Exploring Information Superiority: A Methodology for Measuring the Quality of Information and Its Impact on Shared Awareness, RAND Corporation Report, 2004.
[16] A. Y. Tawfik and E. M. Neufeld, Irrelevance in uncertain temporal reasoning, Proceedings of the 3rd International IEEE Workshop on Temporal Representation and Reasoning, pp. 196–202, 1996.
[17] S. Kadambe, Sensor/data fusion based on value of information, Proceedings of the 6th International Conference on Information Fusion, pp. 25–32, 2003.
[18] L. Zadeh, From search engines to QA systems: the problems of world knowledge, relevance and deduction, WSEAS Conference, 2005.
[19] M. Chachoua and D. Pacholczyk, Qualitative reasoning under ignorance and information relevance extraction, Knowledge and Information Systems, 4(4):483–506, 2002.
Advances and Challenges in Multisensor Data and Information Processing
E. Lefebvre (Ed.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Polarimetric Features and Contextual Information Fusion for Automatic Target Detection and Recognition

Yannick ALLARD a, Mickael GERMAIN b, Olivier BONNEAU b
a Research and Development Department, Lockheed Martin Canada
b Centre de Recherche en Mathematiques, Universite de Montreal
[email protected], [email protected]

Abstract. Several studies have already shown that remote sensing imagery can provide valuable information for area surveillance missions and activity monitoring, and that its combination with contextual information can significantly improve the performance of target detection/target recognition (TD/TR) algorithms. In the context of surveillance missions, spaceborne synthetic aperture radars (SARs) are particularly useful due to their ability to operate day and night under any sky condition. Conventional SARs operate with a single polarization channel, while recent and future spaceborne SARs (Envisat ASAR, Radarsat-2) offer the possibility of using multiple polarization channels. Standard target detection approaches on SAR images consist of the application of a constant false alarm rate (CFAR) detector and usually produce a large number of false alarms, too many for manual rejection. However, over the past ten years a number of algorithms have been proposed to extract information from the polarimetric SAR scattering matrix in order to enhance and/or characterize man-made objects. The evidential fusion of such information can lead to the automatic rejection of the false alarms generated by the CFAR detector. In addition, the aforementioned information can lead to a better characterization of the detected targets. For more challenging backgrounds, as in ground-based target detection, the use of higher-level information such as context can help in the removal of false alarms. This paper will discuss the use of polarimetric information for target detection using polarimetric SAR imagery, as well as the benefit of contextual information fusion for ground-based target detection.
Key words: polarimetric SAR, target detection, contextual information, evidential fusion.
Introduction

Remote sensing imagery, due to its large spatial coverage, enables the monitoring of large areas and provides valuable information in the context of area surveillance and activity monitoring. The next generation of sensors that will likely be used for this particular task consists primarily of high-resolution Polarimetric SAR (PolSAR), Hyperspectral Imagery (HSI) and high-resolution optical systems. These sensors will provide a large amount of data and will require the development of tools and methodologies to automatically analyze them and extract meaningful information. For the particular tasks of target detection and area monitoring, PolSAR sensors have the
advantage of being independent of solar illumination and are very sensitive to the presence of man-made objects. During the last decade, many algorithms were developed for point target detection and characterization on polarimetric data. However, these polarimetric features cannot provide all the information needed to achieve target detection with an acceptable level of false alarms in the case of very challenging backgrounds. One challenge for the particular task of ground-based object detection and discrimination is the adequate use of contextual information. This paper discusses the task of target detection and characterization using PolSAR imagery for application in wide area surveillance. In the next section, conventional target detection methodology on SAR imagery is described. Section 2 presents the more commonly used polarimetric features that can be used for point target detection using polarimetric SAR imagery. The use of contextual information as an aid for target detection and discrimination is discussed in Section 3 while Section 4 presents examples of applications of polarimetric features fusion for target detection and characterization for maritime and ground area surveillance. Finally, conclusions are drawn in Section 5.
1. Target Detection using SAR Imagery

Conventional systems for target detection and recognition on SAR imagery usually consist of five stages (Figure 1).
Figure 1. Stages of a target detection and recognition system
The detection stage consists of determining the presence of a target signature at a particular position in the image. This task is usually achieved by applying a two-parameter Constant False Alarm Rate (CFAR) detector. However, this CFAR detection step generates numerous false detections, making manual rejection impossible, especially in the case of a very challenging background. Target discrimination, on the other hand, may be seen as a binary classification into target versus non-target. This paper focuses on these two tasks of the target detection and recognition scheme. Target characterization using polarimetric information will also be briefly discussed.

1.1. Constant False Alarm Rate (CFAR) Detection

The target detection stage selects areas with a high probability of containing targets. The detector must be computationally simple and should provide a high probability of detection while creating as few False Alarms (FAs) as possible, an FA being a detection that corresponds to a clutter region. One of the most widely used prescreeners in SAR target detection is the two-parameter CFAR detector, which is based on a normalized test of the pixel intensity against its local neighborhood. Figure 2 represents the typical window of analysis of a CFAR detector. The moving window is composed of a test pixel surrounded by a guard ring, which prevents any influence of the target on the boundary ring used to compute the necessary statistics. The
popularity of such a detector is due to a good compromise between simplicity and performance.
Figure 2. Principle of a CFAR Detector
However, the discriminating power of the two-parameter CFAR is not enough to reduce the false alarms to an acceptable level. The incorporation of complementary features, such as polarimetric features, should help the detection system provide a more reliable result. The next section describes some of the polarimetric features that can be used in the target discrimination stage to reduce the number of FAs to an acceptable level.
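As a rough illustration of the window logic in Figure 2 (not the authors' implementation; the window sizes and threshold are arbitrary assumptions), a two-parameter CFAR prescreener can be sketched as:

```python
import numpy as np

def cfar2p(img, guard=2, boundary=4, T=3.0):
    """Two-parameter CFAR prescreener (sketch): for each pixel, estimate the
    clutter mean and standard deviation from a boundary ring lying outside a
    guard ring, and flag the pixel if its normalized intensity exceeds T."""
    H, W = img.shape
    hits = np.zeros((H, W), dtype=bool)
    r = guard + boundary
    for i in range(r, H - r):
        for j in range(r, W - r):
            win = img[i - r:i + r + 1, j - r:j + r + 1].copy()
            # Mask out the guard ring and test pixel; stats come from the ring.
            win[boundary:-boundary, boundary:-boundary] = np.nan
            mu, sigma = np.nanmean(win), np.nanstd(win)
            if sigma > 0 and (img[i, j] - mu) / sigma > T:
                hits[i, j] = True
    return hits

rng = np.random.default_rng(0)
scene = rng.rayleigh(1.0, (40, 40))   # speckle-like clutter
scene[20, 20] = 30.0                  # bright point target
print(cfar2p(scene)[20, 20])
```

Even on this toy scene a handful of clutter pixels will exceed the threshold, illustrating why a discrimination stage must follow the prescreener.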
2. Polarimetric Features for Target Discrimination

The detection stage highlights potential targets that must pass through a discrimination stage intended to reject false alarms based on geometrical and electromagnetic properties. Polarimetric features introduce the means to tackle the task of target discrimination more effectively. With the current airborne and upcoming spaceborne polarimetric SARs (Radarsat-2), polarimetric decomposition algorithms can be used to remove false detections in the target discrimination stage. Usually, target discrimination is only applied in the Region Of Interest (ROI), so more computationally demanding algorithms, such as the polarimetric decompositions, can be applied for that task. Amongst the large number of available polarimetric decompositions, the following are the most interesting for point target detection, discrimination and characterization:
- The Odd/Even basis decomposition (4)
- Cameron's Coherent Target Decomposition (CTD) (2)
- Polarization anisotropy (7)
- The Symmetric Scattering Characterization Method (SSCM) (6)
- Subaperture Coherence (5)
For a complete description of these algorithms the reader should refer to the appropriate publications. Examples of these polarimetric features computed over maritime and ground areas are provided in the corresponding presentation.
3. Contextual Information for Target Detection and Discrimination

In the case of very challenging backgrounds, such as ground-based target detection and discrimination, the previously mentioned polarimetric features can still be used to
reduce the number of false detections. However, for such targets, the object-centred approaches largely used in the ATR community have drawbacks when the image suffers degradation or when the target is small compared to the sensor's resolution. In these cases, not enough local evidence can be extracted to ensure reliable detection and recognition. In the absence of local evidence, the scene structure and a priori knowledge should provide the information needed for efficient detection and recognition. The background is therefore considered an indicator of an object's presence and properties, not a potential distractor. Context information may be captured through a wide variety of methods, such as:
- Well-known pixel/object-based labelling techniques introducing dependencies through neighbouring pixel/region relationships
- Temporal data revisit
- Fusion (pixel, features) with data provided by other sensors
- Geographical Information System (GIS) thematic maps
- Other valuable knowledge sources for interpretation: meteorological data, tide timetables, etc.
In our example of application, we used a previously interpreted Ikonos image to model the context using topological relationships such as being on, near to or far from a certain land cover type. This contextual information can be used in the target detection task by modifying the false alarm probability of a CFAR detector (1), or in the target discrimination step, by performing context-based false alarm mitigation.
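One possible way to let context modulate the detector is to raise the per-pixel false alarm probability near land-cover features a target is assumed to favor. The decay model, distances and parameter values below are purely illustrative assumptions, not taken from the paper:

```python
import numpy as np

def contextual_pfa(dist_to_forest_edge, base_pfa=1e-3, near=50.0, boost=10.0):
    """Per-pixel CFAR false-alarm probability (hypothetical model): allow a
    higher PFA (i.e., a lower detection threshold) close to forest edges,
    where a ground target is assumed more likely to hide, decaying back to
    the base PFA with distance (meters)."""
    d = np.asarray(dist_to_forest_edge, dtype=float)
    return base_pfa * (1.0 + (boost - 1.0) * np.exp(-d / near))

print(contextual_pfa([0.0, 50.0, 1000.0]))
```

The same per-pixel map, inverted, could encode regions where detections are implausible (e.g. open water for a land vehicle) and drive context-based false alarm mitigation instead.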
4. Examples of Application

This section presents practical examples of target detection and discrimination using polarimetric features computed over PolSAR imagery. Two cases are discussed: maritime surveillance and ground-based target detection.

4.1. Maritime Surveillance

When performing ship detection on a SAR image using a CFAR detector, many false alarms (typically from 3-10 to more than a hundred) are generated. These false alarms are mainly caused by the sea state, small fishing boats, icebergs, etc. Since the maritime background is not overly challenging, using polarimetric information should be beneficial in removing a large number of these false alarms. The method we have chosen to demonstrate this is the evidential fusion of polarimetric information within the CFAR contacts to validate or discard each ROI.

4.1.1. Evidential Fusion of Polarimetric Features for False Alarm Mitigation

As mentioned earlier, because of the computational burden of the polarimetric decomposition and fusion algorithms, these are only applied to the ROIs that survive the CFAR detection step. The fusion of polarimetric features is achieved using Dempster-Shafer evidence theory (3). This framework offers the advantages of easy modelling of imprecision and uncertainty in the reasoning process, and takes compound hypotheses into account, which is particularly useful since our polarimetric features are unable to discriminate all the objects of interest in a precise manner.
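The core of this evidential fusion is Dempster's rule of combination; a minimal sketch over a two-class frame follows (the feature names and mass values are illustrative, not taken from the paper):

```python
from itertools import product

FRAME = frozenset(["target", "clutter"])  # hypothetical frame of discernment

def combine(m1, m2):
    """Dempster's rule of combination for two BPAs over the same frame.
    Masses of intersecting focal elements multiply; the conflicting mass
    (empty intersections) is removed by renormalization."""
    out, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            out[C] = out.get(C, 0.0) + a * b
        else:
            conflict += a * b
    k = 1.0 - conflict
    return {A: v / k for A, v in out.items()}

# Two polarimetric features, each expressed as a BPA with residual ignorance.
m_coh = {frozenset(["target"]): 0.6, FRAME: 0.4}   # e.g. subaperture coherence
m_ctd = {frozenset(["target"]): 0.5, FRAME: 0.5}   # e.g. a Cameron CTD output
print(combine(m_coh, m_ctd))
```

The compound hypothesis (the whole frame) carries each feature's ignorance, so a feature that cannot distinguish target from clutter simply leaves mass there rather than forcing a decision.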
To fuse polarimetric information, it is mandatory to define mass functions for each feature. We use trapezoidal mass functions for each "continuous" feature (e.g., subaperture coherence), or a hard confidence if the feature is a hard decision provided by an algorithm (e.g., the binary classification of coherent versus non-coherent point targets). The mass functions are assigned using our knowledge about each feature. In the case of continuous features, the overlapping parts of the trapezoidal mass functions do not allow us to choose their parameters very precisely (8).

4.2. Ground-based Target Detection

Consider now the case of ground-based area monitoring using remotely sensed imagery. In this case, false alarms are more likely to occur, for numerous reasons: speckle noise, smaller target size relative to the sensor's resolution, man-made returns from any metallic object, commercial vehicular traffic, etc. In addition, the possibility of target camouflage adds to the already difficult task of target detection. Given all these potential distractors, the task of ground-based target discrimination is more complex than its maritime counterpart. However, the use of contextual information should reduce the number of false detections to an acceptable level.

4.2.1. Integration of Contextual Information

Contextual information can be used in the first two steps of an ATR scheme: either directly in the detection stage during pre-screening, or in the discrimination step by performing context-based false alarm mitigation. During pre-screening, one will seek to use contextual information to vary the probability of false detection on a per-pixel basis, according to the land-cover features present in the scene under analysis and the distance of a particular pixel from these features.
If we suppose that interpreted imagery or GIS information layers are available, it is possible to use a priori knowledge about the terrain type and edge positions to modify the PFA of a CFAR detector so as to reflect the expected military behavior of the target we are trying to detect. As an example, one could state that a target would prefer being close to a forest boundary, which provides protection on one flank and camouflage. Proximity to a means of transportation should also be favored, for displacement reasons. One can use all these subjective assumptions about target behavior to modify the false alarm rate of the CFAR detector on a per-pixel basis. Knowledge of the land cover types present in a SAR scene can also be used to remove false alarms occurring in regions where a target cannot be detected, according to the sensor or target properties. This process is called context-based false alarm mitigation. In addition, the knowledge of land cover types enables a system to extract the target's context and infer some of the target's properties.

4.3. Target Characterization

The task of target characterization aims at extracting target features and recognizing targets from polarimetric SAR images. Extracting and characterizing a target's length is a complex problem, especially for ships, due to various factors such as uncontrolled environments, variable image acquisition geometry and
Y. Allard et al. / Polarimetric Features and Contextual Information Fusion
resolution, focus problems, and the dependence of radar scattering on target orientation.

4.3.1. Length and Orientation Estimation

Once a target is detected and segmented from the imagery, one task in target characterization is to estimate the length and orientation of the target. One commonly used method to extract this information is the Hough transform, which computes the target’s centreline; the length and orientation are then computed from the end-points of the line, assuming that the sample spacings in range and azimuth are known.

4.3.2. Characterization Using Polarimetric Features

When a target is detected, the polarimetric information can be used to characterize the target and/or to detect and identify superstructures on the target. One way to use the polarimetric features in such a task is to analyze the distribution of the elemental scatterers in different portions of the target. To do so, the detected target is segmented into a number of parts, and each part is analyzed to detect potential structures. We used the scatterer type derived from the Cameron decomposition in an attempt to characterize the detected target. However, despite highlighting the difference in shape distribution between the target and its surroundings, the limited available datasets made a detailed study and analysis of the target’s shape composition impossible. The length of a target therefore remains the major source of information for target identification.
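The length and orientation computation from the Hough-derived centreline end-points can be sketched as follows; the coordinate conventions (range, azimuth) and function interface are assumed for illustration:

```python
import math

def length_and_orientation(p1, p2, range_spacing, azimuth_spacing):
    """Target length (m) and orientation (deg) from the end-points of the
    Hough-transform centreline.

    p1, p2 are (range, azimuth) pixel coordinates of the line end-points;
    range_spacing and azimuth_spacing are the sample spacings in metres.
    Orientation is measured from the azimuth axis and folded into [0, 180)
    since a centreline has no preferred direction.
    """
    dr = (p2[0] - p1[0]) * range_spacing   # extent along range, in metres
    da = (p2[1] - p1[1]) * azimuth_spacing  # extent along azimuth, in metres
    length = math.hypot(dr, da)
    orientation = math.degrees(math.atan2(dr, da)) % 180.0
    return length, orientation
```

For example, end-points 3 range pixels and 4 azimuth pixels apart, at 1 m spacing in both directions, give a 5 m centreline; the same geometry scales directly with the actual sensor spacings.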
5. Conclusion

The next generation of spaceborne and airborne imaging sensors will increase the role of remote sensing imagery in wide-area surveillance and monitoring. However, due to this growing amount of available data, it will be necessary to develop automated tools to help the image analyst in his or her task. The image analysis application considered in this lecture was directed toward the automatic detection and characterization of targets using SAR imagery. As shown, the detection and discrimination performance of an automatic system is better when using polarimetric SAR data than when using only its single-channel counterpart. The polarimetric nature of the data provides additional features of interest for ship and ground target detection and recognition. The evidential fusion of these polarimetric features within the CFAR contacts can eliminate many of the false alarms generated by the CFAR detector. Dual polarization should be investigated for detection and false alarm reduction, especially using the subaperture coherence of the HV channel, and perhaps other features computed from the HH-HV channels. In the case of ground targets, the fusion of polarimetric features alone cannot, in the general case, reduce the number of false alarms. As we have seen, the use of multiple sensors improves scene description. The integration of contextual information is required to increase the performance of ATD/R algorithms, especially in the case of challenging backgrounds, because of the lack of local information to provide reliable object detection and characterization.
Acknowledgment

This work was supported by the Canadian Space Agency under the Earth Observation Application Development Program (EOADP) (contract 9F028-3-4910/A) and the Radar Applications and Space Technologies section of Defence Research and Development Canada – Ottawa (DRDC-O). We would like to acknowledge Space Imaging for some of the data used in this project.
References
[1] Blacknell, D., “Contextual information in SAR target detection,” IEE Proceedings: Radar, Sonar and Navigation, Vol. 148, Issue 1, pp. 41-47, 2001.
[2] Cameron, W., Youssef, N., Leung, L.K., “Simulated polarimetric signatures of primitive geometrical shapes,” IEEE Transactions on Geoscience and Remote Sensing, 34(3), pp. 793-803, 1996.
[3] Dempster, A.P., “A Generalization of Bayesian Inference,” Journal of the Royal Statistical Society, 30, 1968.
[4] Novak, L., Halversen, S., Owirka, G., Hiett, M., “Effects of Polarization and Resolution on the Performance of a SAR Automatic Target Recognition System,” Lincoln Laboratory Journal, vol. 8, no. 1, pp. 49-68, 1995.
[5] Souyris, J.-C., Henry, C., Adragna, F., “On the Use of Complex SAR Image Spectral Analysis for Target Detection: Assessment of Polarimetry,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 12, pp. 2725-2734, 2003.
[6] Touzi, R., Charbonneau, F., “Characterization of Target Symmetric Scattering Using Polarimetric SARs,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 11, pp. 2507-2516, 2002.
[7] Touzi, R., Charbonneau, F., Hawkins, R.K., Vachon, P.W., “Ship Detection and Characterization using Polarimetric SAR,” Canadian Journal of Remote Sensing (RADARSAT-2 Special Issue), June 2004.
[8] Tupin, F., Reconnaissance de forme et Analyse de Scène en Imagerie Radar à Ouverture Synthétique, Thèse de Doctorat, École Nationale Supérieure des Télécommunications, Paris, 1997.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Enhancing Efficiency of Dynamic Threat Analysis for Combating and Competing Systems

Edward POGOSSIAN a,b, Arsen JAVADYAN a,b, Edgar IVANYAN b
a Academy of Science of Armenia, Institute for Informatics and Automation Problems
b State Engineering University of Armenia
[email protected], [email protected], [email protected]

Abstract. We study the class of problems whose Solution Spaces are specified by combinatorial Game Trees. A version of Botvinnik’s Intermediate Goals At First (IGAF) algorithm is developed for strategy formation, based on common-knowledge planning and dynamic plan testing in the corresponding game tree. The algorithm (1) accommodates a range of knowledge types in the form of goals and rules, (2) demonstrates a strong tendency to increase strategy formation efficiency, and (3) increases the amount of knowledge available to the system.
Key words: game tree, expert knowledge, decision making, intrusion protection, measurement.
Introduction

Many security and competition problems belong to a class where the Spaces of Solutions are specified by combinatorial Game Trees (SSGT). Network Intrusion Protection Optimal Strategy Provision (IP OSP), Management in oligopoly competitions (MOSP), and chess-like problems (Chess OSP) are examples. Many other security problems, such as Computer Terrorism Countermeasures, Disaster Forecast and Prevention, Information Security, and Medical Countermeasures, may also be reduced to the SSGT class. To solve SSGT problems we define a class of Decision Making Systems (DMS), the Intermediate Goals At First (IGAF) algorithms, based on the following constructions and procedures: (1) a game tree model of the target competition, with sub-models of the states, actions, and (contra)actions, (2) the rules to apply (contra)actions to states and transform them into new ones, (3) descriptors of the goal states, (4) the optimal strategy search procedure, with strategy planning units aimed at narrowing the search area in the game tree, (5) plan quantification, (6) game-tree-based dynamic strategy testing, and (7) best action selection units. IGAF algorithms were successfully tested on the network Intrusion Protection (IP) problem and other SSGT problems. For example, for the IP problem, the IGAF1 version outperformed system administrators and known standard protection systems in about 60% of the experiments against 12 different types of known network attacks [1].
E. Pogossian et al. / Enhancing Efficiency of Dynamic Threat Analysis
A Linux version of the IGAF algorithm is currently used for the IP system of the ArmCluster [2]. Pioneering research into strengthening the performance of chess versions of IGAF-like programs, which simulate a chess master’s decision-making processes through systematic acquisition of human knowledge, was performed in [3, 4, 5] and developed in [6]. In [9], an attempt was made to study the viability of a decision-making system with various types of chess knowledge, including an ontology of about 300 concepts. Significant advances in the ontology of the security domain, ontology-based representation of the distributed knowledge of agents, a formal grammar of attacks and their application to network IP systems, as well as a comprehensive review of ontology studies in the field, are presented in [7, 8]. Compared with [10, 11, 12], where network-vulnerability analysis is based on finding critical paths in attack graphs, our game-tree-based model searches for counteraction strategies composed of elementary and universal units (elementary procedures, or an alphabet) that the intruder or administrator combines into attacks or defense procedures, respectively. Some of these procedures can coincide, particularly with the elementary attacks of [10, 11, 12]. However, the aim is to find procedures elementary enough to cover the diversity of intruder and defender behaviors, while meaningful enough for human understanding and operations. An alphabetic approach to the representation of attack and defense operations causes game tree size explosion, which we attempt to overcome using successful computer chess experience. In this paper, we describe our experience in enhancing the effectiveness of the IGAF algorithms. First, we determine the class of SSGT problems and provide examples. Then we discuss whether SSGT Expert Knowledge (EK) can be adequately simulated and whether such models can be regularly used for problem solving.
Finally, we describe our approach to measuring the performance of the IGAF algorithms and our experiments on enhancing their performance for the IP OSP problem.
1. Determining the Class of SSGT

SSGT problems are identified in a unified way by their game tree constituents, which create the base for a unified methodology for their resolution. The constituents include, in particular, the list of competing parties and their goals, their actions and (contra)actions, the states of the trees, and the rules for their transformation. For the above problems, the game tree constituents are determined as follows:
x The Chess OSP problem: white and black players, with checkmate as the goal; chess piece moves are the (contra)actions; the composition of the chess pieces on the board specifies the game states, which are transformed by actions according to the chess rules.
x The MOSP problem, in the Value War [13] model’s interpretation: a company competing against a few others, with Return On Investment as the goal; price changes and product quality are the actions; tree states are determined by competition scenarios, i.e., the competition template is formed from the conceptual basis of management theory, with a set of parameters specifying a particular competition in the scenario and the actions of all competing parties; transformation rules are determined by general micro- and macroeconomic laws, which, when applied to input states, create new output ones.
x The IP OSP problem: network protection systems, e.g., system administrators or specialized IP software, combat intruders or network disturbing forces (e.g., hackers or disturbances caused by technical casualties) to keep the network in a safe and stable state; network states are determined by the composition of the current resources vulnerable to network disturbances; actions and (contra)actions are the lists of means able to change resources and therefore transform states [1, 14].
2. Can SSGT EK be Simulated Adequately?

Expert concept approximation with acceptable adequacy might have the following impact on our understanding of cognition and computers [15]:
x explaining cognitive mechanisms for concept creation and processing
x highlighting whether human conceptual activity associates imagery operations with an attributive base or whether it can avoid imagery support [16, 17], i.e., understanding “….the act of forming and examining a ‘picture in the head’ ” [16]
x revealing whether computers are able to simulate, in a “natural” way, an “alive” fragment of human knowledge.
By “natural” computer simulation, we mean the following: if human image processing is an essential part of human mental operations, the answer may involve finding new simulation means to enrich computer operations; otherwise, new effective procedures may need to be developed within the current concept of computers. We expect the latter answer may be a step toward understanding the principal abilities of computers [18], which are universally perceived as the means to simulate systems. Our preliminary analysis of the chess knowledge collected in [9] allows us to state that expert knowledge for SSGT problems can be simulated by computers. The analysis is based on Zermelo’s reduction of chess knowledge to descriptions of classes of winning positions in the game tree.
3. Can SSGT EK Models be Regularly Used for Problem Solving?

Using the above game tree model, we experiment with a variety of algorithms to counteract intrusions. Our IGAF1 algorithm is similar to Botvinnik’s chess tree cutting-down (CTCD) algorithm. The CTCD is based on the natural hierarchies of goals in control problems and the
assertion that search algorithms become more efficient if they try to achieve subordinate goals before attempting main ones. The trajectories of the confronting parties toward those subgoals are chained in order to construct around them zones of the most likely actions and counteractions. As a result of comparative experiments with the minimax and IGAF1 algorithms in [1, 14], we state the following:
x the model, using the minimax algorithm, is comparable with experts (system administrators or specialized programs) against intrusions or other forms of base system perturbations
x the IGAF1 cutting-down tree algorithm, besides being comparable with the minimax algorithm, can also be effectively applied to real IP problems.
Here we consider a more advanced version of the algorithm, Intermediate Goals At First 2 (IGAF2), which is able to: (1) acquire a range of expert knowledge in the form of goals or rules, and (2) increase the efficiency of strategy formation by increasing the amount of expert knowledge available to the algorithm [21]. The following expert goals and rules have been embedded in the IGAF2 algorithm.
The goals:
1. critical vs. normal states are determined by a range of values of the system states; for example, any state of the system whose criterion-function value is greater than or equal to some threshold may be declared a critical goal
2. suspicious vs. normal resources are determined by a range of states of the resource classifiers; combinations of classifier values identified as suspicious or normal induce signals for appropriate actions.
The rules:
1. Identify the suspicious resources by classifier and narrow the search to the corresponding game subtree
2. Avoid critical states and focus on normal ones
3. Normalize the state of the system: first try the actions of the defender whose influence on the resources caused the current state change; if this does not work, try other actions
4. In building a game subtree for suspicious resources, use defending actions able to influence such resources; use normal actions until there are no critical states; if some defensive actions were used in previous steps, decrease their usage priority
5. Balance the resource parameters by keeping them within the given ranges of permitted changes.
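Under these goals and rules, the narrowed game-tree search can be sketched as follows. All callback interfaces (the state model, the criterion function, the suspicious-resource classifier) are assumptions for illustration, not the authors' implementation:

```python
def igaf_best_action(state, depth, actions, counteractions,
                     apply_action, is_critical, value, suspicious):
    """Minimax over the game tree, narrowed in the spirit of rules 1-2:
    the search prefers moves that touch suspicious resources, and critical
    states are scored as worst for the defender.

    apply_action(s, a) -> new state      (transformation rules)
    is_critical(s)     -> bool           (goal 1: critical-state descriptor)
    value(s)           -> float          (criterion function, higher = safer)
    suspicious(s, a)   -> bool           (goal 2: does `a` touch a
                                          suspicious resource in state s?)
    """
    def search(s, d, defender_turn):
        if is_critical(s):
            return float("-inf")  # rule 2: avoid critical states
        if d == 0:
            return value(s)
        moves = actions if defender_turn else counteractions
        # rule 1: narrow to moves that influence a suspicious resource,
        # falling back to the full move list if none qualifies
        moves = [a for a in moves if suspicious(s, a)] or moves
        scores = [search(apply_action(s, a), d - 1, not defender_turn)
                  for a in moves]
        return max(scores) if defender_turn else min(scores)

    # choose the defender action with the best guaranteed outcome
    return max(actions, key=lambda a: search(apply_action(state, a),
                                             depth - 1, False))
```

In this toy interface the intruder is the minimizing player; the pruning to suspicious-resource moves is what shrinks the searched subtree relative to plain minimax.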
4. Measuring the Performance

The On-the-Job Competition Scales method is aimed at evaluating DMS, or their constituents, in competitions. For a given competition, this method allows the DMS to be ordered by on-the-job performance or by absolute scales, in accordance with comprehensive comparisons of all competitor performances against the base criteria of success declared in the original competition definitions [19].
To compare IP algorithms, we use the Distance to Safety (DtS) and Productivity (P) criteria, which estimate the “distance” of the current state of the protected system from normal states and the level of performance that the IP algorithm can preserve, respectively. A special tool was developed to estimate the quality of protection against unauthorized access [20]. The tool allows the component parts of the experiments to vary, such as the estimated criteria, attack types, and IP system parameters. Each experiment includes an imitation of the work of the base system with/without suspicion of an attack (or any other perturbation of the system), during which the IP algorithm decides on the best strategy and chooses the best action accordingly. Data from the attack experiments contain the system state’s safety estimate for each step, the actions taken by each side, and the system’s performance. The attack experiments have to be representative of the variety of possible attacks. We assume a combinatorial, individual nature of the attacks, which are unified into classes of similar ones. By experimenting with a few representatives of each class, we hope to approximate coverage of their variety. In [21], the SYN-Flood, Smurf, Fraggle, and Login-bomb attacks were studied. In this experiment we also added the ICMP Hack and Data Fragmentation attacks.
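A minimal sketch of a Distance-to-Safety criterion of this kind is given below. The box-shaped "normal" region and the weighting scheme are illustrative assumptions, since the paper does not specify the exact formula:

```python
def distance_to_safety(state, normal_ranges, weights=None):
    """Weighted distance of the current system state from the 'normal'
    operating box; zero means the system is in a normal state.

    state         -- dict: resource name -> current value
    normal_ranges -- dict: resource name -> (low, high) normal interval
    weights       -- optional dict: resource name -> importance weight
    All names and the normalisation are illustrative, not from the paper.
    """
    dts = 0.0
    for name, value in state.items():
        low, high = normal_ranges[name]
        # penalty = how far the value lies outside its normal interval,
        # normalised by the interval width so resources are comparable
        excess = max(low - value, value - high, 0.0)
        w = 1.0 if weights is None else weights.get(name, 1.0)
        dts += w * excess / (high - low)
    return dts
```

A state inside every normal interval scores 0; each out-of-range resource adds its (weighted, normalised) excursion, which matches the intuition of a per-step safety estimate recorded during the attack experiments.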
5. Experiments on Enhancing Performance

The experiments were aimed at proving that the IGAF2 cutting-down tree algorithm, besides being comparable with the minimax algorithm, increases efficiency by embedding expert knowledge. The investigated version of the algorithm used the following components:
x over 12 single-level and multilevel solver-classifiers of the local system states
x 6 actions/procedures of the attacking side
x 8 “normal” actions/procedures of the system
x 22 known counteractions against attack actions/procedures (the database of counteractions).
IGAF2 was tested in experiments against four attacks, with a game tree search depth of up to 13, and the following controlled and measured criteria and parameters: distance to safety, productivity, working time, number of game tree nodes searched, queue of incoming packets, TCP connection queue, number of processed packets, RAM, HD, unauthorized access to files, and logins into the system. The results of the experiments show:
x sampling means for the Distance to Safety and Productivity of the IGAF2 and minimax algorithms are comparable
x the number of nodes searched by the IGAF2 algorithm with all expert rules and subgoals decreases compared with the IGAF1 or minimax algorithms
x the number of nodes searched by the IGAF2 algorithm with all expert rules and subgoals is the smallest, compared with the IGAF1 or minimax algorithms, as the depth of search increases up to 13
x the time spent by the IGAF2 algorithm with all expert rules and subgoals is the smallest, compared with the IGAF1 or minimax algorithms, as the depth of search increases up to 13.
6. Conclusion

A version of Botvinnik-type IGAF algorithms, IGAF2, was developed, which is able to acquire a range of expert knowledge in the form of goals or rules and to increase the efficiency of strategy formation by increasing the amount of expert knowledge available to the algorithm. The viability of the IGAF2 algorithm was successfully tested on network IP problems against representatives of six classes of attacks: SYN-Flood, Fraggle, Smurf, Login-bomb, ICMP Hack, and Data Fragmentation. The recommended version of the algorithm, IGAF2 with all expert rules and subgoals, at a search depth of 5 and with 200 defending steps, outperforms the minimax algorithm in Productivity by 14%, while using 6 times less computing time and searching 27 times fewer tree nodes. Future plans include expanding the alphabet of attack and defense actions, including hidden ones, and developing our approach to increasing the strength of the IGAF algorithms through systematic enrichment of their knowledge base with new IP goals and rules.
References
1. Pogossian, E., Javadyan, A., “A Game Model for Effective Counteraction Against Computer Attacks in Intrusion Detection Systems,” NATO ASI 2003, Data Fusion for Situation Monitoring, Incident Detection, Alert and Response Management, Tsahkadzor, Armenia, August 19-30, 2003.
2. Astsatryan, H.V., Shoukourian, Yu.H., Sahakyan, V.G., “The ArmCluster1 Project: Creation of a High-Performance Computation Cluster and Databases in Armenia,” Proceedings of the Conference on Computer Science and Information Technologies, 2001, pp. 376-379.
3. Botvinnik, M.M., About Solving Approximate Problems, S. Radio, Moscow, 1979 (in Russian).
4. Botvinnik, M.M., Computers in Chess: Solving Inexact Search Problems, Springer Series in Symbolic Computation, Springer-Verlag, New York, 1984.
5. Botvinnik, M.M., Stilman, B., Yudin, A.D., Reznitskiy, A.I., Tsfasman, M.A., “Thinking of Man and Computer,” Proc. of the Second International Meeting on Artificial Intelligence, Repino, Leningrad, Russia, Oct. 1980, pp. 1-9.
6. Stilman, B., Linguistic Geometry: From Search to Construction, Kluwer Academic Publishers, Feb. 2000, 416 pp.
7. Gorodetski, V., Kotenko, I., “Attacks against Computer Network: Formal Grammar-Based Framework and Simulation Tool,” Proc. of the 5th Intern. Conf. “Recent Advances in Intrusion Detection,” Lecture Notes in Computer Science, v. 2516, Springer-Verlag, pp. 219-238, 2002.
8. Kotenko, I., Alexeev, A., Man’kov, E., “Formal Framework for Modeling and Simulation of DDoS Attacks Based on Teamwork of Hackers-Agents,” Proc. of the 2003 IEEE/WIC Intern. Conf. on Intelligent Agent Technology, Halifax, Canada, Oct. 13-16, 2003, IEEE Computer Society, 2003, pp. 507-510.
9. Pogossian, E., Adaptation of Combinatorial Algorithms, Yerevan, 1983, 293 pp. (in Russian).
10. Phillips, C., Swiler, L., “A Graph-Based System for Network-Vulnerability Analysis,” Proceedings of the 1998 New Security Paradigms Workshop.
11. Sheyner, O., Jha, S., Haines, J., Lippmann, R., Wing, J., “Automated Generation and Analysis of Attack Graphs,” Proceed. of the IEEE Symposium on Security and Privacy, Oakland, 2002.
12. Sheyner, O., Wing, J., “Tools for Generating and Analyzing Attack Graphs,” Proceed. of Formal Methods for Components and Objects, Lecture Notes in Computer Science, 2005.
13. Chussil, M., Reibstein, D., Strategy Analysis with Value War, SciPress, 1994.
14. Pogossian, E., Javadyan, A., “A Game Model and Effective Counteraction Strategies Against Network Intrusion,” 4th International Conference on Computer Science and Information Technologies, CSIT2003, Yerevan, 2003.
15. Pogossian, E., Tumasyan, K., “Toward Chess Concepts Adequate Simulation,” Proceedings of the Annual Conference of the State Engineering University of Armenia, 2004 (in Russian).
16. Pylyshyn, Z., Seeing and Visualizing: It’s Not What You Think, An Essay on Vision and Visual Imagination, http://ruccs.rutgers.edu/faculty/pylyshyn.html
17. Kosslyn, S., Image and Mind, Harvard University Press, Cambridge, MA, 1980.
18. Winograd, T., Flores, F., Understanding Computers and Cognition: A New Foundation for Design, 1986.
19. Pogossian, E., “Management Strategy Search and Programming,” Proceed. CSIT99, Yerevan, 1999.
20. Pogossian, E., Javadyan, A., Ivanyan, E., “Toward a Toolkit for Modeling Attacks and Evaluation Methods of Intrusion Protection,” Annual Conference of the State Engineering University of Armenia, 2004 (in Russian).
21. Pogossian, E., Javadyan, A., Ivanyan, E., “Effective Discovery of Intrusion Protection Strategies,” AIS-ADM 2005, Lecture Notes in Artificial Intelligence, Vol. 3505, pp. 263-276, Springer-Verlag, Berlin Heidelberg, 2005.
Evidence Theory for Robust Ship Identification in Airborne Maritime Surveillance Missions

Pierre VALIN
Defence R&D Canada Valcartier (DRDC-V)
2459 Blvd Pie XI Nord, Val-Bélair, QC, G3J 1X5, Canada
Abstract. The CP-140 (Aurora) Canadian maritime surveillance aircraft is presently undergoing an Aurora Incremental Modernization Program that will allow multi-sensor data fusion to be performed. Dempster-Shafer (DS) evidence theory is chosen for identity information fusion due to its natural handling of conflicting, uncertain, and incomplete information. Two realistic scenarios were constructed in order to test DS under countermeasures, mis-associations, and incorrect classifications. Results show that DS theory is robust under all but the worst cases when using the existing suite of sensors.

Keywords: attributes, platform database, SAR, imagery classifiers, Dempster-Shafer
Introduction Reasoning over attributes (or situations) plays a big role in military domains, where complementary sensor information can lead to quicker and more stable identification (ID) through identity information fusion. The focus of this lecture is an application of Dempster-Shafer (DS) evidence theory to airborne surveillance of ships through reasoning over attributes, using both passive and active sensors to properly identify targets in a hostile environment. Many other lecturers [1] at this NATO ASI have presented DS theory, so it will not be detailed further here.
1. The Aurora’s Current Sensor Suite The Aurora has dissimilar non-imaging sensors for fusion x A 2-D radar (the AN/APS-506) x An Electronic Support Measures (ESM) system (the AN/ALR-502) providing passive ID information through detection of emitters which are crosscorrelated with a realistic a priori Platform Data Base (PDB) x An Identification Friend or Foe (IFF) (the AN-APX-502) providing allegiance (if in proper working condition) x A datalink, Link-11, mainly for ID information and complementary imaging sensors for fusion
P. Valin / Evidence Theory for Robust Ship Identification
x A Spotlight Synthetic Aperture Radar (SAR), a planned upgrade currently under way, for which a cued classifier was designed and implemented
x A Forward-Looking Infra-Red (FLIR) passive sensor (the OR-5008/AA), for which several different classifiers were implemented and their results fused.
In order to simplify the presentation of ID fusion results, only results from the fusion of interpreted SAR imagery with ESM reports will be shown.
2. The Attributes of the PDB The attributes, over which one has to reason during the level-1 Object Refinement phase of fusion, often referred to as Multi-Sensor Data Fusion (MSDF), can originate from imaging or non-imaging sensors, and be kinematical, geometrical or relate more directly to an ID (if those come from intelligent sensors such as an IFF, or an ESM, or imagery classifiers). These attributes form the columns of the PDB and the rows correspond to all the possible platforms that can be encountered. Kinematical attributes can be estimated through tracking in the positional estimation function of DF, and through reports from IFF and datalink. Since the tracker can provide speed, acceleration and sometimes altitude, attributes such as maximum (max) acceleration, max platform speed, minimum (min) platform speed, cruising speed, and max altitude either serve as bounds to discriminate between possible air target IDs or suggest the plausible IDs. However, speed reports should be fused only if they involve a significant change from past historical behaviour in that track. The reason is two-fold: 1. First, no single sensor must attempt to repeatedly fuse identical ID declarations, otherwise the hypothesis that sensor reports are statistically independent is violated. 2. Second, the benefits of the fusion of multiple sensors are lost when one sensor dominates the reports. Geometrical attributes can be estimated by algorithms which post-process imaging information from sensors such as the FLIR, or Electro-Optics (EO) and SAR. Classifiers that perform such post-processing can be thought of as Image Support Modules (ISM) performing much the same functionality as the ESM does for the analysis of electromagnetic signals. These ISMs can provide the three geometrical dimensions of height, width and length (for FLIR and EO), and also Radar Cross Section (RCS) of the platform as seen from the front, side or top. 
In addition, the distribution of relevant features may be needed for classifiers, but may be considered part of the algorithms that generate plausible IDs. Identification attributes can be directly given by the ESM, as outputs of the FLIR and SAR ISMs, from acoustic signal interpretation (for surface and sub-surface targets), and from Doppler radar (for airborne targets). The ESM requires an exhaustive list of all the emitters carried by the platform, since it will provide an emitter list with a confidence level that reflects the quality of its electromagnetic spectral fit. However, an IFF response can lead to the identification of a friendly or commercial target, but the lack of a response does not necessarily imply that the interrogated platform is hostile. One has to distribute the lack of a response between at least two declarations: the most probable foe declaration and a less probable friendly or neutral declaration, corresponding to IFF equipment that is not working or absent. On the other hand, the ISMs are usually designed to not only provide the
best single ID possible, but also to estimate confidence in higher levels of an appropriate taxonomy tree (STANAG 4420 or MIL-STD 2525B, which are mostly consistent, but vary in the detail provided).
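As a simple illustration of how the kinematical attributes bound the plausible IDs, a PDB filter might look like the following sketch. The column names and the example platforms are invented for illustration; the real PDB schema is not given in this lecture:

```python
def plausible_platforms(pdb, observed_speed, observed_altitude=None):
    """Keep only the PDB rows whose kinematic bounds admit the tracked
    values.  Column names (min_speed, max_speed, max_altitude) and units
    are illustrative assumptions about the PDB layout.
    """
    out = []
    for row in pdb:
        # min/max platform speed serve as hard bounds on the hypothesis
        if not (row["min_speed"] <= observed_speed <= row["max_speed"]):
            continue
        # max altitude only discriminates when an altitude is available
        if observed_altitude is not None and observed_altitude > row["max_altitude"]:
            continue
        out.append(row)
    return out
```

A tracked contact at 10 m/s, for instance, immediately rules out any platform whose minimum speed exceeds that value, shrinking the set of hypotheses that the evidential fusion must reason over.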
3. Justification for Choosing DS Evidence Theory

The best choice of a method for combining sensor propositions (such as those from the ESM and the SAR ISM) depends on factors such as:
x Must process incomplete information ==> notion of ignorance
x Sensor performance is not always monitored ==> notion of uncertainty
x Must handle conflicts between contact/track ==> notion of conflict
x Must not require a priori information ==> no Bayesian reasoning
x Real-time method ==> possibility of approximation (truncation) is required
x Operator wants best ID ==> give preference to a single ID (singleton)
x Operator wants next best thing ==> doublet (2 best IDs), triplet, etc.
x Must resist countermeasures ==> conflict again (emitter not in PDB)
x Must resist false associations ==> ESM report associated with the wrong track
x Must be tested operationally ==> complex scenarios needed
x Ordinary method must explode ==> large, complex PDB needed
Thus, one requires a reasoning method in which ignorance, uncertainty, and conflict have mathematical meaning, which is robust, and which can be simplified to reduce computational complexity. It is well known that incomplete, uncertain, and sometimes conflicting information is ideally suited to DS evidential reasoning, where “mass,” or Basic Probability Assignment (BPA), plays the role of probability. Indeed, when the intersection of sets is null for certain combinations between the new contact and the existing track, conflict exists. Furthermore, when one is uncertain of the correctness of the declared proposition and its associated probability, it is wise to assign a small mass to the ignorance, as well as the best estimate for the larger mass of the declared proposition. Finally, well-documented and tested approximation (truncation) methods exist that keep the algorithm real-time.
This approximation at every fusion step is absolutely necessary since, for a PDB of size N, one may have to keep track of up to 2^N combinations (the power set), with associated masses becoming increasingly smaller (of order 2^-N). A realistic military PDB can have from a few hundred platforms (in this lecture, about 140) to several thousand (our most recent PDB has about 2,200), so, without approximation, one would have to monitor typically 2^1000, or approximately 10^300, propositions, with masses expressed in floating point arithmetic of extremely high precision.
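As a minimal illustration of the combination and truncation just described, the sketch below implements Dempster's rule over focal elements represented as frozensets, then truncates to a fixed number of focal elements by moving the discarded mass to ignorance. This is a toy sketch, not the operational implementation; the platform names and mass values are invented.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozensets to masses)
    with Dempster's rule: intersect focal elements pairwise, accumulate
    products, and renormalize by 1-K, where K is the conflict (the mass
    falling on empty intersections)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}, conflict

def truncate(m, max_focal):
    """Keep only the max_focal largest focal elements and move the
    discarded mass to ignorance (the frame of discernment), so the
    approximation remains a valid mass function."""
    frame = frozenset().union(*m)
    kept = dict(sorted(m.items(), key=lambda kv: kv[1], reverse=True)[:max_focal])
    residual = 1.0 - sum(kept.values())
    if residual > 0:
        kept[frame] = kept.get(frame, 0.0) + residual
    return kept

# Invented toy declarations: an ESM report vs. a SAR ISM report
esm = {frozenset({"Kara", "Udaloy"}): 0.8,
       frozenset({"Kara", "Udaloy", "Mirka"}): 0.2}
sar = {frozenset({"Kara"}): 0.67, frozenset({"Udaloy"}): 0.10,
       frozenset({"Kara", "Udaloy", "Mirka"}): 0.23}
fused, K = dempster_combine(esm, sar)
fused = truncate(fused, max_focal=3)
best = max(fused, key=fused.get)
```

Truncating after every fusion step keeps the number of monitored propositions bounded, which is the point of the real-time approximation discussed above.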
4. Hierarchical SAR Imagery Classifier

A hierarchical SAR classifier was designed and implemented to provide three complex declarations:
1. First, it provides an estimate of the Ship Length (SL) interval and correlates it to all platforms of the PDB;
P. Valin / Evidence Theory for Robust Ship Identification
2. Then, it analyses the superstructure profile to identify whether the ship is a line combatant or a merchant (using a neural net trained on Knowledge Base rules), thus providing a declaration of Ship Category (SC);
3. Finally (in the case of a combatant), it provides a Ship Type (ST) declaration by Bayesian reasoning over length distributions for the five types: frigate, destroyer, cruiser, battleship, and aircraft carrier (in order of increasing mean length).

Length is a discriminator for line combatants, as shown in Figure 1, which plots the probability of length s for ship type t. However, length is not a discriminator for merchant ships.
Figure 1. Probability distributions for the 5 line combatant types as a function of length
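The Bayesian ship-type step can be sketched as follows. The Gaussian length statistics below are illustrative stand-ins for the Figure 1 distributions (the real curves are not reproduced here), so the numbers are assumptions of this sketch only.

```python
import math

# Hypothetical length statistics (mean, std dev, in metres) for the five
# line-combatant types, in order of increasing mean length.
TYPES = {"frigate": (115, 12), "destroyer": (135, 15), "cruiser": (175, 20),
         "battleship": (225, 25), "aircraft carrier": (300, 30)}

def ship_type_posterior(length, priors=None):
    """Bayes' rule: P(type | length) proportional to P(length | type) P(type),
    with Gaussian likelihoods over length and uniform priors by default."""
    priors = priors or {t: 1.0 / len(TYPES) for t in TYPES}
    post = {}
    for t, (mu, sigma) in TYPES.items():
        like = math.exp(-0.5 * ((length - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        post[t] = like * priors[t]
    z = sum(post.values())
    return {t: p / z for t, p in post.items()}

# A 127 m ship: near the destroyer peak, in the tail of the cruiser curve
post = ship_type_posterior(127.0)
best = max(post, key=post.get)
```

With these assumed curves, a 127 m ship is classified as a destroyer, which mirrors the Virginia mis-identification discussed in Section 7.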
5. The Designed Scenarios

The complete set of fusion algorithms (registration, association, positional fusion and identity information fusion) that can lead to timely ID was tested on two scenarios, in which radar and ESM contacts were provided by DRDC-V's Concept Analysis and Simulation Environment for Automatic Tracking and Identification (CASE-ATTI) sensor module, and SAR imagery was simulated with DRDC Ottawa's SARSIM.
1. Maritime Air Area Operations (MAAO), which involves the ID of three enemy Russian ships (an Udaloy destroyer, a Kara cruiser, and a Mirka frigate) in the presence of ESM countermeasures, and which fuses the SAR ISM results since the enemy line ships are of different types.
2. Direct Fleet Support (DFS), involving American and Canadian convoys, which are also imaged by the SAR, but for which mis-associations can occur due to the geometry of the scenario, and for which certain SAR images are atypical of the type of ship being imaged, leading to false declarations from the ISM.
6. Typical Results in the MAAO Scenario The performance of the SAR ISM classifier is shown in Figure 2 below. The three declarations (SL, SC and ST) are clearly indicated (from top to bottom, or left to right). For example, for the Kara cruiser, the ST declaration would have the set of all cruisers
assigned a mass of 0.67, the set of all destroyers assigned a mass of 0.10 and the set of all aircraft carriers assigned a mass of 0.04, with ignorance having the residual mass.
Figure 2. SL, SC and ST Declarations for Russian ships in the MAAO Scenario
The evolution of fusing ESM data (represented by triangles) and SAR data results in the refinement of the ID of the Kara Azov upon fusing the key emitters #92 and #93, as shown in Figure 3 below. SAR data arrives late enough in the scenario that platform ID was already firmly established.
Figure 3. Evolution of the Leading Proposition when Fusing ESM and SAR Data for the Kara Azov
7. Typical Results in the DFS Scenario

The performance of the SAR ISM classifier is shown in Figure 4 below. The three declarations (SL, SC and ST) are clearly indicated (from top to bottom, or left to right). In this case, it should be noted that the Virginia is an atypically small cruiser and is misidentified as a destroyer by the Bayes length classifier. Indeed, by referring to
Figure 1, a length of 127 meters is near the peak of the destroyer distribution and is in the tail of the cruiser one.
Figure 4. SL, SC and ST Declarations for American ships in the DFS Scenario
The evolution of fusing ESM data (represented by triangles) and SAR data results in the refinement of the ID of the Ticonderoga, where the fusion of emitter #110 and the SL, SC, and ST declarations provides the final correct ID, as shown in Figure 5 below.
Figure 5. Evolution of the Leading Proposition When Fusing ESM and SAR Data for the Ticonderoga
It is clear that automatic fusion of ESM reports to tracks and the correct interpretation of SAR imagery data can lead to correct ID under most conditions. Errors in ID can occur occasionally due to algorithmic errors such as false associations or wrongly interpreted imagery, or as the result of intentional deceit, e.g., countermeasures. It is our experience that when only one such type of error is present, the DS scheme is robust enough to recover. When two or more are present, DS recovers only if enough correct data, such as that provided by the ESM, are fused. The same
conclusions hold if other data (IFF, tracking info such as speed, Link-11 ID data, FLIR, …) are included.
8. Aurora Incremental Modernization Program

The original sensor suite of the CP-140 was essentially that of an anti-submarine warfare platform such as the US S-3 Viking, but the airframe itself was that of a US P-3 Orion. The CP-140 is currently undergoing an Aurora Incremental Modernization Program (AIMP) to update its sensors in keeping with its new, more peaceful roles. Its new sensors will provide data to a modernized Data Management System (DMS), which could have a "fusion box" providing fused data to an operator as a human-in-the-loop aid to decision making.
- The new L3 Wescam MX-20 EO/IR replaces the ageing FLIR; for it, several classifiers (k-NN, neural net, Dempster-Shafer, Bayes, ...) were designed, tested, and fused in several ways.
- The new Telephonics APS-143 SAR, for which a SAR classifier for category (line, merchant, etc.) and type (e.g., frigate, cruiser, destroyer, aircraft carrier) has been designed and tested, as shown in this study.
- The new Lockheed ESM ALQ-217, which enables extreme bearing accuracy.

This study has shown that a fusion capability would have helped even with the old sensor suite. Of course, with the improved sensor suite, more complex missions can be attempted, and the fusion capability (new algorithms?) will have to be re-evaluated in the near future.
9. Conclusions

Through the use of an a priori database and the DS reasoning framework, all sensors and ISMs contribute declarations of (possibly multiple) propositions, which can be fused to achieve a correct platform ID. The DS scheme is robust in the sense that it can handle conflicts, ignorance, and ambiguities, which can result from inadequate performance of sensors or ISMs, or from mis-associations in difficult tracking conditions.
References
[1] See, for example, the lectures of P. Vannoorenberghe, J. Dezert and F. Smarandache.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Improved Threat Evaluation using Time of Earliest Weapon Release Eric MÉNARD and Jean COUTURE Lockheed Martin Canada, 6111 Ave. Royalmount, Montréal, Qc, H4P 1K6
Abstract. Lockheed Martin Canada has recently developed a Situation and Threat Assessment and Resource Management application using some recent technologies and concepts that have emerged from level 2 and 3 data fusion research. The current paper describes some exploration work on Improved Time of Earliest Weapon Release for threat evaluation and the utilization of target weaponry system information for threat evaluation refinement. Keywords. Threat Assessment, Situation Assessment, Time of Earliest Weapon Release
Introduction

In recent years, Lockheed Martin Canada (LM Canada) has been active in data fusion research, specifically in evaluating methods, algorithms, techniques and architectures. Most of this research effort has involved experimentation, i.e., implementing and testing the most promising techniques from the research. This practical experimentation allows potential solutions to be identified and evaluated for those problems and issues that do not surface in purely theoretical work. This experimentation has led to the development of the Situation and Threat Assessment (STA) and Resource Management (RM) application (hereafter STA/RM) [1]. This application integrates the capabilities of a modern high-level fusion system based on concepts from data fusion levels 2-3 as per the JDL model [2]. The STA/RM application is built on an extensible framework in which new algorithms can be easily implemented and evaluated [3]. The current paper focuses on experimentation performed to evaluate the use of information from an external a priori database about a target's weaponry system during the threat evaluation process. A brief scenario of three targets attacking a valuable asset will be given, and the results from the basic threat evaluation and the improved Time of Earliest Weapon Release (TEWR) will be compared.
1. Simulation Environment The R&D Testbed is a framework used to conduct investigations on Data Fusion algorithms and perform data analysis. This paper concerns the following points of interest.
E. Ménard and J. Couture / Improved Threat Evaluation
1.1. Multi-Source Data Fusion (MSDF)

The MSDF application executes the processing related to Data Fusion level 1:
- Association
- Gating
- Tracking
- ID Fusion

1.2. Situation and Threat Assessment and Resource Management (STA/RM)

The STA/RM application regroups level 2-3 Data Fusion and is divided into the following two major components:

1.2.1. Situation and Threat Assessment

STA combines the following functional components:
- Situation assessment
- Threat evaluation
- Target/weapon analysis
- Engagement planning

1.2.2. Resource Management

The RM component regroups the following functionalities:
- Weapon assignment
- Execution of planning

The remainder of the paper describes the external a priori database used for this threat evaluation comparison, and the performance of two threat evaluation algorithms on a brief scenario.
2. External a priori Database Information

Threat assessment evaluation is based on information about a target determined by MSDF and on target ID refinement by STA/RM. The new concept added to the STA/RM for the ID refinement process consists of interrogating an external database to retrieve information about the target weaponry system. This database was built in collaboration with and sponsored by DRDC Valcartier [4, 5] and contains a wealth of a priori information about:
- Aircraft and ships (more than 2 000) of all types and their characteristics
- Sensors (more than 1 500) and their characteristics
- Weapons (more than 500) of all types and their characteristics
- Ground infrastructures, maps, pictures, and documents
3. Threat Assessment Implementation

The main tasks of a "real life" threat assessment system are the evaluation of the threat level of non-friendly tracks and the ranking of those threat levels to build a prioritized threat list. This list is in turn used to establish engagement planning and to reserve and assign resources against the most threatening entities. Threat level evaluation usually addresses three different aspects of a threat: its opportunity to do damage, its capability to do damage, and its intent [6]. This paper demonstrates the utility of integrating the target capabilities' information in threat assessment evaluation. The following sections describe three threat evaluation algorithms.

3.1. Basic Threat List

The basic threat evaluation list, also called the threat's opportunity, was addressed during the first implementation phase of STA/RM. Opportunity is defined relative to a specific location or a valuable asset, like the ownship, and represents a time and space measure of how close the threat will approach its assumed target (i.e., the location of the asset). Its computation requires the following threat information:
- Speed
- Heading
- Closest Point of Approach (CPA)
- Time to reach CPA

Many different algorithms exist to perform the computation, each having a particular method for weighting the different pieces of information. In any case, using only the opportunity to deduce threat values can be misleading for the following reasons:
1. Threat opportunity does not take into account that:
   a) Slow entities (e.g., ships, submarines) are usually assigned low threat values in spite of the possibility that they may carry very threatening long-range weapons
   b) Projected entity trajectories may not lead directly towards the asset to be protected, yet the entity can still launch weapons directly at the asset
2. Threats may not have the intent to attack.

It is therefore necessary to include a threat's capability in the threat evaluation process.
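The two geometric quantities in the list above, CPA and time to reach CPA, follow from constant-velocity geometry. The sketch below (with the asset fixed at the origin and invented numbers) is illustrative only, not LM Canada's implementation.

```python
import math

def cpa(rel_pos, vel):
    """Closest point of approach of a constant-velocity threat relative to
    a fixed asset at the origin: minimize |p + v t| over t >= 0.
    Returns (cpa_distance, time_to_cpa)."""
    px, py = rel_pos
    vx, vy = vel
    v2 = vx * vx + vy * vy
    if v2 == 0.0:                      # stationary track: CPA is now
        return math.hypot(px, py), 0.0
    t = max(0.0, -(px * vx + py * vy) / v2)   # closed-form minimizer, clipped to t >= 0
    return math.hypot(px + vx * t, py + vy * t), t

# Threat 10 km north with a 1 km east offset, flying due south at 250 m/s
dist, t = cpa((1000.0, 10000.0), (0.0, -250.0))
```

Here the track passes 1000 m abeam of the asset after 40 s; an opportunity score would then weight this distance and time against the threat's speed and heading.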
3.2. Time of Earliest Weapon Release

A threat's capability has been implemented in the latest STA/RM implementation phase. To include a threat's capability in the overall threat evaluation process, the following two information sources are required:
- The threat's identity
- The characteristics and capabilities of the identified entities

The first source is fulfilled by the MSDF application, which uses Dempster-Shafer evidential theory to compute entity identities. The second source of information comes from the external a priori database described in Section 2.
With these two sources of information in place, we investigated two different methods that include a threat's capability in threat evaluation [7]: the Constant Velocity Time Of Earliest Weapon Release (CVTEWR) and the Maneuver Time Of Earliest Weapon Release (MTEWR). The two methods are illustrated in Figure 1. The former uses the estimated time the identified threat would take to launch its most threatening weapon (e.g., the one having the longest range) if the threat maintains its current velocity. The second method is similar, except that the threat is assumed to instantaneously break its trajectory in order to launch its most threatening weapon in the minimum time. In the end, we implemented the MTEWR method, which constitutes a "worst case scenario" compared with the other method. This method requires the following a priori information:
- Precise threat identity
- "Most threatening" weapon identity
- Weapon velocity
- Weapon maximum range
- Weapon type (Missile, Close-In Weapon System (CIWS), Gun, Cannon, Torpedo, Mortar)
- Weapon utility (air / surface / subsurface)
Figure 1. Definition of CVTEWR and MTEWR computational methods
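The two release times can be sketched directly from the definitions above. The geometry below is a hedged reading of Figure 1 (worst-case instant turn for MTEWR; first range crossing of the straight-line track for CVTEWR), with invented numbers.

```python
import math

def mtewr(dist_to_asset, threat_speed, weapon_range):
    """Maneuver TEWR: assume the threat turns instantly toward the asset
    and closes at its current speed until it is inside weapon range.
    Returns 0 if already in range, inf if the threat cannot move."""
    gap = dist_to_asset - weapon_range
    if gap <= 0.0:
        return 0.0
    return gap / threat_speed if threat_speed > 0.0 else math.inf

def cvtewr(rel_pos, vel, weapon_range):
    """Constant-velocity TEWR: first time t >= 0 with |p + v t| <= R,
    i.e., the smaller root of |p + v t|^2 = R^2; inf if the straight-line
    track never enters weapon range."""
    px, py = rel_pos
    vx, vy = vel
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    c = px * px + py * py - weapon_range ** 2
    if c <= 0.0:                      # already inside weapon range
        return 0.0
    if a == 0.0:                      # stationary and out of range
        return math.inf
    disc = b * b - 4.0 * a * c
    if disc < 0.0:                    # track never crosses the range circle
        return math.inf
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf

# Head-on threat 30 km out at 300 m/s with a 15 km weapon: both agree
t_man = mtewr(30000.0, 300.0, 15000.0)
t_cv = cvtewr((0.0, 30000.0), (0.0, -300.0), 15000.0)
```

For a head-on track the two times coincide; MTEWR only becomes strictly smaller (more pessimistic) when the current velocity does not point at the asset.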
This method returns a threat value between 0 and 1 until the point of earliest weapon release is reached. Once a target reaches the maximum range of its weapon, or is inside that range, the threat value remains equal to 1. The problem is that targets inside the maximum weapon range cannot be prioritized. This effect can be solved with the Improved TEWR.

3.3. Improved TEWR

This method divides a threat value into three parts:
- TEWR: [0, 0.7]
- Time for the weapon to reach the asset to protect: [0, 0.2]
- If the target itself is a missile: +0.1

The sum of the three parts constrains a threat value to the [0, 1] interval. An additional feature was also implemented to detect whether an incoming missile is dangerous to the asset. The feature verifies whether the distance between the missile and the asset is larger than the maximum missile range. If this is the case, 0.5 is subtracted from the threat value.
Figure 2. Improved TEWR threat value evaluation
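A hedged sketch of the three-part scoring just described: the component ranges ([0, 0.7], [0, 0.2], +0.1) and the 0.5 out-of-range penalty come from the text, but the mapping from times to scores is not specified there, so the linear decay and the 120 s horizon below are assumptions of this sketch.

```python
def improved_tewr_value(tewr_s, weapon_flight_s, target_is_missile,
                        dist_to_asset=None, missile_max_range=None,
                        horizon_s=120.0):
    """Improved TEWR threat value (Section 3.3 structure; assumed decay)."""
    def decay(t):                      # 1 when t = 0, 0 beyond the horizon
        return max(0.0, 1.0 - t / horizon_s)
    value = 0.7 * decay(tewr_s) + 0.2 * decay(weapon_flight_s)
    if target_is_missile:
        value += 0.1
        # An incoming missile that cannot reach the asset is penalized.
        if (dist_to_asset is not None and missile_max_range is not None
                and dist_to_asset > missile_max_range):
            value -= 0.5
    return value

# A bomber 30 s from weapon release whose missile needs 60 s of flight
v_bomber = improved_tewr_value(tewr_s=30.0, weapon_flight_s=60.0,
                               target_is_missile=False)
```

Because the first term dominates only up to 0.7, targets already inside weapon range (TEWR = 0) can still be separated by the remaining 0.3, which is exactly the prioritization the basic TEWR could not provide.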
4. Results

4.1. Scenario
Figure 3. Scenario with fast and long range missile on the bomber
This scenario has two fighters going straight at the valuable asset and a bomber passing to the side. The valuable asset has a weaponry system with two illuminators for guiding missiles. Each of these illuminators can engage only one target and is not released until the engaged target is destroyed. Using only position and speed for the Basic Threat List Evaluation (Table 1), the two fighters get the highest priorities and are engaged first. Once fighter #1 is destroyed, the bomber has already moved too far away from the valuable asset to be engaged. From the external a priori database, the STA/RM receives weapons information for these targets, including the fact that the bomber is equipped with a fast, long-range missile. The Improved TEWR Threat List Evaluation method ranks the bomber as the second highest threat. The weaponry system engages and destroys fighter #1 and the bomber and, when an illuminator is released, fighter #3 is engaged and destroyed. The Improved TEWR method thus allows all threats to the valuable asset to be destroyed.
This method does not kill the targets earlier, but kills them according to a threat ranking based on more fully integrated target information.

Table 1. Comparison between Basic Threat List Evaluation and Improved TEWR Threat List Evaluation

With basic threat list evaluation:

Track name   Threat value   Created (s)   Killed (s)   Time To Be Killed (s)   Distance To Be Killed (m)
Fighter #1   0.75           0.0           70.0         70.0                    14584.0
Bomber #2    0.6            0.9           0.0          Never be killed         Never be killed
Fighter #3   0.72           0.9           74.7         73.8                    15805.6

With improved TEWR:

Track name   Threat value   Created (s)   Killed (s)   Time To Be Killed (s)   Distance To Be Killed (m)
Fighter #1   0.9            0.0           68.0         68.0                    15242.0
Bomber #2    0.85           0.9           62.4         61.5                    13455.2
Fighter #3   0.8            0.9           98.7         97.8                    9176.4
5. Conclusion

In this paper, we demonstrated the advantage of the Improved TEWR Threat List Evaluation over the Basic Threat Evaluation, obtained by taking target weapon characteristics into account during threat evaluation. The Improved TEWR is an exploration of ideas to refine threat evaluation processing, and some refinements and additional issues remain to be investigated, such as:
- Improving the handling of threats with incomplete identity
- Refining the concept of the "most threatening" weapon by including threat sensors, jamming and softkill capabilities
- Introducing Measures of Performance (MOPs) based on the probability of killing targets and the probability of survival

A great deal of additional information can be extracted from the external a priori database about the sensors, jammers and flares onboard the targets. This information can be used to refine the threat evaluation by assigning a higher threat status to targets that could jam the valuable asset's weaponry system. A smarter choice of which weapon to use against a target should depend on the target's defence system against this weapon, to increase the probability of kill.
References
[1] E. Shahbazian, J.R. Duquet, P. Valin, A Blackboard Architecture for Incremental Implementation of Data Fusion Applications, in FUSION 98, Las Vegas, 6-9 July 1998, Vol. I, pp. 455-461.
[2] A.N. Steinberg, C.L. Bowman, F.E. White, Revisions to the JDL Data Fusion Model, in Joint NATO/IRIS Conference, Quebec City, Quebec, 19-29 October 1998.
[3] P. Bergeron, J. Couture, J.R. Duquet, M. Macieszczak, M. Mayrand, A New Knowledge-Based System for the Study of Situation and Threat Assessment in the Context of Naval Warfare, in FUSION 98, Las Vegas, 6-9 July 1998, Vol. II, pp. 926-933.
[4] J.-F. Truchon, J. Couture, MSDF/STARM Libraries Study - Data Representation for the Information Libraries, Doc. No. 6520014004, Lockheed Martin Canada, 2002.
[5] J. Couture, J.R. Duquet, Y. Allard, MSDF/STARM Libraries Study - Final Report, Doc. No. 6520014004, Lockheed Martin Canada, 2002.
[6] J. Couture, E. Ménard, Issues with Developing Situation and Threat Assessment Capabilities, in Data Fusion Technologies for Harbour Protection, NATO Advanced Research Workshop, Tallinn, Estonia, 27 June - 1 July 2005 (in preparation).
[7] M.G. Oxenham, Enhancing Situation Awareness for Air Defence via Automated Threat Analysis, in Proceedings of the Sixth International Conference on Information Fusion (FUSION 2003), Cairns, Australia, 8-11 July 2003, International Society of Information Fusion, pp. 1086-1093.
Detection of Structural Changes in Multivariate Data Using Change-Point Models
David ASATRYAN (a), Boris BRODSKY (b), Irina SAFARYAN (c)
(a) Institute for Informatics and Automation Problems, Armenia
(b) Central Economics and Mathematics Institute, Russian Federation
(c) Slavonic University of Armenia
Abstract. Many problems of image processing, remote sensing and remote control can be formulated in terms of the detection of structural changes in observed multivariate temporal or spatial data. This lecture considers modern methods for detecting structural changes in multivariate data and some important applications. Various methods for the effective solution of these problems are described. The most popular methods of change-point determination in the one-dimensional setting are given: parametric and nonparametric statistical methods, and a wavelet analysis method. A method for detecting structural changes in multivariate regression analysis is also considered. Keywords. Change-point determination, structural changes, nonparametric methods, segmentation, wavelet analysis, multivariate regression model
Introduction

Adequate and effective information processing is an important part of a decision-making system that operates by means of remote sensing and satellite imagery. Various problems of target detection, recognition and threat assessment are solved using information received from multi-sensor systems. Systems of this type usually use algorithms based on statistical methods, digital signal and image processing, and other mathematical methods. As a result of such processing, an object or a phenomenon is detected in the presence of various disturbing factors, noises and distortions. If remote sensing occurs in a time and/or space domain, the corresponding methods are likewise performed in a time and/or space domain. It is supposed that the presence or absence of objects of interest or expected phenomena in the scene under analysis influences the character and structure of the received data. By structure we mean the regularities of the observed objects' behaviour as shown in their data distribution, the dependence between variables, the presence of groups of observation results with certain properties, etc. Therefore, many problems of image
¹ Corresponding Author: David Asatryan, Institute for Informatics and Automation Problems, 1, P. Sevaki Str., Yerevan, 375014, Armenia; E-mail: [email protected]
processing, remote sensing and remote control may be formulated in terms of the detection of structural changes (or jumps) in observed multivariate temporal or spatial data. The past four decades have seen considerable theoretical and empirical research on the detection of abrupt changes in multivariate data and its applications to various problems of regression analysis, monitoring of dynamical systems, and other stochastic models (well known as change-point analysis). Development of these models has resulted in a huge number of approaches, methods and algorithms for detecting abrupt changes in the structure and/or the features of the available information. A review of all these methods within a limited paper is impossible. This lecture is therefore devoted to a brief but systematic presentation of some popular processing methods and algorithms for detecting abrupt changes in one-dimensional and multivariate data.
1. Simplest Change-Point Problem

One of the initial papers devoted to this problem is [1]. A vast amount of scientific literature on this topic currently exists; we can refer, for example, to [2]. The simplest change-point problem is formulated as follows. Let $X_1, X_2, \dots, X_n$ be a sequence of independent random variables. Suppose that, for a fixed (but unknown) value $1 \le k \le n$, the random variables $X_1, X_2, \dots, X_k$ are i.i.d. with distribution function $F_1(x)$ and, analogously, the random variables $X_{k+1}, X_{k+2}, \dots, X_n$ are i.i.d. with distribution function $F_2(x)$, where $F_1(x) \ne F_2(x)$ in at least one point $x$. The value $k$ is called a change-point of the sequence $X_1, X_2, \dots, X_n$.

Let $x_1, x_2, \dots, x_n$ be the observation results. Concerning the distribution functions $F_j(x)$, $j = 1, 2$, various assumptions can be made, depending on the considered sensor control model and the presence of a priori information regarding the type and parameters of the distributions. For example, in a continuous case a normal, exponential or other model can be accepted, with some unknown parameters. We need to estimate the change-point $k$ in the various models of the structure of a sequence $X_1, X_2, \dots, X_n$, which can be multivariate as well.

To complete a Change-Point Problem (CPP) it is necessary to define a model $M = M(X; F_1, F_2 \mid k)$ that connects the distribution of the observed variables with the change-point $k$, and a criterion $\Phi = \Phi(k \mid M; x_1, x_2, \dots, x_n)$ for the estimation procedure (it can be either maximized or minimized). Thus, we can estimate the unknown change-point via

$$\hat{k} = \arg\max_{1 \le k \le n} \Phi(k \mid M; x_1, x_2, \dots, x_n).$$

Let us consider an example of a CPP based on a normal distribution model. Let $F_j(x)$, $j = 1, 2$, be normal distribution functions with parameters $(\mu_j, \sigma_j^2)$, $j = 1, 2$. At first, we consider the case of known parameters $(\mu_j, \sigma_j^2)$, $j = 1, 2$.
The absence of a change-point means that all $X_i$, $i = 1, \dots, n$, have the normal distribution $N(\mu_1, \sigma_1^2)$ (hypothesis $H_0$); this situation can be regarded as $k > n$. The presence of a change-point means that $X_i \sim N(\mu_1, \sigma_1^2)$, $i = 1, \dots, k$, and $X_i \sim N(\mu_2, \sigma_2^2)$, $i = k+1, \dots, n$ (hypothesis $H_1$). We want to test the hypothesis $H_0$ against the alternative hypothesis $H_1$. With a common variance $\sigma^2 = \sigma_1^2 = \sigma_2^2$, the logarithm of the likelihood ratio for these two alternatives at the independent observations is

$$\ln \Lambda_n = \frac{\mu_2 - \mu_1}{\sigma^2} \sum_{i=k+1}^{n} \left( x_i - \frac{\mu_1 + \mu_2}{2} \right) = \frac{\mu_2 - \mu_1}{\sigma^2} \sum_{i=k+1}^{n} \left( x_i - \mu_1 - \frac{\nu}{2} \right) = \frac{1}{\sigma^2}\, S_n^k(\mu_1, \nu),$$

where

$$S_n^k(\mu, \nu) = \nu \sum_{i=k+1}^{n} \left( x_i - \mu - \frac{\nu}{2} \right),$$

and $\nu = \mu_2 - \mu_1$ is the jump magnitude (taking the sign into account). We can set $\Phi(k \mid M; x_1, x_2, \dots, x_n) = S_n^k(\mu_1, \nu)$, i.e., as an estimate of the unknown change-point $k$ we can use the maximizer of the likelihood ratio:

$$\hat{k} = \arg\max_{1 \le k \le n} S_n^k(\mu_1, \nu).$$

When the jump magnitude $\nu$ is unknown (the case that usually occurs in applications), we can use the same technique based on the likelihood ratio, but we must then estimate the change-point and the jump magnitude at the same time [2]. It is obvious that this approach requires complete information on the probability distribution of the observations; therefore it is not robust in general.
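Under the stated assumptions (known $\mu_1$ and known jump magnitude $\nu$), the estimate $\hat{k} = \arg\max_k S_n^k$ can be computed directly. A small noise-free sketch on an invented sequence:

```python
def changepoint_mle(x, mu0, nu):
    """Maximum-likelihood change-point estimate for a known mean shift:
    k_hat = argmax_k S_n^k(mu0, nu), where
    S_n^k = nu * sum_{i=k+1}^{n} (x_i - mu0 - nu/2)."""
    n = len(x)
    best_k, best_s = n, float("-inf")
    for k in range(1, n):
        # x[k:] are the 1-indexed observations x_{k+1}, ..., x_n
        s = nu * sum(xi - mu0 - nu / 2.0 for xi in x[k:])
        if s > best_s:
            best_k, best_s = k, s
    return best_k

# 30 samples at the pre-change mean 0, then 30 samples at the shifted mean 2
x = [0.0] * 30 + [2.0] * 30
k_hat = changepoint_mle(x, mu0=0.0, nu=2.0)
```

Each pre-change sample contributes $-\nu^2/2$ to $S_n^k$ and each post-change sample $+\nu^2/2$, so the statistic peaks exactly at the true change-point.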
2. Nonparametric Methods

In contrast to the previous situation, we can suppose that the distribution functions $F_j(x)$, $j = 1, 2$, are unknown, but that we have some integrated information on their behaviour. This problem is considered in detail in [3]-[6], so only a simple case is provided here to demonstrate the ideas of this approach. Let $Z(n, k)$ be a two-sample nonparametric statistic for testing the hypothesis $H_0$ (that the samples $x_1, x_2, \dots, x_k$ and $x_{k+1}, x_{k+2}, \dots, x_n$ are from the same distribution) against the alternative hypothesis $H_1$. Statistics of this kind include the Wilcoxon statistics (and the Mann-Whitney statistics connected with them), some rank-based statistics, and many others. We assume that if a change-point exists, then $F_1(x) \ne F_2(x)$ and $F_1(x) > F_2(x)$. This model $M$ is considered, in particular, in [4]. Let us consider, for example, the Mann-Whitney statistic, which has the following expression:

$$Z(n, k) = \frac{1}{k(n-k)} \sum_{i=1}^{k} \sum_{j=k+1}^{n} Z_{ij}(n, k),$$

where

$$Z_{ij}(n, k) = \begin{cases} 1, & x_i \le x_j, \\ 0, & x_i > x_j, \end{cases} \qquad i = 1, 2, \dots, k; \quad j = k+1, k+2, \dots, n.$$

Thus, we can put $\Phi(k \mid M; x_1, x_2, \dots, x_n) = Z(n, k)$ and

$$\hat{k} = \arg\max_{1 \le k \le n} Z(n, k).$$
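A direct (quadratic-time) sketch of this estimator, maximizing the normalized Mann-Whitney statistic $Z(n, k)$ over candidate split points, on an invented series with an upward shift:

```python
def mann_whitney_changepoint(x):
    """Nonparametric change-point estimate: maximize
    Z(n, k) = (1 / (k (n - k))) * #{(i, j) : i <= k < j, x_i <= x_j}."""
    n = len(x)
    best_k, best_z = 1, float("-inf")
    for k in range(1, n):
        # count pairs straddling the split where the later sample is larger
        count = sum(1 for i in range(k) for j in range(k, n) if x[i] <= x[j])
        z = count / (k * (n - k))
        if z > best_z:
            best_k, best_z = k, z
    return best_k, best_z

# Four samples around 1, then four around 5: the shift is after index 4
x = [1.0, 0.5, 1.2, 0.8, 5.0, 4.7, 5.3, 4.9]
k_hat, z = mann_whitney_changepoint(x)
```

No distributional form is assumed; only the ordering of the observations matters, which is exactly the robustness this section contrasts with the likelihood-ratio approach.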
3. Multiple Change-Point Problem (CPP)
A situation with many change-points is more typical. There are a few methods for change-point determination in the multivariate case. We will consider two of these, which differ in mathematical approach and, consequently, in processing method.

3.1. Segmentation Method
This method uses time series segmentation to break a series into homogeneous pieces. The quality of a segmentation is determined by the sum of the squared deviations of the data from the means of their respective segments; in what follows we use the term segmentation cost for this quantity. Given a time series, the procedure computes the minimal-cost segmentation with $K = 2, 3, \dots$ change-points. The procedure gradually increases $K$ and, for every value of $K$, the best segmentation is computed. The procedure terminates when the differences between the means of the obtained segments are no longer statistically significant (as measured by Scheffé's contrast criterion).

In this section we formulate time series segmentation as an optimization problem. We follow Hubert's presentation [7], but modify his notation and formulate it in the terms of a time series. Given a time series $x_1, x_2, \dots, x_T$ and a number $K$, a segmentation is a sequence of times $t = (t_0, t_1, \dots, t_K)$ that satisfies $0 = t_0 < t_1 < \dots < t_{K-1} < t_K = T$. The intervals of integers $[t_0 + 1, t_1], [t_1 + 1, t_2], \dots, [t_{K-1} + 1, t_K]$ are the segments, and the times $t_0, t_1, \dots, t_K$ are the change-points. $K$, the number of segments, is the order of the segmentation. The length of the $k$-th segment (for $k = 1, 2, \dots, K$) is denoted by $T_k = t_k - t_{k-1}$.

The following notation is used for a given segmentation $t = (t_0, t_1, \dots, t_K)$. For $k = 1, 2, \dots, K$ define

$$\hat{\mu}_k = \frac{1}{T_k} \sum_{t = t_{k-1}+1}^{t_k} x_t, \qquad d_k = \sum_{t = t_{k-1}+1}^{t_k} (x_t - \hat{\mu}_k)^2, \qquad D_K(t) = \sum_{k=1}^{K} d_k = \sum_{k=1}^{K} \sum_{t = t_{k-1}+1}^{t_k} (x_t - \hat{\mu}_k)^2,$$

where $D_K(t)$ defines the cost of the segmentation $t = (t_0, t_1, \dots, t_K)$.

Now we can define the best $K$-th order segmentation $\hat{t}$ to be the one minimizing $D_K(t)$, and denote the minimal cost by $\hat{D}_K = D_K(\hat{t})$. Note that we have $\hat{D}_K \ge \hat{D}_{K+1}$
for every $K$. One can show that the number of possible segmentations grows exponentially with $T$. Minimization of $D_K$ can be achieved in several ways; there are many algorithms to efficiently search the set of all possible segmentations (Hubert uses a branch-and-bound approach). One can consider the cost function

$$\Phi = \Phi(t_0, t_1, \dots, t_K \mid M; x_1, x_2, \dots, x_T) = D_K(t),$$
and minimize it over the set of all admissible discrete values of $t_0, t_1, \dots, t_K$.
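Hubert minimizes $D_K$ by branch-and-bound; the sketch below uses dynamic programming instead (an alternative exact search over the same segmentation cost), on an invented toy series with three levels.

```python
def best_segmentation(x, K):
    """Minimal-cost segmentation of x into K segments by dynamic
    programming; the cost of a segment is the squared deviation of its
    points from their own mean. Returns (change-points, minimal cost)."""
    T = len(x)

    def seg_cost(a, b):                # cost of the segment x[a..b-1]
        seg = x[a:b]
        mu = sum(seg) / len(seg)
        return sum((v - mu) ** 2 for v in seg)

    INF = float("inf")
    # D[k][t]: best cost of splitting x[:t] into k segments
    D = [[INF] * (T + 1) for _ in range(K + 1)]
    back = [[0] * (T + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            for s in range(k - 1, t):  # last segment is x[s..t-1]
                c = D[k - 1][s] + seg_cost(s, t)
                if c < D[k][t]:
                    D[k][t], back[k][t] = c, s
    # recover the change-points t_0 = 0 < t_1 < ... < t_K = T
    cuts, t = [T], T
    for k in range(K, 0, -1):
        t = back[k][t]
        cuts.append(t)
    return list(reversed(cuts)), D[K][T]

x = [0.0, 0.1, -0.1, 5.0, 5.2, 4.8, 10.0, 10.1]
cuts, cost = best_segmentation(x, K=3)
```

Increasing K and stopping when adjacent segment means stop differing significantly (the Scheffé step above) would wrap this routine in an outer loop.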
Because of their good time-frequency localization, among other reasons, wavelets have proven useful in many applications in statistics and other fields (especially signal and image processing). In particular, they are well equipped to deal with abrupt jumps and other irregular features in nonparametric regression. Following the approach of Daubechies, we start with two related, specially chosen, mutually orthonormal functions, or parent wavelets: the scaling function φ (sometimes referred to as the father wavelet) and the mother wavelet ψ. Other wavelets in the basis are then generated by translations of the scaling function φ, and by dilations and translations of the mother wavelet ψ, using the relationships:

φ_{j0,k}(t) = 2^{j0/2} φ(2^{j0} t − k),  j0, k ∈ Z;    ψ_{j,k}(t) = 2^{j/2} ψ(2^{j} t − k),  j, k ∈ Z,    (4-1)

for some fixed j0 ∈ Z, where Z is the set of integers. Typically the scaling function φ resembles a kernel function and the mother wavelet ψ is a well-localized oscillation (hence the name wavelet). A unit increase in j in (4-1) (i.e., dilation) has no effect on the scaling function (φ_{j0,k} has a fixed width), but packs the oscillations of ψ_{j,k} into half the width (doubling its "frequency" or, in strict wavelet terminology, its scale or resolution). A unit increase in k (i.e., translation) shifts the location of both φ_{j0,k} and ψ_{j,k}, the former by a fixed amount (2^{−j0}) and the latter by an amount proportional to its width (2^{−j}). Given the above wavelet basis, a function y(t) is then represented in the corresponding wavelet series as:

y(t) = Σ_{k∈Z} c_{j0,k} φ_{j0,k}(t) + Σ_{j=j0}^{∞} Σ_{k∈Z} w_{j,k} ψ_{j,k}(t),    c_{j0,k} = ⟨y, φ_{j0,k}⟩ and w_{j,k} = ⟨y, ψ_{j,k}⟩.
The parent wavelets must be specially chosen for these properties to hold. For our purpose, the simplest suitable wavelet basis is the Haar basis, whose parent couple is given by

φ(t) = 1 for 0 ≤ t ≤ 1, and 0 otherwise;

ψ(t) = 1 for 0 ≤ t < 1/2,  −1 for 1/2 ≤ t < 1,  and 0 otherwise.
The heuristic underlying the development of this method is that under the alternative, most of the empirical coefficients will still be near zero, but that a few coefficients, localized to the area of the change-point, will exhibit significant signals.
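This heuristic can be illustrated numerically. The sketch below (a full Haar decomposition, dyadic length assumed; the implementation is ours) shows that for a series with a single level shift, the empirical detail coefficients vanish except around the change-point:

```python
# Full Haar wavelet decomposition of a series whose length is a power
# of two (a sketch). For a series with one level shift, only the
# detail coefficients straddling the jump are nonzero.
import math

def haar_transform(x):
    """Return (approximation coefficient, detail coefficients per level,
    coarsest level first)."""
    a = list(x)
    details = []
    while len(a) > 1:
        d = [(a[2 * i] - a[2 * i + 1]) / math.sqrt(2.0) for i in range(len(a) // 2)]
        a = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2.0) for i in range(len(a) // 2)]
        details.append(d)   # finest scale is appended first
    details.reverse()       # reorder so the coarsest scale comes first
    return a[0], details
```

For x = (0, …, 0, 2, …, 2) of length 16 with the shift in the middle, every detail coefficient is zero except the single coarsest one that straddles the jump, exactly the localized signal the heuristic exploits.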
4. Detection of Structural Changes in a Regression Model
The CPP for regression models was first considered by Quandt [8]. The following model of observations (X_1, Y_1), …, (X_n, Y_n) was analyzed:

Y_j = β_0 + β_1 X_j + σ Z_j  for j ≤ k,
Y_j = (β_0 + Δ_0) + (β_1 + Δ_1) X_j + σ Z_j  for j > k,    (5-1)

where the Z_j are i.i.d. random variables with EZ_j = 0, EZ_j² = 1, and (Δ_0, Δ_1) ≠ (0, 0).
If 1 ≤ k ≤ n−1, then the statistical characteristics of the dependent variable Y_j change at the instant k; if k = n, then model (5-1) is statistically homogeneous. We consider a method [7] for estimating the change-point k from the observations (X_1, Y_1), …, (X_n, Y_n). The general statement of the CPP for linear regression models can be formulated as follows. Suppose y_i, i = 1, 2, …, n, are independent random variables. Under the null hypothesis H_0 the linear model is

y_i = β x_i* + ε_i,  1 ≤ i ≤ n,

where β = (β_1, β_2, …, β_d) is an unknown vector of coefficients, x_i = (x_{1i}, x_{2i}, …, x_{di}) are known predictors, and * is the transposition symbol. The errors are supposed to be i.i.d. random variables with

Eε_i = 0,  0 < σ² = Dε_i < ∞.
Under the alternative hypothesis H_1 a change occurs at the instant k*, i.e.,

y_i = β x_i* + ε_i  for 1 ≤ i ≤ k*,
y_i = γ x_i* + ε_i  for k* < i ≤ n,

where k* and γ ∈ R^d are unknown parameters, and β ≠ γ. Denote
ȳ_k = (1/n) Σ_{1≤i≤k} y_i ,    x̄_k = (1/n) Σ_{1≤i≤k} x_i ,    Q_n = Σ_{1≤i≤n} (x_i − x̄_n)(x_i − x̄_n)* ,

and X_n = (x_1, x_2, …, x_n)*, Y_n = (y_1, y_2, …, y_n)*.
The least-squares estimate of β is

β̂_n* = (β̂_{1n}, β̂_{2n}, …, β̂_{dn}) = (X_n* X_n)^{−1} X_n* Y_n .
Many authors propose to reject H_0 for large values of max_{1≤k≤n} |U_n(k)|, where

U_n(k) = (k / (1 − k/n))^{1/2} · (ȳ_k − ȳ_n − β̂_n (x̄_k − x̄_n)*) / (1 + k (x̄_k − x̄_n)(x̄_k − x̄_n)* (Q_n (1 − k/n))^{−1})^{1/2} .

Thus, the change-point estimate has the form k̂ = arg max_{1≤k≤n} |U_n(k)|.
If k > 1, then for a given k-partition {i_1, i_2, …, i_k} the least-squares estimates of the β_j are easily obtained. The resulting minimal residual sum of squares (RSS) is given by

RSS(i_1, i_2, …, i_k) = Σ_{j=1}^{k+1} rss(i_{j−1}+1, i_j),

where rss(i_{j−1}+1, i_j) is the usual minimal RSS in the j-th segment. The problem of dating structural changes is to find the change-points (î_1, …, î_k) that minimize the objective function

(î_1, …, î_k) = arg min_{(i_1,…,i_k)} RSS(i_1, …, i_k)

over all partitions i_1, i_2, …, i_k with i_j − i_{j−1} ≥ nh ≥ m.
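For a single break (k = 1) in a simple linear regression, this RSS-minimizing dating reduces to a grid search over the candidate break date, subject to a minimal segment length. A sketch (function names are ours):

```python
# Dating a single structural change in simple linear regression by
# minimizing the total residual sum of squares over candidate break
# dates (a sketch of the objective above for one break).

def ols_rss(xs, ys):
    """RSS of the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def date_single_break(xs, ys, h):
    """Return the break index i (first obs of the second regime) minimizing
    RSS(1, i) + RSS(i+1, n), with at least h observations per segment."""
    n = len(xs)
    best_i, best_rss = None, float("inf")
    for i in range(h, n - h + 1):
        rss = ols_rss(xs[:i], ys[:i]) + ols_rss(xs[i:], ys[i:])
        if rss < best_rss:
            best_i, best_rss = i, rss
    return best_i, best_rss
```

When the two regimes follow exact but different lines, the search recovers the true break date with zero total RSS.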
References

[1] E. Page. A test for a change in a parameter occurring at an unknown point. Biometrika, 42: 523–527, 1955.
[2] M. Basseville, I. Nikiforov. Detection of Abrupt Changes: Theory and Applications. Prentice-Hall, N.Y., 1993.
[3] B. Brodsky, B. Darkhovsky. Nonparametric change-point detection. Proceedings of the 2nd IFAC Symposium on Stochastic Control, Vilnius, 1986.
[4] D. Asatryan, I. Safaryan. Nonparametric methods for detecting changes in the properties of random sequences. In: Detection of Changes in Random Processes (ed. L. Telksnys), N.Y., 1986, pp. 1–13.
[5] B. Brodsky, B. Darkhovsky. Nonparametric Methods in Change-Point Problems. Kluwer Academic Press, 1993.
[6] B. Brodsky, B. Darkhovsky. Non-Parametric Statistical Diagnosis: Problems and Methods. Kluwer Academic Publishers, The Netherlands, 2000.
[7] P. Hubert. The segmentation procedure as a tool for discrete modeling of hydrometeorological regimes. Stochastic Environmental Research and Risk Assessment, vol. 14, pp. 297–304, 2000.
[8] R.E. Quandt. The estimation of parameters of a linear regression system obeying two separate regimes. Journal of the American Statistical Association, 53, 873–880, 1958.
[9] M. Csörgő, L. Horváth. Limit Theorems in Change-Point Analysis. Chichester: Wiley, 1997.
[10] J. Bai, P. Perron. Estimating and testing linear models with multiple structural changes. Econometrica, 66, 1, 47–78, 1998.
[11] B. Darkhovsky. Retrospective change-point detection in some regression models. Theory of Probability and Its Applications, 40, 4, 898–903, 1995.
Advances and Challenges in Multisensor Data and Information Processing
E. Lefebvre (Ed.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Unification of Fusion Theories (UFT)

Florentin SMARANDACHE
The University of New Mexico, 200 College Road, Gallup, NM 87301, USA
[email protected]

Abstract. We propose a Unification of Fusion Theories and Fusion Rules for solving problems and applications. For each particular application, one checks the reliability of the sources and selects the most appropriate model, rule(s), fusion theory, and implementation algorithm. The unification scenario presented herein, which is still in an incipient form, should be updated periodically to incorporate new discoveries from fusion and engineering research.

Keywords. Fusion theories, fusion rules, lattice, Boolean algebra, Lindenbaum algebra, frame of discernment, model, static/dynamic fusion, incomplete/paraconsistent/imprecise information, specificity chains, specialization
Introduction

Each theory works well for some applications and less well for others. This unification, an attempt at a fusion overview, might look like a cooking recipe, or more precisely a logical chart or a computer program; however, we do not yet see another method to encompass/unify all of these approaches. We extend the power set and hyper-power set from previous theories to a Boolean algebra, constructed by closing the frame of discernment under union, intersection, and complement of sets. All basic belief assignments (bba's) and rules are extended to this Boolean algebra. A similar generalization was previously used by Guan and Bell (1993) for the Dempster-Shafer rule, using propositions in sequential logic. Herein we reconsider the Boolean algebra for all fusion rules and theories, but use sets instead of propositions, because it is generally harder to work in sequential logic with summations and inclusions than in set theory. We present the definition of a model, some classifications of frames of discernment and their elements, types of information, what specificity chains and specialization mean, the definition of static and dynamic fusion, and the algebraic properties of rules. We list the fusion rules and theories but cannot present them in detail due to space limitations. We also propose a partial Unification of Fusion Rules (UFR).
F. Smarandache / Unification of Fusion Theories (UFT)
1. Fusion Space

For n ≥ 2 let Θ = {θ1, θ2, …, θn} be the frame of discernment of the fusion problem/application under consideration. Then (Θ, ∪, ∩, C), i.e. Θ closed under these three operations (union, intersection, and complementation of sets respectively), forms a Boolean algebra. With respect to the partial ordering relation, the inclusion ⊆, the minimum element is the empty set ∅, and the maximal element is the total ignorance

I = ∪_{i=1}^{n} θi .
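As an illustration, the closure of the singletons under the three operations can be computed by fixpoint iteration. The sketch below encodes subsets of Θ as bitmasks (the representation is ours); for exclusive elements the result coincides with the power set 2^Θ:

```python
# Closure of a frame of discernment under union, intersection and
# complement (a sketch). Subsets of Theta = {theta_1, ..., theta_n} are
# bitmasks; with exclusive elements the closure is the power set 2^Theta.

def super_power_set(n):
    """Return the closure of the n singletons under |, & and complement."""
    full = (1 << n) - 1          # total ignorance I = theta_1 | ... | theta_n
    sets = {1 << i for i in range(n)}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            for b in list(sets):
                for c in (a | b, a & b, full & ~a):
                    if c not in sets:
                        sets.add(c)
                        changed = True
    return sets
```

For n = 3 exclusive elements this produces all 8 subsets of Θ, including ∅ and the total ignorance I; for non-exclusive (vague) elements one would instead work over the minterms of the Boolean algebra.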
Similarly one can define (Θ, ∪, ∩, \) for sets, with Θ closed with respect to each of these operations: union, intersection, and difference of sets respectively. (Θ, ∪, ∩, C) and (Θ, ∪, ∩, \) generate the same super-power set S^Θ, closed under ∪, ∩, C and \, because for any A, B ∈ S^Θ one has C(A) = I \ A and, reciprocally, A \ B = A ∩ C(B). If one considers propositions, then (Θ, ∨, ∧, ¬) forms a Lindenbaum algebra in sequential logic, which is isomorphic to the above Boolean algebra (Θ, ∪, ∩, C). By choosing the frame of discernment Θ with exclusive elements, closed under ∪ only, one gets the power set of Dempster-Shafer's, Yager's, the Transferable Belief Model's, and Dubois-Prade's theories. Making Θ closed under both ∪ and ∩, one gets Dezert-Smarandache's hyper-power set. Extending Θ for closure under ∪, ∩, and C, one also includes the complement of a set (or the negation of a proposition, if working in sequential logic). In the case of non-exclusive vague elements in the frame of discernment, the complement is considered involutive, i.e., C(C(A)) = A for any set A, to avoid an infinite loop in the closure-under-complement process. Therefore the super-power set (Θ, ∪, ∩, C) includes all previous fusion spaces. The power set 2^Θ used in DST, Yager's theory, TBM, and DP, which is the set of all subsets of Θ, is also a Boolean algebra, closed under ∪, ∩, and C, but it does not contain intersections of elements from Θ, since the elements are supposed exclusive. The Dedekind distributive lattice D^Θ used in DSmT is closed under ∪ and ∩, and if negations/complements arise, they are introduced directly into the frame of discernment, say Θ′, which is then closed under ∪ and ∩. Unlike the others, DSmT allows intersections, generalizing the previous theories. The Unifying Theory contains intersections and complements as well. To choose a model means to know the empty intersections in the super-power set, whose conflicting masses should be transferred to non-empty sets.

Comments on frames and their extensions:
F.1.1 Open world: a frame that misses some hypotheses (non-exhaustive) [Smets], e.g., Ω = {John, George}, but later we find another suspect, David. An open world becomes closed if one adds to the frame of discernment another hypothesis θ_c that includes all missing hypotheses.
F.1.2 Closed world: a frame that includes all hypotheses (exhaustive).
F.2.1 Homogeneous frame: all its elements are of the same nature.
F.2.2 Heterogeneous frame: at least two of its elements are of a different nature, e.g., Ω = {White, Bird, Long}. Such a frame is split into homogeneous sub-frames; the complement is computed with respect to an element's sub-frame. Super-power sets are constructed for each sub-frame.
F.3.1 Finite frame.
F.3.2 Infinite frame.
E.1.1 Exclusive elements: their intersection is empty.
E.1.2 Non-exclusive elements: their intersection is not empty, e.g., Ω = {A, B} of target cross-sections, A = {x | 1.5 < x

A subset A with mΩ(A) > 0 is called a focal element of mΩ. A bba mΩ such that mΩ(∅) = 0 is said to be normal. This requirement was originally imposed by Shafer [37], but may be relaxed if one accepts the open-world assumption, stating that the set Ω might not be complete. Given this bba, a belief function belΩ and a plausibility function plΩ can be defined, respectively, as:

belΩ(A) = Σ_{∅≠B⊆A} mΩ(B),  ∀ A ⊆ Ω.    (2)

plΩ(A) = Σ_{A∩B≠∅} mΩ(B),  ∀ A ⊆ Ω.    (3)
Whereas belΩ(A) represents the amount of support given to subset A, the potential amount of support that could be given to A is measured by plΩ(A). A belief function (a Choquet [9] capacity, monotone of infinite order) belΩ can also be defined mathematically as a function from 2^Ω to [0, 1] satisfying belΩ(∅) = 0 and, for all n ≥ 1 and all A1, …, An ⊆ Ω,

belΩ(A1 ∪ … ∪ An) ≥ Σ_{I ⊆ {1,…,n}, I ≠ ∅} (−1)^{|I|+1} belΩ(∩_{i∈I} Ai).    (4)

As such these inequalities are hardly meaningful, but the special case with n = 2 and A1 ∩ A2 = ∅ is worth considering:

belΩ(A1 ∪ A2) ≥ belΩ(A1) + belΩ(A2)  ∀ A1, A2 ⊆ Ω with A1 ∩ A2 = ∅.    (5)

This relation illustrates that the belief given to the union of two disjoint subsets A1 and A2 of Ω is greater than or equal to the sum of the beliefs given to each subset individually. When all the inequalities of relation (4) are replaced by equalities, the resulting function belΩ is a classical probability function.
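Definitions (2) and (3) can be sketched directly, with focal elements stored as frozensets (the dict representation and the example bba are ours):

```python
# Belief (Eq. 2) and plausibility (Eq. 3) functions of a bba stored as a
# dict mapping frozenset focal elements to their masses (a sketch).

def bel(m, A):
    """bel(A) = sum of m(B) over nonempty B included in A."""
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    """pl(A) = sum of m(B) over B intersecting A."""
    return sum(v for B, v in m.items() if B & A)

# Example bba on Omega = {1, 2, 3}
m = {frozenset({1}): 0.2, frozenset({2}): 0.1,
     frozenset({1, 3}): 0.4, frozenset({1, 2, 3}): 0.3}
```

One can check, e.g., bel({1, 2}) = 0.2 + 0.1 = 0.3 and pl({1}) = 0.2 + 0.4 + 0.3 = 0.9, with bel(A) ≤ pl(A) for every A.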
P. Vannoorenberghe / Belief Functions Theory for Multisensor Data Fusion
Among the functions derived from mΩ introduced in Shafer's book [37], the commonality function qΩ is defined as:

qΩ(A) = Σ_{B⊇A} mΩ(B)  ∀ A ⊆ Ω.    (6)

All these functions belΩ, plΩ, qΩ and mΩ are in one-to-one correspondence and represent different facets of the same piece of information. We can retrieve each function from the others using the fast Möbius transform [27]. The full notation for belΩ and its related functions is:

belΩ_{Y,t}{x}[EC_{Y,t}](ω0 ∈ A) = λ.    (7)
It denotes that the degree of belief, held by the agent Y (shorthand for You) at time t, that the actual world ω0 (the possible value for a variable x) belongs to the set A of worlds, is equal to λ, where A is a subset of the frame of discernment Ω. The belief is based on the evidential corpus EC_{Y,t} held by Y at t, which represents all that agent Y knows at t. Fortunately, in practice many indices can be omitted for simplicity's sake, as is done for the domain Ω in the sequel of this paper.

Let us consider a variable x taking values in the finite and unordered set Ω, called the frame of discernment. Partial knowledge regarding the actual value taken by x can be represented by a bba m{x}. Complete ignorance corresponds to m{x}(Ω) = 1 (called the vacuous bba), and perfect knowledge of the value of x can be represented by the allocation of the whole mass of belief to a unique singleton of Ω (m{x} is then called a certain bba). Another particular case is that where all focal sets of m are singletons: m is then equivalent to a probability function and is called a Bayesian bba.

1.2. Rules of combination for data fusion

Let m1 and m2 be two bba's defined on the same frame Ω. Suppose that the two bba's are induced by two distinct pieces of evidence. Then the joint impact of the two pieces of evidence can be expressed by the conjunctive rule of combination, which results in the bba:

m∩(A) = (m1 ∩ m2)(A) = Σ_{B∩C=A} m1(B) m2(C).    (8)
This rule is sometimes referred to as the (unnormalized) Dempster's rule of combination. If necessary, the normality assumption m∩(∅) = 0 may be recovered by dividing each mass by a normalization coefficient. The resulting operator, known as Dempster's rule and denoted m⊕, is defined as:

m⊕(A) = (m1 ⊕ m2)(A) = (m1 ∩ m2)(A) / (1 − m∩(∅))  ∀ ∅ ≠ A ⊆ Ω,    (9)
where the quantity m∩(∅) is called the degree of conflict between m1 and m2 and can be computed using:

m∩(∅) = (m1 ∩ m2)(∅) = Σ_{B∩C=∅} m1(B) m2(C).    (10)
The use of Dempster's rule is possible only if m1 and m2 are not totally conflicting, i.e., if there exist two focal elements B and C of m1 and m2 satisfying B ∩ C ≠ ∅. This rule satisfies some interesting properties (associativity, commutativity, non-idempotence), and its use has been justified theoretically by several authors [43,29,18] according to specific axioms.

1.2.1. Notes about conflict

The normalization in Dempster's rule redistributes conflicting belief masses to non-conflicting ones, and thereby tends to eliminate any conflicting characteristics in the resulting belief mass distribution. The non-normalized Dempster's rule avoids this particular problem by allocating all conflicting belief masses to the empty set. In [38], Smets explains this by arguing that the presence of highly conflicting beliefs indicates that some possible event must have been overlooked (the open-world assumption) and therefore is missing from the frame of discernment. The idea is that conflicting belief masses should be allocated to this missing (empty) event. Smets has also proposed to interpret the amount of belief mass allocated to the empty set as a measure of conflict between separate beliefs. Another approach to eliminating conflict from Dempster's rule is to replace ∩ by ∪ in Eq. (8), which produces the disjunctive (or dual Dempster's) rule [19], defined as:

m∪(A) = (m1 ∪ m2)(A) = Σ_{B∪C=A} m1(B) m2(C)  ∀ A ⊆ Ω.    (11)
The interpretation of the (conjunctive) Dempster's rule is that both beliefs to be combined are assumed to be correct, while at least one of them is assumed to be correct in the case of the disjunctive rule. Unfortunately, while the disjunctive rule has some nice theoretical properties, its disadvantage is that the non-specificity of beliefs is increased by an application of the rule: more and more belief mass is assigned to larger subsets of Ω and to the whole of Ω. The drastic case is the combination of any belief with the vacuous one, where the result is always the vacuous belief function. Let q1 and q2 denote the commonality functions related to two bba's m1 and m2 induced by distinct items of evidence. The conjunctive combination of these two pieces of evidence (m∩ = m1 ∩ m2) can be computed from q1 and q2 as:

q∩(A) = q1(A) q2(A)  ∀ A ⊆ Ω.    (12)
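The combination rules (8), (9) and (11) can be sketched with the same dict-of-frozensets representation (the representation and example bba's are ours):

```python
# Conjunctive (Eq. 8), normalized Dempster (Eq. 9) and disjunctive
# (Eq. 11) combination of two bba's on the same frame (a sketch).
from itertools import product

def combine(m1, m2, op):
    """Pool the product masses m1(B)m2(C) on op(B, C)."""
    out = {}
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        D = op(B, C)
        out[D] = out.get(D, 0.0) + v1 * v2
    return out

def conjunctive(m1, m2):
    """Unnormalized Dempster's rule, Eq. (8)."""
    return combine(m1, m2, frozenset.__and__)

def dempster(m1, m2):
    """Normalized Dempster's rule, Eq. (9); undefined under total conflict."""
    m = conjunctive(m1, m2)
    conflict = m.pop(frozenset(), 0.0)   # degree of conflict, Eq. (10)
    return {A: v / (1.0 - conflict) for A, v in m.items()}

def disjunctive(m1, m2):
    """Disjunctive (dual Dempster's) rule, Eq. (11)."""
    return combine(m1, m2, frozenset.__or__)

# Example: two partially conflicting bba's on Omega = {1, 2, 3}
m1 = {frozenset({1}): 0.6, frozenset({1, 2, 3}): 0.4}
m2 = {frozenset({2}): 0.5, frozenset({1, 2, 3}): 0.5}
```

Here the degree of conflict is 0.6 · 0.5 = 0.3; Dempster's rule renormalizes the remaining mass, while the disjunctive rule moves it to the larger set {1, 2}, illustrating the loss of specificity discussed above.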
1.2.2. The Weighted Operator

The weighted operator, denoted WO, was created to overcome the sensitivity problem of Dempster's rule, which produces unexpected results when evidence conflicts [31]. The idea of the weighted operator is to distribute the conflicting belief mass m(∅) over some subsets of Ω according to additional knowledge. More precisely, a part of the mass m(∅) is assigned to a subset A ⊆ Ω according to a
weighting factor denoted w. This weighting factor can be a function of the considered subset A and of the belief functions m = {mj, j = 1, ···, J} which are involved in the combination and have caused the conflict. This idea is formalized in the following definition of the weighted operator.

Definition 1 (Weighted Operator) Let m = {mj, j = 1, ···, J} be the set of belief functions defined on Ω to be combined. The combination of the belief functions m with the weighted operator WO is defined as:

m_WO(∅) = w(∅, m) · m(∅),    (13)
m_WO(A) = m∩(A) + w(A, m) · m(∅)  ∀ A ≠ ∅.    (14)

In this definition, the first term of equation (14), m∩, corresponds to the conjunctive rule of combination. The second is the part of the conflicting mass assigned to each subset A and added to the conjunctive term. The notation has been chosen to highlight these two aspects. The weighting factors w(A, m) ∈ [0, 1] are coefficients which depend on each subset A ⊆ Ω and on the belief functions m to be combined. They must satisfy the constraint

Σ_{A⊆Ω} w(A, m) = 1,

so as to respect the property that the sum of the mass functions
must be equal to 1 (cf. Eq. (1)). In order to define this operator completely, we need additional information to choose the values of w(·, m), which determine the particular behavior of the operator. This generic framework allows Dempster's rule of combination and others proposed by Smets [38], Yager [44], and Dubois and Prade [20] to be rewritten. For each operator, we only have to define the weighting factors w(A, m) associated with each subset A ⊆ Ω. For example, the unnormalized Dempster's rule is nothing more than the weighted operator with w(A, m) = 0 for all A ⊆ Ω\{∅} and w(∅, m) = 1. This is the open world assumed by Smets. Yager [44] assumes that the frame of discernment Ω is exhaustive, but his idea consists in assigning the conflicting mass m(∅) to the whole set Ω. With the weighted operator presented above, it is easy to reformulate Yager's idea by setting w(Ω, m) = 1 and w(∅, m) = 0. According to the choice of the weights w, we can thus define a family of weighted operators. Another operator of this family is the proportionalized combination proposed by Daniel in [10].

1.3. Discounting

An α-discounted bba mα(·) can be obtained from the original bba m as follows:

mα(A) = α m(A)  ∀ A ⊆ Ω, A ≠ Ω,    (15)
mα(Ω) = 1 − α + α m(Ω),    (16)
with 0 ≤ α ≤ 1. The discounting operation is useful when the source of information from which m has been derived is not fully reliable; in that case the coefficient α represents some form of meta-knowledge about the source's reliability which could not be encoded in m.
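Equations (15)-(16) translate directly into a few lines (a sketch; the dict representation is ours, and the example bba is the one tabulated in Table 1 below, with ωk written as k):

```python
# Alpha-discounting of a bba, Eqs. (15)-(16): each mass is scaled by
# alpha and the remainder 1 - alpha is transferred to Omega (a sketch).

def discount(m, alpha, omega):
    """Return the alpha-discounted bba; omega is the frozenset for Omega."""
    out = {A: alpha * v for A, v in m.items() if A != omega}
    out[omega] = 1.0 - alpha + alpha * m.get(omega, 0.0)
    return out

# Example bba of Table 1 (frame elements written 1, 2, 3);
# alpha = 0.5 reproduces the m.5 column of the table.
m = {frozenset({1}): 0.2, frozenset({2}): 0.1,
     frozenset({1, 3}): 0.4, frozenset({1, 2, 3}): 0.3}
```

With α = 0.5 the masses become 0.10, 0.05, 0.20 and 0.65, and the result still sums to 1.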
A ⊆ Ω        m     bel    pl    BetPm      m.5    bel.5   pl.5   BetPm.5
{ω1}         0.2   0.2    0.9   0.5        0.10   0.10    0.95   0.416
{ω2}         0.1   0.1    0.4   0.2        0.05   0.05    0.70   0.266
{ω1, ω2}     0.0   0.3    1.0   -          0.00   0.15    1.00   -
{ω3}         0.0   0.0    0.7   0.3        0.00   0.00    0.85   0.316
{ω1, ω3}     0.4   0.6    0.9   -          0.20   0.30    0.95   -
{ω2, ω3}     0.0   0.1    0.8   -          0.00   0.05    0.90   -
Ω            0.3   1.0    1.0   -          0.65   1.00    1.00   -

Table 1. Example of bba, belief, plausibility and pignistic probability functions.
1.4. Pignistic transformation

In the TBM, we distinguish the credal level, where beliefs are entertained (formalized, revised and combined), and the pignistic level, used for decision making. Based on rationality arguments developed in the TBM, Smets proposes to transform m into a probability function BetPm on Ω (called the pignistic probability function), defined for all ωk ∈ Ω as:

BetPm(ωk) = (1 / (1 − m(∅))) Σ_{A∋ωk} m(A) / |A|,    (17)

where |A| denotes the cardinality of A ⊆ Ω and BetPm(A) = Σ_{ω∈A} BetPm(ω), ∀ A ⊆ Ω. In this transformation, the mass of belief m(A) is distributed equally among the elements of A [41]. An example of this transformation is given in Table 1.

1.5. The Generalized Bayesian Theorem

Let us consider two finite spaces: X, the observation space, and Θ, the unordered parameter space. The Generalized Bayesian Theorem (GBT), an extension of Bayes' theorem within the TBM, consists in defining a belief function on Θ given an observation x ⊆ X, the set of conditional bba's mX[θi] over X (one for each θi ∈ Θ), and a vacuous a priori on Θ. Given this set of bba's (which can be associated with their related belief or plausibility functions), for x ⊆ X and ∀ A ⊆ Θ we have:

plΘ[x](A) = 1 − Π_{θi∈A} (1 − plX[θi](x)).    (18)
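Both the pignistic transformation (17) and the GBT (18) are simple enough to sketch; the representation and the illustrative plausibility values below are ours, and the example bba is the one of Table 1 (so BetPm(ω1) = 0.5):

```python
# Pignistic transformation (Eq. 17) and Generalized Bayesian Theorem
# (Eq. 18), sketched for bba's stored as dicts frozenset -> mass.

def betp(m, omega_k):
    """Pignistic probability of the singleton omega_k."""
    conflict = m.get(frozenset(), 0.0)
    total = sum(v / len(A) for A, v in m.items() if omega_k in A)
    return total / (1.0 - conflict)

def gbt_pl(pl_x, x, A):
    """pl_Theta[x](A) = 1 - prod over theta in A of (1 - pl_X[theta](x));
    pl_x maps each parameter theta to its conditional plausibility function."""
    p = 1.0
    for theta in A:
        p *= 1.0 - pl_x[theta](x)
    return 1.0 - p

# The bba of Table 1 (frame elements written 1, 2, 3)
m = {frozenset({1}): 0.2, frozenset({2}): 0.1,
     frozenset({1, 3}): 0.4, frozenset({1, 2, 3}): 0.3}
```

For instance, betp(m, 1) = 0.2 + 0.4/2 + 0.3/3 = 0.5, matching the BetPm column of Table 1, and gbt_pl is 1 on Θ itself whenever some conditional plausibility is 1.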
1.6. Uncertainty in DST

Because a belief function can represent several kinds of knowledge, it constitutes a rich and flexible way to represent uncertainty. As remarked by Klir [30], a belief function can model two different kinds of uncertainty: nonspecificity and conflict. A measure of nonspecificity, which generalizes the Hartley measure to belief functions, was introduced by Dubois and Prade [17]. It is defined as:

N(m) = Σ_{A⊆Ω} m(A) log2 |A|.    (19)

Since the focal elements of probability measures are singletons, nonspecificity is null for probability functions, and it is maximal (log2 |Ω|) for the vacuous belief function. Several measures of conflict, viewed as generalized Shannon entropy measures, have also been introduced [30]. One such measure is discord, defined as:

D(m) = − Σ_{A⊆Ω} m(A) log2 BetPm(A),    (20)

which is maximal (log2 |Ω|) for the uniform probability distribution on Ω. Finally, a measure Uλ of total uncertainty can be defined using a linear combination of N and D:

Uλ(m) = (1 − λ) N(m) + λ D(m),    (21)
where λ ∈ [0, 1] is a coefficient. The choice of λ is not theoretically justified (Klir recommends taking λ = 0.5). In the sequel, we shall see that it can be used as a regularization parameter and determined from learning data.

1.7. Refinement and Coarsening

Part of the flexibility of DST is due to the existence of justified mechanisms for changing the level of detail, or granularity, of the frame of discernment. In this section, we briefly recall the concepts of refinement and coarsening of a frame of discernment ([37], p. 115), which play a key role in the theory. Let Ω and Θ be two finite sets. A mapping ρ from 2^Θ to 2^Ω is called a refining if and only if it verifies:

ρ({θ}) ≠ ∅  ∀ θ ∈ Θ,
ρ({θ}) ∩ ρ({θ′}) = ∅  ∀ θ ≠ θ′,
∪_{θ∈Θ} ρ({θ}) = Ω.
In other terms, the sets ρ({θ}), θ ∈ Θ, constitute a partition of Ω. Θ is then called a coarsening of Ω, and Ω is called a refinement of Θ. Given a bba mΘ defined on Θ, we can define its vacuous extension (see [37], p. 146) mΩ on Ω by transferring each mass mΘ(A) to ρ(A), for every subset A of Θ:

mΩ(ρ(A)) = mΘ(A)  ∀ A ⊆ Θ.    (22)
Conversely, let mΩ be a bba on Ω. Transferring mΩ to Θ is not so easy because, for some B ⊂ Ω, there may exist no subset A of Θ such that ρ(A) = B. However, the restriction (or outer reduction) of mΩ may still be defined as:

mΘ(A) = Σ_{B⊆Ω | ρ(A)∩B≠∅} mΩ(B)  ∀ A ⊆ Θ.    (23)
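A sketch of both directions follows. The vacuous extension implements Eq. (22) directly; for the restriction we use one standard form of Shafer's outer reduction, mapping each B ⊆ Ω to θ̄(B) = {θ ∈ Θ : ρ({θ}) ∩ B ≠ ∅} (this mapping, and the example frames, are assumptions of ours):

```python
# Vacuous extension (Eq. 22) and a restriction via the outer-reduction
# mapping theta_bar(B) = {t in Theta : rho({t}) meets B} (a sketch).

def vacuous_extension(m_theta, rho):
    """Transfer each mass m_Theta(A) to rho(A) = union of rho({theta})."""
    out = {}
    for A, v in m_theta.items():
        B = frozenset().union(*(rho[t] for t in A)) if A else frozenset()
        out[B] = out.get(B, 0.0) + v
    return out

def outer_reduction(m_omega, rho, theta):
    """Restrict m_Omega to Theta: each B goes to theta_bar(B)."""
    out = {}
    for B, v in m_omega.items():
        A = frozenset(t for t in theta if rho[t] & B)
        out[A] = out.get(A, 0.0) + v
    return out

# Example: Theta = {'small', 'large'} refined into Omega = {1, 2, 3}
rho = {'small': frozenset({1, 2}), 'large': frozenset({3})}
```

Extending a bba and then reducing it back recovers masses on the smallest Θ-sets covering each focal element, which is why the outer reduction is a conservative (outer) approximation.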
1.8. Decision analysis

Let us assume that we have a bba m on Ω summarizing one's beliefs concerning the value of the unknown variable y, and that we have to choose an action from a finite set of actions A. A loss function λ : A × Ω → R is also assumed to be given, such that λ(a, ω) denotes the loss incurred if one chooses action a and y = ω. Which action should we choose? Based on the pignistic probability defined in equation (17), we can associate to each a ∈ A a risk, defined as the expected loss (relative to BetPm) if one chooses action a:

R(a) = Σ_{ω∈Ω} λ(a, ω) BetPm(ω).    (24)

We then choose the action with the lowest risk. Alternatively, the decision process could be based on non-probabilistic extensions of the concept of mathematical expectation [13]. For example, the concept of lower expectation leads to the definition of the lower expected loss as

R∗(a) = Σ_{A⊆Ω} m(A) min_{ω∈A} λ(a, ω),    (25)
which results in a different decision strategy. In pattern classification, Ω = {ω1 , . . . , ωK } is the set of classes, and the elements of A are, typically, the actions ak of assigning the unknown pattern to each class ωk . With 0-1 losses, defined as λ(ak , ωl ) = 1 − δk,l for k, l ∈ {1, . . . , K}, it can be shown [13] that the minimization of the pignistic risk R leads to choosing the class ω0 with maximum pignistic probability, whereas the minimization of R∗ leads to choosing the class ω∗ with maximum plausibility. If an additional rejection action a0 with constant loss λ0 is added, then the pattern is rejected if BetP (ω0 ) < 1 − λ0 using the first rule, and if pl(ω∗ ) < 1 − λ0 using the second rule [13].
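The two decision criteria (24) and (25) can be sketched side by side; with 0-1 losses, R(ak) = 1 − BetPm(ωk) and R∗(ak) = 1 − pl({ωk}), which is why the two rules select the maximum-pignistic-probability and maximum-plausibility classes respectively (the example bba is ours):

```python
# Pignistic risk (Eq. 24) and lower expected loss (Eq. 25) for a finite
# set of actions (a sketch).

def pignistic_risk(m, loss, action, omega):
    """Expected loss of `action` relative to BetPm over the frame omega."""
    conflict = m.get(frozenset(), 0.0)
    betp = {w: sum(v / len(A) for A, v in m.items() if w in A) / (1.0 - conflict)
            for w in omega}
    return sum(loss(action, w) * betp[w] for w in omega)

def lower_expected_loss(m, loss, action):
    """R*(action) = sum over focal sets of m(A) * min loss on A."""
    return sum(v * min(loss(action, w) for w in A) for A, v in m.items() if A)

# Example bba on Omega = {1, 2, 3}
m = {frozenset({1}): 0.2, frozenset({2}): 0.1,
     frozenset({1, 3}): 0.4, frozenset({1, 2, 3}): 0.3}
```

With the 0-1 loss and this bba, choosing class 1 gives R = 1 − BetPm(ω1) = 0.5 and R∗ = 1 − pl({ω1}) = 0.1.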
2. Case-Based and Likelihood-Based Approaches and Belief Decision Trees for Pattern Recognition Problems

2.1. The problem

Let us consider a population P of objects, each described by two variables: x, a vector of d attributes (features), quantitative, qualitative or mixed; and ω, a qualitative class variable taking values in the finite set Ω = {ω1, …, ωK}. The pattern recognition problem (discrimination, supervised learning, discriminant analysis) consists in assigning an input pattern x to a class, given a learning set L composed of n patterns xi with known classification. Each pattern in L is represented by a d-dimensional feature vector xi and its corresponding class label
ωi. In the last ten years, several solutions to this problem have been proposed based on belief functions theory [37,41]. Such approaches have been called evidential classifiers by their authors. An evidential classifier is a mapping f : R^d → Ω allowing the class ω of any new object described by feature vector x to be predicted, given an output belief function m̂Ω. The advantages of these techniques (description of the uncertainty of the prediction, the possibility of rejecting a pattern and of detecting an unknown class) have been demonstrated in numerous papers [2,14]. In particular, these classifiers are well adapted to applications where the available data come from multiple imperfect information sources (multisensor problems, environmental monitoring, medical diagnosis, classifier combination). The classifier output is a belief function, which allows:
• a more faithful description of uncertainty (greater flexibility to handle various sources of uncertainty such as imprecise or bad-quality data),
• a distinct representation of:
∗ ignorance (a pattern dissimilar from all training examples),
∗ conflicting information (a pattern similar to examples of different classes),
• greater robustness (decision procedures) and improved performance when combining several classifiers (e.g., sensor fusion),
• a reduced need for unjustified assumptions in situations of weak available information.
Furthermore, they offer the possibility of handling weak learning information, such as partial knowledge of the class of learning examples (e.g., o ∈ {ω1, ω3}, o ∉ ω2, …) and heterogeneous, non-exhaustive learning sets:
• a learning set L1 with objects from {ω1, ω2} and attributes xj, j ∈ J,
• a learning set L2 with objects from {ω2, ω3} and attributes xj, j ∈ J′ ≠ J.
The main approaches to pattern recognition (parametric, distance-based, tree-structured classifiers) can be transposed into the TBM framework.
The case-based approach (2.3), developed by Denœux, is an adaptation of the k-nearest-neighbor method, which allows a belief function to be computed based on the similarity of an object to training samples. It can be applied to build classifiers from training data, possibly with imprecise and/or uncertain class labels. The likelihood-based approach uses the Generalized Bayesian Theorem, which replaces the Bayesian theorem used for diagnosis (2.2). Finally, induction methods called belief decision trees, named for their links with belief function theory and decision trees, have been proposed (2.4). Such techniques make it possible to interpret each decision rule in terms of individual features.

2.2. Likelihood-Based Methods (LB)

Let us assume the class-conditional probability densities f(x|ωk) to be known. Having observed x, the likelihood function is a function from Ω to [0, +∞) defined as L(ωk|x) = f(x|ωk), for all k ∈ {1, …, K}. Shafer [37, p. 238] proposed to derive from L a belief function on Ω defined by its plausibility function as:
pl(A) = max_{ωk∈A} L(ωk|x) / max_k L(ωk|x)  ∀ A ⊆ Ω.    (26)
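Eq. (26) is a one-liner once the likelihoods are available (a sketch; the likelihood values in the test are illustrative):

```python
# Consonant likelihood-based (CLB) plausibility of Eq. (26): pl(A) is
# the maximal likelihood over A, relative to the overall maximum.

def clb_pl(likelihoods, A):
    """likelihoods: dict class -> L(class | x); A: iterable of classes."""
    top = max(likelihoods.values())
    return max(likelihoods[w] for w in A) / top
```

Note that pl equals 1 on any set containing the most likely class, and pl is monotone under set inclusion, reflecting the nested (consonant) focal elements mentioned below.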
In pattern recognition, an application of this method (and a variant thereof) can be found in Ref. [28]. Note that pl defined by (26) is consonant, i.e., its focal elements are nested. For that reason, this first model will be called the "consonant likelihood-based" (CLB) model. Starting from axiomatic requirements, Appriou [2] proposed another method based on the construction of K belief functions mk(·). The idea consists in taking each class into account separately and evaluating the degree of belief given to each of them. In this case, the focal elements of each bba mk are the singleton {ωk}, its complement ω̄k, and Ω. Appriou actually obtained two different models with similar performances [22]. According to Appriou, one of these models seems preferable on theoretical grounds, because it is consistent with the generalized Bayes theorem introduced by Smets [39]. This model, hereafter referred to as the Separable Likelihood-Based (SLB) method, has the following expression:

mk({ωk}) = 0,    (27)
mk(ω̄k) = αk (1 − R·L(ωk|x)),    (28)
mk(Ω) = 1 − αk (1 − R·L(ωk|x)),    (29)
where αk is a coefficient that can be used to model external information such as sensor reliability, and R is a normalizing constant that can take any value in the range (0, (max_k L(ωk|x))^{−1}]. The parameter R is somewhat arbitrary, but the principle of maximum uncertainty leads to choosing the largest allowed value, which results in the least specific bba. With these K belief functions and using Dempster's rule of combination, a unique belief function m is obtained as m = ⊕k mk.

2.3. Distance-Based Method (DB)

A totally different approach was introduced by Denœux [12]. In this method, a bba is constructed directly, using as a source of information the training patterns xi situated in the neighborhood of the pattern x to be classified. If nearest neighbors (according to some distance measure) are considered, we thus obtain several bba's that are combined using Dempster's rule of combination. The initial method was later refined to allow parameter optimization [45], and a neural-network-like version was recently proposed [14]. This version, which will be considered here, uses a set of prototypes that are determined so as to minimize an error function. Each prototype can be viewed as a piece of evidence that influences the belief concerning the membership class of x. A belief function mi associated with each prototype i is then defined for all k ∈ {1, ···, K} as:

mi({ωk}) = αi φi(di),    (30)
mi(Ω) = 1 − αi φi(di),    (31)
mi(A) = 0  ∀ A ∈ 2^Ω \ {{ωk}, Ω},    (32)
P. Vannoorenberghe / Belief Functions Theory for Multisensor Data Fusion
where d_i is the Euclidean distance to the i-th prototype, α_i is a parameter attached to prototype i, and φ_i(.) is a decreasing function defined as φ_i(d_i) = exp[−γ_i d_i²], γ_i being a positive parameter associated with prototype i. The belief functions m_i for each prototype are then aggregated using Dempster's rule of combination.

2.4. Belief decision tree (BDT)

In this paper, we only consider the Belief Decision Tree approach introduced by Denœux and Skarstein-Bjanger [15] and extended to multiclass problems by Vannoorenberghe et al. [42]. Thanks to its ability to represent different kinds of knowledge (from total ignorance to full knowledge), DST allows us to process training sets whose labeling has been specified with belief functions (see 2.4.1). An impurity measure, based on a total uncertainty criterion, is used to grow the tree and has the advantage of simultaneously defining the pruning strategy (2.4.2). Finally, we present in paragraph 2.4.3 a multi-class generalization of the method introduced in [15], which allows us to handle the most general case, in which each example is labeled by a general belief function [42].

2.4.1. Principle

A decision tree is a specific graph in which each node is either a decision node or a leaf node. To each decision node is associated a test based on attribute values, and a node has two or more successors (depending on the number of possible outcomes of the test). The most commonly used decision tree classifiers are binary trees, which use a single feature at each node with two outcomes. In [15], the problem of handling uncertain labels is solved for two-class problems. In this context, the available learning set is given by L = {(x_i, m_i^Ω) | i = 1, · · · , n}, where m_i^Ω is defined on Ω = {ω1, ω2} and represents the knowledge on the label of the i-th example.
The belief function m^Ω[t] at node t is then derived from the n(t) belief functions m_i^Ω (by induction, using Dempster's rule of combination) and becomes:

m^Ω[t]({ω1}) = Σ_{(j,k) | j+k ≤ n(t)} α_{jk} · j/(j + k + 1),   (33)
where the α_{jk} are coefficients which depend only on the functions m_i^Ω. Similar expressions for m^Ω[t]({ω2}) and m^Ω[t](Ω) can be obtained. In equation (33), n(t) is the total number of examples reaching node t. These equations are derived from a theoretical result on credal inference presented by Smets in [40]. Demonstrations concerning the extension to the more general case of belief functions have been proposed by Denœux and can be found in [15] and [42].

2.4.2. Induction

For each node t, an impurity measure is computed from the belief function m^Ω[t] using the total uncertainty measure:

U_λ(t) = (1 − λ) N(m^Ω[t]) + λ H(m^Ω[t]).   (34)
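In Python, with bbas encoded as dictionaries from frozensets of class indices to masses (an assumed encoding), the impurity (34) and the split criterion (35) that follows can be sketched as below. Here N is taken to be the nonspecificity measure Σ_A m(A) log2|A| and H the Shannon entropy of the pignistic distribution, which are common choices; the paper's own definitions of N and H, given earlier in the text, are the authoritative ones.

```python
import math

def betp(m, K):
    """Pignistic probability distribution of a bba m over Omega = {0,..,K-1}.
    m maps frozensets of class indices to masses."""
    p = [0.0] * K
    for A, v in m.items():
        for w in A:
            p[w] += v / len(A)
    return p

def total_uncertainty(m, K, lam):
    """U_lambda = (1 - lam) * N(m) + lam * H(m), cf. eq. (34).
    N: nonspecificity; H: entropy of BetP (illustrative choices)."""
    N = sum(v * math.log2(len(A)) for A, v in m.items() if A)
    H = -sum(p * math.log2(p) for p in betp(m, K) if p > 0.0)
    return (1.0 - lam) * N + lam * H

def impurity_decrease(m_t, m_left, m_right, p_left, K, lam):
    """Goodness of a split, cf. eq. (35):
    Delta U = U(t) - (pL * U(tL) + pR * U(tR))."""
    return (total_uncertainty(m_t, K, lam)
            - p_left * total_uncertainty(m_left, K, lam)
            - (1.0 - p_left) * total_uncertainty(m_right, K, lam))
```

For a vacuous bba on a two-class frame both N and H equal one bit, so U_λ = 1 for any λ; for a categorical bba both vanish, so a split separating a vacuous node into two pure children has the maximal gain.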
This impurity measure is used at node t to choose a candidate split s which divides t into two nodes t_L and t_R. The goodness of a split s is defined as the decrease in impurity:

ΔU_λ(s, t) = U_λ(t) − (p_L U_λ(t_L) + p_R U_λ(t_R)),   (35)
where p_L and p_R are, respectively, the proportions of examples reaching t_L and t_R. The best split ŝ is chosen by testing all possible splits for each attribute. One of the advantages of this technique is that the tree growing can be controlled using the parameter λ. In fact, according to the value of λ, it is possible to give more importance to the non-specificity term, which penalizes small nodes. Optimizing this parameter by cross-validation allows us to build smaller trees, thus avoiding overtraining. Unfortunately, this induction method is only available for two-class problems, but it can be generalized as explained in the next section.

2.4.3. Dichotomous approach for K-class problems

A standard way of handling a K-class problem is to decompose it into several 2-class subproblems. One way to do this is to train K binary classifiers, each classifier attempting to discriminate between one class ω_k and all other classes. When the learning set is of the form L = {(x_i, m_i^Ω) | i = 1, · · · , n}, where m_i^Ω is a bba defined on Ω, this approach implies transforming each bba m_i^Ω originally defined on Ω into a bba defined on a 2-class coarsened frame. For each coarsening, a tree is grown, and the resulting K trees are combined using the averaging operator. More precisely, let us denote by Ω_k the following coarsening of Ω:

Ω_k = {{ω_k}, {ω̄_k}},   (36)
where {ω̄_k} denotes the complement of {ω_k}. Each bba m_i^Ω defined on Ω may be transformed into a bba m_i^{Ω_k} on Ω_k using the following transformation:

m_i^{Ω_k}({ω_k}) = m_i^Ω({ω_k}),   (37)
m_i^{Ω_k}({ω̄_k}) = Σ_{A ⊆ {ω̄_k}} m_i^Ω(A),   (38)
m_i^{Ω_k}(Ω_k) = 1 − m_i^{Ω_k}({ω_k}) − m_i^{Ω_k}({ω̄_k}).   (39)
Each of the K coarsenings thus leads to a training set L_k = {(x_i, m_i^{Ω_k}) | i = 1, · · · , n}, which is used to build a decision tree. At the testing step, we obtain, for each input vector x, K bba's m_x^{Ω_k}, each defined on a distinct coarsening Ω_k. Each of these bba's can be trivially carried back to Ω using the transformation:
m_{x,k}^Ω({ω_k}) = m_x^{Ω_k}({ω_k}),   (40)
m_{x,k}^Ω(Ω \ {ω_k}) = m_x^{Ω_k}({ω̄_k}),   (41)
m_{x,k}^Ω(Ω) = m_x^{Ω_k}(Ω_k).   (42)
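With classes encoded as integers 0, …, K−1 and bbas as dictionaries mapping frozensets to masses (an assumed encoding), the coarsening (37)-(39) and the carrying-back (40)-(42) can be sketched as:

```python
def coarsen(m, k):
    """Eqs. (37)-(39): transform a bba on Omega into a bba on the
    two-element coarsened frame Omega_k = {omega_k, not-omega_k}."""
    pos = m.get(frozenset({k}), 0.0)                        # mass of the singleton {omega_k}
    neg = sum(v for A, v in m.items() if A and k not in A)  # subsets of the complement
    return {'k': pos, 'not_k': neg, 'Omega_k': 1.0 - pos - neg}

def carry_back(mk, k, K):
    """Eqs. (40)-(42): carry a bba on Omega_k back to Omega."""
    omega = frozenset(range(K))
    return {frozenset({k}): mk['k'],
            omega - {k}: mk['not_k'],
            omega: mk['Omega_k']}
```

Note that the round trip coarsen-then-carry-back loses specificity in general: all subsets of the complement of {ω_k} are pooled on Ω \ {ω_k}, which is why the K coarsenings are used jointly rather than any single one alone.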
Because the information sources are not independent, Dempster's rule of combination cannot be used to combine the bba's m_{x,k}^Ω, k = 1, . . . , K. An alternative is to use the weighted averaging operator, as previously explained, which leads to:

m_x^Ω = Σ_{k=1}^{K} w_k m_{x,k}^Ω,   (43)
where the w_k are coefficients to be optimized. This dichotomous approach to Belief Decision Trees allows us to quantify the uncertainty of the prediction for a vector x (the belief function m_x^Ω itself), to process learning sets whose labeling has been specified with belief functions (m_i^Ω for each learning example), and it is applicable to K-class pattern recognition problems.

2.5. Parameter optimization or 'tuning the discounting'

In the application of the LB methods, the first difficulty concerns the estimation of the likelihood functions. Several density estimation techniques can be used, including parametric methods based, e.g., on a Gaussian model, and nonparametric kernel methods. In the simulations presented in the sequel, we chose to use a Gaussian mixture model together with the EM algorithm as an estimation technique [34]. As remarked by Bastière [5], there is no general technique for evaluating the discounting coefficients α_k in the separable method. In this paper, we propose to use the same approach as used by Denœux [14] for the DB method, i.e., minimizing the following error criterion:

E(α) = Σ_{i=1}^{n} Σ_{k=1}^{K} (BetP^i(ω_k) − u_{ik})²,   (44)
where u_{ik} is the class indicator of pattern x_i (u_{ik} = 1 if ω^i = ω_k), and BetP^i(ω_k) is the pignistic probability of ω_k for vector x_i. In the same manner, it is possible to define an error criterion E∗ based on the plausibility function, where BetP is replaced with pl.

2.6. Simulations

For the following simulations, a learning set L was generated using 3 classes containing 50 bidimensional vectors each. Each vector x from class k was generated by first drawing a vector z from a Gaussian f(z|ω_k) ∼ N(μ_k, Σ_k), and then applying a nonlinear transformation z → x = exp(0.3 z) to obtain non-Gaussian data. The means of the 3 Gaussian distributions were taken as μ1 = (−1, −1), μ2 = (1, 2), μ3 = (−1.5, 2), and the covariance matrices were of the form Σ_k = D_k A D_kᵀ with

A = [ √3     0   ]
    [ 0    √3/3 ],

D_k = [ cos θ_k   −sin θ_k ]
      [ sin θ_k    cos θ_k ],

and θ1 = π/3, θ2 = π/2, θ3 = −π/3.
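A data set of this form can be reproduced as follows. This is a sketch under one assumption: A is read as the matrix of covariance eigenvalues, so standard normal draws are scaled by the square roots of its diagonal entries before being rotated by D_k.

```python
import math, random

MU = [(-1.0, -1.0), (1.0, 2.0), (-1.5, 2.0)]
THETA = [math.pi / 3, math.pi / 2, -math.pi / 3]
A_DIAG = (3 ** 0.5, 3 ** 0.5 / 3)   # assumed eigenvalues of the covariance matrix

def sample_class(k, n, rng=random):
    """Draw n points of class k: z ~ N(mu_k, D_k A D_k^T), followed by the
    componentwise nonlinear map z -> exp(0.3 z)."""
    c, s = math.cos(THETA[k]), math.sin(THETA[k])
    pts = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0) * math.sqrt(A_DIAG[0])
        v = rng.gauss(0.0, 1.0) * math.sqrt(A_DIAG[1])
        z1 = MU[k][0] + c * u - s * v   # rotate by theta_k, shift by mu_k
        z2 = MU[k][1] + s * u + c * v
        pts.append((math.exp(0.3 * z1), math.exp(0.3 * z2)))
    return pts

learning_set = [(x, k) for k in range(3) for x in sample_class(k, 50)]
```

Because of the exponential map, all generated features are strictly positive and the class-conditional densities are no longer Gaussian.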
[Grey-level map titled "Belief Decision Tree − Maximum pignistic probability" over the (Feature x1, Feature x2) plane; grey scale from 0.35 to 0.65.]
Figure 1. Maximum pignistic probabilities for BDT, (learning samples with ω1 = ×, ω2 = ◦, ω3 = +).
We first demonstrate the qualitative effects using the first learning set L1. Figure 1 shows the maximum pignistic probabilities as grey values for the BDT. For each vector x, this value is obtained as

max_{ω_k ∈ Ω} BetP_{m_x}(ω_k),   (45)
where m_x corresponds to the output belief function. The decision regions for the CLB and DB methods with the two decision rules are shown in Figures 2 and 3 (the decision regions for the SLB method are rather similar to those of the CLB method and are consequently not shown here, for lack of space). In these figures, mixture component centers and prototypes are represented as asterisks (∗). For the LB methods, likelihood functions were estimated using a Gaussian mixture model with k = 2 modes per class, and the parameters were estimated by the EM algorithm [34]. For the DB method, we chose by analogy two prototypes per class, whose locations were initialized using the c-means algorithm. The value of the rejection cost λ0 was set at 0.4. The specific form of the belief functions for the CLB and SLB methods imposes that max_k pl({ω_k}) = 1. For this reason, only the DB method allows patterns to be rejected under the maximum plausibility decision rule. As can be seen from these figures, both the inference method and the decision rule have a dramatic influence on the shape of the decision regions.

Figure 2. Decision regions for the CLB method (Shafer) with R (left) and R∗ (right) for rejection loss λ0 = 0.4 (ω1 = ×, ω2 = ◦, ω3 = +).

Figure 3. Decision regions for the DB method (Denœux) with R (left) and R∗ (right) for rejection loss λ0 = 0.4 (ω1 = ×, ω2 = ◦, ω3 = +).

To compare the performances of these models, a test set T of 15,000 samples was generated from the same distribution as L. The experiment was repeated ten times with independent training sets. The number of components in the mixture model (for the LB methods) and the number of prototypes (for the DB method) were optimized using a cross-validation set. The left part of Fig. 4 shows the error rate vs. the reject rate for the 3 methods and the 2 decision rules. For the CLB and SLB methods associated with the maximum plausibility decision rule, there are no rejected patterns. For this data set, all the proposed models obtain comparable performances. However, the DB model yields lower error rates than the LB models without rejection. Moreover, if the classes have different prior probabilities, this gain is further increased. To demonstrate the robustness of these methods, the test set T was then contaminated with 1,500 outliers with uniform distribution and random class labels. The right part of Fig. 4 presents the error rates of the different methods as functions of the rejection rate. The most robust combination seems to be the DB method with the maximum pignistic probability rule. This observation is easily explained by the shapes of the decision regions.
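To make the setup concrete, the DB classifier and the maximum pignistic probability rule with rejection can be sketched as follows. This is only an illustrative sketch: the rejection test "reject when 1 − max BetP exceeds λ0" is one common reading of the pignistic-risk rule, and the prototypes, weights α and scales γ below are hand-picked rather than optimized as in the simulations.

```python
import math

def prototype_bba(x, proto, label, alpha, gamma, K):
    """Eqs. (30)-(32): mass alpha*exp(-gamma*d^2) on the prototype's class,
    the rest on Omega."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, proto))
    s = alpha * math.exp(-gamma * d2)
    return {frozenset({label}): s, frozenset(range(K)): 1.0 - s}

def dempster(m1, m2):
    """Dempster's rule: conjunctive combination plus normalization."""
    out, conflict = {}, 0.0
    for A, v in m1.items():
        for B, w in m2.items():
            C = A & B
            if C:
                out[C] = out.get(C, 0.0) + v * w
            else:
                conflict += v * w
    return {A: v / (1.0 - conflict) for A, v in out.items()}

def betp(m, K):
    """Pignistic transformation BetP of a normalized bba."""
    p = [0.0] * K
    for A, v in m.items():
        for w in A:
            p[w] += v / len(A)
    return p

def classify(x, prototypes, K, lam0):
    """Combine the prototype bbas, then apply the maximum pignistic rule,
    rejecting (None) when 1 - max BetP > lam0."""
    m = {frozenset(range(K)): 1.0}          # vacuous initial belief
    for proto, label, alpha, gamma in prototypes:
        m = dempster(m, prototype_bba(x, proto, label, alpha, gamma, K))
    p = betp(m, K)
    best = max(range(K), key=p.__getitem__)
    return best if 1.0 - p[best] <= lam0 else None
```

A pattern near a prototype is assigned to that prototype's class; a pattern far from every prototype yields a nearly vacuous bba, an almost uniform BetP, and is therefore rejected.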
3. Applications for Multisensor Data Fusion

We now provide practical examples of how to use the previous models in detection-recognition problems. These models have attractive properties: they make it possible to quantify the belief that none of the original hypotheses is supported, to state that the values of some 'likelihoods' are unknown, and to accept an a priori belief that genuinely represents total ignorance. We survey several applications where belief functions have been successfully applied.
[Error rate vs. reject rate curves for Denoeux RBet, Appriou RBet, Shafer RBet and Denoeux RInf; left: without outliers, right: with outliers.]
Figure 4. Test error rate vs. rejection rate for the three methods with the two decision rules without (left) and with outliers (right)
3.1. Sensors on partially overlapping frames

Problem 1. A first sensor S1 has been trained to recognize objects in the frame Ω1 = {ω1, ω2}, and a second sensor S2 has been trained to recognize objects in the frame Ω2 = {ω2, ω3}. Suppose that a new object O is presented to the two sensors. Sensors S1 and S2 express their beliefs m_1^{Ω1} and m_2^{Ω2}, the first on the frame Ω1, the second on the frame Ω2. How can these two beliefs be combined on a common frame Ω = {ω1, ω2, ω3}? Sensor S1 (respectively S2) never saw an ω3 (ω1) object, and we know nothing about how S1 (S2) would react if it looked at an ω3 (ω1) object. A solution has been proposed by Smets within the TBM, based on the following idea: if both m_1^{Ω1} and m_2^{Ω2} are conditioned on ω2 and combined by the conjunctive combination rule, the resulting belief function should be the same as the one obtained after 'combining' the original m_1^{Ω1} and m_2^{Ω2} on Ω and conditioning the result on ω2. The problem is of course how to combine m_1^{Ω1} and m_2^{Ω2}, since the two belief mass functions are not defined on compatible frames of discernment, so Dempster's rule of combination cannot be applied directly. The solution is as follows. Let Ω′ = Ω1 ∩ Ω2, and let m_1^Ω, m_2^Ω be the basic belief assignments obtained by vacuous extension of m_1^{Ω1} and m_2^{Ω2} on Ω. The result of the combination of m1 and m2 is given, for all A ⊆ Ω1 ∪ Ω2, by

m(A) = [ m_1^{Ω1}(A1) · m_2^{Ω2}(A2) / ( m_1^Ω[ω2](A0) · m_2^Ω[ω2](A0) ) ] · (m_1^Ω[ω2] ∩ m_2^Ω[ω2])(A1 ∩ A2),   (46)
where A0 = A ∩ Ω′, A1 = A ∩ Ω1 and A2 = A ∩ Ω2. Table 2 illustrates the computation of m. We have (m_1^Ω[ω2] ∩ m_2^Ω[ω2])({ω2}) = (0.1 + 0.3)·(0.7 + 0.1) = 0.32. This mass is distributed over {ω2}, {ω1, ω2}, {ω2, ω3} and Ω according to the ratios (0.1/0.4)·(0.7/0.8), (0.3/0.4)·(0.7/0.8), (0.1/0.4)·(0.1/0.8) and (0.3/0.4)·(0.1/0.8). The mass (m_1^Ω[ω2] ∩ m_2^Ω[ω2])(∅) = 0.68 is given to {ω1, ω3}. In this example, sensor S1 supports that object O is an ω1, whereas the second claims that O is an ω2. If O had been an ω2, how come the first sensor did not say so? So the second sensor is probably facing an ω1 and just states ω2 because it does not know what an ω1 is. The most plausible solution is therefore O = ω1, which is confirmed by BetPm, as it is largest for ω1.

A ⊆ Ω     | m_1^{Ω1} | m_2^{Ω2} | m_1^Ω | m_2^Ω | m_1^Ω[ω2] | m_2^Ω[ω2] | m_{1∩2}^Ω[ω2] |  m   | BetPm
∅         |   0.0    |   0.0    |  0.0  |  0.0  |   0.6     |   0.2     |     0.68      | 0.00 |   -
{ω1}      |   0.6    |    -     |  0.0  |  0.0  |   0.0     |   0.0     |     0.00      | 0.00 | 0.455
{ω2}      |   0.1    |   0.7    |  0.0  |  0.0  |   0.4     |   0.8     |     0.32      | 0.07 | 0.190
{ω3}      |    -     |   0.2    |  0.0  |  0.0  |   0.0     |   0.0     |     0.00      | 0.00 | 0.355
{ω1, ω2}  |   0.3    |    -     |  0.0  |  0.7  |   0.0     |   0.0     |     0.00      | 0.21 |   -
{ω1, ω3}  |    -     |    -     |  0.6  |  0.2  |   0.0     |   0.0     |     0.00      | 0.68 |   -
{ω2, ω3}  |    -     |   0.1    |  0.1  |  0.0  |   0.0     |   0.0     |     0.00      | 0.01 |   -
Ω         |    -     |    -     |  0.3  |  0.1  |   0.0     |   0.0     |     0.00      | 0.03 |   -

Table 2. Example of belief computation for two sensors on partially overlapping frames.

3.2. Data association problem

Multisensor systems are characterized by specific features that must be taken into account. While the different sensors observe the same scene, or at least partially overlapping fields of view, they may have different resolutions, accuracies and points of view. The usual functions requested from multisensor systems are detection, localization and recognition of the objects that may be present in the observed area. In most surveillance applications, the sensors are spread inside or around the area to be observed. For this reason, the association problem is complex and may result in a highly combinatorial problem. To illustrate this complexity, we propose an example which could easily be translated into a battlefield surveillance problem.

Problem 2. Given five sensors that can locate targets in an observed scene, how many targets are there, which sensor is associated with which target, and where are the targets? We assume that the five sensors, denoted Si for i = 1, · · · , 5, observe the same scene, which is composed of non-overlapping resolution cells whose positions take values in a set Ω. When a sensor detects a target, it reports the detection and the position of the resolution cell where the detection was made. We assume that the location precision of each sensor is much better than the size of a resolution cell, so that in this example there is no localization error. Besides, we know the confidence or reliability of each sensor, which we define as the 'probability that the sensor is in working condition', assuming that when the sensor is in working condition, what it states is true. Let us consider the very simple case where sensors S1 and S2 locate a target in resolution cell c1, while sensors S3, S4 and S5 locate a target in cell c2.
The questions are therefore:
1. How many targets are we detecting? (a detection problem)
2. Where are the targets located? (a localization problem)
3. Which sensor has detected which target? (an association problem)
Sensor S        |  S1  |  S2  |  S3  |  S4  |  S5  | {Si}_{i=1}^{5}
m_F^Ω[S](∅)     |  -   |  -   |  -   |  -   |  -   |     0.925
m_F^Ω[S]({c1})  | 0.7  | 0.8  |  -   |  -   |  -   |     0.015
m_F^Ω[S]({c2})  |  -   |  -   | 0.6  | 0.6  | 0.9  |     0.059
m_F^Ω[S](Ω)     | 0.3  | 0.2  | 0.4  | 0.4  | 0.1  |     0.001

Table 3. Target association problem.
If the sensors were perfect, we would conclude that there are two targets, one in c1 and one in c2, that sensors S1 and S2 report on the target located in c1, whereas sensors S3, S4 and S5 report on the target located in c2. We consider how these conclusions could be reached once uncertainty is introduced. Each sensor produces a belief function with a mass 1 allocated to resolution cell c1 (sensors S1 and S2) or c2 (sensors S3, S4 and S5). So we have m_1^Ω({c1}) = m_2^Ω({c1}) = m_3^Ω({c2}) = m_4^Ω({c2}) = m_5^Ω({c2}) = 1. For example, sensor S1 claims that there is an object in c1 and sensor S4 claims there is one in c2. The fusion unit F, whose function is to integrate the data, collects each of these five basic belief assignments and discounts them with the confidence that F gives to each sensor. Each discounted basic belief assignment expresses the belief held by F that there is an object in the reported cell; it results from what the sensor states and from F's opinion about the reliability of that sensor. The resulting discounted belief functions are given in Table 3. We will now consider in turn the hypotheses that these measurements resulted from either one target or two targets.

Case 1: One target. This means that all the declarations refer to the same event. Therefore, the fusion unit F combines the five belief functions:

m_F^Ω[{Si}_{i=1}^{5}] = ∩_{i=1}^{5} m_F^Ω[Si].   (47)
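The combination can be reproduced directly: each categorical report is discounted by the sensor's reliability, and the five discounted bbas are combined conjunctively (a sketch using the reliabilities of Table 3).

```python
from functools import reduce

def conjunctive(m1, m2):
    """Unnormalized conjunctive combination; the mass left on the empty
    set measures the conflict between the sources."""
    out = {}
    for A, v in m1.items():
        for B, w in m2.items():
            C = A & B
            out[C] = out.get(C, 0.0) + v * w
    return out

OMEGA = frozenset({'c1', 'c2'})

def discounted_report(cell, reliability):
    """Sensor report 'target in cell' discounted by its reliability r:
    mass r on {cell}, mass 1 - r on Omega."""
    return {frozenset({cell}): reliability, OMEGA: 1.0 - reliability}

reports = [('c1', 0.7), ('c1', 0.8), ('c2', 0.6), ('c2', 0.6), ('c2', 0.9)]
bbas = [discounted_report(c, r) for c, r in reports]
m_all = reduce(conjunctive, bbas)
```

Running this reproduces the last column of Table 3: m(∅) ≈ 0.925, m({c1}) ≈ 0.015, m({c2}) ≈ 0.059 and m(Ω) ≈ 0.001.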
The result is given in Table 3, where the value 0.925 obtained for the empty set reflects a high degree of conflict between the measurements given by the five sensors.

Case 2: Two targets. Now let us assume that there are two targets, so that some sensor measurements may refer to one target and the others to the other target. Schubert's idea [6,36] is to cluster the sensors whose measurements are compatible, i.e., refer to the same target. As the hypothesis is that there are two targets, the set of five measurements is to be partitioned into two clusters, denoted X1 and X2. Table 4 presents the masses given to the conflict when the five sensors are grouped into two clusters (clusters are denoted by the indices of the sensors they contain, e.g. 12 stands for {S1, S2}). The least conflicting solution is X1 = {S1, S2} and X2 = {S3, S4, S5}, which has the smallest internal conflict (m_F^Ω[X1](∅) = m_F^Ω[X2](∅) = 0). We accept the heuristic that the number of targets should be kept as small as possible: the presence of two targets is sufficient to explain the data. Of course, there might be three or more targets, but as long as the data can be explained by the presence of two targets, that hypothesis is accepted. We can therefore conclude, without conflict, that there are two targets. One target is located in c1 with belief 0.94, and it is observed by sensors S1 and S2. The other one is located in c2 with belief 0.984, and it is observed by sensors S3, S4 and S5.

Cluster X1 | Cluster X2 | m_F^Ω[X1](∅) | m_F^Ω[X2](∅)
1          | 2345       | 0.00         | 0.787
2          | 1345       | 0.00         | 0.689
3          | 1245       | 0.00         | 0.902
4          | 1235       | 0.00         | 0.902
5          | 1234       | 0.00         | 0.790
12         | 345        | 0.00         | 0.000
13         | 245        | 0.42         | 0.768
14         | 235        | 0.42         | 0.768
15         | 234        | 0.63         | 0.672
23         | 145        | 0.48         | 0.672
24         | 135        | 0.48         | 0.672
25         | 134        | 0.72         | 0.588
34         | 125        | 0.00         | 0.846
35         | 124        | 0.00         | 0.564
45         | 123        | 0.00         | 0.564

Table 4. Basic belief mass given to ∅ by F after combining the bba's within each cluster X1 and X2.
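The search for the least conflicting bipartition of Table 4 can be sketched as follows, reusing the discounted bbas of Table 3 (a sketch; sensors are indexed 0-4).

```python
from functools import reduce
from itertools import combinations

OMEGA = frozenset({'c1', 'c2'})
reports = [('c1', 0.7), ('c1', 0.8), ('c2', 0.6), ('c2', 0.6), ('c2', 0.9)]
bbas = [{frozenset({c}): r, OMEGA: 1.0 - r} for c, r in reports]

def conjunctive(m1, m2):
    """Unnormalized conjunctive combination of two bbas."""
    out = {}
    for A, v in m1.items():
        for B, w in m2.items():
            out[A & B] = out.get(A & B, 0.0) + v * w
    return out

def internal_conflict(cluster):
    """Mass given to the empty set by the conjunctive combination of the
    discounted bbas of the sensors in the cluster."""
    m = reduce(conjunctive, (bbas[i] for i in cluster))
    return m.get(frozenset(), 0.0)

# Enumerate the bipartitions of the five sensors (X1 the smaller part)
# and keep the one with the smallest total internal conflict.
partitions = [(set(X1), set(range(5)) - set(X1))
              for r in (1, 2) for X1 in combinations(range(5), r)]
best = min(partitions,
           key=lambda p: internal_conflict(p[0]) + internal_conflict(p[1]))
```

Here `best` is ({0, 1}, {2, 3, 4}), i.e., X1 = {S1, S2} and X2 = {S3, S4, S5}, with zero conflict inside each cluster, matching the boxed row of Table 4.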
4. Concluding Remarks

This paper has focused on belief functions theory for multisensor data fusion. In this context, belief functions have shown their ability to model uncertain information while offering a suitable set of tools in a federative framework which includes other uncertainty theories. In the second part of this paper, they have been used for classification and pattern recognition tasks. From these evidential classifiers, we can draw several conclusions:
• The output belief functions take very different forms depending on the method studied (more or less specific, consonant or not); consequently, the uncertainty related to the prediction is not represented in the same manner.
• All the proposed models (except the LB methods with the maximum plausibility decision rule) obtain comparable performances in the case of "standard" data; however, the DB method associated with pignistic risk minimization seems to be more robust to outliers than the other methods.
Although these conclusions cannot be blindly generalized to all classification tasks, they seem sufficiently explicit to guide the choice of a model. For multisensor data fusion, several advantages may be expected from belief functions, such as the ability to handle a wider set of situations and improved discrimination capacity. Finally, practical examples of how to use the previous models in detection-recognition problems have been introduced, such as the management of heterogeneous frames of discernment and the data association problem.
References

[1] M.A. Abidi and R.C. Gonzalez. Data Fusion in Robotics and Machine Intelligence, chapter 4: Multisensor strategies using Dempster-Shafer belief accumulation, pages 165–210. Academic Press, Inc., 1992.
[2] A. Appriou. Aggregation and Fusion of Imperfect Information, chapter Uncertain Data Aggregation in Classification and Tracking Processes, pages 231–260. Physica-Verlag, 1998.
[3] A. Appriou. Multisensor signal processing in the framework of the theory of evidence. In Application of Mathematical Signal Processing Techniques to Mission Systems, pages (5-1)–(5-31). Research and Technology Organization (Lecture Series 216), November 1999.
[4] Y. Bar-Shalom and X.R. Li. Multitarget-Multisensor Tracking: Principles and Techniques. Storrs, CT: YBS Publishing, 1995.
[5] A. Bastière. Methods for multisensor classification of airborne targets integrating evidence theory. Aerospace Science and Technology, 2(6):401–411, 1998.
[6] M. Bengtsson and J. Schubert. Dempster-Shafer clustering using Potts spin mean field theory. Soft Computing, 5(3):215–228, 2001.
[7] I. Bloch. Some aspects of Dempster-Shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account. Pattern Recognition Letters, 17:905–919, 1996.
[8] L. Cholvy. About Merged Information. In D. Dubois and H. Prade, editors, Handbook of Defeasible Reasoning and Uncertainty Management Systems, volume 3, pages 233–263. Kluwer Acad. Publ., Dordrecht, 1998.
[9] G. Choquet. Théorie des capacités. Annales de l'Institut Fourier, 5:131–295, 1954.
[10] M. Daniel. Distribution of Contradictive Belief Masses in Combination of Belief Functions. In B. Bouchon-Meunier, R.R. Yager, and L.A. Zadeh, editors, Information, Uncertainty and Fusion, pages 431–446. Kluwer Academic Publishers, 2000.
[11] A. Dempster. Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, AMS-38:325–339, 1967.
[12] T. Denœux.
A k-nearest neighbour classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, 25(5):804–813, 1995.
[13] T. Denœux. Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognition, 30(7):1095–1107, 1997.
[14] T. Denœux. A neural network classifier based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 30(2):131–150, 2000.
[15] T. Denœux and M. Skarstein Bjanger. Induction of decision trees from partially classified data using belief functions. In Proceedings of SMC'2000, pages 2923–2928, Nashville, USA, 2000. IEEE.
[16] J. Desachy, L. Roux, and E. Zahzah. Numeric and symbolic data fusion: A soft computing approach to remote sensing images analysis. Pattern Recognition Letters, 17:1361–1378, 1996.
[17] D. Dubois and H. Prade. A note on measures of specificity for fuzzy sets. International Journal of General Systems, 10:279–283, 1985.
[18] D. Dubois and H. Prade. On the unicity of Dempster's rule of combination. International Journal of Intelligent Systems, 1:133–142, 1986.
[19] D. Dubois and H. Prade. A set-theoretic view of belief functions. International Journal of General Systems, 12:193–226, 1986.
[20] D. Dubois and H. Prade. Representation and combination of uncertainty with belief functions and possibility measures. Computational Intelligence, 4:244–264, 1988.
[21] D. Dubois and H. Prade. Combination of Fuzzy Information in the framework of Possibility Theory. Data Fusion in Robotics and Machine Intelligence, pages 481–505, 1992.
[22] S. Fabre, A. Appriou, and X. Briottet. Presentation and description of two classification methods using data fusion based on sensor management. Information Fusion, 2:49–71, 2001.
[23] L. Fouque and A. Appriou. An evidential Markovian model for data fusion and unsupervised image classification. In Proc. of the Third International Conference on Information Fusion (FUSION 2000), pages TuB4 25–32, Paris, France, 2000.
[24] J. Gebhardt and R. Kruse. Parallel Combination of Information Sources. In D. Dubois and H. Prade, editors, Handbook of Defeasible Reasoning and Uncertainty Management Systems, volume 3, pages 393–439. Kluwer Acad. Publ., Dordrecht, 1998.
[25] S. Le Hégarat-Mascle, I. Bloch, and D. Vidal-Madjar. Application of Dempster-Shafer evidence theory to unsupervised classification in multisource remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 35(4):1018–1032, 1997.
[26] S. Le Hégarat-Mascle, I. Bloch, and D. Vidal-Madjar. Introduction of neighborhood information in evidence theory and application to data fusion of radar and optical images with partial cloud cover. Pattern Recognition, 31(11):1811–1823, November 1998.
[27] R. Kennes. Computational aspects of the Möbius transform of a graph. IEEE Transactions on Systems, Man and Cybernetics, 22:201–223, 1992.
[28] H. Kim and P.H. Swain. Evidential reasoning approach to multisource-data classification in remote sensing. IEEE Transactions on Systems, Man and Cybernetics, 25(8):1257–1265, 1995.
[29] F. Klawonn and E. Schwecke. On the axiomatic justification of Dempster's rule of combination. International Journal of Intelligent Systems, 7:469–478, 1992.
[30] G.J. Klir and M.J. Wierman. Uncertainty-Based Information. Physica-Verlag, Heidelberg, Germany, 1998.
[31] E. Lefevre, O. Colot, and P. Vannoorenberghe.
Belief functions combination and conflict management. Information Fusion, 3(2):149–162, June 2002.
[32] E. Lefevre, P. Vannoorenberghe, and O. Colot. About the Use of Dempster-Shafer Theory for Color Image Segmentation. In First International Conference on Color in Graphics and Image Processing (CGIP'2000), pages 164–169, October 2000.
[33] H. Li, B.S. Manjunath, and S.K. Mitra. Multisensor Image Fusion Using the Wavelet Transform. Graphical Models and Image Processing, 57(3):235–245, 1995.
[34] G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley, New York, 1997.
[35] C. Pohl and J.L. van Genderen. Multisensor Image Fusion in Remote Sensing: Concepts, Methods and Applications. International Journal of Remote Sensing, 19(5):823–854, 1998.
[36] J. Schubert. Clustering belief functions based on attracting and conflicting metalevel evidence. In Proceedings of IPMU'2002, pages 571–578, Annecy, France, 2002.
[37] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
[38] Ph. Smets. The combination of evidence in the transferable belief model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):447–458, 1990.
[39] Ph. Smets. Belief functions: The disjunctive rule of combination and the generalized Bayesian theorem. International Journal of Approximate Reasoning, 9:1–35, 1993.
[40] Ph. Smets. What is Dempster-Shafer's model? In R.R. Yager, M. Fedrizzi, and J. Kacprzyk, editors, Advances in the Dempster-Shafer Theory of Evidence, pages 5–34. Wiley, 1994.
[41] Ph. Smets and R. Kennes. The Transferable Belief Model. Artificial Intelligence, 66(2):191–234, 1994.
[42] P. Vannoorenberghe and T. Denœux. Handling uncertain labels in multiclass problems using belief decision trees. In Proceedings of IPMU'2002, pages 1919–1926, Annecy, France, 2002.
[43] F. Voorbraak. On the justification of Dempster's rule of combination. Artificial Intelligence, 48:171–197, 1991.
[44] R.R. Yager. On the Dempster-Shafer framework and new combination rules. Information Sciences, 41:93–138, 1987.
[45] L.M. Zouhal and T. Denœux. An evidence-theoretic k-NN rule with parameter optimization. IEEE Transactions on Systems, Man and Cybernetics, Part C, 28(2):263–271, May 1998.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Dempster-Shafer Evidence Theory Through the Years: Limitations, Practical Examples, Variants Under Conflict and a New Adaptive Combination Rule

Mihai Cristian FLOREA (a), Anne-Laure JOUSSELME (b) and Dominic GRENIER (a)
(a) Université Laval, Québec, Canada
(b) Defence Research and Development Canada - Valcartier
Abstract. Evidence theory has been primarily used in the past to model imperfect information, and it is a powerful tool for reasoning under uncertainty. It appeared as an alternative to probability theory and is now considered a generalization of it. In this paper we first introduce an object identification problem and then present two approaches to solve it: a probabilistic approach and the Dempster-Shafer approach. We also present the limitations of Dempster's rule of combination when conflicting pieces of information are combined, and we present alternative rules proposed in the literature to overcome this problem. We propose a class of adaptive combination rules obtained by mixing the basic conjunctive and disjunctive combination rules. The symmetric adaptive combination rule is finally considered, and we compare it with the other existing rules.

Keywords. Evidence theory, reliability, adaptive combination rule, information fusion, data fusion, reasoning under uncertainty.
Introduction

Dempster-Shafer theory (DST) [1,2] is a mathematical tool developed for reasoning under uncertainty. One of its strengths is that it can cope with imprecise and uncertain information. The first combination rule in this framework, proposed by Dempster in 1968, has often been criticized for its occasionally unexpected behavior. In particular, Zadeh showed, through a simple example [3], that it provides counterintuitive results when combining highly conflicting information. Some authors argue that this is a false problem, since the counterintuitive results come from an improper use of the rule [4,5,6]. Indeed, Dempster's rule of combination should be used only under the restrictions initially imposed by Dempster of (1) independent sources providing independent pieces of evidence, (2) homogeneous sources defined on a unique frame of discernment and (3) a frame of discernment containing an exclusive and exhaustive list of hypotheses. In practice, restrictions (1)-(3) are severe and not easily satisfied, which has led to evidence theory being extended to include new, more flexible theories to cope with an unknown and unpredictable reality (the Transferable Belief Model by Smets and Kennes [7] or the Dezert-Smarandache Theory (DSmT) [8]). A new direction consists of defining new combination rules in the DST as
M.C. Florea et al. / Dempster-Shafer Evidence Theory Through the Years
149
alternatives to Dempster's rule, such as those of Yager [9], Dubois and Prade [10] or Inagaki [11]. Haenni [5] argues that Dempster's rule needs no alternative and that the initial belief functions should instead be modified to better represent the sources' information [2,12]. In this paper, we consider the sources of information to be independent and homogeneous and the frame of discernment to contain an exclusive and exhaustive list of hypotheses. Consequently, we consider that any conflict must be due to the unreliability of some sources. Moreover, instead of modifying the initial BPAs to take reliability into account, we adopt the approach of an adaptive combination rule based on the mixing of the conjunctive and disjunctive operators. In Section 1 we present the object identification problem. Sections 2 and 3 are short reviews of probability theory and Dempster-Shafer theory, respectively. Some alternatives to Dempster's rule are presented in Section 4. Section 5 presents a class of adaptive combination rules built as a mixing of the conjunctive and disjunctive combination rules; we restrict ourselves to the particular case of a symmetric adaptive combination rule. Section 6 illustrates some properties of the proposed adaptive rule on examples as well as on a test scenario for target identification. Section 7 concludes.
1. Object Identification Problem

In the object identification problem, several sensors placed on a platform receive information from one or more objects that have to be identified. The sensors can be human (experts) or electronic (radars, Forward Looking Infra-Red (FLIR), Electronic Support Measures (ESM), etc.), can provide opinions or statistical information, and can be more or less reliable. The possible observed objects and the attributes of each object (physical dimensions, cruise speed, maximum altitude, emitters on board, etc.) are listed in a database.
2. Probability Theory

Probability theory first appeared as a mathematical tool able to model problems from games of chance (such as card and dice games). Let Θ = {θ1, θ2, ..., θN} be the frame of discernment, containing N objects, hypotheses, etc. A probability distribution function (pdf) P is defined over the power set 2^Θ such that:

P(A) ∈ [0, 1], ∀A ⊆ Θ   (1)

P(∅) = 0 and P(Θ) = 1   (2)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B), ∀A, B ⊆ Θ   (3)

From the probabilities of all singletons θi ∈ Θ, one can compute the probability of any subset A ⊆ Θ, which can be interpreted as a restriction of the definition domain. The probability that is not associated with a set A is associated with its complement. When the probability of a set A is known, and the underlying distribution on its
singletons is unknown, a common way to distribute the probability of A over its singletons is the equiprobability distribution P(θi) = P(A)/|A|, ∀θi ∈ A.

Example 1: Consider a frame of discernment of 100 objects and a piece of information in the form of a pdf such that P(ship) = 0.8. From the database, we know that θ1, θ2, ..., θ90 are ships. Then P(not a ship) = 0.2, so P(θ1 ∪ θ2 ∪ ... ∪ θ90) = 0.8 and P(θ91 ∪ θ92 ∪ ... ∪ θ100) = 0.2. By modeling the ignorance using equiprobability, it can be shown that the probability of any ship is smaller than the probability of any other singleton, since P(θ1) = ... = P(θ90) = 0.8/90 ≈ 0.0089 while P(θ91) = ... = P(θ100) = 0.2/10 = 0.02. This result is somewhat counter-intuitive and inhibits good decision making. To overcome this problem, new theories were developed over the last few decades.
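The equiprobability redistribution of Example 1 can be sketched in a few lines; this is an illustrative snippet (function and variable names are my own), not code from the paper.

```python
# Illustrative sketch of Example 1: spreading the probability of a set
# uniformly over its singletons (equiprobability distribution).
def equiprobable_mass(prob_of_set, set_size):
    """Probability each singleton receives when P(A) is spread uniformly over A."""
    return prob_of_set / set_size

p_ship = equiprobable_mass(0.8, 90)    # each of the 90 ships: ~0.0089
p_other = equiprobable_mass(0.2, 10)   # each of the 10 non-ships: 0.02

# The counter-intuitive outcome: every individual ship ends up less probable
# than every non-ship, although "ship" carries 80% of the total mass.
assert p_ship < p_other
```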
3. Dempster-Shafer Theory (DST)
Evidence theory is a powerful tool for dealing with imprecise and uncertain information, developed by Dempster [1] and later formalized by Shafer [2]. This theory is often described as an extension of probability theory, as it is based on the power set of the universe of discourse instead of on the universe itself. A Basic Probability Assignment (BPA) is a mapping m : 2^Θ → [0, 1] that must satisfy the two conditions:

(1) m(∅) = 0 and (2) Σ_{A⊆Θ} m(A) = 1.

m(A) is called the mass of A and represents the degree of belief strictly assigned to A. A subset A with a non-null mass is called a focal element of m. Let F designate the set of all focal elements of m.

A wide variety of combination rules exists; a review and classification is proposed in [13], where the rules are analyzed according to their algebraic properties (idempotence, commutativity, associativity) as well as on different examples. Let m1 and m2 be two BPAs defined on the same frame of discernment Θ. The basic combination rules between m1 and m2 are the disjunctive rule and the conjunctive rule, defined respectively, ∀A ⊆ Θ, by

p(A) = Σ_{B∪C=A} m1(B) m2(C)   and   q(A) = Σ_{B∩C=A} m1(B) m2(C),

where the functions p and q are introduced here to simplify some upcoming expressions. K = q(∅) is called the weight of conflict (or simply the conflict) between m1 and m2 and is equal to the mass of the empty set after the conjunctive combination. If K is close to 0, the BPAs are not in conflict, while if K is close to 1, the BPAs are in conflict.

The most common combination rule for two BPAs is the rule proposed by Dempster [1], denoted ⊕ here and called the orthogonal sum. Dempster's rule of combination is a normalized conjunctive rule, constrained so that the mass of the empty set is always equal to 0. For two BPAs, Dempster's rule of combination is defined as m1⊕m2(A) = q(A)/(1 − K), ∀A ⊆ Θ, A ≠ ∅, and m1⊕m2(∅) = 0. Although
Dempster's rule of combination is used in a large number of applications, in the presence of conflicting information (when K ≈ 1) this rule does not provide an adequate representation of the aggregation of the two BPAs (see Section 6). In identification problems, given a BPA m, one needs to find the singleton θ from the frame of discernment Θ that is the most probable. Several decision criteria have been proposed to identify the most credible singleton; a recent survey of decision-making methods may be found in Bloch [14]. In this paper we consider only the maximum of pignistic probability decision criterion, proposed by Smets [15].
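The definitions above can be sketched compactly; the frozenset-dict representation of a BPA and all names below are my own choices, not the paper's code. The sketch covers the disjunctive and conjunctive operators, Dempster's orthogonal sum and Smets' pignistic transform, and reproduces Zadeh's conflict example from Section 6.

```python
from itertools import product

# Sketch: a BPA represented as a dict mapping frozensets of hypotheses to masses.
def combine(m1, m2, op):
    """Generic product-space combination using set operator `op`."""
    out = {}
    for (B, vB), (C, vC) in product(m1.items(), m2.items()):
        A = op(B, C)
        out[A] = out.get(A, 0.0) + vB * vC
    return out

def disjunctive(m1, m2):   # p(A): sum of m1(B) m2(C) over B union C = A
    return combine(m1, m2, frozenset.union)

def conjunctive(m1, m2):   # q(A): sum of m1(B) m2(C) over B intersect C = A
    return combine(m1, m2, frozenset.intersection)

def dempster(m1, m2):      # orthogonal sum: normalized conjunctive rule
    q = conjunctive(m1, m2)
    K = q.pop(frozenset(), 0.0)          # conflict K = q(empty)
    if K >= 1.0:
        raise ValueError("total conflict (K = 1): rule undefined")
    return {A: v / (1.0 - K) for A, v in q.items()}

def pignistic(m):          # Smets' BetP: split each mass over its singletons
    bet = {}
    for A, v in m.items():
        for theta in A:
            bet[theta] = bet.get(theta, 0.0) + v / len(A)
    return bet

# Zadeh's example (Section 6): two highly conflicting BPAs.
m1 = {frozenset({'t1'}): 0.99, frozenset({'t3'}): 0.01}
m2 = {frozenset({'t2'}): 0.99, frozenset({'t3'}): 0.01}
fused = dempster(m1, m2)   # K = 0.9999: all mass is forced onto t3
```

On this example the conflict is K = 0.9999 and Dempster's rule assigns the whole unit mass to θ3, the counter-intuitive outcome analyzed in Section 6.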
4. Alternatives to Dempster’s Rule of Combination in Conflictual Situations
Several combination rules have been proposed in the past few decades to cope with different problems and, particularly, to solve the conflict problem when Dempster's rule of combination cannot be used. Inagaki [11] proposed to distribute the mass of the empty set after the conjunctive combination to any subset of Θ, using a set of weighting coefficients w(A) ≥ 0 such that the combined BPA is m(A) = q(A) + w(A) q(∅), ∀A ⊆ Θ, A ≠ ∅, with Σ w(A) = 1. Inagaki also proposed a particular case of this rule in which the ratio between the masses of any subsets A and B is the same before and after the distribution of the conflict. Note that, like Dempster's rule, this particular case of Inagaki's combination rules cannot be used whenever K = 1. Yager proposed in [9] to allocate the mass of the conflict q(∅) to the ignorance Θ. Dempster's rule and Yager's rule turn out to be particular cases of Inagaki's particular combination rule. Dubois and Prade [10] proposed to distribute the mass of the empty set not only to the focal elements of F1 and F2 but also to unions of focal elements: when the intersection of two focal elements is the empty set, the combined mass is allocated to their union instead of to the empty set. This combination rule cannot be written as Inagaki's particular combination rule; however, it can be obtained from the general expression of Inagaki's combination rule given above.
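Yager's reallocation of the conflict to total ignorance can be sketched as follows (the frozenset-dict representation and names are my own); the numbers reproduce Example 3 of Section 6.

```python
# Sketch of Yager's rule: compute the conjunctive combination, then move the
# conflict mass q(empty) onto total ignorance (the full frame).
def yager(m1, m2, frame):
    q = {}
    for B, vB in m1.items():
        for C, vC in m2.items():
            A = B & C
            q[A] = q.get(A, 0.0) + vB * vC
    K = q.pop(frozenset(), 0.0)          # conflict
    full = frozenset(frame)
    q[full] = q.get(full, 0.0) + K       # conflict goes to the ignorance
    return q

# Example 3 of Section 6: the conflict K = 0.51 ends up on the frame.
m3 = {frozenset({'t1'}): 0.3, frozenset({'t3'}): 0.7}
m4 = {frozenset({'t2'}): 0.3, frozenset({'t3'}): 0.7}
r = yager(m3, m4, {'t1', 't2', 't3'})    # theta3: 0.49, frame: 0.51
```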
5. A New Class of Adaptive Combination Rules (ACR)
Here we propose a new class of combination rules based on the mixing of the disjunctive rule and the conjunctive rule, following an idea equivalent to the one developed by Dubois and Prade in [10,16]. This class of rules, which includes Dempster's rule as a special case, leads to more intuitive results than other rules. After introducing the general class and highlighting an interesting property, we focus our attention on a special class of rules in which the weighting coefficients of the conjunctive and disjunctive rules are symmetric.

5.1. Adaptive Combination Rule with Symmetric Weighting Coefficients

A class of Adaptive Combination Rules (ACR) between two BPAs m1 and m2 is defined by m1⊛m2(A) = α(K) p(A) + β(K) q(A), ∀A ⊆ Θ, A ≠ ∅, and m1⊛m2(∅) = 0.
Here, α and β are functions of the conflict K = q(∅), from [0, 1] to [0, 1]. A desirable behavior for the ACR is that it should act more like the disjunctive rule whenever K is close to 1 (i.e., at least one source is unreliable), and more like the conjunctive rule when K is close to 0 (i.e., both sources are reliable). Thus, we consider the three conditions:

(C1) α is an increasing function with α(0) = 0 and α(1) = 1;
(C2) β is a decreasing function with β(0) = 1 and β(1) = 0;
(C3) α(K) = 1 − (1 − K) β(K) (arising from the necessity that Σ m1⊛m2(A) = 1).

While the behaviors at the extrema (K = 0 and K = 1) are easily interpretable, what should happen at the medium value K = 0.5 can be subject to discussion. We suppose in this case that an acceptable choice is α(0.5) = β(0.5), giving an equal weight to the two basic rules p and q. Hence, an interesting property for the adaptive rule is to have symmetric weightings for p and q. It can be shown that the combination rule

m1⊛m2(A) = [K / (1 − K + K²)] p(A) + [(1 − K) / (1 − K + K²)] q(A)

is the unique symmetric ACR (or SACR). A partial positive reinforcement of the belief can be observed with the ACR for the focal elements common to both m1 and m2. This property is one of the strengths of this new class of adaptive combination rules. Moreover, this new class can be used when the conflict between two BPAs is equal to 1, a case in which Dempster's rule cannot be used.

5.2. Adaptive Combination Rule and Sequential Fusion Processes

The adaptive combination rule with symmetric weighting factors is not associative. However, based on the commutative and associative properties of the conjunctive and disjunctive combination rules p and q, a quasi-associative ACR can be derived from the general representation of the new class of adaptive combination rules: p(A) and q(A) are propagated in a sequential fusion process, as illustrated in Figure 1.
Figure 1. Fusion process: quasi-associative adaptive combination rule in a sequential fusion process.
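Under the definitions of Section 5.1, the symmetric rule can be sketched as follows; the frozenset-dict representation and names are my own, and the resulting numbers reproduce Example 3 of Section 6.

```python
# Sketch of the SACR: mix the disjunctive result p and the conjunctive
# result q with the symmetric weights alpha(K) = K/(1-K+K^2) and
# beta(K) = (1-K)/(1-K+K^2), where K is the conjunctive conflict.
def sacr(m1, m2):
    p, q = {}, {}
    for B, vB in m1.items():
        for C, vC in m2.items():
            p[B | C] = p.get(B | C, 0.0) + vB * vC   # disjunctive part
            q[B & C] = q.get(B & C, 0.0) + vB * vC   # conjunctive part
    K = q.pop(frozenset(), 0.0)                      # conflict
    denom = 1.0 - K + K * K
    alpha, beta = K / denom, (1.0 - K) / denom       # symmetric weights
    return {A: alpha * p.get(A, 0.0) + beta * q.get(A, 0.0)
            for A in set(p) | set(q)}

# Example 3 (K = 0.51): the common focal element theta3 is reinforced.
m3 = {frozenset({'t1'}): 0.3, frozenset({'t3'}): 0.7}
m4 = {frozenset({'t2'}): 0.3, frozenset({'t3'}): 0.7}
r = sacr(m3, m4)
```

With K = 0.51 one gets α ≈ 0.6799 and β ≈ 0.6532, so that m(θ3) ≈ 0.6532, matching the SACR column of Table 1.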
6. Illustrations
6.1. Simple Examples

In order to show how Dempster's rule of combination provides counter-intuitive results when conflicting information is fused, Zadeh proposed the following example.

Example 2 (Zadeh [3]): Let m1 and m2 be two BPAs defined over a frame of discernment Θ = {θ1, θ2, θ3} such that m1(θ1) = m2(θ2) = 0.99 and m1(θ3) = m2(θ3) = 0.01.

This example is now well known and has been frequently analyzed. Since K ≈ 1, the results obtained using the SACR are almost identical to the results obtained using the disjunctive rule or Dubois and Prade's rule. The positive reinforcement of the belief of the singleton θ3 (compared to the conjunctive rule) is insignificant in this case. Singletons θ1 and θ2 cannot be distinguished, while singleton θ3 is not necessarily the most credible singleton. Significant differences between the SACR and Dempster's rule, Yager's rule and Dubois and Prade's rule of combination arise when K is neither too close to 0 nor too close to 1.

Example 3: Let m3 and m4 be two BPAs defined over Θ: m3(θ1) = m4(θ2) = 0.3 and m3(θ3) = m4(θ3) = 0.7.

Table 1. Example 3: Combining m3 and m4 using different combination rules in evidence theory

Focal        Conjunctive q    Disjunctive p /   Dempster's /    Yager's    SACR
elements     (Smets' rule)    Dubois & Prade    Inagaki's       rule
∅            0.51             0                 0               0          0
θ3           0.49             0.49              1               0.49       0.6532
{θ1, θ2}     0                0.09              0               0          0.0612
{θ1, θ3}     0                0.21              0               0          0.1428
{θ2, θ3}     0                0.21              0               0          0.1428
Θ            0                0                 0               0.51       0
Decision     ∅                θ3                θ3              θ3         θ3
Table 1 presents the combination of m3 and m4 using the different rules. We observe a positive reinforcement of the belief for the focal elements common to both m3 and m4 when using the SACR. Moreover, the other focal elements then have a smaller mass than those resulting from the disjunctive or Dubois and Prade combination rules.

Example 4: Let m5 and m6 be two BPAs defined over Θ: m5(θ1) = m5(θ3) = 0.3, m5({θ1, θ3}) = m5({θ1, θ2}) = 0.2 and m6(θ1) = m6({θ2, θ3}) = 0.4, m6(θ3) = 0.2.
Table 2. Example 4: Combining m5 and m6 using different combination rules in evidence theory

Focal        Conjunctive q    Disjunctive    Dempster's /    Yager's    Dubois &    SACR
elements     (Smets' rule)    rule p         Inagaki's       rule       Prade rule
∅            0.34             0              0               0          0           0
θ1           0.28             0.12           0.4242          0.28       0.28        0.2909
θ2           0.08             0              0.1212          0.08       0.08        0.0681
θ3           0.30             0.06           0.4546          0.30       0.30        0.2816
{θ1, θ2}     0                0.08           0               0          0           0.0351
{θ1, θ3}     0                0.30           0               0          0.18        0.1315
{θ2, θ3}     0                0.12           0               0          0           0.0525
Θ            0                0.32           0               0.34       0.16        0.1403
Decision     ∅                θ1             θ3              θ3         θ3          θ1
Computing the pignistic probability before combination, Source 5 is in favour of θ1, while Source 6 is equally in favour of θ1 and θ3. We thus expect that after combination, θ1 will be the solution. Table 2 shows the results of the different combination rules for this example. Dempster's rule, Yager's rule and Dubois and Prade's rule, associated with the considered decision criterion, provide singleton θ3 as the solution, while the SACR alone gives θ1 as the solution, because of a higher positive reinforcement.
6.2. Test Scenario of Target Identification

In this section we study a simple test scenario of target identification. Several pieces of evidential information coming from an ESM sensor are sequentially combined at a fusion centre. The 135 targets to be potentially identified are listed in a Platform Data Base (PDB) according to 22 features (ID, country, type, sub-type, emitters on board, etc.). The simulation test was generated considering that the ESM reports wrong emitters in 20% of its reports. Suppose the observed target is the object θ48 from the database. All pieces of information are modeled in the DST using two focal elements: m(A) = 0.8 and m(Θ) = 0.2, where A is the set of objects corresponding to the received information. Figure 2 shows the results of the test scenario: the final pignistic probabilities obtained using the SACR and Dempster's rule, and the comparative evolution of the pignistic probability of the object θ48 during the fusion process for three combination rules. The pignistic probability decreases in the last fusion step, since the last information received at the fusion centre was a counter-measure. Because the combination rules (except Dempster's rule) are not associative, the last piece of information to be fused will always be more important than those already fused. Using Dempster's rule, the final pignistic probability distribution shows that object θ48 is the only possible object, since BetP(θ48) ≈ 1; all other possibilities are excluded. We also remark that the pignistic probability of the singleton θ48 does not decrease significantly in the case of false reports (except at instant #9). This behavior is somewhat counter-intuitive. The SACR reacts to the "false" information in a more natural
manner. The object θ48 is well identified by the three combination rules. Where Dempster's rule fails to provide other options in the identification of the observed target, the SACR gives other alternatives.
Figure 2. Test scenario with a probability of error for the ESM equal to 0.2
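The lock-in behavior of Dempster's rule under sequential fusion can be reproduced in a toy version of this scenario; the three-object frame and all names below are invented for illustration (the paper's PDB has 135 targets).

```python
# Toy re-creation of the sequential ESM scenario: each report puts mass 0.8
# on a candidate set and 0.2 on the frame, and reports are fused one by one
# with Dempster's rule. Names (t48, tA, tB) are invented for illustration.
frame = frozenset({'t48', 'tA', 'tB'})

def dempster(m1, m2):
    """Normalized conjunctive combination of two BPAs (frozenset-dict form)."""
    q = {}
    for B, vB in m1.items():
        for C, vC in m2.items():
            q[B & C] = q.get(B & C, 0.0) + vB * vC
    K = q.pop(frozenset(), 0.0)           # conflict
    return {A: v / (1.0 - K) for A, v in q.items()}

def report(candidates):
    """ESM-style report: mass 0.8 on the candidate set, 0.2 on ignorance."""
    return {frozenset(candidates): 0.8, frame: 0.2}

m = {frame: 1.0}                          # vacuous prior
for obs in [{'t48', 'tA'}, {'t48'}, {'t48', 'tB'}, {'t48'}, {'tA'}]:
    m = dempster(m, report(obs))          # the last report contradicts t48

# After four consistent reports the mass on t48 exceeds 0.98; even the final
# contradictory report barely moves it -- the lock-in behavior noted above.
```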
7. Conclusion
In this paper we defined a new class of adaptive combination rules for evidence theory. This new class of rules was developed to cope with the problem of combining conflicting information, a case where Dempster's classical rule fails to provide intuitive results. The mixing of the classical conjunctive and disjunctive rules is done using two weighting functions of the conflict factor K. Depending on the conflict, the proposed adaptive rules act more like a conjunctive rule or more like a disjunctive rule. We built the unique adaptive combination rule with symmetric weighting coefficients and showed an interesting property of positive reinforcement of the belief. In several examples we saw that the SACR is more adequate than other classical rules. Finally, we illustrated a test scenario of target identification and, owing to its non-associativity, observed a more desirable behavior for the SACR than for Dempster's rule in sequential fusion problems.
References

[1] A. Dempster, "Upper and Lower Probabilities Induced by a Multivalued Mapping," Ann. Math. Statist., vol. 38, pp. 325–339, 1967.
[2] G. Shafer, A Mathematical Theory of Evidence. Princeton University Press, 1976.
[3] L. A. Zadeh, "Review of Shafer's Mathematical Theory of Evidence," AI Magazine, vol. 5, pp. 81–83, 1984.
[4] F. Voorbraak, "On the justification of Dempster's rule of combination," Artificial Intelligence, vol. 48, pp. 171–197, 1991.
[5] R. Haenni, "Are alternatives to Dempster's rule of combination real alternatives? Comments on 'About the belief combination and the conflict management problem'—Lefevre et al.," Information Fusion, vol. 3, pp. 237–239, 2002.
[6] W. Liu and J. Hong, "Reinvestigating Dempster's idea on evidence combination," Knowledge and Inf. Syst., vol. 2, pp. 223–241, 2000.
[7] P. Smets and R. Kennes, "The Transferable Belief Model," Artificial Intelligence, vol. 66, pp. 191–234, 1994.
[8] J. Dezert, "Foundations for a new theory of plausible and paradoxical reasoning," Information & Security: An International Journal, vol. 9, pp. 90–95, 2002.
[9] R. R. Yager, "On the Dempster-Shafer framework and new combination rules," Information Sciences, vol. 41, pp. 93–138, 1987.
[10] D. Dubois and H. Prade, "Representation and combination of uncertainty with belief functions and possibility measures," Computational Intelligence, vol. 4, pp. 244–264, 1988.
[11] T. Inagaki, "Interdependence between safety-control policy and multiple-sensor schemes via Dempster-Shafer theory," IEEE Trans. on Reliability, vol. 40, no. 2, pp. 182–188, 1991.
[12] G. Rogova and V. Nimier, "Reliability in Information Fusion: Literature Survey," in Proceedings of the 7th Annual Conference on Information Fusion, ISIF, 2004, pp. 1158–1165.
[13] K. Sentz and S. Ferson, Combination of Evidence in Dempster-Shafer Theory. Sandia National Laboratories, Epistemic Uncertainty Project, Tech. Rep. SAND 2002-0835, 2002.
[14] I. Bloch, "Fusion of image information under imprecision and uncertainty: Numerical methods," in Data Fusion and Perception, G. D. Riccia, H.-J. Lenz, and R. Kruse, Eds., vol. 431. Springer-Verlag, NY, 2001, pp. 135–168.
[15] P. Smets, "Constructing the pignistic probability function in a context of uncertainty," in Uncertainty in Artificial Intelligence, vol. 5. Elsevier Science Publishers, 1990, pp. 29–39.
[16] D. Dubois and H. Prade, "La fusion d'informations imprécises," Traitement du Signal, vol. 11, no. 6, pp. 447–458, 1994.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
157
Decision Support and Information Fusion in the Context of Command and Control Éloi BOSSÉ Defence R&D Canada Valcartier
Abstract. Command and control can be characterized as a dynamic human decision making process. A technological perspective of command and control has led system designers to propose solutions, such as decision support and information fusion, to overcome many domain problems. Solving the command and control problem requires balancing the human factor perspective with that of the system designer and coordinating the efforts in designing a cognitively fitted system to support the decision-makers. This paper discusses critical issues in the design of computer aids, such as a data/information fusion system, by which the decision-maker can better understand the situation in his area of operations, select a course of action, issue intent and orders, monitor the execution of operations, and evaluate the results. These aids will support the decision-makers in coping with uncertainty and disorder in warfare and in exploiting people or technology at critical times and places to ensure success in operations. Keywords. Command and Control, Data/Information Fusion, Situation Awareness, Decision Making, Cognitive Systems Engineering.
Introduction Command and Control (C2) is defined by the military community as the process by which a commanding officer can plan, direct, control and monitor any operation for which he is responsible in order to fulfill his mission. From a human factor perspective, the complexity of military operations highlights the critical role of human leadership in C2. To resolve adversity, C2 systems require qualities inherent to humans such as decision-making abilities, initiative, creativity and the notion of responsibility and accountability. Although these qualities are essential, characteristics inherent to the C2 environment combined with the advancement in threat technology significantly challenge the accomplishment of this process and therefore require the support of technology to complement human capabilities and limitations. A technological perspective of C2 has led system designers to propose solutions by providing operators with Decision Support Systems (DSS). These DSSs should aid operators in achieving the appropriate Situation Awareness (SA) state for their decision-making activities, and support the execution of the resulting actions. The lack of knowledge in cognitive engineering has in the past jeopardized the design of helpful computer based aids aimed at complementing and supporting human cognitive tasks. Moreover, this lack of knowledge has most of the time created new trust problems in the designed tools. Solving the C2 problem thus requires balancing the human factor perspective with that of the system designer and coordinating the efforts in designing a cognitively fitted
158
É. Bossé / Decision Support and Information Fusion in the Context of Command and Control
system to support decision-makers. The paper starts with a discussion on the C2 and decision-making process followed by decision support definitions and concepts. It then presents the problem of designing a cognitively fitted DSS using a Cognitive Science Engineering (CSE) approach.
1. Decision Making

The aim of C2 is to allow a commander to make decisions and take actions faster and better than any potential adversary. Accordingly, it is essential to understand how commanders make decisions. One stream of decision-making theory, based on decision-theoretic paradigms, views decision making as an analytic process that corresponds closely to the military estimate of the situation. According to this analytic approach, the commander generates several options, identifies criteria for evaluating these options, rates the options against these criteria and then selects the best option as the basis for future plans and action. This approach aims at finding the optimal solution, but it is time consuming and information intensive. A second stream of decision-making theory emphasizes a more inductive than analytic approach. Called Naturalistic Decision Making (NDM), it emphasizes the acquisition of knowledge, the development of expertise and the ability of humans to generalize from past experience. It stresses pattern recognition, creativity, experience and initiative. In general, while the two models represent conceptually distinct approaches to decision making, they are not mutually exclusive in practice. The commander will adopt the approach that is best tailored to the situation and may use elements of both at the same time. Indeed, a combination of the two is probably always at work within the C2 system.

1.1. Task/Human/Technology Triad Model

A triad approach has been proposed by Breton, Rousseau and Price [1] to represent the collaboration between the system designer and the human factors specialist. As illustrated in Figure 1, three elements compose the triad: the task, the technology and the human. In the C2 context, the Observe-Orient-Decide-Act (OODA) loop represents the task to be accomplished.
The design process must start with the identification of environmental constraints and possibilities by Subject-Matter Experts (SMEs) within the context of a CSE approach. System designers are introduced via the technology element. Their main axis of interest is the link between the technology and the task. The general question related to this link is: "What systems must be designed to accomplish the task?" System designers also consider the human element. Their secondary axis of interest is thus the link between the technology and the human. The main question of this link is: "How must the system be designed to fit the human?" However, system designers have a hidden axis: the axis between the human and the task is usually not covered by their expertise. From their analyses, technological possibilities and limitations are identified. However, not all environmental constraints are covered by the technological possibilities. These uncovered constraints, hereafter named deficiencies, are then addressed as statements of requirements to the human factors community. These
requirements lead to better training programs, the reorganization of work and the need for leadership, team communication, etc.

Figure 1. Task/Human/Technology Triad Model. The triad links the Task (the OODA loop, fed by requirements), the Technology (the system designers' element) and the Human (the human factors specialists' element). For system designers, the principal axis is (1) Technology-Task, the secondary axis is (3) Technology-Human, and the hidden axis is (2) Human-Task; for human factors specialists, the principal axis is (2) Human-Task, the secondary axis is (3) Technology-Human, and the hidden axis is (1) Technology-Task.
2. Cognitive Engineering System Analyses CSE analyses are defined as approaches that aim to develop knowledge about the interaction between human information processing capacities and limitations, and technological information processing systems. The usefulness of a system is closely related to its compatibility with human information processing. Therefore, CSE analyses focus on the cognitive demands imposed by the world to specify how technology should be exploited to reveal problems intuitively to the decision maker’s brain. Cognitive Work Analysis (CWA) [2], one of the CSE approaches, seems to be the best choice to answer questions related to understanding the C2 task. 2.1. A Pragmatic approach to Cognitive Work Analysis (CWA) The Applied Cognitive Work Analysis (ACWA) methodology [3] emphasizes a stepwise process to reduce the gap to a sequence of small, logical engineering steps, each readily achievable. At each intermediate point the resulting decision-centered artifacts create the spans of a design bridge that link the demands of the domain as revealed by the cognitive analysis to the elements of the decision aid.
The ACWA approach is a structured, principled methodology for systematically transforming the problem from an analysis of the demands of a domain to the identification of visualizations and decision-aiding concepts that will provide effective support. The steps in this process include:
- Using a Functional Abstraction Network (FAN) model to capture the essential domain concepts and relationships that define the problem-space confronting the domain practitioners;
- Overlaying Cognitive Work Requirements (CWR) on the functional model as a way of identifying the cognitive demands / tasks / decisions that arise in the domain and require support;
- Identifying the Information / Relationship Requirements (IRR) for successful execution of these cognitive work requirements;
- Specifying the Representation Design Requirements (RDR) to define the shaping and processing of how the information / relationships should be represented to the practitioner;
- Developing Presentation Design Concepts (PDC) to explore techniques for implementing these representation requirements in the syntax and dynamics of presentation forms, so as to produce the information transfer to the practitioner.
These steps are described more extensively in [3].
3. Ontological Engineering for Computer-Based Decision Support Systems One of the main challenges for the design of computer-based support systems for decision-makers resides in the representation of domain specific types of objects and situations, and the relations between these elements and the environment. Formally and consistently representing these entities and relations is critical to the successful design of automated processes, which take such representations as input (e.g., for automated reasoning). A key element in the development of appropriate knowledge-based systems is to relate the representation of situation elements in the system to those used by operators. In this regard, CSE techniques such as ACWA are essential in deriving the way humans represent these elements within their mental models. The use of methods such as ACWA enables the elicitation of information and knowledge. It provides a design framework to build effective and trusted decision support. Methodologies in CSE focus on identifying information needs and aiding strategies that reflect the goals and tasks in the domain, along with the means available to achieve them. These analysis tasks are carried out by means of various knowledge elicitation techniques, such as interviews with SMEs. Importantly, methods such as CWA and ACWA not only model the cognitive tasks (e.g., decisions) undertaken by operators in the domain, but also the fundamental purposes, functions and constraints of the work domain (captured in FAN or Abstraction Hierarchy (AH) models) as well. The FAN and AH models consist of networked or hierarchically organized entities representing the purposes, functions, and physical components of the work domain. 
Ontological engineering is the process of the construction of ontologies, including as steps the analysis of the domain of interest, the modeling of relevant concepts, the building and encoding of the resulting ontology into an appropriate formalism and, finally, the validation and evaluation of the ontology.
Combining CSE and Ontological Engineering (OE) principles will potentially provide a very powerful methodology for building formal domain knowledge models (domain ontologies) for military applications and for their effective exploitation in decision support systems. In contrast to ontological engineering or cognitive engineering alone, this novel approach will synergistically integrate the two into a unique and innovative methodology. The novelty comes from the mutual information process that will enrich the results of each approach: the ontology provides a formal, theoretical structure to represent the concepts identified through the cognitive engineering analysis (specifically, the FAN/AH work domain models), while the cognitive engineering analysis constrains the scope of entities incorporated into the ontology.

3.1. Ontologies

In the artificial intelligence community, where the concept of ontologies was first investigated for the engineering of knowledge bases, an ontology is defined as a formal and explicit specification of a conceptualization [5]. It specifies the semantics of the domain concepts using attributes, properties and relationships between concepts, as well as constraints and axioms. It therefore provides a formal and shared understanding of a domain, facilitating its exploitation by both human agents and application programs. Ontologies are central to the design of DSSs for military applications; military C2 problems are knowledge-intensive and involve a large number of concepts (either physical or abstract elements) to be considered within the situation analysis and decision-making processes. Ontologies constitute formal domain models upon which reasoning processes can be based and knowledge-based systems can be built.
A formal ontological framework is necessary to afford a formal structure for the analysis of domain-specific types of objects and situations, and the relations between them, and to ensure a certain level of reusability of the designed domain-specific ontology in a different application domain. The strength of ontologies is that they constitute knowledge components that are reusable across different applications. Data fusion processes at different levels could thus exploit these knowledge bases according to their reasoning processes. Finally, ontologies can serve to support knowledge-level interoperability among heterogeneous knowledge sources. They provide a layer between an agent (human or artificial) and physical knowledge sources by formally defining the domain knowledge and explicitly specifying the content of the knowledge sources using the concepts of the ontology (meta-models). Using this approach, knowledge and information sources can be linked to the concepts of the ontology, and services can be provided to exploit the data and knowledge bases by a human or a machine agent.
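To make the idea of a meta-model layer between an agent and physical knowledge sources concrete, here is a minimal sketch in Python. All concept and source names are hypothetical illustrations, not part of any actual C2 ontology: concepts carry attributes and named relations, and an index records which knowledge sources inform which concepts, so an agent can locate relevant sources through the ontology.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    attributes: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)  # relation name -> target concept name

# A toy fragment of a domain ontology (hypothetical concepts)
platform = Concept("Platform", ["position", "speed"])
sensor = Concept("Sensor", ["coverage"], {"mountedOn": "Platform"})

# Meta-model layer: each knowledge source declares which concepts it informs
source_index = {
    "radar_track_db": ["Platform"],
    "sensor_registry": ["Sensor", "Platform"],
}

def sources_for(concept_name):
    """Return the knowledge sources that provide content for a given concept."""
    return [s for s, concepts in source_index.items() if concept_name in concepts]
```

A human or machine agent would then query `sources_for("Platform")` to discover which data bases to exploit, without knowing their physical layout.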
4. Designing a Data/Information Fusion System

The data-to-information relationship can be complex and may require a significant number of computations and/or transformations. The degree of transformation required can vary from simple algebra to complex, intelligent algorithms such as those used in data/information fusion. Data/information fusion is clearly a key enabler for situation
awareness and, when built as a system, becomes a key support to the decision-making process. A rich legacy in data fusion technology exists, ranging from the Joint Directors of Laboratories (JDL) data fusion process model to taxonomies of algorithms and engineering guidelines for architecture and algorithm selection. To date, numerous data fusion systems have been developed, especially for military applications. How can we use this legacy in our data fusion system design today? Unfortunately, the lack of common engineering standards, well-documented performance evaluations and architecture paradigms is a major impediment to objective evaluation, comparison or re-use of current data fusion systems. Developing a system that utilizes existing or developmental data fusion technology requires a standard method for specifying data fusion processing and control functions, interfaces, and associated databases. In the initial JDL process model, the data fusion levels were never intended to be taken as a blueprint for system design or software development. In the revised version [6], however, the model has been extended to integrate a data fusion tree architecture model for system description, design and development. This data fusion tree would benefit from being guided and enriched with the formal methodology presented in the previous sections. The JDL guidelines recommend an architecture that represents data fusion processing in terms of nodes. When the data fusion process is partitioned into multiple processing nodes, the process is represented via a data fusion tree, illustrated in Figure 2 (from [6]). The guidelines recommend a four-phase process for developing data fusion processes within an information processing system, shown in Figure 3 (from [6]). To further enrich the guidelines, we suggest adding the CSE/OE methodology (e.g. ACWA with an ontology layer), which will provide the data fusion designer with a formal means to capture overall system requirements and constraints.
As a result, the risk of designing a data fusion system not cognitively fitted to the human will be mitigated.
5. Conclusion

This paper presented discussions on the balance of human factors and technology in the design of decision support systems. CSE analysis methods and ontologies were introduced, followed by an example of where these methodologies can be utilized in current engineering guidelines for designing data fusion systems.
References
[1] Breton, R., Rousseau, R. & Price, W. L., The Integration of Human Factors in the Design Process: a TRIAD Approach, Defence Research Establishment Valcartier, TM 2001-002, November 2001, 33 pages.
[2] Breton, R., Potter, S. S. and Bossé, É., Application of a Human-centric Approach to the Design of Decision Support Systems for Command and Control, Journal of Decision Support Systems, Elsevier, submitted 2005.
[3] Paradis, S., Elm, W. C., Potter, S. S., Breton, R. and Bossé, É., A Pragmatic Cognitive System Engineering Approach to Model Dynamic Human Decision-Making Activities in Intelligent and Automated Systems, NATO RTO HFM Symposium on Human Integration in Intelligent and Automated Systems, Warsaw, October 2002, 8 pp.
[4] Bossé, É., Valin, P., Boury-Brisset, A.-C. and Grenier, D., Exploitation of A Priori Knowledge for Information Fusion, Journal of Information Fusion, Elsevier, 2005.
[5] Gruber, T., A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, Vol. 5, No. 2, pp. 199-220, 1993.
[6] Steinberg, A. N., Bowman, C. L. and White, F. E., Revision to the JDL Data Fusion Model, Joint NATO/IRIS Conference, Quebec City, October 1998.
Figure 2. Integrated data fusion/resource management trees (from [6])
Figure 3. Data fusion system engineering method (modified from [6])
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Fusion in European SMART Project on Space and Airborne Mined Area Reduction
Isabelle BLOCH a,1, Nada MILISAVLJEVIĆ b
a Ecole Nationale Supérieure des Télécommunications, Paris, France
b Royal Military Academy, Signal and Image Centre, Brussels, Belgium
Abstract. This paper presents three fusion strategies applied within the European SMART project on space and airborne mined area reduction tools. Two strategies are based on belief function theory, and the third one is a fuzzy method. The main aim of the three methods consists of taking advantage of several available data sources with different properties, improving landcover classification and anomaly detection results, taking advantage of existing knowledge, and allowing user interaction. Keywords. Airborne mined area reduction, landcover classification, anomaly detection, belief function theory.
Introduction

Three fusion strategies developed within the European SMART project [1,2] are presented in this paper. The underlying ideas of the methods are to take advantage of several available data sources with different properties, to improve landcover classification and anomaly detection results, to take advantage of existing knowledge, and to allow user interaction. The available sources of information are Synthetic Aperture Radar (E-SAR) and multiband optical data (Daedalus), older KVR (satellite) data, knowledge about the sensors, registration, classification and detection results (from teams of the SMART consortium), ground truth (legend and labeled regions in training and test areas) and expert knowledge. Two fusion strategies are based on belief function theory [3], their main differences being the way discounting is performed and how classifiers are treated as information sources. The third strategy consists of a fuzzy [4] weighted maximum fusion of only the best classifiers for each class. The strategies are illustrated on the test site of Glinska Poljana in Croatia, one of the three representative Croatian terrains analyzed within the SMART project. The paper is organized as follows. In Section 2, the three fusion methods are explained. In the following sections, knowledge inclusion and spatial regularization are discussed. Section 5 provides more detail about the levels of fusion involved, and Section 6 contains the results.
1 Corresponding author: Isabelle Bloch, ENST-TSI, CNRS UMR 5141 LTCI, Rue Barrault 46, 75013 Paris, France; E-mail: [email protected]
I. Bloch and N. Milisavljević / Fusion in European SMART Project
1. Fusion Input

The types of fusion inputs provided by different members of the SMART consortium are:
• E-SAR classification with confidence images per class;
• E-SAR and Daedalus detection of hedges, trees, shadows and rivers, with confidence degrees for hedges and trees; shadows and rivers are discounted based on Daedalus bands;
• supervised classification of Daedalus data, where the result is provided as a decision image;
• region-based classification of Daedalus data with confidence images per class;
• belief function classification of Daedalus data with confidence images per class;
• E-SAR and Daedalus binary detection of roads;
• E-SAR binary detection of rivers;
• binary image of Daedalus and KVR change detection.

The main characteristics of the available information are:
• no classifier or anomaly detector is perfect;
• their reliabilities vary strongly with the class;
• they are complementary;
• they lack spatial information;
• not all available knowledge is taken into account.
2. Three Fusion Methods

2.1. Belief Function Fusion Strategy Based on a Global Discounting Factor (BF1)

Here, each classifier is considered as one information source. The focal elements are simply the classes (singletons of the frame of discernment D) and the full set D. At first, the classifier outputs (confidence values) are directly used as starting mass functions for the singletons. In cases where no confidence values are provided but only a decision image or a binary detection, the mass takes only the values 0 or 1. In the next step, global errors are included in order to discount the starting masses. We propose to use a discounting factor α equal to the sum of the diagonal elements of the confusion matrix divided by the cardinality of the training areas. This discounting is applied to all starting masses; then m(D) = 1 − α. Note that this entails the explicit use of the confusion matrix, which should be computed on the training areas for each classifier or detector. Finally, the definition of the mass of the full set should take into account the fact that some classes are not detected (it should therefore be equal to 1 at points where 0 is obtained for all detected classes). As a result, at each step of the fusion, the focal elements are always singletons and D. The decision rule can be a maximum of belief, a maximum of mass, or a maximum of pignistic probability, all being equivalent in this case. This approach is very easy to implement, and models in a simple way the fact that classifiers or detectors may not give any information on some classes and may be imperfect.
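The BF1 discounting step can be sketched as follows. This is a minimal sketch, not the project code: the function name and data layout are illustrative, and the classifier's confidence values are assumed to sum to 1 over the detected classes, so that the mass left on the full set D equals 1 − α.

```python
import numpy as np

def bf1_masses(confidences, confusion):
    """Build BF1 starting masses from one classifier's confidence values,
    discounted by a global factor alpha derived from its confusion matrix."""
    # alpha = sum of diagonal elements / cardinality of the training areas
    alpha = np.trace(confusion) / confusion.sum()
    m = {i: alpha * v for i, v in enumerate(confidences)}  # discounted singletons
    m["D"] = 1.0 - sum(m.values())                         # remaining mass on D
    return m
```

A decision can then be taken by a maximum of mass over the singletons, which in this single-level structure coincides with a maximum of belief or of pignistic probability.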
2.2. Belief Function Fusion Strategy Based on a More Specific Discounting Factor (BF2)

More sophisticated methods can be designed by considering each classifier as several sources. Namely, each classifier provides an output for each class, which can be considered as an information source for the fusion. In a simple model, the focal elements for the source defined by the output of a classifier or detector for a class Ci are Ci and D. Several instances of this approach have been designed, but the most interesting one uses the confusion matrix for a more specific discounting for each class. From the confusion matrix computed from the decisions made by one classifier on the training areas, we derive a kind of probability that the class is Ci given that the classifier says Cj as:

c(i,j) = conf(i,j) / Σ_i conf(i,j),

where the values conf(i,j) denote the coefficients of the confusion matrix. This formula corresponds to a normalization of the lines of the confusion matrices on the training areas for all classifiers and detectors used as fusion input. It is possible to ignore low values and renormalize the others in order to reduce the number of non-zero coefficients (and thus the number of focal elements in the following). In our experiments, the threshold value is equal to 0.05. We use c(i,j) for discounting in the methods described in the previous subsection. At this point, we still consider that a class j of a classifier is one source. Then, from v(Cj), i.e., the value provided for this class by a classifier, we define:
• m(Ci) = v(Cj) c(i,j), for all classes Ci which are confused with Cj (which provides Σ_i m(Ci) = v(Cj)), and
• m(D) = 1 − v(Cj).

Compared to BF1, instead of keeping a mass on Ci only (and D), this mass is spread over all classes possibly confused with Ci, thus better exploiting the richness of the information provided by a classifier.

2.3. Fuzzy Fusion

To compare the previous methods with a fuzzy approach, we test a simple method: we choose for each class the best classifiers and combine them with a maximum operator (possibly with some weights). A decision is then made according to a maximum rule. The choice is based on the confusion matrix of each classifier or detector, by comparing the diagonal elements in all matrices for each class. This approach is interesting because it is very fast. It uses only a part of the information, which could also be a drawback if this part is not chosen appropriately. Some weights have to be tuned, which may need some user interaction in some cases. Although it may sound somewhat ad hoc, it is interesting to show what we can get by using the best parts of all classifiers. Note that it is possible to make this approach more automatic.
3. Knowledge Inclusion

To improve the results, some additional knowledge can be included in the fusion results (knowledge of the classifiers, their behaviors, etc. has already been included in the previous steps). At this step, we use only the pieces of knowledge that directly provide information on the landcover classification. Other pieces of knowledge, such as mine reports, are not directly related to the classes of interest, but rather to the dangerous areas, and are therefore included in the construction of the danger map, which follows the fusion task. Several pieces of knowledge prove to be very useful here. They concern, on the one hand, some “sure” detections. Detectors are available for roads and rivers, which provide areas or lines that surely belong to these classes. There is almost no confusion, though some parts may be missing. These detections can then be imposed on the classification results. This is achieved by replacing the label of each pixel in the decision image by the label of the detected class if this pixel is actually detected; if not, its label is not changed. As for roads, additional knowledge is used, namely on the width of the roads (based on observations made during the field missions). Since the detectors provide only lines, these are dilated by the appropriate size, taking into account both the actual road width and the resolution of the images. On the other hand, another type of knowledge is very useful: the detection of changes between images taken during the project and KVR images obtained several years earlier. The results of the change detection processing provide information primarily about a class named “abandoned agricultural land”, since they exhibit fields which were previously cultivated and are now abandoned. Again, these results do not show all regions belonging to the class “abandoned agricultural land”, but the detected areas surely belong to that class.
A similar process can then be applied as for the previous detectors. With the proposed methods, it was difficult to obtain good results on the class “agricultural land in use” while preserving the results on the class “abandoned agricultural land”, which is crucial since it corresponds to fields no longer in use that are therefore potentially dangerous. We therefore use the best detection of the class “agricultural land in use” (extracted from the region-based classification on Daedalus) as an additional source of knowledge.
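The imposition of “sure” detections on the decision image described above amounts to a masked label replacement. A minimal sketch, assuming numpy label images and boolean detection masks (names are illustrative):

```python
import numpy as np

def impose_sure_detections(decision, detections):
    """Impose 'sure' detections on a classification decision image.
    detections: list of (boolean mask, class label) pairs. Where a pixel is
    detected, its label is replaced; elsewhere it is left unchanged."""
    out = decision.copy()
    for mask, label in detections:
        out[mask] = label
    return out
```

For roads, the mask would first be dilated to the expected road width before imposition, as described above.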
4. Spatial Regularization

The last step is a regularization step. Indeed, it is very unlikely that isolated pixels of one class appear inside another class. Several local filters have been tested on the decision image, such as a majority filter, a median filter and morphological filters. A Markovian regularization approach on local neighborhoods was tested too; the results are somewhat better, but not significantly. A better approach is to use the segmentation into homogeneous regions provided by another team in the project. In each of these regions, a majority vote is performed: we count the number of pixels in the region assigned to each class, and the class with the largest cardinality is chosen for the whole region (all pixels of the region are relabeled and assigned to this class). This type of regularization, performed at a regional level rather than a local one, provides good results.
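The region-level majority vote can be sketched as follows, assuming numpy label images for both the decision and the region segmentation (names are illustrative):

```python
import numpy as np

def regularize_by_regions(decision, regions):
    """Relabel each homogeneous region with the majority class of its pixels."""
    out = decision.copy()
    for r in np.unique(regions):
        mask = regions == r
        labels, counts = np.unique(decision[mask], return_counts=True)
        out[mask] = labels[np.argmax(counts)]  # class with largest cardinality
    return out
```

This is the regional counterpart of a local majority filter: the vote is taken over a whole segmented region rather than over a pixel neighborhood.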
5. On the Levels of Fusion

Three levels of fusion are often distinguished: low level (usually pixel level), intermediate level (features such as lines or regions) and higher level (usually called decision fusion). All three levels are addressed in the proposed schemes. They all appear in the input of the fusion, since classifiers may be based on pixels, on regions, on the detection of linear structures (as for rivers or roads), on semantics (like change detection), etc. But they also appear in the fusion itself: the computation of the combination is performed at the pixel level, but based on the semantic information provided by the classifiers and detectors (in terms of classes and decisions). This step therefore elegantly merges two levels of fusion. The final regularization step is performed at an intermediate level, in homogeneous regions which are not reduced to a simple pixel neighborhood but which, on the other hand, do not each cover one class (several regions belong to the same class).
6. Results

For method BF1, the basic fusion results are reasonably good, except for the class “abandoned agricultural land”, where they are worse than those of the individual classifiers. Results for the class “water” are especially good. After knowledge inclusion, the results are improved, in particular for the class “abandoned agricultural land”. This shows the importance of knowledge on changes in the fusion. An additional improvement is obtained by the regularization step.

In the case of BF2, the results are somewhat better than those obtained with method BF1. The class “abandoned agricultural land” is not well detected, and a lot of confusion occurs with the classes “agricultural land in use” and “trees and shrubs”. The class “rangeland” is not detected at all, but that class is not important for the analyzed application. After knowledge inclusion, the results are improved and the confusion between the classes “abandoned agricultural land” and “agricultural land in use” is greatly reduced. Finally, the regularization step improves in particular the class “water”, which reaches a very satisfactory level.
For the fuzzy method, the following outputs of classifiers have been used for each class:
• for the class “abandoned agricultural land”: E-SAR logistic regression, region-based classification, belief function classification and change detection (the results of these classifiers for this class are combined with a maximum operator, with a factor 2 for logistic regression to compensate for the lower values provided by this classifier);
• for the class “agricultural land in use”: region-based classification and belief function classification;
• for the class “asphalted roads”: region-based classification and road detection;
• for the class “rangeland”: region-based classification, minimum distance classification and belief function classification;
• for the class “residential areas”: region-based classification and belief function classification;
• for the class “trees and shrubs”: region-based classification and E-SAR trees and hedges detection;
• for the class “shadows”: E-SAR logistic regression, E-SAR shadow detection, minimum distance classification and belief function classification; the maximum is discounted by a factor 0.5, taking into account that this class is not really significant for further processing (shadows “hide” meaningful classes);
• for the class “water”: region-based classification, belief function classification and river detection.

The results of the basic fusion are already very good. This can be explained by the fact that not all information provided by the classifiers is used, but only their best part. Compared with the previous methods, this one is somewhat less automatic and more ad hoc, but allows good results to be reached very fast. After knowledge inclusion, the improvement is clear, although not as strong as with the previous approaches (since the results were already good). All classes are detected. Finally, the regularization step provides some further improvement, but the class “shadows” disappears. This is not a serious problem since this class is not significant for further processing. A comparison of the three methods based on user and producer accuracy for the three most important classes with regard to the application is given in Table 1. Note that the best classifier is not always the same, which further justifies the benefit of fusion.
Table 1. Comparison of the three methods

Class                        Measure            Best classifiers  BF1   BF2   Fuzzy
Abandoned agricultural land  User accuracy      0.82              0.87  0.79  0.81
                             Producer accuracy  0.84              0.81  0.78  0.88
Agricultural land in use     User accuracy      0.87              0.86  0.81  0.91
                             Producer accuracy  0.84              0.90  0.91  0.85
Asphalted roads              User accuracy      0.88              0.96  0.96  0.97
                             Producer accuracy  0.77              0.88  0.88  0.88
Note that in addition to the decision images for each method, we also provide confidence and stability images. The confidence image represents, at each pixel, the maximum confidence over all classes at this point (i.e., the confidence degree of the decided class). The stability image is computed as the difference between the two highest confidence degrees (i.e., the confidence in the decided class and the confidence in the second most likely class). If the stability is high, there is no doubt about the decision (one class is well distinguished from all others). If it is low, two classes are very close to each other in terms of confidence, so the decision should be considered carefully. The confidence and stability images can be multiplied to provide a global image evaluating the quality of the classification at each point.
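The confidence and stability images can be computed directly from the per-class confidence stack. A minimal sketch, assuming a numpy array of shape (n_classes, height, width) holding each class's confidence image (names are illustrative):

```python
import numpy as np

def confidence_stability(conf_stack):
    """confidence: highest per-pixel confidence (that of the decided class);
    stability: gap between the two highest confidences per pixel;
    their product gives a global per-pixel quality image."""
    s = np.sort(conf_stack, axis=0)      # sort class confidences at each pixel
    confidence = s[-1]
    stability = s[-1] - s[-2]
    return confidence, stability, confidence * stability
```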
Conclusion

After a detailed bibliographical analysis and general considerations regarding the SMART project, several numerical fusion approaches were specified, adapted to the available data and classifiers or detectors. These approaches are to a large extent original. Results have been detailed for the three most promising approaches. We have shown how the results can be improved by introducing knowledge into the fusion process. For instance, knowledge about roads has two parts: one comes from the images and provides mainly the location, and the other comes from field missions and provides mainly the road width. A spatial regularization further improves the results. The final results are at least as good as those provided for each class by the best classifier for that class. Thus, they are globally better than any input classifier or detector. This clearly shows the improvement brought by fusion. Implementation issues have been addressed too, and a specific implementation is proposed to reduce memory cost and computational burden. The user has the possibility of intervening in the choice of classifiers and some of the parameters. The methods are to some extent specific to the actual SMART data. However, the programs are very general and can be used in any other application of belief-function-based data fusion without further work. The most crucial point is to define how to use the output of the classifiers as input to the fusion. This requires knowledge of the behavior of the classifiers and of the type of information they provide on each class (or disjunction of classes), in order to choose the appropriate focal elements and mass functions. Although some help can be found in the confusion matrices, some supervision may be needed at this step. The relative weight to be given to each classifier output with respect to the others belongs to the same class of problems.
This requires moderate additional work for each new application, since several trials with different parameters are expected to be necessary. However, the work needed will be much smaller than the work presented here. The large amount of work done for this fusion module will certainly be useful in many other applications, even in quite different domains, and therefore constitutes a large set of methods and tools for both research and applied work.
Acknowledgements

The authors wish to thank the members of the SMART consortium who provided their processing results as inputs for the fusion.
References
[1] I. Bloch and N. Milisavljević, Report on Possible Fusion Strategies in SMART, Technical Report, 2003.
[2] Y. Yvinec, European Project of Remote Detection: SMART in a Nutshell, in Proc. of Robotics and Mechanical Assistance in Humanitarian Demining and Similar Risky Interventions, Brussels-Leuven, Belgium, 2004.
[3] P. Smets, The Transferable Belief Model for Uncertainty Representation, Technical Report TR/IRIDIA/95-23, IRIDIA, Université Libre de Bruxelles, Brussels, Belgium, 1995.
[4] L. A. Zadeh, Fuzzy Sets, Information and Control, 8: 338-353, 1965.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
The DSmT Approach for Information Fusion
J. DEZERT a,*, F. SMARANDACHE b
a Office National d’Etudes et Recherches Aérospatiales, Chatillon, France
b University of New Mexico, Gallup, NM, USA

Let Θ = {θ1, ..., θn} denote the frame of the fusion problem, consisting of a finite set of n exhaustive elements θi, i = 1, ..., n. The hyper-power set DΘ is defined as the set of all composite propositions built from the elements of Θ with the ∪ and ∩ operators such that:
1. ∅, θ1, ..., θn ∈ DΘ;
2. if A, B ∈ DΘ, then A ∩ B ∈ DΘ and A ∪ B ∈ DΘ;
3. no other elements belong to DΘ, except those obtained by rules 1 or 2.
For |Θ| = n ≥ 1, one has |DΘ| ≥ |2Θ| and |DΘ| ≤ 2^(2^n).

The free DSm model Mf(Θ) assumes no integrity constraints between the elements θi of Θ; a hybrid DSm model M(Θ) is obtained by introducing exclusivity or non-existential constraints on some elements of DΘ.

From a frame Θ, a generalized basic belief assignment is defined as a map m(.): DΘ → [0, 1] such that

m(∅) = 0 and Σ_{A∈DΘ} m(A) = 1,

where m(A) is the generalized basic belief mass of A. The generalized belief and plausibility of A are

Bel(A) = Σ_{B⊆A, B∈DΘ} m(B) and Pl(A) = Σ_{B∩A≠∅, B∈DΘ} m(B).

Under the free model Mf(Θ), the classic DSm rule of combination of two sources m1(.) and m2(.) is, ∀C ∈ DΘ:

m_Mf(Θ)(C) ≡ m(C) = Σ_{A,B∈DΘ, A∩B=C} m1(A) m2(B).

For a hybrid model M(Θ), the hybrid DSm rule of combination of k ≥ 2 sources is, ∀A ∈ DΘ:

m_M(Θ)(A) = φ(A) · [S1(A) + S2(A) + S3(A)],

where φ(A) is the characteristic non-emptiness function of the set A: φ(A) = 1 if A ∉ ∅ and φ(A) = 0 otherwise, with ∅ ≜ {∅M, ∅}, ∅M being the set of all elements of DΘ forced to be empty by the constraints of the model M, and

S1(A) ≡ m_Mf(Θ)(A) = Σ_{X1,X2,...,Xk∈DΘ, X1∩X2∩...∩Xk=A} Π_{i=1}^{k} mi(Xi),

S2(A) = Σ_{X1,X2,...,Xk∈∅, [U=A]∨[(U∈∅)∧(A=It)]} Π_{i=1}^{k} mi(Xi),

S3(A) = Σ_{X1,X2,...,Xk∈DΘ, u(c(X1∩X2∩...∩Xk))=A, X1∩X2∩...∩Xk∈∅} Π_{i=1}^{k} mi(Xi),

with U ≜ u(X1) ∪ ... ∪ u(Xk), where u(X) is the union of all θi composing X, c(X) is the canonical form of X (e.g. if X = (A ∪ B) ∩ C ∩ (A ∪ C), then c(X) = (A ∪ B) ∩ C since C ∩ (A ∪ C) = C), and It ≜ θ1 ∪ ... ∪ θn is the total ignorance. S1(A) corresponds to the classic DSm rule on the free model Mf(Θ); S2(A) transfers the mass of absolutely empty sets to the total or relative ignorances; S3(A) transfers the mass of relatively empty intersections to the corresponding unions.

These rules extend to imprecise (admissible) belief assignments mI(.), whose values are subsets of [0, 1] rather than numbers, such that there exist m(X) ∈ mI(X), X ∈ DΘ, with Σ_{X∈DΘ} m(X) = 1. Using the addition of sets X1 ⊞ X2 ≜ {x | x = x1 + x2, x1 ∈ X1, x2 ∈ X2} and the multiplication of sets X1 ⊡ X2 ≜ {x | x = x1 · x2, x1 ∈ X1, x2 ∈ X2}, one gets, ∀A ≠ ∅ ∈ DΘ:

mI_Mf(Θ)(A) = Σ_{X1,X2,...,Xk∈DΘ, X1∩X2∩...∩Xk=A} Π_{i=1,...,k} mIi(Xi).

Let G denote either 2Θ or DΘ, depending on the underlying model on the frame Θ = {θ1, θ2, ..., θn}, and let m1, m2: G → [0, 1] be two basic belief assignments with Σ_{X∈G} mi(X) = 1, i = 1, 2. The conjunctive consensus of s ≥ 2 sources is, ∀X ∈ G:

m12...s(X) = Σ_{X1,...,Xs∈G, X1∩...∩Xs=X} Π_{i=1}^{s} mi(Xi),

and the total degree of conflict between two sources is

k12 = Σ_{X1,X2∈G, X1∩X2=∅} m1(X1) m2(X2) ∈ [0, 1].

The proportional conflict redistribution rule no. 5 (PCR5) redistributes each partial conflicting mass to the elements involved in that partial conflict, proportionally to their masses. For two sources, ∀X ∈ G \ {∅}:

m_PCR5(X) = m12(X) + Σ_{Y∈G\{X}, c(X∩Y)=∅} [ m1(X)² m2(Y) / (m1(X) + m2(Y)) + m2(X)² m1(Y) / (m2(X) + m1(Y)) ],

where c(x) denotes the canonical form of x and m12(.) is the conjunctive consensus of the two sources.
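As an illustration of the PCR5 rule, here is a minimal sketch for two sources whose focal elements are exclusive singletons, so that every intersection of distinct hypotheses is empty. The function name and frame are illustrative, not part of the original text.

```python
def pcr5(m1, m2, frame):
    """PCR5 combination of two basic belief assignments over exclusive
    singleton hypotheses: the conjunctive consensus, plus each partial
    conflict m1(X)m2(Y) redistributed to X and Y proportionally to their
    masses."""
    # Conjunctive consensus: only X ∩ X = X is non-empty for singletons
    m = {x: m1.get(x, 0.0) * m2.get(x, 0.0) for x in frame}
    for x in frame:
        for y in frame:
            if x == y:
                continue  # X ∩ Y = ∅ only for distinct singletons
            a, b = m1.get(x, 0.0), m2.get(y, 0.0)
            if a + b > 0:
                m[x] += a * a * b / (a + b)   # share of conflict m1(X)m2(Y) going to X
            a, b = m2.get(x, 0.0), m1.get(y, 0.0)
            if a + b > 0:
                m[x] += a * a * b / (a + b)   # share of conflict m2(X)m1(Y) going to X
    return m
```

Unlike Dempster's normalization, no mass is discarded: the combined masses still sum to 1, and each partial conflict is returned only to the two hypotheses that generated it.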
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Multitarget Tracking Applications of Dezert-Smarandache Theory
A. TCHAMOVA 1,2, J. DEZERT 3, T. SEMERDJIEV 1, P. KONSTANTINOVA 1
1 Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria
3 Office National d’Etudes et Recherches Aérospatiales, Chatillon, France
Abstract. The objective of this study is to present two multitarget tracking applications based on Dezert-Smarandache Theory (DSmT) for plausible and paradoxical reasoning: (1) target tracking in a cluttered environment with generalized data association, incorporating the advanced concept of generalized (kinematics and attribute) data association to improve track maintenance performance in complicated situations (closely spaced and/or crossing targets), when kinematics data are insufficient for correct decision making; (2) estimation of target behavior tendencies, developed on the principles of DSmT applied to conventional passive radar amplitude measurements, which serve as evidence for the corresponding decision-making procedures. The aim is to present and to demonstrate the ability of DSmT to improve the decision-making process and to assure awareness about the tendencies of target behavior in case of discrepancies in measurement interpretation. Keywords. Dezert-Smarandache Theory, Multitarget Tracking, Attribute Data Fusion, Generalized Data Association, Decision Making under Uncertainty.
Introduction

An important function of each radar surveillance system in a cluttered environment is to maintain and improve the performance of target track maintenance. This becomes a crucial and challenging problem, especially in complicated situations with closely spaced and/or crossing targets. The design of a modern multitarget tracking (MTT) algorithm [4,5,6] in a real-life stressful environment motivates the incorporation of advanced concepts for generalized data association. To resolve correlation ambiguities and to select the best observation-track pairings, in this first application of DSmT, a particular generalized data association approach is proposed and incorporated in an MTT algorithm. This approach allows the introduction of target attributes into the association logic, based on the general DSm rule of combination. Estimation of target behavior tendencies is an important subject related to angle-only tracking systems, which are based on passive sensors. These systems tend to be less precise than those based on active sensors, but one important advantage is their stealth. In the single sensor case, only the direction of the target as an axis is known, but the true target position and behavior (approaching or receding) remain unknown. A number of developed tracking
1 Corresponding Author: Albena Tchamova, Bulgarian Academy of Sciences, Institute for Parallel Processing, “Acad. G. Bonchev” Str., Bl. 25-A, 1113 Sofia, Bulgaria; e-mail: [email protected].
2 This work is partially supported by MONT grants I-1205/02, I-1202/02 and by the Center of Excellence BIS21++.
techniques operating on angle-only measurements use additional information. We utilize the measured amplitude values of the emitter in consecutive scans. This information can be used to assess tendencies in a target’s behavior and, consequently, to improve the overall angle-only tracking performance. The aim of this application is to present and to demonstrate the ability of DSmT to successfully finalize the decision-making process and to ensure awareness about a target’s behavior tendencies in case of discrepancies in the interpretation of angle-only measurements. The results are presented and compared in detail with the respective ones drawn from the fuzzy logic approach in the companion papers [3,7]. DSmT [1,2,3] proposes a new general mathematical framework for solving fusion problems. It overcomes the practical limitations of Dempster-Shafer Theory (DST), which come essentially from the acceptance of the law of the excluded middle.3 DSmT is an extension of probability theory and DST.
1. Target Tracking in Cluttered Environment with Generalized Data Association based on the General DSm Rule of Combination

1.1. Basic Elements of the Tracking Process

The tracking process consists of two basic elements: data association and track filtering. The goal of the first element is to correctly associate observations with existing tracks. To eliminate unlikely observation-track pairings, a validation gate is formed around the predicted track position. Measurements in the gate are candidates for association with the corresponding track. The tracking filter used is the first-order extended Kalman filter [4,5]. We assume Gaussian distributed measurements. One defines a threshold constant G for the gate such that correlation is allowed if the relationship $d_{ij}^2 \le G$ is satisfied, where $d_{ij}$ is the norm of the residual vector.

1.2. Generalized Data Association

When attribute data are available, a generalized probability can be used to improve the assignment. In view of the independence of kinematic and attribute measurement errors, the generalized probability for measurement j originating from track i is:

$P_{gen}(i,j) = P_k(i,j) \cdot P_a(i,j)$

where $P_k(i,j)$ and $P_a(i,j)$ are the kinematic and attribute probability terms. We choose the set of assignments that ensures a maximum of the total generalized probability sum, i.e., we use the solution of the assignment problem $\min \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}\,\chi_{ij}$, where $\chi_{ij} \in \{0,1\}$ indicates the chosen pairings. Because our
3 It is a basic theorem of propositional logic, written $P \vee \neg P$.
probabilities vary between 0 and 1, the elements of the particular assignment matrix are defined as $a_{ij} = 1 - P_k(i,j) \cdot P_a(i,j)$ to satisfy the minimization condition.

1.3. The Fuzzification Interface

The fuzzification interface [3, pp. 307] transforms crisp measurements into a fuzzy set. The input variable is the Radar Cross Section (RCS) of the observed targets, which is treated as a linguistic variable. The modeled RCS data [7] are analyzed with the subsequent declaration of a specified type (Fighter, Cargo) or False Alarm. Bearing this in mind, we define two frames of the problem: first, the size of the RCS, $\Theta_1 = \{VerySmall(VS), Small(S), Big(B)\}$, and second, the corresponding target type, $\Theta_2 = \{FalseAlarm(FA), Fighter(F), MilitaryCargo(C)\}$. The RCS for real targets is modelled as a Swerling 3 type function; for False Alarms, as a Swerling 2.

1.4. Tracks' Updating Procedures

1.4.1. Using the Classical DSm Rule of Combination

The classical DSm combination rule is used for track updating:

$m_{upd}^{ij}(C) = [m_{his}^{i} \oplus m_{mes}^{j}](C) = \sum_{A,B \in D^{\Theta_1},\, A \cap B = C} m_{his}^{i}(A) \cdot m_{mes}^{j}(B)$

where $m_{upd}^{ij}$ represents the generalized basic belief assignment (gbba) of track i updated with the new observation j; $m_{his}^{i}$ and $m_{mes}^{j}$ are respectively the gbba vectors of track i's history and of the new observation. DSmT takes into account and utilizes the paradoxical information hidden in the non-empty intersections $VS \cap S \cap B$, $VS \cap S$, $VS \cap B$, $S \cap B$.

1.4.2. Using the Hybrid DSm Rule of Combination

The RCS data are used to analyze and subsequently determine the type of the observed target, so the target's type represents the second frame of the problem, $\Theta_2 = \{FalseAlarm(FA), Fighter(F), MilitaryCargo(C)\}$. We consider the following relationships:
• If RCS is Very Small then the target is a False Alarm
• If RCS is Small then the target is a Fighter
• If RCS is Big then the target is a Cargo
We transform the updated tracks' gbba $m_{upd}^{ij}(C)$, $C \in D^{\Theta_1}$, into the respective gbba defined on $D^{\Theta_2}$.
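The classical DSm rule above can be sketched directly. In this illustrative snippet (all mass values hypothetical), an element of D^Θ1 that is an intersection of atoms is encoded as the frozenset of those atoms, so intersecting two focal elements simply unions their label sets:

```python
def dsm_classic(m1, m2):
    """Classical DSm combination for gbbas whose focal elements are
    intersections of atoms of Theta_1 = {VS, S, B}.  An element such as
    VS ^ S is encoded as frozenset({'VS', 'S'}); intersecting two
    elements unions their atom sets.  No mass is ever discarded."""
    out = {}
    for A, mA in m1.items():
        for B, mB in m2.items():
            C = A | B  # intersection of propositions = union of atom labels
            out[C] = out.get(C, 0.0) + mA * mB
    return out

VS, S, B = frozenset({'VS'}), frozenset({'S'}), frozenset({'B'})
m_his = {VS: 0.4, S: 0.4, B: 0.2}   # track attribute history
m_mes = {VS: 0.5, S: 0.3, B: 0.2}   # new attribute measurement
m_upd = dsm_classic(m_his, m_mes)
print(round(m_upd[VS | S], 3))          # mass moved onto the paradox VS ^ S
print(round(sum(m_upd.values()), 3))    # total mass stays 1.0
```

Note how the conflicting products (e.g., VS from the history against S from the measurement) are retained on the intersection VS ∩ S instead of being renormalized away.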
Here the following exclusivity constraints are introduced: $FA \cap F \equiv \emptyset$, $FA \cap C \equiv \emptyset$, $F \cap C \equiv \emptyset$, $FA \cap F \cap C \equiv \emptyset$. We update the previous fusion result, obtained via the classical DSm rule, with this new information on the model of $\Theta_2$, and solve with the DSm hybrid rule [3], which transfers the mass of the empty sets to the non-empty sets of $D^{\Theta_2}$.
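A deliberately simplified sketch of this step (the full hybrid DSm rule handles more cases; see [3]): the gbba is relabeled from Θ1 to Θ2 via the rule base, and mass sitting on an intersection emptied by the exclusivity constraints is moved to the corresponding union. All mass values are hypothetical:

```python
def to_theta2(m_theta1):
    """Relabel a gbba from Theta_1 (RCS size) to Theta_2 (target type)
    via the rule base VS->FA, S->F, B->C.  Input frozensets denote
    intersections on Theta_1; under the exclusivity constraints
    FA^F = FA^C = F^C = empty, the mass of an emptied intersection is
    transferred to the corresponding union (output frozensets with more
    than one atom are read as unions on Theta_2).  This is a simplified
    stand-in for the hybrid DSm rule's mass transfer [3]."""
    relabel = {'VS': 'FA', 'S': 'F', 'B': 'C'}
    out = {}
    for elem, mass in m_theta1.items():
        mapped = frozenset(relabel[a] for a in elem)
        out[mapped] = out.get(mapped, 0.0) + mass
    return out

m1 = {frozenset({'VS'}): 0.5, frozenset({'S'}): 0.2,
      frozenset({'VS', 'S'}): 0.3}          # VS ^ S carries paradoxical mass
m2 = to_theta2(m1)
print(round(m2[frozenset({'FA'})], 3))        # 0.5
print(round(m2[frozenset({'FA', 'F'})], 3))   # 0.3, now read as FA u F
```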
1.5. The Generalized Data Association (GDA) Algorithm

We consider particular clusters and sets of n tracks and m received observations at the current scan. The validation test is used for filling the assignment matrix. We solve the assignment problem by the extension of the Munkres algorithm [8]. The JPDA approach [4] is used to produce the probability terms $P_k$ and $P_a$. To define the probabilities for data association, the following steps are implemented: (1) check gating; (2) clustering; (3) for each cluster: (3.1) generate hypotheses following a depth-first search procedure; (3.2) compute hypothesis probabilities for the kinematic and attribute contributions; (3.3) fill the assignment matrix and solve the assignment problem.

1.5.1. Attribute Probability Term for Generalized Data Association

Calculating the attribute probability term follows the joint probabilistic approach:

$P''(H_l) = \prod_{(i,j):\, i \neq 0,\, j \neq 0,\, (i,j) \in H_l} d_e(i,j)$

where

$d_e(i,j) = \sqrt{\sum_{C \in D^{\Theta_1}} [m_i(C) - m_j(C)]^2}$

is the Euclidean distance between $m_i(C)$, the predicted bba of C from the track history of target i, and $m_j(C)$, the bba of C from attribute measurement j. The corresponding normalized hypothesis probabilities are obtained as:

$P_a(H_l) = \frac{P''(H_l)}{\sum_{l=1}^{N_H} P''(H_l)}$

where $N_H$ is the number of hypotheses. To compute $P_a'(i,j)$, a sum is taken over the probabilities of the hypotheses in which the assignment (i,j) occurs. Because the Euclidean distance is inversely proportional to the probability of association, the probability $P_a(i,j) = 1 - P_a'(i,j)$ is used to match the corresponding $P_k(i,j)$.
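The attribute term can be illustrated as follows. For brevity the sum runs over singletons only, whereas in the chapter it runs over all of D^Θ1, and all bba values are hypothetical:

```python
from math import sqrt

def d_e(m_i, m_j, elements):
    """Euclidean distance between two bba vectors."""
    return sqrt(sum((m_i.get(C, 0.0) - m_j.get(C, 0.0)) ** 2 for C in elements))

def normalize(scores):
    """Normalize raw hypothesis scores P''(H_l) into probabilities Pa(H_l)."""
    total = sum(scores)
    return [s / total for s in scores]

elements = ['VS', 'S', 'B']                  # simplified: singletons only
m_track = {'VS': 0.6, 'S': 0.3, 'B': 0.1}    # predicted bba from track history
m_obs1  = {'VS': 0.5, 'S': 0.4, 'B': 0.1}    # attribute bba, observation 1
m_obs2  = {'VS': 0.1, 'S': 0.2, 'B': 0.7}    # attribute bba, observation 2

scores = [d_e(m_track, m_obs1, elements), d_e(m_track, m_obs2, elements)]
pa = normalize(scores)
# Smaller distance => better attribute match, hence Pa(i,j) = 1 - Pa'(i,j)
print(1 - pa[0] > 1 - pa[1])  # True: observation 1 matches the track better
```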
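The assignment step of the GDA algorithm can also be sketched in a few lines. This illustrative snippet (all probability values hypothetical) forms the cost matrix a_ij = 1 − P_k(i,j)·P_a(i,j) and, as a brute-force stand-in for the extended Munkres algorithm [8], enumerates all one-to-one pairings:

```python
from itertools import permutations

def best_assignment(Pk, Pa):
    """Pick the observation-to-track assignment minimizing the sum of
    costs a_ij = 1 - Pk[i][j] * Pa[i][j] (n tracks, n observations)."""
    n = len(Pk)
    cost = [[1.0 - Pk[i][j] * Pa[i][j] for j in range(n)] for i in range(n)]
    best, best_perm = float("inf"), None
    # Brute force over all one-to-one pairings (stand-in for Munkres)
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best_perm, best

# Two tracks, two observations: the generalized probabilities favor
# pairing track 0 with observation 1 and track 1 with observation 0
Pk = [[0.2, 0.9], [0.8, 0.1]]
Pa = [[0.3, 0.8], [0.9, 0.2]]
pairing, total_cost = best_assignment(Pk, Pa)
print(pairing)  # (1, 0)
```

Minimizing the sum of a_ij is equivalent to maximizing the total generalized probability sum, which is the condition stated in Section 1.2.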
1.6. Simulation Scenarios

Scenario 1 [3, pp. 317] consists of two air targets (Fighter, Cargo) in clutter and a stationary sensor at the origin with Tscan = 5 s and measurement standard deviations of 0.3 deg in azimuth and 60 m in range. The targets move from east to west with a constant velocity of 250 m/s. The headings of the fighter and cargo are 225 deg and 315 deg from north, respectively. During the 11th-14th scans, the targets perform maneuvers with 2.5 g. Scenario 2 [3, pp. 317] consists of four air targets (alternating Fighter, Cargo, Fighter, Cargo) moving with a constant velocity of 100 m/s. The initial heading is 155 deg from north. The targets make maneuvers with 0.85 g (right, left, right turns).

1.7. Comparative Analysis of the Results Obtained Using Kinematics Only, Dezert-Smarandache and Dempster-Shafer Theory

The incorporated concept of Generalized Data Association (GDA) improves track maintenance performance, especially in complicated situations (closely spaced and/or crossing targets). It influences the obtained track purity results [3, pp. 318-321]: track purity increases using DSmT as compared with DST. Analyzing all these results, the following can be underlined:
• DSmT allows paradoxical information to be processed and utilized in a flexible manner. This paradoxical information is peculiar to the problem of multiple target tracking in clutter, where the conflicts between the bodies of evidence often become high and critical. In this way it contributes to a better understanding of the overall tracking situation and to producing adequate decisions. By processing the paradoxes, the estimated entropy in the confirmed tracks' attribute histories decreases during the consecutive scans.
• Because of the Swerling-type modelling, observations for False Alarms, Fighters and Cargo are mixed. This causes some conflict between the generalized basic belief assignments of the described bodies of evidence. When the conflict becomes unity, it leads to indefiniteness in Dempster's rule; consequently the fusion process cannot be realized and the whole MTT process becomes corrupted.
• If a new measurement leads to an update of a track's attributes in which some particular hypothesis is supported with unity, after that point Dempster's rule becomes indifferent to any other measurements in the following scans. This means the track's attribute history remains the same regardless of the received observations, which leads to incoherent and inadequate association decisions.
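The contrast with Dempster's rule can be made concrete. With two totally conflicting singleton bbas, the normalization constant in Dempster's rule vanishes and the combination is undefined, while the classical DSm rule simply places the mass on the paradoxical intersection. A minimal sketch, with Shafer's model emulated by listing the empty intersections explicitly:

```python
def dempster(m1, m2, empty_pairs):
    """Dempster's rule for singleton bbas under Shafer's model:
    conflicting mass is renormalized away; undefined when the
    total conflict k reaches 1."""
    k = sum(m1[a] * m2[b] for a in m1 for b in m2 if (a, b) in empty_pairs)
    if k == 1.0:
        raise ZeroDivisionError("total conflict: Dempster's rule undefined")
    out = {}
    for a in m1:
        for b in m2:
            if (a, b) not in empty_pairs and a == b:
                out[a] = out.get(a, 0.0) + m1[a] * m2[b] / (1.0 - k)
    return out

# Two fully contradictory attribute sources: m1 says Fighter, m2 says Cargo
m1, m2 = {'F': 1.0}, {'C': 1.0}
empty = {('F', 'C'), ('C', 'F')}   # F ^ C = empty under Shafer's model
try:
    dempster(m1, m2, empty)
except ZeroDivisionError:
    print("Dempster: fusion impossible")
# The classical DSm rule instead keeps the mass on the paradox F ^ C:
print({'F^C': m1['F'] * m2['C']})  # {'F^C': 1.0}
```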
2. Estimation of Target Behavior Tendencies Using DSmT

2.1. Approach for Behavior Tendency Estimation

The block diagram of the target behavior tracking system [3, pp. 291] maintains two single-model-based Kalman-like filters using two models of target behavior: Approaching and Receding. The tendency prediction is based on Zadeh's
compositional rule [9,10,11]. The updating procedure uses the DSm rule to estimate the target behavior states.

2.2. Fuzzification Interface

A decisive variable in our task is the amplitude value transmitted from the emitter and received at consecutive time moments. We use the fuzzification interface [3, pp. 292] that maps these values into two fuzzy sets over the frame $\Theta = \{Small(S), Big(B)\}$. Their membership functions rely on the inverse proportionality between the measured amplitude value and the corresponding distance to the target.

2.3. Behavior Models

We consider two target behavior models: Approaching, characterized as a stable process in which the amplitude value gradually increases, and Receding, characterized as a stable process in which the amplitude value gradually decreases. The following rule bases conform to these models:

Behavior Model 1: Approaching Target
Rule 1: IF A(k) is Small THEN A(k+1) is Small
Rule 2: IF A(k) is Small THEN A(k+1) is Big
Rule 3: IF A(k) is Big THEN A(k+1) is Big

Behavior Model 2: Receding Target
Rule 1: IF A(k) is Big THEN A(k+1) is Big
Rule 2: IF A(k) is Big THEN A(k+1) is Small
Rule 3: IF A(k) is Small THEN A(k+1) is Small

The models are derived as fuzzy graphs, in which the Larsen product operator is used for fuzzy conjunction, the maximum for fuzzy union, and Zadeh's max-min rule for composition [9]. The resulting fuzzy relations, Relation 1 (Approaching Target, $k \to k+1$) and Relation 2 (Receding Target, $k \to k+1$), are $4 \times 4$ relational matrices over the elements $\{S, S \cup B, B, S \cap B\}$; their numerical values are given in [3].
2.4. Models' Conditioned Attribute State Prediction

At the initial time moment k, the target is characterized by the fuzzified amplitude state estimates according to the models, $\mu_{A}^{App}(k|k)$ and $\mu_{A}^{Rec}(k|k)$. Using them and applying the Zadeh max-min compositional rule to Relations 1 and 2, we obtain the models' conditioned amplitude state predictions for time moment k+1, i.e.:

$\mu_{A}^{App}(k+1|k) = \max\min\big[\mu_{A}^{App}(k|k),\ \mu^{App}(k \to k+1)\big]$

$\mu_{A}^{Rec}(k+1|k) = \max\min\big[\mu_{A}^{Rec}(k|k),\ \mu^{Rec}(k \to k+1)\big]$
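The prediction step is a standard max-min composition. The sketch below uses a hypothetical 2×2 relation over {S, B} purely for illustration; the chapter's actual relations are 4×4 over {S, S∪B, B, S∩B} and are taken from [3]:

```python
def max_min_compose(mu, R, elements):
    """Zadeh max-min composition: predicted membership
    mu_pred(j) = max over i of min(mu(i), R[i][j])."""
    return {j: max(min(mu[i], R[i][j]) for i in elements) for j in elements}

# Hypothetical 2x2 relation over {S, B} for an approaching target
# (illustrative values only, not the chapter's relational matrices)
elements = ['S', 'B']
R_app = {'S': {'S': 1.0, 'B': 0.2},
         'B': {'S': 0.0, 'B': 1.0}}
mu_now = {'S': 0.8, 'B': 0.3}          # fuzzified amplitude at scan k
mu_pred = max_min_compose(mu_now, R_app, elements)
print(mu_pred)  # {'S': 0.8, 'B': 0.3}
```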
2.5. Attribute State Updating Using DSmT

The updating procedure uses the DSm combination rule:

$m_{upd}^{App/Rec}(C) = [m_{pred}^{App/Rec} \oplus m_{mes}](C) = \sum_{A,B \in D^{\Theta},\, A \cap B = C} m_{pred}^{App/Rec}(A) \cdot m_{mes}(B)$
DSmT takes into account and utilizes the paradoxical information hidden in the non-empty set $S \cap B$. This information refers to a moving target residing in an overlapping region, where it is hard to properly predict the tendency in its behavior.

2.6. The Decision Criterion

The decision criterion for estimating the plausibility of the models is based on the evolution of the generalized pignistic entropies [3] associated with the updated amplitude states:

$H_{pig}(P_{upd}^{M}) = -\sum_{A \in \Theta} P_{upd}^{M}\{A\} \ln P_{upd}^{M}\{A\}$

The correct model corresponds to the smallest entropy value among these entropies.

2.7. Simulation Study

A simulation scenario [3, pp. 296] is developed for a single target trajectory in plane coordinates and for constant-velocity movement. The target's start point and velocities are $X_0 = 5\,km$, $Y_0 = 10\,km$, $\dot{X} = 100\,m/s$, $\dot{Y} = 100\,m/s$. The time sampling rate is $T = 10\,s$. The measured amplitude value is a random Gaussian distributed process.
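The decision step reduces to computing entropies of the updated pignistic probabilities and taking the minimum; the probability values below are illustrative only:

```python
from math import log

def pignistic_entropy(p):
    """H_pig = -sum of p(A) * ln p(A) over the pignistic probabilities."""
    return -sum(v * log(v) for v in p.values() if v > 0.0)

# Updated pignistic probabilities of the two behavior models (illustrative)
p_app = {'S': 0.1, 'B': 0.9}   # Approaching model: concentrated -> low entropy
p_rec = {'S': 0.5, 'B': 0.5}   # Receding model: maximally uncertain
models = {'Approaching': pignistic_entropy(p_app),
          'Receding': pignistic_entropy(p_rec)}
best = min(models, key=models.get)   # smallest entropy wins
print(best)  # Approaching
```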
2.8. Comparison between the DSmT and Fuzzy Logic (FL) Approaches

Both the DSmT and FL approaches deal with a frame of discernment based, in general, on imprecise/vague notions and concepts. DSmT allows dealing with rational, uncertain or paradoxical data by operating on the hyper-power set. In our particular application, DSmT provides an opportunity for flexible tracking through the overlapping region $S \cap B$. The DSmT-based behavior estimates can be characterized as noise resistant, while FL needs an additional noise reduction procedure to produce 'smoothed' behavior estimates.
References
[1] Dezert J., "Foundations for a new theory of plausible and paradoxical reasoning," Information & Security: An International Journal, Tzv. Semerdjiev (Ed.), CLPP, BAS, Vol. 9, 2002.
[2] Dezert J. and F. Smarandache, "On the generation of hyper-powersets for the DSmT," Proceedings of the 6th International Conference on Information Fusion, Cairns, Australia, July 8-11, 2003.
[3] Smarandache F. and J. Dezert (Eds.), Advances and Applications of DSmT for Information Fusion, American Research Press, Rehoboth, 2004.
[4] Blackman S., Multiple-Target Tracking with Radar Applications, Artech House, 1986.
[5] Blackman S. and R. Popoli, Design and Analysis of Modern Tracking Systems, Norwood, MA, Artech House, 1999.
[6] Bar-Shalom Y. (Ed.), Multitarget-Multisensor Tracking: Advanced Applications, Artech House, 1990.
[7] Benchmark Problem for Radar Resource Allocation and Tracking Maneuvering Targets in the Presence of ECM, Technical Report NSWCDD/TR-96/10.
[8] Bourgeois F. and J.-C. Lassalle, "An extension of the Munkres algorithm for the assignment problem to rectangular matrices," Communications of the ACM, Vol. 14, Dec. 1971, pp. 802-806.
[9] Zadeh L., "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and Systems, 1978, 1, pp. 3-28.
[10] Zadeh L., "From computing with numbers to computing with words - from manipulation of measurements to manipulation of perceptions," IEEE Trans. on Circuits and Systems, Jan. 1999, 45, 1, pp. 105-119.
[11] Mendel J., "Fuzzy logic systems for engineering: a tutorial," Proceedings of the IEEE, March 1995, pp. 345-377.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Image Registration: A Tutorial Pramod K VARSHNEY a, 1, Bhagavath KUMAR a, Min XU a, Andrew DROZD b and Irina KASPEROVICH b a EECS Department, Syracuse University, Syracuse, New York, USA b ANDRO Computational Solutions, Rome, NY, USA
Introduction
Multiple imaging sensors are increasingly being used in a variety of imaging applications. Image registration is a necessary preprocessing task for all such systems. The images involved in these applications may be taken at different times, using different sensors and from different viewpoints. Such a wide variation in the type of imagery makes the problem of image registration nontrivial. Also, with the increase in the number of sensor types, the application of multi-modal images has become very popular. There is a current need to develop an in-depth understanding of image registration, and the goal of this chapter is to provide a tutorial exposition of the topic. For a more detailed discussion, the reader is referred to [1, 2]. Image registration is essentially the process of aligning or overlaying two or more images of the same scene. Application areas where image registration is required include remote sensing, medical image analysis, cartography, pattern recognition and computer vision. Remote sensing applications involve environmental monitoring (pollution), change detection (urban studies, forestry), oil and mineral exploration, weather forecasting, target location, planetary observation and the integration of information into Geographic Information Systems (GIS). Medical applications involve combining different modalities of images for biomedical research, tumor detection and other medical analyses.
1. Definition of Image Registration

Let the two images to be registered be R and F (the reference and floating images), each being a 2-D scalar function of the pixel location (x, y). Let T be the spatial transformation required to align them and G be the radiometric transformation required to equate the intensities of the two images. Image registration can be defined as the mapping between R(x, y) and F(x, y), expressed as

R(x, y) = G(F(T(x, y)))    (1)

The process of image registration essentially involves the estimation of the transformation T. One can model the problem of registration as an optimization
1 Corresponding Author, Email: [email protected]
P.K. Varshney et al. / Image Registration: A Tutorial
problem, where T is the argument of the optimum of some similarity metric S applied to R and the transformed floating image F_T. This can be expressed as in Eq. (2):

T = arg opt S(R, F_T)    (2)
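Equation (2) can be made concrete with a toy exhaustive search. Here the similarity metric S is the (negated) mean squared difference, T ranges over small integer translations, and all image values are illustrative:

```python
def mse(R, F, dx, dy):
    """Mean squared difference between R and F shifted by (dx, dy),
    computed on the overlap region."""
    h, w = len(R), len(R[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            xs, ys = x - dx, y - dy
            if 0 <= xs < w and 0 <= ys < h:
                total += (R[y][x] - F[ys][xs]) ** 2
                count += 1
    return total / count

def register_translation(R, F, max_shift=2):
    """T = arg opt S(R, F_T): exhaustive search over integer translations,
    with negated MSE playing the role of the similarity metric S."""
    shifts = range(-max_shift, max_shift + 1)
    return min(((dx, dy) for dx in shifts for dy in shifts),
               key=lambda t: mse(R, F, t[0], t[1]))

# F is R shifted right by one pixel
R = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
F = [[0, 0, 0, 0], [9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0]]
print(register_translation(R, F))  # (1, 0)
```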
2. Types of Image Registration

Image registration can be classified into four different classes based on the kind of images and the problem involved [1, 2]. The four classes are:
• Multi-modal registration
• Viewpoint registration
• Temporal registration
• Template matching
The first three typically involve registration of images of the same area, but with distortion resulting from different sensors, different orientations of the sensors and different times of acquisition. The last usually involves identifying the location of a particular smaller template in a large image, if one exists, and is often used in pattern matching and object recognition. Registration problems can also involve a combination of these four classes, e.g., registration of images taken at different times and from different viewpoints.

2.1. Multi-modal Registration

Registration of images of the same scene taken using different sensors is known as multi-modal registration. The main aim of multi-modal image analysis is to integrate information from different sources to obtain enhanced and detailed image information. Images from various kinds of sensors have special properties. For example, panchromatic images have high spatial resolution, multi-spectral images have high spectral resolution, and active sensors like SAR work even at night. In the case of medical images, a CT image of the brain (Figure 1) captures a better view of bones while a Magnetic Resonance Image (MRI) of the brain (Figure 2) provides information about the soft tissues [3]. The fusion of these two images yields a third image which provides an analyst a better understanding of the anatomical structure of the brain. Other areas of application of multi-modal image analysis include remote sensing and video surveillance.
Figure 1. CT image of brain, showing bones
Figure 2. MRI image of brain, showing soft tissues
2.2. Viewpoint Registration

Registration of images of the same scene acquired from different viewpoints is known as viewpoint registration. Images acquired from an aircraft typically have variations in view angle due to the motion of the aircraft. Many SAR sensors, like Side-Looking Airborne Radar (SLAR), use a deliberate acquisition angle to the vertical, which provides better range information and, in general, a larger view. Such images also help in 3-D reconstruction and shape recognition, e.g., in concealed weapon detection (Figure 3) [4, 5], and in the depth recovery used for Digital Elevation Model (DEM) generation. Registration of such images involves assumptions about the viewing geometry and the properties of the surfaces. The perspective distortions need to be accounted for using local transformations. Feature-based algorithms are often used for such registration.
Figure 3. Images taken from two view angles for Concealed Weapon Detection
2.3. Temporal Registration

Registration of images of the same scene taken at different times is known as temporal registration. The analysis of multi-temporal images is used in the detection and evaluation of changes that have taken place between the times of acquisition. In the recent past, due to increased interest in surveillance and disaster management, the evaluation of changes in an area using multi-temporal images [6, 7] has become important. Other remote sensing applications [8, 9] of this analysis include natural resource monitoring, urban growth monitoring and landscape planning. Such
analysis also finds great application in a medical context, as it is useful in detecting and monitoring tumor growth. Registration of multi-temporal images is a problem of dissimilar images: the method should be able to tolerate and differentiate between distortions caused by the actual changes (to be evaluated) and mis-registration of the original images.

2.4. Template Registration

Registration of a template image within a larger image is referred to as template registration. It is essentially a high-level matching of pre-selected features with known properties. It finds applications in various areas, such as remote sensing, where a satellite image is to be registered to a GIS layer or a map. This kind of registration is also referred to as scene-to-model registration, where an image of the scene and a model of the scene are registered. Similarly, one can also register images to a DEM. Such a process is useful for interpreting scenes like airports, battlefields, networks of highways, etc. In medical imaging it is used to compare a patient's image with digital anatomical atlases. It is also used for pattern matching in computer vision, automatic quality inspection, signature verification, and character recognition. For example, in Figure 4, a T-shaped template is to be located in an IC circuit image for automatic quality assurance of the circuit being designed or inspected [1].
Figure 4. Image of an IC Circuit and ‘T’ shaped Template
3. Transformations

Registration techniques involve searching within a certain type of transformation space to find the optimal transformation for a particular problem [1]. Hence, the selection of the type of spatial transformation or mapping is the fundamental characteristic of any image registration technique. The most common transformations used in image registration are:
• Rigid transformation
• Affine transformation
• Projective transformation
• Polynomial transformation
• Radial basis function based transformation
The first two are most commonly used in practice for remote sensing images, but different transformations are useful depending on the conditions. Figure 5 shows various types of transformations applied to a Baboon image.
Figure 5. Transformation Examples: (a) Original Image; (b) Rigid transform; (c) Affine transform; (d) Second order Polynomial transform; (e) Projective transform
3.1. Rigid Transformation

A rigid-body transformation is composed of a combination of translation, rotation, and scaling. For a 2-D image it typically has four parameters, translation in x, translation in y, scaling and rotation (tx, ty, s, θ), which map a point (x1, y1) of the first image to a point (x2, y2) of the second image as
$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} t_x \\ t_y \end{pmatrix} + s \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$    (3)
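Equation (3) applied to a point, as a quick sketch:

```python
from math import cos, sin, pi

def rigid(p, tx, ty, s, theta):
    """Apply the rigid-body transform of Eq. (3):
    translation (tx, ty), scale s, rotation theta."""
    x1, y1 = p
    x2 = tx + s * (cos(theta) * x1 - sin(theta) * y1)
    y2 = ty + s * (sin(theta) * x1 + cos(theta) * y1)
    return (x2, y2)

# Rotate (1, 0) by 90 degrees, scale by 2, shift by (3, 4)
x2, y2 = rigid((1.0, 0.0), 3.0, 4.0, 2.0, pi / 2)
print(round(x2, 6), round(y2, 6))  # 3.0 6.0
```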
For a rigid-body transformation, angles in the original image are preserved after the transformation, and lengths are preserved up to the common scale factor s.

3.2. Affine Transformation

Affine transformations are more general than rigid-body transformations and, therefore, admit more complicated distortions while maintaining some nice mathematical properties. The affine transformation has six parameters and can be written as
$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$    (4)
Affine transformation does not have the properties associated with the orthogonal rotation matrix. Angles and lengths are no longer preserved, but parallel lines remain
parallel. A shear transformation is an example of one type of affine transformation. A shear can act either along the x-axis or along the y-axis. The shear transforms in the x and y directions are represented as

$Shear_x = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, \quad Shear_y = \begin{pmatrix} 1 & 0 \\ b & 1 \end{pmatrix}$    (5)
3.3. Projective Transformation

Projective and perspective transformations account for distortions due to the projection of objects at varying distances from the sensor onto the image plane. The perspective transformation maps from 3-D to 2-D; when the object plane is parallel to the image plane, it reduces to the projective transformation. Let (x_p, y_p) denote plane coordinates and (x_i, y_i) denote image coordinates. The projective transformation is written as

$x_i = \frac{a_{11} x_p + a_{12} y_p + a_{13}}{a_{31} x_p + a_{32} y_p + a_{33}}, \quad y_i = \frac{a_{21} x_p + a_{22} y_p + a_{23}}{a_{31} x_p + a_{32} y_p + a_{33}}$    (6)

3.4. Polynomial Transformation

Polynomial transformations are among the general global transformations (of which affine is the simplest) and can account for many types of distortion, as long as the distortions do not vary too much over the image. Distortion due to moderate terrain relief can often be corrected by a polynomial transformation. However, higher-order polynomials are not usually used in practical applications because they can unnecessarily warp the sensed images [2]. The typical second-order polynomial transformation is represented as
$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \end{pmatrix} \begin{pmatrix} x_1^2 \\ x_1 y_1 \\ y_1^2 \\ x_1 \\ y_1 \\ 1 \end{pmatrix}$    (7)
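Equations (6) and (7) applied to a point; with identity coefficients both transforms reproduce the input, which makes a convenient sanity check (coefficient layouts follow the equations above):

```python
def projective(p, a):
    """Projective transform of Eq. (6); a is the 3x3 coefficient matrix."""
    xp, yp = p
    d = a[2][0] * xp + a[2][1] * yp + a[2][2]
    return ((a[0][0] * xp + a[0][1] * yp + a[0][2]) / d,
            (a[1][0] * xp + a[1][1] * yp + a[1][2]) / d)

def poly2(p, a):
    """Second-order polynomial transform of Eq. (7);
    a is the 2x6 coefficient matrix."""
    x1, y1 = p
    basis = [x1 * x1, x1 * y1, y1 * y1, x1, y1, 1.0]
    return tuple(sum(row[k] * basis[k] for k in range(6)) for row in a)

# Identity coefficients reproduce the input point in both models
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A26 = [[0, 0, 0, 1, 0, 0],   # x2 = x1
       [0, 0, 0, 0, 1, 0]]   # y2 = y1
print(projective((2.0, 3.0), I3))  # (2.0, 3.0)
print(poly2((2.0, 3.0), A26))      # (2.0, 3.0)
```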
3.5. Radial Basis Function-based Transformation

Radial basis function-based transformations are a class of transforms that can handle even locally varying geometric distortions. This transformation has the form of a linear combination of translated radially symmetric functions plus a low-degree polynomial [1]. The radial basis function reflects an important property: the function value at each point depends only on the distance of the point from the control points, not on its particular position [1]. The main radial basis functions used in image registration include multiquadrics, reciprocal multiquadrics, Gaussians, Wendland's functions, and
thin-plate splines. This group of transformations provides good registration accuracy but has a high computational complexity.
4. Interpolation Techniques

Interpolation is the process of estimating intensity values based on neighborhood information. Let us denote the two images that need to be registered as F, the floating image to which a geometric transformation will be applied, and R, the reference image that will be interpolated. When a transformation is applied to F, a new grid is obtained, and an intensity interpolation algorithm is needed to calculate the intensity values in R at every transformed grid point of F. Some of the most commonly used techniques are:
• Nearest neighbor interpolation
• Bi-linear interpolation
• Bi-cubic interpolation
Figure 6. Resulting Images using Different Interpolation Techniques: (a) Original Image; (b) Nearest Neighbor; (c) Bi-linear; (d) Bi-cubic
4.1. Nearest Neighbor Interpolation Nearest neighbor interpolation is the simplest interpolation method. This method uses the digital value from the pixel in R, which is nearest to the new transformed grid point (Figure 7.a), and therefore does not alter the original values. However, it results in some pixel values being duplicated while others are lost. The transformed image may have a disjointed or blocky appearance. Figure 6 shows the resulting image from this interpolation.
Figure 7. Different Interpolation Techniques: (a) Nearest Neighbor; (b) Bi-linear; (c) Bi-cubic
4.2. Bi-linear Interpolation

Bi-linear interpolation takes a weighted average of the four pixels in the original image nearest to the new pixel location (Figure 7.b). Given two points x0 and x1 and the values of the function at these two points equal to y0 and y1, linear interpolation in 1-D amounts to calculating y, the value of the function at any point x in the interval [x0, x1]. From Figure 8, using basic coordinate geometry, we obtain the value y as given in Eq. (8):

$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}(x - x_0)$    (8)
It follows that in the 2-D domain, as in Figure 8, with four known points Q11, Q12, Q21, Q22, we first obtain the value at point R1 by linear interpolation of Q11 and Q21 and the value at point R2 by linear interpolation of Q12 and Q22. Then, by linear interpolation of R1 and R2, we obtain the value at point P.
Figure 8. Linear Interpolation and Bi-linear Interpolation
In bi-linear interpolation, the averaging process alters the original pixel values and creates entirely new digital values in the output image (Figure 6). This may be undesirable if further processing and analysis, such as classification based on spectral response, is to be done.

4.3. Bi-cubic Interpolation

The bi-cubic interpolation method calculates a distance-weighted average of a block of sixteen pixels from the original image surrounding the new output pixel location (Figure 7.c). As with bi-linear interpolation, this method produces completely new pixel values. However, both methods produce images with a much sharper appearance and avoid the blocky appearance of the nearest neighbor method (Figure 6).
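The two-stage interpolation described above translates directly into code; a minimal sketch on a 2×2 image:

```python
def bilinear(img, x, y):
    """Bi-linear interpolation of image intensity at non-integer (x, y):
    two 1-D interpolations along x (Eq. (8)), then one along y."""
    x0, y0 = int(x), int(y)
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx      # R1
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx   # R2
    return top * (1 - fy) + bottom * fy                  # P

img = [[0.0, 10.0],
       [20.0, 30.0]]
print(bilinear(img, 0.5, 0.5))  # 15.0 (average of the four neighbors)
print(bilinear(img, 0.0, 0.0))  # 0.0 (reproduces a grid value)
```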
5. Image Registration Methods

5.1. Feature-based Methods

Feature-based methods are based on the extraction of salient structures or features in the images [2]. The feature space is the particular aspect of the images that is used for
comparing the images. The common features used for registration are of three types, namely, feature points [10], contours such as lines and edges [11, 12], and regions such as trees, fields, and buildings. The feature points can be points of locally maximum curvature on contour lines, centers of windows having locally maximum variance, centers of gravity of closed-boundary regions, and line intersections. These feature points are usually selected as control points (Figure 9). Once the control points are matched, the spatial transformation can be determined using a least squares method. Appendix 1 describes the whole process of control-point-based image registration. In addition to these features, some high-level features [13] such as Fourier descriptors, moment invariants, shape matrices and B-spline descriptors are often employed to represent objects. These high-level features are normally scale, rotation and translation invariant and can therefore be used effectively to match the objects in the two images. The choice of the type of invariant description depends on the feature characteristics and the assumed geometric deformation of the images.
Figure 9. Control point based registration
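The control-point step can be sketched for the affine case of Eq. (4): with exactly three matched pairs, the least-squares fit reduces to solving two 3×3 linear systems. Helper names and point values below are hypothetical:

```python
def solve3(M, v):
    """Solve a 3x3 linear system M x = v by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    D = det(M)
    sol = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for r in range(3):
            Mk[r][k] = v[r]
        sol.append(det(Mk) / D)
    return sol

def affine_from_points(src, dst):
    """Estimate the six affine parameters of Eq. (4) from three matched
    control-point pairs (least squares reduces to an exact solve here)."""
    M = [[x, y, 1.0] for (x, y) in src]
    ax = solve3(M, [x for (x, _) in dst])   # a11, a12, a13
    ay = solve3(M, [y for (_, y) in dst])   # a21, a22, a23
    return ax, ay

src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (3.0, 3.0), (2.0, 4.0)]  # pure translation by (2, 3)
ax, ay = affine_from_points(src, dst)
print([round(v, 6) for v in ax])  # [1.0, 0.0, 2.0]
print([round(v, 6) for v in ay])  # [0.0, 1.0, 3.0]
```

With more than three pairs, a genuine least-squares solve (e.g., via the normal equations) would be used instead.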
5.2. Fourier-based Method

The Fourier-based method [14, 15, 16] works with the images in the frequency domain and utilizes the translation and rotation properties of the Fourier transform. Let f1 and f2 be two images that differ only by a displacement (x0, y0), i.e., f2(x, y) = f1(x − x0, y − y0). Then the corresponding Fourier transforms F1 and F2 are related as shown in Eq. (9), and the cross power spectrum of the two images is defined as in Eq. (10):

$F_2(u,v) = e^{-j 2\pi (u x_0 + v y_0)} F_1(u,v)$    (9)

$\frac{F_1(u,v)\, F_2^*(u,v)}{\left| F_1(u,v)\, F_2^*(u,v) \right|} = e^{j 2\pi (u x_0 + v y_0)}$    (10)
where F* denotes the complex conjugate of F. Eq. (10) shows that the phase of the cross power spectrum is equivalent to the phase difference between the images. Further, the inverse Fourier transform of the cross power spectrum is an impulse at the displacement that is needed to optimally register the two images (see Eq. (11)):

$F^{-1}\!\left( e^{j 2\pi (u x_0 + v y_0)} \right) = \delta(x - x_0,\, y - y_0)$    (11)
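Equations (10) and (11) can be demonstrated in 1-D with a direct O(N²) DFT (no FFT library assumed); the conjugation order in the cross power spectrum is chosen so that the recovered peak index equals the positive shift:

```python
from cmath import exp, pi

def dft(x):
    N = len(x)
    return [sum(x[n] * exp(-2j * pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * exp(2j * pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def phase_correlate(f1, f2):
    """Recover the circular shift between two 1-D signals from the
    normalized cross power spectrum (Eqs. (10)-(11))."""
    F1, F2 = dft(f1), dft(f2)
    cps = []
    for a, b in zip(F1, F2):
        prod = b * a.conjugate()   # conjugation order puts the peak at +x0
        cps.append(prod / abs(prod) if abs(prod) > 1e-12 else 0.0)
    impulse = idft(cps)
    # The inverse transform is (numerically) an impulse at the shift
    mags = [abs(v) for v in impulse]
    return mags.index(max(mags))

f1 = [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0]
f2 = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0]   # f1 circularly shifted by 3
print(phase_correlate(f1, f2))  # 3
```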
According to the Fourier translation property and the Fourier rotation property, if the floating image has rotation θ0 besides the translation, the images are related as in Eq. (12) and the corresponding transforms are related as in Eq. (13). If we consider the magnitudes of the transforms F1 and F2 to be M1 and M2 , it is easy to see that magnitudes of both the spectra are the same, but one is a rotated replica of the other (See Eq. (14)). f 2 ( x, y ) = f1 ( x cos θ 0 + y sin θ 0 − x0 , − x sin θ 0 + y cos θ0 − y0 )
(12)
F2(u, v) = e^{−j2π(ux0 + vy0)} F1(u cos θ0 + v sin θ0, −u sin θ0 + v cos θ0)
(13)
M2(u, v) = M1(u cos θ0 + v sin θ0, −u sin θ0 + v cos θ0)
(14)
Rotation without translation can be determined in a similar manner using phase correlation, by representing the rotation as a translational displacement in polar coordinates; in the polar representation the angle θ0 can be easily found using phase correlation [3]. Once the rotation is estimated, it is applied to the floating image so that the new floating image differs from the reference image only by a translation. The cross power spectrum of the new floating image and the reference image is then computed to obtain the translation. Appendix 2 describes the whole process of Fourier based image registration. In summary, the Fourier based method uses the cross power spectrum of the two images to determine the translation, and the cross power spectrum of the two polar images to determine the rotation angle. It is computationally fast but cannot achieve high accuracy. The Fourier based method works well when the images are corrupted by frequency dependent noise.

5.3. Intensity-based Method

Intensity based methods use the raw intensities of the images to perform registration. A similarity measure S, as in Eq. (2), is defined using the intensity values. Some commonly used similarity measures are:
• Sequential similarity
• Correlation
• Mutual information.
Both correlation and sequential similarity measure the degree of similarity between an image and a template. Unlike correlation, smaller values imply a better match in the case of sequential similarity. In comparison with correlation, the sequential similarity technique improves the efficiency of finding the optimal transformation by orders of magnitude. The application of these methods is restricted to a great extent to images of
the same modality. Unlike these two methods, the mutual information based method is more successful with multi-modal images.

5.3.1. Sequential Similarity

The sequential similarity measure is computationally efficient, but it increases the size of the search space. Three sequential similarity measures used are:
• Mean Square Difference (MSD) (Eq. (15))
• Absolute Difference (AD) (Eq. (16))
• Normalized Absolute Difference (NAD) (Eq. (17)).
MSD(i, j) = Σx Σy (u(x, y) − v(x − i, y − j))² / N    (15)

AD(i, j) = Σx Σy |u(x, y) − v(x − i, y − j)| / N    (16)

NAD(i, j) = Σx Σy |(u(x, y) − û) − (v(x − i, y − j) − v̂)| / N    (17)

where N is the number of pixels in the overlap and û, v̂ are the mean intensities of u and v over the overlap.
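As a concrete illustration, the three measures of Eqs. (15)–(17) can be computed over the overlap region for a given shift. The sketch below is our own (including the function name); it assumes equal-sized images and non-negative shifts (i, j), and takes û and v̂ as the mean intensities over the overlap.

```python
import numpy as np

def sequential_similarity(u, v, i, j):
    """MSD, AD and NAD of Eqs. (15)-(17) at a non-negative integer shift
    (i, j); a sketch assuming equal-sized images."""
    h, w = u.shape
    uo = u[j:h, i:w].astype(float)           # u(x, y) over the overlap
    vo = v[0:h - j, 0:w - i].astype(float)   # v(x - i, y - j) over the overlap
    n = uo.size                              # number of pixels in the overlap
    msd = np.sum((uo - vo) ** 2) / n
    ad = np.sum(np.abs(uo - vo)) / n
    nad = np.sum(np.abs((uo - uo.mean()) - (vo - vo.mean()))) / n
    return msd, ad, nad
```

At the correct shift all three measures vanish (smaller is better), which is the sense in which sequential similarity differs from correlation.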
5.3.2. Correlation

Cross-correlation [17] provides the basic statistical approach to registration. It is often used for template matching or pattern recognition, in which the location and orientation of a template or pattern is to be found in an image. It is a similarity measure or match metric, i.e., it gives a measure of the degree of similarity between an image and a template. These methods are generally useful for images that are misaligned by small rigid or affine transformations. They are most successful in cases where the intensities of the two images involved have a linear relationship. For a template T and image I, where T is small compared to I, the two-dimensional Normalized Cross-Correlation (NCC) function (Eq. (18)) measures the similarity for each translation.
NCC(u, v) = Σx Σy T(x, y) I(x − u, y − v) / [Σx Σy I²(x − u, y − v)]^{1/2}    (18)

If the template matches the image exactly, except for an intensity scale factor, at a translation of (i, j), the cross-correlation will have its peak at C(i, j).
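A direct transcription of Eq. (18) for a single offset might look like the following sketch. The naming is ours; the window indexing assumes the template fits inside the image at offset (u, v), and a small constant guards against an all-zero window.

```python
import numpy as np

def ncc(T, I, u, v):
    """Normalized cross-correlation of Eq. (18) between template T and the
    window of image I at offset (u, v); a minimal sketch."""
    th, tw = T.shape
    window = I[v:v + th, u:u + tw].astype(float)  # image values under the template
    return np.sum(T * window) / (np.sqrt(np.sum(window ** 2)) + 1e-12)
```

By the Cauchy-Schwarz inequality this score peaks where the window is proportional to the template, which is exactly the intensity-scale-invariance property stated in the text.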
5.3.3. Mutual Information

Mutual information has its roots in information theory, where it was developed to set fundamental limits on the performance of communication systems. Since then it has been successfully used in varied disciplines such as mathematics, physics, and economics. The use of mutual information as a similarity metric for image registration was proposed in 1995 [20, 21]. Since its introduction, MI has been used widely for the purpose of image registration. It has been demonstrated that MI is robust for multi-modal images and hence well suited for dissimilar images. It also facilitates the automation of the image registration process without compromising accuracy as compared with correlation-based methods and control point based registration. This method assumes that, of the two images to be registered, one gives maximum information about the other when the two are correctly aligned (least registration error). In other words, this method attempts to find the transformation that aligns the two images (or image and template) such that the information given by one about the other is maximized. An added advantage of this approach is that it assumes no specific relationship between the intensities of the images involved.

5.3.3.1. Basic Definition of Some Terms Involved

Before we can explain the definition of mutual information, let us go over some basic terms involved.

Entropy. Entropy is the amount of information an event gives when it takes place. It can also be interpreted as the uncertainty about the outcome of an event. Given events e1, …, em occurring with probabilities p1, …, pm, the Shannon entropy is defined as in Eq. (19).

H = Σi pi log(1/pi) = −Σi pi log pi    (19)
The Shannon entropy can also be computed for an image by considering the probability distribution of its gray values. This distribution can be approximated by the image histogram, estimated by counting the number of times each gray value occurs in the image and dividing these counts by the total number of pixels.

Joint Entropy. Joint entropy measures how much entropy is contained in a joint system of two random variables [22]. It summarizes the degree of dependence of the two random variables. Given a pair of discrete random variables (A, B) with joint distribution p(a, b), the Shannon entropy of the joint distribution is defined as in Eq. (20).
H(A, B) = −Σ_{a,b} p(a, b) log p(a, b)    (20)
In the case of images, the joint distribution of the image pair is estimated by counting the occurrences of each particular pair of gray values in the two images and dividing by the total number of all such pairs. This joint distribution is also referred to as the joint histogram of the two images.

Conditional Entropy. Conditional entropy measures how much entropy a random variable has remaining if we have already learned the value of a second random variable [23]. It summarizes the randomness of one random variable given knowledge of the other. Given a pair of discrete random variables (A, B), the entropy of B conditioned on A is denoted H(B | A) and is defined as in Eq. (21).

H(B | A) = H(A, B) − H(A)    (21)
5.3.3.2. Definition of Mutual Information

Mutual information has been defined in many ways in the literature [24]. One of the frequently used definitions is based on conditional entropy. The mutual information of two images, A and B, can be defined as in Eq. (22).
I ( A, B ) = H ( B) − H ( B | A)
(22)
This definition can be understood as the amount of reduction in the uncertainty of image B when A is known; mutual information is thus the information A contains about B. The mutual information of A and B is the same as that of B and A (Eq. (23)); hence it is also the information B contains about A, and it is therefore called the mutual information of A and B.

I(A, B) = H(B) − H(B | A) = H(A) − H(A | B)
(23)
Mutual information can also be defined using joint entropy, as in Eq. (24). Under this definition, maximizing mutual information (the registration criterion) essentially amounts to minimizing the joint entropy.

I(A, B) = H(B) + H(A) − H(A, B)
(24)
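Eqs. (19)–(24) can be checked numerically from histogram estimates. In the sketch below (the function names are ours; 8-bit gray values, base-2 logarithms, and equal-sized, already-resampled images are assumed), the marginal and joint distributions come from NumPy histograms and mutual information is formed as in Eq. (24).

```python
import numpy as np

def mutual_information(a, b, bins=256):
    """I(A,B) = H(A) + H(B) - H(A,B) of Eq. (24), with all entropies
    estimated from gray-value histograms; a minimal sketch."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, bins], [0, bins]])
    pab = joint / joint.sum()      # joint histogram, normalized
    pa = pab.sum(axis=1)           # marginal distribution of A
    pb = pab.sum(axis=0)           # marginal distribution of B

    def H(p):                      # Shannon entropy, Eq. (19), in bits
        p = p[p > 0]               # 0 log 0 is taken as 0
        return -np.sum(p * np.log2(p))

    return H(pa) + H(pb) - H(pab)
```

With B = A the joint histogram is diagonal, so H(A, B) = H(A) and the mutual information reduces to the entropy of the image itself; the symmetry of Eq. (23) also follows directly.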
6. Issues Related to Mutual Information Based Methods

In recent years mutual information based image registration has become very popular and is one of the most actively researched areas of image registration. Its most popular
application is in medical imaging. Research on this problem and its practical application has given rise to a number of research issues. Some of the issues are:
• Joint histogram estimation
• Interpolation methods
• Interpolation artifacts
• Speed of computation.

6.1. Joint Histogram Estimation

The most important step involved in the entire mutual information based method is the estimation of the joint histogram. Estimation procedures can be divided into two kinds, viz. two-step and single-step. The most commonly used is the two-step procedure. In the first step, the intensities at the transformed grid points are estimated using one of the interpolation schemes explained in the following sections. The interpolated intensities are in most cases not integers, so in the second step they are rounded to the nearest integer and the joint histogram is obtained by increasing the corresponding entry (of the pair) by one. The joint histogram depicts the relationship between the intensities of the two images involved [25]. The robustness of the mutual information similarity metric across modalities is due to the fact that it does not assume any specific relationship between the intensities of the two images, unlike other similarity measures such as correlation and MSD, which assume a linear relationship as shown in Figure 10(a). The mutual information based method is suitable even for a pair of images whose intensities are related in some arbitrary manner, as shown in Figure 10(b).
(a) Correlation and MSD measure.
(b) Mutual Information.
Figure 10. Joint Histogram depicting the intensity relationship assumed in different metrics
6.2. Interpolation Artifacts

The quality of the estimated joint histogram has a direct impact on the mutual information (MI) function and hence on registration accuracy. In recent studies [26], it has been observed that certain artifacts appear in the MI function with the use of certain
interpolation techniques, as shown in Figure 11. These artifacts hamper the global optimization process because they introduce spurious local optima. It is also observed that in certain situations the true global optimum may be buried in the artifact patterns, directly limiting the registration accuracy.

Figure 11. Mutual Information vs. shift in x-axis: plots showing the interpolation artifacts (left: linear interpolation; right: partial volume interpolation).
It has been pointed out in [26] that artifact patterns of the type shown in Figure 11 occur when the two images have equal sample spacing in one or more dimensions and interpolation schemes such as partial volume interpolation and linear interpolation (explained in Section 6.3) are used. More precisely, the artifacts occur when the ratio of the two sample spacings along a certain dimension is a simple rational number [27]. This happens because, in such cases, many of the grid lines may be aligned along these dimensions under certain geometric transformations, and therefore fewer interpolations are required for estimating the joint histogram than when no grid lines are aligned. In practice, artifacts influence the registration accuracy only when the true global optimum is located very close to one of the spurious local optima introduced by the artifacts; otherwise they merely make the optimization (maximization) problem complicated and difficult. For example, when the resolutions of the two images are 4 m/pixel and 1 m/pixel, then by shifting the first image (4 m/pixel) along the x-axis, and then along the y-axis, grid lines 1, 2, 3, … of the first image can be made to coincide with grid lines 1, 5, 9, … of the second image. In this case the contribution of the coincident grid points to the joint histogram can be counted directly, without resorting to any form of estimation. But if the first image is shifted slightly further, no grid points coincide and interpolation is needed to estimate the joint histogram. This switch from much less estimation to substantially more estimation is one of the possible causes of the artifacts.

6.3. Interpolation Methods

As mentioned in Section 6.1, the coordinates of the transformed pixel are not integers and hence the intensities need to be interpolated.
The importance of interpolation techniques increases tremendously due to the resulting artifacts explained in the previous section. The interpolation techniques described in Section 4 can be used for the two-step procedure of estimating joint histograms. Currently, the use of single-step estimation of the joint histogram is becoming popular. A graphical illustration of the interpolation schemes is shown in Figure 12. There are a number of interpolation techniques in the literature that perform single-step estimation. Some of the frequently used methods are:
• Linear Interpolation (two-step)
• Partial Volume Interpolation (PVI)
• Generalized Partial Volume Estimation (GPVE).

Figure 12. Graphical illustration of interpolation: a transformed point u′ falls among the four nearest grid points v1 = (vx, vy), v2 = (vx, vy + 1), v3 = (vx + 1, vy), v4 = (vx + 1, vy + 1), with offsets Δx, Δy and weights w1, …, w4.
6.3.1. Linear Interpolation

Linear interpolation is the method commonly used in the two-step interpolation procedure. First, a grid point u of the floating image is transformed to u′ in the reference image. The interpolated gray value R(u′) is calculated as a weighted sum of the gray values of the four nearest neighbor points v1, v2, v3, v4, using the linear interpolation method mentioned in Section 4.2. That is,

R(u′) = Σi wi · R(vi),  where Σi wi = 1    (25)
Second, the histogram entry h(F(u), R(u′)) is increased by 1. Linear interpolation generates a new interpolated reference image and may therefore introduce intensity values that were not originally present in the reference image, leading to unpredictable changes in the marginal distribution of the reference image.

6.3.2. Partial Volume Interpolation (PVI)

Instead of generating a new interpolated reference image, PVI updates the joint histogram for each pixel pair (u, vi) using the same weights as linear interpolation. The PVI algorithm updates the joint histogram as defined in Eq. (26). PVI changes the histogram more smoothly than the linear interpolation method². However, interpolation artifacts may still occur with PVI, as in the case of linear interpolation.

h(F(u), R(vi)) += wi,  i = 1, 2, 3, 4    (26)

² A += B implies A = A + B.
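The update of Eq. (26) can be written down directly. In this sketch (the function name and argument layout are ours), `hist` is the joint histogram indexed by (floating-image gray value, reference-image gray value), `F_u` is the floating image's gray value F(u), and (x, y) is the transformed, non-integer position of u in the reference image R, assumed to lie strictly inside the grid.

```python
import numpy as np

def pvi_update(hist, F_u, R, x, y):
    """Partial Volume Interpolation update of Eq. (26): distribute the
    bilinear weights w1..w4 over the joint histogram instead of
    interpolating a new gray value; a minimal sketch."""
    vx, vy = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - vx, y - vy
    # bilinear weights of the four nearest reference grid points (Figure 12)
    weights = [((vx,     vy),     (1 - dx) * (1 - dy)),   # v1, w1
               ((vx,     vy + 1), (1 - dx) * dy),         # v2, w2
               ((vx + 1, vy),     dx * (1 - dy)),         # v3, w3
               ((vx + 1, vy + 1), dx * dy)]               # v4, w4
    for (px, py), w in weights:
        hist[F_u, R[py, px]] += w          # h(F(u), R(vi)) += wi
    return hist
```

Since the four weights sum to 1, each floating-image pixel still contributes exactly one count to the histogram, only spread over up to four entries.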
6.3.3. Generalized Partial Volume Estimation (GPVE)

GPVE is a generalized version of PVI [3, 8]. It can overcome the artifact problem mentioned in Section 6.2. GPVE updates the joint histogram as described in Eq. (27), where p and q specify the pixels involved in the histogram updating procedure and f is the kernel function.

h(F(ux, uy), R(vx + p, vy + q)) += f(p − Δx) · f(q − Δy),  ∀p, q ∈ Z    (27)

The kernel function f is a real-valued function with the following properties:
1. f(x) ≥ 0, where x is a real number;
2. Σn f(n − Δ) = 1, where n ∈ Z, −∞ < n < ∞, and 0 ≤ Δ ≤ 1.
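The B-spline kernels used below are easy to write down and to check against these two properties. In this sketch (the naming is ours; `order` follows the text's usage, so order 1 is the linear kernel that makes GPVE equivalent to PVI), the standard centered B-spline formulas are used.

```python
import numpy as np

def bspline(x, order=3):
    """Centered B-spline kernels for GPVE: nonnegative and a partition
    of unity, as required of the kernel f in Eq. (27). A sketch."""
    x = np.abs(np.asarray(x, dtype=float))
    if order == 1:                               # linear (triangle), support 2
        return np.where(x < 1, 1 - x, 0.0)
    if order == 2:                               # quadratic, support 3
        return np.where(x <= 0.5, 0.75 - x ** 2,
               np.where(x <= 1.5, 0.5 * (x - 1.5) ** 2, 0.0))
    if order == 3:                               # cubic, support 4
        return np.where(x <= 1, 2 / 3 - x ** 2 + x ** 3 / 2,
               np.where(x <= 2, (2 - x) ** 3 / 6, 0.0))
    raise ValueError("order must be 1, 2 or 3")
```

Higher orders spread each update of Eq. (27) over more histogram entries, which is what suppresses the artifact patterns.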
A B-spline function is used as the kernel f in GPVE. The shapes of the first, second, and third order B-splines are shown in Figure 13. When the 1st order B-spline function is employed in each direction, GPVE is equivalent to PVI. As the order of the B-spline function increases, more entries of the joint histogram are involved in updating each pixel of the floating image. The artifacts can hardly be seen in the mutual information function when either the 2nd or the 3rd order GPVE is used. An example of the mutual information vs. shift in x-axis computed using 1st and 3rd order GPVE is shown in Figure 14.

Figure 13. B-spline functions: (a) first order; (b) second order; (c) third order.

Figure 14. Example of Mutual Information vs. shift in x-axis: (a) first order GPVE; (b) third order GPVE.
6.4. Speed of Computation

Mutual information based registration is computationally intensive. When the entire search space is examined to find the optimal solution, it consumes a large amount of time, and quite often the optimizer does not converge to a solution. Various
approaches have been devised to reduce this effort. One solution is to perform a two-level registration: a coarse registration using the Fourier method or the control point based method to narrow down the search space, followed by a finer registration method. In addition to this approach, a multi-resolution strategy is usually adopted to reduce the computation time. The multi-resolution strategy is explained in detail in Section 7.2.
7. Search Strategy

7.1. Optimization Techniques

The registration measure as a function of the transformation parameters defines a multi-dimensional function, four-dimensional (4-D) in the case of a rigid body transformation. The transformation corresponding to the optimum of this function (see Eq. (2)) is assumed to be the one that correctly aligns the images. In practice the registration (similarity) function is not a smooth function, which makes this process non-trivial. The selection of a bounded search space is also very important: it has been seen that the function may attain higher values (in the case of maximization) for a large misregistration than for the correct transformation. The optimization would then be futile, as it would converge to a local maximum of no interest. If the search space is limited, however, this anomaly does not arise and the correct maximum/minimum can be located. There are a number of optimization techniques available in the literature, and many of them have been applied to the problem of image registration. A detailed listing of these algorithms can be found in a survey paper on mutual information based registration [28] and the related references cited there. The most commonly used methods are the Powell and the Simplex methods. The Powell method optimizes each transformation parameter in turn, but is sensitive to local optima in the registration function. Unlike the Powell method, the Simplex method considers all parameters together, which makes it computationally expensive. Neither of these methods requires the computation of image derivatives. In contrast, methods such as gradient descent, Newton, and Levenberg-Marquardt are local optimization methods that do require the computation of image derivatives. Levenberg-Marquardt is a combination of the gradient method and the Newton method.
It is computationally efficient and has been applied to minimize both the sum of squared differences and mutual information.

7.2. Multi-resolution Strategy

In some image registration applications, such as remote sensing, the image size is usually large, which results in a high computational cost for the registration algorithms, especially the mutual information method. A multi-resolution strategy is therefore introduced to reduce the computation. The multi-resolution strategy starts by registering the reference image and the floating image at a coarse resolution (generated using Gaussian pyramids, Laplacian pyramids, simple averaging, or wavelet transform coefficients) [2] and then proceeds to finer resolutions. Figure 15 shows a two-level wavelet decomposition of an image. At each level, most of the methods listed in Section 5.3 can be used to obtain the registration result. The coarse registration
result reduces the search space of the registration algorithm at the finer resolution and therefore considerably reduces the computation time. Accuracy increases as the registration proceeds from coarse to fine. However, this strategy fails if the registration at a coarser level produces a false result; to guard against this, a backtracking or consistency-check procedure should be incorporated into the algorithm [2]. Wavelet-based registration is a typical multi-resolution strategy. Appendix 3 describes a wavelet registration process combined with the correlation method.
Figure 15. Wavelet Decomposition of the Image
8. Evaluation of Image Registration Methods

Performance assessment of different registration algorithms is highly desirable so that a user can select the appropriate algorithm for the application involved. This is a non-trivial problem, since errors can be introduced into the process at various levels and it is difficult to distinguish registration inaccuracies from differences due to actual changes in the scene. When registration of multi-modality images is involved, the performance evaluation task becomes even more difficult. It should be pointed out that registration accuracy is not the only metric for evaluating a registration algorithm. As mentioned in [29], other metrics for registration evaluation include precision, accuracy, robustness and stability, reliability, knowledge and resource requirements, algorithm complexity and computation time, assumption verification, and usability.
9. Intelligent Methods for Image Registration

The growing need for automation calls for an automatic registration algorithm. But since there are a large number of choices available for the various components of the
registration process, along with a large spectrum of images and requirements, no single registration method can satisfy all scenarios. Hence an intelligent method, driven by the inputs and the requirements, can decide which method to apply at the various stages. Research related to this concept is currently underway at Syracuse University, Syracuse, NY, USA, and a prototype of this system is under development at ANDRO Computational Solutions, Rome, NY, USA, under a Small Business Innovation Research (SBIR) Phase II Program sponsored by AFRL/SNAR. The proposed architecture of the system is shown in Figure 16.
Figure 16. Proposed architecture of the intelligent image registration system
10. Challenges and Future Work

Image registration is one of the most researched areas in multi-sensor image analysis, but a basic understanding of the process that takes into account the various image modalities and other related choices is still in its early stages. As described in Section 7, identifying a bounded search space is an important issue in the registration process, and in an automatic registration algorithm this must be decided intelligently for the particular case at hand. Coarse registration followed by fine registration is one possible way to address the search-space problem; hence there is a need to develop fast coarse registration techniques for multi-modality images. Research is also needed to develop algorithms that can find the global optimum correctly and more efficiently (e.g., using heuristic testing algorithms) [27]. Recently, some work [30, 31, 32] has been initiated to derive achievable performance bounds; this needs to be continued. Another underexplored area is a common evaluation criterion/platform for different algorithms. There does not yet seem to be a consensus in the registration community on the metrics for image registration performance, and this requires further investigation.
References
[1] Lisa G. Brown, A survey of image registration techniques, ACM Computing Surveys, vol. 24, no. 4, pp. 325-376, December 1992.
[2] B. Zitová and J. Flusser, Image registration methods: a survey, Image and Vision Computing, vol. 21, pp. 977-1000, 2003.
[3] H. Chen and P. K. Varshney, Mutual information based CT-MR brain image registration using generalized partial volume joint histogram estimation, IEEE Transactions on Medical Imaging, vol. 22, no. 9, pp. 1111-1119, 2003.
[4] H. Chen and P. K. Varshney, Automatic two-stage IR and MMW image registration algorithm for concealed weapon detection, IEE Proceedings - Vision, Image and Signal Processing, vol. 148, no. 4, pp. 209-216, Aug. 2001.
[5] P. K. Varshney, H. Chen, and L. C. Ramac, Registration and fusion of infrared and millimeter wave images for concealed weapon detection, in Proc. of International Conference on Image Processing, Japan, vol. 3, pp. 532-536, Oct. 1999.
[6] H. M. Chen and P. K. Varshney, MI based registration of multi-sensor and multi-temporal images, in Advanced Image Processing Techniques for Remotely Sensed Hyperspectral Data, P. K. Varshney and M. K. Arora (Eds.), Springer Verlag, 2004.
[7] H. Chen, P. K. Varshney, and M. K. Arora, Performance of mutual information similarity measure for registration of multitemporal remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 11, pp. 2445-2454, Nov. 2003.
[8] H. M. Chen and P. K. Varshney, Mutual information based image registration, in Advanced Image Processing Techniques for Remotely Sensed Hyperspectral Data, P. K. Varshney and M. K. Arora (Eds.), Springer Verlag, 2004.
[9] H. Chen, P. K. Varshney, and M. K. Arora, Mutual information based image registration for remote sensing data, International Journal of Remote Sensing, vol. 24, no. 18, pp. 3701-3706, 2003.
[10] G. C. Stockman, S. Kopstein, and S. Bennet, Matching images to models for registration and object detection via clustering, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 4, pp. 229-241, 1982.
[11] M. G. Tommaselli and C. L. Tozzi, A recursive approach to space resection using straight lines, Photogrammetric Engineering and Remote Sensing, vol. 62, no. 1, pp. 57-66, 1996.
[12] Jan F. Andrus, C. Warren Campbell, and Robert R. Jayroe, Digital image registration method using boundary maps, IEEE Trans. Computers, vol. 24, no. 9, pp. 935-940, 1975.
[13] X. Huang, Y. Sun, D. Metaxas, F. Sauer, and C. Xu, Hybrid image registration based on configural matching of scale-invariant salient region features, 2nd IEEE Workshop on Image and Video Registration, July 2004.
[14] Q. Chen, M. Defrise, and F. Deconinck, Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 12, pp. 1156-1168, 1994.
[15] B. S. Reddy and B. N. Chatterji, An FFT-based technique for translation, rotation, and scale-invariant image registration, IEEE Trans. Image Processing, vol. 5, no. 8, pp. 1266-1271, Aug. 1996.
[16] E. De Castro and C. Morandi, Registration of translated and rotated images using finite Fourier transforms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 700-703, 1987.
[17] R. Berthilsson, Affine correlation, in Proc. Int. Conf. Pattern Recognition, Brisbane, Australia, pp. 1458-1467, 1998.
[18] S. Kaneko, Y. Satoh, and S. Igarashi, Using selective correlation coefficient for robust image registration, Pattern Recognition, vol. 36, no. 5, pp. 1165-1173, May 2003.
[19] J. Kim and J. A. Fessler, Intensity-based image registration using robust correlation coefficients, IEEE Transactions on Medical Imaging, vol. 23, no. 11, pp. 1430-1444, Nov. 2004.
[20] P. Viola and W. M. Wells III, Alignment by maximization of mutual information, Int. Conf. on Computer Vision, pp. 16-23, 20-23 June 1995.
[21] A. Collignon, F. Maes, et al., Automated multi-modality image registration based on information theory, in Information Processing in Medical Imaging, Kluwer, pp. 263-274, 1995.
[22] http://en.wikipedia.org/wiki/Joint_entropy
[23] http://en.wikipedia.org/wiki/Conditional_entropy
[24] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, 1991.
[25] Xenios Papademetris, Image registration: a review, Yale MRRC fMRI Seminar Series, 16 October 2003.
[26] J. Pluim et al., Interpolation artifacts in mutual information-based image registration, Computer Vision and Image Understanding, vol. 77, pp. 211-232, 2000.
[27] Hua-mei Chen, Mutual information based image registration with applications, Ph.D. dissertation, Syracuse University, Syracuse, NY, May 2002.
[28] J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever, Mutual information based registration of medical images: a survey, IEEE Trans. on Medical Imaging, vol. X, no. Y, 2003.
[29] J. B. A. Maintz and M. A. Viergever, A survey of medical image registration, Medical Image Analysis, vol. 2, no. 1, pp. 1-36, 1998.
[30] D. Robinson and P. Milanfar, Fundamental performance limits in image registration, IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1185-1199, September 2004.
[31] S. Yetik and A. Nehorai, Performance bound on image registration, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2005.
[32] M. Xu and P. K. Varshney, Tighter performance bounds on image registration, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006.
Appendix 1. Control point based image registration.

Step I: Selection of control points
− At least four pairs of feature points in the reference image and the floating image are selected and matched manually (automated algorithms can also be used). In this example we select four pairs, as shown in Figure 9.

Step II: Selection of transformation space
− A transformation space corresponding to a first-order polynomial function is assumed.

Step III: Formulation of the relationship between features
− Using the selected control points and the transformation space, eight equations (twice the number of feature pairs) of the following form are set up:

ui = f(xi, yi) + ni = a00 + a10 xi + a01 yi + ni
vi = g(xi, yi) + mi = b00 + b10 xi + b01 yi + mi,  i = 1, …, 4

where (ui, vi) are the coordinates in the reference image; (xi, yi) are the coordinates in the floating image; ni and mi are usually modeled as noise; and {a00, a01, a10, b00, b01, b10} are the parameters of the transformation space.

Step IV: Solving for the transformation parameters
− The least squares technique is used to estimate the six parameters {a00, a01, a10, b00, b01, b10}. Let us denote

f*(xi, yi) = a*00 + a*10 xi + a*01 yi
g*(xi, yi) = b*00 + b*10 xi + b*01 yi

The criterion is to minimize the noise energy, i.e., to minimize

Σ_{i=1..n} (ui − f*(xi, yi))²  and  Σ_{i=1..n} (vi − g*(xi, yi))²

The estimates can be obtained by solving the equations

∂/∂a*jk Σ_{i=1..n} (ui − f*(xi, yi))² = 0
∂/∂b*jk Σ_{i=1..n} (vi − g*(xi, yi))² = 0,  (j, k) = (0, 0), (0, 1), (1, 0)

Step V: Estimating the new image using the transformation determined
− Choose an interpolation method.
− Transform the image using the transformation determined.
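Step IV above is an ordinary linear least-squares problem and can be solved without forming the normal equations by hand. In the sketch below (the function name is ours, and NumPy's `lstsq` stands in for an explicit solution of the partial-derivative equations), the six parameters are recovered from matched point pairs; three non-collinear pairs already determine them, and four or more give the over-determined system assumed in Step I.

```python
import numpy as np

def fit_first_order(ref_pts, flt_pts):
    """Least-squares estimate of the first-order polynomial transform of
    Appendix 1. `ref_pts` holds the (ui, vi), `flt_pts` the matching
    (xi, yi), one pair per row; a minimal sketch."""
    x, y = flt_pts[:, 0], flt_pts[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])   # rows (1, xi, yi)
    # solve separately for {a00, a10, a01} and {b00, b10, b01}
    a, *_ = np.linalg.lstsq(A, ref_pts[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(A, ref_pts[:, 1], rcond=None)
    return a, b   # u ~ a00 + a10*x + a01*y,  v ~ b00 + b10*x + b01*y
```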
Appendix 2. Fourier based image registration.

Step I: Domain transformation
− Convert the images into the frequency domain.
− Pass the magnitude of the Fourier spectrum through a high-pass filter.
− Convert this magnitude spectrum into the log-polar plane.

Step II: Find the rotation angle and scale
− Compute the cross power spectrum (Eq. (10)) of the two log-polar images.
− Determine the inverse Fourier transform (Eq. (11)) of the cross power spectrum and identify the top peaks. The coordinates of the peaks are the estimated rotation angle and scale. The number of peaks taken into consideration depends on the accuracy desired.

Step III: Find the translation
− Apply the estimated rotation and scale to the floating image so that the new floating image has only translation errors.
− Compute the cross power spectrum (Eq. (10)) of the new floating image and the reference image.
− Determine the inverse Fourier transform (Eq. (11)) of the cross power spectrum and identify the top peaks. The coordinates of the peaks are the translations along the x-axis and y-axis.
Appendix 3: Wavelet based image registration.

Step I: Wavelet decomposition
− The image is first decomposed recursively into four sets of coefficients (LL, HL, LH, HH) by filtering it with two filters, a low-pass filter L and a high-pass filter H, each applied along the image rows and columns. Figure 15 shows the two-level wavelet decomposition by the Haar wavelet.

Step II: Initial registration
− Initial transformation parameters are estimated by optimizing the normalized cross-correlation function (Eq. (18)) of the wavelet coefficients of the reference image and the floating image at a coarse level. The wavelet coefficients normally used are the LL coefficients, the HH coefficients, both the HL and LH coefficients, or the modulus maxima of the LH and HL coefficients, depending on the application.

Step III: Finer registration
− After the initial registration, the search space for optimizing the correlation function of the wavelet coefficients at the next finer resolution is narrowed down, and the transformation parameters are updated.

Step IV: Stopping criterion
− Repeat Step III until the desired accuracy of the transformation parameter estimates is achieved.
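A translation-only toy version of Steps I to III can be sketched as follows: an exact one-level Haar LL band, an exhaustive NCC search at the coarse level, and a narrowed search at the finer level. The search radii and image sizes are illustrative choices, not values from the text.

```python
import numpy as np

def haar_ll(img):
    # One level of the Haar decomposition: the LL band is the (scaled)
    # average over 2x2 blocks.
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 2.0

def ncc(a, b):
    # Normalized cross-correlation of two equally sized arrays.
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def best_shift(ref, flo, candidates):
    # Exhaustive NCC search over candidate integer translations.
    scores = {}
    for dy, dx in candidates:
        scores[(dy, dx)] = ncc(ref, np.roll(flo, (dy, dx), axis=(0, 1)))
    return max(scores, key=scores.get)

def coarse_to_fine_translation(ref, flo, radius=2):
    # Step II: full search window at the coarse (LL) level.
    cand = [(dy, dx) for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)]
    cy, cx = best_shift(haar_ll(ref), haar_ll(flo), cand)
    # Step III: narrowed search at the finer level around the doubled
    # coarse estimate (shifts double when the resolution doubles).
    cand = [(2 * cy + dy, 2 * cx + dx) for dy in (-1, 0, 1)
                                       for dx in (-1, 0, 1)]
    return best_shift(ref, flo, cand)
```

The narrowed ±1 window at the finer level is what makes the coarse-to-fine strategy cheaper than a full-resolution search.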
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Automated Registration for Fusion of Multiple Image Frames to Assist Improved Surveillance and Threat Assessment Malur K. SUNDARESHAN and Mohamed I. ELBAKARY Department of Electrical and Computer Engineering University of Arizona, Tucson, AZ 85721-0104 Tel: (520) 621-2953; Fax: (520) 626-3144; e-mail:
[email protected] and
[email protected]
Abstract. Automated registration of image frames is often required for the construction of High-Resolution (HR) data to perform surveillance and threat assessment. While some efficient approaches to image registration have been developed lately, the registration algorithms resulting from these approaches generally remain application dependent and may require operator-assisted tuning for different images to achieve the same efficiency levels. In this article, we describe an algorithm for automatic image registration that assists improved surveillance and threat assessment in scenarios where multiple diverse sensors are used for these applications. The algorithm offers scene-independent registration performance and is efficient for different scenes, ranging from complex, highly-varying gray-scale images to simpler images with little gray-scale variation. While feature-based methods have emerged as more versatile for automatic registration in surveillance applications (compared to other methods based on correlation, mutual information maximization, etc.), the algorithm described here employs the local frequency representation of the image frames to be registered in order to generate a set of control points, solve the matching problem, and determine the registration parameters. The algorithm exploits certain inherent strong points of the local frequency representation, such as robustness to illumination variation, the capability of simultaneously detecting the structure of the scene in the image (ridges and edges), and good localization in the spatial domain. Experimental results reported here indicate that this registration technique is efficient and yields promising results for the alignment and fusion of complex images.
Keywords: Image Registration, Resolution Enhancement, Feature Extraction, Object Recognition, Sensor Fusion, Surveillance, Threat Assessment.
Introduction
High-Resolution (HR) imagery data are often needed in many practical applications to support important image exploitation tasks, such as detecting objects of interest within a scene, identifying and assessing the severity of threats, autonomously recognizing specific targets that may be of interest for a specific mission, or precisely tracking the motion of a chosen target. Due to optical diffraction limits (and the consequent low-pass filtering effects), and also because the detector arrays in these sensors may not be configured with the densities required for imaging at Nyquist rates (and the consequent aliasing due to undersampling), the resolution in the data captured by EO
M.K. Sundareshan and M.I. Elbakary / Automated Registration for Fusion
and other types of imaging sensors (such as IR, MMW or LADAR) is often too low to permit image exploitation tasks to be executed reliably. A very promising approach to the construction of HR frames is to use advanced signal processing methods directed at summing multiple frames of data output by a sensor (such as video data or image frames acquired using micro-scanning techniques) or at fusing image frames captured by multiple diverse sensors looking at the same scene. A fundamental processing step required to ensure efficient integration of multiple Low-Resolution (LR) frames into a single HR frame, whether by frame summing (when the frames are captured by the same sensor) or by image fusion (when the frames are captured by multiple diverse sensors), is the registration of the frames to be integrated. The schematic of an image processing system shown in Figure 1, which reconstructs HR image frames for executing image exploitation operations, provides a framework for the general methodology within which image registration is of interest in the studies reported in this article. The integration of EO data with data from other sensors (IR or other), as shown, is only an illustration of the general concepts. Our focus in this article is the development of image registration algorithms that facilitate the multi-frame summing and image fusion operations shown, in order to produce an HR frame for surveillance and threat assessment applications. It should be noted that the registration of image frames is an important problem in many other fields as well. It is of particular interest in remote sensing, medical imaging and computer vision, and is a prerequisite for the alignment and fusion of multiple image frames. In this problem we are given two images of roughly the same scene and are asked to determine the transformation that most nearly maps one image onto the other. A good survey of the existing literature on this problem can be found in [1,2].
Figure 1. Schematic of an Image Processing System for Data Fusion and High-Resolution (HR) Frame Reconstruction
1. Development of Image Registration Algorithms for Surveillance and Threat Assessment
One may broadly classify existing image registration methods into two classes, viz. feature-based matching methods [3-5], which attempt to match certain control points or dominant features in the two frames, and direct methods, which implement a search strategy that attempts to optimize some meaningful criterion [6-8]. The former approach requires the computation of contours, features, surfaces, or geometric distributions in the images, while the latter uses the raw images (without any significant preprocessing) to compute the chosen optimization criterion (as in the minimization of a normalized least-squares error or the maximization of mutual information). Both approaches have certain strong and weak points. While methods that attempt to match extracted features have been shown to give accurate solutions, obtaining the correct features to match and selecting a corresponding matching algorithm are particularly difficult problems, especially in the case of images acquired from diverse sensors operating in different modalities. Consequently, most of the image registration algorithms developed using this approach assume that features are well preserved across the different images. Direct methods, on the other hand, can become computationally expensive and typically need a good initial guess to ensure proper convergence. The optimization procedure may get trapped in a local extremum, especially when the registration parameters (translation, rotation, and/or scaling) that need to be estimated are large. Despite these differences between the two approaches, feature-based matching methods have generally emerged as more versatile for applications in surveillance and threat assessment, since it is possible to place greater emphasis on specific portions of the overall scene (by extracting features from those portions only), while de-emphasizing or even disregarding other areas. Direct methods, which typically utilize information contained in the entire scene when computing the optimization measure, have instead found more satisfactory application in geo-remote sensing and medical imaging. While some efficient image registration procedures following the two approaches cited above have been developed lately, the resulting registration algorithms are still application dependent [1,2]. In general, an algorithm that offers superior performance for one class of images (or scenes) may not be equally efficient for images of a different type.
Recently, the use of local frequency computed from the images to be registered has been suggested for multi-modal image registration [9-12]. An introduction to the local frequency representation and its use in signal analysis can be found in [12-14]. It is of interest to note that local frequency enjoys some inherent advantages that make it useful for image analysis: it is relatively invariant to illumination changes (and hence insensitive to the level of signal energy), it provides a faithful representation of structure (both edges and ridges simultaneously), and it has good localization in the spatial domain [11]. These benefits make local frequency a promising candidate for handling image registration problems. Unfortunately, however, the computation of local frequency can become quite involved, especially for complex images with significant gray-scale variations. In recent work, the authors have established a computationally efficient procedure for obtaining the local frequency representation of input images [15]. Once the local frequency representations of the two images to be registered are obtained, a set of points with high local frequency values can be extracted from each. These sets provide a characterization of the dominant features (edges and ridges, for instance) in each image and hence an ideal selection of control points for establishing a match between the two images. The matching problem can be solved within an optimization framework in order to estimate the registration parameters. In this article we outline a systematic procedure that implements these steps and demonstrate the performance of the overall scheme by applying it to diverse complex images of specific interest in surveillance and threat assessment scenarios.
2. Local Frequency Representation of Images
Given an image, its local frequency representation can be obtained by computing the spatial derivative of the local phase extracted from the image. For a brief introduction to the local frequency representation, consider a one-dimensional signal s, whose corresponding analytic signal is defined as s_A = s + i s_Hi, where s_Hi is the Hilbert transform of s, defined by

    s_Hi(x) = (1/π) ∫_{−∞}^{∞} s(ξ)/(ξ − x) dξ    (1)

It may be noted that the Hilbert transform can be computed quite easily by performing the operation [12]

    s_Hi = s * (1/(πx)),    (2)

i.e., by convolving s with the function 1/(πx). Thus, the transformation of a given real signal s to the corresponding analytic signal s_A can be regarded as the result of convolving the real signal with a complex filter, such as a Gabor filter [16]. The argument of s_A is referred to as the local phase of s, which is defined in the spatial domain. The spatial derivative of the local phase is called the instantaneous or local frequency [12]. To summarize the above discussion: given a real signal, the corresponding analytic signal is complex, with the real part being the original signal itself and the imaginary part its Hilbert transform, obtained by convolving the signal with a Gabor filter. Although the above discussion is given for a one-dimensional signal for the sake of simplicity, it generalizes readily to higher dimensions [12]. It may be mentioned that Gabor filters are the most popular filters for the local frequency representation of a given image, owing to the fact that 2-D Gabor functions are optimal in terms of their space-frequency localization. A computationally efficient procedure for obtaining the local frequency representation of a given image was recently developed by the authors [15, 21]. This procedure exploits some important recent findings inspired by biological data (the efficiency of the model human image code and the orientation bandwidth of visual cortex cells): the two-dimensional spatial frequency plane of a given image is split into 4 orientation bands, each with an orientation bandwidth of 45 degrees (in order to cover the 180-degree range, or one half of the 2-D frequency plane), to construct a multi-channel filtering scheme. A brief outline of this procedure follows.

1. Create a set of Gabor filters G_k(x, y, f_k, θ_k, σ_k), k = 1, 2, 3, 4, to cover the frequency space of the image under investigation, where f denotes the spatial frequency, θ is the orientation angle with respect to the x-axis, σ is the Gaussian window width, and (x, y) denotes the pixel position in the spatial domain.

2.
Convolve the image I, whose local frequency representation is desired, with each filter G_k, k = 1, 2, 3, 4, using

    u^(k)(x, y) = (I * G_k)(x, y)    (3)

3. Compute the local phase Ψ^(k) of the analytic signal u^(k), k = 1, 2, 3, 4, by calculating

    Ψ^(k)(x, y) = tan^{−1}[ imag(u^(k)(x, y)) / real(u^(k)(x, y)) ]    (4)

4. Create the local frequency representation using

    Γ^(k)(x, y) = sqrt[ (∂_x Ψ^(k)(x, y))^2 + (∂_y Ψ^(k)(x, y))^2 ] · |cos(θ_k − θ'^(k))|    (5)

where θ_k is the orientation of the kth Gabor filter and θ'^(k)(x, y) denotes the direction of the phase gradient vector with respect to the x-axis, θ'^(k)(x, y) = tan^{−1}[ Ψ^(k)_y(x, y) / Ψ^(k)_x(x, y) ]. It may be noted that Γ^(k) provides a spatially localized estimate of the local frequency along the direction θ_k [17].

5. Fuse the local frequency estimates obtained from the four filters, Γ^(k), k = 1, ..., 4, into a single representation using

    Γ(x, y) = max{Γ^(1)(x, y), Γ^(2)(x, y), Γ^(3)(x, y), Γ^(4)(x, y)}    (6)
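The five-step procedure can be sketched as follows. The filter parameters (centre frequency, window width, kernel size) are illustrative choices, and the phase derivative is computed via the standard identity ∂Ψ = Im(u* ∂u)/|u|², which is equivalent to differentiating Eq. (4) but avoids explicit phase unwrapping.

```python
import numpy as np

def gabor_kernel(size, f0, theta, sigma):
    # Complex 2-D Gabor filter with centre frequency f0 along direction theta.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return env * np.exp(2j * np.pi * f0 * xr)

def convolve_fft(img, kern):
    # Same-size circular convolution via the FFT (adequate for a sketch).
    kh, kw = kern.shape
    pad = np.zeros(img.shape, dtype=complex)
    pad[:kh, :kw] = kern
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad))

def local_frequency(img, f0=0.1, sigma=4.0, size=21):
    gammas = []
    for k in range(4):
        theta = k * np.pi / 4  # the four 45-degree orientation bands
        u = convolve_fft(img, gabor_kernel(size, f0, theta, sigma))  # Eq. (3)
        # Spatial derivatives of the local phase (Eqs. (4)-(5)), computed
        # without unwrapping via  dPsi = Im(conj(u) * du) / |u|^2.
        uy, ux = np.gradient(u)
        mag2 = np.abs(u)**2 + 1e-12
        px = (np.conj(u) * ux).imag / mag2
        py = (np.conj(u) * uy).imag / mag2
        grad_dir = np.arctan2(py, px)  # theta'(k): phase-gradient direction
        gammas.append(np.hypot(px, py) * np.abs(np.cos(theta - grad_dir)))
    # Eq. (6): fuse the four oriented estimates with a pointwise maximum.
    return np.max(gammas, axis=0)
```

On a pure sinusoid the fused output is (up to finite-difference discretization) the sinusoid's angular frequency at every pixel, which is the sanity check one would expect from Eqs. (5) and (6).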
An illustration of local frequency representation of a given image is shown in Figure 2, where the original image (“Cameraman” image) is shown in Figure 2(a) and its local frequency representation encoded as a gray-scale image is depicted in Figure 2(b). It is evident that the higher local frequency values in the local frequency representation translate directly into higher gray-scale values in the encoded image depicted in Figure 2(b).
Figure 2. (a) Cameraman image; (b) Gray-scale encoded local frequency representation
3. Matching the Local Frequency Representations
Having obtained the local frequency representations of a pair of images, one can select a set of points from each representation as control points. We select the points with the largest values, because they reflect the most apparent structure in the image (edges and ridges), as shown in the encoded image in Figure 2(b). The number of selected points should be large enough to capture all dominant features of the image and to establish the matching. Evidently, the number of points to be selected varies from one image to another, based on the size of the image and the level of activity within the scene. In our experiments, we have found that the number of points needed to capture the most dominant features and to establish a good match between frames typically ranges from 100 to 300; the size of the images considered varies from 64 x 64 to 256 x 256. For more details on these experiments and some guidelines for selecting control points, see [18]. Once the control points are selected for a pair of images, the correspondence between them can be obtained by matching the two selected point sets. While a number of point matching procedures exist, we employed the algorithm presented by Gold et al. [19]. This algorithm incorporates an optimization technique and an iterative correspondence assignment technique called “Softassign”, a general procedure for identifying the correspondence between two sets of points in space. The motivation for using this algorithm is its capability to detect outliers among the matched pairs while simultaneously estimating the transformation parameters. Given two 2D point sets {X_i} and {Y_j} related by an affine transformation X = AY, with A
denoting the transformation matrix, the algorithm attempts to minimize the objective function

    min_{m,A} E(m, A) = Σ_{i=1}^{M} Σ_{j=1}^{N} m_ij ||X_i − A Y_j||^2 − α Σ_{i=1}^{M} Σ_{j=1}^{N} m_ij    (7)
In Eq. (7), the m_ij denote the correspondence variables that define the match matrix of dimension M × N. The second term, with multiplier α, biases Eq. (7) towards matches; it acts as a threshold error distance, indicating how far apart two points must be before they may be treated as outliers. For image registration, A is a 3 × 3 matrix representing a 2D affine transformation in the plane, defined by the six parameters a, b, c, e, f, and g in the form

    A = [ a  b  c ]
        [ e  f  g ]
        [ 0  0  1 ]

As is well known, these six parameters specify the translation, scaling and rotation in the plane [20]. Eq. (7) describes an optimization problem whose solution yields the transformation matrix A. Due to space limitations, further details on this algorithm are omitted; they may be found in [19].
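For a fixed match matrix m, minimizing Eq. (7) over A is a linear least-squares problem in the six parameters. The sketch below implements only that inner update (the softassign correspondence update of Gold et al. is omitted); the point data in the test are synthetic.

```python
import numpy as np

def affine_from_soft_matches(X, Y, m):
    """Weighted least-squares update of the affine matrix A minimizing
    sum_ij m_ij ||X_i - A Y_j||^2, i.e. one inner step of an alternation
    in the style of Gold et al.; the correspondence update is omitted."""
    M, N = m.shape
    rows, rhs = [], []
    for i in range(M):
        for j in range(N):
            w = np.sqrt(m[i, j])
            yx, yy = Y[j]
            # Residual in x: X_i,x - (a*yx + b*yy + c); unknowns [a b c e f g].
            rows.append(w * np.array([yx, yy, 1.0, 0.0, 0.0, 0.0]))
            rhs.append(w * X[i, 0])
            # Residual in y: X_i,y - (e*yx + f*yy + g).
            rows.append(w * np.array([0.0, 0.0, 0.0, yx, yy, 1.0]))
            rhs.append(w * X[i, 1])
    p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    a, b, c, e, f, g = p
    return np.array([[a, b, c], [e, f, g], [0.0, 0.0, 1.0]])
```

With a hard (permutation) match matrix and exact data, the six parameters are recovered exactly; with soft m_ij the update weights each candidate pair by its current correspondence strength.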
4. Experimental Results
A number of experiments were conducted with different types of images in order to evaluate the performance of the present registration algorithm; results from a few illustrative ones are briefly summarized in this section. In order to create ground truth data against which the registration parameters estimated by the present algorithm could be compared for a quantitative evaluation, the basic image frame in each case was distorted by a known affine transformation to obtain a second frame, which was then registered with the first (undistorted) frame. The first experiment uses the “Cameraman” image, of size 128 x 128, shown in Figure 2(a). This was rotated by an angle of 4° to obtain the distorted frame shown in Figure 3(a). Estimating the registration parameter (the rotation angle, in this case) between the two frames proceeds in three steps. First, we obtain the local frequency representations of the two frames using the procedure outlined in Section 2 (the local frequency representation of the frame in Figure 2(a) is shown in Figure 2(b)). Then a set of control points is selected from each local frequency representation for the matching step, by isolating points with local frequency values exceeding a chosen threshold. For the images under consideration, it was found that a set of 200 control points extracted from each local frequency representation is enough to cover all principal features in the image and to ensure an adequate number of matching pairs for the matching algorithm. Determination of an appropriate number of control points generally depends on the activity within the image (and hence is image-dependent). However, in all the experiments performed, matching sets containing about 300 control points were found adequate to give reasonably accurate estimates of the registration parameters. For illustration, Figure 3(b) shows the control points extracted from the local frequency representation shown in Figure 2(b). It must be emphasized that selecting the smallest number of control points necessary for the specific image being processed is useful for minimizing the computational effort in the matching step.
However, extracting a matching set that has more than the required minimum number of points eliminates user intervention and makes the registration process fully automatic for all the images considered. In the third step, we apply the matching algorithm to establish correspondence between the two extracted sets and, at the same time, estimate the transformation parameters. The estimated rotation angle in this experiment was 3.9°, which agrees quite well with the ground truth. Figure 3(c) shows the distorted frame of Figure 3(a) after registration with the estimated parameters (i.e., de-rotated by 3.9°).
Figure 3. (a) Cameraman image distorted by a rotation angle of 4°. (b) The control points selected from the local frequency representation. (c) The image in (a) after de-rotation by the recovered angle of 3.9°.
Figure 4 shows the results of an experiment performed with a different type of image. The original image, of size 256 x 256 and shown in Figure 4(a), was distorted by scaling to 85% of the original size and a rotation of 3°. The resulting distorted image was then registered with the original frame. The registration parameters estimated by the present algorithm in this case are 3.1° for rotation and 0.85 for scaling. Figure 4(b) shows the local frequency representation of the frame in Figure 4(a), and the control points extracted for matching are shown in Figure 4(c). The registered image after de-rotation by 3.1° and prior to applying the scaling is shown in Figure 4(d). To further challenge the present algorithm, one other experiment, with remotely sensed earth data, was conducted. Figure 5(a) shows a reference data frame of size 256 x 256. A distorted image, shown in Figure 5(b), was obtained by rotating this frame by 6° (with no scaling); it was then registered with the original frame. The estimated registration parameters were 6.01° for rotation and 1 for scaling. The local frequency representation of the original frame is shown in Figure 5(c), and the distorted image after de-rotation by 6.01° is shown in Figure 5(d).
Figure 4. (a) Original airplane image. (b) Local frequency representation. (c) The control points selected from the local frequency representation. (d) Distorted image registered with the image in (a) by applying a 3.1° de-rotation.
Figure 5. (a) An aerial image of part of a city. (b) Distorted image obtained by applying a 6° rotation to the reference image. (c) Local frequency representation of the image in (a). (d) The registered image obtained by applying a 6.01° de-rotation to the image in (b).
5. Conclusions
Utilization of the local frequency representation of the images to be fused offers a promising approach to solving the image registration problems that arise in the construction of high-resolution frames for multi-sensor surveillance and threat assessment. Computing the local frequency representations of the images with the algorithm described in this article yields a fully automated approach to image registration. The results presented in this paper demonstrate that the technique described here can be used efficiently to register diverse images with differing complexity levels and that the algorithm is quite robust to scene details. The present algorithm can hence find many applications in sensor fusion, object recognition and threat assessment, and the detection and tracking of surveillance targets. With some modifications, the algorithm can also be extended to sub-pixel registration, which has important applications in super-resolution frame reconstruction using micro-scanned sensor measurements. Details on these developments can be found in [22].
6. References

[1] L. G. Brown, “A survey of image registration techniques,” ACM Computing Surveys, 24(4), pp. 325-376, 1992.
[2] J. B. A. Maintz and M. A. Viergever, “A survey of medical image registration,” Medical Image Analysis, 2(1), pp. 1-36, 1998.
[3] V. Govindu and C. Shekhar, “Alignment using distributions of geometric properties,” IEEE Trans. PAMI, 21(10), pp. 1031-1043, 1999.
[4] J. Ton and A. K. Jain, “Registering Landsat images by point matching,” IEEE Trans. Geoscience and Remote Sensing, 27(9), pp. 642-651, 1989.
[5] T. Kim and Y. J. Im, “Automatic satellite image registration by combination of matching and random sample consensus,” IEEE Trans. Geoscience and Remote Sensing, 41(5), pp. 1111-1117, 2003.
[6] P. Viola and W. Wells, “Alignment by maximization of mutual information,” Proc. Fifth Int. Conf. on Computer Vision, pp. 16-23, Boston, MA, 1995.
[7] K. Johnson, A. Rhodes, J. Le Moigne, and I. Zavorin, “Multi-resolution image registration of remotely sensed imagery using mutual information,” Proc. SPIE AeroSense Conf. on Wavelet Applications VII, 2001.
[8] P. Fua and Y. Leclerc, “Image registration without explicit point correspondences,” Proc. DARPA Image Understanding Workshop, pp. 981-992, 1994.
[9] J. Liu, B. C. Vemuri, and F. Bova, “Multi-modal image registration using local frequency,” Proc. Fifth IEEE Workshop on Applications of Computer Vision, pp. 120-125, 2000.
[10] J. Liu, B. C. Vemuri, and J. L. Marroquin, “Local frequency representations for robust multimodal image registration,” IEEE Trans. Medical Imaging, 21(5), pp. 462-469, 2002.
[11] J. Liu, “Regularized quadrature for local frequency estimation: application to multi-modal volume image registration,” Proc. VMV’01, Stuttgart, Germany, pp. 507-514, 2001.
[12] G. H. Granlund and H. Knutsson, Signal Processing for Computer Vision, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1995.
[13] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal – Part 1: Fundamentals,” Proc. of the IEEE, 80(4), pp. 520-536, 1992.
[14] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal – Part 2: Algorithms and applications,” Proc. of the IEEE, 80(4), pp. 540-568, 1992.
[15] M. Elbakary and M. K. Sundareshan, “A novel scheme for registration of images from multiple diverse sensors,” Proc. Int. Conf. on Imaging Science, Systems and Technology (CISST’04), Las Vegas, Nevada, June 2004.
[16] D. Gabor, “Theory of communication,” Journal of the Institution of Electrical Engineers, 93, pp. 427-457, 1946.
[17] G. M. Haley and B. S. Manjunath, “Rotation-invariant texture classification using a complete space-frequency model,” IEEE Trans. Image Processing, 8(2), pp. 255-269, 1999.
[18] M. Elbakary and M. K. Sundareshan, “Extraction of control points from local frequency representation for image registration,” Technical Report IPDSL-12-2003, ECE Department, University of Arizona, Tucson, AZ, July 2003.
[19] S. Gold, A. Rangarajan, C.-P. Lu, S. Pappu, and E. Mjolsness, “New algorithms for 2D and 3D point matching: pose estimation and correspondence,” Pattern Recognition, 31(8), pp. 1019-1031, 1998.
[20] W. K. Pratt, Digital Image Processing, John Wiley & Sons, New York, 1991.
[21] M. Elbakary and M. K. Sundareshan, “Accurate representation of local frequency using a computationally efficient Gabor filter fusion approach with application to image registration,” Pattern Recognition Letters, June 2005.
[22] M. Elbakary and M. K. Sundareshan, “Sub-pixel registration of images using local frequency representations,” Technical Report IPDSL-10-2004, ECE Department, University of Arizona, Tucson, AZ, December 2004.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Data Fusion and Image Processing: A Few Application Examples Olivier GORETTA and Francis CELESTE DGA/SPOTI/ASC/EORD DGA/CEP/ASC/GIP
Abstract. Image data provided by different available and future observation satellites can improve our capabilities for detection and reconnaissance over an area of interest. To be effective, this information must be processed and used in a coherent manner. We introduce several examples that take advantage of SAR, optical and infrared images for mapping and for threat activity detection and assessment. Image fusion can be performed at three different processing levels (pixel, feature and decision). These different examples of fusion applications are related to defence purposes. Keywords: Image registration, data fusion, image fusion, image exploitation, remote sensing, DTM, mapping.
Introduction
Imagery exploitation is now a key element in several defence applications. Image data provided by different available and future observation satellites can improve our capabilities for detection and reconnaissance over an area of interest. To attain the best level of preparedness and provide the warfighter with the requisite information, all information must be processed and used coherently. We introduce several examples that take advantage of Synthetic Aperture Radar (SAR) and optical (visible and infrared) images for mapping and for threat activity detection and assessment. Image fusion can be performed at three different processing levels (pixel, feature and decision); the examples of fusion applications given here are related to defence purposes. The definition adopted for data fusion is the one proposed by Wald in 1998: data fusion is a formal framework in which are expressed means and tools for the alliance of data originating from different sources and for the exploitation of their synergy to obtain information whose quality cannot be achieved otherwise. The following points will be discussed:
− Co-registration or geo-localization.
− SAR image enhancement based on multi-temporal fusion: SAR images are naturally corrupted by multiplicative noise (“speckle”), which can make the analysis of SAR images more difficult than that of optical images. Additive multi-temporal image fusion can make interpretation easier by reducing the speckle without affecting the ground sample distance.
O. Goretta and F. Celeste / Data Fusion and Image Processing: A Few Application Examples
− Digital Terrain Model (DTM) extraction from the fusion of SPOT stereoscopy, European Remote Sensing Satellite (ERS) interferometry and Radarsat radargrammetry.
− 3D model reconstruction enhanced with hyperspectral imagery used for ground/building classification.
− Assessment of the quality of a 3D model using a single SAR image.
− An example of multi-source image data synergy (optical: visible and infrared; SAR) for military threat assessment (Detection, Recognition and Identification).
− A change detection method.
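The multi-temporal speckle-reduction idea mentioned above can be illustrated numerically: averaging K statistically independent speckled acquisitions of the same scene reduces the speckle's coefficient of variation by roughly sqrt(K). The scene and single-look noise model below are made up for the illustration (unit-mean exponential multiplicative noise is a common model for single-look intensity speckle).

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up constant-reflectivity scene, 128 x 128 pixels.
scene = np.full((128, 128), 5.0)

# K independent single-look acquisitions with multiplicative speckle.
K = 16
looks = [scene * rng.exponential(1.0, scene.shape) for _ in range(K)]

single = looks[0]
fused = np.mean(looks, axis=0)  # additive multi-temporal fusion

# The coefficient of variation (std/mean) drops by about sqrt(K),
# while the ground sample distance is untouched.
cv_single = single.std() / single.mean()
cv_fused = fused.std() / fused.mean()
```

For K = 16 the coefficient of variation falls from about 1 (fully developed single-look speckle) to about 0.25, which is the "easier interpretation" effect described in the bullet above.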
1. Geo-localization and Registration Task
1.1. Image Geo-localization

Geo-localization consists of finding a mathematical model that allows the geographical position of each image pixel to be identified. This model may be more or less accurate depending on the knowledge of the imaging acquisition process [4]. A geo-localization model is defined by two functions:
    (λ, φ) = G(i, j, h)
    (i, j) = H(λ, φ, h)
where (λ, φ), h and (i, j) are, respectively, the geographical position, the height above the Earth's surface and the pixel position of a point present in the image. Two main families of models can be found:
− A complete physical model, in which all information about the flight or satellite and the sensors is used.
− A mathematical model, such as a polynomial or rational polynomial model.
Generally, some parameters of the geo-localization functions are provided with the image, but very often they are re-estimated. This can be done using meaningful points whose geographical positions are known.

1.2. Image Registration

Image registration [5], [1] is the process of overlaying two or more images of the same geographic area. The images can be taken:
− at different times,
− from different viewpoints,
− and/or by different sensors.
Owing to the differences introduced by the various imaging conditions, accurate registration remains a challenging task. Nevertheless, image registration is a crucial step in all image analysis tasks in which the final information is gained from the synergy of various data sources, as in image fusion, change detection, and DTM mapping. The
O. Goretta and F. Celeste / Data Fusion and Image Processing: A Few Application Examples
223
purpose of a registration algorithm is to geometrically align two or more images: the master and the slave(s). It is impossible to define a generic image registration method suitable for all applications, due to the diversity of images to be registered and the various types of image degradation. Geometric and radiometric distortions, as well as noise corruption, must be taken into account according to the nature of the data considered. Nevertheless, most image registration methods can be divided into four main steps:
- Feature extraction: distinctive structures such as contours, lines, points or surfaces are detected and extracted from the images. The extraction can be performed manually, semi-automatically or fully automatically. Point features are classically called tie-points or Control Points.
- Feature matching: the aim is to establish a correspondence between the features from the master and those from the slave. This step can also be done manually or automatically.
- Geometric transform model estimation: given the features and the correspondence map from the previous step, the aim is to find the "best transformation" model for aligning the slave image with the master.
- Image geometric and radiometric resampling: the slave image is resampled into the geometry of the master image using an appropriate interpolation method.
[Figure 1 flow chart: feature extraction on the master and slave images produces master and slave features; feature matching yields correspondent features; geometric model estimation produces the model parameters used for image resampling.]
Figure 1. Image registration method
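The model-estimation and resampling steps can be sketched as follows. This is a minimal, hypothetical Python/NumPy illustration (an affine model estimated by least squares from already-matched tie-points, followed by nearest-neighbour resampling), not the registration chain actually used by the authors:

```python
import numpy as np

def estimate_affine(master_pts, slave_pts):
    """Least-squares affine model mapping slave tie-points onto master tie-points."""
    n = len(master_pts)
    A = np.zeros((2 * n, 6))
    b = np.zeros(2 * n)
    for k, ((xm, ym), (xs, ys)) in enumerate(zip(master_pts, slave_pts)):
        A[2 * k] = [xs, ys, 1, 0, 0, 0]      # row for the x equation
        A[2 * k + 1] = [0, 0, 0, xs, ys, 1]  # row for the y equation
        b[2 * k], b[2 * k + 1] = xm, ym
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p.reshape(2, 3)                   # [[a, b, tx], [c, d, ty]]

def resample_nearest(slave, T, shape):
    """Resample the slave image into the master geometry (nearest neighbour)."""
    Tinv = np.linalg.inv(np.vstack([T, [0.0, 0.0, 1.0]]))  # master -> slave
    out = np.zeros(shape)
    for i in range(shape[0]):
        for j in range(shape[1]):
            xs, ys, _ = Tinv @ [j, i, 1.0]
            r, c = int(round(ys)), int(round(xs))
            if 0 <= r < slave.shape[0] and 0 <= c < slave.shape[1]:
                out[i, j] = slave[r, c]
    return out

# Tie-points related by a pure (2, 3) pixel translation:
slave_pts = [(0, 0), (10, 0), (0, 10), (10, 10)]
master_pts = [(x + 2, y + 3) for x, y in slave_pts]
T = estimate_affine(master_pts, slave_pts)
```

With at least three non-collinear tie-points the affine model is fully determined; more tie-points make the least-squares estimate robust to individual matching errors.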
1.3. Example: Automatic SAR-SAR Registration
The example shown in Figure 2 deals with the registration of two SAR images with different aspect angles. In this context, classical area-based methods such as cross-correlation are not suitable because of the large geometric distortion between the two images. The idea is to use higher-level features such as lines. The matching and the geometric transformation model are estimated simultaneously by a hypothesis test on the transformation.
Figure 2. Image with extracted line features (image courtesy of ONERA)
2. SAR Image Enhancement Based on Multi-Temporal Fusion
SAR images are naturally corrupted by a multiplicative noise-like phenomenon (known as "speckle"), which can make them more difficult to analyse than optical images. The wave emitted by the sensor interacts with each discrete scatterer (individual surface element) within the resolution cell; each scatterer contributes a backscattered wave with a phase and amplitude change, so the total returned signal is:

A e^{i\phi} = \sum_{k=1}^{N} A_k e^{i\phi_k}

The individual scattering amplitudes A_k and phases \phi_k are unobservable because the individual scatterers are on a much smaller scale than the resolution of the SAR sensor, and there are normally many such scatterers per resolution cell. This is the Goodman model [2], which can be used for low and medium resolution SAR sensors such as ERS or Radarsat. With this approach, speckle can be understood as an interference phenomenon in which the principal source of the noise-like quality of the observed data is the distribution of the phase terms \phi_k. In practice, the phase terms \phi_k can be considered uniformly distributed in [-\pi; \pi] and independent of the amplitudes. If we presume large numbers of statistically identical scatterers, the observed phase \phi is uniformly distributed over [-\pi; \pi]. Speckle reduction can be performed on a single image by spatial filtering of neighbouring pixels, but this reduces the resolution. If we assume that the scatterers in different images are randomly distributed and numerous enough, then by fusing, i.e., summing the returned signals of the images, the resulting imaginary term should be approximately zero: additive multitemporal fusion can therefore reduce speckle without affecting the ground sample distance, making the interpretation process easier. This fusion can be done using different techniques or filters [7], which lead to different visual aspects for a given image, as can be observed in Figure 3.
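A quick numerical sanity check of this effect can be sketched as follows (Python/NumPy, using the exponential single-look intensity statistics implied by the Goodman model; the flat scene and the number of dates are arbitrary assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
reflectivity = np.full((64, 64), 10.0)      # flat homogeneous scene (assumed)
N = 16                                      # number of acquisition dates (assumed)

# Goodman model: single-look SAR intensity is exponentially distributed
# around the true reflectivity, independently from date to date.
stack = rng.exponential(reflectivity, size=(N, 64, 64))

single = stack[0]
fused = stack.mean(axis=0)                  # additive multitemporal fusion

cv_single = single.std() / single.mean()    # speckle strength, ~1 for one look
cv_fused = fused.std() / fused.mean()       # drops roughly as 1/sqrt(N)
```

With N = 16 dates the coefficient of variation falls by about a factor of four, while the pixel grid (ground sample distance) is untouched.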
Figure 3. Original (upper left) and multitemporal fused images with three different techniques [7]. (images courtesy of CNES and SILOGIC)
In practice, filter selection depends on the amount of available data (images) and on user needs. For example, some methods are more relevant for applications where contours must be preserved.
3. Digital Terrain Modelling (DTM) Extraction from ERS Interferometry, SPOT Stereoscopy and Radarsat Radargrammetry Fusion There are three main methods to obtain elevation information from remote sensing data. These methods take advantage of data from different viewpoints, usually involving an image pair. These data can be either optical (SPOT stereoscopy) or SAR (interferometry or radargrammetry).
3.1. Interferometry Processing
Interferometry [1] [2] uses two overlapping images acquired from two orbits separated by a small distance (the baseline). It differs from stereoscopy and radargrammetry in that the information used is not absolute but ambiguous. The height information is obtained by unwrapping the phase difference between the two SAR acquisitions. The absolute height can then be calculated using ground control points.

[Figure: interferometric acquisition geometry, showing the two sensors S1 and S2 separated by the baseline B (components B_x, B_z), the slant ranges r_1 and r_2, the incidence angle \theta, the platform altitude H and the terrain height z.]

The height can be estimated from the following approximated formulas:

\Delta\phi = \phi_1 - \phi_2 = \frac{4\pi}{\lambda}(r_1 - r_2) \approx \frac{4\pi}{\lambda}\,(B_x \sin\theta - B_z \cos\theta), \qquad r_1 \approx \frac{H - z}{\cos\theta}
Interferometry processing involves:
- Image co-registration, as described before. First, a coarse registration is done using the orbital information of the sensors. It is then refined with an area-based correlation technique using Fast Fourier Transform (FFT) computation.
- Interferogram building. The complex multiplication of the two registered images gives the phase difference map (interferogram) and the coherence image. The resolution of the derived DTM is generally lower than that of the original images.
- DTM extraction. This is done by unwrapping the interferogram. Each pixel phase is known modulo 2\pi. The unwrapping strategy consists of converting the phase difference measurement to distance, from which the elevation is derived.
- DTM calibration. Due to the phase's ambiguous nature, the DTM does not have an absolute calibration. Height ground control points are therefore used to
calculate a correct DTM. These ground control points need to be chosen from a perfectly unwrapped area, usually manually. They can be taken from a map or from another DTM.
- DTM correction. This is often necessary because of residual misalignment of the two images or unwrapping failures in some areas.
The theoretical accuracy of the interferometry process depends on the so-called ambiguous height, linked to the wavelength \lambda, the interferometric baseline B and the sensor acquisition angle \theta:

e_a = \frac{\lambda\, r \sin\theta}{2 B_x}
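As a rough numerical illustration of the ambiguous height (the sensor values below are ERS-like orders of magnitude assumed for the example, not figures from the text):

```python
import math

def ambiguous_height(wavelength_m, slant_range_m, theta_deg, baseline_x_m):
    """e_a = lambda * r * sin(theta) / (2 * Bx): the height change that
    produces one full 2*pi fringe in the interferogram."""
    return (wavelength_m * slant_range_m *
            math.sin(math.radians(theta_deg)) / (2.0 * baseline_x_m))

# C-band (5.66 cm wavelength), ~850 km slant range, 23 deg incidence, 200 m baseline
ea = ambiguous_height(0.0566, 850e3, 23.0, 200.0)
```

This gives an ambiguous height of roughly 47 m per fringe; a larger baseline makes the interferogram more sensitive to height, at the cost of lower coherence.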
The quality of the extracted DTM is also described by the coherence image; the higher the coherence, the better the estimated height. Nevertheless, interferometry suffers from limitations such as: x Unwrapping failures x Very high sensitivity to time differences between the two images, which causes low coherence. x Sensitivity to atmospheric effects x Foreshortening and shadowing effects due to the SAR sensor geometric acquisition 3.2. Stereoscopy Processing Stereoscopy is a classical approach to derive DTM from two optical images, and involves the following steps: : x Image registration x Image resampling in epipolar geometry x Correlation between the two images to obtain a disparity image map x DTM extraction and filtering.
z \approx \frac{d}{\tan\theta_2 - \tan\theta_1} \approx \frac{d}{B/H}

[Figure: stereoscopic acquisition geometry, showing the two sensors S1 and S2 separated by the baseline B, the viewing angles \theta_1 and \theta_2, the platform altitude H, the terrain height z and the disparity d.]
The theoretical accuracy of the stereoscopy process depends on the geometrical configuration of the two images, defined by the so-called B/H ratio:

\delta z = \frac{\delta d}{B/H}
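The accuracy relation above can be illustrated numerically; the B/H ratio, ground sample distance and matching precision below are assumed example values:

```python
def height_error(disparity_error_m, b_over_h):
    """delta_z = delta_d / (B/H): disparity (matching) error mapped to height error."""
    return disparity_error_m / b_over_h

# Half-pixel matching accuracy on a 10 m GSD image, with B/H = 0.6 (assumed):
dz = height_error(0.5 * 10.0, 0.6)
```

This yields a height error of about 8.3 m. Increasing B/H improves height accuracy, but it also increases the geometric dissimilarity between the two images and thus makes the correlation step harder.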
with:
- \delta z: height error
- \delta d: disparity error
The confidence criterion for the estimated height is directly related to the correlation peak; the higher the correlation, the better the estimate. The limitations of stereoscopic methods are well known and are essentially:
- Sensitivity to time differences between image acquisitions
- Presence of shadows and clouds in the images
- Correlation difficulties in mountainous areas

3.3. Radargrammetry Processing
Radargrammetry [1] [2] is based on the same principles as stereoscopy, but takes into account the geometric peculiarities of SAR images. The general algorithm is the same as for stereoscopy, with the added possibility of using speckle-reducing filters (see Section 2) before computation. The elevation confidence criterion is also derived from the correlation process.

[Figure: radargrammetric acquisition geometry, showing the two sensors S1 and S2 separated by the baseline B, the slant ranges r_1 and r_2, the incidence angles \theta_1 and \theta_2, the platform altitude H, and the displacement \Delta x of point M due to the height difference \Delta z.]

\Delta x = (\cot\theta_1 - \cot\theta_2)\,\Delta z, \qquad \Delta x = \frac{\Delta r}{\cos\theta_2}
The main limitations of radargrammetry are:
- Sensitivity to time differences between image acquisitions
- Foreshortening and shadowing effects due to the SAR geometry
- Speckle

3.4. Fusion Methodology
The goal of the fusion process is to obtain a DTM from several DTMs previously derived using the three main methods described above. Each method provides a height estimate with a confidence parameter; due to the methods' limitations, some locations have no estimate. The purpose is to find the height h(X) at each position X by minimizing the cost function:

J(h(X)) = \sum_k \sum_X \left( \frac{o_k(X) - o_k^{th}(X, h(X))}{\sigma_{o_k}(X)} \right)^2 + \gamma \sum_X \Phi(h(X))

with:
- o_k(X): estimate given by method k at position X, which may be the disparity, the phase difference or height information
- \sigma_{o_k}(X): confidence associated with o_k(X)
- o_k^{th}(X, h(X)): theoretical estimate for method k at position X
- \gamma \sum_X \Phi(h(X)): a regularization term used to filter or smooth the fused DTM and to fill in the missing parts; it introduces rigidity constraints on the result.
The three following combinations were tried:
- C1: SPOT stereoscopy and Radarsat radargrammetry
- C2: several radargrammetry pairs with Radarsat data
- C3: SPOT stereoscopy and interferometry
C1 enables missing parts due to stereoscopy limitations (clouds, ...) to be filled in while maintaining accuracy. C2 and C3 not only derive a more complete DTM, but also provide more accurate height estimates.
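When each method directly supplies a height map with a per-pixel confidence, and the regularization term is set aside, minimizing J reduces, pixel by pixel, to an inverse-variance weighted average. The sketch below is a simplified hypothetical illustration of that special case (not the authors' full cost-function minimization), including the gap-filling behaviour:

```python
import numpy as np

def fuse_heights(estimates, sigmas):
    """Per-pixel inverse-variance fusion of K height maps.
    estimates, sigmas: arrays of shape (K, H, W); NaN marks a missing estimate.
    The smoothing/regularization term of the cost function is omitted here."""
    w = 1.0 / np.square(sigmas)
    w = np.where(np.isnan(estimates), 0.0, w)   # missing data gets zero weight
    e = np.nan_to_num(estimates)
    wsum = w.sum(axis=0)
    safe = np.where(wsum > 0, wsum, 1.0)
    return np.where(wsum > 0, (w * e).sum(axis=0) / safe, np.nan)

# Two methods over a 1x2 area; method 1 has a cloud hole in the second pixel.
h1 = np.array([[[100.0, np.nan]]])
h2 = np.array([[[110.0, 95.0]]])
fused = fuse_heights(np.concatenate([h1, h2]), np.full((2, 1, 2), 2.0))
```

Where both methods contribute, the result is the confidence-weighted mean; where only one does, its value is kept, which is exactly the complementarity exploited by combinations C1-C3.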
[Figure 4 flow chart: GCP images and n registered image pairs are each processed by method 1 to method n, producing height estimates (o_1, \sigma_{o_1}), (o_2, \sigma_{o_2}), ..., (o_n, \sigma_{o_n}) that feed the height fusion process, which outputs the fused DTM.]
Figure 4. Height fusion process description

Figure 5. Stereoscopy (upper left), radargrammetry (upper right), interferometry (lower left) and stereoscopy + interferometry (lower right). Images courtesy of Thalès.
4. Fusion of a Stereo Image Pair and Hyperspectral Data for Enhanced 3D Urban Extraction
The 3D model extraction process can be improved by using hyperspectral data. As described above, 3D information can be obtained by exploiting two images (SAR or optical) taken from two different viewpoints. For urban 3D mapping, it is necessary to differentiate points such as building tops from other high points. To solve this problem, a classification step based on hyperspectral data can be used: hyperspectral data composed of about a hundred spectral bands can reduce the confusion between building and non-building classes. Again, an accurate registration of the images and the hyperspectral data set is necessary. Different classification methods can be used; however, hyperspectral data classification needs a pre-processing step to reduce the considerable amount of information. Statistical methods such as Principal Component Analysis (PCA), also called the Karhunen-Loève approach, can be used. This technique transforms a multivariate data set of M dimensions into a set of new uncorrelated linear combinations of the original variables, generating a new set of orthogonal axes. In this new description space, a substantial part of the information contained in the original data is carried by the first q (q << M) bands. The classification can then be supervised.
Figure 6. The left stereo pair image and the right one with one band of the hyperspectral data set
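The PCA reduction step described above can be sketched as follows (a minimal Python/NumPy illustration; the cube dimensions and the synthetic data are arbitrary assumptions):

```python
import numpy as np

def pca_reduce(cube, q):
    """Project an (H, W, M)-band hyperspectral cube onto its first q principal components."""
    H, W, M = cube.shape
    X = cube.reshape(-1, M).astype(float)
    X = X - X.mean(axis=0)                 # centre each band
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    comps = vecs[:, ::-1][:, :q]           # eigenvectors by decreasing variance
    return (X @ comps).reshape(H, W, q)

# Synthetic 10-band cube whose bands are mixtures of only 2 underlying signals:
rng = np.random.default_rng(1)
cube = rng.normal(size=(8, 8, 2)) @ rng.normal(size=(2, 10))
reduced = pca_reduce(cube, 2)
```

For this rank-2 cube, two components retain essentially all of the variance; a supervised classifier is then trained on the q retained bands rather than on all M original bands.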
Figure 7. A view of the extracted 3D model draped with the left image pair. Images courtesy of ISTAR, Alcatel Space and Thalès
5. 3D Model Extraction and Assessment with Optical and SAR Images
In this application, the 3D model is extracted from a stereo image pair. One or several SAR images with different aspect angles are used to assess the extraction. The extracted buildings are projected onto the SAR images using the geo-localization model, and the buildings' shadows are also calculated. A visual assessment can then be performed.
Figure 8. SAR image with one building with its projection. Image courtesy of ONERA and EADS
6. Relevant Image Synergy for Threat Assessment
In this application, two areas with a possible military threat are considered. Intelligence sources have revealed that only one of the two areas is really dangerous. The dangerous area (with true military targets) must be found with the help of satellite images in different spectral ranges. Thanks to multi-sensor synergy, real targets can be separated from decoys. The images below show that a decision cannot be made with visible images alone. The infrared image shows objects with a high radiant temperature, but no definite conclusion can be drawn since inflatable decoys with warming devices exist. In the SAR images, only real targets have a signature. The joint use of images from different spectral ranges enables decoys to be distinguished from real targets.
Images courtesy of French MOD
7. Change Detection Method
Change detection is of tremendous importance in remote sensing applications. It can be performed by a human interpreter, and some automated methods also seem promising; automated change detection remains a challenging research domain in the field of defence applications. An example of automated change detection is automatic map updating, where changes in the image concern objects like man-made structures, fields, forests and water areas. To simplify the process, cartographic objects that cease to exist can be processed separately from new objects. For configurations of partial overlap between map and images, it is difficult or even impossible to formalise the suggested approach within a probabilistic framework. Thus, the Dempster-Shafer theory appears to be a more suitable formalism given the available information [6].
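To make the evidential formalism concrete, the sketch below implements Dempster's rule of combination on a toy two-hypothesis frame {change 'C', no-change 'N'}, with mass allowed on the full frame 'CN' (total ignorance). The mass values are invented for illustration and do not come from [6]:

```python
def dempster_combine(m1, m2):
    """Combine two basic mass assignments over subsets of the frame {'C', 'N'}.
    Focal elements are strings: 'C', 'N' or 'CN' (total ignorance)."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = ''.join(sorted(set(a) & set(b)))
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb            # mass falling on the empty set
    # Normalize by 1 - conflict (Dempster's rule)
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# One source believes "change"; the other is less committed:
m = dempster_combine({'C': 0.6, 'CN': 0.4},
                     {'C': 0.5, 'N': 0.3, 'CN': 0.2})
```

The combined masses remain normalized, and the ignorance mass 'CN' lets a source abstain instead of forcing an a priori probability, which is the practical advantage over the Bayesian formulation when the map and images only partially overlap.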
To detect changes, both new information (images) and previous information (map) are used in a two-stage process. The first stage concerns the detection of missing objects and the second the detection of new objects. New objects are detected by a multi-spectral classification into four classes (water area, ground area, gas tank, building) of the image areas that correspond to ground area classes on the map, i.e., precisely where new man-made constructions (gas tanks and buildings) or new water areas may appear. A region not classified as a ground area class corresponds to a new object; otherwise, a "no-change" decision is obtained for this region of the map. Before classification, an important operation is the statistical learning of each class, carried out automatically from objects declared during the first stage as still present in the images. These regions are used as training areas to estimate the a priori probabilities and the probability densities of these classes. Missing objects must be detected with an appropriate strategy.

[Flow chart: objects from the classes {Building, Gas tank, Water area} feed the detection of missing objects; ground area objects and the automatic learning process feed the detection of new objects by multispectral classification.]
General synoptic to detect changes
The results of the extraction of new object areas by each approach are presented in the following figure. The two approaches attain an equivalent good-detection rate. The evidential method extracts one more gas tank and delineates one new object area with slightly better precision. The difference is mainly due to false alarms, which are more significant with the Bayesian approach, as shown in the upper left part of the image. In conclusion, we are in favour of the evidential approach: for an equivalent good-detection rate, its false alarm rate is much lower than that of the Bayesian approach.
Bayesian approach (left) and evidential approach (right); good detections and false alarms are marked.
8. Conclusion
Several fusion methods and applications have been presented in this paper. Image fusion can be used in different cases at three different levels (pixel, feature, decision). The examples emphasized the need for fusion to produce efficient and user-friendly geospatial intelligence data, providing the warfighter with the information necessary for the best level of preparedness.
References
[1] F. W. Leberl. Radargrammetric Image Processing. Artech House, 1990.
[2] H. Maître (ed.). Traitement des images RSO. Hermès, 2001.
[3] C. Oliver & S. Quegan. Understanding Synthetic Aperture Radar Images. Artech House, 1998.
[4] T. Toutin. Photogrammétrie satellitale pour les capteurs de haute résolution : état de l'art. SFPT No. 175.
[5] B. Zitová & J. Flusser. Image registration methods: a survey. Elsevier, 2003.
[6] F. Janez, O. Goretta, A. Michel. Automated map updating by fusion of multispectral images in the Dempster-Shafer framework. SPIE, 2000.
[7] The speckle filters comparative test project: multi-temporal filters. CNES (French Space Agency), May 2003.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Secondary Application Wireless Technologies to Increase Information Potential for Defence against Terrorism*
Christo KABAKCHIEV, Vladimir KYOVTOROV, Ivan GARVANOV
Institute of Information Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., bl. 2, 1113 Sofia, Bulgaria. Phone: +3592/979-29-28. E-mail:
[email protected];
[email protected];
[email protected]

Abstract: This paper concerns the detection, parameter estimation and height estimation of Pseudo-random Noise (PN) signals with a passive radar receiver network, applied in wireless communication systems with multipath interference. The investigation is carried out by Monte Carlo simulation. The results can be applied to target detection in multistatic radars using existing communication networks.

Keywords: Multi-sensor data fusion, passive correlation receivers, OS CFAR processor, multipath interference.
Introduction
Data Fusion technology is continually searching for new information sensors, new data processing algorithms, and reliable data relationships to increase the completeness of the information needed for the defense against terrorism. Using networks of passive receivers to locate moving targets that reflect signals emanating from communication, navigation, TV or radio broadcasts is not well studied. However, these so-called Secondary Application Wireless Technology (SAWT) systems can be successfully used for the surveillance of large areas where the positioning of sensors is difficult and where the only reliable sensors are satellites. These systems could prove extremely important in rapidly developing crisis situations, typically arising as a result of terrorist actions. Our aim is to use the existing Code Division Multiple Access (CDMA) wireless network for a secondary application: radar detection, and parameter and height estimation of low flying targets. This could be achieved by adding passive radar receivers to the Base Stations (BS) of the CDMA network, forming both a CDMA wireless network and a passive coherent radar network. As a result, a low flying target, as it flies over the wireless communication network, would also cross the network of passive radar receivers. The receivers would use the global time synchronization of the Global Positioning System (GPS) or the CDMA network and would be phase-

* This work is supported by IIT - 010059/2004, MPS Ltd. Grant "RDR" and Bulgarian NF "SR" Grant № TH - 1305/2003.
C. Kabakchiev et al. / Secondary Application Wireless Technologies
237
synchronized with the CDMA network, but would be managed through their own or a central control system. This paper concerns the detection, parameter estimation and height estimation of Pseudo-random Noise (PN) signals in multipath interference, using a passive radar receiver network linked to a wireless communication system [2]. In our work, each passive radar receiver uses the optimal detection structure for PN sequences, consisting of a correlation receiver and an Ordered Statistics Constant False Alarm Rate (OS CFAR) processor. The signals from the passive receiver network are processed in the fusion node: first they are synchronized in time and space, and then they are assessed through a typical decision rule. By using three passive radars to simultaneously perform target detection, distance estimation and data synchronization, the target height can be estimated in the fusion node by applying the technique presented in Skolnik [3]. We assume that the target echo from the communication signal fluctuates according to a Swerling II model at the input of the correlation receiver, that the multipath interference appears with Poisson probability, and that its amplitudes follow a Rayleigh distribution. Turin's model for multipath propagation is used [4,5]. The same approach can be used for other complex signals from different communication systems (GPS, CDMA 2000, WCDMA) [7].
1. Detection and Estimation in CDMA Networks in the Presence of Multipath Interference - Problem Formulation
The purpose of our research work, presented in this paper, is to synthesize the structure of a network passive radar algorithm for simultaneous detection, and parameter and height estimation, of a low flying target in a CDMA communication network. Contemporary CDMA communication networks are used for data transfer via an air interface for mobile subscribers [6]. Generally, the networks consist of Base Stations (BS), with each BS covering a specific area (cell). The air interface consists of a pilot signal, a synchronization signal, and paging and traffic channels. The pilot signal is the same for the whole network and is used for phase initialization of demodulation (it supports system coherence) [6]. The synchronization channel is demodulated by all mobiles and contains important system information conveyed by the "synch channel message", which is broadcast repeatedly. The paging channels are used to alert the mobile to incoming calls, to convey channel assignments and to transmit system overhead information. The traffic channels carry the digital voice or data to mobile users. All moving and fixed objects in the area covered by the communication networks reflect these signals. In our case, the target, flying over the wireless communication network, crosses the network of small passive radars. The receivers use the global time synchronization from the GPS or CDMA network and are phase-synchronized with the CDMA network, but are managed through their own or a central control system. We suppose the target goes through different cells and receives the pilot signal from different base stations. Of the various signals used in a CDMA network, we choose the pilot signal as the radar signal, as it is the most powerful, has a continuous code sequence with a long repetition period, and is continuous in time.
The signal reflected from the target consists of many independent reflecting elements. Such a signal distribution is known as the Swerling II model with a priori known parameters. Specific interference in the
CDMA communication networks is the multipath interference, caused by the spread-spectrum character of the signals. We use the well-known Turin model for multipath propagation in urban areas, with a priori known parameters; it describes signals with Rayleigh-distributed amplitudes and Poisson probability of appearance. The signal time delay is assumed to be approximately 20 \mu s (or 3 km). These interferences, together with the background interference, cover all small cells (micro and pico). Unlike the background interference, which conceals the target signal, the multipath propagation forms many clutter targets and worsens detection and estimation. The task of passive radar network detection and estimation in a CDMA network can thus be transformed into the detection of a target with unknown coordinates and velocity in the presence of multipath interference with a priori known signal and interference parameters. We use the approach applied in surveillance radar for the detection of moving targets and the estimation of their parameters, as well as some well-known background suppression methods (for example Moving Target Detection (MTD), Adaptive Moving Target Detection (AMTD) or Space-Time Adaptive Processing (STAP)) [7]. In this case, the detection of a target with unknown coordinates and velocity is transformed into multi-channel range target detection in a fixed velocity (azimuth) channel [7]. Using the CFAR approach allows the false alarm rate to be kept constant in all range cells. As a result, target detection in any communication cell can be treated as CFAR detection in a moving window sliding in range over all velocity channels (in any azimuth channel). We do not investigate moving targets in this paper; target detection is therefore reduced to pilot signal CFAR detection in a range-sliding moving window (in any azimuth channel), in the presence of multipath interference.
2. Signal and Environment Model
In this paper we study signal and environment models similar to those proposed in [4,5]:

w(t) = a_0 s(t - t_d) + \sum_{k=1}^{\infty} a_k s(t - t_k) + n(t)    (1)

where a_0 s(t - t_d) is the reflected signal in every range element of the signal matrix (a cell); a_0 is an amplitude fluctuating independently according to the Rayleigh distribution law (Swerling II target model); s(t) is the PN communication pilot signal; t_d is the time delay of the direct signal; \sum_{k=1}^{\infty} a_k s(t - t_k) is Turin's multipath model, where a_k is an amplitude fluctuating according to the Rayleigh distribution law and t_k is the delay time with Poisson probability of appearance; and n(t) is Additive White Gaussian Noise (AWGN). We do not consider the uniform random distribution of phases in the multipath model. The emitted signal is continuous, but the received signal can be considered a pulse signal after the use of a correlator with fixed length. In our case, the reflected signal is a PN-code communication pilot signal. The PN-code spreading is followed by classic Quadrature Phase-Shift Keying (QPSK) modulation of the radio frequency carrier. We use the communication channel model suggested by Turin [4,5], described as a pulse train with Rayleigh amplitude distribution, Poisson probability of appearance, and uniform random distribution of phases. In accordance with Turin's model, we choose a probability of appearance Pa = 0.2 at the input of the CFAR.
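Model (1) can be simulated directly. The sketch below (assumed parameter values, with a random ±1 chip sequence standing in for the actual IS-95 pilot code) generates one realization and passes it through a correlation receiver:

```python
import numpy as np

rng = np.random.default_rng(2)

def pn_sequence(n):
    """A +/-1 pseudo-random chip sequence standing in for the pilot PN code."""
    return rng.choice([-1.0, 1.0], size=n)

def turin_channel(s, td, n_cells, p_appear=0.2, sigma_a=0.5, noise_std=0.1):
    """Equation (1): a direct echo at delay td plus Rayleigh-amplitude multipath
    replicas appearing in each delay cell with probability p_appear, plus AWGN."""
    w = np.zeros(n_cells + len(s))
    w[td:td + len(s)] += rng.rayleigh(1.0) * s       # Swerling II direct echo
    for tk in range(n_cells):                         # Turin multipath train
        if tk != td and rng.random() < p_appear:
            w[tk:tk + len(s)] += rng.rayleigh(sigma_a) * s
    return w + rng.normal(0.0, noise_std, size=w.shape)

s = pn_sequence(127)
w = turin_channel(s, td=40, n_cells=100)
corr = np.correlate(w, s, mode='valid')               # correlation receiver output
```

The correlator compresses the continuous pilot into a pulse-like peak around the direct-path delay, while the multipath replicas appear as clutter peaks at other delays, which is exactly the environment the CFAR stage must cope with.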
3. Passive Receiver Network for Target Detection in a CDMA Communication Network in the Presence of Multipath Interference
3.1. Correlation Receiver
We use the baseband acquisition diagram for the pilot signal of CDMA IS-95-A [6]. It consists of a correlator and a threshold detector with a fixed threshold; in our case, we use an OS CFAR processor for target detection in the presence of multipath interference (Figure 1). The quadrature components at the baseband filter input are presented in Figure 1, where C_I(t) and C_Q(t) are the PN sequences in the I and Q channels respectively, n(t) is the Additive White Gaussian Noise (AWGN), and n_I(t) and n_Q(t) are the corresponding noise components in the two channels (statistically independent white Gaussian noise) [6].
[Figure: block diagram of the correlation receiver followed by an OS CFAR processor issuing the H1/H0 decision.]
Figure 1. Filter diagram
3.2. OS CFAR Processor
The signal after the correlation receiver is very dynamic. Unlike pulse radars, for which it is assumed that there is no signal in the reference window, our model works with continuous signals; therefore, the statistics in both the test and reference windows have similar structures, including white noise, signal, and multipath interference. We therefore use the OS approach for noise level estimation in both windows. The minimum of the average decision threshold is used as the criterion of effectiveness of these estimates [1]. The rank-order parameters giving the best OS estimates are chosen by Monte Carlo simulation. The effective estimates correspond to those elements of the ordered statistics in both windows for which the minimum SNR occurs for a detection probability Pd = 0.5 and a fixed false alarm probability Pfa. In order to optimize these estimates, we vary the rank-order parameters dependently and independently (from 3/4 L to 1/8 L), as is done in [1].
The algorithm consists of the following stages. The elements of the reference window x = (x_1, x_2, ..., x_R), R = ML, and of the test resolution cell z = (z_1, z_2, ..., z_L) are rank-ordered according to increasing magnitude. The main idea of an OS CFAR procedure is to select one value x_{(k)}, k \in \{1, 2, ..., R\}, and one value z_{(k_1)}, k_1 \in \{1, 2, ..., L\}, from the ordered sequences. These two values x_{(k)} and z_{(k_1)} are used as estimators V and q_0 of the average noise level of the reference window and the average signal level of the test window, respectively. The rank-order parameters k and k_1 of the OS CFAR procedure are chosen in such a way that the average decision threshold of the OS CFAR processor is minimal. The target is then detected according to the following algorithm:

H_1: \Phi(q_0) = 1, \text{ if } q_0 \ge T_a V
H_0: \Phi(q_0) = 0, \text{ if } q_0 < T_a V    (2)
where H_1 is the hypothesis that the test resolution cells contain the echoes from the target, and H_0 is the hypothesis that the test resolution cells contain only white noise, signal, and multipath interference. The constant T_a is a scale factor determined so as to maintain a given constant false alarm probability. Analytical equations for the Probability Density Functions (PDF), and hence for the detection and false alarm probabilities at the output of the correlator, are not available; the Monte Carlo simulation approach is therefore used to estimate the probability performance of the OS CFAR processor.

3.3. Fusion Node - Detection of Low Flying Targets
We use hard-decision sensors: each sensor chooses a single hypothesis, and that decision alone is reported to the fusion process. With such single-look methods (the sensor decision is based upon a single measurement of the target signal), the measure of certainty can be based upon accumulated evidence from multiple, independent looks at the signals. We use sequential L-of-M algorithms: the binary integrator sums the L sensor decisions, and fusion node detection is declared if this sum exceeds the second threshold M. The probability of target detection at the fusion node is computed using the expression:

P_D = \sum_{l=M}^{L} C_L^l \, P_{d1}^l \, (1 - P_{d1})^{L-l}    (3)
where Pd1 is the probability of detection from each sensor; the probability of false alarm is calculated in the same way, setting s = 0. Analytical expressions for the probability density functions (PDF) needed for the detection probability Pd0 and the false alarm probability Pd1 at the output of the OS CFAR are not available. The Monte Carlo simulation approach is therefore used to estimate the probability performance of the OS CFAR BI processor in multipath interference, as in [9,10].
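The per-sensor decision rule of Eq. (2) and the binary-integration rule of Eq. (3) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function and parameter names are ours, and in practice the rank parameters and scale factor would be chosen for a prescribed false alarm rate:

```python
import numpy as np
from math import comb

def os_cfar_decision(reference, test_cell, k, k1, Ta):
    """One OS CFAR decision, Eq. (2): compare the k1-th order statistic of the
    test cell (q0) against Ta times the k-th order statistic of the
    reference window (V)."""
    V = np.sort(reference)[k - 1]     # noise-level estimate from the reference window
    q0 = np.sort(test_cell)[k1 - 1]   # signal-level estimate from the test cell
    return 1 if q0 >= Ta * V else 0   # Phi(q0): 1 -> H1 (target), 0 -> H0

def fusion_pd(pd1, L, M):
    """Fusion-node detection probability, Eq. (3): at least M of the L
    per-sensor decisions must vote 'target'."""
    return sum(comb(L, l) * pd1**l * (1.0 - pd1)**(L - l) for l in range(M, L + 1))
```

For example, with identical sensors at Pd1 = 0.5, a 2-out-of-3 rule gives fusion_pd(0.5, 3, 2) = 0.5.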
C. Kabakchiev et al. / Secondary Application Wireless Technologies
241
3.4. Fusion Node – Height Estimation of Low Flying Targets

It is extremely important to know the height of low flying targets that have no transponders or that have entered air traffic controlled areas without permission. This problem can be solved by applying Skolnik's approach to a passive radar network: the three coordinates of a target are estimated by measuring the distance from each of the passive radars to that target. Using a three-position passive radar system, for example, the target coordinates can be determined by measuring only the three target distances (r1, r2, r3). Complete synchronization of the radar performance must be ensured, and the distances between the radars (the particular radar coordinates) must be measured. The target coordinates of a three-position passive radar system can be estimated with [2]:
x = (r1² − r2²) / (4a),    (4)

y = (r1² + r2² − 2r3² + 2(b² − a²)) / (4b),    (5)

z = √(r1² − (x + a)² − y²).    (6)
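A numerical sketch of Eqs. (4)-(6). The receiver geometry is our assumption (receivers at (−a, 0, 0), (a, 0, 0) and (0, b, 0), chosen so that the equations hold); the paper does not state the coordinates explicitly:

```python
import math

def target_position(r1, r2, r3, a, b):
    """Recover target coordinates from the three measured ranges,
    assuming receivers at (-a, 0, 0), (a, 0, 0) and (0, b, 0)."""
    x = (r1**2 - r2**2) / (4 * a)                                  # Eq. (4)
    y = (r1**2 + r2**2 - 2 * r3**2 + 2 * (b**2 - a**2)) / (4 * b)  # Eq. (5)
    z = math.sqrt(r1**2 - (x + a)**2 - y**2)                       # Eq. (6)
    return x, y, z
```

Feeding in the exact distances from a known target position returns that position, which is a quick consistency check on the geometry.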
In this paper we also use these relations to obtain the error of the target height estimation, modelling all the parameters in a MATLAB computational environment. The numerical characteristics of the coordinates – mean value and standard deviation – are evaluated for a three-position radar system. The mean values of the target coordinates (x, y, z) can be obtained according to expressions (4)-(6). The analytical mean value of z – the height estimate of the low flying target – is:
M[z] = √( M[r1]² − (M[x] + a)² − M[y]² + D[r1] − D[x] − D[y] − D[z] ).    (7)

The dispersion of z is:

D[z] = ( M[r1]² · D[r1] + (M[x] + a)² · D[x] + M[y]² · D[y] ) / M[z]².    (8)
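The statistics of the height estimate can be cross-checked by simulation. The sketch below propagates independent Gaussian errors on r1, x and y through z = √(r1² − (x + a)² − y²) and compares the sample mean and spread with a first-order propagation-of-error estimate; the independence assumption and all numerical values are ours, purely illustrative, and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 50.0
m_r1, m_x, m_y = 1000.0, 100.0, 200.0   # assumed mean values M[.]
d_r1, d_x, d_y = 1.0, 1.0, 1.0          # assumed dispersions (variances) D[.]
n = 200_000

# Sample r1, x, y independently and push them through the height equation
r1 = rng.normal(m_r1, np.sqrt(d_r1), n)
x = rng.normal(m_x, np.sqrt(d_x), n)
y = rng.normal(m_y, np.sqrt(d_y), n)
z = np.sqrt(r1**2 - (x + a)**2 - y**2)

# First-order (delta-method) approximations for comparison
m_z = np.sqrt(m_r1**2 - (m_x + a)**2 - m_y**2)                       # nominal height
d_z = (m_r1**2 * d_r1 + (m_x + a)**2 * d_x + m_y**2 * d_y) / m_z**2  # propagated variance
print(z.mean(), z.std(), m_z, np.sqrt(d_z))
```

For small measurement errors the sampled mean and standard deviation of z agree closely with the propagated values.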
4. Conclusion

This paper has considered the benefits of using a network of passive radar receivers over a wireless communication system for the detection, parameter estimation and height estimation of PN signals in multipath interference. By having three passive radars simultaneously perform target detection, distance estimation and data synchronization, target height estimation can be performed in the fusion node by applying the Skolnik approach,
which estimates the three coordinates of a target by measuring the distance of each passive radar to the target. The results can then be applied for target detection in multistatic radars using existing communication networks.
References

[1] Rohling, H.: Radar CFAR Thresholding in Clutter and Multiple Target Situations, IEEE Trans., vol. AES-19, 4, July, pp. 608-621, 1983.
[2] Cherniakov, M., Kubik, M.: "Secondary applications of wireless technology (SAWT)", 2000 European Conference on Wireless Technology, Paris, 2000.
[3] Skolnik, M.: Radar Handbook, McGraw-Hill, 1990.
[4] Turin, G.L. et al.: "A statistical model of urban multipath propagation", IEEE Trans. Vehicul. Technol., pp. 1-8, Feb. 1972.
[5] Suzuki, H.: "A Statistical Model for Urban Radio Propagation", IEEE Transactions on Communications, vol. COM-25, No. 7, July 1977.
[6] Lee, J., Miller, L.: CDMA Systems Engineering Handbook, Artech House, 1998.
[7] Lazarov, A., Minchev, Ch.: "ISAR Technique with Complementary Phase Code Modulated Signals", PLANS 2004 Conference, Monterey, CA, April 26-29, 2004.
[8] Kabakchiev, Chr., Garvanov, I., Kyovtorov, V.: "Correlation Receiver with Active CFAR Detector for PN Signal Processing in Pulse Jamming with Unknown Parameters", International Conference Radar'04, Toulouse, France, CD 6P-SP-121, 2004.
[9] Kabakchiev, Chr., Kyovtorov, V., Garvanov, I.: "Detection with OS CFAR processor in CDMA networks in the presence of multipath interference", Cybernetics and Information Technologies, vol. 4, No. 2, pp. 101-120, 2004.
[10] Behar, V., Kabakchiev, Chr., Doukovska, L.: "Adaptive CFAR PI Processor for Radar Target Detection in Pulse Jamming", VLSI, SP-26, pp. 383-396, 2000.
[11] Garvanov, I., Kabakchiev, Chr.: "Sensitivity of API CFAR Detectors Towards Change of Input Parameters of Pulse Jamming", Proc. of the International Radar Symposium IRS 2004, Warszawa, Poland, pp. 233-238, 2004.
[12] Garvanov, I., Kabakchiev, Chr.: "Sensitivity of CFAR Processors Toward the Change of Input Distribution of Pulse Jamming", Proc. of the IEEE International Conference on Radar "Radar 2003", Adelaide, Australia, pp. 121-126, 2003.
[13] Himonas, S.: CFAR Integration Processors in Randomly Arriving Impulse Interference, IEEE Trans., vol. AES-30, 3, July, pp. 809-816, 1994.
[14] Garvanov, I., Behar, V., Kabakchiev, Chr.: "CFAR Processors in Pulse Jamming", Numerical Methods and Applications 2002 (NMA 2002), Lecture Notes in Computer Science, LNCS 2542, pp. 291-298, 2003.
[15] Akimov, P., Evstratov, F., Zaharov, S.: Radio Signal Detection, Moscow, Radio and Communication, 1989, pp. 195-203 (in Russian).
[16] Waltz, E., Llinas, J.: Multisensor Data Fusion, Artech House, Boston, 1990.
[17] Kabakchiev, H., Garvanov, I., Kyovtorov, V.: "Error estimation in target height finding using VHF radar and three-antenna system positioned one above the other", Distributed Computer and Communication Networks, Sofia, 2005, pp. 222-238.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
S.G. Nikolov et al. / Adaptive Image Fusion Using Wavelets: Algorithms and System Design
[Figure: wavelet-based image fusion scheme – the input images I1(x, y), ..., In(x, y) are decomposed by the wavelet transform W, their coefficients are combined by a fusion rule φ, and the fused image I(x, y) is recovered by the inverse transform W−1.]

I(x, y) = W−1(φ(W(I1(x, y)), W(I2(x, y)), ..., W(In(x, y)))).
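The generic scheme I(x, y) = W−1(φ(W(I1), ..., W(In))) can be illustrated with a toy example: a single-level 2-D Haar transform stands in for W, and a maximum-magnitude coefficient selection stands in for the fusion rule φ. Both choices are our simplifications; practical systems use deeper, shift-invariant decompositions:

```python
import numpy as np

def haar2(img):
    """Single-level 2-D Haar analysis: approximation + 3 detail subbands."""
    p00, p01 = img[0::2, 0::2], img[0::2, 1::2]
    p10, p11 = img[1::2, 0::2], img[1::2, 1::2]
    a = (p00 + p01 + p10 + p11) / 4
    h = (p00 - p01 + p10 - p11) / 4
    v = (p00 + p01 - p10 - p11) / 4
    d = (p00 - p01 - p10 + p11) / 4
    return a, h, v, d

def ihaar2(a, h, v, d):
    """Exact inverse of haar2."""
    out = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    out[0::2, 0::2] = a + h + v + d
    out[0::2, 1::2] = a - h + v - d
    out[1::2, 0::2] = a + h - v - d
    out[1::2, 1::2] = a - h - v + d
    return out

def fuse(images):
    """I = W^-1(phi(W(I1), ..., W(In))) with phi = per-coefficient max-magnitude."""
    stacked = np.stack([np.stack(haar2(im)) for im in images])  # (n, 4, H/2, W/2)
    pick = np.abs(stacked).argmax(axis=0)                       # phi: choose source image
    fused = np.take_along_axis(stacked, pick[None], 0)[0]
    return ihaar2(*fused)                                       # W^-1
```

Fusing an image with itself (or with an all-zero image, under the max-magnitude rule) reproduces the original, since the transform pair is exactly invertible.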
Methods for Fused Image Analysis and Assessment A. LOZAa,1,2, T. D. DIXONb, E. FERNÁNDEZ CANGAa, S. G. NIKOLOVa, D. R. BULLa, C. N. CANAGARAJAHa, J. M. NOYESb and T. TROSCIANKOb a Department of Electrical and Electronic Engineering, University of Bristol, UK b Department of Experimental Psychology, University of Bristol, UK
Abstract. Image fusion – the combining of images of different modalities – is finding increasingly widespread application in fields such as medical imaging, remote sensing and surveillance. Consequently, the ability to assess fused images accurately has become of great importance. In this correspondence, the differences between conventional image quality measures and composite image assessment are outlined, and image fusion assessment methods are covered with examples of both objective quality measures and psychophysical testing techniques.
Keywords. Image fusion, image quality assessment, psychophysical testing
Introduction

Digital image processing, compression and transmission affect image quality and may reduce an image's readability and value. To measure the extent of the degradation, image quality measures are necessary. Apart from quality monitoring, two applications of image quality assessment are important from an image processing point of view: benchmarking of algorithms, giving a point of reference against which the considered methods can be compared, and, within the image processing system itself, optimising the performance of the algorithm and its parameters. When the image is not intended as an input to a processing procedure (for example, a classifier), the ultimate recipient or interpreter of the image is a human observer. Consequently, subjective quality judgment seems the most appropriate. Subjective rating is usually performed by a group of interpreters, either trained experts or non-experts, and usually takes the form of absolute or comparative evaluation. Subjective methods of image assessment, however, may in many cases not be reproducible; they are expensive and time-consuming, and they can be influenced by the observers' professional and physical qualifications and by the technological conditions of the experiment [1]. Thus, the need for computationally effective quantitative image and video assessment measures that would assess the distortion of a processed image without 1
Corresponding Author: Artur Łoza, Merchant Venturers Building, University Gate 2.34, Woodland Road, Bristol, BS8 1UB, UK; e-mail:
[email protected].
2 This work has been funded by the UK MOD Data and Information Fusion Defence Technology Centre.
A. Loza et al. / Methods for Fused Image Analysis and Assessment
253
having to rely on human participants arises. An objective quality metric for image or video assessment is essentially a computational method of comparing an image with a reference image or, less often, with a statistical measure that does not require a reference. The earliest and still ubiquitously used quantitative measures are simple functions of the analysed images, so-called mathematical metrics, such as the Mean-Square Error (MSE), the Mean Absolute Error (MAE) and the Peak Signal-to-Noise Ratio (PSNR), as well as distortion contrast and the local MSE [2]. Some of the primary benefits of the mathematical metrics are that they are simple to calculate, have clear physical and logical meaning, and facilitate efficient mathematical optimisation. Their main disadvantage is that they often do not correlate well with actual subjective performance on quality estimation tasks [3] and might give very similar estimates across a range of picture distortion types without actually accounting for the nature of the distortion [4]. Metrics based on the Human Visual System (HVS) generate most of the research interest in the current literature, as they purport to explain and predict human image rating behaviour more accurately than pixel-based mathematical models. HVS functions may be modelled by Contrast Sensitivity Function filtering (which defines how sensitive a viewer is to a given spatial frequency and may also account for masking effects) and by single- or multi-channel decompositions, with transforms ranging from simple discrete cosine and wavelet decompositions to complex coders based on models of the low-level processing of the HVS [4, 5]. The errors of the transformed images with regard to the reference image are then calculated, normalised and pooled, usually using a method referred to as Minkowski error pooling [2, 5].
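The mathematical metrics mentioned above are simple functions of the pixel differences; a minimal sketch (a peak value of 255 is assumed for 8-bit images):

```python
import numpy as np

def mse(ref, img):
    """Mean-Square Error between a reference and a test image."""
    return np.mean((ref.astype(float) - img.astype(float)) ** 2)

def mae(ref, img):
    """Mean Absolute Error."""
    return np.mean(np.abs(ref.astype(float) - img.astype(float)))

def psnr(ref, img, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical images."""
    m = mse(ref, img)
    return float('inf') if m == 0 else 10 * np.log10(peak**2 / m)
```

These are fast and easy to optimise against, but, as noted above, they often fail to track subjective quality.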
The HVS framework described above can be seen as based on error-sensitive methods that try to identify a level of error between the reference image and the distorted image. Instead of error sensitivity, it has been suggested that quality assessment should be based upon a measure of structural similarity [6]. This alternative philosophy rests on the premise that the HVS is highly tuned to extracting structural information from its field of view. It is from this foundation that a new quality metric, called the Structural SIMilarity (SSIM) Index, has been developed [4]. The mathematical description of the SSIM Index and its application to fused image assessment are presented in Section 2.4. Multi-sensor image or video fusion can be defined as the process in which several images, or some of their features, coming from different sensors, are combined to form a single fused image or video containing the required complementary information. The successful fusion of images acquired from different sensors, modalities or instruments is of great importance in many applications, such as remote sensing, computer vision, robotics, surveillance, medical imaging and microscopic imaging. The recent rise in research interest in fused data has brought with it significant problems in the objective assessment of such data. Much research effort is being placed upon creating new and more efficient fusion schemes, without also devising new and more appropriate methods for objective evaluation of the output image quality. The issue of fused image quality assessment is complicated not only by the range of different fusion options available: it is the fusion itself that often requires a different approach to the problem, mostly due to the lack of a reference image and the strong dependency on datasets, fusion techniques and applications [7]. Therefore, it is essential to consider the modalities, algorithms and tasks undertaken when attempting to assess the quality of a
fused image. This correspondence concentrates on still image metrics, as research into fused video metrics is virtually nonexistent [8]. This paper briefly reviews the specific issues associated with fused image quality assessment in Section 1, and focuses on selected assessment methods in Section 2, where the main definitions and examples of fused image objective quality measures and subjective testing methods are discussed.
1. Specifics of the Fused Image Assessment

As pointed out in early reviews of multisensor image fusion techniques [7], general statements about the quality of a fused image or technique are very difficult to make, due to the lack of a reference fused image and the task/application dependency of most of the methods. The complexity of the image fusion assessment criteria, and the aspects distinguishing image fusion from other image processing procedures that should be considered when assessing its quality, are discussed in the following paragraphs.

1.1. Lack of a Reference Fused "Ideal" Image

A fused image inevitably contains complementary information from several, often incompatible, sources. Even though image fusion users may have a clear idea of what kind of image they want to obtain, a real-world equivalent of the composite image may not exist. For example, when fusing images from two or more cameras operating in different bandwidths, the equivalent of the ideal image is not available, because no single camera technology operating over such a bandwidth range exists. In some applications, however, such as digital photography, it is possible either to synthesise the reference image from available data [9] or to use distorted versions of an available reference image in a simulated fusion process [10].

1.2. Task, Application and Modality Dependency

Image fusion can be used to achieve various ends, such as the enhancement of spatial, spectral or temporal resolution in remote sensing; in fused imagery coming from a surveillance system, the detection of intrusions or abnormal behaviour is sought and the aesthetics of the image are less important. The results of image fusion are either presented to a human observer for easier and enhanced interpretation, or subjected to further computer analysis or processing, e.g., segmentation, classification, target detection or tracking, with the aim of improved accuracy and more robust performance.
In all the aforementioned cases, the true quality of the fused image depends upon how well it performs in the specific task. Most conventional subjective image quality testing methods have been designed with assessing unimodal image quality in mind. The wide range of imaging modalities to be fused presents a significant problem relating to the compatibility of the images attained from each sensor, and to how fully a metric can account for this issue. For example, images obtained by means of computer tomography and magnetic resonance scans might complement one another in terms of content, and be spatially and temporally similar. Consequently, the quality can be found using a measure based on the amount of statistical dependence of feature and visual information of
input and fused images [11]. However, when combining images whose maximum spatial and spectral resolutions differ largely, a number of issues arise due to the incompatibility of the images, caused by possible differences in time or by incommensurate spatial and spectral bandwidths [12]. Such issues must be considered in the creation of metrics appropriate to the fusion method at hand. This situation is complicated further by the range of different fusion options and methods available, for which, in some cases, only specific assessment methods may be appropriate. As will be shown in Section 2.5, more specialised, task-related methods can be introduced for subjective quality or performance rating of image fusion.

1.3. Desirable Properties of Fused Images

When fusion serves a certain purpose, it is often possible to specify requirements that should be fulfilled by the fused image. For example, when images are fused to create a synthesis with enhanced spatial resolution, it is necessary for them to have the following properties [12]: 1) they should be as identical as possible to the originals; 2) they should have a spatial resolution close to that of the original high-spatial-resolution images; 3) the multispectral set of synthesised images should coincide with the multispectral set of images that would be observed by the sensor at the highest resolution. It is therefore suggested that such fused images should be evaluated in a content-dependent manner, both spatially and spectrally [13, 14].
2. Selected Fused Image Metrics

2.1. Mathematical Metrics

The mathematical metrics for fused image assessment are applicable to situations where the content of a fused image is not to be assessed directly by a human. Under such circumstances, as with uni-modal metrics, a simple mathematical measure might be suitable to evaluate the differences between the fused and original/ideal images. This again raises the problem of sensor capabilities that cannot supply an ideal image. A number of mathematically based visual assessment procedures for fused images, such as colour-matching of features in fused images via histogram stretching and inversion of image channels, have been specified in [7]. The root MSE was considered a useful evaluation technique in [10], where it produced appropriate quality values for a digital camera image fusion assessment against an ideal image. A metric based upon the standard deviation of the difference between an ideal and a fused image, intended to assess spectral quality, has been proposed in [15]. In [16], a normalised squared error metric was used within subparts of the fusion process to assess image quality; the measure, however, was found not robust enough to correspond to visual information. Among the fused image mathematical metrics, much attention has been given to the Mutual Information (MI) measure. MI is a natural measure of the dependence between random variables and was first used for image fusion assessment in [17]. It is defined via the Kullback-Leibler "distance" between two images or, in the case of image fusion, as the average of the distances between the input images (A and B) and the fused image F:
MI = (MAF + MBF) / 2,

where

MAF = Σa,f p(a, f) · log2[ p(a, f) / (p(a) · p(f)) ].
MAF can be interpreted as the distance between the joint distribution of the greyscale values of the images A and F, p(a, f), and the joint distribution of statistically independent images, p(a)p(f). The measure MBF is defined analogously to MAF.

2.2. Combined Spatial and Spectral Metrics

In remote sensing applications, where spatial and spectral image information is usually fused, reference images are usually not available. In [18] the correlation coefficients between the high-resolution satellite images were used as an indicator of spatial quality. However, this method is unable to make a direct comparison between the fused image and the high-resolution panchromatic image [13]. An alternative method is based on attempting to reconstruct the missing information that is not present in the high-spectral and fused images [19]. Another recent fused image metric that measures both spatial and spectral information is the Blur Parameter Estimation (BPE) of [15]. The BPE is based on the supposition that image resolution and spatial quality are positively correlated. The resolution of an image depends on the sensor equipment used to attain it, and can be limited by pixel size as much as by optical constraints. The spatial quality of a fused image can be characterised by the line point spread function. This measure was shown to work more accurately than the metric of [18] on a limited set of fused satellite images.

2.3. Edge Based Metric

The only aspect of the HVS that has been fully examined for image fusion purposes is edge extraction [20]. This is a comparatively simple model, similar to the single-channel models described in the Introduction, which assumes that one essential feature of the HVS can be used to evaluate the quality of an image. A metric which measures the amount of edge information "transferred" from the source images to the fused image has recently been proposed in [20].
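Returning to the MI measure of Section 2.1, it can be estimated from the joint greyscale histogram of two images; a minimal sketch (the bin count and the base-2 logarithm are our implementation choices):

```python
import numpy as np

def mutual_information(img1, img2, bins=32):
    """Mutual information between two equally-sized greyscale images,
    estimated from their joint histogram."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    p = joint / joint.sum()                 # joint distribution p(a, f)
    px = p.sum(axis=1, keepdims=True)       # marginal p(a)
    py = p.sum(axis=0, keepdims=True)       # marginal p(f)
    nz = p > 0                              # avoid log(0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def fusion_mi(a, b, f, bins=32):
    """Average of the two input-to-fused distances, as in the text."""
    return 0.5 * (mutual_information(a, f, bins) + mutual_information(b, f, bins))
```

For identical images the measure reduces to the image entropy, its maximum possible value.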
The metric of [20] uses a Sobel edge operator to calculate the edge strength g(n, m) and orientation α(n, m) of each pixel in the input and output images. The relative strength and orientation "changes", GAF(n, m) and AAF(n, m), of an input image A with respect to the fused image F are defined as:
GAF(n, m) = gF(n, m) / gA(n, m) if gA(n, m) > gF(n, m), and gA(n, m) / gF(n, m) otherwise;

AAF(n, m) = 1 − |αA(n, m) − αF(n, m)| / (π/2).

These measures are then used to estimate the edge strength and orientation preservation values, Qg and Qα:

QgAF(n, m) = Γg / (1 + exp(κg (GAF(n, m) − σg))),    QαAF(n, m) = Γα / (1 + exp(κα (AAF(n, m) − σα))),
where the constants κ and σ determine the exact shape of the sigmoid nonlinearities used to form the edge strength and orientation preservation values. The overall edge information preservation values are then defined as:

QAF(n, m) = QgAF(n, m) · QαAF(n, m).

The measure QBF is defined analogously to QAF. A normalised weighted performance metric of a given process p that fuses A and B into F is given as:

QAB/F = Σn Σm [ QAF(n, m) wA(n, m) + QBF(n, m) wB(n, m) ] / Σn Σm [ wA(n, m) + wB(n, m) ].

It can be observed that the edge preservation values QAF(n, m) and QBF(n, m) are weighted by the coefficients wA(n, m) and wB(n, m), which reflect the perceptual importance of the corresponding edge elements within the input images. Note that in this method the visual information is associated with the edge information, while region information is ignored.

2.4. Metric Based on Structural Similarity

The Image Fusion Quality Index (IFQI) [21] is based on the SSIM image quality index (see the Introduction) recently introduced in [6], which, computed between an input image A and the fused image F over a local window, is defined as:

SAF = ( 2·μA·μF / (μA² + μF²) ) · ( 2·σA·σF / (σA² + σF²) ) · ( σAF / (σA·σF) ),
where μ and σ stand for the mean and standard deviation, respectively, and σAF is the covariance of the two images. The first and second components of SAF measure how close the luminance and the contrast of the two images are, respectively; the third component is the correlation coefficient between the two images, measuring their spatial similarity. In order to apply this metric to image fusion evaluation, the authors of [21] introduce a salience λ(w) to reflect the relative importance of image A compared to image B within the window w, where s(·|w) is a local saliency measure (e.g., the variance within w):

λ(w) = s(A|w) / ( s(A|w) + s(B|w) ),

and the overall index is the salience-weighted similarity averaged over all windows W:

Q(A, B, F) = (1/|W|) Σw∈W [ λ(w)·SAF(w) + (1 − λ(w))·SBF(w) ].
Finally, to take into account one aspect of the HVS, namely the perceptual relevance of edge information, the same measure is computed on the "edge images" A′, B′ and F′, and the final value is calculated as the product of the two measures:

QE(A, B, F) = Q(A, B, F) · Q(A′, B′, F′).
As with the previous metrics, this metric does not require a ground-truth or reference image.
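The three components of the similarity measure can be sketched per window as follows (the small stabilising constants of the published SSIM index are omitted here for clarity, so constant windows are not handled):

```python
import numpy as np

def similarity(a, f):
    """Luminance x contrast x correlation between two image windows."""
    a = a.astype(float)
    f = f.astype(float)
    mu_a, mu_f = a.mean(), f.mean()
    sd_a, sd_f = a.std(), f.std()
    cov = ((a - mu_a) * (f - mu_f)).mean()          # covariance sigma_AF
    luminance = 2 * mu_a * mu_f / (mu_a**2 + mu_f**2)
    contrast = 2 * sd_a * sd_f / (sd_a**2 + sd_f**2)
    correlation = cov / (sd_a * sd_f)
    return luminance * contrast * correlation
```

For identical (non-constant) windows all three components equal 1, so the measure attains its maximum of 1.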
2.5. Psycho-visual Fusion Evaluation

Psycho-visual image fusion evaluation has appropriately been dominated by task-related perceptual evaluation of the images. The early advances in this field were initiated in [22], where colour image fusion schemes were applied to visible and thermal images of militarily relevant scenarios. The fusion methods used were shown to improve the accuracy of observers performing detection and localisation tasks. Other factors of human observer performance, such as global scene recognition, target recognition and detection versus single modalities, and the different colour mappings used in fusion, were tested in [23]. The psychophysical testing presented in [22, 23] has been extended to compare the JPEG2000 and JPEG compression schemes, and combined with metric assessment (MI, QAB/F and IFQI) across a wider range of image fusion methods, in [24, 25]. The fusion methods used were averaging, the contrast pyramid [26] and the dual-tree complex wavelet transform [27]. In the experiments of [24, 25], participants were asked to perform visual target detection tasks and to assess image fusion quality comparatively. The results obtained show that there is a correlation between two of the metrics (QAB/F and IFQI) and the psychophysical evaluation. They also indicate that the selection of the correct fusion method has more impact on task performance than the presence of compression.
3. Summary

This paper has reviewed a wide range of computational, objective and subjective image fusion testing methods. It is emphasised that image fusion quality assessment differs from conventional image quality testing due to the lack of reference fused images, the wide range of fusion methods and modalities used, and the task/application dependency of most of the methods. The most commonly used image fusion computational metrics try to estimate the amount of information transferred from the input images to the fused image, whereas incorporating specific tasks into psycho-visual testing allows task-dependent objective fusion assessment.
References

[1] Eskicioglu, A.M., Quality measurement for monochrome compressed images in the past 25 years, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), June 5-9, 2000, Istanbul, Turkey, p. 1907-1910.
[2] Eckert, M.P. and A.P. Bradley, Perceptual quality metrics applied to still image compression. Signal Processing, 1998. 70(3): p. 177-200.
[3] Miyahara, M., K. Kotani, and V.R. Algazi, Objective picture quality scale (PQS) for image coding. IEEE Transactions on Communications, 1998. 46(9): p. 1215-1226.
[4] Wang, Z., et al., Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, April 2004. 13(4): p. 600-612.
[5] Pappas, T. and R. Safranek, Perceptual Criteria for Image Quality Evaluation, in Handbook of Image and Video Processing, ed. A. Bovik. 2000, Academic Press: San Diego. p. 669-684.
[6] Wang, Z. and A.C. Bovik, A universal image quality index. IEEE Signal Processing Letters, 2002. 9(3): p. 81-84.
[7] Pohl, C. and J.L. van Genderen, Multisensor image fusion in remote sensing: concepts, methods and applications. International Journal of Remote Sensing, 1998. 19(5): p. 823-854.
[8] CiteSeer.IST Scientific Literature Digital Library, online search for "fused video metric", http://citeseer.ist.psu.edu/cis?q=video+fusion+metric.
[9] Hill, P., N. Canagarajah, and D. Bull, Image Fusion using Complex Wavelets, in The 13th British Machine Vision Conference, 2002.
[10] Zhang, Z. and R. Blum, A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proceedings of the IEEE, August 1999. 87: p. 1315-1326.
[11] Qu, G., D. Zhang, and P. Yan, Information measure for performance of image fusion. IEE Electronics Letters, 2002. 38(7): p. 313-315.
[12] Wald, L., T. Ranchin, and M. Mangolini, Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogrammetric Engineering and Remote Sensing, 1997. 63(6): p. 691-699.
[13] Li, J., Spatial Quality Evaluation of Fusion of Different Resolution Images. International Archives of Photogrammetry and Remote Sensing, 2000. XXXIII: p. 331-338.
[14] Buntilov, V. and T. Bretschneider, Objective content-dependent quality measure for image fusion of optical data, in IEEE International Geoscience and Remote Sensing Symposium, 2004.
[15] Li, H., B.S. Manjunath, and S.K. Mitra, Multisensor Image Fusion Using the Wavelet Transform. Graphical Models and Image Processing, May 1995. 57(3): p. 235-245.
[16] Robinson, G.D., H.N. Gross, and J.R. Schott, Evaluation of Two Applications of Spectral Mixing Models to Image Fusion. Remote Sensing of Environment, 2000. 71(3): p. 272-281.
[17] Qu, G., D. Zhang, and P. Yan, Medical image fusion by wavelet transform modulus maxima. Optics Express, August 2001. 9(4): p. 184-190.
[18] Zhou, J., D.L. Civco, and J.A. Silander, A wavelet transform method to merge Landsat TM and SPOT panchromatic data. International Journal of Remote Sensing, 1998. 19(4): p. 743-757.
[19] Ranchin, T., et al., Image fusion – the ARSIS concept and some successful implementation schemes. ISPRS Journal of Photogrammetry and Remote Sensing, 2003. 58(1-2): p. 4-18.
[20] Petrovic, V.S. and C.S. Xydeas, Sensor noise effects on signal-level image fusion performance. Information Fusion, 2003. 4: p. 167-183.
[21] Piella, G. and H. Heijmans, A New Quality Metric for Image Fusion, in Proceedings of the International Conference on Image Processing, 2003, Barcelona, Spain.
[22] Toet, A., et al., Fusion of visible and thermal imagery improves situational awareness. Displays, 1997. 18(2): p. 85-95.
[23] Toet, A. and E.M. Franken, Perceptual evaluation of different image fusion schemes. Displays, 2003. 24(1): p. 25-37.
[24] Dixon, T., et al., Psychophysical and Metric Assessment of Fused Images, in 2nd Symposium on Applied Perception in Graphics and Visualization, 2005, Spain.
[25] Fernandez-Canga, E., et al., Characterisation of Image Fusion Quality Metrics for Surveillance Applications over Bandlimited Channels, in The 8th International Conference on Information Fusion, 2005, Philadelphia, PA, USA.
[26] Toet, A., L. van Ruyven, and J. Valeton, Merging thermal and visual images by a contrast pyramid. Optical Engineering, 1989. 28(7): p. 789-792.
[27] Kingsbury, N., Complex Wavelets for Shift Invariant Analysis and Filtering of Signals. Applied and Computational Harmonic Analysis, 2001. 10(3): p. 234-253.
Object Tracking by Particle Filtering Techniques in Video Sequences 1

Lyudmila MIHAYLOVA 2, Paul BRASNETT, Nishan CANAGARAJAH and David BULL
Department of Electrical and Electronic Engineering, University of Bristol, UK
Abstract. Object tracking in video sequences is a challenging task and has various applications such as port security. We review particle filtering techniques for tracking single and multiple moving objects in video sequences by using different features such as colour, shape, motion, edge, and sound. Pros and cons of these algorithms are discussed along with difficulties that have to be overcome. Results of a particular particle filter with colour and texture cues are reported. Conclusions and open research issues are formulated. Keywords: particle filtering, sensor data fusion, tracking in video sequences
Introduction

Object tracking is required in many vision applications, such as human-computer interfaces, video communication/compression, road traffic control, and security and surveillance systems. Often the goal is to obtain a record of the trajectory of single or multiple moving targets over time and space by processing information from distributed sensors. Object tracking in video sequences requires on-line processing of a large amount of data and is computationally expensive. Additionally, most of the problems encountered in visual tracking are non-linear, non-Gaussian, multi-modal, or any combination of these. Different techniques are available in the literature for solving tracking tasks in vision; they can generally be divided into two groups: i) classical applications, where targets do not interact much with each other and behave independently, such as aircraft that do not cross paths, and ii) applications in which targets do not behave independently (ants, bees, robots, people) and their identity is not always well distinguishable. Tracking multiple identical targets has its own challenges when the targets pass close to each other or merge. In this paper, we concentrate on the particle filtering technique, which has recently proven to be a powerful and reliable tool for tracking non-linear systems. The promise of particle filtering is that it allows the fusion of different sensor data, the incorporation of constraints, and the accounting for different non-Gaussian uncertainties. Furthermore, it can cope with missing data, circumvent possible occlusions, and solve data association problems when multiple targets are being tracked with multiple sensors [1]. The 1
We acknowledge the financial support of the UK MOD Data and Information Fusion DT Center.
2 Corresponding author, E-mail: [email protected]
L. Mihaylova et al. / Object Tracking by Particle Filtering Techniques in Video Sequences
observations may come synchronously or asynchronously in time, from one sensor or many, static or moving. How to position the cameras is itself an optimisation and decision-making problem. The rest of the paper is organised as follows. Section 1 formulates the problem of object tracking in video sequences within a Bayesian framework. The most commonly used motion models and cues are presented in Section 2. Section 3 presents a particle filter based on fusing multiple independent information cues: colour and texture. The algorithm relies on factorising the likelihood as a product of the likelihoods of the different cues. We show the advantage of fusing multiple cues compared to colour-only and texture-only tracking. Section 4 generalises the results and outlines future work.

1. Monte Carlo Framework for Object Tracking in Video Sequences
The generic objective is to track the state of a specified object or region of interest in a sequence of images captured by a camera. Different techniques are available in the literature for solving tracking problems in vision. We focus mainly on Monte Carlo techniques (particle filters) because of their power and versatility [2-5]. Monte Carlo techniques are based on computing the state posterior density function from samples, and are known under different names: Particle Filters (PFs) [3], bootstrap methods [2], or the condensation algorithm [6, 7], which was the first variant applied to video processing. The acronym CONDENSATION stems from CONDitional DENSity propagATION. The aim of sequential Monte Carlo estimation is to evaluate the posterior Probability Density Function (PDF) p(x_k | Z^k) of the state vector x_k \in R^{n_x}, with dimension n_x, given the set Z^k = {z_1, ..., z_k} of sensor measurements up to time k. The Monte Carlo approach relies on a sample-based representation of the state PDF.
Multiple particles (samples) of the state are generated, each associated with a weight that characterises the quality of that particle. An estimate of the variable of interest is obtained by the weighted sum of the particles. The two major steps are prediction and update. During the prediction stage, each particle is modified according to the state model of the region of interest in the video frame, including the addition of random noise to simulate the noise in the state. In the update stage, each particle's weight is re-evaluated using the new data. A resampling procedure eliminates particles with small weights and replicates the particles with larger weights. Many of the proposed particle filters for tracking in video sequences rely on a single feature, e.g., colour. However, single-feature tracking does not always provide reliable performance when there is clutter in the background. Multiple-feature tracking [1, 19, 20] provides a better description of the object and improves robustness. In [1], a particle filter is developed that fuses three types of raw data: colour, motion, and sound. However, developing a visual tracking algorithm that is robust to a wide variety of conditions is still an open problem; part of this problem is the choice of what to track. Colour trackers are distracted by other objects that have the same or a similar colour as the target.
2. Typical Motion and Observation Models

2.1. Motion Models
The techniques used to accomplish a given tracking task depend on the purpose, and in particular on whether the interest is in: i) objects possessing certain characteristics, e.g., cars, people, faces; ii) objects possessing certain characteristics with a specific attribute, e.g., moving cars, walking people, talking faces, the face of a given person; or iii) objects of a priori unknown nature but of specific interest, such as moving objects. In each case, part of the input video frame is searched against a reference model describing the appearance of the object. The reference can be based on image patches, which describe the appearance of the tracked region at the pixel level, on contours, and/or on global descriptors such as colour models. To characterise a target, a feature space is first chosen. The reference object (target) model is represented by its PDF in the feature space; for example, the reference model can be the colour PDF of the target [1]. In the subsequent frame, a target candidate is defined at some location and is characterised by its PDF. Both PDFs are estimated from the data and compared by a similarity function. The local maxima of the similarity function indicate the presence of objects in the second image frame with representations similar to the reference model defined in the first frame. Examples of similarity functions are the Bhattacharyya distance and the Kullback-Leibler distance. For tracking a specified object or region of interest in image sequences, different object models have been proposed in the literature. Many of them make only weak assumptions about the precise object configuration and are not particularly restrictive about the types of objects. A reasonable approximation to the region of interest can be an ellipse [8] or a rectangular box as in [1].
The object (motion) models used in the literature vary from general random walk models [10, 1] to constant acceleration models [9] and other specific models. To design algorithms applicable to fairly large groups of objects, including people, faces, vehicles, etc., a weak model for the state evolution is adopted in [1]: mutually independent Gaussian random walk models. These models are augmented with small random uniform components to capture (rare) events such as jumps in the image sequence; they also help in recovering tracks after periods of complete occlusion. Mixed-state motion models, as in [20], can be used to overcome partial and full occlusions.

2.2. Observation Models
The observation models for object tracking in video sequences are usually highly non-linear and can be either parametric (e.g., mixtures of Gaussians) or nonparametric (e.g., histograms). Some of the most often used observation models are based on colour, shape, and/or motion cues. The localisation cues impact PF-based trackers in different ways. Usually, a likelihood model is constructed for each cue [10, 1]. These cues are assumed mutually independent; any correlation that may exist between them, e.g., between the colour, motion, and sound of an object, is likely to be weak. Adaptation of the cues is essential in distinguishing different
objects, making tracking robust to appearance variations due to changing illumination and pose.

2.2.1. Shape Information
When a specific class of objects is considered, a complete model of its shape can be learned offline, and contour cues can be applied to capture the visual appearance of the tracked entities. Colour/spline-based PFs are developed in [6, 7]; in [7], colour information is used in particle filtering for initialisation and importance sampling. These models can be contaminated by edge clutter, and they do not adapt to scenarios without a predefined class of objects to be tracked, or where the class of objects does not exhibit very distinctive silhouettes. When shape modelling is not appropriate, colour cues are a powerful alternative.

2.2.2. Colour Modelling
Colour represents an efficient cue for object tracking and recognition that is easy to implement and requires only modest hardware. Most colour cameras provide an RGB (red, green, blue) signal. The HSI (hue, saturation, intensity) representation [11] can also be used [12]. Hue refers to the perceived 'colour' (technically, the dominant wavelength), e.g., 'purple' or 'orange'. Saturation measures its dilution by white light, giving rise to 'light purple', 'dark purple', etc.; i.e., it corresponds to the 'vividness' or 'purity' of the colour. HSI decouples the intensity information from the colour, while hue and saturation correspond to human perception. Colour-based trackers have proven robust and versatile at a modest computational cost [13, 1, 14]. Colour localisation cues are obtained by associating a reference colour model with the object or region of interest. This reference model can be obtained by hand-labelling or from an automatic detection module. To assess whether a given candidate region contains the object of interest, a colour model of the same form as the reference model is computed within the region and compared to the reference model.
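As an illustrative sketch (the helper names `bhattacharyya_distance` and `colour_histogram` are ours, not from [1]), a candidate region's per-channel normalised histograms can be built and compared to the reference via the Bhattacharyya coefficient:

```python
import numpy as np

def bhattacharyya_distance(h1, h2):
    """Distance between two normalised histograms (in the style of Eq. (1)):
    D = sqrt(1 - sum_i sqrt(h1_i * h2_i)); 0 for identical, 1 for disjoint."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    bc = np.sum(np.sqrt(h1 * h2))               # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))   # guard against rounding error

def colour_histogram(patch, bins=8):
    """Independent normalised histograms for the three RGB channels
    of a candidate region (an H x W x 3 array with values in [0, 255])."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(patch[..., c], bins=bins, range=(0, 256))
        hists.append(h / max(h.sum(), 1))
    return hists
```

A small distance indicates a candidate region whose colour distribution resembles the reference model.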
The smaller the discrepancy between the candidate and the reference models, the higher the likelihood that the object is located inside the candidate region. Histogram-based colour models are used in [1, 8, 13]. The likelihood is computed from the histogram distance between the empirical colour distribution in the hypothesised region and the reference colour model. For colour modelling in [1], independent normalised histograms are used in the three channels of the RGB colour space. The colour likelihood model is then defined to favour candidate colour histograms close to the reference histogram. An appropriate distance metric for making decisions about the closeness of two histograms h_1, h_2 is based on the Bhattacharyya similarity coefficient [15, 16]

D(h_1, h_2) = \left( 1 - \sum_{i=1}^{B} \sqrt{h_{i,1} h_{i,2}} \right)^{1/2},   (1)

where B is the number of bins. This metric takes values in the interval [0, 1]. Based on this distance, the colour likelihood model can be defined by [1]
p(z | x) \propto \exp\left( - \sum_{c \in \{R,G,B\}} D^2(h_x^c, h_{ref}^c) / 2\sigma_C^2 \right),   (2)

where h_x^c and h_{ref}^c are the histograms of the target candidate and the reference object in channel c, respectively,
and \sigma_C is the standard deviation of the colour cue. Two PFs are developed in [1]: one based on colour and sound and one based on colour and motion. In the PF with colour and sound, the search is performed first in a one-dimensional space (the x direction), followed by another in the two-dimensional space (x, y). This increases the efficiency of the PF, allowing the same accuracy to be achieved with a smaller number of particles. The same strategy is applied when colour and motion are fused. The colour cues are persistent and robust to changes in pose and illumination, but are more prone to ambiguity, especially if the scene contains other objects characterised by a colour distribution similar to that of the object of interest. The motion and sound cues are very discriminative and allow the object to be located with low ambiguity.

2.2.3. Motion Cues
Instantaneous motion activity captures other important aspects of the sequence content and has been studied from various perspectives [17]. In the case of a static camera, the absolute value of the luminance difference computed on successive pairs of frames is used to build a likelihood model [1] similar to the one developed for the colour measurements. Motion cues are usually based on histogramming consecutive frame differences.

2.2.4. Texture Cues
Despite there being no unique definition of texture, it is generally agreed that texture describes the spatial arrangement of pixel grey levels in an image, which may be stochastic, periodic, or both [18]. Texture is often considered to be made up of basic elements (textural primitives) repeated in a regular or random fashion across the image. Some of the most successful methodologies proposed to describe and analyse textures are spatial-frequency techniques and stochastic random field approaches. Texture cues can be implemented, e.g., by using wavelet transforms [20].

2.2.5. Edge Cues
Edges are pixels where the intensity changes abruptly.
An edge in an image is usually taken to mean the boundary between two regions with relatively distinct grey levels. The 'ideal' situation is when the two regions have distinct constant grey levels and the edge is characterised by an abrupt change. In most practical situations, however, edges are characterised by a smooth transition in grey level, with the two regions having slowly varying but distinct average grey levels. Edges may be: i) viewpoint dependent: they may change as the viewpoint changes and typically reflect the geometry of the scene, such as objects occluding one another; or ii) viewpoint independent: they reflect properties of the viewed objects, e.g., markings and surface shape.
An image function depends on two co-ordinates in the image plane, and so operators describing edges are expressed using partial derivatives. A change of the image function can be described by a gradient that points in the direction of the largest growth of the image function. An edge [11] is a property attached to an individual pixel and is calculated from the behaviour of the image function in a neighbourhood of that pixel. An edge is a vector variable with two components: magnitude and direction. The edge magnitude is the magnitude of the gradient, and the edge direction θ is rotated with respect to the gradient direction ψ by -90°.

2.2.6. Multiple Cues
The greatest weakness of the colour cue is its ambiguity due to the presence of objects or regions with colour features similar to those of the object of interest. By fusing colour, motion, texture, and other cues, this ambiguity can be considerably reduced when the object of interest is moving, as shown in [1, 19, 20]. When the object is moving, strong localisation cues are provided by motion measurements, whereas colour measurements can undergo substantial fluctuations due to changes in the object pose and illumination. Conversely, when the object is stationary or nearly stationary, motion information disappears and colour information dominates, providing a reliable localisation cue.
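The edge magnitude and direction of Section 2.2.5 can be approximated with finite differences; a minimal numpy sketch (the function name is ours, and `np.gradient` uses central differences in the interior):

```python
import numpy as np

def edge_magnitude_direction(img):
    """Gradient-based edge cue: returns the edge magnitude |grad f|,
    the gradient direction psi (radians), and the edge direction,
    which is psi rotated by -90 degrees."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    psi = np.arctan2(gy, gx)                 # direction of steepest ascent
    edge_direction = psi - np.pi / 2         # edges run perpendicular to the gradient
    return magnitude, psi, edge_direction
```

For a vertical step edge, the gradient points horizontally (psi = 0) while the edge itself runs vertically.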
3. Particle Filtering Using Multiple Cues
A PF algorithm for object tracking in video sequences using multiple cues, colour and texture, was developed [19, 20] and is presented in Table 1. Results from a natural sequence are shown in Fig. 1 with colour and texture cues (initial, intermediate and last frame). Fig. 2 presents the root-mean-square error obtained with synthetic data. It is evident from Fig. 2 that the colour cue, when compared to the other cues, is the least accurate.
Figure 1. Results from a natural sequence: tracking of the small boat by colour and texture cues with a PF. (a) initial frame; (b) intermediate frame; (c) last frame.
Figure 2. Results with synthetic data [19], with 100 Monte Carlo runs: root-mean-square error in the x and y directions combined, for the combined colour-and-texture cues, colour cues only, and texture cues only.
Due to space limitations, other results could not be included here. Those results show the algorithm's performance under different scenarios. The PF is able to: 1) track a single moving object and 2) retrieve the object after tracking loss. This is achieved by a mixed-state motion model [19] composed of a constant velocity model and a re-initialisation model drawing uniform samples (needed to recover the object after it is lost).

Table 1. A particle filter with multiple cues

Initialisation
1. k = 0. For i = 1, ..., N, generate samples {x_0^{(i)}} from the initial distribution p(x_0).

Prediction step
2. For k = 1, 2, ... and i = 1, ..., N, sample x_{k+1}^{(i)} ~ p(x_{k+1} | x_k^{(i)}) according to the object model.

Measurement update: evaluate the importance weights
3. On receipt of a new measurement, compute the weights \tilde{W}_{k+1}^{(i)} \propto W_k^{(i)} L(z_{k+1} | x_{k+1}^{(i)}).
4. Normalise the weights: \hat{W}_{k+1}^{(i)} = \tilde{W}_{k+1}^{(i)} / \sum_{i=1}^{N} \tilde{W}_{k+1}^{(i)}. The likelihood L(z_{k+1} | x_{k+1}^{(i)}) is calculated as a product of the likelihoods of the separate independent cues.

Output
5. A collection of samples from which the approximate posterior distribution is computed:
\hat{p}(x_{k+1} | Z_{k+1}) = \sum_{i=1}^{N} \hat{W}_{k+1}^{(i)} \delta(x_{k+1} - x_{k+1}^{(i)}),
where Z_{k+1} is the set of measurements available up to time instant k+1.
6. The posterior mean is computed using the collection of samples (particles):
\hat{x}_{k+1} = E[x_{k+1} | Z_{k+1}] \approx \sum_{i=1}^{N} \hat{W}_{k+1}^{(i)} x_{k+1}^{(i)}.

Selection step (resampling)
7. Multiply/suppress samples x_{k+1}^{(i)} with high/low importance weights \hat{W}_{k+1}^{(i)}, in order to obtain N new random samples approximately distributed according to \hat{p}(x_{k+1} | Z_{k+1}).
8. Set k → k + 1 and return to step 2.
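The steps of Table 1 can be sketched on synthetic one-dimensional data. This is an illustrative toy, not the authors' implementation: Gaussian likelihoods stand in for the colour and texture cue likelihoods, and simple multinomial resampling is used.

```python
import numpy as np

rng = np.random.default_rng(0)

def cue_likelihood(z, particles, sigma=0.5):
    """One cue's Gaussian likelihood of a measurement z for each particle."""
    return np.exp(-0.5 * ((z - particles) / sigma) ** 2)

N = 500
true_x = 2.0                                 # synthetic stationary target
particles = rng.normal(0.0, 2.0, N)          # step 1: samples from the prior
weights = np.full(N, 1.0 / N)

for k in range(30):
    particles = particles + rng.normal(0.0, 0.1, N)   # step 2: random-walk prediction
    z1 = true_x + rng.normal(0.0, 0.5)       # synthetic "colour" cue measurement
    z2 = true_x + rng.normal(0.0, 0.5)       # synthetic "texture" cue measurement
    # steps 3-4: product of independent cue likelihoods, then normalisation
    weights = weights * cue_likelihood(z1, particles) * cue_likelihood(z2, particles)
    weights /= weights.sum()
    x_hat = np.sum(weights * particles)      # step 6: posterior mean estimate
    # step 7: multinomial resampling (multiply/suppress particles)
    idx = rng.choice(N, size=N, p=weights)
    particles = particles[idx]
    weights = np.full(N, 1.0 / N)            # step 8: next iteration
```

After a few frames the posterior mean settles near the true target position.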
4. Conclusions and Open Issues for Future Research
Particle filtering is a technique that is very suitable for object tracking in video sequences. We have presented results for a single object using video sequences from a single fixed or moving camera. The tracking algorithm is based on colour and texture cues. There are several challenges in solving tracking problems in image/video applications. The first is the non-linear character of the motion of the object of interest and of the observation model. The algorithms must often run at high update rates. In many applications, the prior information available about the environment is limited. From the point of view of implementation, this research domain is rich and challenging because of the need to overcome occlusions of the tracked entities over one or more frames and to deal with missing sensor data. How to handle clutter in the background is of considerable importance as well, especially with multiple targets. In the case of multiple sensors, the data have to be fused appropriately, and probabilistic data association techniques are then of primary importance. We aim to consider: i) detection of the object, i.e., the object has to be localised first in the image and continuously tracked afterwards (one of the biggest problems in motion-based tracking is losing the object due to rapid movements, then re-detecting the object of interest and following its movement afterwards); ii) tracking rigid and non-rigid bodies in three dimensions with multiple dynamically selected static or moving cameras.
References
[1] P. Pérez, J. Vermaak, A. Blake, Data Fusion for Tracking with Particles, Proc. IEEE, 92:3, 2004, 495-513.
[2] N. Gordon, D. Salmond, A. Smith, A Novel Approach to Non-linear/Non-Gaussian Bayesian State Estimation, IEE Proceedings-F (Radar and Signal Processing), 140:2, 1993, 107-113.
[3] A. Doucet, N. de Freitas, N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, New York: Springer-Verlag, 2001.
[4] M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A Tutorial on Particle Filters for Online Non-linear/Non-Gaussian Bayesian Tracking, IEEE Trans. Sign. Proc., 50:2, 2002, 174-188.
[5] J. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, 2001.
[6] M. Isard, A. Blake, Contour Tracking by Stochastic Propagation of Conditional Density, European Conf. on Comp. Vis., Cambridge, UK, 1996, 343-356.
[7] M. Isard, A. Blake, Condensation - Conditional Density Propagation for Visual Tracking, Intl. Journal of Computer Vision, 28:1, 1998, 5-28.
[8] C. Shen, A. van den Hengel, A. Dick, Probabilistic Multiple Cue Integration for Particle Filter Based Tracking, Proc. of the VIIth Digital Image Comp.: Techniques and Appl., 2003.
[9] Y. Bar-Shalom, X.R. Li, Estimation and Tracking: Principles, Techniques and Software, Artech House, 1993.
[10] H. Nait-Charif, S. McKenna, Tracking Poorly Modelled Motion Using Particle Filters with Iterated Likelihood Weighting, Proc. of Asian Conf. on Comp. Vis., 2003.
[11] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, 2nd Edition, Brooks/Cole Publ. Company, 1999.
[12] S. McKenna, S. Jabri, S. Gong, Tracking Colour Objects Using Adaptive Mixture Models, Image and Vision Computing, 17:3-4, 1999, 225-231.
[13] K. Nummiaro, E. Koller-Meier, L. Van Gool, An Adaptive Color-Based Particle Filter, Image and Vision Comp., 21, 2003, 99-110.
[14] D. Comaniciu, V. Ramesh, P. Meer, Real-Time Tracking of Non-Rigid Objects Using Mean Shift, Proc. of IEEE Conf. on Comp. Vision and Pattern Recogn., 2000, 142-149.
[15] F. Aherne, N. Thacker, P. Rockett, The Bhattacharyya Metric as an Absolute Similarity Measure for Frequency Coded Data, Kybernetika, 3:4, 1997, 1-7.
[16] T. Kailath, The Divergence and Bhattacharyya Distance Measures in Signal Selection, IEEE Trans. on Communication Technology, COM-15:1, 1967, 52-60.
[17] J. Konrad, in Handbook of Image and Video Processing, Academic Press, 2000, 207-225.
[18] R. Porter, Texture Classification and Segmentation, PhD thesis, Univ. of Bristol, 1997.
[19] P. Brasnett, L. Mihaylova, N. Canagarajah, D. Bull, Particle Filtering with Multiple Cues for Object Tracking in Video Sequences, Proc. of SPIE's Annual Symp. EI ST, 5685, 2005.
[20] P. Brasnett, L. Mihaylova, N. Canagarajah, D. Bull, Sequential Monte Carlo Tracking by Fusing Multiple Cues in Video Sequences, IEEE Trans. on Image Proc., submitted, 2005.
Advances and Challenges in Multisensor Data and Information Processing
E. Lefebvre (Ed.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Wavelets, Segmentation, Pixel- and Region-Based Image Fusion
J. J. LEWIS1,2, R. J. O'CALLAGHAN, S. G. NIKOLOV, D. R. BULL and C. N. CANAGARAJAH
The Centre for Communications Research, University of Bristol, UK
Abstract. Enabling technologies for pixel- and region-based image fusion, wavelets and segmentation, are introduced. Pixel-based fusion using the Discrete Wavelet Transform (DWT) and the Dual-Tree Complex Wavelet Transform (DT-CWT) is discussed and compared with a region-based fusion method using the DT-CWT. Rather than performing fusion pixel-by-pixel, segmentation is used to produce a set of regions representing features in the image, and fusion is performed on a region-by-region basis. The DT-CWT is found to outperform the DWT. Region-based fusion gives results comparable to pixel-based DT-CWT fusion and has a number of advantages over these methods, such as more intelligent fusion rules and the ability to manipulate regions with certain properties. Keywords. Image Fusion, Discrete Wavelets, Complex Wavelets
Introduction
Image fusion usually makes use of either redundant or complementary information in two or more images to produce a single image with improved accuracy or reduced uncertainty, or an image with more information than any of the input images [1]. Thus, data collected from different modalities, at different times or frame rates, or using sensors in different locations can be fused to produce a single image. Image fusion is defined in [2] as "the process by which several images or some of their features are combined together to form a single image." Traditionally, fusion is performed at one of four levels of abstraction: signal level, pixel level, feature level, and object level. The majority of image fusion methods implemented to date are pixel-level methods, where fusion is performed on a pixel-by-pixel basis depending on the information contained at that pixel or in an arbitrary window around that pixel. These methods range from the simple, such as averaging, to the more complicated, where the images are first transformed (e.g., using pyramids [3-5] or wavelets [2, 6-10]) and fusion is performed in these domains. More recent fusion work has included feature-level algorithms such as region-based fusion [11-14]. Wavelet transforms are a key technology successfully used in fusion, and segmentation is required for region-based fusion. Both are discussed in some detail, and results from fusion using pixel- and region-based algorithms are presented.
1 Corresponding Author: J. J. Lewis, The Centre for Communications Research, University of Bristol, Bristol, BS8 1UB, UK; Tel.: +44 117 331 5073; E-mail: [email protected].
2 This work is funded by the UK MOD Data and Information Fusion Defence Technology Center.
J.J. Lewis et al. / Wavelets, Segmentation, Pixel- and Region- Based Image Fusion
1. Wavelets
Wavelets perform multiresolution analysis by carrying out a series of sub-band filtering operations on a signal and decomposing it into sums of basis functions. They are similar to Fourier decompositions, but while Fourier basis functions are only localised in frequency (i.e., a change in the Fourier domain produces changes throughout the time domain), wavelets are local in both time and frequency [15]. At each scale, high- and low-pass filters split the signal into detail and approximation signals. Wavelets are derived from a basis function called the mother wavelet.

1.1. The Discrete Wavelet Transform (DWT)
The Discrete Wavelet Transform (DWT) is an orthogonal one-dimensional wavelet transform applied in two dimensions by filtering and downsampling along columns and then rows. An example of DWT coefficients is given in Figure 1. At each scale there are four sub-bands: High-High (HH), High-Low (HL), Low-High (LH), and Low-Low (LL). The LH sub-band is sensitive to vertical frequencies, the HL to horizontal frequencies, and the HH sub-band to diagonal (45°) frequencies. The DWT can be recursively applied to the LL sub-band to achieve multiresolution analysis. Provided that the filters used are orthogonal (e.g., the Daubechies mother wavelet), a related set of synthesis filters exists that perfectly reconstructs the original image [2].
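The filter-and-downsample scheme can be illustrated with the simplest orthogonal wavelet, the Haar wavelet. This sketch (function names are ours) performs one analysis/synthesis level and exhibits the perfect-reconstruction property mentioned above:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT on an even-sized image:
    returns the LL, LH, HL, HH sub-bands, each half the input size."""
    a = img.astype(float)
    # filter and downsample along columns (pairs of rows)
    lo = (a[0::2, :] + a[1::2, :]) / np.sqrt(2)
    hi = (a[0::2, :] - a[1::2, :]) / np.sqrt(2)
    # then along rows (pairs of columns)
    ll = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)   # approximation
    lh = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)   # detail sub-bands
    hl = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)
    hh = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Perfect reconstruction from the four Haar sub-bands."""
    lo = np.empty((ll.shape[0], 2 * ll.shape[1]))
    hi = np.empty_like(lo)
    lo[:, 0::2], lo[:, 1::2] = (ll + lh) / np.sqrt(2), (ll - lh) / np.sqrt(2)
    hi[:, 0::2], hi[:, 1::2] = (hl + hh) / np.sqrt(2), (hl - hh) / np.sqrt(2)
    out = np.empty((2 * lo.shape[0], lo.shape[1]))
    out[0::2, :], out[1::2, :] = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
    return out
```

Recursing `haar_dwt2` on the LL sub-band yields the multiresolution decomposition described in the text.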
The DWT has been found to have a number of advantages over other multiresolution schemes such as pyramids: (a) the wavelet transform is the same size as the original image, which is a more compact representation than pyramids [7]; (b) the wavelet transform provides directional information on the image, while pyramids do not contain any spatial orientation selectivity in the decomposition process [7]; (c) pyramid-based fused images often contain blocking artifacts, which do not occur in wavelet-fused images [7]; (d) images fused using wavelets have better Signal-to-Noise Ratios (SNR) than images fused using pyramids [4]; and (e) wavelet-based fused images are better perceived than pyramid-based fused images in human analysis [4, 7].

1.2. The Dual-Tree Complex Wavelet Transform
There are two main problems with the DWT [16]: it is not shift invariant, and it has limited directional selectivity. The lack of shift invariance is due to the sub-sampling at each scale: small shifts in the input signal can cause large changes in the energy across the sub-bands at different levels. Shift invariance within wavelet-transform image fusion is essential for the effective comparison of coefficient magnitudes by the fusion rule. The Shift Invariant DWT (SIDWT) [6] was developed to improve shift invariance by removing all sub-sampling, at the cost of a highly overcomplete representation. The Dual-Tree CWT (DT-CWT) [16] overcomes this problem by not decimating at the first level of filtering and producing two fully decimated trees from the odd and even samples produced at the first level. The DT-CWT also has improved directional selectivity over the DWT: as complex wavelets are able to distinguish between positive and negative orientations, six distinct sub-bands are produced, at ±15°, ±45°, and ±75°. Qualitative and quantitative experiments in [2, 8] show that the DT-CWT outperforms methods such as the DWT and SIDWT.
All DWT schemes suffer from ringing; the DT-CWT shows fewer ringing errors and better preserves subtle details, but at increased computational cost.
Figure 1. Wavelet decompositions: (a) original image; (b) DWT coefficients; (c) DT-CWT coefficients.
2. Segmentation
Segmentation is a key step in many computer vision tasks (e.g., tracking, classification, object-based coding, and region-based fusion). A plethora of papers exists, including review chapters and papers such as [15, 17, 18]. Segmentation is defined in [17] as the process of partitioning the image into non-intersecting regions, such that each region is homogeneous and the union of no two adjacent regions is homogeneous.
Approaches to segmentation can generally be divided into groups such as: (a) edge-based; (b) region-growing; (c) model-based. A set of segmented images can then be fused on a region-by-region basis. The quality of the segmentation is important for producing good fused images; ideally the segmentation should have the following properties: the image is segmented into a set of closed, connected regions; each feature in the image is represented by a single region; and as few regions as possible are created, since more regions take longer to fuse.

2.1. The Combined Morphological-Spectral Image Segmentation (CoMSUIS) Algorithm
The Combined Morphological-Spectral Image Segmentation (CoMSUIS) algorithm [19] has been found to compare well with existing algorithms. It groups areas of similar intensity and/or texture into separate regions. Texture information can be modelled as the superposition of oscillating components at characteristic scales and orientations. Textural information is extracted from the sub-bands of the DT-CWT; these are more efficient than other techniques such as two-dimensional Gabor filters, giving similar accuracy with a compact representation, and the transform is complete [20]. A perceptual gradient function is derived from the intensity and texture information; larger gradients indicate possible edge locations (e.g., Figure 2(b)). A region-based method, the watershed transform (described in [15]), is used to produce the initial segmentation. However, it tends to over-segment, so this initial segmentation is further processed with a spectral clustering algorithm to reduce the number of regions. Regions representing the same feature are grouped together by globally optimising a cost function: the initial regions are used to construct a graph representation of the image, which is processed by the spectral clustering algorithm.

2.2. Joint and Unimodal Segmentation
Traditionally, information from a single image produces a single segmentation map. This is called unimodal segmentation.
However, fusion tasks usually deal with a set of two or more images. A weak region in one image may correspond to a strong region in another. There is thus an advantage in using information from all images of a scene to
Figure 2. Segmentation methods. Top row (segmentation of textures): (a) original texture; (b) gradient image; (c) initial segmentation; (d) final segmentation. Bottom row (joint and unimodal segmentations): (a) unimodal segmentation of visible image; (b) unimodal segmentation of IR image; (c) union of both unimodal segmentations; (d) joint segmentation.
produce a single segmentation map for all images in the set. This process, called joint segmentation, is introduced in [12]. In general, jointly segmented images work better for fusion, as the segmentation map contains the minimum number of regions needed to represent all the features in the scene most efficiently. With separately segmented images, where different images show features differently, a problem occurs where regions partially overlap: if the overlapped region is incorrectly dealt with, artifacts will be introduced, and the extra regions created to deal with the overlap increase the time taken to fuse the images. Joint segmentation can overcome some of the problems of noise and other inaccuracies in an image, producing a more reliable segmentation. However, if the information from the segmentation process is to be used to register the images, or if the input images are very different, separate segmentations of the images are needed. The effects of segmenting the images in different ways are shown in Figure 2. In particular, the inefficient union of the two unimodal segmentation maps, which is necessary in order to fuse the images, is shown in Figure 2(c).
3. Fusion

3.1. Pixel-Based Image Fusion

Consider N registered input images, I_1, I_2, ..., I_N. Multiresolution fusion methods transform these registered images from normal image space into another domain by applying the transform ω, fuse the coefficients using some rules F, and then perform the inverse transform ω^{-1} to reconstruct the fused image I [2]:

I = ω^{-1}(F(ω(I_1), ω(I_2), ..., ω(I_N)))    (1)
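Equation (1) can be sketched with a single-level 2-D Haar transform standing in for the DT-CWT (the paper's transform; Haar is used here only because it is compact and self-contained), a choose-maximum rule F on the detail bands, and averaging of the low-pass band:

```python
def haar2d(img):
    """One level of the 2-D Haar transform: returns (LL, LH, HL, HH) for an
    image (nested lists) with even dimensions."""
    h, w = len(img) // 2, len(img[0]) // 2
    LL = [[0.0] * w for _ in range(h)]; LH = [[0.0] * w for _ in range(h)]
    HL = [[0.0] * w for _ in range(h)]; HH = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            LL[i][j] = (a + b + c + d) / 4.0   # approximation band
            LH[i][j] = (a + b - c - d) / 4.0   # horizontal detail
            HL[i][j] = (a - b + c - d) / 4.0   # vertical detail
            HH[i][j] = (a - b - c + d) / 4.0   # diagonal detail
    return LL, LH, HL, HH

def inverse_haar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    h, w = len(LL), len(LL[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            img[2 * i][2 * j] = ll + lh + hl + hh
            img[2 * i][2 * j + 1] = ll + lh - hl - hh
            img[2 * i + 1][2 * j] = ll - lh + hl - hh
            img[2 * i + 1][2 * j + 1] = ll - lh - hl + hh
    return img

def fuse_choose_max(img1, img2):
    """I = w^-1(F(w(I1), w(I2))): transform both images, keep the larger
    absolute detail coefficient at each position, average the low-pass
    band, and invert the transform."""
    b1, b2 = haar2d(img1), haar2d(img2)
    LL = [[(x + y) / 2.0 for x, y in zip(r1, r2)]
          for r1, r2 in zip(b1[0], b2[0])]
    details = []
    for k in (1, 2, 3):
        details.append([[x if abs(x) >= abs(y) else y
                         for x, y in zip(r1, r2)]
                        for r1, r2 in zip(b1[k], b2[k])])
    return inverse_haar2d(LL, *details)
```

A shift-variant real transform such as this (or the DWT) introduces the ringing artifacts discussed later; the DT-CWT avoids them, but the fusion skeleton is identical.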
Figure 3. Fusion Methods with the DT-CWT: (a) Pixel-Based Fusion; (b) Region-Based Fusion. [Diagram: the input images I_1, I_2, ..., I_N are transformed with the DT-CWT; the complex wavelet coefficients are combined by the fusion rule F (guided by a joint/separate segmentation in the region-based case) into fused wavelet coefficients; the fused image I is reconstructed with the inverse DT-CWT.]
Figure 3(a) shows pixel-based fusion with the DT-CWT. As wavelets tend to pick out salient features in an image (such as corners and edges), wavelet coefficients with larger values carry more information about the features in an image. Thus, a choose-maximum scheme, picking the larger absolute wavelet coefficient at each pixel, gives good results. More complex fusion rules have been proposed, such as combining coefficients as a weighted average based on a local activity measure in the sub-bands of the images [5]. With an area-based selection rule with consistency verification [7], the decision at each pixel is made in favour of the image with the higher activity in a small, arbitrary window centered on the pixel; if the activities of the images are similar, an average can be taken, and finally a consistency check is made. These methods give some improvement in the quality of the fused image, especially for DWT fusion. They can be thought of as a step towards region-based fusion, but the arbitrary regions used here bear no relation to the features in the image.

3.2. Region-Based Image Fusion

The majority of applications of a fusion scheme are interested in features within the image, not in the actual pixels. Therefore, it seems reasonable to incorporate feature information into the fusion process [11]. There are a number of perceived advantages of this, including:
• Intelligent fusion rules: Fusion rules are based on combining the regions of an image.
Thus, more useful tests for choosing between regions, based on various properties of a region, can be implemented;
• Highlighting features: Regions with certain properties can be either accentuated or attenuated in the fused image depending on a variety of the region's characteristics;
• Reduced sensitivity to noise: Processing semantic regions rather than individual pixels or arbitrary regions can help overcome some of the problems of pixel-based fusion methods, such as sensitivity to noise, blurring effects and mis-registration.
A number of region-based fusion schemes have been proposed, for example [11-14]. These initially transform pre-registered images using a multiresolution (MR) transform. Regions representing image features are then extracted from the transform coefficients. In [11], grey-level clustering using a generalized pyramid linking method is used for segmentation, and the regions are then fused based on a simple region property such as average activity. These methods do not take full advantage of the wealth of information that can be calculated for each region.
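The area-based selection rule with consistency verification described in Section 3.1 can be sketched as follows; the window radius and tie tolerance are illustrative values, and the final consistency (majority) check is omitted for brevity:

```python
def window_activity(coeffs, y, x, r=1):
    """Mean absolute coefficient value in a (2r+1)x(2r+1) window, clipped
    at the borders and centred on (y, x): a simple local activity measure."""
    h, w = len(coeffs), len(coeffs[0])
    vals = [abs(coeffs[i][j])
            for i in range(max(0, y - r), min(h, y + r + 1))
            for j in range(max(0, x - r), min(w, x + r + 1))]
    return sum(vals) / len(vals)

def area_based_select(c1, c2, r=1, tie_tol=0.05):
    """At each position take the coefficient from the sub-band whose local
    window activity is higher; when the activities are within `tie_tol`
    (relative), average the two coefficients instead."""
    h, w = len(c1), len(c1[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a1 = window_activity(c1, y, x, r)
            a2 = window_activity(c2, y, x, r)
            if abs(a1 - a2) <= tie_tol * max(a1, a2, 1e-12):
                out[y][x] = 0.5 * (c1[y][x] + c2[y][x])
            else:
                out[y][x] = c1[y][x] if a1 > a2 else c2[y][x]
    return out
```

The windows here are still arbitrary blocks with no relation to image features, which is exactly the limitation that motivates region-based fusion.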
A region-based fusion algorithm, initially proposed in [12], is briefly described here and shown in Figure 4. Initially, the registered input images are transformed with the DT-CWT, and the high-pass coefficients, together with the original image, are passed to the CoMSUIS algorithm to produce a set of corresponding segmentation maps. Joint or unimodal segmentation can be used; if unimodal segmentation is used, the union of the segmentation maps is used in the fusion process. The segmentation map is then down-sampled, giving priority to smaller regions, so that a segmentation map is available at each level of the wavelet coefficients. A priority value that determines whether a given region in an input image should be included in the fused image is calculated for all regions in all input images; thus, a priority map is generated for each image in the wavelet domain. Priority can be calculated from some property of a region, such as a statistical measure (e.g., activity, variance or entropy) or the size, shape or spatial position of the region. Examples of priority maps, using variance to calculate priority, are shown in Figures 4(a) and 4(b). Fusion decisions can now be made region by region based on the priority maps. Possible fusion rules include weighted averages of regions or a “choose maximum region” scheme. Intuitively, weighted averages based on the priority maps should produce good results; however, the averaging effect is detrimental to the quality of the fused image, and the choose-maximum scheme gives better results. Figure 4(c) shows the fusion decision: black regions are taken from the IR image while grey regions are taken from the visible image. The wavelet coefficients are combined based on this mask, and the fused image is reconstructed with the inverse DT-CWT. One of the main advantages of region-based fusion is that, as we are dealing with regions representing actual features in the images, the regions can be manipulated to improve the fused image for an end user.

Figure 4. Choosing the Regions for the Fused Images: (a) Priorities for Visible Image; (b) Priorities for IR Image; (c) Mask.
Based on some property of the region, or on manual or automatic classification, a region can easily be attenuated or highlighted to change its influence in the fused image.
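The priority-map step can be sketched as follows: compute a per-region priority (variance here, as in Figures 4(a) and 4(b)) for each image over a shared segmentation map, then build the choose-maximum decision mask. Labels and coefficients are simplified to nested lists:

```python
def region_priority(coeffs, labels):
    """Priority of each region = variance of the coefficients inside it
    (one of the statistical measures suggested in the text)."""
    sums, sqs, counts = {}, {}, {}
    for crow, lrow in zip(coeffs, labels):
        for c, l in zip(crow, lrow):
            sums[l] = sums.get(l, 0.0) + c
            sqs[l] = sqs.get(l, 0.0) + c * c
            counts[l] = counts.get(l, 0) + 1
    return {l: sqs[l] / counts[l] - (sums[l] / counts[l]) ** 2 for l in sums}

def choose_max_region_mask(coeffs1, coeffs2, labels):
    """Fusion decision mask over a shared (joint) segmentation map:
    1 -> take the region from image 1, 2 -> take it from image 2."""
    p1 = region_priority(coeffs1, labels)
    p2 = region_priority(coeffs2, labels)
    return {l: 1 if p1[l] >= p2[l] else 2 for l in p1}
```

In the full algorithm this decision is applied to the wavelet coefficients at every decomposition level before the inverse DT-CWT.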
Figure 5. Image Fusion: (a) DWT Pixel Fused Image; (b) DT-CWT Pixel Fused Image; (c) DT-CWT Region Fused Image.
4. Results and Discussion

The IR and visible images shown in Figure 5 were fused with three methods: DWT pixel-based, DT-CWT pixel-based and DT-CWT region-based. Four levels of decomposition and a choose-maximum fusion rule were used with all methods, with joint segmentation for region-based fusion. The results are given in Figure 5. The DWT fused image has a number of artifacts, including ringing, particularly around edges with large contrast changes. These artifacts are much less obvious in the DT-CWT fusion. The region-based fused image has improved contrast over the pixel-based methods, as pixel-based fusion techniques tend to cause some averaging between the images and the visible image is very dark. However, some detail from the visible image is lost in the region-fused image, for example the café windows. This is because the segmentation (see Figure 2(d)) has not picked up some detail in the background; as that region is taken from the IR image, this detail from the visible image is lost. Figure 6 shows an example of how regions can easily be manipulated to improve the fused result. In this situation, we define a problem where it is more important to spot a figure the closer he is to the road. The distance between the centres of mass of the regions representing the road and the figure is calculated, and the coefficients of the figure are weighted inversely proportionally to the distance from the road. For this experiment, the road was manually selected and the figure detected by thresholding the IR image. These images were jointly segmented and fused using an entropy priority. The figure is seen to get brighter as he moves closer to the road. While this is a relatively trivial example, it is a worthwhile exercise showing some advantages of region-based fusion.
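The proximity weighting in this experiment can be sketched as follows; the `gain` constant and the exact weighting function are illustrative assumptions, since the text only states that the weight is inversely proportional to the distance between the centres of mass:

```python
import math

def centre_of_mass(labels, region):
    """Mean (y, x) position of the pixels carrying a given region label."""
    pts = [(y, x) for y, row in enumerate(labels)
           for x, l in enumerate(row) if l == region]
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def proximity_weight(labels, figure_label, road_label, gain=10.0):
    """Weight for the figure's coefficients, inversely proportional to the
    distance between the centres of mass of the figure and road regions
    (`gain` is an illustrative choice, not from the paper)."""
    fy, fx = centre_of_mass(labels, figure_label)
    ry, rx = centre_of_mass(labels, road_label)
    dist = math.hypot(fy - ry, fx - rx)
    return gain / (1.0 + dist)   # larger weight as the figure nears the road
```

Scaling the figure's wavelet coefficients by this weight before reconstruction makes him brighter as he approaches the road.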
Figure 6. Person in IR Highlighted Depending on Closeness to the Road³
5. Conclusions

This paper has introduced the topics of wavelet transforms and segmentation: two key technologies for many fusion applications. The advantages of the DT-CWT over the DWT, and of wavelets over other transforms, have been discussed, and the DT-CWT has been shown to outperform other methods for pixel-based fusion. The CoMSUIS segmentation algorithm was described and used in a region-based fusion algorithm. Region-based DT-CWT fusion has been shown to produce fused images of similar quality to pixel-based fusion. The main advantage of region-based fusion is the ability to use more intelligent, higher-level fusion rules; however, this comes at a cost of complexity.

³ The original IR and visible images are kindly supplied by Alexander Toet of the TNO Human Factors Research Institute and are available online at www.imagefusion.org.
References
[1] Abidi, M. and R. Gonzalez, Data Fusion in Robotics and Machine Intelligence. Academic Press, USA, 1992.
[2] Nikolov, S.G., et al., Wavelets for Image Fusion, in Wavelets in Signal and Image Analysis, A. Petrosian and F. Meyer, Editors. 2001: Dordrecht, The Netherlands. p. 213-244.
[3] Toet, A., J. van Ruyven, and J. Valeton, Merging thermal and visual images by contrast pyramids. Optical Engineering, 1989. 28(7): p. 789-792.
[4] Wilson, T., S. Rogers, and M. Kabrisky, Perceptual based hyperspectral image fusion using multispectral analysis. Optical Engineering, 1995. 34(11): p. 3154-3164.
[5] Burt, P. and R. Kolczynski, Enhanced image capture through fusion, in Proceedings of the 4th International Conference on Computer Vision. 1993. p. 173-182.
[6] Rockinger, O., Image Sequence Fusion Using a Shift-Invariant Wavelet Transform, in Proceedings of the IEEE International Conference on Image Processing. 1997. p. 288-291.
[7] Li, H., S. Manjunath, and S. Mitra, Multisensor Image Fusion Using the Wavelet Transform. Graphical Models and Image Processing, 1995. 57(3): p. 235-245.
[8] Hill, P.R., Wavelet Based Texture Analysis and Segmentation for Image Retrieval and Fusion, Department of Electrical and Electronic Engineering, University of Bristol: Bristol, UK, 2002.
[9] Chipman, L., T. Orr, and L. Graham, Wavelets and Image Fusion, in Wavelet Applications in Signal and Image Processing III. 1995. p. 208-219.
[10] Rockinger, O., Pixel-Level Fusion of Image Sequences using Wavelet Frames, in Proceedings of the 16th Leeds Applied Shape Research Workshop. 1996.
[11] Piella, G., A general framework for multiresolution image fusion: from pixels to regions. Information Fusion, 2003. 4: p. 259-280.
[12] Lewis, J.J., et al., Region-Based Image Fusion Using Complex Wavelets, in The Seventh International Conference on Image Fusion. 2004. Stockholm.
[13] Matuszewski, B., L.-K. Shark, and M. Varley, Region-based wavelet fusion of ultrasonic, radiographic and shearographic non-destructive testing images, in Proceedings of the 15th World Conference on Non-Destructive Testing. October 2000: Rome.
[14] Zhang, Z. and R. Blum, Region-Based Image Fusion Scheme for Concealed Weapon Detection, in Proceedings of the 31st Annual Conference on Information Sciences and Systems. March 1997.
[15] Sonka, M., V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Brooks/Cole Publishing Company, USA, 1999.
[16] Kingsbury, N., The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters, in IEEE Digital Signal Processing Workshop. 1998.
[17] Pal, N.R. and S.K. Pal, A review on image segmentation techniques. Pattern Recognition, 1993. 26(9): p. 1277-1294.
[18] Cheng, H.D., et al., Color image segmentation: Advances and prospects. Pattern Recognition, 2001. 34(12): p. 2259-2281.
[19] O'Callaghan, R. and D.R. Bull, Combined Morphological-Spectral Unsupervised Image Segmentation. IEEE Transactions on Image Processing, 2005. 14(1): p. 49-62.
[20] Hill, P.R., C.N. Canagarajah, and D.R. Bull, Image segmentation using a texture gradient based watershed transform. IEEE Transactions on Image Processing, 2003. 12(12): p. 1618-1633.
Advances and Challenges in Multisensor Data and Information Processing E. Lefebvre (Ed.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Data Fusion and Quality Assessment of Fusion Products: Methods and Examples

Paolo CORNA a, Lorella FATONE b and Francesco ZIRILLI c,1

a Via Silvio Pellico 4, 20030 Seveso (MI), Italy, e-mail: [email protected]
b Dipartimento di Matematica Pura ed Applicata, Università di Modena e Reggio Emilia, Via Campi 213/b, 41100 Modena (MO), Italy, e-mail: [email protected]
c Dipartimento di Matematica “G. Castelnuovo”, Università di Roma “La Sapienza”, Piazzale Aldo Moro 2, 00185 Roma, Italy, e-mail: [email protected]
Abstract. In this paper we present ideas about image fusion methods based on the use of partial differential equations (PDE). These ideas have been translated into several mathematical models of fusion procedures. These models have been approached with appropriate numerical algorithms and tested on real data (e.g., ERS-SAR and SPOT data). Moreover, we introduce quality assessment techniques for fusion products, i.e., quantitative procedures able to measure the quality of the fused images compared with the quality of the originals or of other images. Some numerical results, obtained from these quality assessment techniques during tests on real data, are shown.

Keywords. Image fusion, calculus of variations, nonlinear optimization, quality assessment of fusion products
Introduction

Data fusion covers a very wide domain, making it difficult to define precisely. In recent decades several definitions of data fusion have been proposed in the literature. Hall and Llinas [1] give the following definition: “data fusion techniques combine data from multiple sensors, and related information from associated databases, to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone”. The Open Geographic Information Systems (GIS) Consortium defines fusion as “the process of organizing, merging and linking disparate information elements (i.e., map features, images, video and so on) to produce a consistent and understandable representation of an actual or hypothetical set of objects and/or events in space and time”. According to Wald [1], data fusion is a “formal framework in which are expressed means and tools for the alliance of data of the same scene originating from different sources. It aims at obtaining information of greater quality; the exact definition of greater quality will depend upon the application”. More specifically, image fusion focuses on the combination of images rather than the more general process of combining data. The information obtained from fused images generally enhances the information obtained from the originals. Moreover, we note that
¹ Corresponding author.
a secondary purpose of data fusion may be saving memory space when storing or transmitting data relative to a scene. One of the most significant and straightforward examples able to illustrate the advantages and benefits of data fusion is human vision. The two eyes extend the ability of a single eye; in fact they have a slightly different viewing angle that makes stereo vision and depth perception possible. Moreover if one eye is disabled, vision is still possible, although in a degraded mode. This is what, in the fusion framework, is called exploitation of redundancy. With human vision, the image fusion process is carried out by the brain, while for digital data, it is carried out through numerical algorithms. The definitions mentioned above are concerned with fusion methods and information quality. We note that in this context quality does not have a very specific meaning. It is a generic word denoting that the information available is more satisfactory for the “customer” after the fusion process is performed than before it is performed. The problem of giving a quantitative meaning to statements about the quality of information contained in images resulting from fusion processes is called quality assessment of fusion products. Data fusion applications are numerous - we mention only two of them: remote sensing applications in earth observation and medical imaging. In the first application, see for example [2], [3], sensors travelling onboard satellites or airplanes provide repeated coverage of the earth’s surface on a regular basis and furnish a large number of data that can be of great interest for earth resource assessment and environment monitoring. For a proper exploitation of these data, it is mandatory to develop effective data fusion techniques able to take advantage of the multisource and multitemporal characteristics of the available data. 
In the second application, the medical framework (see for example [4]), non-invasive imaging technologies provide a unique window on the anatomy, physiology and functioning of living organisms. In this specific case, one interesting goal of a fusion procedure is the fusion of anatomical and functional images to allow improved spatial localization of abnormalities. In detail, multisensor image data are observations of a given scene acquired by different sensors; they are functions of the parameters that define unknown objects contained in the observed scene. The extraction of objects or object parameters from the image data of a single sensor is an inverse problem. Therefore, the extraction of objects or object parameters from multisensor image data in a data fusion procedure corresponds to the joint solution of several inverse problems. There are several fusion approaches, which can take place at the signal, pixel, feature or symbolic level of representation (see Figure 1). Signal-level fusion refers to the combination of signals from different sensors before the production of images. Pixel-level fusion consists of merging information from different digital images on a pixel-by-pixel basis. Feature-level fusion merges the different data sources at an intermediate level: features extracted from the different images are merged. Finally, symbolic-level fusion refers to the combination of information obtained from images at a higher level of abstraction. This last type of fusion is possible even when the images come from very dissimilar sensors. Despite this classification, in several application fields, e.g., in earth observation, a fusion approach can deal simultaneously with more than one of these levels.
Many approaches to multisensor data fusion that implicitly or explicitly deal with uncertainty are based on a variety of tools such as artificial neural networks, Markov random fields, Bayes networks, wavelet transforms, Dempster-Shafer methods, and fuzzy logic, as well as combinations of several of these techniques. For an extensive
review of these techniques in data fusion we refer the interested reader to [5], [6] and the references quoted there. In recent years new techniques to process images based on Partial Differential Equations (PDE) have been proposed, see for example [7], [8], [9], [10] and the references therein. The use of PDE in image processing was originally introduced in the context of computer vision and robotics. These PDE based techniques are of practical interest due to the availability of numerical algorithms to solve PDE and the ability of today’s computers to quickly solve discretized PDE involving thousands or even millions of independent variables coming from the discretization procedure. Note that often each pixel in an image is associated with one independent variable of the discretized PDE. While data fusion is becoming a mature research field both in engineering and applied mathematics, the problem of quantitative quality assessment of fusion products remains in the pioneering stage. In this paper we discuss fusion procedures that can be classified as feature level fusion procedures and the problem of quality assessment of fusion products. In particular, the image fusion problem has been translated into several different mathematical problems involving PDE, and these models have been solved with appropriate numerical algorithms and tested on real data (e.g., ERS-SAR and SPOT data of the earth’s surface). Note that we always assume that the images to be fused refer to the same scene and are coregistered. Moreover in this paper we present algorithms for the quality assessment of fusion products. That is, we give a quantitative basis (i.e., we define numerical performance indices) to explain in which sense we believe that “the quality of the information contained in the fused images is higher than the quality of the information provided by the original images considered one by one”. 
As well as comparing the “fused” images with the originals, we compare the “fused” images corresponding to different fusion procedures with each other. The general idea that we propose for the problem of quality assessment of fusion products is the use of automatic recognition techniques together with a multiscale resolution analysis of the images. Automatic recognition techniques (e.g., the Hough transform) are used to detect simple features in an image (e.g., straight lines, circles, ellipses), and a multiscale algorithm is used to decompose an image into subimages containing only simple features. These ideas are tested on fusion products obtained from the fusion of ERS-SAR and SPOT data. Finally, we compare fused images coming from the fusion of ERS-SAR and SPOT data of a given scene with a high-resolution optical image of the same scene (IRS-1C data). We call this last image “ground truth”: for our purposes, it establishes a conclusive criterion for the quality assessment of the fusion products obtained and a test of the validity of the results obtained with the performance indices. The paper is organized as follows: in Section 2 we present several image fusion techniques based on PDE. In Section 3 we show results obtained using these techniques on the fusion of SAR (ERS-SAR data) and optical images (SPOT data) relative to the same scene on the earth's surface. In Section 4 we suggest quantitative methods to measure the quality improvement of the fused images compared with the quality of the originals and of each other. Finally, we present some numerical results on the quality assessment of SAR/optical image fusion products and compare them with the results obtained using the “ground truth” data.
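The Hough transform mentioned above, used here for straight-line detection, can be sketched as a voting procedure in (ρ, θ) space; the angular resolution and vote threshold below are illustrative choices:

```python
import math

def hough_lines(points, n_theta=180, threshold=3):
    """Standard Hough transform for lines: each edge point (y, x) votes for
    every line rho = x*cos(theta) + y*sin(theta) passing through it.
    Accumulator cells with at least `threshold` votes are returned as
    detected lines (rho, theta)."""
    acc = {}
    for y, x in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return [(rho, math.pi * t / n_theta)
            for (rho, t), votes in acc.items() if votes >= threshold]
```

In a quality assessment setting, the number and strength of simple features detected in a fused image can then be compared against the originals or against ground truth.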
Figure 1. Fusion approaches: (a) signal-level fusion, in which the signals from sensors 1, ..., n are fused before images are produced and the result is then processed; (b) pixel-level fusion, in which the images from the sensors are fused pixel by pixel and the fused image is then processed; (c) feature-level fusion, in which features are extracted from each image and then fused; (d) symbolic-level fusion, in which each image is processed, features are identified, and the identified features are fused at a symbolic level.
1. The Use of PDE in Image Fusion

The fusion procedures that we present are based on the idea of associating a “structure” to the images to be fused. Images referring to the same scene are supposed to have the same structure. Fusing two images consists of minimizing the difference between the structures of the images to be fused, subject to the constraints posed by the data. Note that the fusion of more than two images can be treated with simple generalizations of the methods described below. We limit our attention to images comprising a few subregions within which the image is smooth, separated by boundaries where the image changes abruptly. Note that not all images satisfy these assumptions; for example, images of objects with complicated textures or fragmented structures, such as a canopy of leaves, may not. However, we believe that the piecewise smooth or piecewise constant model of the image that is at the foundation of the image segmentation and fusion algorithms discussed in this paper is a good model for many applications of potential interest to end users. For example, we have in mind the
classification of agricultural fields and urban areas starting from SAR and optical images (see Section 3). Given two images, the fusion algorithms we propose (see [11], [12], [13], [14]) perform the following functions:
• segmentation and denoising;
• fusion.
First let us examine these two functions separately.

1.1. Segmentation and Denoising of Images: PDE Based Filters

The goal is to decompose a given noisy image of the type described above into piecewise smooth regions bounded by contours where the image intensity is allowed to change abruptly. This is called image segmentation or, more precisely, due to the presence of noise in the images, image segmentation and denoising. Let R be a rectangle. An image can be seen as a function g(x,y), (x,y) ∈ R. The variables (x,y) can be real variables or discrete variables taking values from a discrete set. The processing of images using PDE is based on the idea that images with discrete values of the independent variables (x,y) (i.e., images made of pixels) can be considered approximations of images where the independent variables (x,y) are real variables. If we consider black and white images, we can assume g to be a real valued function, so that g(x,y) is a measure of the brightness of the image at the location (x,y). In several applications involving digital images, the dependent variable g takes values from a discrete set; the use of PDE in the processing of these images is based on the assumption that the digital image g can be seen as an approximation of an image where g takes real values. More general situations, where g is a complex variable (e.g., SAR images) or a vector (e.g., colour images), can be treated with simple generalizations of the methods that follow. The process of measuring the brightness of an image is always affected by noise, so that the measured image will always be a noisy image. Let us indicate with g̃(x,y), (x,y) ∈ R, the noisy measured image corresponding to the (ideal) image g.
Note that different types of images measure different physical properties of the underlying scene and are affected by different types of noise, due to the different characteristics of the instruments used to measure the images. Let us assume that the (ideal) image g(x,y), (x,y) ∈ R, is a piecewise smooth or a piecewise constant function, i.e., there exists a finite number of subsets R_i, i = 1, 2, ..., n, of R such that {R_i}_{i=1}^{n} is a partition of R (i.e., R_i ∩ R_j = ∅ if i ≠ j and ∪_{i=1}^{n} R_i = R, see Figure 2), and the function g(x,y) on each set R_i, i = 1, 2, ..., n, is a smooth function or a constant function and changes rapidly or even discontinuously across the boundaries delimiting the sets R_i, i = 1, 2, ..., n. Let us denote with Γ the union of the parts of the boundaries of R_i, i = 1, 2, ..., n, that do not belong to the boundary of R, which we denote with ∂R (see Figure 2). We call Γ the structure of the image g. Let |Γ| denote the total length of Γ and g_{R_i}(x,y) denote the restriction of g(x,y) to R_i, i = 1, 2, ..., n.
Figure 2. The rectangle R; Γ = dashed line.

Let {R̃_i}_{i=1}^{ñ} and Γ̃ be, respectively, the partition of R and the structure associated with g̃. Due to the presence of noise in the measurement process, g̃(x,y), (x,y) ∈ R, is such that the relation between its structure Γ̃ and Γ, and the relation between the functions g̃_{R̃_i}(x,y), i = 1, 2, ..., ñ, and the functions g_{R_i}(x,y), (x,y) ∈ R_i, i = 1, 2, ..., n, are not easily determined. Image segmentation and denoising is a numerical procedure that, from the knowledge of g̃(x,y), (x,y) ∈ R, recovers, as much as possible, g(x,y), (x,y) ∈ R, and in particular recovers the structure Γ of g(x,y), (x,y) ∈ R. To successfully perform the image segmentation procedures suggested later, it is necessary that the partition of R, {R_i}_{i=1}^{n}, that defines the structure of g(x,y), (x,y) ∈ R, is a relatively “simple” one, i.e., a partition made of a few “easy” pieces, so that Γ is the union of a few elementary curves. The PDE based filters for image segmentation and denoising make use of PDE in two different ways:
• solving a calculus of variations problem;
• solving an initial value problem for an evolution equation.
We refer to [9] for a sample of the type 1 approach and to [7] for a sample of the type 2 approach. The references [9], [7] are taken from the mathematical literature and deal mainly with the methodological aspects of using PDE in image processing. The engineering literature on this subject is vast, and a very small sample of it can be found in [8], [10]. Type 1 filters transform the problem of image segmentation and denoising, i.e., the problem of approximating the measured image g̃(x,y), (x,y) ∈ R, with a piecewise smooth (constant) function h(x,y), (x,y) ∈ R, into a problem of optimal approximation.
Let {R_{h,i}}_{i=1}^{n_h} and Γ_h be, respectively, the partition of R and the structure associated with a function h. Note that h is a function smooth on each R_{h,i}, i = 1, 2, ..., n_h, that is discontinuous (changes abruptly) across Γ_h. We consider the following functional:

E(h, Γ_h) = α ∫_R (h − g̃)² dx dy + β ∫_{R∖Γ_h} ||∇h||² dx dy + γ |Γ_h|,    (1)

where ||∇h|| is the Euclidean norm of the gradient of the function and α, β, γ are positive constants used to control the scale of the segmentation and the smoothing effect. In particular, the three addenda appearing in (1) have the following meanings: the
first one is a measure of how well h approximates the data g̃; the second one is a measure of how much the function h differs from a constant function on each component R_{h,i}, i = 1, 2, ..., n_h, of the partition of R associated with h; and the third one is a measure of how complicated the partition of the rectangle R associated with h is. Note that since the decomposition of the rectangle R is unknown, the curve Γ_h is an argument of the functional E. The optimal approximation problem to be considered is the following:

min_{h, Γ_h} E(h, Γ_h).    (2)
Problem (2) is a calculus of variations problem. Since Γ_h is an unknown to be determined by solving problem (2), i.e., since the function h is discontinuous across the boundary Γ_h, this calculus of variations problem is non-standard, and its solution, from both a mathematical and a computational point of view, is a challenging task. If we denote the minimizer of problem (2) by (ĥ, Γ̂), then ĥ is the required denoised and segmented image that, starting from the measured image g̃, approximates the (ideal) image g, and Γ̂ is the approximation of the structure Γ associated with g (see Figure 3). We remind the reader that solving problem (2) through its
first order optimality conditions corresponds to the solution of a problem involving elliptic PDE, i.e., PDE whose behaviour is similar to the behaviour of the Laplace equation. In the scientific and engineering literature several other choices of the functional E(h, Γ_h) have been considered; they are omitted here for simplicity.

Figure 3. Numerical solution of the calculus of variations problem. [Flowchart: Start; read g̃, α, β, γ; compute the minimization step Δh, ΔΓ_h; compute E(h + Δh, Γ_h + ΔΓ_h); if the minimum of E(h, Γ_h) has not been reached, iterate; otherwise output (ĥ, Γ̂); End.]
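For a piecewise-constant candidate h, the functional (1) (a Mumford-Shah type energy) can be evaluated discretely by taking h to be the mean of g̃ on each region, so that the gradient term vanishes off Γ_h, and by approximating |Γ_h| with the number of 4-neighbour pixel pairs whose labels differ. A sketch (the boundary count is a crude stand-in for curve length):

```python
def mumford_shah_energy(g, labels, alpha=1.0, gamma=1.0):
    """Discrete energy of a piecewise-constant approximation of g: h is the
    mean of g on each region of `labels`; the beta (gradient) term is zero
    off the boundaries, and |Gamma_h| is counted as the number of
    4-neighbour pixel pairs with different labels."""
    sums, counts = {}, {}
    h_img, w_img = len(g), len(g[0])
    for grow, lrow in zip(g, labels):
        for v, l in zip(grow, lrow):
            sums[l] = sums.get(l, 0.0) + v
            counts[l] = counts.get(l, 0) + 1
    mean = {l: sums[l] / counts[l] for l in sums}
    fidelity = sum((g[y][x] - mean[labels[y][x]]) ** 2
                   for y in range(h_img) for x in range(w_img))
    boundary = sum(1 for y in range(h_img) for x in range(w_img)
                   for ny, nx in ((y + 1, x), (y, x + 1))
                   if ny < h_img and nx < w_img
                   and labels[ny][nx] != labels[y][x])
    return alpha * fidelity + gamma * boundary
```

A segmentation that matches the image's true structure scores a low energy; merging distinct regions inflates the fidelity term, while over-segmenting inflates the boundary term, which is exactly the trade-off that α, β, γ control.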
Let us now consider the type 2 filters. Let u(t,x,y), (x,y) ∈ R, t ≥ 0, be a real function; to fix ideas, let us consider the following problem:

∂u/∂t = div(s_{a,b}(||∇u||) ∇u),    (x,y) ∈ R, t > 0,    (3)

∂u/∂n (t,x,y) = 0,    (x,y) ∈ ∂R, t > 0,    (4)

u(0,x,y) = g̃(x,y),    (x,y) ∈ R,    (5)

where div(·) is the divergence with respect to the (x,y) variables, ∂u/∂n (t,x,y) denotes the derivative of u in the direction n of the exterior unit normal vector to ∂R at (x,y) ∈ ∂R, and the function s_{a,b}(η), η ≥ 0, is chosen as follows:

s_{a,b}(η) = a / (1 + η²/b²),    η ≥ 0,    (6)

where a and b are suitable real parameters such that a > 0, b ≠ 0 (see Figure 4). Note that Eq. (3) is an evolution equation whose behaviour is similar to the behaviour of the heat equation.
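Equations (3)-(6) define a Perona-Malik type nonlinear diffusion: the diffusivity (6) is small where the gradient is large, so smoothing acts inside regions and is inhibited across boundaries. One explicit time step on a pixel grid with reflecting boundaries (enforcing the Neumann condition (4)) can be sketched as follows, with illustrative values of a, b and Δt (an explicit scheme needs a small time step for stability):

```python
def diffusivity(grad_mag, a=1.0, b=1.0):
    """s_{a,b}(eta) = a / (1 + eta^2 / b^2), Eq. (6)."""
    return a / (1.0 + (grad_mag / b) ** 2)

def diffusion_step(u, dt=0.1, a=1.0, b=1.0):
    """One explicit Euler step of du/dt = div(s(||grad u||) grad u) on a
    grid, using the classic four-direction discretization with one-sided
    differences and reflecting (Neumann) boundaries."""
    h, w = len(u), len(u[0])

    def at(y, x):                          # reflecting boundary condition
        return u[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            flux = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                d = at(y + dy, x + dx) - u[y][x]   # one-sided difference
                flux += diffusivity(abs(d), a, b) * d
            out[y][x] = u[y][x] + dt * flux
    return out
```

Repeated steps denoise the interiors of regions while largely preserving the strong edges that make up the structure Γ; the total brightness is conserved, consistent with the divergence form of (3).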
[Flowchart: read g̃, T, a, b; set t = 0; compute the time step Δt; ...]