PROCEEDINGS IWISP '96 4-7 November 1996 Manchester, U.K.
PROCEEDINGS IWISP '96 4-7 November 1996, Manchester, United Kingdom Third International Workshop on Image and Signal Processing on the Theme of Advances in Computational Intelligence
Edited by
B.G. MERTZIOS Automatic Control Lab., Dept. of Electrical & Comp. Engineering, Democritus University of Thrace, GR-67 100 Xanthi, GREECE
P. LIATSIS Control Systems Centre, Dept. of Electrical Engineering & Electronics, UMIST, Sackville Street, P.O. Box 88, Manchester M60 1QD, United Kingdom
ELSEVIER AMSTERDAM - LAUSANNE - NEW YORK - OXFORD - SHANNON - TOKYO
1996
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ISBN: 0 444 82587 8
© 1996 Elsevier Science B.V. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.

Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified.

No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.

This book is printed on acid-free paper.

Printed in The Netherlands.
Preface

The papers that are included in this volume have been presented at the 3rd International Workshop on Image/Signal Processing (IWISP): Advances in Computational Intelligence, which was held at UMIST, Manchester, UK on 4-7 November 1996. The 3rd IWISP was organised by the Control Systems Centre, UMIST in association with IEEE Region 8 and co-sponsored by the Institute of Electrical Engineers, the Institute of Measurement and Control, the IEEE Signal Processing Society and the Control Technology Transfer Network, under the General Chairmanship of Prof. Peter E. Wellstead and the Programme Chairmanship of Prof. Basil G. Mertzios.
Evidently, a Workshop cannot cover the intensively developed area of Image and Signal Processing. The trend of the 3rd IWISP is emphasized by its theme, 'Advances in Computational Intelligence', referring to computational efficiency and complexity in Image and Signal Processing. In particular, the Workshop focuses on the most modern and critical aspects of Image and Signal Processing and their related areas that have a significant impact on our society. Specifically, the articles presented in the 3rd IWISP may be categorized in the following four major parts:
• Coding and Compression (image coding, image subband, wavelet coding and representation, video coding, motion estimation and multimedia);

• Image Processing and Pattern Recognition (image analysis, edge detection, segmentation, image enhancement and restoration, adaptive systems, colour processing, pattern and object recognition and classification);

• Fast Processing Techniques (computational methods, VLSI DSP architectures);

• Theory and Applications (identification and modelling, multirate filter banks, wavelets in image and signal processing, biomedical and industrial applications).

The proposals from each category were then reviewed by the members of the International Programme Committee and numerous other reviewers. We are sincerely grateful to the reviewers and to the volunteers who acted as invited session organisers and helped us to attract high quality contributions. In the review process, about three fifths of the submitted papers were accepted. The final programme consisted of 24 oral sessions, giving a total of 152 high quality papers. The authors presented in IWISP-96 form an exceptionally interesting and wide international group, coming from the five continents and representing the following 33 countries:
Argentina, Armenia, Australia, Belgium, Brazil, Canada, China, Croatia, Czech Republic, Finland, France, Germany, Greece, Hong Kong, India, Iran, Israel, Italy, Japan, Korea, Mexico, The Netherlands, Poland, Russia, Slovakia, Slovenia, Spain, Sweden,
Taiwan, Turkey, UK, USA and Yugoslavia.

The first and second IWISP have been held in Budapest under the chairmanship of Prof. Kalman Fazekas. The transition of the 3rd IWISP to Manchester signifies a true internationalisation and guarantees a successful future. The strong interdisciplinary interest strengthens the cross-fertilisation of theory and applications. The next Workshops will be organised by an International Steering Committee and will focus on the areas of Signal Processing and Systems where there is great potential. Amongst others, typical cases include lossless techniques, linear prediction and orthogonal systems, multiresolution analysis and wavelets, adaptive systems and filters, 2D control systems, learning theory and applications, model order reduction, data analysis, computational complexity and non-linear dynamics.
Acknowledgements and appreciation are due to all the contributors who submitted their proposals for review to IWISP'96. Needless to say, we could not have such a high quality technical programme without their contributions. We also wish to sincerely thank the members of the International Programme Committee, the reviewers and all those that helped in the organisation of the Workshop.

Basil G. Mertzios
Panos Liatsis
IWISP '96 ORGANIZING COMMITTEE
P.E. Wellstead, UMIST, UK (General Chair)
M. Domanski, TU Poznan, Poland (Tutorials Chair)
K. Fazekas, TU Budapest, Hungary (Financial Chair)
P. Liatsis, UMIST, UK (Proceedings/Publicity Chair)
B.G. Mertzios, Democritus Univ. of Thrace, Greece (Program Chair)
INTERNATIONAL PROGRAMME COMMITTEE
I. Antoniou, Solvay Inst., Belgium
J. Biemond, TU Delft, The Netherlands
Z. Bojkovic, Belgrade Univ., Yugoslavia
I. Boutalis, Democritus Univ. of Thrace, Greece
M. Brady, Univ. of Oxford, UK
V. Cappellini, Florence Univ., Italy
G. Caragiannis, NTUA, Greece
A.C. Constantinides, Imperial College, UK
T. Cooklev, Univ. of Toronto, Canada
J. Cornelis, Vrije Universiteit Brussel, Belgium
A. Davies, King's College London, UK
I. Erenyi, KFKI Research Inst., Hungary
G. Fettweis, Ruhr Univ. Bochum, Germany
M. Ghanbari, Univ. of Essex, UK
S. van Huffel, KU Leuven, Belgium
G. Istefanopoulos, Bosporous Univ., Turkey
V.V. Ivanov, JINR, Russia
M. Karny, UTIA, Academy of Sciences, Czech Republic
T. Kida, Tokyo Inst. of Technology, Japan
J. Kittler, Univ. of Surrey, UK
S. Kollias, NTUA, Greece
M. Kunt, University of Lausanne, Switzerland
C.L. Nikias, Univ. of Southern California, USA
T. Nossek, TU Munchen, Germany
D. van Ormondt, TU Delft, The Netherlands
K.K. Parhi, Univ. of Minnesota, USA
M. Petrou, Univ. of Surrey, UK
D.T. Pham, Univ. of Wales Cardiff, UK
M. Sablatash, McMaster Univ., Canada
D.G. Sampson, Democritus Univ. of Thrace, Greece
W. Schempp, Siegen Univ., Germany
M. Strintzis, Aristotle Univ. of Thessaloniki, Greece
J. Turan, TU Kosice, Slovak Republic
G.J. Vachtsevanos, Georgia Inst. of Tech., USA
A. Venetsanopoulos, Toronto Univ., Canada
Contents
Session A: Image Coding I: Vector Quantisation, Fractal and Segmented Coding Joint optimization of multidimensional SOFM codebooks with QAM modulations for vector quantized image transmission O. Aitsab, R. Pyndiah and B. Solaiman Visual vector quantization for image compression based on Laplacian pyramid structure Z. He, G. Qiu and S. Chen Kohonen's self-organizing feature maps with variable learning rate: Application to image compression A. Cziho, B. Solaiman, G. Cazuguel, C. Roux and I. Lovany
11
An efficient training algorithm design for general competitive neural networks J. Jian and D. Butler
15
Architecture design for polynomial approximation coding of image compression C.-Y. Lu and K.-A. Wen
19
Application of shape recognition to fractal based image compression S. Morgan and A. Bouridane
23
Chrominance vector quantization for coding of images and video at very low bitrates M. Bartkowiak, M. Domanski and P. Gerken
27
Region-of-interest based compression of magnetic resonance imaging data N.G. Panagiotidis and S.D. Kollias
31
Scalable parallel vector quantization for image coding applications D.G. Sampson, A. Cuhadar and A.C. Downton
37
Session B: Wavelets in Image/Signal Processing Real time image compression methods incorporating wavelet transforms D.T. Morris and M.D. Edwards
43
Custom wavelet packet image compression design M.V. Wickerhauser
47
Two-dimensional directional wavelets in image processing J.-P. Antoine
53
The importance of the phase of the symmetric Daubechies wavelets representation of signals J.-M. Lina
61
Contrast enhancement in images using the 2D continuous Wavelet transform J.-P. Antoine and P. Vandergheynst
65
Wavelets and differential-dilation equations T. Cooklev, G. Berbecel and A.N. Venetsanopoulos
69
Wavelets in high resolution radar imaging and clinical magnetic resonance imaging W. Schempp
73
Wavelet transform based information extraction from 1-D and 2-D signals A. Dabrowski
81
Invited Session C: General techniques and algorithms Computational methods and tools for simulation and analysis of complex processes V.V. Ivanov
89
Rare events selection on a background of dominated processes applying multilayer perceptron V.V. Ivanov and P.V. Zrelov
97
Cellular automaton and elastic neural network application for event reconstruction in high energy physics I. Kisel, E. Konotopskaya and V. Kovalenko
101
Recognition of tracks detected by drift tubes in a magnetic field S.A. Baginyan and G.A. Ososkov
105
Session D: Adaptive Systems I: Identification and Modeling A unified connective representation for linear and nonlinear discrete-time system identification J. Fantini
111
Predicting a chaotic time series using a dynamical recurrent neural network R. Teran, J-P. Draye and D. Pavisic
115
A new neural network structure for modelling non-linear dynamical systems A. Hussain, J.J. Soraghan, T.S. Durrani and D.C. Campell
119
A neural network for moving light display trajectory prediction H.M. Lakany and G.M. Hayes
123
Recognizing flow pattern of gas/liquid two-component flow using fuzzy logical neural network P. Lihui, Z. Baofen, Y. Danya and X. Zhijie
127
Adaptive algorithm to solve the mixture problem with a neural networks methodology A.M. Perez, P. Martinez, J. Moreno, A. Silva and P.L. Aguilar
133
Process trend analysis and fuzzy reasoning in fermentation control S. Kivikunnas, K. Ibatici and E. Juusso
137
Higher order cumulant maximisation using non-linear Hebbian and anti-Hebbian learning for adaptive blind separation of source signals M. Girolami and C. Fyfe
141
Session E: Pattern/Object Recognition A robot vision system for object recognition and work piece location W. Min, D. Qizhi and W. Jun
147
Recognition of objects and their direction of moving based on sequence of two-dimensional frames B. Potochik and D. Zazula
151
Innovative techniques for the recognition of faces based on multiresolution analysis and morphological filtering A. Doulamis, N. Tsapatsoulis and S. Kollias
155
Partial curve identification in 2-D space and its application to robot assembly F.-H. Yao, G.-F. Shao, A. Tamaki and K. Kato
161
A fast active contour algorithm for object tracking in complex background C.L. Lam and S.Y. Yuen
165
The 2-point combinatorial probabilistic Hough transform for circle detection J.Y. Goulermas and P. Liatsis
169
Modified rapid transform features in information symbols recognition system J. Turan, K. Fazekas, L. Ovsenik and M. Kovesi
173
Image data processing in flying object velocity optoelectronic measuring device J. Mikulec and V. Ricny
177
Session F: Texture Analysis Rotation invariant texture classification schemes using GMRFs and Wavelets R. Porter and N. Canagarajah
183
A new method for describing texture D.T. Pham and B. Cetiner
187
Texture discrimination for quality control using wavelet and neural network techniques D.A. Karras, S.A. Karkanis and B.G. Mertzios
191
A region oriented CFAR approach to the detection of extensive targets in textured images C. Alberola-Lopez, J.R. Casar-Corredera and J. Ruiz-Alzola
195
Generating stable structure of a color texture image using scale-space analysis with nonuniform Gaussian kernels S. Morita and M. Tanaka
199
Session G: Image Coding II: Transform, Subband and Wavelet Coding Approximation of bidimensional Karhunen-Loeve expansions by means of monodimensional Karhunen-Loeve expansions, applied to Image Compression N. Balossino and D. Cavagnino
205
Blockness distortion evaluation in block-coded pictures M. Cireddu, F.G.B. De Natale, D.D. Giusto and P. Pes
209
A new distortion measure for the assessment of decoded images adapted to human perception F. Bock, H. Walter and M. Wilde
215
Image compression with interpolation and coding of interpolation errors J. Yi and F. Arp
219
Matrix to vector transformation for image processing D. Ait-Boudaoud
223
A speech coding algorithm based on wavelet transform X. Wu, Y. Li and H. Chen
227
Automatic determination of region importance and JPEG codec reflecting human sense R. Hayasaka, J. Zhao, Y. Shimazu, K. Ohta and Y. Matsushita
231
Directional image coding on wavelet transform domain D.W. Kang
235
Session H: Video Coding I: MPEG A universal MPEG decoder with scalable picture size R. Prabhakar and W. Li
241
The influence of impairments from digital compression of video signal on perceived picture quality S. Bauer, B. Zovko-Cihlar and M. Grgic
245
On scalable coding of image sequences L. Erwan
249
Image transmission problems between IP and ATM networks V.S. Mkrttchian, A.V. Eranosian and H.L. Karamyan
253
A scalable video coding scheme based on adaptive infield/inframe DCT and adaptive frame interpolation M. Asada and K. Sawada
257
Rate conversion of compressed video for matching bandwidth constraints in ATM networks* P. Assunção and M. Ghanbari
Session I: Image Subband, Wavelet Coding and Representation Unified image compression using reversible and fast biorthogonal wavelet transforms H. Kim and C.C. Li
263
Subband image coding using adaptive fuzzy quantization step controller P. Planinsic, F. Jurkovic, Z. Cucej and D. Donlagic
267
EZW algorithm using visual weighting in the decomposition and DPCM L. Lecornu and C. Jedrzejek
271
Efficient 3-D subband coding of color video M. Domanski and R. Swierczynski
277
Adaptive wavelet packet image coding with zerotree structure T. Otake, K. Fukuda and A. Kawanaka
281
Efficiency of the image morphological pyramid decomposition D. Sandic and D. Milovanovic
285
Optimal vector pyramidal decompositions for the coding of multichannel images D. Tzovaras and M.G. Strintzis
289
* Due to unavoidable circumstances this paper has been placed at the end of the book on page 701.
Session J: Segmentation Multilingual character segmentation using matching rate K.-A. Moon, S.-Y. Chi, J.-W. Park and W.-G. Oh
295
Architecture of an object-based tracking system using colour segmentation R. Garcia-Campos, J. Battle and R. Bischoff
299
Segmentation of retinal images guided by the wavelet transform T. Morris and Z. Newell
303
An adaptive fuzzy clustering algorithm for image segmentation Y.A. Tolias and S.M. Panas
307
Hy2: A hybrid segmentation method F. Marino and G. Mastronardi
311
Session K: Image Enhancement/Restoration Efficient computation of the 2-dimensional RGB vector median filter S.J. Sangwine and A.J. Bardos
317
Image restoration for millimeter wave images by Hopfield neural network K. Yuasa, H. SawN, K. Watabe, K. Mizuno and M. Yoneyama
321
Image restoration of medical diffraction tomography using filtered MEM K. Hamamoto, T. Shiina and T. Nishimura
325
Directionally adaptive image restoration X. Neyt, M. Acheroy and I. Lemahieu
329
Optimal matching of images at low photon level M. Guillaume, T. Amoroux and P. Refregier
333
A method for controlling the enhancement of image features by unsharp masking filters E. Cernadas, L. Gomez, A. Casas, P.G. Rodriguez and R.G. Carrion
337
Image noise reduction based on local classification and iterated conditional models K. Haris, S.N. Efstratiadis, N. Maglaveras and C. Pappas
341
Session L: Adaptive Systems II: Classification A neural approach to invariant character recognition I.M. Spiliotis, P. Liatsis, B.G. Mertzios and Y.P. Goulermas
347
Image segmentation based on boundary constraint neural network F. Kurugollu, S. Birecik, M. Sezgin and B. Sankur
353
A high performance neural multiclassifier system for generic pattern recognition applications D. Mitzias and B.G. Mertzios
357
Application of a neural network for multifont farsi character recognition using fuzzified pseudo-Zernike moments M. Namazi and K. Faez
361
Integrating LANDSAT and SPOT images to improve landcover classification accuracy A. Chiuderi
365
Classification of bottle rims using neural networks-an LMS approach C. Teoh and J.B. Levy
369
Invited Session M: Wavelets and Filter Banks in Communications Data compression, data fusion and Kalman filtering in wavelet transform Q. Jin, K.M. Wong, Z.M. Luo and E. Bosse
377
Performance of wavelet packet division multiplexing in timing errors and flat fading channels J. Wu, K.M. Wong and Q. Jin
381
Time-varying wavelet-packet division multiplexing T.N. Davidson and K.M. Wong
385
Co-channel interference mitigation in the time-scale domain: the CIMTS algorithm S. Heidari and C.L. Nikias
389
Design and performance of DS/SS signals defined by arbitrary orthonormal functions W.W. Jones and J.C. Dill
393
COFDM, MC-CDMA and wavelet-based MC-CDMA K. Chang and X. Lin
397
Signal denoising through multifractality W. Kinsner and A. Langi
405
Application of multirate filter bank to the co-existence problem of DS-CDMA and TDMA systems S. Hara, T. Matsuda and N. Morinaga
409
Session N: Edge Detection Multiscale edges detection by wavelet transform for model of face recognition F. Yang, M. Paindavoine and H. Abdi
415
Edge detection by rank functional approximation of grey levels J.P. Asselin de Beauville, D. Bi and F.Z. Kettaf
419
Fuzzy logic edge detection algorithm S. Murtovaara, E. Juuso and R. Sutinen
423
Topological edge finding M. Mertens, H. Sahli and J. Cornelis
427
Session O: Video Coding II: Motion Estimation Automatic parallelization of full 2D block matching for real-time motion compensation and mapping into special purpose architectures N. Koziris, G. Papakonstantinou and P. Tsanakas
433
New search region prediction method for motion estimation D.H. Ryu, C.R. Kim, T.W. Choi and J.C. Kim
439
Motion estimation by direct minimisation of the energy function of the Hopfield neural network L. Cieplinski and C. Jedrzejek
443
A modified MAP-MRF motion-based segmentation algorithm for image sequence coding D. Gatica-Perez, F. Garcia-Ugalde and V. Garcia-Garduno
447
Unsupervised motion segmentation of image sequences using adaptive filtering O. Pichler, A. Teuner and B.J. Hosticka
451
Development of a motion compensated coding system for an enhanced wide screen TV T. Hamada and S. Matsumoto
455
Session P: Biomedical Applications Brain evoked potentials mapping using the diffuse interpolation D. Bouattoura, P. Gaillard, P. Villon and F. Langevin
461
Computer-aided diagnosis: detection of masses on digital mammograms A.J. Mendez, P.G. Tahoces, M.J. Lado, M. Souto and J.J. Vidal
465
Model order determination of ECG beats using rational function approximations J.S. Paul, V. Jagadeesh Kumar and M.R.S. Reddy
469
Computation of the ejection rate of the ventricle from echocardiographic image sequences A. Teuner, O. Pichler and B.J. Hosticka
475
Contour detection of the left ventricle in echocardiographic images S.G. dos Santos, F. Bortolozzi and J. Facon
479
Identification of a stochastic system involving neuroelectric signals A.G. Rigas
483
Invited Session Q: Signal Processing Theory and Applications Design of m-band linear phase FIR filter banks with high attenuation in stop bands T. Kida and Y. Kida
489
Robustness of filter banks F.N. Kouboulis, M.G. Scarpetis and B.G. Mertzios
493
Design and learning algorithm of neural networks for pattern recognition H. Takahashi and M. Nakajima
497
Statistical comparison of minimum cross entropy spectral estimators R.C. Papademetriou
501
Generalized optimum approximation minimizing various measures of error at the same time T. Kida
507
Determination of optimal coefficients of high-order error feedback upon Chebyshev criteria A. Djebbari, Al. Djebbari, M.F. Belbachir and J.M. Rouvaen
511
Invited Session R: VLSI DSP Architectures Dynamic codelength reduction for VLIW instruction set architectures in digital signal processors M. Weiss and G. Fettweis
517
Implementation aspects of FIR filtering in a wavelet compression scheme G. Lafruit, B. Vanhoof, J. Bormans, M. Engels and I. Bolsens
521
Recursive approximate realisation of image transforms with orthonormal rotations G.J. Hekstra, E.F. Deprettere, M. Monari and R. Heusdens
525
Radix distributed arithmetic: algorithms and architectures M.K. Ibrahim
531
Order-configurable programmable power-efficient FIR filters C. Xu, C.-Y. Wang and K.K. Parhi
535
Session S: Video Coding III: Multimedia On speech compression standards in multimedia videoconferencing: Implementation aspects M. Markovic and Z. Bojkovic
541
Multimedia communication graphical user interface design principles for the teleeducation J. Turan, K. Fazekas, L. Ovsenik and M. Kovesi
545
Image and video compression for multimedia applications D.G. Sampson, E. da Silva and M. Ghanbari
549
A multilayer image coding and browsing system G. Qiu
553
Switched segmented image coding-JPEG schemes for progressive image transmission C.A. Christopoulos, A.N. Skodras, W. Philips and J. Cornelis
557
Low bit rate coding of image sequences using regions of interest and neural networks N. Doulamis, A. Tsiodras, A. Doulamis and S. Kollias
561
Session T: Image Analysis I Iterated function systems for still image processing J.-L. Dugelay, E. Polidori and S. Roche
569
Sensing Surface Discontinuities via Coloured Spots C.J. Davis and M.S. Nixon
573
Image analysis and synthesis by learning from examples S.G. Brunetta and N. Ancona
577
A stabilized multiscale zero-crossing image representation for image processing tasks at the level of the early vision S. Watanabe, T. Komatsu and T. Saito
581
Finding geometric and structural information from 2D image frames R. Jaitly and D.A. Fraser
585
Detection of small changes in intensity on images corrupted by signal-dependent noise by using the wavelet transform Y. Chitti and P. Gogan
589
Deterioration detection in a sequence of large images O. Buisson, B. Besserer, S. Boukir and L. Joyeux
593
Invited Session U: Color Processing Segmentation of multi-spectral images based on the physics of reflection N. Kroupnova
599
Using color correlation to improve restoration of colour images D. Keren, A. Gotlib and H. Hel-Or
603
Colour eigenfaces G.D. Finlayson, J. Dueck, B.V. Funt and M.S. Drew
607
Colour quantification for industrial inspection M. Petrou and C. Boukouvalas
611
Colour object recognition using phase correlation of log-polar transformed Fourier spectra A.L. Thornton and S.J. Sangwine
615
SIIAC: Interpretation system of aerial color images S. Mouhoub, M. Lamure and N. Nicoloyannis
619
Session V: Industrial Applications Nodular quantification in metallurgy using image processing V.L. Ballarin, E. Moler, F. Pessana, S. Torres and M. Gonzalez
625
Image processing in the measurement of trash content and grades in cotton B.D. Farah
629
Automated visual inspection based on fermat number transform J. Harrington and A. Bouridane
633
Segmentation of birch wood board images D.T. Pham and R.J. Alcock
637
Techniques for classifying sugar crystallization images based on spectral analysis and the use of neural networks E.S. Gonzalez-Palenzuela and P.I. Vega-Cruz
641
Large-scale tomographic sensing system to study mixing phenomena M. Wang, R. Mann, F.J. Dickin and T. Dyakowski
647
Session W: Image Analysis lI Structural indexing of infra-red images using statistical histogram comparison B. Huet and E. Hancock
653
A model-based approach for the detection of airport transportation networks in sequences of aerial images D. Sarantis and C.S. Xydeas
657
Context driven matching in structural pattern recognition S. Gautama and J.P.F. D'Haeyer
661
An efficient box-counting fractal dimension approach for experimental image variation characterization A. Conci and C.F.J. Campos
665
An identification tool to build physical models for virtual reality J. Louchet and L. Jiang
669
Cue based camera calibration and its application to digital moving image production Y. Nakazawa, T. Komatsu and T. Saito
673
Session X: Signal Processing II A novel approach to phoneme recognition using speech image (spectrogram) M. Ahmadi, N.J. Bailey and B.S. Hoyle
679
Modified NLMS algorithms for acoustic echo cancellation M. Medvecky
683
Matrix polynomial computations using the reconfigurable systolic torus T.H. Kaskalis and K.G. Margaritis
687
Real-time connected component labelling on one-dimensional array processors based on content-addressable memory: optimisation and implementation E. Mozef, S. Weber, J. Jabar and E. Tisserand
691
A 2-D window processor for modular image processing applications and its VLSI implementation P. Tzionas, C. Mizas and A. Thanailakis
695
Session A: IMAGE CODING I: VECTOR QUANTISATION, FRACTAL AND SEGMENTED CODING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Joint optimization of multi-dimensional SOFM codebooks with QAM modulations for vector quantized image transmission
O. AITSAB*, R. PYNDIAH* & B. SOLAIMAN**
TELECOM BRETAGNE, B.P. 832, 29285 Brest Cedex, France (Tel: (33) 98 00 10 70, Fax: (33) 98 00 10 98)
*Dept. S.C., **Dept. I.T.I. Email: omar.aitsab@enst-bretagne.fr
Abstract
Traditionally, source coding and channel modulation characteristics are optimized separately. Source coding reduces the redundancy in an input signal (information compression), while the modulation adapts the information to the transmission channel characteristics in order to be noise resistant. In this paper, the internal structure of the source coding scheme (a self-organized feature map vector quantizer) is trained in conjunction with a QAM modulation type, in order to increase the tolerance to transmission error effects. Results obtained using the standard Lenna image are extremely encouraging.
I - Introduction
The requirements of digital transmission systems are now becoming so severe that it is no longer possible to optimize different functions in the system independently. Today, most transmission systems use the concept of coded modulation (TCM) [1], which leads to a better spectral efficiency through the global optimization of channel coding and modulation. On the other hand, powerful source coding techniques are used to increase the number of sources transmitted in a given frequency bandwidth. However, the quality of the transmitted sources using these source coding techniques usually depends on the channel bit error rate. To go one step further, one would expect the subjective quality of the transmitted sources (image or speech) to remain acceptable even at a very low channel signal to noise ratio, as in an analogue transmission system. In this paper, the joint optimization of image coding (using vector quantization) and modulation is considered in order to minimize the effect of transmission errors on the subjective quality of the received/reconstructed images.
II - Image source coding
Recently, vector quantization (VQ) has emerged as an effective tool for image compression (source coding) [2]. In VQ, a data vector X (or a sub-image) to be encoded is represented as one of a finite set of M symbols. Associated with each symbol "i" is a reference vector (sub-image) "Ci" called a codeword. The complete set of M codewords is called the codebook. The codebook C = {Ci, i=1,2,...,M} is usually obtained through a training process using a large set of training data that is statistically representative of the data encountered in practice. In this study, the determination of the codebook is conducted using the Self-Organizing Feature Map (SOFM) proposed by T. Kohonen [3]. This model builds up a mapping from the N-dimensional vector space of real numbers R^N to a two-dimensional array "S" of cells. Each cell is given a virtual position in R^N. This position (given by the synaptic weights connecting this cell to the input vector) is in fact the codeword.
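The SOFM codebook construction just described can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the function names are hypothetical, and the grid size, learning-rate and neighbourhood schedules are assumed values.

```python
import numpy as np

def train_sofm_codebook(blocks, grid=16, iters=10000, seed=0):
    """Train a grid x grid Kohonen SOFM whose cell weights act as VQ codewords.

    `blocks` is an (n, d) array of training vectors (e.g. flattened sub-images).
    The learning-rate and neighbourhood schedules below are illustrative.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(blocks.min(), blocks.max(), (grid, grid, blocks.shape[1]))
    ii, jj = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    for t in range(iters):
        x = blocks[rng.integers(len(blocks))]
        # Winning cell: minimum Euclidean distance to the input vector.
        d = np.sum((w - x) ** 2, axis=2)
        bi, bj = np.unravel_index(np.argmin(d), d.shape)
        # Shrink the neighbourhood radius and learning rate over time.
        frac = t / iters
        sigma = max(grid / 2 * (1 - frac), 0.5)
        lr = max(0.5 * (1 - frac), 0.01)
        h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
        # Pull the winner and its neighbours towards the input.
        w += lr * h[..., None] * (x - w)
    return w  # w[i, j] is the codeword at topological position (i, j)

def quantize(blocks, w):
    """Return the (i, j) SOFM coordinates of the nearest codeword per block."""
    d = np.sum((w[None] - blocks[:, None, None]) ** 2, axis=3)
    flat = d.reshape(len(blocks), -1).argmin(axis=1)
    return np.stack(np.unravel_index(flat, w.shape[:2]), axis=1)
```

Because neighbouring cells are updated together, nearby cells converge to similar codewords, which is the internal topological order exploited later for error tolerance.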
The purpose of the self-organization process is to find the position vectors such that the resulting mapping (correspondence between an input vector X and the cell which lies nearest in R^N) is a topology-preserving mapping (adjacent vectors in R^N are mapped onto adjacent, or identical, cells in the array "S"). The learning algorithm that forms feature maps selects the best matching (or winning) cell according to the minimum Euclidean distance between its position and the input vector X. All position vectors in the neighborhood of the winning cell are adjusted in order to make them more responsive to the current input. The quantized Lenna image using a 16x16 SOFM is given in figure 2 (Image 1). The codebook trained by the SOFM algorithm presents an internal order, which means that the Euclidean distance between codewords increases with the topological distance in the codebook (see figure 1); this order can be employed to increase error tolerance. In the next section, each codeword will be referenced by its topological position (i,j) on the SOFM.

III - Image transmission
In the case of a vector quantized image, the image transmission is done by transmitting the coordinates (i,j) of the different codewords representing the image. At the receiver end, the codewords corresponding to the received coordinates are used to reconstruct the transmitted image. It is clear that the received codeword can be different from the transmitted one when the received coordinates are subject to transmission errors. Furthermore, if we do not take any precautions, these codewords can be completely different, that is, a white block may be transformed into a black one and vice-versa ("salt and pepper" noise). This can lead to a very bad subjective quality of the received image, with black dots in white zones and vice-versa, as illustrated by Image 3 in figure 2. To reduce the effect of transmission errors on the received image, the probability of a transition between two codewords must be a decreasing function of the Euclidean distance between them. To obtain this characteristic, the internal order of the bi-dimensional (16x16) codebook obtained with the SOFM algorithm was used in conjunction with a 256QAM modulation. In this particular case, each codeword is associated with one specific point in the 256QAM constellation (see figure 1). This means that the topology of the SOFM is preserved in the modulation space. Thus, and since the symbol error probability is a decreasing function of the Euclidean distance between the constellation points, the transition probability between two codewords will be a decreasing function of the Euclidean distance between them. The performance of this approach is illustrated by Image 2 in figure 2. We observe that the subjective quality of the reconstructed image is very good for a bit error rate of 10^-2.
Figure I : Mapping of bi-dimensional (16x16) SOFM codebook and 256 QAM constellation
However, the 256QAM modulation is rarely used in practical transmission systems. So, we propose to transmit the codeword coordinates using a QAM modulation with a smaller number of states, for example 16QAM modulation. In this case, each coordinate is represented by 4 bits and associated with a specific point in the 16QAM constellation by using a Gray mapping. The resulting reconstructed image is shown in figure 2 (Image 3). The degradation of the image is severe because the bi-dimensional codebook is not adapted to 16QAM modulation. In order to improve the quality of the received image, we have adapted the SOFM codebook topology to the type of modulation without increasing the complexity of modulation and source coding [4]. The main idea is to minimize the transmission error effects. So, two adjacent codewords must have adjacent points in the QAM constellation. In the best case, the number of codewords must be equal to the number of modulation states. This was the case with the 256QAM modulation, and the reconstructed image presented good subjective quality even at a high BER (10^-2). However, when the number of codewords is greater than the number of modulation states, the SOFM topology must be adapted to the modulation. For 16QAM modulation, a four-dimensional codebook is required, and each codeword has 4 coordinates. Each coordinate takes 4 values, and each specific constellation point is associated with two coordinates. Thus, the four-dimensional codebook is trained for 16QAM modulation. Image 4 in figure 2 shows the image reconstructed by using this ordered codebook for a BER of 10^-2. We clearly observe an improvement in the subjective quality: the PSNR is 5.7 dB higher than for the unordered codebook.
IV - Simulation results
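As background for these simulations, the Gray mapping of each 4-bit coordinate onto a 16QAM constellation point can be sketched as follows. This is a generic illustration using the standard Gray-coded amplitude levels; the exact bit assignment is our assumption, not taken from the paper.

```python
# Per-axis Gray labelling: adjacent amplitude levels differ in exactly one bit.
LEVELS = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}
BITS = {v: k for k, v in LEVELS.items()}

def qam16_point(coord):
    """Map a 4-bit coordinate (0..15) to a 16QAM constellation point (I, Q)."""
    return (LEVELS[(coord >> 2) & 0b11], LEVELS[coord & 0b11])

def label(point):
    """Recover the 4-bit label of a constellation point."""
    i, q = point
    return (BITS[i] << 2) | BITS[q]

# Nearest-neighbour symbol errors then flip only one bit of the coordinate,
# so the received codeword stays topologically close to the transmitted one.
for i in (-3, -1, 1, 3):
    for q1, q2 in ((-3, -1), (-1, 1), (1, 3)):
        assert bin(label((i, q1)) ^ label((i, q2))).count("1") == 1
```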
We simulated the effects of transmission errors and their compensation by joint optimization of the SOFM codebook and QAM modulation in image compression [5][6], using codebooks consisting of 256 codewords for 3 by 3 pixel subimages. The codebooks were trained using two images (boat and bridge) and were tested on the Lenna image. All the images were 512 by 512 pixels, with 256 grey levels. Distortion in the decoded images was measured using a peak signal-to-noise ratio (PSNR) defined as:
PSNR = 10 log10 (255^2 / MSE) dB,
where MSE is the mean square error.
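The PSNR measure just defined can be computed directly; a minimal sketch (the helper names are ours):

```python
import math

def mse(a, b):
    """Mean square error between two equally sized pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(mean_square_error):
    """Peak signal-to-noise ratio for 8-bit images: 10 log10(255^2 / MSE), in dB."""
    return 10.0 * math.log10(255.0 ** 2 / mean_square_error)
```

For example, psnr(255.0 ** 2 / 10) evaluates to exactly 10 dB.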
V- Conclusion
The optimal association of a two-dimensional codebook containing 16x16 elements with a 256QAM modulation is very robust to transmission errors. When using a 16QAM modulation, the overall performance of the system can be improved by using a 4-dimensional codebook specifically trained for 16QAM modulation. However, the performance is lower than with the 256QAM constellation. This is due to the fact that in a 4-dimensional codebook of 256 elements, each codeword has 8 closest neighbors instead of 4. In this case it is difficult to minimize the VQ distortion and to reduce the transmission error effect at the same time.
Figure 2: The reconstructed VQ image after transmission through a Gaussian noisy channel. Image 1: the reconstructed image without transmission errors, PSNR = 30 dB. Image 2: the reconstructed image with ordered codebook for 256QAM modulation (BER = 10^-2), PSNR = 29.1 dB. Image 3: the reconstructed image with unordered codebook for 16QAM modulation (BER = 10^-2), PSNR = 21.12 dB. Image 4: the reconstructed image with ordered codebook for 16QAM modulation (BER = 10^-2), PSNR = 26.82 dB.
References [1] G. Ungerboeck, "Channel Coding With Multilevel/Phase Signals", IEEE Trans. on Information Theory, vol. IT-28, 1982, pp. 55-67. [2] R.M. Gray, "Vector quantization", IEEE Acoustics, Speech and Signal Processing Magazine, vol. 1, pp. 4-29, Apr. 1984. [3] T. Kohonen, Self-Organization and Associative Memory, New York: Springer-Verlag, 1984. [4] J. Kangas, "Increasing the Error Tolerance in Transmission of Vector Quantized Images by Self-Organizing Map", ICANN 95, pp. 287-291, Paris. [5] J. Kangas and T. Kohonen, "Developments and applications of the Self-Organizing Map and related algorithms", in Proc. IMACS Int. Symp. on Signal Processing, Robotics and Neural Networks, pp. 19-22, 1994. [6] D.S. Bradburn, "Reducing transmission error effects using a self-organizing network", in Proc. IJCNN'89, Int. Joint Conf. on Neural Networks, vol. II, pp. 531-537, Piscataway, NJ, 1989.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Visual Vector Quantization For Image Compression Based on Laplacian Pyramid Structure Z. He, G. Qiu (University of Portsmouth, U.K.) and S. Chen (University of Derby, U.K.)
Abstract In this paper, we propose a new image coding scheme based on the Laplacian pyramid structure (LPS) and visual vector quantization (VVQ). In this new scheme, the LPS is used to generate the residual image sequence, and the VVQ is used to code these residual images. Compared with other block-based coding methods, the new scheme has much less blocking effect on the reconstructed image since coding is performed on the basis of hierarchical multiresolution blocks. The new scheme also has the additional advantage of a much lower computational cost than traditional vector quantization (VQ) techniques, since encoding and decoding are based on much smaller dimensional 'visual vectors'. Experimental results show that the new scheme can achieve rate distortion performance comparable to that of traditional VQ techniques, while the computational complexity of the new scheme is only a fraction of that of traditional VQ techniques.
1 Introduction
In recent years, the demand for image transmission and storage has increased dramatically, and research into efficient techniques for image compression has attracted extensive interest. Among many coding techniques, the LPS [1] and the VVQ [2] are two efficient coding techniques in terms of compression ratio, fidelity and computational expense. In this paper, we propose a new image coding scheme combining the LPS and the VVQ, which inherits the advantages of both techniques. In this new scheme, the LPS is employed to generate the residual image sequence and the VVQ is used to code these residual images. Experimental results show that the new scheme can achieve rate distortion performance comparable to that of traditional VQ techniques, while the computational cost of the new scheme is much lower since the encoding and decoding are based on much smaller dimensional 'visual vectors'. Because the coding operation is performed on the basis of hierarchical multiresolution blocks, the new scheme has much less blocking effect on the reconstructed image than traditional VQ techniques. The remainder of the paper is organized as follows. Section 2 summarizes the LPS, and the VVQ system for coding Laplacian residual images is described in section 3. Section 4 discusses the image reconstruction. Section 5 presents experimental results and section 6 gives some concluding remarks.
2 The Pyramid Structure
The generation of the pyramid structure includes the generation of the Gaussian pyramid and the generation of the Laplacian pyramid. The process is illustrated in Fig.1.
Gaussian Pyramid Generation The original image G0 of size M × N pixels becomes level 0 of the Gaussian pyramid. Upper-level images are generated by applying the reduction function R(.) [1], defined in (1), iteratively.
G_l(i,j) = Σ_{m=-2}^{2} Σ_{n=-2}^{2} w(m,n) · G_{l-1}(2i+m, 2j+n),   0 < l < L,  0 ≤ i < M_l,  0 ≤ j < N_l.   (1)
L is the number of levels in the pyramid, M_l and N_l are the dimensions of the l-th level, and w(m,n) are the weighting kernels. Fig. 2 shows a 5-level Gaussian pyramid of "Lena".
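A minimal sketch of the reduction step (1), using the common 5×5 separable kernel w(m,n) = w(m)w(n) with w = (1, 4, 6, 4, 1)/16 and border clamping; both the kernel choice and the border handling are illustrative assumptions, not taken from the paper.

```python
W1D = [1, 4, 6, 4, 1]  # separable weighting kernel; w(m, n) = W1D[m] * W1D[n] / 256

def reduce_level(G):
    """One REDUCE step: G_l(i, j) = sum_{m,n=-2..2} w(m, n) G_{l-1}(2i+m, 2j+n)."""
    M, N = len(G) // 2, len(G[0]) // 2
    clamp = lambda v, hi: max(0, min(hi - 1, v))  # clamp indices at the borders
    out = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for m in range(-2, 3):
                for n in range(-2, 3):
                    acc += (W1D[m + 2] * W1D[n + 2] / 256.0
                            * G[clamp(2 * i + m, len(G))][clamp(2 * j + n, len(G[0]))])
            out[i][j] = acc
    return out
```

Since the kernel weights sum to one, a constant image is reproduced exactly at every level of the pyramid.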
Laplacian Pyramid Generation The reverse of the reduction function R(.) is the expansion function E(.) [1], defined in (2). Let G_{l,n} be the result of expanding G_l n times. Then
Figure 1: Pyramid Structure Generation
Figure 2: 5-Level Gaussian Pyramid of "Lena"
G_{l,n}(i,j) = 4 Σ_{m=-2}^{2} Σ_{n=-2}^{2} w(m,n) · G_{l,n-1}((i-m)/2, (j-n)/2)   (2)
≥ 1, is a 2 × 2 anisotropy matrix. However, this wavelet is of little use in practice, because it still acts as a second-order operator and detects singularities in all directions. Indeed it is not a directional wavelet, in the technical sense defined below. Hence the Mexican hat will be efficient for a fine pointwise analysis, but not for detecting directions.
2.2. Directional wavelets When the aim is to detect oriented features (segments, edges, vector fields, ...) in an image, for instance to perform directional filtering, one has to use a wavelet which is not rotation invariant. The best angular selectivity will be obtained if ψ is directional. By this we mean that the effective support of its Fourier transform ψ̂ is contained in a convex cone in spatial frequency space {k}, with apex at the origin, or in a finite union of disjoint such cones (in that case, one will usually call ψ multidirectional). According to this definition, the anisotropic Mexican hat is not directional, since the support of ψ̂_H is centered at the origin, no matter how big its anisotropy is, and, indeed, detailed tests confirm its poor performance in selecting directions [5]. Typical directional wavelets are the 2-D Morlet wavelet and the Cauchy wavelets [6].
2.2.1. The 2-D Morlet wavelet This is the prototype of a directional wavelet:

ψ_M(x) = exp(i k0 · x) exp(-(1/2)|A x|^2),   (2.2)

ψ̂_M(k) = √ε exp(-(1/2)[ε k_x^2 + (k_y - k0)^2]).   (2.3)
The parameter k0 is the wave vector, and A the anisotropy matrix as above. As in 1-D, we should add a correction term to (2.2) and (2.3) to enforce the admissibility condition ψ̂_M(0) = 0. However, since it is numerically negligible for |k0| ≥ 5.6, we have dropped it altogether. The modulus of the (truncated) wavelet ψ_M is a Gaussian, elongated in the x direction if ε > 1, and its phase is constant along the direction orthogonal to k0. Thus the wavelet ψ_M smoothes the signal in all directions, but detects the sharp transitions in the direction perpendicular to k0. The angular selectivity increases with |k0| and with the anisotropy ε. The best selectivity will be obtained by combining the two effects, i.e. by taking k0 = (0, k0). The effective support of ψ̂_M is centered at k0 and is contained in a convex cone, which becomes narrower as ε increases.
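A small numerical check of (2.3): evaluating |ψ̂_M| on a grid confirms that its maximum sits at k = (0, k0). The grid and the parameter values below are our own illustrative choices.

```python
import math

EPS, K0 = 5.0, 6.0  # anisotropy and wave-vector modulus (illustrative values)

def morlet_hat(kx, ky):
    """Fourier transform of the 2-D Morlet wavelet, eq. (2.3), for k0 = (0, K0)."""
    return math.sqrt(EPS) * math.exp(-0.5 * (EPS * kx ** 2 + (ky - K0) ** 2))

grid = [(0.5 * i, 0.5 * j) for i in range(-8, 9) for j in range(0, 25)]
kmax = max(grid, key=lambda k: morlet_hat(*k))
# The maximum of the envelope lies at kx = 0, ky = K0, i.e. at the wave vector.
```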
2.2.2. The Cauchy wavelet Let C ≡ C(α, β) = {k ∈ R^2 | α ≤ arg(k) ≤ β} be the convex cone determined by the angles α < β, and let C̃ = C(α̃, β̃) = {k ∈ R^2 | k · k' > 0, ∀ k' ∈ C(α, β)} be its dual cone, which is also convex. Given a fixed vector η ∈ C(α̃, β̃), we define the Cauchy wavelet in spatial frequency variables [6]:

ψ̂^(C)_{lm}(k) = (k · e_α)^l (k · e_β)^m e^{-k · η}  for k ∈ C(α, β),  and 0 otherwise,   (2.4)

where e_α (resp. e_β) denotes the unit vector in the direction α (resp. β). The Cauchy wavelet ψ^(C)_{lm} is strictly supported in the cone C(α, β), and the parameters m, l ∈ N* give the number of vanishing moments on the edges of the cone. An explicit calculation then yields the following result:

ψ^(C)_{lm}(x) = const · (z · e_α̃)^{-l-1} (z · e_β̃)^{-m-1},   (2.5)

where we have introduced the complex variable z = x + iη ∈ R^2 + iC̃. We show in Figure 1 the wavelet ψ̂^(C)_4(k) for C = C(-20°, 20°); this is manifestly a highly directional filter.
Figure 1: Two directional wavelets, in spatial frequency space: (left) the Morlet wavelet (ε = 5, θ = 45°); (right) the Cauchy wavelet (α = 20°).
3. Evaluation of the performances of the CWT
Given a wavelet, what is its resolving power; in particular, what are its angular and scale selectivity? What is the minimal discretization grid for the reconstruction formula (1.5) that guarantees that no information is lost? The answer to both questions resides in a quantitative knowledge of the properties of the wavelet at hand; that is, the tool must be calibrated. To that effect, one takes the WT of particular, standard signals. Three such tests have proven useful [5], and in each case the outcome may be viewed either at fixed (a, θ) (position representation) or at fixed b (scale-angle representation).
• Point signal: for a snapshot of the wavelet itself, one takes as signal a delta function, i.e. one evaluates the impulse response of the filter.
X(a, b) = (1/√a) ∫_R x(t) ψ((t - b)/a) dt   (1)
The continuous-time wavelet transform depends on two parameters: dilation a and shift b. The CWT is, in principle, invertible, provided the wavelet is admissible (i.e., it has sufficient decay). The wavelet transform involves basis functions which do not have a constant length: very short basis functions are used to achieve good time resolution, while longer basis functions can be used to obtain fine frequency analysis. When a and b are continuous, the set of basis functions does not constitute an orthonormal basis, i.e., the representation is redundant. The discrete-time wavelet transform can be obtained by discretizing a and b. The basis functions become ψ_{m,n}(t) = a_0^{-m/2} ψ(a_0^{-m} t - n b_0), where a = a_0^m and b = n b_0 a_0^m. The case where a_0 = 2 and b_0 = 1 is the most common, and the corresponding grid is called dyadic. The discrete wavelet transform (DWT) corresponds to a filter bank iterated along the lowpass channel. In this paper we shall be concerned only with the CWT. A very important question in wavelet analysis is choosing the basis function, and this is the focus of our concern in this paper. We are looking for a continuous-time wavelet ψ(t) that is infinitely differentiable, has compact support and provides good frequency localization. A wavelet with all three of these properties has not been used in signal processing. The Mexican hat and Morlet wavelets are infinitely differentiable, but do not have compact support. Wavelets generated from filter banks can have compact support, but cannot be infinitely differentiable.
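A direct discretization of (1) makes the role of the two parameters concrete; a naive Python sketch using the Mexican hat ψ(t) = (1 - t²) e^{-t²/2}. The wavelet choice and the sampling grids are illustrative assumptions.

```python
import math

def psi(t):
    """Mexican hat wavelet (second derivative of a Gaussian, up to sign/scale)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt(x, ts, a, b):
    """Riemann-sum approximation of X(a, b) = a^{-1/2} int x(t) psi((t - b)/a) dt."""
    dt = ts[1] - ts[0]
    return sum(xv * psi((t - b) / a) for xv, t in zip(x, ts)) * dt / math.sqrt(a)

ts = [-8.0 + 0.01 * n for n in range(1601)]
x = [psi(t) for t in ts]          # analyse the wavelet itself
on_peak = cwt(x, ts, 1.0, 0.0)    # large: wavelet aligned with the signal
off_peak = cwt(x, ts, 1.0, 6.0)   # near zero: wavelet shifted far from the signal
```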
2 THE BASIC CONSTRUCTION
Iteration of a digital filter followed by downsampling leads to a limit function, provided the filter satisfies certain constraints [1]. Downsampling and upsampling are discrete-time multirate operations. We find it useful to define a continuous-time decimator (Fig. 1(a)). While the term "continuous-time decimator" may not be ideally appropriate, the idea is clear: the support of the function f(t) shrinks by a factor of two. Note that the block in Fig. 1(a) is purely a mathematical tool that is only conceptually similar to the discrete-time decimator. Suppose now that the blocks of continuous-time filtering and decimation are cascaded and iterated (Fig. 1(b)). The impulse and frequency responses of the resulting system after two stages will
Figure 1: (a) Continuous-time decimator and (b) infinite iteration of a continuous-time system followed by a decimator
be φ_2(t) = 4h(4t) * 2h(2t) and Φ_2(ω) = H(ω/4) H(ω/2). The functions φ_i satisfy a dilation equation Φ_i(ω) = H(ω/2) Φ_{i-1}(ω/2). If we continue the iteration to infinity and assume convergence, the impulse and frequency responses of the system will be

φ(t) = lim_{i→∞} φ_i(t) = lim_{i→∞} [2^i h(2^i t) * 2^{i-1} h(2^{i-1} t) * ... * 2h(2t)],   (2)

Φ(ω) = lim_{i→∞} Φ_i(ω) = Π_{i=1}^{∞} H(ω/2^i).   (3)
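The infinite product (3) can be explored numerically by truncating it; here with the box filter h(t) = 1/2 on [-1, 1], whose frequency response is H(ω) = sin(ω)/ω. The truncation depth is an arbitrary choice of ours.

```python
import math

def H(w):
    """Frequency response of h(t) = 1/2 on [-1, 1]: H(w) = sin(w)/w, with H(0) = 1."""
    return 1.0 if w == 0.0 else math.sin(w) / w

def Phi(w, depth=30):
    """Truncated infinite product Phi(w) = prod_{i=1}^{depth} H(w / 2^i)."""
    p = 1.0
    for i in range(1, depth + 1):
        p *= H(w / 2.0 ** i)
    return p
```

Note that Phi(0) = 1 (a lowpass normalization) and |Phi(w)| decays for large |w|.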
Note that the support of φ(t) is equal to the support of h(t). The iterations make sense only if they converge. The simplest case is when h(t) = 1/2 for -1 ≤ t ≤ 1, and 0 otherwise.

N ≥ 110. The relative error δ of the passband center frequency f_p for the DTMF frequency f_DTMF is defined as
δ = (f_p - f_DTMF) / f_DTMF × 100%.   (19)
Since DTMF frequencies are generated with a relative error of less than ±1.8%, approximately the same bounds can be postulated for δ. Assuming some margin of, say, 45%, we get the bounds ±1.8 × 1.45% ≈ ±2.6%, and then we obtain N ≥ 104, i.e., approximately the same value as before from the frequency resolution requirement.
5 Conclusions
A multiresolution approach has been proposed for information extraction from signals and was applied to edge detection in images. By this means pseudo-edges can easily be removed while the proper edges are located precisely. Two mask types (the FDG type and the LOG type) and two edge-detection criteria (extremum search and zero-crossing) have been considered. FDG-type masks surprisingly turned out to be better in the considered application than the commonly used LOG-type masks. This is because even very simple FDG masks like (-1, 0, 1) (see Fig. 2a) yield quite satisfactory results, and the threshold can be chosen adaptively for edges with the prescribed thickness. On the other hand, LOG-type masks are very noise sensitive, and the threshold criterion works very badly with them (see Fig. 3b); therefore it cannot practically be used for them. It should be stressed that the threshold operation is much easier than zero-crossing detection. The second example considered in this contribution is DTMF detection. The main idea proposed for that application is a modification of the classical Goertzel algorithm. This modification consists in varying the analysed block length (to some extent only - of say 5-6%) according to the frequency to be detected. This leads to a massive reduction of the average block size and to a reduction of the center frequency errors (f.
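For reference, the classical Goertzel recursion on which the proposed modification builds can be sketched as follows. The sampling rate and block length below are illustrative; the paper's modification would additionally adjust the block length N slightly per frequency.

```python
import math

def goertzel_power(x, k):
    """Squared magnitude of DFT bin k of block x, via the Goertzel recursion."""
    coeff = 2.0 * math.cos(2.0 * math.pi * k / len(x))
    s1 = s2 = 0.0
    for v in x:
        s0 = v + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def dtmf_power(x, fs, f):
    """Power near frequency f: pick the DFT bin closest to f for this block."""
    return goertzel_power(x, round(len(x) * f / fs))

fs, N = 8000, 205
tone = [math.sin(2.0 * math.pi * 770.0 * n / fs) for n in range(N)]
```

For a pure 770 Hz tone, dtmf_power(tone, fs, 770) dominates the power measured at any other DTMF frequency, e.g. 1209 Hz.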
References [1] L. Cohen, Time-Frequency Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1995. [2] Y. T. Chan, Wavelet Basics, Kluwer Academic Publ., Boston, 1995. [3] S. G. Mallat, "Multifrequency channel decompositions of images and wavelet models", IEEE Trans. Acoustics, Speech, and Signal Proc., vol. 37, no. 12, Dec. 1989, pp. 2091-2110. [4] A. Dąbrowski, A. Franc, A. Czajka, "Realization of wavelet transform for Windows with application to edge detection in images", First International Symposium "Mathematical Models in Automation and Robotics", Międzyzdroje, Sep. 1-3, 1994. [5] A. Dąbrowski, "Wavelet transform-based modification of Goertzel algorithm for detection of DTMF signals", The International Conference "Signal Processing Applications & Technology", Boston, MA, 1995. [6] G. Goertzel, "An algorithm for the evaluation of finite trigonometric series", The American Math. Monthly, vol. 65, pp. 34-35, Jan. 1958. [7] A. Dąbrowski, "Nonuniform digital filter bank for DTMF receiver", Proc. Workshop on Multirate Systems, Filter Banks and Wavelet Analysis, ETH Zurich, Oct. 26, 1992, pp. 10-13. [8] A. Dąbrowski, W. Kabaciński, "Experiences with DTMF receivers and tone senders in Poland using DSP's", Proc. Int. Conf. Signal Process. Appl. & Technology, ICSPAT'93, Santa Clara, USA, Sep. 1993, pp. 193-198. [9] S. Bagchi and S. K. Mitra, "An efficient algorithm for DTMF decoding using the subband NDFT", Proc. ISCAS'95, Seattle, USA, April 1995, pp. 1936-1939. Work supported in part by the grant KBN 0452/P4/94/07 and in part by the project DPB 44-443.
Invited Session C:
GENERAL TECHNIQUES AND ALGORITHMS
Computational Methods and Tools for Simulation and Analysis of Complex Processes (applying ω_n^k criteria, ANN and CA)
V.V. Ivanov, Laboratory of Computing Techniques and Automation, Joint Institute for Nuclear Research, 141980 Dubna, Russia. FAX: 007-096-21-651~5; E-mail: ivanov@
Abstract The tutorial is devoted to computational methods and tools for simulation and analysis of different complex processes in physics, in medicine and in social life. The following are considered: 1) multivariate data analysis methods based on ω_n^k criteria and artificial neural networks, 2) neural network applications for solving problems of data classification and one-dimensional function approximation, and 3) cellular automata usage in pattern recognition and complex system simulation. These methods and tools are developed in the Laboratory of Computing Techniques and Automation of the Joint Institute for Nuclear Research (Dubna, Russia) in collaboration with the International Solvay Institutes for Physics and Chemistry (Brussels, Belgium).
1. Multivariate data analysis based on ω_n^k criteria and ANN The primary goal of experimental data processing consists in the identification of useful events among all events obtained in the experiment. By an event we mean the set of features characterizing the analyzed pattern. The classification of events in the one-dimensional case is carried out with the help of a simple cut on a feature variable. When an event is characterized by more than one variable, the procedure for constructing a multivariate classifier is not trivial. In paper [1] we have suggested and investigated a class of new nonparametric ω_n^k statistics
ω_n^k = (n^{k/2} / (k+1)) Σ_{i=1}^{n} [ (i/n - F(x_i))^{k+1} - ((i-1)/n - F(x_i))^{k+1} ],
where F(x) is the theoretical distribution function of x, x_1 < x_2 < ... < x_n is an ordered sample, and n is the sample size. On this basis the corresponding goodness-of-fit criteria were constructed, which are usually applied for testing the correspondence of each sample-event to a distribution known a priori. On the basis of the ω_n^k criteria, a method was developed for extracting low-probability multivariate events from a background of predominant processes [1], which was successfully applied in several experiments for the selection of rare events [2, 3]. Recently, the use of artificial neural networks (ANN) in multi-dimensional data analysis has become widespread [4]. One such problem consists in classifying individual events, represented by empirical samples of finite volume, as pertaining to one of the different partial distributions composing the distribution analyzed. A feed-forward multilayer network - the multilayer perceptron (MLP) - is a convenient tool for constructing multivariate classifiers, although its learning speed and power of recognition depend critically on the choice of input data [5].
Such a network involves an input layer corresponding to the processed data, an output layer dealing with the results, and also hidden layers. The network architecture is presented in Fig. 1.
Figure 1: Architecture of a multilayer perceptron with one hidden layer

Here x_k, h_j and y_i denote the input, hidden and output neurons, respectively; w_jk are the weights of connections between the input neurons and the hidden layer, and w_ij are the weights of connections between the hidden and the output neurons. The signals a_j = Σ_k w_jk x_k and a_i = Σ_j w_ij h_j are fed to the inputs of the hidden and output neurons, respectively. The output signals from these neurons are determined by the expressions h_j = g[(a_j + θ_j)/T] and y_i = g[(a_i + θ_i)/T], where g(a, T) is a transfer function, T is the "temperature" determining its slope, and θ is the threshold of the corresponding node. Typically, g(a, T) is a sigmoid, for example of the form g(a, T) = tanh(a/T). The tuning of the MLP to the problem being solved (this procedure is known as ANN learning) consists in the minimization, with respect to the weights, of the error functional

E = (1/2) Σ_p [y^(p) - t^(p)]^2,

where p = 1, 2, ..., N_train enumerates the training patterns, and t^(p) is the desired value of the output signal. A comparative study of multidimensional classifiers based on the goodness-of-fit ω_n^k criteria and multilayer perceptrons (MLP) has been carried out in work [6]. It was shown that the MLP exhibits an "instantaneous" learning effect and a power close to the limit in the case of input data represented in the form of variational series. The reasons underlying these effects are analyzed, and recommendations for the joint usage of the ω_n^k criteria and of the MLP are given [6]. Rare event identification on a background of dominated processes is an important problem of applied mathematical statistics. The practical impossibility of ANN training on data with significantly different contributions of the separated classes strongly restricts the wide adoption of neural computational methods in this field. A method for solving this problem was developed in work [7]. It is based on the application of an MLP with a single layer of
hidden neurons having a step-like transition function. The procedure includes two stages: 1) network learning on data with identical contributions of each separated class, and 2) transformation of the calculated bias matrix. It is shown that the developed approach makes it possible to use neural networks for the identification of rare events with a contribution of the order of 0.1%. 2. ANN in data classification and function approximation In recent years artificial neural networks have acquired widespread application in the natural sciences, in medicine, etc. Here we present some examples of ANN usage for solving problems of data classification and pattern recognition, and for function approximation. A two-level trigger was developed for suppression of the background and for effective selection of events involving short-lived A-, E- and C-particles in the experiment DISTO. The first-level trigger is intended for selection of events by their multiplicity: only four-prong events are selected. Events accepted by the first-level trigger are then examined with the help of the second-level trigger, which is applied for track recognition, for searching for a secondary vertex, and for identifying the secondary particles. It is based: 1) on recognition of straight tracks applying a specialized cellular automaton (see details in the next section), 2) on momentum variables permitting effective selection of events containing a secondary vertex, and 3) on identification of the secondary charged particles applying an MLP [8]. A simple and efficient algorithm for identifying events with a secondary vertex making use of an MLP was developed in paper [9]. The differences R_x, R_y (in the XOZ and YOZ projections, respectively) between the largest and the smallest impact parameters(1) D_i (i = 1, 2, ..., n) of all tracks belonging to each of the events analyzed were used in establishing the identification criteria for signal and background events.
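Returning to the MLP introduced above, its forward pass and error functional can be sketched as follows. This is a minimal two-layer illustration with the tanh transfer function; T = 1 and zero thresholds are our own assumed defaults.

```python
import math

def layer(inputs, weights, thetas, T=1.0):
    """One MLP layer: output_i = g((sum_k w_ik * input_k + theta_i) / T), g = tanh."""
    return [math.tanh((sum(w * v for w, v in zip(row, inputs)) + th) / T)
            for row, th in zip(weights, thetas)]

def mlp(x, w_jk, w_ij, theta_h=None, theta_o=None):
    """Forward pass: input layer -> hidden layer -> output layer."""
    theta_h = theta_h or [0.0] * len(w_jk)
    theta_o = theta_o or [0.0] * len(w_ij)
    h = layer(x, w_jk, theta_h)
    return layer(h, w_ij, theta_o)

def error(outputs, targets):
    """Error functional E = (1/2) sum (y - t)^2 for one pattern."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))
```

With all weights zero, the output is tanh(0) = 0, so against a target of 1 the error is exactly 0.5.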
An effective method for identifying the tracks associated with a particular secondary vertex in an event was developed. The method is based on the differences between the asymmetries exhibited by the sets D_i of individual signal events and background events. A procedure for recognition of features in the ECG of one heart beat and from a single channel using an MLP was developed in work [10]. The main idea of the method is to present to the network not the raw data, but transformed data. We believe that a system of polynomials orthogonal on a set of uniformly spaced points is the adequate formalism for the analysis of electrocardiograms, as measurements are taken at equal time intervals, and all points can be denoted 0, 1, 2, ..., n. The above mentioned polynomials P_{k,n}(x), k = 0, 1, 2, ..., m ≤ n are related by the following recurrence equation (see, for instance, [11]):
(x - n/2) P_{m,n}(x) + ((m+1)(n-m))/(2(2m+1)) P_{m+1,n}(x) + (m(n+m+1))/(2(2m+1)) P_{m-1,n}(x) = 0,   1 ≤ m < n,

with P_{0,n}(x) = 1 and P_{1,n}(x) = 1 - 2x/n.
The polynomial P_m(x) approximating the function f(x) in this case is

P_m(x) = c_0 P_{0,n}(x) + c_1 P_{1,n}(x) + ... + c_m P_{m,n}(x),
1The impact parameter of a track in the plane passing through the center of the target and perpendicular to the beam.
where

c_i = ((2i+1) n^(i)) / ((i+n+1)^(i+1)) Σ_{k=0}^{n} f(k) P_{i,n}(k),   i = 0, 1, 2, ..., m,

and x^(s) denotes the factorial polynomial x^(s) = x(x-1)...(x-s+1).
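The recurrence and the coefficient formula above are straightforward to implement; a small sketch (helper names are ours) that also checks the discrete orthogonality Σ_x P_{i,n}(x) P_{j,n}(x) = 0 for i ≠ j, and reproduces c_1 = 1 when f = P_{1,n}:

```python
def cheb_discrete(n, m_max):
    """Tabulate P_{m,n}(x) on x = 0..n from the three-term recurrence,
    starting from P_0 = 1 and P_1 = 1 - 2x/n."""
    xs = list(range(n + 1))
    P = [[1.0] * (n + 1), [1.0 - 2.0 * x / n for x in xs]]
    for m in range(1, m_max):
        row = []
        for x in xs:
            rhs = ((x - n / 2.0) * P[m][x]
                   + m * (n + m + 1) / (2.0 * (2 * m + 1)) * P[m - 1][x])
            row.append(-rhs * 2.0 * (2 * m + 1) / ((m + 1) * (n - m)))
        P.append(row)
    return P

def falling(x, s):
    """Factorial polynomial x^(s) = x (x-1) ... (x-s+1)."""
    p = 1.0
    for r in range(s):
        p *= x - r
    return p

def coeff(f, P, i, n):
    """Expansion coefficient c_i = (2i+1) n^(i) / (i+n+1)^(i+1) * sum_k f(k) P_{i,n}(k)."""
    s = sum(f(k) * P[i][k] for k in range(n + 1))
    return (2 * i + 1) * falling(n, i) / falling(i + n + 1, i + 1) * s
```

For instance, with n = 6 the tabulated polynomials are pairwise orthogonal on the points 0..6, and feeding f = P_{1,6} into coeff returns c_1 = 1 and c_0 = 0.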
The proposed transformation provides a significantly simpler data structure, and stability to noise and to other accidental factors. The method was tested on data generalizing features of normal and modified ECGs and provided a high level of recognition for unveiling barely noticeable pathologies. A by-product of the method is compression of the raw data and reduction of its amount; the compression coefficient has a value of 5-10 and can be improved. The procedure adopted for the parametrization of functions defined on a finite set of argument values plays an essential role in the problem of experimental data processing. Diverse methods have been developed and are widely applied in constructing approximating functions in the form of algebraic or trigonometric polynomials. In our work [12], a nontraditional approach to the interpolation of one-dimensional functions is presented. It is based on the application of a specialized feed-forward neural network, which realizes expansion in the set of orthogonal Chebyshev polynomials of the first kind. This approach permits calculating the expansion coefficients during the network training process, for which arbitrary points (for instance, measured in experiments) from the function's domain are used. The neural network provides an accuracy of function approximation practically coinciding with the accuracy that can be achieved within the traditional approach, when the values of the function at the nodal points are known. 3. CA in pattern recognition and complex system simulation Cellular automata arose from numerous attempts to create a simple mathematical model describing complex biological structures and processes [13]. A cellular automaton is the simplest discrete dynamical system, the behavior of which is totally dependent upon the local interconnections between its elementary parts [14].
This model turned out to be very productive and has been widely and successfully applied in describing various complex structures and processes in physics, biology, chemistry, etc. A typical cellular automaton is constructed in accordance with the following algorithm: 1. cells and their possible discrete states are defined; usually, each cell may assume one of two states, 0 or 1; however, there may be cellular automata with more states; 2. interconnections between cells are defined; usually, each cell can only communicate with neighbouring cells; 3. rules determining the evolution of the cellular automaton are fixed; they depend on the actual problem considered and usually have a simple functional form; 4. the cellular automaton is a timed system, in which all cells change states simultaneously. A model of the cellular automaton for recognition of straight tracks has been developed in paper [15]. In this case a cell is identified with the straight-line segment connecting two hits in neighbouring coordinate detectors. To take into account the inefficiency of the detectors one must also consider the segments connecting hits that skip one detector.
Clearly, only such segments can be considered neighbours as have a common point serving as the end of one segment and the beginning of the other. At each step a cell can assume one of two possible states: 1, if the segment can be considered a part of a track, and 0 otherwise. The angle between two adjacent segments was taken as the criterion for assigning segments to a track. Owing to the discrete structure of the coordinate detectors and to multiple scattering in the material of the experimental apparatus, the angles between track segments in a real experiment are not zero, but an upper limit can be imposed on them. Upon completion of the work of the cellular automaton, additional testing of the quality of the reconstructed tracks (for instance, for the presence of at least two hits belonging only to each individual track) is carried out. This permits rejecting "phantom" tracks, which were accidentally constructed from hits belonging to different tracks.
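The track-finding automaton just described can be sketched in a few lines: cells are segments between hits in consecutive detector planes, and a segment stays alive only if some neighbouring segment (sharing an endpoint) continues it with a sufficiently small angle difference. Everything below (the geometry, the cutoff, the single noise hit) is an illustrative toy setup of our own.

```python
# Hits per detector plane: plane index -> list of coordinates. A straight track
# with slope 0.5 plus one noise hit in plane 1.
hits = {0: [0.0], 1: [0.5, 3.0], 2: [1.0], 3: [1.5]}
CUT = 0.1  # upper limit on the slope difference between adjacent segments

# Cells: segments connecting hits in neighbouring planes, all initially alive (state 1).
segs = [(p, a, b) for p in range(3) for a in hits[p] for b in hits[p + 1]]
state = {s: 1 for s in segs}

def slope(seg):
    p, a, b = seg
    return b - a  # planes are unit-spaced in this toy geometry

def neighbours(seg):
    """Segments sharing an endpoint with seg (end of one = start of the other)."""
    p, a, b = seg
    return [t for t in segs
            if (t[0] == p + 1 and t[1] == b) or (t[0] == p - 1 and t[2] == a)]

# One synchronous update: a segment survives if some live neighbour continues it
# with a slope difference below the cutoff; noise segments die out.
state = {s: int(any(state[t] and abs(slope(s) - slope(t)) < CUT
                    for t in neighbours(s))) for s in segs}

alive = [s for s in segs if state[s]]
```

After a single update only the three collinear track segments remain alive; both segments through the noise hit are switched off.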
Figure 2: Initial configuration of the cellular automaton for a typical Monte-Carlo event in the spectrometer DISTO
Figure 3: Resultant configuration of the cellular automaton for the event presented in the previous figure
The program realization of the described approach has shown high efficiency and speed on simulated data for the experiment DISTO. Its working speed provides for the processing of approximately 1000 events/sec using a 50 MIPS RISC processor. This makes its application suitable for track recognition in the second-level trigger of the DISTO spectrometer. In paper [16] the implementation of probabilistic cellular automata in the study of multispecies agent groups is investigated. As a first step we consider the communication between two species governed by a probabilistic rather than a deterministic process. In this way we implement the kind of coupling suggested in [17], following the spirit of probabilistic control for unstable systems. Here, as the controller one can consider the population of agents following a specific pattern, and as the unstable system the population which tends to cover all available space in an ergodic-like fashion. From the variety of all possible realizations of the above idea we start our investigations, for the sake of clarity and simplicity (helping us fix ideas by simple examples), by considering first two species, one of them represented by a single agent and the other by a small group of agents (50-100).
Acknowledgments

This work has been partly supported by the Commission of the European Community within the framework of the EU-RUSSIA Collaboration, in accordance with the project ESPRIT 21042: Computational Tools and Industrial Applications of Complexity.
References [1] V.V. Ivanov and P.V. Zrelov:
"Nonparametric Integral Statistics w~" k Main Properties and Applications", Int. Journal "Computers & Mathematics with Applications" (in Press); JINR Communication P10-92-461, 1992 (in Russian).
[2] P.V. Zrelov and V.V. Ivanov:
"The Relativistic Charged Particles Identification Method Based on the Goodness-of-Fit w~3-Criterion', Nucl. Instr. and Meth. in Phys. Res., A310 (1991) 623-630.
[3] P.V. Zrelov, V.V. Ivanov, V.I. Komarov, A.I. Puzynin and A.S. Khrykin:
"Modelling of Experiment on Investigation of Processes of Subthreshold K + - Mesons Production". JINR Preprint, P10-92-369, Dubna, 1992; "Mathematical Modelling", v.4, N%ll, 1993, p.56-74, (in Russian).
[4] B. Denby: "Tutorial on Neural Networks Applications in High Energy Physics: The 1992 Perspective". In Proc. of II Int. Workshop on "Software Engineering, Artificial Intelligence and Expert Systems in High Energy Physics". New Comp. Tech. in Phys. Res. II, edited by D. Perret-Gallix, World Scientific, 1992, p.287. [5] A.Yu. Bonushkina, V.V. Ivanov and P.V. Zrelov: "Input Data for a Multilayer Perceptron in the Form of Variational Series". In: Proc. of the Fourth Int. Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, April 3-8, 1995, Pisa, Italy; "New Computing Techniques in Physics Research IV", edited by B. Denby & D. Perret-Galix, "World Scientific", 1995, p.751. [6] V.V. Ivanov: "Multidimensional Data Analysis Based on the wnk Criteria and Multilayer Perceptron". In: Proc. of the Fourth Int. Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, April 3-8, 1995, Pisa, Italy; "New Computing Techniques in Physics Research IV", edited by B. Denby & D. Perret-Galix, "World Scientific", 1995, p.765. A.Yu. Bonushkina, V.V. Ivanov and P.V. Zrelov: "Multivariate Data Analysis Based on the wnk-Criteria and Multilayer Perceptron', Int. Journal "Computers & Mathematics with Applications" (in Press). [7] V.V. Ivanov and P.V. Zrelov: "Rare Events Selection on a Background of Dominated Processes Applying Multilayer Perceptron'. Report at this conference. [8] M.P. Bussa, L. Fava, L. Ferrero, A. Grasso, V.V. Ivanov, I.V. Kisel, E.V. Konotopskaya, G.B. Pontecorvo: "On a Possible Second-Level Trigger for the Experiment DISTO", "Nuovo Cimento", vol. 109A, 1996, p. 327. [9] A.Yu. Bonushkina, V.V. Ivanov, Yu.K. Potrebenikov, T.B. Progulova and G.T. Tatishvili: "Identification of Events with Secondary Vertex in the Experiment EXCHARM". JINR Communications, P1-96-56, Dubna, 1996 (in Russian).
[10] A. Babloyantz, V.V. Ivanov, P.V. Zrelov and P. Maurer: "A New Approach to ECG's Analysis Involving Neural Network". Neural Networks Letters (in press).
[11] I.S. Berezin and N.P. Zhydkov: "Computing Methods", vol. 1, Moscow, 1959 (in Russian).
[12] V. Basios, A.Yu. Bonushkina and V.V. Ivanov: "On a Method for Approximating One-Dimensional Functions", Int. Journal "Computers & Mathematics with Applications" (in press).
[13] S. Wolfram (ed.): "Theory and Applications of Cellular Automata". World Scientific, 1986.
[14] T. Toffoli and N. Margolus: "Cellular Automata Machines: A New Environment for Modelling". MIT Press, Cambridge, Mass., 1987.
[15] M.P. Bussa, L. Fava, L. Ferrero, A. Grasso, V.V. Ivanov, I.V. Kisel, E.V. Konotopskaya, G.B. Pontecorvo: "Application of a Cellular Automaton for Recognition of Straight Tracks in the Spectrometer DISTO", Int. Journal "Computers & Mathematics with Applications" (in press).
[16] V. Basios, F. Bosco, V.V. Ivanov and I.V. Kisel: "From Individual Interactions to a Collective Behaviour of Autonomous Agents Group". Report at this conference.
[17] "Probabilistic Control of Chaos: Chaotic Maps Under Control", to appear in: The Int. Journal "Computers & Mathematics with Applications", Special Issue, Eds. I. Prigogine, I. Antoniou, et al. (in Press).
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
RARE EVENTS SELECTION ON A BACKGROUND OF DOMINATED PROCESSES APPLYING MULTILAYER PERCEPTRON

Ivanov V.V. and Zrelov P.V.
Joint Institute for Nuclear Research, Dubna, Russia

Abstract
The identification of rare events on a background of dominant processes is an important problem of applied mathematical statistics. The practical impossibility of training a neural network on data with significantly different contributions of the classes being separated strongly restricts the wide adoption of neural computational methods. In this work a uniform approach for solving this problem is developed. Our approach is based on the application of one type of neural network: a multilayer perceptron with a single layer of hidden neurons having a step transfer function.

1. Introduction

The present work deals with problems involving the application of neural-network classifiers for identifying rare events on a background of dominant processes. By an event we mean the set of features characterizing the analyzed pattern. The main difficulty in applying neural networks to this problem is the "reluctance" of the network to learn when samples corresponding to classes with strongly differing a priori probabilities are supplied to its input: the network seemingly does not "observe" patterns presented in relatively small quantities. In this case a source of errors is the tendency of the investigator to train the network on the basis of equal probabilities (P(ω1) = P(ω2)) and then to apply it for the separation of classes with unequal contributions (P(ω1) ≠ P(ω2)). We call this procedure the "approximate" Bayesian classification. The investigator usually does not take into account the fact that the separating boundaries for these two cases can differ significantly, which can lead to incorrect classification. All problems considered in the paper correspond to Bayesian classification with the minimal level of error [1]. The results of this work correspond to the case when the Bayesian limit exists and the separating boundary can be found.
2. Criteria of Identification of Rare Events

In the theory of pattern recognition, a key criterion characterizing the quality of the obtained result is the so-called level of recognition R. It represents the fraction of correctly identified events out of the whole number of events presented for classification and can be written in the form
R = [1 - α + m(1 - β)] / (1 + m),    (1)
where m = N2/N1, N1 and N2 being the numbers of events of the first and the second class, respectively, and α and β are the fractions of incorrectly identified events of the first and the second class. However, it must be noted that in the problem of identification of rare events the value R cannot be the basic or, at least, the only criterion, because in such problems it is necessary
to extract the useful events with minimum losses and to retain only that part of the background events against which the examined signal is displayed quite well. The fraction of signal events in the whole number of selected events can be used as a convenient criterion. It can be represented in the form:

η = (1 - α) / (1 - α + βm).    (2)
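Criteria (1) and (2) are straightforward to evaluate; in the sketch below α and β are read as the misidentified fractions of the two classes, which is the reading forced by the formulae.

```python
def recognition_level(alpha, beta, m):
    """Criterion (1): fraction of correctly identified events, m = N2/N1."""
    return (1 - alpha + m * (1 - beta)) / (1 + m)

def signal_fraction(alpha, beta, m):
    """Criterion (2): fraction of true signal events among all accepted events."""
    return (1 - alpha) / (1 - alpha + beta * m)
```

For example, with α = 0.1, β = 0.05 and m = 9 one obtains R = 0.945 but η ≈ 0.67: the recognition level looks high while a third of the accepted events are background, which is why R alone is insufficient for rare events.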
Depending on the subject field and on the concrete problem, the roles of the parameters R and η can change. It can also be convenient to consider some modification of the criterion η.

3. Bayesian Classification of Distributions with Different Contributions

Let us carry out a quantitative consideration of the indices of Bayesian classification on the example of the classification of multidimensional Gaussian distributions with diagonal covariance matrices Σ_j = σ_j² I, where j = 1, 2, and I is the unit matrix. Let σ1 ≠ σ2. The boundary separating the classes is a hypersphere with radius r and center at the point b:

b_i = (σ1² μ_{2i} - σ2² μ_{1i}) / (σ1² - σ2²),

r = { Σ_i b_i² + [σ1² σ2² / (σ1² - σ2²)] [ n ln(σ1²/σ2²) + 2 ln(P(ω2)/P(ω1)) + Σ_i (μ_{1i}²/σ1² - μ_{2i}²/σ2²) ] }^{1/2},    (3)
i = 1, 2, …, n; μ_j is the vector of mean values, P(ω_j) is the a priori probability of the events ω_j, j = 1, 2, and n is the dimension of the space. It can be shown that in the general case μ_1 ≠ μ_2 (for definiteness σ2 < σ1, μ_1 = 0), a good approximation for the value α is given by the expression

α ≈ I( (1/2) [(n + 2a)/(n + 3a)] [ (r/σ1)² + a²/(n + 3a) ], (1/2) f ),    (4)
where I is the incomplete Γ-function, a = Σ_{i=1}^n (b_i/σ1)², and f = (n + 2a)³/(n + 3a)². Similarly, the approximate expression for β has the form
β ≈ 1 - I( (1/2) [(n + 2a')/(n + 3a')] [ (r/σ2)² + a'²/(n + 3a') ], (1/2) f' ),    (5)
where a' = Σ_{i=1}^n ((b_i - μ_{2i})/σ2)², and f' = (n + 2a')³/(n + 3a')². Expressions (1)-(5) connect the values σ1, σ2, μ_1, μ_2 with the variables R and η characterizing the quality of classification.

4. Rare Events Classification Applying a Neural Network

The general scheme of the method involves a two-step procedure. At the first stage the training of the network is performed for a ratio of 1:1 between the classes being separated, while at the second stage a correction is carried out of a certain group of network parameters termed shifts. In a number of simple cases the correction permits simple analytic transformations, and in more complicated cases it requires the minimization of a functional in the space of shifts for given weights of the neural connections. In the general case a change of
Fig. 1. The efficiency of the method for two relations between the contributions of the events: 1) P(ω1) = 0.1, P(ω2) = 0.9 (two top charts); 2) P(ω1) = 0.001, P(ω2) = 0.999 (two bottom charts).
the relation between P(ω1) and P(ω2) leads to some transformation of the separating surface, which in special cases is determined by a relation of similarity. In the process of constructing the separating boundary with the help of a multilayer perceptron with one layer of hidden neurons having a step transfer function, a fitting of the parameters approximating this boundary is carried out. A change of the relation between P(ω1) and P(ω2) corresponds to a parallel translation of each hyperplane; the value of this shift is determined by the threshold of the network. The method is best considered using the example of the previous chapter. In this case a change in the relationship between P(ω1) and P(ω2) only results in a change of the radius of the separating hypersphere, i.e. it is determined by similarity relations. It can be readily shown that for known radii, r and r', of the Bayes hyperspheres the
shift (θ_j) of each neuron of the hidden layer should be recalculated by the formula

θ'_j = θ_j + (r' - r) [ Σ_{i=1}^n w_{ij}² ]^{1/2},    (6)
where wij is the weight connecting the j-th neuron of the hidden layer with the i-th input neuron.
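The similarity relation and correction (6) can be sketched together: for the concentric case the Bayes radius is recomputed for the new priors, and each hidden-layer threshold is translated along its hyperplane normal accordingly. The function names and the closed-form radius below are our reconstruction, not code from the paper.

```python
import math

def bayes_radius(n, s1, s2, p1, p2):
    """Radius of the Bayes separating hypersphere for two concentric
    n-dimensional Gaussians N(0, s1^2 I), N(0, s2^2 I), s2 < s1, with
    prior probabilities p1, p2 (events inside the sphere go to class 2)."""
    r2 = (s1**2 * s2**2 / (s1**2 - s2**2)) * (
        n * math.log(s1**2 / s2**2) + 2 * math.log(p2 / p1))
    return math.sqrt(r2)

def correct_shifts(theta, w, r_old, r_new):
    """Formula (6): translate each hidden-layer hyperplane along its normal
    by the change of radius; w[i][j] is the weight connecting input i with
    hidden neuron j, theta[j] its shift."""
    return [theta[j] + (r_new - r_old) *
            math.sqrt(sum(row[j] ** 2 for row in w))
            for j in range(len(theta))]
```

After training at P(ω1) = P(ω2) = 0.5, switching to P(ω1) = 0.1, P(ω2) = 0.9 changes r to r' and the shifts are recomputed without retraining the weight matrix.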
For the practical realization of the method a multilayer perceptron simulator from the package JETNET-3 [2] was used. Two versions of the method were considered: 1) with recalculation of the values of the shifts after network training for a ratio P(ω1) = P(ω2), and 2) with determination of the indicated values by means of minimization of the network functional in the process of its repeated training. In the second case the same data were supplied to the input of the neural network with a fixed weight matrix. The problem of separation of two Gaussians with diagonal matrices Σ_j = σ_j² I, j = 1, 2, in a space of dimension n = 5 and with σ1 = 1, σ2 = 0.3 was considered. In Fig. 1 the values of the variables R and η are presented for the case μ_1 = μ_2 = 0 and P(ω1) = 0.1, P(ω2) = 0.9 (Figs. 1a and 1b) and for the case |Δμ_i| = 1, i = 1, 2, …, 5 and P(ω1) = 0.001, P(ω2) = 0.999 (Figs. 1c and 1d). The presented values are marked by asterisks for the case of minimization of the functional, and by squares for the case of recalculating the shifts using formula (6). Open symbols refer to the training sample, shaded ones to the control sample. The results concerning the approximate Bayesian classification (denoted by circles) are presented for comparison. Moreover, the results of training and testing of the network after the first stage of the correction procedure, corresponding to the ratio P(ω1) = P(ω2) = 50% (denoted by triangles), as well as curves corresponding to Bayesian classification, are presented. Some deviations from the theory are caused by insufficiently thorough training at the first stage of the correction procedure, which characterizes the accuracy of the method.

5. Conclusion

A method for solving the problem of the classification of small-probability events was developed. It is based on the application of one type of neural network: a multilayer perceptron with a single layer of hidden neurons having a step transfer function.
The developed approach allows the effective use of a neural network for the identification of rare events whose contribution does not exceed 0.1%. The method is most relevant in the case of small dimensions, n < 10-20.

Acknowledgments

This work has been partly supported by the Commission of the European Community within the framework of the EU-RUSSIA Collaboration, in accordance with the project ESPRIT 21042: Computational Tools and Industrial Applications of Complexity.
Bibliography
[1] R.O. Duda and P.E. Hart: Pattern Classification and Scene Analysis. Wiley, New York, 1973.
[2] C. Peterson, T. Rögnvaldsson and L. Lönnblad: "JETNET 3.0 - A Versatile Artificial Neural Network Package", CERN-TH.7135/94, December 1993.
Cellular Automaton and Elastic Neural Network Application for Event Reconstruction in High Energy Physics

I. Kisel, E. Konotopskaya, V. Kovalenko
Joint Institute for Nuclear Research, Dubna, 141980 Russia

Abstract
We use a cellular automaton for filtering data and an elastic net for the geometrical reconstruction of events in high energy physics. The advantages of the methods are the simplicity of the algorithms, fast and stable convergence, and a reconstruction efficiency close to 100%. These methods were tested with success on simulated events and real data obtained in the experiments NEMO (Modane, France) and MM̄ (PSI, Switzerland).
1 Introduction
The rapid development during the last 10-15 years of various theories of artificial neural networks [1] reflected an attempt to overcome the gulf between the huge amount of factual material relating to the biological mechanisms of brain operation accumulated in neurophysiology and the inadequacy of the existing mathematical formalism and of the computational means for its technical realization. The principal advantages of the brain in fulfilling logical, recognition, and computational functions, using capabilities that are essentially parallel, nonlinear, and nonlocal, did not match the prevailing principle of sequential calculations, with the orientation of the mathematical formalism toward locality, linearity, and stationarity of the descriptions. Included among the affected problems are those whose solution is complicated precisely by nonlinearity, nonlocality, discreteness, and, often, nonstationarity of the situation: for instance, problems of pattern recognition, construction of associative memory, and optimization. Essentially, the theory of artificial neural networks is a part of the general theory of dynamical systems in which particular attention is devoted to the investigation of the complicated collective behavior of a very large number of comparatively simple logical objects. Cellular automata, which have their own significance, can be regarded as a local discrete form of neural networks. They are used in particular in high energy physics for data filtering and track searching. Here we describe an application of a cellular automaton for searching for tracks and an elastic neural net for fitting tracks in the NEMO experiment [2] and for searching for the vertex in the MM̄ experiment [3].
2 NEMO experiment
The goal of the NEMO collaboration¹ is to study ββ decays of ¹⁰⁰Mo and other nuclei to probe the effective Majorana neutrino mass down to 0.1 eV. The collaboration is building the NEMO-3 detector to realize this. A prototype detector, NEMO-2, designed for ββ studies, has already provided some measurements and is presently running in the Fréjus Underground Laboratory. The detector is a 1 m³ volume made of tracking chambers composed of drift cells operating in the Geiger mode and two plastic scintillator arrays for energy and time-of-flight measurements. A typical event in this experiment has a small number of tracks, usually well separated in space. But this situation is complicated by the essential effect of multiple scattering and even hard scattering on wires.

2.1 Cellular automaton for track searching
Searching for tracks in the presence of the left-right ambiguity of drift tubes and significant effects of multiple scattering in the gas, and even hard scattering on wires, becomes a task lying outside the typical problems of event reconstruction in high energy physics. Therefore the method of cellular automata was chosen as a flexible one that has proved itself in nonstandard situations. Cellular automata are dynamical systems that evolve in discrete, usually two-dimensional, spaces consisting of cells. Each cell can take several values; in the simplest case one has a single-bit cell: 0 and 1. The laws of

¹http://nuweb.jinr.dubna.su/LNP/NEMO
evolution are local, i.e., the dynamics of the system is determined by an unchanged set of rules (for example a table) in accordance with which the new state of a cell is calculated on the basis of the states of the nearest neighbors surrounding it. It is important that this change of states is made simultaneously and in parallel, with time proceeding discretely. Cellular automata became particularly popular in the 1970s through the publications of M. Gardner in Scientific American devoted to Conway's game, Life. Specific features of the experiment make preferable the segment model of cellular automaton, where the elementary unit (cell) is a segment connecting two fired wires in neighboring layers. To construct a cellular automaton for track searching in NEMO-2 data, one proceeds following the logic of cellular automata. First, note that the cellular automaton is three-dimensional. A cell is identified with a straight-line segment connecting two fired wires in neighboring drift tube layers, making the cellular automaton essentially local. To take into account Geiger tube inefficiencies one must also include segments connecting wires which skip one layer. At each step an individual cell has two possible states: 1, if a segment can be a part of a track, and 0 otherwise. Second, in establishing the criterion for assigning segments to a track, it is obvious that only segments with a common extremity can be considered as neighbors. Owing to the discrete structure of the coordinate detectors and to multiple scattering in the material of the experimental apparatus, the angles between track segments in the real experiment are not zero, but an upper limit θmax can be imposed. Third, all cells are initialized with the state 1, and at each step of the evolution they look at their neighbours and change state to 0 if there are no neighbours with state 1 on both sides. Neighboring segments forming a small angle are preferred during the evolution.
Fourth, time proceeds as usual and evolves discretely; all cells change their states simultaneously. The cellular automaton has the following features (compared with the previous track finder): an increase in the tracking efficiency of 9%; an increase by a factor of 35 in the processing speed; working in 3D space; good reconstruction of tracks with hard scattering on wires; reconstruction of short tracks; simplicity of modification.

2.2 Elastic net for tracking

Let us consider only single straight tracks outside a magnetic field. There is no noise, which removes the need for track searching, and there are no missing wires, which slightly simplifies the algorithm. It is natural to
define a track with multiple scattering as the smoothest line touching all the circles around fired wires and crossing all layers. Let us try to construct the elastic net as a line which is deformed under the influence of two types of force:
1. the first pulls it to the edges of the circles;
2. the second smooths out the line.
In the case of the left-right ambiguity of drift tubes the task can be considered as a problem of minimization of a function of many variables with many local minima. To solve this problem we propose to start from two points surrounding the global minimum and covering the whole possible area of the physical region of the parameters to be found. These points are not independent but attract each other. So the points will pull each other out of all local minima until the global minimum is reached. According to this idea we start from two bounding tracks which restrict the geometrical area of the real track. One of them touches the circles on the upper sides and the other on the lower sides. Then we introduce a third type of force:
3. attraction between these bounding tracks, to squeeze the geometrical area down to the real track.
This method allows finding an optimal trajectory which corresponds to our model of a track. The elastic net can be simply modified to be able to reconstruct broken tracks: we have only to switch off the track smoothing at a break point, which has to be found during a preliminary analysis. An example of the evolution of the elastic net for a multiply scattered track is presented in Fig. 1. Layers are numbered from left to right and iterations go from top to bottom. There are two starting tracks: the upper and the lower one. One can see the smoothing of the upper track at the first layer after the first iteration; then this track becomes almost linear at the left group of layers and is stable, being attracted to the neighboring edges of the circles. It is in a local minimum and moves down only under the pressure of the third type of force, the attraction to the lower track. The middle part of the upper track is smoothed at the beginning and then evolves mainly due to the attraction to the lower track. The right part of the tracks is in an equilibrium stage at the middle of the evolution and goes to the global minimum only due to smoothing.
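A one-dimensional sketch of the elastic line is given below: one coordinate per layer, a force pulling each point to the nearest edge of its drift circle, and a smoothing force proportional to the discrete curvature. The per-layer parametrization, the approximate gradient, and the parameters `lam` and `lr` are simplifying assumptions; the full method also evolves the second bounding track and the mutual attraction between the two.

```python
def elastic_line(y0, centers, radii, lam=1.0, lr=0.05, iters=500):
    """y0: initial coordinate per layer; (centers[k], radii[k]): the drift
    circle of layer k, reduced to one dimension. Returns the relaxed line."""
    y = list(y0)
    n = len(y)
    for _ in range(iters):
        grad = [0.0] * n
        for k in range(n):
            # force 1: pull toward the nearest edge of the drift circle
            d = y[k] - centers[k]
            edge = centers[k] + (radii[k] if d >= 0 else -radii[k])
            grad[k] += 2.0 * (y[k] - edge)
            # force 2: smoothing, proportional to the discrete curvature
            if 0 < k < n - 1:
                grad[k] += 2.0 * lam * (2.0 * y[k] - y[k - 1] - y[k + 1])
        y = [y[k] - lr * grad[k] for k in range(n)]
    return y
```

Starting the line above all circles and a second line below them, and adding a mutual attraction term, reproduces the squeezing behaviour described in the text.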
Figure 1" Example of the elastic net evolution for a track with multiple scattering.
Figure 2: Number of iterations for convergence.
The elastic net method shows fast convergence to the track within a few iterations (see Fig. 2) and a reconstruction efficiency close to 100%.

3 MM̄ experiment
This is a new experimental search for the lepton-number-violating process M = (μ⁺e⁻) → M̄ = (μ⁻e⁺), a spontaneous conversion of muonium (M) into antimuonium (M̄). This process is forbidden in the standard electroweak theory but allowed in some modern theories beyond the standard model. The experiment is performed at the proton cyclotron of the Paul Scherrer Institute (PSI) in Villigen, Switzerland. The detector consists mainly of two parts. The first one is a magnetic spectrometer with a large solid angle. It consists of five cylindrical multiwire proportional chambers and one scintillator hodoscope built up of 64 strips surrounding the chambers. The target is located at the center of the detector. The second part of the detector, the positron detector, consists of a position-sensitive microchannel plate and 12 segmented CsI crystals surrounding it.
3.1 Elastic net for vertex search
We use the elastic net for the vertex search in the case of arc tracks. This kind of task appears in many experiments, so the algorithm can be applied widely. The main problem here is caused by the big target (~10 cm in diameter) used in the experiment, so we have no good initial approximation for the least squares method usually applied to such tasks. Another feature of the experiment is the varying number of tracks per event, up to 10, so the algorithm must search for the vertex in events with any number of tracks. Let us
define the vertex as the geometrical point with the maximum density of tracks. We construct the algorithm on the basis of an elastic ring, introducing only two types of force:
1. attraction of the ring to all tracks, placing it at the condensation area of the tracks;
2. attraction to the nearest tracks, localizing the vertex region.
An example of testing the algorithm on simulated events is presented in Fig. 3. Here one can see 3 tracks crossing the big target. The iterative procedure is also shown in the picture by circles converging to the vertex. A comparison of the elastic net method with the fast vertex search method [4] based on the Chebyshev metric was also made. The good correlation between the vertex errors of the elastic net method and the method based on the Chebyshev metric (Fig. 4) shows the reliable working of the method.
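For straight-line tracks the first force alone already drives a trial vertex to the least-squares point of closest approach; the sketch below uses that simplification (the real tracks here are arcs, and the shrinking of the elastic ring to the vertex region is omitted).

```python
def find_vertex(tracks, start=(0.0, 0.0), lr=0.2, iters=500):
    """tracks: list of ((px, py), (dx, dy)) with (dx, dy) a unit direction.
    The trial vertex is attracted to the foot of the perpendicular on every
    track, converging to the point of maximum track density."""
    vx, vy = start
    for _ in range(iters):
        gx = gy = 0.0
        for (px, py), (dx, dy) in tracks:
            t = (vx - px) * dx + (vy - py) * dy   # projection parameter
            gx += px + t * dx - vx                 # pull toward the foot
            gy += py + t * dy - vy
        vx += lr * gx / len(tracks)
        vy += lr * gy / len(tracks)
    return vx, vy
```

Because the update averages the pulls from all tracks, the same loop works for any number of tracks per event, as the text requires.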
Figure 3: Vertex search in a 3-track event. The iterative procedure is presented by circles converging to the vertex.
Figure 4: Correlation between the vertex errors of the elastic net method and the method based on the Chebyshev metric.

4 Conclusion
The results of testing on simulated and real NEMO tracks and on simulated and real MM̄ events demonstrate the reliable working of the cellular automaton and of the elastic net method. The advantages of the methods are:
- simplicity of the algorithms;
- fast and stable convergence;
- high reconstruction efficiency.
This work is partially supported by the Commission of the European Community within the framework of the EU-RUSSIA Collaboration under the ESPRIT contract P9282-ACTCS.
References
[1] I. Kisel, V. Neskoromnyi and G. Ososkov: Applications of Neural Networks in Experimental Physics. Phys. Part. Nucl. 24 (6), November-December 1993, p. 657.
[2] R. Arnold et al. (NEMO Collaboration): Performance of a Prototype Tracking Detector for Double Beta Decay Measurements. Nucl. Instr. and Meth. A354 (1995) 338.
[3] W. Bertl et al. (MM̄ Collaboration): Searching for Muonium-Antimuonium Oscillations. Proc. IV Int. Symp. on Weak and Electromagnetic Interactions in Nuclei (WEIN'95), Osaka, ed. H. Ejiri et al., World Scientific Publ., Nov. 1995.
[4] N. Chernov, A. Glazov, I. Kisel, E. Konotopskaya, S. Korenchenko and G. Ososkov: Track and Vertex Reconstruction in Discrete Detectors Using Chebyshev Metrics. Comp. Phys. Commun. 74 (1993) 217.
RECOGNITION OF TRACKS DETECTED BY DRIFT TUBES IN A MAGNETIC FIELD

Baginyan S.A., Ososkov G.A.
Joint Institute for Nuclear Research, Dubna, Russia

Abstract

An algorithm of track recognition in a uniform magnetic field is proposed for a drift straw tube detecting system of solenoidal geometry. The problem solution is given for the (x, y) plane perpendicular to the magnetic field. Our algorithm is elaborated on the basis of (1) the sequential histogramming method, which is, in fact, a modification of the Hough transform, and (2) a modification of the deformable template method followed by a special procedure of parameter correction. Tested on simulated events, our method shows satisfactory efficiency and accuracy in the determination of particle momenta.
1 Introduction

The efficiency of a track reconstruction algorithm depends on the reasonability of the clustering method applied to group the measured data points into track candidates. As examples of such reasonable algorithms one can point out well-known methods like variable slope histogramming or stringing (track following) methods [1], as well as relatively new approaches like Hopfield neural networks [2]. One of the detector systems widely used in modern experiments of high energy physics (ATLAS, EVA/E850) is the drift straw tube detector (DSTD). Each time a passing particle track hits a tube, the tube registers two data: its own center coordinate and the drift radius, i.e. the drift distance between the particle track and the anode wire situated in the center of this tube. The main problem which hinders the application of the above mentioned conventional track recognition methods is the so-called left-right ambiguity of the drift radii: they do not contain information about which side of the anode wire the track passed. In this report an algorithm of track recognition in a uniform magnetic field is proposed for a DSTD system of solenoidal geometry. The problem solution is given for the (x, y) plane perpendicular to the magnetic field and to the anodes of the drift straw tubes. Our algorithm is elaborated on the basis of modifications of the Hough transform and deformable template methods. However, the main features of the proposed algorithm have a common character and are independent of the experimental setup geometry.
2 Formulation of the Problem
The set S = {x_i, y_i; r_i, i = 1, N}, where (x_i, y_i) are the coordinates of the hit tube centers and r_i are the drift radii, is the result of the event measurements. Geometrically the set S can be considered as a set of circles on the plane with centers (x_i, y_i) and radii r_i. Thus the mathematical formulation of the problem is to draw the track line as a circle (a, b, R) tangential to the maximum number of these little circles from S. Let us introduce, as a measure of the tangency of two circles on the plane, the minimum distance between the crossing points of these circles with the straight line joining the centers of both circles. If two circles are tangential, their tangency measure is obviously equal to zero. Then the above formulated problem can be reformulated as follows: to find such a circle
(a, b, R) that minimizes the sum of its tangency measures with all circles from the set S. Let us denote by D_i(a, b, R) the distance from the center of the circle (x_i, y_i; r_i) to the circle (a, b, R). This variable can take both positive and negative values. Therefore the squared tangency measure of the two circles (x_i, y_i; r_i) and (a, b; R) is twofold: if D_i(a, b; R) > 0, then d_i^- = (D_i(a, b; R) - r_i)², otherwise d_i^+ = (D_i(a, b; R) + r_i)². As in [3] we define the two-dimensional vector s_i = (s_i^+, s_i^-) with admissible values (1, 0), (0, 1), (0, 0); s_i = (0, 0) means that the i-th tube is a noise tube, and the combination s_i = (1, 1) is forbidden. Let us denote by Δ the measurement error of the drift radius and define a functional L depending on the five parameters (a, b, R, s_i^-, s_i^+):

L = Σ_{i=1}^{N} [ s_i^+ d_i^+ + s_i^- d_i^- + Δ² (1 - s_i^+ - s_i^-) ].    (1)
Thus to recognize a track one has to: (1) from the set of all measurements extract a subset S which as far as possible contains all data for one of the tracks; (2) find the global minimum of L (although it would be enough to reach its close vicinity). To solve the first problem we modify the Hough transform method [4], which, following [5], we call the method of sequential histogramming by parameters (SHPM). Besides extracting a subset S, SHPM also provides starting values of the circle (a0, b0; R0) needed to solve the problem at the next step. The second problem is solved by the deformable template method (DTM) with a special correction of the parameters of the obtained tracks.
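The tangency measures and the functional can be sketched directly from the definitions above; the Δ² weight of the noise hypothesis follows our reconstruction of (1) and should be read as an assumption.

```python
import math

def tangency_sq(tube, circle):
    """Squared tangency measures (d_plus, d_minus) of a drift circle
    (x, y, r) with respect to a track circle (a, b, R)."""
    x, y, r = tube
    a, b, R = circle
    D = math.hypot(x - a, y - b) - R   # signed centre-to-track distance
    return (D + r) ** 2, (D - r) ** 2

def functional_L(tubes, circle, delta):
    """Value of L under the optimal hard assignment of each s_i: every tube
    contributes the cheapest of its three hypotheses (left, right, noise)."""
    total = 0.0
    for tube in tubes:
        d_plus, d_minus = tangency_sq(tube, circle)
        total += min(d_plus, d_minus, delta ** 2)
    return total
```

A tube exactly tangent to the track circle contributes zero, while a tube far from it contributes at most the noise penalty Δ².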
3 Sequential histogramming method
Let Ω = {X_i, Y_i, i = 1, N} be a set of coordinates X_i, Y_i measured in the process of registering an event. The so-called sequential histogramming method [5] gives the following algorithm for finding the initial track parameters:
1. Circles are drawn through all admissible point triplets. Then the first coordinate a_j of each circle is histogrammed. The value a_m corresponding to the maximum of this histogram is obtained.
2. With a_m fixed, circles are drawn through all admissible pairs of points from Ω. Then the second coordinate b_j of each circle is histogrammed. The value b_m corresponding to the maximum of this second histogram is obtained.
3. With the fixed coordinates of the center a,~, b,~ all admissible points Rj of the set are histogrammed. The value Rm is obtained corresponding to the maximum of this third histogram. Then the obtained parameters (am, bin; Rm) are subjected to more sophisticated tests and specifyings. If results are positive, i.e parameters (a,~, bin; Rm)are accepted as a true track, all measurements corresponding to it are eliminated from the set ~ and the whole procedure is repeated starting from the step 1. If the circle (a,~, b,~; R,~) is rejected by testing, then select next combination of parameters. In order to apply SHPM the results of measurements must have a format of the ~-set, i.e. to be a set of track point coordinates.. However, we have instead the set S of little circles {xi, y~; r~, i = 1,N}, so we have to determine on each of these circles a point associated
with one of the tracks. Supposing that the vertex area, from which all tracks of the given event emanate, is known, such a point can be roughly determined as the tangent point of the tangent line drawn to each little circle (x_i, y_i; r_i) from the centre of the vertex area. Each little circle thus yields two possible track points. This does not prevent us from applying the SHPM, but it should be kept in mind that this left-right ambiguity doubles the number of elements of the set Ω = {X_i, Y_i, i = 1, ..., 2N} in comparison with the number of elements in the original set S = {x_i, y_i; r_i, i = 1, ..., N}.
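The three histogramming passes of the SHPM can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (`circle_through`, `histogram_peak`, `shpm`), the bin count, and the omission of the "admissibility" tests are all assumptions of this sketch.

```python
import itertools
import numpy as np

def circle_through(p1, p2, p3):
    """Centre (a, b) and radius R of the circle through three points,
    or None if the points are (nearly) collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-9:
        return None
    a = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
         + (x3**2 + y3**2) * (y1 - y2)) / d
    b = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
         + (x3**2 + y3**2) * (x2 - x1)) / d
    return a, b, float(np.hypot(x1 - a, y1 - b))

def histogram_peak(values, bins=50):
    """Midpoint of the most populated histogram bin."""
    hist, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])

def shpm(points, bins=50):
    """Sequential histogramming: fix a, then b, then R."""
    pts = np.asarray(points, float)
    # 1. circles through all point triplets -> histogram the first coordinate
    circles = [c for c in (circle_through(*t)
                           for t in itertools.combinations(pts, 3)) if c]
    a_m = histogram_peak([c[0] for c in circles], bins)
    # 2. with a_m fixed, each point pair determines a centre (a_m, b)
    bs = []
    for (x1, y1), (x2, y2) in itertools.combinations(pts, 2):
        if abs(y1 - y2) > 1e-9:
            bs.append(((x1 - a_m)**2 + y1**2 - (x2 - a_m)**2 - y2**2)
                      / (2 * (y1 - y2)))
    b_m = histogram_peak(bs, bins)
    # 3. with the centre fixed, every point gives a radius candidate
    r_m = histogram_peak(np.hypot(pts[:, 0] - a_m, pts[:, 1] - b_m), bins)
    return a_m, b_m, r_m
```

In the full algorithm each pass would also restrict the surviving combinations to those compatible with the previous histogram maximum; the sketch keeps all of them for brevity.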
4
Deformable template method
After obtaining the initial track-parameter values by the SHPM and choosing an area where the track could lie, we proceed to look for the global minimum of the functional L (1). One of the main problems here is how to avoid local minima of L provoked by the stepwise character of the behaviour of the vector s_i = (s_i^+, s_i^-). A known way to avoid this obstacle is the standard mean field theory (MFT) approach, which leads to a simulated annealing schedule [6]. As was shown in [3], the parameters s_i^+ and s_i^- of the functional L with fixed (a, b; R) can be calculated by formulae in which the stepwise behaviour of the vector s_i is in fact replaced by a sigmoidal one. The global minimum of L is sought according to the following scheme:
1. Three temperature values are taken: high, middle and one in the vicinity of zero, as well as the three noise levels corresponding to them [3, 6].
2. According to the simulated annealing schedule, our scheme starts from the high temperature. With the initial circle values (a_0, b_0; R_0), the parameters s_i^+, s_i^- are calculated.
3. For the obtained s_i^+, s_i^-, new circle parameters (a, b; R) are calculated by the standard gradient descent method.
4. The stopping rule is standard.
5. If the conditions of step 4 are not satisfied, then with the new circle parameters (a_{k+1}, b_{k+1}; R_{k+1}) the next values of s_i^+, s_i^- are calculated and we go back to step 3.
6. After the process has converged at the given temperature, the temperature is changed (the system is cooled), the values of (a, b; R) achieved at the previous temperature are taken as starting values, and we go to step 2 again.
7. At each temperature value, after completing step 5 the condition L < L_cut is tested. If it is satisfied, our scheme is completed and the algorithm proceeds to the next stage of correcting the obtained track parameters (a, b; R). Otherwise, if at the temperature in the vicinity of zero we still obtain L > L_cut, a diagnostic is issued that the track-finding scheme has failed.
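The annealing scheme above can be sketched compactly. The sigmoidal (mean-field) formulae of [3] are not reproduced in the paper, so this sketch substitutes a standard Boltzmann/softmax assignment with a noise weight `lam`, a fixed iteration count in place of the stopping rule, and a numerical gradient; all names, the temperature schedule and the learning rate are assumptions of the illustration.

```python
import numpy as np

def tube_residuals(params, tubes):
    """Squared tangency measures d_minus, d_plus for every drift tube."""
    a, b, R = params
    x, y, r = tubes.T
    D = np.hypot(x - a, y - b) - R      # signed distance tube centre -> circle
    return (D - r) ** 2, (D + r) ** 2

def mft_assignments(d_minus, d_plus, T, lam):
    """Mean-field (sigmoidal) relaxation of the binary s_i components;
    lam is the noise weight competing with both hypotheses."""
    w = np.exp(-np.c_[d_minus, d_plus] / T)
    z = w.sum(axis=1) + np.exp(-lam / T)
    return w / z[:, None]               # columns: s_minus, s_plus

def fit_track(tubes, p0, temps=(100.0, 10.0, 0.1), lam=1.0,
              lr=0.01, iters=300):
    """Annealing schedule: alternate the s-update with gradient descent
    on the circle parameters, then cool the system."""
    p = np.asarray(p0, float)
    for T in temps:
        for _ in range(iters):
            dm, dp = tube_residuals(p, tubes)
            s = mft_assignments(dm, dp, T, lam)
            g = np.zeros(3)             # numerical gradient, s held fixed
            for k in range(3):
                q = p.copy()
                q[k] += 1e-5
                dm2, dp2 = tube_residuals(q, tubes)
                g[k] = (s[:, 0] * (dm2 - dm) + s[:, 1] * (dp2 - dp)).sum() / 1e-5
            p -= lr * g
    return p
```

At high temperature the assignments are nearly uniform, so the circle settles into a broad average; as the system is cooled, each tube commits to its left or right hypothesis (or to noise), as in the scheme above.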
5
Procedure of the track parameter correction
The deformable template method provides us with track parameters (a, b; R). However, these parameters may still lie rather far from the global minimum of L. Therefore we have to elaborate an extra stage for the track parameter correction.
On each circle of the set S = {x_i, y_i; r_i, i = 1, ..., N}, taking into account the corresponding values of s_i, the point nearest to the track candidate is found. All these points are then approximated by a circle, and a χ² value is calculated as a criterion of their smoothness and fit quality. If χ² < χ²_cut holds, the approximating parameters (a_c, b_c; R_c) are accepted as true; otherwise the track candidate is rejected.
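The correction stage can be illustrated as below. The paper does not state which circle fit is used, so this sketch assumes the algebraic (Kasa) least-squares fit; the names `nearest_points`, `sigma` and `chi2_cut` are hypothetical.

```python
import numpy as np

def nearest_points(track, tubes):
    """On each drift circle, the point nearest to the track candidate."""
    a, b, R = track
    x, y, r = tubes.T
    vx, vy = x - a, y - b               # track centre -> tube centre
    n = np.hypot(vx, vy)
    s = np.sign(R - n)                  # step towards the track circle
    return np.c_[x + s * r * vx / n, y + s * r * vy / n]

def kasa_circle_fit(pts):
    """Algebraic least-squares circle fit: x^2+y^2 = 2ax + 2by + c."""
    x, y = pts.T
    A = np.c_[2 * x, 2 * y, np.ones(len(x))]
    c, *_ = np.linalg.lstsq(A, x**2 + y**2, rcond=None)
    a, b = c[0], c[1]
    R = np.sqrt(c[2] + a**2 + b**2)
    return a, b, R

def correct_track(track, tubes, sigma=0.1, chi2_cut=20.0):
    """Fit a circle to the nearest points; accept it if chi^2 is small."""
    pts = nearest_points(np.asarray(track, float), np.asarray(tubes, float))
    a, b, R = kasa_circle_fit(pts)
    resid = np.hypot(pts[:, 0] - a, pts[:, 1] - b) - R
    chi2 = float(np.sum((resid / sigma) ** 2))
    return (a, b, R) if chi2 < chi2_cut else None
```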
6
Results
The proposed algorithm for finding tracks detected by the DSTD system in a magnetic field was tested on simulated events. 990 tracks were modelled as circle arcs with radii in the range from 40 cm to 5 m, emanating from a target under various angles. 955 of the 990 tracks were recognized correctly, which corresponds to an algorithm efficiency of 96.4%. The distribution of the relative radius error shows that its mean and RMS are of the order of 10^-2 of the radius, which is satisfactory for the experimental setup considered.
References
[1] H. Grote, Pattern Recognition in High Energy Physics, CERN 81-03, 1981.
[2] C. Peterson, B. Söderberg, Int. J. of Neural Syst. 1, 3 (1989).
[3] S. Baginyan et al., Application of deformable templates for recognizing tracks detected with high pressure drift tubes, JINR Commun. E10-94-328, Dubna, 1994.
[4] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis (Wiley, New York, 1973).
[5] Yu. A. Yatsunenko, Nucl. Instr. and Meth. A287 (1990), 422-430.
[6] M. Ohlsson et al., Track Finding with Deformable Templates - The Elastic Arms Approach, LU TP 91-27.
Session D: ADAPTIVE SYSTEMS I: IDENTIFICATION AND MODELING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
AN UNIFIED CONNECTIVE REPRESENTATION FOR LINEAR AND NONLINEAR DISCRETE-TIME SYSTEM IDENTIFICATION
Jacques FANTINI
L.A.M.A., Université d'Orléans, E.S.E.M., rue Léonard de Vinci, 45072 ORLEANS Cedex 2
Abstract
System identification is the subject of much research, and many articles propose often complex computation algorithms. Moreover, the models and identification methods used differ according to whether the real system is linear or nonlinear. This paper presents an identification methodology based on a single model deduced from neural network theory. It measures and analyses the degree of precision obtained, and defines the parameters that influence the convergence of the network errors.
1. INTRODUCTION
In order to regulate and control a real system, a mathematical representation is required which provides a satisfactory estimation of the process and which can be obtained by identification. The particularity of nonlinear systems lies in the fact that the principle of superposition is not applicable. Thus the identification algorithms currently proposed are based either on approximation principles or on methods which cannot be generalized.
2. DISCRETE-TIME REPRESENTATION AND CONNECTIVE MODELISATION
2.1 Linear systems
Let [Y] = [F][U] be the transfer-matrix representation of a multivariable system, assumed controllable and observable, with [Y] the output vector of dimension ny, [U] the command vector of dimension nu, and f_ij the element ij of the matrix F (ny x nu), a rational fraction whose numerator Σ_{l=1..m} b_l z^l has degree m < n, the degree of the characteristic polynomial. The output y^i is linked to the command u^j by the following characteristic polynomial of degree n:

  y^i_k = Σ_{l=1..m} a_l y^i_{k-l} + Σ_{l=0..m} b_l u^j_{k-l},   (1)

where y^i_r is the value of y^i and u^j_r the value of u^j in the time range [rΔ, (r+1)Δ[, with Δ the sample period. The determination of all the coefficients a_l and b_l of [F] and of the degree n of each characteristic polynomial is a necessary and sufficient condition for defining a satisfactory representation of the system.
Consider the neural network of figure 2.1, with p and q fixed by n and m, and the transformation of variables (T)

  Y^i_r = (1/N) exp(y^i_r) / (exp(y^i_r) + 1),   U^j_r = (1/N) exp(u^j_r) / (exp(u^j_r) + 1),

bijections of the set of real numbers into [0, 1/N[. The activation function of each neuron is f_act(x) = sh(x), an odd, increasing function, for which there exists a derivative, and which has the following property:

  f_act(x) = Σ_{n>=0} x^(2n+1)/(2n+1)! = x + x·ε,  with ε -> 0 when x -> 0.

The output expression of the neuron r is O_r = f(I_r) = sh(w_{r,l} E^i_{k-l} + w_{r,l+1} E^i_{k-l+1}). Under hypothesis (H1): sh(x) ≈ x, whose conditions and limits of validity are set out in section 4, O_r = w_{r,l} E^i_{k-l} + w_{r,l+1} E^i_{k-l+1}, and the network output becomes

  y^i_k = Σ_{l=1..q} { hy_l [wy_{l,2l-1} Y^i_{k-(2l-1)} + wy_{l,2l} Y^i_{k-2l}] + hu_l [wu_{l,2l-1} U^j_{k-(2l-1)} + wu_{l,2l} U^j_{k-2l}] }.   (2)

From the identity (1) = (2) it follows that

  a_r = hy_{δ(r)} wy_{δ(r),r} Y^i_{k-r} / y^i_{k-r},   b_r = hu_{δ(r)} wu_{δ(r),r} U^j_{k-r} / u^j_{k-r},   r in [1, m],

with δ(r) = whole part of (r+1)/2.
2.2 Nonlinear systems
Let e(t) = e_0 sin ωt be applied to the input e of a nonlinear system. The output y is a non-sinusoidal periodic function decomposable into a Fourier series y(t) = Σ_{i=1..∞} s_i sin(iωt + ψ_i). Its discrete-time representation is

  S(z^-1) = Σ_{i=1..∞} s_i [sin ψ_i + z^-1 sin(iωT - ψ_i)] / (1 - 2 z^-1 cos iωT + z^-2),

and E(z^-1) = e_0 z^-1 sin ωT / (1 - 2 z^-1 cos ωT + z^-2), with T the sample period. The transfer function F(z^-1) = Y(z^-1)/E(z^-1) of the nonlinear system becomes

  F(z^-1) = Σ_{i=1..∞} [s_i / (e_0 sin ωT)] · (Σ_j b_{j,i} z^-j) / (Σ_{j>=0} a_j z^-j).

The output y is linked to the command e by the following characteristic polynomial:

  y_k = Σ_{i=1..∞} [s_i / (e_0 sin ωT)] Σ_j b_{j,i} e_{k-j} + Σ_j a_j y_{k-j}.   (3)

For a multivariable system, each output y^i is linked to the input e^j by the same polynomial statement.
(figure 2.1: the connective network for the linear case - inputs the delayed values Y^i_{k-l} and U^j_{k-l}, one hidden layer with weights wy_{l,r}, wu_{l,r}, output weights hy_l, hu_l, output y^i_k. figure 2.2: the corresponding network for the nonlinear case - inputs E^j_{k-l}, hidden weights we_{l,r}, output weights he_l.)
Given the neural network of figure 2.2, with the same definitions as stated in 2.1, the output expression of the neuron r is O_r = f(I_r) = sh(w_{r,l} E^i_{k-l} + w_{r,l+1} E^i_{k-l+1}). Hypothesis (H1) => O_r = w_{r,l} E^i_{k-l} + w_{r,l+1} E^i_{k-l+1}.   (4)
From the identity (3) = (4) it follows that

  a_r = hy_{δ(r)} wy_{δ(r),r} Y^i_{k-r} / y^i_{k-r},   b_r = e_0 sin ωT · he_{δ(r)} we_{δ(r),r} E^i_{k-r} / e^i_{k-r}.
2.3 Identification methodology
The set of data y^i_k, the discrete-time responses of the system subjected to the commands u^j (e^j), is known and defines the information vectors of the neural network. Thus, for every bounded k > 0, the training sample is defined by the s couples (X_k, y^i_k), with X_k = {Y^i_l, U^j_l, l in [k - 1, k - m]} the input vector of the network and y^i_k the output.
The weights wy_{l,r}, wu_{l,r}, we_{l,r}, hy_l, hu_l and he_l determined by the training stage allow direct calculation of the coefficients of the characteristic polynomials. However, the quality of the identification performed will depend respectively on the behaviour of the learning error and on the incidence of the error generated by the approximation hypothesis.
3. PROPAGATION OF THE APPROXIMATION ERROR
3.1 Expression of the approximation error of the activation function

  f_act(x) = Σ_{n>=0} x^(2n+1)/(2n+1)! = x + x·ε,  with ε < exp(x) - 1 and ε -> 0 for x -> 0,

x being any variable treated by the neural network. The application of Y_k and E_k to the inputs of the neural network makes it possible to reduce the numeric value of the variables of the system without modifying the identification results. The transformation (T), with N a scale factor, therefore defines the adjustment parameters of all variables x of the network, with ε sufficiently weak to result in a satisfactory identification of the system.
3.2 Expression of the approximation error in feedforward propagation
For a neuron j of the hidden layer: I_j = w_{j,r} E_{k-r} + w_{j,r+1} E_{k-r+1}.
When μ_11 > 0, let tg θ = τ_1; when μ_11 < 0, let tg θ = τ_2. When |X_1| = 0, the above
equation (11) is not applicable, which means that there are two or more symmetric axes in the object: (1) When there are more than two axes,
Fig.3 The rotary orientation of the object
g_02 = g_20, and the shapes are the square, circle, polygon and so on.
Considering that the directive information is to be applied to the robot hand, we choose the normal direction of the shortest radius vector as the robot grasping direction. (2) When there are two axes, g_02 ≠ g_20, we choose the direction of the longest radius vector as the rotary orientation.
4.3 Calibration
The calibration task is to determine the geometric relation between the camera and robot coordinates. Since the image is 2D, the calibration is performed in 2D coordinates. Suppose the visual sensor frame is xv-ov-yv and the robot frame is xr-or-yr; a point (xv, yv) in the xv-ov-yv frame can be represented by (xr, yr) in the xr-or-yr frame. If the origin of the vision frame is (x0, y0) in the robot frame, then we have
  [xr]   [cos φ  -sin φ] [px·xv]   [x0]
  [yr] = [sin φ   cos φ] [py·yv] + [y0]

where φ is the rotation angle of the sensor frame relative to the robot frame, counted counter-clockwise, and px and py represent the unit length of a pixel in the xv and yv orientations. By substituting three different points we obtain three matrix equations, from which the parameters px, py, x0, y0 and φ can be solved.
5. Experiment Results
The smoothing results of the two different methods, super-quadrant smoothing and the open-close algorithm, are compared in Figure 4. As shown in Figure 4, (a) is the original digital image, on which there are many salt-like noises and spot noises because of unequal reflection. (b) is the result processed by the super-quadrant smoothing method: the small random noises are eliminated, but the bigger spot noise still remains; at the same time the image edges near the bigger spot are destroyed and a gap is created. (c) is the result processed by the open-close algorithm: all kinds of noises are eliminated while the details of the image are kept well. As mentioned above, the open-close algorithm is better than the super-quadrant method for binary-state image smoothing; however, the former scans the image four times, while the latter needs to scan the image just once. The experiment results showed that the correct recognition rate is over 95%, the accuracy of location is ±2 mm, and the accuracy of the orientation angle is ±2 degrees.
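The calibration equation can be solved as sketched below: the rotation and pixel scales enter linearly through the products A = px·cos φ, B = py·sin φ, C = px·sin φ, D = py·cos φ, so three (or more) point correspondences give a linear system. The function name and the least-squares formulation are assumptions of this sketch.

```python
import numpy as np

def calibrate(cam_pts, rob_pts):
    """Recover p_x, p_y, phi, x0, y0 from >= 3 point correspondences."""
    rows, rhs = [], []
    for (xv, yv), (xr, yr) in zip(cam_pts, rob_pts):
        # xr = A*xv - B*yv + x0,  yr = C*xv + D*yv + y0
        rows.append([xv, -yv, 0.0, 0.0, 1.0, 0.0]); rhs.append(xr)
        rows.append([0.0, 0.0, xv, yv, 0.0, 1.0]); rhs.append(yr)
    A, B, C, D, x0, y0 = np.linalg.lstsq(np.array(rows), np.array(rhs),
                                         rcond=None)[0]
    phi = np.arctan2(C, A)             # A = p_x cos(phi), C = p_x sin(phi)
    px = np.hypot(A, C)
    py = np.hypot(B, D)                # B = p_y sin(phi), D = p_y cos(phi)
    return px, py, phi, x0, y0
```

With exactly three non-collinear points the system is square and determined, matching the paper's statement; extra points simply over-determine it in the least-squares sense.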
Figure 4 The compared results of the image smoothing
6. Conclusion
This paper presents an object recognition method based on feature parameters, determines the invariant features of the models, and composes a robot vision system which integrates binary-state image sampling, recognition and location. This system can be used in robot assembly tasks as the scene vision. The experiment results show that the system accomplishes object recognition and workpiece location reliably, and that it has the features of low cost, simple structure and easy realization. Still, the system has much to be improved; for example, the methods of image smoothing and feature extraction need deeper discussion. It is possible to adopt dedicated hardware or an image processing chip to speed up the system, which may satisfy the real-time control requirement of high-speed assembly tasks.
References:
[1] B.K.P. Horn. Robot Vision. The MIT Press, McGraw-Hill Book Company, 1986
[2] Tang Chengqing. The Method and Application of the Mathematical Morphology. The Science Press, 1990
[3] Yang Jingan, Zhang Daincheng. The Vision System Based on the Model Recognising the Complex Object. Pattern Recognition and Artificial Intelligence, Vol. 3, No. 2, 1990
[4] Zhou Ruiyu, Wang Dapei, Li Quanyi. A Simple Robot Assembly Experiment System Guided by Vision. The Robot, Vol. 3, No. 2, 1989
Recognition of Objects and Their Direction of Moving Based on Sequence of Two-Dimensional Frames Božidar Potočnik, Damjan Zazula Faculty of Electrical Engineering and Computer Science Smetanova 17, 2000 Maribor, Slovenia {bozo.potocnik, zazula} @uni-mb.si
Abstract We developed a new algorithm that belongs to the so-called differential methods; however, it represents a significant extension of the known analysis approaches. Our algorithm can perceive four types of object changes: translational shifts, object shifts along the optical axis inwards and outwards, object rotations with regard to the optical axis and, eventually, the appearance of additive noise. We introduce a new approach to the analysis of objects which are in occlusion (analytical optimization with respect to the MSE criterion). The algorithm is very fast (its time complexity is of order O(n²)) and constitutes a framework that can easily be upgraded to the needs of real applications.
1. Introduction In our work, we deal with digital processing of a sequence of images from which we try to determine a moving object and the trajectory of its movement. Recently, a few methods for movement analysis have been published. Sonka [5] described basic steps for movement analysis on the basis of optical flow and of significant points, Jähne [3] attempted movement analysis with the assistance of space-time images, etc. Because the result of these methods is a vector or matrix (movement field or displacement vector field), there is no possibility of accurately reconstructing the trajectory of the moving object. Various methods of movement analysis have been gathered in [5] and classified into groups according to the algorithms used. Basic steps of these algorithms may also be employed in determining the movement trajectory. We developed a new algorithm that belongs to the so-called differential methods; however, it represents a significant extension of the known analysis approaches. The paper is organized as follows. In Section 2, we describe the algorithm developed for movement analysis in detail, while the results and discussion follow in Section 3. Section 4 concludes the paper.
2. Analysis algorithm With our algorithm, we can analyse the movement of one moving object in a sequence of gray value images. The algorithm can perceive four types of object changes: translational shifts, object shifts along the optical axis inwards and outwards, object rotations with regard to the optical axis and, eventually, the appearance of additive noise. Our algorithm consists of the following steps: 1. The first step of the algorithm is binarisation of the sequence of images. Every image of the sequence is binarized by a threshold operation using a global threshold. We determine the threshold for every image of the sequence separately, as the mean between the minimum and maximum gray value in that image. The type of binarisation (or other preprocessing operations) can also be selected with respect to the image sequence to be analysed (ultrasound, MR, CT or SAR images, etc.). 2. Given a sequence of n binary images, the static background background(i,j) is established as follows:
  background(i, j) = ( Σ_{k=1..n} b_k(i, j) ) div n,   (1)
where b_k is the k-th binary image of the sequence and div stands for integer division. Equation (1) gives only an estimate of the real background; evidently, longer sequences produce better estimates. In sequences where the object is, in at least one image, in no occlusion with any static part of the scene, this estimate corresponds to the actual (binary) image of the background. 3. Then the static background obtained (equation (1)) is subtracted from every image, producing a sequence of dynamic images. Dynamic images comprise white areas where the gray values change along the sequence. This feature is used as a criterion for recognizing the moving object in the following steps. 4. The moving object is defined as the object with the largest surface area in the dynamic images. It is afterwards used as a praform (template), and its polar histogram is constructed for subsequent comparisons. This criterion proved robust; nevertheless, it fails in case of only slight movement throughout the entire sequence. 5. Now, all the frames with dynamic images are processed in order to find successive appearances of the moving object. This matching or searching is divided with respect to whether the object is partially hidden by another object or not. When there is no occlusion in the images, the procedure is straightforward. However, occlusions introduce several problems [1, 6], like incomplete or faulty object identification. We divided the search for the moving object position into two parts, each of the two variants composed of several steps: a. The object is in no occlusion with any static part of the scene: - A polar histogram is constructed for it (the number of elements of the polar histogram is selected in advance). Individual components are taken into quotients with the components of the praform's histogram:
  quot[i] = histogram_praform[(i + rotation) mod m] / histogram_object[i],   (2)

where m is the number of elements of the polar histogram, rotation is the shift index, and mod stands for the modulus operation. From equation (2), it is obvious that the vector of quotients has to be calculated for every single rotation (the number of rotations equals m). For the vector of quotients obtained, the mean and the variance are calculated. These two values play an important role in determining the type of shift and the rotation of the object.
- Rotating the praform, the position with minimum variance points out the most probable rotation of the processed object. At the same time, the mean of the quotients corresponds to the object's scaling.
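The steps of variant (a) can be sketched as follows. The paper does not define the polar histogram precisely, so this sketch assumes the mean boundary radius per angular sector; note that with the quotients formed as in equation (2), the quotient mean estimates the praform-to-object ratio (the reciprocal of the object's scale factor). All names are hypothetical.

```python
import numpy as np

def polar_histogram(points, centre, m=36):
    """m-element polar histogram: mean boundary radius per angular sector."""
    d = np.asarray(points, float) - np.asarray(centre, float)
    ang = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    rad = np.hypot(d[:, 0], d[:, 1])
    idx = (ang / (2 * np.pi / m)).astype(int) % m
    hist = np.zeros(m)
    for i in range(m):
        sel = idx == i
        if sel.any():
            hist[i] = rad[sel].mean()
    return hist

def match_rotation(hist_praform, hist_object):
    """Equation (2) for every cyclic rotation: the minimum quotient
    variance marks the most probable rotation; the quotient mean gives
    the scaling (praform radius over object radius)."""
    m = len(hist_praform)
    best = None
    for rot in range(m):
        quot = np.roll(hist_praform, -rot) / hist_object
        if best is None or quot.var() < best[0]:
            best = (quot.var(), rot, quot.mean())
    return best[1], best[2]            # (rotation index, scale estimate)
```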
b. The object is partially hidden:
- An extended area is formed in a separate frame containing the visible part of the object (from the dynamic image) and the static component, i.e. the occlusion. This newly composed region (composed object) is the basis for the analysis in the following steps.
- The centre of gravity is found for this composed area and a partial polar histogram is constructed for the uncovered part of the object. This calculated centre of gravity is the first estimate for our partially hidden object; in case of very high occlusion the estimate becomes rather unreliable. A partial vector of quotients is also computed for every single rotation of the praform (the number of elements of the polar histogram is not m anymore, but correspondingly lower).
- In every rotational position, an analytical optimization with respect to the MSE criterion for the differences of successive quotients is applied in order to reposition the centre of gravity. With this optimization we determine the final centre of gravity of the moving object.
- The centre of gravity calculated in the previous step is the basis for the subsequent analysis. The partial variance of the quotients, recalculated with the new centre of gravity, is minimum at the most probable orientation of the object under the occlusion. The quotient mean is equivalent to the object scaling.
6. The centres of gravity discovered either way are finally bound into a trajectory of the moving object. The data on the object rotation and on the shifts along the optical axis are also available. Besides, if the minimum variance of the quotients at a certain frame exceeds a preselected threshold, the object is declared corrupted by additive noise.
3. Results and discussion
In Section 2, we described a new algorithm for the analysis of movement in a sequence of gray value images. The algorithm was implemented in C++ for Windows and tested. An example is shown in Figure 1. In the sequel, the processing results for the image sequence of Figure 1 are shown as generated by our algorithm. First, we binarise every image to get a binary-image sequence (Figure 2). This sequence is used to determine the static background (Figure 3), obtained with equation (1). Then the static background is subtracted from every image, producing a sequence of dynamic images (Figure 4). From this sequence we recognize the moving object with the heuristic criterion (Figure 5). In Figure 6, we can see the final result of the processing - an image of the trajectory reconstructed for the moving object.
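The preprocessing of steps 1-3 (binarisation with the min/max-mean threshold, the equation (1) background, and the dynamic images) can be sketched as follows; the function names are this sketch's assumptions.

```python
import numpy as np

def binarize(img):
    """Global threshold: the mean of the minimum and maximum gray value."""
    t = (int(img.min()) + int(img.max())) / 2.0
    return (img > t).astype(np.uint8)

def static_background(binary_seq):
    """Equation (1): integer division keeps a pixel only if it is set
    in every frame of the sequence."""
    n = len(binary_seq)
    return np.sum(binary_seq, axis=0, dtype=np.int64) // n

def dynamic_images(binary_seq, background):
    """Subtract the static background to expose the moving object."""
    return [np.clip(b.astype(int) - background, 0, 1).astype(np.uint8)
            for b in binary_seq]
```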
Figure 1: Test gray-value image sequence. The images, of dimensions 256x256 pixels, have 256 gray-value levels. In this sequence, all possible object shifts which the algorithm can perceive (translational shifts, object shifts along the optical axis inwards and outwards, object rotations with regard to the optical axis and the appearance of additive noise) are present.
Figure 2: Binary-image sequence.
In the above example, we analysed a synthetic image sequence, for which our algorithm gives completely right results. But this is not the case for every image sequence. Our algorithm has particularly big problems with sequences where the occlusion is very high. Experimenting, we also realized that if the first estimate of the centre of gravity (step 5b in Section 2) was not close enough to the right value, then the optimization with respect to the MSE did correct the position of the centre of gravity, but the position was still faulty. A completely different problem arises in sequences where the object moves very slowly through the subsequent images. In these cases we misidentify the moving object (step 4 in Section 2). This problem can be solved in many ways, e.g. by a coarse-to-fine strategy - we consider only every fifth image of the image sequence.
Figure 3: Static background image.
Figure 4: Dynamic-image sequence.
Figure 5: Image of moving object.
Figure 6: Image of reconstructed trajectory.
4. Conclusion In our work we presented a new algorithm for movement analysis in image sequences. The algorithm is an extension of the differential methods of movement analysis. In its basic version, the algorithm is very simple and thus very fast (its time complexity is of order O(n²)). It can easily be extended for concrete real applications.
References
[1] E. Charniak and D. McDermott, Introduction to Artificial Intelligence. Massachusetts: Addison Wesley, 1985, pp. 87-167.
[2] F. van der Heijden, Image Based Measurement Systems. London: J. Wiley and Sons, 1994.
[3] B. Jähne, Digital Image Processing. Berlin: Springer-Verlag, 1993.
[4] J.C. Russ, The Image Processing Handbook. London: CRC Press, 1995.
[5] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision. London: Chapman and Hall, 1994.
[6] P.H. Winston, Artificial Intelligence. Massachusetts: Addison Wesley, 1984, pp. 335-384.
Innovative Techniques for Recognition of Faces Based on Multiresolution Analysis and Morphological Filtering
Anastasios Doulamis, Nicolas Tsapatsoulis, Nikolaos Doulamis and Stefanos Kollias
Department of Electrical and Computer Engineering, National Technical University of Athens, Greece
Heroon Polytechneiou 9, Zographou
Tel.: +301 772-2491
e-mail: [email protected]
Abstract
In this paper, we introduce two new methods for face recognition of frontal images. The methods combine the well known Karhunen-Loeve transform with morphological and subband analysis. The use of this kind of analysis contributes to better discrimination between different images. The morphological and subband approaches are compared, the former being a nonlinear method and the latter a linear one. The results, obtained using 100 test images, show that both approaches are quite efficient. However, the morphological technique seems to lead to slightly better results (5% versus 12% error) while the subband technique has the advantage of decreasing the complexity of the task.
1. Introduction The main purpose of a face recognition system is to find a person within a large database of faces (e.g. in a police database). Such a system typically returns a list of the most likely people in the database. However, there are applications in which we want to identify a particular person (e.g. in a security monitoring system) or to allow access to a group of people and deny access to all others (e.g. access to a computer). Some other applications, like speech recognition, better man-machine interaction or visual communication over telephone and other low bandwidth lines, use face identification as an auxiliary tool. So far, the best results for two-dimensional face recognition have been obtained by techniques based on either template matching [1] or matching eigenfaces [2]. The latter uses the KL transform and has the advantage of not requiring specialised hardware. Since this transform achieves the optimal energy compaction, faces can be represented in a low dimensional space as a weighted linear combination of the eigenvectors of the autocorrelation matrix of the face images. This makes the mean square error between the representation of the face image and the original image minimal. This representation, although optimal in discriminating physical categories of faces, e.g. sex and race, is not optimal in recognising faces, because of the details which are necessary to discriminate different faces [3]. In addition, there is no accurate method that verifies the results of the identification algorithm in order to avoid false alarms (see Sect. 4). Two alternative techniques are proposed in this article, so as to increase the efficiency of the discrimination task and to obtain more reliable results. These two techniques combine the KL transform with subband decomposition and with morphological filtering respectively. Subband decomposition separates the original images into complementary frequency bands (e.g.
Low-Low LL, Low-High LH and so on), for each of which we create a different KL base. Since the LL band contains the largest amount of information, we use the projection of a test image on this band to find a list of the most likely face images in the database. The higher bands are used for verification if the confidence of the decision made on the LL base is poor. Thus it is feasible to achieve a correct identification using the details kept in the higher bands. Using morphological filtering we are able to transform an image into one with lower frequencies than the original (for example by morphological opening or closing). Therefore we use the result of these filters in the same way that we use the lower band of the subband analysis. The structuring element of the morphological operator was chosen after measurements on various test images. The difference between the original image and the filtered one, projected on the respective base, is used to verify our results.
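The underlying eigenface machinery can be sketched as follows: faces are projected on the leading eigenvectors of the face data, and a test image is matched by the minimal squared error between coefficient vectors. Computing the basis by SVD of the centred data (rather than explicitly forming the autocorrelation matrix) is a choice of this sketch, and all names are hypothetical.

```python
import numpy as np

def kl_base(faces, k):
    """KL (eigenface) basis: top-k eigenvectors of the face data."""
    X = faces.reshape(len(faces), -1).astype(float)
    mean = X.mean(axis=0)
    # right singular vectors of the centred data = eigenvectors of X'X
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, base):
    """Coefficient vector of a face in the KL base."""
    return base @ (face.ravel().astype(float) - mean)

def nearest_face(face, mean, base, gallery_codes):
    """Index of the gallery instance with minimal squared error."""
    y = project(face, mean, base)
    errs = [float(np.sum((c - y) ** 2)) for c in gallery_codes]
    return int(np.argmin(errs))
```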
2. Subband decomposition
In this section we describe the first approach, which is based on a multiresolution scheme proposed in [4] (Fig. 1). An image of resolution (MxN) is decomposed into four frequency complementary images of resolution (M/2 x N/2). Using this scheme we can create from the original face database four different databases. The KL transform on each of these databases is used to produce four different instances for each face image in the database. Actually, in our consideration, only two instances of each image are used, the instances related to the LL KL-base and the LH KL-base. In the XLL image, which is the image containing the Low-Low spatial frequencies of the original X(m,n) image, most of the energy is accumulated. The respective LL KL-transform converges faster than the KL-transform taken on the original images. In addition, the complexity of the computation is lower, since the autocorrelation matrix of the LL images is of dimensions (M/2 x N/2)x(M/2 x N/2) instead of (MxN)x(MxN) for the original images. The LH KL-transform converges slower than the original, so more KL coefficients must be kept. Since the LH images are images of details, they are used only in the verification step. The proposed algorithm is described below:
Decomposition step
Given an image Y(m,n) of dimensions (MxN), we create the images YLL, YLH, YHL, YHH using the subband decomposition scheme shown in Fig. 1.
Projection step
The YLL and YLH images are projected on the LL KL-base and LH KL-base respectively, and k and l coefficients are retained in each case. The numbers k, l were chosen after many simulations (see Section 4.1, Table II). As a result of this step, two vectors of sizes k and l, related to the image Y(m,n), are created: y_l, y_h.
MSE calculation step
For each LL instance x_il in the database we calculate the MSE e_i = (x_il − y_l)^T (x_il − y_l), and e_min = min_i(e_i).
As potential instances of the image Y(m,n) in the database, we consider the instances whose MSE lies in the interval [e_min, 2·e_min]. If the MSE of only one instance lies in this interval, the confidence of the decision is high, and the instance with the minimal MSE is considered to be the prototype of the image Y(m,n) in the database. On the other hand, if more than 10% of the total instances in the database have an MSE which lies within the interval, the confidence of the decision is considered inadequate and the image Y(m,n) is discarded without verification. If neither of these extreme cases occurs, the verification step is needed.
Verification step
For the instances selected in the previous stage, the error m_i = (x_ih − y_h)^T (x_ih − y_h) is calculated (x_ih are the LH instances of the database). If the minimal error is lower than a threshold T, equal to 0.9·max(error of images which have a prototype in the database), then the instance with the minimal error is considered as the prototype of the image Y(m,n) in the database; otherwise the image Y(m,n) is considered to have no prototype.
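The candidate-selection rule of the MSE step above can be sketched as follows; select_candidates is a hypothetical helper, and the [e_min, 2·e_min] interval and 10% cut-off follow the text.

```python
import numpy as np

def select_candidates(errors, total):
    """Decision rule of the MSE step (a sketch following the text).

    errors: MSE e_i of the test image against every LL instance.
    Returns ('accept', index), ('discard', None) or ('verify', indices).
    """
    errors = np.asarray(errors, dtype=float)
    e_min = errors.min()
    in_band = np.flatnonzero(errors <= 2.0 * e_min)   # interval [e_min, 2*e_min]
    if len(in_band) == 1:
        return 'accept', int(errors.argmin())          # high confidence
    if len(in_band) > 0.10 * total:
        return 'discard', None                         # inadequate confidence
    return 'verify', in_band                           # verification step needed

print(select_candidates([5.0, 30.0, 40.0, 50.0], total=100))  # ('accept', 0)
```

When 'verify' is returned, the surviving candidates would be re-ranked with the LH errors m_i of the verification step.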
Figure 1. Subband decomposition scheme used to split an image X(m,n) into four frequency-complementary images XLL, XLH, XHL, XHH (G, H: perfect reconstruction mirror filters; one row/column out of two is kept after each filtering).
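A one-level split of this kind can be sketched in a few lines. Here Haar filters stand in for the perfect-reconstruction mirror filters G and H of [4], and the function names are illustrative, not from the paper.

```python
import numpy as np

def subband_decompose(x):
    """One-level split of an image into XLL, XLH, XHL, XHH."""
    h = np.array([0.5, 0.5])     # low-pass, assumed Haar
    g = np.array([0.5, -0.5])    # high-pass, assumed Haar

    def filt_down(a, k, axis):
        # convolve every line along `axis`, then keep one sample out of two
        return np.apply_along_axis(
            lambda v: np.convolve(v, k, mode='full')[1::2], axis, a)

    xl = filt_down(x, h, axis=1)          # filter rows, drop every other column
    xh = filt_down(x, g, axis=1)
    return (filt_down(xl, h, axis=0),     # XLL
            filt_down(xl, g, axis=0),     # XLH
            filt_down(xh, h, axis=0),     # XHL
            filt_down(xh, g, axis=0))     # XHH

bands = subband_decompose(np.arange(16.0).reshape(4, 4))
print([b.shape for b in bands])   # [(2, 2), (2, 2), (2, 2), (2, 2)]
```

As in the text, a (M×N) input yields four (M/2 × N/2) subbands; a constant image produces an all-zero XHH band.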
3. Morphological Analysis
The goal of this section is to briefly describe the morphological tools of interest for the face identification algorithm. A complete description of mathematical morphology can be found in [7]. Let f(x) denote an input signal and M_n a window or flat structuring element of size n. The erosion and dilation caused by this flat element are given by:
ε_n(f)(x) = min{f(x + y), y ∈ M_n} and δ_n(f)(x) = max{f(x − y), y ∈ M_n}
Two morphological filters can be defined from the above morphological operators, namely opening and closing. A morphological opening (closing) simplifies the original signal by removing the bright (dark) components that do not fit within the structuring element [7]. If it is required to remove both bright and dark elements, an opening-closing or closing-opening should be used. We also define the difference between the original signal and the signal after the morphological opening (closing). This difference should not be confused with the morphological gradient, which is given by subtracting the erosion from the dilation with a structuring element M_n of size 1. Based on the above morphological filters, we propose an innovative algorithm for both identification and verification. Fig. 2 illustrates the mechanism used. As can be seen in Fig. 2, we first apply a morphological operator to each image of the database. Thus a new database is created which contains the filtered images. From this database we calculate the "opening KL-base", on which the filtered images are projected. Since the new images consist of lower frequencies, it is expected that more energy will accumulate in the first coefficients of the K-L transform. Moreover, for each face we calculate the difference between the original and the filtered images and we also create the "difference KL-base". However, the images of this database contain higher frequencies and thus more coefficients are needed to accumulate the same energy as the original one. As a result, this database can be used only for verification purposes. If the confidence of the decision is poor (there are many faces in the list after the projection of the test image on the opening KL-base) we use the verification based on the difference base.
Figure 2. Face recognition scheme based on morphological filtering.
4. Results
We have used the male face database of the University of Essex in our experiments. To build the K-L bases we have chosen 100 different frontal faces with no facial expressions, centred in the image and with small scale and inclination variations (we call these images prototypes). As test images we have selected 90 face images which have a prototype in the database, with variations in scale, inclination, orientation and facial expression. We have also used 10 face images with no prototype in the database. Given a test image, the task is to recognise the respective prototype, if there is one, or to discard the image because there is no prototype. Two kinds of errors emerge: false alarms (a face which has no prototype in the base is recognised as one which has) and false discriminations (a face which has a prototype is discarded or is recognised as a wrong one).
4.1 Results obtained by the subband algorithm

Table I: Total percentage error for various simulations of the subband-based algorithm.

Num. of LH KL-base Coeff. \ Num. of LL KL-base Coeff. | 16 | 25 | 36 | 49
16 | 15 | 12 | 10 | 9
25 | 8 | 10 | 8 | 8
36 | not simulated | not simulated | 5 | 5
49 | not simulated | not simulated | 4 | 3
In Table I the total percentage error (discrimination errors + false alarms) is shown for various simulations. For example, retaining 16 coefficients from the projection on the LL KL-base and 25 from the projection on the LH KL-base, the total error is 8%. Increasing the number of retained coefficients of the LL KL-base, the total error decreases. However, increasing the number of retained coefficients of the LH KL-base does not decrease the total error substantially. Note also that the total error consists mainly of false alarms, which cannot easily be reduced because they depend on the chosen threshold T.
Table II: Performance of the subband-based algorithm retaining 9 and 16 coefficients of the projection on the LL KL-base and LH KL-base respectively.

Faces (no prototype in database): 10 | with high confidence of decision: 0 | with inadequate confidence: 5 | with low confidence of decision: 5 | false alarms: 3

Comparisons with the results of the morphological algorithm, shown in Table III, can then be made.

4.2 Results obtained by the morphological algorithm
Fig. 3 presents the discrimination error obtained with the above test images. The results were taken for different structuring element sizes (5, 10, 15, 20, 25) and for different numbers of coefficients for each base. The number of coefficients kept for the opening base is the same as the number kept for the difference base (in these results, 9 coefficients). It is observed that a structuring element of size 15 gives the best results. This is quite logical, since a small structuring element yields good recognition on the opening base but poor verification on the difference base, while a large structuring element yields good verification on the difference base but poor recognition on the opening base. It should also be mentioned that the opening base keeps the significant information well and as a result gives very good identification, despite the fact that the images of this database (prototypes) are not easily recognisable by humans.
Figure 3: Discrimination error for different structuring element sizes.
Figure 4: Discrimination error for different numbers of KL-transform coefficients, for each base and for the verification used.
Fig. 4 shows the discrimination error for each base and for the verification (in this case we have kept the same number of coefficients for the opening and difference bases). As the number of coefficients increases, the total error decreases significantly, especially for the verification and the opening base. We choose the same number of coefficients for verification because the results turn out to be very satisfactory without keeping a large number of coefficients for the difference base. One exception is presented in Table III in order to allow a comparison with the subband-based algorithm. It should also be mentioned that in the verification procedure the major proportion of cases (about 70%) give the right result without use of the difference base, and as a result the computational time is reduced significantly.

Table III: Performance of the morphological algorithm retaining 9 and 16 coefficients of the projection on the opening KL-base and difference KL-base respectively (size of structuring element: 15).

Faces (with a prototype in database): 90 | with high confidence of decision: 69 | with inadequate confidence: 0 | with low confidence of decision: 21 | discrimination errors among high-confidence faces: 0
Faces (no prototype in database): 10 | with high confidence of decision: 0 | with inadequate confidence: 6 | with low confidence of decision: 4 | false alarms: 2
5. Conclusions
In this paper we have presented two innovative techniques for face recognition. The morphological-based approach gives more promising results, since the verification step increases the efficiency of the algorithm. On the other hand, the subband-based approach is computationally more attractive. Due to the perfect reconstruction filters used in this approach, the LH KL-base converges slowly and consequently the verification step does not improve the efficiency of the algorithm significantly.
References
[1] R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, Oct. 1993.
[2] M. Turk and A. Pentland, "Eigenfaces for Recognition," J. Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[3] A. O'Toole, H. Abdi, K. A. Deffenbacher and D. Valentin, "Low-dimensional representation of faces in higher dimensions of the face space," J. Opt. Soc. Am. A, vol. 10, no. 3, pp. 405-411, March 1993.
[4] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
[5] A. Tirakis, A. Delopoulos and S. Kollias, "Two-dimensional filter bank design for optimal reconstruction using limited subband information," IEEE Trans. on Image Processing, vol. 4, no. 8, August 1995.
[6] L. Vincent, "Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms," IEEE Trans. on Image Processing, vol. 2, no. 2, pp. 176-200, April 1993.
[7] J. Serra, Image Analysis and Mathematical Morphology, New York: Academic Press, 1982.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
PARTIAL CURVE IDENTIFICATION IN 2-D SPACE AND ITS APPLICATION TO ROBOT ASSEMBLY
Feng-Hui Yao*, Gui-Feng Shao**, Akikazu Tamaki*, Kiyoshi Kato*
*Dept. of Electric, Electronic and Computer Engineering, Faculty of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata-ku, Kitakyushu 804, Japan. Phone (+81) 093-884-3255 (Direct), Fax (+81) 093-871-5835, E-mail: [email protected]
**Dept. of Commercial Science, Seinan Gakuin University, 6-2-92 Nishijin, Sawara-ku, Fukuoka 814, Japan. Phone (+81) 092-841-1311 (Ext. 262), E-mail: [email protected]
ABSTRACT
This paper describes an algorithm to identify the partial curves of planar objects in 2-D space and its application to robot assembly. For the given boundary curves of objects, the dominant points of every boundary curve are detected. Then, by taking the dominant points as separation points, each boundary curve is segmented into partial boundary curves, called curve segments. The curve segments belonging to the boundary curve of one object are then translated and rotated to match those of another object, yielding the matched curve segments. From these matched curve segments, the longest consecutive matched curve is detected. Finally, the effectiveness of the algorithm is shown by experimental results.
1. Introduction
The shape of an object plays a very important role in object recognition, analysis and classification. Research in this field can be roughly classified into (1) edge detection; (2) dominant point detection on the boundary curve; and (3) shape recognition. Research on edge detection focuses on edges or contours [1]-[2]. Work on dominant point detection focuses on points of high curvature [3]-[4]. Work on shape recognition pays attention to the entire shape of the boundary curve and identifies the objects [5]. These studies seldom address the problem of object connection relationships, i.e., determining whether a part of one object can be connected with a part of another. This problem is very important in robot assembly systems, and it can be viewed as a problem of partial curve identification. This paper focuses on this problem and proposes an algorithm to identify the partial curves of planar objects. In this algorithm, the boundary curves of the objects are first extracted from the input image after binarization, and dominant points with high curvature are detected. Then, each boundary curve is segmented into partial boundary curves, called curve segments, by taking the dominant points as separation points. Curve segment matching is then performed, and the partial curve is identified based on the matching errors. In the following, section 2 describes the algorithm for partial curve identification; section 3 relates its digital implementation; section 4 shows its application and experimental results; finally, the effectiveness of the algorithm is discussed and future work is given.
2. Algorithm to Identify the Partial Curve of a Planar Object
In the following explanation, the boundary curve is simply called a curve unless otherwise specified.
2.1 Dominant Point Extraction
For a given object, let γ(s) represent its boundary curve. γ(s) is expressed parametrically by its coordinate functions x(s) and y(s), where s is a path length variable along the curve. If the second derivatives of x(s) and y(s) exist, the curvature at (x, y) is computed by

C(x, y) = (x'y'' − y'x'') / ((x'^2 + y'^2)^(3/2))    (1)

To express the curvature at varying levels of detail, both boundary coordinate functions x(s) and y(s) are convolved with the Gaussian function g(s, σ) defined by

g(s, σ) = exp(−s^2/(2σ^2)) / (σ√(2π))    (2)

where σ is the standard deviation of the distribution. The Gaussian function decreases smoothly with distance and is differentiable and integrable. Let us assume that σ of the Gaussian function is small compared with the total length of the curve γ(s). The Gaussian-smoothed coordinate functions X(s, σ) and Y(s, σ) are defined as x(s) ⊗ g(s, σ) and y(s) ⊗ g(s, σ), respectively, where "⊗" means convolution. Because both X(s, σ) and Y(s, σ) are smooth functions whose first and second derivatives exist, the curvature C(s, σ) of the curve γ(s) smoothed by the Gaussian function is readily given by applying X'(s, σ), Y'(s, σ), X''(s, σ) and Y''(s, σ) to equation (1). For a given scale σ, the corresponding curvature C(s, σ) can be obtained according to the procedure related above. A searching process is applied to detect the local maximum of absolute curvature within the region of support given
by the sequence {|C_l|, ..., |C_{i−1}|, |C_i|, |C_{i+1}|, ..., |C_r|}, where C_i is the curvature of the point in question, and C_l and C_r are the leftmost and rightmost points of the local region of support, respectively. The region of support for each point i is the largest possible window containing i in which |C| to both the left and right of i is strictly decreasing. The points with a local maximum of absolute curvature are considered as the dominant points.
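As an illustration of the curvature computation and the maximum search, the sketch below uses circular central differences for equation (1) and, as a simplification of the full shrinking-window search, only the immediate neighbours as the region of support; the function names are illustrative.

```python
import numpy as np

def curvature(x, y):
    """Curvature of a closed sampled curve via eq. (1), using circular
    central differences so that the closed curve wraps around."""
    d = lambda a: (np.roll(a, -1) - np.roll(a, 1)) / 2.0
    dx, dy = d(x), d(y)
    ddx, ddy = d(dx), d(dy)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

def dominant_points(c):
    """Indices where |C| is a strict local maximum over its immediate
    neighbours (a minimal region of support)."""
    a = np.abs(np.asarray(c, dtype=float))
    n = len(a)
    return [i for i in range(n)
            if a[i] > a[(i - 1) % n] and a[i] > a[(i + 1) % n]]

# sanity check: a unit circle has constant curvature 1 and no strict maxima
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
c = curvature(np.cos(t), np.sin(t))
print(np.allclose(c, 1.0), dominant_points(np.round(c, 8)))   # True []
```

On a shape with corners, the same search would return the corner indices as dominant points.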
2.2 Curve Segmentation
For any two objects A and B, let us assume that their boundary curves are represented by α(s) and β(s), respectively, and that their dominant points are denoted by P(α) = {p_α0, p_α1, ..., p_α,M−1} and P(β) = {p_β0, p_β1, ..., p_β,N−1}, where M is the number of dominant points of the curve α(s) and N that of the curve β(s). Dominant points are numbered clockwise and are considered as the separation points. Therefore, the curves α(s) and β(s) can be split up into curve segments. Let S_α and S_β denote these two sets of curve segments, i.e.

S_α = {α_{0,1}, α_{1,2}, ..., α_{M−1,0}} (modulo M),   S_β = {β_{0,1}, β_{1,2}, ..., β_{N−1,0}} (modulo N)    (3)

where α_{i,j} (i, j = 0, 1, ..., M−1, modulo M) and β_{u,v} (u, v = 0, 1, ..., N−1, modulo N) are the curve segments of the curves α(s) and β(s), respectively. In this notation, dominant point i is the start of α_{i,j} and j its end, and the dominant points u, v have the same meanings.
Fig. 1 Partial curve β_{j+1,j−1} is translated so that the dominant points i and j overlap.
Fig. 2 Input image after binarization, which includes two objects.
2.3 Partial Curve Matching
The partial curve matching includes the extraction of the candidates of the longest consecutive matched curves (abbreviated as LCMC) and the decision of the LCMC.
2.3.1 LCMC Candidates Extraction
For the dominant point i on curve α(s), the curve segment α_{i−1,i} terminates at i and α_{i,i+1} starts from i, clockwise, where α_{i−1,i}, α_{i,i+1} ∈ S_α (i = 0, 1, ..., M−1). Similarly, for the dominant point j on curve β(s), the curve segment β_{j+1,j} terminates at j and β_{j,j−1} starts from j, counterclockwise, where β_{j+1,j}, β_{j,j−1} ∈ S_β (j = 0, 1, ..., N−1). For simplicity, these two pairs of curve segments are denoted as α_{i−1,i+1} and β_{j+1,j−1} and are called partial curves. Then let us consider the matching of α_{i−1,i+1} and β_{j+1,j−1}. To perform the matching of these two partial curves, β_{j+1,j−1} is translated so that the dominant point j included in β_{j+1,j−1} overlaps the dominant point i included in α_{i−1,i+1} (see Fig. 1). The displacement along the X-axis is the difference of the x-coordinates of the dominant point i on α(s) and j on β(s). Likewise, the displacement along the Y-axis is obtained from their y-coordinates. Next, β_{j+1,j−1} is rotated around the dominant point j, clockwise, from 0° to 360° by 1° per step. Let E(α_{i−1,i}, β_{j+1,j})_θ express the matching error when β_{j+1,j−1} is rotated by θ°, which is defined by:

E(α_{i−1,i}, β_{j+1,j})_θ = ∬_{D1} dx dy + ∬_{D2} dx dy    (4)

where D1 is the area surrounded by arc_{j,j−1}, arc_{j−1,i+1} and arc_{i+1,i}, and D2 is the area surrounded by arc_{i−1,i}, arc_{i,j+1} and arc_{j+1,i−1}, as shown in Fig. 1. When β_{j+1,j−1} is rotated from 0° to 360°, the minimal value of E(α_{i−1,i}, β_{j+1,j})_θ is called the minimal matching error between α_{i−1,i+1} and β_{j+1,j−1}, and is denoted by E(α_{i−1,i+1}, β_{j+1,j−1})_min. The corresponding rotation angle is denoted by θ(α_{i−1,i+1}, β_{j+1,j−1})_min. In the following, if no confusion arises, they are simply written as E_min and θ_min. E_min is obtained from

E_min = min{E_0, E_1, ..., E_359}    (5)

If E_min is small compared with the threshold value T_E1, the partial curves α_{i−1,i+1} and β_{j+1,j−1} are said to be "matched". Then the clockwise neighbour of α_{i−1,i+1}, i.e., the curve segment α_{i+1,i+2}, is added to the end of α_{i−1,i+1}, the counterclockwise neighbour of β_{j+1,j−1}, i.e., β_{j−1,j−2}, is added to the end of β_{j+1,j−1}, and the matching procedure related
above is performed again. Note here that the threshold value is dynamically increased by T_E1, i.e., the threshold value is set at 2T_E1. If E(α_{i+1,i+2}, β_{j−1,j−2})_min is smaller than 2T_E1, and the absolute value of the difference between θ(α_{i−1,i+1}, β_{j+1,j−1})_min and θ(α_{i+1,i+2}, β_{j−1,j−2})_min is smaller than the threshold value T_θ/2, the partial curves α_{i+1,i+2} and β_{j−1,j−2} are said to be "matched". This repetition continues until "unmatched" curve segments are encountered. Likewise, this procedure is also applied to the counterclockwise neighbours of α_{i−1,i+1} and the clockwise neighbours of β_{j+1,j−1}; here, it is worth noting that these new curve segments are added to the beginning of α_{i−1,i+1} and β_{j+1,j−1}. The repetition stops when "unmatched" curve segments are encountered. These consecutive curve segments form a candidate LCMC. The above procedure is applied to all curve segments in S_α and S_β. LCMC candidates whose number of curve segments is greater than the threshold value T_L are passed to the next step for the decision of the LCMC.
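The bidirectional growth of a matched run described above can be sketched as follows. grow_matched_run is an illustrative helper, and matched(i, j) is a hypothetical predicate standing in for the thresholded matching test (E_min against 2T_E1 and the T_θ/2 angle check).

```python
def grow_matched_run(matched, i, j, M, N):
    """Grow an LCMC candidate from a matched seed pair (segment i of
    S_alpha, segment j of S_beta): extend clockwise on alpha and
    counterclockwise on beta, then the opposite way, until an
    unmatched pair is met (indices taken modulo M and N)."""
    run = [(i, j)]
    a, b = (i + 1) % M, (j - 1) % N          # forward growth
    while matched(a, b) and (a, b) != (i, j):
        run.append((a, b))
        a, b = (a + 1) % M, (b - 1) % N
    a, b = (i - 1) % M, (j + 1) % N          # backward growth
    while matched(a, b) and (a, b) != (i, j):
        run.insert(0, (a, b))
        a, b = (a - 1) % M, (b + 1) % N
    return run

# toy example: on 6-segment curves, only pairs (1,4), (2,3), (3,2) match
pairs = {(1, 4), (2, 3), (3, 2)}
run = grow_matched_run(lambda a, b: (a, b) in pairs, 2, 3, 6, 6)
print(run)   # [(1, 4), (2, 3), (3, 2)]
```

A candidate is kept only if len(run) exceeds the threshold T_L, as stated in the text.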
2.3.2 LCMC Decision
For the k-th LCMC candidate (k = 0, 1, ..., K, where K is the total number of LCMC candidates), its minimal matching error is recalculated by overlapping the centres of the corresponding consecutive curve segments and rotating the curve segments belonging to S_β from θ_min − T_θ to θ_min + T_θ by 1° per step. The LCMC candidate whose minimal matching error is smallest is taken as the LCMC, at which the two curves match optimally.
3. Digital Implementation
To implement the above algorithm, it is necessary to define the digital curve, the digital curvature and the digital matching error. In Cartesian coordinates, the coordinate functions x(s) and y(s) of a closed curve are digitally expressed by a set of Cartesian grid samples {x_i, y_i} for i = 1, 2, ..., N (modulo N). The digital curvature at point i on the curve can be calculated by
c_i = Δx_i Δ²y_i − Δy_i Δ²x_i    (6)
where Δ is the difference operator and Δ² is the second-order difference [3]. The digital Gaussian function in [6] with a window size of K = 3 is employed here to generate smoothing functions at various values of σ; it is given by

h[0] = 0.2261,  h[1] = 0.5478,  h[2] = 0.2261    (7)

where h[1] is the center value and Σ_k h[k] = 1 (k = 0, 1, 2). This digital function has been mentioned in [7] and [8] as the best approximation of the Gaussian distribution. For digital functions with higher values of σ, the above K = 3 function is used in a repeated convolution process: a 2(j+1)+1 point digital smoothing function is created by repeating the self-convolution j times. Note here that the digital Gaussian smoothing function for the largest σ must have a window size no larger than the perimeter arc length N of the curve. A multiscale representation of the digital boundary curve from σ = 0 to σ_max can be constructed with the digital function defined above. Therefore, the multiscale digital curvature can be obtained according to equation (6). Then, for each point i, a searching procedure is applied to detect the local maximum of absolute curvature. Points on the curve with local maxima of absolute curvature are considered as dominant points. For any two objects A and B, let α and β represent their digital boundary curves. Then α and β can be expressed by the sets of digital points on the boundary curves, i.e., α = {(x_0, y_0), (x_1, y_1), ..., (x_{M−1}, y_{M−1})}, β = {(x_0, y_0), (x_1, y_1), ..., (x_{N−1}, y_{N−1})}. Their dominant points can be obtained by the method related above, and their segmentation can be performed according to the method related in section 2.2. The digital curve segments are also expressed by equation (3). Hereafter, if no confusion arises, the digital curve segments are also simply called curve segments. Next, the matching procedure is applied to these digital curve segments. The matching error shown in equation (4) is digitally computed by

E(α_{i−1,i}, β_{j+1,j})_θ = Σ_{p=0,q=0}^{max{P,Q}} (S_Δ(p,p+1,q) + S_Δ(q,q+1,p+1)) + Σ_{u=0,v=0}^{max{U,V}} (S_Δ(u,u+1,v) + S_Δ(v,v+1,u+1))    (8)

where P, Q, U and V are the numbers of digital points of the curve segments β_{j+1,j}, α_{i−1,i}, β_{j,j−1} and α_{i,i+1}, respectively. As shown in Fig. 1, S_Δ(p,p+1,q) is the area of the triangle formed by the points p, p+1 and q. Similarly, S_Δ(q,q+1,p+1), S_Δ(u,u+1,v) and S_Δ(v,v+1,u+1) have the same meanings. Here, it is worth noting that if the number of digital points included in a curve segment is less than that of the curve segment it is compared with, its start point or terminal point is used to correspond to the remaining points of the other segment so as to continue the calculation of equation (8). Which of them is used is decided by the tracing direction along the curve segment (clockwise or counterclockwise). For example, in the region D2 of Fig. 1, the digital matching error is calculated, starting from the overlapped dominant point i (or j), by taking one point from each of the curve segments α_{i−1,i} and β_{j+1,j} and putting them into the first term of
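The S_Δ terms of equation (8) are plain triangle areas. The sketch below computes them with the shoelace formula; digital_matching_error is a simplified illustration of one region's accumulation (reusing the shorter segment's end point, as described in the text), not the authors' exact implementation.

```python
def tri_area(p, q, r):
    """Area of the triangle p, q, r (shoelace formula), as used for
    the S_triangle terms of eq. (8)."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def digital_matching_error(seg_a, seg_b):
    """Accumulate one region's sum of eq. (8): pair the points of two
    segments (reusing the last point when one segment is shorter) and
    add the two triangle areas per step."""
    n = max(len(seg_a), len(seg_b))
    get = lambda s, i: s[min(i, len(s) - 1)]
    e = 0.0
    for i in range(n - 1):
        p, p1 = get(seg_a, i), get(seg_a, i + 1)
        q, q1 = get(seg_b, i), get(seg_b, i + 1)
        e += tri_area(p, p1, q) + tri_area(q, q1, p1)
    return e

print(tri_area((0, 0), (1, 0), (0, 1)))   # 0.5
```

Identical segments give a zero error, while the error grows with the area enclosed between the two polylines, mirroring the double integrals of equation (4).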
equation (8). Because the number of points included in the curve segment α_{i−1,i} is less than that in β_{j+1,j}, the start point of α_{i−1,i}, i.e., the point i−1, is employed to continue the calculation for the remaining points of β_{j+1,j}. This calculation stops at the terminal point of β_{j+1,j}, i.e., the point j+1. The same procedure is also applied to the region D1. The partial digital curve β_{j+1,j−1} is rotated from 0° to 360° by 1° per step. After each rotation, the matching error is computed, and the minimal matching error is obtained according to equation (5).

Table 1. LCMC candidates obtained

No. | Curve segments of the object on left | Curve segments of the object on right | Overlapped dominant point
0 | 6-5-4-3-2 | 17-18-19-0-1 | 3 left and 1 right
1 | 6-5-4-3-2-1-0 | 0-1-2-3-4-5-6 | 2 left and 5 right
2 | 6-5-4-3-2-1-0 | 0-1-2-3-4-5-6 | 1 left and 6 right
3 | 9-8-7-6-5-4-3-2 | 1-2-3-4-5-6-7-8 | 3 left and 2 right

4. Application and Experiment
The assumed application model of this algorithm is a robot with a mounted camera that assembles machine parts. The experiment is performed with a real image. Fig. 2 shows the input image after binarization, which includes two objects. Fig. 3 shows the extracted boundary curves, the detected dominant points (marked by small squares) numbered clockwise, and the final LCMC. Four LCMC candidates are listed in Table 1. The first LCMC candidate is decided as the LCMC and is shown in Fig. 3 by the thicker lines. Fig. 4 shows the assembled result after the object on the left is translated 162 dots along the X-axis and −32 dots along the Y-axis, and is rotated 90° clockwise. The values of T_E1, T_θ and T_L are 80, 30° and 4, respectively.
5. Conclusions and Future Works
This paper has proposed an algorithm for partial curve identification in 2-D space. The assumed application model is a robot with a mounted camera that assembles machine parts, for which the connection relationships among the machine parts are necessary. The problem of object connection relationships can be simplified to the problem of partial curve identification. Real images are employed to test the algorithm, and the experimental results show that it is effective.
Fig. 4 Assembled result.
This experiment employed images of objects without texture. If the objects have some texture, the boundary curve detection becomes more difficult. Moreover, if the input image includes three or more objects, a partial curve of one object may match the partial curves of multiple objects. In this case, it is necessary to employ the image values near the matched curves to decide the optimally matched partial curve. Further, in a vision-based robot assembly system this alone is not enough; it must be combined with other 3-D information. All of these are left for future work.
REFERENCES
[1] R. M. Haralick, "Digital step edges from zero crossing of second directional derivatives," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-6, no. 1, pp. 58-68, Jan. 1984.
[2] R. Mehrotra, K. R. Namuduri and N. Ranganathan, "Gabor filter-based edge detection," Pattern Recognition, vol. 25, no. 12, pp. 1479-1494, 1992.
[3] A. Rattarangsi and R. T. Chin, "Scale-based detection of corners of planar curves," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-14, no. 4, pp. 432-449, Apr. 1992.
[4] P. Zhu and P. M. Chirlian, "On critical point detection of digital shapes," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-17, no. 8, pp. 737-748, Aug. 1995.
[5] I. Sekita, T. Kurita and N. Otsu, "Complex autoregressive model for shape recognition," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-14, no. 4, pp. 489-496, Apr. 1992.
[6] P. J. Burt, "Fast filter transforms for image processing," Comput. Vision, Graphics & Image Processing, vol. 16, pp. 20-51, 1981.
[7] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. COM-31, no. 4, Apr. 1983.
[8] P. Meer, E. S. Baugher and A. Rosenfeld, "Frequency domain analysis and synthesis of image pyramid generating kernels," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-9, no. 4, pp. 512-522, Apr. 1988.
A fast active contour algorithm for object tracking in complex background
Chun Leung Lam, Shiu Yin Yuen
E-mail: cllam@ee.cityu.edu.hk, eekelvin@cityu.edu.hk
Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Abstract - The active contour is a powerful tool for object tracking. However, existing models are only applicable to tracking on simple images. Based on the idea of the original greedy algorithm, we present a fast greedy tracking algorithm to address the problem of tracking on complex real images. We demonstrate the algorithm by tracking objects of complex shape on a complex background.
1. Introduction
2D object tracking is a hot research topic in dynamic scene analysis. Different methods can be used: (1) image region based tracking algorithms [1]; (2) feature point based tracking algorithms [2]; and (3) line segment based tracking algorithms [3]. In general, these methods require an explicit definition of a dynamic model of the moving objects. Many objects cannot be described by simple geometric shapes (e.g. circle, ellipse) but need to be represented by complex contours. In order to model complex natural shape contours, Kass et al. [4] introduced the idea of the active contour (deformable contour). Active contour models have been successfully applied in computer vision problems such as optimal contour detection [5,6,7] and simple shape object tracking on a uniform background [7,8]. D. J. Williams and M. Shah [9] proposed a greedy active contour algorithm which is fast and stable. In section 2, the tracking results using their greedy algorithm are shown, which is useful in summarizing the difficulties of tracking by active contour. In section 3, a new "greedy tracking algorithm" is proposed and the results of using the proposed algorithm to track objects with complex shapes in complex backgrounds are given. Finally, a conclusion is given in section 4.
2. Object tracking by greedy algorithm
Suppose the contour C_t of a moving object M at time t is known. C_t can be used as an approximate contour of the target object at time t+1, provided that there is only a slight change in the target object. In order to find the best description C_{t+1} from C_t, an adjustment process is necessary to fine-tune the shape of the contour by using the information available in image frame I_{t+1}.
2.1. Classical active contour approach
The snake equations provide flexible tracking mechanisms that are driven by simulated forces derived from time-varying images. Let the contour be represented by v(s) = (x(s), y(s)). The classical active contour approach involves minimizing an energy function defined by

E_snake = ∫_0^1 [E_int(v(s)) + E_ext(v(s))] ds    (1)
for the active contour to move onto the object border. The internal energy is written as

E_int = (1/2)(α(s)|v_s(s)|^2 + β(s)|v_ss(s)|^2)    (2)

which serves as a smoothness constraint. The external energy E_ext consists of the external constraints and the image force.
2.2. Greedy algorithm
The greedy algorithm is a fast active contour algorithm proposed by D. J. Williams and M. Shah in 1991 [9]. Since the algorithm is both stable and fast, it is suitable as the adjustment process for object tracking. The quantity being minimized by the greedy algorithm is

E = ∫ [ α(s) nor(E_cont) + β(s) nor(E_curv) + γ(s) nor(E_image) ] ds     (3)

and the energy terms are defined by

E_cont,i = | d̄ − |v_i − v_{i−1}| |     (4)

E_curv,i = | v_{i−1} − 2v_i + v_{i+1} |²     (5)

where d̄ represents the average distance between contour points in the previous iteration cycle, and nor(E) represents a normalizing function with respect to the energy values of the neighbouring pixels. The values of E_cont, E_curv and E_image are all normalized to values between 0 and 1, and α=1, β=0 or 1 (depending on whether a corner is assumed at that location) and γ=1.2 in the greedy algorithm.
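Equations (3)-(5) can be sketched in Python with numpy; the candidate window, the gradient-magnitude image and the helper names below are illustrative assumptions, not taken from [9]:

```python
import numpy as np

def greedy_point_energies(contour, i, window, image_grad, d_bar):
    """Energies of eq. (3)-(5) for contour point i over candidate locations.

    contour: (n, 2) array of (x, y) points; window: candidate positions for
    v_i; image_grad: 2-D array of gradient magnitudes; d_bar: average point
    spacing from the previous iteration. Names are illustrative only.
    """
    n = len(contour)
    prev_pt, next_pt = contour[(i - 1) % n], contour[(i + 1) % n]
    e_cont = np.empty(len(window))
    e_curv = np.empty(len(window))
    e_img = np.empty(len(window))
    for k, cand in enumerate(window):
        e_cont[k] = abs(d_bar - np.linalg.norm(cand - prev_pt))       # eq. (4)
        e_curv[k] = np.linalg.norm(prev_pt - 2 * cand + next_pt) ** 2  # eq. (5)
        x, y = int(cand[0]), int(cand[1])
        e_img[k] = -image_grad[y, x]        # strong edges give low energy

    def nor(e):
        # min-max normalisation over the neighbourhood, as a stand-in for
        # the normalisation used in [9]
        rng = e.max() - e.min()
        return (e - e.min()) / rng if rng > 0 else np.zeros_like(e)

    alpha, beta, gamma = 1.0, 1.0, 1.2      # weights quoted in the text
    return alpha * nor(e_cont) + beta * nor(e_curv) + gamma * nor(e_img)
```

Each contour point is then moved to the candidate location with the smallest combined energy.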
(a) original image (b) result of greedy algorithm, requiring 0.16 s (c) result of greedy tracking algorithm, requiring 0.21 s
Figure 1. Translated square and circle. (20 points used, with window size 3x3)
Figure 2. 6 degree/frame rotating cup. (31 points used, with window size 3x3)
The complexity of the greedy algorithm is O(nm²) for a contour having n points, where the active contour may move to any point in a neighbourhood window of size m×m at each iteration. (The full greedy algorithm will not be listed in this paper; for more information, please refer to [9].) The results of applying the greedy algorithm to object tracking are given in Figs. 1b, 2b and 3b. In Fig. 1b, the circle and the square are both moved slightly to the right and downward. We find that two of the contour points on the right edge of the square are attracted by the border of the circle when the contour in Fig. 1a is used as the initial contour of Fig. 1b. In Fig. 2b, the cup has been rotated. The upper and lower portions of the arm contour are attracted by the internal structure and the background structural noise respectively. Although the rough shape of the body of the cup can be successfully extracted, the extracted border on the arm of the cup is not satisfactory. Fig. 3b is the result of tracking the human body silhouette in two consecutive images using the greedy algorithm. We can see that only the regions near the shoulders and the left foot are extracted correctly. The results show that the active contour model is sensitive to both the internal structure and background structural noise. Therefore, the active contour model can only be applied to tracking simple objects moving on a uniform background.
Figure 3. 0.5 frame/s walking man. (68 points used, with window size 5x5)
3. Greedy tracking algorithm
To incorporate more shape information into the model and reduce the influence of both internal structure and background structural noise when the method is applied to object tracking problems, we propose a "greedy tracking algorithm". The proposed model aims to maintain the shape and grey level values along the contour, in order to minimize the influence of both the internal structure and the background structural noise due to the complexity of the target object and the background. The structure of the proposed algorithm is similar to that of the original greedy algorithm.
The algorithm is an iterative process. In each turn, each contour point is allowed to move to the neighbouring location which has the lowest energy level, and the computational complexity is O(nm²). However, the definition of the energy function being minimized differs from the greedy algorithm. This gives the model more desirable behaviour when applied to the object tracking problem. The form of the energy function being minimized in the proposed algorithm is similar to equation (3). The internal energy of the contour is contributed by the sum of a continuity force and a curvature force. Let the contour be represented by {v_i} = {x(i), y(i)} where i = 0, 1, ..., n−1, and x(i), y(i) are pixel coordinates. The continuity energy is redefined as

E_cont,i = | |u_i^t| / |u_{i+1}^t| − |u_i^{t+1}| / |u_{i+1}^{t+1}| |,  where u_i = v_i − v_{i−1} in image I_t     (6)

(Note that all index arithmetic is modulo n.) The internal continuity energy is so defined since we allow the points to be unevenly distributed on the contour and we only want to maintain the approximate distribution of the contour points on a newer image frame. The internal curvature energy is redefined as

E_curv,i = | C_i^t − C_i^{t+1} |     (7)

with

C_i = | û_{i+1} − û_i |² (û_i × û_{i+1}) / |û_i × û_{i+1}|

Again, the curvature at each point is maintained by minimizing the curvature energy. The curvature vector C_i at point i has a magnitude equal to the square of the difference of the unit vectors û_{i+1} and û_i, and a direction parallel to the vector û_i × û_{i+1}. The continuity and curvature energies are so defined since it is assumed that the shape of the contour does not change much in a short time gap; the result of minimizing the continuity and curvature energies together is that the approximate shape of the contour across any two consecutive frames is maintained. Note that the originally assigned contour points can be of any shape (including low and high curvature points); this is a desirable property since many real objects have sharp curvature points, like corners. In the original active contour model, E_cont,i (equation (4)) and E_curv,i will be zero when the contour points are equally spaced and the curvature is zero. Thus the original active contour model is biased towards i) equally spaced contour points and ii) low curvature. Moreover, corners have to be specified using the special method of setting β = 0. This is undesirable since, during a motion, i) maintaining equally distributed feature points may not be the best strategy to represent a shape most faithfully (compactly); ii) an a priori assumption of low curvature is not particularly realistic in a shape representation; iii) this problem is even more pronounced since the motion and view changes may continuously produce points of sharp curvature as new occluding contours come into view. On the contrary, our method does not suffer from such anomalies. E_cont,i (equation (6)) and E_curv,i (equation (7)) will be zero merely when i) the spacing ratio of consecutive contour points and ii) the curvature do not change between frames.
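The redefined internal energies can be sketched in numpy, under the assumption that each frame's contour is stored as an (n, 2) array; the function names are ours, not the authors', and the curvature term uses the difference of unit tangents as a 2-D simplification:

```python
import numpy as np

def seg(c, i):
    """u_i = v_i - v_{i-1} (all index arithmetic modulo n)."""
    n = len(c)
    return c[i % n] - c[(i - 1) % n]

def tracking_cont_energy(c_t, c_t1, i):
    """Sketch of eq. (6): preserve the ratio of consecutive segment
    lengths between frame t (c_t) and frame t+1 (c_t1)."""
    r_t = np.linalg.norm(seg(c_t, i)) / np.linalg.norm(seg(c_t, i + 1))
    r_t1 = np.linalg.norm(seg(c_t1, i)) / np.linalg.norm(seg(c_t1, i + 1))
    return abs(r_t - r_t1)

def tracking_curv_energy(c_t, c_t1, i):
    """Sketch of eq. (7): preserve the turn of the unit tangents at each
    point across the two frames (2-D simplification of the C_i vector)."""
    def turn(c, i):
        u, u1 = seg(c, i), seg(c, i + 1)
        return u1 / np.linalg.norm(u1) - u / np.linalg.norm(u)
    return np.linalg.norm(turn(c_t, i) - turn(c_t1, i))
```

As the text requires, a pure translation of the contour between frames leaves both energies at zero, while a local distortion produces a positive penalty.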
Also, from equation (7), it is clear that the method does not have to take special care of corner contour points, and the appearance or disappearance of a corner point can be gradually accounted for by the equation. On the other hand, the external energy is defined as

E_ext = | Gbl_t(v) − Gbl_{t+1}(v) | − |∇I_{t+1}(v)|²     (8)

where Gbl_t is the Gaussian blurred image of I_t. Minimizing the external energy will cause the contour point to move to a new location where the approximate grey level value is maintained and the contrast is high. The proposed algorithm is listed below:
Greedy tracking algorithm
Input:  Images I_t, I_{t+1}, contour C_t of image I_t
Output: Adjusted contour C_{t+1} of image I_{t+1}

α = β = 1, γ = 1.2, ptsmoved = 0;
do {
    for i = 0 to n {                    // all index arithmetic is modulo n;
                                        // the first point is processed twice
        for j = 0 to m−1
            for k = 0 to m−1 {
                calculate E_cont,i(j,k), E_curv,i(j,k), E_image,i(j,k);
                nor(E_cont,i(j,k))  = E_cont,i(j,k)  / MAX(E_cont,i(j,k));
                nor(E_curv,i(j,k))  = E_curv,i(j,k)  / MAX(E_curv,i(j,k));
                nor(E_image,i(j,k)) = E_image,i(j,k) / MAX(E_image,i(j,k));
            }
        for j = 0 to m−1                // m×m = size of neighbourhood
            for k = 0 to m−1
                E_i(j,k) = α nor(E_cont,i(j,k)) + β nor(E_curv,i(j,k)) + γ nor(E_image,i(j,k));
        Locate smallest E_i(j,k);
        Move v_i to location with smallest E_i(j,k);
        ptsmoved += 1;
    }
} while ptsmoved < threshold;

Note that the first contour point v_0 is processed twice (as in the greedy algorithm), since the point v_{n−1} has not been updated when v_0 is processed. Reprocessing the point v_0 helps to make its behaviour more like that of the other points. Results of applying the greedy tracking algorithm to object tracking are given in Figs. 1c, 2c, 3c and 4. (Note that we use the same weight settings as in the greedy algorithm, α=β=1, γ=1.2. In contrast to the original greedy algorithm, we have no need to set β=0 for corner points.) We use grey level images of size 256x256 pixels. The processing time, the number of points and the window size used for each image (using a PC 486DX33) are listed under each picture. Figs. 2c, 4a and 4b are the results of tracking a rotating cup at time frames 2, 5 and 10 respectively, which demonstrates that the proposed algorithm is successful in tracking rigid objects against a complex background provided that the motion is slow. Fig. 3c is the result of tracking the human body silhouette in two consecutive images. The upper portion of the body is correctly extracted, which shows that the model is applicable to tracking non-rigid bodies of complex shape. However, the right foot is lost, because the proposed algorithm tends to maintain the shape of the contour across two consecutive image frames. This demonstrates that the algorithm only allows a small change in the shape of the contour across different frames.
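The external energy of equation (8) compares Gaussian-blurred grey levels across frames while rewarding high contrast; a numpy-only sketch follows (the blur kernel width, the reflected border handling and the helper names are our assumptions):

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with reflected borders (numpy only)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, r, mode="reflect").astype(float)
    out = np.apply_along_axis(lambda row: np.convolve(row, k, "same"), 1, pad)
    out = np.apply_along_axis(lambda col: np.convolve(col, k, "same"), 0, out)
    return out[r:-r, r:-r]

def external_energy(I_t, I_t1, v):
    """Sketch of eq. (8) at pixel v = (x, y): keep the blurred grey level
    similar across frames while favouring high contrast in frame t+1."""
    gb_t, gb_t1 = gaussian_blur(I_t), gaussian_blur(I_t1)
    gy, gx = np.gradient(I_t1.astype(float))
    x, y = v
    return abs(gb_t[y, x] - gb_t1[y, x]) - (gx[y, x]**2 + gy[y, x]**2)
```

On identical frames the first term vanishes, so pixels on strong edges receive a large negative (favourable) energy while flat regions score zero.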
Figure 4. Rotating cup image sequence: Fig. 2a(I1) → 2c(I2) → 4a(I5) → 4b(I10). (31 points used, with window size 3x3)
4. Conclusions
A fast "greedy tracking algorithm" is proposed in this paper. The proposed model aims to maintain the shape and grey level values along the contour, in order to minimize the influence of the internal structure and background structural noise due to either the surface texture complexity of the target object or the background. The proposed algorithm has been applied to tracking objects in complex real images, and the results show that the model is quite successful in tracking rigid or non-rigid objects provided the changes are slight. The tracking results are satisfactory even when the shape of the object is complex. On the other hand, although maintaining the shape of the contour is helpful in tracking complex objects, it limits the flexibility of the model since it only allows slight changes to occur. Equivalently, the method requires that successive frames be closely spaced in time. This is a compromise which we have to make in our approach.
References
1. D.S. Kalivas, A. Sawchuk, "A Region Matching Motion Estimation Algorithm", CVGIP: Image Understanding, Vol.54(2), 275-288, 1991.
2. S.K. Sethi, R. Jain, "Finding Trajectories of Feature Points in a Monocular Image Sequence", IEEE Trans. PAMI, Vol.9(1), 56-73, 1987.
3. R. Deriche, O. Faugeras, "Tracking Line Segments", Image and Vision Computing, Vol.8(4), 261-270, 1990.
4. M. Kass, A. Witkin, D. Terzopoulos, "Snakes: Active Contour Models", Proc. Int. Conf. Comp. Vis., 259-268, 1987.
5. A.A. Amini, T.E. Weymouth, R.C. Jain, "Using Dynamic Programming for Solving Variational Problems in Vision", IEEE Trans. PAMI, Vol.12(9), 855-867, 1990.
6. C.A. Davatzikos, J.L. Prince, "An Active Contour Model for Mapping the Cortex", IEEE Trans. Medical Imaging, Vol.14(1), 65-80, 1995.
7. D. Geiger, A. Gupta, L.A. Costa, J. Vlontzos, "Dynamic Programming for Detecting, Tracking, and Matching Deformable Contours", IEEE Trans. PAMI, Vol.17(3), 294-302, 1995.
8. F. Leymarie, M.D. Levine, "Tracking Deformable Objects in the Plane Using an Active Contour Model", IEEE Trans. PAMI, Vol.15(6), 1993.
9. D.J. Williams, M. Shah, "A Fast Algorithm for Active Contours and Curvature Estimation", CVGIP: Image Understanding, Vol.55(1), 14-26, 1992.
Proceedings IWISP '96," 4- 7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) 9 1996 Elsevier Science B.V. All rights reserved.
The Two-Point Combinatorial Probabilistic Hough Transform for Circle Detection (C2PHT)
J. Y. Goulermas and P. Liatsis
Control Systems Centre, Dept. of EE&E, UMIST, PO Box 88, Manchester M60 1QD, UK
e-mail: {goulerma/panos}@csc.umist.ac.uk
A novel Hough Transform (HT) for circle detection, the C2PHT, is presented. While other Combinatorial Probabilistic HTs reduce the generation of redundant evidence by sampling point-triples, the C2PHT achieves a much higher reduction in two ways. Firstly, by using the edge gradient information, it allows point-pairs to define circles and consequently decreases the sampling complexity from O(N³) to O(N²). Secondly, the transformation is conditional, that is, not all the pairs are eligible to vote. The evidence is gathered in a very sparse parameter space, so that peak recovery is readily despatched. The result is high speed, increased accuracy and very low memory requirements.
INTRODUCTION
The Hough Transform >
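The core geometric idea of the abstract — that two edge points with known gradient directions suffice to hypothesise a circle — can be sketched as follows. This is a hypothetical numpy helper illustrating the geometry, not the C2PHT voting rule itself: each gradient points along a radius, so the centre lies on the line p + t·g, and intersecting the two lines gives the centre.

```python
import numpy as np

def circle_from_two_points(p1, g1, p2, g2):
    """Candidate circle from two edge points and their gradient directions.

    Solves p1 + t1*g1 = p2 + t2*g2 for the centre; returns (centre, radius)
    or None when the gradients are (nearly) parallel and no unique
    intersection exists. Illustrative helper, not the authors' code.
    """
    p1, g1, p2, g2 = (np.asarray(v, float) for v in (p1, g1, p2, g2))
    A = np.column_stack([g1, -g2])
    if abs(np.linalg.det(A)) < 1e-9:
        return None                      # parallel gradients: reject the pair
    t = np.linalg.solve(A, p2 - p1)
    centre = p1 + t[0] * g1
    radius = np.linalg.norm(p1 - centre)
    return centre, radius
```

Rejecting near-parallel pairs is one natural way such a transform can be made conditional, so that not every sampled pair casts a vote.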
Fig. 4: Block diagram of the sensing unit and PC add-on card
The block diagram of the sensing unit and PC add-on card can be seen in Fig. 4. This adapter enables the amplification, A/D conversion and storage of both output video signals, video 1 and video 2, of the sensor (sample frequency approx. 1 MHz, 8-bit representation) into the RAM or hard disk of standard PC compatible computers. The data can then be processed using the algorithms mentioned above.
6. Conclusion
Computer simulation of the designed optoelectronic method for measuring the aircraft's TVV proved that high measurement accuracy is attainable and that the design of a real measuring device is feasible. It enables the parameters of the algorithms applied for TVV determination to be optimized, and the machine time needed for the computations to be quantified and minimized.
References
[1] RICNY, V., MIKULEC, J.: Measuring Flying Object Velocity with CCD Sensors. IEEE Aerospace and Electronic Systems, Vol.9, No.6, June 1994, pp.3-6.
[2] JURIK, R.: PC Add-on Card for the Double-line CCD Sensor. Proceedings of the 6th National Scientific Conference "Radioelektronika 96", Faculty of Electrical Engineering and Computer Science, TU Brno, 1996, pp.95-96.
Session F: TEXTURE ANALYSIS
Rotation Invariant Texture Classification Schemes using GMRFs and Wavelets Robert Porter* and Nishan Canagarajah* Image Communications Group, Centre for Communications Research, University of Bristol, UK. Abstract Many texture classification schemes suffer from a number of drawbacks. They require an excessively large image area for texture analysis, use a large number of features to represent each texture and are often computationally very demanding. Furthermore, few classification schemes have the ability to maintain a high classification rate for textures that have undergone a rotation. In this paper, we present two new rotation invariant texture classification schemes based on Gaussian Markov random fields and the wavelet transform. These schemes offer a high classification performance on textures at any orientation using significantly fewer features and a smaller area of analysis than most existing schemes.
1. Introduction
Texture classification is a difficult but important area of image analysis with a wide variety of applications ranging from remote sensing and crop classification to medical diagnosis. A number of approaches to this problem have been proposed over recent years including stochastic models such as Gaussian Markov Random Fields (GMRFs) [1] and autoregression [2, 3], statistical analysis methods [4] and spatial frequency based techniques [5, 6] amongst many others. However, many of the existing methods require a large number of features to describe each texture which can lead to an unmanageable size of feature space [4]. Furthermore, the feature extraction techniques employed are often computationally very demanding [4] and require an excessively large image area for the analysis [4, 6]. This is clearly undesirable if only small texture samples are available or if the features are to be applied to a segmentation problem requiring high resolution. Another drawback of the majority of classification schemes is their inability to maintain a high classification rate when the textures for classification have undergone a rotation [5]. Here, two new classification schemes are proposed, employing features extracted using either wavelet analysis or Gaussian Markov random field modelling on a small area of the image. It is shown that these schemes require significantly fewer features than most others and provide high performance rotation invariant texture classification.
2. Proposed Schemes
2.1 The Wavelet Transform
The first approach derives features from a 3-level wavelet decomposition of a small area (16x16) of the image. Fig. 1(a) shows the 10 main wavelet channels resulting from such a decomposition. A feature vector made up of the average energies within these channels was successfully employed in segmenting textured images in [7]. However, the HH channels in each level of decomposition tend to contain the majority of the noise in the image and were found to degrade the performance when used for texture classification. Therefore, only the remaining seven channels were chosen to provide features for texture classification (the numbered channels in Fig. 1(b)). The energy in each of the chosen wavelet channels is calculated to create a seven-dimensional feature vector for texture classification. The energy of a wavelet channel is given simply by the mean magnitude of its wavelet coefficients, i.e. e_cn, the energy in the nth channel, is given by:

e_cn = (1 / MN) Σ_{i=1}^{M} Σ_{j=1}^{N} |x(i, j)|     (1)
where the channel is of dimensions M by N, i and j are the rows and columns of the channel and x is a wavelet coefficient within the channel. Unfortunately, these features are not rotation invariant, since different features are used to represent the texture's horizontal and vertical frequency components. Rotation invariance can be achieved by combining the horizontal and vertical frequency components to form single features. Hence, the pairs of diagonally opposite LH and HL channels in each level of decomposition are grouped together to produce four main frequency or scale bands in the proposed scheme, as illustrated in Fig. 1(c). The energy in each of the four chosen bands is calculated (using equation 1) to create a four-dimensional feature vector which is then used in the classification algorithm. This approach is thus based entirely on the composition of spatial frequencies within the texture and is not heavily dependent on the texture's directionality. Although this can have disadvantages in distinguishing between textures of very similar spatial frequency, it provides a robust rotation invariant set of features for texture classification.
Figure 1 - (a) Ten main channels of a 3-level wavelet decomposition of an image; (b) Wavelet channels used to produce features for texture classification; (c) Grouping of wavelet channels to form the 4 bands used to produce rotation invariant features.
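The channel energies of equation (1) and the LH/HL grouping can be illustrated with a simple averaging Haar decomposition; the paper does not specify its wavelet filters, so Haar is an assumption, and the function names are ours:

```python
import numpy as np

def haar_step(a):
    """One 2-D Haar analysis step -> (LL, LH, HL, HH); an illustrative
    stand-in for the paper's unspecified wavelet filters."""
    a = a.astype(float)
    lo_r = (a[:, 0::2] + a[:, 1::2]) / 2.0   # row lowpass
    hi_r = (a[:, 0::2] - a[:, 1::2]) / 2.0   # row highpass
    ll = (lo_r[0::2] + lo_r[1::2]) / 2.0
    hl = (lo_r[0::2] - lo_r[1::2]) / 2.0
    lh = (hi_r[0::2] + hi_r[1::2]) / 2.0
    hh = (hi_r[0::2] - hi_r[1::2]) / 2.0
    return ll, lh, hl, hh

def rotation_invariant_features(patch, levels=3):
    """Eq. (1) channel energies with the LH/HL pair merged at each level,
    plus the deepest LL band -> 4 features for a 16x16 patch."""
    feats = []
    ll = patch
    for _ in range(levels):
        ll, lh, hl, hh = haar_step(ll)       # HH channels are discarded
        band = np.concatenate([lh.ravel(), hl.ravel()])
        feats.append(np.mean(np.abs(band)))  # mean magnitude = channel energy
    feats.append(np.mean(np.abs(ll)))
    return np.array(feats)
```

Because the horizontal and vertical detail energies are merged, the feature vector is unchanged for a patch rotated by 90 degrees, which is the mechanism behind the rotation invariance described above.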
2.2 Gaussian Markov Random Fields
GMRFs have been shown to perform well both in texture classification [1] and image segmentation. Here, the texture can be represented as a set of zero mean observations,

y(s), s ∈ Ω, Ω = {s = (i, j): 0 ≤ i, j ≤ M−1}     (2)

for an M×M lattice. The GMRF model assumes the observations obey the following equation [1]:

y(s) = Σ_{r ∈ N_s} θ_r y(s + r) + e(s)     (3)

where N_s is the neighbour set, θ_r is the GMRF parameter for neighbour r and e(s) is a stationary Gaussian noise sequence. The neighbour set is assumed to be symmetric:

θ_r = θ_{−r}, for all r ∈ N_s     (4)

The GMRF parameters and the variance, v, of the noise source can be estimated for a given texture using the least squares approach [1] and are often successfully employed as features for texture classification. However, these features are not rotation invariant since each pair of neighbours can only represent the texture in a single direction. It was found that in order to achieve rotation invariance, the neighbour set should be circularly symmetric so that each GMRF parameter depends on neighbours in all directions. The neighbour sets for the 1st, 2nd and 3rd order circular GMRFs are shown in Fig. 2. The grey levels of neighbours which do not fall exactly in the centre of pixels can be estimated by interpolation. This model is the GMRF equivalent of the autoregressive models in [2] and [3], but was found to give a high classification performance without the need for multiresolution analysis [3] and is thus more computationally efficient. For the third order circular GMRF, just three parameters exist for the three sets of circularly symmetric neighbours. The features used for texture classification comprise these three parameters and the variance parameter, extracted using the least squares approach from a 16x16 area of the image. The third order GMRF is chosen to balance a high performance with a small number of features.
Figure 2 - Neighbour sets for 1st, 2nd and 3rd order circular GMRFs.
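The least squares estimation of the symmetric GMRF parameters and noise variance can be sketched for a first-order neighbour set; the toroidal border handling and the helper names are our simplifying assumptions, and the circular variant only changes how the neighbour sums are formed:

```python
import numpy as np

def gmrf_ls_estimate(y):
    """Least-squares estimate of first-order symmetric GMRF parameters and
    noise variance from a zero-mean texture patch (toroidal borders).

    Implements the regression of eq. (3)-(4): each pixel is predicted from
    the sums of its symmetric neighbour pairs, one theta per pair.
    """
    q1 = np.roll(y, 1, axis=1) + np.roll(y, -1, axis=1)   # horizontal pair
    q2 = np.roll(y, 1, axis=0) + np.roll(y, -1, axis=0)   # vertical pair
    Q = np.stack([q1.ravel(), q2.ravel()], axis=1)
    theta, *_ = np.linalg.lstsq(Q, y.ravel(), rcond=None)
    resid = y.ravel() - Q @ theta
    return theta, np.mean(resid**2)                        # (parameters, v)
```

A perfectly predictable texture, such as a pure sinusoid along one axis, yields a near-zero variance estimate, while an unstructured patch leaves most of its energy in the noise term.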
3. Classification Results
Sixteen 256x256 Brodatz textures [8] were used to test the performance of the features. One sample image of each texture was used to provide several 16x16 sub-images with which to train the classification algorithm. A further 7 sample images of each texture were presented to the algorithm in a random order as unknown textures for classification. A minimum distance classifier was employed (using the Mahalanobis distance [6]) to perform the actual classification. Training and classification were first performed on the original textures, producing the first column of results in Table 1. The training set was then presented at angles of 0, 30, 45 and 60 degrees and the textures for classification at 20, 70, 90, 120, 135 and 150 degrees, yielding the second column of results in Table 1. The classification results for the two proposed rotation invariant schemes were compared to those using features from the traditional 3rd order GMRF and from the wavelet transform without the combination of channels. Table 1 summarises the results. Although the third order GMRF parameters give 100% correct classification when the textures are presented at their original orientation, they perform very poorly on the rotated textures, classifying only 45.8% of the samples correctly (see confusion matrix in Fig. 3a). This is due to the strong directional dependence of the parameters in the traditional GMRF model. The proposed circular GMRF model uses a circularly symmetric neighbour set to remove this directional dependence, resulting in a high classification performance both for the textures at their original orientations (93.8%) and for the rotated textures (95.1%). The confusion matrix in Fig. 3(b) illustrates this performance for the rotated textures. Misclassifications tend to occur either for visually very similar textures (e.g. paper and sand) or for textures with a high level of directionality which cannot be identified using a circular model (e.g. wood).
The wavelet-based features using seven channels of the wavelet transform also have a strong directional dependence. These features give a high classification performance for the original textures (99.1%), but a mediocre performance for the rotated textures (86.5%, see Fig. 3c). By combining the directionally dependent wavelet channels, as in the proposed scheme, a high level of rotation invariance is achieved giving a correct classification rate of 95.5% for the original textures and 95.8% for the rotated textures. The scheme's performance for the rotated textures is illustrated in the confusion matrix in Fig. 3(d). The misclassifications occur only on the highly directional textures such as wood and raffia. This is because the directional information is lost when the wavelet channels are combined. For each of the proposed schemes, there is a slight degradation in their performance on the original textures compared to the non-rotation invariant approaches. This is due to the loss in directional information on making the schemes rotation invariant.
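The minimum Mahalanobis-distance classifier used above can be sketched as follows; pooling a single covariance estimate across all classes is our assumption, since the paper does not detail the implementation:

```python
import numpy as np

class MinDistanceClassifier:
    """Minimum Mahalanobis-distance classifier: one mean per texture class
    and a pooled covariance estimated from the training vectors (sketch)."""

    def fit(self, X, labels):
        X, labels = np.asarray(X, float), np.asarray(labels)
        self.classes = np.unique(labels)
        self.means = np.array([X[labels == c].mean(axis=0)
                               for c in self.classes])
        # centre each sample on its own class mean, then pool the covariance
        centred = X - self.means[np.searchsorted(self.classes, labels)]
        self.inv_cov = np.linalg.inv(np.cov(centred.T))
        return self

    def predict(self, x):
        d = self.means - np.asarray(x, float)
        # squared Mahalanobis distance to every class mean
        dists = np.einsum('ij,jk,ik->i', d, self.inv_cov, d)
        return self.classes[np.argmin(dists)]
```

An unknown feature vector is assigned to the class whose mean is nearest under the pooled-covariance metric, which is exactly the decision rule a minimum distance classifier applies to the four-dimensional feature vectors above.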
4. Conclusion
Two novel texture classification schemes have been proposed, the first using the wavelet transform and the second using Gaussian Markov random fields. These schemes exhibit comparable performance to existing methods but both use a significantly smaller feature space. Furthermore, the features are robust and computationally inexpensive (both methods are amenable to fast implementation) and only a small analysis area for feature extraction is required, as is desirable for texture segmentation applications. In addition, unlike most existing techniques, the proposed schemes are invariant to rotations of the textures to be classified, attaining the same high classification performance on the textures at all orientations. The traditional GMRF approach or the non-rotation invariant wavelet method are obviously preferable if the textures are guaranteed to occur only at the orientation they have been trained at. However, the proposed schemes are far superior when the rotation of the texture is not known a priori, as is often the case in real applications. The wavelet-based approach is especially favourable, since it gives a higher performance, is computationally more efficient and its features are easily derivable from its non-rotation invariant counterpart.
Feature Set                                             | Original Textures | Rotated Textures
3rd order GMRFs (7 features)                            | 100.0%            | 45.8%
3rd order Circular GMRFs (4 features)                   | 93.8%             | 95.1%
Wavelet-Based Features (7 features)                     | 99.1%             | 86.5%
Rotation Invariant Wavelet-Based Features (4 features)  | 95.5%             | 95.8%

Table 1 - Texture Classification Performance Results
References
[1] R. Chellappa and S. Chatterjee, "Classification of Textures Using Gaussian Markov Random Fields," IEEE Trans. Acoustics, Speech, and Signal Processing, vol.33, no.4, pp.959-963, Aug. 1985.
[2] R.L. Kashyap and A. Khotanzad, "A Model-Based Method for Rotation Invariant Texture Classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol.8, no.4, July 1986.
[3] J. Mao and A.K. Jain, "Texture Classification and Segmentation Using Multiresolution Simultaneous Autoregressive Models," Pattern Recognition, vol.25, no.2, pp.173-188, Feb. 1992.
[4] Y.Q. Chen, M.S. Nixon and D.W. Thomas, "Statistical Geometrical Features for Texture Classification," Pattern Recognition, vol.28, no.4, pp.537-552, Apr. 1995.
[5] K. Etemad and R. Chellappa, "Separability Based Tree Structured Local Basis Selection for Texture Classification," Proc. International Conference on Image Processing 1995, pp.441-445.
[6] T. Chang and C.-C.J. Kuo, "Texture Analysis and Classification with Tree-Structured Wavelet Transform," IEEE Trans. Image Processing, vol.2, no.4, pp.429-441, Oct. 1993.
[7] R. Porter and C.N. Canagarajah, "A Robust Automatic Clustering Scheme for Image Segmentation using Wavelets," IEEE Trans. Image Processing, vol.5, no.4, pp.662-665, Apr. 1996.
[8] P. Brodatz, Textures: A Photographic Album for Artists and Designers. Dover: New York, 1966.
Figure 3 - Confusion matrices for classification results of rotated textures using: (a) GMRF features; (b) circular GMRF features; (c) wavelet-based features; (d) rotation invariant wavelet-based features.
A NEW METHOD FOR DESCRIBING TEXTURE
D. T. Pham* and B. G. Çetiner+
*Intelligent Systems Laboratory, School of Engineering, University of Wales, Cardiff, PO Box 917, Newport Road, Cardiff, CF2 1XH, UK.
+Istanbul Technical University, Faculty of Aeronautics Engineering, Maslak, Istanbul, Turkey.
ABSTRACT A new method is presented for obtaining feature vectors for describing texture. The method uses grey level difference matrices that are reminiscent of co-occurrence matrices but are much simpler to compute. Textural feature vectors are classified using artificial neural networks (ANNs). Comparative results for the new method and the standard Spatial Grey Level Dependence (SGLD) method are provided. Key words: Texture Analysis, Texture Classification, Neural Networks.
1 INTRODUCTION Texture is a fundamental stimulus for visual perception. Natural image analysis systems, such as the human visual system, use texture as an aid in segmentation and interpretation of scenes. Despite its importance, there is no generally accepted definition of texture and no agreement on how to measure it. This paper describes a new second-order statistics method for computing textural features and provides the results of using neural networks to recognise textures based on those features. Comparative results for the Spatial Grey Level Dependence (SGLD) or co-occurrence matrix method [Haralick et al., 1973] are also presented.
2 GREY LEVEL DIFFERENCE (GLD) METHOD FOR TEXTURE ANALYSIS
The method involves computing GLD matrices, each element of which is the sum of scaled grey level differences between neighbouring pixels. Grey levels are quantised into groups to reduce the dimensions of the matrix, the number of groups being the number of rows/columns in the matrix. For each interpixel distance d and direction θ, a matrix can be computed. The concepts of interpixel distance and direction are similar to those adopted in the SGLD method. For example, with d=1, pixels that are immediately next to the pixel of interest are considered and with d=2, pixels that are separated by one pixel from the pixel of interest are used. There is a maximum of 8 directions, namely θ = 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. These define the position of a neighbouring pixel relative to the pixel of interest. For instance, the 0° and 180° neighbours of a pixel are the pixels to its right and to its left respectively. The GLD matrix for a given d and θ is computed as follows:
(i) Quantise the grey levels into n groups. This fixes the dimensions of the GLD matrices to n×n.
(ii) Initialise all elements of the GLD matrix to zero.
(iii) Select the pixel to be processed in the image window. Call this pixel 1.
(iv) Find the neighbour of pixel 1 at the specified interpixel distance and in the specified direction. Call this pixel 2.
(v) Calculate the scaled grey levels of pixels 1 and 2, namely:
P1 = p1 / Ng,   P2 = p2 / Ng

where p1 and p2 are the raw grey levels of pixels 1 and 2. P1 and P2 range between 0 and 1. Ng is the number of grey levels in the image.
(vi) Calculate the scaled grey level difference between pixels 1 and 2:

GLD = |P1 − P2| + 1
Thus, GLD is a number between 1 and 2. GLD is equal to 1 when P1 and P2 are the same and to 2 when P1 is 1 and P2 is 0 or vice versa. GLD is arranged to be between 1 and 2 so that elements representing zero grey level differences are distinguished from ordinary (initialised) zero elements in the GLD matrix.
(vii) Determine the GLD matrix element that corresponds to the scaled grey levels P1 and P2, that is, the element to be updated. The position (i, j) of the element is calculated as follows: i = INT[n·P1]; j = INT[n·P2], where INT is a function that converts the real numbers n·P1 and n·P2 into the nearest integers.
(viii) Update the GLD matrix element found in the previous step by adding to it the GLD value obtained in step (vi), that is: new_GLD(i, j) = old_GLD(i, j) + GLD.
(ix) If all neighbouring pixels of pixel 1 have been processed then go to step (x). Otherwise, go to step (iv).
(x) If all pixels in the image window have been processed then STOP. Otherwise, go to step (iii).
As an example, consider the image window in Figure 1(a). The numbers of grey levels and grey level groups are 64 and 5 respectively. Let pixel 1 (with grey level equal to 48) be element (3, 3) and pixel 2 (with grey level equal to 35) be element (3, 4) in the image window. The scaled grey levels for these pixels are P1 = 48/64 = 0.75 and P2 = 35/64 = 0.547. The GLD value for these pixels is |P1 − P2| + 1 = 1.203. The GLD matrix element corresponding to P1 and P2 is element (4, 3). Thus, it is updated from its initial zero value to 1.203. Similarly, let pixel 1 be element (4, 1) and pixel 2 be element (4, 2). The scaled grey levels for these pixels are P1 = 0.75 and P2 = 0.5. The GLD value for these pixels is 1.25. Again, the GLD matrix element corresponding to P1 and P2 is element (4, 3). Thus, that element now becomes 2.453. The GLD matrix for the entire image window corresponding to an interpixel distance of 1 and a neighbouring pixel direction of 0° is shown in Figure 1(b).
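Steps (i)-(x) for a single (d, θ) pair can be sketched in numpy as follows; clamping the rounded indices to the matrix range and skipping neighbours that fall outside the window are our assumptions for boundary handling:

```python
import numpy as np

def gld_matrix(win, n_groups, d, angle, n_grey):
    """Grey Level Difference matrix for one (d, theta) per steps (i)-(x).

    win: 2-D integer image window; angle: degrees from {0, 45, ..., 315}.
    """
    # (row, col) offset of the neighbour for each direction
    off = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d),
           180: (0, -d), 225: (d, -d), 270: (d, 0), 315: (d, d)}[angle]
    M = np.zeros((n_groups, n_groups))          # steps (i)-(ii)
    rows, cols = win.shape
    for r in range(rows):                       # step (iii)
        for c in range(cols):
            r2, c2 = r + off[0], c + off[1]     # step (iv)
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue                        # neighbour outside the window
            P1, P2 = win[r, c] / n_grey, win[r2, c2] / n_grey   # step (v)
            i = min(int(round(n_groups * P1)), n_groups - 1)    # step (vii)
            j = min(int(round(n_groups * P2)), n_groups - 1)
            M[i, j] += abs(P1 - P2) + 1         # steps (vi), (viii)
    return M
```

Running it on the worked example's pixel pair (grey levels 48 and 35 with 64 grey levels and 5 groups) reproduces the update of element (4, 3) by 1.203 described in the text.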
[GLD matrix table of Figure 1(b): rows and columns indexed by grey level groups 0-12, 13-25, 26-38, ..., 52-63; surviving non-zero entries include 2.031, 2.453, 1, 2.219 and 5.219.]
Figure 1. (a) Image window (b) GLD matrix calculated from the image

3 CLASSIFICATION OF GLD MATRICES

GLD matrices were constructed for the 16 texture images from the Brodatz album [Brodatz, 1968]. These were 128x128 images of natural objects or scenes (for instance, reptile skin, grass lawn and beach pebbles). Each image was divided into 32x32 non-overlapping windows. This yielded a total of 256 patterns. The number of grey levels was 256. Eight grey level groups were employed, giving GLD matrices of size 8x8. In addition to individual GLD matrices for the eight directions, direction-invariant matrices were also computed by adding the corresponding elements of the individual matrices. Interpixel distances of 1 to 5 were adopted. This gave a total of 45 data sets, each with 256 patterns. A GLD matrix was obtained for each pattern and the matrix elements were used directly as features. Half of the feature vectors were selected randomly and employed as training examples. The remainder were used to test the classification accuracy of the trained classifiers. Thus, there were 45x128 feature vectors for training and the same number for testing.
The LVQ2 neural network with a conscience mechanism [Pham and Oztemel, 1994, 1996] was adopted as the tool for classifying the feature vectors into the correct texture class. That network was chosen after comparing its performance with the popular multi-layer perceptron classifier [Pham and Liu, 1995] on an experimental group of 9 data sets. The network had 64 inputs (the elements of the GLD matrix), 16 outputs (the texture classes) and 96 hidden Kohonen neurons. The number of Kohonen neurons was chosen empirically. To compare the proposed texture description method against the popular SGLD method, SGLD features were obtained for the same directions as for the proposed method. A feature vector of five components (energy, entropy, correlation, local homogeneity and inertia) was computed for each direction. An LVQ2 network was also employed for classifying these feature vectors. This network had 5 inputs (the elements of a feature vector), 16 outputs (the texture classes) and 32 Kohonen neurons. Again, the number of Kohonen neurons was found empirically.

4 RESULTS AND DISCUSSION

Table 1 gives the results for all 45 data sets. It can be observed that the classification accuracy using GLD matrices is superior to that using SGLD features for all interpixel distances and directions. The table also shows that, with both methods, the best accuracies were obtained for an interpixel distance of 1.
Note that, although the dimension of the feature vectors in the SGLD method is smaller than that for the proposed method, the computation required to obtain the SGLD feature vectors [Haralick et al., 1973] is much more demanding. Additionally, the time required to train the LVQ classifiers to recognise the information-rich GLD feature vectors was comparable to that for the SGLD feature vectors.
Table 1. Number of misclassifications for each data set and average classification accuracies.

5 CONCLUSION

A new texture analysis method based on grey level difference statistics has been described and its results have been compared with those of the SGLD method. The new method gave much better texture discrimination accuracies than the SGLD method on the natural texture images chosen from the Brodatz album.
References
Brodatz P. (1968) "Textures: A Photographic Album for Artists and Designers", Van Nostrand Reinhold, New York.
Haralick R. M., Shanmugam K. and Dinstein I. (1973) "Textural Features for Image Classification", IEEE Trans. Syst., Man, Cybern., Vol. SMC-3, No. 6, November, pp. 610-621.
Pham D. T. and Liu X. (1995) "Neural Networks for Identification, Prediction and Control", Springer-Verlag, London and Berlin, pp. 4-7.
Pham D. T. and Oztemel E. (1994) "Control Chart Pattern Recognition Using Learning Vector Quantization Networks", Int. J. Production Research, 32(3), pp. 721-729.
Pham D. T. and Oztemel E. (1996) "Intelligent Quality Systems", Springer-Verlag, London and Berlin.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Texture Discrimination for Quality Control Using Wavelet and Neural Network Techniques. D.A. Karras 1, S.A. Karkanis 2 and B.G. Mertzios 3 1University of Ioannina, Department of Informatics, Ioannina 45110, Greece,
[email protected] 2NRCPS "Democritos", Inst. of Nuclear Technology, Aghia Paraskevi, 15310 Athens, Greece,
[email protected] 3Democritus Univ. of Thrace, Dept. of Electr. and Comp. Eng., 67100 Xanthi, Greece,
[email protected]

Abstract

This paper investigates a novel solution to the problem of defect recognition from images, which can find application in building robust vision-based quality control systems. Such applications arise in the production lines of textiles, integrated circuits, machinery, etc. The proposed solution focuses on detecting defects through their textural properties. More specifically, a novel methodology is investigated for discriminating defects in textile images by applying a supervised neural classification technique, employing a multilayer perceptron (MLP) trained with the online backpropagation algorithm, to innovative wavelet based feature vectors. These vectors are extracted from the original image using the cooccurrence matrix framework and SVD analysis. The results of the proposed methodology are illustrated on a defective textile image, where the defective area is recognized with 98.48% accuracy.
I. Introduction

Defect recognition from images is becoming increasingly significant in a variety of applications, since quality control plays a very important role in contemporary manufacturing of virtually every product. Despite considerable interest, little work has been done in this field, since this classification problem presents many difficulties. However, the resurgence of interest in neural network research has revealed the existence of powerful classifiers. In addition, the emergence of the 2-D wavelet transform [5],[6] as a popular tool in image processing offers the ability of robust feature extraction from images. Combinations of both techniques have been used with success in various applications [10]. Therefore, it is worth investigating whether they can jointly offer a viable solution to the defect recognition problem. To this end, we propose a novel methodology for detecting defective areas in images by examining the discrimination abilities of their textural properties. Besides neural network classifiers and the 2-D wavelet transform, the tools utilized in this analysis are cooccurrence matrix based textural feature extraction [4] and SVD analysis. The problem at hand can be viewed as an image segmentation one in which, unlike the conventional formulation, the image should be segmented into defective and non-defective areas only. Concerning the classical segmentation problem, that is, dividing an image into homogeneous regions, the discovery of a generally effective scheme remains a challenge. To this end, many interesting techniques have been suggested so far, including spatial frequency techniques [9] and related ones such as texture clustering in the wavelet domain [9]. Most of these methodologies use very simple features, such as the energy of the wavelet channels [9] or the variance of the wavelet coefficients [3]. Our approach stems from this line of research.
However, much more sophisticated feature extraction methods are needed if one wants to solve the segmentation problem in its defect recognition incarnation, taking into account the high accuracy required. Following this reasoning, we propose to incorporate cooccurrence matrix analysis into these research efforts, since it offers a very accurate tool for describing image characteristics, especially texture [4]. It provides second-order information about pixel intensities, which the majority of other feature extraction techniques do not exploit at all. The suggested system has two main stages: optimal feature selection in the wavelet domain (optimal in terms of the information these features carry) and neural network based classification. The viability of the concepts and methods employed in the proposed approach is illustrated in the experimental section of the paper, where it is shown that, by achieving 98.48% defective area classification accuracy, our methodology is very promising for use in the quality control field.
II. Stage A: Optimal feature selection in the wavelet domain

The problem of texture discrimination, aiming at segmenting the defective areas in images, is considered in the wavelet domain, since it has been demonstrated that the discrete wavelet transform (DWT) can lead to better texture modeling [1]. In this way we can also better exploit the well known local information extraction properties of wavelet signal decomposition, as well as the well known features of wavelet denoising procedures [7]. We use the popular 2-D discrete wavelet transform scheme ([5],[6] etc.) to obtain the wavelet analysis of the original images containing defects. The images considered in the wavelet domain are expected to be smooth but, due to the well known time-frequency localization properties of the wavelet transform, the defective areas - whose statistics differ from those of the image background - should more or less clearly emerge from the background. We have experimented with the standard 2-D wavelet transform using nearly all the well known wavelet bases, such as Haar, Daubechies, Coiflet and Symmlet, as well as with Meyer's and Kolaczyk's 2-D wavelet transforms [6]. However, and this is very interesting, only the 2-D Haar wavelet transform has exhibited the expected and desired properties. All the
other orthonormal, continuous and compactly supported wavelet bases smoothed the images so much that the defective areas do not appear in the subbands. We have performed a one-level wavelet decomposition of the images, resulting in four main wavelet channels. Among the three channels 2, 3, 4 (frequency index) we have selected for further processing the one whose histogram presents the maximum variance. Extensive experimentation has shown that this is the channel in which the defective areas appear most clearly. The subsequent step in the proposed methodology is to raster scan the image obtained from the selected wavelet channel with sliding windows of M x M dimensions. We have experimented with 256 x 256 images and have found that M=8 is a good size for the sliding window. For each such window we perform two types of analysis in order to obtain features that are optimal in terms of information content. First, we use the information that comes from the cooccurrence matrices [4]. These matrices represent the spatial distribution and the dependence of the grey levels within a local area. Each (i, j)th entry of a matrix represents the probability of going from one pixel with grey level i to another with grey level j under a predefined distance and angle. Separate matrices are formed for specific spatial distances and predefined angles. From these matrices, sets of statistical measures (called feature vectors) are computed for building different texture models. We have considered four angles, namely 0°, 45°, 90° and 135°, as well as a predefined distance of one pixel, in the formation of the cooccurrence matrices. Therefore, we have formed four cooccurrence matrices. Due to computational complexity issues regarding cooccurrence matrix analysis, we have quantized the image obtained from the selected wavelet channel into 16 grey levels instead of the usual 256 levels, without adverse effects on defective area recognition accuracy.
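The one-level decomposition and channel-selection rule just described can be sketched as follows. This is a minimal Haar variant; the averaging convention (factor 1/2 per axis) and the histogram bin count are our assumptions:

```python
import numpy as np

def haar2d_level1(img):
    """One-level 2-D Haar decomposition into four channels (LL, LH, HL, HH)."""
    a = np.asarray(img, dtype=float)
    # transform rows: pairwise averages (low-pass) and differences (high-pass)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # transform columns of each half
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh                 # channels 1..4

def select_channel(channels, bins=64):
    """Among detail channels 2-4, pick the one whose histogram
    of coefficient values has the largest variance."""
    def hist_variance(ch):
        counts, _ = np.histogram(ch, bins=bins)
        return counts.var()
    return max(channels[1:], key=hist_variance)
```

A 256 x 256 input yields four 128 x 128 channels, matching the dimensions reported in the experimental section.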
This quantization also renders the on-line implementation of the proposed system highly feasible. Among the 14 statistical measures originally proposed by Haralick [4] that can be derived from each cooccurrence matrix, we have considered only four: angular second moment, correlation, inverse difference moment and entropy.
• Energy (angular second moment): $f_1 = \sum_i \sum_j p(i,j)^2$

• Correlation: $f_2 = \dfrac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i \cdot j)\, p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y}$

• Inverse difference moment: $f_3 = \sum_i \sum_j \dfrac{p(i,j)}{1 + (i-j)^2}$

• Entropy: $f_4 = -\sum_i \sum_j p(i,j) \log p(i,j)$

where $p(i,j)$ is the $(i,j)$th entry of the normalised cooccurrence matrix, $N_g$ is the number of grey levels, and $\mu_x, \mu_y, \sigma_x, \sigma_y$ are the means and standard deviations of the marginal distributions $p_x(i) = \sum_j p(i,j)$ and $p_y(j) = \sum_i p(i,j)$.
We have experimentally found that these measures provide high discrimination accuracy, which can be only marginally increased by adding more measures to the feature vector. Thus, using the above mentioned four cooccurrence matrices, we have obtained 16 features describing the spatial distribution in each 8 x 8 sliding window in the wavelet domain. In addition, we have formed another set of 8 features for each such window by extracting the singular values of the matrix corresponding to this window. SVD analysis has recently been successfully related to invariant pattern recognition [8]. Therefore, it is reasonable to expect that it provides a meaningful means of characterizing each sliding window, preserving first-order information about the window, while the cooccurrence matrix analysis extracts second-order information. We have thus formed, for each sliding window, a feature vector containing 24 features that uniquely characterizes it. These feature vectors feed the neural classifier of the subsequent stage of the suggested methodology, described next.
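A compact sketch of this Stage A feature extraction (helper names are ours; the window is assumed to be already quantised to 16 grey levels): four cooccurrence matrices at distance 1 and angles 0°, 45°, 90° and 135°, the four Haralick measures of each, plus the eight singular values of the 8 x 8 window, concatenated into the 24-feature vector.

```python
import numpy as np

ANGLES = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]   # 0, 45, 90, 135 degrees

def glcm(q, levels, offset):
    """Normalised cooccurrence matrix of a quantised window (values < levels)."""
    P = np.zeros((levels, levels))
    dr, dc = offset
    rows, cols = q.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[q[r, c], q[r2, c2]] += 1
    total = P.sum()
    return P / total if total else P

def haralick4(P):
    """Angular second moment, correlation, inverse difference moment, entropy."""
    n = P.shape[0]
    idx = np.arange(n)
    I, J = np.meshgrid(idx, idx, indexing='ij')
    px, py = P.sum(axis=1), P.sum(axis=0)            # marginals
    mx, my = (idx * px).sum(), (idx * py).sum()
    sx = np.sqrt((((idx - mx) ** 2) * px).sum())
    sy = np.sqrt((((idx - my) ** 2) * py).sum())
    f1 = (P ** 2).sum()                              # angular second moment
    f2 = ((I * J * P).sum() - mx * my) / (sx * sy) if sx * sy > 0 else 0.0
    f3 = (P / (1.0 + (I - J) ** 2)).sum()            # inverse difference moment
    nz = P[P > 0]
    f4 = -(nz * np.log(nz)).sum()                    # entropy
    return [f1, f2, f3, f4]

def window_features(q_win, levels=16):
    """24-feature vector: 4 measures x 4 directions + 8 singular values."""
    feats = []
    for off in ANGLES:
        feats.extend(haralick4(glcm(q_win, levels, off)))
    feats.extend(np.linalg.svd(q_win.astype(float), compute_uv=False))
    return np.array(feats)
```

The four formulas in `haralick4` follow the measures listed above, term by term.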
III. Stage B: Neural network based segmentation of defective areas

After obtaining information about the textural structure and other characteristics of each image using the methodology described above, we employ a supervised neural network architecture of the multilayer feedforward type (MLP), trained with the online backpropagation algorithm, whose goal is to decide whether a texture region belongs to a defective part or not. The inputs to the network are the 24 features of the feature vector extracted from each sliding window. The best network architecture tested in our experiments is 24-35-35-1. The desired outputs during training are determined by the corresponding sliding window location. More specifically, if a sliding window belongs to a defective area the desired output of the network is one; otherwise it is zero. We have specified, for the MLP training phase, that a sliding window belongs to a defective area if any of the pixels in the 4 x 4 central window inside the original 8 x 8 sliding window belongs to the defect. The reasoning underlying this definition is that the decision about whether a window belongs to a defective area should come from information over a large neighborhood, thus preserving the 2-D structure of the problem, and not from information associated with only one pixel (e.g. the central pixel). In addition, and probably more significantly, by defining the two classes in such a way we can obtain many more training patterns for the class corresponding to the defective area, since defects normally cover only a small area of the original image. It is important for effective neural network classifier
learning to have enough training patterns for each of the two classes but, on the other hand, to preserve as much as possible the a priori probability distribution of the problem. We have experimentally found that a proportion of 1:3 for training patterns belonging to defective and non-defective areas, respectively, is very good for achieving both goals.
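The labelling rule and the 1:3 class proportion can be sketched as follows (function names and the random subsampling strategy are our assumptions):

```python
import random
import numpy as np

def window_label(defect_mask, r, c, M=8):
    """Window at (r, c) is defective iff any pixel of its central
    4 x 4 region (for M = 8) lies on the defect mask."""
    b = M // 4                                   # border width: 2 for M = 8
    centre = defect_mask[r + b:r + M - b, c + b:c + M - b]
    return int(centre.any())

def balanced_sample(windows, labels, ratio=3, seed=0):
    """Keep all defective windows and about `ratio` times as many
    non-defective ones (the 1:3 proportion reported in the text)."""
    rng = random.Random(seed)
    pos = [w for w, l in zip(windows, labels) if l == 1]
    neg = [w for w, l in zip(windows, labels) if l == 0]
    return pos, rng.sample(neg, min(len(neg), ratio * len(pos)))
```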
IV. Results and Discussion
The efficiency of our approach in recognizing defects in automated inspection images, based on texture information, is illustrated by the textile image shown in fig. 1, which contains a very thin and long defect in its upper side as well as some smaller defects elsewhere. This image is 256 x 256, while the four wavelet channels obtained by applying the 2-D Haar wavelet transform are 128 x 128. These wavelet channels are shown in fig. 2. In fig. 3 the selected wavelet channel 3, of maximum histogram variance, is shown. There are 14641 sliding windows of 8 x 8 size in this wavelet channel. The neural network has been trained with a training set containing 1009 patterns extracted from these sliding windows as described above. 280 of the 1009 patterns belong to the long and thin defective area of the upper side only, while the rest belong to the class of non-defective areas. The learning rate coefficient was 0.3, while the momentum coefficient was 0.4. The neural network has been tested on all 14641 patterns coming from the sliding windows of the third wavelet channel. The results are shown in fig. 4. Note that the network based on the suggested methodology was able to generalize and also find some other minor defects, while another network of the same type, trained with the 64 pixel values of the sliding windows under exactly the same conditions, was able to find only the long and thin defect. This demonstrates the efficiency of our feature extraction methodology based on textural and SVD features. Finally, in terms of classification accuracy we have achieved 98.48% overall. The evolution of the training error and of the generalization ability for the class corresponding to defects is shown in figs. 5 and 6 respectively.
Figure 1. Original textile image containing a defect
Figure 3. QMF Channel No.3
Figure 2. Wavelet transformation of the original image
Figure 4. Resulting image - white regions represent the defects
Figure 5. Learning Error Evolution
Figure 6. Generalization Performance Evolution
V. Conclusions
We have proposed a novel methodology for detecting defects in automated inspection images based on wavelet and neural network segmentation methods by exploiting information coming from textural analysis and SVD in the wavelet channels of the 2-D Haar wavelet transformed original images. The efficiency of this approach is illustrated in textile images and the classification accuracy obtained is 98.48 %. Clearly, our methodology deserves further evaluation in quality control vision based systems.
References
[1] Ryan, T. W., Sanders, D., Fisher, H. D. and Iverson, A. E. "Image Compression by Texture Modeling in the Wavelet Domain", IEEE Trans. Image Processing, Vol. 5, No. 1, pp. 26-36, 1996.
[2] Antonini, M., Barlaud, M., Mathieu, P. and Daubechies, I. "Image Coding Using Wavelet Transform", IEEE Trans. Image Processing, Vol. 1, pp. 205-220, 1992.
[3] Unser, M. "Texture Classification and Segmentation Using Wavelet Frames", IEEE Trans. Image Processing, Vol. 4, No. 11, pp. 1549-1560, 1995.
[4] Haralick, R. M., Shanmugam, K. and Dinstein, I. "Textural Features for Image Classification", IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-3, No. 6, pp. 610-621, 1973.
[5] Meyer, Y. "Wavelets: Algorithms and Applications", Philadelphia: SIAM, 1993.
[6] Kolaczyk, E. "WVD Solution of Inverse Problems", Doctoral Dissertation, Stanford University, Dept. of Statistics, 1994.
[7] Donoho, D. L. and Johnstone, I. M. "Ideal Time-Frequency Denoising", Technical Report, Dept. of Statistics, Stanford University.
[8] Al-Shaykh, O. K. and Doherty, J. E. "Invariant Image Analysis based on Radon Transform and SVD", IEEE Trans. Circuits and Systems, Feb. 1996, Vol. 43, No. 2, pp. 123-133.
[9] Porter, R. and Canagarajah, N. "A Robust Automatic Clustering Scheme for Image Segmentation Using Wavelets", IEEE Trans. on Image Processing, April 1996, Vol. 5, No. 4, pp. 662-665.
[10] Lee, C. S., et al. "Feature Extraction Algorithm based on Adaptive Wavelet Packet for Surface Defect Classification", to be presented at ICIP 96, 16-19 Sept. 1996, Lausanne, Switzerland.
A Region Oriented CFAR Approach to the Detection of Extensive Targets in Textured Images
Carlos Alberola-López, José Ramón Casar-Corredera* and Juan Ruiz-Alzola**
Depto. Teoría de la Señal y Comunicaciones e Ingeniería Telemática, ETSI Telecomunicación, Universidad de Valladolid, Spain. C/ Real de Burgos s/n, 47011 Valladolid. e-mail: carlos@tel.uva.es
* Depto. Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación - UPM, Ciudad Universitaria s/n, 28040 Madrid, Spain
** Depto. de Señal y Comunicaciones, EUIT Telecomunicación, Campus de Tafira s/n, 35017 Las Palmas de Gran Canaria, Spain

Abstract
In this contribution we address the problem of locating arbitrarily-shaped extensive objects in textured images. To that end, we propose to introduce spatial constraints within the detection framework by means of a recursive search of connected components of the target to be extracted. With this procedure, every target within the image is ideally detected with a single threshold, and thus the problem of placing the reference of estimation of the detector parameters with respect to the pixel under test is bypassed. Our experiments show that extensive targets are properly detected, regardless of their shape and extent. In addition, false alarms are easily cancelled, since they show up as isolated point-like random detections.
1 Introduction
Well known CFAR approaches [5] to target detection in images strive to maximize the probability of detection while keeping the false alarm rate low and constant throughout a non-stationary background, by estimating its local statistics to calculate the appropriate threshold at every pixel. However, they are either directed at detecting very small targets [4] or they make use of some a priori knowledge about the target to be extracted, for instance a searching template from which the target features can be estimated [6]. On the other hand, if a general purpose extensive-target detection scheme is sought, a template matching procedure is not the solution, since it would need a large number of candidate templates, which would unnecessarily increase the computational complexity of the detector. Additionally, because targets typically encountered in real world applications are extensive at practical resolutions, pixel-level detectors might not be the most efficient solution, for decisions are made independently of each other and thus the raw output of the detector will often have no spatial coherence; this makes a postprocessing stage compulsory, in which detections from the target boundaries are to be connected and false alarms cancelled. These pixel-oriented detectors are quite easy to implement in a real time scheme, but the postprocessing might overload the processor. Additionally, when using a CFAR detector for extensive target extraction, care must be taken to properly place the reference of estimation of the parameters of the detector; if this point is not taken into account, some parts of the target can easily lead the detector to miss other portions of the target itself, since the parameters will be biased by the presence of target pixels within the reference of estimation. In this contribution we propose a CFAR detection scheme that incorporates region constraints within the detection framework.
The potential of our procedure stems from the fact that, in the target area, the image statistics will be quite different from those of the background and will also exhibit a degree of homogeneity, even though the target is fluctuating, which allows us to extract the target as a whole by means of a single local threshold. This way, we benefit from using pixel-level and region-level information simultaneously in the detection stage and, since ideally a single threshold is needed for a given target, we also minimize the above-mentioned effect of target shadowing by its own pixels.
2 CFAR Detection of Extensive Targets

2.1 A Pixel-Oriented Approach
As mentioned in the introduction, few proposals of CFAR detectors in images address the problem of locating arbitrarily shaped and extensive objects; the solutions more often encountered incorporate some knowledge of the object to be extracted. We have developed [1] a pixel-oriented CFAR detector that extracts the outer edges of an extensive target, regardless of its extent and shape, in a gamma-distributed textured background. The key of the proposal lies in the use of the phase of the estimated gradient at the pixel under test: the reference of estimation of the parameters of the detector is placed orthogonally to the gradient vector, and thus we reduce the possibility of pixels from the target falling into the cells of the reference of estimation. However, this philosophy and, generally speaking, all the techniques that make decisions on a pixel-by-pixel basis without taking into account decisions in their surroundings, bring about spotty results, in which a number of unconnected edge elements are extracted together with a number of false alarms. Thus, a second stage is needed in which edge elements are connected and false alarms cancelled. To that end, optimization techniques have proved useful, although computationally involved [2][3].

2.2 A Region-Oriented Approach
If an extensive target is sought, and the image statistics remain approximately constant through the body of the target, a single threshold might be sufficient to properly detect and extract the target as a whole. That is, regardless of the shape of the object, it could be detected by a guided recursive search of its components, using as the starting point of the recursion a detection obtained by a pixel-level detector. We have applied this idea to build a detection algorithm in which decisions are dependent on each other, so that the detector can be regarded as a region-level detector. We proceed as follows: the detection process is started at the pixel level but, if a detection is encountered, a region-level detection procedure is triggered, which initiates a recursive search in the 8-neighborhood of this pixel; every neighbor is compared to the threshold that triggered the first detection. All of the neighbors that result in detections are then recursively examined, using the only threshold calculated so far, and expanding the tree of neighbors one more level. The process continues until the search reaches the opposite boundary of the target (opposite with respect to the direction of the search), for all of the decisions that do not exceed the threshold will be labelled as 'background' and no further search is invoked in undetected pixels. This process can be expressed in pseudocode as follows:

1. Label all pixels as unvisited.
2. For every unvisited pixel:
   (a) Decide pixel as target/background by any CFAR detector (see, for instance, [1]).
   (b) If the pixel is detected:
       i. For every undetected neighbor:
          A. Decide the neighbor as target/background with the threshold calculated in (a).
          B. If the neighbor is detected, label it as detected and go to i; otherwise label it as visited.
       Otherwise label the pixel as visited.

This algorithm benefits from the fact that the recursive procedure captures the whole body of the target accurately: both the outer boundary and the inner details are captured, since the detection threshold has been calculated from data outside the target area, and no further threshold calculations are needed. Additionally, recursive algorithms are fast and efficient, and the code that implements this algorithm is surprisingly short and thus easy to store. The main drawback of this procedure is the condition for halting the search: at the present stage we conclude the expansion of the tree of neighbors when no more detections are encountered. Therefore, in cases where the target lies in a rapidly changing background, the threshold on the opposite side of the target might not be able to stop the search, and noisy results would be obtained.
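A sketch of the algorithm above in Python. An iterative stack replaces explicit recursion to avoid stack-depth limits, and the pixel-level rule is abstracted as a caller-supplied `cfar_threshold` function, since the paper defers the choice of pixel-level detector to [1]:

```python
import numpy as np

def region_cfar(img, cfar_threshold):
    """Region-oriented CFAR detection.

    cfar_threshold(img, r, c) stands in for any pixel-level CFAR rule and
    returns the local threshold for the pixel under test. The threshold
    that triggers a detection is reused, unchanged, while the 8-connected
    neighborhood of the detection is grown.
    """
    rows, cols = img.shape
    detected = np.zeros((rows, cols), bool)
    visited = np.zeros((rows, cols), bool)
    for r in range(rows):
        for c in range(cols):
            if visited[r, c]:
                continue
            visited[r, c] = True
            T = cfar_threshold(img, r, c)        # step (a): pixel-level test
            if img[r, c] <= T:
                continue
            # step (b): a detection triggers the region-level search,
            # reusing the single threshold T calculated so far
            detected[r, c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (dy or dx) and 0 <= ny < rows and 0 <= nx < cols \
                                and not visited[ny, nx]:
                            visited[ny, nx] = True
                            if img[ny, nx] > T:
                                detected[ny, nx] = True
                                stack.append((ny, nx))
    return detected
```

Neighbors that do not exceed the threshold are labelled visited and never re-examined, matching the halting condition of the pseudocode.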
3 Results
In this section we show two examples of our detector's capabilities. First, an artificial non-stationary background is represented in figure 1a), which has been synthesized by a 2-dimensional autoregressive filter driven by white Gaussian noise, whose output has been warped to obtain a gamma probability density function. We have let the parameters of the distribution vary during the synthesis process to obtain a non-uniform illumination pattern, as can be seen from this figure. Three targets have been superimposed on the texture, whose brightness content overlaps considerably with that of the background (especially in the two lower circles), but whose textural pattern is different; therefore the detection process has been carried out at the output of an adaptive whitening filter (with an assumed quarter-plane support). We show this output process in figure 1b). Note the evident presence of the three targets in the background (three noisy spikes along the diagonal of the figure). The pixel-oriented detector output is shown in figure 1c) for a Pfa = 10^-3. Note that target boundaries are visible, but detections are mainly isolated; the reason for this result is that the output of the whitening filter fluctuates strongly in the surroundings of the targets and, therefore, the estimation of the gradient (for the placement of the reference of estimation) is noisy as well. This leads to an inaccurate placement of the reference of estimation and, as a consequence, to low detection performance. However, as figure 1d) shows, the proposed detection philosophy, due to its inherent functionality, is able to extract much of the body of the target, which makes any further processing directed at target recognition much easier. This figure also highlights that, due to the filtering process, part of the target power is smeared out of its boundaries, and therefore the detections extend farther than the original target in the filtering direction.
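The synthetic background of figure 1a) can be approximated as follows. This is a sketch only: the AR coefficients, the rank-matching used to impose the gamma marginal, and the constant distribution parameters are our simplifying assumptions (the paper additionally varies the gamma parameters spatially to obtain the non-uniform illumination):

```python
import numpy as np

def ar2d_gamma_texture(shape, a1=0.4, a2=0.4, k=2.0, theta=1.0, seed=0):
    """Quarter-plane AR field driven by white Gaussian noise, rank-warped
    so that its marginal distribution matches a gamma(k, theta) sample."""
    rng = np.random.default_rng(seed)
    rows, cols = shape
    w = rng.standard_normal(shape)
    x = np.zeros(shape)
    for r in range(rows):
        for c in range(cols):
            x[r, c] = w[r, c]
            if r > 0:
                x[r, c] += a1 * x[r - 1, c]       # recursion on the upper pixel
            if c > 0:
                x[r, c] += a2 * x[r, c - 1]       # recursion on the left pixel
    # impose the gamma marginal by matching ranks against a gamma sample
    g = np.sort(rng.gamma(k, theta, x.size))
    order = np.argsort(x, axis=None)
    out = np.empty(x.size)
    out[order] = g
    return out.reshape(shape)
```

The rank warp is monotone, so the spatial correlation introduced by the AR recursion is largely preserved while the marginal becomes gamma-distributed.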
Figure 1: Detection in a whitened domain, Pfa = 10^-3. a) Original image b) Squared output of an adaptive whitening filter with QP support c) Boundary detection in b) d) Region-oriented CFAR detection in b).

The second example is an image of a jacket on which four pins have been superimposed (figure 2a). The Pfa is set to 10^-3 in each band (the original is a three-band image; only one band is shown here), and decisions are fused according to the logical OR function. Figure 2b) shows the result of the iterative search: the four pins are correctly detected, and most of the details in them are also visible. False alarms can be
easily removed by very simple postprocessing, since their extent is much smaller than that of the real targets.
Figure 2: Detection in a natural background, Pfa = 10^-3 in each band, fused by logical OR. a) Original image b) Region-oriented CFAR detection in a).
4 Conclusions
In this contribution we have proposed an algorithm for incorporating region constraints into the operation of a CFAR detector for object extraction in a textured background. Our procedure scans the image under analysis on a pixel-by-pixel basis until a detection is encountered; the detection triggers a recursive search for target components within the neighbors of the detection. This search is continued until the object is compactly extracted. Our results show that the algorithm performs satisfactorily in slowly changing backgrounds: targets are properly detected and false alarms are controlled according to the level of the detector. However, we have highlighted the fact that this procedure is sensitive to sudden changes in the image statistics. Our future efforts will be directed at diminishing this sensitivity by conceiving more robust stopping criteria.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Generating Stable Structure of a Color Texture Image using Scale-space Analysis with Non-uniform Gaussian Kernels

Satoru MORITA and Minoru TANAKA
Faculty of Engineering, Yamaguchi University, Ube 755, Japan

1 Abstract
Coarseness and directionality provide important sources of information for color texture image recognition. In particular, it is important to distinguish between textures and to understand the characteristics of similar color textures. We therefore propose a new scale-space analysis generated by non-uniform Gaussian kernels in order to find a stable image with respect to coarseness and directionality. We analyze zero-crossing surfaces to generate a non-uniform Gaussian scale-space from a limited number of observations. Singular points, where the topology of the zero-crossing surfaces changes, are plotted in the new scale-space. The filter parameter of the largest chunk enclosed by a topology change surface is selected as the optimal parameter for a pixel. The optimal filter and the image description are calculated by this approach for natural color images. We show that this method is well suited to color texture image recognition.
2 Introduction
Recently, many researchers have studied color images in the field of computer vision. The segmentation of color images using competitive learning has been studied [1]. On the other hand, the segmentation of a color image using multiresolution analysis has been proposed [2]. But consideration was not given to the texture in a color image. Coarseness and directionality provide important sources of information for texture image recognition. In particular, it is important to distinguish between textures and to understand the characteristics of similar textures. The importance of interpreting an image at various scales was noted by Marr [7]. Scale-space analysis has been proposed using the zero-crossing points of a signal observed at various scales [6]. The uniqueness of scale-space based on uniform Gaussian kernels has been analyzed [10]. Scale-space analysis using non-uniform kernels is useful for texture analysis and edge detection [8][9]. Image segmentation using Gabor filters [4] with various directions has been studied for texture analysis [5]. Witkin proposed a method that selects the optimal scale corresponding to the maximum width of an interval in order to generate a stable one-dimensional signal [6]. We extend the interval tree of a one-dimensional signal to an analogous approach for a two-dimensional color image using non-uniform Gaussian kernels, in order to select filter parameters that account for coarseness and directionality. In section 2, we define scale-space filtering with non-uniform Gaussian kernels. In particular, we classify the zero-crossing surfaces for a color image and clarify their properties. In section 3, using non-uniform Gaussian scale-space analysis, we present the algorithm generating a stable color image, unaffected by noise, with respect to coarseness and directionality. We extract stable color images from some real images and show the effectiveness of the method by matching experiments using the structure of the stable images.
3 Scale-space Analysis with Non-uniform Gaussian Kernels for a Color Texture Image

3.1 Scale-space Filtering with Non-uniform Gaussian Kernels
In order to generate a stable image with respect to the coarseness and directionality of texture, we propose scale-space analysis with non-uniform Gaussian kernels and an algorithm generating the structure of a stable image. In this section, traditional scale-space analysis with uniform Gaussian kernels is extended to scale-space analysis with non-uniform Gaussian kernels.
$$\partial_t L = \tfrac{1}{2}\nabla^2 L = \tfrac{1}{2}\left(\partial_x^2 + \partial_y^2\right)L$$

L satisfies the above diffusion equation, with

$$L(x;\,t) = \int_{a \in \mathbb{R}^n} g(a;\,t)\, f(x-a)\, da$$
The non-uniform Gaussian kernel used in scale-space analysis is defined as follows:

$$g(x, y;\, \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right\}$$

This equation is rewritten as

$$g(x, y;\, \Psi, \Gamma, \theta) = \frac{1}{2\pi|M|} \exp\left(-\frac{\tilde{x}^2 + \tilde{y}^2}{2}\right),$$

where, with $M = \mathrm{diag}(\sigma_x, \sigma_y)$,

$$\begin{pmatrix}\tilde{x}\\ \tilde{y}\end{pmatrix} = M^{-1}\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}, \qquad \Psi = \frac{\sigma_y}{\sigma_x}, \qquad \Gamma = \sqrt{\sigma_x\sigma_y}.$$

3.2 Zero-crossing Surfaces
With the directional vectors which maximize and minimize the curvature at the point p denoted (u, v) = (ξ₁, η₁), (ξ₂, η₂), the maximum curvature κ₁, the minimum curvature κ₂, the mean curvature H, and the Gaussian curvature K are defined as follows: a) the maximum curvature at the point p, κ₁ = λ(ξ₁, η₁); b) the minimum curvature at the point p, κ₂ = λ(ξ₂, η₂); c) the mean curvature at the point p, H = (κ₁ + κ₂)/2; d) the Gaussian curvature at the point p, K = κ₁κ₂; e) H0 contours, H = 0; f) K0 contours, K = 0. An image is divided into elements using the signs (positive or negative) of the Gaussian curvature K and the mean curvature H, and the relationships between elements are described. In this paper, K = 0 and H = 0 are called zero-crossing contours, and the surfaces composed of zero-crossing contours in (x, y, t) space are called zero-crossing surfaces; x and y are the coordinates of an image and t is the scale. An image divided into elements by the signs of the Gaussian curvature K and the mean curvature H is called a KH-image.
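The KH-image classification can be sketched as follows. The paper does not give explicit curvature formulas, so the standard Monge-patch expressions for the intensity surface z = f(x, y) are assumed here:

```python
import numpy as np

def kh_image(f):
    """Classify each pixel by the signs of the Gaussian (K) and mean (H)
    curvature of the intensity surface z = f(x, y); K = 0 and H = 0 are
    the zero-crossing contours tracked through scale."""
    fy, fx = np.gradient(f.astype(float))        # first derivatives
    fxy, fxx = np.gradient(fx)                   # second derivatives
    fyy, _ = np.gradient(fy)
    g = 1.0 + fx**2 + fy**2                      # metric term
    K = (fxx * fyy - fxy**2) / g**2              # Gaussian curvature
    H = ((1 + fy**2) * fxx - 2 * fx * fy * fxy
         + (1 + fx**2) * fyy) / (2 * g**1.5)     # mean curvature
    # four classes from the sign pair (K > 0, H > 0)
    return 2 * (K > 0).astype(int) + (H > 0).astype(int), K, H
```

On a convex paraboloid both curvatures are positive (elliptic region); on a saddle K is negative (hyperbolic region), which is exactly the distinction the KH-image encodes.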
3.3 Scale-space with Non-uniform Gaussian Kernels for a Color Texture Image
A color image is described by three color planes: a red plane (R), a green plane (G) and a blue plane (B). A pixel in a color image has 24-bit data; a pixel in a plane has 8-bit data, i.e. 256 levels. Thus a color image I(x, y) is described by three planes I_R(x, y), I_G(x, y) and I_B(x, y). Next, we define the non-uniform Gaussian scale-space for a color texture image. The coordinates of the zero-crossing contours on I_R(x, y)∗G(x, y; Ψ, θ, Γ), I_G(x, y)∗G(x, y; Ψ, θ, Γ) and I_B(x, y)∗G(x, y; Ψ, θ, Γ) are plotted in a five-dimensional space (x, y, Ψ, θ, Γ). The properties of the filter G(Ψ, θ, Γ) are decided by the distortion Ψ, the direction θ and the size Γ. The zero-crossing surfaces in the non-uniform Gaussian scale-space are three kinds of manifold, S(x, y, Ψ, θ, Γ)_IR, S(x, y, Ψ, θ, Γ)_IG and S(x, y, Ψ, θ, Γ)_IB, in the five-dimensional space (x, y, Ψ, θ, Γ).
3.4 Three Kinds of Non-uniform Gaussian Scale-space
Three kinds of zero-crossing surfaces are extracted from these three kinds of manifolds. Suppose (Γ, Ψ) are constant; the coordinates of the zero-crossing points on I(x, y)∗G(Ψ, θ, Γ) are plotted in a three-dimensional space (x, y, θ). This scale-space has cylindrical coordinates in which x and y lie in a plane and θ extends circularly. Zero-crossing surfaces S(Γ, Ψ; θ, x, y) are plotted in this scale-space. Suppose (Γ, θ) are constant; the coordinates of the zero-crossing points on I(x, y)∗G(Ψ, θ, Γ) are plotted in a three-dimensional space (x, y, Ψ). This scale-space has rectangular coordinates with three axes x, y and Ψ. Zero-crossing surfaces S(Γ, θ; Ψ, x, y) are plotted in this scale-space. Suppose (Ψ, θ) are constant; the coordinates of the zero-crossing points on I(x, y)∗G(Ψ, θ, Γ) are plotted in a three-dimensional space (x, y, Γ). This scale-space has rectangular coordinates with three axes x, y and Γ. Zero-crossing surfaces S(Ψ, θ; Γ, x, y) are plotted in this scale-space. The singular points, where the three kinds of zero-crossing contour topologies change as θ, Ψ and Γ increase, are plotted in three kinds of scale-spaces for each of the red, green and blue planes.
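One of the three stacks (θ varying, size and distortion held fixed) can be generated as follows. The anisotropic kernel is parameterized directly by (σx, σy, θ), since the exact mapping from the paper's (Ψ, Γ) to the two standard deviations is an assumption here:

```python
import numpy as np
from scipy.ndimage import convolve

def aniso_gaussian(size, sigma_x, sigma_y, theta):
    """Non-uniform Gaussian kernel, elongated along sigma_x and rotated by theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    return g / g.sum()                               # normalize to unit mass

def theta_stack(plane, sigma_x, sigma_y, n_dir=8, ksize=15):
    """One scale-space stack: responses of a colour plane over n_dir
    orientations with the size/distortion parameters held fixed."""
    return np.stack([convolve(plane.astype(float),
                              aniso_gaussian(ksize, sigma_x, sigma_y,
                                             2 * np.pi * n / n_dir))
                     for n in range(1, n_dir + 1)])
```

Zero-crossings of K and H computed on each slice of such a stack would give the surfaces S(Γ, Ψ; θ, x, y) described above.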
Figure 1: A sample color image (left) and its color planes (left R, middle G, right B) (right).
Figure 2: Filters (top: Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015; bottom: Ψ = 0.125, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015).

3.5 Topology Change Surfaces
We analyze the scale-space with non-uniform Gaussian kernels at a fixed point (x₁, y₁) to decide the optimal filter for that point in an image. Suppose (x, y) are constant; the singular points where the topology of the zero-crossing surfaces changes in the three kinds of scale-space are plotted in a three-dimensional space (Γ, θ, Ψ). This scale-space has conical coordinates in which Γ and θ lie in a plane and Ψ extends perpendicularly upwards, tapering down to a cone whose intersection at constant Ψ is a circle. Topology change surfaces W(x, y; Γ, θ, Ψ)_IR, W(x, y; Γ, θ, Ψ)_IG and W(x, y; Γ, θ, Ψ)_IB are composed of the sets of topology change points obtained from the three color planes R, G and B. We try to find the maximum-size chunk enclosed by a topology change surface; the topology of the image does not change within such a region. We use log₂|Ψ| instead of Ψ in the calculation. Three kinds of optimal filter parameters at a point (x₁, y₁) in an image, corresponding to the color planes R, G and B, are decided. These processes are executed for all pixels of an image. This approach is the extension of the interval tree for a one-dimensional signal.
4 The Algorithm Generating a Stable Color Texture Image
We show the algorithm generating a stable color texture image.
• A color image I(x, y) is described using three planes I_R(x, y), I_G(x, y) and I_B(x, y), each with 8-bit data. (2.3)
• Convolve the three color planes I_R(x, y), I_G(x, y) and I_B(x, y) with filters over five sizes Γ (n = 1, ..., 5), five distortions Ψ (n = 1, ..., 5), and eight directions θ = 2nπ/8 (n = 1, ..., 8). (2.1)
• Classify each filtered plane into regions by the K and H parameters. Execute the same processes for the planes I_G(x, y) and I_B(x, y). (2.2)
• Generate the three kinds of scale-space, in which (Γ, Ψ) are constant, (θ, Γ) are constant, and (Ψ, θ) are constant, using the three color planes I_R(x, y), I_G(x, y) and I_B(x, y). (2.4)
• Interpolate between the limited number of zero-crossing points in the scale-space based on (x, y, θ). Execute the same processes for the scale-spaces based on (x, y, Ψ) and on (x, y, Γ). Find the singular points where the topology of the zero-crossing contours changes. Plot the singular points in a scale-space based on (θ, Ψ, Γ). The set of singular points for a plane is called a topology change surface. (2.5)
Figure 3: Filtered images (filter Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015) (left top R, left middle G, left bottom B), and KH-images (filter Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015) (right top R, right middle G, right bottom B).
Figure 4: A segment image.

• Select the chunk of maximum size enclosed by the topology change surfaces generated from each plane as the optimal filter parameter. (2.5)
• Plot the limited number of optimal filter parameters (Ψ, Γ, θ) in scale-spaces based on the Ψ, Γ and θ parameters for the three color planes. An optimal filter surface is composed of the set of optimal filter parameters. Extract the discontinuities from the optimal filter surfaces using a cluster-analysis technique [3].
• Describe the neighbor relations between image elements using a graph representation. The discontinuities correspond to arcs and the image elements to nodes of the graph.
• Convolve each plane with the Gaussian filter of the optimal parameter obtained for each pixel; the pixel value of the filtered image becomes the pixel value of that plane. Execute these processes for all planes and all pixels. Thus all pixel values of a stable image are decided.

This algorithm has been applied to some real color images. Figure 1 shows a sample color image and its three color planes. Figure 2 shows non-uniform Gaussian kernels with filter parameters Ψ = 0.015 and 0.125, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015. Figure 3 shows the filtered images and KH-images for the three color planes with filter parameters Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015. Figure 4 shows segment images generated using the algorithm. The boundaries between different gray values mark the discontinuities of the optimal filter surfaces. It is confirmed that a stable color image, unaffected by noise, is generated with respect to coarseness and directionality.
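A drastically simplified sketch of the final smoothing step, assuming isotropic Gaussians only: the per-pixel optimal scale is taken, Witkin-style, as the scale opening the longest run over which the sign of the Laplacian (a stand-in for the full topology-change analysis over (θ, Ψ, Γ)) does not change, and each pixel of the stable image is read from the plane smoothed at that pixel's optimal scale:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def stable_image(plane, sigmas):
    """Per-pixel optimal-scale smoothing (isotropic simplification)."""
    plane = plane.astype(float)
    smoothed = np.stack([gaussian_filter(plane, s) for s in sigmas])
    sign = np.stack([gaussian_laplace(plane, s) > 0 for s in sigmas])
    h, w = plane.shape
    best = np.zeros((h, w), dtype=int)       # scale index chosen per pixel
    best_run = np.ones((h, w), dtype=int)    # length of longest stable run so far
    run = np.ones((h, w), dtype=int)
    start = np.zeros((h, w), dtype=int)
    for i in range(1, len(sigmas)):
        same = sign[i] == sign[i - 1]        # topology proxy unchanged?
        run = np.where(same, run + 1, 1)
        start = np.where(same, start, i)
        better = run > best_run
        best_run = np.where(better, run, best_run)
        best = np.where(better, start, best)
    return np.take_along_axis(smoothed, best[None], axis=0)[0]
```

In the paper the stability analysis runs over all three filter parameters and three color planes; this sketch keeps only the scale axis to show the selection mechanism.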
5 Conclusions
We have extended the interval tree of a one-dimensional signal to an analogous approach for a two-dimensional color image using scale-space analysis with non-uniform Gaussian kernels, in order to select filter parameters that account for coarseness and directionality. Both the selection of optimal filters and the segmentation of an image are executed at the same time by analyzing the optimal filter parameter surfaces. The proposed algorithm has been applied to several real color images, and matching experiments using the structure of the stable images confirm that this approach is useful for color images with noise.
References
[1] T. Uchiyama and M. A. Arbib, "Color Image Segmentation Using Competitive Learning," IEEE Trans. Pattern Anal. & Machine Intell., vol. 16, no. 12, pp. 1197-1206, 1993.
[2] J. Liu and Y. Yang, "Multiresolution Color Image Segmentation," IEEE Trans. Pattern Anal. & Machine Intell., vol. 16, no. 7, pp. 689-699, 1994.
[3] D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cognitive Sci., vol. 9, pp. 75-112, 1985.
[4] D. Gabor, "Theory of communication," J. Inst. Elect. Engr., vol. 93, no. III, pp. 429-459, 1946.
[5] A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," Pattern Recognition, vol. 23, pp. 1167-1186, 1991.
[6] A. Witkin, "Scale-space filtering," Proc. Int. Joint Conf. Artificial Intelligence, Karlsruhe, West Germany, pp. 1019-1022, 1983.
[7] D. Marr, "Vision," W. H. Freeman, San Francisco, 1982.
[8] P. Perona and J. Malik, "Steerable-scalable kernels for edge detection and junction analysis," in Proc. 2nd European Conf. on Computer Vision, pp. 3-18, 1992.
[9] M. Michaelis and G. Sommer, "Junction classification by multiple orientation detection," in Proc. 3rd European Conf. on Computer Vision, pp. 101-108, 1994.
[10] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, "Uniqueness of the Gaussian kernel for scale space filtering," IEEE Trans. Pattern Anal. & Machine Intell., vol. 8, no. 1, pp. 26-33, 1986.
Session G: IMAGE CODING II: TRANSFORM, SUBBAND AND WAVELET CODING
APPROXIMATION OF BIDIMENSIONAL KARHUNEN LOEVE EXPANSIONS BY MEANS OF MONODIMENSIONAL KARHUNEN LOEVE EXPANSIONS, APPLIED TO IMAGE COMPRESSION

Nello Balossino and Davide Cavagnino
Dipartimento di Informatica - Università di Torino
C.so Svizzera 185 - 10149 TORINO - Italy
E-mail: {nello, davide}@di.unito.it

Abstract
The paper treats image compression based on Karhunen Loeve expansions approximated by monodimensional expansions. The results prove that the described method leads to a huge reduction of computational complexity and required time. A comparison with the Discrete Cosine Transform is also reported.
Introduction
In many applications a capability to compress images is required, so compression algorithms are frequently embedded in software. In order to evaluate an algorithm used to compress images, the compression ratio C is defined as C = n_o/n_c, where n_c is the number of bits that encode the compressed image and n_o is the number of bits in the original image. As is well known, compression algorithms are classed as reversible or irreversible, depending on whether the decompressed image is, or is not, identical to the original one. A class of reversible compression algorithms is based on bidimensional transformations that perform a spectral analysis of parts of the image (subimages) by means of an orthonormal basis:
$$F(u, v) = \sum_{x,y} A(u, v, x, y)\, f(x, y)$$
where f(x,y) represents the original bidimensional image, F(u,v) are the transformed coefficients and A is the kernel of the transformation (A is often called the set of basis images). In order to reproduce the original image it is sufficient to use the following transformation:
$$f(x, y) = \sum_{u,v} B(u, v, x, y)\, F(u, v)$$
where B is the inverse of the kernel. A bidimensional transformation is said to be separable if and only if we can write A(u, v, x, y) = A₁(u, x)A₂(v, y).

If we quantize the coefficients F(u, v) or discard some of them before applying the inverse transformation, we expect an information loss in the reconstructed image (in our work we only discard coefficients and round the remaining ones to two-byte integers); in this way, the compression algorithms become irreversible. In this paper we concentrate on the Karhunen Loeve (KL) expansion (used also with hybrid encodings in recent works [2]) and the Discrete Cosine Transform (DCT), the latter being used as the core of the JPEG standard (see [3, 6]). Given an image of size N×N, we partition it into non-overlapping subimages of size n×n, which we interpret as a random field [7] with mean m; the autocorrelation matrix K (of size n²×n²) is computed from the centered subimages (given a subimage x, the centered subimage is x−m). The kernel of the KL transform is made up of the eigenvectors of the matrix K. The eigenvalue associated with each eigenvector is the variance of the spectral coefficients belonging to the eigenvector; we can then sort the eigenvectors in descending order with respect to their eigenvalues. If we arrange the eigenvectors by rows in a matrix A, then we can write the KL transform as y = A(x−m), where x, y and m are n×n subimages in column form (see [1, 4]) and, given that the eigenvectors constitute an orthonormal basis, we have the inverse transformation x = A′y + m (the symbol ′ meaning matrix transposition). To effect a compression we can discard the coefficients with smaller variances, keeping only the first l eigenvectors, from which we obtain (A_l being the first l rows of A)

y_l = A_l(x − m)    (1)
and

x̂ = A_l′ y_l + m    (2)

where x̂ is an approximation of x. The KL transform has the property of being optimal, with respect to all others, in the least-square-error sense, when considering the same number of coefficients. KL is thus adapted to the image from which the eigenvectors are computed, and this is the informal proof of its coding efficiency. This method has the drawback that with subimages of n×n pixels, we need to calculate the eigenvectors and eigenvalues of a symmetric matrix of dimension n²×n², so the complexity of the problem grows very rapidly (see, for example, [5]) with increasing size of the subimages. However, this increase should allow discarding a relatively greater number of coefficients to obtain larger compression ratios; this advantage has to be balanced against the increased length of the eigenvectors to be transferred to the decompression phase.
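Equations (1) and (2) can be exercised with a small numpy sketch (block extraction, eigendecomposition of the estimated autocorrelation matrix, truncation to l coefficients); the function names are ours, not the authors':

```python
import numpy as np

def kl_basis(image, n=8):
    """Estimate the n^2 x n^2 KL eigenvector basis (rows of A, by
    descending eigenvalue) from the non-overlapping n x n subimages."""
    h, w = image.shape
    blocks = (image[:h - h % n, :w - w % n]
              .reshape(h // n, n, w // n, n)
              .swapaxes(1, 2)
              .reshape(-1, n * n)
              .astype(float))
    m = blocks.mean(axis=0)
    centered = blocks - m
    K = centered.T @ centered / len(blocks)   # autocorrelation of centered blocks
    eigval, eigvec = np.linalg.eigh(K)
    A = eigvec[:, np.argsort(eigval)[::-1]].T # eigenvectors arranged by rows
    return A, m, blocks

def compress(blocks, A, m, l):
    """Eqs (1)-(2): keep the first l coefficients, then reconstruct."""
    Al = A[:l]
    y = (blocks - m) @ Al.T                    # y_l = A_l (x - m)
    return y @ Al + m                          # x_hat = A_l' y_l + m
```

With l = n² the reconstruction is exact; discarding coefficients removes exactly the energy of the smallest eigenvalues, which is the optimality argument given above.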
Method
Our goal was a set of basis images (of size n×n) having the desirable characteristics of the KL ones, but lighter in computational complexity. Thus we considered row and column vectors of dimensionality n, by subdividing the image into row and column vectors; we calculated separately a KL orthonormal basis (of size n) for the rows {r₁, ..., r_n}, with mean r_M over all the rows, and for the columns {c₁, ..., c_n}, with mean c_M over all the columns. Computing the eigenvectors involved the inversion of two n×n matrices, one for rows and one for columns. Afterwards, to obtain an orthonormal basis of size n² of n×n basis images, we multiplied every column by every row (tensor product): c_i r_j. What is obtained is an orthonormal basis of n² subimages; in fact, by hypothesis,

$$c_i' c_j = \delta_{ij}, \qquad r_k r_l' = \delta_{kl}.$$

If ⟨·⟩ is the operator that produces a row starting from a matrix, we can write

$$\langle c_i r_j \rangle = \left[\, c_{i1} r_{j1} \;\cdots\; c_{i1} r_{jn} \;\; c_{i2} r_{j1} \;\cdots\; c_{in} r_{jn} \,\right]$$

and therefore

$$\langle c_i r_j \rangle \, \langle c_k r_l \rangle^t = (c_i' c_k)(r_j r_l') = \delta_{ik}\, \delta_{jl}$$

where δ_ij is the Kronecker delta. To obtain an ordering for the significance of the obtained eigenvectors, we multiply the corresponding eigenvalues, obtaining a fictitious eigenvalue for each basis image. The mean to use when applying equations (1) and (2) can be either the mean of the n×n subimages or the mean of the mean vectors r_M and c_M calculated in this way:
$$m_{ij} = \frac{r_{Mi} + c_{Mj}}{2}$$    (3)

where r_{Mi} is the i-th pixel of r_M and c_{Mj} is the j-th pixel of c_M. We obtain a new separable transformation, derived from KL, that requires less overhead information transfer (only 2n vectors of dimensionality n plus their eigenvalues) and has a slower complexity growth than bidimensional KL when the subimage dimensions increase, but has the drawback of lower accuracy for the same number of coefficients. We compared this method with the DCT, and we noted (in our preliminary tests) that when we used only 8% of the coefficients for subimages of size 8×8, the proposed method performed better than the DCT with respect to the mean square error (4) and the relative m.s.e. (5):
$$\text{mean square error} = \frac{\sum_{\text{all pixels}} \left(f'(x, y) - f(x, y)\right)^2}{\#\text{all pixels}}$$    (4)

$$\text{relative mean square error} = \frac{\sum_{\text{all pixels}} \left(\dfrac{f'(x, y) - f(x, y)}{f(x, y)}\right)^2}{\#\text{all pixels}}$$    (5)
where f′(x, y) is the reconstructed and quantized (pixel values converted to integers) image. To compare the two methods, one should determine the distortion functions (m.s.e. and relative m.s.e.) for equal bit rates. This comparison is not possible in a precise sense, since the Huffman source coding of the same number of coefficients can vary in run length, and therefore in bit rate. We thus base our comparison on an equal number of coefficients, all of which should however be sufficiently well represented in the two-byte integer format we used. Moreover, when the image was oversampled (i.e. a pixel was set equal to three of its neighbours), the proposed method performed better than the DCT whatever number of coefficients was used when n = 8, and in almost all cases when n = 16. This can be explained by noting that the DCT exploits general characteristics of images (our eyes are not very sensitive to high-frequency distortions) while the previous method is optimized for high performance on the image under examination: what is needed and obtained is a lower complexity, with respect to bidimensional KL, in calculating eigenvalues and eigenvectors. In addition, if images with high spectral components are examined, the proposed method will perform better than the DCT, because of its adaptivity to the image it examines. Another important aspect is that to obtain higher compression ratios it is necessary to use larger subimages (i.e. to increase n), and the proposed method is faster than bidimensional KL, especially for large n (n = 16, 24, ...). The testing of the method was performed using MATLAB® [8], a software package that allows fast prototyping of mathematical models.
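The row/column construction can be sketched as follows; the tensor-product basis and the "fictitious eigenvalue" ordering follow the text, while the function names and block layout are our assumptions:

```python
import numpy as np

def monodim_kl(vectors):
    """Monodimensional KL basis: eigenvectors (by rows, descending
    eigenvalue) of the autocorrelation matrix of the centered vectors."""
    m = vectors.mean(axis=0)
    c = vectors - m
    eigval, eigvec = np.linalg.eigh(c.T @ c / len(vectors))
    order = np.argsort(eigval)[::-1]
    return eigvec[:, order].T, eigval[order], m

def separable_basis(image, n=8):
    """Row/column KL bases and their n^2 tensor-product basis images,
    ordered by the product of the corresponding eigenvalues (the
    'fictitious eigenvalue' of the text)."""
    R, lr, rM = monodim_kl(image.reshape(-1, n).astype(float))    # row vectors
    C, lc, cM = monodim_kl(image.T.reshape(-1, n).astype(float))  # column vectors
    basis = np.einsum('ik,jl->ijkl', C, R).reshape(n * n, n, n)   # c_i r_j'
    fict = np.outer(lc, lr).ravel()
    order = np.argsort(fict)[::-1]
    return basis[order], fict[order], rM, cM
```

Only two n×n eigenproblems are solved instead of one n²×n² eigenproblem, which is where the reported speed-up (e.g. 7.9 s vs. 55.91 s for n = 16) comes from.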
Results
We present some results obtained by applying the proposed method and the DCT to images of size 512×512 with 256 grey levels. The subimages are of size 8×8 and 16×16. In Figures 1(a) and (b) the behaviour of the m.s.e. versus the number of retained coefficients is reported when the transformations are based on subimages of size 8×8 (i.e. n = 8). Figure 2 shows the same variables for subimages of dimension 16×16 (i.e. n = 16). Note that in these figures errors were computed without rounding the coefficients, in order to analyze the capability of the methods to compact the energy into few coefficients. If the coefficients were rounded, the error would be slightly increased and the corresponding compression ratios would be those reported in Table 1 and Table 2. Obviously the compression ratio is the same for both the KL-based method and the DCT method (not taking into account, for KL, the little overhead due to the eigenvectors, eigenvalues and mean subimage). If we fix the error, then the KL method (in Figures 1(b) and 2(b), for example) will use fewer coefficients and so will have a higher compression ratio.

Table 1: Compression ratio with n=8

No. of coefficients | 2  | 4 | 8 | 10  | 16
Compression ratio   | 16 | 8 | 4 | 3.2 | 2

Table 2: Compression ratio with n=16

No. of coefficients | 2  | 4  | 8  | 10   | 16
Compression ratio   | 64 | 32 | 16 | 12.8 | 8
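The table entries follow directly from the storage assumptions stated above (8-bit pixels in, retained coefficients stored as two-byte integers, eigenvector overhead ignored); a one-line check:

```python
def compression_ratio(n, ncoef, pixel_bits=8, coef_bits=16):
    """Ratio of original to compressed bits for one n x n subimage, with
    the retained coefficients stored as two-byte integers."""
    return (n * n * pixel_bits) / (ncoef * coef_bits)

# reproduces Table 1 (n=8) and Table 2 (n=16)
assert [compression_ratio(8, c) for c in (2, 4, 8, 10, 16)] == [16, 8, 4, 3.2, 2]
assert [compression_ratio(16, c) for c in (2, 4, 8, 10, 16)] == [64, 32, 16, 12.8, 8]
```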
The first image considered is the classical Boat. The second image is a Nuclear Magnetic Resonance image of size 256×256 enlarged to 512×512 by means of pixel replication. We note in the graphs that the behaviour of the errors of the two transformations is similar (in Figures 1(a) and 2(a)), and better for the KL-based transform (in Figures 1(b) and 2(b)). Compatible qualitative results are obtained by visually inspecting the reconstructed images as the number of retained coefficients is reduced. We performed a time test of the classical KL transform versus the monodimensional KL transform using the tic & toc functions of MATLAB®. The test was performed on a 120 MHz Pentium running Windows 95. For 8×8 subimages the classical method computed the basis images in 10.93 seconds (average value) while the new method computed them in 5.1 seconds (average value). For 16×16 subimages the classical method computed the basis images in 55.91 seconds (average value) while the new method computed them in 7.9 seconds (average value).
Figure 1: The m.s.e. values in reconstructing the Boat (a) and NMR (b) images using subimages of dimension 8×8.
Figure 2: The m.s.e. values in reconstructing the Boat (a) and NMR (b) images using subimages of dimension 16×16.
Acknowledgements This work has been supported by the national project of MURST "Sviluppo di una workstation multimediale ad architettura parallela". The authors thank prof. A. Werbrouck for critical comments and textual suggestions.
References
[1] R. C. Gonzalez and P. Wintz. Digital Image Processing. Addison-Wesley, 1987.
[2] F. G. Horowitz, D. Bone and P. Veldkamp. Karhunen-Loeve based Iterated Function System encodings. In International Picture Coding Symposium, Melbourne, March 1996.
[3] K. R. Rao and P. Yip. Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press, Inc., San Diego, 1990.
[4] A. Rosenfeld and A. C. Kak. Digital Picture Processing, volume 1, 2nd ed. Academic Press, New York, 1982.
[5] C. A. L. Szuberla. Discrete Karhunen-Loève Transform. http://foo.gi.alaska.edu/~cas, DRAFT.
[6] G. K. Wallace. The JPEG still picture compression standard. Communications of the ACM, 34(4), 1991.
[7] A. M. Yaglom. An Introduction to the Theory of Stationary Random Functions. Prentice Hall, 1962.
[8] The MathWorks. MATLAB Reference Guide. The MathWorks, Inc., Natick, MA, 1992.
BLOCKNESS DISTORTION EVALUATION IN BLOCK-CODED PICTURES
M. Cireddu, F.G.B. De Natale, D.D. Giusto, and P. Pes
Department of Electrical and Electronic Engineering University of Cagliari Piazza d'Armi, Cagliari 09123 Italy
[email protected]

Abstract
In this paper, some of the most significant image quality indexes are reviewed and compared with a new method for block distortion evaluation. First, a survey is given of classical measures based on numerical differences between original and reconstructed image data (e.g., MSE and SNR), as well as of advanced methods that consider the perceptual aspects of image degradation (e.g., Hosaka plots, HVS-based methods). Then, four innovative methods for blockness distortion evaluation are described, based on DCT analysis or on the use of gradient operators.

1. Objective Distortion Measures
The most classical distortion measure is the Mean Square Error (MSE) between the original image and the decoded one. It measures pointwise variations of the image intensity by averaging the squared differences between pairs of corresponding pixels:

$$MSE = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[f(i, j) - f_r(i, j)\right]^2$$
The Signal-to-Noise Ratio (SNR) and the Peak-Signal-to-Noise Ratio (PSNR) can be directly derived from the MSE by using the following equations, which treat the distortion introduced by the coding-decoding operation as a kind of noise:

$$SNR = \frac{\sigma_x^2}{MSE}, \qquad PSNR = \frac{(2^b)^2}{MSE}, \qquad \sigma_x^2 = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(f(i, j) - \bar{f}\right)^2, \qquad \bar{f} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} f(i, j)$$
where f(i, j) is the original grey level of the (i, j)-th pixel, f_r(i, j) is the reconstructed grey level, and m, n are the image dimensions. These measures provide a global estimation of the image distortion after the coding-decoding process.

2. Advanced Methods
In this section, three of the most interesting image distortion measures are briefly reviewed, which differ from the above in the sense that human perception parameters are taken into account.
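For reference, the classical measures above (MSE, SNR, PSNR) can be computed directly; a minimal numpy sketch (PSNR here as a ratio, as in the formula above, not in dB):

```python
import numpy as np

def mse(f, fr):
    """Mean square error between original f and reconstruction fr."""
    return np.mean((f.astype(float) - fr.astype(float)) ** 2)

def snr(f, fr):
    """SNR = sigma_x^2 / MSE, sigma_x^2 being the variance of the original."""
    return np.var(f.astype(float)) / mse(f, fr)

def psnr(f, fr, b=8):
    """PSNR = (2^b)^2 / MSE; 10*log10 of this gives the usual dB figure."""
    return (2 ** b) ** 2 / mse(f, fr)
```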
2.1 Hosaka Plots
The evaluation process consists of first segmenting (splitting) the N×N blocks of the original image into k classes. The initial block size N is usually chosen as 16, thus leading to 5 classes: all blocks of size k = 1, 2, 4, 8, 16 form the k-th class. From each class, two feature vectors are calculated, based respectively on the average standard deviation and on the weighted mean, where the elements marked with '*' refer to reconstructed images. The error diagram, or H-plot, is constructed by plotting the corresponding features dS_k and dM_k in polar coordinates. The area of the H-plot is proportional to the image degradation; in particular, the presence of noise and blurring effects is revealed by looking at the left and right sides of the plot.

2.2 Information Content (IC)
This method is based on the evaluation of the perceptual distortion and therefore takes into account the characteristics of the human visual system (HVS) model. It consists of five stages: (i) the original image is re-mapped by a non-linear transformation; (ii) a linear transformation in the DCT domain is
applied to 8x8 image blocks; (iii) a matrix of coefficients is calculated at fixed resolution; (iv) the DCT coefficients are multiplied by the weights; (v) IC is determined by summing the coefficient magnitudes.

2.3 Perceptual distortion measure
The perceptual distortion measure is based on an empirical model of the human perception of spatial patterns. The model consists of four stages: (i) front-end linear filtering, (ii) squaring, (iii) normalization, and (iv) detection. A steerable pyramid transform decomposes the image locally into several spatial frequency levels; each level is further subdivided into a set of orientation bands θ ∈ {0, 45, 90, 135} degrees. The front-end linear transform yields a set of coefficients A_θ for every image region. The squared normalized output is computed, and a simple squared-error norm is adopted as the detection mechanism:

$$R_\theta = \frac{k\, A_\theta^2}{\sum_{\phi \in \{0,45,90,135\}} A_\phi^2 + \sigma^2}, \qquad \theta \in \{0, 45, 90, 135\}$$

where k is a scaling constant, σ a saturation value, and R_ref, R_dist the original and distorted image vectors.
3. Blockness distortion measures
Block distortion, or the tiling effect, is typical of any kind of block-based coding system. It consists of an annoying visual mosaic effect produced by the imperfect matching of neighboring approximated blocks. Some image coding approaches reduce this drawback by using appropriate overlapping or interleaving techniques, but most common methods (including the current standards) prefer to ignore the problem for the sake of simplicity. The methods presented hereafter evaluate the amount of this particular but very common image degradation.
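As a simple illustration in the spirit of the gradient-operator methods mentioned above (a sketch of ours, not one of the four methods described in this paper): compare the mean absolute luminance step across n-pixel block boundaries with the mean step elsewhere; a ratio well above 1 indicates tiling:

```python
import numpy as np

def blockness(img, n=8):
    """Gradient-based tiling indicator: boundary-step energy relative to
    interior-step energy for an n x n block grid aligned at the origin."""
    img = img.astype(float)
    dx = np.abs(np.diff(img, axis=1))       # steps between adjacent columns
    dy = np.abs(np.diff(img, axis=0))       # steps between adjacent rows
    bx = np.zeros(dx.shape[1], dtype=bool)
    bx[n - 1::n] = True                     # column pairs straddling a boundary
    by = np.zeros(dy.shape[0], dtype=bool)
    by[n - 1::n] = True
    boundary = dx[:, bx].mean() + dy[by, :].mean()
    interior = dx[:, ~bx].mean() + dy[~by, :].mean()
    return boundary / interior
```

A smooth ramp scores about 1, while a coarsely quantized block image scores far higher; real measures must of course also handle the degenerate case of perfectly flat block interiors.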
3.1 Methods based on DCT analysis
Two block distortion measures based on DCT analysis are considered here. Both are targeted at a particular kind of distortion appearing as a step of the luminance function in the horizontal or vertical direction, and consequently analyse the DCT features looking for this phenomenon. In our tests we considered blocks of size 8x8 at 8 bpp and their DCT coefficient matrices. A block characterised by a horizontal or vertical luminance step presents, in the corresponding coefficient matrix, a predominance in the first column or row. A block that has a double step, horizontal and vertical, has on the corresponding DCT matrix null elements (magnitude
base layer decoder, one has to use two embedded motion compensation loops in the scalable coder, each of them corresponding to a resolution.
• The transformation used [12], [9] has to enable easy high-quality reconstruction of lower resolutions using only part of the transform coefficients. A very interesting transformation was provided by [12], yielding subband PQMF filters with complexity similar to the DCT, but splitting the frequency space in a more adequate way than the DCT for hierarchical coding.
• The quantizations [3] used in the two layers are linked by some constraint. In the case of uniform quantization, the quantization step of the base layer has to be a power of two times that of the enhancement layer.
Based on these considerations, and in order to have as precise an analysis as possible of scalable coding, we used as a baseline for our experiments a scheme that significantly deviates from the MPEG2 specifications. In this scheme, depicted in Figure 1, we use a PQMF subband transformation instead of an 8x8 DCT, and the quantization step of the scalar quantization (SQ) is constrained to be a power of two.
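The power-of-two constraint linking the two quantization steps has a convenient consequence, sketched below for uniform (floor) quantizers: the base-layer index is recoverable from the enhancement-layer index by a plain arithmetic shift, so the layers nest exactly. The step values are illustrative:

```python
# Minimal sketch of the power-of-two linkage between the two layers:
# with uniform (floor) quantizers and a base-layer step equal to
# 2**m times the enhancement-layer step, the base index equals the
# enhancement index shifted right by m bits.
def q_index(x, step):
    """Index of the uniform quantizer cell containing x."""
    return int(x // step)

D_ENH = 2.0           # enhancement-layer step (illustrative value)
D_BASE = 8.0          # base-layer step = 2**2 * D_ENH
```

Because floor division and the arithmetic shift both round towards minus infinity, the identity holds for negative amplitudes as well.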
The rest of the scheme, however, conforms to the MPEG2 specifications: it uses hierarchical block matching, an IBP GOP structure and, most importantly, takes practical implementation constraints into account (such as limited precision of number representation, real VLC encoding and costs based on the construction of a structured bitstream). We also had to restrict ourselves to the study of TV and ¼-TV spatial scalability for implementation purposes. Though it would have been more accurate to work directly on TV and HDTV, our conclusions on TV and ¼-TV can easily be generalized to this case.
Scalable coding scheme
The scheme in Figure 1 could be called a «simulcast» coding scheme, since it codes each resolution separately. We used this scheme as a reference for our study, since simulcast coding is the only way to transmit several resolutions with non-scalable schemes. Another reference is the scheme using the total bit rate for coding only the full resolution. Based on the simulcast scheme, spatial scalability can be introduced as an extra refinement of the high-resolution temporal prediction using the decoded low-resolution sequence. We first wanted to study the influence of this refinement on the scalable performance. So we designed several spatially scalable schemes, using several
Proc. HAMLET Race 2110 Workshop, pp. 69-75, February 27-28, Rennes, France, 1996. J. Delameilleure, S. Pallavicini,
m > 1 is the degree of fuzzification, C ≥ 2 is the number of clusters, n is the number of data samples and D(·) is the deviation of the data vector x_k from the i-th cluster prototype. One may notice in this formulation that the cluster prototypes β_i are fixed for the entire range of the data set. This may give good results if the data are stationary, but in the context of image segmentation this is not the case. Images are highly non-stationary signals, and the implied assumption of stationarity, made by fixing the values of β to constant values throughout the image, does not result in good segmentations in terms of index of fuzziness and fuzzy entropy. This paper presents an algorithm specially optimised for image segmentation, which incorporates the non-stationarity and the neighbourhood correlation features inherently present in all non-trivial images, using a fuzzy multiresolution approach. In Section 2.1 we present the multiresolution, spatially constrained model for the adaptive segmentation of images; in Section 2.2 we discuss the non-stationary estimation of the cluster prototypes; in Section 2.3 we analyse the inter-pixel correlation model.
Finally, in Section 3 we discuss the results of the proposed scheme and the effects of different parameters on the segmentation results.
2. Analysis of the Algorithm
2.1 The multiresolution non-stationary image segmentation model
The key element of the proposed family of algorithms is that the final segmentation should incorporate all the available segmentation information calculated at the various resolution levels r, having utilised the non-stationary modelling of the cluster prototypes and the spatial constraints. If we assume that the segmentation U^r performed at each resolution level is correct, then each segmentation result should contribute, to some degree, to the calculation of the final fuzzy partition matrix; however, the interpretation of the results of segmentation at each resolution level and the restrictions imposed by the various resolutions should be considered. Let X denote the image data, having values that typically range from 0 to 255. Let x_k denote the intensity of a pixel at location k, with k ∈ [0, M×N−1], where M, N are the image dimensions. The fuzzy segmentation of the image into c regions (clusters) is obtained by finding the fuzzy partition matrix U = [u_ik]. In the proposed model, the prototype vectors β_i vary
with the location k, i.e. β_i = β_i(k). Like both FCM and PCM, our approach iterates between estimating β_i(k) and updating the partition matrix U using the calculated estimates of β_i(k). The prototype values are estimated using a hierarchical approach. We construct a pyramid of images X^r at different resolutions r, having dimensions M^r × N^r, starting from the highest resolution image (r = 0) by ideal low-pass filtering and decimating by two. Let β_i^{r,W}(k) denote the estimated cluster prototype for cluster i out of c, at resolution level r, using a window of size W. Let also U^r denote the fuzzy partition matrix for a certain resolution level r, having dimensions c × (2^{-r}M × 2^{-r}N). At the lowest resolution image, typically of dimensions 32×32 (r = 3), either the FCM or the PCM algorithm is applied, its result being an initial segmentation: U^3 = PCM(X^3) or U^3 = FCM(X^3). For each resolution level, the following calculations take place. The values of β_i are calculated (as described in Sec. 2.2) in a window of size W_size that is equal to half the image size; then, the fuzzy partition matrix U^r is calculated in the following manner:

u_i^r(k) = 1 / ( 1 + ( ||x_k − β_i^{r,W}(k)||² / η_i )^{1/(m−1)} )    (1)
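Equation (1) can be sketched directly. The scalar-intensity simplification and the fixed per-cluster prototypes below are illustrative; in the algorithm the prototypes vary with the location k:

```python
import numpy as np

def pcm_membership(x, prototypes, eta, m=2.0):
    """Possibilistic memberships in the form of eq. (1):
    u_i(k) = 1 / (1 + (||x_k - beta_i(k)||^2 / eta_i)^(1/(m-1))).
    Prototypes are taken as one scalar per cluster for brevity."""
    x = np.asarray(x, dtype=float)
    b = np.asarray(prototypes, dtype=float)[:, None]
    e = np.asarray(eta, dtype=float)[:, None]
    d2 = (x[None, :] - b) ** 2        # squared deviation per cluster/pixel
    return 1.0 / (1.0 + (d2 / e) ** (1.0 / (m - 1.0)))
```

A pixel lying exactly on a prototype receives membership 1 for that cluster, and the membership decays with distance at a rate controlled by η_i and m.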
The η_i values, which define the inter-cluster distance, are calculated as the standard deviations between the prototype values estimated within the window W and the image data. Finally, the spatial constraints are taken into account, thus modifying U^r as described in Sec. 2.3. When the calculation of U^r has converged for a certain window size, the window size is reduced by a factor of two and the whole process is repeated until a minimum window size W_min = 8 is reached. The calculation of U^r has converged for a window size if the number of changes in the fuzzy partition matrix is lower than a specified threshold. A good threshold was found to be 5% of the last number of changes. Typically, 3 to 5 iterations are adequate. When the algorithm has converged for the minimum window size at a resolution level, we have the segmented image for that resolution. The values of β_i^{r,W} obtained are expanded by a factor of 2, and the process of re-estimating β_i^{r,W} and updating the fuzzy partition matrix is repeated for the next resolution level, until the original resolution level is reached. The convergence of U^r is followed by a data fusion procedure that utilises all the segmentation information obtained for the different resolutions to calculate the final segmentation. If we assume a multiresolution quad-tree structure for the segmented pixels, then each segmented pixel at resolution r has four children at resolution r+1. We define an information gain metric (IGM) for measuring the knowledge that the calculation of U for a cluster i has provided at the higher resolution, as the difference between the parent's possibility of belonging to each class i and the average of the children's class assignments, that is:
IGM_i^r[k,l] = u_i^r[k,l] − (1/4) Σ_{children} u_i^{r+1}[m,n]    (2)
If IGM_i^r[k,l] is close to zero, then the existence of a homogeneous region is implied and the updated partition matrix results for cluster i are correct with possibility 1 − IGM_i^r[k,l]; otherwise, details must have emerged and the cluster assignments of the lower resolution segmentation are correct with a lower possibility. If U^r_{MN} denotes the results of segmentation at resolution r expanded to dimensions M×N, the final fuzzy partition matrix U is calculated in the following manner:

U = (1/K) Σ_{r=r_min}^{0} (1 − IGM^r) · U^r_{MN}    (3)

where K is a normalising constant. The factor 1 − IGM^r removes the bias towards the results of lower resolution segmentation only when details emerge at higher resolutions, providing consistent segmentation of homogeneous regions.
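The information gain metric of Eq. (2) reduces to a quad-tree average. A minimal sketch for a single cluster, assuming the parent and child partition matrices are stored as 2-D arrays:

```python
import numpy as np

def igm(u_parent, u_child):
    """Information gain metric of eq. (2): the parent's membership minus
    the average membership of its four quad-tree children. u_parent has
    shape (H, W), u_child has shape (2H, 2W), both for one cluster i."""
    h, w = u_parent.shape
    # Group each 2x2 block of children under its parent and average it.
    child_avg = u_child.reshape(h, 2, w, 2).mean(axis=(1, 3))
    return u_parent - child_avg
```

In homogeneous regions the children agree with the parent and the metric vanishes, so the fusion of Eq. (3) keeps the low-resolution result; where details emerge, the metric grows and the low-resolution contribution is down-weighted.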
2.2 Non-stationary estimation of the cluster prototypes
The estimation of the non-stationary cluster prototypes is one of the key elements for the performance of the proposed algorithm. We assume that there exists an ordering such as β_1(k) < β_2(k) < ... < β_c(k) (e.g. «dark» objects are always darker than «bright» ones). The ordering is performed after the initial application of FCM or PCM to the lowest resolution image. One can easily observe that for a given window size W, the following relation holds:
min{x^r(k)} ≤ β_1^{r,W}(k)
When ρ > 2, the power of the test is large.
Fast Mixture Decomposition: If N_{n×n}(i,j) is decided to be homogeneous, then the true value of pixel (i,j) is estimated by the sample mean of N_{n×n}(i,j), which is the best estimator (unbiased and of minimum variance) for the case of Gaussian noise. If the decision is that N_{n×n}(i,j) is heterogeneous, then the unknown parameters of the mixture P_0, P_1, μ_0, μ_1 are estimated by the method of moments using the first three sample moments [8, 7]. The closed-form estimators are

μ̂_0 = (β − √(β² − 4γ)) / 2,    μ̂_1 = (β + √(β² − 4γ)) / 2,

P̂_0 = (μ̂_1 − c_1) / (μ̂_1 − μ̂_0),    P̂_1 = (c_1 − μ̂_0) / (μ̂_1 − μ̂_0),

where

β = (c_3 − c_1 c_2) / (c_2 − c_1²),    γ = (c_1 c_3 − c_2²) / (c_2 − c_1²),

c_1 = m_1,    c_2 = m_2 − σ²,    c_3 = m_3 − 3 m_1 σ²,

and m_ℓ, for ℓ = 1, 2, 3, is the ℓ-order sample moment of N_{n×n}(i,j).
Local Classification: The estimates μ̂_0, μ̂_1, P̂_0 and P̂_1 are used in the calculation of the threshold T, used to separate the two classes, that is

T = (μ̂_0 + μ̂_1) / 2 + (σ² / (μ̂_1 − μ̂_0)) ln(P̂_0 / P̂_1).

The value of T is used for the classification of the pixel (i,j) as follows
Î_LC(i,j) = μ̂_1 if J(i,j) > T, and μ̂_0 otherwise.
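The closed-form moment estimators and the threshold T above can be sketched as follows; the function name and the synthetic-data check are illustrative:

```python
import numpy as np

def mixture_moments(samples, sigma):
    """Method-of-moments estimates for a two-level mixture observed
    through Gaussian noise of known standard deviation sigma, using
    the closed forms of the text, plus the threshold T."""
    s = np.asarray(samples, dtype=float)
    m1, m2, m3 = s.mean(), (s**2).mean(), (s**3).mean()
    # Noise-corrected moments of the underlying two-point distribution.
    c1, c2, c3 = m1, m2 - sigma**2, m3 - 3 * m1 * sigma**2
    beta = (c3 - c1 * c2) / (c2 - c1**2)
    gamma = (c1 * c3 - c2**2) / (c2 - c1**2)
    root = np.sqrt(beta**2 - 4 * gamma)
    mu0, mu1 = (beta - root) / 2, (beta + root) / 2
    p0 = (mu1 - c1) / (mu1 - mu0)
    p1 = (c1 - mu0) / (mu1 - mu0)
    # Threshold separating the two classes.
    T = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(p0 / p1)
    return mu0, mu1, p0, p1, T
```

On synthetic data matching the paper's setting (levels 80 and 110, σ = 5), the estimates recover the mixture parameters closely.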
The above algorithm reduces the noise quite efficiently while preserving intensity discontinuities very well, especially when the signal-to-noise ratio is above a certain level (ρ > 2).
3 Iterated Conditional Modes (ICM) Algorithm
The resulting estimate of the true image from the first stage, Î_LC, may contain errors due to: a) wrong decisions at the hypothesis testing phase, and b) inaccurate parameter estimates and, consequently, inaccurate data classification. This initial estimate may be further improved by exploiting the spatial smoothness of the image. One of the simplest approaches to capture the spatial smoothness of the true image is to consider it as a pairwise-interaction Markov random field [4, 10] with probability distribution
P(I) ∝ exp{−U(I)},  where

U(I) = β Σ_{(i,j)∈G} Σ_{(k,l)∈N_{3×3}−{(i,j)}} δ(I(i,j), I(k,l)),

and δ(a,b) is equal to −1 if a = b, and 0 otherwise. Given the observed image J, the posterior distribution of I is

P(I|J) ∝ exp{−U(I|J)},
where

U(I|J) = Σ_{(i,j)∈G} { (1/2) ln(σ²) + [J(i,j) − I(i,j)]² / (2σ²) + β Σ_{(k,l)∈N_{3×3}−{(i,j)}} δ(I(i,j), I(k,l)) }.
One approach to estimating I is to maximize the following pixel conditional probability at each pixel [4], that is,

P(I(i,j) | J, I_{G_{ij}}) ∝ P(J(i,j) | I(i,j)) P(I(i,j) | I_{N_{3×3}(i,j)}),    (2)

where G_{ij} and N_{3×3}(i,j) are the supports G and N_{3×3} without the pixel (i,j). The ICM algorithm requires a good initial estimate of the true image, which is provided by the first stage. The entire image is processed iteratively, each time in raster scan order. The intensity value at each pixel is selected to maximize Eq. (2), and the parameter β needs to be adjusted. Our experiments have shown that the algorithm converges in less than 15 iterations and that the best results are achieved when β ∈ [1.0, 2.0].
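One possible sketch of the ICM iteration: at each pixel, the intensity level minimising the data term plus the β-weighted neighbour-disagreement term is selected in raster order. The greedy structure follows the description above; the function name and the two-level set are illustrative:

```python
import numpy as np

def icm(J, labels, levels, sigma, beta=1.5, iters=15):
    """Greedy ICM sketch: per pixel, pick the level minimising
    (J - v)^2 / (2 sigma^2) + beta * sum of delta(v, neighbour),
    where delta(a, b) = -1 when a == b (agreement lowers the energy)."""
    I = labels.copy()
    h, w = I.shape
    nbrs = ((-1, 0), (1, 0), (0, -1), (0, 1),
            (-1, -1), (-1, 1), (1, -1), (1, 1))
    for _ in range(iters):
        changed = 0
        for i in range(h):
            for j in range(w):
                best, best_e = I[i, j], np.inf
                for v in levels:
                    e = (J[i, j] - v) ** 2 / (2 * sigma**2)
                    for di, dj in nbrs:
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            e += beta * (-1 if I[ni, nj] == v else 0)
                    if e < best_e:
                        best_e, best = e, v
                if best != I[i, j]:
                    changed += 1
                I[i, j] = best
        if changed == 0:       # local maximum of the posterior reached
            break
    return I
```

With the paper's parameter range (β = 1.5, σ = 5) this cleans up a noisy two-level initial labelling in a handful of sweeps.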
4 Experimental Results
In order to quantitatively measure the noise reduction ability of the presented algorithm, we propose the following performance criteria. The set of pixels of a noise-free piecewise constant image is partitioned into two sets D_ℓ, for ℓ = 0, 1. D_0 contains all pixels having a homogeneous neighborhood, that is, pixels whose neighborhood pixels have equal intensity values. D_1 contains the remaining pixels, which have a heterogeneous neighborhood, that is, the neighborhood contains at least one pixel with a different intensity value than the rest. The noise reduction ability in either the homogeneous (ℓ = 0) or heterogeneous (ℓ = 1) image areas is defined by
F_ℓ = σ² / E_ℓ,  where  E_ℓ = (1/|D_ℓ|) Σ_{(i,j)∈D_ℓ} [I(i,j) − Î(i,j)]²,

and |D_ℓ| is the size of D_ℓ. Table 1 presents a comparison of various algorithms based on F_0 and F_1 on a synthetic piecewise constant image with Gaussian noise of σ = 5, 10 and 15. The synthetic image contains several different objects with intensity equal to 110 and background intensity equal to 80. It is clear that the proposed algorithm (Local Classification & ICM) outperforms the rest of the methods with respect to both measures F_0 and F_1. In addition, the results of the application of the proposed algorithm (neighborhood size 9×9) to a real MR brain image are presented. Figure 1 shows a section of the original image, the initial estimate and the final estimate. The significant image quality improvement due to the ICM algorithm, and the ability of the overall smoothing algorithm to effectively reduce noise while preserving intensity discontinuities, are clearly demonstrated.
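The performance criterion above is straightforward to compute; a minimal sketch, where the mask selecting D_ℓ is supplied by the caller:

```python
import numpy as np

def noise_reduction_measure(I_true, I_est, noise_var, mask):
    """F_l = sigma^2 / E_l, where E_l is the mean squared error between
    the true and estimated images over the pixel set D_l selected by
    mask (homogeneous or heterogeneous neighbourhoods). Larger is better."""
    diff = np.asarray(I_true, float)[mask] - np.asarray(I_est, float)[mask]
    return noise_var / (diff ** 2).mean()
```

An estimator whose residual MSE equals the input noise variance scores F = 1; effective smoothing drives F well above 1.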
5 Conclusions
An efficient two-stage noise reduction method was presented, which is suitable for approximately piecewise constant images corrupted by stationary Gaussian noise and preserves image structure. The initial estimate obtained by local classification is improved by applying the ICM algorithm. Based on our experimental comparison, we concluded that the noise reduction and edge preservation properties of the proposed algorithm are superior to those of other well-known smoothing algorithms. Finally, it is noted that the performance of the local classification stage decreases when the noise level is high, and the application of ICM significantly improves the initial estimation results.
References
[1] K. Haris, S.N. Efstratiadis, N. Maglaveras, and C. Pappas. "Hybrid Image Segmentation Using Watersheds". Volume 2727, pages 1140-1151, Orlando, FL, April 1996.
[2] I. Pitas and A.N. Venetsanopoulos. Nonlinear Digital Filters: Principles and Applications. Kluwer Academic Publishers, 1990.
[3] J. Marroquin, S. Mitter, and T. Poggio. "Probabilistic Solution of Ill-Posed Problems in Computational Vision". Journal of the American Statistical Association, 82(397):76-89, March 1987.
[4] J. Besag. "On the Statistical Analysis of Dirty Pictures". J. R. Statist. Soc. B, 48(3):259-302, 1986.
[5] P. Saint-Marc, J. Chen, and G. Medioni. "Adaptive Smoothing: A General Tool for Early Vision". IEEE Trans. on Pattern Anal. and Mach. Intell., 13(6):514-529, June 1991.
[6] P. Perona and J. Malik. "Scale-Space and Edge Detection Using Anisotropic Diffusion". IEEE Trans. on Pattern Anal. and Mach. Intell., 12(7):629-639, July 1990.
[7] K. Haris. A Hybrid Algorithm for the Segmentation of 2D and 3D Images. Master's thesis, University of Crete, Greece, 1994.
[8] K. Haris, G. Tziritas, and S. Orphanoudakis. "Smoothing 2-D or 3-D Images Using Local Classification". In Proceedings of EUSIPCO'94, Edinburgh, September 1994.
[9] Z. Wu. "Homogeneity Testing for Unlabeled Data: A Performance Evaluation". CVGIP: Graphical Models and Image Processing, 55(5):370-380, September 1993.
[10] R. Dubes and A. Jain. "Random Field Models in Image Analysis". Journal of Applied Statistics, 16(2):131-164, 1989.
[Table 1 is not legible in this reproduction; it lists F_0 and F_1 at noise levels σ = 5, 10 and 15 for: the initial estimate; Local Classification; Local Classification & ICM; Local Classification & Median (window sizes 7×7, 9×9, 11×11); Neighborhood Averaging; Median Filtering; Gradient Inverse Filtering; and Anisotropic Diffusion (iteration numbers corresponding to the indicated window sizes).]
Table 1: Comparison of various noise reduction methods on a synthetic image.
Figure 1: Left: A section of the observed noisy MR image. Middle: The result of the initial noise reduction stage. Right: The result after the application of the ICM algorithm to the initial estimate.
Session L: ADAPTIVE SYSTEMS II: CLASSIFICATION
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
A neural approach to invariant character recognition
Iraklis M. Spiliotis 1, Panagiotis Liatsis 2, Basil G. Mertzios 1 and Yannis P. Goulermas 2
1 Department of Electrical & Computer Engineering, Democritus University of Thrace, GR-67100 Xanthi, Greece
2Control Systems Centre UMIST Manchester M60 1QD United Kingdom
Abstract
Geometric transformations constitute a difficulty in optical character recognition (OCR) systems. This work describes the development of an intelligent OCR system based on higher-order neural networks (HONNs). These networks can be designed such that their outputs remain invariant to certain geometric distortions, such as translation, rotation and scale. The main obstacle in the practical application of HONNs is the explosion of the weights due to the number of input combinations. This problem is tackled using an efficient object representation scheme called image block representation (IBR).
I. Introduction
Invariant object recognition is a major research area in computer vision. A number of approaches have been proposed to address the issue of image correspondence once geometric transformations are applied [1]-[5]. The question remains how to select the set of image features which will ensure an acceptable recognition rate. Higher-order neural networks have been developed over the past decade as an alternative to traditional object recognition approaches [6]-[10]. The basic concept of HONNs is the expansion of the input representation space using higher-order combinations of the input terms, such that the mapping from the input to the output space can become more readily obtainable. This idea has a certain appeal in object recognition systems, since geometric feature extraction mechanisms can be incorporated within the structure of the HONN. For instance, it has been shown that features such as distances and line slopes defined by point pairs, and angles of similar triangles defined by point triplets, are respectively invariant to translation-rotation, translation-scale, and translation-rotation-scale transformations. As mentioned above, these invariants can be obtained by enriching the input representation space with all possible point combinations (up to a certain order). Clearly, this constitutes a serious limitation for the application of HONNs in invariant image recognition, where images may have a spatial resolution of 512x512 pixels. In particular, for an MxN image and n-order point combinations, the number of input terms will be augmented by (MxN)!/((MxN−n)!n!). To allow the application of HONNs to object recognition problems, the technique of coarse coding has been proposed. This decomposes the image into a set of non-overlapping, offset images of coarse resolution, such that the number of input combinations is reasonably bounded. However, coarse coding does not ensure lossless image representation and thus does not allow perfect image reconstruction [11].
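The growth of the input term count quoted above is easy to check:

```python
from math import comb

def higher_order_terms(M, N, n):
    """Number of n-th order input combinations for an MxN image,
    i.e. the binomial coefficient C(M*N, n) from the text."""
    return comb(M * N, n)

# Even second-order terms are already prohibitive at realistic sizes,
# which motivates reducing the input to a handful of critical points:
pairs_64 = higher_order_terms(64, 64, 2)      # 64x64 character images
pairs_512 = higher_order_terms(512, 512, 2)   # full-resolution images
```

At 64x64 the network would already need over eight million second-order inputs per hidden unit; at 512x512 the count exceeds 3x10^10.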
This research proposes the use of a simple yet effective object representation scheme called image block representation [12]-[14]. This method decomposes the object into a set of non-overlapping rectangular regions, which can then be used to extract the so-called critical points, i.e. points of interest on the object. The number of critical points is relatively small (when compared to the number of object pixels) and subsequently they can be used as direct inputs to the HONN architecture. The performance of the system is evaluated in the case of binary character recognition.
II. Higher-Order Neural Networks
A major criticism of single-layered perceptrons [15] was that they were unable to perform non-linear separation, an example being the XOR problem, due to the simplicity of the resulting decision boundaries. A way of dealing with this problem was to generalise the perceptron architecture so that it accommodates intervening layers of neurons, capable of extracting abstract features, thereby resulting in networks that could solve reasonably well any given input-output problem [16]. An alternative approach, based on recent studies of information processing exhibited by biological neural networks [17] as well as Group Method of Data Handling (GMDH) algorithms [18], was the expansion of the input representation space by using multi-linear terms. This gave rise to a family of neural networks collectively known as higher-order neural networks. In general, the output of a first-order neural network is defined by [9]
y_i = f( Σ_k T_k^{hid}(i) ) = f( T_0^{hid}(i) + T_1^{hid}(i) ), for hidden nodes
y_i = f( Σ_k T_k^{out}(i) ) = f( T_0^{out}(i) + T_1^{out}(i) ), for output nodes    (1)

where f(net) is a nonlinear threshold function, such as the sigmoid function, and T_0^k(i) is the bias term for output i of the k-th layer (where k takes the values hidden and output), given by

T_0^k(i) = w_{i0}^k,    (2)

and T_1^k(i) are the first-order terms for the i-th output unit of the k-th layer,

T_1^k(i) = Σ_j w_{ij}^k x_j^{k−1},    (3)

where w_{ij} are the interconnection weights for each input x_j of layer (k−1) and output node i of layer k. Generalising to mixed n-th order networks gives [10]

y_i = f( Σ_n T_n^{hid}(i) ) = f( T_0^{hid}(i) + T_1^{hid}(i) + T_2^{hid}(i) + ... + T_n^{hid}(i) )    (4)

for the nodes of the hidden layer, where T_2^{hid}(i) and T_n^{hid}(i) are given by

T_2^{hid}(i) = Σ_k Σ_j w_{ijk}^{hid} x_j x_k  (second-order terms),
T_n^{hid}(i) = Σ ... Σ w_{ijk...}^{hid} x_j x_k ... x_n  (n-th order terms).    (5)
Consider an object and any two non-identical points A, B on the object. Next, an arbitrary translation and/or rotation of the object within the image is applied, and points A, B become A' and B'. Since the invariant under translation and/or rotation is the relative distance between any two points on the object, the output of the HONN can be handcrafted to be invariant to this set of transformations by considering only the second-order terms [9]
y_i = f( Σ_j Σ_k w_{ijk}^{hid} x_j x_k ),    (6)

and by constraining the input-hidden weights to satisfy

w_{iAB}^{hid} = w_{iA'B'}^{hid},  if d_{AB} = d_{A'B'},    (7)
where d_{AB} and d_{A'B'} are the Euclidean distances between points A, B and A', B', respectively. The learning rule for higher-order neural networks is the backpropagation algorithm, appropriately modified to accommodate the inclusion of the higher-order terms in the hidden layer. The updating rule for the weights of the hidden layer is then given by [10]

Δw_{ijk} = η δ_i Σ_j Σ_k x_j x_k,    (8)
where η is the learning rate, the δ's are calculated as in classical backpropagation, and k, j take values which satisfy the invariance constraints.
III. Image Block Representation
A bilevel digital image is represented by a binary 2-D array. Without loss of generality, we consider that the object pixels are assigned to level 1 and the background pixels to level 0. Due to this kind of representation, there are rectangular areas of object value 1 in each image. These rectangles, called blocks, have their edges parallel to the image axes and contain an integer number of image pixels. Consider a set that contains as members all the non-overlapping blocks of a specific binary image, in such a way that no other block can be extracted from the image (or, equivalently, each pixel with object level belongs to only one block). It is always feasible to represent a binary image with a set of all the non-overlapping blocks with object level, and this information-lossless representation is called Image Block Representation (IBR) [12]. Given a specific binary image, different sets of different blocks can be formed. Actually, the non-unique block representation does not have any implications on the implementation of any operation on a block-represented image.
The IBR concept leads to a simple and fast algorithm, which requires just one pass of the image and a simple bookkeeping process. In fact, considering an N_1×N_2 binary image f(x,y), x = 0,1,...,N_1−1, y = 0,1,...,N_2−1, the block extraction process requires a pass over each line y of the image. In this pass all object-level intervals are extracted and compared with the previously extracted blocks. As a result, a set of all the rectangular areas with level 1 that form the object is obtained. A block-represented image is denoted as

f(x,y) = {b_i : i = 0,1,...,k−1},    (9)

where k is the number of blocks. Each block is described by four integers: the coordinates of the upper left and lower right corners in the vertical and horizontal axes. The block extraction process is implemented easily with low computational complexity, since it is a pixel checking process without numerical operations. Fig. 1 illustrates the blocks that represent an image of the character d.
Figure 1. Image of the character d and the blocks.
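The one-pass block extraction can be sketched as follows. This is a minimal version assuming that a run extends the block opened on the previous line only when the column extents coincide exactly, which is what keeps every block rectangular; the function name and tuple layout are illustrative:

```python
def extract_blocks(img):
    """One-pass image block representation sketch: object-level runs on
    each line either extend a block open on the previous line with an
    identical column extent, or start a new block.
    Returns blocks as (x1, y1, x2, y2) corner tuples."""
    open_blocks = {}                  # (x_start, x_end) -> y_start
    closed = []
    H = len(img)
    for y in range(H + 1):            # one extra pass to flush open blocks
        runs = set()
        if y < H:
            row, x = img[y], 0
            while x < len(row):       # extract object-level intervals
                if row[x]:
                    x0 = x
                    while x < len(row) and row[x]:
                        x += 1
                    runs.add((x0, x - 1))
                else:
                    x += 1
        # Close blocks whose extent did not continue on this line.
        for ext in list(open_blocks):
            if ext not in runs:
                closed.append((ext[0], open_blocks.pop(ext), ext[1], y - 1))
        # Open a new block for each run that is not a continuation.
        for ext in runs:
            if ext not in open_blocks:
                open_blocks[ext] = y
    return closed
```

Each pixel is inspected once and no numerical operations are performed, matching the complexity claim above.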
IV. Critical Points Extraction
An object normalization procedure is first executed in order to facilitate rotation-invariant descriptions of the objects. Specifically, the maximal axis of the object is found and the whole object is rotated in such a way that the maximal axis is in a vertical position and the upper half of the image object contains most of the object's mass. At this point, it is necessary to give the following definitions [14]:
1. Group is an ordered set of connected blocks, such that all its intermediate blocks are connected with two other blocks, while the first and last blocks are connected with only one block.
2. Junction point is a point that is connected with two other points.
3. End point is a point that is connected with only one other point.
4. Tree point is a point that is connected with more than two other points.
5. Critical point is a junction, an end or a tree point.
In this research, a fast non-iterative critical points detection method for block-represented binary images is presented. The method has low computational complexity, extracts only critical points and, to a degree, appears to be immune to locality problems. This is achieved by the use of a suitable neighbourhood in each case. Specifically, groups of connected blocks are formed. Each group is terminated when an adjacent block does not exist for its continuation, or when two or more blocks exist for the continuation of the group.
Figure 2. (a) Image of the character B. (b) The extracted blocks. (c) The groups of blocks. (d) The critical points.
Each group defines a local neighborhood, and all the necessary processing takes place in this neighborhood. Using a few simple rules for the processing, the groups are checked and labeled into certain categories:
• Vertical elongated groups. The absolute value of the angle of these groups with the horizontal axis is usually greater than 30°. The width of each block of a vertical elongated group should not exceed a threshold value. The connections among the blocks result in junction points, which belong to the thinned line that results from the group. For each pair of connected blocks, one junction point (the central point of the common line segment of the two connected blocks) is extracted. For each block we check whether the distance between its junction points and its extremities (i.e. the central points of the edges of the small dimension of the block) exceeds a threshold value.
• Horizontal elongated groups. The absolute value of the angle of these groups with the horizontal axis is smaller than 30°. The width of a horizontal elongated group is significantly greater than its height, and its height also appears to have small variation. For the extraction of the junction points, the algorithm starts from the left end of
a horizontal elongated group and moves to the right with constant-width steps. At each step a junction point is extracted at the middle of the height of the group at this vertical position.
• Angle groups. The angle groups are connected with two other groups that lie on the same vertical or horizontal side of the angle group. The width and the height of an angle group are usually small. An angle group should not be connected to a noisy group. If a group has been labeled as an angle group and it is connected with a noisy group, then the label "angle" is replaced by the horizontal elongated label or the vertical elongated label. Three junction points are extracted from an angle group: two junction points due to the connections with the two groups, and another one for the formulation of an angle.
• Noisy groups. These are small and spurious branches of the object. The noisy groups have width and height less than a threshold, and they are connected to only one group, which is not an angle group. In most cases, the noisy groups are connected from the left or right side to vertical elongated groups, or from the up or down side to horizontal elongated groups. In these cases the extraction of junction points from the noisy groups is not acceptable, as otherwise a noisy end point would be created. Junction points are extracted from a noisy group if and only if the noisy group is connected at the ends of an elongated group.
Fig. 2 demonstrates (a) an image of the character B, (b) the extracted blocks, (c) the groups of blocks and (d) the critical points.
V. Results
In this work, we examine the application of the IBR and HONN techniques to the problem of recognising typed characters. The binary data consisted of the 26 Latin letter characters (A-Z) and the 10 digits (0-9), with a spatial resolution of 64x64 pixels. Since the digits '6' and '9' are rotationally equivalent, they were considered as the same pattern. The font style selected for training the OCR system was 'Times New Roman'. Next, the techniques were applied to each of the 35 characters presented in 5 random translations and rotations, giving a total of 175 training patterns. The first stage of the system was the pre-processing, which resulted in the critical points extraction. Here, the rotation normalisation procedure is applied to ensure the success of the IBR scheme. Due to the discrete nature of the image grid, some noise was introduced to the characters when their maximal axis was set to the vertical position. Next, each of the characters was decomposed into its resulting blocks, and the groups were labeled into vertical/horizontal elongated, angle and noisy. Finally, the critical points were found, using the procedure described in the last section. The second stage of the system was the classifier. Here, the critical points were fed into a second-order neural network, which had a built-in feature extraction mechanism. This provided invariant classification with respect to translation and rotation. The input layer of the higher-order neural network had 256 inputs. This number was selected to correspond to the maximum number of critical points extracted from any one of the training images. Since a binary representation encoding was employed in the output layer, there were only 8 output units. The number of units in the hidden layer was determined by using a genetic optimisation scheme [10], which provides the minimal-optimal network topology for a given classification problem.
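The classifier's built-in second-order feature extraction (the weight-sharing constraint (7) of Section II) amounts to pooling products of active critical points by the distance of the point pair, so that one shared weight serves every pair at the same distance. A minimal sketch, in which the distance binning is an illustrative assumption (the exact scheme works on discrete pixel-grid distances):

```python
import numpy as np
from itertools import combinations

def second_order_invariant_features(points, bins):
    """Pool second-order terms x_j * x_k by the (quantised) Euclidean
    distance of the point pair, so that one shared weight per distance
    bin realises the invariance constraint (7). For binary inputs the
    product of two active points is 1, so the feature is a pair count."""
    feats = np.zeros(len(bins) - 1)
    for a, b in combinations(points, 2):
        d = np.hypot(a[0] - b[0], a[1] - b[1])
        idx = np.searchsorted(bins, d) - 1
        if 0 <= idx < len(feats):
            feats[idx] += 1.0
    return feats
```

Because pairwise distances are unchanged by translation and rotation, a rotated and shifted copy of the same critical-point set yields the identical feature vector.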
The network was able to learn to discriminate between the 35 types of printed characters after 500 epochs. To evaluate the performance of the trained system, two testing procedures were applied. We firstly tested the system with a set of 700 patterns, obtained by applying 20 random translations and rotations to each of the 35 characters. It was still able to distinguish between all translated and rotated versions of the characters with 100% accuracy. Next, the characters were corrupted using variable percentages of binary salt-and-pepper noise. It was observed that the system was able to distinguish with 100% accuracy for additive noise of up to 10%, and still had a satisfactory performance (recognition accuracy > 70%) for noise levels of 25%.
VI. Conclusions
A new approach to the problem of Optical Character Recognition was presented. The proposed system uses an efficient object representation scheme called image block representation, which decomposes the characters into non-overlapping rectangular regions, which are then used to find the critical points. Next, the critical points are fed into a higher-order neural network with invariances to translation and rotation. This alleviates the problem of the combinatorial explosion of the higher-order terms associated with the use of HONNs. The optimal number of hidden units for solving the character recognition problem was determined using a Genetic Algorithms (GA) scheme. The structure of the neural network was selected to be 256 inputs - 5 hidden - 8 outputs. The system was able to identify the translated/rotated patterns with 100% recognition accuracy, while it demonstrated robustness to additive noise.
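The salt-and-pepper robustness test described above can be reproduced with a sketch that flips a given percentage of pixels in a 64x64 binary character image. The function name and the seeding are assumptions for illustration:

```python
import random

def salt_and_pepper(image, noise_pct, seed=0):
    """Flip noise_pct percent of the pixels of a binary (0/1) image,
    simulating the salt-and-pepper corruption used in the robustness test."""
    rng = random.Random(seed)
    out = [row[:] for row in image]          # copy, leave the input intact
    h, w = len(out), len(out[0])
    n_flip = int(h * w * noise_pct / 100.0)  # number of pixels to corrupt
    for p in rng.sample(range(h * w), n_flip):
        i, j = divmod(p, w)
        out[i][j] = 1 - out[i][j]
    return out
```

Feeding such corrupted images to a trained classifier at 5%, 10%, ..., 25% noise reproduces the kind of degradation curve reported above.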
Future work will investigate the performance of the system using a number of font styles as well as handwritten characters. Another interesting application for this type of system is visual inspection, in particular the detection of blemishes in industrial workpieces, where the only discriminating feature between the two classes is the presence (or absence) of the defect.
Acknowledgments The authors wish to acknowledge the British Council and the Greek General Secretariat for Research & Development for providing financial support for this research.
References
[1] M.W. Roberts, M. Koch and D.R. Brown, 'A multilayered neural network to determine the orientation of an object', Proc. Int. Joint Conf. Neural Networks, Vol. 2, pp. 421-424, 1990.
[2] K. Fukushima, 'A hierarchical neural network model for associative memory', Biol. Cybern., Vol. 50, pp. 105-113, 1984.
[3] S.E. Troxel, S.K. Rogers and M. Kabrisky, 'The use of neural networks in PSRI target recognition', Proc. IEEE Int. Conf. Neural Networks, Vol. 1, pp. 569-576, 1988.
[4] E. Barnard and D. Casasent, 'Invariance and neural nets', IEEE Trans. Neural Networks, Vol. 2, No. 5, pp. 498-508, 1991.
[5] N. Papamarkos, I.M. Spiliotis and A. Zoumadakis, 'Character recognition by signature approximation', Int. Jour. Patt. Rec. Art. Intell., Vol. 8, No. 5, pp. 1171-1187, 1994.
[6] T. Maxwell, C.L. Giles, Y.C. Lee and H.H. Chen, 'Nonlinear dynamics of artificial neural systems', in Neural Networks for Computing, AIP Conf. 151, UT, pp. 299-304, 1986.
[7] C.L. Giles and T. Maxwell, 'Learning, invariance, and generalisation in higher-order neural networks', Applied Optics, Vol. 26, No. 23, pp. 4972-4978, 1987.
[8] M.B. Reid, L. Spirkovska and E. Ochoa, 'Simultaneous position, scale and rotation invariant pattern classification using third-order neural networks', Neural Networks, Vol. 1, No. 3, pp. 154-159, 1989.
[9] P. Liatsis, P.E. Wellstead, M.B. Zarrop and T. Prendergast, 'A versatile visual inspection tool for the manufacturing process', Proc. CCA'94, Vol. 3, pp. 1505-1510, 1994.
[10] P. Liatsis and Y.J.P. Goulermas, 'Minimal optimal topologies for invariant higher-order neural architectures using genetic algorithms', Proc. ISIE'95, Vol. 2, pp. 792-797, 1995.
[11] J. Sullins, 'Value cell encoding strategies', Tech. Rep. 165, CS Dept., Rochester Univ., New York, August 1985.
[12] I.M. Spiliotis and B.G. Mertzios, 'Real-time computation of two-dimensional moments on binary images using image block representation', accepted for publication in IEEE Trans. Image Process.
[13] I.M. Spiliotis and B.G. Mertzios, 'Fast algorithms for basic processing and analysis operations on block represented binary images', submitted to Patt. Rec. Letters.
[14] B.G. Mertzios, I.M. Spiliotis and N. Papamarkos, 'Image block representation and its applications to manufacturing and automation', accepted in 5th Int. Work. Time-Varying Image Process. and Moving Object Recognition, Florence, Italy, September 5-6, 1996.
[15] M.L. Minsky and S. Papert, Perceptrons, Cambridge, MA: MIT Press, 1969.
[16] H. White, Artificial Neural Networks: approximation and learning theory, Oxford: Blackwell, 1992.
[17] D.A. Baylor, T.D. Lamb and K.-W. Yau, 'Responses of retinal rods to single photons', J. Physiol., No. 288, pp. 613-634, 1979.
[18] S.J. Farlow (ed.), Self-organising methods in modeling: GMDH algorithms, New York: Marcel Dekker Inc., 1984.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
IMAGE SEGMENTATION BASED ON BOUNDARY CONSTRAINT NEURAL NETWORK F.KURUGOLLU, S. BIRECIK, M. SEZGIN, B. SANKUR TUBITAK MARMARA RESEARCH CENTER, INFORMATION TECHNOLOGIES INSTITUTE, P.O. BOX 21 41470 GEBZE KOCAELI-TURKEY E-MAIL:
[email protected]
ABSTRACT
Recently, artificial neural network based image segmentation methods have gained more acceptance over other methods due to their distributed architectures, which allow real-time implementation. Another important advantage of neural networks is the robustness of their structures to unexpected behavior of the input image, such as noise. On the other hand, their disadvantages are that the learning phase can be too long and that the resulting segmentation has noisy boundaries. In this study, a neural network based image segmentation method called the Constraint Satisfaction Neural Network has been investigated, and a modification of the Constraint Satisfaction Neural Network has been proposed to alleviate both problems. It has been observed that when the edge field is brought in as a constraint, the convergence improves and the boundary noise is reduced.
1. INTRODUCTION: THE SEGMENTATION PROBLEM
Image segmentation, an important step in image analysis, aims to divide an image into uniform and homogeneous segments, hopefully reflecting the semantic content. Image segmentation methods can be divided into three main categories: region based methods, edge based methods and pixel classification based methods [1]. Despite the plethora of segmentation algorithms in the literature, the quest for new innovative methods continues, mainly in order to:
- reduce the computational complexity for real-time applications,
- achieve robustness in handling as large a variety of scenes as possible,
- match algorithmic results to semantic content.
In this respect, it is believed that neural networks, judiciously used with image information constraints, are a promising approach, not only for computational speed but also for robust results. Recently, a number of neural network based schemes have been advanced for image segmentation purposes. Some of these algorithms use measurement space information, while the others use spatial information. The former usually use the histogram of the image and try to determine its peaks. The latter use spatial information such as gray value, local average, local variance, etc. The main advantages of neural networks are that they have a distributed architecture, they can be implemented in hardware to meet real-time demands [2], and they can handle the nonlinear relationships between measurement space and spatial information.
2. PREVIOUS WORK: CONSTRAINT SATISFACTION NEURAL NETWORK
One such neural network based image segmentation method is the Constraint Satisfaction Neural Network (CSNN) proposed by Lin et al. [3]. In this method, image segmentation is cast as a constraint satisfaction problem. The principle of the method is to assign segment labels to the pixels under certain spatial constraints. The method uses a network topology as shown in Figure 1 and constraints defined between a pixel and its neighbors to accomplish the segmentation. The network topology (Figure 1) consists of m layers, one for each of the segments. There are nxn (the image dimension) neurons in each layer, each neuron representing an image pixel. Neurons with the same index in each of the layers hold the probability that the corresponding pixel belongs to the segment represented by the layer index. Connections between a pixel and its neighbors are depicted in Figure 2. In this example, 8-neighborhood connectivity is chosen; the neighborhood connectivity order may vary depending on the application. The weights of these connections represent the constraints in this topology. After an initialization, the CSNN converges, through a parallel and iterative process, to a segmentation result which satisfies all constraints. Whenever the CSNN converges to a result, the neuron in the correct layer approaches 1, while the neurons in the same column in the other layers reduce to 0. The layer label which approaches 1 is assigned to the corresponding pixel. The gray value distribution of the input image is used as the initial condition: the gray values are classified into a number of segment categories by means of a Kohonen self-organizing neural network, and these categories constitute the initial labels (i.e., probabilities of belonging to a segment). The network weights are adapted in a heuristic manner.
These weights are determined so that a neuron excites those neurons that represent the same label, and inhibits those that represent significantly different labels.
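The relaxation just described, in which each pixel's label probabilities are reinforced by compatible neighbour labels and then renormalised, can be sketched on a 1-D strip of pixels. The two-label weight matrix and the simple multiplicative update are illustrative assumptions standing in for the paper's heuristic weights:

```python
# Illustrative sketch of one CSNN-style relaxation step on a 1-D strip.
def csnn_step(P, W):
    """P: list of per-pixel label probability vectors.
    W[l][l2]: constraint weight between label l at a pixel and
    label l2 at a neighbour (positive = excite, negative = inhibit)."""
    m = len(P[0])
    new_P = []
    for i, p in enumerate(P):
        support = [0.0] * m
        for j in (i - 1, i + 1):          # 2-neighbourhood on the strip
            if 0 <= j < len(P):
                for l in range(m):
                    for l2 in range(m):
                        support[l] += W[l][l2] * P[j][l2]
        # multiplicative update followed by renormalisation
        upd = [p[l] * max(support[l], 0.0) for l in range(m)]
        s = sum(upd)
        new_P.append([u / s for u in upd] if s > 0 else p)
    return new_P
```

With weights that excite the same label and inhibit the other (e.g. W = [[1, -0.5], [-0.5, 1]]), an ambiguous pixel between two confident neighbours is pulled toward their label, which is the qualitative behaviour the CSNN relies on.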
The method used in the determination of the weights and the mathematical construction of the CSNN can be found in [3]. In summary, the major advantage of the CSNN is that segmentation is performed by using image information constraints in a parallel manner.
Figure 1. The topology of the CSNN. Each layer represents a segment; the (i,j)-th neuron in each layer holds the probability that the (i,j)-th pixel belongs to the segment represented by that layer.
Figure 2. Connections between a neuron and its neighbors. The weights of these connections are interpreted as constraints. These weights are determined heuristically so as to excite the neurons with similar intensities and inhibit those with different intensities.
3. IMPROVEMENTS ON THE CSNN METHOD: BOUNDARY CONSTRAINT SATISFACTION NEURAL NETWORK
Convergence of the CSNN to a meaningful segmentation is time-consuming, and the error at convergence does not decrease beyond a certain value. This effect is shown in Figure 3: the convergence error is still of the order of 10 after 50 iterations for a 256 by 256 image. In fact, the algorithm lets the segments grow rapidly, but it hesitates in the assignment of pixels around the segment boundaries. This causes many futile iterations. Moreover, a neuron allocation problem arises when using images as large as 256 by 256: for a 256 by 256 image with 8 potential segments, 524288 (256x256x8) neurons are necessary. If the image size is reduced to meet the real-time constraints and to alleviate the capacity problem, undesired smoothing of the segments becomes apparent, because small segments are absorbed by strong ones, and the segment borders are also noisy in this case. To solve these problems, an algorithm called the Boundary-Constraint Satisfaction Neural Network (B_CSNN) is proposed in this study. Since boundary uncertainties were a major handicap, a coarse boundary map of the input image is used: the weights between the boundary pixels and their neighbors are set to 0, so that the contributions of the boundary pixels to their neighbors are precluded, as depicted in Figure 4. Consequently, the B_CSNN is observed to accomplish the segmentation process more consistently. The flow diagram of the B_CSNN algorithm is shown in Figure 6. While only image pixels are processed in the CSNN algorithm, in the B_CSNN algorithm the boundary constraints are also used in processing the image pixels. Therefore, the number of iterations and the convergence error of the B_CSNN are both reduced significantly (see Figure 3). The segmentation results for 256 by 256 images and 4 segments are compared in Figure 7.
One can notice that segmentation boundary noise resulting from the CSNN is removed by the B_CSNN algorithm.
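The boundary adjustment of the B_CSNN amounts to zeroing the constraint weights of connections that originate at edge-map pixels. A minimal sketch, with an assumed dictionary layout for the weights:

```python
# Sketch of the B_CSNN weight adjustment: connections coming from pixels
# marked in a (coarse) edge map get zero weight, so boundary pixels no
# longer contribute label support to their neighbours.
def mask_boundary_weights(weights, edge_map):
    """weights[(i, j)][(k, l)]: contribution of neighbour (k, l) to pixel (i, j).
    edge_map: set of (row, col) coordinates of boundary pixels."""
    masked = {}
    for pixel, contrib in weights.items():
        masked[pixel] = {nb: (0.0 if nb in edge_map else w)
                         for nb, w in contrib.items()}
    return masked
```

The relaxation then proceeds exactly as in the CSNN, but with the masked weights, so label support never crosses a detected boundary.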
Figure 3. Convergence error with respect to the number of iterations of both algorithms for a 256 by 256 image. At the 23rd iteration step, the convergence error of the B_CSNN is under 1, while that of the CSNN is still 10.
Figure 4. Adjustment of the connection weights using the edge map of the input image. The black dots indicate the edge pixels. The contributions of the boundary pixels to their neighbors are precluded by setting their connection weights to 0.
For 64 by 64 images the segmentation results of both algorithms are compared in Figure 8. Notice the segment absorption and the noisy segment border problems evident in the CSNN algorithm, while these problems are mostly fixed by the B_CSNN. The convergence error with respect to the number of iterations of both algorithms is compared in Figure 5. After the improvements brought in by the B_CSNN, we will investigate extending the CSNN to multiresolution or pyramidal decompositions, in order to include the spatial information content of the image.
Figure 5. Convergence error with respect to the number of iterations of both algorithms for a 64 by 64 image. At the 13th iteration step, the convergence error of the B_CSNN reaches 0, while that of the CSNN is still 2.
Figure 6. Flow diagram of the B_CSNN algorithm. Besides the original image, coarse boundary information is used to determine the segment boundaries.
REFERENCES:
1. R.M. Haralick and L.G. Shapiro, Computer and Robot Vision, Vol. 1, Addison-Wesley, 1992.
2. N.R. Pal and S.K. Pal, "A Review on Image Segmentation Techniques", Pattern Recognition, Vol. 26, No. 9, pp. 1277-1294, 1993.
3. W.C. Lin et al., "Constraint Satisfaction Neural Networks for Image Segmentation", Pattern Recognition, Vol. 25, No. 7, pp. 679-693, 1992.
A HIGH PERFORMANCE NEURAL MULTICLASSIFIER SYSTEM FOR GENERIC PATTERN RECOGNITION APPLICATIONS Dimitris A. Mitzias and Basil G. Mertzios Automatic Control Systems Laboratory Department of Electrical and Computer Engineering, Democritus University of Thrace 67 100 Xanthi, HELLAS e-mail: mitzias/
[email protected]
Abstract
A high performance NEural MUlticlassifier System (NEMUS) is proposed, which is characterized by a great degree of modularity and flexibility, and is very efficient for demanding and generic pattern recognition applications. The NEMUS is composed of two stages. The first stage is comprised of several classifiers that operate in parallel, while the second stage is a Decision-Making Network (DM-Net) that performs the final classification task, combining the outputs of all the classifiers of the first stage. In general, the inputs of each classifier are the features extracted by different Feature Extraction Methods and correspond to various levels of importance. The performance of the proposed NEMUS is demonstrated by a shape recognition task of 2-D digitized objects, considering various levels of shape distortions. Three different kinds of features, which characterize a digitized object, are used: (a). Geometric features, (b). 1-D scaled normalized central moments and (c). The angles of a fast polygon approximation method.
1. Introduction
Pattern recognition applications are usually executed in two basic stages. In the first stage, each pattern is described by a set of features, using a Feature Extraction Method (FEM), according to the requirements of each particular task. In the second stage, the pattern is recognized using a classification procedure, which requires a set of input data that is usually expressed in the form of an input feature vector. A significant number of classifiers is available in the literature, based on deterministic and statistical techniques (e.g. Euclidean distance, least mean square error, cross correlation, nearest neighbour rule and leave-one-out algorithm) [1],[2] or on distributed processing techniques, where pretrained systems serve as classifiers (e.g. neural networks) [3].
The selection of the appropriate FEM depends on the specific conditions and requirements, in order to achieve the highest classification efficiency. To this end, it is essential in demanding applications to use a combination of different FEMs. The underlying idea is that multiple FEMs contribute to the classification procedure different features of the same pattern, which correspond to considerably different levels of importance, carrying different and complementary information. Therefore, the actual contribution of each FEM cannot be explicitly determined. Thus, for multiclassifier applications, an answer should be given to the following questions: (a). How many and which FEMs should be used, (b). Which kind of classifier presents the best performance for each particular FEM, and (c). What is the contribution of each classifier in a multiclassification scheme. A neural multiclassifier system, which is characterized by a great degree of modularity and flexibility, and is very efficient for demanding and generic pattern recognition applications, is proposed. The NEMUS is composed of two stages. The first stage is comprised of several classifiers that operate in parallel, while the second stage is a decision-making network that performs the final classification task, combining the outputs of all the classifiers of the first stage, so that the whole discrimination efficiency is optimized. The NEMUS gives a satisfactory answer to the above questions through the automatic selection of the contribution of each classifier's output to the multiclassification procedure. In practice this means that a classifier is partially accepted or rejected in proportion to its ability to contribute to a particular task. Thus, prior knowledge concerning the type and the number of the classifiers that should be used in a particular task is not required.
The performance of the proposed NEMUS is demonstrated by a shape recognition task of 2-D digitized objects, considering various levels of shape distortions. Three different kinds of features, which characterize a digitized object, are used: geometric features, 1-D scaled normalized central moments and angles of a fast polygon approximation method.
2. The Neural Multiclassifier System (NEMUS)
The proposed NEMUS is composed of a number (S) of classifiers at the first stage and a DM-Net at the second stage, as shown in Fig. 1. The k-th (k = 1, 2, ..., S) classifier in the first stage operates with input the feature vector X_k, which is produced by the k-th FEM. The second stage (DM-Net) of the
NEMUS performs the fusion of the outputs O_k of all the classifiers, so that the discrimination efficiency is optimized. The DM-Net is a neural network and consists of simple Neural Elements (NEs), which operate in parallel [4],[5]. The elements of the final output vector Y = [y_1, y_2, ..., y_m] of the multiclassifier are given by:

y_j = f(a_j), \quad a_j = \sum_{k=1}^{S} W_{k,j}\, o_{k,j} + \theta_j \qquad (1)

where f(·) is the sigmoid function, W_{k,j} is the connection weight between the j-th output of the k-th classifier and the j-th NE of the DM-Net, and \theta_j is the internal threshold value of the j-th NE of the DM-Net.
Figure 1. The schematic diagram of the NEMUS.
The training of the DM-Net is defined as the calculation of the connection weights W_{k,j}, k = 1, 2, ..., S, j = 1, 2, ..., m, of the DM-Net with the outputs of the classifiers of the first stage. These weights represent a measure of the discrimination efficiency of each classifier's output. After the training phase of the classifiers has been completed, the DM-Net is trained using an adaptation algorithm according to the type of the selected neural elements of the DM-Net. The classification efficiency, as well as the contribution of each classifier, depend on their ability to discriminate under the various conditions resulting from pattern variations, which usually appear in each particular pattern recognition application. Thus, the training of the DM-Net is achieved by presenting to the classifiers a set of distorted patterns, called Training Patterns, which should represent a sufficient sample of the pattern variations among those that may appear in each particular task. Specifically, the j-th NE operates with the j-th output of each classifier as its inputs, and its weights W_{k,j}, \theta_j are automatically determined by the adaptation algorithm, using the set of training patterns. Also, the NEs are independent of each other and they are trained in different numbers of adaptation cycles. The DM-Net is trained until a measure of the total DM-Net output error becomes smaller than a specified value. This measure can simply be given by the Mean Square Error (MSE) as:

MSE = \frac{1}{T\,m} \sum_{t=1}^{T} \sum_{j=1}^{m} \left[ d_j(t) - y_j(t) \right]^2 \qquad (2)

In (2), the term [d_j(t) - y_j(t)] represents the classification error of the j-th output of the DM-Net when the t-th training pattern is presented to the NEMUS. The adaptation algorithms converge in a few adaptation cycles, and the computation time for each cycle is significantly low, due to the simple architecture of the DM-Net.
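A minimal sketch of the DM-Net fusion rule (1) and the error measure (2), assuming a standard sigmoid and a list-of-lists layout for classifier outputs:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dm_net_output(O, W, theta):
    """Eq. (1): fuse first-stage classifier outputs.
    O[k][j]: j-th output of the k-th classifier,
    W[k][j]: weight from that output to the j-th neural element,
    theta[j]: threshold of the j-th neural element."""
    S, m = len(O), len(theta)
    return [sigmoid(sum(W[k][j] * O[k][j] for k in range(S)) + theta[j])
            for j in range(m)]

def mse(D, Y):
    """Eq. (2): mean squared error over T training patterns and m outputs."""
    T, m = len(D), len(D[0])
    return sum((D[t][j] - Y[t][j]) ** 2
               for t in range(T) for j in range(m)) / (T * m)
```

Training the DM-Net then amounts to adjusting W and theta until `mse` over the training patterns falls below the specified value.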
3. Experimental Results
In this Section, we present a pattern recognition application in order to demonstrate the efficiency of the proposed multiclassification technique. The NEMUS is employed to discriminate between ten 2-D digitized objects, considering various levels of shape distortions. It is assumed that the shape variations are those resulting from three quantized artificial types of bounded distortions, which produce a finite number (10560) of shape variations. Three typical 1-D FEMs are used, which are implemented on the region boundaries and are invariant to the position, size and orientation of the shape. They also provide
different types of features corresponding to various levels of importance, and they are sensitive to different levels of shape distortion. (a). Geometric features. The geometric features of a shape are extensively used in pattern classification applications [6],[7]. These features are useful in applications where the patterns are not very complicated and the desired discrimination efficiency is not very high. Three geometric features are used to produce a feature vector: the Normalized Inverse Compactness \bar{C}, the Normalized Area \bar{A} and the Normalized Length \bar{L} of the shape, which are determined as:

\bar{C} = \frac{4\pi A}{P^2}, \quad \bar{A} = \frac{A}{\pi d_m^2}, \quad \bar{L} = \frac{\bar{g}}{d_m} \qquad (3)
where P is the length of the shape's perimeter, A is the area of the shape, d_m is the maximum distance of the shape and \bar{g} is the mean of the distances from the boundary to the centroid of the shape. (b). The Scaled Normalized Central (SNC) set of moments [8],[9]. Statistical moment-based methods are used in a large number of image analysis and pattern recognition applications. The features resulting from a moment-based method usually provide a good description of a shape and therefore have good classification properties. In this application, a special case of the 1-D SNC moments is used. The considered 1-D SNC moment of order k is defined as follows:

\bar{h}_k = \frac{h_k}{c_k\, m_0^{\beta}} \qquad (4)
where h_k are the 1-D central moments, m_0 is the zero-order geometric moment, c_k is the scaling factor corresponding to the h_k moment, which is used in order to avoid the exponential increase of the high-order moments, and \beta is the normalization factor. (c). The Step by Step Polygon Approximation technique (SSPA) [10]. Polygon approximation techniques are often used in shape analysis and data reduction applications. The SSPA technique gives a satisfactory solution to the problem of the direct selection of a fixed number of vertices of the polygon which approximates the contour of a shape, especially in cases where time is a critical factor. Thus, a region boundary may be approximated by a polygon with a prespecified number of vertices. The angles of the extracted polygon are used as discrimination features. In the considered application, a suitable version of the NEMUS is used as a classifier, having three Neural Networks (NNs) in the first stage, while three simple neural elements form the DM-Net. The following three feature vectors are used as inputs of the NNs of the first stage of the NEMUS:

G = [\bar{C}, \bar{A}, \bar{L}], \quad H = [\bar{h}_2, \bar{h}_3, \bar{h}_4], \quad A = [a_1, a_2, ..., a_7] \qquad (5)

where \bar{C}, \bar{A} and \bar{L} are the three geometric features, \bar{h}_k are the 1-D SNC moments and a_k are the normalized internal angles of the polygon approximating the contour of a shape. The NNs of the first stage are selected to be three-layer perceptrons, which are trained using the back-propagation algorithm to discriminate among the ten prototypes. The number of inputs of each NN equals the dimension of the feature vectors G, H and A respectively, while the number of outputs equals the number (10) of the original prototypes. Table 2 demonstrates the classification efficiency of the three independent classifiers (corresponding to the three different FEMs: geometric features, 1-D SNC moments and polygon angles), using a sample of 1000 randomly selected patterns.
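As an illustration of the first FEM, the three geometric features of (3) can be computed from a sampled boundary as follows. The function signature is an assumption for illustration; the paper does not give an implementation:

```python
import math

def geometric_features(area, perimeter, boundary, centroid):
    """Compute the three normalised geometric features of eq. (3):
    inverse compactness, area and length. `boundary` is a list of (x, y)
    boundary points and `centroid` is the shape centroid."""
    dists = [math.hypot(x - centroid[0], y - centroid[1]) for x, y in boundary]
    d_max = max(dists)                 # maximum distance d_m of the shape
    g_mean = sum(dists) / len(dists)   # mean boundary-to-centroid distance
    C = 4.0 * math.pi * area / perimeter ** 2  # normalised inverse compactness
    A = area / (math.pi * d_max ** 2)          # normalised area
    L = g_mean / d_max                         # normalised length
    return C, A, L
```

For a unit circle (area pi, perimeter 2*pi) all three features evaluate to 1, the most compact case, which is a convenient sanity check.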
After the training phase of the NNs has been completed, the DM-Net is trained with a set of training patterns using the back-propagation algorithm. A sample of 500 training patterns, randomly selected from the set of the 10560 possible different versions produced by the ten prototypes, is used, and the adaptation algorithm is applied for only 1000 iterations. For comparison purposes, the classification procedure is demonstrated using three different operations for the DM-Net. (a). Each NN contributes to the multiclassification procedure with the same level of importance, i.e. the DM-Net performs only a linear summation of the outputs of the NNs, (b). Each classifier contributes to the multiclassification procedure in proportion to its ability to discriminate among the set of training patterns; this is achieved by determining statistically the contribution of each NN, and (c). The DM-Net is a Neural Network itself and determines the contribution of each single output of the NNs of the first stage by performing a fusion of the classifiers' outputs, using the set of training patterns. Table 3 demonstrates the classification efficiency of the NEMUS (considered with the three versions of the DM-Net), for various combinations of the independent FEMs, in terms of the percentage of correctly classified patterns and of the classification error, using the sample of the 1000 randomly selected patterns.
TABLE 2. The classification efficiency of the independent FEMs

Independent FEMs        % correctly classified   Classification Error (MSE)
Geometric Features G            55.4                    0.0816
1-D SNC Moments H               84.1                    0.0302
Polygon Angles A                75.6                    0.0550

TABLE 3. The classification efficiency of NEMUS using several combinations of the three FEMs

Combined     % correctly classified patterns     Classification Error (MSE)
FEMs           (a)      (b)      (c)              (a)       (b)       (c)
G, H           81.3     93.8     96.0             0.0372    0.0234    0.0176
G, A           77.6     86.2     88.7             0.0413    0.0337    0.0273
H, A           93.6     93.0     97.4             0.0248    0.0369    0.0129
G, H, A        95.5     97.1     98.3             0.0274    0.0192    0.0098

4. Conclusions
A NEural MUlticlassifier System (NEMUS) is proposed for high performance classification applications. The proposed multiclassifier is composed of two stages. The first stage is comprised of several classifiers that operate in parallel, while the second stage is a Decision-Making Network (DM-Net) that performs the discrimination task using a fusion of the outputs of all the classifiers of the first stage, so that the discrimination efficiency is optimized. The NEMUS is applied to a shape recognition task of 2-D digitized objects, under various levels of shape distortions. Three different kinds of features, which characterise a digitized object, are used: geometric features, 1-D scaled normalized central moments and the angles resulting from a fast polygon approximation method. NEMUS is suitable for generic classification applications, such as shape discrimination, signal detection and remote sensing, and has the following advantages and characteristics:
* Different types of classifiers may be used simultaneously. Thus, for each independent FEM, the most appropriate type of classifier may be used, in order to achieve its highest discrimination efficiency.
* The contribution of each FEM is stored as the synaptic weights and biases of the DM-Net, which are automatically determined using a set of training patterns. Thus, prior knowledge concerning the type and the number of the classifiers that should be used in a particular application is not required.
5. References
[1] K.S. Fu, Syntactic Pattern Recognition and Applications, Englewood Cliffs, NJ: Prentice-Hall, 1982.
[2] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, London, U.K.: Prentice-Hall, 1982.
[3] R.J. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, New York: John Wiley and Sons, 1992.
[4] R.P. Lippmann, "An introduction to computing with neural nets," IEEE Acoustics, Speech and Signal Processing Magazine, pp. 4-22, April 1987.
[5] D.E. Rumelhart and J.L. McClelland (Eds.), Parallel Distributed Processing, Cambridge, MA: MIT Press, 1986.
[6] L. Shen, R.M. Rangayyan and J.E. Leo Desautels, "Application of shape analysis to mammographic calcifications," IEEE Trans. on Medical Imaging, Vol. 13, No. 2, pp. 263-274, 1994.
[7] R.C. Gonzalez and P. Wintz, Digital Image Processing, Second ed., Reading, MA: Addison-Wesley, 1987.
[8] B.G. Mertzios, "Shape discrimination in robotic vision using scaled normalized central moments," Proceedings of the IFAC Workshop on Mutual Impact of Computing Power and Control Theory, pp. 281-287, Prague, Czechoslovakia, September 1-2, 1992.
[9] B.G. Mertzios and D.A. Mitzias, "Fast shape discrimination with a system of neural networks based on scaled normalized central moments," Proceedings of the International Conference on Image Processing: Theory and Applications, pp. 219-223, San Remo, Italy, June 10-12, 1993.
[10] D.A. Mitzias and B.G. Mertzios, "Shape recognition with a neural classifier based on a fast polygon approximation technique," Pattern Recognition, Vol. 27, No. 5, pp. 627-636, 1994.
Application of a Neural Network for multifont Farsi character recognition using fuzzified Pseudo-Zernike moments
Mehdi Namazi, M.S. Student, and Karim Faez, Associate Professor, E.E. Dept., Amirkabir University of Technology, Tehran, Iran, 15914. Email:
[email protected]
Abstract
In this paper an algorithm is developed for the recognition of printed Farsi characters with various fonts, irrespective of size, rotation and stroke. The system operates on an image of characters that can be obtained from a standard digital scanner (e.g. 300 dpi). The approach consists of four main stages: pre-processing, feature extraction, fuzzification and classification. In the pre-processing stage, the image is first binarized and then passed through a noise-reducing filter [1]. The image is finally thinned to compensate for any difference in stroke width (e.g. bold characters). In the feature extraction stage, the selected features are derived from a set of moments known as Pseudo-Zernike Moments (PZM). These moments offer several advantages over the regular moments that have been used in pattern recognition problems. In the third stage, the moments are fuzzified to a set of objects with a continuum of grades of membership. In the last step, input characters are classified by a feedforward neural network using the backpropagation learning method. In this paper, we have also compared the classification results using fuzzified moments and non-fuzzy moments as network input. The learning patterns are printed characters in one group of fonts, and the test patterns are another printed character group with some fonts different from the learning group. This comparison shows that neural networks with fuzzy inputs (FNN) have better performance with unclear inputs.
Key words
Multifont character recognition, Farsi/Arabic characters, Fuzzified Pseudo-Zernike Moments (PZM), Fuzzy Neural Network.
I - Introduction
Dealing with uncertainty is a common pattern recognition problem, and fuzzy set theory has proved to be significantly important in pattern recognition problems [6]. Feedforward neural networks are usually trained with examples to learn the rules and functions of a real-world system through the error backpropagation algorithm.
Fuzzy neural networks (FNN) combine the learning and knowledge-representation capabilities of neural networks and fuzzy sets. When the input pattern is clear enough to classify, an NN gives us a reliable answer. From the point of view of uncertainty, when the input pattern is not clear enough, NN results are unreliable, but an FNN can easily classify these patterns and can learn the difference between similar patterns. The network learns the patterns using the backpropagation learning method. Note that we could use the feature moments directly as network inputs, but using fuzzified moments as network inputs has two advantages: a decrease in network learning convergence time and a decrease in system error. The structure of this paper is as follows: Section II discusses the properties of Farsi characters and also introduces the selected character classes. Section III discusses Pseudo-Zernike Moments (PZM). In Section IV the assigned fuzzy sets are introduced. Section V presents the experimental results. The conclusion is presented in Section VI.
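The fuzzification stage maps each crisp moment value to a continuum of membership grades. The paper does not specify its membership functions, so the sketch below uses hypothetical overlapping triangular fuzzy sets ("low", "medium", "high"); the function names and set boundaries are illustrative choices of ours, not the authors'.

```python
def triangular_membership(x, a, b, c):
    """Grade of membership of x in a triangular fuzzy set (a, b, c):
    zero outside [a, c], rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(moment, fuzzy_sets):
    """Replace one crisp (normalized) moment by its vector of grades."""
    return [triangular_membership(moment, a, b, c) for (a, b, c) in fuzzy_sets]

# Three overlapping sets over a normalized moment range [0, 1] (our choice)
SETS = [(0.0, 0.25, 0.5), (0.25, 0.5, 0.75), (0.5, 0.75, 1.0)]
grades = fuzzify(0.5, SETS)   # one membership grade per fuzzy set
```

A moment value near a set's peak thus contributes a grade near 1 to that set and smaller grades to its neighbours, which is the "continuum of grades of membership" fed to the network.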
II - Farsi characters
In the Farsi language, there are 32 main characters. Depending on the position of each character in a word, it may have 2 to 4 different shapes (fig. 1). We have only considered the main form of each character. Dots play an important role in Farsi characters. For example, as seen in fig. 2, there are three different characters, but
graphically they differ only in the number of dots below and above the character. To simplify, we neglect these dots and consider characters in their main form without dots, so the number of classes reduces to 18 (see fig. 3).
fig.2 Various Farsi characters having different dots
fig. 1 Typical different forms of a Farsi character depending on its position in a word
fig. 3 Simplified character classes
III - Pseudo-Zernike Moments
Zernike polynomials were first introduced in 1934 [2], and were later derived from the requirements of orthogonality and invariance properties. Zernike polynomials, being invariant with respect to rotations of axes about the origin, are polynomials in the x and y variables. A related orthogonal set of polynomials in x, y, and r, derived in [3], has properties analogous to those of Zernike polynomials and is called the pseudo-Zernike polynomials; it differs from the Zernike set in that the real-valued radial polynomials are defined as:
R_{nl}(r) = \sum_{s=0}^{n-|l|} (-1)^s \frac{(2n+1-s)!}{s!\,(n-|l|-s)!\,(n+|l|+1-s)!}\, r^{n-s}
Here n = 0, 1, 2, ..., ∞ and l takes on positive and negative integer values subject to |l| ≤ n.

\langle \phi_{J,k}(t), \phi_{J,n}(t) \rangle = \delta_{k-n}   (6)
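The pseudo-Zernike radial polynomial R_{n,l}(r) defined above can be evaluated directly from its factorial form; a minimal sketch (the function name is ours):

```python
from math import factorial

def pzm_radial(n, l, r):
    """Pseudo-Zernike radial polynomial R_{n,l}(r), with |l| <= n and 0 <= r <= 1."""
    l = abs(l)
    total = 0.0
    for s in range(n - l + 1):
        num = (-1) ** s * factorial(2 * n + 1 - s)
        den = factorial(s) * factorial(n - l - s) * factorial(n + l + 1 - s)
        total += (num / den) * r ** (n - s)
    return total
```

For instance, the formula gives R_{1,1}(r) = r and R_{1,0}(r) = 3r - 2, matching the closed forms obtained by expanding the sum by hand.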
Figure 2: The m-th transmitter model for the wavelet-based MC/BPSK CDMA system.

For any j and m,
\langle \psi_{j,k}(t), \psi_{m,n}(t) \rangle = \delta_{j-m}\,\delta_{k-n}.   (7)
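The orthonormality relations (6) and (7) can be checked numerically in the simplest discrete setting. The sketch below builds unit-norm Haar scaling and wavelet sequences at a single scale (our stand-in; the text notes that smoother wavelets are preferred in practice) and verifies that their inner products behave like Kronecker deltas.

```python
SQRT2 = 2 ** 0.5

def haar_phi(k, length):
    """Discrete unit-norm Haar scaling sequence supported on samples 2k, 2k+1."""
    v = [0.0] * length
    v[2 * k] = v[2 * k + 1] = 1 / SQRT2
    return v

def haar_psi(k, length):
    """Discrete unit-norm Haar wavelet sequence on the same support."""
    v = [0.0] * length
    v[2 * k] = 1 / SQRT2
    v[2 * k + 1] = -1 / SQRT2
    return v

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Shifted copies are orthogonal to each other, and each scaling sequence is orthogonal to every wavelet sequence, mirroring the self- and cross-orthogonality used below.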
These relations are the basis of wavelet transform applications in communications. There exist many families of wavelets and scaling functions. In communications applications, it is usually required that the wavelet be smoother than the simplest Haar wavelet and provide better temporal as well as spectral localization.

4.2 Wavelet-Based MC-CDMA

By using the self- and cross-orthogonality of the scaling functions φ(t) and the wavelet functions ψ(t), we now propose novel wavelet-based MC-CDMA systems. In our wavelet-based MC-CDMA systems, there exist three levels of orthogonality: the subcarrier frequencies are orthogonal to each other, the wavelets and scaling functions are orthogonal to each other, and the spreading sequences are also orthogonal to each other. The wavelet-based MC/BPSK CDMA signal for the m-th transmitter can be described as follows:
s_m(t) = \sum_{i=0}^{N-1} \sum_k \left\{ \frac{c_m[i]\,a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right\} \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right),   (8)
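A sampled version of the transmitted signal in (8) can be sketched directly; we substitute the Haar scaling and wavelet pulses for φ and ψ and pick illustrative parameter values (the function name, sampling approach, and parameters are ours, not the paper's):

```python
import math

def mc_bpsk_cdma_samples(c, a, b, Tb, F, fc, fs):
    """Samples of s_m(t) in eq. (8): spread BPSK on phi/psi over N subcarriers.

    c: spreading chips (one per subcarrier); a, b: data symbols per bit interval.
    phi(u) = 1 on [0, 1); psi(u) = +1 on [0, 1/2), -1 on [1/2, 1) (Haar pulses).
    """
    N = len(c)
    num_samples = int(len(a) * Tb * fs)
    s = []
    for idx in range(num_samples):
        t = idx / fs
        k = int(t // Tb)                  # current bit interval
        u = (t - k * Tb) / Tb             # position inside it
        phi = 1.0
        psi = 1.0 if u < 0.5 else -1.0
        val = 0.0
        for i in range(N):
            carrier = math.cos(2 * math.pi * fc * t + 2 * math.pi * i * F * t / Tb)
            val += (c[i] * a[k] * phi + c[i] * b[k] * psi) / math.sqrt(Tb) * carrier
        s.append(val)
    return s
```

With a single subcarrier, unit chip, a = [1] and b = [0], the baseband waveform reduces to the constant scaling pulse, which is a quick sanity check on the indexing.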
where T_b is a power of 2, and a_m[k] and b_m[k] are two independent data symbols at the k-th bit interval. Shown in Fig. 2 is a model of the wavelet-based MC/BPSK-CDMA transmitter for the m-th user. At the receiver, assuming there are M active users and the channel is noiseless, the received signal is

r(t) = \sum_{m=0}^{M-1} \sum_{i=0}^{N-1} \sum_k \left\{ \frac{c_m[i]\,a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right\} \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right).   (9)

Assume that m = 0 corresponds to the desired signal. In the 0-th receiver, there are N passband filters, with the i-th one corresponding to the frequency f_c + iF/T_b, so the received signal r(t) is first converted back to a baseband signal in each i-th branch of the receiver:
r_i(t) = \sum_{m=0}^{M-1} \left\{ \frac{c_m[i]\,a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right\}.   (10)

Now the signal r_i(t) is filtered separately by two matched filters with the impulse responses T_b^{-1/2}\,\psi\!\left(\frac{JT_b - t}{T_b}\right) and T_b^{-1/2}\,\phi\!\left(\frac{JT_b - t}{T_b}\right), respectively, where T = JT_b is the duration of \phi(t/T_b) and \psi(t/T_b), and the filter outputs are sampled at t = nT_b, which results in the following variables:
y_i(nT_b) = \left[ r_i(t) * T_b^{-1/2}\,\psi\!\left(\frac{JT_b - t}{T_b}\right) \right]_{t=nT_b} = \sum_{m=0}^{M-1} c_m[i]\, a_m[n-J],   (11)
and

z_i(nT_b) = \left[ r_i(t) * T_b^{-1/2}\,\phi\!\left(\frac{JT_b - t}{T_b}\right) \right]_{t=nT_b} = \sum_{m=0}^{M-1} c_m[i]\, b_m[n-J].   (12)
Then y_i(nT_b) is multiplied by c_0[i], and taking the summation over i gives

u(n) = \sum_{i=0}^{N-1} c_0[i]\, y_i(nT_b) = a_0[n-J].   (13)
Similarly, we have

v(n) = \sum_{i=0}^{N-1} c_0[i]\, z_i(nT_b) = b_0[n-J].   (14)
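The recovery in (11)-(14) relies on orthonormal spreading sequences: correlating the branch outputs with c_0[i] cancels every other user. A toy numerical check (we pick normalized Walsh-Hadamard rows as the codes; the paper does not fix a particular code family at this point):

```python
def walsh_hadamard(n):
    """Rows of the 2^n x 2^n Hadamard matrix, scaled to unit energy."""
    H = [[1.0]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    scale = 1.0 / (2 ** (n / 2))
    return [[scale * x for x in row] for row in H]

codes = walsh_hadamard(3)                    # 8 orthonormal spreading codes
a = [+1, -1, +1, +1, -1, +1, -1, -1]         # one BPSK symbol per user
# Branch outputs as in eq. (11): superposition of all users on subcarrier i
y = [sum(codes[m][i] * a[m] for m in range(8)) for i in range(8)]
# Despreading as in eq. (13): correlate with user 0's code
u = sum(codes[0][i] * y[i] for i in range(8))   # recovers a[0]
```

Since the code rows satisfy sum_i c_0[i] c_m[i] = delta_m, the double sum collapses to the desired user's symbol, exactly as in (13).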
Therefore, we recover the data symbols a_0[n-J] and b_0[n-J] for n = 0, ±1, ±2, .... Now we generalize the wavelet-based MC/BPSK-CDMA system to the following wavelet-based MC/QPSK-CDMA system. The transmitted signal of the m-th user in the wavelet-based MC/QPSK-CDMA system is:
s_m(t) = \sum_{i=0}^{N-1} \sum_k \left\{ \left[ \frac{c_m[i]\,a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right] \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right) + \left[ \frac{c_m[i]\,a'_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b'_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right] \sin\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right) \right\},   (15)
where the sequences {a_m[k]}, {b_m[k]}, {a'_m[k]} and {b'_m[k]} are four independent data symbol sequences, usually taking antipodal values. At the receiver, the in-phase and quadrature signals are first separated by the orthogonality of \cos(2\pi f_c t + 2\pi i \frac{F}{T_b} t) and \sin(2\pi f_c t + 2\pi i \frac{F}{T_b} t) for i = 0, 1, ..., N-1; the separated in-phase and quadrature signals s_{I,i}(t) and s_{Q,i}(t) can then be demodulated by separate matched filters with impulse responses T_b^{-1/2}\,\phi\!\left(\frac{JT_b - t}{T_b}\right) and T_b^{-1/2}\,\psi\!\left(\frac{JT_b - t}{T_b}\right), respectively, followed by sampling and hard-decision devices. Assuming T_b = 2^J, it can easily be seen that the above wavelet-based MC-CDMA systems use only a single wavelet frequency band, corresponding to the scale j in \phi_{j,k}(t) and \psi_{j,k}(t) for k = 0, ±1, ±2, .... So in each branch i, we can form 'near-baseband' signals by summing several single-frequency-band wavelet-modulated signals, and we use the resulting 'near-baseband' signals to replace the corresponding baseband signals in the wavelet-based MC/QPSK-CDMA system. We thus obtain the following fractal MC-CDMA system:
s_m(t) = \sum_{i=0}^{N-1} \sum_{j \in U} \sum_k \left\{ \left[ c_m[i]\,a_{m,j}[k]\, \phi_{j,k}(t) + c_m[i]\,b_{m,j}[k]\, \psi_{j,k}(t) \right] \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right) + \left[ c_m[i]\,a'_{m,j}[k]\, \phi_{j,k}(t) + c_m[i]\,b'_{m,j}[k]\, \psi_{j,k}(t) \right] \sin\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right) \right\},   (16)

where {a_{m,j}[k]}, {b_{m,j}[k]}, {a'_{m,j}[k]} and {b'_{m,j}[k]} are four independent data symbol sequences for the j-th band. U is a subset of integers, such as U = {1-M, 2-M, ..., 0}, and it can be chosen according to the channel characteristics.
Table 1: Variation of bandwidth efficiency with different wavelet waveforms.

BE          Waveform
0.57n       Full-width rectangular pulse
0.65n       Daubechies wavelet (order 4)
0.71n       Daubechies wavelet (order 6)
1.43n       Daubechies wavelet (order 8)
1.48n       Daubechies wavelet (order 10)
1.74n       Battle-Lemarié wavelet

4.3 Performance Analysis
In this section, we discuss the advantages and performance of our wavelet-based MC-CDMA systems compared with the conventional MC-CDMA used in wireless communication systems. As in the case of the conventional MC-CDMA system [15], the wavelet-based MC-CDMA systems address the issue of how to spread the signal bandwidth without increasing the adverse effect of the delay spread. A wavelet-based MC-CDMA or a fractal MC-CDMA signal is composed of N narrowband subcarrier signals, each of which has a symbol duration much larger than the delay spread T_d, so it does not experience the increased susceptibility to delay spread and ISI that the DS-CDMA system does. Since the parameter F can be chosen to determine the spacing between subcarrier frequencies, a smaller spreading factor N than that required by DS-CDMA can be used, so that not all of the subcarriers are located in a deep fade in frequency. Thus, frequency diversity is achieved. In addition, the mother wavelet function and the set of wavelet frequency bands U can be chosen according to the characteristics of the channel. Thus, two new dimensions for improving the system performance are obtained. If the effects of the channel are included in \rho_{m,i} and \theta_{m,i}, and n(t) is AWGN, the received signal for the wavelet-based MC/QPSK-CDMA can be represented as follows:

r(t) = \sum_{m=0}^{M-1} \sum_{i=0}^{N-1} \rho_{m,i} \sum_k \left\{ \left[ \frac{c_m[i]\,a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right] \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t + \theta_{m,i}\right) + \left[ \frac{c_m[i]\,a'_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t-kT_b}{T_b}\right) + \frac{c_m[i]\,b'_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t-kT_b}{T_b}\right) \right] \sin\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t + \theta_{m,i}\right) \right\} + n(t).   (17)
Then, by comparing the wavelet-based MC-CDMA demodulation processes with the conventional MC-CDMA demodulation processes [15], it can be shown that the wavelet-based MC-CDMA system and the conventional MC-CDMA system possess the same BER under the above channel condition. For other fading channels, however, a suitable choice of the wavelets provides another way to combat the distortion of the transmitted signals and improve the system performance. Under the assumption of an AWGN channel, the BERs of the wavelet-based MC/BPSK-CDMA and the fractal MC-CDMA systems can also be shown to be equal to the BERs of the corresponding conventional BPSK and QPSK systems, respectively. The bandwidth efficiency (BE) of a modulation system is defined as
BE = \frac{\text{Total bit rate}}{\text{Bandwidth}} \quad (\text{bits/sec/Hz}).   (18)
Assuming a 99% power bandwidth, based on the results in [18], the BE variation with different wavelet waveforms is shown in Table 1. Here, n = 1 and n = 2 correspond to BPSK and QPSK, respectively. Consequently, for the wavelet-based MC-CDMA systems, significantly higher bandwidth efficiencies can be obtained, compared with the conventional MC-CDMA system, by the introduction of compactly supported orthogonal wavelets.
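Equation (18) and Table 1 combine into a one-line calculation: for example, QPSK (n = 2) with the Battle-Lemarié wavelet gives BE = 2 x 1.74 = 3.48 bits/sec/Hz. A trivial sketch (function and variable names are ours):

```python
def bandwidth_efficiency(total_bit_rate_bps, bandwidth_hz):
    """Eq. (18): bandwidth efficiency in bits/sec/Hz."""
    return total_bit_rate_bps / bandwidth_hz

# Reading Table 1 forward: BE = n * waveform factor (n = 1 BPSK, n = 2 QPSK)
be_qpsk_battle_lemarie = 2 * 1.74   # factor taken from Table 1
```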
5 Conclusions

In this tutorial paper, we compare the performance of various MCM techniques, such as OFDM and MC-CDMA, with an emphasis on the proposed wavelet-based MC-CDMA systems. The proposed wavelet-based MC-CDMA systems possess all the desirable characteristics, e.g., frequency diversity and small ISI, that the conventional MC-CDMA system has. In addition to those advantages, the wavelet-based MC-CDMA systems provide not only higher bandwidth efficiency than the MC-CDMA systems, but also new dimensions for anti-fading and interference immunity through the suitable choice of the wavelet functions and the wavelet frequency bands. Based on these results, the wavelet-based MC-CDMA systems are a feasible candidate multiplexing/multiple-access technique for use in FPLMTS/IMT-2000 and mobile multimedia applications.
References
[1] J.A.C. Bingham, "Multicarrier modulation for data transmission: An idea whose time has come," IEEE Commun. Magazine, pp. 5-14, May 1990.
[2] M.L. Doelz, E.T. Heald, and D.L. Martin, "Binary data transmission techniques for linear systems," Proc. IRE, vol. 45, pp. 656-661, May 1957.
[3] H.F. Harmuth, "On the transmission of information by orthogonal time functions," AIEE Trans. Commun. Electron., vol. 79, pp. 248-255, July 1960.
[4] S.B. Weinstein and P.M. Ebert, "Data transmission by frequency-division multiplexing using the discrete Fourier transform," IEEE Trans. Commun. Tech., vol. 19, pp. 628-634, Oct. 1971.
[5] Y. Wu and B. Caron, "Digital television terrestrial broadcasting," IEEE Commun. Magazine, pp. 46-52, May 1994.
[6] B. Le Floch, M. Alard, and C. Berrou, "Coded orthogonal frequency division multiplex," Proc. IEEE, vol. 83, pp. 982-996, June 1995.
[7] M. Alard and R. Lassalle, "Principles of modulation and channel coding for digital broadcasting for mobile receivers," EBU Technical Review, no. 224, pp. 168-190, Aug. 1987.
[8] H. Sari, G. Karam, and I. Jeanclaude, "Transmission techniques for digital terrestrial TV broadcasting," IEEE Commun. Magazine, pp. 100-109, Feb. 1995.
[9] L.J. Cimini, Jr., "Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing," IEEE Trans. Commun., vol. 33, pp. 665-675, July 1985.
[10] A.E. Jones, T.A. Wilkinson, and S.K. Barton, "Block coding scheme for reduction of peak to mean envelope power ratio of multicarrier transmission schemes," Electronics Letters, vol. 30, pp. 2098-2099, Dec. 1994.
[11] L. Vandendorpe, "Multitone spread spectrum communication systems in a multipath Rician fading channel," in Proc. IZSDC, Mar. 1994, pp. 440-451.
[12] S. Kaiser, "OFDM-CDMA versus DS-CDMA: Performance evaluation for fading channels," in Proc. IEEE ICC, June 1995, pp. 1722-1726.
[13] S. Kondo and L.B. Milstein, "Performance of multicarrier DS CDMA systems," IEEE Trans. Commun., vol. 44, pp. 238-246, Feb. 1996.
[14] E.A. Sourour and M. Nakagawa, "Performance of orthogonal multicarrier CDMA in a multipath fading channel," IEEE Trans. Commun., vol. 44, pp. 356-367, Mar. 1996.
[15] N. Yee and J.P. Linnartz, "Multi-carrier CDMA in an indoor wireless radio channel," Memo. No. UCB/ERL M94/6, Electronics Research Lab., UC-Berkeley, Feb. 1994.
[16] M.A. Tzannes and M.C. Tzannes, "Bit-by-bit channel coding using wavelets," in Proc. IEEE GLOBECOM, Dec. 1992, pp. 684-688.
[17] R. Orr, C. Pike, and M. Bates, "Covert communications employing wavelet technology," in Proc. IEEE Asilomar Conf. on Signals, Systems and Computers, Nov. 1993, pp. 523-527.
[18] P.P. Gandhi, S.S. Rao, and R.S. Pappu, "On waveform coding using wavelets," in Proc. IEEE Asilomar Conf. on Signals, Systems and Computers, Nov. 1993, pp. 901-905.
[19] M. Medley, G. Saulnier, and P.K. Das, "Applications of wavelet transform in spread spectrum communications systems," in SPIE Proc. Wavelet Applications, vol. 2242, pp. 54-68, Apr. 1994.
[20] K.H. Chang, X.D. Lin, and H.J. Li, "Wavelet-based multi-carrier CDMA for PCS," in Proc. IEEE ICASSP, May 1996, pp. 1443-1446.
[21] K.H. Chang, X.D. Lin, and M.G. Kyeong, "Performance analysis of wavelet-based MC-CDMA for FPLMTS/IMT-2000," in Proc. IEEE ISSSTA, Sep. 1996, to be published.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
SIGNAL DENOISING THROUGH MULTIFRACTALITY
W. Kinsner and A. Langi
Department of Electrical and Computer Engineering, Signal and Data Compression Laboratory, University of Manitoba, Winnipeg, Manitoba, Canada R3T-5V6, email: {kinsner|langi}@ee.umanitoba.ca, and TRLabs (Telecommunications Research Laboratories), 10-75 Scurfield Boulevard, Winnipeg, Manitoba, Canada R3Y-1P6
ABSTRACT
This paper presents a new framework for signal denoising based on multifractality, and demonstrates its practicality with several examples. Signal denoising is concerned with the separation of noise from a signal, and then with reducing the noise without altering the signal significantly. This paper demonstrates that a multifractal measure can be used to guide the process of noise reduction so that the fractal spectrum is preserved in the signal.
INTRODUCTION
Denoising is critical in many signal applications in which noise contamination reduces the performance of signal processing. For example, signal analysis often results in incorrect characterization due to noise [ScGr91]. In signal compression, contaminated signals are often difficult to compress because their entropy values are very high [LaKi96a]. Unfortunately, proper denoising is difficult because neither the signal nor the noise is known. Although the concept of denoising is not new theoretically, it is now entering a practical phase due to several recent developments in the areas of wavelets, contextual prediction, and multifractality. The current denoising algorithms are based on preserving selected characteristics of signals that do not occur in noise, such as regularity, smoothness, predictability, power spectral density, and linearity [Dono92], [KoSc93], [CoMW92]. Although such algorithms perform well for classes of relatively smooth signals, they fail to apply well to noise-like signals (i.e., signals with a noise-like appearance) such as image textures or speech consonants. We have developed an approach based on singularity preservation as the denoising criterion for regular as well as noise-like signals. This approach was prompted by previous work on singularity characterization using wavelets, indicating that singularities can represent regular and noise-like signals faithfully, i.e., signals reconstructed from wavelet-detected singularities are perceptually indistinguishable from the original ones [MaHw92]. In particular, multifractal measures of signals (e.g., a spectrum of singularities or the Rényi generalized dimensions [Kins94]) can be used to characterize singularities [LaKi96b], [FKPG96], [Lang96]. Hence, denoising schemes should preserve signal multifractality. Furthermore, the removed parts must have the multifractal characteristics of noise.
EXAMPLES OF DENOISED IMAGES
This paper shows examples of applying the measures in various image denoising schemes (i.e., wavelet shrinkage [Dono92] and prediction [KoSc93]) as well as some high-quality, high-bit-rate image compression schemes (e.g., Joint Photographic Experts Group, JPEG) to
Fig. 1. Comparison of (a) a 512x512 aerial ortho image and (b) the denoised image, using wavelet shrinkage at a level suitable for a 2.03:1 lossless compression ratio.

demonstrate the relation between the measure and the perceptual reconstruction quality, as well as the practicality of the framework. In one example, we have denoised an aerial ortho image to enable a compression ratio (CR) of at least 2:1. The importance of this example is that the image was almost incompressible (1.06:1) from Shannon's entropy point of view. This was achieved by wavelet shrinkage, in which an image is first transformed into a wavelet domain, the wavelet coefficient values are then shrunk according to a soft thresholding, and the image is reconstructed from the shrunk coefficients. Increasing the thresholding level results in an increase in the lossless compression ratio of the denoised image. Figure 1 compares the original and the denoised images at a thresholding level of 0.011 for a 2.03:1 CR. Although the reconstructed image is smoother (with a 35.5 dB peak signal-to-noise ratio, PSNR), all sharp edges are still preserved, which makes denoising superior to classical filtering techniques that tend to blur edges (i.e., the high-frequency parts of the image are altered). In another example, we have used prediction for denoising [LaKi96a], as shown in Fig. 2.
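The shrinkage pipeline just described (transform, soft-threshold the coefficients, reconstruct) can be sketched in a few lines for a one-level Haar transform of a 1-D signal; the experiments reported here used 2-D images and a particular wavelet, so everything below, names included, is an illustrative stand-in:

```python
S = 2 ** -0.5   # 1/sqrt(2)

def haar_forward(x):
    """One-level Haar transform of an even-length signal."""
    approx = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx, detail

def haar_inverse(approx, detail):
    x = []
    for a, d in zip(approx, detail):
        x += [S * (a + d), S * (a - d)]
    return x

def soft_threshold(coeffs, t):
    """Donoho soft thresholding: shrink each magnitude by t, keep the sign."""
    return [max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0) for v in coeffs]

def denoise(x, t):
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))
```

With t = 0 the chain reduces to perfect reconstruction; raising t progressively zeroes small detail coefficients, which is what raises the lossless CR of the result.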
Fig. 2. Comparison of (a) a 256x256 aerial ortho image and (b) a denoised image using prediction suitable for a 2.22:1 lossless compression ratio, and (c) the residual image (enhanced for visual presentation).
This contextual predictive scheme removes noise while preserving image predictability. The approach results in a PSNR of more than 49.9 dB at a 2.22:1 CR and preserves image perceptual quality (i.e., the original and denoised images are perceptually indistinguishable). The removed part of the original image (called the residual image) has noise characteristics, as demonstrated in Fig. 2c, which is amplified to the maximum range from 0 to 255. It is seen that the enhanced image contains no trace of the original image. We have verified experimentally that the prediction-based denoising preserves image multifractality (as measured by the Rényi generalized dimension), while high-quality lossy compression schemes such as JPEG do not. This constitutes the novelty of this paper. Figure 3a compares the Rényi generalized dimensions D_q of the original, denoised, and residual images, as well as JPEG 1 (CR of 2.08:1) and JPEG 2 (1.87:1) images [Brad94]. The D_q plots of the original and denoised images coincide, while those of the JPEG schemes deviate at low q. Using a Legendre transform, we can also calculate the singularity spectra f(α), with similar results (see Fig. 3b). The f(α) curves of the original and denoised images also coincide, while those of the JPEG images deviate at high singularities. This indicates that the JPEG schemes
Fig. 3. Multifractal measures of the original and various denoised images (2.22:1 CR prediction, 2.08:1 CR JPEG 1, and 1.87:1 CR JPEG 2 schemes): (a) the Rényi generalized dimensions D_q, (b) spectra of singularities f(α), and (c) a zoomed-in region of the f(α), showing that while the singularity spectra of the original and denoised images coincide, those of the JPEG images deviate at high singularities α.
alter the high-singularity components of the original image. Figure 3c shows the discrepancy clearly in a zoomed-in plot at a high-singularity region. It is important to notice that the multifractal measure is a clear indicator of the noise-like nature of the residual image, which has a single fractal dimension, as demonstrated by either the flat dashed D_q line in Fig. 3a or, alternatively, a single point on the f(α) curve in Fig. 3b. The high performance of the prediction-based denoising has prompted us to implement it in a commercial application (compressing otherwise incompressible aerial ortho images, each 25 Mbytes in size) [LaKi96a].
CONCLUSIONS
Denoising of signals appears to be a very important development in signal preprocessing for compression and other feature extraction procedures. Multifractality provides a framework for denoising through a multifractal measure of denoising quality. Such a framework can cover both regular and noise-like signals. The approach has become practical through our accurate schemes to compute the Rényi generalized dimension and the spectra of singularities. This framework can be extended to other signal processing applications.
REFERENCES
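A single-scale estimate of the Rényi generalized dimension D_q from box probabilities can be sketched as follows (a simplification of the multiscale computation actually used; the function name and the single-scale shortcut are ours):

```python
import math

def renyi_dimension(probs, eps, q):
    """Single-scale estimate of the Rényi generalized dimension D_q.

    probs: probabilities of boxes of size eps covering the measure.
    For q = 1 the Shannon-entropy (information-dimension) limit is used.
    """
    if abs(q - 1.0) < 1e-12:
        h = -sum(p * math.log(p) for p in probs if p > 0)
        return h / math.log(1.0 / eps)
    s = sum(p ** q for p in probs if p > 0)
    return (1.0 / (q - 1.0)) * math.log(s) / math.log(eps)
```

For a monofractal, such as a uniform measure, the estimate is flat in q, which is exactly the behaviour of the flat dashed D_q line reported for the residual image.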
[Brad94] J. Bradley, XV v.3.10a (a Unix program). Available at [email protected], 1994.
[CoMW92] R.R. Coifman, Y. Meyer and M.V. Wickerhauser, "Wavelet analysis and signal processing," in Wavelets and Their Applications, M. Ruskai (ed.), Boston: Jones and Bartlett, pp. 153-178, 1992.
[Dono92] D.L. Donoho, "De-noising via soft-thresholding," Technical Report, Department of Statistics, Stanford University, 1992, 37 pp. (Available through ftp from: ftp://playfair.stanford.edu/pub/donoho)
[FKPG96] M. Farge, N. Kevlahan, V. Perrier, and E. Goirand, "Wavelets and turbulence," Proceedings of the IEEE, vol. 84, no. 4, pp. 639-669, April 1996.
[Kins94] W. Kinsner, "Fractal dimensions: Morphological, entropy, spectrum, and variance classes," Technical Report DEL94-4, Department of Electrical and Computer Engineering, University of Manitoba, 146 pp., April 1994.
[KoSc93] E.J. Kostelich and T. Schreiber, "Noise reduction in chaotic time-series data: A survey of common methods," Physical Review E, vol. 48, no. 3, pp. 1752-1763, September 1993.
[LaKi96a] A. Langi and W. Kinsner, "Compression of aerial ortho images based on image denoising," in Proc. NASA/Industry Data Compression Workshop 1996 (Snowbird, Utah; 4 April 1996), A.B. Kiely and R.L. Renner (eds.), pp. 81-90. (Available from the Jet Propulsion Laboratory, California Institute of Technology, as JPL Publication 96-11. Contact: Dr. Aaron B. Kiely, [email protected])
[LaKi96b] A. Langi and W. Kinsner, "Singularity processing of nonstationary signals," in Proc. IEEE Canadian Conf. Elect. and Comp. Eng., ISBN 0-7803-3143-5 (Calgary, Alberta; 26-29 May 1996), pp. 687-691.
[Lang96] A. Langi, "Wavelet and fractal processing of nonstationary signals," Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Manitoba, 1996, 456 pp.
[MaHw92] S. Mallat and W.L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. Inform. Theory, vol. 38, no. 2, pp. 617-643, 1992.
[ScGr91] T. Schreiber and P. Grassberger, "A simple noise-reduction method for real data," Physics Letters A, vol. 160, pp. 411-418, 1991.
Application of Multirate Filter Bank to the Co-Existence Problem of DS-CDMA and TDMA Systems
Shinsuke Hara, Takahiro Matsuda and Norihiko Morinaga
Graduate School of Engineering, Osaka University, Osaka, Japan. E-Mail: [email protected]
Abstract - In this paper, we discuss the co-existence problem of DS-CDMA and TDMA systems, where both systems share the same frequency band to improve the spectral efficiency. We propose a complex multirate filter bank (CMRFB) based adaptive notch filtering technique for the DS-CDMA systems, which can observe the received signal with different frequency resolutions at the same time, and easily form the most suitable notch filter for rejecting the TDMA signal.
I. Introduction
The DS-CDMA (Direct Sequence-Code Division Multiple Access) system has the attractive feature of being able to share a frequency band with narrowband communication systems without intolerable degradation of either system's performance. A DS-CDMA overlay has been suggested to improve the
spectral efficiency as well as to share the frequency band with existing narrowband systems [1]. The spread spectrum signal causes little damage to the narrowband signal due to its low spectral profile. On the other hand, it is inherently resistant to narrowband interference, because the despreading operation has the effect of spreading the narrowband energy over a wide bandwidth. However, it has been demonstrated that the performance of a spread spectrum system in the presence of a narrowband signal can be enhanced significantly through the use of active narrowband interference suppression prior to despreading [2]. The Fast Fourier Transform (FFT) based adaptive notch filtering technique [3] first observes the received signal, composed of a desired spread spectrum signal and some undesired narrowband interference, in the frequency domain through the FFT, and then rejects the frequency band containing the interference component by forming a notch filter. Among the narrowband interference rejection techniques, this technique is attractive in terms of hardware complexity; however, it has to divide the whole received frequency band into many narrow bands with the same bandwidth. This can increase the computational cost and distort the spread spectrum signal, yet we do not have to observe and divide the frequency bands where there is no narrowband interference. In this paper, we propose a complex multirate filter bank (CMRFB) based adaptive notch filtering technique to solve the co-existence problem of CDMA and TDMA systems. We show the principle of the CMRFB based adaptive notch filtering technique, and discuss the bit error rate (BER) performance for both CDMA and TDMA systems.
II. Complex Multirate Filter Bank
Fig. 1(a) shows a complex multirate filter bank (CMRFB) in a DS-CDMA receiver, which is composed of an analysis filter bank and a synthesis filter bank.
At the first stage, a down-converted discrete-time received signal r(n) is passed through a pair of FIR digital filters (the analysis filters A0(z) and A1(z)) with frequency responses as shown in Fig. 1(e). The filtered signals can be decimated by two, because they are approximately band-limited (lowpass and highpass, respectively). The analysis filters can be used recursively at any filter output. Fig. 1(f) shows the frequency response after the fourth stage in Fig. 1(a), where we can see four types of bandpass filters with different pass bandwidths. The decimated subband signals are recombined in the corresponding synthesis filter bank, composed of expanders and the synthesis filters S0(z) and S1(z). Since multirate systems have mainly been discussed with real filters [4], they can deal with only the positive frequency component of the input signal. In quasi-coherent detection systems, however, the down-converted signal processed in the baseband has both positive and negative frequency components.
.
..,
:
~.i ~ , ~ .
.
c
~
-
410
Fig. 1 Complex multirate filter bank and adaptive notch filtering

Therefore, we need to design the multirate filter bank with complex filters. In this case, the perfect reconstruction condition is written as

S_0(z) = -jA_0(z),   (1)
S_1(z) = jA_1(z) = jA_0(-z),   (2)

where A_0(z), A_1(z), S_0(z) and S_1(z) are the frequency responses in terms of the z-transform.
III. Adaptive Notch Filtering Technique
When the received signal is composed of (wideband) DS-CDMA and (narrowband) TDMA signals, as shown in Fig. 1(b), the hatched filter output in Fig. 1(a) contains mainly the (undesired) TDMA signal component. Therefore, by setting the corresponding synthesis filter input to zero in the synthesis filter bank, we can easily reject the narrowband interference (TDMA signal) (see Fig. 1(c)). The CMRFB based notch filtering technique does not try to divide the frequency bands where there is no narrowband interference, and it can easily form the most suitable notch filter for rejecting the narrowband interference. This results in less distortion of the wideband DS-CDMA signal and a lower calculation cost in forming the adaptive notch filter (it also saves mobile battery energy). The CMRFB technique is applicable to the DS-CDMA receiver in both the base station and the mobile terminal; furthermore, it is also effective for the TDMA group demodulator in the base station. Because of the phase linearity of the complex multirate filter bank, we can directly use the rejected analysis filter output to demodulate the (phase-modulated) TDMA signal (see Fig. 1(d)). When 1 DS-CDMA system and N frequency-multiplexed TDMA systems are sharing the same frequency band, in order to support all the systems we usually require 1 base station for the DS-CDMA system and N base stations for the TDMA systems, each of which can handle a single multiplexed signal.
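The analysis/decimation/synthesis chain, and the notch obtained by feeding zero to one synthesis input, can be illustrated with the simplest two-band (real Haar) bank in polyphase form. The paper's CMRFB uses longer complex filters satisfying (1)-(2), so this is only a structural sketch with names of our choosing:

```python
S = 2 ** -0.5   # 1/sqrt(2)

def analyze(x):
    """Two-band analysis with decimation by 2 (Haar, polyphase form)."""
    low  = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high

def synthesize(low, high):
    """Matching synthesis bank: expanders plus synthesis filters."""
    x = []
    for l, h in zip(low, high):
        x += [S * (l + h), S * (l - h)]
    return x

def notch_highband(x):
    """Reject the highpass subband by setting its synthesis input to zero."""
    low, high = analyze(x)
    return synthesize(low, [0.0] * len(high))
```

When both branches are kept, the chain reconstructs the input perfectly; a sample-rate-alternating (high-band) input is nulled entirely by the notch, which is the mechanism used here to suppress the narrowband TDMA component sitting in one subband.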
However, by employing the complex multirate filter bank based technique, we can integrate the N+1 base stations into one intelligent base station, which can handle both of the multiplexed signals simultaneously. This system can be a solution for the co-existence problem of DS-CDMA and TDMA systems.
IV. Numerical Results and Discussions
A. System Model
Fig. 2 shows the co-existence problem of CDMA and TDMA systems discussed in this paper. The TDMA system is based on a QPSK/coherent demodulation scheme, and a root Nyquist filter with a roll-off factor of 0.5 is used for baseband pulse shaping in the transmitter and receiver. The same modulation/demodulation scheme and
Fig. 2 Co-existence of CDMA and TDMA systems
Nyquist filter are used in the CDMA system, and Gold codes with a processing gain of 31 are used for spectrum spreading. The complex multirate filter bank in the CDMA receiver is constructed with a polyphase-implemented [4] 32-tap or 12-tap complex filter obtained by modification of the real filters in [5]. Fig. 1(e) shows the frequency response of the 32-tap complex filter. We assume an additive white Gaussian noise (AWGN) channel, and define E(C/T) and E(T/C) as the CDMA-to-TDMA and TDMA-to-CDMA signal energy ratios, respectively, and B(C/T) and B(T/C) as the CDMA-to-TDMA and TDMA-to-CDMA bandwidth ratios, respectively.
B. Bit Error Rate of CDMA System with TDMA Signal
Fig. 2 shows the power spectrum of the received signal composed of some CDMA components and 1 TDMA component. The center frequency of the TDMA signal is located at 27/128 Hz, which corresponds to that of a notch filter formed by the 6-stage CMRFB. Therefore, the 6-stage CMRFB can perfectly reject the TDMA signal with B(C/T)=64. Fig. 3 shows the BER of the CDMA system for E(C/T)=-5 dB. Without the notch filtering, the BERs are almost the same for different values of B(C/T); this means that the BER depends not on B(C/T) but on E(C/T). The notch filtering drastically improves the BER. The 4-stage CMRFB can perfectly reject the TDMA signal with B(C/T)=16, the 5-stage CMRFB the TDMA signal with B(C/T)=32, and the 6-stage CMRFB the TDMA signal with B(C/T)=64.
Fig. 3 Bit error rate of CDMA system with 1 TDMA signal
Note that it is desirable to form a notch filter as narrow as possible, yet still wide enough to reject the narrowband interference, because the notch filter rejects a part of the energy of the CDMA signal as well as the narrowband interference (the loss of energy is proportional to the bandwidth of the notch filter). When we use a K-stage CMRFB, we can form notch filters with bandwidth down to 1/2^K of the received frequency bandwidth. Therefore, the BER
Fig. 4 Bit error rate of CDMA system without notch filtering
improves as the number of stages increases. Figs.4 and 5 show the BERs of the CDMA system for 1, 4, 8 and 16 users without and with the notch filtering for 1 TDMA signal, respectively, where we assume B(C/T)=64 and E(C/T)=-5dB. When there is 1 TDMA signal in the received frequency band, without the notch filtering, the BER severely degrades as the number of CDMA users increases. On the other hand, with the notch filtering, the BER performance can be improved.

Fig. 5 Bit error rate of CDMA system with notch filtering

C. Bit Error Rate of TDMA System with CDMA Signal
Fig.6 shows the BER of the TDMA system when the received signal is composed of 1 TDMA component and 1 CDMA component (see Fig.3(b)). As E(T/C) decreases, the BER degrades. The simulation result for the energy penalty agrees well with the result calculated with a Gaussian approximation for the CDMA signal; this means that, from the viewpoint of the TDMA system, the CDMA signal can indeed be treated as Gaussian noise.

V. Conclusions
In this paper, we have discussed the co-existence problem of CDMA and TDMA systems, and proposed a complex multirate filter bank (CMRFB) based adaptive notch filtering technique for the CDMA system. We have shown the principle of the CMRFB based adaptive notch filtering technique, and discussed the bit error rate performance for both CDMA and TDMA systems with and without the proposed technique.
The CMRFB based technique can observe the received signal, composed of a desired wideband signal and an undesired narrowband interference, with different frequency resolutions at the same time, and easily form the most suitable notch filter for rejecting the interference.

Fig. 6 Bit error rate of TDMA system with 1 CDMA signal

References
[1] L. B. Milstein et al., "On the Feasibility of a CDMA Overlay for Personal Communications Networks," IEEE Jour. on Sel. Areas in Commun., vol. 10, pp. 655-668, May 1992.
[2] H. V. Poor and L. A. Rusch, "Narrowband Interference Suppression in Spread Spectrum CDMA," IEEE Personal Communications, vol. 1, no. 3, pp. 14-27, Third Quarter 1994.
[3] L. B. Milstein, "Interference Rejection Techniques in Spread Spectrum Communications," Proc. of the IEEE, vol. 76, pp. 657-671, June 1988.
[4] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice-Hall, 1993.
[5] V. K. Jain and R. E. Crochiere, "Quadrature Mirror Filter Design in the Time Domain," IEEE Trans. on Acoust. Speech Signal Proc., vol. 32, pp. 353-361, Apr. 1984.
Session N: EDGE DETECTION
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Multiscale Edges Detection by Wavelet Transform for Model of Face Recognition
*Fan YANG, *Michel PAINDAVOINE, **Hervé ABDI
*University of Burgundy, LIESIB, 6 Boulevard Gabriel, 21000 Dijon, France
email: fanyang@u-bourgogne.fr
**University of Texas, U.S.A.
Abstract
One way to store and recall face images uses the linear auto-associative memory. This connectionist model is used in conjunction with a pixel-based coding of the faces. Image processing using the Wavelet transform can be applied to multiscale edge detection. In this paper, we describe a learning technique for the auto-associator based on the Wavelet transform; a 17% improvement in face recognition performance has been obtained in comparison with the standard learning.
1 Introduction
As noted, the linear auto-associator is a particular case of the linear associator. The goal of this network is to associate a set of stimuli with itself. It can be used to store and retrieve face images, and it can also be applied as a pre-processing device to simulate some psychological tasks, such as categorizing faces according to their gender [1]. The auto-associator functions as a pattern recognition and pattern completion device in that it is able to reconstruct learned patterns when noisy or incomplete versions of the learned input patterns are used as "stimuli". A learning technique based on the Wavelet transform can improve recognition capability when the pattern images are heavily corrupted by noise. In the second part, the basic features of the classical auto-associative memory are briefly described. In the third part, we propose a learning technique for the auto-associator using the multiscale edges of face images, and a comparison is made between the results of different edge detection operators. Experimental results concerning the recognition of different types of faces are presented in the fourth part.
2 Model description
First, the faces to be stored are coded as vectors of pixel intensities: each face is digitized to form a pixel image, and the rows of the image are concatenated to form an I*1 vector X_k. Each element in X_k represents the grey level of the corresponding pixel. Then, each element of the face vector X_k is used as input to a cell of the auto-associative memory. The number of cells of the memory is equal to the dimension of the vector X_k. Each cell in the memory is connected to every other cell; the output of a given cell for a given face is simply the sum of its inputs weighted by the connection strengths between itself and all of the other cells. The intensities of the connections are represented by an I*I matrix W. In order to improve the performance of the auto-associator, the Widrow-Hoff learning rule is used, which corrects the difference between the response of the system and the expected response by iteratively changing the weights in W as follows:

W^(t+1) = W^(t) + η(X_k − W^(t)X_k)X_k^T

where η is a small learning constant and k is randomly chosen. The Widrow-Hoff learning rule can be analyzed in terms of the eigenvectors and the eigenvalues of the matrix of stimuli X (set of K faces) [2]:
W^(t) = P{I − (I − ηΛ)^t}P^T

with: Λ: diagonal matrix of the eigenvalues of XX^T; P: matrix of the eigenvectors of XX^T
With η smaller than 2λmax^(-1) (λmax being the largest eigenvalue), this procedure converges toward:
W(∞) = PP^T

The eigenvector/eigenvalue notation makes it possible to work with matrices of small dimension: the matrix W of dimension I*I can be computed as W = PP^T, with the matrix P of dimension I*L (L being the number of eigenvectors with a non-zero eigenvalue, L ≤ min{I, K}). For example, we have used an auto-associator for face recognition in which I is equal to 33975 and L is equal to 40 or 200.
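The learning rule and its fixed point can be illustrated on a toy scale. The sketch below (pure Python; the 4-pixel "faces", the learning constant and the iteration count are our arbitrary choices, not the paper's) applies the Widrow-Hoff update and checks that, after convergence, each learned pattern is reconstructed by W:

```python
import random

def widrow_hoff_step(W, x, eta):
    """One Widrow-Hoff update W <- W + eta * (x - W x) x^T.
    For an auto-associator the stimulus x is both input and target."""
    n = len(x)
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]   # W x
    err = [x[i] - y[i] for i in range(n)]                           # x - W x
    for i in range(n):
        for j in range(n):
            W[i][j] += eta * err[i] * x[j]

# Two toy "faces" in a 4-pixel space (orthogonal for simplicity).
faces = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
W = [[0.0] * 4 for _ in range(4)]
random.seed(0)
for _ in range(500):
    # k is randomly chosen at each step, as in the text.
    widrow_hoff_step(W, random.choice(faces), eta=0.05)

# At the fixed point W acts as a projector: W x_k ~ x_k for learned patterns.
for f in faces:
    recon = [sum(W[i][j] * f[j] for j in range(4)) for i in range(4)]
    assert all(abs(a - b) < 1e-3 for a, b in zip(recon, f))
```

Here η = 0.05 respects the convergence condition η < 2/λmax (λmax = 5 for these two patterns).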
3 New technique of learning using the multiscale edges
The standard learning for the auto-associative memory consists in presenting a series of face images to the input of the model as stored patterns. An auto-associator trained with this method does not give satisfactory results for very noisy stimuli. The contour gives the first strong impression for recognition [3]. We have therefore introduced the edges of the face images into the auto-associator during learning. In the domain of image processing, many algorithms have been proposed to extract edges; they come in two classes: gradient operators and optimal detectors. The Sobel operator uses a [3*3] mask which gives satisfactory results for images without noise. The Canny-Deriche filter is an optimal detector whose implementation can be realized in a second-order recursive form. The Wavelet transform allows the detection of multiscale edges and is used to detect all the details in an image by modifying the scale. We choose here the optimized Canny-Deriche filter (third-order recursive) as the Wavelet function for edge detection:

f(x) = ksx e^(−msx) + e^(−msx) − e^(−sx)

with k=0.564 and m=0.215. A method has been applied which allows a direct implementation of the Wavelet transform using a convolution between the image and the edge detection filter at different scales (s = 2^j) to obtain edge images [4]. During the learning of the auto-associator, for each face, a pre-processing step extracts the edges of the face image. Then, not only the face image but also the edge images are presented to the input of the auto-associator as patterns. Fig.1 displays the responses of the memories trained in the different ways. The top panels present: 1a) a stimulus corrupted with additive random noise, 1b) the response of the model trained only with the face images, and 1c) the desired response. The bottom panels show: 1d) the response of the model trained with the addition of edge images from the Sobel operator, 1e) the response of the model trained with the addition of edge images from the Canny-Deriche filter, and 1f) the response of the model trained with the addition of multiscale edge images from the Wavelet transform (scales s=1, 2, 4, 8).
Figure 1: Response of the models
Figure 2: Correlation of the models
Clearly, the standard method gives bad results for this noisy stimulus. We observe that, using the edge images detected with the different techniques, from the Sobel operator to the Wavelet transform, the quality of recognition improves gradually. The quality of recognition can be measured by computing the cosine (correlation) of the angle between the vector O_k (response of the model) and T_k (desired response). Fig.2 shows the correlations of the auto-associators trained in the different manners.
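The quality measure is the ordinary cosine between the response and target vectors; a minimal sketch:

```python
def cosine(o, t):
    """Quality of recall: cosine of the angle between the model
    response O_k and the desired response T_k (1 = perfect recall)."""
    dot = sum(a * b for a, b in zip(o, t))
    norm_o = sum(a * a for a in o) ** 0.5
    norm_t = sum(b * b for b in t) ** 0.5
    return dot / (norm_o * norm_t)

assert abs(cosine([1.0, 0.0], [1.0, 0.0]) - 1.0) < 1e-12   # identical
assert abs(cosine([1.0, 0.0], [0.0, 1.0])) < 1e-12         # orthogonal
assert abs(cosine([2.0, 2.0], [1.0, 1.0]) - 1.0) < 1e-12   # scale-invariant
```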
4 Experimental results
We have applied this new learning technique using multiscale edge images to store a set of 40 Caucasian faces (20 males and 20 females). Fig.3 displays the responses of 2 memories, one trained with the standard learning and the other trained with the new Wavelet-transform technique. The stimuli are corrupted with additive Gaussian noise, (from left to right) 1) Signal-to-Noise Ratio SNR=1, 2) SNR=3/5, 3) SNR=3/8, and 4) SNR=3/13.
Figure 3: The top panels show 4 stimuli, the middle panels the responses produced by the auto-associator trained with the new learning technique, and the bottom panels the responses of the auto-associator trained with the standard learning.
Figure 4: Stimuli and responses of the models.
Fig.4 shows the results of these 2 memories for new faces (from top to bottom): 1) a new face similar to the set of learned faces (a Caucasian face), and 2) a new face different from the set of learned faces (a Japanese face). The auto-associator trained with the standard learning is not able to give distinguishable responses. Better results have been obtained for the model trained with the new technique. Fig.5 displays the mean correlation functions of these 2 memories: (5a) with 10 Caucasian faces whose noise-free versions were learned, (5b) with 10 new faces similar to the learned faces, (5c) with 10 new Japanese faces (-- New technique, __ Standard method).
Figure 5: Mean correlation functions (correlation vs. noise magnitude for panels 5a, 5b and 5c).
5 Conclusion
We have proposed a learning technique based on the Wavelet transform for the auto-associative memory which improves the performance of face recognition when the stimuli are noisy. The greater the noise, the greater the improvement: a 17% improvement in correlation compared with the standard learning has been obtained for the noisiest faces. Considering the computation required, we will implement this auto-associator on several processors (DSP TMS320C40) in parallel form. We also hope to apply this technique to other applications such as character recognition.
References
[1] D. Valentin, H. Abdi and A.J. O'Toole, Categorization and identification of human face images by neural networks: A review of the linear autoassociative and principal component approaches, Journal of Biological Systems, 2, 1994.
[2] H. Abdi, Les Réseaux de neurones, Presses Universitaires de Grenoble, Grenoble, 1994.
[3] X. Jia and S. Nixon, Extending the feature vector for automatic face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, December 1995.
[4] S. Mallat and Z. Zhong, Characterization of signals from multiscale edges, IEEE-PAMI, Vol. 14, July 1992.
Edge Detection by Rank Functional Approximation of Grey Levels
J.P. ASSELIN de BEAUVILLE, D. BI, F.Z. KETTAF
Laboratoire d'Informatique - Université de Tours
E.3.I. - Ecole d'Ingénieurs en Informatique pour l'Industrie
64 avenue Jean Portalis, Technopôle Boîte N°4, 37913 Tours Cedex 9 - France
e-mail: asselin@univ-tours.fr
Abstract: In this paper, a new method of edge detection based on rank functional approximation is proposed. This approach regards the edge as a local discontinuity of the grey levels, and this discontinuity is extracted by approximating the local grey levels with a linear rank function. The proposed method is robust against noise and can adapt to many edge models (step edge, ramp edge, roof edge, ...). In addition, a new method for selecting the edge position is also proposed, which leads to a detected edge thickness of only 1 pixel.
Key words: Edge detection, image analysis, pattern recognition, rank statistics, median filter.
I. Introduction
In light intensity images, edges are usually regarded as discontinuities of the grey levels, and edge detection is often implemented in two steps. The first step extracts the discontinuities of the grey levels and the second step thresholds the amplitude of the discontinuities so as to decide the correct edge position. In traditional methods, the discontinuity is extracted by differentiating the grey levels in certain directions. Some examples of these methods are the Sobel gradient, the Prewitt gradient, etc. These gradients are easily calculated, but they are too sensitive to noise and their responses are not the same for different edge directions. In addition, these methods do not consider the choice of the threshold. Differently from the traditional methods, Marr and Hildreth [1] proposed the zero-crossing of the second derivative of a Gaussian filter. This operator can precisely detect edges at different scales and can minimize the errors of the edge positions both in the spatial and frequency domains. In this method, the image is first smoothed by a Gaussian filter with a given scale, then the second derivative of the Gaussian filter is used to find the position of the edge from the zero-crossing output of the filter.
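The smooth / differentiate-twice / find-zero-crossings pipeline of Marr and Hildreth can be sketched in one dimension (pure Python; the sampled-Gaussian kernel, clamped borders and the test ramp are our simplifications, not the original operator):

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sampled Gaussian, normalised to sum to 1."""
    radius = radius or int(3 * sigma)
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def marr_hildreth_1d(signal, sigma):
    """Smooth with a Gaussian, take the second difference, and mark
    sign changes (zero-crossings) as candidate edge positions."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    sm = [sum(k[j + r] * signal[min(max(i + j, 0), n - 1)]
              for j in range(-r, r + 1)) for i in range(n)]
    d2 = [sm[max(i - 1, 0)] - 2 * sm[i] + sm[min(i + 1, n - 1)]
          for i in range(n)]
    return [i for i in range(n - 1)
            if (d2[i] > 0 >= d2[i + 1]) or (d2[i] < 0 <= d2[i + 1])]

# A ramp edge centred near index 9 yields a zero-crossing close to it.
ramp = [0.0] * 8 + [0.25, 0.5, 0.75] + [1.0] * 9
edges = marr_hildreth_1d(ramp, sigma=1.0)
assert any(abs(e - 9) <= 2 for e in edges)
```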
The threshold method proposed by Marr and Hildreth consists in accumulating the detected edge positions over different scales. This method is robust against noise, but it often produces false edges, especially at the corners of objects. It was Canny [2] who first formulated three criteria, leading to many new mathematical schemes, such as the Deriche scheme [3], the Shen scheme [4], the Kittler scheme [5], etc. These approaches regard the discontinuities as different profile models, such as the ideal step edge model, the ramp edge model, the roof edge model, etc. The operators for detecting edges are obtained by optimizing the three criteria of Canny for different edge models. Edge detection is implemented by first filtering the image and then detecting the discontinuities with the derivative of the operators; the edge position is decided by the nonmaximum suppression and hysteresis thresholding proposed by Canny. The mathematical schemes often give better results, because they have the advantage of multiscale edge detection like that of Marr and Hildreth, and of precise edge positions owing to nonmaximum suppression and hysteresis thresholding. The problems with these methods are that they require too much computation and that they consider the edge models in only one dimension. Another type of edge detection approach is functional approximation, such as the functional approximation in two directions proposed by T. Pavlidis [6], the facet model functional approximation proposed by Haralick [7], the surface functional fitting method proposed by Nalwa [8], and the full plane functional fitting methods such as those of Zhou [9]. In this type of method, the edge is regarded as the discontinuity of a surface. With this conception, the edges correspond to special distributions of grey levels in two dimensions. Owing to the approximation of the surface with a function, these methods are robust against noise and can adapt to different edge models.
Considering that all the preceding functional approximation methods use a two-dimensional function or a one-dimensional function in two or more directions [6], their calculations are complicated. For this reason, we have proposed a new approach which uses only a one-dimensional linear function. In our method, we first choose a window of the desired size, next we arrange the pixels in increasing order according to their grey levels, then we use a one-dimensional linear function to approximate the distribution of the rank-ordered grey levels, and finally we decide the position of the edge by using a local and a global threshold. In a w×w (=K) window, if we arrange the pixels in increasing order according to their grey levels, we obtain a distribution of the type shown in Figure 1, where K is the number of pixels in the window. The rank of each pixel is calculated by ordering the grey levels: rank 1 corresponds to the minimum grey level pixel and rank K to the maximum grey level pixel. A linear function with two parameters can be used to fit the distribution of the grey levels by their ranks. It is evident that the slope of the straight line represents the rate of change of the grey levels in the window. Intuitively, if there is an edge in the window, the changes of intensity will be large and the slope will be great. So the slope can represent the discontinuity of the grey levels in the window. One advantage of this method is that it simplifies the two-dimensional problem to one dimension. In the considered window, the profile of the edge may be a step edge, a ramp edge, or a line edge, but the distribution of the rank-ordered grey levels is always increasing. For all edge types, the slope of the function is invariant to the edge direction; this is another advantage of the rank functional approximation. Because the discontinuity detected by this method is invariant to the edge direction, it can be regarded as an isotropic edge detection operator. The edge position is selected by thresholding the discontinuities. The threshold method proposed in this paper consists of two parts: the first part is a local threshold, which is calculated by using edge geometries in a 3x3 window; this threshold can give a very thin edge (one pixel). The second part is a global threshold, chosen empirically, which controls the number of edges to be detected. In the next section, we give the description of the functional approximation. Then we describe the thresholding method and the implementation of the algorithm in sections III and IV respectively. For visual comparison, the results of the proposed algorithm and those of Canny's and Deriche's methods for the same image are given in section V.
II. Rank functional approximation
In the literature, the rank-ordered grey levels are often used to filter an image, as in the median filter, the Wilcoxon filter, etc., but for image analysis only little research has been done with rank-ordered grey levels. Zamperoni [10] uses the difference of the distributions of rank-ordered grey levels between two regions to detect the edges of textured images. Bovik [11] also used rank-ordered grey levels to filter images and to detect edges, but he used order statistics. These methods model the edge as a step edge and detect it by calculating the differences of the ordered grey levels between two regions in a window. Because the relative positions of the two regions in the window may be horizontal, vertical, or diagonal, this leads to many masks to be considered and to a large amount of calculation.
Different from them, Kim [12] has proposed a method to detect the edges by subtracting the minimum rank-ordered grey level from the maximum rank-ordered grey level in a K-neighbourhood, this method is very simple but is very sensible to noise because he did not consider the contributions of the non-extrema rank-ordered grey levels, and he did not discuss how to decide the position of the edge. As we have noted in the first section, we arrange the pixels in the window into an increasing order according to their grey levels, this will project all the edge models to only a one dimensional rank-ordered grey levels distribution. So in our method, we only need to consider one mask. By using functional approximation to detect the discontinuities of the rank ordered grey levels, we obtain an algorithm that is robust against noise. Supposing that the size of the window is wxw=K pixels, the grey levels of the K pixels in the window is y(i), i=l...K, where i is the pixel number. After we arrange the K pixels into an increasing order, we get a vector Y=(Yl,Y2 ..... y r ) r such that y, _ Number of lines, Number of columns. size of window for calculating a, b, a*Var(b), => (K=wxw). size of window for calculating local thresthold => (Ka=sxs). percent of the discontinuities average Sp. Calculation of aij, bij, for all the pixels (i,j) in the image
Calculation of discontinuity:
1. Calculate Var(b_ij) for all pixels in the image with a w×w window.
2. Calculate and record a_ij*Var(b_ij) for all pixels in the image.
3. Calculate the average (noted E) of a_ij*Var(b_ij) over all pixels.
4. Calculate the global threshold S_Ab (=Sp*E).
Localization of the edge:
For i = 1 to NbLine do
  For j = 1 to NbColumn do
    Find the third maximum discontinuity S_c in a Ka window.
    If a_ij*Var(b_ij) > S_c and a_ij*Var(b_ij) > S_Ab then r_ij=255
    else r_ij=0
    EndIf
    Record edge information r_ij.
  End do
End do
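The core of the method, fitting a straight line to the rank-ordered grey levels and reading the edge strength off its slope, can be sketched as follows (a least-squares fit of the sorted values against ranks 1..K; this is our illustration of the idea, not the paper's exact estimator for a and b):

```python
def rank_slope(window_vals):
    """Least-squares slope of the rank-ordered grey levels
    y_(1) <= ... <= y_(K) against their ranks 1..K. A large slope
    signals a discontinuity inside the window, regardless of the
    edge direction; a flat window gives a slope near zero."""
    y = sorted(window_vals)          # rank-ordering the window pixels
    K = len(y)
    ranks = range(1, K + 1)
    mean_r = (K + 1) / 2.0
    mean_y = sum(y) / K
    num = sum((r - mean_r) * (v - mean_y) for r, v in zip(ranks, y))
    den = sum((r - mean_r) ** 2 for r in ranks)
    return num / den

flat = [10] * 9                              # 3x3 window, no edge
step = [10] * 5 + [90] * 4                   # 3x3 window with a step edge
assert abs(rank_slope(flat)) < 1e-12
assert rank_slope(step) > 10                 # strong discontinuity
```

Because the values are sorted before fitting, a vertical, horizontal or diagonal step through the window yields the same sorted vector, which is the isotropy argument made in the text.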
Figure 4 shows two real scene images of 256*256 pixels with 256 grey levels. The first image mainly contains step edges and ramp edges, whereas the second contains step and roof ones. Figure 5 shows the results of our algorithm and those of Canny's and Deriche's methods for visual comparison. Note that our algorithm gives thinner edges (for
example, the edges of the woman's arms) and also straighter lines (for example, the borders of the table in the office). The algorithm is implemented in the C language on a SUN SPARCstation. For w×w=3x3, the calculation time is 15 seconds per image. For w×w=5x5, the calculation time is 50 seconds.
Figure 4. Original images
Figure 5. Results of edge detection References
[1] D. MARR and E. HILDRETH, "Theory of edge detection," Proc. Roy. Soc. London, 1980, pp. 187-207.
[2] J.F. CANNY, "A computational approach to edge detection," IEEE Trans. PAMI 8, 1986, pp. 679-698.
[3] R. DERICHE, "Using Canny's criteria to derive an optimal edge detector recursively implemented," Int. J. Comput. Vision, 1987.
[4] Jun SHEN and Serge CASTAN, "An optimal linear operator for step edge detection," Computer Vision, Graphics and Image Processing, Vol. 54, 1992, pp. 112-133.
[5] M. PETROU and J. KITTLER, "Optimal edge detectors for ramp edges," IEEE Trans. PAMI 13, 1991, pp. 483-491.
[6] T. PAVLIDIS, "Segmentation of pictures and maps through functional approximation," Computer Graphics and Image Processing, 1972, pp. 360-372.
[7] R.M. HARALICK, "Digital step edges from zero crossing of second directional derivatives," IEEE Trans. PAMI 6, 1984, pp. 58-68.
[8] V.S. NALWA and T.O. BINFORD, "On detecting edges," IEEE Trans. PAMI 8, 1986, pp. 699-714.
[9] Y.T. ZHOU, V. VENKATESWAR and R. CHELLAPPA, "Edge detection and feature extraction using a 2-D random field model," IEEE Trans. PAMI 11, 1989, pp. 84-95.
[10] P. ZAMPERONI, "Feature extraction by rank-vector filtering for image segmentation," Int. Journal of Pattern Recognition and Artificial Intelligence, Vol. 2, 1988, pp. 301-319.
[11] A.C. BOVIK, T.S. HUANG and D.C. MUNSON, "Edge-sensitive image restoration using order-constrained least squares methods," IEEE Trans. ASSP 33, 1985, pp. 1253-1263.
[12] W. KIM and L. YAROSLAVSKII, "Rank algorithms for picture processing," Computer Vision, Graphics and Image Processing, 35, 1986, pp. 234-258.
Fuzzy Logic Edge Detection Algorithm
Sakari Murtovaara 1), Esko Juuso 1) and Raimo Sutinen 2)
1) Control Engineering Laboratory, University of Oulu, Linnanmaa, FIN-90570 Oulu, Finland
Phone: +358 81 553 1011, Fax: +358 81 553 2304
E-mail: {sakari.murtovaara, esko.juuso}@oulu.fi
2) ABB Industry Oy, Tymäväntie 14, FIN-90400 Oulu, Finland
Phone: +358 81 374 555, Fax: +358 81 374 486
Abstract
In this project, fuzzy logic is applied to edge detection. The performance of a recovery boiler is strongly affected by the geometry of the char bed, and therefore the operation of the boiler can be improved if this geometry is known. By the use of infrared fire-room cameras, the bed can not only be displayed to the operator, but from this image it is also possible to calculate the geometry parameters of the char bed once the edges of the bed have been detected. The system utilises the information coming from the recovery boiler. The image processing analysis tries to find the contour of the bed. The image of the contour may contain pseudo pixels and gaps; e.g. caked liquor solids on the walls may cause erroneous pixels to appear in the contour. The aim of this project is to further improve the edge detection and the image processing. A new algorithm searching for the contour of the char bed has been developed. The present algorithm is based on membership functions of the contour obtained from history data. This algorithm filters out fast changes of the contour. The extended algorithm takes into account the neighbouring points by examining the new contour, and if the distance between the forecasted pixel and the pixel obtained from image processing exceeds tunable limits, the algorithm removes these pixels. This further improves the efficiency of the algorithm and gives more accuracy. This project is included in the national technology programme financed by TEKES (Adaptive and Intelligent Systems Applications) and is done in co-operation with ABB Industry Oy.
Keywords: Fuzzy logic, Image processing, Edge detection and Recovery boiler.
Introduction
The behaviour of the char bed in a recovery boiler is extremely difficult to monitor using conventional instrumentation. The char bed height depends on operating variables such as liquor temperature and the primary/secondary air ratio as well as air pressure. Digital image processing offers techniques to expand and improve the supervision and control of the burning. [1] The shape and position of the char bed in the recovery boiler, as well as the temperature distribution of the bed, are important control objects when boiler efficiency is to be maximised and emissions minimised. Visibility in the visible light region is limited. By using infrared fire-room cameras, the char bed can be displayed to the operator. The effect of changing operating variables (liquor temperatures, air pressure, air flow etc.) can be seen on the monitor. The effects of any plugging of the liquor nozzles and slagging of the air ports can also be detected. A camera gives the most immediate information about the burning process, and a clear image can help the operator to identify the beginning of transients much earlier than by other means. [2, 3] In this paper, we discuss a new edge detection algorithm. This algorithm will further improve the recognition of the char bed. By using fuzzy logic we can simplify this process and increase flexibility in the supervisory control of the burning process.
Image processing
The image processing is divided into two main parts: processing of the incoming image and analysis of the pre-processed image. In this context, analysis means searching for the contour of the bed and calculating the numerical information describing the bed. The image processing part digitises the camera image and performs different kinds of neighbourhood operations: 10 consecutive frames are averaged to reduce noise and decrease the influence of instantaneous disturbances, dirt around the camera opening is masked away, edges in the image are enhanced by differentiation, and the image is thresholded so that only the enhanced edges remain. The result of the digital image processing is another "improved" image. [4] The analysis section takes this pre-processed image and searches for the pixels that form the contour of the bed. First, a search window is fixed to speed up the calculations. Within the defined search area, contour pixels are searched for according to the following principles:
- non-zero pixels are searched for downwards from the search window boundary in each column;
- if at least two pixels are found on top of each other, or a contour pixel was found in nearby previous columns, the pixel is assumed to belong to the contour;
- the locations of the contour pixels are stored in a table, which then represents the instantaneous contour of the char bed. [5]
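The column-wise search can be sketched as follows. For brevity this sketch implements only the "two pixels on top of each other" rule, not the nearby-column continuation also described above (the function name and toy image are ours):

```python
def find_contour(image, top=0):
    """Column-wise downward scan for contour pixels in a thresholded
    (0 / non-zero) edge image. A hit requires a second non-zero pixel
    directly below, which filters out isolated noise pixels.
    Returns one row index (or None) per column."""
    rows, cols = len(image), len(image[0])
    contour = []
    for c in range(cols):
        found = None
        for r in range(top, rows - 1):
            if image[r][c] and image[r + 1][c]:   # two pixels on top of each other
                found = r
                break
        contour.append(found)
    return contour

img = [
    [0, 0, 0, 1],   # isolated pixel in the last column: rejected
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
assert find_contour(img) == [None, 1, 2, None]
```

The resulting list plays the role of the table mentioned above: one stored contour location per column.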
In the recovery boiler application the following features are analysed: the instantaneous contour of the bed, the height of the bed, the horizontal position of the top of the bed, the cross-sectional area of the bed, and figure parameters describing the shape of the bed.
Fuzzy logic in edge detection
The aim of this project is to further improve the edge detection and the image processing. A new algorithm searching for the contour of the char bed has been developed. It generates and updates membership functions for each contour point on the basis of history data (Fig. 1). Then it defuzzifies the resulting fuzzy numbers into a new contour (Fig. 2). Defuzzification is based on the centre of average of the membership functions (Fig. 1). According to the tests, this algorithm filters out fast changes of the contour.
Fig. 1. The centre of average calculation.
Fig. 2. Calculation of new contour.
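A toy sketch of this history-based contour filtering, with a plain per-column average standing in for the centre-of-average defuzzification of the membership functions, and including the distance-based rejection of pseudo pixels used by the extended algorithm (function name, fallback behaviour and the limit value are our assumptions):

```python
def filter_contour(history, new_contour, limit):
    """Forecast each contour column as the centre of average of the
    last contours, then reject new pixels that land further than
    `limit` rows from the forecast (rejected positions fall back to
    the forecast value)."""
    cols = len(new_contour)
    forecast = [sum(h[c] for h in history) / len(history) for c in range(cols)]
    out = []
    for c in range(cols):
        if abs(new_contour[c] - forecast[c]) > limit:
            out.append(forecast[c])        # pseudo pixel removed
        else:
            out.append(new_contour[c])
    return out

history = [[10, 11, 12], [10, 12, 12], [10, 10, 12]]   # last 3 contours
new = [10, 40, 13]                                     # column 1 is an outlier
result = filter_contour(history, new, limit=5)
assert result == [10, 11.0, 13]
```

The history length (here 3, 10 in the application) controls how quickly the filter tracks genuine movements of the char bed.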
By extending the algorithm we can further improve its efficiency. The extended algorithm takes into account the neighbouring points by examining the new contour, and if the distance between the forecasted pixel and the pixel obtained from image processing exceeds tunable limits, the algorithm removes these pixels. The evaluation of the method will be continued with very large material in the Matlab environment, and after those tests the implementation will be transferred to the application software. A suitable number of contours in the history data is 10, since the effective changes in the state of the burning process are slow; a typical time constant may be of the order of minutes. By changing the number of contours in the history data we can affect how quickly the system adapts to movements of the char bed. The reliability of the search for the contour of the char bed can be improved by developing a fuzzy method for adaptive image thresholding, changing the thresholding parameters according to the intensity of the image. A fuzzy control method has been outlined for the image thresholding to stabilise the image processing conditions.
Conclusions
According to the tests, the present algorithm filters out fast changes of the contour of the char bed. In the recovery boiler the changes are very slow, and therefore the algorithm improves the search for the contour. The present algorithm already increases the flexibility of the supervisory control, and the extended algorithm can improve the efficiency further. The adaptation rate of the system can be tuned by changing the number of contours in the history data. In digital image processing, the dynamics of the phenomenon can also be utilised on the basis of successive images.

References
[1] R. Lillja, "Pattern recognition in analysis of furnace camera pictures," in Pattern Recognition Applications, The Soviet-Finnish Symposium, Tbilisi, USSR, September 27 - October 2, 1987, 12 p.
[2] S. Murtovaara and E. Juuso, "Fuzzy Logic in Digital Image Processing for Recovery Boiler Control," in Proc. of TOOLMET'96 - Tool Environments and Development Methods for Intelligent Systems, April 1-2, 1996, Oulu, Finland, Report A No 4, May 1996, Univ. of Oulu, Control Eng. Lab., pp. 199-204.
[3] M. Ollus, R. Lilja, J. Hirvonen, R. Sutinen and S. Kallo, "Burning Process Analyzing by Using Image Processing Technique," in IFAC 3rd MMS Conference, June 14-16, 1988, Vol. 1, Oulu, Finland, 1988, pp. 274-281.
[4] R. Sutinen, R. Huttunen, M. Ollus and R. Lilja, "A new analyzer for recovery boiler control," Pulp & Paper Canada, pp. T83-T86, 1992.
[5] T. Hosti, "Digital image processing for recovery boiler control," Master's thesis, Univ. of Oulu, Dept. of Process Eng., Oulu, Finland, 1992, 57 p.
Proceedings IWISP '96, 4-7 November 1996, Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Topological Edge Finding
Mark Mertens*, Hichem Sahli and Jan Cornelis
Vrije Universiteit Brussel (VUB), Dept. ETRO-IRIS, Pleinlaan 2, B-1050 Brussels, Belgium

Abstract
In this paper we describe a new automatic approach which calculates a polygonal image model for arbitrary images. It is part of a framework for image modeling [1]. To cope with the wide range of images, the method has to be topological, avoiding a high sensitivity to the exact pixel values. Part of this requirement can be fulfilled by using a distribution-free, nonparametric estimator as gain function. This gain function is the subject of the paper. We found that it results in very accurate edge representations and that it is robust against noise.
Introduction
We describe an edge detection approach in which edge finding and polygonalisation of curves are tackled jointly in one optimisation framework. This is achieved by formulating a gain function which evaluates the quality of postulated lines, assumed to be coincident with the edges in the image. Edges are detected by finding the lines that maximise the gain function, which measures the dissimilarity between regions on opposite sides of the postulated line. We represent a postulated line segment by the internal data structure of an agent in a multi-agent framework [1]. The agents find the line features in the image by moving towards them in representation space, maximising their value of the gain function. The result is an emergent configuration of line segments, globally coinciding with the edges in the image. The advantages for feature extraction are (1) the robustness and accuracy of the proposed gain function, (2) the merging of detection and representation of features, and (3) the easy interpretation of the extracted line features (e.g. for object-based progressive transmission). This edge finding approach is highly homogeneous, which facilitates its incorporation in different image processing applications.
Our new edge finding method
The problem with classical edge finding, when trying to determine the correct amounts of differentiation and smoothing [2],[3], has always been the choice of window size. We cannot take information that is too local, since we need a clear identification of the regions on both sides of the edge - in our case we use line segments instead of points around which we construct our windows - but we cannot use a global method either, since an edge is a localised characteristic and global methods will tend to merge different edge parts. In particular, the use of a fixed window for the whole image is not a good choice, since some parts of the image need a coarse and others a fine detection. This can be described by the Heisenberg uncertainty principle [4],[5]. The optimal solution is the use of windows which are adapted to the actual shapes of the objects appearing in the image. This seems to be circular reasoning, since we want to use these windows to find the objects in the first place. Starting from basic modeling principles, we state the problem as a prediction-verification problem: can we determine the descriptive parameters of the boundary and verify its existence? We can calculate both if we recognise the fact that in a discrete image a boundary can always be faithfully represented as a chain of line segments, solving a first topological problem. We can then postulate (predict) and verify (fig. 1) the existence of a particular line segment S with 4 descriptive parameters, namely the coordinates of its starting point (x0, y0), a length l and a slope α, in the object boundary. This strategy can be automated for most types of image. We propose a one-step solution for all the classical edge-finding problems of calculating the likeliness that a pixel is part of an edge, thinning, linking, and polygonalisation for representation.
Fig. 1: Block diagram of our prediction/verification optimisation approach to edge finding and representation.

We use our edge verification criterion as a gain function which has to be maximised by moving a postulated line segment through the image. When the gain is maximum, the postulated line segment coincides with the edge segment with the same parameters (x0, y0, l, α) in a boundary. The details of the developed prediction/verification approach (fig. 1) will not be elaborated in this paper; they are described in [1]. For clarity and conciseness we will focus on our edge definition and the resulting verification gain function, its properties and its relation to classical approaches.
* The research of Mark Mertens was sponsored by the IWT.
Definition of edges and regions
We define regions as connected sets of pixels having a particular, a priori unknown, statistical distribution of numerical values (e.g. colours, grey values, texture measures...), which we shall simply call "colours". An edge is defined as the 8-connected, single-pixel-width set of pixels "optimally" separating two regions with different distributions. We conjecture that the exact distributions are irrelevant and that we only need to establish a first-order difference criterion to determine an edge, so we have a distribution-free and nonparametric method. Notice that we define the edge as a locus of change, but a change between regions and not between numerical "intensity" values. The problem now is to extract a reasonable number of pixels from the regions on both sides of the edge, so that the separability, expressed by the gain function, can always be considered reliable (fig. 2). This issue is not raised in classical edge finding techniques. Theoretically the maximum gain value will be obtained when the postulated line segment coincides exactly with the edge segment of the object. In practice the maximum could shift a little due to numerical errors or a large amount of noise. We will show, however, that our method is inherently noise insensitive while also giving good localisation, in contrast with methods based on differentiation.
Fig. 2: Rectangular window with fixed width w, associated to the postulated line segment S(x0, y0, l, α).

Gain function
For each postulated line segment, we sample the pixels in its associated window, which gives the possibility to select an optimum set of representative pixels on both sides, and evaluate the following verification gain function:

G(S) = (1/(2n)) Σ_{i ∈ N} | C_i^{R1}(S) − C_i^{R2}(S) |    (1)

The gain G is calculated for a postulated line segment S (shown dashed in fig. 2) as the sum (over all colours i ∈ N in the space of possible colours) of the absolute values of the difference between the number of pixels with colour i in R1 (namely C_i^{R1}(S)) and in R2 (namely C_i^{R2}(S)), normalised by the number of sampled pixels. Hence we obtain a first-order topological dissimilarity measure for the two regions. When the two "colour" distributions have almost no overlap and the postulated line segment coincides with an edge-line segment in the image, the G-value (eq. 1) will be approximately one, corresponding to the maximum possible gain value GM = 1. So all G-values above 1 − ε (where ε is a threshold value that is determined from the noise in the image or specified by the wishes of the user) will be retained as valid object-representing line segments.

Results obtained with our gain function
To illustrate the detection accuracy and the robustness of the function G(S), an image representing a part of a rectangle of grey value 96 on a background of grey value 160, with and without added uniformly distributed noise between plus and minus 64, is used (fig. 3).
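The first-order dissimilarity of eq. (1) can be sketched as a histogram comparison. The function and parameter names (`gain`, `n_colours`) and the normalisation constant are assumptions of this sketch; the paper's window-sampling machinery is replaced by two explicit pixel sets.

```python
import numpy as np

def gain(region1, region2, n_colours=256):
    """First-order dissimilarity between the pixel sets on the two sides
    of a postulated line segment (sketch of the paper's gain function).

    Counts how often each colour occurs in R1 and R2 and sums the
    absolute count differences, normalised so that two equal-size
    samples with non-overlapping colour distributions give G = 1.
    """
    c1 = np.bincount(np.ravel(region1), minlength=n_colours)
    c2 = np.bincount(np.ravel(region2), minlength=n_colours)
    return float(np.abs(c1 - c2).sum()) / (c1.sum() + c2.sum())

# Two regions drawn from non-overlapping distributions: maximum gain.
r1 = np.full(40, 96)    # object side, grey value 96
r2 = np.full(40, 160)   # background side, grey value 160
print(gain(r1, r2))     # 1.0: the postulated line sits on the edge

# Both windows inside the same region: the gain collapses to 0.
print(gain(r1, r1))     # 0.0
```

Because only counts per colour are compared, no parametric model of the two distributions is needed, which is the distribution-free property the text emphasises.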
Fig. 3: Part of a rectangle with (3B) and without (3A) added uniform noise, for examination of the gain estimator (1).

A typical 1D section through the 4D representation manifold (G as a function of x0, y0, l, α) will look like the curve in fig. 4. As shown in fig. 4, four characteristics are of interest to us. Let GM be the maximum theoretically achievable gain (GM = 1), GN the value for the "noise", G0 the optimum gain value, which occurs when the postulated line S coincides with an image edge-line in the noisy image, u the parameter (x0, y0, l, α) value where G0 occurs, and T the true parameter value of the image edge-line. We then define:
- the clearness C, which is the difference between G0 and GN;
- δ = GM − G0;
- the accuracy error A = |u − T|;
- a, the width of the high-value peak.
Note that δ depends on the noise and should not be too big, or the correct detection of the edge-line becomes questionable.
Fig. 4: A typical 1D section through the 4D representation manifold. The parameter is x0, y0, l or α.
a) Gain cross-section obtained by varying the parameter x
Figure 5 shows a scan (Δx = 1) of a vertical line of length l = 20 and associated width w = 2 across the edge. From theoretical arguments we would expect a linear function, since with each step l pixels are moved to the other side (cf. fig. 3) of the postulated line segment. Theoretically, the gain variation G(d) for 0 ...

When N > p + 1, equation (3) represents an overdetermined system of linear equations and hence in general has no exact solution. The usual approach is to find a_p so that the least-squares error |Y_p a_p + y_p|^2 = (Y_p a_p + y_p)^t (Y_p a_p + y_p) is minimised. This minimisation yields

a_p = −(Y_p^t Y_p)^{−1} (Y_p^t y_p)
(4)
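Equation (4) is the normal-equations solution of the least-squares problem Y_p a ≈ −y_p. A sketch (with hypothetical random data, since the DCT vectors are not reproduced here) shows that the closed form agrees with a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical over-determined system: many more rows than unknowns.
Yp = rng.standard_normal((40, 5))
yp = rng.standard_normal(40)

# Closed form of eq. (4): a_p = -(Yp^t Yp)^{-1} (Yp^t yp)
a_closed = -np.linalg.solve(Yp.T @ Yp, Yp.T @ yp)

# Equivalent, numerically better conditioned: least-squares solve.
a_lstsq, *_ = np.linalg.lstsq(Yp, -yp, rcond=None)

print(np.allclose(a_closed, a_lstsq))  # True
```

In practice the QR/SVD-based solver is preferred over forming Y_p^t Y_p explicitly, since squaring the matrix doubles its condition number.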
Thus, to build up an order recursion, we rewrite equation (3) as

Y_{p+k} a_{p+k} = −y_{p+k}    (5)
However, since the DCT coefficients of an ECG beat are zero for large n, the above equation can be conveniently represented in terms of Y_p as

−y_{p+k} = Y_{p,k} a_{p+k} = ( y_{p,k−1}, y_{p,k−2}, …, y_{p,0}, Y_p ) a_{p+k}    (6)
We have used the notation y_{j,k} to denote a vector whose first entry is y_{j+k+1} and which has N − j − 1 elements. Hence, for a given p and k, the data matrix Y_{p,k} = ( y_{p,k−1}, y_{p,k−2}, …, y_{p,0}, Y_p ) is an (N − p − 1) × (k + p) matrix. This data matrix has full column rank and has k + p nonzero singular values. The characteristics of this data matrix are investigated with the help of an example in which we have constructed the matrix using the DCT coefficients of a typical ECG waveform, illustrated in Fig. 1. Fig. 2 shows the profile of the singular values of this matrix Y_{p,k} for three different combinations of p and k. The curves show an identical variation for all the combinations of p and k, so the profile is practically unaffected by the values of either p or k. For devising a criterion for order determination in the framework of SVD, the data matrix Y_{p,k} therefore does not yield any useful information. In order to overcome this problem, we have constructed a reduced-rank approximation of the data matrix by computing a sequence of filtered estimates for the vectors y_{p,k} as
ŷ_{p+1} = ( Y_p ) ( Y_p )^+ y_{p+1}
⋮
ŷ_{p+k} = ( ŷ_{p+k−1}, …, ŷ_{p+1}, Y_p ) ( ŷ_{p+k−1}, …, ŷ_{p+1}, Y_p )^+ y_{p+k}    (7)

where (·)^+ denotes the pseudo-inverse,
so that the reduced-rank matrix Ŷ_{p,k} = ( ŷ_{p+k}, ŷ_{p+k−1}, …, ŷ_{p+1}, Y_p ) is of rank p only. The SVDs of the data matrices yield

Y_{p,k} = U_{p,k} Σ_{p,k} V_{p,k}^t    (8)

and

Ŷ_{p,k} = Û_{p,k} Σ̂_{p,k} V̂_{p,k}^t    (9)

where Σ_{p,k} = diag( σ_{1,k}, σ_{2,k}, …, σ_{p+k,k} ) and Σ̂_{p,k} = diag( σ̂_{1,k}, σ̂_{2,k}, …, σ̂_{p,k}, 0, …, 0 ). The last k entries of Σ̂_{p,k} are zeroes. Fig. 3 shows the profile of the singular values of the matrix Ŷ_{p,k} for the same combinations of p and k as in Fig. 2. Unlike in Fig. 2, the curves in Fig. 3 show a marked dependence on the values of p and k. This indicates that the reduced-rank matrix is a more appropriate tool for solving the order identification problem in the framework of SVD. In order to extract the order information from the singular values of the reduced-rank matrix, we define the matrices Σ_p and Σ̂_p whose j-th columns are given by σ_{p,j} and σ̂_{p,j} respectively, where σ_{p,j} = ( σ_{1,j}, σ_{2,j}, …, σ_{p+k,j} )^t and σ̂_{p,j} = ( σ̂_{1,j}, σ̂_{2,j}, …, σ̂_{p+k,j} )^t. The average energy of the i-th component in the j-th columns of Σ_p and Σ̂_p is given by
E_p(i, j) = σ_{i,j}^2 / Σ_{l=1}^{p+k} σ_{l,j}^2 ,  for …
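The effect of the filtered estimates (7) on the singular-value profile can be sketched as follows. The data are synthetic stand-ins for the DCT-based matrix, so only the rank behaviour, not the ECG values, is illustrated: projecting each new column onto the span of the columns accepted so far adds no new rank, so exactly the last k singular values of the reduced-rank matrix vanish.

```python
import numpy as np

rng = np.random.default_rng(1)

p, k, rows = 4, 3, 30
# Base data matrix whose p columns span the "signal" subspace.
Yp = rng.standard_normal((rows, p))
extra = rng.standard_normal((rows, k))          # new vectors y_{p+1..p+k}

# Filtered estimates (eq. 7): project each new vector onto the span of
# everything accepted so far, so no new rank is introduced.
cols = [Yp]
for j in range(k):
    M = np.hstack(cols)
    y_hat = M @ np.linalg.pinv(M) @ extra[:, j]  # orthogonal projection
    cols.insert(0, y_hat[:, None])               # prepend newest estimate

Y_reduced = np.hstack(cols)                      # (rows, p + k)

s = np.linalg.svd(Y_reduced, compute_uv=False)
print(np.sum(s > 1e-8))   # 4 -> the last k singular values are zero
```

This is the mechanism behind Fig. 3: the break point in the singular-value profile of the reduced-rank matrix now tracks p, which is what makes it usable for order identification.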
… (T > 0) (n = 0, ±1, ±2, …). Further, let f_m(nT) be the sample values of f_m(t) at the sample points nT (n = 0, ±1, ±2, …). Now, we consider the problem of approximating f(t − d) from the above sample values, where d is a given delay time. Let

g(t) = Σ_{m=0}^{M−1} Σ_{n=−∞}^{∞} f_m(nT) φ_{m,n}(t)    (2)

be the corresponding approximation of f(t − d). The functions φ_{m,n}(t) are prescribed bounded real functions called the generalized interpolation functions or, simply, interpolation functions. We assume that these interpolation functions satisfy

| φ_{m,n}(t) | = 0  (t < nT),   | φ_{m,n}(t) | = 0  (t > nT + Δ)    (3)
where Δ > 0 (m = 0, 1, …, M − 1; n = 0, ±1, ±2, …). The approximation error between f(t − d) and g(t) is defined by e(t) = | f(t − d) − g(t) |. Further, let E_max(t) be the upper limit of e(t) obtained by fixing all the interpolation functions φ_{m,n}(t) and changing f(t) over all the signals f(t) in Γ:

E_max(t) = sup_{f(t) ∈ Γ} { e(t) }    (4)

Then, we obtain the following theorem [2].

Theorem 1

E_max(t) = √( ∫_{−∞}^{∞} |W(ω)|^2 | e^{jω(t−d)} − Σ_{m=0}^{M−1} Σ_{n=−∞}^{∞} … φ_{m,n}(t) |^2 dω )    (5)
The proof is omitted. Let φ_{m,n}(t) (m = 0, 1, …, M − 1; n = 0, ±1, ±2, …) be the optimum interpolation functions which minimize E_max(t). Then, it is proved that there exists a set of functions φ_m(t) (m = 0, 1, …, M − 1) satisfying
φ_{m,n}(t) = φ_m(t − nT)    (m = 0, 1, …, M − 1; n = 0, ±1, ±2, …)    (6)
Further, E_max(t) = E_max(t + T) holds for the E_max(t) which uses these optimum interpolation functions. Hence, g(t) is expressed by

g(t) = Σ_{m=0}^{M−1} Σ_{n=−∞}^{∞} f_m(nT) φ_m(t − nT)    (7)
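The shift-invariant form (7) can be illustrated for the single-branch case M = 1. The triangular kernel below is a simple stand-in, not the optimum φ_m of the theorem; it merely shows how samples f(nT) and a causal, finitely supported interpolation function combine into a delayed approximation, here exact for linear signals with d = T.

```python
import numpy as np

T = 1.0                      # sampling period
d = T                        # requested delay
# Causal triangular interpolation function: support (0, 2T), peak at T.

def phi(t):
    """Stand-in interpolation function phi(t) with support (0, 2T)."""
    return np.maximum(0.0, 1.0 - np.abs(t - T) / T)

def g(t, samples, n0=0):
    """Approximation g(t) = sum_n f(nT) phi(t - nT), single branch M = 1."""
    total = np.zeros_like(np.asarray(t, dtype=float))
    for i, fn in enumerate(samples):
        total += fn * phi(t - (n0 + i) * T)
    return total

# For a linear signal f(t) = 2t + 3, this kernel reproduces the delayed
# signal f(t - d) exactly between the samples.
n = np.arange(-5, 15)
samples = 2.0 * n * T + 3.0
t = np.linspace(0.0, 5.0, 11)
err = np.max(np.abs(g(t, samples, n0=-5) - (2.0 * (t - d) + 3.0)))
print(err)    # ~0 (machine precision)
```

The causality and finite-support conditions of eq. (3) correspond to `phi` being zero for t < 0 and t > Δ = 2T, so each output value depends on only two past samples.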
Then, Eq. (3) can be expressed equivalently by | φ_m(t) | = 0 (t < 0, t > Δ) (m = 0, 1, …, M − 1). Now, we assume that the above quantity Δ satisfies
Δ = NT + τ  (N = a non-negative integer; 0 …)

(b) Derive the second prototype lowpass filter P′(z) = Σ_{n=0}^{512} p′_n z^{−n} by

p′_n = Σ_{k=0}^{256} p_{n−k} p_k ,  n = 0, 1, …, 512    (17)
(c) Make the new prototype lowpass filter P*(z) = Σ_{n=0}^{511} p*_n z^{−n} by

p*_n = ( p′_n + p′_{n+1} ) / 2 ,  n = 0, 1, 2, …, 511    (19)
(d) Construct the analysis filter bank H_m(z) = Σ_{n=0}^{511} h_{m,n} z^{−n} using the cosine transformation. The coefficients of the analysis filters are given by h_{m,n} = 2.0 · p*_n · cos{ (π/32)(m + 0.5)(n − 511/2) + (−1)^m π/4 }  (m = 0, 1, 2, …, M − 1).
(e) Obtain the synthesis filter bank by applying the presented optimization. If the attenuation of the synthesis filter is not sufficient, make q_{n,0} = q⁰_n ( b + (1 − b) cos( π(n − 255.5)/(rg · 255.5) ) )  (n = 0, 1, 2, …, 511) for appropriate 0 < b < 1 and 0 < rg < 1,
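The cosine modulation of step (d) can be sketched as follows. The prototype below is a generic windowed sinc, not the p*_n derived in steps (b)-(c), so only the modulation structure is illustrated; the band centres fall at (π/M)(m + 0.5) as the formula dictates.

```python
import numpy as np

M = 32                 # number of paths
L = 512                # filter length (n = 0, ..., 511)
n = np.arange(L)

# Stand-in prototype lowpass filter: windowed sinc with cutoff pi/(2M).
# (The paper derives its prototype differently; this is only for shape.)
p = np.sinc((n - (L - 1) / 2) / (2 * M)) * np.hamming(L)
p /= p.sum()

# Step (d): cosine modulation of the prototype into the analysis bank,
# h_{m,n} = 2 p_n cos{(pi/M)(m + 0.5)(n - (L-1)/2) + (-1)^m pi/4}.
m = np.arange(M)[:, None]
h = 2.0 * p * np.cos(np.pi / M * (m + 0.5) * (n - (L - 1) / 2)
                     + (-1) ** m * np.pi / 4)

print(h.shape)         # (32, 512): one row per analysis filter
```

All 32 filters share the single prototype, which is why the design effort in steps (b)-(g) concentrates on that one lowpass filter.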
where q_{n,0} and q⁰_n are the coefficients of the initial and the derived synthesis filter.
(f) Optimize the analysis and the synthesis filter banks iteratively by the reciprocal relation. That is, the replacements (Ψ_m(z))_{i−1} → (H_m(z))_i and (H_m(z))_{i−1} → (Ψ_m(z))_i are performed iteratively, where (·)_i (i = 1, 2, 3, …) denotes the i-th stage of the iterations for a pair of H_m(z) and Ψ_m(z). Let H̃_m(z) = Σ_{n=0}^{511} h̃_{m,n} z^{−n} be the resultant analysis filters.
(g) Derive the coefficients of the new prototype filter from the relation p^b_n = h̃_{0,n} / [ 2.0 cos{ (π/32) · 0.5 · (n − 511/2.0) + π/4.0 } ]  (n = 0, 1, 2, …, N), where h̃_{0,n} (n = 0, 1, 2, …, 511) are the coefficients of the optimized lowpass analysis filter H̃_0(z).
(h) Make the linear-phase analysis filter bank H^L_m(z) (m = 0, 1, 2, …, M − 1) having the coefficients h^L_{m,n} = 2.0 · p^b_n · cos{ (π/32)(m + 0.5)(n − 511/2.0) }  (m = 0, 2, 4, …, 30) and h^L_{m,n} = 2.0 · p^b_n · sin{ (π/32)(m + 0.5)(n − 511/2.0) }  (m = 1, 3, 5, …, 31), where n = 0, 1, 2, …, 511.
(i) Derive the synthesis filter bank by the presented optimization. Then, as a direct consequence of the symmetrical arrangement of the coefficients of the analysis filters, we can easily prove that all the resultant synthesis filters are linear phase. Assume that Ψ̃_m(z) = Σ_{n=0}^{N} ψ̃_{m,n} z^{−n} (m = 0, 1, 2, …, 31) are the transfer functions of the resultant synthesis filters.
(j) Make new analysis filters defined by H_m(z) = α_m · H^L_m(z) + (1.0 − α_m) · Ψ̃_m(z)  (m = 0, 1, 2, …, 31), where α_m (m = 0, 1, 2, …, 31) are appropriate scaling factors satisfying 0 < α_m < 1. These analysis filters are linear phase as well.
(k) Derive the synthesis filter bank by the presented optimization. Then, by the symmetrical arrangement of the coefficients of these analysis filters, it is shown that all the resultant synthesis filters are also linear phase.
(l) Optimize the analysis and the synthesis filter banks iteratively by the reciprocal relation.
We can obtain an example of a linear-phase filter bank with M = 32 paths and size N + 1 = 512. Before we derive this linear-phase filter bank, we perform 10 iterations. In this example, d = 256 is used. Although the effects of the parameters fc, η and so on are critical, in this example we use approximately fc = 47, η = 1.0, a = 0.52, rg = 1.4 and b = 0.5. The attenuation characteristics of each analysis and synthesis filter are 99 or 100 dB.
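The linear-phase property claimed in steps (h)-(k) rests on the (anti)symmetry of the modulated coefficients about n = 511/2, which is easy to check numerically. A stand-in symmetric window replaces the prototype p^b_n of step (g), since only the symmetry, not the frequency response, matters here.

```python
import numpy as np

L = 512
n = np.arange(L)
c = (L - 1) / 2.0                       # symmetry centre 511/2

# Stand-in symmetric prototype (the paper derives p^b_n in step (g)).
pb = np.hamming(L)

# Step (h): cosine branches for even m, sine branches for odd m.
h_even = 2.0 * pb * np.cos(np.pi / 32 * (0 + 0.5) * (n - c))   # m = 0
h_odd = 2.0 * pb * np.sin(np.pi / 32 * (1 + 0.5) * (n - c))    # m = 1

# Symmetric / antisymmetric FIR impulse response <=> linear phase.
print(np.allclose(h_even, h_even[::-1]))    # True
print(np.allclose(h_odd, -h_odd[::-1]))     # True
```

Since cos is even and sin is odd about the centre, any symmetric prototype yields symmetric even-indexed filters and antisymmetric odd-indexed ones, i.e. type-I and type-III linear-phase FIR filters respectively.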
3 CONCLUSION
Although the details are omitted, it should be noted that the proposed generalized interpolatory approximation has the minimum measure of error, in a certain sense, among all the linear and nonlinear approximations using the same sample values of the signal. The presented design gives a simple way to obtain the optimum analysis/synthesis filter banks. Finally, we would like to express our sincere thanks to Professor B. G. Mertzios, Democritus University, Greece.
References
[1] P. P. Vaidyanathan, Multirate Systems and Filter Banks, PTR Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1993.
[2] T. Kida, L. P. Yoshioka, S. Takahashi and H. Kaneda, "Theory on Extended Form of Interpolatory Approximation of Multi-dimensional Waves," Elec. and Comm. Japan, Part 3, Vol. 75, No. 4, pp. 26-34, 1992. Also, Trans. IEICE, Japan, Vol. 74-A, No. 6, pp. 829-839, 1991 (in Japanese).
[3] Takuro Kida, "The Optimum Approximation of Multi-dimensional Signals Based on the Quantized Sample Values of Transformed Signals," submitted to IEICE Trans. E77-A, 1994.
Robustness of Multirate Filter Banks
F. N. Koumboulis¹, M. G. Skarpetis², and B. G. Mertzios³
¹ University of Thessaly, School of Technological Sciences, Dept. of Mech. & Industrial Eng., Volos, Greece. Mailing address: 53 Aftokratoros Irakliou St., P.C. 15122, Athens, Greece, tel. +30-1-8023050, e-mail: [email protected]
² National Technical University of Athens, Dept. of Electrical and Comp. Eng., Div. of Electroscience, Greece. e-mail: [email protected]
³ Democritus University of Thrace, Dept. of Electrical and Computer Eng., 67100 Xanthi, Greece. Fax: +30-541-26947 or 26473, e-mail: [email protected]
Abstract
The problem of designing nonmaximally decimated multirate filter banks is studied for the case of uncertain dynamic channels. The necessary and sufficient condition for the existence of an appropriate FIR and anticausal synthesis filter bank yielding perfect signal reconstruction, in spite of the channel's uncertainties, is established. The condition depends entirely upon the polyphase analysis matrix and the z-transform of the dynamic description of the uncertain channel. The general analytic expression of all polyphase synthesis matrices solving the problem is derived.

1. Introduction
The problem of designing multirate filter banks is an important signal processing design problem from both the theoretical and the practical point of view [1],[2]. The problem has attracted considerable attention and has been studied for different types of analysis and synthesis banks (see e.g. [1]-[5]), as well as using maximally decimated [6]-[7] or nonmaximally decimated filter banks [8]. Here, we are interested in one of the main objectives of the problem, namely that of perfectly reconstructing the input signal [6],[7]. Motivated by many practical cases where the channel's behavior is not ideal, the case where the channel is described as a dynamic uncertain system is studied. The filter bank is considered to be nonmaximally decimated (number of channels greater than the decimation ratio). In particular, the necessary and sufficient condition for the existence of an appropriate FIR and anticausal synthesis filter bank achieving perfect reconstruction of the input signal, in spite of the channel's uncertainties, is established. The condition depends entirely upon the polyphase analysis matrix and the z-transform of the dynamic description of the uncertain channel. The general class of all polyphase synthesis matrices solving the problem, independent of the channel's uncertainties, is derived.

2. Problem Formulation
Consider the nonmaximally decimated filter bank presented in Figure 1.
Fig. 1: Nonmaximally decimated filter bank

Here p > M, where M is the decimation ratio and p is the number of channels. H_j(z) (j = 0, …, p − 1) are the analysis filters and F_j(z) (j = 0, …, p − 1) are the synthesis filters. The signal x(n) is the input signal while x̂(n) is the output signal. The design objective is to find an appropriate synthesis bank, namely appropriate filters F_0(z), F_1(z), …, F_{p−1}(z), such that x̂(n) = x(n) (perfect reconstruction). Using the polyphase representation (Fig. 2) of the filter bank, the design objective is translated as follows: find an appropriate polyphase matrix R(z) of the synthesis bank such that R(z)E(z) = I_M, where E(z) is the polyphase matrix of the analysis bank. R(z) and E(z) are of dimensions M × p and p × M, respectively. Even though the most standard type of filter bank is the maximally decimated one, i.e. p = M, nonmaximally decimated filter banks appear to have many applications, especially in convolutional codes [8]. Here, the nonmaximally decimated filter banks are used in order to compensate the errors appearing in the filter bank output x̂(n) due to uncertainties of the transmission channels. The perfect transmission of a signal via a channel is an ideal situation which facilitates the solution of the respective filtering problem. The behavior of a channel is determined by characteristics of the medium, properties of the signal, as well as external events. Similarly to any other physical system, a channel can be considered to have dynamic behavior. For example, consider a wired high-frequency transmission line. If the length of the transmission line is much less than the wavelength of the signal, the transmission line is described as a static system [9]. If the length of the line is about equal to, or k times greater (with k small) than, the wavelength of
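The design objective R(z)E(z) = I_M can be sketched for a toy delay-free bank. The matrix E below is an invented example, chosen only so that p > M and E has full column rank; it is not one of the paper's filter banks.

```python
import numpy as np

M, p = 2, 3            # decimation ratio and number of channels (p > M)

# Toy constant polyphase analysis matrix E (p x M), full column rank.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Any left inverse R (M x p) with R E = I_M gives perfect reconstruction;
# the Moore-Penrose pseudo-inverse is one member of the solution family.
R = np.linalg.pinv(E)
print(np.allclose(R @ E, np.eye(M)))    # True

# The extra p - M channels leave design freedom: R + T K also works for
# any T, where the rows of K span the left null space of E.
K = np.array([[1.0, 1.0, -1.0]])        # K E = 0
print(np.allclose((R + 0.7 * K) @ E, np.eye(M)))   # True
```

The non-uniqueness shown by the K term is exactly the redundancy the paper later spends on compensating the uncertain channel dynamics.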
Fig. 2: Polyphase representation of a nonmaximally decimated filter bank

the signal, the channel behaves as a dynamical system [9]. For a sufficiently long transmission line, the channel behaves as a distributed parameter system [9]. The values of the parameters of the three types of systems described above, i.e. the values of the parameters of the dynamics of a channel, depend upon other physical parameters, e.g. temperature, magnetization. In many cases these physical parameters are not known with full accuracy. So they can be considered as uncertainties, and consequently the channel is described by a dynamic uncertain model. In general, the parameters of the dynamic model are nonlinear functions of the uncertainties (e.g. the dependence of resistance upon temperature). In this paper, the problem of designing multirate filter banks is studied for the case where the channel has a dynamic uncertain behavior. To preserve generality, the channel is assumed to be affected by l uncertainties, let q_1, …, q_l, while the dynamics of the i-th channel are assumed to be described by the transfer function d_i(z, q) (with q = [q_1, …, q_l] ∈ Q: the uncertainty domain). In particular, the problem is built around the nonmaximally decimated filter bank with uncertain dynamic channels presented in Fig. 3,
Fig. 3: Nonmaximally decimated filter bank with uncertain dynamic channels

or equivalently by the polyphase representation given in Fig. 4:
Fig. 4: Polyphase representation of a nonmaximally decimated filter bank with uncertain dynamic channels

The design objective is to find a polyphase synthesis matrix R(z) which will eliminate not only the influence of the polyphase analysis matrix E(z) on the output signal x̂(n) but also the influence of the dynamics of the uncertain channel. Hence, the problem consists in finding an R(z) such that

R(z) diag_{j=0,…,p−1}{ d_j(z, q) } E(z) = I_M    (2.2)
The dynamics of different channels are in general considered to be different. This can easily be understood after recalling the fact that different signals travel through the channels (linearized dynamics), as well as that in many practical cases (e.g. encryption [10]) different media with different characteristics are often used.
With regard to E(z), or equivalently with regard to the analysis filters H_j(z), j = 0, …, p − 1, no limitations are imposed except that of causality. The polyphase matrix R(z) is considered to be anticausal and FIR ([6]-[7]), thus corresponding to anticausal and FIR filters F_j(z), j = 0, …, p − 1.

3. Solution of the Problem
Define

B(z, q) = diag_{j=0,…,p−1}{ d_j(z, q) } E(z)    (3.1)

Based upon the above definition, equation (2.2) takes on the form

R(z) B(z, q) = I_M    (3.2)

As already mentioned, E(z) is causal. The channel (a deterministic system) is considered to be causal. So the rational matrix B(z, q) (rational with respect to z) is causal and thus can be expressed in polynomial ratio form as follows:

B(z, q) = [ B_n(q) z^n + B_{n−1}(q) z^{n−1} + ⋯ + B_0(q) z^0 ] / [ z^n + b_{n−1}(q) z^{n−1} + ⋯ + b_0(q) z^0 ]    (3.3)

where B_j(q) ∈ [ F(q) ]^{p×M} and b_j(q) ∈ F(q) are nonlinear functions of the uncertainty vector q (with F(q) the set of real functions of q). The integer n represents an upper bound of the realization degree of B(z, q). As already mentioned, the polyphase synthesis matrix is considered to be FIR and anticausal, i.e. of the form

R(z) = R_0 z^0 + R_1 z^1 + ⋯ + R_m z^m    (3.4)

where m is the maximum number of advances. Substitution of (3.3) and (3.4) into (3.2) yields

[ R_0 z^0 + R_1 z^1 + ⋯ + R_m z^m ][ B_n(q) z^n + B_{n−1}(q) z^{n−1} + ⋯ + B_0(q) z^0 ] = [ z^n + b_{n−1}(q) z^{n−1} + ⋯ + b_0(q) z^0 ] I_M    (3.5)

Equating like powers of z on both sides of equation (3.5), defining B_j(q) = 0 for j < 0, and defining

B_E(q) =
[ B_n(q)  B_{n−1}(q)  ⋯  B_{n−m}(q)     ⋯   0       0
  0       B_n(q)      ⋯  B_{n−m+1}(q)   ⋯   0       0
  ⋮                   ⋱                              ⋮
  0       0           ⋯  B_n(q)  B_{n−1}(q)  ⋯  B_1(q)  B_0(q) ]    (3.6a)

B_R(q) = [ 0  0  ⋯  0  I_M  b_{n−1}(q) I_M  ⋯  b_0(q) I_M ]    (3.6b)

R_E = [ R_m  R_{m−1}  ⋯  R_1  R_0 ]    (3.6c)

the equation (3.5) can be expressed more compactly as the following algebraic equation:

R_E B_E(q) = B_R(q)    (3.7)

Equation (3.7) is linear, with knowns depending upon the uncertainties and an unknown R_E which does not depend upon the uncertainties. According to the Appendix (see relations (A.6)-(A.7)), equation (3.7) is solvable if and only if

rank_ℜ [ B_E(q) ; B_R(q) ] = rank_ℜ [ B_E(q) ]    (3.8)
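A toy numerical instance of eq. (3.7) (not the paper's rank-based construction, which works symbolically over the uncertainty domain) can be sketched by sampling q and solving for an uncertainty-independent R in least squares. The channel column B(q) below is invented for the example, with M = 1, p = 2 and a constant (m = 0) synthesis matrix.

```python
import numpy as np

# Toy instance: R = [r1 r2], channel polyphase column B(q) = [1 + q, q]^T.
# We look for an R independent of q with R B(q) = 1 for every q in Q.
q_samples = np.linspace(-0.5, 0.5, 11)

# Stack one linear equation per uncertainty sample; R is the unknown.
A = np.array([[1.0 + q, q] for q in q_samples])     # rows: B(q)^T
b = np.ones(len(q_samples))

R, residual, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(R, 6))                                # [ 1. -1.]

# Verify perfect reconstruction for an unseen uncertainty value.
q = 0.377
print(np.isclose(R @ np.array([1.0 + q, q]), 1.0))   # True
```

Here R B(q) = r1(1 + q) + r2 q = r1 + (r1 + r2) q, so the condition holds for all q exactly when r1 = 1 and r1 + r2 = 0, which is the solution the sampled system recovers; when no uncertainty-independent solution exists, the least-squares residual stays strictly positive, mirroring a failure of condition (3.8).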
If the condition (3.8) is satisfied, then according to (A.8) the general solution of equation (3.7) is

R_E = T [ B_E(q) ]^⊥ + ( B_R(q) / B_E(q) )^+    (3.9)

where T is an arbitrary matrix. From (3.9) and (3.10) the following theorems are derived.

Theorem 3.1: For the multirate filter bank of Fig. 4, there exists an anticausal and FIR polyphase synthesis matrix R(z) of order m, yielding perfect reconstruction of x(n), i.e. x̂(n) = x(n), in spite of the channel's uncertain dynamics, if and only if condition (3.8) is satisfied.

Theorem 3.2: For the multirate filter bank of Fig. 4, the general form of the anticausal FIR polyphase synthesis matrix R(z) of order m, yielding perfect reconstruction of x(n) in spite of the channel's uncertain dynamics, is

R(z) = T R_a(z) + R_b(z)    (3.11)

where R_b(z) = R_{b0} z^0 + ⋯ + R_{bm} z^m and R_a(z) = R_{a0} z^0 + ⋯ + R_{am} z^m, and where [ R_{b0} ⋯ R_{bm} ] = ( B_R(q) / B_E(q) )^+ and [ R_{a0} ⋯ R_{am} ] = [ B_E(q) ]^⊥. The matrix T is an arbitrary matrix.

Based upon Theorem 3.2 and the relation between the polyphase synthesis matrix and the respective synthesis filter bank [1]-[2], the general form of the synthesis filters F_j(z) (j = 0, …, p − 1) can easily be derived.

6. Conclusions
The problem of designing nonmaximally decimated multirate filter banks has been studied for the case of uncertain dynamic channels. The necessary and sufficient condition for the existence of an appropriate FIR and anticausal synthesis filter bank yielding perfect signal reconstruction, in spite of the channel's uncertainties, has been established (Theorem 3.1). The general analytic expression of all fixed-order polyphase synthesis matrices solving the problem has been derived (Theorem 3.2). Many aspects regarding the problem remain to be solved, e.g. the minimal order of the polyphase synthesis matrix solving the problem. The application of the present results to the case of high-frequency transmission lines, yielding an RLGC channel model, is currently under completion.

Appendix
Here some useful mathematical definitions and properties, introduced in [1 1], are presented. Consider the row vector set {wl(q) ..... wv(q)}, where wi(q) e [ gs(q)] l• (i = 1..... v) is a nonlinear vector map Q ~ [(Ca (q)]l• vectors wi(q) (i = 1..... v)are said to be linearly dependent among themselves over 9t, if there exist x~ e 910 = 1..... v) with (x~ ..... xv) r 0 such t h a t x l w l ( q ) + . . . +x~wv(q) =0, Vq e Q. If the vectors w~(q) are not dependent over 91 they are called independent over 91. Consider the subset N(q)c_[8o (q)]l• where N(q)= { w ( q ) e [f0(q)] 1•
w(q) = x_1 w_1(q) + ... + x_v w_v(q), ∀q ∈ Q, x_i ∈ ℜ (i = 1, ..., v) }. It can readily be shown that N(q) is a finite-dimensional vector space over the field of real numbers ℜ. Consider the matrix W(q) = [[w_1(q)]^T, ..., [w_v(q)]^T]^T. The image of W(q) over ℜ is defined to be Im_ℜ{W(q)} = N(q). Let w_{f_1}(q), ..., w_{f_f̂}(q) be the linearly independent (over ℜ) vectors of {w_1(q), ..., w_v(q)}. The rest of the vectors, let w_{o_1}(q), ..., w_{o_{v−f̂}}(q), are linearly dependent (over ℜ) upon the vectors {w_{f_1}(q), ..., w_{f_f̂}(q)}. Thus, {w_{f_1}(q), ..., w_{f_f̂}(q)} is a base of Im_ℜ{W(q)} and the dimension dim{Im_ℜ{W(q)}} of the space Im_ℜ{W(q)} is equal to f̂. The rank (over the field of real numbers) of W(q) is defined as follows:

rank_ℜ{W(q)} = dim{N(q)} = dim{Im_ℜ{W(q)}}   (A.1)

Consider the following subset of ℜ^v: Ẑ = { z = [z_1, ..., z_v] ∈ ℜ^v : z_1 w_1(q) + ... + z_v w_v(q) = 0, ∀q ∈ Q }. The above subset is a subspace of ℜ^v. The kernel of W(q) over ℜ is defined to be Ker_ℜ{W(q)} = Ẑ. Note that

dim{Ker_ℜ{W(q)}} + dim{Im_ℜ{W(q)}} = v   (A.2)

To derive the independent-of-q matrix corresponding to Ker_ℜ{W(q)}, define {z_1^w, ..., z_{v−f̂}^w} to be a base of Ker_ℜ{W(q)} (z_i^w ∈ ℜ^v). Then, the matrix corresponding to Ker_ℜ{W(q)} is
[W(q)]^⊥_ℜ = [(z_1^w)^T, ..., (z_{v−f̂}^w)^T]^T   (A.3)
Let e(q) ∈ Im_ℜ{W(q)}. Thus, e(q) ∈ Im_ℜ{W_1(q)}, where W_1(q) = [[w_{f_1}(q)]^T, ..., [w_{f_f̂}(q)]^T]^T. Since the rows of W_1(q) are linearly independent over ℜ, there exists a unique vector, let x^+ ∈ ℜ^{1×f̂}, such that e(q) = x^+ W_1(q). The elements of x^+ are the components of e(q) in Im_ℜ{W(q)}, with respect to the base {w_{f_1}(q), ..., w_{f_f̂}(q)}. Augment x^+ with zero elements placed at positions corresponding to the linearly dependent (over ℜ) rows of W(q). Based on this augmentation, the following generalization of the components of e(q) in Im_ℜ{W(q)} is derived:
(e(q) | W(q))_ℜ = [χ_1^+, ..., χ_v^+]   (A.4)

where χ_k^+ = x_r^+ if k = f_r ∈ {f_1, ..., f_f̂}, and χ_k^+ = 0 if k = o_r ∈ {o_1, ..., o_{v−f̂}} (k = 1, ..., v), and where x_r^+ is the r-th element of x^+. If the vectors w_{f_1}(q), ..., w_{f_f̂}(q) are selected by searching the vectors w_1(q), ..., w_v(q) from the first to the last, then the matrix W_1(q) and the vector (e(q) | W(q))_ℜ are uniquely determined. Let E(q) = [[e_1(q)]^T, ..., [e_{v*}(q)]^T]^T be a v* × p matrix with e_i(q) ∈ Im_ℜ{W(q)}, i = 1, ..., v*. Then, (A.4) can be generalized as follows:

(E(q) | W(q))_ℜ = [[(e_1(q) | W(q))_ℜ]^T, ..., [(e_{v*}(q) | W(q))_ℜ]^T]^T   (A.5)
Numerical algorithms for the computation of all the above definitions can be found in [12]. In what follows, the solution of a linear non-homogeneous algebraic matrix equation with data in 𝒞(q) and unknowns in ℜ, derived in [11], is presented. Consider the equation

X W(q) = E(q),  X ∈ ℜ^{v*×v}   (A.6)

The matrices E(q) and W(q) are known nonlinear maps of q. The problem consists in finding X such that (A.6) is satisfied. Clearly, the problem is solvable if and only if E(q) ∈ Im_ℜ{W(q)}, or equivalently (from (A.1)) if and only if
rank_ℜ{W(q)} = rank_ℜ{[W(q)^T  E(q)^T]^T}   (A.7)

If condition (A.7) is satisfied then, according to (A.3) and (A.4), the general solution of (A.6) is

X = T[W(q)]^⊥_ℜ + (E(q) | W(q))_ℜ   (A.8)

where T is an arbitrary matrix. Note that T, (E(q) | W(q))_ℜ and [W(q)]^⊥_ℜ are independent of q.

References
[1] Vaidyanathan, P.P., 1993, Multirate Systems and Filter Banks, Englewood Cliffs, NJ: Prentice-Hall. [2] Crochiere, R.E. and Rabiner, L.R., 1983, Multirate Digital Signal Processing, Englewood Cliffs, NJ: Prentice-Hall. [3] Vaidyanathan, P.P., 1990, Multirate digital filters, filter banks, polyphase networks, and applications: A tutorial, Proc. IEEE, vol. 78, pp. 56-93. [4] Vetterli, M., 1987, A theory of multirate filter banks, IEEE Trans. Acoust. Speech Signal Processing, vol. 35, pp. 356-372. [5] Smith, M.J.T. and Barnwell, T.P., III, 1987, A new filter-bank theory for time-frequency representation, IEEE Trans. Acoust. Speech Signal Processing, vol. 35, pp. 314-327. [6] Vaidyanathan, P.P. and Chen, T., 1995, Role of anticausal inverses in multirate filter-banks--Part I: System theoretic fundamentals, IEEE Trans. Signal Processing, vol. 43, pp. 1090-1102. [7] Vaidyanathan, P.P. and Chen, T., 1995, Role of anticausal inverses in multirate filter-banks--Part II: The FIR case, factorizations, and biorthogonal lapped transforms, IEEE Trans. Signal Processing, vol. 43, pp. 1103-1115. [8] Forney, G.D. Jr., 1970, Convolutional codes I: Algebraic structure, IEEE Trans. Info. Theory, vol. 16, pp. 720-738. [9] Combes, P.F., 1990, Microwave Transmission for Telecommunications, New York: Wiley. [10] Schneier, B., 1994, Applied Cryptography, New York: Wiley. [11] Koumboulis, F.N. and Skarpetis, M.G., Input-output decoupling for systems with nonlinear uncertain structure, J. Franklin Inst., in press. [12] Koumboulis, F.N. and Skarpetis, M.G., Robust triangular decoupling with application to 4WS cars, submitted.
Designing and Learning Algorithm of Neural Networks for Pattern Recognition

Hiroki TAKAHASHI, Masayuki NAKAJIMA
Graduate School of Information Science & Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152, Japan

Abstract

In the case of pattern recognition using neural networks, it is very difficult for researchers or users to design them. In this paper, a method of learning and designing feedforward neural networks is discussed. In the proposed method, a neural network is regarded as one individual, and neural networks whose structures are the same are regarded as one species. These networks are evaluated by grade of training, and they evolve according to an evolution rule proposed in this paper. The designing and training of neural networks which perform handwritten KATAKANA recognition are described and the efficiency of the proposed method is discussed.
1
Introduction
There are many studies on neural network models which have the function of learning. However, it is not clear how the signal processing is performed in neural networks, because the non-linear units function in parallel. In the case of designing neural networks which perform pattern recognition, researchers design the network structures and learning parameters, such as the learning rate, the coefficient of the momentum term and so on, by trial and error based on their knowledge and experience. Especially in the case of character recognition with neural networks, it is very difficult for researchers to design by trial and error, because the network is large and it takes a long time to confirm the performance of the network. There are many studies on designing neural networks [1][2][3][4][5]. These researches are classified into two kinds of approaches. One is the direct encoding method [3] and the other is the grammatical encoding method [1][2]. The direct encoding method has some restrictions on neural network structures because the network connectivities are encoded into a matrix directly. The grammatical encoding method is more flexible than the direct method. However, it is difficult to obtain an optimal network structure. Moreover, a structural evolution method is proposed in [4]. The method enables the generation of any kind of neural network structure, but the connections have only three kinds of connection weights. Therefore, it is difficult to generate networks for complex pattern recognition. The authors proposed a method of designing optimal neural network structures using GA (Genetic Algorithms) [6][7]. Moreover, we also designed and trained neural network structures which classified some simple patterns [8]. In this paper, a method of learning and designing feedforward neural network structures based on an evolutional method is discussed. In the proposed method, a neural network is regarded as one individual, and neural networks whose structures are the same are regarded as one species.
The decision of network structures and the training of neural networks are estimated based on fitness values of individuals and species. The designing and training of neural networks which perform handwritten character recognition are described and the efficiency of the proposed method is discussed.
2
Genotype coding

Table 1 shows the genotype codings employed in the proposed method.

Table 1: Genotype codings of neural network.
genotype1: N — number of neural network layers
genotype2: n = (n_1, ..., n_N) — unit numbers of each layer
genotype3: η — learning rate
genotype4: w = (w_1, ..., w_l) — connection weights
Genotype1 and 2 present the structure of the neural network. The length of genotype4 is based on the number of weights l, which is restricted by the network structure given in genotype1 and 2. The length l of genotype4 is given by the following formula:

l = Σ_{k=1}^{N−1} (n_k + 1) n_{k+1}   (1)

where l gives the number of connection weights (the +1 term accounts for each layer's bias unit).
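Eq. (1) can be checked with a few lines of Python (a sketch; the function name is ours and `layer_sizes` plays the role of genotype2):

```python
def genotype4_length(layer_sizes):
    """Number of connection weights l for a feedforward net whose layer
    sizes (genotype2) are n_1, ..., n_N; the '+1' accounts for each
    layer's bias unit, as in Eq. (1)."""
    return sum((n_k + 1) * n_next
               for n_k, n_next in zip(layer_sizes, layer_sizes[1:]))

# e.g. a 3-layer net with 4 input, 5 hidden and 2 output units:
print(genotype4_length([4, 5, 2]))  # (4+1)*5 + (5+1)*2 = 37
```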
3
Definition of species

The training of neural networks is a minimum search problem in error-weight space. If the structures of the neural networks are different, the shapes of the error spaces become different; therefore, it is difficult to compare the search positions of neural networks of different structures. In this paper, the individuals whose neural network structures are the same are defined as the same species. That is, individuals with the same phenotypes, as represented by genotype1 and 2 shown in Table 1, are regarded as the same species.
4
Definition of evolution rule of individuals and evaluation
In this section, operations between the individuals in the same species are described. The operation described here is performed every 10 epochs; in the other epochs, the weights represented by genotype4 are changed according to the direction of gradient descent in weight space.

1) Evaluation
The fitness value f(I_i) of individual I_i is defined by the M.S.E. (mean square error) calculated from the output values of the network and its target values. Therefore, the smaller the fitness value is, the more superior the individual:

f(I_i) = Σ_{p=1}^{N_p} Σ_{k=1}^{n_N} (o_{pk}^N − t_{pk})²   (2)

N: number of layers
N_p: number of patterns
n_N: unit number of the N-th layer
o_{pk}^N: output value of the k-th unit of the N-th layer for pattern p
t_{pk}: target value of the k-th unit for pattern p
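The fitness of Eq. (2) can be sketched directly (a hedged illustration; the two arrays hold o_pk and t_pk for all patterns, and the function name is ours):

```python
import numpy as np

def fitness(outputs, targets):
    """Fitness f(I_i) of an individual: summed squared error between the
    network outputs o_pk and the targets t_pk over all patterns p and all
    output units k, as in Eq. (2); smaller means fitter."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.sum((outputs - targets) ** 2))

# Two patterns, two output units:
print(fitness([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 1.0]]))  # 0.25
```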
2) Selection
In the same species, individuals with large fitness values are discarded and new individuals are created according to a selection ratio P_s.
3) Crossover and Multiplication
In this method, two kinds of generating operations are defined. One is the crossover operation and the other is the multiplication operation. The crossover operation generates a new individual from two different individuals. The new individual inherits the features of its parents. Therefore, we employ a one-point crossover operation at crossover ratio P_c. The multiplication operation generates a new individual from one individual. In the multiplication operation, new individuals are multiplied according to the following formulas, in order to create them distributed near the superior individual I_s, at multiplication ratio P_i.
w_new = w_s + 0.4 × r(w_s)   (3)
η_new = η_s + 0.4 × r(η_s)   (4)
w_s: connection weights of individual I_s; η_s: learning parameter of individual I_s; w_new: connection weights of individual I_new; η_new: learning parameter of individual I_new. The function r(a) produces random numbers x in the range −|a| < x ≤ |a|.

(m = 0, 1, ..., M−1; n = 0, ±1, ±2, ...). Further, let R_t be the integer satisfying R_t T ≤ t < (R_t + 1)T. Then, we consider the following two intervals defined by
I_t^1 = {t | R_t T ≤ t < R_t T + τ},  I_t^2 = {t | R_t T + τ ≤ t < (R_t + 1)T}   (9)

From Eq. (9), we assume that the interpolation functions satisfy φ_{m,n}(t) = φ_{m,n}(t, t); when n does not satisfy R_t − Q_t ≤ n ≤ R_t, the interpolation functions φ_{m,n}(t) = φ_{m,n}(t, t) vanish. Hence, it is necessary to confirm that this condition does not contradict the constraint shown in Eq. (5). Now, recall that R_t T ≤ t < (R_t + 1)T. Further, we consider the range of t satisfying R_t − Q_t ≤ n ≤ R_t for a given integer n. If t = nT, then R_t = n holds and this gives the minimum value of t. If nT + NT ≤ t < nT + NT + τ, then R_t = n + N = n + Q_t holds and t ∈ I_t^1 = {t | R_t T ≤ t < R_t T + τ} for this R_t. In this case, n = R_t − Q_t holds, which gives the supreme value of t = nT + NT + τ. When t is in the range nT + (N−1)T + τ ≤ t < nT + NT, R_t = n + (N−1) = n + Q_t holds and t ∈ I_t^2 = {t | R_t T + τ ≤ t < (R_t + 1)T} for this R_t. In this case, n = R_t − Q_t holds also. But the supreme value of t, that is (R_t + 1)T = (n + N)T, is not larger than the previous supreme value t = nT + NT + τ. In conclusion, the interpolation functions φ_{m,n}(t) = φ_{m,n}(t, t) have meaningful values in nT ≤ t < nT + NT + τ, which does not contradict the constraint Eq. (5). As shown in [2], for a given t, we have
E_max(t)² = (1/2π) ∫_{−π}^{π} |W(ω)|² | e^{jω(t−d)} − Σ_{m=0}^{M−1} Σ_{n=R_t−Q_t}^{R_t} φ_{m,n}(t) H_m(ω) e^{jωnT} |² dω,  t ∈ I_t^k (k = 1 or 2)   (10), (11)
Let Ω_t be the set of pairs (m, n) composed of m and n satisfying m = 0, 1, ..., M−1 and n = R_t − Q_t, R_t − (Q_t − 1), R_t − (Q_t − 2), ..., R_t. Minimizing Eq. (11) is straightforward, as is shown in [2]. Firstly, we expand E_max(t)² with respect to the φ_{m,n}(t) under consideration, simply differentiate E_max(t)² with respect to the complex conjugates of the interpolation functions φ_{m,n}(t) which actually contribute to the approximation at the prescribed t, and set the resulting formulas to zero, that is, ∂E_max(t)²/∂φ̄_{m,n}(t) = 0, where m = 0, 1, ..., M−1 and n = R_t − Q_t, R_t − (Q_t − 1), R_t − (Q_t − 2), ..., R_t. Further, let φ̂_{m,n}(t) (m = 0, 1, ..., M−1; n = 0, ±1, ±2, ...) be the optimum interpolation functions which minimize E_max(t). Then, it is proved that there exists a set of functions φ_m(t) (m = 0, 1, ..., M−1) satisfying φ̂_{m,n}(t) = φ_m(t − nT) (m = 0, 1, ..., M−1; n = 0, ±1, ±2, ...). Moreover, E_max(t) = E_max(t + T) holds for the E_max(t) which uses these optimum interpolation functions. Hence, g(t) is expressed by

g(t) = Σ_{m=0}^{M−1} Σ_{n=−∞}^{∞} f_m(nT) φ_m(t − nT)

Then, Eq. (5) can be expressed equivalently by |φ_m(t)| = 0 (t < 0, t ≥ NT + τ) (m = 0, 1, ..., M−1). Using these relations, if we perform the above operation for t satisfying 0 ≤ t < T, we can obtain all the functional forms of the interpolation functions.
2
THE OPTIMUM APPROXIMATION

Let τ and u be a pair of time and frequency variables. Now, we extend the above discussion. Let g(τ) = v[{f_m(nT)}; τ] be a linear/nonlinear approximation to f(τ). We assume that g(τ) uses the sample values f_m(nT) (m = 0, 1, 2, ..., M−1; n = R_t − Q_t, R_t − (Q_t − 1), R_t − (Q_t − 2), ..., R_t) when τ is equal to t. We assume that v[{f_m(nT)}; τ] vanishes when τ = t holds and all the f_m(nT) (m = 0, 1, 2, ..., M−1; n = R_t − Q_t, R_t − (Q_t − 1), ..., R_t) are zero. For arbitrary f(τ) in Γ, we assume that there exist f(τ, t) and g(τ, t) satisfying f(t) = f(t, t) and g(t) = g(t, t). Since the error e(τ) = f(τ − d) − g(τ) depends on the signal f(τ, t), we express the error as e(τ) = e[f(τ), τ]. We denote by d[e(τ)] a function/a functional/an operator of e(τ). We assume that d[e(τ)] has a non-negative value. Moreover, let Θ be a subset in the set of signals Γ. Then, consider the following measure of error E_Θ(τ) for a signal f(τ) ∈ Θ:

E_Θ(τ) = sup_{f(τ)∈Θ} {d[e(τ)]}

With respect to E_Θ(τ), we assume naturally that E_{Θ'}(τ) ≤ E_Θ(τ) holds for every set of signals Θ' satisfying Θ' ⊆ Θ. Further, let E(τ) = E_Γ(τ) be the objective measure of error to be minimized. We consider a new inner product and norm such as

(B(u), C(u))_0 = (2π)^{−1} ∫_{−∞}^{∞} |W(u)|² B(u) C̄(u) du,  ||C(u)||_0 = ( (2π)^{−1} ∫_{−∞}^{∞} |W(u)|² |C(u)|² du )^{1/2}   (3)
where φ is a scaling function satisfying the dilation equation [8]. We get the recursive relation:

c(k, x, y) = Σ_{n,m} h(n, m) c(k − 1, x + 2^{k−1} n, y + 2^{k−1} m)   (4)
where h is the wavelet filter corresponding to the φ function. The wavelet function is usually a cubic B-spline, but many other functions can be adapted. Then, the wavelet coefficients w(k, x, y) at scale k (k = 1, ..., N) correspond to the difference between two successive approximations c(k − 1, x, y) and c(k, x, y) of the image r(x, y):
w(k, x, y) = c(k − 1, x, y) − c(k, x, y),  k = 1, ..., N   (5)
where c(0, x, y) is equal to the image r(x, y). The image restoration is based on the analysis of the statistical significance of the w(k, x, y) coefficients [9]. Taking into account the distribution law of the coefficients w(k, x, y), non-significant coefficients are rejected. In the case of our images, it is not possible to obtain a single variance value for each wavelet plane, since the variance is spatially non-uniform in the image. We have modified the algorithm to take this heterogeneous variance into account. If the w(k, x, y) coefficients of plane k are related to the coefficients c(0, x, y) of plane 0 (i.e. to r(x, y) itself) by a filter g such that:
w(k, x, y) = Σ_{n,m} g_k(n, m) c(0, x + n, y + m)   (6)

then the variance σ_w²(k, x, y) of the w(k, x, y) coefficients can be easily obtained from the variance σ²(x, y) of r(x, y):

σ_w²(k, x, y) = Σ_{n,m} g_k²(n, m) σ²(x + n, y + m)   (7)
Every coefficient w(k, x, y) in each plane k is then tested against its standard deviation σ_w(k, x, y) obtained from Eq. (7). It is considered significant if it is larger than n σ_w(k, x, y), where n depends on the chosen significance probability. The reconstruction of the restored image is obtained by adding together the planes of only the significant w(k, x, y) coefficients and the last smoothed plane, c(N, x, y).
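The pixel-level test and the reconstruction can be sketched as follows (hypothetical function and variable names; `sigma_w` is the per-plane, per-pixel standard-deviation map of Eq. (7)):

```python
import numpy as np

def restore(planes, c_last, sigma_w, n=3):
    """Keep only the wavelet coefficients with |w(k,x,y)| > n*sigma_w(k,x,y),
    then add the surviving planes to the last smoothed plane c(N).
    `sigma_w` handles the heterogeneous (spatially varying) variance."""
    restored = c_last.astype(float).copy()
    for w_k, s_k in zip(planes, sigma_w):
        significant = np.abs(w_k) > n * s_k   # pixel-level significance test
        restored += np.where(significant, w_k, 0.0)
    return restored

# One plane with a single strong coefficient against unit noise:
w1 = np.zeros((4, 4)); w1[1, 1] = 10.0
out = restore([w1], np.zeros((4, 4)), [np.ones((4, 4))], n=3)
print(out[1, 1])  # 10.0 (only the significant coefficient survives)
```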
4
Experimental results
This method was applied to detect local electric fields on the membrane of an excited neuron stained with a voltage-sensitive dye; the biological methods and results were published in detail elsewhere [2]. In Fig. 1, we can see examples of the detection obtained by using the wavelet transform, which removes the spatial noise generated by the non-biological background. When the neuron under study is excited (Fig. 1, C and E), we detect pixel clusters corresponding to the operation of groups of ionic channels in the membrane of the active neuron (Fig. 1, C). When the neuron under study is not stimulated (Fig. 1, D and F), the small number of clusters detected in the control images (Fig. 1, F) corresponds to spontaneous activities of the biological membranes in the field.
Figure 1: (A) Image of the microscopic field showing the fluorescent neurons. Only the neuron shown by the arrow is excited. (B) 2D map of the variance in image A. (C, D) Relative variation images before filtering. Image C corresponds to a relative variation image when the neuron is excited. Image D corresponds to a control relative variation image when the neuron is at rest. (E, F) Filtering of C and D, respectively, by using the wavelet transform. The top scale corresponds to B (variance): full scale is 3·10⁻³. The bottom scale corresponds to C, D, E and F (relative variation images): full scale is 3.2%.
5
Statistical significance
First, the use of the wavelet transform makes it possible to evaluate the significance, at the single-pixel level, from the distribution law of the w(k, x, y) coefficients. Assuming that the w-law is well approximated by a normal law with zero mean, the significance of a detected pixel will depend on the n factor (see section 3). The thresholding of planes with n = 2 gives a confidence better than 95%, and with n = 3 a confidence better than 99%. Then, to evaluate the significance of the results and the resolution in intensity changes that our system provides, the lowest significant distance (i.e. the best resolution in intensity) between two successive samples of intensity must be computed. The first step is to classify the pixels of the filtered relative variation image into N samples with the same chosen intensity range. Then, we compute the mean and the variance in each sample of pixels from the raw pixel intensities and variances of the unfiltered relative variation image. Using the Bienaymé-Chebyshev theorem, we compute a limit on the probability that two successive samples have different means [10]. If this probability is greater than 80%, the two compared samples are taken as significantly different. Filtering in wavelet space detects changes in intensity with a resolution of 0.3% and a confidence greater than 95%.
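The Bienaymé-Chebyshev comparison above can be sketched as follows (a hedged illustration: the variance formula under equal true means and the 80% cut-off are our reading of the procedure, and all names are ours):

```python
def samples_differ(mean1, var1, n1, mean2, var2, n2, min_prob=0.8):
    """If the two samples had equal true means, the difference of sample
    means would have variance var1/n1 + var2/n2, and by the
    Bienayme-Chebyshev inequality
        P(|m1 - m2| >= eps) <= (var1/n1 + var2/n2) / eps**2.
    The samples are declared significantly different when the complement
    of this bound exceeds `min_prob` (80% in the text)."""
    eps = abs(mean1 - mean2)
    if eps == 0.0:
        return False
    bound = (var1 / n1 + var2 / n2) / eps ** 2
    return 1.0 - min(bound, 1.0) > min_prob

print(samples_differ(0.0, 1.0, 100, 1.0, 1.0, 100))  # True
print(samples_differ(0.0, 1.0, 10, 0.1, 1.0, 10))    # False
```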
6
Conclusion
This method is a complete image processing tool that gives significant results, but it is unfortunately limited by its computational cost, both in time and in computer storage space. However, this algorithm conserves the photometry and provides a calibration in size and intensity of active sites by extraction of significant structures.
References
[1] L.M. Loew, S. Scully, L. Simpson, and A.S. Waggoner. Evidence for a charge-shift electrochromic mechanism in a probe of membrane potential. Nature, 281:497-499, 1979.
[2] P. Gogan, I. Schmiedel-Jakob, Y. Chitti, and S. Tyč-Dumont. Fluorescence imaging of local electric fields during the excitation of single neurons in culture. Biophysical J., 69:299-310, 1995.
[3] J.A. Jamieson. Infrared Physics and Engineering. McGraw-Hill, New York, 1963.
[4] M. Gasden. Some statistical properties of pulses from photomultipliers. Applied Optics, 4:1446-1452, 1965.
[5] A. Grinvald, R.D. Frostig, E. Lieke, and R. Hildesheim. Optical imaging of neuronal activity. Physiol. Rev., 68:1285-1366, 1988.
[6] J. Morlet, G. Arens, E. Fourgeau, and D. Giard. Wave propagation and sampling theory - I and II. Geophysics, 47:203-236, 1982.
[7] A. Grossmann, R. Kronland-Martinet, and J. Morlet. Reading and understanding continuous wavelet transforms. In Wavelets: Time-Frequency Methods and Phase-Space (J.M. Combes et al., Eds), Berlin, 1989. Springer-Verlag.
[8] G. Strang. Wavelets and dilation equations: A brief introduction. SIAM Review, 31:614-627, 1989.
[9] J.L. Starck, A. Bijaoui, and F. Murtagh. Multiresolution support applied to image filtering and restoration. Graphical Models and Image Processing, 57:420-431, 1995.
[10] W. Feller. An Introduction to Probability Theory and its Applications I and II. Wiley, New York, 1964.
Deterioration Detection in a Sequence of Large Images
O. Buisson, B. Besserer, S. Boukir & L. Joyeux
obuisson@gi.univ-lr.fr, bbessere@gi.univ-lr.fr, sboukir@gi.univ-lr.fr, ljoyeux@gi.univ-lr.fr
Laboratoire d'Informatique et d'Imagerie Industrielle (L3i), Université de La Rochelle, avenue Marillac, F-17042 La Rochelle cedex 1
Abstract
This paper presents a robust technique to detect local deteriorations in old cinematographic films. The method relies on spatio-temporal information and combines two different detectors: a morphological detector, which uses spatial properties of the deteriorations to detect them, and a dynamic detector based on motion estimation techniques. Our deterioration detector has been validated on several film sequences and has turned out to be a powerful tool for digital film restoration.
1.
Introduction
Most of the techniques in use today for cinematographic film restoration are based on chemical and mechanical manipulations. Applying digital techniques to the field of film restoration lets us expect results beyond today's limitations, such as automated processing, correction of photographed dust spots and scratches (i.e. after film duplication), the removal of large defects, etc. Our research institute is involved, alongside the Laboratoires Neyrac Films company, in the European LIMELIGHT project, which aims at designing a complete digital processing chain suitable for restoring old films (film scanner, processing workstation, imaging device). Our main work concerns software development for the automatic detection of defects like dust spots, hair and small scratches. Because the processed picture will be imaged back to film, preserving the visual quality in the software process is essential. Thus, the scanning provides high-resolution images (2200 x 1640 pixels or 4000 x 3000 pixels). Of course, these resolutions are uncommon in classical computer vision problems. This involves great difficulties, especially when financial viability is aimed at by users of the LIMELIGHT chain. Keeping the processing time short is a significant problem which requires very fast algorithms. Many approaches to defect restoration can be found in previous papers [4], [6]. In these works, the authors consider the "blobs" as impulse distortions or noise. Thus, deteriorations are restored using filtering techniques. These "blind" filters are applied to the entire image, removing deteriorations, but also deteriorating the regions which are not corrupted. A solution to cope with this problem consists in first isolating the regions with defects and then treating only these regions [7]. The following sections describe our detection algorithms.
2.
Dust and scratch detection based on a single image
What are the origins of a dust spot or hair that is visible on an image? Mainly, it is a dust particle on the film which shades light during the film-to-film copy operation or during film scanning. Through the use of a specific high-tech film scanning device ensuring a high resolution (less than 10 µm, approximately the film grain size), the digital "signature" left by a dust particle is slightly different from a photographed detail of the image, even a sharp, well-defined one (light dispersion within the sensitive layers). Overall, the characteristics of the defects tend to be:
• small surface (varying from 1 to 50 pixels, which is small in a 2200 x 1640 image),
• edges with strong gradients.
2.1.
Gray scale morphology
The four fundamental binary morphological transformations (erosion, dilation, closing and opening) are all extended to gray scale morphology, without thresholding, via the use of local maximum and minimum operations. Given a gray scale image I and a structuring element (SE) B, the following neighbourhood operators ⊕ and ⊖ form the basis of classical mathematical morphology [8], [9], [11]:

I(x, y) ⊕ B = MAX_{(u,v)∈B(x,y)} (I(u, v) + B(u, v))   Dilation
I(x, y) ⊖ B = MIN_{(u,v)∈B(x,y)} (I(u, v) − B(u, v))   Erosion
I(x, y) • B = (I(x, y) ⊕ B) ⊖ B   Closing
I(x, y) ∘ B = (I(x, y) ⊖ B) ⊕ B   Opening
2.2.
A morphological detector of local deteriorations
The closing operator has the attractive property of deleting local minima. Therefore, we can use it to detect black deteriorations. Similarly, the opening operator appears well suited to the detection of white deteriorations. Both morphological detectors of black and white deteriorations are then expressed as a simple difference between successive closing operations (or successive opening operations) and the original image:
D_black(I(x, y), B_0, B_n) = ((((I(x, y) ⊕ B_0) ⊕ B_n) ⊖ B_n) ⊖ B_0) − I(x, y)
D_white(I(x, y), B_0, B_n) = I(x, y) − ((((I(x, y) ⊖ B_0) ⊖ B_n) ⊕ B_n) ⊕ B_0)

where the SEs B_0 and B_n are defined as:

B_0 =
0 0 0
0 0 0
0 0 0

B_n =
2n 2n 2n 2n 2n
2n  n  n  n 2n
2n  n  0  n 2n
2n  n  n  n 2n
2n 2n 2n 2n 2n
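A minimal NumPy sketch of these operators and the D_black detector (edge replication, the additive-SE convention and the function names are our assumptions; a production version would use an optimized morphology library):

```python
import numpy as np

def dilate(img, se):
    """Gray-scale dilation (max of image plus additive SE over the window)."""
    r = se.shape[0] // 2
    pad = np.pad(img, r, mode='edge')
    out = np.full(img.shape, -np.inf)
    for dy in range(se.shape[0]):
        for dx in range(se.shape[1]):
            out = np.maximum(out, pad[dy:dy + img.shape[0],
                                      dx:dx + img.shape[1]] + se[dy, dx])
    return out

def erode(img, se):
    """Gray-scale erosion (min of image minus additive SE over the window)."""
    r = se.shape[0] // 2
    pad = np.pad(img, r, mode='edge')
    out = np.full(img.shape, np.inf)
    for dy in range(se.shape[0]):
        for dx in range(se.shape[1]):
            out = np.minimum(out, pad[dy:dy + img.shape[0],
                                      dx:dx + img.shape[1]] - se[dy, dx])
    return out

def make_bn(n):
    """The 5x5 SE B_n: 2n on the border, n on the inner ring, 0 at centre."""
    bn = np.full((5, 5), 2.0 * n)
    bn[1:4, 1:4] = n
    bn[2, 2] = 0.0
    return bn

def d_black(img, n):
    """D_black: successive closings by B_0 then B_n, minus the original."""
    b0, bn = np.zeros((3, 3)), make_bn(n)
    closed = erode(erode(dilate(dilate(img, b0), bn), bn), b0)
    return closed - img

# A one-pixel dark pit is filled by the closings and shows up in D_black:
img = np.zeros((9, 9)); img[4, 4] = -5.0
d = d_black(img, 0)
print(d[4, 4])  # 5.0
```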
The use of the two SEs B_0 and B_n makes it possible to take into account the slope n of the image gradients. Indeed, defects are generally characterized by very strong gradients. On the other hand, B_0 allows the detection of defects having smoother gradients. For example, figure 2 (left) shows the result of the deterioration detection using n=0 on the image depicted in figure 1 (an ambiguous image part). We can notice that for n=0, i.e. without integrating gradient properties, the defect profiles are hardly distinguishable from their neighbourhood profiles. On the contrary, using n=30, no ambiguity remains between peaks corresponding to "real" defects and other peaks (see figure 2, right). This result is very satisfactory and demonstrates the robustness of our morphological defect detector.
Figure 2: Defect detection using n=0 (left) and n=30 (right)
3.
A dynamic detector of local deteriorations
Working on a digitized film sequence gives us a great advantage, because we can use the information in the preceding and following frames. Our second defect detection algorithm uses this spatio-temporal information. Unlike long linear scratches, dust particles appear in a random manner. However, we can't use simple frame subtraction or "XORing" to detect them, because, within a sequence, the camera and the actors or scene elements move around, and objects may overlap other objects and/or background details (so-called occlusions or disocclusions). Our dynamic detector relies on both motion-flow estimation and grey-level conservation. There are two main methods to estimate the optical flow of a "noisy" sequence of images:
• pre-filter the "noisy" sequence and use a classical motion estimation (block matching, regression, etc.),
• develop a motion-estimation algorithm which is robust to noise or image alteration.
We have chosen the first solution for two reasons:
• It is difficult to know the real sensitivity of a motion estimator to noise or image alteration.
• In a high-resolution image sequence, motion induces large displacements, up to 200 or 300 pixels. One of the best solutions to quickly estimate such motions is to use a hierarchical structure (image pyramid) [2], [3]. The filtering process is then included in the creation of this hierarchical structure.
Having organized the image information in a hierarchical manner, a recursive block matching technique is used to estimate the optical flow [5], [10].
3.1.
Hierarchical structure
The basic idea is to create a pyramidal representation of an image [1] using the following algorithm"
The basic idea is to create a pyramidal representation of an image [1] using the following algorithm:

if (x mod 2 = 0) and (y mod 2 = 0) then I^{l+1}(x/2, y/2, t) = (f * I^l)(x, y, t)

where * denotes the convolution operator and f is a given filter. I^l is interpreted as a family of images, where l indicates the level of resolution (or scale). The larger l, the more blurred the original image I is, finally showing the larger structures in the image. Our hierarchical image structure is built using low-pass filtering, such that film grain and deteriorations disappear at higher levels of the pyramid. Indeed, such high spatial frequencies disturb the motion estimation process.
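This pyramid construction can be sketched as follows (the separable binomial kernel stands in for the unspecified low-pass filter f, and the names are ours):

```python
import numpy as np

def build_pyramid(img, levels):
    """Low-pass filter, then keep every second pixel, implementing
    I^{l+1}(x/2, y/2) = (f * I^l)(x, y) for even x and y.  Film grain and
    small defects vanish at the higher (coarser) levels."""
    k = np.array([1, 2, 1], dtype=float) / 4.0   # assumed binomial filter f
    pyr = [img.astype(float)]
    for _ in range(levels):
        cur = pyr[-1]
        # separable convolution with edge replication, rows then columns
        pad = np.pad(cur, ((1, 1), (0, 0)), mode='edge')
        sm = sum(k[i] * pad[i:i + cur.shape[0], :] for i in range(3))
        pad = np.pad(sm, ((0, 0), (1, 1)), mode='edge')
        sm = sum(k[i] * pad[:, i:i + cur.shape[1]] for i in range(3))
        pyr.append(sm[::2, ::2])   # keep pixels with x mod 2 = 0, y mod 2 = 0
    return pyr

pyr = build_pyramid(np.random.rand(64, 64), 3)
print([p.shape for p in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```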
3.2.
Hierarchical motion estimation
Our method combines the principle of hierarchical motion estimation with a block matching algorithm. In the first step, the global motion is estimated, allowing only a coarse velocity field to result; at the lower hierarchical levels, the details of the vector field are calculated as relatively small updates around the estimated vector resulting from the previous, higher level. At each level, displacements are estimated using a recursive block matching algorithm [10]. For each pixel of the current grid, we search for the displacement vector which yields the minimum value of a common criterion based on the so-called displaced frame difference (DFD):
E(p,d,t) = Z ( D F D ( p i 'd't))2 s
with DFD(p,d,t) = l ( p i , t ) - l(pi - d , t - d t )
w_p
representing a neighbouring window of n x n pixels centered at pixel p , and
d the displacement of p from time t to t - d t . More formally, this recursive search consists of the following steps 9 First, the estimated displacement from the higher level is used as the prediction of the present location 9 d~, (p) = d'+l (p) x 2 To economize the calculational effort, rather than doing full block matching search, we check only 5 vectors (around the predicted position) in the first step and at the very most 3 vectors in the following steps. Figure 3 illustrates this procedure. Then, our algorithm selects the best displacement candidate ~tl, 3 I, ~ {(0,0),(-1,0),(1,0),(0,1),(0,-1)} according to criterion E(p+dto(p)+611,t). The current displacement is then updated" d[ = d~ + SI~ In the next search steps, 3 new candidates are evaluated. Their position depends on the best previous candidate : fl-, = (1,0) ~ 51 e {(O,O),(1,O),(O,1),(O,-1)}
Figure 3: Adaptive block matching search
δ_{i-1} = (-1,0) ⇒ δ_i ∈ {(0,0), (-1,0), (0,1), (0,-1)}
δ_{i-1} = (0,-1) ⇒ δ_i ∈ {(0,0), (1,0), (-1,0), (0,-1)}
δ_{i-1} = (0,1) ⇒ δ_i ∈ {(0,0), (1,0), (-1,0), (0,1)}
where i denotes the search iteration number, and displacement (0,0) is related to the best previously selected candidate. Notice that candidates that have already been checked do not need further evaluation. The current displacement is then updated with the best candidate: d^l_i = d^l_{i-1} + δ_i. The updating process is stopped as soon as the update falls under a threshold, when the previously selected candidate remains the best (local minimum), or after a fixed number of iterations. Thus we obtain an adaptive search strategy which avoids checking all possible vectors and therefore provides a fast block matching search. For a maximum displacement magnitude of ±3 pixels, this method checks only 20 candidates, while an exhaustive method checks 49. So, the processing speed increases by almost a factor of 2.5.
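A minimal sketch of this adaptive search, assuming greyscale frames as NumPy arrays (function names, the 5x5 window and the iteration cap are illustrative choices, not taken from the paper):

```python
import numpy as np

def dfd_energy(prev, cur, p, d, n=5):
    """Sum of squared displaced frame differences over an n x n window at p = (y, x)."""
    h = n // 2
    y, x = p
    e = 0.0
    for dy in range(-h, h + 1):
        for dx in range(-h, h + 1):
            yi, xi = y + dy, x + dx
            yj, xj = yi - d[0], xi - d[1]
            if (0 <= yi < cur.shape[0] and 0 <= xi < cur.shape[1]
                    and 0 <= yj < prev.shape[0] and 0 <= xj < prev.shape[1]):
                e += (cur[yi, xi] - prev[yj, xj]) ** 2
    return e

# candidate updates after a best move delta; the direction opposite to the
# previous move is excluded, since it was already checked
NEXT = {
    (1, 0):  [(0, 0), (1, 0), (0, 1), (0, -1)],
    (-1, 0): [(0, 0), (-1, 0), (0, 1), (0, -1)],
    (0, 1):  [(0, 0), (1, 0), (-1, 0), (0, 1)],
    (0, -1): [(0, 0), (1, 0), (-1, 0), (0, -1)],
}

def block_match(prev, cur, p, d0=(0, 0), max_iter=3):
    """Recursive search seeded with the prediction d0 from the coarser level."""
    d = tuple(d0)
    cands = [(0, 0), (-1, 0), (1, 0), (0, 1), (0, -1)]   # 5 vectors in step 1
    for _ in range(max_iter):
        best = min(cands, key=lambda s: dfd_energy(prev, cur, p,
                                                   (d[0] + s[0], d[1] + s[1])))
        if best == (0, 0):
            break            # local minimum: previous candidate stays best
        d = (d[0] + best[0], d[1] + best[1])
        cands = NEXT[best]   # only 3 new candidates per following step
    return d
```

Seeding `d0` with twice the displacement from the next-coarser pyramid level gives the hierarchical scheme described above.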
3.3. Detection of local deteriorations
Once the optical motion flow is correctly estimated, the next frame could be rebuilt without any deteriorations. The absolute value of the DFD is considered as a measure of the quality of the estimated motion. Outliers, usually corresponding to deteriorations, occlusions or disocclusions, are detected when this criterion is higher than a threshold S. These outliers are potential deteriorations.
To deal with occlusions and disocclusions, we use a third image in our estimation scheme. The same process as described above is performed between the image at time t and the image at time t+dt. Points flagged as spurious by both of the two independent motion estimation and comparison processes are selected as deteriorations (fig. 4).
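The double DFD check might be sketched as follows, assuming dense integer displacement fields and an illustrative threshold S (the paper does not give a value):

```python
import numpy as np

def detect_defects(prev, cur, nxt, flow_bwd, flow_fwd, S=30.0):
    """Flag pixels whose motion-compensated error is high in BOTH the
    (t, t-dt) and (t, t+dt) estimations: likely film defects, whereas
    occlusions/disocclusions are usually one-sided."""
    h, w = cur.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # displaced frame differences using dense flows of shape (h, w, 2)
    yb = np.clip(ys - flow_bwd[..., 0], 0, h - 1).astype(int)
    xb = np.clip(xs - flow_bwd[..., 1], 0, w - 1).astype(int)
    yf = np.clip(ys - flow_fwd[..., 0], 0, h - 1).astype(int)
    xf = np.clip(xs - flow_fwd[..., 1], 0, w - 1).astype(int)
    dfd_b = np.abs(cur - prev[yb, xb])
    dfd_f = np.abs(cur - nxt[yf, xf])
    return (dfd_b > S) & (dfd_f > S)   # outliers in both directions
```

A blotch present only in the current frame fails both compensations and is flagged, while a region occluded in just one neighbouring frame is not.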
4. Combination of the two previous detectors
A very good detection rate can be achieved by combining the morphological and the dynamic detectors. The main problems of these detectors - false detections and threshold tuning - are bypassed with the double evidence provided by "ANDing" the results of the two detectors. The thresholds are therefore fixed at low values in order to detect every deteriorated pixel, although this also increases the number of wrong detections. However, these wrong detections are not the same for the first and the second detector, and the double evidence eliminates them.
Figure 4: Frame I(t) of a film sequence (La belle et la bête, 1946), and defect detection on I(t)

5. Summary and Conclusions
We have presented an efficient detector of local deteriorations in old films. This detector combines two different detectors: a morphological detector and a dynamic one. Using a usual criterion in the motion estimation step, we obtain a rate of 3% false detections and 5% undetected deteriorations. Defect detection is achieved in about 230 sec. per 2200 x 1640 frame on a standard workstation: 15 sec. for the morphological detection and about 215 sec. for the dynamic detection (which uses 3 images) in an early, unoptimized version. Future work will concern detection of oversized defects, intensity distortions, image instability and, of course, optimization, and eventually parallelization, of our algorithms.
6. Acknowledgements
We thank François HELT for his helpful assistance. Image reproduction by courtesy of NEYRAC FILM.
7. References
[1] ANANDAN P. A computational framework and an algorithm for the measurement of visual motion, Int. Journal of Computer Vision, 2:283-310, 1989.
[2] BAAZIZ N. Approches d'estimation et de compensation de mouvement multirésolutions pour le codage de séquences d'images, PhD thesis, Université de Rennes I, octobre 1991.
[3] BURT P.J. Fast filter transform for image processing, CVGIP, 16:20-51, 1981.
[4] GEMAN S., GEMAN D. and McCLURE D.E. A nonlinear filter for the film restoration and other problems in image processing, Graphical Models and Image Processing, 4, 1992.
[5] HAAN G. Motion estimation and compensation, PhD thesis, Delft University of Technology, Dept. of EE, Delft, the Netherlands, Sept. 1992.
[6] KLEIHORST R.P. Noise filtering of image sequences, PhD thesis, University of Delft, 1994.
[7] KOKARAM A.C. Motion picture restoration, PhD thesis, University of Cambridge, May 1993.
[8] MUELLER S. and NICKOLAY B. Morphological image processing for the recognition of surface defects, Proceedings of the SPIE, 2249-58:298-307, 1994.
[9] SERRA J. Image analysis and mathematical morphology, Academic Press, 1982.
[10] SRINIVASAN R. and RAO K.R. Predictive coding based on efficient motion estimation, IEEE Trans. on Communications, COM-33(8):888-896, August 1985.
[11] STERNBERG S. Grayscale morphology, Computer Graphics and Image Processing, 35:333-355, 1986.
Invited Session U: COLOR PROCESSING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Segmentation of multi-spectral images based on the physics of reflection

N.H. Kroupnova
Department of Electrical Engineering, University of Twente, The Netherlands

Abstract

The paper describes an algorithm for multi-spectral image segmentation that takes into account the shape of the clusters formed by the pixels of the same object in the spectral space. The expected shape of the clusters is based on the Dichromatic reflection model ([1]) and its extension ([2]) for optically homogeneous materials. Further, the influences of the illumination and of image formation by a color CCD camera are considered. Based on the expected shape of the clusters we propose a similarity/homogeneity criterion for the extended region merging algorithm. This criterion works successfully for objects of arbitrary shape under illumination by one or several sources of the same spectrum.
1 Introduction
To develop segmentation algorithms, it is important to understand how the process of reflection of light by different materials causes the changes of color and intensity in color images. The shape of the color clusters for the purpose of segmentation was also considered in [3], but the resulting algorithm was constructed for the case of one point-like light source and a scene composed of objects made from inhomogeneous materials. We consider the process of light reflection for a scene composed of several objects made from different materials, as is often the case for real images. We also analyze how image formation and interactions of objects influence the shape of the color clusters. Based on the structure of the clusters in a color space we propose a similarity/homogeneity criterion for the region merging (RM) algorithm. The criterion works in 2D color spaces obtained by two different kinds of projections, which allow us to eliminate the influence of either highlights or shadows and shape variations on the segmentation results. The algorithm works successfully for objects of arbitrary shape illuminated by one or more sources of the same spectrum.
2 Expected and real shape of color clusters

2.1 Theoretically expected shape of color clusters
The expected shape of the clusters is based on the Dichromatic reflection model ([1]) and its extension ([2]) for optically homogeneous materials. According to this model, the reflected light can be described as a sum of 2 vectors, one accounting for body reflection and one for surface reflection. Both the specular and the body reflection are decomposed into two factors: an "intensity factor", which depends on geometry, and a "spectral factor", which depends on wavelength. So, the power of light reflected by the surface towards the camera is given by

I(λ) = L(λ)(m_s(g)c_s(λ) + m_b(g)c_b(λ))    (1)

where:
L(λ) is the spectral power distribution of the incident light, g indicates dependence on the geometry, λ is wavelength, m_s(g) and m_b(g) are geometry-dependent factors, and c_s(λ) and c_b(λ) are spectral factors of reflectance for the surface and body components respectively. Equation (1) works for both optically homogeneous and inhomogeneous materials, but the behavior of c_s(λ) and c_b(λ) differs. Metals have no body reflection component, so c_b for them is equal to zero. For dielectrics and most metals c_s(λ) is approximately constant over the visible wavelength range, so the surface reflection component is a vector in the direction of the incident light. The exceptions are colored metals like copper or gold, for which c_s(λ) varies considerably over the visible wavelength range, causing a color different from silver-grey. A color camera transforms the spectrum of incoming light into a color space, for instance into 3D RGB (red, green, blue) space. This process is called spectral integration [3]. The output of every sensor s_i with response function f_i can be written as

s_i = ∫ f_i(λ) I(λ) dλ    (2)

Or, substituting I(λ) from (1),
s_i = m_s(g) ∫ f_i(λ) L(λ) c_s(λ) dλ + m_b(g) ∫ f_i(λ) L(λ) c_b(λ) dλ    (3)

So, the output vector is a linear combination of two vectors: one is a scaled light source vector in the basis f_i (if c_s(λ) is constant over the visible wavelength range) and the other is a scaled product of the spectral power distributions of light and body reflectance of the object in the same basis. In the ideal case (single point-like light source, no noise or imaging artifacts) the color clusters for inhomogeneous materials consist of 2 lines, the matte line and the highlight line, and have the shape of a skewed "T" or "L", as described, among others, in [3]. Because c_s(λ) for dielectrics and white metals is approximately constant in the visible wavelength range, highlight lines go in the direction of the illumination color. Metals don't have the matte line, since they don't have the body reflection. Colored metals such as copper have highlight lines in a different direction, determined by c_s(λ), which varies considerably over the visible wavelength range. It should be noticed that, depending on the shape of the object and the illumination geometry, the color cluster can have several highlight lines or even "loops" instead of highlight lines. Diffuse illumination "spreads" the highlights, giving clusters looking more like an area in the dichromatic plane. A very rough object surface has the same effect.
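The spectral integration of eqs. (1)-(3) can be sketched numerically as follows; all names are illustrative, and a simple Riemann sum stands in for the integral:

```python
import numpy as np

def sensor_response(wavelengths, f, L, c_s, c_b, m_s, m_b):
    """Discretized eqs. (1)-(3): the sensor output is a linear combination
    of a surface-reflection vector and a body-reflection vector.
    f is a (3, n_wavelengths) matrix of sensor response functions."""
    dlam = wavelengths[1] - wavelengths[0]          # uniform sampling assumed
    surf = (f * L * c_s).sum(axis=1) * dlam         # light-source colour vector
    body = (f * L * c_b).sum(axis=1) * dlam         # body-reflectance colour vector
    return m_s * surf + m_b * body                  # eq. (3)
```

Varying the geometric factors m_s and m_b while keeping the spectral factors fixed traces out exactly the plane spanned by the two vectors, i.e. the dichromatic plane.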
2.2 Distortions of the theoretical shape
We consider theoretically and experimentally the influence of the illumination and the image formation process on the shape of the clusters. It can be summarized as follows. In an ideal case of no noise and one or more light sources of the same spectrum, color clusters can have different shapes, varying from a line or skewed "T" to an area in the dichromatic plane, but the points of one cluster lie in the same dichromatic plane. Noise of the CCD camera makes this plane "thicker". The point spread function (PSF) of the camera, chromatic aberrations, and inter-reflections can cause small parts of the cluster to lie outside the dichromatic plane. Inter-reflections and the PSF cause "bridges" between different clusters.
3 Segmentation

3.1 Normalization on the white image
After subtraction of the dark current and white balancing, images are normalized on the white image. Since the white balance is performed so as to be good in the middle of the field, while the gains of the sensors are set for the whole image, the normalization compensates for the site-dependent scaling due to the non-uniformity of illumination and the non-uniformity caused by the beam splitter and fixed pattern noise.
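This normalization step can be sketched as a per-pixel, per-channel flat-field correction; the exact formula is an assumption, since the paper does not spell it out:

```python
import numpy as np

def normalize_on_white(image, white, dark):
    """Flat-field correction: divide the dark-subtracted image by the
    dark-subtracted white image, removing site-dependent gain caused by
    non-uniform illumination, the beam splitter and fixed pattern noise."""
    num = np.asarray(image, float) - dark
    den = np.maximum(np.asarray(white, float) - dark, 1e-6)  # avoid /0
    return num / den
```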
3.2 Projections of the color clusters (2D spaces)
We want to design an algorithm for color image segmentation that takes into account the shape of the color clusters. The shape of the clusters is simplified by projecting the RGB space into a 2D color space. We consider 2 kinds of projections here, both onto the plane going through (1,0,0), (0,1,0) and (0,0,1) (Fig. 1). M1 and M2 are 2 orthogonal axes perpendicular to the intensity axis; M1 goes through (1,0,0).
Figure 1: Projections of the color clusters

One projection is a parallel projection in the direction of the light source, as also described in [4]. In coordinates M1, M2 the highlight lines project into points on the matte line, and the matte line projects into a line going to the light source projection, or to (0,0) for normalized images. So in this projection the highlights are effectively eliminated and do not disturb the segmentation process. However, the shadows and shape variations of the objects continue to play a role.
Another projection is a perspective projection with center in (0,0,0), the same as when transforming an image into HSV color space. The matte line is projected into one point and the highlight line is projected into a line going to the light source projection or, if the images are normalized, to (0,0). In this projection the highlights still play a role, but the influence of shadows and shape variations on the matte color vector is eliminated. One can see that these two kinds of projections are somehow "complementary" in the sense of eliminating different influences on the segmentation result. It should be noticed that in both kinds of projections we use Cartesian coordinates rather than polar ones, to deal better with objects of low saturation.
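The two projections can be sketched as follows, with M1 and M2 chosen as an orthonormal basis of the plane through (1,0,0), (0,1,0), (0,0,1); the particular basis vectors are an assumption consistent with "M1 goes through (1,0,0)":

```python
import numpy as np

# orthonormal in-plane axes, both perpendicular to the intensity axis (1,1,1)
M1 = np.array([2.0, -1.0, -1.0]) / np.sqrt(6.0)
M2 = np.array([0.0, 1.0, -1.0]) / np.sqrt(2.0)

def parallel_projection(rgb, light=np.array([1.0, 1.0, 1.0])):
    """Project along the light direction onto x+y+z = 1: highlight lines
    collapse onto the matte line (highlights eliminated)."""
    rgb = np.asarray(rgb, float)
    t = (1.0 - rgb.sum(-1, keepdims=True)) / light.sum()
    p = rgb + t * light
    return np.stack([p @ M1, p @ M2], axis=-1)

def perspective_projection(rgb):
    """Project through the origin (chromaticity-style): the matte line
    collapses to a point (shading and shadows eliminated)."""
    rgb = np.asarray(rgb, float)
    p = rgb / np.maximum(rgb.sum(-1, keepdims=True), 1e-9)
    return np.stack([p @ M1, p @ M2], axis=-1)
```

Adding an offset along the light direction leaves the parallel projection unchanged, and scaling a colour leaves the perspective projection unchanged, which is exactly the complementarity described above.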
3.3 Region merging using 2D color space criteria
We perform the segmentation by an RM algorithm using a quad tree structure [6], as described in [5]. A distinction is made based on the size of the regions with regard to the criterion used. When both regions are relatively large, so that we can speak about the distribution of feature vectors, the criterion should reflect a kind of distance between the two distributions. The regions R1 and R2 are merged when (see Fig. 2):
|μ1 - μ2| / (σ1 + σ2) < threshold

where
μ1 is the average feature vector (M1, M2) of the first region,
μ2 is the average feature vector (M1, M2) of the second region,
σ1 is the standard deviation of R1 in the direction to μ2,
σ2 is the standard deviation of R2 in the direction to μ1.

Figure 2: Distance between two distributions
The measure used here ranges from zero, indicating definite merging, to infinity, indicating no merging. When one region is small and the other large, we take the Mahalanobis distance of the mean of the small region to the large region as the criterion for whether the two regions should be merged. When both regions are small, similarity and homogeneity criteria are applied and the results are combined with a logical 'and': R1 and R2 are merged if (μ1 - μ2)² < threshold_μ and σ_{R1∪R2} < threshold_σ. To calculate σ_{R1∪R2}, first the largest eigenvalue λ1 of the covariance matrix of R1 ∪ R2 is calculated, and then σ_{R1∪R2} is defined as the square root of λ1. It gives the largest variance of the resulting region. The RM is implemented using gradual relaxation of the merging criteria, which gives a hierarchical sequence of segmentations. It allows us to sufficiently decrease the dependence on the order of merging, but it also opens possibilities for the interpretation of images, since the algorithm first merges regions with strongly "overlapping" distributions, then regions with less and less "overlapping" ones. RM on the parallel projection tends to give shadows and parts with different orientation as separate segments; RM on the perspective projection tends to distinguish highlights. Depending on the application, the results obtained on both projections can be combined, giving a segmentation independent of either shadows and orientation, or highlights, or both. A common problem of the two projections is the difficulty of dealing with achromatic objects of different value, like black and white. To distinguish them, the intensity value has to be used.
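The three size-dependent merging criteria can be sketched as follows; thresholds and names are illustrative, and regions are assumed to be given as N x 2 arrays of (M1, M2) feature vectors:

```python
import numpy as np

def large_large(mu1, sigma1, mu2, sigma2, threshold):
    """Two large regions: distance between the distributions along the
    line joining the means (0 = definite merge, grows towards infinity)."""
    return np.linalg.norm(np.asarray(mu1) - np.asarray(mu2)) / (sigma1 + sigma2) < threshold

def small_large(mu_small, large_pixels):
    """Small vs large region: Mahalanobis distance of the small region's
    mean to the large region's distribution."""
    mu = large_pixels.mean(0)
    cov = np.cov(large_pixels, rowvar=False)
    d = mu_small - mu
    return float(np.sqrt(d @ np.linalg.pinv(cov) @ d))

def small_small(r1, r2, thr_mu, thr_sigma):
    """Two small regions: similarity AND homogeneity. Homogeneity uses the
    square root of the largest eigenvalue of the merged covariance matrix,
    i.e. the largest variance of the resulting region."""
    mu1, mu2 = r1.mean(0), r2.mean(0)
    cov = np.cov(np.vstack([r1, r2]), rowvar=False)
    lam1 = np.linalg.eigvalsh(cov)[-1]          # largest eigenvalue
    return bool(np.sum((mu1 - mu2) ** 2) < thr_mu and np.sqrt(lam1) < thr_sigma)
```

Relaxing the thresholds step by step yields the hierarchical sequence of segmentations described above.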
3.4 Segmentation example
Fig. 3 shows an image of several objects made from different materials: aluminum and copper cylinders, a blue plastic duck, and red and blue plastic caps on a red and yellow background. Figs. 4 and 5 show histograms in M1, M2 coordinates for both kinds of projections, reflecting how complex the cluster shapes are even for a comparatively simple scene. Figs. 6 and 7 show results of RM with two different threshold values. Note the difference in the segmentations. Fig. 8 shows the combination of the results to get a segmentation independent of shadows, orientation and highlights.
4 Concluding remarks
In this paper we propose an algorithm for multi-spectral image segmentation that takes into account the shape of the clusters formed by the points of the same object in a color space. We provide a physical foundation for the algorithm, analyzing the influence of the image formation process, illumination and interactions of objects on the shape of the clusters. The proposed algorithm is RM with a similarity/homogeneity criterion that works on two different kinds of projections, allowing one to eliminate the influence of either highlights or shadows and shape variations on the segmentation result. The algorithm hierarchically merges less and less "overlapping"
Figure 3: Image of several objects of different materials
Figure 4: Parallel projection histogram
Figure 5: Perspective projection histogram
Figure 6: RM on parallel projection by different threshold values
Figure 7: RM on perspective projection by different threshold values
Figure 8: Projections combination
distributions of color vectors, thus first finding very dense clusters corresponding to uniform parts of objects and then less dense clusters formed by the parts of objects where the color is influenced by some factors. One of the future research topics is to investigate the possibilities for interpretation of the images that can be derived from the hierarchical sequence of segmentations. Another interesting topic is to use for image interpretation the differences and correspondences between the results of RM on the two different kinds of projections.
References

[1] S. Shafer, "Using color to separate reflection components", Color Research and Application, Vol. 10, pp. 210-218, 1985.
[2] G. Healey, "Using color for geometry-insensitive segmentation", J. Opt. Soc. Am., Vol. 6, pp. 920-937, 1991.
[3] G. Klinker, S.A. Shafer, T. Kanade, "A physical approach to color image understanding", Int. Journal of Computer Vision, Vol. 4, pp. 7-38, 1990.
[4] S. Tominaga, "Surface identification using the dichromatic reflection model", IEEE Trans. PAMI, Vol. 13, pp. 658-670, 1991.
[5] N. Gorte-Kroupnova, B. Gorte, "Method for multi-spectral images segmentation in case of partially available spectral characteristics of objects", Proceedings of "Machine Vision Applications in Industrial Inspection IV" (IS&T/SPIE Symposium on Electronic Imaging), 28 January - 2 February 1996, San Jose, CA, USA.
[6] S.L. Horowitz, T. Pavlidis, "Picture segmentation by a tree traversal algorithm", J. ACM, Vol. 23, pp. 368-388, 1976.
Using Color Correlation To Improve Restoration Of Color Images

Daniel Keren, Anna Gotlib
Department of Mathematics and Computer Science, The University of Haifa, Haifa 31905, Israel
[email protected]

Hagit Hel-Or
Department of Psychology, Jordan Hall, Stanford University, CA 94305, USA
[email protected]

1 Abstract
The problem addressed in this work is restoration of images that have a few channels of information. We have studied color images so far, but hopefully the ideas presented here apply to other types of images with more than one channel. The suggested method is to use a probabilistic scheme which proved rather useful for image restoration, and incorporate into it an additional term, which results in a better correlation between the three color bands in the restored image. Initial results are good; typically, there's a reduction of 30% in the RMS error, compared to standard restoration carried out separately on each color band.
2 Introduction
A rather general formulation of the restoration problem is the following: given some partial information D on an image F, find the best restoration of F. Obviously, there are many possible ways in which to define "best". One way, which proved quite successful for a wide variety of applications, is probabilistic in nature: given D, one seeks the restoration which maximizes the probability Pr(F/D). Following Bayes' rule, this is
equal to

Pr(D/F) Pr(F) / Pr(D).

The denominator is a constant once D is measured; Pr(D/F) is usually easy to compute. Pr(F) is more interesting, and more difficult to define. Good results have been obtained by following the physical model of the Boltzmann distribution, according to which the probability of a physical system to be in a certain state is proportional to the exponent of the negative of the energy of that state; that is, low-energy, or "ordered", states are assigned higher probability than high-energy, or "disordered", states [3, 7]. It is common to define the energy of a signal by its "smoothness"; the energy of a one-dimensional signal F is often defined by ∫ F_x² dx, etc. Such integrals are usually called "smoothing terms", as they enforce the resulting restoration to be smooth [5, 8, 4, 6]. Note that here "smooth" does not mean "infinitely differentiable", but "slowly changing".
3 Main Body
To see how the probabilistic approach naturally leads to restoration by so-called "smoothing", or regularization, let us look at the problem of reconstructing a two-dimensional image from sparse samples which are corrupted by additive noise. Suppose the image is sampled at the points {x_i, y_i}, the sample values are z_i, and the measurement noise is Gaussian with variance σ². Then

Pr(D/F) ∝ exp( - Σ_{i=1}^{n} [F(x_i, y_i) - z_i]² / (2σ²) )

and, based on the idea of the Boltzmann distribution, one can define Pr(F) as being proportional to

exp( - λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv )

so, the overall probability to maximize is

exp( - ( Σ_{i=1}^{n} [F(x_i, y_i) - z_i]² / (2σ²) + λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv ) )

which is, of course, equivalent to minimizing

Σ_{i=1}^{n} [F(x_i, y_i) - z_i]² / (2σ²) + λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv    (1)
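As a concrete illustration, a discrete 1-D analogue of functional (1) can be minimized directly; the sketch below uses a first-derivative smoothing term (an assumption made for brevity, in place of the second-order term), and since the functional is quadratic in F the minimizer solves a linear system:

```python
import numpy as np

def restore_1d(n, sample_idx, z, sigma=0.1, lam=1.0):
    """Minimize sum_i (F[k_i]-z_i)^2/(2 sigma^2) + lam * sum_k (F[k+1]-F[k])^2
    by setting the gradient to zero, i.e. solving A F = b."""
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, k in enumerate(sample_idx):      # data terms pin F at the samples
        A[k, k] += 1.0 / sigma**2
        b[k] += z[i] / sigma**2
    for k in range(n - 1):                  # smoothing term couples neighbours
        A[k, k] += 2 * lam
        A[k + 1, k + 1] += 2 * lam
        A[k, k + 1] -= 2 * lam
        A[k + 1, k] -= 2 * lam
    return np.linalg.solve(A, b)
```

In 2-D the same stationarity condition is the partial differential equation mentioned in the text, which is why multigrid solvers apply.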
This leads, via calculus of variations, to a partial differential equation, which can be effectively solved using multigrid methods. Other problems, such as deblurring, can be posed similarly. First, let us look at the problem of deblurring a single-channel image (for instance, a gray level image). One is given a gray-level image D, which is a corrupted version of the true image F, and the goal is to reconstruct this F. Typically, one assumes that F was blurred by convolution with a kernel H, and corrupted by additive noise, which results in the mathematical model D = F ∗ H + N, where ∗ stands for the convolution operator and N is
additive noise. Proceeding as in the paradigm described above, one searches for the F which minimizes

‖D - F ∗ H‖² + λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv.
Let us now proceed to describe shortly how this idea is extended to restoring multi-channel images. Suppose we are given a color image, with RGB channels, that underwent degradation by convolution with H (for simplicity's sake, assume it is the same H for all channels, although it doesn't have to be so in the general case). One obvious way to reconstruct the image is to run the deblurring algorithm described above for each of the separate channels, and combine the restored channels into a color image. Such an approach, however, does not work well in general. Usually, the resulting image is still quite blurry, and contaminated by false colors; that is, certain areas contain streaks of colors which do not exist in the original image. This problem is more acute in highly textured areas. The proposed solution to these problems is to incorporate into the probabilistic scheme a "correlation term", which results in a better correlation between the RGB channels. Formally, if C_{x,y} is the covariance matrix of the RGB values at a pixel (x,y), the probability for the combination of colors (R(x,y) G(x,y) B(x,y)) is proportional to exp(-½ (R(x,y) G(x,y) B(x,y)) C_{x,y}^{-1} (R(x,y) G(x,y) B(x,y))^t). Multiplying over all the pixels results in adding these terms in the exponent's power. Exactly as in the interpolation problem above, this exponential term combines with the other exponential terms, and we get a combined exponential that has to be maximized; therefore, we have to minimize the negative of the power, which simply results in adding the "correlation term",
λ₂ ∫∫ (R(x,y) G(x,y) B(x,y)) C_{x,y}^{-1} (R(x,y) G(x,y) B(x,y))^t dx dy,

to the expression of Eq. 1 (after subtracting the averages of the RGB channels). In effect, this term makes use of the fact that, in natural and synthetic images, the RGB channels are usually highly correlated. The "correlation term" penalizes deviations from this correlation, thus "pushing" the restored image towards one whose channels are "correctly correlated". Therefore, the combined expression to minimize is
‖D - F ∗ H‖² + λ₁ ∫∫ (R_xx² + 2R_xy² + R_yy²) dx dy + λ₁ ∫∫ (G_xx² + 2G_xy² + G_yy²) dx dy + λ₁ ∫∫ (B_xx² + 2B_xy² + B_yy²) dx dy + λ₂ ∫∫ (R G B) C_{x,y}^{-1} (R G B)^t dx dy
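The "correlation term" itself is straightforward to evaluate for a candidate restoration; the sketch below assumes, for simplicity, a single covariance matrix shared by all pixels (names are illustrative):

```python
import numpy as np

def correlation_penalty(rgb, cov_inv):
    """Quadratic form on mean-subtracted (R,G,B) vectors, summed over all
    pixels: large when the channels deviate from their usual correlation
    (e.g. false-colour streaks), small for 'correctly correlated' images."""
    v = rgb - rgb.reshape(-1, 3).mean(axis=0)   # subtract the channel averages
    return float(np.einsum('...i,ij,...j->...', v, cov_inv, v).sum())
```

During minimization, the gradient of this penalty pulls each pixel's colour back towards the learned inter-channel correlation.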
We have implemented a simple iterative scheme for minimizing this functional. A substantial improvement was obtained using the "correlation term". A color photograph was blurred, and restored with and without the correlation term. When using this term, the resulting restoration is
sharper, and contains less "false colors". Comparing it against the original image shows that the RMS error is about 30% smaller than when restoring each channel separately. We have also used the "correlation term" to solve the "demosaicing" problem, in which one has to reconstruct a color image, given only one color band at each pixel [1, 2]. This was accomplished by incorporating the "correlation term" into the solution to the interpolation problem described above; usually, this also resulted in a reduction of about 30% in the RMS error.
4 Summary
An algorithm was suggested for restoring multi-channel images; it uses the correlation between the different channels to improve results. The algorithm was applied to color images and usually resulted in an improvement of 30% or so in the RMS error, as compared to standard restoration applied separately to each channel.
References

[1] D.H. Brainard. Bayesian method for reconstructing color images from trichromatic samples. In Proceedings of the IS&T Annual Meeting, 1994.
[2] W.T. Freeman and D.H. Brainard. Bayesian decision theory, the maximum local mass principle, and color constancy. In International Conference on Computer Vision, pages 210-217, Boston, 1995.
[3] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6:721-741, June 1984.
[4] B.K.P. Horn and B.G. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203, 1981.
[5] D. Keren and M. Werman. Probabilistic analysis of regularization. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15:982-995, October 1993.
[6] J. Skilling. Fundamentals of Maxent in data analysis. In Maximum Entropy in Action, edited by B. Buck and V.A. Macaulay. Clarendon Press, Oxford, 1991.
[7] R. Szeliski. Bayesian Modeling of Uncertainty in Low-Level Vision. Kluwer, 1989.
[8] D. Terzopoulos. Multi-level surface reconstruction. In A. Rosenfeld, editor, Multiresolution Image Processing and Analysis. Springer-Verlag, 1984.
Colour Eigenfaces

Graham D. Finlayson†, Janet Dueck*, Brian V. Funt*, and Mark S. Drew*
†Department of Computer Science, University of York, York YO1 5DD, email: [email protected]
*School of Computing Science, Simon Fraser University, Vancouver, Canada. {janet,funt,mark}@cs.sfu.ca
Abstract
Images of the same face viewed under different lighting conditions look different. It is no surprise, then, that face recognition systems based on image comparisons can fail when the lighting conditions vary. In this paper we address this failure by designing a new lighting-condition-independent face matching technique. We begin by demonstrating that the colour image of a face viewed under any lighting conditions is a linear transform from the image of the same face viewed under complex (3 lights at 3 locations) conditions. Our new matching technique solves for the best linear transform relating pairs of face images prior to calculating the image difference. For a database of 15 (complexly illuminated) faces and 45 test face images the new matching method delivers perfect recognition. In comparison, matching without accounting for lighting conditions fails 25% of the time.

I. INTRODUCTION
One of the most successful and widely used techniques for face recognition is the eigenface method of Turk and Pentland [9], [8]. The basic idea in that method is that the greyscale images of the same face seen in different circumstances should be quite similar. Recognition takes place by comparing the image of an unknown face with the face images stored in a database. The closest database image identifies the face. Because, in general, images are very large, image matching is very expensive. In order to reduce the matching cost, Turk and Pentland approximated each face image as a linear combination of a small set of basis faces called
eigenfaces. Unfortunately, images of the same face viewed under different lighting conditions rarely look the same, i.e. their shading fields will differ. This problem can be mitigated by viewing each face under a variety of lighting conditions and storing this variation in the face database [1], [3], [5]. The multiple image approach succeeds because each separate image encodes a certain amount of information about the shape of the face; that is, at an implicit level, the multiple image approach is concerned with matching shape. However, it is not clear how the notion of shape can be made explicit. We certainly do not want to solve for shape since, although this can be done [10], highly specialized calibrated conditions are needed. In this paper we show that shape information is easily obtained so long as face recognition is based on colour images. Specifically we show that: the implicit notion of shape is explicitly captured in a single 3-band colour image. This result follows from Petrov's [6] seminal work on the relationship between illumination, reflectance, shape and colour images, in which he demonstrated that, so long as a Lambertian surface is viewed under a complex illumination field (at least 3 spectrally distinct light sources at different locations), the rgb pixel triplets in an image are a linear transform from scene surface normals: colour is a linear transform from shape. In our method each database face is created with respect to a complex illumination field. Face recognition simply involves matching the image of an unknown face to the face database. Each database face image is first transformed (by a linear transform) to best match the image colours of the unknown face. Thereafter, the residual difference is calculated. The database face with the smallest residual difference overall identifies the face.
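The matching step, solving for the best linear transform before computing the residual, can be sketched as follows; a least-squares fit over all rgb pixel triplets is assumed, since the paper does not specify the solver here:

```python
import numpy as np

def match_score(db_face, query_face):
    """Fit the best 3x3 linear map M taking the database face's rgb pixels
    to the query face's, then return the residual difference. Lighting-induced
    linear colour changes are thus factored out before comparison."""
    P = db_face.reshape(-1, 3)                   # N x 3 rgb pixels, database face
    Q = query_face.reshape(-1, 3)                # N x 3 rgb pixels, unknown face
    M, *_ = np.linalg.lstsq(P, Q, rcond=None)    # least squares: P @ M ~= Q
    return float(np.linalg.norm(P @ M - Q))      # residual difference
```

The database face minimizing this score over the whole database identifies the unknown face.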
In line with Turk and Pentland, the cost of matching is reduced by approximating face images using a small number of colour eigenfaces.

II. FACE RECOGNITION USING EIGENFACES

Let us represent an n x n greyscale image by the function I such that I(x, y) denotes the grey-level at location x, y. Suppose we have a database M of m face images: M = {I_1, I_2, ..., I_m}. Face recognition is all about finding the image I_c in M which is closest to some unknown face image I_u. Mathematically we might define a function Φ which takes I_u and M as parameters and returns the closest match I_c:
Φ(I_u, M) = I_c : I_c ∈ M s.t. ‖I_c - I_u‖_d < ‖I_i - I_u‖_d  (i = 1, 2, ..., c-1, c+1, ..., m)    (1)
where ||·||d is a distance measure (usually Euclidean) which quantifies the similarity of two images. To reduce computation a face image I can be represented (approximately) by a linear combination of basis faces (which
Turk and Pentland call eigenfaces):

I ≈ Σ_{i=1}^{n} βi Bi   (2)

where Bi is the ith (of n) eigenface and βi are weighting coefficients chosen to minimize:

||I − Σ_{i=1}^{n} βi Bi||d   (3)
Clearly the error in the approximation defined in (2) depends on the set of eigenfaces used. In general the eigenfaces are selected to minimize the expected residual difference in (3). This is done using standard statistical techniques (e.g. principal component analysis [4]). However, eigenfaces based on other error criteria are sometimes used [7]. Turk and Pentland have shown that a small number of eigenfaces (just 7) renders the error in (3) reasonably small. Denoting eigenface approximations with the superscript ′, the function Φ′ is defined as:
Φ′(Iu, M) = Ic′ : Ic′ ∈ M′ s.t. ||Ic′ − Iu′||d < ||Ii′ − Iu′||d   (i = 1, 2, ..., c − 1, c + 1, ..., m)   (4)
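The scheme of (1)-(4) can be sketched with NumPy: a principal component analysis of the training faces supplies the eigenfaces, images are projected onto the basis to get the coefficients of (2), and matching is a nearest-neighbour search over the coefficient vectors. All function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def eigenfaces(train, n_basis=7):
    """PCA basis for a stack of flattened face images (one row per face)."""
    mean = train.mean(axis=0)
    # principal components of the centred training set
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_basis]                # the n_basis eigenfaces

def project(img, mean, basis):
    """Coefficients beta_i of eq. (2)."""
    return basis @ (img - mean)

def closest_face(unknown, database, mean, basis):
    """Phi' of eq. (4): nearest database face in coefficient space."""
    bu = project(unknown, mean, basis)
    dists = [np.linalg.norm(project(f, mean, basis) - bu) for f in database]
    return int(np.argmin(dists))
```

Because only the short coefficient vectors are compared, each comparison costs O(n_basis) rather than one distance per pixel.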
Because each of Ii′ and Iu′ is defined by just n numbers (the coefficients β in (2)), it is straightforward to show that the cost of each image comparison is proportional to n. Usually n ≪ the number of pixels in an image, so matching is very fast. Turk and Pentland [8] have shown that the function Φ′ suffices for face recognition so long as illumination conditions are not allowed to vary too much.

III. COLOUR AND SHAPE
The light reflected from a surface depends on the spectral properties of the surface reflectance and of the illumination incident on the surface. In the case of Lambertian surfaces (these are the only kind we consider here), this light is simply the product of the spectral power distribution of the light source with the percent spectral reflectance of the surface. Illumination, surface reflectance and sensor function combine together in forming a sensor response:
p^x = (e · n^x) ∫ω S^x(λ) E(λ) R(λ) dλ   (5)
where λ is wavelength, p is the 3-vector of sensor responses (the rgb pixel value), R is the 3-vector of response functions (red-, green- and blue-sensitive), E (assumed constant across the scene) is the incident illumination and S^x is the surface reflectance function at location x on the surface, which is projected onto the corresponding location on the sensor array. The relative orientation of surface and light is taken into account by the dot-product of the surface normal vector n^x with the light source direction e (both these vectors have unit length). Let us denote ∫ω S^x(λ) E(λ) R(λ) dλ as q^x. It follows that (5) can be rewritten as:
p^x = q^x (e^t n^x)   (6)

where t denotes vector transpose (e · n^x = e^t n^x). Now consider that a scene is illuminated by two spectrally distinct light sources at distinct locations. If we denote illumination dependence using the subscripts 1 and 2, then equation (6) becomes:

p^x = q1 e1^t n^x + q2 e2^t n^x   (7)

Assuming k lights incident at x:

p^x = [Σ_{i=1}^{k} qi ei^t] n^x   (8)
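A quick numerical illustration of (8): with three spectrally distinct lights at distinct directions, the bracketed sum is a full-rank 3 × 3 matrix, so the surface normal can be recovered from the rgb triplet by a matrix inverse. The q and e values below are illustrative, not measured.

```python
import numpy as np

# three lights: q_i are sensor-response 3-vectors, e_i unit light directions
q = [np.array([1.0, 0.2, 0.1]),
     np.array([0.2, 1.0, 0.3]),
     np.array([0.1, 0.3, 1.0])]
e = [np.array([1.0, 0.0, 0.0]),
     np.array([0.0, 1.0, 0.0]),
     np.array([0.0, 0.0, 1.0])]

# the bracketed term of eq. (8): sum_i q_i e_i^t
T = sum(np.outer(qi, ei) for qi, ei in zip(q, e))
assert np.linalg.matrix_rank(T) == 3          # full rank when k >= 3

n = np.array([0.0, 0.6, 0.8])                 # a unit surface normal
p = T @ n                                     # the pixel colour, eq. (8)
n_recovered = np.linalg.solve(T, p)           # colour -> shape
```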
So long as k ≥ 3 the term [Σ_{i=1}^{k} qi ei^t] will define a 3 × 3 matrix of full rank. In this case there is a one-to-one correspondence between the colours in an image and the normal field of a scene. Shape and colour are inexorably intertwined. It is important to note that the relationship between surface normal and camera response depends on the reflective properties of the observed surface and the particular set of illuminants incident at a point. Changing the reflectance or the illumination field changes the relationship between surface normal and image colour. Henceforth we will assume that faces are composed of a single colour and that faces are illuminated
by a homogeneous illumination field, and as such a single 3 × 3 matrix relates all surface normals and image colours.

IV. FACE RECOGNITION USING COLOUR EIGENFACES
Let us represent an n × n colour image by the vector function I such that I(x, y) denotes the (r, g, b) vector at location x, y and records how red, green and blue a pixel appears. As before let us suppose we have a database M of m images: M = {I1, I2, ..., Im}. Crucially, we assume that each database face image is created with respect to a complex illumination field and is thus a linear transform from the corresponding normal field. This relationship is made explicit in (9), where Ni(x, y) is a vector function which returns the surface normal corresponding to Ii(x, y). The 3 × 3 matrix relating normal field to image colours is denoted Ti:
Ii(x, y) = Ti Ni(x, y)   (i = 1, 2, ..., m)   (9)
Suppose Iu denotes the image of an unknown face viewed under arbitrary lighting conditions. Clearly,
Iu(x, y) = Tu Nu(x, y)   (10)
Suppose that Ij is an image of the same face (in M). It is unlikely that Tj will equal Tu. However, it is immediate from (9) and (10) that Iu must be a linear transform from Ij:
Tu Tj⁻¹ Ij = Iu   (11)
where ⁻¹ denotes matrix inverse. It follows that a reasonable measure of the distance between a database image Ii and Iu can be calculated as:
||T(Ii, Iu) Ii − Iu||d   (12)
where T(Ii, Iu) is the 3 × 3 matrix which best maps Ii to Iu. In the experiments reported later T() returns the matrix which minimizes the sum of squared errors and is readily computed using standard techniques [2]. Relative to (12) a closeness function Ψ for colour face images can be defined as:
Ψ(Iu, M) = Ic : Ic ∈ M s.t. ||T(Ic, Iu) Ic − Iu||d < ||T(Ii, Iu) Ii − Iu||d   (i = 1, 2, ..., c − 1, c + 1, ..., m)   (13)
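The matrix T(Ii, Iu) in (12) and (13) is an ordinary linear least-squares fit over the rgb pixel triplets of the two images; a sketch (function names are illustrative):

```python
import numpy as np

def best_transform(src, dst):
    """3x3 matrix T minimising sum ||T p_src - p_dst||^2 over all pixels.
    src, dst: (N, 3) arrays, one rgb triplet per row."""
    Tt, *_ = np.linalg.lstsq(src, dst, rcond=None)   # solves src @ Tt ~ dst
    return Tt.T

def residual(src, dst):
    """The distance of eq. (12) after the best 3x3 mapping."""
    T = best_transform(src, dst)
    return np.linalg.norm(src @ T.T - dst)
```

The database image with the smallest residual() against the unknown image then identifies the face, as in (13).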
To reduce the computational cost of computing (13) we represent (in a similar way to the greyscale method) each band of a colour image as a linear combination of basis vectors:
I^a ≈ Σ_{i=1}^{n} βi^a Bi^a   (a = r, g, b)   (14)
where r, g and b denote the red, green and blue colour bands, and the coefficients β^a (a = r, g, b) are chosen to minimize the approximation error. To derive the eigenfaces used in (14) a training set of colour face images is compiled. Each image is split into its 3 component band images and thereafter principal component analysis is performed on the entire band image set. Denoting colour eigenface approximations with the superscript ′, the function Ψ′ is defined as:
Ψ′(Iu′, M) = Ic′ : Ic′ ∈ M′ s.t. ||T(Ic′, Iu′) Ic′ − Iu′||d < ||T(Ii′, Iu′) Ii′ − Iu′||d   (i = 1, 2, ..., c − 1, c + 1, ..., m)   (15)
It can be shown that the cost of calculating (15) is bounded by the square of the number of eigenfaces used: matching costs O(n²) (instead of O(n) for black and white faces).

V. RESULTS
The colour images of 15 people (see Figure 1) viewed under 3 complex illuminations provide a training set for eigenface analysis. We found that 8 eigenfaces provide a reasonable basis set (the approximation in (14) is fairly good). The eigen approximations of the 15 faces viewed under one of the complex illuminations comprise the face database. A further 45 test images were taken: the same faces under 3 more, non-complex, illuminants.
Fig. 1. Colour Face Images

Each test image was compared with each database image using equation (15). The closest database image defines the identity of the face in the test image. We found that all 45 faces were correctly identified (a 100% recognition rate). Importantly, we found that faces were matched with good confidence; on average the second closest database face was at least twice as far from the test image as the correct answer. We reran the face matching experiment in greyscale using Turk and Pentland's original eigenface method. Greyscale images were created from the colour images (described above) by summing the colour bands together (greyscale = red + green + blue). We found that 7 eigenfaces suffice to approximate the training set. As before the face database comprises eigen approximations of each of the 15 faces viewed under a single illuminant. Test images were compared with the face database using (4). We found that only 32 of the faces were correctly identified (a recognition rate of 73%). This is quite poor given that the face database is quite small.

VI. CONCLUSION

Shape and colour in images are inexorably intertwined. A single-coloured Lambertian surface viewed under complex illumination conditions is a linear transform from the surface normal field. It follows that the image of a face observed under any lighting conditions is a linear transform from the same face viewed under a complex illumination field. We use this result in a new system for face recognition. Database faces are represented by colour images taken with respect to a complex illumination field. Matching takes place by finding the linear transform which takes each database face as close as possible to a query image. The closest face overall identifies the face in the query image. To speed computation all faces are represented as a linear combination of a small number of eigenfaces.
Experiments demonstrated that the colour eigenface method delivers excellent recognition. Importantly, recognition performance, by construction, is unaffected by the lighting conditions under which faces are viewed. That this is so is quite significant, since existing methods [9] require the lighting conditions to be held fixed (and fail when this requirement is not met).

REFERENCES
[1] Russell Epstein, Peter J. Hallinan, and Alan L. Yuille. 5±2 eigenfaces suffice: an empirical investigation of low-dimensional lighting models. In Workshop on Physics-Based Modelling, ICCV95, pages 108-116, 1995.
[2] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins U. Press, 1983.
[3] Peter W. Hallinan. A low-dimensional representation of human faces for arbitrary lighting conditions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 995-999, 1994.
[4] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.
[5] Shree K. Nayar and Hiroshi Murase. Dimensionality of illumination manifolds in eigenspace. Technical report, Columbia University, 1995.
[6] A.P. Petrov. On obtaining shape from color shading. COLOR Research and Application, 18(4):236-240, 1993.
[7] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs fisherfaces: recognition using class specific linear projection. In The Fourth European Conference on Computer Vision (Vol I), pages 45-58. European Vision Society, 1996.
[8] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, March 1991.
[9] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 586-591, 1991.
[10] R.J. Woodham. Photometric method for determining surface orientation from multiple images. Optical Engineering, 19:139-144, 1980.
Colour quantification for industrial inspection Maria Petrou and Constantinos Boukouvalas
Department of Electronic and Electrical Engineering, University of Surrey, Guildford, GU2 5XH, United Kingdom
Abstract

In this paper we discuss the application of some of the most recent advances in the psychophysics of colour to the development of a colour grading system capable of replacing the human expert inspector in colour-based quality control of manufactured products. In particular, we discuss the problem of replacing the spectral sensitivity of the electronic sensor with that of the human visual system, so that agreement at sub-grey-level accuracy between the recordings of the electronic and the human sensors can be achieved. We demonstrate our methodology by grading automatically some coloured ceramic tiles previously graded by human experts operating at the threshold of human colour perception.

1. Introduction

The greatest success of vision research has been in developing vision systems that perform a specific task, because by narrowing the field of operation, the quality of the performance can be greatly improved. Visual industrial inspection, however, has not become a matter of routine yet. Several industrial tasks have already been fully automated, but the aspect which seems to present most resistance to the process of automation is that of final product quality control. The reason is that automatic inspection, in order to be acceptable to the manufacturer, has to be at the level performed by trained human inspectors at the peak of their performance. Part of the process of inspection of the final product is the inspection of colour and in particular the categorisation of the products into grades of colour, i.e. into "lots" or "batches". To achieve this automatically, one has to overcome a series of problems:

• The distortion caused to the recorded colour by the temporal variation of the illumination.
Indeed, experiments have shown [1] that while the colour differences one has to detect are of the order of half a grey level (on a full scale of 0 to 255), the temporal variation of illumination, even when it is controlled, can be several grey levels from one inspected object to the next.

• The distortion caused by the spatial variation of illumination. On a flat surface, like a tile, the illumination can vary by as much as 10 grey levels from one end of the object to the other [2].

• The thermal noise in the image capturing device, which can be random with variance of several grey levels.

• The non-linear response of the sensors over the range of colours that might be present on the same object.

• The spectral response of each sensor, which is not a pure delta function and which is clearly different from the spectral response of the sensors used by the human observer whom an automatic system is expected to replace [4].
We have presented elsewhere the methodology for coping with the variations of the illumination, the non-linear responses of the camera and the thermal noise of the devices. Here we present a methodology that uses established results of the psychophysics of vision to cope with the demands of an industrial application, and allows the identification of colour grades that correspond to the threshold of human colour perception and are discriminated by human inspectors working at the peak of their performance. To achieve this, the proposed methodology had to be able to measure colour differences at least one order of magnitude smaller than the various types of noise involved in the process of colour recording. We shall demonstrate our methodology for the particular application of ceramic tile colour grading.

2. Colour Grading and the sensors' responses

The visible part of the electro-magnetic spectrum can be discretized and represented by the values at n equidistant wavelengths. Then the true spectral reflectance of a tile is given by a set of n (unknown) numbers (one for each sample wavelength chosen):

R(λ) = (r1, r2, ..., rn)   (1)
where ri is the reflectance at wavelength λi. Similarly, the spectrum of the illumination used can be represented by:

A(λ) = (a1, a2, ..., an)   (2)
Assume also, that we have three sensors with known spectral sensitivities:
QI(A)
=
Q
=
(A)
(Q11,Q12,...,Qln)
=
(3)
The three sensors will record the following values:

q1 = r1 Q11 a1 + r2 Q12 a2 + ... + rn Q1n an
q2 = r1 Q21 a1 + r2 Q22 a2 + ... + rn Q2n an
q3 = r1 Q31 a1 + r2 Q32 a2 + ... + rn Q3n an   (4)
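Written as matrices, (4) is just q = Q diag(a) r. The sketch below forms such recordings for the electronic sensors and for an illustrative set of "human" sensitivities, then fits the affine map between the two triplets in the least-squares sense along the lines the text goes on to describe. All curves here are random illustrative values, not measured sensitivities.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 31                                    # sample wavelengths
a = rng.uniform(0.5, 1.0, n)              # illuminant samples a_i, eq. (2)
Q_cam = rng.uniform(0.0, 1.0, (3, n))     # electronic sensitivities, eq. (3)
Q_eye = rng.uniform(0.0, 1.0, (3, n))     # illustrative cone sensitivities

R = rng.uniform(0.2, 0.4, (500, n))       # synthetic reflectance 31-tuples

cam = (R * a) @ Q_cam.T                   # eq. (4) for every sample
eye = (R * a) @ Q_eye.T

# least-squares affine fit: eye ~ cam @ M + b
X = np.hstack([cam, np.ones((len(cam), 1))])
coef, *_ = np.linalg.lstsq(X, eye, rcond=None)

# on the training samples the affine map cannot do worse than a constant
rms_affine = np.sqrt(((X @ coef - eye) ** 2).mean())
rms_mean = np.sqrt(((eye - eye.mean(axis=0)) ** 2).mean())
```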
In the above expressions, qi, Qij and ai are known and ri are unknown. As we only know the recordings of the three colour sensors, we have n − 3 degrees of freedom. Ideally, we would like to solve for the unknown reflectances and then blend them again using the sensitivities of the retinal cones to work out what intensities the human sensor would have recorded from the same surface. A straightforward solution to this problem is not possible because it is under-determined, as typically n = 31. We make, however, the following assumption: the transformation between the three intensities recorded by the electronic sensors and the three intensities the human sensors would have recorded is affine. This assumption may not hold for the whole 28-dimensional space. However, as we are interested in surfaces which are very similar to each other, we are really concerned with a very small subspace of the colour space. No matter how complicated the relationship between the electronic and the human recordings is, locally it can always be approximated by a linear transformation. With the help of a spectrometer, we measured the reflectances of some typical tiles at the 31 wavelengths of interest. We then randomly chose hundreds of 31-tuples of intrinsic reflectances that complied with the restrictions of the sensor recordings and were confined to the colour subspace of interest as indicated by the spectrometer. For each one of them we found the signals expected
to be recorded by the electronic and by the human sensors. Thus, we created a large number of corresponding triplets of recordings. We then identified the elements of the affine transformation between the two sets of recordings in the least-square-error sense. This transformation can then be used to predict what the human sensor would have recorded, given what the electronic sensors have recorded. Knowing, however, what the human sensor records is not equivalent to knowing what the human brain sees. There is an extra non-linear process which converts the sensory recordings to perceptions. In Lab coordinates we know that the Euclidean distance between any two points is proportional to the perceived colour difference between the two colours represented by these two points. Thus, after the data have been spectrally corrected and the effects of the spatial and temporal variations of the illumination have been removed as described in [3], they are finally converted into the perceptually uniform colour space Lab, where the colour grading is performed by clustering.

3. Experimental results and Conclusions

The above process was applied to several series of tiles graded by human experts. Figures 1 and 2 illustrate the grading of two sets of uniformly coloured tiles. For the purpose of presentation, tiles classified to the same grade by the human observer are represented by the same symbol. Each tile is represented by its mean values in the Lab system. Panels a show the tiles without the spectral correction proposed here, while panels b show them after the proposed correction. In both panels the orientation of the axes is the same, and we can see that after the proposed correction the clusters identified by the humans become more distinct. This conclusion was confirmed by similar experiments with other sets of tiles.
We conclude by stressing that when the vision system developed has to replace human inspectors operating at the threshold of their vision ability, effects like the one discussed in this paper become significant and have to be taken into account.
Figure 1: Colour Shade Grading of Linz tiles. Tiles represented by the same symbol were classified to the same colour class by human experts.
Figure 2: Colour Shade Grading of Koala tiles.

Acknowledgements

This work was carried out under the ASSIST project, BRITE-EURAM H 5638. We also want to thank Dr. K. Homewood for his help in taking the spectrophotometric measurements.
References
[1] Boukouvalas, C., Kittler, J., Marik, R. and Petrou, M. (1994). "Automatic grading of ceramic tiles using machine vision". Proceedings of the 1994 IEEE International Symposium on Industrial Electronics, Santiago, Chile, pp 13-18.
[2] C. Boukouvalas, J. Kittler, R. Marik & M. Petrou, "Automatic Colour Grading of Ceramic Tiles Using Machine Vision", to appear in IEEE Transactions on Industrial Electronics, February 1997.
[3] C. Boukouvalas, J. Kittler, R. Marik & M. Petrou, "Automatic Grading of Textured Ceramic Tiles", Machine Vision Applications in Industrial Inspection, SPIE 2423, San Jose, 1995.
[4] Wyszecki, G. & Stiles, W. S., "Color Science", 2nd Edition, Wiley, New York, 1982.
COLOUR OBJECT RECOGNITION USING PHASE CORRELATION OF LOG-POLAR TRANSFORMED FOURIER SPECTRA

A L Thornton, S J Sangwine
The University of Reading, England.
Abstract

Knowledge of the rotation, scaling and translation of an object in comparison with a reference object is important in the recognition process. The work described below uses Fourier transforms, log-polar coordinate transformations and phase correlation, together with a complex number representation for colour, to determine these variations and recognise a coloured object.
Introduction

Much research has been conducted into the recognition of objects in monochrome images using frequency domain processing. This, however, ignores the useful information that can be contained in colour representations. The work described in this paper uses a novel colour representation together with a new combination of Fourier and log-polar transforms to make possible colour object recognition with invariance to translation, scaling and rotation. The importance of phase in signals has been shown [1] and this has led to the idea of using phase to locate coloured objects. An established frequency domain technique for locating objects, phase correlation, is described and the advantages of combining the colour representation, the log-polar transform and the phase correlation technique are demonstrated.
Overview

The Fourier transform can be thought of as a translation invariant algorithm, but it will not overcome problems associated with the scaling and rotation of an object in an image. One method to remove these variations is the Fourier-Mellin transform, which has been well documented in the literature [2]. This procedure consists of an FFT followed by a log-polar transform followed by another FFT. The first FFT removes translation variance since the spectrum of an object will be similar no matter where the object is located in the image. The log-polar transform [3,4] reduces rotation and scaling to translations which are then made invariant by the second FFT. To achieve recognition, the result is then correlated with another image which has undergone the same process. However, a disadvantage of this process is that it does not make the best use of the information available, as the result will only determine whether there is a similar object in both images. It would be more useful to be able to quantify the scaling, rotation and translation. The process of phase correlation, which is described below, has therefore been introduced and this is the main novel feature of the work reported in this paper. This processing is inspired by the Fourier-Mellin transform, but is able to quantify the rotation, scaling and translation and recognise object colour. The block diagram of the system is shown in figure 1 and will be discussed later.
Complex Log-Polar Transformation

The log-polar coordinate transformation is a method of sampling an image in such a way that if an object is rotated this causes the log-polar transformed image to move up or down in comparison with a reference image. In a similar manner, if an object is scaled this causes the log-polar transformed image to move right or left in comparison with a reference image. The amount of shift on either axis is indicative of the amount of scaling or rotation undergone by the object of interest. A constraint on the complex log-polar transform procedure is that the object of interest must be near the centre of the image. However, if a Fourier transform is calculated before the coordinate transformation this constraint is overcome, since the data is inherently centred in the spectrum. Thus, by applying a log-polar transformation to the spectrum, we avoid the need to locate the centre of the object of interest.
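A minimal log-polar resampler can be sketched in NumPy with nearest-neighbour sampling; rotation of the input shifts the output along the angle axis, scaling shifts it along the log-radius axis. The grid sizes are arbitrary choices, not values from the paper.

```python
import numpy as np

def log_polar(img, n_r=64, n_theta=64):
    """Log-polar resampling about the image centre (nearest neighbour)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = np.hypot(cy, cx)
    # radii log-spaced from 1 pixel out to the image corner
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    thetas = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    out = np.zeros((n_theta, n_r))
    for i, t in enumerate(thetas):
        ys = np.clip(np.round(cy + radii * np.sin(t)).astype(int), 0, h - 1)
        xs = np.clip(np.round(cx + radii * np.cos(t)).astype(int), 0, w - 1)
        out[i] = img[ys, xs]
    return out
```

Applying this to a Fourier magnitude spectrum, rather than to the spatial image, gives the centring-free behaviour described above.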
(It should be remembered that the rotation and scaling of an object in an image causes rotation and inverse scaling of the components of the spectrum due to the object.)
Figure 1. Translation, Scaling and Rotation Invariance System Diagram

Phase Correlation

If object recognition is to take place, the location of the object in the image must be found. Phase correlation [5,6] is a method for determining the translation of an object between one image and another. The result of the computation produces a peak corresponding to the spatial displacement of the object, which can be used to locate the object in an image. A reference image is compared with another, which we call the object image, by multiplying the FFT of one (G1) by the complex conjugate of the FFT of the other (G2*). The normalised cross-power spectrum is obtained, and from this the phase correlation surface (P) is calculated by taking the inverse Fourier transform (F⁻¹) of the spectrum. Assuming that an object is contained in both the reference and object image, the result is an intensity peak in the phase correlation surface (P) whose position can be used to determine the displacement between the reference and object image. The calculation is shown in eqn. 1.
P = F⁻¹ [ (G1 G2*) / |G1 G2*| ]   (1)
The same method that is used for the phase correlation of intensity images can be used for the phase correlation of colour images by using the IZ colour representation. This method, which has been more thoroughly discussed in [7], uses a complex number, Z, to represent the colour information, where hue is the argument of Z and a value related to saturation is the modulus of the complex number. The intensity, I, is represented separately. Because a complex number containing colour information is used to represent the image, the result of the phase correlation can discriminate between the different colours of similarly shaped objects. The argument of the displacement peak (which is complex) gives an angle whose value corresponds to the difference in colour between the object in the reference image and that in the object image. The advantage of using the complex colour representation is that the colour of the displaced object is calculated as part of the location procedure with no extra processing. If a monochrome image were used in the location procedure, the object would be found but it would not be possible to estimate its colour. Therefore, extra information is gained for no more processing than a monochrome image would require.
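Equation (1) maps directly onto NumPy FFTs, and the same code runs unchanged when G1 and G2 come from complex-valued IZ images; a sketch (the small constant guarding against division by zero is an implementation choice):

```python
import numpy as np

def phase_correlate(g1, g2):
    """Phase-correlation surface P of eqn. 1 and its peak location."""
    G1, G2 = np.fft.fft2(g1), np.fft.fft2(g2)
    cross = G1 * np.conj(G2)
    cross /= np.abs(cross) + 1e-12       # normalised cross-power spectrum
    P = np.fft.ifft2(cross)
    peak = np.unravel_index(np.argmax(np.abs(P)), P.shape)
    return P, peak
```

With g2 a copy of g1 cyclically shifted by (dy, dx), the peak of |P| appears at (−dy, −dx) modulo the image size, so the displacement is read off directly.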
Translation, Scaling and Rotation Invariance
The amount of scaling and rotation between the two images can be determined by combining phase correlation and the log-polar coordinate transformation. The block diagram of figure 1 illustrates the processes involved; the letters within circles in the diagram indicate images which appear as outputs
in figure 2. Some of the processing which is required to implement the block diagram can be performed before the capture of the object image. The reference image can be previously captured and stored, and its FFT, log-polar transform and the FFT required for phase correlation can be calculated. This will save processing time when object images are to be compared to a reference image. As can be seen in figure 1, after each of the two images has undergone a Fourier transform and log-polar coordinate transformation, phase correlation of these two transformed images is calculated. Information about the position of the phase correlation peak can then be used to alter the object image so that scaling and rotation variance is removed. This is a more precise method than iteratively rotating the spectrum by small angles and altering the scaling until the result is found to match the spectrum of the reference image [8]. Once the scaling and rotation differences have been removed, the translation of the object can be found by phase correlation between the original reference image and the corrected object image and, as discussed above, the colour of the object found. The outputs of these processes are shown in figure 2, where each letter indicates at which point in figure 1 the output was obtained.
Figure 2. Outputs from the processes described in figure 1
Figures 2a and 2c show example inputs to the system. Each spatial image is Fourier transformed and a log-polar coordinate transform applied, the outputs of which are shown in figures 2b and 2d. These outputs are then phase correlated so that the rotation and scaling difference between one image and the next can be found from the correlation peak (figure 2e). In this case the peak occurs at (14,18), which corresponds to a rotation of 25° and a scale change of about 0.78. Using this information, one of the spatial images is corrected for rotation and scaling (figure 2f) and the result of this is phase correlated with the untouched spatial input. The resultant correlation peak (figure 2g) indicates the translation of the reference object relative to the object in the second spatial image. In addition, the colour of the second object is found by calculating the argument of the complex correlation peak.
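Converting a log-polar correlation peak such as (14,18) into an angle and a scale factor needs only the sampling geometry of the log-polar grid. The sketch below assumes a uniformly sampled angle axis and a log-sampled radius axis; the grid parameters are assumptions, not values from the paper.

```python
import math

def peak_to_rotation_scale(peak, n_theta=256, n_r=256, r_max=128.0):
    """Map a (theta_bin, r_bin) correlation peak to (degrees, scale).
    Assumes theta sampled uniformly over 360 degrees and radius
    log-sampled from 1 pixel to r_max."""
    t_bin, r_bin = peak
    # shifts beyond half the axis wrap around to negative displacements
    if t_bin > n_theta // 2:
        t_bin -= n_theta
    if r_bin > n_r // 2:
        r_bin -= n_r
    angle = 360.0 * t_bin / n_theta
    scale = math.exp(r_bin * math.log(r_max) / n_r)
    return angle, scale
```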
Conclusion

The research presented in this paper enables the quantification of rotation, scaling and translation between a reference object and another arbitrary image containing the object. The results show that it is possible to do this without having to perform iterative calculations to determine these values. It is also possible to determine whether the object is of the desired colour, due to the colour representation which is used.

References
1. Oppenheim A V, Lim J S, 1981, 'The importance of phase in signals', Proceedings of the IEEE, 69(5), 529-541.
2. Li Y, 1992, 'Reforming the theory of invariant moments for pattern recognition', Pattern Recognition, 25, 723-730.
3. Wilson J C, and Hodgson R M, 1992, 'A pattern recognition system based on models of aspects of the human visual system', IEE 4th Int. Conf. on Image Processing and its Applications, 258-261.
4. Reitboeck H J, and Altmann J, 1984, 'A model for size and rotation invariant pattern processing in the visual system', Biological Cybernetics, 51, 113-121.
5. Kuglin C D, and Hines D C, 1975, 'The phase correlation image alignment method', Proc. IEEE Conf. on Cybernetics and Society, 163-165.
6. Pearson J J, Hines Jr. D C, Golosman S, Kuglin C D, 1977, 'Video-rate image correlation processor', Proc. SPIE Conf. on Application of Digital Signal Processing (IOCC), 119, 197-204.
7. Thornton A L, and Sangwine S J, 1995, 'Colour object location using complex coding in the frequency domain', IEE 5th Int. Conf. on Image Processing and its Applications, Heriot-Watt University, Edinburgh, UK, July 4-6 1995, 410, 820-824, Institution of Electrical Engineers, London, 1995.
8. Lee D-J, Krile T F, Mitra S, 1988, 'Power cepstrum and spectrum techniques applied to image registration', Applied Optics, 27(6), 1099-1106.

Acknowledgment

This research is supported by The University of Reading Research Endowment Trust Fund.
SIIAC: Interpretation System of Aerial Color Images

Salim Mouhoub, Michel Lamure and Nicolas Nicoloyannis
URA 934 - CNRS, Université Claude Bernard Lyon 1, 43, bd du 11 Novembre 1918, 69622 Villeurbanne, France

1. Introduction
In this paper, a general methodology is presented to solve the problem of the interpretation of aerial color images. This problem must be divided into several levels of abstraction corresponding to different classes of methods, or low- and high-level algorithms. In our work, we are particularly interested in the high-level part.

2. General presentation
Given the diversity of the information (color, texture, geometry) used to identify the different objects contained in an image, and the importance of some types of information, such as color knowledge, which demand a particular process, we preferred to adopt a strategy based on the blackboard structure [1]. We thus associated a knowledge source (KS), or "specialist," with every type of information. These specialists cooperate around a common facts base, called the blackboard, which contains all the data concerning the image. In SIIAC, the control is realized by the blackboard's monitor. SIIAC is constituted of three main parts: the knowledge sources, the blackboard and the control. Its general architecture is presented in Fig. 1.
[Figure: the KS "Relaxation", KS "Color" and KS "Texture" knowledge sources arranged around the blackboard, with the control module; the legend distinguishes control flow from data flow]
Fig. 1 SIIAC architecture diagram.
2.1 The control
In SIIAC, the identification of an area is realized through the cooperation of the different KSs. These KSs are divided into two classes: - The first class contains the low level KSs (KS "Color," KS "Texture" and KS "Geometry"). These KSs assign labels to the different areas of the image according to their low level features, taking into account the geometry, the color and the texture of each area under consideration. - The second class is represented by the KS "Relaxation". This KS uses the spatial arrangements of the different areas of the image in order to reduce the number of labels assigned to each area. The KS "Relaxation" is based on a constraint propagation system (discrete relaxation) which constructs a consistent labeling of the areas. 2.2 The knowledge sources
The knowledge sources contain two parts: a condition part and an action part. The condition part specifies the conditions under which the KS applies, and the action part contains the knowledge of the abstraction level for which this source is intended. The knowledge sources read and write information in the blackboard; they do not communicate directly
between them but through the common facts base. The SIIAC system consists of four knowledge sources: the KS "Color," the KS "Texture," the KS "Geometry" and the KS "Relaxation". In the following, we describe in detail the KS "Color" and the KS "Relaxation", which are the most important KSs in SIIAC. 2.2.1 The KS "Color" To define the color, we took four radiometric parameters: the minimum, the maximum, the average and the variance of gray levels in the three bands R, G, B (min_ng, max_ng, ave_ng and var_ng). In order to construct the color recognition rules, we used 600 representative area samples distributed into six groups. These groups have the following denominations: "clear roof," "dark roof," "brown roof," "tarmac," "vegetation," "shadow." Each group contains 100 areas. We note that every group is identifiable by a color and, vice versa, every color corresponds to a group. We have therefore, in all, six different colors. For every area we calculated the parameters cited above in the three bands. We then extracted, for every color, the confidence intervals corresponding to the three basis components R, G, B. For example, the 12 confidence intervals corresponding to the clear red color are:
           Ave_ng          Min_ng         Max_ng         Var_ng
  Red      [51. , 66.4]    [41. , 54.]    [61. , 86.]    [0.00543 , 0.123]
  Green    [60. , 69.]     [44. , 55.]    [76. , 92.]    [0.079 , 0.1616]
  Blue     [52.5 , 63.]    [41. , 55.]    [63. , 78.]    [0.05825 , 0.124]
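The interval test implied by these confidence intervals can be sketched in code: an area matches a colour when, for a given parameter, its value in each band falls inside that colour's interval. This is a hedged illustration; the function name, the dictionary encoding and the test area values are assumptions (the intervals are the clear red Ave_ng values from the table above).

```python
# Illustrative parallelepiped membership test: the area's average gray
# levels in R, G, B must all fall inside the colour's confidence intervals.
CLEAR_RED_AVE = {"R": (51.0, 66.4), "G": (60.0, 69.0), "B": (52.5, 63.0)}

def matches(area_ave, intervals):
    """area_ave: {"R": value, "G": value, "B": value} for one parameter."""
    return all(lo <= area_ave[band] <= hi
               for band, (lo, hi) in intervals.items())

area = {"R": 58.0, "G": 64.0, "B": 55.0}
is_clear_red = matches(area, CLEAR_RED_AVE)
```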
Where Ave_ng, Min_ng, Max_ng and Var_ng are respectively the average, the minimum, the maximum and the variance of the gray levels. We note that to every parameter correspond three intervals; these constitute a parallelepiped in the R3 representation space. In order to construct our recognition rules, we adopted a strategy based on production rules, bearing in mind that for every parameter (average, minimum, maximum and variance of the gray levels) a color is represented by a point in the R,G,B representation space. We have therefore associated with every parameter, and for every color, three rules (corresponding to the three basis components R, G and B). The following is an example of a rule (integrating only the average of gray levels) which permits the recognition of the clear red color:

If R_ave_ng of the area >= 51. and R_ave_ng of the area <= 66.4
and G_ave_ng of the area >= 60. and G_ave_ng of the area <= 69.
and B_ave_ng of the area >= 52.5 and B_ave_ng of the area <= 63.
then the colour of the area is clear red.

The qualified line directions

V_i,HS^s ,   V_i ∈ A_m(D)    (3)
are defined as those corresponding to parts of the mth ATN-TM element in the HS. The index s in Equation (3) denotes the "strength" of the qualified line direction in the HS domain. Furthermore, a SLS is defined for each of the selected line directions according to (i) the projection in the image of the associated ATN-TM element line, and (ii) SLS length information formulated in this specific direction during the HT "voting" process. ATN-TM knowledge is also instrumental in producing ATN related pairs of SLSs. Those SLSs corresponding to parts of a specific ATN-TM element are then coupled together, forming pairs of SLSs which are likely to correspond to ATN elements. This is possible since ATN elements are described by parallel SLSs whose lengths, and the distances between them, are predetermined according to international airport design standards [5,6]. Also, SLSs in the image domain should have opposite edge directions. These spatial length, width and edge direction relationships represent general ATN structural characteristics and manifest themselves in terms of system constraints. However, because these constraints are violated when perspective
effects are present [3,4], the system applies them in a Cartesian world co-ordinate system. This is achieved after "backprojecting", with the aid of INS/GPS, DTEM and CCDCD data, extracted SLSs from the image to this co-ordinate system. Specifically, SLS pairs are tested for compatibility with the available general ATN structural characteristics as defined both in the world co-ordinate system and in the image. Thus, a number of constraint rules are used by the proposed system: - In the world co-ordinate system: (i) SLSs are parallel, (ii) there is an overlap in the projections of the SLSs onto a common direction, (iii) their lengths are within a predefined range, (iv) the distance defined between the directions of the two SLSs is within a certain range, (v) the aspect ratio of a hypothetical parallelogram that is formed by the two SLSs is much greater than 1. - In the image domain: (vi) the two SLSs have opposite edge directions. The system then proceeds with qualified pairs of SLSs which correspond to the same ATN-TM element and consistently appear in consecutive image frames. These are examined in order to select the "best" pair that will finally represent a particular ATN-TM element. A distortion measure is defined for this purpose over f successive image frames using n qualified pairs P_k of SLSs, which takes into account the structural characteristics, in the world co-ordinate system, of a candidate pair P_i of SLSs:
DM(P_i) = (1 / 4n) · Σ_{k=1}^{n} [ Σ_{j=1}^{3} w_j · |p_i^j − p_k^j| + w_4 · d(p_i^4, p_k^4) ]    (4)
Index j in Equation (4) denotes the length (j=1), width (j=2), orientation (j=3) and the position of the centroid (j=4), in the world co-ordinate system, of the previously mentioned parallelogram; d(p_i^4, p_k^4) is the Euclidean distance between the ith and kth pair centroids, and the w_j are weights associated with the above structural characteristics. The pair of SLSs in a given frame which results in the minimum distortion measure is thus selected to represent the ATN SLSs in all the f frames of the input video sequence. 3. COMPUTER SIMULATION RESULTS AND CONCLUSIONS
The proposed model-based approach has been tested using real airport aerial image sequences, each containing three ATN elements (MR, CRA, CRB) of different contrast and importance in terms of landing an aircraft. Detection and False Alarm rates measured in these cases on a "per frame" basis, i.e. without utilising the distortion measure defined by Equation (4), are illustrated in Figure 1.a. The system has the ability to locate ATN elements even in cases where the quality of the images is particularly poor and where inexperienced observers have difficulties in correctly identifying these elements, i.e. all ATN elements in the Airport02 sequence and the CRB element of the Airport04 sequence. Notice that the overall system performance depends on the accuracy of the image independent information, particularly that of the INS/GPS data and the CCD camera parameters. The robustness of the system with respect to the above data has been examined with the Airport02 sequence, where both the INS/GPS and camera parameters were highly corrupted. In this case the system can still identify, for most of the time, the main runway (MR) and provides low False Alarm rates. Figure 1.b illustrates ATN element detection rates for the more accurate aerial video sequence case of the Airport04 sequence, as a function of the number of frames f used in the verification process. This multi-frame scheme offers zero False Alarms and even higher detection rates, when compared to the "per frame" case. Medium and high contrast ATN elements, such as the MR and CRA, result in reasonably high detection rates when the number of frames f used by the multi-frame scheme is f > 5. However, in this case the detection of low contrast ATN elements, like CRB, is poor due to inconsistencies in detecting parts of this structure throughout a large number of successive frames.
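The distortion measure of Equation (4) can be sketched directly from its definition. In this hedged illustration, each pair is described by a (length, width, orientation, centroid) tuple and the weights are arbitrary assumptions, not the values used by the system.

```python
import math

# Sketch of Equation (4): average, over the n qualified pairs, of weighted
# absolute differences in length/width/orientation (j = 1..3) plus the
# weighted Euclidean distance between pair centroids (j = 4).
def distortion(candidate, pairs, w=(1.0, 1.0, 1.0, 1.0)):
    """candidate, pairs[k]: (length, width, orientation, (cx, cy))"""
    n = len(pairs)
    total = 0.0
    for pk in pairs:
        total += sum(w[j] * abs(candidate[j] - pk[j]) for j in range(3))
        total += w[3] * math.dist(candidate[3], pk[3])
    return total / (4 * n)

# The "best" pair is the one minimising the measure over all candidates.
pairs = [(3000.0, 45.0, 0.00, (0.0, 0.0)),
         (3010.0, 44.0, 0.01, (2.0, 1.0))]
best = min(pairs, key=lambda p: distortion(p, pairs))
```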
In addition to the above typical ATN detection performance, experiments were also carried out in order to determine the accuracy of the system in correctly estimating the structural characteristics of detected ATN elements. Thus the Mean Absolute Differences (MADs) were measured for ATN elements which are aligned with (i.e. the MR ATN element) or perpendicular to the flight path of the aircraft (i.e. the CRA ATN element). The figures quoted below were measured with the aircraft being within the range of 3420 to 2960 meters from the airfield's reference point. In the Airport04 sequence the minimum MAD orientation
FIGURE 1. (a) ATN elements detection and False Alarms rates measured in the "per frame" case, for three ATN elements in two input video sequences. (b) ATN elements detection rates of the proposed multi-frame system, for the same ATN elements in the Airport04 sequence, as a function of the number of frames f used. No False Alarms are observed in this case.
observed was 0.36° (MR case) and the maximum 3.6° (CRA case). A MAD length of 3.5 meters was measured for the MR case, which corresponds to a 0.2% relative length error. For the CRA element, the MAD length was 15.5 meters, which corresponds to a 1.2% relative length error. The maximum absolute width difference observed was 28 meters, at a distance of 3380 meters from the reference point, and the MR and CRA MAD widths were 5.3 and 2.6 meters, resulting in relative width errors of 11.4% and 5.6%, respectively. These performance characteristics are typical of the system operating with corrupted image independent information. Notice that the fidelity of this information is the enabling factor that allows the system to operate at the maximum required performance in critical applications, such as the autonomous landing of aircraft.
ACKNOWLEDGEMENTS This work was supported by the Military Aircraft Division of British Aerospace (Defence) Ltd. and the Engineering and Physical Sciences Research Council (EPSRC).
REFERENCES
[1] Schell, F.R. and Dickmanns, E.D. "Autonomous Landing of Airplanes by Dynamic Machine Vision", Proc. IEEE Workshop on Applications of Computer Vision, Nov./Dec. 1992.
[2] Tang, Y.L., Devadiga, S., Kasturi, R. and Harris Sr., R.L. "Model-Based Approach for Detection of Objects in Low Resolution Passive Millimeter-Wave Images", Proc. SPIE: Image and Video Processing II, Vol. 2182, Feb. 1994, pp. 320-330.
[3] McGlone, J.C. and Shufelt, J.A. "Projective and Object Space Geometry for Monocular Building Extraction", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 1994, pp. 54-61.
[4] Jaynes, C., Stolle, F. and Collins, R. "Task Driven Perceptual Organization for Extraction of Rooftop Polygons", Proc. 23rd ARPA Image Understanding Workshop, Vol. I, Nov. 1994, pp. 359-368.
[5] Horonjeff, R. and McKelvey, F.X. "Planning and Design of Airports", 4th ed., McGraw-Hill Inc., 1994.
[6] International Civil Aviation Organisation "Aerodrome Design Manual, Part 1: Runways", 2nd ed., Canada, 1984.
[7] Leavers, V.F. "Which Hough Transform?", CVGIP: Image Understanding, Vol. 58, No. 2, Sept. 1993, pp. 250-264.
[8] Sarantis, D. and Xydeas, C.S. "A Methodology for Detecting Man-Made Structures in Sequences of Airport Aerial Images", Proc. Int. Conf. on Digital Signal Processing, Cyprus, Vol. 2, June 1995, pp. 565-570.
[9] Bryson, N.F. "FuseNTS - Fusion of Navigation, Terrain, and Sensor Data: Phase I: Work Package W2 - Model-Based Feature Analysis", Technical Report, School of Engineering, Division of Electrical Engineering, University of Manchester, UK, May 1993.
[10] Mahalanobis, P.C. "On the Generalized Distance in Statistics", Proc. National Inst. of Science of India, Vol. II, No. 1, April 1936, pp. 49-55.
[11] Smith, R.C. and Cheesman, P. "On the Representation and Estimation of Spatial Uncertainty", Int. Journal of Robotics Research, Vol. 5, No. 4, Winter 1986, pp. 56-68.
CONTEXT DRIVEN MATCHING IN STRUCTURAL PATTERN RECOGNITION
S. Gautama and J.P.F. D'Haeyer
Vakgroep Telecommunicatie en Informatieverwerking, Universiteit Gent
ABSTRACT In this paper we examine the problem of structural pattern recognition using graph structures. To speed up the correspondence problem, we propose a histogram technique which characterizes the context of a primitive within a pattern and allows indexing in the model database with polynomial complexity. INTRODUCTION Current research on image understanding in real-world applications is dominated by knowledge-based systems, where knowledge ranging from low-level image processing procedures to high-level image interpretation is gathered and programmed into an expert system [1,2]. The disadvantage of such a system is that it becomes highly application dependent, making the redesign of an existing system for a new application impracticable. In environments where expert knowledge is hard to formalize, or where the size of the problem can benefit from automation, efficient use could be made of automated learning tools during the design. In this paper we examine a representation, generated using basic primitives, and an iterative matching technique which can make efficient use of this representation to guide the recognition process. As it applies to probabilistic graph structures, the method serves as a basis for the incorporation of incremental learning. The object models and the scene that needs to be interpreted are described by primitives and relationships between these primitives. They are mathematically represented by a (probabilistic) hypergraph structure in which n-ary relations are represented by a hyperedge connecting the n primitives in the argument list. These hyperedges, encoding topological and geometrical relations, contain important information that is needed to constrain the large space of possible mappings between primitives.
To restrict the number of relations that are generated, a neighbourhood system is imposed on the scene. In this way only relations between a primitive and its nearest neighbours are allowed, reducing the size of the scene hypergraph. Within a neighbourhood, relations are measured, after which they are passed through a quantifier, generating a discrete set of 'relation labels'. Thus, after preprocessing the scene, each scene hyperedge carries a single label. No use is made of unary measurements on the primitives, other than the midposition, which determines the neighbourhood. Object models are induced from object instances which are generalised into a probabilistic hypergraph. Model hyperedges contain a probability distribution of labels, to capture the variability in shape of the object instances. The model primitives contain the mean midposition of the corresponding instance primitives, defining the neighbourhood system over the set of model primitives. To solve the correspondence problem, the notion of context histogram is introduced [3]. This histogram, calculated for each primitive, gathers the occurrence frequencies of the quantified relation labels in the support set of a target primitive. The support set is the set of relations (hyperedges) that contain the target primitive in their argument list. The characterisation by means of the support set bears resemblance to the Q-coefficient used in probabilistic relaxation techniques [4]. A histogram however, while increasing the memory requirements, does allow a more detailed characterisation than a single coefficient, meaning more complex similarity measures can be used.
CONTEXT DRIVEN MATCHING In this section, definitions and mathematics are introduced that form the base of the recognition process. Attributed hypergraphs are used as the representation for higher-order structural patterns. An attributed hypergraph I, defined on a set of labels Λ, consists of two parts: 1) H, which denotes the structure of hyperedges, and 2) A: H → Λ, which describes the attribute values of the hyperedge set. A hyperedge of order v with index k is denoted as I_k^v. Primitives in the hypergraph correspond to hyperedges of order 0 and are notated I_k, dropping the superscript to ease the notation.
A random hypergraph M represents a random family of attributed hypergraphs, thereby serving as a model description which captures the variability present within a class. It consists of two parts: 1) H, which denotes its structure, and 2) P: H × Λ → [0,1], which describes the random elements. Associated with each possible outcome I of M and graph correspondence T: I → M there is a probability P(I ≺ M_T) of I being an instance of M through T. Correspondence between a scene primitive I_k and a model primitive M_Tk proceeds by comparing the support sets of both primitives. The support set S of a primitive I_k is defined as the set of hyperedges that contain I_k in their argument list: S(I_k) = { I_t^v : I_k ∈ Arg(I_t^v) }, where Arg(I_t^v) denotes the argument list of the hyperedge I_t^v. Built over the support set is the context histogram, which is used to characterize scene and model primitives. For a scene primitive I_k and label α, the context histogram gathers the occurrence frequencies of the label α in the support set of I_k and is defined as:

C(I_k, α) = ( Σ_{I_t^v ∈ S(I_k)} [A(I_t^v) = α] ) / |S(I_k)|
The denominator normalises the total mass of the context histogram to unity. Calculated on a random hypergraph, a context histogram is defined as containing the expected occurrence frequencies of the labels, modified by a hedge function F which encodes prior knowledge of the correspondence between scene and model primitives:
C(I_k ≺ M_Tk, α) = ( Σ_{M_t^v ∈ S(M_Tk)} P(A(M_t^v) = α) · F(I_k ≺ M_Tk, α, M_t^v) ) / |S(M_Tk)|
The hedge function weights the contribution of each hyperedge within the support set of the model primitive, by taking into account the support that the primitives in the argument list of the hyperedge receive. This is modeled after the Q-coefficient in probabilistic relaxation. For binary relations this coefficient is expressed as:
Q(I_k ≺ M_Tk) = Π_{I_{k,t}^1 ∈ S(I_k)} Σ_{M_{Tk,Tt}^1 ∈ S(M_Tk)} P(A(M_{Tk,Tt}^1) = A(I_{k,t}^1)) · P(I_t ≺ M_Tt)
where the subscripts in the first order hyperedges I_{k,t}^1 denote their arguments. This function can be viewed as calculating the probability of occurrence of the context vector { A(I_{k,t}^1) : I_{k,t}^1 ∈ S(I_k) } in the support set of the model segment M_Tk, where the scene graph is taken as an AND-graph, while the model graph is taken as an OR-graph with independent and mutually exclusive primitives. Each occurrence probability P(A(M_{k,t}^1) = A(I_{k,t}^1)) is additionally weighted with the support of its argument P(I_t ≺ M_Tt). For first order hypergraphs, the hedge function F is taken as:
F(I_k ≺ M_Tk, α, M_{Tk,Tt}^1) = max_{I_l^1 ∈ S(I_k)} P(I_l ≺ M_Tl)
Similarity between a scene primitive I_k and a model primitive M_Tk is defined as:

S(I_k, M_Tk) = Σ_α min( C(I_k, α), C(I_k ≺ M_Tk, α) )

which can be used again as the prior estimation, thereby establishing an iterative recognition scheme. Figure 1 illustrates the basic elements of the process.
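The context histogram and the histogram-intersection similarity defined above can be sketched for the first order (binary relation) case. This is a hedged illustration: the encoding of hyperedges as (primitive, primitive, label) triples and the toy data are assumptions, not the paper's implementation.

```python
from collections import Counter

def context_histogram(primitive, hyperedges):
    """Occurrence frequencies of relation labels in the support set of
    `primitive`; hyperedges are (primitive_a, primitive_b, label) triples."""
    support = [e for e in hyperedges if primitive in e[:2]]
    counts = Counter(label for _, _, label in support)
    total = len(support)
    # normalise the total mass of the histogram to unity
    return {a: c / total for a, c in counts.items()}

def similarity(scene_hist, model_hist):
    # histogram intersection: sum over labels of the minimum bin value
    labels = set(scene_hist) | set(model_hist)
    return sum(min(scene_hist.get(a, 0.0), model_hist.get(a, 0.0))
               for a in labels)

edges = [("p1", "p2", 3), ("p1", "p3", 3), ("p1", "p4", 5)]
h = context_histogram("p1", edges)   # {3: 2/3, 5: 1/3}
s = similarity(h, {3: 0.5, 5: 0.5})  # min(2/3, 0.5) + min(1/3, 0.5) = 5/6
```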
Figure 1 Illustration of the construction of the context histogram for scene and model primitive
To illustrate the technique, we examine the recognition of crossroad structures within a digital image. Fig. 2a presents part of the city of Ghent, generated using CorelDraw, after which it has been segmented into line segments. The line segments form the basic primitives of the representation. Binary relations are generated using the relative angle between line segments, resulting in a first order hypergraph. The neighbourhood of a primitive is set to a region within radius 25 (image size = 512x512) and the quantisation level is set to 8, resulting in 8 discrete relation labels. No use is made of unary measurements to characterize a line segment, as the matching process relies solely on the information offered by the angle relations. The model is an extract from the scene which has to be localized. Model and scene graph representations are generated independently from each other. Recognition is more complex than a 2-class problem (i.e. structure and scene noise), since each primitive within a structure needs to be correctly mapped onto the corresponding primitive. After two iterations, placing a threshold of 50% on the match probability to suppress scene noise and retaining the best match, the results are summarised in table 1 (exp. 1). The scene primitives that pass the threshold (i.e. that are recognized as being model primitives) are highlighted in fig. 2a. Fig. 2b shows the unthresholded correspondence map after two iterations, which presents the match probabilities of the scene primitives (horizontal axis) onto the model primitives (vertical axis).
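The relation generation step just described, where the relative angle between two line segments is quantised into 8 discrete labels, might be sketched as follows. The function names and the mapping of angles to label indices are illustrative assumptions; only the "relative angle, 8 quantisation levels" idea comes from the text.

```python
import math

def angle(seg):
    """Undirected line orientation of a segment, in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def relation_label(seg_a, seg_b, levels=8):
    """Quantise the relative angle between two segments into `levels`
    discrete relation labels (relative angle folded into [0, pi/2])."""
    rel = abs(angle(seg_a) - angle(seg_b))
    rel = min(rel, math.pi - rel)
    return min(int(rel / (math.pi / 2) * levels), levels - 1)

perp = relation_label(((0, 0), (1, 0)), ((0, 0), (0, 1)))  # perpendicular
para = relation_label(((0, 0), (1, 0)), ((5, 5), (9, 5)))  # parallel
```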
Figure 3 (a) scene with recognized primitives highlighted, (b) correspondence map, (c) model 1, (d) model 2, (e) original image
Fig. 3e presents a digitized image of a roadmap of Bologna. The result after initial segmentation into line segments is shown in fig. 3a, containing 214 segments. Two structures need to be identified in the image, fig. 3b and 3c, containing resp. 24 and 20 segments. The same conditions hold as for experiment 1 and the results are summarised in table 1 (exp. 2).

                         Experiment 1        Experiment 2
                         Total     %         Total     %
  model segments
    correct mapping       27      62.8        30      68.2
    wrong mapping          1       2.3         1       2.3
    missing segments      15      34.9        13      29.5
  scene noise
    false alarm            1       0.5         0       0
    suppressed noise     213      99.5       204     100

Table 1 Summary results of structural matching
CONCLUSIONS We have presented a new iterative matching technique based on a histogram of structural context information. Experiments show a good noise suppressing ability, while retaining adequate recognition results with minimal false alarms. Since scene primitives are structurally mapped onto the model, orientation and scale can be hypothesised from a match, whereas the model can be used to direct a search for missing information, thereby improving or ignoring the match. This will be the subject of further work.
1 V. Hwang, L. Davis, T. Matsuyama, "Hypothesis integration in image understanding systems," Computer Vision, Graphics and Image Processing, CVGIP 36, 1986, pp. 321-371.
2 J. Van Cleynenbreugel, "Tapping multiple knowledge sources to delineate road networks on high resolution satellite images," Master's thesis, KUL, 1992.
3 S. Gautama, J.P.F. D'Haeyer, "Automatic induction of relational models," in Hybrid Image and Signal Processing V, Proc. SPIE vol. 2751, 1996, pp. 253-263.
4 W.J. Christmas, J. Kittler, M. Petrou, "Structural matching in computer vision using probabilistic relaxation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 8, 1995, pp. 749-764.
An Efficient Box-Counting Fractal Dimension Approach for Experimental Image Variation Characterization
Aura Conci 1,2, Claudenize F. J. Campos 2
1 Comp. Apl. e Automação - CAA - Pós-Grad. Eng. Mecânica - PGMEC - UFF, CEP 24210-240, Niterói, RJ, Brazil -
[email protected]
2 Dep. Eng. Mecânica - Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, CEP 22953-900, Rio de Janeiro, RJ, Brazil -
[email protected]
Abstract Many applications of fractal concepts rely on the ability to estimate the fractal dimension (FD) of objects. FD is an attempt to quantify how densely a fractal occupies the space in which it lies. This characteristic has been used in texture classification, segmentation and other problems. An efficient algorithm to estimate the FD of images is proposed in this paper. We suggest its use to identify on line image deviations from a standard pattern. We report on some experiments on textile failings and a comparison with four other methods. Introduction The FD of a set A in Euclidean n-space can be derived from

1 = N_r · r^FD,   or   FD = log(N_r) / log(1/r)    (1)

where A is the union of N_r non-overlapping copies of itself scaled down by a ratio r. However, it is difficult to compute FD using these equations directly. Peleg et al. [1] extended FD to images, which can be viewed as a terrain surface whose height is proportional to the image gray value (figure 1). The reticular cell counting estimator was proposed by Gangepain and Roques-Carmes [2], but this estimator cannot be used when the range of the actual FD of an image is 2.0 - 2.5. Keller et al. [3] proposed an approach which presents satisfactory results up to FD = 2.75. Pentland [4] suggested a method of estimating FD using the Fourier power spectrum of the image intensity surface; such a method gives satisfactory results but, since the Fourier transformation computation is included, it is slower than the others. Sarkar and Chaudhuri [5] described an efficient approach, named Differential Box-Counting (DBC), that uses differences in computing N_r and gives satisfactory results over the whole range of FD. The N_r in the DBC method is counted in a different manner from the other box-counting methods [6]. Consider that the image of MxM pixels has been partitioned into grids of sxs pixels and scaled down to r = s/M. If G is the total number of gray levels then G/s' = M/s. On each grid there is a column of boxes of size sxsxs'.
Assign numbers 1, 2, ..., n to the boxes as shown in figure 1. If the minimum gray level of the image in the grid (i,j) falls in box number k, and the maximum gray level of the image's (i,j)th grid falls in box number l, then in the DBC approach

n_r(i,j) = l - k + 1    (2)

is the contribution of n_r from the grid (i,j). Taking contributions from all grids,

N_r = Σ_{i,j} n_r(i,j)    (3)

Then the FD can be estimated from the least squares linear fit of log(N_r) x log(1/r), where N_r is counted for different values of r and s. DBC Modification Although the DBC method gives a very good estimate of FD, some simplifications in the computations and improvements in efficiency are possible using the following modifications of the original method. If a set A ⊂ R^3 is covered by just-touching boxes of side length (1/2)^n, equation (1) can be rewritten as

FD = lim_{n→∞} log(N_n) / log(2^n)    (4)
where N_n denotes the number of boxes of side length (1/2)^n which intersect the set A [6]. In our proposed algorithm, the division of the image into boxes of different lengths is processed in a new manner with respect to other box counting variations [6] and the original DBC method [5]. Consider an image of size MxM pixels; we take M to be a power of 2 and take the range of light intensity to be integers from 0 to 255. All images are enclosed in a big box of size MxMx256. We consider the image divided into boxes of side length nxnxn' for n = 2, 4, 8, ..., 2^m and n' = 2, 4, ..., 2^m' for each image subdivision. N_n is counted as

N_n = Σ_{i,j} n_n(i,j),    (5)
n_n = int(Gray_max(i,j)/n') - int(Gray_min(i,j)/n') + 1

where int(..) is the integer part of a division. These changes make the implementation faster and simpler than the original DBC algorithm. The image file is read only once; in the first division of the image into boxes, the bitmap of MxM pixels need not be saved: when the image is read we save only two matrices of M/2xM/2, Gray_max and Gray_min (saving MxM/2). This first calculation of n_n using equation (5) corresponds to dividing the image into boxes of 2x2 pixels. For boxes of 4x4 pixels there will be M/4xM/4 elements in Gray_max and Gray_min, and each new element (i_new, j_new) is obtained by consulting only the four elements (i,j), (i+1,j), (i,j+1) and (i+1,j+1) of the Gray_max and Gray_min matrices. If the algorithm begins at i = j = 0, in each new iteration the Gray_max and Gray_min matrix elements (i_new = i/2 and j_new = j/2), for the next division of the image, can be saved in the same space. Then, using (4), we estimate FD from the mean of log(N_n)/log(2^n). It can easily be shown [5] that the computational complexity of the other approaches, including the original DBC, is much higher than that obtained using the above suggested modifications.
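The counting scheme above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it uses a least-squares fit of log(N_n) against log(1/r) rather than the in-place Gray_max/Gray_min storage trick, and it assumes n' = n·256/M so that the MxMx256 box is divided evenly; the flat test image is synthetic.

```python
import math

def dbc_fd(img):
    """Differential box-counting FD estimate for an M x M gray image
    (M a power of 2, gray levels 0..255), per Equations (4)-(5)."""
    M = len(img)
    points = []
    n = 2
    while n <= M // 2:
        np_ = max(1, n * 256 // M)  # box height n' (assumption: G/n' = M/n)
        Nn = 0
        for i in range(0, M, n):
            for j in range(0, M, n):
                block = [img[i + di][j + dj]
                         for di in range(n) for dj in range(n)]
                # n_n = int(gray_max / n') - int(gray_min / n') + 1
                Nn += max(block) // np_ - min(block) // np_ + 1
        points.append((math.log(M / n), math.log(Nn)))  # (log(1/r), log Nn)
        n *= 2
    # least-squares slope of log(Nn) against log(1/r) estimates FD
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

flat = [[128] * 16 for _ in range(16)]
fd = dbc_fd(flat)  # a flat intensity surface should give FD close to 2
```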
Experiments
The proposed DBC modifications have been used in some experiments. Our first goal is to examine the accuracy of the approach for FD estimation, so we first use only images having known fractal dimensions. Test data for this first group of experiments came from synthetic textures (9 Brownian images and 9 Takagi surfaces generated on a 256x256 grid with 256 gray levels and FD varied from 2.1 to 2.9 in steps of 0.1 - not reported here) [7]. For experiments with natural images we took Brodatz's textures (figure 2 and table 1). The fact that our modifications return accurate values on images with known dimension motivates the question of whether FD can be used to identify variations in images. The remainder of our experiments investigates this possibility. Experiments on estimating changes in images are shown in figures 3 and 4. These figures represent fabrics with different patterns and kinds of defect. As can be seen, slight changes in the image modify the FD computed for each image.
Figure 1 - Determination of n_r or n_n.
Table 1 - FD of natural textures (image numbers correspond to Brodatz's book)

  image   DBC modif.   DBC [5]   Pentland [4]   Peleg et al. [1]   Keller et al. [3]
  D04     2.66         2.66      2.55           2.72               2.68
  D05     2.45         2.45      2.38           2.52               2.57
  D09     2.58         2.58      2.49           2.65               2.65
  D24     2.44         2.44      2.46           2.59               2.57
  D28     2.55         2.55      2.48           2.61               2.62
  D55     2.48         2.48      2.37           2.60               2.59
  D68     2.53         2.53      2.44           2.63               2.60
  D84     2.61         2.61      2.47           2.68               2.65
  D92     2.50         2.50      2.38           2.59               2.59
Figure 2 - Brodatz's natural textures (FD in table 1)
FD=2.60 (top) and 2.62 (bottom); FD=2.51 (top) and 2.53 (bottom); FD=2.59 (top) and 2.57 (bottom). Figure 3 - Usual textile imperfections on drill (left), cotton (centre) and carpet (right).
Figure 4 - Jeans without defects, FD=2.43 (top left); stained jeans, FD=2.41 (top centre); and non-uniform dye, FD=2.47 (top right). Silk without imperfections (bottom left), FD=2.18; with a single imperfection (bottom centre), FD=2.09; and with many imperfections (bottom right), FD=2.40. Conclusions
The main goal of this paper is to present a simple approach to computing FD on images. Elementary experiments demonstrated that variations between an original image pattern and its reproductions can affect the respective FD. A statistical analysis should be carried out, related to specific classes of images, in order to assess the applicability of FD as a means of finding imperfections in practical settings. The encouraging conclusion is that this approach is faster and simpler than the usual ones, and can readily be extended to 3D images as well. References
[1] S. Peleg, J. Naor, R. Hartley and D. Avnir, "Multiple resolution texture analysis and classification", IEEE Trans. Pattern Anal. Machine Intell., Vol. 6, 1984, pp. 518-523.
[2] J. Gangepain and C. Roques-Carmes, "Fractal approach to two dimensional and three dimensional surface roughness", Wear, Vol. 109, 1986, pp. 119-126.
[3] J. Keller, R. Crownover and S. Chen, "Texture description and segmentation through fractal geometry", Computer Vision Graphics Image Processing, Vol. 45, 1989, pp. 150-160.
[4] A. P. Pentland, "Fractal based description of natural scenes", IEEE Trans. Pattern Analysis Machine Intell., Vol. 6, No. 6, pp. 661-674, 1984.
[5] N. Sarkar and B. B. Chaudhuri, "An efficient differential box-counting approach to compute fractal dimension of image", IEEE Trans. Syst. Man and Cybernetics, Vol. 24, No. 1, pp. 115-120, 1994.
[6] M. Barnsley, Fractals Everywhere, Academic Press, 1988.
[7] Qian Huang, J. R. Lorch, R. C. Dubes, "Can the fractal dimension of images be measured?", Pattern Recognition, Vol. 27, No. 3, March 1994, pp. 339-350.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
An Identification Tool to Build Physical Models for Virtual Reality

Jean LOUCHET*†, Li JIANG‡

*ENSTA, Laboratoire d'Electronique et d'Informatique, 32 boulevard Victor, 75739 Paris Cedex 15, France
†INRIA, projet SYNTIM, BP 153, Rocquencourt, 78153 Le Chesnay, France
‡LAFORIA, Université Pierre et Marie Curie, tour 46-00, 1 Place Jussieu, 75005 Paris, France
e-mail:
[email protected]

Keywords
Computer vision, motion analysis, motion modelling, image animation, artificial evolution, multimedia, virtual reality.

Virtual reality applications including visual and gestural man-machine interaction require the use of particle-based deformable and articulated models. However, they lack the relevant model building methods needed to ensure realistic behaviours based on observations of real-world objects. To this end, our research aims at looking for reliable physical model building mechanisms which use experimental kinematic data of real objects as an input. Building a physics-based model of an object thus divides into two main steps:
• The first step is the capture of kinematic data. It may consist of extracting characteristic point trajectories from the images of a real scene, involving image sequence analysis techniques, or other ad-hoc experimental means [6].
• The second step consists of using these kinematic data in a specific physical model implementation scheme.
This paper focuses on the second step. We propose an original method to automatically identify physical models built using local masses and generalised springs.
1 The Physical Model.

The general principles of the physical models we developed are presented in [4]. Particles are described by their masses, positions and speeds. Motion results from forces applied by internal bonds or the environment (gravitation...). Four bond types are considered:
• unary bonds, between a particle and the medium (e.g. viscosity, gravitation...);
• binary bonds, between particle pairs;
• ternary ("flexion") bonds, between three particles;
• quaternary ("torsion") bonds, between four particles.
Each bond type may consist of two components:
• the first component is defined as the gradient of an energy potential which depends on the relative positions of the particles involved. Forces applied to particles derive from this potential. This allows an easy (if tedious) transposition to ternary and quaternary bonds of elementary mechanical conservation principles about resulting torques and forces [9].
• the second component generates damping forces, which depend on the particles' mutual positions and speeds.
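As an illustrative sketch only (the function name, coefficient values and the specific quadratic potential are our assumptions, not taken from the paper), a binary bond with the two components above might look as follows: the conservative force is the negative gradient of a potential E = k/2 · (r − rest)², and the damping force acts on the radial relative speed.

```python
import numpy as np

def binary_bond_force(p1, v1, p2, v2, k=10.0, rest=1.0, c=0.5):
    """Force applied to particle 1 by a binary bond; particle 2
    receives the opposite force (action/reaction)."""
    d = p2 - p1
    r = np.linalg.norm(d)
    u = d / r                              # unit vector from particle 1 to 2
    # conservative component: -grad of the potential E = k/2 * (r - rest)^2
    f_elastic = k * (r - rest) * u
    # damping component: opposes the radial relative speed
    f_damping = c * np.dot(v2 - v1, u) * u
    return f_elastic + f_damping
```

A unary bond (e.g. viscosity with the medium) reduces to a force depending on a single particle's own speed, and the ternary and quaternary cases follow the same pattern with potentials of three and four positions.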
2 An Evolutionary Identification Strategy.

The task of the identification algorithm is to find the bonds' parameters from given kinematic data. The basic idea is to consider identification as an optimisation problem. To this end, a cost function evaluates the quality of candidate physical models (i.e. object description files). This cost function measures a generalised distance between the trajectory predicted by the model to be evaluated and the real given trajectory. The problem consists in finding, among all possible models, the one with the lowest cost. We observed that conventional numerical optimisation techniques are unsuccessful on these cost functions, which probably lack the desirable mathematical regularity properties. Even a stochastic technique like simulated annealing becomes extremely slow, and in practice cannot cope with objects containing more than about half a dozen particles. This is why we developed an "Evolutionary Strategy", a stochastic optimisation technique whose principles are inspired (like
genetic algorithms) from biological evolution. It consists of creating a random initial population of models and letting it evolve through the effects of selection, crossover and mutation mechanisms, under the control of the cost function values of the population's individuals. More generally, evolutionary strategies are known for their robustness and outstanding ability to optimise functions with multiple local minima. However, they are often slow, difficult to tune, and may benefit from problem-specific implementations. We found that a conventional evolutionary scheme succeeds in identifying small particle systems (i.e. objects with simple behaviours), but loses efficiency and precision with increasing particle numbers. Therefore we introduced some novel characteristics into the evolutionary algorithm. Our philosophy is to exploit the problem's specific semantics as much as possible, but also to make sure that no a priori, implicit information on the parameters is given to the algorithm.

First, we improved the robustness of the identification algorithm by designing a cost function which calculates short-term (rather than long-term) trajectory differences: the cost function is the quadratic sum of differences between the real reference trajectory and the points extrapolated from the preceding time step. This has important consequences on the shape of the cost function.

Second, we exploited a topological property: the position of a particle at a given time only depends on the recent history of its neighbourhood. This led us to split the cost function into the sum of each particle's contribution. We call these contributions, attached to individual particles, the "local cost functions". The local cost function of a particle is defined as the temporal sum of the prediction errors concerning its coordinates. These local cost functions play a key role in the identification algorithm.
Indeed, let us remark that the position of a particle at time step (t + 1) only depends on its position and speed and on its neighbours' positions at time step t (the neighbours of a given particle are defined as the particles which share a bond with it). Therefore, the local cost functions corresponding to the extremities of a bond are more relevant than the global cost function for evaluating the estimation quality of this bond. We use these local cost functions extensively in the crossover and mutation processes of the evolutionary identification scheme. The guidance of evolutionary mechanisms by local cost functions means that the algorithm converges on each region of the object independently of its convergence on other regions: the local cost functions are not influenced by remote regions as the global cost function would be. The fundamental consequence is that the number of generations (calculation steps) required for convergence no longer depends on the number of unknowns or the object's complexity.

Third, we provided the algorithm with a self-adaptive behaviour [2]: the variance of mutations, which is an important internal parameter of the algorithm, is controlled by the input data. To this end, besides each (mechanical) parameter in the individuals' codes, we introduced an extra "mutability" parameter, which controls the standard deviation of the mutations that can be applied to the corresponding mechanical parameter. The mutability parameters have no direct effect on the behaviour or the cost of the individuals, but as part of the genetic code they are subject to the same evolutionary mechanisms (mutations, crossover) as the mechanical parameters. The mutabilities thus evolve and stay fitted to the algorithm's needs.
Thus, the balance of robustness and precision, which is often a difficult point in evolutionary algorithms, remains under the control of the input data and cost function values: this results in a much better precision of parameter estimates at the end of the algorithm. An apparent side effect of this self-adaptivity is to double the representation size of an individual, but this has no real consequence on the execution speed, as the calculation of cost function values, which is the dominant part, is unchanged.
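The ingredients above (short-term prediction cost and self-adaptive mutabilities) can be sketched on a toy problem. The following is our own minimal illustration, not the authors' code: it identifies the stiffness k and damping d of a single damped spring-mass particle from a reference trajectory, using a (mu + lambda) evolution strategy in which each mechanical parameter carries its own mutability (mutation standard deviation) that evolves with it, and the cost is the quadratic sum of one-step extrapolation errors.

```python
import math
import random

DT, STEPS = 0.01, 150

def step(x, v, k, d):
    """One explicit-Euler step of a unit-mass particle with F = -k x - d v."""
    return x + DT * v, v + DT * (-k * x - d * v)

def trajectory(k, d, x0=1.0, v0=0.0):
    states, (x, v) = [], (x0, v0)
    for _ in range(STEPS):
        states.append((x, v))
        x, v = step(x, v, k, d)
    return states

REF = trajectory(k=4.0, d=0.3)            # reference kinematic data

def cost(k, d):
    """Short-term cost: quadratic sum of one-step prediction errors."""
    e = 0.0
    for (x, v), (xn, vn) in zip(REF, REF[1:]):
        xp, vp = step(x, v, k, d)         # extrapolate from the preceding step
        e += (xp - xn) ** 2 + (vp - vn) ** 2
    return e

def evolve(generations=200, mu=15, lam=60, tau=0.3):
    # individual = [k, d, sigma_k, sigma_d]: each mechanical parameter is
    # paired with its own self-adaptive mutability
    pop = [[random.uniform(0, 10), random.uniform(0, 1), 1.0, 0.1]
           for _ in range(lam)]
    for _ in range(generations):
        pop.sort(key=lambda ind: cost(ind[0], ind[1]))
        parents = pop[:mu]
        children = []
        for _ in range(lam):
            k, d, sk, sd = random.choice(parents)
            sk *= math.exp(tau * random.gauss(0, 1))   # mutabilities mutate too
            sd *= math.exp(tau * random.gauss(0, 1))
            children.append([k + sk * random.gauss(0, 1),
                             d + sd * random.gauss(0, 1), sk, sd])
        pop = parents + children                       # (mu + lambda) selection
    return min(pop, key=lambda ind: cost(ind[0], ind[1]))
```

On this convex two-parameter problem a conventional optimiser would also succeed; the point of the sketch is only the representation, where mutabilities travel with the genome and are selected indirectly through the quality of the mutations they produce.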
3 Some application examples

3.1 Identification of the general linear model and noise resistance

In order to test the identification algorithm's performance, we chose an experimental protocol which consists of:
• building a catalogue of bonds;
• building an object by defining masses and positioning instantiations of the bonds between these masses;
• calculating a trajectory of the object from arbitrary initial conditions.
Then, we 'forget' the object parameters' values and use the evolutionary strategy described above in order to recover these values from the given kinematic data. The curves below show typical convergence results on an elastic object consisting of 15 masses, 10 different linear bond types (20 parameters) and 31 installed bonds. The identification algorithm uses 100 consecutive images and a population of 100 individuals. In order to test the algorithm's robustness, we added Gaussian noise with different standard deviations to the given kinematic data.
[Figure: Log(precision of parameters' estimation) as a function of the number of generations, for several noise levels on the trajectories. The thick line represents the algorithm's convergence without noise.]

Tests show that convergence is very good after 500 to 1000 generations, independently of the number of parameters to be identified. In the typical convergence results shown above, the average parameter estimation error is lower than 0.01% for the three lowest noise levels shown.

3.2 Turbulent fluids

In the general framework presented above, the objects have a fixed structure of masses and bonds. However, particle-based models have long been used to simulate fluid objects like clouds or smoke. Luciani et al. [8] use the Cordis-Anima approach [9] to model turbulent fluid flows, using point masses and conditional bonds between particles. Each of these bonds becomes active whenever the distance between its extremity particles goes under a given threshold. The authors obtained visually convincing simulations, using several particle and conditional bond classes. Our aim is again to examine how our identification algorithm can be used to induce the internal characteristics of such a turbulent fluid model from given kinematics. We implemented a similar viscous fluid model (see images below), simultaneously using several types of conservative bonds ("springs"), whose forces depend on the relative positions:

if (distance < threshold_s) then (force = f_s(x2 - x1))

and energy-dissipative bonds ("dampers"), whose forces depend on the relative speeds:

if (distance < threshold_a) then (force = f_a(ẋ2 - ẋ1))

where f_s and f_a are non-linear functions.
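A conditional bond of this kind is straightforward to express in code. The sketch below is our own illustration (the non-linear force laws are invented stand-ins for f_s and f_a): the bond contributes a force only while its two extremity particles are closer than its activation threshold.

```python
import numpy as np

def conditional_bond_force(x1, v1, x2, v2, threshold, law, on_speed=False):
    """Force applied to particle 1; zero when the bond is inactive.
    `law` is the (possibly non-linear) force law f_s or f_a, applied to the
    relative position (springs) or the relative speed (dampers)."""
    if np.linalg.norm(x2 - x1) >= threshold:
        return np.zeros_like(x1)          # inactive beyond the threshold
    return law(v2 - v1) if on_speed else law(x2 - x1)

# invented non-linear laws in a "coefficient1 * |q|^coefficient2" style
f_s = lambda dx: 2.0 * dx                                   # elastic law
f_a = lambda dv: 2.5 * np.sign(dv) * np.abs(dv) ** 1.25     # viscous law
```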
Two 2-dimensional images of a jet penetrating into a fluid (frames nos. 300 and 400 from a sequence). The main difference with the general model is that the number of bonds (and therefore the computational load of the cost function) is very high compared to standard flexible objects (about 80000 in the example above), even if not all
are activated at the same time due to the distance threshold. After 1000 generations, the algorithm yields very good estimates of the initial parameters:

bond parameter                                             | initial parameter | parameter's estimate
viscosity coefficient 1 (between matrix particles)         | 2.5               | 2.499
viscosity coefficient 2 (non-linearity)                    | 1.25              | 1.248
distance threshold for viscosity activation                | 1.5               | 1.500
viscosity coefficient 1 (between matrix and jet particles) | 4                 | 3.962
viscosity coefficient 2 (non-linearity)                    | 2                 | 2.052
distance threshold for viscosity activation                | 2                 | 2.000
elasticity coefficient                                     | 2                 | 2.000
distance threshold for elastic bonds activation            | 1.1               | 1.100
3.3 Other object types and future extensions

The same identification procedure has been successfully tested on a cloth animation model [7] with a similar experimental protocol, and allowed a realistic reconstruction of a cloth image sequence. Here, the object is periodic and all bonds include a non-linearity factor through the introduction of an "elongation rate". The next step of our research project will consist of validating our approach using real-world kinematic data, coming from processing real images of articulated solid objects [5], fabrics [7] and turbulent fluid flows [8].

4 Conclusion

Particle-based physical models appear to be a promising general framework for building realistic and efficient models for simulation and animated image synthesis. They are being used increasingly to model smoke [10, 12], elastic or articulated bodies [1], fabrics [13]... but they need to be associated with identification methods, both to deserve the "physical model" qualification and to ensure their behavioural realism. The evolutionary parameter identification technique proposed in this paper has proven successful at reconstructing model internal parameters from their kinematic outputs, including non-linear, conditional, elastic and viscous bonds in periodic or non-periodic objects, and thus provides the particle-based modelling tool with the 'measuring instrument' which should always be associated with a physical model.

References

[1] W. W. Armstrong, M. W. Green, "The dynamics of articulated rigid bodies for purposes of animation", The Visual Computer, Vol. 1, pp. 231-240, 1985.
[2] Thomas Bäck, "Evolution Strategies: an alternative evolutionary algorithm", Artificial Evolution 95, Brest, September 1995, Springer 1996.
[3] D. E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine Learning", Addison-Wesley, 1989.
[4] J. Louchet, "An Evolutionary Algorithm for Physical Motion Analysis", British Machine Vision Conference, York, Sep. 1994.
[5] J. Louchet, M. Boccara, "Detecting rotating regions in image sequences", Image'Com 96, Bordeaux, May 1996.
[6] J. Louchet, M. Boccara, D. Crochemore, X. Provot, "Building new tools for Synthetic Image Animation by using Evolutionary Techniques", Artificial Evolution 95, Brest, September 1995, Springer 1996.
[7] J. Louchet, X. Provot, D. Crochemore, "Evolutionary identification of cloth animation models", Eurographics Workshop on Animation and Simulation, Maastricht, Sep. 1995.
[8] A. Luciani, A. Habibi, A. Vapillon, Y. Duroc, "A Physical Model of Turbulent Fluids", Eurographics Workshop on Animation and Simulation, pp. 16-29, Maastricht, Sep. 1995.
[9] A. Luciani, S. Jimenez, J. L. Florens, C. Cadoz, O. Raoult, "Computational Physics: a Modeller Simulator for Animated Physical Objects", Proc. Eurographics Conference, Vienna, Sep. 1991, Elsevier.
[10] W. T. Reeves, "Particle Systems - A Technique for Modelling a Class of Fuzzy Objects", Computer Graphics (Siggraph), Vol. 17, No. 3, pp. 359-376, July 1983.
[11] N. Szilas, C. Cadoz, "Physical Models That Learn", International Computer Music Conference, Tokyo, 1993.
[12] J. Stam, E. Fiume, "Turbulent wind fields for gaseous phenomena", ACM Computer Graphics (Siggraph 93), pp. 369-376, August 1993.
[13] D. Terzopoulos, J. Platt, A. Barr, K. Fleischer, "Elastically Deformable Models", Proc. Siggraph 87, Computer Graphics, Vol. 21, No. 4, pp. 205-214, 1987.
CUE-BASED CAMERA CALIBRATION AND ITS APPLICATION TO DIGITAL MOVING IMAGE PRODUCTION

Yuji NAKAZAWA, Takashi KOMATSU, and Takahiro SAITO

Department of Electrical Engineering, Kanagawa University
3-27-1 Rokkakubashi, Kanagawa-ku, Yokohama, 221, JAPAN
Tel: +81-45-481-5661 Ext. 3119 Fax: +81-45-491-7915 Email:
[email protected]

ABSTRACT
One of the keys to new-generation digital image production is to construct simple methods for estimating the camera's motion, position and orientation from a moving image sequence observed with a single TV camera. For that purpose, we present a method for camera calibration and estimation of focal length. The method utilizes four definite coplanar points, e.g. the four vertices of an A4-size paper, as a cue. Moreover, we apply the cue-based method to the digital image production task of making up a moving image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera. The cue-based method works well for this task.

1. INTRODUCTION - BACKGROUND AND MOTIVATION

Recently some research institutes have started studying digital production of a panoramic image sequence from an observed moving image sequence, construction of a virtual studio with 3-D CG technology and so on, with the intent to establish the concept and schema of the new-generation digital image production technology. Such an image production technology, utilizing information about the camera's motion, position, orientation and so on, integrates consecutive image frames to produce an enhanced image such as a high-resolution panorama, and/or makes up a moving image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera. The key to the new-generation digital image production is to develop simple methods for estimating the camera's motion, position and orientation from a real moving image sequence observed with a single TV camera [1].
In this paper, to render it feasible to perform such 3-D estimation of the camera's motion, position and orientation when we use a single handy TV camera whose camera parameters are not given in advance, we present a method for performing camera calibration, along with accurate estimation of the focal length of the camera, by using four definite coplanar points, which usually correspond to four vertices of a certain quadrilateral plane object such as an A4-size paper, as a cue. The practical computational algorithms for the cue-based method of camera calibration are composed of simple linear algebraic operations and arithmetic operations, and hence they work so well as to provide accurate estimates of the camera's motion, position and orientation stably. Furthermore, in this paper, we apply the cue-based camera calibration method to the image production task of making up an image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera, according to the recovered estimates of the camera's motion, position and orientation.

2. CUE-BASED CAMERA CALIBRATION

In this paper, we assume the following situation: while moving the single TV camera by hand [2], we image the scene which includes not only the objects of interest but also four definite coplanar points P1 ~ P4 whose relative positions are known in advance, which usually correspond to four vertices of a certain quadrilateral plane object with known shape and are used as a cue for camera calibration. We perform camera calibration, that is to say, determination of the camera's position and orientation at each image frame, from the 2-D spatial image coordinates of the four definite coplanar cue points, which are tracked temporally over consecutive image frames by our recently presented feature tracking algorithm [3]. Under such conditions, we perform camera calibration and estimate the focal length f of the camera at the same time.

2.1 Image Coordinate System
Here, for each image frame, we define the 3-D viewing coordinate system o'-x'y'z', which is associated with the 2-D image coordinate system O-XY as shown in figure 1. In figure 1, we represent the 3-D viewing coordinates and the 2-D image coordinates with (x' y' z') and (X Y) respectively. We represent the 3-D viewing
coordinates of the four coplanar cue points P1 ~ P4 with {p_i' = (x_i' y_i' z_i')^t ; i = 1, 2, 3, 4}, and we represent the 2-D image coordinates of the imaged coplanar cue points, perspectively projected onto the image plane, with {P_i = (X_i Y_i)^t ; i = 1, 2, 3, 4}.
2.2 Camera Calibration

The problem of camera calibration is to recover the geometrical transformation of the 3-D world coordinates of an arbitrary point in the imaged scene into its corresponding 2-D image coordinates, from given multiple pairs of 3-D world coordinates and their corresponding 2-D image coordinates. The camera calibration problem is concisely formulated with homogeneous coordinate systems. Given both the 4-D homogeneous world coordinates a = (x y z 1)^t of an arbitrary point in the imaged scene and their corresponding 3-D homogeneous image coordinates b = h · (X Y 1)^t, the foregoing transformation is represented as the linear transformation:

(x' y' z')^t = M · (x y z 1)^t = (m1 m2 m3 m4) · (x y z 1)^t = x · m1 + y · m2 + z · m3 + m4    (1)

X = (x' / z') · f ,  Y = (y' / z') · f    (2)

where the focal length f is explicitly handled. Here the camera calibration problem is defined as the problem of recovering the 3 × 4 matrix M and the focal length f of equation 1 from given multiple pairs of homogeneous world coordinates and their corresponding homogeneous image coordinates. Equation 1 means that the 3-D viewing coordinates (x' y' z') are expressed as a linear combination of the three vectors {m1 m2 m3}, and hence we may regard the three vectors {m1 m2 m3} as the basis vectors of the 3-D world coordinate system o-xyz. On the other hand, the vector m4 is the displacement vector shifting from the origin of the 3-D viewing coordinate system to that of the 3-D world coordinate system.

Here we imagine a plane quadrilateral whose four vertices are given by the four definite coplanar cue points, and we refer to this plane quadrilateral as the cue quadrilateral. As a common coordinate system for all image frames, we define the 3-D world coordinate system o-xyz whose x-y cross section contains the cue quadrilateral, that is to say, whose z-axis is normal to the cue quadrilateral. Moreover, without loss of generality, we put the origin of the 3-D world coordinate system o-xyz at one of the coplanar cue points, e.g. P1. In this case, we can represent the 3-D world coordinates of the four coplanar cue points P1 ~ P4 with:

p_1 = (x_1 y_1 z_1)^t = (x_1 y_1 0)^t = 0 ;  p_i = (x_i y_i z_i)^t = (x_i y_i 0)^t ; i = 2, 3, 4    (3)

Assuming that the focal length f of the camera is accurately estimated some way or other, which will be described in the next section, we can easily recover the 3 × 4 transformation matrix M of equation 1 from the four pairs of the 3-D world coordinates p_i = (x_i y_i 0)^t of each cue point P_i and its corresponding image coordinates P_i = (X_i Y_i)^t. Substituting the four coordinate pairs into equation 1, we reach the simultaneous equations:

(x'_i y'_i z'_i)^t = N · (x_i y_i 1)^t ,  N = ( n11 n12 n14 ; n21 n22 n24 ; n31 n32 n34 ) ,
X_i = x'_i / z'_i ,  Y_i = y'_i / z'_i ;  i = 1, 2, 3, 4    (4)

where the focal length f is implicitly included in the expression of the 3 × 3 matrix N, and the matrix N is related to the matrix M as follows:

( m11 m12 m14 ; m21 m22 m24 ; m31 m32 m34 ) = ( n11/f n12/f n14/f ; n21/f n22/f n24/f ; n31 n32 n34 )    (5)

The simultaneous equations given by equation 4 are linear with respect to the nine unknown variables {n11, ..., n34}, and hence we can easily solve them. Their solution is expressed up to a scale factor k. Moreover, given the focal length f of the camera, we can recover the column vectors {m1 m2 m4} of the matrix M by applying the relation of equation 5. With regard to the column vector m3 of the matrix M, we should employ a vector which is normal to both the column vectors {m1 m2}, e.g.

m3 = |m1| · (m1 × m2) / |m1 × m2|    (6)

Thus we can recover the 3 × 4 transformation matrix M of equation 1.

2.3 Estimation of Focal Length

Once we recover the foregoing transformation matrix N of equation 4, we can estimate the relative depth z'_i of each coplanar cue point P_i as follows:

z'_i = m31 · x_i + m32 · y_i + m34 = n31 · x_i + n32 · y_i + n34    (7)

Thus we get an estimate of the 3-D viewing coordinates p_i' = (x_i' y_i' z_i')^t of each coplanar cue point P_i as follows:

p_i' = ( (X_i / f) · z'_i , (Y_i / f) · z'_i , z'_i )^t    (8)

The lengths of the four sides of the cue quadrilateral are assumed to be known in advance, and furthermore, taking into consideration the fact that the ratio of the lengths of two sides arbitrarily chosen out of the four is invariant irrespective of the definition of the 3-D coordinate system, we get the relation:

|p'_2 - p'_1|^2 / |p_2 - p_1|^2 = |p'_4 - p'_1|^2 / |p_4 - p_1|^2    (9)

Substituting equation 8 along with equation 7 into equation 9, we obtain a quadratic equation with respect to the focal length f. Its solution is given by

f = sqrt( (r · C - A) / (B - r · D) )    (10)

where r denotes the known squared side-length ratio of equation 9, and A, B, C and D are scalar quantities computed from the recovered matrix N and the image coordinates of the cue points.

2.4 Comparison with the Existing Camera Calibration Method

Most of the usual camera calibration methods do not employ any cues, and some of them involve computation of an eigenvector with the minimum eigenvalue of a certain positive-definite matrix, which is often sensitive to noise [4]. On the other hand, the proposed cue-based camera calibration method requires only one solution of linear simultaneous equations with nine unknown variables and simple scalar arithmetic operations, and hence its computational algorithm is numerically stable. In addition, the proposed cue-based camera calibration method explicitly identifies the focal length f of the camera.
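The linear part of Section 2.2 can be sketched numerically as follows (a hedged illustration under our own assumptions: the function names, the way the scale factor k is fixed, and the sign convention are ours, not the paper's). The matrix N of equation 4 is recovered from the four cue-point correspondences with a standard direct linear solution, after which, given f, equations 5 and 6 yield the columns of M:

```python
import numpy as np

def estimate_N(world_xy, image_XY):
    """Recover the 3x3 matrix N of equation (4), up to the scale factor k,
    from four (x_i, y_i) <-> (X_i, Y_i) cue-point correspondences."""
    A = []
    for (x, y), (X, Y) in zip(world_xy, image_XY):
        # X = (n11 x + n12 y + n14) / (n31 x + n32 y + n34), same for Y
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)       # null-space vector, rows (n11 n12 n14)...

def recover_M(N, f):
    """Equations (5) and (6): build the 3x4 matrix M, given the focal length f."""
    if N[2, 2] < 0:                   # cue points must lie in front of the camera
        N = -N
    m1 = np.array([N[0, 0] / f, N[1, 0] / f, N[2, 0]])
    N = N / np.linalg.norm(m1)        # fix the scale factor k so that |m1| = 1
    m1 = np.array([N[0, 0] / f, N[1, 0] / f, N[2, 0]])
    m2 = np.array([N[0, 1] / f, N[1, 1] / f, N[2, 1]])
    m4 = np.array([N[0, 2] / f, N[1, 2] / f, N[2, 2]])
    c = np.cross(m1, m2)
    m3 = np.linalg.norm(m1) * c / np.linalg.norm(c)    # equation (6)
    return np.column_stack([m1, m2, m3, m4])
```

The SVD null-space solution makes the scale factor k explicit; here it is fixed by forcing m1 to unit length and by requiring the cue points to lie in front of the camera.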
3. DIGITAL MOVING IMAGE PRODUCTION

We have imaged the scene in our laboratory while manually moving an 8-mm handy TV camera for home use, and then we have applied the foregoing cue-based camera calibration method to the moving image sequence, each image frame of which is composed of 720 × 486 pixels. In the imaged scene we have put an A4-size paper on the floor of the laboratory, and we have used the four vertices of the A4-size paper as the four coplanar cue points. Moreover, we have tracked certain feature points designated in the scene with the existing standard feature tracking algorithm, which computes the position of a square feature window minimizing the sum of the squares of the intensity differences over the feature window from one image frame to the next [5]. Furthermore, we have performed the digital image production task of making up a moving image sequence from a synthetic 3-D CG image sequence of rotating and shifting bricks and the real moving image sequence of our laboratory, according to the recovered estimates of the camera's motion, position and orientation. Figure 2 shows an image frame chosen from the resultant compound moving image sequence. As shown in figure 2, we can hardly identify artificial distortions in the compound image sequence, which demonstrates that the cue-based camera calibration method works satisfactorily for the foregoing digital image production task.

4. CONCLUSIONS

In this paper, we present a method for performing camera calibration, along with accurate estimation of the focal length of the camera, by using four definite coplanar points as a cue. The practical computational algorithms for the cue-based method of camera calibration are composed of simple linear algebraic operations and arithmetic operations, and hence they work so well as to provide accurate and stable estimates of the camera's motion, position and orientation.
Moreover, in this paper, we apply the cue-based camera calibration method to the digital moving image production task of making up an image sequence of a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera according to the recovered estimates of the camera's motion, position and orientation. Experimental simulations demonstrate that the cue-based camera calibration method works satisfactorily for the digital moving image production task.
REFERENCES
[1] K. Deguchi: "Image of 3-D Space: Mathematical Geometry of Computer Vision," Shoukodo Press, Tokyo, Japan, 1991.
[2] H. C. Longuet-Higgins: "A Computer Algorithm for Reconstructing a Scene from Two Projections," Nature, Vol. 293, pp. 133-135, 1981.
[3] Y. Nakazawa, et al.: "A Robust Object-Specified Active Contour Model for Tracking Line-Features and Its Practical Application," submitted to IEEE ICIP '96 (accepted).
[4] R. Horaud, et al.: "An Analytic Solution for the Perspective 4-Point Problem," Computer Vision, Graphics, and Image Processing, Vol. 47, pp. 33-44, 1989.
[5] C. J. Poelman and T. Kanade: "A Paraperspective Factorization Method for Shape and Motion Recovery," Lecture Notes in Computer Science, Vol. 801, pp. 97-108, 1994.
Figure 1: Coordinate systems (the 2-D image coordinate system O-XY with focal length f, and the 3-D world coordinate system).
Figure 2: Image frames chosen from the resultant compound moving image sequence.
Session X: SIGNAL PROCESSING II
A Novel Approach to Phoneme Recognition using Speech Image M Ahmadi, N J Bailey, B S Hoyle The University of Leeds Department of Electronic & Electrical Eng. Leeds, LS2 9JT, England Tel: +44 (113) 233 2016 Fax: +44 (113) 233 2032 E-mail: <
[email protected]>

Abstract - In this paper a novel feature extraction technique based on the two-dimensional DCT (Discrete Cosine Transform) of the spectrogram is proposed. This is in contrast to conventional approaches based on one-dimensional analysis such as LPC, cepstral, or FFT methods. In order to demonstrate the novel approach, two tasks of word and phoneme recognition were conducted. The word recognition was carried out as a preliminary study: a small database of 30 names spoken by 15 speakers was selected. As a phoneme recognition task, a series of experiments was conducted on the voiced stops ('b', 'd', 'g') of the TIMIT [1] database uttered by 630 speakers (male and female). The extracted data form the basis of the input patterns for training two types of neural networks: a semi-dynamic network, the Time-Delay Neural Network (TDNN), and a static network, the Multilayer Perceptron (MLP). For the word recognition task, a recognition rate of 86 percent was achieved for 7 names using the TDNN. For the phoneme recognition task, the highest recognition rates of 77.5 and 72.4 percent were recorded for the TDNN and MLP respectively. The results for phoneme recognition contrast with the 72 percent quoted by Hwang et al [2] for the same phonemes spoken by 40 females.

1. Introduction

Since the advent of neural networks there has been a growing interest in automatic speech recognition. Many reviews have been carried out on different approaches to training a neural network [3,4]. The ultimate aim of this challenging task is to develop a system for speaker- and text-independent speech recognition. However, the current status of research falls well short of a comprehensive solution to this problem. The aim of this paper is a step forward in that direction. The proposed idea has shown significant improvements in recognition accuracy as well as convergence rate for cross validation and training. Fig. 1 illustrates the overall system model. In the following section the input data are defined.
Section 3 explains the processing and feature extraction algorithm. In section 4 the neural network structures are discussed, and the results of the different neural networks on the TIMIT database are quoted.
Fig. 1 The overall system (input speech -> pre-processing -> feature extraction (image processing) -> neural networks (TDNN & MLP) -> recognised speech)
2. Data Collection

2.1. Word Database

The objective of this task was to recognize a number of names of personnel within our department. The names were recorded under room conditions with a noisy background, by means of a good tape recorder and a dynamic microphone, in order to produce samples such as may occur in a real application. Thirty names were recorded by male and female speakers. The recordings were then transferred to a Sun workstation in μ-law 8-bit .au format, sampled at 8 kHz and edited to a length of 750 msec. The names with shorter lengths were padded to fit the fixed length. The data were then converted to 16-bit linear format with the same sampling rate, and finally they were divided into training and testing sets ready for processing. The recorded speech signals were not centered or time-aligned, so they would correspond to real data.
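The μ-law to 16-bit linear conversion mentioned above can be sketched with the standard continuous μ-law expansion formula (μ = 255). This is a generic illustration of the decoding step, not the exact tool chain used by the authors:

```python
import numpy as np

MU = 255.0

def mulaw_expand(y):
    """Expand mu-law compressed values y in [-1, 1] to linear in [-1, 1]."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

def to_int16(x):
    """Scale normalised linear samples to 16-bit integer PCM."""
    return np.clip(np.round(x * 32767.0), -32768, 32767).astype(np.int16)
```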
2.2. Phoneme Database

The voiced stops ('b', 'd', 'g') data were extracted from natural continuous speech uttered by 630 speakers from 8 different regions of the TIMIT [1] database. Over 2000 utterances were selected for training the neural networks and about 1250 utterances in total were selected for cross validation. The training and cross validation (i.e. testing) utterances consist of the same number of male and female speakers for both pattern sets.
3. Data Processing & Feature Extraction
Feature selection refers to the choice of certain attributes of an image; such attributes are required for recognition/classification purposes. A fundamental principle in digital image processing is the ability to represent the image in a space in which the attributes of the picture are not correlated. An orthogonal transform has two such distinct properties: I. it decorrelates the signal in the transform domain; II. it packs the most variance into the fewest transform coefficients. The DCT [5] is the best sub-optimal orthogonal transform in comparison with the KLT (Karhunen-Loeve Transform), which is referred to as the optimal transform. Fig. 2 illustrates the MSE of different orthogonal transforms [6] versus the block size. Smaller blocks are chosen rather than the entire image for three major reasons. Firstly, to exploit the redundancy in a set of pixels. Secondly, processing a number of smaller blocks is computationally less intensive, which relaxes the real-time constraint for most practical purposes. Finally, any one pixel in a picture is likely to be closely related to the four pixels that surround it, and each of these four pixels is likely to bear the same relation to its respective neighbours, but the original pixel is unlikely to be related to one a long distance away. Therefore, by splitting the image into a number of smaller blocks we hope to form groups of pixels that are statistically related, with a consequently high level of redundancy. The generated wide-band spectrogram was broken into a number of PxQ (P = Q = 8) pixel blocks, as shown in Fig. 3, where R and C are the dimensions of the spectrogram.
Fig. 2 Mean square error versus block size for different orthogonal transforms
Fig. 3 Image segmentation
Fig. 4 Processed PxQ block

A 2D-DCT of each 8x8 block was calculated and the m key features were extracted, as shown in Fig. 4. In the case of word recognition, blocks of 16x16 were selected. As shown in Fig. 4, the frequency increases along the vertical and horizontal axes, starting at the dc element situated at the top left corner (the first element) and ending at the 64th element situated at the bottom right, with the highest frequency. Most of the information in each processed block is stored in the low frequency region. The dc component was selected as the key feature from each individual block and stored in a pattern file for training the neural networks. The overall system consists of three main sections, as shown in Fig. 5. In the pre-processing stage the analogue data are converted to 16-bit linear data. The second stage performs the image processing and key feature extraction, and in the last section the generated patterns are used to train and test the two neural networks.
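The paper gives no code for the feature extraction stage; the following numpy sketch illustrates the block-wise 2D-DCT with the dc coefficient kept as the key feature, as described above. The orthonormal DCT-II normalisation and the handling of partial edge blocks (simply dropped here) are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    t = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t[i, j] = np.cos(np.pi * (2 * j + 1) * i / (2 * n))
    t[0, :] *= np.sqrt(1.0 / n)
    t[1:, :] *= np.sqrt(2.0 / n)
    return t

def block_dc_features(spectrogram, p=8, q=8):
    """Split an R x C spectrogram into P x Q blocks, take the 2D-DCT of
    each block, and keep the dc (top-left) coefficient as the key feature."""
    r, c = spectrogram.shape
    tp, tq = dct_matrix(p), dct_matrix(q)
    feats = []
    for i in range(0, r - r % p, p):
        for j in range(0, c - c % q, q):
            block = spectrogram[i:i + p, j:j + q]
            coeffs = tp @ block @ tq.T   # 2D-DCT of the block
            feats.append(coeffs[0, 0])   # dc component only
    return np.array(feats)

# A constant 8x8 block of value v has dc coefficient 8*v under this scaling.
spec = np.full((16, 16), 2.0)
print(block_dc_features(spec))  # → [16. 16. 16. 16.]
```

With one dc value per block, a 64x72 spectrogram would reduce to exactly the 8x9 = 72 input features used for the phoneme task below.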
4. Neural Network Structure and Results
The selected data form the basis of the input patterns for training the neural networks. In this study a semi-dynamic neural network (Time-Delay Neural Network, TDNN) and a static network (Multilayer Perceptron, MLP) are trained for recognition purposes. These two networks were used in order to investigate whether the processed spectrogram needs a network adapted to the dynamic behaviour of the speech signal, or whether the extracted features are adequate for a simple static network.
4.1. Word recognition
A simple structure was used: a 16x14 input layer with three sliding windows, a first hidden layer of size 14x8 with five sliding windows, a second hidden layer of size 10x3, and finally N output nodes, where N is the number of names to be recognized. The network was trained for speaker-independent recognition of 3, 5, and 7 names. After training, the performance of the TDNN was tested with test patterns. The networks with 3 and 5 outputs showed 100% accuracy in word recognition. The network with 7 classes achieved a lower accuracy of 86% on the test data set. The result for the 7 names could be improved by introducing time alignment and centering the signal.
4.2. Phoneme recognition
The proposed procedure reduces the number of input nodes in the training patterns and at the same time provides more prominent features in the data set. For example, in this experiment the input features are reduced to 72 (8x9), compared to 240 (16x15) reported by Waibel [1] for the same task. For a TDNN the reduction in the number of input units translates to a smaller number of hidden nodes (reducing the total number of connections), which in turn results in shorter training time and a better convergence rate.
The TDNN used for this experiment has an input layer of 8x9 (72) nodes with three sliding windows (8x9/3), a first hidden layer of 6x7 (42) nodes with four sliding windows (6x7/4), a second hidden layer of 3x3 (9) nodes, and finally 3 outputs, i.e. 8x9/3 - 6x7/4 - 3x3 - 3. In the case of the MLP the same numbers of inputs and outputs were used, i.e. 72 and 3 respectively, but only one hidden layer of 20 nodes was used, in comparison with two hidden layers in the TDNN. A full set of results is given in Table 1. The highest recognition rates of 77.5 and 72.4 percent were recorded for the TDNN and MLP respectively. These results contrast with the 72 percent quoted by Hwang et al. [2] for the same phonemes spoken by only 40 female speakers.
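The paper specifies only the layer sizes of the MLP (72 inputs, one hidden layer of 20 nodes, 3 outputs); a minimal numpy sketch of such a forward pass follows. The sigmoid activation and random weight initialisation are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 72 input features (the 8x9 dc coefficients), one hidden layer of 20 nodes,
# 3 outputs: one per voiced stop ('b', 'd', 'g').
w1 = rng.normal(scale=0.1, size=(20, 72)); b1 = np.zeros(20)
w2 = rng.normal(scale=0.1, size=(3, 20));  b2 = np.zeros(3)

def mlp_forward(x):
    h = sigmoid(w1 @ x + b1)        # hidden layer, 20 nodes
    return sigmoid(w2 @ h + b2)     # output layer, 3 class scores

x = rng.normal(size=72)             # one pattern of dc features
scores = mlp_forward(x)
print(scores.shape)                 # (3,)
print(int(np.argmax(scores)))       # predicted phoneme class index
```

The TDNN differs only in that its windows slide over the time axis and share weights across positions; its per-window computation is the same kind of weighted sum and nonlinearity.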
Fig. 5 The speech recognition system (block diagram: analog speech signal -> ADC (8 kHz samples, 8-bit mu-law) -> convert to 16-bit linear -> spectrogram -> choose P, Q and divide into L segments of PxQ -> 2D-DCT of each segment -> take m features from each segment (zigzag scanning) -> store as pattern file -> TDNN classifier -> recognised phoneme/word)

Table 1 Results for the TIMIT database (recognition rates, percent).
NN Type    Training    Testing
TDNN       95          77.5
MLP        89          72.4
References
[1] Waibel A H, Hanazawa T, Hinton G, Shikano K, Lang K, "Phoneme Recognition Using Time-Delay Neural Networks", IEEE Trans. on ASSP, Vol. ASSP-37, No. 3, March 1989.
[2] Hwang J, Li H, "Interactive Query Learning for Isolated Speech Recognition", Proc. of IEEE Workshop on Neural Networks for Signal Processing II, Denmark, 31 Aug. - 2 Sep. 1992, pp. 93-102.
[3] Lippmann R P, "Review of Neural Networks for Speech Recognition", Neural Computation, Vol. 1, pp. 1-38, MIT Press, 1989.
[4] Rabiner L, Juang B H, "Fundamentals of Speech Recognition", Prentice-Hall International Inc., 1993.
[5] Ahmed N, Natarajan T, Rao K R, "Discrete Cosine Transform", IEEE Trans. on Computers, Jan. 1974.
[6] Rao K R, Yip P, "Discrete Cosine Transform: Algorithms, Advantages, Applications", Academic Press Inc., 1990.
MODIFIED NLMS ALGORITHM FOR ACOUSTIC ECHO CANCELLATION
M. MEDVECKY
SLOVAK TECHNICAL UNIVERSITY, FACULTY OF ELECTRICAL ENGINEERING AND INFORMATION TECHNOLOGY, DEPARTMENT OF TELECOMMUNICATIONS, ILKOVICOVA 3, 812 19 BRATISLAVA, SLOVAKIA
Abstract
This paper presents a modified version of the normalised least-mean-square algorithm (M-NLMS). The M-NLMS algorithm was developed for efficient weight adaptation of the modified finite impulse response (M-FIR) filter, which is especially suitable for modelling systems with a very long, decaying, and time-varying impulse response in finite precision arithmetic, in applications such as the hardware realisation of an acoustic echo canceller. The derivation of the M-NLMS algorithm, an investigation of its parameters, and simulation results are presented. Acoustic echo cancellers using an adaptive FIR filter adapted by the NLMS algorithm and an adaptive M-FIR filter adapted by the M-NLMS algorithm are compared, and the better performance of the canceller with the M-NLMS algorithm is demonstrated by simulation results.
1. Introduction
The main problem of acoustic echo cancellation is the precise identification of the acoustic impulse response. The actual structure of the echo path is usually modelled by an FIR filter. Such a filter has the advantages of guaranteed stability during adaptation and a unimodal mean square error (MSE) surface, and the principle of FIR filter response computation is similar to the acoustic echo origination process. The FIR filter may require up to thousands of adaptive weights to correctly identify the acoustic impulse response. Such a number of taps results in a large arithmetic complexity, which is a crucial problem for real-time implementations such as handsfree telephones or videoconferencing. The solution is a hardware realisation of the acoustic echo canceller, which enables a parallel processing implementation. Unfortunately, the hardware realisation brings a problem of computation precision, and arithmetic precision has a profound impact on realisation complexity. Hardware realisations of acoustic echo cancellers in VLSI or ASIC circuits use, for the sake of simpler realisation, fixed point arithmetic, which yields poorer performance than floating point arithmetic. Furthermore, the room impulse response has a very specific shape: a short delay followed by an exponentially decaying tail. Modelling such a characteristic by a transversal filter with fixed point arithmetic leads, due to roundoff error, to gradual degradation of the filter weight accuracy towards the end of the filter [1]. This yields poorer performance of the acoustic echo canceller. The problem can be solved by partitioning the impulse response in the time or frequency domain [2]; the filter weights may then have a different gain in each partition. Since the room impulse response changes, the partitions and gains should be changed as well.
To solve this problem, a modified structure of the FIR filter (M-FIR) and a modified version of the NLMS algorithm (M-NLMS) have been derived; they are presented below.
2. Modified FIR filter
When we use an adaptive transversal filter to model an acoustic echo, the output y_k of the transversal filter can be obtained as a sum of weighted contributions from the last N samples of the input signal x. It can be written as follows:

y_k = \sum_{i=0}^{N-1} y_{ik} = \sum_{i=0}^{N-1} x_{k-i} w_{ik}    (1)

where w_{ik} are the weights of the adaptive filter at time k. If d_k is a sample of the desired signal at time k, then the error e_k is given as

e_k = d_k - y_k = d_k - (y_{0k} + y_{1k} + ... + y_{(N-1)k}) = (d_k - y_{0k}) - y_{1k} - ... - y_{(N-1)k}    (2)

Applying the following substitutions

d_{0k} = d_k    (3)
e_{ik} = d_{ik} - y_{ik}    (4)
d_{(i+1)k} = e_{ik}    (5)

we obtain a new topology of an adaptive filter. The block diagram of the Modified FIR filter is shown in Fig. 1.
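Equations (3)-(5) simply distribute the error computation of eq. (2) across the filter stages; a small numpy sketch (with illustrative random data) can verify that the cascaded M-FIR topology produces the same final error as the transversal form:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
w = rng.normal(size=N)          # filter weights w_i at time k
x = rng.normal(size=N)          # x[i] = x_{k-i}, the last N input samples
d = rng.normal()                # desired sample d_k

# Transversal form, eqs (1)-(2): one global error.
y = np.dot(w, x)
e_direct = d - y

# Cascaded M-FIR form, eqs (3)-(5): the error is propagated stage by stage.
d_i = d                          # d_{0k} = d_k, eq (3)
for i in range(N):
    y_i = x[i] * w[i]            # y_{ik} = x_{k-i} w_{ik}
    e_i = d_i - y_i              # eq (4)
    d_i = e_i                    # eq (5): d_{(i+1)k} = e_{ik}

print(np.isclose(e_i, e_direct))  # → True: both topologies agree
```

The cascade changes nothing numerically in exact arithmetic; its value appears only in fixed point, where each stage's operands can be rescaled independently, as described next.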
Fig. 1. Modified FIR filter (a cascade of N stages: the input x_k passes through a delay line of z^{-1} elements; each stage i forms y_{ik} = x_{k-i} w_{ik}, subtracts it from its local desired signal d_{ik}, and passes the resulting error e_{ik} on as d_{(i+1)k})

As can be seen, the values of the weights w_{ik} and of the errors e_{ik} decrease with rising i. We can therefore increase the computation accuracy by multiplying (bit shifting) e_{ik} and w_{ik} when their values fall under specific levels (thresholds). The multiplying can be made adaptive. Choosing the multiplier coefficients from the range 2^n enables the multiplication to be realised very simply by rotation. The hardware realisation of such an adaptive acoustic echo canceller can have a computation complexity equal to that of the classical FIR filter. More details about the M-FIR filter implementation and internal data representation can be found in [3].
3. Modified NLMS algorithm
Consider a situation where a real signal and a filter with coefficients represented by real numbers must be realised in a system with fixed point arithmetic. In this case, scaling of the data and coefficients to best fit the dynamic range of the system is required. In practical terms, scaling reduces to the selection of a crest factor appropriate for the signal characteristics and for the precision used in storage and arithmetic. The internal scaling can then be realised by normalisation to a value δ, the signal power level rounded to the nearest 2^n value; δ represents the basic shift in the internal number representation. Considering the above implementation, the convergence factor μ for the M-NLMS algorithm can be expressed as

μ_k = α / (γ + x_k^T x_k)    (6)

where α and γ are the coefficients α and γ of the NLMS algorithm shifted by the value δ. The term x_k^T x_k in equation (6) can be obtained at time k by the iteration

x_k^T x_k = x_{k-1}^T x_{k-1} - x_{k-N-1}^2 + x_k^2    (7)

and in our case

(x_k^T x_k) / δ = (x_{k-1}^T x_{k-1}) / δ - x_{k-N-1}^2 / δ + x_k^2 / δ    (8)

The new values of the filter weights can be obtained as follows. Let us define

w*_{i(k+1)} = (w_{ik} δ) / δ_{w,k} + ((μ_k e_k x_{k-i}) / δ) / δ_e    (9)

and

w**_{i(k+1)} = (w*_{i(k+1)} δ_{w,k+1}) / δ    (10)

where δ_{w,k} is the additional shift of the weight w_i at time k compared to the basic shift δ, and δ_{w,k+1} is the additional shift of the filter weight w_i at time k+1 compared to the basic shift δ. Here e_k is the final error (see Fig. 1) obtained at time k and δ_e is the additional shift of this error. The brackets in equations (9) and (10) define the sequence of arithmetic operations, which guarantees that the entire term can be retained without prematurely truncating its precision and without resorting to extended-width internal buses.
Next we define the following six conditions:

C1 = 1 if |w**_{i(k+1)}| > w_Mw    (11a)
C2 = 1 if |w**_{i(k+1)}| > w_Hw    (11b)
C3 = 1 if δ_{w,k+1} > 1    (11c)
C4 = 1 if |w**_{i(k+1)}| < w_Lw    (11d)
C5 = 1 if |w**_{i(k+1)}| > 0    (11e)
C6 = 1 if δ_{w,k+1} < δ_Mw    (11f)

where δ_Mw is the maximum acceptable additional shift of the weight w_i, and the threshold w_Mw is the maximum value of w_i. The thresholds w_Hw and w_Lw define high/low levels; when they are crossed, the weight w_i can be decreased/increased (shifted down/up) by the unit step δ_1. Now we can define four logical functions:

f1 = C1    (12a)
f2 = C2 ∧ C3 ∧ ¬f1    (12b)
f3 = C4 ∧ C5 ∧ C6 ∧ ¬f2    (12c)
f4 = ¬f3    (12d)

If f1 = 1 (true), then

δ_{w,k+1} = 1    (13)
w_{i(k+1)} = w*_{i(k+1)} / δ    (14)

else if f2 = 1 (true), then

δ_{w,k+1} = δ_{w,k+1} / δ_1    (15)
w_{i(k+1)} = w**_{i(k+1)} / δ_1    (16)

if not, then if f3 = 1 (true)

δ_{w,k+1} = δ_{w,k+1} δ_1    (17)
w_{i(k+1)} = w**_{i(k+1)} δ_1    (18)

and if f4 = 1 (true), then

w_{i(k+1)} = w**_{i(k+1)}    (19)

and δ_{w,k+1} does not change. The value of δ_w can be computed from two one-bit (Boolean) variables δ_U and δ_D, which indicate the shift of δ_w in stage i relative to the shift in stage i-1. This approach saves storage space and is implemented in the M-FIR filter. The new values of δ_U and δ_D are set up as follows:

δ_{U,i} = ¬f1 ∧ f3    (20)
δ_{D,i} = f1 ∨ f2    (21)
4. Implementation
The hardware realisation of the M-NLMS algorithm can decrease the computation complexity to that of the classical NLMS algorithm. The conditions (11) can be evaluated simply by comparators. The functions (12), (20) and (21) can be generated by logic elements (AND gates) or by look-up tables. When the thresholds and the unit step δ_1 are set to 2^n values, the multiplications and divisions in equations (6)-(21) can be realised by a multiplexer, a demultiplexer, or simply by bit shifting.
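The gist of the weight-shifting scheme is a block-floating-point update: each weight is held as a mantissa plus a power-of-two shift, and the shift is adjusted when the mantissa crosses the high/low thresholds. The sketch below illustrates this idea only; the concrete threshold values, the maximum shift, and the simplified three-way decision (versus the full conditions (11)-(12)) are assumptions, not the paper's exact logic.

```python
# Hedged sketch of adaptive weight shifting in integer arithmetic.
# All thresholds are powers of two so rescaling is a pure bit shift.
W_H, W_L = 1 << 14, 1 << 10   # high / low mantissa thresholds (illustrative)
S_MAX = 8                     # maximum acceptable additional shift
STEP = 1                      # unit shift step: multiply/divide by 2

def update_shift(mantissa, shift):
    """Rescale a weight's mantissa so it stays roughly inside [W_L, W_H]."""
    if abs(mantissa) > W_H and shift > 0:
        return mantissa >> STEP, shift - STEP     # too large: shift down
    if 0 < abs(mantissa) < W_L and shift < S_MAX:
        return mantissa << STEP, shift + STEP     # too small: shift up
    return mantissa, shift                        # in range: unchanged

m, s = update_shift(1 << 15, 4)   # over the high threshold
print(m, s)                       # → 16384 3
m, s = update_shift(1 << 9, 4)    # under the low threshold
print(m, s)                       # → 1024 5
```

Because every rescaling is a single-bit shift, the comparators plus shifter of this scheme add essentially no multiplier cost in hardware, which is the point made above.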
5. Simulation results
To verify the better performance of the modified NLMS algorithm for acoustic echo cancellation, the following computer simulations were carried out. In the simulations, the impulse response of a real teleconference room, 4000 samples long, had to be suppressed. Two adaptive filters with 4000 coefficients were used: an FIR filter adapted by the NLMS algorithm and an M-FIR filter adapted by the M-NLMS algorithm. The filter parameters were the same. In both cases the real data were scaled and internally represented by 16-bit integers with 11 bits for the fractional part (the basic shift δ = 2048). The unit step for the M-NLMS algorithm was δ_1 = 2. The measuring signal was white Gaussian noise. A total of 50000 iterations was made for each experiment. The convergence characteristics of the NLMS and M-NLMS algorithms are shown in Fig. 2. As can be seen, while the NLMS algorithm reaches only 30 dB echo suppression, the M-NLMS algorithm overcomes the 40 dB level defined by the ITU-T.
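A much smaller toy version of this experiment can be sketched in numpy: a standard (floating point) NLMS filter identifies a short decaying "room" response driven by white Gaussian noise, and the echo suppression is measured from the error power. The 32-tap path, the synthetic decay, and the suppression metric are illustrative stand-ins for the paper's 4000-tap fixed point setup.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 32
h = rng.normal(size=N) * np.exp(-0.2 * np.arange(N))  # decaying echo path
w = np.zeros(N)                                       # adaptive FIR weights
alpha, gamma = 1.0, 1e-8

x = rng.normal(size=5000)          # white Gaussian measuring signal
err = np.zeros(len(x))
for k in range(N, len(x)):
    xk = x[k - N + 1:k + 1][::-1]  # last N samples, newest first
    d = np.dot(h, xk)              # echo (desired signal)
    y = np.dot(w, xk)              # canceller output
    e = d - y
    w += (alpha / (gamma + np.dot(xk, xk))) * e * xk   # NLMS update
    err[k] = e

# Echo suppression (dB): early error power versus final error power.
erle = 10 * np.log10(np.mean(err[N:1000] ** 2) / np.mean(err[-1000:] ** 2))
print(erle > 30.0)   # → True in this noiseless floating point toy case
```

In floating point the canceller converges far past the levels quoted above; the paper's point is precisely that fixed point arithmetic breaks this behaviour for plain NLMS, while M-NLMS recovers it.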
Fig. 2. Comparison of convergence characteristics of echo cancellers adapted by the NLMS and M-NLMS algorithms

The dependence of the level of acoustic echo cancellation on the initial value of the normalised adaptation coefficient α for the NLMS and M-NLMS algorithms is shown in Fig. 3. As can be seen, the higher level of acoustic echo cancellation is reached by the M-FIR filter adapted by the M-NLMS algorithm. Furthermore, while the NLMS algorithm reaches its maximum level of acoustic echo cancellation for higher values of the normalised convergence parameter, α ≈ 2, and its fastest convergence for α ≈ 1, the M-NLMS algorithm reaches both its maximum level of acoustic echo cancellation and its fastest convergence for the same value α ≈ 1 (note: the parameter α can be in the range 0 < α < 2). Therefore, the choice α = 1 can decrease the computation complexity of the M-NLMS algorithm.
Fig. 3. Dependence of the level of acoustic echo cancellation on the normalised adaptation coefficient α
6. Summary
In this paper a new modified version of the NLMS algorithm (M-NLMS) is presented. It is shown that the M-NLMS adaptive algorithm can achieve better performance for acoustic echo cancellation than the NLMS algorithm with the same parameters. The implementation of the weight shifting algorithm enables better exploitation of the dynamic range given by the number of bits used for data representation. The effect of adaptive weight shifting is similar to floating point arithmetic, but its hardware implementation is much simpler. The hardware realisation can achieve the same computation complexity as the classical FIR filter and NLMS algorithm.
References
[1] TREICHLER, J. R., JOHNSON, C. R., LARIMORE, M. G.: Theory and Design of Adaptive Filters. John Wiley & Sons, New York, 1987.
[2] BORRALLO, J. M. P., OTERO, M. G.: "On the Implementation of a Partitioned Block Frequency Domain Adaptive Filter (PBFDAF) for Long Acoustic Echo Cancellation". Signal Processing, Vol. 27, 1992, pp. 301-315.
[3] MEDVECKY, M.: "Improvement of Acoustic Echo Cancellation in Hands-free Telephones". In: 1st International Conference on Telecommunications Technologies TELEKOMUNIKACIE'95, Bratislava, 31.5.-1.6.1995, Vol. 1, 1995, pp. 127-132. (in Slovak)
Matrix Polynomial Computations Using the Reconfigurable Systolic Torus
T. H. Kaskalis*  K. G. Margaritis
Department of Informatics, University of Macedonia
156 Egnatia str., 54006 Thessaloniki, Greece
E-mail: {kaskalis,kmarg}@macedonia.uom.gr
Abstract
A wide range of matrix functions, including matrix exponentials, inversions and square roots can be transformed to matrix polynomials through Taylor series expansions. The efficient computation of such matrix polynomials is considered here, through the exploitation of their recursive nature. The Reconfigurable Systolic Torus is proposed for its ability to implement iterative equations of various forms. Moreover, a detailed example of the matrix exponential realization is presented, together with the scaling and squaring method. The general design concepts of the Reconfigurable Systolic Torus are discussed and the algorithmic steps needed for the implementation are presented. The Area and Time requirements, together with the accomplished utilization percentage conclude the presentation.
1 Introduction
The solution of various types of equations, appearing in many mathematical models, dynamic probabilistic systems, and in stochastic and control theory, often requires the calculation of distinct matrix functions [1, 2, 6, 13]. Such functions include matrix exponentials (e^A), matrix inversions (A^{-1}), matrix roots (A^{1/2}, A^{-1/2}) or functions of the form cos A, sin A, log A, etc. As a result, the transformation of these polynomials to iterative algorithms, and their consequent efficient implementation, becomes an important issue. The Reconfigurable Systolic Torus [7] is a structure designed to implement iterative equations of various forms and is, therefore, a good candidate for the realization of matrix polynomial computations.
2 Systolic Implementation Example: The Matrix Exponential
In order to present the distinct steps for the implementation of a particular matrix function, we will focus on the matrix exponential example. Given a matrix A, the exponential e^A can be formally defined by the following convergent power series:

e^A = I + A + A^2/2! + A^3/3! + ...    (1)
The straightforward algorithmic approach for calculating the exponential of a matrix is the Taylor series approximation technique:

e^A ≈ T_k(A) = \sum_{p=0}^{k} A^p / p!    (2)
However, such an algorithm is known to be unsatisfactory, since k is usually very large for a sufficiently small error tolerance. Furthermore, the round-off errors and the computing costs of the Taylor approximation increase as ||A|| increases [10]. We can surpass these difficulties by using the fundamental property:

e^A = (e^{A/m})^m    (3)
Moreover, if we employ the scaling and squaring method, we choose m to be a power of two, such that:

m = 2^l  and  ||A/m|| = ||A|| / 2^l