Springer Series on Signals and Communication Technology

Imaging for Forensics and Security: From Theory to Practice, A. Bouridane. ISBN 978-0-387-09531-8
Human Factors and Voice Interactive Systems, Second Edition, D. Gardner-Bonneau and H. Blanchard. ISBN 978-0-387-25482-1
Multimedia Content Analysis: Theory and Applications, A. Divakaran (Ed.). ISBN 978-0-387-76567-9
Wireless Communications: 2007 CNIT Thyrrenian Symposium, S. Pupolin. ISBN 978-0-387-73824-6
Grid Enabled Remote Instrumentation, F. Davoli, N. Meyer, R. Pugliese and S. Zappatore. ISBN 978-0-387-09662-9
Adaptive Nonlinear System Identification: The Volterra and Wiener Model Approaches, T. Ogunfunmi. ISBN 978-0-387-26328-1
Usability of Speech Dialog Systems, T. Hempel. ISBN 978-3-540-78342-8
Handover in DVB-H, X. Yang. ISBN 978-3-540-78629-0
Multimodal User Interfaces, D. Tzovaras (Ed.). ISBN 978-3-540-78344-2
Wireless Sensor Networks and Applications, Y. Li, M.T. Thai and W. Wu (Eds.). ISBN 978-0-387-49591-0
Passive Eye Monitoring, R.I. Hammoud (Ed.). ISBN 978-3-540-75411-4
Digital Signal Processing, S. Engelberg. ISBN 978-1-84800-118-3
Digital Video and Audio Broadcasting Technology, W. Fischer. ISBN 978-3-540-76357-4
Satellite Communications and Navigation Systems, E. Del Re and M. Ruggieri (Eds.). ISBN 978-0-387-47522-6
Three-Dimensional Television, H.M. Ozaktas and L. Onural (Eds.). ISBN 978-3-540-72531-2
Foundations and Applications of Sensor Management, A.O. Hero III, D. Castañón, D. Cochran and K. Kastella (Eds.). ISBN 978-0-387-27892-6
Wireless Network Security, Y. Xiao, X. Shen and D.Z. Du (Eds.). ISBN 978-0-387-28040-0
Satellite Communications and Navigation Systems, E. Del Re and M. Ruggieri. ISBN 0-387-47522-2
Wireless Ad Hoc and Sensor Networks: A Cross-Layer Design Perspective, R. Jurdak. ISBN 0-387-39022-7
Cryptographic Algorithms on Reconfigurable Hardware, F. Rodriguez-Henriquez, N.A. Saqib, A. Díaz Pérez and C.K. Koc. ISBN 0-387-33956-6
Multimedia Database Retrieval: A Human-Centered Approach, P. Muneesawang and L. Guan. ISBN 0-387-25627-X
Broadband Fixed Wireless Access: A System Perspective, M. Engels and F. Petre. ISBN 0-387-33956-6
Distributed Cooperative Laboratories: Networking, Instrumentation, and Measurements, F. Davoli, S. Palazzo and S. Zappatore (Eds.). ISBN 0-387-29811-8
The Variational Bayes Method in Signal Processing, V. Šmídl and A. Quinn. ISBN 3-540-28819-8

(continued after index)
Ahmed Bouridane
Imaging for Forensics and Security: From Theory to Practice
Ahmed Bouridane
Queen's University, Belfast
Department of Computer Science
Faculty of Engineering
Belfast BT7 1NN, United Kingdom
[email protected]

ISSN 1860-4862
ISBN 978-0-387-09531-8
e-ISBN 978-0-387-09532-5
DOI 10.1007/978-0-387-09532-5
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009927770

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To my Wife Saida, Daughters Asma and Alaa, and Son Abbudi For their love and support
Preface
The field of security has witnessed explosive growth in recent years, as phenomenal advances have been made in both research and applications. Biometric and forensic imaging applications often involve photographs, videos and other image impressions that are fragile and contain subtle details that are difficult to see. As a developer, one needs to be able to quickly build sophisticated imaging applications that allow precious information to be extracted accurately from image data for identification and recognition purposes. This is true for any type of biometric and forensic image data. The applications covered in this book relate to biometrics, watermarking and shoeprint recognition for forensic science. Image processing transforms such as the Discrete Fourier Transform, Discrete Wavelet Transforms, Gabor Wavelets, Complex Wavelets, Scale Invariant Feature Transforms and Directional Filter Banks are used in the data modelling process for either feature extraction or data hiding tasks. The emphasis is on the methods and the analysis of data sets, including comparative studies against existing and similar techniques. To make the underlying methods accessible to a wider audience, we have stated the key mathematical results within a logical structure of development.
Biometric-based methods, for example, are emerging as the most reliable solutions for authentication and identification applications where traditional passwords (knowledge-based security) and ID cards (token-based security) have so far been used to access restricted systems. Automated biometrics deal with physiological or behavioural characteristics, such as fingerprints, iris, voice and face, that can be used to authenticate a person's identity or establish an identity within a database. With rapid progress in electronic and Internet commerce, there is also a growing need to authenticate the identity of a person for secure transaction processing. Current biometric systems make use of fingerprints, hand geometry, iris, retina, face, facial thermograms, signature, gait and voiceprint to establish a person's identity. While biometric systems have their limitations, they have an edge over traditional security methods in that they cannot be easily stolen or shared. Besides bolstering security, biometric systems also enhance user convenience by alleviating the need to design and remember passwords.
Driven by the urgent need to protect digital media content that is being widely and wildly distributed and shared through the Internet by an ever-increasing number
of users, the field of digital watermarking has witnessed extremely fast-growing development since its inception almost a decade ago. The main purpose of digital watermarking, information embedding and data hiding systems is to embed auxiliary information, usually called a digital watermark, inside a host signal (audio, image or video) by introducing small perturbations into the host signal. The quality of the host signal should not be degraded unacceptably, and the introduced changes should lie below the minimum perception threshold of the intended recipient. Watermark detection and extraction from the composite host signal should be possible in the presence of a variety of intentional and unintentional manipulations and attacks, under the assumption that these attacks and manipulations do not corrupt the composite host signal to an unacceptable level. Watermarking systems are expected to play an important role in meeting at least two major challenges that have resulted from the widespread use of the Internet for the distribution and exchange of digital media: (i) error-free perfect copies of digital multimedia and (ii) the availability of free and affordable tools for the manipulation and alteration of digital content. The first challenge was the driving force behind the combined efforts of academic and industrial research to produce first-generation watermarking algorithms, which were mainly concerned with the "copyright protection" of digital content. For instance, illegal distribution and copying of digital music is causing the music industry massive revenue losses. The second challenge has guided research efforts to develop so-called "tamper-proof" or "fragile" watermarking algorithms. This class of watermarking schemes aims at detecting any "intentional" manipulation or corruption of the media.
Following the emergence and success of forensic science as a powerful and irrefutable tool for solving many enigmatic crime puzzles, images collected from crime scenes abound and, therefore, large image collections are being created. Shoeprint images are no exception, and it has recently been indicated that shoeprint evidence is more frequently present at crime scenes than fingerprints. Very recently, it has been suggested that shoeprint evidence should be made comparable to fingerprint and DNA evidence. It is also true that shoeprint intelligence remains an untapped potential forensic source, usually overshadowed by the accepted fingerprint and DNA evidence. However, there is no practical technology to efficiently search shoeprint images in large databases. Existing commercial systems still require manual involvement (manual annotation of both the impression under investigation and the primary database). The task of automated scene-mark matching is a tedious one, and research into the use of existing image processing and pattern recognition techniques is needed before an underpinning technology can be developed.
One of the most distinctive features of the book is that it covers in detail a number of imaging applications and their deployment in security problems. In addition, the book appeals to both undergraduate and postgraduate students since each application problem includes a detailed description of the mathematical background and its implementation. Most of the material of the book is derived from very recent research output generated by various researchers at doctoral level under the supervision of the author.
This brings novelty to the topics through a thorough analysis of the implementation results. My indebtedness goes to those students, in particular W. R. Boukabou, M. Gueham, M. Laadjel, M. Nabti, O. Nibouche, I. Thompson, H. Su, K. Zebbiche and A. Baig of the Speech, Image and Vision Systems (SIVS) group at the School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast.
The book is organised as follows. Chapter 1 starts by defining the biometric technology, including the characteristics required for a viable deployment using various operation modes such as verification, identification and watch-list. A number of currently used biometric modalities are also described, with some emphasis on a few emerging ones. Then the various steps of a typical biometric recognition system are discussed in detail. For example, data acquisition, image localisation, feature extraction and matching are all defined, and the current methods employed for their implementation and deployment are discussed and contrasted. The chapter concludes by briefly highlighting the need to use appropriate datasets for the evaluation of a biometric system.
Chapter 2 introduces the notion of data representation in the context of biometrics. The various stages of a typical biometric system are enumerated and discussed, and the most commonly deployed biometric modalities are stated. The chapter also examines various aspects related to image data representation and modelling for feature extraction and matching. Various methods are then briefly discussed and brought within the context of a biometric system. For example, image data formats, feature sets, and system testing and performance evaluation metrics are detailed.
In Chapter 3, recent advances in enhancing the performance of face recognition using the concept of directional filter banks are discussed. In this context, directional filter banks are investigated as a pre-processing phase in order to improve the recognition rates of a number of different existing algorithms. The chapter starts by reviewing basic face recognition principles and enumerates the various steps of a face recognition system. Four algorithms representing both Component and Discriminant Analysis approaches, namely PCA, ICA (FastICA), LDA and SDA, are chosen for their proven popularity and efficiency to demonstrate the usefulness of the directional filter bank method. The mathematical models behind these approaches are also detailed. Then the proposed directional filter bank method is described and its implementation discussed. The results and their analysis are finally assessed using two well-known face databases.
Chapter 4 is concerned with recent advances in iris recognition using a multiscale approach. State-of-the-art work in the area is first highlighted and discussed, and a detailed review of the various steps of an automatic iris recognition system is given. Proposed developments are then detailed for both iris localisation and classification using an integrated multiscale wavelet approach. Extensive experimentation is carried out and a comparative analysis with some state-of-the-art approaches given. The chapter concludes by giving some future directions to further enhance the results obtained.
In Chapter 5, the use of complex wavelets for image and video watermarking is described. The theory of complex wavelets and their features are first highlighted.
The concept of spread transform watermarking is then presented, and its combination with complex wavelet transforms detailed. An information-theoretic capacity analysis for watermarking with complex wavelets is then elucidated. The chapter concludes with some experiments and their analysis to demonstrate the improved levels of capacity that can be achieved through the superior feature representation offered by complex wavelet transforms.
Chapter 6 discusses the problem of one-bit watermark detection for protecting fingerprint images. The problem is theoretically formulated based on the maximum-likelihood scheme, which requires an accurate modelling of the host data. The watermark is embedded in the Discrete Wavelet Transform (DWT) domain due to the various advantages provided by this transform. First, a statistical study of DWT coefficients is carried out by investigating and comparing three distributions, namely the generalised Gaussian, Laplacian and Cauchy models. Then, the performance of the detectors based on these models is assessed and evaluated through extensive experiments. The results show that the generalised Gaussian is the best model and that its corresponding detector yields the best detection performance.
Chapter 7 introduces the emerging shoemark evidence for forensic use. It starts by giving a detailed background on the contribution of shoemark data to scene-of-crime officers, including a discussion of the methods currently in use to collect shoeprint data. Methods for the collection of shoemarks are also detailed and the problems associated with each method highlighted. In addition, the chapter gives a detailed review of existing shoemark classification systems.
In Chapter 8, methods for automatically classifying shoeprints for use in forensic science are presented. In particular, we propose two correlation-based approaches to classify low-quality shoeprints: (i) Phase-Only Correlation (POC), which can be considered as a matched filter, and (ii) Advanced Correlation Filters (ACFs). These techniques offer two primary advantages: the ability to match low-quality shoeprints and translation invariance. Experiments were conducted on a database of images of 100 different shoes available on the market. For the experimental evaluation, challenging test images including partial shoeprints with different distortions (such as noise addition, blurring and in-plane rotation) were generated. Results have shown that the proposed correlation-based methods are very practical and provide high performance when processing low-quality partial prints.
Chapter 9 is concerned with the retrieval of scene-of-crime (or scene) shoeprint images from a reference database of shoeprint images by using a new local feature detector and an improved local feature descriptor. Similar to most other local feature representations, the proposed approach can be divided into two stages: (i) a set of distinctive local features is selected by first detecting scale-adaptive Harris corners, where each corner is associated with a scale factor; this allows for the selection of the final features whose scale matches the scale of blob-like structures around them; and (ii) for each feature, an improved Scale Invariant Feature Transform (SIFT) descriptor is computed to represent it. Our investigation has led to the development of two novel methods, referred to as the Modified Harris–Laplace (MHL) detector and the Modified SIFT descriptor, respectively.
Contributions:

Chapter 2: "Data Representation and Analysis", A. Baig and A. Bouridane
Chapter 3: "Improving Face Recognition Using Directional Faces", W. R. Boukabou and A. Bouridane
Chapter 4: "Recent Advances in Iris Recognition: A Multiscale Approach", M. Nabti and A. Bouridane
Chapter 5: "Spread Transform Watermarking Using Complex Wavelets", I. Thompson and A. Bouridane
Chapter 6: "Protection of Fingerprint Data Using Watermarking", K. Zebbiche and A. Bouridane
Chapter 7: "Shoemark Recognition for Forensic Science: An Emerging Technology", H. Su and A. Bouridane
Chapter 8: "Techniques for Automatic Shoeprint Classification", M. Gueham and A. Bouridane
Chapter 9: "Automatic Shoeprint Image Retrieval Using Local Features", H. Su and A. Bouridane

Belfast, United Kingdom, 2008
Ahmed Bouridane
Contents
1 Introduction and Preliminaries on Biometrics and Forensics Systems ..... 1
  1.1 Introduction ..... 1
  1.2 Definition of Biometrics ..... 1
    1.2.1 Biometric Characteristics ..... 2
    1.2.2 Biometric Modalities ..... 2
  1.3 Recognition/Verification/Watch-List ..... 5
    1.3.1 Verification: Am I Who I Claim to Be? ..... 5
    1.3.2 Recognition: Who Am I? ..... 5
    1.3.3 The Watch-List: Are You Looking for Me? ..... 6
  1.4 Steps of a Typical Biometric Recognition Application ..... 6
    1.4.1 Biometric Data Localisation ..... 6
    1.4.2 Normalisation and Pre-processing ..... 7
    1.4.3 Feature Extraction ..... 8
    1.4.4 Matching ..... 9
    1.4.5 Databases ..... 9
  1.5 Summary ..... 9
  References ..... 10

2 Data Representation and Analysis ..... 11
  2.1 Introduction ..... 11
  2.2 Data Acquisition ..... 12
    2.2.1 Sensor Module ..... 13
    2.2.2 Data Storage ..... 14
  2.3 Feature Extraction ..... 15
  2.4 Matcher ..... 16
  2.5 System Testing ..... 17
  2.6 Performance Evaluation ..... 17
  2.7 Conclusion ..... 18
  References ..... 19
3 Improving Face Recognition Using Directional Faces ..... 21
  3.1 Introduction ..... 21
  3.2 Face Recognition Basics ..... 22
    3.2.1 Recognition/Verification ..... 22
    3.2.2 Steps of a Typical Face Recognition Application ..... 23
  3.3 Previous Work ..... 26
    3.3.1 Principal Component Analysis (PCA) ..... 26
    3.3.2 Independent Component Analysis (ICA) ..... 27
    3.3.3 Linear Discriminant Analysis (LDA) ..... 28
    3.3.4 Subspace Discriminant Analysis (SDA) ..... 29
  3.4 Face Recognition Using Filter Banks ..... 31
    3.4.1 Gabor Filter Bank ..... 31
    3.4.2 Directional Filter Bank: A Review ..... 33
  3.5 Proposed Method and Results Analysis ..... 37
    3.5.1 Proposed Method ..... 37
    3.5.2 PCA ..... 38
    3.5.3 ICA ..... 39
    3.5.4 LDA ..... 41
    3.5.5 SDA ..... 41
    3.5.6 FERET Database Results ..... 43
  3.6 Conclusion ..... 45
  References ..... 45
4 Recent Advances in Iris Recognition: A Multiscale Approach ..... 49
  4.1 Introduction ..... 49
  4.2 Related Work: A Review ..... 51
  4.3 Iris Localisation ..... 52
    4.3.1 Background ..... 52
    4.3.2 Iris Segmentation ..... 52
    4.3.3 Existing Methods for Iris Localisation ..... 53
  4.4 Proposed Method for Iris Localisation ..... 55
    4.4.1 Motivation ..... 55
    4.4.2 The Multiscale Method ..... 57
    4.4.3 Results and Analysis ..... 65
  4.5 Texture Analysis and Feature Extraction ..... 67
    4.5.1 Wavelet Maxima Components ..... 68
    4.5.2 Special Gabor Filter Bank ..... 68
    4.5.3 Proposed Method ..... 70
  4.6 Matching ..... 71
  4.7 Experimental Results and Analysis ..... 72
    4.7.1 Database ..... 72
    4.7.2 Combined Multiresolution Feature Extraction Techniques ..... 72
    4.7.3 Template Computation ..... 73
    4.7.4 Comparison with Existing Methods ..... 73
  4.8 Discussion and Future Work ..... 74
  4.9 Conclusion ..... 75
  References ..... 75
5 Spread Transform Watermarking Using Complex Wavelets ..... 79
  5.1 Introduction ..... 79
  5.2 Wavelet Transforms ..... 80
    5.2.1 Dual Tree Complex Wavelet Transform ..... 80
    5.2.2 Non-redundant Complex Wavelet Transform ..... 83
  5.3 Visual Models ..... 86
    5.3.1 Chou's Model ..... 87
    5.3.2 Loo's Model ..... 93
    5.3.3 Hybrid Model ..... 94
  5.4 Watermarking as Communication with Side Information ..... 94
    5.4.1 Quantisation Index Modulation ..... 96
    5.4.2 Spread Transform Watermarking ..... 97
  5.5 Proposed Algorithm ..... 98
    5.5.1 Encoding of Watermark ..... 99
    5.5.2 Decoding of Watermark ..... 100
  5.6 Information Theoretic Analysis ..... 100
    5.6.1 Decoding of Watermark ..... 101
    5.6.2 Parallel Gaussian Channels ..... 102
    5.6.3 Watermarking Game ..... 105
    5.6.4 Non-iid Data ..... 110
    5.6.5 Fixed Embedding Strategies ..... 111
  5.7 Conclusion ..... 113
  References ..... 113

6 Protection of Fingerprint Data Using Watermarking ..... 117
  6.1 Introduction ..... 117
  6.2 Generic Watermarking System ..... 119
  6.3 State-of-the-Art ..... 123
  6.4 Optimum Watermark Detection ..... 124
  6.5 Statistical Data Modelling and Application to Watermark Detection ..... 127
    6.5.1 Laplacian and Generalised Gaussian Models ..... 128
    6.5.2 Alpha Stable Model ..... 129
  6.6 Experimental Results ..... 130
    6.6.1 Experimental Modelling of DWT Coefficients ..... 132
    6.6.2 Experimental Watermarking Results ..... 135
  6.7 Conclusions ..... 138
  References ..... 139

7 Shoemark Recognition for Forensic Science: An Emerging Technology ..... 143
  7.1 Background to the Problem of Shoemark Forensic Evidence ..... 143
    7.1.1 Applications of a Shoemark in Forensic Science ..... 144
    7.1.2 The Need for Automating Shoemark Classification ..... 146
    7.1.3 Inconsistent Classification ..... 147
    7.1.4 Importable Classification Schema ..... 148
    7.1.5 Shoemark Processing Time Restrictions ..... 149
  7.2 Collection of Shoemarks at Crime Scenes ..... 149
    7.2.1 Shoemark Collection Procedures ..... 150
    7.2.2 Transfer/Contact Shoemarks ..... 150
    7.2.3 Photography of Shoemarks ..... 151
    7.2.4 Making Casts of Shoemarks ..... 152
    7.2.5 Gelatine Lifting of Shoemarks ..... 153
    7.2.6 Electrostatic Lifting of Shoemarks ..... 153
    7.2.7 Recovery of Shoemarks from Snow ..... 154
    7.2.8 Recovery of Shoemarks Using Perfect Shoemark Scan ..... 154
    7.2.9 Making a Cast of a Shoemark Directly from a Suspect's Shoe ..... 155
    7.2.10 Processing of Shoemarks ..... 155
    7.2.11 Entering Data into a Computerised System ..... 157
  7.3 Typical Methods for Shoemark Recognition ..... 157
    7.3.1 Feature-Based Classification ..... 158
    7.3.2 Classification Based on Accidental Characteristics ..... 159
  7.4 Review of Shoemark Classification Systems ..... 160
    7.4.1 SHOE-FIT ..... 160
    7.4.2 SHOE© ..... 160
    7.4.3 Alexandre's System ..... 161
    7.4.4 REBEZO ..... 161
    7.4.5 TREADMARK™ ..... 162
    7.4.6 SICAR ..... 162
    7.4.7 SmART ..... 162
    7.4.8 De Chazal's System ..... 163
    7.4.9 Zhang's System ..... 163
  References ..... 163

8 Techniques for Automatic Shoeprint Classification ..... 165
  8.1 Current Approaches ..... 165
  8.2 Using Phase-Only Correlation ..... 166
    8.2.1 The POC Function ..... 166
    8.2.2 Translation and Brightness Properties of the POC Function ..... 168
    8.2.3 The Proposed Phase-Based Method ..... 168
    8.2.4 Experimental Results ..... 170
  8.3 Deployment of ACFs ..... 172
    8.3.1 Shoeprint Classification Using ACFs ..... 173
    8.3.2 Matching Metrics ..... 175
    8.3.3 Optimum Trade-Off Synthetic Discriminant Function Filter ..... 176
    8.3.4 Unconstrained OTSDF Filter ..... 177
    8.3.5 Tests and Results ..... 178
  8.4 Conclusion ..... 179
  References ..... 180

9 Automatic Shoeprint Image Retrieval Using Local Features ..... 181
  9.1 Motivations ..... 181
  9.2 Local Image Features ..... 181
    9.2.1 New Local Feature Detector: Modified Harris–Laplace Detector ..... 182
    9.2.2 Local Feature Descriptors ..... 186
    9.2.3 Similarity Measure ..... 188
  9.3 Experimental Results ..... 189
    9.3.1 Shoeprint Image Databases ..... 189
  9.4 Summary ..... 199
  References ..... 200

Index ..... 203
Chapter 1
Introduction and Preliminaries on Biometrics and Forensics Systems
1.1 Introduction

Biometric-based security has been researched and tested for a few decades, but has only recently entered the public consciousness because of high-profile applications, especially since the events of 9/11. Many companies and government departments are now implementing and deploying biometric technologies to secure areas, maintain security records, protect borders and support law enforcement at borders and entry points. Biometrics is the science of verifying the identity of an individual through his/her physiological measurements, e.g. fingerprints, hand geometry, etc., or behavioural traits, e.g. voice and signature. Since biometric identifiers are associated permanently with the user, they are more reliable than token- or knowledge-based authentication methods such as identification cards (which can be lost or stolen) or passwords (which can be forgotten). Biometric recognition is concerned with methods and tools for the verification and recognition of a person's identity by means of unique appearance or behavioural characteristics. This chapter starts by defining the biometric technology, including the characteristics required for a viable deployment using various operation modes such as verification, identification and watch-list. A number of currently used biometric modalities are also described, with some emphasis on a few emerging ones. Various steps of a typical biometric recognition system are then discussed in detail. For example, data acquisition, image localisation, feature extraction and matching are all defined, and the current methods employed for their implementation and deployment are assessed and contrasted. The chapter concludes by briefly highlighting the need to use appropriate data sets for the evaluation of a biometric system.
1.2 Definition of Biometrics

Biometric technologies can be defined as "automated methods of verifying or recognising the identity of a person based on a physiological and/or behavioural characteristic" [1, 2].
The term "automated methods" means that biometric technologies are implemented largely, though not always completely, by a machine, generally a digital computer. The second important part of the definition is "physiological or behavioural characteristic", meaning that biometrics recognises people from their biological and behavioural characteristics. In other words, biometrics defines something you are, in contrast to other methods of identification such as something you have (e.g. cards, keys) or something you know (e.g. a password or PIN).
1.2.1 Biometric Characteristics

Several characteristics must be fulfilled by a physical or behavioural trait for it to be considered viable for a biometric application, and the most agreed upon are [3]:

• Universality: Every individual accessing the application should possess the trait.
• Uniqueness: The given trait should be sufficiently different across individuals comprising the population.
• Permanence: The biometric trait of an individual should be sufficiently invariant over a period of time with respect to the matching algorithm.
• Measurability: It should be possible to acquire and digitise the biometric trait using suitable devices that do not cause undue inconvenience to the individual. Furthermore, the acquired raw data should be amenable to processing in order to extract representative feature sets.
• Performance: The recognition accuracy and the resources required to achieve that accuracy should meet the constraints imposed by the application.
• Acceptability: Individuals in the target population that will utilise the application should be willing to present their biometric trait to the system.
• Circumvention: This refers to the ease with which the trait of an individual can be imitated using artefacts in the case of physical traits, and mimicry in the case of behavioural traits.
1.2.2 Biometric Modalities

In order to establish the identity of an individual, a variety of physical and behavioural characteristics can be used by biometric systems (Fig. 1.1). These include fingerprint, face, hand/finger geometry, iris, retina, signature, gait, palmprint, voice pattern, ear, hand vein, odour and DNA information. In the biometric literature, these characteristics are referred to as traits, identifiers or modalities. No single biometric is expected to effectively meet all the requirements imposed by all applications. In other words, no biometric is ideal, but a number of them are admissible [2].

Fig. 1.1 Examples of biometric traits that can be used to recognise an individual. Illustrations in the figure include ear, iris, hand geometry, face, speech, vein, fingerprint, gait and palmprint traits

The following sections briefly describe some of the most commonly used and some emerging biometric traits:

• Fingerprint recognition has been used as a biometric trait for many decades, and its identification accuracy has been shown to be very high [4]. A fingerprint is the pattern of ridges and valleys on the surface of a fingertip, whose formation is determined during the first seven months of foetal development. It has been empirically determined that the fingerprints of identical twins are different, and so are the prints on each finger of the same person [5]. Fingerprint biometrics currently has three main applications: (i) large-scale automated finger imaging systems (AFIS), generally used for law enforcement purposes; (ii) fraud prevention in entitlement programs; and (iii) physical and computer access. The main problems with fingerprint identification are related to the huge amount of computational resources required for large-scale systems, and to the cuts and bruises that people can have on their fingers [3].
• Iris recognition uses the pattern of the iris, the coloured part of the eye, although colour itself has nothing to do with the biometric trait. The iris patterns of a person's left and right eyes are different, and so are the iris patterns of different individuals, including identical twins [6]. Iris recognition is usually employed as a verification process due to its low false acceptance rate.
• Hand geometry recognition is based on a number of measurements taken from the human hand, such as its shape, the size of the palm (but not its print), and the lengths and widths of the fingers. This method is very easy to deploy and is not computationally expensive. However, its low degree of distinctiveness and the variability of hand size with age pose major problems [7]. This technology is not very suitable for identification applications.
• Voice recognition is both a physical and behavioural biometric modality. The physical features of an individual's voice are based on the shape and size of the appendages (vocal tract, mouth, nasal cavities and lips), which are invariant for an individual, but the behavioural aspect of the speech changes over time due to age, medical conditions, emotional state, etc. [8]. Speaker recognition is most appropriate in telephone-based applications, but the quality of the voice signal is degraded by the communication channel. The disadvantages of this biometric trait are that (i) it is not suitable for large-scale recognition and (ii) the speech features are sensitive to background noise.
• Signature recognition is defined as the process of verifying the writer's identity by checking his/her signature against samples kept in a database. The result of this process is usually a number between 0 and 1 which represents a fit ratio (1 for match and 0 for mismatch). The threshold used for the confirmation/rejection decision depends on the nature of the application. The distinctive biometric patterns of this modality are the personal rhythm, acceleration and pressure flow when a person writes a specific word or group of words (usually the hand signature of the individual).
• Keystroke recognition attempts to assess the user's typing style, such as the dwell time (how long each key is depressed), flight time (time between key strokes) and typical typing errors. Usually this security technology is deployed for computer access within an organisation. The distinctive and behavioural characteristics measured by keystroke recognition also include the cumulative typing speed; the frequency with which the individual uses other keys on the keyboard, such as the number pad or function keys; and the sequence utilised by the individual when attempting to type a capital letter.
• Gait recognition is the process of identifying an individual by the manner in which they walk. This modality is less obtrusive than most others and as such offers the possibility of identifying people at a distance without any interaction or co-operation from the subject, thus making it an attractive solution for identification applications. This technology is still at an early stage of research and development.
• Ear recognition is a relatively new biometric technology and is useful during crime scene investigations in the absence of valid fingerprints. This modality is gaining interest from the biometric community, and its operation is similar to that of face recognition.
1.3 Recognition/Verification/Watch-List

A typical biometric application can be classified into one of two types: verification (or authentication) and identification (or recognition). In some applications, a third scenario may be added; for example, Phillips et al. in the Face Recognition Vendor Test (FRVT) [9] define another type called the "watch-list".
1.3.1 Verification: Am I Who I Claim to Be?

This scenario can be employed in a control access point application. It is used when a subject provides an alleged identity. The system then performs a one-to-one match that compares a query biometric image against the stored template image of the person whose identity is being claimed. If a match is made, the identity of the person is verified. In other words, the verification test is conducted by dividing the subjects into two groups:

• Clients: people trying to gain access using their own identity.
• Imposters: people trying to gain access using a false identity, i.e. an identity known to the system but not belonging to them.

The percentage of imposters granted access is reported as the false acceptance rate (FAR), and the percentage of clients denied access as the false rejection rate (FRR).
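To make the two error rates concrete, the short Python sketch below computes the FAR and FRR from lists of client (genuine) and imposter match scores at a given decision threshold. The score values and the threshold are illustrative assumptions, not data from this book.

```python
def far_frr(client_scores, imposter_scores, threshold):
    """Compute FAR and FRR (in %) for a verification system that
    accepts a claimed identity when score >= threshold."""
    # FAR: fraction of imposter attempts wrongly accepted.
    far = sum(s >= threshold for s in imposter_scores) / len(imposter_scores)
    # FRR: fraction of genuine (client) attempts wrongly rejected.
    frr = sum(s < threshold for s in client_scores) / len(client_scores)
    return 100.0 * far, 100.0 * frr

# Illustrative similarity scores in [0, 1]; higher means a closer match.
clients = [0.91, 0.84, 0.77, 0.95, 0.62]
imposters = [0.31, 0.45, 0.58, 0.12, 0.71]
far, frr = far_frr(clients, imposters, threshold=0.7)
print(f"FAR = {far:.1f}%, FRR = {frr:.1f}%")
```

Raising the threshold lowers the FAR at the cost of a higher FRR; the operating point is chosen to suit the application.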
1.3.2 Recognition: Who Am I?

This mode is used when the identity of the individual is not known in advance. The entire template database is then searched for a match to the individual concerned in a one-to-many search. If a match is made, the individual is identified. The recognition test works on the assumption that all biometric images being tested are of known persons. The percentage of correct identifications is reported as the correct (or genuine) identification rate (CIR), while the percentage of false identifications is reported as the false identification rate (FIR).
1.3.3 The Watch-List: Are You Looking for Me?

One example of a watch-list task is to compare a suspect flight passenger against a database of known terrorists. In this scenario, the person does not claim any identity; it is an open-universe test, and the test person may or may not be in the system database. The biometric sample of this individual is compared to the stored samples in a watch-list to determine if the individual concerned is present in the watch-list, and a similarity score is reported for each comparison. These similarity scores are then numerically ranked, and if a score is higher than a preset threshold, an alarm is raised and the system assumes that this person is present in the watch-list. There are two terms of interest for watch-list applications [10]:

• Detection and identification rate: the percentage of times the system raises the alarm when correctly identifying a person on the watch-list.
• False alarm rate: the percentage of times the system raises the alarm when an individual is not on the watch-list.

It is worth noting that, in an ideal system, one wants the false alarm rate and the detection and identification rate to be 0 and 100%, respectively.
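The open-universe decision described above can be sketched as follows: each probe is compared against every watch-list template, the scores are ranked, and an alarm is raised only when the best similarity exceeds the preset threshold. The template vectors, the cosine-similarity scoring and the threshold value are illustrative placeholders, not a prescribed implementation.

```python
import numpy as np

def check_watchlist(probe, watchlist, threshold):
    """Return (alarm, best_id): alarm is True when the highest similarity
    score against the watch-list exceeds the preset threshold."""
    # Cosine similarity between the probe and each stored template.
    scores = {pid: float(np.dot(probe, t) /
                         (np.linalg.norm(probe) * np.linalg.norm(t)))
              for pid, t in watchlist.items()}
    best_id = max(scores, key=scores.get)  # numerically rank the scores
    return scores[best_id] >= threshold, best_id

# Illustrative 4-dimensional feature templates for two listed individuals.
watchlist = {"subject_A": np.array([0.9, 0.1, 0.3, 0.5]),
             "subject_B": np.array([0.2, 0.8, 0.6, 0.1])}
alarm, who = check_watchlist(np.array([0.88, 0.12, 0.33, 0.52]),
                             watchlist, threshold=0.95)
if alarm:
    print("alarm raised for", who)
else:
    print("no match on watch-list")
```

Running such a check over probes of known membership yields the detection and identification rate (from the listed probes) and the false alarm rate (from the unlisted ones).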
1.4 Steps of a Typical Biometric Recognition Application

In general, biometric recognition applications, regardless of the specific method used, consist of the following steps, as shown in Fig. 1.2.
1.4.1 Biometric Data Localisation

This is the first module in a biometric recognition process, where it is assumed that an image or a video containing image data is available to the system as an input. Localisation is a very important step in obtaining satisfactory recognition results, and the challenges associated with this process can be attributed to many factors such as pose, presence or absence of structural components, occlusions, image orientation and imaging conditions (lighting, camera characteristics). Many approaches have been proposed to address biometric image localisation problems. In general, single-image detection and localisation methods fall into the following categories:

• Feature-invariant approaches: These algorithms aim to find the structural features that exist even when pose, viewpoint or lighting conditions vary in order to locate images. These methods are mainly designed for biometric image localisation.
• Template matching methods: Several standard patterns of a biometric image are stored to describe the image as a whole or the biometric features separately. The correlations between an input image and the stored patterns are computed for use in the detection process. These methods are used for both localisation and detection (a generic correlation sketch is given after Fig. 1.2 below).
• Appearance-based methods: In contrast to template matching, the models (or templates) are learned from a set of training images which capture the representative variability of image appearance. These learned models are then used for biometric image detection.

Fig. 1.2 Steps of a typical biometric image recognition system: image acquisition, biometric image localisation (yielding a biometric sub-image), normalisation and pre-processing (yielding a normalised image), feature extraction (yielding a feature vector) and, finally, matching against a database to produce the result
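As an illustration of the template matching category, the brute-force sketch below slides a stored pattern over an input image and scores each position with the normalised cross-correlation; the peak marks the most likely location of the biometric region. This is a generic sketch, not a specific localisation algorithm from this book.

```python
import numpy as np

def ncc_localise(image, template):
    """Return the top-left corner of the image window that best matches
    the template under normalised cross-correlation."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-9)
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            w = image[y:y + th, x:x + tw].astype(float)
            w = (w - w.mean()) / (w.std() + 1e-9)
            score = float((w * t).mean())  # correlation in [-1, 1]
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

In practice the same correlation is computed far more efficiently in the Fourier domain, but the exhaustive form above makes the underlying principle explicit.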
1.4.2 Normalisation and Pre-processing

The aim of this step is to enhance the quality of the captured images, compensating for the distortions described previously, with a view to improving the recognition power of the system. Depending on the application, some or all of the following pre-processing techniques may be implemented in a biometric recognition system (a minimal sketch follows this list):

• Geometrical alignment: In some cases, the image is not located at its optimum position; it might be somewhat rotated or shifted. Since the main
body of the biometric image plays a key role in the determination of biometric features, especially for face/iris/palmprint recognition systems based on frontal views of images, it can be very helpful if the pre-processing module normalises the shifts and rotations to the main position.
• Image size normalisation: This process aims to align images such that they are of the same size and are located at the same position and orientation. Resizing is then performed to set the size of an acquired image to a default image size, say 128×128, 256×256, etc. This step is mostly encountered in systems where images are processed globally.
• Enhancement: This step is not always required, but it can be highly useful in two cases: (i) median filtering for noisy images, especially those obtained from a camera or a frame grabber, and (ii) high-pass filtering to highlight the contours of the image and further improve edge detection performance.
• Background removal: This process removes the background so that processing concentrates on the most useful information. Masking can also be used to eliminate the sections of the image that are not part of the main image area. This is done to ensure that the biometric recognition system does not respond to features corresponding to background, hair, clothing, etc.
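The minimal sketch below covers three of the steps above: resizing to a default working size, median filtering for sensor noise, and masking out the background. It uses NumPy and SciPy; the 128×128 target size and the binary mask are illustrative choices, not requirements of any particular system.

```python
import numpy as np
from scipy.ndimage import median_filter, zoom

def preprocess(image, mask=None, size=(128, 128)):
    """Normalise an acquired greyscale image: resize to a default size,
    median-filter impulsive noise, and zero out background pixels."""
    # Image size normalisation to the default working resolution.
    factors = (size[0] / image.shape[0], size[1] / image.shape[1])
    img = zoom(image.astype(float), factors, order=1)
    # Enhancement: median filtering suppresses impulsive camera noise.
    img = median_filter(img, size=3)
    # Background removal: keep only pixels inside the region of interest.
    if mask is not None:
        img = img * zoom(mask.astype(float), factors, order=0)
    return img
```

Geometrical alignment (rotation/shift correction) would precede these steps and typically relies on landmarks detected during localisation.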
1.4.3 Feature Extraction

This is the key step in any biometric recognition system, and in all pattern recognition systems in general. Once the detection/localisation process has extracted the biometric image data and normalised it, the image can be analysed. The recognition process analyses the spatial geometry of the distinguishing features of the image. Different methods exist to extract the distinguishing features of an image, but in general they can be classified into three approaches:

• Feature-based approaches: These are based on the extraction of the properties of individual organs located on a biometric trait, such as eyes, nose and mouth for a face, wrinkles and lines for a palmprint, eye lashes for an iris, etc., as well as their relationships with each other.
• Appearance-based approaches: These are based on information theory concepts and seek a computational model that best describes a biometric image. This is performed by extracting the most relevant information contained in the image without dealing with the individual properties of organs such as eyes or mouth. In appearance-based approaches, an image is considered as a high-dimensional vector, i.e. a point in a high-dimensional vector space. Many of these approaches use statistical techniques to analyse the distribution of the object image vectors in the vector space in order to derive an efficient and effective representation (feature space) depending on the targeted application (a minimal sketch follows this list). Given a test image, the similarity between the stored prototypes and the test view is then computed in the feature space.
1.5
Summary
9
• Hybrid approaches: Just as the human perception system uses both local features and the whole image region to recognise a biometric image, a machine recognition system should use both.
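As a minimal sketch of the appearance-based idea, the fragment below learns a k-dimensional subspace from vectorised training images (in the style of eigenfaces) and projects new images into it. It is a generic PCA illustration under the stated assumptions, not the specific algorithms developed in Chapter 3.

```python
import numpy as np

def pca_features(train_images, k):
    """Learn a k-dimensional appearance subspace from vectorised training
    images; return the mean vector and the projection basis."""
    X = np.stack([im.ravel() for im in train_images]).astype(float)
    mean = X.mean(axis=0)
    # Principal directions of the centred image vectors via SVD.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Map an image into the learned low-dimensional feature space."""
    return basis @ (image.ravel().astype(float) - mean)
```

Matching then reduces to comparing the short projected vectors rather than the raw high-dimensional images.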
1.4.4 Matching

The fourth step of a biometric recognition system is to compare the template generated in the previous step against a database of known templates for the biometric application. In an identification application, this process yields scores that indicate how closely the generated template matches each of those in the database. In a verification application, the generated template is only compared to one template in the database to confirm or reject the claimed identity of the person. Finally, the system should determine whether the produced score is sufficiently large to declare a match. The rules governing the declaration of a match are of two types: (i) manual, where the end-user has to determine if the result is satisfactory or not, and (ii) automatic, in which case the measured distance (the matching score) is compared to a predefined threshold so that a match is declared only if the measured score is higher than the threshold.
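The automatic decision rule can be summarised in code: in identification mode, the query feature vector is scored against every enrolled template and a match is declared only when the best score clears the predefined threshold. The mapping from Euclidean distance to a similarity score is an illustrative choice.

```python
import numpy as np

def identify(query, database, threshold):
    """One-to-many matching: return the best-matching identity, or None
    when no score is large enough to declare a match."""
    best_id, best_score = None, -np.inf
    for identity, template in database.items():
        # Convert a distance into a similarity score in (0, 1].
        score = 1.0 / (1.0 + np.linalg.norm(query - template))
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None
```

Verification is the special case where the database holds the single template of the claimed identity.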
1.4.5 Databases

To build and train a biometric recognition algorithm, it is necessary to use a standard test data set, as used by researchers and end-users, in order to be able to directly assess and compare results. A database is a collection of one or more computer files. For biometric systems, these files could consist of biometric sensor readings, templates, match results, related end-user information, etc. While there exist many databases currently in use, which can be found on the Internet or obtained from academic or industrial institutions, the choice of an appropriate database should be made based on the targeted biometric application (face, iris, palmprint, speech, etc.). Another way is to select a data set specific to the application at hand; for example, to test how the algorithm responds to biometric images under varying environmental conditions or how it operates under different operating scenarios by varying the setup variables/values.
1.5 Summary

Biometrics aims to automatically identify individuals based on their unique physiological or behavioural traits. A number of civilian and commercial applications of biometrics-based identification have been deployed in real-world problems, and many more are emerging. These deployments are intended to strengthen security and convenience in their respective environments. However, a number of legitimate concerns are also being raised against the use of biometrics for various applications, such as
loss of privacy and performance limitations. To address these issues, appropriate legislation will need to be brought about in order to protect the privacy rights of individuals while at the same time endorsing the use of biometrics for legitimate purposes, especially if much improved and non-invasive solutions are researched and developed at low cost.
References

1. J. Wayman, A. K. Jain, D. Maltoni and D. Maio, Eds., "Biometric Systems: Technology, Design and Performance Evaluation", Springer-Verlag, London, UK, 2005.
2. A. K. Jain, P. Flynn and A. A. Ross, Eds., "Handbook of Biometrics", Springer Science+Business Media, LLC, New York, USA, 2008.
3. A. K. Jain, R. Bolle and S. Pankanti, Eds., "Biometrics: Personal Identification in Networked Society", Kluwer Academic Publishers, London, UK, 1999.
4. C. Wilson, A. R. Hicklin, M. Bone, H. Korves, P. Grother, B. Ulery, R. Micheals, M. Zoep, S. Otto and C. Watson, "Fingerprint Vendor Technology Evaluation 2003: Summary of Results and Analysis Report", NIST Technical Report NISTIR 7123, National Institute of Standards and Technology, June 2004.
5. D. Maltoni, D. Maio, A. K. Jain and S. Prabhakar, "Handbook of Fingerprint Recognition", Springer-Verlag, London, UK, 2003.
6. J. D. Woodward, C. Horn, J. Gatune and A. Thomas, "Biometrics: A Look at Facial Recognition", RAND Public Safety and Justice for the Virginia State Crime Commission, 2003.
7. R. Zunkel, "Hand Geometry Based Authentication", in Biometrics: Personal Identification in Networked Society, pp. 87–102, Kluwer Academic Publishers, London, UK, 1999.
8. J. P. Campbell, "Speaker Recognition: A Tutorial", Proceedings of the IEEE, vol. 85, no. 9, pp. 1437–1462, September 1997.
9. P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi and J. M. Bone, "FRVT 2002: Overview and Summary", http://www.frvt.org/frvt2002/documents.htm.
10. D. M. Blackburn, "Biometrics 101, Version 3.1, vol. 12", Federal Bureau of Investigation, March 2004.
Chapter 2
Data Representation and Analysis
2.1 Introduction The last few years have witnessed the emergence of new tools and means for the scientific analysis of image-based information for security, forensic science and crime prevention applications. For instance, images can now be captured, viewed and analysed at crime scenes or in laboratories within minutes, whilst simultaneously being made available to other experts via fast and secure communication links over the Internet, thereby making it possible to share information for forensic and security intelligence and crime-linking purposes. In addition, these tools have a strong link with other aspects of investigation, such as image capture, information interpretation and evidence gathering; they help minimise human error and support the analysis of data. Although a number of application scenarios exist, the analysis of data is usually based on a conventional biometric system. The following discussion of a biometric system is therefore given, as it is the starting point for any other imaging system for use in security and/or forensic science. A standard Biometric Identification System consists of the following three phases: Data Acquisition, Feature Extraction and Matching, and operates in two distinct modes: Enrolment Mode or Identification Mode [1]. The Data Acquisition stage is used in the enrolment mode to establish the database of users and their related biometric data, whereas in Identification mode it is used to obtain a reference biometric from the user. This reference biometric is then processed in the Feature Extraction phase to obtain unique and comparable features. These features are then compared in the Matching phase with the related features of all the biometric templates in the database to establish or refute the identity of the user. Figure 2.1 depicts a block diagram of a basic Biometric Identification System. The design of any biometric system is based on decisions regarding the selection of appropriate modules for each of these processes [1, 3]. Details of these processes, the modules included within them, and the critical issues that need to be addressed before a design decision is made are described below.
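The three-phase structure just described can be summarised as a small skeleton; the sketch below is a minimal, hypothetical outline of the Enrolment and Identification modes (the class and method names are illustrative assumptions, not an API from this text).

class BiometricSystem:
    # Skeleton of the three phases: Data Acquisition, Feature Extraction, Matching.

    def __init__(self, sensor, extractor, matcher):
        self.sensor, self.extractor, self.matcher = sensor, extractor, matcher
        self.database = {}  # user ID -> stored template

    def enrol(self, user_id):
        # Enrolment mode: acquire data and store the user's template.
        raw = self.sensor.capture()
        self.database[user_id] = self.extractor.extract(raw)

    def identify(self):
        # Identification mode: acquire a reference biometric and match it
        # against the templates of all enrolled users.
        probe = self.extractor.extract(self.sensor.capture())
        scores = {uid: self.matcher.score(probe, tpl)
                  for uid, tpl in self.database.items()}
        return max(scores.items(), key=lambda kv: kv[1])  # (best ID, best score)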
[Fig. 2.1 Block diagram of a biometric identification system: an enrolment path (biometric sensor → feature extraction → database, indexed by ID) and an identification/authentication path (biometric sensor → feature extraction → matching against the database → result).]
2.2 Data Acquisition In the data acquisition process, the first and foremost decision pertains to the selection of an appropriate biometric trait. Considerable thought has to go into the selection of the human physical and behavioural traits used by a biometric recognition system. The selected trait has to be universal yet unique, so that it exists in all users but also varies from subject to subject. It has to be permanent and resistant to change, so that the stored biometric data remains usable over a long period of time. It has to be measurable, and socially and economically acceptable, so that the data can be gathered, matching performed and results quantified within reasonable time and cost constraints. It should also be very difficult to circumvent or forge the trait. In addition, attention should be paid to ensuring that machine-readable representations completely capture the invariant and discriminatory information in the input measurements. This representation issue is by far the most important requirement and has far-reaching implications for the design of the rest of the system. The unprocessed measurement values are typically not invariant over the time of capture, so one needs to determine those salient features of the input measurement which both discriminate between identities and remain invariant for a given individual. The acquisition module should also aim to capture salient features, since it is accepted that more distinctive biometric signals offer more reliable identity authentication, while less complex measurement signals inherently offer less reliable identification results. Therefore, quantification at an early stage leads to much improved and effective results for a biometric recognition system. It is important to note that no single biometric is expected to meet all the above requirements. Developers are therefore required to identify the best possible biometric trait for each application; for example, for an application controlling access to a critical area, cost may not be a significant consideration, whereas uniqueness and resistance to circumvention will be. Some of the most commonly used biometric traits are summarised in Table 2.1.
Table 2.1 Some commonly used biometrics

Physical:
– Fingerprint: most commonly used; higher false accept rate
– Face: easiest to acquire; difficult to compare
– Hand geometry: robust under different conditions; changes occur with age
– Palm print: bigger area of interest; availability of data sets
– Iris: very low false accept rate; difficult to acquire
– Ear: robust to change; difficult to acquire

Behavioural:
– Gait: useful in distant surveillance; changes with age and surface
– Voice: useful in absence of visual data; changes with age and health
– Handwriting & signature: useful in detecting emotions as well; changes with age, health and stress
2.2.1 Sensor Module Once a biometric trait is identified, a suitable sensor module is required to capture the data. The sensor module is the interface between the user and the system, so the selection of an appropriate sensor module is critical to the success of the system. A poor-quality sensor module will not capture the biometric data properly, increasing the failure-to-capture rate; this causes user dissatisfaction and a low acceptance rate, which may eventually cause the system to fail. It is also important to decide whether the sensor module operates overtly or covertly: if the user is aware that biometric identification is taking place, the system is called overt; if not, it is called covert. Overt systems are mostly used in access control applications, whereas covert systems are used in surveillance and continuous monitoring. For example, a computer may use a fingerprint scanner to identify the user at logon (an overt system) but then continuously acquire facial images from an attached webcam to verify that the user who logged on is still the one accessing the system (a covert system).
As most biometric systems are imaging-based, the quality and maintenance of the raw captured biometric images also play an important role in the development of a strong biometric identification system.
2.2.2 Data Storage Maintenance of the template data store, or template database, is an often overlooked area of a biometric identification system. A well-established, secure and effective database can improve the performance and user acceptance of the system. As the database has to store the biometric data along with other personal details of the users, it has to be kept very secure. The size of the database also has to be kept as small as possible to maintain the speed of access. The type of data to be stored depends on the kind of application that will utilise it: raw images are stored for research purposes, whereas feature sets are usually stored for real-world applications. Both types of data storage present some interesting challenges [4].
2.2.2.1 Raw Images Raw images are stored for research and development applications. When storing raw images, the following points have to be kept in mind:
Image size – The image size should be kept small to reduce the database size, but if it is too small, a lot of important information may be lost.
Image format – As the data is being stored for research, it is important to keep the images in a standard file format so that they can also be accessed by other researchers. The image should be stored in a format that does not use lossy compression, because such compression changes pixel values in the image and can thus corrupt the data.
Image DPI – A high-DPI image carries more information and thus provides more data for researchers, but it also takes more space, increasing the size of the database. A balance between DPI and image size is needed for optimal performance of the database.
Usually an image of around 640 × 480 pixels with 8 bits per pixel at 500 dpi, stored as a TIFF or BMP image, is considered to be a good biometric image. Figure 2.2 shows some samples of raw fingerprint and iris images.
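As a brief illustration of the image format point, the sketch below stores a captured image losslessly with the Pillow library; the 8-bit grayscale capture and the file names are assumptions made for the example.

from PIL import Image
import numpy as np

# Stand-in for a captured 8-bit grayscale biometric image (640 x 480).
capture = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
img = Image.fromarray(capture, mode="L")

# An uncompressed TIFF preserves every pixel value ...
img.save("sample.tif", format="TIFF")
# ... whereas lossy JPEG alters pixel values and can corrupt research data.
img.save("sample_lossy.jpg", format="JPEG", quality=75)

assert np.array_equal(np.array(Image.open("sample.tif")), capture)  # lossless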
2.2.2.2 Feature Sets Real-world applications of biometric systems require quick access to data and very small storage space; therefore, feature sets are usually stored instead of raw images. Processed feature sets usually take up less space than raw images, thus reducing the database size and increasing the access speed.
[Fig. 2.2 A visual spectrum fingerprint image, a thermal fingerprint image and an IR spectrum iris image]
One of the lesser-known advantages of using feature sets is that the actual raw biometric data cannot be recreated from a feature set; storing only the feature set therefore provides personal data protection. It should be kept in mind that, to maintain system openness, the feature sets should be stored in one of the standard formats, such as those defined in ANSI/NBS – ICST 1-1986 for minutiae, ANSI/NIST – ITL 1a-1997 for facial feature sets and ANSI/NIST – ITL 1-2006 for iris data [4, 5]. Using these standards allows for easy expansion and upgrading of the system at later times.
2.3 Feature Extraction As mentioned in Chapter 1, one of the first steps in the feature extraction process is pre-processing, which can consist of multiple steps:
Image enhancement – to reduce the noise in the image and/or to enhance the features to be extracted.
Image formatting – the image is converted to a format that allows better feature extraction performance; e.g. some minutiae extraction algorithms require the image to be converted into binary form.
Image registration and alignment – images are aligned and centred so that the features extracted from all images lie within the same frame of reference, making the matching process much simpler.
Image segmentation – the raw image is segmented into a region of interest (ROI) and background. All processing is carried out on the ROI, so it is imperative that the best possible segmentation is obtained.
Once a raw image is properly processed, the feature extraction algorithm can be used to extract the relevant features. Feature extraction algorithms can be classified into two groups: global feature extractors and local feature extractors. Global feature extractors aim to locate usable features from the raw data at the overall image level: they process the image as a whole and extract features from it, e.g. the Gabor wavelets-based approach for iris and fingerprint recognition [6].
Local feature extractors, on the other hand, focus on local regions of the image: these algorithms work on small windows within the image and extract the relevant features, e.g. minutiae extraction from skeletonised and binarised fingerprint images. The selection of a feature extraction algorithm is governed mainly by the type of application the system is being designed for. Applications requiring more accuracy and security should use a robust and exhaustive feature extractor, whereas for faster applications a simpler algorithm might be the best option. Ideally, the feature extractor should be robust, accurate and fast, but in practice this is not possible; it is therefore almost always a compromise between accuracy and speed, and it is advisable to evaluate multiple feature extraction algorithms to find the optimal one for the desired application. The choice also depends on the type of matcher used in the system: the feature extractor should generate output in a format that the matcher is able to comprehend and process. As mentioned before, to maintain openness of the system it is prudent to ensure that the output of the feature extractor follows a standard format.
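The two groups can be contrasted in a few lines of code. The sketch below uses simplified stand-in descriptors chosen for illustration (a whole-image frequency-domain descriptor for the global case and block-wise statistics for the local case); these are not the specific algorithms cited above.

import numpy as np

def global_features(img, k=16):
    # Global extractor: describe the image as a whole via the k x k
    # low-frequency magnitudes of its 2D Fourier transform.
    spectrum = np.abs(np.fft.fft2(img))
    return spectrum[:k, :k].ravel()

def local_features(img, block=16):
    # Local extractor: describe small windows independently by their
    # mean and variance, concatenated into one feature vector.
    h, w = img.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            win = img[i:i + block, j:j + block]
            feats.extend([win.mean(), win.var()])
    return np.array(feats)

img = np.random.rand(128, 128)      # stand-in for a biometric image
print(global_features(img).shape)   # (256,)
print(local_features(img).shape)    # (128,) = 8 x 8 blocks x 2 statistics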
2.4 Matcher A matcher algorithm takes the reference feature set and compares it with all the template feature sets in the database to produce a matching score for each pair; it then selects the best template–reference pair and outputs the details as its decision. Different types of matchers are used depending on the type and format of the feature sets as well as on the application at hand. Matchers are commonly divided into two categories: time domain matchers and frequency domain matchers [1–3]. Time domain matchers work in the spatial domain, and their feature sets are generated directly from the raw images. Frequency domain matchers operate in the frequency domain, and their feature sets are generated by first transforming the image into the frequency domain and then selecting the features, e.g. wavelet-based matchers, Fourier transform-based matchers, etc. It is worth noting that correlation-based matchers are the most commonly used matching algorithms; distance-based matchers and supervised learning (pattern recognition-based) matching are also widely used. Pattern recognition-based matching finds the correct match by training on known correct and incorrect matching feature sets; the training process is usually computationally intensive, but if it is done efficiently, matching can be very fast and highly accurate. The selection of the optimal matcher depends on the application for which the system is being developed, on the type of feature sets available for matching, and on the desired accuracy and speed.
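A minimal distance-based matcher of the kind described above might look as follows; converting the Euclidean distance into a similarity score via 1/(1 + d) is an assumption made for the example.

import numpy as np

def match(reference, templates):
    # Compare a reference feature set against every template feature set
    # and return the best template-reference pair with its score.
    scores = {}
    for uid, template in templates.items():
        dist = np.linalg.norm(reference - template)  # Euclidean distance
        scores[uid] = 1.0 / (1.0 + dist)             # distance -> similarity
    best = max(scores, key=scores.get)
    return best, scores[best]

templates = {uid: np.random.rand(64) for uid in ("alice", "bob", "carol")}
probe = templates["bob"] + 0.01 * np.random.randn(64)  # noisy sample of bob
print(match(probe, templates))                          # usually: ('bob', ...)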
2.5 System Testing When a biometric identification system is developed, it should be thoroughly tested before going live. Appropriate testing is critical to locate, identify and eradicate errors before the system is deployed in a real-world scenario. Testing is conducted on a test data set, and for impartial testing of biometric recognition systems the industry and research community need access to large public data sets. A test data set is generated by collecting data with known values (called the ground truth), and testing is carried out by matching a set of known reference feature sets against the test database and evaluating the matcher's results against the ground truth [7]. Evaluation is usually conducted in two overlapping categories: technology evaluation and operational evaluation. Technology evaluation measures the performance of the recognition algorithm and is usually conducted using standard data sets; in general, the results from this type of evaluation are used to further improve the algorithm. The aim of operational evaluation, on the other hand, is to measure and assess the performance of the recognition system in a particular application, for example iris authentication at an airport. This type of evaluation usually takes weeks or months in order to account for environmental changes and varying test samples. The results are then analysed to locate and remove bugs, and the performance of the system is evaluated and presented in a standard format for the users.
2.6 Performance Evaluation Determining the best biometric system for a specific operational environment, and how to set that system up for optimal performance, requires an understanding of the evaluation methodologies and statistics used in the biometrics community. The degree of similarity between two biometric images is usually measured by a similarity score. The score is called a genuine score if the similarity is measured between feature sets of the same user, and an imposter score if it is measured between feature sets of different users. An end-user is often interested in determining the performance of the biometric system for his or her specific application, for example whether the system makes accurate identifications. Although a few criteria exist, no single metric is adequate to give a reliable and convincing indication of the identification accuracy of a biometric system. One criterion generally accepted by the biometrics community, however, distinguishes genuine-individual decisions from impostor decisions, which can be represented by two statistical distributions called the genuine distribution and the impostor distribution, respectively. The performance of a biometric identification system can then be evaluated based on the genuine and imposter scores generated by the system [7]. The following measures are usually employed:
Matcher accuracy – Accuracy is measured on the test data set to discover how many of the feature sets are correctly matched by the system. If the genuine score is above the operating threshold of the system, the feature set is considered to be correctly matched. Matcher accuracy is usually reported as a percentage of matches.
False accept rate (FAR) – If an imposter score is above the operating threshold, it is called a false accept; the FAR therefore measures how often the system accepts an imposter as a genuine user. The FAR is one of the major performance metrics that have to be closely evaluated, and effort should be made to keep it as close to zero as possible.
False reject rate (FRR) – If a genuine score is below the threshold, it is called a false reject; the FRR thus measures how often the system rejects a genuine user as an imposter. The FRR should ideally be as close to zero as possible, but in most access control applications it is not as critical as the FAR: a falsely rejected user can always try again, whereas if an imposter is accepted as a genuine user, the integrity of the complete system is compromised.
False alarm rate – A statistic used to measure biometric performance when operating in the watch-list (sometimes referred to as open-set identification) task. This is the percentage of times an alarm is incorrectly sounded on an individual who is not in the biometric system's database (the system alarms on John when John is not in the database), or an alarm is sounded but the wrong person is identified (the system alarms on Peter when Peter is in the database, but the system thinks Peter is Daniel).
Equal error rate (EER) – The point on the ROC curve where the FAR and FRR are equal. For a high-performance system the EER should be as low as possible; most vendors report performance in terms of accuracy and EER.
Some other evaluation criteria are:
Failure to capture rate (FCR) – The FCR pertains to the proportion of times a sensor is unable to capture an image when a biometric trait is presented to it. The FCR increases with wear and tear of the sensor module; if it rises above a certain threshold, it is advisable to replace the sensor module.
Failure to enrol (FTE) – The FTE indicates the proportion of users who could not be enrolled in the system. It is usually related to the quality of the biometric images: in most cases, a system is trained to reject poor-quality images, which helps improve accuracy and reduce the FAR and FRR, but every rejected image increases the FTE. A trade-off between quality and FTE is required if the system is to be accepted by the users.
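The sketch below shows how these rates can be estimated from recorded genuine and imposter scores; the score distributions are synthetic, and the simple threshold sweep is only one way to approximate the EER.

import numpy as np

def far_frr(genuine, imposter, threshold):
    far = np.mean(imposter >= threshold)  # imposters wrongly accepted
    frr = np.mean(genuine < threshold)    # genuine users wrongly rejected
    return far, frr

def equal_error_rate(genuine, imposter):
    # Sweep thresholds and return the point where FAR and FRR are closest.
    thresholds = np.linspace(0.0, 1.0, 1001)
    rates = [far_frr(genuine, imposter, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2.0

# Synthetic score distributions: genuine scores tend higher than imposter ones.
rng = np.random.default_rng(0)
genuine = np.clip(rng.normal(0.75, 0.10, 5000), 0, 1)
imposter = np.clip(rng.normal(0.40, 0.12, 5000), 0, 1)
print(f"EER ~ {equal_error_rate(genuine, imposter):.3f}")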
2.7 Conclusion To develop a strong biometric system, it is imperative to select a very stable data acquisition system and a very secure, fast and robust database. The selection of the feature extractor and matcher directly impacts the user acceptance of the system and is driven by the type of application.
References
1. A. A. Ross, P. Flynn and A. K. Jain, Eds., "Handbook of Biometrics," Springer, New York, USA, 2008. ISBN 978-0-387-71040-2.
2. A. K. Jain, A. Ross and S. Prabhakar, "An introduction to biometric recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, January 2004.
3. J. G. Daugman, "Biometric Decision Landscape," Technical Report No. TR482, University of Cambridge Computer Laboratory, 1999.
4. Data Format for Information Interchange – Fingerprint Identification, ANSI/NBS – ICST 1-1986.
5. Data Format for the Interchange of Fingerprint, Facial & SMT Information, ANSI/NIST – ITL 1a-1997.
6. H. Meng and C. Xu, "Iris Recognition Algorithm Based on Gabor Wavelet Transform," IEEE International Conference on Mechatronics and Automation, 2006.
7. J. Wayman, A. Jain, D. Maltoni and D. Maio, "Biometric Systems: Technology, Design and Performance Evaluation," Springer, 2005. ISBN 1852335963.
Chapter 3
Improving Face Recognition Using Directional Faces
3.1 Introduction Face recognition is one of the most popular applications in image processing and pattern recognition. It plays a very important role in many applications, such as card identification, access control, mug shot searching, security monitoring and surveillance. Several problems make automatic face recognition a very challenging task. The image of a person's face input to a recognition system is usually acquired under conditions different from those of the corresponding image in the database; it is therefore important that an automatic face recognition system can deal with numerous variations of images of a face. These variations are usually due to changes in pose, illumination, expression, age, disguise, facial hair, glasses and background. Much progress has been made towards recognising faces under controlled conditions, as reported in [1, 2], especially for faces under normalised pose and lighting conditions and with neutral expression. The Eigenfaces method [3], based on Principal Component Analysis (PCA), is one of the most popular methods in face recognition. Its principal idea is to find a set of orthogonal basis images (called eigenfaces) such that, in this new basis, the image coordinates (the PCA coefficients) are uncorrelated. Independent Component Analysis (ICA) [4] is a generalisation of PCA which assumes that the image data is independent, not merely uncorrelated as in PCA. The Fisherface technique [5], based on Linear Discriminant Analysis (LDA), is another popular method; it considers that each face image in the training set belongs to a known class and uses this information in the classification step. Subclass Discriminant Analysis (SDA) is a recent algorithm devised by Zhu and Martinez [6] in which each class of the LDA method is subdivided into a number of subclasses. However, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains problematic. Researchers have proposed the use of a pre-processing step in order to extract more discriminant features for the recognition step. The Gabor Filter Bank (GFB) is one of the most well-known methods used for this purpose, and many algorithms have been proposed [7, 2]. However, as described in [8], the use of a GFB inherently results in
some overlapping and missing subband regions. The Directional Filter Bank (DFB), on the other hand, is a contiguous subband representation that preserves all image information. Accordingly, a DFB can represent linear patterns, such as those available around the eyes, nose and mouth, more effectively than a GFB [9]. This chapter discusses the use of a DFB pre-processing phase to improve the recognition rates of a number of different algorithms. Four algorithms representing both component and discriminant analysis approaches have been selected to demonstrate the efficiency of DFBs: PCA, ICA (FastICA [10]), LDA and SDA, chosen for their popularity and efficiency.
3.2 Face Recognition Basics 3.2.1 Recognition/Verification A typical face recognition system operates in one of two modes: face verification (or authentication) and face identification (or recognition). However, Phillips et al., in the Face Recognition Vendor Test (FRVT) 2002 [11], define a third mode referred to as the "watch-list". 3.2.1.1 Face Verification: Am I Who I Claim to Be? This scenario can be employed in an access control application. It is used when a person provides an alleged identity: the system performs a one-to-one match that compares a query face image against the stored template face image of the claimed identity. If a match is made, the identity of the person is verified. In other words, the verification test is conducted by dividing the subjects into two groups: • Clients, people trying to gain access using their own identity. • Imposters, people trying to gain access using a false identity, i.e. an identity known to the system but not belonging to them. The percentages of imposters gaining access and of clients denied access are referred to as the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) for a given threshold, respectively. 3.2.1.2 Face Recognition: Who Am I? This mode is used when the identity of the individual is not known in advance. The entire template database is searched for a match to the individual concerned in a one-to-many search. If a match is made, the individual is identified.
The recognition test works from the assumption that all faces being tested are of known persons. The percentage of correct identifications is reported as the Correct (or Genuine) Identification Rate (CIR), while the percentage of false identifications is reported as the False Identification Rate (FIR). 3.2.1.3 The Watch-List: Are You Looking for Me? One important application of the watch-list task is comparing a suspect flight passenger against a database of known terrorists. In this scenario, the person does not claim any identity; it is an open-universe test, and the test person may or may not be in the system database. The biometric sample of the individual is compared with the stored samples in a watch-list database to determine whether the individual is present in the watch-list, and a similarity score is reported for each comparison. These similarity scores are then ranked in order. If a similarity score is higher than a preset threshold, an alarm is raised and the system assumes that the person is present in the watch-list. There are two figures of interest for a watch-list application [12]: • Detection and identification rate: the percentage of times the system raises the alarm and correctly identifies a person on the watch-list. • False alarm rate: the percentage of times the system raises the alarm for an individual who is not on the watch-list. In an ideal system, the false alarm rate and the detection and identification rate would be 0 and 100%, respectively.
3.2.2 Steps of a Typical Face Recognition Application Facial recognition applications, regardless of the specific method used, in general consist of the following steps (Fig. 3.1): 3.2.2.1 Face Localisation This is the first module in a face recognition process. It assumes that an image or a video containing face images is available as an input to the system. In Yang et al. [13], a difference is made between face localisation and face detection: • Face detection: given an arbitrary image, the goal is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face. • Face localisation is the process of localising one face in a given image, i.e. the image is assumed to contain one, and only one face.
Fig. 3.1 Steps of a typical face recognition application
Therefore, face detection is a very important task of any face recognition system, and efficient detection enhances the recognition results. The challenges associated with face detection can be attributed to many factors, such as pose, presence or absence of structural components (facial hair, glasses, etc.), facial expression, occlusions, image orientation and imaging conditions (lighting, camera characteristics). Many approaches have been proposed to address the face detection problem [14, 13]; a summary is given in Table 3.1.
Table 3.1 Categorisation of methods for face detection in a single image

Approach – Representative work
Knowledge-based – Multi-resolution rule-based method [15]
Feature invariant:
– Facial features – Grouping of edges [16]
– Texture – Space gray-level dependence matrix of face pattern [17]
– Skin colour – Mixture of Gaussians [18]
– Multiple features – Integration of skin colour, size and shape [19]
Template matching:
– Predefined face templates – Shape templates [20]
– Deformable templates – Active Shape Models [21]
Appearance-based:
– Eigenfaces – Eigenvector decomposition and clustering [3]
– Distribution-based – Gaussian distribution and multi-layer perceptron [22]
– Neural network – Ensemble of neural networks and arbitration schemes [23]
– Support Vector Machine – SVM with polynomial kernel [24]
– Naive Bayes classifier – Joint statistics of local appearance and position [25]
– Hidden Markov Model – Higher order statistics with HMM [26]
– Information-theoretical approach – Kullback relative information [27]
3.2.2.2 Normalisation and Pre-processing The aim of this step is to enhance the quality of the captured images, which may be degraded by one or more of the factors mentioned in the previous section, so as to improve the recognition power of the system. Depending on the application at hand, some or all of the following pre-processing techniques may be implemented in a face recognition system:
• Geometrical alignment (translation, rotation).
• Image size normalisation.
• Filtering (median filtering, high-pass filtering).
• Illumination normalisation.
• Background removal.
3.2.2.3 Feature Extraction This is the key step in face recognition in particular, and in all pattern recognition applications in general. Once the face detection task has detected and normalised a face, the analysis can take place by capturing the spatial geometry of the distinguishing features of the face. Different methods exist to extract identifying features of a face, but in general they can be classified into three approaches. Feature-based approaches are based on the extraction of the properties of individual organs located on the face, such as the eyes, nose and mouth, including their relationships with each other. Appearance-based approaches are based on information theory concepts: they seek a computational model that best describes a face by extracting the most relevant information contained in it, without dealing with the individual properties of facial organs. Hybrid approaches mimic the human perception system, which uses both local features and the whole face region to recognise a face; a machine recognition system may use both. Table 3.2 presents some of the principal algorithms developed for feature extraction, as described by Zhao and Chellappa [28].
Table 3.2 Categorisation of feature extraction techniques

Approach – Representative work
Appearance-based methods:
– Eigenfaces – Direct application of PCA [3]
– Probabilistic Eigenfaces – Two-class problem with probabilistic measure [29]
– Fisherfaces – FLD on Eigenspace [5]
– SVM – Two-class problem based on SVM [30]
– Evolution pursuit – Enhanced GA learning [31]
– Feature lines – Based on point-to-line distance [32]
– ICA – ICA-based feature analysis [33, 4]
– Kernel faces – Kernel methods [34]
Feature-based methods:
– Pure geometry methods – Earlier methods [35–37]; recent methods [38, 39]
– Dynamic link architecture – Graph matching methods [40, 41]
– Hidden Markov model – HMM methods [42, 43]
Hybrid methods:
– Modular Eigenfaces – Eigenfaces and Eigenmodules [44]
– Hybrid LFA – Local feature method [45]
– Shape normalised – Flexible appearance models [21]
– Component-based – Face region and components [46]
3.2.2.4 Matching The fourth step of a face recognition system is to compare the template generated in step three with those in a database of known faces. In an identification application, this process yields scores indicating how closely the generated template matches each of those in the database; in a verification application, the generated template is compared only with one template in the database, that of the claimed identity. Finally, the system must determine whether the produced score is high enough to declare a match. The rules governing the declaration of a match are of two types: a manual one, where the end user decides whether the result is satisfactory, and an automatic one, in which the measured distance (the matching score) is compared with a predefined threshold and a match is declared if the measured score exceeds this threshold.
3.3 Previous Work 3.3.1 Principal Component Analysis (PCA) The well-known Eigenface algorithm proposed by Turk and Pentland [3, 47] uses PCA for dimensionality reduction in order to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of the face images (face space). All faces in the training set are
projected onto the face space to find a set of weights that describes the contribution of each vector in the face space. To identify a test image, its projection onto the face space is computed to obtain the corresponding set of weights; by comparing these weights with the sets of weights of the faces in the training set, the face in the test image can be identified. The key procedure in PCA is based on the Karhunen–Loeve (KL) transformation. If the image elements are considered to be random variables, then the image may be seen as a sample of a stochastic process. The PCA basis vectors are defined as the eigenvectors of the covariance matrix C:

C = E[XX^T] \quad (3.1)

Since the eigenvectors associated with the largest eigenvalues have face-like images, they are also referred to as Eigenfaces. Specifically, suppose the eigenvectors of C are u_1, u_2, ..., u_n, associated respectively with the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n. Then

X = \sum_{i=1}^{n} \hat{x}_i u_i \quad (3.2)

Dimensionality reduction can be achieved by letting

X \approx \sum_{i=1}^{m} \hat{x}_i u_i \quad (3.3)

where \hat{X} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m] and m is usually selected such that λ_i is small for i > m. Since the Eigenfaces method directly applies PCA, it does not destroy any information of the image by exclusively processing only certain points, generally providing more accurate recognition results. However, the technique is sensitive to variations in position and scale. Some serious issues relate to the effect of background, head size and orientation: a change of head size in an input image can be problematic, because the correlation between neighbouring pixels is lost under head size changes. Note that variation of light can also be a problem if the light source is positioned in certain specific directions.
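A compact sketch of the Eigenface computation of Eqs. (3.1)–(3.3) is given below. The data are synthetic stand-ins for vectorised face images, and the SVD route is used as the usual numerically convenient equivalent of diagonalising C.

import numpy as np

def eigenfaces(faces, m):
    # faces: (num_images, num_pixels) matrix of vectorised face images.
    # Returns the mean face and the top-m eigenvectors of the covariance C.
    mean = faces.mean(axis=0)
    X = faces - mean                 # centred data
    # Rows of Vt are the eigenvectors of C (up to scale), sorted by
    # decreasing singular value, i.e. decreasing eigenvalue lambda_i.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return mean, Vt[:m]              # the m Eigenfaces

def project(face, mean, U):
    # Weights x_hat_i of Eqs. (3.2)-(3.3): coordinates in the face space.
    return U @ (face - mean)

faces = np.random.rand(40, 64 * 64)  # 40 synthetic 64 x 64 "faces"
mean, U = eigenfaces(faces, m=10)
w = project(faces[0], mean, U)       # 10 weights describing the first face
print(w.shape)                       # (10,)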
3.3.2 Independent Component Analysis (ICA) ICA is a widely used algorithm in statistical signal processing. It is defined as follows: given an observed m-dimensional vector X = (x_1, \ldots, x_m)^T, the problem is to find a linear transform A that maps the observation X into an n-dimensional vector S = (s_1, \ldots, s_n)^T whose components s_i are as independent as possible:
X = AS \quad (3.4)
where A is an m × n matrix of full rank, called the mixing matrix. In feature extraction, the columns of A represent features, and s_i is the coefficient of the ith feature in the observed data vector X. There are several methods to compute the ICA; here FastICA [10] is used because of its fast convergence during the estimation of the parameters. The FastICA method computes the independent components by maximising the non-Gaussianity of the whitened data distribution using a kurtosis maximisation process; the kurtosis measures the non-Gaussianity and the sparseness of the face representations [48]. The idea is to estimate the independent source signals U by computing a separating matrix W, where U = WX = WAS. First, the observed samples are centred and whitened, meaning that the data is transformed to have zero mean and unit standard deviation; let us denote the centred and whitened samples by Z. Then, one searches for the matrix W such that the linear projection of the whitened samples by W has maximum non-Gaussianity. The kurtosis of U_i = W_i^T Z is computed as

K(U_i) = E[U_i^4] - 3\,(E[U_i^2])^2 \quad (3.5)

and the separating vector W_i is obtained by maximising the kurtosis.
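For illustration, scikit-learn provides a FastICA implementation that can play the role described above; the data here are synthetic, and the chosen n_components and whitening option are assumptions for the example (recent scikit-learn versions accept whiten="unit-variance").

import numpy as np
from sklearn.decomposition import FastICA

X = np.random.rand(40, 64 * 64)        # 40 vectorised face images

# Estimate 10 maximally independent components; centring and whitening are
# performed internally before the non-Gaussianity-driven search for the
# separating matrix W.
ica = FastICA(n_components=10, whiten="unit-variance", random_state=0)
U = ica.fit_transform(X)               # independent component coefficients
print(U.shape, ica.components_.shape)  # (40, 10) (10, 4096)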
3.3.3 Linear Discriminant Analysis (LDA) PCA constructs the face space without using face class (category) information, since training considers the whole face data set. In an LDA approach, by contrast, the goal is to find an efficient way to represent the face vector space [5, 2] by exploiting the class information, which can be helpful for the identification task. The Fisherface algorithm [5] is derived from the Fisher Linear Discriminant (FLD), which uses class-specific information. By defining different classes with different statistics, the images in the learning set are divided into the corresponding classes; then, techniques similar to those used in the Eigenface algorithm are applied. The Fisherface algorithm achieves a higher accuracy rate in recognising faces than the Eigenface algorithm. LDA finds a transform W_{LDA} such that the ratio of the between-class scatter to the within-class scatter is maximised:

W_{LDA} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|} \quad (3.6)

where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix, defined as

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T \quad (3.7)

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T \quad (3.8)

Here N_i is the number of training samples in class i, c is the number of distinct classes, \mu_i is the mean vector of the samples belonging to class i, \mu is the overall mean and X_i represents the set of samples belonging to class i.
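A direct sketch of Eqs. (3.7) and (3.8), with the criterion of Eq. (3.6) solved as a generalised eigenproblem via SciPy, is shown below; the small ridge term added to S_W is an assumption to keep the example numerically stable.

import numpy as np
from scipy.linalg import eigh

def lda(X, y, d):
    # X: (n_samples, n_features); y: class labels. Returns d discriminant
    # vectors maximising |W^T S_B W| / |W^T S_W W| (Eq. 3.6).
    mu = X.mean(axis=0)
    n_feat = X.shape[1]
    Sb = np.zeros((n_feat, n_feat))
    Sw = np.zeros((n_feat, n_feat))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # Eq. (3.7)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)               # Eq. (3.8)
    Sw += 1e-6 * np.eye(n_feat)   # assumed ridge term for invertibility
    # Generalised eigenproblem S_B w = lambda S_W w; keep the top-d vectors.
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, np.argsort(vals)[::-1][:d]]

X = np.random.rand(60, 20)
y = np.repeat([0, 1, 2], 20)
print(lda(X, y, d=2).shape)       # (20, 2)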
3.3.4 Subclass Discriminant Analysis (SDA) The problem with traditional discriminant analysis methods is that they assume that the sample vectors of each class are generated from underlying multivariate normal distributions with a common covariance matrix but different mean values. Many authors have addressed this problem by introducing extensions of LDA, for example non-parametric DA [49] and penalised DA [50]. However, these algorithms assume that each class is represented by a single cluster and, therefore, none of them solves the problem posed by non-linearly separable classes. To solve this problem, one can use non-linear methods such as flexible DA [51] and generalised DA [52], but these have two major drawbacks: first, they require a very large number of samples to obtain satisfactory results, and second, they incur a high computational cost in the training and testing phases [6]. One method that addresses the above problems is SDA. Its principal idea is to devise a solution which describes a large number of data distributions, regardless of whether these correspond to compact sets or not [6]. One way to achieve this is to approximate the underlying distribution of each class as a mixture of Gaussians, where each Gaussian represents a subclass. Figure 3.2 shows a two-class problem (a class of circles and a class of stars) where the second class is represented by a mixture of two Gaussians; clearly, no direct linear method can separate the two classes. However, if the data distribution of each class is approximated using a mixture of Gaussians, the following generalised eigenvalue decomposition can be used to calculate the discriminant vectors that best (linearly) classify the data:

\Sigma_B V = \Sigma_X V \Lambda \quad (3.9)
where \Sigma_B is the between-subclass scatter matrix, \Sigma_X is the covariance matrix of the data, V is a matrix whose columns correspond to the discriminant vectors, and \Lambda is a diagonal matrix whose elements are the corresponding eigenvalues. 3.3.4.1 Dividing Classes into Subclasses As mentioned in the previous section, the essence of SDA is to divide each class into different subclasses. The first questions are how many subclasses each class should have, and which clustering approach is best suited to dividing the samples into a set of subclasses (clusters).
Fig. 3.2 A two-class problem when one of the classes is a mixture of two Gaussians
Although many clustering methods exist, it is accepted that the Nearest Neighbour (NN) method yields superior or equivalent results when compared with other parametric methods, such as K-means and Gaussian mixtures, or with non-parametric clustering methods, such as the valley-seeking algorithm of Koontz and Fukunaga [49]. In addition, NN-clustering is efficient because it can be used whether the number of samples in each class is large or small, and it does not require large computational resources [6].
3.3.4.2 NN-Clustering In an NN-clustering approach, the first step consists of sorting the feature vectors (i.e. face images in our case) so that a set \{x_{i1}, x_{i2}, \ldots, x_{in_i}\} is constructed as follows: x_{i1} and x_{in_i} are the two most distant feature vectors, \arg\max_{j,k} \|x_{ij} - x_{ik}\|_2, where \|x\|_2 is the 2-norm of x; x_{i2} is the feature vector closest to x_{i1} and x_{i(n_i-1)} the one closest to x_{in_i}. In general, x_{ij} is the (j-1)th closest feature vector to x_{i1}. Once this is done, the sorted set \{x_{i1}, x_{i2}, \ldots, x_{in_i}\} is divided into M subclasses H_i, i = 1, \ldots, M. For example, the data can be divided into two equally balanced clusters H_1 and H_2 (in the sense of having the same number of samples) by simply partitioning the sorted set into two parts: \{x_{i1}, \ldots, x_{i,n_i/2}\} and \{x_{i,(n_i/2)+1}, \ldots, x_{in_i}\}. More generally, one can divide each class into h equally balanced subclasses, i.e. M = h for every class. This is suitable for the case where the underlying distribution
of each class is not Gaussian, but can be represented as a combination of two or more Gaussians. Another case is when the classes are not separable, but the subclasses are.
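A sketch of this sorting-and-splitting procedure is given below; it follows the description above under the assumption that the Euclidean distance is used throughout.

import numpy as np

def nn_cluster(X, h):
    # Sort the class samples X (n_i, d) starting from one end of the most
    # distant pair, by increasing distance (NN ordering), then split the
    # sorted set into h (near-)equally balanced subclasses.
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    start = np.unravel_index(np.argmax(D), D.shape)[0]    # x_i1
    order = np.argsort(D[start])     # x_i1, then its closest, next closest...
    return np.array_split(X[order], h)

X = np.random.rand(20, 5)            # samples of one class
subclasses = nn_cluster(X, h=2)
print([len(s) for s in subclasses])  # [10, 10]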
3.4 Face Recognition Using Filter Banks 3.4.1 Gabor Filter Bank The processing of facial images by Gabor filters has been widely used for its biological relevance and technical properties. Gabor filter kernels have shapes similar to the receptive fields of simple cells in the primary visual cortex [41]. They are multi-scale and multi-orientation kernels, and Gabor-transformed face images yield features that exhibit scale, locality and differentiation properties. These properties are quite robust to variability in face image formation, such as variations in illumination, head rotation and facial expression. 3.4.1.1 Gabor Functions and Wavelets The two-dimensional Gabor wavelet function g(x, y) and its Fourier transform G(u, v) can be defined as follows [53]:

g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi jWx\right] \quad (3.10)

G(u, v) = \exp\left[-\frac{1}{2}\left(\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right)\right] \quad (3.11)

where σ_u = 1/(2πσ_x) and σ_v = 1/(2πσ_y). Gabor functions form a complete but non-orthogonal basis set, and expanding a signal in this basis provides a localised frequency description. A class of self-similar functions, referred to as Gabor wavelets in the following discussion, is now considered. Let g(x, y) be the mother Gabor wavelet; then this self-similar filter dictionary can be obtained by appropriate dilations and rotations of g(x, y) through the generating function

g_{mn}(x, y) = a^{-m} g(x', y') \quad (3.12)

x' = a^{-m}(x\cos\theta + y\sin\theta) \quad (3.13)

y' = a^{-m}(-x\sin\theta + y\cos\theta) \quad (3.14)

where a > 1, m and n are integers, θ = nπ/K is the orientation (K is the number of orientations) and a^{-m} is the scale factor.
3.4.1.2 Gabor Filter Dictionary Design The non-orthogonality of the Gabor wavelets implies that there is redundant information in the filtered images, and the following strategy is used to reduce this redundancy. Let U_l and U_h denote the lower and upper centre frequencies of interest, K the number of orientations and S the number of scales in the multi-resolution decomposition. As proposed by Manjunath and Ma [53], the design strategy is to ensure that the half-peak magnitude supports of the filter responses in the frequency spectrum touch each other. This results in the following formulas for computing the filter parameters σ_u and σ_v (and thus σ_x and σ_y):

a = \left(\frac{U_h}{U_l}\right)^{\frac{1}{S-1}} \quad (3.15)

\sigma_u = \frac{(a - 1)U_h}{(a + 1)\sqrt{2\ln 2}} \quad (3.16)

\sigma_v = \tan\left(\frac{\pi}{2K}\right)\left[U_h - 2\ln\left(\frac{2\sigma_u^2}{U_h}\right)\right]\left[2\ln 2 - \frac{(2\ln 2)^2\sigma_u^2}{U_h^2}\right]^{-\frac{1}{2}} \quad (3.17)
where W = U_h and m = 0, 1, \ldots, S-1. To eliminate the sensitivity of the filter response to absolute intensity values, the real (even) components of the 2D Gabor filters are biased by adding a constant to make them zero-mean. 3.4.1.3 Augmented Gabor-Face Vector Given any image I(x, y), its Gabor wavelet transformation is

W_{mn} = \iint I(x_1, y_1)\, g_{mn}^{*}(x - x_1, y - y_1)\, dx_1\, dy_1 \quad (3.18)

where g_{mn}^{*} indicates the complex conjugate of g_{mn}. The Gabor wavelet transformation of the facial image is calculated at S scales, m \in \{0, 1, \ldots, S-1\}, and K different orientations, n \in \{0, 1, \ldots, K-1\}, with U_l = 0.05 and U_h = 0.4. W_{mn} denotes the Gabor wavelet transformation of a face image at scale m and orientation n. Figure 3.3 shows a sample face image from the database and its forty filtered images (five scales, S = 5, and eight orientations, K = 8). The augmented Gabor-face vector can then be defined as follows [54]:

\chi = (W_{0,0}^{t}, \ldots, W_{S-1,K-1}^{t})^{t} \quad (3.19)
where t is the transpose operator. The augmented Gabor-face vector encompasses all facial Gabor wavelet transformations and carries important discriminatory information that can be used in the classification step.
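A sketch of such a filter bank using OpenCV's built-in Gabor kernels follows; the kernel size and the per-scale wavelengths are illustrative assumptions, and cv2.getGaborKernel parameterises the filter by wavelength rather than by the centre frequencies (U_l, U_h) used in the text.

import cv2
import numpy as np

def gabor_face_vector(img, scales=5, orientations=8):
    # Filter an image with S x K Gabor kernels and stack the responses,
    # in the spirit of the augmented Gabor-face vector of Eq. (3.19).
    responses = []
    for m in range(scales):
        lambd = 4.0 * (2 ** m)        # assumed wavelength per scale
        for n in range(orientations):
            theta = n * np.pi / orientations
            kern = cv2.getGaborKernel(ksize=(31, 31), sigma=0.56 * lambd,
                                      theta=theta, lambd=lambd,
                                      gamma=1.0, psi=0.0)
            kern -= kern.mean()       # zero-mean, as discussed above
            responses.append(cv2.filter2D(img, cv2.CV_32F, kern).ravel())
    return np.concatenate(responses)  # one long augmented vector

img = np.random.rand(64, 64).astype(np.float32)  # stand-in face image
print(gabor_face_vector(img).shape)              # (40 * 4096,) = (163840,)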
Fig. 3.3 Gabor filters (a) A face image from the database, (b) The filtered images: five scales and eight orientations
3.4.2 Directional Filter Bank: A Review A digital filter bank is a collection of digital filters with a common input or output. The DFB is composed of an analysis bank and a synthesis bank. The analysis bank of the DFB splits the original image into 2^n directionally passed subband images (n is the order of the DFB), while the synthesis bank combines the subband images back into one image. A DFB structure can be drawn as a tree with a two-band split at the end of each stage (Fig. 3.4), where each split doubles the angular resolution. In the analysis section of the DFB, the original image is split into two directional subband images, then each subband image is split into two more directional images, and so on until order n, where 2^n directional subband images are obtained.
Fig. 3.4 DFB structure
[Fig. 3.5 The frequency partition map for an eight-band DFB. (a) Input (b) Eight subband outputs]
At this point, the output is used as the input for the next stage. Each of the subbands in the analysis part extracts frequency components according to the associated frequency partition map shown in Fig. 3.5. In the synthesis bank, the dual operation is performed, i.e. the directional subband images are combined into a reconstructed image in the reverse order of the analysis stage to enable a perfect reconstruction of the signal. It is important to mention, however, that in this work we are only interested in the analysis section, since our goal is to extract discriminant features from each directional image. The components of the analysis part are the downsampler D and the analysis filters H_0 and H_1. 3.4.2.1 Analysis Filters One of the attractive features of the DFB is that it can be implemented with only one filter prototype. By using carefully designed unimodular matrices, the filter design process can be reduced to a single prototype H_0(ω). Therefore, if the unimodular matrices which change the frequency components from R_{0i}(ω) to H_0(ω), for i = 1, 2, 3 and 4, respectively, are determined (Fig. 3.6), then the systems in Fig. 3.7(a, b) are identical and only one filter prototype H_0(ω) is required. Consequently, H_0(ω) can replace the four remaining filters R_{0i}(ω) via the unimodular matrices.
Fig. 3.6 Five passbands for DFB
Fig. 3.7 Two identical structures in a DFB. (a) using R0i (ω) alone and (b) using a unimodular matrix with H0 (ω)
3.4.2.2 Quincunx Downsampling Quincunx downsampling uses quincunx 2 × 2 resampling matrices whose entries are ±1 and whose determinant equals 2 [55]. There are eight quincunx resampling matrices, and the most commonly used is

Q_1 = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \quad (3.20)
Simply speaking, quincunx downsampling corresponds to a rotated downsampling: half of the samples are retained, on a grid rotated by 45°. Figure 3.8 shows the original Lena image and its quincunx downsampled version by Q_1.
Fig. 3.8 The Lena image and its quincunx downsampled image by Q 1
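For intuition, a minimal sketch of quincunx downsampling on an array follows; it keeps the checkerboard sublattice generated by Q_1 and packs each row's surviving samples, which is one convenient way to realise the rotated grid.

import numpy as np

def quincunx_downsample(img):
    # Keep the samples with (i + j) even, i.e. the quincunx lattice
    # generated by Q1 = [[1, 1], [-1, 1]], and pack each row's survivors
    # into a half-width array: half the samples, on a 45-degree grid.
    h, w = img.shape
    out = np.empty((h, w // 2), dtype=img.dtype)
    for i in range(h):
        out[i] = img[i, (i % 2)::2][: w // 2]
    return out

img = np.arange(64, dtype=np.float32).reshape(8, 8)
print(quincunx_downsample(img).shape)  # (8, 4): half of the 64 samples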
Fig. 3.9 A two-band DFB structure
3.4.2.3 Overview of the 2^n-Band DFB The four-band DFB: A four-band DFB is composed of two-band DFBs (Fig. 3.9) arranged in a tree-like structure. After the modulator, the constituent frequency components are shifted, resulting in a diamond-like shape. Then, via the diamond filters H_0(ω) and H_1(ω), each of the four frequency regions is filtered and downsampled by a quincunx downsampler. By cascading another set of two-band DFBs at the ends of the first two-band DFB, a four-band directional decomposition is obtained. The 2^n-band DFB: Two-band and four-band DFBs lead to 2^n-band extensions. To expand to eight bands, one can apply a third stage in a cascade fashion. With an input whose directional frequencies are labelled as shown in Fig. 3.5(a), an eight-band DFB generates the eight subband outputs shown in Fig. 3.5(b). It is worth noting that each of the subband images is smaller than the original input, which is necessary to ensure maximal DFB decimation.
3.4.2.4 Directional Images Directional images are obtained by applying all the directional filters described above. Practical experiments (Fig. 3.12) show that the best results are achieved with a two-level DFB design, so four directional images are obtained at the end of the DFB pre-processing. These directional images can be regarded as a decomposition of the original image in four directions, and they contain features associated with global directions rather than local directions.
Fig. 3.10 Some samples from Yale Face Database
By creating directional images, the noise in the original image is divided into four different directions, thus reducing its energy by a factor of four [9].
3.5 Proposed Method and Results Analysis 3.5.1 Proposed Method Experimental tests have been performed using two different databases: the FERET database [56] and the Yale Face Database [57]. The FERET database was collected in 15 sessions between August 1993 and July 1996 at George Mason University and the US Army Research Laboratory facilities, as part of the FERET programme sponsored by the US Department of Defense Counterdrug Technology Development Program. The database contains 1,564 sets of images for a total of 14,126 images, covering 1,199 individuals and including 365 duplicate sets of images; a duplicate set is a second set of images of a person already in the database, usually taken on a different day. Images were recorded with a 35 mm camera, subsequently digitised and then converted to 8-bit gray-scale images; the resulting images are stored at 256 × 384 pixels. The Yale database is a collection of 165 images of 15 different individuals, where the images belonging to one person (i.e. the same class) present variations in expression and illumination conditions. In this database, 11 images of each individual are available (with different expressions: happy, sad, sleepy, etc., and different lighting sources: centre, left and right); three are randomly chosen as reference faces, while the eight remaining images are used as input data (test images). Figure 3.10 shows some face image samples from the Yale Face Database. The main contribution of this work is to improve the recognition rates of existing face recognition algorithms, such as PCA, LDA, ICA and SDA, by applying a DFB pre-processing, thus demonstrating its suitability for capturing discriminant information. First, directional images were generated by applying the DFB to each face image in the database; Figure 3.11 illustrates an example of an original face image from the database and its directional images generated by the DFB. To evaluate the effect of the level of DFB decomposition on the ability to capture discriminating information, and hence on the recognition rate, extensive experiments were carried out using both databases with varying levels of decomposition. The experiments show that the best results are obtained when the level of the DFB equals either two or four (Fig. 3.12). However, since the execution time grows rapidly with the order of the filter bank, it makes sense to choose a two-level DFB decomposition; thus four directional images are obtained for each face image in our analysis. To assess the efficiency of the proposed method, extensive experimentation has been carried out using state-of-the-art face recognition algorithms, namely PCA, LDA, ICA and SDA. Experiments were conducted on data with and without the DFB pre-processing step: first, the four methods are applied in isolation, and then in combination with the DFB.
Fig. 3.11 Directional images generated by DFB. (a) Directional Image 1, (b) Directional Image 2, (c) Directional Image 3, (d) Directional Image 4
[Fig. 3.12 Recognition rates for different orders of the DFB (recognition rate versus DFB decomposition level, N = 2 to N = 7)]
3.5.2 PCA In this experiment, the original face database is used to extract features using the traditional Eigenfaces algorithm, and the recognition rate is calculated over all the remaining faces in the database. The same system is then applied to a new database obtained after DFB pre-processing. An NN algorithm using the Euclidean distance is used to compute the distances between the different feature vectors.
Table 3.3 Experiment results for the DFB–PCA method and comparison with the PCA algorithm

Faces                     PCA (%)   DFB–PCA (%)   Improvement (%)
Normal                    53.33     86.67         +62.50
No glasses                60        86.67         +44.45
Wink                      53.33     86.67         +62.50
Glasses                   60        73.33         +22.22
Sleepy                    60        86.67         +33.33
Surprised                 60        80            +33.33
Sad                       53.33     86.67         +62.50
Left-light                13.33     33.33         +150.04
Global recognition rate   51.67     77.50         +49.99
Table 3.3 shows the results of this experiment over all the different expressions and lighting conditions of the face images in the database. Note that the improvement reported in Table 3.3 is a relative improvement, calculated as

Improvement = \frac{\mathrm{Rate}(\mathrm{DFB–}M) - \mathrm{Rate}(M)}{\mathrm{Rate}(M)} \quad (3.21)

where M denotes the base method (here PCA; the same formula is applied to the other algorithms below).
It can be seen from Table 3.3 that low recognition accuracies are obtained for both methods (i.e. PCA alone and PCA with DFB pre-processing). It is interesting to note that the worst results are obtained for faces with changes in lighting conditions (only 13% for PCA); with the directional filters, however, the recognition rate improves by more than 150%. A general increase in recognition accuracy of around 50% over all the faces is enough to conclude that the DFB implementation significantly outperforms its Eigenfaces counterpart. Figure 3.13 illustrates the results of an experiment conducted to show how the database size affects the recognition accuracy: 15 face images are randomly chosen from the Yale database as test images, while the number of reference images per person is increased by one each time. A comparison with the GFB [7] approach demonstrates that the proposed method clearly outperforms the other pre-processing algorithm even as the database size grows.
3.5.3 ICA

This experiment follows the same procedure as for PCA but uses the FastICA algorithm instead of Eigenfaces. The results obtained are reported in Table 3.4, and the effect of the database size, together with a comparison with the GFB approach, is shown in Fig. 3.14.
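A minimal sketch of this experiment using scikit-learn's FastICA as a stand-in for the authors' implementation, with toy data; matching again uses the nearest neighbour under the Euclidean distance:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
faces = rng.random((30, 1024))          # stand-in for (DFB pre-processed) face vectors

ica = FastICA(n_components=20, random_state=0, max_iter=500)
features = ica.fit_transform(faces)     # one feature vector per face

# matching: nearest neighbour under the Euclidean distance
probe = ica.transform(faces[3:4])
nearest = np.argmin(np.linalg.norm(features - probe, axis=1))
```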
Fig. 3.13 Recognition rate of PCA-based algorithms as a function of database size (PCA, GFB–PCA, DFB–PCA)
The ICA technique, with both approaches, significantly outperforms the PCA. In addition, the DFB further improves the ICA, especially in situations where large facial changes occur (light source, glasses, etc.). An overall recognition rate of 80.83% is obtained for the combined ICA–DFB method, with an overall improvement of 12.78%. This result clearly demonstrates the discriminating strength of the DFB pre-processing step.
Table 3.4 Experiment results for the DFB–ICA method and comparison with the ICA algorithm

Faces                     ICA(%)   DFB–ICA(%)   Improvement(%)
Normal                    80       93.33        +16.67
No glasses                73.33    80           +09.10
Wink                      86.67    86.67        0
Glasses                   66.67    73.33        +10
Sleepy                    93.33    93.33        0
Surprised                 66.67    86.67        +30
Sad                       93.33    100          +7.15
Left-light                13.33    33.33        +150.04
Global recognition rate   71.67    80.83        +12.78
Fig. 3.14 Recognition rate of ICA-based algorithms as a function of database size (ICA, GFB–ICA, DFB–ICA)
3.5.4 LDA

It is well known that the main problem with principal component methods (PCA and ICA) is that they carry no information about the class of each vector in the training database, so each face image is treated separately. This disadvantage is resolved by the LDA method, since all the face images of one person are considered as one class. The same procedure is used as in the previous cases and the results obtained are shown in Table 3.5. A comparison with the Gabor approach is illustrated in Fig. 3.15. The results clearly show that the LDA technique, with and without DFB pre-processing, significantly outperforms the PCA. In addition, the DFB further improves the LDA, especially when significant changes in the image occur. An overall recognition rate of 91.67% is obtained for the combined LDA–DFB method, with an overall improvement of 4.77%, which again demonstrates the discriminating strength of the DFB pre-processing step.
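A minimal sketch of the LDA experiment; unlike PCA and ICA, the per-person class labels drive the projection. scikit-learn's implementation is used as a stand-in, and the data are toy vectors:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
faces = rng.random((45, 1024))              # 15 people x 3 reference images
labels = np.repeat(np.arange(15), 3)        # class = person, unlike PCA/ICA

lda = LinearDiscriminantAnalysis(n_components=14)   # at most C - 1 components
feats = lda.fit_transform(faces, labels)
probe = lda.transform(faces[0:1])
print(labels[np.argmin(np.linalg.norm(feats - probe, axis=1))])
```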
Table 3.5 Experiment results for the DFB–LDA method and comparison with the LDA algorithm

Faces                     LDA(%)   DFB–LDA(%)   Improvement(%)
Normal                    93.33    93.33        0
No glasses                80       93.33        +16.67
Wink                      93.33    100          +8.22
Glasses                   80       86.67        +8.34
Sleepy                    93.33    93.33        0
Surprised                 80       93.33        +16.67
Sad                       100      93.33        −6.67
Left-light                80       80           0
Global recognition rate   87.50    91.67        +4.77

3.5.5 SDA

The principal idea of SDA is to divide each class (of the original LDA algorithm) into multiple subclasses. This property is very well suited to our method since, from
each face image in the database, 2^n directional images are generated (with n being the order of the DFB). The natural application of the SDA is then to place all the directional faces of a person into the same subclass. Figure 3.16 shows the proposed scheme for this method. To assess its performance, the same steps used in the previous approaches are followed: the original face database is used to extract the features using the SDA algorithm as proposed in [6], and the recognition rate is calculated for all remaining faces in the database. The combined DFB–SDA method illustrated in Fig. 3.16 is then used to compute the new recognition rates.
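As the exact grouping is specified by Fig. 3.16, which is not reproduced here, the sketch below builds class/subclass labels under a literal reading of the sentence above: within each person's class, the original reference faces form one subclass and all of their directional faces form another. Treat this grouping as an assumption.

```python
import numpy as np

def dfb_sda_labels(n_people, n_refs, n=2):
    """Class/subclass labels for the DFB-SDA scheme under a literal reading:
    per person (class), subclass 0 holds the original reference faces and
    subclass 1 holds their 2^n directional faces. The exact assignment is
    given by Fig. 3.16; this grouping is our assumption."""
    n_dirs = 2 ** n
    classes, subclasses = [], []
    for person in range(n_people):
        classes += [person] * n_refs               # original reference faces
        subclasses += [0] * n_refs
        classes += [person] * (n_refs * n_dirs)    # their directional faces
        subclasses += [1] * (n_refs * n_dirs)
    return np.array(classes), np.array(subclasses)

cls, sub = dfb_sda_labels(n_people=15, n_refs=3)   # Yale: 15 people, 3 references
```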
Fig. 3.15 Recognition rate of LDA-based algorithms as a function of database size (LDA, GFB–LDA, DFB–LDA)
Fig. 3.16 Proposed scheme for the DFB–SDA method

Table 3.6 Experiment results for the DFB–SDA method and comparison with the SDA algorithm

Faces                     SDA(%)   DFB–SDA(%)   Improvement(%)
Normal                    93.33    93.33        0
No glasses                86.67    86.67        0
Wink                      93.33    100          +8.22
Glasses                   93.33    100          +8.22
Sleepy                    100      100          0
Surprised                 93.33    93.33        0
Sad                       100      93.33        −6.67
Left-light                73.33    93.33        +27.27
Global recognition rate   91.67    95.83        +4.54
The results obtained for both the SDA and DFB–SDA methods, and the improvement observed for the different poses in the database, are given in Table 3.6. They demonstrate that the combined DFB–SDA approach improves the recognition rate of the SDA algorithm alone by 4.54%. In addition, with an overall recognition rate of 95.83%, it can be concluded that the idea of dividing the classes into subclasses combines well with the DFB-based pre-processing. Figure 3.17 shows the effect of database growth on the global recognition rate, together with a comparison with the Gabor approach.
3.5.6 FERET Database Results

To demonstrate the efficiency of the proposed method, similar experiments have been carried out using the FERET face database (Fig. 3.18). First, reference and test databases are constructed from the original FERET database.
Fig. 3.17 Recognition rate of SDA-based algorithms as a function of database size (SDA, GFB–SDA, DFB–SDA)
Then the PCA, LDA, ICA and SDA algorithms (alone and with DFB pre-processing) are applied to databases of size 50, 100, 200 and 300, using only one image per person as reference. The average recognition rate is then calculated over all tests. Table 3.7 presents the experimental results obtained. It can be seen that the DFB improves the results on this larger database with varying conditions such as head rotation and face size. Overall, the improvements for the different algorithms all exceed 13%, which is very satisfactory.
Fig. 3.18 Some image samples from FERET Database
Table 3.7 Experiment results for the different methods with the FERET database

Method        PCA(%)   ICA(%)   LDA(%)   SDA(%)
Without DFB   72.33    61.77    71.67    74.22
With DFB      84.89    74.00    81.11    84.90
Improvements  +17.36   +19.80   +13.17   +14.39
3.6 Conclusion

This chapter has proposed a new method to enhance existing face recognition methods such as PCA, ICA, LDA and SDA by using a DFB pre-processing step. The results have shown that this pre-processing yields robustness against changes in expression and illumination conditions. The step can also be very helpful when the number of face images in the database is insufficient, since the number of images increases by a factor of 2^n (n being the order of the DFB), thus providing more discriminant power for the classification phase. The method has been shown to be at least as good as the other approaches considered, including those with GFB pre-processing. The effect of DFB pre-processing is significant for both the Yale and FERET databases, as demonstrated by overall recognition rate improvements of (Yale = 49.99%, FERET = 17.36%) for PCA, (Yale = 12.78%, FERET = 19.80%) for ICA, (Yale = 4.77%, FERET = 13.17%) for LDA and (Yale = 4.54%, FERET = 14.39%) for SDA. A recognition rate of 95.83% has been obtained for the SDA algorithm combined with DFB pre-processing.
References

1. P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min and W. Worek, "Overview of the face recognition grand challenge," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 947–954, June 2005.
2. A. Rosenfeld, W. Zhao, R. Chellappa and P. J. Phillips, "Face recognition: A literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
3. M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
4. M. S. Bartlett, J. R. Movellan and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1450–1464, November 2002.
5. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, July 1997.
6. M. Zhu and A. M. Martinez, "Subclass discriminant analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1274–1286, August 2006.
7. W. R. Boukabou, L. Ghouti and A. Bouridane, "Face recognition using a Gabor filter bank approach," First NASA/ESA Conference on Adaptive Hardware and Systems, pp. 465–468, June 2006.
8. C. H. Park, J. J. Lee, M. Smith, S. Park and K. H. Park, "Directional filter bank based fingerprint feature extraction and matching," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 74–85, January 2004.
9. M. A. U. Khan, M. K. Khan, M. A. Khan, M. T. Ibrahim, M. K. Ahmed and J. A. Baig, "Improved PCA based face recognition using directional filter bank," IEEE INMIC, pp. 118–124, December 2004.
10. X. Yi qiong, L. Bi cheng and W. Bo, "Face recognition by fast independent component analysis and genetic algorithm," IEEE Conference on Computer and Information Technology, vol. 28, no. 8, pp. 194–198, August 2004.
11. P. J. Phillips, G. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi and J. M. Bone, FRVT 2002: Overview and summary. http://www.frvt.org/FRVT2002/documents.htm, March 2003.
12. D. M. Blackburn, Biometrics 101, version 3.1, volume 12. Federal Bureau of Investigation, March 2004.
13. M. H. Yang, D. J. Kriegman and N. Ahuja, "Detecting faces in images: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 34–58, January 2002.
14. J. Fagertun, Face Recognition. PhD thesis, Technical University of Denmark, 2006.
15. G. Yang and T. S. Huang, "Human face detection in complex background," Pattern Recognition, vol. 27, no. 1, pp. 53–63, 1994.
16. K. C. Yow and R. Cipolla, "Feature-based human face detection," Image and Vision Computing, vol. 15, no. 9, pp. 713–735, 1997.
17. Y. Dai and Y. Nakano, "Face-texture model based on SGLD and its application in face detection in a color scene," Pattern Recognition, vol. 29, no. 6, pp. 1007–1017, 1996.
18. S. McKenna, S. Gong and Y. Raja, "Modelling facial colour and identity with Gaussian mixtures," Pattern Recognition, vol. 31, no. 12, pp. 1883–1892, 1998.
19. R. Kjeldsen and J. Kender, "Finding skin in color images," Automatic Face and Gesture Recognition, pp. 312–317, 1996.
20. T. Craw, D. Tock and A. Bennett, "Finding face features," Proceedings of the Second European Conference on Computer Vision, pp. 92–96, 1992.
21. A. Lanitis, C. J. Taylor and T. F. Cootes, "An automatic face identification system using flexible appearance models," Image and Vision Computing, vol. 13, no. 5, pp. 393–401, 1995.
22. K.-K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39–51, January 1998.
23. H. Rowley, S. Baluja and T. Kanade, "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23–38, January 1998.
24. E. Osuna, R. Freund and F. Girosi, "Training support vector machines: An application to face detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 130–136, 1997.
25. H. Schneiderman and T. Kanade, "Probabilistic modeling of local appearance and spatial relationships for object recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 45–51, 1998.
26. A. Rajagopalan, K. Kumar, J. Karlekar, R. Manivasakan, M. Patil, U. Desai, P. Poonacha and S. Chaudhuri, "Finding faces in photographs," Proceedings of the Sixth IEEE International Conference on Computer Vision, pp. 640–645, 1998.
27. A. J. Colmenarez and T. S. Huang, "Face detection with information-based maximum discrimination," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 782–787, 1997.
28. W. Zhao and R. Chellappa, Face Processing: Advanced Modeling and Methods, Academic Press, New York, 2006.
29. B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 696–710, July 1997.
30. P. J. Phillips, "Support vector machines applied to face recognition," Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems, pp. 803–809, 1998.
31. C. Liu and H. Wechsler, "Evolutionary pursuit and its application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 570–582, June 2000.
32. S. Z. Li and J. Lu, "Face recognition using the nearest feature line method," IEEE Transactions on Neural Networks, vol. 10, no. 2, pp. 439–443, March 1999.
33. M. S. Bartlett, H. M. Lades and T. J. Sejnowski, "Independent component representation for face recognition," Proceedings of the SPIE Symposium on Electronic Imaging: Science and Technology, pp. 528–539, 1998.
34. M.-H. Yang, "Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods," Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), IEEE Computer Society, Washington, DC, USA, p. 215, 2002.
35. M. D. Kelly, "Visual identification of people by computer," Tech. rep. AI-130, Stanford AI Project, Stanford, CA, 1970.
36. T. Kanade, "Picture processing system by computer complex and recognition of human faces," doctoral dissertation, Kyoto University, November 1973.
37. T. Kanade, "Computer recognition of human faces," Interdisciplinary Systems Research, vol. 47, 1977.
38. I. J. Cox, J. Ghosn and P. N. Yianilos, "Feature-based face recognition using mixture-distance," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 209–216, 1996.
39. B. S. Manjunath, R. Chellappa and C. von der Malsburg, "A feature based approach to face recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 373–378, 1992.
40. K. Okada, J. Steffans, T. Maurer, H. Hong, E. Elagin, H. Neven and C. von der Malsburg, "The Bochum/USC face recognition system and how it fared in the FERET phase III test," in H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulié and T. S. Huang (eds.), Face Recognition: From Theory to Applications, Springer-Verlag, pp. 186–205, 1998.
41. L. Wiskott, J.-M. Fellous and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775–779, July 1997.
42. F. Samaria, Face Recognition Using Hidden Markov Models. PhD thesis, University of Cambridge, UK, 1994.
43. A. V. Nefian and M. H. Hayes, "Hidden Markov models for face recognition," Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 2721–2724, 1998.
44. A. Pentland, B. Moghaddam and T. Starner, "View-based and modular eigenspaces for face recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 84–91, June 1994.
45. P. Penev and J. Atick, "Local feature analysis: A general statistical theory for object representation," Network: Computation in Neural Systems, vol. 7, pp. 477–500, June 1996.
46. J. Huang and B. Heisele, "Component-based face recognition with 3D morphable models," Proceedings of the International Conference on Audio- and Video-Based Person Authentication, 2003.
47. M. Turk and A. Pentland, "Face recognition using eigenfaces," IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591, June 1991.
48. A. J. Bell and T. J. Sejnowski, "The independent components of natural scenes are edge filters," Vision Research, vol. 37, no. 23, pp. 3327–3338, 1997.
49. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition. Academic Press, New York, 1990.
50. A. Buja, T. Hastie and R. Tibshirani, "Penalized discriminant analysis," Annals of Statistics, vol. 23, pp. 73–102, 1995.
51. T. Hastie, R. Tibshirani and A. Buja, "Flexible discriminant analysis by optimal scoring," Journal of the American Statistical Association, vol. 89, pp. 1255–1270, 1994.
52. G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Computation, vol. 12, pp. 2385–2404, 2000.
53. B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837–842, August 1996.
54. G. Dai and C. Zhou, "Face recognition using support vector machines with the robust feature," IEEE International Workshop on Robot and Human Interactive Communication, Millbrae, California, USA, 2003.
55. S. Park, New Directional Filter Banks and Their Applications in Image Processing. PhD thesis, Georgia Institute of Technology, 1999.
56. P. J. Phillips, H. Moon, P. J. Rauss and S. Rizvi, "The FERET evaluation methodology for face recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, October 2000.
57. Department of Computer Science, Yale University. The Yale face database. http://cvc.yale.edu/projects/yalefaces/yalefaces.html
Chapter 4
Recent Advances in Iris Recognition: A Multiscale Approach
4.1 Introduction

Reliable and secure identification of a person is a key issue in security. In government and conventional environments, security is usually provided through badges, the provision of information for visitors and the issuing of keys. These are the most common means of identification since they are the easiest to remember and the easiest to confirm; however, they are also unreliable, putting all components of security at risk: IDs can be stolen, and passwords can be forgotten or cracked. In addition, security breaches resulting in access to restricted areas of airports or other sensitive sites are a source of concern for terrorist activities. Although there are laws against false identification, incidents of intrusion and unauthorised modification of information occur daily, with catastrophic effects. For example, credit card fraud is rapidly increasing, and traditional technologies are not sufficient to reduce the impact of counterfeiting and/or security breaches; a more secure technology is therefore needed to cope with these drawbacks and pitfalls [1]. Biometrics, the statistical measurement of biological characteristics, provides a powerful answer to this need, since the uniqueness of an individual arises from his/her physical or behavioural characteristics, with no passwords or numbers to remember. Biometric modalities include fingerprint, retinal and iris scanning, hand geometry, voice patterns, facial recognition and other techniques. Typically, a biometric recognition system records data from a user and performs a comparison each time the user attempts to claim his/her identity. Biometric systems can operate in two modes: verification and identification. In the verification mode, the user claims an identity and the system compares the extracted features with the stored template of the asserted identity to determine whether the claim is true or false. In the identification mode, no identity is claimed and the extracted feature set is compared with the templates of all the users in the database in order to recognise the individual. For such approaches to be widely applicable, they must be highly reliable [2]. Reliability relates to the ability of the approach to support a signature that is unique to an individual and that can be captured in an invariant manner over time. The use of a biometric trait requires that it be unique for each individual, that it can be readily measured, and that it is invariant over time. Biometrics such as signatures,
photographs, fingerprints, voiceprints and retinal blood vessel patterns all have significant drawbacks. Although signatures and photographs are cheap and easy to obtain and store, they are insufficient for automatic identification with assurance and can easily be forged. Electronically recorded voiceprints are susceptible to changes in a person's voice, and they can be counterfeited. Fingerprints or handprints require physical contact, and they too can be counterfeited and marred by artifacts [2]. It is currently accepted within the biometric community that biometrics has the potential for high reliability because it is based on the measurement of an intrinsic physical property of an individual. Fingerprints, for example, provide signatures that appear to be unique to an individual and reasonably invariant with age, whereas faces, while fairly unique in appearance, can vary significantly with time and place. Invasiveness, the degree to which capturing the signature places constraints on the subject, is another consideration. In this regard, acquisition of a fingerprint is invasive as it requires that the subject make physical contact with a sensor, whereas images of a subject's face or iris that are sufficient for recognition can be acquired at a comfortable distance. Considerations of reliability and invasiveness suggest that the human iris is a particularly interesting structure on which to base a biometric approach for personnel verification and identification [3]. From the point of view of reliability, the special patterns that are visually apparent in the human iris are highly distinctive to an individual, and the appearance of a subject's iris suffers little from day-to-day variations. In addition, the method is non-invasive, since the iris is an overt body that can be imaged at a comfortable distance from a subject with existing machine vision technology. Owing to these features of reliability and non-invasiveness, iris recognition is a promising approach to biometric-based verification and identification of people [2]. An authentication system based on iris recognition is reputed to be the most accurate among all biometric methods because of its acceptance, reliability and accuracy. Ophthalmologists originally proposed that the iris of the eye might be used as a kind of optical fingerprint for personal identification [4]. Their proposal was based on clinical results showing that every iris is unique and that it remains unchanged in clinical photographs. The human iris begins to form during the third month of gestation and is complete by the eighth month, though pigmentation continues into the first year after birth. It has been established that every iris is unique, since two people (even two identical twins) have uncorrelated iris patterns [5], and that it is stable throughout human life. These observations suggest that human irises might be as distinct as fingerprints, leading to the idea that iris patterns may contain unique identification features. In 1936, the ophthalmologist Frank Burch proposed the idea of using iris patterns for personal identification [6], although this was only documented by James Doggart in 1949. The idea of automated iris identification was finally patented by Aran Safir and Leonard Flom in 1987 [6]. Although they had patented the idea, the two ophthalmologists were unsure of a practical implementation of the system.
They commissioned John Daugman to develop the fundamental algorithms in 1989. These algorithms were patented by Daugman in 1994 and now
form the basis for all current commercial iris recognition systems. The Daugman algorithms are owned by Iridian Technologies and they are licensed to several other companies [6].
4.2 Related Work: A Review

Research in the area of iris recognition has been receiving considerable attention, and a number of techniques and algorithms have been proposed over the last few years. Flom and Safir first proposed the concept of automated iris recognition in 1987 [7]. Since then, a number of researchers have worked on iris representation and matching, and significant progress has been made [8, 9–11]. Daugman [4, 12] made use of multiscale Gabor filters to demodulate the texture phase structure information of the iris. In his work, filtering an iris image with a family of filters resulted in 1,024 complex-valued phasors denoting the phase structure of the iris at different scales. Each phasor was then quantised to one of the four quadrants in the complex plane. The resulting 2,048-component iris code was used to describe an iris, and the difference between a pair of iris codes was measured by their Hamming distance. Sanchez-Reillo et al. [13] provided a partial implementation of the algorithm proposed by Daugman. Wildes et al. [3] represented the iris texture with a Laplacian pyramid constructed with four different resolution levels and used the normalised correlation to determine whether the input image and the model image are from the same class. Boles and Boashash [14] calculated a zero-crossing representation of the one-dimensional (1D) wavelet transform at various resolution levels of a concentric circle on an iris image to characterise the texture of the iris; iris matching was based on two dissimilarity functions. In [15], Sanchez-Avila et al. further developed the iris representation method proposed by Boles et al. [14], attempting different similarity measures for matching, such as the Euclidean and Hamming distances. Lim et al. [16] decomposed an iris image into four levels using the 2D Haar wavelet transform and quantised the fourth-level high-frequency information to form an 87-bit code; a modified competitive learning neural network (LVQ) was adopted for classification. Tisse et al. [17] analysed the iris characteristics using an analytic image constructed from the original image and its Hilbert transform. Emergent frequency functions for feature extraction were in essence samples of the phase gradient fields of the analytic image's dominant components [18–20]. Similar to Daugman's matching scheme, they sampled the binary emergent frequency functions to form a feature vector and used the Hamming distance for matching. Kumar et al. [20] utilised correlation filters to measure the consistency of iris images from the same eye. The correlation filter of each class was designed using the two-dimensional (2D) Fourier transforms of training images. If the correlation output (the inverse Fourier transform of the product of the input image's Fourier transform and the correlation filter) exhibited a sharp peak, the input image was determined to be from an authorised subject; otherwise, from an imposter.
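To make the Daugman-style matching step concrete, here is a minimal sketch of comparing two binary iris codes by their normalised Hamming distance (toy random codes; deployed systems additionally mask noisy bits and test cyclic shifts to absorb head rotation):

```python
import numpy as np

rng = np.random.default_rng(0)
code_a = rng.integers(0, 2, 2048, dtype=np.uint8)   # 2,048-bit iris codes
code_b = rng.integers(0, 2, 2048, dtype=np.uint8)

# normalised Hamming distance: fraction of disagreeing bits
hd = np.count_nonzero(code_a ^ code_b) / code_a.size
# unrelated irises cluster near 0.5; same-eye comparisons fall well below
print(f"HD = {hd:.3f}")
```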
Bae et al. [10] projected the iris signals onto a bank of basis vectors derived by independent component analysis and quantised the resulting projection coefficients as features. In another approach, Li Ma et al. used multichannel [9] and even-symmetry [4] Gabor filters to capture local texture information of the iris, from which a fixed-length feature vector is constructed; the nearest feature line method is used for iris matching. In [21], a set of 1D intensity signals is constructed to characterise the most important information of the original 2D image using a particular class of wavelets; a position sequence of local sharp variation points in such signals is recorded as features. A fast matching scheme based on an exclusive-OR operation is used to compute the similarity between a pair of position sequences.
4.3 Iris Localisation

4.3.1 Background

The eye is essentially made up of two parts: the sclera, the "white" portion of the eye, and the cornea. The sclera consists of closely interwoven fibres, while the cornea, a small section in the front and centre of the eye, consists of fibres arranged in a regular fashion; conveniently, this makes the cornea transparent, allowing light to filter in. Behind the cornea is the anterior chamber, filled with a fluid known as the aqueous humor. A spongy tissue, the ciliary body, arranged around the edge of the cornea, constantly produces the aqueous humor. Immersed in the aqueous humor is a ring of muscles commonly referred to as the iris. The word iris is most likely derived from the Latin word for rainbow; the term appears to have been first applied in the sixteenth century, in reference to this multicoloured portion of the eye [2, 3]. The iris extends out in front of the lens, forming a circular array with a variable opening in the centre, otherwise known as the pupil. The pupil is not located exactly in the centre of the iris, but slightly nasally and inferiorly (below the centre) [4]. The iris is made up of two bands of muscle that control the pupil: the dilator, which contracts to enlarge the pupil, and the sphincter, which contracts to reduce its size. The visual appearance of the iris is directly related to its multilayered construction.
4.3.2 Iris Segmentation

Image acquisition captures the iris as part of a larger image that also contains data from the immediately surrounding eye region. Therefore, prior to performing iris pattern matching, it is important to localise that portion of the acquired image that corresponds to the iris [3]. This is the portion of the image inside the limbus (the border between the sclera and the iris) and outside the pupil (Fig. 4.1, from the CASIA iris database). If the eyelids occlude part of the iris, then only the portion of the image below the upper eyelid and above the lower
Fig. 4.1 Eye image
eyelid should be included. The eyelid boundary can also be irregular due to the presence of eyelashes. It follows that iris segmentation must cope with a wide range of edge contrasts and must be robust and effective.
4.3.3 Existing Methods for Iris Localisation

Methods such as the integro-differential operator, the Hough transform and active contour models are well-known techniques used for iris localisation. These methods are described below, including their strengths and weaknesses.

4.3.3.1 Daugman's Integro-Differential Operator

In order to localise an iris, Daugman proposed an integro-differential operator. The operator assumes that the pupil and limbus are circular contours and operates as a circular edge detector [4]. The upper and lower eyelids are also detected with the integro-differential operator by adjusting the contour search from a circle to a designed arcuate shape [22]. The integro-differential operator is defined as

\max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{(r, x_0, y_0)} \frac{I(x, y)}{2\pi r} \, ds \right|   (4.1)

The operator searches pixel-wise throughout the raw input image I(x, y) and obtains the blurred partial derivative of the integral over normalised circular contours of different radii. The pupil and limbus boundaries are expected to maximise the derivative of the contour integral, where the intensity values across the circular border change suddenly. G_σ(r) is a smoothing function controlled by σ that smoothes the image intensity for a more precise search.
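A simplified sketch of the radial search inside the operator for one fixed candidate centre; a full implementation also maximises over the centre coordinates (x_0, y_0), and the helper names are ours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def circle_mean(img, x0, y0, r, n=128):
    """Mean intensity along the circle of radius r centred at (x0, y0)."""
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    xs = np.clip((x0 + r * np.cos(t)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip((y0 + r * np.sin(t)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs].mean()

def radial_search(img, x0, y0, radii, sigma=2.0):
    """For one candidate centre: blurred partial derivative (over r) of the
    normalised circular integrals, as in Eq. (4.1); returns the best radius."""
    integrals = np.array([circle_mean(img, x0, y0, r) for r in radii])
    response = np.abs(gaussian_filter1d(np.gradient(integrals), sigma))
    return radii[np.argmax(response)], response.max()

img = np.random.rand(200, 200)           # toy eye image
r_best, score = radial_search(img, 100, 100, np.arange(20, 80))
```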
This method can result in false detections due to noise, such as the strong boundaries of the upper and lower eyelids, since it works only on a local scale.

4.3.3.2 Hough Transform

The Hough transform is a standard image analysis tool for finding curves that can be defined in parametric form, such as lines and circles. The circular Hough transform can be employed to deduce the radius and centre coordinates of the pupil and iris regions. Wildes [3], Kong and Zhang [23], Tisse et al. [17] and Ma et al. [24] have all used the Hough transform to localise irises. The localisation method, similar to Daugman's method, is also based on the first derivative of the image. In the method proposed by Wildes, an edge map of the image is first obtained by thresholding the magnitude of the image intensity gradient

|\nabla G(x, y) * I(x, y)|, \quad \text{where } \nabla \equiv \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right) \text{ and } G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}}   (4.2)
G(x, y) is a Gaussian smoothing function with scaling parameter σ chosen to select the proper scale of edge analysis. The edge map is then used in a voting process to maximise the defined Hough transform for the desired contour. A maximum point in the Hough space corresponds to the radius r and centre coordinates x_c and y_c of the circle best defined by the edge points, according to the equation

(x - x_c)^2 + (y - y_c)^2 - r^2 = 0   (4.3)
Wildes et al. and Kong and Zhang also make use of the parabolic Hough transform to detect the eyelids, approximating the upper and lower eyelids with parabolic arcs. The Hough transform method requires threshold values to be chosen for edge detection, which may result in critical edge points being removed and hence in failures to detect circles/arcs. In addition, the Hough transform is computationally intensive due to its "brute-force" approach, and thus may not be suitable for real-time applications.

4.3.3.3 Discrete Circular Active Contours

Ritter proposed an active contour model to localise the iris in an image [25]. The model detects the pupil and limbus by activating and controlling the active contour using two defined forces: internal and external. The internal forces are designed to expand the contour and keep it circular. This force model assumes that the pupil and limbus are globally circular, rather than locally, to minimise undesired deformations due to specular reflections and dark patches near the pupil boundary. The contour detection process of the model is based on
the equilibrium of the defined internal forces with the external forces. The external forces are obtained from the grey-level intensity values of the image and are designed to push the vertices inward. The movement of the contour is based on the composition of the internal and external forces over the contour vertices. Each vertex is moved between time t and t + 1 by

V_i(t + 1) = V_i(t) + F_{int,i} + F_{ext,i}   (4.4)

where F_{int,i} is the internal force, F_{ext,i} is the external force and V_i is the position of vertex i. A point interior to the pupil is located from a variance image, and a discrete circular active contour (DCAC) is then created with this point as its centre. The DCAC is moved under the influence of the internal and external forces until it reaches equilibrium, at which point the pupil is localised.
4.4 Proposed Method for Iris Localisation

To the best knowledge of the authors, there exists no previous work on iris segmentation and localisation using a multiscale approach. In this chapter we propose a novel approach to iris segmentation based on multiscale edge detection using wavelet maxima. As the scale increases (up to a certain level), noise disappears and fewer texture points produce local maxima, so that significant edges remain; this makes it possible to find the real geometrical edges of the image, and thus to detect efficiently the significant circles of the inner and outer iris boundaries and of the eyelids. In the proposed method, multiscale edge detection extracts the points of sharp variation (edges) via the modulus maxima, where local maxima are detected so as to produce single-pixel edges. The level of decomposition can be selected depending on the amount of edge detail required.
4.4.1 Motivation

4.4.1.1 Edge Detection Using Wavelets

Edges in images can be mathematically defined as local singularities. Until recently, the Fourier transform was the main mathematical tool for analysing singularities. However, the Fourier transform is global and not well adapted to local singularities, and it is hard to find the location and spatial distribution of singularities with Fourier transforms. Wavelet transforms, on the other hand, provide a local analysis; they are especially suitable for time-frequency analysis [26], as required in singularity detection problems. With the growth of wavelet theory, wavelet transforms have been found to be remarkable mathematical tools for analysing singularities,
including edges, and, further, for detecting them effectively. Mallat, Hwang and Zhong [27, 28] proved that the maxima of the wavelet transform modulus can detect the location of irregular structures. The wavelet transform characterises the local regularity of signals by decomposing them into elementary building blocks that are well localised in both space and frequency. This not only explains the underlying mechanism of classical edge detectors, but also indicates a way of constructing optimal edge detectors under specific working conditions. A remarkable property of the wavelet transform is its ability to characterise the local regularity of functions. For an image f(x, y), edges correspond to singularities of f(x, y) and are thus related to the local maxima of the wavelet transform modulus. Therefore, the wavelet transform is an effective method for edge detection. Assume f(x, y) is a given image of size M × N. At each scale j, with j > 0 and S_0 f = f(x, y), the wavelet transform decomposes S_{j-1} f into three wavelet bands: a low-pass band S_j f, a horizontal high-pass band W_j^H f and a vertical high-pass band W_j^V f. The three wavelet bands (S_j f, W_j^H f, W_j^V f) at scale j are of size M × N, the same as the original image, and all filters used at scale j (j > 0) are upsampled by a factor of 2^j compared with those at scale zero. In addition, the smoothing function used in the construction of the wavelet reduces the effect of noise, so that the smoothing step and the edge detection step are combined to achieve an optimal result.

4.4.1.2 Multiscale Edge Detection

The resolution of an image is directly related to the appropriate scale for edge detection. High resolution and a small scale result in noisy and discontinuous edges; low resolution and a large scale result in undetected edges. The scale controls the significance of the edges to be shown: edges of higher significance are more likely to be preserved by the wavelet transform across scales, while edges of lower significance are more likely to disappear when the scale increases. Since an edge separates two different regions, an edge point is a point where the local intensity of the image varies more rapidly than in neighbouring points close to the edge; such a point can therefore be characterised as a local maximum of the gradient of the image intensity. The problem is that such a characterisation applies to differentiable images and, above all, that it also detects noise points. All techniques used so far to resolve this problem are based on first smoothing the image [15, 3, 29, 30]. However, a question then arises: how much smoothing, and of what kind, should one choose? Strong smoothing leads to the detection of fewer points, while lighter smoothing is more permissive. That is why Mallat defined, in his work with Zhong [6], the concept of multiscale contours. In this setting, every edge point of an image is characterised by a whole chain in the scale-space plane: the longer the chains, the stronger the smoothing imposed, and the smaller the number of edge points obtained. In addition, this allows useful information to be extracted about the regularity of the image at the edge point a chain characterises, which can be very attractive for a finer characterisation of the edge map.
The multiscale edge detection method described in [31] is used to find the edges. This wavelet is a nonsubsampled wavelet decomposition and essentially implements the discretised gradient of the image at different scales. At each level of the wavelet transform, the modulus M_j f of the gradient can be computed as

M_j f = \sqrt{ (W_j^H f)^2 + (W_j^V f)^2 }   (4.5)

and the associated phase A_j f is obtained as

A_j f = \tan^{-1} \left( \frac{W_j^V f}{W_j^H f} \right)   (4.6)
The sharp variation points of the image f(x, y) smoothed by S_j f are the points (x, y) where the modulus M_j f has a local maximum in the direction of the gradient given by A_j f.
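A minimal sketch of Eqs. (4.5) and (4.6), followed by a simple non-maximum suppression along the quantised gradient direction as one way of producing the single-pixel edges mentioned earlier; the four-direction quantisation is our simplification:

```python
import numpy as np

def modulus_phase(wh, wv):
    """Gradient modulus and phase at one scale (Eqs. 4.5 and 4.6)."""
    return np.sqrt(wh ** 2 + wv ** 2), np.arctan2(wv, wh)

def local_maxima(modulus, phase):
    """Keep pixels whose modulus is a local maximum along the gradient
    direction, quantised to 0/45/90/135 degrees, giving one-pixel edges."""
    out = np.zeros_like(modulus)
    q = (np.round(phase / (np.pi / 4)).astype(int)) % 4   # direction bins
    offs = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}
    for yy in range(1, modulus.shape[0] - 1):
        for xx in range(1, modulus.shape[1] - 1):
            dy, dx = offs[q[yy, xx]]
            if (modulus[yy, xx] >= modulus[yy + dy, xx + dx]
                    and modulus[yy, xx] >= modulus[yy - dy, xx - dx]):
                out[yy, xx] = modulus[yy, xx]
    return out

wh = np.random.randn(64, 64)             # toy detail bands at one scale
wv = np.random.randn(64, 64)
m, a = modulus_phase(wh, wv)
edges = local_maxima(m, a)
```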
4.4.2 The Multiscale Method

Multilevel wavelet decomposition is applied and edge detection is carried out before computing the local maxima, producing both the iris outer boundary and pupil boundary edge maps. A Hough transform is then used to detect the circles (outer and pupil boundaries), followed by a conversion of the iris image from Cartesian to polar coordinates for normalisation purposes. The diagram of this method is depicted in Fig. 4.2.

4.4.2.1 Edge Map Detection

In our work, we have used the algorithm described in [27] to obtain the wavelet decomposition, using the pair of discrete filters H, G given in Table 4.1. A detailed description of the technique can be found in [36, 37]. At each scale s, the algorithm decomposes the eye image I(x, y) into I(x, y, s), W_v(x, y, s) and W_h(x, y, s):

– I(x, y, s): the image smoothed at scale s;
– W_h(x, y, s) and W_v(x, y, s): the two components of the gradient vector of the analysed image I(x, y) in the horizontal and vertical directions, respectively.

At each scale s (s < S), where S is the number of scales (the decomposition level), the image is smoothed by a low-pass filter: starting from s = 0,

I(x, y, s + 1) = I(x, y, s) * (H_s, H_s)   (4.7)
Fig. 4.2 Diagram of the proposed method: multiscale edge detection and local maxima produce the iris outer boundary and pupil boundary edge maps; these feed the iris outer circle and pupil circle detection, followed by eyelid and eyelash isolation and iris normalisation
Table 4.1 Response of filters H, G

H: 0, 0, 0.125, 0.375, 0.375, 0.125, 0
G: 0, 0, 0, −2.0, 2.0, 0, 0
Both horizontal and vertical details are obtained as

W_h(x, y, s) = \frac{1}{\lambda_s} \, I(x, y, s) * (G_s, D)   (4.8)

W_v(x, y, s) = \frac{1}{\lambda_s} \, I(x, y, s) * (D, G_s)   (4.9)
We denote by:

– D the Dirac filter, whose impulse response is equal to 1 at 0 and 0 otherwise;
– A * (H, L) the separable convolution of the rows and columns, respectively, of the image A with the 1D filters H and L;
– G_s, H_s the discrete filters obtained by inserting 2^s − 1 zeros between consecutive coefficients of G and H;
– λ_s the normalisation constants: as explained in [27], owing to discretisation the wavelet modulus maxima of a step edge do not have the same amplitude at all scales, as they would in the continuous model, and the constants λ_s compensate for this discrete effect. Their values are given in Table 4.2.

Figure 4.3 shows the application of the algorithm to an eye image, where it can be observed that the edges of the image in both the horizontal and vertical directions, and at different scales, are efficiently computed. It can also be observed that W_h(x, y, s) carries significant edge information about the eyelids, with the horizontal lines of the pupil clearer than the outer boundary circle, while W_v(x, y, s) carries useful information about both the pupil and the outer boundary circles. After computing the two components of the wavelet transform, the modulus at each scale is computed as

M(x, y, s) = \sqrt{ |W_h(x, y, s)|^2 + |W_v(x, y, s)|^2 }   (4.10)

The modulus M(x, y, s) has local maxima in the direction of the gradient given by

A(x, y, s) = \arctan\left( W_v(x, y, s) / W_h(x, y, s) \right)   (4.11)

Table 4.2 Normalisation coefficients λ_s (for s > 5, λ_s = 1)

s:    1     2     3     4     5
λ_s:  1.50  1.12  1.03  1.01  1.00
Fig. 4.3 Original image at the top, the first column on the left shows Wh (x, y, s) for 1 ≤ s ≤ 3, and the second column on the right shows Wv (x, y, s) for 1 ≤ s ≤ 3
From the modulus M(x, y, s) one can see how the edges change across scales, with only real edges remaining at all scales; for example, comparing the intensities along a specified column (see Fig. 4.4) shows how well the edges are detected. A thresholding operation is then applied to the modulus M(x, y, s): the modulus maximum MAX(M(x, y, s)) is multiplied by a factor α to obtain the threshold value that yields an edge map,

T = \alpha \cdot \mathrm{MAX}(M(x, y, s))   (4.12)
Therefore, only values of M(x, y, s) greater than or equal to T are considered edge points. The constant α takes different values for pupil edge detection and outer boundary edge detection (see Figs. 4.5 and 4.6). Using the vertical coefficients for outer boundary edge detection reduces the influence of the eyelids when performing the circular Hough transform, because the eyelids are usually horizontally aligned [3].
Fig. 4.4 The first column on the left shows the modulus images M(x, y, s) for 1 ≤ s ≤ 5, and the second column on the right displays intensities along a specified column
Fig. 4.5 Pupil edge detection in scale s = 3 with α = 0.66
Fig. 4.6 Outer boundary edge detection using only the vertical coefficients Wv
4.4.2.2 Iris Outer and Pupil Circle Detection

The Hough transform locates contours in an n-dimensional parameter space by examining whether they lie on curves of a specified shape. For the detection of the iris outer and pupil circles (see Fig. 4.7), a circular Hough transform is used. Given the iris outer or pupillary boundary and a set of recovered edge points (x_i, y_i), i = 1, ..., n, the Hough transform is defined as

H(x_c, y_c, r) = \sum_{i=1}^{n} h(x_i, y_i, x_c, y_c, r)   (4.13)

where h(x_i, y_i, x_c, y_c, r) equals 1 if the edge point (x_i, y_i) lies on the circle with centre (x_c, y_c) and radius r, and 0 otherwise; the circle itself is defined by the equation

(x - x_c)^2 + (y - y_c)^2 - r^2 = 0   (4.13a)

which, for the recovered edge points of the iris boundaries, becomes

(x_i - x_c)^2 + (y_i - y_c)^2 - r^2 = 0   (4.13b)

Fig. 4.7 Iris localised: (a) pupil detected, (b) outer circle detected
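A brute-force sketch of the circular Hough transform of Eq. (4.13): every edge point votes for the centres of all circles of each candidate radius passing through it, and the accumulator maximum yields (x_c, y_c, r):

```python
import numpy as np

def circular_hough(edge_points, shape, radii):
    """Accumulate votes H(x_c, y_c, r) as in Eq. (4.13) and return the
    centre and radius with the most support."""
    H = np.zeros((len(radii), shape[0], shape[1]), dtype=np.int32)
    t = np.linspace(0, 2 * np.pi, 90, endpoint=False)
    for (x, y) in edge_points:
        for i, r in enumerate(radii):
            xc = np.round(x - r * np.cos(t)).astype(int)
            yc = np.round(y - r * np.sin(t)).astype(int)
            ok = (xc >= 0) & (xc < shape[1]) & (yc >= 0) & (yc < shape[0])
            np.add.at(H[i], (yc[ok], xc[ok]), 1)
    i, yc, xc = np.unravel_index(np.argmax(H), H.shape)
    return xc, yc, radii[i]

# toy check: edge points on a circle of radius 30 centred at (50, 60)
pts = [(50 + int(30 * np.cos(a)), 60 + int(30 * np.sin(a)))
       for a in np.linspace(0, 2 * np.pi, 40, endpoint=False)]
print(circular_hough(pts, (120, 120), np.arange(20, 45)))   # ~ (50, 60, 30)
```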
4.4.2.3 Eyelid and Eyelash Isolation

Only the horizontal coefficients W_h(x, y, s) are used for multiscale edge detection to create the edge map shown in Fig. 4.8. The eyelids are then isolated by first fitting a line to the upper and lower eyelid parts using a linear Hough transform. A second, horizontal line is then drawn, intersecting the first line at the iris edge closest to the pupil; this second line allows a maximal isolation of the eyelid regions, while a thresholding operation is used to isolate the eyelashes (Fig. 4.9).

4.4.2.4 Iris Normalisation and Polar Transformation

Once the iris region is segmented, the next stage is to normalise this part to enable generation of the iris code and the subsequent comparison step. Since variations in the eye, such as the optical size of the iris, the position of the pupil within the iris and the iris orientation, change from person to person, the iris image must be normalised so that the representation is common to all, with similar dimensions [36, 37].
Fig. 4.8 Edges for eyelids detection: the first column on the left shows the original images and the second column on the right shows the edges detected using the horizontal coefficients Wh (x, y, 3)
Fig. 4.9 Iris localisation without noise
Fig. 4.10 Unwrapping the iris
The normalisation process involves unwrapping the iris and converting it into its polar equivalent, using Daugman's rubber sheet model (Fig. 4.10). The centre of the pupil is taken as the reference point and a remapping formula converts the points from Cartesian to polar coordinates. The remapping of the iris image I(x, y) from raw Cartesian coordinates to polar coordinates (r, θ) can be represented as

I(x(r, \theta), y(r, \theta)) \rightarrow I(r, \theta)   (4.14)

where r lies in the interval [0, 1] and θ is the angle in [0, 2π], with

x(r, \theta) = (1 - r)\, x_p(\theta) + r\, x_l(\theta)   (4.14a)

y(r, \theta) = (1 - r)\, y_p(\theta) + r\, y_l(\theta)   (4.14b)

where x_p(θ), y_p(θ) and x_l(θ), y_l(θ) are the coordinates of the pupil and iris boundaries along the direction θ. In this model, a number of data points are selected along each radial line (the radial resolution), while the number of radial lines going around the iris region defines the angular resolution, as shown in Fig. 4.11. The normalisation process proved successful, as demonstrated by Fig. 4.12, which shows the normalised iris of the image in Fig. 4.11.
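A minimal sketch of the rubber sheet remapping of Eqs. (4.14)–(4.14b), assuming circular pupil and iris boundaries as returned by the Hough step; nearest-neighbour sampling is our simplification (bilinear interpolation is common in practice):

```python
import numpy as np

def rubber_sheet(img, pupil, iris, radial_res=15, angular_res=60):
    """Daugman rubber sheet model (Eq. 4.14): sample along radial lines
    between the pupil and iris boundaries, modelled as circles here."""
    (xp, yp, rp), (xi, yi, ri) = pupil, iris
    theta = np.linspace(0, 2 * np.pi, angular_res, endpoint=False)
    r = np.linspace(0, 1, radial_res)[:, None]
    # boundary interpolation along each direction theta (Eqs. 4.14a, 4.14b)
    x = (1 - r) * (xp + rp * np.cos(theta)) + r * (xi + ri * np.cos(theta))
    y = (1 - r) * (yp + rp * np.sin(theta)) + r * (yi + ri * np.sin(theta))
    xs = np.clip(np.round(x).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.round(y).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs]          # radial_res x angular_res normalised strip

eye = np.random.rand(280, 320)
strip = rubber_sheet(eye, pupil=(160, 140, 40), iris=(160, 140, 100))
```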
Fig. 4.11 Normalised portion with radial resolution of 15 pixels and angular resolution of 60 pixels
Fig. 4.12 Normalised iris image
4.4.3 Results and Analysis

The proposed algorithm has been evaluated using the CASIA iris image database, which consists of 756 eye images in 108 sets from 80 persons. A perfect segmentation was obtained, as shown in Fig. 4.13, with a success rate of 99.6%, which compares very favourably with the methods underlying current iris recognition systems, such as those of Daugman and Wildes. A multiscale approach can provide a complete and stable description of signals, since it is based on a wavelet formalisation. This characterisation provides a new approach to the classical iris edge detection problem, since all existing research in iris localisation is based either on the integro-differential method proposed by Daugman or on the image derivatives proposed by Wildes. For example, a problem with Daugman's algorithm [15] is that it can fail in the presence of noise (e.g. from reflections) since it operates only on a local scale. In the proposed algorithm, by contrast, the multiscale approach provides more useful information about the sharp variations (images at each scale with a horizontal and a vertical decomposition), as shown in Fig. 4.2 and demonstrated in [27, 31]. It is clear from Fig. 4.14 that the proposed algorithm is capable of detecting the pupil and outer boundary circles even in poor quality iris images, thanks to the efficient edge map obtained by multiscale edge detection. On the other hand, there are problems with the threshold values to be chosen for edge detection. First, critical edge points may be removed, resulting in a failure to detect circles/arcs. Secondly, there is no precise criterion for choosing the threshold value: Wildes [3] chose a hard threshold value and applied the Hough transform, but the choice of threshold was not based on solid ground.
Fig. 4.13 Illustration of a perfect iris segmentation
Fig. 4.14 Poor quality iris image is efficiently localised
In the proposed algorithm, the threshold value is selected by computing the maximum of the modulus at a given scale s, which provides a solid criterion, because the sharp variation points of the smoothed image are the pixels at locations (x, y) where the modulus M(x, y, s) has a local maximum in the direction of the gradient A(x, y, s) [31]. It can be clearly seen from Fig. 4.15 that the edges are well detected and the pupil is clearer in (b) and (c) than in (a); as a result, the pupil's circle is well localised, as shown in (e). This is the reason why the proposed algorithm outperforms algorithms that use a local scale and the Canny edge detector. This analysis confirms and explains the effectiveness of the proposed method based on multiscale edge detection using wavelet maxima for iris segmentation: it provides a precise detection of the circles (iris outer boundary and pupil boundary) and a precise edge map from the wavelet decomposition in the horizontal and vertical directions. This in turn greatly reduces the search space for the Hough transform and performs well in the presence of noise, thereby improving the overall performance with a better success rate than the Daugman and Wildes methods (Fig. 4.16).
Fig. 4.15 Edge influence in iris segmentation: (a) pupil edge map using Canny edge detector and threshold value (T1 = 0.25 and T2 = 0.25), (b) and (c) pupil edge obtained with a multiscale edge detection using wavelet maxima for α = 0.4 and α = 0.6, (d) result of iris segmentation using Canny edge detector of example (a), (e) result of iris segmentation using a multiscale edge detection of example (c)
Fig. 4.16 Success rate of iris segmentation as a function of noise, for Daugman's method, Wildes' method and the proposed method
4.5 Texture Analysis and Feature Extraction

Since the iris has an interesting structure with plenty of texture information, it makes sense to search for efficient methods that capture crucial iris information locally. The distinctive spatial characteristics of the human iris are manifest at a variety of scales [2]; for example, distinguishing structures range from the overall shape of the iris to the distribution of tiny crypts and detailed texture. To capture this range of spatial detail, it is advantageous to make use of a multiscale representation. Several researchers have investigated the use of multiresolution techniques for iris feature extraction [15, 3, 14], and high recognition accuracies have been achieved. At the same time, it has been observed that each multiresolution technique has its own characteristics and situations for which it is suitable. For example, the Gabor filter bank is the best-known multiresolution method used for iris feature extraction, and Daugman [15] demonstrated the accuracy of Gabor filters in his proposed iris recognition system. We have also investigated the use of wavelet maxima components as part of a multiresolution technique for iris feature extraction, analysing iris textures in both the horizontal and vertical directions. Since the iris has a rich structure with very complex textures, it is important to analyse them by combining all the information extracted from the iris region, taking into account orientation as well as horizontal and vertical details. For this purpose, we propose a new combined multiresolution iris feature extraction scheme that analyses the iris using wavelet maxima components before applying a dedicated Gabor filter bank to extract all dominant texture features.
4.5.1 Wavelet Maxima Components

Wavelet decomposition provides a very good approximation of images and a natural setting for multilevel analysis. Since wavelet transform maxima provide useful information for texture and edge analysis [31], we propose to employ wavelet maxima components for fast and effective feature extraction. Wavelet maxima have been shown to work well in detecting edges, which are key features in a query, and the method also provides useful texture information through the horizontal and vertical details. As described in [27], a pair of discrete filters H, G (Table 4.1) is used to obtain the wavelet decomposition. At each scale s, the algorithm decomposes the iris image I(x, y) into I(x, y, s), W_v(x, y, s) and W_h(x, y, s), as shown in Figs. 4.17 and 4.18, respectively.
4.5.2 Special Gabor Filter Bank

The 2D Gabor wavelet function g(x, y) and its Fourier transform G(u, v) can be defined as follows [32]:

g(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left[ -\frac{1}{2} \left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi j W x \right]   (4.15)

G(u, v) = \exp\left[ -\frac{1}{2} \left( \frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} \right) \right]   (4.16)
Fig. 4.17 Wavelet maxima vertical component at scale 2 with intensities along a specified column

Fig. 4.18 Wavelet maxima horizontal component at scale 2 with intensities along a specified column

where σ_u = 1/(2πσ_x) and σ_v = 1/(2πσ_y). Gabor functions can form a complete but non-orthogonal basis set. Expanding a signal using this basis provides a localised
A class of self-similar functions, referred to as Gabor wavelets in the following discussion, is now considered. Let g(x,y) be the mother Gabor wavelet; the self-similar filter dictionary can then be obtained by appropriate dilations and rotations of g(x,y) through the generating function

g_{mn}(x, y) = a^{-m} g(x', y'),   (4.17)

x' = a^{-m}(x\cos\theta + y\sin\theta),   (4.17a)

y' = a^{-m}(-x\sin\theta + y\cos\theta),   (4.17b)
where a > 1, m and n are integers, θ = nπ/k is the orientation (k is the number of orientations) and a^{-m} is the scale factor. The non-orthogonality of the Gabor wavelets implies that there is redundant information in the filtered images, and the following strategy is used to reduce this redundancy [32]. Let U_l and U_h denote the lower and upper centre frequencies of interest, respectively, let K be the number of orientations and let S be the number of scales in the multiresolution decomposition. The design strategy is to ensure that the half-peak magnitude supports of the filter responses in the frequency spectrum touch each other, as shown in Fig. 4.19. This results in the following formulas for computing the filter parameters \sigma_u and \sigma_v (and thus \sigma_x and \sigma_y):

a = (U_h/U_l)^{\frac{1}{S-1}}   (4.18)

\sigma_u = \frac{(a - 1)U_h}{(a + 1)\sqrt{2\ln 2}}   (4.19)
Fig. 4.19 Gabor filter dictionary; the filter parameters used are Uh = 0.4, Ul = 0.05, K = 6 and S = 4
\sigma_v = \tan\left(\frac{\pi}{2k}\right)\left[U_h - 2\ln\left(\frac{\sigma_u^2}{U_h}\right)\right]\left[2\ln 2 - \frac{(2\ln 2)^2\sigma_u^2}{U_h^2}\right]^{-\frac{1}{2}}   (4.20)
where W = U_h and m = 0, 1, . . ., S−1. In order to eliminate the sensitivity of the filter response to absolute intensity values, the real (even) components of the 2D Gabor filters are biased by adding a constant so that they have zero mean.
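The filter-bank parameters can be computed directly from Eqs. 4.18–4.20, as in the Python sketch below; the default arguments reproduce the dictionary of Fig. 4.19. Note that the bracket placement in Eq. 4.20 follows the reconstruction given above and should be checked against [32] before reuse.

import numpy as np

def gabor_bank_params(Uh=0.4, Ul=0.05, K=6, S=4):
    """Scale factor and filter widths for the Gabor dictionary."""
    a = (Uh / Ul) ** (1.0 / (S - 1))                               # Eq. 4.18
    sigma_u = ((a - 1) * Uh) / ((a + 1) * np.sqrt(2 * np.log(2)))  # Eq. 4.19
    sigma_v = (np.tan(np.pi / (2 * K))                             # Eq. 4.20
               * (Uh - 2 * np.log(sigma_u**2 / Uh))
               / np.sqrt(2 * np.log(2)
                         - (2 * np.log(2))**2 * sigma_u**2 / Uh**2))
    # Spatial widths follow from sigma_u = 1/(2*pi*sigma_x), and likewise
    # for sigma_v and sigma_y.
    sigma_x, sigma_y = 1 / (2 * np.pi * sigma_u), 1 / (2 * np.pi * sigma_v)
    return a, sigma_u, sigma_v, sigma_x, sigma_y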
4.5.3 Proposed Method

• Compute wavelet maxima components in the horizontal and vertical directions using 5 scales.
• For each component, apply a special Gabor filter bank with 4 scales and 6 orientations to obtain ((4×6)×5)×2 = 120×2 = 240 filtered images.
• Compute the feature vector using two different techniques: the first uses statistical measures (mean and variance), while the second is based on moment invariants (Fig. 4.20).
Fig. 4.20 Proposed combined multiscale feature extraction scheme: the normalised iris is decomposed into wavelet maxima horizontal and vertical details (5 scales each); a special Gabor filter bank (6 orientations, 4 scales) is applied to each component, giving 4×6×5 = 120 filtered images per direction and 240 in total; feature vectors are then built either from statistical features (mean and variance: 2×240 = 480 elements) or from the seven moment invariants (7×240 = 1680 elements)
4.5.3.1 Template Generation

Statistical Features

For each filtered image the statistical features, namely the mean \mu_{mn} and the standard deviation \sigma_{mn}, are computed in order to construct a feature vector. The experiments show that the best results are obtained when four scales (S = 4) and six orientations (K = 6) are used. The resulting feature vector is constructed as follows:

f = [\mu_{0,0}, \sigma_{0,0}, \mu_{0,1}, \sigma_{0,1}, \ldots, \mu_{3,5}, \sigma_{3,5}].   (4.21)
After applying the wavelet maxima decomposition to the normalised iris images, a Gabor filter bank with 4 scales and 6 orientations is used, resulting in 120 filtered images per direction (wavelet maxima: 5 images per direction; Gabor filters: 4×6×5 = 120 images), i.e. 240 images in total. By computing two statistical features (mean and variance) for each of these 240 images, a feature vector of 480 elements is produced.
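A minimal sketch of this template construction follows. It assumes filtered_images is the list of 240 Gabor-filtered wavelet-maxima components; the text mentions mean/variance in one place and mean/standard deviation in another, and the variance of the response magnitudes is used here.

import numpy as np

def statistical_template(filtered_images):
    """480-element template (cf. Eq. 4.21): mean and variance per image."""
    feats = []
    for img in filtered_images:          # expected: 240 filter responses
        mag = np.abs(img)                # magnitudes of the responses
        feats.extend([mag.mean(), mag.var()])
    return np.asarray(feats)             # length 2 * len(filtered_images)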
Moment Invariants

The theory of moments provides an interesting series expansion for representing objects. It is also suitable for mapping the filtered images to vectors so that their similarity distance can be measured [33]. Certain functions of moments are invariant to geometric transformations such as translation, rotation and scaling. Such features are useful for identifying objects with unique signatures regardless of their location, size and orientation [33]. A set of seven 2D moment invariants that are insensitive to translation, rotation and scaling is computed for each image analysed by the horizontal and vertical wavelet maxima components and the Gabor filters. This produces 240 filtered images per iris image; with seven moments per image, a feature vector of 1680 (240×7) elements is constructed.
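The chapter does not name the seven invariants; Hu's classical set, which is invariant to translation, rotation and scaling, is a standard choice and is assumed in this sketch (using OpenCV's implementation):

import cv2
import numpy as np

def moment_template(filtered_images):
    """1680-element template: seven Hu moment invariants per filtered image
    (Hu's set is an assumed interpretation of the 'seven 2D moments')."""
    feats = []
    for img in filtered_images:                    # expected: 240 arrays
        m = cv2.moments(np.abs(img).astype(np.float32))
        hu = cv2.HuMoments(m).ravel()              # the seven invariants
        # Log compression is common practice, since the invariants span
        # many orders of magnitude.
        feats.extend(-np.sign(hu) * np.log10(np.abs(hu) + 1e-30))
    return np.asarray(feats)                       # 7 * len(filtered_images)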
4.6 Matching

It is important to represent the obtained vector as a binary code, because computing the difference between two binary code-words is easier than comparing two vectors of real numbers: Boolean vectors are simpler to compare and manipulate. A Hamming distance matching algorithm is employed for the recognition of two samples. It is basically an Exclusive OR (XOR) operation between two bit patterns. The Hamming distance is a measure which delineates the differences between iris codes. Every bit of the presented iris code is compared to the corresponding bit of the referenced iris code: if the two bits are the same (i.e. two 1's or two 0's), the system assigns the value "0" to that comparison, while the value "1" is assigned if the two bits differ. The formula for iris matching is therefore as follows:
HD = \frac{1}{N}\sum_{i=1}^{N} P_i \oplus R_i   (4.22)
where N is the dimension of the feature vector, P_i is the ith component of the presented feature vector and R_i is the ith component of the referenced feature vector. The Match Ratio between two iris templates is then computed by

Ratio = \frac{T_z}{T_b} \times 100   (4.23)
where T_z is the total number of zeros in the Hamming distance vector and T_b is the total number of bits in the iris template.
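Assuming the templates have already been binarised, the matching step of Eqs. 4.22 and 4.23 reduces to a few lines; note that the match ratio is simply the percentage of agreeing bits, i.e. 100·(1 − HD):

import numpy as np

def hamming_distance(P, R):
    """Eq. 4.22: fraction of disagreeing bits between two binary codes."""
    P, R = np.asarray(P, dtype=np.uint8), np.asarray(R, dtype=np.uint8)
    return np.count_nonzero(P ^ R) / P.size

def match_ratio(P, R):
    """Eq. 4.23: percentage of agreeing bits (zeros of the XOR vector)."""
    return 100.0 * (1.0 - hamming_distance(P, R))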
4.7 Experimental Results and Analysis A series of experiments have been conducted to evaluate the performance of the proposed technique. In addition, the proposed iris identification system has been compared with some existing methods for iris recognition and detailed discussions on the overall experimental results are presented.
4.7.1 Database

The Chinese Academy of Sciences – Institute of Automation (CASIA) eye image database [34], containing 756 greyscale eye images with 108 unique eyes (classes) and 7 different images of each unique eye, has been used in the analysis. Images of each class were taken in two sessions with a one-month interval between them. The images were captured especially for iris recognition research using specialised digital optics developed by the National Laboratory of Pattern Recognition, China. The eye images are mainly from persons of Asian descent, whose eyes are characterised by densely pigmented irises and dark eyelashes.
4.7.2 Combined Multiresolution Feature Extraction Techniques

In our new approach, we have introduced combined multiresolution feature extraction, which captures more significant iris texture in terms of both iris structure and the nature of the texture. Since the visual appearance of the iris is a direct result of its multilayered structure [2], the structure of the iris texture should be characterised in all directions and orientations. Some existing methods in iris recognition have indeed used a multiscale approach to process iris features. However, each of these methods addresses only one class of iris representation: phase-based methods [4, 12], zero-crossing representation [14, 35], texture analysis [3, 16] and intensity variation analysis [29, 10]. Since each technique is strongly tied to one class, it fails to analyse
some types of texture when certain conditions are not met. A solution is therefore to combine a number of techniques, with a view to analysing all types of texture. Our proposed approach demonstrates that such a combined multiscale technique is effective and robust for the analysis of iris texture and that a high system performance can be achieved (Table 4.4).
4.7.3 Template Computation

Two different experiments to compute the feature vector were conducted. In the first, statistical features (mean and variance) were used to compute the feature vector elements, leading to 480 elements. The second method employs a set of seven moment invariants that are insensitive to translation, scaling and rotation, leading to a feature vector of 1680 elements. From the experimental results depicted in Table 4.3, it has been found that the accuracy is higher with the second method. It can therefore be concluded that moment invariants are useful for identifying and efficiently representing images, especially as they are compact and can easily be used to compute similarity distances. These moments are also invariant to translation, rotation and scaling.
4.7.4 Comparison with Existing Methods

Our comparative study was against the methods proposed by Daugman [4], Boles and Boashash [14] and Li Ma et al. [21], which are the best known among existing schemes for iris recognition. These methods characterise local details of the iris based on phase, texture analysis, zero-crossing and local sharp variation representations, respectively. It is worth noting that the method of Wildes et al. [3] only operates in a verification mode, so a comparative study against it is not appropriate, since our proposed method is intended for an identification mode. From the results shown in Table 4.4, it can be seen that Daugman's method and the proposed method have the best performances, followed by Tan's method [21] and then Boles' [14]. Daugman's method is slightly better than the proposed method in identification tests. Daugman's method operates by demodulating the phase information of each small local region using multiscale quadrature wavelets. The resulting phasor, denoted by a complex-valued coefficient, is then quantised to one of the four quadrants in the complex plane.

Table 4.3 Performance evaluation according to the size of the feature vectors

Feature vector representation    Statistical features   Moment invariants
Feature vector size (bits)       480                    1680
Correct recognition rate (%)     99.52                  99.60
Table 4.4 Comparison of methods' correct recognition rates

Method                                    Correct recognition rate (%)
Daugman                                   99.90
Li Ma and Tan                             99.23
Boles and Boashash                        93.20
Proposed method (statistical features)    99.52
Proposed method (moment invariants)       99.60
To achieve high accuracy, the size of each local region must be small enough, which results in a high-dimensional feature vector (2048 components). This means that Daugman's method captures much more information from much smaller local regions, which makes his method slightly better than ours. Boles and Boashash [14] and Li Ma et al. [21] used a kind of 1D ordinal measure, thereby losing much information when compared with 2D ordinal representations. This directly leads to poorer performance compared with our method. Boles and Boashash [14] employed only very limited information along a virtual circle on the iris to represent the whole iris, which in turn results in a relatively low accuracy. The method of Li Ma et al. [21] uses local features, so its performance may be affected by iris localisation, noise and the inherent iris deformations caused by pupil movements. Table 4.5 depicts the computational cost of feature extraction for the methods described in [4, 14, 21] and for our proposed algorithm. These experiments were carried out using Matlab 7.0. Since Boles' method [14] is based on 1D signal analysis, its computational cost is smaller than that of the other methods. However, our proposed approach is faster than both Daugman's and Li Ma and Tan's methods because it employs a compact feature vector representation while maintaining a high recognition rate.

Table 4.5 Comparison of the computational complexity

Method                             Feature extraction complexity (ms)
Daugman                            285
Li Ma and Tan                      95
Boles and Boashash                 55
Proposed (statistical features)    74
Proposed (moment invariants)       81
4.8 Discussion and Future Work

From the analysis and comparative study given above, the following conclusions can be drawn:

• The proposed multiscale approach introduces a technique to detect edges for precise and effective iris region localisation. This approach uses modulus
wavelet maxima to define the pupil and iris edges. This in turn greatly reduces the search space for the Hough transform, thereby improving the overall performance.
• A combination of Gabor filters with wavelet maxima components provides richer texture information, since wavelet maxima efficiently detect horizontal and vertical details through scale variations. By applying Gabor filters to the resulting components with varying orientations and scales, more precise information can be captured to improve iris recognition accuracy.
• Moment invariants are useful and efficient for capturing iris features, since they are insensitive to affine transformations (i.e. translation, rotation and scaling), thereby providing a complete and compact feature vector which speeds up the matching process.

The experimental results also show that our proposed method is reasonable and promising for the analysis of iris texture. Future work will include:

• Analysis of local variations to precisely capture local fine changes of the iris, with a view to further improving the accuracy.
• A combined local and global texture analysis for robust iris recognition.
4.9 Conclusion

Iris recognition, as a biometric technology, has great potential for security and identification applications, mainly due to its variability and stability features. This chapter has discussed an iris localisation method based on a multiscale edge detection approach using wavelet maxima as a preprocessing step, which is highly suitable for the detection of the iris outer and inner circles. This approach yields accurate iris localisation, a necessary step towards higher recognition accuracy. The chapter has also introduced a novel and efficient multiscale approach for iris recognition based on combined feature extraction methods that consider both the textural and topological features of an iris image. These features, being invariant to translation, rotation and scaling, yield a superior performance in terms of recognition accuracy and computational cost when compared against the algorithms proposed by Boles [14] and Li Ma et al. [21]. Compared against Daugman's method [4], it performs with marginally lower accuracy but at a lower complexity.
References

1. M. K. Khan, J. Zhang and S. J. Horng, "An effective iris recognition system for identification of humans", INMIC Multitopic Conference, pp. 114–117, 24–26 December 2004.
2. J. Wayman, A. Jain, D. Maltoni and D. Maio, "Biometric Systems: Technology, Design and Performance Evaluation", Springer-Verlag, London, UK, 2005.
3. R. Wildes, "Iris recognition: an emerging biometric technology", Proceedings of the IEEE, vol. 85, pp. 1348–1363, 1997.
4. J. Daugman, "High confidence visual recognition of persons by a test of statistical independence", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, pp. 1148–1161, 1993.
5. A. Muron and J. Pospisil, "The human iris structure and its usages", Physica, vol. 39, pp. 87–95, 2000.
6. P. C. Kronfeld, "The gross anatomy and embryology of the eye", The Eye, vol. 1, pp. 1–66, 1968.
7. L. Flom and A. Safir, "Iris Recognition System," U.S. Patent 4 641 394, 1987.
8. R. Wildes, J. Asmuth, G. Green, S. Hsu, R. Kolczynski, J. Matey and S. McBride, "A machine-vision system for iris recognition", Machine Vision and Applications, vol. 9, pp. 1–8, 1996.
9. R. Johnson, "Can Iris Patterns be Used to Identify People?" Chemical and Laser Sciences Division LA-12 331-PR, Los Alamos National Laboratory, Los Alamos, NM, 1991.
10. K. Bae, S. Noh and J. Kim, "Iris feature extraction using independent component analysis," Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 838–844, 2003.
11. J. Daugman, "Biometric Personal Identification System Based on Iris Analysis," U.S. Patent 5 291 560, 1994.
12. J. Daugman, "Demodulation by complex-valued wavelets for stochastic pattern recognition," International Journal of Wavelets, Multiresolution and Information Processing, vol. 1, pp. 1–17, 2003.
13. R. Sanchez-Reillo and C. Sanchez-Avila, "Iris recognition with low template size," Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 324–329, 2001.
14. W. Boles and B. Boashash, "A human identification technique using images of the iris and wavelet transform", IEEE Transactions on Signal Processing, vol. 46, pp. 1185–1188, 1998.
15. J. Daugman, "How iris recognition works", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 21–30, 2004.
16. S. Lim, K. Lee, O. Byeon and T. Kim, "Efficient iris recognition through improvement of feature vector and classifier," ETRI Journal, vol. 23, no. 2, pp. 61–70, 2001.
17. C. Tisse, L. Martin, L. Torres and M. Robert, "Person identification technique using human iris recognition", Proceedings of the Vision Interface, pp. 294–299, 2002.
18. T. Tangsukson and J. Havlicek, "AM-FM image segmentation," Proceedings of the IEEE International Conference on Image Processing, pp. 104–107, 2000.
19. J. Havlicek, D. Harding and A. Bovik, "The multi-component AM-FM image representation," IEEE Transactions on Image Processing, vol. 5, pp. 1094–1100, June 1996.
20. B. Kumar, C. Xie and J. Thornton, "Iris verification using correlation filters," Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 697–705, 2003.
21. L. Ma, T. Tan, et al., "Efficient iris recognition by characterizing key local variations," IEEE Transactions on Image Processing, vol. 13, pp. 739–750, 2004.
22. J. G. Daugman, "The importance of being random: statistical principles of iris recognition", Pattern Recognition, vol. 36, no. 2, pp. 279–291, 2003.
23. W. Kong and D. Zhang, "Accurate iris segmentation based on novel reflection and eyelash detection model", Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, 2001.
24. L. Ma, Y. Wang and T. Tan, "Iris Recognition Using Circular Symmetric Filters", National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 2002.
25. N. J. Ritter and J. R. Cooper, "Locating the Iris: A First Step to Registration and Identification," Proceedings of the 9th IASTED International Conference on Signal and Image Processing, IASTED, pp. 507–512, August 2003.
26. J. C. Goswami and A. K. Chan, "Fundamentals of Wavelets: Theory, Algorithms, and Applications", John Wiley & Sons, New York, 1999.
27. S. Mallat and S. Zhong, "Characterization of signals from multiscale edges", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 710–732, 1992.
28. S. Mallat and W. Hwang, "Singularity detection and processing with wavelets", IEEE Transactions on Information Theory, vol. 38, pp. 617–643, 1992.
29. L. Ma, T. Tan, Y. Wang and D. Zhang, "Personal identification based on iris texture analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 1519–1533, 2003.
30. L. Pan and M. Xie, "Research on iris image preprocessing algorithm", IEEE International Symposium on Machine Learning and Cybernetics, vol. 8, pp. 5220–5224, 2005.
31. S. Mallat, "A Wavelet Tour of Signal Processing", Second Edition, Academic Press, New York, 1998.
32. B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, August 1996.
33. A. K. Jain, "Fundamentals of Digital Image Processing," Prentice-Hall Inc., Upper Saddle River, 1989.
34. Chinese Academy of Sciences – Institute of Automation, Database of 756 Greyscale Eye Images, http://www.sinobiometrics.com, Version 1.0, 2003.
35. C. Sanchez-Avila and R. Sanchez-Reillo, "Iris-based biometric recognition using dyadic wavelet transform," IEEE Aerospace and Electronic Systems Magazine, vol. 17, pp. 3–6, October 2002.
36. M. Nabti and A. Bouridane, "An improved iris recognition system using feature extraction based on wavelet maxima moment invariants," Advances in Biometrics, Springer, Berlin/Heidelberg, vol. 4642, pp. 988–996, 2007.
37. M. Nabti and A. Bouridane, "An effective and fast iris recognition system based on a combined multiscale feature extraction technique," Pattern Recognition, vol. 41, pp. 868–879, 2008.
Chapter 5
Spread Transform Watermarking Using Complex Wavelets
5.1 Introduction

The use of wavelets in digital watermarking has increased dramatically over the last decade, replacing previously popular domains such as the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT). The main reason for this relates to several advantages which wavelets offer over these domains, such as better energy compaction and efficiency of computation. The Discrete Wavelet Transform (DWT), however, suffers from some disadvantages: it lacks directional selectivity, so it cannot differentiate between opposing diagonals, and it lacks shift invariance, meaning that small geometrical changes in the input signal can cause large shifts in the wavelet coefficients. To overcome these shortcomings, complex wavelets have been developed. This chapter describes two complex wavelet transform implementations and their properties, and details the benefit of these properties to watermarking. Watermarking schemes can be roughly categorised into two main methodologies: spread spectrum and quantisation-based schemes. Balado terms these interference non-rejecting and interference rejecting schemes, respectively [1]. Spread transform has been developed as a combination of these two methodologies, "spreading" the quantisation over multiple host samples through the use of a vector projection. Spread transform embedding therefore combines the robustness gained from using multiple host samples with the host-interference rejecting nature of quantisation-based schemes, allowing higher levels of capacity to be reached. Furthermore, as watermarking has matured as a subject area, an urgent need has arisen to objectively find the absolute performance limits of watermarking systems. To this end, a process for deriving the capacity of watermarking algorithms has been developed by Moulin [23]. Through statistical modelling of wavelet coefficients and the application of information and game theory, it is possible to derive an estimate of the maximum achievable performance for any given watermarking system and host data. This chapter first introduces the concept of spread transform watermarking and then applies this algorithm and information-theoretic capacity analysis to the case of watermarking with complex wavelets. This will demonstrate the improved levels of capacity that can be achieved through the superior feature representation offered by complex wavelet transforms.
5.2 Wavelet Transforms

The DWT filters the source data with both a high-pass (wavelet) filter (h1) and a low-pass (scaling) filter (h0), and then down-samples the result by 2. The process is repeated recursively on the low-pass section of the resulting signal, as shown in Fig. 5.1. The DWT can be extended to 2D data by applying the filters to the horizontal lines of the image and then repeating the process in the vertical direction at each level on both the coarse and detail subbands, creating four subbands: the low-pass subband (LL) and the horizontal (HL), vertical (LH) and diagonal (HH) detail subbands. Due to the down-sampling at each stage, an N-pixel image results in N wavelet coefficients, so the DWT is non-redundant.
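As a hedged illustration of this decomposition, the snippet below computes one DWT level with the PyWavelets package (not the implementation used in this chapter); pywt labels the three detail subbands horizontal, vertical and diagonal, corresponding to the HL, LH and HH subbands above.

import numpy as np
import pywt

image = np.random.rand(256, 256)       # stand-in for a host image

# One 2D DWT level: approximation (LL) plus the three detail subbands.
# The 'haar' filter pair is illustrative; any h0/h1 pair can be used.
LL, (H, V, D) = pywt.dwt2(image, 'haar')

# Down-sampling by 2 in each direction makes the transform non-redundant:
assert LL.size + H.size + V.size + D.size == image.size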
Fig. 5.1 DWT filterbank: recursive filtering with the low-pass filter h0(n) and high-pass filter h1(n), each followed by down-sampling by 2
The disadvantage of the DWT is its lack of directional selectivity, as shown in Fig. 5.2. The HL and LH subbands are oriented at 0° and 90°, respectively, while the HH subband is oriented at both +45° and −45° (Fig. 5.2). The HH subband thus has no dominant diagonal orientation and is unable to separate the diagonal features of an image. This has important implications for digital watermarking, as it creates difficulty in properly adapting the watermark to the host image. For features in the image that have a dominant diagonal orientation, part of the watermark signal will consequently be perpendicular to the host image feature, increasing the visibility of the watermark. This is demonstrated in Fig. 5.2. In addition, the DWT coefficients tend to produce "checkerboard" artefacts in the embedded watermark. This can make the watermark look unnaturally blocky (Fig. 5.2), making it harder to fulfil the imperceptibility requirement. DWT coefficients also suffer from a lack of phase information, which makes them unsuitable for geometric modelling.
5.2.1 Dual Tree Complex Wavelet Transform

To overcome the deficiencies of the DWT, Kingsbury [19, 20] proposed the use of a complex wavelet filterbank and a dual tree filterbank implementation of the complex wavelet. This involves the application of two DWTs acting in parallel on the same data, each of which can be viewed as one of the two trees of a dual tree complex wavelet transform. The two trees can then be modelled as the real and imaginary parts
Fig. 5.2 Top: DWT coefficients, Middle: DWT level 2 diagonal watermark features added to detail of F16 image (×8), Bottom: DTWT level 2 diagonal watermark features added to detail of F16 image (×8)
of the wavelet transform, respectively. The two DWTs act in parallel on the same data: one DWT acts upon the even samples of the data while the other acts upon the odd samples. The difference and sum of these two DWT decompositions are then taken to produce the two trees of the dual tree wavelet transform (DTWT). If the two DWTs used are the same, then no advantage is gained. However, if the DWTs are designed to be an approximate Hilbert transform of each other, it is possible to obtain a directionally selective complex wavelet transform (Fig. 5.4). This process is demonstrated in Fig. 5.3, with the scaling (h0) and wavelet (h1) filters of the upper DWT and the scaling (g0) and wavelet (g1) filters of the lower DWT applied recursively to their respective low-pass outputs at each level. The sum and the difference of the high-pass subbands produced at each level are then calculated to obtain the coefficients of the dual tree wavelet transform. The application of the transform to 2D data follows the same methodology as that of the DWT. Although the complex version has the advantage of excellent shift invariance, this comes at the cost of 4:1 redundancy for 2D signals. This is due to the use of four DWTs acting in parallel in the case of 2D data.
Fig. 5.3 DTWT filterbank
Fig. 5.4 DTWT wavelet, real (blue solid) and imaginary (red dashed)
This leads to 12 different subbands at each level of decomposition, which places restrictions upon the embedding algorithm, as the watermark in the wavelet domain must have a valid representation in the spatial domain. As a result of the redundancy, much of the power added to the wavelet coefficients will lie in the null space of the wavelet transform and will be lost upon re-composition. For this reason, the lower-redundancy version of the dual tree complex wavelet transform developed by Selesnick et al. [28] is used here instead. This uses only two DWTs acting in parallel for 2D data and so has a much more manageable redundancy of 2:1 (Fig. 5.5) for 2D signals, allowing more freedom when embedding. This decreased redundancy also makes it an attractive option for use in compression [14]. The DTWT overcomes the problem of the DWT's lack of directional selectivity.
Fig. 5.5 DTWT decomposition

Fig. 5.6 DTWT coefficients
The DTWT can discriminate between opposing diagonals with six directional subbands orientated at 15°, 75°, 45°, −15°, −75° and −45° (Fig. 5.6). This allows the watermark embedding to adapt better to diagonal features in the host image. The DTWT also represents horizontal and vertical features better, giving two directional subbands for each. Furthermore, the DTWT is free from the checkerboard artefacts that characterise the coefficients of the DWT.
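One freely available implementation of Kingsbury's transform is the third-party Python package dtcwt; the sketch below (an assumption of this rewrite, not a tool used in the chapter) shows the six directional subbands per level:

import numpy as np
import dtcwt  # third-party package implementing the dual tree CWT

image = np.random.rand(256, 256)
transform = dtcwt.Transform2d()          # default biorthogonal/q-shift filters
pyramid = transform.forward(image, nlevels=3)

# pyramid.lowpass holds the coarse approximation; pyramid.highpasses is a
# tuple with one complex array per level, whose last axis indexes the six
# directional subbands (approximately +/-15, +/-45 and +/-75 degrees).
print(pyramid.highpasses[0].shape)       # e.g. (128, 128, 6)
reconstructed = transform.inverse(pyramid)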
5.2.2 Non-redundant Complex Wavelet Transform

The non-redundant complex wavelet transform (NRCWT) has been developed by Fernandes et al. [13] as an alternative to the class of overcomplete, redundant complex wavelet transforms. It makes use of a tri-band filterbank in which the data is down-sampled by 3 at each stage.
Fig. 5.7 NCWTR and NCWTC filterbanks for real and complex inputs, respectively
There are two filterbanks, NCWTR and NCWTC, and both consist of a real scaling filter (h0) and two complex wavelet filters (h+ and h−). The NCWTR and NCWTC are applied to real and complex inputs, respectively (Fig. 5.7). When applied to real input, the complex filters h+ and h− (Fig. 5.9) produce wavelet coefficients that are complex conjugates of each other, so one set of these complex coefficients can be discarded. In the case of the NCWTR, the output consists of one real output and two complex outputs; being conjugates of each other, one of the complex outputs can be discarded. In the case of the NCWTC, the output consists of three complex outputs; here the two complex wavelet outputs are unique and both must be kept. The NCWTR is first applied to the real-valued rows of the image to be decomposed. This results in one subband of real-valued columns and two subbands of complex-valued columns. The complex-valued columns are conjugates of each other, so one can be discarded as redundant. An input of N coefficients thus produces 5N/3 coefficients (N/3 real and 2N/3 complex coefficients); after discarding one of the complex subbands, N coefficients remain, hence the NCWTR is non-redundant. The NCWTR is then applied to the real-valued columns to produce one real and two complex-valued outputs. Again, one of the complex outputs can be discarded as a conjugate, leaving the real-valued LL band and a complex-valued subband consisting of the horizontal features of the image. The NCWTC is applied to the complex-valued rows to create three complex-valued outputs consisting of the vertical and the two opposing diagonal features of the image, respectively. Due to the down-sampling by three, the storage space required for the three complex subbands is the same as for the original complex subband, so the NCWTC is non-redundant. The 2D NRCWT decomposition is illustrated in Fig. 5.8. The process is repeated on the LL band at each level to produce one real-valued subband and four complex-valued subbands at each level of decomposition. The subbands produced are orientated at 0°, 90°, 45° and −45°, in both real and imaginary parts. While this offers fewer directional subbands than the DTWT, the NRCWT maintains the directional selectivity of the DTWT with regard to diagonal features (Fig. 5.11). However, unlike the DTWT, the transform produces as many coefficients as there are pixels in the original image and is therefore non-redundant (Fig. 5.10). As a result, there is no loss of information in the wavelet coefficients upon re-composition.
Fig. 5.8 NRCWT filterbank for 1 level of decomposition
Fig. 5.9 NRCWT wavelet, real (solid blue) and imaginary (dashed red)
Fig. 5.10 NRCWT decomposition
Fig. 5.11 NRCWT coefficients
In addition the NRCWT coefficients have a high degree of phase coherency. This means that the phase of the coefficients is coherent in places where the coefficients have strong directional tendency.
5.3 Visual Models

To fulfil the imperceptibility requirement of watermarking systems, it is necessary to derive a just-noticeable-distortion (JND) model for the watermark embedding. The JND model assigns to each wavelet coefficient a value that quantifies the maximum distortion that can be applied to that coefficient before creating an unacceptable level of visual distortion. As noted by the psycho-visual studies in [30], the visibility of noise in an image depends on two main factors:

Luminance Masking: The average background luminance of a region affects the visibility of watermark distortions. Distortions in bright and very dark areas of the image are less visible than those in areas of the image with middling levels of brightness.

Contrast Masking: Textured areas and edges in an image, where spatial variations are large, are much better at masking distortions than smoother areas, where the spatial variation is much smaller.

In addition, the visibility of wavelet noise can depend on the orientation of the wavelet coefficients under consideration, with diagonal features generally less visible than horizontal and vertical ones. Also, low-frequency features are generally more visible than higher-frequency features. JND models have been derived for use with discretely sampled wavelets, such as those by Watson et al. [29] and Wolfgang et al. [27]. However, these are not suitable for the more complicated decompositions produced by the complex wavelet transforms. Two methods of deriving the JND model are adapted for use in watermarking with complex wavelets. The first is a "universal" JND model that adapts a spatial JND profile to fit the subband structure under consideration. The second uses a series
of visual tests to derive JND values directly from the coefficients of the wavelet decomposition. A combination of both these methods is also considered.
5.3.1 Chou's Model

Chou's method [8] operates by composing a full-band JND model in the spatial domain and then decomposing it into separate subband JND profiles. The advantage of this is that extensive visual tests are not required to derive the JND values, and the visual model can be applied to any wavelet filter. As covered in the previous section, the JND values are modelled as the dominant effects of both overall luminance and luminance contrast. The full-band JND model is constructed from the following equation [22]:

JND_{fb}(x, y) = \max\{f_1(b_g(x, y), m_g(x, y)),\ f_2(b_g(x, y))\}   (5.1)

f_1(b_g(x, y), m_g(x, y)) = m_g(x, y)\,\alpha(b_g(x, y)) + \beta(b_g(x, y))   (5.2)

f_2(b_g(x, y)) = \begin{cases} T_0\left[1 - \left(\dfrac{b_g(x, y)}{127}\right)^{1/2}\right] + 3 & \text{for } b_g(x, y) \le 127 \\ \gamma\,(b_g(x, y) - 127) + 3 & \text{for } b_g(x, y) > 127 \end{cases}   (5.3)

\alpha(b_g(x, y)) = b_g(x, y) \cdot 0.0001 + 0.115   (5.4)

\beta(b_g(x, y)) = \lambda - b_g(x, y) \cdot 0.0001   (5.5)
Through visual experiments, Chou found T_0, γ and λ to be 17, 3/128 and 1/2, respectively. The values b_g(x, y) and m_g(x, y) are the average background luminance and the luminance contrast around the pixel at (x, y), respectively. They are obtained using the following filters:

G_1 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 3 & 8 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & -3 & -8 & -3 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad
G_2 = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 8 & 3 & 0 & 0 \\ 1 & 3 & 0 & -3 & -1 \\ 0 & 0 & -3 & -8 & 0 \\ 0 & 0 & -1 & 0 & 0 \end{bmatrix}

G_3 = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 8 & 0 \\ -1 & -3 & 0 & 3 & 1 \\ 0 & -8 & -3 & 0 & 0 \\ 0 & 0 & -1 & 0 & 0 \end{bmatrix} \quad
G_4 = \begin{bmatrix} 0 & 1 & 0 & -1 & 0 \\ 0 & 3 & 0 & -3 & 0 \\ 0 & 8 & 0 & -8 & 0 \\ 0 & 3 & 0 & -3 & 0 \\ 0 & 1 & 0 & -1 & 0 \end{bmatrix}   (5.6)

m_g(x, y) = \max_{k=1,2,3,4}\{|\text{grad}_k(x, y)|\}

|\text{grad}_k(x, y)| = \frac{1}{16}\sum_{i=1}^{5}\sum_{j=1}^{5} p(x - 3 + i,\ y - 3 + j)\,G_k(i, j)   (5.7)

And for the average background luminance,

B = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 1 \\ 1 & 2 & 0 & 2 & 1 \\ 1 & 2 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}, \quad
b_g(x, y) = \frac{1}{32}\sum_{i=1}^{5}\sum_{j=1}^{5} p(x - 3 + i,\ y - 3 + j)\,B(i, j)   (5.8)

Finally, the individual subband JND profiles are calculated as follows:

JND_q^2(x, y) = \left[\sum_{i=0}^{3}\sum_{j=0}^{3} JND_{fb}^2(i + 4x,\ j + 4y)\right]\omega_q   (5.9)

for q = 0, 1, . . ., 15.
The parameter α is referred to as the scale parameter, and it models the width of the pdf peak (standard deviation); β is called the shape parameter, and it is inversely proportional to the decreasing rate of the peak (see Fig. 6.5). Note that β = 1 and β = 2 yield the Laplacian and Gaussian distributions, respectively. The value β = 0.5 is widely used in the literature; however, accurate estimates of the parameters α and β can be found as described in [31]. By replacing the pdf of the GGD in Eq. 6.13, the detector can be defined by
l(y) = \sum_{i=1}^{N}\left(\frac{|y_i|}{\alpha_i}\right)^{\beta_i}\left[1 - |1 + \lambda w_i^*|^{-\beta_i}\right]   (6.18)

The threshold η can be obtained using Eq. 6.16, where

\mu_0 = \sum_{i=1}^{N}\frac{1}{\beta_i}\left[1 - |1 + \lambda w_i^*|^{-\beta_i}\right]   (6.19)

Fig. 6.5 pdf of the generalised Gaussian distribution
and

\sigma_0^2 = \sum_{i=1}^{N}\frac{1}{\beta_i}\left[1 - |1 + \lambda w_i^*|^{-\beta_i}\right]^2.   (6.20)
The Laplacian model is simpler than the generalised Gaussian one, since the latter requires interpolation methods to estimate the shape parameter. It has been used to model DWT coefficients in [32, 33]. In this chapter, the Laplacian pdf is obtained by letting β = 1 in Eq. 6.17; likewise, the Laplacian detector is obtained by substituting β = 1 into Eqs. 6.18, 6.19 and 6.20.
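The detector and threshold computation can be sketched as follows; since Eq. 6.16 lies outside this excerpt, the threshold is written here under the usual Gaussian approximation of l(y) under H0, which is an assumption of this sketch (setting beta = 1 gives the Laplacian detector):

import numpy as np
from scipy.stats import norm

def ggd_detect(y, w, lam, alpha, beta, p_fa=1e-3):
    """GGD likelihood-ratio detector (Eqs. 6.18-6.20, sketch).
    y: DWT coefficients; w: candidate watermark in [-1, 1];
    lam: embedding strength; alpha, beta: GGD parameters."""
    u = 1.0 - np.abs(1.0 + lam * w) ** (-beta)
    l = np.sum((np.abs(y) / alpha) ** beta * u)        # Eq. 6.18
    mu0 = np.sum(u / beta)                             # Eq. 6.19
    var0 = np.sum(u ** 2 / beta)                       # Eq. 6.20
    # Assumed Neyman-Pearson threshold under a Gaussian approximation of
    # l(y) under H0 (the exact rule, Eq. 6.16, is not in this excerpt).
    eta = mu0 + np.sqrt(var0) * norm.ppf(1.0 - p_fa)
    return l, eta, l > eta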
6.5.2 Alpha Stable Model

The symmetric alpha-stable (SαS) distribution family has recently gained considerable interest for both empirical and theoretical reasons, in particular its capability of modelling heavy-tailed data in various applications. It has been used in [34] to model DWT coefficients, especially to take heavy-tailed coefficients into account. The SαS distribution is best characterised by its characteristic function

\varphi(\omega) = \exp(j\delta\omega - \gamma|\omega|^{\alpha})   (6.21)
where α (0 < α ≤ 2) is the characteristic exponent, δ (−∞ < δ < ∞) corresponds to the location parameter, and γ (γ > 0) represents the scale parameter, known also as the dispersion. For values of α in the interval [1, 2], the location parameter δ corresponds to the mean of the SαS distribution, while for 0 < α < 1 it determines its median. The dispersion parameter γ determines the spread of the distribution around the location parameter δ, similarly to the variance of the Gaussian distribution. The characteristic exponent α is the most important parameter of the SαS distribution since it determines its shape. The smaller α is, the heavier the tails of the SαS density and the more impulsive the corresponding random process, while higher values of α correspond to distributions that approach the Gaussian. In fact, no closed-form expressions for the pdf of SαS random variables are known except for α = 2 and α = 1, which correspond to the Gaussian and Cauchy distributions, respectively. The Cauchy pdf is given by

f_X(x; \gamma, \delta) = \frac{1}{\pi}\,\frac{\gamma}{\gamma^2 + (x - \delta)^2}   (6.22)

where δ (−∞ < δ < ∞) corresponds to the location parameter and γ (γ > 0) represents the scale parameter, also known as the dispersion parameter. The peak shape of a Cauchy distribution is controlled by γ: the smaller its value, the narrower the peak, and vice versa (see Fig. 6.6).
Fig. 6.6 Pdf of Cauchy distribution
The two parameters γ and δ can be estimated from the data set using the consistent ML method described by Nolan [35], which gives reliable estimates and provides the tightest confidence intervals. The use of the Cauchy distribution in Eq. 6.13 leads to the following watermark detector:

l(y) = \sum_{i=1}^{N}\left[\ln\left(\gamma^2 + (y_i - \delta)^2\right) - \ln\left(\gamma^2 + \left(\frac{y_i - \delta}{1 + \lambda w_i^*}\right)^2\right)\right]   (6.23)
For the sake of simplicity, the mean μ0 and the variance σ0² are estimated numerically by evaluating l(y) for n fake sequences {w_j : w_j ∈ [−1, +1]; 1 ≤ j ≤ n}, so that the estimated mean and variance of l(y) are given by

\mu_0 = \frac{1}{n}\sum_{j=1}^{n} l_j   (6.24)

\sigma_0^2 = \frac{1}{n - 1}\sum_{j=1}^{n} (l_j - \mu_0)^2   (6.25)

where l_j represents the log-likelihood ratio corresponding to the sequence w_j. Through experiments, we found that a good estimate of μ0 and σ0², with reasonable computational complexity, can be obtained by letting n = 100.
6.6 Experimental Results In this section, modelling the DWT coefficients and the performance of the detectors discussed earlier are evaluated. A wide range of real fingerprint
Fig. 6.7 Test images with different visual quality: (a) "Image 22 1: good quality with normal ridge area", (b) "Image 83 1: good quality with large ridge area", (c) "Image 43 8: small ridge area (latent fingerprint)" and (d) "Image 86 7: poor quality"
A wide range of real fingerprint images from the Fingerprint Verification Competition databases "FVC 2000, DB3" (http://biometrics.cse.msu.edu/fvc00db/index.html) and FVC 2004 is examined [36]. Without loss of generality, results are plotted for the sample images shown in Fig. 6.7, since the results obtained with other images are very similar. These test images were chosen to cover different visual qualities, ranging from high to low, thus allowing the modelling results to be more general and reliable. In these experiments, a three-level DWT using Daubechies' linear-phase 9/7 wavelet is used because it has been adopted as part of the WSQ compression standard.
6.6.1 Experimental Modelling of DWT Coefficients

A set of experiments was carried out to investigate which distribution best models the statistical behaviour of the DWT coefficients of fingerprint images. To do so, two different sets of tests were conducted. In the first, the similarity between the real distribution of the DWT coefficients and the distributions obtained with the GGD, Laplacian and Cauchy models is evaluated using the relative entropy, or Kullback–Leibler (K–L) divergence, while in the second set of tests the Quantile–Quantile (Q–Q) plots are examined. The relative entropy is a measure of the difference between two probability distributions: from the real distribution p to an arbitrary probability distribution q. The smaller the K–L divergence, the more similar the distributions, and vice versa. The K–L divergence is given by

D_{K-L}(p\|q) = \sum_i p_i \ln\frac{p_i}{q_i}   (6.26)
The results obtained are reported in Table 6.1 and show that the GGD provides the smallest K–L divergence for all images except Image 43 8, for which the Cauchy model gives the smallest divergence; for the other images, the divergence is largest when using the Cauchy model. A Q–Q plot is a graphical technique for determining whether two data sets are generated from populations having a common distribution. It is a plot of the quantiles of the first data set against the quantiles of the second. If the two data sets are drawn from populations with the same distribution, the points should fall approximately along a reference line; the greater the departure from this reference line, the greater the evidence that the two data sets come from populations with different distributions. In our experiments, for a given fingerprint image we first estimate the parameters of each model from the DWT coefficients and then generate a large number of random samples drawn from the corresponding model with the estimated parameters. The quantiles of the real DWT coefficients are then plotted against the quantiles of the randomly generated samples. For the Q–Q plot corresponding to the GGD, most of the '+' marks have a straight-line shape for Image 22 1 (Fig. 6.8a) and Image 86 7 (Fig. 6.9b), deviating slightly from the reference line for Image 83 1 (Fig. 6.8b) and more significantly for Image 43 8 (Fig. 6.9a).

Table 6.1 K–L divergence of the high-resolution DWT subbands obtained using the Daubechies 9/7 wavelet at the 3rd level (HL: horizontal subband; LH: vertical subband; HH: diagonal subband)

              GGD      Laplacian   Cauchy
Image 22 1    0.0582   0.1267      0.1741
Image 83 1    0.0661   0.1808      0.2332
Image 43 8    0.1530   0.7098      0.0893
Image 86 7    0.0376   0.1224      0.1542
Fig. 6.8 Q–Q plots of DWT coefficients of sample images (left: "Image 22 1"; right: "Image 83 1") for different models (top: GGD; middle: Laplacian; bottom: Cauchy)
Fig. 6.9 Q–Q plots of DWT coefficients of sample images (left: "Image 43 8"; right: "Image 86 7") for different models (top: GGD; middle: Laplacian; bottom: Cauchy)
For the Laplacian model, most of the '+' marks of the Q–Q plot follow a straight line but with a significant deviation from the reference line for Image 22 1 (Fig. 6.8c), Image 83 1 (Fig. 6.8d) and Image 86 7 (Fig. 6.9d); for Image 43 8, the marks follow a curved shape (Fig. 6.9c). For the Cauchy model, the '+' marks also have a curve-like shape that does not follow a straight line for any of the test images (Figs. 6.8e,f and 6.9e,f). In conclusion, the Q–Q plots for all fingerprint images show that the GGD provides the best fit for the DWT coefficients. These modelling results suggest that the detector based on the GGD should yield better watermark detection performance than those based on the Laplacian and Cauchy models, and that the Laplacian should provide good, acceptable detection results. It is worth noting that none of the three distributions accurately models the coefficient distribution of Image 43 8. The reason is that in this image the region of interest (the ridge area) is small compared to the overall size of the image (i.e. most of the image consists of smooth background).
6.6.2 Experimental Watermarking Results

In these experiments, the watermarks are cast into all coefficients of the high-resolution horizontal (HL), vertical (LH) and diagonal (HH) subbands at level 3. Two main issues are considered. First, the imperceptibility of the watermark is quantitatively evaluated using the Peak Signal-to-Noise Ratio (PSNR). Second, the detection performance is assessed via the probability of false alarm and the probability of true detection. It is worth mentioning that the watermark consists of 12,090 (4,030 coefficients per subband) random real numbers uniformly distributed in the range [−1, +1].

6.6.2.1 Imperceptibility Analysis

First, the dispersion of the watermark in the spatial domain is assessed. Figure 6.10 shows the difference image between the original image and its corresponding watermarked image. As can be observed, the watermark is concentrated in the ridge area. This is explained by the fact that the DWT produces an image-dependent distortion which is mostly concentrated at edges and in textured areas. A second set of experiments was conducted to evaluate the fidelity, a measure of the similarity between the watermarked data and the original, using the PSNR, the most widely used distortion measure in the literature. Figure 6.11 shows the PSNR of the test images for different watermark strengths λ. As can be seen, poor-quality images and images with little texture (i.e. a small ridge area) give higher PSNR values. This is because the DWT coefficients of such images are smaller: since in the multiplicative case the watermark casting depends on the coefficient amplitude, the watermark magnitude added to the image will be smaller.
Fig. 6.10 Difference image between the original image and its corresponding watermarked one: (a) "Image 22 1", (b) "Image 83 1", (c) "Image 43 8" and (d) "Image 86 7"
However, this does not reflect the true fidelity, because it is well known that the human visual system is less sensitive to changes in such regions than in smooth, non-textured areas. For instance, the PSNR, as a perceptual model, suggests that the watermarked version of Image 43 8 should be perceptually better than that of Image 83 1; however, the watermarked Image 43 8 shows more visible distortions than the watermarked Image 83 1.

6.6.2.2 Detection Performance

In order to evaluate the performance of the detectors, the test images were watermarked using λ = 0.10. The Receiver Operating Characteristic (ROC) curves, which are widely adopted in the literature, were used to assess the performance of the detectors.
Fig. 6.11 PSNR of the watermarked images as a function of the watermark strength λ for the four test images
The ROC curves represent the variation of the probability of true detection (P_Det) against the probability of false alarm (P_FA). Perfect detection yields a point at coordinate (0, 1) of the ROC space, meaning that all given watermarks were detected without any false alarm. The theoretical false alarm probability is set in the range 10^{-4} to 10^{-1}. The experimental ROC curves are computed by measuring the performance of the actual watermark detection system, calculating the probability of detection from real watermarked images. Experiments are conducted by comparing the likelihood ratio with the corresponding threshold for each value of the false alarm probability and for 1000 randomly generated watermarks. If the likelihood ratio is above the threshold under H1, the watermark is detected; if it is above the threshold under H0, a false alarm occurs. Blind detection is used, so the parameters of each detector are estimated directly from the DWT coefficients of the watermarked image. It is worth noting that the optimal parameter values for both the GGD and the Cauchy distribution may differ for each DWT coefficient, but for practical purposes a constant value over all coefficients suffices. The results for the sample images are plotted in Fig. 6.12. As can be seen, and as expected, the performance of the GGD detector is significantly better than that of the Laplacian and Cauchy detectors for all images. In addition, the Laplacian detector provides results close to the GGD one for Images 22 1, 83 1 and 86 7. It is worth noting that all detectors generate a very high false alarm rate for Image 43 8. In general, images with a large and well-defined ridge area provide good detection performance because such images have higher DWT coefficients, allowing the embedding of watermarks with higher amplitudes. Indeed, higher watermark amplitudes make the hypotheses H0 and H1 more distinguishable.
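An experimental ROC of this kind can be traced by sweeping a threshold over detector statistics collected under each hypothesis; the following sketch assumes such statistics (l_h0 from random, non-embedded watermarks and l_h1 from embedded ones) have already been gathered:

import numpy as np

def empirical_roc(l_h0, l_h1):
    """Empirical ROC from detector statistics under H0 and H1: for each
    threshold, P_FA and P_Det are the fractions of statistics above it."""
    l_h0, l_h1 = np.asarray(l_h0), np.asarray(l_h1)
    thresholds = np.sort(np.concatenate([l_h0, l_h1]))[::-1]
    p_fa = np.array([(l_h0 > t).mean() for t in thresholds])
    p_det = np.array([(l_h1 > t).mean() for t in thresholds])
    return p_fa, p_det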
Fig. 6.12 ROC curves of the GGD, Laplacian and Cauchy detectors for the test images: (a) "Image 22 1", (b) "Image 83 1", (c) "Image 43 8" and (d) "Image 86 7"
6.7 Conclusions

This chapter has addressed 1-bit multiplicative watermark detection for fingerprint images. Watermarking can be a solution for securing fingerprint data and for thwarting some attacks that may affect the reliability and secrecy of fingerprint-based systems. A watermarking system can be divided into two main processes: embedding and detection. In this chapter, we have focussed on watermark detection, which aims to decide whether a given watermark was embedded in the host data. The problem of detection is formulated theoretically based on an ML
estimation scheme requiring an accurate statistical model of the host data. This theoretical formulation allows for the derivation of optimal detector structures; the optimality of a detector structure depends on the accuracy of the statistical distribution used to model the host data. The watermark is embedded in the DWT domain because the ridges and textures are usually well confined to the DWT coefficients of the high-frequency subbands. In addition, watermarking in the DWT domain is very robust to compression methods such as WSQ, the standard adopted by the FBI and many other investigation agencies. First, modelling of the DWT coefficients was carried out to determine the best model. The generalised Gaussian, Laplacian and Cauchy models were investigated and compared, and the experimental results reveal that the GGD provides the best model of the distribution of the DWT coefficients. The structures of the optimum detectors for the three models were then derived, and their performance was assessed through extensive experiments. It was found that the detector based on the GGD outperforms the Laplacian-based detector, which in turn significantly outperforms the Cauchy detector. The overall performance of the detectors depends on the fingerprint characteristics, specifically on the size of the ridge area relative to the size of the fingerprint image: the bigger the ridge area, the higher the detection performance.
References

1. N. K. Ratha, J. H. Connell and R. M. Bolle, "An analysis of minutiae matching strength," in The 3rd International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA2001), vol. 2091, pp. 223–228, 2001.
2. B. Schneier, "The uses and abuses of biometrics," Communications of the ACM, vol. 42, no. 8, p. 136, August 1999.
3. D. Maltoni, D. Maio, A. K. Jain and S. Prabhakar, "Handbook of Fingerprint Recognition," Springer, New York, 2003.
4. F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proceedings of the IEEE, vol. 87, no. 7, pp. 1079–1107, 1999.
5. M. D. Swanson, M. Kobayashi and A. H. Tewfik, "Multimedia data-embedding and watermarking technologies," Proceedings of the IEEE, vol. 86, pp. 1064–1087, 1998.
6. M. Yoshida, T. Fujita and T. Fujiwara, "A new optimum detection scheme for additive watermarks embedded in spatial domain," International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), pp. 101–104, December 2006.
7. I. G. Karybali and K. Berberidis, "Efficient spatial image watermarking via new perceptual masking and blind detection schemes," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 256–274, June 2006.
8. J. R. Hernandez, M. Amado and F. Perez-Gonzalez, "DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure," IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 55–68, January 2000.
9. A. Briassouli, P. Tsakalides and A. Stouraitis, "Hidden messages in heavy tails: DCT-domain watermark detection using alpha-stable models," IEEE Transactions on Multimedia, vol. 7, no. 4, pp. 700–715, August 2005.
10. T. M. Ng and H. K. Garg, "Wavelet domain watermarking using maximum-likelihood detection," Proceedings of SPIE Security, Steganography, and Watermarking of Multimedia Contents, vol. 5306, pp. 816–826, June 2004.
11. F. Khelifi, A. Bouridane, F. Kurugollu and I. Thompson, "An improved wavelet-based image watermarking technique," Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS2005), pp. 588–592, August 2005.
12. M. Barni, F. Bartolini, A. De Rosa and A. Piva, "A new decoder for the optimum recovery of nonadditive watermarks," IEEE Transactions on Image Processing, vol. 10, no. 5, pp. 755–765, May 2001.
13. Q. Cheng and T. S. Huang, "Optimum detection and decoding of multiplicative watermarks in DFT domain," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2002), pp. IV-3477–IV-3480, May 2002.
14. J. J. K. Ruanaidh and T. Q. Pun, "Rotation, scale and translation invariant spread spectrum digital image watermarking," Signal Processing, vol. 66, no. 3, pp. 303–318, 1998.
15. C. Y. Lin, M. Wu, J. A. Bloom, I. J. Cox, M. Miller and Y. M. Lui, "Rotation, scale, and translation resilient public watermarking for images," IEEE Transactions on Image Processing, vol. 10, no. 5, pp. 767–782, May 2001.
16. S. Pankanti and M. M. Yeung, "Verification watermarks on fingerprint recognition and retrieval," Proceedings of SPIE, Security and Watermarking of Multimedia Contents, vol. 3657, pp. 66–78, 1999.
17. S. Jain, "Digital watermarking techniques: A case study in fingerprints and faces," Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, pp. 139–144, 2000.
18. B. Gunsel, U. Umut and A. M. Tekalp, "Robust watermarking of fingerprint images," Pattern Recognition, vol. 35, no. 12, pp. 2739–2747, 2002.
19. A. K. Jain and U. Uludag, "Hiding biometric data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1494–1498, November 2003.
20. F. Ahmed and I. S. Moskowitz, "Composite signature based watermarking for fingerprint authentication," Proceedings of the 7th Workshop on Multimedia and Security, pp. 137–142, 2005.
21. F. Ahmed and I. S. Moskowitz, "A correlation-based watermarking method for image authentication application," Optical Engineering Journal, vol. 43, no. 8, pp. 1833–1838, 2004.
22. K. Zebbiche, L. Ghouti, F. Khelifi and A. Bouridane, "Protecting fingerprint data using watermarking," Proceedings of the 1st AHS Conference, pp. 451–456, June 2006.
23. M. K. Khan, L. Xie and J. Zhang, "Robust hiding of fingerprint-biometric data into audio signals," Proceedings of the 2nd International Conference on Biometrics (ICB2007), vol. 4642/2007, pp. 702–712, August 2007.
24. G. F. Elmasri and Y. Q. Shi, "Maximum likelihood sequence decoding of digital image watermarks," Proceedings of SPIE Security and Watermarking of Multimedia Contents, pp. 425–436, 1999.
25. Q. Cheng and T. S. Huang, "An additive approach to transform-domain information hiding and optimum detection structure," IEEE Transactions on Multimedia, vol. 3, no. 3, pp. 273–284, September 2001.
26. A. Papoulis, "Probability, Random Variables, and Stochastic Processes," McGraw-Hill, New York, 1991.
27. J. V. Di Franco and W. L. Rubin, "Radar Detection," SciTech Publishing, Raleigh, January 2004.
28. T. Ferguson, "Mathematical Statistics: A Decision Theoretical Approach," Academic Press, New York, 1967.
29. X. G. Xia, C. G. Boncelet and G. R. Arce, "Wavelet transform based watermark for digital images," Optics Express, vol. 3, no. 12, pp. 497–511, December 1998.
30. G. C. Langelaar, I. Setyawan and R. L. Lagendijk, "Watermarking digital image and video data: A state-of-the-art overview," IEEE Signal Processing Magazine, vol. 17, no. 5, pp. 20–46, September 2000.
References
141
31. M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance,” IEEE Transaction on image processing, vol. 11, no. 2, pp. 146–158, February 2002. 32. Y. Hu, S. Kwong and Y. K. Cha, “The design and application of dwtdomain optimum decoders,” In First International Workshop, IWDW2003, vol. 2613/2003, pp. 25–28, 2003. 33. T. M. Ng and H. K. Garg, “Maximum likelihood detection in dwt domain image watermarking using Laplacian modeling,” IEEE Signal Processing Letters, vol. 12, no. 4, pp. 285–288, April 2005. 34. G. Tzagkarakis and P. Tsakalides, “A statistical approach to texture image retrieval via alphastable modeling of wavelet decomposition,” In 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pp. 21–23, 2004. 35. J. P. Nolan, “Maximum Likelihood Estimation and Diagnostics for Stable Distributions,” Technical report, American University, Washington June 1999. 36. R. Cappelli, D. Maio, D. Maltoni, J. L. Wayman and A. K. Jain, “Performance Evaluation of Fingerprint Verification Systems,” IEEE Transactions on Pattern Analysis Machine Intelligence, vol. 28, no. 1, pp. 3–18, January 2006.
“This page left intentionally blank.”
Chapter 7
Shoemark Recognition for Forensic Science: An Emerging Technology
SoleMate, Foster & Freeman's database of shoeprints, proved its worth recently during the investigation of the murder of a woman in the kitchen of her Coventry home in the UK. Using the database, officers from the West Midlands Police confirmed that the suspect's shoeprint had been produced by a Mountain Ridge™ shoe, unique to JJB Sports, a UK nationwide shoe retailer; that two models used that particular sole; and that this footwear was first available during the summer of 2002. With this information, police officers were able to locate the store in which the footwear was purchased and to trace the shop's copy of the actual receipt issued to the suspect, later confirmed when an empty shoe box was found at the suspect's home.
Foster and Freeman Ltd.
It is generally understood that marks left by an offender's footwear at a crime scene may be helpful in the subsequent investigation of the crime. For example, shoeprints can be used to link crime scenes where offences have been committed by the same person. They can also be used to target the most prolific offenders by alerting officers to watch out for particular shoemarks. In addition, shoeprints can provide useful intelligence during interviews so that other crimes can be attributed to an offender. Finally, they can provide evidence of a crime if a shoeprint matches a suspect's shoe. This chapter introduces shoemark evidence as an emerging tool for forensic use. It starts by giving a detailed background on the contribution of shoemark data to scene of crime officers, including a discussion of the methods currently in use to collect shoeprint data. Methods for the collection of shoemarks are detailed and the problems associated with each method highlighted. Finally, the chapter gives a detailed review of existing shoemark classification systems.
7.1 Background to the Problem of Shoemark Forensic Evidence

A shoemark is a mark made when the sole of a shoe comes into contact with a surface. People committing crimes inevitably leave their shoemarks at the crime scene.
Each time a person takes a step, some interaction between his or her shoes and the surface occurs. This could be a deformation of the surface or the transfer of trace materials and residue from the shoe to the surface. Where the surface is deformable, e.g. snow or sand, a three-dimensional impression is created by the pressure exerted on that surface. When the surface is solid, a visible pattern may still be transferred to the surface from the sole as a result of an exchange of materials between the shoe and the surface. It should be noted that not all shoemarks are visible or detectable within the limits of current technology, but the chances are excellent that a great number of them will be. The author of [6] claimed that there is an equal, and perhaps even greater, chance that footwear impressions will be present at a crime scene than latent fingerprints. So far, the latter have been widely accepted as a powerful tool in forensic applications, while footwear impressions are also being recognised as a potential aid in forensic investigations. A study in [2] suggests that footwear impressions can be located and retrieved at approximately 35% of all crime scenes. Statistics from the UK Home Office show that 14.8% of crime scenes attended by crime scene investigators in 2004–2005 yielded shoeprint evidence; the crimes investigated consisted primarily of burglaries. It is also reported that by emphasising the potential evidence of shoemarks to crime scene personnel, and by teaching the basics of locating and recovering footwear impressions, the percentage of cases in which footwear impression evidence was submitted to the laboratory increased from less than 5% to approximately 60% [6]. Figure 7.1 shows some examples of shoemarks recovered from different crime scenes.
7.1.1 Applications of a Shoemark in Forensic Science

As a form of physical evidence, shoemarks provide an important link between a criminal and the place where the crime occurred. In some cases, shoemarks can be positively identified as having been made by a specific shoe to the exclusion of all other shoes; this very often relies on a physical match of the random individual characteristics the shoe has acquired with the corresponding features in the impression (Fig. 7.2). Here, a physical match means that the questioned impression and the known shoe share one or more confirmed random characteristics, such as accidental cuts, tears or other random features of varied size and shape, a match that, in the opinion of a qualified shoemark expert, could not be repeated on another outsole. In other cases, the detail revealed in a shoemark may not be sufficient to positively identify a specific shoe, but the information may still be very significant: it can reveal the type, make, description and approximate or precise size of the footwear that made the mark, and so may still be very useful for forensic investigations. The following lists some of the forensic applications of shoemarks:
Fig. 7.1 Examples of shoeprint images retrieved from crime scenes (Foster & Freeman Ltd.)
Fig. 7.2 Examples of random characteristics of a shoemark
• Assisting in the process of developing a suspect. There is no doubt that, in many cases, a positive identification can serve as sound evidence in a court of justice. Furthermore, less conclusive identifications can also be used to verify or rebut information provided by witnesses or suspects, and can provide investigators with clues as to who may be considered a suspect.
• Assisting in the reconstruction of a crime. The locations, characteristics and orientations of shoemarks can often help in determining the number of suspects and their path into, through and away from the crime scene.
• Assisting in linking a number of crimes, especially burglaries. It is known that the majority of crimes are committed by repeat offenders and that it is common for burglars to commit a number of offences in the same day [11]. As it would be unusual for an offender to discard their footwear between committing different crimes, timely identification and matching of shoemarks allows various crime scenes to be linked. The linking of crime scenes provides more information about the activities of an offender and therefore increases the chances of identifying the responsible party; it is also more efficient to investigate the crimes of an offender as a whole than individually [14].
7.1.2 The Need for Automating Shoemark Classification

In this section, justification is provided for the research work described in the following chapters. The section begins by describing some of the problems that occur with existing systems, problems which result in shoemarks being poorly classified and/or misidentified. Many systems already exist for the purpose of classifying and identifying shoemarks. Some are computerised and some are manual, but no system, commercial or otherwise, performs the identification process automatically.
Currently only about 14% of shoemarks collected from crime scenes are actually examined, and a much smaller number of these are actually identified. The best existing systems recognise only about 25% of recovered shoemarks, which is only about 3.5% of the total available shoemarks at crime scenes [10]. These statistics relate to shoemarks collected and examined in Holland; informal conversations with British forensic scientists suggest that figures in the UK are roughly the same. There are several reasons why more impressions are not identified. They fall mainly into one of three categories: inconsistent classification, non-portable classification schemas and insufficient processing time.
7.1.3 Inconsistent Classification

Inconsistent classification primarily occurs because of the large number of shoemark descriptors required to uniquely classify a shoemark. Shoemark descriptors are the names used to describe the particular patterns seen in a shoemark image. This type of classification is called feature-based classification, and its success relies on finding and identifying the key features of the shoemark pattern. Until the late 1970s shoe soles did not contain much variation. Often several different manufacturers used the same sole for a number of different models of shoes, and the complexity of the sole pattern was low. Different size soles were simply cut from the same sheet of rubber rather than using scaled patterns. This meant that when classifying a shoemark it was often impossible to distinguish between several different shoemarks, although it did mean that the investigator responsible for classifying and cataloguing the mark had a comparatively easy task. When subsequently searching for a shoemark, it was likely that a large number of possible matches would be found. This required the forensic scientist to compare the accidental characteristics of each of them to achieve a final match. Accidental characteristics are defects in the shoemark caused either by the manufacturing process or by the sole wear that occurs over time. When trying to identify a particular suspect's shoemark, rather than just a brand or model, accidental characteristics must be examined. However, this process is very time consuming, and effective classification of shoemarks found at crime scenes helps reduce the number of shoemarks that have to be compared in this way. With the advent of injection moulding and modern manufacturing techniques the number of different shoe soles on the market rose dramatically. To be able to distinguish between the different soles, more and more descriptors needed to be added to the existing classification schemas to prevent two different soles generating the same classification. Whereas this theoretically increased the resolving power of the classification scheme, it introduced a new source of error. When the number of features was small, an individual could consistently classify a shoemark. As the number of features increased, the possibility of inconsistently describing a feature also increased. For example, if the only descriptors available were straight line and wavy line, it would be easier to consistently classify a sole with wavy-line patterns than if the descriptors also included zigzag line and curvy line (Fig. 7.3).
Fig. 7.3 Example of features and descriptors
While shoemark classification remains a subjective human process it is unlikely to be consistent. When a single operator is responsible for classifying all shoemarks submitted to a laboratory this problem is minimised, as the operator learns to classify consistently. In practice, however, a single laboratory often has to deal with too many impressions for it to be practical for a single operator to classify them all.
7.1.4 Non-Portable Classification Schemas

In the past most forensic laboratories had paper catalogues containing stock images of manufacturers' shoe soles. These catalogues were often organised in a proprietary way, making it difficult for different laboratories to share information. They also required a skilled operator who knew the system, making them difficult to use when that person was absent. Later, as laboratories began to migrate their existing paper systems to computers, experts acknowledged that there was a need to create a standardised classification scheme, and saw this migration as an ideal time to implement it. Many new schemes were proposed, but universal agreement on a standard was never reached. Many of the new schemes were implemented and used in various forensic laboratories around the world. A major problem now exists because the experts missed the opportunity to create a single all-encompassing standard. Most of the major police
forces in Europe use their own proprietary classification scheme, and in some countries more than one is used. In an age where there is very little restriction on where people can travel, it has become ever more desirable that laboratories in different locations can quickly and efficiently share data. For example, if a crime is committed in Holland and the suspect crosses the border into Belgium, the Dutch police force has no way of sharing its shoemark intelligence with the Belgian police force other than to share the original shoemark and allow the Belgian police to reclassify it. The problem of proprietary classification schemes has not been solved but merely exacerbated. In addition, where different laboratories use the same classification schema they are still unlikely to produce the same classification when given identical shoemarks to classify [12].
7.1.5 Shoemark Processing Time Restrictions

In areas with a high crime rate the number of shoemarks produced is often too large for the investigating team to practicably examine all of them. This is true both for the number of shoemarks created at the crime scene and often for the number of shoemarks actually collected. For this reason valuable shoeprint evidence, especially in lower profile cases such as property crime, is often not utilised. For example, within a couple of months of starting a new shoemark identification programme in one East London borough, a backlog of 1,500 shoemarks built up [13].
7.2 Collection of Shoemarks at Crime Scenes

In the previous section we identified several problems that occur during the identification of shoemarks. Many of these arise from problems associated with the initial classification of the shoemark. One approach to minimising inconsistent classification due to subjective human interpretation of the marks has been to limit the number of descriptors used during classification. This is in stark contrast to the natural evolution of shoemark classification schemas, which have tended to add more and more descriptors as the complexity of shoemarks increased. Some modern classification systems under development have invested time in determining the smallest set of descriptors that can be used while still providing effective classification. However, reducing the number of descriptors generally reduces the resolving power of the system and therefore its usefulness. A number of researchers are working on the problem of reducing classification error, and their particular approaches are discussed in Section 7.4, including a description of a number of the most common existing shoemark classification schemes. Parallel to the effort being made by researchers in forensic science, there is also a considerable effort being made in the image processing domain to develop algorithms for general-purpose image classification and recognition. It seems natural that
tools being developed in image processing and computer vision could be applied to the problem of shoemark classification and identification. The use of image processing in real-world situations outside of computer science is also increasing, including its use in fingerprint and DNA identification. Unfortunately, due to the nature of shoemark evidence, in particular the fact that it is often only suitable for use as corroborative evidence in court [13], there has been less interest from the media and the scientific community in it than in other areas of forensic imaging.
7.2.1 Shoemark Collection Procedures

The examination of shoemarks in manual systems consists of three activities: collection, classification and matching. A forensic scientist, or more commonly a Scene of Crimes Officer (SOCO), will collect the shoemark from the crime scene. It will then be classified and analysed in a controlled laboratory environment at a later date. The techniques used for collecting shoemarks differ according to the process involved in creating the mark.

Types of Shoemarks. Shoemarks can be broadly split into two types: impressions and transfer/contact prints.

Shoemark Impressions. Impressions occur when the shoe comes into contact with a soft, malleable surface such as soil, snow or sand. The result of the contact is an impression showing details of the "tread pattern" of the shoe in three dimensions. In burglaries in the UK it is common to find this type of shoemark in soil outside the point of entry.
7.2.2 Transfer/Contact Shoemarks

Transfer/contact prints are created by the transfer of a material such as blood, mud, paint or water to the sole of the shoe and then, in turn, to the contact surface. In burglaries this type of shoemark is often found on windowsills where a burglar has placed his/her foot while climbing through an open or broken window. It is also common to find this type of shoemark on fragments of broken glass lying just inside the point of forced entry through a window or door; this is often how partial shoemark images are created. Sometimes the action of the sole of the shoe on a surface may remove something from that surface. This is often the case when walking on dusty surfaces. Shoemarks can be found on many different surfaces. Tables 7.1 and 7.2 show the likelihood of shoemarks being found on various floor surfaces.
Table 7.1 Likelihood of detectable marks occurring on different two-dimensional shoe/surface combinations [5]

Surface                                                 | Damp and wet shoes | Shoe with blood, grease, oil | Dry shoes with dust or residue | Clean dry shoe
Carpet                                                  | Unlikely           | Very likely                  | Likely                         | Unlikely
Dirty floor with accumulation of dust, dirt or residue | Likely             | Very likely                  | Unlikely                       | Unlikely
Relatively clean but unwaxed floor                      | Likely             | Very likely                  | Very likely                    | Unlikely
Clean waxed tile or wood floor                          | Likely             | Very likely                  | Very likely                    | Likely
Waxed bank counter, desk top, etc.                      | Likely             | Very likely                  | Very likely                    | Likely
Glass                                                   | Very likely        | Very likely                  | Very likely                    | Likely
Kicked-in door                                          | Very likely        | Very likely                  | Very likely                    | Likely
Paper, cardboard                                        | Very likely        | Very likely                  | Very likely                    | Likely
Table 7.2 Detection of footwear marks after walking on various floor surfaces for five minutes [3]

Premises           | Floor type                  | Area                             | Footwear mark
Household kitchen  | Vinyl                       | General surfaces                 | Detected
Fish and chip shop | Tiles                       | Customer area and behind counter | Detected
Sandwich bar       | Tiles                       | Customer area and behind counter | Nothing detected
Butchers shop      | Paint with sawdust covering | Customer area                    | Nothing detected
House              | Carpet                      | Dining room                      | Detected
House              | Carpet                      | Living room                      | Detected
Kitchen            | Tiles                       | General surfaces                 | Nothing detected
Garage             | Concrete                    | Oily area                        | Detected
Office             | Vinyl                       | General surfaces                 | Detected
Specialised techniques have been developed for recording or "lifting" shoemarks found on these surfaces, and these are described in detail in the remainder of this section.
7.2.3 Photography of Shoemarks

The most common technique used when collecting shoemarks is photography. It is usual to photograph all visible shoemarks before attempting other collection techniques. When photographing a shoemark, a slow-speed (typically ISO 100) black and white film is used. Slower films typically have higher resolution than
faster films, which may appear "grainy". The use of high-resolution film is necessary because the characteristics the forensic examiner may be interested in are often so small that they are not visible to the naked eye [7]. The use of black and white film is often preferred because some experts believe that the additional layers of emulsion in colour films result in a lower-contrast, less easily examined image. This consideration remains important even when using computerised systems: although the digitisation of shoemark photographs before they are entered into a computerised system results in a loss of resolution, wherever possible the original images will be presented as evidence in court. When photographing a shoemark, a scale is positioned adjacent to, and on the same plane as, the shoemark. This allows accurate measurements of the shoemark to be made in the laboratory, or easier verification that the shoemark was printed at 1:1 scale. A label identifying the impression, its orientation and its location is also placed within the image frame. This minimises the possibility that different shoemarks photographed in the same location become confused with each other, and helps to substantiate the continuity of evidence. The physical process of taking the photograph requires that the camera be mounted on a tripod and positioned directly over the impression. The film plane should be adjusted so that it is parallel to the plane of the shoemark; this helps to minimise perspective distortion. The frame in the camera's viewfinder is adjusted so that it includes the shoemark, the scale and the label, and the camera is focused on the shoemark rather than the scale, as the scale is often at a slightly different focal depth. Wherever possible, strong sources of ambient lighting are disconnected to remove shadow. When flash is needed it is diffused or positioned at a distance from the shoemark so that it does not result in unwanted glare.
7.2.4 Making Casts of Shoemarks

When an impression is left in soil, snow or other easily deformable material, it is common for a cast to be made using Plaster of Paris(1) or Dental Stone(2). Where the shoemark is left in snow, the site of the impression may require "setting" with Snow Print Wax(3) before Plaster of Paris can be used; alternatively, the cast may be made with liquid sulphur.
(1) Plaster of Paris is a mixture of powdered and heat-treated gypsum that, when mixed with water, flows freely but hardens to a smooth solid as it dries.
(2) Dental Stone is a pliable material used in the dental industry for making impressions of teeth and gums.
(3) Snow Print Wax is a commercial product used to add strength to an impression left in snow.
If any lightweight debris has fallen into the impression since it was created, it should be removed with tweezers; care should be taken not to remove any extraneous matter that is part of the impression itself. Where the impression has been made in loose sandy soil, a fixing agent such as hairspray may be used to "bind" loose particles together prior to casting. The liquid casting material should be poured into the impression from a height of only a few centimetres; this helps prevent damage to the surface of the impression. The material should be poured from a position to the side of the part of the impression caused by the shoe arch, as this area contains the least useful information when classifying the shoemark. The liquid should be poured until the mixture overflows the impression and onto the ground surface. As the material starts to harden, information pertinent to the case should be scratched into the cast surface. The cast should be left for at least 20 min to dry in warm weather, longer when it is cold. The cast may then be lifted using a thin-bladed spatula inserted into the soil well beneath the cast. The cast should then be left to air dry for at least 48 h before it is cleaned of extraneous soil by soaking in a solution of potassium sulphate. In the case of a serious crime the cast may be stored and kept as evidence. However, it is more common for the cast to be photographed and the photograph used as evidence. A technique often used when photographing a cast is to use oblique rather than direct lighting. This throws a slight shadow that brings certain detail in the cast into sharp relief, helping the examiners to see small details.
7.2.5 Gelatine Lifting of Shoemarks

Some shoemarks may be collected using gelatine lifting techniques. A "gelatine lift" consists of an inert backing material, usually a polyester film, coated with a low-adhesive gelatine layer. This type of capture technique can be used to collect shoemarks from a wide variety of surfaces, including porous materials such as paper and cardboard, although the surface must be smooth and hard, such as floor coverings and table tops. Gelatine lifts are often able to capture shoemarks from surfaces where the marks are not visible to the naked eye. The lifting process can take up to 10 min for the mark to transfer to the gelatine, depending on the substrate and the transfer material.
7.2.6 Electrostatic Lifting of Shoemarks

Electrostatic shoemark lifting operates by collecting the dust left behind by the shoemark. An electrically charged sheet is placed over the area of the shoemark and the dust particles are attracted to the surface. When the voltage is removed the dust remains on the sheet and may be fixed there using sticky plastic sheeting.
The power supply generates an electric field; the stronger the field, the greater the device's ability to attract a dust mark. The devices are often used to recover shoemarks left on paper, linoleum, wood, carpet and concrete, but cannot be used to collect shoemarks from wet surfaces. The image resulting from electrostatic lifting is the negative of the impression: the contact between the shoe sole and the substrate removed something (usually dust), and the collection process has collected what was not removed.
7.2.7 Recovery of Shoemarks from Snow

Special mention is made here of the procedures used for collecting shoemarks left in snow because, although not commonly found within the United Kingdom, such impressions are commonly found elsewhere in Europe and are usually of very high quality, showing fine, clear and distinctive detail. The process for collecting shoemarks left in snow principally follows the procedures already given for photographing or casting shoemarks. However, a number of problems commonly occur when trying to make a cast in snow using the normal casting techniques. The most common is that the casting material often freezes before it flows into every feature of the impression. Another is that the walls of the impression are prone to collapsing, depositing snow into the bottom of the impression and obscuring detail. Photography also has drawbacks, the resulting images lacking contrast. Spraying the impression with a coloured aerosol from a low angle will reveal more detail in the resulting image when the impression is photographed; this technique produces an effect similar to using oblique lighting while photographing shoemarks. Spraying the impression with Snow Print Wax will seal the impression with a thin wax coating that greatly increases its strength prior to making a cast. When using Snow Print Wax, care must be taken to prevent the pressure from the propellant from damaging the impression. If a cast has to be made using Plaster of Paris, the addition of potassium sulphate to the mixture lowers its freezing point; this prevents ice crystals from forming and increases the setting time so that the liquid has time to flow before it hardens. In some circumstances liquid sulphur may be used as the casting material, as it exhibits similar properties. As described earlier for other types of cast, it is common to photograph shoemark impressions left in snow and to use oblique lighting to highlight relief in casts made from them.
7.2.8 Recovery of Shoemarks Using Perfect Shoemark Scan

A simple technique for capturing shoemarks directly from a shoe sole is to use a commercial product called Perfect Shoemark Scan. The Perfect Shoemark Scan
equipment consists of a chemical-drenched sponge and a paper that is reactive to that chemical. The shoe is pressed into the sponge, ensuring that the toe and heel are coated (this may require the shoe to be rocked end to end), and then pressed onto the reactive paper. After a few minutes an image of the shoemark develops on the paper. This technique is comparatively cheap and simple but is only useful if the actual physical shoe is available.
7.2.9 Making a Cast of a Shoemark Directly from a Suspect's Shoe

If the forensic scientist is comparing a photograph of an impression from a crime scene to a suspect's shoe sole, it is usual for a cast of the suspect's shoe to be made. This new cast can then be lit in the same manner as the recovered cast and photographed. When making a cast from a suspect's shoe, several techniques may be used to create the impression. The shoe may be depressed into Birofoam(4); the foam records the indentations and ridges of the shoe sole. Zetalabor(5) may also be used: it is warmed by kneading in the palm of the hand, and a hardening gel is added before the shoe is pressed firmly into the substance. When the substance dries a cast may be made. Vinylpolysiloxane and Microsil(6) may also be used in a similar way [6].
7.2.10 Processing of Shoemarks

There are a number of possible objectives that an investigator tries to achieve when he/she processes a shoemark. These include:

• Determining if a particular shoemark was made by a particular shoe (in particular, matching the shoemark against a suspect's shoe).
• Matching the shoemark with other shoemark impressions, possibly from other crime scenes.
• Determining the make and model of the shoe that made the mark.
• Classifying the shoemark for archival and possible matching later (this job is often performed by the SOCO).

The first task in this list can only be performed by a qualified forensic scientist. Similar to how fingerprints are processed, a number of points of comparison must be made between the shoe sole and the shoemark. These usually include accidental characteristics, that is, characteristics of the sole that are caused by wear and tear.
(4) Birofoam is a foam material similar to floral foam.
(5) Zetalabor is a silicone substance used in the dental industry.
(6) Vinylpolysiloxane and Microsil are two other silicone dental products used in the creation of dental impressions.
Unless the investigators have a suspect in custody, the shoemark will be checked against the database and a small number of the most likely matches selected. These preliminary matches will then be passed to the forensic scientist for analysis, and hopefully a close match at this point will provide a suspect. The second, third and fourth tasks each require that the shoemark is classified and the classification used to search the database. As such, these processes are all subject to the problems associated with inconsistent classification. The procedure used for classification and identification differs slightly in detail between laboratories but generally follows this sequence of events:

• The shoemark is inspected visually and analytically measured.
• The descriptors that best match the pattern are selected and recorded.

Sometimes the shoemark is split into three sections: the heel, arch and toe. The heel and toe may be classified separately, while the arch may be ignored as it rarely contains any useful information. When comparing an unknown shoemark with a suspect's shoe, the procedure used will be similar to the following. For manual shoemark recognition the examiner needs:

• The shoemark (the original mark, a 1:1 scale photograph or a cast) from the crime scene.
• The suspect's shoe, or a second shoemark from either a photograph or a cast.

When this information is available the examiner proceeds with the following sequence:

• Compare the images/casts looking for dissimilarities.
• Take measurements.
• Compare overall wear.
• Mark any characteristics (place, size and form).
• Compare those characteristics (place, size and form).
• Check back to the shoe outsole to determine whether the characteristics are class or individual, that is, whether they are accidental characteristics or part of the original pattern.
• Draw conclusions: is this the suspect's shoe or not?

The success of the last three tasks in the list of objectives above depends on how accurately the shoemark has been classified by the SOCO.
7.2.11 Entering Data into a Computerised System

Many forensic laboratories have migrated their paper shoemark storage systems to computerised ones. It is important that when shoemark images are stored on a computer, the amount of information lost during digitisation is minimal. When the images are processed, certain artefacts may be introduced. For example, if images are stored in a compressed format such as JPEG, it must be ensured that the compression applied does not:

• Introduce artefacts that affect the classification/identification of the shoemark.
• Prohibit the use of the image under local law.

For the evidence to be admissible in court, any artefact must be predictable and explainable. This, however, is not always possible. For this reason a lossless, high-resolution copy of the image should be kept in the database, with a lower-resolution working copy used by the image processing algorithms in the application; when an image is to be used as evidence the original high-quality version is used, as illustrated in the sketch below.
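As a minimal sketch of this storage policy (the file names, the PNG format and the working-copy width are our own assumptions, not requirements of any particular system), one might keep both copies as follows:

```python
from PIL import Image

def archive_shoemark(source_path, master_path, working_path, working_width=1024):
    """Keep a lossless full-resolution master plus a smaller working copy."""
    img = Image.open(source_path)

    # Master copy: lossless PNG at full resolution, for evidential use.
    img.save(master_path, format="PNG")

    # Working copy: downsampled version for the image-processing algorithms.
    height = round(img.height * working_width / img.width)
    img.resize((working_width, height), Image.LANCZOS).save(working_path, format="PNG")
```

Keeping the working copy in a lossless format as well avoids having to explain compression artefacts should the working copy ever be examined.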
7.3 Typical Methods for Shoemark Recognition

There are two main methods of classifying shoemarks. The first was already mentioned and relies on identifying features of the shoemark and labelling them from a set of predefined descriptors. A common scheme of this type is the Birkett system. A second way of classifying shoemarks is to look for and record accidental characteristics. An example of each is shown in Figs. 7.4 and 7.5.

Component Descriptors:
A – PLAIN/RE-HEEL
B – RANDOM/IRREGULAR
C – LATTICE/NETWORK
D – STRAIGHT
E – CURVED/WAVY
F – ANGLED/ZIG-ZAGGED
G – CIRCULAR, forming basic pattern or a section thereof
H – CIRCULAR, interspersed with other components
I – ANY OTHER SHAPE
J – GEOMETRIC, with three to six straight sides
L – ANY OTHER SHAPE
M – LETTERS/NUMERALS, as part of a name or number
N – "TARGETS", concentric circles, ovals etc. or part thereof
Q – COMPLEX/DIFFICULT
R – Same descriptor applies to different principal components

Fig. 7.4 Shows the key features of two shoemarks; in Fig. 7.5 the relative importance of each section in the classification of the mark is indicated. Under each shoemark the Birkett classification is given

Fig. 7.5 Indicates the key features of two shoemarks. The classification of these features is shown in Fig. 7.6. This diagram is reproduced from the "Scottish Police Detective Training" manual
7.3.1 Feature-Based Classification

Many of the current shoemark databases are based on the Birkett system of shoemark feature classification. Figure 7.6 contains a list of the descriptors used in that classification schema.
Fig. 7.6 Shows how the shoemark may be divided into four sections for classification. Each section has a classification priority indicated. The handwritten letters below each shoemark show the Birkett classification. This diagram is reproduced from the "Scottish Police Detective Training" manual
7.3.2 Classification Based on Accidental Characteristics

In Holland the forensic scientists at the Netherlands Forensic Institute have been using a classification system which includes descriptors representing "accidental
characteristics", i.e. characteristics caused by damage and wear to the shoe sole. This type of characteristic is very important when trying to identify a particular instance of a shoe sole, because it is only the accidental characteristics that differ between instances: all shoe soles leave the factory more or less identical, and it is the random damage that occurs during normal wear that results in differing patterns. The shoemarks are still classified manually and stored by classification in a computer database. When searching the database, accidental characteristics identified on the shoemark are used as well as the standard classification patterns.
7.4 Review of Shoemark Classification Systems

This section reviews some of the shoemark classification/retrieval systems, including semi-automatic systems, such as SHOE-FIT, SHOE©, Alexandre's system, REBEZO, TREADMARK™ and SICAR, as well as automated ones, such as SmART, De Chazal's system and Zhang's system. It should be noted that some of the semi-automatic systems have been used in real forensic investigations by various agencies, and some, like TREADMARK™ and SICAR, have already been successfully commercialised. However, all of the automated systems that have appeared in the literature over the last few years are still in their initial stages, i.e. without any real application examples in forensic investigations.
7.4.1 SHOE-FIT

SHOE-FIT is one of the earliest computerised shoemark databases, developed by Sawyer and Monckton [15]. It is based on an existing system developed by Birkett, which codes shoeprint patterns with a number of letters followed by a numerical sequence. SHOE-FIT prefixes the coded letters with two digits for the year and suffixes them with a three-digit number to uniquely identify a shoemark. A typical code is 94FNM011, which means that the footwear is from 1994 and has a zigzag (F), a target (N) and letters or numbers (M). SHOE-FIT also concerns itself with transferring a footwear impression to a PC in various forms, such as by faxing, scanning and photographing, and with combining a number of tools for image handling, such as format conversion, rotation, resizing, masking and so on. The authors also identify the consistency and compatibility of any shoemark coding system as important.
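As a concrete illustration of this coding convention (our own sketch, not code from SHOE-FIT itself; the century rule is an assumption based on the 1994 example):

```python
import re

# A few Birkett pattern descriptors, as listed in Section 7.3 above.
DESCRIPTORS = {"D": "straight", "E": "curved/wavy", "F": "angled/zig-zagged",
               "M": "letters/numerals", "N": "targets"}

def parse_shoefit_code(code):
    """Split a SHOE-FIT style code, e.g. '94FNM011', into year, patterns and serial."""
    m = re.fullmatch(r"(\d{2})([A-Z]+)(\d{3})", code)
    if m is None:
        raise ValueError(f"not a valid SHOE-FIT code: {code}")
    year, letters, serial = m.groups()
    patterns = [DESCRIPTORS.get(c, "?") for c in letters]
    return {"year": 1900 + int(year), "patterns": patterns, "serial": serial}

print(parse_shoefit_code("94FNM011"))
# {'year': 1994, 'patterns': ['angled/zig-zagged', 'targets', 'letters/numerals'], 'serial': '011'}
```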
7.4.2 SHOE©

SHOE© is a shoeprint capturing, recording and retrieval system developed by the Victoria Forensic Science Centre [4]. The system comprises two parts, SHOEView and
SHOEAdmin, and its database contains 4,000 shoemarks. SHOE© codes a shoemark based on manual identification and recording of the patterns found in it. Some of the popular patterns are categorised into different groups and can be displayed on screen for reference when recording and searching for a shoemark. One attractive point of this system is that it incorporates position information into the recording and retrieval processes by dividing a shoemark into four parts: toe, ball, instep and heel. This can increase the accuracy of the searching process. Another advantage is that each of these partitions can be classified and searched against independently, so it is possible to search for images that share characteristics in any combination of the four partitions. In this way the system is able to search for partial shoemarks.
7.4.3 Alexandre's System

Alexandre [2] has described a shoemark classification system developed by the police department of Neuchatel. In this system, the SOCO (Scene of Crime Officer) footwear manual was selected as the basic coding reference. The system extends this reference with additional letters denoting groups, and numbers denoting subgroups following the letters. Like SHOE©, this system also divides a shoeprint into different partitions (sole, instep and heel), so it can search for a partial shoemark. In total, the classification system has 12 groups and 40 subgroups, and it contains 12,000 shoemarks. However, the coding process is still carried out manually, and a trained officer is needed to record a shoemark.
7.4.4 REBEZO

REBEZO was designed by Geradts et al. at the National Forensic Science Laboratory of the Ministry of Justice in the Netherlands, in cooperation with the Dutch police. As in the systems described above, shoemarks are classified using a set of pattern descriptors that the investigator selects from. One of the problems with this system, as with other manual systems, is inconsistent classification, which motivated Geradts et al. to develop an automatic classification approach using Fourier analysis and a neural network [10]. The approach first thresholds a shoemark and then applies morphological techniques to segment the patterns of the image, before the Fourier descriptors and moments of each pattern are fed into a neural network for classification (a rough sketch of such descriptors is given below). However, their experimental results [9] suggested that this attempt was not able to give a sound classification because of unreliable segmentation caused by noise and artefacts in the shoemarks.
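The flavour of the descriptor stage can be sketched as follows; the segmentation is assumed to have already produced a binary pattern, and the normalisation choices are ours rather than those of Geradts et al.:

```python
import numpy as np
from skimage import measure

def fourier_descriptors(binary_pattern, n_coeffs=16):
    """Fourier descriptors of the boundary of one segmented sole pattern."""
    # Trace the longest boundary of the binary pattern.
    contour = max(measure.find_contours(binary_pattern.astype(float), 0.5), key=len)
    z = contour[:, 1] + 1j * contour[:, 0]          # boundary as a complex sequence

    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                                 # drop DC term: translation invariance
    mags = np.abs(coeffs)                           # drop phase: rotation/start-point invariance
    mags /= mags[1] if mags[1] > 0 else 1.0         # normalise by first harmonic: scale invariance
    return mags[1:n_coeffs + 1]                     # low-order descriptors as feature vector
```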
7.4.5 TREADMARK™

TREADMARK™ is a shoemark analysis and identification system developed by CSI Equipment Ltd. The manufacturer claims it is the only system available today that utilises all four parameters of pattern, size, damage and wear to identify individual footwear impressions and compare them automatically with impressions from both a suspects' database and a scene-of-crime (SoC) database. Here, the automation refers only to the matching and searching of the database: users must still manually code shoemarks by patterns or other characteristics. One point of difference from other systems is that TREADMARK™ requires the user to indicate the position of accidental or random characteristics on the shoe sole. It records the positions of these characteristics and uses them to search for other shoemarks with accidental characteristics in similar positions. More details about TREADMARK™ are available on the website of CSI Equipment Ltd. (2006, http://www.k9sceneofcrime.co.uk/systems.aspx).
7.4.6 SICAR

SICAR, developed by Foster and Freeman Ltd, London, UK, is one of the most successful commercial systems for shoemark archiving and classification/retrieval. It has been widely used by police forces and forensic laboratories in the UK. The most recent version is SICAR 6, which is claimed to be able to archive shoemarks from both suspects and scenes of crime. Combined with SoleMate, a reference database of shoemarks from shoe manufacturers developed by the same company, SICAR can be used to identify information about a scene image, such as the manufacturer, the release date and so on. Like other semi-automatic systems, this system requires an operator to classify the shoemark by assigning codes to individual features in it. The classification is then stored in the database and can be searched later. SICAR adopts a simple coding technique to characterise shoeprints, which forms the basis of many of the database search and match operations. The process enables the operator to create a coded description of the pattern of a shoe sole by identifying elemental pattern features such as lines, waves, zigzags, circles, diamonds and blocks, each of which bears a unique code. Like SHOE©, this is a straightforward selection process, as each type of elemental pattern is displayed, with variants, for the operator to choose from. SICAR has also been extended to other databases, such as tyre treads (Foster and Freeman Ltd., 2008, http://www.fosterfreeman.co.uk/sicar.html).
7.4.7 SmART

Although Geradts' attempt to develop an automated shoemark classification system was not successful, work in this area continued. Alexander et al., in their
paper [1], propose a fully automated shoemark classification system. As the first automatic shoemark classification system, SmART can automatically search against a database of shoeprints. The authors apply fractal codes to represent a shoeprint, and a mean square noise error method is used to determine the match results. The algorithm was tested on a database containing 32 shoemarks.
7.4.8 De Chazal's System

De Chazal et al. [8] describe another fully automated shoemark classification system, which can automatically search against a database of shoemarks based on the outsole patterns in the Fourier domain. The algorithm utilises the power spectral density (PSD) function of a shoemark as the pattern descriptor. To address orientation invariance, a set of PSDs is computed for different rotations of the query shoemark. In addition, low- and high-frequency components are removed to reduce the grey-level variance while keeping the distinctive information of a shoemark. De Chazal et al. tested their approach on a database of more than 1,000 shoemarks from the Forensic Science Laboratory, Garda Headquarters, Dublin, Ireland. They also generated a partial shoemark database to test the robustness of their approach. An accuracy of 67%, measured as the first-ranked mark being in the same category as the query mark, was reported on their database.
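A minimal sketch of a PSD descriptor in this spirit is shown below; the annulus radii used to discard the low- and high-frequency components are illustrative assumptions, not the values used by De Chazal et al.:

```python
import numpy as np

def psd_feature(shoemark, low_cut=2, high_cut=64):
    """Band-limited power spectral density of a shoemark image as a pattern descriptor."""
    f = np.fft.fftshift(np.fft.fft2(shoemark - shoemark.mean()))
    psd = np.abs(f) ** 2

    # Keep a mid-frequency annulus: discard very low frequencies (grey-level
    # variance) and very high frequencies (noise), as described above.
    h, w = psd.shape
    yy, xx = np.ogrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    r = np.hypot(xx, yy)
    psd[(r < low_cut) | (r > high_cut)] = 0.0
    return psd / (np.linalg.norm(psd) + 1e-12)
```

Rotation invariance is then approximated by computing the feature for a set of rotated versions of the query and keeping the best-matching rotation.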
7.4.9 Zhang's System

Zhang et al. [16] have proposed representing a shoemark based on sole pattern edge information. The demonstration system they developed can automatically archive and retrieve a shoemark using an edge direction histogram, which is claimed to be inherently scale, translation and rotation invariant. The system provides the user with a short ranked list of best-matched shoemarks in response to a query impression. The retrieval test of their approach was performed on a database of 512 full shoemarks. The authors also generated degraded images (rotated, scaled and corrupted with Gaussian noise) to evaluate the robustness of their approach to different degradations. They reported that an accuracy of 85.4% was obtained on their databases.
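The essence of such an edge-direction-histogram signature can be sketched as follows; the bin count and the gradient operator are our own choices, while the Canny edge detection and the 1D FFT magnitude step follow the description above:

```python
import numpy as np
from skimage import feature

def edge_direction_signature(image, n_bins=72):
    """Rotation-invariant signature from a histogram of edge directions."""
    image = image.astype(float)
    edges = feature.canny(image)                 # significant edges (Canny detector)
    gy, gx = np.gradient(image)
    angles = np.arctan2(gy, gx)[edges]           # gradient direction at edge pixels

    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    hist = hist / (hist.sum() + 1e-12)           # normalised edge direction histogram

    # An image rotation circularly shifts the histogram; the magnitude of its
    # 1D FFT is unchanged by such shifts, giving a rotation-invariant signature.
    return np.abs(np.fft.fft(hist))
```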
References

1. A. G. Alexander, A. Bouridane and D. Crookes, "Automatic classification and recognition of shoeprints," Special Issue of the Information Bulletin for Shoeprint/Toolmark Examiners, vol. 6, no. 1, pp. 91–104, 2000.
2. G. Alexandre, "Computerized classification of the shoeprints of burglars' soles," Forensic Science International, vol. 82, pp. 59–65, 1996.
3. R. Ashe, R. M. E. Griffin and B. Bradford, "The enhancement of latent footwear marks present as grease or oil residues on plastic bags," Science and Justice, vol. 40, no. 3, pp. 183–187, 2000.
4. W. Ashley, "What shoe was that? The use of a computerised image database to assist in identification," Forensic Science International, vol. 82, pp. 67–79, 1996.
5. W. J. Bodziak, "Footwear Impression Evidence," Elsevier, New York, 1990.
6. W. J. Bodziak, "Footwear Impression Evidence: Detection, Recovery, and Examination," 2nd Edition, CRC Press, 2000, ISBN 0-8493-1045-8.
7. M. Davis, details on shoeprint photography provided by email communication, MSSC Regional Crime Lab, Joplin, MO, US, and Newton County Sheriff's Department, Neosho, MO, 1998.
8. P. D. De Chazal, J. Flynn and R. B. Reilly, "Automated processing of shoeprint images based on the Fourier transform for use in forensic science," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 341–350, 2005.
9. Z. Geradts, "Content-Based Information Retrieval from Forensic Image Databases," PhD Thesis, The Netherlands Forensic Institute of the Ministry of Justice in Rijswijk, The Netherlands, 2002.
10. Z. Geradts and J. Keijzer, "The image-database REBEZO for shoeprints with developments on automatic classification of shoe outsole designs," Forensic Science International, vol. 82, pp. 21–31, 1996.
11. A. Girod, "Computerised classification of the shoeprints of burglars' shoes," Forensic Science International, vol. 82, pp. 59–65, 1996.
12. H. Majamaa, "Survey of the conclusions drawn of similar footwear cases in various crime laboratories," Forensic Science International, vol. 82, pp. 109–120, 1996.
13. R. Milne, "Operation Bigfoot – a volume crime database project," Science and Justice, vol. 41, no. 3, pp. 215–217, 2001.
14. T. J. Napier, "Scene linking using footwear mark databases," Science and Justice, vol. 42, no. 1, pp. 39–43, 2002.
15. N. E. Sawyer and C. W. Monckton, "SHOE-FIT: a computerised shoe print image database," IEE European Convention on Security and Detection, Brighton, UK, pp. 86–89, 1995.
16. L. Zhang and N. M. Allinson, "Automatic shoeprint retrieval system for use in forensic investigations," 5th Annual UK Workshop on Computational Intelligence, 2005.
Chapter 8
Techniques for Automatic Shoeprint Classification
8.1 Current Approaches

Research in automatic shoeprint classification has been reported for a little over a decade. Some of the techniques reported have focused on representing the small shape components which make up a pattern, whereas others extract features from the shoeprint without any subdivision. Similarly, some techniques operate in the spatial domain, while others operate in the transform domain, or in a combination of the two. Early research used techniques such as Fourier descriptors to model the pattern components, or fractals to model the complete shoe pattern [1–3]. More recent work by a group at University College Dublin (Ireland) [4] has used the power spectral density (PSD) coefficients of the image, which are calculated using the Fourier transform and used as features; rotation invariance is achieved using a "brute force" approach in which the query image is rotated in 1° steps. The group at Sheffield University (UK) uses the technique of matching edge direction histograms (EDH) [5]. The authors describe the interior shape information of a shoeprint image in terms of its significant edges, and use a histogram of the edge directions as the signature of a shoeprint image. This method first extracts the edges using a Canny edge detector. To obtain rotation invariance, they compute the 1D fast Fourier transform (FFT) of the normalised edge direction histogram and take it as the final signature of the shoeprint image. However, tests on partial shoeprints using this technique have not been reported. In this chapter, methods for automatically classifying shoeprints for use in forensic science are presented. In particular, we propose two correlation-based approaches to classify low-quality shoeprints: (i) phase-only correlation (POC), which can be considered a matched filter, and (ii) advanced correlation filters (ACFs). These techniques offer two primary advantages: the ability to match low-quality shoeprints, and translation invariance. Experiments were conducted on a database of images of 100 different shoes available on the market. For the experimental evaluation, challenging test images were generated, including partial shoeprints with different distortions (such as noise addition, blurring and in-plane rotation). Results have shown that the proposed correlation-based methods are very practical and provide high performance when processing low-quality partial prints.
8.2 Using Phase-Only Correlation

In this section, a phase-based method developed by the team at Queen's University Belfast is presented. The technique uses the POC technique for shoeprint matching. The main advantage of this method is its capability to match low-quality shoeprint images accurately and efficiently. In order to achieve superior performance, the use of a spectral weighting function is also proposed. The use of phase information was mainly motivated by the fact that, in the Fourier domain, the phase is much more important than the magnitude in preserving the features of image patterns, as demonstrated by Oppenheim and Lim [6]. A simple illustration for shoeprint images is given in Fig. 8.1.

Fig. 8.1 (a) Original shoeprint image A. (b) Original shoeprint image B. (c) Image synthesised from the Fourier transform phase of image B and the magnitude of image A. (d) Image synthesised from the Fourier transform phase of image A and the magnitude of image B
8.2.1 The POC Function

Consider two images g_1(x, y) and g_2(x, y). The Fourier transforms of g_1 and g_2 are G_1(u, v) = A(u, v) e^{j\phi(u,v)} and G_2(u, v) = B(u, v) e^{j\theta(u,v)}, where A(u, v) and B(u, v) are amplitude spectra and \phi(u, v) and \theta(u, v) are phase spectra, respectively. The POC function q_{g_1 g_2}(x, y) of the two images g_1 and g_2 is defined as

q_{g_1 g_2}(x, y) = F^{-1}\left\{ \frac{G_1(u, v)\, G_2^*(u, v)}{|G_1(u, v)\, G_2^*(u, v)|} \right\}    (8.1)

                 = F^{-1}\left\{ e^{j(\phi(u,v) - \theta(u,v))} \right\}    (8.2)

where F^{-1} denotes the inverse Fourier transform and G_2^* is the complex conjugate of G_2. The term Q_{g_1 g_2}(u, v) = e^{j(\phi(u,v) - \theta(u,v))} is termed the cross-phase spectrum between g_1 and g_2 [7]. If the two images g_1 and g_2 are identical, their POC function is a Dirac delta function centred at the origin with peak value 1. When matching similar images, the POC approach produces a much sharper correlation peak than conventional correlation, as shown in Fig. 8.2.
Fig. 8.2 (a) Original shoeprint image A. (b) Noisy partial shoeprint B generated from A. (c) Phase-only correlation (POC) between A and B. (d) Conventional correlation between A and B
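As an illustration of Eqs. (8.1) and (8.2), the POC function can be computed directly with an FFT library. The following Python sketch is our own transcription of the definition, not code from the authors; the small constant added to the denominator is an assumption to avoid division by zero:

```python
import numpy as np

def poc(g1, g2):
    """Phase-only correlation (Eqs. 8.1-8.2) between two equal-sized images."""
    G1 = np.fft.fft2(g1)
    G2 = np.fft.fft2(g2)
    cross = G1 * np.conj(G2)                       # G1(u,v) G2*(u,v)
    cross_phase = cross / (np.abs(cross) + 1e-12)  # keep phase only: e^{j(phi - theta)}
    return np.real(np.fft.ifft2(cross_phase))      # correlation surface; peak 1 if identical
```

For two identical images the surface is a delta-like peak of height 1 at the origin; for similar but noisy partial prints the peak remains far sharper than that of conventional correlation, which is the behaviour illustrated in Fig. 8.2.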
8.2.2 Translation and Brightness Properties of the POC Function

Consider an image g_3 that differs from g_2 by a displacement (x_0, y_0) and a brightness scale a > 0. Then g_3 and g_2 are related by

g_3(x, y) = a\, g_2(x - x_0, y - y_0)    (8.3)

In the frequency domain, this appears as a phase shift and a magnitude scaling:

G_3(u, v) = a\, e^{-j 2\pi (x_0 u + y_0 v)}\, G_2(u, v)    (8.4)

According to (8.1), (8.2) and (8.4), the POC function between g_1 and g_3 is given by

q_{g_1 g_3}(x, y) = F^{-1}\left\{ e^{-j 2\pi (x_0 u + y_0 v)}\, e^{j(\phi(u,v) - \theta(u,v))} \right\}    (8.5)

                 = q_{g_1 g_2}(x - x_0, y - y_0)    (8.6)

Equation (8.6) shows that the POC function between g_1 and g_3 is simply a translated version of the POC function between g_1 and g_2. The two POC functions have the same peak value, which is therefore invariant to translation and brightness change.
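Equations (8.3)-(8.6) are easy to verify numerically. The following self-contained sketch (the test size, shift values and brightness scale are arbitrary choices of ours) shows that shifting an image and scaling its brightness relocates the POC peak without changing its height:

```python
import numpy as np

def poc(g1, g2):
    c = np.fft.fft2(g1) * np.conj(np.fft.fft2(g2))
    return np.real(np.fft.ifft2(c / (np.abs(c) + 1e-12)))

rng = np.random.default_rng(0)
g2 = rng.random((128, 128))
g3 = 3.0 * np.roll(g2, shift=(5, 9), axis=(0, 1))   # brightness scale a = 3, shift (5, 9)

p_same, p_shifted = poc(g2, g2), poc(g2, g3)
print(p_same.max(), p_shifted.max())                           # both peaks are (close to) 1
print(np.unravel_index(p_shifted.argmax(), p_shifted.shape))   # peak moved by the displacement
```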
8.2.3 The Proposed Phase-Based Method

The proposed method uses the POC approach combined with a spectral weighting function.

8.2.3.1 Spectral Weighting Function

Spectral weighting functions have already been used with the POC technique in image registration to enhance registration accuracy [7]. In this work, we propose a band-pass-type spectral weighting function (Fig. 8.3) to improve the recognition rate by eliminating high-frequency components, which have low reliability, without significantly decreasing the correlation peak sharpness, since very low-frequency components are also eliminated. The proposed weighting function W(u, v) has the same shape as the spectrum of a Laplacian of Gaussian (LoG) function and is given by

W(u, v) = \frac{u^2 + v^2}{\alpha}\, e^{-\frac{u^2 + v^2}{2\beta^2}}    (8.7)

where \beta is a parameter which controls the function width and \alpha is used for normalisation purposes. Thus, the modified phase-only correlation (MPOC) function \tilde{q}_{g_1 g_2}(x, y) of images g_1 and g_2 is given by
\tilde{q}_{g_1 g_2}(x, y) = F^{-1}\left\{ W(u, v)\, \frac{G_1(u, v)\, G_2^*(u, v)}{|G_1(u, v)\, G_2^*(u, v)|} \right\}    (8.8)

Fig. 8.3 The proposed band-pass-type spectral weighting function with β = 50. (a) 3D representation. (b) 2D representation
The peak value of the MPOC function \tilde{q}_{g_1 g_2}(x, y) is also invariant to translation and brightness change.
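Equation (8.7) is straightforward to generate on a discrete frequency grid. In the sketch below, the choice of α so that the maximum of W equals 1 is our own normalisation; the chapter only states that α is used for normalisation purposes:

```python
import numpy as np

def bandpass_weight(n, beta=50.0):
    """Band-pass spectral weighting W(u, v) of Eq. (8.7) on an n-by-n grid (FFT ordering)."""
    freqs = np.fft.fftfreq(n) * n                # integer frequencies in FFT order
    u, v = np.meshgrid(freqs, freqs, indexing="ij")
    r2 = u**2 + v**2
    alpha = 2.0 * beta**2 / np.e                 # makes max(W) = 1, attained at r^2 = 2*beta^2
    return (r2 / alpha) * np.exp(-r2 / (2.0 * beta**2))
```

Note that W is zero at the origin, suppressing the very lowest frequencies, and decays for high frequencies, which is the band-pass behaviour shown in Fig. 8.3.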
8.2.3.2 Shoeprint Matching Algorithm

A schematic of the proposed shoeprint matching algorithm is shown in Fig. 8.4. In response to an unknown shoeprint image g_i, the algorithm matches g_i to each database image g_n (n = 1, ..., M, where M is the size of the database) and determines the corresponding matching score. The matching algorithm consists of the following steps:
Fig. 8.4 Diagram of the proposed matching algorithm: the input and database images are transformed by the FFT, their phase spectra are combined into the weighted cross-phase spectrum, and the peak value of the IFFT output is taken as the matching score
1. Calculate the Fourier transforms of g_i and g_n using the FFT to obtain G_i and G_n.
2. Extract the phases of G_i and G_n and calculate the cross-phase spectrum Q_{g_n g_i}.
3. Calculate the modified cross-phase spectrum \tilde{Q}_{g_n g_i} by modifying Q_{g_n g_i} with the spectral weighting function W.
4. Calculate the inverse Fourier transform of \tilde{Q}_{g_n g_i} using the inverse FFT (IFFT) to obtain the MPOC function \tilde{q}_{g_n g_i}.
5. Determine the maximum value of \tilde{q}_{g_n g_i}. This value is taken as the matching score between images g_i and g_n.

The use of the band-pass-type weighting function W (defined in Eq. (8.7)) eliminates meaningless high-frequency components without significantly affecting the sharpness of the correlation peak, since very low-frequency components are also attenuated. In this work, the peak value of the MPOC function has been taken as the similarity measure for image matching: if two images are similar, their MPOC function gives a distinct sharp peak; if they are dissimilar, the peak drops significantly. After matching the input image g_i against all database images using the algorithm described above, the resulting matching scores are used to produce a list of l shoeprints (l
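Combining the five steps, a compact sketch of the MPOC matching score and database ranking might look as follows; this is an illustration under our own assumptions, with W a weighting array such as the one sketched after Eq. (8.7):

```python
import numpy as np

def mpoc_score(g_i, g_n, W):
    """Matching score between a query g_i and one database image g_n (steps 1-5)."""
    G_i, G_n = np.fft.fft2(g_i), np.fft.fft2(g_n)   # step 1: FFTs
    cross = G_n * np.conj(G_i)
    Q = cross / (np.abs(cross) + 1e-12)             # step 2: cross-phase spectrum
    Q_mod = W * Q                                   # step 3: apply spectral weighting
    q_mod = np.real(np.fft.ifft2(Q_mod))            # step 4: MPOC function via IFFT
    return q_mod.max()                              # step 5: peak value = matching score

def rank_database(g_i, database, W, l=10):
    """Match g_i against all database images and return the l best-scoring indices."""
    scores = [mpoc_score(g_i, g_n, W) for g_n in database]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:l]
```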