e-Learning Understanding Information Retrieval Medical
SERIES ON SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING
Series Editor-in-Chief: S. K. CHANG (University of Pittsburgh, USA)
Vol. 1
Knowledge-Based Software Development for Real-Time Distributed Systems
Jeffrey J.-P. Tsai and Thomas J. Weigert (Univ. Illinois at Chicago)
Vol. 2
Advances in Software Engineering and Knowledge Engineering
edited by Vincenzo Ambriola (Univ. Pisa) and Genoveffa Tortora (Univ. Salerno)
Vol. 3
The Impact of CASE Technology on Software Processes
edited by Daniel E. Cooke (Univ. Texas)
Vol. 4
Software Engineering and Knowledge Engineering: Trends for the Next Decade
edited by W. D. Hurley (Univ. Pittsburgh)
Vol. 5
Intelligent Image Database Systems
edited by S. K. Chang (Univ. Pittsburgh), E. Jungert (Swedish Defence Res. Establishment) and G. Tortora (Univ. Salerno)
Vol. 6
Object-Oriented Software: Design and Maintenance
edited by Luiz F. Capretz and Miriam A. M. Capretz (Univ. Aizu, Japan)
Vol. 7
Software Visualisation
edited by P. Eades (Univ. Newcastle) and K. Zhang (Macquarie Univ.)
Vol. 8
Image Databases and Multi-Media Search
edited by Arnold W. M. Smeulders (Univ. Amsterdam) and Ramesh Jain (Univ. California)
Vol. 9
Advances in Distributed Multimedia Systems
edited by S. K. Chang, T. F. Znati (Univ. Pittsburgh) and S. T. Vuong (Univ. British Columbia)
Vol. 10
Hybrid Parallel Execution Model for Logic-Based Specification Languages
Jeffrey J.-P. Tsai and Sing Li (Univ. Illinois at Chicago)
Vol. 11
Graph Drawing and Applications for Software and Knowledge Engineers
Kozo Sugiyama (Japan Adv. Inst. Science and Technology)
Vol. 12
Lecture Notes on Empirical Software Engineering
edited by N. Juristo & A. M. Moreno (Universidad Politécnica de Madrid, Spain)
Vol. 13
Data Structures and Algorithms
edited by S. K. Chang (Univ. Pittsburgh, USA)
Vol. 14
Acquisition of Software Engineering Knowledge
SWEEP: An Automatic Programming System Based on Genetic Programming and Cultural Algorithms
edited by George S. Cowan and Robert G. Reynolds (Wayne State Univ.)
Vol. 15
Image: E-Learning, Understanding, Information Retrieval and Medical
Proceedings of the First International Workshop
edited by S. Vitulano (Università di Cagliari, Italy)
Series on Software Engineering and Knowledge Engineering
Proceedings of the First International Workshop
Cagliari, Italy
e-Learning Understanding Information Retrieval Medical
edited by Sergio Vitulano
Università degli Studi di Cagliari, Italy
World Scientific
New Jersey  London  Singapore  Hong Kong
9 - 10 June 2003
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library
IMAGE: E-LEARNING, UNDERSTANDING, INFORMATION RETRIEVAL AND MEDICAL
Proceedings of the First International Workshop

Copyright © 2003 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-587-8
Printed in Singapore by Mainland Press
PREFACE

The role played by images in many human activities, ranging from entertainment to study and covering all phases of the learning process, is ever more relevant and irreplaceable. The computer age may be interpreted as a transformation of our social life in both its working and leisure aspects. In our opinion this change is so relevant that it can be compared with the invention of printing, of the steam engine or the discovery of radio waves. While for a long time images could only be captured by photography, we are now able to capture, manipulate and evaluate images with the computer.

Since the original image processing literature is spread over many disciplines, we can understand the need to gather all the knowledge in this field into a specific science. This new science takes into account image elaboration, transmission, understanding and ordering, and finally the role of the image in knowledge as a general matter. This book aims at putting in evidence some of the subjects listed above.

First of all we wish to emphasize the importance of images in the learning process and in the transmission of knowledge (e-Learning section). How much and what kind of information content do we need for image comprehension? We try to give an answer, even if partial, in the Understanding section of this book. The large number of images used in Internet sites raises several problems. Their organization and the transmission of their content are the typical field of interest of information retrieval, which studies and provides solutions to these specific problems.

In the last two decades the number of images used in the medical field, and the role they play, have become ever more important. At the same time, physicians require methodologies typical of computer science for the analysis, the organization and the computer-aided diagnosis (CAD) of medical images. The Medical section of this volume gives examples of the interaction between computer science and medical diagnosis.

This book tries to offer a new contribution to computer science that will inspire the reader to discover the power of images and to apply the new knowledge of this science adequately and successfully to his or her research area and to everyday life.

Sergio Vitulano
CONTENTS
Preface  vii

Medical Session (Chairman: M. Tegolo)

An Introduction to Biometrics and Face Recognition  1
F. Perronnin, J.-L. Dugelay

The Use of Image Analysis in the Early Diagnosis of Oral Cancer  21
R. Serpico, M. Petruzzi, M. De Benedittis

Lung Edge Detection in Postero-Anterior Chest Radiographs  27
P. Campadelli, E. Casiraghi

Discrete Tomography from Noisy Projections  38
C. Valenti

An Integrated Approach to 3D Facial Reconstruction from Ancient Skull  46
A. F. Abate, M. Nappi, S. Ricciardi, G. Tortora

e-Learning Session (Chairman: M. Nappi)

The e-Learning Myth and the New University  60
V. Cantoni, M. Porta, M. G. Semenza

e-Learning - The Next Big Wave: How e-Learning Will Enable the Transformation of Education  69
R. Straub, C. Milani

Information Retrieval Session (Chairman: V. Cantoni)

Query Morphing for Information Fusion  86
S.-K. Chang

Image Representation and Retrieval with Topological Trees  112
C. Grana, G. Pellacani, S. Seidenari, R. Cucchiara

An Integrated Environment for Control and Management of Pictorial Information Systems  123
A. F. Abate, R. Cassino, M. Tucci

A Low Level Image Analysis Approach to Starfish Detection  132
V. Di Gesù, D. Tegolo, F. Isgrò, E. Trucco

A Comparison among Different Methods in Information Retrieval  140
F. Cannavale, V. Savona, C. Scintu

HER Application on Information Retrieval  150
A. Casanova, M. Fraschini

Understanding Session (Chairman: J.-L. Dugelay)

Issues in Image Understanding  159
V. Di Gesù

Information System in the Clinical-Health Area  178
G. Madonna

A Wireless-Based System for an Interactive Approach to Medical Parameters Exchange  200
G. Fenu, A. Crisponi, S. Cugia, M. Picconi
AN INTRODUCTION TO BIOMETRICS AND FACE RECOGNITION
F. PERRONNIN* AND J.-L. DUGELAY
Eurecom Institute, Multimedia Communications Department
2229, route des Crêtes - B.P. 193
06904 Sophia-Antipolis cedex - France
E-mail: {perronni,dugelay}@eurecom.fr
We present in this paper a brief introduction to biometrics, which refers to the problem of identifying a person based on his/her physical or behavioral characteristics. We will also provide a short review of the literature on face recognition with a special emphasis on frontal face recognition, which represents the bulk of the published work in this field. While biometrics have mostly been studied separately, we also briefly introduce the notion of multimodality, a topic related to decision fusion which has recently gained interest in the biometric community.
1. Introduction to Biometrics
The ability to verify automatically and with great accuracy the identity of a person has become crucial in our society. Even though we may not notice it, our identity is challenged daily when we use our credit card or try to gain access to a facility or a network, for instance. The two traditional approaches to automatic person identification, namely the knowledge-based approach, which relies on something that you know such as a password, and the token-based approach, which relies on something that you have such as a badge, have obvious shortcomings: passwords might be forgotten or guessed by a malicious person while badges might be lost or stolen [1]. Biometric person recognition, which deals with the problem of identifying a person based on his/her physical or behavioral characteristics, is an alternative to these traditional approaches as a biometric attribute is inherent to each person and thus cannot be forgotten or lost and might be difficult to forge.

*This work was supported in part by France Telecom Research.

The face, the fingerprint, the hand geometry, the iris,
etc. are examples of physical characteristics while the signature, the gait, the keystroke, etc. are examples of behavioral characteristics. It should be underlined that a biometric such as the voice is both physical and behavioral. Ideally, a biometric should have the following properties: it should be universal, unique, permanent and easily collectible [2]. In the next three sections of this introductory part, we will briefly describe the architecture of a typical biometric system, the measures to evaluate its performance and the possible applications of biometrics.
1.1. Architecture
A biometric system is a particular case of a pattern recognition system [3]. Given a set of observations (captures of a given biometric) and a set of possible classes (for instance the set of persons that can possibly be identified), the goal is to associate to each observation one unique class. Hence, the main task of pattern recognition is to distinguish between the intra-class and inter-class variabilities. Face recognition, which is the main focus of this article, is a very challenging problem as faces of the same person are subject to variations due to facial expressions, pose, illumination conditions, presence/absence of glasses and facial hair, aging, etc. A biometric system is composed of at least two mandatory modules, the enrollment and recognition modules, and an optional one, the adaptation module. During enrollment, the biometric is first measured through a sensing device. Generally, before the feature extraction step, a series of pre-processing operations, such as detection, segmentation, etc. should be applied. The extracted features should be a compact but accurate representation of the biometric. Based on these features, a model is built and stored, for instance in a database or on a smart card. During the recognition phase, the biometric characteristic is measured and features are extracted as done during the enrollment phase. These features are then compared with one or many models stored in the database, depending on the operational mode (see the next section on performance evaluation). During the enrollment phase, a user-friendly system generally captures only a few instances of the biometric, which may be insufficient to describe with great accuracy the characteristics of this attribute. Moreover, this biometric can vary over time in the case where it is non-permanent (e.g. face, voice). Adaptation maintains or even improves the performance of the system over time by updating the model after each access to the system.
Figure 1. Architecture of a biometric system.
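The module structure just described can be made concrete with a small sketch. The feature extractor, the cosine similarity and the update rate below are illustrative placeholders under assumed names, not part of any particular deployed system.

```python
# A minimal sketch of the enrollment / recognition / adaptation flow described
# above. The feature extractor and the threshold are hypothetical placeholders.
import numpy as np

class BiometricSystem:
    def __init__(self, extract_features, threshold=0.8):
        self.extract = extract_features   # e.g. an eigenface projection
        self.threshold = threshold
        self.models = {}                  # user id -> stored template

    def enroll(self, user_id, samples):
        # Build a model from a few captures (here simply the mean feature vector).
        feats = np.stack([self.extract(s) for s in samples])
        self.models[user_id] = feats.mean(axis=0)

    def verify(self, claimed_id, sample):
        # 1:1 comparison between the capture and the claimed identity's model.
        f = self.extract(sample)
        m = self.models[claimed_id]
        score = float(f @ m / (np.linalg.norm(f) * np.linalg.norm(m) + 1e-12))
        return score >= self.threshold, score

    def adapt(self, user_id, sample, rate=0.1):
        # Optional adaptation: slowly update the model after a successful access.
        self.models[user_id] = (1 - rate) * self.models[user_id] + rate * self.extract(sample)
```

The same skeleton covers both operational modes: `verify` is the 1:1 comparison, while identification would simply loop the comparison over all stored models.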
1.2. Performance Evaluation

Generally, a biometric system can work under two different operational modes: identification or verification. During identification, the system should guess the identity of a person among a set of N possible identities (1:N problem). A closed set is generally assumed, which means that all the trials will be from people who have a model in the database, and the goal is hence to find the most likely person. During verification, the user claims an identity and the system should compare this identity with the stored model (1:1 problem). This is referred to as an open set, as persons who are not in the database may try to fool the system. One can sometimes read claims that identification is a more challenging problem than verification or vice-versa. Actually, identification and verification are simply two different problems. As it may not be enough to know whether the top match is the correct one for an identification system, one can measure its performance through the cumulative match score, which measures the percentage of correct answers among the top N matches. Also, one could use recall-precision curves, as is done for instance to measure the performance of database retrieval systems. The FERET face database is the most commonly used database for assessing the performance of a system in the identification mode.

A verification system can make two kinds of mistakes: it can reject a rightful user, often called a client, or accept an impostor. Hence, the performance of a verification system is measured in terms of its false rejection rate (FRR) and false acceptance rate (FAR). A threshold is set on the scores obtained during the verification phase and one can vary this threshold to obtain the best possible compromise for a particular application, depending on the required security level. By varying this threshold, one obtains the receiver operating curve (ROC), i.e. the FRR as a function of the FAR. To summarize the performance of the system with one unique figure, one often uses the equal error rate (EER), which corresponds to the point FAR = FRR. The M2VTS database and its extension, the XM2VTSDB [5], are the most commonly used databases for assessing the performance of a system in the verification mode. The interested reader can also refer to [6] for an introduction to evaluating biometric systems.
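As a complement to these definitions, the following is a minimal sketch of how FAR, FRR and the EER can be computed from raw verification scores; the synthetic score distributions at the end are only an assumption for the example.

```python
# A sketch (not from the paper) of FAR, FRR and EER computation from scores,
# assuming higher scores mean "more likely a client".
import numpy as np

def far_frr(client_scores, impostor_scores, thresholds):
    client = np.asarray(client_scores)
    impostor = np.asarray(impostor_scores)
    frr = np.array([(client < t).mean() for t in thresholds])     # rejected clients
    far = np.array([(impostor >= t).mean() for t in thresholds])  # accepted impostors
    return far, frr

def equal_error_rate(client_scores, impostor_scores, n_steps=1000):
    scores = np.concatenate([client_scores, impostor_scores])
    thresholds = np.linspace(scores.min(), scores.max(), n_steps)
    far, frr = far_frr(client_scores, impostor_scores, thresholds)
    i = np.argmin(np.abs(far - frr))      # threshold where FAR is closest to FRR
    return (far[i] + frr[i]) / 2.0, thresholds[i]

# Example with synthetic scores:
rng = np.random.default_rng(0)
eer, thr = equal_error_rate(rng.normal(1.0, 0.3, 500), rng.normal(0.0, 0.3, 500))
```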
1.3. Applications
There are mainly four areas of applications for biometrics: access control, transaction authentication, law enforcement and personalization. Access control can be subdivided into two categories: physical and virtual access control. The former controls the access to a secured location. An example is the Immigration and Naturalization Service's Passenger Accelerated Service System (INSPASS) [7] deployed in major US airports, which enables frequent travelers to use an automated immigration system that authenticates their identity through their hand geometry. The latter enables the access to a resource or a service such as a computer or a network. An example of such a system is the voice recognition system used in Mac OS 9. Transaction authentication represents a huge market as it includes transactions at an automatic teller machine (ATM), electronic fund transfers, credit card and smart card transactions, transactions on the phone or on the Internet, etc. Mastercard estimates that a smart credit card incorporating finger verification could eliminate 80% of fraudulent charges [8]. For transactions on the phone, biometric systems have already been deployed. For instance, the speaker recognition technology of Nuance [9] is used by the clients of the Home Shopping Network or Charles Schwab. Law enforcement has been one of the first applications of biometrics. Fingerprint recognition has been accepted for more than a century as a means of identifying a person. Automatic face recognition can also be very useful for searching through large mugshot databases. Finally, personalization through person authentication is very appealing in the consumer product area. For instance, Siemens allows one to personalize one's vehicle accessories, such as mirrors, radio station selections, seating positions, etc., through fingerprint recognition [10].
In the following sections, we will provide the reader with a brief review of the literature on face recognition. This review will be split into two parts: we will devote the next section to frontal face recognition, which represents the bulk of the literature, and the "other modalities", corresponding to different acquisition scenarios such as profile, range images, facial thermogram or video, will be discussed in section 3. The interested reader can refer to [11] for a full review of the literature on face recognition before 1995. We should underline that specific parts of the face (or the head) such as the eyes, the ears, the lips, etc. contain a lot of relevant information for identifying people. However, this is out of the scope of this paper and the interested reader can refer to [12] for iris recognition, to [13] for ear recognition and to [14] for lip dynamics recognition. Also, we will not review a very important part of any face recognition system: face detection. For a recent review on the topic, the reader can refer to [15].
2. Frontal Face Recognition
It should be underlined that the expression "frontal face recognition" is used in opposition to "profile recognition". A face recognition system that would work only under perfect frontal conditions would be of limited interest and even "frontal" algorithms should have some view tolerance. As a full review, even of the restricted topic of frontal face recognition, is out of the scope of this paper, we will focus our attention on two very successful classes of algorithms: the projection-based approaches, i.e. the Eigenfaces and its related approaches, and the ones based on deformable models such as Elastic Graph Matching. It should be underlined that the three top performers at the 96 FERET performance evaluation belong to one of these two classes [4].

2.1. Eigenfaces and Related Approaches

In this section, we will first review the basic eigenface algorithm and then consider its extensions: multiple spaces, eigenfeatures, linear discriminant analysis and probabilistic matching.

2.1.1. Eigenfaces

Eigenfaces are based on the notion of dimensionality reduction. [16] first outlined that the dimensionality of the face space, i.e. the space of variation between images of human faces, is much smaller than the dimensionality of a single face considered as an arbitrary image. As a useful approximation, one may consider an individual face image to be a linear combination of a small number of face components or eigenfaces derived from a set of reference face images. The idea of Principal Component Analysis (PCA) [17], also known as the Karhunen-Loève Transform (KLT), is to find the subspace which best accounts for the distribution of face images within the whole space. Let $\{O_i\}_{i \in [1,N]}$ be the set of reference or training faces, $\bar{O}$ be the average face and $\tilde{O}_i = O_i - \bar{O}$. $\tilde{O}_i$ is sometimes called a caricature image. Finally, if $\Theta = [\tilde{O}_1, \tilde{O}_2, \ldots, \tilde{O}_N]$, the scatter matrix $S$ is defined as:
$$ S = \sum_{i=1}^{N} \tilde{O}_i \tilde{O}_i^T = \Theta \Theta^T \qquad (1) $$

The optimal subspace $P_{PCA}$ is chosen to maximize the scatter of the projected faces:

$$ P_{PCA} = \arg\max_{P} \left| P S P^T \right| \qquad (2) $$
where $|\cdot|$ is the determinant operator. The solution to problem (2) is the subspace spanned by the eigenvectors $[e_1, e_2, \ldots, e_K]$, also called eigenfaces, corresponding to the $K$ largest eigenvalues of the scatter matrix $S$. It should be underlined that eigenfaces are not themselves usually plausible faces but only directions of variation between face images (see Figure 2).

Figure 2. (a) Eigenface 0 (average face) and (b)-(f) eigenfaces 1 to 5 as estimated on a subset of the FERET face database.

Each face image is represented by a point $P_{PCA}\,\tilde{O}_i = [w_1^i, w_2^i, \ldots, w_K^i]$ in the $K$-dimensional space. The weights $w_k^i$ are the projections of the face image onto the $k$-th eigenface $e_k$ and thus represent the contribution of each eigenface to the input face image.
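A minimal sketch of the eigenface computation of equations (1)-(2) is given below, assuming the training faces are already cropped, aligned and flattened into row vectors. Diagonalizing the N x N Gram matrix instead of the full scatter matrix is a standard implementation shortcut, not something prescribed by the text.

```python
# Eigenface computation and projection onto the face space.
import numpy as np

def eigenfaces(faces, K):
    # faces: (N, D) array, one flattened face image per row
    mean_face = faces.mean(axis=0)
    O = faces - mean_face                      # caricature images O_i
    # For D >> N it is cheaper to diagonalize the (N, N) Gram matrix O O^T
    # and map its eigenvectors back to image space.
    gram = O @ O.T
    eigvals, eigvecs = np.linalg.eigh(gram)    # ascending order
    order = np.argsort(eigvals)[::-1][:K]
    eig_faces = (O.T @ eigvecs[:, order]).T    # (K, D): the eigenfaces e_k
    eig_faces /= np.linalg.norm(eig_faces, axis=1, keepdims=True)
    return mean_face, eig_faces

def project(face, mean_face, eig_faces):
    # Weights w_k = projection of the caricature image onto each eigenface.
    return eig_faces @ (face - mean_face)
```

Matching, described next, then reduces to comparing these weight vectors.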
To find the best match for an image of a person's face in a set of stored facial images, one may calculate the Euclidean distances between the vector representing the new face and each of the vectors representing the stored faces, and then choose the image yielding the smallest distance [18].

2.1.2. Multiple Spaces Approaches
When one has a large amount of training data, one can either pool all the data to train one unique eigenspace, which is known as the parametric approach, or split the data into multiple training sets and train multiple eigenspaces, which is known as the view-based approach. The latter approach has been designed especially to compensate for different head poses. One of the first attempts to train multiple eigenspaces was made in [19]. This method consists in building a separate eigenspace for each possible view. For each new target image, its orientation is first estimated by projecting it on each eigenspace and choosing the one that yields the smallest distance from face to space. The performances of the parametric and view-based approaches were compared in [19] and the latter one seems to perform better. The problem with the view-based approach is that it requires large amounts of labeled training data to train each separate eigenspace. More recently, Mixtures of Principal Components (MPC) were proposed to extend the traditional PCA [20,21]. An iterative procedure based on the Expectation-Maximization algorithm was derived in both cases to train the MPC automatically. However, while [20] represents a face by the best set of features corresponding to the closest set of eigenfaces, in [21] a face image is projected on each component eigenspace and these individual projections are then linearly combined. Hence, compared to the former approach, a face image is not assigned in a hard manner to one eigenspace component but in a soft manner to all the eigenspace components. [21] tested MPC on a database of face images that exhibit large variabilities in poses and illumination conditions. Each eigenspace converges automatically to varying poses and the first few eigenvectors of each component eigenspace seem to capture lighting variations.
2.1.3. Eigenfeatures

An eigenface-based recognition system can be easily fooled by gross variations of the image such as the presence or absence of facial hair [19]. This shortcoming is inherent to the eigenface approach, which encodes a global representation of the face. To address this issue, [19] proposed a modular or
layered approach where the global representation of the face is augmented by local prominent features such as the eyes, the nose or the mouth. Such an approach is of particular interest when a part of the face is occluded and only a subset of the facial features can be used for recognition. A similar approach was also developed in [22]. The main difference is in the encoding of the features: the notion of eigenface is extended to eigeneyes, eigennose and eigenmouth, as was done for instance in [23] for image coding. For a small number of eigenvectors, the eigenfeatures approach outperformed the eigenface approach and the combination of eigenfaces and eigenfeatures outperformed each algorithm taken separately.

2.1.4. Linear Discriminant Approaches
While PCA is optimal with respect to data compression [16], in general it is sub-optimal for a recognition task. Actually, PCA confounds intra-personal and extra-personal sources of variability in the total scatter matrix $S$. Thus eigenfaces can be contaminated by non-pertinent information. For a classification task, a dimension reduction technique such as Linear Discriminant Analysis (LDA) should be preferred to PCA [24,25,26]. The idea of LDA is to select a subspace that maximizes the ratio of the inter-class variability and the intra-class variability. Whereas PCA is an unsupervised feature extraction method, discriminant analysis uses the category information associated with each training observation and is thus categorized as supervised. Let $O_{i,k}$ be the $k$-th picture of training person $i$, $N_i$ be the number of training images for person $i$ and $\bar{O}_i$ be the average of person $i$. Then $S_B$ and $S_W$, respectively the between- and within-class scatter matrices, are given by:

$$ S_B = \sum_{i=1}^{C} N_i \,(\bar{O}_i - \bar{O})(\bar{O}_i - \bar{O})^T \qquad (3) $$

$$ S_W = \sum_{i=1}^{C} \sum_{k=1}^{N_i} (O_{i,k} - \bar{O}_i)(O_{i,k} - \bar{O}_i)^T \qquad (4) $$

where $C$ is the number of training persons.
The optimal subspace $P_{LDA}$ is chosen to maximize the between-scatter of the projected face images while minimizing the within-scatter of the projected faces:

$$ P_{LDA} = \arg\max_{P} \frac{\left| P S_B P^T \right|}{\left| P S_W P^T \right|} \qquad (5) $$

The solution to equation (5) is the subspace spanned by $[e_1, e_2, \ldots, e_K]$, the generalized eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem:

$$ S_B e_k = \lambda_k S_W e_k, \qquad k = 1, \ldots, K \qquad (6) $$
However, due to the high dimensionality of the feature space, $S_W$ is generally singular and this principle cannot be applied in a straightforward manner. To overcome this issue, one generally first applies PCA to reduce the dimension of the feature space and then performs the standard LDA [24,26]. The eigenvectors that form the discriminant subspace are often referred to as Fisherfaces [24]. In [26], the space spanned by the first few Fisherfaces is called the most discriminant features (MDF) classification space, while PCA features are referred to as most expressive features (MEF). It should be underlined that LDA induces non-orthogonal projection axes, a property which has great relevance in biological sensory systems [27]. Other solutions to equation (5) were suggested in [27,28,29].

Figure 3. (a) Fisherface 0 (average face) and (b)-(f) Fisherfaces 1 to 5 as estimated on a subset of the FERET face database.
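The PCA-then-LDA recipe just described can be sketched as follows. The dimensions `n_pca` and `n_lda` are illustrative parameters, and `n_pca` is assumed small enough that the reduced within-class scatter is non-singular.

```python
# A sketch of the Fisherface-style computation: PCA first, then the
# generalized eigenproblem of equation (6) on the reduced features.
import numpy as np
from scipy.linalg import eigh

def fisherfaces(faces, labels, n_pca, n_lda):
    # faces: (N, D) flattened images; labels: (N,) person ids
    mean = faces.mean(axis=0)
    X = faces - mean
    # PCA step (keep n_pca directions so that S_W becomes non-singular)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P_pca = Vt[:n_pca]                  # (n_pca, D)
    Y = X @ P_pca.T                     # reduced features
    # Between- and within-class scatter, equations (3) and (4)
    classes = np.unique(labels)
    d = Y.shape[1]
    Sb = np.zeros((d, d)); Sw = np.zeros((d, d))
    mu = Y.mean(axis=0)
    for c in classes:
        Yc = Y[labels == c]
        mu_c = Yc.mean(axis=0)
        Sb += len(Yc) * np.outer(mu_c - mu, mu_c - mu)
        Sw += (Yc - mu_c).T @ (Yc - mu_c)
    # Generalized eigenproblem S_B e = lambda S_W e, equation (6)
    vals, vecs = eigh(Sb, Sw)
    order = np.argsort(vals)[::-1][:n_lda]
    P_lda = vecs[:, order].T            # (n_lda, n_pca)
    return mean, P_lda @ P_pca          # combined projection into the MDF space
```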
2.1.5. Probabilistic Matching

While most face recognition algorithms, especially those based on eigenfaces, generally use simple metrics such as the Euclidean distance, [30] suggests a probabilistic similarity based on a discriminative Bayesian analysis of image differences. One considers the two mutually exclusive classes of variation between two facial images: the intra-personal and extra-personal variations, whose associated spaces are denoted respectively $\Omega_I$ and $\Omega_E$. Given two face images $O_1$ and $O_2$ and the image difference $\Delta = O_1 - O_2$, the similarity measure is given by $P(\Omega_I|\Delta)$. Using Bayes' rule, it can be transformed into:

$$ P(\Omega_I|\Delta) = \frac{P(\Delta|\Omega_I)\,P(\Omega_I)}{P(\Delta|\Omega_I)\,P(\Omega_I) + P(\Delta|\Omega_E)\,P(\Omega_E)} \qquad (7) $$

The high-dimensional probability functions $P(\Delta|\Omega_I)$ and $P(\Delta|\Omega_E)$ are estimated using an eigenspace density estimation technique [31]. It was observed that the denominator in equation (7) had a limited impact on the performance of the system and that the similarity measure could be reduced to $P(\Delta|\Omega_I)$ with little loss in performance, thus reducing the computational requirements of the algorithm by a factor of two.
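A sketch of the similarity of equation (7) is given below, assuming that the intra-personal and extra-personal difference classes have already been fitted with Gaussian densities in their own reduced eigenspaces; the `intra`/`extra` model dictionaries are hypothetical placeholders for such fitted models.

```python
# Bayesian similarity between two face images from equation (7).
import numpy as np
from scipy.stats import multivariate_normal

def bayesian_similarity(img1, img2, intra, extra, p_intra=0.5):
    """intra/extra: dicts with a 'basis' (k, D), a 'mean' (D,), a 'cov' (k, k)."""
    delta = img1 - img2
    def likelihood(model):
        z = model["basis"] @ (delta - model["mean"])   # project the difference
        return multivariate_normal.pdf(z, mean=np.zeros(len(z)), cov=model["cov"])
    num = likelihood(intra) * p_intra
    den = num + likelihood(extra) * (1.0 - p_intra)
    return num / den                                   # P(Omega_I | Delta)
```

Dropping the denominator, as discussed above, amounts to ranking matches by `likelihood(intra)` alone.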
2.2. Deformable Models

As noted in [32], since most face recognition algorithms are minimum distance pattern classifiers, special attention should be paid to the definition of the distance. The distance which is generally used is the Euclidean distance. While it is easy to compute, it may not be optimal as, for instance, it does not compensate for the deformations incurred from different facial expressions. Face recognition algorithms based on deformable models can cope with this kind of variation.

2.2.1. Elastic Graph Matching
The Elastic Graph Matching (EGM) algorithm has its roots in the neural network community [33]. Given a template image $\mathcal{F}_T$, one first derives a face model from this image. A grid is placed on the face image and the face model is a vector field $O = \{o_{i,j}\}$, where $o_{i,j}$ is the feature vector extracted at position $(i,j)$ of the grid which summarizes local properties of the face (c.f. Figure 4(a)). Gabor coefficients are generally used but other features, like morphological feature vectors, have also been considered and successfully applied to the EGM problem [34]. Given a query image $\mathcal{F}_Q$, one also derives a vector field $X = \{x_{i,j}\}$, but on a coarser grid than the template face (c.f. Figure 4(b)). In the EGM approach, the distance between the template and query images is defined by the best mapping $M^*$ among the set of all possible mappings $\{M\}$ between the two vector fields $O$ and $X$. The optimal mapping depends on the definition of the cost function $C$. Such a function should keep a proper balance between the local matching of features and the requirement to preserve spatial distances.
Figure 4. (a) Template image and (b) query image with their associated grids. (c) Grid after deformation using the probabilistic deformable model of face mapping (c.f. section 2.2.3). Images extracted from the FERET face database.
Therefore, a proper cost function should be of the form:

$$ C(M) = C_v(M) + \rho\, C_e(M) \qquad (8) $$

where $C_v$ is the cost of local matchings, $C_e$ the cost of local deformations and $\rho$ is a parameter which controls the rigidity of the elastic matching and has to be hand-tuned. As the number of possible mappings is extremely large, even for lattices of moderate size, an exhaustive search is out of the question and an approximate solution has to be found. Toward this end, a two-step procedure was designed (a simplified sketch of this search is given below):

- rigid matching: the whole template graph is shifted around the query graph. This corresponds to $\rho \to \infty$. We obtain an initial mapping $M^0$.
- deformable matching: the nodes of the template lattice are then stretched through random local perturbations to further reduce the cost function until the process converges to a locally optimal mapping $M^*$, i.e. once a predefined number of trials has failed to improve the mapping cost.
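The following is a much simplified sketch of this two-step search. The zero-mean patch descriptor, the grid, the shift range and the fixed perturbation budget are illustrative assumptions standing in for the Gabor features and convergence test of the original algorithm.

```python
# Simplified elastic graph matching: rigid shift, then random local perturbations.
import numpy as np

def patch_feature(img, y, x, r=4):
    # Fixed-size local descriptor (zero-mean patch); returns None off-image.
    if y - r < 0 or x - r < 0 or y + r + 1 > img.shape[0] or x + r + 1 > img.shape[1]:
        return None
    p = img[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    return (p - p.mean()).ravel()

def matching_cost(template_feats, query_img, nodes):
    total = 0.0
    for f, (y, x) in zip(template_feats, nodes):
        q = patch_feature(query_img, y, x)
        total += 1e6 if (f is None or q is None) else np.linalg.norm(f - q)
    return total

def deformation_cost(nodes, initial_nodes):
    return sum(np.hypot(y - y0, x - x0) for (y, x), (y0, x0) in zip(nodes, initial_nodes))

def elastic_graph_match(template_img, query_img, grid, rho=0.5,
                        shifts=range(-10, 11, 2), n_trials=200, step=2, seed=0):
    rng = np.random.default_rng(seed)
    feats = [patch_feature(template_img, y, x) for (y, x) in grid]
    # Step 1: rigid matching (rho -> infinity): shift the whole grid.
    best_shift, best_cost = (0, 0), np.inf
    for dy in shifts:
        for dx in shifts:
            nodes = [(y + dy, x + dx) for (y, x) in grid]
            c = matching_cost(feats, query_img, nodes)
            if c < best_cost:
                best_cost, best_shift = c, (dy, dx)
    nodes = [(y + best_shift[0], x + best_shift[1]) for (y, x) in grid]
    initial = list(nodes)
    # Step 2: deformable matching: random perturbations of single nodes.
    best_cost = matching_cost(feats, query_img, nodes) + rho * deformation_cost(nodes, initial)
    for _ in range(n_trials):
        i = int(rng.integers(len(nodes)))
        dy, dx = rng.integers(-step, step + 1, size=2)
        trial = list(nodes)
        trial[i] = (nodes[i][0] + int(dy), nodes[i][1] + int(dx))
        c = matching_cost(feats, query_img, trial) + rho * deformation_cost(trial, initial)
        if c < best_cost:
            best_cost, nodes = c, trial
    return best_cost, nodes
```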
The previous matching algorithm was later improved. For instance, in [34] the authors argue that the two-stage coarse-to-fine optimization is sub-optimal as the deformable matching relies too much on the success of the rigid matching. The two-stage optimization procedure is replaced with a probabilistic hill-climbing algorithm which attempts to find at each
iteration both the optimal global translation and the set of optimal local perturbations. In [35], the same authors further drop the deformation term $C_e$ in equation (8). However, to avoid unreasonable deformations, local translations are restricted to a neighborhood.
2.2.2. Elastic Bunch Graph Matching
[36] elaborated on the basic idea of EGM with Elastic Bunch Graph Matching (EBGM) through three major extensions:

- While the cost of local matchings $C_v$ only makes use of the magnitude of the complex Gabor coefficients in the EGM approach, the phase information is used to disambiguate features which have a similar magnitude, but also to estimate local distortions.
- The features are no longer extracted on a rectangular graph; they now refer to specific facial landmarks called fiducial points.
- A new data structure called the bunch graph, which serves as a general representation of the face, is introduced. Such a structure is obtained by combining the graphs of a set of reference individuals.
It should be noted that the idea of extracting features at positions which correspond to facial landmarks appeared in earlier work. In [37], feature points are detected using a Gabor wavelet decomposition. Typically, 35 to 50 points are obtained in this manner and form the face graph. To compare two face graphs, a two-stage matching similar to the one suggested in [33] is developed. One first compensates for a global translation of the graphs and then performs local deformations for further optimization. However, another difference with [33] is that the cost of local deformations (also called topology cost) is only computed after the features are matched, which results in a very fast algorithm. One advantage of [36] over [37] is in the use of the bunch graph, which provides a supervised way to extract salient features. An obvious shortcoming of EGM and EBGM is that $C_v$, the cost of local matchings, is simply a sum of all local matchings. This contradicts the fact that certain parts of the face contain more discriminant information and that this distribution of the information across the face may vary from one person to another. Hence, the cost of local matchings at each node should be weighted according to their discriminatory power [38,39,34,35].
2.2.3. Probabilistic Deformable Model of Face Mapping
A novel probabilistic deformable model of face mapping [40], whose philosophy is similar to EGM [33], was recently introduced. Given a template face $\mathcal{F}_T$, a query face $\mathcal{F}_Q$ and a deformable model of the face $\mathcal{M}$, for a face identification task the goal is to estimate $P(\mathcal{F}_T|\mathcal{F}_Q,\mathcal{M})$. The two major differences between EGM and the approach presented in [40] are:

- In the use of the HMM framework, which provides efficient formulas to compute $P(\mathcal{F}_T|\mathcal{F}_Q,\mathcal{M})$ and to train automatically all the parameters of $\mathcal{M}$. This enables for instance to model the elastic properties of the different parts of the face.
- In the use of a shared deformable model of the face $\mathcal{M}$ for all individuals, which is particularly useful when little enrollment data is available.
3. Other “Modalities” for Face Recognition
In this section we will very briefly review what we called the "other modalities", which basically encompass the remainder of the literature on face recognition: profile recognition, recognition based on range data, thermal imagery and finally video-based face recognition.

3.1. Profile Recognition
The research on profile face recognition has been mainly motivated by requirements of law enforcement agencies with their so-called mug shot databases [11]. However, it has been the focus of a relatively restricted number of papers. It should be underlined that frontal and profile face recognition are complementary as they do not provide the same information. A typical profile recognition algorithm first locates points of interest on the contour image, such as the nose tip, the mouth, the chin, etc., also called fiducial points, and then extracts information such as distances, angles, etc. for the matching (see [41] for an example of an automatic system based on this principle). An obvious problem with such an approach is the fact that it relies on an accurate feature extraction. Alternative approaches which alleviate this problem include (but are not limited to) the use of Fourier descriptors for the description of closed curves [42], the application of Eigenfaces to profiles [19] and, more recently, an algorithm based on string matching [43].
3.2. Range Data
While a 2-D intensity image does not give direct access to the 3-D structure of an object, a range image contains the depth information and is not sensitive to lighting conditions (it can even work in the dark), which makes range data appealing for a face recognition system. The sensing device can be a rotating laser scanner, which provides a very accurate and complete representation of the face, as used for instance in [44,45]. However, such a scanner is highly expensive and the scanning process is very slow. In [46] the authors suggested the use of the coded light approach for acquiring range images. A sequence of stripe patterns is projected onto the face and for each projection an image is taken with a camera. However, for shadow regions as well as regions that do not reflect the projected light, no 3-D data can be estimated, which results in range images with a lot of missing data. Therefore, the authors decided to switch to a multi-sensor system with two range sensors acquiring the face under two different views. These two sets of range data are then merged. Although these sensing approaches reduce both the acquisition time and cost, the user of such a system should be cooperative, which restricts its use. This may explain the fact that little literature is available on this topic. In [44], the authors present a face recognition system based on range data template matching. The range data is segmented into four surface regions which are then normalized using the location of the eyes, nose and mouth. The volume between two surfaces is used as a distance measure. In [45] the face recognition system uses features extracted from range and curvature data. Examples of features are the left and right eye width, the head width, etc., but also the maximum Gaussian curvature on the nose ridge, the average minimum curvature on the nose ridge, etc. In [46], the authors apply and extend traditional 2-D face recognition algorithms (Eigenfaces and HMM-based face recognition [47]) to range data. More recently, in [48] point signatures are used as features for 3-D face recognition. These feature points are projected into a subspace using PCA.
3.3. Facial Thermogram
The facial heat emission patterns can be used to characterize a person. These patterns depend on nine factors including the location of major blood vessels, the skeleton thickness, the amount of tissue, muscle, and fat [49]. IR face images have the potential for a good biometric as this signature is unique (even identical twins do not share the same facial thermogram)
and it is supposed to be relatively stable over time. Moreover, it cannot be altered through plastic surgery. The acquisition is done with an infrared (IR) camera. Hence, it does not depend on the lighting conditions, which is a great advantage over traditional facial recognition. However, IR imagery is dependent on the temperature and glass is opaque to IR. A preliminary study [50] compared the performance of visible and IR imagery for face recognition and it was shown that there was little difference in performance. However, the authors in [50] did not address the issue of significant variations in illumination for visible images and changes in temperature for IR images.
3.4. Video-Based Recognition
Although it has not been a very active research topic (at least compared to frontal face recognition), video-based face recognition can offer many advantages compared to recognition based on still images:

- Abundant data is available at both enrollment and test time. Actually, one could use video at enrollment time and still images at test time or vice versa (although the latter scenario would perhaps make less sense). However, it might not be necessary to process all this data and one of the tasks of the recognition system will be the selection of an optimal subset of the whole set of images which contains the maximum amount of information.
- With sequences of images, the recognition system has access to dynamic features which provide valuable information on the behavior of the user. For instance, the BioID system [14] makes use of the lip movement for the purpose of person identification (in conjunction with face and voice recognition). Also, dynamic features are generally more secure against fraud than static features as they are harder to replicate.
- Finally, the system can try to build a model of the face by estimating the 3-D depth of points on objects from a sequence of 2-D images, which is known as structure from motion [11].
Video-based recognition might be extremely useful for covert surveillance, for instance in airports. However, this is a highly challenging problem as the system should work in a non-cooperative scenario and the quality of surveillance video is generally poor and the resolution is low.
4. Multimodality
Reliable biometric-based person authentication systems, based for instance on iris or retina recognition, already exist, but the user acceptance for such systems is generally low and they should be used only in high security scenarios. Systems based on voice or face recognition generally have a high user acceptance but their performance is not satisfying enough. Multimodality is a way to improve the performance of a system by combining different biometrics. However, one should be extremely careful about which modalities should be combined (especially, it might not be useful to combine systems which have radically different performances) and how to combine them. In the following, we will briefly describe the possible multimodality scenarios and the different ways to fuse the information.

4.1. Different Multimodality Scenarios
We use here the exhaustive classification introduced in [51]:

(1) multiple biometric systems: consists in using different biometric attributes, such as the face, voice and lip movement [14]. This is the most commonly used sense of the term multimodality.
(2) multiple sensors: e.g. a camera and an infrared camera for face recognition.
(3) multiple units of the same biometric: e.g. fusing the result of the recognition of both irises.
(4) multiple instances of the same biometric: e.g. in video-based face recognition, fusing the recognition results of each image.
(5) multiple algorithms on the same biometric capture.
We can compare these scenarios in terms of the expected increase of performance of the system over the monomodal systems versus the increase of the cost of the system, which can be split into additional software and hardware costs. In terms of the additional amount of information and thus in the expected increase of the performance of the system, the first scenario is the richest and scenarios (4) and (5) are the poorest ones. The amount of information brought by scenario (2) is highly dependent on the difference between the two sensors. Scenario (3) can bring a large amount of information as, for instance, the two irises or the ten fingerprints of the same person are different. However, if the quality of a fingerprint is low for a person, e.g. because of a manual activity, then the quality of the other
fingerprints is likely to be low. The first two scenarios clearly introduce an additional cost as many sensors are necessary to perform the acquisitions. For scenario (3) there is no need for an extra sensor if captures are done sequentially. However, this lengthens the acquisition time, which makes the system less user-friendly. Finally, scenarios (1) and (5) induce an additional software cost as different algorithms are necessary for the different systems.

4.2. Information Fusion
As stated at the beginning of this section, multimodality improves the performance of a biometric system. The word performance includes both accuracy and efficiency. The assumption which is made is that different biometric systems make different types of errors and thus that it is possible to use the complementary nature of these systems. This is a traditional problem of decision fusion [53]. Fusion can be done at three different levels [52] (by increasing order of available information):

- At the abstract level, the output of each classifier is a label such as the ID of the most likely person in the identification case or a binary answer such as accept/reject in the verification case.
- At the rank level, the output labels are sorted by confidence.
- At the measurement level, a confidence measure is associated to each label.
Commonly used classification schemes such as the product rule, sum rule, min rule, max rule and median rule are derived from a common theoretical framework using different approximations [54]. In [55], the authors evaluated different classification schemes, namely support vector machine (SVM), multi-layer perceptron (MLP), decision tree, Fisher's linear discriminant (FLD) and Bayesian classifier, and showed that the SVM- and Bayesian-based classifiers had a similar performance and outperformed the other classifiers when fusing face and voice biometrics. In the identification mode, one can use the complementary nature of different biometrics to speed up the search process. Identification is generally performed in a sequential mode. For instance, in [56] identification is a two-step process: face recognition, which is fast but unreliable, is used to obtain an N-best list of the most likely persons, and fingerprint recognition, which is slower but more accurate, is then performed on this subset.
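A sketch of measurement-level fusion with these fixed rules is given below, assuming each monomodal system outputs a score normalized to [0, 1] for the same claim.

```python
# Fixed-rule score fusion (sum, product, min, max, median).
import numpy as np

def fuse_scores(scores, rule="sum"):
    """scores: 1-D array of per-modality scores for one verification claim."""
    s = np.asarray(scores, dtype=float)
    rules = {
        "sum": s.mean,        # sum/average rule
        "product": s.prod,    # product rule
        "min": s.min,
        "max": s.max,
        "median": lambda: float(np.median(s)),
    }
    return float(rules[rule]())

# e.g. face score 0.7 and voice score 0.9 for the same claimed identity:
fused = fuse_scores([0.7, 0.9], rule="product")   # 0.63
```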
5. Summary
We introduced in this paper biometrics, which deals with the problem of identifying a person based on his/her physical and behavioral characteristics. Face recognition, which is one of the most actively researched topics in biometrics, was briefly reviewed. Although huge progress has been made in this field over the past twenty years, research has mainly focused on frontal face recognition from still images. We also introduced the notion of multimodality as a way of exploiting the complementary nature of monomodal biometric systems.
References

1. S. Liu and M. Silverman, "A practical guide to biometric security technology", IT Professional, vol. 3, no. 1, pp. 27-32, Jan/Feb 2001.
2. A. Jain, R. Bolle and S. Pankanti, "Biometrics: personal identification in networked society", Boston, MA: Kluwer Academic, 1999.
3. R. O. Duda, P. E. Hart and D. G. Stork, "Pattern classification", 2nd edition, John Wiley & Sons, Inc.
4. P. J. Phillips, H. Moon, S. Rizvi and P. Rauss, "The FERET evaluation methodology for face recognition algorithms", IEEE Trans. on PAMI, 2000, vol. 22, no. 10, October.
5. K. Messer, J. Matas, J. Kittler and K. Jonsson, "XM2VTSDB: the extended M2VTS database", AVBPA'99, 1999, pp. 72-77.
6. P. J. Phillips, A. Martin, C. L. Wilson and M. Przybocki, "An introduction to evaluating biometric systems", Computer, 2000, vol. 33, no. 2, pp. 56-63.
7. INSPASS, http://www.immigration.gov/graphics/howdoi/inspass.htm
8. O. O'Sullivan, "Biometrics comes to life", Banking Journal, 1997, January.
9. Nuance, http://www.nuance.com
10. Siemens Automotive, http://media.siemensauto.com
11. R. Chellappa, C. L. Wilson and S. Sirohey, "Human and machine recognition of faces: a survey", Proc. of the IEEE, 1995, vol. 83, no. 5, May.
12. J. Daugman, "How iris recognition works", ICIP, 2002, vol. 1, pp. 33-36.
13. B. Moreno, A. Sanchez and J. F. Velez, "On the use of outer ear images for personal identification in security applications", IEEE 3rd Conf. on Security Technology, pp. 469-476.
14. R. W. Frischholz and U. Dieckmann, "BioID: a multimodal biometric identification system", Computer, 2000, vol. 33, no. 2, pp. 64-68, Feb.
15. E. Hjelmas and B. K. Low, "Face detection: a survey", Computer Vision and Image Understanding, 2001, vol. 83, pp. 236-274.
16. M. Kirby and L. Sirovich, "Application of the Karhunen-Loève procedure for the characterization of human faces", IEEE Trans. on PAMI, vol. 12, pp. 103-108, 1990.
17. I. T. Jolliffe, "Principal Component Analysis", Springer-Verlag, 1986.
18. M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces", in IEEE Conf. on CVPR, 1991, pp. 586-591.
19. A. Pentland, B. Moghaddam and T. Starner, "View-based and modular eigenspaces for face recognition", IEEE Conf. on CVPR, pp. 84-91, June 1994.
20. H.-C. Kim, D. Kim and S. Y. Bang, "Face recognition using the mixture-of-eigenfaces method", Pattern Recognition Letters, vol. 23, no. 13, pp. 1549-1558, Nov. 2002.
21. D. S. Turaga and T. Chen, "Face recognition using mixtures of principal components", IEEE Int. Conf. on IP, vol. 2, pp. 101-104, 2002.
22. R. Brunelli and T. Poggio, "Face recognition: features versus templates", IEEE Trans. on PAMI, 1993, vol. 15, no. 10, pp. 1042-1052, Oct.
23. W. J. Welsh and D. Shah, "Facial feature image coding using principal components", Electronic Letters, vol. 28, no. 22, pp. 2066-2067, October 1992.
24. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection", IEEE Trans. on PAMI, vol. 19, pp. 711-720, Jul 1997.
25. K. Etemad and R. Chellappa, "Face recognition using discriminant eigenvectors", ICASSP, vol. 4, pp. 2148-2151, May 1996.
26. D. L. Swets and J. Weng, "Using discriminant eigenfeatures for image retrieval", IEEE Trans. on PAMI, vol. 18, no. 8, pp. 831-836, August 1996.
27. C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition", IEEE Trans. on IP, vol. 11, no. 4, pp. 467-476, Apr 2002.
28. L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin and G.-J. Yu, "A new LDA-based face recognition system which can solve the small sample size problem", Pattern Recognition, vol. 33, no. 10, pp. 1713-1726, October 2000.
29. J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed space?", Pattern Recognition, vol. 36, no. 2, pp. 563-566, February 2003.
30. B. Moghaddam, W. Wahid and A. Pentland, "Beyond eigenfaces: probabilistic matching for face recognition", IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 30-35, April 1998.
31. B. Moghaddam and A. Pentland, "Probabilistic visual learning for object recognition", Int. Conf. on Computer Vision, 1995.
32. J. Zhang, Y. Yan and M. Lades, "Face recognition: eigenface, elastic matching, and neural nets", Proc. of the IEEE, vol. 85, no. 9, Sep 1997.
33. M. Lades, J. C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R. Würtz and W. Konen, "Distortion invariant object recognition in the dynamic link architecture", IEEE Trans. on Computers, 1993, vol. 42, no. 3.
34. C. L. Kotropoulos, A. Tefas and I. Pitas, "Frontal face authentication using discriminant grids with morphological feature vectors", IEEE Trans. on Multimedia, vol. 2, no. 1, pp. 14-26, March 2000.
35. A. Tefas, C. Kotropoulos and I. Pitas, "Using support vector machines to enhance the performance of elastic graph matching for frontal face recognition", IEEE Trans. on PAMI, vol. 23, no. 7, pp. 735-746, Jul 2001.
36. L. Wiskott, J. M. Fellous, N. Krüger and C. von der Malsburg, "Face recognition by elastic bunch graph matching", IEEE Trans. on PAMI, vol. 19, no. 7, pp. 775-779, July 1997.
37. B. S. Manjunath, R. Chellappa and C. von der Malsburg, "A feature based approach to face recognition", Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 373-378, 1992.
38. N. Krüger, "An algorithm for the learning of weights in discrimination functions using a priori constraints", IEEE Trans. on PAMI, vol. 19, no. 7, Jul 1997.
39. B. Duc, S. Fischer and J. Bigün, "Face authentication with Gabor information on deformable graphs", IEEE Trans. on IP, vol. 8, no. 4, Apr 1999.
40. F. Perronnin, J.-L. Dugelay and K. Rose, "Deformable face mapping for person identification", ICIP, 2003.
41. C. Wu and J. Huang, "Human face profile recognition by computer", Pattern Recognition, vol. 23, pp. 255-259, 1990.
42. T. Aibara, K. Ohue and Y. Matsuoka, "Human face recognition of P-type Fourier descriptors", SPIE Proc., vol. 1606: Visual Communication and Image Processing, 1991, pp. 198-203.
43. Y. Gao and M. Leung, "Human face profile recognition using attributed string", Pattern Recognition, vol. 35, pp. 353-360.
44. G. Gordon, "Face recognition based on depth maps and surface curvature", SPIE Proc., vol. 1570, pp. 234-247, 1991.
45. G. Gordon, "Face recognition based on depth and curvature features", IEEE Conf. on CVPR, 1992, pp. 808-810, 15-18 Jun.
46. B. Achermann, X. Jiang and H. Bunke, "Face recognition using range images", VSMM, 1997, pp. 129-136, 10-12 Sep.
47. F. S. Samaria, "Face recognition using hidden Markov models", Ph.D. thesis, University of Cambridge, 1994.
48. Y. Wang, C.-S. Chua and Y.-K. Ho, "Facial feature detection and face recognition from 2D and 3D images", Pattern Recognition Letters, 2002, vol. 23, pp. 1191-1202.
49. M. Lawlor, "Thermal pattern recognition systems faces security challenges head on", Signal Magazine, 1997, November.
50. J. Wilder, P. J. Phillips, C. Jiang and S. Wiener, "Comparison of visible and infra-red imagery for face recognition", Int. Conf. on Automatic Face and Gesture Recognition, 1996, pp. 182-187, 14-16 Oct.
51. S. Prabhakar and A. Jain, "Decision-level fusion in biometric verification", Pattern Recognition, 2002, vol. 35, no. 4, pp. 861-874.
52. R. Brunelli and D. Falavigna, "Person identification using multiple cues", IEEE Trans. on PAMI, 1995, vol. 17, no. 10, pp. 955-966, Oct.
53. B. V. Dasarathy, "Decision fusion", IEEE Computer Society Press, 1994.
54. J. Kittler, M. Hatef, R. Duin and J. Matas, "On combining classifiers", IEEE Trans. on PAMI, 1998, vol. 20, no. 3, pp. 226-239.
55. S. Ben-Yacoub, Y. Abdeljaoued and E. Mayoraz, "Fusion of face and speech data for person identity verification", IEEE Trans. on NN, 1999, vol. 10, no. 5, Sept.
56. L. Hong and A. Jain, "Integrating faces and fingerprints for personal identification", IEEE Trans. on PAMI, 1998, vol. 20, no. 12, pp. 1295-1307.
THE USE OF IMAGE ANALYSIS IN THE EARLY DIAGNOSIS OF ORAL CANCER

R. SERPICO, M. PETRUZZI AND M. DE BENEDITTIS
Department of Odontostomatology and Surgery
University of Bari, p.zza G. Cesare 11 - Bari - ITALY
E-mail: r.serpico@doc.uniba.it
Oral squamous cell carcinoma (OSCC) is a malignant neoplasm with a poor prognosis. Regardless of the site where the disease arises, in several cases OSCC is not detected early by clinicians. Moreover, diagnostic delay worsens the prognosis. In the literature, several image analysis tools with variable specificity and sensitivity have been proposed in order to detect OSCC. Lesional autofluorescence analysis of OSCC has proved effective, although different methods have been used to evoke the fluorescence. On the other hand, vital staining, such as toluidine blue, requires only a clinical assessment of the degree of staining to detect the lesions. No studies have been performed using a computerized analysis of OSCC images or neural networks. The screening tool for early OSCC detection should be inexpensive, easy to use and reliable. We hope that information technology will be applied to the analysis of OSCC lesions to make diagnosis earlier and thus improve the prognosis.
1. Definition and epidemiology of oral carcinoma

Recently, it has been estimated that oral squamous cell carcinoma (OSCC) represents 3% of all malignant neoplasms. OSCC usually affects more men than women, so it is considered the 6th most frequent malignant tumour in males and the 12th in females. In the U.S.A. about 21,000 new cases of OSCC are diagnosed every year and 6,000 people die because of this disease. In the last decade the incidence of OSCC has continued to grow, causing a worrying increase in the number of individuals under 30 affected by oral carcinoma. A serious concern is the prognosis of these patients. If the neoplasm is detected within its 1st or 2nd stage, the probability of surviving for five years is 76%. This value goes down to 41% if the malignant tumour is diagnosed within its 3rd stage. Only 9% of patients are still alive five years after an OSCC diagnosis made during its 4th stage. The diagnostic delay is caused by different reasons:
- carcinoma development: during its manifestation OSCC does not reveal any particular symptom or pain, so the patient tends to ignore the lesion and rarely goes to the dentist to ask for a precise diagnosis;
- the polymorphism that oral lesions often show: for example, an ulcer can appear similar to a trauma, a major aphtha or a carcinoma;
- the doctors in charge, who are not used to examining the oral cavity during routine check-ups: recent research has shown that a person suffering from mucous lesions in the oral cavity goes first to his family doctor, who advises him to have a dermatological visit.

Usually, a carcinoma is detected about 80 days after its first symptoms, and this delay is partly responsible for the short OSCC prognosis.
2. Fluorescence methodologies
Tissue autofluorescence optical spectroscopy is a sensitive, non-invasive methodology, easy to use and capable of detecting possible alterations of the tissue. Autofluorescence results from the presence of porphyrin connected with neoplasm growth. The fluorescence given out by sound tissues reveals a colour different from the one observed on tissues affected by carcinoma. This autofluorescence can also be stimulated by irradiation with a laser, xenon light or halogen lamps.
Fig. 1. Fluorescence of OSCC localized at the border of the tongue. (Oral Oncology 39 (2003) 150-156.)

Recently, a particular program has been presented which permits us to read digitalized images of fluorescing lesions. This system uses the following operating algorithm (a simplified sketch of such a pipeline is given below):
1. RGB FLUORESCENCE IMAGE
2. CONTRAST ENHANCEMENT
3. HUE EXTRACTION
4. HISTOGRAM THRESHOLDING
5. SEGMENTATION
6. QUANTITATIVE PARAMETERS EXTRACTION
7. DIAGNOSTIC ALGORITHM
8. COMPARE WITH GOLD STANDARD
9. TISSUE CLASSIFICATION

These methodologies reveal a high sensitivity (about 95%) but a specificity of 51-60%. The scientific literature reports some research on the use of neural networks able to make a good judgement on the autofluorescence produced by dubious lesions. Using these neural networks, it is possible to distinguish a sound tissue from a neoplastic one with a sensitivity of 86% and a specificity of 100%. In reality, it has been shown that these methodologies are ineffective because they are not able to identify the various mucous areas with their different dysplasia levels.
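The listed pipeline might be sketched as follows with standard image processing primitives. The colour spaces, the Otsu threshold on the hue channel and the area-based diagnostic rule are illustrative assumptions, not the published system.

```python
# Simplified fluorescence-image reading: contrast enhancement, hue extraction,
# thresholding, segmentation and per-region quantitative parameters.
import cv2
import numpy as np

def analyse_fluorescence(rgb_image):
    # rgb_image is assumed to be an 8-bit RGB array.
    # 2. contrast enhancement (on the lightness channel)
    lab = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2LAB)
    lab[:, :, 0] = cv2.equalizeHist(lab[:, :, 0])
    enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2RGB)
    # 3. hue extraction
    hue = cv2.cvtColor(enhanced, cv2.COLOR_RGB2HSV)[:, :, 0]
    # 4.-5. histogram thresholding (Otsu) and segmentation into regions
    _, mask = cv2.threshold(hue, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    # 6. quantitative parameters for each candidate region (area, mean hue)
    params = []
    for i in range(1, n_labels):                      # label 0 is the background
        region = labels == i
        params.append({"area": int(stats[i, cv2.CC_STAT_AREA]),
                       "mean_hue": float(hue[region].mean())})
    # 7. a placeholder diagnostic rule on the extracted parameters
    suspicious = [p for p in params if p["area"] > 100]
    return mask, params, suspicious
```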
Fig. 2. Example of mean neural network input curves grouped according to the clinical diagnosis. (Oral Oncology 36 (2000) 286-293.)
Onizawa and his collaborators have tested the use of fluorescence methodologies on 55 patients suffering from OSCC. According to their research, 90% of the cases analysed were positive to fluorescence, and they found that the sensitivity and specificity of the methodology increase with the lesion stage.
3. Toluidine blue
Toluidine blue is a metachromatic vital stain. Years ago it was employed by gynaecologists, but today it is considered a good methodology for diagnosing OSCC.
Because the dye has a particular affinity for acidic material, it can bind directly to the genetic material (DNA, RNA) of cells that keep on reproducing, so the increased DNA and RNA synthesis of neoplastic clones becomes visible where the neoplasm grows. This methodology is easy, inexpensive and does not cause any physical discomfort. The patient must only rinse his oral cavity with acetic acid (1%) in order to remove cellular residues and whatever covers the lesion; toluidine blue (1%) is then applied to the lesion for 30 seconds.
Fig. 3. Example of neoplastic lesion stained by using toluidine blue. Areas with more active mitosis stain more with toluidine blue.
The patient then rinses the lesion again with acetic acid to remove the excess, unfixed colour. At this point the clinician can assess the lesion according to the colour, even though the OSCC diagnosis depends largely on the histology report. The coloured lesion can then be defined as:
a) TRUE POSITIVE: the lesion has absorbed the colour and is an OSCC from a histological point of view;
b) FALSE POSITIVE: the lesion has absorbed the colour but is not an OSCC from a histological point of view;
c) TRUE NEGATIVE: the lesion does not absorb the colour and is not an OSCC from a histological point of view;
d) FALSE NEGATIVE: the lesion does not absorb the colour but is an OSCC from a histological point of view.
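These four outcomes are what the sensitivity and specificity figures quoted in this paper are computed from; a minimal sketch follows, with purely hypothetical counts.

```python
# Hedged sketch: how the four outcomes above yield sensitivity and specificity.
# The counts in the example call are hypothetical.
def sensitivity_specificity(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)  # fraction of true OSCC lesions that stain
    specificity = tn / (tn + fp)  # fraction of benign lesions that do not stain
    return sensitivity, specificity

print(sensitivity_specificity(tp=19, fp=8, tn=12, fn=1))
```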
Fig.4. Example of traumatic lesion: even though stained by toluidine blue, the lesion is not a carcinoma (false positive).
In reality, this methodology is sensitive but not particularly specific. The number of lesions that stain even though they are not cancerous is large. The scientific literature reports several studies on the reliability of this methodology. The case histories reveal encouraging data about the diagnostic power of toluidine blue, but no study has yet considered a digital reading of the lesion. Employing digital methodologies could make this test more reliable, for example by exploiting blue gradations invisible to the naked eye. The digital reading of lesions stained with toluidine blue aims to offer dentists another diagnostic tool. It is inexpensive, easy to use and non-invasive, so it can be used routinely as a screening test for patients who regularly visit the dentist. Moreover, this methodology makes it possible to send the digital images on-line to specialized centres for further consultation. At present there is no screening methodology with a sensitivity and specificity of 100%. However, the use of data processing systems improves the reliability of diagnostic methodologies and offers an objective analysis.
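A minimal sketch of such a digital reading is given below; it is not taken from any of the cited studies, and the choice of the blue channel and of a 16-bin histogram is an assumption made for illustration only.

```python
# Illustrative sketch of a digital reading of a stained lesion:
# mean blue intensity and a blue-intensity histogram over a lesion mask.
import numpy as np

def blue_uptake(rgb_image, lesion_mask):
    """Mean and normalised histogram of blue intensity over the stained area."""
    blue = rgb_image[:, :, 2].astype(float)
    values = blue[lesion_mask > 0]
    hist, _ = np.histogram(values, bins=16, range=(0, 255))
    return values.mean(), hist / hist.sum()
```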
4. Conclusions
The scientific literature reports no trials comparing the efficacy of the different image analysis methodologies used in OSCC diagnosis. We hope that a univocal, reliable and inexpensive methodology for reading the lesion will come into use. Information technology should support clinical diagnosis and could be the ideal way to achieve an early diagnosis. This would improve the prognosis and strengthen the relationship between medicine and computer science.
Acknowledgments
Authors are grateful to Annalisa Chiala for reviewing this paper.
References
1. Benjamin S, Aguirre A and Drinnan A, Dent. Today. 21(11):116 (2002).
2. Llewellyn CD, Johnson NW and Warnakulasuriya KA, Oral. Oncol. 37(5):401 (2001).
3. Neville, Damm, Allen, Bouquot: Oral & Maxillofacial Pathology. Saunders Press, 2nd Edition, USA (2002).
4. Onizawa K, Okamura N, Saginoya H and Yoshida H. Oral. Oncol. 39(2):150 (2003).
5. Onofre MA, Sposto MR and Navarro CM. Oral. Surg. Oral. Med. Oral. Pathol. Oral. Radiol. Endod. 91(5):535 (2001).
6. Porter SR and Scully C. Br. Dent. J. 25;185(2):72 (1998).
7. Reichart PA. Clin. Oral. Investig. 5(4):207 (2001).
8. van Staveren HJ, van Veen RL, Speelman OC, Witjes MJ, Star WM and Roodenburg JL. Oral. Oncol. 36(3):286 (2000).
9. Zheng W, Soo KC, Sivanandan R and Olivo M. Int. J. Oncol. 21(4):763 (2002).
LUNG EDGE DETECTION IN POSTERO-ANTERIOR CHEST RADIOGRAPHS
PAOLA CAMPADELLI
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
E-mail: campadelli@dsi.unimi.it
ELENA CASIRAGHI
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
E-mail: casiraghi@dsi.unimi.it
The use of image processing techniques and Computer Aided Diagnosis (CAD) systems has proved to be effective for the improvement of radiologists' diagnosis, especially in the case of lung nodule detection. The first step for the development of such systems is the automatic segmentation of the chest radiograph in order to extract the area of the lungs. In this paper we describe our segmentation method, whose result is a close contour which strictly encloses the lung area.
1. Introduction
In the field of medical diagnosis a wide variety of imaging techniques is currently available, such as radiography, computed tomography (CT) and magnetic resonance imaging (MRI). Although the last two are more precise and more sensitive techniques, chest radiography is still by far the most common procedure for the initial detection and diagnosis of lung cancer, due to its non-invasive nature, radiation dose and economic considerations. Studies such as [20] and [11] explain why the chest radiograph is one of the most challenging radiographs to produce technically and to interpret diagnostically. When radiologists rate the severity of abnormal findings, large interobserver and intraobserver differences occur. Moreover several studies in the last two decades, as for example [8] and [2], calculated an average miss rate of 30% for the radiographic detection of early lung nodules by humans. In a large lung cancer screening program 90% of peripheral lung cancers have been found to be visible in radiographs produced earlier than the date of the cancer discovery by the radiologist. These results showed the potential for improved early diagnosis, suggesting the use of computer programs for radiograph analysis. Moreover the advent of digital thorax units and digital radiology departments with Picture Archiving Communication Systems (PACS) makes it possible to use computerized methods for the analysis of chest radiographs on a routine basis. The use of image processing techniques and Computer Aided Diagnosis (CAD) systems has proved to be effective for the improvement of radiologists' detection accuracy for lung nodules in chest radiographs, as reported in [15]. The first step of an automatic system for lung nodule detection, and in general for any further analysis of chest radiographs, is the segmentation of the lung field, so that all the algorithms for the identification of lung nodules are applied just to the lung area. The segmentation algorithms proposed in the literature to identify the lung field can be grouped into: rule based systems ([1], [21], [22], [7], [4], [14], [5], [3]), pixel classification methods including neural networks ([13], [12], [9], [16]) and Markov random fields ([18] and [19]), active shape models ([6]) and their extensions ([17]). In this paper we describe an automatic segmentation method which identifies the lung area in postero-anterior (PA) digital radiographs. Since the method is thought of as the first step of an automatic lung nodule detection algorithm, we choose to include in the area of interest also the bottom of the chest and the region behind the heart; they are usually excluded by the methods presented in the literature. Besides, we tried to avoid all kinds of assumptions such as the position and orientation of the thorax: we work with images where the chest is not always located in the central part of the image, it can be tilted and it can have structural abnormalities. The method is made of two steps. First, the lungs are localized using simple techniques (section 4), then their borders are more accurately defined and fitted with curves and lines in order to obtain a simple close contour (section 5).
2. Materials
Our database currently contains 111 radiographs of patients with no disease and 13 of patients with lung nodules. They have been acquired in the Department of Radiology of the Niguarda Hospital in Milan. The images were digitized with a 0.160 mm pixel size, a maximum matrix size of 2128 by 2584, and 4096 grey levels. Before processing they have been downsampled to a dimension of 300 by 364 pixels, and filtered with a median filter of 3 pixel size. In the following sections we will refer to these images as the original images.

3. Coarse lung border detection
3.1. Iterative thresholding
Since both the background of the image and the central part of the lungs are characterized by the highest grey values, while the tissues between them are very dark, we use an iterative thresholding technique to obtain a first classification of the pixels as belonging to lung, body or background regions. Before applying the thresholding procedure, we enhance the image contrast by means of a non-linear extreme value sharpening technique:
GN(x, y) = max   if |max - G(x, y)| <= |min - G(x, y)|
GN(x, y) = min   otherwise                                    (1)
where min and max are the minimum and maximum grey values computed on a window Win(x, y) centered in (x, y). The window size used is 5 pixels. We chose this operator because it has the effect of increasing the contrast where the boundaries between objects are characterized by gradual changes in the grey levels. In the case of chest radiographs we often find this situation in the peripheral area of the lung and sometimes in the top regions and costophrenic angles. We then perform a linear transformation on the enhanced image with 4096 grey levels, to get an image with 256 grey levels, and start the iterative thresholding at an initial high threshold value of 235. At each step we lower the threshold by 1 and classify the regions formed by the pixels with grey value higher than the threshold into background and lung regions. We consider background regions those attached to the borders of the image or those at a distance of 1 pixel from other border regions; the others are identified as lung. The algorithm stops when two regions classified differently at the previous step fuse.
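A minimal sketch of the sharpening operator of Eq. (1) is given below, assuming SciPy is available; the function name is illustrative.

```python
# Direct sketch of the extreme value sharpening of Eq. (1), 5x5 window.
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def extreme_value_sharpen(gray, window=5):
    g_max = maximum_filter(gray, size=window)
    g_min = minimum_filter(gray, size=window)
    # Assign each pixel to whichever window extreme it is closer to.
    closer_to_max = (g_max - gray) <= (gray - g_min)
    return np.where(closer_to_max, g_max, g_min)
```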
To obtain a finer approximation of the lung region we repeat the described iterative procedure three times; each time the input is the original 8-bit image where the lung pixels found at the previous iteration are set to 0. In [Fig.1] (left) a lung mask image is shown. The background is red coloured, the body part is black, the lung regions are blue.
3.2. Edge detection
At this stage we look for rough lung borders. To obtain an initial edge image (see [Fig.1] (center)) we use the simple but efficient Sobel operator, select the 18% of the pixels with the highest gradient and delete those corresponding to the background. We then maintain only the connected edge pixel regions which intersect the lung region previously identified. To delete, or to separate from the lung borders, edge pixels belonging to other structures such as collarbones, neck, or clavicles we use a morphological opening operator. The regions disconnected either from the lung mask border or from the selected edges are eliminated if their localisation satisfies one of the following conditions: they are attached to the borders of the image or to background regions, their bottommost pixel is located over the topmost pixel of the lung regions, or they are totally located in the space between the two lung areas. If the area covered by the remaining edge pixels is less extended than the one occupied by the lung mask, we look for new edge pixels in the lung regions. This is done by considering in the initial edge image a bigger percentage of pixels with the highest grey value and adding them until either the edge pixels cover the whole lung area or the percentage reaches a value of 40%. In [Fig.1] we show an example of the initial edge image (center) and the extracted lung edge image, E (right). As can be seen, further processing is necessary since some lung borders may still be missing (the top or bottom parts, the costophrenic angles, ...), or wrong edge pixels (belonging to the neck or collarbones) can still be present. To solve this problem we search for the axis of the thorax. We can thus delete, if they are present, the edges belonging to the neck or collarbones and establish whether the thorax has a non-vertical position.
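The construction of the initial edge image can be sketched as follows; the SciPy-based implementation and the helper names are assumptions, not the authors' code.

```python
# Sketch of the initial edge image: Sobel gradient magnitude, keep the 18%
# of pixels with the highest gradient, discard those on the background.
import numpy as np
from scipy.ndimage import sobel

def initial_edge_image(gray, background_mask, keep_fraction=0.18):
    gray = gray.astype(float)
    gx, gy = sobel(gray, axis=1), sobel(gray, axis=0)
    magnitude = np.hypot(gx, gy)
    threshold = np.quantile(magnitude, 1.0 - keep_fraction)
    edges = magnitude >= threshold
    edges[background_mask] = False
    return edges
```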
Figure 1. lung mask image, initial edge image and edge image
3.3. Axis finder
To find the axis of the chest we use a binary image obtained by an OR operation between the lung edge image, E, and the lung mask image. For each horizontal line of this new image, we find the pixel in the center of the segment connecting the leftmost and rightmost pixels and mark it if the extremes of the segment do not belong to the same lung region. Moreover, we consider the inclination of the line connecting one central pixel (x0, y0) to the following one (x1, y1) and discard it if the value (y1 - y0)/(x1 - x0) is less than 1.5; a lower value means that probably (x1, y1) has been computed from two outermost pixels that are not symmetric with respect to the real axis. The Hough transform for lines, and a polynomial fitting method that minimizes the chi-square error statistic, are used to find two possible axes of the image. The one that fits the central pixels better is then chosen as the chest axis. In [Fig.2] (left) the central points used to find the axis and the corresponding lateral points are marked in blue and red respectively; on the right the dilated axis is shown.
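A simplified sketch of the axis-finder input is given below; it omits the check on the lung-region membership of the two extremes and uses illustrative names.

```python
# Candidate central points of the thorax, one per row, with the slope test.
import numpy as np

def axis_candidate_points(mask, min_slope=1.5):
    points = []
    for y in range(mask.shape[0]):
        xs = np.flatnonzero(mask[y])
        if xs.size >= 2:
            points.append(((xs[0] + xs[-1]) // 2, y))  # midpoint of the row segment
    kept = points[:1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # keep the point only if the segment to the previous centre is steep
        # enough (roughly aligned with a vertical axis)
        if x1 == x0 or abs((y1 - y0) / (x1 - x0)) >= min_slope:
            kept.append((x1, y1))
    return kept  # these points are then fitted with a line (Hough / chi-square)
```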
3.4. Edge refinement
The axis found is usually located in the center of the dorsal column. This fact allows us to delete edges in E that belong to the dorsal column or to the neck. They are typically small edge regions (with fewer than 200 pixels), crossing the axis itself or, more often, located in a region around it. We defined this region as a stripe whose width is 1/25 of the width of the original image (see [Fig.2] on the right). We then delete all the regions with fewer than 200 pixels that cross this stripe. If some lung edge is wrongly
Figure 2. axis points and neck stripe
cancelled it will be recovered in the next steps. It can happen that the top parts of the lungs are detected by the Sobel operator but are not included in the lung edge image E because in the lung mask they are not labelled as lung regions. The axis can help to verify this condition since the apex point of the lung should be located close to it. Consider the left lung (in the image): let (xp, yp) be the coordinates of the leftmost edge pixel with the lowest y coordinate, and let (xa, ya) be the coordinates of the axis in the same row; if |xp - xa| is bigger than 1/4 of the total image width, we add those pixels of the initial edge image that are contained in a stripe extending from xp to xa, with a height of yp/10. The same operation is done for the right lung. We can also verify a symmetry condition between the two lung top pixels; if more than one pixel with the lowest y coordinate is found on each side, the central one is taken. We evaluate the Euclidean distance between one top pixel and the symmetric of the other with respect to the axis; if this distance is greater than 20 we are allowed to think that there is no symmetry between the lung edges found, and that the wrong top pixel is the one with the higher vertical coordinate. We therefore use this top pixel and the symmetric of the other one as vertices of a rectangular search area in the initial edge image, and add the edge pixels found to E. The bottom parts of the lungs are often characterized by very low contrast and therefore also in this region we look for edge pixels to be added to E. In this case we use more accurate edge detectors, such as directional Gaussian filters. We limit the processing to a stripe centered around the bottommost edge pixel and with a height fixed at 1/8 of the vertical dimension of the original image. We work separately on the left and right lung sub-images, applying a locally adaptive scaling operator described in [10], followed by histogram equalisation. On these enhanced data we search in the left lung for edges oriented at 90° and 45°, and in the right
lung for those oriented at 90° and 135°. We filter the image with a Gaussian filter at scale σ, related to the stripe dimension, take the vertical derivative and maintain the 5% of the pixels with the highest gradient value. These edge pixels, which often belong to the lung borders, are added to the edge image. Since the costophrenic angle can still be missing we filter the image at a finer scale σ/2, take the derivative at 135° and 45° (depending on the side) and maintain the 10% of the edge pixels. A binary image that may represent the costophrenic angles is obtained combining this information with the 10% of the pixels with the highest value in the vertical direction. The regions in the binary image just created are added to the lung edge image E if they touch, or are attached to, some edge pixels in it. At this stage most of the edge pixels belonging to the lung borders should have been determined; the image can hence be reduced defining a rectangular bounding box slightly greater than the lung area defined by the lung edge image E.

4. Lung area delineation
4.1. Final contour refinement
To obtain more precise and continuous contours we process the reduced image, but with 4096 grey levels. We enhance it with a locally adaptive scaling algorithm and apply histogram equalization to the result. On the grey level enhanced image we identify the pixels that in the lung edge image E constitute the lung extremes; for each side they are the leftmost and rightmost pixels in each row and the topmost and bottommost pixels for each column (they are red coloured in [Fig.3] (left)). These are the seeds of the following region growing procedure: for each seed with grey value G(x, y), we select in its 8-neighborhood, and add to E, all the pixels in the range [G(x, y) - 10, G(x, y) + 10]. If their number is greater than 4 we select the pixel whose grey value is closest to G(x, y) and iterate the procedure unless a background pixel is identified, or the selected element is another seed, or 20 iteration steps have been done. This procedure creates thick contours that now reach the external border of the lung, often much better defined especially at the top and bottom; however the lateral lung contours are often still discontinuous, especially in the right lung (see also [Fig.3] (center)). We improve their definition by calculating the horizontal derivative of the enhanced image, and keeping the 15% of the pixels with the maximum value for the right lung, and 10% for the left. We then delete those pixels internal to the lung or background regions; the
regions in this image intersecting edge pixels are added to the lung edge image (the result of this addition is shown in [Fig.3] (right)).
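The seeded growing step can be sketched as follows; this is a simplification of the procedure described above (for instance it does not test whether the selected element is another seed), with illustrative function and argument names.

```python
# Simplified sketch of the seeded region growing: from each extreme (seed)
# pixel, grow through 8-neighbours whose grey value stays within +/-10 of the
# seed's value, for at most 20 steps or until a background pixel is reached.
import numpy as np

def grow_from_seed(gray, edge, background, seed, tolerance=10, max_steps=20):
    h, w = gray.shape
    y, x = seed
    reference = int(gray[y, x])
    for _ in range(max_steps):
        neighbours = [(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)
                      and 0 <= y + dy < h and 0 <= x + dx < w]
        candidates = [(ny, nx) for ny, nx in neighbours
                      if abs(int(gray[ny, nx]) - reference) <= tolerance]
        if not candidates:
            return
        for ny, nx in candidates:
            edge[ny, nx] = True            # thicken the contour in E
        # continue from the candidate whose grey value is closest to the seed's
        y, x = min(candidates, key=lambda p: abs(int(gray[p]) - reference))
        if background[y, x]:
            return
```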
Figure 3. enhanced image with the seed points, edge image after growing, edge image after the last regions added
At this point we can define the close contour of the area containing the lungs, fitting the borders found with curves and lines. We describe the operation on the left lung only, referring to the binary image of its edges as the left edge image El. We noticed that the shape of the top part of the lung could be well fitted by a second order polynomial function. To find it we use the Hough transform to search for parabolas, applied to the topmost points of each column in El. The fitted parabola is stopped, on the right side of its vertex, at the point where it crosses a line parallel to the axis and passing through the rightmost pixel; on the left side it is stopped where it crosses the left edge image; if more than one point is found we select the one with the lowest y coordinate. To find a close contour approximating the lateral borders we consider the set U composed by selecting, for each row in El, the leftmost pixel if it is located at the left side of the top one. Since we noticed that the orientation of the left border can change from the top to the bottom, we extracted from U three subsets u1, u2, u3 with an equal number of elements, containing the points located respectively in the upper, central and bottom part of the image. These subsets are fitted separately with different functions. We use one parabola to fit the points in u1: this allows us to recover errors in case the parabola used to fit the top points was too narrow (in the central image in [Fig.4] an example of this fact is shown). A line is used to fit the points in u2. The set u3 often contains the lateral points of both the lateral border of the lung and the lateral border of the
costophrenic angles; we noticed that in some cases the contours of these borders have different inclinations. We therefore fit the points in the upper and bottom parts of u3 with two different lines. We define as boundary in the bottom part the horizontal line that crosses the bottommost pixel of the edge image.

5. Results
We detected small errors in 4 of the 124 images in our database, where we consider as an error the fact that a part of the lung has not been included by the lung contours defined. The part missed by the algorithm is the border of the costophrenic angle. The algorithm nevertheless proves robust to structural abnormalities of the chest ([Fig.4]). The algorithm has been implemented in IDL, an interpreted language, and, when executed on a Pentium IV with 256 Mb of RAM, it takes from 12 seconds (for images of patients with small lungs that can be cut as described in section 4.4) to 20 seconds (for images of large lungs).
Figure 4. resulting images
References
1. S.G. Armato, M. Giger, and H. MacMahon. Automated lung segmentation in digitized posteroanterior chest radiographs. Academic Radiology, 5:245-255, 1998.
2. J.H.M. Austin, B.M. Romeny, and L.S. Goldsmith. Missed bronchogenic carcinoma: radiographic findings in 27 patients with a potentially resectable lesion evident in retrospect. Radiology, 182:115-122, 1992.
3. M.S. Brown, L.S. Wilson, B.D. Doust, R.W. Gill, and C. Sun. Knowledge-based method for segmentation and analysis of lung boundaries in chest x-ray images. Computerized Medical Imaging and Graphics, 22:463-477, 1998.
4. F.M. Carrascal, J.M. Carreira, M. Souto, P.G. Tahoces, L. Gomez, and J.J. Vidal. Automatic calculation of total lung capacity from automatically traced lung boundaries in postero-anterior and lateral digital chest radiographs. Medical Physics, 25:1118-1131, 1998.
5. D. Cheng and M. Goldberg. An algorithm for segmenting chest radiographs. Proc. SPIE, pages 261-268, 1988.
6. T. Cootes, C. Taylor, D. Cooper, and J. Graham. Active shape models - their training and application. Comput. Vis. Image Understanding, 61:38-59, 1995.
7. J. Duryea and J.M. Boone. A fully automatic algorithm for the segmentation of lung fields in digital chest radiographic images. Medical Physics, 22:183-191, 1995.
8. J. Forrest and P. Friedman. Radiologic errors in patient with lung cancer. West Journal on Med., 134:485-490, 1981.
9. A. Hasegawa, S.-C. Lo, M.T. Freedman, and S.K. Mun. Convolution neural network based detection of lung structure. Proc. SPIE 2167, pages 654-662, 1994.
10. R. Klette and P. Zamperoni. Handbook of image processing operators. Wiley, 1994.
11. H. MacMahon and K. Doi. Digital chest radiography. Clin. Chest Med., 12:19-32, 1991.
12. M.F. McNitt-Gray, H.K. Huang, and J.W. Sayre. Feature selection in the pattern classification problem of digital chest radiographs segmentation. IEEE Trans. on Med. Imaging, 14:537-547, 1995.
13. M.F. McNitt-Gray, J.W. Sayre, H.K. Huang, and M. Razavi. A pattern classification approach to segmentation of chest radiographs. Proc. SPIE 1898, pages 160-170, 1993.
14. E. Pietka. Lung segmentation in digital chest radiographs. Journal of Digital Imaging, 7:79-84, 1994.
15. T. Kobayashi, X.-W. Xu, H. MacMahon, C. Metz, and K. Doi. Effect of a computer-aided diagnosis scheme on radiologists' performance in detection of lung nodules on radiographs. Radiology, 199:843-848, 1996.
16. O. Tsuji, M.T. Freedman, and S.K. Mun. Automated segmentation of anatomic regions in chest radiographs using an adaptive-sized hybrid neural network. Med. Phys., 25:998-1007, 1998.
17. B. van Ginneken. Computer-aided diagnosis in chest radiographs. Ph.D. dissertation, Utrecht Univ., Utrecht, The Netherlands, 2001.
18. N.F. Vittitoe, R. Vargas-Voracek, and C.E. Floyd Jr. Identification of lung regions in chest radiographs using Markov random field modeling. Med. Phys., 25:976-985, 1998.
19. N.F. Vittitoe, R. Vargas-Voracek, and C.E. Floyd Jr. Markov random field modeling in posteroanterior chest radiograph segmentation. Med. Phys., 26:1670-1677, 1999.
20. C.J. Vyborny. The AAPM/RSNA physics tutorial for residents: Image quality and the clinical radiographic examination. Radiographics, 17:479-498, 1997.
21. X.-W. Xu and K. Doi. Image feature analysis for computer aided diagnosis: accurate determination of ribcage boundaries in chest radiographs. Medical Physics, 22:617-626, 1995.
22. X.-W. Xu and K. Doi. Image feature analysis for computer aided diagnosis: accurate determination of right and left hemidiaphragm edges and delineation of lung field in chest radiographs. Medical Physics, 23:1616-1624, 1996.
DISCRETE TOMOGRAPHY FROM NOISY PROJECTIONS
C. VALENTI
Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Via Archirafi 34, 90123 Palermo, Italy
E-mail:
[email protected] The new field of research of discrete tomography will be described in this paper. It differs from standard computerized tomography in the reduced number of projections. It needs ad hoc algorithms, which are usually based on the definition of a model of the object to reconstruct. The main problems will be introduced and an experimental simulation will prove the robustness of a slightly modified version of a well known method for the reconstruction of binary planar convex sets, even in the case of projections affected by quantization error. To the best of our knowledge this is the first experimental study of the stability problem with a statistical approach. Prospective applications include crystallography, quality control and reverse engineering, while biomedical tests, due to their important role, still require further research.
1. Introduction
Computerized tomography is an example of inverse problem solving. It consists of the recovery of a 3D object from its projections. Usually this object is made of materials with different densities and therefore it is necessary to take a number of projections ranging between 500 and 1000. When the object is made of just one homogeneous material, it is possible to reduce the number of projections to no more than four, defining the so-called discrete tomography. In such a case we define a model of the body, assuming its shape. For example, we may know the types of atoms to analyze, the probability of finding holes inside the object and its topology (e.g. successive slices are similar to each other or some configurations of pixels are energetically unstable). Though these assumptions may be useful when considering applications
such as nondestructive reverse engineering, industrial quality control, electron microscopy, X-ray crystallography, data coding and compression, they become almost unacceptable when the data to analyze come from biomedical tests. Nevertheless the requirements imposed by the present technology are too restrictive for real tasks, and the state-of-the-art algorithms mainly allow the reconstruction of simulated images of special shapes. The aim of this work is the description of an extensive simulation to verify the robustness of a modified version of a well known method for the reconstruction of binary planar convex sets. In particular, we will face the stability problem under noisy projections due to quantization error. Section 2 introduces formal notations and basic problems. Section 3 gives a brief description of the algorithm. Section 4 concludes with experimental results and remarks.
2. Basic notations and issues
Discrete tomography differs from computerized tomography in the small variety of density distribution of the object to analyze and in the very few angles of the projections to take. From a mathematical point of view we reformulate this reconstruction problem in terms of linear feasibility (Figure 1):
Ax = p,   A ∈ {0,1}^{m×n},   x ∈ {0,1}^n,   p ∈ N^m,

where the binary matrix A represents the geometric relation between points in Z^2 and the integer valued vector p represents their projections.

Figure 1. A subset of Z^2 and its corresponding linear equation system. The black disks and the small dots represent the points of the object and of the discrete lattice, respectively.
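A small sketch of how such projections can be collected from a binary image is given below; the encoding of a lattice direction as (dx, dy) and the function name are assumptions made for illustration.

```python
# Projections of a binary set along a lattice direction (dx, dy): pixels on
# the same scan line t = dy*x - dx*y contribute to the same sample.
import numpy as np

def projection(binary_image, direction):
    dx, dy = direction
    ys, xs = np.nonzero(binary_image)
    t = dy * xs - dx * ys          # scan-line index of each object pixel
    t -= t.min()                   # shift so indices start at 0
    return np.bincount(t)          # one integer count per scan line

square = np.ones((3, 3), dtype=int)
print(projection(square, (1, 0)))  # horizontal direction: [3 3 3]
print(projection(square, (1, 1)))  # diagonal direction:   [1 2 3 2 1]
```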
Main issues in discrete tomography arise from this dearth of input data. In 1957 a polynomial time method to solve the consistency problem (i.e. the ability to state whether there exists any A compatible with a given p) was presented 4.
The uniqueness problem derives from the fact that different A's can satisfy the same p. For example, two A's with the same horizontal and vertical projections can be transformed one into the other by a finite sequence of switching operations (Figure 2). Moreover, there is an exponential number of hv-convex polyominoes (i.e. 4-connected sets with 4-connected rows and columns) with the same horizontal and vertical projections 5.
Figure 2. Three switches let us get these tomographically equivalent objects.
Lastly, the stability problem concerns how the shape of an object changes while perturbing its projections. In computerized tomography the variation in the final image due to the fluctuation of one projection sample is generally disregarded, since that sample contributes independently, as one of many, to the result, and its effect is therefore distributed broadly across the reconstructed image 6. This is not true in the discrete case, and the first theoretical analysis of the reconstruction of binary objects of whatever shape has proved that this task is unstable and that it is very hard to obtain a reasonably good reconstruction from noisy projections. Here we will describe how our experimental results show that it is possible to recover convex binary bodies from their perturbed projections, still maintaining a low reconstruction error.
’.
3. Reconstruction algorithm
In order to verify the correctness of the algorithm we have generated 1900 convex sets with 10 x 10, 15 x 15, ..., 100 x 100 pixels. A further 100 convex sets with both width and height randomly ranging between 10 and 100 have been considered too. Their projections have been perturbed 1000 times by incrementing or decrementing by 1 the value of some of their samples, randomly chosen. This is to estimate the effect of errors with absolute value 0 ≤ ε ≤ 1, so simulating a quantization error. The number of samples has been decided in a random way, but if we want to let the area of the reconstructed body be constant, we add and subtract the same amount of pixels in all projections.
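The perturbation of the projections can be sketched as follows; the sketch works on a single projection and its interface is an assumption, not the code used for the simulation.

```python
# Hedged sketch of the +/-1 perturbation of one projection; when the area must
# stay constant, the same number of samples is incremented and decremented.
import numpy as np

def perturb(projection, n_changes, keep_area=True, rng=None):
    rng = rng or np.random.default_rng()
    p = projection.copy()
    if keep_area:
        n_changes -= n_changes % 2               # equal numbers of +1 and -1
        signs = np.repeat([1, -1], n_changes // 2)
    else:
        signs = rng.choice([1, -1], size=n_changes)
    idx = rng.choice(p.size, size=len(signs), replace=False)
    p[idx] += signs
    return np.clip(p, 0, None)                   # samples stay non-negative
```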
The algorithm introduced in the literature lets us reconstruct hv-convex polyominoes in polynomial time, starting from a set of pixels, called the spine, that surely belong to the object to be reconstructed. This method makes a rough assumption about the shape of the object and then adds pixels to this core through an iterative procedure based on partial sums of the projection values. Usually the spine covers just a small part of the object and therefore it is necessary to expand it by applying the filling operations (Figure 3). The underlying idea is the recursive constraint of convexity on each line and along each direction till the core of pixels satisfies the projections (Figure 4). Should this not happen, then no convex polyomino is compatible with those projections.
Figure 3. The first two filling operations are not based on the projection value. The circles (+) represent pixels not yet assigned to the core.
We have generalized this algorithm by weakening the convexity constraint. This means that as soon as it is not possible to apply a certain filling operation, due to an inconsistency between the value of the projection and the number of pixels already in the considered line of the core, we skip that line and process the rest of the projection, so reaching a solution that we call non-convex. It may happen that the ambiguity will be reduced when processing the core along other directions. Besides the horizontal and vertical directions, we have also considered the following ones, d = ((1,0), (0,-1), (1,-2), (2,1), (-1,-1), (1,1)), in a number of projections chosen between 2 and 4, according to the sets {{d1,d2}, {d3,d4}, {d5,d6}, {d1,d2,d5}, {d1,d2,d3,d4}, {d1,d2,d5,d6}, {d3,d4,d5,d6}}. The particular directions we used are indicated in the upper right corner of each of the following figures. Since we are dealing with corrupt projections, most of the ambiguity zones are not due to complete switching components. Just in the case of complete switches we link the processing of the remaining not yet assigned pixels to the evaluation of a corresponding boolean 2-CNF formula (i.e. the and of zero or more clauses, each of which is the or of exactly
Figure 4. Convex recovery through {d1,d2,d5}. The spine is shown in the first two steps, the filling operations in the remaining ones. The grey pixels are not yet assigned.
two literals) 10. This complete search has exponential time complexity, but it has been proved that these formulas are very small and occur rarely, especially for big images 11. In order to measure the difference between the input image taken from the database and the obtained one, we have used the Hamming distance (i.e. we have counted the differing homologous pixels), normalized according to the size of the image. Most of the times we have obtained non-convex solutions, for which the boolean evaluation involves a bigger average error. For this reason, we have preferred not to apply the evaluation to the ambiguous zones when they were not due to switching components. We want to emphasize that these pixels take part in the error computation only when compared with those of the object. That is, we treat these uncertain pixels, if any, as belonging to the background of the image.
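The error measure just described can be sketched in a single line of code.

```python
# Hamming distance between original and reconstructed binary images,
# normalised by the image size (fraction of differing homologous pixels).
import numpy as np

def normalised_hamming(original, reconstructed):
    return float(np.mean(original.astype(bool) != reconstructed.astype(bool)))
```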
Figure 5. Non-convex recovery (upper right) from a binarized real bone marrow scintigraphy (left) with 1 pixel added/subtracted along {d3,d4} and without spine. The final reconstructed image (lower right) is obtained by deleting all remaining grey pixels. The input image is utilized and reproduced with permission from the MIR Nuclear Medicine digital teaching file collection at Washington University School of Medicine. MIR and Washington University are not otherwise involved in this research project.
4. Experimental results
This final section summarizes the most important results we obtained, giving also a brief explanation. The average error rate increases when the number of modified samples
increases. Obviously, the more we change the projections, the harder it is for the algorithm to reconstruct the object (Figure 6a). Many non-convex sets suffer from a number of wrong pixels lower than the average error. Although the algorithm could not exactly reconstruct the convex set, the forced non-convex solutions still keep the shape of the original object. For example, about 66.11% of the non-convex solutions, marked in grey, with fixed 100 x 100 size and 1 pixel added/subtracted along directions {d3,d4,d5,d6}, have an error smaller than the 0.34% average error (Figure 6b). In the case of convex solutions, the spine construction lets us reduce the number of unambiguous cells for the successive filling phase. In the case of non-convex solutions, the spine usually assumes an initial object shape that produces solutions very different from the input polyomino. An example of a non-convex set obtained without spine preprocessing is shown in Figure 5. The choice of the horizontal and vertical directions {d1,d2} is not always the best one. For example, {d3,d4} and {d5,d6} let us recover more non-convex solutions with a smaller error. This is due to the higher density of the scan lines, which corresponds to a better resolution. More than two directions improve the correctness of the solutions, thanks to the reduced degree of freedom of the undetermined cells. The following tables concisely report all these results, obtained for objects with 100 x 100 pixels, with or without the spine construction, along different directions and by varying the number of perturbed samples. To the best of our knowledge this is the first experimental study of the stability problem with a statistical approach. Our results give a quantitative estimate both of the probability of finding solutions and of introducing errors at a given rate. We believe that a more realistic instrumental noise should be introduced, considering also that the probability of finding an error with magnitude greater than 1 usually grows in correspondence with the samples with maximum values. Moreover, though the convexity constraint is interesting from a mathematical point of view, at present we are also dealing with other models of objects to reconstruct, suitable for real microscopy or crystallography tools.
Acknowledgements
The author wishes to thank Professor Jerold Wallis 12 for his kind contribution in providing the input image of Figure 5.
Figure 6. a: Average, minimum and maximum error versus number of modified samples, for non-convex solutions with fixed 100 x 100 size, directions {d1,d2} and spine preprocessing. Linear least-squares fits are superimposed. b: Number of non-convex solutions versus error, for fixed 100 x 100 size and 1 pixel added/subtracted along directions {d3,d4,d5,d6} without spine. The dashed line indicates the average error.

Table 1. +1/-1 samples (constant area).
Spine   Average error   Number of solutions
no      0.34%           66.11%
no      0.35%           68.06%
no      0.54%           64.71%
no      0.64%           71.01%
no      0.71%           77.40%
no      0.79%           72.91%
no      1.57%           73.29%
yes     4.81%           38.53%
yes     4.83%           38.03%
yes     5.03%           39.11%
yes     5.44%           37.47%
(Directions column illegible in the source.)
Table 2. Random samples (non constant area).

Spine   Average error   Number of solutions
no      5.43%           67.48%
no      5.66%           69.85%
no      5.71%           69.51%
no      5.86%           58.34%
no      6.24%           62.93%
no      8.53%           58.75%
no      9.84%           75.11%
yes     10.67%          28.42%
yes     10.78%          29.92%
yes     10.87%          28.67%
yes     11.94%          28.32%
(Directions column illegible in the source.)
References
1. KAK A.C. AND SLANEY M., Principles of Computerized Tomography Imaging. IEEE Press, New York, 1988.
2. SHEPP L., DIMACS Mini-Symposium on Discrete Tomography. Rutgers University, September 19, 1994.
3. SCHWANDER P., Application of Discrete Tomography to Electron Microscopy of Crystals. Discrete Tomography Workshop, Szeged, Hungary, 1997.
4. RYSER H.J., Combinatorial properties of matrices of zeros and ones. Canad. J. Math., 9:371-377, 1957.
5. DAURAT A., Convexity in Digital Plane (in French). PhD thesis, Université Paris 7 - Denis Diderot, UFR d'Informatique, 1999.
6. SVALBE I. AND VAN DER SPEK D., Reconstruction of tomographic images using analog projections and the digital Radon transform. Linear Algebra and its Applications, 339:125-145, 2001.
7. ALPERS A., GRITZMANN P., AND THORENS L., Stability and Instability in Discrete Tomography. Digital and Image Geometry, LNCS, 2243:175-186, 2001.
8. BRUNETTI S., DEL LUNGO A., DEL RISTORO F., KUBA A., AND NIVAT M., Reconstruction of 8- and 4-connected convex discrete sets from row and column projections. Linear Algebra and its Applications, 339:37-57, 2001.
9. KUBA A., Reconstruction in different classes of 2D discrete sets. Lecture Notes in Computer Science, 1568:153-163, 1999.
10. BARCUCCI E., DEL LUNGO A., NIVAT M., AND PINZANI R., Reconstructing convex polyominoes from horizontal and vertical projections. Theoretical Computer Science, 155:321-347, 1996.
11. BALOGH E., KUBA A., DÉVÉNYI C., AND DEL LUNGO A., Comparison of algorithms for reconstructing hv-convex discrete sets. Linear Algebra and its Applications, 339:23-35, 2001.
12. Mallinckrodt Institute of Radiology, Washington University School of Medicine, http://gamma.wustl.edu/home.html.
AN INTEGRATED APPROACH TO 3D FACIAL RECONSTRUCTION FROM ANCIENT SKULL
A. F. ABATE, M. NAPPI, S. RICCIARDI, G. TORTORA
Dipartimento di Matematica e Informatica, Università di Salerno, 84081 Baronissi, Italy
E-mail:
[email protected] Powerful techniques for modelling and rendering tridimensional organic shapes, like the human body, are today available for applications in many fields such as special effects, ergonomic simulation or medical visualization, just to name a few. These techniques are proving to be very useful also to archaeologists and anthropologists committed to reconstructing the aspect of the inhabitants of historically relevant sites like Pompei. This paper shows how, starting from radiological analysis of an ancient skull and a database of modern individuals of the same area/gender/age, it is possible to produce a tridimensional facial model compatible with the anthropological and craniometrical features of the original skull.
1. Introduction
In the last years computer generated imaging (CGI) has often been used for forensic reconstruction [19], as an aid for the identification of cadavers, as well as for medical visualization [3,16], for example in the planning of maxillo-facial surgery [14]. In fact, the 3D modelling, rendering and animation environments available today have greatly increased their power to quickly and effectively produce realistic images of humans [8]. Nevertheless the typical approach usually adopted for modelling a face is often still too artistic and mainly relies on the anatomic and physiognomic knowledge of the modeller. In other terms, computer technology is simply replacing the old process of creating an identikit by hand drawn sketches or by sculpting clay, adding superior editing and simulative capabilities, but often with the same limits in terms of reliability of the results. The recent finding of five skulls [see Figure 1] and several bones (from a group of sixteen individuals) in Murecine (near Pompei) offers the opportunity to use CGI, and craniographic methods [5], to reconstruct the aspect of the victims of this tremendous event. This paper starts by assuming that, unfortunately, what is lost in the findings of ancient human remains is lost forever. This means that there is no way to exactly reproduce a face simply from its skull, because there are many ways in which soft tissues may cover the same skull, leading to different final aspects.
The problem is even more complicated in the (frequent) case of partial findings, because the missing elements (mandible or teeth for example) could not be derived from the remaining bones [7].
Figure 1. One of the skulls found in the archaeological site of Murecine, near Pompei.
Nevertheless it is true that the underlying skeleton directly affects the overall aspect of an individual, and many fundamental physiognomic characteristics are strongly affected by the skull. One of the main purposes of this study is therefore to correlate ancient skulls to skulls of living individuals, trying, in this way, to replace lost information (for example missing bones and soft tissues) with new compatible data. Additionally, the physiognomically relevant elements that are too aleatory to be derived from a single compatible living individual are selected through a search in a facial database (built from classical art reproductions of typical Pompeians) and then integrated into the previous reconstruction. This paper is organized as follows. In Section 2 related works are presented. In Section 3 the proposed reconstruction approach is presented in detail. In Section 4 the results of the proposed method are presented and discussed. The paper concludes showing directions for future research in Section 5.
2. Related Works
Facial reconstruction from the skull has a long history, beginning around the end of the nineteenth century. The reconstructive methodologies developed over more than a century [20] basically come from two main approaches:
- the study of human facial anatomy and the relationships between soft tissues (skin, fat, muscles) and hard tissues (cranial bones),
- the collection of statistical facial data about individuals belonging to different races, sexes and ages,

and they can be summarized as follows:

- 2D artistic drawing [6], in which the contours fitting a set of markers positioned on the skull act as a reference for the hand drawing phase, which involves the anatomic knowledge of the artist.
- photo or video overlay of facial images on a skull image [10], aimed to compare a face to a skull to highlight matching features.
- 3D reconstruction, both with manual clay sculpting or digital modelling. In the manual approach the artist starts from a clay copy of a skull, applies the usual depth markers (typically referred to as landmarks) and then begins to model in clay a face fitting the landmarks. In digital modelling the first step is to produce a 3D reconstruction of the skull [15], typically starting from CT data [17], then a facial surface model is created from 3D primitives using the landmarks as a reference for the contouring curves. It is also possible to generate a solid reconstruction of the modelled face by stereolithographic techniques [9,11].
- warping of a 3D digital facial model [18, 21], which tries to deform (warp) a standard "reference" facial model to fit the landmarks previously assigned on the digital model of the skull.
Many of the methods mentioned above rely on a large survey of facial soft tissue depth, measured in a set of anatomically relevant points. Firstly developed on cadavers, this measurement protocol has been improved [4] with data from other races, various body builds, and even from living individuals using radiological and ultrasound diagnostic techniques.
3. The proposed method
The whole reconstructive process is detailed below in sections 3.1 to 3.11. Two reference databases are used: the Craniometrical Database (CD) and the Pictorial Physiognomic Database (PPD). In sections 3.5 and 3.10 these databases are discussed in detail.
3.1. The skull
We start by selecting one dry skull among the five found in Murecine. This skull belonged to a young male, and it was found without the mandible and with many teeth missing, but its overall state of conservation is good. Unfortunately the absence of the mandible makes the reconstruction of the lower portion of the face more complicated and less reliable, because in this case there is no original bone tissue to guide the process. The skull is photographed and then scanned via CT on the axial plane with a step of 1 millimetre and a slice thickness of 2 millimetres, so every slice overlaps by 1 millimetre with the following one. This high-resolution scanning produces a set of about 250 images, as well as a 3D reconstruction of the skull. Additionally, three radiological images of the skull are taken from three orthogonal planes, corresponding to front, side and bottom views. The 3D mesh output by the CT will be used as a reference to visually verify the compatibility of the reconstructed soft tissues with the dry skull.
3.2. The set of landmarks
The next step is to define on each radiological image a corresponding set of anatomically and physiognomically relevant points, named landmarks, each one with a unique name and number in each view [see Figure 2].
Figure 2. Landmarks located on front and side view of skull and craniometrical tracing.
Because the landmarks are chosen according to their craniometrical relevance, they may not correspond to the points for soft tissue thickness measurement indicated by Moore [4]. In this study we use a set of 19 landmarks, but this number could be extended if necessary. Alternatively it is possible to assign the landmarks directly on the 3D skull mesh produced by CT; in this case the following step (3.3) is not necessary because the landmarks already have tridimensional coordinates. A complete list of the landmarks used is shown in Table 1.
Table 1. List of landmarks (Landmark #, Location (front view); Landmark #, Location (side view)).
3.3. Adding a third dimension to the set of landmarks
Now we have the same set of points assigned to each of three views corresponding to the planes XY, XZ and YZ. So it is easy to assign to each landmark Li its tridimensional coordinates (Lxi, Lyi, Lzi) simply by measuring them on the appropriate plane with respect to a common axis origin. We can easily visualize the landmark set in the tridimensional space of our modeling environment and make any kind of linear or angular measurement between two or more landmarks.

3.4. Extraction of craniometrical features
Starting from the landmarks previously assigned we define the n-tuple of features (F1*, F2*, ..., Fn*) which are peculiar to this skull and result from the craniometrical tracing of the skull [see Figure 2]. These features are consistent with the features present in the CD; they include angles and lengths measured on the front or side view and are listed in Table 2.
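The two steps above (merging view coordinates into 3D landmarks and measuring lengths and angles between them) can be sketched as follows; averaging the two available readings of each coordinate is an assumption made for the example, as are the function names.

```python
# Illustrative sketch: landmarks measured on the XY, XZ and YZ views are merged
# into 3D points, from which lengths and angles of the tracing can be computed.
import numpy as np

def landmark_3d(xy, xz, yz):
    """xy, xz, yz: the (u, v) coordinates of one landmark in each view."""
    x = (xy[0] + xz[0]) / 2      # x appears in both the XY and XZ planes
    y = (xy[1] + yz[0]) / 2      # y appears in both the XY and YZ planes
    z = (xz[1] + yz[1]) / 2      # z appears in both the XZ and YZ planes
    return np.array([x, y, z])

def distance(a, b):
    return float(np.linalg.norm(a - b))

def angle(a, vertex, b):
    u, v = a - vertex, b - vertex
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
```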
Table 2. List of features (front and side view).
Because each feature has a different relevance from a physiognomic and craniometrical point of view, a different weight is assigned to each of them. The resulting n-tuple (w1, w2, ..., wn), with 0 ≤ wi ≤ 1 and 1 ≤ i ≤ n, contains the weights relative to (F1*, F2*, ..., Fn*). These weights are not meant to depend on a particular set of features, and if Fi* = 0 then wi = 0.
Searching for sirnilanties in CD
The CD is built on data collected from a radiological survey [see Figure 31 conducted on thousands of subjects of different ages and sex but all coming from the same geographical area in which the remains were found: Pompei and its surroundings.
Figure 3. Samples of records used to build the CD.
Each individual represents a record in the database, and each craniometrical feature, extracted with the same procedure shown before, is stored in a numeric field, as well as the 3D coordinates. Additionally we store three photographic facial images of each subject, shot from the same position and during the same
session as the radiological images. This precise alignment of photo camera and radio-diagnostic device is necessary to allow a spatial correlation between the two different kinds of images. If digital CT equipment or even a 3D scanner/digitizer were available, an optional field could point to a facial 3D model of each subject, thus avoiding the need for steps 3.6 and 3.7. Once the database is built, it is possible to search through it to find the record (the modern Pompeian individual) whose craniometrical features are most similar to those of the unknown subject given in input. This task is accomplished by evaluating for each record i the Craniometrical Similarity Score (CSS), in which Fij is the j-th component of the n-tuple of features (Fi1, Fi2, ..., Fin) relative to record i, wj represents its weight, and Dj is the j-th component of an array (D1, D2, ..., Dn) containing the maximum allowed difference between Fij and Fj* for each j. If any feature is not present in the input skull, due to missing elements for example, then the corresponding term in the CSS formula becomes zero.
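The printed CSS formula did not survive reproduction in this copy; the sketch below is therefore only one plausible reading of the surrounding description, in which each valid feature contributes its weight scaled by how far Fij is from Fj* relative to the maximum allowed difference Dj.

```python
# Hedged sketch of a CSS-like score; the exact printed formula is not
# reproduced here, so this weighted, D-normalised similarity is an assumption.
import numpy as np

def css(F_i, F_star, w, D):
    F_i, F_star, w, D = map(np.asarray, (F_i, F_star, w, D))
    valid = (w > 0) & (D > 0)        # missing features (F_j* = 0, w_j = 0) drop out
    term = w[valid] * (1.0 - np.abs(F_i[valid] - F_star[valid]) / D[valid])
    return float(term.sum())
```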
Figure 4. Retrieval trend with malignant query tile
5. Conclusions
The main idea we have proposed with the HER method is to consider the maxima as the most important feature of a signal. The importance of the maxima lies not only in their position but rather in their "mutual position" inside the signal. HER is a hierarchical method which selects maxima considering their relative value and reciprocal distances. The signal is represented by means of a vector containing couples of elements, where the former is the distance of each maximum from the first one and the latter represents the associated entropy. HER is a non-linear transform which presents several nice invariances: translation, rotation, reflection, luminance shifting and scale. Experimentation using the contour signal has shown encouraging results. Such results are strictly connected with the procedure we followed to transform a shape into a 1-d signal. HER for contours allows us to obtain important information on the number and the shape of the elongations of the object under investigation. The sampling theorem clarifies the differences between the proposed method and Fourier descriptors. Also the comparison with moment-based techniques has shown the validity of the HER method. Some considerations can be made about the results obtained using the Brodatz dataset of textures: the transformations applied to the tiles (rotation, reflection, luminance shifting and contrast shifting) do not modify the low
frequencies of the signal; this allows the Fourier Transform to obtain better results using only a few coefficients. However, such results are not better than the ones obtained with HER and the wavelets. One of the most important properties of the HER method is its low time consumption compared to all the other techniques taken into account. Furthermore all the experimentation has been conducted using only 30% of the whole signal information. In conclusion we can affirm that the experimentation with HER has shown results comparable with (and sometimes better than) the Fourier Transform and wavelets. These kinds of results are confirmed by the last experiments on medical images (mammography database). Considering the results obtained on medical images, we think our method could be used as the basis for a computer-aided detection (CAD) system. Finding similar images, with the aim of attracting the radiologist's attention to possible lesion sites, is surely an important way to provide aid during clinical practice. The importance of a content based image retrieval system in computer aided detection is to help radiologists when they need reference cases to interpret an image under analysis. Our future objective is the development of an efficient database methodology for retrieving patterns in medical images representing pathological processes.
References
1. Casanova A., Fraschini M., Vitulano S., Hierarchical Entropy Approach for image and signals Retrieval, Proc. FSKD02, Singapore, L. Wang et al. Editors.
2. Distasi R., Nappi M., Tucci M., Vitulano S., CONTEXT: A technique for Image retrieval Integrating CONtour and TEXture Information, Proc. of ICIAP 2001, Palermo, Italy, IEEE Comp. Soc.
3. Brodatz P., Textures, A Photographic Album of Artists and Designers, Dover Publications, New York, 1966. Available in a single .tar file: ftp://ftp.cps.msu.edu/pub/prip/textures/.
4. Issam El Naqa, Yongyi Yang, et al., Content-based image retrieval for digital mammography, ICIP 2002.
ISSUES IN IMAGE UNDERSTANDING *
VITO DI GESU
DMA, University of Palermo, Italy
IEF, University of Paris Sud, Orsay, France
E-mail:
[email protected] The aim of the paper is to address some fundamental issues and viewpoints about machine vision systems. Among them, image understanding is one of the most challenging. Even in the case of human vision its meaning is ambiguous; it depends on the context and the goals to be achieved. Here, a pragmatic view will be taken, by addressing the discussion on the algorithmic aspects of artificial vision and its applications.
1. Visual Science
Visual science is considered one of the most important fields of investigation in perception studies. One of the reasons is that the eyes collect most of the environmental information, and this makes the related computation very complex. Moreover, the eyes interact with other perceptive senses (e.g. hearing, touch, smell), and this interaction is not fully understood. Mental models, stored somewhere in the brain, are perhaps used to elaborate all the information that flows from our senses to the brain. One of the results of this process is an update of our mental models by means of a sort of feedback loop. This scenario shows that the understanding of a visual scene surrounding us is a challenging problem. The observation of visual forms plays a considerable role in the majority of human activities. For example, in our daily life we stop the car at the red traffic light, select ripe tomatoes, discarding the bad ones, and read a newspaper to update our knowledge. The previous three examples are related to three different levels of understanding. In the first example an instinctive action is performed as a
*This work has been partly supported by the European action COST-283 and by the French ministry of education.
Figure 1. Axial slices through five regions of activity of the human brain.
Figure 2. Hardware based on bio-chips.
consequence of a visual stimulus. The second example concerns a conscious decision-making activity, where attentive mechanisms are alerted by the visual task: recognize the color of the tomato and decide whether it is ripe. In this case the understanding implies training and learning procedures. The third example involves a higher level of understanding. In fact, the reading of a sequence of typed words may produce or remind us of concepts, for example images and emotions. Reading may generate mental forms that differ depending on the reader's culture, education, and past experiences. At this point we may argue that visual processes imply different degrees of complexity in the elaboration of information. Visual perception has been an interesting topic of investigation since the beginning of human history, because of its presence in most human activities. It can be used for communication, decoration and ritual
purposes. For example, scenes of hunting have been represented by graffiti on the walls of prehistoric caves. Graffiti can be considered the first example of a visual language that uses an iconic technique to pass on history. They also suggest how the external world was internalized by prehistoric men. Over the centuries, painters and sculptors have discovered most of the rules of color combination and spatial geometry, allowing them to generate realistic scene representations, intriguing imaginary landscapes, and visual paradoxes. This evolution was due not only to the study of our surrounding environment; it was also stimulated by the emergence of more and more internal concepts and needs. In fact, with the beginning of writing, visual representation became an independent form of human expression. Since 4000 B.C., visual information has been processed by Babylonian and Assyrian astronomers to generate sky maps representing and predicting the trajectories of planets and stars. Today astronomers use computer algorithms to analyze very large sky images at different frequency ranges to infer galaxy models and to predict the evolution of our universe. Physicians perform most diagnoses by means of biomedical images and signals. Intelligent computer algorithms have been implemented to perform automatic analysis of MRI (Magnetic Resonance Imaging) and CTA (Computerized Tomography Analysis). Here, intelligence stands for the ability to retrieve useful information by guessing the most likely disease. Visual science has been motivated by all the arguments outlined above; it aims to understand how we see and how we interpret the scenes surrounding us, starting from the visual information that is collected by the eyes and processed by our brain 1,2. One of the goals of visual science is the design and realization of artificial visual systems closer and closer to the human being. Recent advances in the inspection of the human brain and future technology will allow us both to explore our physical brain in depth (see Figure 1) and to design artificial visual systems whose behavior will be closer and closer to that of the human being (see Figure 2). However, advances in technology will not be sufficient to realize such advanced artificial visual systems, since their design would require a thorough knowledge of our visual system (from the eyes to the brain).
2. The origin of artificial vision
The advent of digital computers has determined the development of computer vision. The beginning of computer vision can be dated around 1940, when Cybernetics started. In that period the physicist Norbert Wiener and the physician Arturo Rosenblueth promoted, at the Harvard Medical School, meetings between young researchers to debate interdisciplinary scientific topics. The guideline of those meetings was the formalization of biological systems, including human behavior 5. The program was ambitious, and the results were not always what people had hoped; however, the advent of cybernetics marks the beginning of a new scientific approach to natural science. Physicists, mathematicians, neurophysiologists, psychologists, and physicians cooperated to cover all aspects of human knowledge. The result of such integration was not merely an exchange of information coming from different cultures; it contributed to improving human thought. In this framework, Frank Rosenblatt introduced, in a collection of papers and books, the concept of the perceptron. The perceptron is a multi-layer machine that stores a set of visual patterns, collected by an artificial retina, which are used as a training set for an automaton; features and weights learned during training are then used to recognize unknown patterns. The intuitive idea is that each partial spatial predicate, recognized by the perceptron, provides evidence about whether a given pattern belongs to a universe of patterns. The dream of building machines able to recognize any pattern after a suitable training was shattered, either because of the intrinsic structural complexity of the natural visual system (even in the case of simple animals like frogs), or because of the insufficient technological development of that time. Nevertheless, the idea of the perceptron must be considered the first paradigm of a parallel machine vision system, defined as the mutual interchange of information (data and instructions) among a set of cooperating processing units. For example, it suggested the architecture of interesting machine vision systems 6,7,8,9,10,11.
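As a minimal illustration of the idea sketched above (not Rosenblatt's original formulation), the following hedged Python sketch trains a single-layer perceptron on binary "retina" patterns: weights learned on a training set are then used to recognize unknown patterns. The toy patterns, learning rate and epoch count are invented for the example.

```python
import numpy as np

def train_perceptron(patterns, labels, lr=0.1, epochs=50):
    """Learn a weight vector (plus bias) from binary 'retina' patterns.

    patterns: (n_samples, n_pixels) array of 0/1 values
    labels:   (n_samples,) array of +1/-1 class labels
    """
    n_pixels = patterns.shape[1]
    w = np.zeros(n_pixels)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(patterns, labels):
            # Predict with the current weights; +1 if the weighted evidence is positive.
            y_hat = 1 if np.dot(w, x) + b > 0 else -1
            if y_hat != y:
                # Classical perceptron update: move the boundary toward the mistake.
                w += lr * y * x
                b += lr * y
    return w, b

def classify(w, b, pattern):
    """Recognize an unknown pattern with the learned weights."""
    return 1 if np.dot(w, pattern) + b > 0 else -1

if __name__ == "__main__":
    # Two toy 3x3 'retina' patterns: a vertical bar (+1) and a horizontal bar (-1).
    vertical = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0])
    horizontal = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])
    X = np.stack([vertical, horizontal])
    y = np.array([1, -1])
    w, b = train_perceptron(X, y)
    print(classify(w, b, vertical), classify(w, b, horizontal))  # -> 1 -1
```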
3. The pragmatic paradigm
The pragmatic paradigm of artificial vision is goal oriented, and the goal is to build machines that perform visual tasks in a specific environment, according to user requirements. It follows that the machine design can be based on models that are not necessarily suggested by natural visual sys-
tems. Here, the choice of the visual model is usually based on optimization criteria. Even if the pragmatic approach has been developed under the stimulus of practical requirements, it has also contributed to a better understanding of some vision mechanisms. For example, graph theoretical algorithms have been successfully applied to recognize Gestalt clusters according to human perception 12,13. The relation between graphs and the natural grouping of patterns could be grounded on the fact that our neural system can be seen as a very dense multi-graph with billions of billions of paths. Of course, the question is still open and probably will never be solved. Artificial visual systems can be described through several layers of increasing abstraction, each one corresponding to a set of iterated transformations. The general purpose is to reach a given goal, starting from an input scene, X, represented, for example, as an array of 2D pixels or 3D voxels defined on a set of gray levels G. The computation paradigm follows four phases (see Figure 3): Low level vision, Intermediate level vision, High
level vision, Interpretation. Note that these steps do not operate as a simple pipeline process; they may interact through semantic networks and control mechanisms based on feedback. For example, the parameters and operators used in the low level phase can be modified if the result is inconsistent with an internal model used during the interpretation phase. The logical sequence of the vision phases is only weakly related to natural vision processes; in the following a pragmatic approach is considered, where the implementation of each visual procedure is performed by means of mathematical and physical principles, which may or may not have a neuro-physiological counterpart. The pragmatic approach has achieved promising and useful results in many application fields. Among them are robotics vision 14, face expression analysis 15, document analysis 16, medical imaging 17, and pictorial databases 18.
3.1. Low Level Vision
Here, vision operators are applied point-wise and on neighborhood spatial domains to perform geometric and intensity transformations. Examples are digital linear and non-linear filters, histogram equalization based on the cumulative histogram 19, and mathematical morphology 20. Figures 4a,b show examples of morphological erosion 21. The purpose of this stage is to perform a preprocessing of the input image that reduces the effect
Figure 3. The classical paradigm of an artificial vision system.
Figure 4. The input image (a); its erosion (b).
of random noise, performs sharpening, and detects structural and shape features. A second goal of this phase is the selection of areas of interest inside the scene, where more complex analysis will be performed. The Discrete Symmetry Transform (DST), as defined in 22,23, is an example of an attentive operator that extracts areas of interest based on the circular symmetry of the gray levels around each pixel (see Figures 5a,b,c,d). The definition of an interesting area depends on the problem and is based on information theory methods.
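To make one of the point-wise low level operators mentioned above concrete, the sketch below implements histogram equalization through the cumulative histogram for an 8-bit gray level image. It is a generic textbook formulation, not the specific implementation of the cited references; the synthetic test image is invented for the example.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization of an 8-bit gray level image via the cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)    # gray level histogram
    cdf = np.cumsum(hist).astype(np.float64)          # cumulative histogram
    cdf_min = cdf[np.nonzero(cdf)][0]                 # first non-empty bin
    n_pixels = img.size
    # Map each gray level through the normalized cumulative histogram.
    lut = np.round((cdf - cdf_min) / (n_pixels - cdf_min) * 255).astype(np.uint8)
    return lut[img]

if __name__ == "__main__":
    # A synthetic low-contrast image: values concentrated in [100, 140].
    rng = np.random.default_rng(0)
    img = rng.integers(100, 141, size=(64, 64), dtype=np.uint8)
    out = equalize_histogram(img)
    print(img.min(), img.max(), "->", out.min(), out.max())
```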
Figure 5. The attentive operator DST: a) the input image; b) the application of the DST; c) the selection of points of interest; d) the selection of the eyes.
Low level vision operators can often be implemented directly in artificial retinas, both to reduce the cost of the whole computation and to enhance performance. Such so-called active retinas have been included in active visual systems 24,25. Active visual systems have mechanisms that can actively control camera parameters such as orientation, focus, zoom, aperture and vergence in response to the requirements of the task and to external stimuli. More broadly, active vision encompasses attention, i.e. selective sensing in space, resolution and time, whether it is achieved by modifying physical camera parameters or the way data is processed after leaving the camera. The tight coupling between perception and action proposed in the active vision paradigm does not end with camera movements. The processing is tied closely to the activities it supports (navigation, manipulation, signaling danger or opportunity, etc.), allowing simplified control algorithms and scene representations, quick response times, and increased success at supporting the goals of those activities. Active vision offers higher feasibility and performance at a lower cost. The application of active vision facilitates certain tasks that would be impossible using passive vision. The improvement of performance can be measured in terms of reliability, repeatability, speed and efficiency in performing specific tasks, as well as the generality of the kinds of tasks performed by the system. In active vision, a foveated sensor coupled with
a positioning system can replace a higher resolution sensor array. Moreover, less data needs to be acquired and processed, which significantly reduces hardware costs 26,27.
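The following is a minimal, hypothetical sketch of the foveation idea mentioned above: full resolution is kept inside a small fovea window while the periphery is sampled on a coarser grid, so fewer pixels need to be acquired and processed. The window radius and subsampling step are arbitrary choices made for the example, not parameters of any cited system.

```python
import numpy as np

def foveate(img, cx, cy, fovea_radius=16, step=4):
    """Return a foveated sample of a gray level image.

    Pixels within `fovea_radius` of (cy, cx) are kept at full resolution;
    the periphery is subsampled on a coarse grid with spacing `step`.
    The result is a list of (row, col, value) samples.
    """
    h, w = img.shape
    samples = []
    for r in range(h):
        for c in range(w):
            inside_fovea = (r - cy) ** 2 + (c - cx) ** 2 <= fovea_radius ** 2
            on_coarse_grid = (r % step == 0) and (c % step == 0)
            if inside_fovea or on_coarse_grid:
                samples.append((r, c, int(img[r, c])))
    return samples

if __name__ == "__main__":
    img = (np.arange(128 * 128, dtype=np.uint16).reshape(128, 128) % 256)
    samples = foveate(img, cx=64, cy=64)
    print(len(samples), "samples instead of", img.size, "pixels")
```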
3.2. Intermediate Level Vision
Neurologists argue that "there are quite a number of visual analyses carried out by the brain that are categorized as intermediate level vision. Included are our ability to identify objects when they undergo various transformations, when they are partially occluded, to perceive them as the same when they undergo changes in size and perspective, to put them into categories, to learn to recognize new objects upon repeated encounter, and to select objects in the visual scene by looking at them or reaching for them" 28. In the case of artificial visual systems, intermediate level vision computation is performed on the selected areas of interest. The task is the extraction of features that carry shape information. Features can be of a geometrical nature (borders, blobs, edges, lines, corners, global symmetries, etc.) or computed on pixel intensity values (regions, segmentation). These features are stored at an intermediate level of abstraction. Note that such features are free of domain information: they are not specifically objects or entities of the domain of understanding, but they contain spatial and other information. It is the spatial/geometric (and other) information that can be analyzed in terms of the domain in order to interpret the images. Yet, as in the natural case, all the features involved must be invariant under geometrical and topological transformations. This property is not always satisfied in real applications.
Geometrical features. Canny's edge detector 29 is one of the most robust and is widely used in the literature. The absolute value of the derivative of a Gaussian is a good approximation to one member of his family of filters. Ruzon and Tomasi 30 introduced an edge detector that also uses color information. The RGB components are combined following two different strategies: a) the edges of the three components are computed after the application of a gradient operator, and then the fusion of the three edge-images is performed; b) the edges are detected on the image obtained after fusing the gradient-images. Related work concerns 2D and 3D modelling 31. Snakes computation 32 is another example of a technique to retrieve object contours. Snakes are based on an elastic model in which a continuous, flexible open (or closed) parametric curve v(s), with s ∈ [0, 1], is imposed upon and matched to an image. Borders follow the evolution of the dynamic
system describing the snake under constraints that are imposed by the image features. The solution is found by an iterative procedure and corresponds to the minimization of the system energy. The algorithm is based on the evolution of the dynamic system:

$$\frac{\partial v}{\partial t} = \left[ \frac{\partial}{\partial s}\left(w_1 \frac{\partial v}{\partial s}\right) - \frac{\partial^2}{\partial s^2}\left(w_2 \frac{\partial^2 v}{\partial s^2}\right) \right] - \nabla E_{feature}(v)$$

The first term of this equation represents the internal forces, where $w_1$ is the elasticity and $w_2$ is the stiffness. The second term represents the external forces; it depends on the image features. The solution of this equation is found by an iterative procedure and corresponds to the minimization of the system energy:

$$E(v) = \int_0^1 \left[ E_{int}(v(s)) + E_{feature}(v(s)) \right] ds$$

where $E_{int}(v(s)) = w_1 \left|\frac{\partial v}{\partial s}\right|^2 + w_2 \left|\frac{\partial^2 v}{\partial s^2}\right|^2$ and $E_{feature} = -(\nabla^2 I)^2$, $I$ being the image intensity. A global solution is usually not easily found and piecewise solutions are searched for: the energy of each snake piece is minimized, the ends are pulled to the true contour, and the snake growing process is repeated. Numerical solutions are based on finite difference methods and dynamic programming 33 (see Figure 6). Methods to compute global object symmetries are also considered at this level of the analysis; the Smoothed Local Symmetry 35 has been introduced to retrieve a global symmetry (if it exists) from the local curvature of contours. In 36,37,38 the mathematical background to extract object skewed symmetries under scaled Euclidean, affine and projective image transformations is proposed. An algorithm for back-projection is given and the case of non-coplanarity is studied. The authors also introduce the concept of an invariant signature for matching problems (see Figure 7). A third example of an intermediate level operation is the DCAD decomposition 39, which is an extension of the Cylindrical Algebraic Decomposition 40. The DCAD decomposes a connected component of a binary input image by digital straight paths that are parallel to the Y (X)-axis and cross the internal digital borders of the component where they correspond to concavities, convexities, and bends. From this construction a connectivity graph, CG, is derived as a new representation of the input connected
Figure 6. (a) input image; (b) edge detection; (c) snakes computation.
components. The CG allows us to study topological properties and visibility relations among all the components in the image. In this phase both topological information and structural relations need to be represented by high level data structures, where full orders (sequences) and/or partial orders define relations between image components. Examples of partially ordered data structures are the connectivity graphs and trees 39,41,42.
Grouping and segmentation. The automatic perception and description of a scene is the result of a complex sequence of image-transformations starting from an input image. All transformations are performed by vision operators that are embedded in a vision loop representing their flow (from the low to
Figure 7. (a) Cipolla's skewed symmetries detection; (b) examples of global symmetry detection.
the high level vision). Within the vision loop, the segmentation of images into homogeneous components is one of the most important phases 43. For example, it plays a relevant role in the selection of the parts of an object on which to concentrate further analysis. Therefore, the accuracy of the segmentation step may influence the performance of the whole recognition procedure 44. More formally, we may associate to the input digital image X a weighted undirected graph G = (X, E, b), whose nodes are the pixels, whose arcs, E, depend on the digital connectivity (e.g. 4 or 8), and where the function b : E → [0, 1] is the arc weight, which could be a normalized distance between pixels. The segmentation is then defined as an equivalence relation, ∼, that determines the graph partition:

$$\forall (x, y) \in E: \quad x \sim y \iff b(x, y) \le \phi$$
where 0 ≤ φ ≤ 1 is a given threshold. One graph partition corresponds to each threshold. Spanning all the values of φ provides a large set of segmentation solutions, and this makes the segmentation problem hard. On the other hand, image segmentation depends on the context and is subjective: the decision process is driven by the goal or purpose of the visual
task. Therefore, general solutions do not exist and each proposed technique is suitable for a class of problems. In this sense, image segmentation is an ill-posed problem that does not admit a unique solution. Moreover, the segmentation problem is often hard because the probability distribution of the features is not well known. Often, the assumption of a Gaussian distribution of the features is a rough approximation that invalidates the linear separation between classes. In the literature the segmentation problem has been formulated from different perspectives. For example, in 45 a two-step procedure is described that uses only data included in the boundary; this approach has been extended to boundary surfaces by combining splines and superquadrics to define global shape parameters 46,47. Other techniques use elastic surface models that are deformed under the action of internal forces to fit object contours using a minimum energy criterion 48. A model-driven approach to the segmentation of range images is proposed in 49. Recently, Shi and Malik 50 have considered 2D image segmentation as a Graph Partitioning Problem (GPP) solved by a normalized cut criterion. The method finds an approximate solution by solving a generalized eigenvalue system. Moreover, the authors consider both spatial and intensity pixel features in the evaluation of the similarity between pixels. Recently, the problem of extracting the largest image regions that satisfy uniformity conditions in the intensity/spatial domains has been related to a Global Optimization Problem (GOP) 51 by modelling the image as a weighted graph, where the edge weight is a function of both intensity and spatial information. The chosen solution is the one for which a given objective function obtains its smallest value, hopefully the global minimum. In 52 a genetic algorithm is proposed to solve the segmentation problem as a GOP using a tree regression strategy 53. The evaluation of a segmentation method is not an easy task, because the expected results are subjective and depend on the application. One evaluation could be the comparison with a robust and well experimented method; but this choice is not always feasible; whenever possible the evaluation should be done by combining the judgement of more than one human expert. For example, the comparison could be performed using a vote strategy based on the per-segment agreement between the machine and the human segmentation.
Figure 8. (a) input image; (b) human segmentation; (c) GS, (d) NMC, (e) SL, and (f) C-means segmentations.
In this strategy, #agr_k is the number of pixels on which the human and the machine agree, |HP_k| is the cardinality of the segment defined by the human, and the remaining quantity is the cardinality of the segment found by the algorithm. Figure 8 shows how different segmentation methods (Genetic Segmentation (GS), Normalized Minimum Cut (NMC), Single Link (SL), C-means) perform the segmentation of the same image. Figure 8b shows the human segmentation obtained using the vote strategy.
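As a minimal illustration of the threshold-based graph partition defined earlier in this section (and not of any of the specific methods compared in Figure 8), the sketch below builds the 4-connected pixel graph, keeps only the arcs whose normalized gray level distance does not exceed a threshold φ, and labels the resulting connected components with a union-find structure. The test image and threshold are invented for the example.

```python
import numpy as np

def find(parent, i):
    # Path-compressing find for the union-find structure.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def segment(img, phi=0.1):
    """Partition a gray level image: 4-connected pixels whose normalized
    intensity distance is <= phi end up in the same segment."""
    h, w = img.shape
    norm = img.astype(np.float64) / 255.0
    parent = list(range(h * w))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((1, 0), (0, 1)):          # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w and abs(norm[r, c] - norm[rr, cc]) <= phi:
                    # Merge the two equivalence classes.
                    ri, rj = find(parent, i), find(parent, rr * w + cc)
                    parent[ri] = rj
    labels = np.array([find(parent, i) for i in range(h * w)]).reshape(h, w)
    return labels

if __name__ == "__main__":
    img = np.zeros((32, 32), dtype=np.uint8)
    img[:, 16:] = 200                                 # two flat regions
    labels = segment(img, phi=0.1)
    print(len(np.unique(labels)), "segments found")   # -> 2
```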
3.3. High Level Vision
In this phase the decision tasks concern the classification and recognition of objects and the structural description of a scene. For example, high level vision provides the medical domain with objective measurements of features related to diseases, such as the narrowing of arteries, the volume changes of a pumping heart, or the localization of the points attaching muscles to bones (used to analyze human motion) 54. The classification of cells is another example of a hard problem in the biological domain; for example, both statistical
Figure 9. The axial moment values of an α (a) and a β (b) cell.
and shape features are used in the classification and recognition of α and β ganglion retinal cells. In 55 a quantitative approach is presented where several features are combined, such as diameter, eccentricity, fractal dimension, influence histogram, influence area, convex hull area, and convex hull diameter. The classification is performed by integrating the results from three different clustering methods (Ward's hierarchical scheme, K-Means and a Genetic Algorithm) using a voting strategy. The experiments indicated the superiority of some features, also suggesting possible biological implications; among them the eccentricity derived from the axial moments of the cell (see Figure 9). Autonomous robots equipped with visual systems are able to recognize their environment and to cooperate in finding satisfactory solutions. For example, in 56 a probabilistic, vision-based state estimation method for individual, autonomous robots is developed. A team of mobile robots is able to estimate their joint positions in a known environment and track the positions of autonomously moving objects. The state estimators of different robots cooperate to increase the accuracy and reliability of the estimation process. The method has been empirically validated in experiments with a team of physical robots playing soccer 57. The concept of an internal model is central in this phase of the analysis.
Often, a geometric model is matched against the image features previously computed and embedded in the data structures derived in the intermediate phase. The model parameters are optimized by minimizing a cost function. Different minimization strategies (e.g. dynamic programming, gradient descent, or genetic algorithms) can be considered. Two main techniques are used in model matching: bottom-up, when the primary direction of the flow of processing is from lower abstraction levels (images) to higher levels (objects), and conversely top-down, when the processing is guided by expectations from the application domain 58. Matching results also depend on the chosen parameter space. For example, the classification of human bones from MRI scans requires the combination of multi-view data and the problem cannot admit an exact solution 59; human face recognition has been treated by considering a face as an element of a multi-dimensional vector space 60; in 61 the recognition of faces under different expressions and partial occlusions has been considered. To resolve the occlusion problem, each face is divided into local regions that are analyzed individually. The matching is flexible and based on probabilistic methods. The recognition system is less sensitive to the differences between the facial expressions displayed in the training and testing images, because the author weights the results obtained on each local area on the basis of how much of this local area is affected by the expression displayed in the current test image.
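To give a concrete (and deliberately simplified) flavour of matching a model against image features by minimizing a cost function, the sketch below searches exhaustively over translation parameters for the position of a small template that minimizes a sum-of-squared-differences cost. It is only an illustration of the optimization idea under invented data; the systems cited above use far richer models and minimization strategies.

```python
import numpy as np

def match_template(image, template):
    """Exhaustive search over translations (ty, tx) minimizing an SSD cost."""
    ih, iw = image.shape
    th, tw = template.shape
    best_cost, best_pos = np.inf, (0, 0)
    for ty in range(ih - th + 1):
        for tx in range(iw - tw + 1):
            patch = image[ty:ty + th, tx:tx + tw]
            cost = np.sum((patch.astype(np.float64) - template) ** 2)  # SSD cost
            if cost < best_cost:
                best_cost, best_pos = cost, (ty, tx)
    return best_pos, best_cost

if __name__ == "__main__":
    image = np.zeros((40, 40))
    image[10:15, 20:25] = 1.0          # hidden instance of the model
    template = np.ones((5, 5))
    pos, cost = match_template(image, template)
    print(pos, cost)                   # -> (10, 20) 0.0
```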
3.4. Interpretation
This phase exploits the semantic part of the visual system. The result belongs to an interpretation space. Examples are linguistic descriptions and the definition of physical models. This phase could be considered the conscious component of the visual system. However, in a pragmatic approach it is simply a set of semantic rules that are given, for example, by a knowledge base. The technical problem is that of automatically deriving a sensible interpretation from an image. This task depends on the application or the domain of interest within which the description makes sense. Typically, in a domain there are named objects and characteristics that can be used in a report or to make a decision. Obviously, there is a wide gap between the nature of images (essentially arrays of numbers) and their descriptions, and the intermediate level of the analysis is the necessary link between image data and domain descriptions. There are researchers who take clues from
biological systems to develop theories, and there are those who focus on mathematical theories and the physics of the imaging process. Eventually, however, theory becomes practice in the specification of an algorithm embodied in an executable program with appropriate data representations. There are alternative views of vision, resulting in other paradigms for image understanding and research. In image interpretation, knowledge about the application domain is manipulated to arrive at an understanding of the recorded part of the world. Knowledge representation schemes that have been studied include semantic networks 62, Bayesian and Belief Networks 63, and fuzzy expert systems 64. Some of the issues addressed within these schemes are: the incorporation of procedural and declarative information, handling uncertainty, conflict resolution, and mapping existing knowledge onto a specific representation scheme. The resulting interpretation systems have been successfully applied to interpret utility maps, music scores and face images. Future developments will focus on the central theme of fusing knowledge representations. In particular, attention will be paid to information fusion, distributed knowledge in multi-agent systems, and the mixing of knowledge derived from learning techniques with knowledge from context and experts 65,66. Moreover, recognition systems must be able to handle uncertainty and to include the subjective interpretation of a scene. Fuzzy logic 67 can provide a good theoretical support to model this kind of information. For example, to evaluate the degree of truth of the propositions:
- the chair, beyond the table, is small;
- the chair, beyond the table, is very small;
- the chair, beyond the table, is quite small;
- few objects have a straight medial axis;
it is necessary to represent the fuzzy predicate small, the fuzzy attributes very and quite, and the fuzzy quantifier few. The evaluation of each proposition depends on the meaning that is assigned to small, very, quite, and few. Moreover, the objects chair and table, and the spatial relation beyond, must be recognized with some degree of truth. These simple examples suggest the need for fuzzy logic to describe the spatial relations often used in high level vision problems. However, the meaning of the term soft-vision cannot simply be limited to the application of fuzzy operators and logic in vision. This example shows that new visual systems should include soft tools to express abstract or not fully defined concepts 68, following the paradigms of soft computing 69. A minimal sketch of how such fuzzy predicates and hedges could be evaluated is given below.
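The sketch below is in the spirit of Zadeh's fuzzy logic but with membership functions invented for the example: it evaluates the degree of truth of "the chair is small", "very small" and "quite small" using the standard concentration and dilation hedges, and the quantifier few as a decreasing function of a proportion. The chair size and the shapes of the membership functions are assumptions, not taken from the paper.

```python
import math

def small(size_cm):
    """Invented membership function for the fuzzy predicate 'small' (object size in cm)."""
    if size_cm <= 40:
        return 1.0
    if size_cm >= 120:
        return 0.0
    return (120 - size_cm) / 80.0          # linear decrease between 40 and 120 cm

def very(mu):
    return mu ** 2                         # concentration hedge

def quite(mu):
    return math.sqrt(mu)                   # dilation hedge ('more or less')

def few(proportion):
    """Invented membership for the fuzzy quantifier 'few' (fraction of objects)."""
    return max(0.0, 1.0 - 2.0 * proportion)

if __name__ == "__main__":
    chair_size = 80                        # cm, assumed measurement of the chair
    mu = small(chair_size)
    print("small:", mu)                        # 0.5
    print("very small:", very(mu))             # 0.25
    print("quite small:", quite(mu))           # ~0.71
    print("few (30% of objects):", few(0.3))   # 0.4
```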
Figure 10. The Kanizsa triangle illusion.
4. Final remarks
This review has discussed some problems and solutions in visual systems. Today, more than 10,000 researchers are working on visual science around the world. Visual science has become one of the most popular fields among scientists. Physicists, neurophysiologists, psychologists, and philosophers cooperate to reach a full understanding of visual processes from different perspectives; the fusion and integration of these perspectives will allow us to make consistent progress in this fascinating subject. Moreover, we note that anthropomorphic elements should be introduced to design complex artificial visual systems. For example, the psychology of perception may suggest new approaches to solve ambiguous 2D and 3D segmentation problems. Figure 10 shows the well known Kanizsa illusion 70: here the perceived edges have no physical support whatsoever in the original signal.
References
1. D. Marr, San Francisco, W.H. Freeman, (1982).
2. S.E. Palmer, MIT Press, (1999).
3. M. D'Esposito, J.A. Detre, G.K. Aguirre, M. Stallcup, D.C. Alsop, L.J. Tippet, M.J. Farah, Neuropsychologia 35(5), 725 (1997).
4. M. Conrad, Advances in Computers, 31, 235 (1990).
5. N. Wiener, Massachusetts Institute of Technology, MIT Press, Cambridge (1965).
6. F. Rosenblatt, Proceedings of a Symposium on the Mechanization of Thought Processes, 421, London (1959).
7. F. Rosenblatt, Self-organizing Systems, Pergamon Press, NY, 63 (1960).
8. F. Rosenblatt, Spartan Books, NY (1962).
9. V. Cantoni, V. Di Gesù, M. Ferretti, S. Levialdi, R. Negrini, R. Stefanelli, Journal of VLSI Signal Processing, 2, 195 (1991).
10. A. Merigot, P. Clemont, J. Mehat, F. Devos, and B. Zavidovique, in Pyramidal Systems for Computer Vision, V. Cantoni and S. Levialdi (Eds.), Berlin, Springer-Verlag, (1986).
11. W.D. Hillis, The Connection Machine, Cambridge MA: The MIT Press, (1992).
12. C.T. Zahn, IEEE Trans. on Comp., C-20, 68 (1971).
13. V. Di Gesù, Int. Journal of Fuzzy Sets and Systems, 68, 293 (1994).
14. E.D. Dickmanns, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, 1577 (1997).
15. B. Kolb and L. Taylor, Cognitive Neuroscience of Emotion, R.D. Lane and L. Nadel, eds., Oxford Univ. Press, 62 (2000).
16. P.M. Devaux, D.B. Lysak, R. Kasturi, International Journal on Document Analysis and Recognition, 2(2/3), 120 (1999).
17. A.J. Fitzgerald, E. Berry, N.N. Zinovev, G.C. Walker, M.A. Smith and J.M. Chamberlain, Physics in Medicine and Biology, 47, 67 (2002).
18. J. Assfalg, A. Del Bimbo, P. Pala, IEEE Transactions on Visualization and Computer Graphics, 8(4), 305 (2002).
19. R.C. Gonzales, P. Wintz, Prentice Hall, (2002).
20. J. Serra, Academic Press, New York, (1982).
21. L. Vincent and P. Soille, IEEE Transactions on PAMI, 13(6), 583 (1991).
22. V. Di Gesù, C. Valenti, Vistas in Astronomy, Pergamon, 40(4), 461 (1996).
23. V. Di Gesù, C. Valenti, Advances in Computer Vision (Solina, Kropatsch, Klette and Bajcsy editors), Springer-Verlag, (1997).
24. Y. Aloimonos, CVGIP: Image Understanding, 840 (1992).
25. R. Bajcsy, Proceedings of the IEEE, 76, 996 (1988).
26. T.M. Bernard, B.Y. Zavidovique, and F.J. Devos, IEEE Journal of Solid-State Circuits, 28(7), 789 (1993).
27. G. Indiveri, R. Murer, and J. Kramer, IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, 48(5), 492 (2001).
28. P.H. Schiller, http://web.mit.edu/bcs/schillerlab/index.html.
29. J. Canny, IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6), 679 (1986).
30. M. Ruzon and C. Tomasi, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Ft. Collins CO, 2, 160 (1999).
31. O.D. Faugeras, MIT Press, 302 (1993).
32. D. Terzopulos and K. Fleischer, The Visual Computer, 4, 306 (1988).
33. A. Blake and M. Isard, Springer-Verlag, London, (1998).
34. H. Blum and R.N. Nagel, Pattern Recognition, 10, 167 (1978).
35. M. Brady, H. Asada, The International Journal of Robotics Research, 3(3), 36 (1984).
36. D.P. Mukhergee, A. Zisserman, M. Brady, Philosophical Transactions of the Royal Society of London, 351, 77 (1995).
37. T.J. Chan, R. Cipolla, Image and Vision Computing, 13(5), 439 (1995).
38. J. Sato, R. Cipolla, Image and Vision Computing, 15(5), 627 (1997).
39. V. Di Gesù, C. Valenti, Journal of Linear Algebra and its Applications, Springer Verlag, 339, 205 (2001).
40. G.E. Collins, Proc. of the Second GI Conference on Automata Theory and Formal Languages, Springer Lect. Notes Comp. Sci., 33, 515 (1975).
41. A. Rosenfeld, Journal of ACM, 20, 81 (1974).
42. H. Samet, ACM Computing Surveys, 16(2), 187 (1984).
43. R. Duda and P. Hart, NY: Wiley and Sons, (1973).
44. K. Fu, Pattern Recognition, 13, 3 (1981).
45. A. Pentland, Int. J. Comput. Vision, 4, 107 (1990).
46. D. Terzopulos, D. Metaxas, IEEE Trans. on PAMI, 13(7) (1991).
47. N. Raja and A. Jain, Image and Vision Computing, 10(3), 179 (1992).
48. I. Cohen, L.D. Cohen, N. Ayache, ECCV'92, Second European Conference on Computer Vision, Italy, 19 (1992).
49. A. Gupta, R. Bajcsy, Image Understanding, 58, 302 (1993).
50. J. Shi, J. Malik, IEEE Trans. on PAMI, 22(8), 1 (2000).
51. R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization, Kluwer, Dordrecht (1995).
52. G. Lo Bosco, Proceedings of the 11th International Conference on Image Analysis and Processing, IEEE Comp. Soc. Publishing, (2001).
53. L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Wadsworth International Group, (1984).
54. S. Vitulano, C. Di Ruberto, M. Nappi, Proceedings of the Third IEEE Int. Conf. on Electronics, Circuits and Systems, 2, 1111 (1996).
55. R.C. Coelho, V. Di Gesù, G. Lo Bosco, C. Valenti, Real Time Imaging, 8, 213 (2002).
56. T. Schmitt, R. Hanek, M. Beetz, S. Buck, and B. Radig, IEEE Transactions on Robotics and Automation, 18(5), 670 (2002).
57. M. Beetz, S. Buck, R. Hanek, T. Schmitt, and B. Radig, First International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS), 805 (2002).
58. V. Cantoni, L. Carrioli, M. Diani, M. Ferretti, L. Lombardi, and M. Savini, Image Analysis and Processing, V. Cantoni, V. Di Gesù, and S. Levialdi (Eds.), Plenum Press, 329 (1988).
59. P. Ghosh, D.H. Laidlaw, K.W. Fleischer, A.H. Barr, and R.E. Jacobs, IEEE Transactions on Medical Imaging, 14(3), (1995).
60. A. Pentland, T. Starner, N. Etcoff, N. Masoiu, O. Oliyide, and M. Turk, Proc. Workshop Int'l Joint Conf. Artificial Intelligence, Looking at People, (1993).
61. A.M. Martinez, IEEE Trans. on PAMI, 24(6), (2002).
62. A.T. McCray, S.J. Nelson, Meth. Inform. Med., 34, 193 (1995).
63. J. Pearl, Morgan Kaufmann, (1988).
64. N. Kasabov, MIT Press, (1996).
65. C.V. Negoita, The Benjamin/Cumming Publishing Company, (1985).
66. L. Sombé, Wiley Professional Computing, (1990).
67. L. Zadeh, Information and Control, 8, 338 (1965).
68. V. Di Gesù, Fundamenta Informaticae, 37, 101 (1999).
69. L.A. Zadeh, Communications of the ACM, 37(3), 77 (1994).
70. G. Kanizsa, Rivista di Psicologia, 49(1), 7 (1955).
INFORMATION SYSTEM IN THE CLINICAL-HEALTH AREA
G. MADONNA
Sistemi Informativi
1. The external context and objectives of the information system
The reference context must consider the analysis contained in the guidelines of the "White Book on Welfare", whose objective is to set down a reference framework for the creation and strengthening of the Country's social cohesion. From this viewpoint, two fundamental aspects characterising the Italian situation are analysed: the matter of population and the role of the family, and two main objectives are identified: to favour the birth rate and to improve family policies. As far as these focal themes are concerned, this document does not constitute a closed package of proposals but rather aims at representing a basis for a discussion about a new model of social policy. The policy on solidarity must be set into a framework of broad-ranging actions aimed at guaranteeing social cohesion as a condition for development itself: this is the direction institutional changes are taking, and they are already underway both in Europe, with the Charter of basic rights and the Lisbon summit, and in Italy, with the modification of Title V of the Constitution. Support to the family, the elderly and the disabled: these are therefore the main objectives the "White Book on Welfare" wishes to achieve. It is a reform which wishes the family to be enhanced to the utmost, where the word family is to be understood as the fundamental and essential nucleus for the development of civil society. The National Health System is to be inserted into this context, a system which is today the subject of profound changes concerning its welfare mission, its organisation and its underlying system of financing. The Legislative Decree 229/99 (Riforma Ter) reformed the "SSN" - the National Health System - reiterating and confirming the concept of turning the system into a "company" characterised by the strategic, managerial and financial autonomy of the single health structures.
This scenario can however be placed into a process of transformation going back a long way, right from the Legislative Decree 502/92, and which, as a whole, implies:
- the consolidation of the trend to move the attention of public action away from the formalities of procedures to the verification of the effectiveness of results;
- the orientation towards user satisfaction and the transparency of public action and, in the case of the health system, towards the achievement of the best welfare levels compatible with the resources available;
- the attention to budget obligations, and the adoption of appropriate tools for governing them, such as the afore-mentioned process of rendering the system "company"-like.
Of course, both from the viewpoint of the information system and from that of the strategic objectives of the Company, achieving at the same time the objective of satisfying users (effectiveness of result, quality of the level of service) and that of the control of costs on the basis of constrained resources (the objective of efficiency and productivity) is not an easy task. Figure 1 represents the situation in an elementary way: a Company can achieve excellent performance results, but at the cost of a use of resources which does not pay attention to economic constraints and therefore after a short time is unsustainable; on the contrary, an approach which is too attentive to economic constraints alone (an "obsession" with costs) ends up producing low costs per unit of product at the price of poor quality products, and implies that the essential public aim of prevention and healthcare for the citizens is not achieved.
Figure 1. Trade Off Effectiveness/Efficiency
The pursuit of these objectives is to be placed into a context of interactions with external entities which exchange data, information and requests. In the diagram of Figure 2 this situation is represented very simply, but effectively, so as to clarify the basic problems:
- the crucial role of the health service played by family doctors, and the consequent necessity to construct an effective relationship between them and the Company;
- the importance of the Region, as this is the office which issues not only financing but also standards, and the shifting of the central State Administration to a secondary role, almost always mediated by the Region, as regards the Company;
- the emergence of consumer protection associations as significant actors who ask for information and system transparency, especially from the point of view of outcomes and behaviour;
- the emergence, alongside the more traditional interactions of the Company with certified and agreed structures on the one hand and with suppliers of various services on the other, of further possibilities of interaction and the utilisation of suppliers of health and external welfare services (service co-operatives, non-profit making companies, voluntary workers, collaborators, etc.).
Figure 2. Context diagram of the health Company
To sum up, all of these interactions together require the information system supporting them to be able to:
- concentrate on the management of the fundamental interactions (those with the patient), so as to ensure the governing of the strategic objectives pursued;
- handle the other exchanges of information adequately, in particular by progressively adopting a logic of close integration of the information flows with external organisations (for example entrusting GPs with computerised appointment-taking operations);
- produce automatically a full and explanatory set of information, in the public domain or with controlled access, so as to guarantee the necessary transparency of processes and therefore make them usable by third parties by way of portals.
2. Internal organisation: objectives and strategic variables
The primary objective of the Company is that of guaranteeing essential levels of welfare, according to what is laid out in the National Health Plan:
- collective health and medical care in living and working environments;
- district medical care;
- hospital medical care.
The division of these objectives can be immediately transformed into a structural division of the Company into Departments, Districts and Hospitals, each of which expresses its specific information needs. The three types of structure, visualised in Figure 3, must be supported by service structures and must be governed by a system of objectives and strategic variables which integrate them. As to the former, the district, department and hospital structures form the heart of the components the Company is made up of; a heart supported by a group of general technical and administrative services, which exchanges data, information, processes and activities with the outside in two ways:
- processes of access by the Company users;
- processes of control by the strategic management over the Company.
Figure 3. Organisational components of the health Company
The strategic variables transversal to the three types of structure can then be identified on the basis of the two essential functions of access and control:
- management control;
- the information and IT system, decisive both for the circulation of data for access purposes and for their analysis for control purposes;
- the quality system, i.e. new attention paid to the service offered to the user;
- human resources development.
3. The role of the information system: a system for transforming data into information
Having an “information system”, and not just scattered chunks of IT applications, represents the decisive leap of quality which needs to be taken; some fixed points must be brought into this direction, method points to be respected in that they constitute the “critical success factor” of an information system.
The Information System generates information useful for improving management and therefore takes account of, and provides an answer to, the following aspects:
- the operational processes which constitute the activities of the Company at all levels, and their integration points;
- the decisions which must be taken on a daily basis;
- the assessments which must be carried out in order to make decisions;
- the organisational responsibilities of those called upon to carry out the assessments and make the decisions.
In order to achieve these objectives, the system must satisfy the following requirements:
- it is not parallel to the management system but an integral part of it, in that all the information, both for efficiency purposes and for system quality, can be constructed and generated by the ordinary operational processes;
- the same datum, within a homogeneous procedure, is collected only once and from one place only, in that the duplication of information sources, far from being a factor of data quality, is certainly a cause of redundant procedures and even of system errors;
- it has the utmost flexibility and the highest degree of openness to the evolution of solutions, both technological and, especially, organisational, so as to be able to cope with the inevitable multiplicity, diversification and poor standardisation of activities.
The best possible integration between the applications and the archives which constitute the information system must be envisaged, in that only from integration is it possible to obtain both process efficiency and the availability of top level information. In other words, information, which is the significant aggregation of data processed according to interpretative models, is often the result of the combination of data contained in the archives of different subsystems. There can be many examples, but it is sufficient to recall the problems of management control, which is the typical management function through which the Company ensures that the activity carried out by the organisation goes in the direction set out by the units concerned and that it operates in the most economical conditions possible. From an information point of view, management control is an area where very diversified information is collected and summarised into economic data and activity data or indicators of results, so that the costs of the activities and the relative production factors can be compared. Expressed in these terms the reasoning is not particularly complex. The nodal point, nevertheless, is that the design of such a system requires the
identification of the different levels at which the data must be treated and processed and, correspondingly, the identification of the integration mechanisms for the data themselves. Since integration is the solution to the functioning problems of the information system of the Company, an integration able to exchange flows of information is to be achieved by means of solutions which enable the sharing of archives by all subsystems. So it is clear how the support of a strong, organic and flexible system is fundamental to information and management activities. The system must be organised in such a way that the data originate only from primary sources, identified as follows:
- original data, generated by management processes;
- second level data, produced by processing procedures;
- complex data, resulting from the automatic acquisition from more than one archive.
The three above-mentioned types are analysed and described as follows.
Original data: We can define as original all the data that, with respect to the information process as a whole, are collected at the origin of the process, strictly correlated to the operational management of the activity. Original data therefore constitute all the current information collected and managed by the part of the information system that can be defined as transactional, or On Line Transaction Processing (OLTP).
Second level data: This is data obtained through the processing of information already acquired by the information system, possibly by procedures of a different subsystem to the one which is using the data.
Complex data: This is data necessary for control, management, and statistical and epidemiological assessment activities, originated by crossing data present in more than one archive at the moment in which they are correlated, in order to express significant values. This therefore concerns the definition of one or more data warehouses, starting from which the specific application systems carry out processing at a more aggregated level, according to an On Line Analytical Processing (OLAP) logic.
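The following is an illustrative sketch (with invented table and column names, not the actual schema of any health information system) of how original OLTP records of delivered services could be aggregated into complex, OLAP-style indicators for management control, for example total cost and activity volume per department and month.

```python
import pandas as pd

# Original (OLTP) data: one row per delivered service; column names are invented.
services = pd.DataFrame({
    "patient_id": [101, 102, 101, 103, 102],
    "department": ["Radiology", "Radiology", "Cardiology", "Cardiology", "Radiology"],
    "date": pd.to_datetime(["2003-01-10", "2003-01-15", "2003-02-03",
                            "2003-02-20", "2003-02-21"]),
    "cost": [120.0, 80.0, 250.0, 300.0, 95.0],
})

# Complex (OLAP-style) data: aggregate cost and activity per department and month.
summary = (services
           .assign(month=services["date"].dt.to_period("M"))
           .groupby(["department", "month"])
           .agg(total_cost=("cost", "sum"),
                n_services=("cost", "count"),
                n_patients=("patient_id", "nunique"))
           .reset_index())

print(summary)
```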
Figure 4. The layers of the health information system.
The division by types of data dealt with above can be interpreted in the light of the centrality of the patient in the health information system: Figure 4 visualises the analogy between the levels of data complexity and the thematic layers of the health information system related to the patient:
- the system centres upon the patient, who is at the base of the information pyramid;
- the next layer is constituted by the original data, which are related to the patient;
- further up lies the processing characteristic of operational systems (second level data), which operates fundamentally in the
service of the patient, both directly, in that the patient is the direct user of them, and indirectly, as support to the work of the health staff;
- the top two layers, both characterised by complex data resulting from statistical or OLAP processing, divide the information about the patient according to the two axes - effectiveness and efficiency - recalled most often:
  - in the first direction, the essential processing is of the medical and clinical sort, supporting the quality of the health outcome and the capability of the system to guarantee the necessary levels of assistance;
  - in the second direction, the typical processing is that supporting managerial and strategic decisions and the control of expenditure and the relative production level.
4. Orientation of the patient and analysis capability
As a whole, the information system of a health Company must be able to undertake many and diverse tasks, and it is necessary to computerise adequately all the processes inside the Company's administrative machine. In this sense, the general orientation to the patient, which constitutes the reading key favoured here, is obviously not sufficient. In fact, it is clear that there are specific areas of management and analysis where it is essential to support specific processes. For example, in the support of specific activities like hotel services or in interventions of industrial medicine, it is necessary to interface more with companies than with individuals. However, according to the definition presented here, the computerisation of internal processes (accounts, staff management, supplies), even though essential, must be the result of a total design which puts the management of processes orientated to users outside the Company at the centre. And this for two reasons: firstly, because the average level of computerisation of administrative processes is often already higher than that of the strictly productive ones, so that effectiveness and efficiency can best be improved in the latter direction; secondly, because only by making data flow in a direct and integrated way from production processes to administrative ones, in particular according to the analysis of effectiveness and efficiency mentioned above, is it possible to provide real added value to the Company through its information system. The ambition of this proposal of computerisation based on an integrated information system moves in this direction. This can be illustrated with the aid of a diagram, visualised in Figure 5, of the main (though certainly not the only) ways of access the patient has to the health system.
Figure 5. Simplified diagram about access to health services
The three “ways of access” identified in the figure (medical checkup/prescription from the family doctor, hospital access in adrmssions and hospitalizazation, information access by way of the PR Office) set up a path which, through the automation of the phases of the patient’s entrance into the health and medical care system (Appointments Centre, ADT, support to the PR Office), constructs a precise information route whch has numerous uses:
- firstly, the patients' register is placed as the point of co-ordination and integrated management of the different information elements (see the sketch after this list);
- the various processes of care act on this level (admissions, tests and check-ups); these produce both technical information elements (for example the Hospital Discharge Card) and administrative and economic effects, with the subsequent assessment of the services delivered;
- the administrative and control processes, i.e. in general all the internal aspects necessary for developing the analysis capability of the Company and its top management, are the consequence of the access process and of the complex path of welfare and medical care.
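A hypothetical sketch of the patient-centred register described above: a minimal data model in which admissions, delivered services and discharge cards are all linked back to the patient record, so that clinical and administrative views can be derived from the same data. All class and field names are invented for illustration and do not correspond to any specific product.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Service:
    code: str            # e.g. an exam or check-up code
    cost: float

@dataclass
class Admission:
    ward: str
    services: List[Service] = field(default_factory=list)
    discharge_card: str = ""          # compiled at discharge (HDC)

@dataclass
class PatientRecord:
    """The register entry: the co-ordination point for all information elements."""
    patient_id: int
    name: str
    admissions: List[Admission] = field(default_factory=list)

    def total_cost(self) -> float:
        # Administrative/economic view derived from the same clinical data.
        return sum(s.cost for a in self.admissions for s in a.services)

if __name__ == "__main__":
    p = PatientRecord(1, "Mario Rossi")
    adm = Admission(ward="Cardiology")
    adm.services.append(Service(code="ECG", cost=45.0))
    adm.services.append(Service(code="Echo", cost=120.0))
    adm.discharge_card = "HDC-2003-0001"
    p.admissions.append(adm)
    print(p.total_cost())   # -> 165.0
```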
All in all, therefore, building up an information system orientated to the patient, if the flows handled are correctly integrated, also turns out to be an effective and complete way of supporting the internal processes of an administrative sort, because of their natural correlation to the welfare and medical care path.
5. The Hospital Information System (HIS)
The Hospital Information System, in order to manage the complex wealth of company information and at the same time guarantee the involvement of the operative departments, must satisfy the objectives illustrated below:
Centrality of the patient
The information which is generated about a patient at the moment they come into contact with the structure and receive the services requested is collected and aggregated at the level of the patient themselves.
Co-ordination of the welfare process
The co-ordination and planned co-operation between operative units within the health structures (hospital specialists, family doctors, district teams in the hospital-territory system) enables the activation of a welfare process which ensures the highest levels of quality, timeliness and efficiency in the access to services and their delivery.
Control of production processes
Recent health legislation imposes an optimisation of expenditure and a revision of structures and processes. The transformation into company-like modules requires that medical care is carried out in the context of a global process which contemplates both clinical and administrative aspects.
The primary objectives that the Clinical/Hospital subsystem proposes to address are as follows:
- improvement of the organisation of work for a freer and more rational use of health structures;
- improvement of the efficiency and effectiveness of services in the light of the qualitative growth of clinical welfare;
- reduction of the time spent by the patient in the health structure, by means of the organisation of waiting times between the request for a service and the relative delivery (average in-patient times);
- rationalisation of the management of health information about the patient: complete and exhaustive previous clinical histories;
- availability of information for investigations and statistical surveys, clinical and epidemiological research;
- use of the System as a decisional support.
Figure 6 represents the integration between the modules which compose the subsystem of the Clinical/Hospital Area. The assertion of the centrality of the register, i.e. of the patient, which carries out the function of the primary node of the whole system, is confirmed.
Figure 6. Integration of modules
6. Integration of modules - processes
The integration of modules, i.e. the concept of an Integrated Health System, cannot be separated from a new company vision: a vision based on "Processes". This statement proposes another determining key for the "information system" as a strategic variable: the capability of the system to support company processes and, therefore, to map and configure itself in a flexible way onto these processes. In short, the design logic requested of an ERP system must regard not only administrative and accounting systems - now an acquired fact - but health ones too, both of the health-administrative sort and the health-professional sort. Each process is in itself complex and integrates administrative, health, economic and welfare elements. Each process therefore requires information integration towards itself and other correlated processes. As Figure 7 shows, an implicit logical information flow exists which transports information from health processes ("production" in the strictest sense) towards directional processes (the "government" of production), transiting through processes which are to a certain extent auxiliary (but obviously essential for the functioning of the production machine of the company), of the health/administrative and the accounting/administrative type.
Figure 7. Integration of processes
The architecture of the system must enable the support of the processes described in the figure and of the information flows which tie these processes together, and in particular:
- the territory-hospital integration favouring the patient and the relative welfare processes. The integration of welfare processes between the operative units of hospitals and the operators over the territory (family doctors and pharmacies) is activated through the support of the functionality of the whole process of delivery of services: from information activities, prescription, appointment making and delivery, to the payment of tickets and the withdrawal/acquisition of return information.
- the integration of the operational units, whether clinical or not, as an auxiliary service to the medical and health staff in the carrying out of activities relative to the care of the patient. The information system must handle in a unitary and facilitated way the activities of the care process, controlling the process of the delivery of the services requested and their outcome, so as to obtain an improvement in quality and efficiency.
- the integration of clinical-welfare information in order to guarantee the compactness of the welfare process. The visibility of the status of the total clinical-welfare process towards the patient is made possible thanks to access to previous clinical records.
- the integration of information for directional purposes as an auxiliary service to the management personnel of the Company. Following the latest reforms, Companies are pursuing management improvement, guaranteeing the delivery of services at the highest levels of quality, aiming at the final objective of health, at the total outcome of welfare. From that comes the need to collect, reconcile and integrate information coming from the different information, administrative and health subsystems.
- the information integration with the Regional Health Office General Management, in order to facilitate both the administrative operations aimed at reimbursements (communications with local health authorities and with the Regional office concerning budgets and services delivered), as well as control and regional supervision activities about health expenditure, and lastly activities of health (epidemiological) supervision. The interaction of the Company with the GM of the Regional Office has a double scope: to transmit promptly the documentation necessary to receive the reimbursements and regional financial support the Company is due for the services delivered, and to provide the Region with the information necessary to support the governing of expenditure and the management of the financial support, planning and rebalancing of the Regional Health System and the improvement of services for the population.
What follows is an illustration of three examples of health processes which involve a series of integrated modules to achieve their objective. The key identifies the fundamental health processes carried out at the hospital structure (represented on the left-hand side of the figures which follow), as well as the specific activities necessary for carrying out the processes themselves (in the green circles in the tables).
6.1. Delivery of Services Process
The Delivery of Services process starts with the request (prescription) for the service from the family doctor, the hospital doctor or the Emergency Units, and concludes with the issue of the referral (health process). The administrative/accounting process is completed with the handling of payment, the production of the flows relative to Ambulatory Services, and the handling of the flows relative to the Mobility of Specialist Services.
Figure 8 - Delivery of services process
The process of making appointments (request) can be activated by the modules:
- Appointment Centre and Web Appointment Centre (external patients);
- Internal Appointment Centre - Admissions, Discharges and Transfers (ADT) and Ward Management (internal patients);
- Emergency Department and Admissions (management of urgent requests);
- Out-Patient Management (direct acceptance of the service).
All the services within the preceding modules must automatically generate/feed the work lists of the Delivery Units of the services themselves (Ambulatory Management) and, if subject to payment, produce accounting records visible in the Cash Department Management module (the movement is recalled by the identification code of the appointment or, alternatively, by personal data). The acceptance and delivery of the service activates the referral process and the management of its progress; the issue of the referral makes it visible to the modules: Ambulatory/Laboratories Management, Admissions Discharges and Transfers (ADT), Ward Management, Emergency and Admissions Department (EAD), Operating Theatres, and the departments concerned. The services registered as delivered activate the administrative/accounting flow processes: control and analysis of ambulatory services (production of File "C"), and control and data validation for feeding the Mobility of Specialist Services.
6.2. Hospitalisation Process
The Hospitalisation process starts with the request (prescription) for admission to hospital from the family doctor, the hospital doctor or the Emergency Units, and concludes with the discharge of the patient and the closure of their Clinical Record (health process). The administrative/accounting process is completed with the handling of the Hospital Discharge Card (HDC) and its pricing (Grouper 3M integration), the production of the flows supplied by the closed HDCs, and the management of the flows relative to Hospitalisation Mobility.
Figure 9 - Hospitalisation process
The Hospitalisation process (request) can be activated by the modules:
- Admissions, Discharges and Transfers (ADT), for programmed hospitalisation and DH (both as administrative admission and waiting list);
- Ward Management (programmed hospitalisation);
- Emergency Department and Admissions (urgent hospitalisation);
- Internal Appointment Centre, for DH admissions with services/procedures by appointment.
Once admission has been carried out, regardless of the module used, the patient is placed directly onto the in-patients list of the ward or onto the ward's list of admissions to be confirmed. From the ADT and Ward Management modules it is possible to view the previous hospitalisation and ambulatory service reports, i.e. the patient's clinical history.
The cards relative to the hospitalisation, once filled out, are recorded in the HDC/DRG module, which carries out the pricing (Grouper 3M) and the formal controls and then feeds the Hospitalisation Mobility module. In-Patient Ward Management represents in itself the evolution of a clinical-health process within an administrative process (admission and filling out of the Discharge Card). The degree of integration ensures that the list of in-patients of each ward is fed by all the modules of the Health System which can carry out the administrative hospitalisation. Data relative to therapies/medicine administration and the pages of the specialist clinical record are visible in the complete clinical history of the patient. The administration of medicine automatically generates a discharge entry for the ward pharmacy cabinet, which is integrated with the Pharmacy Store for the management of the sub-stock and the related supply orders. From the specialist clinical record it is possible to access all administrative and clinical data relative to the current hospitalisation as well as to previous hospitalisations (including all the data about ambulatory services, with the referrals available in Ambulatory Management). The figure which follows represents the logical flow relative to the integrated management of an In-Patient Ward.
Figure 10 - Integration of an in-patient ward
6.3. Welfare Process
The Delivery of Welfare Services process starts with the request (opening of the file) for the service from the family doctor or the hospital doctor, and is defined with the management of the Multi-dimensional Assessment file for Adults and the Elderly and the placement on the waiting list according to the type of welfare regime. The health process is completed with the discharge from the welfare structure and the closure of the Clinical Record. The administrative/accounting process is fed by the recording of the activities delivered, applying the price lists valid for both private structures and those of the National Health Service. On the basis of the planned activities it is possible to produce expenditure forecasts for each structure and each type of welfare regime. For every welfare structure it is possible to import specific plans (defined by the regions) relative to the activities carried out on the patients on their lists. On the basis of the data in the files it is possible to carry out controls on the suitability of said data; these controls enable transparent management of the payments made to private structures under the welfare regime.
Figure 11 - Welfare Process
The welfare process (request, opening of the file) can be activated by the modules:
- Admissions, Discharges and Transfers (ADT), for the management of integrated home assistance and protected residential assistance, when this activity is strictly connected to the post-hospitalisation phase, i.e. protected discharge;
- Social-Health Department (management of the Multi-dimensional Assessment file for Adults and the Elderly and placement on the waiting lists).
The assessment of the patient for whom the request for assistance has been presented activates the specific welfare regime:
- Hospitalisation in Protected Welfare Accommodation;
- Day Centre Care;
- Temporary Social Hospitality;
- Protected Accommodation;
- Day Centre for the demented;
- Rehabilitation assistance;
- Integrated Home Assistance;
- Hospice.
A WIRELESS-BASED SYSTEM FOR AN INTERACTIVE APPROACH TO MEDICAL PARAMETERS EXCHANGE

GIANNI FENU, University of Cagliari, Department of Mathematics and Informatics, e-mail: [email protected]
ANTONIO CRISPONI, University of Cagliari, Department of Mathematics and Informatics, e-mail: [email protected]
SIMONE CUGIA, University of Cagliari, Department of Mathematics and Informatics, e-mail: [email protected]
MASSIMILIANO PICCONI, University of Cagliari, Department of Mathematics and Informatics, e-mail: [email protected]
The use of computer technology in medicine has lately seen the study of various applications suited to the management of clinical data, whether textual or images, across networks in which priority has always been given to policies of flexibility and safety. In the model presented here we have summarized the flexibility factors needed to develop systems with a broad application spectrum, building on wireless technology that allows safety and guarantees a certified client-server exchange over the network. In addition, the need to offer a broad base on the client side suggested the use of different PDA models, making the application largely portable and independent of the architecture. The same smart-client wireless network allows the area of deployment to grow.
1. Introduction
The diffusion of computer tools in different sectors of modern medicine has marked an evident discontinuity in the development of scientific activities. Notable benefits have been brought to the field of medicine by the improvement of hospital services and by the consequent growth in the quality of care, which in different ways is related not only to information processing in the strict sense but also to the transfer of information. In particular, the transfer of information has seen over time an improvement of applications both inside and outside the hospital environment. The evolution of computer technology and of wireless networks, combined with the development of applications that accommodate user mobility, offers several fields of application. This contribution was born among such innovative services and technologies: its aim is the study and implementation of a client-server network in a hospital environment, using palm devices for the interactive consultation, visualization and insertion of data relating to the clinical folders of patients. It allows the retrieval of information relative to any previous admission of a patient or to the data of a clinical folder, or simply the verification of the quality and/or quantity of specific parametric data. The possibility of consulting the patient database within the hospital structure allows the analysis and monitoring of therapies without the need for paperwork, suggesting a different method of data visualization in each department. The client interface has been designed to guarantee easy use and consultation, making it user-friendly for users with little computer knowledge while preserving security and flexibility in hospital departments. Particular attention has been paid to the radio frequencies used inside hospitals, through the use of the ISM (Industrial, Scientific and Medical) frequencies specified by the ITU Radio Regulations for scientific and medical devices, with bands between 2.4 and 2.5 GHz, so that the network conforms to the law.
2. Architectural Model
The solutions for the development of wireless applications can be classified as browser-based, synchronization-based, and smart-client [3]. The browser-based solution has the disadvantage of requiring a permanent connection, with the consequent problems of a data exchange higher than the user's needs and of cached information that may turn out not to be up to date. The synchronization-based solution has the opposite disadvantage, off-line operation: it does not allow the system to work in real time on the wireless network, since the application uses a cache of data on the handheld device. The smart-client solution exchanges over the network only the information actually requested, guaranteeing an adequate data rate and a simple query mode; a further advantage is its independence from the network architecture, so it integrates into the existing server architecture.
Figure 1. Smart-client architecture: Wi-Fi connection between a PDA client and a server.
Client-server systems are born as the coordination of two logical systems (the client and the server) and of their components within applications. Planning a client-server application requires the choice of an appropriate architecture; client-server software has to be modular, because many parts of an application are executed on more than one system [1]. The separation of application functions into different components simplifies the client-server processes, since components have clear locations when the applications are distributed. It is also necessary to divide the application modules into independent components, so that a requesting component (the client) obtains its expected outputs through a responding component (the server). Starting from the classical definition of client-server architecture as an interaction among distributed application components, the execution location of the components becomes the key factor for the performance of the application and of the system: some application components can be executed more efficiently on the server, others on the client. The tasks of each side are given below.
Tasks of the client:
- Visualization/presentation;
- Interaction with users;
- Formulation of requests to the server.
Tasks of the server:
- Queries on the database;
- Possible arithmetic and graphic processing;
- Patient data management;
- Communication with and authentication of users;
- Response processing.
To make the code executable on several handheld devices (Pocket PC and Palm) it is important to remove operating-system dependences from the application code, because hardware interfaces and peripherals of PDAs are currently not standardized; this allows the application to be used on different operating systems (Palm OS and Windows CE). In the network architecture, the management of client-server interactions uses communication techniques based on sockets, currently considered the most flexible technology. The client negotiates the socket with the server and establishes a connection on that socket; through this channel all the information of every single client converges on the server, so the same client, communicating with the server at different moments, may not receive data from the same server port.
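As an illustration of this socket-based exchange, the following minimal Python sketch shows a client opening a TCP connection, sending a command string and collecting the reply; the host address, port and command format are hypothetical and not taken from the system described here.

    import socket

    SERVER_HOST = "192.168.0.10"   # hypothetical address of the hospital server
    SERVER_PORT = 5050             # hypothetical service port

    def send_command(command: str) -> bytes:
        """Open a socket, send one command string and collect the whole reply."""
        with socket.create_connection((SERVER_HOST, SERVER_PORT), timeout=5) as sock:
            sock.sendall(command.encode("ascii"))
            sock.shutdown(socket.SHUT_WR)      # tell the server the request is complete
            chunks = []
            while True:
                data = sock.recv(1024)
                if not data:                   # server closed the connection
                    break
                chunks.append(data)
        return b"".join(chunks)

    if __name__ == "__main__":
        # Hypothetical command asking for the clinical folder of patient 42.
        print(send_command("#PATIENT#42"))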
3. Patients' data management
The filing, management and consultation of the database are handled by a DBMS engine, which provides an abstraction of the data from the way they are stored and managed and, above all, the possibility of performing queries with a high-level language. Particular importance is again given to the authentication system, to prevent database consultation by unauthorized users. To implement such a mechanism, different user profiles are defined, differentiated into several authorization levels, at a minimum over two areas: a restricted area and a reserved area. On the skeleton of the clinical folders of the patients, a relational structure was designed containing the patients' information fields: the bed occupied, the information about the hospitalization, the temperature curve, the water balance, the parameters relating to breathing and cardiac rate, the specific therapy and the medical examinations of the patient.
Figure 2. Scheme of the database.
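A minimal sketch of such a relational structure is given below, using SQLite from Python; the table and column names are hypothetical, chosen only to mirror the fields listed above, since the actual schema of Figure 2 is not reproduced in the text.

    import sqlite3

    # Hypothetical schema mirroring the fields described in the text.
    schema = """
    CREATE TABLE patient (
        patient_id   INTEGER PRIMARY KEY,
        surname      TEXT NOT NULL,
        name         TEXT NOT NULL,
        bed          TEXT                      -- bed currently occupied
    );
    CREATE TABLE admission (
        admission_id INTEGER PRIMARY KEY,
        patient_id   INTEGER REFERENCES patient(patient_id),
        admitted_on  TEXT,                     -- hospitalization information
        therapy      TEXT                      -- specific therapy
    );
    CREATE TABLE vital_sign (
        admission_id INTEGER REFERENCES admission(admission_id),
        recorded_at  TEXT,
        temperature  REAL,                     -- temperature curve samples
        heart_rate   INTEGER,                  -- cardiac rate
        resp_rate    INTEGER,                  -- breathing rate
        water_in_ml  INTEGER,                  -- water balance: intake
        water_out_ml INTEGER                   -- water balance: output
    );
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)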
4. Client interface
The application answers three fundamental requirements: providing the users' informative reports, providing security and reliability in data transfer, and providing data communication in different formats compatible with the computational architecture and the client interface. The communication/synchronization mechanism between server and client is implemented through a pattern-matching system which interprets, on the server side, the client commands and, on the client side, the server commands [1] [4]. The steps of the communication between server and client are (see the sketch after this list):
- the client sends a string composed of #<parameter> associations;
- the parser on the server side decodes the command string;
- the command is executed, followed by the corresponding operation on the database;
- the server side encodes the information as #<parameter> strings;
- the server side sends the information;
- the parser on the client side decodes the response string.
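A minimal sketch of such a #-delimited command parser is shown below; the command names and the field layout are hypothetical, since the full grammar of the protocol is not spelled out in the text.

    # Hypothetical grammar: "#COMMAND#arg1#arg2..." on the client side,
    # "#FIELD=value#FIELD=value..." on the server side.

    def parse_client_command(message: str):
        """Split a client command string into a command name and its arguments."""
        parts = [p for p in message.split("#") if p]
        if not parts:
            raise ValueError("empty command string")
        return parts[0], parts[1:]

    def encode_server_response(fields: dict) -> str:
        """Encode a dictionary of result fields as a #-delimited response string."""
        return "".join(f"#{key}={value}" for key, value in fields.items())

    def parse_server_response(message: str) -> dict:
        """Decode a #-delimited response string back into a dictionary."""
        pairs = [p.split("=", 1) for p in message.split("#") if p]
        return {key: value for key, value in pairs}

    # Example round trip.
    command, args = parse_client_command("#TEMP#42#weekly")
    response = encode_server_response({"PATIENT": 42, "MAX_TEMP": 38.2})
    print(command, args, parse_server_response(response))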
Figure 3. Searching mask of patients.
Figure 4. Searching mask of a patient's personal data.
Figure 5. Searching mask of cardio-breathing parameters.
During the implementation of the communication protocol some problems emerged in the management of the data flow; this depends on the different implementations of sockets between client and server across platforms. To solve these problems and overcome the limitations, it was chosen to limit the information sent from the server to the client to a fixed size of 100 bytes per step. In general, the server-side code interprets the commands arriving from the client and performs the calls to the response functions.
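The following sketch illustrates such fixed-size chunking; the 100-byte limit comes from the text, while the socket-handling details are assumptions.

    CHUNK_SIZE = 100  # fixed step size stated in the text

    def send_in_chunks(sock, payload: bytes) -> None:
        """Server side: send the response in fixed 100-byte steps to work
        around platform-dependent socket behaviour on the PDA clients."""
        for offset in range(0, len(payload), CHUNK_SIZE):
            sock.sendall(payload[offset:offset + CHUNK_SIZE])

    def receive_all(sock) -> bytes:
        """Client side: accumulate chunks until the server closes the connection."""
        chunks = []
        while True:
            data = sock.recv(CHUNK_SIZE)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)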
Figure 6. Searching mask of water balance parameters.
Figure 7. Searching mask of the temperature curve parameters.
Particular attention was given to the implementation of the temperature curve: from the client side it is possible to visualize the graph of the daily or weekly temperature. This element is of notable help, for example, in verifying the post-operative course of patients during the medical discussion.
5. Standards for transmission
To provide the required interactivity between client and server for the exchange of real-time information, an extremely flexible communication system was implemented, suitable to guarantee the mobility of the PDA [2] [6] [7]. Different technological solutions are currently available on the market to satisfy these requirements. The transmission occurs in the ISM band (Industrial Scientific Medical) around 2.45 GHz; it uses spread-spectrum techniques to avoid interfering signals: the signal is spread over a band larger than strictly necessary and, in this way, several systems can transmit in the same band [5]. Within the classes of wireless devices there is a distinction between devices designed to integrate seamlessly with pre-existing Ethernet LANs (IEEE 802.11) and devices that use an alternative technology to implement personal area networks (PAN), such as Bluetooth. The IEEE 802.11b protocol, which defines a standard for the physical layer and the MAC sublayer for the implementation of Wireless LANs, represents a communication system that extends a traditional LAN over a radio technology, and so facilitates its integration into an existing department. The adopted WLAN may be configured in two separate modes:
- Peer-to-peer: a connection without any fixed network device; two or more devices equipped with wireless cards can create a LAN on the fly.
- Client/server: this mode connects several devices to an Access Point, which works as a bridge between them and the wired LAN.
Bluetooth [8] is the name of an open standard for wireless communications, designed for short-range, low-power transmissions and ease of use. Bluetooth works at 2.4 GHz in the ISM band, in TDD (Time Division Duplex) and with Frequency Hopping (FHSS). It uses two types of transmission: the first, on an asynchronous bus, supports a maximum asymmetric rate of 721 Kbps; the second works on a synchronous bus, with a symmetric rate of 432.6 Kbps. To communicate with each other, two or more BDs (Bluetooth Devices) have to form a piconet (a radio LAN with a maximum of eight components). The baseband level reserves a slot to the master and a slot to the slaves, typically alternating (master-Slave1-master-Slave2-...), to resolve conflicts on the bus. The master can transmit only in odd slots, while the slaves transmit in even slots. The forwarding of packets is entrusted to the ARQ (Automatic Repeat reQuest) mechanism: the notification of an ACK (ACKnowledgment) to the sender of the packet attests the successful transmission.
The smart-client architecture works on both wireless systems. To operate over a wide area, the use of 802.11b technology is recommended: it turns out to be altogether low-cost, compatible with the Internet and Ethernet standards, and it allows high transmission speed with good QoS (Quality of Service), ensuring low packet loss in real-time applications.
6. Cryptography and Security
One of the weak points of any radio communication is the privacy and security of the data. Usually, for this reason, when a protocol for radio communications is implemented, advanced security standards are adopted. In this case as well, security rests on the functions of data authentication and cryptography. The authentication process can be carried out in two ways:
- Open System Authentication;
- Shared Key Authentication.
The first way provides no real authentication, so any device is permitted to access. The second way relies on a pre-shared key. This process is similar to the authentication process used in the GSM architecture: when the server receives an authentication request, it sends a pseudorandom number to the client. The terminal user, based on the pre-shared key and on the pseudorandom number, calculates a result (through a non-reversible function) and sends it to the server. The server performs the same computation and compares the two values; in this way it determines whether the user is entitled to access. In this mode the PDA-server authentication is guaranteed; a minimal sketch of such a challenge-response exchange is given below.
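In the sketch, HMAC-SHA256 stands in for the unspecified non-reversible function, so the exact primitive and the key value are assumptions rather than details taken from the system described here.

    import hashlib
    import hmac
    import os

    PRE_SHARED_KEY = b"hypothetical-pre-shared-key"   # provisioned on both PDA and server

    def server_challenge() -> bytes:
        """Server side: generate the pseudorandom number sent to the client."""
        return os.urandom(16)

    def client_response(challenge: bytes) -> bytes:
        """Client side: derive the response with a non-reversible function
        of the pre-shared key and the challenge (HMAC-SHA256 assumed here)."""
        return hmac.new(PRE_SHARED_KEY, challenge, hashlib.sha256).digest()

    def server_verify(challenge: bytes, response: bytes) -> bool:
        """Server side: recompute the same value and compare it with the reply."""
        expected = hmac.new(PRE_SHARED_KEY, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

    # Example exchange.
    nonce = server_challenge()
    assert server_verify(nonce, client_response(nonce))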
One of the aspects of the use of the 802.11b standard is its own security, which is entrusted to a protocol called WEP (Wired Equivalent Privacy), dealing with node authentication and cryptography. The logical diagram of the WEP algorithm is represented in the figure below.
Figure 8. Operation diagram of the WEP algorithm
The initialization vector (IV) is a 24-bit value, concatenated with the secret key (a 40-bit key). In this way a 64-bit seed is obtained, which is fed as input to a pseudorandom code generator (the WEP PRNG), creating the key sequence. The user data (plaintext) are concatenated with a 32-bit value, called the Integrity Check Value (ICV), generated by the integrity algorithm. At this point the key sequence and the concatenation of plaintext and ICV are combined with a XOR operation, producing the ciphertext. Then the IV and the ciphertext are concatenated and transmitted. The IV changes for every transmission, and it is the only part transmitted in the clear, so that the message can be reconstructed in the reception phase. A minimal sketch of this keystream-XOR scheme follows.
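The sketch below mirrors the scheme of Figure 8, assuming (as in real WEP) that the pseudorandom generator is RC4 and that the ICV is a CRC-32 checksum; it is intended only as an illustration of the diagram, not as a secure implementation, since WEP is known to be cryptographically weak.

    import os
    import zlib

    def rc4_keystream(key: bytes, length: int) -> bytes:
        """RC4 key scheduling plus pseudorandom generation (the 'PRGN' box in Figure 8)."""
        s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + s[i] + key[i % len(key)]) & 0xFF
            s[i], s[j] = s[j], s[i]
        out, i, j = bytearray(), 0, 0
        for _ in range(length):
            i = (i + 1) & 0xFF
            j = (j + s[i]) & 0xFF
            s[i], s[j] = s[j], s[i]
            out.append(s[(s[i] + s[j]) & 0xFF])
        return bytes(out)

    def wep_encrypt(secret_key_40bit: bytes, plaintext: bytes) -> bytes:
        """Return IV || ciphertext, following the diagram in Figure 8."""
        iv = os.urandom(3)                                   # 24-bit IV, new for every frame
        seed = iv + secret_key_40bit                         # 64-bit generator input
        icv = zlib.crc32(plaintext).to_bytes(4, "little")    # 32-bit Integrity Check Value
        data = plaintext + icv
        keystream = rc4_keystream(seed, len(data))
        ciphertext = bytes(d ^ k for d, k in zip(data, keystream))
        return iv + ciphertext                               # the IV is sent in the clear

    # Example: encrypt a short record with a (hypothetical) 40-bit secret key.
    frame = wep_encrypt(b"\x01\x02\x03\x04\x05", b"HR=72;TEMP=36.8")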
7. Conclusions
The described architecture is characterized by simplicity of implementation and by easy insertion into complex existing architectures. This integration model offers portability, security and interactivity, and it allows the user to interact with the server even at a distance. Some enhancements are under study to define different models and criteria for direct data exchange between PDA architectures. In any case, the model of a user free from constraints in the treatment of and access to the parametric data of the patient currently represents a simple and reliable smart-client architecture.
References
1. E. Guttman and J. Kempf, Automatic Discovery of Thin Servers: SLP, Jini and the SLP-Jini Bridge, Proc. 25th Ann. Conf. IEEE Industrial Electronics Soc. (IECON 99), IEEE Press, Piscataway, N.J., 1999.
2. AA.VV., edited by F. Muratore, Le comunicazioni mobili del futuro. UMTS: il nuovo sistema del 2001, CSELT, 2000.
3. R. Kuruppillai, M. Dontamsetti and J. Casentino, Tecnologie Wireless, McGraw-Hill, 1999.
4. L. Bright, S. Bhattacharjee and L. Rashid, Supporting diverse mobile applications with client profiles, Proc. 5th ACM Int. Workshop on Wireless Mobile Multimedia, pp. 88-95, Atlanta, Georgia, 2002.
5. European Telecommunications Standards Institute, official site: http://www.etsi.org
6. PDA Palm official site: http://www.palm.com
7. Italian Palm Users Group, official site: http://www.itapug.it
8. Bluetooth official site: http://www.bluetooth.com