Pattern Recognition 33 (2000) 875–876
Editorial
Special Issue on Mathematical Morphology & Nonlinear Image Processing
As a discipline, mathematical morphology has its roots in the pioneering work of Matheron [1] and Serra [2]. Mathematical morphology is a methodology for investigating geometric structures in images, and it has received growing attention in recent years. This is evident from the many industrial applications that have been developed and are currently being developed. These range from measurements of particles in microscope images to analyses of identifiable features in earth resources satellite systems. While the discipline of digital image processing has matured within the framework of linear systems, novel areas of nonlinear signal processing continue to appear. Indeed, all digital image processing is, of necessity, nonlinear, since it involves the processing of finite bit strings through logic circuits [3]. As a subject, nonlinear image processing has tended to focus mainly on the design and analysis of filters [3,4]. Nonlinear filters can pass structural information while suppressing noise clutter, and for the most part they involve Min/Max operations. The purpose of this issue is to provide an overview of existing and emerging techniques for morphological and nonlinear image processing. The published papers reflect the variety of strategies and methodologies that can be applied to achieve similar results. All the papers represent, in some way, the state of the art, and together they demonstrate that this is a dynamic subject area. The first paper, by Roerdink, presents the mathematical structure of group morphology. It surveys and reviews the representation of fundamental morphological operations acting on data ranging from sets to general lattices. An extension of the existing formalism to encompass non-commutative group mappings on lattices is presented, with emphasis on the 2D motion group as an illustrative theme.
The paper by Bloch presents fuzzy geodesic morphological operators, which are based on the definitions of fuzzy geodesic distances between points in a fuzzy set and of fuzzy geodesic balls. These operations enrich the set of fuzzy morphological operations, leading to conditional transformations of one fuzzy set into another.
The paper by Bieniek and Moga describes a variant of the original watershed algorithm that offers optimized performance and reduced memory requirements, while producing the same output as any watershed algorithm that does not construct watershed lines. The novelty of the approach is the use of a connected-component operator to solve the watershed segmentation problem. The paper by Cheng and Venetsanopoulos deals with adaptive morphological operations. An investigation of their properties reveals an interesting way of handling images and processing them quickly. Illustrative examples are included. The paper by Gader et al. presents and proves an interesting relationship between regularization theory and morphological shared-weight neural networks with no hidden units. This requires deriving the Fourier transforms of the Min and Max operators. The paper by Pessoa and Maragos introduces a novel class of neural networks (morphological/rank/linear neural networks), which offers researchers in the field an alternative architecture to consider. Applications to optical character recognition are discussed. The paper by Saryazdi et al. is concerned with a new non-uniform subsampling strategy based on mathematical morphology, in which samples are selected according to the local visual quality of the reconstruction. The method is validated by a comparative study in image compression applications. Shape comparison is one of the fundamental problems of machine vision. The paper by Tuzikov et al. discusses similarity measures for convex polyhedra based on Minkowski addition and the Brunn–Minkowski inequality, using the slope diagram representation of convex polyhedra. The paper by Schavemaker et al. studies implementations of the Kramer–Bruckner filter for image sharpening. The sharpening operator is defined in terms of gray-scale erosion and dilation and can be represented by a partial differential equation. Experimental results on document images are provided.
The paper by Gasteratos and Andreadis presents a new general digital hardware structure capable of computing a wide range of nonlinear filters, including standard and soft morphological operations. The paper by Vlassis et al. provides a reference on analog implementations of nonlinear filters and presents case studies of implementations based on the current-mode principle. The paper by Jones and Jackway introduces a new image texture representation technique. Through the use of two different parameterized monotonic mappings, this technique transforms the input image into a two-dimensional function that may be regarded as a surface, called a granold. The paper further establishes the fundamental properties of the granold and reports experiments using gray-level thresholds and morphological granulometries. The paper by Batman et al. extends multivariate granulometries to multivariate heterogeneous granulometries, in which each structuring element is scaled by a function of its sizing parameter. The basic morphological properties of heterogeneous granulometries are studied. The paper by Hirata et al. deals with increasing, translation-invariant binary filters whose optimal versions must be estimated from data. The presented algorithm is based on an error-related greedy property and has the advantage of searching a smaller set than other algorithms. The algorithm is applicable to relatively large windows. The paper by Aubert and Jeulin is devoted to the interesting problem of binary image reconstruction from first-, second- and third-order correlations.
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00151-X
The paper by Kesidis and Papamarkos presents a new Hough transform inversion technique. Applications to image edge extraction and filtering are provided.
I am grateful to Mrs. Mossman and Dr. Ledley for their encouragement in bringing out this special issue. I would also like to thank the authors for their contributions and the referees for their efforts under a tight schedule.
Ioannis Andreadis
Guest Editor
Laboratory of Electronics, Department of Electrical & Computer Engineering, Democritus University of Thrace, 671 00 Xanthi, Greece
E-mail address: [email protected]
References
[1] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.
[2] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London, 1982.
[3] E. Dougherty, J. Astola, An Introduction to Nonlinear Image Processing, SPIE Press, Bellingham, 1994.
[4] I. Pitas, A. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications, Kluwer, Boston, 1990.
Pattern Recognition 33 (2000) 877–895
Group morphology
Jos B.T.M. Roerdink*
Institute for Mathematics and Computing Science, University of Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands
Received 23 June 1998; received in revised form 27 July 1999; accepted 27 July 1999
Abstract
In its original form, mathematical morphology is a theory of binary image transformations which are invariant under the group of Euclidean translations. This paper surveys and extends constructions of morphological operators which are invariant under a more general group T, such as the motion group, the affine group, or the projective group. We will follow a two-step approach: first we construct morphological operators on the space $\mathcal{P}(T)$ of subsets of the group T itself; next we use these results to construct morphological operators on the original object space, i.e. the Boolean algebra $\mathcal{P}(E^n)$ in the case of binary images, or the lattice $\mathrm{Fun}(E^n, \mathcal{T})$ in the case of grey-value functions $F : E^n \to \mathcal{T}$, where E equals $\mathbb{R}$ or $\mathbb{Z}$, and $\mathcal{T}$ is the grey-value set. T-invariant dilations, erosions, openings and closings are defined and several representation theorems are presented. Examples and applications are discussed.
© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Mathematical morphology; Image processing; Boolean algebra; Complete lattice; Minkowski operations; Symmetry group; Dilation; Erosion; Opening; Closing; Adjunction; Invariance; Representation theorems
* Corresponding author. Tel.: +31-50-3633931; fax: +31-50-3633800. E-mail address: [email protected] (J.B.T.M. Roerdink)
1. Introduction
Mathematical morphology in its original form is a set-theoretical approach to image analysis [1,2]. It studies image transformations with a simple geometrical interpretation and their algebraic decomposition and synthesis in terms of elementary set operations. Such an algebraic decomposition enables fast and efficient implementations on digital computers, which explains the practical importance of such decompositions; see e.g. Ref. [3]. In order to reveal the structure of binary images, small subsets of various forms and sizes, called structuring elements, are translated over the image plane to perform shape extraction. In this way, one obtains image transformations which are invariant under translations. The basic 'object space' is the Boolean algebra of subsets of the image plane. In practice, it may be necessary to relax the restriction of translation invariance. For example, some images have
radial instead of translational symmetry [2, p. 17], requiring a polar group structure; see Example 2.8 below. In this case the size of the structuring element is proportional to the distance from the origin. The appropriate generalization of Euclidean morphology with arbitrary abelian symmetry groups was worked out by Heijmans [4]; see also Ref. [5]. In the case of grey-level images a lattice formulation is required; see Refs. [6–9]. Again one may introduce a symmetry group, and a complete characterization of morphological operators for the case that this group is abelian was obtained by Heijmans and Ronse [10,11].
This paper extends Euclidean morphology on $\mathbb{R}^n$ by including invariance under more general transformations, using the following general set-up. Take an arbitrary set E and a group T of transformations acting transitively on E, meaning that for every pair of elements $x, y \in E$ there is a transformation $g \in T$ mapping x to y. One says that E is a homogeneous space under T. Then T-invariant morphological operators on the space $\mathcal{P}(E)$ of subsets of E can be constructed [12–14]. A further extension concerns non-Boolean lattices, such as the space of grey-scale functions on E. The basic assumption made in this paper is that the lattice has a sup-generating family $\ell$ and a group T of automorphisms which acts transitively on $\ell$, thus generalizing the work of Heijmans and Ronse [10,11], who considered the abelian case.
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00152-1
The motivation for this approach derives from computer vision, where an important question is how to take the projective geometry of the imaging process into account. In many situations one does not want to distinguish between rotated versions of the same object. This is, for example, the basic assumption made in integral geometry in order to derive a complete characterization (Hadwiger's Theorem) of functionals of compact convex sets in $\mathbb{R}^n$ [15]. Another example occurs in invariant pattern recognition, where the goal is to recognize patterns irrespective of their orientation or location [16]. In image understanding, one wants to derive information about three-dimensional (3D) scenes from projections on a planar (2D) image screen. In this case it is natural to require invariance of image operations under 3D camera rotations [17]. So one may require invariance under increasingly larger groups, such as the Euclidean motion group, the similarity group, the affine group or the projective group, which are all non-commutative groups. For general questions of invariance in computer vision, see, for example, Ref. [18]. The purpose of this paper is to describe the mathematical structure of group morphology. For practical applications special algorithms are required, which extend the basic translation-invariant operations supported by standard image processing packages. An in-depth discussion of such algorithmic and computational issues is beyond the scope of this paper; however, some pertinent remarks can be found in the example presented in Section 4.6.2 below. The organization of this paper is as follows. In Section 2 we summarise Euclidean morphology together with some general lattice concepts, and present some material on group actions. Section 3 reviews the construction developed in Refs.
[12–14] of morphological operators on Boolean lattices, which are appropriate for binary image processing. The starting point is a group T acting transitively on a set E. First, T-invariant morphological operators are defined on the lattice $\mathcal{P}(T)$ of subsets of T by generalizing the Minkowski operations to non-commutative groups. Next, morphological operators are constructed on the actual object space of interest, $\mathcal{P}(E)$, by (i) mapping the subsets of E to subsets of T, (ii) using the results for the lattice $\mathcal{P}(T)$, and (iii) projecting back to the original space $\mathcal{P}(E)$. Graphical illustrations are given for the case where T equals the Euclidean motion group M generated by translations and rotations. Section 4 deals with non-Boolean lattices, and as a special case we discuss T-invariant morphological operators for grey-scale functions. The material in this section is new. Section 5 contains a summary and discussion.
2. Preliminaries
In this section we review Euclidean morphology and introduce some general concepts concerning complete lattices and group actions.
2.1. Euclidean morphology
Let E be the Euclidean space $\mathbb{R}^n$ or the discrete grid $\mathbb{Z}^n$. By $\mathcal{P}(E)$ we denote the set of all subsets of E ordered by set inclusion. A binary image can be represented as a subset X of E. Now E is a commutative group under vector addition: we write $x + y$ for the sum of two vectors x and y, and $-x$ for the inverse of x. The following two algebraic operations are fundamental in mathematical morphology:
Minkowski addition: $X \oplus A = \{x + a : x \in X,\ a \in A\} = \bigcup_{a \in A} X_a = \bigcup_{x \in X} A_x$,
Minkowski subtraction: $X \ominus A = \bigcap_{a \in A} X_{-a}$,
where $X_a = \{x + a : x \in X\}$ is the translate of the set X along the vector a. In preparation for later developments we introduce here the operator $\tau_a : \mathcal{P}(E) \to \mathcal{P}(E)$ by $\tau_a(X) = X_a$, referred to as 'translation by a'. Clearly, $\tau_a \tau_{a'} = \tau_{a+a'}$ and $\tau_a^{-1} = \tau_{-a}$. Hence the collection $T := \{\tau_a : a \in E\}$ also forms a group, called the translation group, which is 'isomorphic' (as a group) to E, for to each point a there corresponds precisely one translation $\tau_a \in T$, i.e. the one which maps the origin to a. Because of this one-to-one correspondence, one usually ignores the distinction in Euclidean morphology. Let the reflected or symmetric set of A be denoted by $\check{A} = \{-a : a \in A\}$. The transformations $\delta^T_A$ and $\varepsilon^T_A$ defined by
$$\delta^T_A(X) := X \oplus A = \{h \in E : (\check{A})_h \cap X \neq \emptyset\}, \tag{1}$$
$$\varepsilon^T_A(X) := X \ominus A = \{h \in E : A_h \subseteq X\}, \tag{2}$$
are called dilation and erosion by the structuring element A, respectively. To distinguish these translation-invariant operations from later generalizations, we explicitly indicate the dependence on the Euclidean translation group T and refer to them as T-dilations and T-erosions. There exists a duality relation with respect to set complementation ($X^c$ denotes the complement of the set X): $X \oplus A = (X^c \ominus \check{A})^c$, i.e. dilating an image by A gives the same result as eroding the background by $\check{A}$ and taking the complement. To any mapping $\psi : \mathcal{P}(E) \to \mathcal{P}(E)$ we associate the (Boolean) dual mapping $\psi^* : \mathcal{P}(E) \to \mathcal{P}(E)$ defined by
$$\psi^*(X) = \big(\psi(X^c)\big)^c. \tag{3}$$
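Eqs. (1) and (2) translate directly into set computations. The following is a minimal Python sketch, assuming finite images represented as frozensets of integer points of $\mathbb{Z}^2$ and a non-empty structuring element (erosion by the empty set would give all of E, which a finite set cannot represent). The function names are illustrative choices, not notation from the paper.

```python
# Sketch: Euclidean Minkowski operations, Eqs. (1)-(2), restricted to
# finite subsets of the grid Z^2. Points are integer pairs (x, y).

def translate(X, a):
    """X_a = {x + a : x in X}."""
    return frozenset((x + a[0], y + a[1]) for (x, y) in X)

def reflect(A):
    """Reflected set {-a : a in A}."""
    return frozenset((-x, -y) for (x, y) in A)

def dilate(X, A):
    """T-dilation X (+) A as the union of the translates X_a, a in A."""
    return frozenset(p for a in A for p in translate(X, a))

def erode(X, A):
    """T-erosion X (-) A = {h : A_h is a subset of X}, cf. Eq. (2).

    If A_h lies inside X then h + a0 is in X for a fixed a0 in A, so the
    points x - a0 (x in X) are the only candidates that need testing.
    """
    if not A:
        raise ValueError("structuring element must be non-empty")
    a0 = next(iter(A))
    candidates = {(x - a0[0], y - a0[1]) for (x, y) in X}
    return frozenset(h for h in candidates if translate(A, h) <= X)

if __name__ == "__main__":
    X = frozenset({(0, 0), (1, 0), (2, 0), (1, 1)})
    A = frozenset({(0, 0), (1, 0)})   # a horizontal two-point segment
    print(sorted(dilate(X, A)))  # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0)]
    print(sorted(erode(X, A)))   # [(0, 0), (1, 0)]
```

Because `erode` only tests the points $x - a_0$, its work grows with $|X| \cdot |A|$ rather than with the size of an image window; this is a choice of the sketch, not part of the theory.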
Remark 2.1. Matheron and Serra define the Minkowski subtraction of X by A as follows: $X \ominus A = \bigcap_{a \in A} X_a$. The advantage of this definition is that the duality relation does not involve a reflection of the structuring element. But it complicates the expression of adjunctions (see Section 2.2.2), a notion which persists in lattices without complementation.
Minkowski addition and subtraction have many standard algebraic properties [15]. Two important ones are distributivity w.r.t. union or intersection,
$$\Big(\bigcup_{i \in I} X_i\Big) \oplus A = \bigcup_{i \in I} (X_i \oplus A), \qquad \Big(\bigcap_{i \in I} X_i\Big) \ominus A = \bigcap_{i \in I} (X_i \ominus A),$$
and translation invariance: $(X \oplus A)_h = X_h \oplus A$, $(X \ominus A)_h = X_h \ominus A$. Dilation and erosion are increasing mappings, i.e. mappings $\psi$ such that for all $X, Y \in \mathcal{P}(E)$, $X \subseteq Y$ implies $\psi(X) \subseteq \psi(Y)$. Other important increasing transformations are the opening $\alpha^T_A$ and closing $\phi^T_A$ by a structuring element A:
$$\alpha^T_A(X) := (X \ominus A) \oplus A = \bigcup\{A_h : h \in E,\ A_h \subseteq X\},$$
$$\phi^T_A(X) := (X \oplus A) \ominus A = \bigcap\{(\check{A}^c)_h : h \in E,\ (\check{A}^c)_h \supseteq X\}.$$
The opening of X is the union of all the translates of the structuring element which are included in X. The closing of X by A is the complement of the opening of $X^c$ by $\check{A}$.
2.2. Lattice concepts
Here we summarize the main concepts from lattice theory needed in this paper; cf. Refs. [6,7]. For a general introduction to lattice theory, see Birkhoff [19].
Definition 2.2. A complete lattice $(\mathcal{L}, \leq)$ is a partially ordered set $\mathcal{L}$ with order relation $\leq$, a supremum or join operation written $\vee$ and an infimum or meet operation written $\wedge$, such that every (finite or infinite) subset of $\mathcal{L}$ has a supremum (least upper bound) and an infimum (greatest lower bound). In particular there exist two universal bounds, the least element written $O_{\mathcal{L}}$ and the greatest element $I_{\mathcal{L}}$.
In the case of the power lattice $\mathcal{P}(E)$ of all subsets of a set E, the order relation is set inclusion $\subseteq$, the supremum is the union $\cup$ of sets, the infimum is the intersection $\cap$ of sets, the least element is the empty set $\emptyset$ and the greatest element is the set E itself. An atom is an element X of a lattice $\mathcal{L}$ such that for any $Y \in \mathcal{L}$, $O_{\mathcal{L}} \leq Y \leq X$ implies that $Y = O_{\mathcal{L}}$ or $Y = X$. A complete lattice $\mathcal{L}$ is called atomic if every element of $\mathcal{L}$ is the supremum of the atoms less than or equal to it. It is called Boolean if (i) it satisfies the distributivity laws
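$X \vee (Y \wedge Z) = (X \vee Y) \wedge (X \vee Z)$ and $X \wedge (Y \vee Z) = (X \wedge Y) \vee (X \wedge Z)$ for all $X, Y, Z \in \mathcal{L}$, and (ii) every element X has a unique complement $X^c$, defined by $X \vee X^c = I_{\mathcal{L}}$, $X \wedge X^c = O_{\mathcal{L}}$. The power lattice $\mathcal{P}(E)$ is an atomic complete Boolean lattice, and conversely any atomic complete Boolean lattice has this form.
2.2.1. Mappings
The composition of two mappings $\psi_1$ and $\psi_2$ on a complete lattice $\mathcal{L}$ is written $\psi_1\psi_2$, and instead of $\psi\psi$ we also write $\psi^2$. An automorphism of $\mathcal{L}$ is a bijection $\psi : \mathcal{L} \to \mathcal{L}$ such that for any $X, Y \in \mathcal{L}$, $X \leq Y$ if and only if $\psi(X) \leq \psi(Y)$. If $\psi_1$ and $\psi_2$ are operators on $\mathcal{L}$, we write $\psi_1 \leq \psi_2$ to denote that $\psi_1(X) \leq \psi_2(X)$ for all $X \in \mathcal{L}$.
Definition 2.3. A mapping $\psi : \mathcal{L} \to \mathcal{L}$ is called
(a) idempotent, if $\psi^2 = \psi$;
(b) extensive, if for every $X \in \mathcal{L}$, $\psi(X) \geq X$;
(c) anti-extensive, if for every $X \in \mathcal{L}$, $\psi(X) \leq X$;
(d) increasing (isotone, order-preserving), if $X \leq Y$ implies that $\psi(X) \leq \psi(Y)$ for all $X, Y \in \mathcal{L}$;
(e) a closing, if it is increasing, extensive and idempotent;
(f) an opening, if it is increasing, anti-extensive and idempotent.
Definition 2.4. Let $\mathcal{L}$ and $\tilde{\mathcal{L}}$ be complete lattices. A mapping $\psi : \mathcal{L} \to \tilde{\mathcal{L}}$ is called
(a) a dilation, if $\psi(\bigvee_{i \in I} X_i) = \bigvee_{i \in I} \psi(X_i)$;
(b) an erosion, if $\psi(\bigwedge_{i \in I} X_i) = \bigwedge_{i \in I} \psi(X_i)$.
When T is an automorphism group of two lattices $\mathcal{L}$ and $\tilde{\mathcal{L}}$, a mapping $\psi : \mathcal{L} \to \tilde{\mathcal{L}}$ is called T-invariant or a T-mapping if it commutes with all $\tau \in T$, i.e. if $\psi(\tau(X)) = \tau(\psi(X))$ for all $X \in \mathcal{L}$, $\tau \in T$. Accordingly, one speaks of T-dilations, T-erosions, etc. If no invariance under a group is required, one may set $T = \{\mathrm{id}_{\mathcal{L}}\}$, where $\mathrm{id}_{\mathcal{L}}$ is the identity operator on $\mathcal{L}$.
2.2.2. Adjunctions
Definition 2.5. Let $\varepsilon : \mathcal{L} \to \tilde{\mathcal{L}}$ and $\delta : \tilde{\mathcal{L}} \to \mathcal{L}$ be two mappings, where $\mathcal{L}$ and $\tilde{\mathcal{L}}$ are complete lattices. Then the pair $(\varepsilon, \delta)$ is called an adjunction between $\mathcal{L}$ and $\tilde{\mathcal{L}}$ if for every $X \in \tilde{\mathcal{L}}$ and $Y \in \mathcal{L}$ the following equivalence holds:
$$\delta(X) \leq Y \iff X \leq \varepsilon(Y).$$
If $\tilde{\mathcal{L}}$ coincides with $\mathcal{L}$ we speak of an adjunction on $\mathcal{L}$. It has been shown [10,11,20] that in an adjunction $(\varepsilon, \delta)$, $\varepsilon$ is an erosion and $\delta$ a dilation.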
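As a concrete check of Definitions 2.3 and 2.5, the sketch below verifies on small finite subsets of $\mathbb{Z}^2$ that the Euclidean opening $\alpha^T_A = \delta^T_A \varepsilon^T_A$ of Section 2.1 is anti-extensive and idempotent (and increasing on an example), and that $(\varepsilon^T_A, \delta^T_A)$ satisfies the adjunction equivalence. It assumes point sets represented as frozensets of integer pairs and a non-empty structuring element A; the brute-force helper `is_adjunction` is hypothetical and exponential in the size of the chosen universe, so it is meant only for tiny examples.

```python
# Sketch: Definition 2.3 (f) and Definition 2.5 for the Euclidean
# T-dilation/T-erosion on finite subsets of Z^2.
from itertools import chain, combinations

def translate(X, a):
    return frozenset((x + a[0], y + a[1]) for (x, y) in X)

def dilate(X, A):
    return frozenset(p for a in A for p in translate(X, a))

def erode(X, A):
    a0 = next(iter(A))  # the structuring element A is assumed non-empty
    candidates = {(x - a0[0], y - a0[1]) for (x, y) in X}
    return frozenset(h for h in candidates if translate(A, h) <= X)

def opening(X, A):
    """alpha^T_A = delta^T_A epsilon^T_A: erode first, then dilate."""
    return dilate(erode(X, A), A)

def subsets(universe):
    """All subsets of a small finite universe (2**n of them)."""
    pts = list(universe)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(pts, n) for n in range(len(pts) + 1))]

def is_adjunction(A, universe):
    """Check delta(X) <= Y  <=>  X <= epsilon(Y) for all X, Y in the universe."""
    S = subsets(universe)
    return all((dilate(X, A) <= Y) == (X <= erode(Y, A)) for X in S for Y in S)

if __name__ == "__main__":
    X = frozenset({(0, 0), (1, 0), (2, 0), (0, 1), (5, 5)})
    A = frozenset({(0, 0), (1, 0)})
    print(sorted(opening(X, A)))  # [(0, 0), (1, 0), (2, 0)]
```

In the example, the isolated point (5, 5) and the vertical spur at (0, 1) disappear because no horizontal two-point translate of A fits there.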
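Also, for every dilation $\delta : \tilde{\mathcal{L}} \to \mathcal{L}$ there is a unique erosion $\varepsilon : \mathcal{L} \to \tilde{\mathcal{L}}$ such that $(\varepsilon, \delta)$ is an adjunction between $\mathcal{L}$ and $\tilde{\mathcal{L}}$; $\varepsilon$ is given by $\varepsilon(Y) = \bigvee\{X \in \tilde{\mathcal{L}} : \delta(X) \leq Y\}$, and is called the upper adjoint of $\delta$. Similarly, for every erosion $\varepsilon : \mathcal{L} \to \tilde{\mathcal{L}}$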
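there is a unique dilation $\delta : \tilde{\mathcal{L}} \to \mathcal{L}$ such that $(\varepsilon, \delta)$ is an adjunction between $\mathcal{L}$ and $\tilde{\mathcal{L}}$; $\delta$ is given by $\delta(X) = \bigwedge\{Y \in \mathcal{L} : X \leq \varepsilon(Y)\}$, and is called the lower adjoint of $\varepsilon$. Finally, for any adjunction $(\varepsilon, \delta)$, the mapping $\delta\varepsilon$ is an opening on $\mathcal{L}$ and $\varepsilon\delta$ is a closing on $\tilde{\mathcal{L}}$. In the case that $\mathcal{L}$ and $\tilde{\mathcal{L}}$ are identical, one sometimes refers to such openings and closings as morphological or adjunctional [7].
2.2.3. Sup-generating families
Definition 2.6. A subset $\ell$ of a lattice $\mathcal{L}$ is called sup-generating¹ if every element of $\mathcal{L}$ can be written as a supremum of elements of $\ell$.
Let $\mathcal{L}$ be a lattice with sup-generating subset $\ell$. For every $X \in \mathcal{L}$, let $\ell(X) = \{x \in \ell : x \leq X\}$. The following properties hold [7,10,11]:
$$X = \bigvee \ell(X), \tag{4}$$
$$\ell\Big(\bigwedge_{j \in J} X_j\Big) = \bigcap_{j \in J} \ell(X_j), \tag{5}$$
$$\ell\Big(\bigvee_{j \in J} X_j\Big) \supseteq \bigcup_{j \in J} \ell(X_j), \tag{6}$$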
$$\bigvee\Big(\bigcup_{j \in J} \ell(X_j)\Big) = \bigvee_{j \in J} X_j. \tag{7}$$
Note also that the operators $\ell : X \mapsto \ell(X)$ and $\bigvee : G \mapsto \bigvee G$ (i) are increasing, and (ii) form an adjunction between $\mathcal{L}$ and $\mathcal{P}(\ell)$: $\bigvee G \leq X \iff G \subseteq \ell(X)$. This equation, together with Eq. (4), also implies the equivalence $X \leq Y \iff \ell(X) \subseteq \ell(Y)$. Atoms of a lattice $\mathcal{L}$ are always members of a sup-generating subset. $\mathcal{L}$ is atomic if the set of its atoms is sup-generating. For example, given a set E, the set of singletons is sup-generating in the lattice $\mathcal{P}(E)$.
¹ The dual concept is that of an inf-generating subset [7].
2.3. Group actions
Let E be a non-empty set and T a transformation group on E. Each element $g \in T$ is a mapping $g : E \to E$ satisfying (i) $gh(x) = g(h(x))$ and (ii) $e(x) = x$, where e is the unit element of T and gh denotes the product of two group elements g and h. Instead of $g(x)$ we will usually write $gx$. We say that T is a group action on E [21,22]. T is called transitive on E if for each $x, y \in E$ there is a $g \in T$ such that $gx = y$, and simply transitive when this element g is
unique. A homogeneous space is a pair (T, E) where T is a group acting transitively on E. Any transitive abelian group T is simply transitive. The stabilizer of $x \in E$ is the subgroup $T_x := \{g \in T : gx = x\}$. Let u be an arbitrary but fixed point of E, henceforth called the origin. The stabilizer $T_u$ will be denoted by $\Sigma$ from now on: $\Sigma := T_u = \{g \in T : gu = u\}$. The set $g_x\Sigma := \{g_x s : s \in \Sigma\}$ of group elements which map u to a given point x is called a left coset. Here $g_x$ is a representative (an arbitrary element) of this coset. In the following we present some examples of homogeneous spaces. In each case T denotes the group and E the corresponding set.
Example 2.7 (Euclidean group). E = Euclidean space $\mathbb{R}^n$, T = the Euclidean translation group T. T is abelian, therefore it can be identified with E [14]. Elements of T can be parameterized by vectors $h \in \mathbb{R}^n$, with $\tau_h$ the translation over the vector h: $\tau_h x = x + h$, $x \in \mathbb{R}^n$.
Example 2.8 (Polar group). $E = \mathbb{R}^2 \setminus \{0\}$, T = the abelian group generated by rotations and scalar multiplication w.r.t. the origin. In this case points of E can be given in polar coordinates $(r, \theta)$, $r > 0$, $0 \leq \theta < 2\pi$. Again T can be identified with E, and the group multiplication is $(r_1, \theta_1) * (r_2, \theta_2) = (r_1 r_2, \theta_1 + \theta_2)$, with the angles added modulo $2\pi$; cf. Ref. [5].
Example 2.9 (Spherical group). E = the sphere $S^2$, T = the non-abelian group SO(3) of rotations in 3-space (see Ref. [23]). The subgroup leaving a point p fixed is the set of all rotations around the axis through p and the center of the sphere.
Example 2.10 (Translation-rotation group). E = Euclidean space $\mathbb{R}^2$, T = the Euclidean motion group M (proper Euclidean group, group of rigid motions) [24]. The subgroup leaving a point p fixed is the set of all rotations around p. M is not abelian. The collection of translations forms a subgroup, the translation group T. The stabilizer $\Sigma$ equals the group R of rotations around the origin, which is abelian. A group element $\gamma_{h,\phi}$, $h \in \mathbb{R}^2$, $\phi \in [0, 2\pi)$, acts upon a point $x \in \mathbb{R}^2$ as follows:
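$$\gamma_{h,\phi}\, x = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2.$$
Let $\tau_h$ denote the unique (Euclidean) translation by h (cf. Example 2.7), and let $r_\phi$ be the rotation around the origin over an angle $\phi$. It is easy to verify that $\gamma_{h,\phi} = \tau_h r_\phi$. From the relations
$$\tau_h \tau_{h'} = \tau_{h+h'}, \qquad r_\phi r_{\phi'} = r_{\phi+\phi'}, \qquad r_\phi \tau_h = \tau_{r_\phi h}\, r_\phi, \tag{8}$$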
it is clear that we can represent any element of the motion group as the product of a single rotation around the origin followed by a single translation. The last equality in Eq. (8) expresses the fact that the motion group M is the semi-direct product of T and R [21,22].
We now introduce a graphical representation of the group elements. Define a pointer p to be a pair (x, v), where x is a point in the plane and v a unit vector attached to x. We call x the base-point of p. Define the base-pointer b to be the pair $(u, e_1)$, where $e_1 = (1, 0)$, i.e. b is a horizontal unit vector attached to the origin u. Any pointer p represents a unique element of M: if p = (x, v), where $v = (\cos\phi, \sin\phi)$, then this element is precisely the motion $\gamma_{x,\phi}$ which maps b to p. The 2D rotation group R is represented by the set of unit vectors attached to the origin, and T is represented by the collection of horizontal unit vectors attached to points of $\mathbb{R}^2$. In the discrete case we will use a hexagonal grid, and M will denote the subgroup of all motions which leave the grid invariant. Also, T now becomes a discrete set of translations, and R is a finite group with six elements: rotations around the origin over $k \cdot 60°$, $k = 1, 2, \ldots, 6$. The reader may refer to Fig. 1, where subsets of the grid are indicated by dots and subsets of M by dots with one or more unit vectors attached to them. Notice also that the coset $\tau_y\Sigma := \{\tau_y r : r \in R\}$ of all motions carrying the origin to a given point y is represented on the hexagonal grid by the six unit vectors attached to y.
Example 2.11 (Affine group). E = Euclidean space $\mathbb{R}^n$ ($n \geq 2$), T = the affine group. The subgroup $\Sigma$ leaving the origin fixed is the linear group $GL(n, \mathbb{R})$, whose elements are $n \times n$ invertible matrices a. A group element acts upon a point $x \in E$ as follows: $\gamma_{h,a}\, x = ax + h$, $a \in GL(n, \mathbb{R})$, $h \in \mathbb{R}^n$. Let $\rho_a : x \mapsto ax$ denote the linear transformation by the matrix a. Then $\gamma_{h,a} = \tau_h \rho_a$. The relation $\rho_a \tau_h \rho_a^{-1} = \tau_{ah}$
again expresses the fact that the affine group is the semi-direct product of T and $GL(n, \mathbb{R})$ [21,22].
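The relations in Eq. (8) can be checked mechanically in the discrete setting of Example 2.10. The sketch below models the hexagonal-grid motion group as pairs (h, k) standing for $\tau_h r^k$, where r is the rotation over 60° about the origin, so that the product rule $(h_1, k_1)(h_2, k_2) = (h_1 + r^{k_1}h_2,\ k_1 + k_2 \bmod 6)$ is exactly the semi-direct product structure. Using axial coordinates (p, q) for the hexagonal grid, in which a 60° rotation is the integer map $(p, q) \mapsto (-q, p + q)$, is an assumption of this sketch, and all helper names are hypothetical.

```python
# Sketch of the discrete motion group of the hexagonal grid (Example 2.10).
# Assumption: grid points use axial coordinates (p, q), in which a rotation
# over 60 degrees about the origin is the integer map (p, q) -> (-q, p + q).
ORIGIN = (0, 0)

def rot60(x, k=1):
    """Rotate the grid point x about the origin over k * 60 degrees."""
    p, q = x
    for _ in range(k % 6):          # k % 6 also handles negative k
        p, q = -q, p + q
    return (p, q)

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

# A motion is a pair (h, k) standing for tau_h r^k: first rotate over
# k * 60 degrees about the origin, then translate by h.
def act(g, x):
    h, k = g
    return add(rot60(x, k), h)

def compose(g1, g2):
    """Group product g1 g2, i.e. the motion x -> g1(g2(x))."""
    (h1, k1), (h2, k2) = g1, g2
    return (add(h1, rot60(h2, k1)), (k1 + k2) % 6)

def inverse(g):
    h, k = g
    p, q = rot60(h, -k)
    return ((-p, -q), (-k) % 6)

def tau(h):
    """Pure translation tau_h."""
    return (h, 0)

def rot(k=1):
    """Rotation r^k about the origin, an element of the stabilizer R."""
    return (ORIGIN, k % 6)

if __name__ == "__main__":
    print(compose(rot(1), tau((1, 0))))   # ((0, 1), 1)
    print(compose(tau((1, 0)), rot(1)))   # ((1, 0), 1)
```

The two printed products differ, which is the non-commutativity of M; the first one equals $\tau_{r(1,0)}\, r$, as the last relation in Eq. (8) predicts.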
3. Group morphology for Boolean lattices
This section reviews the construction developed in Refs. [12–14] of morphological operators on Boolean lattices with a transitive group action, appropriate for binary image processing. First we consider in Section 3.1 the case that E is a homogeneous space under a group T acting simply transitively on E. In this case there is a bijection between E and T: let u (the 'origin') be an arbitrary point of E, and associate to any $x \in E$ the unique element of T which maps u to x. Hence in the simply transitive case it is sufficient to study the power lattice $\mathcal{P}(T)$, i.e. the set of subsets of T ordered by set inclusion. The second case is that of a group T acting transitively on E. The object space of interest is again the Boolean lattice $\mathcal{P}(E)$ of all subsets of E. The general strategy is to make use of the results for the simply transitive case, by 'lifting' subsets of E to subsets of T, applying morphological operators on $\mathcal{P}(T)$, and then 'projecting' the results back to the original space E. The constructed operators are illustrated for the Euclidean motion group M acting on the hexagonal grid, using the representation by pointers introduced in Example 2.10.
3.1. Minkowski operators on groups
On any group T one can define generalizations of the Minkowski operations [12,14]. We denote elements of T by g, h, k, etc., and subsets of T by capitals G, H, K. The product of two group elements g and h is written gh, the inverse of g is denoted by $g^{-1}$, and e is the unit element of T. For $g \in T$, $H \subseteq T$, let
$$gH := \{gh : h \in H\}, \qquad Hg := \{hg : h \in H\}$$
be the left and right products of a group element with a subset of T. For later use we also define the inverted set of a subset G by $G^{-1} = \{g^{-1} : g \in G\}$. Note that inversion reduces to reflection for subsets of the Euclidean translation group (see Section 2.1).
Fig. 1. Representation of elements of the Euclidean motion group on the hexagonal grid. b: base-pointer. p: pointer with base-point x. $\tau_y\Sigma$: the collection of group elements which map the origin u to y. Each pointer represents a unique group element.
Definition 3.1. A mapping $\psi : \mathcal{P}(T) \to \mathcal{P}(T)$ is called left T-invariant (or left-invariant) when, for all $g \in T$, $\psi(gG) = g\psi(G)$ for all $G \in \mathcal{P}(T)$. Similarly, a mapping $\psi : \mathcal{P}(T) \to \mathcal{P}(T)$ is called right T-invariant (or right-invariant) when, for all $g \in T$, $\psi(Gg) = (\psi(G))g$ for all $G \in \mathcal{P}(T)$.
Recall that by definition a dilation (erosion) on $\mathcal{P}(T)$ is a mapping commuting with unions (intersections).
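Proposition 3.2. Let H (the structuring element) be a fixed subset of T. Define
$$\delta^{\lambda}_H(G) := G \overset{T}{\oplus} H := \bigcup_{h \in H} Gh = \bigcup_{g \in G} gH,$$
$$\varepsilon^{\lambda}_H(G) := G \overset{\lambda}{\ominus} H := \bigcap_{h \in H} Gh^{-1} = \bigcap_{g \in G^c} g\hat{H},$$
where $\hat{H}$ is defined by $\hat{H} = (H^{-1})^c$. Then the mapping $\delta^{\lambda}_H$ defines a left T-invariant dilation on the lattice $\mathcal{P}(T)$, with adjoint erosion $\varepsilon^{\lambda}_H$. All left T-invariant adjunctions on $\mathcal{P}(T)$ are of this form.
Duality by complementation is expressed by the formula $(G \overset{T}{\oplus} H)^c = G^c \overset{\lambda}{\ominus} H^{-1}$. It is easy to show the following equalities, which provide a geometrical interpretation:
$$G \overset{T}{\oplus} H = \{k \in T : (kH^{-1}) \cap G \neq \emptyset\} = \{k \in T : (G^{-1}k) \cap H \neq \emptyset\},$$
$$G \overset{\lambda}{\ominus} H = \{g \in T : gH \subseteq G\}.$$
Remark 3.3. Because of the non-commutativity of the set product $G \overset{T}{\oplus} H$, one may also introduce a right-invariant dilation $\delta^{\rho}_H$ and erosion $\varepsilon^{\rho}_H$ by
$$\delta^{\rho}_H(G) := H \overset{T}{\oplus} G := \bigcup_{h \in H} hG = \bigcup_{g \in G} Hg,$$
$$\varepsilon^{\rho}_H(G) := G \overset{\rho}{\ominus} H := \bigcap_{h \in H} h^{-1}G.$$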
There is a connection to the theory of residuated lattices and ordered semigroups [25], which is explained in more detail in Ref. [14]. Only left-invariant dilations and erosions will be used in the remainder of this paper. From the properties of adjunctions (see Section 2.2) we know that we can build openings and closings by forming products of a dilation and an erosion. In particular, the mapping $\alpha^{\lambda}_H := \delta^{\lambda}_H \varepsilon^{\lambda}_H$ is an opening and the mapping $\phi^{\lambda}_H := \varepsilon^{\lambda}_H \delta^{\lambda}_H$ is a closing. Both mappings are left-invariant. As in the Euclidean case, there is a simple geometrical interpretation of these operations:
$$\alpha^{\lambda}_H(G) := (G \overset{\lambda}{\ominus} H) \overset{T}{\oplus} H = \bigcup\{gH : g \in T,\ gH \subseteq G\},$$
$$\phi^{\lambda}_H(G) := (G \overset{T}{\oplus} H) \overset{\lambda}{\ominus} H = \bigcap\{g\hat{H} : g \in T,\ g\hat{H} \supseteq G\}.$$
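Proposition 3.2 and the geometrical interpretations above are straightforward to compute for finite subsets of a discrete group. The sketch below does this for the hexagonal-grid motion group, encoded (an assumption of the sketch) as pairs (h, k) standing for $\tau_h r^k$ in axial grid coordinates, with r the 60° rotation about the origin. It implements $G \overset{T}{\oplus} H$, $G \overset{\lambda}{\ominus} H = \{g : gH \subseteq G\}$ and $\alpha^{\lambda}_H$, and the test exercises left invariance and the failure of right invariance; all function names are hypothetical.

```python
# Sketch: left-invariant dilation, erosion and opening (Proposition 3.2) on
# finite subsets of the discrete hexagonal motion group. A motion is a pair
# (h, k) = tau_h r^k with h in axial grid coordinates (assumed encoding).

def rot60(x, k=1):
    p, q = x
    for _ in range(k % 6):
        p, q = -q, p + q
    return (p, q)

def compose(g1, g2):
    """Group product g1 g2 = (h1 + r^k1 h2, k1 + k2 mod 6)."""
    (h1, k1), (h2, k2) = g1, g2
    p, q = rot60(h2, k1)
    return ((h1[0] + p, h1[1] + q), (k1 + k2) % 6)

def inverse(g):
    h, k = g
    p, q = rot60(h, -k)
    return ((-p, -q), (-k) % 6)

def dilate(G, H):
    """G (+) H: the union of the left products gH, g in G."""
    return frozenset(compose(g, h) for g in G for h in H)

def erode(G, H):
    """G (-)^lambda H = {k : kH is a subset of G}. Any such k equals
    g h0^{-1} for some g in G and a fixed h0 in H (H assumed non-empty)."""
    h0_inv = inverse(next(iter(H)))
    candidates = {compose(g, h0_inv) for g in G}
    return frozenset(k for k in candidates if all(compose(k, h) in G for h in H))

def opening(G, H):
    """alpha^lambda_H = delta^lambda_H epsilon^lambda_H."""
    return dilate(erode(G, H), H)

def left(a, G):
    return frozenset(compose(a, g) for g in G)

def right(G, a):
    return frozenset(compose(g, a) for g in G)

if __name__ == "__main__":
    e = ((0, 0), 0)
    H = frozenset({e, ((1, 0), 0)})    # two horizontal pointers at neighbouring points
    G = frozenset({e, ((1, 0), 0), ((2, 0), 1)})
    print(sorted(opening(G, H)))  # [((0, 0), 0), ((1, 0), 0)]
```

Here the third pointer of G is removed by the opening because no left translate gH containing it lies inside G.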
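In Fig. 2 we give an example of elementary T-operators for the case of the motion group (T = M). A special role is played by the dilation $\tilde{\delta}^{\lambda}_\Sigma$ and erosion $\tilde{\varepsilon}^{\lambda}_\Sigma$ by the subgroup $\Sigma$:
$$\tilde{\delta}^{\lambda}_\Sigma(G) = G \overset{T}{\oplus} \Sigma, \qquad \tilde{\varepsilon}^{\lambda}_\Sigma(G) = G \overset{\lambda}{\ominus} \Sigma.$$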
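The following lemma was proved in Ref. [13].
Lemma 3.4. The adjunction $(\tilde{\varepsilon}^{\lambda}_\Sigma, \tilde{\delta}^{\lambda}_\Sigma)$ satisfies (a) $\tilde{\varepsilon}^{\lambda}_\Sigma = \tilde{\varepsilon}^{\lambda}_\Sigma \tilde{\varepsilon}^{\lambda}_\Sigma = \tilde{\delta}^{\lambda}_\Sigma \tilde{\varepsilon}^{\lambda}_\Sigma$, (b) $\tilde{\delta}^{\lambda}_\Sigma = \tilde{\delta}^{\lambda}_\Sigma \tilde{\delta}^{\lambda}_\Sigma = \tilde{\varepsilon}^{\lambda}_\Sigma \tilde{\delta}^{\lambda}_\Sigma$.
This lemma says that $\tilde{\varepsilon}^{\lambda}_\Sigma$ is not only an erosion but also an opening, and $\tilde{\delta}^{\lambda}_\Sigma$ is not only a dilation but also a closing. The effect of the closing $\tilde{\delta}^{\lambda}_\Sigma$ on a subset G of T is to make G '$\Sigma$-closed', i.e. invariant under right multiplication by $\Sigma$. For the case of the motion group, where $\Sigma = R$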
Fig. 2. Morphological operations on the motion group M: (a) set G, structuring element H; (b) dilation of G by H; (c) erosion of G by H; (d) opening of G by H; (e) closing of G by H.
(cf. Example 2.10), any pointer $\tau_x r$ with $r \in R$ is extended by $\tilde{\delta}^{\lambda}_\Sigma$ to the set of pointers $\tau_x\Sigma$; see Fig. 3. Similarly, the opening $\tilde{\varepsilon}^{\lambda}_\Sigma$ extracts all the cosets (i.e. subsets of the form $\tau_x\Sigma$) from a subset G of T.
3.2. Boolean lattices with a transitive group action
This subsection summarizes the results obtained in Refs. [13,14] for the Boolean lattice $\mathcal{P}(E)$, with T acting transitively on E, and presents an application to invariant feature extraction.
3.2.1. Lift and projection operators
Definition 3.5. Let the 'origin' u be an arbitrary point of E. The lift $\vartheta : \mathcal{P}(E) \to \mathcal{P}(T)$ and projection $\pi : \mathcal{P}(T) \to \mathcal{P}(E)$ are defined by
0(X)"Mg3T : gu3XN, X3P(E), p(G)"Mgu : g3GN, G3P(T).
(9) (10)
The mapping $\vartheta$ associates to each subset $X$ all group elements which map the origin $\omega$ to an element of $X$. The mapping $\pi$ associates to each subset $G$ of $\mathcal T$ the collection of all points $g\omega$ where $g$ ranges over $G$. In the graphical representation, $\pi$ maps $G$ to the set of base-points of the pointers in $G$ (Fig. 4(a)). Conversely, $\vartheta$ maps a subset $X$ of $E$ to the set of pointers in $\mathcal T$ which have their base-points in $X$ (Fig. 4(b)).

Definition 3.6. Let $\pi$ be the projection (10) and $\tilde\varepsilon^\lambda_\Sigma$ the erosion $\tilde\varepsilon^\lambda_\Sigma(G) = G\overset{\lambda}{\ominus}\Sigma$. Then $\pi_\Sigma : P(\mathcal T)\to P(E)$ is the modified projection defined by $\pi_\Sigma = \pi\tilde\varepsilon^\lambda_\Sigma$.

The projection $\pi_\Sigma$ first extracts the cosets $\tau_x\Sigma$ and then carries out the projection $\pi$ (Fig. 4(c)). The operators $\vartheta$, $\pi$ and $\pi_\Sigma$ have several useful properties [14]. The most important ones are given in the next proposition (cf. Fig. 5). Recall that a mapping $\psi : P(E)\to P(E)$ is called $\mathcal T$-invariant, or a $\mathcal T$-mapping, if $\psi(gX) = g\psi(X)$ for all $X\in P(E)$, $g\in\mathcal T$.

Proposition 3.7.
(a) $\pi$, $\vartheta$, $\pi_\Sigma$ are increasing and $\mathcal T$-invariant;
(b) $\vartheta$ and $\pi$ commute with unions; $\vartheta$ and $\pi_\Sigma$ commute with intersections;
(c) $\pi\vartheta = \mathrm{id}_{P(E)}$, $\pi_\Sigma\vartheta = \mathrm{id}_{P(E)}$;
(d) $X\subseteq Y \iff \vartheta(X)\subseteq\vartheta(Y)$;
(e) $(\vartheta, \pi)$ forms an adjunction between $P(E)$ and $P(\mathcal T)$;
(f) $(\pi_\Sigma, \vartheta)$ forms an adjunction between $P(\mathcal T)$ and $P(E)$.

Fig. 3. Action of the erosion and dilation by $\Sigma$ on a subset of $\mathcal M$.

Fig. 4. (a) Action of $\pi$ on a subset of $\mathcal M$. (b) Action of $\vartheta$ on a subset of $E$. (c) Action of $\pi_\Sigma$ on a subset of $\mathcal M$.

Fig. 5. Illustration of properties of $\pi$, $\vartheta$, $\pi_\Sigma$: (a) $\pi\vartheta = \pi_\Sigma\vartheta = \mathrm{id}_{P(E)}$. (b) $\vartheta\pi = \tilde\delta^\lambda_\Sigma$. (c) $\vartheta\pi_\Sigma = \tilde\varepsilon^\lambda_\Sigma$.
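A minimal sketch of Definitions 3.5–3.6 and Proposition 3.7, assuming a concrete finite setting that is not taken from the paper: the dihedral group $D_5$ acting on the five vertices $E = \mathbb Z_5$ of a pentagon, with origin $\omega = 0$, so that the stabilizer $\Sigma$ consists of the identity and one reflection. Group elements are encoded as pairs `(k, s)` acting by $i\mapsto si+k$; all function names are illustrative.

```python
from itertools import combinations

N = 5  # hypothetical: the dihedral group D_5 acting on the vertices Z_5 of a pentagon

def act(g, i):
    # g = (k, s) maps vertex i to s*i + k (mod N), with s = +1 (rotation) or -1 (reflection)
    k, s = g
    return (s * i + k) % N

def mul(g, h):
    # Group product chosen so that act(mul(g, h), i) == act(g, act(h, i))
    (k1, s1), (k2, s2) = g, h
    return ((s1 * k2 + k1) % N, s1 * s2)

GROUP = [(k, s) for k in range(N) for s in (1, -1)]
E = list(range(N))
OMEGA = 0                                              # the 'origin' ω
SIGMA = {g for g in GROUP if act(g, OMEGA) == OMEGA}   # stabilizer Σ of ω

def lift(X):
    # ϑ(X) = {g : gω ∈ X}, Eq. (9)
    return {g for g in GROUP if act(g, OMEGA) in X}

def project(G):
    # π(G) = {gω : g ∈ G}, Eq. (10)
    return {act(g, OMEGA) for g in G}

def dilate_sigma(G):
    # G ⊕ Σ: right multiplication by the stabilizer
    return {mul(g, s) for g in G for s in SIGMA}

def erode_sigma(G):
    # G ⊖ Σ = {g : gΣ ⊆ G}
    return {g for g in GROUP if all(mul(g, s) in G for s in SIGMA)}

def project_sigma(G):
    # Modified projection π_Σ = π ε̃_Σ (Definition 3.6)
    return project(erode_sigma(G))

def all_subsets(elements):
    elements = list(elements)
    return [set(c) for r in range(len(elements) + 1) for c in combinations(elements, r)]
```

With this encoding, `project(lift({1, 3}))` returns `{1, 3}` (Proposition 3.7(c)), while `lift(project(G))` equals `dilate_sigma(G)` for every subset `G` of the group (Fig. 5(b)).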
3.2.2. Construction of $\mathcal T$-invariant operators

$\mathcal T$-invariant operators can be constructed as follows [13,14]. Given a mapping $\psi$ on $P(E)$, we 'lift' it to a mapping $\tilde\psi$ on $P(\mathcal T)$. Then we apply the results of Section 3.1 on $P(\mathcal T)$ and finally 'project' the results back to $P(E)$, see Fig. 6 (left diagram).

Remark 3.8. A first idea to generalize the Minkowski operations is to take a subset $G$ of the group $\mathcal T$ (the 'structuring element') and let it act on a subset $X$ of $E$ by defining $GX := \bigcup_{g\in G} gX$. This was applied, for example, in Ref. [26] for the case of the affine group. However, this mapping is in general not $\mathcal T$-invariant. For, let $g_0\in\mathcal T$ be arbitrary. Then $G(g_0X) = \bigcup_{g\in G} gg_0X$. If we could interchange $g_0$ with $g$, the result would be $\bigcup_{g\in G} g_0gX = g_0GX$, implying group invariance. But this interchange is not allowed if $\mathcal T$ is a non-commutative group such as the affine group.

Definition 3.9. Let $\mathcal T$ be a group acting on $E$, with $\Sigma$ the stabilizer of the origin $\omega$ in $E$. A subset $X$ of $E$ is called $\Sigma$-invariant if $X = \bar X$, where $\bar X := \Sigma X = \bigcup_{s\in\Sigma} sX$ is the $\Sigma$-invariant extension of $X$.

Proposition 3.10 (Representation of dilations and erosions). The pair $(\varepsilon, \delta)$ is a $\mathcal T$-adjunction on $P(E)$ if and only if, for some $Y\in P(E)$, $(\varepsilon, \delta) = (\varepsilon^{\mathcal T}_Y, \delta^{\mathcal T}_Y)$, where

$$\delta^{\mathcal T}_Y(X) := \pi\big[\vartheta(X)\overset{\mathcal T}{\oplus}\vartheta(Y)\big] = \bigcup_{g\in\vartheta(X)} gY,$$
$$\varepsilon^{\mathcal T}_Y(X) := \pi_\Sigma\big[\vartheta(X)\overset{\lambda}{\ominus}\vartheta(Y)\big] = \bigcap_{g\in\vartheta(X^c)} g\hat Y,$$

with $\hat Y := \big(\pi(\vartheta(Y)^{-1})\big)^c$, where $\vartheta(Y)^{-1} = \{g^{-1} : g\in\vartheta(Y)\}$. In particular, $(\varepsilon^{\mathcal T}_Y, \delta^{\mathcal T}_Y)$ is invariant under the substitution $Y\to\bar Y$.

This proposition says that any $\mathcal T$-dilation on $P(E)$ can be reduced to a dilation $\delta^{\mathcal T}_Y$ involving a $\Sigma$-invariant structuring element $Y$; a similar statement holds for $\mathcal T$-erosions. A graphical illustration for the motion group is given in Fig. 7, where the underlined point in the structuring element denotes the origin.

Next we consider openings and closings.

Definition 3.11. The structural $\mathcal T$-opening $\alpha^{\mathcal T}_Y(X)$ and $\mathcal T$-closing $\phi^{\mathcal T}_Y(X)$ with structuring element $Y\subseteq E$ are defined by

$$\alpha^{\mathcal T}_Y(X) = \bigcup\{gY : g\in\mathcal T,\ gY\subseteq X\}, \tag{11}$$
$$\phi^{\mathcal T}_Y(X) = \bigcap\{gY : g\in\mathcal T,\ gY\supseteq X\}. \tag{12}$$

In words, $\alpha^{\mathcal T}_Y(X)$ is the union of all translates $gY$ which are included in $X$.

An important consequence of the above proposition is that the adjunctional opening $\delta^{\mathcal T}_Y\varepsilon^{\mathcal T}_Y$ and closing $\varepsilon^{\mathcal T}_Y\delta^{\mathcal T}_Y$ are invariant under the substitution $Y\to\bar Y$ as well.

Example 3.12. Let $X$ be a union of line segments of varying sizes in the plane and $Y$ a line segment of size $L$ with center at the origin. Let the acting group $\mathcal T$ equal the translation-rotation group $\mathcal M$. Then $\alpha^{\mathcal M}_Y(X)$ consists of the union of all segments in $X$ of size $L$ or larger, but $\delta^{\mathcal M}_Y\varepsilon^{\mathcal M}_Y(X) = \alpha^{\mathcal M}_{\bar Y}(X) = \emptyset$, since $\bar Y = \mathcal R Y$ is a disc of radius $L/2$, which does not fit anywhere in $X$, cf. Fig. 8.

So in general we cannot build the opening $\alpha^{\mathcal T}_Y$ from a $\mathcal T$-erosion $\varepsilon^{\mathcal T}_Y$ on $P(E)$ followed by a $\mathcal T$-dilation $\delta^{\mathcal T}_Y$ on $P(E)$, in contrast to the classical case of the translation group ($\mathcal T = T$), cf. Section 2.1. However, if erosions and dilations between the distinct lattices $P(E)$ and $P(\mathcal T)$ are allowed, openings and closings can be decomposed into products of an erosion and a dilation (this is in agreement with a general result in Ref. [25, Theorem 2.7], see also Ref. [7, Section 6.3]).

Proposition 3.13 (Decomposition of structural $\mathcal T$-openings). The structural $\mathcal T$-opening defined by Eq. (11) is the projection of the opening $\tilde\alpha_{\vartheta(Y)} = \tilde\delta^\lambda_{\vartheta(Y)}\tilde\varepsilon^\lambda_{\vartheta(Y)}$, with $(\tilde\varepsilon^\lambda_{\vartheta(Y)}, \tilde\delta^\lambda_{\vartheta(Y)})$ the left-invariant adjunction on $P(\mathcal T)$ with structuring element $\vartheta(Y)$, i.e.

$$\alpha^{\mathcal T}_Y(X) = \big(\pi\tilde\delta^\lambda_{\vartheta(Y)}\tilde\varepsilon^\lambda_{\vartheta(Y)}\vartheta\big)(X) = \pi\Big(\big(\vartheta(X)\overset{\lambda}{\ominus}\vartheta(Y)\big)\overset{\mathcal T}{\oplus}\vartheta(Y)\Big).$$

Fig. 6. Left: relations between mappings on $P(E)$ and $P(\mathcal T)$. Right: relations between mappings on $L$, $P(\ell)$ and $P(\mathcal T)$.

Fig. 7. Construction of an $\mathcal M$-invariant dilation. (a) Set $X$, structuring element $Y$; (b) sets $\vartheta(X)$ and $\vartheta(Y)$ of pointers; (c) set product $\vartheta(X)\overset{\mathcal M}{\oplus}\vartheta(Y)$; (d) corresponding set $\pi[\vartheta(X)\overset{\mathcal M}{\oplus}\vartheta(Y)]$ of base points.

Fig. 8. (a) $X$: a subset of the hexagonal grid consisting of 'line segments'; set within the rectangle: structuring element $Y$. (b) Erosion $\varepsilon^{\mathcal M}_Y(X)$ by $Y$. (c) Dilation $\delta^{\mathcal M}_Y$ applied to the result in (b). The opening $\delta^{\mathcal M}_Y\varepsilon^{\mathcal M}_Y(X) = \alpha^{\mathcal M}_{\bar Y}(X)$ is empty.
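Proposition 3.10, Example 3.12 and Proposition 3.13 can be illustrated with a discrete stand-in for the motion group. The sketch below assumes (this is not the paper's setting) an $8\times8$ torus $\mathbb Z_8^2$ on which $\mathcal M$ is replaced by translations combined with quarter-turn rotations about the origin, so that $\Sigma$ is the four-element rotation group and $\bar Y$ of a three-pixel segment is a five-pixel cross. All names are hypothetical. It compares the structural opening (11) by translations only and by motions, evaluates the same opening through the lift as in Proposition 3.13, and shows that the adjunctional opening $\delta^{\mathcal M}_Y\varepsilon^{\mathcal M}_Y$ removes everything, as in Example 3.12.

```python
N = 8  # hypothetical 8x8 torus, so that the (discrete) motion group is finite

def rot(p, k):
    # Quarter-turn rotation about the origin, applied k times, mod N
    x, y = p
    for _ in range(k % 4):
        x, y = -y, x
    return (x % N, y % N)

def act(g, p):
    # g = (t, k) maps p to r^k p + t
    t, k = g
    x, y = rot(p, k)
    return ((x + t[0]) % N, (y + t[1]) % N)

def mul(g, h):
    # (t1, k1)(t2, k2) = (r^{k1} t2 + t1, k1 + k2): semi-direct product rule
    (t1, k1), (t2, k2) = g, h
    x, y = rot(t2, k1)
    return (((x + t1[0]) % N, (y + t1[1]) % N), (k1 + k2) % 4)

POINTS = [(x, y) for x in range(N) for y in range(N)]
MOTIONS = [(t, k) for t in POINTS for k in range(4)]
TRANSLATIONS = [(t, 0) for t in POINTS]
ORIGIN = (0, 0)

def transform(g, Y):
    return {act(g, p) for p in Y}

def structural_opening(X, Y, group):
    # Eq. (11): union of all gY (g in group) included in X
    out = set()
    for g in group:
        gY = transform(g, Y)
        if gY <= X:
            out |= gY
    return out

def lift(X):
    return {g for g in MOTIONS if act(g, ORIGIN) in X}

def project(G):
    return {act(g, ORIGIN) for g in G}

def group_dilate(G, H):
    return {mul(g, h) for g in G for h in H}

def group_erode(G, H):
    return {g for g in MOTIONS if all(mul(g, h) in G for h in H)}

def opening_via_lift(X, Y):
    # Proposition 3.13: π((ϑ(X) ⊖ ϑ(Y)) ⊕ ϑ(Y))
    LY = lift(Y)
    return project(group_dilate(group_erode(lift(X), LY), LY))

def m_dilation(X, Y):
    # Proposition 3.10: π[ϑ(X) ⊕ ϑ(Y)] = union of gY over g in ϑ(X)
    return project(group_dilate(lift(X), lift(Y)))

def m_erosion(X, Y):
    # Adjoint of m_dilation, by brute force: points p with δ({p}) ⊆ X
    return {p for p in POINTS if m_dilation({p}, Y) <= X}

Y_SEGMENT = {(N - 1, 0), (0, 0), (1, 0)}   # horizontal segment centred at the origin
HORIZONTAL = {(1, 1), (2, 1), (3, 1)}
VERTICAL = {(5, 3), (5, 4), (5, 5)}
X_EXAMPLE = HORIZONTAL | VERTICAL
```

The erosion is obtained by brute force as the adjoint of the dilation, which is enough for a grid this small. With quarter turns only, $\bar Y$ is a cross rather than the disc of Example 3.12, but the conclusion is the same: the motion opening keeps both segments, the translation opening keeps only the horizontal one, and $\delta^{\mathcal M}_Y\varepsilon^{\mathcal M}_Y(X)$ is empty.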
So, $\alpha^{\mathcal T}_Y$ is the product of a $\mathcal T$-erosion $\varepsilon^\uparrow_Y : P(E)\to P(\mathcal T)$ followed by a $\mathcal T$-dilation $\delta^\downarrow_Y : P(\mathcal T)\to P(E)$, where $(\varepsilon^\uparrow_Y, \delta^\downarrow_Y) := (\tilde\varepsilon^\lambda_{\vartheta(Y)}\vartheta,\ \pi\tilde\delta^\lambda_{\vartheta(Y)})$ is a $\mathcal T$-adjunction between $P(E)$ and $P(\mathcal T)$. A similar representation holds for structural $\mathcal T$-closings [14]. By a general result from Ref. [11], every $\mathcal T$-opening on $P(E)$ is a union of structural $\mathcal T$-openings $\alpha^{\mathcal T}_Y$, where $Y$ ranges over a subset $\mathcal Y\subseteq P(E)$. Combining this with Proposition 3.13, we can therefore decompose any $\mathcal T$-opening into $\mathcal T$-openings of the form $\pi\tilde\delta^\lambda_{\vartheta(Y)}\tilde\varepsilon^\lambda_{\vartheta(Y)}\vartheta$.

3.2.3. Example: A motion-invariant median filter

Consider the Boolean lattice $L = P(\mathbb Z^2)$. Let $Y$ be a structuring element containing an odd number $N_Y$ of points. A point $x$ of a subset $X$ is retained by the median filter if the intersection of $X$ and the translated set $\tau_xY$ contains at least $(N_Y+1)/2$ points; otherwise the point $x$ disappears. Define a rotation-invariant median filter by allowing rotations of $Y$ around $x$ to get an
intersection containing the required number of points. That is, the intersection of $X$ and the set $\tau_x r_\phi Y$ should contain at least $(N_Y+1)/2$ points for some angle $\phi$. This generalized median filter gives the same result as the original median filter if $Y$ is rotation-invariant. Therefore, we give in Fig. 9 an example ($N_Y = 3$) with a structuring element which is not rotation-invariant, and compare the result of the rotation-invariant median filter with that of the classical median filter. As is well known, one can often replace kernels with an infinite number of elements by a finite set of so-called basis elements [27]. As an illustration we give in Fig. 10 a decomposition of the $\mathcal M$-invariant median filter into a set of nine $\mathcal M$-erosions (the structuring element of each erosion is indicated). Notice that even this set of nine erosions is redundant.

3.2.4. Example: invariant feature extraction

In computer vision one requires invariance under various groups, such as the Euclidean motion group, the similarity group, the affine group or the projective group [18]. When the group is enlarged, one gradually recovers the various geometric shapes present in the image. The following example is taken from Ref. [28]. Consider Fig. 11, showing a figure containing a number of quadrangles. As the image transformation we take the opening $\alpha^{\mathcal T}_Y$, where the structuring element $Y$ is a square (without interior). This extracts from the input image all structures which are 'similar' to the square, where 'similar' means: obtainable from the square by a certain group operation. When $\mathcal T = T$ (translation group), the opening extracts all translates of the square, see Fig. 11(b). When $\mathcal T$ is the motion group, the opening extracts all translated and rotated versions of the square, see Fig. 11(c). When $\mathcal T$ is the similarity group, scaled copies of the square are also extracted, see Fig. 11(d). When $\mathcal T$ is the affine group, the opening extracts all parallelograms from the image, see Fig. 11(e). When $\mathcal T$ is the projective group, the opening extracts all quadrangles from the image (i.e., the original image), see Fig. 11(f). So morphological operations for feature extraction can be adapted to the type of geometric invariance which is deemed appropriate for the application under consideration.
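The rotation-invariant median filter of Section 3.2.3 can be sketched as follows. Assumptions not in the text: rotations are restricted to quarter turns on $\mathbb Z^2$ (so the sketch is only invariant under that subgroup of the rotations), the majority test is evaluated at every pixel of a finite window `DOMAIN`, which is the usual convention for binary median filters, and the example sets are hypothetical rather than those of Fig. 9.

```python
def rotate(p, k):
    # Rotate a point of Z^2 by k quarter turns about the origin
    x, y = p
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def median_filter(X, Y, domain, rotations=(0,)):
    # Keep pixel p if, for some allowed rotation r, |X ∩ τ_p r Y| >= (N_Y + 1) / 2.
    # rotations=(0,) gives the classical filter; range(4) the quarter-turn-invariant one.
    if len(Y) % 2 == 0:
        raise ValueError("structuring element must contain an odd number of points")
    need = (len(Y) + 1) // 2
    out = set()
    for (px, py) in domain:
        for k in rotations:
            hits = sum((px + qx, py + qy) in X for (qx, qy) in (rotate(q, k) for q in Y))
            if hits >= need:
                out.add((px, py))
                break
    return out

DOMAIN = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]  # rotation-invariant window
Y_EXAMPLE = {(-1, 0), (0, 0), (1, 0)}        # N_Y = 3, not rotation-invariant
X_EXAMPLE = {(0, 0), (0, 1), (0, 2), (0, 3)}  # a vertical line
classical = median_filter(X_EXAMPLE, Y_EXAMPLE, DOMAIN)
invariant = median_filter(X_EXAMPLE, Y_EXAMPLE, DOMAIN, rotations=range(4))
```

Here the classical filter with a horizontal three-point $Y$ erases the vertical line completely, whereas the rotation-invariant filter keeps it; rotating the input by a quarter turn rotates the invariant filter's output, which the classical filter does not do.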
4. Group morphology for non-Boolean lattices

Now we will extend the results of the previous section to non-Boolean lattices. It turns out that in general only part of the results carry over to the non-Boolean case. If the group $\mathcal T$ equals the motion group $\mathcal M$, or when the lattice has both a sup-generating family $\ell$ and an inf-generating family $\ell'$, additional characterizations, e.g. of adjunctions, are obtainable, see Section 4.5. As a special case we consider $\mathcal M$-operators on the lattice of grey value functions (Section 4.6).

4.1. Simple transitivity on a sup-generating family

We start by recalling some results obtained by Heijmans and Ronse [10,11], see also Ref. [7]. Let $L$ be a complete lattice with an abelian automorphism group $T$ and a sup-generating subset $\ell$ (cf. Section 2.2.3) such that:
(i) $\ell$ is $T$-invariant, i.e., for every $\tau\in T$ and $x\in\ell$, $\tau x\in\ell$;
(ii) $T$ is transitive on $\ell$: for every $x, y\in\ell$ there exists $\tau\in T$ such that $\tau x = y$ (since $T$ is abelian, this $\tau$ is unique).

Given a fixed element $\omega$ of $\ell$, $\tau_x$ is the unique element of $T$ which maps $\omega$ to $x$. This enables us to define a binary addition $+$ on $\ell$ by $x + y = \tau_x\tau_y\omega$, with $-y = \tau_y^{-1}\omega$. Now define binary operations $\oplus$ and $\ominus$ on $L$ by

$$X\oplus Y = \bigvee_{y\in\ell(Y)}\tau_yX = \bigvee\{x+y : x\in\ell(X),\ y\in\ell(Y)\}, \tag{13}$$
$$X\ominus Y = \bigwedge_{y\in\ell(Y)}\tau_y^{-1}X = \bigvee\{z\in\ell : \tau_zY\le X\}. \tag{14}$$

Proposition 4.1. For any $Y\in L$, the pair $(\varepsilon^T_Y, \delta^T_Y)$ with $\delta^T_Y(X) = X\oplus Y$, $\varepsilon^T_Y(X) = X\ominus Y$, is a $T$-adjunction. Every $T$-adjunction has this form.

4.2. Transitivity on a sup-generating family

To extend the results of Section 4.1 to non-Boolean lattices with a non-abelian automorphism group, we relax the requirement made in Section 4.1 that $T$ is abelian.
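For grey-scale functions (Section 4.6), Eqs. (13)–(14) reduce to $(F\oplus G)(x) = \bigvee_h F(x-h)+G(h)$ and $(F\ominus G)(x) = \bigwedge_h F(x+h)-G(h)$. The sketch below checks the adjunction of Proposition 4.1 exhaustively, assuming a one-dimensional cyclic domain $\mathbb Z_4$, integer grey values in $\{0,1,2\}$, and a structuring function with finite support (equal to $-\infty$ elsewhere); these choices and the names are illustrative only.

```python
from itertools import product

N = 4                      # hypothetical cyclic domain Z_4
G_EXAMPLE = {0: 0, 1: 1}   # structuring function: G(0) = 0, G(1) = 1, G = -inf elsewhere

def dilate(F, G):
    # (F ⊕ G)(x) = max over h in supp(G) of F(x - h) + G(h)
    return tuple(max(F[(x - h) % N] + g for h, g in G.items()) for x in range(N))

def erode(F, G):
    # (F ⊖ G)(x) = min over h in supp(G) of F(x + h) - G(h)
    return tuple(min(F[(x + h) % N] - g for h, g in G.items()) for x in range(N))

def leq(F, H):
    # Pointwise order of the function lattice
    return all(a <= b for a, b in zip(F, H))

def translate(F, h, v):
    # τ_{h,v}: shift the graph of F by h along the domain and by v along the grey axis
    return tuple(F[(x - h) % N] + v for x in range(N))

FUNCTIONS = list(product(range(3), repeat=N))  # all functions Z_4 -> {0, 1, 2}
```

For instance, `dilate((0, 0, 2, 0), G_EXAMPLE)` gives `(1, 1, 2, 3)`. Translating the graph of $F$ by $\tau_{h,v}$ before dilating gives the same result as translating afterwards, i.e. the dilation is a $T$-mapping for this abelian group.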
Fig. 9. Median filtering: (a) set X, structuring element Y; (b) result of $\mathcal M$-invariant filter; (c) result of $T$-invariant filter.

Fig. 10. Decomposition of the $\mathcal M$-invariant median filter of Fig. 9 into a set of nine $\mathcal M$-erosions. The structuring element of each erosion is indicated within a rectangular box.

Fig. 11. Opening of the quadrangle image X shown in (a) by a square structuring element Y, using as acting group: (b) translation group; (c) motion group; (d) similarity group; (e) affine group; (f) projective group.

Basic Assumption. Let $L$ be a complete lattice with an automorphism group $\mathcal T$ and a sup-generating subset $\ell$ such that:
(i) $\ell$ is $\mathcal T$-invariant, i.e., for every $\tau\in\mathcal T$ and $x\in\ell$, $\tau x\in\ell$;
(ii) $\mathcal T$ is transitive on $\ell$: for every $x, y\in\ell$ there exists at least one $\tau\in\mathcal T$ such that $\tau x = y$.

Various operators can be constructed using an extension of the 'lifting' procedure described in Section 3. This is
based upon the observation that the pair $(\ell, \bigvee)$ forms an adjunction between $L$ and $P(\ell)$, with $\bigvee\ell = \mathrm{id}_L$, just as the pair $(\vartheta, \pi)$ forms an adjunction between $P(\ell)$ and $P(\mathcal T)$, with $\pi\vartheta = \mathrm{id}_{P(\ell)}$. Given a mapping $\psi$ on $L$, we lift it to a mapping $\Psi$ on $P(\mathcal T)$ as follows. First we go from $L$ to $P(\ell)$ by using the operator $\ell$. Then we move from $P(\ell)$ to $P(\mathcal T)$ by applying the operator $\vartheta$. Then we apply the results of Section 3.1 on $P(\mathcal T)$ and finally project the results back, first to $P(\ell)$ by using $\pi$, then to $L$ by applying the operator $\bigvee$. The procedure is illustrated in Fig. 6 (right diagram). Below we illustrate this approach by developing representations for openings and general increasing $\mathcal T$-operators.

For an operator $\psi : L\to L$ we define corresponding operators $\tilde\psi$ on $P(\ell)$ and $\Psi$ on $P(\mathcal T)$ by

$$\tilde\psi = \ell\psi\textstyle\bigvee, \qquad \Psi = \vartheta\tilde\psi\pi = \vartheta\ell\psi\textstyle\bigvee\pi. \tag{15}$$

Using Proposition 3.7(c) and Eq. (4), $\psi$ and $\tilde\psi$ can be recovered by

$$\tilde\psi = \pi\Psi\vartheta, \qquad \psi = \textstyle\bigvee\tilde\psi\ell = \bigvee\pi\Psi\vartheta\ell. \tag{16}$$

The next lemmas give us the necessary tools to derive properties of certain mappings on $L$ from those on $P(\mathcal T)$. These lemmas are generalizations of results for Boolean lattices [13,14]. In the latter case, results for adjunctions and closings also hold, which in general are no longer valid in the non-Boolean case (cf. Remark 4.4).

Lemma 4.2. Let $\psi$ be an operator on $L$, and let $\Psi$ be given by Eq. (15). Then:
(a) If $\psi$ is an increasing $\mathcal T$-mapping, then $\Psi$ is an increasing $\mathcal T$-mapping.
(b) If $\psi$ is a closing, then $\Psi$ is a closing.

Proof. (a) Obvious, since $\vartheta$, $\ell$, $\bigvee$, $\pi$ are all increasing $\mathcal T$-operators. (b) From (a), $\Psi$ is increasing, since $\psi$, being a closing, is increasing. Also, $\psi\ge\mathrm{id}_L$, so

$$\Psi \ge \vartheta\ell\textstyle\bigvee\pi \ge \vartheta\pi \ge \mathrm{id}_{P(\mathcal T)},$$

because both $\ell\bigvee$ and $\vartheta\pi$ are closings, hence extensive. Finally,

$$\Psi^2 = \vartheta\ell\psi\textstyle\bigvee\pi\vartheta\ell\psi\bigvee\pi = \vartheta\ell\psi\psi\bigvee\pi = \vartheta\ell\psi\bigvee\pi = \Psi,$$

where we used that $\pi\vartheta = \mathrm{id}_{P(\ell)}$, $\bigvee\ell = \mathrm{id}_L$, and $\psi^2 = \psi$. So $\Psi$ is increasing, extensive and idempotent, hence a closing. □
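The adjunction $(\ell, \bigvee)$ that drives the lifting procedure can be checked directly for grey-scale functions built from the impulse functions (25) of Section 4.6. The sketch assumes a three-point domain and grey levels $\{0,1,2\}$, with $0$ playing the role of $-\infty$, so impulses at the least grey level are left out; these are illustrative choices, not the paper's. It confirms $\bigvee\ell = \mathrm{id}_L$, $\bigvee S\le F \iff S\subseteq\ell(F)$, and that $\ell\bigvee$ is extensive and idempotent, i.e. a closing, as used in the proof of Lemma 4.2.

```python
from itertools import combinations, product

N = 3  # hypothetical domain {0, 1, 2}
K = 2  # hypothetical grey levels {0, 1, 2}; 0 plays the role of the bottom (-inf)

IMPULSES = [(x, t) for x in range(N) for t in range(1, K + 1)]  # f_{x,t} with t above bottom

def ell(F):
    # ℓ(F): the impulses below F, since f_{x,t} <= F  <=>  t <= F(x)
    return frozenset((x, t) for (x, t) in IMPULSES if t <= F[x])

def sup(S):
    # ⋁S: pointwise maximum of a set of impulses (the empty set gives the bottom function)
    F = [0] * N
    for x, t in S:
        F[x] = max(F[x], t)
    return tuple(F)

def leq(F, H):
    return all(a <= b for a, b in zip(F, H))

FUNCTIONS = list(product(range(K + 1), repeat=N))
IMPULSE_SETS = [frozenset(c) for r in range(len(IMPULSES) + 1) for c in combinations(IMPULSES, r)]
```

For example, `ell((0, 2, 1))` is the set of impulses `{(1, 1), (1, 2), (2, 1)}`, and taking `sup` of it gives back `(0, 2, 1)`.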
Conversely, with an operator $\Psi$ on $P(\mathcal T)$ one can associate an operator $\psi$ on $L$ by

$$\psi = \textstyle\bigvee\pi\Psi\vartheta\ell. \tag{17}$$

Notice that now $\Psi$ cannot be recovered from $\psi$. However, we have:

Lemma 4.3. Let $\Psi$ be an operator on $P(\mathcal T)$, and let $\psi$ be given by Eq. (17).
(a) If $\Psi$ is an increasing $\mathcal T$-mapping, then $\psi$ is an increasing $\mathcal T$-mapping.
(b) If $\Psi$ is an opening, then $\psi$ is an opening.

Proof. (a) Obvious, since $\vartheta$, $\ell$, $\bigvee$, $\pi$ are all increasing $\mathcal T$-operators. (b) From (a), $\psi$ is increasing, since $\Psi$, being an opening, is increasing. Also, $\Psi\le\mathrm{id}_{P(\mathcal T)}$, so $\psi\le\bigvee\pi\vartheta\ell = \mathrm{id}_L$, since $\pi\vartheta = \mathrm{id}_{P(\ell)}$ and $\bigvee\ell = \mathrm{id}_L$; hence $\psi$ is anti-extensive. This also implies that $\psi^2\le\psi$. On the other hand, using that both $\ell\bigvee$ and $\vartheta\pi$ are closings, hence extensive, and the fact that $\Psi^2 = \Psi$, we find

$$\psi^2 = \textstyle\bigvee\pi\Psi\vartheta\ell\bigvee\pi\Psi\vartheta\ell \ge \bigvee\pi\Psi\Psi\vartheta\ell = \bigvee\pi\Psi\vartheta\ell = \psi.$$

So $\psi^2\le\psi$ and $\psi^2\ge\psi$, hence $\psi^2 = \psi$, which proves the idempotence of $\psi$. □

Remark 4.4. Note that $\vartheta$ is not only an erosion, but also a dilation from $P(\ell)$ to $P(\mathcal T)$ (cf. Section 3.2.1). However, $\ell$ is not a dilation from $L$ to $P(\ell)$. This obstructs the construction of dilations on $L$ using the lifting technique. For the special case that $\mathcal T$ is the Euclidean motion group or the affine group, we do in fact obtain a complete characterization of dilations using the results of Heijmans and Ronse [7,10], see Section 4.5. Another case occurs when $L$ has both a sup-generating family $\ell$ and an inf-generating family $\ell'$ on which $\mathcal T$ acts transitively. Then $(\bigwedge, \ell')$ is an adjunction between $P(\ell')$ and $L$, and any dilation $\delta$ on $L$ has the form

$$\delta(X) = \textstyle\bigvee\pi\big(\vartheta(\ell'(X))\overset{\mathcal T}{\oplus}G\big),$$

with adjoint erosion

$$\varepsilon(X) = \textstyle\bigwedge\pi_\Sigma\big(\vartheta(\ell(X))\overset{\lambda}{\ominus}G\big),$$

for some $G\in P(\mathcal T)$; cf. Fig. 12. An example is given by the lattice of grey-scale functions (see Section 4.6 below), where grey-level inversion transforms the sup-generating family into an inf-generating family [10].

4.3. Representation of structural $\mathcal T$-openings

Definition 4.5. The structural $\mathcal T$-opening $\alpha^{\mathcal T}_Y$ on $L$ by $Y\in L$ is defined by

$$\alpha^{\mathcal T}_Y(X) = \textstyle\bigvee\{gY : g\in\mathcal T,\ gY\le X\}. \tag{18}$$
Fig. 12. Construction of a $\mathcal T$-dilation $\delta$ (left), and a $\mathcal T$-erosion $\varepsilon$ (right), on a lattice $L$ with sup-generating family $\ell$ and inf-generating family $\ell'$.
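Proposition 4.6 (Decomposition of structural $\mathcal T$-openings). The structural $\mathcal T$-opening $\alpha^{\mathcal T}_Y$ defined by Eq. (18) is the product of a $\mathcal T$-erosion $\varepsilon^\uparrow_Y : L\to P(\mathcal T)$ followed by its adjoint $\mathcal T$-dilation $\delta^\downarrow_Y : P(\mathcal T)\to L$, i.e., $\alpha^{\mathcal T}_Y(X) = \delta^\downarrow_Y\varepsilon^\uparrow_Y(X)$, where

$$\varepsilon^\uparrow_Y(X) = \vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y)), \quad X\in L,$$
$$\delta^\downarrow_Y(G) = \textstyle\bigvee\pi\big[G\overset{\mathcal T}{\oplus}\vartheta(\ell(Y))\big], \quad G\in P(\mathcal T).$$

Proof. By explicit computation, we find

$$\begin{aligned}
\alpha^{\mathcal T}_Y(X) &= \textstyle\bigvee\{gY : g\in\mathcal T,\ gY\le X\} \\
&= \textstyle\bigvee\bigcup\{g\ell(Y) : g\in\mathcal T,\ g\ell(Y)\subseteq\ell(X)\} \\
&= \textstyle\bigvee\bigcup\{\pi[g\vartheta(\ell(Y))] : g\in\mathcal T,\ g\vartheta(\ell(Y))\subseteq\vartheta(\ell(X))\} \\
&= \textstyle\bigvee\pi\big[\bigcup\{g\vartheta(\ell(Y)) : g\in\mathcal T,\ g\vartheta(\ell(Y))\subseteq\vartheta(\ell(X))\}\big] \\
&= \textstyle\bigvee\pi\big[\big(\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y))\big)\overset{\mathcal T}{\oplus}\vartheta(\ell(Y))\big] \\
&= \delta^\downarrow_Y\big(\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y))\big) = \delta^\downarrow_Y(\varepsilon^\uparrow_Y(X)),
\end{aligned}$$

where we used the properties of sup-generating families (see Section 2.2.3). □

Again we note that the opening $\alpha^{\mathcal T}_Y$ is not an adjunctional opening on $L$ in the sense of Section 2.2.2. To decompose $\alpha^{\mathcal T}_Y$ as a product of an erosion and its adjoint dilation, distinct lattices $L$ and $P(\mathcal T)$ are required. Finally, to obtain decompositions of structural $\mathcal T$-closings one needs a dual Basic Assumption requiring the existence of an inf-generating subset, see Ref. [7, Remark 5.11].

4.4. Representation of increasing $\mathcal T$-operators

The lifting approach enables us to obtain a generalization of a theorem by Matheron [1] giving a characterization of $\mathcal T$-invariant increasing mappings on $L$.

Definition 4.7. The kernel $\ker(\psi)$ of a mapping $\psi : L\to L$ is defined by

$$\ker(\psi) = \{A\in L : \omega\le\psi(A)\}.$$

Here $\omega$ is the origin of the sup-generating family $\ell$ of $L$.

Theorem 4.8. Let $L$ be a complete lattice with automorphism group $\mathcal T$ satisfying the Basic Assumption. Then any increasing $\mathcal T$-mapping $\psi : L\to L$ has the decomposition

$$\psi(X) = \textstyle\bigvee\bigcup_{Y\in\ker(\psi)}\pi\big[\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y))\big]. \tag{19}$$

Proof. The mapping $\tilde\psi$ defined by $\tilde\psi(G) = \ell(\psi(\bigvee G))$, $G\in P(\ell)$, is an increasing $\mathcal T$-operator on $P(\ell)$. In Ref. [13] we proved that any increasing $\mathcal T$-mapping on a Boolean lattice $P(\ell)$ is a union of projected erosions, i.e., mappings which are projections of erosions on $P(\mathcal T)$:

$$\tilde\psi(G) = \bigcup_{H\in\ker(\tilde\psi)}\pi\big[\vartheta(G)\overset{\lambda}{\ominus}\vartheta(H)\big],$$

where $\ker(\tilde\psi) = \{G\in P(\ell) : \omega\in\tilde\psi(G)\}$ is the kernel of $\tilde\psi$. Therefore, Eq. (16) yields

$$\psi(X) = \textstyle\bigvee\tilde\psi(\ell(X)) = \bigvee\bigcup_{H\in\ker(\tilde\psi)}\pi\big[\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(H)\big].$$

We can relate the kernels of $\psi$ and $\tilde\psi$ as follows:

$$\begin{aligned}
\ker(\tilde\psi) &= \{G\in P(\ell) : \omega\in\tilde\psi(G)\} \\
&= \{G\in P(\ell) : \omega\in\ell(\psi(\textstyle\bigvee G))\} \\
&= \{G\in P(\ell) : \omega\le\psi(\textstyle\bigvee G)\} \\
&= \{G\in P(\ell) : \textstyle\bigvee G\in\ker(\psi)\}.
\end{aligned}$$

Also, for all $g\in\mathcal T$, we have the equivalences

$$g\vartheta(H)\subseteq\vartheta(\ell(X)) \iff gH\subseteq\ell(X) \iff g\textstyle\bigvee H\le X \iff g\ell(\bigvee H)\subseteq\ell(X) \iff g\vartheta(\ell(\bigvee H))\subseteq\vartheta(\ell(X)),$$

where we used the properties of $\vartheta$ and $\ell$ summarized in Sections 2.2.3 and 3.2.1, as well as their $\mathcal T$-invariance. This implies that

$$\begin{aligned}
\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(H) &= \{g\in\mathcal T : g\vartheta(H)\subseteq\vartheta(\ell(X))\} \\
&= \{g\in\mathcal T : g\vartheta(\ell(\textstyle\bigvee H))\subseteq\vartheta(\ell(X))\} \\
&= \vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(\textstyle\bigvee H)).
\end{aligned}$$

Therefore,

$$\begin{aligned}
\psi(X) &= \textstyle\bigvee\bigcup_{Y\in\ker(\psi)}\ \bigcup_{H:\,\bigvee H = Y}\pi\big[\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(\textstyle\bigvee H))\big] \\
&= \textstyle\bigvee\bigcup_{Y\in\ker(\psi)}\pi\big[\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y))\big].
\end{aligned}$$

This completes the proof. □

Note that the mapping $\varepsilon^\uparrow_Y : L\to P(\mathcal T)$, with $\varepsilon^\uparrow_Y(X) = \vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y)) = \{g\in\mathcal T : gY\le X\}$, is an erosion between the lattices $L$ and $P(\mathcal T)$. Again, we remark that to obtain representations of an increasing $\mathcal T$-operator as an infimum of projected $\mathcal T$-dilations one needs a dual Basic Assumption. By considering special cases, we recover some of the well-known representations.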
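In the Boolean, translation-invariant special case, Theorem 4.8 reduces to Matheron's representation $\psi(X) = \bigcup_{Y\in\ker(\psi)}X\ominus Y$. The following brute-force check is a sketch that assumes the cyclic domain $\mathbb Z_7$ with its translations, and a 3-point majority (median) filter as the increasing translation-invariant operator; both choices and all names are illustrative and do not come from the paper.

```python
from itertools import combinations

N = 7  # hypothetical cyclic domain Z_7 with its translation group

def translate(X, h):
    return frozenset((x + h) % N for x in X)

def erode(X, Y):
    # Boolean case of Eq. (14): X ⊖ Y = {z : τ_z Y ⊆ X}
    return frozenset(z for z in range(N) if translate(Y, z) <= X)

def median3(X):
    # Example increasing, translation-invariant operator: 3-point majority filter
    return frozenset(z for z in range(N) if sum(((z + d) % N) in X for d in (-1, 0, 1)) >= 2)

SUBSETS = [frozenset(c) for r in range(N + 1) for c in combinations(range(N), r)]

def kernel(psi, origin=0):
    # Definition 4.7 (Boolean case): ker(ψ) = {A : ω ∈ ψ(A)}
    return [A for A in SUBSETS if origin in psi(A)]

def matheron(X, ker):
    # ψ(X) = union over Y in ker(ψ) of X ⊖ Y
    out = frozenset()
    for Y in ker:
        out |= erode(X, Y)
    return out

MEDIAN_KERNEL = kernel(median3)
```

The kernel is computed exhaustively, so this only scales to very small domains; the point is to make the representation concrete, not to implement it efficiently.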
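1. $\mathcal T$ abelian. Using the properties of the operators $\vartheta$ and $\ell$ one finds

$$\begin{aligned}
\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y)) &= \bigcap_{h\in\vartheta(\ell(Y))}\vartheta(\ell(X))h^{-1} = \bigcap_{h\in\vartheta(\ell(Y))}h^{-1}\vartheta(\ell(X)) \\
&= \bigcap_{h\in\vartheta(\ell(Y))}\vartheta(\ell(h^{-1}X)) = \vartheta\ell\Big(\bigwedge_{h\in\vartheta(\ell(Y))}h^{-1}X\Big) = \vartheta\ell\Big(\bigwedge_{y\in\ell(Y)}\tau_y^{-1}X\Big).
\end{aligned}$$

Therefore,

$$\begin{aligned}
\psi(X) &= \textstyle\bigvee\bigcup_{Y\in\ker(\psi)}\pi\Big[\vartheta\ell\Big(\bigwedge_{y\in\ell(Y)}\tau_y^{-1}X\Big)\Big] = \bigvee\bigcup_{Y\in\ker(\psi)}\ell\Big(\bigwedge_{y\in\ell(Y)}\tau_y^{-1}X\Big) \\
&= \bigvee_{Y\in\ker(\psi)}\ \bigwedge_{y\in\ell(Y)}\tau_y^{-1}X = \bigvee_{Y\in\ker(\psi)}X\ominus Y,
\end{aligned} \tag{20}$$

where $X\ominus Y$ is defined by Eq. (14). This is precisely the representation for increasing $T$-operators with $T$ abelian, as derived in Ref. [10, Theorem 3.11], see also Ref. [7, Theorem 5.22].

2. $L$ Boolean. If $L = P(E)$ for some set $E$, then $\ell$ becomes the identity operator, and $\bigvee$ becomes union, so

$$\psi(X) = \bigcup_{Y\in\ker(\psi)}\pi\big[\vartheta(X)\overset{\lambda}{\ominus}\vartheta(Y)\big],$$

which is the representation by projected erosions as derived in Ref. [13]. If $\mathcal T$ equals the translation group $T$, this representation reduces to that of Matheron [1]. Application of this decomposition to the Boolean dual (3) leads to a representation as an intersection of projected dilations.

4.5. $\mathcal M$-invariant operators

When $\mathcal T$ is the motion group $\mathcal M$, many formulas simplify considerably, and some additional characterizations, e.g. for adjunctions, are also obtained. Essentially, the same technique applies when $\mathcal M$ is replaced by other groups which have the translation group $T$ as a transitive subgroup, such as the similarity group or the affine group.

From the results of Section 4.1 we know that a mapping $\delta$ is a $T$-dilation on $L$ if and only if $\delta$ has the form

$$\delta(X) = \delta^T_Y(X) = X\oplus Y = \bigvee_{x\in\ell(X)}\tau_xY = \bigvee_{y\in\ell(Y)}\tau_yX, \tag{21}$$

where the structuring element is given by $Y = \delta(\omega)$, with $\omega$ the origin of the sup-generating family $\ell$. Since every $\mathcal M$-dilation is a $T$-dilation, all $\mathcal M$-dilations also have the form (21). But $\delta$ has to be $\mathcal R$-invariant as well; therefore $Y = \delta(\omega) = \delta(r\omega) = r\delta(\omega) = rY$ for all $r\in\mathcal R$, i.e., $Y$ has to be $\mathcal R$-invariant. Conversely, we may ask whether every mapping of the form (21) with $\mathcal R$-invariant structuring element $Y$ is an $\mathcal M$-dilation. Since Eq. (21) is a $T$-dilation, it remains to prove that $\delta$ is $\mathcal R$-invariant. For any $r\in\mathcal R$,

$$\delta(rX) = \bigvee_{x\in\ell(rX)}\tau_xY = \bigvee_{x\in r\ell(X)}\tau_xY = \bigvee_{x'\in\ell(X)}\tau_{rx'}Y.$$

Now $\mathcal M$ is the semi-direct product of $T$ and $\mathcal R$, so from Eq. (8) and the $\mathcal R$-invariance of $Y$, we find

$$\delta(rX) = \bigvee_{x'\in\ell(X)}r\tau_{x'}r^{-1}Y = \bigvee_{x'\in\ell(X)}r\tau_{x'}Y = r\bigvee_{x'\in\ell(X)}\tau_{x'}Y = r\delta(X).$$

Since adjoints of dilations are unique, we know immediately that the mapping $\varepsilon$ given by $\varepsilon^T_Y(X) = X\ominus Y = \bigwedge_{y\in\ell(Y)}\tau_y^{-1}X$ is the $\mathcal M$-erosion adjoint to $\delta$. Summarizing:

Proposition 4.9. For any $Y\in L$, with $Y$ $\mathcal R$-invariant, the pair $(\varepsilon^{\mathcal M}_Y, \delta^{\mathcal M}_Y)$ with

$$\delta^{\mathcal M}_Y(X) = \bigvee_{y\in\ell(Y)}\tau_yX \quad\text{and}\quad \varepsilon^{\mathcal M}_Y(X) = \bigwedge_{y\in\ell(Y)}\tau_y^{-1}X,$$

is an $\mathcal M$-adjunction. Every $\mathcal M$-adjunction has this form.

In the case of the structural $\mathcal M$-opening $\alpha^{\mathcal M}_Y$ by the structuring element $Y$, we find

$$\begin{aligned}
\alpha^{\mathcal M}_Y(X) &:= \textstyle\bigvee\{gY : g\in\mathcal M,\ gY\le X\} \\
&= \textstyle\bigvee\{\tau rY : r\in\mathcal R,\ \tau\in T,\ \tau rY\le X\} \\
&= \textstyle\bigvee_{r\in\mathcal R}\alpha^T_{rY}(X),
\end{aligned} \tag{22}$$

where $\alpha^T_{rY}(X) := \bigvee\{\tau rY : \tau\in T,\ \tau rY\le X\}$ is the structural $T$-opening by $rY$. For the closing one finds similarly

$$\phi^{\mathcal M}_Y(X) = \textstyle\bigwedge\{gY : g\in\mathcal M,\ gY\ge X\} = \bigwedge_{r\in\mathcal R}\phi^T_{rY}(X),$$

where $\phi^T_{rY}(X) := \bigwedge\{\tau rY : \tau\in T,\ \tau rY\ge X\}$ is the structural $T$-closing by $rY$.

Remark 4.10. It was proved in Ref. [11] that $\alpha^T_{rY}$ is an adjunctional $T$-opening, $\alpha^T_{rY} = \delta^T_{rY}\varepsilon^T_{rY}$, but that, in general, $\phi^T_{rY}$ is not an adjunctional $T$-closing (cf. Ref. [7]).

Finally, we take a look at the representation of Theorem 4.8 for increasing $\mathcal M$-mappings. Since every $\mathcal M$-mapping is a $T$-mapping, Eq. (19) should reduce to the representation (20). For the projected erosions occurring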
in Eq. (19) we find

$$\begin{aligned}
\pi\big[\vartheta(\ell(X))\overset{\lambda}{\ominus}\vartheta(\ell(Y))\big] &= \pi\{g\in\mathcal M : g\vartheta(\ell(Y))\subseteq\vartheta(\ell(X))\} \\
&= \pi\{g\in\mathcal M : g\ell(Y)\subseteq\ell(X)\} \\
&= \{x\in\ell : \tau_x\ell(rY)\subseteq\ell(X)\ \text{for some}\ r\in\mathcal R\} \\
&= \bigcup_{r\in\mathcal R}\ell(X)\overset{T}{\ominus}\ell(rY),
\end{aligned}$$

where

$$\ell(X)\overset{T}{\ominus}\ell(rY) = \bigcap_{y\in\ell(rY)}\tau_y^{-1}\ell(X) = \ell\Big(\bigwedge_{y\in\ell(rY)}\tau_y^{-1}X\Big) \tag{23}$$

denotes the $T$-erosion of the set $\ell(X)$ by the structuring element $\ell(rY)$. Therefore any increasing $\mathcal M$-mapping $\psi : L\to L$ has the representation

$$\psi(X) = \textstyle\bigvee\bigcup_{Y\in\ker(\psi)}\bigcup_{r\in\mathcal R}\ell(X)\overset{T}{\ominus}\ell(rY). \tag{24}$$

It is easy to show that for any $\mathcal M$-mapping $\psi$ the following equivalence holds for all $r\in\mathcal R$: $Y\in\ker(\psi)\iff rY\in\ker(\psi)$. This means that the union over all $r\in\mathcal R$ in Eq. (24) can actually be omitted. So, using Eqs. (23) and (7), one obtains

$$\psi(X) = \textstyle\bigvee\bigcup_{Y\in\ker(\psi)}\ell\Big(\bigwedge_{y\in\ell(Y)}\tau_y^{-1}X\Big) = \bigvee_{Y\in\ker(\psi)}X\ominus Y,$$

and we recover Eq. (20).

4.6. Group-invariant grey-scale operators

The general approach above can be applied directly² to the treatment of $\mathcal T$-invariant operators on the lattice $L$ of grey-scale functions. Our approach closely follows that of Ronse and Heijmans [7,10,11]. Let $L$ denote the complete lattice $\mathrm{Fun}(E,\mathbb T)$ of grey-scale functions with domain $E$, whose range is a complete lattice $\mathbb T$ of grey values. Here $E$ may be $\mathbb R^n$ or $\mathbb Z^n$, and $\mathbb T$ may be $\overline{\mathbb R} = \mathbb R\cup\{+\infty,-\infty\}$, $\overline{\mathbb Z} = \mathbb Z\cup\{+\infty,-\infty\}$, or also a finite set of grey values [7, Chapter 11]. In the following we restrict ourselves to the case $n = 2$. The supremum and infimum of a family $(F_j)_{j\in J}$ of grey-scale functions are given by

$$\Big(\bigvee_{j\in J}F_j\Big)(x) = \sup_{j\in J}F_j(x), \qquad \Big(\bigwedge_{j\in J}F_j\Big)(x) = \inf_{j\in J}F_j(x), \qquad x\in E.$$

The sup-generating family $\ell$ is now given by the impulse functions $f_{x,t}$, $x\in E$, $t\in\mathbb T$, defined by

$$f_{x,t}(y) = \begin{cases} t, & y = x,\\ -\infty, & y\ne x.\end{cases} \tag{25}$$

As indicated in Remark 4.4, one can give complete characterizations of $\mathcal T$-invariant grey-scale operators due to the existence of grey-level inversion. We give two examples.

² An alternative is the umbra approach, which has to be handled with care [7,8].

4.6.1. Motion-invariant grey-scale operators

This is the case where $\mathcal T$ is the motion group $\mathcal M$. Define an automorphism $\gamma_{h,\phi,v}$ on $L$ by

$$(\gamma_{h,\phi,v}(F))(x) = F\big(r_\phi^{-1}(x-h)\big) + v, \qquad F\in L,$$

i.e., $\gamma_{h,\phi,v}$ carries out a motion (consisting of a rotation $r_\phi$ followed by a translation $\tau_h$) of the graph of $F$ in the plane, and translates it over a distance $v$ along the grey value axis. The group $\mathcal M := \{\gamma_{h,\phi,v} : h\in E,\ \phi\in[0,2\pi),\ v\in\mathbb T\}$ is an automorphism group of $L$ acting transitively on $\ell$. $\mathcal M$ is the semi-direct product of the abelian groups $T$ and $\mathcal R$, where

$$T = \{\tau_{h,v} : h\in E,\ v\in\mathbb T\}, \qquad \mathcal R = \{r_\phi : \phi\in[0,2\pi)\},$$

with $\tau_{h,v} = \gamma_{h,0,v}$ and $r_\phi = \gamma_{0,\phi,0}$ (note that $r_\phi$ denotes both an operator on points and on functions). In particular, $r_\phi\tau_{h,v}r_\phi^{-1} = \tau_{r_\phi h,v}$. Note that the group $T$ of translations is transitive on $\ell$. So from the results of Section 4.5 we may conclude immediately that all $\mathcal M$-dilations have the form $\delta^T_G(F) = F\oplus G$, for some $\mathcal R$-invariant structuring function $G\in L$, where

$$(F\oplus G)(x) = \bigvee_{(h,v)\in\ell(G)}(\tau_{h,v}F)(x) = \bigvee_{(h,v)\in\ell(G)}F(x-h)+v = \bigvee_{h\in E}F(x-h)+G(h),$$

and the $\mathcal R$-invariance of the structuring function $G$ is expressed by $rG = G$ for all $r\in\mathcal R$, i.e. $G(r_\phi^{-1}x) = G(x)$ for all $\phi\in[0,2\pi)$. The adjoint erosion has the form $\varepsilon^T_G(F) = F\ominus G$, where

$$(F\ominus G)(x) = \bigwedge_{h\in E}F(x+h)-G(h).$$

Finally, the decomposition (22) of structural $\mathcal M$-openings now reads $\alpha^{\mathcal M}_G(F) = \bigvee_{r\in\mathcal R}\alpha^T_{rG}(F)$, where $\alpha^T_{rG}(F) := (F\ominus rG)\oplus rG$, with

$$\big((F\ominus G)\oplus G\big)(x) = \bigvee_{h}\bigwedge_{h'}F(x-h+h')-G(h')+G(h)$$

the structural $T$-opening with structuring function $G$. Decompositions of structural $\mathcal M$-closings are possible by the existence of grey-scale inversion, which transforms
the sup-generating family (25) into an inf-generating family, cf. Remark 4.4.

Remark 4.11. The chosen group $\mathcal M$ leads to additive structuring functions. Other choices are possible, leading to multiplicative structuring functions. See Refs. [7,10] for more details.

4.6.2. Grey-scale operators on the sphere

As a second example we consider grey-scale operators on the sphere, invariant under the group SO(3) of rotations in 3-space, cf. Example 2.9. The construction of morphological operators for this case leads to formulas which are completely analogous to the ones for $\mathcal M$-invariant operators just considered. So we confine ourselves to illustrating this case by a practical example and making some remarks on the implementation of the spherical operators.

First we recall some facts for the case of binary image operators on the sphere, which was considered in Ref. [23]. We assume that pictures of the sphere are produced by orthographic projection on a plane, which corresponds closely to what happens if pictures of the earth or a planet are taken from a large distance. Only one hemisphere will be visible, so we take a disc on which to map a hemisphere. Let $D := \{(x,y)\in\mathbb R^2 : x^2+y^2\le 1\}$ be a disc of radius 1 in the plane. The upper hemisphere is the set $S^2_+ := \{(x,y,z)\in\mathbb R^3 : x^2+y^2+z^2 = 1,\ z\ge 0\}$. Orthographic projection from the upper hemisphere to the disc $D$ is the map $p_M : S^2_+\to D$ given by

$$p_M\big(x, y, \sqrt{1-x^2-y^2}\big) = (x,y), \quad\text{with inverse}\quad p_M^{-1}(x,y) = \big(x, y, \sqrt{1-x^2-y^2}\big).$$

Under orthographic projection, the rotations on the sphere induce transformations on the disc $D$. Consider a disc on the sphere centered at the pole, such that its projection is a disc $C$ of radius $d<1$ with center at the origin of $D$, cf. Fig. 13. If the disc on the sphere has moved to a location such that the projection of its center is at $(x,y)\in D$, then the image $C_{x,y}$ of the rotated disc consists of those points $(u,v)\in D$ which satisfy the inequality

$$1 - xu - yv - \sqrt{(1-x^2-y^2)(1-u^2-v^2)} \le 1 - \sqrt{1-d^2}. \tag{26}$$

The boundary of the region $C_{x,y}$ is in general an ellipse, see Fig. 13. The ellipses have their minor axes oriented in the radial direction. Note that $(x,y)$ is not the center of the ellipse $C_{x,y}$: if $(x,y)$ has radial distance $r$ to the origin, then $C_{x,y}$ has its center at radial distance $r\sqrt{1-d^2}$. Very close to the boundary of $D$, $C_{x,y}$ is no longer an ellipse, but a region enclosed between part of an ellipse and the boundary of the disc $D$, corresponding to the situation that the rotating disc on the sphere moves from one hemisphere to the other.
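Eq. (26) and the statement about the ellipse center can be checked numerically. The sketch below, with hypothetical helper names, scans the radial line through $(r, 0)$ and reports the midpoint and half-width of $C_{r,0}$ along it. A short calculation (ours, not stated in the text) gives the radial endpoints $\sin(\alpha\pm\theta)$ with $\sin\alpha = r$ and $\sin\theta = d$, hence center $r\sqrt{1-d^2}$ and radial half-width $d\sqrt{1-r^2}$; the latter is smaller than the tangential half-width $d$, consistent with the minor axis being radial.

```python
import math

def in_region(u, v, x, y, d):
    # Eq. (26): (u, v) lies in C_{x,y}, the orthographic image of the rotated spherical disc
    if u * u + v * v > 1 or x * x + y * y > 1:
        return False
    lhs = 1 - x * u - y * v - math.sqrt((1 - x * x - y * y) * (1 - u * u - v * v))
    return lhs <= 1 - math.sqrt(1 - d * d)

def radial_extent(r, d, steps=100000):
    # Scan the line v = 0 through (r, 0); return the smallest and largest u inside C_{r,0}
    inside = [-1 + 2 * i / steps for i in range(steps + 1)
              if in_region(-1 + 2 * i / steps, 0.0, r, 0.0, d)]
    return min(inside), max(inside)

def center_and_semi_axis(r, d, steps=100000):
    # Midpoint and radial half-width of C_{r,0}, estimated from the scan
    lo, hi = radial_extent(r, d, steps)
    return (lo + hi) / 2, (hi - lo) / 2
```

The scan resolution limits the accuracy to about $2\cdot10^{-5}$, so comparisons with the closed-form values should use a small tolerance.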
Fig. 13. Disc $C$ centered at the origin of $D$, and its 'translates' $C_{x,y}$ corresponding to rotated discs on the sphere under orthographic projection. A marker indicates the projection of the center of a rotated disc.
Now we can construct spherical grey-scale operators by a structuring function $G$ with support inside the disc $C$ of radius $d$. For simplicity we take a rotationally symmetric structuring function, more specifically a flat structuring function with constant value zero. This is implemented in the digital case as follows. The disc $D$ is covered by a square grid of pixels, and for each pixel $(x,y)$ in $D$, the disc $C$ at the origin is transformed to position $(x,y)$ according to Eq. (26). Then the value of the flat grey-scale dilation or erosion at pixel $(x,y)$ is obtained by computing the maximum (resp. the minimum) of the image values at all pixels inside the region $C_{x,y}$ around $(x,y)$. Products of such an erosion and dilation result in a spherical grey-scale opening or closing. As an example, we show in Fig. 14(a) a picture of the planet Mars, taken by the Hubble Space Telescope on February 25, 1995 (Source: NASA/National Space Science Data Center; credit: Ph. James (University of Toledo), S. Lee (University of Colorado), NASA). Fig. 14(b) shows its opening by the flat structuring function $G$ defined above, where we have chosen $d = 0.1$, i.e. the radius of $C$ equals 10% of the radius of the planet. For comparison, the Euclidean opening with the disc $C$ (for the same value of $d$) is shown as well, see Fig. 14(c). Notice the different behavior near the boundary of the planet, in particular with respect to the polar cap: in the Euclidean case, the translates $C_{x,y}$ remain discs of radius $d$ at all points $(x,y)$. This illustrates that the spherical transformations are better adapted to the geometry than the Euclidean translations.
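The digital procedure just described can be sketched as follows. Assumptions not in the text: a coarse $21\times21$ grid, $d = 0.2$ instead of $0.1$ (so that $C$ covers several pixels on such a coarse grid), a random test image, and brute-force neighbourhood lists; the names are illustrative. The dilation is taken as the adjoint of the flat erosion; because condition (26) is symmetric in $(x,y)$ and $(u,v)$, this adjoint is again a maximum over $C_{x,y}$, and the resulting opening is anti-extensive and idempotent.

```python
import math
import random

M = 21          # hypothetical grid of M x M pixels covering the unit disc D
D_RADIUS = 0.2  # hypothetical d (the text uses d = 0.1 on a much finer image)

def in_region(u, v, x, y, d):
    # Eq. (26): (u, v) lies in C_{x,y}
    if u * u + v * v > 1 or x * x + y * y > 1:
        return False
    lhs = 1 - x * u - y * v - math.sqrt((1 - x * x - y * y) * (1 - u * u - v * v))
    return lhs <= 1 - math.sqrt(1 - d * d)

def coords(p):
    # Centre of pixel (i, j) in the coordinates of D
    i, j = p
    return (-1 + 2 * (i + 0.5) / M, -1 + 2 * (j + 0.5) / M)

PIXELS = [(i, j) for i in range(M) for j in range(M)
          if sum(c * c for c in coords((i, j))) <= 1]

# Pixels inside C_{x,y} for every pixel (x, y), precomputed once
NEIGHBOURS = {}
for p in PIXELS:
    x, y = coords(p)
    NEIGHBOURS[p] = [q for q in PIXELS if in_region(*coords(q), x, y, D_RADIUS)]

def flat_erosion(F):
    # Minimum of the image over C_{x,y}
    return {p: min(F[q] for q in NEIGHBOURS[p]) for p in PIXELS}

def flat_dilation(F):
    # Adjoint of flat_erosion; by symmetry of Eq. (26) it is a maximum over C_{x,y}
    return {p: max(F[q] for q in NEIGHBOURS[p]) for p in PIXELS}

def spherical_opening(F):
    return flat_dilation(flat_erosion(F))

rng = random.Random(0)
IMAGE = {p: rng.randrange(256) for p in PIXELS}
OPENED = spherical_opening(IMAGE)
```

At the centre of $D$ the region $C$ is a 13-pixel disc on this grid, while near the rim it shrinks in the radial direction, which is the behaviour that distinguishes the spherical from the Euclidean opening near the boundary of the planet.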
Fig. 14. (a) Picture of the planet Mars (for details, see text); (b) spherical grey-scale opening; (c) Euclidean grey-scale opening.
5. Discussion

We have presented a mathematical framework for constructing morphological operators on complete lattices which are invariant under some group $\mathcal T$. Starting from the classical operators, like dilation, erosion, opening and closing, which are invariant under the abelian translation group $T$, a two-stage process was described for constructing $\mathcal T$-invariant morphological operators on Boolean lattices with a non-commutative group of automorphisms. First, $\mathcal T$-invariant morphological operators were defined on the space $P(\mathcal T)$ of subsets of $\mathcal T$ by generalizing the Minkowski operations to non-commutative groups. Next, morphological operators were constructed on the actual object space of interest $P(E)$ by (i) mapping the subsets of $E$ to subsets of $\mathcal T$, (ii) using the results for the lattice $P(\mathcal T)$, and (iii) projecting back to the original space $P(E)$. Subsequently, we considered non-Boolean lattices with a non-commutative group $\mathcal T$ of automorphisms. Following Heijmans and Ronse [10,11], the basic assumption was made that the lattice has a sup-generating family on which $\mathcal T$ acts transitively. Differences with the case of Boolean lattices were pointed out. Special attention was given to the case where $\mathcal T$ equals the Euclidean motion group $\mathcal M$ generated by translations and rotations. As another application of special interest we considered $\mathcal T$-invariant morphological operators for grey-scale functions. Examples covered by the general framework are:
• Polar morphology [5,10], with applications to models of the visual cortex [29,30].
• Constrained perspective morphology [31], where one requires invariance of image operations under object translation parallel to the image plane used for perspective projection.
• Spherical morphology [23], which has connections to integral geometry and geometric probability [32,33], see also Section 4.6.2.
• Translation-rotation morphology [24], which has applications to robot path planning [34], see also Ref. [35]. Another application is the tailor problem, which concerns the fitting of sets without overlap within a larger set [36], with applications to making cutting plans for clothing manufacture. For similar applications of the classical Minkowski operations to spatial planning and other problems, see Ghosh [37].
• Projective morphology [28], which is appropriate for invariant pattern recognition under perspective projection. Invariance may be restricted to subgroups of the projective group, such as the motion group, the similarity group, or the affine group. Other applications concern affine signal models or the inverse problem in fractal modeling [26].
• Differential morphology [38]. Shape description of patterns on arbitrary (smooth) surfaces based on concepts of differential geometry may be used to obtain morphological operators which leave the geometry of the surface invariant.
6. Summary
In its original form, mathematical morphology is a theory of binary image transformations that are invariant under the group of Euclidean translations. This paper surveys and extends constructions of morphological operators that are invariant under a more general group T, such as the motion group, the affine group, or the projective group. The motivation for this approach derives from computer vision, where an important question is how to take the projective geometry of the imaging process into account. This is of importance in invariant pattern recognition, where the goal is to recognize patterns irrespective of their orientation or location. In image understanding, one wants to derive information about three-dimensional (3D) scenes from projections on a planar (2D) image screen. In this case it is natural to require invariance of image operations under 3D camera rotations. So one may require invariance under increasingly larger groups, such as the Euclidean motion group, the similarity group, the affine group or the projective group, which are all non-commutative groups. We follow a two-step approach: first, we construct morphological operators on the space $P(T)$ of subsets of the group T itself; next, we use these results to construct morphological operators on the original object space, i.e. the Boolean algebra $P(E^n)$ in the case of binary images, or the lattice $\mathrm{Fun}(E^n, \mathcal{T})$ in the case of grey-value functions $F: E^n \to \mathcal{T}$, where $E$ equals $\mathbb{R}$ or $\mathbb{Z}$ and $\mathcal{T}$ is the grey-value set. T-invariant dilations, erosions, openings and closings are defined and several representation theorems are presented. Graphical illustrations are given for the case of the Euclidean motion group generated by translations and rotations. Examples and applications are discussed.
J.B.T.M. Roerdink / Pattern Recognition 33 (2000) 877–895
References
[1] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, NY, 1975.
[2] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York, 1982.
[3] C.R. Giardina, E.R. Dougherty, Morphological Methods in Image and Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[4] H.J.A.M. Heijmans, Mathematical morphology: an algebraic approach, CWI Newslett. 14 (1987) 7–27.
[5] J.B.T.M. Roerdink, H.J.A.M. Heijmans, Mathematical morphology for structures without translation symmetry, Signal Processing 15 (1988) 271–277.
[6] J. Serra (Ed.), Image Analysis and Mathematical Morphology. II: Theoretical Advances, Academic Press, New York, 1988.
[7] H.J.A.M. Heijmans, Morphological Image Operators, Advances in Electronics and Electron Physics, Vol. 25, Supplement, Academic Press, New York, 1994.
[8] C. Ronse, Why mathematical morphology needs complete lattices, Signal Processing 21 (2) (1990) 129–154.
[9] J. Serra, Éléments de Théorie pour l'Optique Morphologique, Ph.D. Thesis, Université P. and M. Curie, Paris, 1986.
[10] H.J.A.M. Heijmans, C. Ronse, The algebraic basis of mathematical morphology. Part I: dilations and erosions, Comp. Vis. Graph. Im. Process. 50 (1989) 245–295.
[11] C. Ronse, H.J.A.M. Heijmans, The algebraic basis of mathematical morphology. Part II: openings and closings, Comp. Vis. Graph. Im. Process.: Image Understanding 54 (1991) 74–97.
[12] J.B.T.M. Roerdink, Mathematical morphology on homogeneous spaces. Part I: the simply transitive case, Report AM-R8924, Centre for Mathematics and Computer Science, Amsterdam, 1989.
[13] J.B.T.M. Roerdink, Mathematical morphology on homogeneous spaces. Part II: the transitive case, Report AM-R9006, Centre for Mathematics and Computer Science, Amsterdam, 1990.
[14] J.B.T.M. Roerdink, Mathematical morphology with noncommutative symmetry groups, in: E.R. Dougherty (Ed.), Mathematical Morphology in Image Processing, Chapter 7, Marcel Dekker, New York, NY, 1993, pp. 205–254.
[15] H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Springer, New York, 1957.
[16] C. Zetzsche, T. Caelli, Invariant pattern recognition using multiple filter image representations, Comp. Vis. Graph. Im. Process. 45 (1989) 251–262.
[17] K. Kanatani, Group-Theoretical Methods in Image Understanding, Springer, New York, 1990.
[18] J.L. Mundy, A. Zisserman, D. Forsyth (Eds.), Applications of Invariance in Computer Vision, Lecture Notes in Computer Science, Vol. 825, Springer, New York, 1994.
[19] G. Birkhoff, Lattice Theory, 3rd edition, American Mathematical Society Colloquium Publications, Vol. 25, Providence, RI, 1984.
[20] G. Gierz, K.H. Hofmann, K. Keimel, J.D. Lawson, M. Mislove, D.S. Scott, A Compendium of Continuous Lattices, Springer, New York, 1980.
[21] D.J.S. Robinson, A Course in the Theory of Groups, Springer, New York, 1982.
[22] M. Suzuki, Group Theory, Springer, New York, 1982.
[23] J.B.T.M. Roerdink, Mathematical morphology on the sphere, Proceedings SPIE Conference Visual Communications and Image Processing '90, Lausanne, 1990, pp. 263–271.
[24] J.B.T.M. Roerdink, On the construction of translation and rotation invariant morphological operators, Report AM-R9025, Centre for Mathematics and Computer Science, Amsterdam, 1990.
[25] T.S. Blyth, M.F. Janowitz, Residuation Theory, Pergamon Press, Oxford, 1972.
[26] P. Maragos, Affine morphology and affine signal models, Proceedings SPIE Conference Image Algebra and Morphological Image Processing, San Diego, July 1990.
[27] P. Maragos, A representation theory for morphological image and signal processing, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1988) 586–599.
[28] J.B.T.M. Roerdink, Group invariance in mathematical morphology, 1995, Proceedings of the International Conference on Pure and Applied Differential Geometry, Nordfjordeid, Norway, July 18–August 7, 1995, to appear.
[29] E.L. Schwartz, Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding, Vision Res. 20 (1980) 645–669.
[30] A. Trehub, Neuronal models for cognitive processes: networks for learning, perception, and imagination, J. Theor. Biol. 65 (1977) 141–169.
[31] J.B.T.M. Roerdink, Computer vision and mathematical morphology, in: W. Kropatsch, R. Klette, F. Solina (Eds.), Theoretical Foundations of Computer Vision, Computing, Supplement 11, 1996, pp. 131–148.
[32] R.E. Miles, Random points, sets and tessellations on the surface of a sphere, Sankhya A 33 (1971) 145–174.
[33] L.A. Santaló, Integral Geometry and Geometric Probability, Addison-Wesley, Reading, MA, 1976.
[34] J.B.T.M. Roerdink, Solving the empty space problem in robot path planning by mathematical morphology, in: J. Serra, P. Salembier (Eds.), Proceedings Workshop 'Mathematical Morphology and its Applications to Signal Processing', Barcelona, Spain, May 12–14, 1993, pp. 216–221.
[35] M. de Berg, M. van Kreveld, M. Overmars, O. Schwarzkopf, Computational Geometry, Springer, New York, 1997.
[36] J.B.T.M. Roerdink, The generalized tailor problem, in: P. Maragos, R.W. Schafer, M.A. Butt (Eds.), Mathematical Morphology and its Applications to Image and Signal Processing, Kluwer Acad. Publ., Dordrecht, 1996, pp. 57–64.
[37] P.K. Ghosh, A solution of polygon containment, spatial planning, and other related problems using Minkowski operations, Comp. Vis. Graph. Im. Process. 49 (1990) 1–35.
[38] J.B.T.M. Roerdink, Manifold shape: from differential geometry to mathematical morphology, in: Y.L. O, A. Toet, D. Foster, H.J.A.M. Heijmans, P. Meer (Eds.), Shape in Picture, NATO ASI Series, Vol. F 126, Springer, New York, 1994, pp. 209–223.
About the Author: JOS B.T.M. ROERDINK received his M.Sc. (1979) in theoretical physics from the University of Nijmegen, the Netherlands. Following his Ph.D. (1983) from the University of Utrecht and a two-year position (1983–1985) as a Postdoctoral Fellow at the University of California, San Diego, both in the area of stochastic processes, he joined the Centre for Mathematics and Computer Science in Amsterdam. There he worked from 1986 to 1992 on image processing and tomographic reconstruction. He is currently associate professor of computing science at the University of Groningen, the Netherlands. His current research interests include mathematical morphology, wavelets, biomedical image processing and scientific visualization.
Pattern Recognition 33 (2000) 897–905
Geodesic balls in a fuzzy set and fuzzy geodesic mathematical morphology
Isabelle Bloch*
École Nationale Supérieure des Télécommunications, Département TSI – CNRS URA 820, 46 rue Barrault, 75013 Paris, France
Received 23 July 1998; received in revised form 21 December 1998; accepted 2 May 1999
Abstract
Although fuzzy morphological operators have received much attention in the Euclidean case, almost nothing exists concerning the geodesic case. In this paper, we address this question by defining fuzzy geodesic distances between points in a fuzzy set, and geodesic balls in a fuzzy set (based on the comparison of fuzzy numbers), from which we derive fuzzy geodesic mathematical morphology operators. The proposed definitions are valid in any dimension. The main properties of the basic operators are demonstrated. These new operations enhance the set of fuzzy morphological operators, leading to transformations of a fuzzy set conditionally on another fuzzy set. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Fuzzy sets; Fuzzy geodesic distance; Fuzzy geodesic balls; Fuzzy mathematical morphology; Fuzzy geodesic dilation and erosion
1. Introduction
The extension of mathematical morphology to fuzzy sets has been a focus of interest for several research teams for some years, e.g. Refs. [1–7], and several others since these original works. One interesting aspect of these extensions lies in the links between fuzzy morphological operators (in particular dilation) and fuzzy distances. For instance, in previous works [8–10], we have shown how fuzzy morphological operators can be derived from fuzzy distances, and conversely how fuzzy dilation can be the basis for powerful fuzzy distances between a point and a fuzzy set and between two fuzzy sets. Such links are widely studied in classical morphology, in the Euclidean case and in the geodesic case as well. Indeed, in mathematical morphology, an important set of operations is constituted by geodesic transformations [11–14]. They are most useful in image processing and pattern recognition, where transformations may have to be performed conditionally on a restriction of the spatial domain. Applications can be found in defining operators under reconstruction (e.g. filtering operators), in image segmentation, and in pattern recognition, where operations have to be constrained by the results of some other transformations.
In this paper, we propose to define geodesic transformations on fuzzy sets, extending our preliminary work in Ref. [15]. To our knowledge, this is the first attempt towards extending geodesic morphology to fuzzy sets, in contrast to Euclidean morphology, which has already motivated several works [1–7]. The aim of this extension is to provide geodesic operators for image processing under imprecision, where image objects are represented as spatial fuzzy sets. An object in the image is represented as a fuzzy set through a membership function assigning to each point of the image a value in [0,1], which represents its membership degree to the object.¹
* Tel.: +33(1)-45-81-75-85; fax: +33(1)-45-81-37-94. E-mail address: [email protected] (I. Bloch)
¹ What is called an object depends on the application. It may be, for instance, a region in the image to which we can assign a label or a semantics.
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
PII: S0031-3203(99)00153-3
I. Bloch / Pattern Recognition 33 (2000) 897–905
With such a representation, spatial imprecision, for instance on the limits of the objects, is directly taken into account. We will mainly consider dilation and erosion, which are the two main morphological operators, from which a large set of operators can be built by iterating and combining these two basic ones.
Let us first introduce some notation and recall some definitions of geodesic morphology on binary sets. In the Euclidean case, the considered space $S$ is equipped with the Euclidean distance $d_E$, and we denote by $D^r(Y)$ and $E^r(Y)$ the dilation and erosion of a set $Y$ by a ball $B_r$ of size $r$. In the geodesic case, transformations are defined conditionally on a reference set $X$. The considered distance is then the geodesic distance in $X$ (i.e. the distance $d_X(x, y)$ from $x$ to $y$ is the length of the shortest path from $x$ to $y$ completely included in $X$). A geodesic ball of size $r$ and center $x$ is defined as
$$B_X(x, r) = \{\, y \in X : d_X(x, y) \le r \,\}. \tag{1}$$
Geodesic dilation and erosion of size $r$ of $Y$ conditionally on $X$ are then defined as
$$D_X^r(Y) = \{\, x \in S : B_X(x, r) \cap Y \neq \emptyset \,\} = \{\, x \in S : d_X(x, Y) \le r \,\}, \tag{2}$$
$$E_X^r(Y) = \{\, x \in S : B_X(x, r) \subseteq Y \,\} = X \setminus D_X^r(X \setminus Y). \tag{3}$$
We propose to generalize Eqs. (1)–(3) to fuzzy sets. We first define the type of fuzzy sets we use here in Section 2, and present a general principle for extending operations. The generalization of Eqs. (1)–(3) to fuzzy sets calls for extensions of geodesic distance and of geodesic balls to fuzzy sets. We have already proposed several definitions of fuzzy geodesic distances in Ref. [16]. We recall the definition having the best properties in Section 3, and propose another definition where the distance is considered as a fuzzy number. We propose in Section 4 a definition of fuzzy geodesic balls and give its main properties. In Section 5 we derive definitions of fuzzy geodesic dilation and erosion, and present their algebraic properties.
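The binary definitions of Eqs. (1)–(3) can be sketched on a pixel grid. This minimal version assumes 4-connectivity and measures path length in steps; it computes $d_X(x, Y)$ by a breadth-first search started from $Y \cap X$ that never leaves $X$, and obtains the erosion from the duality in Eq. (3), i.e. inside $X$, where the two expressions of Eq. (3) coincide. The function names are illustrative only.

```python
from collections import deque


def geodesic_distance_to_set(X, Y):
    """Multi-source BFS giving d_X(x, Y) for every x of X reachable from Y.

    X and Y are sets of (row, col) pixels; paths must stay inside X and use
    4-connected steps. Pixels of X not reachable from Y get no entry
    (infinite geodesic distance).
    """
    dist = {y: 0 for y in Y if y in X}
    queue = deque(dist)
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in X and nb not in dist:
                dist[nb] = dist[(r, c)] + 1
                queue.append(nb)
    return dist


def geodesic_dilation(X, Y, r):
    """Eq. (2): D_X^r(Y) = {x : d_X(x, Y) <= r}."""
    return {x for x, d in geodesic_distance_to_set(X, Y).items() if d <= r}


def geodesic_erosion(X, Y, r):
    """Eq. (3) through duality: E_X^r(Y) = X minus D_X^r(X minus Y)."""
    return X - geodesic_dilation(X, X - Y, r)


# A U-shaped reference set: the two top corners are 2 pixels apart in a
# straight line, but 6 steps apart along paths that stay inside U.
U = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)}
print(sorted(geodesic_dilation(U, {(0, 0)}, 4)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

The example shows the conditioning effect: a geodesic dilation of size 4 from one top corner travels around the U and stops before reaching the other corner, which a Euclidean dilation of the same size would have covered.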
2. Spatial fuzzy sets and extension of operations
A useful representation of objects in images under imprecision can be found in the framework of fuzzy sets [17,18]. The space $S$ is the image space, typically $\mathbb{Z}^2$ or $\mathbb{Z}^3$ for digital 2D or 3D images, or, in the continuous case, $\mathbb{R}^2$ or $\mathbb{R}^3$. We are interested in the objects of the image that we may describe as fuzzy sets; we therefore often call them fuzzy image objects. A fuzzy image object is a fuzzy set defined on $S$, i.e. a spatial fuzzy set. Its membership function $\mu$ is a function from $S$ into $[0,1]$ and represents the imprecision in the spatial extent of the object. For any point $x$ of $S$ (pixel or voxel), $\mu(x)$ is the degree to which $x$ belongs to the fuzzy object. Since it is equivalent to speak about a fuzzy set or its membership function, we will use either term in the following, and denote both by $\mu$.
The advantage of this representation is that it accounts for the spatial imprecision that is inherent to images in several domains. This imprecision may originate from the observed phenomenon itself, from the limited resolution, from the reconstruction algorithms, etc. [18]. Spatial fuzzy sets therefore represent both the spatial information and the imprecision attached to it.
When dealing with fuzzy objects, operations usually defined on crisp (or classical, or binary) sets have to be extended to fuzzy objects. Several different methods have been proposed in the literature to this aim [2,19–21]. The method we use here consists in translating binary expressions into fuzzy ones. This method is particularly powerful if the operations can be expressed in set-theoretical or logical terms. The idea is to replace formally every binary (or crisp) concept by its fuzzy equivalent. Table 1 summarizes the main definitions of fuzzy equivalents (the reader may find more details about definitions and properties of t-norms, t-conorms and complementations in Refs. [22–24]). From these equivalences, more complex relationships can be translated. For instance, the expression $A \subseteq B$, which is equivalent to $A^C \cup B = S$, is translated as
$$\inf_{x \in S} T[c(\mu_A)(x), \mu_B(x)],$$
which is a number in $[0,1]$ representing the degree to which the fuzzy set $\mu_A$ is included in the fuzzy set $\mu_B$. The functions $\mu_A$ and $\mu_B$ represent the two fuzzy sets concerned, or equivalently their membership functions. Such translations have already been used for defining Euclidean morphological operators [2], leading to the following generic expressions for the dilation and erosion of a fuzzy set $\mu$ by a fuzzy structuring element $\nu$:
$$\forall x \in S,\quad D(\mu, \nu)(x) = \sup_{y \in S} t[\mu(y), \nu(y - x)], \tag{4}$$
$$\forall x \in S,\quad E(\mu, \nu)(x) = \inf_{y \in S} T[\mu(y), c(\nu(y - x))]. \tag{5}$$
Table 1
Crisp concept                               Equivalent fuzzy concept
Set X                                       Fuzzy set
Characteristic function μ, μ(x) ∈ {0, 1}    Membership function μ, μ(x) ∈ [0, 1]
Complement of a set                         Fuzzy complementation c
Intersection ∩                              t-norm t
Union ∪                                     t-conorm T
Existence ∃                                 Supremum
Universal symbol ∀                          Infimum
These definitions have good properties in terms of both mathematical morphology and fuzzy sets, as shown in Ref. [2]. We therefore base our work on these definitions, and the proposed construction of geodesic operators follows the same principle (Section 5). One of the main advantages of this construction principle is that it leads to an elegant axiomatization of the resulting operations. Indeed, since the fuzzy equivalents of the basic set and logical operations share most of the properties of these crisp operations, the derived complex operations also satisfy a set of axioms. This set is precisely the one that has to be satisfied in order to share similar properties in the fuzzy case and in the crisp case. However, as can be expected from any extension, some properties may be lost. The amount of loss depends on the choice of the t-norms and t-conorms. For instance, for Euclidean fuzzy morphology defined as in Eqs. (4) and (5), most properties of the operations are satisfied whatever the choice of the t-norms and t-conorms. A few properties are satisfied only for specific choices of these connectives. This is the case, for instance, for the idempotence of opening and closing, which is satisfied only for the Łukasiewicz t-norm and t-conorm (i.e. $t(a, b) = \max(0, a + b - 1)$ and $T(a, b) = \min(1, a + b)$) [2].
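Eqs. (4) and (5), together with the inclusion degree given in Section 2, can be sketched in one dimension with interchangeable connectives. The sketch assumes a finite grid (the supremum and infimum run over the grid only, so borders behave as if the image ended there), a structuring element stored as a dictionary of offsets with membership 0 elsewhere, and $c(a) = 1 - a$. Both the min/max pair and the Łukasiewicz pair are dual with respect to this complementation; the function names are illustrative.

```python
def t_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def T_max(a, b):
    """Maximum t-conorm, dual of t_min with respect to c(a) = 1 - a."""
    return max(a, b)


def t_luk(a, b):
    """Lukasiewicz t-norm max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def T_luk(a, b):
    """Lukasiewicz t-conorm min(1, a + b), dual of t_luk."""
    return min(1.0, a + b)


def c(a):
    """Standard fuzzy complementation."""
    return 1.0 - a


def fuzzy_dilation(mu, nu, t=t_min):
    """Eq. (4): D(mu, nu)(x) = sup_y t[mu(y), nu(y - x)] over a finite 1D grid."""
    n = len(mu)
    return [max(t(mu[y], nu.get(y - x, 0.0)) for y in range(n)) for x in range(n)]


def fuzzy_erosion(mu, nu, T=T_max):
    """Eq. (5): E(mu, nu)(x) = inf_y T[mu(y), c(nu(y - x))] over the same grid."""
    n = len(mu)
    return [min(T(mu[y], c(nu.get(y - x, 0.0))) for y in range(n)) for x in range(n)]


def inclusion_degree(mu_a, mu_b, T=T_max):
    """Degree to which mu_a is included in mu_b: inf_x T[c(mu_a(x)), mu_b(x)]."""
    return min(T(c(a), b) for a, b in zip(mu_a, mu_b))


mu = [0.0, 0.0, 1.0, 0.0, 0.0]       # a single fully-belonging point
nu = {-1: 0.5, 0: 1.0, 1: 0.5}       # a small symmetric fuzzy structuring element
print(fuzzy_dilation(mu, nu))        # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Swapping in `t_luk` and `T_luk` changes only the connectives; the duality $c(E(\mu, \nu)) = D(c(\mu), \nu)$ holds for either dual pair, whereas properties such as idempotence of opening depend on which pair is chosen, as noted above.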
3. Fuzzy geodesic distance between two points in a fuzzy set
3.1. Fuzzy geodesic distance defined as a number
We proposed in Ref. [16] an original definition of the distance between two points in a fuzzy set, extending the notion of geodesic distance. We recall here this definition and the main results we obtained. The geodesic distance between two points $x$ and $y$ represents the length of the shortest path between $x$ and $y$ that "goes out of $\mu$ as little as possible". We have proposed several formalisms for this notion; here we recall only the one having the best properties. This definition relies on the degree of connectivity, as defined by Rosenfeld [25]. In the case where $S$ is a discrete bounded space (as is usually the case in image processing), the degree of connectivity in $\mu$ between any two points $x$ and $y$ of $S$ is defined as
$$c_\mu(x, y) = \max_{L_i \in \mathcal{L}} \Big[ \min_{t \in L_i} \mu(t) \Big], \tag{6}$$
where $\mathcal{L}$ denotes the set of all paths from $x$ to $y$. Each possible path $L_i$ from $x$ to $y$ is constituted by a sequence of points of $S$ according to the discrete connectivity defined on $S$. We denote by $L^*(x, y)$ a shortest path between $x$ and $y$ on which $c_\mu$ is reached (this path, not necessarily unique, can be interpreted as a geodesic path descending as little as possible in the membership degrees), and we denote by $l(L^*(x, y))$ its length (computed in the discrete case from the number of points belonging to the path). Then we define the geodesic distance in $\mu$ between $x$ and $y$ as
$$d_\mu(x, y) = \frac{l(L^*(x, y))}{c_\mu(x, y)}. \tag{7}$$
If $c_\mu(x, y) = 0$, we have $d_\mu(x, y) = +\infty$, which corresponds to the result obtained with the classical geodesic distance in the case where $x$ and $y$ belong to different connected components (actually it corresponds to the generalized geodesic distance, where infinite values are allowed). This definition corresponds to the weighted geodesic distance (in the classical sense) computed in the $\alpha$-cut of $\mu$ at level $\alpha = c_\mu(x, y)$. In this $\alpha$-cut, $x$ and $y$ belong to the same connected component (for the considered discrete crisp connectivity). This definition is illustrated in Fig. 1. It satisfies the following set of properties (see Ref. [16] for the proof):
(1) positivity: $\forall (x, y) \in S^2$, $d_\mu(x, y) \ge 0$;
(2) symmetry: $\forall (x, y) \in S^2$, $d_\mu(x, y) = d_\mu(y, x)$;
(3) separability: $\forall (x, y) \in S^2$, $d_\mu(x, y) = 0 \Leftrightarrow x = y$;
(4) $d_\mu$ depends on the shortest path between $x$ and $y$ that "goes out" of $\mu$ "as little as possible", and $d_\mu$ tends towards infinity if it is not possible to find a path between $x$ and $y$ without going through a point $t$ such that $\mu(t) = 0$;
(5) $d_\mu$ is decreasing with respect to $\mu(x)$ and $\mu(y)$;
(6) $d_\mu$ is decreasing with respect to $c_\mu(x, y)$;
(7) $d_\mu$ is equal to the classical geodesic distance if $\mu$ is crisp.
Fig. 1. Illustration of the geodesic distance in a fuzzy set $\mu$ between two points $x$ and $y$ in a 2D space.
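Eqs. (6) and (7) lend themselves to a direct implementation on a 2D grid. In the sketch below, $c_\mu(x, y)$ is computed as a widest (max–min) path with a Dijkstra-style search, and $l(L^*(x, y))$ as the length of a shortest path inside the $\alpha$-cut $\{\mu \ge c_\mu(x, y)\}$, following the remark that $d_\mu$ is the geodesic distance computed in that cut. The 4-connectivity, the choice of counting path length in steps (number of points minus one) and all function names are assumptions of this sketch.

```python
import heapq
import math
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumed)


def neighbors(p, shape):
    """4-neighbours of pixel p lying inside a grid of the given shape."""
    for dr, dc in STEPS:
        q = (p[0] + dr, p[1] + dc)
        if 0 <= q[0] < shape[0] and 0 <= q[1] < shape[1]:
            yield q


def connectivity_degree(mu, x, y):
    """Eq. (6): maximum over paths of the minimum membership along the path."""
    shape = (len(mu), len(mu[0]))
    best = {x: mu[x[0]][x[1]]}
    heap = [(-best[x], x)]  # max-heap on the bottleneck width
    while heap:
        neg_width, p = heapq.heappop(heap)
        width = -neg_width
        if p == y:
            return width
        if width < best[p]:
            continue  # stale heap entry
        for q in neighbors(p, shape):
            w = min(width, mu[q[0]][q[1]])
            if w > best.get(q, -1.0):
                best[q] = w
                heapq.heappush(heap, (-w, q))
    return 0.0


def cut_path_length(mu, x, y, alpha):
    """Steps of a shortest path from x to y inside the alpha-cut {mu >= alpha}."""
    shape = (len(mu), len(mu[0]))
    if mu[x[0]][x[1]] < alpha:
        return math.inf
    dist = {x: 0}
    queue = deque([x])
    while queue:
        p = queue.popleft()
        if p == y:
            return dist[p]
        for q in neighbors(p, shape):
            if q not in dist and mu[q[0]][q[1]] >= alpha:
                dist[q] = dist[p] + 1
                queue.append(q)
    return math.inf


def fuzzy_geodesic_distance(mu, x, y):
    """Eq. (7): l(L*(x, y)) / c_mu(x, y), or +inf when c_mu(x, y) = 0."""
    c = connectivity_degree(mu, x, y)
    if c == 0:
        return math.inf
    return cut_path_length(mu, x, y, c) / c


grid = [
    [1.0, 0.2, 1.0],
    [1.0, 0.2, 1.0],
    [1.0, 1.0, 1.0],
]
print(connectivity_degree(grid, (0, 0), (0, 2)), fuzzy_geodesic_distance(grid, (0, 0), (0, 2)))  # 1.0 6.0
```

In this grid the direct route between the two top corners crosses memberships of 0.2, so $L^*(x, y)$ is the six-step detour through the bottom row, where the minimum membership stays at 1.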
The triangular inequality is not satisfied, but from this definition it is possible to build a true distance, satisfying the triangular inequality while keeping all other properties. This can be achieved in the following way (see Ref. [16] for proof and details):
$$d'_\mu(x, y) = \min_{t \in S} \left[ \frac{l(L^*(x, t))}{c_\mu(x, t)} + \frac{l(L^*(t, y))}{c_\mu(t, y)} \right].$$
These properties are in agreement with what can be required from a fuzzy geodesic distance, both mathematically and intuitively.
3.2. Fuzzy geodesic distance defined as a fuzzy number
In the previous approach, the geodesic distance between two points is defined as a crisp number (i.e. a standard number). It could also be defined as a fuzzy number, taking into account the fact that, if the set is imprecisely defined, geodesic distances in this set can be imprecise too. This is the scope of this section. One solution to achieve this aim is to use the extension principle, based on a combination of the geodesic distances computed on each $\alpha$-cut of $\mu$. Let us denote by $d_{\mu_\alpha}(x, y)$ the geodesic distance between $x$ and $y$ in the crisp set $\mu_\alpha$. Using the extension principle, we define the degree to which the geodesic distance between $x$ and $y$ in $\mu$ is equal to $d$ as
$$\forall d \in \mathbb{R}^+,\quad d_\mu(x, y)(d) = \sup\{\, \alpha \in [0, 1] : d_{\mu_\alpha}(x, y) = d \,\}. \tag{8}$$
This definition satisfies the following properties:
(1) If $\alpha > c_\mu(x, y)$, then $x$ and $y$ belong to two distinct connected components of $\mu_\alpha$.² In this case, the (generalized) geodesic distance is infinite. If we restrict the evaluation of $d_\mu(x, y)(d)$ to finite distances $d$, then $d_\mu(x, y)(d) = 0$ for $d > d_{\mu_{c_\mu(x, y)}}(x, y)$.
(2) Let $d_E(x, y)$ denote the Euclidean distance between $x$ and $y$. It is the shortest of the geodesic distances that can be obtained in any crisp set that contains $x$ and $y$. This set can be for instance the whole space $S$, which can be assimilated to the $\alpha$-cut of level 0 ($\mu_0$). Therefore, for $d < d_E(x, y)$, we have $d_\mu(x, y)(d) = 0$.
(3) Since the $\alpha$-cuts are nested ($\mu_\alpha \subseteq \mu_{\alpha'}$ for $\alpha > \alpha'$), it follows that $d_{\mu_\alpha}(x, y)$ is increasing in $\alpha$ for $\alpha \le c_\mu(x, y)$. Therefore, $d_\mu(x, y)$ is a fuzzy number, with a maximum value for $d_{\mu_{c_\mu(x, y)}}(x, y)$ and a discontinuity at this point. Its shape is shown in Fig. 2.
² This holds because $c_\mu(x, y)$ corresponds to the "height" (in terms of membership values) of the point along the path that connects $x$ and $y$, i.e. the maximum, over the paths from $x$ to $y$, of the minimal height along each path.
Fig. 2. Typical shape of the fuzzy geodesic distance between two points in a fuzzy set, defined as a fuzzy number.
This definition can be normalized by dividing all values by $c_\mu(x, y)$, in order to get a maximum membership value equal to 1. One drawback of this definition is the discontinuity at $d_{\mu_{c_\mu(x, y)}}(x, y)$. It corresponds to the discontinuity existing in the crisp case when $x$ and $y$ belong to parts that become disconnected. Further work aims at exploiting features of fuzzy set theory in order to avoid this discontinuity, if this is found desirable.
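Eq. (8) can be evaluated on a small discrete image. The sketch assumes $\alpha$-cuts of the form $\{\mu \ge \alpha\}$, 4-connectivity and path lengths counted in steps. Because the cuts only change at membership values that occur in $\mu$ (on an interval $(v_k, v_{k+1}]$ between consecutive values the cut equals the cut at $v_{k+1}$), evaluating each distinct positive value gives the supremum of Eq. (8) exactly. `cut_distance` and `fuzzy_number_distance` are hypothetical names.

```python
import math
from collections import deque


def cut_distance(mu, x, y, alpha):
    """Geodesic distance (4-connected steps) from x to y in the crisp cut {mu >= alpha}."""
    rows, cols = len(mu), len(mu[0])
    if mu[x[0]][x[1]] < alpha:
        return math.inf
    dist = {x: 0}
    queue = deque([x])
    while queue:
        p = queue.popleft()
        if p == y:
            return dist[p]
        r, c = p
        for q in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= q[0] < rows and 0 <= q[1] < cols
                    and q not in dist and mu[q[0]][q[1]] >= alpha):
                dist[q] = dist[p] + 1
                queue.append(q)
    return math.inf


def fuzzy_number_distance(mu, x, y):
    """Eq. (8) restricted to finite d: membership of d = sup{alpha : d_{mu_alpha}(x, y) = d}."""
    levels = sorted({v for row in mu for v in row if v > 0})
    membership = {}
    for alpha in levels:
        d = cut_distance(mu, x, y, alpha)
        if d != math.inf:
            membership[d] = max(membership.get(d, 0.0), alpha)
    return membership


grid = [
    [1.0, 0.2, 1.0],
    [1.0, 0.2, 1.0],
    [1.0, 0.4, 1.0],
]
print(fuzzy_number_distance(grid, (0, 0), (0, 2)))  # {2: 0.2, 6: 0.4}
```

Here the largest membership, $0.4 = c_\mu(x, y)$, is attained at the distance 6 obtained in the cut at level $c_\mu(x, y)$, and nothing is defined beyond it, which is the discontinuity discussed above; dividing the memberships by 0.4 gives the normalized version.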
4. Fuzzy geodesic balls in a fuzzy set
Since several definitions of fuzzy geodesic distances exist or could be proposed, we keep the following definitions of fuzzy geodesic balls as general as possible. Therefore, everything that follows applies to any definition of a fuzzy geodesic distance, as a crisp number or as a fuzzy number.
4.1. General definition
In this section, we define fuzzy geodesic balls in a fuzzy set. Let us denote by $b_\mu(x, \rho)$ the fuzzy geodesic ball of center $x$ and radius $\rho$, conditionally on $\mu$. We define $b_\mu(x, \rho)$ as a fuzzy set on $S$, and $b_\mu(x, \rho)(y)$ denotes the membership value of any point $y$ of $S$ to the fuzzy geodesic ball. Intuitively, given that $x$ is in $\mu$ to some degree, for each point $y$ the value $b_\mu(x, \rho)(y)$ represents the fact that $y$ belongs to $\mu$ to some degree and that it is at a geodesic distance in $\mu$ from $x$ less than $\rho$. For that, $b_\mu(x, \rho)(y)$ is defined as a conjunction of three terms: the degree to which $x$ belongs to $\mu$, the degree to which $y$ belongs to $\mu$, and the degree $\delta(d_\mu(x, y) \le \rho)$ to which $d_\mu(x, y) \le \rho$, i.e.:
$$\forall y \in S,\quad b_\mu(x, \rho)(y) = t[\mu(x), \mu(y), \delta(d_\mu(x, y) \le \rho)], \tag{9}$$
where $t$ is a t-norm.
4.2. Simple example
Obviously, $\delta(d_\mu(x, y) \le \rho)$ should be a decreasing function of $d_\mu(x, y)$. If we consider that $d_\mu$ and $\rho$ are crisp numbers, we can choose a simple Heaviside function, such that
$$\delta(d_\mu(x, y) \le \rho) = \begin{cases} 1 & \text{if } d_\mu(x, y) \le \rho, \\ 0 & \text{otherwise}. \end{cases} \tag{10}$$
Then we derive
$$\forall y \in S,\quad b_\mu(x, \rho)(y) = \begin{cases} t[\mu(x), \mu(y)] & \text{if } d_\mu(x, y) \le \rho, \\ 0 & \text{otherwise}. \end{cases} \tag{11}$$
A fuzzy ball is therefore a subset of $\mu$ constituted of points $y$ which are at a geodesic distance from $x$ less than $\rho$, and whose membership degrees are bounded by $\mu(x)$. In this case, we assume that the value of interest $\rho$ is precisely defined, which may appear restrictive in a fuzzy context.
4.3. Comparison of two fuzzy numbers
If we consider that some imprecision is attached to $\rho$, rather than considering it as crisp, then we can choose a smoother function, depending on the amount of imprecision attached to $\rho$. The problem with this approach is that the chosen decreasing function is somewhat arbitrary, and probably difficult to tune for specific applications. Therefore, we propose another approach, where the link between this function and the imprecision of $\rho$ is made more explicit. To this aim, we consider $\rho$ as a fuzzy number. Defining $\delta(d_\mu(x, y) \le \rho)$ then calls for the comparison of fuzzy numbers: $d_\mu(x, y)$ is less than $\rho$ if $d_\mu(x, y)$ is equal to the minimum of $d_\mu(x, y)$ and $\rho$. The minimum of two fuzzy numbers has been defined in Ref. [22] as follows. Let $d$ and $\rho$ be two fuzzy numbers. From the definition of fuzzy numbers, the $\alpha$-cuts of $d$ and $\rho$ are bounded intervals, denoted by $[d_\alpha^-, d_\alpha^+]$ and $[\rho_\alpha^-, \rho_\alpha^+]$, respectively. The minimum of $d$ and $\rho$ is then the fuzzy number whose $\alpha$-cuts are
$$\min(d, \rho)_\alpha = [\min(d_\alpha^-, \rho_\alpha^-), \min(d_\alpha^+, \rho_\alpha^+)]. \tag{12}$$
Let us denote by $[\rho_0, \rho_2]$ the support of $\rho$ and by $\rho_1$ its modal value. We use similar notations for $d$. Four configurations are illustrated in Fig. 3, corresponding to different rankings of $\rho_0$ and $d_0$, $\rho_1$ and $d_1$, $\rho_2$ and $d_2$. The four other possible configurations can be easily deduced by symmetry (by exchanging the roles of $d$ and $\rho$).
Fig. 3. Minimum of two fuzzy numbers $d$ and $\rho$ (thick dashed line). Top left: $d_0 < \rho_0$, $d_1 < \rho_1$, $d_2 < \rho_2$; the minimum is equal to $d$. Top right: $d_0 < \rho_0$, $d_1 > \rho_1$, $d_2 < \rho_2$; the minimum is equal to $d$ until the first intersection between $d$ and $\rho$, then equal to $\rho$ until the third intersection, and then equal to $d$ again. Bottom left: $d_0 < \rho_0$, $d_1 < \rho_1$, $d_2 > \rho_2$; the minimum is equal to $d$ until the second intersection, and then to $\rho$. Bottom right: $d_0 < \rho_0$, $d_1 > \rho_1$, $d_2 > \rho_2$; the minimum is equal to $d$ until the first intersection, and then equal to $\rho$.
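Eq. (12) can be checked numerically with triangular fuzzy numbers. The sketch below assumes triangular shapes written as (left, mode, right), represents a crisp distance as a degenerate triangle, and recovers memberships by sampling $\alpha$ on a finite grid of levels; the names are illustrative, and the sampling makes the recovered memberships approximate in general.

```python
def tri_cut(tri, alpha):
    """alpha-cut [lo, hi] of a triangular fuzzy number tri = (left, mode, right)."""
    left, mode, right = tri
    return (left + alpha * (mode - left), right - alpha * (right - mode))


def fuzzy_min_cut(d, rho, alpha):
    """Eq. (12): min(d, rho)_alpha = [min(d-, rho-), min(d+, rho+)]."""
    d_lo, d_hi = tri_cut(d, alpha)
    r_lo, r_hi = tri_cut(rho, alpha)
    return (min(d_lo, r_lo), min(d_hi, r_hi))


def membership_from_cuts(cut, z, levels=1000):
    """Approximate membership of z: the largest sampled alpha whose cut contains z."""
    best = 0.0
    for k in range(levels + 1):
        alpha = k / levels
        lo, hi = cut(alpha)
        if lo <= z <= hi:
            best = alpha
    return best


rho = (1.0, 2.0, 3.0)  # support [1, 3], modal value 2
d = (1.5, 1.5, 1.5)    # a crisp distance written as a degenerate triangle


def min_d_rho(alpha):
    return fuzzy_min_cut(d, rho, alpha)


print([membership_from_cuts(min_d_rho, z) for z in (1.25, 1.5, 1.75)])  # [0.25, 1.0, 0.0]
```

The three printed values follow the case $\rho_0 \le d \le \rho_1$ treated in Section 4.4: the minimum takes the value $\rho(z)$ to the left of the crisp distance, 1 at the distance itself, and 0 beyond it.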
4.4. Detailed expression for the geodesic distance defined as a number
Let us detail the analytical expression of $\delta(d_\mu(x, y) \le \rho)$ in the case where the fuzzy geodesic distance is defined as a crisp number. Applying Eq. (12) in the case where $d_\mu(x, y)$ is a crisp number, we come up with the following result, for every real number $z$:
• if $d_\mu(x, y) \le \rho_0$:
$$\min(d_\mu(x, y), \rho)(z) = \begin{cases} 1 & \text{if } z = d_\mu(x, y), \\ 0 & \text{if } z \neq d_\mu(x, y), \end{cases} \tag{13}$$
• if $\rho_0 \le d_\mu(x, y) \le \rho_1$:
$$\min(d_\mu(x, y), \rho)(z) = \begin{cases} \rho(z) & \text{if } z < d_\mu(x, y), \\ 1 & \text{if } z = d_\mu(x, y), \\ 0 & \text{if } z > d_\mu(x, y), \end{cases} \tag{14}$$
• if $\rho_1 \le d_\mu(x, y) \le \rho_2$:
$$\min(d_\mu(x, y), \rho)(z) = \begin{cases} \rho(z) & \text{if } z \le d_\mu(x, y), \\ 0 & \text{if } z > d_\mu(x, y), \end{cases} \tag{15}$$
• if $d_\mu(x, y) \ge \rho_2$:
$$\min(d_\mu(x, y), \rho)(z) = \rho(z). \tag{16}$$
To have $d_\mu(x, y) \le \rho$ is equivalent to having $d_\mu(x, y) = \min(d_\mu(x, y), \rho)$, i.e. $d_\mu(x, y) \subseteq \min(d_\mu(x, y), \rho)$ and $\min(d_\mu(x, y), \rho) \subseteq d_\mu(x, y)$. This last form can easily be translated into fuzzy terms, in a way similar to the one used in Ref. [2], as
$$\delta(d_\mu(x, y) \le \rho) = t\Big[ \inf_z T\big[c(d_\mu(x, y))(z), \min(d_\mu(x, y), \rho)(z)\big],\ \inf_z T\big[d_\mu(x, y)(z), c(\min(d_\mu(x, y), \rho))(z)\big] \Big],$$
where $t$ is a t-norm, $c$ a fuzzy complementation (typically $c(z) = 1 - z$) and $T$ a t-conorm, dual of $t$ with respect to $c$. This leads to the following result:
$$\delta(d_\mu(x, y) \le \rho) = \begin{cases} 1 & \text{if } d_\mu(x, y) \le \rho_0, \\ \inf_{z \le d_\mu(x, y)} c(\rho)(z) & \text{if } \rho_0 \le d_\mu(x, y) \le \rho_1, \\ 0 & \text{if } d_\mu(x, y) \ge \rho_1. \end{cases} \tag{17}$$
Since $\rho$ is increasing on $[\rho_0, \rho_1]$ (as it is a fuzzy number), we obtain
$$\delta(d_\mu(x, y) \le \rho) = \begin{cases} 1 & \text{if } d_\mu(x, y) \le \rho_0, \\ c(\rho)(d_\mu(x, y)) & \text{if } \rho_0 \le d_\mu(x, y) \le \rho_1, \\ 0 & \text{if } d_\mu(x, y) \ge \rho_1. \end{cases} \tag{18}$$
Finally, we obtain
$$\forall y \in S,\quad b_\mu(x, \rho)(y) = \begin{cases} 0 & \text{if } d_\mu(x, y) \ge \rho_1, \\ t[\mu(x), \mu(y), c(\rho)(d_\mu(x, y))] & \text{otherwise}. \end{cases} \tag{19}$$
If $t$ is chosen, for instance, as the product, $\mu(x)$ and $\mu(y)$ appear as weighting factors.
This definition may appear severe. For instance, values that are a little smaller than $\rho_1$ have very low degrees of being less than $\rho$, although they are less than the modal value of $\rho$. A more "optimistic" definition can be derived from the relationship "to the left of", as introduced in Ref. [26], but applied here in a simpler 1D case. In this approach, we define in the considered space a "fuzzy landscape" representing, for each point, the degree to which this point is in a direction $u$ from a reference set or a fuzzy set. Here, the space is one-dimensional and equal to $\mathbb{R}^+$. The reference fuzzy set is $\rho$. The direction corresponding to the relationship "to the left of" is the opposite of the unit vector on the real line (horizontal line in Fig. 4). According to the definitions provided in the general case in Ref. [27], the degree to which a point $P$ is to the left of $\rho$ is defined as
$$\mu_{\mathrm{left}}(\rho)(P) = \max_Q t[\rho(Q), f(\theta(P, Q))], \tag{20}$$
where $t$ is a t-norm, $f$ is a decreasing function on $[0, \pi]$ with $f(0) = 1$ and $f(\theta) = 0$ for $\theta \ge \pi/2$, and $\theta$ is defined as
$$\theta(P, Q) = \arccos\left( \frac{\vec{QP} \cdot u}{\|\vec{QP}\|} \right) \quad \text{and} \quad \theta(P, P) = 0. \tag{21}$$
Let $x_P$ and $x_Q$ be the coordinates of $P$ and $Q$ on the horizontal axis. We have $\vec{QP} \cdot u = x_Q - x_P$, and therefore
$$\theta(P, Q) = \begin{cases} 0 & \text{if } x_Q > x_P, \\ \pi & \text{if } x_Q < x_P. \end{cases} \tag{22}$$
The first case corresponds to $P$ being to the left of $Q$ and the second one to $P$ being to the right of $Q$. These results lead to the following expression of $\mu_{\mathrm{left}}(\rho)(P)$:
$$\mu_{\mathrm{left}}(\rho)(P) = \max_{x_Q \ge x_P} \rho(x_Q). \tag{23}$$
It leads to
$$\delta(d_\mu(x, y) \le \rho) = \begin{cases} 1 & \text{if } d_\mu(x, y) \le \rho_1, \\ \rho(d_\mu(x, y)) & \text{if } d_\mu(x, y) \ge \rho_1. \end{cases} \tag{24}$$
These definitions are illustrated in Fig. 4.
Fig. 4. Illustration of the definition of $\delta(d_\mu(x, y) \le \rho)$ using the minimum of two fuzzy numbers (continuous dark line) and using the relation "to the left of" (dashed line).
The proposed definition of a fuzzy geodesic ball applies directly to any other definition of the fuzzy geodesic distance, represented either as a crisp number or as a fuzzy number. Also, the following properties hold and are not restricted to the particular form of the fuzzy geodesic distance we use.
4.5. Properties
The proposed definitions of fuzzy geodesic balls share the following properties:
(1) $b_\mu(x, \rho)(x) = \mu(x)$ (since $\delta(d_\mu(x, x) \le \rho) = 1$, and 1 is the unit element of any t-norm);
(2) $b_\mu(x, \rho)(y) \le \mu(x)$ (since for any t-norm, we have $\forall (a, b) \in [0, 1]^2$, $t(a, b) \le a$ and $t(a, b) \le b$);
(3) $b_\mu(x, \rho)(y) \le \mu(y)$;
(4) if $d_\mu(x, y)$ and $\rho$ are crisp numbers, $\delta(d_\mu(x, y) \le \rho)$ is binary, and equal to 1 iff $d_\mu(x, y) \le \rho$ (by construction);
(5) if $\mu$, $d_\mu$ and $\rho$ are crisp, then $b_\mu(x, \rho)$ is the crisp geodesic ball; therefore compatibility with the binary case is achieved (this comes from the limit values taken by any t-norm, which correspond exactly to a binary intersection: $t(0, 1) = t(1, 0) = t(0, 0) = 0$ and $t(1, 1) = 1$);
(6) spatial invariance: $b_\mu(x, \rho)$ is invariant under translation and rotation;
(7) monotony with respect to $\rho$: if $\rho$ and $\rho'$ are such that $\rho_1 \le \rho'_1$ and $\rho \le \rho'$ on $[\rho_0, \rho_1]$ (which is typically the case if $\rho'$ is just a translation of $\rho$), then $b_\mu(x, \rho) \le b_\mu(x, \rho')$, expressing that a fuzzy geodesic ball is included in a fuzzy geodesic ball with the same center and a "larger" radius;
(8) a fuzzy geodesic ball is always included in the Euclidean ball of the same radius.
These properties are the fuzzy equivalents of the properties of crisp geodesic balls, which shows the consistency of the proposed extension.
5. Fuzzy geodesic mathematical morphology
In order to extend geodesic morphological operations to fuzzy sets, we translate Eqs. (2) and (3) into fuzzy terms. The idea is to replace formally every binary concept by its fuzzy equivalent, as presented in Section 2.
5.1. Definitions of basic fuzzy geodesic operators
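In the geodesic case, we use rules similar to those of Section 2 to translate Eqs. (2) and (3) into fuzzy terms. This leads to the following definitions of the fuzzy geodesic dilation and erosion of $\mu'$ conditionally to $\mu$, where $t$ is a t-norm, $T$ a t-conorm and $c$ a complementation:

$$\forall x \in S, \quad D^\nu_\mu(\mu')(x) = \sup_{y \in S} t\big[b_\mu(x,\nu)(y), \mu'(y)\big], \qquad (25)$$

$$\forall x \in S, \quad E^\nu_\mu(\mu')(x) = \inf_{y \in S} T\big[c\big(b_\mu(x,\nu)(y)\big), \mu'(y)\big]. \qquad (26)$$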
From these two basic operators, other ones can be defined, as is done in classical morphology. For instance, fuzzy geodesic opening and closing are simply defined as $O^\nu_\mu(\mu') = D^\nu_\mu[E^\nu_\mu(\mu')]$ and $C^\nu_\mu(\mu') = E^\nu_\mu[D^\nu_\mu(\mu')]$.

5.2. Properties

The proposed definitions of fuzzy geodesic dilation and erosion have the following properties, which are similar to the properties of classical geodesic operators:
(1) compatibility with the crisp case: if $\mu$, $\mu'$ and $\nu$ are crisp, the definitions are equivalent to the binary geodesic operators;
(2) duality with respect to complementation: $\forall x \in S,\ D^\nu_\mu[c(\mu')](x) = c[E^\nu_\mu(\mu')](x)$, assuming that the t-norm and the t-conorm used in dilation and erosion, respectively, are dual with respect to the complementation $c$;
(3) the result of the geodesic dilation of $\mu'$ conditionally to $\mu$ is included in $\mu$: $\forall x \in S,\ D^\nu_\mu(\mu')(x) \le \mu(x)$, expressing that the transformed set stays inside the conditioning set;
(4) invariance with respect to geometrical transformations, and local knowledge property;
(5) increasingness: $\mu' \le \mu'' \Rightarrow \forall x \in S,\ D^\nu_\mu(\mu')(x) \le D^\nu_\mu(\mu'')(x)$;
I. Bloch / Pattern Recognition 33 (2000) 897–905
(6) restricted extensivity: $\forall x \in S,\ D^\nu_\mu(\mu')(x) \ge t[\mu(x), \mu'(x)]$;
(7) interpretation: rewriting the expression of fuzzy geodesic dilation leads to

$$D^\nu_\mu(\mu')(x) = \sup_{y \in S} t\Big[t\big[\mu(x), \mu(y), d(d_\mu(x,y) \le \nu)\big], \mu'(y)\Big]$$

and, since a t-norm is commutative, associative and increasing,

$$D^\nu_\mu(\mu')(x) = t\Big[\mu(x), \sup_{y \in S} t\big[\mu(y), \mu'(y), d(d_\mu(x,y) \le \nu)\big]\Big].$$

This represents the intersection of $\mu$ with the dilation of $\mu'$ performed on a neighborhood containing the points $y$ of $\mu$ (the conditioning aspect) such that $d_\mu(x,y) \le \nu$ (the geodesic distance aspect). This interpretation is in complete agreement with what is expected from a geodesic dilation.
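To make Eqs. (23) to (26) concrete, the following Python sketch evaluates them on a small 1D example. It is an illustration under stated assumptions, not the author's implementation: it uses the minimum t-norm, the maximum t-conorm and the complementation c(a) = 1 - a (a dual triple, as property (2) requires); it replaces the fuzzy geodesic distance d_μ by a precomputed matrix (here simply |x - y|, a hypothetical stand-in); and the fuzzy number ν (`nu_trap`, with ν_1 = 1) and the membership values `MU` and `MU_P` are invented for the example. The ball uses the form t[μ(x), μ(y), d(d_μ(x,y) ≤ ν)] that appears inside property (7).

```python
from typing import Callable, List, Sequence


# Assumed dual triple: min t-norm, max t-conorm and complementation c(a) = 1 - a.
def t_norm(*values: float) -> float:
    return min(values)


def t_conorm(*values: float) -> float:
    return max(values)


def complement(a: float) -> float:
    return 1.0 - a


def nu_trap(d: float) -> float:
    """Hypothetical fuzzy number nu: 1 on [0, 1], then linear down to 0 at 3."""
    return 1.0 if d <= 1.0 else max(0.0, (3.0 - d) / 2.0)


NU1 = 1.0  # nu_1 for nu_trap: nu(nu_1) = 1 and nu is non-increasing beyond it


def degree_leq_grid(d: float, nu: Callable[[float], float], grid: Sequence[float]) -> float:
    """Eq. (23) on a finite grid: max of nu(x_Q) over grid points x_Q >= d."""
    return max((nu(q) for q in grid if q >= d), default=0.0)


def degree_leq(d: float, nu: Callable[[float], float] = nu_trap, nu1: float = NU1) -> float:
    """Eq. (24): degree to which the crisp distance d is <= nu."""
    return 1.0 if d <= nu1 else nu(d)


def ball(mu: Sequence[float], dist: List[List[float]], x: int) -> List[float]:
    """Fuzzy geodesic ball b_mu(x, nu)(y) = t[mu(x), mu(y), d(d_mu(x, y) <= nu)]."""
    return [t_norm(mu[x], mu[y], degree_leq(dist[x][y])) for y in range(len(mu))]


def dilate(mu: Sequence[float], mu_p: Sequence[float], dist: List[List[float]]) -> List[float]:
    """Eq. (25): fuzzy geodesic dilation of mu_p conditionally on mu."""
    out = []
    for x in range(len(mu)):
        b = ball(mu, dist, x)
        out.append(max(t_norm(b[y], mu_p[y]) for y in range(len(mu))))
    return out


def erode(mu: Sequence[float], mu_p: Sequence[float], dist: List[List[float]]) -> List[float]:
    """Eq. (26): fuzzy geodesic erosion of mu_p conditionally on mu."""
    out = []
    for x in range(len(mu)):
        b = ball(mu, dist, x)
        out.append(min(t_conorm(complement(b[y]), mu_p[y]) for y in range(len(mu))))
    return out


# Hypothetical data on a 1D domain S = {0, ..., 4}; |x - y| stands in for d_mu.
MU = [0.2, 0.9, 1.0, 0.8, 0.3]
MU_P = [0.0, 0.6, 0.1, 0.0, 0.5]
DIST = [[float(abs(x - y)) for y in range(len(MU))] for x in range(len(MU))]

if __name__ == "__main__":
    print("dilation:", dilate(MU, MU_P, DIST))
    print("erosion: ", erode(MU, MU_P, DIST))
```

Because ν is non-increasing beyond ν_1, the grid maximum of Eq. (23) coincides with the closed form of Eq. (24); properties (2), (3) and (6) of Section 5.2 can be checked on the printed values.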
6. Conclusion
We presented in this paper an original way to define fuzzy geodesic morphological operators, based on fuzzy geodesic distance. We proposed definitions of these operators and of fuzzy geodesic balls that have good features: they deal with a direct representation of spatial imprecision in the fuzzy sets, they are consistent with existing binary definitions, and they have good formal properties, in agreement with the formal properties of crisp definitions and with intuitive requirements. Future work aims at investigating further properties of these definitions, comparing their different possible instantiations, and evaluating their applications in image processing problems under imprecision.
References
[1] I. Bloch, H. Maître, Constructing a fuzzy mathematical morphology: alternative ways, Second IEEE International Conference on Fuzzy Systems, FUZZ IEEE 93, San Francisco, California, March 1993, pp. 1303–1308.
[2] I. Bloch, H. Maître, Fuzzy mathematical morphologies: a comparative study, Pattern Recognition 28 (9) (1995) 1341–1387.
[3] D. Sinha, E. Dougherty, Fuzzy mathematical morphology, J. Visual Commun. Image Representation 3 (3) (1992) 286–302.
[4] B. De Baets, Idempotent closing and opening operations in fuzzy mathematical morphology, ISUMA-NAFIPS'95, College Park, MD, September 1995, pp. 228–233.
[5] V. di Gesu, M.C. Maccarone, M. Tripiciano, Mathematical morphology based on fuzzy operators, in: R. Lowen, M. Roubens (Eds.), Fuzzy Logic, Kluwer Academic, Dordrecht, 1993, pp. 477–486.
[6] A.T. Popov, Morphological operations on fuzzy sets, in: IEE Image Processing and its Applications, Edinburgh, UK, July 1995, pp. 837–840.
[7] D. Sinha, P. Sinha, E.R. Dougherty, S. Batman, Design and analysis of fuzzy morphological algorithms for image processing, IEEE Trans. Fuzzy Systems 5 (4) (1997) 570–584.
[8] I. Bloch, Distances in fuzzy sets for image processing derived from fuzzy mathematical morphology (invited conference), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Granada, Spain, July 1996, pp. 1307–1312.
[9] I. Bloch, On links between fuzzy morphology and fuzzy distances: Euclidean and geodesic cases (invited conference), in: Information Processing and Management of Uncertainty IPMU'98, Paris, 1998, pp. 1144–1151.
[10] I. Bloch, Fuzzy morphology and fuzzy distances: new definitions and links in both Euclidean and geodesic cases, in: A. Ralescu (Ed.), Lecture Notes in Artificial Intelligence: Fuzzy Logic in Artificial Intelligence, towards Intelligent Systems, Springer, Berlin, 1998.
[11] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London, 1982.
[12] J. Serra, in: J. Serra (Ed.), Image Analysis and Mathematical Morphology, Part II: Theoretical Advances, Academic Press, London, 1988.
[13] M. Schmitt, J. Mattioli, Morphologie Mathématique, Masson, Paris, 1994.
[14] C. Lantuejoul, F. Maisonneuve, Geodesic methods in image analysis, Pattern Recognition 17 (2) (1984) 177–187.
[15] I. Bloch, Fuzzy geodesic mathematical morphology from fuzzy geodesic distance, in: H. Heijmans, J. Roerdink (Eds.), Mathematical Morphology and its Applications to Image and Signal Processing, Kluwer Academic, Amsterdam, 1998, pp. 43–50.
[16] I. Bloch, Fuzzy geodesic distance in images, in: A. Ralescu, T. Martin (Eds.), Lecture Notes in Artificial Intelligence: Fuzzy Logic in Artificial Intelligence, towards Intelligent Systems, Springer, Berlin, 1996, pp. 153–166.
[17] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338–353.
[18] I. Bloch, Image information processing using fuzzy sets (invited conference), World Automation Congress, Soft Computing with Industrial Applications, Montpellier, France, May 1996, pp. 79–84.
[19] L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Inform. Sci. 8 (1975) 199–249.
[20] I. Bloch, H. Maître, M. Anvari, Fuzzy adjacency between image objects, Int. J. Uncertainty, Fuzziness Knowledge-Based Systems 5 (6) (1997) 615–653.
[21] I. Bloch, On fuzzy distances and their use in image processing under imprecision, Pattern Recognition 32 (11) (1999) 1873–1895.
[22] D. Dubois, H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[23] D. Dubois, H. Prade, A review of fuzzy set aggregation connectives, Inform. Sci. 36 (1985) 85–121.
[24] R.R. Yager, On a general class of fuzzy connectives, Fuzzy Sets and Systems 4 (1980) 235–242.
[25] A. Rosenfeld, The fuzzy geometry of image subsets, Pattern Recognition Lett. 2 (1984) 311–317.
[26] I. Bloch, Fuzzy relative position between objects in images: a morphological approach, IEEE International Conference on Image Processing ICIP'96, Vol. II, Lausanne, September 1996, pp. 987–990.
[27] I. Bloch, Fuzzy relative position between objects in image processing: a morphological approach, IEEE Trans. Pattern Anal. Mach. Intell. 21 (5) (1999).
About the Author: ISABELLE BLOCH is professor at ENST Paris (Signal and Image Department), and is in charge of the Image Processing and Interpretation Group. She graduated from Ecole des Mines de Paris in 1986, received a Ph.D. from ENST Paris in 1990, and the "Habilitation à Diriger des Recherches" from University Paris 5 in 1995. Her research interests include 3D image and object processing, structural pattern recognition, 3D and fuzzy mathematical morphology, decision theory, data fusion in image processing, fuzzy set theory, evidence theory, medical imaging, aerial and satellite imaging.
Pattern Recognition 33 (2000) 907–916
An efficient watershed algorithm based on connected components
A. Bieniek*, A. Moga
Institute for Computer Science, Albert-Ludwigs-Universität Freiburg, Chair of Pattern Recognition and Image Processing, Universitätsgelände Flugplatz, D-79085 Freiburg i.Br., Germany
Accepted 27 July 1999
Abstract
In this paper, a formal definition and a new algorithmic technique for the watershed transformation are presented. The novelty of the approach is to adapt the connected component operator to solve the watershed segmentation problem. The resulting algorithm is independent of the number of grey levels, employs simple data structures, requires less error-prone memory management, and has lower complexity and a short running time. However, the algorithm does not modify the principle of the watershed segmentation; the output result is the same as that of any traditional algorithm which does not build watershed lines.
© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Watersheds; Image segmentation; Connected components
1. Introduction
The watershed transformation is a popular image segmentation algorithm for grey-scale images. The traditional watershed algorithm simulates a flooding process. Thus, an image is identified with a topographical surface, in which the altitude of every point is equal to the grey level of the corresponding pixel. Holes are then pierced in all regional minima of the relief (connected plateaus of constant altitude from which it is impossible to reach a location of lower altitude without having to climb). As the whole surface is slowly sunk into a lake, water springs through the holes and progressively immerses the adjacent walls. To prevent streams of water coming from different holes from intermingling, an obstacle is set up at the meeting locations. Once the relief is completely covered by water, the set of obstacles depicts the watershed image.
* Corresponding author. Tel.: +49-611-714-6736; fax: +49-611-714-6736. E-mail address: [email protected] (A. Bieniek)
Various definitions of watersheds have been proposed in the literature for both digital and continuous spaces [1–5]. Most algorithms label each pixel with the identifier of its catchment basin, and no watershed lines are explicitly constructed. In this paper, we present a new algorithm to perform the watershed transformation which does not construct watershed lines. The algorithm produces the same segmentation result as the techniques in Refs. [1–3], but with a simpler algorithmic construction and hence lower complexity. The traditional implementation of the watershed segmentation algorithm simulates the flooding process over the image surface. First, regional minima are detected and uniquely labelled with integer values. Then, the algorithm simulates the flooding process using a hierarchical queue [1,2]. Such a queue consists of H first-in-first-out (FIFO) queues, one for each of the H grey levels in the image; the size of the hth FIFO queue is given by the number of pixels in the image having grey level h. This data structure imposes the order in which pixels are accessed. Initially, the hierarchical queue contains the seeds for the flooding, i.e. the minima pixels lying on the interface line between the regional minima and
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00154-5
A. Bieniek, A. Moga / Pattern Recognition 33 (2000) 907–916
the non-minima pixels; a pixel of grey level h is introduced into the hth FIFO queue of the hierarchical queue. The hierarchical queue is then parsed from the lowest grey level to the highest one. A pixel p, removed from the queue, propagates its label to all its neighbours which have not already been reached by the flooding. These neighbours are introduced, in turn, into the queue of their grey level. Serving the candidate pixels of the same connected plateau in FIFO order ensures the synchronous breadth-first propagation of labels coming from different minima inside a plateau. When all FIFO queues have been emptied, each pixel has been appended to a single region and the procedure stops. The image of labelled pixels depicts the segmentation result. For a simple input image, the flooding process, illustrated by arrows, is shown in Fig. 1. Following the flooding scheme in Fig. 1, we developed a formalism which allows us to determine for every pixel p a neighbouring pixel q from which p will be flooded. As in other watershed formalisms, q may not be unique; in such a case, q is arbitrarily chosen among the candidate pixels. Embedding this local "connectivity" relation between neighbouring pixels of the same catchment basin into the image (a technique also known as arrowing [2]) yields a directed graph, whose connected components [6,7] must be computed. The novelty of our approach is to effectively apply the connected component operator [6,7] to compute catchment basins. Preliminary results for this approach have been published in Ref. [8], and a modified version, which constructs watershed pixels according to the definitions of Meyer [2], has recently been reported in Ref. [9]. However, the connected component technique has been previously used in Refs.
[10–12] for the parallelization of the watershed transformation. The paper is organized as follows. In Section 2, a formal definition of watersheds in digital space for images without non-minima plateaus is presented. In Section 3, our formalism is compared with Meyer's definition of watersheds [2]. Further on, the proposed definitions lead to a connected component-like watershed algorithm for images without non-minima plateaus in Section 4. The definitions are extended to images with non-minima plateaus in Section 5, and the corresponding algorithm follows in Section 6. In Section 7, the complexity analysis of the algorithm and timing results are presented, while conclusions are drawn in Section 8.
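For reference, the classical flooding described above can be sketched in a few lines of Python. This is not the implementation of Refs. [1,2], nor the one timed in Table 1. It assumes an 8-neighbourhood, labels the regional minima 1, 2, ... in raster order, seeds the queue with all minima pixels rather than only those on the interface, and emulates the H FIFO queues with a single binary heap keyed by (grey level, insertion counter), which should serve pixels in the same order as a hierarchical queue. A pixel lower than the level being flooded is served at the current level, and no watershed lines are built.

```python
import heapq
from itertools import count
from typing import Iterator, List, Tuple

Image = List[List[int]]


def neighbours(i: int, j: int, rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """8-neighbourhood (an assumption of this sketch)."""
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols:
                yield i + di, j + dj


def regional_minima(f: Image) -> List[List[int]]:
    """Label each regional minimum 1, 2, ... in raster order; 0 elsewhere."""
    rows, cols = len(f), len(f[0])
    labels = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    next_label = 1
    for i in range(rows):
        for j in range(cols):
            if seen[i][j]:
                continue
            # Collect the flat zone of (i, j) and test whether it has a lower neighbour.
            zone, stack, is_minimum = [], [(i, j)], True
            seen[i][j] = True
            while stack:
                a, b = stack.pop()
                zone.append((a, b))
                for c, d in neighbours(a, b, rows, cols):
                    if f[c][d] < f[a][b]:
                        is_minimum = False
                    elif f[c][d] == f[a][b] and not seen[c][d]:
                        seen[c][d] = True
                        stack.append((c, d))
            if is_minimum:
                for a, b in zone:
                    labels[a][b] = next_label
                next_label += 1
    return labels


def flood(f: Image) -> List[List[int]]:
    """Flooding with a heap emulating the hierarchical queue; no watershed lines."""
    rows, cols = len(f), len(f[0])
    labels = regional_minima(f)
    order = count()  # insertion counter: FIFO service within one grey level
    heap = [(f[i][j], next(order), i, j)
            for i in range(rows) for j in range(cols) if labels[i][j]]
    heapq.heapify(heap)
    while heap:
        level, _, i, j = heapq.heappop(heap)
        for a, b in neighbours(i, j, rows, cols):
            if labels[a][b] == 0:  # not yet reached by the flooding
                labels[a][b] = labels[i][j]
                # A pixel below the current level is served at the current level.
                heapq.heappush(heap, (max(f[a][b], level), next(order), a, b))
    return labels
```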
2. Segmentation based on local conditions
In this section, we present a definition of the watershed segmentation for images without non-minima plateaus. The reason to consider only such images is that every pixel, except minima pixels, has at least one lower neighbour, i.e. the image is lower complete [3,11]. The extension to images with non-minima plateaus is presented in Section 5. Let f(p) be a function of grey levels, representing a digital image with domain $\Omega \subset \mathbb{Z}^2$. Each pixel $p \in \Omega$ has a grey level f(p) and a set of neighbouring pixels $p' \in N(p)$, with a distance function dist(p, p') to each neighbour. In most cases, a 4- or 8-square neighbourhood is used with a constant distance of 1 to all neighbouring pixels. Before giving our definition of watershed segmentation and catchment basins, some preliminary definitions are introduced:
Fig. 1. Sequential watershed: (a) flooding the input image, (b) output image of labels.
Definition 1 (Lower slope). The lower slope of a pixel p is given by the maximal ratio (f(p) - f(p'))/dist(p, p') over all its neighbouring pixels of lower grey level than itself:

$$LS(p) = \max_{\{p' \in N(p) \,\mid\, f(p') \le f(p)\}} \frac{f(p) - f(p')}{\mathrm{dist}(p, p')}$$

and is not defined for the case $f(p') > f(p),\ \forall p' \in N(p)$ [2]. The lower slope defines the maximum steepness from a pixel to its lower neighbours. Each pixel in the image, excluding minima, has a steepest neighbourhood:

Definition 2 (Steepest neighbourhood). $\forall p \in \Omega$, NLS(p) is the set of pixels $p' \in N(p)$ defined as follows:

$$NLS(p) = \left\{ p' \in N(p) \;\middle|\; \frac{f(p) - f(p')}{\mathrm{dist}(p, p')} = LS(p),\; f(p') < f(p) \right\}.$$

For the case dist(p, p') = 1, $\forall p' \in N(p)$, the set becomes

$$NLS(p) = \left\{ p' \in N(p) \;\middle|\; f(p') = \min_{p'' \in N(p)} f(p''),\; f(p') < f(p) \right\}.$$

A similar definition also exists in Ref. [2]. Note that in an image without non-minima plateaus, $NLS(p) \ne \emptyset$ for every $p \in \Omega$ that is not a minimum. In addition, the path of steepest descent from a pixel p down to a regional minimum $m_i$ passes only through pixels of the set $\bigcup_{p'=p}^{m_i} NLS(p')$. A definition of a catchment basin and of watershed segmentation based on the steepest neighbourhood, different from the one in Ref. [2], is given next.

Definition 3 (Watershed segmentation for images without non-minima plateaus). For any image without non-minima plateaus, a segmentation is called a watershed segmentation if every regional minimum $m_i$ has a unique label $L(m_i)$ and, for every pixel $p \in \Omega$ with $NLS(p) \ne \emptyset$, the following condition holds: $\exists p' \in NLS(p)$ such that $L(p) = L(p')$.

Definition 4 (Catchment basin). For the watershed segmentation defined above, a catchment basin $CBLC(m_i)$ of the regional minimum $m_i$ is the set of pixels with the label $L(m_i)$:

$$CBLC(m_i) = \{ p \mid L(p) = L(m_i) \}.$$

$CBLC(p \to m_i)$ denotes the catchment basin of $m_i$ containing pixel p. The definition of watershed segmentation and catchment basin does not imply uniqueness of the segmentation result; in general, an image may have several valid watershed segmentations.
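Definitions 1 to 3 translate directly into code. The Python sketch below assumes an 8-neighbourhood with dist(p, p') = 1 (so slopes are integer grey-level differences and the equality test of Definition 2 is exact) and a lower-complete input; the function names are hypothetical. `is_watershed_segmentation` checks both parts of Definition 3: one label per regional minimum, distinct across minima, and the local condition that some p' in NLS(p) carries the label of p.

```python
from collections import deque
from typing import Iterator, List, Optional, Set, Tuple

Pixel = Tuple[int, int]
Image = List[List[int]]


def neighbours(p: Pixel, rows: int, cols: int) -> Iterator[Pixel]:
    """8-neighbourhood of p (assumed); every neighbour is at distance 1."""
    i, j = p
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols:
                yield (i + di, j + dj)


def lower_slope(f: Image, p: Pixel) -> Optional[int]:
    """Definition 1 with unit distances; None where LS(p) is undefined."""
    i, j = p
    drops = [f[i][j] - f[a][b] for a, b in neighbours(p, len(f), len(f[0]))
             if f[a][b] <= f[i][j]]
    return max(drops) if drops else None


def steepest_neighbourhood(f: Image, p: Pixel) -> Set[Pixel]:
    """Definition 2: strictly lower neighbours reached with slope LS(p)."""
    i, j = p
    ls = lower_slope(f, p)
    return {(a, b) for a, b in neighbours(p, len(f), len(f[0]))
            if f[a][b] < f[i][j] and f[i][j] - f[a][b] == ls}


def is_watershed_segmentation(f: Image, labels: List[List[int]]) -> bool:
    """Definition 3 for a lower-complete image f."""
    rows, cols = len(f), len(f[0])
    nls = {(i, j): steepest_neighbourhood(f, (i, j))
           for i in range(rows) for j in range(cols)}
    # Local condition: some p' in NLS(p) carries the label of p.
    for (i, j), lower in nls.items():
        if lower and not any(labels[a][b] == labels[i][j] for a, b in lower):
            return False
    # In a lower-complete image the pixels with empty NLS are exactly the minima
    # pixels; each connected minimum needs one label, not shared with other minima.
    seen: Set[Pixel] = set()
    used: Set[int] = set()
    for start, lower in nls.items():
        if lower or start in seen:
            continue
        component_labels: Set[int] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            a, b = queue.popleft()
            component_labels.add(labels[a][b])
            for q in neighbours((a, b), rows, cols):
                if q not in seen and not nls[q] and f[q[0]][q[1]] == f[a][b]:
                    seen.add(q)
                    queue.append(q)
        if len(component_labels) != 1 or component_labels & used:
            return False
        used |= component_labels
    return True
```

Because a tie in NLS(p) allows either label, several different labellings of the same image pass this check, which is the non-uniqueness noted after Definition 4.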
3. Relation to the traditional definition of the watershed segmentation
In this section, Meyer's formalism [2] is presented and compared with our definitions in Section 2. From functions on continuous space, Meyer derived a formal definition of catchment basins for the digital space [2] as follows:

Definition 5 (Cost function based on lower slope). The cost for walking on the topographical surface from position $p_{i-1}$ to $p_i \in N(p_{i-1})$ is given by

$$\mathrm{cost}(p_{i-1}, p_i) = \begin{cases} LS(p_{i-1})\,\mathrm{dist}(p_{i-1}, p_i), & f(p_{i-1}) > f(p_i), \\ LS(p_i)\,\mathrm{dist}(p_{i-1}, p_i), & f(p_{i-1}) < f(p_i), \\ \tfrac{1}{2}\big(LS(p_{i-1}) + LS(p_i)\big)\,\mathrm{dist}(p_{i-1}, p_i), & f(p_{i-1}) = f(p_i). \end{cases}$$
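Definition 5 can be evaluated directly, and since all costs are non-negative, the minimal path cost that Definition 6 calls the topographical distance can be computed with Dijkstra's algorithm. The Python sketch below assumes an 8-neighbourhood with unit distances; the helper names and the 3 × 5 test image are hypothetical. It is only an illustration of Meyer's formalism, not part of the proposed algorithm.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Tuple

Pixel = Tuple[int, int]
Image = List[List[int]]


def neighbours(p: Pixel, rows: int, cols: int) -> Iterator[Pixel]:
    """8-neighbourhood of p (assumed); dist(p, p') = 1 for every neighbour."""
    i, j = p
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols:
                yield (i + di, j + dj)


def lower_slope(f: Image, p: Pixel) -> Optional[int]:
    """Definition 1 with unit distances (None if every neighbour is higher)."""
    i, j = p
    drops = [f[i][j] - f[a][b] for a, b in neighbours(p, len(f), len(f[0]))
             if f[a][b] <= f[i][j]]
    return max(drops) if drops else None


def cost(f: Image, a: Pixel, b: Pixel) -> float:
    """Definition 5 for neighbouring pixels a -> b with dist(a, b) = 1."""
    fa, fb = f[a[0]][a[1]], f[b[0]][b[1]]
    if fa > fb:
        return float(lower_slope(f, a))  # defined: b is a lower neighbour of a
    if fa < fb:
        return float(lower_slope(f, b))  # defined: a is a lower neighbour of b
    return 0.5 * (lower_slope(f, a) + lower_slope(f, b))  # both defined (equal levels)


def topographical_distance(f: Image, p: Pixel, q: Pixel) -> float:
    """Smallest summed cost over all paths from p to q (Dijkstra's algorithm)."""
    rows, cols = len(f), len(f[0])
    best: Dict[Pixel, float] = {p: 0.0}
    heap = [(0.0, p)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == q:
            return d
        if d > best[u]:
            continue  # stale heap entry
        for v in neighbours(u, rows, cols):
            nd = d + cost(f, u, v)
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")
```

For the image with rows 1 2 3 2 0 / 2 3 4 3 1 / 3 4 5 4 2, the distances from pixel (2, 2) of grey level 5 to the minima (0, 0) and (0, 4) are 4.0 and 5.0, i.e. the grey-level differences, as Theorem 8 below states; adding the minima levels 1 and 0 gives 5 in both cases, so under Definition 7 that pixel belongs to neither basin.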
Definition 6 (Topographical distance). The topographical distance between two pixels p and q of an image is the minimal π-topographical distance among all paths π between p and q inside the image:

$$TD_f(p, q) = \inf_\pi TD^\pi_f(p, q),$$

where $TD^\pi_f(p, q) = \sum_{i=2}^{n} \mathrm{cost}(p_{i-1}, p_i)$ is the π-topographical distance of a path $\pi = (p = p_1, p_2, \ldots, p_n = q)$ such that $\forall i$, $p_i \in N(p_{i-1})$ and $p_i \in \Omega$.

Definition 7 (Catchment basin based on topographical distance). A catchment basin $CBTD(m_i)$ of a regional minimum $m_i$ is the set of pixels $p \in \Omega$ closer to $m_i$ than to any other regional minimum $m_j$, according to the topographical distance and the grey levels of the minima:

$$CBTD(m_i) = \{ p \mid f(m_i) + TD_f(p, m_i) < f(m_j) + TD_f(p, m_j)\ \forall j \ne i \}.$$

Based on these definitions, Meyer presents the following theorem (Proposition 5 in Ref. [2]):

Theorem 8. The topographical distance between a pixel p and the regional minimum $m_i$ in the depth of its catchment basin is minimal and equal to $f(p) - f(m_i)$, and the geodesic line between them is a line of steepest descent.

The converse of Theorem 8 states that a path of steepest descent ensures a minimal cost. The construction of the catchment basins is thus reduced to the problem of finding a shortest path between each pixel and a regional minimum. The relation between Definitions 4 and 7 is stated in the following theorem:

Theorem 9. A catchment basin based on the topographical distance, as in Definition 7, is a subset of the catchment
basin in Definition 4, based on the local condition given in Definition 3.

Proof. The formal construction of the catchment basin according to Definition 3 can be described as a recursion. The process starts with the set of pixels belonging to the regional minimum $m_i$. All these pixels are labelled with $L(m_i)$. At each step, unlabelled pixels having at least one neighbour of steepest descent already in the set are appended to the set. The recursion ends when no more pixels can be incorporated into the set.
$$CBLC^0(m_i) = m_i,$$
$$CBLC^{k+1}(m_i) = CBLC^k(m_i) \cup \Delta CBLC^k(m_i),$$
$$\Delta CBLC^k(m_i) = \{ p \mid \forall j,\ p \notin CBLC^k(m_j) \text{ and } \exists p' \in NLS(p),\ p' \in CBLC^k(m_i) \}.$$

Each newly inserted pixel p has a neighbour p' that is part of the catchment basin $CBLC^k(m_i)$. Thus, the local condition of Definition 3 is valid for each p. The proof proceeds as follows:

$$p' \in NLS(p) \overset{\text{Def. 2}}{\Longrightarrow} LS(p) = \frac{f(p) - f(p')}{\mathrm{dist}(p, p')} \Longrightarrow LS(p)\,\mathrm{dist}(p, p') = f(p) - f(p') \overset{\text{Def. 5}}{\Longrightarrow} \mathrm{cost}(p, p') = f(p) - f(p').$$

According to Theorem 8, one recursion step adds only those pixels p building paths of steepest descent down to $CBLC^k(m_i)$ with the minimal cost $f(p) - f(p')$, $p' \in CBLC^k(m_i)$. After the recursion is finished, all paths between pixels of the catchment basin and its minimum are paths of steepest descent. Therefore, it is not possible to construct a steeper path to a different minimum $m_j$. However, there might exist another steepest path, of cost equal to the one to $m_i$, to a different regional minimum $m_j$. In this case, the pixel is a watershed pixel according to Definition 7. This proves that $CBTD(m_i)$ is a subset of $CBLC(m_i)$. □

The difference between Definitions 7 and 4 is the treatment of pixels which have steepest paths of equal cost to more than one minimum. According to Definition 7, these pixels are watershed pixels. Following Definition 4, based on the local condition, such a pixel is assigned to one of the minima, $m_i$, to which it is connected by a steepest path and for which the condition $\exists p' \in NLS(p),\ L(p') = L(m_i)$ holds. All possible assignments result in a valid watershed segmentation. In such cases, most watershed algorithms which do not construct watershed lines, including the algorithms described in Ref. [2], choose one of the possible assignments given by Definition 4. Therefore, these algorithms are consistent with the definition. Algorithms which follow Definition 7 may result in thick watershed lines and watershed areas; in other cases, no watershed line is visible between neighbouring regions. Algorithms which avoid thick or zero-width watershed lines are not consistent with Definition 7. According to Definition 4, every pixel belongs to a catchment basin, but the segmentation result depends on the scanning order.

4. A simple algorithm for lower complete images
The idea of the proposed algorithm originates in the connected components problem [13–15]. The goal is to label each pixel with the representative of the region it belongs to. Choosing, for every pixel p, a neighbour from the set NLS(p) as its predecessor results in a directed graph. However, minima pixels do not have a steepest neighbourhood. Therefore, for these pixels, another type of connectivity relation is introduced: all neighbours of a minimum pixel p having the same grey level as p belong to the same component. Consequently, they are unified such that the representative of the regional minimum is the pixel with the smallest address value. Once the whole graph is constructed, its connected components have to be computed. Apart from the input image f, our design uses an image l which stores for every pixel its representative, or label. Note that pixel addresses are used for labelling [14] instead of arbitrary integer values. The algorithm consists of the three raster scans described below. $N_{prev}(p) = \{p' \in N(p) \mid p' < p\}$ represents the already scanned neighbourhood of p, i.e. all neighbours with a smaller address than p in the raster scanning order.

Watershed Algorithm for lower complete images {
  Input: f. Output: l.
  (1) Raster scan (p) {
        q ← p;
        for each (p' ∈ N(p) and f[p'] < f[p])
          if (f[p'] < f[q]) q ← p';
        if (q ≠ p) l[p] ← q;
        else l[p] ← PLATEAU;
      }
  (2) Raster scan (p) {
        if (l[p] = PLATEAU) {
          l[p] ← p;
          for each (p' ∈ N_prev(p) and f[p'] = f[p]) {
            r ← FIND(l, p); r' ← FIND(l, p');
            l[r] ← l[r'] ← min(r, r');
          }
        }
      }
  (3) Raster scan (p) l[p] ← FIND(l, p);
}

FIND(l, u) {
  for (r ← u; l[r] ≠ r; r ← l[r]);
  for (w ← u; w ≠ r;) { tmp ← l[w]; l[w] ← r; w ← tmp; }
  return r;
}

In the first raster scan, the label of each pixel p which has a lower neighbour is set to $q \in NLS(p)$. Otherwise, the pixel has no lower neighbour; it then lies on a minima plateau and is labelled PLATEAU. A representative label is computed for every minima plateau in the second raster scan. The connected component operator FIND(l, p) with path compression [6,7] returns the representative of the plateau on which p lies; in our implementation, this representative is the pixel with the smallest address in the plateau. The path compression itself is performed in the second for-loop of the function FIND(l, u), by shortcutting all labels w on the path from u to the representative r, which was found in the first for-loop. Note that the two raster scans (1) and (2) can also be performed at the same time. In the third raster scan, all pixel labels are replaced by their representatives. In this way, the condition in Definition 3 is true for every pixel, and therefore the presented algorithm performs a watershed segmentation. Apart from the input and output images, no queue or other data structure is needed. The algorithm is independent of the number of grey levels in the image and of the image histogram, and it uses only contiguous chunks of memory, thus avoiding memory fragmentation and additional indexing variables.
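A direct Python transcription of the three raster scans and of FIND is sketched below. It keeps the conventions of the text (raster addresses as labels, PLATEAU = -1, smallest address as representative) but assumes an 8-neighbourhood and a flat list for f; the names are illustrative. Step (1) compares each neighbour with the current candidate q only, which is equivalent to the two tests of the pseudocode because f[q] ≤ f[p] at all times.

```python
from typing import Iterator, List

PLATEAU = -1  # label of pixels without a lower neighbour after step (1)


def neighbours(p: int, rows: int, cols: int) -> Iterator[int]:
    """8-neighbourhood of the raster address p = i * cols + j (an assumption)."""
    i, j = divmod(p, cols)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            a, b = i + di, j + dj
            if (di or dj) and 0 <= a < rows and 0 <= b < cols:
                yield a * cols + b


def find(l: List[int], u: int) -> int:
    """FIND(l, u): follow labels to the representative r, then shortcut the path to r."""
    r = u
    while l[r] != r:
        r = l[r]
    w = u
    while w != r:  # path compression
        nxt = l[w]
        l[w] = r
        w = nxt
    return r


def watershed_lower_complete(f: List[int], rows: int, cols: int) -> List[int]:
    """f holds grey levels in raster order; returns the label image l."""
    n = rows * cols
    l = [PLATEAU] * n
    # (1) point each pixel at its lowest neighbour, an element of NLS(p) for unit distances
    for p in range(n):
        q = p
        for pp in neighbours(p, rows, cols):
            if f[pp] < f[q]:  # f[q] <= f[p], so this also implies f[pp] < f[p]
                q = pp
        l[p] = q if q != p else PLATEAU
    # (2) union the pixels of each minima plateau; the smallest address becomes representative
    for p in range(n):
        if l[p] == PLATEAU:
            l[p] = p
            for pp in neighbours(p, rows, cols):
                if pp < p and f[pp] == f[p]:  # pp in N_prev(p)
                    r, rr = find(l, p), find(l, pp)
                    l[r] = l[rr] = min(r, rr)
    # (3) replace every label by its representative
    for p in range(n):
        l[p] = find(l, p)
    return l
```

For the 3 × 5 image with rows 1 2 3 2 0 / 2 3 4 3 1 / 3 4 5 4 2, `watershed_lower_complete(f, 3, 5)` labels every row 0 0 0 4 4. The middle column, whose steepest neighbours lie in both basins, goes to the left basin because step (1) keeps the first lowest neighbour in scan order; another tie-breaking rule would give a different, equally valid segmentation.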
5. Extension to images with plateaus
Natural images do have non-minima plateaus. Therefore, an extension of Definition 3 and of the previous algorithm is needed. In this section, we show how to extend the set of lower neighbours NLS on a path of steepest descent to include images with non-minima plateaus. The basic problem is that the topographical distance (Definition 6) has the same value for any two plateau pixels which do not have lower neighbours. Therefore, the geodesic distance, or an approximation of it, must be used to ensure that a pixel on a non-minima plateau gets the label from the nearest border pixel of the plateau which has a lower neighbour. The geodesic distance between two pixels p and p' on a plateau is equal to the length of the shortest path within the plateau between p and p' [5]. A plateau PL is a connected set of pixels of the same altitude. Let $L_{PL} = \{p' \in PL \mid NLS(p') \ne \emptyset\}$ denote the set of pixels on the border of the plateau PL which have a lower neighbour; furthermore, let $gdist_{PL}(p, p')$ denote the geodesic distance, or an approximation of it, between p and p' within the plateau. The minimal distance between any pixel p on the plateau PL and all border pixels $p' \in L_{PL}$ is $gdist_{min}(p, L_{PL}) = \min_{p' \in L_{PL}} gdist_{PL}(p, p')$. The watershed segmentation for images with non-minima plateaus can be defined by extending the steepest neighbourhood given in Definition 2:

Definition 10 (Extended steepest neighbourhood). The set NLS'(p) contains the pixels of the sets NLS(p') of all border pixels $p' \in L_{PL}$ such that

$$NLS'(p) = \bigcup_{\{p' \in L_{PL} \,\mid\, gdist_{PL}(p, p') = gdist_{min}(p, L_{PL})\}} NLS(p').$$

Definition 11 (Watershed segmentation for images with plateaus). For any image with non-minima plateaus, a segmentation is called a watershed segmentation if every regional minimum $m_i$ has a unique label $L(m_i)$ and, for every pixel $p \in \Omega$ with $NLS'(p) \ne \emptyset$, the following condition holds: $\exists p' \in NLS'(p)$ with $L(p) = L(p')$.

The definition leaves open the metric used for the geodesic distance. In our implementation, as in most others, an approximation of the geodesic distance based on the 4- or 8-square neighbourhood is used. The case of images without non-minima plateaus is included, because the equation in Definition 10 has the following property: $gdist_{PL}(p', p) = 0 \Rightarrow p' = p \Rightarrow NLS'(p) = NLS(p)$.

6. The algorithm for images with plateaus
In order to perform a watershed segmentation on any input image and to fulfil the condition in Definition 11, another step has to be added to the algorithm for non-minima plateaus. Observe that after step (1) of Section 4, plateaus of minima and non-minima are not distinguishable. For the simple input image illustrated in Fig. 2(a), the result of step (1) is shown in Fig. 2(c), where the label PLATEAU has the value -1. An intermediate step for the treatment of non-minima plateaus is described below, within the frame of the entire general algorithm:
Watershed Algorithm {
  Input: f. Output: l.
  (I) Do step (1) of the algorithm of Section 4
  (II) Raster scan (p) {
         if (l[p] = PLATEAU)
           for each (p' ∈ N(p))
             if (l[p'] ≠ PLATEAU and f[p] = f[p']) { fifo_put(p'); break; }
       }
  (III) while (fifo_empty() = FALSE) {
          p ← fifo_get();
          for each (p' ∈ N(p) and l[p'] = PLATEAU and f[p'] = f[p]) {
            l[p'] ← p; fifo_put(p');
          }
        }
  (IV) Do step (2) of the algorithm of Section 4
  (V) Do step (3) of the algorithm of Section 4
}
Let us follow, step by step, the result the algorithm above produces on the example image in Fig. 2(a). As already mentioned, 1D pixel addresses, in raster scanning order, are used for labelling. Thus, the pixel location (i, j), 0 ≤ i < nrows, 0 ≤ j < ncols, in an image of size nrows × ncols has the 1D address i × ncols + j. All pixel addresses are illustrated in Fig. 2(b). In the rest of the paper, the 2D notation and its equivalent 1D value will be used to designate a pixel location. The result of the first raster scan can be observed in Fig. 2(c). The label of each pixel which has lower neighbours is set to the address of its lowest neighbour; otherwise, it is set to PLATEAU, i.e. -1. Thus, pixel (0, 4) → 4 of grey level 8 has as lowest neighbour pixel (0, 3) → 3, of grey level 2. Consequently, l(4) ← 3. Its neighbouring pixel (0, 5) → 5 has no lower neighbour and therefore receives the label PLATEAU, l(5) ← -1. Similarly, pixel (3, 3) → 33 of grey level 7 is labelled PLATEAU, l(33) ← -1. At step (II), for every PLATEAU pixel p which has a neighbour p' of the same grey level as p that also has a lower neighbour (l(p') ≠ PLATEAU), p' is introduced into the FIFO queue. Indeed, $p' \in L_{PL}$, and it is therefore a seed for the computation of the extended steepest neighbourhood of pixels within the plateau. In our case, pixel (0, 5) → 5 inserts pixel (0, 4) → 4 into the queue, and pixel (2, 1) → 21 inserts pixel (1, 0) → 10.
Fig. 2. (a) Input image, (b) pixel addresses, (c) after first scan (I), (d) after flooding non-minima plateaus (III), (e) after connecting minima plateaus (IV), and (f) after replacing each label with its representative (V).
A global wave propagation, starting from the seeds in the queue, is performed at step (III). During this process, each seed pixel, accessed in FIFO order, sets its address as the label of all neighbouring PLATEAU pixels of the same altitude as itself. These pixels become seeds and, in turn, are introduced into the queue. The result of this step is depicted in Fig. 2(d). Thus, pixel (0, 5) → 5 receives label 4 from pixel (0, 4) → 4 and propagates its address to pixel (0, 6) → 6, whose label is thereby set to 5. The propagation continues until the whole plateau of grey level 8 is exhausted. Note that the condition of Definition 10 is fulfilled for non-minima plateaus, using an approximation of the geodesic distance. This approximation is given by the time stamp of the wave propagation process, but it is not explicitly tracked by the algorithm. After step (III), only minima plateaus are labelled PLATEAU (see Fig. 2(d)), because they do not have lower brims. The remaining stages are identical to steps (2) and (3) described in Section 4. Thus, pixels on minima plateaus are connected at step (IV) using the connected component operator. The result of this phase is shown in Fig. 2(e). Minima pixels (0, 0) → 0 and (0, 3) → 3 are their own representatives and accordingly l(0) ← 0, l(3) ← 3. The effectiveness of the for-loop within this raster scan is more evident on the plateau of grey level 7, which is completely labelled with its representative label 33, i.e. the smallest pixel address within the plateau in raster scanning order. Similarly, the regional minimum of grey level 10 is labelled 51. At step (V), the label of each pixel is replaced by its representative. The output image can be observed in Fig. 2(f). Unlike the algorithm in Section 4, a FIFO queue is needed here, but only the pixels on non-minima plateaus pass through this queue.
Thus, this queue is smaller than the hierarchical queue (a sufficient size could be computed during the first raster scan by counting the total number of PLATEAU pixels). Additionally, before allocating each of the FIFO queues of the hierarchical queue, the classical algorithm must compute the image histogram; this step disappears entirely in the present algorithm. Finally, the mechanisms for manipulating a FIFO queue are much simpler than those for a hierarchical queue.
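The complete algorithm, including the FIFO-based treatment of non-minima plateaus, is sketched below in Python under the same assumptions as for Section 4 (8-neighbourhood, flat raster list, illustrative names). Step (III) propagates only into PLATEAU pixels of the seed's own grey level, as the description of Fig. 2(d) specifies; without that test, a seed adjacent to a lower regional minimum would overwrite the minimum's PLATEAU label.

```python
from collections import deque
from typing import Deque, Iterator, List

PLATEAU = -1


def neighbours(p: int, rows: int, cols: int) -> Iterator[int]:
    """8-neighbourhood of the raster address p = i * cols + j (an assumption)."""
    i, j = divmod(p, cols)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            a, b = i + di, j + dj
            if (di or dj) and 0 <= a < rows and 0 <= b < cols:
                yield a * cols + b


def find(l: List[int], u: int) -> int:
    """Representative of u, with path compression."""
    r = u
    while l[r] != r:
        r = l[r]
    while u != r:
        nxt = l[u]
        l[u] = r
        u = nxt
    return r


def watershed(f: List[int], rows: int, cols: int) -> List[int]:
    n = rows * cols
    l = [PLATEAU] * n
    # (I) = step (1): lowest lower neighbour, or PLATEAU
    for p in range(n):
        q = p
        for pp in neighbours(p, rows, cols):
            if f[pp] < f[q]:
                q = pp
        l[p] = q if q != p else PLATEAU
    # (II) seeds: same-level neighbours of PLATEAU pixels that do have a lower neighbour
    fifo: Deque[int] = deque()
    for p in range(n):
        if l[p] == PLATEAU:
            for pp in neighbours(p, rows, cols):
                if l[pp] != PLATEAU and f[pp] == f[p]:
                    fifo.append(pp)
                    break
    # (III) breadth-first wave: each plateau pixel points at the pixel that reached it first
    while fifo:
        p = fifo.popleft()
        for pp in neighbours(p, rows, cols):
            if l[pp] == PLATEAU and f[pp] == f[p]:
                l[pp] = p
                fifo.append(pp)
    # (IV) = step (2): only minima plateaus are still PLATEAU here
    for p in range(n):
        if l[p] == PLATEAU:
            l[p] = p
            for pp in neighbours(p, rows, cols):
                if pp < p and f[pp] == f[p]:
                    r, rr = find(l, p), find(l, pp)
                    l[r] = l[rr] = min(r, rr)
    # (V) = step (3): replace every label by its representative
    for p in range(n):
        l[p] = find(l, p)
    return l
```

For the one-row image 0 5 5 5 5 5 5 1, the plateau of grey level 5 is split at its geodesic middle: `watershed([0, 5, 5, 5, 5, 5, 5, 1], 1, 8)` returns `[0, 0, 0, 0, 7, 7, 7, 7]`.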
7. Complexity analysis and experimental results

Given an image with n pixels, the complexity of the algorithm in Section 6 is now analysed step by step. At steps (I) and (II), a linear scan with access to a limited neighbourhood is performed. Therefore the complexity of both steps is O(n), i.e. linear in the number of pixels (there exist constants $c_1$, $c_2$ such that the complexity equals $c_1 n + c_2$).
A. Bieniek, A. Moga / Pattern Recognition 33 (2000) 907–916
Table 1
Timing results: running time (s)

Image \ Algorithm       Approach I   Approach II   Hierarchical queues
Cermet (256×256)        0.07         0.08          0.15
Lenna (512×512)         0.34         0.35          0.76
Peppers (512×512)       0.35         0.36          0.71
Simple512 (512×512)     0.39         0.35          0.71
People (1024×1024)      1.47         1.35          3.26
Fig. 3. Peppers: (a) input image; (b) output image.
Each pixel on a non-minimum plateau is inserted into the FIFO queue at most once during steps (II) and (III). For each pixel in the FIFO queue, a limited neighbourhood is accessed at step (III). Therefore, the overall worst-case complexity of step (III) is O(n). Let $n' \le n$ be the number of minima plateau pixels. Since we use path compression in the FIND(l, p) operation in combination with naive linking at step (IV), the worst-case complexity of this step is $O(n' \log n')$ [6,7]. The worst-case complexity can be reduced to linear for practical problem sizes if linking by rank or size is used [6,7], at the expense of an additional image to store the rank or size. Nevertheless, the logarithmic factor could not be observed for the images we tested. At step (V), a FIND(l, p) operation is performed for each of the n pixels. The pixels in the image can be divided into two sets. Let F be the set of pixels which have not yet been accessed by a FIND operation. Initially, all pixels are in F. Each FIND operation walks along a path of pixels within F. As soon as it reaches a pixel $p' \notin F$, the operation finishes, because p' has already been shortcut to its representative. Afterwards, all pixels on the path are shortcut as well and removed from F. Therefore, the total complexity of step (V) is O(n), because |F| = n initially and the total number of FIND operations is also n.
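The counting argument for step (V) can be checked numerically. The sketch below uses plain Python lists to stand in for the label image, and its helper names are hypothetical. It runs one FIND with path compression per element of a pointer forest and counts pointer hops. Every hop falls into one of two cases:

- It leaves a pixel that has not been shortcut yet. This happens at most once per pixel.
- It is the single final hop from an already shortcut pixel to its root.

So the total stays at or below 2n.

```python
import random


def find_counting(parent, p):
    """FIND with full path compression; returns (root, number of pointer hops)."""
    hops, root = 0, p
    while parent[root] != root:
        root = parent[root]
        hops += 1
    while parent[p] != root:  # shortcut every pixel on the walked path
        nxt = parent[p]
        parent[p] = root
        p = nxt
    return root, hops


def step_v_hops(parent):
    """One FIND per element, as in step (V); returns the total number of hops."""
    return sum(find_counting(parent, p)[1] for p in range(len(parent)))


def random_forest(n, seed=0):
    """Hypothetical test input: parent[i] <= i, so the pointers contain no cycles."""
    rng = random.Random(seed)
    return [rng.randrange(i + 1) for i in range(n)]
```

For a chain of n = 1000 pointers (`[max(i - 1, 0) for i in range(1000)]`) processed in raster order, the total is 2n - 3 = 1997 hops. Afterwards, every element points directly at its root.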
As a result, the overall worst-case complexity of the algorithm is $O(n + n' \log n')$. With our test images we could not observe the logarithmic factor; therefore, the algorithm can be treated as O(n) for practical images. Concerning the memory requirements, the algorithm described in the previous section uses an input and an output image, as well as a FIFO queue. As already mentioned, the size of this queue can be computed dynamically at run time; alternatively, the size of the image can be used instead. In Table 1, the presented algorithm is compared with the traditional hierarchical queue algorithm. The time measurements were performed on a Silicon Graphics O2 workstation with an R10000 RISC processor. Approach I is the implementation of the algorithm as presented here. In Approach II, step (II), slightly modified as explained below, is performed at the same time as step (I). This saves the overhead of one scan through the image, but many unnecessary seeds might be detected, because the labels of only half of the neighbours are available at this stage. Therefore, all pixels p' that have a lower neighbouring pixel and also neighbours p of the same altitude are stored as seeds. However, not every not-yet-scanned neighbour p with the same altitude as p' will be labelled PLATEAU by the test of scan (I). Hence, some of the seeds p' are useless in the FIFO queue. Nevertheless, both implementations run significantly faster than the classical algorithm: for every test image in Table 1, they take roughly half the time of the hierarchical queue implementation. One image example is illustrated in Fig. 3(a). The output shown in Fig. 3(b) was obtained by taking the gradient image, thresholded with an arbitrary value, as input for the watershed algorithm. Note that only the boundaries of the labelled regions are shown in the latter figure.
8. Conclusion

In this paper, we have presented a definition of the watershed segmentation which is consistent with the behaviour of most implementations of the watershed algorithm, namely, to choose one arbitrary label in the case of competing labels. Different distance metrics approximating the geodesic distance on plateaus can be incorporated into the definition. The definition led to a new type of watershed algorithm which is closely related to the connected component algorithm. We have shown that the complexity of the algorithm is linear in the number of pixels. The one exception is the connection of minima plateau pixels, which introduces an additional logarithmic factor in the worst case. For the images we tested, however, the logarithmic factor could not be observed. The algorithm has a regular structure (raster scans comprising simple pixel assignment rules). Its memory requirements are minimal (three contiguous chunks of memory accessed by direct indexing) and depend only on the image resolution, not on the image content (the image histogram). This leads to a robust and efficient implementation. Accordingly, our timing results show a significant improvement in running time compared with the classical watershed algorithm. Combining our watershed algorithm with an opening by reconstruction [16,17] to find markers for “significant” objects in the image yields a marker-based watershed algorithm, which is thus independent of the number of grey levels. Consequently, the algorithm is very suitable for images of large resolution, for which the hierarchical queue approach is rather expensive. Finally, the connected-component-like formulation of watersheds exhibits a better parallel potential, allowing the design of efficient and scalable parallel watershed algorithms [10–12].
9. Summary

The watershed transformation is a popular image segmentation algorithm for grey-scale images. The traditional watershed algorithm simulates the flooding process with the help of hierarchical queues. In this paper, we develop a formalism for the watershed transformation based on sets of neighbouring pixels. Unlike the traditional approach, it does not build watersheds at the same time as the basins are flooded. Our definition is consistent with the behaviour of most implementations of the watershed algorithm, namely, to choose one arbitrary label in the case of competing labels. Moreover, different distance metrics approximating the geodesic distance on plateaus can be incorporated into the formalism. The relation to the traditional definition of watershed segmentation is proven in the paper. The formalism leads to a new type of watershed algorithm which is closely related to the connected component algorithm. The algorithm introduced here is simpler with respect to implementation and data structures. Additionally, its memory requirement is small and independent of the number of grey levels in the input image. Furthermore, our timing results show a significant improvement in running time compared with the classical watershed algorithm.
References
[1] S. Beucher, F. Meyer, The morphological approach to segmentation: the watershed transformation, in: E.R. Dougherty (Ed.), Mathematical Morphology in Image Processing, Marcel Dekker Inc., New York, 1993, pp. 433–481.
[2] F. Meyer, Topographic distance and watershed lines, Signal Processing 38 (1) (1994) 113–125.
[3] F. Meyer, S. Beucher, Morphological segmentation, J. Visual Commun. Image Representation 1 (1) (1990) 21–46.
[4] L. Najman, M. Schmitt, Watershed of a continuous function, Signal Processing 38 (1) (1994) 99–112.
[5] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Trans. Pattern Anal. Mach. Intell. 13 (6) (1991) 583–598.
[6] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA, 1990.
[7] R.E. Tarjan, Data Structures and Network Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 1983.
[8] A. Bieniek, A. Moga, A connected component approach to the watershed segmentation, in: Mathematical Morphology and its Applications to Image and Signal Processing, Computational Imaging and Vision, Vol. 12, Kluwer Academic Publishers, Dordrecht, 1998, pp. 215–222.
[9] A. Meijster, J.B.T.M. Roerdink, A disjoint set algorithm for the watershed transform, in: Proceedings EUSIPCO'98, IX European Signal Processing Conference, Rhodes, Greece, September 8–11, 1998.
[10] A. Bieniek, H. Burkhardt, H. Marschner, M. Nölle, G. Schreiber, A parallel watershed algorithm, in: Proceedings of the 10th Scandinavian Conference on Image Analysis (SCIA97), Lappeenranta, Finland, June 1997, pp. 237–244.
[11] A. Moga, Parallel watershed algorithms for image segmentation, Ph.D. Thesis, Tampere University of Technology, Tampere, Finland, 1997.
[12] A. Moga, M. Gabbouj, Parallel image component labelling with watershed transformation, IEEE Trans. Pattern Anal. Mach. Intell. 19 (5) (1997) 441–450.
[13] R. Lumia, L. Shapiro, O. Zuniga, A new connected components algorithm for virtual memory computers, Comput. Vision Graphics Image Processing 22 (2) (1983) 287–300.
[14] R. Miller, Q.F. Stout, Parallel Algorithms for Regular Architectures: Meshes and Pyramids, MIT Press, Cambridge, MA, 1996.
[15] H. Samet, Connected component labeling using quadtrees, J. ACM 28 (3) (1981) 487–501.
[16] P. Soille, Morphologische Bildverarbeitung, Springer, Berlin, 1998.
[17] P. Soille, C. Gratin, An efficient algorithm for drainage network extraction on DEMs, J. Visual Commun. Image Representation 5 (1994) 181–189.
About the Author: ALINA NICOLETA MOGA was born in Alba Iulia, Romania, in 1969. She received the M.Sc. degree in computer science from the “Politehnica” University of Bucharest, Bucharest, Romania, in 1993. She received the Ph.D. degree, with a thesis on parallel image segmentation algorithms, from the Signal Processing Laboratory, Department of Information Technology, Tampere University of Technology, Tampere, Finland, in 1997. Dr. Moga is currently a research assistant at the Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Freiburg, Germany. Her main research interests include parallel and distributed computing, efficient algorithms, image segmentation, and multiscale adaptive techniques.

About the Author: ANDREAS BIENIEK was born in 1966 in Hamburg, Germany. He studied at the Technical University of Hamburg-Harburg until 1993. Toward the M.Sc. degree in electrical engineering/computer science, he worked in 1992 at the University of Melbourne, Australia, on “Performance Evaluation of Task Allocation and Scheduling for an Optoelectronic Multicomputer”. He is currently finalizing his Ph.D. thesis on parallel image processing algorithms at the Albert-Ludwigs-Universität Freiburg, Chair of Pattern Recognition and Image Processing. His research interests include parallel algorithms, image segmentation, and communication networks.
Pattern Recognition 33 (2000) 917–933
Adaptive morphological operators, fast algorithms and their applications

F. Cheng (a), A.N. Venetsanopoulos (b,*)

(a) Electronic Systems R and D, Zenith Electronics Corporation, Glenview, IL 60025, USA
(b) Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada M5S 3G4

Received 15 April 1998; accepted 2 May 1999
Abstract

In this paper, the adaptive morphological operators of Ref. [1] are developed further. The extension allows more freedom in forming their operational windows, which adapt their shapes to the local features of the processed images. The properties of the adaptive operators are investigated. These properties suggest a way of handling images on the basis of their geometrical structure, and they lead to fast algorithms for applying the adaptive operators in practice. The efficiency of adaptive operators in image processing is demonstrated with examples. © 2000 Published by Elsevier Science Ltd. All rights reserved.

Keywords: Nonlinear operator; Adaptive morphological operator; Generalized structuring element; Image geometrical structures; Tip and bottom regions; Geometrical performance; Optimization on geometry
1. Introduction

In recent years, a number of nonlinear operators, such as the median filter [2] and the morphological filter [3,4], have attracted a great deal of research interest and have found numerous applications in image processing and analysis. The early types of these nonlinear operators used one operational window of fixed shape and size. Because of this fixed window, such operators have been reported to create artificial patterns and remove significant details when applied to images [5,6]. Many approaches have been considered to deal with these problems. A well-accepted approach is based on combining a family of operational windows, each designed to preserve a particular type of detail. Combining all the windows in the family gives better performance than one fixed operational window [4,6]. The problem with that approach is that, in practical cases, the images to be processed may contain too many patterns of significant details. Thus, it may be difficult to combine enough operational windows to preserve the many possible patterns of significant details while keeping the computational complexity practical. Nonlinear operators that adapt their operational windows to the local statistics of images have also been reported to improve performance [7]. In some cases, however, these adaptive nonlinear operators face two basic difficulties. First, their computational burden may be too heavy for practical applications. Second, the local statistics of images may not describe the geometrical features of images well.

To deal with the problems of these existing techniques, a new type of adaptive morphological operator was proposed in Ref. [1]. The operational windows of these operators can adapt their shapes to the geometrical features of images and can take any connected shape of a given size. The work of Cheng and Venetsanopoulos [1] suggested a new way to develop an image processing approach based on the geometrical structures in images. It also showed, through application examples, that this distinct way of image processing is promising. However, the adaptive morphological operators of Ref. [1] are still in their simplest forms, and their properties are largely unknown. To reach more application areas, these adaptive operators need to be extended to more general forms. Their properties also need to be investigated systematically for further development in both theory and applications. Meanwhile, fast algorithms have to be designed to make the adaptive operators attractive for practical applications. These problems are addressed in this paper.

In Section 2, we introduce a general structure of the operational window for the proposed adaptive morphological operators. In Section 3, we define the adaptive morphological operators that use an operational window of this general structure and describe their properties. These properties suggest a way of handling images on the basis of their geometrical structure, and they lead to the proof of a number of propositions described in Section 4. Fast algorithms are designed on the basis of those propositions. In Section 5, application examples are described. Section 6 summarizes the main conclusions.

* Corresponding author. Tel.: +1-416-978-8670. E-mail addresses: [email protected] (F. Cheng), [email protected] (A.N. Venetsanopoulos)

0031-3203/00/$20.00 © 2000 Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00155-7

F. Cheng, A.N. Venetsanopoulos / Pattern Recognition 33 (2000) 917–933

2. The basic element and the related structuring element

The operational window of morphological operators is called the structuring element. In Ref. [1], the structuring element is formed by connected pixels. In this paper, we introduce a more general approach: we form a structuring element by connecting basic elements. A basic element can be any connected shape; in general, the shape depends on the specific image processing problem. The advantage of this approach will be shown through examples in Section 5. Before giving a formal definition of the structuring element, we first define the neighboring relation and the connectivity of basic elements.

2.1. Definitions related to the basic element

Throughout this paper, only the discrete case is considered.
That is, let y(i, j) denote an image whose domain set is $\{(i, j)\} \subset \mathbb{Z}^2$ and whose range set is $\{y\} \subset \mathbb{Z}$, where $\mathbb{Z}$ is the set of integers. Let d denote a basic element of any connected shape. Since only flat structuring elements are considered in this paper, the basic element d can be described by its support domain.

Definition 1. A reference pixel of a basic element is defined as a pixel selected in the domain of the basic element. The position of the reference pixel of a basic element is defined as the position of the basic element.

In Fig. 1(a), the shaded pixel is chosen as the reference pixel of the basic element. The neighboring relation of basic elements can be defined in many ways; here, we consider only one case.

Definition 2. The basic elements located at $(i + k, j + s)$ are defined as the neighbors of the basic element located at $(i, j)$, where $k, s \in \{-1, 0, 1\}$ and k and s are not both zero.

The neighboring relation so defined is shown in Fig. 1(b). According to Definition 2, neighboring basic elements may overlap, depending on their shape. The connectivity of basic elements is defined on the basis of their neighboring relations.

Definition 3. Two neighboring basic elements are said to be connected to each other.

Fig. 1. (a) Position of the basic element. (b) Neighboring relations of the basic elements.

Fig. 2. An example of the general structuring element: size N = 4, actual size = 22.

2.2. The generalized structuring element

Based on Definitions 1–3, we can define a new type of generalized structuring element.

Definition 4. The structuring element is formed by connecting N basic elements. The size of the structuring element is defined as N. The actual size of the structuring element is defined as the number of pixels in the domain of the structuring element.
In Section 3, we will further define the adaptive morphological operators that use the generalized structuring element. On the basis of that definition, it will be shown that the shape of the structuring element is able to adapt to the local features of images. Fig. 2 shows an example of a structuring element formed by connecting four basic elements of 3×3 pixels. In this paper, we consider only fixed N. We also require both the basic element and the structuring element to be connected. In future work, we shall extend N to be adaptive and drop the connectivity restriction.
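Definitions 1–4 can be made concrete in a few lines. This sketch is only an illustration. A basic element is a set of pixel offsets relative to its reference pixel. A structuring element is given by the positions of its N basic elements, and its actual size is the number of distinct pixels it covers. Overlapping copies are counted once, as Definition 2 allows. The 3×3 square and the positions used are assumed examples and are not the configuration of Fig. 2.

```python
from itertools import product


def square_basic_element(half=1):
    """Hypothetical basic element: a (2*half+1)^2 square whose reference pixel is its centre."""
    r = range(-half, half + 1)
    return set(product(r, r))


def positions_connected(positions):
    """Definitions 2-3: positions (i, j) and (i + k, j + s) with k, s in {-1, 0, 1} are connected."""
    positions = set(positions)
    if not positions:
        return False
    start = next(iter(positions))
    seen, stack = {start}, [start]
    while stack:
        i, j = stack.pop()
        for k, s in product((-1, 0, 1), repeat=2):
            q = (i + k, j + s)
            if q in positions and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen == positions


def structuring_element(positions, basic):
    """Definition 4: the domain covered by N connected copies of a basic element."""
    if not positions_connected(positions):
        raise ValueError("basic-element positions must be connected")
    return {(i + di, j + dj) for (i, j) in positions for (di, dj) in basic}


if __name__ == "__main__":
    line = [(0, 0), (0, 1), (0, 2), (0, 3)]
    diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]
    for name, pos in (("line", line), ("diagonal", diagonal)):
        se = structuring_element(pos, square_basic_element())
        print(name, "size N =", len(pos), "actual size =", len(se))
```

A horizontal line of four 3×3 elements covers 18 pixels, while a diagonal one covers 24. The same size N = 4 can therefore correspond to different actual sizes.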
3. The adaptive morphological operators and their properties

In the appendix of this paper, we briefly describe the morphological operators with one structuring element and the morphological operators with a combination of a family of structuring elements. These operators are the basis for the adaptive morphological operators developed in this section. The adaptive morphological operators were originally developed on the basis of Eqs. (A.7) and (A.8) in the appendix, and were called the NOP and NCP (a new type of opening and closing operators) in Ref. [1]. For convenience, we keep the names NOP and NCP for the adaptive morphological operators in this paper. One of our further research goals is to systematically develop a geometrical approach to video processing on the basis of the adaptive morphological operators. Better names may be considered at that time, as our understanding of these operators develops.

3.1. The NOP and NCP

The results in this subsection look similar to those of their counterparts in Ref. [1]. However, introducing the basic element d has made the NOP and NCP in this paper quite different from the NOP and NCP of Ref. [1]. The difference will be shown through the development of fast algorithms and through application examples in Sections 4 and 5. Let $\Phi_d^N$ denote the set of structuring elements of all the shapes formed by connecting N basic elements d. The proposed NOP $(x \circ \Phi_d^N)$ is defined as follows.

Definition 5.
$$(x \circ \Phi_d^N)(i, j) = \max_{B^{(k)} \in \Phi_d^N} \big[ (x \circ B^{(k)})(i, j) \big] \qquad (1)$$

In general, Eq. (1) cannot be computed directly, for two reasons. First, $\Phi_d^N$ usually contains too many elements. Second, because d can be any connected shape, the domain of each structuring element in $\Phi_d^N$ is no longer simple to determine. To make the definition of the generalized NOP meaningful, a practical approach to computing $(x \circ \Phi_d^N)$ is necessary. For that purpose, we consider what $(x \circ \Phi_d^N)$ actually computes in Eq. (1). Combining Eq. (1) with Eq. (A.5), Eq. (1) can be expressed as
$$(x \circ \Phi_d^N)(i, j) = \max_{B^{(k)} \in \Phi_d^N} \Big[ \max_{(s_1, s_2):\,(i, j) \in B^{(k)}_{s_1, s_2}} \Big[ \min_{(t_1, t_2) \in B^{(k)}_{s_1, s_2}} x(t_1, t_2) \Big] \Big] \qquad (2)$$

In Eq. (2), the minimum of x is computed over the domain of every translation $B^{(k)}_{s_1, s_2}$ that contains (i, j), for every structuring element $B^{(k)}$ in $\Phi_d^N$. Then the maximum is computed over all the minima obtained. In other words, let $D^{N,d}_{i,j}$ denote the set of all the domains containing (i, j) and formed by N connected basic elements. Then $(x \circ \Phi_d^N)(i, j)$ is assigned the minimum of x in a domain $S_{i,j} \in D^{N,d}_{i,j}$ such that the following holds for any domain $S^{(k)}_{i,j} \in D^{N,d}_{i,j} - \{S_{i,j}\}$:
$$(x \circ \Phi_d^N)(i, j) = \min_{(t_1, t_2) \in S_{i,j}} x(t_1, t_2) \ge \min_{(s_1, s_2) \in S^{(k)}_{i,j}} x(s_1, s_2)$$
These results are summarized in Proposition 1.

Proposition 1. The NOP $(x \circ \Phi_d^N)$ defined by Eq. (1) can be computed equivalently as follows:
1. Search for a domain containing (i, j) and formed by N connected basic elements, in which the minimum of x is not smaller than the minimum of x in any other domain containing (i, j) and formed by N connected basic elements.
2. Assign this minimum to $(x \circ \Phi_d^N)(i, j)$.

A simple example of the domain searched in step 1 of Proposition 1 is given in Fig. 3. In Proposition 1, the domain searched is in fact the structuring element of the opening in Eq. (1) that gives the maximum value. The direct method computes all the openings in Eq. (1) before taking the maximum. Proposition 1 instead relates the structuring element satisfying Eq. (1) at (i, j) to the local geometrical structures of the image x at (i, j) and its neighboring pixels. It enables us to deal directly with only one opening; none of the other openings in Eq. (1) has to be computed. In that way, Proposition 1 offers great potential for the development of fast algorithms to compute $(x \circ \Phi_d^N)$.

Fig. 3. An example of the search of N basic elements.

Proposition 1 requires a search for the domain of one structuring element among all possible structuring elements. As mentioned above, determining the domain of a structuring element is still a troublesome task, since the basic element d can be of any shape. In Section 4, we shall prove a proposition which eliminates the search for the whole domain, thus further facilitating the computation. By using the duality between opening and closing, the case of the NCP can be described in a similar way. We omit the details and only briefly mention the results. The NCP $(x \bullet \Phi_d^N)$ is defined as follows.

Definition 6.
$$(x \bullet \Phi_d^N)(i, j) = \min_{B^{(k)} \in \Phi_d^N} \big[ (x \bullet B^{(k)})(i, j) \big] \qquad (3)$$
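For the single-pixel basic element $d_0$, Proposition 1 can be implemented literally as a brute-force search, which is useful as a reference when testing faster algorithms. The sketch below is an illustration under assumptions, not the fast algorithm of Section 4, and the parameter `n` plays the role of N. From (i, j), it grows a connected domain best-first under the 8-neighbour relation of Definition 2, always annexing the brightest pixel on its border. Let C be the connected component of the level set $\{x \ge t\}$ that contains (i, j). While C still has unvisited pixels, the brightest border pixel lies in C. The first N pixels annexed therefore form a domain whose minimum is the maximum that Proposition 1 asks for. The NCP is obtained by duality. With $d = d_0$, this operator coincides with what is commonly called an area opening (and, dually, an area closing).

```python
import heapq


def neighbours8(r, c, h, w):
    """8-neighbourhood: the neighbour relation of Definition 2 for d = d0."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < h and 0 <= cc < w:
                yield rr, cc


def nop_at(x, i, j, n):
    """Proposition 1 for d = d0: grow a connected domain of n pixels from (i, j),
    always annexing the brightest border pixel, and return the domain minimum."""
    h, w = len(x), len(x[0])
    if not 1 <= n <= h * w:
        raise ValueError("n must lie between 1 and the number of pixels")
    seen, border = {(i, j)}, [(-x[i][j], (i, j))]
    value, size = x[i][j], 0
    while size < n:
        neg, (r, c) = heapq.heappop(border)
        value, size = min(value, -neg), size + 1
        for q in neighbours8(r, c, h, w):
            if q not in seen:
                seen.add(q)
                heapq.heappush(border, (-x[q[0]][q[1]], q))
    return value


def nop(x, n):
    """(x o Phi^n_{d0}) by brute force: one greedy search per pixel."""
    return [[nop_at(x, i, j, n) for j in range(len(x[0]))] for i in range(len(x))]


def ncp(x, n):
    """(x . Phi^n_{d0}) via duality: negate, apply the NOP, negate back."""
    negated = [[-v for v in row] for row in x]
    return [[-v for v in row] for row in nop(negated, n)]
```

On `[[0, 5, 0, 0], [0, 0, 0, 3], [0, 0, 0, 3]]` with N = 2, the isolated bright pixel is flattened to 0, while the two connected pixels of value 3 survive. Each pixel costs O(N log N) heap operations, so this is a correctness reference rather than a practical implementation.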
Proposition 2 gives an equivalent description of Eq. (3).

Proposition 2. The NCP $(x \bullet \Phi_d^N)$ defined by Eq. (3) can be computed equivalently as follows:
1. Search for a domain containing (i, j) and formed by N connected basic elements, in which the maximum of x is not larger than the maximum of x in any other domain containing (i, j) and formed by N connected basic elements.
2. Assign this maximum to $(x \bullet \Phi_d^N)(i, j)$.

As mentioned, the domains searched in Propositions 1 and 2 are the structuring elements satisfying Eqs. (1) and (3), respectively. On the other hand, we may regard the connected maximum pixels on an image surface as bright geometrical structures, and the connected minimum pixels as dark geometrical structures. Propositions 1 and 2 show the relation between the structuring elements of the NOP and NCP and the local geometrical structures of images. Thus, they show how the structuring elements adapt their shapes to the local geometrical structures of images. In Section 3.2, we give rigorous proofs of this geometrical performance of the NOP and NCP.

3.2. The properties of NOP and NCP

So far, the properties of the NOP and NCP have been largely unknown. In this paper, nine properties and seven propositions of the NOP and NCP are proved. These results may give a deeper understanding of the theory of the NOP and NCP. They may also enable us to develop fast algorithms and to open new application areas. In this subsection, we investigate the special case where the basic element d is a single pixel. The extension of these results to a general basic element is discussed in Section 4. Some properties of the NOP and NCP are obvious extensions of the properties of the conventional morphological operators; those properties are mentioned in the appendix.

Our work on the NOP and NCP has revealed the possibility of developing an image processing approach based on the geometrical structures in images. In this section, we try to use a geometrical language, rather than a morphological language, to make definitions and to describe and explain the properties of the NOP and NCP. We hope that this may provide a new beginning towards a geometrical approach in image processing.

Consider an image as a surface, the local maxima as the tips on the surface, the local minima as the bottoms, and the other parts of the surface as the slopes. A tip is usually characterized by a rising area on the image surface, and a bottom by a falling area. These rising and falling areas form geometrical structures in images, which are often the most interesting parts in image processing. Definition 7 gives a description of the rising areas.

Denote the basic element consisting of a single pixel by $d_0$. Write $x_1 \le x_2$ if $x_1(i, j) \le x_2(i, j)$ holds for every (i, j) in the domains of $x_1$ and $x_2$. Let $\mathcal{T}_{q_1}$ denote the set of all the domains $T^i_{q_1}$ that contain only one tip $q_1$. Let $S^i_{q_1}$ denote the set of the boundary pixels of $T^i_{q_1}$, and $\mathcal{S}_{q_1}$ the set of all $S^i_{q_1}$. Let $X(S^i_{q_1})$ denote the set of all the values of the image x on $S^i_{q_1}$, and $\bar{x}(S^i_{q_1})$ the maximum value in $X(S^i_{q_1})$:
$$\bar{x}(S^i_{q_1}) = \max_{(s, t) \in S^i_{q_1}} x(s, t).$$

Definition 7. The tip region $T_{q_1}$ is defined as the minimum domain satisfying
$$\bar{x}(S_{q_1}) = \min_{S^i_{q_1} \in \mathcal{S}_{q_1}} \big[ \bar{x}(S^i_{q_1}) \big] \qquad (4)$$
In Eq. (4), $S_{q_1}$ is the set of the boundary pixels of the tip region $T_{q_1}$. According to Definition 7, the tip region $T_{q_1}$ has the minimum $\bar{x}(S_{q_1})$ (maximum boundary value) among all domains $T^i_{q_1}$ that contain only the tip $q_1$. It is also the smallest of the domains sharing that maximum boundary value $\bar{x}(S_{q_1})$ in Eq. (4). Roughly, the tip region is the flat area left after the tip $q_1$ is cut off horizontally at the height $\bar{x}(S_{q_1})$. One advantage of this definition is that the tip region is the largest domain describing the rising area that leads to $q_1$ without overlapping other tip regions (except at boundary pixels). Fig. 4(a) shows an example of a tip region. We have mentioned that tip structures are often the most interesting parts in image processing. With Definition 7, we may consider these structures as objects characterized by their tip regions. Processing an image according to these objects allows full utilization of the spatial correlation of image features. This advantage is even more beneficial for three- and four-dimensional images x(i, j, k), x(i, j, t) and x(i, j, k, t). For example, a moving spot in image x(i, j) differs little from a noise spot. In image x(i, j, t), however, a moving spot becomes an object shaped like a long curve, and is quite different from a noise spot.

Now let us turn the image surface upside down. The bottoms become tips, and the bottom regions can then be defined in the same way as the tip regions. Before we describe the performance of the NOP, we define a few more geometrical characteristics of an image surface. Let $U_{q_1}$ denote the flat area of the tip $q_1$, and $|U_{q_1}|$ the number of pixels contained in $U_{q_1}$. Let $\Psi_{q_1}$ denote the set of all pixels (i, j) satisfying $(i, j) \in T_{q_1} - S_{q_1}$. Let $(i_k, j_k)$, $k = 1, \ldots, |\Psi_{q_1}|$, denote the pixels in $\Psi_{q_1}$, ordered so that $x(i_k, j_k) \ge x(i_t, j_t)$ for any $1 \le k < t \le |\Psi_{q_1}|$.

Property 1. The geometric performance of the NOP $(x \circ \Phi^N_{d_0})$ can be described in four ways.
1. When $|U_{q_1}| \ge N$, the tip $q_1$ will not be changed by the NOP.
2. When $|U_{q_1}| < N$ and $|\Psi_{q_1}| \ge N$, $q_1$ will be flattened to a tip $q_2$ satisfying
(a) $|U_{q_2}| \ge N$;
(b) $(x \circ \Phi^N_{d_0})(i, j) = x(i_N, j_N)$ for any $(i, j) \in U_{q_2}$;
(c) $\Psi_{q_2} = \Psi_{q_1}$, $S_{q_2} = S_{q_1}$, and $T_{q_2} = T_{q_1}$;
(d) $U_{q_2}$ can be of any connected shape. (5)
3. When $|\Psi_{q_1}| = N_0 < N$, $(x \circ \Phi^{N_1}_{d_0})$ with $N_1 = N_0 + 1$ can be used to cut $q_1$:
$(x \circ \Phi^{N_1}_{d_0})(i, j) = \bar{x}(S_{q_1})$ for $(i, j) \in \Psi_{q_1}$.
Then $T_{q_1}$ becomes part of a new tip, of a slope, or of a bottom. Items 1–4 can be applied again to describe the change of this new tip, slope or bottom.
4. The other parts of the image surface, such as bottoms and slopes, will not be changed by the NOP.

Fig. 4. (a) Definitions related to a tip $q_1$. The shaded area is at the level of $\bar{x}(S_{q_1})$ and corresponds to the tip region. (b) The 1-D geometric description of the NOP. The dots show the original image surface. (c) The 2-D description of the NOP.
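A 1-D sketch makes items 1, 2 and 4 of Property 1 easy to see. In one dimension, a connected domain of N single-pixel elements that contains a sample is just an interval of N samples. Proposition 1 therefore reduces to growing an interval toward its higher neighbour. The function name and the test signals below are hypothetical illustrations.

```python
def nop_1d(x, n):
    """1-D NOP with a single-pixel basic element: for each sample, grow a
    connected interval of n samples, always annexing the higher border sample."""
    if not 1 <= n <= len(x):
        raise ValueError("n must lie between 1 and len(x)")
    out = []
    for i in range(len(x)):
        lo = hi = i
        value = x[i]
        while hi - lo + 1 < n:
            left = x[lo - 1] if lo > 0 else None
            right = x[hi + 1] if hi + 1 < len(x) else None
            if right is None or (left is not None and left >= right):
                lo -= 1
                value = min(value, left)
            else:
                hi += 1
                value = min(value, right)
        out.append(value)
    return out


narrow_tip = [0, 1, 3, 9, 3, 1, 0]  # flat area |U| = 1 < N = 3
wide_tip = [0, 5, 5, 5, 0]          # flat area |U| = 3 >= N = 3
bottom = [5, 5, 0, 5, 5]            # a bottom, processed with N = 2
print(nop_1d(narrow_tip, 3))  # [0, 1, 3, 3, 3, 1, 0]
print(nop_1d(wide_tip, 3))    # [0, 5, 5, 5, 0]
print(nop_1d(bottom, 2))      # [5, 5, 0, 5, 5]
```

The narrow peak at 9 is cut down to a flat top of three samples at value 3, the third-largest sample of the peak. This agrees with item 2(b) under our reading of Definition 7. The wide flat top and the bottom are left unchanged, as items 1 and 4 state.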
F. Cheng, A.N. Venetsanopoulos / Pattern Recognition 33 (2000) 917–933

The proof of Property 1 is given in the appendix. A simple 1-D example and a 2-D example of the geometric performance of the NOP are illustrated in Fig. 4(b) and (c). Let $T_{b_1}$, $U_{b_1}$, $S_{b_1}$ and $\underline{x}(S_{b_1})$ denote the bottom region of a bottom $b_1$, the flat area of $b_1$, the set of the boundary pixels of $T_{b_1}$, and the minimum value of x on $S_{b_1}$, respectively. Let $\Phi_{b_1}$ denote the set of all the pixels (i, j) satisfying $(i, j) \in T_{b_1} \setminus S_{b_1}$, and let $(i_k, j_k)$, $k = 1, \dots, |\Phi_{b_1}|$, denote the pixels in $\Phi_{b_1}$, ordered so that $x(i_k, j_k) \le x(i_t, j_t)$ for any $1 \le k < t \le |\Phi_{b_1}|$. Then, the geometric performance of the NCP $(x \bullet \Gamma^N_{d_0})$ can be described and proved in the same way as for the NOP, with $q_1$, $T_{q_1}$, $U_{q_1}$, $S_{q_1}$, $\bar{x}(S_{q_1})$, $\Phi_{q_1}$ and "tip" replaced by $b_1$, $T_{b_1}$, $U_{b_1}$, $S_{b_1}$, $\underline{x}(S_{b_1})$, $\Phi_{b_1}$ and "bottom", respectively. These properties show that the NOP flattens the tips and the NCP fills the bottoms according to the local geometrical structures in images. That is, the change is made along the geometrical features. In contrast, most existing linear or nonlinear image processing approaches change images according to the shapes of their operational windows. It has long been reported that these shapes often do not represent the local features of images well. The unique geometrical performance of the NOP and NCP suggests a way of processing images based on their geometrical structures rather than on the statistical characterization of images.

The following properties of the NOP and NCP can be proved on the basis of Property 1, as well as Properties A.1–A.3 in the appendix.

Property 2.
$$[[(x \circ \Gamma^N_{d_0}) \bullet \Gamma^N_{d_0}] \circ \Gamma^{N-i}_{d_0}] \bullet \Gamma^{N-i}_{d_0} = (x \circ \Gamma^N_{d_0}) \bullet \Gamma^N_{d_0}, \tag{6}$$
where $i = 0, \dots, N-1$.
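A minimal sketch of the single-pixel NOP and NCP may help make the tip-flattening and bottom-filling behaviour concrete. It assumes 4-connectivity (the connectivity definition used in the paper is not part of this excerpt) and evaluates $(x \circ \Gamma^N_{d_0})(i, j)$ by growing a domain from (i, j) one pixel at a time, always adding the neighbouring pixel with the largest value. This is a simplified, one-pixel-at-a-time variant of the search in the basic algorithm of the appendix, not the authors' implementation; the names `nop_at`, `nop` and `ncp` are hypothetical, and pixels outside the image are simply ignored.

```python
import heapq

# Assumption: 4-connectivity for the single-pixel basic element d0.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def nop_at(x, i, j, n):
    """Max over 4-connected n-pixel domains containing (i, j) of min x on the domain.

    Greedy best-first growth: starting from (i, j), repeatedly add the
    unvisited neighbour with the largest value; the minimum over the n
    pixels gathered this way equals the NOP value at (i, j).
    """
    rows, cols = len(x), len(x[0])
    frontier, seen = [(-x[i][j], i, j)], {(i, j)}
    lowest, size = x[i][j], 0
    while frontier and size < n:
        neg_value, r, c = heapq.heappop(frontier)
        lowest, size = min(lowest, -neg_value), size + 1
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in seen:
                seen.add((rr, cc))
                heapq.heappush(frontier, (-x[rr][cc], rr, cc))
    return lowest


def nop(x, n):
    """Adaptive opening of every pixel of the image."""
    return [[nop_at(x, i, j, n) for j in range(len(x[0]))] for i in range(len(x))]


def ncp(x, n):
    """Adaptive closing by duality: the NCP of x is minus the NOP of -x."""
    negated = [[-v for v in row] for row in x]
    return [[-v for v in row] for row in nop(negated, n)]


image = [
    [10, 10, 10, 10, 10],
    [10, 50, 50, 10, 10],  # a three-pixel tip
    [10, 50, 10, 10, 10],
    [10, 10, 10, 0, 10],   # a one-pixel bottom
    [10, 10, 10, 10, 10],
]
print(nop(image, 4)[1][1], nop(image, 3)[1][1], ncp(image, 4)[3][3])  # 10 50 10
```

In this small image the three-pixel tip survives with N = 3 but is flattened to the surrounding level with N = 4, while the one-pixel bottom is filled by the NCP. If the whole image holds fewer than n pixels, the growth simply stops early.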
Fig. 5. An example of a less biased approximation.
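Property 3.
$$[[(x \circ \Gamma^{N-i}_{d_0}) \bullet \Gamma^{N-i}_{d_0}] \circ \Gamma^N_{d_0}] \bullet \Gamma^N_{d_0} \ge (x \circ \Gamma^N_{d_0}) \bullet \Gamma^N_{d_0}, \tag{7}$$
$$[[(x \bullet \Gamma^{N-i}_{d_0}) \circ \Gamma^{N-i}_{d_0}] \bullet \Gamma^N_{d_0}] \circ \Gamma^N_{d_0} \le (x \bullet \Gamma^N_{d_0}) \circ \Gamma^N_{d_0}, \tag{8}$$
where $i = 1, \dots, N-2$.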
Generally, the left-hand side of Eq. (7) gives a less biased approximation of x than the right-hand side does. The same conclusion holds for Eq. (8). Fig. 5 shows a 1-D example of a less biased approximation. In Fig. 5, x is an image surface, and $y_1$ and $y_2$ are the outputs of the morphological operators
$$y_1 = [[(x \circ B_1) \bullet B_1] \circ B_2] \bullet B_2, \tag{9}$$
$$y_2 = (x \circ B_2) \bullet B_2, \tag{10}$$
where $B_1$ and $B_2$ are structuring elements, and $B_2$ is larger than $B_1$. Fig. 5 shows that $y_1$ resembles the image more closely, with the details removed.

Property 4. Among all the openings (closings) with structuring elements of size N or larger, the NOP $(x \circ \Gamma^N_{d_0})$ (NCP $(x \bullet \Gamma^N_{d_0})$) causes the minimum change of the processed image.

Property 4 can be proved on the basis of Eqs. (1) and (3) and Property A.2 in the appendix. Consider the case of opening. The opening operation always cuts down the image surface. When a structuring element of minimum size N is required for an image processing task, Eq. (1) shows that the NOP $(x \circ \Gamma^N_{d_0})$ always cuts down the image surface the least, and thus causes the minimum change of the processed image. For image processing, Property 4 can be interpreted as a type of optimization different from the classical one. Classically, optimization is usually based on the statistics of signals. In image processing, however, it has long been known that a result optimized on the basis of statistics is not necessarily a perceptually optimized result, since the statistics of images are not good descriptors of their geometric structures, especially their local geometric structures. By contrast, here we consider a type of optimization based on geometry. We assume that noise and signal patterns differ only in size: a geometric pattern smaller than a given size is considered noise. We also assume that the shapes of the geometric patterns of noise and signal objects are not specified. Our task is to remove noise. In such a case, the optimal approach can be summarized as an attempt to remove all the noise patterns while changing the signal patterns as little as possible. This is what Property 4 implies. Although the geometric interpretation of the optimization given here is far from rigorous and systematic, it serves as an example showing that such an approach to image processing is both desirable and possible.

Property 5. For two arbitrary points (i, j) and (s, t), we obtain the same result of $(x \circ \Gamma^N_{d_0})$ either by first computing $(x \circ \Gamma^N_{d_0})(i, j)$ and assigning the result to x(i, j), then computing $(x \circ \Gamma^N_{d_0})(s, t)$ and assigning the result to x(s, t), or by performing the computation in the reverse order. The same property holds for the NCP $(x \bullet \Gamma^N_{d_0})$.

Proof. The proof of Property 1 shows that the result of the NOP and NCP described by Property 1 does not depend on the order in which the NOP and NCP operations are performed at the points (i, j) and (s, t). □

Based on Property 5, we can compute $(x \circ \Gamma^N_{d_0})(i, j)$ or $(x \bullet \Gamma^N_{d_0})(i, j)$, assign the result to x(i, j), and then compute the NOP or NCP at the next point. This not only reduces the memory used in the computation, but also makes it possible to develop fast algorithms for the NOP and NCP, as will be shown in the next section. Property 5 also allows us to use multiprocessors for parallel processing of images.
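Properties 4 and 5 can be spot-checked numerically with a short sketch. It assumes 4-connectivity and computes the single-pixel NOP by greedy region growing from each pixel (hypothetical helper `nop_at`). For Property 4 it compares the NOP with ordinary openings whose structuring elements are connected shapes of N or more pixels; to sidestep border effects, those openings use only translates lying entirely inside the image, which is a choice of this sketch rather than of the paper. For Property 5 it writes each result back into the image, sweeping the pixels in forward and in reverse raster order.

```python
import heapq
import random

# Assumption: 4-connectivity for the single-pixel basic element d0.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def nop_at(x, i, j, n):
    """NOP value at (i, j) by greedy best-first growth of an n-pixel domain."""
    rows, cols = len(x), len(x[0])
    frontier, seen = [(-x[i][j], i, j)], {(i, j)}
    lowest, size = x[i][j], 0
    while frontier and size < n:
        neg_value, r, c = heapq.heappop(frontier)
        lowest, size = min(lowest, -neg_value), size + 1
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in seen:
                seen.add((rr, cc))
                heapq.heappush(frontier, (-x[rr][cc], rr, cc))
    return lowest


def opening_inside(x, se):
    """Flat opening as in Eq. (A.5), using only translates of `se` lying inside the image."""
    rows, cols = len(x), len(x[0])
    out = [[float("-inf")] * cols for _ in range(rows)]
    for s1 in range(rows):
        for s2 in range(cols):
            cells = [(s1 + a, s2 + b) for a, b in se]
            if all(0 <= r < rows and 0 <= c < cols for r, c in cells):
                low = min(x[r][c] for r, c in cells)
                for r, c in cells:
                    out[r][c] = max(out[r][c], low)
    return out


def nop_in_place(x, n, order):
    """Property 5: visit pixels in `order`, writing each NOP value back into the image."""
    work = [row[:] for row in x]
    for i, j in order:
        work[i][j] = nop_at(work, i, j, n)
    return work


rng = random.Random(0)
image = [[rng.randrange(10) for _ in range(6)] for _ in range(6)]
N = 4
adaptive = [[nop_at(image, i, j, N) for j in range(6)] for i in range(6)]

# Connected structuring elements of size N or larger, each containing (0, 0).
elements = [
    [(0, 0), (0, 1), (0, 2), (0, 3)],          # horizontal line
    [(0, 0), (1, 0), (2, 0), (3, 0)],          # vertical line
    [(0, 0), (0, 1), (1, 0), (1, 1)],          # 2 x 2 square
    [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)],  # 5-pixel L shape
]
property4_holds = all(
    adaptive[i][j] >= opened[i][j]
    for opened in (opening_inside(image, se) for se in elements)
    for i in range(6) for j in range(6))

raster = [(i, j) for i in range(6) for j in range(6)]
property5_holds = (nop_in_place(image, N, raster) == adaptive
                   == nop_in_place(image, N, raster[::-1]))
print(property4_holds, property5_holds)  # True True
```

The in-place sweep needs no second image buffer, which is the memory saving mentioned after Property 5.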
4. The algorithms for the computation of the NOP and NCP

In this section, we first deal with the case of the NOP, then extend the results to the case of the NCP. In Section 4.1, we prove several properties of the NOP in relation to a basic algorithm given in the appendix. Fast algorithms are developed on the basis of those properties. In Section 4.2, several propositions are proved to show that the general NOP proposed in this paper can be decomposed into three stages. The decomposition allows us to compute the general NOP by the fast algorithms of Section 4.1. The decomposition also allows us to extend the properties proved in Section 3 to the case of the general NOP and NCP operators. The computational complexity of the algorithms is discussed in Section 4.3.
4.1. Propositions related to fast algorithms

A basic algorithm for the computation of the NOP and NCP was published in Ref. [1]. The propositions in this section, and the fast algorithm developed from them, are based on that basic algorithm. Because the symbols defined in the basic algorithm are used extensively in the propositions of this section, the basic algorithm is described in the appendix for easy reference. In Ref. [1], we proved two propositions for the development of fast algorithms. Later we found that Proposition 4 in Ref. [1] did not work well in many programs, and it has been dropped from all our programs. In this section, two new propositions are proved, which have proven very effective in reducing the computational cost.

Proposition 3. When we compute $(x \circ \Gamma^N_{d_0})(i, j)$ at pixel (i, j) in Step 1 of the basic algorithm, if $x(c_i) \le x(b_{BN})$, then
$$(x \circ \Gamma^N_{d_0})(c_i) = x(c_i) \tag{11}$$
can be determined by the computation at (i, j), where $c_i$ and $b_{BN}$ are defined in the basic algorithm of the appendix.

Proposition 3 indicates that, under the condition $x(c_i) \le x(b_{BN})$, the result of the NOP $(x \circ \Gamma^N_{d_0})$ at a number of neighboring pixels of $S_{i,j}$, which is defined in Section 3.1, can be determined by the computation at (i, j). No computation is necessary at those pixels later. The proof is briefly described as follows. According to the condition of the proposition, the definition of $c_k$ and the definition of $b_k$, there are at least N pixels (s, k) connected to $c_k$ satisfying $x(s, k) \ge x(c_k)$. The proof is completed by considering the definition of the NOP. Proposition 3 can easily be combined into the basic algorithm, since we have to compare $x(c_i)$ with $x(b_i)$ anyway when we search for $\{d_i\}$ from $\{c_i\}$ and $\{b_i\}$ in Step 1 of the basic algorithm, where $\{b_i\}$, $\{c_i\}$ and $\{d_i\}$ are defined in the appendix.

Suppose the current computational position is at (i, j). In the basic algorithm, we start from (i, j) to search for the other N − 1 connected pixels of $S_{i,j}$. Denote the N pixels of $S_{i,j}$ by $z_1, \dots, z_N$, where the index k of $z_k$ is the order in which $z_k$ is found in the search, so that $z_1 = (i, j)$.

Proposition 4. Assume that the minimum of $x(z_1), \dots, x(z_N)$ is attained at the pixels $z_{k_1}, \dots, z_{k_t}$, where $1 \le k_1 \le \dots \le k_t \le N$ and $1 \le t \le N$. Then we can assign
$$(x \circ \Gamma^N_{d_0})(z_k) = x(z_{k_1}), \quad 1 \le k \le k_1;\ k = k_i,\ i = 2, \dots, t, \tag{12}$$
and the values of $(x \circ \Gamma^N_{d_0})(z_k)$ for $1 \le k \le k_1$ and for $k = k_i$, $i = 2, \dots, t$, will not be changed in the later NOP computation.

Proof. By assumption,
$$x(z_{k_t}) = \min_{(s,t) \in S_{i,j}} x(s, t). \tag{13}$$
In the following, we prove that we cannot find a domain $D_{z_k}$ of N connected pixels containing $z_k$, where $k \le k_1$ or $k = k_i$, $i = 2, \dots, t$, such that
$$\min_{(s_1,t_1) \in D_{z_k}} x(s_1, t_1) > \min_{(s,t) \in S_{i,j}} x(s, t). \tag{14}$$
At the pixels $z_{k_i}$, $i = 1, \dots, t$, Eq. (14) obviously cannot hold. Suppose $k < k_1$. Then, the existence of a domain $D_{z_k}$ satisfying Eq. (14) would mean that $z_{k_1} \notin S_{i,j}$, since the basic algorithm always searches for the pixels corresponding to the N largest values of x: when the search for $S_{i,j}$ reaches $z_k$, it will continue along the path of $D_{z_k}$ rather than along the path leading to $z_{k_1}$. This contradicts the assumption $z_{k_1} \in S_{i,j}$. Hence, the maximum of the left-hand side of Eq. (14) over all possible $D_{z_k}$ is equal to the right-hand side of Eq. (14). Thus, according to Proposition 1, we can choose $S_{i,j}$ as the domain searched in the computation of $(x \circ \Gamma^N_{d_0})$ at $z_k$ for $1 \le k \le k_1$ and for $k = k_i$, $i = 2, \dots, t$. That is, $(x \circ \Gamma^N_{d_0})(z_k) = x(z_{k_1})$ at those pixels. □

Proposition 4 shows that the result of the NOP computed at one pixel (i, j) can be used to determine the value of $(x \circ \Gamma^N_{d_0})$ at a set of pixels in $S_{i,j}$. Thus, the NOP does not have to be computed again at those pixels later. To combine Proposition 4 into the basic algorithm, we can make the following two modifications. Suppose the current computation position is at (i, j).

(a) In the search for $S_{i,j}$ starting at (i, j), we check whether the value of $(x \circ \Gamma^N_{d_0})$ at a pixel has already been determined before we include the pixel in $S_{i,j}$. This check brings two benefits:
1. If $(x \circ \Gamma^N_{d_0})(i, j)$ has already been determined by the computation at other pixels, we can bypass (i, j) and compute the NOP at the next pixel.
2. If $(x \circ \Gamma^N_{d_0})(s, t)$, $(s, t) \ne (i, j)$, has been determined, then $x(s, t) = (x \circ \Gamma^N_{d_0})(s, t)$. On the basis of Property 3, at pixel (s, t) we can then always find N connected pixels including (s, t) such that the values of x at those pixels are not smaller than x(s, t). This means that the search for $S_{i,j}$ can be ended at (s, t). In this way, the search can be expedited.

(b) In the search for $S_{i,j}$, we check whether the value of $(x \circ \Gamma^N_{d_0})$ at the pixels included in $S_{i,j}$ can be determined.

Although the propositions proved in this section are only initial results of our studies, they have resulted in a vast reduction of computational complexity, as will be shown in Section 4.3. Our study has revealed more interesting properties of the NOP and NCP, which may be used to reduce their computational complexity further.

4.2. The extension to the general NOP and NCP

The properties in Section 3.2 and the algorithms in Section 4.1 apply only to the case where the basic element is a single pixel. In this section, we extend the results to the general case, where the basic element d can be any connected shape, depending on the requirements of a specified image processing task. Let $d_{i,j}$ denote the domain of the basic element d located at (i, j), and let b(i, j) denote the minimum value of x in $d_{i,j}$. According to Eq. (A.1), b(i, j) is the erosion of the input image x by the structuring element d:
$$b(i, j) = (x \ominus d^s)(i, j), \tag{15}$$
where $d^s$ is the symmetrical domain of d.

Proposition 5. The minimum of an image x in the domain of N connected basic elements is equal to the minimum of b in the domain of N connected pixels located at the same positions as the corresponding basic elements.

Proof. According to Definition 4, the connectivity of the N pixels corresponding to b is guaranteed by the connectivity of the N basic elements. The rest of the proof is obvious. □

Let g denote a domain formed by N connected basic elements d, one of which is located at (i, j), and let $B^{N,d}_{i,j}$ denote the set of all possible g. The next proposition enables us to use the algorithms developed in Section 4.1 for the computation of the general NOP.

Proposition 6.
$$(b \circ \Gamma^N_{d_0})(i, j) = \max_{g \in B^{N,d}_{i,j}} \Big[ \min_{(s,t) \in g} x(s, t) \Big]. \tag{16}$$

Proof. The proof is based on Proposition 5 and Eq. (2). □

Let $D^{N,d}_{i,j}$ denote the set of the domains considered in Proposition 1. By definition, $B^{N,d}_{i,j}$ is a subset of $D^{N,d}_{i,j}$. The relation is
$$D^{N,d}_{i,j} = \bigcup \{ B^{N,d}_{s,t} \mid (i, j) \in d_{s,t} \}.$$
Hence, according to Proposition 1,
$$(x \circ \Gamma^N_d)(i, j) = \max_{(r_1,r_2):\,(i,j) \in d_{r_1,r_2}} \; \max_{g \in B^{N,d}_{r_1,r_2}} \Big[ \min_{(t_1,t_2) \in g} x(t_1, t_2) \Big] \tag{17}$$
$$= \max_{(r_1,r_2):\,(i,j) \in d_{r_1,r_2}} \big[ (b \circ \Gamma^N_{d_0})(r_1, r_2) \big]. \tag{18}$$
According to Eq. (A.2), the maximum operation in the last part of Eq. (18) is a dilation of $(b \circ \Gamma^N_{d_0})(r_1, r_2)$ by $d^s$. Combining Eqs. (16), (17) and (18), we obtain Proposition 7.

Proposition 7.
$$(x \circ \Gamma^N_d)(i, j) = \big[ \big( (x \ominus d^s) \circ \Gamma^N_{d_0} \big) \oplus d \big](i, j). \tag{19}$$

Proposition 7 shows that, in the general case, the NOP $(x \circ \Gamma^N_d)$ can be computed in three steps. The first step is an erosion with the basic element d as the structuring element. The second step is an NOP $(b \circ \Gamma^N_{d_0})$ with a structuring element of size N. The third step is a dilation with the symmetrical set $d^s$ of the basic element d as the structuring element. The first and third steps can be computed by conventional morphological algorithms; since d is usually small, this computation is fast. The second step can be computed by the fast algorithms developed in Section 4.1. The block diagram of the computational procedure is shown in Fig. 6(a). An example of the relation between the structuring element of the NOP $(x \circ \Gamma^N_d)$ and the structuring element used in each computational step is shown in Fig. 6(b). The computational structure shown in Fig. 6 is similar to that of an opening whose structuring element is decomposed into several smaller structuring elements [9]. However, the computational structure shown in Fig. 6 cannot be obtained from the theory of structuring element decomposition, since the NOP with $d_0$ in step 2 cannot be decomposed into an erosion followed by a dilation.

On the basis of the computational structure of the general NOP and NCP given by Proposition 7 and Fig. 6, the properties in Section 3.2 can be extended to the general NOP and NCP. In this section, we do not go through the details of all those extensions. As an example, we only show the proof of the translation invariance of the general NOP.
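The three steps of Eq. (19) can be sketched and compared against a brute-force evaluation of the definition on a tiny image. The sketch assumes that basic elements are placed at 4-connected positions, that the basic element d is given as a list of (row, column) offsets containing its reference pixel (0, 0), and that pixels outside the image are ignored; the function names and the 1 × 2 basic element in the usage line are hypothetical. The brute force enumerates every set of N connected positions, so it is only practical for very small images.

```python
import heapq

# Assumptions: element positions are 4-connected; d contains the offset (0, 0).
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def nop_at(x, i, j, n):
    """Single-pixel NOP at (i, j) by greedy best-first growth (basic element d0)."""
    rows, cols = len(x), len(x[0])
    frontier, seen = [(-x[i][j], i, j)], {(i, j)}
    lowest, size = x[i][j], 0
    while frontier and size < n:
        neg_value, r, c = heapq.heappop(frontier)
        lowest, size = min(lowest, -neg_value), size + 1
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in seen:
                seen.add((rr, cc))
                heapq.heappush(frontier, (-x[rr][cc], rr, cc))
    return lowest


def element_pixels(d, r, c, rows, cols):
    """In-image pixels of d_{r,c}, the basic element d located at (r, c)."""
    return [(r + a, c + b) for a, b in d if 0 <= r + a < rows and 0 <= c + b < cols]


def general_nop(x, d, n):
    """Eq. (19): erosion giving b of Eq. (15), single-pixel NOP, then the max of Eq. (18)."""
    rows, cols = len(x), len(x[0])
    b = [[min(x[p][q] for p, q in element_pixels(d, i, j, rows, cols))
          for j in range(cols)] for i in range(rows)]
    opened = [[nop_at(b, i, j, n) for j in range(cols)] for i in range(rows)]
    # Max over positions (r1, r2) whose element d_{r1,r2} covers (i, j).
    return [[max(opened[i - a][j - c] for a, c in d
                 if 0 <= i - a < rows and 0 <= j - c < cols)
             for j in range(cols)] for i in range(rows)]


def connected_position_sets(start, n, rows, cols):
    """All 4-connected sets of n in-image positions that contain `start`."""
    layer = {frozenset([start])}
    for _ in range(n - 1):
        grown = set()
        for cells in layer:
            for r, c in cells:
                for dr, dc in NEIGHBOURS:
                    q = (r + dr, c + dc)
                    if 0 <= q[0] < rows and 0 <= q[1] < cols and q not in cells:
                        grown.add(cells | {q})
        layer = grown
    return layer


def general_nop_brute_force(x, d, n):
    """Max over domains g of n connected basic elements covering (i, j) of min x on g."""
    rows, cols = len(x), len(x[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for cells in connected_position_sets((r, c), n, rows, cols):
                g = {p for pos in cells for p in element_pixels(d, *pos, rows, cols)}
                low = min(x[p][q] for p, q in g)
                for p, q in g:
                    out[p][q] = low if out[p][q] is None else max(out[p][q], low)
    return out


image = [[3, 8, 8, 1, 6],
         [2, 9, 7, 7, 5],
         [4, 1, 6, 9, 9],
         [5, 5, 0, 9, 2]]
d = [(0, 0), (0, 1)]  # hypothetical 1 x 2 basic element
print(general_nop(image, d, 3) == general_nop_brute_force(image, d, 3))  # True
```

With d = [(0, 0)] the first and third steps reduce to the identity, and `general_nop` coincides with the single-pixel NOP.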
Fig. 6. (a) The computational structure of the general NOP. (b) An example of the relation between the structuring element of the general NOP and the structuring elements used in each computation step.

Let d denote a general basic element and $d_0$ denote the basic element of a single pixel. Let $(f)_{s,t}$ denote the translation of the function f in brackets by (s, t). Based on the translation invariance of erosion and dilation, Property A.4 in the appendix and Fig. 6, we have
$$\begin{aligned} (x \circ \Gamma^N_d)_{s,t} &= [((x \ominus d^s) \circ \Gamma^N_{d_0}) \oplus d]_{s,t} \\ &= [(x \ominus d^s) \circ \Gamma^N_{d_0}]_{s,t} \oplus d \\ &= [(x \ominus d^s)_{s,t} \circ \Gamma^N_{d_0}] \oplus d \\ &= [(x_{s,t} \ominus d^s) \circ \Gamma^N_{d_0}] \oplus d \\ &= x_{s,t} \circ \Gamma^N_d. \end{aligned} \tag{20}$$
By the duality between closing and opening, the results obtained in Section 4.1 and in this section can easily be extended to the case of the NCP.

4.3. The computational complexity of the algorithms
The computational complexity of the algorithms depends heavily on the complexity of the geometrical structures of images. Since there is no suitable model to represent a natural image on a geometrical basis, a theoretical analysis of the computational complexity of the algorithms is generally not possible. In this paper, the computational complexity is measured experimentally. The results given in this section are only those of the algorithms developed in Section 4.1, where the basic element is a single pixel. Given the computational structure of the general NOP and NCP, we consider that this section nevertheless gives a complete picture of the computational complexity of the general NOP and NCP, since the first and third steps of the computation are usually very simple. Two test images, ‘Lena’ and ‘Toys’, are used to show how the computational complexity depends on the complexity of the images. ‘Toys’, shown in Fig. 7(b), is simpler than ‘Lena’, shown in Fig. 7(a). The size of the images is 256 × 256. Fig. 8(a) gives the computation time versus the size of the structuring elements. The results were measured on a SUN-3 workstation. In measuring the complexity of the arithmetic operations, we only count comparisons, since comparison is the main computation of the NOP and NCP. Fig. 8(b) gives the number of comparisons per pixel versus the size of the structuring elements. Fig. 8(a) and (b) show that the algorithms become faster as the structuring elements become larger. This can be explained by Propositions 3 and 4: when the structuring elements become larger, the computation at one pixel (i, j) determines the NOP and NCP values for more pixels, namely those in $S_{i,j}$ or neighboring $S_{i,j}$.

Fig. 7. The original images of (a) ‘Lena’ and (b) ‘Toys’.

Fig. 8. (a) The computation time. (b) The number of comparisons per pixel.

5. Application examples

Some application examples of the NOP and NCP have already been published in other papers [1,11]. In Ref. [1], the adaptive filters based on the NOP and NCP are compared with other filters in removing impulsive noise from monochrome images. In Ref. [11], a detailed study is given of the effect of adaptive filtering on the color appearance of natural color images. Different types of noise and images are used in Ref. [11]. The results showed the advantage of the adaptive filters based on the NOP and NCP over many other well-known filters. The NOP and NCP used in Refs. [1,11] are early versions whose basic element is a single pixel. The early NOP and NCP work well in noise filtering, but, as shown in this section, they may fail in some other areas of image processing. Nevertheless, this section first gives two examples of the early NOP and NCP to show their detail-preserving performance and their robustness to changes in the size of the structuring elements. Then we present an example that requires the basic element to be chosen according to the specified image processing task, rather than just being a single pixel.

5.1. Performance on synthetic image
The example given in this section shows the detail-preserving performance of the NOP and NCP on a synthetic image. The basic element in this section is one pixel. The synthetic image was introduced in Ref. [10], and is used in Ref. [6] to evaluate a number of detail-preserving ranked-order filters. The synthetic image shown in Fig. 9(a) is sampled from the function b(r) defined by
$$a(r) = \begin{cases} A \cos[u_0 r^2 / R] + 128, & r \le R/2, \\ A \cos[u_0 (R^2 - (r - R)^2) / R] + 128, & R/2 < r \le 3R/2, \end{cases}$$
$$b(r) = \begin{cases} 250, & a(r) \ge 250, \\ a(r), & 0 < a(r) < 250, \\ 0, & a(r) \le 0, \end{cases} \tag{21}$$
where r is the radius from the center, $u_0 = 3.135$, R = 160, and a large $A = 10^3$ is used to reduce the effect of the discontinuity of the circles in the image caused by the Moiré patterns.

Fig. 9. (a) The original synthetic image. (b) Error image by the max of openings and the min of closings. (c) Error image by the NOP and NCP.
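A sketch that samples Eq. (21), as transcribed above, on a square grid. The image size (256 × 256) and the position of the centre are assumptions made only for illustration, since they are not given in this section; values strictly between 0 and 250 are rounded to integers, which is also a choice of the sketch.

```python
import math


def synthetic_image(size=256, u0=3.135, R=160.0, A=1e3):
    """Sample b(r) of Eq. (21) on a size x size grid.

    Assumptions of this sketch: the grid size and the centre at
    (size / 2, size / 2) are illustrative; in-between values are rounded.
    """
    centre = size / 2.0
    image = []
    for i in range(size):
        row = []
        for j in range(size):
            r = math.hypot(i - centre, j - centre)
            if r <= R / 2:
                a = A * math.cos(u0 * r * r / R) + 128
            elif r <= 3 * R / 2:
                a = A * math.cos(u0 * (R * R - (r - R) ** 2) / R) + 128
            else:
                raise ValueError("Eq. (21) defines a(r) only for r <= 3R/2")
            if a >= 250:
                row.append(250)
            elif a <= 0:
                row.append(0)
            else:
                row.append(int(round(a)))
        image.append(row)
    return image


img = synthetic_image()
print(img[128][128], len(img), len(img[0]))  # 250 256 256
```

Because A is much larger than the clipping range, b(r) saturates at 0 or 250 except where the cosine is close to zero, so the sampled rings have sharp transitions.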
Fig. 9(b) shows the reversed absolute difference between the synthetic image and the image processed by the maximum of four openings and the minimum of four closings. The structuring elements of the four openings (closings) are of size N = 4, and are oriented at angles of 0°, 45°, 90° and 135°. The resulting MSE is $8.08 \times 10^{-2}$. Fig. 9(c) shows the reversed absolute difference between the synthetic image and the image processed by the NOP and NCP. The size of the structuring element of the NOP and NCP was 7 pixels. The resulting MSE is $3.68 \times 10^{-4}$. These two sizes of structuring elements were experimentally shown to be adequate for the corresponding operators to remove 10% impulsive noise. In Fig. 9, we observe that the NOP and NCP caused errors only at the four corners, and that the details in all other parts were completely preserved.

5.2. Effect of the size change of structuring element

The example in this section shows the robustness of the NOP and NCP to changes in the size of the structuring element. The basic element in this section is one pixel. In practice, a suitable size of the structuring element of a morphological filter is often chosen subjectively according to the type of noise and images. In this section, we investigate how sensitive the MSE is to changes in the size of the structuring elements for the two types of morphological operators mentioned in Section 5.1. The results shown in Fig. 10 are based on ‘Lena’ contaminated by 10% impulsive noise. The error before the minimum point in Fig. 10 is mainly caused by the remaining noise, and that after the minimum point mainly by the loss of details in the processed image. Fig. 10 shows that, beyond the minimum point, a wrong size of the structuring elements causes much less performance deterioration for the NOP and NCP than for the maximum of openings and the minimum of closings. This property of the NOP and NCP may ease the demand for an optimal size of the structuring element, and may thus make it easier to develop NOP and NCP operators with an adaptive structuring-element size.
Fig. 10. The effect of the size change of the structuring elements.
5.3. Performance on image decomposition

The application considered in this section is to extract the contours of large objects in an image. The requirement is that the extracted contour image should contain as few details, such as hair or fine grass, as possible, and that the extracted contours should match the original large objects, including the detailed parts of the objects, such as sharp corners. This type of image processing has been used in Ref. [8] to achieve image decomposition for coding. It may also find applications in pattern recognition and in other areas. Because of this requirement, fine details such as hair have to be removed before the required contours are extracted, since most edge-extracting approaches also pick up fine details. This section compares the NOP and NCP with two other morphological approaches used for removing details. Linear approaches are not considered, since they are known to cause large distortion of the edges and the detailed parts of the large objects in images. After the details are removed, the Sobel operator is used in all cases to extract the contours for comparison. The original image ‘Lena’ is shown in Fig. 7(a). The details in the image are defined as the objects smaller than 30 connected 2 × 2 basic elements, and the lines with width less than two pixels. Fig. 11(a) gives the edge image of ‘Lena’ without removing the details. Fig. 11(b) gives the contour image with the details removed by the NOP and NCP,
$$[[[(x \circ \Gamma^N_{d_0}) \bullet \Gamma^N_{d_0}] \circ \Gamma^N_d] \bullet \Gamma^N_d](i, j), \tag{22}$$
where N = 30, $d_0$ is a pixel and d is a basic element of 2 × 2 pixels. Two decomposition steps are used to obtain a smoother result. Fig. 11(c) shows the contour image with the details removed by opening–closing with one structuring element. The result is obtained with two decomposition steps. The structuring element in the first step is a square of 3 × 3 pixels. That in the second step is shown in Fig. 12(a); its size is 21 pixels. Fig. 11(d) shows the contour image with the details removed by opening–closing with a combination of four structuring elements. This decomposition also has two steps. The structuring elements of the first step are the compositions of a 2 × 2 structuring element and four 1-D structuring elements of 3 pixels. The definition of the composition of structuring elements can be found in Ref. [9]. The structuring elements of the second step are the compositions of a small structuring element and four 1-D structuring elements of 5 pixels, shown in Fig. 12(b). In all cases, the sizes of the structuring elements are chosen so that the resulting images have about the same entropy [8]. Comparing the four images in Fig. 11 shows that the NOP and NCP work well both in removing the fine details and in preserving the detailed parts of large objects.
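The contour-extraction step is only named in the text. A minimal sketch of the Sobel gradient magnitude, using the standard 3 × 3 Sobel kernels, might look as follows; the function name and the choice to set border pixels to 0 are assumptions, and any thresholding that may have been used to produce the contour images in Fig. 11 is not described here and is omitted.

```python
import math


def sobel_magnitude(x):
    """Gradient magnitude sqrt(gx^2 + gy^2) with the standard 3 x 3 Sobel kernels.

    Border pixels are left at 0 (an assumption of this sketch).
    """
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = (x[i - 1][j + 1] + 2 * x[i][j + 1] + x[i + 1][j + 1]
                  - x[i - 1][j - 1] - 2 * x[i][j - 1] - x[i + 1][j - 1])
            gy = (x[i + 1][j - 1] + 2 * x[i + 1][j] + x[i + 1][j + 1]
                  - x[i - 1][j - 1] - 2 * x[i - 1][j] - x[i - 1][j + 1])
            out[i][j] = math.hypot(gx, gy)
    return out


step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]  # a vertical step edge
edges = sobel_magnitude(step)
print(edges[2][1], edges[2][2], edges[2][3], edges[2][4])  # 0.0 40.0 40.0 0.0
```

In the pipeline of this section, such an operator would be applied to the output of Eq. (22) (or of the two opening–closing alternatives) rather than to the original image.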
Fig. 11. (a) The edge image of ‘Lena’ without removing details. (b) The edge image with details removed by the NOP and NCP. (c) The edge image with details removed by the opening–closing of one structuring element. (d) The edge image with details removed by the opening–closing with a combination of four structuring elements.
The contours in (b) match those in (a) very well, while (b) contains no fine details. In (c) there is obvious distortion of the contours of the detailed parts of large objects. As can be observed at the eyes, the shapes are distorted according to the shapes of the structuring elements. (d) shows that the problem with one structuring element is not significantly alleviated by using four structuring elements. The reason is that the shapes of the four structuring elements are only a very small part of all the possible shapes of that size. Thus, the ability of the operator to preserve the shapes of the detailed parts of large objects is very limited. In fact, this problem is shared by many other operators with a fixed operational window or with a limited ability to change the shapes of their operational windows. In contrast, the NOP and NCP can adapt the shapes of their structuring elements to all the possible shapes.

Fig. 12. (a) One structuring element. (b) Composition of four structuring elements.

6. Conclusions

In this paper, we proposed the NOP and NCP with generalized operational windows. Quite a few interesting results have been obtained in the study of the properties of the NOP and NCP. We would especially like to mention Property 4, which shows the necessity and possibility of systematically developing a geometric approach to image processing. On the basis of the properties obtained, fast algorithms are developed for the computation of the NOP and NCP. Our work has brought the computation time of a fully adaptive operator into the range of seconds, and still leaves much room for further improvement. The distinctive performance of the NOP and NCP is demonstrated through several examples. The results showed that, owing to their ability to handle image features as objects, the NOP and NCP are not only attractive for noise filtering, but also have great potential in areas such as coding and pattern recognition. We believe that our work in this area will not only offer useful tools, but also produce innovative ideas for image processing based on the geometric structures of images.

7. Summary

A new type of adaptive morphological operators was proposed in Ref. [1]. The operational windows of those operators can adapt their shapes according to the geometrical features of images and can take any connected shape of a given size. The adaptive morphological operators in Ref. [1] were designed to be as simple as possible, and their properties were not investigated. In this paper, the adaptive morphological operators are further extended to allow more freedom in forming their operational windows, which can adapt their shapes according to the local features of the processed images. The properties of these adaptive operators are also investigated. These properties lead to an interesting way of handling images based on their geometrical structure, and show the necessity and possibility of systematically developing a geometric approach to image processing. These properties also lead to the development of fast algorithms for the practical application of the adaptive operators. Our work has reduced the computation time of a fully adaptive operator to the range of seconds, and still shows promise of further improvement. The distinctive performance of the NOP and NCP operators is demonstrated through several examples. The results show that, owing to their ability to handle image features as objects, the NOP and NCP operators are not only attractive for noise filtering, but also have great potential in areas such as coding and pattern recognition.
Appendix A. Morphological operators with one or several structuring elements

For reference, we give a brief description of the morphological operators with one structuring element and of the morphological operators with a combination of a family of structuring elements. These operators are the basis for the development of the adaptive morphological operators in Section 3. Let B denote a structuring element. Since we only consider flat structuring elements, B can be expressed by its support domain $B \subset \mathbb{Z}^2$. Denote by $B^s = \{-b : b \in B\}$ the symmetric set of B, and by $B_{t_1,t_2}$ the translation of B by $(t_1, t_2)$, where $(t_1, t_2) \in \mathbb{Z}^2$. Denote the input image by x(i, j). The erosion $x \ominus B^s$ and dilation $x \oplus B^s$ can be expressed as [2]
$$(x \ominus B^s)(i, j) = \min_{(t_1,t_2) \in B_{i,j}} x(t_1, t_2), \tag{A.1}$$
$$(x \oplus B^s)(i, j) = \max_{(t_1,t_2) \in B_{i,j}} x(t_1, t_2), \tag{A.2}$$
where the reference pixel of the structuring element B can be any pixel in the domain of B. Opening $x \circ B$ and closing $x \bullet B$ are defined as [2]
$$x \circ B = (x \ominus B^s) \oplus B, \tag{A.3}$$
$$x \bullet B = (x \oplus B^s) \ominus B. \tag{A.4}$$
On the basis of Eqs. (A.1)–(A.4), opening and closing can also be expressed as follows:
$$(x \circ B)(i, j) = \max_{(s_1,s_2):\,(i,j) \in B_{s_1,s_2}} \Big[ \min_{(t_1,t_2) \in B_{s_1,s_2}} x(t_1, t_2) \Big], \tag{A.5}$$
$$(x \bullet B)(i, j) = \min_{(s_1,s_2):\,(i,j) \in B_{s_1,s_2}} \Big[ \max_{(t_1,t_2) \in B_{s_1,s_2}} x(t_1, t_2) \Big]. \tag{A.6}$$
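Eqs. (A.1)–(A.6) can be sketched directly for flat structuring elements given as lists of (row, column) offsets. The sketch assumes that each structuring element contains its reference pixel (0, 0) and that pixels falling outside the image are skipped in every minimum and maximum; the function names are hypothetical. `opening_direct` follows Eq. (A.5) and, under this border convention, agrees with the composition in Eq. (A.3).

```python
def erosion(x, B):
    """(x ⊖ B^s)(i, j) of Eq. (A.1): min of x over B_{i,j}; out-of-image pixels skipped."""
    rows, cols = len(x), len(x[0])
    return [[min(x[i + a][j + b] for a, b in B if 0 <= i + a < rows and 0 <= j + b < cols)
             for j in range(cols)] for i in range(rows)]


def dilation(x, B):
    """(x ⊕ B^s)(i, j) of Eq. (A.2): max of x over B_{i,j}; out-of-image pixels skipped."""
    rows, cols = len(x), len(x[0])
    return [[max(x[i + a][j + b] for a, b in B if 0 <= i + a < rows and 0 <= j + b < cols)
             for j in range(cols)] for i in range(rows)]


def reflect(B):
    """The symmetric set B^s = {-b : b in B}."""
    return [(-a, -b) for a, b in B]


def opening(x, B):
    """Eq. (A.3): x o B = (x ⊖ B^s) ⊕ B."""
    return dilation(erosion(x, B), reflect(B))


def closing(x, B):
    """Eq. (A.4): x • B = (x ⊕ B^s) ⊖ B."""
    return erosion(dilation(x, B), reflect(B))


def opening_direct(x, B):
    """Eq. (A.5): max over translates B_{s1,s2} covering (i, j) of the min of x on them."""
    rows, cols = len(x), len(x[0])
    out = [[None] * cols for _ in range(rows)]
    for s1 in range(rows):
        for s2 in range(cols):
            cells = [(s1 + a, s2 + b) for a, b in B
                     if 0 <= s1 + a < rows and 0 <= s2 + b < cols]
            low = min(x[r][c] for r, c in cells)
            for r, c in cells:
                out[r][c] = low if out[r][c] is None else max(out[r][c], low)
    return out


row = [[0, 5, 0, 5, 5, 0]]
pair = [(0, 0), (0, 1)]
print(opening(row, pair))         # [[0, 0, 0, 5, 5, 0]]
print(opening_direct(row, pair))  # [[0, 0, 0, 5, 5, 0]]
```

The isolated one-pixel peak is removed, while the two-pixel plateau, which a translate of the two-pixel structuring element fits into, is kept.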
Let $G^N_d$ denote a family of structuring elements formed by N connected basic elements d. The morphological operators combining $G^N_d$ are defined as the max of openings and the min of closings whose structuring elements are all the elements in $G^N_d$ [4]. Let $x \circ G^N_d$ and $x \bullet G^N_d$ denote, respectively, the max of openings and the min of closings whose structuring elements are all the elements in $G^N_d$. We have
$$(x \circ G^N_d)(i, j) = \max_{B^{(k)} \in G^N_d} \big[ (x \circ B^{(k)})(i, j) \big], \tag{A.7}$$
$$(x \bullet G^N_d)(i, j) = \min_{B^{(k)} \in G^N_d} \big[ (x \bullet B^{(k)})(i, j) \big]. \tag{A.8}$$
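Eqs. (A.7) and (A.8) can be sketched by computing each opening (closing) separately and then taking the pointwise maximum (minimum). The family `LINES_4` is a hypothetical example of four 4-pixel line segments at 0°, 45°, 90° and 135° (with rows increasing downwards), of the kind used for the comparison in Section 5.1. Out-of-image pixels are skipped, and every structuring element is assumed to contain its reference pixel (0, 0).

```python
def _extreme_filter(x, offsets, pick):
    """Apply `pick` (min or max) over the in-image pixels (i + a, j + b), (a, b) in offsets."""
    rows, cols = len(x), len(x[0])
    return [[pick(x[i + a][j + b] for a, b in offsets
                  if 0 <= i + a < rows and 0 <= j + b < cols)
             for j in range(cols)] for i in range(rows)]


def reflect(B):
    """The symmetric set B^s = {-b : b in B}."""
    return [(-a, -b) for a, b in B]


def opening(x, B):
    """x o B = (x ⊖ B^s) ⊕ B, Eq. (A.3)."""
    return _extreme_filter(_extreme_filter(x, B, min), reflect(B), max)


def closing(x, B):
    """x • B = (x ⊕ B^s) ⊖ B, Eq. (A.4)."""
    return _extreme_filter(_extreme_filter(x, B, max), reflect(B), min)


def max_of_openings(x, family):
    """Eq. (A.7): pointwise maximum of the openings by every element of the family."""
    results = [opening(x, B) for B in family]
    return [[max(res[i][j] for res in results) for j in range(len(x[0]))]
            for i in range(len(x))]


def min_of_closings(x, family):
    """Eq. (A.8): pointwise minimum of the closings by every element of the family."""
    results = [closing(x, B) for B in family]
    return [[min(res[i][j] for res in results) for j in range(len(x[0]))]
            for i in range(len(x))]


# Hypothetical family: four 4-pixel line segments at 0, 45, 90 and 135 degrees.
LINES_4 = [
    [(0, k) for k in range(4)],   # 0 degrees
    [(-k, k) for k in range(4)],  # 45 degrees (rows grow downwards)
    [(k, 0) for k in range(4)],   # 90 degrees
    [(k, k) for k in range(4)],   # 135 degrees
]

diagonal = [[9 if i == j else 0 for j in range(6)] for i in range(6)]
print(max_of_openings(diagonal, LINES_4) == diagonal)  # True
print(opening(diagonal, LINES_4[0])[2][2])             # 0
```

The one-pixel-wide diagonal line survives only because one member of the family happens to share its orientation; a thin structure of any other shape would be removed, which is the limitation discussed in the next paragraph.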
The existing way to compute Eq. (A.7) (or Eq. (A.8)) is first to compute each opening (closing) on the right-hand side of Eq. (A.7) (or Eq. (A.8)), and then to take the maximum (minimum). In this way, the number of structuring elements in $G^N_d$ is greatly limited by considerations of computational complexity. In practice, $G^N_d$ usually contains only a few structuring elements. Hence, the ability of such operators to preserve details and suppress artificial patterns is limited, since an image may contain far more than just a few patterns of significant details. In fact, this problem is shared by many other nonlinear filters that combine a number of fixed windows [6].

Property A.1 (Increasing).
$$x_1 \le x_2 \;\Rightarrow\; (x_1 \circ \Gamma^N_{d_0}) \le (x_2 \circ \Gamma^N_{d_0}), \tag{A.9}$$
$$x_1 \le x_2 \;\Rightarrow\; (x_1 \bullet \Gamma^N_{d_0}) \le (x_2 \bullet \Gamma^N_{d_0}). \tag{A.10}$$

Proof. The property can be proved on the basis of Propositions 1 and 2.
931
Proof. According to Proposition 1, the "rst inequality of Eq. (A.11) is obvious. To prove the second inequality of Eq. (A.11), denote S as the domain searched in step 1 i,j of Proposition 1 in the computation of the NOP (x " $N02 )(i, j) performed at (i, j). Then by Proposition 1, d (A.13) (x " $N02 )(i, j)" min x(s, t). d (s,t)|Si,j Choose a subset ¹ of connected N pixels in S , i,j 1 i,j which contains (i, j). Then,
C
C
DD
(x " $N01 )(i, j)"max max min (x(t , t )) d 1 2 1 B(k)|$Nd0 ((s1 ,s2 )>(i,j)|B(k)s1,s2 ) (t1 ,t2 )|B(k)s1,s2 .
(A.14) Then the second inequality of Eq. (A.11) can be proved on the basis of Eq. (A.13) and (A.14). Eq. (A.12) can be proved in a similar way. Property A.3 (Idempotent). )"(x " $N~i ) " $N0 ) (x " $N0 ) " $N~i d d0 d0 d "(x " $N0 ) d (x z $N0 ) z $N~i )"(x z $N~i ) z $N0 ) d d0 d0 d "(x z $N0 ), d where i"0,2, N!1.
(A.15)
(A.16)
Proof. We only give the proof of the second equality of Eq. (A.15), the rest of the property can be proved in a similar way. On the basis of Properties A.1 and A.2, we have ) " $N0 )(x " $N0 ). (A.17) (x " $N~i d d d0 In the following, we prove that the left-hand side of Eq. (A.17) is not smaller than the right-hand side. Let S denote the domain searched in step 1 of Proposition i,j 1 in the computation of (x " $N0 )(i, j) at (i, j). Then by d de"nition, (x " $N0 )(i, j)" min x(s, t). (A.18) d (s,t)|Si,j At an arbitrary pixel (s, t)3S , choose a connected N!i i,j pixels subset ¹ LS , which contains (s, t). Then, in the s,t i,j same way as that in Eq. (A.14), (x " $N~i )(s, t)* min x(s , t )* min x(s , t ). d0 1 1 2 2 (s1 ,t1 )|Ts,t (s2 ,t2 )|Si,j (A.19) Based on Eq. (A.19) and Proposition 1,
Property A.2 (Ordering). Suppose 0(N )N . Then, 1 2 x*(x " $N01 )*(x " $N02 ), d d
(A.11)
x)(x z $N01 ))(x z $N02 ) d d
(A.12)
(x " $N~i ) " $N0 )(i, j)* min (x"$N~i )(s, t) d0 d d0 (s,t)|Si,j * min x(s , t ). 2 2 (s2 ,t2 )|Si,j
(A.20)
F. Cheng, A.N. Venetsanopoulos / Pattern Recognition 33 (2000) 917}933
Eqs. (A.18) and (A.20) show that the left-hand side of Eq. (A.17) is not smaller than the right-hand side. Thus, the second equality of Eq. (A.15) is proved.

Property A.4 (Translation invariant). Let $x_{s,t}$ denote the translation of $x$ by $(s,t)$. Then
$$(x_{s,t} \circ \mathcal{B}^{N}_{d})(i-s, j-t) = (x \circ \mathcal{B}^{N}_{d})(i,j), \tag{A.21}$$
$$(x_{s,t} \bullet \mathcal{B}^{N}_{d})(i-s, j-t) = (x \bullet \mathcal{B}^{N}_{d})(i,j). \tag{A.22}$$
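Properties A.1 to A.3 can be checked numerically on a small image. The sketch below is a brute-force illustration under stated assumptions, not the authors' implementation: it assumes that the NOP value at (i, j) is the maximum, over every 4-connected N-pixel window containing (i, j), of the minimum of x on that window (the form of Eq. (A.14) with $\mathcal{B}^N_d$ taken to be all connected N-pixel shapes), and it obtains the NCP by the duality $x \bullet \mathcal{B} = -((-x) \circ \mathcal{B})$, which is also an assumption here. The function names are hypothetical, and exhaustive enumeration is only practical for tiny images and small N.

```python
def neighbours(p, shape):
    """4-connected neighbours of pixel p inside an image of the given shape."""
    r, c = p
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < shape[0] and 0 <= cc < shape[1]:
            yield (rr, cc)


def connected_windows(p, n, shape):
    """All 4-connected sets of n pixels that contain pixel p (exhaustive growth)."""
    level = {frozenset([p])}
    for _ in range(n - 1):
        level = {s | {q} for s in level for t in s
                 for q in neighbours(t, shape) if q not in s}
    return level


def nop(x, n):
    """Assumed NOP: max over connected n-pixel windows W containing (i, j) of min_W x."""
    shape = (len(x), len(x[0]))
    return [[max(min(x[r][c] for r, c in w)
                 for w in connected_windows((i, j), n, shape))
             for j in range(shape[1])]
            for i in range(shape[0])]


def ncp(x, n):
    """Assumed NCP, obtained by duality: negate, open, negate back."""
    negated = [[-v for v in row] for row in x]
    return [[-v for v in row] for row in nop(negated, n)]


def leq(a, b):
    """Pointwise a <= b for two images of equal shape."""
    return all(u <= v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


x = [[3, 7, 2, 5],
     [8, 1, 6, 4],
     [2, 9, 9, 3],
     [5, 4, 1, 7]]

# Property A.1 (increasing): raising the input cannot lower the opening.
x_plus = [[v + 1 for v in row] for row in x]
assert leq(nop(x, 3), nop(x_plus, 3))

# Property A.2 (ordering) with N1 = 2 <= N2 = 3, for the opening and the closing.
assert leq(nop(x, 3), nop(x, 2)) and leq(nop(x, 2), x)
assert leq(x, ncp(x, 2)) and leq(ncp(x, 2), ncp(x, 3))

# Property A.3 (idempotence) with N = 3 and i = 0, 1.
y = nop(x, 3)
assert nop(y, 3) == y and nop(y, 2) == y and nop(nop(x, 2), 3) == y
```

Translation invariance (Property A.4) is not exercised here, because on a finite image the border changes which windows exist.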
Proof of Property 1. Let $\{(i_k, j_k) \mid k = 1, \ldots, N\}$ denote the set of the $N$ pixels in $\phi_{q_1}$ ($\phi_{q_1}$ is defined in the section before Property 1) on which the image $x$ has the largest values. The proof is given for each of the four cases in turn.
1. According to Proposition 1, it is obvious.
2. When $|\psi_{q_1}| < N$ and $|\phi_{q_1}| \ge N$ ($\psi_{q_1}$ is defined in the section before Property 1), we consider the problem in two parts.
First, we consider the computation of the NOP $(x \circ \mathcal{B}^{N}_{d})(i,j)$ at $(i,j) \in \{(i_k,j_k) \mid k = 1,\ldots,N\}$. The pixels in $\{(i_k,j_k) \mid k = 1,\ldots,N\}$ are connected, since $T_{q_1}$ ($T_{q_1}$ is defined by Eq. (4)) contains only one tip. Hence, $\{(i_k,j_k) \mid k = 1,\ldots,N\}$ can be taken as the domain searched in step 1 of Proposition 1. Thus, for $(i,j) \in \{(i_k,j_k) \mid k = 1,\ldots,N\}$,
$$(x \circ \mathcal{B}^{N}_{d})(i,j) = \min_{(s,t)\in\{(i_k,j_k) \mid k=1,\ldots,N\}} x(s,t). \tag{A.23}$$
Secondly, we consider all the other pixels $(s,t) \in T_{q_1} \setminus \{(i_k,j_k) \mid k=1,\ldots,N\}$. By the definition of $T_{q_1}$, we can always find a path from $(s,t)$ to $\{(i_k,j_k) \mid k=1,\ldots,N\}$ along which the values of $x$ are not smaller than $x(s,t)$. Thus, according to Proposition 1, at those pixels
$$(x \circ \mathcal{B}^{N}_{d})(s,t) = x(s,t). \tag{A.24}$$
The proof of (2) is complete.
3. The proof can be based on Property A.3 and the proofs of (1) and (2).
4. The proof is similar to the second part of the proof of (2).

The basic algorithm

The basic algorithm is for the calculation of the simple NOP described in Ref. [1]. Define $S_{i,j}$ as the domain described in Step 1 of Proposition 1, with the basic element being a single pixel. The basic algorithm has three steps. Step 0 is an initialization. Step 1 searches the neighbors of the pixels already included in $S_{i,j}$ for the candidate pixels considered in Step 2. In Step 2, we determine which of the pixels picked up in Step 1 can be included in $S_{i,j}$ according to the conditions in Proposition 1. If the number of pixels included in $S_{i,j}$ is still less than the given size of $S_{i,j}$, the calculation goes back to Step 1. Steps 1 and 2 constitute a search cycle of the algorithm. Before describing the search procedure, we define a number of buffers and the corresponding counters.

(1) $S_{i,j}$ contains the pixels included in the operational window so far. Counter $M$ indicates how many pixels are included in $S_{i,j}$. $x_0$ denotes the minimum value of image $x$ at the pixels included in $S_{i,j}$ so far.
(2) Buffer $\{a_n\}$ contains the pixels included in $S_{i,j}$ during the previous search cycle. The corresponding pixel counter is $AN$.
(3) Buffer $\{b_n\}$ contains the pixels that were candidates but were not included in $S_{i,j}$ in the previous search cycle. The corresponding pixel counter is $BN$.
(4) Buffer $\{c_n\}$ contains the neighboring pixels of $\{a_n\}$. The corresponding pixel counter is $CN$.
(5) Buffer $\{d_n\}$ contains $N - M$ pixels chosen from $\{b_n\}$ and $\{c_n\}$, which correspond to the $N - M$ largest values of image $x$ at the pixels in $\{b_n\}$ and $\{c_n\}$, where $N$ is the given size of the operational window. Pixels in $\{d_n\}$ are the candidates in the current search cycle. The corresponding pixel counter is $DN$.

The current position of the calculation is at pixel $(i,j)$ in image $x$. The search procedure is as follows.

Step 0: Pixel $(i,j)$ is included. Assign the initial values: $S_{i,j}$ contains 1 pixel, so assign $M = 1$ and $x_0 = x(i,j)$. Assign pixel $(i,j)$ to $a_1$; $\{a_n\}$ contains 1 pixel, so assign $AN = 1$. $\{b_n\}$, $\{c_n\}$ and $\{d_n\}$ are empty, so assign $BN = CN = DN = 0$.

Step 1: Assign the neighboring pixels of $a_i$, $i = 1, \ldots, AN$, to $\{c_n\}$. Use a check-board to record which pixels have been searched, so that pixels already searched in the computation at $(i,j)$ are not picked up again. Assign the number of pixels in $\{c_n\}$ to counter $CN$. Choose the $N - M$ largest values from $x(b_i)$ and $x(c_j)$, $i = 1, \ldots, BN$, $j = 1, \ldots, CN$. Order these $N - M$ values and assign the corresponding $N - M$ pixels to $d_i$ so that $x(d_i) \ge x(d_{i+1})$. Assign $DN = N - M$.

Step 2: Comparing $x(d_1)$ with $x_0$, we have three cases:
1. If $x(d_1) < x_0$, then assign $x_0 = x(d_1)$, $a_1 = d_1$, $AN = 1$; $b_i = d_{i+1}$ for $i = 1, \ldots, DN - 1$; $BN = DN - 1$ and $M = M + 1$. If $M = N$, assign $(x \circ B_N)(i,j) = x_0$ and quit. If $M < N$, go to Step 1.
2. If $x(d_{DN}) \ge x_0$, then assign $(x \circ B_N)(i,j) = x_0$ and quit.
3. If $x(d_k) \ge x_0$ and $x(d_{k+1}) < x_0$ for some $1 \le k < DN$, assign $a_i = d_i$ for $i = 1, \ldots, k$; $AN = k$; $b_i = d_{i+k}$ for $i = 1, \ldots, DN - k$; $BN = DN - k$ and $M = M + k$; go to Step 1.
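Steps 0 to 2 grow $S_{i,j}$ outward from (i, j), admitting the largest-valued candidates first and keeping the running minimum $x_0$. The sketch below expresses that greedy growth with a max-heap of frontier pixels instead of the buffers $\{a_n\}$ to $\{d_n\}$. It is a simplification under stated assumptions (4-connectivity, a single-pixel basic element, at least N pixels reachable from (i, j)), not a transcription of the authors' procedure; `nop_greedy` and `nop_image` are hypothetical names.

```python
import heapq


def nop_greedy(x, i, j, n):
    """Grow a connected region of n pixels from (i, j), always admitting the
    largest-valued frontier pixel, and return the minimum of x over the region
    (the running minimum x0 of the search procedure)."""
    rows, cols = len(x), len(x[0])
    seen = {(i, j)}            # plays the role of the check-board
    frontier = []              # max-heap of candidates stored as (-value, row, col)
    x0, m = x[i][j], 1         # running minimum and pixel counter M (Step 0)

    def push_neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in seen:
                seen.add((rr, cc))
                heapq.heappush(frontier, (-x[rr][cc], rr, cc))

    push_neighbours(i, j)                  # Step 1: candidates around (i, j)
    while m < n:
        if not frontier:
            raise ValueError("fewer than n pixels are reachable from (i, j)")
        neg_value, r, c = heapq.heappop(frontier)
        x0 = min(x0, -neg_value)           # Step 2: the minimum can only decrease
        m += 1
        push_neighbours(r, c)              # new candidates next to the admitted pixel
    return x0


def nop_image(x, n):
    """Apply nop_greedy at every pixel of a 2-D list image."""
    return [[nop_greedy(x, i, j, n) for j in range(len(x[0]))] for i in range(len(x))]
```

For example, `nop_image([[5, 1], [4, 3]], 2)` gives `[[4, 1], [4, 3]]`. For this max-min problem, admitting the largest frontier pixel first can be shown to reach the same value as an exhaustive search over connected N-pixel windows containing (i, j): the greedy region never needs to leave the connected set of pixels whose values are at least the optimum.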
References
[1] F. Cheng, A.N. Venetsanopoulos, Adaptive morphological filters for image processing, IEEE Trans. Image Process. 1 (4) (1992).
[2] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York, 1982.
[3] P. Maragos, R.W. Schafer, Morphological systems for multidimensional signal processing, Proc. IEEE 78 (4) (1990) 690–709.
[4] R.L. Stevenson, G.R. Arce, Morphological filters: statistics and further syntactical properties, IEEE Trans. CAS 34 (1987).
[5] I. Pitas, A.N. Venetsanopoulos, Nonlinear Digital Filters, Kluwer Academic Publishers, Dordrecht, 1990.
[6] G.R. Arce, R.E. Foster, Detail-preserving ranked-order based filters for image processing, IEEE Trans. ASSP 37 (1) (1989) 83–98.
[7] E.R. Dougherty, Minimal search for the optimal mean-square digital gray-scale morphological filter, in: M. Kunt (Ed.), Proceedings of Visual Communications and Image Processing '90, SPIE Vol. 1360, 1990, pp. 214–225.
[8] F. Cheng, A.N. Venetsanopoulos, Fast, adaptive morphological decomposition for image compression, in: Proceedings of the Conference on Information Science and Systems, Baltimore, USA, March 1991.
[9] X. Zhuang, R.M. Haralick, Morphological structuring element decomposition, Computer Vision, Graphics Image Process. 35 (1986) 370–382.
[10] T. Thong, Digital image processing test patterns, IEEE Trans. ASSP 31 (1983).
[11] P. Deng-Wong, F. Cheng, A.N. Venetsanopoulos, Adaptive morphological filters for color image enhancement, J. Intelligent Robotic Systems 15 (1996) 181–207.
About the Author: FULIN CHENG received the B.E. degree in radio engineering from the South China University of Technology, China, in 1982, and the M.E. and Ph.D. degrees in electrical engineering from Kyushu University, Japan, in 1986 and 1989, respectively. He worked at the University of Toronto, Canada, as a research assistant from 1989 to 1992. He is now with Zenith Electronics Corp., USA. His research interests include multi-channel and multi-dimensional systems and signal processing, image and video processing, and nonlinear adaptive filtering, as well as servers and networks for video on demand.

About the Author: ANASTASIOS N. VENETSANOPOULOS (SM'79–F'88) received the Dipl. Eng. degree from the National Technical University of Athens (NTU), Greece, in 1965, and the M.S., M.Phil., and Ph.D. degrees in electrical engineering from Yale University, New Haven, CT, in 1966, 1968 and 1969, respectively. He joined the University of Toronto, Toronto, Ont., Canada, in September 1968, where he has been a Professor in the Department of Electrical and Computer Engineering since 1981. He has served as Chairman of the Communication Group and Associate Chairman of the Department of Electrical Engineering. He was on research leave at the Federal University of Rio de Janeiro, Brazil, the Imperial College of Science and Technology, London, U.K., the National Technical University of Athens, the Swiss Federal Institute of Technology, Lausanne, Switzerland, and the University of Florence, Italy, and was Adjunct Professor at Concordia University, Montreal, P.Q., Canada. He has served as Lecturer in 130 short courses to industry and continuing education programs, and as Consultant to several organizations. His general research interests include linear M-D and nonlinear filters, processing of multispectral (color) images and image sequences, telecommunications, and image compression. In particular, he is interested in the development of efficient techniques for multispectral image transmission, restoration, filtering, and analysis. He is a contributor to 24 books, is co-author of Nonlinear Filters in Image Processing: Principles and Applications (Boston: Kluwer) and Artificial Neural Networks: Learning Algorithms, Performance Evaluation and Applications (Boston: Kluwer), and has published over 500 papers on digital signal and image processing and digital communications. Dr. Venetsanopoulos has served as Chairman on numerous boards, councils, and technical conference committees, including IEEE committees such as the Toronto Section (1977–1979) and the IEEE Central Canada Council (1980–1982). He was President of the Canadian Society for Electrical Engineering and Vice President of the Engineering Institute of Canada (EIC) (1983–1986). He has been a Guest Editor or Associate Editor for several IEEE journals, and Editor of the Canadian Electrical Engineering Journal (1981–1983). He is a member of the IEEE Communications, Circuits and Systems, Computer, and Signal Processing Societies, as well as a member of Sigma Xi, the Technical Chamber of Greece, the European Association of Signal Processing, and the Association of Professional Engineers of Ontario (APEO) and Greece. He was elected a Fellow of the IEEE "for contributions to digital signal and image processing", is a Fellow of EIC, and was awarded an Honorary Doctorate from the National Technical University of Athens for his "contribution to engineering" in October 1994.
Pattern Recognition 33 (2000) 935–944
Morphological regularization neural networks

Paul D. Gader (a,*), Mohamed A. Khabou (b), Alexander Koldobsky (b)

(a) Department of Computer Engineering and Computer Science, 201 EBW, University of Missouri–Columbia, Columbia, MO 65211, USA
(b) Mathematics and Statistics Department, University of Texas at San Antonio, USA

Received 28 December 1998; received in revised form 2 May 1999; accepted 23 June 1999
Abstract
In this paper we establish a relationship between regularization theory and morphological shared-weight neural networks (MSNN). We show that a certain class of morphological shared-weight neural networks with no hidden units can be viewed as regularization neural networks. This relationship is established by showing that this class of MSNNs are solutions of regularization problems, which requires deriving the Fourier transforms of the min and max operators. These Fourier transforms are derived using generalized functions, because they are defined only in that sense. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Morphology; Morphological shared-weight neural network; Regularization theory; Regularization network; Hit–miss transform
1. Introduction
Morphological shared-weight neural networks (MSNN) were introduced by Won et al. [1] and have been used in many automatic target recognition (ATR) and handwriting recognition applications [1–4]. Previously published results suggest that MSNNs perform better than standard shared-weight neural networks (SSNN), either in terms of faster training in the case of digit recognition, or faster training and better detection versus false alarm rates in the case of target detection [2,5]. This suggests that MSNNs generalize better than SSNNs. Generalization is a measure of how well a trained network performs on a test data set that has not been used in the training process. Many techniques exist to improve the generalization capability of a neural network by imposing predefined constraints on its weights. Such techniques include regularization, which adds a term to the cost function to reduce the effect of non-useful weights or to impose a priori knowledge on the structure of the neural network [6–10].

In this paper we establish a relationship between regularization theory and a class of MSNN with no hidden units. We call such neural networks morphological regularization neural networks (MRNN). This relationship is established by showing that this class of MSNNs are solutions of regularization problems. The paper is organized as follows. First, we briefly introduce gray-scale erosion, dilation, the hit–miss transform, and the MSNN structure. Second, we present an overview of regularization theory. Third, we derive the Fourier transforms of the min and max operators. Fourth, we establish the relationship between MSNN and regularization theory and show some practical applications of the MRNN. Finally, we present our conclusions.
2. Morphological shared-weight neural networks
* Corresponding author. Tel.: +1-573-882-3644; fax: +1-573-882-8318.
E-mail address: [email protected] (P.D. Gader)
Before we describe the MSNN structure, we briefly explain the basic morphological operations of gray-scale erosion, dilation, and the hit–miss transform. More detailed explanations of these and other morphological operations can be found in the literature [11].
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00156-9
P.D. Gader et al. / Pattern Recognition 33 (2000) 935}944
2.1. Gray-scale morphology
The basic morphological operations of erosion and dilation of a gray-scale image $f$ by a structuring element (SE) $g$ are defined as
$$\text{erosion:}\quad (f \ominus g)(x) = \min\{ f(z) - g_x(z) : z \in D[g_x] \}, \tag{1}$$
$$\text{dilation:}\quad (f \oplus g)(x) = \max\{ f(z) - g^*_x(z) : z \in D[g^*_x] \}, \tag{2}$$
where $g_x(z) = g(z - x)$, $g^*(z) = -g(-z)$ and $D[g]$ is the domain of $g$. The gray-scale hit–miss transform of $f$ by a pair of structuring elements $(h, m)$ is defined by
$$\text{hit-miss:}\quad f \otimes (h, m) = (f \ominus h) - (f \oplus m^*). \tag{3}$$

Fig. 1. Example of gray-scale erosion, dilation and hit–miss transform.
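A minimal NumPy sketch of Eqs. (1) and (3) on 2-D arrays follows. It assumes that the domain D[g] is the full rectangular support of the array `g`, and it evaluates only positions where the shifted support lies inside the image ("valid" positions, so the output is smaller than the input). For the dilation term of Eq. (3), substituting $g = m^*$ into Eq. (2) and using $(m^*)^* = m$ gives $(f \oplus m^*)(x) = \max\{f(z) - m_x(z) : z \in D[m_x]\}$, which is what `dilate_reflected` computes. The function names are hypothetical; `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode(f, g):
    """Gray-scale erosion, Eq. (1): min over each window of f(z) - g_x(z)."""
    windows = sliding_window_view(f, g.shape)      # shape (H-kh+1, W-kw+1, kh, kw)
    return (windows - g).min(axis=(-2, -1))


def dilate_reflected(f, m):
    """Dilation by m*, as needed in Eq. (3): max over each window of f(z) - m_x(z)."""
    windows = sliding_window_view(f, m.shape)
    return (windows - m).max(axis=(-2, -1))


def hit_miss(f, h, m):
    """Gray-scale hit-miss transform, Eq. (3): erosion by h minus dilation by m*."""
    return erode(f, h) - dilate_reflected(f, m)


f = np.arange(1, 10, dtype=float).reshape(3, 3)    # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = np.zeros((2, 2))                             # a flat 2 x 2 structuring element
eroded = erode(f, flat)
fit_score = hit_miss(f, flat, flat)
```

With a flat 2 x 2 element, `eroded` is `[[1, 2], [4, 5]]` and the hit-miss output with h = m is -4 everywhere (window minimum minus window maximum), so higher values correspond to flatter, better-fitting neighbourhoods.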
The hit–miss transform measures how a shape $h$ fits under $f$ using erosion and how a shape $m$ fits above $f$ using dilation [1]. High values indicate good fits (Fig. 1).

2.2. MSNN structure
An MSNN, $W$, is composed of two cascaded subnetworks, called stages: a feature extraction stage $F$ followed by a feed-forward stage $C$, i.e., $W = (F, C)$. The feature extraction stage $F$ is composed of one or more layers called feature layers. Each feature layer is composed of one or more feature maps. Each feature map has local, translation-invariant connections via hit–miss structuring elements (or kernels) to the previous feature layer. The kernels in the feature extraction layers perform hit–miss transforms on their inputs. The nodes of the last feature extraction layer are the inputs to the feed-forward stage $C$ (see Fig. 2). For example, if we assume one feature extraction layer with $n$ feature maps $M_{1i}$, $i = 1, \ldots, n$, and one hidden layer in $C$ with $m$ hidden nodes, we can write the operations of the MSNN as follows:

Step 1: Compute the feature map values
$$M_{1i} = A \otimes (h_{1i}, m_{1i}), \tag{4}$$
where $A$ is the input image, $(h_{1i}, m_{1i})$ are the hit–miss structuring elements for feature map $M_{1i}$, $i = 1, \ldots, n$, and $\otimes$ indicates the hit–miss transform.

Step 2: Compute the hidden layer image
$$H_j = s\Big(\sum_i M_{1i} * K_{ij}\Big), \tag{5}$$
where $s(x) = 1/(1 + e^{-x})$ is the sigmoid function, $j = 1, \ldots, m$, $K_{ij}$ are the weights connecting feature map $M_{1i}$ to hidden unit $j$, and $*$ indicates convolution.

Step 3: Compute the output image
$$O = s\Big(\sum_j w_j H_j\Big). \tag{6}$$

Fig. 2. MSNN architecture.

The parameters to be determined by training are the hit–miss structuring elements $(h_{1i}, m_{1i})$, the convolution kernels $K_{ij}$ and the output weights $w_j$. In particular, if there is no hidden layer, the output can be written as
$$O = s\Big(\sum_i w_i M_{1i}\Big) = s\Big(\sum_i w_i \big(A \otimes (h_{1i}, m_{1i})\big)\Big). \tag{7}$$
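For the no-hidden-layer case, Eq. (7) is a sigmoid applied to a weighted sum of hit-miss feature maps. The sketch below evaluates it at "valid" positions only; it assumes each pair $(h_{1i}, m_{1i})$ shares one rectangular shape, and `msnn_output` is a hypothetical name. Training of the structuring elements and weights is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def hit_miss(a, h, m):
    """A (x) (h, m): window min of (a - h) minus window max of (a - m), Eq. (3)."""
    win = sliding_window_view(a, h.shape)
    return (win - h).min(axis=(-2, -1)) - (win - m).max(axis=(-2, -1))


def sigmoid(v):
    """s(x) = 1 / (1 + exp(-x)), as in Eq. (5)."""
    return 1.0 / (1.0 + np.exp(-v))


def msnn_output(a, kernels, weights):
    """Eq. (7): O = s(sum_i w_i (A (x) (h_1i, m_1i))) for an MSNN with no hidden layer.
    `kernels` is a list of (h, m) pairs; `weights` holds one w_i per pair."""
    total = sum(w * hit_miss(a, h, m) for (h, m), w in zip(kernels, weights))
    return sigmoid(total)


image = np.zeros((4, 4))
image[1:3, 1:3] = 1.0                      # a small bright square
z = np.zeros((2, 2))                       # flat hit and miss elements
output = msnn_output(image, [(z, z)], [1.0])
```

Inside the square, every 2 x 2 window is constant, so the hit-miss value is 0 and the output is s(0) = 0.5; windows straddling the edge give -1 and an output of s(-1).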
3. Regularization theory
The general problem of learning a mapping can be described as follows. Given input–output pairs $(x_i, d_i)$, where $i = 1, \ldots, N$, $x_i \in \mathbb{R}^n$ and $d_i \in \mathbb{R}$, one would like to find a function $f$ such that $f(x_i) = d_i$ for $i = 1, \ldots, N$. In target recognition, the inputs $x_i$ could represent samples of targets (or features extracted from targets) or background/clutter, and the outputs $d_i$ are generally 1 for targets and 0 for background/clutter. The problem of learning a smooth mapping from data samples is ill-posed in the sense that the reconstructed mapping is not unique [8–13]. Constraints can be imposed on the mapping to make the problem well-posed. Typical constraints are smoothness and piecewise smoothness. This technique, which uses constraints to transform an ill-posed problem into a well-posed one, is called regularization. Regularization is a methodology for learning function
approximations that uses objective functions of the form
$$\mathcal{E}(f) = \frac{1}{2}\sum_{i=1}^{N} \big(f(x_i) - d_i\big)^2 + \frac{\lambda}{2}\,\|Pf\|^2 = \mathcal{E}_s(f) + \lambda\,\mathcal{E}_c(f), \tag{8}$$
where $\lambda \in (0, \infty)$ is called the regularization parameter and $P$ is a smoothing operator. The operator $P$ is also referred to as a stabilizer, in the sense that it stabilizes the function $f$ by making it smooth. The functional $\mathcal{E}_s(f)$ is the standard error term that measures the distance between the desired output $d_i$ and the value of the function $f$ at $x_i$, and $\mathcal{E}_c(f)$ is the regularizing term that embeds the a priori constraints on $f$ and hence can make the mapping problem well-posed.

By definition, the Green's function $g(x, x_i)$ associated with an operator $P$ and centered at $x_i$ satisfies
$$P^*P\,g(x, x_i) = \delta(x - x_i), \tag{9}$$
where $P^*$ is the adjoint operator of $P$. The solution to Eq. (8) is
$$f(x) = \frac{1}{\lambda}\sum_{i=1}^{N} [d_i - f(x_i)]\,g(x - x_i) = \sum_{i=1}^{N} w_i\,g(x - x_i), \tag{10}$$
where $w_i = (1/\lambda)[d_i - f(x_i)]$. To find $w_i$ for $i = 1, \ldots, N$, let $F = [f(x_1), f(x_2), \ldots, f(x_N)]^T$, $W = [w_1, w_2, \ldots, w_N]^T$,
$$G = \begin{bmatrix} g(x_1, x_1) & \cdots & g(x_1, x_N) \\ \vdots & \ddots & \vdots \\ g(x_N, x_1) & \cdots & g(x_N, x_N) \end{bmatrix}$$
and $D = [d_1, d_2, \ldots, d_N]^T$. Eq. (10) can be rewritten in matrix form as $F = GW$, where $W = (1/\lambda)(D - F)$. This yields
$$W = (G + \lambda I)^{-1} D, \tag{11}$$
which makes $f(x)$ well defined when $G + \lambda I$ is invertible [12].

The smoothness of a function can also be defined in the frequency domain by the content of its Fourier transform. One function is said to be smoother than another if it has less energy at higher frequencies. The high-frequency energy of a function can be measured by first high-pass filtering the function and then measuring the energy of the filtered result. This suggests another, equivalent definition of a smoothing stabilizer in the frequency domain and an alternative way of posing the regularization problem. In this formulation, the problem of regularization can be posed as finding a function $f(x)$ that minimizes
$$\mathcal{E}(f) = \frac{1}{2}\sum_{i=1}^{N} \big(f(x_i) - d_i\big)^2 + \frac{\lambda}{2}\,\phi[f], \tag{12}$$
where $\phi[f] = \int_{\mathbb{R}^n} |\tilde f(s)|^2 / \tilde g(s)\, ds$ and the tilde indicates the Fourier transform. The function $\tilde g(s)$ tends to zero as $\|s\| \to \infty$, i.e. $\tilde g(s)$ is a low-pass filter, and so $1/\tilde g(s)$ is a high-pass filter. The solution to Eq. (12) is shown to be equal to that given by Eq. (10) [8–10].

4. Regularization theory and MSNN
In this section we show that some types of MSNN can be viewed as regularization networks by finding the Fourier transforms of the min and max operators and showing that erosion, dilation and the hit–miss transform can be viewed as Green's functions. The discussion requires the use of generalized functions, since the Fourier transforms of the min and the max are only defined in that sense [14].

4.1. Max/min operators as Green's functions
We first discuss simple one-dimensional examples and then the general $n$-dimensional case. Consider the class of stabilizers $P$ considered by Duchon and Meinguet [10,12] in their approach to multivariate interpolation:
$$\|Pf\|^2 = \|O^m f\|^2 = \sum_{i_1,\ldots,i_m=1}^{n} \int_{\mathbb{R}^n} dx\,\big(\partial_{i_1\cdots i_m} f(x)\big)^2, \tag{13}$$
where $\partial_{i_1\cdots i_m} = \partial^m/(\partial x_{i_1}\cdots\partial x_{i_m})$ and $m \ge 1$. In the one-dimensional case where $m = n = 1$, i.e. $P(f) = P^*(f) = \partial f/\partial x$, the known Green's function is $g(x) = |x|$. We claim that $g(x) = \max(x, 0)$ and $g(x) = \min(x, 0)$ are also solutions. To prove this claim, notice that $h(x) = x$ is a solution to the homogeneous problem $P^*P\,g(x, x_i) = 0$. Since $\max(x, 0) = \tfrac{1}{2}(|x| + x)$ and $\min(x, 0) = \tfrac{1}{2}(x - |x|)$, it is easy to prove that $g(x) = \max(x, 0)$ and $g(x) = \min(x, 0)$ are then solutions to $P^*P\,g(x, x_i) = \delta(x - x_i)$. (As for $|x|$, this holds up to sign and a constant factor: the second derivatives of $|x|$, $\max(x, 0)$ and $\min(x, 0)$ are $2\delta(x)$, $\delta(x)$ and $-\delta(x)$, respectively.)

As an illustration, we used the three functions $|x|$, $\max(x, 0)$ and $\min(x, 0)$ as Green's functions to approximate a one-dimensional signal based on the 20 data samples shown in Fig. 3. Notice that the samples are not equidistant and that some regions of the signal domain do not contain any samples. We wanted to see how well each function generalizes in the regions where no training samples were available, and we also wanted to observe the effect of the data scattering on the approximated signal. The approximated signals shown in Fig. 4 were obtained using $\lambda = 10^{-3}$. As can be seen in Fig. 4, all three functions were equally able to approximate the signal and handle its discontinuities. The signal was approximated equally well in the regions where training data points were dense and in the regions where they were sparse. All three functions were equally able to generalize in regions where no training data was available. This is of special value in applications where training data is limited.

Fig. 3. Data samples used to approximate the one-dimensional signal.

In the general $n$-dimensional case, based on the theory behind Eq. (12), to show that $\max(x_1, \ldots, x_n)$ and $\min(x_1, \ldots, x_n)$ are solutions to a regularization problem of the form given in Eq. (12), we need to compute their Fourier transforms. The Fourier transforms of a more general class of filters have been derived (in the sense of distributions) by Dilworth et al. [14]. The following is a special case of that more general theorem.

Theorem. The Fourier transform of the function $f(\max(x_1, \ldots, x_n))$ at the point $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$ with non-zero coordinates is equal to
$$\widetilde{f(\max(x_1,\ldots,x_n))}(u) = i^{n-1}\,\frac{u_1 + \cdots + u_n}{u_1 \cdots u_n}\,\tilde f(u_1 + \cdots + u_n),$$
where the tilde indicates the Fourier transform.

Proof. First, let us define the integral
$$\int_{-\infty}^{x_0} \exp(-iux)\,dx \tag{14}$$
as the Fourier transform (in the sense of distributions, because the integral diverges) of the function $X(x) = 1$ if $x < x_0$ and $X(x) = 0$ if $x \ge x_0$. To compute this Fourier transform, we use the connection between differentiation and the Fourier transform:
$$\widetilde{f'}(u) = iu\,\tilde f(u). \tag{15}$$
The derivative of the function $X$ is the negative Dirac measure $-\delta_{x_0}$ at the point $x_0$, i.e. a mass of $-1$ located at $x_0$. The Fourier transform of $-\delta_{x_0}$ is equal to
$$-\tilde\delta_{x_0}(u) = -\int_{-\infty}^{\infty} \exp(-iux)\,\delta_{x_0}(x)\,dx = -\exp(-iux_0). \tag{16}$$
Because of Eq. (15), we have $\tilde X(u) = -\exp(-iux_0)/(iu)$, so integral (14) is equal to
$$\int_{-\infty}^{x_0} \exp(-iux)\,dx = \frac{-\exp(-iux_0)}{iu}. \tag{17}$$
Now we can compute the Fourier transform of the function $f(\max(x_1, \ldots, x_n))$, where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and $f$ is a function on $\mathbb{R}$. We divide $\mathbb{R}^n$ into $n$ parts, in each of which one of the coordinates is greater than all the others, and get
$$\begin{aligned}
\widetilde{f(\max(x_1,\ldots,x_n))}(u_1,\ldots,u_n)
&= \int_{\mathbb{R}^n} f(\max(x_1,\ldots,x_n))\,\exp\big(-i(u_1x_1 + \cdots + u_nx_n)\big)\,dx \\
&= \sum_{k=1}^{n} \int_{x_k > x_1,\ldots,x_{k-1},x_{k+1},\ldots,x_n} f(x_k)\,\exp\big(-i(u_1x_1 + \cdots + u_nx_n)\big)\,dx \\
&= \sum_{k=1}^{n} \int_{-\infty}^{\infty} f(x_k)\exp(-iu_kx_k) \prod_{m=1,\ldots,n,\; m\ne k} \int_{-\infty}^{x_k} \exp(-iu_mx_m)\,dx_m\,dx_k .
\end{aligned}$$
Now we use Eq. (17) to compute the integrals in the product. The latter expression becomes
$$i^{n-1}\sum_{k=1}^{n} \frac{1}{u_1\cdots u_{k-1}u_{k+1}\cdots u_n} \int_{-\infty}^{\infty} f(x_k)\exp\big(-i(u_1 + \cdots + u_n)x_k\big)\,dx_k = i^{n-1}\,\frac{u_1 + \cdots + u_n}{u_1\cdots u_n}\,\tilde f(u_1 + \cdots + u_n). \tag{18}$$
(QED)

Fig. 4. Approximation of the one-dimensional signal of Fig. 3 using $\lambda = 10^{-3}$ and (a) $g(x) = |x|$, (b) $g(x) = \max(x, 0)$, (c) $g(x) = \min(x, 0)$.
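The one-dimensional experiment of Figs. 3 and 4 amounts to building $G$ from a chosen Green's function, solving Eq. (11), and evaluating Eq. (10). The sketch below does this for $g(x) = |x|$ with $\lambda = 10^{-3}$; the sample positions and targets are made-up stand-ins rather than the data of Fig. 3, and `fit`/`predict` are hypothetical names. Passing `lambda v: np.maximum(v, 0)` or `lambda v: np.minimum(v, 0)` as `g` gives the other two variants; with those kernels and sorted samples, $G$ is strictly triangular, so in this sketch the solve becomes badly conditioned for very small $\lambda$.

```python
import numpy as np


def fit(xs, ds, g, lam):
    """Eq. (11): W = (G + lam I)^{-1} D with G_ij = g(x_i - x_j)."""
    G = g(xs[:, None] - xs[None, :])
    return np.linalg.solve(G + lam * np.eye(len(xs)), ds)


def predict(x, xs, w, g):
    """Eq. (10): f(x) = sum_i w_i g(x - x_i), evaluated at every point of x."""
    return g(np.asarray(x, dtype=float)[:, None] - xs[None, :]) @ w


# Hypothetical, unevenly spaced samples of a step-like signal (not the data of Fig. 3).
xs = np.array([0.0, 0.3, 0.7, 1.5, 2.0, 3.1])
ds = np.array([0.0, 0.2, 1.0, 1.0, 0.4, 0.0])

w = fit(xs, ds, np.abs, 1e-3)                  # g(x) = |x|, lambda as in Fig. 4
fitted = predict(xs, xs, w, np.abs)            # differs from ds by lambda * w
curve = predict(np.linspace(-0.5, 3.5, 81), xs, w, np.abs)

# The decompositions used in Section 4.1 to relate max(x, 0) and min(x, 0) to |x|.
v = np.linspace(-2.0, 2.0, 9)
assert np.allclose(np.maximum(v, 0), (np.abs(v) + v) / 2)
assert np.allclose(np.minimum(v, 0), (v - np.abs(v)) / 2)
```

Because $(G + \lambda I)W = D$, the fitted values satisfy $F = D - \lambda W$, so a small $\lambda$ trades exact interpolation for a small, controlled residual.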
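In particular, if $f(x) = x$ is the identity function, then $\tilde f(u) = 2\pi\delta(u)/(iu)$ (formally; what enters Eq. (18) is the product $v\tilde f(v) = -2\pi i\,\delta(v)$ with $v = u_1 + \cdots + u_n$), and the Fourier transform of $\max(x_1, \ldots, x_n)$ is
$$\widetilde{\max(x_1,\ldots,x_n)}(u) = -2\pi i^n\,\frac{\delta(u_1 + \cdots + u_n)}{u_1\cdots u_n}. \tag{19}$$
In order to derive the Fourier transform of the hit–miss transform, we also need the Fourier transform of $\min(x_1, \ldots, x_n)$. Since $\min(x_1, \ldots, x_n) = -\max(-x_1, \ldots, -x_n)$, and since the Fourier transform of $h(-x)$ at $u$ equals $\tilde h(-u)$, the Fourier transform of $\min(x_1, \ldots, x_n)$ is
$$\widetilde{\min(x_1,\ldots,x_n)}(u) = -\widetilde{\max(x_1,\ldots,x_n)}(-u) = 2\pi i^n\,\frac{\delta(-u_1 - \cdots - u_n)}{(-u_1)\cdots(-u_n)} = 2\pi i^n\,\frac{\delta(u_1 + \cdots + u_n)}{(-1)^n\,u_1\cdots u_n} = \begin{cases} -\widetilde{\max(x_1,\ldots,x_n)}(u) & \text{if } n \text{ is even},\\ \widetilde{\max(x_1,\ldots,x_n)}(u) & \text{if } n \text{ is odd}. \end{cases} \tag{20}$$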
Fig. 5 shows the magnitudes of the discrete Fourier transforms of $\max(x_1, x_2)$ and $\min(x_1, x_2)$ (using a $256 \times 256$ array). Notice that the Fourier transforms of $\max(x_1, x_2)$ and $\min(x_1, x_2)$ have the same magnitude. In the continuous case, the Fourier transforms are not defined on the coordinate axes because of the product in the denominator; in the discrete case, the magnitudes are very high along the coordinate axes. In addition, in the continuous case, the Fourier transforms are 0 except at the points $u_1 + u_2 = 0$, because of the $\delta$ function in the numerator. This can also be seen in the magnitudes in Fig. 5. Now, $\min(x_1, \ldots, x_n) - \max(x_1, \ldots, x_n)$ would model the hit–miss transform. Based on Eq. (20), the Fourier transform of $\min(x_1, \ldots, x_n) - \max(x_1, \ldots, x_n)$ is
$$\big(\widetilde{\min(x_1,\ldots,x_n)} - \widetilde{\max(x_1,\ldots,x_n)}\big)(u) = \begin{cases} -2\,\widetilde{\max(x_1,\ldots,x_n)}(u) & \text{if } n \text{ is even},\\ 0 & \text{if } n \text{ is odd}. \end{cases} \tag{21}$$
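The sign relations in Eqs. (20) and (21) have a simple discrete counterpart that can be checked on the 256 x 256 grid of Fig. 5. Because $\max(x_1, x_2) + \min(x_1, x_2) = x_1 + x_2$, and the DFT of $x_1 + x_2$ on a grid vanishes away from the two zero-frequency axes, the DFTs of min and max are exact negatives of each other off those axes: their magnitudes agree there, and the DFT of min minus max is -2 times that of max. The centering of the grid below is an assumption; the text does not specify it.

```python
import numpy as np

size = 256
coords = np.arange(size) - size // 2               # assumed grid: integers -128 .. 127
x1, x2 = np.meshgrid(coords, coords, indexing="ij")

F_max = np.fft.fft2(np.maximum(x1, x2))
F_min = np.fft.fft2(np.minimum(x1, x2))
F_hm = np.fft.fft2(np.minimum(x1, x2) - np.maximum(x1, x2))   # discrete analogue of Eq. (21), n = 2

off_axes = (slice(1, None), slice(1, None))        # drop the zero-frequency row and column

same_magnitude = np.allclose(np.abs(F_min[off_axes]), np.abs(F_max[off_axes]), atol=1e-6)
opposite_sign = np.allclose(F_min[off_axes], -F_max[off_axes], atol=1e-6)
hm_is_minus_twice_max = np.allclose(F_hm[off_axes], -2 * F_max[off_axes], atol=1e-6)

# A centred log-magnitude image of the kind displayed in Fig. 5 (log1p avoids log(0)).
log_magnitude = np.log1p(np.abs(np.fft.fftshift(F_max)))
```

All three flags come out true; on the excluded axes the two transforms differ, which is where the discrete magnitudes are largest.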
4.2. Relationship between MSNN and regularization
In this section, we precisely describe the relationship between the MSNN and regularization theory. We show that a class of morphological shared-weight networks can be derived using the theory of regularization. Let us refer to this class of networks as morphological regularization neural networks (MRNN). An MRNN can be viewed as a substructure of an MSNN. More precisely, the architecture of an MRNN is identical to that of an MSNN with no hidden layer in the feed-forward classification stage and with one layer in the feature extraction stage. The operations follow directly from regularization theory. This result places morphological networks firmly within the established body of mathematics known as approximation theory, and provides a mathematical basis and an analysis tool for morphological shared-weight networks.

We now derive the MSNN from regularization theory. As before, assume that we have a set of known input–output pairs $(x_i, d_i)$, where $i = 1, \ldots, N$, the vectors $x_i$ have dimension $n \cdot m$ for some positive integers $n$ and $m$, and that we wish to minimize the functional given in Eq. (12). In this case, $\phi$ is defined using the (min–max) transform as the Green's function in the following manner:
$$\phi[f] = \int_S |\tilde f(s)|^2\,\frac{s_1 s_2 \cdots s_{nm}}{\delta(s_1 + \cdots + s_{nm})}\,ds, \tag{22}$$
where $S = \mathbb{R}^{n\cdot m}$ minus the coordinate planes. The solution given by Eq. (10) is
$$f(x) = \sum_{i=1}^{N} w_i\big(\min(x - x_i) - \max(x - x_i)\big). \tag{23}$$
Now, suppose that $A$ is a large image of size $M \times M$ (assumed square for simplicity). Let $a_j$ denote the $m \times n$ neighborhood associated with the $j$th pixel of $A$, $j = 1, \ldots, M^2$ (Fig. 6). The function $f$ induces an image-to-image operator $\Psi_f$, defined by
$$\Psi_f[A](j) = \sum_{i=1}^{N} w_i\big(\min(a_j - x_i) - \max(a_j - x_i)\big). \tag{24}$$
The value $\max(a_j - x_i)$ denotes the output of the dilation $(A \oplus x_i^*)$ at pixel $j$ (note: $x_i^*$ is defined following Eq. (2)), and $\min(a_j - x_i)$ denotes the erosion $(A \ominus x_i)$. Hence, the operator $\Psi_f$ can be written in image form as a weighted sum of hit–miss transforms:
$$\Psi_f[A] = \sum_{i=1}^{N} w_i\big(A \otimes (x_i, x_i)\big). \tag{25}$$
In this case, the hit and miss structuring elements are the same and are equal to $x_i$. This expression for $\Psi_f$ is equal to the output of an MSNN with no hidden nodes in the case for which the hit and miss structuring elements are the same, apart from the output sigmoid $s$ in Eq. (7).

Fig. 5. Log(magnitude) of the discrete Fourier transform of (a) $\max(x_1, x_2)$ and (b) $\min(x_1, x_2)$.

Fig. 6. Neighborhood $a_j$ associated with pixel $j$ in image $A$.
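Eqs. (11), (23) and (24) together give a training-by-linear-solve recipe for an MRNN. The sketch below builds $G$ with the hit-miss Green's function $g(v) = \min(v) - \max(v)$, solves for $W$, and applies $\Psi_f$ to every 3 x 3 neighbourhood of an image ("valid" positions only). The corner prototype, noise level, random backgrounds and seed are made-up stand-ins for the training set of Fig. 7; $\lambda = 0.3$ matches the corner-detection example of Section 5.1. Function names are hypothetical, and NumPy 1.20 or later is assumed for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def g_hit_miss(v):
    """Hit-miss Green's function on flattened vectors: min(v) - max(v) along the last axis."""
    return v.min(axis=-1) - v.max(axis=-1)


def train_mrnn(X, d, lam):
    """Eq. (11) with G_ij = min(x_i - x_j) - max(x_i - x_j)."""
    G = g_hit_miss(X[:, None, :] - X[None, :, :])
    return np.linalg.solve(G + lam * np.eye(len(X)), d)


def apply_mrnn(A, X, w, shape):
    """Eq. (24): Psi_f[A](j) = sum_i w_i (min(a_j - x_i) - max(a_j - x_i))."""
    patches = sliding_window_view(A, shape).reshape(-1, shape[0] * shape[1])
    out = g_hit_miss(patches[:, None, :] - X[None, :, :]) @ w
    return out.reshape(A.shape[0] - shape[0] + 1, A.shape[1] - shape[1] + 1)


rng = np.random.default_rng(0)
corner = np.array([[1, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float).ravel()  # hypothetical prototype
X = np.vstack([corner + 0.1 * rng.standard_normal((10, 9)),   # 10 noisy corners, d = 1
               rng.random((10, 9))])                            # 10 random backgrounds, d = 0
d = np.r_[np.ones(10), np.zeros(10)]
w = train_mrnn(X, d, lam=0.3)

test_image = rng.random((8, 8))
test_image[2:5, 2:5] = corner.reshape(3, 3)        # plant one clean corner
response = apply_mrnn(test_image, X, w, (3, 3))    # one value per 3 x 3 neighbourhood
```

Since $g(0) = 0$ and $g(-v) = g(v)$, $G$ is symmetric with a zero diagonal; the linear solve is the whole "training" step, and adding a prototype means adding one row and one column to $G$.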
5. MRNN practical examples
In this section we describe two practical examples using MRNN. In the first example we illustrate how to construct an MRNN for the artificial problem of detecting corners in noisy images. In the second example we describe how an MRNN was used to detect land mines in ground penetrating radar (GPR) data volumes.

5.1. Corner detection
A training set consisting of 10 $3 \times 3$ corners corrupted by noise and 10 random backgrounds was constructed as shown in Fig. 7. The training set for the problem is $\{(x_i, d_i) : i = 1, \ldots, 20\}$, where $x_1, \ldots, x_{10}$ are nine-dimensional vectors representing the corners, $d_1 = d_2 = \cdots = d_{10} = 1$, $x_{11}, \ldots, x_{20}$ are nine-dimensional vectors representing the random backgrounds, and $d_{11} = d_{12} = \cdots = d_{20} = 0$. With the hit–miss transform, the matrix $G = [G_{ij}]$ is defined by $G_{ij} = g(x_i - x_j) = \min(x_i - x_j) - \max(x_i - x_j)$. Given the training set, we solve for the weight vector $W = (G + \lambda I)^{-1} D$ ($\lambda = 0.3$ in this example). This is the "training" process. Two test images were generated, the first consisting of 10 corners with added noise and the second consisting of random values, as shown in Fig. 8. Eq. (24) was applied to those images with the weight vector $W$ generated by the training process. The results are shown in Fig. 9.

Fig. 7. (a) Foreground and (b) background training samples.
Fig. 8. Testing (a) foreground and (b) background images.
Fig. 9. Output results on the (a) foreground and (b) background testing images of Fig. 8.

5.2. Land mine detection
The GPR data were collected by GEO-Centers, Inc. on different dates from different fields that contain different types of buried land mines. When surveying a field, a horizontally arranged array of GPR sensors is used to get slices of downward views into the ground (Fig. 10). A stack of these vertical slices (scans) forms a three-dimensional volume of data. Fig. 11 shows 10 frames cut from 10 y–t planes of a data volume. These 10 images
share the same y and t ranges. The only difference between them is that they are at adjacent x locations, i.e. adjacent columns. From the ground truth file we know that there are six land mines somewhere in this area. From observation we can notice that the signature of a mine starts to appear as an arch at a certain y–t plane, gets stronger, and then fades away after some y–t plane. For example, the second mine (from the left) shown in Fig. 11 starts to appear in the 1st y–t plane, appears strongest in the 4th y–t plane, and then fades away in the 8th y–t plane. Fig. 12 shows the signature of an M15 land mine. For training we used 466 $32 \times 16$ mine signatures (32 rows in the t direction and 16 columns in the y direction), 67 false alarms, and 583 backgrounds. These samples were clustered into 50 clusters using a fuzzy c-means clustering algorithm. These 50 samples (26 representing mines and 24 representing backgrounds) are the "training" samples. The test set consists of 19 volume data files representing 19 lanes containing a total of 225 mines. The approximate area of each lane is 150 m² (3 m in the x direction by 50 m in the y direction). To generate an output plane, the MRNN scans the GPR volume data at each (x, y) location and generates a confidence value at that location. The confidence values at all (x, y) locations form the output plane. After the output plane is generated, it is thresholded and then opened with a $3 \times 2$ binary structuring element to remove speckle noise. Fig. 13 shows an example of an x–y output plane after noise removal. The small white tick marks indicate the positions of the mines known from the ground truth file. In this example there are 16 white blobs: 9 detected mines (out of 12) and 7 false alarms. The MRNN detection rate for all testing data was 78%, and the false alarm rate was 0.07 false alarms per m².

Fig. 10. Formation of GPR data.
Fig. 11. Sample y–t planes (of adjacent x coordinates) showing the positions of six different land mines (tick marks at bottom of image).
Fig. 12. Signature of a land mine (ten y–t frames of adjacent x coordinates).
Fig. 13. Example of MRNN output plane.
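The post-processing step can be sketched with plain NumPy. A binary opening keeps exactly the union of all translates of the structuring element that fit inside the thresholded mask, which is what `binary_open` computes directly. The threshold value and the orientation of the 3 x 2 element (3 rows by 2 columns) are assumptions, since the text does not give them; `postprocess` is a hypothetical name. For large output planes, `scipy.ndimage.binary_opening` with `structure=np.ones((3, 2), bool)` is a library alternative.

```python
import numpy as np


def binary_open(mask, se_shape=(3, 2)):
    """Binary opening by a full rectangle: union of every translate that fits inside mask."""
    h, w = se_shape
    rows, cols = mask.shape
    out = np.zeros(mask.shape, dtype=bool)
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if mask[r:r + h, c:c + w].all():      # the element fits at this position
                out[r:r + h, c:c + w] = True
    return out


def postprocess(confidence, threshold):
    """Threshold an MRNN output plane, then open with a 3 x 2 element to remove speckle."""
    return binary_open(np.asarray(confidence) > threshold, (3, 2))


plane = np.zeros((8, 8))
plane[1:4, 1:3] = 0.9        # a 3 x 2 blob: survives the opening
plane[6, 6] = 0.95           # an isolated speck: removed
cleaned = postprocess(plane, threshold=0.5)
```

Blobs smaller than the structuring element in either direction disappear, while any region that contains a full 3 x 2 rectangle keeps those rectangles.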
6. Conclusion
We have shown how a certain type of MSNN can be viewed as a solution to regularization problems. We did so by finding the Fourier transform of the hit–miss transform and showing that erosion, dilation, and the hit–miss transform can be viewed as Green's functions. There are still many questions to be answered. This derivation relies heavily on the theory of generalized functions and requires an in-depth analysis before its implications are well understood. In particular, the properties of integrals of the form given in Eq. (22) should be investigated in detail. Another problem that needs to be investigated is which values of λ work best for a particular problem. In the corner detection problem we used λ = 0.3, and in the mine detection application we used λ = 0.1; these two values worked best among the values we experimented with. As we have demonstrated in Section 5, the MRNN can be used in a variety of automatic detection problems. However, its effectiveness needs to be compared with that of other standard neural networks such as the multi-layer perceptron and the standard/morphological shared-weight neural networks. A particular advantage of the MRNN is that no off-line training is required, and additional samples can be added to the sample pool as they become available. However, this advantage can become a burden if the number of samples becomes large. As we did in the mine detection example, some kind of clustering/selection algorithm must then be used to create/select fewer prototypes.
Acknowledgements
This effort is partially sponsored by the Air Force Research Laboratory (AFRL/MNGI), Air Force Materiel Command, USAF, under grant number F0863096-1-0005. It is also sponsored by the Humanitarian De-mining MURI program of the Office of the Secretary of Defense, contract number DAAG55-97-1-0014. The US Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon.
About the Author: PAUL GADER received his Ph.D. in Applied Mathematics from the University of Florida in 1986. He then worked as a Senior Research Scientist at Honeywell Systems and Research Center, as an Assistant Professor of Mathematics at the University of Wisconsin–Oshkosh, and as a Research Engineer and Manager at the Environmental Research Institute of Michigan (ERIM) in the area of Signal and Image Processing. He is currently an Associate Professor in the Dept. of Computer Engineering and Computer Science at the University of Missouri, Columbia. Dr. Gader has worked on a wide variety of basic and applied research problems since 1984. These include Landmine Detection, Unexploded Ordnance Characterization, Automatic Target Recognition, Handwriting Recognition and Document Analysis Systems, Mathematical Morphology in Image Processing and Object Recognition, Fuzzy Sets in Computer Vision, Medical Imaging, and Applied Mathematics. In 1997, he began working on sensor fusion issues in an Army Research Office-funded basic research project in Humanitarian De-Mining. In 1998, he developed real-time algorithms for landmine detection with the Geo-Centers Vehicle Mounted Mine Detection System. He is currently working on landmine detection algorithms for handheld landmine detectors. He performed his Ph.D. research in the area of Image Algebra and Mathematical Morphology and served as chair of the SPIE Image Algebra and Morphological Image Processing Conference from 1990 to 1995. He is an Associate Editor of the Journal of Mathematical Imaging and the Journal of Electronic Imaging. Dr. Gader has published over 100 technical papers, including 30 refereed journal publications.
About the Author: MOHAMED ALI KHABOU received his BS and MS degrees, both in Electrical Engineering, from the University of Missouri-Columbia in 1990 and 1993, respectively. He is currently working on his Ph.D. in Electrical Engineering at the same university.
His research interests include mathematical morphology, neural networks, automatic target recognition, and handwriting recognition. He is a member of the IEEE, SPIE, and the Tunisian Scientific Society.
About the Author: DR. KOLDOBSKY received a Ph.D. in Mathematics from St. Petersburg State University, Russia in 1982. He has held academic positions at St. Petersburg University of Economics and Finance and the University of Texas at San Antonio. He has also held visiting positions at New York University and the Weizmann Institute of Science. Beginning in September 1999 he will be a Professor in the Dept. of Mathematics at the University of Missouri, Columbia. Dr. Koldobsky has worked on applications of Fourier analysis to Banach space theory, geometry, probability, signal processing, and chemical engineering. He has published 36 refereed journal papers.
Pattern Recognition 33 (2000) 945–960
Neural networks with hybrid morphological/rank/linear nodes: a unifying framework with applications to handwritten character recognition☆
Lúcio F.C. Pessoa a,*, Petros Maragos b
a Motorola, Inc., 3501 Ed Bluestein Blvd., MD: TX11/H4, Austin, TX 78721, USA
b Department of Electrical and Computer Engineering, National Technical University of Athens, Zografou 15773, Athens, Greece
Received 15 December 1998; received in revised form 25 March 1999; accepted 7 April 1999
Abstract
In this paper, the general class of morphological/rank/linear (MRL) multilayer feed-forward neural networks (NNs) is presented as a unifying signal processing tool. It incorporates the properties of multilayer perceptrons (MLPs) and morphological/rank neural networks (MRNNs). The fundamental processing unit of MRL-NNs is the MRL-filter, where the combination of inputs in every node is formed by hybrid linear and nonlinear (of the morphological/rank type) operations. For its design we formulate a methodology using ideas from the back-propagation algorithm and robust techniques to circumvent the non-differentiability of rank functions. Extensive experimental results are presented for the problem of handwritten character recognition. They suggest that MRL-NNs not only provide better or similar performance when compared to MLPs but can also be trained faster. MRL-NNs are a broad and interesting class of nonlinear systems with many promising applications in pattern recognition and signal/image processing. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Morphological systems; MRL-filters; Neural networks; Back-propagation algorithm; Handwritten character recognition
1. Introduction
Multilayer feed-forward neural networks, or simply neural networks (NNs), represent an important class of nonlinear systems widely used in problems of signal/image processing and pattern recognition. Their applications in signal/image processing usually employ
☆ This work was done while both authors were with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA. It was supported by the US National Science Foundation under grant MIP-94-21677. It was also supported in part by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), Brasília, Brazil, through a Doctoral Fellowship under grant 200.846/92-2.
* Corresponding author. Tel.: +1-512-934-6613; fax: +1934-6688. E-mail addresses:
[email protected] (L.F.C. Pessoa),
[email protected] (P. Maragos)
networks with a single output, which are sometimes called NN-filters. Furthermore, adaptive filters and NNs are closely related, so their adaptation/training can be studied under the same framework [1]. In this sense, the design of an NN-filter corresponds to the training process of its embedded NN. The usefulness of NNs can be investigated efficiently thanks to the back-propagation algorithm [2], which generalizes the LMS algorithm to feed-forward networks. In this way, system design is viewed as a problem of unconstrained optimization that is solved iteratively by the method of steepest descent. The node structure in an NN is meant to model the input–output characteristic of a neuron, and so it represents the essence of the system. The perceptron, i.e., a linear combiner followed by a nonlinearity of the logistic type, is the classic node structure used in NNs. However, it has been observed that logic operations, which are not well modeled by perceptrons, can be generated by some internal interactions in a neuron [3]. For the sake of
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00157-0
L.F.C. Pessoa, P. Maragos / Pattern Recognition 33 (2000) 945–960
a better representation of these internal properties, a possible improvement to the basic perceptron model is presented in this paper. We propose the MRL-NNs [4], a general class of NNs where the combination of inputs in every node is formed by hybrid linear and nonlinear (of the morphological/rank type) operations. The fundamental processing unit of this class of systems is the MRL-filter [5], which is a linear combination of a morphological/rank filter and a linear FIR filter. MRL-NNs have the unifying property that the characteristics of both multilayer perceptrons (MLPs) and morphological/rank neural networks (MRNNs) [6] are observed in the same system. An important special case of MRNNs is the class of min–max classifiers [7], which can provide classification results comparable to MLPs but with faster training. Other related works with min–max operations in networks have appeared in Refs. [8,9]. We show in this paper that MRL-NNs can solve the parity problem in closed form. The solution uses about half the number of nodes usually required by MLPs and has a smaller computational complexity. Examples from simple pattern classification problems are also included to provide geometrical insight. These demonstrate the potential of this new structure, which offers efficient solutions to pattern classification problems by requiring fewer nodes or fewer parameters to estimate than MLPs need. Next, we formulate a simple and systematic training procedure using ideas from the back-propagation algorithm and robust techniques to circumvent the non-differentiability of rank functions. Our approach to training the morphological/rank nodes is a theoretically and numerically improved version of the method proposed by Salembier [10,11] to design morphological/rank filters.
Finally, we apply the proposed design methodology to problems of optical character recognition. We provide extensive experimental evidence showing not only that MRL-NNs can generate similar or better results when compared with classical MLPs, but also that they usually require less processing time for training.
2. The MRL-NN
In general terms, a (multilayer feed-forward) NN is a layered system composed of similar nodes, some of them nonobservable (hidden), where the node inputs in a given layer depend only on the node outputs from the preceding layer. In addition, no feedback is allowed in the topology of this class of systems. Every node performs a generic composite operation. An input to the node is first processed by some function $h(\cdot, \cdot)$ of the input and internal weights and then transformed by an activation function $f(\cdot)$. The node structure is defined by the function $h$. In the case of MLPs, $h$ is a linear combination. The activation function $f$ is usually employed for rescaling purposes. We will consider the special cases where $f$ is the identity or a nonlinearity of the logistic type and denote the corresponding systems as NNs of types I and II, respectively. A general NN is formally defined by the following set of recursive equations:
$$
\begin{aligned}
y^{(l)} &\equiv F(z^{(l)}) = \big(f(z_1^{(l)}), f(z_2^{(l)}), \ldots, f(z_{N_l}^{(l)})\big), \quad l = 1, 2, \ldots, L,\\
z_n^{(l)} &\equiv h(y^{(l-1)}, w_n^{(l)}), \quad n = 1, 2, \ldots, N_l,
\end{aligned}
\tag{1}
$$
where $l$ is the layer number and $N_l$ is the number of nodes in layer $l$. The weight vectors $w_n^{(l)}$ represent the tuning parameters in the system. The structure of the $l$th layer is illustrated in Fig. 1. Besides this, the input and output of the system are
$$
\begin{aligned}
y^{(0)} &= x = (x_1, x_2, \ldots, x_{N_0}) \quad \text{(input)},\\
y^{(L)} &= y = (y_1, y_2, \ldots, y_{N_L}) \quad \text{(output)}.
\end{aligned}
\tag{2}
$$
Fig. 1. Structure of the lth layer in a general NN.
Before we define the MRL-NN, we review the concept of its fundamental processing unit: the MRL-filter. Let $x = (x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ represent the input signal and $y$ be the output value from the filter. We use a vector notation to represent the values of the 1D or 2D sampled input signal (after some enumeration of the signal samples) inside an $n$-point moving window. The MRL-filter is defined as the shift-invariant system whose local signal transformation rule $x \mapsto y$ is given by
$$
\begin{aligned}
y &\equiv \lambda\alpha + (1-\lambda)\beta,\\
\alpha &= R_r(x + a) = R_r(x_1 + a_1, x_2 + a_2, \ldots, x_n + a_n),\\
\beta &= x \cdot b' = x_1 b_1 + x_2 b_2 + \cdots + x_n b_n,
\end{aligned}
\tag{3}
$$
where $\lambda \in \mathbb{R}$, $a, b \in \mathbb{R}^n$, and $'$ denotes transposition. $R_r(t)$ is the $r$th rank function of $t \in \mathbb{R}^n$. It is evaluated by sorting the components of $t = (t_1, t_2, \ldots, t_n)$ in decreasing order, $t_{(1)} \ge t_{(2)} \ge \cdots \ge t_{(n)}$, and picking the $r$th element of the sorted list, i.e., $R_r(t) \equiv t_{(r)}$, $r = 1, 2, \ldots, n$. The vector $b = (b_1, b_2, \ldots, b_n)$ corresponds to the coefficients of the
linear FIR filter, and the vector $a = (a_1, a_2, \ldots, a_n)$ represents the coefficients of the morphological/rank filter. We call $a$ the "structuring element" because for $r = 1$ and $r = n$ the rank filter becomes the morphological dilation and erosion, respectively, by a structuring function equal to $\pm a$ within its support. The variables $r$ and $\lambda$ are the rank and mixing parameters of the filter, respectively. If $\lambda \in [0, 1]$, the MRL-filter becomes a convex combination of its components, so that when we increase the contribution of one component, the other one tends to decrease. For every point of the signal, Eq. (3) shows that we need $2n + 1$ additions, $n + 2$ multiplications and an $n$-point sorting operation.
The MRL-NN is the system defined by Eqs. (1) and (2) such that
$$
\begin{aligned}
z_n^{(l)} &\equiv \lambda_n^{(l)}\alpha_n^{(l)} + (1-\lambda_n^{(l)})\beta_n^{(l)},\\
\alpha_n^{(l)} &= R_{r_n^{(l)}}\big(y^{(l-1)} + a_n^{(l)}\big),\\
\beta_n^{(l)} &= y^{(l-1)} \cdot (b_n^{(l)})' + \tau_n^{(l)},
\end{aligned}
\tag{4}
$$
where $\lambda_n^{(l)}, \tau_n^{(l)} \in \mathbb{R}$ and $a_n^{(l)}, b_n^{(l)} \in \mathbb{R}^{N_{l-1}}$. Observe from Eqs. (1), (3) and (4) that the underlying function $h$ is an MRL-filter shifted by a threshold $(1-\lambda_n^{(l)})\tau_n^{(l)}$. The offset variables $\tau_n^{(l)}$ are important when $\lambda_n^{(l)} = 0$. The resulting weight vector for every node is then defined by
$$
w_n^{(l)} \equiv \big(a_n^{(l)}, \rho_n^{(l)}, b_n^{(l)}, \tau_n^{(l)}, \lambda_n^{(l)}\big), \tag{5}
$$
where we use a real variable $\rho_n^{(l)}$ instead of an integer rank variable $r_n^{(l)}$ because we will need to evaluate rank derivatives during the design of MRL-NNs. The relation between $\rho_n^{(l)}$ and $z_n^{(l)}$ will be defined later via a differential equation, and $r_n^{(l)}$ is obtained from $\rho_n^{(l)}$ via the following rescaling:¹
$$
r_n^{(l)} \equiv \left\lfloor N_{l-1} - \frac{N_{l-1} - 1}{1 + \exp(-\rho_n^{(l)})} + 0.5 \right\rfloor, \tag{6}
$$
which is a simple way to map a variable $\rho_n^{(l)} \in \mathbb{R}$ to an integer $r_n^{(l)} \in \{1, 2, \ldots, N_{l-1}\}$. For example, if $\rho_n^{(l)} \to -\infty$, then $r_n^{(l)} \to N_{l-1}$, corresponding to a minimum operation. If $\rho_n^{(l)} \to \infty$, then $r_n^{(l)} \to 1$, corresponding to a maximum operation. If $\rho_n^{(l)} = 0$, then $r_n^{(l)} = \lfloor N_{l-1}/2 + 1 \rfloor$, corresponding to a median operation.
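As a concrete check of Eq. (3) and of the rescaling in Eq. (6), the following plain-Python sketch evaluates one MRL-filter output and maps a real rank variable to an integer rank. The function names are hypothetical, and the rank $r$ is 1-based as in the text. It is an illustration of the definitions only, not the authors' implementation.

```python
import math


def rank(t, r):
    """r-th rank function R_r(t): sort in decreasing order, take element r (1-based)."""
    return sorted(t, reverse=True)[r - 1]


def mrl_filter(x, a, b, r, lam):
    """Local MRL-filter rule of Eq. (3): y = lam*alpha + (1 - lam)*beta."""
    alpha = rank([xi + ai for xi, ai in zip(x, a)], r)  # morphological/rank part
    beta = sum(xi * bi for xi, bi in zip(x, b))         # linear FIR part
    return lam * alpha + (1.0 - lam) * beta


def logistic(v):
    """Numerically stable 1 / (1 + exp(-v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    ev = math.exp(v)
    return ev / (1.0 + ev)


def rank_from_rho(rho, n):
    """Rescaling of Eq. (6): map a real rho to an integer rank in {1, ..., n}."""
    return math.floor(n - (n - 1) * logistic(rho) + 0.5)
```

With $a = 0$ and $\lambda = 1$, rank $r = 1$ gives the window maximum (a flat dilation) and $r = n$ gives the minimum (a flat erosion). With $\lambda = 0$ the output is the FIR response $x \cdot b'$. `rank_from_rho(0.0, n)` returns $\lfloor n/2 \rfloor + 1$, the median rank.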
Two important special cases of MRL-NNs are obtained when $f$ is the identity, defining the MRL-NN of type I, and when $f$ is a nonlinearity of the logistic type, defining the MRL-NN of type II. In this way, an MLP is a special case of an MRL-NN of type II where $\lambda_n^{(l)} = 0\ \forall n, l$, and an MRNN is a special case of an MRL-NN of type I where $\lambda_n^{(l)} = 1\ \forall n, l$. Fig. 2 illustrates the structure of the $l$th layer of an MRNN [6].
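The two special cases can be made concrete with a small forward pass for one layer of Eq. (4) followed by the activation of Eq. (1). The dictionary layout of the node parameters is a hypothetical container chosen for illustration. The integer rank $r$ is used directly rather than being derived from $\rho$ through Eq. (6).

```python
import math


def mrl_layer(y_prev, nodes, type_ii):
    """Forward pass of one MRL-NN layer: Eq. (4) followed by f of Eq. (1).

    `nodes` is a list of dicts with keys "a", "b", "r", "tau", "lam"
    (a hypothetical container). type_ii=True applies the logistic
    activation; otherwise f is the identity (type I)."""
    outputs = []
    for p in nodes:
        shifted = sorted((y + a for y, a in zip(y_prev, p["a"])), reverse=True)
        alpha = shifted[p["r"] - 1]                                   # R_r(y + a)
        beta = sum(y * b for y, b in zip(y_prev, p["b"])) + p["tau"]  # y . b' + tau
        z = p["lam"] * alpha + (1.0 - p["lam"]) * beta
        outputs.append(1.0 / (1.0 + math.exp(-z)) if type_ii else z)
    return outputs
```

A node with `lam = 0.0` in a type II layer is an ordinary perceptron (logistic of $y \cdot b' + \tau$). A node with `lam = 1.0` in a type I layer reduces to the pure rank operation of an MRNN node.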
¹ $\lfloor \cdot \rfloor$ denotes the usual truncation operation, so that $\lfloor \cdot + 0.5 \rfloor$ is the usual rounding operation.
Fig. 2. Structure of the lth layer in an MRNN.
3. Geometrical insights
Structure (4) of every node in an MRL-NN is a compact representation of a set of hyperplanes. The normal vectors of those hyperplanes depend on the mixing parameter $\lambda$ and the coefficients $b$ of the linear FIR filter. If $\lambda = 1$, the hyperplanes are parallel to some subset of the canonical coordinate directions. For instance, consider a single-node MRL-NN in $\mathbb{R}^2$ with $r = 2$, i.e.,
$$
y_1 = \lambda \min\{x_1 + a_1, x_2 + a_2\} + (1-\lambda)(x_1 b_1 + x_2 b_2 + \tau_1). \tag{7}
$$
The boundary $y_1 = 0$ is defined by two lines. They are obtained when either $\min\{x_1 + a_1, x_2 + a_2\} = x_1 + a_1$ or $\min\{x_1 + a_1, x_2 + a_2\} = x_2 + a_2$. The resulting lines are defined, respectively, by the equations
$$
x_2 = \frac{\lambda(b_1 - 1) - b_1}{(1-\lambda)b_2}\, x_1 - \frac{\tau_1 + \lambda(a_1 - \tau_1)}{(1-\lambda)b_2},
$$
$$
x_2 = \frac{(\lambda - 1)b_1}{\lambda(1 - b_2) + b_2}\, x_1 - \frac{\tau_1 + \lambda(a_2 - \tau_1)}{\lambda(1 - b_2) + b_2}.
$$
If these lines intersect each other, the intersection will occur along the line $x_2 = x_1 + a_1 - a_2$, where both arguments of the minimum are equal. It is not difficult to show that there will be no intersection if $b_1 + b_2 = \lambda/(\lambda - 1)$. Substituting $x_2 = x_1 + a_1 - a_2$ into Eq. (7) gives $y_1 = [\lambda + (1-\lambda)(b_1 + b_2)]\,x_1 + \text{const}$, whose $x_1$ coefficient vanishes exactly under this condition. Hence $y_1 = 0$ has no solution on that line, barring the degenerate case in which the constant also vanishes. Fig. 3(a) illustrates the use of the MRL-NN (7) to solve a two-class pattern recognition problem, where the corresponding six unknown parameters were estimated. Fig. 3(b) shows a plot of Eq. (7) as a function of $x_1$ and $x_2$. A similar classification could be obtained using a two-layer MLP with at least two hidden nodes, so that at least nine parameters would need to be estimated. A possible way to improve results using a single node is to set $[x, -x]$ as the input signal. With this choice, we double the number of underlying
Fig. 3. Decision boundaries of MRL-NNs.
hyperplanes. For our example in $\mathbb{R}^2$, this means that we could easily obtain a closed boundary composed of four lines. Again, a similar result could be obtained using a two-layer MLP with at least four hidden nodes. In terms of the number of parameters to be estimated, we would need 11 parameters in an MRL-NN and at least 17 parameters in an MLP. Another solution to the classification problem is illustrated in Fig. 3(c), where we now use Eq. (7) to generate the two-layer MRL-NN
$$
y_2 = \min\{y_1,\ b_3 x_1 + b_4 x_2 + \tau_2\}. \tag{8}
$$
Observe that the resulting decision boundary is closed and therefore provides robustness to reject spurious patterns [12]. Similarly, Fig. 3(d) shows a plot of Eq. (8) as a function of $x_1$ and $x_2$. For this MRL-NN we need to estimate nine parameters. An MLP, in contrast, would need at least three hidden nodes to generate a closed region with three linear bounds, and the estimation of at least 13 parameters. Thus, MRL-NNs provide several improvements over MLPs. Not only can the number of required nodes or parameters to be estimated be smaller in MRL-NNs, but sigmoid functions may also not be necessary at all. Note also that MRL-NNs provide improvements over MRNNs. The boundaries generated when $\lambda_n^{(l)} = 1\ \forall n, l$ (i.e., when each node has no linear part) are located only in a finite number of directions. Therefore, the MRL-NN node has advantages over both the basic
perceptron model and the MRNN node. One drawback of MRL-NNs, however, is the computation of rank functions. This is not a difficult task, since in many pattern recognition applications the feature vectors to rank are relatively short and fast sorting algorithms are available.
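The boundary analysis of Eqs. (7) and (8) can be checked numerically with the short sketch below. It evaluates the single-node network, the slope and intercept of the two boundary lines derived above, and the two-layer network of Eq. (8). All parameter values in the example are hypothetical, chosen only for illustration, and the function names are not from the paper.

```python
def mrl_node_7(x1, x2, a1, a2, b1, b2, tau1, lam):
    """Single-node MRL-NN of Eq. (7); with two inputs, rank r = 2 is the minimum."""
    return lam * min(x1 + a1, x2 + a2) + (1.0 - lam) * (x1 * b1 + x2 * b2 + tau1)


def boundary_lines(a1, a2, b1, b2, tau1, lam):
    """(slope, intercept) of the two lines on which y1 = 0.

    The first line applies where x1 + a1 is the minimum (needs lam != 1 and
    b2 != 0); the second applies where x2 + a2 is the minimum."""
    d1 = (1.0 - lam) * b2
    first = ((lam * (b1 - 1.0) - b1) / d1, -(tau1 + lam * (a1 - tau1)) / d1)
    d2 = lam * (1.0 - b2) + b2
    second = ((lam - 1.0) * b1 / d2, -(tau1 + lam * (a2 - tau1)) / d2)
    return first, second


def mrl_net_8(x1, x2, p):
    """Two-layer MRL-NN of Eq. (8): y2 = min{y1, b3*x1 + b4*x2 + tau2}."""
    y1 = mrl_node_7(x1, x2, p["a1"], p["a2"], p["b1"], p["b2"], p["tau1"], p["lam"])
    return min(y1, p["b3"] * x1 + p["b4"] * x2 + p["tau2"])
```

As a usage example, take $\lambda = 1$, $a_1 = a_2 = 0$, $b_3 = b_4 = -1$ and $\tau_2 = 1$. Then $y_2 > 0$ exactly on the triangle $x_1 > 0$, $x_2 > 0$, $x_1 + x_2 < 1$. This illustrates how Eq. (8) produces a closed region bounded by three lines.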
The parity problem is a generalization of the XOR problem. Given a binary vector $x$ with $n$ components, the parity $P_n(x)$ is 1 if $x$ contains an odd number of 1s and 0 otherwise. This problem is usually considered a reference for checking new types of NNs or new training procedures. Using an MLP trained by the back-propagation algorithm, the parity problem can be solved with at least $n$ hidden nodes [2]. Thus $n + 1$ nodes are usually required, and at least $(n + 1)^2$ parameters need to be estimated. On the other hand, we can derive a closed-form solution to the parity problem using an MRL-NN. In fact, observe that for every binary vector $x$ with $n$ components,
(9)
where $\mathbf{1} = (1, 1, \ldots, 1)$. Thus, splitting the sums in Eq. (9) into even and odd values of $r$ yields
(10)
This clearly can be modeled by an MRL-NN of type I with only $\lfloor n/2 \rfloor + 2$ nodes, i.e., with about half the number of nodes usually required by MLPs, and no more than $2n$ integer parameters. This result represents a considerable improvement over MLPs.
5. Adaptive design
$$
\big(x([k]_{\bmod K}),\ d([k]_{\bmod K})\big), \quad k \in \mathbb{Z}, \tag{12}
$$
$$
w_n^{(l)}(i+1) = w_n^{(l)}(i) + \mu_0\, v_n^{(l)}(i), \quad \mu_0 > 0, \quad n = 1, 2, \ldots, N_l;\ l = 1, 2, \ldots, L, \tag{13}
$$
where the positive constant $\mu_0$ controls the tradeoff between stability and speed of convergence, $v_n^{(l)} = -\nabla J$, and $J$ is some cost function to be minimized. Let us define the error signal
$$
e(k) = \big(e_1(k), e_2(k), \ldots, e_{N_L}(k)\big) = d([k]_{\bmod K}) - y(k)
$$
and the cost function
$$
J(i) = \frac{1}{M} \sum_{k=i-M+1}^{i} \xi(k), \quad 1 \le M \le K, \tag{15}
$$
where
$$
\xi(k) \equiv \|e(k)\|^2 = \sum_{n=1}^{N_L} e_n^2(k). \tag{16}
$$
Based on the steepest descent algorithm, it follows from Eqs. (13) and (15) that
$$
v_n^{(l)}(i) = \frac{1}{M} \sum_{k=i-M+1}^{i} u_n^{(l)}(k), \tag{17}
$$
where
$$
u_n^{(l)}(k) = -\frac{\partial \xi(k)}{\partial w_n^{(l)}}. \tag{18}
$$
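The update of Eqs. (13) and (15)–(18) is windowed steepest descent. The error gradient is averaged over the last $M$ presentations of a cyclically repeated training set, as in Eq. (12). The sketch below shows this update only for the simplest node, with $\lambda = 0$ and identity activation, i.e. $z = x \cdot b' + \tau$, where $\partial\xi/\partial w$ is an ordinary gradient. The rank-derivative machinery the authors need for the morphological/rank part is not shown. Recomputing the window errors with the current weights, and the default values of `mu0`, `window` (playing the role of $M$) and `iters`, are simplifying assumptions.

```python
def train_linear_node(samples, targets, mu0=0.05, window=2, iters=2000):
    """Windowed steepest descent, Eqs. (12)-(13) and (15)-(18), for one node
    with lam = 0 and identity activation, i.e. z = x . b' + tau.

    Training pairs are presented cyclically (k mod K, as in Eq. (12)). For
    simplicity the errors in the window are recomputed with the current
    weights."""
    n = len(samples[0])
    K = len(samples)
    b = [0.0] * n
    tau = 0.0
    for i in range(iters):
        u_b = [0.0] * n
        u_tau = 0.0
        for k in range(i - window + 1, i + 1):  # k = i-M+1, ..., i
            x, d = samples[k % K], targets[k % K]
            e = d - (sum(xj * bj for xj, bj in zip(x, b)) + tau)
            # u(k) = -d xi(k)/dw with xi(k) = e(k)^2 (one output node)
            for j in range(n):
                u_b[j] += 2.0 * e * x[j]
            u_tau += 2.0 * e
        # v(i) = (1/M) sum of u(k), Eq. (17); w(i+1) = w(i) + mu0 v(i), Eq. (13)
        b = [bj + mu0 * uj / window for bj, uj in zip(b, u_b)]
        tau += mu0 * u_tau / window
    return b, tau
```

On four noise-free samples of $d = 2x_1 - x_2 + 0.5$, this loop converges to $b \approx (2, -1)$ and $\tau \approx 0.5$. For this linear node the update reduces to a block form of the LMS rule mentioned in the introduction.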
If we define the matrices =(l),
and L represent, respectively, the original image, morphological erosion and morphological opening. The main drawback of this method is the blocking effect obtained in the reconstructed image. Kong [6] has shown better performance for Heijmans' approach. His approach can be regarded as openings performed only on the pixels of a uniform sampling grid.
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00158-2
S. Saryazdi et al. / Pattern Recognition 33 (2000) 961–977
Another possible method of image representation can be found in non-uniform sampling. In recent years, several non-uniform image sampling schemes have been developed (see Refs. [7–9]). In Ref. [7] the authors propose a strategy to eliminate the redundant points of the erosion. They search for a subset $F_s$ of $F \ominus K$ such that (i) $F_s \oplus K = F \circ K$, and (ii) its cardinality is minimum among the subsets satisfying condition (i). However, their method chooses the minimal number of samples for which the reconstructed image equals the opened image. It does not indicate how a smaller number of samples could be chosen so that the norm of the resulting error is minimum. In this paper, we propose a new method to answer the following question: "Given N, the desired number of samples, how can the best positions of these samples be chosen in order to minimize the norm of the reconstruction error?" Our approach similarly uses erosions and dilations with flat structuring elements. However, subsampling is considered from a new point of view, leading to an optimal non-uniform sampling grid where samples are selected by considering the local visual quality of reconstruction. This paper is organized as follows. The proposed sampling scheme is described in Section 2. In Sections 3 and 4, we study the influence of the choice of structuring elements on image description and determine the optimal set of description parameters, respectively. An application of our sampling scheme to image compression is presented in Section 5. Finally, conclusions concerning this work are drawn.
2. Algorithm
Picture description is performed by selecting samples among the pixels of the eroded original picture. Each choice is made by considering the benefit the sample brings to the reconstruction when it is dilated. Since the description is a subset of the erosion, reconstruction of the samples by dilation is an approximation of the opening. As for uniform sampling, description starts with a given number N of samples. The positions of these samples, however, are selected in an optimal way by observing the local quality of the resulting reconstruction. Optimizing reconstruction quality requires the choice of a criterion, which can be a minimal mean absolute error between the original picture and the reconstructed one. Such a criterion is significant. Devor [10] has established that, among criteria of the form of an $L_p$-norm minimization $L_p(I)$, $0 < p < \infty$, the case $p = 1$ (the absolute error) is particularly well fitted to human visual properties. To describe the algorithm, we denote by $F(i, j)$ the original picture and by $\mathrm{Supp}(F)$ its support. We denote by $E(i, j)$ and $\hat F(i, j)$, respectively, the eroded and reconstructed pictures.
$F_s(i, j) \subseteq E(i, j)$ will be the subsampled picture. Our algorithm is based on the following proposition.
Proposition. For morphological sampling, minimizing the absolute error is equivalent to maximizing the volume of the reconstructed image.
Proof. Two facts are used: (i) the sample is a subset of the eroded image, and (ii) the opening is anti-extensive. Since dilation is increasing, they imply $\hat F = F_s \oplus K \subseteq E \oplus K = F \circ K \subseteq F$, i.e., $\hat F(i, j) \le F(i, j)$ for all $(i, j) \in \mathrm{Supp}(F)$. Thus
$$
\min \sum_{i,j} \big|F(i,j) - \hat F(i,j)\big| \equiv \min \sum_{i,j} \big(F(i,j) - \hat F(i,j)\big) \equiv \min \Big(\sum_{i,j} F(i,j) - \sum_{i,j} \hat F(i,j)\Big),
$$
but for a given image $\sum_{i,j} F(i,j)$ is constant; thus
$$
\min \sum_{i,j} \big|F(i,j) - \hat F(i,j)\big| \equiv \max \sum_{i,j} \hat F(i,j). \qquad \square
$$
The algorithm can now be developed. The main idea is to select each sample position according to the resulting quality or, more precisely, according to the descriptive volume the sample brings to the reconstruction. We have shown that maximizing this volume is equivalent to minimizing the absolute error between the original picture and the reconstructed one. The sampling is therefore optimal with respect to the criterion of minimal absolute error. The algorithm requires N iterations, where N is the a priori number of samples. Let K denote the flat structuring element. At each iteration, we must select the position of the sample that yields the minimal mean absolute error or, equivalently, the maximal reconstructed volume. To do so, an auxiliary function $\mathrm{Volume}_n(i, j)$, defined at each point, is used. It contains the increase in volume obtained in the reconstructed image by choosing position $(i, j)$ for sampling at the $n$th iteration. This value is simply the volume of the eroded value $E(i, j)$ dilated by the structuring element K. After a position has been determined, the auxiliary function $\mathrm{Volume}_n(i, j)$ is updated by subtracting the volume already described. Because the structuring element is small compared to the picture, this update is performed locally. The different steps of the algorithm are the following:
1. Initialization:
   • set $E \leftarrow F \ominus K$;
   • $F_s(i, j) = 0$, $\forall (i, j) \in \mathrm{Supp}(F)$;
   • $\hat F(i, j) = 0$, $\forall (i, j) \in \mathrm{Supp}(F)$;
   • $\mathrm{Volume}_1(i, j) = E(i, j) \times \mathrm{Card}(K)$, where $\mathrm{Card}(K)$ is the number of points belonging to $\mathrm{Supp}(K)$.
2. For $1 \le n \le N$ do:
   • find the position $(k, l)$ of the maximal value of $\mathrm{Volume}_n(i, j)$;
   • set $F_s(k, l) = E(k, l)$;
   • modify $\mathrm{Volume}_n(i, j)$ for $(i - k, j - l) \in K \oplus \check K$ ($\check K$ is the reflection of $K$);
   • $\mathrm{Volume}_{n+1} \leftarrow$ modified $\mathrm{Volume}_n$.
3. The reconstruction operator is a dilation by K, i.e., $\hat F = F_s \oplus K$.
Fig. 1 illustrates the different steps of the algorithm applied to a one-dimensional signal. The panels show, in order:
Fig. 1(a). The original 1D signal and the structuring element K.
Fig. 1(b). The eroded signal that the algorithm will describe by subsampling.
Fig. 1(c). The computed $\mathrm{Volume}_1(i)$, which gives the possible increases in described volume if the corresponding positions are selected.
Fig. 1(d). The first selected sample, chosen for maximal reconstructed volume.
Fig. 1(e). The updated auxiliary function, resulting from the reconstructed volume of the first sample.
Fig. 1(f). The first seven samples obtained by the algorithm.
Fig. 1(g). The signal reconstructed from these seven samples.
Fig. 1(h). The opening of the original signal by K, which represents the best possible reconstruction.
Fig. 1. Example of optimal non-uniform sampling with just one structuring element K: (a) 1D signal to subsample, F; (b) signal eroded by K; (c) auxiliary function at the first iteration, Volume1(i); (d) first sample; (e) updated auxiliary function, Volume2(i); (f) the first seven samples; (g) signal reconstructed from the seven samples of Fig. 1(f); (h) opening of F by K.
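The single-structuring-element algorithm above can be sketched for a 1D signal as follows. This is a minimal illustration, not the C listing of Appendix A. Several details are assumptions:
- the flat SE is given as a set of integer offsets that includes 0;
- the signal is non-negative;
- points outside the support are ignored in the erosion;
- each affected gain is recomputed exactly from the current reconstruction, rather than by incremental subtraction;
- the loop stops early when no positive gain remains.

```python
def erode(f, offsets):
    """Flat erosion: min of f[i + d] over d in offsets, ignoring off-support points."""
    n = len(f)
    return [min(f[i + d] for d in offsets if 0 <= i + d < n) for i in range(n)]


def gain(p, e, recon, offsets):
    """Volume added to the reconstruction by dilating the sample e[p] at p."""
    n = len(recon)
    return sum(max(0, e[p] - recon[p + d]) for d in offsets if 0 <= p + d < n)


def greedy_sampling(f, offsets, num_samples):
    """Greedy non-uniform sampling of a non-negative 1D signal with one flat SE.

    Returns ({position: eroded value}, reconstruction = samples dilated by the SE)."""
    n = len(f)
    e = erode(f, offsets)
    recon = [0] * n
    volume = [gain(p, e, recon, offsets) for p in range(n)]  # Volume_1
    # q's footprint q + K meets p + K exactly when q - p lies in K (+) reflected K
    reach = sorted({d1 - d2 for d1 in offsets for d2 in offsets})
    samples = {}
    for _ in range(num_samples):
        p = max(range(n), key=lambda q: volume[q])  # first position of maximal volume
        if volume[p] <= 0:
            break                                   # nothing left to describe
        samples[p] = e[p]
        for d in offsets:                           # dilate the new sample into F-hat
            if 0 <= p + d < n:
                recon[p + d] = max(recon[p + d], e[p])
        for r in reach:                             # local update of Volume
            if 0 <= p + r < n:
                volume[p + r] = gain(p + r, e, recon, offsets)
    return samples, recon
```

With $K = \{-1, 0, 1\}$ and `f = [0, 3, 3, 3, 0, 5, 5, 5, 5, 0]` (a signal equal to its own opening), three samples reconstruct `f` exactly. With a single sample, the absolute error equals $\sum F - \sum \hat F$, as the proposition states.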
The following remarks can be made about the algorithm:
• best quality, under the minimal absolute error criterion, for a given number of samples;
• simplicity: it uses classical morphological operations;
• rapidity: the number of iterations is driven by the number of required samples, and Volume(i, j) is updated locally. Moreover, the only operations used are additions and comparisons;
• robustness: different types of morphology can be applied, and criteria other than the number of samples can be used to stop the algorithm (for example, the global reconstruction error);
• quasi-simultaneous encoding and decoding: these two operations can be performed with a delay of just one sample extraction.
The algorithm can be applied using a family of flat structuring elements instead of a single one. Finer details are then described by small structuring elements, improving the visual quality. Flat areas are sampled by large structuring elements, resulting in fewer samples. The sample selection strategy using a set of structuring elements becomes:
Algorithm. Non-uniform sampling by a set of structuring elements
Let $K_p$, for $p = 1, 2, \ldots, M$, be M structuring elements of different sizes and/or shapes. For each structuring element $K_p$, we define the auxiliary function $\mathrm{Volume}^n_p(i, j)$ at each point $(i, j)$. Another function, $\mathrm{Index}(i, j)$, stores the size of the structuring element selected at position $(i, j)$ according to maximal reconstructed volume. The different steps of the algorithm are the following:
1. Initialization:
   • set $E_p \leftarrow F \ominus K_p$, for $p = 1, 2, \ldots, M$;
   • $F_s(i, j) = 0$, $\forall (i, j) \in \mathrm{Supp}(F)$;
   • $\hat F(i, j) = 0$, $\forall (i, j) \in \mathrm{Supp}(F)$;
   • $\mathrm{Index}(i, j) = 0$, $\forall (i, j) \in \mathrm{Supp}(F)$;
   • $\mathrm{Volume}^1_p(i, j) = E_p(i, j) \times \mathrm{Card}(K_p)$, for $p = 1, 2, \ldots, M$.
2. For $1 \le n \le N$ do:
   • find the position $(k, l)$ of the maximal value of $\mathrm{Volume}^n_p(i, j)$ over $i$, $j$ and $p$, and the corresponding index $q$ of the structuring element (= size of the S.E.);
   • set $F_s(k, l) = E_q(k, l)$ and $\mathrm{Index}(k, l) = q$;
   • modify $\mathrm{Volume}^n_p(i, j)$ for $(i - k, j - l) \in K_p \oplus \check K_q$;
   • $\mathrm{Volume}^{n+1}_p \leftarrow$ modified $\mathrm{Volume}^n_p$.
3. Here, the reconstructed image is given by $\hat F(i, j) = \sup\{F_s(k, l) \times K_p(i - k, j - l) \mid p = \mathrm{Index}(k, l) \text{ and } F_s(k, l) \neq 0\}$.
A listing of this algorithm written in C is given in Appendix A at the end of the article. From the definition of directional opening [2], one can see that if N is chosen large enough, the reconstructed image will be close to the directional opening of the original image. Fig. 2 gives qualitative results obtained by this new algorithm, for different numbers of samples, on the original image "Lena" of size 256×256 pixels. The structuring element family includes four squares of respective sizes 9×9, 7×7, 5×5 and 3×3. Fig. 2(b), the opening of the original image by the 3×3 square structuring element, corresponds to the limit in reconstructed image quality. We can observe that for a sufficiently large number of samples the reconstructed image is a good approximation of the opened image.
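The multi-element variant can be sketched in 2D with a family of centred flat squares, such as the 9×9, 7×7, 5×5 and 3×3 squares used for Fig. 2 (half sizes 4, 3, 2, 1). This NumPy sketch is an illustration only, not the Appendix A listing. Its simplifications are assumptions:
- every gain is recomputed at each iteration instead of being updated locally;
- points outside the support are ignored in the erosion;
- ties are resolved in favour of the structuring element listed first;
- the loop stops when no positive gain remains.

```python
import numpy as np


def erode_square(f, h):
    """Flat erosion by a (2h+1)x(2h+1) square; off-support points are ignored."""
    rows, cols = f.shape
    e = np.empty_like(f)
    for i in range(rows):
        for j in range(cols):
            e[i, j] = f[max(0, i - h):i + h + 1, max(0, j - h):j + h + 1].min()
    return e


def multi_se_sampling(f, half_sizes, num_samples):
    """Greedy sampling with a family of centred flat squares (sketch).

    Returns (samples, recon) with samples[(k, l)] = (value, half_size); recon
    is the sup of each sample dilated by its own square (step 3)."""
    f = np.asarray(f, dtype=float)
    rows, cols = f.shape
    eroded = [erode_square(f, h) for h in half_sizes]   # E_p = F eroded by K_p
    recon = np.zeros_like(f)

    def gain(p, i, j):
        # Volume^n_p(i, j): volume added by dilating E_p(i, j) with K_p
        h = half_sizes[p]
        win = recon[max(0, i - h):i + h + 1, max(0, j - h):j + h + 1]
        return np.maximum(eroded[p][i, j] - win, 0.0).sum()

    samples = {}
    for _ in range(num_samples):
        g, p, i, j = max(((gain(p, i, j), p, i, j)
                          for p in range(len(half_sizes))
                          for i in range(rows) for j in range(cols)),
                         key=lambda t: t[0])
        if g <= 0:
            break
        h = half_sizes[p]
        value = eroded[p][i, j]
        samples[(i, j)] = (value, h)                    # F_s(k, l) and Index(k, l)
        win = recon[max(0, i - h):i + h + 1, max(0, j - h):j + h + 1]
        np.maximum(win, value, out=win)                 # sup with the new dilated sample
    return samples, recon
```

In a small test image made of a flat 5×5 block plus one isolated pixel, the family (5×5, 1×1) describes the block with a single large-element sample and the pixel with a single small-element sample. This mirrors the intended split between flat areas and fine details.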
S. Saryazdi et al. / Pattern Recognition 33 (2000) 961–977
Fig. 2. Our non-uniform sampling algorithm applied to the original image 'Lena' with the family of flat square structuring elements of respective sizes 9×9, 7×7, 5×5 and 3×3: (a) original image 'Lena' of size 256×256 pixels; (b) 'Lena' opened by a 3×3 square structuring element; (c) 'Lena' subsampled: N = 500; (d) image reconstructed from 500 samples; (e) 'Lena' subsampled: N = 2000; (f) image reconstructed from 2000 samples; (g) 'Lena' subsampled: N = 4000; (h) image reconstructed from 4000 samples.
3. Structuring elements and description
Obviously, the number of samples influences reconstruction quality but, as for all morphological descriptions, several parameters defining the family of structuring elements must be specified: type (flat or multilevel), number of components, size and shape. As said before, the structuring element type has been fixed to flat for simplicity. To illustrate the other parameter choices, experiments with different descriptions are presented in turn, based on families composed of different structuring elements:
• 1 structuring element but different numbers of samples;
• 2 structuring elements of the same size but of different shapes;
• 2 structuring elements of different sizes and different shapes;
• 2 structuring elements of different sizes but with square shapes.
The results of this series of four experiments are now developed on the original picture 'Girl' of size 256×256 pixels (Fig. 7(a)). This picture is not really suited to morphological description, but its content is sufficiently generic, which makes it a demanding test of the algorithm's performance. For each experiment, we present the reconstructed images as qualitative results, and the corresponding peak signal-to-noise ratio (PSNR) values as a quantitative evaluation of performance. The PSNR is calculated using the classical formula:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[F(i, j) - \hat{F}(i, j)\right]^2}.$$
3.1. One structuring element but different numbers of samples
We use the same structuring element, a 9×9 square, and observe final descriptions with different numbers of samples: 2000, 3000 and 5000. Fig. 3 shows the three resulting reconstructions (Figs. 3(b)–(d)) and also presents the picture opened by the same structuring element (Fig. 3(a)), which corresponds to the best possible reconstruction, obtained when operating without any limitation on the number of samples. The reconstruction obtained with 5000 samples can be considered visually identical to the opened picture.
Fig. 3. Qualitative results obtained by our algorithm applied to image 'Girl' with a square structuring element of size 9×9: (a) opening of image 'Girl' (PSNR = 20.36); (b) image reconstructed from 2000 samples (PSNR = 19.74); (c) image reconstructed from 3000 samples (PSNR = 20.16); (d) image reconstructed from 5000 samples (PSNR = 20.34).
3.2. Two structuring elements of the same size but different shapes
Description is now performed with a family of two structuring elements of the same size: one square and one rhombus (a 9×9 square, and a rhombus with an area of 85 pixels, which is the closest to the square). The results are presented in Fig. 4 for 3000 samples. The reconstruction quality is comparable to that obtained previously with 5000 samples but using just one structuring element. Of course, this experiment requires one extra description bit per sample to specify the shape of the structuring element used.
3.3. Two structuring elements of different sizes and/or different shapes
For this experiment we have tested four families, each with two structuring elements:
• 2 squares: 9×9 and 5×5;
• 2 rhombi: 85 and 25 pixels;
• 1 square of 9×9 and 1 rhombus of 25 pixels;
• 1 rhombus of 85 pixels and 1 square of 5×5.
The number of samples was 3000. The reconstructed images have similar visual quality and their PSNR values are close to each other. However, compared with the two previous experiments of this series (one structuring element, or two structuring elements of the same size), the visual quality here is better. Fig. 5 shows the reconstructed image obtained in the four cases. Fig. 5(a) is the reconstruction corresponding to a family with two squares, 9×9 and 5×5; Fig. 5(b) corresponds to the two rhombi of 85 and 25 pixels; Fig. 5(c) is the result obtained with a 9×9 square and a 25-pixel rhombus; finally, Fig. 5(d) corresponds to an 85-pixel rhombus and a 5×5 square.
4. Optimal description set
The previous parameters must be defined whenever such a picture description is carried out. To be efficient for image representation, an optimal set of structuring elements must be defined, specifying shape, number of components, sizes and number of samples. This set can be considered optimal, on the one hand, for pictures of general character such as those presented here and, on the other hand, in our image representation context, where good description quality is reached when the reconstruction is close to the original picture opened by the smallest structuring element.
Fig. 4. Image reconstructed from 3000 samples with two structuring elements, a 9×9 square and an 85-pixel rhombus, PSNR = 20.42.
Fig. 5. Image reconstructed by optimal non-uniform sampling with N = 3000 samples using different families of two structuring elements: (a) 2 squares, 9×9 and 5×5; (b) 2 rhombi of 85 and 25 pixels; (c) a 9×9 square and a 25-pixel rhombus; (d) an 85-pixel rhombus and a 5×5 square.
4.1. Shape
In previous experiments it has been observed that, for a given family of structuring elements, the shape of the elements can be fixed while their size varies. Other experiments, searching for an optimal shape by genetic-algorithm optimization, have shown that the square shape is best. This efficiency of the square shape is related to a better description of horizontal and vertical structures, which are, statistically, the main components of edges in natural pictures [11]. Moreover, algorithms with square structuring elements are simpler.
4.2. Number of structuring elements
Using a family of structuring elements with more components has two consequences:
• reconstruction quality grows, since at each sample the description selects the best structuring element of the family;
• description cost increases, as each sample must specify which structuring element is used.
Information compression usually involves a compromise between quality and bit rate. Morphological descriptions are oriented rather toward medium-to-low-quality reconstruction at very low bit rates, and experiments in this context have shown that operating with four structuring elements is a good compromise.
4.3. Size
Concerning the size, two considerations arise:
• The size of the smallest structuring element determines the resolution of the reconstruction. With no limitation on the number of samples, the reconstruction converges to the picture opened by the smallest structuring element.
• The size of the largest structuring element reduces the global number of samples if it fits the picture content.
From these considerations, a size of 3×3 for the smallest element gives adequate quality for our rather low-quality applications. For the largest structuring element, a size of approximately 10% of the image size is in general a good choice.
4.4. Number of samples
Visual quality depends on the number of samples. The position of a sample is chosen according to the volume it adds to the reconstruction, so the first samples correspond to larger structuring elements and the last samples tend to correspond to smaller ones. It can be observed that, as the iterations proceed, the volume added to the reconstructed picture becomes very small. Once a given number of samples has been reached, the improvement brought by new samples is weak and, given the growing cost of the samples, it is better to stop the algorithm. In one experiment, a family of four square structuring elements with sizes 3×3, 5×5, 11×11 and 21×21 was used. Fig. 6 shows the variation of the reconstruction PSNR with the number of samples. The reconstruction cannot be better than the picture opened by the 3×3 square (corresponding to a PSNR of 29.63 dB). Qualitative results can be observed in Fig. 7.
It can be noticed that image quality with 4000 samples is close to the opened picture.
Fig. 6. Variation of PSNR with the number of samples.
5. Application to image compression
Optimal non-uniform morphological sampling produces a compact image representation, so an application to image compression seems attractive. The image obtained after such a description is a sparse matrix (see Fig. 2) in which each sample carries three types of information:
• the gray level of the sample (corresponding to the eroded value at that point),
• its position in the image, and
• the index of the associated structuring element.
5.1. Coding method
Two approaches are possible for encoding this matrix. The first consists in separating the information to be coded into two parts: a binary map representing the positions of the samples, and a binary string coding the luminance and the structuring element index. The second approach codes the three kinds of information jointly; in this case, the image to be coded is a multi-level one. In this paper, we develop only the first approach, whose main advantages are that it is easier to implement and faster than the second.
The problem is now to find a method well adapted to sparse matrix coding. For this purpose, a comparative study of several well-known binary coding techniques proposed in the literature has been carried out. The retained methods are the following:
• Elias predictive coding [12],
• hierarchical block coding proposed by Frant and Nevalainen [13],
• predictive methods proposed by Netravali and Mounts [14],
• block coding proposed by Zeng and Ahmed [15],
• block coding proposed by Kunt and Johnson [16],
• TUH code proposed by Musmann and Preuss [17],
• arithmetic coding proposed by Rissanen and Langdon [18].
Fig. 7. Variation of the quality of the reconstructed image with the number of samples: (a) original image 'Girl' of size 256×256 pixels; (b) 'Girl' opened by a 3×3 square structuring element; (c) 'Girl' reconstructed from its first 100 samples; (d) 'Girl' reconstructed from its first 600 samples; (e) 'Girl' reconstructed from its first 2400 samples; (f) 'Girl' reconstructed from its first 4000 samples.
All these methods have been applied to several images obtained by the algorithm described in the previous section. Zeng's method gives the lowest bit rate for all tested images; moreover, it is very simple. The matrix to be coded is mapped, by classical video scanning, into a one-dimensional array of size $2^M$, which is decomposed into $2^a$ blocks of size $2^b$ ($M = a + b$). The blocks are separated from each other by a comma, coded as '0'. If a block contains at least one '1', each '1' is coded by a prefix '1' followed by $b$ bits indicating its location in the block. No coding is needed for a block that contains no '1'. The optimal value for $a$ is given by $a = \log_2(N / \ln 2)$, where $N$ is the number of '1's in the matrix to be coded; if $a$ is not an integer, it must be rounded.
At this stage, only the sample positions have been coded. For complete coding, the gray level and the structuring element index of each sample must also be encoded. In our application, to reduce the gray-level coding cost, the original images were first quantized on 16 gray levels (by an optimal quantization minimizing the mean squared error [19]), so gray-level coding needs 4 bits per sample. Finally, the structuring element index requires 2 bits because, as stated in the previous section, a family of 4 structuring elements represents the best quality/cost compromise. So, if $N$ is the number of samples in the image description, the coded image is a string of $(2^a - 1) + (b + 1) \times N$ bits representing the position matrix (the $2^a - 1$ commas plus $b + 1$ bits per sample), followed by $(4 + 2) \times N$ bits for the gray levels and the structuring element indices.
5.2. Experimental results
To assess the performance of our approach, we have compared our results with those given by the JPEG standard (Joint Photographic Experts Group) [20], which can currently be considered the reference for still image compression. As underlined in the section devoted to the optimal description set, this image description method is well adapted to medium-to-low-quality reconstruction of images at low bit rates. The chosen images belong to the 'videoconference' or 'head and shoulders' image classes. These images are simpler than pictures like 'Girl', which was used to optimize the parameters of the description algorithm, but they are typical of the target low-bit-rate applications. Fig. 8 presents an image of size 128×128 pixels extracted from the original QCIF sequence 'carphone' (QCIF format corresponds to images of size 144×176 pixels). Fig. 9 illustrates the qualitative and quantitative results obtained by our method on the one hand and by the JPEG standard on the other, for different compression ratios. Visual inspection shows that our method gives better visual quality than JPEG at high compression ratios (roughly above 12), even though its PSNR is lower. As the compression ratio decreases, the opposite tendency can be observed. This application confirms the interest of our method for low-cost image description and representation, but also reveals its limitations with respect to the final quality of the reconstructed image.
Fig. 8. Original image of size 128×128 pixels extracted from the original QCIF sequence 'carphone'.
Fig. 9. Comparison of results obtained by morphological non-uniform sampling (top) with those obtained by the JPEG standard (bottom), for different compression ratios (CR).
6. Conclusion
This paper presents a new image description scheme based on non-uniform morphological sampling with an optimal reconstruction quality criterion. Experimental results have helped to establish parameter sets for the algorithm that are well adapted to low-cost image representation. This non-uniform morphological sampling is at the same time simple and optimal: while optimality of the description is ensured, only basic operations are applied. In the last part of the paper, the method has been validated by a comparative study in image compression. The global bit rate or the reconstruction quality is easy to control. For low-bit-rate applications, this original approach offers attractive performance.
Acknowledgements
This work was supported by the 'Centre commun d'études de Télédiffusion et Télécommunications' (C.C.E.T.T., France) under Research Convention no. 94ME30.
Appendix A. Listing of the non-uniform morphological sampling algorithm with a family of flat square structuring elements

/* ------------------------------------------------------------------ */
/* Morphological non-uniform sampling on images of size N1 x N2.      */
/* The structuring elements are flat squares whose size is defined    */
/* by the user.                                                       */
/* Multiple selection of a position by several structuring element    */
/* indexes is not allowed.                                            */
/* ------------------------------------------------------------------ */
#include <stdio.h>
#include <stdlib.h>

#define NBELTMAX 4

/* Clamp (a + b), resp. (a - b), to [c, d]: border replication */
#define add(a, b, c, d) (((a) + (b) < (c)) ? (c) : (((a) + (b) > (d)) ? (d) : (a) + (b)))
#define sub(a, b, c, d) (((a) - (b) < (c)) ? (c) : (((a) - (b) > (d)) ? (d) : (a) - (b)))

/* Function prototypes */
void Eros_2D(short *Imi, short *Er, int N1, int N2, int p);
void Dilat_2D(short *Imi, short *Dil, int N1, int N2, int p);
int Cardlcar(int N1, int N2, int p, int i, int j);

int main(void)
{
    short *Er[NBELTMAX], *Imo[NBELTMAX], *Imrec, *Ind, *Imi, er, *pos;
    int y, z, nbelt, M, N1, N2, *Sn[NBELTMAX];
    int i, j, k, m, n, mx, kk, pp, m1, n1, q2, sz, s[NBELTMAX], a, b, c;

    /* Image size determination */
    /* ... (omitted in the printed listing) ... */
    sz = N1 * N2;

    /* Parameter input */
    printf("Number of samples : ");
    scanf("%d", &M);
    printf("Number of structuring elements : ");
    scanf("%d", &nbelt);
    if (nbelt < 1 || nbelt > NBELTMAX)   /* the arrays hold NBELTMAX entries */
        return 1;
    for (n = 0; n < nbelt; n++) {
        printf("Structuring element no %d :\n", n);
        printf(" -> x and y size : ");
        scanf("%d", &s[n]);
    }

    /* Memory allocation (calloc zero-initializes every buffer) */
    Imi = (short *)calloc(sz, sizeof(short));
    for (i = 0; i < nbelt; i++)
        Imo[i] = (short *)calloc(sz, sizeof(short));
    Imrec = (short *)calloc(sz, sizeof(short));
    Ind = (short *)calloc(sz, sizeof(short));
    pos = (short *)calloc(sz, sizeof(short));
    for (i = 0; i < nbelt; i++) {
        Sn[i] = (int *)calloc(sz, sizeof(int));
        Er[i] = (short *)calloc(sz, sizeof(short));
    }

    /* Image file opening */
    /* ... (omitted in the printed listing) ... */

    /* Initialization: erosion by each structuring element */
    for (i = 0; i < nbelt; i++)
        Eros_2D(Imi, Er[i], N1, N2, s[i]);

    /* Volume function initialization: Volume = erosion x Card(K) */
    for (kk = 0; kk < nbelt; kk++)
        for (i = 0; i < N1; i++)
            for (j = 0; j < N2; j++) {
                b = Cardlcar(N1, N2, s[kk], i, j);
                Sn[kk][i * N2 + j] = Er[kk][i * N2 + j] * b;
            }

    /* Iterations: one sample per pass */
    for (k = 1; k < M + 1; k++) {
        mx = 0; j = 0; pp = 0;
        /* Largest volume over the positions not yet selected */
        for (i = 0; i < sz; i++)
            for (kk = 0; kk < nbelt; kk++)
                if (Sn[kk][i] > mx && pos[i] == 0) {
                    mx = Sn[kk][i];
                    j = i;      /* localisation of found sample */
                    pp = kk;    /* corresponding S.E. index */
                }
        if (mx == 0)            /* no volume left to add: stop early */
            break;

        /* Found sample coordinates */
        m = j / N2;
        n = j % N2;

        /* Sample value, index and position update */
        er = Er[pp][m * N2 + n];
        Imo[pp][m * N2 + n] = er;
        Ind[m * N2 + n] = (short)(pp + 1);
        pos[m * N2 + n] = 255;

        /* Image reconstruction: paint the flat square with the sample value */
        for (i = 0; i < s[pp]; i++)
            for (j = 0; j < s[pp]; j++) {
                m1 = add(m, i, 0, N1 - 1);
                n1 = add(n, j, 0, N2 - 1);
                if (Imrec[m1 * N2 + n1] < er)
                    Imrec[m1 * N2 + n1] = er;
            }

        /* Local modification of volume Sn for each structuring element:
           only positions whose square overlaps the new sample change */
        for (kk = 0; kk < nbelt; kk++) {
            q2 = s[kk];
            m1 = m + s[pp];
            n1 = n + s[pp];
            for (y = m - q2 + 1; y < m1; y++)
                for (z = n - q2 + 1; z < n1; z++) {
                    if (y < 0 || y >= N1 || z < 0 || z >= N2)
                        continue;
                    Sn[kk][y * N2 + z] = 0;
                    b = Er[kk][y * N2 + z];
                    for (i = 0; i < q2; i++)
                        for (j = 0; j < q2; j++)
                            if (y + i < N1 && z + j < N2) {
                                a = Imrec[(y + i) * N2 + z + j];
                                c = b - a;
                                if (c < 0) c = 0;
                                Sn[kk][y * N2 + z] += c;
                            }
                }
        }
    } /* end of loop on M */

    /* Results storing */
    /* ... (omitted in the printed listing) ... */

    /* Memory release */
    for (i = 0; i < nbelt; i++) {
        free(Er[i]);
        free(Sn[i]);
        free(Imo[i]);
    }
    free(Imi);
    free(Imrec);
    free(Ind);
    free(pos);
    return 0;
}

/* ------------------------------------------------------------------
   Erosion by a flat square structuring element of size p*p
   -> p = spatial dimension in x and y
   ------------------------------------------------------------------ */
void Eros_2D(short *Imi, short *Er, int N1, int N2, int p)
{
    int i, j, l, m, l1, m1;
    short mini;

    for (i = 0; i < N1; i++)
        for (j = 0; j < N2; j++) {
            mini = 256;
            for (l = 0; l < p; l++)
                for (m = 0; m < p; m++) {
                    l1 = add(i, l, 0, N1 - 1);
                    m1 = add(j, m, 0, N2 - 1);
                    if (Imi[l1 * N2 + m1] < mini)
                        mini = Imi[l1 * N2 + m1];
                }
            Er[i * N2 + j] = mini;
        }
}

/* ------------------------------------------------------------------
   Dilation by a flat square structuring element of size p*p
   -> p = spatial dimension in x and y
   ------------------------------------------------------------------ */
void Dilat_2D(short *Imi, short *Dil, int N1, int N2, int p)
{
    int i, j, l, m, max, l1, m1;

    for (i = 0; i < N1; i++)
        for (j = 0; j < N2; j++) {
            max = 0;
            for (l = 0; l < p; l++)
                for (m = 0; m < p; m++) {
                    l1 = sub(i, l, 0, N1 - 1);
                    m1 = sub(j, m, 0, N2 - 1);
                    if (Imi[l1 * N2 + m1] > max)
                        max = Imi[l1 * N2 + m1];
                }
            Dil[i * N2 + j] = (short)max;
        }
}

/* ------------------------------------------------------------------
   Determination of the number of points belonging to the structuring
   element support = Card(K), i.e. the points of the p*p square anchored
   at (i, j) that lie inside the image. The bounds are (i + l) < N1 and
   (j + m) < N2, consistent with the volume update in main (the printed
   listing compared against N1 - p and N2 - p here).
   ------------------------------------------------------------------ */
int Cardlcar(int N1, int N2, int p, int i, int j)
{
    int l, m, cd = 0;

    if (i >= 0 && i <= N1 - p && j >= 0 && j <= N2 - p)
        cd = p * p;                     /* square entirely inside the image */
    else
        for (l = 0; l < p; l++)
            for (m = 0; m < p; m++)
                if (i + l >= 0 && j + m >= 0 && i + l < N1 && j + m < N2)
                    cd++;
    return cd;
}
References
[1] P. Salembier, Morphological multi-scale segmentation for image coding, Signal Processing 38 (1994) 359–386.
[2] P. Salembier, L. Torres, F. Meyer, C. Gu, Region-based video coding using mathematical morphology, Proc. IEEE 83 (6) (1995) 843–857.
[3] D. Wang, C. Labit, J. Ronsin, Region-based motion compensated video coding using morphological simplification, Picture Coding Symposium, Melbourne, Australia, 1996.
[4] R.M. Haralick, X. Zhuang, C. Lin, J. Lee, The digital morphological sampling theorem, IEEE Trans. Acoust. Speech Signal Process. 37 (1989) 2067–2090.
[5] H. Heijmans, A. Toet, Morphological sampling, CVGIP: Image Understanding 54 (3) (1991) 384–400.
[6] X. Kong, J. Goutsias, A study of pyramidal techniques for image representation and compression, JVCIR 5 (2) (1994) 190–203.
[7] D. Wang, C. Labit, A lossless morphological sampling scheme for segmented image compression, Proceedings of ICIP, Washington DC, USA, October 1995.
[8] P. Maragos, R. Schafer, Morphological skeleton representation and coding of binary images, IEEE Trans. Acoust. Speech Signal Process. 34 (1986) 1228–1244.
[9] R. Jeannot, D. Wang, V. Haese-Coat, Binary image representation and coding by a double-recursive morphological algorithm, Signal Process. Image Commun. 8 (3) (1996) 241–266.
[10] R.A. DeVore, B. Jawerth, B.J. Lucier, Image compression through wavelet transform coding, IEEE Trans. Inform. Theory 38 (2) (1992) 719–746.
[11] N. Keskes, F. Kretz, H. Maitre, Statistical study of edges in TV pictures, IEEE Trans. Commun. 27 (8) (1979) 1239–1247.
[12] P. Elias, Predictive coding, IRE Trans. Inform. Theory 2 (1955) 24–33.
[13] P. Frant, O. Nevalainen, Compression of binary images by composite methods based on block coding, IEEE J. Visual Commun. Image Representation 6 (4) (1995) 366–377.
[14] A.N. Netravali, F. Mounts, Ordering techniques for facsimile coding: a review, Proc. IEEE 68 (1980) 770–786.
[15] G. Zeng, N. Ahmed, A block coding technique for encoding sparse binary patterns, IEEE Trans. Acoust. Speech Signal Process. 37 (5) (1989) 778–780.
[16] M. Kunt, O. Johnson, Block coding: a tutorial review, Proc. IEEE 68 (1980) 770–786.
[17] H. Musmann, D. Preuss, Comparison of redundancy reducing codes for facsimile transmission of documents, IEEE Trans. Commun. 25 (11) (1977) 1425–1433.
[18] G.G. Langdon, J. Rissanen, Compression of black-white images with arithmetic coding, IEEE Trans. Commun. 29 (6) (1981) 858–867.
[19] A.N. Netravali, B.G. Haskell, Digital Picture Representation and Compression, AT&T Bell Laboratories, New York, 1988.
[20] G.K. Wallace, The JPEG still picture compression standard, Commun. ACM 34 (4) (1991) 31–44.
About the Author: SAEÏD SARYAZDI was born in Iran in 1960. He graduated from the Department of Electrical and Electronic Engineering at Shahid Bahonar University in Iran. From January 1994 to October 1997, he was a Ph.D. student at the laboratoire ARTIST at the INSA de Rennes, France. He obtained his Ph.D. degree in image processing from the University of Rennes in October 1997. Since the end of 1997, he has been a Professor in the Department of Electrical and Electronic Engineering at Shahid Bahonar University in Kerman, Iran. His principal research interests are image representation and compression, and mathematical morphology.
About the Author: VÉRONIQUE HAESE-COAT was born in France in 1960. She graduated in Electronic Engineering in 1983 from the Institut National des Sciences Appliquées (INSA) de Rennes, where she also received a Ph.D. degree in image processing in 1987. Since October 1989, she has been a lecturer in the Department of Electrical Engineering at the INSA de Rennes and a member of the laboratoire ARTIST at this Institute. She became Directeur de Recherches in 1998. Her principal research interests are texture analysis for segmentation, classification and pattern recognition, mathematical morphology and image coding.
About the Author: JOSEPH RONSIN was born in 1948. He is of French nationality. He obtained an M.Sc. in Electronics from the University of Rennes, France, in 1972. He became a lecturer at the Institut National des Sciences Appliquées de Rennes, France, in 1972. He became Directeur de Recherches in 1989 and Professor one year later. He has been responsible for several industrial grants between the INSA and state or private laboratories in the fields of image analysis, image synthesis and image compression. Joseph Ronsin is a Professor in the Department of Electronic Engineering of the INSA and Director of the Laboratoire ARTIST at this Institute. He is also an external researcher for IRISA/INRIA Rennes.
His principal research interests are texture analysis and image coding.
Pattern Recognition 33 (2000) 979–995
Similarity measures for convex polyhedra based on Minkowski addition
Alexander V. Tuzikov^a, Jos B.T.M. Roerdink^b,*, Henk J.A.M. Heijmans^c
^a Institute of Engineering Cybernetics, Academy of Sciences of Republic Belarus, Minsk, Byelorussia
^b Institute for Mathematics and Computing Science, University of Groningen, P.O. Box 800, 9700 AV Groningen, Netherlands
^c Centre for Mathematics and Computer Science, Amsterdam, Netherlands
Received 23 June 1999; received in revised form 27 July 1999; accepted 27 July 1999
Abstract
In this paper we introduce and investigate similarity measures for convex polyhedra based on Minkowski addition and on inequalities for the mixed volume and volume related to the Brunn–Minkowski theory. All measures considered are invariant under translations; furthermore, some of them are also invariant under subgroups of the affine transformation group. For the case of rotation and scale invariance, we prove that to obtain the measures based on (mixed) volume, it is sufficient to compute certain functionals only for a finite number of critical rotations. The paper presents a theoretical framework for comparing convex shapes and contains a complexity analysis of the solution. Numerical implementations of the proposed approach are not discussed.
© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Shape comparing; Similarity measure; Convex set; Convex polyhedron; Minkowski addition; Slope diagram representation; Affine transformation; Similitude; Volume; Mixed volume; Brunn–Minkowski inequality
* Corresponding author. Tel.: +31-50-3633931; fax: +31-50-3633800. E-mail address: [email protected] (J.B.T.M. Roerdink)
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00159-4
1. Introduction
Shape comparison is one of the fundamental problems of machine vision. In the literature, shape similarity is usually measured either by a distance function or by a similarity measure. In practice, it is usually important for the result of a comparison to be invariant under some set of shape transformations, which leads to complicated optimization problems. On the other hand, one always wants to compare shapes efficiently. Since this is not possible in general, it is important to study and describe shape classes and transformation sets for which a compromise between generality and efficiency can be found. Most of the known related results are valid for comparing 2D shapes (see, for example, Refs. [1,2]), and it is not clear how to extend them to compare 3D shapes efficiently, mainly because they rely on contour representations of the shapes. In 3D the problem becomes much more difficult, and here polyhedral shapes can be considered a simple but sufficiently general model for developing shape comparison techniques. In this paper, however, we deal with the more constrained case of convex polyhedral shapes. This allows us to estimate the complexity of the problem and to develop an approach that, unlike other methods known in the literature, avoids checking all possible variants.
The method we use in this paper for comparing convex polyhedra is based on Minkowski addition. The Brunn–Minkowski theory [3] allows one to introduce several similarity measures for convex shapes based on inequalities for the volume and mixed volume. We consider similarity measures for convex shapes which are invariant under subgroups of the group of affine transformations on $\mathbb{R}^3$, and follow the outline of paper [4], devoted to 2D convex polygons. All these similarity measures are translation-invariant. If one considers measures which are invariant under the group of orthogonal transformations, the direct computation of similarity measures in the 3D case becomes very time-consuming. Every orthogonal transformation with positive determinant can be considered as a rotation about some axis by a fixed angle. Therefore, the optimization must be performed over all possible rotation axes and rotation angles.
Data representation is a very important part of every computation. A spherical representation of convex polyhedra is the most suitable when dealing with Minkowski addition. One of the simplest such spherical representations is the extended Gaussian image (EGI). In this representation, every polyhedral facet is represented by the point on the unit sphere given by the facet's unit normal vector, and this point is assigned a weight equal to the area of the facet. It follows from the Minkowski existence theorem [3] that the discrete distribution of these weights uniquely defines a convex polyhedron (up to translation). The representation is translation-invariant, and if the polyhedron rotates, its EGI rotates in the same way. Owing to these properties, the EGI is often used in computer vision for recognition and pose determination of 3D shapes [5–8]. Although the EGI defines a unique convex polyhedron, reconstructing the polyhedron itself from its EGI is a difficult problem. Several algorithms have been developed for this reconstruction. Little [9] suggested an iterative algorithm which finds the distances of the polyhedral facets from the origin. Recently, Moni [10] proposed an algorithm which first establishes an adjacency relation between facets and then finds the directions and lengths of the polyhedral edges. However, this algorithm is quite time-consuming, since it requires solving nonlinear optimization problems.
The time complexity of reconstructing a polytope from its EGI was investigated in Ref. [11]. Since the EGI is limited to convex shapes, several extensions of it have been proposed in the literature to deal with non-convex shapes as well [12–14]. This paper deals only with convex polyhedra and uses the slope diagram representation [15]. The facets, edges and vertices of a polyhedron are represented on the unit sphere in $\mathbb{R}^3$ by spherical points, spherical arcs and spherical polygons, respectively. Additionally, we keep information about the areas of the facets and the lengths of the polyhedral edges. This representation is unique for convex polyhedra and allows easy reconstruction of a polyhedron as well as computation of the Minkowski sum of polyhedra. It is redundant in comparison to the EGI, which contains only the spherical points and the areas of the corresponding polyhedral facets. As will be shown later, spherical arcs also play an important role in computing similarity measures for convex polyhedra. Although they can in fact be derived from the spherical points using time-consuming reconstruction algorithms, we prefer to have them explicitly in the polyhedron representation.

If one restricts oneself to comparing convex polyhedra, then it is possible to prove that the volume and the mixed volume (which will be referred to as 'objective functionals') of a Minkowski sum of polyhedra are piecewise concave functions of the rotation angle of one polyhedron about a fixed axis of rotation. This implies that, for every fixed rotation axis, there is only a finite number of rotation angles at which the objective functionals must be computed in order to obtain the similarity measure. We also show that the set of rotation axes to be checked can be found using only information about the orientation of the facets of the polyhedra and the position of their edges. This set also depends on the similarity measure under consideration. Moreover, we show that for the case of (mixed) volume the set of rotation axes to be checked is finite.

The paper is organized as follows. In Section 2 we briefly discuss approaches to Minkowski addition of convex polyhedra and introduce the slope diagram representation of convex polyhedra, as well as some facts about the affine transformation group and its subgroups. Properties of mixed volumes and the main inequalities of the Brunn–Minkowski theory needed in the paper are given in Section 3. To compare convex polyhedra, we introduce in Section 4 the notion of similarity measures and define a number of such measures based on inequalities for the volume and mixed volume. In Section 5 we investigate similarity measures based on (mixed) volume that are invariant under rotations and scaling. Given any axis of rotation, we prove that it is sufficient to compute the objective functionals needed to obtain these measures for only a finite number of critical rotations, thus generalizing a similar result for the 2D case [4].
Moreover, for the case of (mixed) volume we prove that only a finite number of rotation axes has to be checked.
2. Preliminaries

This section presents basic notation and other prerequisites needed in the remainder of the paper. The representation of convex polyhedra using slope diagrams is also introduced, as well as some facts about the affine transformation group and its subgroups. By $\mathcal{K}(\mathbb{R}^3)$, or briefly $\mathcal{K}$, we denote the family of all nonempty compact subsets of $\mathbb{R}^3$. Provided with the Hausdorff distance [3], this is a metric space. The compact convex subsets of $\mathbb{R}^3$ are denoted by $\mathcal{C} = \mathcal{C}(\mathbb{R}^3)$, and the convex polyhedra by $\mathcal{P}(\mathbb{R}^3)$. In this paper we are not interested in the location of a shape $A \subset \mathbb{R}^3$; in other words, two shapes $A$ and $B$ are said to be equivalent if they differ only by a translation. We denote this as $A \equiv B$.
2.1. Minkowski addition of convex polyhedra

Minkowski addition of two sets $A, B \subset \mathbb{R}^n$ is defined by $A \oplus B = \{a + b \mid a \in A,\ b \in B\}$. It is well known [3] that every element $A$ of $\mathcal{C}$ is uniquely determined by its support function, given by $h(A, u) = \sup\{\langle a, u\rangle \mid a \in A\}$, $u \in S^2$. Here $\langle a, u\rangle$ is the inner product of the vectors $a$ and $u$, and $S^2$ denotes the unit sphere in $\mathbb{R}^3$. It is also known [3] that

$$h(A \oplus B, u) = h(A, u) + h(B, u), \quad u \in S^2, \tag{1}$$

for $A, B \in \mathcal{C}$. The support set $F(A, u)$ of $A$ at $u \in S^2$ consists of all points $a \in A$ for which $\langle a, u\rangle = h(A, u)$. Support sets can be of dimension 0, 1 or 2. A support set of dimension $k$ ($k = 0, 1, 2$) is called a $k$-face and denoted by $F^k$. If $A$ is a convex polyhedron, then its 0-faces, 1-faces and 2-faces are called the vertices, edges and facets of $A$, respectively. Henceforth, a facet will be denoted by $F_i$, and its area by $S(F_i)$. It is known from Minkowski's existence theorem [16] (see also Ref. [3, p. 390] for a discussion of the n-dimensional case as well as a general concept of surface measures for convex sets) that a convex polyhedron is uniquely determined by the areas and normal vector directions of its facets.

Theorem 2.1 (Minkowski's existence theorem). Let $u_1, \ldots, u_k \in S^2$ be distinct vectors linearly spanning $\mathbb{R}^3$, and let $m_1, \ldots, m_k$ be positive real numbers such that $\sum_{i=1}^{k} m_i u_i = 0$. Then there exists a convex polyhedron $P$ in $\mathbb{R}^3$ having $k$ facets with normal vectors $u_i$ and areas $m_i$, i.e., $S(F(P, u_i)) = m_i$ for $i = 1, \ldots, k$.

This theorem is true for n-dimensional polytopes as well. Several equivalent ways are known to define Minkowski addition [17] for convex polyhedra using representations based on vertices or facets. These are especially helpful for the actual computation of Minkowski sums. Let $p_i$, $i = 1, \ldots, n_P$, be the vertices of $P$ and $q_j$, $j = 1, \ldots, n_Q$, be those of $Q$. Then

$$P \oplus Q = \operatorname{conv}\{p_i + q_j \mid i = 1, \ldots, n_P,\ j = 1, \ldots, n_Q\}.$$

Here $\operatorname{conv}\{\cdot\}$ denotes the convex hull.

Theorem 2.2. Let $P$ and $Q$ be two convex polyhedra in $\mathbb{R}^3$. Then for every $u \in S^2$,

$$F(P \oplus Q, u) = F(P, u) \oplus F(Q, u). \tag{2}$$

Here the added facets, edges and vertices lie in supporting planes with parallel outward normals. This theorem is valid for the n-dimensional case as well [3, Theorem 1.7.5]. Eq. (2) is the basis for computing the Minkowski sum of convex polyhedra. We follow here the outline of Ref. [15] and refer to it for a more detailed discussion. Since a convex polyhedron is defined by its oriented facets, to compute $P \oplus Q$ it suffices to find the facets of $P \oplus Q$. For every facet $F(P \oplus Q, u)$, the unit normal vector $u$ is either orthogonal to a facet of $P$ and/or $Q$, or there exist non-parallel edges of $P$ and $Q$ for which $u$ is a normal vector. Therefore the facets of $P \oplus Q$ can be obtained by [15,18]:
(1) Minkowski addition of two facets: addition of a facet of $P$ and a facet of $Q$;
(2) Minkowski addition of a facet and an edge: addition of a facet of one of the two summands and an edge of the other;
(3) Minkowski addition of a facet and a vertex: addition of a facet of one of the two summands and a vertex of the other;
(4) Minkowski addition of two non-parallel edges: addition of non-parallel edges of $P$ and $Q$.

2.2. Polyhedra representation

The remainder of the paper makes use of the slope diagram representation (SDR) of convex polyhedra [15]. In this representation, the facets, edges and vertices of a polyhedron are given by points, spherical arcs and convex spherical polygons of the unit sphere $S^2$, respectively; see Fig. 1.
• Facet representation. A facet $F_i$ of a polyhedron that is orthogonal to the unit vector $u_i$ is represented on the sphere $S^2$ by the end point of this vector.
• Edge representation. Each edge is represented by the arc of the great circle (spherical arc) joining the two points corresponding to the two facets adjacent to the edge.
• Vertex representation. The region of the sphere (called the spherical polygon) bounded by the spherical arcs corresponding to the edges adjacent to a polyhedral vertex represents this vertex on the sphere $S^2$. The spherical arcs are included in the region.
Sometimes we speak about spherical points and arcs of a polyhedron, meaning the spherical points and arcs of its slope diagram representation. Weights of spherical points and spherical arcs are also used: the weight of a spherical point or arc equals the area of the corresponding polyhedral facet or the length of the corresponding polyhedral edge, respectively.
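As a small illustration of the point and arc weights just described, the sketch below builds them for an axis-aligned box, where facet adjacency is easy to write down by hand. It is a hypothetical helper (`box_sdr`), not the general construction of Ref. [15]: for an arbitrary convex polyhedron the arcs come from its actual facet adjacency. The checks confirm the counts (6 points, 12 arcs, 8 spherical triangles), the closure condition $\sum m_i u_i = 0$ of Theorem 2.1, and that, when two boxes are added, arc weights of parallel edges add while facet areas do not.

```python
from itertools import combinations


def axis_of(u):
    """Index of the coordinate axis along which the axis-aligned unit normal u points."""
    return next(k for k in range(3) if u[k] != 0)


def box_sdr(a, b, c):
    """Weighted spherical points and arcs of a box with side lengths a, b, c along x, y, z.

    points: outward unit normal -> area of that facet (spherical point weights)
    arcs:   (u, v) for adjacent facets -> length of their common edge (arc weights)
    """
    dims = (a, b, c)
    points = {}
    for axis in range(3):
        for sign in (1, -1):
            u = tuple(sign if k == axis else 0 for k in range(3))
            s, t = (dims[k] for k in range(3) if k != axis)
            points[u] = s * t
    arcs = {}
    for u, v in combinations(points, 2):
        i, j = axis_of(u), axis_of(v)
        if i == j:
            continue  # opposite facets share no edge, so there is no arc
        arcs[(u, v)] = dims[3 - i - j]  # the common edge runs along the third axis
    return points, arcs


points, arcs = box_sdr(1, 2, 3)
# Each box vertex is a spherical triangle bounded by three mutually adjacent points.
vertices = [t for t in combinations(points, 3) if len({axis_of(u) for u in t}) == 3]
print(len(points), len(arcs), len(vertices))  # 6 12 8
# Closure condition of Theorem 2.1: sum of m_i * u_i is the zero vector.
print(tuple(sum(m * u[k] for u, m in points.items()) for k in range(3)))  # (0, 0, 0)

# Boxes add dimension-wise: arc weights of the sum are sums of parallel edge lengths,
# whereas facet areas of the sum are not the sums of the facet areas.
p2, arcs2 = box_sdr(2, 1, 1)
p12, arcs12 = box_sdr(3, 3, 4)
```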
Fig. 1. Polyhedron (a) and its slope diagram representation (b).

Therefore, the SDR of a polyhedron $P$ is a triple $SDR(P) = (V, A, W)$. Here $V = \{u_1, u_2, \ldots, u_{n_P}\}$ is the set of spherical points, for which the same notation is used as for the corresponding unit normal vectors $\{u_i\}$ of $P$. $A \subseteq V \times V$ is the set of spherical arcs; an arc from $A$ connecting the points $u_i$ and $u_j$ is denoted by $(u_i, u_j)$. $W$ denotes the weights of points and arcs, i.e., $a_P(u_i)$ (or $a(u_i)$) equals the area of the corresponding facet $F_i$, and $l_P(u_i, u_j)$ (or simply $l(u_i, u_j)$) equals the length of the edge between the facets $F_i$ and $F_j$ of the polyhedron $P$.

In the two-dimensional case, i.e., for convex polygons, the slope diagram can also be considered as a function $M(P, u)$ defined on the unit circle $S^1$. Given a polygon $P \subset \mathbb{R}^2$, denote by $l_i$ the length of edge $i$ and by $u_i$ the unit vector orthogonal to this edge. Then

$$M(P, u) = \begin{cases} l_i & \text{if } u = u_i, \\ 0 & \text{otherwise}. \end{cases}$$

This representation is also called a perimetric measure representation in Ref. [4]. As follows from Eq. (2), the Minkowski sum of two convex polygons can be computed by merging their respective slope diagrams. Mathematically, this amounts to the following relation [17,19]:

$$M(P \oplus Q, u) = M(P, u) + M(Q, u), \quad P, Q \in \mathcal{P}(\mathbb{R}^2),\ u \in S^1. \tag{3}$$

Let us denote by $\alpha_i = \angle u_i$ the angle between the positive x-axis and $u_i$. Then, given a slope diagram representation $M(P, u)$ of a convex polygon $P$, its area $S(P)$ can be computed as follows [4]:

$$S(P) = \sum_{i=1}^{n} l_i \sin\alpha_i \sum_{j=1}^{i} l_j \cos\alpha_j - \frac{1}{2}\sum_{i=1}^{n} l_i^2 \sin\alpha_i \cos\alpha_i. \tag{4}$$

Here $n$ is the number of vertices of the polygon $P$, and the edges are indexed in counter-clockwise order, i.e., by increasing $\alpha_i$. Now we have all the necessary tools to find the Minkowski sum of two convex polyhedra $P$ and $Q$ by merging their slope diagram representations. The following three cases need special attention:
(1) a spherical arc of one polyhedron intersects a spherical arc of the other;
(2) a spherical point of one polyhedron lies on a spherical arc of the other;
(3) two spherical points coincide.
Fig. 2. Minkowski addition of two convex polyhedra P and Q with intersecting spherical arcs.

Let us consider these cases in more detail.

Case 1: Let two spherical arcs $(u, u')$ and $(v, v')$ intersect at the point $w \in S^2$ (see Fig. 2(c)). The point $w$ represents a facet of $P \oplus Q$. This point is adjacent to $u, u', v, v'$, and the weights of the corresponding spherical arcs are computed as follows (see Fig. 2(d)–(f) for illustration):

$$l_{P\oplus Q}(w, u) = l_{P\oplus Q}(w, u') = l_P(u, u'), \qquad l_{P\oplus Q}(w, v) = l_{P\oplus Q}(w, v') = l_Q(v, v').$$

Indeed, the edges corresponding to the arcs $(u, u')$ and $(v, v')$ become the edges of a facet (a parallelogram) of $P \oplus Q$ corresponding to $w$. The vectors $u'' = u \times u' / |u \times u'|$ and $v'' = v \times v' / |v \times v'|$ are parallel to the corresponding edges of the polyhedra $P$ and $Q$ represented by the arcs $(u, u')$ and $(v, v')$, respectively. Since the directions and lengths of all edges of the facet corresponding to $w$ are known, the area of this facet can be found by Eq. (4).

Case 3: Let us now consider an example of case 3. Denote the coinciding spherical points of $P$ and $Q$ by $s$ (see Fig. 3(a)–(c)). Suppose also that $s$ is adjacent to the spherical points $u_1, u_2, u_3$ of $P$ and $v_1, v_2, v_3, v_4$ of $Q$. The point $s$ represents a facet of the polyhedron $P \oplus Q$. The arcs $(s, u_2)$, $(s, u_3)$, $(s, v_2)$ and $(s, v_4)$ are assumed to belong to different great circles. Therefore, the SDR of $P \oplus Q$ contains the arcs $(s, u_2)$, $(s, u_3)$, $(s, v_2)$ and $(s, v_4)$, with lengths determined by the SDR of $P$ and $Q$, respectively, because the edges corresponding to these spherical arcs become edges of the facet of $P \oplus Q$ corresponding to $s$. The arcs $(s, u_1)$, $(s, v_1)$ and $(s, v_3)$ are assumed to belong to the same great circle, such that the arcs $(s, u_1)$ and $(s, v_1)$ have the same direction and the arc $(s, u_1)$ is shorter than $(s, v_1)$. Therefore, in $P \oplus Q$ the spherical point $s$ is adjacent to $u_1$ and $v_3$, with $l_{P\oplus Q}(s, u_1) = l_P(s, u_1) + l_Q(s, v_1)$ and $l_{P\oplus Q}(s, v_3) = l_Q(s, v_3)$. That is, the edges $e_1$, $e_2$ corresponding to the arcs $(s, u_1)$ and $(s, v_1)$ on the same great circle are parallel, and the length of the corresponding edge of $P \oplus Q$ equals the sum of the lengths of $e_1$ and $e_2$. This rule for changing weights is illustrated in Fig. 3(f). As in case 1, the area of the facet of $P \oplus Q$ corresponding to the spherical point $s$ can be computed by Eq. (4).

Case 2: This is similar to case 3. Suppose that a spherical point $u$ lies on a spherical arc $(v, v_1)$. Let us introduce a new spherical point $v'$ on the arc $(v, v_1)$ at the same position as $u$ with weight zero, i.e., corresponding to a rectangular facet of zero area. This brings us back to case 3.
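In the cases above, facet areas are obtained from edge directions and lengths with Eq. (4), and in the planar setting Eq. (3) says that Minkowski addition is just a merge of slope diagrams. The sketch below implements both for convex polygons. It assumes a slope diagram is a list of `(alpha, length)` pairs with `alpha` the outward normal angle in $[0, 2\pi)$, and that angles equal after rounding to `ndigits` decimals denote parallel edges; the function names are hypothetical.

```python
import math


def merge_slope_diagrams(p, q, ndigits=9):
    """Slope diagram of P (+) Q following Eq. (3): lengths at equal normal angles add."""
    merged = {}
    for alpha, length in list(p) + list(q):
        key = round(alpha, ndigits)  # tolerance for numerically equal normals
        if key in merged:
            merged[key][1] += length
        else:
            merged[key] = [alpha, length]
    return sorted((a, l) for a, l in merged.values())


def area_from_slope_diagram(sd):
    """Polygon area via Eq. (4); sd must be sorted by increasing alpha."""
    area = 0.0
    cos_sum = 0.0  # running sum of l_j cos(alpha_j) for j <= i
    for alpha, l in sd:
        cos_sum += l * math.cos(alpha)
        area += l * math.sin(alpha) * cos_sum
        area -= 0.5 * l * l * math.sin(alpha) * math.cos(alpha)
    return area


def regular_polygon(n, side):
    """Slope diagram of a regular n-gon: n equally spaced normals, equal edge lengths."""
    return [(2 * math.pi * k / n, side) for k in range(n)]


square = regular_polygon(4, 1.0)
triangle = regular_polygon(3, 1.0)
print(area_from_slope_diagram(square))      # approximately 1
print(area_from_slope_diagram(triangle))    # approximately sqrt(3)/4
print(area_from_slope_diagram(merge_slope_diagrams(square, square)))  # approximately 4
```

For the unit square plus the unit equilateral triangle, the merged diagram gives $2 + 3\sqrt{3}/4$, which agrees with $S(P) + S(Q) + 2V(P, Q)$ computed from the support function of the square.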
Fig. 3. Minkowski addition of two convex polyhedra P and Q with coinciding spherical points.

2.3. Transformation groups

Consider subgroups of the group $G'$ of affine transformations on $\mathbb{R}^3$. If $g \in G'$ and $A \in \mathcal{K}$, then $g(A) = \{g(a) \mid a \in A\}$. We write $g \equiv g'$ if $g(A) \equiv g'(A)$ for every $A \in \mathcal{K}$. This is equivalent to saying that $g^{-1}g'$ is a translation. We denote by $G$ the subgroup of $G'$ containing all linear transformations, i.e., transformations $g$ with $g(0) = 0$. The following result is obvious.

Lemma 2.3. For any two sets $A, B \subset \mathbb{R}^3$ and for every $g \in G$,

$$g(A \oplus B) = g(A) \oplus g(B). \tag{5}$$

We introduce the following notation for subsets of $G$:
M: multiplications with respect to the origin by a positive factor;
R: rotations about an axis (passing through the origin);
E: (plane) reflections (planes passing through the origin);
I: isometries (distance-preserving transformations);
S: similitudes (rotations, reflections, multiplications).

Observe that I, R, M and S are subgroups of G [20]. For every transformation $g \in G$ one can compute its determinant $\det g$, which is the determinant of the matrix corresponding to $g$. If $g$ is an isometry, then $|\det g| = 1$; the converse is not true, however. If $H$ is a subgroup of $G$, then $H_+$ denotes the subgroup of $H$ containing all transformations with positive determinant. For example, $I_+ = R$, and $S_+$ comprises all multiplications and rotations. If $H$ is a subgroup of $G$, then the set $\{mh \mid h \in H,\ m \in M\}$ is also a subgroup, which will be denoted by $MH$. Rotations in $\mathbb{R}^3$ are denoted as follows: when $l$ is an axis (i.e., a directed line) passing through the coordinate origin, $r_{l,\alpha}$ means a rotation about $l$ through the angle $\alpha$ in the counter-clockwise direction.

At several places in this paper, the following concept will be needed.

Definition 2.4. Let $H \subseteq G$ and $\mathcal{J} \subseteq \mathcal{K}$. We say that $H$ is $\mathcal{J}$-compact if, for every $A \in \mathcal{J}$ and every sequence $\{h_n\}$ in $H$, the sequence $\{h_n(A)\}$ has a limit point of the form $h(A)$, where $h \in H$.

It is easy to verify that R is compact. However, the subcollection $\{r^m \mid m \in \mathbb{Z}\}$, where $r = r_{l,\alpha} \in R$ is a rotation with $\alpha/\pi$ irrational, is not $\mathcal{K}$-compact for a fixed axis $l$. The following result is easy to prove.

Lemma 2.5. Assume that $H$ is $\mathcal{J}$-compact and let $f: \mathcal{J} \to \mathbb{R}$ be a continuous function. If $A \in \mathcal{J}$ and $f_0 := \sup_{h \in H} f(h(A))$ is finite, then there exists an element $h_0 \in H$ such that $f(h_0(A)) = f_0$.

3. Mixed volumes
… $j_1, \ldots, j_m$. That is,

$$(f \ominus g)(x) = \bigwedge_{u \in \mathbb{R}^2} [f(u) - g(u - x)]. \tag{2}$$

Note that, for structuring functions with $g(0) = 0$, the dilation operator is extensive, $(f \oplus g)(x) \geq f(x)$, and the erosion operator is anti-extensive, $(f \ominus g)(x) \leq f(x)$. Fig. 2(a) gives a one-dimensional example of the dilation and erosion image operators.
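The two operators can be sketched on a finite one-dimensional grid, which is enough to see the extensive and anti-extensive behavior noted above. This is a minimal sketch under stated assumptions: the structuring function is a dict from integer offsets to values (offsets missing from it count as $-\infty$), samples outside the signal are ignored, and the 1D grid stands in for the paper's $\mathbb{R}^2$; the names `dilate` and `erode` are illustrative.

```python
def dilate(f, g):
    """Gray-scale dilation (f (+) g)(x) = max_u [f(u) + g(x - u)] on a 1D grid."""
    n = len(f)
    return [max(f[x - d] + v for d, v in g.items() if 0 <= x - d < n)
            for x in range(n)]


def erode(f, g):
    """Gray-scale erosion (f (-) g)(x) = min_u [f(u) - g(u - x)] on a 1D grid."""
    n = len(f)
    return [min(f[x + d] - v for d, v in g.items() if 0 <= x + d < n)
            for x in range(n)]


f = [0, 3, 1, 4, 2]
flat = {-1: 0, 0: 0, 1: 0}               # sampled flat structuring function, radius 1
parabolic = {-1: -0.5, 0: 0.0, 1: -0.5}  # sampled -x^2 / (2 rho) with rho = 1
for name, g in (("flat", flat), ("parabolic", parabolic)):
    d, e = dilate(f, g), erode(f, g)
    # Because g(0) = 0, erosion <= f <= dilation at every sample.
    assert all(lo <= v <= hi for lo, v, hi in zip(e, f, d))
    print(name, d, e)
```

With the flat structuring function the two operators reduce to the local maximum and minimum filters over a three-sample window.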
J.G.M. Schavemaker et al. / Pattern Recognition 33 (2000) 997–1012

3. Structuring functions

In the remainder of this paper, we focus on the following two structuring functions. Firstly, the flat structuring function, as used by Kramer et al. in their original definition of the image-sharpening operator:

$$c_\rho(x) = \begin{cases} 0, & x^T x \leq \rho^2, \\ -\infty, & x^T x > \rho^2. \end{cases} \tag{3}$$

Secondly, the parabolic structuring function:

$$q^A_\rho(x) = -\frac{1}{2\rho}\, x^T A^{-1} x, \tag{4}$$

where $A$ is a $2 \times 2$ positive-definite symmetric matrix. Taking the unity matrix for $A$ yields the symmetric two-dimensional parabolic structuring function

$$q_\rho(x) = -\frac{1}{2\rho}\, x^T x. \tag{5}$$

For both types of structuring function, the parameter $\rho$ determines the size (or scale) of the structuring function. Fig. 1 gives a two-dimensional example of a flat and a parabolic structuring function.

Fig. 1. Parabolic and flat structuring function: (a) parabolic structuring function $q_\rho(x) = -(1/2\rho)x^T x$, and (b) flat structuring function $c_\rho(x)$.

In practical applications of mathematical morphology, the use of a flat structuring function is widespread [1,13,14]. Its popularity stems from the fact that gray-scale dilation and erosion with a flat structuring function are easy to implement (local maximum and minimum filters) and result in fast algorithms. Van den Boomgaard et al. [15] prove for the class of quadratic structuring functions that:
• Any quadratic structuring function is dimensionally decomposable with respect to dilation:

$$\forall A\ \exists R, a, b:\quad (f \oplus q^A_\rho)(x) = \left( \left( f \oplus q_\rho^{R^T \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} R} \right) \oplus q_\rho^{R^T \begin{pmatrix} 0 & 0 \\ 0 & b \end{pmatrix} R} \right)(x). \tag{6}$$

• The general class of quadratic structuring functions contains the subclass of rotationally symmetric structuring functions $q_\rho(x) = -(1/2\rho)x^T x$, i.e., those structuring functions for which

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{7}$$

These properties allow for very efficient algorithms for implementing the parabolic dilation operator, which have been shown to be independent of the size of the structuring function [15]. Given these scalable structuring functions $g_\rho(x)$, we redefine the dilation and erosion operators:

$$F^{\oplus}(x, \rho) = (f \oplus g_\rho)(x), \tag{8}$$
$$F^{\ominus}(x, \rho) = (f \ominus g_\rho)(x), \tag{9}$$
$$F(x, 0) = F^{\oplus}(x, 0) = F^{\ominus}(x, 0) = f(x). \tag{10}$$

The functions $F^{\oplus}(x, \rho)$ and $F^{\ominus}(x, \rho)$ are the morphological scale-space notations for the gray-scale dilation and gray-scale erosion image operators with structuring function $g_\rho(x)$ [12]. The parameter $\rho$ is the scale parameter of the morphological scale space, $f(x)$ is the original function, and $x$ is the position. The operators $(f \oplus g)(x)$ and $(f \ominus g)(x)$ are the dilation and erosion operators as defined in Eqs. (1) and (2). Note that one can derive a scalable structuring function from any concave structuring function $g(x)$ by means of umbral scaling:

$$g_\rho(x) = \rho\, g\!\left(\frac{x}{\rho}\right). \tag{11}$$

For example, the scalable parabolic structuring function $q_\rho(x)$ can be derived from the structuring function $q(x) = -\tfrac{1}{2}x^T x$:

$$q_\rho(x) = \rho\, q\!\left(\frac{x}{\rho}\right) = -\frac{\rho}{2}\,\frac{x^T x}{\rho^2} = -\frac{1}{2\rho}\, x^T x. \tag{12}$$

The scalable flat structuring function $c_\rho(x)$ can be derived from the structuring function

$$c(x) = \begin{cases} 0, & x^T x \leq 1, \\ -\infty, & x^T x > 1 \end{cases} \tag{13}$$

in a similar manner:

$$c_\rho(x) = \rho\, c\!\left(\frac{x}{\rho}\right) = \begin{cases} 0, & x^T x / \rho^2 \leq 1, \\ -\infty, & x^T x / \rho^2 > 1 \end{cases} = \begin{cases} 0, & x^T x \leq \rho^2, \\ -\infty, & x^T x > \rho^2. \end{cases} \tag{14}$$

4. Image-sharpening operator class

In this section we give a definition of the image-sharpening operator class in terms of gray-scale dilation and gray-scale erosion image operators; it is an extension of Kramer's original definition. Furthermore, we prove that iterative applications of the sharpening operator using a concave structuring function have sharpening properties.

4.1. Image-sharpening operator class definition

First, we rephrase the original transformation defined by Kramer et al. in the framework of mathematical morphology:

$$E[f](x, \rho) = \begin{cases} F^{\oplus}(x, \rho), & F^{\oplus}(x, \rho) - F(x, 0) < F(x, 0) - F^{\ominus}(x, \rho), \\ F^{\ominus}(x, \rho), & F^{\oplus}(x, \rho) - F(x, 0) > F(x, 0) - F^{\ominus}(x, \rho), \\ F(x, 0), & \text{otherwise}. \end{cases} \tag{15}$$
This image-operator class is parameterized by the structuring function $g_\rho(x)$. If we take the flat structuring function $c_\rho(x)$ as $g_\rho(x)$, then the image-sharpening operator equals the original definition of Kramer et al. with one modification: Kramer et al. did not consider the special case where $F^{\oplus}(x, \rho) - F(x, 0)$ equals $F(x, 0) - F^{\ominus}(x, \rho)$. In that case, the image-sharpening operator as defined by Kramer et al. behaves as the dilation operator ($F^{\oplus}(x, \rho)$). For a single-slope signal ($\forall x: \nabla^2 f = 0$), applying the operator as defined by Kramer et al. results in a translation of the original signal, whereas the new definition preserves the original signal. Fig. 2(b) gives a one-dimensional example of an application of the sharpening operator, and Fig. 3 gives a two-dimensional example of an application of the sharpening operator to a scanned part of a utility map.

Note that Eq. (15) defines one application of the sharpening operator. In practical applications, however, it is generally beneficial to use multiple iterative applications of the operator with a small structuring function. The size of the structuring function determines the sharpening speed. With a large structuring function, fewer iterative applications are required than with a small one, but small details in the resulting image are lost; see, for example, Fig. 4. In general, this choice is a trade-off between speed and accuracy. Repeated application of the sharpening operator for a fixed scale of the structuring function sharpens the image until a fixed point is reached, i.e., until applying the operator no longer alters the image. In the following, we show that this repeated sharpening behavior is controlled by a partial differential equation (PDE). By examining this PDE, optimal structuring functions can be derived.

Fig. 2. (a) One-dimensional example of the gray-scale dilation image operator ⊕ and the gray-scale erosion image operator ⊖. (b) The image-sharpening operator E[f] takes values from f ⊕ g or f ⊖ g.

Fig. 3. One application of the image-sharpening operator E[f]. Shown are the original image f, the dilation f ⊕ g, the erosion f ⊖ g, and the sharpening result E[f].

Fig. 4. Choice of the size of a structuring function, i.e. speed versus accuracy. (a) Part of a scanned utility map (400 dpi). (b) Result of the image-sharpening operator with a 3×3 flat structuring function: three iterative applications were necessary to achieve a sharp image. (c) Result after one iteration of the image-sharpening operator with a 7×7 flat structuring function: the decimal dot is merged with the digit.

4.2. Laplacian properties for one-dimensional continuous functions

In this section it is shown that the sign of the Laplacian of the function $f(x)$ at position $x$ determines whether $f(x)$ is dilated or eroded by the sharpening operator. With this important property of the sharpening operator, the PDE will be derived. For the sake of explanation, it is shown for one-dimensional continuous functions only, but it can easily be generalized to two-dimensional functions, as shown in Section 4.3. In the remainder of this paper a concave function is defined as follows: a function $g(x)$, $g: x \in \mathbb{R} \mapsto g(x) \in \mathbb{R}$, is concave if for all $x_0, x_1 \in \mathbb{R}$ the line connecting the points $(x_0, g(x_0))$ and $(x_1, g(x_1))$ lies below the function $g(x)$. A function is convex when it is not concave. For any one-dimensional symmetric ($g_\rho(x) = g_\rho(-x)$) concave structuring function $g_\rho(x)$, $g_\rho: x \in \mathbb{R} \mapsto g_\rho(x) \in \mathbb{R}$, and $\rho \in \mathbb{R}^+$, the following properties of the image-sharpening operator class hold:

$$E[f](x, \rho) > f(x) \quad \text{if } \nabla^2 f(x) < 0, \tag{16}$$
$$E[f](x, \rho) < f(x) \quad \text{if } \nabla^2 f(x) > 0, \tag{17}$$
$$E[f](x, \rho) = f(x) \quad \text{if } \nabla^2 f(x) = 0, \tag{18}$$
where $\nabla^2 f(x)$ is the Laplacian of the function $f(x)$. Before proving properties (16)–(18), we introduce the slope transform $S[f](w)$. For symmetric concave structuring functions $g_\rho(x)$, the derivative $\nabla g_\rho(x)$ is a decreasing function of $x$. This implies that the intercept of the tangent line of $g_\rho(x)$ at $x$ with the functional axis (y-axis) increases for increasing $x > 0$ as well as for decreasing $x < 0$, as depicted in Fig. 5(a). The intercept function is known as the slope transform $S[g_\rho](w)$ of the function $g_\rho(x)$; see Fig. 5(a). The slope transform was introduced by Dorst and Van den Boomgaard [16,17] and is closely related to the A-transform introduced by Maragos [18]. The slope transform is a function of the slope $w$ and is, in general, set-valued. Because our structuring functions are concave, their slope transform is single-valued.

Fig. 5. (a) Intercept with the functional axis (slope transform of $g_\rho$ at $\nabla g_\rho$). (b) Hit property of the gray-scale dilation and gray-scale erosion image operators.

To prove Eq. (16), we consider a function $f(x)$ with $\nabla^2 f(x) < 0$ at a certain position $x_1$. To determine the dilation and erosion values of $f(x)$ at $x_1$, we use the hit property of the dilation image operator and the hit property of the erosion image operator. The hit property of the dilation image operator can be explained with Fig. 5(b). To determine the dilation value $d_0$ at $x_1$, i.e. $(f \oplus g_\rho)(x_1)$, we place the inverse structuring function $-g_\rho(x)$ above the function $f(x)$ in such a way that the origin of $-g_\rho(x)$ is located at $(x_1, +\infty)$. We then shift the origin downwards (i.e., translate it along the functional axis) until the structuring function hits $f(x)$, say at $(x_a, f(x_a))$. The function value of the thus translated origin sets the new dilation value of $f(x)$ at $x_1$. Now assume that we take the interval $[x_0, x_1]$ so small that we may linearly approximate $f(x)$ within this interval. Note that because $\nabla^2 f < 0$, $\nabla f(x_a) > \nabla f(x_b)$. Let us further assume that we have taken the scale of the structuring function sufficiently small ($\rho \to 0$) that the hit position lies within the above interval, i.e. $x_0 < x_a < x_1$. Then the gradient of the structuring function equals the gradient of $f(x)$ at $x = x_a$:

$$\nabla f(x_a) = \nabla\big(-g_\rho(x_1 - x_a)\big) = \nabla g_\rho(x_a - x_1). \tag{19}$$
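For a differentiable concave structuring function, the slope transform used in this proof can be read off as the tangent intercept $g(x) - w\,x$ at the point where $g'(x) = w$. The sketch below computes that intercept numerically for the 1D parabolic structuring function $q_\rho(x) = -x^2/(2\rho)$ and shows that it grows as $|x|$ grows, as stated above; a short calculation gives the closed form $S[q_\rho](w) = \rho w^2/2$ for this particular case. The finite-difference step `h` and the function names are assumptions of this sketch.

```python
def parabolic(rho):
    """1D parabolic structuring function q_rho(x) = -x**2 / (2 rho)."""
    return lambda x: -x * x / (2.0 * rho)


def tangent_intercept(g, x, h=1e-5):
    """(slope, intercept) of the tangent of g at x, using a central difference for g'."""
    w = (g(x + h) - g(x - h)) / (2.0 * h)
    return w, g(x) - w * x


rho = 2.0
q = parabolic(rho)
for x in (-3.0, -1.0, 0.5, 1.0, 2.0, 3.0):
    w, b = tangent_intercept(q, x)
    # The intercept depends only on |x| and matches rho * w**2 / 2 for this parabola.
    print(f"x={x:+.1f}  slope={w:+.4f}  intercept={b:.4f}  rho*w^2/2={rho * w * w / 2:.4f}")
```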
The dilation value $d_0$ at $x_1$ can now simply be derived from the slope transform, i.e. $d_0 = S[g_\rho](\nabla f(x_a))$. The same applies to the hit property of the erosion image operator. In that case, we use the structuring function $g_\rho(x)$, place its origin below the function $f(x)$ at $(x_1, -\infty)$, and shift it upwards. Consequently, the erosion value $d_1$ at $x_1$ equals $d_1 = S[g_\rho](\nabla f(x_b))$. Because $\nabla^2 f(x_1) < 0$, we have

$$\nabla f(x_a) > \nabla f(x_b) \tag{20}$$

and

$$S[g_\rho](\nabla f(x_a)) < S[g_\rho](\nabla f(x_b)), \tag{21}$$

from which we derive that $d_0 < d_1$. This sets the image-sharpening operator value at $x_1$ to the dilation value $F^{\oplus}(x_1, \rho)$ and proves property (16). Property (17) is proved by the duality of the erosion image operator. Property (18) follows from the fact that when $\nabla^2 f(x) = 0$, the sharpening operator equals $F(x, 0) = f(x)$.

4.3. Laplacian properties for two-dimensional continuous functions

Generalization of Eqs. (16) and (17) to the two-dimensional case is not trivial, because the Laplacian is defined as

$$\nabla^2 f(x) = \frac{\partial^2 f}{\partial x_0^2} + \frac{\partial^2 f}{\partial x_1^2}. \tag{22}$$
Now there are two cases for which $\nabla^2 f(x) < 0$:
Case 1: $\partial^2 f/\partial x_0^2 < 0$ and $\partial^2 f/\partial x_1^2 < 0$.
Case 2: $\partial^2 f/\partial x_0^2 < 0$, $\partial^2 f/\partial x_1^2 > 0$ and $|\partial^2 f/\partial x_0^2| > |\partial^2 f/\partial x_1^2|$ (and vice versa for $x_0$ and $x_1$). (23)
and two cases for which $\nabla^2 f(x) > 0$:
Case 3: $\partial^2 f/\partial x_0^2 > 0$ and $\partial^2 f/\partial x_1^2 > 0$.
Case 4: $\partial^2 f/\partial x_0^2 > 0$, $\partial^2 f/\partial x_1^2 < 0$ and $|\partial^2 f/\partial x_0^2| > |\partial^2 f/\partial x_1^2|$ (and vice versa for $x_0$ and $x_1$). (24)

For case 1, $f(x)$ is concave at position $x$. When we use a rotationally symmetric concave structuring function $g_\rho(x)$ for the sharpening operator, the dilation value at position $x$ is closer to the original gray value than the erosion value. Therefore, property (16) still holds for two-dimensional continuous functions; its proof is analogous to the proof for one-dimensional continuous functions. For case 3, $f(x)$ is convex. When we again use a rotationally symmetric concave structuring function $g_\rho(x)$ for the sharpening operator, the erosion value at position $x$ is closer to the original gray value than the dilation value. As a consequence, property (17) also holds for two-dimensional continuous functions; its proof is analogous to the proof for one-dimensional continuous functions. For cases 2 and 4, $\partial^2 f/\partial x_0^2$ and $\partial^2 f/\partial x_1^2$ do not have the same sign, hence $f(x)$ is neither convex nor concave at $x$. In these cases it also depends on the values of $\partial^2 f/\partial x_0^2$ and $\partial^2 f/\partial x_1^2$ whether the dilation value or the erosion value is chosen as the result of applying the sharpening operator. The final choice is made on the basis of the partial second-order derivative of largest magnitude, which gives the lowest slope transform value in $x_0$ or $x_1$.

4.4. Sharpening properties for a one-dimensional edge model

To demonstrate the sharpening properties of the image-sharpening operator, we construct an analytical edge model. Let the original edge be represented by the function $i(x)$, $i: x \in \mathbb{R} \mapsto i(x) \in \mathbb{R}$. Suppose further that the recording of the original edge can be modeled by a linear system. Hence, the recorded edge function $f(x)$, $f: x \in \mathbb{R} \mapsto f(x) \in \mathbb{R}$, is given by

$$f(x) = i(x) * h(x), \tag{25}$$

where $*$ is the convolution operator and $h(x)$ is the point spread function (PSF), $h: x \in \mathbb{R} \mapsto h(x) \in \mathbb{R}$. The function $f(x)$ is depicted in Fig. 6(c). Let us assume a symmetrical lens, i.e.

$$h(x) = h(-x), \tag{26}$$

with finite aperture, i.e.

$$h(x) = 0 \quad \text{for } x < -a \text{ and } x > a, \text{ for some } a \in \mathbb{N}. \tag{27}$$

Moreover, $h(x) \geq 0$, and $h(x)$ decreases as $|x|$ increases. The point spread function $h(x)$ is depicted in Fig. 6(b). Suppose the original edge is an ideal step edge, that is, $i(x)$ is the unit step function

$$i(x) = \begin{cases} 1, & x \leq 0, \\ 0, & x > 0. \end{cases} \tag{28}$$

The function $i(x)$ is shown in Fig. 6(a). After convolution with $h(x)$, note that $\nabla f(x) < 0$ for $x \in [-a, a]$ and, as $h(x)$ is decreasing and symmetric, that $\nabla^2 f(x) < 0$ for $x \in [-a, 0)$, $\nabla^2 f(x) > 0$ for $x \in (0, a]$, and $\nabla^2 f(x) = 0$ for $x = 0$. The image-sharpening operator $E[f]$ only changes the function $f(x)$ at points $x$ that have $\nabla^2 f(x) \neq 0$.
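The edge-model argument can be checked on sampled data. The sketch below blurs a step with a small triangular PSF (aperture $a = 2$), takes the second difference as a discrete Laplacian, and applies Eq. (15) once with a flat window of radius 1. Two choices are assumptions of this sketch, not of the paper: the step is sampled as 1/2 at $x = 0$ so the discrete profile is antisymmetric, and samples outside the signal are ignored. The output shows values moving up where the Laplacian is negative, down where it is positive, and staying put at $x = 0$; the `kramer_tie` flag reproduces the tie rule of Kramer et al. for comparison.

```python
def blurred_step(psf, pad=2):
    """Samples of f = i * h (Eq. (25)) for a step edge; psf lists h(-a), ..., h(a)."""
    a = len(psf) // 2

    def step(t):
        # Assumed discretization: 1 for t < 0, 1/2 at t = 0, 0 for t > 0.
        return 1.0 if t < 0 else (0.5 if t == 0 else 0.0)

    xs = list(range(-(a + pad), a + pad + 1))
    f = [sum(step(x - k) * h for k, h in zip(range(-a, a + 1), psf)) for x in xs]
    return xs, f


def second_difference(f):
    """Discrete Laplacian f[i-1] - 2 f[i] + f[i+1] at the interior samples."""
    return [f[i - 1] - 2 * f[i] + f[i + 1] for i in range(1, len(f) - 1)]


def sharpen_once(f, radius=1, kramer_tie=False):
    """One application of Eq. (15) with a flat window; ties keep f unless kramer_tie."""
    out = []
    for i, v in enumerate(f):
        window = f[max(0, i - radius): i + radius + 1]
        dil, ero = max(window), min(window)
        if dil - v < v - ero:
            out.append(dil)
        elif dil - v > v - ero:
            out.append(ero)
        else:
            out.append(dil if kramer_tie else v)
    return out


xs, f = blurred_step([1, 2, 3, 2, 1])  # triangular PSF with aperture a = 2
lap = second_difference(f)
g = sharpen_once(f)
for x, before, after, d2 in zip(xs[1:-1], f[1:-1], g[1:-1], lap):
    print(f"x={x:+d}  f={before:4.1f}  E[f]={after:4.1f}  laplacian={d2:+.1f}")
```

On a pure ramp such as `[4, 3, 2, 1, 0]` the tie rule of Eq. (15) leaves the signal unchanged, while `kramer_tie=True` shifts its interior by one sample, which is the translation effect described in Section 4.1.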
Fig. 6. (a) Original picture i(x). (b) Point spread function h(x). (c) Blurred version f(x) of the original picture i(x). (d) One iteration of the image-sharpening operator.

Consider the interval $[-a, 0)$, in which points $x$ have $\nabla^2 f(x) < 0$, as depicted in Fig. 6(c). This interval represents a concave part of the function $f(x)$. One application of the image-sharpening operator with a concave structuring function $g_\rho(x)$ results in $F^{\oplus}(x, \rho)$ on this interval, as shown in Fig. 6(d). As $F^{\oplus}(x, \rho) > f(x)$ on this interval (Eq. (16)), the interval on which points $x$ have $\nabla^2 F^{\oplus}(x, \rho) < 0$, i.e. $[-b, 0)$, becomes smaller than the original interval $[-a, 0)$ on which $\nabla^2 f(x) < 0$. Furthermore, because we use a concave structuring function $g_\rho(x)$, $F^{\oplus}(x, \rho)$ is also concave on the interval $[-a, 0)$ (proved in Ref. [16]). As a consequence, repeated applications of the image-sharpening operator on the interval $[-a, 0)$ result in an interval $[-a, -c)$, with $0 < c < b$, on which all points $x$ have function values equal to the maximum function value in the interval $[-a, 0)$, i.e. $f(-a)$. The same holds for the convex interval $(0, a]$, with points $x$ having $\nabla^2 f(x) > 0$. In this case, repeated applications of the image-sharpening operator result in an interval $(c, a]$, with $0 < c < b$, on which all points $x$ have function values equal to the minimum function value in the interval $(0, a]$, i.e. $f(a)$. After a finite number of applications of the image-sharpening operator, the blurred function $f(x)$ is sharpened to the original picture $i(x)$. The exact number of required sharpening applications is derived in Section 4.6.

4.5. Sharpening properties for a two-dimensional edge model

If the structuring function $g_\rho(x)$ is rotationally symmetric and concave, the sharpening properties of the sharpening operator also hold for edges with arbitrary orientations in two-dimensional images. For two-dimensional oriented edges, the sharpening is perpendicular to the direction of the edge and can be considered a collection of one-dimensional sharpening applications, as discussed in Section 4.4. Fig. 7(a) gives a visual example. This property can be derived using the hit property of the dilation and erosion operators and the fact that the structuring function is rotationally symmetric (see Figs. 7(b) and (c)). The isophotes of Fig. 7(a) run parallel to the edge. As such, when we sharpen the edge, the structuring function hits the edge at a point for which the line through that point and the origin of the structuring function is perpendicular to the direction of the edge (see Fig. 7(b)). This implies that sharpening a point $x$ on the edge only requires function values of points on the line that runs through $x$ perpendicular to the edge. As a result, sharpening two-dimensional oriented edges can be considered a collection of one-dimensional sharpening applications, as shown in Fig. 7(c). Both the flat and the parabolic structuring functions are rotationally symmetric. However, discrete approximations of small flat structuring functions are not isotropic and can cause anisotropic sharpening behavior, as discussed in Section 8.1.

4.6. Number of iterative applications

In this section we derive the number of iterative applications of the sharpening operator needed to sharpen one edge. The number of applications depends on the PSF $h(x)$ and on the size of the structuring function. We use the one- and two-dimensional edge models of Sections 4.4 and 4.5. Furthermore, we distinguish between the use of a flat structuring function and a parabolic structuring function.
J.G.M. Schavemaker et al. / Pattern Recognition 33 (2000) 997–1012
Fig. 7. (a) Sharpening of a two-dimensional oriented edge can be considered a collection of one-dimensional sharpening applications perpendicular to the direction of the edge. (b) The hit property of the dilation and erosion operators and a rotationally symmetric structuring function imply that the sharpening is perpendicular to the direction of the edge. (c) Profile along vector n.
When we use a parabolic structuring function $q_\rho(x)$ with $\rho = \Delta\rho$, the determination of the number of applications of the sharpening operator needed to sharpen a one- or two-dimensional oriented edge can be visualized with Fig. 6(c). We have to transport the maximum at $(-a, f(-a))$ and the minimum at $(a, f(a))$ towards $x = 0$. This transport is done by consecutive applications of the sharpening operator. It can be proved that $j$ recursive applications of a parabolic dilation of the function $f(x)$ with structuring function $q_{\Delta\rho}(x)$ can be performed by one parabolic dilation (the same holds for erosion) [12]:
$$f \oplus q_{\Delta\rho} \oplus \cdots \oplus q_{\Delta\rho} = f \oplus q_{j\Delta\rho}. \tag{29}$$
As a result, we need to determine the parabola $q_{\rho_{\max}} = q_{j\Delta\rho}$ that sets the maximum value at $x = 0$. This amounts to finding $\rho_{\max}$ for which $q_{\rho_{\max}}(a) = 0$:
$$q_{\rho_{\max}}(a) = 0, \tag{30}$$
$$-\frac{a^2}{2\rho_{\max}} = 0. \tag{31}$$
This is only true for $\rho_{\max} \to \infty$. In the discrete domain, it amounts to finding $\rho_{\max}$ for which $q_{\rho_{\max}}(a)$ exceeds $-\tfrac{1}{2}$, i.e. lies within half a quantization level of zero:
$$q_{\rho_{\max}}(a) > -\tfrac{1}{2}, \tag{32}$$
$$-\frac{a^2}{2\rho_{\max}} > -\tfrac{1}{2}, \tag{33}$$
$$\rho_{\max} > a^2. \tag{34}$$
After determining $\rho_{\max}$, we can determine the number of necessary applications $j$ of the operator to sharpen an edge using a parabolic structuring function:
$$j = \frac{\rho_{\max}}{\Delta\rho}. \tag{35}$$
If $\Delta\rho = 1$, the number of applications equals $\rho_{\max}$. As the parabolic structuring function is rotationally symmetric, the number of applications is the same for one- and two-dimensional oriented edges.
When we use a flat structuring function $c_\rho(x)$ with $\rho = 1$, the number of applications of the sharpening operator needed to sharpen a one- or two-dimensional oriented edge equals $a$, the aperture of the point spread function $h(x)$. Consider the visualization of a one-dimensional edge in Fig. 6(c). With each application of the image-sharpening operator, the maximum at $(-a, f(-a))$ and the minimum at $(a, f(a))$ move exactly one pixel ($\rho = 1$) towards $x = 0$, because
$$f \oplus c_1 \oplus \cdots \oplus c_1 = f \oplus c_j. \tag{36}$$
For a two-dimensional oriented edge, the maximum and minimum gray values move towards the center of the edge along the direction perpendicular to the edge (see Fig. 7(b)). As a flat structuring function is rotationally symmetric, the number of necessary applications is the same for one- and two-dimensional oriented edges. If we enlarge the structuring function $c_\rho(x)$ by increasing $\rho$, the number of applications of the image-sharpening operator decreases to
$$j = \frac{a}{\rho}. \tag{37}$$
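Eqs. (34), (35) and (37) translate directly into iteration counts. The helpers below are a small sketch; the rounding to whole applications (the smallest integer $j$ with $j\,\Delta\rho > a^2$ in the parabolic case, $\lceil a/\rho \rceil$ in the flat case) is our assumption, since the text gives the ratios only.

```python
import math


def parabolic_iterations(a, delta_rho=1.0):
    """Eqs. (34)-(35): rho_max must exceed a**2, and j = rho_max / delta_rho.
    Assumption: use the smallest whole j with j * delta_rho > a**2."""
    return math.floor(a * a / delta_rho) + 1


def flat_iterations(a, rho=1):
    """Eq. (37): j = a / rho, rounded up when rho does not divide a (assumption)."""
    return math.ceil(a / rho)


for a in (3, 8):
    print(a, flat_iterations(a), parabolic_iterations(a))  # prints "3 3 10", then "8 8 65"
```

Under these formulas the count grows linearly with the aperture $a$ for the flat function but quadratically for the parabolic one (at fixed $\Delta\rho$), which is why enlarging $\Delta\rho$ matters much more in the parabolic case.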
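5. Image-sharpening operator: partial differential equation
This section introduces the partial differential equation (PDE) of the image-sharpening operator class. The PDE includes the partial derivative of the sharpening operator $E[f]$ with respect to $\rho$, the sign of the Laplacian of the function $f(x)$, and the slope transform of the structuring function $g_\rho(x)$. Given that $g(x)$ is a concave structuring function and $g_\rho(x) = \rho\, g(x/\rho)$ (umbral scaling), we have:
$$\frac{\partial F^{\oplus}}{\partial \rho} = \lim_{\Delta\rho \to 0} \frac{F^{\oplus}(x, \rho + \Delta\rho) - F^{\oplus}(x, \rho)}{\Delta\rho} \tag{38}$$
$$= \lim_{\Delta\rho \to 0} \frac{(f \oplus g_{\rho + \Delta\rho})(x) - F^{\oplus}(x, \rho)}{\Delta\rho} \tag{39}$$
$$\overset{1}{=} \lim_{\Delta\rho \to 0} \frac{((f \oplus g_{\rho}) \oplus g_{\Delta\rho})(x) - F^{\oplus}(x, \rho)}{\Delta\rho} \tag{40}$$
$$= \lim_{\Delta\rho \to 0} \frac{(F^{\oplus}(\cdot, \rho) \oplus g_{\Delta\rho})(x) - F^{\oplus}(x, \rho)}{\Delta\rho} \tag{41}$$
$$\overset{2}{=} \lim_{\Delta\rho \to 0} \frac{S[g_{\Delta\rho}](\nabla F^{\oplus})}{\Delta\rho} \tag{42}$$
$$\overset{3}{=} \lim_{\Delta\rho \to 0} \frac{\Delta\rho\, S[g](\nabla F^{\oplus})}{\Delta\rho} \tag{43}$$
$$= S[g](\nabla F^{\oplus}), \tag{44}$$
and, by duality of the dilation and erosion image operators,
$$\frac{\partial F^{\ominus}}{\partial \rho} = -S[g](\nabla F^{\ominus}). \tag{45}$$
Equality 1 uses the property that dilation of a function $f(x)$ by members of a family of umbral-scaled structuring functions $g_\rho(x)$ forms an additive semi-group in $\rho$ [16]:
$$((f \oplus g_{\rho_1}) \oplus g_{\rho_2})(x) = (f \oplus g_{\rho_1 + \rho_2})(x). \tag{46}$$
Equality 3 stems from the property of the slope transform under umbral scaling of the structuring function $g_{\Delta\rho}(x)$ [16]:
$$g_{\Delta\rho}(x) = \Delta\rho\, g\!\left(\frac{x}{\Delta\rho}\right) \;\overset{S}{\longleftrightarrow}\; \Delta\rho\, S[g](w). \tag{47}$$
Equality 2 uses the hit property of the dilation operator, as discussed in Section 4.2. For Equality 2 we want to calculate $(F^{\oplus}(\cdot, \rho) \oplus g_{\Delta\rho})(x) - F^{\oplus}(x, \rho)$ (see also Fig. 8(a)). With the slope transform of the function $g_{\Delta\rho}$, we can set
$$(F^{\oplus}(\cdot, \rho) \oplus g_{\Delta\rho})(x) - F^{\oplus}(x, \rho) = S[g_{\Delta\rho}](\nabla F^{\oplus}). \tag{48}$$
Given Eqs. (44) and (45), we can derive the following partial differential equation for iterations of the sharpening operator with a structuring function of width $\rho$:
$$\frac{\partial E[f]}{\partial \rho} = \begin{cases} \dfrac{\partial F^{\oplus}(x, \rho)}{\partial \rho} = S[g](\nabla F^{\oplus}), & F^{\oplus}(x, \rho) - F(x, 0) < F(x, 0) - F^{\ominus}(x, \rho),\\[2mm] \dfrac{\partial F^{\ominus}(x, \rho)}{\partial \rho} = -S[g](\nabla F^{\ominus}), & F^{\oplus}(x, \rho) - F(x, 0) > F(x, 0) - F^{\ominus}(x, \rho),\\[2mm] \dfrac{\partial F(x, 0)}{\partial \rho} = 0, & \text{otherwise}. \end{cases} \tag{49}$$
If we use properties (16) and (17), this results in
$$\frac{\partial E[f]}{\partial \rho} = -\operatorname{sign}[\nabla^2 f]\, S[g](\nabla f), \tag{50}$$
where
$$\operatorname{sign}[\nabla^2 f](x) = \begin{cases} 1, & \nabla^2 f(x) > 0,\\ -1, & \nabla^2 f(x) < 0,\\ 0, & \text{otherwise}. \end{cases} \tag{51}$$
In the remainder of this section, we derive the PDEs of the sharpening operator with a parabolic and with a flat structuring function. For parabolic structuring functions we have $g_\rho(x) = q_\rho(x) = -\frac{1}{2\rho}x^2$, $g(x) = -\frac{1}{2}x^2$ and $S[g](w) = \frac{1}{2}|w|^2$ (see Fig. 8(b)). The PDE for iterations of the sharpening operator with a parabolic structuring function then becomes
$$\frac{\partial E[f]}{\partial \rho} = -\operatorname{sign}[\nabla^2 f]\, \frac{1}{2}|\nabla f|^2. \tag{52}$$
Eq. (52) is, apart from the factor $-\frac{1}{2}\operatorname{sign}[\nabla^2 f]$, equal to the PDE of the morphological scale space [12], which is given by
$$\frac{\partial F}{\partial \rho} = |\nabla f|^2. \tag{53}$$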
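The slope transforms used above, $S[g](w) = \tfrac{1}{2}|w|^2$ for $g(x) = -\tfrac{1}{2}x^2$ and $S[c](w) = |w|$ for the flat function, together with the scaling property (47) and the relation (48), can be checked numerically. The sketch assumes the one-dimensional case and evaluates $S[g](w)$ as $\max_x (g(x) - w\,x)$ on a dense grid; this coincides with the stationary value for the concave functions used here. The grid bounds, step, and the test slope $w$ are arbitrary choices of ours.

```python
def grid(lo=-10.0, hi=10.0, step=1e-3):
    """Evenly spaced sample points on [lo, hi]."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


XS = grid()


def slope_transform(g, w, xs=XS):
    """Numerical S[g](w) = max_x (g(x) - w x); for concave g this is the stationary value."""
    return max(g(x) - w * x for x in xs)


def parabola(rho):
    """Umbral-scaled parabola g_rho(x) = rho * g(x / rho) with g(x) = -x**2 / 2."""
    return lambda x: -x * x / (2.0 * rho)


def flat(x):
    """Flat structuring function: 0 on [-1, 1] (tiny tolerance for grid rounding), -inf elsewhere."""
    return 0.0 if abs(x) <= 1.0 + 1e-9 else float("-inf")


def dilation_increase(w, rho, xs=XS):
    """(F ⊕ g_rho)(0) - F(0) for the linear function F(x) = w x, cf. Eq. (48)."""
    g = parabola(rho)
    return max(w * (0.0 - y) + g(y) for y in xs)  # F(0) = 0


w, d_rho = 0.7, 0.25
print(slope_transform(parabola(1.0), w), 0.5 * w * w)         # S[g](w) = |w|^2 / 2
print(slope_transform(flat, w), abs(w))                        # S[c](w) = |w|
print(slope_transform(parabola(d_rho), w), d_rho * 0.5 * w * w)  # scaling, Eq. (47)
print(dilation_increase(w, d_rho), d_rho * 0.5 * w * w)        # hit property, Eq. (48)
```

Each printed pair should agree up to the grid resolution. The last line illustrates why Eq. (48) holds locally: for a function that is linear near $x$, the increase caused by a small dilation depends only on the local slope.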
Fig. 8. (a) Calculation of $S[g_{\Delta\rho}](\nabla F^{\oplus})$ and $-S[g_{\Delta\rho}](\nabla F^{\oplus})$. (b) Slope transform of structuring function $g_{\Delta\rho}$. (c) Slope transform of structuring function $c_{\Delta\rho}$.
For flat structuring functions we have $g_\rho(x) = c_\rho(x)$ and $S[c](w) = |w|$ (see Fig. 8(c)). The partial differential equation for iterations of the sharpening operator with a flat structuring function then becomes
$$\frac{\partial E[f]}{\partial \rho} = -\operatorname{sign}[\nabla^2 f]\, |\nabla f|. \tag{54}$$
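To see how Eq. (54) relates to the morphological definition, the sketch below integrates its one-dimensional form, $\partial E/\partial\rho = -\operatorname{sign}[f_{xx}]\,|f_x|$, with an explicit upwind scheme. The discretization is our choice, not taken from the paper: for the dilation branch it uses the upwind gradient $\max(f_{i-1}, f_i, f_{i+1}) - f_i$, for the erosion branch $f_i - \min(f_{i-1}, f_i, f_{i+1})$, unit grid spacing, and replicated border samples. With step $\Delta\rho = 1$ one step therefore reduces to a flat dilation or erosion of half-width 1, selected by the sign of the second difference.

```python
def pde_step(u, d_rho):
    """One explicit upwind step of du/drho = -sign(u_xx) |u_x| on a 1-D grid with unit spacing."""
    ext = [u[0]] + list(u) + [u[-1]]  # replicate the border samples
    out = []
    for i in range(len(u)):
        left, mid, right = ext[i], ext[i + 1], ext[i + 2]
        u_xx = left - 2 * mid + right
        if u_xx < 0:    # concave: dilation branch, upwind |u_x| = max(neighborhood) - u
            out.append(mid + d_rho * (max(left, mid, right) - mid))
        elif u_xx > 0:  # convex: erosion branch, upwind |u_x| = u - min(neighborhood)
            out.append(mid - d_rho * (mid - min(left, mid, right)))
        else:           # sign[u_xx] = 0: no change
            out.append(mid)
    return out


def evolve(u, d_rho, steps):
    """Apply `steps` explicit steps of size d_rho."""
    for _ in range(steps):
        u = pde_step(u, d_rho)
    return u


blurred = [0, 0, 1, 3, 6, 10, 13, 15, 16, 16]   # hypothetical blurred edge
print(evolve(blurred, 1.0, 3))   # d_rho = 1: each step is a flat dilation or erosion
print(evolve(blurred, 0.5, 20))  # smaller steps: a gradual evolution toward the step edge
```

Each update moves a sample toward its largest or smallest neighbor by the fraction $\Delta\rho$, so for $\Delta\rho \le 1$ the values stay within the original range, and for $\Delta\rho \le 1/2$ the order of the samples along a monotone edge is preserved as well.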
Eq. (54) is similar to the PDE for two-dimensional shock filters for image enhancement, as obtained by Osher and Rudin [2]:
$$u_t = -\sqrt{u_x^2 + u_y^2}\; F(L(u)), \tag{55}$$
in which $L(u)$ is set to
$$L(u) = u_{xx}\, u_x^2 + 2\, u_{xy}\, u_x u_y + u_{yy}\, u_y^2, \tag{56}$$
and where $F$ is a Lipschitz continuous function which satisfies
$$F(0) = 0, \tag{57}$$
$$X(u)\, F(u) > 0, \quad u \neq 0, \tag{58}$$
where $X(u) = 1$ if $u > 0$, $X(u) = -1$ if $u < 0$, and $X(0) = 0$. Osher and Rudin use this $L(u)$ because it contains second directional derivatives, as opposed to the Laplacian ($u_{xx} + u_{yy}$), which is curvature insensitive [19].
In Section 8, we experiment with the use of both structuring functions for applications of the sharpening operator in a discrete domain. We show that for small values of $\rho$, iterations of the image-sharpening operator using the gray-scale dilation and erosion image operators are a numerical difference scheme to solve the partial differential equation of the image-sharpening operator. Additionally, we show that the stability of the numerical difference scheme depends on the choice of the structuring function, the type of quantization, and the minimum value of $\rho$ that can be set for a structuring function in the discrete domain.
6. Discrete sharpening operator
The above derivations of the sharpening operator apply to the continuous domain. In this section, we discuss its application in the discrete domain. In the previous sections we have shown that the sharpening operator can be defined either by the PDE (Eq. (49)) or by the mathematical morphological definition of Eq. (15). The PDE definition requires first- and second-order derivatives of the discrete image, which can be obtained by convolution with Gaussian derivatives [20]. Because this implies additional blurring of the discrete image as well as two extra convolutions, we focus on the (more efficient) mathematical morphological definition in this section. When using this definition we need to discretize the dilation and erosion operators. In Ref. [15], van den Boomgaard et al. present some very fast algorithms for the discrete dilation operator with a parabolic structuring function. These algorithms are independent of the size of the structuring function and have complexity $O(n)$, with $n$ the number of pixels in the original image. When using flat structuring functions, algorithms for the gray-scale dilation operator have complexity $O(\rho^2 n)$ and thus depend on the structuring function size $\rho$.
7. Image sharpening with a continuous approximation
A drawback of the discrete sharpening operator using the mathematical morphological definition is that there is a minimum bound on $\rho$ for the parabolic and the flat structuring function to ensure any sharpening effect. The minimum bound exists because the image and the structuring function are both discrete. For a flat structuring function, the minimum value of $\rho$ equals the sampling distance (one pixel): $\rho \geq 1$. The minimum value of $\rho$ for a parabolic structuring function depends on the number of quantization levels of the image. In Ref. [15], van den Boomgaard et al. describe an implementation of the gray-scale dilation operator that operates on a continuous approximation of the original discrete image. In this approximation, each pixel of the image is represented by a continuous parabola that has a parameter $\rho$ and a height equal to the gray value of the pixel. This implementation of the dilation operator with a parabolic structuring function, denoted the union-of-translations implementation, modifies the $\rho$ values of the continuous parabolas of the image. A parabolic dilation with structuring function $g_{\Delta\rho}(x)$ is performed by enlarging each parabola by the amount $\Delta\rho$. The resulting image consists of a set of parabolas with size $\rho + \Delta\rho$ and forms a continuous approximation of the dilation result that can be discretized again. Further research should indicate whether this implementation of the dilation operator can be used to build a discrete sharpening operator in which $\rho$ can be chosen arbitrarily small.
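For reference, a direct (brute-force) discretization of the parabolic dilation and erosion in one dimension is sketched below. It is deliberately simple, costing $O(n \cdot w)$ for a support of $w$ samples, and it is not the $O(n)$ algorithm of Ref. [15]. Sampling $q_\rho(x) = -x^2/(2\rho)$ at integer offsets and the optional truncation radius `radius` are our assumptions. The erosion uses the duality $f \ominus q_\rho = -((-f) \oplus q_\rho)$, which holds because $q_\rho$ is symmetric, and `sharpen_parabolic` applies the selection rule of Eq. (49) once.

```python
def parabolic_dilation(f, rho, radius=None):
    """(f ⊕ q_rho)(i) = max_y f(i - y) - y**2 / (2 rho), sampled at integer offsets y."""
    n = len(f)
    r = n if radius is None else radius  # truncation of the parabola's support (assumption)
    out = []
    for i in range(n):
        best = float("-inf")
        for y in range(-r, r + 1):
            j = i - y
            if 0 <= j < n:
                best = max(best, f[j] - y * y / (2.0 * rho))
        out.append(best)
    return out


def parabolic_erosion(f, rho, radius=None):
    """Erosion by duality; valid because q_rho is symmetric."""
    return [-v for v in parabolic_dilation([-v for v in f], rho, radius)]


def sharpen_parabolic(f, rho):
    """One application of the sharpening toggle with a parabolic structuring function."""
    dil, ero = parabolic_dilation(f, rho), parabolic_erosion(f, rho)
    out = []
    for fi, d, e in zip(f, dil, ero):
        if d - fi < fi - e:
            out.append(d)
        elif d - fi > fi - e:
            out.append(e)
        else:
            out.append(fi)
    return out


impulse = [0, 0, 0, 0, 10, 0, 0, 0, 0]
print(parabolic_dilation(impulse, rho=1.0))  # [2.0, 5.5, 8.0, 9.5, 10.0, 9.5, 8.0, 5.5, 2.0]
print(sharpen_parabolic([0, 0, 1, 3, 6, 10, 13, 15, 16, 16], rho=1.0))
```

Because $q_\rho(0) = 0$ and $q_\rho \le 0$, the dilation never exceeds the maximum of $f$ and the erosion never falls below its minimum, so one sharpening step keeps the gray values within the original range.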
8. Experiments
This section presents results of applications of the sharpening operator using flat and parabolic structuring functions in the discrete domain. In Section 8.1 the isotropy of both structuring functions is tested for two quantization types. Sections 8.2 and 8.3 give results of experiments on artificially generated step edges and on document images.
8.1. Isotropic sharpening
For accurate results of the sharpening operator in the discrete domain, where function values are given as quantized numbers on a uniform sampling grid, it is necessary to choose $\rho$ of the structuring function as small as possible. This section investigates the effects of sampling and quantizing structuring functions. It is shown for two types of quantization that a parabolic structuring function is preferable to a flat structuring function in terms of isotropic sharpening behavior. The first quantization type considered is uniform and equals the gray-scale range [0, 255]. The second type is non-uniform and uses a floating-point representation.
8.1.1. Uniform quantization with 256 gray values
In this section the sharpening operator using a flat structuring function is compared with the sharpening operator using a parabolic structuring function for a uniform quantization with 256 gray values. Note that in this domain the smallest possible size of a flat structuring function is $\rho = 1$ (sampling problem). For that value of $\rho$, the discrete approximation of the disk $S_\rho$ of the flat structuring function is a diamond (4-connected) or a square (8-connected). The parabolic structuring function can be made arbitrarily small, but in order to have any sharpening effect, $\rho$ must be at least 1. For lower values of $\rho$, the sharpening operator is not able to fully sharpen the image; it only sharpens image edges up to a maximum slope set by the value of $\rho$.
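The anisotropy of small flat structuring functions can be made visible by dilating a single bright pixel. The sketch below (plain Python, brute force; the grid size, gray values and number of iterations are illustrative choices of ours) compares three applications of the 4-connected diamond and the 8-connected square of size $\rho = 1$ with one application of a sampled parabola $q_\rho(x, y) = -(x^2 + y^2)/(2\rho)$.

```python
def dilate2d(img, se):
    """Gray-scale dilation: out(p) = max over offsets d of img(p - d) + se[d]."""
    h, w = len(img), len(img[0])
    out = [[float("-inf")] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for (dy, dx), g in se.items():
                yy, xx = y - dy, x - dx
                if 0 <= yy < h and 0 <= xx < w:
                    out[y][x] = max(out[y][x], img[yy][xx] + g)
    return out


OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
SQUARE = {d: 0 for d in OFFSETS}                                 # 8-connected flat, rho = 1
DIAMOND = {d: 0 for d in OFFSETS if abs(d[0]) + abs(d[1]) <= 1}  # 4-connected flat, rho = 1


def parabola2d(rho, radius):
    """Sampled parabolic structuring function -(dy^2 + dx^2) / (2 rho) on a square support."""
    return {(dy, dx): -(dy * dy + dx * dx) / (2.0 * rho)
            for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)}


img = [[0] * 9 for _ in range(9)]
img[4][4] = 100  # single bright pixel in the center

sq, di = img, img
for _ in range(3):
    sq, di = dilate2d(sq, SQUARE), dilate2d(di, DIAMOND)
pa = dilate2d(img, parabola2d(rho=1.0, radius=4))

# Offset (3, 3), at Euclidean distance 4.24, is reached by the square but not by the diamond,
# which misses even offset (2, 2) at distance 2.83 while reaching (0, 3) at distance 3.
print(sq[1][1], di[1][1], di[2][2], di[4][1])
# The parabolic result depends only on dy**2 + dx**2.
print(pa[4][7], pa[7][4], pa[6][6])
```

The flat results grow as a square or a diamond, whereas the parabolic values depend only on the Euclidean distance to the bright pixel, which is the isotropy argument of this section in miniature.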
Fig. 9 shows the results of the application of the sharpening operator for different numbers of iterations and different structuring functions in the case of uniform quantization with 256 gray values. The original image is a digitized two-dimensional Gaussian function with gray values fitted into the range [0, 255]. The desired result of the application of the sharpening operator is a cylinder. From the results we may conclude that sharpening with a parabolic structuring function resembles the desired result better than sharpening with a 4- or 8-connected flat structuring function. This stems from the fact that sampling a parabolic structuring function gives a more isotropic structuring function than sampling a flat structuring function. However, the sampled and quantized parabolic structuring function contains (repeating) quantization errors, as noted in Ref. [21], which may influence the stability of the sharpening operator and the correctness of the sharpening result after several iterative applications of the operator.
8.1.2. Non-uniform quantization with a floating-point representation
In the case of floating-point function values given on a grid, the value of $\rho$ for the parabolic structuring function can even be lower than 1 (down to $\varepsilon$, which is determined by the floating-point precision). The minimal value of $\rho$ for flat structuring functions remains 1. Fig. 10 shows the results of the application of the sharpening operator for different numbers of iterations and different structuring functions in the case of floating-point function values. The original image is again a digitized two-dimensional Gaussian function, and consequently the desired result of the application of the image-sharpening operator is a cylinder. From the results, we may conclude that sharpening with a parabolic structuring function yields the desired result, a cylinder, whereas sharpening with a 4- or 8-connected flat structuring function gives cube-like figures, which are again due to their anisotropic behavior.
Fig. 9. Applications of the sharpening operator in the case of uniform quantization with 256 gray values of the input image and structuring function. Different structuring functions are shown vertically and the number of iterative applications is shown horizontally. The original input image is a two-dimensional Gaussian function with $\sigma = 9.0$ and gray values fitted into the range [0, 255].
Fig. 10. Applications of the sharpening operator in the case of non-uniform quantization with a floating-point representation of the input image and structuring function. Different structuring functions are shown vertically and the number of iterative applications is shown horizontally. The original input image is a two-dimensional Gaussian function with $\sigma = 9.0$ and gray values fitted into the range [0.0, 255.0].
8.2. Edge sharpening
This section presents results of experiments with the sharpening operator on an artificially generated step edge in a discrete image (uniformly sampled and uniformly quantized with 256 gray values), as shown in Fig. 11(a). The image is distorted with Gaussian blur and additive Gaussian noise. The standard deviations of the blur and the noise range between 0 and 25:
$$\sigma_{\mathrm{blur}},\ \sigma_{\mathrm{noise}} = 0, 1, 3, 5, 10, 15, 25. \tag{59}$$
The structuring functions used in the experiments are the flat structuring function and the parabolic structuring function. The choices for the size $\rho$ of the flat structuring function were
$$\rho = 0, 1, 2, 4, 8, 16, 32, 64, \tag{60}$$
where 0 corresponds to a 4-connected flat structuring function and 1 to an 8-connected flat structuring function. The choices for the size $\rho$ of the parabolic structuring function were
$$\rho = 0.125, 0.25, 0.5, 1, 2, 4, 8, 16. \tag{61}$$
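A degraded test edge of the kind described above can be generated as follows. This is a one-dimensional sketch under our own assumptions: a step from 0 to 255, a sampled and normalized Gaussian kernel truncated at three standard deviations, replicated borders, a fixed random seed, and rounding and clipping to the 256-level range. The paper's image size and exact degradation procedure are not specified here.

```python
import math
import random


def gaussian_kernel(sigma):
    """Sampled Gaussian truncated at 3 sigma and normalized to sum 1 (sigma = 0 means no blur)."""
    if sigma == 0:
        return [1.0]
    r = max(1, math.ceil(3 * sigma))
    k = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-r, r + 1)]
    total = sum(k)
    return [v / total for v in k]


def degrade(signal, sigma_blur, sigma_noise, seed=0):
    """Blur with replicated borders, add Gaussian noise, then round and clip to 0..255."""
    k = gaussian_kernel(sigma_blur)
    r, n = len(k) // 2, len(signal)
    blurred = [
        sum(k[j + r] * signal[min(max(i - j, 0), n - 1)] for j in range(-r, r + 1))
        for i in range(n)
    ]
    rng = random.Random(seed)  # fixed seed so an experiment can be repeated exactly
    return [min(255, max(0, round(v + rng.gauss(0.0, sigma_noise)))) for v in blurred]


edge = [0] * 32 + [255] * 32  # ideal step edge (hypothetical 1-D profile)
print(degrade(edge, sigma_blur=3, sigma_noise=0))
print(degrade(edge, sigma_blur=3, sigma_noise=5, seed=1))
```

Looping this over the $(\sigma_{\mathrm{blur}}, \sigma_{\mathrm{noise}})$ pairs of Eq. (59) yields one degraded input per grid point; the blur alone keeps the profile monotone, while the noise breaks that monotonicity and creates the small local extrema that the sharpening operator also amplifies.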
Fig. 11. (a) Artificially generated step-edge image. (b) Artificially generated image used in the experiment on document-image sharpening.
For both structuring functions, the number of iterations ranges from 1 to 16. The error measure applied is the mean squared error (MSE) between the original image in Fig. 11(a) and the sharpening result. For each value of $\sigma_{\mathrm{blur}}$ and $\sigma_{\mathrm{noise}}$ we record the minimal MSE for the flat and the parabolic structuring function over the number of iterations and the range of structuring function sizes $\rho$. The error graph for the flat structuring functions is shown in Fig. 12(a); the error graph for the parabolic structuring functions is shown in Fig. 12(b). From Figs. 12(a) and (b) we may conclude that for $\sigma_{\mathrm{blur}} \le 5$ and $\sigma_{\mathrm{noise}} \le 5$, sharpening with a flat or a parabolic structuring function can completely (in the MSE sense) reconstruct the original image. For higher values of $\sigma_{\mathrm{blur}}$ and $\sigma_{\mathrm{noise}}$, the error is not zero and gradually increases for increasing amounts of blur and noise. Although it is hard to compare the flat and parabolic structuring functions because of their different ranges of $\rho$, the flat structuring functions perform slightly better than the parabolic structuring functions. All minimum errors were found after one iteration. Figs. 13(a) and (b) give the frequency of the different sizes $\rho$ of the structuring functions for which the minimum errors were found. The maxima are found at $\rho = 1$ (8-connected) for the flat structuring functions and at $\rho = 1$ for the parabolic structuring functions.
Fig. 12. Edge-sharpening experiment. Minimal MSE for different values of $\sigma_{\mathrm{blur}}$ and $\sigma_{\mathrm{noise}}$ in the case of (a) flat structuring functions and (b) parabolic structuring functions.
Fig. 13. Edge-sharpening experiment. (a) The number of times a flat structuring function of size $\rho$ gives a minimal MSE, as a function of $\rho$. (b) The number of times a parabolic structuring function of size $\rho$ gives a minimal MSE, as a function of $\rho$.
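The evaluation protocol (the minimal MSE over structuring-function sizes and 1 to 16 iterations) can be written as a small search routine. In this sketch the sharpening operator is passed in as a function `sharpen(signal, size)`; the one-dimensional flat toggle used in the usage example, the toy signals, and the tie-breaking (the first minimum found is kept) are our assumptions.

```python
def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def min_mse(original, degraded, sharpen, sizes, max_iter=16):
    """Return (best MSE, size, iterations) over all sizes and 1..max_iter iterations."""
    best = (float("inf"), None, None)
    for size in sizes:
        f = degraded
        for it in range(1, max_iter + 1):
            f = sharpen(f, size)
            err = mse(original, f)
            if err < best[0]:  # strict: the first minimum found is kept
                best = (err, size, it)
    return best


def flat_toggle(f, radius):
    """1-D flat sharpening step: dilation or erosion, whichever is closer to f (ties keep f)."""
    out = []
    for i, v in enumerate(f):
        win = f[max(0, i - radius):i + radius + 1]
        d, e = max(win), min(win)
        out.append(d if d - v < v - e else (e if d - v > v - e else v))
    return out


original = [0] * 5 + [16] * 5
degraded = [0, 0, 1, 3, 6, 10, 13, 15, 16, 16]
print(min_mse(original, degraded, flat_toggle, sizes=[1, 2]))  # (0.0, 1, 3)
```

In this toy case a larger window reaches the same zero error in fewer iterations, but the strict comparison keeps the first minimum encountered; a different tie-breaking rule would change which size and iteration count are reported, which is worth fixing explicitly when tabulating frequencies as in Fig. 13.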
8.3. Document-image sharpening
In this section, results are shown of experiments with the sharpening operator on an artificially generated document image, as shown in Fig. 11(b). The image consists of the numbers 0–99, each represented with two digits in a 32-point font. The width of the digits is 5 pixels on average, and two digits are at least one pixel apart. For the experiment we again distorted the image with Gaussian blur and additive Gaussian noise. The standard deviation of the blur ranges between 0 and 1.8:
$$\sigma_{\mathrm{blur}} = 0, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8. \tag{62}$$
The standard deviation of the noise ranges between 0 and 7:
$$\sigma_{\mathrm{noise}} = 0, 1, 2, 3, 4, 5, 6, 7. \tag{63}$$
The structuring functions used are the flat and the parabolic structuring function, with the same range of sizes $\rho$ and the same numbers of iterations as in the previous experiment. The error measure applied is the number of disconnected digits after thresholding the sharpened image at threshold value $t = 122$. For each value of $\sigma_{\mathrm{blur}}$ and $\sigma_{\mathrm{noise}}$, we record the number of disconnected digits closest to 200 for the flat and the parabolic structuring function over the number of iterations and the range of structuring function sizes $\rho$. The graphs with the number of disconnected digits are depicted in Figs. 14(a) and (b). The corresponding numbers of noise objects are shown in Figs. 15(a) and (b).
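The document-image error measure counts connected objects after thresholding. A minimal sketch follows; the polarity (dark digits on a light background, so foreground pixels are those below $t$), the default 8-connectivity, and the tiny test image are our assumptions, since neither choice is stated here, and separating digits from noise objects would need additional bookkeeping that is not shown.

```python
from collections import deque


def count_components(img, t=122, dark_digits=True, connectivity=8):
    """Threshold at t and count connected foreground components with a BFS flood fill."""
    h, w = len(img), len(img[0])
    fg = [[(v < t) if dark_digits else (v >= t) for v in row] for row in img]
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if fg[y][x] and not seen[y][x]:
                count += 1  # new component found: flood-fill all of it
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in steps:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Three dark pixels touching only diagonally: one object with 8-connectivity, three with 4.
page = [
    [255, 255, 255, 255],
    [255,  40, 255, 255],
    [255, 255,  40, 255],
    [255, 255, 255,  90],
]
print(count_components(page), count_components(page, connectivity=4))  # 1 3
```

The choice of connectivity directly affects whether two digits that touch only at a corner after sharpening count as one object or two, so it should be fixed before comparing counts against the target of 200.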
From Figs. 14 and 15 we may conclude that image sharpening with a flat structuring function restores more of the original digits in the image for higher values of $\sigma_{\mathrm{blur}}$, but it produces many noise objects. Figs. 16(a) and (b) give the frequency of the different sizes $\rho$ of the structuring functions for which the number of disconnected digits closest to 200 is found. The maxima are found at $\rho = 1$ (8-connected) for the flat structuring functions and at $\rho = 1$ for the parabolic structuring functions.
Fig. 14. Experiment on document-image sharpening. (a) The number of disconnected digits closest to 200 after sharpening with a flat structuring function and thresholding, for different values of $\sigma_{\mathrm{blur}}$ and $\sigma_{\mathrm{noise}}$. (b) The number of disconnected digits with a parabolic structuring function.
Fig. 15. Experiment on document-image sharpening. The corresponding number of noise objects for (a) a flat structuring function and (b) a parabolic structuring function.
Fig. 16. Experiment on document-image sharpening. (a) The number of times a flat structuring function of size $\rho$ gives a number of disconnected digits closest to 200, as a function of $\rho$. (b) The number of times a parabolic structuring function of size $\rho$ gives a number of disconnected digits closest to 200, as a function of $\rho$.
9. Conclusions
In this paper we introduced a class of morphological image operators for sharpening digitized gray-scale images. We defined the sharpening operator class in terms of the gray-scale dilation and erosion operators from the theory of mathematical morphology. Additionally, we derived the partial differential equation (PDE) that governs this class of operators. Furthermore, we showed with an analytical edge model that this class of image operators has sharpening properties when concave structuring functions are used. We focused on two instances of this class of operators: one sharpening operator using a flat structuring function, and one using a parabolic structuring function. For two types of quantization, we showed with experiments in the discrete domain that a parabolic structuring function is to be preferred to a flat structuring function in terms of isotropic sharpening behavior. On the other hand, experiments on sharpening document images revealed that flat structuring functions perform better than parabolic structuring functions if a higher number of sharpening errors is acceptable.
References
[1] H.P. Kramer, J.B. Bruckner, Iterations of a non-linear transformation for enhancement of digital images, Pattern Recognition 7 (1975) 53–58.
[2] S. Osher, L.I. Rudin, Feature-oriented image enhancement using shock filters, SIAM J. Numer. Anal. 27 (4) (1990) 919–940.
[3] A. Rosenfeld, Picture Processing by Computer, Academic Press, New York, 1969.
[4] W.F. Schreiber, Wirephoto quality improved by unsharp masking, Pattern Recognition 2 (1970) 117–121.
[5] I. Scollar, B. Weidner, T.S. Huang, Image enhancement using the median and the interquartile distance, Comput. Vision, Graphics Image Process. 25 (1984) 236–251.
[6] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1992.
[7] P.W. Verbeek, H.A. Vrooman, L.J. van Vliet, Low-level image processing by max-min filters, Signal Processing 15 (3) (1988) 249–258.
[8] J.M. Lester, J.F. Brenner, W.D. Selles, Local transforms for biomedical image analysis, Comput. Graphics Image Process. 13 (1) (1980) 17–30.
[9] J.E. den Hartog, A framework for knowledge-based map interpretation, Ph.D. Thesis, Delft University of Technology, September 1995.
[10] J.G.M. Schavemaker, Document interpretation applied to public-utility maps, Ph.D. Thesis, Delft University of Technology, June 1999.
[11] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York, 1982.
[12] R. van den Boomgaard, Mathematical morphology: extensions towards computer vision, Ph.D. Thesis, University of Amsterdam, March 1992.
[13] F. Meyer, Iterative image transformations for an automatic screening of cervical smears, J. Histochem. Cytochem. 27 (1) (1979) 128–135.
[14] B. Laÿ, Image processing: a key to success in industrial applications, in: J. Serra, P. Soille (Eds.), Mathematical Morphology and its Applications to Image Processing, Kluwer Academic Publishers, Dordrecht, 1994, pp. 341–352.
[15] R. van den Boomgaard, L. Dorst, S. Makram-Ebeid, J.G.M. Schavemaker, Quadratic structuring functions in mathematical morphology, in: P. Maragos, R.W. Schafer, M.A. Butt (Eds.), Mathematical Morphology and its Applications to Image and Signal Processing, Kluwer Academic Publishers, Dordrecht, 1996, pp. 147–154.
[16] L. Dorst, R. van den Boomgaard, Morphological signal processing and the slope transform, Signal Processing 38 (1994) 79–98.
[17] L. Dorst, R. van den Boomgaard, Two dual representations of morphology based on the parallel normal transport property, in: J. Serra, P. Soille (Eds.), Mathematical Morphology and its Applications to Image Processing, Kluwer Academic Publishers, Dordrecht, 1994, pp. 161–170.
[18] P. Maragos, Max-min difference equations and recursive morphological systems, in: J. Serra, P. Salembier (Eds.), Mathematical Morphology and its Applications to Signal Processing, Universitat Politècnica de Catalunya, Barcelona, Spain, May 1993, pp. 128–133.
[19] R. Haralick, Digital step edges from zero crossings of second directional derivatives, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1) (1984) 58–68.
[20] L. Florack, Image Structure, Kluwer Academic Publishers, Dordrecht, 1997.
[21] J.G.M. Schavemaker, Image segmentation in morphological scale-space, Master's thesis, University of Amsterdam, September 1994.
About the Author: JOHN SCHAVEMAKER received his M.Sc. degree in computer science from the University of Amsterdam, The Netherlands, in 1994. His thesis was entitled "Segmentation in morphological scale-space". In 1999, he received his Ph.D. degree from the Delft University of Technology, The Netherlands, for a thesis on document interpretation applied to public-utility maps. He currently works at the TNO Physics and Electronics Laboratory. His research interests include image processing, document processing, and computer vision.
About the Author: MARCEL REINDERS studied applied physics at the Delft University of Technology. He received his Ph.D. degree from the Delft University of Technology in 1994 for a thesis on model adaption for image coding. Currently, he works as an assistant professor at the Faculty of Information Technology and Systems, Delft University of Technology.
About the Author: JAN GERBRANDS (1948) holds an M.Sc. in electrical engineering and a Ph.D. from Delft University of Technology, The Netherlands. He is an associate professor in the Information and Communication Theory Group of the Department of Electrical Engineering of Delft University of Technology and focuses on image processing, in particular on model-based image segmentation and knowledge-based systems for image analysis, with particular emphasis on medical applications and remote sensing. He is the (co-)author of some one hundred publications in journals and conference proceedings. He has taught courses on random signal theory, image processing, pattern recognition, and computer and robot vision. Since 1995 he has served as director of studies in electrical engineering at Delft University.
About the Author: ERIC BACKER is with the Delft University of Technology, Delft, The Netherlands, Faculty of Information Technology and Systems. He received his M.Sc. and Ph.D. degrees from the same university, in 1969 and 1978, respectively. He became Full Professor in Electrical Engineering in 1982 and was Visiting Professor at Michigan State University in 1987 and 1982, and at the University of South Florida in 1993, in both cases at Computer Science Departments. He has published over a hundred technical papers and books and is a member of the IEEE and a Fellow of the IAPR. His research interests include Information Theory, Machine Intelligence, Pattern Recognition, Applied Fuzzy Logic, Knowledge Engineering, and Data Mining. He is (co-)managing editor of Pattern Recognition Letters (Elsevier). He served as Dean of the Electrical Engineering Faculty from 1989 until 1998.
Pattern Recognition 33 (2000) 1013–1021
Non-linear image processing in hardware
A. Gasteratosᵃ, I. Andreadisᵇ,*
ᵃ Laboratory for Integrated Advanced Robotics, Department of Communication Computer and System Science, University of Genoa, Via Opera Pia 13, I-16145 Genoa, Italy
ᵇ Laboratory of Electronics, Section of Electronics and Information Systems Technology, Department of Electrical and Computer Engineering, Democritus University of Thrace, GR-671 00 Xanthi, Greece
Received 1 October 1998; received in revised form 2 May 1999; accepted 23 June 1999
Abstract
A new ASIC capable of computing rank order filters, weighted rank order filters, standard erosion and dilation, soft erosion and dilation, order statistic soft erosion and dilation, fuzzy erosion and dilation, and fuzzy soft erosion and dilation is presented in this paper. It is based on the local histogram and a successive approximation technique and operates on 3×3-pixel image windows. The hardware complexity of the proposed structure is linearly related to both the image data window size and the pixel resolution. The dimensions of the core of the proposed ASIC are 2.88 mm × 2.8 mm = 8.06 mm², and its die size dimensions are 3.72 mm × 3.64 mm = 13.54 mm². It executes 3.5×10⁶ non-linear filter operations per second.
© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Machine vision; Non-linear filters; Mathematical morphology; Fuzzy image processing; VLSI
1. Introduction
Non-linear filters are a large family of filters used in signal and image processing [1]. They include well-known filter classes such as rank order filters (median, min, max, etc.) and morphological filters (e.g. opening, closing). Rank order filters exhibit excellent robustness properties and provide solutions in many cases where linear filters are inappropriate. Linear filters perform poorly in the presence of noise that is not additive, as well as in cases where system non-linearities or non-Gaussian statistics are encountered [1]. Non-linear filters can suppress high-frequency and impulse noise in an image while avoiding extensive blurring, since they have good edge-preservation properties. They have found numerous applications, such as digital image analysis, speech processing and coding, and digital TV. Median and rank order filters are related to morphological filters [2,3]. It has been shown that erosions and dilations are special cases of rank order filters and that any rank order filter can be expressed either as a maximum of erosions or as a minimum of dilations [2]. Mathematical morphology offers a unified and powerful approach to numerous image processing problems, such as shape extraction, noise cleaning, thickening, thinning, skeletonizing and object selection according to size distribution [4]. A relatively new approach to mathematical morphology is fuzzy mathematical morphology [5,6]. Several attempts to apply fuzzy set theory to mathematical morphology have resulted in different approaches and definitions. These are reviewed in Ref. [5], where a general framework is proposed. This framework leads to an infinity of fuzzy mathematical morphologies, which are constructed in families with specific properties. Algorithms suitable for software implementation of rank order filters are tree sorts, shell sorts and quick sorts [1,7]. Most of these algorithms result in inefficient hardware structures, since they handle numbers at word level. Several VLSI structures for median filters are studied in Ref. [8], where it is stated that for a relatively small pixel resolution the threshold decomposition technique is probably the best choice. Comparisons of VLSI architectures for weighted rank order filters are described in Ref. [9]. These include array architectures, stack filter architectures (threshold decomposition) and sorting network architectures. The comparisons depend on the choice of the filter parameters. It has been shown that the stack filter requires the largest silicon area, since its area depends exponentially on the pixel resolution. For high pixel resolution and large window sizes, the sorting network is the best approach. Also, the stack filter is faster than the array architecture.
In this paper a new ASIC capable of computing rank order filters, weighted rank order filters, standard erosion and dilation, soft erosion and dilation [10,11], order statistic soft erosion and dilation [12], fuzzy erosion and dilation [6] and fuzzy soft erosion and dilation [13] is presented. It is based on the local histogram and a successive approximation technique. The proposed hardware structure provides an output result in a fixed number of steps, equal to the image pixel resolution. Thus, the time needed to obtain the result does not depend on the local histogram values, as it does in the case of Gasteratos and Andreadis [14]. Image data window sizes are up to 3×3 pixels. The ASIC performs 3.5×10⁶ non-linear filter operations per second. Its die size dimensions are 3.72 mm × 3.64 mm = 13.54 mm², for a 0.8 μm, DLM, CMOS technology process. The rest of the paper is organized as follows. Definitions of the operations performed by the proposed ASIC are given in Section 2. The technique on which the function of the ASIC is based is discussed in Section 3. Hardware details, including the VLSI implementation and simulation results, are presented in Section 4. Concluding remarks are made in Section 5.
* Corresponding author. Tel.: +30-541-79566; fax: +30-541-79564.
E-mail addresses: [email protected] (A. Gasteratos), [email protected] (I. Andreadis)
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00161-2
A. Gasteratos, I. Andreadis / Pattern Recognition 33 (2000) 1013–1021
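The idea of computing an order statistic from a local histogram by successive approximation, in a fixed number of steps equal to the pixel resolution, can be sketched in software as follows. This is our own illustration of the general principle (essentially a bitwise binary search over gray levels), not the ASIC's circuit, which is described in Section 3; the function name and the optional integer `weights`, used here to obtain a weighted rank order filter, are assumptions.

```python
def rank_order(window, k, bits=8, weights=None):
    """k-th smallest value (1-based) of `window`, determined bit by bit from a local histogram.
    Optional integer weights give a weighted rank order filter (w_i repetitions of x_i)."""
    if weights is None:
        weights = [1] * len(window)
    hist = [0] * (1 << bits)           # local histogram of the window
    for x, w in zip(window, weights):
        hist[x] += w
    result = 0
    for bit in reversed(range(bits)):  # exactly `bits` steps, whatever the data
        trial = result | (1 << bit)
        below = sum(hist[:trial])      # number of (repeated) samples smaller than `trial`
        if below < k:                  # fewer than k samples below: the answer is >= trial
            result = trial
    return result


print(rank_order([10, 255, 10], k=2))  # 10: the median removes the impulse
w3x3 = [12, 200, 7, 7, 90, 255, 0, 33, 128]
print(rank_order(w3x3, k=5), rank_order(w3x3, k=6, weights=[1, 1, 1, 1, 3, 1, 1, 1, 1]))
```

The result is the largest gray level $v$ for which fewer than $k$ samples lie below $v$, which is exactly the $k$th order statistic; the loop always runs `bits` times, matching the fixed step count stated above. With `k` equal to 1 or to the window size the same routine yields the min and max filters.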
2. Basic definitions
Non-linear filters based on order statistics, which are computed by the proposed hardware structure, are the following:
• Rank order filters [1]. The input of such filters is a data window with an odd number of elements, N. These elements are sorted in ascending order, and the output of the rank order filter with rank k is the kth element (the kth order statistic). Special cases of rank order filters are the median, min and max filters.
• Weighted rank order filters [1]. This is a generalization of the previous category of filters. The output of a weighted rank order filter of rank k and weights $\{w_1, w_2, \ldots, w_N\}$ is the kth order statistic of the set $\{w_1 \diamond x_1, w_2 \diamond x_2, \ldots, w_N \diamond x_N\}$, where $w_i$ is a natural number and $w_i \diamond x_i$ denotes $w_i$-fold repetition of $x_i$.
• Standard morphological filters [4]. The basic morphological operations, from which all other operations and filters are composed, are erosion and dilation. These are defined as follows:
$$(f \ominus g)(x) = \min_{y \in G} \{ f(x+y) - g(y) \} \tag{1}$$
and
$$(f \oplus g)(x) = \max_{y \in G,\; x-y \in F} \{ f(x-y) + g(y) \}, \tag{2}$$
respectively, where $x, y \in \mathbb{Z}^2$ are the spatial co-ordinates, $f: F \to \mathbb{Z}$ is the gray-scale image, $g: G \to \mathbb{Z}$ is the gray-scale structuring element and $F, G \subseteq \mathbb{Z}^2$ are the domains of the gray-scale image and the gray-scale structuring element, respectively.
• Soft morphological filters [11]. In soft morphological operations the max/min operations used in standard morphology are replaced by weighted order statistics. Furthermore, the structuring element is divided into two subsets: the core and the soft boundary. Soft morphological erosion and dilation are defined as follows:
$$(f \ominus [\alpha, \beta, k])(x) = \min^{(k)} \Big( \{ k \diamond (f(y) - \alpha(x+y)) \}_{(x+y) \in K_1} \cup \{ f(z) - \beta(x+z) \}_{(x+z) \in K_2} \Big) \tag{3}$$
and
$$(f \oplus [\alpha, \beta, k])(x) = \max^{(k)} \Big( \{ k \diamond (f(y) + \alpha(x-y)) \}_{(x-y) \in K_1} \cup \{ f(z) + \beta(x-z) \}_{(x-z) \in K_2} \Big), \tag{4}$$
respectively, where k is the order index, $\min^{(k)}$ and $\max^{(k)}$ denote the kth smallest and the kth largest element, respectively, $x, y, z \in \mathbb{Z}^2$ are the spatial co-ordinates, $f: F \to \mathbb{Z}$ is the gray-scale image, $\alpha: K_1 \to \mathbb{Z}$ is the core of the gray-scale structuring element, $\beta: K_2 \to \mathbb{Z}$ is the soft boundary of the gray-scale structuring element, $F, K_1, K_2 \subseteq \mathbb{Z}^2$ are the domains of the gray-scale image, the core of the gray-scale structuring element and the soft boundary of the gray-scale structuring element, respectively, and $K_2 = K \setminus K_1$, where $K \subseteq \mathbb{Z}^2$ is the domain of the gray-scale structuring element.
• Order statistic soft morphological filters [12]. In this class the order index k and the number of repetitions of the core elements are, in general, different. Order statistic soft morphological erosion and dilation are defined as follows:
$$(f \ominus [\alpha, \beta, k, r])(x) = \min^{(k)} \Big( \{ r \diamond (f(y) - \alpha(x+y)) \}_{(x+y) \in K_1} \cup \{ f(z) - \beta(x+z) \}_{(x+z) \in K_2} \Big) \tag{5}$$
and
$$(f \oplus [\alpha, \beta, k, r])(x) = \max^{(k)} \Big( \{ r \diamond (f(y) + \alpha(x-y)) \}_{(x-y) \in K_1} \cup \{ f(z) + \beta(x-z) \}_{(x-z) \in K_2} \Big), \tag{6}$$
respectively, where r is the number of times the results related to the core of the structuring element are repeated. The other variables are exactly the same as those in Eqs. (3) and (4).
• Fuzzy morphological filters [6]. In this approach, mathematical morphology is studied in terms of fuzzy fitting. The fuzziness is introduced by the degree to which the structuring element fits into the image. Fuzzy erosion and dilation are defined as follows:
$$\mu_{A \ominus B}(x) = \min\Big[1, \min_{y \in B} [1 + \mu_A(x+y) - \mu_B(y)]\Big], \tag{7}$$
$$\mu_{A \oplus B}(x) = \max\Big[0, \max_{y \in B} [\mu_A(x-y) + \mu_B(y) - 1]\Big], \tag{8}$$
respectively, where $x, y \in \mathbb{Z}^2$ are the spatial co-ordinates and $\mu_A$, $\mu_B$ are the membership functions of the image and the structuring element, respectively.
• Fuzzy soft morphological filters [13]. Here fuzzy fitting is introduced into soft morphology. Fuzzy soft erosion and dilation are defined as follows:
$$\mu_{A \ominus [B_1, B_2, k]}(x) = \min\Big[1, \min^{(k)}\Big( \{ k \diamond (\mu_A(x+y) - \mu_{B_1}(y) + 1) \}_{y \in B_1} \cup \{ \mu_A(x+z) - \mu_{B_2}(z) + 1 \}_{z \in B_2} \Big)\Big], \tag{9}$$
$$\mu_{A \oplus [B_1, B_2, k]}(x) = \max\Big[0, \max^{(k)}\Big( \{ k \diamond (\mu_A(x-y) + \mu_{B_1}(y) - 1) \}_{y \in B_1} \cup \{ \mu_A(x-z) + \mu_{B_2}(z) - 1 \}_{z \in B_2} \Big)\Big], \tag{10}$$
respectively, where $\mu_A$, $\mu_{B_1}$, $\mu_{B_2}$ are the membership functions of the image, the core and the soft boundary of the structuring element, respectively. Additionally, for the fuzzy structuring element $B \subset \mathbb{Z}^2$: $B = B_1 \cup B_2$ and $B_1 \cap B_2 = \emptyset$.
3. Algorithm description
In this section an algorithm which computes any weighted order statistic is described [14]. It is based on a local histogram technique [15,16], according to which any order statistic can be found by summing the values in the histogram until the desired order statistic is reached.
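As a point of comparison for the successive approximation technique, the serial local-histogram search [16] can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `histogram_order_statistic` is hypothetical, and counting one step per histogram bin visited (the top bin never needs a visit) is our assumption about how steps are counted.

```python
def histogram_order_statistic(pixels, weights, k, bits=8):
    """Return (kth smallest value of the weighted window, bins visited).

    Serial local-histogram search: accumulate weighted bin counts from
    value 0 upwards until the running total reaches k.
    """
    levels = 1 << bits
    hist = [0] * levels
    for x, w in zip(pixels, weights):
        hist[x] += w                      # weight w = w-fold repetition of x
    running = 0
    for value in range(levels - 1):       # the top bin never needs a visit
        running += hist[value]
        if running >= k:
            return value, value + 1       # step count grows with the result
    return levels - 1, levels - 1         # kth element lies in the top bin


# Window of Table 1: 4-bit pixels, weights 1, 3, 5, 3, 1, k = 10.
print(histogram_order_statistic([2, 1, 5, 10, 7], [1, 3, 5, 3, 1], 10, bits=4))
# prints (7, 8)
```

With 8-bit pixels the count ranges from 1 (result 0) to 255 (result 255), which is the data-dependent behaviour the successive approximation technique avoids.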
The number of steps of this process depends on the values of the local histogram. Here, instead of adding the local histogram values serially, a successive approximation technique has been adopted. This ensures that the result is found in a fixed number of steps, equal to the number b of bits per pixel. According to the successive approximation technique, the N pixel values are initially compared with $2^{b-1}$. Pixel values which are greater than, less than or equal to that value are marked with the labels GT, LT and EQ, respectively; each label is either 0 or 1. The pixel labels are then multiplied by the corresponding pixel weights. The weighted sums of the LTs and EQs determine whether the kth order statistic is greater than, less than or equal to $2^{b-1}$. More specifically:
(i) If the sum $\sum_{j=1}^{N} w_j (EQ_j + LT_j)$ is greater than or equal to k and the sum $\sum_{j=1}^{N} w_j LT_j$ is less than k, then the kth order statistic is $2^{b-1}$.
(ii) If the sum $\sum_{j=1}^{N} w_j LT_j$ is greater than or equal to k, then the kth order statistic is less than $2^{b-1}$ and, therefore, $2^{b-2}$ should be subtracted from $2^{b-1}$.
(iii) Otherwise, the kth order statistic is greater than $2^{b-1}$ and, therefore, $2^{b-2}$ should be added to $2^{b-1}$.
If the kth order statistic is not found according to (i), then each pixel value is compared with the result of the subtraction in (ii) or of the addition in (iii), respectively. This is called the temporary result (temp); when condition (i) holds, the kth order statistic is equal to temp. The process is repeated until the required weighted order statistic is computed. In the ith step the number $2^{b-1-i}$ is either subtracted from or added to temp. Thus, the kth order statistic is computed in b steps (worst case). In the common case of 8-bit image pixel resolution, the local histogram summation algorithm [16] requires from 1 to $2^8 - 1 = 255$ (worst case) steps to compute an order statistic, whereas the described algorithm requires from 1 to 8 steps.
In general, the number of steps in the first case is exponentially related to the pixel resolution, i.e. it is $O(2^b)$. In the second case this relationship becomes linear, $O(b)$. Table 1 presents an illustrative example of the application of the proposed algorithm to a five-pixel data window of 4-bit resolution with weights $w_1 = 1$, $w_2 = 3$, $w_3 = 5$, $w_4 = 3$, $w_5 = 1$. The 4th largest element ($\max^{(4)}$) of this window is sought; since the total weight is 1 + 3 + 5 + 3 + 1 = 13, this is the (13 − 4 + 1) = 10th order statistic and, therefore, k = 10.
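The successive approximation search of conditions (i)–(iii) can be sketched in Python as follows. This is a software model under our own assumptions, not the ASIC's datapath: the name `weighted_order_statistic` is hypothetical, the loop stops as soon as condition (i) holds (so it counts comparisons rather than modelling the fixed hardware latency), and the explicit handling of a zero result, the only value never compared directly with temp, is our addition.

```python
def weighted_order_statistic(pixels, weights, k, bits=8):
    """Return (kth smallest value of the weighted window, comparisons used).

    Successive approximation: temp starts at 2**(b-1) and, after each
    comparison, moves by 2**(b-1-i) towards the kth order statistic.
    """
    if not 1 <= k <= sum(weights):
        raise ValueError("k must lie in 1..sum(weights)")
    temp = 1 << (bits - 1)               # first comparison value, 2**(b-1)
    step = temp >> 1                     # first correction, 2**(b-2)
    for probe in range(1, bits + 1):
        lt = sum(w for x, w in zip(pixels, weights) if x < temp)
        eq = sum(w for x, w in zip(pixels, weights) if x == temp)
        if lt < k <= lt + eq:            # condition (i): temp is the result
            return temp, probe
        if lt >= k:                      # condition (ii): result < temp
            temp -= step
        else:                            # condition (iii): result > temp
            temp += step
        step >>= 1
    # Every value 1 .. 2**b - 1 is compared on some path; only 0 is not,
    # so if no comparison matched, the kth order statistic is 0.
    return 0, bits


# Table 1: temp takes the values 8, 4, 6, 7 and the result is found at 7.
print(weighted_order_statistic([2, 1, 5, 10, 7], [1, 3, 5, 3, 1], 10, bits=4))
# prints (7, 4)
```

Compared with the histogram scan, the number of comparisons here is bounded by b regardless of the pixel values, which is the property the ASIC relies on.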
4. Design and implementation of the proposed hardware structure
The block diagram of the proposed hardware structure is shown in Fig. 1. The pipeline principle is used throughout the process. Its inputs are a 1-byte data bus, the selection bits S1, S0 of the input demultiplexer, the reset and
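Table 1
An illustrative example of the algorithm use (4-bit pixels; weights 1, 3, 5, 3, 1; k = 10)
i        Image pixel   Step 1   Step 2   Step 3   Step 4
1        2             LT       LT       LT       LT
2        1             3·LT     3·LT     3·LT     3·LT
3        5             5·LT     5·GT     5·LT     5·LT
4        10            3·GT     3·GT     3·GT     3·GT
5        7             LT       GT       GT       EQ
Temp                   8        4        6        7
Output                 0        0        0        7
Condition in step 1: $\sum_{j=1}^{5} w_j LT_j \geq 10 \Rightarrow$ SUB
Condition in step 2: $\sum_{j=1}^{5} w_j LT_j < 10 \Rightarrow$ ADD
Condition in step 3: $\sum_{j=1}^{5} w_j LT_j < 10 \Rightarrow$ ADD
Condition in step 4: $\sum_{j=1}^{5} w_j (EQ_j + LT_j) \geq 10$ AND $\sum_{j=1}^{5} w_j LT_j < 10$ (result found)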
Fig. 1. Block diagram of the proposed hardware structure.
the clock signals. The states of the selection bits of the demultiplexer determine the type of input loaded. More specifically, for S1S0 = 00, 1 byte of control data is loaded. Its two most significant bits (MSBs) determine whether the operation is fuzzy or not and whether it is an erosion or a dilation. The remaining bits correspond to the order index k. Table 2 presents the hexadecimal control codes of this byte for some representative operations. This byte is stored into an 8-bit register with parallel load (k register). The input decoder controls the load input of this register. When S1S0 = 01, nine bytes
are loaded serially. These correspond to the weights of the 3×3-pixel neighborhood and are stored into the w array of registers. This array is a serial-to-parallel module consisting of nine 8-bit registers with parallel load, and it is also controlled by the input decoder. If an image window smaller than 3×3 pixels is to be processed, then the pixels which are not taken into account are assigned zero weights. In standard rank order filters and morphological operations the weights are equal to 1. Fig. 2 depicts an example of a cross-shaped neighborhood, with central pixel weight equal to 3. In soft and fuzzy soft morphological operations the weights of the core pixels are equal to the order index and the weights of the soft boundary pixels are 1. When S1S0 = 10, the nine bytes of the structuring element are loaded serially. These are stored into the structuring element array of registers, which has a structure similar to that of the w array of registers. The sequence of structuring element pixel loading is shown in Fig. 3. In non-morphological operations (rank order and weighted rank order operations) the values of the structuring element pixels are 0. Finally, when S1S0 = 11, the image pixels are loaded into the image array of registers. Every nine clock pulses one 3×3-pixel image area is loaded.
Fig. 2. Example of weights consideration.
Fig. 3. Structuring element pixel loading in the cases of erosion and dilation.
Table 2
Hexadecimal control codes for some representative operations performed by the proposed ASIC
Operation                                                                                  Hexadecimal code
Median                                                                                     05
Center weighted median (w = 3)                                                             06
Max                                                                                        09
Standard dilation                                                                          09
Min                                                                                        01
Standard erosion                                                                           41
Soft dilation (k = 3, the core of the structuring element is the central pixel)            0B
Soft erosion (k = 3, the core of the structuring element is the central pixel)             41
Fuzzy dilation                                                                             89
Fuzzy erosion                                                                              C1
Fuzzy soft dilation (k = 3, the core of the structuring element is the central pixel)      8B
Fuzzy soft erosion (k = 3, the core of the structuring element is the central pixel)       C1
The original clock frequency is 250 MHz. The data loading frequency is obtained by dividing the original clock by eight through a frequency
divider (Fig. 1). This lower frequency is further divided by nine and is used as the clock input to the rest of the process, except for the order statistic module. The two lower frequencies are used to synchronize image data loading (nine pixels) and the order statistic module operations (eight steps of processing). The next stage of the circuit is the adders/subtractors module, whose block diagram is shown in Fig. 4. The pixels of the image and the structuring element are latched by means of 18 registers. The outputs of the registers are fed into nine adders/subtractors. Depending on the operation (dilation/erosion), these add the corresponding structuring element pixels to the image pixels or subtract them from the image pixels. In the next stage, provided that the operation is fuzzy, the number FF is subtracted or added in the case of dilation or erosion, respectively, according to Eqs. (7)–(10). For normalized pixel values [17] this number corresponds to 1. If the operation is not fuzzy, then 00 is subtracted or added and, therefore, the result is not affected. Since the addition results of dilation and fuzzy erosion may be greater than FF, and the subtraction results of erosion and fuzzy dilation may be less than 00, the results are limited to the range [00, FF] through a clipper module. This module tests the MSB of the result and outputs either FF or 00 for each of the aforementioned overflow cases. The results are then fed into the order statistic module. The order index k and the weights, properly delayed through registers, are also inputs to this module. The block diagram of the order statistic module is shown in Fig. 5. It consists of nine pipeline stages. A non-linear filter operation is completed in eight successive steps. In each of these steps the value of the intermediate 8-bit signal temp is changed to $temp \pm 2^{b-1-i}$, according to the algorithm. This is achieved through a feedback loop (Fig. 5).
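The data flow just described can be sketched behaviourally for one 3×3 window: control-byte decoding, the adders/subtractors, the fuzzy ±FF stage, the clipper, and a brute-force weighted order statistic standing in for the order statistic module. This is a sketch under stated assumptions, not the authors' design: the function names are hypothetical; reading bit 7 of the control byte as "fuzzy" and bit 6 as "erosion" is our inference from codes such as 41 (standard erosion) and 89 (fuzzy dilation); and pairing image pixels with the structuring element in reverse raster order for dilation is our assumption about the loading sequence of Fig. 3.

```python
def decode_control(byte):
    """Split the control byte into (fuzzy, erosion, k); bit layout assumed."""
    return bool(byte & 0x80), bool(byte & 0x40), byte & 0x3F


def clip8(value):
    """Clipper stage: limit a result to the range [0x00, 0xFF]."""
    return max(0x00, min(0xFF, value))


def asic_operation(control, window, se, weights):
    """One output for a 3x3 window, SE and weights given in raster order."""
    fuzzy, erosion, k = decode_control(control)
    offset = 0xFF if fuzzy else 0x00          # FF stands for 1 (normalized)
    operands = []
    for i in range(9):
        if erosion:                           # f(x+y) - g(y): Eqs. (1), (7)
            value = window[i] - se[i] + offset
        else:                                 # f(x-y) + g(y): Eqs. (2), (8)
            value = window[i] + se[8 - i] - offset
        operands.append(clip8(value))
    # Weighted order statistic by explicit w-fold repetition (reference only).
    expanded = sorted(v for v, w in zip(operands, weights) for _ in range(w))
    return expanded[k - 1]


# Median (code 05): zero structuring element, unit weights.
print(asic_operation(0x05, list(range(10, 100, 10)), [0] * 9, [1] * 9))
# prints 50
```

Because every operation reduces to "form nine clipped operands, then take a weighted kth statistic", a single order statistic module can serve all the filters of Section 2.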
The addition/subtraction results are compared concurrently with the signal temp by means of nine comparators. Each of these comparators has two outputs, indicating whether the addition/subtraction result is less than or equal to temp. Thus, the labels LT and EQ are extracted and latched through 18 flip-flops (F/F). Then, these labels are
Fig. 4. Adders/Subtractors module.
multiplied by the corresponding weights (according to the algorithm) through an array of 18 multipliers (Π). Since the labels are 1 bit long, each of these multipliers is an array of 2-input AND gates: the first input of each AND gate is the label and the second is a bit of the binary representation of the weight. The multiplication results are collected by means of 18 registers (R). In the next stages the sums $\sum_{j=1}^{N} w_j EQ_j$ and $\sum_{j=1}^{N} w_j LT_j$ are formed by means of two parallel tree adders, each consisting of three pipeline stages; the sum $\sum_{j=1}^{N} w_j (EQ_j + LT_j)$ is formed through a 6-bit adder. The results $\sum_{j=1}^{N} w_j (EQ_j + LT_j)$ and $\sum_{j=1}^{N} w_j LT_j$ are compared with the order index k by means of two comparators. If the logical product of the comparator outputs is 1, then the result has been found. If so, or if the result was found in a previous step (this is denoted by the internal signal found), then no further addition or subtraction is required. Signal found controls the multiplexer that supplies the adder/subtractor with either 00 or $2^{b-1-i}$, depending on whether the result has been found or not, respectively. The number $2^{b-1-i}$ is obtained by means of a b-state counter whose states are powers of 2 in descending order. The inverted output of the second comparator indicates that $\sum_{j=1}^{N} w_j LT_j \geq k$; it controls the operation of the adder/subtractor through the internal signal add_sub.
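To relate the signal names above to the arithmetic, one step of the order statistic module can be modelled at the signal level. This is a sketch, not a gate-accurate netlist: the helper names are hypothetical, weights are plain integers, and the AND-gate multiplier is written bit by bit only to show that a 1-bit label times a weight needs no real multiplier.

```python
def and_gate_multiply(label, weight, width=8):
    """1-bit label times weight: one 2-input AND gate per weight bit."""
    return sum(((label & (weight >> bit)) & 1) << bit for bit in range(width))


def order_statistic_step(lt_labels, eq_labels, weights, k, temp, counter, found):
    """One step of the module; returns (new temp, found, add_sub)."""
    sum_lt = sum(and_gate_multiply(b, w) for b, w in zip(lt_labels, weights))
    sum_eq = sum(and_gate_multiply(b, w) for b, w in zip(eq_labels, weights))
    sum_eq_lt = sum_eq + sum_lt                       # the 6-bit adder
    # Logical product of the two comparators, latched once true.
    found = found or (sum_eq_lt >= k and sum_lt < k)
    add_sub = sum_lt >= k                 # inverted comparator: subtract
    mux_out = 0 if found else counter     # multiplexer: 00 or 2**(b-1-i)
    new_temp = temp - mux_out if add_sub else temp + mux_out
    return new_temp, found, add_sub


# First step of Table 1: temp = 8, counter value 4, pixels 2, 1, 5, 10, 7.
print(order_statistic_step([1, 1, 1, 0, 1], [0] * 5, [1, 3, 5, 3, 1], 10, 8, 4, False))
# prints (4, False, True)
```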
The result of the addition/subtraction is the new value of signal temp and is stored into register RH (Fig. 5). Registers RH and RHH are triggered by the original clock divided by nine and by 72, respectively; the rest of the order statistic module (registers R) is triggered by the original clock. Therefore, the new temp value is computed in nine pulses of the original clock. Since eight steps are required to compute the output value o, 8×9 = 72 pulses of the original clock are needed per output.
4.1. VLSI implementation
The Cadence DFWII VLSI CAD tool with the AMS hit-kit 2.40 has been used to design and implement the ASIC, in a 0.8 µm, DLM, CMOS technology. A microphotograph of the ASIC is shown in Fig. 6. The core dimensions of the chip are 2.88 mm × 2.8 mm = 8.06 mm² and its die size dimensions are 3.72 mm × 3.64 mm = 13.54 mm². The inputs to the ASIC are the 8-bit data input, the selection signals S1 and S0, the clock and reset signals, and the power and ground connections for both the core and the periphery of the chip, whereas the output is the 8-bit computed
Fig. 5. Order statistic module.
value. The simulation and test language STL, a high-level language, has been used to examine the functionality of the ASIC. Its maximum frequency of operation is 3.5 MHz. A typical timing diagram, including the values of the variables in the various stages, is presented in Fig. 7. In this figure three cycles of the proposed successive approximation technique are demonstrated. The first line is the original clock divided by nine, which triggers the counter. The eight states of the counter are shown in the second line of the diagram. The next line represents signal found. When this signal is set to '1', the output of the multiplexer is set to 00 (next line). The operation of addition/subtraction is controlled by signal add_sub (next line). When this is '1', the output of the multiplexer is subtracted from the previous temp value (next line); otherwise it is added. The next line shows the original clock divided by 72, which triggers the output register RHH. The final output is obtained when the order statistic module has completed the required eight steps. The structure presented in this section is scalable both in terms of pixel resolution and size of image window. For higher pixel resolution the circuit should be expanded linearly to accommodate the new pixel representation. Furthermore, additional steps will be needed in
the order statistic module (equal to the additional number of bits in the pixel representation). For larger image windows, additional registers to store data, and additional adders/subtractors and comparators for the arithmetic operations, are required. The ASIC can operate approximately eight times faster by using eight order statistic modules consecutively in pipeline fashion. For even faster operation the image data window could also be loaded in parallel; such a hardware module would be approximately 8×9 = 72 times faster.
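The clock figures quoted in this section can be cross-checked with a short calculation. It uses only values stated in the text (250 MHz original clock, division by eight for data loading, 72 original-clock pulses per output); treating the "eight times" and "72 times" speed-ups as exact multiples of the base output rate is our simplification.

```latex
f_{\text{load}} = \frac{250\ \text{MHz}}{8} = 31.25\ \text{MHz},
\qquad
\frac{9\ \text{pixels}}{31.25\ \text{MHz}} = 288\ \text{ns} = 72 \times 4\ \text{ns}

f_{\text{out}} = \frac{250\ \text{MHz}}{8 \times 9} \approx 3.47\ \text{MHz}
\approx 3.5 \times 10^{6}\ \text{operations/s}

8\, f_{\text{out}} \approx 27.8\ \text{MHz},
\qquad
72\, f_{\text{out}} = 250\ \text{MHz}
```

Under this reading, loading one 3×3 window and completing the eight order statistic steps both take 72 periods of the original clock, and the quoted 3.5 MHz maximum frequency of operation corresponds to the output (window) rate.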
5. Conclusions
A new ASIC suitable for performing non-linear image processing operations has been presented in this paper. It is based on a local histogram and a successive approximation technique. The ASIC operates on 3×3-pixel image windows and computes rank order filters, weighted rank order filters, standard erosion and dilation, soft erosion and dilation, order statistic soft erosion and dilation, fuzzy erosion and dilation, and fuzzy soft erosion and dilation. The die size of the chip is 3.72 mm × 3.64 mm = 13.54 mm², in a 0.8 µm, DLM,
Fig. 6. A microphotograph of the proposed ASIC.
Fig. 7. Demonstration of the successive approximation technique.
CMOS technology process. It performs 3.5×10^6 non-linear filter operations per second. The architecture of the ASIC is scalable both in terms of pixel resolution and image window size, and its hardware complexity grows linearly with both.
References
[1] I. Pitas, A.N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications, Kluwer Academic Publishers, Boston, 1990.
[2] P. Maragos, R.W. Schafer, Morphological filters-Part II: their relations to median, order-statistic, and stack filters, IEEE Trans. Acoust. Speech Signal Process. 35 (1987) 1170–1184.
[3] M. Chefchaouni, D. Schonefeld, Morphological representation of order-statistics filters, IEEE Trans. Image Process. 4 (1995) 835–845.
[4] J. Serra (Ed.), Image Analysis and Mathematical Morphology: Theoretical Advances, Vol. 2, Academic Press, London, 1988.
[5] I. Bloch, H. Maitre, Fuzzy mathematical morphologies: a comparative study, Pattern Recognition 28 (1995) 1341–1387.
[6] D. Sinha, E.R. Dougherty, Fuzzy mathematical morphology, J. Visual Commun. Image Represent. 3 (1992) 286–302.
[7] M. Juhola, J. Katajainen, T. Raita, Comparison of algorithms for standard median filtering, IEEE Trans. Signal Process. 39 (1991) 204–208.
[8] D.S. Richards, VLSI median filters, IEEE Trans. Acoust. Speech Signal Process. 38 (1990) 145–153.
[9] C. Chakrabarti, L. Lucke, VLSI architectures for weighted order-statistic filters, Proceedings of the IEEE Int. Symp. Circuits and Systems, Monterey, California, USA, Vol. II, 1998, pp. 320–324.
[10] P. Kuosmanen, J. Astola, Soft morphological filtering, J. Math. Imaging Vision 5 (1995) 231–262.
[11] C.C. Pu, F.Y. Shih, Threshold decomposition of gray-scale soft morphology into binary soft morphology, Graph. Mod. Image Process. 57 (1995) 522–526.
[12] F.Y. Shih, C.L. Lai, S.C. Pei, J.H. Horng, Order statistic soft morphological filters, Proceedings of International Conference on Information Sciences, Research Triangle Park, NC, 1997, pp. 1–4.
[13] A. Gasteratos, I. Andreadis, Ph. Tsalides, Fuzzy soft mathematical morphology, IEE Proc. Vision, Image Signal Process. 145 (1998) 41–49.
[14] A. Gasteratos, I. Andreadis, A new algorithm for weighted order statistics operations, IEEE Signal Process. Lett. 4 (1999) 84–86.
[15] T.S. Huang, G.J. Yang, G.Y. Tang, A fast two-dimensional median filtering algorithm, IEEE Trans. Acoust. Speech Signal Process. 27 (1979) 13–18.
[16] E.R. Dougherty, J. Astola, An Introduction to Nonlinear Image Processing, SPIE, Bellingham, 1994.
[17] V. Goetcharian, From binary to gray tone image processing using fuzzy logic concepts, Pattern Recognition 12 (1980) 7–15.
About the Author: ANTONIOS GASTERATOS received the Diploma and the Ph.D. degrees from the Department of Electrical and Computer Engineering, Democritus University of Thrace, Greece, in 1994 and 1999, respectively. He is currently a post-doctoral fellow in the Laboratory for Integrated Advanced Robotics, Department of Communication, Computer and System Science, University of Genoa, Italy, with a TMR (EU) grant. His research interests include computer vision, fuzzy and non-linear image processing, digital VLSI design and computer architectures. He is a member of the Technical Chamber of Greece (TEE).
About the Author: IOANNIS ANDREADIS received the Diploma Degree from the Department of Electrical Engineering, DUTH, Greece, in 1983, and the M.Sc. and Ph.D. degrees from the University of Manchester Institute of Science and Technology (UMIST), UK, in 1985 and 1989, respectively. His research interests are mainly in machine vision and VLSI-based computing architectures for machine vision. He joined the Department of Electrical and Computer Engineering, DUTH, in 1992. He is a member of the Editorial Board of the Pattern Recognition Journal, the Technical Chamber of Greece (TEE) and the IEEE.
Pattern Recognition 33 (2000) 1023–1032
Analog implementation of erosion/dilation, median and order statistics filters
S. Vlassis a,*, K. Doris a, S. Siskos a, I. Pitas b
a Laboratory of Electronics, Department of Physics, Aristotle University of Thessaloniki, 54006 Thessaloniki, Greece
b Department of Informatics, Aristotle University of Thessaloniki, 54006 Thessaloniki, Greece
Received 12 December 1998; received in revised form 27 May 1999; accepted 23 June 1999
Abstract
In this work an analog implementation of nonlinear filters based on a current-mode sorting/selection network is presented. Three nonlinear filters are implemented: an erosion/dilation filter, a median filter and an order statistics filter. The circuits are designed using a new high-speed and very accurate current maximum and minimum selector. These filters could easily be incorporated into smart sensors as well as smart cameras. SPICE simulation results demonstrate the feasibility of simple analog filters using current-mode techniques.
© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Analog circuits; Signal processing; Nonlinear filters
1. Introduction
Nonlinear order statistics filters are a well-known filter class based on data ordering within the filter window [1,2]. They perform very well in impulsive noise removal, where linear filters fail. When properly designed in the form of L-filters, they can cope with different noise probability distributions [1]. It can easily be shown that the basic mathematical morphology operators, erosion and dilation of a function by a set, are special cases of order statistics filters. Furthermore, order statistics filters can be used as local signal dispersion estimators. As a result, this filter class has been used extensively in both signal and image processing.
Let $x_i$, $i = 1, \ldots, N$ be a sampled (digital or analog) signal, and let us use a filter window of odd size $n = 2l + 1$. The data samples $x_{i+j}$, $j = -l, \ldots, l$, within the filter window form the filter input. If we order them, we obtain the ordered samples
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$$
* Corresponding author. Tel.: +30-31-99-80-79; fax: +30-31-99-80-18. E-mail address:
[email protected] (S. Vlassis)
which are also called order statistics [3]. $x_{(1)}$ is the minimum sample and $x_{(n)}$ is the maximum sample within the filter window. The output $x_{(l+1)}$ is the local signal median. The local min/max operators correspond to the erosion/dilation of the signal $x_i$ by a set equal to the filter window. Sorting (ordering) and max/min selection are classical topics in computer science [4]. Several topologies, called sorting networks, have been proposed for data sorting. Such a network, used for sorting n = 5 signal samples, is shown in Fig. 1. It is called the odd/even transposition network. The vertical bars denote max–min selectors of the form
$$y_1 = \max\{x_1, x_2\}, \tag{1}$$
$$y_2 = \min\{x_1, x_2\}, \tag{2}$$
which form the basic building block of the network. A variety of such sorting or order selection networks can be found in Refs. [1,4]. A special case for median calculation of n = 9 signal samples is shown in Fig. 2. Again, the max/min selector is the basic building block. Finally, a min/max (erosion/dilation) network is shown in Fig. 3. In the following, we shall concentrate our efforts on proposing filter architectures that are suitable for order
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00162-4
S. Vlassis et al. / Pattern Recognition 33 (2000) 1023–1032
statistics filtering and that are easily implemented in a hybrid (analog/digital) way. The motivation for building these architectures is to construct fast, simple and affordable filters that operate directly on the analog signal and can easily be incorporated into smart sensors as well as into smart cameras. However, there are only a few publications on analog realizations [5–9] and three recent current-mode designs [10–12]. The proposed architectures are essentially suited to one-dimensional signal filtering (e.g. sound, ECG/EEG, measurements). However, similar architectures and their implementation can be used for two-dimensional signal (image) processing.
Fig. 1. Odd/even transposition network for n = 5. The vertical bars denote max/min selectors.
Fig. 2. Median filter structure for n = 9.
Fig. 3. Erosion/dilation filter structure for n = 8.
2. The current-mode min/max selector circuit
2.1. The block diagram of the min/max current selector
In nonlinear filtering based on order statistics, a basic operation is to compare signal samples and change their positions according to their relative rankings. This section explains the operation of a new current-mode min–max selector, which sorts its input currents in ascending order by determining $I_{max} = \max\{I_1, I_2\}$ and $I_{min} = \min\{I_1, I_2\}$ at its outputs. Fig. 4 shows the block diagram of the proposed min/max current selector. The input stage consists of two delay elements feeding the two PMOS input current mirrors with the currents $I_1$ and $I_2$. These currents are mirrored into the basic current maximum circuit [13] as well as into the swap circuit. The feedback circuit is used to correct the corner error of the current maximum circuit. Two high-speed current comparators, comp1 and comp2, are driven by the current maximum circuit. Their digital outputs $V_{o1}$ and $V_{o2}$ change the positions of the currents $I_1$ and $I_2$ at the swap circuit according to their relative rankings.
2.2. The delay elements
Since filtering of serial input data involves sampling the input periodically, a clock delay circuit is needed to sample the signal and to synchronize the data flow in the structure. In this work, a delay line based on the well-known switched-current (SI) technique, described in detail in Refs. [12,14], has been used.
2.3. The current maximum circuit
The two-input current-mode maximum circuit is shown in Fig. 5. Each maximum cell for an input current is formed by the transistor pairs M11–M12 and M21–M22. The current $I_b$ of the diode-connected transistor Mb is used as the current source of the circuit. Furthermore, this circuit is modified to have two outputs instead of one, as stated in Ref. [13]. In the equilibrium state only one of the output paths, M12 or M22, is conducting, and the drain current of Mb is an exact replica of the maximum input current. Suppose that $I_1 > I_2$. The drain voltages $V_{D1}$ and $V_{D2}$ of the transistors M11 and M21, respectively, are established by the input currents $I_1$ and $I_2$, and $V_{D1} > V_{D2}$, since $V_{D1}$ is established by the maximum input current. The transistors M11 and M21 are regarded as a differential transistor pair.
Fig. 4. Block diagram of the min–max selector.
At the steady state, when the voltage difference $V_{dif}$ between $V_{D1}$ and $V_{D2}$ satisfies the equation
$$V_{dif} = |V_{D1} - V_{D2}| \geq \sqrt{\frac{2 I_b}{\beta}}, \tag{3}$$
where $\beta = (\mu C_{ox}/2)\,W/L$, $\mu$ is the electron mobility, $C_{ox}$ is the gate oxide capacitance of the MOS transistor, W is the channel width and L is the channel length, the current flows through the transistor of the differential pair with the maximum input voltage. Thus, the output current $I_{o1}$ will be equal to $I_b$ and the current $I_{o2}$ will be zero. Transistors Mb and M11 then form a current mirror, and the current $I_b$ is equal to the maximum input current $I_1$. According to the above considerations, the output currents are $I_{o1} = I_1$ and $I_{o2} = 0$. Only one output current of the current maximum circuit is equal to the maximum input current (and different from zero), while the other output is zero.
The inherent problem associated with this implementation is the 'corner error' in the transition region of the current maximum circuit: when Eq. (3) is not satisfied, the two output currents do not have the correct values.
Fig. 5. Current maximum circuit.
2.4. High-speed current comparator
Fig. 6 shows a high-accuracy and high-speed current comparator [15]. This comparator can operate with low-voltage supplies, and its operation speed is not limited by the overlap capacitance of the MOS transistors (Miller effect). Furthermore, its operation is virtually insensitive to technological parameters. The circuit employs nonlinear negative feedback to obtain high accuracy (better than 1.5 pA) and high-speed operation (down to 10 ns).
2.5. The overall structure of the min/max selector with the corner error correction
The complete min/max selector circuit is shown in Fig. 7. The currents $I_1$ and $I_2$ are its inputs and the currents $I_{max}$ and $I_{min}$ are its outputs. The currents $I_{o1}$ and $I_{o2}$ are mirrored into the feedback circuit via the PMOS current mirrors Mi3–Mi4–Mi5 (i = 1, 2), as well as into the subtraction nodes A1 and A2. The transistors MC1 and MC2 are identical, so their drain currents are equal to
$$I_f = \frac{I_{o1} + I_{o2}}{2}. \tag{4}$$
The feedback current $I_f$ is mirrored to the subtraction nodes A1 and A2 via the current mirror MC1, M16 and M26. As a result, the currents at the input of the current
Fig. 6. High-speed current comparator and its symbol.
Fig. 7. Overall structure of the min–max selector circuit.
comparators, $I_{C1}$ and $I_{C2}$, are the differences between each output current and the feedback current $I_f$, and are given by
$$I_{Ci} = I_{oi} - I_f, \quad i = 1, 2. \tag{5}$$
At the beginning, we assume that at steady state $I_1 > I_2$, as shown in Fig. 8(a). Thus, the feedback current is $I_f = I_1/2$. Using Eqs. (4) and (5), we obtain
$$I_{Ci} = \begin{cases} I_1 - \dfrac{I_1}{2} = \dfrac{I_1}{2}, & i = 1, \\[4pt] 0 - \dfrac{I_1}{2} = -\dfrac{I_1}{2}, & i = 2, \end{cases} \tag{6}$$
for the input currents of the comparators.
It is clear from Eq. (8) that the output < is at logic o1 &one' within the transition region for the case I 'I 1 2 (Fig. 8(b)). Using the same considerations for the case that I 'I , the output voltages are #ipped, so output 2 1 < is at logic &one', corresponding to the new maximum o2 input current, while I is at logic &zero'. 1 2.6. Swap circuit The current swapping circuit employs NMOS di!erential switches to achieve fast settling time. The swapping operation is realized when the gates of M and M are S2 S3 driven to logic &one' via comparator comp2 and at the same time the gates of M and M are driven to logic S1 S4 &zero' via comp1. In the opposite case, no swapping operation will be performed.
Fig. 8. Input and output currents (8a), digital outputs (8b).
Thus, current I of comp1 which corresponds to the C1 maximum input current is positive, while I of current C2 comp2 is negative. Consequently, the digital voltage outputs of the comparators will be
G
&logic one', i"1, < " oi &logic zero', i"2,
(7)
as shown in Fig. 8(b). This means that only the digital output < corresponding to the maximum input current oi (I in our case) is at logic one, while the other one goes to 1 logic zero. At the transition region where the two inputs are very close to each other, Eq. (3) is not satis"ed. The corner error is associated with the linear operation of M , M 11 21 within the transition region as shown in Fig. 8(a). At the crossing point, where the input currents are equal I "I , the transistors M , M have the same drain1 2 11 21 source voltage < "< because their gate-source voltD1 D2 ages are always the same. Consequently, M and M , 12 22 which are saturated, produce the same drain currents I "I "I . Thus, the input currents are o1 o2 o crossed at the same point with the output currents. At this crossing point, I is also equal to I since f o I "(I #I )/2"I "(I #I )/2"I (Fig. 8(a)). f o1 o2 f o o o Due to feedback circuit every time the two output currents I and I , which correspond to the old maxo1 o2 imum and the new maximum input current, are compared with the feedback current I . The output of the f circuit during the transition region is
G
&one' for I 'I 'I o1 f o2 . < " oi &zero' for I (I (I o1 f o2
(8)
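The decision logic of Eqs. (4), (5), (7) and (8), followed by the swap stage, can be traced with a short behavioural sketch. It is an idealised model rather than a circuit simulation. Currents are plain floats, the comparators are ideal sign detectors, and the steady-state branch currents ($I_{o1} = I_1$, $I_{o2} = 0$ when $I_1 > I_2$) are taken from Eq. (6). The routing convention of the swap stage (unswapped means $I_1$ goes to the $I_{max}$ output) and all function names are assumptions.

```python
"""Idealised behavioural sketch of the min/max selector logic, Eqs. (4)-(8).

Assumptions: ideal comparators (logic 'one' when the input current is positive),
no mismatch, currents as floats in amperes, and a swap stage that routes I_1 to
the I_max output unless it is told to swap.
"""


def feedback_current(i_o1: float, i_o2: float) -> float:
    """Eq. (4): identical transistors M_C1, M_C2 share I_o1 + I_o2 equally."""
    return (i_o1 + i_o2) / 2.0


def comparator_outputs(i_o1: float, i_o2: float) -> tuple[int, int]:
    """Eqs. (5), (7), (8): V_oi is 1 when I_Ci = I_oi - I_f is positive."""
    i_f = feedback_current(i_o1, i_o2)
    i_c1 = i_o1 - i_f   # Eq. (5), i = 1
    i_c2 = i_o2 - i_f   # Eq. (5), i = 2 (always equal to -i_c1)
    return int(i_c1 > 0), int(i_c2 > 0)


def swap_circuit(i_1: float, i_2: float, v_o1: int, v_o2: int) -> tuple[float, float]:
    """Return (I_max, I_min); swap when comp2 says 'one' and comp1 says 'zero'."""
    if v_o2 == 1 and v_o1 == 0:
        return i_2, i_1
    return i_1, i_2


if __name__ == "__main__":
    # Steady state of Eq. (6) with the roles of the inputs exchanged:
    # I_2 = 20 uA > I_1 = 5 uA, so I_o1 = 0 and I_o2 = I_2.
    v = comparator_outputs(0.0, 20e-6)
    print(v, swap_circuit(5e-6, 20e-6, *v))
```

At the crossing point of Eq. (8), $I_{o1} = I_{o2}$, both comparator inputs are zero and no swap is requested. This is harmless, because the two outputs are then equal anyway.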
3. Effect of component mismatches

In the preceding analysis, the basic operation of the filters has been described while neglecting effects such as component mismatches. Mismatches in the transistor aspect ratio, oxide thickness and mobility give rise to mismatches in the transconductance parameter $\beta$ and in the threshold voltage $V_T$ of the MOS transistors. These mismatches affect the total accuracy of the proposed implementation in two ways. First, they limit the resolution of the min–max selector circuit. Second, they affect the accuracy of the current mirrors that are used in every min–max selector circuit to convey the input current to the final filter output. The mismatches between the transistors in $\beta$ and in $V_T$ are denoted $\Delta\beta$ and $\Delta V_T$, respectively. By expanding the well-known equations of MOS transistors operating in the linear and in the saturation region around the nominal bias point, the drain current mismatch $\Delta I_D/I_D$ in the linear region is given by [16]

$$\frac{\Delta I_D}{I_D} = -\frac{\Delta V_T}{V_{GS} - V_T} + \frac{\Delta V_{GS}}{V_{GS} - V_T} + \frac{\Delta\beta}{\beta} + \frac{\Delta V_{DS}}{V_{DS}}, \tag{9}$$

and in the saturation region by

$$\frac{\Delta I_D}{I_D} = -\frac{2\Delta V_T}{V_{GS} - V_T} + \frac{2\Delta V_{GS}}{V_{GS} - V_T} + \frac{\Delta\beta}{\beta}. \tag{10}$$
We use the above equations to calculate the effect of mismatches in the min–max selector circuit. It is assumed that, in the transition region, the transistors $M_{11}$, $M_{21}$ operate in the linear region and $M_{12}$, $M_{22}$ in the saturation region. Then, the output current mismatch $\Delta I_o/I_o$ of the min–max selector circuit (Fig. 5) is given by

$$\frac{\Delta I_o}{I_o} = \Delta V_T\left(-\frac{2\beta}{g_{mi2}} + \frac{V_{DS}\,\beta^2}{g_{mi2}\,g_{mi1}}\right) + \frac{\Delta\beta}{\beta}\left(1 - \frac{V_{DS}\,\beta}{g_{mi2}}\right) + \frac{V_{DS}\,\beta}{g_{mi2}}\,\frac{\Delta I}{I}, \tag{11}$$

where $\beta = \beta_{i1} = \beta_{i2}$, $g_{mi1} = \beta(V_{GSi1} - V_T)$, $g_{mi2} = \beta(V_{GSi2} - V_T)$, $I_o = I_{o1} = I_{o2}$, $V_{DS} = V_{D1} - V_{SS} = V_{D2} - V_{SS}$, $I = I_1 = I_2$ and $i = 1, 2$. The drain current mismatch of each current mirror, $\Delta I_M/I_M$, is given by

$$\left.\frac{\Delta I_D}{I_D}\right|_M = -\frac{2\Delta V_T}{(V_{GS} - V_T)_M} + \frac{\Delta\beta}{\beta}, \tag{12}$$

where $\beta$ is the transconductance parameter and $(V_{GS})_M$ is the gate-source voltage of the transistors that form the current mirrors. These current-mirror mismatches can be minimized by careful design of the transistors that form the current mirrors of each min–max selector, separately.
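The first-order expressions of Eqs. (9)–(12) are straightforward to evaluate numerically, and two consistency checks follow from them. Eq. (10) agrees with the long-channel square law $I_D = (\beta/2)(V_{GS} - V_T)^2$ (assumed here as the reference model) for small mismatches. Eq. (11) with $V_{DS} = 0$ collapses to Eq. (12). The sketch below reads Eq. (12) as Eq. (10) with $\Delta V_{GS} = 0$, because the two transistors of a simple mirror share gate and source. All bias values are hypothetical, not AMS process data.

```python
"""First-order mismatch expressions of Eqs. (9)-(12), written as plain functions.

Voltages are in volts, beta in A/V^2, and every *_rel argument is a relative
mismatch (e.g. dbeta_rel = delta_beta / beta). Bias values below are hypothetical.
"""


def mismatch_linear(dvt, dvgs, dbeta_rel, dvds, vgs, vt, vds):
    """Eq. (9): dI_D/I_D of a transistor in the linear region."""
    vov = vgs - vt  # overdrive V_GS - V_T
    return -dvt / vov + dvgs / vov + dbeta_rel + dvds / vds


def mismatch_saturation(dvt, dvgs, dbeta_rel, vgs, vt):
    """Eq. (10): dI_D/I_D of a transistor in saturation."""
    vov = vgs - vt
    return -2.0 * dvt / vov + 2.0 * dvgs / vov + dbeta_rel


def mirror_mismatch(dvt, dbeta_rel, vgs, vt):
    """Eq. (12): mirror transistors share V_GS, i.e. Eq. (10) with dV_GS = 0."""
    return mismatch_saturation(dvt, 0.0, dbeta_rel, vgs, vt)


def selector_output_mismatch(dvt, dbeta_rel, di_rel, beta, vgs1, vgs2, vt, vds):
    """Eq. (11): dI_o/I_o of the min-max selector in the transition region."""
    g_m1 = beta * (vgs1 - vt)  # g_mi1 (M_i1, linear region)
    g_m2 = beta * (vgs2 - vt)  # g_mi2 (M_i2, saturation)
    return (dvt * (-2.0 * beta / g_m2 + vds * beta ** 2 / (g_m2 * g_m1))
            + dbeta_rel * (1.0 - vds * beta / g_m2)
            + (vds * beta / g_m2) * di_rel)


def square_law_relative_change(dvt, dvgs, dbeta_rel, vgs, vt):
    """Exact change of I_D = (beta/2)(V_GS - V_T)^2, to sanity-check Eq. (10)."""
    return (1.0 + dbeta_rel) * ((vgs + dvgs - vt - dvt) / (vgs - vt)) ** 2 - 1.0


if __name__ == "__main__":
    # Hypothetical bias point and mismatches: V_GS = 1.5 V, V_T = 0.8 V,
    # 1 mV threshold offset, 0.5 % beta mismatch.
    print(mismatch_saturation(1e-3, 0.0, 0.005, 1.5, 0.8))
    print(square_law_relative_change(1e-3, 0.0, 0.005, 1.5, 0.8))
    print(selector_output_mismatch(1e-3, 0.005, 0.0, 100e-6, 1.2, 1.5, 0.8, 0.05))
```

The two printed saturation values differ only in the fifth decimal place, which is the size of the second-order terms that the expansion drops.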
4. Simulation results

To verify the performance of the proposed circuits, SPICE simulations were performed on the netlist extracted from their layout, using Cadence design tools with AMS 1.2 µm CMOS process parameters. The supply voltage was ±2.5 V. The dimensions of the transistors of the high-speed comparator were modified appropriately for the AMS technology parameters. The dimensions of the transistors of the min–max selector are $(W/L)_{Mi1,\,Mi2,\,MB} = 12/1.2$, $(W/L)_{Mi3,\,Mi4,\,Mi5,\,MB} = 25/5$ and $(W/L)_{Mc1,\,Mc2,\,Mi6} = 12/5$. To improve the accuracy, minimize systematic errors and increase the output resistance of the current mirrors, low-voltage high-swing current mirrors have been used [17]. It should be noted that the effect on circuit performance of spikes caused by switch feedthrough has been minimized with dummy switches, which were used in the simulations of the switched-current delay lines.

Table 1
Monte Carlo simulated precision (error, %) for the proposed min–max selector, order statistics, median and erosion/dilation filters

Input current (µA)   Min–max selector   Erosion/dilation   Median   Order statistics
1                    0.043              0.38               1.17     0.5
5                    0.044              0.39               1.18     0.52
10                   0.029              0.26               0.8      0.35
15                   0.022              0.23               0.73     0.32
20                   0.025              0.32               1.0      0.43
25                   0.032              0.43               1.1      0.58
30                   0.039              0.56               1.12     0.75
Fig. 9. Input and output of the erosion/dilation filter with window size equal to 8.

Fig. 10. (a) Input and (b) output of the median filter for n = 9.
Fig. 11. (a) Input currents and (b) output currents of the order statistics filter for n = 5.

Table 1 shows the worst-case precision as a function of the input current level, obtained from Monte Carlo simulations using the statistical parameters reported by Pelgrom et al. [18]. In the case of the min–max selector, one of its input currents was held constant and the second one was swept around this value, measuring the point at which the transition occurred. The error caused by component mismatches is very small, since this circuit has a symmetrical structure due to the feedback circuit, and the error due to the PMOS current mirrors $M_{13,14,15}$, $M_{23,24,25}$ is partially cancelled by the error due to the NMOS current mirrors $M_{C1,C2}$–$M_{16,17}$. In the case of the three filters shown in Figs. 1–3, Monte Carlo simulations show that the accuracy degrades as the number of comparisons increases. However, even for the median filter with window size n = 9 the error is about 1%. The errors were computed using the following equation:

$$\text{error} = \frac{|I_{in} - I_{mean}| + I_{sigma}}{\text{range}} \times 100, \tag{13}$$

where $I_{mean}$ and $I_{sigma}$ are the mean value and the standard deviation of the current, respectively, and range = 29 µA.

Finally, to ensure accurate operation of the proposed filters, special care must be taken during the layout design. The transistors $M_{12}$ and $M_{22}$ (Fig. 5) were designed using common-centroid geometry, while the current mirror transistors were split and interdigitated to minimize technological parameter errors.
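The error metric of Eq. (13) combines the offset of the mean from the ideal current with one standard deviation, normalised to the 29 µA range. A minimal sketch follows. The paper does not say which standard-deviation estimator was used, so the sample estimator (`statistics.stdev`) is an assumption, and the Monte Carlo samples in the usage line are hypothetical.

```python
"""Worst-case error metric of Eq. (13) applied to Monte Carlo samples."""
import statistics


def monte_carlo_error(i_in, samples, current_range=29e-6):
    """Return (|I_in - I_mean| + I_sigma) / range * 100, in percent.

    I_sigma is the sample standard deviation (an assumption: the estimator is
    not stated in the text). The default range of 29 uA is the quoted value.
    """
    i_mean = statistics.mean(samples)
    i_sigma = statistics.stdev(samples)   # needs at least two samples
    return (abs(i_in - i_mean) + i_sigma) / current_range * 100.0


if __name__ == "__main__":
    # Hypothetical Monte Carlo outputs (amperes) for a 10 uA input.
    runs = [10.02e-6, 9.97e-6, 10.05e-6, 9.99e-6, 10.01e-6]
    print(f"error = {monte_carlo_error(10e-6, runs):.3f} %")
```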
The two-input min–max selector circuit has a 10 ns response and 10 nA resolution. Fig. 9 shows the results for the erosion/dilation (max/min) filter with window size equal to 8. The sampling period is 100 ns. The circuit requires eight sampling periods for initialization. After eight periods the output currents $I_{max}$ and $I_{min}$ take the maximum and minimum value, respectively, of each window. The values of the local maximum and minimum input currents remain constant for eight periods. The error of the max/min filter is limited only by the accuracy of the current mirrors and is less than 0.6%. The total delay time is measured to be less than 30 ns.

Fig. 10 shows the results for a median filter with window size equal to 9. The input signal is a waveform corrupted with impulsive noise. The sampling period is 200 ns. The 1 MHz ripples and impulsive ringings are removed, while the characteristic shape of the waveform is retained. The error of the median filter is about 1%. The measured delay time of the median filter is approximately 2 µs, mainly because of the window size of 9.

Fig. 11(a) shows the input currents of the order statistics filter with window size 5. The filter has four constant input currents, $I_1 = 20$ µA, $I_2 = 15$ µA, $I_3 = 10$ µA and $I_4 = 5$ µA, and the fifth input $I_5$ is a triangular signal with 25 µA amplitude and 20 µs period. The outputs of the filter are shown in Fig. 11(b). The error is below 0.8% and the delay time is less than 40 ns. All the presented filters can also operate with a supply voltage as low as ±1.5 V, in order to minimize power consumption and meet the supply voltage requirements of portable devices.
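The three filters of Figs. 9–11 can be mimicked at the behavioural level by treating each min/max selector as a compare-exchange cell. This is an idealised sketch, not a model of the circuit. Odd-even transposition sorting is used as an illustrative sorting network, and the network of Figs. 1–3 may be arranged differently. A Python list slice stands in for the switched-current delay line, and all names are hypothetical.

```python
"""Behavioural sketch of the erosion/dilation, median and order statistics filters.

Assumptions: each min/max selector is an ideal compare-exchange cell, the
sorting network is odd-even transposition sort (an illustrative choice), and
a list slice stands in for the switched-current delay line.
"""


def compare_exchange(a, b):
    """One two-input min/max selector: returns (min, max)."""
    return (a, b) if a <= b else (b, a)


def sorting_network(values):
    """Odd-even transposition sort: n rounds of neighbour compare-exchanges.

    Output k is the k-th order statistic (k = 0 is the minimum).
    """
    v = list(values)
    n = len(v)
    for rnd in range(n):
        for k in range(rnd % 2, n - 1, 2):
            v[k], v[k + 1] = compare_exchange(v[k], v[k + 1])
    return v


def window_filter(signal, n, rank):
    """Output the rank-th order statistic of each n-sample window.

    The first output needs n samples, like the n sampling periods the
    circuit needs for initialisation.
    """
    return [sorting_network(signal[k - n + 1:k + 1])[rank]
            for k in range(n - 1, len(signal))]


def dilation(signal, n):
    """Running maximum over n samples (flat structuring element)."""
    return window_filter(signal, n, n - 1)


def erosion(signal, n):
    """Running minimum over n samples (flat structuring element)."""
    return window_filter(signal, n, 0)


def median(signal, n):
    """Running median; n is assumed odd, as in the n = 9 filter of Fig. 10."""
    return window_filter(signal, n, (n - 1) // 2)


if __name__ == "__main__":
    noisy = [0, 0, 10, 0, 0, 1, 1, 1]   # hypothetical samples with one impulse
    print(median(noisy, 3), erosion(noisy, 3), dilation(noisy, 3))
    print(sorting_network([20, 15, 10, 5, 12]))
```

For the order statistics filter of Fig. 11, `sorting_network([20, 15, 10, 5, 12])` returns `[5, 10, 12, 15, 20]`. These are all five order statistics of the four constant inputs together with one sample (here 12 µA) of the triangular input.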
5. Conclusion

Three analog implementations of nonlinear filters (an erosion/dilation, a median and an order statistics filter) have been presented. The circuits are designed using switched-current delay lines and a current-mode sorting/selection network. The comparison block is a new two-input analog current-mode min/max selector circuit, offering higher speed and accuracy than conventional ones. Since this is the basic building block of all three filter implementations, the output error of the filters is about 1% or less, depending on the number of comparisons, as demonstrated by Monte Carlo analysis. These filters could easily be incorporated into smart sensors as well as smart cameras.
References

[1] I. Pitas, A.N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[2] E.R. Dougherty, J. Astola, An Introduction to Nonlinear Image Processing, SPIE Press, Washington, 1994.
[3] H.A. David, Order Statistics, Wiley, New York, 1980.
[4] D.E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, New York, 1973.
[5] J.S.J. Lin, W.H. Holmes, Analog implementation of median filters for real-time signal processing, IEEE Trans. Circuits Systems 35 (1988) 1032–1033.
[6] T. Jarske, O. Vainio, A review of median filter systems for analog signal processing, Analog Integr. Circuits Signal Process. 3 (1993) 127–135.
[7] S. Paul, K. Huper, Analog rank filtering, IEEE Trans. Circuits Systems-I: Fundam. Theory Appl. 40 (1993) 469–475.
[8] K. Urahama, T. Nagao, Direct analog rank filtering, IEEE Trans. Circuits Systems-I: Fundam. Theory Appl. 42 (1995) 385–388.
[9] C.K. Tse, K.C. Chun, Design of a switched-current median filter, IEEE Trans. Circuits Systems-II: Analog Digital Signal Process. 42 (1995) 356–359.
[10] V.V.B. Rao, D.C. Kar, A new analog voltage sorter, IEEE Trans. Instrum. Measur. 41 (1992) 714–716.
[11] J.E. Opris, G.T.A. Kovacs, Analogue median circuit, Electron. Lett. 30 (1994) 1369–1370.
[12] S. Siskos, S. Vlassis, I. Pitas, Analog implementation of min/max filtering, IEEE Trans. Circuits Systems-II 45 (1998) 913–917.
[13] C.-Y. Huang, B.-D. Liu, Current-mode multiple input maximum circuit for fuzzy logic controllers, Electron. Lett. 30 (1994) 1924–1925.
[14] B. Jonsson, S. Eriksson, New clock-feedthrough compensation scheme for switched-current circuits, Electron. Lett. 29 (1993) 1446–1447.
[15] G. Linan-Cembrano, R. Del Rio-Fernandez, R. Dominguez-Castro, A. Rodriguez-Vazquez, Robust high-accuracy high speed continuous-time CMOS current comparator, Electron. Lett. 33 (1997) 2082–2084.
[16] C.J. Abel, C. Michael, M. Ismail, C.S. Teng, R. Lahri, Characterization of transistor mismatch for statistical CAD submicron CMOS analog circuits, ISCAS'93, Chicago, 1993, pp. 1401–1404.
[17] P.J. Crawley, G.W. Roberts, High-swing MOS current mirror with arbitrarily high output resistance, Electron. Lett. 28 (1992) 361–363.
[18] M.J.M. Pelgrom, A.C.J. Duinmaijer, A.P.G. Welbers, Matching properties of MOS transistors, IEEE J. Solid-State Circuits 24 (1989) 1433–1440.
About the Author: SPIRIDON VLASSIS was born in 1971. He received the B.Sc. degree in Physics from the Aristotle University of Thessaloniki, Greece, and the M.Sc. degree in Electronic Physics from the same University in 1994 and 1996, respectively. He is currently working toward the Ph.D. degree in analog signal processing circuits. His research interests are analog integrated circuit design, current-mode integrated circuit design, sensor interfacing integrated circuits and design of signal processing circuits.
About the Author: KOSTANTINOS DORIS was born in 1973. He received the B.Sc. degree in Physics from the Aristotle University of Thessaloniki, Greece, and the M.Sc. degree in Electronic Physics from the same University in 1996 and 1998, respectively. He is currently working towards the Ph.D. degree at Eindhoven University of Technology, The Netherlands. His research interests are in the fields of analog integrated circuit design, mixed analog/digital design and fast D/A and A/D conversion.

About the Author: STILIANOS SISKOS was born in 1956. He received the B.Sc. degree in Physics from the Aristotle University of Thessaloniki, Greece, in 1980 and the M.Sc. and Ph.D. degrees in Electronics from the University of Paul Sabatier de Toulouse, France, in 1983. He was a lecturer at the School of Technology of Thessaloniki from 1985 to 1989. He joined the Electronics Laboratory, Physics Dept., Aristotle University of Thessaloniki, in 1989 as a Lecturer and, since 1993, he has been an Assistant Professor in the same laboratory. His current research interests are analog integrated circuit design, mixed built-in signal structures, current-mode integrated circuit design, sensor interfacing integrated circuits and design of signal processing circuits. He is a member of the IEEE.

About the Author: IOANNIS PITAS received the Diploma in Electrical Engineering in 1980 and the Ph.D. degree in Electrical Engineering in 1985, both from the University of Thessaloniki, Greece. Since 1994, he has been a Professor at the Department of Informatics, University of Thessaloniki. From 1980 to 1993 he served as Scientific Assistant, Lecturer, Assistant Professor, and Associate Professor in the Department of Electrical and Computer Engineering at the same University. He served as a Visiting Research Associate at the University of Toronto, Canada, the University of Erlangen-Nuernberg, Germany, and Tampere University of Technology, Finland, and as Visiting Assistant Professor at the University of Toronto. He was a lecturer in short courses for continuing education. His current interests are in the areas of digital image processing, multidimensional signal processing and computer vision. He has published over 250 papers and contributed 8 books in his area of interest. He is the co-author of the book "Nonlinear Digital Filters: Principles and Applications" (Kluwer, 1990) and author of "Digital Image Processing Algorithms" (Prentice-Hall, 1993). He is the editor of the book "Parallel Algorithms and Architectures for Digital Image Processing, Computer Vision and Neural Networks" (Wiley, 1993). Dr. Pitas has been a member of the European Community ESPRIT Parallel Action Committee. He has also been an invited speaker and/or member of the program committee of several scientific conferences and workshops. He was Associate Editor of the IEEE Transactions on Circuits and Systems and co-editor of Multidimensional Systems and Signal Processing, and he is currently an Associate Editor of the IEEE Transactions on Neural Networks. He was chair of the 1995 IEEE Workshop on Nonlinear Signal and Image Processing (NSIP95).
Pattern Recognition 33 (2000) 1033–1045
Granolds: a novel texture representation

Damian G. Jones*, Paul T. Jackway

The CRC for Sensor Signal and Information Processing, Department of Computer Science and Electrical Engineering, University of Queensland, Brisbane 4072, Australia

Received 15 December 1998; received in revised form 2 May 1999; accepted 23 June 1999
Abstract

In this paper we describe a new image texture representation. Using two different parameterised monotonic mappings, this new technique transforms the input image into a function on two dimensions that may be regarded as a surface, which we have named the "granold". The nature of this surface is such that corners appear at positions where there are simultaneously large changes in the response of the monotonic mappings to the input image. The shape and position of these corners are then analysed to provide information about the texture of the input image. The use of grey-level thresholds and morphological granulometries in the formation of the granold is presented. These mappings are then generalised, and a mathematical analysis of the new technique, including constraints on the parameterised monotonic mappings, is presented. A number of experiments that use grey-level thresholds and morphological granulometries to test and demonstrate the fundamental behaviour of the new technique are described. The results from these experiments are presented and discussed, including a qualitative discussion of the behaviour of the new technique in response to changes in the input image. The conclusion is that the new technique performs in an expected, repeatable and predictable manner, in accordance with its mathematical analysis.

© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Image analysis; Image texture representation; Morphological granulometries; Grey-level thresholds
1. Introduction

Image texture in general presents a problem for analysis; this may in part be because "texture" is such a catch-all word: we can talk of natural or synthetic textures; regular, quasi-regular, or irregular; oriented or isotropic; consisting of one dominant scale, or many, or fractal-like; binary-valued, grey-scale, coloured, or multiband. A texture has many properties that may be useful in its representation, modelling, synthesis, classification or segmentation: spatial regularities lead to peaks in the spatial frequency domain, and a spectral representation [1]; self-similarities lead to fractal-dimension computations [2]; statistical regularities lead to random field models [3] or co-occurrence models [4]; edges lead to edge element descriptions [5]; spatial shapes and sub-shapes lead to a geometric approach [6], and so on.

There are two main approaches to image texture analysis, one structural and the other statistical. These are addressed extensively in many works including, for example, Refs. [7,8]. The statistical approach involves the global analysis and characterisation of the texture. Structural techniques, on the other hand, attempt to unravel the details of the texture description by examining the properties of the individual texture elements (textels) and the rules that govern how they are placed with respect to each other to form the texture.

For texture synthesis and modelling we need to estimate (extract) parameters for the model, perhaps from training textures [9]; for classification and segmentation we need to extract features which will be discriminatory for the class of textures under consideration [10]. In this paper, however, we are not concerned so much with the extraction of features from texture, but rather with the introduction of a new class of texture representations called the "granold" and the establishment of some of the fundamental properties of this new class.

The outline of this paper is as follows. In Section 2 we provide a qualitative description of the granold using grey-level thresholding and morphological granulometries as the mappings involved. In Section 3 we proceed to a more rigorous and general description of granold theory. Section 4 describes the results of some experiments that demonstrate and test the fundamental properties of the granold. Following a brief discussion of these results and related work in Section 5, the paper ends in Section 6 with some concluding remarks and indications of areas of further research.

* Corresponding author. Tel.: +61-7-3365-3812; fax: +61-7-3365-3684. E-mail address: [email protected] (D.G. Jones)

0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00163-6

D.G. Jones, P.T. Jackway / Pattern Recognition 33 (2000) 1033–1045
2. What is a Granold?

The proposed texture representation seeks to describe the texture elements, and it does this in terms of their fundamental properties: size and shade. In Haralick's terms [7] this method is concerned with characterising the "tonal primitives" and not the spatial organisation of the tonal primitives. The technique uses two parameterised monotonic mappings to transform an input image into a two-and-a-half-dimensional surface for the purpose of texture analysis. Other approaches to texture analysis do not clearly separate the spatial and the grey-scale dimensions of texture, but rather mix or confound them together. The technique presented here transforms the texture into a new analysis space, where grey-scale-related properties are set along one dimension and spatial properties are set along an orthogonal dimension.

The basic idea is to represent the input texture as a stack of binary images. We may do this through an operation such as grey-level thresholding [11]. Then, for each binary image in the stack, we perform a morphological granulometry. Granulometries, as introduced by Matheron [12], are set-valued (binary image) mappings that satisfy axioms chosen in such a way as to exactly generalise the physical meaning of a size distribution (or "sieving" [13]). The granulometry encompasses two steps: first a morphological opening by successively larger structuring elements, followed in each case by the measurement of the image residual. The area of the binary image is the most obvious such measurement but, as we discuss later, other types of set measurements are possible. If we plot this resulting measurement against the threshold level on one axis and the size of opening on the other axis, we obtain a surface which decreases monotonically to zero in both directions. We have named this surface the "granold surface" (Granulometry-of-Thresholds). For threshold ranges which do not change the binary images, and for opening size ranges which do not alter the granulometric analyses, the granold surface will be flat; conversely, the granold surface will be steep in both directions (i.e. a corner will appear) wherever a small change in both the threshold value and the opening size causes a large change in the resulting output measurement. By detecting this corner we can establish the characteristic shades and sizes of the texture elements in the image. The whole idea of the granold concept is to produce "spectral-like" response peaks that indicate particular shade and size components of the input image. Obviously, other image mappings besides the threshold and morphological opening may be used under the same basic framework, and a full mathematical generalisation is presented in the next section. However, the granulometry of thresholds seems to be the most obvious implementation of the idea and will be used throughout this paper in the examples and experiments.

The first step in forming the granold of a grey-scale image is to threshold the image at all grey levels. Typically, in an 8-bit grey-scale image this would produce 255 binary images. Each binary image obtained by this thresholding operation is opened with a set of convex, compact, smooth structuring elements. The choice of structuring element shape should be related to the texture itself: for isotropic textures we have used both digital squares and digital disks with similar results, and for highly directional textures we may use appropriately oriented line segments. Fig. 1 shows a sample of the images produced by this method. Part (a) is the original 50×50 pixel grey-level image. Part (b) shows the result of thresholding the image at grey-level 172. Part (c) shows the result of opening this threshold with a disc-shaped structuring element of radius 1. Part (d) shows the result of opening (c) with a disc-shaped structuring element of radius 2. Part (e) shows the result of thresholding the original image at grey-level 209. Part (f) shows the result of opening this threshold with a disc-shaped structuring element of radius 1.
Part (g) shows the result of opening (f) with a disc-shaped structuring element of radius 2. After each opening the normalised residual area of the binary image is calculated. This forms a granulometry of each binary image. The normalised residual area of each opening for each threshold is entered into a two-dimensional array of grey-level versus size of the structuring element. This forms the three-dimensional granold surface. The nature of this surface is such that corners appear in the surface that indicate a relationship between the size and the grey-level of objects in the original image. Following the above informal discussion and example we now seek to formalise the approach in a theoretical framework.
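The construction just described can be sketched in a few lines of pure Python. This is an illustrative sketch, not the authors' implementation. It assumes a small image given as a list of rows of non-negative integers, and thresholds that keep pixels at or above the level. It opens with digital squares of side 2r + 1 instead of discs (the text reports similar results for both), treats pixels outside the image as background, and normalises by the image area. The names `threshold`, `opening` and `granold` are hypothetical.

```python
"""Sketch of a granold surface: grey-level thresholds followed by openings.

Illustrative assumptions (not the authors' code): images are lists of rows of
non-negative integers, threshold i keeps pixels with value >= i, the opening at
radius r uses a (2r+1) x (2r+1) digital square, pixels outside the image count
as background, and the measure is the residual area divided by the image area.
"""


def threshold(image, level):
    """Binary image of the pixels whose grey level is at least `level`."""
    return [[1 if v >= level else 0 for v in row] for row in image]


def erode(binary, r):
    """Keep (y, x) only if the whole square of radius r around it is object."""
    h, w = len(binary), len(binary[0])

    def fits(y, x):
        return all(0 <= y + dy < h and 0 <= x + dx < w and binary[y + dy][x + dx]
                   for dy in range(-r, r + 1) for dx in range(-r, r + 1))

    return [[1 if fits(y, x) else 0 for x in range(w)] for y in range(h)]


def dilate(binary, r):
    """Paint a square of radius r (clipped to the image) around each object pixel."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def opening(binary, r):
    """Morphological opening: erosion followed by dilation with the same square."""
    return dilate(erode(binary, r), r)


def granold(image, levels, radii):
    """G[a][b] = normalised residual area after threshold levels[a], radius radii[b]."""
    area = len(image) * len(image[0])
    return [[sum(map(sum, opening(threshold(image, t), r))) / area for r in radii]
            for t in levels]


if __name__ == "__main__":
    # Hypothetical 6 x 6 image: a 3 x 3 square of grey level 200 on a background of 50.
    img = [[200 if 1 <= y <= 3 and 1 <= x <= 3 else 50 for x in range(6)] for y in range(6)]
    for row in granold(img, levels=[0, 200, 201], radii=range(4)):
        print(row)
```

The text thresholds at every grey level (255 binary images for 8-bit data). The example samples only three levels to keep the output short. For the bright square, the surface stays at 0.25 up to radius 1 and drops to zero at radius 2, and it also drops to zero between levels 200 and 201. Together these two drops form the corner that marks the square's size and shade.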
Fig. 1. Example of images produced during the formation of a granold surface $G_X(i, j)$. (a) Original 50×50 pixel grey-level image. (b) Result of thresholding (a) at grey-level 172. (c) Result of opening (b) with a disc-shaped structuring element of radius 1. (d) Result of opening (c) with a disc-shaped structuring element of radius 2. (e) Result of thresholding (a) at grey-level 209. (f) Result of opening (e) with a disc-shaped structuring element of radius 1. (g) Result of opening (f) with a disc-shaped structuring element of radius 2.

3. Granold theory

In this section we outline the theory of granolds. Firstly, we introduce some necessary notation. We then define the mappings to be used in the generation of the granold, followed by the required properties of these mappings. Finally, we define the granold itself and examine its properties.

3.1. Notation

In order to describe the new technique it is necessary to define certain notation. In the following, "$\mathbb{R}^+$" denotes the non-negative real numbers (including zero), "$\mathbb{Z}$" the integers $\{\ldots, -1, 0, 1, 2, \ldots\}$ and "$\mathbb{N}$" the natural numbers $\{0, 1, 2, \ldots\}$. Firstly, we define

$$\mathcal{F} : \mathbb{Z}^n \to \mathbb{Z}^m. \tag{1}$$

Here, $\mathcal{F}$ represents the general class of $m$-integer-valued functions on $n$-dimensional integer space. We would typically use such a function to represent a 2-D grey-scale image, in which case we would write, for example, "Let $X \in \mathcal{F}$ be a 2-D grey-scale image …". Note that this statement implies that $n = 2$ and $m = 1$. Throughout this paper we keep the dimensions of the image representation general (at $n$ and $m$), so as not to unnecessarily limit the scope of the technique to 2-D images. The domain of the image $X \in \mathcal{F}$ will be denoted by $\mathrm{Dom}(X)$; it is the set of all points in $\mathbb{Z}^n$ for which $X$ is defined. Secondly, we define

$$\mathcal{B} = \mathcal{P}(\mathbb{Z}^n), \tag{2}$$

where $\mathcal{P}$ is the power-set operator, so $\mathcal{P}(\mathbb{Z}^n)$ is the set of all subsets of $\mathbb{Z}^n$. $\mathcal{B}$ represents the general class of binary images on $n$-dimensional integer space. We would typically use such notation to represent a 2-D binary image, where we would write "let $B \in \mathcal{B}$ be a 2-D binary image …", which implies that $n = 2$. We allow $n$-dimensional images so as not to unnecessarily limit the scope of the technique to 2-D images. The domain of the binary image $B \in \mathcal{B}$ will be denoted by $\mathrm{Dom}(B)$.

3.2. The mappings

Based on the image spaces just defined, we now introduce the mappings or operators of the granold approach to image texture analysis. Firstly, we define

$$\Phi : \mathcal{F} \to \mathcal{B}. \tag{3}$$
This function-to-set mapping takes a grey-scale image and returns a binary image. As an example, let $X \in \mathcal{F}$ be a grey-scale image, so we could write $B = \Phi(X)$. The resulting image $B \in \mathcal{B}$ is a binary image. The grey-level threshold operation is an example of this type of mapping. Secondly, we define

$$\Psi : \mathcal{B} \to \mathcal{B}. \tag{4}$$
This set-to-set mapping, which we will call an operator, takes a binary image and returns a binary image. As an example, let $B \in \mathcal{B}$ be a binary image, so we could write $A = \Psi(B)$. The resulting image $A \in \mathcal{B}$ is also a binary image. The morphological binary opening operation is an example of this type of mapping. We intend these mappings to preserve the domain or support of the images in all cases, that is, $\mathrm{Dom}(\Phi(X)) = \mathrm{Dom}(X)$ and $\mathrm{Dom}(\Psi(B)) = \mathrm{Dom}(B)$, and, for the composition of these mappings, $\mathrm{Dom}(\Psi(\Phi(X))) = \mathrm{Dom}(X)$. Lastly, we define the set measure

$$\mu : \mathcal{B} \to \mathbb{R}^+. \tag{5}$$
This set-to-scalar mapping, which we will call a measure, takes a binary image and returns a non-negative scalar value. The area (number of object pixels) of a binary image is an example of this type of mapping, so we could write, for example, area $= \mu(B)$. The next step is to parameterise the mappings $\Phi$ and $\Psi$ to give families of mappings and operators indexed on non-negative integer parameters $i, j \in \mathbb{N}$. Therefore we have the families

$$\{\Phi_i\} : \mathcal{F} \to \mathcal{B} \tag{6}$$

and

$$\{\Psi_j\} : \mathcal{B} \to \mathcal{B}. \tag{7}$$
3.3. Necessary properties of the mappings

Since we require certain order properties with respect to the indices $i$ and $j$, it is important for the granold technique that the above mappings possess the following additional mathematical properties.

3.3.1. Property 1: monotone decreasing for parameter

The families of mappings $\Phi_i$ and $\Psi_j$ must be monotone decreasing with respect to their indices $i$ and $j$. Also, to ensure that (in the limit) the mappings extract all the information from the images, there should exist values $M$ and $N$ (which may depend on the images) such that for $i \ge M$ and $j \ge N$ the appropriate mappings produce the empty set, $\emptyset$. In contrast, we should also ensure that the index-zero mappings have the least possible effect on the images, producing respectively the domain of the image or the image itself. In mathematical terms, for the mappings $\Phi_i$ and for any $X \in \mathcal{F}$,

$$i = 0 \;\Rightarrow\; \Phi_i(X) = \mathrm{Dom}(X), \tag{8}$$

$$0 < i < j < M \;\Rightarrow\; \Phi_j(X) \subseteq \Phi_i(X), \tag{9}$$

$$i \ge M \;\Rightarrow\; \Phi_i(X) = \emptyset. \tag{10}$$

Likewise, for the operators $\Psi_j$ and for any $A \in \mathcal{B}$,

$$j = 0 \;\Rightarrow\; \Psi_j(A) = A, \tag{11}$$

$$0 < j < k < N \;\Rightarrow\; \Psi_k(A) \subseteq \Psi_j(A), \tag{12}$$

$$j \ge N \;\Rightarrow\; \Psi_j(A) = \emptyset. \tag{13}$$
3.3.2. Property 2: monotone increasing for argument

The family of operators (set-to-set mappings) $\Psi_j$ needs some additional properties to ensure that the granold surface is well behaved. Firstly, $\Psi_j$ must be anti-extensive, that is, for all $j \in \mathbb{N}$,

$$\Psi_j(B) \subseteq B. \tag{14}$$

Secondly, the operators $\Psi_j$ must be monotone increasing with respect to their arguments, the binary images; that is, for binary images $A, B \in \mathcal{B}$ and a fixed integer $j \in \mathbb{N}$,

$$A \subseteq B \;\Rightarrow\; \Psi_j(A) \subseteq \Psi_j(B). \tag{15}$$
3.3.3. Property 3: measurement

Finally, to represent the usual sense of measuring something, the measure $\mu$ must be (i) zero if the object to be measured is empty,

$$\mu(\emptyset) = 0, \tag{16}$$

and (ii) increasing, that is,

$$A \subseteq B \;\Rightarrow\; \mu(A) \le \mu(B). \tag{17}$$
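Properties 1–3 can be checked by brute force for a concrete choice of mappings. To stay short, the sketch below works in one dimension ($n = 1$, which the notation allows). The choices are ours for illustration. The threshold is $\Phi_i(X) = \{p : X(p) \ge i\}$, $\Psi_j$ is an opening by a segment of $2j + 1$ points, and $\mu$ counts object points. Binary images are represented as Python frozensets of positions. The function tests each of Eqs. (8)–(17) on the nested images produced by thresholding one signal.

```python
"""Brute-force check of Properties 1-3 (Eqs. 8-17) in one dimension.

Illustrative choices (assumptions, n = 1): Phi_i is the threshold
{p : X[p] >= i}, Psi_j is an opening by a segment of 2j + 1 points, mu counts
points, and binary images are frozensets of positions inside Dom(X).
"""
import random


def phi(X, i):
    """Phi_i(X): positions whose grey level is at least i."""
    return frozenset(p for p, v in enumerate(X) if v >= i)


def psi(A, j):
    """Psi_j(A): union of all (2j + 1)-point segments contained in A."""
    centres = [p for p in A if all(q in A for q in range(p - j, p + j + 1))]
    return frozenset(q for p in centres for q in range(p - j, p + j + 1))


def mu(A):
    """mu(A): number of object points (the 'area')."""
    return len(A)


def check_properties(X):
    """Return True if Eqs. (8)-(17) hold for this non-negative signal X."""
    dom = frozenset(range(len(X)))
    M = max(X) + 1          # Phi_i(X) is empty for every i >= M
    N = len(X)              # no segment of 2N + 1 points fits in Dom(X)
    chain = [phi(X, i) for i in range(M + 1)]
    ok = chain[0] == dom and chain[M] == frozenset()                       # (8), (10)
    ok = ok and all(chain[j] <= chain[i]
                    for i in range(M) for j in range(i, M))                # (9)
    for A in chain:
        ok = ok and psi(A, 0) == A and psi(A, N) == frozenset()            # (11), (13)
        ok = ok and all(psi(A, k) <= psi(A, j)
                        for j in range(N) for k in range(j, N))            # (12)
        ok = ok and all(psi(A, j) <= A for j in range(N))                  # (14)
    for i in range(M):
        A, B = chain[i + 1], chain[i]                                       # A is a subset of B
        ok = ok and all(psi(A, j) <= psi(B, j) for j in range(N))          # (15)
        ok = ok and mu(A) <= mu(B)                                          # (17)
    return ok and mu(frozenset()) == 0                                      # (16)


if __name__ == "__main__":
    rng = random.Random(1)
    signal = [rng.randint(0, 7) for _ in range(12)]
    print(signal, check_properties(signal))
```

A check like this does not replace the proofs, but a failing case is a quick sign that a proposed pair of mappings cannot be used to build a granold.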
Note, set measures commonly possess stronger properties such as p-additivity [13, p.114] but this is not strictly necessary here. 3.4. The granold Now, we consider the composition of these mappings as follows: (d( ' ) : FPR`. j i
Fig. 2. Diagram showing the formation of the surface G (i, j)" X d( ' (X). i j
(18)
This function-to-scalar mapping takes a grey-scale image and returns a non-negative scalar that depends on the values of the indices i and j (and, of course, on the image itself). To show the dependence on i and j explicitly, we denote the granold function $G_X(i,j) : \mathbb{N}^2 \to \mathbb{R}^{+}$ of an image $X \in \mathcal{F}$ as

$$G_X(i,j) \overset{\text{def}}{=} d\big(\psi_j(\varphi_i(X))\big). \qquad (19)$$

Fig. 2 shows how $G_X(i,j)$, considered as a surface, is formed. The arrows indicate increasing values of i, j and $G_X(i,j)$.

3.5. Properties of the granold

The monotone properties (1)–(3) of the mappings used to construct the granold ensure the following monotone properties of the granold function itself.

Proposition 1 (Monotonic Property of the Granold). For each $i, j \in \mathbb{N}$ and for any image $X \in \mathcal{F}$,
(a) $G_X(i,j) \ge G_X(i+1,j) \ge G_X(i+1,j+1)$,
(b) $G_X(i,j) \ge G_X(i,j+1) \ge G_X(i+1,j+1)$.

Proof. For any image $X \in \mathcal{F}$, $\varphi_i(X) \supseteq \varphi_{i+1}(X)$ by Property 1 (9). Therefore $\psi_j\varphi_i(X) \supseteq \psi_j\varphi_{i+1}(X)$ by Property 2 (15), and hence $d(\psi_j\varphi_i(X)) \ge d(\psi_j\varphi_{i+1}(X))$ by Property 3 (17). Thus $G_X(i,j) = d(\psi_j\varphi_i(X)) \ge G_X(i+1,j) = d(\psi_j\varphi_{i+1}(X))$ and, in the same way, $G_X(i,j+1) = d(\psi_{j+1}\varphi_i(X)) \ge G_X(i+1,j+1) = d(\psi_{j+1}\varphi_{i+1}(X))$. This proves half the proposition.

For any image $X \in \mathcal{F}$, $\psi_j\varphi_i(X) \supseteq \psi_{j+1}\varphi_i(X)$ by Property 1 (12). Therefore $d(\psi_j\varphi_i(X)) \ge d(\psi_{j+1}\varphi_i(X))$ by Property 3 (17), that is, $G_X(i,j) \ge G_X(i,j+1)$ and, in the same way, $G_X(i+1,j) = d(\psi_j\varphi_{i+1}(X)) \ge G_X(i+1,j+1) = d(\psi_{j+1}\varphi_{i+1}(X))$. This completes the proof. □

From the above proposition it follows directly that the maximum value of the granold occurs at the origin, that is,

$$G_X(0,0) \ge G_X(i,j) \quad \text{for all } i, j \ge 0. \qquad (20)$$

Incidentally, from Property 1 (8) and Property 1 (11) we find the value of this maximum to be $G_X(0,0) = d(\psi_0\varphi_0(X)) = d(\mathrm{Dom}(X))$. The granold must drop to zero sufficiently far from the origin along either axis, as the following proposition states.

Proposition 2. For any image $X \in \mathcal{F}$, there exist numbers M and N such that $G_X(i,j) = 0$ for $i \ge M$ or $j \ge N$.
Proof. Since $G_X(i,j) = d(\psi_j\varphi_i(X))$, we need to look at the behaviour of $\varphi_i$ for large i. Property 1 (10) ensures that there is some number M for which $\varphi_i(X)$ is zero for $i > M$, no matter what X is. If $\varphi_i(X)$ is zero, then the anti-extensiveness of $\psi_j$, Property 2 (14), and the measure property of d, Property 3 (16), give $G_X(i,j) = d(\psi_j\varphi_i(X)) = 0$. Similarly, for large $j > N$, Property 1 (13) gives $\psi_j(\varphi_i(X)) = 0$ whatever the value of $\varphi_i(X)$, and the measure property of d, Property 3 (16), again gives $G_X(i,j) = 0$. This completes the proof of the proposition. □

Fig. 1 in Section 2 showed an example of the binary images produced during the computation of the granold. If the measure d is, for example, the number of black pixels in each image, then the monotonic decreasing property is easy to see by going across each row of images and down each column.

3.6. Corner detection

The monotone property of the granold suggests that the following first differences may be useful. These differences approximate the directional derivatives of the granold and are large wherever the granold falls rapidly in the corresponding direction:

$$\Delta^x_X(i,j) \overset{\text{def}}{=} G_X(i,j) - G_X(i+1,j), \qquad (21)$$

$$\Delta^y_X(i,j) \overset{\text{def}}{=} G_X(i,j) - G_X(i,j+1). \qquad (22)$$

The monotone property ensures that both these differences are always non-negative.
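A minimal C sketch of the granold definition and the first differences (21)–(22) makes the construction concrete. It assumes, for illustration only, that $\varphi_i$ keeps the pixels with grey level $\ge i$, that $\psi_j$ is a binary opening by a $(2j+1)\times(2j+1)$ square (the experiments in Section 4 use discs instead), and that d counts foreground pixels. The toy image size and all function names are hypothetical. The resulting table should satisfy Proposition 1 and Proposition 2, and it should give $G_X(0,0) = d(\mathrm{Dom}(X))$.

```c
#include <string.h>

/* Toy sizes, chosen only for this sketch (hypothetical, not from the paper). */
#define W 32
#define H 32
#define NI 257 /* thresholds i = 0..256 for 8-bit images; phi_256(X) is empty */
#define NJ 17  /* size indices j = 0..16; psi_j opens by a (2j+1)-square */

/* phi_i (assumed): threshold mapping, foreground = pixels with grey level >= i.
   phi_0(X) is the whole image domain. */
static void phi(const unsigned char *img, int i, unsigned char *out) {
    for (int p = 0; p < W * H; p++)
        out[p] = img[p] >= i;
}

/* psi_j (assumed): binary opening by a (2j+1) x (2j+1) square, computed as the
   union of all translates of the square that fit inside the set. psi_0 is the
   identity, and the result is empty once the square is larger than the image. */
static void psi(const unsigned char *in, int j, unsigned char *out) {
    int k = 2 * j + 1;
    memset(out, 0, (size_t)W * H);
    for (int y = 0; y + k <= H; y++)
        for (int x = 0; x + k <= W; x++) {
            int fits = 1;
            for (int dy = 0; dy < k && fits; dy++)
                for (int dx = 0; dx < k && fits; dx++)
                    fits = in[(y + dy) * W + (x + dx)];
            if (!fits)
                continue;
            for (int dy = 0; dy < k; dy++)
                for (int dx = 0; dx < k; dx++)
                    out[(y + dy) * W + (x + dx)] = 1;
        }
}

/* d (assumed): the measure is the number of foreground pixels. */
static long measure(const unsigned char *s) {
    long n = 0;
    for (int p = 0; p < W * H; p++)
        n += s[p];
    return n;
}

/* Granold table G[i][j] = d(psi_j(phi_i(X))). */
static void granold(const unsigned char *img, long G[NI][NJ]) {
    unsigned char bin[W * H], opened[W * H];
    for (int i = 0; i < NI; i++) {
        phi(img, i, bin);
        for (int j = 0; j < NJ; j++) {
            psi(bin, j, opened);
            G[i][j] = measure(opened);
        }
    }
}

/* First differences (21) and (22); valid for i < NI - 1 and j < NJ - 1. */
static long delta_x(long G[NI][NJ], int i, int j) { return G[i][j] - G[i + 1][j]; }
static long delta_y(long G[NI][NJ], int i, int j) { return G[i][j] - G[i][j + 1]; }
```

The test image is a single 7×7 block of grey level 100 on a zero background. For it, $G[i][3]$ stays at 49 for $1 \le i \le 100$ and drops to zero at $j = 4$, where the 9×9 square no longer fits. A simultaneous drop of this kind in both directions is the corner that Section 3.6 sets out to detect.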
D.G. Jones, P.T. Jackway / Pattern Recognition 33 (2000) 1033}1045
Fig. 3. Diagram showing the formation of a corner in the surface $G_X(i,j)$ corresponding to values of $i = i_c$ and $j = j_c$.
The whole idea of the granold concept is to produce a "spectral-like" response peak when there is a simultaneous rapid decrease in the granold surface in both the i and j directions. As an example, suppose this happened at the point (i = q, j = R), so that both

$$\Delta^x_X(q,R) = d\big(\psi_R(\varphi_q(X))\big) - d\big(\psi_R(\varphi_{q+1}(X))\big) \quad\text{and}\quad \Delta^y_X(q,R) = d\big(\psi_R(\varphi_q(X))\big) - d\big(\psi_{R+1}(\varphi_q(X))\big)$$

were large at this point. We can then infer a rapid change in the value of both mappings: $\varphi_i(X)$ at i = q, and $\psi_j(B)$ at j = R, where $B = \varphi_q(X)$. If $\varphi_i$ deals with grey-scale properties of image objects (for example, a threshold operator) and $\psi_j$ deals with sizes of objects (for example, a morphological opening), then such a peak in $\Delta^x_X$ and $\Delta^y_X$ indicates the presence of image objects of size R and grey-scale q. Simultaneous rapid decreases in the granold surface in both the i and j directions give the surface the shape of a "corner" (see Fig. 3), so we need to detect and measure these corners in the granold surface. The quantity

$$K_X(i,j) \overset{\text{def}}{=} \min\big(\Delta^x_X(i,j), \Delta^y_X(i,j)\big), \qquad (23)$$

which we call the "granold spectrum", is useful for determining where the granold surface has simultaneously large first differences in both the $\varphi_i$ and $\psi_j$ directions. Two arguments support the use of $K_X(i,j)$. First, min() forms the logical equivalent of an AND function: the result can be large only if BOTH differences are large. Second, from a dimensional point of view, min() chooses one or the other of its arguments, so the result has the same dimensions as each of its arguments.
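Once a granold table is available, the spectrum (23) reduces to a pointwise minimum of the two differences. The sketch below assumes a row-major table of `long` values. The check uses a hand-built table with hypothetical values, shaped like the granold of an image whose objects all have grey level q and size R:

- the row i = 0 holds the whole-domain measure T;
- the entries with $1 \le i \le q$ and $j \le R$ hold the object area A;
- every other entry is zero.

For this table, the peak of K should sit at (q, R) with height A.

```c
/* Granold spectrum K(i,j) = min(dx(i,j), dy(i,j)), Eq. (23). G is a row-major
   table with ni rows (index i) and nj columns (index j). K receives
   (ni - 1) x (nj - 1) values because each difference needs i + 1 and j + 1. */
static long min_long(long a, long b) { return a < b ? a : b; }

static void granold_spectrum(const long *G, int ni, int nj, long *K) {
    for (int i = 0; i + 1 < ni; i++)
        for (int j = 0; j + 1 < nj; j++) {
            long dx = G[i * nj + j] - G[(i + 1) * nj + j]; /* Eq. (21) */
            long dy = G[i * nj + j] - G[i * nj + (j + 1)]; /* Eq. (22) */
            K[i * (nj - 1) + j] = min_long(dx, dy);
        }
}

/* Position of the largest spectrum value; ties keep the smallest (i, j). */
static void spectrum_peak(const long *K, int ni, int nj, int *peak_i, int *peak_j) {
    int bi = 0, bj = 0;
    for (int i = 0; i + 1 < ni; i++)
        for (int j = 0; j + 1 < nj; j++)
            if (K[i * (nj - 1) + j] > K[bi * (nj - 1) + bj]) {
                bi = i;
                bj = j;
            }
    *peak_i = bi;
    *peak_j = bj;
}
```

Along the row i = 0, $\Delta^x$ is large but $\Delta^y$ is zero, so K stays at zero there. This is the AND-like behaviour of min() described above: a fall in only one direction produces no response.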
4. Experiments

The intention of the experiments presented in this section is to illustrate and test the fundamental concepts of the granold. Test images containing simple artificial textures are used, and a qualitative discussion of the results is provided. Test images and results are shown in Figs. 4–9. Part (a) of each of these figures shows the test images; these were 8-bit grey-scale images in PGM file format, 200×200 pixels in size. Parts (b) and (c) of the figures show the corresponding granold surface and spectrum, respectively. These surfaces have been normalised to the area of objects within the test image. Fig. 10 contains a natural texture image and is included as a reality check. Code used for these experiments was written in the C programming language and was run on a DEC Alpha 3000 model 300 machine under Digital Unix 4.0.

Fig. 4 shows a granold analysis of a set of test images containing randomly placed non-overlapping discs. The sizes and grey-levels of the discs vary between images. For each image a granold surface was formed by thresholding at 255 grey-levels and opening each threshold with a set of disc-shaped structuring elements of radius 1–16 pixels. Each granold surface shows a corner in a position corresponding to the grey-level and the radius of the image discs. Applying the corner detector to each granold surface produces a peak in the correct position.

Fig. 4. (a) Input images containing (down the page) 150 randomly placed discs of radius 5 pixels, grey-level 150; 50 randomly placed discs of radius 10 pixels, grey-level 150; 150 randomly placed discs of radius 5 pixels, grey-level 190. (b) Granold surfaces resulting from thresholding the input images at 255 grey-levels and opening each threshold with a set of disc structuring elements of radius 1 to 16. (c) The respective granold spectra.

To show the invariance of the technique to the arrangement of objects, we present Fig. 5. This figure shows two test images, each containing 150 non-overlapping discs of radius 5 pixels, grey-level 150. In the top image the discs are randomly placed, while in the bottom image they are evenly placed. A granold surface was formed for each image by thresholding at 255 grey-levels and opening each threshold with a set of disc-shaped structuring elements of radius 1–16 pixels. The granold surfaces are identical, with a corner in a position corresponding to grey-level 150, radius 5 pixels. Applying the corner detector to these granold surfaces produces an identical result, with a single peak in a position corresponding to grey-level 150, radius 5 pixels.

Fig. 5. The effect of object arrangement. (a) Input images containing (i) 150 randomly placed discs of radius 5 pixels, grey-level 150 (top figure) and (ii) 150 evenly placed discs of radius 5 pixels, grey-level 150 (bottom figure). (b) Granold surfaces resulting from thresholding the input images at 255 grey-levels and opening each threshold with a set of disc-shaped structuring elements of radius 1 to 16 pixels. (c) The respective granold spectra.

To examine the response to different populations of objects, we present Fig. 6. This figure shows a test image containing 50 randomly placed discs of radius 5 pixels, grey-level 100, and 100 randomly placed discs of radius 5 pixels, grey-level 200. The discs are non-overlapping. A granold surface was formed for this image by thresholding at 255 grey-levels and opening each threshold with a set of disc-shaped structuring elements of radius 1–16 pixels. The granold surface shows two corners, in positions corresponding to grey-level 100, radius 5 pixels and grey-level 200, radius 5 pixels. Applying the corner detector to this granold surface produces two peaks in these same two positions. The ratio of the heights of the two peaks is 1 : 2, which is proportional to the area covered by the respective objects.

Fig. 6. The effect of different populations of objects in the image. (a) Input image containing 50 randomly placed discs of radius 5 pixels, grey-level 100, and 100 randomly placed discs of radius 5 pixels, grey-level 200. The discs are non-overlapping. (b) Granold surface resulting from thresholding the input image at 255 grey-levels and opening each threshold with a set of disc structuring elements of radius 1 to 16. (c) The granold spectrum.

To test the response to several sizes and shades simultaneously, we present Fig. 7. This figure shows a test image containing 25 randomly placed discs of radius 8 pixels, grey-level 100; 25 randomly placed discs of radius 8 pixels, grey-level 150; and 150 randomly placed discs of radius 4 pixels, grey-level 250. The discs are non-overlapping. A granold surface was formed for this image by thresholding at 255 grey-levels and opening each threshold with a set of disc-shaped structuring elements of radius 1–16 pixels. The granold surface shows three corners, in positions corresponding to grey-level 100, radius 8 pixels; grey-level 150, radius 8 pixels; and grey-level 250, radius 4 pixels. Applying the corner detector to this granold surface produces three peaks in these same three positions. The height of each peak is proportional to the area covered by the respective objects.

Fig. 7. The effect of several sizes and shades in the image. (a) Input image containing 25 randomly placed discs of radius 8 pixels, grey-level 100; 25 randomly placed discs of radius 8 pixels, grey-level 150; and 150 randomly placed discs of radius 4 pixels, grey-level 250. The discs are non-overlapping. (b) Granold surface resulting from thresholding the input image at 255 grey-levels and opening each threshold with a set of disc structuring elements of radius 1 to 16. (c) The granold spectrum.

To show the effect of overlapping image objects we present Fig. 8. This figure shows a test image containing 200 randomly placed, overlapping discs of radius 6 pixels, grey-level 150. A granold surface was formed for this image by thresholding at 255 grey-levels and opening each threshold with a set of disc-shaped structuring elements of radius 1–16 pixels. The granold surface shows a corner in a position corresponding to grey-level 150 and radius 6 pixels. However, this corner is not as "sharp" as in the previous figures, because the overlapping of objects produces larger objects that "smear" the corner in the opening (size) direction. Applying the corner detector to this granold surface produces a peak in a position corresponding to grey-level 150 and radius 6 pixels. The peak in this case is also smeared in the size direction.

Fig. 8. The effect of overlapping objects. (a) Input image containing 200 randomly placed, overlapping discs of radius 6 pixels, grey-level 150. (b) Granold surface resulting from thresholding the input image at 255 grey-levels and opening each threshold with a set of disc structuring elements of radius 1 to 16. (c) The granold spectrum.

To show the effect of differing structuring elements in the granulometry we present Fig. 9. This figure shows the same test image containing 50 randomly placed non-overlapping discs of radius 10 pixels, grey-level 150. This time a granold surface was formed by thresholding at 255 grey-levels and opening each threshold with a set of square-shaped structuring elements with edge lengths of 2–16 pixels. The granold surface shows a number of corners in positions corresponding to grey-level 150 and differing structuring element sizes. This series of corners is a result of the square structuring elements not perfectly matching the disc-shaped objects. As the disc objects are successively opened by larger square structuring elements, the remaining area decreases until the square structuring element no longer fits inside the disc objects, at which point the whole object disappears. At the point where the structuring element square is of size 14×14 pixels, it will no longer fit inside the discs of radius 10 pixels, and we see a large change in the remaining area. This is the "highest" corner in the granold surface, at a position corresponding to grey-level 150 and structuring element size 14×14 pixels. Applying the corner detector to this granold surface produces a large peak in the same position, and again it is smeared in the size direction.

Fig. 10 shows a test image containing a more complicated natural texture. It is an image of reptile skin that has been inverted to give light texture elements on a dark background, similar to the artificial textures already considered. A granold surface was formed for this image by thresholding at 255 grey-levels and opening each threshold with a 16-element homothetic set of a 2×2 pixel structuring element. This image was included to illustrate that, depending upon the texture, we should not always expect to see perfect corners and peaks.
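For Figs. 6 and 7 the text states that peak height is proportional to the area covered by each population, so the expected height ratios follow from the disc counts and radii alone. The short worked check below treats each disc as having continuous area πr² (digital discs differ slightly). It writes $p_g$ for the height of the peak at grey-level g, a notation introduced here only for this check.

```latex
\frac{p_{100}}{p_{200}} = \frac{50\,\pi\,5^2}{100\,\pi\,5^2} = \frac{1250\pi}{2500\pi} = \frac{1}{2}
\qquad \text{(Fig. 6)}

p_{100} : p_{150} : p_{250} \approx 25\,\pi\,8^2 : 25\,\pi\,8^2 : 150\,\pi\,4^2
  = 1600\pi : 1600\pi : 2400\pi = 2 : 2 : 3
\qquad \text{(Fig. 7)}
```

The first ratio agrees with the 1 : 2 reported for Fig. 6. For Fig. 7 the text reports only proportionality, so 2 : 2 : 3 is a prediction from the stated areas, not a measured value.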
5. Discussion

From the above experiments, where discrete objects exist in the test image, we can make the following observations:
• Corner detection applied to the granold surface produced a spectral-like response in which a single peak was produced for each type of image object.
• The position of granold spectrum peaks indicates the characteristic size and shade of image objects.
• The height of peaks in the granold spectrum indicates the number of image pixels covered by the associated image objects.
• Where objects overlapped, the peaks were smeared in the size direction, as the overlapping produced some larger objects.
• The granold analysis is not sensitive to the particular layout of image objects: random or regular placement makes no difference to the granold function or spectrum.

For the complicated natural texture, the granold surface and spectrum are themselves more complicated and characteristic of the texture.

Fig. 9. The effect of differing structuring elements. (a) Input image containing 50 randomly placed discs of radius 10 pixels, grey-level 150. (b) Granold surfaces resulting from thresholding the input image at 255 grey-levels and opening each threshold with (i) a set of disc-shaped structuring elements of radius 1 to 16 pixels (top figure) and (ii) a 16-element homothetic set of a 2×2 pixel structuring element (bottom figure). (c) The respective granold spectra.

A review of the literature has found the following works, which share some features with the granold technique. None of them, however, combines multiple granulometries across many thresholds into a two-dimensional representation as we have proposed. It is common to perform a granulometric analysis of binary images; however, these are usually obtained from a grey-level image by a single thresholding operation (see, for example, Ref. [14]). Threshold decomposition and stacking have been used to implement grey-level morphological operations with simpler binary morphological operations [11,15,16]. Here an image is decomposed by grey-level threshold into multiple binary images. These images are processed in parallel using binary morphology, and the results are combined by stacking to produce the desired grey-level result. In an analysis of cell nucleus texture, the grey-level image has been opened with structuring elements of increasing size. The image formed by each opening is subtracted from the original grey-level image. The resulting images are then densitometrically thresholded from grey-level 10 to 255. These operations result in a sequence of binary images that represent the granular fractions of the original image. These binary images are then analysed from a geometric and a densitometric point of view [17]. Texture features based on geometrical attributes, such as the number, size and irregularity of connected regions in each binary image of a threshold-decomposed texture image, have also been proposed [6].
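The threshold decomposition and stacking property cited above [11,15,16] can be checked numerically on a small signal. The sketch below is a 1-D illustration under simple assumptions: a hypothetical signal length, 8 grey levels, a flat structuring element of 3 samples, and windows clipped at the borders. It computes a grey-level flat opening directly with min/max filters. The result equals the sum, over thresholds t = 1, …, 7, of the binary openings of the slices {x ≥ t}. This holds because a minimum (maximum) is at least t exactly when all (any) of its arguments are.

```c
#define N 16     /* signal length (hypothetical) */
#define K 3      /* flat structuring element: K consecutive samples */
#define LEVELS 8 /* grey levels 0..LEVELS-1 */

/* Flat erosion: minimum of x over the in-range samples of [n, n+K-1]. */
static void erode(const int *x, int *y) {
    for (int n = 0; n < N; n++) {
        int m = x[n];
        for (int k = 1; k < K && n + k < N; k++)
            if (x[n + k] < m)
                m = x[n + k];
        y[n] = m;
    }
}

/* Flat dilation: maximum of x over the in-range samples of [n-K+1, n]. */
static void dilate(const int *x, int *y) {
    for (int n = 0; n < N; n++) {
        int m = x[n];
        for (int k = 1; k < K && n - k >= 0; k++)
            if (x[n - k] > m)
                m = x[n - k];
        y[n] = m;
    }
}

/* Flat opening = erosion followed by dilation. On 0/1 signals min and max act
   as AND and OR, so the same function is also the binary opening. */
static void open_flat(const int *x, int *y) {
    int tmp[N];
    erode(x, tmp);
    dilate(tmp, y);
}

/* Threshold decomposition and stacking: open every binary slice {x >= t}
   and add the binary results back together. */
static void open_by_stacking(const int *x, int *y) {
    int slice[N], opened[N];
    for (int n = 0; n < N; n++)
        y[n] = 0;
    for (int t = 1; t < LEVELS; t++) {
        for (int n = 0; n < N; n++)
            slice[n] = x[n] >= t;
        open_flat(slice, opened);
        for (int n = 0; n < N; n++)
            y[n] += opened[n];
    }
}
```

In the test signal, the two-sample plateau 5, 5 is narrower than the structuring element, so the opening cuts it down to 3. The stacked binary openings reproduce that value slice by slice: slices t = 1, 2, 3 keep the sample and slices t = 4, 5 remove it.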
Fig. 10. (a) Input image containing a natural texture as used in Fig. 1. (b) Granold surface resulting from thresholding the input image at 255 grey-levels and opening each threshold with a 16-element homothetic set of a 2×2 pixel structuring element. (c) The granold spectrum.

6. Conclusions

In this paper we have described a new image texture analysis technique. Using two different parameterised monotonic mappings, this new technique transforms the input image into a function of two variables that may be regarded as a surface, which we have named the "granold". The nature of this surface is such that corners appear at positions where there are simultaneously large changes in the response of the monotonic mappings to the input image. The shape and position of these corners are then analysed to provide information about the texture of the input image.

The mathematical theory of the granold has been developed and analysed in depth, including constraints on the parameterised monotonic mappings. A number of experiments were performed using grey-level thresholds and morphological granulometries to test the fundamentals of the new technique. The results of these experiments showed that, for the test images used, the new technique produced spectral-like results that provided information about the distributions of size and grey-level of objects within the images. The new technique performs in an expected, repeatable and predictable manner, in accordance with its mathematical analysis. In future we intend to use the granold function to discriminate between textures that may differ subtly in the size and shape of their texture elements; in the present paper, however, we have simply introduced the technique in detail and outlined some of its major properties. Future research will investigate the use of other legal mappings, as defined in Section 3.3, to address specific tasks. Future work will also require statistical analysis of both the granold surface and the granold spectrum, with the intention of obtaining discriminating features for different real-world textures.
References
[1] C.H. Chen, A study of texture classification using spectral features, Proceedings 6th International Conference on Pattern Recognition, Munich, 1982, pp. 1074–1077.
[2] A.P. Pentland, Fractal-based description of natural scenes, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (6) (1984) 661–674.
[3] G.R. Cross, A.K. Jain, Markov random field texture models, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5 (1) (1983) 25–39.
[4] R.M. Haralick, K. Shanmugam, Its'Hak Dinstein, Textural features for image classification, IEEE Trans. Systems, Man Cybernet. SMC-3 (6) (1973) 610–621.
[5] V.S. Nalwa, E. Pauchon, Edgel aggregation and edge description, Comput. Vision Graphics Image Process. 40 (1) (1987) 79–84.
[6] Yan Qui Chen, Mark S. Nixon, David W. Thomas, Statistical geometrical features for texture classification, Pattern Recognition 28 (4) (1995) 537–552.
[7] R.M. Haralick, Statistical and structural approaches to texture, Proc. IEEE 67 (5) (1979) 786–804.
[8] M.D. Levine, Texture, in: McGraw-Hill Series in Electrical Engineering, McGraw-Hill, New York, 1985, Chapter 9.
[9] R. Chellappa, R.L. Kashyap, Texture synthesis using 2-d noncausal autoregressive models, IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (1) (1985) 194–203.
[10] R. Chellappa, R.L. Kashyap, B.S. Manjunath, Model-based texture segmentation and classification, in: C.H. Chen, L.F. Pau, P.S.P. Wang (Eds.), Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 1993, pp. 277–310.
[11] P. Maragos, R.D. Ziff, Threshold superposition in morphological image analysis systems, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-32 (5) (1990) 498–504.
[12] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.
[13] J. Serra, Image Analysis and Mathematical Morphology, Academic, London, 1982.
[14] E.R. Dougherty, J.T. Newell, J.B. Pelz, Morphological texture-based maximum-likelihood pixel classification based on local granulometric moments, Pattern Recognition 25 (10) (1992) 1181–1198.
[15] Frank Yeong-Chyang Shih, Owen Robert Mitchell, Threshold decomposition of grey-scale morphology into binary morphology, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-11 (1) (1989) 31–42.
[16] C.C. Pu, F.Y. Shih, Threshold decomposition of grey-scale morphology into binary soft morphology, Graphical Models Image Process. 57 (6) (1995) 522–526.
[17] J.A. Giménez-Mas, M. Pilar Sanz-Moncasi, Loto Remón, Paula Gambó, M. Paz Gallego-Calvo, Automated textural analysis of nuclear chromatin, Anal. Quantitative Cytol. Histol. 17 (1) (1995) 39–47.
About the Author: DAMIAN G. JONES, born in Sale, Australia, received the Bachelor of Engineering degree (Hons.) in Electronics Communications Engineering from the Royal Melbourne Institute of Technology, Australia, in 1992. He worked for four years as a systems engineer in industrial automation and control systems. He is currently pursuing a Ph.D. degree in Engineering at the University of Queensland, Australia. His research interests include image analysis, particularly image texture analysis.
About the Author: PAUL T. JACKWAY, born in Warrnambool, Australia, received his Bachelor of Engineering (Electronics) in 1984, a Graduate Diploma in Applied Statistics in 1986 and the Master of Applied Science (Mathematical Modelling and Data Analysis) in 1990, all from the Royal Melbourne Institute of Technology, and a Ph.D. in Electrical Engineering in 1995 from the Queensland University of Technology. Paul is a Senior Research Fellow in the Cooperative Research Centre for Sensor Signal and Information Processing (CSSIP). His research interests include image texture, pattern recognition and classification, multi-resolution and scale-space image theory, and mathematical morphology.
Pattern Recognition 33 (2000) 1047–1057
Heterogeneous morphological granulometries
Sinan Batman(a), Edward R. Dougherty(b,*), Francis Sand(c)
(a) Department of Electrical Engineering, Johns Hopkins University, USA
(b) Department of Electrical Engineering, Texas Center for Applied Technology, Texas A&M University, 214 Wisenbaker Engineering Research Center, College Station, TX 77843-3407, USA
(c) School of Computer Science and Information Systems, Fairleigh Dickinson University, USA
Received 18 January 1999; received in revised form 2 May 1999; accepted 23 June 1999
Abstract
The most basic class of binary granulometries is composed of unions of openings by structuring elements that are homogeneously scaled by a single parameter. These univariate granulometries have previously been extended to multivariate granulometries in which each structuring element is scaled by an individual parameter. This paper introduces the more general class of filters in which each structuring element is scaled by a function of its sizing parameter, the result being multivariate heterogeneous granulometries. Owing to computational considerations, of particular importance are the univariate heterogeneous granulometries, for which scaling is by functions of a single variable. The basic morphological properties of heterogeneous granulometries are given, analytic and geometric relationships between multivariate and univariate heterogeneous pattern spectra are explored, and application to texture classification is discussed. The homogeneous granulometric mixing theory, both the representation of granulometric moments and the asymptotic theory concerning the distributions of granulometric moments, is extended to heterogeneous scaling. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Granulometry; Mathematical morphology; Mixing theory; Pattern spectrum; Texture
* Corresponding author. Tel.: +1-409-862-8154; fax: +1-409-862-3336. E-mail address: [email protected] (E.R. Dougherty)
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00164-8

1. Introduction

Granulometries are parameterized families of morphological openings that are used for granular filtering (clutter removal) [1–5] and for pattern and texture classification [6–12]. As originally conceived, the most basic type of granulometry is formed as a union of binary openings in which each structuring element is a homothetic tB and the scaling (sizing) parameter t is the same for all structuring elements [13]. There are various extensions of this concept. The most relevant to the present paper are multivariate granulometries, in which the structuring elements are scaled by independent individual parameters [14]. From the standpoint of texture classification, allowing each structuring element to be scaled independently provides a more general class of filters and thereby facilitates increased discriminatory power. The present paper takes this form of generalization one step further by having each structuring element scaled by an increasing function of its sizing parameter. This heterogeneous scaling produces a larger class of filter families. By using a single parameter with individual scaling functions, we can obtain sieving filters that are parameterized along paths in multidimensional space. Sieving along a path costs no more computationally than a classical univariate granulometry. However, it can lead to improved classification over homogeneously scaled univariate granulometries without the growing computational burden that accompanies multivariate granulometries. After defining the class of heterogeneous granulometries and giving some basic morphological properties, analytic and geometric relationships between multivariate and univariate heterogeneous pattern spectra are explored, and application to texture classification is discussed. Morphological texture classification often involves feature vectors whose components are moments of granulometric pattern spectra. Finally, we extend the granulometric mixing theorems to heterogeneous granulometries. These theorems provide representations for granulometric moments for classes of disjoint randomly sized granular images and provide asymptotic distributions for these moments [14–17].

Before proceeding, we review some basic definitions. A granulometry is a family $\{\Psi_t\}$, $t > 0$, with the following properties:
• $\Psi_t$ is antiextensive: $\Psi_t(A) \subset A$;
• $\Psi_t$ is increasing: $A \subset B$ implies $\Psi_t(A) \subset \Psi_t(B)$;
• for all $u, v > 0$, $\Psi_u\Psi_v = \Psi_v\Psi_u = \Psi_{\max(u,v)}$;
• $\Psi_t$ is translation invariant: $\Psi_t(A + x) = \Psi_t(A) + x$.
For completeness, for t = 0 we define $\Psi_0(A) = A$. $\{\Psi_t\}$ is a Euclidean granulometry if, for any t > 0 and any image A, $\Psi_t(A) = t\Psi_1[(1/t)A]$. For $u \le v$, $\Psi_v \le \Psi_u$. The basic representation of Matheron states that $\{\Psi_t\}$ is a Euclidean granulometry if and only if there exists a set family $\mathcal{B} = \{B_i\}$, called the generator of the granulometry, such that

$$\Psi_t(S) = \bigcup_i \bigcup_{r \ge t} S \circ rB_i, \qquad (1)$$

where the opening $S \circ B$ is the union of all translates of B that are subsets of S [13]. Given two sets A and B, A is B-open (open relative to B) if $A \circ B = A$. If A is B-open, then $S \circ A \subset S \circ B$ for any set S. For a convex set B and $r \ge t$, rB is tB-open; hence $S \circ rB \subset S \circ tB$. Thus, if the generator sets are convex, then the double union reduces to the single union

$$\Psi_t(S) = \bigcup_i S \circ tB_i. \qquad (2)$$

For compact generator sets, the double union reduces to a single union if and only if all generator sets are convex [13]. Forming the union over all $r \ge t$ is difficult, so applications involve finite numbers of convex, compact structuring elements and the single-union formulation. $\{\Psi_t(S)\}$ is decreasing for increasing t, and if S is compact, then $\Psi_t(S) = \emptyset$ for sufficiently large t. Letting α denote Lebesgue measure, $\Omega(t) = \alpha[\Psi_t(S)]$ is a decreasing function of t.
The normalization $\Phi(t) = 1 - \Omega(t)/\alpha[S]$ is a probability distribution function, and its derivative $\Phi'(t)$ is a probability density called the pattern spectrum of S. The moments of $\Phi(t)$ are used as texture features. Because S is modeled as a random set, $\Phi(t)$ is a random function and its moments are random variables possessing probability distributions dependent on S.
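The sketch below is a concrete instance of Eq. (2) and of the normalization $\Phi(t)$. It uses a hypothetical two-element generator made of a horizontal and a vertical unit segment, both convex. Here tB_i is taken to be a digital segment of t pixels, so $S \circ tB_i$ keeps exactly the pixels that lie in a run of at least t foreground pixels in that direction. Image size, test image and function names are illustrative assumptions, and the pixel count stands in for Lebesgue measure.

```c
#define W 16
#define H 16

/* Opening by a horizontal segment of len pixels: a foreground pixel survives
   iff it lies in a horizontal run of at least len foreground pixels. */
static void open_hseg(const unsigned char *s, int len, unsigned char *out) {
    for (int y = 0; y < H; y++) {
        int x = 0;
        while (x < W) {
            int start = x;
            while (x < W && s[y * W + x]) /* scan one run of foreground */
                x++;
            for (int k = start; k < x; k++)
                out[y * W + k] = (x - start) >= len;
            if (x < W) { /* a background pixel ends the run */
                out[y * W + x] = 0;
                x++;
            }
        }
    }
}

/* The same opening for a vertical segment, scanning columns. */
static void open_vseg(const unsigned char *s, int len, unsigned char *out) {
    for (int x = 0; x < W; x++) {
        int y = 0;
        while (y < H) {
            int start = y;
            while (y < H && s[y * W + x])
                y++;
            for (int k = start; k < y; k++)
                out[k * W + x] = (y - start) >= len;
            if (y < H) {
                out[y * W + x] = 0;
                y++;
            }
        }
    }
}

/* Omega(t) = area of Psi_t(S) = (S o tB1) u (S o tB2), Eq. (2);
   t <= 1 leaves S unchanged. */
static long omega(const unsigned char *s, int t) {
    unsigned char a[W * H], b[W * H];
    long n = 0;
    open_hseg(s, t, a);
    open_vseg(s, t, b);
    for (int p = 0; p < W * H; p++)
        n += a[p] | b[p];
    return n;
}

/* Normalized size distribution Phi(t) = 1 - Omega(t) / alpha[S]. */
static double phi_dist(const unsigned char *s, int t) {
    long area = 0;
    for (int p = 0; p < W * H; p++)
        area += s[p];
    return 1.0 - (double)omega(s, t) / (double)area;
}
```

The test image is a 4×4 square plus an 8-pixel bar. Its $\Omega(t)$ stays at 24 up to t = 4, falls to 8 when the square disappears, and reaches 0 at t = 9. $\Phi$ therefore jumps by 2/3 and then by 1/3, and these jumps are where the pattern spectrum $\Phi'$ places its mass.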
2. Heterogeneous granulometries

Extending Eq. (2), this paper treats filter families formed by finite unions of openings by scaled structuring elements in which the scalings are functions of individual parameters. In this sense, for a family $\mathcal{B} = \{B_1, B_2, \ldots, B_n\}$ of convex, compact sets, we define a heterogeneous multivariate granulometry to be a family of filters defined by

$$\Psi_{\mathbf{t}}(S) = \bigcup_{i=1}^{n} S \circ h_i(t_i)B_i, \qquad (3)$$

where $\mathbf{t} = (t_1, t_2, \ldots, t_n)$, $t_i > 0$ for $i = 1, 2, \ldots, n$, and $h_1, h_2, \ldots, h_n$ are strictly increasing continuous functions of $t_1, t_2, \ldots, t_n$, respectively, on $[0, \infty)$ such that $h_i(0) = 0$ and $h_i(t_i) \to \infty$ as $t_i \to \infty$. For any $\mathbf{t} = (t_1, t_2, \ldots, t_n)$ for which some $t_i = 0$, define $\Psi_{\mathbf{t}}(S) = S$. Because $h_i$ is strictly increasing and continuous, it represents an increasing bijection. Hence $h_i(r)B_i$ is $h_i(t)B_i$-open for $r \ge t$, for all i. To avoid redundancy, $B_1, B_2, \ldots, B_n$ are assumed to be distinct shapes relative to scalar multiplication: namely, if $i \ne j$, then there does not exist a scalar s such that $sB_i$ is $B_j$-open. Letting $h = (h_1, h_2, \ldots, h_n)$, we call $\{\Psi_{\mathbf{t}}\}$ a $(\mathcal{B}, h, \mathbf{t})$-granulometry, and $\mathcal{B}$ is the generator of the granulometry. $\Psi_{\mathbf{t}}$ is a τ-opening (increasing, idempotent, antiextensive, and translation invariant) with base $\{h_1(t_1)B_1, \ldots, h_n(t_n)B_n\}$. The invariant class of $\Psi_{\mathbf{t}}$, $\mathrm{Inv}[\Psi_{\mathbf{t}}]$, is the collection of all sets S such that $\Psi_{\mathbf{t}}(S) = S$. Because $h_1, h_2, \ldots, h_n$ are strictly increasing, if $\mathbf{t} \ge \mathbf{s}$, meaning $t_i \ge s_i$ for $i = 1, 2, \ldots, n$, then $\Psi_{\mathbf{t}} \le \Psi_{\mathbf{s}}$ and $\mathrm{Inv}[\Psi_{\mathbf{t}}] \subset \mathrm{Inv}[\Psi_{\mathbf{s}}]$. If $h_1, h_2, \ldots, h_n$ all equal the identity function, then we obtain the previously studied multivariate granulometries [14], which we now term homogeneous multivariate granulometries, or $(\mathcal{B}, \mathbf{t})$-granulometries. If we let $t_k \to \infty$ for some fixed k and S is compact, then $S \circ h_k(t_k)B_k \to \emptyset$, and we obtain the marginal granulometry given by the union of Eq. (3) with the kth term deleted.
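Eq. (3) differs from Eq. (2) only in that each segment length passes through its own scaling function. The sketch below uses the same kind of hypothetical line-segment generator (n = 2, horizontal and vertical digital segments). For illustration only, it picks $h_1(t) = t$ and $h_2(t) = t^2$, which are strictly increasing on the integers with $h_i(0) = 0$. It also applies the convention $\Psi_{\mathbf{t}}(S) = S$ when some $t_i = 0$. Setting $t_1 = t_2 = t$ gives the univariate $(\mathcal{B}, h)$-granulometry.

```c
#define W 16
#define H 16

/* Opening by a horizontal segment of len pixels: a foreground pixel survives
   iff it lies in a horizontal run of at least len foreground pixels. */
static void open_hseg(const unsigned char *s, int len, unsigned char *out) {
    for (int y = 0; y < H; y++) {
        int x = 0;
        while (x < W) {
            int start = x;
            while (x < W && s[y * W + x])
                x++;
            for (int k = start; k < x; k++)
                out[y * W + k] = (x - start) >= len;
            if (x < W) {
                out[y * W + x] = 0;
                x++;
            }
        }
    }
}

/* The same opening for a vertical segment, scanning columns. */
static void open_vseg(const unsigned char *s, int len, unsigned char *out) {
    for (int x = 0; x < W; x++) {
        int y = 0;
        while (y < H) {
            int start = y;
            while (y < H && s[y * W + x])
                y++;
            for (int k = start; k < y; k++)
                out[k * W + x] = (y - start) >= len;
            if (y < H) {
                out[y * W + x] = 0;
                y++;
            }
        }
    }
}

/* Hypothetical scaling functions: strictly increasing, with h(0) = 0. */
static int h1(int t) { return t; }
static int h2(int t) { return t * t; }

/* Area of Psi_t(S) = (S o h1(t1)B1) u (S o h2(t2)B2), Eq. (3) with n = 2.
   If some t_i = 0 the filter is the identity, as in the definition. */
static long omega2(const unsigned char *s, int t1, int t2) {
    unsigned char a[W * H], b[W * H];
    long n = 0;
    if (t1 == 0 || t2 == 0) {
        for (int p = 0; p < W * H; p++)
            n += s[p];
        return n;
    }
    open_hseg(s, h1(t1), a);
    open_vseg(s, h2(t2), b);
    for (int p = 0; p < W * H; p++)
        n += a[p] | b[p];
    return n;
}
```

The test image contains an 8-pixel horizontal bar and a 6-pixel vertical bar. At t = 3 the univariate heterogeneous filter already removes the vertical bar, because $h_2(3) = 9 > 6$. A homogeneous opening by segments of length 3 would keep both bars. Letting $t_2$ grow reproduces the marginal granulometry with the second term deleted.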
If, for r > 0, we fix $\mathbf{t} = (t_1, t_2, \ldots, t_n)$ and define

$$\Lambda_r(S) = \bigcup_{i=1}^{n} S \circ r\,h_i(t_i)B_i, \qquad (4)$$

then $\{\Lambda_r\}$ is a Euclidean granulometry with generator $\{h_1(t_1)B_1, \ldots, h_n(t_n)B_n\}$. The original Euclidean theory then yields the following properties:
• if $r \ge s > 0$, then $\mathrm{Inv}[\Lambda_r] \subset \mathrm{Inv}[\Lambda_s]$;
• $\Lambda_r\Lambda_s = \Lambda_s\Lambda_r = \Lambda_{\max\{r,s\}}$;
• $\mathrm{Inv}[\Lambda_r] = r\,\mathrm{Inv}[\Lambda_1]$.

If we let $\mathbf{t} = (t, t, \ldots, t)$, then Eq. (3) yields a univariate heterogeneous granulometry $\{\Psi_t\}$, which we term a $(\mathcal{B}, h)$-granulometry. Each $\Psi_t$ is a τ-opening and, for $t \ge s$, $\mathrm{Inv}[\Psi_t] \subset \mathrm{Inv}[\Psi_s]$, so $\{\Psi_t\}$ is a granulometry. If we let $h_i(t) = t$ for all i, then the $(\mathcal{B}, h)$-granulometry is a Euclidean granulometry with generator $\mathcal{B}$, which in the present context we will call a $\mathcal{B}$-granulometry. If each function $h_i$ is linear, say $h_i(t) = a_i t$, then grouping $a_i$ with $B_i$ in $a_i tB_i$ shows $\{\Psi_t\}$ to be the Euclidean granulometry with generator $\{a_1B_1, \ldots, a_nB_n\}$.

A $(\mathcal{B}, h)$-granulometry is an upper semicontinuous (u.s.c.) granulometry. To demonstrate this, we need to show that the mapping $(t, S) \mapsto \Psi_t(S)$ is u.s.c. on $\mathbb{R}^+ \times \mathcal{K}$, where $\mathcal{K}$ is the space of compact sets. The homothetic $(t, B) \mapsto tB$ is a continuous mapping on $\mathbb{R}^+ \times \mathcal{K}$. Therefore $(t, B) \mapsto h(t)B$ is continuous when h(t) is continuous and positive valued, and so $(t, B_i) \mapsto h_i(t)B_i$ is continuous for each i. Since opening is u.s.c., this implies that $S \circ h_i(t)B_i$ is u.s.c. for all i. Since union is continuous, $\Psi_t$ is u.s.c. In the Euclidean case, $h_i(t)B_i = tB_i$.

In the multivariate setting, the size distribution and the normalized size distribution are defined by $\Omega(\mathbf{t}) = \alpha[\Psi_{\mathbf{t}}(S)]$ and $\Phi(\mathbf{t}) = 1 - \Omega(\mathbf{t})/\alpha[S]$, respectively. It has been shown that $\Phi(\mathbf{t})$ is a probability distribution function for homogeneous multivariate granulometries [14]. A similar argument shows that $\Phi(\mathbf{t})$ is a probability distribution function for heterogeneous multivariate granulometries.
3. Univariate heterogeneous granulometric size distributions
Despite the large amount of information extracted by multivariate granulometries, a univariate approach is computationally attractive. For a (B, h)-granulometry, the size distribution is a function, $\Omega(t)$, of a single variable. Let $c$ be the curve defined by $\mathbf{h}(t) = (h_1(t), \ldots, h_n(t))$, and let $s = m(t)$ be the arc length of $c$. Because $\mathbf{h}$ is a bijection, $m(t)$ is strictly increasing. Hence, the size distribution can be viewed as a function of $\mathbf{h}(t)$ or of $s$. Rigorously, $\Omega(t) = \Omega_1(m(t))$, where $\Omega_1 = \Omega \circ m^{-1}$, and $\Omega(t) = \Omega_2(\mathbf{h}(t))$, where $\Omega_2 = \Omega \circ \mathbf{h}^{-1}$. Similar variable changes apply to the pattern spectrum, and the derivative of $\Phi(t)$ can be obtained via the chain rule. Relative to arc length,
$$\Phi(t) = \Phi_1(m(t)) = 1 - \Omega_1(m(t))/a[S], \qquad (5)$$
$$\frac{d\Phi}{dt} = \frac{d\Phi_1}{ds}\frac{ds}{dt} = -\frac{1}{a[S]}\frac{d\Omega_1}{ds}\frac{ds}{dt}. \qquad (6)$$
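The arc-length change of variable, and Eq. (9) below, rest on the factor $ds/dt = [\sum_i (dh_i/dt)^2]^{1/2}$. The sketch below checks that formula numerically on a hypothetical example that does not come from the text:

- the path is $\mathbf{h}(t) = (t, t^2)$;
- the toy bivariate distribution function is $\Phi_2(q_1, q_2) = q_1 q_2$ on $[0, 1]^2$.

Arc length is approximated with the midpoint rule.

```python
import math


def arc_length(h_prime, a, b, steps=2000):
    """Length of the curve h between t = a and t = b: integral of |h'(u)| (midpoint rule)."""
    du = (b - a) / steps
    total = 0.0
    for k in range(steps):
        u = a + (k + 0.5) * du
        total += math.sqrt(sum(d * d for d in h_prime(u)))
    return total * du


def dphi1_ds(grad_phi2, h, h_prime, t):
    """Eq. (9): [sum_i (dh_i/dt)^2]^(-1/2) * sum_i (dPhi2/dh_i)(dh_i/dt)."""
    hp = h_prime(t)
    g = grad_phi2(h(t))
    return sum(gi * di for gi, di in zip(g, hp)) / math.sqrt(sum(d * d for d in hp))


# Hypothetical example: path h(t) = (t, t^2), toy Phi2(q1, q2) = q1 * q2 on [0, 1]^2
def h(t):
    return (t, t * t)


def h_prime(t):
    return (1.0, 2.0 * t)


def phi2(q):
    return q[0] * q[1]


def grad_phi2(q):
    return (q[1], q[0])


t, eps = 0.6, 1e-4
# Central difference of Phi(t) = Phi2(h(t)) divided by the arc length of [t - eps, t + eps]
numeric = (phi2(h(t + eps)) - phi2(h(t - eps))) / arc_length(h_prime, t - eps, t + eps)
via_eq9 = dphi1_ds(grad_phi2, h, h_prime, t)
```

For this path, Eq. (9) gives $3t^2/\sqrt{1 + 4t^2}$, and the finite-difference estimate agrees with it to about $10^{-8}$.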
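Relative to $\mathbf{h}(t)$,
$$\Phi(t) = \Phi_2(\mathbf{h}(t)) = 1 - \Omega_2(\mathbf{h}(t))/a[S], \qquad (7)$$
$$\frac{d\Phi}{dt} = \sum_{k=1}^{n} \frac{\partial \Phi_2}{\partial h_k}\frac{dh_k}{dt} = -\frac{1}{a[S]}\sum_{k=1}^{n} \frac{\partial \Omega_2}{\partial h_k}\frac{dh_k}{dt}. \qquad (8)$$
The pattern spectrum can be viewed as a function of arc length $s$ according to $\Phi_1(s) = \Phi \circ m^{-1}(s)$ and the chain-rule differentiation
$$\frac{d\Phi_1(s)}{ds} = \sum_{i=1}^{n} \frac{\partial \Phi_2}{\partial h_i}\frac{dh_i}{dt}\frac{dt}{ds} = \left[\sum_{i=1}^{n}\left(\frac{dh_i}{dt}\right)^2\right]^{-1/2} \sum_{i=1}^{n} \frac{\partial \Phi_2}{\partial h_i}\frac{dh_i}{dt}. \qquad (9)$$
$d\Phi_1(s)/ds$ can be derived from $\Phi(t)$ by means of the preceding formula together with the formula
$$\Phi_2(\mathbf{h}(t)) = \int_0^{h_1(t)}\int_0^{h_2(t)}\cdots\int_0^{h_n(t)} d\Phi_2(q_1, q_2, \ldots, q_n), \qquad (10)$$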
where $\Phi_2 = \Phi \circ \mathbf{h}^{-1}$. These expressions are useful because a key concern is finding univariate heterogeneous granulometries whose texture discrimination is almost as good as that of a multivariate homogeneous granulometry. When univariate heterogeneous granulometries are viewed as functions of arc length, Eqs. (9) and (10) allow their pattern spectra to be derived from the pattern spectra of the multivariate homogeneous granulometry.
Since granulometries are employed to extract geometric information, it is important to understand the relationship between a homogeneous multivariate granulometry with generator $\mathbf{B}$ and heterogeneous univariate granulometries possessing the same generator. In particular, we would like to consider the relationship between their pattern spectra. To place the geometric analysis in $\mathbb{R}^3$, so that we can picture it, we consider the case of a disjoint union
$$S = (B_1 + z_1) \cup (B_2 + z_2), \qquad (11)$$
formed by translations of $B_1$ and $B_2$. To give the size distribution a structured geometry, we assume that $B_1$ and $B_2$ are orthogonal [15], which means that
$$(S \circ tB_1) \cup (S \circ tB_2) = \begin{cases} S & \text{if } t \le 1, \\ \emptyset & \text{if } t > 1. \end{cases} \qquad (12)$$
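The orthogonality condition of Eq. (12) can be checked on a digital toy example. The sketch makes three hypothetical assumptions:

- $B_1$ and $B_2$ are 5-pixel horizontal and vertical segments;
- $tB_i$ is a segment of $\mathrm{round}(5t)$ pixels;
- $S$ contains one translate of each, as in Eq. (11).

Because each bar is one pixel thick, neither structuring element fits inside the other bar once it is longer than one pixel.

```python
def open_rows(img, length):
    """Opening by a horizontal segment: keep pixels in horizontal runs >= length."""
    out = [[0] * len(row) for row in img]
    for i, row in enumerate(img):
        j = 0
        while j < len(row):
            if not row[j]:
                j += 1
                continue
            k = j
            while k < len(row) and row[k]:
                k += 1
            if k - j >= max(length, 1):
                out[i][j:k] = [1] * (k - j)
            j = k
    return out


def open_cols(img, length):
    """Opening by a vertical segment, via transposition."""
    transposed = [list(col) for col in zip(*img)]
    return [list(col) for col in zip(*open_rows(transposed, length))]


def union_of_openings(img, t, base=5):
    """(S o tB1) U (S o tB2) with tB_i a segment of round(base * t) pixels."""
    n = int(base * t + 0.5)
    a, b = open_rows(img, n), open_cols(img, n)
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# S = (B1 + z1) U (B2 + z2): disjoint horizontal and vertical 5-pixel bars
S = [[0] * 9 for _ in range(7)]
for c in range(5):
    S[0][c] = 1          # B1 + z1
for r in range(2, 7):
    S[r][7] = 1          # B2 + z2
EMPTY = [[0] * 9 for _ in range(7)]

for t in (0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.6, 2.0):
    print(t, union_of_openings(S, t) == S, union_of_openings(S, t) == EMPTY)
```

Scales very close to $t = 1$ are better avoided in such a check, because rounding shifts the discrete cut-off slightly. For example, $t = 1.05$ still rounds to a 5-pixel segment.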
We apply the homogeneous multivariate granulometry with generator $\mathbf{B} = \{B_1, B_2\}$. Let $H(t) = 1$ if $t > 0$ and $H(t) = 0$ if $t \le 0$, and let $\bar{H}(t) = 1$ if $t \ge 0$ and $\bar{H}(t) = 0$ if $t < 0$. Then the bivariate size distribution, normalized bivariate size distribution, and bivariate pattern spectrum are given by
$$\Omega(t_1, t_2) = a[S]\,\bar{H}(1 - t_1)\,\bar{H}(1 - t_2) + a[S \circ t_1B_1]\,H(t_2 - 1) + a[S \circ t_2B_2]\,H(t_1 - 1), \qquad (13)$$
$$\Phi(t_1, t_2) = \Phi(t_1)H(t_2 - 1) + \Phi(t_2)H(t_1 - 1) - H(t_1 - 1)H(t_2 - 1), \qquad (14)$$
$$\Phi'(t_1, t_2) = \frac{\partial^2 \Phi(t_1, t_2)}{\partial t_1 \partial t_2} = \Phi'(t_1)\delta(t_2 - 1) + \Phi'(t_2)\delta(t_1 - 1) - \delta(t_1 - 1)\delta(t_2 - 1), \qquad (15)$$
respectively. These functions are illustrated in Figs. 1–3. There, $t_{12}$ is the largest scalar $t$ for which $a[S \circ tB_1] \ne 0$, and $t_{21}$ is the largest $t$ for which $a[S \circ tB_2] \ne 0$.
Now consider a more general situation in which $S$ is a disjoint union of translated scalar multiples of $B_1$ and $B_2$ of the form
$$S = \bigcup_{k=1}^{n} (t_kB_1 + z_{1,k}) \cup (t_kB_2 + z_{2,k}). \qquad (16)$$
The bivariate pattern spectrum is the generalized second-order mixed partial derivative of the normalized size distribution. It has the form given in Fig. 4, excluding the lines $\Gamma$, $\Gamma_{\max}$, and $\Gamma_{\min}$ and the shaded region $\Lambda$. Were we to apply the univariate homogeneous B-granulometry, its path would be the 45° line running through the triangles in the figure. This B-granulometry simultaneously eliminates pairs of grains $t_kB_1 + z_{1,k}$ and $t_kB_2 + z_{2,k}$. Its size distribution is a step function with a step of size $t_k^2(a[B_1] + a[B_2])$ at each $t_k$. A heterogeneous univariate granulometry whose path remains between $\Gamma_{\min}$ and $\Gamma_{\max}$ also has a step-function size distribution. However, same-sized grains are not eliminated simultaneously, so there are more steps, each corresponding to the elimination of individual grains.

Fig. 1. Bivariate size distribution.
Fig. 2. Normalized bivariate size distribution.
Fig. 3. Bivariate pattern spectrum.
Fig. 4. Comparison of bivariate and heterogeneous univariate size distributions.

4. Mixing theory
The homogeneous granulometric mixing theorems provide representations for granulometric moments of certain disjoint grain processes and asymptotic distributions for these moments. The mixing theory has been used to estimate parameters of grain-size distributions in random grain processes and to estimate the proportions of differing grain types. The next theorem, whose proof is in the appendix, provides a heterogeneous multivariate granulometric mixing theorem.
Theorem 1. Let $S$ be composed of randomly sized, disjoint translates arising from $d$ compact sets $A_1, A_2, \ldots, A_d$,
$$S = \bigcup_{i=1}^{d}\bigcup_{j=1}^{m_i} (r_{ij}A_i + x_{ij}), \qquad (17)$$
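and $\mathbf{h}(\mathbf{t}) = (t_1^{p_1}, t_2^{p_2}, \ldots, t_n^{p_n})$, with $p_i$ an integer for $i = 1, 2, \ldots, n$. The granulometric moments of the (B, h, t)-granulometry of Eq. (3) are given by
$$\mu^{(k)}(S) = \frac{\sum_{i=1}^{d} a[A_i]\,\mu^{(v/w)}(A_i)\sum_{j=1}^{m_i} s_{ij}^{L}}{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} s_{ij}^{2P}}, \qquad (18)$$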
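where $k = k_1 + k_2 + \cdots + k_n$, $P = p_1p_2\cdots p_n$, $s_{ij} = r_{ij}^{1/P}$,
$$\frac{v}{w} = \sum_{i=1}^{n}\frac{k_i + 1 - p_i}{p_i}, \qquad (19)$$
$$L = \left(2 + \sum_{i=1}^{n}\frac{k_i + 1}{p_i}\right)P. \qquad (20)$$
The Taylor expansion of $s_{ij}^L$ about $\bar{s}_i$, the sample mean of $s_{i1}, s_{i2}, \ldots, s_{im_i}$, is given by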
$$s_{ij}^{L} = \sum_{c=0}^{L}\binom{L}{c}\bar{s}_i^{\,L-c}(s_{ij} - \bar{s}_i)^c, \qquad (21)$$
where $\bar{s}_i = E[s_i]$. Therefore Eq. (18) can be rewritten as
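$$\mu^{(k)}(S) = \frac{\sum_{i=1}^{d} a[A_i]\,\mu^{(v/w)}(A_i)\sum_{j=1}^{m_i}\sum_{c=0}^{L}\binom{L}{c}\bar{s}_i^{\,L-c}(s_{ij} - \bar{s}_i)^c}{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i}\sum_{c=0}^{2P}\binom{2P}{c}\bar{s}_i^{\,2P-c}(s_{ij} - \bar{s}_i)^c} = \frac{\sum_{i=1}^{d} a[A_i]\,\mu^{(v/w)}(A_i)\,m_i\sum_{c=0}^{L}\binom{L}{c}\bar{s}_i^{\,L-c}m_{ic}}{\sum_{i=1}^{d} a[A_i]\,m_i\sum_{c=0}^{2P}\binom{2P}{c}\bar{s}_i^{\,2P-c}m_{ic}}, \qquad (22)$$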
where $m_{ic}$ is the $c$th sample central moment of $s_{i1}, s_{i2}, \ldots, s_{im_i}$. The representation for $\mu^{(k)}(S)$ in Eq. (18) is similar in form to the homogeneous multivariate granulometric mixing theorem for the model of Eq. (17) [14]. The main difference is that $\mu^{(v/w)}(A_i)$ is a fractional moment of the related granulometry. Another important difference from the homogeneous theory is that the Taylor series expansion is evaluated around the mean of the transformed variable $s_i$, not $r_i$. In terms of $r_{ij}$, the exponent $L/P$ is in general not an integer. Without this change of variables, the expansion would therefore have infinitely many terms, and it would be difficult to discuss the normality of the moments. Theorem 1 applies to a larger family of heterogeneous paths than the simple monomials $t_i^{p_i}$, since it depends only on some properties of these monomials. However, we see no practical benefit in stating the theorem more generally in terms of these properties. If $S$ is a random set of the form
$$S = \bigcup_{j=1}^{m} r_jA \qquad (23)$$
for which the scalars $r_1, r_2, \ldots, r_m$ are independent and identically distributed, then the discussions of Refs. [15,16] apply. We can then apply a theorem of Cramér [18] to conclude that $\mu^{(k)}(S)$ is asymptotically normal (as $m \to \infty$) and to obtain asymptotic expressions for the statistical moments of the granulometric moments $\mu^{(k)}(S)$. It can be shown that the expectation $E[\mu^{(k)}(S)]$ and variance
$\mathrm{Var}[\mu^{(k)}(S)]$ converge to their asymptotic expressions at rates $O(m^{-1})$ and $O(m^{-3/2})$, respectively.
We next state a heterogeneous multivariate extension of the asymptotic granulometric mixing theorem [17]. The original (long) proof goes through with some minor changes. In stating the theorem, we use the following notation:
- $m = m_1 + m_2 + \cdots + m_d$ is the total sample size;
- $u$ is the numerator in Eq. (18) divided by $m$, and $v$ is the denominator divided by $m$;
- $H(u, v) = u/v = \mu^{(k)}(S)$;
- $Eu$ and $Ev$ are the expectations $E[u]$ and $E[v]$ of $u$ and $v$, respectively.

Theorem 2. Let $S$ be a random set of the form given in Eq. (17), and assume the following:
- the random grain-sizing variables $r_{ij}$ are independent;
- each $r_{ij}$ is selected from a sizing distribution $\Pi_i$ possessing moments up to order $k + 2$;
- the counts $m_1, m_2, \ldots, m_d$ occur in known fixed proportions $\theta_i = m_i/m$, $i = 1, 2, \ldots, d$;
- there exist a bound $C$, independent of $m$, and $q > 0$ such that $|H| \le Cm^q$ for $m > 1$;
- $H$ has first and second derivatives, and its second derivatives are bounded by a constant independent of $m$ in a neighborhood of $(Eu, Ev)$.

Then, for the (B, h, t)-granulometry of Eq. (3), the distribution of $H = \mu^{(k)}(S)$ is asymptotically normal with mean and variance given by
$$E[H] = H(Eu, Ev) + O(m^{-1}), \qquad (24)$$
$$\mathrm{Var}[H] = \left(\frac{\partial H}{\partial u}(Eu, Ev)\right)^2\mathrm{Var}[u] + 2\,\frac{\partial H}{\partial u}(Eu, Ev)\,\frac{\partial H}{\partial v}(Eu, Ev)\,\mathrm{Cov}[u, v] + \left(\frac{\partial H}{\partial v}(Eu, Ev)\right)^2\mathrm{Var}[v] + O(m^{-3/2}). \qquad (25)$$
(Note that $Eu$ and $Ev$ are calculated via the expected values of the transformed variables.) To illustrate Theorem 2, consider the heterogeneous scaling function $h^{-1}(t) = t^{1/2}$ and the single-primitive random set of Eq. (23) governed by a gamma distribution with parameters $\alpha$ and $\beta$. With a gamma sizing distribution, the asymptotic expression for the expected value of the pattern-spectrum mean can be computed in fractional cases. In this case the numerator (Eq. (24)) of the asymptotic expression for the pattern-spectrum mean requires $E[X^{2.5}]$. (This expectation does not exist for Gaussian sizing distributions, since fractional exponents applied to the negative part of the range yield imaginary values.) The fractional-power moment is evaluated as
$$\mu^{(2.5)} = C\int_0^{\infty} x^{\alpha + 2.5 - 1}e^{-x/\beta}\,dx, \qquad (26)$$
where $C$ is the normalization constant for the probability integral. Applying the identity $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$ twice and using the fact that $\Gamma(0.5) = \pi^{1/2}$, we find that for large $N$,
$$E[\mu^{(1)}] \approx (\pi\beta)^{0.5}\,\frac{(\alpha + 1.5)(\alpha + 0.5)\cdots(0.5)}{(\alpha + 1)!}. \qquad (27)$$
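Eq. (26) rests on the standard gamma moment. Assume the shape/scale parameterization, with shape $\alpha$, scale $\beta$, and $C = 1/(\Gamma(\alpha)\beta^{\alpha})$. Then $E[X^q] = \beta^q\,\Gamma(\alpha + q)/\Gamma(\alpha)$ for any real $q > -\alpha$. The sketch below compares that closed form with a direct midpoint-rule evaluation of the integral for the fractional exponent $q = 2.5$. The function names and the truncation point of the integral are hypothetical choices.

```python
import math


def gamma_moment(q, alpha, beta):
    """E[X^q] for X ~ Gamma(shape alpha, scale beta): beta^q * Gamma(alpha + q) / Gamma(alpha)."""
    return beta ** q * math.gamma(alpha + q) / math.gamma(alpha)


def gamma_moment_numeric(q, alpha, beta, steps=100_000):
    """Midpoint-rule evaluation of Eq. (26) with C = 1 / (Gamma(alpha) * beta^alpha)."""
    upper = beta * (alpha + q + 40.0)      # truncation point; adequate for moderate alpha
    c = 1.0 / (math.gamma(alpha) * beta ** alpha)
    dx = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += x ** (alpha + q - 1.0) * math.exp(-x / beta)
    return c * total * dx


closed = gamma_moment(2.5, 10.0, 1.0)
numeric = gamma_moment_numeric(2.5, 10.0, 1.0)
```

Integer exponents give a quick sanity check: $E[X^2] = \beta^2\alpha(\alpha + 1)$, which is 110 for $\alpha = 10$, $\beta = 1$.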
A simulation with 100 images having a mean number of 50 circular grains, governed by a gamma sizing distribution with $\alpha = 10$ and $\beta = 1$, produced 3.29 for the average pattern-spectrum mean, the exact value predicted by Eq. (27).
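For $H(u, v) = u/v$ in Theorem 2, the partial derivatives in Eq. (25) are $\partial H/\partial u = 1/v$ and $\partial H/\partial v = -u/v^2$. The leading terms of Eqs. (24) and (25) therefore take only a few lines to evaluate. The sketch below drops the $O(\cdot)$ remainders, and its function names are hypothetical. It assumes that $Eu$, $Ev$, $\mathrm{Var}[u]$, $\mathrm{Var}[v]$, and $\mathrm{Cov}[u, v]$ are either known or estimated from replicate images; the helper `sample_moments` does the latter.

```python
def delta_method_ratio(Eu, Ev, var_u, var_v, cov_uv):
    """Leading terms of Eqs. (24)-(25) for H(u, v) = u / v; O(.) remainders dropped."""
    dH_du = 1.0 / Ev           # dH/du evaluated at (Eu, Ev)
    dH_dv = -Eu / Ev ** 2      # dH/dv evaluated at (Eu, Ev)
    mean = Eu / Ev
    var = dH_du ** 2 * var_u + 2 * dH_du * dH_dv * cov_uv + dH_dv ** 2 * var_v
    return mean, var


def sample_moments(pairs):
    """Means, unbiased variances and covariance of replicate (u, v) observations."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    var_u = sum((u - mu) ** 2 for u, _ in pairs) / (n - 1)
    var_v = sum((v - mv) ** 2 for _, v in pairs) / (n - 1)
    cov = sum((u - mu) * (v - mv) for u, v in pairs) / (n - 1)
    return mu, mv, var_u, var_v, cov


# Usage: per-image values of u and v (hypothetical numbers) -> asymptotic mean and variance
pairs = [(1.0, 2.0), (3.0, 4.0)]
mean, var = delta_method_ratio(*sample_moments(pairs))
```

A negative covariance between numerator and denominator increases $\mathrm{Var}[H]$, and a positive one decreases it, because $\partial H/\partial v < 0$.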
5. Heterogeneous granulometric classification
Fig. 5. Bivariate size densities corresponding to three different textures and granulometric analysis employing two structuring elements.
Univariate heterogeneous granulometries can achieve greater discrimination than univariate homogeneous granulometries without the increased computational cost of multivariate granulometries. Given a set of structuring elements, the path defining a univariate heterogeneous granulometry can be selected to increase class separation among textures. This principle can be discerned with the aid of Fig. 5, which depicts bivariate
Fig. 6. Synthetic textures.
size densities corresponding to three different textures and granulometric analysis employing two structuring elements. Paths 1 and 2 correspond to the marginal single-structuring-element granulometries. Path 3 extracts the univariate homogeneous granulometry involving both structuring elements. Path 4 corresponds to a heterogeneous scaling. The pattern spectra corresponding to paths 1–4 are depicted in the figure. Examination of the pattern-spectrum profiles reveals the superiority of the heterogeneous approach. The spectra of all three
Fig. 7. Real and binarized real textures.
texture classes are clearly separated, without any overlap, in the profile generated by path 4. All other granulometries exhibit a substantial amount of inter-class overlap, which will produce features with lower discriminating power.
We next illustrate the discriminative power of heterogeneous granulometries with respect to slight texture variations, using vertical and horizontal linear structuring elements. We employ the same synthetic textures (Fig. 6) and binarized real textures (Fig. 7) that we previously used for multivariate granulometric classification [14]. The synthetic textures have been generated with primitives that produce strongly overlapping pattern spectra. Fifteen parameterized ellipsoidal curves are used as heterogeneous paths (Fig. 8), and these are discretized for digital application (Fig. 9). Heterogeneous pattern spectra are derived by sampling the multivariate granulometric size distributions along these paths. A large number of fractional moments of the univariate pattern spectra are extracted into feature vectors, to obtain texture representations that are as complete as possible. The granulometric fractional moments associated with each path are then projected into a compressed feature set via the Karhunen–Loève transform, and the transformed features are used for training and classification. For the synthetic textures, the first heterogeneous path above the B-granulometry (diagonal) had the highest classification rate, 96.0%. In this case, the B-granulometry also did fairly well, with a classification rate of 94.7%. In both cases, the best classification is achieved with the same number of features. At such relatively high classification rates, an improvement of 1.3% is significant. There was extensive
performance variation across different paths. The peak performance goes as low as 80% for both of the marginal granulometries. For the real images, almost perfect classification is achieved using the first path below the diagonal, and again the marginal granulometries were among the poorest performers. The results clearly suggest that heterogeneous granulometries can extract more information from a texture process than their traditional counterparts.
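Sampling a multivariate size distribution along a discretized path, as described above, amounts to reading a two-dimensional table along a monotone staircase. The exact parameterization of the fifteen ellipsoidal curves in Fig. 8 is not given here. The sketch therefore uses a hypothetical monotone family $\mathbf{h}(u) = (Tu^{a}, Tu^{1/a})$, $u \in [0, 1]$, in which $a = 1$ reproduces the diagonal B-granulometry. The toy size-distribution table is also invented for illustration. Moments are taken with respect to the step index along the path, which is only one possible convention, and the exponent may be fractional.

```python
def discretized_path(T, a, samples=400):
    """Hypothetical monotone path h(u) = (T u^a, T u^(1/a)), u in [0, 1], rounded to
    integer (t1, t2) indices; consecutive duplicates are dropped. a = 1 is the diagonal."""
    path = []
    for k in range(samples + 1):
        u = k / samples
        point = (int(T * u ** a + 0.5), int(T * u ** (1.0 / a) + 0.5))
        if not path or point != path[-1]:
            path.append(point)
    return path


def path_spectrum(omega, path):
    """Sample omega[t1][t2] along the path; return Phi and its increments (masses)."""
    total = omega[0][0]
    phi = [1.0 - omega[i][j] / total for i, j in path]
    masses = [b - a for a, b in zip(phi, phi[1:])]
    return phi, masses


def path_moment(masses, k):
    """k-th (possibly fractional) moment, indexing masses by step number 1, 2, ..."""
    return sum((step + 1) ** k * m for step, m in enumerate(masses))


# Invented size-distribution table: 4 pixels vanish beyond t1 = 2, 6 pixels beyond t2 = 3
omega = [[(4 if i <= 2 else 0) + (6 if j <= 3 else 0) for j in range(5)] for i in range(5)]

diagonal = discretized_path(4, 1.0)
phi, masses = path_spectrum(omega, diagonal)
bowed = discretized_path(4, 2.0)
phi_b, masses_b = path_spectrum(omega, bowed)
```

Along the diagonal, both grain types are cut at consecutive steps, and the spectrum masses are 0.4 and 0.6. A bowed path visits the table cells in a different order, which redistributes those masses over more steps.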
Fig. 8. Parameterized ellipsoidal curves to be used as heterogeneous paths.
Fig. 9. Discretized ellipsoidal curves.

6. Conclusion
Heterogeneous granulometries form an extended class of granulometries based on nonlinear scaling. As such, they provide greater flexibility for the formation of sieving filters and for texture classification. In particular, univariate heterogeneous granulometries can provide better classification than univariate homogeneous granulometries without incurring the increased computational cost of multivariate granulometries. There is a straightforward relationship between homogeneous multivariate orthogonal granulometries and heterogeneous univariate orthogonal granulometries. Both the representational and asymptotic mixing theorems for homogeneous granulometries extend to heterogeneous granulometries. The mixing theory applies to the disjoint grain model of Eq. (17). A natural question concerns the status of the mixing theory when grains overlap. One way to interpret the issue is in terms of robustness. Specifically, suppose there is grain overlapping and the image is
segmented so that the new components "approximate" the original grains prior to overlapping. If granulometric computations are made on the segmented image, to what degree does the representation of Eq. (18) remain valid? To wit, let $S$ be the segmented image, and let $S_0$ be an image formed as a disjoint union of the grains whose union forms the unsegmented image. Can we say something about the difference between $\mu^{(k)}(S)$ and $\mu^{(k)}(S_0)$? A tight quantification of this difference in the context of heterogeneous granulometries has yet to be discovered. However, we can look to the univariate homogeneous theory to see the kind of result that we might be able to achieve. Consider the single-structuring-element granulometry $S \circ tB$, under practical overlap constraints and a suitable segmentation procedure. There exist lower and upper bounds $u_1$ and $u_2$, which depend on the degree, $\delta$, of overlap, such that $u_1 \le \mu^{(k)}(S) \le u_2$. For certain models, $u_1, u_2 \to \mu^{(k)}(S_0)$ as $\delta \to 0$ (no overlap) [19].
Appendix A
We prove Theorem 1. The Euclidean property of an ordinary opening, $rA \circ B = r(A \circ B/r)$, together with the fact that area is homogeneous of degree 2, leads to
$$\Omega_S(\mathbf{h}(\mathbf{t})) = \sum_{i=1}^{d}\sum_{j=1}^{m_i} a\!\left[\bigcup_{k=1}^{n} r_{ij}A_i \circ h_k(t_k)B_k\right] = \sum_{i=1}^{d}\sum_{j=1}^{m_i} r_{ij}^2\,a\!\left[\bigcup_{k=1}^{n} A_i \circ \frac{h_k(t_k)}{r_{ij}}B_k\right] = \sum_{i=1}^{d}\sum_{j=1}^{m_i} r_{ij}^2\,\Omega_{A_i}\!\left(\frac{\mathbf{h}(\mathbf{t})}{r_{ij}}\right), \qquad (28)$$
where $\mathbf{h}(\mathbf{t}) = (h_1(t_1), \ldots, h_n(t_n))$. Applying the identity
$$-\Omega_A\!\left(\frac{\mathbf{h}(\mathbf{t})}{r_{ij}}\right) = a[A]\left(\Phi_A\!\left(\frac{\mathbf{h}(\mathbf{t})}{r_{ij}}\right) - 1\right) \qquad (29)$$
to the definition of $\Phi_S$, along with algebraic manipulation, yields
$$\Phi_S(\mathbf{h}(\mathbf{t})) = \frac{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^2\,\Phi_{A_i}(h_1(t_1)/r_{ij}, \ldots, h_n(t_n)/r_{ij})}{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^2}. \qquad (30)$$
Hence, the $k$th moment of the pattern spectrum is
$$\mu^{(k)}(S) = \frac{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^2\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} t_1^{k_1}\cdots t_n^{k_n}\,d\Phi_{A_i}(h_1(t_1)/r_{ij}, \ldots, h_n(t_n)/r_{ij})}{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^2}$$
$$= \frac{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^{2+n}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\frac{(h_1^{-1}(r_{ij}s_1))^{k_1}\cdots(h_n^{-1}(r_{ij}s_n))^{k_n}}{h_1'(h_1^{-1}(r_{ij}s_1))\cdots h_n'(h_n^{-1}(r_{ij}s_n))}\,d\Phi_{A_i}(s_1, \ldots, s_n)}{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^2}. \qquad (31)$$
If $h_i(t_i) = t_i^{p_i}$, where $p_i$ is a positive integer, then the preceding expression reduces to
$$\mu^{(k)}(S) = \frac{1}{p_1\cdots p_n}\,\frac{\sum_{i=1}^{d} a[A_i]\,\mu^{((k_1+1-p_1)/p_1 + \cdots + (k_n+1-p_n)/p_n)}(A_i)\sum_{j=1}^{m_i} r_{ij}^{2 + (k_1+1)/p_1 + \cdots + (k_n+1)/p_n}}{\sum_{i=1}^{d} a[A_i]\sum_{j=1}^{m_i} r_{ij}^2}, \qquad (32)$$
which is the desired result.
The change of variables used in Eq. (31) requires some justification. Eq. (31) states that
$$E[t_1^{k_1}\cdots t_n^{k_n}] = E[(h_1^{-1}(r_{ij}s_1))^{k_1}\cdots(h_n^{-1}(r_{ij}s_n))^{k_n}] \qquad (33)$$
in their respective probability spaces. Because $A_i$ is compact, for each $i$ the multiple Stieltjes integral involves a bounded integrand over a compact set. Let $\mathcal{P} = \mathcal{P}(\Omega_{A_i}, \mathcal{A}, P)$ and $\mathcal{P}' = \mathcal{P}'(\Omega'_{A_i}, \mathcal{A}', P')$ be the probability spaces associated with the distribution functions $\Phi_{A_i}(t_1, \ldots, t_n)$ and $\Phi_{A_i}(h_1(t_1)/r_{ij}, \ldots, h_n(t_n)/r_{ij})$, respectively. Let the event spaces $\Omega_{A_i}$ and $\Omega'_{A_i}$ coincide with the domains of their respective distribution functions on $\mathbb{R}^n$. Because of the constraints on $h_i(t_i)$, $\Omega_{A_i}$ and $\Omega'_{A_i}$ are the compact regions shown in Fig. 10. The random vectors $X = (X_1, \ldots, X_n)$ and $X' = (X'_1, \ldots, X'_n)$ associated with these probability spaces are obtained via the identity mappings
$$I_{A_i}: (\Omega_{A_i}, \mathcal{A}) = (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}) \to (\Omega_{A_i}, \mathcal{A}) = (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}), \qquad (34)$$
where $\mathcal{B}$ denotes the Borel field. The associated distribution functions $F$ and $F'$ are the images of $P$ and $P'$ under the equivalence class of these mappings. Let $\phi(t_1, t_2, \ldots, t_n)$ be the measurable mapping between the probability spaces $\mathcal{P}$ and $\mathcal{P}'$ defined by $\phi(t_1, t_2, \ldots, t_n) = (h_1(t_1), h_2(t_2), \ldots, h_n(t_n))$. For any elementary $n$-dimensional rectangular event $R'$ on $\mathcal{P}'$, there is an associated rectangular event $R$ on $\mathcal{P}$ such that $\Delta F(R) = \Delta F'(R')$. Referring to Fig. 10, it is clear that
$$\Delta F'(R') = F'(a') + F'(c') - F'(b') - F'(d') = F(\phi^*(a')) + F(\phi^*(c')) - F(\phi^*(b')) - F(\phi^*(d')) = \Delta F(R). \qquad (35)$$
Therefore, $\phi: \mathcal{P} \to \mathcal{P}'$ is a morphism between the probability spaces $\mathcal{P}$ and $\mathcal{P}'$. Consequently, for any integrable function $g$ on $\mathcal{P}'$, $\phi^*g$ is integrable on $\mathcal{P}$ and $E[\phi^*g] = E[g]$, where the expectations are taken in the corresponding measure spaces. Hence, the measure $dF$ can be related to $dF'$ via the Jacobian $J$ of $\phi(t_1, t_2, \ldots, t_n)$ as $dF = dF'/J$. This is the content of the second equality of Eq. (31).
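The substitution behind the second equality of Eq. (31), and its monomial form in Eq. (32), can be checked numerically in one dimension. The check takes $n = 1$ and leaves out the area weight $r_{ij}^2$. It rests on two hypothetical assumptions:

- The pattern-spectrum density of $A$ is $2s$ on $[0, 1]$, whose fractional moments are $2/(q + 2)$.
- $d\Phi_A(h(t)/r)$ is read as that density evaluated at $h(t)/r$ times $dt$. This is the reading under which the factor $r/h'(h^{-1}(rs))$ in Eq. (31) appears.

Function names are illustrative only.

```python
def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    w = (b - a) / steps
    return w * sum(f(a + (k + 0.5) * w) for k in range(steps))


def dens(s):
    """Hypothetical pattern-spectrum density of A on [0, 1]."""
    return 2.0 * s if 0.0 <= s <= 1.0 else 0.0


def frac_moment(q):
    """Fractional moment of dens: integral of s^q * 2s over [0, 1]."""
    return 2.0 / (q + 2.0)


def before_substitution(k, p, r):
    """First form in Eq. (31), n = 1: integral of t^k dens(h(t)/r) dt with h(t) = t^p."""
    return midpoint(lambda t: t ** k * dens(t ** p / r), 0.0, r ** (1.0 / p))


def after_substitution(k, p, r):
    """Second form: t = h^{-1}(r s) = (r s)^(1/p), dt = r ds / h'(t), h'(t) = p t^(p-1)."""
    def integrand(s):
        t = (r * s) ** (1.0 / p)
        return t ** k * dens(s) * r / (p * t ** (p - 1))
    return midpoint(integrand, 0.0, 1.0)


def monomial_form(k, p, r):
    """Eq. (32) with n = 1 (area weight r^2 omitted): r^((k+1)/p) mu^((k+1-p)/p) / p."""
    return r ** ((k + 1) / p) * frac_moment((k + 1 - p) / p) / p


k, p, r = 3, 2, 1.7
values = (before_substitution(k, p, r), after_substitution(k, p, r), monomial_form(k, p, r))
```

All three values agree with the closed form $2r^{(k+1)/p}/(k + p + 1)$. The check also shows where the fractional exponent $(k + 1 - p)/p$ and the power $r^{(k+1)/p}$ in Eq. (32) come from.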
Fig. 10. Mappings for morphism.
References
[1] Y. Chen, E.R. Dougherty, Optimal and adaptive reconstructive granulometric bandpass filters, Signal Process. 61 (1997) 65–81.
[2] E.R. Dougherty, Optimal binary morphological bandpass filters induced by granulometric spectral representation, Math. Imaging Vision 7 (2) (1997) 175–192.
[3] E.R. Dougherty, C. Cuciurean-Zapan, Optimal reconstructive τ-openings for disjoint and statistically modeled nondisjoint grains, Signal Process. 56 (1997) 45–58.
[4] Y. Chen, E.R. Dougherty, Adaptive reconstructive τ-openings: convergence and the steady-state distribution, Electron. Imag. 5 (3) (1996) 266–282.
[5] R.M. Haralick, P.L. Katz, E.R. Dougherty, Model-based morphology: the opening spectrum, CVGIP: Graph. Models Image Process. 57 (1995) 1–12.
[6] P. Maragos, Pattern Spectrum and Multiscale Shape Representation, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 701–716.
[7] E.R. Dougherty, J. Newell, J. Pelz, Morphological texture-based maximum-likelihood pixel classification based on local granulometric moments, Pattern Recognition 25 (10) (1992) 1181–1198.
[8] Y. Chen, E.R. Dougherty, Gray-scale morphological granulometric texture classification, Opt. Eng. 33 (8) (1994) 2713–2722.
[9] Y. Chen, E.R. Dougherty, S. Totterman, J. Hornak, Classification of trabecular structure in magnetic resonance images based on morphological granulometries, Magn. Resonance Med. 29 (3) (1993) 358–370.
[10] K. Sivakumar, J. Goutsias, Discrete morphological size distributions and densities: estimation techniques and applications, Electron. Imaging 6 (1997) 65–75.
[11] C. Bhagvati, D.A. Grivas, M.M. Skolnick, Morphological analysis of pavement surface condition, in: E. Dougherty (Ed.), Mathematical Morphology in Image Processing, Marcel Dekker, New York, 1993.
[12] S. Baeg, S. Batman, E.R. Dougherty, V. Kamat, N. Kehtarnavaz, S. Kim, A. Popov, K. Sivakumar, R. Shah, Unsupervised morphological granulometric texture segmentation of digital mammograms, Electron. Imaging 8 (1) (1999) 31–53.
[13] G. Matheron, Random Sets and Integral Geometry, John Wiley, New York, 1975.
[14] S. Batman, E.R. Dougherty, Size distributions for multivariate morphological granulometries: texture classification and statistical properties, Opt. Eng. 36 (5) (1997) 1518–1529.
[15] F. Sand, E.R. Dougherty, Asymptotic normality of the morphological pattern-spectrum moments and orthogonal granulometric generators, Visual Commun. Image Representation 3 (2) (1992) 203–214.
[16] E.R. Dougherty, F. Sand, Representation of linear granulometric moments for deterministic and random binary Euclidean images, Visual Commun. Image Representation 6 (1) (1995) 69–79.
[17] F. Sand, E.R. Dougherty, Asymptotic granulometric mixing theorem: morphological estimation of sizing parameters and mixture proportions, Pattern Recognition 31 (1) (1998) 53–61.
[18] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton, 1946.
[19] F. Sand, E.R. Dougherty, Robustness of granulometric moments, Pattern Recognition 32 (9) (1999) 1659–1665.
About the Author: SINAN BATMAN holds a B.Sc., 1990, in electrical engineering from Middle East Technical University, Ankara, Turkey, and a Ph.D., 1998, in electrical engineering from Texas A&M University, specializing in nonlinear stochastic signal processing. He is currently an associate research scientist in the Center for Imaging Science at the Electrical and Computer Engineering Department of The Johns Hopkins University. His current areas of interest are the development of nonlinear techniques in the areas of biomedical image processing and automated mine detection. He has published several journal articles and conference proceedings in the areas of nonlinear image processing, stochastic pattern recognition, and optimization theory.
About the Author: EDWARD R. DOUGHERTY holds an M.S. in computer science from Stevens Institute of Technology and a Ph.D. in mathematics from Rutgers University. He is currently a Professor in the Department of Electrical Engineering at Texas A&M University. He is editor of the SPIE/IS&T Journal of Electronic Imaging and of the SPIE/IEEE Series on Imaging Science and Engineering. He is the author of eleven books and editor of four books, and he has published numerous papers in nonlinear filtering and mathematical morphology. His current interests are the optimal design of nonlinear filters, granulometric analysis, and informatics for cDNA microarrays.
About the Author: FRANCIS SAND is an Associate Professor in Fairleigh Dickinson University's School of Computer Science and Information Systems in New Jersey. He holds a Ph.D. in Mathematics from Princeton University, an M.S. (Applied Math.) from N.Y. Polytechnic University, and a B.Sc. (Physics, Math.) from Cape Town University in South Africa. He has published extensively in statistics, operations research, systems theory, and mathematical morphology. His current research interests include development of a practical approach to robust filter design for a wide class of images. He is a leader in the use of computers for distance learning at FDU and is currently teaching a graduate course in computer science via the Internet.
Pattern Recognition 33 (2000) 1059–1081
A switching algorithm for design of optimal increasing binary filters over large windows
Nina S.T. Hirata$^{a,1}$, Edward R. Dougherty$^{b,*}$, Junior Barrera$^{a}$
$^{a}$ Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010 São Paulo, 05508-900, Brazil
$^{b}$ Computer-Assisted Medical Diagnostic Imaging Laboratory, Department of Electrical Engineering, Texas A & M University, College Station, TX 77843-3128, USA
Received 18 January 1999; received in revised form 2 May 1999; accepted 23 June 1999
Abstract
All known approaches for the design of increasing translation-invariant binary window filters involve combinatoric searches. This paper proposes a new switching algorithm with the advantage that the search is over a smaller set than in other algorithms.

The algorithm begins with an estimate, from image realizations, of the optimal generic (not necessarily increasing) window function. It then switches (exchanges) a set of observation vectors (templates) between the optimal function's kernel and the kernel's complement. There are many such "switching sets" that provide a kernel defining an increasing filter. The optimal increasing filter is the one corresponding to the switching set that produces the minimal increase in error over the optimal generic filter.

The core of the search problem is the inversion set of the optimal filter. The inversion set is composed of two kinds of vectors in the lattice of observation vectors: all kernel vectors lying beneath a nonkernel vector, and all nonkernel vectors lying above a kernel vector. The new algorithm is based on an error-related greedy property. It recursively eliminates the inversion set until the optimal increasing filter is obtained. For computational efficiency, the actual implementation may be based on a relaxation of the original construction, so that the result may be suboptimal. For the various models tested, the relaxed algorithm has proven to be optimal or very close to optimal.

Besides its good estimation precision, the new algorithm has three noteworthy properties:
- it is applicable to relatively large windows;
- it operates directly on the input data via estimates of the determining conditional probabilities;
- the degree of relaxation serves as an input parameter, so that computation time can be bounded for large windows and the algorithm can run to full optimality for small windows.
© 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Increasing filter; Optimal filter; Nonlinear filter; Boolean lattice; Switching algorithm; Greedy algorithm
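The inversion set described in the abstract can be computed by brute force for small windows. The following Python sketch represents a Boolean function by its kernel, a set of 0/1 tuples, and enumerates the whole lattice $\{0,1\}^n$, so it is practical only for small $n$. Its helper names are hypothetical. It is not the switching algorithm itself, only the definition of the set that the algorithm eliminates. A function is increasing exactly when no kernel vector lies beneath a nonkernel vector, that is, when this set is empty.

```python
from itertools import product


def leq(x, y):
    """Componentwise order on binary vectors: x <= y."""
    return all(a <= b for a, b in zip(x, y))


def inversion_set(kernel, n):
    """Kernel vectors lying beneath some nonkernel vector, together with nonkernel
    vectors lying above some kernel vector, in the lattice {0,1}^n (brute force)."""
    kernel = set(kernel)
    zero_set = [x for x in product((0, 1), repeat=n) if x not in kernel]
    below = {x for x in kernel if any(leq(x, y) for y in zero_set)}
    above = {y for y in zero_set if any(leq(x, y) for x in kernel)}
    return below | above


def is_increasing(kernel, n):
    """psi is increasing iff its inversion set is empty."""
    return not inversion_set(kernel, n)


print(inversion_set({(1, 0)}, 2))          # (1, 0) lies beneath the nonkernel (1, 1)
print(is_increasing({(0, 1), (1, 0), (1, 1)}, 2))
```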
* Corresponding author. Tel.: +1-409-862-8154; fax: +1-409-862-3336. E-mail address: [email protected] (E.R. Dougherty)
¹ Partially supported by CAPES Foundation (Brazil) and the Department of Electrical Engineering of Texas A & M University.
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00165-X

1. Introduction
This paper presents a new algorithm to design increasing, translation-invariant window operators for binary images from experimental data. The task is statistical in nature. We are given data from pairs of images: one member of each pair is a realization of the ideal image to be estimated by a filter, and the other is a realization of the observed image that serves as input to the filter. From these data we desire an algorithm to estimate the optimal filter. In any such estimation problem, precision of estimation depends on the complexity of the filter and the amount of data available. Here we also confront an additional problem, the complexity of the estimation (design) algorithm. From a purely probabilistic perspective, one would normally desire an estimate of the optimal filter. However,
N.S.T. Hirata et al. / Pattern Recognition 33 (2000) 1059}1081
increasing operators can be desirable for three reasons:
(1) the increasing property, combined with translation invariance, provides algebraic structure for representational and geometric analysis within the context of mathematical morphology;
(2) increasing operators typically require far less logic circuitry for implementation;
(3) for a given amount of experimental data, estimation of the optimal increasing filter is more precise than estimation of the optimal generic (not necessarily increasing) filter.
Algebraic structure matters for image analysis, and logic cost matters for commercial implementation. Our emphasis in the present paper is on design from data.
A binary digital-image operator $\Psi$ is translation invariant if $\Psi(f_z) = (\Psi(f))_z$ for any binary digital image $f$ (a binary-valued function defined on the Cartesian grid) and any pixel $z$. Here $f_z$ is the translation of $f$ by $z$, defined by $f_z(w) = f(w - z)$. $\Psi$ is increasing if $f_1 \le f_2$ implies $\Psi(f_1) \le \Psi(f_2)$. Translation-invariant operators are morphologically represented as suprema (unions) of hit-or-miss transforms [1]. Increasing, translation-invariant operators are represented as suprema (unions) of erosions [2]. Specifically, the kernel of $\Psi$ is the class $K[\Psi]$ of all images $f$ such that $\Psi(f)(0) = 1$, and $\Psi(f)$ is equal to the supremum of all erosions of $f$ by images in $K[\Psi]$. The kernel expansion can be reduced by employing the basis, $B[\Psi]$, of $\Psi$, which is the set of all minimal elements in the kernel. Under suitable conditions, an increasing operator has a basis, and $\Psi(f)$ is equal to the supremum of all erosions of $f$ by elements in $B[\Psi]$ [3,4].
Here we consider W-operators: there exist an $n$-pixel window $W = \{w_1, w_2, \ldots, w_n\}$ and a Boolean function $\psi$ such that $\Psi$ is defined by $\Psi(f)(z) = \psi(f(w_1 + z), f(w_2 + z), \ldots, f(w_n + z))$. Thus $\psi$ is a binary-valued function, $\psi(x)$, defined on vectors $x = (x_1, x_2, \ldots, x_n)$ of the $n$ binary variables of $f$ in $W$.
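A W-operator is just a Boolean function slid over the image. The sketch below applies $\Psi(f)(z) = \psi(f(w_1 + z), \ldots, f(w_n + z))$ to a binary image stored as a list of rows, with window offsets given as (row, column) pairs. Reading pixels outside the image as 0 is an assumed border convention, not one stated in the text. The two example functions, a three-pixel majority (median) and an erosion, are both increasing. All names are hypothetical.

```python
def w_operator(img, window, psi):
    """Psi(f)(z) = psi(f(w_1 + z), ..., f(w_n + z)) for every pixel z = (row, col).
    Pixels outside the image are read as 0 (an assumed border convention)."""
    rows, cols = len(img), len(img[0])

    def f(r, c):
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [[psi(tuple(f(r + dr, c + dc) for dr, dc in window)) for c in range(cols)]
            for r in range(rows)]


# A 1 x 3 horizontal window W = {(0, -1), (0, 0), (0, 1)}
WINDOW = [(0, -1), (0, 0), (0, 1)]


def majority(x):
    """Three-pixel median: increasing Boolean function."""
    return int(sum(x) >= 2)


def erosion(x):
    """Erosion by the window: 1 only if every pixel in W + z is 1."""
    return int(all(x))


print(w_operator([[0, 1, 1, 0, 1, 0, 0]], WINDOW, majority))  # [[0, 1, 1, 1, 0, 0, 0]]
```

Because the same $\psi$ is applied at every $z$, the operator is translation invariant by construction, apart from the border convention.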
Letting $X = \{0, 1\}^n$ denote the set of all binary vectors over $W$, the kernel is expressed in terms of $\psi$ by $K[\psi] = \{x \in X : \psi(x) = 1\}$. The kernel is also called the 1-set of $\psi$, and its complement in $X$ is called the 0-set of $\psi$. When using this terminology we write $K[\psi]$ and $K[\psi]^c$ as $\psi\langle 1\rangle$ and $\psi\langle 0\rangle$, respectively. The hit-or-miss expansion of $\Psi$ corresponds to the disjunctive logical expansion of $\psi$. $\Psi$ is increasing if and only if $\psi$ is increasing, meaning that $x \le y$ implies $\psi(x) \le \psi(y)$. Equivalently, $\psi$ is increasing if and only if no minimal element of $K[\psi]$ lies beneath a maximal element of $K[\psi]^c$. The basis, $B[\Psi]$, is the set of minimal elements in $K[\psi]$. For increasing $\Psi$, the basis expansion of $\Psi$ corresponds to the reduced positive (complement-free) logical expansion of $\psi$, and the expansion always represents $\psi$ [5,6].
Finding an optimal W-filter corresponds to finding an optimal Boolean function, in the following random setting:
- $f$ is a random image, so the observed vector $X = (X_1, X_2, \ldots, X_n)$ in $W$ is random;
- the aim is to estimate a binary random variable $Y$, the value at a pixel in the image to be estimated by the filter.

The optimal mean-absolute-error (MAE) estimator, $\psi_{\mathrm{opt}}(X)$, of $Y$ minimizes the expectation
$$\mathrm{MAE}\langle\psi\rangle = E[|Y - \psi(X)|] = \sum_{x \in \psi\langle 0\rangle} P(x)\,p_x + \sum_{x \in \psi\langle 1\rangle} P(x)(1 - p_x) \qquad (1)$$
over all mappings $\psi$. Here $P(x)$ is the probability of observing $x$, and $p_x = P(Y = 1 \mid x)$ is the conditional probability that $Y = 1$ given $X = x$. $\psi_{\mathrm{opt}}(x)$ is the binary conditional expectation of $Y$ given $x$: $\psi_{\mathrm{opt}}(x) = 1$ if $p_x > 0.5$, and $\psi_{\mathrm{opt}}(x) = 0$ if $p_x \le 0.5$. If $\psi$ is not an optimal estimator, then applying $\psi$ instead of $\psi_{\mathrm{opt}}$ causes an error increase, $\Delta(\psi, \psi_{\mathrm{opt}})$. Rather than optimize over all possible operators on $X$, we can optimize over a subclass $C$ of operators. The optimal filter in $C$, $\psi_C$, is suboptimal relative to $\psi_{\mathrm{opt}}$, with increased MAE
$$\Delta(\psi_C, \psi_{\mathrm{opt}}) = \sum_{\{x : \psi_C(x) \ne \psi_{\mathrm{opt}}(x)\}} |2p_x - 1|\,P(x). \qquad (2)$$
For filter design, we take a random sample of $N$ pairs, $S = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)\}$, from image data. We then form an estimate $\psi_{C,N}$ of $\psi_C$ via an estimation rule $\xi$; to wit, $\psi_{C,N} = \xi(S)$. For the optimal filter $\psi_{\mathrm{opt}}$, we often use the binarized sample mean. Letting $Y \mid x$ be the random variable having conditional density $f(y \mid x)$, the binarized sample mean is
G
1 if (1/Nx )+Nx y Dx'0.5, i/1 i t (x)" (3) 015, N 0 if (1/Nx )+Nx y Dx)0.5 i/1 i where yDx is the value of > given the observation X"x and Nx is the number of times x is observed during sampling. There are 2n parameters, P(>"1Dx), to estimate. Whether the sample mean is the best estimator, depends on the joint distribution of (X , X , 2, X , >) 1 2 n and some criterion of goodness. The MAE of t is 015, N MAESt T"MAESt T#*(t ,t ) (4) 015, N 015 015, N 015 Since t depends on the training sample, it is random 015, N and the precision of estimation is de"ned by the expected cost E[*(t , t ] [7]. The expected MAE of the 015, N 015 designed "lter is obtained by taking expectations in Eq. (3). Not only does E[*(t , t )] depend on 015, N 015 the estimation procedure m, but even if E[*(t , 015, N t )]P0 as NPR, for large windows N needs 015 to be so great to make E[*(t , t )] acceptably 015, N 015 small that it is often impossible to use the conditional expectation. An optimal constrained "lter t has less parameters to C estimate so that its precision, E[*(t , t )], can be C, N C much better than the precision for t . Constraint is 015 statistically bene"cial if E[*(t , t )]#*(t , t ))E[*(t , t )]. C, N C C 015 015, N 015
(5)
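As a concrete illustration of Eqs. (1)–(4), the sketch below estimates $\psi_{\mathrm{opt},N}$ by the binarized sample mean of Eq. (3) and computes plug-in versions of the MAE (Eq. (1)) and of the error increase (Eq. (2)). It assumes the training pairs are Python tuples `(x, y)`, with `x` a tuple of binary window values; the function names and the toy sample are illustrative only, and probabilities are replaced by empirical frequencies, so every quantity is an estimate.

```python
from collections import defaultdict


def binarized_sample_mean(samples):
    """Estimate psi_opt from (x, y) training pairs, following Eq. (3).

    Returns (psi, p_hat, P_hat): the estimated filter on the observed x,
    the estimate of p_x = P(Y=1 | x), and the estimate of P(x).
    """
    n_x = defaultdict(int)   # N_x: number of times x was observed
    ones = defaultdict(int)  # number of observations of x with y = 1
    for x, y in samples:
        n_x[x] += 1
        ones[x] += y
    total = sum(n_x.values())
    p_hat = {x: ones[x] / n for x, n in n_x.items()}
    P_hat = {x: n / total for x, n in n_x.items()}
    # A sample mean of exactly 0.5 is mapped to 0, as in Eq. (3).
    psi = {x: int(p > 0.5) for x, p in p_hat.items()}
    return psi, p_hat, P_hat


def mae(psi, p_hat, P_hat):
    """Plug-in version of Eq. (1) for a filter given as a function of x."""
    return sum(P_hat[x] * (p_hat[x] if psi(x) == 0 else 1 - p_hat[x])
               for x in P_hat)


def error_increase(psi, psi_opt, p_hat, P_hat):
    """Plug-in version of Eq. (2): sum of |2 p_x - 1| P(x) where psi differs from psi_opt."""
    return sum(abs(2 * p_hat[x] - 1) * P_hat[x]
               for x in P_hat if psi(x) != psi_opt[x])


def zero_filter(x):
    """A trivial constrained filter that always outputs 0."""
    return 0


# Toy training sample over a 2-pixel window (hypothetical data).
samples = [((0, 1), 1), ((0, 1), 1), ((0, 1), 0), ((1, 1), 0), ((1, 1), 1)]
psi_opt, p_hat, P_hat = binarized_sample_mean(samples)
# MAE of any filter = MAE of psi_opt + its error increase (cf. Eqs. (2) and (4)).
print(mae(zero_filter, p_hat, P_hat),
      mae(psi_opt.get, p_hat, P_hat)
      + error_increase(zero_filter, psi_opt, p_hat, P_hat))
```

The two printed numbers agree up to floating-point rounding, which is the decomposition behind Eq. (4) applied to an arbitrary filter.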
N.S.T. Hirata et al. / Pattern Recognition 33 (2000) 1059}1081
Both sides of Eq. (5) involve $N$, so the usefulness of the constraint depends on the sample size, the joint distribution of $(X_1,X_2,\ldots,X_n,Y)$, and the estimation procedure $\xi$. Since this paper concerns the estimation procedure for a constrained filter, our interest is in the leftmost term, which we denote by $\varepsilon(C,\xi,N,n)$. According to the usual statistical definition, for a given sample size $N$, estimation procedure $\xi_1$ is better than procedure $\xi_2$ if $\varepsilon(C,\xi_1,N,n) < \varepsilon(C,\xi_2,N,n)$. We adopt this perspective; in addition, however, we are concerned with estimation time, the time it takes on a given platform to compute $\psi_{C,N}$. This time, $\tau(C,\xi,N,n,S)$, depends on the sample data and the window size. For elementary estimation rules such as the sample mean, median, and variance, time is not an issue, but for the estimation of image filters it is a key issue. While the expected time to completion, $E[\tau(C,\xi,N,n,S)]$, is important, worst-case situations are critical. We desire estimation procedures for which the upper bound $\tau_0(C,\xi,N,n)$ of $\tau(C,\xi,N,n,S)$ is within an acceptable limit. This is the real-time constraint. (A real-time system is one that must satisfy explicit bounded response-time constraints to avoid failure [8].) For a large window, we may find an estimation procedure $\xi$ for which $\varepsilon(C,\xi,N,n)\to 0$ as $N\to\infty$, but instead employ a procedure $\xi_1$ that approximates $\xi$ with $\varepsilon(C,\xi_1,N,n) > \varepsilon(C,\xi,N,n)$, but also with $\tau_0(C,\xi_1,N,n) < \tau_0(C,\xi,N,n)$. A basic paradigm is to have a sequence of algorithms $\xi_1,\xi_2,\ldots,\xi_K$ such that

$$\begin{aligned} \varepsilon(C,\xi,N,n) &< \varepsilon(C,\xi_1,N,n) < \varepsilon(C,\xi_2,N,n) < \cdots < \varepsilon(C,\xi_K,N,n),\\ \tau_0(C,\xi,N,n) &> \tau_0(C,\xi_1,N,n) > \tau_0(C,\xi_2,N,n) > \cdots > \tau_0(C,\xi_K,N,n). \end{aligned} \tag{6}$$
Practical application requires that, for a given imaging problem, sample size, and window size, there exists $k$ such that $\tau_0(C,\xi_k,N,n)$ is acceptable and $\varepsilon(C,\xi_k,N,n)$ is acceptably close to $\varepsilon(C,\xi,N,n)$. The constraint is statistically beneficial relative to the unconstrained optimal filter (Eq. (5)) if

$$\varepsilon(C,\xi_k,N,n) + \Delta(\psi_C,\psi_{\mathrm{opt}}) \le E[\Delta(\psi_{\mathrm{opt},N},\psi_{\mathrm{opt}})] \tag{7}$$

and $\tau_0(C,\xi_k,N,n)\le\tau_{\max}$, the real-time constraint. Relative to the MAE of the estimated filter, Eq. (7) means that $E[\mathrm{MAE}\langle\psi_{C,N,\xi_k}\rangle]\le E[\mathrm{MAE}\langle\psi_{\mathrm{opt},N}\rangle]$. We will employ the paradigm of Eq. (7) for increasing filters. It should be recognized that, even if Eq. (7) does not hold, one may still choose the optimal increasing filter for its other benefits. Specifically, if we require an MAE of $\delta$ after filtering and would like an increasing filter, then, so long as $E[\mathrm{MAE}\langle\psi_{C,N}\rangle]\le\delta$, we may choose to employ $\psi_{C,N}$ even though Eq. (5) is violated.
2. Design procedures for optimal increasing W-operators

As originally formulated morphologically, finding an optimal increasing W-operator corresponds to finding an optimal erosion representation, which amounts to selecting an optimal basis [9]. For a single erosion filter, $\varepsilon_b(x)=\min\{x_i : b_i=1\}$, where $b=(b_1,b_2,\ldots,b_n)$, the MAE is given by

$$\mathrm{MAE}\langle b\rangle = E[|Y-\varepsilon_b(X)|] = \sum_{\{(x,y):\,y\neq\varepsilon_b(x)\}} P(x,y). \tag{8}$$

For design, $\mathrm{MAE}\langle b\rangle$ is estimated from realizations of the ideal and observed images. For an $m$-erosion filter $\psi_B$ with basis $B=\{b_1,b_2,\ldots,b_m\}$, the filter error is given by

$$\mathrm{MAE}\langle\psi_B\rangle = E[|Y-\psi_B(X)|] = \sum_{\{(x,y):\,y\neq\max_i\varepsilon_{b_i}(x)\}} P(x,y). \tag{9}$$

An estimate of the optimal increasing Boolean function, $\psi_{\mathrm{inc}}$, can be found by estimating $\mathrm{MAE}\langle\psi_B\rangle$ for all possible bases and choosing $\psi_{\mathrm{inc}}$ as the operator corresponding to the basis having minimal MAE. Since there are $2^n$ subsets of W, there are $2^{2^n}$ W-operators, each defined by a collection of subsets of W defining the kernel of a Boolean function. The number of increasing W-operators, which corresponds to the number of bases, is substantially smaller. (The problem of counting the number of increasing operators is known as Dedekind's problem [10].) Nonetheless, as formulated, optimization corresponds to a search over a space that is very large unless the window is quite small. Various constraints have been imposed to reduce the search, which concomitantly reduce estimation error and impose constraint error. These include limiting the basis size and constraining the search to a library (subclass) of all possible structuring elements [11]. A library constraint requires some method of choosing the library. Expert libraries are collections of structuring elements whose effects are well known (to experts) or which are basis members of popular filters known to work reasonably well for similar image models. First-order libraries are formed by placing into the library some number of structuring elements possessing the smallest MAEs as single-erosion filters.

A more basic approach to design (one that has proven useful in digital document processing) is to employ a recursive representation of the MAE [12,13]. As expressed in Eq. (9), it would appear that design must include obtaining realization-based statistics for every basis; in fact, one need only obtain MAE estimates for single-erosion filters and then recursively obtain MAE estimates for multiple-erosion filters. The MAE of an $m$-erosion Boolean function $\psi_m$ can be expressed in terms of
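a single-erosion function with structuring element $b_m$ and two $(m-1)$-erosion functions $\psi_{m-1}$ and $\phi_{m-1}$:

$$\mathrm{MAE}\langle\psi_m\rangle = \mathrm{MAE}\langle\psi_{m-1}\rangle - \mathrm{MAE}\langle\phi_{m-1}\rangle + \mathrm{MAE}\langle b_m\rangle, \tag{10}$$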
where the bases are given by $B[\psi_{m-1}]=\{b_1,b_2,\ldots,b_{m-1}\}$, $B[\psi_m]=B[\psi_{m-1}]\cup\{b_m\}=\{b_1,b_2,\ldots,b_{m-1},b_m\}$, and $B[\phi_{m-1}]=\{b_1\cup b_m,b_2\cup b_m,\ldots,b_{m-1}\cup b_m\}$. Even with this recursive representation, some type of constraint is necessary on the collection of structuring elements from which the basis is chosen, and typically the basis is limited to between six and eight structuring elements to achieve acceptably bounded estimation time. Letting $k$ be the number of basis elements, MAE-representation-based design satisfies the paradigm of Eq. (6).

The original basis formulation of the optimal increasing filter [9] depends on a search strategy. The switching strategy discussed in the next section uses the lattice structure of $X$ to discover the optimal basis. Before discussing switching, which is the foundation of the new algorithm introduced in this paper, we briefly describe a proposed graph-search design built on the idea that, given the basis of an increasing operator, one can determine systematically which elements can be added to the corresponding kernel to produce another increasing operator [14]. The first node of the graph corresponds to the basis formed by the maximum structuring element; the corresponding kernel possesses one element. The MAE for this node is easily computed from the probabilities of all the elements. A new node is created and linked to the previous node if and only if an increasing operator can be built by adding just one element to the kernel of the previous node. The MAE of the new operator is computed from the MAE of the previous node together with the probability of the element being added. An arc in the graph holds the MAE variation between the two nodes (operators) that it links. Note that the entire graph (were it to be computed) gives the lattice of all increasing operators. The goal of the graph search is to find the node (increasing operator) with the smallest MAE.
Among the elements that can be added to the current kernel, the search algorithm always chooses one producing the largest decrease in MAE, if such an element exists (greedy condition). In this case, the algorithm creates a new node, assigns to it the new kernel and MAE, and continues. However, if every element that can be added to the kernel increases the MAE, then the algorithm faces an undecidable condition. It has to proceed along all possible branches; i.e., it creates a node for each element that can be added to the current kernel and then proceeds from each of these new nodes. These nodes are kept in a stack or queue and processed sequentially. Whenever a node with smaller MAE than the smallest found so far is reached, it is recorded as the best so far. When all elements that produce decreases in MAE have been added to the current kernel, the algorithm can stop expanding this branch, because all operators that can be created by adding more elements will have greater MAE than the current one (bound condition). The graph is built iteratively, following the greedy condition whenever it is satisfied, and only the branches that need to be traversed are built. Nonetheless, the worst case of the algorithm is exponential.
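The erosion representation of Eqs. (8)–(10) can be checked numerically. The sketch below assumes that structuring elements and observations are binary tuples of equal length and uses sample averages in place of the probabilities; it builds single- and $m$-erosion filters and compares the direct MAE of Eq. (9) with the recursion of Eq. (10) on randomly generated, purely hypothetical data. The helper names are ours, not the authors'.

```python
import random


def erosion(b):
    """eps_b(x) = min{x_i : b_i = 1}, i.e. 1 iff x >= b componentwise."""
    return lambda x: int(all(xi >= bi for xi, bi in zip(x, b)))


def m_erosion(basis):
    """psi_B(x) = max_i eps_{b_i}(x); an empty basis gives the zero filter."""
    erosions = [erosion(b) for b in basis]
    return lambda x: max((e(x) for e in erosions), default=0)


def union(b, c):
    """Set union of two structuring elements, b ∪ c."""
    return tuple(max(bi, ci) for bi, ci in zip(b, c))


def sample_mae(f, samples):
    """Sample estimate of MAE<f> = E|Y - f(X)| (Eqs. (8) and (9))."""
    return sum(abs(y - f(x)) for x, y in samples) / len(samples)


def recursive_mae(basis, samples):
    """Right-hand side of Eq. (10) for a basis b_1, ..., b_m."""
    *prev, b_m = basis
    phi_basis = [union(b, b_m) for b in prev]   # basis of phi_{m-1}
    return (sample_mae(m_erosion(prev), samples)
            - sample_mae(m_erosion(phi_basis), samples)
            + sample_mae(erosion(b_m), samples))


rng = random.Random(0)
n = 5  # window size (hypothetical)
samples = [(tuple(rng.randint(0, 1) for _ in range(n)), rng.randint(0, 1))
           for _ in range(200)]
basis = [(1, 1, 0, 0, 0), (0, 1, 1, 0, 0), (0, 0, 1, 1, 1)]
direct = sample_mae(m_erosion(basis), samples)
print(direct, recursive_mae(basis, samples))  # equal up to rounding
```

The identity holds sample by sample because, for binary values, $|y-(a\vee b)| = |y-a|+|y-b|-|y-a\wedge b|$, and $\psi_{m-1}\wedge\varepsilon_{b_m}$ is exactly the erosion filter with basis $B[\phi_{m-1}]$.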
3. Switching

If $\psi_{\mathrm{opt}}$ and $\psi_{\mathrm{inc}}$ denote the optimal and optimal increasing filters, respectively, then the constraint error for increasingness is given by $\Delta(\psi_{\mathrm{inc}},\psi_{\mathrm{opt}})$. If we have estimates of the conditional probabilities $p_x$, then we ipso facto have an estimate of the optimal generic filter; specifically, we have an estimate of $K[\psi_{\mathrm{opt}}]$. Switching refers to an estimation procedure whereby we begin with the kernel $K_0=K[\psi_{\mathrm{opt}}]$ and iteratively exchange (switch) elements to obtain a sequence of kernels $K_0,K_1,\ldots,K_m$ in such a way that $K_m=K[\psi_{\mathrm{inc}}]$ [15]. The cost of switching $x$ from $K[\psi_{\mathrm{opt}}]$ to $K[\psi_{\mathrm{opt}}]^c$, or from $K[\psi_{\mathrm{opt}}]^c$ to $K[\psi_{\mathrm{opt}}]$, is the increase in MAE resulting from the exchange and is given by $c(x)=|2p_x-1|P(x)$. If the sequence $K_0,K_1,\ldots,K_m$ results from switching vectors between $K[\psi_{\mathrm{opt}}]$ and $K[\psi_{\mathrm{opt}}]^c$, then the elements with the smallest costs should be switched since, by Eq. (2), $\Delta(\psi_{\mathrm{inc}},\psi_{\mathrm{opt}})$ is the sum of the switching costs.

For any operator $\psi$, its inversion set, denoted by $Q_\psi$, is composed of all 1-set elements having 0-set elements above them in $X$ and all 0-set elements having 1-set elements beneath them; $\psi$ is increasing if and only if $Q_\psi=\emptyset$. When determining the optimal switching transformation $K[\psi_{\mathrm{opt}}]\to K[\psi_{\mathrm{inc}}]$, only elements in the inversion set of $\psi_{\mathrm{opt}}$ need be considered for switching. If there exists a 0-set element $y$ above a 1-set element $x$, then there are two possibilities: either switch $x$ to 0 (meaning that $\psi(x)$ is set to 0) or switch $y$ to 1 (meaning that $\psi(y)$ is set to 1). If $x$ is switched to 0, then all 1-set elements beneath $x$ (if any) must also be switched to 0; if $x$ is not switched, then all 0-set elements above it must be switched, and so on. The decision whether to switch (the value of) an element may depend on the values of the elements adjacent to it in the lattice, on the elements related to those elements, and so on [15]. For any set $A\subseteq X$, we let $A\langle 1\rangle_\psi=A\cap\psi\langle 1\rangle$ and $A\langle 0\rangle_\psi=A\cap\psi\langle 0\rangle$.
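If the values of $\psi$ are switched over $A$, then the resulting operator is given by

$$\psi^A(x) = \begin{cases} 1 & \text{if } x\in\big(\psi\langle 1\rangle\setminus A\langle 1\rangle_\psi\big)\cup A\langle 0\rangle_\psi,\\ 0 & \text{if } x\in\big(\psi\langle 0\rangle\setminus A\langle 0\rangle_\psi\big)\cup A\langle 1\rangle_\psi.\end{cases} \tag{11}$$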
If $\psi^A$ is increasing, then $A$ is called a switching set. Switching $\psi_{\mathrm{opt}}$ by a switching set $A$ yields an increasing operator $\psi_{\mathrm{opt}}^A$ with MAE increase

$$\Delta(\psi_{\mathrm{opt}}^A,\psi_{\mathrm{opt}}) = \sum_{x\in A\langle 0\rangle_{\psi_{\mathrm{opt}}}} c(x) + \sum_{x\in A\langle 1\rangle_{\psi_{\mathrm{opt}}}} c(x) = \sum_{x\in A} c(x), \tag{12}$$

where the first and second sums are due to switching the 0-elements of $A$ to 1 and the 1-elements of $A$ to 0, respectively. Finding an optimal increasing operator is equivalent to finding a switching set (relative to $\psi_{\mathrm{opt}}$) that minimizes the preceding error. A switching set $A$ for which $\Delta(\psi_{\mathrm{opt}}^A,\psi_{\mathrm{opt}})$ is minimal among all switching sets is called an optimal switching set. Since there always exists an optimal switching set that is a subset of the inversion set, one need only consider subsets of the inversion set when using the switching approach to design an optimal increasing operator.
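The definitions of the inversion set, of switching (Eq. (11)), and of the switching-set error (Eq. (12)) can be made concrete with a brute-force sketch over a tiny window. Here an operator is assumed to be a Python dict from binary tuples to {0, 1} covering the whole lattice, and the costs $c(x)$ are hypothetical numbers. The exhaustive search over subsets of the inversion set is only practical for very small inversion sets.

```python
from itertools import combinations


def leq(x, y):
    """Componentwise order on binary tuples: x <= y."""
    return all(a <= b for a, b in zip(x, y))


def inversion_set(psi):
    """Q_psi: 1-elements with a 0-element above, 0-elements with a 1-element beneath."""
    return {x for x in psi
            if (psi[x] == 1 and any(psi[y] == 0 and leq(x, y) for y in psi))
            or (psi[x] == 0 and any(psi[y] == 1 and leq(y, x) for y in psi))}


def is_increasing(psi):
    """An operator is increasing iff its inversion set is empty."""
    return not inversion_set(psi)


def switch(psi, A):
    """Eq. (11): flip the value of psi on every element of A."""
    return {x: 1 - v if x in A else v for x, v in psi.items()}


def optimal_switching_set(psi, cost):
    """Exhaustive search for a switching set A within Q_psi minimizing Eq. (12)."""
    Q = sorted(inversion_set(psi))
    best_set, best_cost = None, float("inf")
    for r in range(len(Q) + 1):
        for A in combinations(Q, r):
            A = set(A)
            total = sum(cost[x] for x in A)          # Eq. (12)
            if total < best_cost and is_increasing(switch(psi, A)):
                best_set, best_cost = A, total
    return best_set, best_cost


# A nonincreasing operator on a 2-pixel window and hypothetical costs c(x).
psi = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0}
cost = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.1}
A, extra_mae = optimal_switching_set(psi, cost)
print(inversion_set(psi), A, extra_mae)
```

Both $\{01\}$ and $\{11\}$ are switching sets in this toy case; the search keeps the cheaper one, $\{11\}$.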
Fig. 1(a) shows the 0- and 1-sets of an operator, and Fig. 1(b) shows the corresponding inversion set. The elements of $\psi\langle 1\rangle$ are the dark ones (this convention is used throughout the sequel). The diagram expresses the partial order relation among the elements: there is an edge (line) linking two elements $x$ and $y$, $x < y$, if and only if there is no other element $z$ such that $x < z < y$. Observe that, within the inversion set, there exists at least one element of the 1-set beneath each element of the 0-set, and vice versa. Fig. 1(c) gives an example of a switching set (the elements enclosed by a double frame), and Fig. 1(d) shows the increasing operator that results after the switchings. Fig. 2(a) shows a subset of the same inversion set that is not a switching set; the resulting nonincreasing operator is shown in Fig. 2(b). The computation times of both the graph-search kernel-building method [14] and the original switching algorithm [15] depend on the size of the inversion set, and both are acceptably time-bounded only for small inversion sets. In practice, this means that they can be used only for small windows. The algorithm that we propose is a variant of the original switching algorithm, in that it begins from the kernel of the optimal filter; however, it switches
Fig. 1. An increasing operator produced by switchings.
Fig. 2. Switches that produce a nonincreasing operator.
substantially larger numbers of elements by recognizing that finding the optimal switching set is equivalent to finding an optimal partition of the inversion set. It begins by partitioning the inversion set to produce an operator possessing a smaller inversion set, then partitions this residual inversion set, and so on. Upon completion, this recursive procedure has partitioned the inversion set of the optimal filter in such a way as to produce the optimal increasing filter.
4. A new method for computing optimal switching sets

In this section, we propose an algorithm to find an optimal switching set by reformulating the problem as one of finding an optimal partition of the inversion set. It is assumed implicitly that an optimal operator $\psi_{\mathrm{opt}}$ is given, and any mention of inversion sets or switching sets is to be understood as referring to $\psi_{\mathrm{opt}}$. The inversion set $Q_{\psi_{\mathrm{opt}}}$ will be denoted simply by $Q$; therefore, $Q\langle 0\rangle=Q_{\psi_{\mathrm{opt}}}\cap\psi_{\mathrm{opt}}\langle 0\rangle$ and $Q\langle 1\rangle=Q_{\psi_{\mathrm{opt}}}\cap\psi_{\mathrm{opt}}\langle 1\rangle$.

4.1. Switching problem viewed as a partition problem

Given an inversion set $Q$, an ordered pair $(L,U)$ such that $L\cup U=Q$, $L\cap U=\emptyset$, and no element of $L$ lies above any element of $U$ (or, equivalently, no element of $U$ lies beneath any element of $L$) is called a valid partition of $Q$. The sets $L$ and $U$ are called the lower and upper sets of the partition, respectively. Proofs of the propositions are given in the appendix.

Proposition 1. Let $Q$ be an inversion set, $A\subseteq Q$ be a switching set, and $(L,U)$ be a valid partition of $Q$. Consider the sets $L_A=(Q\langle 0\rangle\setminus A\langle 0\rangle)\cup A\langle 1\rangle$, $U_A=(Q\langle 1\rangle\setminus A\langle 1\rangle)\cup A\langle 0\rangle$, and $A_{(L,U)}=(L\cap Q\langle 1\rangle)\cup(U\cap Q\langle 0\rangle)$. Then
(a) $(L_A,U_A)$ is a valid partition of $Q$;
(b) $A_{(L,U)}$ is a switching set;
(c) $A_{(L_A,U_A)}=A$;
(d) $\big(L_{A_{(L,U)}},U_{A_{(L,U)}}\big)=(L,U)$.

To illustrate these concepts, let $Q=\{0001,0010,0101,0110,1001,1101\}$, as shown in Fig. 3(a). In this case, $Q\langle 0\rangle=\{0110,1001,1101\}$ and $Q\langle 1\rangle=\{0001,0010,0101\}$. Consider the switching set $A=\{0001,0010,1101\}$. Then $A\langle 0\rangle=\{1101\}$, $A\langle 1\rangle=\{0001,0010\}$, and the corresponding partition is $L_A=(Q\langle 0\rangle\setminus A\langle 0\rangle)\cup A\langle 1\rangle=\{0001,0010,0110,1001\}$ and $U_A=(Q\langle 1\rangle\setminus A\langle 1\rangle)\cup A\langle 0\rangle=\{0101,1101\}$. Conversely, consider the partition $(L,U)=(\{0001,0010,0110,1001\},\{0101,1101\})$, shown in Fig. 3(b). The corresponding switching set is $A_{(L,U)}=(L\cap Q\langle 1\rangle)\cup(U\cap Q\langle 0\rangle)=\{0001,0010\}\cup\{1101\}=\{0001,0010,1101\}$. Figs. 3(c) and (d) show two other switching sets and their respective partitions.

Because of the equivalence between switching sets and valid partitions, given a switching set $A$, the error increase of $\psi_{\mathrm{opt}}^A$ (Eq. (12)) can be rewritten in terms of the valid partition $(L_A,U_A)$ as

$$\tilde{\Delta}(L_A,U_A)=\sum_{x\in U_A\langle 0\rangle} c(x)+\sum_{x\in L_A\langle 1\rangle} c(x), \tag{13}$$

i.e., $\tilde{\Delta}(L_A,U_A)=\Delta(\psi_{\mathrm{opt}}^A,\psi_{\mathrm{opt}})$. The problem of finding an increasing operator $\psi_{\mathrm{inc}}$ that minimizes Eq. (2), with $\psi_{\mathrm{inc}}$ in place of $\psi_C$, which is equivalent to the problem of finding an optimal switching set, is therefore equivalent to the problem of finding a valid partition that minimizes Eq. (13).

4.2. Optimal valid partition problem

In this new setting, the problem of designing an optimal increasing W-operator can be stated as follows: given an inversion set $Q$ and nonnegative switching costs $c(x)$ for all $x\in Q$, find a valid partition, among all valid partitions of $Q$, with minimum error increase (Eq. (13)).
Fig. 3. Some switching sets and respective partitions.
This will be called the Optimal Valid Partition (OVP) problem. Finding an OVP is a combinatorial optimization problem whose solution space consists of all valid partitions of $Q$. Since $Q$ may have exponential size with respect to the window size, brute-force algorithms cannot, in general, be applied in acceptable time. A good algorithm for an optimization problem should examine relatively few candidate solutions before finding the optimal one; to do so, it must exploit some special property or structure of the solution space. Sometimes the solution space has a greedy property, in which choosing a local optimum at each step leads to the global optimum. The following definitions and propositions will be used to show that the solution space of the OVP problem has a greedy property, which the algorithm exploits.

Proposition 2. Let $(L,U)$ be a valid partition of $Q$ and let $F$ be a subset of $Q$ such that $(Q\setminus F,F)$ is a valid partition of $Q$. The following statements are true.
(a) If $F\subseteq U$, then $(L,U\setminus F)$ is a valid partition of $Q\setminus F$ (Fig. 4(a)).
(b) If $F\subseteq L$, then $(L\setminus F,U\cup F)$ is a valid partition of $Q$ (Fig. 4(b)).
(c) If $F\not\subseteq U$ and $F\not\subseteq L$, then $(L\setminus(L\cap F),U\cup(L\cap F))$ is a valid partition of $Q$ and $(L\cap F,U\cap F)$ is a valid partition of $F$ (Fig. 4(c)).
(d) If $(L',U')$ is a valid partition of $Q\setminus F$, then $(L',U'\cup F)$ is a valid partition of $Q$ (Fig. 4(d)).
(e) If $(L'',U'')$ is a valid partition of $L$, then $(L'',U\cup U'')=(L'',Q\setminus L'')$ is a valid partition of $Q$ (Fig. 4(e)).
Dual results hold for subsets $F$ of $Q$ such that $(F,Q\setminus F)$ is a valid partition of $Q$.

Proposition 3. Consider the trivial valid partition $(Q,\emptyset)$ of $Q$. Then $\tilde{\Delta}(Q,\emptyset)=\sum_{x\in Q\langle 1\rangle} c(x)$.
Fig. 4. For Proposition 2.
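Dually, relative to the dual trivial partition $(\emptyset,Q)$, $\tilde{\Delta}(\emptyset,Q)=\sum_{x\in Q\langle 0\rangle} c(x)$. These are, respectively, the error increases corresponding to the cases in which all elements of $Q\langle 1\rangle$ are switched to 0 and all elements of $Q\langle 0\rangle$ are switched to 1. If $(L,U)$ is a valid partition of $Q$, then its error increase $\tilde{\Delta}(L,U)$ (Eq. (13)) can be rewritten in terms of $\tilde{\Delta}(Q,\emptyset)$ as

$$\tilde{\Delta}(L,U)=\tilde{\Delta}(Q,\emptyset)+\sum_{x\in U\langle 0\rangle} c(x)-\sum_{x\in U\langle 1\rangle} c(x) \tag{14}$$

or, dually, as

$$\tilde{\Delta}(L,U)=\tilde{\Delta}(\emptyset,Q)+\sum_{x\in L\langle 1\rangle} c(x)-\sum_{x\in L\langle 0\rangle} c(x). \tag{15}$$

Therefore, partition $(L,U)$ is preferable to $(Q,\emptyset)$ if and only if $\sum_{x\in U\langle 0\rangle} c(x)-\sum_{x\in U\langle 1\rangle} c(x) < 0$, and it is preferable to $(\emptyset,Q)$ if and only if $\sum_{x\in L\langle 1\rangle} c(x)-\sum_{x\in L\langle 0\rangle} c(x) < 0$. To simplify the presentation, we define the weight of a subset.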
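The correspondence of Proposition 1 and the error formulas (13) and (14) can be checked on the example of Fig. 3. In the sketch below, binary vectors are encoded as 4-bit integers (so `0b0101` stands for 0101), $Q\langle 0\rangle$ and $Q\langle 1\rangle$ are taken from the text, and the integer costs $c(x)$ are hypothetical, since the example does not specify any.

```python
def leq(x, y):
    """x <= y for binary vectors encoded as bit masks."""
    return (x & ~y) == 0


def is_valid_partition(L, U, Q):
    """(L, U) partitions Q and no element of L lies above an element of U."""
    return (L | U == Q and not (L & U)
            and not any(leq(u, l) for l in L for u in U))


def partition_from_switching_set(A, Q0, Q1):
    """Proposition 1: L_A = (Q⟨0⟩ minus A⟨0⟩) ∪ A⟨1⟩, U_A = (Q⟨1⟩ minus A⟨1⟩) ∪ A⟨0⟩."""
    A0, A1 = A & Q0, A & Q1
    return (Q0 - A0) | A1, (Q1 - A1) | A0


def switching_set_from_partition(L, U, Q0, Q1):
    """Proposition 1: A_(L,U) = (L ∩ Q⟨1⟩) ∪ (U ∩ Q⟨0⟩)."""
    return (L & Q1) | (U & Q0)


def error_increase(L, U, Q0, Q1, cost):
    """Eq. (13): 0-elements of U are switched to 1, 1-elements of L to 0."""
    return sum(cost[x] for x in U & Q0) + sum(cost[x] for x in L & Q1)


Q0 = {0b0110, 0b1001, 0b1101}
Q1 = {0b0001, 0b0010, 0b0101}
Q = Q0 | Q1
cost = {0b0001: 3, 0b0010: 1, 0b0101: 2, 0b0110: 4, 0b1001: 5, 0b1101: 1}  # hypothetical

A = {0b0001, 0b0010, 0b1101}
L, U = partition_from_switching_set(A, Q0, Q1)
print(sorted(f"{x:04b}" for x in L), sorted(f"{x:04b}" for x in U))
# ['0001', '0010', '0110', '1001'] ['0101', '1101']

# Eq. (14): the same error increase, starting from the trivial partition (Q, ∅).
trivial = sum(cost[x] for x in Q1)
eq14 = trivial + sum(cost[x] for x in U & Q0) - sum(cost[x] for x in U & Q1)
print(error_increase(L, U, Q0, Q1, cost), eq14, sum(cost[x] for x in A))  # 5 5 5
```

The last line also confirms that Eq. (13) evaluated on $(L_A,U_A)$ equals the switching cost of Eq. (12) for $A$.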
Fig. 5. Switching cost and weight.
Fig. 6. For Proposition 7.
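Definition 4. The weight of a subset $Z$ of $Q$ is defined as

$$\omega(Z)=\sum_{x\in Z\langle 0\rangle} c(x)-\sum_{x\in Z\langle 1\rangle} c(x).$$

In particular,

$$\omega(\{x\})=\begin{cases} c(x) & \text{if } x\in Q\langle 0\rangle,\\ -c(x) & \text{if } x\in Q\langle 1\rangle.\end{cases}$$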
Given a subset $Z\subseteq Q$, suppose that only two choices are allowed: either switch all elements of $Z\langle 0\rangle$ to 1 or switch all elements of $Z\langle 1\rangle$ to 0. To produce the smaller error increase, one should switch the set with the smaller switching cost. For instance, if $\sum_{x\in Z\langle 1\rangle} c(x) > \sum_{x\in Z\langle 0\rangle} c(x)$, i.e., $\omega(Z) < 0$, then it is better to switch $Z\langle 0\rangle$ to 1 than the contrary. If $\omega(Z) > 0$, then it is better to switch $Z\langle 1\rangle$ to 0 than the contrary. If $\omega(Z)=0$, it does not matter; by convention, we switch the 1-elements to 0. The first two cases are illustrated in Figs. 5(a) and (b), respectively; the number beside each element is its switching cost.

Proposition 5. Given two disjoint subsets $Z_1$ and $Z_2$ of $Q$, $\omega(Z_1\cup Z_2)=\omega(Z_1)+\omega(Z_2)$.
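The weight of Definition 4 and the two-choice rule above reduce to a few lines. The sketch assumes that elements are hashable labels and that the set `Q0` of 0-elements ($Q\langle 0\rangle$) is known; the costs are hypothetical, chosen in the spirit of Fig. 5. It also checks the additivity stated in Proposition 5.

```python
def weight(Z, Q0, cost):
    """Definition 4: omega(Z) = sum of c over Z⟨0⟩ minus sum of c over Z⟨1⟩."""
    return sum(cost[x] if x in Q0 else -cost[x] for x in Z)


def cheaper_choice(Z, Q0, cost):
    """Of the two all-or-nothing options for Z, return the cheaper one.

    omega(Z) < 0: switching Z⟨0⟩ to 1 is cheaper; otherwise (ties included,
    following the convention in the text) switch Z⟨1⟩ to 0.
    """
    return "switch Z0 to 1" if weight(Z, Q0, cost) < 0 else "switch Z1 to 0"


# Hypothetical example: g is a 0-element, h and i are 1-elements.
Q0 = {"g"}
cost = {"g": 1, "h": 2, "i": 3}
Z = {"g", "h", "i"}
print(weight(Z, Q0, cost), cheaper_choice(Z, Q0, cost))  # -4 switch Z0 to 1
# Proposition 5: the weight is additive over disjoint subsets.
print(weight({"g"}, Q0, cost) + weight({"h", "i"}, Q0, cost) == weight(Z, Q0, cost))  # True
```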
Proposition 6. Let $(L,U)$ be a valid partition of $Q$. Then $\tilde{\Delta}(L,U)=\tilde{\Delta}(Q,\emptyset)+\omega(U)$.

Proposition 6 rewrites Eq. (14) in terms of weight: the error increase of a partition can be expressed in terms of the error increase of the trivial partition and the weight of the upper set of the partition. Dually, relative to Eq. (15), $\tilde{\Delta}(L,U)=\tilde{\Delta}(\emptyset,Q)-\omega(L)$.

Proposition 7. Let $(L,U)$ be a valid partition of $Q$, and let $(L',U')$ be a valid partition of $L$ (see Fig. 6(a)). Then $(L\setminus U',U\cup U')$ is a valid partition of $Q$ and $\tilde{\Delta}(L\setminus U',U\cup U')=\tilde{\Delta}(L,U)+\omega(U')$.

Proposition 7 is a generalization of Proposition 6: when a set of elements is moved from the lower to the upper set, the variation of the error increase between the original partition and the new one is given by the weight of the moved subset. Dually, if $(L'',U'')$ is a valid partition of $U$, then $(L\cup L'',U\setminus L'')$ is a valid partition of $Q$ and
$\tilde{\Delta}(L\cup L'',U\setminus L'')=\tilde{\Delta}(L,U)-\omega(L'')$ (see Fig. 6(b)), corresponding to the case in which a subset is moved from the upper to the lower set.

To find an OVP, one could choose an initial partition of $Q$ and try to change it, by applying Proposition 7, to produce new partitions with smaller error increase. However, this raises nontrivial questions, such as which initial partition should be chosen and how many times a given partition must be changed to find an optimal one. As a second possibility, consider the following. If just a small part of $Q$ is analyzed at a time, it may be possible to decide to which side of the optimal partition it belongs. If such decisions are possible, then an iterative procedure that starts with empty upper and lower sets could choose subsets of $Q$ that certainly belong to one of the sides (lower or upper) of the optimal partition and move them from $Q$ to the appropriate side. By repeating this procedure on the remaining inversion set, one could progressively build the upper and lower sets. The basic difference between the two approaches is that the second gradually reduces the problem size (inversion set size), whereas the first must iterate several times (until some criterion is reached) over problems of the same size. To illustrate the second approach, which we will use, consider the inversion set shown in Fig. 7(a). Analyzing the subset $\{g,h,i\}$, we see that the cost of switching $g$ from 0 to 1 is smaller than the cost of switching both $h$ and $i$ from 1 to 0. Since switching $g$ from 0 to 1 does not affect other elements, $\{g,h,i\}$ can be safely
moved from the inversion set to the upper set (Fig. 7(b)). Elements moved out of the inversion set are drawn with dotted lines; the dark ones are elements of the upper set and the white ones are elements of the lower set. Next, since switching $f$ from 1 to 0 does not affect other elements, and since the cost of switching $f$ to 0 is smaller than the cost of not switching it (which implies that at least $d$ must be switched to 1), the subset $\{d,f\}$ can be safely moved to the lower set (Fig. 7(c)). Analyzing the remaining inversion set $\{a,b,c,e\}$, we see, for instance, that $\{a,c\}$ can be safely moved to the upper set (Fig. 7(d)). The last two elements do not need to be switched, so they can be moved to the upper set (Fig. 7(e)). The resulting partition is shown in Fig. 7(f), where the elements of the switching set are marked by double circles. Note that the order in which the subsets are chosen is not unique, and the subsets chosen need not be those shown in the example. For instance, instead of choosing $\{a,c\}$ and then $\{b,e\}$, one could have chosen $\{a,b,c,e\}$, or $\{a,b\}$ and then $\{c,e\}$. Moreover, these subsets must be chosen carefully. For instance, in the case shown in Fig. 8, one could wrongly decide to move all elements to the upper set because their total weight is negative, when the optimal choice is $\{a,c\}$ in the upper set and $\{b,d\}$ in the lower set. To formalize the notion of safely removable subsets and to eliminate some ambiguities in the choice of subsets, we introduce the concept of feasible sets and then prove that an algorithm that always chooses a feasible set to be removed from $Q$, adding it to either the
upper or the lower set, as appropriate, will generate an optimal partition of $Q$.

Fig. 7. Dynamics of partitioning.

Fig. 8. An inversion set.

Definition 8. Let $\mathcal{F}_U$ be the class of nonempty subsets $F$ of $Q$ (see Fig. 9(a)) satisfying
1. $(Q\setminus F,F)$ is a valid partition of $Q$, and
2. $\omega(F) < 0$.
A subset $F\in\mathcal{F}_U$ is U-feasible if and only if $F$ is minimal in $\mathcal{F}_U$ relative to $\subseteq$.

Definition 9. Let $\mathcal{F}_L$ be the class of nonempty subsets $F$ of $Q$ (see Fig. 9(b)) satisfying
1. $(F,Q\setminus F)$ is a valid partition of $Q$, and
2. $\omega(F)\ge 0$.
A subset $F\in\mathcal{F}_L$ is L-feasible if and only if $F$ is minimal in $\mathcal{F}_L$ relative to $\subseteq$.

Fig. 9. For Definitions 8 and 9.

For convenience, $(\emptyset,\emptyset)$ is considered a valid and optimal partition of $\emptyset$. To refer to U- or L-feasible sets without explicit distinction, we simply use the term "feasible sets".

Lemma 10. (a) If an inversion set $Q$ does not contain a U-feasible set, then it contains an L-feasible set. (b) If an inversion set $Q$ does not contain an L-feasible set, then it contains a U-feasible set.

Theorem 11. Let $F$ be a feasible set of $Q$, and let $(L',U')$ be an optimal partition of $Q\setminus F$. Then
(a) if $F$ is U-feasible, then $(L',U'\cup F)$ is an optimal partition of $Q$, and
(b) if $F$ is L-feasible, then $(L'\cup F,U')$ is an optimal partition of $Q$.

Theorem 11 is the foundation of the algorithm proposed here. The algorithm starts with empty upper and lower sets and then sequentially moves feasible sets from $Q$ to these sets. Since it is based on a greedy property, once a subset is removed it is never put back into $Q$. This fact, together with Lemma 10, guarantees that the process terminates.

Theorem 12. Given an inversion set $Q$ and switching costs $c(x)$ for each element $x$ of $Q$, the following OVP algorithm produces an optimal solution to the OVP problem.

Algorithm 1 (OVP algorithm)
(1) Set $U\leftarrow\emptyset$ and $L\leftarrow\emptyset$.
(2) If $Q$ is empty, then return $(L,U)$ and exit.
(3) Search for a feasible set $F$ in $Q$. If $F$ is U-feasible, then set $U\leftarrow U\cup F$; if $F$ is L-feasible, then set $L\leftarrow L\cup F$. Set $Q\leftarrow Q\setminus F$ and return to step (2).

Figs. 10 and 11 give two examples of the application of the OVP algorithm. They illustrate sequences of feasible sets being removed successively from two different inversion sets. The figures show the initial inversion set, the intermediate states generated by the algorithm (where the feasible sets found are marked by a parabolic curve and elements removed from the inversion set are drawn with dotted lines), and the resulting partition.
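Algorithm 1 can be sketched directly from Definitions 8 and 9. The version below is an illustration under stated assumptions, not the authors' implementation: elements are bit masks, costs are hypothetical, and a feasible set is found by enumerating subsets of the remaining inversion set in order of increasing size. A smallest member of $\mathcal{F}_U$ (or $\mathcal{F}_L$) is automatically minimal with respect to inclusion, so the first qualifying subset is feasible. This enumeration is exponential and suitable only for tiny inversion sets; Section 5 describes the search used in practice. The result is compared with a brute-force minimum over all valid partitions.

```python
from itertools import combinations


def leq(x, y):
    """x <= y for binary vectors encoded as bit masks."""
    return (x & ~y) == 0


def is_up_set(F, Q):
    """True if (Q minus F, F) is a valid partition, i.e. F is closed upward in Q."""
    return all(y in F for x in F for y in Q if leq(x, y))


def is_down_set(F, Q):
    """True if (F, Q minus F) is a valid partition, i.e. F is closed downward in Q."""
    return all(y in F for x in F for y in Q if leq(y, x))


def weight(Z, Q0, cost):
    """omega(Z) from Definition 4; Q0 is the set of 0-elements."""
    return sum(cost[x] if x in Q0 else -cost[x] for x in Z)


def find_feasible(Q, Q0, cost):
    """Return ('U', F) or ('L', F) for a U- or L-feasible set F of Q."""
    elems = sorted(Q)
    for size in range(1, len(elems) + 1):
        for combo in combinations(elems, size):
            F = set(combo)
            if is_up_set(F, Q) and weight(F, Q0, cost) < 0:
                return "U", F
            if is_down_set(F, Q) and weight(F, Q0, cost) >= 0:
                return "L", F
    raise ValueError("no feasible set")  # excluded by Lemma 10 for nonempty Q


def ovp(Q, Q0, cost):
    """Algorithm 1: move feasible sets out of Q until Q is empty."""
    Q, L, U = set(Q), set(), set()
    while Q:
        side, F = find_feasible(Q, Q0, cost)
        (U if side == "U" else L).update(F)
        Q -= F
    return L, U


def error_increase(L, U, Q0, cost):
    """Eq. (13): 0-elements in U and 1-elements in L are switched."""
    return (sum(cost[x] for x in U if x in Q0)
            + sum(cost[x] for x in L if x not in Q0))


def brute_force_min(Q, Q0, cost):
    """Minimum of Eq. (13) over all valid partitions (U ranges over up-sets of Q)."""
    elems = sorted(Q)
    return min(error_increase(Q - set(U), set(U), Q0, cost)
               for r in range(len(elems) + 1)
               for U in combinations(elems, r) if is_up_set(set(U), Q))


Q0 = {0b0110, 0b1001, 0b1101}
Q = Q0 | {0b0001, 0b0010, 0b0101}
cost = {0b0001: 3, 0b0010: 1, 0b0101: 2, 0b0110: 4, 0b1001: 5, 0b1101: 1}  # hypothetical
L, U = ovp(Q, Q0, cost)
print(error_increase(L, U, Q0, cost), brute_force_min(Q, Q0, cost))  # 5 5
```

On this example the sketch removes the L-feasible set $\{0001,1001\}$, then $\{0010,0110\}$, then the U-feasible set $\{0101,1101\}$, recovering the partition of Fig. 3(b).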
5. Searching for feasible sets

The main part of the algorithm presented in the previous section consists of searching the inversion set for feasible sets. In this section, we present a search strategy for feasible sets and a constraint on feasible sets that controls the complexity of the search problem.

5.1. Searching strategy

Definition 13. Let $T=\{z_1,z_2,\ldots,z_k\}$, $k\ge 1$, be a subset of the inversion set $Q$. Then we define $U[T]=\{x\in Q:\exists z\in T,\ x\ge z\}$ and $L[T]=\{x\in Q:\exists z\in T,\ x\le z\}$.

Proposition 14. (a) If $(Q\setminus F,F)$ is a valid partition of $Q$ and $x_1,x_2,\ldots,x_k$ are the minimal elements of $F$, then $F=U[\{x_1,x_2,\ldots,x_k\}]$.
Fig. 10. An example.
(b) If $(F,Q\setminus F)$ is a valid partition of $Q$ and $x_1,x_2,\ldots,x_k$ are the maximal elements of $F$, then $F=L[\{x_1,x_2,\ldots,x_k\}]$.

Proposition 15. (a) All minimal elements of a U-feasible set $F$ of $Q$ are elements of $Q\langle 1\rangle$. (b) All maximal elements of an L-feasible set $F$ of $Q$ are elements of $Q\langle 0\rangle$.
From Propositions 14 and 15, it is clear that, when searching $Q$ for U-feasible sets, an algorithm needs to consider only subsets of the form $U[\{x_1,x_2,\ldots,x_k\}]$, $k\ge 1$, where $x_i\in Q\langle 1\rangle$ for all $i\in\{1,2,\ldots,k\}$. Moreover, owing to the minimality condition of feasible sets, the algorithm must make sure that no proper subset of the set to be tested is a U-feasible set. In other words, a set
Fig. 11. Another example.
$U[\{x_1,x_2,\ldots,x_k\}]$ should be tested only after all subsets of the form $U[\{x_{i_1},x_{i_2},\ldots,x_{i_j}\}]$, with $1\le j < k$, $i_l\in\{1,2,\ldots,k\}$ for $1\le l\le j$, and $i_l\neq i_t$ if $l\neq t$, have been tested.
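Definition 13 and Propositions 14 and 15 translate into simple set operations. The sketch below (bit-mask encoding and function names are our assumptions) computes $U[T]$ and $L[T]$ and lists the candidate sets $U[T]$, $T\subseteq Q\langle 1\rangle$, for the Fig. 3 example in order of increasing $|T|$. Checking each candidate against previously tested sets for minimality is left out for brevity, so this shows only the candidate generation, not a complete feasibility test.

```python
from itertools import combinations


def leq(x, y):
    """x <= y for binary vectors encoded as bit masks."""
    return (x & ~y) == 0


def upper(T, Q):
    """Definition 13: U[T] = {x in Q : x >= z for some z in T}."""
    return {x for x in Q if any(leq(z, x) for z in T)}


def lower(T, Q):
    """Definition 13: L[T] = {x in Q : x <= z for some z in T}."""
    return {x for x in Q if any(leq(x, z) for z in T)}


def minimal_elements(F):
    return {x for x in F if not any(y != x and leq(y, x) for y in F)}


def maximal_elements(F):
    return {x for x in F if not any(y != x and leq(x, y) for y in F)}


def u_candidates(Q, Q1):
    """Candidate U-feasible sets U[T], T a subset of Q⟨1⟩, by increasing |T|."""
    gens = sorted(Q1)
    for k in range(1, len(gens) + 1):
        for T in combinations(gens, k):
            yield T, upper(T, Q)


Q1 = {0b0001, 0b0010, 0b0101}
Q = Q1 | {0b0110, 0b1001, 0b1101}
for T, F in u_candidates(Q, Q1):
    print([f"{t:04b}" for t in T], sorted(f"{x:04b}" for x in F))
```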
In practice, the search procedure is performed on the graph structure corresponding to $Q$. Elements of $Q$ correspond to the nodes of the graph. Two nodes
corresponding to two elements $x$ and $y$ of $Q$, with $x < y$, are linked by an edge if and only if there is no other element $z\in Q$ such that $x < z < y$. The node corresponding to an element $x$ will also be referred to as $x$. To describe the search procedure used in practice, we define the level of a node. The maximal nodes of $Q\langle 1\rangle$ are defined as nodes at level 0. A node $x\in Q\langle 1\rangle$ is at level 1 if and only if all 1-elements in $U[\{x\}]\setminus\{x\}$ are at level 0. In general, if $t$ is the maximum level among all nodes in $(U[\{x\}]\setminus\{x\})\cap Q\langle 1\rangle$, then the level of $x$ is defined as $t+1$. We denote the level of $x$ by $l(x)$. Therefore, for all $x\in Q\langle 1\rangle$ such that $x$ is not maximal in $Q\langle 1\rangle$,

$$l(x)=\max\{l(y):y\in(U[\{x\}]\setminus\{x\})\cap Q\langle 1\rangle\}+1.$$

Consider $U[\{x\}]$ with $x\in Q\langle 1\rangle$ and $l(x)=t$. The subsets of $U[\{x\}]$ that can potentially be U-feasible are those whose minimal elements are in $(U[\{x\}]\setminus\{x\})\cap Q\langle 1\rangle$, which means that the levels of these nodes are smaller than $t$. Therefore, as long as all subsets corresponding to combinations of elements of $Q\langle 1\rangle$ with level smaller than $t$ have been tested (and are not U-feasible), the algorithm can proceed to test $U[\{x\}]$; it needs only to verify whether $U[\{x\}]$ satisfies condition 2 of Definition 8, i.e., whether $\omega(U[\{x\}]) < 0$. More precisely, the algorithm may proceed to test $U[\{x\}]$ once all subsets corresponding to combinations of elements in $Q\langle 1\rangle\cap(U[\{x\}]\setminus\{x\})$ have been tested. Generalizing this idea, a subset $U[\{x_1,x_2,\ldots,x_k\}]$ should be tested only after all subsets corresponding to combinations of elements in $\big(\bigcup_{i=1}^{k}U[\{x_i\}]\big)\cap Q\langle 1\rangle$ have been tested. This gives the following search procedure. First, test all subsets of the form $U[\{x\}]$, where $x\in Q\langle 1\rangle$ and $l(x)=0$. Then test subsets of the form $U[\{x_1,x_2\}]$, where both $x_1$ and $x_2$ are in $Q\langle 1\rangle$ and have level 0. Proceed by taking combinations of three, four, and more elements at level 0. After testing all possible combinations of elements at level 0, test all subsets with one minimal element at level 1.
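The graph used by the search and the node levels can be computed as follows. This is a sketch under our own assumptions (bit-mask elements, hypothetical function names). It relies on the fact that $y > x$ implies that $y$ has more 1 bits than $x$, so processing $Q\langle 1\rangle$ by decreasing bit count guarantees that every 1-node above $x$ already has its level when $x$ is visited.

```python
def leq(x, y):
    """x <= y for binary vectors encoded as bit masks."""
    return (x & ~y) == 0


def hasse_edges(Q):
    """Edges (x, y), x < y, such that no z in Q satisfies x < z < y."""
    def lt(a, b):
        return a != b and leq(a, b)
    return {(x, y) for x in Q for y in Q
            if lt(x, y) and not any(lt(x, z) and lt(z, y) for z in Q)}


def levels(Q1):
    """l(x) for x in Q⟨1⟩: 0 for maximal nodes, else 1 + max level of 1-nodes above x."""
    level = {}
    for x in sorted(Q1, key=lambda v: bin(v).count("1"), reverse=True):
        above = [level[y] for y in Q1 if y != x and leq(x, y)]
        level[x] = max(above) + 1 if above else 0
    return level


Q1 = {0b0001, 0b0010, 0b0101}
Q = Q1 | {0b0110, 0b1001, 0b1101}
print(sorted((f"{x:04b}", f"{y:04b}") for x, y in hasse_edges(Q)))
print({f"{x:04b}": l for x, l in levels(Q1).items()})
```

For the Fig. 3 example, 0010 and 0101 are at level 0 and 0001 (which lies below 0101) is at level 1.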
Before testing combinations of two or more elements at level 1, each element $x$ at level 1 needs to be combined with each combination of elements at level 0. Then combinations of two elements at level 1 need to be combined with all combinations of elements at level 0, and so on. The number of possible combinations grows quickly as the level increases. Each time a feasible set is found, the levels of the elements need to be updated and the search restarted at level 0. Actually, only combinations involving elements $x$ such that some portion of $U[\{x\}]$ has been removed need to be tested again. The discussion also applies to L-feasible sets, with levels increasing from bottom to top. In general, feasible sets have few minimal/maximal elements, so the search need not go through all the combinations described above. However, in some instances of the problem all feasible sets may be very large. In such a case, the algorithm will need to test
a large number of subsets before finding a feasible set, a critical situation in terms of processing time.

5.2. Relaxed searching procedure

To control the combinatorial complexity of the search problem, we impose a constraint on the number of minimal/maximal elements in a feasible set.

Definition 16. Let $\mathcal{F}_U(k)$ be the class of nonempty subsets $F$ of $Q$ satisfying
1. $(Q\setminus F,F)$ is a valid partition of $Q$,
2. $\omega(F) < 0$, and
3. $F$ has at most $k$ minimal elements.
A subset $F\in\mathcal{F}_U(k)$ is U-k-feasible if and only if $F$ is minimal in $\mathcal{F}_U(k)$ relative to $\subseteq$.

L-k-feasible sets are defined similarly, replacing minimal by maximal elements. These two definitions introduce what we will simply call k-feasible sets. If only k-feasible sets are searched, then the search procedure described previously needs to consider combinations of at most $k$ minimal/maximal elements, and therefore the processing time can be controlled by changing the value of $k$. This relaxation may introduce suboptimality, meaning that the result may be non-optimal, although, conceptually, it is always possible to choose a sufficiently large $k$ to produce the optimal result. We use the term k-relaxed algorithm for the algorithm that searches only k-feasible sets, as opposed to the previously given algorithm, which searches (nonrelaxed) feasible sets.

We now analyze some cases that may occur due to relaxation. Let $Q$ be the inversion set shown in Fig. 12(a). There exists only one 1-feasible set, $U[\{i\}]$, in $Q$. Since $U[\{i\}]=Q$, after moving it from $Q$ to the upper set, the 1-relaxed algorithm stops, returning the partition $(\emptyset,Q)$, as shown in Fig. 12(b). However, $U[\{i\}]$ is not a feasible set in the strict sense, because it contains the subsets $U[\{d,f\}]$, $L[\{g,h\}]$, $L[\{g,e\}]$, and $L[\{e,h\}]$, which are feasible. All these feasible sets have more than one minimal/maximal element, which explains why they cannot be found by the 1-relaxed algorithm. The optimal result ($\tilde{\Delta}=9$) is shown in Fig. 12(c).
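A k-relaxed variant of the OVP procedure is sketched below; again this is an illustration, not the authors' code. Candidate U-sets are generated as $U[T]$ and candidate L-sets as $L[T]$ with $|T|\le k$, so every generated set has at most $k$ minimal (respectively maximal) elements, and the smallest candidate satisfying the weight condition is taken, which makes it minimal in $\mathcal{F}_U(k)$ or in the analogous L-class. For simplicity, and unlike the level-based search above, the generators $T$ are drawn from all remaining elements rather than only from $Q\langle 1\rangle$ or $Q\langle 0\rangle$. When no k-feasible set remains (an unsolvable portion, discussed below), the sketch moves the whole remainder to whichever side has the smaller error increase, in the spirit of the simple strategy for unsolvable portions; the test instance `Qb` is hypothetical.

```python
from itertools import combinations


def leq(x, y):
    """x <= y for binary vectors encoded as bit masks."""
    return (x & ~y) == 0


def upper(T, Q):
    """U[T] within Q (Definition 13)."""
    return {x for x in Q if any(leq(z, x) for z in T)}


def lower(T, Q):
    """L[T] within Q (Definition 13)."""
    return {x for x in Q if any(leq(x, z) for z in T)}


def weight(Z, Q0, cost):
    """omega(Z) from Definition 4; Q0 is the set of 0-elements."""
    return sum(cost[x] if x in Q0 else -cost[x] for x in Z)


def k_feasible(Q, Q0, cost, k):
    """Smallest U-k-feasible or L-k-feasible set of Q as (side, F), or None."""
    best = None
    gens = sorted(Q)
    for r in range(1, min(k, len(gens)) + 1):
        for T in combinations(gens, r):
            up, down = upper(T, Q), lower(T, Q)
            for side, F, ok in (("U", up, weight(up, Q0, cost) < 0),
                                ("L", down, weight(down, Q0, cost) >= 0)):
                if ok and (best is None or len(F) < len(best[1])):
                    best = (side, F)
    return best


def k_relaxed_ovp(Q, Q0, cost, k):
    """Remove k-feasible sets; handle an unsolvable remainder wholesale."""
    Q, L, U = set(Q), set(), set()
    while Q:
        found = k_feasible(Q, Q0, cost, k)
        if found is None:
            # Unsolvable portion Q': putting all of Q' in L switches its
            # 1-elements; putting it in U switches its 0-elements. Ties go
            # to L, matching the convention used for omega = 0.
            to_lower = sum(cost[x] for x in Q if x not in Q0)
            to_upper = sum(cost[x] for x in Q if x in Q0)
            (L if to_lower <= to_upper else U).update(Q)
            break
        side, F = found
        (U if side == "U" else L).update(F)
        Q -= F
    return L, U


Q0 = {0b0110, 0b1001, 0b1101}
Q = Q0 | {0b0001, 0b0010, 0b0101}
cost = {0b0001: 3, 0b0010: 1, 0b0101: 2, 0b0110: 4, 0b1001: 5, 0b1101: 1}  # hypothetical
for k in (1, 2):
    L, U = k_relaxed_ovp(Q, Q0, cost, k)
    print(k, sorted(f"{x:04b}" for x in U))
```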
If the 2-relaxed algorithm is employed on $Q$, then no feasible set will be missed. Let $F$ be the $k$-feasible set being removed from an inversion set $Q$ by the $k$-relaxed algorithm. Since the $k$-relaxed algorithm does not check subsets with more than $k$ minimal/maximal elements for feasibility, (nonrelaxed) feasible sets may be missed if $F$ is $U$-$k$-feasible and contains at least one subset with more than $k$ minimal 1-set elements, or if $F$ is $L$-$k$-feasible and contains a subset with more than $k$ maximal 0-set elements. Note, however, that the existence of
N.S.T. Hirata et al. / Pattern Recognition 33 (2000) 1059}1081
Fig. 12. Result of the relaxed algorithm is not optimal.
Fig. 13. Result of the relaxed algorithm is optimal.
such subsets in $F$ does not imply the existence of nonrelaxed feasible sets in $F$. In some cases, the relaxed algorithm may produce an optimal result even after missing some feasible sets. To illustrate such a case, consider the inversion set $Q$ shown in Fig. 13(a). If the 1-relaxed algorithm is applied to $Q$, then two 1-feasible sets, $U[\{i\}]$ and $U[\{k\}]$ (the latter inside $Q \setminus U[\{i\}]$), will be found. Again, $U[\{i\}]$ is not a feasible set in the strict sense. Nonetheless, the result produced by the 1-relaxed algorithm, shown in Fig. 13(b), is optimal (the optimal result from the 2-relaxed algorithm is shown in Fig. 13(c)). This example shows that if no elements are moved to the wrong side of the partition, the result may still be optimal, even if
some choices violating strict feasibility are made at intermediate steps. Let $Q$ be the inversion set shown in Fig. 14(a). If the 1-relaxed algorithm is applied to it, only one 1-feasible set ($L[\{e\}]$) will be found (Fig. 14(b)). There is no 1-feasible set in the remaining inversion set $\{a, b, c, d, f\}$ (Fig. 14(c)). In general, if at some point there exists no $k$-feasible set in the remaining inversion set, then the $k$-relaxed algorithm will not be able to process this portion, which we call an unsolvable portion. To process an unsolvable portion $Q'$, a simple strategy consists in computing $\Delta(Q', \emptyset)$ and $\Delta(\emptyset, Q')$, and then moving $Q'$ to the upper set if $\Delta(\emptyset, Q') < \Delta(Q', \emptyset)$, or to the lower set otherwise. Alternatively, if $Q'$ is
Fig. 14. An example of unsolvable portion.
small enough, then all possible partitions of $Q'$ can be checked and a partition with the smallest error increase can be chosen. A pseudo-code for the relaxed algorithm is shown below.

Algorithm 2 (Relaxed OVP algorithm)
(1) Set $U \leftarrow \emptyset$ and $L \leftarrow \emptyset$.
(2) If $Q$ is empty, then return $(L, U)$ and exit.
(3) Search for a $k$-feasible set $F$ in $Q$.
  • If $F$ is found: if it is $U$-feasible, then do $U \leftarrow U \cup F$; if it is $L$-feasible, then do $L \leftarrow L \cup F$. Do $Q \leftarrow Q \setminus F$.
  • If $F$ is not found, compute $\Delta(Q, \emptyset)$ and $\Delta(\emptyset, Q)$. If $\Delta(Q, \emptyset) \le \Delta(\emptyset, Q)$, then do $L \leftarrow L \cup Q$; otherwise do $U \leftarrow U \cup Q$. Do $Q \leftarrow \emptyset$.
Return to step 2.
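A compact Python sketch of Algorithm 2 follows, under hypothetical assumptions: `Q` maps each element (a binary tuple) to $(\psi(x), c(x))$; $\Delta(L, U)$ is read, following the proof of Proposition 6, as the cost of 1-set elements assigned to $L$ plus the cost of 0-set elements assigned to $U$; and the $k$-feasible search is a brute-force stand-in (with $L$-feasibility assumed to mean $u(F) \ge 0$), not the authors' level-wise search.

```python
from itertools import combinations


def leq(a, b):
    """Componentwise partial order on binary tuples."""
    return all(x <= y for x, y in zip(a, b))


def u(F, Q):
    # Assumed: cost of 0-set elements of F minus cost of its 1-set elements.
    return sum(c if v == 0 else -c for v, c in (Q[x] for x in F))


def delta(L, U, Q):
    # Error increase: 1-set elements forced to 0 plus 0-set elements forced to 1.
    return (sum(Q[x][1] for x in L if Q[x][0] == 1)
            + sum(Q[x][1] for x in U if Q[x][0] == 0))


def find_k_feasible(Q, k):
    # Brute-force stand-in: smallest upper (lower) set of Q generated by at most
    # k elements with u < 0 (u >= 0).
    elems, best = list(Q), None
    for r in range(1, k + 1):
        for gens in combinations(elems, r):
            up = frozenset(x for x in elems if any(leq(g, x) for g in gens))
            down = frozenset(x for x in elems if any(leq(x, g) for g in gens))
            for F, side, ok in ((up, "U", u(up, Q) < 0), (down, "L", u(down, Q) >= 0)):
                if ok and (best is None or len(F) < len(best[0])):
                    best = (F, side)
    return best


def relaxed_ovp(Q, k):
    """Algorithm 2: return the partition (L, U) of the inversion set Q."""
    L, U = set(), set()                        # step (1)
    remaining = dict(Q)
    while remaining:                           # step (2)
        found = find_k_feasible(remaining, k)  # step (3)
        if found is not None:
            F, side = found
            (U if side == "U" else L).update(F)
            for x in F:
                del remaining[x]
        else:
            # Unsolvable portion: send it whole to the cheaper side (ties go to L).
            rest = set(remaining)
            if delta(rest, set(), remaining) <= delta(set(), rest, remaining):
                L |= rest
            else:
                U |= rest
            remaining = {}
    return L, U
```

For example, with the hypothetical inversion set `{(0, 0): (1, 3), (1, 0): (0, 2), (0, 1): (0, 2)}` there is no 1-feasible set, so `relaxed_ovp(..., 1)` treats the whole set as an unsolvable portion and, since $\Delta(Q, \emptyset) = 3 \le 4 = \Delta(\emptyset, Q)$, assigns it to $L$; `relaxed_ovp(..., 2)` reaches the same partition through the $L$-2-feasible set $L[\{(1,0), (0,1)\}]$.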
Fig. 15. Unsolvable portion processed correctly.
Suppose a situation in which no feasible set has been missed up to the point where an unsolvable portion is found. Even after processing the unsolvable portion as described, the result may still be optimal. To see such a case, suppose the unsolvable portion $Q'$ to be processed is the one shown in Fig. 15(a) (none of $L[\{a\}]$, $L[\{b\}]$ and $U[\{c\}]$ is 1-feasible). The 1-relaxed algorithm computes $\Delta(Q', \emptyset) = 4$ and $\Delta(\emptyset, Q') = 6$ and decides to move $Q'$ to the lower set. Since $L[\{a, b\}] = Q'$ is a feasible set (in the strict sense), the nonrelaxed algorithm would also move it to the lower set (Fig. 15(b)). On the other hand, the strategy adopted to process the unsolvable portion may produce a nonoptimal result, as shown in the following case. Let $Q'$ be the unsolvable portion shown in Fig. 16(a). Then $\Delta(Q', \emptyset) = 15$ and $\Delta(\emptyset, Q') = 13$, so $Q'$ will be moved to the upper set, as shown in Fig. 16(b). However, the optimal solution ($\Delta = 11$) is the one shown in Fig. 16(c). In summary, relaxation may produce non-optimal results due to missed feasible sets or incorrect processing of the unsolvable portion. However, it is important to keep in mind that missing feasible sets or the existence of an unsolvable portion does not necessarily imply suboptimality. Cases in which feasible sets may have been missed, or in which an unsolvable portion has been found, will be referred to as uncertain cases.

5.3. Bounds for the error increase
It is possible to state worst-case bounds for the error increase of the operator produced by the relaxed OVP algorithm, which may be a nonoptimal increasing operator. If we let $\Delta$ denote the error increase of the optimal partition derived from an inversion set $Q$, then
$$0 \le \Delta \le \min\{\Delta(Q, \emptyset), \Delta(\emptyset, Q)\}.$$
Let $U'$ and $L'$ be the sets of elements in the upper and lower sets, respectively, when the algorithm faces an uncertain case for the first time. At this point, it is certain that the elements of $U'$ and $L'$ belong, respectively, to the upper and lower sets of the optimal partition. The error increases due to $U'$ and $L'$ are given, respectively, by
$$c^{+}(U') = \sum_{x \in U'\langle 0\rangle} c(x) \quad \text{and} \quad c^{-}(L') = \sum_{x \in L'\langle 1\rangle} c(x).$$
Therefore the error increase up to the point where the algorithm faces an uncertain case for the first time is given by
Fig. 16. Unsolvable portion processed incorrectly.
$c^{+}(U') + c^{-}(L')$, which is a lower bound for the actual value of $\Delta$, i.e.,
$$0 \le c^{+}(U') + c^{-}(L') \le \Delta.$$
On the other hand, let $\Delta_k$ denote the error increase corresponding to the final partition actually produced by the $k$-relaxed algorithm. Then $\Delta_k \le \min\{\Delta(Q, \emptyset), \Delta(\emptyset, Q)\}$. Therefore the following relation holds:
$$0 \le c^{+}(U') + c^{-}(L') \le \Delta \le \Delta_k \le \min\{\Delta(Q, \emptyset), \Delta(\emptyset, Q)\}. \tag{16}$$
As discussed before, if the algorithm does not face an uncertain case, then the result is optimal, i.e., $\Delta_k = \Delta$. As we will see in the experimental results, $\Delta_k$ is usually very close to $\Delta$.

6. Experimental results

We experimentally consider the use of different values of $k$ for the relaxed OVP algorithm and compare it to a standard implementation (described later) of the MAE-representation-based algorithm. Several experiments have been performed on a Pentium II, 300 MHz, using different image models (A, B, C and D), shown in Figs. 17–20. The windows used in the experiments are shown in Fig. 21.

6.1. OVP algorithm performance

If the $k$-relaxed algorithm does not face an uncertain case, then we know that $\Delta_k$ is optimal. Moreover, if $\Delta_k$ is optimal, then $\Delta_l = \Delta_k$ for all $l \ge k$. On the other hand, even when the algorithm faces an uncertain case, the result may still be optimal, as discussed in the previous section. However, to guarantee optimality we need
Fig. 17. Group A: 15% salt-and-pepper noise.
to make sure that no uncertain case has occurred during the processing. Table 1 shows the results obtained for 67 experiments using a 3×3 window. Rather than presenting the values of $\Delta_k$, here we emphasize the comparison among $\Delta_1$, $\Delta_2$ and $\Delta_3$. These results are separated into two major groups: the optimal group (cases in which at least one of the $k$-relaxed algorithms, $k = 1$, 2 or 3, did not face an uncertain case) and the uncertain group (cases in which all the $k$-relaxed algorithms, $k = 1$, 2 and 3, faced an uncertain case). Each of these groups is subdivided into subcases. For the optimal cases, if column 2 indicates $k = 1$, then the 1-relaxed algorithm did not face an uncertain case, and therefore $\Delta_1$ is known to be optimal; moreover, in this case, $\Delta_1 = \Delta_2 = \Delta_3$. If $k = 2$, then the 1-relaxed algorithm faced an uncertain case while the 2-relaxed algorithm did not, i.e., $\Delta_2$ is optimal. In this case, for the subcases in which $\Delta_1 = \Delta_2$, we conclude, after computing $\Delta_2$ and knowing that it is optimal, that $\Delta_1$ is also optimal; if $\Delta_1 > \Delta_2$, then we conclude that $\Delta_1$ is suboptimal. The same analysis extends to $k = 3$.
Fig. 18. Group B: Edge noise, different densities.
Fig. 19. Group C: Boolean model with square grains, and subsets of a 3×3 square as noise.
Fig. 20. Group D: Rectangles with edge noise having di!erent densities.
Fig. 21. Windows used in the experiments.
The third column shows how many times the optimal solution was found using the corresponding $k$ in column 2. Column 4 shows the subcases for each $k$, and column 5 shows how many times each of them was observed. Column Model indicates the image model used in the experiments, and the last column indicates how often each subcase was observed for the respective image model. Table 2 shows the same information for 69 experiments using the 13-point window.
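The bookkeeping behind the optimal/uncertain groups of Tables 1 and 2 can be sketched in Python as follows. The function names and the tolerance parameter `tol` are hypothetical; the logic only restates the reasoning above: the smallest $k$ whose run faced no uncertain case gives an optimal $\Delta_k$, and any $\Delta_j$ with $j < k$ that exceeds it is known to be suboptimal. The subcase string spells out every relation, whereas the tables write implied equalities in parentheses.

```python
def relation(a, b, tol=0.0):
    """'=' if a and b agree within tol, otherwise '>' or '<'."""
    if abs(a - b) <= tol:
        return "="
    return ">" if a > b else "<"


def classify(deltas, uncertain, tol=0.0):
    """Place one experiment in the groups used by Tables 1 and 2.

    deltas    -- (Δ1, Δ2, Δ3) from the 1-, 2- and 3-relaxed algorithms
    uncertain -- (u1, u2, u3): True if the k-relaxed run faced an uncertain case
    Returns (group, k, subcase, suboptimal_ks).
    """
    d1, d2, d3 = deltas
    subcase = f"Δ1{relation(d1, d2, tol)}Δ2{relation(d2, d3, tol)}Δ3"
    certain = [k for k in (1, 2, 3) if not uncertain[k - 1]]
    if not certain:
        return "uncertain", None, subcase, []
    k = certain[0]
    optimum = deltas[k - 1]  # no uncertain case occurred, so Δ_k is optimal
    # Any Δ_j (j < k) larger than the optimum is known to be suboptimal.
    suboptimal = [j for j in range(1, k) if deltas[j - 1] > optimum + tol]
    return "optimal", k, subcase, suboptimal
```

With floating-point MAE values, a small positive `tol` avoids classifying rounding noise as a strict inequality.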
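Table 1
OVP algorithm performance for 3×3 window

Group | Case | Occurr. | Subcase | Occurr. | Model | Occurr.
Optimal | $k = 1$ | 34 | $\Delta_1\ (= \Delta_2 = \Delta_3)$ | 34 | A / C / D | 1 / 3 / 30
Optimal | $k = 2$ | 11 | $\Delta_1 = \Delta_2\ (= \Delta_3)$ | 11 | D | 11
Optimal | $k = 3$ | 8 | $\Delta_1 = \Delta_2 = \Delta_3$ | 6 | D | 6
 | | | $\Delta_1 > \Delta_2 = \Delta_3$ | 2 | B / D | 1 / 1
Uncertain | | 14 | $\Delta_1 = \Delta_2 = \Delta_3$ | 13 | B / D | 2 / 11
 | | | $\Delta_1 > \Delta_2 = \Delta_3$ | 1 | B | 1
Total | | 67 | | 67 | | 67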
Table 2
OVP algorithm performance for 13-point window

Group | k | Occurr. | Subcase | Occurr. | Model | Occurr.
Optimal | $k = 1$ | 22 | $\Delta_1\ (= \Delta_2 = \Delta_3)$ | 22 | A / D | 1 / 21
Optimal | $k = 2$ | 8 | $\Delta_1 = \Delta_2\ (= \Delta_3)$ | 7 | C / D | 1 / 6
 | | | $\Delta_1 > \Delta_2\ (= \Delta_3)$ | 1 | A | 1
Optimal | $k = 3$ | 8 | $\Delta_1 = \Delta_2 = \Delta_3$ | 6 | D | 6
 | | | $\Delta_1 > \Delta_2 = \Delta_3$ | 2 | C | 2
Uncertain | | 31 | $\Delta_1 = \Delta_2 = \Delta_3$ | 17 | B / D | 2 / 15
 | | | $\Delta_1 > \Delta_2 = \Delta_3$ | 7 | B / D | 1 / 6
 | | | $\Delta_1 = \Delta_2 > \Delta_3$ | 1 | D | 1
 | | | $\Delta_1 > \Delta_2 > \Delta_3$ | 6 | B / D | 1 / 5
Total | | 69 | | 69 | | 69

Table 3 gives more details on the 17 cases, for the 13-point window, in which $\Delta_1 > \Delta_3$. Column 1 shows the MAE of the optimal operator $\psi_{\mathrm{opt}}$, and columns 2 and 3 show, respectively, the MAE of the increasing operator generated by the algorithm for $k = 3$ and $k = 1$. As column 4 shows, there is no significant difference between the error increases of these two increasing operators. Table 4 shows the processing times for different window and inversion set sizes. Times are given in seconds and have been rounded to the nearest non-null integer value. The symbol * indicates that the corresponding result is optimal, and the symbol × indicates that the corresponding case was not processed. The processing time is the total time spent on: (a) inversion set computation, (b) graph structure building, (c) OVP algorithm application, (d) computation of the minimal elements of
the resulting operator's 1-set, and (e) I/O. All experiments with the 3×3 window took no more than 0.27 s.

6.2. OVP algorithm versus MAE-representation-based algorithm

In this section we empirically compare the OVP algorithm with an algorithm based on the MAE representation theorem. Any such comparison depends on the manner in which the MAE representation is employed. The implementation of the MAE-representation-based algorithm we have used has been employed in previous studies and consists of the following steps:
(1) Compute the MAE for each of the $2^n$ structuring elements (Eq. (8)).
(2) Select the 100 structuring elements with the smallest MAE to build the first-order library.
(3) Compute the MAE for all combinations of up to $m$ structuring elements of the library, where $m$ is the basis-size constraint, using a recursive algorithm that implements Eq. (10).

Table 3
Error increase variation between $k = 1$ and $k = 3$

$\mathrm{MAE}\langle\psi_{\mathrm{opt}}\rangle$ | $\mathrm{MAE}\langle\psi_{\mathrm{opt}}\rangle + \Delta_3$ | $\mathrm{MAE}\langle\psi_{\mathrm{opt}}\rangle + \Delta_1$ | $\Delta_1 - \Delta_3$
5.6226e-03 | 6.0367e-03 | 6.0368e-03 | 1e-07
7.6559e-03 | 8.1047e-03 | 8.1048e-03 | 1e-07
5.2408e-04 | 7.8244e-04 | 7.8259e-04 | 1.5e-07
4.9300e-03 | 5.3283e-04 | 5.3265e-04 | 1.8e-07
3.9749e-04 | 4.4473e-04 | 4.4492e-04 | 1.9e-07
5.1621e-03 | 5.6847e-03 | 5.6849e-03 | 2e-07
7.1254e-03 | 7.6549e-03 | 7.6551e-03 | 2e-07
1.8423e-04 | 2.7528e-04 | 2.7552e-04 | 2.4e-07
6.2505e-03 | 8.9047e-03 | 8.9051e-03 | 4e-07
1.5442e-03 | 1.9622e-03 | 1.9629e-03 | 7e-07
3.1494e-03 | 3.2061e-03 | 3.2069e-03 | 8e-07
2.1721e-03 | 2.7239e-03 | 2.7248e-03 | 9e-07
1.9718e-03 | 2.2409e-03 | 2.2425e-03 | 1.6e-06
2.1792e-03 | 2.7267e-03 | 2.7289e-03 | 2.2e-06
5.0711e-04 | 9.9452e-04 | 9.9836e-04 | 3.84e-06
5.2177e-04 | 9.9219e-04 | 9.9659e-04 | 4.4e-06
1.0535e-02 | 1.4451e-02 | 1.4465e-02 | 1.4e-05

Implementations of Eq. (10) better than a recursive algorithm are possible. For instance, it could be implemented using dynamic programming, taking advantage of the fact that the MAE of any $m$-erosion filter can be computed from the MAE of two $(m-1)$-erosion filters and the MAE of a single-erosion filter. A difficulty with such an approach is that the number of $(m-1)$-erosion filters is very large, thereby requiring a large amount of memory for storage. Moreover, even if storage is possible, the MAEs of the $(m-1)$-erosion filters need to be stored in such a way that they can be found quickly. More generally, different architectures permit specialized approaches. Table 5 shows some comparisons between the OVP algorithm and the MAE-representation-based algorithm (whose implementation is described above). For the MAE-representation-based algorithm, the number of structuring elements in the bases has been constrained, as shown in the column Size constraint of the table. Beyond the comparisons of Table 5, it should be noted that in Table 4 the OVP algorithm has been used for window sizes of 25 and 49, with the number of basis elements in the thousands and tens of thousands, respectively. Even for industrial implementation, application of the MAE
Table 4
Time measurements for some experiments

Window size | Examples | Inversion set | Edges | Basis | Time ±0.5 (in s), k = 1 | k = 2 | k = 3
13 | 3671 | 622 | 1516 | 41 | 1 | 1 | 1
 | 8180 | 902 | 1084 | 596 | 2 | 2* | ×
 | 8185 | 2452 | 8778 | 215 | 3 | 6 | 96
 | 8185 | 70 | 42 | 831 | 1* | × | ×
 | 7892 | 3090 | 12,718 | 165 | 3 | 8 | 40
 | 5864 | 4492 | 22,762 | 69 | 4 | 15 | 1125
 | 6605 | 5020 | 26,368 | 81 | 5 | 26 | 1483
16 | 47,648 | 5277 | 20,139 | 1856 | 66 | 67 | 72
 | 56,706 | 14,063 | 13,209 | 1485 | 223 | 393 | 18,621
17 | 74,253 | 7066 | 10,532 | 2820 | 137 | 148 | 161
 | 68,011 | 18,656 | 52,716 | 1995 | 397 | 1028 | ×
 | 95,439 | 24,539 | 14,549 | 2153 | 779 | 2595 | ×
25 | 52,615 | 5101 | 73,541 | 212 | 87 | 88* | ×
 | 290,892 | 5376 | 97,878 | 5183 | 288 | 289 | 311
 | 74,914 | 10,090 | 42,711 | 198 | 203 | 204 | 204
 | 95,565 | 10,899 | 45,789 | 8 | 449 | 465 | 1571
 | 597,965 | 19,885 | 68,891 | 9384 | 3749 | 4075 | 5830
 | 368,717 | 87,732 | 808,285 | 6151 | 11,459 | 66,518 | ×
 | 836,334 | 198,168 | 2,040,243 | 8375 | 16 h | 7 d | ×
49 | 1,173,140 | 445 | 426 | 37,376 | 678 | 671* | ×
Table 5
OVP algorithm versus MAE-representation-based algorithm

Window size | OVP: Basis size | OVP: $\Delta$ | OVP: Time | MAE theorem: Size constraint | MAE theorem: Basis size | MAE theorem: $\Delta$ | MAE theorem: Time
9 | 4 | 2.0488e-04 | 1 | 6 | 4 | 2.0488e-04 | 2202
 | 4 | 1.2631e-03 | 1 | 6 | 4 | 1.2631e-03 | 2190
 | 10 | 4.5525e-05 | 1 | 6 | 6 | 3.4783e-04 | 2499
 | 22 | 2.2020e-06 | 1 | 6 | 6 | 2.5304e-04 | 4541
 | 100 | 0.0 | 1 | 6 | 6 | 1.1874e-02 | 8433
 | 100 | 0.0 | 1 | 8 | 8 | 1.3307e-02 | 13.7 d
 | 24 | 6.8646e-04 | 1 | 6 | 6 | 1.0895e-03 | 4158
 | 34 | 0.0 | 1 | 6 | 6 | 4.7806e-04 | 7047
13 | 276 | 1.3440e-04 | 1 | 6 | 4 | 2.1947e-02 | 19152
 | 831 | 6.8237e-06 | 1 | 6 | 5 | 1.8189e-02 | 19192
representation restricts the basis to a very small number of structuring elements selected from a relatively small library [13].
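To see why the MAE-representation-based search has to be restricted in this way, the following Python sketch counts the candidate bases that step (3) must score. It assumes (our reading of the steps listed above) that every combination of 1 to $m$ elements of the 100-element library is evaluated once; step (1) alone already scores $2^n$ structuring elements for an $n$-point window. The function name is illustrative.

```python
from math import comb


def candidate_bases(library_size, m):
    """Number of combinations of 1..m structuring elements drawn from the library."""
    return sum(comb(library_size, j) for j in range(1, m + 1))


for n in (9, 13):
    print(f"{n}-point window: {2 ** n} structuring elements scored in step (1)")
for m in (4, 6, 8):
    print(f"size constraint {m}: {candidate_bases(100, m):,} candidate bases in step (3)")
```

With a 100-element library, raising the size constraint from 6 to 8 multiplies the number of candidate bases by roughly 160 (about $1.3 \times 10^{9}$ versus $2.0 \times 10^{11}$), which illustrates how quickly the running time of this approach grows with the basis size.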
7. Conclusion

Increasing translation-invariant binary filters play an important role in image processing, especially in morphological processing, and optimal filters need to be estimated from image data. All known approaches involve combinatorial searches. The algorithm proposed in this paper has the advantage that the search is over a smaller set than in other algorithms. The new algorithm is based on a greedy property. An actual implementation may be based on a relaxation of the original construction, so that the result may be suboptimal; nonetheless, for the various models tested, the result has proven to be optimal or very close to optimal. Besides its precision, three properties of the new algorithm are noteworthy: first, it is applicable to relatively large windows, and this applicability has been experimentally demonstrated; second, it operates directly on the input data via estimates of the conditional probabilities $P(Y = 1 \mid x)$; and third, the degree of suboptimality (relaxation) serves as an input parameter to the algorithm, so that computation time can be bounded for large windows and the algorithm can run to full optimality for small windows. It is conceivable that the switching paradigm can be applied to other algebraically constrained filter-optimization problems to achieve precise estimation of the optimal constrained filter while satisfying the three aforementioned properties. Application of switching-based estimation techniques to other constraints is a goal of our future research.
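Appendix A

This appendix presents proofs for some of the propositions and theorems.

Proof of Proposition 1. (a) We prove that $L_A \cup U_A = Q$, $L_A \cap U_A = \emptyset$, and no element of $U_A$ lies beneath any element of $L_A$:
$$\begin{aligned}
L_A \cup U_A &= [(Q\langle 0\rangle \setminus A\langle 0\rangle) \cup A\langle 1\rangle] \cup [(Q\langle 1\rangle \setminus A\langle 1\rangle) \cup A\langle 0\rangle] \\
&= [(Q\langle 0\rangle \setminus A\langle 0\rangle) \cup A\langle 0\rangle] \cup [(Q\langle 1\rangle \setminus A\langle 1\rangle) \cup A\langle 1\rangle] \\
&= Q\langle 0\rangle \cup Q\langle 1\rangle = Q,
\end{aligned}$$
$$\begin{aligned}
L_A \cap U_A &= [(Q\langle 0\rangle \setminus A\langle 0\rangle) \cup A\langle 1\rangle] \cap [(Q\langle 1\rangle \setminus A\langle 1\rangle) \cup A\langle 0\rangle] \\
&= [(Q\langle 0\rangle \cap A\langle 0\rangle^{c}) \cup A\langle 1\rangle] \cap [(Q\langle 1\rangle \cap A\langle 1\rangle^{c}) \cup A\langle 0\rangle] \\
&= \big[(Q\langle 0\rangle \cap A\langle 0\rangle^{c}) \cap [(Q\langle 1\rangle \cap A\langle 1\rangle^{c}) \cup A\langle 0\rangle]\big] \cup \big[A\langle 1\rangle \cap [(Q\langle 1\rangle \cap A\langle 1\rangle^{c}) \cup A\langle 0\rangle]\big] \\
&= \emptyset \cup \emptyset = \emptyset.
\end{aligned}$$
Suppose, for contradiction, that there exists $y \in L_A$ such that $x < y$ for some $x \in U_A$. Since $L_A \subseteq (\psi\langle 0\rangle \setminus A\langle 0\rangle) \cup A\langle 1\rangle = \psi_A\langle 0\rangle$ and $U_A \subseteq (\psi\langle 1\rangle \setminus A\langle 1\rangle) \cup A\langle 0\rangle = \psi_A\langle 1\rangle$, if we build $\psi_A$ (as defined by Eq. (11)), then we will have $\psi_A(x) = 1$ and $\psi_A(y) = 0$, which is a contradiction because $\psi_A$ is increasing (since $A$ is a switching set).

(b) By definition, $A_{(L,U)}$ is a switching set if and only if $\psi_{A_{(L,U)}}$ is an increasing operator. Suppose, for contradiction, that $A_{(L,U)}$ is not a switching set. Then $\psi_{A_{(L,U)}}$ is not increasing, i.e., there exist $x$ and $y$, $x < y$, such that $x \in \psi_{A_{(L,U)}}\langle 1\rangle$ and $y \in \psi_{A_{(L,U)}}\langle 0\rangle$, where
$$\begin{aligned}
\psi_{A_{(L,U)}}\langle 1\rangle &= [\psi\langle 1\rangle \setminus (L \cap Q\langle 1\rangle)] \cup (U \cap Q\langle 0\rangle) = [\psi\langle 1\rangle \setminus L\langle 1\rangle] \cup U\langle 0\rangle, \\
\psi_{A_{(L,U)}}\langle 0\rangle &= [\psi\langle 0\rangle \setminus (U \cap Q\langle 0\rangle)] \cup (L \cap Q\langle 1\rangle) = [\psi\langle 0\rangle \setminus U\langle 0\rangle] \cup L\langle 1\rangle.
\end{aligned}$$
Therefore, either $x \in \psi\langle 1\rangle \setminus L\langle 1\rangle$ or $x \in U\langle 0\rangle$, and either $y \in \psi\langle 0\rangle \setminus U\langle 0\rangle$ or $y \in L\langle 1\rangle$. We will show that all possibilities imply $x \in U$ and $y \in L$, which is a contradiction because $(L, U)$ is a valid partition of $Q$.
(i) The case $x \in U\langle 0\rangle$ and $y \in L\langle 1\rangle$ is trivial.
(ii) If $x \in U\langle 0\rangle$ and $y \in \psi\langle 0\rangle \setminus U\langle 0\rangle$, then $x \in Q$ (since $U\langle 0\rangle \subseteq Q$), i.e., there exists $z \in X$, $z < x$, such that $\psi(x) = 0$ and $\psi(z) = 1$. Therefore $y \in Q$ (since $y > x > z$ and $\psi(y) = 0$), i.e., $y \in [\psi\langle 0\rangle \setminus U\langle 0\rangle] \cap Q = L\langle 0\rangle$.
(iii) If $x \in \psi\langle 1\rangle \setminus L\langle 1\rangle$ and $y \in L\langle 1\rangle$, then $y \in Q$, hence $x \in Q$ too, and therefore $x \in [\psi\langle 1\rangle \setminus L\langle 1\rangle] \cap Q = U\langle 1\rangle$, by the same reasoning as in (ii).
(iv) If $x \in \psi\langle 1\rangle \setminus L\langle 1\rangle$ and $y \in \psi\langle 0\rangle \setminus U\langle 0\rangle$, then $\psi(x) = 1$ and $\psi(y) = 0$, and therefore $x, y \in Q$ (since $y > x$), which means that $x \in [\psi\langle 1\rangle \setminus L\langle 1\rangle] \cap Q = U\langle 1\rangle$ and $y \in [\psi\langle 0\rangle \setminus U\langle 0\rangle] \cap Q = L\langle 0\rangle$.

(c)
$$\begin{aligned}
A_{(L_A, U_A)} &= (L_A \cap Q\langle 1\rangle) \cup (U_A \cap Q\langle 0\rangle) \\
&= [((Q\langle 0\rangle \setminus A\langle 0\rangle) \cup A\langle 1\rangle) \cap Q\langle 1\rangle] \cup [((Q\langle 1\rangle \setminus A\langle 1\rangle) \cup A\langle 0\rangle) \cap Q\langle 0\rangle] \\
&= A\langle 1\rangle \cup A\langle 0\rangle = A.
\end{aligned}$$

(d)
$$\begin{aligned}
L_{A_{(L,U)}} &= (Q\langle 0\rangle \setminus A_{(L,U)}\langle 0\rangle) \cup A_{(L,U)}\langle 1\rangle \\
&= (Q\langle 0\rangle \setminus (U \cap Q\langle 0\rangle)) \cup (L \cap Q\langle 1\rangle) \\
&= (Q\langle 0\rangle \setminus U\langle 0\rangle) \cup L\langle 1\rangle = L\langle 0\rangle \cup L\langle 1\rangle = L.
\end{aligned}$$
The equality $U_{A_{(L,U)}} = U$ can be proved in a similar way.

Proof of Proposition 2. (a) Since $L \cup (U \setminus F) = L \cup (U \cap F^{c}) = Q \setminus F$ and $L \cap (U \setminus F) = \emptyset$, $(L, U \setminus F)$ is a partition of $Q \setminus F$. Moreover, since no element of $U \setminus F$ lies beneath any element of $L$, it is a valid partition.

(b) Clearly $(L \setminus F) \cup (U \cup F) = Q$. Since $(Q \setminus F, F)$ is a valid partition of $Q$, no element of $F$ lies beneath any element of $Q \setminus F$, and thus no element of $F$ lies beneath any element of $L \setminus F$. Moreover, since $(L, U)$ is a valid partition of $Q$, no element of $U$ lies beneath any element of $L$, and thus no element of $U$ lies beneath any element of $L \setminus F$ (because $L \setminus F \subseteq L$). Thus, no element of $U \cup F$ lies beneath any element of $L \setminus F$. Therefore, $(L \setminus F, U \cup F)$ is a valid partition of $Q$.

(c) Since $(L \setminus (L \cap F)) \cup (U \cup (L \cap F)) = Q$ and $(L \setminus (L \cap F)) \cap (U \cup (L \cap F)) = \emptyset$, $(L \setminus (L \cap F), U \cup (L \cap F))$ is a partition of $Q$. Moreover, since $(Q \setminus F, F)$ is a valid partition of $Q$, no element of $F$ lies beneath any element of $L \setminus (L \cap F)$ (because $L \setminus (L \cap F) \subseteq Q \setminus F$) and, thus, no element of $L \cap F$ lies beneath any element of $L \setminus (L \cap F)$ (because $L \cap F \subseteq F$). Furthermore, since $(L, U)$ is a valid partition of $Q$, no element of $U$ lies beneath any element of $L \setminus (L \cap F)$ (because $L \setminus (L \cap F) \subseteq L$). Therefore, no element of $U \cup (L \cap F)$ lies beneath any element of $L \setminus (L \cap F)$. $(L \cap F, U \cap F)$ is a partition of $F$ because $(L \cap F) \cup (U \cap F) = F$ and $(L \cap F) \cap (U \cap F) = \emptyset$. Moreover, since $L \cap F \subseteq L$ and $U \cap F \subseteq U$, no element of $U \cap F$ lies beneath any element of $L \cap F$.

(d) Since $L' \cup (U' \cup F) = (L' \cup U') \cup F = (Q \setminus F) \cup F = Q$ and $L' \cap (U' \cup F) = (L' \cap U') \cup (L' \cap F) = \emptyset \cup \emptyset = \emptyset$, $(L', U' \cup F)$ is a partition of $Q$. To show that no element of $U' \cup F$ lies beneath any element of $L'$, we just need to show that no element of $F$ lies beneath any element of $L'$, since no element of $U'$ lies beneath any element of $L'$ (because $(L', U')$ is a valid partition of $Q \setminus F$). But this follows directly from the fact that $(Q \setminus F, F)$ is a valid partition of $Q$.

(e) Since $L'' \subseteq L$, no element of $L''$ lies above any element of $U$. Moreover, since $(L'', U'')$ is a valid partition of $L$, no element of $L''$ lies above any element of $U''$. Thus, no element of $L''$ lies above any element of $U \cup U'' = Q \setminus L''$.

Proof of Proposition 6.
$$\begin{aligned}
\Delta(L, U) &= \sum_{x \in L\langle 1\rangle} c(x) + \sum_{x \in U\langle 0\rangle} c(x) \\
&= \sum_{x \in L\langle 1\rangle} c(x) + \sum_{x \in U\langle 0\rangle} c(x) + \sum_{x \in U\langle 1\rangle} c(x) - \sum_{x \in U\langle 1\rangle} c(x) \\
&= \Delta(Q, \emptyset) + u(U).
\end{aligned}$$

Proof of Proposition 7. The pair $(L \setminus U', U \cup U')$ is a valid partition of $Q$ (Proposition 2(e)). Moreover,
$$\Delta(L \setminus U', U \cup U') = \Delta(Q, \emptyset) + u(U \cup U') = \Delta(Q, \emptyset) + u(U') + u(U) = \Delta(L \cup U', U) + u(U') = \Delta(L, U) + u(U').$$

Proof of Lemma 10. (a) If $Q$ does not contain a $U$-feasible set, then, by Definition 8, $u(U) \ge 0$ for any valid partition $(L, U)$ of $Q$. In particular, this is true for $(L, U) = (\emptyset, Q)$, which means that $\mathcal{F}_L$ is nonempty. (b) The proof is analogous.

Proof of Theorem 11. (a) To prove that $(L', U' \cup F)$ is an optimal partition of $Q$, we just need to prove that it is optimal, since we already know, by Proposition 2(a), that it is a valid partition of $Q$. We suppose, for contradiction, that it is not optimal, and show that this assumption always leads to a contradiction. If $(L', U' \cup F)$ is not an optimal partition of $Q$, then there exists some valid partition $(L, U)$ of $Q$ such that $\Delta(L, U) < \Delta(L', U' \cup F)$. Let us take, in particular, an optimal one, $(L^*, U^*)$. We analyze the relation of $F$ to $U^*$ and $L^*$ and show that all possible cases, (1) $F \subseteq L^*$, (2) $F \subseteq U^*$, and (3) $F \not\subseteq U^*$ and $F \not\subseteq L^*$, lead to a contradiction.

Case 1: If $F \subseteq L^*$, then $(L^* \setminus F, U^* \cup F)$ is a valid partition of $Q$ (Proposition 2(c)). From Proposition 7 and from the fact that $F$ is a $U$-feasible set, it follows that $\Delta(L^* \setminus F, U^* \cup F) = \Delta(L^*, U^*) + u(F) < \Delta(L^*, U^*)$. This is a contradiction, because $(L^*, U^*)$ is an optimal partition of $Q$.

Case 2: If $F \subseteq U^*$, then $(L^*, U^* \setminus F)$ is a valid partition of $Q'$ (Proposition 2(b)). Furthermore, from Proposition 7,
$$\Delta(L^*, U^*) = \Delta(L^*, F \cup (U^* \setminus F)) = \Delta(L^* \cup (U^* \setminus F), F) + u(U^* \setminus F) = \Delta(Q', F) + u(U^* \setminus F)$$
and
$$\Delta(L', U' \cup F) = \Delta(Q' \setminus U', F \cup U') = \Delta(Q', F) + u(U').$$
Since $(L^*, U^*)$ is an optimal partition of $Q$, we have $\Delta(L^*, U^*) \le \Delta(L', U' \cup F)$, which means, from the two expressions above, that $u(U^* \setminus F) \le u(U')$. Equality cannot hold, because then $(L', U' \cup F)$ would be an optimal partition of $Q$, contradicting our assumption. Therefore $u(U^* \setminus F) < u(U')$, and then, from Proposition 6,
$$\Delta(L^*, U^* \setminus F) = \Delta(Q', \emptyset) + u(U^* \setminus F) < \Delta(Q', \emptyset) + u(U') = \Delta(L', U'),$$
which is a contradiction because $(L', U')$ is an optimal partition of $Q'$.

Case 3: If $F \not\subseteq L^*$ and $F \not\subseteq U^*$, then let $X = F \cap L^*$. From Proposition 2(d), $(L^* \setminus X, U^* \cup X)$ is a valid partition of $Q$. We also know that $u(X) \ge 0$, because otherwise, by Proposition 7, $(L^* \setminus X, U^* \cup X)$ would be a valid partition of $Q$ with a smaller error increase than $(L^*, U^*)$, which is a contradiction. Since $u(F) < 0$ (because $F$ is a $U$-feasible set), from Proposition 5 we conclude that $u(F \setminus X) + u(X) = u(F) < 0$. From this and the fact that $u(X) \ge 0$, it follows that $u(F \setminus X) = u(F) - u(X) < 0$. This is a contradiction, because $F \setminus X \subset F$, violating the minimality condition of $F$ in $\mathcal{F}_U$.

(b) The proof is analogous.

Proof of Theorem 12. The case $Q = \emptyset$ is trivial. Otherwise, there exists at least one feasible set of $Q$ (Lemma 10). Let $F_1, F_2, \dots, F_t$, $t \ge 1$, be the sequence of feasible sets found by the algorithm. Note that $Q = F_1 \cup F_2 \cup \dots \cup F_t$. If $F_t$ is a $U$-feasible set, then $(\emptyset, F_t)$ is an optimal partition of $F_t$, and if $F_t$ is an $L$-feasible set, then $(F_t, \emptyset)$ is an optimal partition of $F_t$ (Theorem 11, with $(L', U') = (\emptyset, \emptyset)$). Now suppose $(L', U')$ is an optimal partition of $F_{i+1} \cup \dots \cup F_t$. By Theorem 11, if $F_i$ is a $U$-feasible set, then $(L', U' \cup F_i)$ is an optimal partition of $F_i \cup F_{i+1} \cup \dots \cup F_t$, and if $F_i$ is an $L$-feasible set, then $(L' \cup F_i, U')$ is an optimal partition of $F_i \cup F_{i+1} \cup \dots \cup F_t$, for $i = t-1, t-2, \dots, 1$. In other words, given an optimal partition of $F_{i+1} \cup \dots \cup F_t$, an optimal partition of $F_i \cup F_{i+1} \cup \dots \cup F_t$ can be built by adding $F_i$ to the upper set or to the lower set, depending on whether $F_i$ is a $U$- or an $L$-feasible set, respectively. Therefore, the partition of $Q$ whose upper and lower sets are formed, respectively, by all $U$- and $L$-feasible sets is an optimal partition of $Q$. Since this is exactly the partition returned by the algorithm, it is optimal.

Proof of Proposition 14. (a) If $x \in F$, then, by minimality, $x \ge x_i$ for some $i \in \{1, 2, \dots, k\}$. Therefore, $x \in U[\{x_1, x_2, \dots, x_k\}]$. On the other hand, if $x \in U[\{x_1, x_2, \dots, x_k\}]$, then there exists $x_i$, for some $i \in \{1, 2, \dots, k\}$, such that $x \ge x_i$. Suppose, for contradiction, that $x \notin F$. In this case, $x \in Q \setminus F$, and $(Q \setminus F, F)$ would not be a valid partition of $Q$, since $x_i \in F$ lies beneath $x \in Q \setminus F$, which is a contradiction. (b) The proof is analogous.

Proof of Proposition 15. (a) Suppose, for contradiction, that some minimal element of $F$ is in $Q\langle 0\rangle$, and let $x$ be one of them. By definition, $u(\{x\}) \ge 0$. Therefore, $u(F) = u(F \setminus \{x\}) + u(\{x\})$, so $u(F \setminus \{x\}) = u(F) - u(\{x\}) \le u(F) < 0$. Then $(\{x\}, F \setminus \{x\})$ is a valid partition of $F$ and $u(F \setminus \{x\}) < 0$, which contradicts the fact that $F$ is a $U$-feasible set, since $F \setminus \{x\} \subset F$. (b) The proof is analogous.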
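As a sanity check on our reading of $\Delta$ and $u$ (cost of 1-set elements placed in $L$ plus cost of 0-set elements placed in $U$; cost of 0-set minus cost of 1-set elements), the following Python sketch verifies the identities of Propositions 6 and 7 by exhaustive enumeration on a small random instance with hypothetical costs. The order structure and partition validity are deliberately ignored: the two identities are purely additive, and validity only matters for the other claims of those propositions.

```python
import random
from itertools import product


def u(F, Q):
    # Assumed reading: cost of 0-set elements minus cost of 1-set elements.
    return sum(c if v == 0 else -c for v, c in (Q[x] for x in F))


def delta(L, U, Q):
    # Delta(L, U): 1-set elements placed in L plus 0-set elements placed in U.
    return (sum(Q[x][1] for x in L if Q[x][0] == 1)
            + sum(Q[x][1] for x in U if Q[x][0] == 0))


def check_identities(n=8, seed=0):
    rng = random.Random(seed)
    # Hypothetical instance: element -> (psi value, integer cost c(x)).
    Q = {i: (rng.randint(0, 1), rng.randint(1, 9)) for i in range(n)}
    elems = list(Q)
    base = delta(set(elems), set(), Q)  # Delta(Q, empty set)
    for mask in product((0, 1), repeat=n):
        U = {x for x, bit in zip(elems, mask) if bit}
        L = set(elems) - U
        assert delta(L, U, Q) == base + u(U, Q)  # Proposition 6
        Ls = sorted(L)
        for sub in product((0, 1), repeat=len(Ls)):
            U2 = {x for x, bit in zip(Ls, sub) if bit}  # U' taken inside L
            # Proposition 7: moving U' from L to U changes Delta by u(U').
            assert delta(L - U2, U | U2, Q) == delta(L, U, Q) + u(U2, Q)
    return True
```

Integer costs keep the comparisons exact; with real-valued estimates of $c(x)$ a tolerance would be needed.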
References

[1] G.J.F. Banon, J. Barrera, Minimal representations for translation-invariant set mappings by mathematical morphology, SIAM J. Appl. Math. 51 (6) (1991) 1782–1798.
[2] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.
[3] P. Maragos, R.W. Schafer, Morphological filters: Part I: Their set-theoretic analysis and relations to linear shift-invariant filters, IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (1987) 1153–1169.
[4] E.R. Dougherty, C.R. Giardina, Morphological Methods in Image and Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[5] E.R. Dougherty, M. Haralick, Unification of nonlinear filtering in the context of binary logical calculus, Part I: Binary filters, J. Math. Imaging Vision 2 (2) (1992) 173–183.
[6] E.R. Dougherty, J. Barrera, Logical image operators, in: Nonlinear Filters for Image Processing, SPIE and IEEE Press, Bellingham, 1999.
[7] E.R. Dougherty, R.P. Loce, Precision of morphological representation estimator for translation-invariant binary filters: increasing and nonincreasing, Signal Processing 40 (1994) 129–154.
[8] E.R. Dougherty, P.A. Laplante, Introduction to Real-Time Imaging, Tutorial Texts in Optical Engineering, Vol. TT19, SPIE Optical Engineering Press and IEEE Press, New York, 1995.
[9] E.R. Dougherty, Optimal mean-square N-observation digital morphological filters I. Optimal binary filters, CVGIP: Image Understanding 55 (1) (1992) 36–54.
[10] D.J. Kleitman, G. Markowsky, On Dedekind's problem: the number of isotone Boolean functions II, Trans. Amer. Math. Soc. 213 (1975) 373–390.
[11] R.P. Loce, E.R. Dougherty, Facilitation of optimal binary morphological filter design via structuring element libraries and design constraints, Opt. Eng. 31 (5) (1992) 1008–1025.
[12] R.P. Loce, E.R. Dougherty, Optimal morphological restoration: the morphological filter mean-absolute-error theorem, J. Visual Commun. Image Representation 3 (4) (1992) 412–432.
[13] R.P. Loce, E.R. Dougherty, Enhancement and Restoration of Digital Documents: Statistical Design of Nonlinear Algorithms, SPIE – The International Society for Optical Engineering, Bellingham, 1997.
[14] C.C. Han, K.C. Fan, Finding of optimal binary morphological erosion filter via greedy and branch & bound searching, J. Math. Imaging Vision 6 (4) (1996) 335–353.
[15] A.V. Mathew, E.R. Dougherty, V. Swarnakar, Efficient derivation of the optimal mean-square binary morphological filter from the conditional expectation via a switching algorithm for discrete power-set lattice, Circuits Systems Signal Process. 12 (3) (1993) 409–430.
About the Author: EDWARD R. DOUGHERTY holds an M.S. in Computer Science from Stevens Institute of Technology and a Ph.D. in Mathematics from Rutgers University. He is currently a Professor in the Department of Electrical Engineering at Texas A&M University. He is editor of the SPIE/IS&T Journal of Electronic Imaging and of the SPIE/IEEE Series on Imaging Science and Engineering. He is the author of eleven books, editor of four books, and has published numerous papers in Nonlinear Filtering and Mathematical Morphology. His current interests are the Optimal Design of Nonlinear Filters, Granulometric Analysis, and Informatics for cDNA Microarrays.

About the Author: NINA S.T. HIRATA received the B.S. and M.S. degrees in Computer Science from the University of Sao Paulo in 1989 and 1996, respectively. She is currently a Ph.D. student at the same university, and her research interests include Mathematical Morphology, Nonlinear Image Processing and Machine Learning.

About the Author: JUNIOR BARRERA received the degree of doctor in Automatic Control and Systems from the University of Sao Paulo in 1992. Since June 1992, he has been an Assistant Professor at the Department of Computer Science of the Institute of Mathematics and Statistics of the University of Sao Paulo. In the last 10 years he has worked in the field of Image Analysis by Mathematical Morphology, with several theoretical and applied contributions.
Pattern Recognition 33 (2000) 1083}1104
Estimation of the influence of second- and third-order moments on random sets reconstructions
Antoine Aubert, Dominique Jeulin*
Centre de Morphologie Mathématique, Ecole des Mines de Paris, 35 Rue St-Honoré, 77305 Fontainebleau, France
Received 23 June 1999
Abstract

To characterize a random image, we often limit ourselves to the use of two functions. The first one is the histogram, or grey-level distribution; the other is the covariance function, which allows the study of the spatial grey-level distribution. An evaluation of the relevance of these two functions for describing the morphology of images is presented. In particular, we show the importance of resorting to the centered third-order moment, in addition to these two functions, to obtain a fine characterization of images. This study is done for binary random images generated from the same primary grains (Poisson polygons or discs) implanted in different ways. Using Gagalowicz's procedure, we simulate textures that respect the same second- or third-order moments. We study their resemblance in terms of morphology, described by erosion and dilation curves and by granulometry by opening and pseudo-granulometry by closing (with squares). The results of these measurements are analyzed by means of correspondence analysis, which allows a quantitative and qualitative approach to the differences in morphology. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Mathematical morphology; Reconstruction; Covariance; Third-order moment; Random sets; Automatic classification
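The second- and third-order statistics mentioned in the abstract can be estimated from a binary image with a few lines of Python. The following is a minimal sketch under stated assumptions, not the authors' estimators: the image is a list of lists of 0/1 values, the random set is assumed stationary so that spatial averages estimate probabilities, and the image is treated as periodic (toroidal) to avoid edge corrections. The function names are illustrative, and Gagalowicz's reconstruction procedure itself is not shown.

```python
def _at(img, i, j, h):
    """Pixel value at (i + h[0], j + h[1]) with periodic boundary conditions."""
    rows, cols = len(img), len(img[0])
    return img[(i + h[0]) % rows][(j + h[1]) % cols]


def _mean_over_pixels(img, f):
    rows, cols = len(img), len(img[0])
    return sum(f(i, j) for i in range(rows) for j in range(cols)) / (rows * cols)


def volume_fraction(img):
    """p = P(x in A), estimated as the fraction of pixels equal to 1."""
    return _mean_over_pixels(img, lambda i, j: img[i][j])


def covariance(img, h):
    """C(h) = P(x in A and x + h in A)."""
    return _mean_over_pixels(img, lambda i, j: img[i][j] * _at(img, i, j, h))


def third_order(img, h1, h2):
    """P(x, x + h1 and x + h2 all in A)."""
    return _mean_over_pixels(
        img, lambda i, j: img[i][j] * _at(img, i, j, h1) * _at(img, i, j, h2))


def centered_third_order(img, h1, h2):
    """E[(1_A(x) - p)(1_A(x + h1) - p)(1_A(x + h2) - p)], p the volume fraction."""
    p = volume_fraction(img)
    return _mean_over_pixels(
        img,
        lambda i, j: (img[i][j] - p) * (_at(img, i, j, h1) - p) * (_at(img, i, j, h2) - p))
```

For a 4×4 checkerboard, for example, $C(h)$ vanishes for a horizontal shift of one pixel and equals the volume fraction 0.5 for a diagonal shift, while the centered third-order moment is zero by symmetry; it becomes informative for sets whose two phases are not interchangeable.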
* Corresponding author. Tel.: +33-1-64694795; fax: +33-1-64694707. E-mail addresses: [email protected] (A. Aubert), [email protected] (D. Jeulin).

1. Introduction

The reconstruction of a texture from partial information is an important issue in different domains: the production of three-dimensional simulations from boreholes at geological scales [1,2], or from plane characteristics at microscopic scales [3]; more generally, it is useful to be able to reproduce any texture from limited information, for example to restore compressed images [4,5]. For this reconstruction, two classes of iterative methods have been proposed and tested:
• The first family changes each pixel of the image sequentially with respect to a criterion (to be optimized). Typically, the difference between some statistical properties of the image being constructed and those of the target image is minimized. This is the approach of the algorithm proposed by Gagalowicz [4,5], which we use in this paper, and of simulated annealing algorithms [1–3].
• The second family of methods, adapted to primary grains (where random objects respect positioning rules), is based on the reconstruction of a birth-and-death process; it allows one to construct realizations of point processes respecting geometrical conditions [6], or of certain families of random sets.
All these methods make it possible to reconstruct objects respecting geometrical constraints at some points (in this case, we speak of conditional simulations), as in Refs. [1,2,7], or respecting non-covering rules for the grains [8]. We propose here to test, for different categories of random sets, a reconstruction algorithm that respects the
0031-3203/00/$20.00 ( 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 9 9 ) 0 0 1 6 6 - 1
1084
A. Aubert, D. Jeulin / Pattern Recognition 33 (2000) 1083}1104
two- or three-point moments. After recalling theoretical characteristics of random sets, we introduce some examples of random sets and of the simulation process. The di!erence between models and simulations is studied with the help of a multivariate statistical analysis of morphological data that allows one to judge the quality of simulations. It is a complement to the visual observation, and it illustrates from simulations a well known fact, namely that statistical moments with a level higher than two or three are required for the generation of wellde"ned binary textures.
2. Characterization of random sets
The first point consists in finding functions that describe the morphology of images as objectively as possible, in order to rank their dissimilarities. We proceed on the basis of the theory of random sets, which gives us the erosion (resp. dilation) curve and the granulometry with opening (resp. antigranulometry with closing) [9,10]. First, let us recall some definitions of mathematical morphology [11–13].
Dilation–erosion. We work in the binary case. Let K be a structuring element (i.e., a compact set of points in the Euclidean space $E^n$). The dilation of a set A by K is defined by
$A \oplus \check{K} = \{x : K_x \cap A \neq \emptyset\} = \bigcup_{x \in A,\, y \in K} \{x - y\}$,
where $K_x$ is the set K centered (translated) at x and $\check{K}$ is its transposed set. The erosion is defined by
$A \ominus \check{K} = \{x : K_x \subseteq A\}$.
Some examples of dilation and erosion are presented in Fig. 1 for a planar image.
Opening–closing. The morphological opening is the combination of one erosion and one dilation:
$A \circ \check{K} = (A \ominus \check{K}) \oplus K$.
The closing is defined by $A \bullet \check{K} = (A \oplus \check{K}) \ominus K$. In practice, the opening erases the parts of A smaller than the structuring element (because they disappear during the erosion), and the closing suppresses the parts of $A^c$ (the complement of A) smaller than the structuring element (because they are filled in during the dilation). An example of closing and opening is given in Fig. 1(d) and (e).
2.1. Choquet capacity
To deal with the reconstruction problem, we need to recall how a random set can be completely characterized. This point was studied by G. Matheron in the frame of the theory of random closed sets [11]. A random set is completely defined by probability laws generalizing the case of random variables. To characterize the set A, we can choose a reference set K and answer the following two questions:
$K \cap A = \emptyset$? (1a)
$K \cap A \neq \emptyset$? (1b)
For example, for $K = \{x\}$: is $x \in A$? (or, in the same way, for $K = \{x, x+h\}$, and more generally $K = \{x_1, x_2, \ldots, x_n, \ldots\}$). By increasing the number of points, the number of positive answers to question (1a) decreases and to question (1b) increases, so that richer information is obtained on the structure of A. Any random closed set A is completely characterized by its Choquet capacity [11], a functional T(K) defined over all compact sets K:
$T(K) = P\{K \cap A \neq \emptyset\} = 1 - P\{K \subset A^c\} = 1 - Q(K)$, (2)
where P{E} denotes the probability of event E. In practice, different figures K are used to evaluate the morphological properties of a heterogeneous structure. It should be pointed out here that a limited amount of information, such as second- and third-order moments, cannot guarantee the identification of a unique random set. This is a consequence of Matheron's result concerning the Choquet capacity, and will be illustrated by a theoretical counter-example in Section 3 and by the results of simulations presented in Section 5. For the set K located at the origin O, we have
$T(K) = P\{K \cap A \neq \emptyset\} = P\{O \in A \oplus \check{K}\}$. (3)
Fig. 1. Original image (a), dilation (b), erosion (c), closing (d) and opening (e) by a square (4×4 pixels).
This lets us estimate T(K) by image analysis on realizations of A, with the help of dilations. For a stationary process, $T(K_x) = T(K)$ (the Choquet capacity is then invariant by translation). For an isotropic set, the Choquet capacity is invariant by rotation of K. For an ergodic set A, T(K) is estimated from a single realization by measuring the volume fraction $V_V$, the estimate being denoted $T(K)^*$:
$T(K)^* = P\{x \in A \oplus \check{K}\}^* = V_V(A \oplus \check{K})^*$. (4)
Each compact set K contributes to the knowledge of the morphology of A. For example, if $K = \{x\}$ in $\mathbb{R}^3$, we get $V_V$ from $T(x) = p = V_V(A)$. If $K = \{x, x+h\}$, we obtain the covariance through
$T(x, x+h) = P\{x \in A \cup A_{-h}\}$, (5)
$Q(x, x+h) = 1 - T(x, x+h) = P\{x \in A^c \cap A^c_{-h}\}$, (6)
where Q(x, x+h) is the covariance of $A^c$; it depends only on h for a stationary random closed set. From the indicator function k(x), defined by k(x) = 1 for $x \in A$ and k(x) = 0 for $x \in A^c$, the centered covariance is given by
$\bar{W}_2(h) = \bar{C}(h) = E\{(k(x) - p)(k(x+h) - p)\}$.
We will also use the reduced covariance $\bar{W}_2(h)/\sigma^2$, where the variance $\sigma^2$ is given here by $p(1-p)$. For $K = \{x, x+h_1, x+h_2\}$, we get the third-order statistics. The centered third-order moment is given by
$\bar{W}_3(h_1, h_2) = E\{(k(x) - p)(k(x+h_1) - p)(k(x+h_2) - p)\}$. (7)
In addition, we will consider the reduced third-order moment $\bar{W}_3(h_1, h_2)/\sigma^3$, $\sigma^3$ being the centered third-order moment of k(x) at a single point, given here by $p(1-p)(1-2p)$. The theoretical characteristics of random structures enable us to test a probabilistic model (by comparing theoretical and experimental properties), to estimate the parameters of a model, and to predict characteristics that are not directly measured (like the inference of 3D properties from 2D observations). The main advantage of a probabilistic model is to provide a theoretical form of T(K) for different compact sets, while ensuring all the coherence relations imposed by this function.
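Equations (3)–(7) translate directly into estimators computed on a single binary realization. The following NumPy sketch is illustrative only: it assumes A is stored as a 0/1 array (1 = white phase), uses integer pixel offsets for K, and counts only positions where the whole translated set lies inside the image; the function names are hypothetical. It estimates $T(K)^* = V_V(A \oplus \check{K})$ for a finite K, checks that for K = {0, h} this equals 1 − Q(h), and checks the one-point values σ² = p(1 − p) and σ³ = p(1 − p)(1 − 2p).

```python
import numpy as np

def capacity(img, K):
    """T(K)* = V_V(A ⊕ Ǩ): fraction of positions x such that K_x = {x + k : k in K}
    hits A (Eqs. (3)-(4)). K is a list of integer pixel offsets (dy, dx); only
    positions x for which x and every x + k lie inside the image are counted."""
    A = np.asarray(img, dtype=bool)
    H, W = A.shape
    ys = [k[0] for k in K] + [0]
    xs = [k[1] for k in K] + [0]
    y0, y1 = -min(ys), H - max(ys)
    x0, x1 = -min(xs), W - max(xs)
    hit = np.zeros((y1 - y0, x1 - x0), dtype=bool)
    for ky, kx in K:
        hit |= A[y0 + ky:y1 + ky, x0 + kx:x1 + kx]   # x + k belongs to A
    return hit.mean()

def covariance_complement(img, h):
    """Q(h) = P{x in A^c, x + h in A^c} (Eq. (6)), for h = (dy, dx) with dy, dx >= 0."""
    A = np.asarray(img, dtype=bool)
    H, W = A.shape
    dy, dx = h
    return np.mean(~A[:H - dy, :W - dx] & ~A[dy:, dx:])

def indicator_moments(img):
    """Empirical p, sigma^2 = E(k - p)^2 and sigma^3 = E(k - p)^3 of the indicator k."""
    k = np.asarray(img, dtype=float)
    p = k.mean()
    return p, np.mean((k - p) ** 2), np.mean((k - p) ** 3)
```

Because the empirical distribution of a 0/1 image is itself a Bernoulli law with parameter $\hat p$, the one-point identities hold exactly for the estimates, which makes them a convenient sanity check.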
For the reconstruction, we will use limited information: the second- and third-order moments, obtained from the compact sets $K = \{x, x+h\}$ and $K = \{x, x+h_1, x+h_2\}$, the three points being placed on an equilateral triangle. The information we use is thus two-dimensional, but only part of the third-order moments is considered. More configurations of the three points could be used without any difficulty, but would require much longer computations in the implementation of the reconstruction. The validation will be carried out by estimating $P(K) = P\{K \subset A\}$, Q(K) and the granulometries for compact squares, as indicated below. In Ref. [3], the information used for the reconstruction is one-dimensional: K is successively a doublet and the segment l. In Refs. [1,2,7], the information is limited to $\bar{W}_2(h)$; however, information at a larger scale (average density of pixels in windows of different sizes and shapes) is also used in Ref. [2].
2.2. The erosion and dilation curve
We try to quantify the speed of disappearance of the white phase when it is eroded by structuring elements of increasing sizes. Symmetrically, we study the disappearance of black zones when the white parts are dilated by structuring elements of increasing sizes (Fig. 2). So we first construct the erosion and dilation curves. We estimate the probability S(s) for a point to belong to $A \ominus sK$ or to $A \oplus sK$ as follows:
$S(s) = \frac{1}{|W \ominus |s|K|} \times \begin{cases} |(A \cap W) \ominus sK| & \text{for } s \ge 0, \\ |(A \cap W) \oplus |s|K| & \text{for } s \le -1, \end{cases}$ (8)
where W is the observation window and |A| is the measure of A. To estimate the speed of erosion and dilation with respect to the size of the structuring elements, we use an estimator based on the derivative of these functions: $\hat{S}(s) = S(s) - S(s+1)$. Some examples of erosion and dilation curves are presented in Fig. 10.
2.3. The granulometry with opening and the antigranulometry with closing
A granulometry is the study of the distribution of the sizes of objects (a sieve analysis). On our images, a granulometry (distribution of sizes in the white phase) with openings and an antigranulometry (distribution of sizes in the black phase) with closings will be estimated (Fig. 2). First we measure the function S(s):
$S(s) = \frac{1}{|(W \ominus |s|K) \ominus |s|K|} \times \begin{cases} |(A \cap W) \circ sK| & \text{for } s \ge 0, \\ |(A \cap W) \bullet |s|K| & \text{for } s \le -1. \end{cases}$ (9)
The granulometry and antigranulometry are obtained from the derivatives of these functions:
$\hat{S}(s) = S(s) - S(s+1)$.
Some examples of granulometries are shown in Fig. 6. In our case, these parameters will be useful to discriminate the textures and their simulations.
Fig. 2. Erosion–dilation curves and granulometry–antigranulometry curves with squares for reference images.
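A minimal NumPy sketch of Eqs. (8) and (9) follows. It assumes that sK is the (2s + 1) × (2s + 1) pixel square (an assumption: the text only says squares of increasing size), implements binary dilation and erosion by shifting the image, and corrects edge effects by measuring only inside the window eroded by the structuring element, as the normalizations $|W \ominus |s|K|$ and $|(W \ominus |s|K) \ominus |s|K|$ suggest. The helper names are hypothetical.

```python
import numpy as np

def _translates(a, r, pad_value):
    """All translates a(x + d), d in [-r, r]^2, i.e. over a (2r+1) x (2r+1) square."""
    H, W = a.shape
    p = np.pad(a, r, constant_values=pad_value)
    return [p[dy:dy + H, dx:dx + W] for dy in range(2 * r + 1) for dx in range(2 * r + 1)]

def dilate(a, r):
    """Binary dilation by the (2r+1) x (2r+1) square (symmetric, so Ǩ = K)."""
    return np.logical_or.reduce(_translates(a, r, False)) if r > 0 else a.copy()

def erode(a, r):
    """Binary erosion by the (2r+1) x (2r+1) square."""
    return np.logical_and.reduce(_translates(a, r, True)) if r > 0 else a.copy()

def _inside(a, m):
    """Keep only the window eroded by an m-pixel margin (edge-effect correction)."""
    return a[m:a.shape[0] - m, m:a.shape[1] - m] if m > 0 else a

def erosion_dilation_curve(img, smax):
    """S(s) of Eq. (8): erosions for s >= 0, dilations by |s|K for s <= -1."""
    A = np.asarray(img, dtype=bool)
    S = {}
    for s in range(-smax, smax + 1):
        r = abs(s)
        out = erode(A, r) if s >= 0 else dilate(A, r)
        S[s] = _inside(out, r).mean()
    return S

def granulometry_curve(img, smax):
    """S(s) of Eq. (9): openings for s >= 0, closings by |s|K for s <= -1."""
    A = np.asarray(img, dtype=bool)
    S = {}
    for s in range(-smax, smax + 1):
        r = abs(s)
        out = dilate(erode(A, r), r) if s >= 0 else erode(dilate(A, r), r)
        S[s] = _inside(out, 2 * r).mean()   # an opening/closing reaches 2r pixels
    return S

def derivative(S):
    """Ŝ(s) = S(s) - S(s + 1) wherever both values are available."""
    return {s: S[s] - S[s + 1] for s in S if s + 1 in S}
```

For example, a single 5×5 white square survives the opening with s = 2 (a 5×5 square) and disappears at s = 3, so its granulometry $\hat S$ concentrates at s = 2.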
Fig. 3. Comparison of the theoretical second- and third-order moments for three models with Poisson primary grains and different fractions of white phase p: Poisson mosaic (mos); boolean model with Poisson polygons (sbp); dead leaves of Poisson polygons (dlp). (a) Order 2, p = 0.3; (b) order 3, p = 0.3; (c) order 2, p = 0.46; (d) order 3, p = 0.46; (e) order 2, p = 0.5; (f) order 3, p = 0.5.
3. Examples of random set models
For the simulations presented in this paper, five random set models are considered:
• the Poisson mosaic;
• the boolean model of Poisson polygons;
• the boolean model of discs;
• the dead leaves of Poisson polygons;
• the dead leaves of discs.
They were used in many applications, including the prediction of the physical properties of random composites [14]. Some of them were tested to simulate the microstructure of sintered materials [15], and they reproduce the observed textures very well. In the present study, which is devoted to simulations of textures from partial morphological information, these specific models were chosen because the theoretical expressions of their covariance function and of their three-point moment are known. We briefly recall their definitions [16]:
3.1. The two-phase Poisson mosaic model
The mosaic model is built as follows: let Π be a locally finite random tessellation of space. The points x of the classes C of Π are attributed to the random set A with probability p, and to $A^c$ with probability q = 1 − p. The assignments are made independently for the different classes. For a Poisson mosaic, the tessellation Π is delimited by a network of Poisson lines in the plane. In the isotropic case (considered here), the network of lines with intensity λ generates on any straight line a Poisson point process with parameter 2λ. Simulations of Poisson lines in a rectangular window in two dimensions are obtained as follows: consider the disc B with radius R enclosing the window; the random number N of lines hitting B follows a Poisson distribution with mean 2πλR; given N = n, we generate n diameters of the disc, making random angles with a fixed axis, uniformly distributed between 0 and π; on each diameter, we take a point x with a uniform location and draw the line orthogonal to the diameter at x. The lines are then restricted to the interior of the rectangular window. One example of a 2D binary mosaic is given in Fig. 4(1), in the case p = q = 1/2. The theoretical expressions for the second- and third-order moments are
$\bar{W}_2(h) = \bar{C}(h) = (p - p^2)\, e^{-2\lambda \|h\|}$, (10)
$\bar{W}_3(h_1, h_2) = p(1-p)(1-2p)\, e^{-\lambda(\|h_1\| + \|h_2\| + \|h_2 - h_1\|)}$, (11)
where $\|h\|$ is the norm of the vector h. Notice that $\bar{W}_3(h_1, h_2) = 0$ for p = 0.5. For this fraction, the random set A is autodual, meaning that A and $A^c$ have the same probabilistic properties. For the Poisson mosaic, the theoretical expressions of P(l) and Q(l), obtained when the compact set is a segment of length l, are
$P(l) = p\, e^{-2\lambda q l}$, (12)
$Q(l) = q\, e^{-2\lambda p l}$. (13)
Notice that it is possible to construct different random sets with the same properties $\bar{W}_2(h)$, P(l), Q(l) and $\bar{W}_3(h_1, h_2)$ as the Poisson mosaic, but with different probabilities P(K) for K made of four points of a disc with radius r. For instance, we can consider the following hierarchical model: starting from a Poisson tessellation of space with intensity λ/2, we generate in every Poisson polygon independent realizations of Poisson mosaics with intensity λ/2. Any linear profile of this random set has the same statistical properties as a standard Poisson mosaic. In addition, the function $\bar{W}_3(h_1, h_2)$ is the same for these two different models.
3.2. The boolean model
The boolean model is constructed as follows [11,17]: we start with a Poisson point process with intensity θ (average number of points per unit area) and with a family of random compact sets A' called primary grains. The boolean model is obtained by taking the union of the primary grains located on the points $x_k$ of the Poisson point process:
$A = \bigcup_k A'_{x_k}$.
The Choquet capacity of the stationary boolean model in $\mathbb{R}^n$ is given as follows, with $q = P\{x \in A^c\}$:
$T(K) = 1 - Q(K) = 1 - \exp(-\theta\, \bar{\mu}_n(A' \oplus \check{K})) = 1 - q^{\bar{\mu}_n(A' \oplus \check{K}) / \bar{\mu}_n(A')}$,
where $\bar{\mu}_n$ is the average of the Lebesgue measure over all the realizations of the primary grains. For our simulations, we use in a first case primary grains A' made of Poisson polygons (i.e., polygons extracted from a Poisson tessellation). One example of simulation is shown in Fig. 4(2)(a). The covariance of the set $A^c$, Q(h), and its third-order moment $Q(h_1, h_2)$ are
$Q(h) = P\{x \in A^c, x+h \in A^c\} = q^{2 - \exp(-2\lambda \|h\|)}$, (14)
$Q(h_1, h_2) = P\{x \in A^c, x+h_1 \in A^c, x+h_2 \in A^c\} = q^{3 - \exp(-2\lambda \|h_1\|) - \exp(-2\lambda \|h_2\|) - \exp(-2\lambda \|h_2 - h_1\|)} \times q^{\exp(-\lambda(\|h_1\| + \|h_2\| + \|h_2 - h_1\|))}$. (15)
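The Poisson-line recipe of Section 3.1, which also provides the Poisson polygons used as grains in the boolean and dead leaves models, can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it follows the stated construction (N ~ Poisson(2πλR) lines hitting the disc of radius R enclosing the window, angles uniform on [0, π), uniform positions along the diameter) and then colours each cell white with probability p. Cells are identified by the side of each line on which a pixel centre lies, which works because each cell of a line arrangement is exactly the set of points sharing one side pattern. Here λ is expressed per pixel, and all names are hypothetical.

```python
import numpy as np

def poisson_mosaic(lam, p, shape, seed=0):
    """Binary two-phase Poisson mosaic on an h x w pixel grid.
    lam: line intensity (a segment of length l is hit by Poisson(2*lam*l) lines);
    p: probability that a cell belongs to A (value 1). Returns (image, labels)."""
    rng = np.random.default_rng(seed)
    h, w = shape
    cx, cy = w / 2.0, h / 2.0
    R = np.hypot(cx, cy)                          # radius of the disc B enclosing the window
    n = rng.poisson(2.0 * np.pi * lam * R)        # number of lines hitting B
    theta = rng.uniform(0.0, np.pi, size=n)       # direction of each diameter
    rho = rng.uniform(-R, R, size=n)              # uniform point on the diameter
    ys, xs = np.mgrid[0:h, 0:w] + 0.5             # pixel centres
    if n == 0:
        labels = np.zeros((h, w), dtype=np.int64)  # no line: a single cell
    else:
        # line k is orthogonal to the direction (cos, sin) at signed distance rho[k]
        proj = ((xs - cx)[..., None] * np.cos(theta)
                + (ys - cy)[..., None] * np.sin(theta))
        side = (proj > rho).reshape(h * w, n)     # side of every line, per pixel
        keys = np.packbits(side, axis=1)          # compact side pattern per pixel
        _, inv = np.unique(keys, axis=0, return_inverse=True)
        labels = inv.reshape(h, w)                # one label per Poisson polygon
    colours = rng.random(labels.max() + 1) < p    # independent colour for each cell
    return colours[labels].astype(np.uint8), labels
```

For instance, `poisson_mosaic(0.05, 0.5, (256, 256))` produces an autodual (p = 0.5) texture of the kind shown in Fig. 4(1)(a); the intensity value here is arbitrary. Restricting the lines to the window is implicit, since only pixel centres inside the window are evaluated.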
Now if the primary grains of the boolean model are discs with radius R, the moments of Eqs. (14) and (15) become, writing
$r(h) = \frac{2}{\pi}\left[\arccos\left(\frac{\|h\|}{2R}\right) - \frac{\|h\|}{2R}\sqrt{1 - \left(\frac{\|h\|}{2R}\right)^2}\right]$
for the normalized geometric covariogram of the disc,
$Q(h) = P\{x \in A^c, x+h \in A^c\} = q^{2 - r(h)}$ for $\|h\| \le 2R$, (16)
$Q(h_1, h_2) = P\{x \in A^c, x+h_1 \in A^c, x+h_2 \in A^c\} = q^{3 - r(h_1) - r(h_2) - r(h_1 - h_2) + s(h_1, h_2)}$ for $\|h_1\| \le 2R$, $\|h_2\| \le 2R$, $\|h_2 - h_1\| \le 2R$, (17)
with $s(h_1, h_2) = E[|A' \cap A'_{h_1} \cap A'_{h_2}|] / E[|A'|]$. Instead of using the analytical expression of this function for discs, we have tabulated it from measurements of the erosion of discs by a triplet $(0, h_1, h_2)$.
Fig. 4. Examples of simulations: Poisson mosaic (1), boolean model of Poisson polygons (2) and of discs (3), dead leaves of Poisson polygons (4) and of discs (5). Shown are the original images (a), the images constrained on the basis of the covariance and the third-order moment (b), on the basis of the covariance only (c), on the covariance and a third-order moment differing from that of the model (d), on the three-point moment only (e), and on the basis of a covariance and a three-point moment different from those of the model (f). For a given model, the same random seed is used from one reconstruction to the next ((b)–(f)).
3.3. The two-phase dead leaves model
The color dead leaves model is constructed as follows [16,18–21], using a sequence of random primary grains $A'_i(t)$ with colors i (i = 1, 2 in the present case):
• At t = 0, we start from the empty set.
• Between t and t + dt, realizations of primary grains $A'_i(t)$ are translated to the points of a Poisson point process with intensity θ(t) dt.
• Grains appearing at t + dt cover the previous grains.
In a first application, we use Poisson polygons with two different colors as primary grains. The polygons of each species are extracted from Poisson tessellations with the same parameters. One realization of this model is shown in Fig. 4(4)(a). The theoretical expressions of the covariance and of the third-order moment are, respectively,
$C(h) = p - 2p(1-p)\, \frac{1 - \exp(-2\lambda \|h\|)}{2 - \exp(-2\lambda \|h\|)}$, (18)
$\bar{W}_3(h_1, h_2) = p(1-p)(1-2p)\, \frac{\exp(-\lambda(\|h_1\| + \|h_2\| + \|h_2 - h_1\|))}{3 - \exp(-2\lambda \|h_1\|) - \exp(-2\lambda \|h_2\|) - \exp(-2\lambda \|h_2 - h_1\|) + \exp(-\lambda(\|h_1\| + \|h_2\| + \|h_2 - h_1\|))}$. (19)
If the grains are now discs with radius R, these expressions become, with $r(h) = \frac{2}{\pi}\left[\arccos(\|h\|/2R) - (\|h\|/2R)\sqrt{1 - (\|h\|/2R)^2}\right]$ as for the boolean model of discs,
$C(h) = p - 2p(1-p)\, \frac{1 - r(h)}{2 - r(h)}$ for $\|h\| \le 2R$, (20)
$\bar{W}_3(h_1, h_2) = p(1-p)(1-2p)\, \frac{s(h_1, h_2)}{D}$, (21)
with
$D = 3 - r(h_1) - r(h_2) - r(h_1 - h_2) + s(h_1, h_2)$, (22)
for $\|h_1\| \le 2R$, $\|h_2\| \le 2R$, $\|h_1 - h_2\| \le 2R$. (23)
In the case of discs, we use the tabulated values of $s(h_1, h_2)$ as discussed above. As for the Poisson mosaic, these dead leaves models are autodual if p = 0.5, and the centered third-order moment is zero for this value of p. In addition, the models with Poisson primary grains can be considered either as two-dimensional models or as sections of three-dimensional models whose primary grains are Poisson polyhedra (we just have to replace 2λ by πλ in the theoretical expressions).
For the three models with Poisson primary grains, the theoretical two- and three-point moments are very close, so simulations will produce very similar textures. Moreover, the centered third-order moment is close to zero (as illustrated in Fig. 3, which compares the theoretical moments of these Poisson models for surface fractions equal to 0.3, 0.46 and 0.5). The erosion and dilation curves by squares are more discriminant for these three models (see Fig. 2), as will also be shown in Section 5.
4. Reconstruction of images on the basis of their second- and third-order moments
To reconstruct an image from partial information, here the second- and third-order moments, we use the process proposed by Gagalowicz for digital images [4]. Gagalowicz first proposed to use the n-point joint densities as criteria [5]. In his work on grey-level textures, he compared the results obtained from bivariate laws and from the covariance, and showed that a large reduction in the amount of information still gives good results, in agreement with visual inspection.
4.1. Gagalowicz's process
For any random function Z(x) with expectation η, we can estimate the following statistical properties from the values of Z observed over N pixels:
Histogram: for each grey level n,
$h(n) = \frac{1}{N} \sum_{m=1}^{N} \delta(Z(m) - n)$. (24)
Centered reduced covariance function:
$\frac{\bar{W}_2(h)}{\sigma^2} = \frac{1}{N(h)\, \sigma^2} \sum_{m=1}^{N(h)} (Z(m) - \eta)(Z(m+h) - \eta)$, (25)
where N(h) is the number of pairs {x, x + h} in the image.
Centered reduced third-order moment:
$\frac{\bar{W}_3(h_1, h_2)}{\sigma^3} = \frac{1}{N(h_1, h_2)\, \sigma^3} \sum_{m=1}^{N(h_1, h_2)} (Z(m) - \eta)(Z(m+h_1) - \eta)(Z(m+h_2) - \eta)$, (26)
where $N(h_1, h_2)$ is the number of triples $\{x, x+h_1, x+h_2\}$ in the image.
The values of these measurements are concatenated into one vector, the attribute vector of the texture, denoted B.
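The estimators (24)–(26) can be sketched for a binary image as below. This is a hypothetical NumPy illustration: the lags and triplets are integer pixel offsets chosen by the caller (the text uses horizontal and vertical lags and equilateral triangles, which a pixel grid can only approximate), each sum runs over the N(h) or N(h1, h2) positions where all points fall inside the image, and η, σ², σ³ are global image statistics. Since σ³ = p(1 − p)(1 − 2p) vanishes at p = 0.5, the reduced third-order moment is returned as NaN when the empirical σ³ is zero.

```python
import numpy as np

def _aligned(z, offsets):
    """Views z(x), z(x + o1), z(x + o2), ... restricted to the positions x for
    which every point lies inside the image (these counts give N(h), N(h1, h2))."""
    H, W = z.shape
    ys = [0] + [o[0] for o in offsets]
    xs = [0] + [o[1] for o in offsets]
    y0, y1 = -min(ys), H - max(ys)
    x0, x1 = -min(xs), W - max(xs)
    return [z[y0 + dy:y1 + dy, x0 + dx:x1 + dx] for dy, dx in zip(ys, xs)]

def attribute_vector(img, lags, triplets):
    """Concatenate Eq. (24) (binary histogram), Eq. (25) (reduced covariance for
    each lag h) and Eq. (26) (reduced third-order moment for each (h1, h2))."""
    z = np.asarray(img, dtype=float)
    eta = z.mean()
    c = z - eta                                  # centred values Z(m) - eta
    var = np.mean(c ** 2)                        # sigma^2
    mu3 = np.mean(c ** 3)                        # sigma^3 in the notation of the text
    hist = [np.mean(z == 0), np.mean(z == 1)]
    cov = []
    for h in lags:
        a, b = _aligned(c, [h])
        cov.append(np.mean(a * b) / var)
    third = []
    for h1, h2 in triplets:
        a, b, d = _aligned(c, [h1, h2])
        third.append(np.mean(a * b * d) / mu3 if mu3 != 0 else np.nan)
    return np.array(hist + cov + third)
```

With h = 0 the reduced covariance equals 1, and with h1 = h2 = 0 the reduced third-order moment equals 1, which gives two quick consistency checks.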
Starting from a white noise, the image is randomly scanned. At the current pixel, every possible value (two in the binary case) can be selected. For each value, the attribute vector $B^{tx}$ and the error $\|B - B^{tx}\|$ between the target attribute vector and the current one are computed. We use a quadratic error with weights depending on the criterion; the weighting is adjusted to ensure a stable convergence of $B^{tx}$. In our case, we take weight 1 for the histogram, 2 for the second-order moment and 2 for the third-order moment. The final value allocated to the pixel is the one that minimizes the error. In a random sweep across the image, each pixel is visited in turn. Additional passes are performed as long as the residual error remains too large. This process is a sequential texture synthesis.
Initialization: we start from a white noise whose histogram equals the target histogram. Initially, the attribute vector is therefore composed of:
• $h^{tx}(n)$ = original histogram, here $P\{Z(x) = 0\} = 1 - p$ and $P\{Z(x) = 1\} = p$;
• $\bar{C}^{tx}(h) = 0$ for all $h \neq 0$;
• $\bar{W}_3^{tx}(h_1, h_2) = 0$ for all $h_1 \neq 0$, $h_2 \neq 0$.
Updates: at each pixel, the attribute vector has to be updated for each tested value. To simplify the computation, we update only the modification generated by changing the value of the pixel at position m from a to b. The evolution of the functions of the attribute vector under this change is as follows. If $Z(m) = a \to Z'(m) = b$, the mean becomes
$\eta \to \eta' = \eta - a/N + b/N$. (27)
The variance $\sigma^2$ becomes $\sigma'^2$:
$\sigma^2 \to \sigma'^2 = \sigma^2 + \frac{1}{N}\left[2(\eta - \eta')(\eta N - a) + N(\eta'^2 - \eta^2) + (b^2 - a^2) - 2b\eta' + 2a\eta\right]$. (28)
For the third-order moment, we compute the update of $\sigma^3 = \frac{1}{N}\sum_m (Z(m) - \eta)^3$:
$\sigma^3 \to \sigma'^3 = \sigma^3 + \frac{1}{N}\left[3(\eta - \eta')(\sigma^2 N + \eta^2 N - a^2) + 3(\eta'^2 - \eta^2)(\eta N - a) - N(\eta'^3 - \eta^3) + (b^3 - a^3) - 3b^2\eta' + 3b\eta'^2 + 3a^2\eta - 3a\eta^2\right]$. (29)
Note that, contrary to Gagalowicz, we do not assume that, starting from a correct histogram, the variations of the mean and of the variance are of second order with respect to those of the covariance and of the third-order moment (i.e., η' = η and σ' = σ). We take the exact variations of these parameters at each step of the reconstruction. The updates are as follows.
Histogram:
$h^{tx}(a) \to h^{tx}(a) - 1/N$, $h^{tx}(b) \to h^{tx}(b) + 1/N$. (30)
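Because the mean, variance and third-moment updates of Eqs. (27)–(29) are exact (no η' = η approximation), they can be checked numerically against a full recomputation. The sketch below does this for arbitrary grey levels; the function names are hypothetical, and σ² and σ³ are the (1/N)-normalized centered moments used in the text, with the old values η, σ² on the right-hand sides.

```python
import numpy as np

def moments(z):
    """Mean eta, sigma^2 = (1/N) sum (Z - eta)^2 and sigma^3 = (1/N) sum (Z - eta)^3."""
    eta = z.mean()
    return eta, np.mean((z - eta) ** 2), np.mean((z - eta) ** 3)

def update_moments(eta, s2, s3, a, b, N):
    """Exact update of (eta, sigma^2, sigma^3) when one pixel changes from a to b."""
    e1 = eta - a / N + b / N                                          # Eq. (27)
    s2_new = s2 + (2 * (eta - e1) * (eta * N - a) + N * (e1**2 - eta**2)
                   + (b**2 - a**2) - 2 * b * e1 + 2 * a * eta) / N       # Eq. (28)
    s3_new = s3 + (3 * (eta - e1) * (s2 * N + eta**2 * N - a**2)
                   + 3 * (e1**2 - eta**2) * (eta * N - a)
                   - N * (e1**3 - eta**3)
                   + (b**3 - a**3) - 3 * b**2 * e1 + 3 * b * e1**2
                   + 3 * a**2 * eta - 3 * a * eta**2) / N               # Eq. (29)
    return e1, s2_new, s3_new

rng = np.random.default_rng(1)
z = rng.integers(0, 4, size=500).astype(float)   # a small grey-level "image"
eta, s2, s3 = moments(z)
for _ in range(50):                              # random single-pixel changes
    m = rng.integers(z.size)
    a, b = z[m], float(rng.integers(0, 4))
    eta, s2, s3 = update_moments(eta, s2, s3, a, b, z.size)
    z[m] = b
    assert np.allclose((eta, s2, s3), moments(z))
```

The update costs O(1) per tested value, against O(N) for a recomputation, which is the point of the incremental formulation.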
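Centered reduced covariance: let $j = Z(m+h)$ and $j' = Z(m-h)$; then
$\bar{C}^{tx}(h) \to \bar{C}^{tx}(h) + \frac{1}{N(h)\, \sigma'^2}\left[(j - \eta')(b - \eta') + (j' - \eta')(b - \eta')\right] - \frac{1}{N(h)\, \sigma^2}\left[(j - \eta)(a - \eta) + (j' - \eta)(a - \eta)\right]$ for all h. (31)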
Three-point centered reduced moment: let
$j_1 = Z(m + h_1)$, $j_2 = Z(m + h_2)$, $j_3 = Z(m - h_1 + h_2)$,
$j_4 = Z(m - h_1)$, $j_5 = Z(m - h_2)$, $j_6 = Z(m - h_2 + h_1)$;
then
$\bar{W}_3^{tx}(h_1, h_2) \to \bar{W}_3^{tx}(h_1, h_2) + \frac{b - \eta'}{N(h_1, h_2)\, \sigma'^3}\left[(j_1 - \eta')(j_2 - \eta') + (j_3 - \eta')(j_4 - \eta') + (j_5 - \eta')(j_6 - \eta')\right] - \frac{a - \eta}{N(h_1, h_2)\, \sigma^3}\left[(j_1 - \eta)(j_2 - \eta) + (j_3 - \eta)(j_4 - \eta) + (j_5 - \eta)(j_6 - \eta)\right]$ for all $(h_1, h_2)$. (32)
These three updates give the variation of the attribute vector $B^{tx}$, whose global error has to be minimized. In practice, we minimize a linear combination of the errors of the three moments. We stress that this process reaches local minima. This fact, combined with a random scan of the image, enables us to generate random textures rather than a unique image for a given set of moments. For the translations h of C(h), we choose the vertical and horizontal translations from 0 to 80 pixels; since the models are isotropic, other directions need not be considered. For the centered third-order moment, the couple $(h_1, h_2)$ forms equilateral triangles with sizes from 0 to 80 pixels. This choice considerably reduces the information carried by the third-order moment. Note that the second-order moment is recovered for certain shapes of triplets (for example $(0, h_2)$ or $(h_1, 0)$), which are excluded from our study. In the present case, we can therefore generate simulations respecting the third-order moment without necessarily respecting the second-order moment. An example of such a simulation is presented in Fig. 4(e). For a given model, the simulations use the same random seed, to make visual comparisons easier. From a qualitative point of view, the simulations reproduce only part of the appearance of
the theoretical textures that we want to generate. This is a direct consequence of using information limited to part of the second- and third-order moments; increasing the resemblance would require higher-order moments in the constraints of the simulation. All the target theoretical characteristics are reached with an error lower than 1%. Fig. 5 gives an example of comparison between obtained and expected values for the dead leaves of Poisson polygons. We work on images of 256×256 pixels. We swept across the image 8 times (each pixel is changed at most 8 times), which takes 2 min of computation on an IBM SP2 computer used sequentially. During the simulation, some points can be constrained to fixed values to obtain conditional simulations: we just have to exclude these points from the scan of the image. This method can be used, for instance, to reconstruct three-dimensional objects from a few sections. Since the simulations are produced in a bounded domain, the local morphological properties (moments of all orders, T(K)) show significant variations. Because the criterion to be minimized is a difference between properties calculated on one image and average properties (obtained from measurements on several realizations or from the theoretical expression), we underestimate the expected fluctuations in the simulations. To take this into account, we should generate simulations whose local morphological properties respect these statistics. This information is generally out of reach of theoretical calculations, but it can be obtained from simulations, within a given model, or from images of real structures. For domains that are large compared to the texture size, fluctuations between realizations remain small if the target model is ergodic (as is the case here). Finally, the reconstruction methodology is general enough to deal with textures generated by scalar or multivariate random functions, like color images.
Fig. 5. Examples of measurements on one simulation compared to the original values (dead leaves of Poisson polygons): horizontal and vertical reduced covariance (a) and reduced third-order moment (b).
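To make the sequential procedure of Section 4.1 concrete, here is a deliberately simplified sketch, not the authors' implementation: it keeps only the histogram and the reduced covariance at a few horizontal and vertical lags (weights 1 and 2, as in the text), drops the third-order term, and recomputes the attributes from scratch for each trial value instead of using the incremental updates (31)–(32), which is slower but easier to verify. Starting from a shuffled copy of the reference (a white noise with the same histogram), each pixel of a random scan keeps whichever binary value gives the smaller error, so the error never increases. All names and parameters are hypothetical.

```python
import numpy as np

def attributes(img, lags):
    """Binary histogram (P{Z=0}, P{Z=1}) and reduced covariance for horizontal
    and vertical lags h >= 1, estimated over the valid pixel pairs."""
    z = img.astype(float)
    eta = z.mean()
    var = z.var()
    cov = []
    for h in lags:
        for a, b in ((z[:, :-h], z[:, h:]), (z[:-h, :], z[h:, :])):
            cov.append(np.mean((a - eta) * (b - eta)) / var if var > 0 else 0.0)
    return np.array([1.0 - eta, eta]), np.array(cov)

def weighted_error(img, target, lags, w_hist=1.0, w_cov=2.0):
    """Quadratic error with weight 1 for the histogram and 2 for the covariance."""
    hist, cov = attributes(img, lags)
    t_hist, t_cov = target
    return w_hist * np.sum((hist - t_hist) ** 2) + w_cov * np.sum((cov - t_cov) ** 2)

def reconstruct(reference, lags, n_pass=3, seed=0):
    """Sequential synthesis: random scans, each pixel keeps its best binary value."""
    rng = np.random.default_rng(seed)
    target = attributes(reference, lags)
    # white noise with exactly the reference histogram
    img = rng.permutation(reference.ravel()).reshape(reference.shape)
    err = weighted_error(img, target, lags)
    history = [err]
    for _ in range(n_pass):
        for idx in rng.permutation(img.size):         # random sweep over pixels
            i, j = divmod(int(idx), img.shape[1])
            old = img[i, j]
            img[i, j] = 1 - old                       # try the other binary value
            e = weighted_error(img, target, lags)
            if e < err:
                err = e                               # keep the change
            else:
                img[i, j] = old                       # revert
        history.append(err)
    return img, history
```

Pixels to be kept fixed for a conditional simulation could simply be left out of the scan, as described in the text.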
5. The texture analysis
To visualize the differences between textures (simulations, reconstructions or real images), multivariate statistical data analysis is a very efficient tool. It exhibits the morphological properties that best discriminate the textures, so that we can really talk about an automatic texture analyser [22]. The technique used to examine the data is correspondence analysis (sometimes called factorial correspondence analysis) [23–25]. It enables us to analyze, in a data set, the relations between the studied images and the measured descriptive functions [26,27]. First, we build the data array. For each binary texture, we estimate:
• the granulometry and antigranulometry with openings and closings for increasing sizes of structuring elements (squares), from 0 to 100;
• the erosion and dilation curves (pseudo-granulometry and anti-pseudo-granulometry) for increasing sizes of structuring elements (squares), from 0 to 100.
All these values are tabulated in two different arrays. Dilation and erosion curves are concatenated in the same vector: the erosion part (resp. granulometry) is stored from left to right in variables 101 to 200, and the dilation part (resp. antigranulometry) from right to left in variables 100 to 1. We set the following two definitions:
• We call line point the vector formed by the 200 values (or variables) associated with one observation (one image in our case). Every simulated image can thus be represented as a point in a 200-dimensional space.
• We call column point the vector formed by the n results obtained over the population of n images for a given measurement.
Correspondence analysis represents the cloud of line points (images) and column points (variables) in a space of lower dimension (for instance, 2 or 3 for an easy display). This is done by building synthetic criteria from linear combinations of the initial data (which is why this technique is a kind of factor analysis). The reduced plane is the one that best separates the points while keeping the maximum inertia of the cloud; here, the inertia derives from the χ² criterion. One quality indicator of this analysis is how well the cloud is represented in a reduced number of dimensions. To estimate it, we compute the proportion of inertia explained by each factor (i.e., each dimension). For example, if the first plane is made of two factors that explain 56% and 18% of the inertia, respectively, it represents 74% of the information in the total array, and the representation can be considered representative enough. The analysis also allows us to give a physical meaning to each factor axis. We give below how the coordinates of line points and column points in a factorial plane are computed.
We use the following notation:
• T: table of relative frequencies (masses);
• r: vector of line sums (sometimes called line profile in the literature);
• c: vector of column sums (column profile);
• $D_r$: diagonal matrix whose diagonal values are the components of r;
• $D_c$: diagonal matrix whose diagonal values are the components of c.
The singular-value decomposition of T gives the following matrices:
$T = M_g D_s M_d^{t}$, (33)
where $D_s$ is the diagonal matrix of singular values, $M_g$ is the matrix of generalized left singular vectors, and $M_d$ is the matrix of generalized right singular vectors.
The coordinates of the textures (line points) in the factor space are given by
$R = D_r^{-1} M_g D_s$. (34)
This matrix reads as follows: $R_{i,j}$ is the jth coordinate of texture i (i.e., the coordinate associated with the jth singular value). The coordinates of the column points are computed in the same way:
$C = D_c^{-1} M_d D_s$. (35)
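A compact NumPy sketch of Eqs. (33)–(35) follows. It assumes the usual centered formulation of correspondence analysis, i.e. the decomposition is applied to $T - rc^t$ rather than to T itself, which only removes the trivial first factor. With $M_g = D_r^{1/2}U$ and $M_d = D_c^{1/2}V$ taken from the ordinary SVD of $D_r^{-1/2}(T - rc^t)D_c^{-1/2}$, the line and column coordinates are then $D_r^{-1}M_gD_s$ and $D_c^{-1}M_dD_s$, and the total inertia equals χ²/n. The function name is hypothetical, and the table is assumed non-negative with no empty line or column.

```python
import numpy as np

def correspondence_analysis(X):
    """Correspondence analysis of a non-negative table X (lines = images,
    columns = variables). Returns line coordinates R (Eq. (34)), column
    coordinates C (Eq. (35)) and the share of inertia of each factor."""
    X = np.asarray(X, dtype=float)
    T = X / X.sum()                           # relative frequencies (masses)
    r = T.sum(axis=1)                         # line sums
    c = T.sum(axis=0)                         # column sums
    # standardized residuals D_r^{-1/2} (T - r c^t) D_c^{-1/2}
    S = (T - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    M_g = np.sqrt(r)[:, None] * U             # generalized left singular vectors
    M_d = np.sqrt(c)[:, None] * Vt.T          # generalized right singular vectors
    R = (M_g / r[:, None]) * sv               # D_r^{-1} M_g D_s
    C = (M_d / c[:, None]) * sv               # D_c^{-1} M_d D_s
    inertia = sv ** 2
    return R, C, inertia / inertia.sum()
```

With images as lines and the 200 morphological variables as columns, `share[:2].sum()` would give the fraction of inertia represented in the first factorial plane.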
6. Results
For each of the five models, we estimated the average granulometry and erosion–dilation curves over five realizations. These curves, of the same type as in Figs. 10 and 6, are considered as the reference curves of the studied models. We reconstruct images with the process described in Section 4. For each model, we use:
• textures strictly respecting the histogram and the second- and third-order moments, indexed with the suffix (23) (Fig. 4(b));
• textures strictly respecting the histogram and the second-order moment, indexed with the suffix (2) (Fig. 4(c));
• textures strictly respecting the histogram and the third-order moment, indexed with the suffix (3) (Fig. 4(e));
• textures strictly respecting the histogram and the covariance, but with a third-order moment different from that of the reference, indexed with the suffix (20) (Fig. 4(d)). In this case, the centered reduced third-order moment is fixed to 20% of the reference value for $h_1 \neq 0$, $h_2 \neq 0$, $h_1 - h_2 \neq 0$;
• textures strictly respecting the histogram, but with second- and third-order moments constrained to values different from the reference moments, indexed with (nc) (Fig. 4(f)). In this case, the values are fixed to 20% of the original moments for $h_1 \neq 0$, $h_2 \neq 0$, $h_1 - h_2 \neq 0$.
These notations are summarized in Table 1. In order to verify the dispersion of simulations, we have
Table 1
Notations

| Model | Original image | Reconstruction | Order-2 constraint | Order-3 constraint |
|---|---|---|---|---|
| Poisson mosaic | mo | mo-nc | No | No |
| | | mo-3 | No | Yes |
| | | mo-20 | Yes | Yes (modified values) |
| | | mo-2 | Yes | No |
| | | mo-23 | Yes | Yes |
| Boolean model of Poisson polygons | sbpp | sbp-nc | No | No |
| | | sbp-3 | No | Yes |
| | | sbp-20 | Yes | Yes (modified values) |
| | | sbp-2 | Yes | No |
| | | sbp-23 | Yes | Yes |
| Boolean model of discs | sbdisc | sbd-nc | No | No |
| | | sbd-3 | No | Yes |
| | | sbd-20 | Yes | Yes (modified values) |
| | | sbd-2 | Yes | No |
| | | sbd-23 | Yes | Yes |
| Dead leaves of Poisson polygons | dlpp | dlp-nc | No | No |
| | | dlp-3 | No | Yes |
| | | dlp-20 | Yes | Yes (modified values) |
| | | dlp-2 | Yes | No |
| | | dlp-23 | Yes | Yes |
| Dead leaves of discs | dldisc | dld-nc | No | No |
| | | dld-3 | No | Yes |
| | | dld-20 | Yes | Yes (modified values) |
| | | dld-2 | Yes | No |
| | | dld-23 | Yes | Yes |
Fig. 6. Granulometries and antigranulometries by squares for original images and reconstructions of the mosaic (a), the boolean model of Poisson polygons (b) and of discs (c), and the dead leaves of Poisson polygons (d) and of discs (e).
produced, for each model, several realizations, indexed by the letters a, b, c and so on. Since it is difficult to obtain simulations with exactly the same fraction of white phase, we work with reduced and centered morphological functions. We then carry out a correspondence analysis on all the results (references and simulations). The erosion–dilation curves for the original images and for the reconstructions are shown in Fig. 10, and the granulometries and antigranulometries in Fig. 6.
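The reconstruction types listed above differ only in which image statistics are imposed: the volume fraction (histogram), the centered covariance, and the centered, reduced third-order moment at lag pairs $(h_1, h_2)$. The following Python sketch computes these quantities for a 0/1 image. The list-of-lists representation, the periodic boundaries and the normalization by $\sigma^3$ with $\sigma^2 = p(1-p)$ are our assumptions, not details taken from Gagalowicz's algorithm, and the function names are hypothetical.

```python
import math


def _value_at(img, i, j, h):
    """Pixel value at (i, j) + h, assuming periodic (wrap-around) boundaries."""
    rows, cols = len(img), len(img[0])
    return img[(i + h[0]) % rows][(j + h[1]) % cols]


def volume_fraction(img):
    """Fraction p of pixels equal to 1 (the histogram of a binary image)."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))


def centered_covariance(img, h):
    """E[(I(x) - p)(I(x + h) - p)], averaged over all pixels x."""
    p = volume_fraction(img)
    rows, cols = len(img), len(img[0])
    total = sum((img[i][j] - p) * (_value_at(img, i, j, h) - p)
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def reduced_third_moment(img, h1, h2):
    """E[(I(x)-p)(I(x+h1)-p)(I(x+h2)-p)] / sigma**3, assuming sigma**2 = p(1-p)."""
    p = volume_fraction(img)
    rows, cols = len(img), len(img[0])
    total = sum((img[i][j] - p)
                * (_value_at(img, i, j, h1) - p)
                * (_value_at(img, i, j, h2) - p)
                for i in range(rows) for j in range(cols))
    sigma = math.sqrt(p * (1.0 - p))
    return total / (rows * cols) / sigma ** 3


img = [[1, 0], [0, 0]]                      # p = 0.25
print(centered_covariance(img, (0, 0)))     # 0.1875, i.e. the variance p(1 - p)
print(reduced_third_moment(img, (1, 0), (0, 1)))
```

Under these assumptions, the target of a type (20) reconstruction for a lag pair with $h_1 \neq 0$, $h_2 \neq 0$, $h_1 - h_2 \neq 0$ would be 0.2 times `reduced_third_moment` evaluated on the reference image.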
Fig. 7. Projection of the column points onto the first factor plane for the granulometry and antigranulometry (a). This projection allows us to identify (and keep) the most discriminant measurements (b).
6.1. The granulometry and the antigranulometry
Given the distribution of the column points (Fig. 7(a)), we can restrict our study to openings and closings of size smaller than 30 pixels (variables 70 to 130). Indeed, the projections of the variables on the first two factors show that the first factor carries information on the disparities for small structuring elements (sizes 1 to 10), while the second factor carries the inertia due to differences for larger structuring elements (15 to 30 pixels). The new coordinates of the column points give a very simple reading of the factors (Fig. 7(b)). The positions of the images are presented in Fig. 8. The 2D representation is sufficiently representative, since the first two factors explain 86% of the inertia of the cloud of points.
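The share of inertia explained by the first factor plane comes from the principal inertias of the correspondence analysis. The sketch below computes them for a non-negative table (rows: images, columns: measurements) as the eigenvalues of $S^T S$, where $S_{ij} = (P_{ij} - r_i c_j)/\sqrt{r_i c_j}$ are the standardized residuals of the textbook correspondence analysis. The pure-Python Jacobi eigenvalue routine and the function names are illustrative assumptions. The paper does not say how its centered, reduced curves were made non-negative for the analysis, so that step is left out.

```python
import math


def _jacobi_eigenvalues(matrix, tol=1e-24, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def principal_inertias(table):
    """Principal inertias of a CA; rows and columns must have positive totals."""
    n = sum(map(sum, table))
    P = [[v / n for v in row] for row in table]
    r = [sum(row) for row in P]            # row masses
    c = [sum(col) for col in zip(*P)]      # column masses
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
         for i in range(len(r))]
    # The trace of S^T S is the total inertia (chi-square / n).
    StS = [[sum(S[i][j] * S[i][k] for i in range(len(r))) for k in range(len(c))]
           for j in range(len(c))]
    return _jacobi_eigenvalues(StS)


def explained_fraction(table, k=2):
    """Share of the total inertia carried by the first k factors."""
    lam = principal_inertias(table)
    return sum(lam[:k]) / sum(lam)


table = [[10, 2, 3], [4, 8, 1], [2, 3, 9], [5, 5, 5]]
print(explained_fraction(table, 2))   # close to 1.0: 3 columns give at most 2 non-trivial axes
```

On a table of real measurements, `explained_fraction(table, 2)` is the quantity behind statements such as "the first two factors explain 86% of the inertia".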
Fig. 8. Granulometry and antigranulometry: positions of the images in the first factor plane (a). Zooms on the mosaics (b), Boolean models (c) and dead leaves (d) of Poisson polygons; Boolean models (e) and dead leaves (f) of discs.
First, we observe that the reconstructions based on the reference covariance and on a third-order moment different from that of the model (textures (20)) form a class that is separate from the first one in the first factorial plane. This clearly shows that modifying the third-order moment yields a realization with specific morphological properties (except in the case of Poisson mosaics: see the visual aspect in
Fig. 9. Granulometry and antigranulometry: $\chi^2$ distance between the different simulations and their respective references, and between the simulations and the other references.
Fig. 4(1)(b)–(d)). Moreover, constraining the third-order moment to the correct value reduces the distance between the reference images and the reconstructions (see Fig. 9) in the following proportions: for morphologies based on Poisson polygons, 63% for the dead leaves, 30% for the mosaics and 77% for the Boolean models; for morphologies based on discs, 42% for the dead leaves and 60% for the Boolean model.
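The distances in Fig. 9 are presumably the usual chi-square distances of correspondence analysis between row profiles, $d^2(i,k) = \sum_j (p_{ij}/p_{i\cdot} - p_{kj}/p_{k\cdot})^2 / p_{\cdot j}$. Taking that definition as an assumption, here is a minimal Python sketch. The function name is hypothetical.

```python
def chi2_distance(table, i, k):
    """Squared chi-square distance between row profiles i and k of a table.

    Assumes non-negative entries with positive row and column totals.
    """
    n = sum(map(sum, table))
    col_mass = [sum(col) / n for col in zip(*table)]
    ri, rk = sum(table[i]), sum(table[k])
    return sum((a / ri - b / rk) ** 2 / cj
               for a, b, cj in zip(table[i], table[k], col_mass))


table = [[1, 1], [1, 3], [2, 6]]
print(chi2_distance(table, 0, 1))
print(chi2_distance(table, 1, 2))   # rows 1 and 2 are proportional, so the distance is 0
```

Because the distance depends only on profiles, two images whose curves differ by a constant factor are treated as identical. This is the reason the weighting by the inverse column masses is applied.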
We observe that the textures reconstructed only from the covariance (named (2)) are very similar to the images reconstructed from the second- and third-order moments. This is because, when the third-order moment is left unconstrained, it naturally comes close to its theoretical value, at least for these models. The positioning of the simulated textures is therefore coherent for each model. However, we can observe a number of
Fig. 10. Erosion and dilation curves by squares for the original images and the reconstructions of mosaics (a); Boolean models of Poisson polygons (b) and of discs (c); dead leaves of Poisson polygons (d) and of discs (e).
ambiguities in terms of nearest neighbors in the factor plane: for example, the simulations of Poisson mosaics of type (23) are closer to the reference of the Boolean model of discs than to their own reference. Constraining the third-order moment correctly increases the similarity in terms of granulometries, but not enough to remove this ambiguity. Note that this is the most unfavorable case, because these granulometric functions are very sensitive to noise, and the reconstruction process adjusts the attribute vector strictly pointwise. Therefore, if
Fig. 11. Projection of the column points for the erosion–dilation curves (a). This projection lets us identify (and keep) the most discriminant measurements (b).
the covariances are respected at the scale of the images, we observe some very small patterns that make this global adjustment possible. Hence, the resulting granulometry curves are not very smooth, and the meaning of the factorial axes is less clear for this kind of information. We will see that the erosion and dilation curves are more stable, which makes the results clearer.
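The granulometry (openings) and antigranulometry (closings) by squares discussed in this section can be sketched as the area fractions of the opened and closed images. The sketch below rests on several assumptions: a 0/1 image stored as a list of lists, periodic boundaries, and a centered square of side $2r+1$ as structuring element, so the "size" here is the half-width $r$, which may differ from the size convention of the paper. All function names are hypothetical.

```python
def erode(img, r):
    """Binary erosion by a centered (2r+1) x (2r+1) square, periodic boundaries."""
    rows, cols = len(img), len(img[0])
    return [[int(all(img[(i + di) % rows][(j + dj) % cols]
                     for di in range(-r, r + 1) for dj in range(-r, r + 1)))
             for j in range(cols)] for i in range(rows)]


def dilate(img, r):
    """Binary dilation by the same (symmetric) square."""
    rows, cols = len(img), len(img[0])
    return [[int(any(img[(i + di) % rows][(j + dj) % cols]
                     for di in range(-r, r + 1) for dj in range(-r, r + 1)))
             for j in range(cols)] for i in range(rows)]


def area_fraction(img):
    return sum(map(sum, img)) / (len(img) * len(img[0]))


def granulometry(img, sizes):
    """Area fraction left after opening (erosion, then dilation) for each size r."""
    return [area_fraction(dilate(erode(img, r), r)) for r in sizes]


def antigranulometry(img, sizes):
    """Area fraction after closing (dilation, then erosion) for each size r."""
    return [area_fraction(erode(dilate(img, r), r)) for r in sizes]


# A 3 x 3 white square in a 10 x 10 field survives openings up to r = 1 only.
img = [[1 if 3 <= i <= 5 and 3 <= j <= 5 else 0 for j in range(10)] for i in range(10)]
print(granulometry(img, [0, 1, 2]))
print(antigranulometry(img, [1]))
```

The same helpers also give the erosion and dilation curves of Section 6.2, as `area_fraction(erode(img, r))` and `area_fraction(dilate(img, r))` for increasing `r`. Because erosions and dilations are not idempotent filters like openings, those curves respond differently to the small residual patterns mentioned above.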
6.2. The erosion and dilation curve
From the distribution of the column points (Fig. 11(a)), we observe that only small structuring elements (less than 11 pixels) influence the discrimination of the images. The projections of the variables on the first two factors confirm this: the first factor accounts for the inertia for small squares (1–7
Fig. 12. Erosion and dilation curve: positions of the images in the factor plane (a). Zooms on mosaics (b), Boolean models (c) and dead leaves (d) of Poisson polygons; Boolean models (e) and dead leaves (f) of discs.
pixels), and the second factor explains the differences for larger structuring elements (10–15 pixels). We therefore carry out a further analysis on a limited number of variables, easily readable in the first factorial plane (Fig. 11(b)), which represents 84% of the total inertia.
Moreover, if we consider all the images, we notice that, as with the granulometry, the unconstrained simulations and those constrained only by the third-order moment are separated from the others, owing to their radically different morphology. So
Fig. 13. Erosion and dilation curve: $\chi^2$ distance between the different simulations and their respective references, and between the simulations and the other references.
their presence can only degrade the readability of the results, because this large difference carries a large part of the inertia. We therefore carry out another analysis that excludes these images. The column points are given in Fig. 11(b), and the final results are presented in Fig. 12(a), together with zooms on the different families. This planar representation is excellent, since it keeps 84% of the total inertia.
The readability is clearly better, because we can now immediately distinguish the different models and the associated reconstructions (see Fig. 12(a)). The textures constrained with a wrong third-order moment (type (20)) lie far from their reference, except in the case of Poisson mosaics. This shows that reconstructions based on a third-order moment different from the reference one can generate a new
class of textures, completely isolated from the textures that respect the correct third-order moment. Hence the morphological similarity of the textures of type (23) is specific, and different from the similarity of type (20). We now present the $\chi^2$ distance between the realizations and their model, and between all the models (see Fig. 13). We notice that the textures reconstructed from the covariance, or from the covariance and the third-order moment, are closer to their reference than to the other models, which removes the ambiguity observed with the granulometries. Here it is impossible to distinguish the textures reconstructed from the covariance alone from those reconstructed from the covariance and the third-order moment, because in the first case the third-order moment spontaneously fits its theoretical value (at least for these models). However, if the second-order moment is correct and the third-order moment is constrained to a different value (type (20)), the similarity drops considerably. When the third-order moment is constrained to the correct theoretical value, the points come closer, i.e. the resemblance increases in the following proportions: for morphologies based on Poisson polygons, 81% for dead leaves, 55% for mosaics and 80% for the Boolean model; for morphologies based on discs, 80% for the Boolean model and 68% for the dead leaves. We conclude that, for a fixed covariance, the third-order moment can modify the textures to such an extent that a good discrimination between textures is no longer possible.
7. Conclusion
From the examples presented here, we observed the influence of a limited amount of morphological information on the reconstruction of a texture by the simulation process introduced by Gagalowicz. The study is based on an automatic texture classification tool that is both quantitative (distance between textures) and qualitative (interpretation of the synthetic dimensions, or factors, from the projection of the variables). This tool allows us to estimate the amount of morphological information contained in the centered third-order moment, and makes it possible to choose morphological criteria to validate simulations. If we constrain two simulations with the same covariance but with different centered third-order moments (textures of types (23) and (20)), we obtain different textures, both by visual inspection and according to the automatic classification tool. The classification is clearly better for the erosion and dilation curves than for the granulometries, because the granulometries are very sensitive to the residual noise produced by Gagalowicz's algorithm. By constraining the reconstruction only with the covariance and by
abandoning the third-order moment, we find that the latter spontaneously comes close to the correct value. This certainly does not allow us to conclude that reconstruction from the covariance alone is sufficient. Indeed, if we take the inverse problem and try to reconstruct one image of column (d) of Fig. 4 from the covariance function alone, we obtain images of type (23) or (2). In these conditions, the reconstruction, and hence the classification, would be wrong. The natural fit of the third-order moment therefore occurs only for specific textures. The classification results obtained from the erosion–dilation curves by squares show the sensitivity of these criteria to the different models used here. This illustrates the important morphological content of this type of information for the discrimination of textures, as expected from the theory of random sets. It would be interesting to constrain simulations with this kind of information, in spite of the numerical cost of the updates. Finally, the same reconstruction technique was recently applied to the textures of rough surfaces, based on random function models [28].
References
[1] C.V. Deutsch, Conditioning reservoir models to well test information, in: A. Soares (Ed.), Geostatistics Troia'92, Kluwer Academic Publishers, Dordrecht, 1993.
[2] R.M. Srivastava, An annealing procedure for honouring change of support statistics in conditional simulations, in: R. Dimitrakopoulos (Ed.), Geostatistics for the Next Century, Kluwer Academic Publishers, Dordrecht, 1994.
[3] C.L.Y. Yeong, S. Torquato, Reconstructing random media, Phys. Rev. E 57 (1998) 495–506.
[4] S. De Ma, A. Gagalowicz, Sequential synthesis of natural textures, Comput. Vision Graphics Image Process. 30 (1985) 289–315.
[5] A. Gagalowicz, A new method for texture fields synthesis: some applications to the study of human vision, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-3 (1981) 520–532.
[6] D. Stoyan, W.S. Kendall, J. Mecke, Stochastic Geometry and its Applications, 2nd Edition, Wiley, New York, 1995.
[7] Ch. Lantuéjoul, Conditional simulation of object-based models, in: D. Jeulin (Ed.), Proceedings of the Symposium on the Advances in the Theory and Applications of Random Sets (Fontainebleau, 9–11 October 1996), 1997, pp. 271–288.
[8] M.D. Rintoul, S. Torquato, J. Colloid Interface Sci. 186 (1997) 467–494.
[9] K. Sivakumar, J. Goutsias, Discrete morphological size distributions and densities: estimation techniques and applications, J. Electron. Imaging 6 (1997) 31–53.
[10] R.L. Kashyap, R. Chellappa, A. Khotanzad, Texture classification using features derived from random field models, Pattern Recognition Lett. 1 (1982) 43–50.
[11] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.
[12] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London; Vol. 1, 1982; Vol. 2, 1988.
[13] E.R. Dougherty, Mathematical Morphology in Image Processing, Dekker, New York, 1993.
[14] L. Savary, D. Jeulin, Effective complex permittivity of random composites, J. Phys. I France, 1997, pp. 1123–1142.
[15] J.L. Quenec'h, M. Coster, J.L. Chermant, D. Jeulin, Study of the liquid-phase sintering process by probabilistic models: application to the coarsening of WC-Co cermets, J. Microsc. 168 (Pt 1) (1992) 3–14.
[16] D. Jeulin, Modèles morphologiques de structures aléatoires et de changement d'échelle, Thèse de Docteur d'Etat en Sciences Physiques, Université de Caen, 1991.
[17] G. Matheron, Éléments pour une théorie des milieux poreux, Paris, 1967.
[18] D. Jeulin, Multi-component random models for the description of complex microstructures, Proceedings of the Fifth International Congress for Stereology, Mikroskopie 37 (1980) 130–137.
[19] D. Jeulin, Morphological modeling of images by sequential random functions, Signal Processing 16 (1989) 403.
[20] D. Jeulin, Dead leaves models: from space tesselation to random functions, in: D. Jeulin (Ed.), Proceedings of the Symposium on the Advances in the Theory and Applications of Random Sets, Fontainebleau, 9–11 October 1996, World Scientific, Singapore, 1997.
[21] D. Jeulin, Random structure analysis and modelling by mathematical morphology, in: A.J.M. Spencer (Ed.), Proceedings of CMDS5, Balkema, Rotterdam, 1987, pp. 745–751.
[22] D. Jeulin, J. Serra, Pour reconnaître les inclusions: chartes ou analyseurs de textures? Mémoires et Études Scientifiques de la Revue de Métallurgie 72 (1975) 745–751.
[23] J.P. Benzécri, L'Analyse des Données, T2: L'Analyse des Correspondances, Dunod, Paris, 1973.
[24] F. Cailliez, J.P. Pages, Introduction à l'Analyse de Données, SMASH, 1976.
[25] M.J. Greenacre, Theory and Applications of Correspondence Analysis, Academic Press, New York, 1984.
[26] A. Aubert, Modélisation de la topographie de la surface et de l'interaction surface-lumière, rapport d'avancement 3, Internal Report N-19/97/MM, Centre of Mathematical Morphology, Ecole des Mines de Paris, 1997.
[27] A. Aubert, D. Jeulin, Modélisation de la topographie de la surface et de l'interaction surface-lumière, rapport d'avancement 5, Internal Report N-01/98/MM, Centre of Mathematical Morphology, Ecole des Mines de Paris, 1998.
[28] A. Aubert, D. Jeulin, Classification morphologique de surfaces rugueuses, Revue de Métallurgie, sciences et génie des matériaux, May 1999, accepted for publication.
About the Author: ANTOINE AUBERT received his M.S. degree in applied mathematics from the University Pierre and Marie Curie, Paris, France, in 1994. After one year with the CEA (Atomic Energy), he is currently working as a Ph.D. student in the heterogeneous media group at the Mathematical Morphology Center, School of Mines of Paris. His research interests, in cooperation with the steel industry (Usinor), include automatic texture classification, models of random rough surfaces, and the scattering of light from random surfaces through a physical or geometrical approach.
About the Author: DOMINIQUE JEULIN is Maître de Recherche at the Ecole des Mines de Paris, which he joined in 1986. He has been doing research and teaching in three laboratories: the Centre de Morphologie Mathématique (Fontainebleau), where he leads a research group on the Physics of Heterogeneous Media; the Centre de Géostatistique (Fontainebleau); and the Centre des Matériaux P.M. Fourt (Evry), where he is Scientific Adviser. He received his Civil Mining Engineer degree from the Nancy School of Mines in 1972 and his Doctor-Engineer degree in Geostatistics and Mathematical Morphology from the Ecole des Mines de Paris in 1979, and became Docteur d'Etat ès Sciences Physiques in 1991. He has been involved in research in Image Analysis and Materials Science for 27 years, and he is the author or coauthor of over 200 scientific papers. His current areas of interest are the theoretical prediction of the overall physical properties of random heterogeneous media from their microstructure, models and simulations of random structures, and applications of Geostatistics, Image Analysis and Mathematical Morphology to Materials Science.
Pattern Recognition 33 (2000) 1105–1117
A window-based inverse Hough transform
A.L. Kesidis, N. Papamarkos*
Electric Circuits Analysis Laboratory, Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
Abstract
In this paper a new Hough transform inversion technique is proposed. It is a window-based inverse Hough transform algorithm, which reconstructs the original image using only the data of the Hough space and the dimensions of the image. In order to minimize memory and computing requirements, the original image is split into windows. Thus, the algorithm can be applied to large images as a general-purpose tool. In this paper, the proposed technique is applied to edge extraction and filtering. The edges are detected not just as continuous straight lines but as they really appear in the original image, i.e. pixel by pixel. Experimental results indicate that the method is robust, accurate and fast. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Inverse Hough transform; Edge extraction; Line detection; Non-linear filtering
1. Introduction
The Hough transform (HT) is one of the most widely used tools for curve detection. The original HT, first proposed by Hough [1], is commonly used to detect straight lines in a binary image. It is a voting process in which each point (pixel) of the original binary image votes for all possible patterns (straight lines) passing through that point [2]. The votes are accumulated in an accumulator array whose peaks correspond to line segments. However, the peaks of the accumulator array give only the polar parameters of the straight line and the total number of pixels that belong to it; the HT does not determine the exact position of each pixel on the straight lines. The main advantages of the HT are its robustness to image noise and its ability to determine the slope and the distance from the origin (polar parameters) of discontinuous straight lines. Its disadvantages are its large storage and computational requirements. For this reason many approaches have been
* Corresponding author. Tel.: +30-541-79585; fax: +30-541-79569. E-mail address: [email protected] (N. Papamarkos)
proposed in the literature to reduce computation time and memory requirements [3–8]. Additional techniques have been proposed to improve accuracy [9–12] and to analyze the quantization effects of the Hough space [13,14]. Duda and Hart [15] improved the HT algorithm and extended it to the detection of other geometrical shapes. Ballard [16] introduced the generalized HT, which can find arbitrary shapes of any orientation and scale. Additionally, Chatzis and Pitas introduced the fuzzy cell HT, which, by using a fuzzy split of the Hough space, detects shapes with better accuracy, especially in noisy images [17]. As mentioned above, the HT cannot determine the exact position of the pixels of a straight line. This is a serious disadvantage in many applications, such as edge detection via the HT, where the pixels of the edges must be known and not only their polar coordinates. This problem can be solved by developing an inverse Hough transform (IHT) technique. The IHT can be defined as a technique that recovers the original binary image knowing only its size and the data of the Hough space; no further information about the image is needed. Recently, a general IHT algorithm was proposed by Kesidis and Papamarkos [18]. This IHT algorithm can be considered as a decomposition procedure which, by checking the existence of the sinusoidal curve peaks in the
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00167-3
A.L. Kesidis, N. Papamarkos / Pattern Recognition 33 (2000) 1105–1117
Hough space, identifies the curves and reconstructs the original image pixel by pixel. For a correct inversion, the size of the accumulator array must satisfy some conditions. These conditions are stated analytically and are associated with the scale coefficients that control the size of the accumulator array. However, the necessary size of the accumulator array, the memory requirements and the computation time increase significantly with the size of the original image. Therefore, this IHT is impractical for large images. To solve this problem we propose a window-based inverse Hough transform (WIHT) algorithm. The method considers the original image as a set of non-overlapping rectangular windows. In other words, the original image is split into $n^2$ windows, $n = 1, 2, \ldots$, and the HT and IHT are applied independently to each of them. The proposed algorithm is suitable for large images. As an application, we describe the use of the WIHT for edge detection. In the last stage of the edge extraction algorithm via the WIHT, a filtering merging procedure is applied. This produces the final filtered edges, taking into account the filtering results of each window and the global filtering requirements. Note that the extracted edges include all pixels in their correct positions, as these appear in the original image. The proposed algorithm is robust and applicable to binary images of any size. In this paper, we provide representative examples that cover different types of edge extraction and filtering. The experimental results confirm the effectiveness of the proposed method. The rest of this paper is arranged as follows. Section 2 gives definitions of the HT and discusses the quantization problems of the discrete HT implementation. In Section 3 the inversion conditions are formulated and the proper values of the scale coefficients are defined. Section 4 summarizes the IHT algorithm and its implementation.
Section 5 analyzes the new WIHT algorithm and describes its application. Section 6 gives some experimental and comparative results of the application of the WIHT algorithm and demonstrates its suitability for edge extraction and "ltering. Finally, Section 7 presents the conclusions.
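The window decomposition at the heart of the WIHT splits an image into $n^2$ non-overlapping rectangular windows, each of which is then transformed and inverted on its own with local coordinates. The following Python sketch shows only the split and the reassembly. When the image side is not a multiple of $n$, it places the window borders at $\lfloor kN/n \rfloor$. That rule, and the helper names, are our assumptions rather than the authors' choice.

```python
def window_bounds(size, n):
    """[start, end) borders of n consecutive windows covering `size` pixels."""
    return [(k * size // n, (k + 1) * size // n) for k in range(n)]


def split_windows(img, n):
    """Return ((row_offset, col_offset), tile) pairs for n x n windows."""
    rows, cols = len(img), len(img[0])
    windows = []
    for r0, r1 in window_bounds(rows, n):
        for c0, c1 in window_bounds(cols, n):
            tile = [row[c0:c1] for row in img[r0:r1]]  # local coordinates start at (0, 0)
            windows.append(((r0, c0), tile))
    return windows


def merge_windows(windows, rows, cols):
    """Paste (possibly processed) tiles back at their offsets."""
    out = [[0] * cols for _ in range(rows)]
    for (r0, c0), tile in windows:
        for i, row in enumerate(tile):
            out[r0 + i][c0:c0 + len(row)] = row
    return out


img = [[(i * 7 + j) % 3 == 0 for j in range(7)] for i in range(7)]
windows = split_windows(img, 3)
print(len(windows), merge_windows(windows, 7, 7) == img)   # 9 True
```

In a full pipeline, each `tile` would be passed through the HT and IHT before `merge_windows`. The memory saving comes from the accumulator size, and therefore the required scale coefficients, depending on the window size rather than on the full image size.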
2. Definitions of the Hough transform
In order to analyze our method, we first give some definitions and discuss the quantization problems associated with the discrete implementation of the HT. The HT maps a line (not necessarily a straight line) of the image space $(x, y)$ into a point in the Hough space. A definition of the HT is based on the polar representation of lines

$$\rho = x_i \cos\theta + y_i \sin\theta. \tag{1}$$

All points $(x_i, y_i)$ of a line in the binarized image space correspond to a point $(\theta, \rho)$ in the Hough space. Additionally, any point $(x_i, y_i)$ in the image space is mapped to a sinusoidal curve in the HT space. For this reason, the HT can be considered as a point-to-curve transformation. In the discrete case, the Hough space is an accumulator array $C$. If $1/sf_\theta$ is the step interval for the variable $\theta$, then $\theta \in [-90^\circ, -90^\circ + 1/sf_\theta, \ldots, 180^\circ]$. Let also

$$\theta_C = \theta\, sf_\theta \quad\text{and}\quad \tilde\theta_C = \operatorname{Round}(\theta_C), \tag{2}$$

where the function $\operatorname{Round}(\cdot)$ gives the nearest integer of its argument. Similarly, $1/sf_\rho$ is the step interval for the variable $\rho$, and $\rho \in [\rho_1, \rho_1 + 1/sf_\rho, \ldots, \rho_2]$, where $\rho_1$ and $\rho_2$ denote the minimum and maximum values of $\rho$. We also define

$$\rho_C = \rho\, sf_\rho \tag{3}$$

and

$$\tilde\rho_C = \operatorname{Round}(\rho_C). \tag{4}$$

For each point $(x_i, y_i)$, the peak coordinates $(\theta_M, \rho_M)$ of the sinusoidal curve in the HT space are given by

$$\frac{d\rho}{d\theta} = 0 \;\Rightarrow\; \theta_M = \tan^{-1}\!\left(\frac{y_i}{x_i}\right) \tag{5}$$

and

$$\rho_M = x_i \cos\theta_M + y_i \sin\theta_M. \tag{6}$$

In general, for any values of $sf_\theta$ and $sf_\rho$, the coordinates of each peak are given by

$$\theta_{CM} = \theta_M\, sf_\theta \tag{7}$$

and

$$\rho_{CM} = \rho_M\, sf_\rho. \tag{8}$$

At the peak of each curve in the HT space, there is a region around $\theta_{CM}$, defined by $\pm d\theta_C$, where the $\tilde\rho_C$ values are constant because of the Round function. That is, if $\rho_C$ belongs to the interval

$$\tilde\rho_{CM} - 0.5 \le \rho_C < \tilde\rho_{CM} + 0.5, \tag{9}$$

then $\tilde\rho_C = \operatorname{Round}(\rho_C) = \tilde\rho_{CM}$. Also, using $x_i \sin\theta_M = y_i \cos\theta_M$, which follows from Eq. (5),

$$\rho_M - \rho = x_i \cos\theta_M + y_i \sin\theta_M - x_i \cos(\theta_M + d\theta) - y_i \sin(\theta_M + d\theta) = \rho_M (1 - \cos d\theta) \tag{10}$$

$$\Rightarrow\; \rho = \rho_M \cos d\theta. \tag{11}$$
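A direct reading of Eqs. (1)–(4) gives the voting procedure for the discrete accumulator $C$. The Python sketch below keys each cell by the integer pair ($\theta$ index $k = \theta\,sf_\theta$, $\rho$ index) and stores the votes in a sparse `Counter`. Round-half-up for $\operatorname{Round}(\cdot)$, an integer $sf_\theta$ and the function names are our assumptions.

```python
import math
from collections import Counter


def round_half_up(v):
    """Nearest integer, halves rounded up (assumed meaning of Round)."""
    return math.floor(v + 0.5)


def hough(pixels, sf_theta, sf_rho):
    """Sparse accumulator C: (theta index, rho index) -> number of votes."""
    acc = Counter()
    for x, y in pixels:
        # theta runs over [-90, 180] degrees in steps of 1/sf_theta (integer sf_theta assumed)
        for k in range(-90 * sf_theta, 180 * sf_theta + 1):
            theta = math.radians(k / sf_theta)                            # k = theta_C, Eq. (2)
            rho_c = (x * math.cos(theta) + y * math.sin(theta)) * sf_rho  # Eqs. (1) and (3)
            acc[(k, round_half_up(rho_c))] += 1                           # Eq. (4)
    return acc


line = [(1, 1), (2, 2), (3, 3)]
acc = hough(line, sf_theta=1, sf_rho=4)
print(acc[(-45, 0)])   # 3: the three collinear pixels vote for the same cell
```

Each pixel contributes exactly one vote per $\theta$ index, so the accumulator always holds $271\,sf_\theta$ votes per pixel, wherever the pixel lies. The pixel positions survive only in how these votes are spread over the $\rho$ rows, and the inversion exploits exactly that information.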
In the general case, and for any value of $sf_\theta$, it is assumed that

$$\rho_C = \rho_{CM} \cos\!\left(\frac{d\theta_C}{sf_\theta}\right). \tag{12}$$

Since $d\theta_C$ is symmetrically distributed around $\theta_{CM}$, Eqs. (9) and (12) give

$$d\theta_C = sf_\theta \cos^{-1}\!\left(\frac{\tilde\rho_{CM} - 0.5}{\rho_{CM}}\right). \tag{13}$$

The range of angle values where $\tilde\rho_C = \tilde\rho_{CM}$ (depicted in Fig. 1) is given by the following equations:

$$d\tilde\theta_{CL} = \operatorname{Trunc}\!\left(\tilde\theta_{CM} - (\theta_{CM} - d\theta_C)\right), \tag{14}$$

$$d\tilde\theta_{CR} = \operatorname{Trunc}\!\left((\theta_{CM} + d\theta_C) - \tilde\theta_{CM}\right), \tag{15}$$

where

$$\tilde\theta_{CM} = \operatorname{Round}(\theta_{CM}), \tag{16}$$

$$\tilde\rho_{CM} = \operatorname{Round}\!\left((x_i \cos\tilde\theta_{CM} + y_i \sin\tilde\theta_{CM})\, sf_\rho\right) \tag{17}$$

and Trunc is the truncation function. Fig. 1 shows the peak region of the curve of pixel (18, 19) for $sf_\theta = 1$ and $sf_\rho = 5$. In this case $\theta_{CM} = 46.55$, $\rho_{CM} = 130.86$, $\tilde\theta_{CM} = 47$ and $\tilde\rho_{CM} = 131$. The value of $d\theta_C$ is 4.24, while the angle width on the left side of $\tilde\theta_{CM}$ is $d\tilde\theta_{CL} = 4$ and on the right side $d\tilde\theta_{CR} = 3$.

Fig. 2. The curve peaks in the upper three zones of the HT space of a $10 \times 10$ image array A having all pixels on.

3. Determination of the scale coefficients
The inversion of the HT is possible only if the dimensions of the accumulator array C satisfy some lower
Fig. 1. Description of a peak region. The solid line indicates the real values while the circles depict the discrete elements of the accumulator array C.
bounds. These dimensions are defined by the scale coefficients $sf_\theta$ and $sf_\rho$. In this section, we describe a method that determines the minimum values of the scale coefficients permitting the inversion of the HT. Consider the general case of a binary image A of $N \times N$ pixels whose pixel values are all equal to one. If the image dimensions are $N_1 \times N_2$ with $N_1 \neq N_2$, then, without loss of generality, we can take $N = \max\{N_1, N_2\}$. According to the previous analysis, in the accumulator array C the peaks of the curves of the diagonal pixels of A are located at $\theta_D = 45\, sf_\theta$. The coordinates of these peaks are given by Eqs. (16) and (17). To determine the minimum values of the scale coefficients, the $N^2$ curve peaks of the image A must be sorted according to their $\tilde\rho_{CM}$ value. Instead of sorting all of them, they are divided into horizontal zones, groups and classes, as depicted in Fig. 2. Each zone is defined by the $\tilde\rho_{CM}$ values of two consecutive pixels on the diagonal of the matrix A (marked with a circle in Fig. 2). The peaks in each zone are sorted in descending order and then separated into groups, so that the elements of each group have the same $\tilde\rho_{CM}$ value. Next, the elements of each group are divided into two classes according to their $\tilde\theta_{CM}$ value: the left class contains the elements of the group with $\tilde\theta_{CM} < \theta_D$, while the right class contains those with $\tilde\theta_{CM} \ge \theta_D$. Equivalently, the elements with $x_i > y_i$ and $x_i \le y_i$ belong to the left and right classes, respectively.
As shown above, for each sinusoidal curve in the HT space there is a region $d\theta_C$ around the peak $\theta_{CM}$ where, because of quantization, the values $\tilde\rho_C$ of the curve points are equal to $\tilde\rho_{CM}$. Therefore, each curve i has $\tilde\rho_C$ values equal to its maximum $\tilde\rho_{CM}$ within the angle range $[\tilde\theta^{(i)}_{CM} - d\tilde\theta^{(i)}_{CL},\ \tilde\theta^{(i)}_{CM} + d\tilde\theta^{(i)}_{CR}]$, where $d\tilde\theta^{(i)}_{CL}$, $d\tilde\theta^{(i)}_{CR}$ and $\tilde\theta^{(i)}_{CM}$ are given by Eqs. (14)–(16), respectively.
3.1. Class overlapping
Let i, j denote two curves of a right class with $\tilde\theta^{(i)}_{CM} < \tilde\theta^{(j)}_{CM}$. If $\tilde\theta^{(i)}_{CM} + d\tilde\theta^{(i)}_{CR} < \tilde\theta^{(j)}_{CM} + d\tilde\theta^{(j)}_{CR}$, there is no overlapping (Fig. 3). This means that there is a number of points (at least one) on the right side of row $\tilde\rho_{CM}$ that are contributed only by the right curve j. The rightmost of these points is the characteristic point of the curve, which allows the curve to be detected during the inversion process. Similarly, for two curves i, j of a left class with $\tilde\theta^{(i)}_{CM} < \tilde\theta^{(j)}_{CM}$, if $\tilde\theta^{(i)}_{CM} - d\tilde\theta^{(i)}_{CL} < \tilde\theta^{(j)}_{CM} - d\tilde\theta^{(j)}_{CL}$, there is a number of points (at least one) on the left side of row $\tilde\rho_{CM}$ that are contributed only by the left curve i. The leftmost of these points is the characteristic point of the curve, which allows its detection during the inversion process. In general, starting from a small value of $sf_\rho$ and gradually increasing it, we can separate all curve peaks into distinct classes, so that each left or right class satisfies the following condition:
• For a left class
$$\tilde\theta^{(s)}_{CM} - d\tilde\theta^{(s)}_{CL} < \tilde\theta^{(s+1)}_{CM} - d\tilde\theta^{(s+1)}_{CL} \tag{18}$$
with $s = 1, \ldots, k_L - 1$, where $k_L$ is the number of class members, sorted from left to right according to the distances $|\tilde\theta^{(i)}_{CM} - \theta_D|$, $i = 1, \ldots, k_L$. (19)
• For a right class
$$\tilde\theta^{(s-1)}_{CM} + d\tilde\theta^{(s-1)}_{CR} < \tilde\theta^{(s)}_{CM} + d\tilde\theta^{(s)}_{CR} \tag{20}$$
with $s = 2, \ldots, k_R$, where $k_R$ is the number of class members, sorted from right to left according to the distances $|\tilde\theta^{(i)}_{CM} - \theta_D|$, $i = 1, \ldots, k_R$. (21)

3.2. Group overlapping
Let also i, j denote two curves, one from the left and one from the right class of a given group. The upper parts of all possible pairs i, j must differ in at least one point. Starting from a small value of $sf_\theta$ and gradually increasing it, the following process is applied iteratively to determine the scale coefficient $sf_\theta$:
• In every group, for each element i of the left class and each element j of the right class, one of the following inequalities must be satisfied:
$$\tilde\theta^{(i)}_{CM} - d\tilde\theta^{(i)}_{CL} < \tilde\theta^{(j)}_{CM} - d\tilde\theta^{(j)}_{CL} \quad\text{or}\quad \tilde\theta^{(i)}_{CM} + d\tilde\theta^{(i)}_{CR} < \tilde\theta^{(j)}_{CM} + d\tilde\theta^{(j)}_{CR}. \tag{22}$$
In summary, for any square image matrix A of size $N \times N$, the original image can be reconstructed correctly using only the array C if the scale coefficients $sf_\rho$ and $sf_\theta$ have values that satisfy conditions (18), (20) and (22), respectively. These conditions are referred to as the Inversion Conditions. In general, the scale coefficients do not depend on the content of the image but only on its dimensions. Therefore, it is not necessary to apply the above procedure for determining the scale coefficients to every image under study. Instead, the minimum (optimal) values of the scale coefficients can be obtained directly from a table such as Table 1, which gives the values of $sf_\theta$ and $sf_\rho$ for several image dimensions.
4. The inversion procedure
Using the above analysis and definitions, we developed an IHT procedure that permits the exact reconstruction of the original image using only the HT space. Consider an accumulator array C of an HT space corresponding to an $N \times N$ pixel image array A. Let also
Table 1
Minimum scale coefficients $sf_\theta$ and $sf_\rho$ for several values of the image dimension N

Image dimension N | $sf_\theta$ | $sf_\rho$
10 | 1 | 4
25 | 1 | 9
50 | 1 | 17
100 | 2 | 34
150 | 3 | 53
200 | 4 | 68
250 | 5 | 89
300 | 6 | 102

Fig. 3. The curves (6, 13) and (3, 14) in the right class of the group in row $\tilde\rho_{CM} = 43$.
A be an N]N image array, where the reconstructed inv image is stored. We suppose that the pixels of the original image that are equal to one have been transformed to the HT space. The corresponding sinusoidal curves are separated into groups and classes as mentioned above. The decomposition process of the IHT algorithm is a topdown procedure that runs from the `uppera groups (higher o8 value) to the `lowera ones and from the CM `outera member of each class (greater DhI !h D value) CM D to the `innera. Analytically, the procedure is as follows: Step 1: Examine the groups from up to down, according to the above-mentioned separation of the curves into zones, groups and classes. Step 2: Examine "rst the `outera members of the group and then the `innera ones, according to their DhI !h D CM D value. That is, examine successively the furthest left member of the left class, the furthest right member of the right class, the second furthest left member of the left class and so on. For each examined curve go to Step 3 if it belongs to the left class or to Step 5 if it belongs to the right class. If all the members of the group are examined then go to Step 1 and continue with the next lower group. Step 3: Let us suppose that the examined member corresponds to pixel (x , y ) of the original image A. The i i values dhI (xi ,yi ), hI (xi ,yi ) and o8(xi ,yi ) are given by Eqs. (14), (16) CL CM CM and (17), respectively, and describe the peak position of the curve. The extreme left peak element [hI (xi , yi )! CM dhI (xi , yi ), o8(xi , yi )] of row o8(xi ,yi )of the accumulator C is CL CM CM examined. If this element has non-zero value then go to Step 4 else execute Step 2 with the next member. Step 4: Since the value at [hI (xi , yi )!dhI (xi ,yi ), o8(xi ,yi )] is CM CL CM non-zero, the curve of pixel (x , y ) had a contribution in i i array C during the direct HT, which means that point (x , y ) in array A was equal to 1. In that case, the array i i A is updated (i.e. 
its (x , y ) point is set to 1) and the inv i i curve obtained from point (x , y ) is removed from array i i C, i.e. all the points of C corresponding to this curve decrease their value by 1. Go to Step 2 to proceed with the next member. Step 5: Let suppose that the examined member corresponds to pixel (x , y ) of the original image A. The values i i dhI (xi ,yi ), hI (xi ,yi ) and o8(xi ,yi ) are given by Eqs. (15)}(17) reCR CM CM spectively, and describe the peak of the curve. The furthest right peak element [hI (xi ,yi )#dhI (xi ,yi ), o8(xi ,yi )] of row CM CR CM o8(xi ,yi ) of the accumulator C is checked. If this element has CM non-zero value then go to Step 6 else execute Step 2 with the next member. Step 6: Since the value at [hI (xi ,yi )#dhI (xi ,yi ), o8(xi ,yi )] is CM CR CM non-zero, the curve of pixel (x , y ) had a contribution in i i array C during the direct HT, which means that point (x , y ) in array A was equal to 1. In that case, the array i i A is updated (i.e. its (x , y ) point is set to 1) and the inv i i curve obtained from point (x , y ) is removed from array i i C, i.e. all the points of C corresponding to this curve decrease their value by 1. Go to Step 2 to check the next member.
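Steps 1–6 reduce to a single loop once the examination order and each curve's check point are known. The following is a minimal Python sketch, not the authors' implementation: pixels are opaque labels, and the caller supplies for each pixel the accumulator cells of its curve and its check cell (the extreme left peak element for left-class members, the extreme right one for right-class members), which in the paper come from Eqs. (14)–(17) and the HT quantization. A `collections.Counter` stands in for the accumulator array C; `examination_order`, `direct_ht` and `decompose` are hypothetical names, and the toy curves are made up for illustration.

```python
from collections import Counter


def examination_order(groups):
    """Steps 1-2: groups top-down by rho_cm; inside a group, alternate the left
    and right classes from the outer member to the inner one.

    groups: iterable of (rho_cm, left_members, right_members), with each class
    already sorted outer member first.
    """
    order = []
    for _, left, right in sorted(groups, key=lambda g: g[0], reverse=True):
        for s in range(max(len(left), len(right))):
            if s < len(left):
                order.append(left[s])
            if s < len(right):
                order.append(right[s])
    return order


def direct_ht(on_pixels, curve_cells):
    """Accumulate the curve of every 'on' pixel into a sparse accumulator."""
    acc = Counter()
    for p in on_pixels:
        for cell in curve_cells[p]:
            acc[cell] += 1
    return acc


def decompose(acc, order, curve_cells, check_cell):
    """Steps 3-6: restore a pixel when its check cell is non-zero, then remove its curve."""
    acc = Counter(acc)   # work on a copy of C
    restored = set()     # pixels set to 1 in A_inv
    for p in order:
        if acc[check_cell[p]] != 0:
            restored.add(p)
            for cell in curve_cells[p]:
                acc[cell] -= 1
    return restored, acc


# Toy data: three candidate pixels "a", "b", "c" with hand-made curves.
curve_cells = {"a": [(0, 0), (1, 0), (2, 1)],
               "b": [(1, 0), (2, 1)],
               "c": [(2, 1), (3, 2)]}
check_cell = {"a": (0, 0), "b": (1, 0), "c": (3, 2)}
order = examination_order([(5, ["a"], []), (3, ["b"], ["c"])])
restored, rest = decompose(direct_ht({"b", "c"}, curve_cells), order, curve_cells, check_cell)
print(sorted(restored))  # ['b', 'c']
```

In the toy data no curve examined later passes through an earlier candidate's check cell, which is what lets the top-down order recover the image exactly and leave every accumulator cell at zero.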
Fig. 4. Decomposition of the curve of the (10, 10) pixel.
At the end of the above procedure, the array C is empty and the restored image $A_{inv}$ is identical to the original A. Fig. 4 depicts an example of the decomposition of the accumulator array C corresponding to a 10 × 10 pixel image in which the pixels (10, 8), (8, 10), (9, 10) and (10, 10) are on. The curve (10, 10) is examined first: it is the only member of the group at row $\tilde{\rho}_{CM} = 57$ and belongs to the right class. Its check-point value is non-zero, so the curve is removed and the element (10, 10) of array $A_{inv}$ is set to one. The procedure continues by checking the left class of the group at row $\tilde{\rho}_{CM} = 54$, which has no members, and then the right class of that group, which has one member, i.e., the curve (9, 10), and so on. In summary, if the scale coefficients $sf_\theta$ and $sf_\rho$ of an accumulator array C satisfy the inversion conditions, the original image A can be fully reconstructed from C with the decomposition procedure described above.
5. The window-based inverse Hough transform

Suppose that we have an N × N pixel image A and want to apply a filtering procedure to find the edges that satisfy some specific conditions. With the IHT, we can calculate the direct HT of the entire image, apply the filter to the accumulator array values, and finally use the IHT to extract the pixels of image A that satisfy the filter conditions. As already mentioned, if the coefficients $sf_\theta$ and $sf_\rho$ satisfy the inversion conditions (18), (20) and (22), the original image can be reconstructed by applying a decomposition process to accumulator array C. Unfortunately, the larger the original image, the higher the values of $sf_\theta$ and $sf_\rho$ (see Table 1). This increases both the dimensions of the accumulator array and the required processing time, which makes the IHT prohibitive for sufficiently large images.
We will show that the proposed method, which separates the image into windows, requires significantly less computation time and memory. Thus, the inversion procedure can be applied to large images.

5.1. Determination of the line parameters according to a point $(x_k, y_k)$

Before describing the WIHT procedure, we need the relations that describe any straight line of the image with respect to the origin of each image window. Consider the image of Fig. 5, which contains a line with polar coordinates $(\rho_o, \theta_o)$. These coordinates refer to the image origin, which is the bottom left corner of the image. The image is separated into windows $W_k$; in Fig. 5 there are four of them. The bottom left corner of each window is denoted as $(x_k, y_k)$, where k = 1, 2, 3, 4. In particular, for window $W_1$ the parameter values of the line are unchanged, that is, $(\theta_1, \rho_1) = (\theta_o, \rho_o)$.

The general line equation is
$$Ax + By + C = 0. \tag{23}$$
Also,
$$\rho_o = x\cos\theta_o + y\sin\theta_o. \tag{24}$$
Thus
$$A = \cos\theta_o, \tag{25}$$
$$B = \sin\theta_o, \tag{26}$$
$$C = -\rho_o. \tag{27}$$
The distance of a point $(x_k, y_k)$ from the line $(\theta_o, \rho_o)$ is given by
$$\rho_k = \frac{|Ax_k + By_k + C|}{\sqrt{A^2 + B^2}} \;\Rightarrow\; \rho_k = |x_k\cos\theta_o + y_k\sin\theta_o - \rho_o|. \tag{28}$$
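As a quick numerical illustration of Eq. (28), with values chosen here only for illustration, take the vertical line $\theta_o = 0^\circ$, $\rho_o = 30$ (that is, x = 30) and a window whose origin is $(x_k, y_k) = (20, 0)$. Eqs. (25)–(27) give A = 1, B = 0 and C = −30, so

$$\rho_k = \frac{|1\cdot 20 + 0\cdot 0 - 30|}{\sqrt{1^2 + 0^2}} = |20\cos 0^\circ + 0\cdot\sin 0^\circ - 30| = 10,$$

which is indeed the horizontal distance from the window origin to the line x = 30.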
The intersection point $(\hat{x}_k, \hat{y}_k)$ of line $(\theta_o, \rho_o)$ and the axis $y = y_k$ is given by
$$A\hat{x}_k + B\hat{y}_k + C = 0,\quad \hat{y}_k = y_k \;\Rightarrow\; \hat{x}_k = -\frac{B\hat{y}_k + C}{A},\quad \hat{y}_k = y_k. \tag{29}$$
For the definition of $\theta_k$ there are the following cases:
• If A = 0, then $\theta_o = \pm 90^\circ$ and the line is parallel to the x axis, so
$$\theta_k = \begin{cases} \theta_o, & \rho_o \le y_k, \\ \theta_o + 180, & \rho_o > y_k. \end{cases} \tag{30}$$
• If A ≠ 0 and $\theta_o \in (-90, 90)$, then
$$\theta_k = \begin{cases} \theta_o, & \hat{x}_k \ge x_k, \\ \theta_o + 180, & \hat{x}_k < x_k. \end{cases} \tag{31}$$
• If A ≠ 0 and $\theta_o < -90$ or $\theta_o > 90$, then
$$\theta_k = \begin{cases} \theta_o, & \hat{x}_k < x_k, \\ \theta_o + 180, & \hat{x}_k \ge x_k. \end{cases} \tag{32}$$
Finally, the value of $\theta_k$ is restricted to the interval (−180, 180]:
$$\theta_k = \begin{cases} \theta_k, & \theta_k \le 180, \\ \theta_k - 360, & \theta_k > 180. \end{cases} \tag{33}$$
From Fig. 5 we can see that, by checking the values of $\theta_k$ and $\rho_k$, we can determine whether line $(\theta_o, \rho_o)$ passes through window $W_k$: $\theta_k$ must lie in the range (−90, 180), while $\rho_k$ must lie in the range $(0, \sqrt{2}S_w)$, where $S_w$ denotes the dimension of window $W_k$. For example, for window $W_4$, $\theta_4 \notin (-90, 180)$, which means that no point of window $W_4$ belongs to line $(\theta_o, \rho_o)$. In conclusion, Eqs. (28)–(33) determine the parameters of line $(\theta_o, \rho_o)$ with respect to the origin of each window $W_k$.

5.2. The WIHT algorithm
Fig. 5. The line parameters $(\rho, \theta)$ according to the origin of the windows.
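The window bookkeeping that the WIHT relies on, namely splitting the image into windows, re-expressing a line with Eqs. (28)–(33), and applying the range test on $\theta_k$ and $\rho_k$, can be sketched as follows. This is a minimal Python sketch under stated assumptions: angles are in degrees, coordinates start at 0 in the image's bottom-left corner with window origins at multiples of $S_w$, and `window_origins`, `window_line_params` and `lines_through_window` are hypothetical names. Instead of branching on the cases (30)–(32) and dividing by A, $\theta_k$ is chosen from the sign of $Ax_k + By_k + C$. This reproduces (31) and (32), except at the tie $\rho_k = 0$ where both angles describe the same line. For horizontal lines with $\theta_o = 90^\circ$ it assigns $\theta_k = \theta_o$ when $\rho_o \ge y_k$, i.e. when the line lies on or above the window origin.

```python
import math


def window_origins(n, s_w):
    """Bottom-left corners (x_k, y_k) of the (N / S_w)^2 windows of an N x N image.

    Assumes coordinates start at 0 in the image's bottom-left corner.
    """
    if n % s_w != 0:
        raise ValueError("S_w must be an integer divisor of N")
    steps = range(0, n, s_w)
    return [(x, y) for y in steps for x in steps]


def window_line_params(theta_o, rho_o, x_k, y_k):
    """Express line (theta_o, rho_o), angles in degrees, relative to window origin (x_k, y_k)."""
    t = math.radians(theta_o)
    a, b, c = math.cos(t), math.sin(t), -rho_o   # Eqs. (25)-(27)
    signed = a * x_k + b * y_k + c               # Eq. (28) without the absolute value
    rho_k = abs(signed)                          # A^2 + B^2 = 1, so no division needed
    # In window coordinates the line reads a*x' + b*y' = -signed, so the normal
    # keeps its direction when -signed >= 0 and flips by 180 degrees otherwise.
    theta_k = theta_o if signed <= 0 else theta_o + 180.0
    if theta_k > 180.0:                          # Eq. (33)
        theta_k -= 360.0
    return theta_k, rho_k


def lines_through_window(lines, origin, s_w):
    """Keep the lines whose window parameters pass the range test of Section 5.1."""
    kept = []
    for theta_o, rho_o in lines:
        theta_k, rho_k = window_line_params(theta_o, rho_o, *origin)
        if -90.0 < theta_k < 180.0 and 0.0 < rho_k < math.sqrt(2) * s_w:
            kept.append(((theta_o, rho_o), (theta_k, rho_k)))
    return kept


if __name__ == "__main__":
    # The line x = 30 in a 40 x 40 image split into four 20 x 20 windows.
    for origin in window_origins(40, 20):
        print(origin, lines_through_window([(0.0, 30.0)], origin, 20))
```

In this example only the windows with origins (20, 0) and (20, 20) keep the line x = 30, each with $(\theta_k, \rho_k) = (0, 10)$. A window whose origin coincides with the image origin leaves the parameters unchanged, as stated for $W_1$.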
To apply the WIHT algorithm, the N × N pixel image A must first be split into windows of size $S_w \times S_w$, where $S_w \in G$ and G is the set of the integer divisors of N. The total number of these windows is $W_{sum} = (N/S_w)^2$. For a window of size $S_w \times S_w$, the coefficients $sf_\theta$ and $sf_\rho$ are determined from the inversion conditions. Thus, we can calculate the HT of each window $W_k$, with $k = 1, 2, \ldots, W_{sum}$, filter the values of the accumulator array and, using the inversion process, extract only the pixels of window $W_k$ that satisfy the filter conditions. Unfortunately, the filter conditions refer to the whole image A. In other words, the filter limit values $(\theta_{min}, \theta_{max})$ and
$(\rho_{min}, \rho_{max})$, as well as the threshold value T, cannot be applied directly to the accumulator arrays that correspond to each window $W_k$. To solve this problem, the following two-phase procedure is introduced:
• the control phase, where the direct HT of every window $W_k$ is used to collect information about the pixel distribution in the whole image A, and
• the decomposition phase, where the IHT of each window $W_k$ is used to find the pixels of image A that satisfy the filter conditions.
These two phases are analyzed next.

Control phase

Step 1: Specify the filter parameters. These parameters are the range $[\theta_{min}, \theta_{max}]$ for $\theta_o$, the range $[\rho_{min}, \rho_{max}]$ for $\rho_o$, and the threshold value T.

Step 2: For every window $W_k$, calculate the direct HT. The scale coefficients $sf_\theta$ and $sf_\rho$ of the accumulator arrays $C_k$ are defined using the inversion conditions or can be taken from Table 1.

Step 3: For every line $(\theta_o, \rho_o)$ with
$$\theta_o \in [\theta_{min}, \theta_{max}] \tag{34}$$
and
$$\rho_o \in [\rho_{min}, \rho_{max}], \tag{35}$$
determine the values $(\theta_o^{(k)}, \rho_o^{(k)})$ from Eqs. (28)–(33). These values specify the line $(\theta_o, \rho_o)$ with respect to the point $(x_k, y_k)$, which is the origin of window $W_k$. To ensure that line $(\theta_o^{(k)}, \rho_o^{(k)})$ passes through window $W_k$, check whether $\theta_o^{(k)}$ and $\rho_o^{(k)}$ belong to the ranges (−90, 180) and $(0, \sqrt{2}S_w)$, respectively. If either of them is out of range, go to Step 3 and proceed with the next line $(\theta_o, \rho_o)$; otherwise, go to Step 4. If all lines have been checked, go to Step 2 and repeat the procedure with the next window $W_k$.

Step 4: Calculate $\tilde{\theta}_C$ and $\tilde{\rho}_C$ using $\theta_o^{(k)}$ and $\rho_o^{(k)}$. Let V denote the value of element $(\tilde{\theta}_C, \tilde{\rho}_C)$ of the accumulator array $C_k$. If this value is non-zero, then there exists at least one pixel in window $W_k$ that belongs to line $(\theta_o, \rho_o)$. So, if