1 Curve evolution and image processing
In this volume, we study some theoretical results and the numerical analysis of the motions of plane curves driven by a function of the curvature. If C is a smooth (say C²) curve, they are described by a partial differential equation (PDE) of the type

∂C/∂t = G(κ)N,   (1.1)
where κ and N are the curvature and the normal vector to the curve. This equation means that any point of the curve moves with a velocity which is a function of the curvature of the curve at this point. (See Fig. 1.1.)
Fig. 1.1. Motion of a curve by curvature. The arrows represent the velocity at some points. Here, the velocity is a nondecreasing function of the curvature
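Although the numerical schemes studied in this monograph come only later (Chap. 5 and 6), a naive discretization already conveys the meaning of (1.1). The following sketch (in Python with NumPy; entirely illustrative, all names are ours, and this is not the scheme advocated in Chap. 6) moves each vertex of a closed polygon along the discrete normal with speed G(κ):

```python
import numpy as np

def curvature_flow_step(points, G, dt):
    """One explicit Euler step of dC/dt = G(kappa) N for a closed polygon.

    points: (n, 2) vertices, listed counterclockwise.  G: velocity as a
    function of the signed discrete curvature (vectorized).  A naive
    illustration only, not the scheme studied in Chap. 6.
    """
    prev, nxt = np.roll(points, 1, axis=0), np.roll(points, -1, axis=0)
    d1, d2 = points - prev, nxt - points
    h1 = np.linalg.norm(d1, axis=1)
    h2 = np.linalg.norm(d2, axis=1)
    t1, t2 = d1 / h1[:, None], d2 / h2[:, None]
    # Discrete curvature: turning angle divided by the local arclength step.
    ang = np.arctan2(t1[:, 0] * t2[:, 1] - t1[:, 1] * t2[:, 0],
                     (t1 * t2).sum(axis=1))
    kappa = ang / (0.5 * (h1 + h2))
    # Inward normal of a counterclockwise curve: tangent rotated by +90 deg.
    t = t1 + t2
    t /= np.maximum(np.linalg.norm(t, axis=1), 1e-12)[:, None]
    N = np.stack([-t[:, 1], t[:, 0]], axis=1)
    return points + dt * G(kappa)[:, None] * N

# The mean curvature motion (1.5) below corresponds to G = lambda k: k.
```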
These equations appear in differential geometry because the curvature is the variation of the area functional for hypersurfaces (the length, for curves). In particular, the case G(κ) = κ can be considered as the gradient flow of the area (length for curves), playing an important role in the theory of minimal surfaces. These equations are also related to the description of crystal growth, where the velocity may also contain an anisotropic term depending on the normal vector. Generally speaking, curvature motions often appear in the motion of interfaces driven by an inner energy or tension, such as flame propagation, melting ice, or rolling stones. Surprisingly enough, the motions
by curvature have recently appeared in the field of image processing. More precisely, the theory developed in the core of this monograph aims at solving one of the steps that belongs to what has been called low-level vision. It appeared that any automatic interpretation of an image was impossible (or at least very difficult) to perform unless some preliminary operations are applied to the image. These operations are transformations which make the image easier to handle, or simplify it, in order to extract the most basic information more easily. The nature of this information is itself not so easily defined, and many researchers have tried to mimic human vision for computational purposes. How vision really works is still a controversial subject and, except in the next paragraph, we shall not enter into such considerations, but try to remain as practical as possible. Let us examine a bit more closely what an image analysis algorithm should intuitively do. The input of such an algorithm is an image taken by a camera. The output is some interpretation yielding an automatic decision. A commonly accepted method to attain this objective is to detect the objects that are present in the scene and to determine their position and possibly their movement. We could further try to determine the nature of these objects. Before the foundation of the Gestalt School in 1923 [168], it was believed that we detect objects because of the experience we have of them. On the contrary, Gestaltists proved by psychophysical experiments that, without a priori semantic knowledge, shapes are conspicuous as the result of the collaboration or inhibition of some geometrical laws [99, 98]. Even though the Gestalt laws are rather simple (and were nearly set in mathematical terms by the Gestaltists), their formulation in a computational language is more complex, because they are nonlocal and hierarchically organized. A plan for the computational detection of perceptual information was initiated by Attneave [15], then Lowe [114] and more recently by Desolneux, Moisan and Morel [50]. In fact, most widely used theories, such as edge detection or image segmentation, without strictly following a Gestaltist program, take some part of it into account, since they assume that shapes are homogeneous regions separated from one another and from the background by smooth and contrasted boundaries [26, 118, 133, 100], which is in agreement with some grouping Gestalt laws. These theories are often variational and can be formulated with elegant mathematical arguments. (We also refer to a recent book by Aubert and Kornprobst [16] presenting the mathematical substance of these theories.) Very recently, Desolneux, Moisan and Morel [49] developed a new algorithm for shape detection following the Gestalt principles. The advantage of this method is that the edges it finds are level lines of images, and consequently Jordan curves, which are the objects we shall deal with in the following. We assume (and believe!) that this detection program is realistic, but we do not deal with it here. On the other hand, this does not mean that we should consider the problem of shape extraction as completely elucidated! Nevertheless, as the topic of these lectures follows shape detection, we are obliged to take it for granted.
1.1 Shape recognition

Determining automatically the nature of a detected object (is it a man? a vehicle? what kind of vehicle? etc.) is achieved by placing it in some pre-established classification, which is the preliminary knowledge. Algorithms use more or less large databases that refine the classification, and try to compare the detected shapes with known ones. Shape recognition is this classification. Otherwise said, we have a collection of model patterns and we want to know which one the detected shape matches best. A simpler subproblem is to decide whether the observed shape matches a given model. This raises at least two questions:
1. what kind of representation do we take for a shape? (or what is our model of a shape?)
2. what kind of properties should a shape recognition algorithm satisfy?
In what follows, we only consider two-dimensional images. The answer to the first question shall be simple: a shape will be a subset of the plane. If the set is regular, it shall be useful to represent it by its boundary. If the set is bounded, its boundary is a closed curve. By the Theorem of Alexandrov 2.4, it is equivalent to know the set or its boundary. In order to answer the second question, let us follow David Marr in Vision [117]. “Object recognition demands a stable shape description that depends little, if at all, on the view point. This, in turn, means that the pieces and articulation of a shape need to be described not relative to the viewer but relative to a frame of reference based on the shape itself. This has the fascinating implication that a canonical coordinate frame must be set up within the object before its shape is described, and there seems to be no way of avoiding this.” A “canonical coordinate frame” is “a coordinate frame uniquely determined by the shape itself.” The description must be stable in the sense that it must be insensitive to noise. For instance, consider the shape given by the curve in Fig. 1.2(a). This curve was obtained by scanning a hand and then thresholding the grey level at a suitable value. One has no difficulty recognizing this shape immediately. However, from a computational point of view, this shape is very complicated. A quantitative measure of this complexity is that the curve has about 2000 inflexion points, most of which have no perceptual meaning! Let us now consider the shape in Fig. 1.2(b). This shape was obtained from the original one by smoothing it with an algorithm described later in these lectures. The tiny oscillations have disappeared, and the curve has only 12 inflexion points. In itself, this number has no absolute significance. However, this shape is intuitively better from a computational point of view, for three reasons:
1. it is very close to the first one;
2. it is smoother;
3. Fig. 1.2(b) is a good sketch of a hand in the sense that it cannot be much simplified without changing the interpretation. As a parallel, Attneave [15] showed a sketch of a cat containing only a few (carefully chosen) lines, which were sufficient to guess what the drawing was. This means that, up to some point, a shape can be considerably simplified without altering our recognition. In a sense, Fig. 1.2(b) is closer than Fig. 1.2(a) to the minimal description of a hand.
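The inflexion point counts quoted above can be reproduced, for a curve given as a polygon, by counting the sign changes of the discrete curvature along it. Here is a minimal sketch (Python/NumPy, our own illustration; the count is deliberately crude and sensitive to sampling):

```python
import numpy as np

def count_inflexions(points):
    """Number of sign changes of the discrete curvature on a closed polygon.

    Crude proxy for the inflexion counts quoted in the text: on a noisy
    curve almost every vertex produces a sign change.
    """
    prev, nxt = np.roll(points, 1, axis=0), np.roll(points, -1, axis=0)
    d1, d2 = points - prev, nxt - points
    # The cross product of consecutive edges has the sign of the curvature.
    s = np.sign(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0])
    s = s[s != 0]  # drop exactly straight vertices
    return int(np.count_nonzero(s != np.roll(s, 1)))
```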
(a) Original shape
(b) Smoothed shape
Fig. 1.2. The representation of a shape for recognition must be as simple and stable as possible. Both shapes represent the same object at different levels of detail. For a recognition task, most of the details in Fig. 1.2(a) are spurious. The shape in Fig. 1.2(b) is intuitively much simpler and visually contains the same information as the noisy one
What about the “canonical coordinate frame”? Mathematicians will call such a frame intrinsic. It is not very complicated to imagine such a frame. For instance, the origin may be taken as the center of mass of the shape. Then, the principal directions given by the second-order moments provide some intrinsic directions. These have the nice additional property of being invariant with respect to rotations. However, if we think of a circle, this intrinsic frame is not uniquely defined. This does not matter, since each frame gives the same description; but if we now think of a noisy circle, then the orientation of the frame vectors may dramatically change, contradicting the stability hypothesis.
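For concreteness, here is a minimal sketch of the moment-based frame just described (Python/NumPy, our own illustrative code): the origin is the center of mass and the axes are the eigenvectors of the second-order moment matrix. As noted above, the axes become unstable when the two eigenvalues nearly coincide, as for a noisy circle.

```python
import numpy as np

def intrinsic_frame(points):
    """Center of mass and principal directions of a sampled shape.

    Returns the origin and two orthonormal axes (principal direction first).
    Translating or rotating the input moves the frame accordingly, but for
    a nearly isotropic shape the axes are unstable, as discussed above.
    """
    origin = points.mean(axis=0)
    centered = points - origin
    # Second-order moment (covariance) matrix, 2 x 2 and symmetric.
    M = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return origin, eigvecs[:, ::-1]
```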
1.1.1 Axioms for shape recognition...

Marr did not give any precise practical algorithms for shape representation and recognition but, in a sense, he initiated an axiomatic approach, which has been pursued by many people. (A recent review with axiomatic arguments is presented by Veltkamp and Hagedoorn [164] for the shape matching problem.) Most people agree that the matching problem is equivalent to finding distances between shapes. (See for example the works of Trouvé and Younes [159, 170].)

Intrinsic distance

First, the distance between objects should be independent of the way we describe them. If an object is a set of pixels, it seems clear that a distance taking an arbitrary order of the points into account is not suitable. In the same way, for curve matching, the parameterization should not influence the matching, which should only depend on the geometry of the curves.

Invariance

Another property shall also be a cornerstone of our theory: invariance. Marr understood that the recognition should not depend on the particular position of the viewer. The mathematical formulation of invariance is a well-known technique, giving extremely important results in many fields such as theoretical physics or mechanics. We consider a set of transformations (homeomorphisms), which in general has a group structure. This group models the modifications of the shape when the viewer moves. In a three-dimensional world, images are obtained by a projection and such a group does not exist: we cannot retrieve hidden parts of objects by simply deforming the image taken by the camera. However, for “far enough” objects and “small displacements”, projective transformations are a good model to describe the modification of the silhouette of the objects, since they correspond to the change of vanishing points in a perspective view. If we add some additional hypotheses on the position of the viewer, we can even consider a subgroup of the projective group. If G is the admissible group of deformations, and d a pseudo-distance between shapes, the invariance property can be formulated by

∀g ∈ G,  d(gA, B) = f(g) d(A, B),   (1.2)
where gA represents the shape A deformed by g, and f : G → R does not depend on A and B. Notice that d is only a pseudo-distance, since d(gA, A) = f(g)d(A, A) = 0. From a mathematical point of view, it is natural to define shapes modulo a transformation, which amounts to defining a distance between the orbits of the shapes under the group action. In this way, a true distance is retrieved instead of a pseudo-distance.
Stability

The stability may be thought of as noise insensitivity. In [164], it is formulated by a set of four properties. The first one is strongly related to invariance, and we do not go further. The last three may be interpreted as follows: we modify a shape by some process and we compute the distance between the original shape and the new one. This distance should be small in the following cases:
1. blurring: we add some parts, possibly important, but close to the shape;
2. occlusion: we hide a small part of a shape (possibly changing its topology);
3. noise addition: we add small parts, possibly far from the shape.

Simplicity

This last property is not an axiom properly speaking, since it is not related to the recognition itself. However, we believe that an algorithm will be all the more efficient and fast if it manipulates a small amount of data. Intuitively, it is certainly easier to describe the curve of Fig. 1.2(b) than the one of Fig. 1.2(a). For instance, we could think of keeping, as a sketch of the curve, the polygon linking the points of maximal curvature. On the noisy curve, nearly all the points are maxima of curvature and the sketch is as complex as the original curve.

1.1.2 ... and their consequences

What can be deduced from the heuristics above? First, in order to get insensitivity to noise, it seems natural to smooth the shapes. Naturally, we then face the problem: what kind of oscillation can be labelled as noise, and what contains real information? There is no absolute answer to this question, but it shall only depend on a single parameter called scale, representing the typical size of what will be considered as noise, or the distance at which we observe the shape. Since we cannot choose this scale a priori, smoothing will be multiscale and shape recognition will make sense at each scale. Since the recognition must resist occlusion, it should, at least partially, rely on local features. This is another argument for smoothing, since local features are sensitive to noise. For instance, inflexion points and maxima of curvature are commonly used. Since they are defined from second derivatives, it is clear that they are totally unreliable on a noisy curve such as the one in Fig. 1.2(a).
1.2 Curve smoothing

We now admit the principle that shape recognition is made possible by a multiscale smoothing process removing the noise at each scale. It seems that, by adding this step, we have complicated the problem. Indeed, we do not know what kind of smoothing we have to choose; the only objective we have is to make local features reliable. It is also obvious that the smoothing has to be compatible with all the
assumptions we made on the recognition task (invariance, stability, simplicity). The remainder of these lectures gives possible ways to smooth curves and a possible implementation of the underlying equations. Once and for all, we do not pretend to give the method for shape recognition. We do not even give any recognition algorithm, since we only focus on the low-level vision preprocessing. As Guichard and Morel [81], we simply assert that if a smoothing has to be done, then it is sound that it satisfies some requirements, and in this case, we give the corresponding mathematical models. The conclusions are only valid because of the acceptance of a model that can always be criticized or discussed¹.

1.2.1 The linear curve scale space

Before studying in full detail the axioms that curve smoothing should satisfy, we first briefly examine the simplest way of regularizing curves we can think of. This attempt will be a failure, but it will help us to better understand the final approach. It is well known that a way to regularize a function is to compute its convolution with a smoothing kernel. If the kernel is taken to be a Gaussian with a suitably chosen variance, then the function is a solution of the heat equation, which is the archetype of smoothing equations². In our case, consider the curve C given by its coordinates (x(p), y(p)). For a closed curve, p is taken on the unit circle S¹ endowed with its usual Riemannian structure, so that we can differentiate with respect to p. We then solve the one-dimensional heat equation for each coordinate, that is

∂x/∂t = ∂²x/∂p²  and  ∂y/∂t = ∂²y/∂p².   (1.3)
Note that the parameterization is given once and for all at time t = 0. It is classical that, for any t > 0, the coordinates x(p, t) and y(p, t) are C∞ functions of p. Moreover, it is quite easy to find stable numerical schemes to solve (1.3). However, the smoothness of each coordinate does not imply that the curve is smooth in the sense of Def. 2.10. Numerical evidence is shown in Fig. 1.3 below, where the curve develops self-intersections and cusps. Moreover, this flow is not intrinsic: if we change the parameterization and solve the heat equation, we obtain another family of curves. We can try to parameterize the original curve by its length parameter. This is actually what was done in the experiment. This is not satisfying though. Indeed, the curve may still develop singularities (which is not expected from a smoothing process!) and the length parameter at initial time does not remain the length parameter during the evolution.
¹ This remark is not at all pessimistic. First, this model is more than widely accepted by the image processing community and has made it possible to obtain many results. Second, we hope the analysis may also be interesting from a mathematical point of view.
² The result is asymptotically true for any symmetric kernel, in the sense that, if we iterate the convolution and let the variance of the kernel go to 0, then the limit is a solution of the heat equation. This makes the choice of the smoothing kernel irrelevant [81]. Let us point out that the kernel does not even need to be smooth!
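For the reader who wishes to reproduce the experiment of Fig. 1.3, here is one possible implementation of the linear scale space (our own illustrative sketch in Python/NumPy): solving (1.3) on S¹ amounts to damping the k-th Fourier coefficient of each coordinate by exp(−k²t), which is exactly the periodic heat semigroup, i.e. a convolution with a periodic Gaussian.

```python
import numpy as np

def linear_scale_space(points, t):
    """Solve (1.3) up to time t for a closed curve sampled at n points.

    The parameter p lives on [0, 2*pi); the Fourier coefficient of integer
    frequency k is damped by exp(-k**2 * t), i.e. each coordinate is
    convolved with a periodic Gaussian.  Illustrative code only.
    """
    n = len(points)
    z = points[:, 0] + 1j * points[:, 1]      # coordinates as x + i*y
    k = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies on S^1
    zt = np.fft.ifft(np.fft.fft(z) * np.exp(-k**2 * t))
    return np.stack([zt.real, zt.imag], axis=1)
```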
Fig. 1.3. Classical heat equation of a closed embedded curve. The initial curve is on the top left. The evolution by the classical convolution of the coordinates with a Gaussian is displayed for five different scales, from top to bottom and left to right. The first scale, on the top right, is still fine. The curve becomes close, then tangent to itself on the middle left, and creates a self-intersection on the middle right. The curve is then no longer embedded but remains locally smooth. On the bottom left, then right, this smoothness is lost as a cusp appears
As a consequence, it is not equivalent to smooth the curve up to a scale T, or to smooth it up to t₀ < T, renormalize the curve, and finally smooth this last one up to scale T − t₀. The final result depends on t₀. The conclusion is that this smoothing algorithm is not suitable, because it violates the stability and intrinsicness principles of shape recognition.

1.2.2 Towards an intrinsic heat equation

In the previous section, we saw that the heat equation was not suitable because it is not an intrinsic evolution. Even if the initial parameterization of the curve is intrinsic, it does not remain so for any positive time. Following Mokhtarian and Mackworth [127], we can then think of renormalizing the curve at each time. On the one hand, we obtain an intrinsic evolution of the curve; on the other hand, the time and space parameters are no longer independent. More precisely, we denote by s₀ the length parameter of the curve at time t = 0. We start solving the classical
heat equation with this parameter s₀ up to some small time h > 0 and obtain a curve Cₕ. We stop the evolution and parameterize the new curve by computing its length parameter sₕ. We then solve the classical heat equation for the scales between h and 2h, and iterate the process while it is possible. If we now let h (the time interval between two renormalizations) tend to 0, we heuristically obtain the equation

∂x/∂t = ∂²x/∂s²  and  ∂y/∂t = ∂²y/∂s²,   (1.4)

which is called the intrinsic heat equation. In condensed form it is written

∂C/∂t = ∂²C/∂s²,

or, by replacing the right-hand term by its expression with the curvature (see Def. 2.14),

∂C/∂t = κN.   (1.5)
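The renormalization scheme just described is easy to mimic numerically. The following sketch (Python/NumPy, purely illustrative and with no claim of accuracy; the provably convergent scheme is the topic of Chap. 6) alternates one explicit heat step with an arclength resampling:

```python
import numpy as np

def resample_by_arclength(points):
    """Resample a closed polygon at vertices equally spaced in arclength."""
    closed = np.vstack([points, points[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    u = np.linspace(0.0, s[-1], len(points), endpoint=False)
    return np.stack([np.interp(u, s, closed[:, 0]),
                     np.interp(u, s, closed[:, 1])], axis=1)

def intrinsic_heat_flow(points, t, h):
    """Renormalization scheme: explicit heat step, then arclength resampling.

    As h -> 0 this heuristically approximates (1.5).  Stability of the
    explicit step requires h to be small compared with ds**2 / 2.
    """
    for _ in range(int(round(t / h))):
        points = resample_by_arclength(points)
        ds = np.linalg.norm(points - np.roll(points, 1, axis=0),
                            axis=1).mean()    # uniform arclength step
        lap = (np.roll(points, -1, axis=0) - 2 * points
               + np.roll(points, 1, axis=0))  # second difference, C_ss
        points = points + (h / ds**2) * lap
    return points
```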
Contrary to what one might think at first sight when looking at (1.4), this equation is nonlinear, since the length parameter depends on the scale t. This equation is also known under the name mean curvature motion, since its interpretation is that any point of the curve at time t moves in the normal direction with a velocity equal to the curvature. If we go a bit further in the examination of the equation, we see that a curve moves inward where it is locally convex and outward where it is locally concave³. Hollow parts are progressively filled while bumps are worn out. Thus, it seems that the general behavior is that the curve tends to become more and more convex. For a convex curve, any point moves inward and the curve can only shrink, at least while it stays convex. In order to compare with the classical heat equation, we display the mean curvature motion of the same shape in Fig. 1.4. What can rigorously be said of the existence, the uniqueness and the regularity of a solution? (Recall that we look for a smoothing process.) All of the questions above were studied and eventually answered in two celebrated papers by Gage and Hamilton [69] and Grayson [77]. A part of Chap. 3 will be dedicated to their results.

Remark 1.1. In view of Fig. 1.4, we may wonder about the point of shortening the curve until it disappears. The asymptotic shape has nothing to do with the initial one. Gage and Hamilton even proved that the isoperimetric constant of the curve tends to that of a circle, implying that it becomes rounder and rounder. This, in general, may seem a bit disturbing if we understand filtering as a denoising or restoration process whose purpose is to remove spurious parts of a shape. This is not the aim of shape analysis.
³ We shall say that a curve is locally convex (resp. concave) in the neighborhood of some point if its curvature is nonnegative (resp. nonpositive) around this point. The definition is thus orientation dependent. Alternatively, if the curve is the boundary of some set K, we say that it is convex near some point x if K ∩ B is convex for some small ball B centered at x, and concave if the complement of K is locally convex.
Fig. 1.4. Motion of the same curve as in Fig. 1.3, obtained by using the intrinsic heat equation. The numerical scheme is presented in Chap. 6. The curve is smooth at any time, becomes convex and eventually disappears
Indeed, the scale parameter can be interpreted as the distance from which the shape is seen. Since this distance is arbitrary, all scales have to be taken into consideration, and this is the reason why a multiscale analysis seems adequate. Therefore, the fact that the shape always becomes a circle is an advantage. Indeed, this means that we have gradually extracted all the geometric information; we can be sure of this because we have attained the final state, which is the same for any shape. When the shape disappears, no information is left in it. Moreover, we do not create singularities in the filtering, which could be mistaken for important cues. In the very same paper where they introduced the renormalization scheme, Mokhtarian and Mackworth [127] gave a multiscale shape analysis algorithm that uses all the scales, whose principle is to follow critical points of the curvature through all the scales.
1.3 An axiomatic approach of curve evolution

1.3.1 Basic requirements

The mean curvature motion seems to define a nice smoothing process for curves. It is intrinsic and, experimentally, smoothes a curve, which eventually becomes a “round point”. Nevertheless, we found this equation a little bit by chance, by renormalizing
the classical heat equation at each time. A more satisfying approach is to directly examine all the criteria that the smoothing should respect. We already know some of them, but we can complete the list as follows.
1. The smoothing should be causal: the information is contained in the initial shape and the filtering can only remove some details. In particular, if we choose to look at the shape at different scales, then the shape viewed at a large scale should be deducible from the same shape at any smaller scale.
2. The smoothing should be intrinsic: describing a shape or a curve by a function must not make us forget that we are interested in the geometrical object itself and not in the way it is described. In particular, when dealing with curves, the smoothing should not depend on any parameterization.
3. The smoothing should be local: because of occlusions, we are often unable to match complete shapes but only parts of shapes corresponding to the apparent parts. It is sound to assume that the existence of hidden parts does not dramatically modify the smoothing of the visible parts.
4. The smoothing should be invariant with respect to some geometric transformations: we mean that the position of the observed object, or the position of the observer, may have changed between two views of the same object. This should not prevent an efficient algorithm from recognizing the object. As a consequence, the smoothing itself should provide the same information whatever the relative position of the observer and the observed object. One way to obtain this is to choose a smoothing process which is invariant with respect to the class of all admissible motions. Of course, these motions must be part of the model.
5. The smoothing should be stable: two shapes that are close to each other (in a sense deliberately left vague) should not diverge. A link with the locality assumption is that, if a shape locally contains another one, then this remains true at small smoothing scales.
The problem is now to translate these qualitative requirements into well-posed mathematical terms: first, are they compatible or not? What kind of mathematical models can be derived? What can be said about the properties and well-posedness of these models? How can they be implemented?

1.3.2 First conclusions and first models

Let us first check the causality assumption. We suppose that the smoothing is achieved by an operator Tt depending on the observation scale t. The causality can be formulated as follows: if s < t, then Tt can be obtained from Ts by an operator Ts,t passing from scale s to scale t. This is related to a stronger but very usual assumption, the semi-group property, telling that Ts,t = Tt−s, meaning that the filtering process is stationary. Otherwise said, for any s and t, we have Ts+t = Tt ◦ Ts and T0 = Id, the identity operator. Now, if we compare the smoothing between scales t and t + h for a small h > 0, that is, we compute Tt+h − Tt, we can use the semi-group property to derive
Tt+h − Tt = (Th − Id) ◦ Tt.

If the evolution itself is smooth then, up to a renormalization of the scale, there exists an operator F such that Th − Id = hF + o(h), F being called the infinitesimal generator of the semi-group. Thus, we shall look for smoothing processes such that the motion of a curve C is described by an equation of the type

∂C/∂t = F(C),   (1.6)
where F is some vector-valued function depending on C and its spatial derivatives. Each point of the curve has a velocity equal to F. Both heat equations (classical and intrinsic) presented above fit this model. But if, in addition, we require the smoothing to be intrinsic, then F should only depend on intrinsic quantities. This eliminates the classical heat equation, since, whatever the initial parameter may be, it is not intrinsic at positive scales. The locality implies that the velocity function F only depends on local features of the curve at scale t. For instance, this rejects the total length of the curve, which cannot be computed without knowing the whole curve. Thus, we look for a function F depending on intrinsic differential characteristics. These characteristics cannot be chosen arbitrarily. Indeed, they must be invariant with respect to some transformations, and the way to find them is developed in Chap. 3. For instance, if we want the smoothing to be invariant with respect to isometries (translations and rotations), then we can choose the velocity F to be a function of the curvature κ. Since the length parameter is also invariant under isometries, the derivatives of κ with respect to s are invariant as well, and we obtain a whole family of equations, namely

∂C/∂t = F(κ, ∂κ/∂s, …, ∂ⁿκ/∂sⁿ).   (1.7)

In Chap. 3, we shall see indeed that those equations are suitable, but also that any suitable equation is of this type. Indeed, the theory developed by Olver, Sapiro and Tannenbaum [140, 142, 141] will allow us to classify all the intrinsic invariant equations. We have not examined the stability property yet. In fact, the invariance approach of Olver et al. gives no such result, and each equation has to be studied individually. Mathematically, a local stability principle is equivalent to a local maximum principle. It is known that some of the equations of type (1.7) do not satisfy the maximum principle [64, 74]. On the other hand, if the velocity is an increasing function of the curvature alone, then (1.7) is a parabolic equation and we can hope that a maximum principle holds. In the following sections, we shall see that the problem of curve evolution may be considered as generic, since other points of view and purposes make such equations appear naturally.
1.4 Image and contour smoothing

A completely different point of view was adopted by Alvarez, Guichard, Lions and Morel [4], for whom stability is a primordial axiom. The approach is to apply a smoothing before the shape detection. More precisely, the image smoothing is such that we can apply the smoothing and the detection in any order. As a consequence, the smoothing must be compatible with an axiom of shape conservation. For instance, it is well known that a Gaussian blurring makes an image smoother, but too much visual information is lost. Indeed, the contours are less and less marked and we can hardly see the edges of objects. It is then unrealistic to automatically and precisely detect the shapes. In [31], Caselles et al. argue that edges coincide with parts of level lines of images. Experimentally, this can be checked by thresholding the grey level of an image at different values. It is striking how a few grey levels allow one to retrieve a large part of the objects. A consequence of this principle is that the smoothing must preserve level lines as much as possible. This can be formalized in terms of invariance with respect to contrast changes. (See [81] and Chap. 4 in this volume.) Alvarez et al. then prove that if u : R² → R is a grey level image, the suitable smoothing equations are of the type

∂u/∂t = |Du| G(curv u),   (1.8)
where G is a nondecreasing function and curv u is a second-order differential operator corresponding to the curvature of the level lines of u. This equation means that the level lines move with a normal velocity equal to a function of their curvature, that is, an equation of type (1.7), a motion of the classification of Olver et al.! We shall see in Chap. 4 why this is not a coincidence.
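To make (1.8) concrete, here is a rough explicit finite-difference step (Python/NumPy; our own illustrative sketch, which ignores the viscosity-solution machinery of Chap. 4 and takes no particular care where Du vanishes and the equation degenerates):

```python
import numpy as np

def geometric_pde_step(u, G, dt):
    """One explicit step of du/dt = |Du| G(curv u), equation (1.8).

    curv u = div(Du/|Du|) is discretized with central differences as
    (u_xx*u_y**2 - 2*u_x*u_y*u_xy + u_yy*u_x**2) / |Du|**3.  Periodic
    boundaries via np.roll; illustrative only.
    """
    ux = 0.5 * (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1))
    uy = 0.5 * (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0))
    uxx = np.roll(u, -1, axis=1) - 2 * u + np.roll(u, 1, axis=1)
    uyy = np.roll(u, -1, axis=0) - 2 * u + np.roll(u, 1, axis=0)
    uxy = 0.25 * (np.roll(np.roll(u, -1, 0), -1, 1)
                  - np.roll(np.roll(u, -1, 0), 1, 1)
                  - np.roll(np.roll(u, 1, 0), -1, 1)
                  + np.roll(np.roll(u, 1, 0), 1, 1))
    grad2 = ux**2 + uy**2
    curv = (uxx * uy**2 - 2 * ux * uy * uxy + uyy * ux**2) \
           / np.maximum(grad2, 1e-12) ** 1.5
    # Mean curvature motion of the level lines: G = lambda k: k.
    return u + dt * np.sqrt(grad2) * G(curv)
```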
1.5 Applications

In this section, we give three examples of applications of shape simplification in image processing. Of course, the list is not exhaustive at all and only aims at illustrating the use of geometric motions.

1.5.1 Active contours

Active contour algorithms aim at finding the boundary of an object, by assuming that it is smooth and that the object is contrasted with respect to the background. (Contrast can be understood in a general sense, since it can mean that the grey level, the color, the texture, etc. are different in the object and in the background.) The original idea by Kass, Witkin and Terzopoulos [100] is to initialize a curve in an image and let this curve move until it adapts to the contour of a searched object. The motion of the curve is driven by the image itself. The driving force is obtained by defining a potential in the image that shall be small near object contours. Following Marr's paradigm [117], contours coincide with irregular parts of the image. If u is
a grey level image, the external potential is defined at each point of the image and is of the type

g(x) = 1 / (1 + |D(Gσ ∗ u)(x)|²),

where Gσ is a Gaussian with standard deviation σ. The term appearing in the denominator is the gradient of a regularized version of the image. (The star denotes a convolution.) Let C be a curve parameterized by p ∈ [0, 1]. The external potential yields an external energy on a curve C, defined by

Eext(C) = ∫₀¹ g(C(p)) dp.
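As an illustration, the potential g is straightforward to compute on a digital image. The following sketch (Python, our own code; we assume SciPy's gaussian_filter for the convolution with Gσ) is a direct transcription of the formula above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_stopping_function(u, sigma):
    """g = 1 / (1 + |D(G_sigma * u)|^2): small on contrasted contours.

    gaussian_filter plays the role of the convolution with G_sigma; the
    gradient is taken on the regularized image, as in the text.
    """
    us = gaussian_filter(np.asarray(u, dtype=float), sigma)
    gy, gx = np.gradient(us)
    return 1.0 / (1.0 + gx**2 + gy**2)
```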
Minimizing such a potential does not ensure that we obtain a curve, since a minimizing sequence may degenerate into something very sparse. The idea is then to add some internal energy terms. This internal energy aims at controlling the rigidity of the curve. The initial model basically contained the term

Eint(C) = α ∫₀¹ |C′(p)|² dp + β ∫₀¹ |C″(p)|² dp,

with α, β > 0. Finally, the energy to minimize takes the inner and outer forces into account and is

E(C) = Eint(C) + λEext(C).

An immediate objection to this model is that it is not intrinsic, since it depends on a particular parameterization. Instead of this energy, we then consider

E(C) = ∫_C g(C(s)) ds,
where ds is the Euclidean length element of the curve. We do not add any other term: the length is already penalized, and the external potential is also taken into account. It is interesting to compute the first variation of this energy. This shall give a necessary condition for a curve to be a minimizer, and the gradient descent can provide a numerical implementation. (We do not discuss its convergence at all.)

Exercise 1.2. Let p ∈ S¹ be an extrinsic parameterization of C and let δC₁ be a small variation of C, also parameterized by p.
1. By using the parameter p, prove that

E(C + δC₁) − E(C) = ∫_C ( δC₁(p) · Dg(C(p)) |Cp| + g(C(p)) (δC₁)p · Cp / |Cp| ) dp + o(δ).
2. By integrating by parts and using the length parameter, prove that a minimizing curve must satisfy

(Dg · N − gκ)N = 0,

and that the “gradient flow” of the energy E is

∂C/∂t = (gκ − (Dg · N))N.   (1.9)
This exercise shows that a motion by curvature appears, and it would be interesting to know how to solve such an equation. However, the solution may degenerate, since changes of topology are likely to occur. A solution proposed by Caselles, Catté, Coll and Dibos [30] is to use a scalar formulation based on level sets methods. In Chap. 4, we shall see that their method consists in solving the evolution equation

∂v/∂t = g(x)|Dv| curv v − Dg · Dv,  where |Dv| curv v = Δv − D²v(Dv, Dv)/|Dv|².

The minimization of the energy E above was proposed by Caselles, Kimmel and Sapiro [32], who explained the link between this approach and curvature motions of curves on non-flat surfaces. Mathematical results for such motions by a direct approach were found by Angenent [12, 13].

1.5.2 Principles of a shape recognition algorithm

Recent shape recognition algorithms try to take the invariance principles into account. Most of them are limited to affine invariance. Indeed, projective invariant algorithms are in general more difficult to implement. Moreover, projective invariants are numerically less stable, because they are of higher order, and in practice, affine invariance is often sufficient. In this paragraph, we briefly describe the first affine invariant shape recognition algorithm using the affine invariant smoothing. It was achieved by Cohignac [40, 41], who defined characteristic points of shapes. They basically correspond to inflexion points and extrema of curvature, while being more stable. More precisely, if C₀ is a closed Jordan curve, we denote by Ct the curve obtained by affine invariant smoothing (see Chap. 3) at scale t. We identify Ct and its interior. Now, for a > 1, we examine the parts of Ct/a which are enclosed between the curve and the tangent at some point of Ct, as in Fig. 1.5. The enclosed area is signed, according to the local inclusion relation between Ct/a and Ct. For instance, at x₁ the area is positive, negative at x₂, and equal to 0 at x₃. Cohignac calls characteristic points the points of Ct for which this area is extremal or zero crossing (that is, where Ct/a and Ct cross). The characteristic regions are the regions where the area is extremal. The characteristic points provide an affine invariant collection of points which is used as a coarse representation of C for recognition. Since they are determined from area computations, they are more stable than simple curvature estimates. Other affine invariant characteristics are provided by the barycenters of the characteristic regions. If we want to match two shapes C and C′, we try to find the best affine transform that maps the characteristic points of C to the points of C′.
Fig. 1.5. Cohignac's shape recognition algorithm. The dashed curve is the smoothed curve Ct/a at scale t/a (a > 1) and the solid one is the curve Ct at scale t; the points x₁, x₂, x₃ are as described in the text. Characteristic points are the points where the enclosed area is extremal or equal to 0. Matching two shapes is achieved by matching their sets of characteristic points
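The elementary step of the matching described below, recovering the affine map determined by three point correspondences, reduces to a 3 × 3 linear system. A minimal sketch (Python/NumPy, our own illustration):

```python
import numpy as np

def affine_from_triples(src, dst):
    """Affine map x -> A x + b sending the three points src[i] to dst[i].

    One candidate transform per pair of characteristic-point triples; it
    can then be scored on the remaining characteristic points.
    """
    src = np.asarray(src, dtype=float)        # shape (3, 2)
    dst = np.asarray(dst, dtype=float)        # shape (3, 2)
    X = np.hstack([src, np.ones((3, 1))])     # rows [x, y, 1]
    M = np.linalg.solve(X, dst)               # solves X @ M = dst
    return M[:2].T, M[2]                      # A (2x2) and b (2,)
```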
An affine mapping is completely determined by the images of three points. For any triple of characteristic points of C, we compute the coordinates of all the other characteristic points in the barycentric frame. We then compare these coordinates when the frame triple is mapped onto a triple of characteristic points of C′. The procedure may be made faster by applying some learning procedure, such as a hashing procedure. More recently, Lisani [111], with Moisan, Monasse and Morel [112], also used the affine invariant curve smoothing before applying some affine invariant feature recognition. The principle of the algorithm is to create a dictionary containing pieces of curves. These are chosen in a very stable and affine invariant way, then normalized in a reference frame. Then, a given curve is split into pieces which are likely to match codes in the dictionary. Comparisons are made in the reference frame, which makes the algorithm affine invariant up to computational errors.

1.5.3 Optical character recognition

A particular application for which curve smoothing has turned out to be useful is optical character recognition (OCR). In this paragraph, we only show an experiment motivating efficient curve smoothing algorithms. Automatic character recognition is extremely difficult for hand-written documents, and the field is wide open. We are here interested in the easier case of typed characters. Even in this case, the same character may take many different appearances, for instance because of the choice of different fonts. For bad quality documents, differences also arise from noise. In Fig. 1.6, we display a few words taken from a fax, where the words “papers” are taken from different places. Note that this is directly the transmitted fax, not a scan of the printed fax. As a consequence, the image is directly a binary image (which could have been different after a scan). The letters
Fig. 1.6. Upper row: some words taken directly from a fax. Remark how the letters may have different topologies. Middle row: curves smoothed by affine invariant smoothing. Bottom row: affine invariant matching. There are 29 matching pieces of curves, 4 of which are false matches
are different in details, and a smoothing is necessary before applying a matching procedure [111, 112, 134].
1.6 Organization of the volume

Chapter 2 is a short introduction to the geometry of plane curves. We shall introduce the notations used throughout these notes. Chapter 3 is dedicated to the search for all the intrinsic curve evolution equations. The approach is divided into three steps. We first introduce the tools of differential geometry needed to formulate rigorously the problem of finding invariant evolution equations. We then derive these equations in a systematic way. Finally, existence, uniqueness and properties of these equations are presented for the simplest cases. Concerning results about existence and uniqueness, many authors contributed to the advance of the theory, and we shall recall the main known results for the motions by curvature. All these results are given without proof, since the proofs are long and technical and do not fit the purpose of these notes. (Of course, we shall give references for the interested reader.) We shall concentrate on the axiomatic approach that allows one to derive the invariant equations, by developing a short theory of differential invariants. Different authors have then tried to introduce a weak notion of solution for the curve shortening problem. Chapter 4 presents the level sets approach. We first expose the connection between the curve evolution approach and the level sets one by using some results on monotone operators, and see that some particular PDEs (the so-called
geometric PDEs) naturally appear in the analysis. We then give a self-contained exposition of the theory of curvature motion by the level sets method, with existence and uniqueness results for viscosity solutions. In Chap. 5, we briefly discuss the classical existing algorithms for curve evolution. Finally, we present in Chap. 6 a geometric algorithm for curve evolution. This algorithm allows one to solve the curvature motion when the velocity is a power of the curvature larger than 1/3. We first propose a theoretical scheme which is unconditionally stable, consistent and convergent in the sense of level sets. We shall then give a possible implementation of this scheme. Finally, we end with many numerical experiments in which we check the invariance and stability properties of the proposed numerical scheme.
1.7 Bibliographical notes

It is certainly quite difficult to know when the first ideas of computer vision appeared. A computational program was already proposed by Attneave [15], but a commonly adopted reference is Marr's Vision [117], where he introduced the concept of raw primal sketch, which considerably influenced the research in computer vision. Montanari [130] had previously studied the problem of line detection in a noisy context. Shape extraction was launched with the edge detection doctrine by Marr and Hildreth [118], followed by hundreds of papers, among which Canny's is one of the most famous [26]. There are basically two other classes of sketch extraction, according to whether contours or regions are the objects of interest. Active contours were introduced by Kass, Witkin and Terzopoulos [100], and later improved by the use of level sets techniques (see Chap. 4) by Caselles, Catté, Coll and Dibos [30] and Caselles, Kimmel and Sapiro [32]. Equation (1.9) comes from [32]. In image segmentation, edges are the boundaries of the segmented areas, in which a property is homogeneous. A general model is Mumford and Shah's [133], with mathematical developments in Morel and Solimini's book [131], and even more recently in a book by Ambrosio, Fusco and Pallara [8]. All of these methods implicitly follow some of the principles of the Gestalt school [99, 105, 168], since they assume that perceptual contours are smooth and delimit contrasted zones. More recently, an approach based on a level set decomposition was proposed by Desolneux, Moisan and Morel in [49]. A smoothing method for extracted shapes was proposed by Koenderink and Van Doorn [103], consisting in solving the heat equation for the characteristic function of the shape. The lack of locality and causality was tackled by Bence, Merriman and Osher (see [22] and Chap. 4 and 5). For a direct curve approach, the renormalization of Sect. 1.2.2 was proposed by Mokhtarian and Mackworth [127], together with a multiscale matching algorithm. An axiomatic approach for curve evolution was proposed by Lopez and Morel [113], and Sect. 1.3 is a simplified version of their work. The works by Olver, Sapiro and Tannenbaum [141, 142] on the classification of invariant flows will be described in Chap. 3. Figure 1.2 was obtained with Moisan's affine erosion algorithm [104, 125, 126]. Shape matching is
another huge subject, with an overwhelming bibliography. The axiomatic approach we gave in Sect. 1.1.1 can basically be found in Veltkamp and Hagedoorn's recent review [164]. The first matching method using the affine morphological scale space (see [4] and Chap. 4) is Cohignac's [40, 41] and is described in Sect. 1.5.2, while the experiments of Sect. 1.5.3 are due to an algorithm by Lisani [111, 112], recently improved by Musé, Sur and Morel [135].