Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5567
Xue-Cheng Tai Knut Mørken Marius Lysaker Knut-Andreas Lie (Eds.)
Scale Space and Variational Methods in Computer Vision Second International Conference, SSVM 2009 Voss, Norway, June 1-5, 2009 Proceedings
Volume Editors Xue-Cheng Tai Department of Mathematics University of Bergen, Norway and Division of Mathematical Science Nanyang Technological University, Singapore E-mail:
[email protected] Knut Mørken Department of Informatics and Centre of Mathematics for Applications University of Oslo, Norway E-mail:
[email protected] Marius Lysaker Simula Research Laboratory Lysaker, Norway E-mail:
[email protected] Knut-Andreas Lie Centre of Mathematics for Applications University of Oslo, Norway and SINTEF ICT, Oslo, Norway E-mail:
[email protected]

Library of Congress Control Number: Applied for
CR Subject Classification (1998): I.4, I.5, I.3.5, I.2.10, G.1.2, F.2.2
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISSN 0302-9743
ISBN-10 3-642-02255-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02255-5 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12689675 06/3180 543210
Preface
This book contains 71 original, scientific articles that address state-of-the-art research related to scale space and variational methods for image processing and computer vision. Topics covered in the book range from mathematical analysis of both established and new models, fast numerical methods, image analysis, segmentation, registration, surface and shape construction and processing, to real applications in medical imaging and computer vision. The ideas of scale space and variational methods related to partial differential equations are central concepts. The papers reflect the newest developments in these fields and also point to the latest literature. All the papers were submitted to the Second International Conference on Scale Space and Variational Methods in Computer Vision, which took place in Voss, Norway, during June 1–5, 2009. The papers underwent a peer review process similar to that of high-level journals in the field. We thank the authors, the Scientific Committee, the Program Committee and the reviewers for their hard work and helpful collaboration. Their contribution has been crucial for the efficient processing of this book, and for the success of the conference. Finally, we wish to thank those who have supported and helped to organize the conference. First and foremost it is a pleasure to acknowledge the generous financial support from the Centre of Mathematics for Applications (CMA) at the University of Oslo and the Research Council of Norway. In addition, partial support was given by the Centre of Integrated Petroleum Research (CIPR) at the University of Bergen and the Simula Research Laboratory (SRL). Moreover, we would like to thank Tiril P. Gurholt and Andrew McMurry for their support, both with technical and administrative matters. Members and students from the Mathematical Imaging and Vision Group at the Nanyang Technological University of Singapore and the University of Bergen, Norway deserve special thanks for their kind help. March 2009
Xue-Cheng Tai Knut Mørken Marius Lysaker Knut-Andreas Lie
Organization
Organizing Committee and Editors
Xue-Cheng Tai (University of Bergen, Norway, and Nanyang Technological University, Singapore; Conference Chair)
Knut-Andreas Lie (Sintef, Norway)
Marius Lysaker (Simula Research Laboratory, Norway)
Knut Mørken (University of Oslo, Norway)

Scientific Committee
Alfred M. Bruckstein (Technion IIT, Israel)
Tony F. Chan (University of California at LA, USA)
Mads Nielsen (University of Copenhagen, Denmark)
Stanley Osher (University of California at LA, USA)
Nikos Paragios (Ecole Centrale de Paris, France)
Bart M. ter Haar Romeny (Eindhoven University of Technology, The Netherlands)
Christoph Schnoerr (University of Heidelberg, Germany)
Fiorella Sgallari (University of Bologna, Italy)
Joachim Weickert (Saarland University, Germany)
Program Committee
Luis Alvarez, Noura Azzabou, Thomas Brox, Bernhard Burgeth, Vicent Caselles, Raymond Chan, Yeowmeng Chee, Yunmei Chen, Daniel Cremers, Francoise Dibos, Michael Felsberg, Luc Florack, Lewis Griffin, Anders Heyden, Charles Kervrann, Ron Kimmel, Arjan Kuijper, Georg Langs, Antonio Leitao, Riccardo March, Antonio Marquina, Etienne Memin, Karol Mikula, Jan Modersitzki, Michael Ng, Mila Nikolova, Martin Rumpf, Otmar Scherzer, Nir Sochen, Gabriele Steidl, Demetri Terzopoulos, David Tschumperle, Baba C. Vemuri, Hongkai Zhao, Haomin Zhou
Invited Speakers
Antonin Chambolle, CMAP - Ecole Polytechnique, France
Raymond Chan, The Chinese University of Hong Kong, China
Amiram Grinvald, Weizmann Institute of Science, Israel

Sponsoring Institutions
Centre of Mathematics for Applications, University of Oslo
Research Council of Norway
Centre of Integrated Petroleum Research, University of Bergen
Simula Research Laboratory
Table of Contents
Segmentation and Detection

Graph Cut Optimization for the Piecewise Constant Level Set Method Applied to Multiphase Image Segmentation
    Egil Bae and Xue-Cheng Tai
Tubular Anisotropy Segmentation
    Fethallah Benmansour and Laurent D. Cohen
An Unconstrained Multiphase Thresholding Approach for Image Segmentation
    Benjamin Berkels
Extraction of the Intercellular Skeleton from 2D Images of Embryogenesis Using Eikonal Equation and Advective Subjective Surface Method
    Paul Bourgine, Peter Frolkovič, Karol Mikula, Nadine Peyriéras, and Mariana Remešíková
On Level-Set Type Methods for Recovering Piecewise Constant Solutions of Ill-Posed Problems
    Adriano DeCezaro, Antonio Leitão, and Xue-Cheng Tai
The Nonlinear Tensor Diffusion in Segmentation of Meaningful Biological Structures from Image Sequences of Zebrafish Embryogenesis
    Olga Drblíková, Karol Mikula, and Nadine Peyriéras
Composed Segmentation of Tubular Structures by an Anisotropic PDE Model
    Elena Franchini, Serena Morigi, and Fiorella Sgallari
Extrapolation of Vector Fields Using the Infinity Laplacian and with Applications to Image Segmentation
    Laurence Guillot and Carole Le Guyader
A Schrödinger Equation for the Fast Computation of Approximate Euclidean Distance Functions
    Karthik S. Gurumoorthy and Anand Rangarajan
Semi-supervised Segmentation Based on Non-local Continuous Min-Cut
    Nawal Houhou, Xavier Bresson, Arthur Szlam, Tony F. Chan, and Jean-Philippe Thiran
Momentum Based Optimization Methods for Level Set Segmentation
    Gunnar Läthén, Thord Andersson, Reiner Lenz, and Magnus Borga
Optimization of Divergences within the Exponential Family for Image Segmentation
    Francois Lecellier, Stephanie Jehan-Besson, Jalal Fadili, Gilles Aubert, and Marinette Revenu
Convex Multi-class Image Labeling by Simplex-Constrained Total Variation
    Jan Lellmann, Jörg Kappes, Jing Yuan, Florian Becker, and Christoph Schnörr
Geodesically Linked Active Contours: Evolution Strategy Based on Minimal Paths
    Julien Mille and Laurent D. Cohen
Validation of Watershed Regions by Scale-Space Statistics
    Tomoya Sakai and Atsushi Imiya
Adaptation of Eikonal Equation over Weighted Graph
    Vinh-Thong Ta, Abderrahim Elmoataz, and Olivier Lézoray
A Variational Model for Interactive Shape Prior Segmentation and Real-Time Tracking
    Manuel Werlberger, Thomas Pock, Markus Unger, and Horst Bischof
Image Enhancement and Reconstruction

A Nonlinear Probabilistic Curvature Motion Filter for Positron Emission Tomography Images
    Musa Alrefaya, Hichem Sahli, Iris Vanhamel, and Dinh Nho Hao
Finsler Geometry on Higher Order Tensor Fields and Applications to High Angular Resolution Diffusion Imaging
    Laura Astola and Luc Florack
Bregman-EM-TV Methods with Application to Optical Nanoscopy
    Christoph Brune, Alex Sawatzky, and Martin Burger
PDE-Driven Adaptive Morphology for Matrix Fields
    Bernhard Burgeth, Michael Breuß, Luis Pizarro, and Joachim Weickert
On Semi-implicit Splitting Schemes for the Beltrami Color Flow
    Lorina Dascal, Guy Rosman, Xue-Cheng Tai, and Ron Kimmel
Multi-scale Total Variation with Automated Regularization Parameter Selection for Color Image Restoration
    Yiqiu Dong and Michael Hintermüller
Multiplicative Noise Cleaning via a Variational Method Involving Curvelet Coefficients
    Sylvain Durand, Jalal Fadili, and Mila Nikolova
Projected Gradient Based Color Image Decomposition
    Vincent Duval, Jean-François Aujol, and Luminita Vese
A Dual Formulation of the TV-Stokes Algorithm for Image Denoising
    Christoffer A. Elo, Alexander Malyshev, and Talal Rahman
Anisotropic Regularization for Inverse Problems with Application to the Wiener Filter with Gaussian and Impulse Noise
    Micha Feigin and Nir Sochen
Locally Adaptive Total Variation Regularization
    Markus Grasmair
Basic Image Features (BIFs) Arising from Approximate Symmetry Type
    Lewis D. Griffin, Martin Lillholm, Mike Crosier, and Justus van Sande
An Anisotropic Fourth-Order Partial Differential Equation for Noise Removal
    Mohammad Reza Hajiaboli
Enhancement of Blurred and Noisy Images Based on an Original Variant of the Total Variation
    Khalid Jalalzai and Antonin Chambolle
Coarse-to-Fine Image Reconstruction Based on Weighted Differential Features and Background Gauge Fields
    Bart Janssen, Remco Duits, and Luc Florack
Edge-Enhanced Image Reconstruction Using (TV) Total Variation and Bregman Refinement
    Shantanu H. Joshi, Antonio Marquina, Stanley J. Osher, Ivo Dinov, John D. Van Horn, and Arthur W. Toga
Nonlocal Variational Image Deblurring Models in the Presence of Gaussian or Impulse Noise
    Miyoun Jung and Luminita A. Vese
A Geometric PDE for Interpolation of M-Channel Data
    Frank Lenzen and Otmar Scherzer
An Edge-Preserving Multilevel Method for Deblurring, Denoising, and Segmentation
    Serena Morigi, Lothar Reichel, and Fiorella Sgallari
Fast Dejittering for Digital Video Frames
    Mila Nikolova
Sparsity Regularization for Radon Measures
    Otmar Scherzer and Birgit Walch
Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage
    Simon Setzer
Anisotropic Smoothing Using Double Orientations
    Gabriele Steidl and Tanja Teuber
Image Denoising Using TV-Stokes Equation with an Orientation-Matching Minimization
    Xue-Cheng Tai, Sofia Borok, and Jooyoung Hahn
Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model
    Xue-Cheng Tai and Chunlin Wu
The Convergence of a Central-Difference Discretization of Rudin-Osher-Fatemi Model for Image Denoising
    Ming-Jun Lai, Bradley Lucier, and Jingyue Wang
Theoretical Foundations for Discrete Forward-and-Backward Diffusion Filtering
    Martin Welk, Guy Gilboa, and Joachim Weickert
L0-Norm and Total Variation for Wavelet Inpainting
    Andy C. Yau, Xue-Cheng Tai, and Michael K. Ng
Total-Variation Based Piecewise Affine Regularization
    Jing Yuan, Christoph Schnörr, and Gabriele Steidl
Image Denoising by Harmonic Mean Curvature Flow
    Mourad Zéraï
Motion Analysis, Optical Flow, Registration and Tracking

Tracking Closed Curves with Non-linear Stochastic Filters
    Christophe Avenel, Etienne Mémin, and Patrick Pérez
A Multi-scale Feature Based Optic Flow Method for 3D Cardiac Motion Estimation
    Alessandro Becciu, Hans van Assen, Luc Florack, Sebastian Kozerke, Vivian Roode, and Bart M. ter Haar Romeny
A Combined Segmentation and Registration Framework with a Nonlinear Elasticity Smoother
    Carole Le Guyader and Luminita A. Vese
A Scale-Space Approach to Landmark Constrained Image Registration
    Eldad Haber, Stefan Heldmann, and Jan Modersitzki
A Variational Approach for Volume-to-Slice Registration
    Stefan Heldmann and Nils Papenberg
Hyperbolic Numerics for Variational Approaches to Correspondence Problems
    Henning Zimmer, Michael Breuß, Joachim Weickert, and Hans-Peter Seidel
Surfaces and Shapes

From a Single Point to a Surface Patch by Growing Minimal Paths
    Fethallah Benmansour and Laurent D. Cohen
Optimization of Convex Shapes: An Approach to Crystal Shape Identification
    Timo Eirola and Toni Lassila
An Implicit Method for Interpolating Two Digital Closed Curves on Parallel Planes
    Nikolaos Gabrielides and Laurent Cohen
Pose Invariant Shape Prior Segmentation Using Continuous Cuts and Gradient Descent on Lie Groups
    Niels Chr. Overgaard, Ketut Fundana, and Anders Heyden
A Non-local Approach to Shape from Ambient Shading
    Emmanuel Prados, Nitin Jindal, and Stefano Soatto
An Elasticity Approach to Principal Modes of Shape Variation
    Martin Rumpf and Benedikt Wirth
Pre-image as Karcher Mean Using Diffusion Maps: Application to Shape and Image Denoising
    Nicolas Thorstensen, Florent Segonne, and Renaud Keriven
Fast Shape from Shading for Phong-Type Surfaces
    Oliver Vogel, Michael Breuß, Thomas Leichtweis, and Joachim Weickert
Generic Scene Recovery Using Multiple Images
    Kuk-Jin Yoon, Emmanuel Prados, and Peter Sturm
Scale Space and Feature Extraction

Highly Accurate PDE-Based Morphology for General Structuring Elements
    Michael Breuß and Joachim Weickert
Computational Geometry-Based Scale-Space and Modal Image Decomposition: Application to Light Video-Microscopy Imaging
    Anatole Chessel, Bertrand Cinquin, Sabine Bardin, Jean Salamero, and Charles Kervrann
Highlight on a Feature Extracted at Fine Scales: The Pointwise Lipschitz Regularity
    Christophe Damerval and Sylvain Meignen
Line Enhancement and Completion via Linear Left Invariant Scale Spaces on SE(2)
    Remco Duits and Erik Franken
Spatio-Featural Scale-Space
    Michael Felsberg
Scale Spaces on the 3D Euclidean Motion Group for Enhancement of HARDI Data
    Erik Franken and Remco Duits
On the Rate of Structural Change in Scale Spaces
    David Gustavsson, Kim S. Pedersen, Francois Lauze, and Mads Nielsen
Transitions of a Multi-scale Image Hierarchy Tree
    Arjan Kuijper
Local Scale Measure for Remote Sensing Images
    Bin Luo, Jean-François Aujol, and Yann Gousseau
Author Index
Graph Cut Optimization for the Piecewise Constant Level Set Method Applied to Multiphase Image Segmentation

Egil Bae¹ and Xue-Cheng Tai²

¹ Department of Mathematics, University of Bergen, Norway
[email protected]
² Department of Mathematics, University of Bergen, Norway and Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
[email protected]

Abstract. The piecewise constant level set method (PCLSM) has recently emerged as a variant of the level set method for variational interphase problems. Traditionally, the Euler-Lagrange equations are solved by some iterative numerical method for PDEs, which is normally slow. In this work, we focus on the PCLSM applied to the multiphase Mumford-Shah model for image segmentation. Instead of solving the Euler-Lagrange equations of the resulting minimization problem, we propose an efficient combinatorial optimization technique based on graph cuts. Because of a simplification of the length term in the energy induced by the PCLSM, the minimization problem is not NP-hard. Numerical experiments on image segmentation demonstrate that the new approach is far superior in terms of efficiency, while maintaining the same quality.
1 Introduction
The level set method [1, 2] is a powerful tool for interphase problems. It has numerous applications in computer vision, fluid dynamics and inverse problems. The interphase is implicitly represented by a higher dimensional level set function. Originally, signed distance functions were used as level set functions. Later, the work of [3, 4, 5] introduced piecewise constant level set functions, which represent the interphases by discontinuities. This has certain advantages, such as the ability to represent several interphases by a single level set function. This method will be referred to as the piecewise constant level set method (PCLSM).

In computer vision, the level set method has been applied with great success to image segmentation. Of particular importance is the Mumford-Shah model [6],
Support from the Norwegian Research Council (eVita project 166075), National Science Foundation of Singapore (NRF2007IDM-IDM002-010) and Ministry of Education of Singapore (Moe Tier 2 T207B2202) are gratefully acknowledged.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 1–13, 2009. © Springer-Verlag Berlin Heidelberg 2009
which is an established image segmentation model. In [7, 8], Chan and Vese proposed a numerical realization of this model based on traditional level set functions. In [3, 4, 5], piecewise constant level set functions were proposed. Both approaches lead to a system of nonlinear PDEs that needs to be solved numerically, and both have the drawback of expensive computation.

This work aims to significantly reduce the computational cost of the piecewise constant level set method for the multiphase Mumford-Shah model. The length term in the energy is often simplified when this model is represented by piecewise constant level set functions. We will show that this simplification makes it possible to efficiently compute global minimizers via graph cuts, when the mean image intensity value in each phase is known. Graph cuts is a well-known technique in image analysis and computer vision [9, 10, 11, 12]. Usually, NP-hard multilabeling problems are approached by constructing algorithms for finding approximate suboptimal solutions, such as alpha expansion [12]. We instead do the approximation in the model, and then compute the exact solution of the approximate model. The graph used for optimization is constructed as in [13, 14, 15], except for some small modifications. Finally, for unknown mean intensity values, an iterative algorithm is presented, which we believe will have large practical value because of its efficiency.

In the case of two phases, some work on graph cut optimization for the Mumford-Shah model has been done in [16, 17]. A multiphase approach based on graph cuts has also recently been presented in [18]. The process is started by splitting the image into two regions, by solving the two-phase Mumford-Shah model to optimality. In the next step, each new region is split in two by solving the two-phase Mumford-Shah model within each region. The process is repeated until the intensity variation within each region falls below a predefined threshold.
The limitation of this approach is that the possibility of a region to evolve has been ignored. For instance, the optimal interphase for two regions may not be a subset of the optimal interphases for three regions. An experiment in Section 4 will clarify this. The paper is organized as follows: Section 2 gives a brief overview of the piecewise constant level set method and the Mumford-Shah model. Section 3 presents the new integer optimization approach, while numerical experiments are presented in Section 4.
2 Image Segmentation and the PCLSM

2.1 The Mumford-Shah Model
The Mumford-Shah model [6] is an established image segmentation model with a wide range of applications. Let $u^0$ be the input image. In the most common variant with closed contours, one seeks a partition $\{\Omega_i\}_{i=1}^n$ of the image domain $\Omega$, and an approximation image $u$ which minimizes the functional

\[ E(u, \Gamma_i) = \int_\Omega (u - u^0)^2 \, dx + \mu \int_{\Omega \setminus \cup_i \Gamma_i} |\nabla u|^2 \, dx + \nu \sum_{i=1}^n \int_{\Gamma_i} ds, \tag{1} \]
where $\{\Gamma_i\}_{i=1}^n$ denotes the interphases between the regions $\{\Omega_i\}_{i=1}^n$. Often $u$ is assumed to be constant within each phase, in which case the second term disappears and one ends up with the simpler version

\[ E(\{c_i\}, \Gamma_i) = \int_\Omega (u - u^0)^2 \, dx + \nu \sum_{i=1}^n \int_{\Gamma_i} ds, \tag{2} \]

where $u = \sum_{i=1}^n c_i \Psi_i$, and $\Psi_i$ is the characteristic function of $\Omega_i$. As a numerical realization, Chan and Vese [7, 8] proposed to represent the above functional with level set functions, and solve the resulting gradient descent equations numerically. In order to represent $n$ phases, $\log_2(n)$ level set functions were required. For any $n > 2$ the length term had to be simplified.

2.2 Piecewise Constant Level Set Functions
In [3, 4, 19], the piecewise constant level set method was proposed instead, and applied to the Mumford-Shah model. This approach has certain benefits, such as the ability to represent any number of phases with a single level set function. Let $\{\Omega_i\}_{i=1}^n$ be a partition of the domain $\Omega$ into $n$ regions. Any such partition can be described by a piecewise constant level set function $\phi$ as follows:

\[ \phi = i \quad \text{in } \Omega_i \quad \text{for } i = 1, 2, \ldots, n. \tag{3} \]

Note that all interphases are represented by discontinuities in $\phi$. The Mumford-Shah functional can now be written in terms of $\phi$:

\[ E(c, \phi) = \int_\Omega (u - u^0)^2 \, dx + \frac{\nu}{2} \sum_{i=1}^n \int_\Omega |\nabla \psi_i| \, dx, \tag{4} \]

where $u = \sum_{i=1}^n c_i \psi_i$, and $\psi_i$ is the characteristic function of $\Omega_i$. It can be derived from the level set function by

\[ \psi_i = \frac{1}{\alpha_i} \prod_{j=1, j \neq i}^{n} (\phi - j) \quad \text{with} \quad \alpha_i = \prod_{k=1, k \neq i}^{n} (i - k). \tag{5} \]
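As an illustration of (3) and (5), the following Python sketch evaluates the characteristic functions $\psi_i$ from an integer-valued level set function and assembles $u = \sum_i c_i \psi_i$. It is not taken from the paper: the function name `characteristic_functions`, the use of NumPy and the sample values are assumptions made here for illustration only.

```python
import numpy as np

def characteristic_functions(phi, n):
    """Return an array psi with psi[i-1] = psi_i computed from phi via (5)."""
    phi = np.asarray(phi, dtype=float)
    psi = []
    for i in range(1, n + 1):
        others = [j for j in range(1, n + 1) if j != i]
        # alpha_i = prod_{k != i} (i - k)
        alpha_i = float(np.prod([i - k for k in others]))
        # prod_{j != i} (phi - j), evaluated pointwise
        product = np.ones_like(phi)
        for j in others:
            product = product * (phi - j)
        psi.append(product / alpha_i)
    return np.stack(psi)

# Hypothetical 1D example with n = 3 phases.
phi = np.array([1, 1, 2, 3, 3, 2])
psi = characteristic_functions(phi, n=3)

# Piecewise constant image u = sum_i c_i psi_i for assumed phase values c.
c = np.array([10.0, 20.0, 30.0])
u = np.tensordot(c, psi, axes=1)
```

For integer-valued $\phi$, each $\psi_i$ equals 1 exactly where $\phi = i$ and 0 elsewhere, so `u` simply takes the value $c_i$ in phase $i$.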
The length term can be approximated by the total variation of the level set function itself, especially when the number of phases is not too large:

\[ E(c, \phi) = \int_\Omega (u - u^0)^2 \, dx + \nu \int_\Omega |\nabla \phi| \, dx, \tag{6} \]

see for instance [5] for a justification. Most often this approximation is preferred, since it is computationally easier. Such a simplification of the length term has also been made in [20, 21] among others for multiphase image segmentation. In this work we will consider (6).
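To see what the approximation in (6) changes, the following 1D sketch compares the length term of (4), $\frac{1}{2} \sum_i \int |\nabla \psi_i|$, with the total variation of $\phi$ used in (6). The helper names `tv_1d` and `length_terms` and the two test signals are hypothetical choices made here, and the discrete total variation is taken as the sum of absolute differences between neighbouring samples.

```python
import numpy as np

def tv_1d(f):
    """Discrete 1D total variation: sum of absolute neighbour differences."""
    return float(np.abs(np.diff(np.asarray(f, dtype=float))).sum())

def length_terms(phi, n):
    """Return (length term of (4), length term of (6)) for a 1D labelling."""
    phi = np.asarray(phi)
    psi = [(phi == i).astype(float) for i in range(1, n + 1)]
    exact = 0.5 * sum(tv_1d(p) for p in psi)   # (1/2) sum_i TV(psi_i)
    approx = tv_1d(phi)                        # TV(phi)
    return exact, approx

# Interphases between consecutive labels (1|2 and 2|3).
consecutive = length_terms([1, 1, 2, 2, 3, 3], n=3)

# Same number of interphases, but one of them separates labels 1 and 3.
skipping = length_terms([1, 1, 3, 3, 2, 2], n=3)
```

In this sketch both terms agree when neighbouring phases carry consecutive labels, while an interphase between labels 1 and 3 is weighted by the label difference 2 in $TV(\phi)$. This is one way to read the remark that the approximation is best when the number of phases is not too large.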
There are some variants of the total variation regularization term. The commonly used version is the isotropic total variation:

\[ TV_2(\phi) = \int_\Omega |\nabla \phi|_2 \, dx = \int_\Omega \sqrt{|\phi_{x_1}|^2 + |\phi_{x_2}|^2} \, dx. \]

In order to simplify computation, often a simpler version is used:

\[ TV_1(\phi) = \int_\Omega |\nabla \phi|_1 \, dx = \int_\Omega |\phi_{x_1}| + |\phi_{x_2}| \, dx. \]

However, since $TV_1$ is not isotropic, regularization will be stronger in certain directions. A more isotropic version based on the 1-norm can be obtained by splitting $TV_1$ using the original gradient operator, and one rotated counterclockwise $\pi/4$ radians:

\[ TV_{1,\frac{\pi}{4}}(\phi) = \frac{1}{2} \int_\Omega |\nabla \phi(x)|_1 + |R_{\frac{\pi}{4}} \nabla \phi(x)|_1 \, dx, \]

where $R_{\frac{\pi}{4}} \nabla$ is the gradient in the rotated coordinate system. It is also possible to create even more isotropic versions by considering more such rotations.

Previous attempts to minimize (6) have been made by continuous optimization. In order to force a solution taking only integer values, the following constraint was imposed:

\[ K(\phi) = \prod_{i=1}^n (\phi - i) = 0. \tag{7} \]
The constrained optimization problem (6) and (7) could be solved by the augmented Lagrangian method as in [3, 4, 19]. Some attempts to speed up the computation can be found in [5]. In the next section we propose to solve the minimization problem by graph cuts.

We start by discretizing the variational problem (6) on a grid $P$ of mesh size $\delta = 1$. For each $p = (i, j) \in P$, define the neighborhood systems $N_4(p) = \{(i \pm 1, j), (i, j \pm 1)\}$ and $N_8(p) = \{(i \pm 1, j), (i, j \pm 1), (i \pm 1, j \pm 1)\}$. The modification of the definition for boundary points is clear. The discrete energy function can now be written compactly as

\[ E_d(c, \phi) = \sum_{p \in P} \delta^2 (u_p - u_p^0)^2 + \nu \sum_{p \in P} \sum_{q \in N_k(p)} \frac{1}{2} w_{pq} |\phi_p - \phi_q|, \tag{8} \]

where $k = 4$ for $TV_1$ and $k = 8$ for $TV_{1,\frac{\pi}{4}}$. The weights $w_{pq}$ are given by

\[ w_{pq} = \frac{4 \delta^2}{k \, \|p - q\|_2}. \]

In case of two phases, similar weights can also be derived by using the Cauchy-Crofton formula [22]. Note that each term is counted twice in the last summation. This is compensated by multiplication by the factor $\frac{1}{2}$.
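The discrete energy (8) can be evaluated directly for a given labelling, which is useful for checking any minimizer against the model. The sketch below is an assumption-laden illustration rather than the authors' code: the name `discrete_energy` and the sample data are hypothetical, the mesh size is fixed to $\delta = 1$ as in the text, and boundary points simply have fewer neighbours. Each unordered neighbour pair is visited once with weight $w_{pq}$, which equals the double sum in (8) with its factor $\frac{1}{2}$.

```python
import numpy as np

def discrete_energy(phi, u0, c, nu, k=4):
    """Evaluate E_d(c, phi) of (8) on a 2D grid with mesh size delta = 1.

    phi takes values in 1..n and c[i-1] is the constant value of phase i.
    """
    if k not in (4, 8):
        raise ValueError("k must be 4 (TV_1) or 8 (TV_{1,pi/4})")
    phi = np.asarray(phi, dtype=int)
    u0 = np.asarray(u0, dtype=float)
    u = np.asarray(c, dtype=float)[phi - 1]      # u_p = c_{phi_p}
    data = float(np.sum((u - u0) ** 2))          # data term with delta = 1

    # One offset per unordered neighbour pair (p, q).
    offsets = [(1, 0), (0, 1)]
    if k == 8:
        offsets += [(1, 1), (1, -1)]

    rows, cols = phi.shape
    reg = 0.0
    for di, dj in offsets:
        w = 4.0 / (k * np.hypot(di, dj))         # w_pq = 4 / (k ||p - q||_2)
        for i in range(rows):
            for j in range(cols):
                qi, qj = i + di, j + dj
                if 0 <= qi < rows and 0 <= qj < cols:
                    reg += w * abs(int(phi[i, j]) - int(phi[qi, qj]))
    return data + nu * reg

# Hypothetical two-phase example: a vertical edge in a 2 x 3 image.
phi = [[1, 1, 2], [1, 1, 2]]
u0 = [[0, 0, 5], [0, 0, 5]]
e4 = discrete_energy(phi, u0, c=[0.0, 5.0], nu=1.0, k=4)
e8 = discrete_energy(phi, u0, c=[0.0, 5.0], nu=1.0, k=8)
```

With $k = 4$ every axis-aligned pair has weight 1; with $k = 8$ axis-aligned pairs have weight $\frac{1}{2}$ and diagonal pairs weight $\frac{1}{2\sqrt{2}}$, which reflects the averaging over the original and the rotated gradient in $TV_{1,\frac{\pi}{4}}$.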
3 Integer Optimization for PCLSM and Mumford-Shah
Rather than imposing constraints to force an integer solution in a continuous optimization, we propose the more natural approach of using integer optimization to minimize (6). In Section 3.2 we show that the discretized functional (8) can be minimized by graph cuts when the values c are known. Finally, in Section 3.3 an algorithm is designed to minimize with respect to both c and φ.
3.1 Background on Graph Cuts and Terminology
Min-cut is a well-known optimization problem. Due to a duality theorem by Ford and Fulkerson [23], there are several fast algorithms for this problem. Graph cuts is a reference to such algorithms, and was introduced as a computer vision tool by Greig et al. [9] in connection with Markov random fields [24]. A graph $G = (V, E)$ is a set of vertices $V$ and a set of directed edges $E$. We let $(a, b)$ denote the directed edge going from vertex $a$ to vertex $b$, and let $c(a, b)$ denote the cost (weight) on this edge. In the graph cut scenario there are two distinguished vertices in $V$, called the source $\{s\}$ and the sink $\{t\}$. A cut on $G$ is a partitioning of the vertices $V$ into two disjoint and connected (through edges) sets $(V_s, V_t)$ such that $s \in V_s$ and $t \in V_t$. For each cut, the set of severed edges $C$ is uniquely defined as

\[ C = \{(a, b) \mid a \in V_s, \ b \in V_t \text{ and } (a, b) \in E\}. \tag{9} \]

We say that the cut severs the edge $e$ if $e$ is contained in $C$. From now on, we refer to the cut as the set of severed edges $C$. The cost of the cut is defined as

\[ |C| = \sum_{e \in C} c(e). \tag{10} \]
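The definitions (9) and (10), together with the Ford-Fulkerson duality mentioned above, can be checked on a toy graph. The following sketch is illustrative only: the graph, the vertex names and the functions `cut_cost`, `min_cut_brute_force` and `max_flow` are assumptions made here, the brute-force search enumerates all source sets without checking the connectivity requirement, and the max-flow routine is a plain Edmonds-Karp implementation rather than the algorithm of [10].

```python
from collections import deque
from itertools import combinations

def cut_cost(capacity, source_side):
    """Cost (10): sum of c(a, b) over edges from V_s to V_t, as in (9)."""
    return sum(w for (a, b), w in capacity.items()
               if a in source_side and b not in source_side)

def min_cut_brute_force(capacity, s, t):
    """Minimum cut cost by enumerating every vertex set containing s but not t."""
    inner = sorted({v for edge in capacity for v in edge} - {s, t})
    costs = [cut_cost(capacity, {s, *subset})
             for r in range(len(inner) + 1)
             for subset in combinations(inner, r)]
    return min(costs)

def max_flow(capacity, s, t):
    """Edmonds-Karp: augment along shortest residual paths until none is left."""
    residual = dict(capacity)
    for (a, b) in capacity:
        residual.setdefault((b, a), 0)       # reverse edges start empty
    neighbours = {}
    for (a, b) in residual:
        neighbours.setdefault(a, []).append(b)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b in neighbours.get(a, []):
                if b not in parent and residual[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            return flow
        # Recover the path and push the bottleneck capacity along it.
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(residual[e] for e in path)
        for (a, b) in path:
            residual[(a, b)] -= push
            residual[(b, a)] += push
        flow += push

# Hypothetical graph with source "s", sink "t" and two inner vertices.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
best_cut = min_cut_brute_force(capacity, "s", "t")
flow_value = max_flow(capacity, "s", "t")
```

On this graph the brute-force minimum cut cost and the max-flow value coincide, as the duality theorem predicts. Enumeration is exponential in the number of vertices, so it only serves as a check on small examples.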
We are interested in finding the cut of minimum cost on G, from now on called the minimum cut. The duality theorem by Ford and Fulkerson [23] states this is equivalent to finding the maximum flow from {s} to {t}, where the edge weights are bounds on the maximum amount of flow that can be pushed through the edges. Cuts of minimum cost can thus be computed very efficiently by max-flow algorithms such as Ford-Fulkerson [23]. See [10] for a detailed discussion about implementation.

3.2 Graph Cuts for the Multiphase PCLSM
For fixed values c, we will show that the minimizer of (8) can be obtained by finding the minimum cut over an appropriate graph, i.e. we will construct a graph G such that
$$\min_{C \text{ cut on } G} |C| = \min_{\phi} E_d(c, \phi) + \sigma, \tag{11}$$
where σ is a constant that will be specified later. Note that the minimizer φ is not influenced by this constant. Some work on graph cuts for the two-phase Mumford-Shah model can be found in [17,16]. Unfortunately, the extension to more than two phases is NP-hard [25]. The usual graph cut approach to optimization problems with several labels is to use some sort of approximation method, such as the alpha expansion [12]. Since we have already made an approximation in the model (6), we will show that graph cuts can be used to find the exact minimum of the approximated model. The idea is to introduce an extra dimension to take care of several phases. We construct the graph in a similar way as Ishikawa [13, 14], except for some small technical differences:
E. Bae and X.-C. Tai
Fig. 1. (a) The graph corresponding to a 1D signal of 6 grid points used for 4-phase segmentation. Edges in $E_D$ are depicted as vertical arrows and edges in $E_R$ are depicted as horizontal arrows. The gray curve is used to visualize the cut: vertices in the interior of the curve belong to $V_s$, vertices in the exterior of the curve belong to $V_t$. Edges in C are depicted as dotted arrows. (b) The values of φ at each grid point corresponding to the cut in (a); they are determined from Definition 1.
our graph consists of one less layer of vertices and edges, and is a generalization of the binary construction of Greig et al. [9]. We also avoid edges of infinite capacity. When the number of phases is small, this has a small effect on the efficiency. With each grid point p ∈ P we associate (n − 1) vertices in the graph G, denoted $v_{p,\ell}$, $\ell = 1, \dots, n-1$. The set of vertices V is formally defined as
$$V = \{v_{p,\ell} \mid p \in P,\ \ell \in \{1, \dots, n-1\}\} \cup \{s\} \cup \{t\}. \tag{12}$$
An illustration for a 1D image, where P = {1, 2, ..., 6}, is shown in Figure 1. For ease of visualization, no 2D cases are shown. The edges are arranged in two groups, $E_D$ and $E_R$. The first group $E_D$ corresponds to the data term in (8). It is defined as
$$E_D = \bigcup_{p \in P} E_p, \tag{13}$$
where for each p ∈ P the edge set $E_p$ is defined as
$$E_p = (s, v_{p,1}) \cup \bigcup_{\ell=1}^{n-2} (v_{p,\ell}, v_{p,\ell+1}) \cup (v_{p,n-1}, t). \tag{14}$$
The edges in $E_D$ are illustrated as the vertical arrows in Figure 1. The second group of edges $E_R$ corresponds to the regularization term in (8). These are illustrated as the horizontal arrows in Figure 1, i.e.
$$E_R = \{(v_{p,\ell}, v_{q,\ell}) \mid p \in P,\ q \in N_k(p),\ \ell \in \{1, \dots, n-1\}\}. \tag{15}$$
We say that a cut is admissible if it severs exactly one edge in Ep for each p ∈ P. We can now establish the relationship between a cut on G and a level set function φ.
Definition 1. Let C ⊂ E be an admissible cut on G. For any grid point p ∈ P, the corresponding level set function φ is defined as
$$\phi_p = \begin{cases} 1 & \text{if } (s, v_{p,1}) \in C, \\ \ell + 1 & \text{if } (v_{p,\ell}, v_{p,\ell+1}) \in C, \\ n & \text{if } (v_{p,n-1}, t) \in C. \end{cases} \tag{16}$$
Note that φ is single valued by the admissible cut requirement. We can now define the edge costs (weights) such that the relationship (11) is satisfied. We start with the edges in $E_D$, i.e. the data edges:
$$\begin{aligned} c(s, v_{p,1}) &= \delta^2 |u^0_p - c_1|^2 + \frac{\sigma}{|P|} && \forall p \in P, \\ c(v_{p,\ell}, v_{p,\ell+1}) &= \delta^2 |u^0_p - c_{\ell+1}|^2 + \frac{\sigma}{|P|} && \forall p \in P,\ \forall \ell \in \{1, \dots, n-2\}, \\ c(v_{p,n-1}, t) &= \delta^2 |u^0_p - c_n|^2 + \frac{\sigma}{|P|} && \forall p \in P. \end{aligned} \tag{17}$$
The costs (weights) for the regularization edges $E_R$ are defined by
$$c(v_{p,\ell}, v_{q,\ell}) = \nu w_{pq}, \quad \forall p \in P,\ \forall q \in N_k(p),\ \forall \ell \in \{1, \dots, n-1\}. \tag{18}$$
By choosing σ as any positive value, the cut of minimum cost will be admissible, which implies that its corresponding level set function is single valued.

Theorem 1. Let C be a minimum cut on G. Then C is admissible if σ > 0.

Proof. Suppose C is a minimum cut on G and for some p ∈ P several edges in $E_p$ belong to C. That is, there exists a set of indices $L_p$ such that $(v_{p,\ell}, v_{p,\ell+1}) \in C$ for all $\ell \in L_p$. Define the cut $C^*$ such that for each p ∈ P, $v_{p,\ell} \in V_s^*$ if $\ell \le \max L_p$, else $v_{p,\ell} \in V_t^*$. Then $C^* \cap E_D \subset C \cap E_D$. Since σ > 0, no edges have zero weight, therefore $|C^* \cap E_D| < |C \cap E_D|$. Furthermore, the number of severed regularization edges satisfies $|C^* \cap E_R|_{\text{cardinality}} \le |C \cap E_R|_{\text{cardinality}}$. For $TV_1$, the weights on all edges in $E_R$ are equal. Therefore $|C^*| < |C|$, which is a contradiction. The same contradiction can also be derived for $TV_{1,\frac{\pi}{4}}$.

To summarize, for any piecewise constant level set function φ taking values in {1, 2, ..., n}, there exists a unique admissible cut on G. Moreover, since an admissible cut severs exactly one data edge per grid point and each data edge carries the offset σ/|P| by (17), the function φ and its corresponding cut satisfy
$$|C| = E_d(c, \phi) + \sigma. \tag{19}$$
Thus, a function φ corresponding to a minimum cut is a minimizer of the functional (8), i.e. it solves the segmentation problem. Note that in case n = 2, the extra dimension collapses and the graph becomes identical to that of Greig et al. [9] for binary problems. It is also possible to minimize (8) exactly as in [26], by solving a sequence of binary optimization problems via graph cuts. This approach is likely to be faster when n is very large and a power of 2. However, for image segmentation n is relatively small, and we expect the presented approach to be faster.
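The construction (12)-(18) and Definition 1 can be sketched for a 1D signal as follows. Several things here are assumptions rather than the authors' implementation: the functional (8) lies outside this section, so `energy` uses an assumed 1D form (squared data term plus ν times the sum of label jumps between neighbours), the weights are taken as w_pq = 1 with a two-point neighbourhood, a plain Edmonds-Karp max-flow stands in for the solvers discussed in [10], and the signal values are made up. The decoding step counts the layers on the source side, which matches Definition 1 only when the cut is admissible.

```python
from collections import defaultdict, deque
from itertools import product

def build_graph(u0, c, nu, delta=1.0, sigma=1.0):
    """Capacities (17)-(18) for a 1D signal; vertex (p, l) stands for v_{p,l}, w_pq = 1."""
    P, n = len(u0), len(c)
    cap = {}
    for p in range(P):
        chain = ["s"] + [(p, l) for l in range(1, n)] + ["t"]
        for l in range(n):  # the l-th edge of E_p encodes phi_p = l + 1
            cap[(chain[l], chain[l + 1])] = delta**2 * (u0[p] - c[l])**2 + sigma / P
        for q in (p - 1, p + 1):  # two nearest neighbours on the 1D grid
            if 0 <= q < P:
                for l in range(1, n):
                    cap[((p, l), (q, l))] = nu
    return cap

def min_cut_source_side(cap, s="s", t="t"):
    """Edmonds-Karp max-flow; returns V_s, the vertices reachable from s in the residual graph."""
    res = defaultdict(dict)
    for (a, b), w in cap.items():
        res[a][b] = res[a].get(b, 0.0) + w
        res[b].setdefault(a, 0.0)  # reverse residual edge
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search for an augmenting path
            a = queue.popleft()
            for b, r in res[a].items():
                if r > 1e-12 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            return set(parent)
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= push
            res[b][a] += push

def decode(source_side, P, n):
    """Definition 1 for an admissible cut: phi_p = 1 + number of vertices v_{p,l} in V_s."""
    return [1 + sum((p, l) in source_side for l in range(1, n)) for p in range(P)]

def energy(u0, c, phi, nu, delta=1.0):
    """Assumed 1D form of E_d: data term plus nu * sum of |phi_p - phi_q| over neighbours."""
    data = sum(delta**2 * (u - c[f - 1])**2 for u, f in zip(u0, phi))
    return data + nu * sum(abs(a - b) for a, b in zip(phi, phi[1:]))

# Made-up signal with one outlier (value 4) and three fixed phase values.
u0, c, nu, sigma = [0, 0, 4, 0, 10, 10], [0, 5, 10], 10.0, 6.0
cap = build_graph(u0, c, nu, sigma=sigma)
Vs = min_cut_source_side(cap)
phi = decode(Vs, len(u0), len(c))
cut_value = sum(w for (a, b), w in cap.items() if a in Vs and b not in Vs)
best = min(product(range(1, len(c) + 1), repeat=len(u0)),
           key=lambda f: energy(u0, c, f, nu))
print(phi, cut_value, energy(u0, c, phi, nu) + sigma)
```

In this small instance the labeling decoded from the cut coincides with the exhaustive minimizer `best`, and the cut value exceeds the energy by exactly σ, as stated in (11).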
3.3 Algorithm for Minimizing the Mumford-Shah Functional
The algorithm presented in the last section minimizes $E_d(c, \phi)$ with respect to φ for a fixed c. Vice versa, for a fixed φ the values c minimizing $E_d(c, \phi)$ are given by the average intensity in each region
$$c_i = \frac{\int_\Omega u^0(x)\, \psi_i(x)\, dx}{\int_\Omega \psi_i(x)\, dx}, \quad i = 1, 2, \dots, n, \tag{20}$$
or in discrete form
$$c_i = \frac{\sum_{p \in P} u^0_p\, \psi_{i,p}}{\sum_{p \in P} \psi_{i,p}}, \quad i = 1, 2, \dots, n. \tag{21}$$
We want an algorithm that minimizes with respect to both φ and c. This is achieved by combining the two results above in the following iterative descent algorithm.

Algorithm 1. Estimate initial values $c^0$, set l = 0
while ( $\|c^l - c^{l-1}\| > tol$ )
1. Use graph cuts to estimate φ from
$$\phi = \arg\min_{\tilde{\phi}} E_d(c^l, \tilde{\phi}). \tag{22}$$
2. Update $c^{l+1}$ according to equation (21).
3. Update l ← l + 1.
Note that no initialization of the level set function is required. Only the values $c^0$ need to be initialized, which can be achieved very efficiently by the isodata algorithm [27]. Note that Algorithm 1 has an exact termination criterion, as tol can be set to zero. In all our experiments, convergence was reached in 4-12 iterations. It must be noted that this algorithm is no longer guaranteed to find the global minimum. Theoretically it may get trapped in a local minimum close to the initial values $c^0$. However, in practice it is usually rather insensitive to initialization.
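The alternation in Algorithm 1 can be sketched on a 1D signal. This is an illustration under stated assumptions, not the authors' code: the labeling step uses exact dynamic programming on a chain as a stand-in for the graph-cut step (valid only in 1D), the energy is the same assumed form as a squared data term plus ν times the label jumps with δ = 1, an empty phase simply keeps its previous value, and the signal and initial values are made up.

```python
def update_means(u0, phi, c_old):
    """Equation (21): c_i is the mean of u0 over pixels with phi_p = i (empty phases keep c_i)."""
    c = list(c_old)
    for i in range(1, len(c_old) + 1):
        vals = [u for u, f in zip(u0, phi) if f == i]
        if vals:
            c[i - 1] = sum(vals) / len(vals)
    return c

def label_1d(u0, c, nu):
    """Exact minimizer over phi on a 1D chain by dynamic programming (stand-in for graph cuts)."""
    n = len(c)
    cost = [(u0[0] - ci) ** 2 for ci in c]
    back = []
    for u in u0[1:]:
        # prev[k]: best label of the previous pixel given label k at the current pixel
        prev = [min(range(n), key=lambda j: cost[j] + nu * abs(j - k)) for k in range(n)]
        cost = [cost[prev[k]] + nu * abs(prev[k] - k) + (u - c[k]) ** 2 for k in range(n)]
        back.append(prev)
    k = min(range(n), key=lambda j: cost[j])
    labels = [k]
    for prev in reversed(back):
        k = prev[k]
        labels.append(k)
    return [k + 1 for k in reversed(labels)]

def alternate(u0, c0, nu, tol=0.0, max_iter=50):
    """Alternate the labeling step (22) and the mean update (21) until c stops changing."""
    c = list(c0)
    for _ in range(max_iter):
        phi = label_1d(u0, c, nu)
        c_new = update_means(u0, phi, c)
        change = max(abs(a - b) for a, b in zip(c, c_new))
        c = c_new
        if change <= tol:
            break
    return c, phi

# Made-up two-region signal and rough initial phase values.
c, phi = alternate([1, 2, 1, 9, 8, 9], [0.0, 10.0], nu=1.0)
print(c, phi)
```

With tol = 0 the loop stops as soon as the mean update reproduces the previous values, which corresponds to the exact termination criterion mentioned above.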
4 Numerical Experiments
In this section we validate our new optimization method by numerical experiments. The results are compared with the original gradient descent approach [3] for minimizing (4) (note: not the variant with simplified length term). Both methods are implemented in C++. Comparisons are made with respect to both quality and computation time on an Intel 2.19 GHz laptop. The computation times are listed in Table 1. The test images are shown in Figure 2. In all results, the estimated phases are depicted as a bright region. The results of experiments 1 and 2 are depicted in Figure 3 (a)-(d). We observe that graph cut optimization solves the multiphase problem with at least as good quality as gradient descent. In experiment 3, Figure 3 (e)-(f), the number of regions is assumed to be unknown. The optimal number of regions can be estimated by using more phases than necessary in the minimization problem. As we can see, this results in some empty phases, while the remaining phases capture the correct regions. We have also tested the method on a synthetic brain MRI image. The noise level is 7%, and the non-uniformity of the RF pulse is 20% (see http://www.bic.mni.mcgill.ca/brainweb/ for details). We want to extract four tissue classes from the image: region 1, background; region 2, cerebrospinal fluid; region 3, gray matter; and region 4, white matter. This is achieved by minimizing the Mumford-Shah model with 4 phases. In Figure 4 we compare the results of graph cuts, gradient descent and the exact results. The background phase is not shown. Again, we observe that the graph cut results are very good, while the computation time is dramatically reduced compared to gradient descent (cf. Table 1, Brain). Finally, in Figure 5 we show an example which demonstrates the limitation of the multiphase approach presented in [18], described in the related work section. For the chosen parameter ν, the global minimum consists of three phases, which we are able to detect by applying our multiphase algorithm with 4 phases, Figure 5(b) top. The result of the first step of the algorithm presented in [18] is shown in Figure 5(b) bottom, which is the global minimum of the two-phase Mumford-Shah functional. Clearly, no further splitting of these regions can result in the correct three phases, since the interface from the first step is not allowed to evolve.

Fig. 2. Test images

Table 1. Computation times in seconds for gradient descent vs graph cut optimization

Image         Size     Number of phases  Gradient descent  Graph cut
Experiment 1  100x100  4                 50.3              0.120
Experiment 2  100x100  5                 70.0              0.179
Experiment 3  92x98    5                 55.4              0.165
Brain         933x736  4                 5401              25.22
Fig. 3. (a) Experiment 1: graph cut. (b) Experiment 1: gradient descent. (c) Experiment 2: graph cut. (d) Experiment 2: gradient descent. (e) Experiment 3: graph cut. (f) Experiment 3: gradient descent. In (a) and (b), from left to right: phase 1 - phase 4. In (c) and (d), from left to right: phase 1 - phase 5. In (e) and (f), from left to right: phase 1 - phase 5.
Fig. 4. From left to right: phase 1 - phase 3. (a) Graph cut, (b) gradient descent, (c) exact phases.

Fig. 5. (a) Input image. (b) Top: Our approach, from left to right phase 1 - phase 4. Bottom: First step of the approach reported in [18], from left to right phase 1 - phase 2.
5 Summary
We have presented an algorithm for efficiently minimizing the energy induced by the piecewise constant level set representation of the multiphase Mumford-Shah functional. This minimization method is based on graph cuts. Numerical experiments demonstrated that the method is far more efficient than the previous PDE-based approach, while maintaining the same quality of results.
References
1. Dervieux, A., Thomasset, F.: A finite element method for the simulation of a Rayleigh-Taylor instability. In: Approximation methods for Navier-Stokes problems, Proc. Sympos., Univ. Paderborn, Paderborn, 1979. Lecture Notes in Math., vol. 771, pp. 145–158. Springer, Berlin (1980)
2. Osher, S., Sethian, J.: Fronts propagating with curvature dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
3. Lie, J., Lysaker, M., Tai, X.: A variant of the level set method and applications to image segmentation. Math. Comp. 75(255), 1155–1174 (2006) (electronic)
4. Lie, J., Lysaker, M., Tai, X.: A binary level set model and some applications to Mumford-Shah image segmentation. IEEE Transactions on Image Processing 15(5), 1171–1181 (2006)
5. Tai, X., Christiansen, O., Lin, P., Skjaelaaen, I.: Image segmentation using some piecewise constant level set methods with MBO type of project. International Journal of Computer Vision 73, 61–76 (2007)
6. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42, 577–685 (1989)
7. Chan, T., Vese, L.: Active contours without edges. IEEE Image Proc. 10, 266–277 (2001)
8. Vese, L.A., Chan, T.F.: A new multiphase level set framework for image segmentation via the Mumford and Shah model. International Journal of Computer Vision 50, 271–293 (2002)
9. Greig, D.M., Porteous, B.T., Seheult, A.H.: Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 271–279 (1989)
10. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. In: Figueiredo, M., Zerubia, J., Jain, A.K. (eds.) EMMCVPR 2001. LNCS, vol. 2134, pp. 359–374. Springer, Heidelberg (2001)
11. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2), 147–159 (2004)
12. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. In: ICCV, vol. (1), pp. 377–384 (1999)
13. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), 1333–1336 (2003)
14. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: CVPR 1998: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, pp. 125–131. IEEE Computer Society, Los Alamitos (1998)
15. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part II: Levelable functions, convex priors and non-convex cases. J. Math. Imaging Vis. 26(3), 277–291 (2006)
16. Darbon, J.: A note on the discrete binary Mumford-Shah model. In: Gagalowicz, A., Philips, W. (eds.) MIRAGE 2007. LNCS, vol. 4418, pp. 283–294. Springer, Heidelberg (2007)
17. Zehiry, N.E., Xu, S., Sahoo, P., Elmaghraby, A.: Graph cut optimization for the Mumford-Shah model. In: Proceedings of the Seventh IASTED International Conference visualization, imaging and image processing, pp. 182–187. Springer, Heidelberg (2007)
18. El-Zehiry, N.Y., Elmaghraby, A.: A graph cut based active contour for multiphase image segmentation. In: IEEE International Conference on Image Processing, pp. 3188–3191 (2008)
19. Lie, J., Lysaker, M., Tai, X.: Piecewise constant level set methods and image segmentation. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 573–584. Springer, Heidelberg (2005)
20. Chung, G., Vese, L.A.: Energy minimization based segmentation and denoising using a multilayer level set approach. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 439–455. Springer, Heidelberg (2005)
21. Jung, Y.M., Kang, S.H., Shen, J.: Multiphase image segmentation via Modica-Mortola phase transition. SIAM J. Appl. Math. 67, 1213–1232 (2007)
22. Boykov, Y., Kolmogorov, V.: Computing geodesics and minimal surfaces via graph cuts. In: ICCV 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, pp. 26–33. IEEE Computer Society, Los Alamitos (2003)
23. Ford, L., Fulkerson, D.: Flows in networks. Princeton University Press, Princeton (1962)
24. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In: Readings in uncertain reasoning, pp. 452–472. Morgan Kaufmann Publishers Inc., San Francisco (1990)
25. Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiway cuts (extended abstract). In: STOC 1992: Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, pp. 241–251. ACM, New York (1992)
26. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. J. Math. Imaging Vis. 26(3), 261–276 (2006)
27. Velasco, F.R.D.: Thresholding using the ISODATA clustering algorithm. IEEE Trans. Systems Man Cybernet. 10(11), 771–774 (1980)
Tubular Anisotropy Segmentation
Fethallah Benmansour and Laurent D. Cohen
CEREMADE, UMR CNRS 7534, Université Paris Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 PARIS CEDEX 16, France
{benmansour,cohen}@ceremade.dauphine.fr
Abstract. In this paper we present a new interactive method for tubular structure extraction. The main application and motivation for this work is vessel tracking in 2D and 3D images. The basic tools are minimal paths computed with the fast marching algorithm. This allows interactive tools for the physician, who clicks on a small number of points in order to obtain a minimal path between two points or a set of paths in the case of a tree structure. Our method is based on a variant of the minimal path method that models the vessel as a centerline and surface. This is done by adding one dimension for the local radius around the centerline. The crucial step of our method is the definition of the local metrics to minimize. We have chosen to exploit the tubular structure of the vessels one wants to extract to build an anisotropic metric giving higher speed on the center of the vessels and also when the minimal path tangent is coherent with the vessel's direction. This measure is required to be robust against the disturbance introduced by noise or adjacent structures with intensity similar to the target vessel. We obtain promising results on noisy synthetic and real 2D and 3D images.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 14–25, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction
In this paper we deal with the problem of finding a complete segmentation of tubular structures like vessels. The main objective is to extract at the same time the centerline of the tubular structure and its boundary. During the last two decades, the extraction of vascular objects such as blood vessels, coronary arteries, or other tube-like structures has attracted the attention of more and more researchers. Various methods, such as vascular image enhancement methods [1, 2, 3], have been proposed; see [4] for a complete survey. Some of these methods extract the vessel boundary directly, and then use thinning methods to find its centerline. Other methods extract only the centerline and then estimate the vessel width to extract its boundary. Deschamps and Cohen [5] proposed to use the minimal path method to find the centerline. The minimal path technique introduced by Cohen and Kimmel [6] captures the global minimum curve between two points given by the user. This leads to the global minimum of an active contour energy. Since then, the minimal path method has been improved by many researchers, and adapted to anisotropic media as done by Jbabdi et al. for tractography [7]. Unfortunately, despite their numerous advantages, classical minimal path techniques exhibit some disadvantages. First, vessel boundary extraction can be very difficult, even in 2D where the vessel's boundary can be completely described by two curves. Second, the path given by the minimal path technique does not always coincide with the centerline of the vessel. A readjustment step is required to obtain a central trajectory. Third, the minimal path technique provides only a trajectory and does not give information about the vessel boundary and local width. Li and Yezzi [8] proposed a new variant of the classical, purely spatial, minimal path technique by incorporating an extra non-spatial dimension into the search space. Each point of the 4D path (after adding the extra dimension for the 3D image) consists of three spatial coordinates plus a fourth coordinate which describes the vessel thickness at the corresponding 3D point. Thus, each 4D point represents a sphere in 3D space, and the vessel is obtained by taking the envelope of these spheres as we move along the 4D curve. A crucial step of this method is to build an adequate potential that drives the propagation. Li and Yezzi [8] proposed different isotropic potentials. As they said in the conclusion of their paper, the proposed potentials are very parameter dependent and they hoped to find a more appropriate choice of potential. In particular, one can see in their paper that the potential used does not yield a correct detection of the radius when it is not constant (see figure 6 in [8]). Another drawback of the Li and Yezzi method is that it does not take the vessel orientation into account. Our first contribution is to take the vessel orientation into account by defining a suitable anisotropic metric that makes the propagation faster along the centerlines and for the adequate radius. Law et al. [9] proposed a new scalar descriptor called Optimally Oriented Flux (OOF) for the detection of curvilinear structures. But they did not exploit the orientation given by their descriptor.
The major advantage of the OOF technique is that it does not consider the regions in the vicinity of target objects, where background noise or adjacent structures with intensity similar to the target vessels are possibly present. Therefore, the disturbance introduced by closely located nearby structures is avoided. The second contribution of this paper is to build an anisotropic metric based on the OOF descriptor, using its scalar function as well as its orientation. That makes the propagation faster along the vessel's centerline and for the exact associated scale. This means that the path location, orientation and scale (radius) have to be coherent with the local geometry of the image extracted by the OOF. In section 2, we give some background on the minimal path method and Anisotropic Fast Marching. In section 3 the Optimally Oriented Flux descriptor is presented as well as the metric construction. In section 4, results on synthetic and real data are shown. Finally, conclusions and perspectives follow in section 5.
2 Background on Minimal Path Method
A minimal path, first introduced in the isotropic case (where P does not depend on the orientation of the path) [6], is a pathway minimizing the energy functional
$$E(\gamma) = \int_{\gamma} P\big(\gamma(s), \gamma'(s)\big)\, ds, \tag{1}$$
where $P(\gamma(\cdot), \gamma'(\cdot)) = \sqrt{\gamma'(\cdot)^T M(\gamma(\cdot))\, \gamma'(\cdot)}$ describes an infinitesimal distance along a pathway γ relative to a metric tensor M (symmetric positive definite). Thus, we are considering only the case of an elliptic medium. In the isotropic case $M(\cdot) = P^2(\cdot) I$, where I is the identity matrix. A curve connecting $p_1$ to $p_2$ that globally minimizes the above energy (1) is a minimal path between $p_1$ and $p_2$, denoted $C_{p_1,p_2}$. The solution of this minimization problem is obtained through the computation of the minimal action map $U : \Omega \to \mathbb{R}^+$ associated to $p_1$ on the domain Ω, which can be a 2D, 3D or 4D domain. The minimal action is the minimal energy integrated along a path between $p_1$ and any point x of the domain Ω:
$$\forall x \in \Omega, \quad U(x) = \min_{\gamma \in A_{p_1,x}} \int_{\gamma} P\big(\gamma(s), \gamma'(s)\big)\, ds, \tag{2}$$
where $A_{p_1,x}$ is the set of paths linking x to $p_1$. The values of U may be regarded as the arrival times of a front propagating from the source $p_1$ with oriented velocity related to the metric tensor $M^{-1}$. U satisfies the Eikonal equation
$$\|\nabla U(x)\|_{M^{-1}(x)} = 1 \text{ for } x \in \Omega, \text{ and } U(p_1) = 0, \tag{3}$$
where $\|v\|_M = \sqrt{v^T M v}$. The map U has only one local minimum, the point $p_1$, and its flow lines satisfy the Euler-Lagrange equation of functional (1). Thus, the minimal path $C_{p_1,p_2}$ can be retrieved with a simple gradient descent on U from $p_2$ to $p_1$ (see Fig. 1), solving the following ordinary differential equation with standard numerical methods like Heun's or Runge-Kutta's:
$$\frac{dC_{p_1,p_2}}{ds}(s) \propto -M^{-1}\big(C_{p_1,p_2}(s)\big)\, \nabla U\big(C_{p_1,p_2}(s)\big), \quad \text{with } C_{p_1,p_2}(0) = p_2. \tag{4}$$
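The back-propagation (4) can be sketched with Heun's method in the simplest isotropic setting. The assumptions are stated plainly: P = 1, so that $M^{-1}$ only rescales the gradient and the action map is the Euclidean distance to $p_1$ (supplied analytically here instead of being computed by fast marching), the gradient is taken by central differences, and the step size and stopping radius are arbitrary choices.

```python
import math

def descent_direction(U, x, eps=1e-4):
    """Unit vector along -grad U at x, estimated by central differences."""
    gx = (U((x[0] + eps, x[1])) - U((x[0] - eps, x[1]))) / (2 * eps)
    gy = (U((x[0], x[1] + eps)) - U((x[0], x[1] - eps))) / (2 * eps)
    norm = math.hypot(gx, gy)
    return (-gx / norm, -gy / norm)

def backtrack(U, p1, p2, step=0.05, max_steps=10000):
    """Heun's method for dC/ds proportional to -grad U with C(0) = p2, stopped near p1."""
    path = [p2]
    x = p2
    for _ in range(max_steps):
        if math.dist(x, p1) <= 2 * step:  # close enough: finish at the source
            break
        d1 = descent_direction(U, x)
        predictor = (x[0] + step * d1[0], x[1] + step * d1[1])
        d2 = descent_direction(U, predictor)
        x = (x[0] + 0.5 * step * (d1[0] + d2[0]),
             x[1] + 0.5 * step * (d1[1] + d2[1]))
        path.append(x)
    path.append(p1)
    return path

p1, p2 = (0.0, 0.0), (3.0, 4.0)

def U(x):
    """Assumed action map for P = 1: Euclidean distance to the source p1."""
    return math.dist(x, p1)

path = backtrack(U, p1, p2)
length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
print(len(path), length)
```

For this constant potential the extracted path is the straight segment from $p_2$ to $p_1$; with an anisotropic metric the direction would additionally be multiplied by $M^{-1}$ as in (4).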
Proofs of (3) and (4) can be found in [10,7]. In Figure 1, we show some examples of the minimal path method in an isotropic case and an anisotropic one. In the first image of Figure 1 the metric is isotropic and the potential P in the grey region is half of its value in the white region. Isolevel sets of the minimal action map associated to the source point $p_1$ are displayed, together with the minimal path $C_{p_1,p_2}$. The second image represents a metric M. We took two constant metrics, one in each half of the image, with different orientations. In the last image, the minimal action map U associated to the metric M and to the source point $p_1$ is shown. The minimal path $C_{p_1,p_2}$ is found by solving equation (4).
Fig. 1. Minimal path examples. Left: an isotropic case. Middle: visualization by small ellipses of the eigenvalues of a metric that is constant on each half of the image. Right: the minimal action map associated to the source point $p_1$ with the minimal path $C_{p_1,p_2}$.
The Fast Marching Method (FMM) is a numerical method introduced by Sethian in [11] and Tsitsiklis in [12] for efficiently solving the isotropic Eikonal equation on a Cartesian grid. The central idea behind the FMM is to visit grid points in an order consistent with the way wavefronts of constant action propagate. It leads to a single-pass algorithm for solving equation (3) and computing the minimal action map U. Tsitsiklis's method relies on minimizing directly the energy functional of equation (1) while Sethian's method uses the Eikonal equation. Both methods are suitable for isotropic metrics, but they fail for anisotropic metrics [13]. To deal with anisotropy, Sethian and Vladimirsky [10] proposed an update scheme that converges to the viscosity solution of the anisotropic Eikonal equation. A simplified scheme, based on the original method of Tsitsiklis [12], was proposed by Lin in [14] to approximate the solution of the anisotropic Eikonal equation. Contrary to Sethian and Vladimirsky's ordered upwind method (OUM) [10], Lin's algorithm does not converge to the viscosity solution of the Eikonal equation. In this paper we used Lin's scheme to solve the anisotropic Eikonal equation, since it is much faster than OUM and the errors it introduces have little effect on the extracted geodesics. The FMM is a front propagation approach that computes the values of U in increasing order, and the structure of the algorithm is almost identical to Dijkstra's algorithm for computing shortest paths on graphs [15]. In the course of the algorithm, each grid point is tagged as either Alive (point for which U has been computed and frozen), Trial (point for which U has been estimated but not frozen) or Far (point for which U is unknown). The set of Trial points forms an interface between the set of grid points for which U has been frozen (the Alive points) and the set of other grid points (the Far points).
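The Alive / Trial / Far bookkeeping can be sketched for the isotropic 2D case only. This is a minimal illustration under assumptions, not the anisotropic scheme used in the paper: it solves $\|\nabla U\| = P$ on a regular grid with the standard first-order upwind update, uses Python's `heapq` as the priority queue, and treats stale heap entries lazily instead of decreasing keys. Points with infinite U play the role of Far points.

```python
import heapq
import math

def local_update(U, alive, P, i, j, h):
    """First-order upwind solution of |grad U| = P at (i, j) using Alive neighbours only."""
    rows, cols = len(P), len(P[0])

    def best(cells):
        vals = [U[a][b] for a, b in cells
                if 0 <= a < rows and 0 <= b < cols and alive[a][b]]
        return min(vals, default=math.inf)

    a = best([(i - 1, j), (i + 1, j)])  # smallest Alive value along the first axis
    b = best([(i, j - 1), (i, j + 1)])  # smallest Alive value along the second axis
    a, b = min(a, b), max(a, b)
    f = h * P[i][j]
    if b - a >= f:                      # only one upwind direction is usable
        return a + f
    return 0.5 * (a + b + math.sqrt(2 * f * f - (b - a) ** 2))

def fast_marching(P, source, h=1.0):
    """Minimal action map U for the isotropic potential P, propagated from the source."""
    rows, cols = len(P), len(P[0])
    U = [[math.inf] * cols for _ in range(rows)]        # inf means Far
    alive = [[False] * cols for _ in range(rows)]
    U[source[0]][source[1]] = 0.0
    heap = [(0.0, source)]                              # Trial points
    while heap:
        _, (i, j) = heapq.heappop(heap)
        if alive[i][j]:
            continue                                    # stale entry, already frozen
        alive[i][j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and not alive[ni][nj]:
                new = local_update(U, alive, P, ni, nj, h)
                if new < U[ni][nj]:
                    U[ni][nj] = new
                    heapq.heappush(heap, (new, (ni, nj)))
    return U

# Constant potential on a small grid, source in the corner.
P = [[1.0] * 6 for _ in range(6)]
U = fast_marching(P, (0, 0))
print(U[0][5], U[1][1])
```

Along the grid axes the computed action equals the exact distance, while in diagonal directions the first-order scheme overestimates it slightly.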
This interface may be regarded as a front expanding from the source until every grid point has been reached. Let us denote by $N_M(x)$ the set of M neighbors of a grid point x, where M = 2 × d if the dimension of Ω is equal to d. Initially, all grid points are tagged as Far, except the source point $p_1$ that is tagged as Trial. At each iteration of the FMM one chooses the Trial point with the smallest U value, denoted by $x_{min}$. Then, $x_{min}$ is tagged as Alive and the value of U is updated for each point of the set $N_M(x_{min})$ which is either Trial or Far. In order to satisfy a causality condition, the way U is updated in the vicinity of $x_{min}$ requires special care. The iteration ends by tagging every Far point of the set $N_M(x_{min})$ as Trial. The algorithm automatically stops when all grid points are Alive. The key to the speed of the FMM is the use of a priority queue to quickly find the Trial point with the smallest U value. If Trial points are ordered in a min-heap data structure, the computational complexity of the FMM is O(N log N), where N is the total number of grid points. A crucial step of the Fast Marching algorithm is the computation of the weighted distance between the front and the neighbouring voxels in the Trial set. Here, we present a way to estimate this weighted distance in the anisotropic case and only in 3D. It is straightforward to extend it to 4D. Since the distance is anisotropic, we cannot use the standard methods, because they rely on the fact that the geodesics are perpendicular to the level sets of U. To take the anisotropy into account, Jbabdi et al. [7] and Lin [14] considered a set of simplexes that cover the whole neighbourhood around a voxel of the narrow band. A simplex neighbouring a point x is simply a set of three points $(x_1, x_2, x_3)$ that are among the 26 neighbours of x, defining a triangle that we denote $x_1 x_2 x_3$. There are 48 such triangles around x for 26-connectivity. To make the update procedure faster, we propose to consider only the simplexes defined by a tuple of three points among the 6 neighbors of x. There are 8 such triangles (see Fig. 2), and with this modification the precision of the algorithm is lower but the algorithm is six times faster.

Fig. 2. Left: position of the optimal point on a simplex such as to minimize the geodesic distance to x. Right: the considered simplexes.

To estimate $U(x_m)$, where $x_m$ is a neighbor of the last trial point $x_{min}$, we make two approximations. If the geodesic passing through $x_m$ comes from a triangle $x_1 x_2 x_3$, then the time of arrival is given by
$$U(x_m) = \min_{x \in x_1 x_2 x_3} \left\{ U(x) + \int_{x}^{x_m} P(\gamma, \gamma') \right\}. \tag{5}$$
The term one wants to minimize is approximated by
$$f(\alpha) = \sum_{i=1}^{3} \alpha_i U(x_i) + \Big\| x_m - \sum_{i=1}^{3} \alpha_i x_i \Big\|_{M(x_m)}, \tag{6}$$
where $\alpha = (\alpha_1, \alpha_2, \alpha_3)$, with $\sum_{i=1}^{3} \alpha_i = 1$ since the point x is in the triangle (see Figure 2). This equation follows Tsitsiklis's approximation [12]. The first term approximates the value of the minimal action map at the point $x = \sum_{i=1}^{3} \alpha_i x_i$ by a simple linear interpolation. The second term approximates the remaining distance by considering the metric to be constant along the segment $[x, x_m]$ and equal to its value at the point $x_m$. The function f is convex and the constraints on α, i.e. $\sum_{i=1}^{3} \alpha_i = 1$ and $\alpha_i \ge 0$, define a convex subset. Thus the minimization of f can be done using classical optimization tools. See [7] for more details. For each of the eight triangles, we get a value u. Finally, we choose the triangle giving the smallest value of u. Note that, in order to approximate ∇U, computing the derivatives of U in the triangle using the estimate $U(x_n)$ gives a consistent approximation of $\nabla U(x_n)$ by the following:
$$\nabla U(x_n) = \big(U(x_n) - U(x^*)\big)\, \frac{x_n - x^*}{\|x_n - x^*\|},$$
where $x^*$ is the minimizer of the function f (see Figure 2, left), and $\|\cdot\|$ is the Euclidean norm. The computation of the gradient is very useful since it is used to solve the gradient descent described by equation (4).
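The per-triangle update can be sketched as a direct minimization of f(α) from (6) over the simplex. As a loudly stated assumption, a coarse grid search over barycentric coordinates replaces the convex solver referred to in [7]; it is only meant to show what is being minimized. The triangle, the action values and the metric in the usage lines are made up.

```python
import math

def m_norm(v, M):
    """||v||_M = sqrt(v^T M v) for a symmetric positive definite 3x3 matrix M."""
    Mv = [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
    return math.sqrt(sum(v[i] * Mv[i] for i in range(3)))

def simplex_update(xm, tri, U_tri, M, steps=30):
    """Minimise f(alpha) of (6) on a barycentric grid (stand-in for a convex solver).

    tri holds the triangle vertices x1, x2, x3, U_tri their action values and
    M the metric at xm. Returns the estimate of U(xm) and the best alpha.
    """
    best_value, best_alpha = math.inf, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            alpha = (i / steps, j / steps, (steps - i - j) / steps)
            x = [sum(alpha[k] * tri[k][d] for k in range(3)) for d in range(3)]
            value = (sum(a * u for a, u in zip(alpha, U_tri))
                     + m_norm([xm[d] - x[d] for d in range(3)], M))
            if value < best_value:
                best_value, best_alpha = value, alpha
    return best_value, best_alpha

# Made-up example: triangle spanned by three of the 6 neighbours of xm, Euclidean metric.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tri = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
u_flat, alpha_flat = simplex_update((0.0, 0.0, 0.0), tri, (0.0, 0.0, 0.0), identity)
u_vertex, alpha_vertex = simplex_update((0.0, 0.0, 0.0), tri, (0.0, 5.0, 5.0), identity)
print(u_flat, alpha_flat, u_vertex, alpha_vertex)
```

When the three action values are equal, the minimizer is the point of the triangle closest to $x_m$ in the metric; when one vertex has a much smaller action, the minimizer moves to that vertex.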
3 Optimally Oriented Flux: An Anisotropy Descriptor
We are interested in the construction of a metric that extracts from the image the geometric information leading to the reconstruction of vessels. This means that we wish to find an estimate for the local orientation and scale and a criterion on the local geometry to distinguish the presence of vessels from the background. At the position x in an image I, the amount of the image gradient projected along the axis v flowing out from a 3D sphere (or a 2D circle) $S_r$ is measured as in [9],
$$f(x, v; r) = \int_{\partial S_r} \Big( \big(\nabla (G * I)(x + h) \cdot v\big)\, v \Big) \cdot \frac{h}{|h|}\, da, \tag{7}$$
where G is a Gaussian function with a scale factor of 1 pixel, r is the sphere (or circle) radius, h is the position vector along $\partial S_r$ and da is the infinitesimal area (or length) on $\partial S_r$. To detect vessels having higher intensity than the background region, one would be interested in finding the vessel direction which minimizes f(x, v; r), i.e. we are looking for $\arg\min_v f(x, v; r)$. Using the divergence theorem, it can be shown that f(x, v; r) can be calculated using a simple convolution,
$$f(x, v; r) = v^T \big\{ (\partial_{i,j} G) * I * 1_{S_r} \big\}(x)\, v, \tag{8}$$
where $(\partial_{i,j} G)$ is the Hessian matrix of the function G and $1_{S_r}$ is the indicator function of the interior of the sphere (or circle) $S_r$. By differentiating the above equation with respect to v, the minimization of f reduces to solving a generalized eigenvalue decomposition problem. Solving this eigenvalue problem gives d eigenvalues (where d = 2 or 3 is the dimension of the image), $\lambda_1(\cdot) \le \dots \le \lambda_d(\cdot)$, and d eigenvectors $v_i(\cdot)$, i.e. $\lambda_i(x; r) = f(x, v_i(x; r); r)$ for i = 1, ..., d. To handle vessels having various radii, a multi-scale approach should be used along with the OOF method. In [9], Law and Chung have proposed to normalize the OOF's eigenvalues by the sphere surface area when the OOF method is incorporated in a multiscale approach for 3D image volumes. In the 2D case the eigenvalues are normalized by the circle perimeter $2\pi r$. In the 3D case the eigenvalues are normalized by the sphere area $4\pi r^2$. In the 2D case (see figure 3), for a point on the centerline and if r is equal to the radius of the vessel, the first eigenvector $v_1$ represents the direction orthogonal to the vessel and $v_2$ represents the direction along the vessel. In the 3D case, if the point is on the centerline, the two eigenvectors associated to the first eigenvalues $(\lambda_1, \lambda_2)$ represent the directions orthogonal to the vessel, and $v_3$ represents the direction along the vessel, see figure 4. On the same figure, one can see that if the point x is on the centerline, the minimal response of the function f is obtained when the radius r is equal to the exact radius of the tube. If the point is inside the tube but not on the centerline, $v_3$ is parallel to the tube orientation, and the other eigenvectors depend on the scale r. If the point is outside the tube (last line), then the vector $v_3$, corresponding to the red area, is oriented toward the centerline.
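For the 2D case, the minimization of the quadratic form (8) over unit vectors can be sketched in closed form. The sketch assumes that the symmetric 2x2 matrix Q, standing for $\{(\partial_{i,j} G) * I * 1_{S_r}\}(x)$ at one position and one radius, has already been computed; the convolutions themselves are not shown and the example matrices are made up.

```python
import math

def min_flux_direction(Q):
    """Eigen-decomposition of a symmetric 2x2 matrix Q = ((a, b), (b, d)).

    Returns (lambda_1, v_1), (lambda_2, v_2) with lambda_1 <= lambda_2;
    v_1 minimises v^T Q v over unit vectors v.
    """
    (a, b), (_, d) = Q
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # angle of the eigenvector of lambda_2
    v2 = (math.cos(theta), math.sin(theta))
    v1 = (-math.sin(theta), math.cos(theta))
    return (mean - radius, v1), (mean + radius, v2)

def quadratic_form(Q, v):
    """f = v^T Q v for a 2D vector v."""
    return sum(v[i] * Q[i][j] * v[j] for i in range(2) for j in range(2))

# Made-up response for a bright vessel running along the second axis:
# strongly negative across the vessel, zero along it.
Q = ((-4.0, 0.0), (0.0, 0.0))
(l1, v1), (l2, v2) = min_flux_direction(Q)
r = 2.0
normalized_l1 = l1 / (2.0 * math.pi * r)       # 2D normalization by the circle perimeter
print(l1, v1, l2, v2, normalized_l1)
```

In this example $v_1$ points across the vessel and $v_2$ along it, in line with the 2D interpretation given above.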
Li and Yezzi [8] proposed a new variant of the classical, purely spatial, minimal path technique by incorporating an extra non-spatial dimension into the search space. The crucial step of this method is to build an adequate metric that drives the propagation.
F. Benmansour and L.D. Cohen
Fig. 3. The plots of the values of f(x, v; r) obtained from the synthetic image shown on the left, at four different positions with various radii and projection axes. (a) Four positions of interest, denoted as x1, x2, x3 and x4, are shown along with the original synthetic image. (b) An illustration of the polar coordinate system used in (c)-(f). (c)-(f) The plots of the values of f(·) and the corresponding eigenvectors, computed at the four different positions shown in (a), using various values of r and different projection axes (cos θ, sin θ)^T.
Fig. 4. Plot of f (x, v; r) superimposed on the original 3D synthetic image for three different points(on each line) and different values of the radius : r = 3, . . . , 7 from left to right. The radius of the tube on the top half side image is equal to 4, and equal to 6 on the bottom half side. Similarly to figure 3, the visualization of the normalized flux function is done using a spherical coordinate system (instead of the polar one used in 2D). The first point is on the centerline of the tube. The second point is inside the tube but not on the centerline. The third point is outside the tube. The reader should zoom on each image. Notice that the colormaps are different.
Li and Yezzi [8] proposed different isotropic potentials. The main drawback, as they mention, is that these potentials are very parameter dependent and do not exploit the vessel orientation. Our main contribution is to improve the method of Li and Yezzi by adding an anisotropic formulation to it; the anisotropic metric is constructed by extending the OOF descriptor presented by Law et al. [9].
Tubular Anisotropy Segmentation
Fig. 5. The constructed metric for different scales r = 1, 5, 10, 15, 20 from left to right. The original image is shown in figure 3 (a); the radius of the structure is equal to 10. We used the same color range for all images, so one can see that the optimal anisotropy is obtained along the centerline of the tubular structure when the scale r is equal to the exact radius of the tube. On the top, we show a display of $\tilde{M}(x, r)^{-1}$. On the bottom, responses of $P_{radii}$ are shown.
The (d + 1)D minimal path is found by minimizing the following energy:
$$\int_\gamma \sqrt{\gamma'(s)^T M(\gamma(s))\, \gamma'(s)}\; ds,$$
where M is the (d + 1)D anisotropic metric we want to construct. It is not natural to consider orientations along the (d + 1)-th dimension, i.e. the radii dimension. Thus one can decompose the metric M by blocks as follows:
$$M(x, r) = \begin{pmatrix} \tilde{M}(x, r) & 0 \\ 0 & P_{radii}(x, r) \end{pmatrix},$$
where $\tilde{M}(x, r)$ is a d × d symmetric positive definite matrix giving the spatial anisotropy and $P_{radii}(x, r)$ is the radii potential (also strictly positive). Since the result given by the anisotropic minimal path method depends strongly on the metric, the results inherit the advantages and drawbacks of the constructed metric, so we should be very careful with its construction. First, let us fix conditions on the desired metric. The spatial metric $\tilde{M}$ has to be well oriented along the vessel centerline, and the radii potential $P_{radii}$ has to be small at the adequate scale for any point of the image. $P_{radii}$ corresponds to the inverse speed for the radii dimension. Since $\tilde{M}$ is symmetric positive definite, we can decompose it as follows:
$$\tilde{M}(\cdot) = \sum_{i=1}^{d} m_i(\cdot)\, u_i(\cdot)\, u_i(\cdot)^T,$$
where $0 < m_1 \le \dots \le m_d$ are the eigenvalues and $u_i$ are the associated eigenvectors. The velocity of the propagating front along direction $u_i$ is equal to $1/\sqrt{m_i}$. We used the OOF descriptor to construct the metric as follows:
$$\tilde{M}(\cdot) = \sum_{i=1}^{d} \exp\Big( \alpha\, \frac{\sum_{j \ne i} \lambda_j(\cdot)}{d-1} \Big)\, v_i(\cdot)\, v_i(\cdot)^T, \qquad P_{radii}(\cdot) = \beta \exp\Big( \alpha\, \frac{\sum_{i=1}^{d} \lambda_i(\cdot)}{d} \Big). \qquad (9)$$
The constant α is controlled by an intuitive parameter, the maximal spatial anisotropy ratio:
$$\mu = \max_{x, r} \frac{\exp(\alpha \lambda_2(x, r))}{\exp(\alpha \lambda_1(x, r))}$$
in the 2D case and
$$\mu = \max_{x, r} \left\{ \frac{\exp\big( \alpha\, \frac{\lambda_2(x, r) + \lambda_3(x, r)}{2} \big)}{\exp\big( \alpha\, \frac{\lambda_1(x, r) + \lambda_2(x, r)}{2} \big)} \right\}$$
in the 3D case. By choosing the maximal spatial anisotropy ratio μ, the constant α is fixed. By doing so, the anisotropy descriptor M becomes contrast invariant because the OOF is linear in the image. The parameter β controls the radii speed. In 2D (it is very similar in 3D), if $P_{radii} \le \exp(\alpha \lambda_1)$ then the Fast Marching propagation is faster for the radii than for the spatial dimensions. If $P_{radii} \ge \exp(\alpha \lambda_2)$ then the propagation is slower. One can tune the parameter β depending on the tubular structure one wants to extract. If its radius changes a lot, then β should be chosen such that the propagation along the radii dimension is faster. If not, β is chosen such that the propagation is less sensitive to the radii dimension. In figure 5 the constructed metric of the image of figure 3 is shown at several scales. Since we chose the same color range for the visualization, we can see that the directions are well detected, and that the optimal values are obtained along the centerline of the tube
Fig. 6. The red cross points are source points given by the user, and the blue ones are end points. On each case the segmented centerlines are displayed as well as the envelope of the moving discs. In the middle, the associated minimal action map U as well as the 3D minimal path between the two selected points are shown (transparent visualization).
when the scale is equal to the tube radius. For our experiments, we took μ = 10 and β such that $\max \frac{\exp(\alpha \lambda_1)}{P_{radii}} = 5$; this means that in the worst case, the speed along the radii dimension is 5 times faster than along the spatial dimensions. We did so because we wanted our algorithm to be sensitive to the radii dimension.
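The construction of the metric from the OOF eigen-decomposition can be sketched as follows. This is only an illustration under stated assumptions: the eigenvalues are sorted in ascending order, the eigenvectors are the columns of `V`, α is obtained from μ through the ratio $\exp(\alpha(\lambda_d - \lambda_1)/(d-1))$ maximized over all points and scales, and β is simply passed in. The helper names `alpha_from_mu` and `build_metric` and the sample eigenvalues are hypothetical.

```python
import numpy as np

def alpha_from_mu(lam_all, mu):
    """Fix alpha so that the largest spatial anisotropy ratio equals mu.

    lam_all: array of shape (..., d) holding ascending OOF eigenvalues for all
    points and scales (assumed layout, not prescribed by the paper).
    """
    lam_all = np.asarray(lam_all, dtype=float)
    d = lam_all.shape[-1]
    max_gap = np.max(lam_all[..., -1] - lam_all[..., 0])
    return (d - 1) * np.log(mu) / max_gap

def build_metric(lam, V, alpha, beta):
    """Spatial metric and radii potential at one point and scale."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    # Weight of v_i: exp(alpha * sum_{j != i} lam_j / (d - 1)).
    w = np.exp(alpha * (lam.sum() - lam) / (d - 1))
    # Sum_i w_i v_i v_i^T, with the eigenvectors stored as columns of V.
    M_spatial = (V * w) @ V.T
    # Radii potential: beta * exp(alpha * mean of the eigenvalues).
    P_radii = beta * np.exp(alpha * lam.mean())
    return M_spatial, P_radii

# Hypothetical eigenvalues at two points; the first has the largest gap.
lam_all = np.array([[-2.0, 0.0], [-0.5, -0.1]])
alpha = alpha_from_mu(lam_all, mu=10.0)
M_spatial, P_radii = build_metric(lam_all[0], np.eye(2), alpha, beta=1.0)
```

With these sample values the spatial metric has the smaller eigenvalue along the second eigenvector, so the front propagates faster along the presumed vessel direction than across it.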
4 Experimental Results Our method is minimally interactive. First, the user has to specify whether the desired vessels are darker or brighter than the background, so that we can consider different criteria on the signs of the eigenvalues. Then the scale range [rmin, rmax], which corresponds to the range of radii of the vessels one wants to extract, is given by the user. Finally, a few points are required as source points or end points of the Fast Marching algorithm. We used the metric described in the previous section to find the minimal anisotropic path (as described in section 2) between two or more selected points (see figures 6 and 7). For any selected point, the associated radius is equal to the minimal radius rmin given by the user. In figure 6, segmentation results on synthetic and real noisy 2D images are shown. On the first synthetic image, the source point and destination are selected on the centerline. The obtained tube is perfectly detected as well as the centerline. On the second
Fig. 7. First line : RCA segmentation using the tubular anisotropy approach shown on the whole image and on the selected sub-volume. Second line : LAD segmentation shown on the whole image and on the selected sub-volume. Only few points are required (the extremities of the paths). The tubular anisotropy method provides the centerline as well as vessels boundaries.
image, the initial points are not centered, but the centerline given by our algorithm quickly returns to the real centerline. This makes our algorithm robust to initialization. The third synthetic image shows that our approach is robust to scale changes. On the last line of figure 6, segmentation results are shown on real noisy images. In figure 7, segmentation results are shown on real medical images. First, right coronary arteries (RCA) are segmented. Second, left anterior descending (LAD) arteries are segmented. One can see that the obtained radii on the principal coronary branches are larger than those on the secondary ones. Thus, our approach is robust to scale changes and bifurcations. Nevertheless, our current implementation requires huge memory allocations due to the 4D and anisotropic aspects. To overcome this issue, we added an interactive preprocessing tool to select a sub-volume containing the desired vessels (see figure 7). Moreover, we are working on a new implementation of the tubular anisotropy approach to make the memory allocation dynamic and hence to benefit from the front propagation aspect of the fast marching algorithm. Besides the reduction of the computation time (which has already been achieved), we will save on memory allocation and will have a new version of our algorithm that extracts the whole coronary arteries using a regular PC.
5 Conclusion In this paper we have proposed a new general method for tubular structure extraction in 2D and 3D images. Our method exploits the orientation of the vessels by using the optimally oriented flux to construct a multi-resolution anisotropic metric that extracts the local geometry from the image and describes the orientation and scale of the vessels. Combining this metric with the anisotropic minimal path technique, we were able to find a complete description of the tubular structure, i.e. the centerline as well as the boundary. To summarize, our method is minimally interactive and robust to initialization, scale variations and bifurcations.
Acknowledgements We would like to thank Professor Anthony J. Yezzi and Max Wai-Kong Law for interesting discussions. We also thank Eduardo Davila for his valuable help with the implementation of the interface. This work was partially supported by ANR grant SURF -NT05-2_45825.
References 1. Sato, Y., Nakajima, S., Shiraga, N., Atsumi, H., Yoshida, S., Koller, T., Gerig, G., Kikinis, R.: Three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images. Med. Image Anal. 2(2), 143–168 (1998) 2. Krissian, K.: Flux-based anisotropic diffusion applied to enhancement of 3D angiogram. TMI 21(11), 1440–1442 (2002) 3. Frangi, A., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: Wells, W.M., Colchester, A.C.F., Delp, S.L. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 130–137. Springer, Heidelberg (1998)
4. Kirbas, C., Quek, F.K.H.: A review of vessel extraction techniques and algorithms. ACM Computing Surveys 36, 81–121 (2004) 5. Deschamps, T., Cohen, L.: Fast extraction of minimal paths in 3D images and applications to virtual endoscopy. MIA 5(4) (December 2001) 6. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24, 57–78 (1997) 7. Jbabdi, S., Bellec, P., Toro, R., Daunizeau, J., Pélégrini-Issac, M., Benali, H.: Accurate anisotropic fast marching for diffusion-based geodesic tractography. Journal of Biomedical Imaging 2008(1), 1–12 (2008) 8. Li, H., Yezzi, A.: Vessels as 4D curves: Global minimal 4D paths to extract 3D tubular surfaces and centerlines. IEEE Transactions on Medical Imaging 26(9), 1213–1223 (2007) 9. Law, M.W.K., Chung, A.C.S.: Three dimensional curvilinear structure detection using optimally oriented flux. In: ECCV, vol. 4, pp. 368–382 (2008) 10. Sethian, J.A., Vladimirsky, A.: Fast methods for the eikonal and related Hamilton-Jacobi equations on unstructured meshes. Proceedings of the National Academy of Sciences 97(11), 5699–5703 (2000) 11. Sethian, J.A.: A fast marching level set for monotonically advancing fronts. Proceedings of the National Academy of Sciences 93, 1591–1595 (1996) 12. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40, 1528–1538 (1995) 13. Chopp, D.L.: Replacing iterative algorithms with single-pass algorithms. Proc. Nat. Acad. Sc. USA 98(20), 10992–10993 (2001) 14. Lin, Q.: Enhancement, extraction, and visualization of 3D volume data. PhD thesis, Linköpings Universitet (2003) 15. Dijkstra, E.W.: A note on two problems in connection with graphs. Numerische Mathematik 1, 269–271 (1959)
An Unconstrained Multiphase Thresholding Approach for Image Segmentation Benjamin Berkels Institut für Numerische Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn, Nussallee 15, 53115 Bonn, Germany
[email protected] http://numod.ins.uni-bonn.de/
Abstract. In this paper we provide a method to find global minimizers of certain non-convex 2-phase image segmentation problems. This is achieved by formulating a convex minimization problem whose minimizers are also minimizers of the initial non-convex segmentation problem, similar to the approach proposed by Nikolova, Esedoḡlu and Chan. The key difference to the latter model is that the new model does not involve any constraint in the convex formulation that needs to be respected when minimizing the convex functional, neither explicitly nor by an artificial penalty term. This approach is related to recent results by Chambolle. Eliminating the constraint considerably simplifies the computational difficulties, and even a straightforward gradient descent scheme leads to a reliable computation of the global minimizer. Furthermore, the model is extended to multiphase segmentation along the lines of Vese and Chan. Numerical results of the model applied to the classical piecewise constant Mumford-Shah functional for two, four and eight phase segmentation are shown.
1 Introduction
Image segmentation is one of the fundamental research topics in the field of image processing. In particular, the Mumford-Shah model [1] is widely used in this context. One of the difficulties of this and many other variational image processing models is that the underlying energy functional has local, non-global minima. This is not only a theoretical problem, since the commonly used numerical minimization techniques often get stuck in local minima that differ considerably from a global minimum, hence possibly producing useless results. The goal of this paper is to introduce a method to obtain a global minimizer of the Mumford-Shah functional for 2-phase segmentation that only involves solving an unconstrained convex minimization problem. This method can be extended to multiphase segmentation by the ideas of Vese and Chan [2] in a canonical way.
1.1 Related Work
The problem of minimizing the Mumford-Shah segmentation functional has been extensively studied in the last decade, leading to a wide range of existing methods,
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 26–37, 2009. © Springer-Verlag Berlin Heidelberg 2009
Image Segmentation by Unconstrained Multiphase Thresholding
each with its own shortcomings. One of the first numerically feasible methods to obtain (local) minimizers of the functional was proposed by Chan and Vese [3]. They build on the levelset methods of Osher and Sethian [4] and parameterize the unknown set by a levelset function. Shen [5] developed a Γ-convergence formulation along with a simple implementation by the iterated integration of a linear Poisson equation. The unknown set is represented in a diffuse way by a phase field. In [6], Esedoḡlu and Tsai tackle the minimization problem based on the threshold dynamics of Merriman, Bence and Osher [7] for evolving an interface by its mean curvature. Here the minimization is achieved by alternating the solution of a linear parabolic partial differential equation and simple thresholding. Alvino and Yezzi [8] approximate Mumford-Shah segmentation using reduced image bases. According to them, most of the robustness of Mumford-Shah segmentation can be obtained without allowing each pixel to vary independently. Their approximate model has comparable performance to Mumford-Shah segmentations where each pixel is allowed to vary freely. A way to obtain global minimizers was introduced by Nikolova, Esedoḡlu and Chan [9]. Here, a convex constrained minimization problem has to be solved, followed by a simple thresholding of its minimizer. This method is closely related to the method we propose in this paper; the key difference is that [9] requires a constraint in the convex minimization while the model proposed in this paper does not involve any constraint in the convex formulation. On the other hand, there are methods to solve a certain class of minimal surface problems by unconstrained convex optimization, cf. the work of Chambolle and Darbon [10,11].
The 2-phase Mumford-Shah functional belongs to this class, yet to the best of our knowledge nobody seems to have tapped the potential offered by these general insights for Mumford-Shah based image segmentation so far.
2 Constrained Global 2-Phase Minimization
First let us describe the general framework and review the work of Nikolova et al. [9], the starting point for our model. In the following, Ω denotes our computational domain, an arbitrary but fixed subset of $\mathbb{R}^n$. For given indicator functions $f_1, f_2 \in L^1(\Omega)$ such that $f_1, f_2 \ge 0$ a.e. we consider the prototype Mumford-Shah energy
$$E_{MS}[\Sigma] := \int_{\Sigma} f_1\, dx + \int_{\Omega \setminus \Sigma} f_2\, dx + \nu \operatorname{Per}(\Sigma), \qquad (1)$$
where Per(Σ) denotes the perimeter of the set Σ ⊂ Ω in Ω. If $u_0$ is an image, $c_1, c_2 \in \mathbb{R}$ are two grey values and $f_i(x) := (u_0(x) - c_i)^2$, this is the well-known piecewise constant Mumford-Shah functional for 2-phase segmentation, i.e.
$$E[\Sigma, c_1, c_2] = \int_{\Sigma} (u_0 - c_1)^2\, dx + \int_{\Omega \setminus \Sigma} (u_0 - c_2)^2\, dx + \nu \operatorname{Per}(\Sigma). \qquad (2)$$
B. Berkels
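Remark 1. Because of
$$E_{MS}[\Sigma] = \underbrace{\int_{\Sigma} (f_1 - f_2)\, dx + \nu \operatorname{Per}(\Sigma)}_{=: \hat{E}_{MS}[\Sigma]} + \int_{\Omega} f_2\, dx,$$
$E_{MS}$ and $\hat{E}_{MS}$ share the same minimizers.
Remark 2. For $h(x) := e^{-|x|^2}$, we have
$$E_{MS}[\Sigma] = \int_{\Sigma} (f_1 + h)\, dx + \int_{\Omega \setminus \Sigma} (f_2 + h)\, dx + \nu \operatorname{Per}(\Sigma) - \underbrace{\int_{\Omega} h\, dx}_{=: C},$$
where the constant C is finite and does not depend on Σ. Hence we can assume $f_1, f_2 > 0$ a.e. in Ω without loss of generality. To obtain (local) minimizers of the functional above, Chan and Vese [3] proposed to parametrize the unknown set Σ by a levelset function φ and get the energy
$$E_{CV}[\phi] := \int_{\Omega} H(\phi) f_1 + (1 - H(\phi)) f_2 + \nu |\nabla (H(\phi))|\, dx.$$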
Here, H(·) denotes the Heaviside function, i.e. H(s) = 1 for s > 0 and H(s) = 0 else. A gradient descent will be used for minimization, therefore H is replaced by a smeared out Heaviside function, e.g. $H_\delta(x) := \frac{1}{2} + \frac{1}{\pi} \arctan\left(\frac{x}{\delta}\right)$, where δ > 0. While the specific choice is not important, it is important to use a function whose derivative does not have compact support (cf. [3]). This gives the regularized energy
$$E_{CV,\delta}[\phi] := \int_{\Omega} H_\delta(\phi) f_1 + (1 - H_\delta(\phi)) f_2 + \nu |\nabla (H_\delta(\phi))|\, dx \qquad (3)$$
and yields the gradient descent
$$\partial_t \phi = H_\delta'(\phi) \left[ (f_2 - f_1) + \nu \operatorname{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right) \right]. \qquad (4)$$
One of the major drawbacks of the energy (3) is its non-convexity in φ. In [9], Nikolova et al. noted that the gradient descent (4) and
$$\partial_t \phi = (f_2 - f_1) + \nu \operatorname{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right)$$
have the same stationary points, because $H_\delta'(\phi) > 0$. Obviously the latter is the gradient descent of the energy
$$E_{CE}[\phi] := \int_{\Omega} (f_1 - f_2) \phi + \nu |\nabla \phi|\, dx.$$
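The positivity of the derivative of the smeared out Heaviside function, which is what makes the two gradient descents share their stationary points, can be checked with a short sketch. The function names and the sample value of δ are illustrative assumptions; the closed-form derivative $H_\delta'(x) = \delta / (\pi(\delta^2 + x^2))$ follows from differentiating the arctan expression.

```python
import numpy as np

def heaviside_smooth(x, delta):
    """Smeared out Heaviside function 1/2 + arctan(x / delta) / pi."""
    return 0.5 + np.arctan(x / delta) / np.pi

def heaviside_smooth_prime(x, delta):
    """Its derivative delta / (pi * (delta^2 + x^2)); positive for every x."""
    return delta / (np.pi * (delta ** 2 + x ** 2))

x = np.linspace(-50.0, 50.0, 2001)
delta = 0.5  # illustrative value
values = heaviside_smooth(x, delta)
slopes = heaviside_smooth_prime(x, delta)
```

Because the derivative never vanishes, it has no compact support, which is the property the text asks for.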
In general, $f_1 - f_2$ takes positive and negative values, therefore the energy is not bounded (neither from below nor from above). In other words, it does not necessarily have a minimizer. However, this is easily fixed by restricting the minimization to 0 ≤ φ(x) ≤ 1 for all x ∈ Ω. Based on this, the following theorem holds:
Theorem 1. For given indicator functions $f_1, f_2 \in L^1(\Omega)$ such that $f_1, f_2 \ge 0$ a.e., let
$$u := \operatorname*{argmin}_{0 \le \tilde{u} \le 1} \int_{\Omega} (f_1 - f_2) \tilde{u} + \nu |\nabla \tilde{u}|\, dx = \operatorname*{argmin}_{0 \le \tilde{u} \le 1} E_{CE}[\tilde{u}]$$
and $\Sigma_c := \{x \in \Omega \,|\, u(x) > c\}$. Then $\Sigma_c$ is a minimizer of the Mumford-Shah energy (1) for all c ∈ [0, 1).
Proof. Nikolova et al. proved this theorem in [9] for a.e. c ∈ [0, 1]; we extend it here to hold not only for almost every, but for every c ∈ [0, 1). First, we briefly sketch the proof given by Nikolova et al. for a.e. c ∈ [0, 1]. Using 0 ≤ u ≤ 1 and the coarea formula, one can show
$$E_{CE}[u] = \int_0^1 E_{MS}[\Sigma_c]\, dc - C,$$
where C is a constant independent of u. Let $\Sigma_* \subset \Omega$ be a minimizer of $E_{MS}$ (the existence of such minimizers using convergence in measure follows from standard arguments) and let $M := \{c \in [0, 1] \,|\, E_{MS}[\Sigma_c] > E_{MS}[\Sigma_*]\}$. Assuming μ(M) > 0 leads to the contradiction $E_{CE}[\chi_{\Sigma_*}] < E_{CE}[u]$, therefore μ(M) = 0 holds and the statement is proven for a.e. c ∈ [0, 1]. Here, $\chi_A$ denotes the characteristic function of the set A. Now we extend the statement to all c ∈ [0, 1), inspired by the proof of Lemma 4 (iii) in [12]: Again let u be a minimizer of $E_{CE}$ under the constraint 0 ≤ u ≤ 1 and denote its superlevelsets by $\Sigma_c$. Choose an arbitrary but fixed $\hat{c} \in [0, 1)$. The statement holds for a.e. c ∈ [0, 1], so by Remark 1, there exists a sequence $(c_n) \in [0, 1]^{\mathbb{N}}$ with $c_n \downarrow \hat{c}$ such that
$$\Sigma_{c_n} \in \operatorname*{argmin}_{\Sigma \subset \Omega} \hat{E}_{MS}[\Sigma].$$
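Since the superlevelsets of a function are contained in each other, we have $\chi_{\Sigma_{c_n}} = \chi_{\bigcup_{k=1}^{n} \Sigma_{c_k}} \to \chi_{\Sigma^{\cup}}$ pointwise a.e., where $\Sigma^{\cup} := \bigcup_{n=1}^{\infty} \Sigma_{c_n}$. Setting $g := f_1 - f_2$ and using Lebesgue's dominated convergence theorem, we obtain
$$\int_{\Sigma^{\cup}} g\, dx = \int_{\Omega} g \chi_{\Sigma^{\cup}}\, dx = \lim_{n \to \infty} \int_{\Omega} g \chi_{\Sigma_{c_n}}\, dx = \lim_{n \to \infty} \int_{\Sigma_{c_n}} g\, dx.$$
Here we used $|g \chi_{\Sigma_{c_n}}| \le |g| \le |f_1| + |f_2|$ to provide the integrable upper bound. For each n and Σ ⊂ Ω, we have
$$\int_{\Sigma_{c_n}} g\, dx + \nu \operatorname{Per}(\Sigma_{c_n}) \le \int_{\Sigma} g\, dx + \nu \operatorname{Per}(\Sigma).$$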
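Using the continuity argument from above and the lower semicontinuity of the perimeter (cf. [13]), we get
$$\int_{\Sigma^{\cup}} g\, dx + \nu \operatorname{Per}(\Sigma^{\cup}) \le \int_{\Sigma} g\, dx + \nu \operatorname{Per}(\Sigma),$$
i.e. $\Sigma^{\cup}$ is a minimizer of $\hat{E}_{MS}$ and therefore, by Remark 1, of $E_{MS}$. Combining this with
$$\Sigma_{\hat{c}} = \{x \in \Omega \,|\, u(x) > \hat{c}\} = \bigcup_{n=1}^{\infty} \{x \in \Omega \,|\, u(x) > c_n\} = \bigcup_{n=1}^{\infty} \Sigma_{c_n}$$
concludes the proof.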
Knowing that Theorem 1 holds true for all c ∈ [0, 1) also remedies the last bit of “uncertainty” left in [9]. Remark 3. For any function u that fulfills the constraint, obviously {u > 1} = ∅. Therefore we cannot expect Theorem 1 to hold for c = 1. To solve the constrained optimization problem, Nikolova et al. show that the constrained problem has the same minimizers as the unconstrained problem if a penalty term of the form α p(u(x)) is added with a sufficiently large coefficient α (cf. [9], Claim 1). Here p denotes $p(s) = \max\{0, 2|s - \frac{1}{2}| - 1\}$. While this result already gives a method to find global minimizers of $E_{MS}$ by solving a convex, unconstrained minimization problem, its practical relevance is limited. Most numerical minimization methods rely on the gradient of the functional, but the proposed penalty term is not differentiable, making a regularization necessary. However, any smooth regularization of the penalty term prevents the minimizers of the convex, constrained functional from coinciding with those of the convex functional with penalty term. The stronger the regularization, the more the minimizers deviate. Furthermore, the regularization imposes numerical difficulties. If an explicit gradient descent is used for the minimization (as proposed in [9]), a suitable timestep size control is needed to ensure convergence. The step sizes allowed by such methods, e.g. the Armijo rule [14], typically correspond to the size of the region in which the linearization of the functional properly approximates the functional. Due to the nature of the penalty term p, the linearization at 0 and 1 of a regularized version of it only approximates the regularization properly in a region that is of the size of the regularization parameter. So, as soon as the current iterate of the gradient descent takes values near 0 or 1, the timestep control only allows timestep sizes of the order of the regularization parameter, which, as mentioned above, cannot be chosen too big.
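To make the shape of the penalty concrete, here is a minimal sketch that evaluates $p(s) = \max\{0, 2|s - \frac{1}{2}| - 1\}$ at a few sample points. The function name `penalty` and the sample values are illustrative assumptions.

```python
import numpy as np

def penalty(s):
    """Penalty p(s) = max(0, 2|s - 1/2| - 1): zero on [0, 1], linear outside."""
    return np.maximum(0.0, 2.0 * np.abs(s - 0.5) - 1.0)

samples = np.array([-0.25, 0.0, 0.3, 1.0, 1.5])
values = penalty(samples)
```

The function vanishes on the whole interval [0, 1] and grows with slope 2 outside of it, so it has kinks at exactly the two points 0 and 1 where the text locates the numerical difficulties.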
Instead of using a penalty term one could of course also approach the constrained convex optimization problem directly. This is done for example by Bresson et al. [15]. Their approach does not need a penalty term and gives an efficient algorithm to minimize ECE , but has to introduce an additional unknown v and a regularization parameter θ and needs to minimize for u and v alternatingly. Furthermore, the key idea to apply Chambolle’s TV minimization algorithm [16] can
also be directly applied to our model to obtain a simpler and faster minimization algorithm: There is no need to introduce v, θ and the alternating minimization. Therefore it is worth investigating whether it is possible to simplify the problem by getting rid of the constraint altogether.
3 Unconstrained Global 2-Phase Minimization
Another alternative to Chan-Vese is a phase field approach [6, 5] with a typical double well term:
$$E_{PH,\epsilon}[u] := \int_{\Omega} u^2 f_1 + (1 - u)^2 f_2 + \nu \left( \frac{1}{\epsilon} u^2 (1 - u)^2 + \epsilon |\nabla u|^2 \right) dx.$$
A minimizer u of this energy is a diffuse representation of the segmentation, i.e. {u = 0} and {u = 1} represent the two segments respectively with a smooth transition in between. $E_{PH,\epsilon}[u]$ is known to Γ-converge to $E_{MS}$ [5], but unfortunately it is not convex and does not permit jumps in u for ε > 0. Knowing both $E_{CE}$ and $E_{PH,\epsilon}$, the question arises whether it is possible to combine the advantages of both models while eliminating some of the disadvantages. Heuristically looking at both energies served as motivation to investigate the following energy:
$$E[u] := \int_{\Omega} u^2 f_1 + (1 - u)^2 f_2 + \nu |\nabla u|\, dx. \qquad (5)$$
This energy is convex because it does not involve the non-convex double well term of $E_{PH,\epsilon}$, and can be minimized without imposing constraints because it does not have the indicator term from $E_{CE}$ that is not bounded from below. Furthermore, it permits jumps in u.
Remark 4. Given a function u, obviously we have E[min{max{0, u}, 1}] ≤ E[u]. Therefore, a minimizer $u_{min}$ fulfills $0 \le u_{min} \le 1$. While the proposed functional has some nice obvious properties, it is far from obvious whether there is a relation between its minimizer and minimizers of $E_{MS}$. Before we tackle this question, let us remark on a link between $E_{CE}$ and E:
Remark 5. There is a direct relationship between $E_{CE}$ and E: A straightforward calculation shows
$$u^2 f_1 + (1 - u)^2 f_2 = (f_1 - f_2) u + (u - \tfrac{1}{2})^2 (f_1 + f_2) - \tfrac{1}{4} (f_1 + f_2) + f_2.$$
Therefore
$$E[u] = \int_{\Omega} (f_1 - f_2) u + (u - \tfrac{1}{2})^2 (f_1 + f_2) - \tfrac{1}{4} (f_1 + f_2) + f_2 + \nu |\nabla u|\, dx = E_{CE}[u] + \int_{\Omega} (f_1 + f_2)(u - \tfrac{1}{2})^2\, dx + C.$$
In other words, E essentially equals $E_{CE}$ plus an additional quadratic penalty energy. The constant C is clearly irrelevant for the minimizers.
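The pointwise identity behind Remark 5 can be confirmed numerically. The following sketch draws arbitrary sample values for u, $f_1$ and $f_2$ (the sampling ranges and the seed are assumptions chosen only for illustration) and compares both sides of the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 2.0, size=1000)   # u is deliberately not restricted to [0, 1]
f1 = rng.uniform(0.0, 1.0, size=1000)
f2 = rng.uniform(0.0, 1.0, size=1000)

# Fidelity term of the unconstrained energy (5).
lhs = u ** 2 * f1 + (1 - u) ** 2 * f2
# Linear term of E_CE plus the quadratic penalty and the u-independent rest.
rhs = (f1 - f2) * u + (u - 0.5) ** 2 * (f1 + f2) - 0.25 * (f1 + f2) + f2

max_gap = np.max(np.abs(lhs - rhs))
```

The largest deviation stays at the level of floating point round-off, in line with the algebraic identity.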
To investigate the relation between the minimizers of E and minimizers of $E_{MS}$ we can make use of the theory derived in the context of the connection between minimal surface problems and total variation minimization. The following general statement has been made by Chambolle [17] and by Chambolle and Darbon [11] in the continuous setting; its discrete counterpart is well known:
Theorem 2. Let $\Psi: \Omega \times \mathbb{R} \to \mathbb{R}$, $(x, s) \mapsto \Psi(x, s)$ such that Ψ(x, ·) is $C^1$ and uniformly convex for all x ∈ Ω and
$$u := \operatorname*{argmin}_{\tilde{u}} \int_{\Omega} \Psi(x, \tilde{u}(x)) + \nu |\nabla \tilde{u}|\, dx.$$
Then $\Sigma_c := \{x \in \Omega \,|\, u(x) > c\}$ for all $c \in \mathbb{R}$ is a minimizer of
$$\int_{\Sigma} \partial_s \Psi(x, c)\, dx + \nu \operatorname{Per}(\Sigma).$$
Note that this general statement cannot be directly applied to the model of Nikolova et al. discussed in Section 2 because the integrand is neither uniformly (not even strictly) convex nor does the general statement incorporate the constraint. As remarked in [11], the proof for a more specific statement given in [10] still applies to Theorem 2.
Theorem 3. If u is a minimizer of (5), then $\{u > \frac{1}{2}\}$ minimizes
$$E_{MS}[\Sigma] = \int_{\Sigma} f_1\, dx + \int_{\Omega \setminus \Sigma} f_2\, dx + \nu \operatorname{Per}(\Sigma).$$
Proof. Let $\Psi(x, s) := s^2 f_1(x) + (1 - s)^2 f_2(x)$. Obviously Ψ(x, ·) is $C^2$ for all x ∈ Ω and we have $\partial_s \Psi(x, s) = 2 s f_1(x) + 2 (s - 1) f_2(x)$ and $\partial_s^2 \Psi(x, s) = 2 (f_1(x) + f_2(x))$. From Remark 2, we know that $f_1, f_2 > 0$ a.e., therefore Ψ(x, ·) is uniformly convex for a.e. x ∈ Ω. Now just apply Theorem 2, noting $\partial_s \Psi(x, \frac{1}{2}) = f_1(x) - f_2(x)$ and Remark 1.
In this sense, our theorem is a corollary of Theorem 2. The preceding theorem finally tells us how to find a global minimizer of $E_{MS}[\cdot]$ given in (1): Minimize the convex energy (5) and threshold the minimizer at $\frac{1}{2}$. In case of the piecewise constant Mumford-Shah functional for 2-phase segmentation, we obtain a global minimizer of the Mumford-Shah energy (2) with respect to Σ for fixed grey values $c_1, c_2$. We do not necessarily find a global minimizer with respect to Σ, $c_1$ and $c_2$. Another link between $E_{CE}$ and $E_{PH,\epsilon}$ is the so-called piecewise constant levelset method [18] for 2-phase segmentation that constrains the levelset function to be piecewise constant. If this constraint is approximated with a penalty energy, the method equals the phase field approach. If the constraint is relaxed to a certain boundedness constraint, the method equals [9]. In both cases the fidelity term has to be altered accordingly, making use of the fact that this term is the same in $E_{CE}$ and $E_{PH,\epsilon}$ if u only takes the values 0 and 1.
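The recipe "minimize the convex energy (5), then threshold at 1/2" can be sketched on a toy image. The following is only an illustration under stated assumptions: a finite difference discretization on the pixel grid, a fixed step size instead of a step size control, a smoothed absolute value $\sqrt{z^2 + \epsilon^2}$, fixed grey values, and a synthetic two-region image with bounded noise. All function names and parameter values are hypothetical.

```python
import numpy as np

def grad(u):
    """Forward differences; the last row/column difference is set to zero."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]
    d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]
    d[:, 1:] -= py[:, :-1]
    return d

def minimize_energy(f1, f2, nu, eps=0.1, tau=0.2, iters=500):
    """Explicit gradient descent on u^2 f1 + (1-u)^2 f2 + nu * |grad u|_eps."""
    u = np.full_like(f1, 0.5)
    for _ in range(iters):
        gx, gy = grad(u)
        norm = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
        # Derivative of the fidelity term minus nu times the curvature-like term.
        dE = 2 * u * f1 - 2 * (1 - u) * f2 - nu * div(gx / norm, gy / norm)
        u -= tau * dE
    return u

# Synthetic image: left half near 0.2, right half near 0.8, bounded noise.
rng = np.random.default_rng(0)
truth = np.zeros((32, 32), dtype=bool)
truth[:, 16:] = True
u0 = np.where(truth, 0.8, 0.2) + rng.uniform(-0.1, 0.1, size=truth.shape)

c1, c2 = 0.8, 0.2                      # fixed grey values (assumed known here)
f1, f2 = (u0 - c1) ** 2, (u0 - c2) ** 2
u = minimize_energy(f1, f2, nu=0.01)
segmentation = u > 0.5                 # 0.5-superlevelset of the minimizer
```

No projection onto [0, 1] is performed anywhere in the loop, and the thresholded set matches the region whose grey value is close to $c_1$ in this toy setting.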
Since (5) is similar to the Rudin-Osher-Fatemi energy [19], there is a wide variety of established minimization schemes to choose from, ranging from a straightforward gradient descent scheme with a differentiable approximation of the BV term over primal thresholding methods [20] to sophisticated methods based on the dual formulation of the BV norm, e.g. [16, 11]. With $\Psi(x, s) = \frac{1}{2} (s - (f_2(x) - f_1(x)))^2$, another immediate consequence of Theorem 2 is that the zero superlevelset of a minimizer of the ROF energy
$$E_{ROF}[u] := \int_{\Omega} \frac{1}{2} (u - (f_2 - f_1))^2 + \nu |\nabla u|\, dx \qquad (6)$$
is a global minimizer of $\hat{E}_{MS}$ and therefore of $E_{MS}$. This is another way to obtain a global minimizer of $E_{MS}$ by unconstrained convex optimization, but compared to (5) this method has a few shortcomings, cf. Sections 4 and 5. Furthermore, the boundedness mentioned in Remark 4 does not hold for minimizers of the ROF energy. Perhaps this is one of the reasons why nobody seems to have used the classical ROF functional for Mumford-Shah based image segmentation so far.
4 Multiphase Segmentation
Our functional can be extended to multiphase segmentation by using the idea of Vese and Chan [2] in a straightforward manner. To keep the notation simple, we restrict the discussion to segmentation in 4 phases. The segmentation in $2^n$ phases works analogously. Let $f_1, f_2, f_3, f_4 \in L^1(\Omega)$ such that $f_i \ge 0$ a.e., then the multiphase functional is given by
$$E[u_1, u_2] := \int_{\Omega} u_1^2 u_2^2 f_1 + (1 - u_1)^2 u_2^2 f_2 + u_1^2 (1 - u_2)^2 f_3 + (1 - u_1)^2 (1 - u_2)^2 f_4 + \nu (|\nabla u_1| + |\nabla u_2|)\, dx. \qquad (7)$$
If we fix $u_2$, the reduced functional $E[\cdot, u_2]$ is the same as the 2-phase functional (5) with the indicator functions $\tilde{f}_1 = u_2^2 f_1 + (1 - u_2)^2 f_3$ and $\tilde{f}_2 = u_2^2 f_2 + (1 - u_2)^2 f_4$. As in the 2-phase case, we can assume $f_i > 0$ a.e. without loss of generality and because either $u_2^2 > 0$ or $(1 - u_2)^2 > 0$ holds, we have $\tilde{f}_1, \tilde{f}_2 > 0$. Therefore, all statements proven for the 2-phase functional can be applied to $E[\cdot, u_2]$, i.e. we can compute the global minimum (for fixed $u_2$). The same applies for fixed $u_1$, so as an optimization strategy, we propose to minimize with respect to $u_1$ and $u_2$ alternatingly. Even though it is easy to extend (5) to multiphase segmentation, the same does not apply to the ROF energy (6). There is no apparent extension in the sense of [2] to formulate the multiphase segmentation in a single functional.
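The reduction of the 4-phase fidelity term to a 2-phase one for fixed $u_2$ can be checked with a short sketch. The function names and the random sample data are illustrative assumptions; only the formulas for the reduced indicator functions are taken from the text.

```python
import numpy as np

def reduced_indicators(u2, f1, f2, f3, f4):
    """Indicator functions of the 2-phase problem obtained by fixing u2."""
    ft1 = u2 ** 2 * f1 + (1 - u2) ** 2 * f3
    ft2 = u2 ** 2 * f2 + (1 - u2) ** 2 * f4
    return ft1, ft2

def fidelity_4phase(u1, u2, f1, f2, f3, f4):
    """Pointwise fidelity integrand of the 4-phase functional."""
    return (u1 ** 2 * u2 ** 2 * f1 + (1 - u1) ** 2 * u2 ** 2 * f2
            + u1 ** 2 * (1 - u2) ** 2 * f3 + (1 - u1) ** 2 * (1 - u2) ** 2 * f4)

rng = np.random.default_rng(1)
u1, u2 = rng.uniform(0.0, 1.0, size=(2, 500))
f1, f2, f3, f4 = rng.uniform(0.1, 1.0, size=(4, 500))

ft1, ft2 = reduced_indicators(u2, f1, f2, f3, f4)
two_phase = u1 ** 2 * ft1 + (1 - u1) ** 2 * ft2
four_phase = fidelity_4phase(u1, u2, f1, f2, f3, f4)
```

In an alternating scheme one would recompute the reduced indicator functions after every update of the fixed variable and reuse a 2-phase solver unchanged.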
5 Indicator Parameters
In typical segmentation tasks, the indicator functions depend on unknown parameters, e.g. the grey values for each segment in case of the piecewise constant
Mumford-Shah model. For the sake of simplicity, we discuss the latter model in its 2-phase formulation here, i.e. $f_i(x) := (u_0(x) - c_i)^2$, i = 1, 2, but this discussion applies to other indicator functions and multiphase segmentation as well. During the minimization of (5) we have to minimize for $c_1$ and $c_2$ as well. This is typically done in an alternating fashion, but there are two apparent possibilities to update the grey values: Minimize (5) with respect to $c_1$ and $c_2$ or do so for the energy in the set formulation (1). The two resulting updating formulae for $c_1$ are
$$c_1 = \frac{\int_{\Omega} u^2 u_0\, dx}{\int_{\Omega} u^2\, dx} \qquad \text{or} \qquad c_1 = \frac{\int_{\{u > \frac{1}{2}\}} u_0\, dx}{\int_{\{u > \frac{1}{2}\}} dx}.$$
The two possibilities only coincide if u is binary. The first formula does not only average $u_0$ in $\{u > \frac{1}{2}\}$; instead it takes into account the values of $u_0$ everywhere, but weights the values according to $u^2$. To a certain degree this is similar to the effect of the regularization of the Heaviside function in the model of Chan and Vese. In our experiments, this reduces the chance of getting stuck in local minima that can still occur when minimizing over u and the indicator parameters. Particularly in the case of multiphase segmentation it turned out to be beneficial. Due to the different way $f_1$ and $f_2$ are used in the ROF energy (6), it is not quadratic in $c_1$ and $c_2$. So this functional does not give a natural formula to update the grey values.
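The two update rules for $c_1$ can be compared on a small discrete example, with sums over pixels standing in for the integrals. The function names and the sample arrays are illustrative assumptions.

```python
import numpy as np

def update_c1_weighted(u, u0):
    """c1 from minimizing the relaxed energy: u^2-weighted mean of u0."""
    return np.sum(u ** 2 * u0) / np.sum(u ** 2)

def update_c1_thresholded(u, u0):
    """c1 from the set formulation: plain mean of u0 over {u > 1/2}."""
    mask = u > 0.5
    return u0[mask].mean()

u0 = np.array([0.1, 0.2, 0.8, 0.9])

u_binary = np.array([0.0, 0.0, 1.0, 1.0])
c1_weighted_binary = update_c1_weighted(u_binary, u0)
c1_thresholded_binary = update_c1_thresholded(u_binary, u0)

u_soft = np.array([0.1, 0.2, 0.9, 0.6])
c1_weighted_soft = update_c1_weighted(u_soft, u0)
c1_thresholded_soft = update_c1_thresholded(u_soft, u0)
```

For the binary u both rules return the same value, while for the non-binary u the weighted rule also lets pixels outside the superlevelset contribute, each with weight $u^2$.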
6 Numerical Examples
To conclude, we show the practical usability of the proposed model by applying it to the classical piecewise constant Mumford-Shah functional, see equation (2). As minimization method we use an explicit gradient descent scheme with the Armijo rule [14] as timestep size control. The absolute value is regularized by $|z|_\epsilon = \sqrt{z^2 + \epsilon^2}$ (in all examples presented here, $\epsilon = 0.1$ is used). For the spatial discretization, we use bilinear finite elements on a regular quadrilateral grid, i.e. each pixel of the input image $u_0$ corresponds to a node of the finite element mesh. The grey values $c_1$ and $c_2$ are initialized with 0 and 1 respectively and updated occasionally during the gradient descent. Figure 1 shows results of our method and of the one proposed by Nikolova et al. [9] on one artificial image and one digital photo. In both examples, the minimizer $u$ from our model is far from being binary, but a binary minimizer is not to be expected from the theory presented in this paper. The 0.5-superlevelset gives an accurate segmentation that is not influenced by the presence of heavy noise (top row) and works on non-binary input images (bottom row). The minimizers $u$ of the Nikolova et al. model look very different, but the segmentation obtained from the 0.5-superlevelsets is almost identical. Upon closer inspection, the minimizer $u$ of our model from the top row of Figure 1 looks very much like the one obtained by minimizing the ROF energy with
Fig. 1. Segmentation of an artificial noisy structure ($\nu = 2 \cdot 10^{-3}$, top row) and the well-known Matlab cameraman image ($\nu = 4 \cdot 10^{-3}$, bottom row): Input image $u_0$ (left), segmentation function $u$ and 0.5-superlevelset of $u$ colored with the average grey values $c_1$, $c_2$ obtained by our model (middle) and by using ECE (right). The slight difference of the grey values is attributed to the employed update formula, cf. Section 5.
Fig. 2. 4-phase segmentation of an artificial noisy image (top row) and an MRI image (bottom row) ($\nu = 6 \cdot 10^{-4}$): Input image $u_0$ (left), segmentation functions $u_1$ and $u_2$ (middle), segmentation colored with the average grey values $c_1, \ldots, c_4$ (right)
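$u_0$ as input image. This is not surprising due to the following observation: If $u_0$ is binary, i.e. $u_0 = \chi_A$ for a set $A \subset \Omega$, and $c_1 = 0$, $c_2 = 1$, we have $f_1 = (\chi_A - 0)^2 = \chi_A$ and $f_2 = (\chi_A - 1)^2 = \chi_{\Omega \setminus A}$ and therefore
$$E[u] = \int_\Omega (u - \chi_{\Omega \setminus A})^2 + \nu |\nabla u| \, dx,$$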
Fig. 3. Segmentation of a digital photo ($\nu = 2 \cdot 10^{-5}$). Input image $u_0$ (left), segmentation in four (middle) and eight (right) segments colored with the average grey values of the segments. Original image © bigmama / PIXELIO.
Fig. 4. Intermediate results of the segmentation in eight segments shown in Figure 3 after 50 (left), 250 (middle) and 700 (right) gradient descent steps
i.e. E equals the ROF energy in this special case. This is not the case if u0 is non-binary which can be seen from the bottom row of Figure 1. Figure 2 shows 4-phase segmentation results. Those indicate the tendency of the segmentation functions to become binary for small values of ν. Finally, Figure 3 illustrates the behavior of the method for different numbers of segments and Figure 4 shows three timesteps of the 8-phase segmentation.
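The pointwise identity behind the binary special case discussed above can be checked numerically. The sketch below assumes that the data term of the 2-phase functional (5) has the form $u^2 f_1 + (1-u)^2 f_2$, as suggested by the reduced indicator functions of the multiphase model; the random test arrays are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
chi_A = (rng.random((64, 64)) > 0.5).astype(float)  # binary input u0 = chi_A
u = rng.random((64, 64))                            # arbitrary segmentation function

c1, c2 = 0.0, 1.0
f1 = (chi_A - c1) ** 2          # equals chi_A
f2 = (chi_A - c2) ** 2          # equals the indicator of the complement of A

two_phase_data_term = u ** 2 * f1 + (1.0 - u) ** 2 * f2
rof_data_term = (u - (1.0 - chi_A)) ** 2

identity_holds = bool(np.allclose(two_phase_data_term, rof_data_term))
```

Since the regularization term $\nu|\nabla u|$ is the same in both energies, agreement of the data terms is all that needs to be verified.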
References
1. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
2. Vese, L.A., Chan, T.F.: A multiphase level set framework for image segmentation using the Mumford and Shah model. International Journal of Computer Vision 50(3), 271–293 (2002)
3. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
4. Osher, S.J., Sethian, J.A.: Fronts propagating with curvature dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
5. Shen, J.: Γ-convergence approximation to piecewise constant Mumford-Shah segmentation. In: Blanc-Talon, J., Philips, W., Popescu, D.C., Scheunders, P. (eds.) ACIVS 2005. LNCS, vol. 3708, pp. 499–506. Springer, Heidelberg (2005)
6. Esedoğlu, S., Tsai, Y.H.R.: Threshold dynamics for the piecewise constant Mumford-Shah functional. Journal of Computational Physics 211(1), 367–384 (2006)
7. Merriman, B., Bence, J.K., Osher, S.J.: Diffusion generated motion by mean curvature. CAM Report 92-18, UCLA (1992)
8. Alvino, C.V., Yezzi, A.J.: Fast Mumford-Shah segmentation using image scale space bases. In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 6498 (2007)
9. Nikolova, M., Esedoğlu, S., Chan, T.F.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM Journal on Applied Mathematics 66(5), 1632–1648 (2006)
10. Chambolle, A.: An algorithm for mean curvature motion. Interfaces and Free Boundaries 6, 195–218 (2004)
11. Chambolle, A., Darbon, J.: On total variation minimization and surface evolution using parametric maximum flows. CAM Report 08-19, UCLA (2008)
12. Alter, F., Caselles, V., Chambolle, A.: A characterization of convex calibrable sets in $\mathbb{R}^N$. Mathematische Annalen 332(2), 329–366 (2005)
13. Ambrosio, L., Fusco, N., Pallara, D.: Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs. Oxford University Press, New York (2000)
14. Kosmol, P.: Methoden zur numerischen Behandlung nichtlinearer Gleichungen und Optimierungsaufgaben, 2nd edn. Teubner, Stuttgart (1993)
15. Bresson, X., Esedoğlu, S., Vandergheynst, P., Thiran, J., Osher, S.: Fast global minimization of the active contour/snake model. Journal of Mathematical Imaging and Vision 28(2), 151–167 (2007)
16. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
17. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
18. Lie, J., Lysaker, M., Tai, X.C.: A binary level set model and some applications to Mumford-Shah image segmentation. IEEE Transactions on Image Processing 15(5), 1171–1181 (2006)
19. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
20. Daubechies, I., Defrise, M., de Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics 57(11), 1413–1457 (2004)
Extraction of the Intercellular Skeleton from 2D Images of Embryogenesis Using Eikonal Equation and Advective Subjective Surface Method

Paul Bourgine (1), Peter Frolkovič (2), Karol Mikula (2), Nadine Peyriéras (3), and Mariana Remešíková (2)

(1) CREA, Ecole Polytechnique-CNRS, 1 rue Descartes, 75005, Paris, France, [email protected]
(2) Department of Mathematics, Slovak University of Technology, Radlinského 11, 81368 Bratislava, [email protected], [email protected], [email protected]
(3) CNRS-DEPSN, Avenue de la Terasse, 91198, Gif-sur-Yvette, France, [email protected]

Abstract. We suggest an efficient method for automatic detection of the intercellular skeleton in microscope images of early embryogenesis. The method is based on the solution of two advective PDEs. First, we solve numerically the time relaxed eikonal equation in order to obtain the signed distance function to a given set: a set of points representing cell centers or a set of closed curves representing segmented inner borders of cells. The second step is a segmentation process driven by the advective version of the subjective surface equation, where the velocity field is given by the gradient of the computed distance function. The first equation is discretized by the Rouy-Tourin scheme and we suggest a fixing strategy that significantly improves the speed of the computation. The second equation is solved using a classical upwind strategy. We present several test examples and we show a practical application: the intercellular skeleton extracted from a 2D image of a zebrafish embryo.
1 Introduction
The measure of the cell contact surface (intercellular skeleton) is an important quantitative characteristic of a living organism, especially during its embryonic development [6]. Together with other characteristics, e.g. the volume of the embryo, the global and local density of cells, the density of cell divisions etc., it provides an insight into the process of the evolution of the organism and makes it possible to detect abnormalities or to compare individuals evolving in different conditions. The intercellular skeleton can be extracted from the microscope images of the evolving embryo. Fig. 1 shows an example of suitable image data. These images display significant cell structures (cell nuclei and cell membranes) of a zebrafish embryo at an early stage of its development and they were obtained by a two-photon confocal microscope.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 38–49, 2009. © Springer-Verlag Berlin Heidelberg 2009
The main goal of our paper is to introduce an efficient and easily implementable method for detecting the intercellular skeleton. Our technique is based on the numerical solution of a pair of advective partial differential equations. The first step is the solution of the time relaxed eikonal equation with a special Dirichlet type condition. The solution of such an equation is the distance function to a given set. This can be a set of points representing cell centers or a set of closed curves representing inner borders of cells obtained by segmentation. In case we deal with the curves, we construct the signed distance function with negative values in the interior part. We discretize the problem using the explicit Rouy-Tourin scheme and we propose to extend the original scheme by a fixing technique. The idea of fixing is based on the fact that the Rouy-Tourin scheme applied to the time relaxed eikonal equation produces, in every point, monotonically increasing values approaching the value of the distance function. At some moment, the value reaches a steady state and the point can be excluded from the calculations. This strategy provides a significant improvement of the efficiency of the method and it brings a natural stopping criterion for the computation. We compare the performance of our algorithm with the computation of the exact distance function and we provide some examples of situations when the numerical solution can be obtained faster.

The second step of our procedure is the segmentation using the advective version of the subjective surface equation. For each cell, we construct an initial segmentation function. Afterwards, all level sets of this function evolve according to the velocity field given by the gradient of the computed signed distance function. By taking one of the level sets of the final form of the evolving function, we obtain the part of the required intercellular skeleton corresponding to one particular cell.
The complete skeleton is constructed as the union of the results corresponding to individual cells. Using the distance function corresponding to cell centers, we get a Voronoi type cell skeleton that is already a good approximation of the real one, as the cell formations are naturally similar to a Voronoi tiling. A very realistic skeleton localization can be obtained if we consider the distance function to the segmented inner boundaries of the cells, assuming that we have a good quality cell segmentation. For practical purposes, it is even sufficient to perform only a few time steps of the Rouy-Tourin scheme and use a rough estimate of the distance function in order to obtain a correctly oriented velocity field for the segmentation. This makes the method very efficient without loss of quality of the resulting skeleton. The advective subjective surface equation is discretized by an explicit upwind approach.

The paper is organized as follows. In Sec. 2, we describe the mathematical models for the two substeps of the procedure. Sec. 3 presents the numerical schemes and explains the idea of the fixing algorithm intended to reduce the CPU time needed to compute the numerical solution of the eikonal equation. In Sec. 4, we provide a series of numerical experiments as well as an example of a skeleton extracted from a 2D microscope image of a zebrafish embryo. Let us note that we present a method for solving two-dimensional problems, but the extension to three dimensions is rather straightforward.
Fig. 1. 2D slices of 3D image of a zebrafish embryo. Left, the cell nuclei. Right, the cell membranes.
2 Mathematical Models
The first equation involved in our skeleton extraction strategy is the eikonal equation with time relaxation
$$d_t + |\nabla d| = 1 \tag{1}$$
solved in the domain $\Omega \times [0, T_D]$, where $\Omega$ is the image domain, and coupled with a Dirichlet type condition
$$d(x, t) = 0, \quad x \in \Omega_0 \subset \Omega. \tag{2}$$
With the problem formulated in this way, the solution $d$ approaches, as time evolves, the distance function to the set $\Omega_0$. In our case, as we have already mentioned, $\Omega_0$ can be a set of points corresponding to approximate cell centers or a set of closed curves representing the segmented inner boundaries of the cells. The signed distance function can be constructed straightforwardly. The result of the cell shape segmentation is a level set function. Choosing one of the level sets to represent the inner boundary of the cell, we are able to recognize the inner and outer parts of the cell [1, 2] and assign the corresponding sign to the distance function. The distance function corresponding to a set of points is always positive. In the second step, we use the computed signed distance function in the advective part of the subjective surface model [4, 2] and we solve the equation
$$u_t + \nabla g \cdot \nabla u = 0 \tag{3}$$
where $(x, t) \in \Omega \times [0, T_S]$ and $g(x) = d(x, T_D)$ according to [8] or $g(x) = -1/(1 + K d^p(x, T_D))$ with $K > 0$, $p > 0$ as in [4, 2]. The unknown function $u$ is initialized by a piecewise constant profile localized around the approximate cell center. Then it is evolved by (3). The intercellular borders are represented by a chosen level set of the function $u(x, T_S)$. Due to the properties of the function $d$ (see Fig. 3 and 6), the border lines of the neighboring cells correspond to the ridges of the distance function and are attached to each other and thus form the intercellular skeleton.
3 Numerical Schemes
3.1 Time Relaxation Method with Fixing for Computing the Distance Function
In order to solve the equation (1) with the condition (2) numerically, we use an explicit time discretization with time step $\tau_D$. Afterwards, the equation is discretized in space by the Rouy-Tourin scheme [3], cf. also [5, 7]. As is natural for image processing applications, the space grid elements correspond to the pixels of the image. Let us consider a rectangular space domain with dimensions $L_x \times L_y$. The space grid is then uniform and consists of square elements $V_{ij}$, $i = 1 \ldots n_x$, $j = 1 \ldots n_y$, $n_x = L_x / h_D$, $n_y = L_y / h_D$, where $h_D$ is the length of the side of the pixel. For each volume $V_{ij}$, let $d^n_{ij}$ represent the approximate value of the solution $d$ in the center of $V_{ij}$ in time step $n$. Let us define $M^{pq}_{ij}$, $p, q \in \{-1, 0, 1\}$, $|p| + |q| = 1$, as
$$M^{pq}_{ij} = \left( \min\left( d^n_{i+p,j+q} - d^n_{ij},\ 0 \right) \right)^2 .$$
The Rouy-Tourin scheme for problem (1) then reads as follows
$$d^{n+1}_{ij} = d^n_{ij} + \tau_D - \frac{\tau_D}{h_D} \sqrt{ \max\left( M^{-1,0}_{ij}, M^{1,0}_{ij} \right) + \max\left( M^{0,-1}_{ij}, M^{0,1}_{ij} \right) } . \tag{4}$$
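Scheme (4), combined with the fixing idea outlined in the Introduction (a node is frozen once its value stops changing), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rouy_tourin_fixing`, the tolerance `tol` used in place of an exact equality test, and the treatment of neighbours outside the grid (ignored) are all hypothetical choices, and the initial fixed set contains only $\Omega_0$ itself. For brevity the vectorized update is evaluated on all nodes and merely discarded on fixed ones, so the sketch shows the logic of fixing but not its CPU savings.

```python
import numpy as np

def rouy_tourin_fixing(source, h=1.0, tau=None, tol=1e-12, max_steps=10_000):
    """Approximate distance to the grid nodes where `source` is True."""
    tau = 0.5 * h if tau is None else tau   # largest step within the bound tau <= h/2
    d = np.zeros(source.shape)              # d = 0 initially; source nodes stay at 0
    fixed = source.copy()                   # initial fixed set (assumption: only Omega_0)
    for _ in range(max_steps):
        if fixed.all():                     # every node is fixed: natural stopping criterion
            break
        # Pad with +inf so that neighbours outside the grid never contribute.
        p = np.pad(d, 1, mode="constant", constant_values=np.inf)
        c = p[1:-1, 1:-1]

        def m(neighbour):
            # M_ij^{pq} = (min(d_neighbour - d_ij, 0))^2
            return np.minimum(neighbour - c, 0.0) ** 2

        m_first = np.maximum(m(p[:-2, 1:-1]), m(p[2:, 1:-1]))    # neighbours along axis 0
        m_second = np.maximum(m(p[1:-1, :-2]), m(p[1:-1, 2:]))   # neighbours along axis 1
        new = d + tau - (tau / h) * np.sqrt(m_first + m_second)  # update (4)
        new = np.where(fixed, d, new)                            # fixed nodes keep their value
        fixed = fixed | (np.abs(new - d) <= tol)                 # freeze nodes at steady state
        d = new
    return d

# One-dimensional check: a single source node at the left end of a row of pixels.
source_row = np.zeros((1, 6), dtype=bool)
source_row[0, 0] = True
d_row = rouy_tourin_fixing(source_row)
```

In this one-dimensional configuration the steady state of (4) is the exact distance $0, 1, \ldots, 5$; in two dimensions the scheme yields only a first-order approximation of the Euclidean distance.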
This scheme is stable for $\tau_D \le h_D / 2$ and we take advantage of the fact that it produces monotonically increasing updates that gradually approach a steady state. This property allows us to implement (4) in a computationally efficient way. Let us consider the index set $F^n$ that contains the indices $(i, j)$ of the volumes where the steady state has already been reached, i.e. there exists $n_0 \in \mathbb{N}$, $n_0 \le n$, such that $d^{n_0}_{ij} = d^{n_0 - 1}_{ij}$. The set $F^0$ is given as follows. At the beginning, we compute exact distances to the set $\Omega_0$ (which is a set of points corresponding to cell centers or a set of curves representing the inner boundaries of cells) in a local (one pixel) neighborhood. Then $F^0$ consists of the indices of all volumes with these exact values, including the set $\Omega_0$. Then the method is given by Algorithm 1.

Algorithm 1. Fixing method for distance function
• if $(i, j) \in F^n$ then continue
• else
  • $d^{n+1}_{ij} = d^n_{ij} + \tau_D - \frac{\tau_D}{h_D} \sqrt{ \max\left( M^{-1,0}_{ij}, M^{1,0}_{ij} \right) + \max\left( M^{0,-1}_{ij}, M^{0,1}_{ij} \right) }$
  • if $d^{n+1}_{ij} = d^n_{ij}$ then $F^{n+1} = F^n \cup \{(i, j)\}$

3.2 Advective Subjective Surface Method for Detecting the Intercellular Skeleton

Now we discretize equation (3). Again, we consider explicit time discretization with time step $\tau_S$ and the space grid elements are identified with the pixels of the image. The space discretization is based on the upwind principle. If the length of the side of the pixel is denoted by $h_S$ and we define the central differences $D^x_{ij} g = (g_{i+1,j} - g_{i-1,j}) / (2 h_S)$, $D^y_{ij} g = (g_{i,j+1} - g_{i,j-1}) / (2 h_S)$, we get the following approximation of (3)
$$u^{n+1}_{ij} = u^n_{ij} - \frac{\tau_S}{h_S} \Big[ \max\left( D^x_{ij} g, 0 \right) (u^n_{ij} - u^n_{i-1,j}) + \min\left( D^x_{ij} g, 0 \right) (u^n_{i+1,j} - u^n_{ij}) + \max\left( D^y_{ij} g, 0 \right) (u^n_{ij} - u^n_{i,j-1}) + \min\left( D^y_{ij} g, 0 \right) (u^n_{i,j+1} - u^n_{ij}) \Big] . \tag{5}$$
As the initial condition we take a shock-like profile localized around the cell center. Due to the properties of the signed distance function computed by the method described in Sec. 3.1, we can see that the advective velocity $\nabla g$ drives all level lines of the initial segmentation function to the ridges of the distance function, cf. Fig. 6. These ridges represent the intercellular skeleton.
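One explicit time step of the upwind scheme (5) can be sketched as below. The function name `upwind_step` and the boundary treatment (zero velocity on the border rows and columns, replicated values of $u$ outside the grid) are assumptions made for this illustration; the text does not specify them.

```python
import numpy as np

def upwind_step(u, g, tau, h):
    """One explicit step of (5) for u_t + grad(g) . grad(u) = 0."""
    # Central differences of g; the velocity is set to zero on the border (assumption).
    gx = np.zeros_like(g, dtype=float)
    gy = np.zeros_like(g, dtype=float)
    gx[1:-1, :] = (g[2:, :] - g[:-2, :]) / (2.0 * h)
    gy[:, 1:-1] = (g[:, 2:] - g[:, :-2]) / (2.0 * h)

    # One-sided differences of u, with edge values replicated outside the grid.
    p = np.pad(u, 1, mode="edge")
    c = p[1:-1, 1:-1]
    back_x, fwd_x = c - p[:-2, 1:-1], p[2:, 1:-1] - c
    back_y, fwd_y = c - p[1:-1, :-2], p[1:-1, 2:] - c

    # Upwinding: backward difference where the velocity is positive, forward otherwise.
    flux = (np.maximum(gx, 0.0) * back_x + np.minimum(gx, 0.0) * fwd_x
            + np.maximum(gy, 0.0) * back_y + np.minimum(gy, 0.0) * fwd_y)
    return u - (tau / h) * flux

# Toy check: g grows linearly along the first axis, so the velocity is (1, 0)
# in the interior and a step profile is transported by one pixel when tau = h.
u_init = np.zeros((10, 4))
u_init[:3, :] = 1.0
g_linear = np.arange(10.0)[:, None] * np.ones((1, 4))
u_next = upwind_step(u_init, g_linear, tau=1.0, h=1.0)
```

With `tau = h` and unit velocity the step profile is shifted by exactly one pixel; smaller time steps give a smeared but still monotone front, which is the usual behaviour of first-order upwinding.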
4 Numerical Experiments
4.1 Computation of the Signed Distance Function
Now let us present some computational results obtained by Algorithm 1. We inspected the experimental order of convergence of the suggested method, the CPU time needed for the computation and the effect of the fixing strategy. The results were also compared with the distance function computed analytically. First, let us make a note about the stopping criteria for the methods. If the fixing technique is not applied, the computation is stopped either when $\|d^{n+1} - d^n\|_{L^1(\Omega)} \le \varepsilon_1$ […]

[…] $0\}$ and $\Omega \setminus D = \{x \in \Omega;\ \phi(x) < 0\}$. The level values $c^1, c^2 \in \mathbb{R}$ are unknown and have to be determined as well. As already observed in [3], the Heaviside operator $H$ maps $H^1(\Omega)$ into the set $V := \{\chi_D;\ D \subset \Omega \text{ measurable},\ \mathcal{H}^{n-1}(\partial D) < \infty\}$, where $\mathcal{H}^{n-1}(S)$ denotes the $(n-1)$-dimensional Hausdorff measure of the set $S$. Therefore, the operator $P$ in (3) maps $H^1(\Omega) \times \mathbb{R}^2$ into the admissible parameter set $U := \{u = q(v, c^1, c^2);\ v \in V \text{ and } c^1, c^2 \in \mathbb{R}\}$, where $q : V \times \mathbb{R}^2 \ni (v, c^1, c^2) \mapsto c^1 v + c^2 (1 - v) \in L^\infty(\Omega)$. Using the level set framework introduced above, the inverse problem in (1), with data given as in (2), can be written in the form of the operator equation
$$F(P(\phi, c^1, c^2)) = y^\delta . \tag{4}$$
Once an approximate solution $(\phi, c^1, c^2)$ of (4) is obtained, a corresponding solution of (1) can be computed using equation (3). In this article, approximate solutions to (4) are obtained by minimizing the Tikhonov functional
$$G_\alpha(\phi, c^1, c^2) := \|F(P(\phi, c^1, c^2)) - y^\delta\|_Y^2 + \alpha \Big\{ \beta_1 |H(\phi)|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 \Big\} , \tag{5}$$
based on TV-$H^1$ penalization. Here $\phi_0$ and $c^j_0$ are known reference parameters. This Tikhonov functional extends the ones proposed in [5, 6, 9] (based on TV penalization) and [3, 8] (based on TV-$H^1$ penalization). To motivate the regularization terms in (5), notice that they enforce: i) the boundedness of the level lines of $\phi$ as well as of its $H^1$-norm; ii) the boundedness of $c^j$. These two facts allow us to guarantee existence of (generalized) minimizers of $G_\alpha$ in $L^\infty \cap BV$.
A. DeCezaro, A. Leitão, and X.-C. Tai
This article is outlined as follows: In Section 2 we introduce the concept of generalized minimizers for the functional Gα in (5). In Section 3 we derive a convergence analysis for this Tikhonov approach. In Section 4 we introduce stabilized functionals and prove that the corresponding minimizers approximate a minimizer of Gα . Section 5 is devoted to numerical experiments. A level set type method is implemented for solving a two-dimensional inverse potential problem.
2 The Concept of Generalized Minimizers
We shall consider the model problem described in the introduction under the following general assumptions:

(A1) $\Omega \subseteq \mathbb{R}^n$ is bounded, connected, with piecewise $C^1$ boundary $\partial\Omega$.
(A2) The operator $F : D(F) \subset L^1(\Omega) \to Y$ is continuous and Fréchet-differentiable on $D(F)$ with respect to the $L^1(\Omega)$-topology.
(A3) $\varepsilon$, $\alpha$ and $\beta_j$, $j = 1, 2, 3$, denote positive parameters.
(A4) Equation (1) has a solution, i.e. there exists $u \in U$ satisfying $F(u) = y$ and a function $\phi \in H^1(\Omega)$ satisfying $|\nabla\phi| \neq 0$ in a neighborhood of $\{\phi = 0\}$ such that $H(\phi) = z$, for some $z \in V$. Moreover, there exist constant values $c^1, c^2 \in \mathbb{R}$ such that $q(z, c^1, c^2) = u$.

For each $\varepsilon > 0$, we define the operator
$$P_\varepsilon(\phi, c^1, c^2) := c^1 H_\varepsilon(\phi) + c^2 (1 - H_\varepsilon(\phi)) , \tag{6}$$
where $H_\varepsilon$ is the continuous approximation to $H$ given by
$$H_\varepsilon(t) := \begin{cases} 1 + t/\varepsilon & \text{for } t \in [-\varepsilon, 0], \\ H(t) & \text{for } t \in \mathbb{R} \setminus [-\varepsilon, 0]. \end{cases}$$
In order to guarantee existence of a minimizer of $G_\alpha$ in (5), we adapt the concept of generalized minimizers formulated in [3] to the level-set framework described above.

Definition 1. Let the operators $H$, $P$, $H_\varepsilon$ and $P_\varepsilon$ be defined as above.
a) A vector $(z, \phi, c^1, c^2) \in L^\infty(\Omega) \times H^1(\Omega) \times \mathbb{R}^2$ is called admissible when there exists a sequence $\{\phi_k\}$ of $H^1(\Omega)$-functions satisfying $\lim_k \|\phi_k - \phi\|_{L^2} = 0$, and also there exists a sequence $\{\varepsilon_k\} \subset \mathbb{R}^+$ converging to zero such that $\lim_k \|H_{\varepsilon_k}(\phi_k) - z\|_{L^1} = 0$.
b) A minimizer of $G_\alpha$ is considered to be any admissible vector $(z, \phi, c^1, c^2)$ minimizing
$$G_\alpha(z, \phi, c^1, c^2) := \|F(q(z, c^1, c^2)) - y^\delta\|_Y^2 + \alpha R(z, \phi, c^1, c^2) \tag{7}$$
over the set of admissible vectors, where
$$R(z, \phi, c^1, c^2) = \rho(z, \phi) + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 , \tag{8}$$
$$\rho(z, \phi) := \inf \liminf_{k \to \infty} \Big\{ \beta_1 |H_{\varepsilon_k}(\phi_k)|_{BV} + \beta_2 \|\phi_k - \phi_0\|_{H^1}^2 \Big\} . \tag{9}$$
The infimum in (9) is taken over all sequences $\{\varepsilon_k\}$ and $\{\phi_k\}$ characterizing $(z, \phi, c^1, c^2)$ as an admissible vector.
On Level-Set Type Methods
c) A generalized minimizer of $G_\alpha(\phi, c^1, c^2)$ is an admissible vector $(z, \phi, c^1, c^2)$ minimizing the functional $G_\alpha$ in (7) on the set of admissible vectors.

2.1 Relevant Properties of Admissible Vectors
First we verify some basic properties of the operators $P_\varepsilon$, $H_\varepsilon$ and $q$ that will be necessary in the subsequent analysis.

Lemma 1. Let $\Omega$ be given as above and $j = 1, 2$.
(i) Let $\{z_k\}_{k \in \mathbb{N}}$ be a bounded sequence in $L^\infty(\Omega)$ converging to some element $z$ in $L^1(\Omega)$ and $\{c^j_k\}_{k \in \mathbb{N}}$ be a sequence of real numbers converging to $c^j$. Then $q(z_k, c^1_k, c^2_k)$ converges to $q(z, c^1, c^2)$ in $L^1(\Omega)$.
(ii) Let $(z, \phi) \in L^1(\Omega) \times H^1(\Omega)$ be such that $H_\varepsilon(\phi) \to z$ in $L^1(\Omega)$ as $\varepsilon \to 0$ and let $c^1, c^2 \in \mathbb{R}$. Then $P_\varepsilon(\phi, c^1, c^2) \to q(z, c^1, c^2)$ in $L^1(\Omega)$ as $\varepsilon \to 0$.
(iii) Given $\varepsilon > 0$, let $\{\phi_k\}_{k \in \mathbb{N}}$ be a sequence in $H^1(\Omega)$ converging to $\phi \in H^1(\Omega)$ in the $L^2$-norm. Then $H_\varepsilon(\phi_k) \to H_\varepsilon(\phi)$ in $L^1(\Omega)$, as $k \to \infty$. Moreover, if $\{c^j_k\}_{k \in \mathbb{N}}$ are sequences of real numbers converging to some $c^j$, then $q(H_\varepsilon(\phi_k), c^1_k, c^2_k) \to q(H_\varepsilon(\phi), c^1, c^2)$ in $L^1(\Omega)$, as $k \to \infty$.

Proof. Since $\Omega$ is assumed to be bounded, we have $L^\infty(\Omega) \subset L^1(\Omega)$. To prove (i), notice that
$$\begin{aligned} \|q(z_k, c^1_k, c^2_k) - q(z, c^1, c^2)\|_{L^1} &= \int_\Omega |c^1_k z_k + c^2_k (1 - z_k) - c^1 z - c^2 (1 - z)| \, dx \\ &\le \int_\Omega |z_k| \left( |c^1_k - c^1| + |c^2_k - c^2| \right) dx + \int_\Omega \left( |c^1| + |c^2| \right) |z_k - z| + |c^2_k - c^2| \, dx \\ &\le |\Omega| \, \|z_k\|_{L^\infty} \left( |c^1_k - c^1| + |c^2_k - c^2| \right) + \left( |c^1| + |c^2| \right) \|z_k - z\|_{L^1} + |\Omega| \, |c^2_k - c^2| , \end{aligned}$$
which converges to zero as $k \to \infty$. Assertion (ii) follows with similar arguments. The first part of assertion (iii) is a direct consequence of the inequality $\|H_\varepsilon(\phi_k) - H_\varepsilon(\phi)\|_{L^1(\Omega)} \le \varepsilon^{-1} \sqrt{\mathrm{meas}(\Omega)} \, \|\phi_k - \phi\|_{L^2(\Omega)}$. The second part of assertion (iii) follows by a combination of the inequality above and assertion (i).

Lemma 2. Let $(z_k, \phi_k, c^1_k, c^2_k)$ be a sequence of admissible vectors converging in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$ to some $(z, \phi, c^1, c^2)$. Then $(z, \phi, c^1, c^2)$ is also an admissible vector.

Sketch of the proof. In order to prove that $(z, \phi, c^1, c^2)$ is also an admissible vector, one uses an argument of extraction of diagonal subsequences, analogously to [8, Lemma 2].

2.2 Relevant Properties of the Penalization Functional
In the next lemmas we verify two properties of the functional R which are fundamental for the convergence analysis in Section 3. Lemma 3. The functional R in (8) is coercive on the set of admissible vectors.
Sketch of the proof. Let $(z, \phi, c^1, c^2)$ be an admissible vector. From [8, Lemma 4] it follows that
$$\rho(z, \phi) \ge \beta_1 |z|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 . \tag{10}$$
Now, from (10) and the definition of $R$ in (8) it follows that
$$\beta_1 |z|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 \le \rho(z, \phi) + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 = R(z, \phi, c^1, c^2) ,$$
concluding the proof.
Lemma 4. The functional $R$ in (8) is weak lower semi-continuous on the set of admissible vectors, i.e. given a sequence $\{(z_k, \phi_k, c^1_k, c^2_k)\}$ of admissible vectors such that $z_k \to z$ in $L^1(\Omega)$, $\phi_k \rightharpoonup \phi$ in $H^1(\Omega)$, $c^j_k \to c^j$ in $\mathbb{R}$, for some admissible vector $(z, \phi, c^1, c^2)$, then it follows that
$$R(z, \phi, c^1, c^2) \le \liminf_{k \in \mathbb{N}} R(z_k, \phi_k, c^1_k, c^2_k) .$$
Proof. The functional $\rho(z, \phi)$ is weak lower semi-continuous, cf. [8, Lemma 5]. Moreover, the Euclidean norm in $\mathbb{R}^2$ is also lower semi-continuous. The lemma follows from the fact that the functional $R$ in (8) is a linear combination of lower semi-continuous functionals.
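The operators $H_\varepsilon$, $q$ and $P_\varepsilon$ from Section 2, and the Lipschitz-type estimate used for Lemma 1 (iii), can be illustrated on a pixel grid. The sketch below is only a discrete sanity check under assumptions that are not part of the text: $\Omega = (0,1)^2$ is discretized by $n \times n$ cells, integrals are replaced by cell sums, $H(t)$ is taken to be 1 for $t > 0$ and 0 for $t < 0$, and all function and variable names are hypothetical.

```python
import numpy as np

def heaviside_eps(t, eps):
    """H_eps: 0 for t < -eps, 1 + t/eps on [-eps, 0], 1 for t > 0."""
    return np.clip(1.0 + t / eps, 0.0, 1.0)

def q(v, c1, c2):
    """q(v, c1, c2) = c1 * v + c2 * (1 - v)."""
    return c1 * v + c2 * (1.0 - v)

def P_eps(phi, c1, c2, eps):
    """P_eps(phi, c1, c2) = q(H_eps(phi), c1, c2), cf. (6)."""
    return q(heaviside_eps(phi, eps), c1, c2)

n, eps = 32, 0.1
area = 1.0 / n ** 2                    # cell area, so that meas(Omega) = 1
rng = np.random.default_rng(1)
phi = rng.normal(size=(n, n))
phi_k = phi + 0.05 * rng.normal(size=(n, n))

# Discrete L1 norm of H_eps(phi_k) - H_eps(phi) and the bound from Lemma 1 (iii).
lhs = np.abs(heaviside_eps(phi_k, eps) - heaviside_eps(phi, eps)).sum() * area
rhs = (1.0 / eps) * np.sqrt(1.0) * np.sqrt(((phi_k - phi) ** 2).sum() * area)
bound_holds = bool(lhs <= rhs + 1e-12)
```

The bound holds in the discrete setting for the same two reasons as in the continuous one: $H_\varepsilon$ is Lipschitz with constant $\varepsilon^{-1}$, and the Cauchy-Schwarz inequality converts the $L^1$ norm into the $L^2$ norm at the cost of the factor $\sqrt{\mathrm{meas}(\Omega)}$.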
3 Convergence Analysis
First we prove that for any positive parameters $\alpha$, $\beta_j$ the functional $G_\alpha$ in (5) is well posed.

Theorem 1 (Well-Posedness). The functional $G_\alpha$ in (5) attains minimizers on the set of admissible vectors.

Proof. Notice that the set of admissible vectors is not empty, since $(0, 0, 0, 0)$ is admissible. Let $\{(z_k, \phi_k, c^1_k, c^2_k)\}$ be a minimizing sequence for $G_\alpha$, i.e. a sequence of admissible vectors satisfying $G_\alpha(z_k, \phi_k, c^1_k, c^2_k) \to \inf G_\alpha \le G_\alpha(0, 0, 0, 0) < \infty$. Then, $\{G_\alpha(z_k, \phi_k, c^1_k, c^2_k)\}$ is a bounded sequence of real numbers. Therefore, $\{(z_k, \phi_k, c^1_k, c^2_k)\}$ is uniformly bounded in $BV \times H^1(\Omega) \times \mathbb{R}^2$. Thus, the Sobolev compact embedding theorem [10] and the Bolzano-Weierstrass theorem guarantee the existence of a subsequence (denoted again by $\{(z_k, \phi_k, c^1_k, c^2_k)\}$) and the existence of $(z, \phi, c^1, c^2) \in L^1(\Omega) \times H^1(\Omega) \times \mathbb{R}^2$ such that $\phi_k \to \phi$ in $L^2(\Omega)$, $\phi_k \rightharpoonup \phi$ in $H^1(\Omega)$, $z_k \to z$ in $L^1(\Omega)$ and $c^j_k \to c^j$ in $\mathbb{R}$. From Lemma 2 we conclude that $(z, \phi, c^1, c^2)$ is an admissible vector. Moreover, from the weak lower semi-continuity of $R$ together with the continuity of $F$ and $q$ we obtain
$$\begin{aligned} \lim_{k \to \infty} G_\alpha(z_k, \phi_k, c^1_k, c^2_k) &= \lim_{k \to \infty} \Big\{ \|F(q(z_k, c^1_k, c^2_k)) - y^\delta\|_Y^2 + \alpha R(z_k, \phi_k, c^1_k, c^2_k) \Big\} \\ &\ge \|F(q(z, c^1, c^2)) - y^\delta\|_Y^2 + \alpha R(z, \phi, c^1, c^2) = G_\alpha(z, \phi, c^1, c^2) , \end{aligned} \tag{11}$$
proving that $(z, \phi, c^1, c^2)$ minimizes $G_\alpha$.
In the next theorems we present the main convergence and stability results. The proofs use classical techniques from the analysis of Tikhonov type regularization methods (see, e.g., [11, 12]).

Theorem 2 (Convergence for exact data). Assume that we have exact data, i.e. $y^\delta = y$, and $\beta_j > 0$, $j = 1, 2, 3$. For every $\alpha > 0$ let $(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha)$ denote a minimizer of $G_\alpha$ on the set of admissible vectors. Then, for every sequence of positive numbers $\{\alpha_k\}_{k \in \mathbb{N}}$ converging to zero there exists a subsequence, denoted again by $\{\alpha_k\}_{k \in \mathbb{N}}$, such that $(z_{\alpha_k}, \phi_{\alpha_k}, c^1_{\alpha_k}, c^2_{\alpha_k})$ is strongly convergent in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$. Moreover, the limit is a solution of (1).

Proof. Let $(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger})$ be a solution of (1); its existence is guaranteed by assumption (A4). Let $\{\alpha_k\}_{k \in \mathbb{N}}$ be a sequence of positive numbers converging to zero. For each $k \in \mathbb{N}$, let $(z_k, \phi_k, c^1_k, c^2_k) := (z_{\alpha_k}, \phi_{\alpha_k}, c^1_{\alpha_k}, c^2_{\alpha_k})$ be a minimizer of $G_{\alpha_k}$. Then, for each $k \in \mathbb{N}$ we have
$$G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k) \le \|F(q(z^\dagger, c^{1,\dagger}, c^{2,\dagger})) - y\|_Y^2 + \alpha_k R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) = \alpha_k R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) . \tag{12}$$
Since $\alpha_k R(z_k, \phi_k, c^1_k, c^2_k) \le G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k)$, it follows from (12) that
$$R(z_k, \phi_k, c^1_k, c^2_k) \le R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) < \infty . \tag{13}$$
Moreover, from the assumption on the sequence $\{\alpha_k\}$, it follows that
$$\lim_{k \to \infty} \alpha_k R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) = 0 . \tag{14}$$
From (13) and Lemma 3 we conclude that the sequences $\{\phi_k\}$, $\{z_k\}$ and $\{c^j_k\}_{j=1,2}$ are bounded in $H^1(\Omega)$, $BV$ and $\mathbb{R}^2$ respectively. Using an argument of extraction of diagonal subsequences (see proof of Lemma 2) we can guarantee the existence of an admissible vector $(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2)$ such that
$$(z_k, \phi_k, c^1_k, c^2_k) \to (\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \quad \text{in } L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2 .$$
From Lemma 1 (i) it follows that $q(\tilde z, \tilde c^1, \tilde c^2) = \lim_{k \to \infty} q(z_k, c^1_k, c^2_k)$ in $L^1(\Omega)$. Using the continuity of the operator $F$ together with (12) and (14) we conclude that
$$y = \lim_{k \to \infty} F(q(z_k, c^1_k, c^2_k)) = F(q(\tilde z, \tilde c^1, \tilde c^2)) .$$
On the other hand, from the lower semi-continuity of $R$ and (13) it follows that
$$R(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \le \liminf_{k \to \infty} R(z_k, \phi_k, c^1_k, c^2_k) \le \limsup_{k \to \infty} R(z_k, \phi_k, c^1_k, c^2_k) \le R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) ,$$
concluding the proof.
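Theorem 3 (Convergence for noisy data). Let $\alpha = \alpha(\delta)$ be a function satisfying $\lim_{\delta \to 0} \alpha(\delta) = 0$ and $\lim_{\delta \to 0} \delta^2 \alpha(\delta)^{-1} = 0$. Moreover, let $\{\delta_k\}_{k \in \mathbb{N}}$ be a sequence of positive numbers converging to zero and $y^{\delta_k} \in Y$ be corresponding noisy data satisfying (2). Then, there exist a subsequence, denoted again by $\{\delta_k\}$, and a sequence $\{\alpha_k := \alpha(\delta_k)\}$ such that $(z_{\alpha_k}, \phi_{\alpha_k}, c^1_{\alpha_k}, c^2_{\alpha_k})$ converges in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$ to a solution of (1).

Proof. Let $(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger})$ be a solution of (1).¹ For each $k \in \mathbb{N}$, denote by $(z_k, \phi_k, c^1_k, c^2_k) := (z_{\alpha(\delta_k)}, \phi_{\alpha(\delta_k)}, c^1_{\alpha(\delta_k)}, c^2_{\alpha(\delta_k)})$ a minimizer of $G_{\alpha(\delta_k)}$. Then, for each $k \in \mathbb{N}$ we have
$$G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k) \le \|F(q(z^\dagger, c^{1,\dagger}, c^{2,\dagger})) - y^{\delta_k}\|_Y^2 + \alpha(\delta_k) R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) \le \delta_k^2 + \alpha(\delta_k) R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) . \tag{15}$$
Taking the limit $k \to \infty$ in (15), it follows from the assumptions of the theorem that
$$\lim_{k \to \infty} \|F(q(z_k, c^1_k, c^2_k)) - y^{\delta_k}\|_Y^2 \le \lim_{k \to \infty} G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k) = 0 .$$
Therefore, $\lim_{k \to \infty} F(q(z_k, c^1_k, c^2_k)) = y$. Moreover, from (15) and the definition of $G_{\alpha_k}$, it follows that $R(z_k, \phi_k, c^1_k, c^2_k) \le \delta_k^2 \alpha(\delta_k)^{-1} + R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger})$. Thus, from the assumptions on the function $\alpha(\delta)$, we conclude that
$$\limsup_{k \to \infty} R(z_k, \phi_k, c^1_k, c^2_k) \le R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) .$$
The proof follows arguing as in the proof of Lemma 2.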
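As a simple illustration of the parameter choice required in Theorem 3, one admissible family (chosen here only as an example, not prescribed by the text) is a power law in the noise level:

```latex
\alpha(\delta) = \delta^{p}, \qquad 0 < p < 2 .
% Both conditions of Theorem 3 are then satisfied:
\lim_{\delta \to 0} \alpha(\delta) = \lim_{\delta \to 0} \delta^{p} = 0 ,
\qquad
\lim_{\delta \to 0} \delta^{2} \alpha(\delta)^{-1} = \lim_{\delta \to 0} \delta^{2-p} = 0 .
```

For instance, $p = 1$ gives $\alpha(\delta) = \delta$. The choice $p = 2$ is excluded, since then $\delta^2 \alpha(\delta)^{-1} = 1$ does not tend to zero.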
4 Numerical Solution
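In the sequel we introduce a functional which can be handled numerically, and whose minimizers are 'close' to the minimizers of $G_\alpha$. Let $G_{\varepsilon,\alpha}$ be the stabilized functional defined by
$$G_{\varepsilon,\alpha}(\phi, c^1, c^2) := \|F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta\|_Y^2 + \alpha \Big\{ \beta_1 |H_\varepsilon(\phi)|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 \Big\} , \tag{16}$$
where $P_\varepsilon(\phi, c^1, c^2) := q(H_\varepsilon(\phi), c^1, c^2)$ is the operator defined in (6). The functional $G_{\varepsilon,\alpha}$ is well-posed as the following lemma shows:

Lemma 5. Given positive constants $\alpha$, $\varepsilon$, $\beta_j$, $j = 1, 2, 3$ as above, a function $\phi_0 \in H^1(\Omega)$ and $c^j_0 \in \mathbb{R}$, $j = 1, 2$, the functional $G_{\varepsilon,\alpha}$ in (16) attains a minimizer on $H^1(\Omega) \times \mathbb{R}^2$.

Proof. Since $\inf\{G_{\varepsilon,\alpha}(\phi, c^1, c^2) : (\phi, c^1, c^2) \in H^1(\Omega) \times \mathbb{R}^2\} \le G_{\varepsilon,\alpha}(0, 0, 0) < \infty$, there exists a minimizing sequence $\{(\phi_k, c^1_k, c^2_k)\}$ in $H^1(\Omega) \times \mathbb{R}^2$ satisfying
$$\lim_{k \to \infty} G_{\varepsilon,\alpha}(\phi_k, c^1_k, c^2_k) = \inf\{G_{\varepsilon,\alpha}(\phi, c^1, c^2) : (\phi, c^1, c^2) \in H^1(\Omega) \times \mathbb{R}^2\} .$$

¹ The existence of solutions is guaranteed by (A4).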
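Then, for fixed $\alpha > 0$, the sequences $\{\phi_k\}$ and $\{c^j_k\}_{j=1,2}$ are bounded in $H^1(\Omega)$ and $\mathbb{R}^2$ respectively. Therefore, passing to a subsequence if necessary, $\phi_k \rightharpoonup \phi$ in $H^1(\Omega)$ and $c^j_k \to c^j$ in $\mathbb{R}$, $j = 1, 2$. Moreover, by the weak lower semi-continuity of the $H^1$-norm and the continuity of the Euclidean norm in $\mathbb{R}$, it follows that
$$\|\phi - \phi_0\|_{H^1}^2 \le \liminf_{k \to \infty} \|\phi_k - \phi_0\|_{H^1}^2 \quad \text{and} \quad |c^j - c^j_0| \le \liminf_{k \to \infty} |c^j_k - c^j_0| .$$
From the Sobolev compact embedding theorem [13] we have $\phi_k \to \phi$ in $L^2(\Omega)$. Therefore, Lemma 1 implies
$$\|H_\varepsilon(\phi_k) - H_\varepsilon(\phi)\|_{L^1} \le \varepsilon^{-1} \sqrt{\mathrm{meas}(\Omega)} \, \|\phi_k - \phi\|_{L^2} \to 0 ,$$
$$\|P_\varepsilon(\phi_k, c^1_k, c^2_k) - P_\varepsilon(\phi, c^1, c^2)\|_{L^1} = \|q(H_\varepsilon(\phi_k), c^1_k, c^2_k) - q(H_\varepsilon(\phi), c^1, c^2)\|_{L^1} \to 0 .$$
Thus, it follows from [10, Theorem 1, pg 172] that $|H_\varepsilon(\phi)|_{BV} \le \liminf_{k \to \infty} |H_\varepsilon(\phi_k)|_{BV}$. Now, from the continuity of $F$ and $q$, together with the estimates above we obtain
$$\begin{aligned} G_{\varepsilon,\alpha}(\phi, c^1, c^2) &\le \lim_{k \to \infty} \|F(P_\varepsilon(\phi_k, c^1_k, c^2_k)) - y^\delta\|_Y^2 + \alpha \Big\{ \beta_1 \liminf_{k \to \infty} |H_\varepsilon(\phi_k)|_{BV} + \beta_2 \liminf_{k \to \infty} \|\phi_k - \phi_0\|_{H^1}^2 + \beta_3 \liminf_{k \to \infty} \sum_{j=1}^{2} |c^j_k - c^j_0|^2 \Big\} \\ &\le \liminf_{k \to \infty} G_{\varepsilon,\alpha}(\phi_k, c^1_k, c^2_k) = \inf G_{\varepsilon,\alpha} , \end{aligned}$$
concluding the proof.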
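In the sequel we prove that, when $\varepsilon \to 0$, the minimizers of $G_{\varepsilon,\alpha}$ approximate a minimizer of the functional $G_\alpha$.

Theorem 4. Let $\alpha$ and $\beta_j$ be given as above. For each $\varepsilon > 0$, denote by $(\phi_{\varepsilon,\alpha}, c^1_{\varepsilon,\alpha}, c^2_{\varepsilon,\alpha})$ a minimizer of $G_{\varepsilon,\alpha}$. There exists a sequence of positive numbers $\varepsilon_k \to 0$ such that $(H_{\varepsilon_k}(\phi_{\varepsilon_k,\alpha}), \phi_{\varepsilon_k,\alpha}, c^1_{\varepsilon_k,\alpha}, c^2_{\varepsilon_k,\alpha})$ converges strongly in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$ and the limit minimizes $G_\alpha$ on the set of admissible vectors.

Proof. The functional $G_\alpha$ attains a generalized minimizer $(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha)$ on the set of admissible vectors (cf. Theorem 1). From Definition 1, there exists a sequence $\{\varepsilon_k\}$ of positive numbers converging to zero and corresponding sequences $\{\phi_k\}$ in $H^1(\Omega)$ satisfying $\phi_k \to \phi_\alpha$ in $L^2(\Omega)$, $H_{\varepsilon_k}(\phi_k) \to z_\alpha$ in $L^1(\Omega)$. Moreover, we can further assume [8, Lemma 3] that
$$R(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha) = \lim_{k\to\infty} \Big( \beta_1 |H_{\varepsilon_k}(\phi_k)|_{BV} + \beta_2 \|\phi_k - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j_k - c^j_0|^2 \Big) .$$
Let $(\phi_{\varepsilon_k}, c^1_{\varepsilon_k}, c^2_{\varepsilon_k})$ be a minimizer of $G_{\varepsilon_k,\alpha}$. The sequences $\{\phi_{\varepsilon_k}\}$, $\{H_{\varepsilon_k}(\phi_{\varepsilon_k})\}$ and $\{c^j_k\}_{j=1,2}$ are uniformly bounded in $H^1(\Omega)$, $BV(\Omega)$ and $\mathbb{R}^2$ respectively. By the compact Sobolev embedding theorem [13], the compact embedding of $BV$ into $L^1$ [10] and the Bolzano-Weierstrass theorem, there exist convergent subsequences whose limits are denoted by $\tilde\phi$, $\tilde z$ and $\tilde c^j$. Summarizing, we have $\phi_{\varepsilon_k} \to \tilde\phi$ in $L^2(\Omega)$, $H_{\varepsilon_k}(\phi_{\varepsilon_k}) \to \tilde z$ in $L^1(\Omega)$, and $c^j_k \to \tilde c^j$ in $\mathbb{R}$, $j = 1, 2$. Thus, $(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \in L^1(\Omega) \times H^1(\Omega) \times \mathbb{R}^2$ is an admissible vector (cf. Lemma 2).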
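From the definition of $R$, Lemma 1 and the continuity of $F$, it follows that
$$\|F(q(\tilde z, \tilde c^1, \tilde c^2)) - y^\delta\|_Y^2 = \lim_{k\to\infty} \|F(P_{\varepsilon_k}(\phi_{\varepsilon_k}, c^1_{\varepsilon_k}, c^2_{\varepsilon_k})) - y^\delta\|_Y^2 ,$$
$$R(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \le \liminf_{k\to\infty} \Big( \beta_1 |H_{\varepsilon_k}(\phi_{\varepsilon_k})|_{BV} + \beta_2 \|\phi_{\varepsilon_k} - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j_{\varepsilon_k} - c^j_0|^2 \Big) .$$
Therefore,
$$G_\alpha(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) = \|F(q(\tilde z, \tilde c^1, \tilde c^2)) - y^\delta\|_Y^2 + \alpha R(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2)$$
$$\le \liminf_{k\to\infty} G_{\varepsilon_k,\alpha}(\phi_{\varepsilon_k}, c^1_{\varepsilon_k}, c^2_{\varepsilon_k}) \le \liminf_{k\to\infty} G_{\varepsilon_k,\alpha}(\phi_k, c^1_k, c^2_k)$$
$$\le \limsup_{k\to\infty} \|F(P_{\varepsilon_k}(\phi_k, c^1_k, c^2_k)) - y^\delta\|_Y^2 + \alpha \limsup_{k\to\infty} \Big( \beta_1 |H_{\varepsilon_k}(\phi_k)|_{BV} + \beta_2 \|\phi_k - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j_k - c^j_0|^2 \Big)$$
$$= \|F(q(z_\alpha, c^1_\alpha, c^2_\alpha)) - y^\delta\|_Y^2 + \alpha R(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha) = G_\alpha(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha) ,$$
characterizing $(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2)$ as a minimizer of $G_\alpha$.

4.1 Optimality Conditions for the Stabilized Functional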
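For numerical purposes it is convenient to derive first order optimality conditions for minimizers of the stabilized functionals $G_{\varepsilon,\alpha}$. Therefore, we consider $G_{\varepsilon,\alpha}$ in (16) with $Y = L^2(\Omega)$ and we look for the Gâteaux directional derivatives with respect to $\phi$ and the unknown constants $c^j$ for $j = 1, 2$. Since $H'_\varepsilon(\phi)$ is self-adjoint, we can write the optimality conditions for the functional $G_{\varepsilon,\alpha}$ in the form of the system
$$\alpha(\Delta - I)(\phi - \phi_0) = L_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) \ \text{in } \Omega ; \qquad \nabla(\phi - \phi_0) \cdot \nu = 0 \ \text{at } \partial\Omega , \qquad (17a)$$
$$\alpha\,(c^j - c^j_0) = L^j_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2), \quad j = 1, 2 . \qquad (17b)$$
Here $\nu(x)$ is the external unit normal vector at $x \in \partial\Omega$, $\bar\beta := (2\beta_3)^{-1}$, and
$$L_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) = (c^1 - c^2)\,\beta_2^{-1}\, H'_\varepsilon(\phi)^* F'(P_\varepsilon(\phi, c^1, c^2))^* \big(F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta\big) - \beta_1 (2\beta_2)^{-1} H'_\varepsilon(\phi)\, \nabla \cdot \big( \nabla H_\varepsilon(\phi) / |\nabla H_\varepsilon(\phi)| \big) , \qquad (18a)$$
$$L^1_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) = \bar\beta\, \big( F'(P_\varepsilon(\phi, c^1, c^2))\, H_\varepsilon(\phi) \big)^* \big(F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta\big) , \qquad (18b)$$
$$L^2_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) = \bar\beta\, \big( F'(P_\varepsilon(\phi, c^1, c^2))\, (1 - H_\varepsilon(\phi)) \big)^* \big(F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta\big) . \qquad (18c)$$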
5 Numerical Results
In this section a level-set type method based on the system of optimality conditions (17) is used for solving an inverse potential problem of recovering a piecewise constant function u : Ω → {c1 , c2 }, from measurements of the Cauchy data of its corresponding potential on the boundary of the domain Ω = (0, 1) × (0, 1). Notice that no knowledge of the image of u (values c1 , c2 ∈ R) is assumed.
5.1 The Inverse Potential Problem
To describe the direct problem, we define the operator $F : L^2(\Omega) \to L^2(\partial\Omega)$ by $F : u(x) \mapsto F(u) := w_\nu|_{\partial\Omega}$, where $u$ is a piecewise constant function in $L^2(\Omega)$ with $u(x) \in \{c^1, c^2\}$ a.e. in $\Omega$, and $w \in H^1(\Omega)$ solves the elliptic boundary value problem
$$\Delta w = u \ \text{in } \Omega ; \qquad w = 0 \ \text{at } \partial\Omega . \qquad (19)$$
Since $u \in L^2(\Omega)$, the Dirichlet boundary value problem in (19) has a unique solution, namely the potential $w \in H^2(\Omega) \cap H^1_0(\Omega)$. The inverse problem we are concerned with consists in determining the piecewise constant source function $u$ from measurements of the Neumann trace of $w$ at $\partial\Omega$, i.e. from $w_\nu|_{\partial\Omega}$. Using the above notation, the inverse potential problem can be written in the abbreviated form $F(u) = y^\delta$, where the data $y^\delta$ has the same meaning as in (2). Other inverse problems for the operator $F$ were considered in [3, 8]. In [3] a level set method was used for recovering the indicator function $u = \chi_D$ of a star-shaped domain $D \subset \mathbb{R}^2$. In [8] a multiple level set method was used for recovering a simple function $u : \Omega \to \{c^1, \dots, c^4\}$. In both cases, knowledge of the (finite) image of $u$ was assumed.

5.2 A Level-Set Algorithm for the Inverse Potential Problem
In the sequel we describe the level set regularization algorithm. This method is comparable to the level set method proposed in [8]. The complexity of our algorithm is as follows: at each iteration of the level set method, four elliptic boundary value problems (BVP) are solved (two of Dirichlet type and two of Neumann type). In Table 1 an explicit fixed point procedure for solving the optimality conditions (17) is outlined. In the first step the residual $r_k \in L^2(\partial\Omega)$ of the iterate $(\phi_k, c^1_k, c^2_k)$ is evaluated. This corresponds to solving one elliptic BVP of Dirichlet type. In the second step the solution $h_k \in H^1(\Omega)$ of the adjoint problem for the residual is evaluated. This corresponds to solving one elliptic BVP of Dirichlet type. In the fourth step, the velocity function $v_k \in H^1(\Omega)$ for the level-set function is evaluated. This corresponds to solving an elliptic BVP of Neumann type. In the subsequent numerical experiments this algorithm was implemented using a finite element method for the solution of partial differential equations.

5.3 Numerical Experiment
In our experiment we consider the inverse problem of reconstructing the right hand side $u$ in (19) from the knowledge of a single pair of Cauchy data $(0, y^\delta)$ at $\partial\Omega$. We further assume that the level value $c^2 = 0$ is given, and that we have to identify only the support of $u$ and the level value $c^1 \in \mathbb{R}^+$. The data $y^\delta = y = F(u)$ for solving the inverse problem is known exactly, i.e. $\delta = 0$, and is obtained by solving numerically the elliptic boundary value problem in (19) (the word 'exactly' here means: up to the precision of the numerical method used for solving the direct problem). For the direct problem we use the values $c^1 = 1$, $c^2 = 0$ to compute the exact solution. In the computation of the inverse problem, the exact solution is known a priori to assume the values $\{c^1, 0\}$ (with unknown $c^1$). Moreover, when the data are given exactly, the iterative level-set method is implemented without the additional regularization term $|H_\varepsilon(\phi)|_{BV}$, i.e. $\beta_1 = 0$. The solution $u$ of the inverse problem as well as the initial guess $P_\varepsilon(\phi_0, c^1_0)$ for the level-set method are shown in Figure 1. Notice that the support of $u$ corresponds to a non-connected proper subset of $\Omega$. The initial guess $c^1_0 = 1.5$ is used for the unknown level value. In Figure 2 the evolution of the level set method for the first 1500 iterative steps is presented. As one can see in this figure, the shapes of both inclusions are reasonably reconstructed, and the level value $c^1$ is accurately reconstructed as well. The iteration is stopped when the residual drops below the predefined precision $\|F(P_\varepsilon(\phi_k, c^1_k)) - y\|_{L^2} < 10^{-2}$.

Table 1. Level set algorithm for the inverse potential problem
1. Evaluate the residual $r_k := F(P_\varepsilon(\phi_k, c^1_k, c^2_k)) - y^\delta = (w_k)_\nu|_{\partial\Omega} - y^\delta$, where $w_k$ solves $\Delta w_k = P_\varepsilon(\phi_k, c^1_k, c^2_k)$ in $\Omega$; $w_k = 0$ at $\partial\Omega$.
2. Evaluate $h_k := F'(P_\varepsilon(\phi_k, c^1_k, c^2_k))^*(r_k) \in L^2(\Omega)$, solving $\Delta h_k = 0$ in $\Omega$; $h_k = r_k$ at $\partial\Omega$.
3. Calculate $L_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$ and $L^j_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$, $j = 1, 2$ as in (18).
4. Evaluate the velocity $v_k \in H^1(\Omega)$, solving $(\Delta - I) v_k = L_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$ in $\Omega$; $(v_k)_\nu = 0$ at $\partial\Omega$.
5. Update the level set function $\phi_k$ and the level values $c^j_k$, $j = 1, 2$: $\phi_{k+1} = \phi_k + \frac{1}{\alpha} v_k$, $c^j_{k+1} = c^j_k + \frac{1}{\alpha} L^j_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$.
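The pointwise ingredients of the algorithm in Table 1 can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' finite element implementation. The smoothed Heaviside function and the map $q$ are defined in an earlier part of the paper that is not reproduced here, so the piecewise-linear form of `H_eps` and the form `q(z, c1, c2) = c1*z + c2*(1 - z)` are assumptions, chosen to be consistent with the $\varepsilon^{-1}$ Lipschitz estimate and with (18b)-(18c). The function names and the inputs `v` and `L_c` (the velocity and the level-value increments produced by steps 1 to 4) are hypothetical.

```python
import numpy as np

def H_eps(phi, eps):
    """Assumed piecewise-linear smoothed Heaviside: 0 for phi <= -eps, 1 for phi >= 0."""
    return np.clip(1.0 + np.asarray(phi, dtype=float) / eps, 0.0, 1.0)

def P_eps(phi, c1, c2, eps):
    """P_eps(phi, c1, c2) = q(H_eps(phi), c1, c2), with the assumed q(z, c1, c2) = c1*z + c2*(1 - z)."""
    z = H_eps(phi, eps)
    return c1 * z + c2 * (1.0 - z)

def update(phi, c, v, L_c, alpha):
    """Step 5 of Table 1: phi_{k+1} = phi_k + v_k / alpha and c_{k+1} = c_k + L_c / alpha.

    v and L_c must be supplied by the PDE solves of steps 1 to 4, which are not sketched here.
    """
    return phi + v / alpha, c + L_c / alpha
```

With the initial guess $c^1_0 = 1.5$ and $c^2 = 0$ from the experiment, `P_eps(phi, 1.5, 0.0, eps)` returns 1.5 where `phi >= 0` and 0 where `phi <= -eps`, with a linear transition in between.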
Fig. 1. Numerical experiment: The picture on the left hand side shows the coefficient to be reconstructed. On the right hand side, the initial condition for the level-set method.
Fig. 2. Numerical experiment: On the left hand side a plot of P (φk , c1k ) for k = 1500. The picture on the right hand side shows the corresponding iteration error.
We performed other numerical simulations with different choices of the initial guess $(\phi_0, c^1_0)$, and observed that the number of iterative steps required in order to obtain a reasonable approximation (up to the predefined precision of $10^{-2}$ in the $L^2$-norm) strongly depends on the choice of the initial guess $c^1_0$. On the other hand, the final result is not sensitive with respect to the choice of the initial guess $\phi_0$.
Acknowledgments
A.DC acknowledges the support from CNPq, grant 474593/2007-0. The work of A.L. is supported by the Brazilian National Research Council CNPq, grants 306020/2006-8, 474593/2007-0, and by the Alexander von Humboldt Foundation AvH. This article was written during a visit of the author to NTU (Singapore). X.-C.T. acknowledges the support from NTU SUG 20/07 and MOE Tier II project T207B2202 (ARC 29/07).
References
1. Santosa, F.: A level-set approach for inverse problems involving obstacles. ESAIM Contrôle Optim. Calc. Var. 1, 17–33 (1995/1996)
2. Leitão, A., Scherzer, O.: On the relation between constraint regularization, level sets, and shape optimization. Inverse Problems 19, L1–L11 (2003)
3. Frühauf, F., Scherzer, O., Leitão, A.: Analysis of regularization methods for the solution of ill-posed problems involving discontinuous operators. SIAM J. Numer. Anal. 43, 767–786 (2005)
4. Chung, E., Chan, T., Tai, X.C.: Electrical impedance tomography using level set representation and total variational regularization. J. Comput. Phys. 205(1), 357–372 (2005)
5. Chan, T., Tai, X.C.: Identification of discontinuous coefficients in elliptic problems using total variation regularization. SIAM J. Sci. Comput. 25(3), 881–904 (2003)
6. Chan, T., Tai, X.C.: Level set and total variation regularization for elliptic inverse problems with discontinuous coefficients. J. Comput. Phys. 193(1), 40–66 (2004)
7. Chung, J., Vese, L.: Image segmentation using a multilayer level-sets approach. UCLA C.A.M. Report 193(03-53), 1–28 (2003)
8. DeCezaro, A., Leitão, A., Tai, X.C.: On multiple level-set regularization methods for inverse problems. Inverse Problems 25 (to appear, 2009)
9. Tai, X.C., Chan, T.: A survey on multiple level set methods with applications for identifying piecewise constant functions. Int. J. Num. Anal. Model 1(1), 25–47 (2004)
10. Evans, L., Gariepy, R.: Measure theory and fine properties of functions. Studies in Advanced Mathematics. CRC Press, Boca Raton (1992)
11. Engl, H., Kunisch, K., Neubauer, A.: Convergence rates for Tikhonov regularisation of nonlinear ill-posed problems. Inverse Problems 5(4), 523–540 (1989)
12. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
13. Adams, R.: Sobolev Spaces. Academic Press, New York (1975)
The Nonlinear Tensor Diffusion in Segmentation of Meaningful Biological Structures from Image Sequences of Zebrafish Embryogenesis

Olga Drblíková¹, Karol Mikula¹, and Nadine Peyriéras²

¹ Slovak University of Technology, Radlinského 11, 813 68 Bratislava, Slovakia
[email protected], [email protected]
http://www.math.sk/drblikov, http://www.math.sk/mikula
² CNRS-DEPSN, Institut de Neurobiologie Alfred Fessard, Batiment 32-33, Avenue de la Terrasse, 91198 Gif sur Yvette, France
[email protected]

Abstract. In this contribution we develop a strategy for segmentation of evolving biological structures in image sequences. Our approach is based on a combination of nonlinear tensor diffusion image smoothing and subjective surface based image segmentation. Since the fine cell structure would prevent the evolving segmentation function from attaining the shape of meaningful biological structures, we have to smooth the images in the sequence properly. To that end we apply the nonlinear tensor diffusion, which enhances the connectivity of bordering structure lines and smoothes their inner parts. For the numerical implementations we use semi-implicit diamond-cell finite volume methods both for filtering and segmentation. We show an application of the method in image segmentation of early stages of zebrafish embryogenesis.
1 Introduction
The subjective surface based segmentation is an efficient tool for the extraction of 2D or 3D image objects, cf. [10,9,1]. It is also the case when dealing with two-photon laser scanning microscopy images in detecting and segmenting structures at cellular and subcellular level, cf. [6, 8]. However, the use of such algorithms when segmenting the supercellular structures is not straightforward. Using an original (not filtered) image leads to entirely useless results due to the presence of small cell structures. Then a useful tool is filtering by the nonlinear tensor diffusion enhancing the coherence of structure boundaries and smoothing the inner cell structures and noise. The model, cf. [11, 7, 4], has the following form
$$\partial_t u - \nabla \cdot (D \nabla u) = 0 \quad \text{in } Q_T \equiv I \times \Omega, \qquad (1)$$
$$u(x, 0) = u_0(x) \quad \text{in } \Omega, \qquad (2)$$
$$(D \nabla u) \cdot n = 0 \quad \text{on } I \times \partial\Omega, \qquad (3)$$
where $u$ represents a greylevel 3D image intensity, $u_0 \in L^2(\Omega)$, $I = [0, T]$ denotes a time interval, $\Omega$ is an image domain, $D = D(u(x, t))$ is a diffusion tensor and $n$ is the outer normal unit vector to $\partial\Omega$. The model is useful when a strong filtering is desirable in a preferred direction, e.g. along 2D edge surfaces in 3D images and a low smoothing is expected in the perpendicular direction.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 63–74, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Design of the Diffusion Tensor
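The matrix $D$ depends on a smoothed intensity gradient, which is given as $\nabla u_{\tilde t} = (u_{x_1}, u_{x_2}, u_{x_3})^T$, where
$$u_{\tilde t}(x, t) = (G_{\tilde t} * u(\cdot, t))(x), \quad (\tilde t > 0) \qquad (4)$$
and $G_{\tilde t}$ is a Gaussian kernel. Provided $\mu = \|\nabla u_{\tilde t}\|^2 > 0$ we choose a triplet of vectors $(v_1, v_2, v_3)$ as follows
$$v_1 \parallel \nabla u_{\tilde t}, \quad v_2 \perp \nabla u_{\tilde t}, \quad v_3 \perp \nabla u_{\tilde t}, \quad v_2 \perp v_3 . \qquad (5)$$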
The direction of vector $v_1$ corresponds to the direction of the largest intensity change. The other two vectors give a tangential plane to a level set of image intensity which may represent a 2D surface edge in a 3D image, provided that $\mu$ is large. It is called a coherence plane $P$, cf. [4, 7], and is the eigenspace associated with the eigenvalue 0 of the outer product $\nabla u_{\tilde t} \otimes \nabla u_{\tilde t}$. In order to improve the coherence, the diffusion tensor $D$ must steer the filtering process such that the diffusion is strong and increasing with the level of $\mu$ along the coherence plane and is small in the perpendicular direction. We achieve this by choosing the eigenvalues of the diffusion tensor, which determine the diffusivities in the directions $v_1$, $v_2$ and $v_3$, as
$$\kappa_1 = \alpha, \quad \alpha \in (0, 1), \ \alpha \ll 1, \qquad \kappa_2 = \begin{cases} \alpha, & \text{if } \mu = 0, \\ \alpha + (1 - \alpha) \exp\left(\dfrac{-C}{\mu}\right), \ C > 0, & \text{otherwise.} \end{cases} \qquad (6)$$
Further, we apply another convolution with a smoothing kernel $G_\rho$ to get the diffusion matrix $D$ in the form
$$D = G_\rho * D_0, \quad \text{where } D_0 = \begin{cases} B, & \text{if } \mu = 0, \\ P B P^{-1}, & \text{otherwise,} \end{cases} \qquad B = \begin{pmatrix} \kappa_1 & 0 & 0 \\ 0 & \kappa_2 & 0 \\ 0 & 0 & \kappa_2 \end{pmatrix} \qquad (7)$$
and $P$ represents a transition matrix from the basis $(v_1, v_2, v_3)$ to $(e_1, e_2, e_3)$. The exponential function in (6) is used to ensure that $\kappa_2$ does not exceed 1. The process never stops owing to the positive parameter $\alpha$. Even if $\mu$ tends to zero, a small linear diffusion with a diffusivity $\alpha > 0$ still remains there. $C$ has the role of a threshold parameter. If $\mu \gg C$ then $\kappa_2 \approx 1$, and, conversely, if $\mu \ll C$ then $\kappa_2 \approx \alpha$. After some manipulations we get that at any point where $\mu > 0$, the matrix $D_0$ has the following form
$$\frac{1}{\mu} \begin{pmatrix} u_{x_1}^2 \kappa_1 + (u_{x_2}^2 + u_{x_3}^2) \kappa_2 & u_{x_1} u_{x_2} (\kappa_1 - \kappa_2) & u_{x_1} u_{x_3} (\kappa_1 - \kappa_2) \\ u_{x_1} u_{x_2} (\kappa_1 - \kappa_2) & u_{x_2}^2 \kappa_1 + (u_{x_1}^2 + u_{x_3}^2) \kappa_2 & u_{x_2} u_{x_3} (\kappa_1 - \kappa_2) \\ u_{x_1} u_{x_3} (\kappa_1 - \kappa_2) & u_{x_2} u_{x_3} (\kappa_1 - \kappa_2) & u_{x_3}^2 \kappa_1 + (u_{x_1}^2 + u_{x_2}^2) \kappa_2 \end{pmatrix} \qquad (8)$$
in the standard basis $(e_1, e_2, e_3)$. Such a choice of the matrix $D_0$ was given in [4]; it is independent of the concrete choice of $v_2$ and $v_3$ and can be evaluated directly and quickly using the diamond-cell finite volume technique (see also the next section). Then the matrices are spatially averaged using Gaussian smoothing with a variance $\rho$ to get the elements of the final matrix $D$. The diffusion tensor possesses the smoothness, symmetry and positive definiteness properties, cf. [4].
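The construction of $D_0$ in (6)-(8) can be checked numerically with a short NumPy sketch. It uses the identity $D_0 = \kappa_2 I + (\kappa_1 - \kappa_2)\, \nabla u_{\tilde t} \otimes \nabla u_{\tilde t} / \mu$, which is an equivalent way of writing (8) and makes the independence of $v_2$, $v_3$ explicit. The function name is hypothetical, the default parameter values are the ones reported for the experiments, and the Gaussian pre- and post-smoothing of (4) and (7) is deliberately left out.

```python
import numpy as np

def diffusion_tensor_D0(grad, alpha=0.001, C=1.0):
    """Pointwise tensor D0 of (7)-(8) for a (pre-smoothed) gradient vector `grad`.

    Illustrative sketch: the Gaussian convolutions G_t and G_rho are not applied here.
    """
    g = np.asarray(grad, dtype=float)
    mu = float(g @ g)                      # mu = ||grad||^2
    if mu == 0.0:
        return alpha * np.eye(3)           # B with kappa_1 = kappa_2 = alpha
    kappa1 = alpha
    kappa2 = alpha + (1.0 - alpha) * np.exp(-C / mu)   # eq. (6)
    # kappa2 * I + (kappa1 - kappa2) * g g^T / mu reproduces the entries of (8)
    return kappa2 * np.eye(3) + (kappa1 - kappa2) * np.outer(g, g) / mu
```

For any nonzero gradient the result is symmetric, has the eigenvalue $\kappa_1$ in the gradient direction and the double eigenvalue $\kappa_2$ on the coherence plane.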
3 The Finite Volume Scheme for 3D Nonlinear Tensor Diffusion
Let the image $u(x)$ be represented by a bounded mapping $u : \Omega \to \mathbb{R}$ and given by $n_1 \times n_2 \times n_3$ voxels (finite volumes) such that it looks like a mesh with $n_1$ rows, $n_2$ columns and $n_3$ layers. Let us consider an image domain $\Omega = (0, n_1 h) \times (0, n_2 h) \times (0, n_3 h)$, with a voxel size $h$. We consider the diffusion process in a time interval $I = [0, T]$. Let the time discretization be given by $0 = t_0 < t_1 < \dots < t_{N_{max}} = T$ with $t_n = t_{n-1} + k$, where $k$ is the length of a discrete time step. We will look for an approximation of the solution at time $t_n$ for every $n = 1, \dots, N_{max}$. We start the scheme derivation by integrating equation (1) over a finite volume $K$, then apply a semi-implicit time discretization and use the divergence theorem to get
$$\frac{u^n_K - u^{n-1}_K}{k}\, m(K) - \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}_{int}} \int_\sigma (D^{n-1} \nabla u^n) \cdot n_{K,\sigma}\, ds = 0, \qquad (9)$$
where $u^n_K$, $K \in T_h$, denotes the mean value of $u^n$ on $K$ and $T_h$ is a cubic finite volume mesh. Further quantities and notations are given as follows: $m(K)$ is the 3D measure of the finite volume $K$ with the boundary $\partial K$, $\sigma_{KL} = K \cap L$ is a side of the finite volume $K$, where $L \in T_h$ is a neighboring finite volume to $K$ such that $K$ and $L$ share a 2D surface element with a nonzero area. At several places we will replace $\sigma_{KL}$ by $\sigma$ to simplify the notation. $\mathcal{E}_K$ represents the set of sides such that $\partial K = \bigcup_{\sigma \in \mathcal{E}_K} \sigma$ and $\mathcal{E} = \bigcup_{K \in T_h} \mathcal{E}_K$. The set of boundary sides is denoted by $\mathcal{E}_{ext}$, that is $\mathcal{E}_{ext} = \{\sigma \in \mathcal{E},\ \sigma \subset \partial\Omega\}$ and $\mathcal{E}_{int} = \mathcal{E} \setminus \mathcal{E}_{ext}$. $\Upsilon$ is the set of pairs of neighboring finite volumes defined by $\Upsilon = \{(K, L) \in T_h^2,\ K \neq L,\ m(K \cap L) \neq 0\}$ and $n_{K,\sigma}$ is the normal unit vector to $\sigma$ outward to $K$. Our discrete approximation solution is defined as
$$u_{h,k}(x, t) = \sum_{n=0}^{N_{max}} \sum_{K \in T_h} u^n_K\, \chi\{x \in K\}\, \chi\{t_{n-1} < t \le t_n\},$$
where the function $\chi\{A\}$ is given by
$$\chi\{A\} = \begin{cases} 1, & \text{if } A \text{ is true,} \\ 0, & \text{elsewhere.} \end{cases} \qquad (10)$$
The finite volume approximation at the $n$-th time step is given by $u^n_{h,k}(x) = \sum_{K \in T_h} u^n_K\, \chi\{x \in K\}$ and the initial values by $u^0_K = \frac{1}{m(K)} \int_K u_0(x)\, dx$, $K \in T_h$. We can define an auxiliary unknown $\phi^n_\sigma(u^n_{h,k})$ representing an approximation of the exact averaged flux $\frac{1}{m(\sigma)} \int_\sigma (D^{n-1} \nabla u^n) \cdot n_{K,\sigma}\, ds$ for any $K$ and $\sigma \in \mathcal{E}_K$ in order to rewrite (9) in the form
$$\frac{u^n_K - u^{n-1}_K}{k} - \frac{1}{m(K)} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}_{int}} \phi^n_\sigma(u^n_{h,k})\, m(\sigma) = 0,$$
where $m(\sigma)$ is the measure of side $\sigma$.
Fig. 1. The co-volumes associated with the side σ = σWE (left) and σ = σEW (right)
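$\phi^n_\sigma(u^n_{h,k})$ is built with the help of a co-volume mesh, cf. e.g. [2, 3] for the 2D case. We create a co-volume $\chi_\sigma$ associated with $\sigma$ around each finite volume side by joining the four vertices of this side and the midpoints of the finite volumes which are common to this side, cf. Fig. 1. The co-volume boundary consists of triangles $\bar\sigma \subset \partial\chi_\sigma$ (their vertices are denoted by $N_1(\bar\sigma)$, $N_2(\bar\sigma)$ and $N_3(\bar\sigma)$) and $n_{\chi_\sigma,\bar\sigma}$ is the normal unit vector to $\bar\sigma$ outward to $\chi_\sigma$. First, we approximate the gradient averaged on $\chi_\sigma$. Applying the divergence theorem we obtain
$$\frac{1}{m(\chi_\sigma)} \int_{\chi_\sigma} \nabla u^n\, dx = \frac{1}{m(\chi_\sigma)} \int_{\partial\chi_\sigma} u^n\, n_{\chi_\sigma,\bar\sigma}\, ds,$$
which can be approximated as follows
$$p^n_\sigma(u) = \frac{1}{m(\chi_\sigma)} \sum_{\bar\sigma \in \partial\chi_\sigma} \frac{1}{3} \left( u^n_{N_1(\bar\sigma)} + u^n_{N_2(\bar\sigma)} + u^n_{N_3(\bar\sigma)} \right) m(\bar\sigma)\, n_{\chi_\sigma,\bar\sigma} .$$
The values at $x_E$ and $x_W$ are denoted as $u_E$ and $u_W$. Further, we evaluate the values $u_{TN}$, $u_{TS}$, $u_{BN}$, and $u_{BS}$ at the vertices $x_{TN}$, $x_{TS}$, $x_{BN}$, and $x_{BS}$, cf. Fig. 1, as the arithmetic mean of $u_K$, where $K$ represents the finite volumes which are common to the vertex. Since the mesh is uniform and square, we can simplify our discrete scheme applying the following relations: $m(\chi_\sigma) = \frac{h^3}{3}$, $m(\bar\sigma) = \frac{\sqrt{2}}{4} h^2$. After a short calculation we can state
$$p^n_\sigma(u) = \frac{u^n_E - u^n_W}{h}\, n_{K,\sigma} + \frac{u^n_{TN} + u^n_{BN} - u^n_{TS} - u^n_{BS}}{2h}\, t^1_{K,\sigma} + \frac{u^n_{TN} + u^n_{TS} - u^n_{BN} - u^n_{BS}}{2h}\, t^2_{K,\sigma}, \qquad (11)$$
where $t^1_{K,\sigma}$ is a unit vector parallel to $x_{TN} - x_{TS}$ such that $(x_{TN} - x_{TS}) \cdot t^1_{K,\sigma} > 0$ and $t^2_{K,\sigma}$ is a unit vector parallel to $x_{TN} - x_{BN}$ such that $(x_{TN} - x_{BN}) \cdot t^2_{K,\sigma} > 0$.

Fig. 2. A finite volume K, its boundaries σ_i, i = E, W, N, S, T, B and the fluxes outward to the finite volume K

We replace the exact gradient $\nabla u^n$ by the discrete gradient $p^n_\sigma(u)$ to get the numerical flux in the form
$$\phi^n_\sigma(u^n_{h,k}) = (D_\sigma\, p^n_\sigma(u)) \cdot n_{K,\sigma} . \qquad (12)$$
Here
$$D_\sigma = D^{n-1}_\sigma = \begin{pmatrix} \bar D^\sigma_{11} & \bar D^\sigma_{12} & \bar D^\sigma_{13} \\ \bar D^\sigma_{12} & \bar D^\sigma_{22} & \bar D^\sigma_{23} \\ \bar D^\sigma_{13} & \bar D^\sigma_{23} & \bar D^\sigma_{33} \end{pmatrix}$$
denotes an approximation of the mean value of the matrix $D$ along $\sigma$ which was evaluated at the previous time step using $u^{n-1}_{h,k}$. The elements of the matrix $D_\sigma$ are $C^\infty$ functions due to the convolutions in (4) and (7). Let us emphasize that in (12) we always consider the matrix $D_\sigma$ written in the basis $(n_{K,\sigma}, t^1_{K,\sigma}, t^2_{K,\sigma})$, cf. [2,3] for an analogy with the 2D model. In practice it means that, cf. Fig. 2, the matrix $D$ given in the standard basis on a side $\sigma$ by
$$\begin{pmatrix} D^\sigma_{11} & D^\sigma_{12} & D^\sigma_{13} \\ D^\sigma_{12} & D^\sigma_{22} & D^\sigma_{23} \\ D^\sigma_{13} & D^\sigma_{23} & D^\sigma_{33} \end{pmatrix}$$
is the same in the new basis on the two sides $\sigma_W$ and $\sigma_E$. It has the form
$$\begin{pmatrix} D^\sigma_{22} & D^\sigma_{12} & D^\sigma_{23} \\ D^\sigma_{12} & D^\sigma_{11} & D^\sigma_{13} \\ D^\sigma_{23} & D^\sigma_{13} & D^\sigma_{33} \end{pmatrix}$$
in the new basis for the two other sides $\sigma_S$ and $\sigma_N$, and it becomes
$$\begin{pmatrix} D^\sigma_{33} & D^\sigma_{23} & D^\sigma_{13} \\ D^\sigma_{23} & D^\sigma_{22} & D^\sigma_{12} \\ D^\sigma_{13} & D^\sigma_{12} & D^\sigma_{11} \end{pmatrix}$$
for the last two sides $\sigma_B$ and $\sigma_T$. Using such matrix representations, the definition (12) can be written in the form
$$\phi^n_\sigma(u^n_{h,k}) = \left[ \begin{pmatrix} \bar D^\sigma_{11} & \bar D^\sigma_{12} & \bar D^\sigma_{13} \\ \bar D^\sigma_{12} & \bar D^\sigma_{22} & \bar D^\sigma_{23} \\ \bar D^\sigma_{13} & \bar D^\sigma_{23} & \bar D^\sigma_{33} \end{pmatrix} \begin{pmatrix} \dfrac{u^n_E - u^n_W}{h} \\ \dfrac{u^n_{TN} + u^n_{BN} - u^n_{TS} - u^n_{BS}}{2h} \\ \dfrac{u^n_{TN} + u^n_{TS} - u^n_{BN} - u^n_{BS}}{2h} \end{pmatrix} \right] \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$
$$= \bar D^\sigma_{11} \frac{u^n_E - u^n_W}{h} + \bar D^\sigma_{12} \frac{u^n_{TN} + u^n_{BN} - u^n_{TS} - u^n_{BS}}{2h} + \bar D^\sigma_{13} \frac{u^n_{TN} + u^n_{TS} - u^n_{BN} - u^n_{BS}}{2h} .$$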
Finally, let us summarize our semi-implicit finite volume scheme:
$$\frac{u^n_K - u^{n-1}_K}{k} - \frac{1}{m(K)} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}_{int}} \phi^n_\sigma(u^n_{h,k})\, m(\sigma) = 0, \qquad (13)$$
where
$$\phi^n_\sigma(u^n_{h,k}) = \bar D^\sigma_{11} \frac{u^n_E - u^n_W}{h} + \bar D^\sigma_{12} \frac{u^n_{TN} + u^n_{BN} - u^n_{TS} - u^n_{BS}}{2h} + \bar D^\sigma_{13} \frac{u^n_{TN} + u^n_{TS} - u^n_{BN} - u^n_{BS}}{2h} . \qquad (14)$$
Due to the computation of the values $u_{TN}$, $u_{TS}$, $u_{BN}$ and $u_{BS}$ in (14) as the arithmetic mean of neighboring voxel values, we end up with a 27-point finite volume scheme.
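As a small illustration of (11) and (14), the following NumPy sketch evaluates the discrete gradient and the numerical flux on a single side. The function names are hypothetical, the inputs are assumed to be already available (the two cell values and the four vertex values, the latter obtained by averaging neighboring voxels), and the matrix `D_sigma` is assumed to be expressed in the local basis $(n_{K,\sigma}, t^1_{K,\sigma}, t^2_{K,\sigma})$. Assembling and solving the semi-implicit linear system is not shown.

```python
import numpy as np

def discrete_gradient(u_E, u_W, u_TN, u_TS, u_BN, u_BS, h):
    """Components of p_sigma in the local basis (n, t1, t2), following eq. (11)."""
    return np.array([
        (u_E - u_W) / h,
        (u_TN + u_BN - u_TS - u_BS) / (2.0 * h),
        (u_TN + u_TS - u_BN - u_BS) / (2.0 * h),
    ])

def numerical_flux(D_sigma, p):
    """phi_sigma = (D_sigma p) . n with n = (1, 0, 0) in the local basis, following eq. (14)."""
    D_sigma = np.asarray(D_sigma, dtype=float)
    return float(D_sigma[0] @ p)
```

For an affine intensity the discrete gradient reproduces the exact gradient, so the flux reduces to the first row of `D_sigma` applied to it; this is a convenient sanity check when implementing the scheme.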
4 Segmentation
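Our segmentation approach is based on the subjective surface method [10] and its finite volume implementation from [9]. The mathematical model has the following form
$$\partial_t u = \sqrt{\varepsilon^2 + |\nabla u|^2}\; \nabla \cdot \left( g(|\nabla G_\sigma * I^0|)\, \frac{\nabla u}{\sqrt{\varepsilon^2 + |\nabla u|^2}} \right) \quad \text{in } Q_T \equiv I \times \Omega, \qquad (15)$$
$$u(x, 0) = u_0(x) \quad \text{in } \Omega, \qquad (16)$$
$$u = 0 \quad \text{on } I \times \partial\Omega, \qquad (17)$$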
where $I^0$ is the image which is segmented and $\varepsilon$ is the regularization parameter. The solution $u$ represents here the evolving segmentation function. The function $g = g(|\nabla G_\sigma * I^0|)$ has the role of the edge detector, which requires a suitable choice of $g$ in practice, e.g. $g(s) = \frac{1}{1 + K s^2}$, $K > 0$. In the subjective surface method we start the segmentation by constructing the initial segmentation function located in an approximate object center. The segmentation function is driven by equation (15) and evolves to a numerical steady state. Its shock profile gives the segmentation result and shape of the object. To that end, we choose a suitable isoline of the shock profile which represents the boundary of the segmented object. This isoline is most naturally taken as the average of the maximal and minimal value of the final segmentation function.
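The two scalar ingredients described above, the edge detector and the choice of the representative isoline, can be written down directly. The sketch below is only illustrative: the function names are hypothetical, the default `K = 100` is the value reported in the numerical experiments, and the argument `s` is assumed to be the already computed norm of the smoothed image gradient.

```python
import numpy as np

def edge_detector(s, K=100.0):
    """g(s) = 1 / (1 + K s^2): close to 1 in flat regions, close to 0 on strong edges."""
    s = np.asarray(s, dtype=float)
    return 1.0 / (1.0 + K * s ** 2)

def medium_isoline_level(u_final):
    """Level of the 'medium' isoline: average of the maximal and minimal value of the
    final segmentation function."""
    u_final = np.asarray(u_final, dtype=float)
    return 0.5 * (u_final.max() + u_final.min())
```

The segmented object boundary is then the level set of the final segmentation function at the value returned by `medium_isoline_level`.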
5 Numerical Experiments
The goal of this section is to discuss our computational results and the influence of nonlinear tensor diffusion filtering on the time evolving biological structure segmentation. We perform our experiments on the 3D image sequences of cell nuclei, cf. Fig. 3, and cell membranes, cf. Figs. 4-6. The images represent early stages of the zebrafish embryogenesis and were created by the two-photon laser scanning microscope. We apply the 3D numerical scheme to filter the images, then the segmentation is performed on 2D image slices (512×512 pixels) in order to first test the performance and capabilities of the method. The first experiment illustrates the behaviour of nonlinear tensor diffusion in filtering this type of images, cf. Fig. 3. One can observe that this type of diffusion improves the connectivity of structure bordering lines while it smoothes the structure interiors. One can compare the original image showing separate nuclei but with observable structure borders with the filtered one, where the structure border lines are connected. Our next experiments are devoted to the segmentation of the eye retina structure in several subsequent image slices. First, the initial segmentation function is given by two cones which are inside the structure such that their partially overlapping bases sufficiently cover the eye structure area. Then we evolved it in the original as well as filtered images. Using the original image we obtained the final state of the segmentation function represented by a variety of different level lines, cf. Fig. 4 (top, right). The question is which isoline represents the structure shape most precisely. The natural choice is a medium isoline which is depicted in the original image Fig. 4 (top, left). One can clearly see the large difference between the segmented and real structure shape due to the restraints of the evolving segmentation function caused by inner cell structures. In order to compare our method with other filtering techniques we performed several tests. The segmentation results obtained on the images filtered by the geodesic mean curvature flow (GMCF) filtering, the mean curvature flow (MCF) filtering and the Perona-Malik (PM) filtering are shown in Fig. 5. In Fig. 5 (right), we can see that after filtering the profiles of the final segmentation functions are not well suited for our purposes, although the MCF result is rather close to the real one.

They are again given by several different isolines and the medium one, cf. Fig. 5 (left), represents the segmented structure only partially. This is a consequence of the edge preserving smoothing by GMCF and PM, which cannot remove inner cell structures. On the contrary, the final steady state of the segmentation function evolving in the image filtered by nonlinear tensor diffusion consists of isolines accumulated along the real structure boundary, cf. Fig. 4 (middle, right). The formation of the correct shock profile was enabled by the smoothing of cell structure barriers, the noise removal and the emphasizing of structure boundaries. Embedding the medium isoline into the image, cf. Fig. 4 (middle, left), we achieved the precise structure shape. Then the segmentation procedure was successively applied in the image sequence part consisting of 11 images, cf. Fig. 6. We use the backward in time strategy starting from the last image of the sequence segmented as explained above. The initial segmentation function for other slices is taken as the final result of the segmentation of the previous image. Fig. 6 shows the segmentation results displayed on the original membrane images from the last, 150th image slice (top), to the 145th slice (middle) up to the 140th slice of the processed image sequence (bottom). In experiments dealing with the nonlinear tensor anisotropic diffusion we used the spatial step $h = 0.01$, time step $k = 0.0001$, $C = 1$, $\alpha = 0.001$, $\tilde t = 10^{-5}$, $\rho = 0.002$, 20 time steps for the filtering of membrane images and $\tilde t = 10^{-10}$, $\rho = 0.1$, 50 time steps for the filtering of nuclei images. The arising linear systems were solved by the Gauss-Seidel iterative method. For the segmentation experiments we use the following parameters: $\varepsilon = 10^{-4}$, the spatial step $h = 0.01$, time step $k = 0.01$, $\delta = 10^{-6}$ for a stopping criterion and $K = 100$ (a constant of the function $g(s) = \frac{1}{1 + K s^2}$), cf. [9]. The resulting linear systems were solved by the SOR method.

Fig. 3. 2D slices of a 3D zebrafish embryo image. Left: the original image. Right: the filtered image after 50 time steps.

Fig. 4. The eye retina segmentation using the 2D original image (top) and image filtered by 20 time steps of the nonlinear tensor diffusion (middle). Left: the averaged isoline of the final state of the segmentation function is superimposed on the original and filtered image, respectively. Right: the graph of the final state of the segmentation function is plotted after 2000 time steps using the original image and after 200 time steps using the filtered image. At the bottom we display the original (left) and the filtered image (right).

Fig. 5. The eye retina segmentation using the image filtered by 100 steps of the GMCF filtering (top), 25 steps of the MCF filtering (middle) and 20 steps of the PM filtering (bottom). Left: the averaged isoline of the final state of the segmentation function is superimposed on the filtered image. Right: the graph of the final state of the segmentation function is plotted after 3000 segmentation steps using the GMCF filtering, after 500 segmentation steps using the MCF filtering and 5000 segmentation steps using the PM filtering.

Fig. 6. The segmentation results for the image sequence, superimposed on the original slices
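The text names the Gauss-Seidel and SOR iterations as the linear solvers but gives no further detail. A generic textbook SOR iteration for a dense system $Ax = b$ is sketched below; with `omega = 1.0` it reduces to Gauss-Seidel. This is not the authors' implementation: the function name, the dense storage, and the stopping rule on successive iterates with tolerance `tol` are assumptions made for illustration, and a production code for the 27-point stencil would operate on the voxel grid directly.

```python
import numpy as np

def sor_solve(A, b, omega=1.0, tol=1e-6, max_iter=10_000):
    """Successive over-relaxation for A x = b; omega = 1.0 gives the Gauss-Seidel method.

    Stops when the largest change between successive iterates falls below `tol`
    (an illustrative criterion, not taken from the paper).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            # Sum of off-diagonal terms, using already updated entries of x
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x
```

For the diagonally dominant matrices produced by semi-implicit diffusion schemes both variants converge; the relaxation factor `omega` between 1 and 2 is a tuning parameter.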
6 Conclusions
In this article we are concerned with a technique for embryo structure segmentation in image sequences. Since noise and cell structures restrain the correct segmentation evolution, as the first step, we smooth the image sequence. We choose the nonlinear tensor diffusion due to the fact that this filtering not only smoothes image objects but emphasizes connections of their boundaries as well. Then, the segmentation process starts using an artificial initial function centered inside the biological structure of the first image in the sequence. The segmentation result given by the subjective surface method obtained for this image is used as the initial condition for the next image of the processed sequence, etc. Our experiments confirm the usefulness of the nonlinear tensor diffusion for this type of segmentation.
Acknowledgment
The work was supported by the European projects Embryomics and BioEmergences, the grants APVV-RPEU-0004-06, APVV-0351-07, APVV-LPP-0020-07 and the grant of VEGA 1/0269/09.
References
1. Corsaro, S., Mikula, K., Sarti, A., Sgallari, F.: Semi-implicit co-volume method in 3D image segmentation. SIAM J. Sci. Comput. 28(6), 2248–2265 (2006)
2. Coudiere, Y., Vila, J.P., Villedieu, P.: Convergence rate of a finite volume scheme for a two-dimensional convection-diffusion problem. M2AN Math. Model. Numer. Anal. 33, 493–516 (1999)
3. Drblíková, O., Mikula, K.: Convergence Analysis of Finite Volume Scheme for Nonlinear Tensor Anisotropic Diffusion in Image Processing. SIAM J. Numer. Anal. 46(1), 37–60 (2007)
4. Drblíková, O., Mikula, K.: Semi-implicit Diamond-cell Finite Volume Scheme for 3D Nonlinear Tensor Diffusion in Coherence Enhancing Image Filtering. In: Eymard, R., Herard, J.M. (eds.) Finite Volumes for Complex Applications V: Problems and Perspectives, ISTE and WILEY, London, pp. 343–350 (2008)
5. Eymard, R., Gallouët, T., Herbin, R.: Finite Volume Methods. In: Ciarlet, P., Lions, J.L. (eds.) Handbook for Numerical Analysis, vol. 7. Elsevier, Amsterdam (2000)
6. Frolkovič, P., Mikula, K., Peyriéras, N., Sarti, A.: A counting number of cells and cell segmentation using advection-diffusion equations. Kybernetika 43(6), 817–829 (2007)
7. Meijering, E., Niessen, W., Weickert, J., Viergever, M.: Diffusion-Enhanced Visualization and Quantification of Vascular Anomalies in Three-Dimensional Rotational Angiography: Results of an In-Vitro Evaluation. Medical Image Analysis 6(3), 217–235 (2002)
8. Mikula, K., Peyriéras, N., Remešíková, M., Sarti, A.: 3D embryogenesis image segmentation by the generalized subjective surface method using the finite volume technique. In: Eymard, R., Herard, J.M. (eds.) Finite Volumes for Complex Applications V: Problems and Perspectives, ISTE and WILEY, London, pp. 585–592 (2008)
9. Mikula, K., Sarti, A., Sgallari, F.: Co-volume level set method in subjective surface based medical image segmentation. In: Suri, J., et al. (eds.) Handbook of Medical Image Analysis: Segmentation and Registration Models, pp. 583–626. Springer, New York (2005)
10. Sarti, A., Malladi, R., Sethian, J.A.: Subjective Surfaces: A Method for Completing Missing Boundaries. Proceedings of the National Academy of Sciences of the United States of America 12(97), 6258–6263 (2000)
11. Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vision 31, 111–127 (1999)
Composed Segmentation of Tubular Structures by an Anisotropic PDE Model Elena Franchini, Serena Morigi, and Fiorella Sgallari Department of Mathematics-CIRAM, University of Bologna, Bologna, Italy {franchini,morigi,sgallari}@dm.unibo.it
Abstract. In this work we introduce composed segmentation (C-segmentation), that is, the a priori composition of several sources to obtain a single segmentation result according to specific logic combinations. The approach and the segmentation model are general, but we apply the C-segmentation technique to the challenging problem of segmenting tubular-like structures. The reconstruction is obtained by continuously deforming an initial distance function following a Partial Differential Equation (PDE)-based diffusion model derived from a minimal volume-like variational formulation. The gradient flow for this functional leads to a nonlinear curvature motion model. An anisotropic variant is provided which includes a diffusion tensor designed to follow the tube geometry. Numerical examples demonstrate the ability of the proposed method to produce high-quality 2D/3D segmentations of complex and possibly incomplete synthetic and real data.
1 Introduction
Segmentation of three-dimensional (3D) images can be a very useful computer-aided diagnosis tool for clinical routines or surgical planning. We use the term composed segmentation for systems that extract structures from several images by combining them according to specific Boolean operations. Traditionally, segmentations performed independently on single images have to be combined by cumbersome algorithms. The goal of C-segmentation is to combine complementary multispatial, multisensor, multitemporal and/or multiview information into one new domain containing only the information to be segmented. The term composed refers to Boolean operations, which depend on the application requirements. The individual images entering the C-segmentation process need to be registered to a common frame of reference. This is a nontrivial task which could affect the robustness of the segmentation approach, but it is not addressed in this work: we assume the input images have been registered beforehand.

Let us illustrate the role of C-segmentation in different applications. Multimodal fusion deals with images that capture different physical properties of the original scene. In this case, C-segmentation identifies and segments the union of regions of interest. Multispatial fusion is related to several images which cover a single scene, for example several aerial photographs that represent an entire territorial region, or multiple CT scans used to reconstruct a human organ. The C-segmentation unifies all the information, possibly replicated in the multiple sources, into a single segmented structure. Multitemporal composition requires the comparison between images representing the same structures acquired at different times. For example, in medical analysis, the growth of a tumor region is monitored by successive images of the region of interest. The C-segmentation can identify and segment the difference between structures in two images, reconstructing the grown area.

While our segmentation methodology is quite general, we focus our attention on the most challenging problem of tubular-like segmentation, which is particularly difficult in the case of multiple sources because of the large number of connected structures that should be reconstructed. In particular, we consider applications in medical image analysis that require the extraction of anatomical surfaces of tubular structures such as blood vessels. Indeed, problems like aneurysm or stenosis can occur in a vessel, and clinicians need tools to help them interpret and quantify the images in order to evaluate the pathology, propose a therapy or a surgical operation, or plan minimally invasive treatment. A number of deformable model-based approaches for the segmentation of vessels or, more generally, tube-like structures have received considerable attention and success. We refer the reader to [4] for an extended review of vessel segmentation algorithms. Since an explicit deformable model representation is usually impractical, level set techniques, which provide an implicit representation of a deformable model, have recently been introduced to evolve it. A curve in 2D or a surface in 3D evolves in such a way as to cover a complex shape or structure. Its initialization can be either manual or automatic, and it need not be close to the desired solution.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 75–86, 2009. © Springer-Verlag Berlin Heidelberg 2009
A disadvantage of the level set segmentation approach is the computational effort required to cover the entire domain of interest, which is, in general, one dimension higher than the original one. Interested readers are referred to recent literature on the level set segmentation strategy for tubular structures [5], [6], [7], [8], [10]. A generalization of the single-channel active contour without edges model is proposed in [9] for object detection using logic operations. This logic framework suffers from the limits of the active contour model and is not suitable for detecting tubular structures.

In this work, we modify a geometric deformable model segmentation procedure based on level sets [2] to obtain a fast and accurate method for solving the C-segmentation problem of extracting tubular structures from multiple 2D/3D images, and we apply the proposed method to segment blood vessels, neurovascular structures and medical images with similar characteristics. The main contributions of this work concern the design of a strategy, based on a diffusion tensor, to deal with directionality in the vessels, and the capability to compose the segmentation of multiple images according to Boolean operations. The former makes the segmentation algorithm able to follow tubular structures and connect possibly disconnected parts, while the latter allows different information to be combined simultaneously into a robust segmentation method. The proposed method is able to segment twisted, convoluted, and occluded structures without user interaction, following branching of different layers, from thinner to larger structures. One of the major disadvantages of geometric deformable models, namely the computational cost, is strongly reduced by the proposed numerical approach, which limits the dimensions of the linear systems involved in the solution.

The paper is organized as follows. The nonlinear PDE model for the segmentation of tubular structures is introduced in Section 2, and numerical aspects related to the discretization of the PDE model are discussed in Section 3. The C-segmentation algorithm is discussed in detail in Section 4, and the anisotropic variant of the segmentation model is introduced in Section 5. Synthetic as well as real tests are provided in Section 6. Some selected 3D examples are also presented in Section 6 to demonstrate the effectiveness of this technique for the automatic segmentation of blood vessels in volumetric MRA/CTA images. Section 7 contains concluding remarks.
2 A Segmentation Model for Tubular Structures
Several recently proposed 3D segmentation methods are based on deformable models, which can naturally capture the physics and geometry of shapes varying in space and time. In this section we formulate the segmentation problem as a special deformation of a 3D manifold driven by the structures we want to recover. Classical segmentation methods produce oversmoothed structures and possibly incomplete boundaries, and the evolving surface usually flows over the boundaries of longer and thinner objects when propagating. A common choice to detect structure boundaries and to drive the diffusion or segmentation process is the Perona-Malik diffusivity

\[ g(s) = \frac{1}{1 + (s/\rho)^2}, \tag{1} \]

where ρ > 0 is a small constant. For the implicit representation of the segmented surface, we consider a special 3D manifold which is the graph of a trivariate function φ mapping an open set Ω ⊂ R^3 into R. The problem of determining the surface that best fits the object boundary represented in a 3D image I can be posed as a volume minimization problem with objective function

\[ V_g := \int_{\Omega} g(|\nabla I|)\, dV, \qquad dV = \sqrt{1 + |\nabla\phi|^2}\; dx\, dy\, dz, \tag{2} \]

where the metric g is defined by (1) in Ω and V_g represents the weighted volume of a 3D manifold on Ω. The volume functional (2) can be minimized by steepest descent, which, for ε = 1, reads

\[ \frac{\partial\phi}{\partial t} = \sqrt{\varepsilon^2 + |\nabla\phi|^2}\; \nabla\cdot\left( g(|\nabla I|)\, \frac{\nabla\phi}{\sqrt{\varepsilon^2 + |\nabla\phi|^2}} \right), \tag{3} \]

or, equivalently, in advection-diffusion form,

\[ \frac{\partial\phi}{\partial t} = \sqrt{\varepsilon^2 + |\nabla\phi|^2}\; g(|\nabla I|)\, \nabla\cdot\left( \frac{\nabla\phi}{\sqrt{\varepsilon^2 + |\nabla\phi|^2}} \right) + \nabla g\cdot\nabla\phi. \tag{4} \]
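As a small illustration of the edge function (1) that weights the flow (3)-(4), the following Python sketch evaluates g on the gradient magnitude of a one-dimensional intensity profile. The profile, the value of rho and the central-difference helper are hypothetical choices made only for this example; they are not taken from the paper.

```python
def perona_malik(s, rho):
    """Edge function g(s) = 1 / (1 + (s / rho)^2) from equation (1)."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return 1.0 / (1.0 + (s / rho) ** 2)


def gradient_magnitude_1d(signal, i):
    """Central-difference estimate of |I'(x_i)| on a unit grid (interior i)."""
    return abs(signal[i + 1] - signal[i - 1]) / 2.0


# Hypothetical 1D intensity profile with a step edge between index 3 and 4.
profile = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
rho = 0.1  # assumed value, chosen only for illustration

# g stays at 1 in flat regions and drops towards 0 next to the edge.
weights = [perona_malik(gradient_magnitude_1d(profile, i), rho)
           for i in range(1, len(profile) - 1)]
```

With these inputs the weight equals 1 away from the step and is close to zero at the two nodes adjacent to it, which is the behaviour that lets the boundaries act as attractors.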
The PDE model (4) with ε = 1 represents the mean curvature motion of the 3D manifold in 4D space with metric g. The metric g in (4) is the edge function, chosen so that the object boundaries act as attractors under a particular flow. This term allows us to extract sharp features, such as edges, corners and spikes, and to accelerate the deformation of the initial function. In the evolution of φ according to (4), the 3D manifold assumes constant values in most regions far from the boundaries. The first term in (4) corresponds to a minimal volume regularization weighted by the function g, while the second term corresponds to the attraction to the image edges: this advection term introduces a driving force which moves the level surfaces towards the object boundaries. Equation (3) with ε ∈ (0, 1] is proposed in [2] for dealing with the boundary completion problem. Varying the parameter ε in (0, 1] provides both a regularization effect and a hole-filling strategy: it makes it possible to segment boundaries that are possibly incomplete due to, for example, noise or corruption in the acquisition phase. However, this does not help in the reconstruction of slightly disconnected tubular structures. The latter problem is solved by the introduction of a suitable diffusion tensor, which is discussed in Section 5. Choosing the initial function φ0 is usually a problem, since it involves user interaction for locating some starting points at one particular recognizable part of the structure to be segmented inside the 3D image. This is overcome by our method, which automatically initializes the surface evolution using a suitably designed distance function, as described in Section 4.

We can adapt the PDE model (3) to compute Boolean operations between implicit surfaces M1 and M2. This can be carried out quite easily by applying the min and max operators to the related signed distance functions dist1(x) and dist2(x).
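The min and max operators mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signed distance is taken to be negative inside an object and positive outside, and the two circles are hypothetical test shapes, not data from the paper. Note also that the composed field has the correct sign everywhere but is, in general, no longer an exact distance function.

```python
import math


def circle_sdf(cx, cy, r):
    """Signed distance to a circle: negative inside, positive outside."""
    return lambda x, y: math.hypot(x - cx, y - cy) - r


def union(d1, d2):
    """A point is inside the union if it is inside either object."""
    return lambda x, y: min(d1(x, y), d2(x, y))


def intersection(d1, d2):
    """A point is inside the intersection if it is inside both objects."""
    return lambda x, y: max(d1(x, y), d2(x, y))


def difference(d1, d2):
    """Points inside the first object but outside the second."""
    return lambda x, y: max(d1(x, y), -d2(x, y))


def inside(d, x, y):
    """True when the composed signed distance is negative at (x, y)."""
    return d(x, y) < 0.0


# Two overlapping unit circles (hypothetical test data).
a = circle_sdf(0.0, 0.0, 1.0)
b = circle_sdf(1.0, 0.0, 1.0)
```

For instance, the point (-0.5, 0) lies in the first circle only, so it belongs to the union and to the difference of the first minus the second, but not to the intersection.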
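In fact, the union, intersection and differences of two surfaces can be obtained by applying the evolving PDE (3) initialized by (9) in Section 4, where dist(x) is defined, respectively, by

\[
\begin{aligned}
\mathrm{dist}(x) &= \min\{\mathrm{dist}_1(x),\, \mathrm{dist}_2(x)\} && \text{union},\\
\mathrm{dist}(x) &= \max\{\mathrm{dist}_1(x),\, \mathrm{dist}_2(x)\} && \text{intersection},\\
\mathrm{dist}(x) &= \max\{\mathrm{dist}_1(x),\, -\mathrm{dist}_2(x)\} && \text{difference } (M_1 - M_2),\\
\mathrm{dist}(x) &= \max\{-\mathrm{dist}_1(x),\, \mathrm{dist}_2(x)\} && \text{difference } (M_2 - M_1).
\end{aligned}
\tag{5}
\]

3 Solving the PDE Model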
The computational method for solving (3) is based on an efficient semi-implicit co-volume scheme as suggested in [2]. The semi-implicit time discretization is obtained by treating the nonlinear terms of the equation at the previous time step, while the linear ones are considered at the current time level. Time discretization of (3) by Euler's method yields the following semi-discrete scheme. Let τ be a uniform discrete time step and $\phi^0$ a given initial function. Then, for every discrete time step $t_n = n\tau$, $n = 1, \ldots, N$, we look for a function $\phi^n$ that solves the equation

\[ \frac{1}{\sqrt{\varepsilon^2 + |\nabla\phi^{n-1}|^2}}\; \frac{\phi^n - \phi^{n-1}}{\tau} = \nabla\cdot\left( g^0\, \frac{\nabla\phi^n}{\sqrt{\varepsilon^2 + |\nabla\phi^{n-1}|^2}} \right), \tag{6} \]

where $g^0 := g(|\nabla I|)$. The computational domain is obtained by decomposing Ω into cubic cells and constructing a co-volume mesh using a complementary 3D tetrahedral grid. Following the classical finite volume methodology, we integrate (6) over every co-volume p, p = 1, ..., M, and, according to the details explained in [2], we get at time step n a system of linear equations which can be written in matrix-vector form as

\[ A\Phi^n = b, \tag{7} \]
where $A \in R^{M\times M}$ is the coefficient matrix, which is a symmetric and diagonally dominant M-matrix, and $\Phi^n = (\phi_1, \ldots, \phi_M)$ is the solution vector. Since the unknown function φ(x, t) evolves only on nodes sufficiently close to the structure boundary, we speed up the computation by updating the values of φ(x, t) only at the nodes identified by the initial function φ0(x) > η, for a given small positive threshold η. In the case of vessel structures, for example, this means a significant reduction of the computational effort, since the number of nodes representing the vessels is small compared with the dimension of the entire 3D image which contains them. In practice, at each time step, the number of unknowns of the linear system (7) is significantly reduced, and thus both the storage and the computational cost are much lower. Since each row of A corresponds to a node in Ω, if we consider a limited number of nodes M1 0, we used = 1 · 10−3 in the computational examples. We can proceed similarly for other Boolean operations.
Finally, the reconstructed surface is obtained from the implicit surface φ as the zero level set of the function φ(x) − s, that is, the s-level set of φ: {x ∈ Ω : φ(x) = s}, where s = (max(Φ) + min(Φ))/2. This is motivated by the fact that the flow driven by (3) forms a sharp step in the proximity of the object boundaries, while it approaches constant values inside and outside the object.
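The choice of the level s can be illustrated with a minimal Python sketch. The small grid below is hypothetical data imitating an evolved function with a sharp step; treating the nodes with φ > s as the inside of the object is an assumption made for the example, since the orientation depends on how φ is initialized.

```python
def mid_level(phi):
    """Threshold s = (max(phi) + min(phi)) / 2 over all grid values."""
    flat = [value for row in phi for value in row]
    return (max(flat) + min(flat)) / 2.0


def segment(phi):
    """Binary mask of the grid nodes where phi exceeds the mid level s."""
    s = mid_level(phi)
    return [[value > s for value in row] for row in phi]


# Hypothetical evolved function: close to 1 inside the object, close to 0 outside.
phi = [
    [0.02, 0.03, 0.02, 0.01],
    [0.03, 0.97, 0.98, 0.02],
    [0.02, 0.98, 0.99, 0.03],
    [0.01, 0.02, 0.03, 0.02],
]
mask = segment(phi)
```

Because the two plateaus are well separated, any level near the middle of the range gives the same mask, which is why the midpoint is a convenient default.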
5 An Anisotropic Variant of the Segmentation Algorithm
In this section we provide a variant of the isotropic model (3) designed to significantly improve the connectivity of the coherent structures in the segmentation. The idea is to incorporate the local orientation of the tubular structures into the dynamic segmentation process in such a way that, at each time step, the surface evolves by isotropic mean curvature motion in homogeneous regions, while in the presence of tubular structures it is driven by the directional field representing the orientation of the tube. We aim to capture the structure and the directions of the vessels locally by a local spatial coherence descriptor. Coherence-enhancing image smoothing was introduced in [3] and successfully applied to image filtering by anisotropic diffusion. This type of nonlinear diffusion relies on a diffusion tensor which is built as follows. Given an image I and its Gaussian-smoothed version $I_\sigma$, a regularized shape descriptor is provided by

\[ J_\delta(\nabla I_\sigma) := K_\delta * (\nabla I_\sigma \otimes \nabla I_\sigma), \tag{11} \]

where $K_\delta$ is a Gaussian kernel with δ ≥ 0. The matrix $J_\delta$ is symmetric positive semi-definite and its eigenvalues μ1 ≥ μ2 integrate the variation of the gray values within a neighborhood of size O(δ). They describe the average contrast in the corresponding eigendirections v1 and v2. The orientation of the eigenvector v2, corresponding to the smaller eigenvalue, represents the direction of lowest fluctuations, the so-called coherence orientation. In this way, constant areas are characterized by μ1 = μ2 = 0, while straight edges give μ1 ≫ μ2 = 0. The normalized coherence value, which measures the anisotropic structures within a window of scale δ, is thus defined as

\[ c = \frac{(\mu_1 - \mu_2)^2}{\max\{(\mu_1 - \mu_2)^2\}}, \qquad c \in [0, 1]. \tag{12} \]
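The construction of the structure tensor and of the coherence (12) can be sketched as follows in plain Python. Two simplifications are assumed here and are not part of the paper: a plain average over a handful of gradient samples replaces the Gaussian kernel, and the three neighborhoods are hypothetical test cases.

```python
import math


def structure_tensor(gradients):
    """Average of the outer products of the gradients over a neighborhood.

    A plain (box) average stands in for the Gaussian kernel K_delta.
    Returns the entries (a, b, d) of the symmetric matrix [[a, b], [b, d]].
    """
    n = len(gradients)
    a = sum(gx * gx for gx, gy in gradients) / n
    b = sum(gx * gy for gx, gy in gradients) / n
    d = sum(gy * gy for gx, gy in gradients) / n
    return a, b, d


def eigenvalues(a, b, d):
    """Eigenvalues mu1 >= mu2 of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean + radius, mean - radius


def coherence(tensors):
    """Normalized coherence (mu1 - mu2)^2 / max (mu1 - mu2)^2 for each tensor."""
    raw = [(m1 - m2) ** 2 for m1, m2 in (eigenvalues(*t) for t in tensors)]
    top = max(raw)
    return [r / top if top > 0 else 0.0 for r in raw]


edge = structure_tensor([(1.0, 0.0)] * 4)            # gradients all aligned
flat = structure_tensor([(0.0, 0.0)] * 4)            # constant area
mixed = structure_tensor([(1.0, 0.0), (0.0, 1.0)])   # no dominant direction
c = coherence([edge, flat, mixed])
```

The aligned neighborhood has one large and one vanishing eigenvalue and receives coherence 1, while the constant and the direction-free neighborhoods both receive coherence 0.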
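Thus c approaches 1 for anisotropic structures and tends to zero for isotropic structures. The diffusion tensor D is a matrix with the same eigenvectors as the (regularized) structure tensor $J_\delta$, and its eigenvalues are given by

\[ \lambda_1 = g(|\nabla I|), \qquad \lambda_2 = \begin{cases} g(|\nabla I|) & \text{if } \mu_1 = \mu_2,\\ g(|\nabla I|) + \left(1 - g(|\nabla I|)\right) e^{-\kappa/c} & \text{else}, \end{cases} \qquad \kappa > 0, \tag{13} \]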
where g(·) is the composed diffusion function, defined for example by (10), which suitably adapts its values to the anisotropy. The parameter κ has the role of a threshold, and c is the coherence defined in (12). Therefore, the matrix D has the following form:

\[ D = [\, v_1 \;\; v_2 \,] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}. \tag{14} \]

In locally homogeneous areas of an image the diffusion reduces to the isotropic mean curvature motion driven by (3); in fact, we have D = g(|∇I|) Id, where Id is the identity matrix. Areas near elongated structures are characterized by values of g(·) approaching zero, which gives λ2 ≫ λ1. The effect on the diffusion of the segmentation function is thus stronger along the coherence directions. In all the experiments reported in Section 6 we set κ = 1 · 10^{−5}. Since if c >> κ then λ2 ≈ 1, while if c 0 is a tuning parameter.

Remark 1. Functional (3) is defined on $W^{1,\infty}(\Omega)$. Since the domain Ω is bounded, the inclusion $L^\infty(\Omega) \subset L^2(\Omega)$ holds, so $Dv \in L^2(\Omega)$.

Remark 2. If v is a minimizer of (3), so is v + C, where C denotes any real constant. This is not a problem since we are interested in the associated gradient vector field. If $v \in W^{1,\infty}(\Omega)$, v is Lipschitz continuous and thus, by Rademacher's theorem, differentiable almost everywhere.

To minimize the above energy, we make use of absolutely minimizing Lipschitz extensions. Following the results on AMLE recalled in Sect. 1, we obtain the Euler-Lagrange equation satisfied by v if it minimizes (3) and solve it by gradient descent. More precisely, in image processing the equation is classically defined on a domain R of $\mathbb{R}^2$ (e.g., on the square [0, 1] × [0, 1]). In this case, boundary conditions must be defined: Neumann boundary conditions on ∂R are well suited to the image processing framework since they correspond to the reflection of the data through the edges. Thus it is no longer necessary to define boundary values. Following [6] and [15], we propose to simplify the problem by working with periodic solutions. The function v, initially defined on [0, 1] × [0, 1], is extended to $\mathbb{R}^2$. First, by symmetry, we extend it to [−1, 1] × [−1, 1], and then to all of $\mathbb{R}^2$ by periodicity (see Sect. 3.3.1 of [6]). We thus obtain that $\forall h \in \mathbb{Z}^2$, $\forall x \in \mathbb{R}^2$,
Extrapolation of Vector Fields Using the Infinity Laplacian
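$v(x + 2h) = v(x)$. Also, we assume that the initial condition $v_0$ and the functions $x \mapsto w_k(x)$, $k = 1, 2$, are extended to $\mathbb{R}^2$ with the same periodicity. Given T > 0, we then obtain the following problem:

\[
\begin{cases}
\begin{aligned}
\dfrac{\partial v}{\partial t} &= 2\|W\|^2 \Delta v + 2\left\langle D\|W\|^2, Dv \right\rangle - 2\,\mathrm{div}\!\left(\|W\|^2 W\right) + \mu \left\langle D^2 v\, \dfrac{Dv}{|Dv|}, \dfrac{Dv}{|Dv|} \right\rangle \\
&= b(x)\Delta v - \left\langle d(x), Dv \right\rangle - h(x) + \mu \left\langle D^2 v\, \dfrac{Dv}{|Dv|}, \dfrac{Dv}{|Dv|} \right\rangle \quad \text{on } \mathbb{R}^2 \times (0, T),
\end{aligned} \\[2ex]
v(x, t = 0) = v_0(x) \quad \text{in } \mathbb{R}^2,
\end{cases}
\tag{4}
\]

with $b : x \mapsto 2\|W(x)\|^2$, $d : x \mapsto -2\, D\|W\|^2(x)$, $h : x \mapsto 2\,\mathrm{div}\!\left(\|W\|^2 W\right)(x)$, and with the assumptions $v_0 \in C(\mathbb{R}^2) \cap W^{1,\infty}(\mathbb{R}^2)$, $b \in C(\mathbb{R}^2)$ and bounded by $\xi_b$, $d \in C(\mathbb{R}^2) \cap W^{1,\infty}(\mathbb{R}^2)$, bounded by $\xi_d$ and with Lipschitz constant $\kappa_d$, $h \in C(\mathbb{R}^2) \cap W^{1,\infty}(\mathbb{R}^2)$, bounded by $\xi_h$ and with Lipschitz constant $\kappa_h$, and with $\langle\cdot,\cdot\rangle$ denoting the Euclidean scalar product in $\mathbb{R}^2$. We also assume that the mapping $\mathbb{R}^2 \ni x \mapsto b^{1/2}(x)$ is Lipschitz continuous on $\mathbb{R}^2$ with Lipschitz constant $\kappa_{b^{1/2}}$.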
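The extension by symmetry and then by periodicity described above can be sketched in Python as a coordinate folding. The function name `fold` and the sample function on the unit square are hypothetical and serve only to illustrate the construction, which yields an even function of period 2 in each variable.

```python
def fold(x):
    """Map any real coordinate to [0, 1] by even reflection and period 2."""
    r = x % 2.0  # for a positive modulus, Python returns a value in [0, 2)
    return r if r <= 1.0 else 2.0 - r


def extend(v):
    """Extend v, defined on [0, 1] x [0, 1], to the whole plane."""
    return lambda x1, x2: v(fold(x1), fold(x2))


def v(x1, x2):
    """Hypothetical function on the unit square, for illustration only."""
    return x1 + 10.0 * x2


v_ext = extend(v)
```

With this folding, `v_ext(x1 + 2 * h1, x2 + 2 * h2)` equals `v_ext(x1, x2)` for all integers h1 and h2, and the extension is symmetric with respect to each coordinate axis.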
3 Theoretical Results
This problem falls within the framework of the theory of viscosity solutions. Indeed, we obtain a second-order singular degenerate parabolic equation. The concept of viscosity solutions was introduced in 1981 by Crandall and Lions [22]. This theory was developed to study first-order partial differential equations in nondivergence form, typically Hamilton-Jacobi equations. Later, the study of viscosity solutions was extended to second-order elliptic and parabolic equations (for a good introduction to the theory of viscosity solutions, we refer to Barles [8, 7], the article of Crandall, Ishii and Lions [21], Crandall and Lions [23], Ishii [26], and Ishii and Lions [27]). We also refer to the related work [9]. In our problem, the evolution equation in (4) can be rewritten in the form

\[ \frac{\partial v}{\partial t} + G\left(x, Dv, D^2 v\right) = 0, \]

with G defined on $\mathbb{R}^2 \times \left(\mathbb{R}^2 \setminus \{0_{\mathbb{R}^2}\}\right) \times S^2$ ($S^2$ being the set of symmetric 2 × 2 matrices equipped with its natural partial order) by

\[
\begin{aligned}
G(x, p, X) &= \langle d(x), p \rangle + h(x) - b(x)\,\mathrm{trace}(X) - \mu\, \frac{p^T}{|p|}\, X\, \frac{p}{|p|} \\
&= \langle d(x), p \rangle + h(x) - b(x)\,\mathrm{trace}(X) - \mu\,\mathrm{trace}\!\left( \frac{p \otimes p}{|p|^2}\, X \right) \\
&= c(x, p) + E(x, X) + F(p, X),
\end{aligned}
\]

with the following properties:
L. Guillot and C. Le Guyader
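1. The operators G, $F : (p, X) \mapsto -\mu\,\mathrm{trace}\!\left( \frac{p \otimes p}{|p|^2}\, X \right)$ and $E : (x, X) \mapsto -b(x)\,\mathrm{trace}(X)$ are independent of v and are elliptic, i.e., $\forall X, Y \in S^2$, $\forall p \in \mathbb{R}^2$,

\[ \text{if } X \le Y \text{ then } F(p, X) \ge F(p, Y). \tag{5} \]

The operators G, E, and F are therefore proper.

2. F is locally bounded on $\mathbb{R}^2 \times S^2$, continuous on $\left(\mathbb{R}^2 \setminus \{0_{\mathbb{R}^2}\}\right) \times S^2$, and

\[ F^*(0, 0) = F_*(0, 0) = 0, \tag{6} \]

where $F^*$ (resp. $F_*$) is the upper semicontinuous (usc) envelope (resp. lower semicontinuous (lsc) envelope) of F.

3. $c : \mathbb{R}^2 \times \mathbb{R}^2 \ni (x, p) \mapsto \langle d(x), p \rangle + h(x)$ is locally Lipschitz continuous in space and, $\forall (x, y) \in \mathbb{R}^2 \times \mathbb{R}^2$,

\[ |c(x, p) - c(y, p)| \le \left(\kappa_d |p| + \kappa_h\right) |x - y|. \tag{7} \]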
We start by proving a comparison principle that will be useful to prove the uniqueness of the viscosity solution of the considered problem.

Theorem 1 (Comparison principle). Let $u \in USC(\mathbb{R}^2 \times [0, T))$, bounded and periodic (with the same periodicity as the initial condition of (4)), be a subsolution, and let $v \in LSC(\mathbb{R}^2 \times [0, T))$, bounded and periodic (with the same periodicity as the initial condition of (4)), be a supersolution of (4). Assume that $u_0(x) = u(x, 0) \le v_0(x) = v(x, 0)$ in $\mathbb{R}^2$. Then $u \le v$ in $\mathbb{R}^2 \times [0, T)$.

Proof. This proof is rather classical. We follow the arguments of [21]. We first observe that, for λ > 0, $\tilde{u} = u - \frac{\lambda}{T - t}$ is also a subsolution of (4) and

\[ \tilde{u}_t + G_*\left(x, D\tilde{u}, D^2\tilde{u}\right) \le -\frac{\lambda}{(T - t)^2} \le -\frac{\lambda}{T^2}. \]

Since $u \le v$ follows from $\tilde{u} \le v$ in the limit λ → 0, it suffices to prove the comparison under the additional assumptions

\[
\begin{cases}
\text{(i)} \;\; u_t + G_*\left(x, Du, D^2 u\right) \le -\dfrac{\lambda}{T^2}, \\[1.5ex]
\text{(ii)} \;\; \lim_{t \to T} u(x, t) = -\infty.
\end{cases}
\tag{8}
\]

Let us set $M = \sup_{\mathbb{R}^2 \times [0, T)} \left( u(x, t) - v(x, t) \right)$. We aim to show that M ≤ 0. To this end, we argue by contradiction and assume that M > 0. We introduce the duplication function $f(x, y, t) = u(x, t) - v(y, t) - (4\varepsilon)^{-1} |x - y|^4$ and consider $M_0 = \sup_{\mathbb{R}^2 \times \mathbb{R}^2 \times [0, T)} \left( u(x, t) - v(y, t) - (4\varepsilon)^{-1} |x - y|^4 \right)$, ε > 0. Obviously, $M_0 \ge M > 0$. Moreover, this supremum is attained owing to the upper bounds on u and −v, the fact that f satisfies $f(x + 2h, y + 2h, t) = f(x, y, t)$ for all $h \in \mathbb{Z}^2$, and (8)(ii). We denote by $(x_0, y_0, t_0) \in \mathbb{R}^2 \times \mathbb{R}^2 \times [0, T)$ a point of maximum. We first prove that $t_0 > 0$ for ε small enough and then derive a contradiction using Th. 8.3 from [21], which allows us to conclude that M ≤ 0. Consequently, $u \le v$ in $\mathbb{R}^2 \times [0, T)$.
We now give an existence result using the classical Perron's method (see Sect. 4 of [21]). We start by constructing a subsolution $U^-$. Let us set $U^- = \inf_{\mathbb{R}^2}(v_0) - Ct$ with $C = \xi_h$. $U^-$ is twice differentiable in space, once differentiable in time, bounded, and periodic with the same periodicity as $v_0$, and $U^-$ is a subsolution of (4). Similarly, $U^+ = \sup_{\mathbb{R}^2}(v_0) + Ct$ is a supersolution of (4). Obviously, $U^-(x, 0) \le U^+(x, 0)$. We can define

\[ v = \sup\left\{ w;\; w \text{ periodic with the same periodicity as } v_0,\; \text{subsolution such that } U^- \le w \le U^+ \right\}. \]

In that case, Perron's method states that v is a periodic discontinuous solution of (4) with the same periodicity as $v_0$. Clearly, the solution is bounded since $U^+$ is bounded. Also, as v is a solution, $v^*$ is a subsolution and $v_*$ a supersolution, so from the comparison principle $v^* \le v_*$. But $v_* \le v^*$, so $v^* = v_* = v$, which gives that v is continuous on $\mathbb{R}^2 \times [0, T)$.

Conclusion 1. We have proved the existence and uniqueness of a bounded, periodic viscosity solution of (4) that is continuous on $\mathbb{R}^2 \times [0, T)$.

We now prove that a solution of (4) is Lipschitz continuous in space and uniformly continuous in time.

Theorem 2 (Regularity results). Let us assume that $\|Dv_0\|_{L^\infty(\mathbb{R}^2)} \le B_0$ with $B_0 > 0$. Then the solution of (4) satisfies

\[ \|Dv(\cdot, t)\|_{L^\infty(\mathbb{R}^2)} \le B(t), \qquad \text{with } B(t) = \kappa_h\, \frac{e^{\alpha t} - 1}{\alpha} + B_0\, e^{\alpha t} \text{ and } \alpha = 8\kappa_{b^{1/2}}^2 + \kappa_d. \]

Proof. The function v is bounded, continuous on $\mathbb{R}^2 \times [0, T)$, and periodic with the same periodicity as $v_0$. We set $\Phi_\varepsilon(x, y, t) = B(t) \left( |x - y|^2 + \varepsilon^2 \right)^{1/2}$ and aim at proving that $v(x, t) - v(y, t) \le \Phi_\varepsilon(x, y, t)$. Let us set $M = \sup_{(x, y) \in \mathbb{R}^2 \times \mathbb{R}^2,\; t \in [0, T)} \left( v(x, t) - v(y, t) - \Phi_\varepsilon(x, y, t) \right)$. We thus aim to show that M ≤ 0. Once again, we argue by contradiction and assume that M > 0. We then conclude that $v(x, t) - v(y, t) \le \Phi_\varepsilon(x, y, t)$ and, letting ε tend to 0, one obtains $v(x, t) - v(y, t) \le B(t)|x - y|$. Exchanging x and y yields $|v(x, t) - v(y, t)| \le B(t)|x - y|$.
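As a sanity check on the bound of Theorem 2 (a remark added here, not part of the proof), one can verify by direct differentiation that $B(t) = \kappa_h \frac{e^{\alpha t} - 1}{\alpha} + B_0 e^{\alpha t}$ starts from the Lipschitz constant of the initial condition and grows according to a linear ordinary differential equation:

```latex
\[
B(0) = B_0, \qquad
B'(t) = \kappa_h e^{\alpha t} + \alpha B_0 e^{\alpha t}
      = \alpha \left( \kappa_h \frac{e^{\alpha t} - 1}{\alpha} + B_0 e^{\alpha t} \right) + \kappa_h
      = \alpha B(t) + \kappa_h .
\]
```

In particular, B is finite on every bounded time interval, so the spatial Lipschitz estimate holds up to any finite time T.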
Theorem 3 (Regularity results). The solution v is uniformly continuous in time.
Proof. We proceed as in [25]. First, we assume that $v_0$ is bounded, periodic, $C^2$, and such that there exists C with $\|Dv_0\|_{L^\infty(\mathbb{R}^2)}, \|D^2 v_0\|_{L^\infty(\mathbb{R}^2)} \le C$. Let us set

\[ C_1 = \sup_{x \in \mathbb{R}^2} \left\{ \zeta + E(x, D^2 v_0) + F_*(Dv_0, D^2 v_0),\;\; \zeta - E(x, D^2 v_0) - F^*(Dv_0, D^2 v_0) \right\} \]

with $\zeta = \xi_d \|Dv_0\|_{L^\infty(\mathbb{R}^2)} + \xi_h$. Let us also set $v^- = v_0 - C_1 t$ and $v^+ = v_0 + C_1 t$. It can be checked that $v^-$ is a subsolution of (4) and $v^+$ is a supersolution. Then there exists a unique solution v of (4) and the comparison principle yields: $\forall x \in \mathbb{R}^2$, $\forall t \in [0, T)$, $|v(x, t) - v_0(x)| \le C_1 t$. Letting $u(x, t) = v(x, t + h)$, we obtain that u is the solution of

\[
\begin{cases}
\dfrac{\partial u}{\partial t} + G(x, Du, D^2 u) = 0, \\[1.5ex]
u(x, t = 0) = v(x, h).
\end{cases}
\]

Classical arguments (comparison principle) allow us to conclude that $|u(x, t) - v(x, t)| \le C_1 h$, that is, $|v(x, t + h) - v(x, t)| \le C_1 h$. So v is uniformly continuous in time. Then we assume that $v_0$ is only bounded, periodic and Lipschitz continuous, and use mollification (see Chap. IV of [11] and Sect. 2.5 of [6]). Using the first step of the proof, we obtain the result and the modulus of continuity of v, which depends on $B_0$.

Conclusion 2. We have proved the existence and uniqueness of a viscosity solution of problem (4) that is bounded, periodic, continuous on $\mathbb{R}^2 \times [0, T)$, Lipschitz continuous in space (hence differentiable almost everywhere), and uniformly continuous in time.

We now discretize the evolution equation. In the sequel, we set $\Omega \ni x = (x_1, x_2)$.
4 Experimental Results
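Let $\Delta x_1$ and $\Delta x_2$ be the spatial steps, $\Delta t$ be the time step and $(x_{1i}, x_{2j}) = (i\Delta x_1, j\Delta x_2)$ be the grid points, 1 ≤ i ≤ M and 1 ≤ j ≤ N. For a function $\Psi : \Omega \to \mathbb{R}$, let $\Psi^n_{ij} = \Psi(i\Delta x_1, j\Delta x_2, n\Delta t)$. To discretize (4), we use the explicit finite difference scheme below, complemented by Neumann boundary conditions. For the discretization of the convection component, we refer to [36] (we use the usual notations for the finite difference operators and the notation $d = (d_1, d_2)$):

\[
\begin{aligned}
v^{n+1}_{i,j} = {} & v^n_{i,j} + \Delta t\, b_{i,j} \left( D_{x_1 x_1} v^n_{i,j} + D_{x_2 x_2} v^n_{i,j} \right) \\
& - \Delta t \Big[ \max\!\left((d_1)_{i,j}, 0\right) D^{x_1}_{-} v^n_{i,j} + \min\!\left((d_1)_{i,j}, 0\right) D^{x_1}_{+} v^n_{i,j} \\
& \qquad\;\; + \max\!\left((d_2)_{i,j}, 0\right) D^{x_2}_{-} v^n_{i,j} + \min\!\left((d_2)_{i,j}, 0\right) D^{x_2}_{+} v^n_{i,j} \Big] - \Delta t\, h_{i,j} \\
& + \Delta t\, \mu\, \frac{ D_{x_1 x_1} v^n_{i,j} \left(D_{x_1} v^n_{i,j}\right)^2 + 2\, D_{x_1} v^n_{i,j}\, D_{x_2} v^n_{i,j}\, D_{x_1 x_2} v^n_{i,j} + D_{x_2 x_2} v^n_{i,j} \left(D_{x_2} v^n_{i,j}\right)^2 }{ \left(D_{x_1} v^n_{i,j}\right)^2 + \left(D_{x_2} v^n_{i,j}\right)^2 + \varepsilon }.
\end{aligned}
\tag{9}
\]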
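Two building blocks of the scheme (9) can be sketched in Python: the regularized infinity-Laplacian quotient and the upwind selection used for the convection term. This is a minimal illustration under assumptions that are not stated in the text: unit grid spacing, central differences for the first and mixed derivatives, and a hypothetical default value for the regularization ε. The test grids are invented for the example.

```python
def infinity_laplacian_term(v, i, j, eps=1e-8):
    """Regularized second derivative of v in the gradient direction at node (i, j).

    Central differences on a unit grid; eps keeps the quotient defined where
    the discrete gradient vanishes.
    """
    vx = (v[i + 1][j] - v[i - 1][j]) / 2.0
    vy = (v[i][j + 1] - v[i][j - 1]) / 2.0
    vxx = v[i + 1][j] - 2.0 * v[i][j] + v[i - 1][j]
    vyy = v[i][j + 1] - 2.0 * v[i][j] + v[i][j - 1]
    vxy = (v[i + 1][j + 1] - v[i + 1][j - 1]
           - v[i - 1][j + 1] + v[i - 1][j - 1]) / 4.0
    num = vxx * vx * vx + 2.0 * vx * vy * vxy + vyy * vy * vy
    return num / (vx * vx + vy * vy + eps)


def upwind(d, back_diff, fwd_diff):
    """Upwind value of d * v_x: backward difference if d > 0, forward otherwise."""
    return max(d, 0.0) * back_diff + min(d, 0.0) * fwd_diff


# Hypothetical test grids: v[i][j] sampled at integer nodes (i, j).
plane = [[2.0 * i + 3.0 * j for j in range(5)] for i in range(5)]
parabola = [[float(i * i) for j in range(5)] for i in range(5)]
```

On the plane all second differences vanish, so the term is zero; on the parabola, whose gradient points along the first axis, the term is close to the second derivative in that direction.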
Fig. 1. On the left, depiction of the initial gradient vector field W = −Dg(||DI||), on the right, the obtained vector field with our proposed approach (μ = 0.05, Δt = 0.1)
Fig. 2. On the left, depiction of the initial gradient vector field W = −Dg(||DI||), on the right, the obtained vector field with our proposed approach (μ = 0.1, Δt = 0.1)
4.1 Numerical Experimentations of Extrapolation
The experiments have been performed on a 2.21 GHz Athlon with 1.00 GB of RAM. In all our experiments, Δx1 = Δx2 = 1. We apply our model to real data and, for each test, we provide a view of the initial gradient vector field −Dg(||DI||) and a view of the extrapolated vector field. The initialization was made either by setting v0 ≡ 0 or by setting v0 ≡ −g(||DI||). In all the tests we performed, this choice does not seem to influence the obtained result. The number of iterations as well as the computational time (of the order of one second) are similar for the three methods (GVF, NGVF and our proposed approach). Our method qualitatively performs in a way similar to the GVF and the NGVF: we increase the capture range of the vector field and we obtain downward components within the boundary concavity. Nevertheless, contrary to the GVF and NGVF models, the method requires only one unknown. We start with an image taken from the Image Toolbox of Matlab (Fig. 1), and with an image showing a slice of Tuffeau (Fig. 2, courtesy of ISTO/ESRF). Our proposed approach performs well but seems to be sensitive to the textures of the objects contained in the image.

Fig. 3. Steps of the segmentation of the synthetic image taken from [34]

Fig. 4. Steps of the segmentation of the image of the brain

4.2 Application to Segmentation
This part is dedicated to segmentation and, more precisely, to the integration of this extrapolated vector field in the geodesic active contour model, in order to alleviate the constraint on the choice of the initial condition. The geodesic active contour model, introduced by Caselles et al. in [16], is cast in the level set setting developed by Osher and Sethian in [33]. We propose, as done in [34], to replace W = −Dg(||DI||) of the geodesic active contour model by the extrapolated vector field obtained with our proposed approach. To illustrate this, we propose an example taken from [34]. It demonstrates that the initial condition can be made of several contours selected inside, outside or across the boundaries of interest, provided the initial curves contain part of the skeleton of the extrapolated vector field. The classical geodesic active contour model does not allow this flexibility in the initialization step, and therefore the method alone would fail to detect all the shapes. Of course, the proposed method cannot automatically detect interior contours, but this drawback is overcome, again thanks to the flexibility in the initialization step. We illustrate this remark with Fig. 4, which represents a slice of the brain (courtesy of the Laboratory Of Neuro Imaging, UCLA).
5 Conclusion
This paper was devoted to the theoretical study of a new method to extrapolate vector fields using the infinity Laplacian, with applications to image processing. Contrary to prior related works, the number of unknowns is reduced to a single one. The problem is phrased in a variational framework and the Euler-Lagrange equation is then derived. It is solved using a gradient descent method, which leads to a parabolic problem that falls within the viscosity solution theory framework. The existence and uniqueness of a viscosity solution that is continuous in space and time, Lipschitz continuous in space and uniformly continuous in time is established. The theoretical study is complemented by several numerical experiments, first dedicated to the extrapolation problem and then extended to the segmentation problem. The experiments show that the proposed approach performs well, even if in strong concavities the results are slightly less accurate than with the NGVF. The model is sensitive to the geometry of the boundaries and to the textures present in the images. In the segmentation framework, the introduction of this new force field widens the choice of the initial condition.
References 1. Aronsson, G.: Minimization problems for the functional supx F (x, f (x), f (x)). Arkiv für Mate. 6, 33–53 (1965) 2. Aronsson, G.: Minimization problems for the functional supx F (x, f (x), f (x)). II. Arkiv für Mate. 6, 409–431 (1966)
3. Aronsson, G.: Extension of functions satisfying Lipschitz conditions. Arkiv för Mat. 6(6), 551–561 (1967)
4. Aronsson, G.: On the partial differential equation $u_x^2 u_{xx} + 2 u_x u_y u_{xy} + u_y^2 u_{yy} = 0$. Arkiv för Mat. 7, 395–425 (1968)
5. Aronsson, G., Crandall, M., Juutinen, P.: A tour of the theory of absolutely minimizing functions. Bull. Amer. Math. Soc. (N.S.) 41, 439–505 (2004)
6. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Springer, Heidelberg (2002)
7. Barles, G.: Solutions de viscosité des équations de Hamilton-Jacobi. Springer, Heidelberg (1994)
8. Barles, G.: Solutions de viscosité et équations elliptiques du deuxième ordre. Cours de DEA (1997)
9. Barles, G., Busca, J.: Existence and comparison results for fully nonlinear degenerate elliptic equations without zeroth-order term. Comm. Partial Differential Equations 26, 2323–2337 (2001)
10. Barron, E.N., Evans, L.C., Jensen, R.: The infinity Laplacian, Aronsson's equation and their generalizations. Trans. Amer. Math. Soc. 360(1), 77–101 (2008)
11. Brézis, H.: Analyse fonctionnelle. Dunod (1999)
12. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986)
13. Carlsson, S.: Sketch based coding of grey level images. Signal Process. 15, 57–83 (1988)
14. Casas, J.R.: Image compression based on perceptual coding techniques. Ph.D. dissertation, Dept. Signal Theory Commun., UPC, Barcelona, Spain (1996)
15. Caselles, V., Catté, F., Coll, C., Dibos, F.: A geometric model for active contours in image processing. Numer. Math. 66, 1–31 (1993)
16. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. Int. J. Comput. Vision 22(1), 61–87 (1997)
17. Caselles, V., Morel, J.M., Sbert, C.: An Axiomatic Approach to Image Interpolation. IEEE Trans. Image Process. 7(3), 376–386 (1998)
18. Cohen, L.D.: On Active Contour Models and Balloons. CVGIP: Image Understanding 53(2), 211–218 (1989)
19. Cong, G., Esser, M., Parvin, B., Bebis, G.: Shape Metamorphism Using p-Laplacian Equation. In: ICPR, vol. 4, pp. 15–18 (2004)
20. Crandall, M.G.: A visit with the ∞-Laplace equation. Preprint, Notes from a CIME course (2005)
21. Crandall, M.G., Ishii, H., Lions, P.L.: User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27, 1–67 (1992)
22. Crandall, M.G., Lions, P.L.: Viscosity solutions of Hamilton-Jacobi Equations. Trans. Amer. Math. Soc. 277, 1–42 (1983)
23. Crandall, M.G., Lions, P.L.: On existence and uniqueness of solutions of Hamilton-Jacobi equations. Non-Linear Anal. 10, 353–370 (1986)
24. Elion, C., Vese, L.A.: An image decomposition model using the total variation and the infinity Laplacian. In: Proceedings SPIE, vol. 6498, pp. 64980W-1–64980W-10 (2007)
25. Forcadel, N.: Dislocations dynamics with a mean curvature term: short time existence and uniqueness. Differential and Integral Equations 21(3-4), 285–304 (2008)
26. Ishii, H.: Existence and uniqueness of solutions of Hamilton-Jacobi equations. Funkcial. Ekvac. 29, 167–188 (1986)
27. Ishii, H., Lions, P.L.: Viscosity solutions of fully nonlinear second-order elliptic partial differential equations. J. Differ. Equations 83, 26–78 (1990)
28. Jensen, R.: Uniqueness of Lipschitz extensions minimizing the sup-norm of the gradient. Arch. Rat. Mech. Anal. 123(1), 51–74 (1993)
29. Jifeng, N., Chengke, W., Shigang, L., Shuqin, Y.: NGVF: An improved external force field for active contour model. Pattern Recogn. Lett. 28, 58–63 (2007)
30. Kass, M., Terzopoulos, D., Witkin, A.: Snakes: Active contour models. Int. J. Comput. Vision 1, 321–331 (1988)
31. Mémoli, F., Sapiro, G., Thompson, P.: Brain and surface warping via minimizing Lipschitz extensions. In: MFCA, International Workshop on Mathematical Foundations of Computational Anatomy (2006)
32. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. AMS 22 (2001)
33. Osher, S., Sethian, J.A.: Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
34. Paragios, N., Mellina-Gottardo, O., Ramesh, V.: Gradient Vector Flow Fast Geodesic Active Contours. In: Proc. IEEE Intl. Conf. Computer Vision, vol. 1, pp. 67–73 (2001)
35. Prewitt, J.M.S.: Object enhancement and extraction. In: Lipkin, B., Rosenfeld, A. (eds.) Picture Processing and Psychopictorics, pp. 75–149. Academic Press, New York (1970)
36. Sethian, J.A.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision and Material Science. Cambridge University Press, Londres (1999)
37. Torre, V., Poggio, T.A.: On edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 147–163 (1986)
38. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 7(3), 359–369 (1998)
39. Yuille, A.L., Poggio, T.A.: Scaling theorems for zero-crossings. IEEE Trans. Pattern Anal. Mach. Intell. 8, 15–25 (1986)
A Schrödinger Equation for the Fast Computation of Approximate Euclidean Distance Functions
Karthik S. Gurumoorthy and Anand Rangarajan
Dept. of CISE, University of Florida, Gainesville, FL, USA
Abstract. Computational techniques adapted from classical mechanics and used in image analysis run the gamut from Lagrangian action principles to Hamilton-Jacobi field equations: witness the popularity of the fast marching and fast sweeping methods, which are essentially fast Hamilton-Jacobi solvers. In sharp contrast, there are very few applications of quantum mechanics inspired computational methods. Given the fact that most of classical mechanics can be obtained as a limiting case of quantum mechanics (as Planck's constant $\hbar$ tends to zero), this paucity of quantum mechanics inspired methods is surprising. In this work, we derive relationships between nonlinear Hamilton-Jacobi and linear Schrödinger equations for the Euclidean distance function problem (in 1D, 2D and 3D). We then solve the Schrödinger wave equation instead of the corresponding Hamilton-Jacobi equation. We show that the Schrödinger equation has a closed form solution and that this solution can be efficiently computed in $O(N \log N)$, $N$ being the number of grid points. The Euclidean distance can then be recovered from the wave function. Since the wave function is computed for a small but non-zero $\hbar$, the obtained Euclidean distance function is an approximation. We derive analytic bounds for the error of the approximation and experimentally compare the results of our approach with the exact Euclidean distance function on real and synthetic data.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 100–111, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Introduction
Image analysis [1,2] has a tradition of importing and adapting a host of classical physics based approaches including Lagrangian based variational principles and their associated Euler-Lagrange equations [3], Hamiltonian dynamics [4] and more recently Hamilton-Jacobi based methods [5]. Approaches in image analysis do not strictly adhere to the classical mechanics sequence [6] of i) first specifying a Lagrangian action principle, ii) deriving the corresponding Euler-Lagrange equation, iii) employing a Legendre transformation to convert the Lagrangian dynamics to a first-order Hamiltonian dynamics, and finally, iv) employing a canonical transformation to derive the Hamilton-Jacobi equation whose solution also yields a solution to the original variational problem. Instead, most research in image analysis uses a combination of one or more of these four approaches depending on the problem at hand. For example, in surface reconstruction [3], a popular approach consists of writing a variational form and then finding a solution using preconditioned conjugate gradient or quasi-Newton type iterative methods. While we notice a plethora of classical mechanics inspired techniques in image analysis, the same cannot be said about quantum mechanics. Despite the well known fact
that most of classical mechanics is a limiting case of quantum mechanics as Planck's constant $\hbar \to 0$ [7], there is very little application of quantum mechanical principles in image analysis problems. Rather than speculate on the reasons for this dearth of applications, we wish to point out that in this paper, we are primarily interested in exploiting a concrete relationship between the classical, non-linear Hamilton-Jacobi equation and the quantum, linear Schrödinger equation. We feel that focusing more narrowly on this relationship (which will become more obvious as we proceed) is more productive than dwelling on the more mysterious and specifically quantum mechanical issues of i) interpretation of the wave function, ii) role of probabilities and, iii) the problem of measurement. While these issues are certainly important, they do not play any role in this paper. In summary, we are mainly interested in exploiting the relationship between the Schrödinger and Hamilton-Jacobi equations in order to derive computationally efficient algorithms which are applicable in image analysis problems where Hamilton-Jacobi theory is used. In the theoretical physics literature, a solution of the Schrödinger wave equation at the energy state $E$ has the form $\psi(X, t) = \phi(X) \exp\left(\frac{iEt}{\hbar}\right)$, with $\hbar \equiv \frac{h}{2\pi}$ [8], where $\phi(X)$, the stationary state wave function, is the eigenstate of the Hamiltonian operator $H$ corresponding to the eigenvalue $E$. When the Hamilton-Jacobi scalar field $S^*$ appears as the exponent of the stationary state wave function, specifically $\phi(X) = \exp\left(\frac{-S^*(X)}{\hbar}\right)$, and if $\phi(X)$ satisfies the linear Schrödinger equation, namely $H\phi = E\phi$, we show that as $\hbar \to 0$, $S^*$ satisfies the Hamilton-Jacobi equation for a carefully chosen problem. The novel aspect is that a nonlinear Hamilton-Jacobi equation is obtained in the limit as $\hbar \to 0$ of a linear Schrödinger equation.
Consequently, instead of solving the Hamilton-Jacobi equation, one can solve the Schrödinger wave equation (taking advantage of its linearity), and then compute an approximate $S^*$ for small values of $\hbar$. This computational procedure would be approximately equivalent to solving the original Hamilton-Jacobi equation. With the basic setup in place, we now turn our attention to an actual application. Our goal is to apply the Schrödinger formalism to a well known problem that has been successfully attacked by Hamilton-Jacobi theory. To this end, we choose the problem of computing Euclidean distance functions on a grid for a given set of points, where the task is to assign at each grid point a value corresponding to the Euclidean distance to its nearest neighbor from the given point-set. The literature is replete with elegant and pioneering works which have successfully solved this problem. To name a few, the well known fast marching [9] and fast sweeping [10] methods are essentially $O(N \log N)$ Hamilton-Jacobi based algorithms where $N$ is the number of grid points. [The fast sweeping method has an added advantage in that the algorithm appears (empirically) to be $O(N)$; after a careful reading of [10], it is still unclear to us whether the fast sweeping method is formally, rather than empirically, $O(N)$.] These techniques focus on directly solving the non-linear eikonal equation [4], whereas our approach shows that one can instead solve a linear equation and obtain the solution to the non-linear eikonal equation in the limit as $\hbar \to 0$. The important connection between the Schrödinger wave equation and the Hamilton-Jacobi equation [7] is illuminated during the process. In the more traditional (computer science) algorithms literature, computational geometry inspired techniques like Voronoi diagrams and KD-Trees [11] have also solved this problem. However, computing Voronoi
diagrams or building data structures for KD-Trees in 3D and higher dimensions is expensive and the O(N log N ) complexity is not retained at higher dimensions [11]. Our technique (which as we shall see is based on application of the fast Fourier transform (FFT) [12]) is very simple and elegant and remains O(N log N ) irrespective of the spatial dimension. The basic question asked and answered in this paper is: Can we design a Schrödinger equation that computes an exponentiated Euclidean distance function on a grid? The distance function is obtained from the exponent of the Schrödinger wave function. A naïve approach to solve this Euclidean distance problem would be to visit every grid point and compute the Euclidean distance to all members of the point-set and pick the smallest distance. The complexity of this naïve approach is obviously related to the product of the number of grid points and the cardinality of the point-set. The fast sweeping and fast marching methods avoid this naïve complexity and as we shall see, so does the Schrödinger wave function approach.
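The naïve approach described above can be sketched in a few lines. This is only an illustration written for this text, not code from the paper: the function name `brute_force_distance` and the toy grid and point-set are hypothetical, and the cost is proportional to the number of grid points times the cardinality of the point-set.

```python
import math

def brute_force_distance(grid_points, point_set):
    """For each grid point, return the Euclidean distance to its nearest point in point_set."""
    return [min(math.dist(x, y) for y in point_set) for x in grid_points]

# Hypothetical toy data: a 5 x 5 integer grid and a point-set with two members.
grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
points = [(0, 0), (2, 2)]

distances = brute_force_distance(grid, points)
print(distances[grid.index((2, 0))])  # distance from (2, 0) to its nearest data point
```

Every grid point is compared against every member of the point-set, which is exactly the product complexity that the fast marching, fast sweeping and Schrödinger approaches are designed to avoid.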
2 Euclidean Distance Functions
We now describe the Euclidean distance function problem. Given a point-set $Y = \{Y_k \in \mathbb{R}^D, k \in \{1, \ldots, K\}\}$, where $D$ is the dimensionality of the point-set, and a set of equally spaced Cartesian grid points $X$, the Euclidean distance function problem requires us to assign
$$S^*(X) = \min_k \|X - Y_k\| \tag{1}$$
with the Euclidean norm used in (1). If efficient computation is a non-issue, this is a simple problem. We visit each grid point in the set $X$, compute the distance to every point $Y_k$, $k \in \{1, \ldots, K\}$, and assign $S^*(X)$ the minimum distance. If the number of grid points is $N$, the naïve complexity is $O(NKD)$.
2.1 Hamilton-Jacobi Formulation
The Hamilton-Jacobi equation approach to the Euclidean distance function problem stems from considering the following variational problem, which in 2D is
$$I[q] = \int_{t_0}^{t_1} L(q_1, q_2, \dot{q}_1, \dot{q}_2, t)\, dt \tag{2}$$
where the Lagrangian $L$ is defined as
$$L(q_1, q_2, \dot{q}_1, \dot{q}_2, t) \equiv \frac{1}{2}\left(\dot{q}_1^2 + \dot{q}_2^2\right). \tag{3}$$
Defining $p_i \equiv \frac{\partial L}{\partial \dot{q}_i}$ and applying a Legendre transformation [6] [by inverting $p_i = \frac{\partial L}{\partial \dot{q}_i}$ to get the function $\dot{q}_i(q_1, q_2, p_1, p_2, t)$], we define the Hamiltonian of the system as
$$H(q_1, q_2, p_1, p_2, t) \equiv p_1 \dot{q}_1 + p_2 \dot{q}_2 - L\big(q_1, q_2, \dot{q}_1(q_1, q_2, p_1, p_2, t), \dot{q}_2(q_1, q_2, p_1, p_2, t), t\big) \tag{4}$$
which for the Euclidean distance function problem turns out to be
$$H(q_1, q_2, p_1, p_2, t) = \frac{1}{2}\left(p_1^2 + p_2^2\right). \tag{5}$$
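For this particular Lagrangian the Legendre transformation can be carried out by hand. The intermediate steps below are filled in here for the reader and are not spelled out in the original text; they use only the definitions in (3) and (4).

```latex
\begin{align*}
p_i &= \frac{\partial L}{\partial \dot{q}_i} = \dot{q}_i
  \quad\Longrightarrow\quad \dot{q}_i = p_i, \qquad i = 1, 2, \\
H &= p_1 \dot{q}_1 + p_2 \dot{q}_2 - \tfrac{1}{2}\left(\dot{q}_1^2 + \dot{q}_2^2\right) \\
  &= p_1^2 + p_2^2 - \tfrac{1}{2}\left(p_1^2 + p_2^2\right)
   = \tfrac{1}{2}\left(p_1^2 + p_2^2\right).
\end{align*}
```

The momenta coincide with the velocities, so the Hamiltonian has the same quadratic form as the Lagrangian, with velocities replaced by momenta.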
The Hamilton-Jacobi equation is obtained via a canonical transformation [6] of the Hamiltonian. In the 2D case, it is
$$\frac{\partial S}{\partial t} + H\left(q_1, q_2, \frac{\partial S}{\partial q_1}, \frac{\partial S}{\partial q_2}, t\right) = 0 \tag{6}$$
where we have replaced the generalized momentum variables $p_i$ with
$$p_i = \frac{\partial S}{\partial q_i} = \frac{\partial L}{\partial \dot{q}_i}. \tag{7}$$
Since the Hamiltonian in (5) does not depend explicitly on time, equation (6) can be simplified to the static Hamilton-Jacobi equation. By separation of variables, we get
$$S(q_1, q_2, t) = S^*(q_1, q_2) - Et \tag{8}$$
where $E$ is the total energy of the system and $S^*(q_1, q_2)$ is called Hamilton's characteristic function [13]. Using the definition of $p_i$ from (7) in (5) and observing that $\frac{\partial S}{\partial q_i} = \frac{\partial S^*}{\partial q_i}$, we get
$$\frac{1}{2}\left[\left(\frac{\partial S^*}{\partial q_1}\right)^2 + \left(\frac{\partial S^*}{\partial q_2}\right)^2\right] = E. \tag{9}$$
Choosing the energy $E$ to be $\frac{1}{2}$, we obtain
$$\|\nabla S^*\| = 1 \tag{10}$$
which is the well known eikonal equation [4] where the forcing term is 1. $S^*$ is the required Hamilton-Jacobi scalar field which is efficiently obtained by the fast sweeping and fast marching methods.
2.2 Schrödinger Wave Equation for Euclidean Distance Functions
We now derive and solve a Schrödinger wave equation for this Euclidean distance function problem instead of solving the non-linear eikonal equation. The Schrödinger wave equation is written as [8]
$$i\hbar \frac{\partial \psi}{\partial t} = H\psi \tag{11}$$
where $\psi(X, t)$ is the wave function and $H$ is the Hamiltonian operator obtained by first quantization, where the momentum variables $p_i$ are replaced with the operator $\frac{\hbar}{i}\frac{\partial}{\partial x_i}$. (First quantization is still mysterious. For an informal but illuminating treatment, please see http://math.ucr.edu/home/baez/categories.html) Using the definition of our Hamiltonian from (5), $\psi$ satisfies (in 2D)
$$i\hbar \frac{\partial \psi}{\partial t} = \frac{\hbar^2}{2}\left(\frac{\partial^2 \psi}{\partial x_1^2} + \frac{\partial^2 \psi}{\partial x_2^2}\right). \tag{12}$$
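Using separation of variables $\psi(X, t) = \phi(X) f(t)$, we get
$$i\hbar \frac{\dot{f}}{f} = \frac{\hbar^2}{2}\frac{\nabla^2 \phi}{\phi} = E \tag{13}$$
where $E$ is the energy state of the system. By choosing the energy of the system to be $\frac{1}{2}$ as before, we get
$$f(t) = \exp\left(\frac{it}{2\hbar}\right) \tag{14}$$
and hence
$$\psi(X, t) = \phi(X) \exp\left(\frac{it}{2\hbar}\right) \tag{15}$$
where $\phi(X)$ satisfies the equation
$$\hbar^2 \nabla^2 \phi = \phi. \tag{16}$$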
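2.3 Eikonal Equation for Euclidean Distance Functions
We now show that when $\phi = \exp\left(\frac{-S^*}{\hbar}\right)$ and satisfies (16), $S^*$ asymptotically satisfies the eikonal equation (10) as $\hbar \to 0$. We show this for the 2D case but the generalization to higher dimensions is straightforward. When $\phi(x_1, x_2) = \exp\left(\frac{-S^*(x_1, x_2)}{\hbar}\right)$, the first partials of $\phi$ are
$$\frac{\partial \phi}{\partial x_1} = -\frac{1}{\hbar}\exp\left(\frac{-S^*}{\hbar}\right)\frac{\partial S^*}{\partial x_1}, \qquad \frac{\partial \phi}{\partial x_2} = -\frac{1}{\hbar}\exp\left(\frac{-S^*}{\hbar}\right)\frac{\partial S^*}{\partial x_2}. \tag{17}$$
The second partials needed for the Laplacian are
$$\frac{\partial^2 \phi}{\partial x_1^2} = \frac{1}{\hbar^2}\exp\left(\frac{-S^*}{\hbar}\right)\left(\frac{\partial S^*}{\partial x_1}\right)^2 - \frac{1}{\hbar}\exp\left(\frac{-S^*}{\hbar}\right)\frac{\partial^2 S^*}{\partial x_1^2},$$
$$\frac{\partial^2 \phi}{\partial x_2^2} = \frac{1}{\hbar^2}\exp\left(\frac{-S^*}{\hbar}\right)\left(\frac{\partial S^*}{\partial x_2}\right)^2 - \frac{1}{\hbar}\exp\left(\frac{-S^*}{\hbar}\right)\frac{\partial^2 S^*}{\partial x_2^2}. \tag{18}$$
From this, equation (16) can be rewritten as
$$\left(\frac{\partial S^*}{\partial x_1}\right)^2 + \left(\frac{\partial S^*}{\partial x_2}\right)^2 - \hbar\left(\frac{\partial^2 S^*}{\partial x_1^2} + \frac{\partial^2 S^*}{\partial x_2^2}\right) = 1 \tag{19}$$
which in simplified form is
$$\|\nabla S^*\|^2 - \hbar \nabla^2 S^* = 1. \tag{20}$$
The additional $\hbar \nabla^2 S^*$ term [relative to (10)] is referred to as the viscosity term [9]. (Note that this term emerges naturally from the Schrödinger equation derivation, an intriguing result.) Since $|\nabla^2 S^*|$ is bounded, as $\hbar \to 0$, (20) tends to
$$\|\nabla S^*\| = 1 \tag{21}$$
which is the original eikonal equation (10) for the Euclidean distance function. This relationship motivates us to solve the linear Schrödinger equation (16) instead of the non-linear eikonal equation and then compute the distance function via
$$S^*(X) = -\hbar \log \phi(X). \tag{22}$$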
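The relationship between (16), (20) and (22) can be checked numerically in 1D. The sketch below is an illustration added here, under stated assumptions: it uses the test function φ(x) = Σ_k exp(−|x − y_k|/ħ), which satisfies ħ²φ'' = φ away from the points y_k, and estimates the derivatives of S* = −ħ log φ by central finite differences. The function names, the source locations and the step size are hypothetical choices.

```python
import math

def s_star(x, sources, hbar):
    """S*(x) = -hbar * log(phi(x)) with phi(x) = sum_k exp(-|x - y_k| / hbar), in 1D."""
    phi = sum(math.exp(-abs(x - y) / hbar) for y in sources)
    return -hbar * math.log(phi)

def slope(x, sources, hbar, step=1e-4):
    """Central-difference estimate of dS*/dx at x."""
    return (s_star(x + step, sources, hbar) - s_star(x - step, sources, hbar)) / (2 * step)

def viscous_residual(x, sources, hbar, step=1e-4):
    """Finite-difference estimate of (dS*/dx)^2 - hbar * d2S*/dx2 - 1, i.e. the residual of (20)."""
    s_minus = s_star(x - step, sources, hbar)
    s_zero = s_star(x, sources, hbar)
    s_plus = s_star(x + step, sources, hbar)
    second = (s_plus - 2.0 * s_zero + s_minus) / step**2
    return slope(x, sources, hbar, step) ** 2 - hbar * second - 1.0

sources = [0.0, 1.0]  # hypothetical 1D point-set
for hbar in (0.2, 0.1, 0.05):
    res = viscous_residual(0.4, sources, hbar)
    grad = abs(slope(0.4, sources, hbar))
    print(f"hbar={hbar}: residual of (20) = {res:.2e}, |dS*/dx| = {grad:.4f}")
```

The residual of (20) stays near zero for every ħ, while the gradient magnitude only approaches 1 as ħ decreases, which is the sense in which the eikonal equation is recovered in the limit.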
3 Closed Form Solutions for the Approximate Euclidean Distance Function and Proofs of Convergence
We now derive the closed form solution for $\phi(X)$ (in 1D, 2D and 3D) satisfying equation (16), and hence for $S^*(X)$ by (22), and observe that we get the actual Euclidean distance function in the limit as $\hbar \to 0$. In order to satisfy the condition that $S^*(Y_k) = 0$, $\forall Y_k$, $k \in \{1, \ldots, K\}$, we consider the forced version of equation (16), which is
$$-\hbar^2 \nabla^2 \phi + \phi = \sum_{k=1}^{K} \delta(X - Y_k). \tag{23}$$
Using a Green's function approach [14] (where the form of the solution depends on the number of spatial dimensions), we can write expressions for the solution $\phi$. Below, let $r = \min_k \|X - Y_k\|$, the actual Euclidean distance function at the grid point $X$.
1D: In 1D, the solution [14] for $\phi$ is
$$\phi(X) = \frac{1}{2\hbar}\sum_{k=1}^{K} \exp\left(\frac{-|X - Y_k|}{\hbar}\right). \tag{24}$$
Using the relationship in (22), we get
$$S^*(X) = -\hbar \log \sum_{k=1}^{K} \exp\left(\frac{-|X - Y_k|}{\hbar}\right) + \hbar \log(2\hbar). \tag{25}$$
Observe that
$$S^*(X) \le -\hbar \log \exp\left(\frac{-r}{\hbar}\right) + \hbar \log(2\hbar) = r + \hbar \log(2\hbar). \tag{26}$$
Also,
$$S^*(X) \ge -\hbar \log\left(K \exp\left(\frac{-r}{\hbar}\right)\right) + \hbar \log(2\hbar) = -\hbar \log K + r + \hbar \log(2\hbar). \tag{27}$$
As $\hbar \to 0$, $\hbar \log K \to 0$ and $\hbar \log \hbar \to 0$. Furthermore, we see from (26) and (27) that
$$\lim_{\hbar \to 0} S^*(X) = r. \tag{28}$$
2D: In 2D, the solution [14] for $\phi$ is
$$\phi(X) = \frac{1}{2\pi\hbar^2}\sum_{k=1}^{K} K_0\left(\frac{\|X - Y_k\|}{\hbar}\right) \tag{29}$$
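where $K_0$ is the modified Bessel function of the second kind. Using (22), we get
$$S^*(X) = -\hbar \log \sum_{k=1}^{K} K_0\left(\frac{\|X - Y_k\|}{\hbar}\right) + \hbar \log(2\pi\hbar^2). \tag{30}$$
Then,
$$S^*(X) \le -\hbar \log K_0\left(\frac{r}{\hbar}\right) + \hbar \log(2\pi\hbar^2). \tag{31}$$
Using the relation $K_0\left(\frac{r}{\hbar}\right) \ge \frac{\exp(-r/\hbar)}{\sqrt{r/\hbar}}$ when $\frac{r}{\hbar} \ge 0.5$, we get
$$S^*(X) \le -\hbar \log\left(\frac{\exp(-r/\hbar)}{\sqrt{r/\hbar}}\right) + \hbar \log(2\pi\hbar^2) = \hbar \log\sqrt{\frac{r}{\hbar}} + r + \hbar \log(2\pi\hbar^2). \tag{32}$$
Moreover,
$$S^*(X) \ge -\hbar \log\left(K\, K_0\left(\frac{r}{\hbar}\right)\right) + \hbar \log(2\pi\hbar^2). \tag{33}$$
Using the relation $K_0\left(\frac{r}{\hbar}\right) \le \exp\left(\frac{-r}{\hbar}\right)$ when $\frac{r}{\hbar} \ge 1.5$, we get
$$S^*(X) \ge -\hbar \log\left(K \exp\left(\frac{-r}{\hbar}\right)\right) + \hbar \log(2\pi\hbar^2) = -\hbar \log K + r + \hbar \log(2\pi\hbar^2). \tag{34}$$
As $\hbar \to 0$, $\hbar \log K \to 0$, $\hbar \log r \to 0$ and $\hbar \log \hbar \to 0$. Furthermore, we see from (32) and (34) that
$$\lim_{\hbar \to 0} S^*(X) = r. \tag{35}$$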
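3D: In 3D, the solution [14] for $\phi$ is based on the modified spherical Bessel function of the second kind:
$$\phi(X) = \frac{1}{4\pi\hbar^2}\sum_{k=1}^{K} \frac{\exp\left(\frac{-\|X - Y_k\|}{\hbar}\right)}{\|X - Y_k\|}. \tag{36}$$
Using (22),
$$S^*(X) = -\hbar \log \sum_{k=1}^{K} \frac{\exp\left(\frac{-\|X - Y_k\|}{\hbar}\right)}{\|X - Y_k\|} + \hbar \log(4\pi\hbar^2). \tag{37}$$
Then,
$$S^*(X) \le -\hbar \log\left(\frac{\exp\left(\frac{-r}{\hbar}\right)}{r}\right) + \hbar \log(4\pi\hbar^2) = r + \hbar \log r + \hbar \log(4\pi\hbar^2). \tag{38}$$
Also,
$$S^*(X) \ge -\hbar \log\left(K\,\frac{\exp\left(\frac{-r}{\hbar}\right)}{r}\right) + \hbar \log(4\pi\hbar^2) = -\hbar \log K + r + \hbar \log r + \hbar \log(4\pi\hbar^2). \tag{39}$$
As $\hbar \to 0$, $\hbar \log K \to 0$, $\hbar \log r \to 0$ and $\hbar \log \hbar \to 0$. Furthermore, we see from (38) and (39) that
$$\lim_{\hbar \to 0} S^*(X) = r. \tag{40}$$
Hence, we have shown that (in 1D, 2D and 3D), the closed form solution for $\phi$ guarantees that $S^*$ approaches the true Euclidean distance function in the limit $\hbar \to 0$.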
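The 1D case is simple enough to evaluate directly. The sketch below, added here for illustration, computes the closed-form S*(X) of (25) together with the lower and upper bounds (27) and (26). The function names and the point-set are hypothetical, and factoring exp(−r/ħ) out of the sum is an implementation choice made here to avoid underflow, not a step taken in the paper.

```python
import math

def s_star_1d(x, sources, hbar):
    """Closed-form S*(x) of (25), evaluated after factoring exp(-r/hbar) out of the sum."""
    r = min(abs(x - y) for y in sources)
    # Each remaining term is <= 1 and the nearest source contributes exactly 1.
    tail = sum(math.exp(-(abs(x - y) - r) / hbar) for y in sources)
    return r - hbar * math.log(tail) + hbar * math.log(2.0 * hbar)

def bounds_1d(x, sources, hbar):
    """Lower bound (27) and upper bound (26) on S*(x)."""
    r = min(abs(x - y) for y in sources)
    upper = r + hbar * math.log(2.0 * hbar)
    lower = upper - hbar * math.log(len(sources))
    return lower, upper

sources = [-3.0, 0.5, 4.0]  # hypothetical 1D point-set; true distance at x = 1.25 is 0.75
for hbar in (0.5, 0.1, 0.01):
    value = s_star_1d(1.25, sources, hbar)
    lower, upper = bounds_1d(1.25, sources, hbar)
    print(f"hbar={hbar}: {lower:.4f} <= {value:.4f} <= {upper:.4f}")
```

Both bounds close in on the true distance as ħ shrinks, at the rate set by the ħ log K and ħ log(2ħ) terms.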
4 Error Bound between the Obtained and True Euclidean Distance Function
The solution for $\phi$ in 1D, 2D and 3D motivates us to compute the function
$$\tilde{\phi}(X) = \sum_{k=1}^{K} \exp\left(\frac{-\|X - Y_k\|}{\hbar}\right) \tag{41}$$
(instead of computing $\phi$) and then to compute the approximate distance function
$$\tilde{S}^*(X) = -\hbar \log \tilde{\phi}(X) \tag{42}$$
(instead of computing $S^*$). The reasons are two-fold. Firstly, $\tilde{\phi}$ can be computed efficiently in $O(N \log N)$ time using the fast Fourier transform (FFT) [12], as explained in the subsequent section. Secondly, $\lim_{\hbar \to 0} \tilde{S}^*(X) = r$, as shown below, where $r$ is the true Euclidean distance function value at the grid point $X$ ($r = \min_k \|X - Y_k\|$), and this is also $\lim_{\hbar \to 0} S^*(X)$, as seen from the previous section. Hence, for small values of $\hbar$, $\tilde{S}^*(X)$ is a very good approximation to $S^*(X)$.
As $\hbar \to 0$, $\sum_{k=1}^{K} \exp\left(\frac{-\|X - Y_k\|}{\hbar}\right)$ can be approximated as $\exp\left(\frac{-r}{\hbar}\right)$. Hence $\tilde{S}^*(X) \approx -\hbar \log \exp\left(\frac{-r}{\hbar}\right) = r$. The bound derived below between $\tilde{S}^*(X)$ and $r$ also unveils the proximity between the computed and the actual Euclidean distance function. Note from (41) that
$$\tilde{S}^*(X) \le -\hbar \log \exp\left(\frac{-r}{\hbar}\right) = r. \tag{43}$$
Also, observe that
$$\tilde{S}^*(X) \ge -\hbar \log\left(K \exp\left(\frac{-r}{\hbar}\right)\right) = -\hbar \log K + r \tag{44}$$
and hence,
$$r - \tilde{S}^*(X) \le \hbar \log K. \tag{45}$$
From (43) and (45),
$$|r - \tilde{S}^*(X)| \le \hbar \log K. \tag{46}$$
Equation (46) shows that as $\hbar \to 0$, $\tilde{S}^*(X) \to r$. It is worth commenting that the bound $\hbar \log K$ is actually very tight as (i) it scales only as the logarithm of the cardinality of the point-set ($K$) and (ii) it can be made arbitrarily small by choosing a small but non-zero value of $\hbar$.
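The bound (46) is easy to probe empirically. The following sketch is an illustration added here, not the paper's experiment: it evaluates the sum in (41) directly (no FFT), uses a hypothetical random 2D point-set, and records the largest observed gap r − S̃* for comparison with ħ log K. The helper name `approx_distance` and all numerical settings are assumptions.

```python
import math
import random

def approx_distance(x, points, hbar):
    """Return (S~*(x), r): the approximation (42) by direct summation and the true distance r."""
    dists = [math.dist(x, y) for y in points]
    r = min(dists)
    # Factor exp(-r/hbar) out of the sum in (41) so that every term lies in (0, 1].
    tail = sum(math.exp(-(d - r) / hbar) for d in dists)
    return r - hbar * math.log(tail), r

random.seed(0)  # hypothetical random point-set in the square [-30, 30]^2
points = [(random.uniform(-30, 30), random.uniform(-30, 30)) for _ in range(100)]
hbar = 0.1

worst_gap = 0.0
for _ in range(1000):
    x = (random.uniform(-30, 30), random.uniform(-30, 30))
    s_tilde, r = approx_distance(x, points, hbar)
    worst_gap = max(worst_gap, r - s_tilde)

print("largest observed r - S~*:", worst_gap)
print("bound hbar * log K:      ", hbar * math.log(len(points)))
```

The gap is largest where many data points are nearly equidistant from the query location, since only then do several terms of the sum contribute comparably.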
5 Efficient Computation of the Approximate Euclidean Distance Function
The motivation for computing $\tilde{\phi}$ instead of $\phi$ is the fact that the direct computation of $\phi$ at the $N$ grid locations is $O(NK)$, which is $O(N^2)$ when the cardinality of the point-set is $O(N)$, whereas computing $\tilde{\phi}$ at the $N$ grid locations can be done in $O(N \log N)$ using an FFT implementation [12]. The realization that the FFT can be employed to compute $\tilde{\phi}$ stems from the insight that the summation term, namely $\sum_{k=1}^{K} \exp\left(\frac{-\|X - Y_k\|}{\hbar}\right)$, is actually the discrete convolution between the function $f(X) = \exp\left(\frac{-\|X\|}{\hbar}\right)$ computed at the grid locations and the function $g(X)$ which takes the value 1 at the point-set locations and 0 at other grid locations. By the convolution theorem [15], a discrete convolution can be obtained as the inverse Fourier transform of the product of two individual transforms, which for two $O(N)$ sequences can be computed in $O(N \log N)$ time, and hence $\tilde{\phi}$ can be determined efficiently at the $N$ grid locations. One just needs to compute the discrete Fourier transform (DFT) of the sampled versions of the functions $f(X)$ and $g(X)$, compute their point-wise product and then compute the inverse discrete Fourier transform. Taking the logarithm of the inverse discrete Fourier transform and multiplying it by $(-\hbar)$ gives the approximate Euclidean distance function. The algorithm is outlined in Table 1.
Table 1. Approximate Euclidean distance function algorithm
1. Compute the function $f(X) = \exp\left(\frac{-\|X\|}{\hbar}\right)$ at the grid locations.
2. Define the function $g(X)$ which takes the value 1 at the point-set locations and 0 at other grid locations.
3. Compute the FFT of $f$ and $g$, namely $F(u)$ and $G(u)$ respectively.
4. Compute the function $H(u) = F(u)\,G(u)$ (point-wise product).
5. Compute the inverse FFT of $H$, which gives $\tilde{\phi}(X)$ at the grid locations.
6. Take the logarithm of $\tilde{\phi}(X)$ and multiply it by $(-\hbar)$ to get the approximate Euclidean distance function at the grid locations.
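The steps of Table 1 can be sketched with NumPy's FFT routines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `approx_edt_fft` is hypothetical, the grid is assumed to have unit spacing, the zero-padding to 2n − 1 samples per axis (so that the circular convolution produced by the FFT matches the linear one on the grid) is a choice made here, and so is the clamping of non-positive round-off values before the logarithm.

```python
import numpy as np

def approx_edt_fft(indicator, hbar):
    """Approximate Euclidean distance function on a unit-spaced grid, following Table 1.

    indicator : array with 1 at the point-set locations and 0 elsewhere (the function g).
    hbar      : small positive constant; the error bound in (46) is hbar * log(K).
    """
    shape = indicator.shape
    all_axes = tuple(range(indicator.ndim))
    # Assumption of this sketch: pad every axis to 2n - 1 samples to avoid wrap-around.
    padded = tuple(2 * n - 1 for n in shape)

    # Step 1: f(X) = exp(-||X|| / hbar) at all offsets -(n-1), ..., n-1 along each axis.
    offsets = np.meshgrid(*[np.arange(-(n - 1), n, dtype=float) for n in shape], indexing="ij")
    f = np.exp(-np.sqrt(sum(o**2 for o in offsets)) / hbar)

    # Steps 2-3: g is the indicator array; transform f and the zero-padded g.
    F = np.fft.rfftn(f, s=padded, axes=all_axes)
    G = np.fft.rfftn(indicator.astype(float), s=padded, axes=all_axes)

    # Steps 4-5: point-wise product followed by the inverse FFT.
    conv = np.fft.irfftn(F * G, s=padded, axes=all_axes)

    # Keep the part of the convolution that corresponds to the original grid.
    phi = conv[tuple(slice(n - 1, 2 * n - 1) for n in shape)]

    # Round-off can leave tiny non-positive values far from the point-set; clamp them.
    phi = np.maximum(phi, np.finfo(float).tiny)

    # Step 6: approximate distance is -hbar * log(phi~).
    return -hbar * np.log(phi)

# Hypothetical example: five data points on a 16 x 16 grid.
g = np.zeros((16, 16))
for p in [(4, 4), (4, 12), (12, 4), (12, 12), (8, 8)]:
    g[p] = 1.0
S = approx_edt_fft(g, hbar=0.5)
print(S[0, 0], np.hypot(4, 4))  # approximate versus true distance at the corner (0, 0)
```

Because the function only uses n-dimensional transforms, the same code applies unchanged to 1D or 3D indicator arrays. For small ħ the kernel values far from the point-set fall below the round-off level of the FFT, so the result is only meaningful where exp(−r/ħ) is well above machine precision.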
5.1 Computation of the Approximate Euclidean Distance Function in Higher Dimensions
Our technique has a straightforward generalization to higher dimensions. Regardless of the spatial dimension, the approximate Euclidean distance function $\tilde{S}^*$ can be computed by exactly following the steps delineated in the table above. It is worthwhile mentioning that computing the discrete Fourier transform using the FFT is always $O(N \log N)$
irrespective of the spatial dimension. Hence, for all dimensions, $\tilde{S}^*$ can be computed at the given $N$ grid points in $O(N \log N)$. This speaks for the scalability of our technique, which is generally not the case with other methods, for example KD-Trees [11].
6 Experiments
In this section, we show the efficacy of our technique by computing the approximate Euclidean distance function $\tilde{S}^*$ and comparing it to the actual Euclidean distance function $S$, first on randomly generated 2D point-sets and then on a set of bounded 3D grid points. We began with a 2D grid consisting of points between (−30, −30) and (30, 30). Hence, the total number of grid points is N = 61 × 61 = 3721. We randomly chose around 1000 grid locations as data points (point-set). Then 50,000 experiments were run for values of $\hbar$ ranging from 0.1 to 0.5 in steps of 0.01. The errorbar plot in Figure 1 shows the mean and standard deviation of the percentage error at each value of $\hbar$. The error is less than 0.5% at $\hbar = 0.1$, demonstrating the algorithm's ability to compute accurate Euclidean distances. Next, we took the Stanford bunny dataset (available at http://www.cc.gatech.edu/projects/large_models/bunny.html) and used the coordinates of the data points on the model as our point-set locations. Since the input data locations need not be at integer locations, we scaled the space uniformly in all dimensions and rounded off the data so that the data lies at integer locations. The input data was also shifted so that it was approximately symmetrically located with respect to the x, y and z axes. We should comment that shifting the data does not affect the Euclidean distance function value, and uniform scaling of all dimensions is also not an issue, as the distances can be rescaled once they are computed. After these basic data manipulations, the cardinality of the point set was K = 3019, with the data confined to the cuboid region −16 ≤ x ≤ 16, −15 ≤ y ≤ 15 and −12 ≤ z ≤ 12. Our grid consisted of the set of all integer locations within this region. The number of grid locations was N = 25575. We computed the Euclidean distance function value at each of these grid locations using our technique for different values of $\hbar$ and compared it to the true Euclidean distance function value.
At small values of $\hbar$, $\exp\left(\frac{-\|X\|}{\hbar}\right)$ drops off very quickly, and hence for grid locations which are far away from the point-set, the convolution done using the FFT needs to be precise (without round-off error) for the computed distance to be meaningful. Such high precision support may not be available, and hence our technique may produce erroneous results at these grid locations for very small values of $\hbar$. But at those grid locations which are close to the point-set, the accuracy of the computed distance improves as $\hbar$ is decreased. Hence, to circumvent this problem of choosing $\hbar$, we ran our technique for different values of $\hbar$ and chose the distance function values obtained at large values of $\hbar$ at those grid locations whose average computed distance is larger, and vice versa.
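One way to read the selection strategy just described is sketched below. The paper does not spell out the exact rule, so this is an assumed interpretation: the estimates obtained with each value of ħ are averaged per grid location, and increasing thresholds on that average decide which ħ's estimate is kept (smallest ħ for the smallest averages). The function name `select_by_average` and the toy numbers are hypothetical.

```python
from bisect import bisect_right

def select_by_average(estimates, thresholds):
    """Pick one distance estimate per grid location.

    estimates  : list of lists ordered from smallest to largest hbar;
                 estimates[j][i] is the distance computed at location i with the j-th hbar.
    thresholds : increasing cut-offs on the average computed distance,
                 one fewer than the number of hbar values.
    """
    n_hbar = len(estimates)
    chosen = []
    for i in range(len(estimates[0])):
        average = sum(estimates[j][i] for j in range(n_hbar)) / n_hbar
        # Index of the hbar whose range of averages contains this location.
        j = bisect_right(thresholds, average)
        chosen.append(estimates[j][i])
    return chosen

# Hypothetical estimates at four grid locations for four increasing hbar values.
estimates = [
    [1.0, 4.9, 7.7, 30.0],   # smallest hbar: accurate nearby, erroneous far away
    [0.95, 5.0, 8.0, 12.1],
    [0.9, 4.9, 8.0, 12.0],
    [0.8, 4.8, 7.9, 11.9],   # largest hbar: coarser nearby, usable far away
]
print(select_by_average(estimates, thresholds=[3, 6, 10]))
```

With three thresholds and four values of ħ, locations whose average falls below the first threshold take the smallest-ħ estimate and locations above the last threshold take the largest-ħ estimate.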
[Figure 1: errorbar plot of percentage error (vertical axis, 0 to 20) against $\hbar$ values (horizontal axis, 0.1 to 0.5).]
Fig. 1. Percentage error versus $\hbar$ in 50,000 2D experiments
Fig. 2. Isosurfaces: (i) Left: Actual Euclidean distance function and (ii) Right: Our approach
When we ran our technique for the set $\hbar \in \{0.1, 0.2, 0.3, 0.4\}$ and used {3, 6, 10} respectively as the thresholds of the average computed distance for choosing the appropriate distance function, we obtained the following results. The maximum absolute difference between the actual and the computed Euclidean distance value over all the grid locations is 0.9066 and the average absolute difference is 0.1322. The accuracy is fairly high, since the furthest grid point from the point-set is at a distance of 15.5242 and the average overall distance computed at the grid locations from the point-set is 3.5449. The average error is (0.1322 / 3.5449) × 100 = 3.72%. This error can be lowered by using higher precision numerical methods for convolution [16]. We plotted the isosurface obtained by connecting the grid points which are at a distance of 0.5 from the point set, determined both by the true Euclidean distance function and by our technique. Figure 2 shows the two surfaces. Notice the similarity between the two plots. It provides anecdotal visual evidence for the usefulness of our approach.
7 Discussion In this paper, we have introduced a new approach to solving the non-linear eikonal equation (with a constant forcing term equal to 1). We have proved that the solution of the eikonal equation can be obtained as a limiting case of the solution to the corresponding linear Schrödinger wave equation. The key here is the embedding of the nonlinear Hamilton-Jacobi equation in a linear Schrödinger equation. Our Schrödinger wave equation formalism for solving the Euclidean distance function problem (which has been successfully attacked by pioneering Hamilton-Jacobi solvers such as the fast sweeping [10] and fast marching [9] methods) leverages this deep relationship between the two regimes of modern physics. In the future, we would like to solve the more
general, static Hamilton-Jacobi equation using techniques inspired by quantum mechanics as a counterpart to classical mechanics based techniques. In all likelihood, this will involve direct discretization of the Schrödinger wave equation, which was not required for the Euclidean distance function problem. We expect that the linearity of the Schrödinger equation will result in fast algorithms even in this more general setting.
References
1. Horn, B.K.P.: Robot vision. MIT Press, Cambridge (1986)
2. Kimmel, R.: Numerical geometry of images: Theory, algorithms, and applications. Springer, Heidelberg (2003)
3. Grimson, W.E.L.: An implementation of a computational theory of visual surface interpolation. Computer Vision, Graphics, and Image Processing 22(1), 39–69 (1983)
4. Siddiqi, K., Tannenbaum, A., Zucker, S.W.: A Hamiltonian approach to the eikonal equation. In: Hancock, E.R., Pelillo, M. (eds.) EMMCVPR 1999. LNCS, vol. 1654, pp. 1–13. Springer, Heidelberg (1999)
5. Kao, C.Y., Osher, S.J., Tsai, Y.H.: Fast sweeping methods for static Hamilton-Jacobi equations. SIAM Journal on Numerical Analysis 42(6), 2612–2632 (2004)
6. Goldstein, H., Poole, C.P., Safko, J.L.: Classical mechanics. Addison-Wesley, Reading (2002)
7. Butterfield, J.: On Hamilton-Jacobi theory as a classical root of quantum theory. In: Elitzur, A., Dolev, S., Kolenda, N. (eds.) Quo-Vadis Quantum Mechanics, pp. 239–274. Springer, Heidelberg (2005)
8. Griffiths, D.J.: Introduction to quantum mechanics. Addison-Wesley, Reading (2004)
9. Osher, S.J., Sethian, J.A.: Fronts propagating with curvature dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79(1), 12–49 (1988)
10. Zhao, H.K.: A fast sweeping method for eikonal equations. Mathematics of Computation 74, 603–627 (2005)
11. de Berg, M., Cheong, O., van Kreveld, M., Overmars, M.: Computational geometry: Algorithms and applications. Springer, Heidelberg (2008)
12. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation 19(90), 297–301 (1965)
13. Arnold, V.I.: Mathematical methods of classical mechanics. Springer, Heidelberg (1989)
14. Abramowitz, M., Stegun, I.A.: Handbook of mathematical functions with formulas, graphs, and mathematical tables. Government Printing Office, USA (1964)
15. Bracewell, R.N.: The Fourier transform and its applications. McGraw-Hill Science and Engineering, New York (1999)
16. Hida, Y., Li, H.S., Bailey, D.H.: Quad-double arithmetic: Algorithms, implementation, and application. Technical Report LBNL-46996, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (2000)
Semi-supervised Segmentation Based on Non-local Continuous Min-Cut
Nawal Houhou (1), Xavier Bresson (2), Arthur Szlam (2), Tony F. Chan (2), and Jean-Philippe Thiran (1)
(1) Signal Processing Laboratory 5, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
(2) Department of Mathematics, University of California Los Angeles, CA 90095-1555, USA
Abstract. We propose a semi-supervised image segmentation method that relies on a non-local continuous version of the min-cut algorithm and labels or seeds provided by a user. The segmentation process is performed via energy minimization. The proposed energy is composed of three terms. The first term defines labels or seed points assigned to objects that the user wants to identify and the background. The second term carries out the diffusion of object and background labels and stops the diffusion when the interface between the object and the background is reached. The diffusion process is performed on a graph defined from image intensity patches. The graph of intensity patches is known to deal better with textures because this graph uses semi-local and non-local image information. The last term is the standard TV term that regularizes the geometry of the interface. We introduce an iterative scheme that provides a unique minimizer. Promising results are presented on synthetic textures and real-world images.
1 Introduction
Image segmentation is an important problem in image processing. The objective of segmentation algorithms is to partition an image into a finite number of semantically important regions, such as anatomical or functional structures in medical images or objects in natural images. Energy minimization methods are well-posed approaches to the image segmentation problem. This paper introduces an energy minimization algorithm to solve the semi-supervised segmentation problem based on the continuous min-cut/max-flow model originally defined by Strang in [1].
Semi-supervised segmentation models defined in a continuous setting have already been proposed in the literature. Among them, Protiere and Sapiro proposed in [2] an interactive algorithm for segmentation. Cremers et al. introduced in [3] an algorithm based on the level set method to perform interactive image segmentation. Appleton and Talbot introduced in [4] a semi-supervised segmentation model based on the continuous min-cut of Strang. Unger et al. defined in [5] a segmentation method also based on the min-cut model of Strang in [1]. The semi-supervised segmentation models using the continuous min-cut are based on local image information. These models, such as [4,5], perform very well for the segmentation of smooth regions, but they are less effective on textures.
In this paper, we extend the continuous min-cut to a non-local formulation along the same line as the non-local means defined by Buades, Coll and Morel in [6] and the variational non-local means model of Gilboa and Osher [7]. This non-local extension of the continuous min-cut can be obtained in different ways. We used the original discrete min-cut model [8] to define the non-local continuous min-cut, which turns out to be the $H^1$ norm defined on a graph plus a term that constrains the labels. The $H^1$ norm carries out the diffusion of object and background labels on the graph of image patches [9, 6], which holds semi-local and non-local image information that can better segment textures and real-world objects. Besides, the continuous formulation of the min-cut algorithm allows us to introduce other regularization processes such as the TV energy. The TV energy is indeed useful to regularize the boundary between the object and the background. Moreover, the TV energy can smooth out the segmentation of small sets favored by the min-cut algorithm, as noticed by Shi and Malik in [10] (see Figure 1).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 112–123, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Graph, Min-Cut and Diffusion
Graph representation. Let $G = (V, E)$ be a weighted undirected graph, where $V$ is the set of graph nodes and $E$ the set of edges connecting nodes. In this paper, each node $V_i$ represents a pixel $i$ in an image $I$ with support $\Omega \subset \mathbb{R}^n$, where typically $n \in \{2, 3\}$. The similarity between two pixels/nodes $i$ and $j$ in $\Omega$ is measured by the edge function on the graph, namely $w_{ij}$. In the case of image segmentation, two pixels $i$ and $j$ that belong to the same object/class are said to be connected and define a measure $w_{ij}$ close to unity. Conversely, two pixels $i$ and $j$ that do not belong to the same class are said to be not connected and define a measure $w_{ij}$ close to zero. A standard construction approach for the weight matrix $w_{ij}$ is as follows. Let $h(i, j)$ be some general non-negative distance measure between nodes $i$ and $j$; then the weight $w_{ij}$ is computed with a zero-mean Gaussian kernel of scale $\sigma$ such that:
\[ w_{ij} = \frac{1}{Z} \exp\left(-\frac{h(i, j)}{\sigma^2}\right), \]
where $\sigma$ is the scaling parameter and $Z$ is the normalization factor.
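The weight construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `gaussian_weights` is hypothetical, $h(i,j)$ is taken to be the squared Euclidean distance between feature vectors, and $Z$ is chosen as the largest off-diagonal weight so that all weights lie in $[0, 1]$. The text does not specify either choice.

```python
import numpy as np

def gaussian_weights(features, sigma):
    """Dense weight matrix w_ij = exp(-h(i, j) / sigma**2) / Z.

    Assumptions (not from the paper): h is the squared Euclidean distance
    between feature vectors, and Z is the largest off-diagonal weight.
    """
    f = np.asarray(features, dtype=float)
    if f.ndim == 1:
        f = f[:, None]                       # one scalar feature per node
    diff = f[:, None, :] - f[None, :, :]     # pairwise feature differences
    h = np.sum(diff ** 2, axis=-1)           # squared distances h(i, j)
    w = np.exp(-h / sigma ** 2)
    np.fill_diagonal(w, 0.0)                 # no self-loops
    z = w.max()
    return w / z if z > 0 else w

# Two nodes near 0 and two nodes near 1: weights are close to one inside
# each pair and close to zero between the pairs.
W = gaussian_weights([0.0, 0.1, 1.0, 1.1], sigma=0.3)
```

Similar features give weights near unity and dissimilar features give weights near zero, which is the behaviour the paragraph describes.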
Image Feature. The distance h(i, j) depends on image features. The choice of features is difficult and critical for obtaining an optimal segmentation result. For piecewise smooth and constant images, the gray-level value can be enough. For texture images, a feature vector computed at each pixel from a filter bank (as suggested in e.g. [11]) can be effective. A recent promising image feature to represent and process textures is the image intensity patch around the current pixel. The patch idea as feature vector was first introduced for texture synthesis [9, 12, 13], then for image denoising. Buades et al. in [6] proposed to compute the weight matrix with patch differences and denoise the image with a non-local averaging. Gilboa and Osher in [7] proposed a variational model for non-local denoising based on
patch differences. Finally, Bresson and Chan in [14] proposed a variational unsupervised segmentation method also based on patch differences. In this paper, we will use the graph of image patches of Buades et al.
Min-Cut. By definition, a binary cut partitions a graph into two subsets. This partition process for graphs can be used for image segmentation when we want to find an object and the background. In optimization theory about maximum flows in flow networks [8], the optimal partition of the graph $V$ into two sets $A$ and $B$ such that $A \cup B = V$ and $A \cap B = \emptyset$ can be computed by finding the minimal cut (min-cut), i.e. the minimization of the inter-similarity between the two sets $A$ and $B$ of $V$. In other words, given two particular nodes $s \in A$ and $t \in B$ in the graph, the min-cut partition can be written as:
\[ \text{min-cut}(A, B) = \min_x \sum_{x_i > 0,\, x_j < 0} -w_{ij} x_i x_j, \tag{1} \]
[…] 0 (resp. $x_i < 0$). Then (1) can be written in the matrix form as follows:
\[ \mathrm{cut}(A, B) = \frac{1}{4} x^T (D - W) x, \]
which implies:
\[ x^T (D - W) x = \frac{1}{2} \sum_{i,j} w_{ij} (x_i - x_j)^2. \]
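The two identities above can be checked numerically on a small random graph. The sketch below assumes that $W$ is a symmetric weight matrix with zero diagonal, that $D$ is the diagonal degree matrix with $d_i = \sum_j w_{ij}$, and that $x \in \{-1, +1\}^n$ indicates the two sets. The function names are hypothetical.

```python
import numpy as np

def cut_value(W, x):
    """Sum of -w_ij * x_i * x_j over pairs with x_i > 0 and x_j < 0."""
    n = len(x)
    return sum(-W[i, j] * x[i] * x[j]
               for i in range(n) for j in range(n)
               if x[i] > 0 and x[j] < 0)

def laplacian_quadratic(W, x):
    """x^T (D - W) x with D the diagonal degree matrix of W."""
    D = np.diag(W.sum(axis=1))
    return x @ (D - W) @ x

def pairwise_energy(W, x):
    """(1/2) * sum_{i,j} w_ij (x_i - x_j)^2 over all ordered pairs."""
    diff = x[:, None] - x[None, :]
    return 0.5 * np.sum(W * diff ** 2)

rng = np.random.default_rng(0)
W = rng.random((6, 6))
W = 0.5 * (W + W.T)            # symmetric weights
np.fill_diagonal(W, 0.0)       # no self-loops
x = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# cut(A, B) equals x^T (D - W) x / 4, and the quadratic form equals
# the pairwise energy.
print(cut_value(W, x), laplacian_quadratic(W, x) / 4, pairwise_energy(W, x) / 4)
```

The three printed numbers agree up to rounding, since each ordered pair across the cut contributes $(x_i - x_j)^2 = 4$ to the pairwise sum.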
The weighted graph Laplacian corresponds to a finite difference approximation of the continuous Laplacian operator. The graph Laplacian can also be a non-local operator.
Semi-supervised segmentation. We observe that min-cut partitioning algorithms are defined as semi-supervised segmentation techniques. The min-cut seeks
the optimal partition of the graph given particular nodes called the source "s" and the sink "t". Hence, it is easy to assign some pixels as source and some as sink, depending on whether the pixel belongs to the object or the background. Several graph-based partitioning methods have been proposed in the literature, such as [17, 21, 22, 23, 18]. These papers are based on discrete minimization methods to compute the min-cut given the labels. In this paper, we propose a continuous minimization method to solve the min-cut problem with labels and non-local image information.
3 Proposed Segmentation Method
3.1 Continuous Min-Cut
Energy minimization problem. In this section a new non-local semi-supervised segmentation algorithm is introduced. The algorithm relies on the continuous formulation of the discrete min-cut problem defined as:
\[ \text{min-cut}(A, B) = \min_x \sum_{i,j} w_{ij} (x_i - x_j)^2 \quad \text{s.t.} \quad x_k = +1,\ \forall k \in S, \quad x_k = -1,\ \forall k \in T, \tag{3} \]
where $S$ are the labels selected for the object and $T$ are the labels assigned to the background. We propose the continuous min-cut (CMC) problem as follows (which is a constrained minimization problem w.r.t. a real-valued function $u$):
\[ \mathrm{CMC}(u) = \min_u \frac{1}{2} \int_{\Omega \times \Omega} w(x, y)(u(x) - u(y))^2 \, dx \, dy \quad \text{s.t.} \quad u(x) = 1,\ \forall x \in S, \quad u(x) = 0,\ \forall x \in T, \]
which is equivalent to this unconstrained minimization problem for $u$:
\[ \mathrm{CMC}(u) = \min_u \frac{1}{2} \int_{\Omega \times \Omega} w(x, y)(u(x) - u(y))^2 \, dx \, dy + \int_{\Omega} \lambda(x)(u - u_0)^2 \, dx, \tag{4} \]
where
\[ u_0(x) = \begin{cases} 1 & \text{if } x \in S \\ 0 & \text{if } x \in T \end{cases} \quad \text{and} \quad \lambda(x) = \begin{cases} \infty & \text{if } x \in S \cup T \\ 0 & \text{otherwise,} \end{cases} \]
where function $\lambda$ provides the degree of confidence with respect to the labels.
Non-local $H^1$ energy. The first term of (4) is deduced from (3) using the change of variable $u_i = \frac{x_i + 1}{2} \in \{0, 1\}$, then relaxing $u_i$ to $[0, 1]$. This term is also known as the non-local $H^1$ energy ([24, 7]) defined as:
\[ H^1_G(u) = \frac{1}{2} \int_{\Omega \times \Omega} w(x, y)(u(x) - u(y))^2 \, dx \, dy = \frac{1}{2} \int_{\Omega} |\nabla_G u|^2 \, dx = \|u\|_{H^1_G}, \tag{5} \]
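As a short worked step that the text leaves implicit, the change of variable can be checked directly. With $u_i = (x_i + 1)/2$, the labels $x_k = \pm 1$ map to $u_k \in \{1, 0\}$, and the pairwise differences rescale by a constant factor that does not change the minimizer:
\[ u_i - u_j = \frac{x_i + 1}{2} - \frac{x_j + 1}{2} = \frac{x_i - x_j}{2} \quad \Longrightarrow \quad \sum_{i,j} w_{ij} (x_i - x_j)^2 = 4 \sum_{i,j} w_{ij} (u_i - u_j)^2. \]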
where $|\nabla_G u|^2 := \int_{\Omega} w(x, y)(u(x) - u(y))^2 \, dy$ is the squared norm of the continuous graph gradient of $u$. The optimality condition for (5) is:
\[ \int_{\Omega} w(x, y)(u(x) - u(y)) \, dy = \Delta_G u = 0, \]
where $\Delta_G u$ is the continuous graph Laplacian of $u$.
Labels. The second term of (4) introduces the hard constraint of labels in the energy minimization approach. This term comes from Unger et al. in [25], who incorporate seed points (assigned either to the object or to the background) in the geodesic active contour/snake model [26]. This term constrains function $u$ to be equal to $u_0$ for $x \in S \cup T$ and leaves it unconstrained for $x \notin S \cup T$. Function $\lambda$ is highly discontinuous, which requires some regularization process to handle it. Unger et al. proposed a splitting operation to solve this problem. A new function $v$ is introduced s.t.:
\[ \min_u \int_{\Omega} \lambda(x)(v - u_0)^2 \, dx + \frac{1}{2\theta} \|u - v\|_2^2, \]
where the term $\|u - v\|_2^2$ forces $v \approx u$ as $\theta \to 0$. The optimality condition w.r.t. $v$ leads to:
\[ v = \frac{2\lambda(x)\theta u_0 + u}{2\lambda(x)\theta + 1} = \begin{cases} u & \text{if } \lambda \to 0 \\ u_0 & \text{if } \lambda \to \infty. \end{cases} \]
3.2 Proposed Semi-supervised Segmentation Algorithm: Continuous Min-Cut + TV
Final model. The previous section introduced the continuous formulation of the min-cut problem. In this section, we propose to merge the continuous min-cut with the Total Variation (TV) energy. The TV term offers two advantages. First, TV regularizes the geometry of the contour between classes (object and background). Experiments showed that the continuous min-cut can produce irregularities along the contour. Second, Shi and Malik in [10] observed that the min-cut algorithm tends to favor misclassification of small sets, which are smoothed out by the TV regularization process. Finally, we propose the following energy minimization model for semi-supervised segmentation:
\[ E(u) = \|u\|_{H^1_G} + \int_{\Omega} \lambda(x)(u - u_0)^2 \, dx + \beta \|u\|_{TV}, \tag{6} \]
where $\|u\|_{TV} = \int_{\Omega} |\nabla u| \, dx$.
Minimization process. Applying the calculus of variations directly to (6) produces a very slow minimization process. We propose to use a splitting operation to minimize $E$ more efficiently. We introduce two new functions $v$, $s$ s.t.:
\[ E(u, v, s) = \|u\|_{H^1_G} + \int_{\Omega} \lambda(x)(v - u_0)^2 \, dx + \beta \|s\|_{TV} + \frac{1}{2\theta_v} \|u - v\|_2^2 + \frac{1}{2\theta_s} \|s - v\|_2^2. \tag{7} \]
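Then, $v$, $s$ being fixed, we search for $u$ as the solution of $\min_u \|u\|_{H^1_G} + \frac{1}{2\theta_v} \|u - v\|_2^2$, which is given by a fixed point method as
\[ u = \frac{\theta_v \int_{\Omega} w(x, y) u(y) \, dy + v(x)}{\theta_v \int_{\Omega} w(x, y) \, dy + 1}. \]
Functions $u$, $s$ being fixed, we search for $v$ as the solution of $\min_v \int_{\Omega} \lambda(x)(v - u_0)^2 \, dx + \frac{1}{2\theta_v} \|u - v\|_2^2 + \frac{1}{2\theta_s} \|s - v\|_2^2$, which is given by $v = \frac{\theta_s u + \theta_v s}{\theta_s + \theta_v}$ if $\lambda = 0$ and $v = u_0$ if $\lambda = \infty$. Functions $u$, $v$ being fixed, we search for $s$ as the solution of $\min_s \beta \|s\|_{TV} + \frac{1}{2\theta_s} \|s - v\|_2^2$, which solution is given e.g. by the Projection algorithm of Chambolle [27]. We propose the following iterative scheme for minimizing energy (7):
\[
\begin{cases}
u^{n+1} = \dfrac{\theta_v \int_{\Omega} w(x, y) u^n(y) \, dy + v^n(x)}{\theta_v \int_{\Omega} w(x, y) \, dy + 1} \\[2ex]
v^{n+1} = \begin{cases} \dfrac{\theta_s u^{n+1} + \theta_v s^n}{\theta_s + \theta_v} & \text{if } \lambda = 0 \\ u_0 & \text{if } \lambda = \infty \end{cases} \\[2ex]
s^{n+1} = v^{n+1} - \theta \, \mathrm{div} \, p^{n} \\[1ex]
p^{n+1} = \dfrac{\beta \left( p^n + \frac{1}{8} \nabla (\mathrm{div} \, p^n - v^{n+1} / \theta_s) \right)}{\beta + \frac{1}{8} \left| \nabla (\mathrm{div} \, p^n - v^{n+1} / \theta_s) \right|}
\end{cases}
\qquad n \geq 0 \tag{8}
\]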
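A discrete version of this alternating scheme might look as follows. It is a sketch under several assumptions, not the authors' Matlab implementation: the function names are hypothetical, the integrals become sums over a dense weight matrix `W`, one Chambolle step is taken per outer iteration, and the TV step uses the standard form of Chambolle's update with step 1/8 and parameter $\beta\theta_s$, which may place $\beta$ differently from the printed scheme.

```python
import numpy as np

def grad(u):
    """Forward differences with zero (Neumann) values at the far border."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy

def grid_graph(image, sigma):
    """Hypothetical helper: 4-neighbour graph with Gaussian intensity weights."""
    h, w = image.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = image.ravel()
    W = np.zeros((h * w, h * w))
    for a, b in ((idx[:-1, :], idx[1:, :]), (idx[:, :-1], idx[:, 1:])):
        a, b = a.ravel(), b.ravel()
        wt = np.exp(-(flat[a] - flat[b]) ** 2 / sigma ** 2)
        W[a, b] = wt
        W[b, a] = wt
    return W

def cmc_tv_iterations(W, shape, obj_mask, bg_mask, beta=0.01,
                      theta_v=1.0, theta_s=1.0, n_iter=100):
    """Alternate the u, v and s updates; beta must be positive."""
    obj = np.asarray(obj_mask, dtype=bool).ravel()
    labelled = obj | np.asarray(bg_mask, dtype=bool).ravel()
    u0 = obj.astype(float)
    u, v, s = u0.copy(), u0.copy(), u0.copy()   # initialisation from labels
    px, py = np.zeros(shape), np.zeros(shape)
    d = W.sum(axis=1)                           # discrete integral of w(x, .)
    lam = beta * theta_s                        # assumed TV step parameter
    for _ in range(n_iter):
        u = (theta_v * (W @ u) + v) / (theta_v * d + 1.0)    # fixed point in u
        v = (theta_s * u + theta_v * s) / (theta_s + theta_v)
        v[labelled] = u0[labelled]                           # hard label constraint
        v_img = v.reshape(shape)
        gx, gy = grad(div(px, py) - v_img / lam)             # one Chambolle step
        norm = np.sqrt(gx ** 2 + gy ** 2)
        px = (px + 0.125 * gx) / (1.0 + 0.125 * norm)
        py = (py + 0.125 * gy) / (1.0 + 0.125 * norm)
        s = (v_img - lam * div(px, py)).ravel()
    return u.reshape(shape)
```

A segmentation would then be obtained by thresholding the returned `u`, for example `u > 0.5`. The dense matrix limits this sketch to small images; a sparse matrix restricted to the selected neighbours would be the natural replacement for realistic sizes.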
3.3 Some Properties of the Models (6) and (7)
Convexity. Both energy minimization models (6) and (7) are strictly convex (since the $H^1_G$ term is strictly convex), which implies the existence of a unique minimizing solution independently of the initial condition. Hence, even using gradient descent approaches, the algorithm does not get stuck in a local minimum. Thus, as long as the labels are correctly defined, the results will be independent of the initialization.
Relation with the original min-cut problem (1) or equivalently (3). The continuous min-cut problem (4) has the same solution as the discrete min-cut problem when considering characteristic/indicator functions of sets, i.e.:
\[ \min_{1_A} \{ E_{CMC}(u = 1_A) \} = \min_{A} \{ \mathrm{cut}(A, \Omega \setminus A) \}. \tag{9} \]
Recall that we relax function $u$ to take values in $[0, 1]$ to define a continuous version of the min-cut algorithm, which can be minimized with continuous minimization tools. Then, the segmentation result is given by thresholding the minimizer $u$ of (6) with any value in $(0, 1)$.
Non-trivial steady state solution of (6). The final steady state solution of (6) is not the mean value of the initial function. Call $\bar{u}_{t=0}$ the mean value function of $u_{t=0}$. It is easy to show by contradiction that $\bar{u}_{t=0}$ is not a solution to (6). If $u_{t=\infty} = \bar{u}_{t=0}$ then $E(u_{t=\infty}) = \int_{\Omega} \lambda(x)(\bar{u}_{t=0} - u_0)^2 \, dx > 0$, and the minimizer is thus given by $\bar{u}_{t=0} = u_0$. However, $u_0(x) = 1\ \forall x \in S$, $0\ \forall x \in T$. Thus, $\bar{u}_{t=0} \neq u_0$.
We notice that Gilboa and Osher in [7] also use the energy $\|u\|_{H^1_G}$ to perform semi-supervised segmentation. However, they did not use a term to constrain the labels as in this work. They minimized the energy $\|u\|_{H^1_G}$ starting with a trinary initial function $u_{t=0}$ taking values in $\{-1, 0, 1\}$ (labelled pixels for the object are assigned to
the value 1 and those for the background to the value −1). However, the minimizing solution is the mean value function $u_{t=\infty} = \bar{u}_{t=0}$. Hence, this algorithm requires the diffusion process to be stopped.
4 Results
This section presents some results of the proposed semi-supervised segmentation algorithm. The graph is defined from local and non-local image information:
\[ w(i, j) = \begin{cases} \dfrac{|i - j|^2}{\sigma_1^2} + \dfrac{|F(i) - F(j)|^2}{\sigma_2^2} & \text{if } i, j \in N_{a \times a}(i) \\ 0 & \text{otherwise} \end{cases}, \tag{10} \]
where $N_{a \times a}(i)$ is a square window of size $a \times a$ around $i$. Computing the similarity between all pairs of pixels over the whole image is very expensive, so we chose to simply select points in a close neighborhood. This amounts to assuming that two points that are far apart are not connected. From the $a^2$ neighbors, only the $cl = 8$ closest points are selected. The feature vector $F$ is a square patch of size $f \times f$ centered on each pixel. The segmentation is driven by (8). The initial conditions for $u$, $v$ and $s$ are given by the labels $S$, i.e. $u = v = s = 1$ if $x \in S$ and $u = v = s = 0$ otherwise. With an unoptimized Matlab implementation, the graph computation takes approximately 15 seconds and the segmentation is performed in approximately 1 minute. The image size is 128 × 128.
TV Regularization Effect. This paragraph illustrates the importance of the TV regularization. Salt-and-pepper noise is added to a two-phase image with different means. The inside and outside labels are shown in Figure 1(a). The results show that if the TV regularization is not performed, then the segmentation fails (Figure 1(b)). When the TV regularization is used, the segmentation succeeds (Figure 1(c)).
Fig. 1. Application of our algorithm to an image with salt-and-pepper noise. (a) Initialization. (b) The segmentation result without TV regularization. (c) The segmentation result with TV regularization.
Texture Images. We apply our algorithm to a synthetic texture image composed of five different patterns. Figures 2(a) and 2(c) show the initializations and Figures 2(b) and 2(d) the corresponding results. The patch size is chosen to be 9 × 9, which corresponds to the pattern size for the two selected textures.
Fig. 2. Results on synthetic textures. (a) and (c) Initializations. (b) and (d) Results.
Fig. 3. Results on real-world images from the Berkeley dataset. Left column: Initial labels. Right column: Segmentation Result.
Natural Images. We now apply our algorithm to a set of natural images taken from the Berkeley segmentation dataset [28]. In the first column of Figure 3, the inside and outside labels are shown, and in the second column the segmentation results.
Fig. 4. Results on real-world color images from the Berkeley dataset. Left column: Initial labels. Right column: Segmentation Result.
Fig. 5. First row, Segmentation of the liver. (a) Initial labels. (b) Segmentation Result. Second row, Segmentation of the lateral muscles on the neck. (c) Initial labels. (d) Segmentation Result. (e) Zoom on the segmentation of the muscles.
Color Images. We consider the simple case of Red-Green-Blue (RGB) channels. The first step consists of computing the graph by taking into account each channel, i.e. $F = (F_r, F_g, F_b)$, where $F_r$, $F_g$ and $F_b$ are respectively the red, green and blue feature channels. Images are also taken from the Berkeley segmentation dataset [28]. In the first column of Figure 4, the inside and outside labels are shown, and in the second column the segmentation results.
Medical Images. We apply our segmentation algorithm to 2-D medical images of CT scans of the abdomen and the head and neck. Figures 5(a) and 5(c) present
the inside and outside initial labels. Figures 5(b) and 5(d) show the segmentation results. For the liver segmentation, the label on the background (black) prevents the diffusion from also capturing the heart. The segmentation of the structures in the neck is challenging, and the results that we obtain are promising.
5 Discussion and Conclusion
In this paper, a non-local semi-supervised segmentation method has been proposed. The success of graph partitioning algorithms for image segmentation has motivated this work. Our objective was to translate the discrete min-cut algorithm into a non-local continuous min-cut algorithm. Hard constraints with the source and sink labels are added naturally in the proposed continuous framework. Besides, it has also been easy to introduce new terms such as the TV term that regularizes the geometry of the boundary between the object and the background. The non-local continuous min-cut is also equivalent to a diffusion process. The diffusion is done on the graph of image intensity patches, which holds semi-local and non-local image information useful to segment textures and complex patterns. Our semi-supervised segmentation has provided promising segmentation results for textures and real-world objects. Future work will focus on comparing the efficiency of our segmentation algorithm with other related semi-supervised segmentation algorithms. We would also like to extend our method to 3-D medical images.
Acknowledgements
Nawal Houhou was supported by Swiss National Science Foundation #205320101621, Xavier Bresson was supported by ONR N00014-03-1-0071 and ONR MURI subcontract from Stanford University, and Arthur Szlam was supported by NSF DMS-0811203. The authors would also like to thank the referees for their constructive comments.
References
1. Strang, G.: Maximal Flow Through A Domain. Mathematical Programming 26(2), 123–143 (1983)
2. Protiere, A., Sapiro, G.: Interactive image segmentation via adaptive weighted distances. IEEE Transactions on Image Processing 16(4), 1046–1057 (2007)
3. Cremers, D., Fluck, O., Rousson, M., Aharon, S.: A Probabilistic Level Set Formulation for Interactive Organ Segmentation. In: SPIE (2007)
4. Appleton, B., Talbot, H.: Globally minimal surfaces by continuous maximal flows. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(1), 106–118 (2006)
5. Unger, M., Pock, T., Cremers, D., Bischof, H.: Tvseg - interactive total variation based image segmentation. In: British Machine Vision Conference (BMVC), Leeds, UK (September 2008)
6. Buades, A., Coll, B., Morel, J.: A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation 4(2), 490–530 (2005)
7. Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. Multiscale Modeling and Simulation 6(2), 595–630 (2007)
8. Elias, P., Feinstein, A., Shannon, C.E.: Note on Maximum Flow Through a Network. IRE Transactions on Information Theory 2, 117–119 (1956)
9. Efros, A., Leung, T.: Texture Synthesis by Non-Parametric Sampling. In: IEEE International Conference on Computer Vision, vol. 2, pp. 10–33 (1999)
10. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 888–905 (2000)
11. Malik, J., Belongie, S., Leung, T., Shi, J.: Contour and texture analysis for image segmentation. International Journal of Computer Vision 43(1), 7–27 (2001)
12. Efros, A., Freeman, W.: Image quilting for texture synthesis and transfer. In: Proceedings of the Conference on Computer graphics and interactive techniques, SIGGRAPH, pp. 341–346. ACM, New York (2001)
13. Liang, L., Liu, C., Xu, Y., Guo, B., Shum, H.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. 20(3), 127–150 (2001)
14. Bresson, X., Chan, T.: Non-local Unsupervised Variational Image Segmentation Models, UCLA CAM Report 08-67 (2008)
15. Wu, Z., Leahy, R.: An Optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1101–1113 (1993)
16. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 125–131 (1998)
17. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proceedings of Eighth IEEE International Conference on Computer Vision, vol. 1, pp. 105–112 (2001)
18. Boykov, Y., Funka-Lea, G.: Graph cuts and efficient n-d image segmentation. Int. J. Comput. Vision 70(2), 109–131 (2006)
19. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 2001 (2001)
20. Kwatra, V., Schödl, A., Essa, I., Turk, G., Bobick, A.: Graphcut textures: Image and video synthesis using graph cuts. In: Proceedings of the Conference on Computer graphics and interactive techniques, SIGGRAPH, vol. 22(3), pp. 277–286 (July 2003)
21. Blum, A., Chawla, S.: Learning from labeled and unlabeled data using graph mincuts. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 19–26. Morgan Kaufmann Publishers Inc., San Francisco (2001)
22. Yu, S., Shi, J.: Segmentation given partial grouping constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2), 173–183 (2004)
23. Grady, L., Funka-Lea, G.: Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: Proceedings of the European Conference on Computer Vision, pp. 230–245. Springer, Heidelberg (2004)
24. Zhou, D., Schölkopf, B.: A Regularization Framework for Learning from Graph Data. In: Workshop on Statistical Relational Learning and Its Connections to Other Fields (2004)
25. Unger, M., Pock, T., Bischof, H.: Continuous globally optimal image segmentation with local constraints. In: Computer Vision Winter Workshop 2008 (2008)
26. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. International Journal of Computer Vision 22(1), 61–79 (1997)
27. Chambolle, A.: An Algorithm for Total Variation Minimization and Applications. Journal of Mathematical Imaging and Vision 20(1–2), 89–97 (2004)
28. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A Database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, July 2001, vol. 2, pp. 416–423 (2001)
Momentum Based Optimization Methods for Level Set Segmentation
Gunnar Läthén (1,3), Thord Andersson (2,3), Reiner Lenz (1,3), and Magnus Borga (2,3)
(1) Department of Science and Technology, Linköping University
(2) Department of Biomedical Engineering, Linköping University
(3) Center for Medical Image Science and Visualization, Linköping University
Abstract. Segmentation of images is often posed as a variational problem. As such, it is solved by formulating an energy functional depending on a contour and other image derived terms. The solution of the segmentation problem is the contour which extremizes this functional. The standard way of solving this optimization problem is by gradient descent search in the solution space, which typically suffers from many unwanted local optima and poor convergence. Classically, these problems have been circumvented by modifying the energy functional. In contrast, the focus of this paper is on alternative methods for optimization. Inspired by ideas from the machine learning community, we propose segmentation based on gradient descent with momentum. Our results show that typical models hampered by local optima solutions can be further improved by this approach. We illustrate the performance improvements using the level set framework.
1 Introduction
A very popular and powerful approach for solving image segmentation problems is through the calculus of variations. In this setting the solution is represented by a contour, which parameterizes an energy functional depending on various image based quantities such as intensities or gradients. In general, the set of possible contours constitutes the solution space, where the goal is to find the contour which extremizes the energy in this space. As an optimization problem, there are many possible strategies to find this solution. One approach is to use the method of graph cuts to find a global optimum [1]. However, this can only be applied to a small class of energy functionals. For more general problems, the standard method has been to deform an initial contour in the steepest (gradient) descent direction of the energy. Equations of motion for the contour are derived using the Euler-Lagrange equation and the condition that the first variation of the energy functional should vanish at a (local) optimum. Then, the contour is evolved to steady-state given the resulting equations. A standard implementation of this strategy is usually hampered by two common problems. The first problem is sensitivity to local optima, which arise due to noisy data. To avoid this, the usual approach has been to modify the energy functional by adding regularizing terms. The second common problem is poor convergence due to difficulties in choosing good initial conditions. To improve convergence, very effective solvers based on multi-grid [2, 3] and AOS schemes [4, 5, 6] have been developed. However, these methods all search for a solution in the gradient descent direction, and little focus has been given to the underlying optimization problem. This has been identified in recent work [7, 8], where the metric defining the notion of steepest descent (gradient) has been studied. By changing the metric in the solution space, local optima due to noise are avoided in the search path.
Along the same direction, this paper presents an alternative search strategy for the optimization solver. Our idea stems from the machine learning community, where an optimization problem is solved to update the system to adapt to a given stimulus. A simple, but effective, modification to gradient descent was proposed in [9], which basically adds a momentum to the motion in solution space. This simulates the physical properties of inertia and momentum and effectively allows the search to avoid local optima and accelerate in favorable directions. In this paper, we show how this idea can be used for image segmentation in a variational framework using level set methods. The results show faster convergence and less sensitivity to local optima.
The paper will proceed as follows. In Section 2, we describe the idea of gradient descent with momentum in a general setting and give examples highlighting the benefits. Then, Section 3 presents how this idea can be used to solve segmentation problems in a level set framework. This is exemplified in Section 4 and Section 5 where we give implementation details and compute segmentations given a common energy functional. Finally, Section 6 concludes the paper and presents ideas for future work.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 124–136, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Gradient Descent with Momentum
Considering general optimization problems, gradient descent is a very simple approach which can handle many types of cost functions. It is intuitive, since it always moves in the direction of steepest descent, which locally gives the largest amount of decrease in the cost function. In addition, it only requires first order derivatives of the function, providing simple and fast computations. On the other hand, it is well known that gradient descent suffers from poor convergence and high sensitivity to local optima for many practical problems. Therefore, other descent directions (Newton, Quasi-Newton, etc.) have been studied and proved superior, see e.g. [10] for a rigorous reference. A simple alternative to these, more theoretically sophisticated methods, is often applied in the machine learning community. A typical problem here is the construction of adaptive systems that can classify unknown inputs. This can be formulated as an optimization problem and one of the goals of machine learning is to construct fast learning or adaptation rules that can be implemented in very simple hardware or software devices. To improve the convergence and robustness of a simple gradient descent solution, while avoiding the complexity of more sophisticated optimization methods, gradient descent with momentum
was proposed [9]. The starting point of our derivation of the proposed method is the following description of a standard line search optimization method:
\[ x_{k+1} = x_k + s_k \tag{1} \]
\[ s_k = \alpha_k p_k \tag{2} \]
where $x_k$ is the current iterate and $s_k$ is the next step, consisting of length $\alpha_k$ in direction $p_k$. To guarantee convergence, it is often required that $p_k$ be a descent direction while $\alpha_k$ gives a sufficient decrease in the cost function. A simple realization of this is gradient descent, which moves in the steepest descent direction according to $p_k = -\nabla f_k$, where $f$ is the cost function, while $\alpha_k$ satisfies the Wolfe conditions [10]. Turning to gradient descent with momentum, we will adopt some terminology from the machine learning community and choose a search direction according to:
\[ s_k = -\eta (1 - \omega) \nabla f_k + \omega s_{k-1} \tag{3} \]
where $\eta$ is the learning rate and $\omega \in [0, 1]$ is the momentum. Note that $\omega = 0$ gives standard gradient descent $s_k = -\eta \nabla f_k$, while $\omega = 1$ gives "infinite inertia" $s_k = s_{k-1}$. The intuition behind this strategy is that the current iterate has an inertia, which prohibits sudden changes in the velocity. This will effectively filter out high frequency changes in the cost function and allow for greater steps in favourable directions. Selecting appropriate parameters, our hope is that the rate of convergence is increased while possible local optima will be overstepped. The effect of the momentum term is illustrated in Figure 1. The iterates with momentum $\omega = 0$ show the behaviour of standard gradient descent when varying the learning rate (step length) $\eta$. In comparison, for an appropriate choice of momentum $\omega = 0.1$, the solution approaches the optimum more rapidly. It can be seen, however, that too high a momentum of $\omega = 0.4$ leads to oscillations.
[Figure 1 panels: (a) Iterates and cost function; (b) Convergence rate. Legend: η = 0.04, ω = 0; η = 0.4, ω = 0; η = 0.4, ω = 0.1; η = 0.4, ω = 0.4]
Fig. 1. Gradient descent search with/without momentum on a quadratic cost function
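The update of Eqs. (1) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quadratic cost, the matrix `A`, the starting point and the iteration count are hypothetical choices and are not the ones used to produce Figure 1; only the parameter pairs (η, ω) are taken from the figure legend.

```python
import numpy as np

# Hypothetical quadratic cost f(x) = 0.5 * x^T A x (not the one behind Fig. 1).
A = np.diag([1.0, 4.0])

def cost(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

def momentum_descent(x0, eta, omega, n_iter):
    """Iterate Eq. (1) with the step of Eq. (3):
    s_k = -eta * (1 - omega) * grad f_k + omega * s_{k-1}."""
    x = np.asarray(x0, dtype=float).copy()
    s = np.zeros_like(x)  # assumed initial step: no inertia before the first update
    for _ in range(n_iter):
        s = -eta * (1.0 - omega) * grad(x) + omega * s
        x = x + s
    return x

x0 = np.array([10.0, 10.0])
x_small = momentum_descent(x0, eta=0.04, omega=0.0, n_iter=50)  # plain gradient descent
x_mom = momentum_descent(x0, eta=0.4, omega=0.1, n_iter=50)     # with momentum
```

Comparing `cost(x_small)` and `cost(x_mom)` gives a rough feel for how the learning rate and the momentum interact on this toy problem; setting `omega=0.0` recovers the standard update $x_{k+1} = x_k - \eta\nabla f_k$.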
3 Minimizing Level Set Flows
As was previously outlined, segmentation problems in image analysis are often described as optimization problems with solutions derived using the calculus of variations. The standard procedure is to formulate an energy functional by means of a contour and various image derived terms. Extremals of this functional are then identified by an Euler-Lagrange equation, which is used to derive equations of motion for the contour [11]. This typical procedure yields a gradient descent search in a high dimensional solution space, in which each possible contour is represented by a point. For example, [11] presents, among others, the derivation of the weighted region term described by the following functional:
$$E(C) = \iint_{\Omega_C} f(x, y)\,dx\,dy \tag{4}$$
where $C$ is a 1D curve embedded in a 2D domain, $\Omega_C$ is the inside region of $C$, and $f(x, y)$ is a scalar function. This functional is used to maximize some quantity given by $f(x, y)$ inside $C$. A simple example is $f(x, y) = 1$, which measures, and maximizes, the area. Calculating the first variation of Eq. (4) yields the evolution equation:
$$\frac{\partial C}{\partial t} = -f(x, y)\,\mathbf{n} \tag{5}$$
where $\mathbf{n}$ is the curve normal. Again, setting $f(x, y) = 1$ gives a constant flow in the normal direction, typically referred to as the “balloon force”. The representation, or parameterization, of the contour $C$ can in general be chosen arbitrarily. However, it is often convenient to use the implicit level set method by Osher and Sethian [12], since this allows for arbitrary topological changes. To summarize the basic ideas, a contour is represented implicitly as the zero level set of a time dependent scalar function (referred to as the level set function). Formally, a contour $C$ is described by $C = \{x : \phi(x, t) = 0\}$. To deform $C$, the level set function is evolved in time according to a set of partial differential equations (PDEs). The transition from the equations of motion for a parametrized curve (Eq. (5)) to a level set PDE is accomplished by a simple procedure. In general, the motion $\partial C/\partial t = \gamma\,\mathbf{n}$ translates to the level set equation $\partial\phi/\partial t = \gamma\,|\nabla\phi|$ [11]. Thus, Eq. (5) gives the familiar level set equation:
$$\frac{\partial\phi}{\partial t} = -f(x, y)\,|\nabla\phi| \tag{6}$$
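As a rough illustration of Eq. (6), the following Python sketch evolves a circular contour under the balloon force $f = 1$ with explicit Euler steps. It rests on several assumptions that are not taken from the paper: the level set function is a signed distance function that is negative inside the contour, the grid spacing is one, and plain central differences (`np.gradient`) stand in for the upwind schemes normally used for level set PDEs. The helper names are hypothetical.

```python
import numpy as np

def signed_distance_circle(shape, center, radius):
    """Signed distance to a circle, negative inside (an assumed sign convention)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.hypot(xx - center[1], yy - center[0]) - radius

def evolve(phi, f, dt, n_steps):
    """Explicit Euler steps of Eq. (6): phi_t = -f * |grad phi|."""
    phi = phi.copy()
    for _ in range(n_steps):
        gy, gx = np.gradient(phi)  # central differences, unit grid spacing
        phi = phi - dt * f * np.hypot(gx, gy)
    return phi

def enclosed_area(phi):
    """Number of grid points inside the contour (phi < 0)."""
    return int(np.count_nonzero(phi < 0))

phi0 = signed_distance_circle((64, 64), center=(32, 32), radius=10.0)
f = np.ones_like(phi0)  # f = 1: the "balloon force"
phi1 = evolve(phi0, f, dt=0.5, n_steps=10)
```

With this sign convention the region enclosed by the zero level set grows, which matches the interpretation of $f = 1$ as maximizing the area.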
The remainder of this section will describe how we modify the typical level set method update scheme to incorporate a momentum term as presented in Section 2.
3.1 Momentum for Minimizing Level Set Flows
We have noted that the contour evolving according to the Euler-Lagrange equation yields a gradient descent search. Recall that each contour can be represented as a point in the solution space (the structure of the space will depend on the parameterization). Thus, we can approximate the direction of the gradient by computing the vector between two subsequent points. In the level set framework we achieve this by taking the difference between two subsequent time instances of the level set function, representing the entire level set function as one vector:
$$\nabla f(t_n) \approx \frac{\phi(t_n) - \phi(t_{n-1})}{\Delta t} \tag{7}$$
where $f$ is a cost function in compliance with the terminology used in Section 2. Note that this is indeed an approximation, depending on the time difference $\Delta t = t_n - t_{n-1}$. Following the ideas from Section 2, we update the level set function to incorporate a momentum term:
$$s(t_n) = -\eta(1-\omega)\,\frac{\tilde{\phi}(t_n) - \phi(t_{n-1})}{\Delta t} + \omega\,s(t_{n-1}) \tag{8}$$
$$\phi(t_n) = \phi(t_{n-1}) + \Delta t\,s(t_n) \tag{9}$$
where $\tilde{\phi}(t_n)$ denotes the intermediate level set function obtained by evolving $\phi(t_{n-1})$ according to the PDE.
The complete procedure works as follows:
Procedure UpdateLevelset
1. Given the level set function $\phi(t_{n-1})$, compute the next (intermediate) time step $\tilde{\phi}(t_n)$. This is performed by evolving $\phi$ according to a PDE (such as Eq. (6)) using standard techniques (e.g. Euler integration).
2. Compute the approximate gradient by Eq. (7).
3. Compute a step $s(t_n)$ according to Eq. (8). This step effectively modifies the gradient direction by incorporating the momentum term as a fraction of the previous step $s(t_{n-1})$.
4. Compute the next time step $\phi(t_n)$ by Eq. (9). Note that this replaces the intermediate level set function computed in Step 1.
The procedure is very simple and is directly compatible with any type of level set implementation.
4 Experiments
We now describe some details of the implementation and illustrate properties of the suggested method using two examples. Here we study 1D curves embedded in a 2D domain, but the approach readily generalizes to 2D surfaces in 3D given the level set framework.
4.1 Implementation Details
We have implemented the proposed ideas in Matlab using standard level set techniques based on [13, 14]. Reference code can be found online at the site http://dmforge.itn.liu.se/ssvm09/. Some details of our implementation are the following:
– The level set function is reinitialized (reset to a signed distance function) after Step 1 and Step 4. This is typically performed using the fast marching [15] or fast sweeping [16] algorithms. There are two reasons for this: Firstly, it is required for stable evolution in time due to the use of explicit Euler integration. Secondly, we want a momentum induced by the zero level set of φ (the contour), rather than all level sets of φ. Reinitialization could be omitted, with the effect of introducing a momentum on all individual level sets. Interpreting each sample of φ as a parameter of the contour, this is equivalent to applying momentum on each parameter. While feasible, we have not experimented with momentum without incorporating reinitialization.
– We avoid instabilities by dampening $s(t_n)$ in Step 3 using a sigmoidal function:
$$\hat{s}(s(t_n), s_{max}) = \frac{2 s_{max}}{1 + e^{-2 s(t_n)/s_{max}}} - s_{max} \tag{10}$$
where $s_{max}$ is the maximum step length allowed.
– Any explicit or implicit time integration scheme can be used in Step 1. Due to its simplicity, we have used explicit Euler integration, which might require several inner iterations in Step 1 to advance the level set function by Δt time units.
4.2 Weighted Region Based Flow
To verify our idea, we have used a simple energy functional based on a weighted region term (Eq. (4)) combined with a penalty on curve length for regularization. The goal is to maximize:
$$E(C) = \iint_{\Omega_C} f(x, y)\,dx\,dy - \alpha \int_C ds \tag{11}$$
where α is a regularization weight parameter. The target function $f(x, y)$ is image based, computed using the approach in [17]. This method uses quadrature filters [18] across multiple scales to detect line structures. Taking the real part of the complex filter response, $f(x, y)$ gives positive values on the inside of linear structures, negative on the outside, and zero on the edges. Translating Eq. (11) to a level set PDE following [11] gives:
$$\frac{\partial\phi}{\partial t} = -f(x, y)\,|\nabla\phi| + \alpha\kappa\,|\nabla\phi| \tag{12}$$
where κ is the curvature of the contour. First we illustrate some properties of the method with a synthetic test image depicted in Figure 2(a), which mimics the common problem of intensity variation in medical imaging. The intensity of the object ranges from 0.3 to 1, while the noise level is 0.1. This image yields the target function f (x, y) in Figure 2(b) where bright and dark colors indicate positive and negative values respectively. As exemplified in our first experiment (Figure 3) the dip in contrast results in a local optimum in the solution space.
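The right-hand side of Eq. (12) can be sketched as follows. This is a minimal illustration, not the reference Matlab implementation: it assumes a unit grid spacing, a level set function that is negative inside the contour, the common discretization $\kappa = \nabla\cdot(\nabla\phi/|\nabla\phi|)$ with central differences, and hypothetical function names.

```python
import numpy as np

def curvature(phi, eps=1e-8):
    """kappa = div(grad phi / |grad phi|), by central differences on a unit grid."""
    gy, gx = np.gradient(phi)
    norm = np.hypot(gx, gy) + eps  # eps avoids division by zero in flat regions
    ny, nx = gy / norm, gx / norm
    return np.gradient(nx, axis=1) + np.gradient(ny, axis=0)

def rhs_eq12(phi, f, alpha):
    """Right-hand side of Eq. (12): -f |grad phi| + alpha * kappa * |grad phi|."""
    gy, gx = np.gradient(phi)
    grad_norm = np.hypot(gx, gy)
    return (-f + alpha * curvature(phi)) * grad_norm

# Sanity check on a circle of radius 20, whose curvature is 1/20.
yy, xx = np.mgrid[0:64, 0:64]
phi = np.hypot(xx - 32, yy - 32) - 20.0
kappa = curvature(phi)
```

Evaluating `kappa` at a grid point on the circle, such as `kappa[32, 52]`, gives a value close to 1/20. With $f = 0$ the remaining term $\alpha\kappa|\nabla\phi|$ is positive there, so under the assumed sign convention the length penalty shrinks the contour, while a positive $f$ pushes it outwards.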
[Figure 2 panels: (a) Input image; (b) Target function f(x, y)]
Fig. 2. Synthetic test image illustrating the presence of a local optimum in the solution space
[Figure 3 panels: (a) time = 0; (b) time = 40; (c) time = 100; (d) time = 170; (e) time = 300; (f) time = 870]
Fig. 3. Iterations without momentum (conventional gradient descent)
Figure 3 shows the results after evolving the level set function by Eq. (12) until convergence without momentum, using conventional methods. We define convergence as |∇f |∞ < 0.03 (using the infinity/maximum norm), with ∇f given in Eq. (7). For this experiment we used parameters α = 0.7 and we reinitialized the level set function every fifth time unit. For comparison, Figure 4 shows the results after running our method using parameters α = 0.7, ω = 0.8, η = 10, smax = 100, Δt = 5. Plots of the energy functional for both experiments are shown in Figure 5. Here, we plot the weighted area term and the length penalty term separately, to illustrate the balance between the two. Note that the functional without momentum in Figure 5(a) is monotonically increasing, due to the nature of gradient descent, while the functional with momentum visits a number of local maxima during the search. To further exemplify the behaviour of our method, we created a slightly modified version of Figure 2(a), shown in Figure 6(a). In contrast to Figure 2(a), the shape in Figure 6(a) is disconnected, so the global optimum is expected to contain two separated regions. Not surprisingly, conventional gradient descent captures only a local minimum as displayed in Figure 7, while gradient descent with momentum succeeds in capturing the global solution as two separated
[Figure 4 panels: (a) time = 0; (b) time = 20; (c) time = 40; (d) time = 60; (e) time = 150; (f) time = 200; (g) time = 245; (h) time = 320; (i) time = 460]
Fig. 4. Iterations using momentum
[Figure 5 panels: (a) Without momentum; (b) With momentum. Horizontal axis: time. Curves: Energy functional, Length penalty integral, Target function integral]
Fig. 5. Plots of energy functionals for synthetic test image in Figure 2(a)
regions (Figure 8). For this experiment, we used the same parameters as in Figure 3 and Figure 4. As a third test image we used a 458 × 265 retinal image from the DRIVE database [19], shown in Figure 9(a). The target function f (x, y) is illustrated in Figure 9(b). As in the previous experiment, bright and dark colors indicate positive and negative values for f (x, y). The convergent result without momentum using parameters α = 0.07 and reinitialization every tenth time unit is shown in Figure 10, given the initial condition in Figure 10(a). Applying
[Figure 6 panels: (a) Input image; (b) Target function f(x, y)]
Fig. 6. Synthetic test image illustrating the presence of a local optimum in the solution space
[Figure 7 panels: (a) time = 0; (b) time = 200; (c) time = 515]
Fig. 7. Iterations without momentum (conventional gradient descent)
[Figure 8 panels: (a) time = 0; (b) time = 40; (c) time = 70; (d) time = 180; (e) time = 240; (f) time = 485]
Fig. 8. Iterations using momentum
the idea of momentum yields the result in Figure 11, using the parameters α = 0.07, ω = 0.5, η = 1.3, smax = 40, Δt = 10. The energy functionals are plotted in Figure 12 to display the convergence of both methods.
5 Results
The synthetic test image in Figure 2(a) illustrates a local optimum in the solution space when applying the parameters in our first experiment. As expected,
[Figure 9 panels: (a) Input image; (b) Target f(x, y)]
Fig. 9. Retinal image
[Figure 10 panels: (a) time = 0; (b) time = 20; (c) time = 40; (d) time = 100; (e) time = 200; (f) time = 400; (g) time = 600; (h) time = 1210]
Fig. 10. Iterations without momentum (conventional gradient descent)
[Figure 11 panels: (a) time = 0; (b) time = 20; (c) time = 40; (d) time = 100; (e) time = 200; (f) time = 400; (g) time = 600; (h) time = 820]
Fig. 11. Iterations using momentum
[Figure 12 panels: (a) Without momentum; (b) With momentum. Horizontal axis: time. Curves: Energy functional, Length penalty integral, Target function integral]
Fig. 12. Plots of energy functionals for the retinal image in Figure 9(a)
the conventional gradient descent approach converges to this local optimum as depicted in Figure 3. In contrast, our proposed method gains enough momentum in order to overstep the optimum, while at the same time the global solution is reached more rapidly. The process (illustrated in Figure 4) intuitively expands the curve beyond a local optimum, followed by a retraction if the search does not provide any increase in that direction. Using a slightly modified input image, our second example shows that our method is capable of capturing global optima, even when the solution consists of separated regions (Figure 8). Our third example illustrates our method on real data using a retinal image. In Figure 10 we see that conventional gradient descent fails to capture many weak signal blood vessels. This is a typical case of local optima solutions introduced by noise and poor image contrast. Under the same conditions, gradient descent with momentum captures practically all visible vessels as shown in Figure 11. Note that this example does not include any verification of the accuracy of the segmented vessels. The primary purpose is to illustrate that our method reaches a stronger optimum value for the energy functional, as shown in Figure 12.
6 Conclusions and Future Work
In this paper we have presented the idea of gradient descent with momentum in the context of segmentation using the level set method. We have illustrated the drawbacks of conventional gradient descent and shown examples of how the solution is improved by adding momentum. In contrast to much of the previous work, we have improved the solution by changing the method of solving the optimization problem rather than changing the parameters of the energy functional. In the future, we will further study the general optimization problem of image segmentation to propose more efficient solutions. Regarding the particular idea of momentum, we will apply it to real applications and verify the quality of the results.
References
1. Boykov, Y., Kolmogorov, V.: Computing geodesics and minimal surfaces via graph cuts. In: Proc. ICCV 2003, October 2003, vol. 1, pp. 26–33 (2003)
2. Papandreou, G., Maragos, P.: Multigrid geometric active contour models. IEEE Transactions on Image Processing 16(1), 229–240 (2007)
3. Kenigsberg, A., Kimmel, R., Yavneh, I.: A multigrid approach for fast geodesic active contours. Technical Report CIS-2004-06, Technion–Israel Inst. Technol., Haifa (2004)
4. Paragios, N., Mellina-Gottardo, O., Ramesh, V.: Gradient vector flow fast geometric active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(3), 402–407 (2004)
5. Goldenberg, R., Kimmel, R., Rivlin, E., Rudzsky, M.: Fast geodesic active contours. IEEE Transactions on Image Processing 10(10), 1467–1475 (2001)
6. Weickert, J., Kühne, G.: Fast methods for implicit active contour models. In: Geometric Level Set Methods in Imaging, Vision and Graphics. Springer, Heidelberg (2003)
7. Charpiat, G., Keriven, R., Pons, J.P., Faugeras, O.: Designing spatially coherent minimizing flows for variational problems based on active contours. In: Proc. ICCV 2005, October 2005, vol. 2, pp. 1403–1408 (2005)
8. Sundaramoorthi, G., Yezzi, A., Mennucci, A.: Sobolev active contours. International Journal of Computer Vision 73(3), 345–366 (2007)
9. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation, pp. 318–362. MIT Press, Cambridge (1986)
10. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Heidelberg (2006)
11. Kimmel, R.: Fast edge integration. In: Geometric Level Set Methods in Imaging, Vision and Graphics. Springer, Heidelberg (2003)
12. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
13. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. Springer, New York (2003)
14. Peng, D., Merriman, B., Osher, S., Zhao, H.K., Kang, M.: A PDE-based fast local level set method. Journal of Computational Physics 155(2), 410–438 (1999)
15. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences 93, 1591–1595 (1996)
16. Zhao, H.K.: A fast sweeping method for eikonal equations. Mathematics of Computation 74, 603–627 (2005)
17. Läthén, G., Jonasson, J., Borga, M.: Phase based level set segmentation of blood vessels. In: Proc. ICPR 2008, Tampa, FL, USA, IAPR (December 2008)
18. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer Academic Publishers, Netherlands (1995)
19. Staal, J., Abramoff, M., Niemeijer, M., Viergever, M., van Ginneken, B.: Ridge based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 23(4), 501–509 (2004)
Optimization of Divergences within the Exponential Family for Image Segmentation
Francois Lecellier¹, Stephanie Jehan-Besson², Jalal Fadili¹, Gilles Aubert³, and Marinette Revenu¹
¹ Laboratoire GREYC, University of Caen, France
² Laboratoire LIMOS, University of Clermont-Ferrand, France
³ Laboratoire J.A. Dieudonné, University of Nice Sophia-Antipolis, France
Abstract. In this work, we propose novel results for the optimization of divergences within the framework of region-based active contours. We focus on parametric statistical models where the region descriptor is chosen as the probability density function (pdf) of an image feature (e.g. intensity) inside the region and the pdf belongs to the exponential family. The optimization of divergences appears as a flexible tool for segmentation with and without intensity prior. As far as segmentation without reference is concerned, we aim at maximizing the discrepancy between the pdf of the inside region and the pdf of the outside region. Moreover, since the optimization framework is performed within the exponential family, we can cope with difficult segmentation problems including various noise models (Gaussian, Rayleigh, Poisson, Bernoulli ...). We also experimentally show that the maximization of the KL divergence offers interesting properties compared to some other data terms (e.g. minimization of the anti-log-likelihood). Experimental results on medical images (brain MRI, contrast echocardiography) confirm the applicability of this general setting.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 137–149, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Introduction
We propose here to focus on the segmentation of homogeneous regions in noisy images using statistical region-based active contour models (RBAC). In RBAC, region-based terms can be advantageously combined with boundary-based ones [1, 2]. The evolution equation is generally deduced from a general criterion to minimize that includes both region integrals and boundary integrals. The combination of those two terms in the energy functional allows the use of photometric image properties, such as texture [3] and noise [4], as well as geometric properties such as the shape prior of the object to be segmented. In statistical region-based active contours, see [5] for a review, image features (e.g. intensity) are considered as random variables whose distribution may be parametric (e.g. Gaussian) or non parametric [6]. Classically, the authors consider the minimization of the anti-log-likelihood for segmentation [7, 8, 4]. In this paper, we rather focus on the optimization of distances between pdfs. Such distances, or more generally divergences, can be used in two different ways. On the one hand, they can be used for segmentation with a distribution intensity prior and, in this case, we aim at minimizing the distance between the pdf of the evolving region and a reference one. On the other hand, they can be used for segmentation without reference and, in this second case, we aim
at maximizing the distance between the pdf of the inside region and the pdf of the outside region. In the literature, the minimization of divergences between non parametric pdfs was first proposed in [6] for video sequences. It has then been developed for cardiac structures tracking in perfusion MRI (p-MRI) sequences in [9]. As far as segmentation using the maximization of divergences is concerned, some authors [10] have also proposed to exploit the maximization of the Bhattacharyya distance between non parametric pdfs for segmentation. On the other hand, divergences between Gaussian distributions have been developed for DTI segmentation in [11]. In this paper, we propose to set a general framework for the optimization of divergences between parametric pdfs within the exponential family. To the best of our knowledge, such a framework has never been studied for region-based active contour segmentation. The rationale behind using the exponential family is that it includes, among others, Gaussian, Rayleigh, Poisson and Bernoulli distributions that have proven to be useful to model the noise structure [4] in many real image acquisition devices (e.g. Poisson for photon counting devices such as X-ray or CCD cameras, Rayleigh for ultrasound images, etc). Using shape derivative tools as in [12, 6], our effort focuses on constructing a general expression for the derivative of the energy (with respect to a domain), and on deriving the corresponding evolution speed. Our general framework is also specialized to some particular cases, such as the optimization of the Kullback-Leibler (KL) divergence [13], which gives a simple expression of the derivative. This theoretical framework is then more explicitly detailed and illustrated for the case of the segmentation without reference. In this case, we aim at maximizing the dissimilarity between the pdf of the intensity within the region inside the evolving contour and the pdf of the intensity within the region outside the contour. In other words, we perform a competition between the pdfs of these two regions through the maximization of divergences. Experimental results are given for the particular case of the KL divergence. We experimentally compare this data term to the classical minimization of the anti-log-likelihood [7, 14] for the segmentation of the white matter in brain MRI, and we show that KL maximization is able to extract a single Gaussian from a mixture of Gaussians. We also show the applicability of our data term for the segmentation of the left ventricle in contrast echocardiography, where the noise is modelled using a Rayleigh distribution.
In this paper, we first present our general setting and introduce shape gradients in section 2. In section 3, we give some general results for the exponential family and then for the shape derivative of divergences between pdfs. These results are then specialized for the KL divergence using the Maximum Likelihood Estimation (MLE) for the parameters. Experimental results for the maximization of the KL divergence are given in section 4.
2 Optimization of Divergences between Pdfs: General Setting
In this section, we present our general setting for segmentation through the optimization of distances between pdfs or, more generally, divergences.
2.1 General Setting
Consider a function y : Rⁿ → χ ⊂ R which describes the feature of interest. The term y(x) then represents the value of the feature y at location x, where x ∈ Rⁿ. Let q(y, Ω)
be the probability density function (pdf) of the feature y within the image region of interest. We now assume that we have a function Ψ : R⁺ × R⁺ → R⁺ which allows us to compare two pdfs. This function is small if the pdfs are similar and large otherwise. It allows us to introduce the following functional, which represents the distance, or more generally the divergence, between the current pdf estimate q(y, Ω) and another one p(y) which may also depend on another domain:
$$D(\Omega) = \int_{\chi} \Psi(q(y, \Omega), p(y))\,dy. \tag{1}$$
The distance can be, for example, the symmetrized Kullback-Leibler divergence when
$$\Psi(q, p) = \frac{1}{2}\left(p(y)\log\frac{p(y)}{q(y, \Omega)} + q(y, \Omega)\log\frac{q(y, \Omega)}{p(y)}\right).$$
Such divergences represent a general setting for both segmentation with and without reference. Indeed, in segmentation problems, we generally search for regions that are homogeneous with respect to a given feature. We may then model the segmentation problem as the maximization of the distance between the pdf of the feature within the inside region and the pdf of the feature within the outside region. To fix ideas, let us consider a partition of an image into two regions, where Ω is the inside region and Ωᶜ the complementary outside region. The segmentation may then be formulated as the maximization of the following criterion:
$$D(\Omega, \Omega^c) = \int_{\chi} \Psi(q(y, \Omega), p(y, \Omega^c))\,dy. \tag{2}$$
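To make the criterion of Eq. (1) concrete, the sketch below evaluates it numerically for the symmetrized Kullback-Leibler integrand given above. The setting is hypothetical: two Gaussian pdfs with unit variance stand in for q(y, Ω) and p(y), the feature axis is truncated to [−10, 11), and the integral is approximated by a Riemann sum.

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def symmetrized_kl(q, p, dy):
    """Riemann-sum approximation of Eq. (1) with the symmetrized KL integrand Psi."""
    psi = 0.5 * (p * np.log(p / q) + q * np.log(q / p))
    return float(np.sum(psi) * dy)

dy = 0.01
y = np.arange(-10.0, 11.0, dy)           # truncated, discretized feature axis
q = gaussian_pdf(y, mu=0.0, sigma=1.0)   # stands in for q(y, Omega)
p = gaussian_pdf(y, mu=1.0, sigma=1.0)   # stands in for p(y)
d = symmetrized_kl(q, p, dy)
```

For two Gaussians with equal variance σ² and means μ₁ and μ₂, this symmetrized divergence has the closed form (μ₁ − μ₂)²/(2σ²), which offers a simple check of the numerical value; the functional is zero when both pdfs coincide and is unchanged when q and p are swapped.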
On the other hand, we can also consider that we have a reference histogram $p_{ref}$ and that we search for the domain that minimizes the divergence between $q$ and $p_{ref}$. This last framework may be applied to tracking or to supervised segmentation where a reference pdf is learned on the region of interest. The theoretical results given in this paper can be used for both applications.
2.2 Shape Gradient Descent
In order to find an optimum, we perform a shape gradient descent using region-based active contours. We then have to compute the derivative of the criterion with respect to the domain using shape derivation tools [15]. Shape derivative tools applied to region-based active contours are described in [12, 6], and we do not recall all the definitions in this paper. Let us just recall that, from the shape derivative, we can derive the evolution equation that will drive the active contour towards a (local) minimum of the criterion. Let us suppose that the shape derivative of the criterion D(Ω) in the direction V may be written as follows:
$$\langle D'(\Omega), \mathbf{V} \rangle = -\int_{\partial\Omega} \mathrm{speed}(x, \Omega)\,(\mathbf{V}(x)\cdot\mathbf{N}(x))\,da(x), \tag{3}$$
where N is the unit inward normal to ∂Ω and da its area element. When minimizing the distance D(Ω), interpreting equation (3) as the L² inner product on the space of velocities, the straightforward choice is to take V = speed(x, Ω)N. We can then deduce the following evolution equation:
$$\frac{\partial\Gamma}{\partial\tau} = \mathrm{speed}(x, \Omega)\,\mathbf{N}(x), \tag{4}$$
On the contrary, when maximizing the criterion, we take the opposite sign for the velocity.
3 General Results for Shape Derivative of Divergences within the Exponential Family
In this paper, we consider that pdfs belong to the exponential family. In this case, the current pdf estimate q(y, Ω) is now indexed by a set of parameters θ ∈ Θ ⊂ R^κ (e.g. we have κ = 2 and θ = (μ, σ)ᵀ where μ is the mean and σ the variance for the Normal family). When using the exponential family, we rather index the pdf by η, which is the natural parameter as explained below. In order to differentiate the criterion, we must take into account the dependence of the natural parameter on the domain. We then restrict our study to the full rank κ-parameter canonical exponential family [16]. For this family, we can establish a 1-1 correspondence between η and Ω and so compute directly the shape derivative of D(Ω). In the sequel, let us first introduce the exponential family and some properties and then explain the computation of the shape derivative. We then specialize our result when parameters are estimated using the Maximum Likelihood Estimation (MLE) method. We also give some results for the optimization of the Kullback-Leibler (KL) divergence. In this case, the shape derivative reduces to a very simple general expression.
3.1 The Exponential Family: Definition and Properties
The multi-parameter exponential family [17] is naturally indexed by a κ-dimensional real parameter vector η and a κ-dimensional natural statistic vector T(Y). We draw the reader’s attention to the fact that η is a function of θ ∈ Θ, which is the parameter of interest in most applications (for the Gaussian distribution, we have θ = (μ, σ)ᵀ).
Definition 1. The family of distributions of a random variable (RV) Y, {q_θ : θ ∈ Θ ⊆ R^κ}, is said to be a κ-parameter canonical exponential family if there exist real-valued functions:
• $\eta(\theta) = [\eta_1, \ldots, \eta_\kappa]^T$ with $\eta_i : \Theta \subseteq \mathbb{R}^\kappa \to \mathbb{R}$
• $h : \mathbb{R} \to \mathbb{R}$
• $B : \Theta \to \mathbb{R}$
• $T = [T_1, \ldots, T_\kappa]^T$ with $T_i : \mathbb{R} \to \mathbb{R}$
such that the pdf $q_\theta(y)$ may be written as:
$$q_\theta(y) = h(y)\exp[\langle \eta(\theta), T(y) \rangle - B(\theta)] \quad \text{with} \quad y \in \chi \subset \mathbb{R}. \tag{5}$$
The term T is called the natural sufficient statistic and η the natural parameter vector. The term ⟨η, T⟩ denotes the scalar product. Letting the model be indexed by the natural parameter η rather than θ, the canonical κ-parameter exponential family generated by T and h is defined as follows:
$$q_\eta(y) = h(y)\exp[\langle \eta(\theta), T(y) \rangle - A(\eta)], \tag{6}$$
with $A(\eta) = \log \int_{-\infty}^{+\infty} h(y)\exp[\langle \eta(\theta), T(y) \rangle]\,dy$. The natural parameter space is defined as $\mathcal{E} = \{\eta \in \mathbb{R}^\kappa ;\ -\infty < A(\eta) < +\infty\}$.
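Some Common Distributions. Table 1 provides a synthetic description of some common distributions of the exponential family.
Table 1. Some common canonical exponential families. B(α, β) is the Euler Beta function.
| Distribution | $\theta^T$ | $\eta(\theta)^T$ | $T(y)^T$ | $A(\eta)$ |
| Normal | $(\mu, \sigma^2)$ | $\left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$ | $(y, y^2)$ | $\frac{1}{2}\left(-\frac{\eta_1^2}{2\eta_2} - \log\frac{-\eta_2}{\pi}\right)$ |
| Gamma | $(\lambda, p)$ | $(-\lambda, p - 1)$ | $(y, \log y)$ | $-(\eta_2 + 1)\log(-\eta_1) + \log\Gamma(\eta_2 + 1)$ |
| Beta | $(r, s)$ | $(r - 1, s - 1)$ | $(\log y, \log(1 - y))$ | $\log B(\eta_1 + 1, \eta_2 + 1)$ |
| Poisson | $\mu$ | $\log\mu$ | $y$ | $e^{\eta}$ |
| Exponential | $\lambda$ | $-\lambda$ | $y$ | $-\log(-\eta)$ |
| Rayleigh | $\theta^2$ | $-\frac{1}{2\theta^2}$ | $y^2$ | $-\log(-2\eta)$ |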
Properties. The following results will be useful for our RBAC scheme based on the exponential family. Their proofs may be found in [16]. These properties give us a relation between the parameters η and the domain Ω through the use of the expectation of the natural statistics T(Y). The first theorem provides general relations between the gradient of A and the expectation of T(Y), while the second theorem allows us to establish a 1-1 correspondence between η and E[T(Y)] (for the full rank exponential family). Such a relation may then be used to express the parameter η and differentiate it with respect to the domain.
Theorem 1. Let $\{q_\eta : \eta \in \mathcal{E}\}$ be a κ-parameter canonical exponential family with natural sufficient statistic T(Y) and open natural parameter space $\mathcal{E}$. We then have the following properties:
1. $\mathcal{E}$ is convex.
2. $A : \mathcal{E} \to S \subseteq \mathbb{R}$ is convex.
3. $E[T(Y)] = \nabla A(\eta)$.
4. $\mathrm{Cov}[T(Y)] = \ddot{A}(\eta)$.
where $\nabla A = \left(\frac{\partial A}{\partial\eta_1}, \frac{\partial A}{\partial\eta_2}, \ldots, \frac{\partial A}{\partial\eta_\kappa}\right)^T$ represents the gradient of A, and $\ddot{A}$ is the Hessian matrix of A with $\ddot{A}_{ij} = \frac{\partial^2 A}{\partial\eta_i\,\partial\eta_j}$.
The following theorem establishes the conditions of strict convexity of A, and then those for ∇A to be 1-1 on $\mathcal{E}$. This is a very useful result for optimization (derivation) purposes:
Theorem 2. Let $\{q_\eta : \eta \in \mathcal{E}\}$ be a full rank (i.e. Cov[T(Y)] is a positive-definite matrix) κ-parameter canonical exponential family with natural sufficient statistic T(Y) and open natural parameter space $\mathcal{E}$. We have [16]:
1. $\eta \mapsto \nabla A(\eta)$ is 1-1 on $\mathcal{E}$.
2. The family may be uniquely parameterized by $\mu(\eta) \equiv E[T(Y)] = \nabla A(\eta)$.
3. The anti-log-likelihood function is a strictly convex function of η on $\mathcal{E}$.
These results establish a 1-1 correspondence between η and E[T(Y)] such that:
$$\mu = \nabla A(\eta) = E[T(Y)] \iff \eta = \varphi(E[T(Y)]), \quad \eta \in \mathcal{E}, \tag{7}$$
holds uniquely with ∇A and φ continuous.
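Properties 3 and 4 of Theorem 1 can be checked numerically on the Poisson row of Table 1, where η = log μ, T(y) = y and A(η) = e^η. The sketch below is only an illustration: the value μ = 3, the finite-difference step and the truncation of the pmf sum are arbitrary choices, and the helper names are hypothetical.

```python
import math

def A(eta):
    """Log-partition function of the Poisson family (Table 1): A(eta) = exp(eta)."""
    return math.exp(eta)

def poisson_moments(mu, k_max=200):
    """Mean and variance of T(Y) = Y, summed directly from the Poisson pmf."""
    pmf = [math.exp(k * math.log(mu) - mu - math.lgamma(k + 1)) for k in range(k_max)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

mu = 3.0
eta = math.log(mu)  # natural parameter of the Poisson family
h = 1e-4            # finite-difference step (arbitrary choice)
dA = (A(eta + h) - A(eta - h)) / (2 * h)             # approximates grad A(eta)
d2A = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2  # approximates the Hessian of A
mean, var = poisson_moments(mu)
```

Here `dA` agrees with the mean of T(Y) and `d2A` with its variance, which is the scalar case of E[T(Y)] = ∇A(η) and Cov[T(Y)] = Ä(η). Since A'(η) = e^η is strictly increasing, the correspondence between η and E[T(Y)] is 1-1, as stated in Theorem 2.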
Estimation of the Hyperparameters. Relation (7) allows us to express the parameter η as a function of E[T(Y)]. In order to estimate the parameters, we replace E[T(Y)] by the empirical estimate of the mean $\overline{T(Y)}$. This corresponds to the MLE of the parameter. Indeed, the MLE of η corresponds to minimizing the anti-log-likelihood score (for independent and identically distributed (iid) data). By differentiation of the anti-log-likelihood with respect to η, we find $\nabla A(\eta_{MLE}) = \overline{T(Y)}$. Note however that in this case, this is the discrete sample mean. The following example illustrates this statement:
Example 1. When dealing with the Rayleigh distribution, we have $\eta = -\frac{1}{2\theta^2}$, $A(\eta) = -\log(-2\eta)$ and $T(y) = y^2$. By computing $A'(\eta) = \overline{T(Y)}$, we find that $-\frac{1}{\eta} = \frac{1}{|\Omega|}\int_{\Omega} y(x)^2\,dx$, which corresponds to the MLE of the parameter θ² given by $\theta^2_{ML} = \frac{1}{2|\Omega|}\int_{\Omega} y(x)^2\,dx$.
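Example 1 can be reproduced on synthetic data. In the sketch below, the true parameter, the sample size and the random seed are hypothetical, and a set of iid samples stands in for the feature values y(x) over the region Ω, so that the integral divided by |Ω| becomes a sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                                  # hypothetical true Rayleigh parameter
y = rng.rayleigh(scale=theta, size=200_000)  # stands in for the samples y(x), x in Omega

t_bar = np.mean(y ** 2)              # empirical mean of T(y) = y^2
eta_hat = -1.0 / t_bar               # solves A'(eta) = -1/eta = t_bar
theta2_hat = -1.0 / (2.0 * eta_hat)  # inverts eta = -1/(2 theta^2)
```

The value of `theta2_hat` coincides with the closed-form estimate `np.sum(y**2) / (2 * y.size)` of Example 1 and lies close to `theta**2` for a sample of this size.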
3.2 Shape Derivative of the Criterion
In this section, we propose to differentiate the functional (1) with respect to the domain. The dependence of the functional on the domain is due to the estimation of the parameter η detailed above. In the sequel, for the sake of simplicity, we will invariably write η for the natural parameter and for its finite sample estimate over the domain (with a slight abuse of notation; the latter should be $\hat{\eta}$). We are now ready to state our main result:

Theorem 3. The Gâteaux derivative, in the direction of V, of the functional (1), is:
$$\langle D'(\Omega), V \rangle = \langle \nabla_V \eta, C \rangle, \qquad (8)$$
where $\nabla_V \eta = [\langle \nabla\eta_1(\Omega), V\rangle, \ldots, \langle \nabla\eta_\kappa(\Omega), V\rangle]$ is the Gâteaux derivative of η in the direction of V, $\langle \cdot, \cdot \rangle$ is the usual scalar product of two vectors and: $C = E[\partial_1\Psi(q(Y, \eta(\Omega)), p(Y))\,(T(Y) - E[T(Y)])]$. The term $\partial_1\Psi$ denotes the partial derivative of Ψ with respect to its first variable. The proof is detailed in Appendix A.2. We then have to compute the shape derivative $\nabla_V \eta$. Such a computation requires an estimation of the expectation E[T(Y)], as explained in the next section.

3.3 Computing the Shape Derivative for the MLE Estimator
As mentioned in section (3.1.3), the expectation E[T(Y)] can be replaced with the empirical estimate of the mean $\overline{T(Y)}$, which is computed over the considered domain Ω. Using such an estimation for the hyperparameter, we can state the following proposition:

Lemma 1. Within the full rank exponential family, and using the MLE estimator for the hyperparameters, the shape derivative $\nabla_V \eta$ can be expressed as:
$$\nabla_V \eta = \ddot{A}(\eta)^{-1}\, \nabla_V(\overline{T}). \qquad (9)$$
where $\ddot{A}(\eta)^{-1} = I(\eta)^{-1}$ is the inverse of the Hessian matrix of A, which is also the Fisher information matrix I. The derivative $\nabla_V(\overline{T})$ is given by:
$$\nabla_V(\overline{T}) = \frac{1}{|\Omega|} \int_{\partial\Omega} \left(\overline{T}(y) - T(y(a))\right) (V \cdot N)\, da(x). \qquad (10)$$
The proof is given in Appendix A.3. We can then substitute the shape derivative of the natural parameters given in Lemma 1 into the general Theorem 3. The corollary that gives the shape derivative then follows:

Corollary 1. The Gâteaux derivative, in the direction of V, of the functional (1), is:
$$\langle D'(\Omega), V \rangle = \frac{1}{|\Omega|} \int_{\partial\Omega} \sum_{i=1}^{\kappa} C_i \sum_{j=1}^{\kappa} \left[\ddot{A}(\eta)^{-1}\right]_{ij} \left(\overline{T}_j(y) - T_j(y(a))\right) (V \cdot N)\, da,$$
where the κ components of the vector C are defined as follows: $C_i = E[\partial_1\Psi(q(Y, \eta(\Omega)), p(Y))\,(T_i(Y) - \overline{T_i(Y)})]$, $i \in [1, \kappa]$. The term $\partial_1\Psi$ denotes the partial derivative of Ψ with respect to its first variable. In order to fix ideas, the functional D(Ω) can be chosen as the Kullback-Leibler divergence; in this case $\partial_1\Psi(q, p) = \log q + 1 - \log p - \frac{p}{q}$. In order to compute the vector C in Corollary 1, we can assume that the pdf p belongs to the exponential family and to the same parametric law as the pdf q. Let us denote by $\eta_1$ the parameter of the pdf p. This parameter is supposed to be already computed, or to depend on another domain, and so does not depend on the domain Ω. We then state the following proposition:

Lemma 2. When $p(y, \eta_1)$ and $q(y, \eta(\Omega))$ are two members of the exponential family that belong to the same parametric law with respective parameters $\eta_1$ and η, and when the functional D(Ω) is chosen as the KL divergence, we find for the vector C defined in Theorem 3:
$$C = \ddot{A}(\eta)(\eta - \eta_1) + \nabla A(\eta) - \nabla A(\eta_1).$$
A proof is given in Appendix A.4. This expression demonstrates that the derivative can be computed very simply using the natural parameters and the sufficient statistics of the law. Let us give two examples of computation, for the Rayleigh and the Gaussian laws.

Example 2. When dealing with the Rayleigh distribution, following Example 1 with $\theta^2 = \frac{1}{2}\overline{y^2}$, the term C is equal to $C = 2\theta^2\left(\frac{\theta^2}{\theta_1^2} - \frac{\theta_1^2}{\theta^2}\right)$. We then find for the derivative of the KL divergence:
$$\langle KL'(\Omega), V \rangle = \frac{1}{|\Omega|} \int_{\partial\Omega} \frac{C}{2\theta^2}\left(1 - \frac{y(a)^2}{2\theta^2}\right)(V \cdot N)\, da(x). \qquad (11)$$

Example 3. When dealing with the Gaussian distribution, the term C is equal to
$$C = \begin{pmatrix} \left(\frac{\sigma^2}{\sigma_1^2} + 1\right)(\mu - \mu_1) \\ \mu^2 + \sigma^2 - \mu_1^2 - \sigma_1^2 + 2\frac{\sigma^2}{\sigma_1^2}\left(\mu^2 - \mu\mu_1\right) + \sigma^4\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma^2}\right) \end{pmatrix}. \qquad (12)$$
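We then find for the derivative of the KL divergence:
$$\langle KL'(\Omega), V \rangle = \frac{1}{\sigma^2 |\Omega|} \int_{\partial\Omega} \left[ -(y - \mu)\left(C_1\left(1 + \frac{2\mu^2}{\sigma^2}\right) - C_2\frac{\mu}{\sigma^2}\right) + (y^2 - \sigma^2 - \mu^2)\left(C_1\frac{\mu}{\sigma^2} - \frac{C_2}{2\sigma^2}\right) \right] (V \cdot N)\, da(x).$$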
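The closed form (12) can be checked numerically against the general expression of Lemma 2. The sketch below does this for the Gaussian law, assuming the usual natural parameterization η = (μ/σ², −1/(2σ²)) with T(y) = (y, y²), so that ∇A(η) = (μ, μ² + σ²) and the Hessian of A is the covariance of T(Y). The function names are hypothetical and the parameterization is an assumption of this sketch, since the section does not spell it out.

```python
def gaussian_C(mu, var, mu1, var1):
    """C = Hess A(eta) (eta - eta1) + grad A(eta) - grad A(eta1) for two Gaussians."""
    eta = (mu / var, -0.5 / var)
    eta1 = (mu1 / var1, -0.5 / var1)
    # grad A(eta) = E[T(Y)] with T(y) = (y, y^2).
    grad_a = (mu, mu * mu + var)
    grad_a1 = (mu1, mu1 * mu1 + var1)
    # Hessian of A = Cov[T(Y)] for a Gaussian.
    hess = ((var, 2.0 * mu * var),
            (2.0 * mu * var, 4.0 * mu * mu * var + 2.0 * var * var))
    d = (eta[0] - eta1[0], eta[1] - eta1[1])
    return tuple(hess[i][0] * d[0] + hess[i][1] * d[1] + grad_a[i] - grad_a1[i]
                 for i in range(2))


def gaussian_C_closed_form(mu, var, mu1, var1):
    """The two components of equation (12), with var = sigma^2 and var1 = sigma_1^2."""
    ratio = var / var1
    c1 = (ratio + 1.0) * (mu - mu1)
    c2 = (mu * mu + var - mu1 * mu1 - var1
          + 2.0 * ratio * (mu * mu - mu * mu1)
          + var * var * (1.0 / var1 - 1.0 / var))
    return c1, c2


c_general = gaussian_C(1.5, 2.0, 0.5, 3.0)
c_closed = gaussian_C_closed_form(1.5, 2.0, 0.5, 3.0)
```

Both routes give the same vector for these toy values, and C vanishes when the two laws coincide, as expected for a divergence at its minimum.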
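4 Maximisation of Divergences
In this section, we propose to concentrate on the segmentation of an image into two regions (namely Ω and its complement $\Omega^c$) by maximizing the criterion (2).

4.1 Evolution Equation
When using the MLE estimator for the parameters, and noting that Ω and $\Omega^c$ share the same boundary with opposite normals, we take $\overline{T}(y) = \frac{1}{|\Omega|}\int_\Omega T(y(x))\,dx$ and $\overline{T}(y)^c = \frac{1}{|\Omega^c|}\int_{\Omega^c} T(y(x))\,dx$. Using Corollary 1 and the fact that $\langle D'(\Omega, \Omega^c), V\rangle = \langle \nabla_V \eta, C\rangle + \langle \nabla_V \eta^c, C^c\rangle$, we find for the evolution equation:
$$\frac{\partial \Gamma}{\partial \tau} = \left[ \frac{1}{|\Omega|} \sum_{i=1}^{\kappa}\sum_{j=1}^{\kappa} C_i(\Omega)\left[\ddot{A}(\eta)^{-1}\right]_{ij}\left(\overline{T}_j(y) - T_j(y(x))\right) - \frac{1}{|\Omega^c|} \sum_{i=1}^{\kappa}\sum_{j=1}^{\kappa} C_i(\Omega^c)\left[\ddot{A}(\eta^c)^{-1}\right]_{ij}\left(\overline{T}_j(y)^c - T_j(y(x))\right) \right] N.$$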
For the KL divergence, the term C is evaluated as explained in section 3.3. A classical regularization term λκ is added, where λ is a positive constant and κ the curvature. As far as the numerical implementation is concerned, we use the level set method first proposed by Osher and Sethian [18].

4.2 Comparison with Other Methods in the Gaussian Case
In this section, we propose to compare the behavior of our data term, based on the maximization of the symmetrized Kullback-Leibler divergence between parametric pdfs, to two other well-known region-based methods [7, 14]. The first method is the famous Chan & Vese method [14]. Such a criterion implies a Gaussian distribution for the feature y with a fixed variance. The corresponding evolution equation can be found in [14]. The second method was first proposed in [7] and aims at minimizing the anti-log-likelihood for a Gaussian distribution. The evolution equation can be found in [7]. In order to compare these terms, let us express the non-symmetrized KL divergence using the expectation under the pdf q, denoted by $E_q$, as follows:
$$D(q \| p) = E_q[\log(q(Y, \eta_\Omega))] - E_q[\log(p(Y, \eta_{\Omega^c}))] \qquad (13)$$
To get the gist of using KLD as a criterion in an RBAC functional, consider the data $y_i = \{y(x) \mid x \in \Omega\}$ as an iid sequence from the statistical model $q(y, \eta_\Omega)$. Using the weak law of large numbers for a very large domain Ω, the first term (which corresponds to the entropy) can then be expressed as $\frac{1}{|\Omega|}\int_\Omega \log(q(y(x), \eta(\Omega)))\,dx$. Maximizing the first term in the KL divergence can then be seen as equivalent to minimizing the anti-log-likelihood score [19] divided by the size of the sample (which corresponds to the entropy under the law of large numbers). Using the same assumptions, the second term of the KL divergence can be seen as the minimization of the plausibility of the data provided by $\Omega^c$ in the inside region Ω. When using the symmetrized version, we act both on Ω and $\Omega^c$. Let us now compare experimentally the behavior of these criteria for the extraction of a homogeneous region corrupted by Gaussian noise in an image. We propose to take the example of the segmentation of the White Matter (WM) in T1-weighted brain MRI images. We run the three evolution equations using the Gaussian assumption for the pdf of the feature y within each region. The feature y is chosen as the intensity of the image. The initial contour is given in Figure 1.(a) and we also show the two initial pdfs (b), namely $q_\eta(I, \Omega)$, which corresponds to the distribution of the intensity I inside the region Ω, and $q_{\eta^c}(I, \Omega^c)$, which corresponds to the distribution of I inside the region $\Omega^c$ (i.e. outside the region Ω).

Fig. 1. T1-weighted brain MRI segmentation results (extraction of the White Matter). The pdf of the intensity inside the contour (hist_in) is in solid line, the pdf of the intensity outside the contour (hist_out) is in dotted lines; the horizontal axis of each plot is the intensity (0 to 200). (a): initial contour and (b): associated pdfs, column (c): final contour and pdfs for the Chan & Vese method [14], column (d): for the log-likelihood method [7], column (e): for the maximization of the KL divergence.

In Figure 1, we can observe the final active contour obtained using our criterion (2) and the two other criteria mentioned above. We can remark that our criterion acts as an extractor of the most important Gaussian in the initial mixture of Gaussians (see Figure 1.e). The two other criteria separate the mixture without extracting a single Gaussian. So, with our method, we can directly obtain the White Matter of the brain without a multiphase scheme.

4.3 Examples of Applications
In this part, we consider two examples of application (brain MRI images and contrast echocardiography) using two different noise models (Gaussian and Rayleigh). Concerning 3D T1-weighted MRI images of the brain, the noise model is assumed to be represented by a Rician distribution [20]. For large signal intensities the noise distribution can be considered as a Gaussian distribution (this is the case for the White Matter (WM) or the Gray Matter (GM)). We propose in Figure 2 an example of WM segmentation by maximizing the KL divergence between Gaussian distributions. When quantitatively evaluating our results of WM segmentation on the simulated brain T1-weighted MRI images provided by the Montreal Neurological Institute (BrainWeb), we find a Dice coefficient of 0.91, a very low False Positive Fraction (FPF) of 0.8% and a True Positive Fraction (TPF) of 84%.
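The overlap measures quoted above can be computed from a binary segmentation and a binary reference mask. The sketch below is only illustrative: the paper does not define the fractions, so it assumes Dice = 2TP/(2TP + FP + FN), TPF = TP/(TP + FN) and FPF = FP/(FP + TN); other normalizations of the FPF exist in the literature. The function name `overlap_scores` and the toy masks are hypothetical.

```python
def overlap_scores(segmentation, reference):
    """Return (dice, tpf, fpf) for two flat binary masks of equal length."""
    tp = fp = fn = tn = 0
    for s, r in zip(segmentation, reference):
        if s and r:
            tp += 1          # voxel correctly labeled as WM
        elif s and not r:
            fp += 1          # voxel wrongly labeled as WM
        elif r:
            fn += 1          # WM voxel that was missed
        else:
            tn += 1          # background voxel correctly left out
    dice = 2.0 * tp / (2.0 * tp + fp + fn)
    tpf = tp / (tp + fn)     # assumed definition: sensitivity
    fpf = fp / (fp + tn)     # assumed definition: false positive rate
    return dice, tpf, fpf


# Toy masks; in practice these would be flattened 3D volumes.
seg = [1, 1, 1, 0, 0, 0, 0, 0]
ref = [1, 1, 0, 1, 0, 0, 0, 0]
dice, tpf, fpf = overlap_scores(seg, ref)
```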
Fig. 2. 3D Segmentation of WM in a T1 brain MRI using KL maximization. Panels: (a) 3D rendering of the WM, (b) slice 72, (c) slice 75, (d) slice 84.
As the Rayleigh distribution is well suited to model noise in echography [20], this noise model was applied to the segmentation of the left ventricle in contrast echocardiography. Final contours for several images of the sequence are shown in Figure 3. The segmentation is accurate throughout the sequence. Note that experimental results reported in [21, 4] show that, when using the appropriate noise model, segmentation results are more accurate and less sensitive to the choice of the regularization parameters.
Fig. 3. Segmentation of the LV in a contrast echocardiographic sequence. Panels: frame 1, frame 31, frame 40.
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1, 321–332 (1988)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
3. Aujol, J.F., Aubert, G., Blanc-Féraud, L.: Wavelet-based level set evolution for classification of textured images. IEEE Transactions on Image Processing 12(12), 1634–1641 (2003)
4. Martin, P., Réfrégier, P., Goudail, F., Guérault, F.: Influence of the noise model on level set active contour segmentation. IEEE PAMI 26, 799–803 (2004)
5. Cremers, D., Rousson, M., Deriche, R.: A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape. International Journal of Computer Vision 72(2), 195–215 (2007)
6. Aubert, G., Barlaud, M., Faugeras, O., Jehan-Besson, S.: Image segmentation using active contours: Calculus of variations or shape gradients? SIAM Applied Mathematics 63(6), 2128–2154 (2003)
7. Zhu, S., Yuille, A.: Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE PAMI 18, 884–900 (1996)
8. Paragios, N., Deriche, R.: Geodesic active regions: A new paradigm to deal with frame partition problems in computer vision. JVCIR 13, 249–268 (2002)
9. Rougon, N., Discher, A., Prêteux, F.: Region-based statistical segmentation using informational active contours. In: SPIE Conf. on Mathematics of Data/Image Pattern Recognition, San Diego, CA (August 2006)
10. Michailovich, O., Rathi, Y., Tannenbaum, A.: Image segmentation using active contours driven by the Bhattacharyya gradient flow. IEEE Transactions on Image Processing 16, 2787–2801 (2007)
11. Wang, Z., Vemuri, B.: DTI segmentation using an information theoretic tensor dissimilarity measure. IEEE Transactions on Medical Imaging 24(10), 1267–1277 (2005)
12. Jehan-Besson, S., Barlaud, M., Aubert, G.: DREAM²S: Deformable regions driven by an Eulerian accurate minimization method for image and video segmentation. International Journal of Computer Vision 53, 45–70 (2003)
13. Kullback, S.: Information Theory and Statistics. Wiley, New York (1959)
14. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Transactions on Image Processing 10, 266–277 (2001)
15. Delfour, M., Zolésio, J.: Shape and geometries. Advances in Design and Control. SIAM, Philadelphia (2001)
16. Bickel, P., Doksum, K.: Mathematical statistics: basic ideas and selected topics, 2nd edn., vol. I. Prentice-Hall, London (2001)
17. Koopman, P.: On distributions admitting a sufficient statistic. Trans. Am. Math. Soc. 39, 399–409 (1936)
18. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
19. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
20. Goodman, J.: Some fundamental properties of speckle. J. of Optical Society of America 66, 1145–1150 (1976)
21. Lecellier, F., Jehan-Besson, S., Fadili, J., Aubert, G., Revenu, M.: Statistical region-based active contours with exponential family observations. In: ICASSP, vol. 2, pp. 113–116 (2006)
A Appendix

A.1 Shape Derivation Tools
Let us recall this useful theorem [15], which will be used in the following proofs.

Theorem 4. The Gâteaux derivative of the functional $J(\Omega) = \int_\Omega f(x, \Omega)\,dx$ in the direction of V is:
$$\langle J'(\Omega), V \rangle = \int_\Omega f_s(x, \Omega, V)\,dx - \int_{\partial\Omega} f(x, \Omega)(V \cdot N)\,da(x),$$
where N is the unit inward normal to ∂Ω, da its area element and $f_s$ the shape derivative of f [15].

A.2 Proof of Theorem 3
To compute $\langle D'(\Omega), V\rangle$, we must first get the derivative of q(y(x), η) with respect to the domain, and apply the chain rule to Ψ(q(y(x), η), p(y)). To simplify the notation we write the Eulerian derivative of η as $\langle \eta'(\Omega), V\rangle = \nabla_V \eta = [\langle \eta_1'(\Omega), V\rangle, \ldots, \langle \eta_\kappa'(\Omega), V\rangle]^T$. Using the definition of q(y, η) given in (6) and the chain rule applied to A(η(Ω)), we obtain:
$$\langle q'(y, \eta), V\rangle = h(y)\left(\langle \nabla_V \eta, T(y)\rangle - \langle \nabla_V \eta, \nabla A(\eta)\rangle\right) e^{\langle \eta(\Omega), T(y)\rangle - A(\eta(\Omega))} = q(y, \eta)\,\langle \nabla_V \eta, T(y) - \nabla A(\eta)\rangle. \qquad (14)$$
By the chain rule applied to Ψ(q(y(x), η), p(y)), we get $\langle \Psi'(q(y, \eta), p(y)), V\rangle = \langle q'(y, \eta), V\rangle\,\partial_1\Psi(q, p)$, which gives
$$\langle D'(\Omega), V\rangle = \int_\chi q(y, \eta)\,\partial_1\Psi(q, p)\,\langle \nabla_V \eta, T(y) - \nabla A(\eta)\rangle\,dy.$$
We introduce
$$C = \int_\chi q(y, \eta)\,\partial_1\Psi(q, p)\,(T(y) - \nabla A(\eta))\,dy = E[\partial_1\Psi(q, p)\,(T(Y) - E[T(Y)])]$$
which completes the proof.

A.3 Proof of Lemma 1
When using the MLE, the term E[T(Y)] can be empirically estimated with $\overline{T(Y)}$ and so differentiated easily with respect to the domain Ω. We propose to directly differentiate the expression $\nabla A(\eta) = \overline{T(Y)}$, which gives:
$$\sum_{j=1}^{\kappa} \langle \eta_j', V\rangle\, \frac{\partial^2 A}{\partial \eta_i \partial \eta_j}(\eta) = \langle \overline{T_i(Y)}', V\rangle \qquad \forall i \in [1, \kappa], \qquad (15)$$
which can be written in the compact form $\nabla_V(\overline{T}) = \ddot{A}(\eta)\,\nabla_V \eta$. Restricting our study to the full rank exponential family, where $\ddot{A}(\eta)$ is a symmetric positive-definite, hence invertible, matrix (Theorem 2), the domain derivative of the parameters η is uniquely determined by $\ddot{A}(\eta)^{-1}\,\nabla_V(\overline{T}) = \nabla_V \eta$, where $\nabla_V(\overline{T})$ is given by $\nabla_V(\overline{T}) = \frac{1}{|\Omega|}\int_{\partial\Omega}\left(\overline{T}(y) - T(y(a))\right)(V \cdot N)\,da(x)$ (using Theorem 4), and the lemma follows.
A.4 Proof of Lemma 2
Since p and q belong to the same parametric law, they share the same h(y), T(y) and A, and then $\log(q) - \log(p) = \langle \eta - \eta_1, T(y)\rangle - A(\eta) + A(\eta_1)$. The i-th component of C is then $C_i = s_1 - s_2$, with:
$$s_1 = E\left[\left(\langle \eta - \eta_1, T(Y)\rangle - A(\eta) + A(\eta_1) + 1\right)\left(T_i(Y) - E[T_i(Y)]\right)\right],$$
$$s_2 = E\left[\frac{p}{q}\left(T_i(Y) - E[T_i(Y)]\right)\right].$$
Developing the expression of the expectation of the second term, we find $s_2 = E_p[T_i(Y) - E[T_i(Y)]] = \left(\nabla A(\eta_1) - \nabla A(\eta)\right)_i$. Using the linearity of the expectation and the fact that $E[T_j(Y)T_i(Y)] - E[T_i(Y)]E[T_j(Y)]$ designates the covariance matrix of the sufficient statistics T, and can then be replaced by $\ddot{A}(\eta)_{ij} = \mathrm{Cov}[T(Y)]_{ij} = \ddot{A}(\eta)_{ji}$, we find $s_1 = \sum_{j=1}^{\kappa}(\eta_j - \eta_{1j})\,\ddot{A}(\eta)_{ij}$, and then $C = \ddot{A}(\eta)(\eta - \eta_1) + \nabla A(\eta) - \nabla A(\eta_1)$.
Convex Multi-class Image Labeling by Simplex-Constrained Total Variation
Jan Lellmann, Jörg Kappes, Jing Yuan, Florian Becker, and Christoph Schnörr
Image and Pattern Analysis Group (IPA), HCI, Dept. of Mathematics and Computer Science, University of Heidelberg
{lellmann,kappes,yuanjing,becker,schnoerr}@math.uni-heidelberg.de
Abstract. Multi-class labeling is one of the core problems in image analysis. We show how this combinatorial problem can be approximately solved using tools from convex optimization. We suggest a novel functional based on a multidimensional total variation formulation, allowing for a broad range of data terms. Optimization is carried out in the operator splitting framework using Douglas-Rachford Splitting. In this connection, we compare two methods to solve the Rudin-Osher-Fatemi type subproblems and demonstrate the performance of our approach on single- and multichannel images.
1 Introduction
In this paper, we study the variational approach
$$\inf_{u \in C} f(u), \qquad f(u) = -\int_\Omega \langle u(x), s(x)\rangle\,dx + \lambda\,\mathrm{TV}(u), \qquad \lambda > 0, \qquad (1)$$
for determining a labeling $u : \Omega \to \mathbb{R}^L$, that is a contextual classification of each pixel x ∈ Ω into one out of L classes, based on an arbitrary vector-valued similarity function $s(x) \in \mathbb{R}^L$ as input data that has been computed from image data beforehand. The objective function (1) comprises the common form of a data term plus a regularization term. The data term is given by the $L^2$ inner product of the assignment variables u and the similarity function s, and the regularizer is a total variation (TV) formulation for vector-valued data,
$$\mathrm{TV}(u) = \int_\Omega \sqrt{\|\nabla u_1\|^2 + \cdots + \|\nabla u_L\|^2}\,dx. \qquad (2)$$
Furthermore, the constraint u ∈ C restricts the vector field u(x) at each location x ∈ Ω to lie in the standard probability simplex, that is $u(x) \in \mathbb{R}^L_+$ and $\sum_{i=1}^{L} u(x)_i = 1$ for all x ∈ Ω.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 150–162, 2009. © Springer-Verlag Berlin Heidelberg 2009

Our work is motivated by the following observation. Suppose that at each pixel x ∈ Ω, there is an unambiguous assignment (labeling) of the data s(x) to some class l ∈ {1, . . . , L} represented by the corresponding l-th unit vector, $u(x) = e_l$. Then, an interface with area A between two image regions labeled with l and l′, respectively, adds $A\sqrt{2}$ to the regularization term iff l ≠ l′, as all but two gradients under the square root vanish. As a result, under these assumptions and up to the immaterial constant $\sqrt{2}$, the TV term corresponds to the well-known Potts model that assigns constant penalties to local changes of the labeling. A significant difference between the Potts model and our approach (1), however, is that the former amounts to solving a discrete combinatorial problem, whereas the latter is a continuous convex optimization problem. Experiments show that our approach (1) approximates discrete decisions fairly well (Fig. 1 and 2) by computing a global optimum to a single convex optimization problem. By contrast, the state-of-the-art discrete approach [1] approximates the combinatorial solution by solving a non-uniquely defined sequence of binary problems via graph cuts. This fact, along with the potential of continuous convex optimization for parallel implementations and their more robust dependency on (hyper-)parameters, motivated us to investigate the approach (1) as a promising model for a general “labeling submodule” within computer vision systems.

Fig. 1. Left: Noisy input image. Right: The labeled image based on the non-binary assignment u as global minimizer of the convex approach (1). The discrete problem is accurately solved by a continuous approach.

To this end,
– We have a closer look at the data and regularization terms (section 2).
– We apply an operator splitting approach to (1) in order to decompose the computation of a globally optimal labeling into two independent computational steps: TV denoising for vector-valued data, and projection of the labeling vectors u(x) on the canonical simplex (section 3).
– We evaluate two different algorithms for the TV denoising subroutine (section 4) and compare the performance of our convex method to a range of established graph cut-based approaches (section 5).

Related work.
In contrast to the binary case with anisotropic discretization [2], multi-class energies are generally not submodular and thus cannot be optimized globally using graph cuts [3]. Some extensions exist, which find a local minimum by solving a sequence of binary graph cuts [1]. The continuous formulation – optimization on the set of characteristic functions – is known as continuous cut [5]. Chan et al. [6] showed that this problem can be relaxed and solved on a convex set, without losing global optimality. In contrast, our work is aimed at the multi-class case. In [7], a comparable approach based on [8] was presented, which relies on a natural ordering of the labels, as given in e.g. stereo reconstruction. An approach very similar to ours was recently presented in [9], where the authors use a different formulation of the total variation on vector fields, and an alternating optimization method. The (discrete) Potts model was studied in [10], where approximate solutions were computed by an LP relaxation with explicit constraints. In contrast, our approach considers the general TV term and a problem decomposition into efficiently solvable subproblems, without the need to introduce additional variables.

Fig. 2. Output of the standard TV approach [4] for scalar-valued images applied to the noisy input image depicted in Fig. 1, for different values of the regularization parameter λ. Irrespective of this value, the performance is worse than with the approach (1) (cf. Fig. 1, right), because the latter approximates the Potts model that does not depend on the size (contrast) of discontinuities. Consequently, the former approach cannot remove noise without degrading weak discontinuities, as is apparent above for the horizontal discontinuities.

Notation. We consider the discretized version of our approach (1). Let $\Omega = \{1, \ldots, n_1\} \times \cdots \times \{1, \ldots, n_d\} \subseteq \mathbb{R}^d$, d ∈ N, denote a regular image grid of n := |Ω| pixels. The (multidimensional) image space $X := \mathbb{R}^{n \times L}$ is equipped with the Euclidean inner product $\langle \cdot, \cdot \rangle_\Omega$ over the vectorized elements. We naturally identify $v = (v^1, \ldots, v^L) \in \mathbb{R}^{n \times L}$ with $((v^1)^\top \cdots (v^L)^\top)^\top \in \mathbb{R}^{nL}$. Superscripts $v^i$ denote a collection of vectors, while subscripts $v_k$ denote vector components. Using the notation $e = (1, 1, \ldots, 1)^\top$, the standard simplex on $\mathbb{R}^L$ and its extension C on $\mathbb{R}^{n \times L}$ are given by $\Delta_L := \{v \in \mathbb{R}^L \mid v \geq 0,\ \langle e, v\rangle = 1\}$
and $C := \prod_{x \in \Omega} \Delta_L$. Define $\delta_C(x)$ to be 0 iff x ∈ C, and +∞ otherwise. Let $\mathrm{grad} := (\mathrm{grad}_1^\top, \ldots, \mathrm{grad}_d^\top)^\top$ be the d-dimensional forward difference gradient operator for Neumann boundary conditions. Accordingly, $\mathrm{div} := -\mathrm{grad}^\top$ is the backward difference divergence operator for Dirichlet boundary conditions. These operators extend to $\mathbb{R}^{n \times L}$ via $\mathrm{Grad} := (I_L \otimes \mathrm{grad})$, $\mathrm{Div} := (I_L \otimes \mathrm{div})$, where $I_L$ is the L × L identity matrix. We will also need the convex sets
$$B_\lambda := \left\{ (p^1, \ldots, p^L) \in \mathbb{R}^{d \times L} \ \Big|\ \Big(\sum_{i=1}^{L} \|p^i\|_2^2\Big)^{1/2} \leq \lambda \right\}, \qquad (3)$$
$$D_\lambda := \prod_{x \in \Omega} B_\lambda \subseteq \mathbb{R}^{n \times d \times L}, \qquad E_\lambda := \{u \in \mathbb{R}^{n \times L} \mid u = \mathrm{Div}\, p,\ p \in D_\lambda\}. \qquad (4)$$
The discrete total variation on vector-valued data is then defined as
$$\mathrm{TV}(u) := \sigma_{E_1}(u) = \sum_{x \in \Omega} \|G_x u\|_2, \qquad (5)$$
where $\sigma_M(u) := \sup_{p \in M} \langle u, p\rangle$ is the support function from convex analysis, and $G_x$ is an (Ld) × (nL) matrix composed of rows of Grad such that $G_x u$ gives the gradients of all $u^i$ in x stacked one above the other.
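To make definition (5) concrete, the following minimal sketch evaluates the discrete vector-valued TV on a small 2D grid with forward differences and Neumann boundary handling (differences across the last row and column are set to zero). The nested-list layout `u[i][j][l]` and the function names are assumptions made for illustration only. The toy labeling has a straight interface of length 4 between two labels, so by the observation in the introduction each interface pixel contributes √2.

```python
import math


def total_variation(u):
    """Discrete TV of a labeling u[i][j][l] on a 2D grid (forward differences)."""
    rows, cols, labels = len(u), len(u[0]), len(u[0][0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            sq = 0.0
            for l in range(labels):
                # Neumann boundary: no difference is taken across the last row/column.
                if i + 1 < rows:
                    sq += (u[i + 1][j][l] - u[i][j][l]) ** 2
                if j + 1 < cols:
                    sq += (u[i][j + 1][l] - u[i][j][l]) ** 2
            # ||G_x u||_2: Euclidean norm of all channel gradients stacked at pixel x.
            tv += math.sqrt(sq)
    return tv


def unit_vector(l, labels):
    """The l-th unit vector e_l in R^labels."""
    return [1.0 if k == l else 0.0 for k in range(labels)]


# 4 x 4 grid with 3 labels: left two columns carry label 0, right two carry label 1.
u_hard = [[unit_vector(0 if j < 2 else 1, 3) for j in range(4)] for i in range(4)]
tv_value = total_variation(u_hard)
```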
2 Variational Approach
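Based on the introduced notation, our novel approach (1) reads
$$\inf_{u \in C} f(u), \qquad f(u) = \underbrace{-\langle u, s\rangle_\Omega}_{\text{data term}} + \underbrace{\lambda\,\mathrm{TV}(u)}_{\text{regularization term}}, \qquad \lambda > 0. \qquad (6)$$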
As the objective function f and the constraint set C are convex, the overall problem is convex as well. We will now define and motivate each term.

Data Term. The data term in (6) is fairly general. Any vector-valued similarity function s can be used, whose components $s(x)_i$ indicate the affinity of some data point at x with class i. As an example, suppose we have image features g(x), x ∈ Ω, prototypical feature vectors $G = (G^1, \ldots, G^L)$ as well as a distance measure d on the features. We might think of g as a grayscale image, of G as some prototypical gray values, and of d as a quadratic distance measure, possibly derived from a statistical noise model. The hard assignment of the pixel x ∈ Ω to a label (or class) l(x) ∈ {1, . . . , L} should then be penalized by the distance $d(g(x), G^{l(x)})$ of the corresponding feature to the prototype of the assigned class. Denoting the negative distance by s, and summing up over the image domain, we see that
$$\sum_{x \in \Omega} d\left(g(x), G^{l(x)}\right) = -\sum_{x \in \Omega} \langle s(x), u(x)\rangle \quad \text{for } u(x) = e_{l(x)}. \qquad (7)$$
Thus, instead of looking for $l \in \{1, \ldots, L\}^n$, we may equivalently look for $u \in \{e_1, \ldots, e_L\}^n$. However, the right hand side formulation has the advantage that it extends naturally to the soft assignment u ∈ C: We may now solve the easier problem of optimizing for u on the convex set C. In our experiments, we chose $d(x, y) = \|x - y\|_1$, as the $\ell_1$-norm is still convex but known to be more robust against noise and outliers. However, s is not restricted to representing distances. In fact, it may be arbitrarily nonlinear and nonconvex in x and g, and involve nonlocal operations on g. The complexity is completely hidden within the precomputed vector s.

Regularization Term. Recall that the regularizer of (6) is defined in (5) as
$$\mathrm{TV}(u) = \sup_{p \in D_1} \langle u, \mathrm{Div}\, p\rangle = \sum_{x \in \Omega} \|G_x u\|_2. \qquad (8)$$
This definition for vector-valued u parallels the definition of the “isotropic” total variation measure in the scalar-valued case [11, 4, 12]. It is also known as MTV [13, 14, 15], and was recently studied in [16] in its continuous formulation. Contrary to the anisotropic discretization, where one would substitute the sum of 1-norms in (3), it is less biased towards edges parallel to the axes. See also [17] for an overview of TV-based research and applications.

Optimality. After solving the relaxed problem, it remains to show that a binary solution can be recovered. For the continuous, binary case, Chan et al. [6] showed that an exact solution can be obtained by thresholding at almost any threshold. However, their results do not immediately transfer to the discrete multi-class case. In particular, the crucial “layer cake” formula holds for $\ell_1$-, but not $\ell_2$-discretizations of the TV. Contrary to the binary case, it is not clear which rounding scheme to use for vector-valued u. For our experiments, we chose the final class label for each pixel x as the index l of the maximal $u^*_l(x)$ of the global optimum $u^*$ of (6). This defines a suboptimal discrete solution $u^*_t$. Bounding the error $f(u^*_t) - f(u^*_d)$ with respect to the unknown discrete optimum $u^*_d$ will be the subject of our future work.
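The data term of this section and the rounding scheme just described can be sketched in a few lines. The example below assumes a scalar grayscale feature, a small list of prototype gray values and the $\ell_1$ distance, so that $s(x)_i = -|g(x) - G^i|$; pixels are stored as a flat list. The helper names (`similarity`, `data_term`, `round_labels`) and the toy numbers are hypothetical and serve only as an illustration.

```python
def similarity(gray, prototypes):
    """s(x)_i = -|g(x) - G^i|: negative l1 distance of each pixel to each prototype."""
    return [[-abs(g - proto) for proto in prototypes] for g in gray]


def data_term(u, s):
    """-<u, s>_Omega for u and s given as lists of per-pixel vectors."""
    return -sum(ui * si for ux, sx in zip(u, s) for ui, si in zip(ux, sx))


def round_labels(u):
    """Pick, for each pixel, the index of the largest component of u(x)."""
    return [max(range(len(ux)), key=lambda i: ux[i]) for ux in u]


gray = [0.1, 0.45, 0.9]           # three pixels of a toy grayscale image
prototypes = [0.0, 0.5, 1.0]      # prototype gray values G^1, G^2, G^3
s = similarity(gray, prototypes)

# Hard assignment u(x) = e_{l(x)} with labels 0, 1, 2 (zero-based).
u_hard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
energy = data_term(u_hard, s)     # equals the summed distances, as in (7)

# A soft assignment on the simplex is rounded to labels by the argmax rule.
u_soft = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.0, 0.2, 0.8]]
labels = round_labels(u_soft)
```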
3 Optimization
Two basic problems arise concerning the optimization of (6): nondifferentiability of the objective function due to the TV term, and handling of the simplex constraint u ∈ C. We cope with the latter using the tight Douglas-Rachford splitting method as presented in the following section. We refer to [18] for the full derivations.

Douglas-Rachford Splitting. Minimization of a proper, convex, lower-semicontinuous (lsc) function f : X → R can be regarded as finding a zero of its (necessarily maximal monotone [19, Chap. 12]) subgradient operator $T := \partial f : X \rightrightarrows X$. In the operator splitting framework, ∂f is assumed to be decomposable into the sum of two “simple” operators, T = A + B, of which forward and backward steps can practically be computed. Here, we consider the (tight) Douglas-Rachford splitting iteration [20, 21],
$$z^{k+1} \in \left(J_{\tau A}(2J_{\tau B} - I) + (I - J_{\tau B})\right)(z^k), \qquad (9)$$
where $J_{\tau T} := (I + \tau T)^{-1}$ is the resolvent of T. Under the very general constraint that A and B are maximal monotone and A + B has at least one zero, the sequence $(z^k)$ will converge to a point z, with the additional property that $x := J_{\tau B}(z)$ is a zero of T ([22, Thm. 3.15], [22, Prop. 3.20], [22, Prop. 3.19], [23]). In particular, for $f = f_1 + f_2$, $f_i$ proper, convex, lsc with $\mathrm{ri}(\mathrm{dom}\, f_1) \cap \mathrm{ri}(\mathrm{dom}\, f_2) \neq \emptyset$ (ri(S) denoting the relative interior of a set S), it can be shown [19, Cor. 10.9] that $\partial f = \partial f_1 + \partial f_2$, and the $\partial f_i$ are maximal monotone. As $x \in J_{\tau \partial f_i}(y) \Leftrightarrow x = \arg\min_x\, (2\tau)^{-1}\|x - y\|_2^2 + f_i(x)$, the computation of the resolvents reduces to proximal point optimization problems involving only the $f_i$.
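Iteration (9) can be tried out on a toy problem in which both resolvents are available in closed form. The sketch below is not the labeling problem of this paper: it assumes $f_1(x) = \frac{1}{2}\|x - a\|^2$ (resolvent $(y + \tau a)/(1 + \tau)$) and $f_2$ the indicator of the box $[0, 1]^3$ (resolvent: clipping), so that the minimizer is simply a clipped to the box. The names `douglas_rachford`, `prox_f1` and `prox_f2` are hypothetical.

```python
def douglas_rachford(prox_a, prox_b, z0, iterations=200):
    """Iterate z <- J_A(2 J_B(z) - z) + (z - J_B(z)) and return J_B(z)."""
    z = list(z0)
    for _ in range(iterations):
        x = prox_b(z)                                       # x = J_{tau B}(z)
        w = prox_a([2.0 * xi - zi for xi, zi in zip(x, z)])  # J_{tau A}(2x - z)
        z = [zi + wi - xi for zi, wi, xi in zip(z, w, x)]
    return prox_b(z)


tau = 1.0
a = [2.0, -1.0, 0.3]


def prox_f1(y):
    """Resolvent of the gradient of 0.5 * ||x - a||^2 with step tau."""
    return [(yi + tau * ai) / (1.0 + tau) for yi, ai in zip(y, a)]


def prox_f2(y):
    """Resolvent of the indicator of the box [0, 1]^n: componentwise clipping."""
    return [min(1.0, max(0.0, yi)) for yi in y]


x_star = douglas_rachford(prox_f1, prox_f2, [0.0, 0.0, 0.0])
```

In this toy setting the iterates approach the clipped vector (1, 0, 0.3); the returned point is $J_{\tau B}(z)$, mirroring the statement above that $x := J_{\tau B}(z)$ is a zero of T.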
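Application. For our specific problem, we split
$$\inf_{u \in C}\,(f_1(u) + f_2(u)), \qquad f_1(u) = -\langle u, s\rangle_\Omega + \lambda\,\mathrm{TV}(u), \qquad f_2(u) = \delta_C(u), \qquad (10)$$
and get the following Douglas-Rachford scheme:

Algorithm 1. Outer loop (Douglas-Rachford)
1: choose some $u^0$ and a fixed step size τ > 0
2: repeat
3: solve $u^k \leftarrow \arg\min_u \{\frac{1}{2\tau}\|u - z^k\|^2 - \langle u, s\rangle + \sigma_{E_\lambda}(u)\}$
4: solve $w^k \leftarrow \arg\min_w \{\frac{1}{2\tau}\|w - (2u^k - z^k)\|^2 + \delta_C(w)\}$
5: $z^{k+1} \leftarrow z^k + w^k - u^k$
6: until $\|u^k - u^{k-1}\|_\infty \leq \delta_{\mathrm{outer}}$.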
The objective f is bounded from below on the compact set C and thus attains its minimum. From the remarks in the last section, we get convergence of the scheme for the discrete case: $\delta_C(w)$ and $\sigma_{E_\lambda}$ are both proper, convex, lsc with $\mathrm{dom}\,\sigma_{E_\lambda} = \mathbb{R}^{n \times L}$ and $\mathrm{ri}(C) \neq \emptyset$. In practice, one has to deal with solutions of the subproblems with limited accuracy. While there are extensions of the convergence result that take these inexact solutions into account [22, Prop. 4.50], they require the subproblems to be solved with increasing accuracy. However, we found that the method generally converged even though these requirements were not met.
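Step 4 of Algorithm 1 amounts to one Euclidean projection onto the unit simplex per pixel. A minimal sketch of such a projection is given below, using a widely used sort-based procedure; this is an assumption made for illustration and not necessarily the finite algorithm of [24] used by the authors. The function name `project_simplex` is hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based sketch)."""
    sorted_v = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_v, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        # Keep the threshold of the largest k whose k-th entry stays positive.
        if value - candidate > 0.0:
            theta = candidate
    # Shift by the threshold and clip the negative part.
    return [max(x - theta, 0.0) for x in v]


# One pixel with L = 3 labels: the point 2 u(x) - z(x) is pulled back onto the simplex.
w_pixel = project_simplex([2.0, 0.0, -1.0])
```

Applying this function independently to every pixel gives the projection $\Pi_C$, since C is a product of simplices.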
4 Inner Loop Optimization
The second subproblem (Alg. 1, step 4) is a projection on the constraint set, $w^k = \Pi_C(2u^k - z^k)$, which requires one projection on the low-dimensional unit simplex $\Delta_L$ per x ∈ Ω. These projections can be computed in a finite number of steps [24]. The first subproblem (step 3) is equivalent to
$$u^k = \arg\min_u\, \frac{1}{2}\|u - (z^k + \tau s)\|^2 + (\tau\lambda)\,\mathrm{TV}(u), \qquad (11)$$
i.e. an extension to vector-valued u of the classical Rudin-Osher-Fatemi (ROF, TV-$L^2$) problem with regularization parameter τλ. Many methods have been suggested to solve the ROF problem, e.g. PDE, fixed-point, or interior point methods for primal [4, 25], dual [26, 27, 28], or mixed [29] formulations. We evaluate two approaches: First, we will formulate a particularly simple gradient projection method in the operator splitting framework, cf. [30]. This scheme was introduced in [27] and extended to the multidimensional case in [31] (see also [16]). The second approach is based on the fast half-quadratic method of Yang et al. [15].

Forward-backward approach. The optimality condition of step 3, $\tau^{-1}(z^k - u) + s \in \partial\sigma_{E_\lambda}(u)$, can be rewritten as $u = \tau\left(z^k/\tau + s - \Pi_{E_\lambda}(z^k/\tau + s)\right)$. To compute the projection $\Pi_{E_\lambda}$, we use the dual representation,
$$\Pi_{E_\lambda}(x) = \arg\min_{q \in E_\lambda} \frac{1}{2}\|q - x\|_\Omega^2 = \mathrm{Div}\, \arg\min_p \left\{\frac{1}{2}\|\mathrm{Div}\, p - x\|_\Omega^2 + \delta_{D_\lambda}(p)\right\}. \qquad (12)$$
Using a simple forward-backward splitting for the inner problem results in the (gradient projection) update rule $p^{j+1} = \Pi_{D_\lambda}\left(p^j - \nu\,\mathrm{Div}^\top(\mathrm{Div}\,p^j - x)\right)$. The projection $\Pi_{D_\lambda}$ can be computed explicitly and is separable in $x$, while the inner part can be computed for all models independently. This opens up the method to parallelization. Convergence is guaranteed for $\nu < 2/\|\mathrm{Div}^\top \mathrm{Div}\|$ (see e.g. [22, Thm. 3.12]). Extending the argument in [26, Thm. 3.1], we find that $\|\mathrm{Div}\| \le \sqrt{4d}$. Accordingly, we may set $\nu < \frac{1}{2d}$. In our experiments, we set $\nu = 0.95 \cdot \frac{1}{2d}$ to avoid numerical problems close to the theoretical maximum. Wrapping up, we have
Algorithm 2. Inner loop, forward-backward approach
1: $x \leftarrow z^k/\tau + s$, choose arbitrary $p^0 \in \mathbb{R}^{n \times d \times L}$
2: repeat
3: $p^{j+1} = \Pi_{D_\lambda}\left(p^j - \nu\,\mathrm{Div}^\top(\mathrm{Div}\,p^j - x)\right)$
4: until $\|p^{j+1} - p^j\|_\infty \le \delta_{inner}$
5: $u^k \leftarrow \tau(x - \mathrm{Div}\,p^{j+1})$.
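As a hedged illustration, Algorithm 2 can be sketched for the simplest possible setting: a scalar 1D signal ($d = 1$, $L = 1$). The function names, the forward-difference discretization and the choice of $D_\lambda$ as the box $\{p : |p_i| \le \lambda\}$ are assumptions made for this sketch only; the sets $D_\lambda$ and $E_\lambda$ used in the paper are defined in an earlier section. Plain Python is used to stay dependency-free.

```python
def grad(u):
    """Forward differences; the last entry is 0 (Neumann boundary)."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def div(p):
    """Discrete divergence, defined so that div = -grad^T."""
    n = len(p)
    return [(p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            for i in range(n)]


def rof_forward_backward(z, s, tau, lam, delta_inner=1e-6, max_iter=10000):
    """Gradient projection on the dual problem (assumed 1D scalar setting)."""
    d = 1                                   # dimension of the signal domain
    nu = 0.95 / (2 * d)                     # step size just below 1/(2d)
    x = [zi / tau + si for zi, si in zip(z, s)]
    p = [0.0] * len(x)                      # arbitrary starting point p^0
    for _ in range(max_iter):
        residual = [dv - xi for dv, xi in zip(div(p), x)]   # Div p - x
        step = grad(residual)               # equals -Div^T (Div p - x)
        # Projection onto the assumed set D_lam = {|p_i| <= lam}.
        p_new = [max(-lam, min(lam, pi + nu * gi)) for pi, gi in zip(p, step)]
        change = max(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if change <= delta_inner:           # stopping test of step 4
            break
    return [tau * (xi - dv) for xi, dv in zip(x, div(p))]
```

For a clean step signal `[0, 0, 0, 1, 1, 1]` with `s = 0`, `tau = 1` and `lam = 0.3`, the two plateaus move towards each other by `lam / 3`, which is the known closed-form behaviour of 1D TV denoising and a convenient sanity check.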
Half-quadratic approach. While the forward-backward method is simple and easy to implement, its convergence speed is not satisfactory in practice. As an alternative, we tested an ROF specialization of the general multichannel image restoration method by Yang et al. [15]. Starting from (11), the problem is to find
$$u^k = \arg\min_u g(u), \qquad g(u) := \frac{\mu}{2}\|u - f\|^2 + TV(u), \tag{13}$$
where $\mu := \frac{1}{\tau\lambda}$ and $f := z^k + \tau s$. Using a half-quadratic approach [32, 33], Yang et al. derive the splitting/penalty formulation
$$(u, y) = \arg\min_{y_x \in \mathbb{R}^{Ld},\, x \in \Omega,\, u \in \mathbb{R}^{nL}} \sum_{x \in \Omega} \left( \|y_x\| + \frac{\beta}{2}\|y_x - G_x u\|^2 \right) + \frac{\mu}{2}\|u - f\|_\Omega^2. \tag{14}$$
The parameter $\beta$ controls smoothing of the total variation; setting $\beta \ge n/(2\varepsilon)$ guarantees $\varepsilon$-suboptimality of the solution of the smoothed problem with respect to the original problem (for a derivation see [18]). Equation (14) can be solved using alternating minimization w.r.t. $u$ and the auxiliary variables $y_x$. The latter is highly parallelizable, as it boils down to $n$ separate explicit operations:
$$y_x^{j+1} = \max\left\{ \|G_x u\| - \beta^{-1},\, 0 \right\} \left( G_x u / \|G_x u\| \right). \tag{15}$$
On the other hand, minimizing (14) for $u$ amounts to solving
$$\left( \mathrm{Grad}^\top \mathrm{Grad} + (\mu/\beta)\, I_{(nL)} \right) u^{j+1} = \mathrm{Grad}^\top y^{j+1} + \frac{\mu}{\beta} f \tag{16}$$
for $u^{j+1}$, where $y^{j+1}$ is a proper rearrangement of the $y_x$.
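The step from (14) to (16) can be made explicit with a short worked derivation. It assumes that the operators $G_x$ are the rows of $\mathrm{Grad}$ belonging to point $x$, so that the sum over $x$ collapses into one stacked norm; this stacking is an assumption of the sketch, consistent with the remark that $y$ is a rearrangement of the $y_x$.

```latex
% For fixed y, the u-dependent part of (14) is a quadratic function:
%   h(u) = (\beta/2) \| y - \mathrm{Grad}\, u \|^2 + (\mu/2) \| u - f \|^2 .
\begin{align*}
  \nabla h(u) &= \beta\, \mathrm{Grad}^\top (\mathrm{Grad}\, u - y) + \mu\, (u - f) = 0 \\
  \Longleftrightarrow\quad
  \left( \beta\, \mathrm{Grad}^\top \mathrm{Grad} + \mu I \right) u
     &= \beta\, \mathrm{Grad}^\top y + \mu f \\
  \Longleftrightarrow\quad
  \left( \mathrm{Grad}^\top \mathrm{Grad} + (\mu/\beta)\, I \right) u
     &= \mathrm{Grad}^\top y + (\mu/\beta)\, f .
\end{align*}
```

Since $\mu/\beta > 0$, the system matrix is symmetric positive definite, so the solution $u^{j+1}$ is unique.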
Fig. 3. Results of the speed comparison between forward-backward (FB) and half-quadratic method (HQ) for the inner problem, applied to data from the first iteration of the outer problem (cf. Table 1). Left to right: Original input, FB with τλ = 5, HQ with τλ = 5, FB with τλ = 20, HQ with τλ = 20. Iteration counts were fixed at 80 resp. 300 to equalize the runtime for both approaches. For larger regularization parameter, the half-quadratic method outperforms the forward-backward approach as smoothness increases.
For periodic boundary conditions, Yang et al. solved (16) rapidly using FFT. In our case, Neumann boundary conditions and thus the Discrete Cosine Transform (DCT-2) [34] are appropriate. This requires 2L independent (parallelizable) individual DCTs, each of which can be computed efficiently in O(n log n). By alternating the above two steps, we can solve (14) for a fixed β that is large enough for any required suboptimality bound. In practice, convergence can be sped up by starting with a small β and solving a sequence of problems for increasing β, warm-starting each with the solution of the previous problem. Given an arbitrary $u^0 \in \mathbb{R}^{nL}$, the complete algorithm reads
Algorithm 3. Inner loop, half-quadratic approach
1: while stopping criterion not satisfied do
2: compute $y^{j+1}$ from (15)
3: compute $u^{j+1}$ from $y^{j+1}$ and (16),
4: possibly increase β
5: end while
The stopping criterion can be based on the residual [15]. For our experiments, we set a fixed iteration count, as increasing β at each step turned out to lead to fastest convergence, and residuals for different β are not comparable.
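A minimal sketch of Algorithm 3 for a scalar 1D signal is given below. It is not the authors' implementation: in 1D the matrix in (16) is tridiagonal, so a direct tridiagonal (Thomas) solve is swapped in for the DCT-based solver described above, and β is multiplied by a constant factor at every step. All function and parameter names are assumptions of this sketch.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def rof_half_quadratic(f, tau_lam, beta_min, beta_max, K):
    """Alternate the shrinkage (15) and the linear solve (16), increasing beta."""
    n = len(f)                                  # requires n >= 2
    mu = 1.0 / tau_lam
    factor = (beta_max / beta_min) ** (1.0 / K)
    beta, u = beta_min, list(f)
    for _ in range(K + 1):
        # Step 2, eq. (15): soft shrinkage of the forward differences G_x u.
        g = [u[i + 1] - u[i] for i in range(n - 1)]
        y = [math.copysign(max(abs(gi) - 1.0 / beta, 0.0), gi) for gi in g]
        # Step 3, eq. (16): (Grad^T Grad + (mu/beta) I) u = Grad^T y + (mu/beta) f.
        k = mu / beta
        diag = [(1.0 if i in (0, n - 1) else 2.0) + k for i in range(n)]
        off = [-1.0] * n
        rhs = [(y[i - 1] if i > 0 else 0.0) - (y[i] if i < n - 1 else 0.0)
               + k * f[i] for i in range(n)]
        u = solve_tridiagonal(off, diag, off, rhs)
        beta = min(beta * factor, beta_max)     # step 4: continuation in beta
    return u
```

Because the rows of $\mathrm{Grad}^\top \mathrm{Grad}$ sum to zero under Neumann boundary conditions, every linear solve preserves the mean of `f`, which gives a cheap consistency check for an implementation.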
5 Experiments, Performance Evaluation
Inner Problem. We compared the half-quadratic approach to the conventional forward-backward method. The difficulty with the former lies in the choice of the update strategy for β. We chose a generalization of the exponential strategy outlined in [15]: Set $\beta = \beta_{min}$ and update by multiplying with $c := (\beta_{max}/\beta_{min})^{1/K}$ for some K until $\beta = \beta_{max}$. We made the following observations:
– In order to rapidly minimize the objective function, it is best to use a continuation strategy, i.e. to increase β at each step, rather than spending time on solving (14) exactly for each β.
– Increasing K generally improves the quality of the result.
– For fixed $\beta_{max}$ and K, there seems to be a unique optimal $\beta_{min}$ that minimizes the final objective function value.
With the continuation strategy and fixed $\beta_{max}$, we found the optimal $\beta_{min}$ to usually lie in the range of $10^{-5}\beta_{max}$ to $10^{-3}\beta_{max}$. Unfortunately, there seems to be a strong dependency on the choice of λ as well as the scale and complexity of s. We set $\beta_{min} = 0.2 \cdot 10^{-4}\beta_{max}$, which worked well for our data. $\beta_{max}$ was set at n/0.2 according to a suboptimality bound of ε = 0.1 (section 4). We compared the performance of the two methods in terms of the objective function value for fixed runtime of the optimized Matlab implementations (Fig. 3, Table 1). For larger τλ, the half-quadratic method gives better results. For τλ = 20, fewer than 10 iterations are required to reach the quality of 300 iterations of the forward-backward method, giving a speedup of about 4-5. However, finding the optimal parameter set is more involved than for the forward-backward method.
Table 1. Run times t (in seconds), objective function values r and relative differences $(r_{HQ} - r_{FB})/r_{HQ}$ for the experiment in Fig. 3. For larger τλ, the half-quadratic method gives more accurate results in the same time.
τλ          0.1       1         2         5         10        20        50
t_HQ        1.14      1.23      1.20      1.31      0.98      0.95      1.08
t_FB        1.03      1.02      1.06      1.03      1.22      1.25      1.19
r_HQ        3901.9    27660.7   36778.5   40038.8   42262.8   44377.1   44752.5
r_FB        3901.9    27660.4   36760.6   40104.3   42924.3   46988.6   57504.9
rel. diff.  1.17e-16  1.24e-5   4.85e-4   -1.64e-3  -0.0156   -0.0588   -0.285
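The exponential update strategy for β described above can be sketched as follows. The function name is hypothetical, and the pixel count `n` in the usage lines is an arbitrary illustrative value, not one taken from the experiments.

```python
def beta_schedule(beta_min, beta_max, K):
    """Return beta_min * c**k for k = 0..K with c = (beta_max / beta_min)**(1/K)."""
    c = (beta_max / beta_min) ** (1.0 / K)
    return [beta_min * c ** k for k in range(K + 1)]


# Usage with the parameter choices quoted in the text (n is hypothetical):
n = 1024                          # number of pixels, assumed for illustration
beta_max = n / 0.2                # suboptimality bound epsilon = 0.1
beta_min = 0.2e-4 * beta_max      # empirical choice reported in the text
betas = beta_schedule(beta_min, beta_max, K=300)
```

Each entry of `betas` would be used for exactly one pass of the alternating minimization when the continuation strategy from the first observation is followed.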
Overall Problem. We evaluated the performance of our algorithm against five different methods in their publicly available implementations from the Middlebury MRF benchmark [35]: Belief Propagation (BP), Sequential Belief Propagation (BPS), Graph Cuts with alpha-expansion (GCE) and alpha-beta swap (GCS), and Sequential Tree Reweighted Belief Propagation (TRBPS). For each of the grayscale 32 × 32 images, 20 noisy copies were generated and segmented into four gray levels with fixed intensities. In view of the last section and in order not to mix up speed with accuracy issues, we used the forward-backward approach for the inner loop. We set $\delta_{inner} = 1 \cdot 10^{-3}$, $\delta_{outer} = 2 \cdot 10^{-2}$, and τ = 1. For small λ, our method shows results comparable to the other approaches with respect to the number of bad labels. We point out again that this solution to the non-binary labeling problem is achieved by solving the convex optimization problem (6) followed by local rounding as explained in section 2. In contrast to our method, the MRF benchmark algorithms optimize the anisotropic energy. To compensate, their λ was scaled by a common factor of $\approx \sqrt{2}$ that was found empirically. Nevertheless, the discretization gives them a small advantage on images with axis-parallel edges (experiments 1 and 2).
Fig. 4. Exemplary grayscale segmentation results for the benchmarked methods for four labels. Left to right: Noisy input data, final results for BP, BPS, GCE, GCS, TRWS, and the proposed method (TV). λ was manually chosen for each method. Axis-parallel edges are better recovered by the anisotropic methods, while our isotropic discretization has an advantage on diagonal edges.
[Plots for Fig. 5: incorrect labels (mean %) and standard deviation as functions of λ (0 to 1.4) for the methods bp, bps, gce, gcs, trws, and tv.]
Fig. 5. Error rates for the first experiment in Fig. 4. For each λ, all experiments were repeated 20 times with random noise (zero-mean Gaussian with σ = 0.45, 0.35, 0.25 resp. 0.35 for experiments 1-4 and image intensities in [0, 1]), and the percentage of incorrectly assigned labels compared to ground truth was recorded. Sequential Belief Propagation (BPS) generally performed worst, while our method (TV) was on par with the others, in particular for lower λ. The figure also reveals that belief propagation (BP) gets stuck in a good, but often inferior local optimum, and does not respond to larger values of λ, i.e. stronger regularization requested by the user.
Figure 6 demonstrates the performance of our algorithm for color segmentation. Only few outer iterations (20 in our case) are necessary for accurate optimization.
Fig. 6. Performance of our method for four-class segmentation based on $\ell_1$ color distance. Left to right: Ground truth, inspired by [29, 36]; ground truth overlaid with Gaussian noise, σ = 1; local nearest-neighbor labeling; our approach with λ = 0.7 after 20 outer iterations. The energy of the result is about 1% lower than the energy of the ground truth, suggesting that at this noise level, further improvements are limited by the model.
6 Conclusion and Future Work
In this paper, we presented a convex variational approach to solve the combinatorial multi-labeling problem for energies involving a general data term, total-variation-like regularizers, and simplex constraints. To enforce the simplex constraint, we based our approach on the globally convergent Douglas-Rachford operator splitting scheme. We evaluated two methods in order to efficiently solve the ROF-type subproblems, and showed that the half-quadratic approach allows faster convergence at the price of more involved parameter tuning. Experiments showed that the quality of the generated labelings is comparable to state-of-the-art discrete optimization methods, and can be achieved by just solving a convex optimization problem. Due to the generality of the data term, our method allows for a wide range of features or distance measures. Fully evaluating these possibilities in connection with variations of the TV measure is a subject of our future research.
Acknowledgements. Jing Yuan gratefully acknowledges support by the German National Science Foundation (DFG) under grant SCHN 457/9-1.
References
1. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. PAMI 23(11), 1222–1239 (2001)
2. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI 26(9), 1124–1137 (2004)
3. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? PAMI 26(2), 147–159 (2004)
4. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
5. Strang, G.: Maximal flow through a domain. Math. Prog. 26, 123–143 (1983)
6. Chan, T.F., Esedoḡlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. J. Appl. Math. 66(5), 1632–1648 (2006)
7. Pock, T., Schönemann, T., Graber, G., Bischof, H., Cremers, D.: A convex formulation of continuous multi-label problems. In: ECCV, vol. 3, pp. 792–805 (2008)
8. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. PAMI 25(10), 1333–1336 (2003)
9. Zach, C., Gallup, D., Frahm, J.M., Niethammer, M.: Fast global labeling for real-time stereo using multiple plane sweeps. In: VMV (2008)
10. Kleinberg, J., Tardos, E.: Approximation algorithms for classification problems with pairwise relationships: Metric labeling and MRFs. In: FOCS, pp. 14–23 (1999)
11. Ziemer, W.: Weakly Differentiable Functions. Springer, Heidelberg (1989)
12. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Univ. Lect. Series, vol. 22. AMS (2001)
13. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multi-valued images with applications to color filtering. Trans. Image Process. 5, 1582–1586 (1996)
14. Chan, T.F., Shen, J.: Image processing and analysis. SIAM, Philadelphia (2005)
15. Yang, J., Yin, W., Zhang, Y., Wang, Y.: A fast algorithm for edge-preserving variational multichannel image restoration. Tech. Rep. 08-09, Rice Univ. (2008)
16. Duval, V., Aujol, J.F., Vese, L.: A projected gradient algorithm for color image decomposition. CMLA Preprint (2008-21) (2008)
17. Chan, T., Esedoglu, S., Park, F., Yip, A.: Total variation image restoration: Overview and recent developments. In: The Handbook of Mathematical Models in Computer Vision. Springer, Heidelberg (2005)
18. Lellmann, J., Kappes, J., Yuan, J., Becker, F., Schnörr, C.: Convex multi-class image labeling by simplex-constrained total variation. TR, U. of Heidelberg (2008)
19. Rockafellar, R., Wets, R.J.B.: Variational Analysis, 2nd edn. Springer, Heidelberg (2004)
20. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. of the AMS 82(2), 421–439 (1956)
21. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979)
22. Eckstein, J.: Splitting Methods for Monotone Operators with Application to Parallel Optimization. PhD thesis, MIT (1989)
23. Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for max. mon. operators. M. Prog. 55, 293–318 (1992)
24. Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of R^n. J. Optim. Theory and Appl. 50(1), 195–200 (1986)
25. Dobson, D.C., Curtis, Vogel, R.: Iterative methods for total variation denoising. J. Sci. Comput 17, 227–238 (1996)
26. Chambolle, A.: An algorithm for total variation minimization and applications. JMIV 20, 89–97 (2004)
27. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
28. Aujol, J.F.: Some algorithms for total variation based image restoration. CMLA Preprint (2008-05) (2008)
29. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. J. Sci. Comput. 20, 1964–1977 (1999)
30. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM J. Multisc. Model. Sim. 4(4), 1168–1200 (2005)
31. Bresson, X., Chan, T.: Fast minimization of the vectorial total variation norm and applications to color image processing. Tech. Rep. 07-25, UCLA (2007)
32. Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Proc. 4(7), 932–946 (1995)
33. Cohen, L.: Auxiliary variables and two-step iterative algorithms in computer vision problems. JMIV 6(1), 59–83 (1996)
34. Strang, G.: The discrete cosine transform. SIAM Review 41(1), 135–147 (1999)
35. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 16–29. Springer, Heidelberg (2006)
36. Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. J. Sci. Comput. 28(1), 1–23 (2006)
Geodesically Linked Active Contours: Evolution Strategy Based on Minimal Paths
Julien Mille and Laurent D. Cohen
CEREMADE, UMR CNRS 7534, Université Paris IX-Dauphine, Place du Maréchal de Lattre de Tassigny, 75016 Paris, France
{mille,cohen}@ceremade.dauphine.fr
Abstract. The proposed method is related to parametric and geodesic active contours as well as minimal paths, in the context of image segmentation¹. Our geodesically linked active contour model consists of a set of vertices connected by paths of minimal cost. This makes up a closed piecewise defined curve, over which an edge or region energy functional is formulated. The greedy algorithm is used to move vertices towards a configuration minimizing the energy functional. This evolution technique ensures lower sensitivity to erroneous local minima than usual gradient descent of the energy. Our method intends to take advantage of explicit active contours, minimal paths and greedy evolution techniques.
1 Introduction
Among well-known variational models for image segmentation, active contours have drawn lively interest since their introduction by Kass et al. [1]. Their key principle is the search for a curve minimizing an energy functional, which mainly depends on the adequacy of the curve to the target object. Active contours are implemented either with a parametric curve, in which case they are often referred to as 'snakes', or in an implicit fashion based on the level set framework [2] [3]. Early active contour models are mainly parametric and boundary-based, as the data term of the energy functional is an edge indicator function integrated along the curve. The Euler-Lagrange equation, determined by calculus of variations, indicates the minimizing flow to be followed by a gradient descent scheme. These models are dependent on curve parameterization and unable to adapt their topology. Moreover, gradient descent is sensitive to local minima of the energy functional. Parameterization invariance is achieved by the geodesic active contour model [4], which introduces a geometrically intrinsic functional, whereas topology adaptiveness is provided by the level set implementation.
Significant attempts have been made to decrease the sensitivity to local minima, based either on the gradient descent direction or on the minimization method itself. The balloon force [5] falls into the first category, as it adds a normal-oriented inflation or retraction component, in order to increase the capture range of the snake. As regards the evolution process, several heuristics based on local searches have been proposed as alternatives to gradient descent, including dynamic programming [6] [7] and the greedy algorithm [8] [9]. The latter, which is subsequently addressed in the paper, considers the energy as a sum of curve point energies. It basically consists in iteratively moving curve points to locations minimizing their own energies, these locations belonging to a search window. On the other hand, the minimal path approach by Cohen and Kimmel [10], which seeks a curve of minimal cost between two end-points, can be used to recover open and closed boundaries. It is closely related to the geodesic active contour with respect to the functional to be optimized, but has in addition the main benefit of finding a global minimum efficiently thanks to the Fast Marching technique [11].
In this paper, we deal with an explicit implementation of active contour, i.e. a discrete curve defined by control points, or vertices. The described method is related to both minimal paths and greedy search. Our geodesically linked active contour model is made up of a set of vertices connected by paths of minimal cost with respect to a boundary-based metric. We define search windows centered at each vertex and evolve vertices in a greedy fashion. Making a given vertex movable and keeping the other ones still, we consider every geodesically linked contour passing through the points in the window of the moving vertex. The vertex is finally moved to the location leading to the contour of smallest energy.
The motivation for this work resides in several points. Firstly, the minimal path approach alone can only find a minimizer of an edge functional, with one or several fixed input end-point(s). Conversely, our model is suitable for any energy functional, which we show by endowing it with edge-based or different region-based energies, including the minimal variance of the Chan and Vese model [12]. We believe that describing the curve with geodesics is pertinent whatever the energy functional is. Indeed, whether the functional holds edge, region and/or even shape prior terms, the major part of the final curve will be located on more or less salient edges. In comparison to snakes driven by gradient descent, the use of search windows significantly reduces sensitivity to erroneous local minima and energy weights tuning.
¹ This work was partially supported by ANR grant NanoGPSCellulaire ANR-05NANO-045-06.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 163–174, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Background
2.1 Parametric and Geodesic Active Contours
The active contour model is represented as a plane curve Γ with $C^2$ position vector $c(u) = [x(u)\ y(u)]$. Segmentation of an object of interest is performed by finding the curve minimizing an energy functional E, which has the general form:
$$E(\Gamma) = \int_0^1 L(c, c', c'')\,du \tag{1}$$
where L is usually made up of internal terms regularizing the curve and external terms attaching the curve to image data. According to calculus of variations, the following variational derivative vanishes if the curve is a local minimizer of E:
$$\frac{\delta E}{\delta \Gamma} = \frac{\partial L}{\partial c} - \frac{d}{du}\frac{\partial L}{\partial c'} + \frac{d^2}{du^2}\frac{\partial L}{\partial c''} \tag{2}$$
Curve evolution is usually performed by gradient descent, taking the opposite variational derivative as a descent direction. Given an image I defined over $D \subset \mathbb{R}^2$, boundary-based models use an edge indicator g, which is a decreasing function of gradient magnitude (the image is usually convolved with the derivative of a Gaussian). The original parametric snake [1] has the following energy and variational derivative:
$$L_{snake}(c, c', c'') = \frac{1}{2}\left( \alpha \|c'\|^2 + \beta \|c''\|^2 \right) + g(c), \qquad \frac{\delta E_{snake}}{\delta \Gamma} = -\alpha c'' + \beta c'''' + \nabla g \tag{3}$$
The energy functional of the snake is dependent on parameterization. This has an impact on discretization, since the energy varies in terms of sampling when the contour is implemented as a polygonal curve. The geodesic active contour (GAC) [4] solves the parameterization issue by introducing an intrinsic energy functional, weighting the edge indicator by the length element $\|c'\|$:
$$L_{GAC}(c, c', c'') = g(c)\,\|c'\|, \qquad \frac{\delta E_{GAC}}{\delta \Gamma} = \left( \langle \nabla g, n \rangle - \kappa g \right) n \tag{4}$$
where n and κ are the unit inward normal vector and curvature, respectively. Hence, the flow resulting from the geometric energy also holds a regularization term. This model lends itself to level set implementation, allowing topology changes. Boundary-based models driven by gradient descent, whether parametric or geodesic, are relatively blind to neighboring structures and may get trapped in local minima induced by noise. To increase the capture range, the balloon force was introduced in [5] for parametric contours, whereas an advection term is used in [3] for level sets. Despite such techniques, gradient descent may still cause the contour to miss or pass through significant boundaries. The minimal path method addresses this issue by finding a global minimum of the energy.
2.2 Minimal Paths
The minimal path approach by Cohen and Kimmel [10] aims at finding curves of minimal length in a Riemannian space endowed with a heterogeneous isotropic metric. The length of path C is:
$$L(C) = \int_0^1 \tilde{P}(C(s))\,ds = \int_0^1 \tilde{P}(C(u))\,\|C'(u)\|\,du \tag{5}$$
where s is the arc length. Potential $\tilde{P}$, which defines the isotropic metric, should be chosen according to the application. Curves located on image boundaries are detected by using an edge-dependent potential $\tilde{P}(x) = w + g(x)$, where w is a regularizing constant. Hence the cost of C may be rewritten using Euclidean length:
$$L(C) = \int_0^1 \left( w + g(C(s)) \right) ds = w\,L_{euclidean}(C) + \int_0^1 g(C(s))\,ds \tag{6}$$
With respect to the energy functional to be minimized, the minimal path approach is similar to the geodesic active contour model, as can be seen in term $L_{GAC}$ of eq. 4. However, the minimal path has the advantageous difference of reaching the global minimum of the energy, given two fixed end-points $x_0$ and $x_1$. Starting from point $x_0$, the minimal action map $U_0$ should be calculated. It corresponds to the minimal cost integrated along a path starting at $x_0$ and ending at x:
$$U_0(x) = \inf_C L(C) \quad \text{s.t.} \quad C(0) = x_0,\ C(1) = x$$
The action map $U_0$ is the viscosity solution of the Eikonal equation $\|\nabla U\| = \tilde{P}$ with initial condition $U(x_0) = 0$. This allows $U_0$ to be computed by the Fast Marching method [11], which is similar in principle to Dijkstra's graph search algorithm. Once the action map has been computed, the geodesic γ, i.e. the path of minimal action linking a point $x_1$ to $x_0$, is found by back-propagation starting from $x_1$ until $x_0$ is reached: $\gamma' = -\nabla U_0(\gamma)$. In its initial formulation, the minimal path method determines an open curve between two fixed end-points. It is also able to find closed contours by providing only one point on the final contour and detecting a saddle point on the minimal action map [10].
2.3 Greedy Algorithm
Along with dynamic programming [6] [7], greedy methods deal with discrete energy functionals. The greedy algorithm for active contours, as developed in [8], seeks a minimizer of the energy by means of a set of local optimizations. It is only applicable to explicit implementations, where the contour is represented as a polygon with n vertices $\{v_i\}_{1 \le i \le n}$. The total energy is considered as a sum of vertex energies:
$$E(\Gamma) = \sum_{i=1}^{n} E_{vertex}(v_i)$$
where $E_{vertex}$ is the discretization of the energy at a given vertex, using finite differences. Considering the snake term $L_{snake}$ of eq. 3, we obtain:
$$E_{vertex}(v_i) = \frac{1}{2}\left( \alpha \|v_i - v_{i-1}\|^2 + \beta \|v_{i+1} - 2v_i + v_{i-1}\|^2 \right) + g(v_i) \tag{7}$$
Vertices are successively moved in order to minimize their own energies. At each iteration, a square window of width m is considered around the current vertex. The energy of the latter is computed at each tested position $\tilde{v}_i$ in the window. The vertex is then moved to the position leading to the lowest energy, which is summarized by the evolution scheme, at iteration t:
$$v_i^{(t+1)} = \arg\min_{\tilde{v}_i \in W(v_i^{(t)})} E_{vertex}(\tilde{v}_i)$$
where W(x) is the window centered at point x. The initial greedy algorithm [8] performs in $O(nm^2)$ operations. The window size has an obvious impact on computational cost, but also on convergence abilities. Indeed, the contour can capture farther structures as the window is larger. The greedy algorithm is by essence a discrete optimization heuristic. The formulation of the variational derivative is not used and continuous calculus of variations is thus not necessary.
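The greedy evolution scheme can be illustrated with a small sketch. It assumes integer pixel coordinates, a closed polygon stored as a list of `(x, y)` tuples, and an edge indicator `g` supplied as a Python callable; these choices and all function names are assumptions of the sketch rather than details taken from [8].

```python
def vertex_energy(prev, cur, nxt, g, alpha, beta):
    """Discrete snake energy of eq. 7 at one vertex."""
    first = (cur[0] - prev[0]) ** 2 + (cur[1] - prev[1]) ** 2
    second = ((nxt[0] - 2 * cur[0] + prev[0]) ** 2
              + (nxt[1] - 2 * cur[1] + prev[1]) ** 2)
    return 0.5 * (alpha * first + beta * second) + g(cur)


def greedy_step(vertices, g, alpha, beta, m=3):
    """One sweep over all vertices; returns how many vertices moved."""
    n, half = len(vertices), m // 2
    moved = 0
    for i in range(n):
        prev, nxt = vertices[i - 1], vertices[(i + 1) % n]   # closed contour
        best = vertices[i]
        best_e = vertex_energy(prev, best, nxt, g, alpha, beta)
        # Test the m x m positions of the square window centered at the vertex.
        for dx in range(-half, half + 1):
            for dy in range(-half, half + 1):
                cand = (vertices[i][0] + dx, vertices[i][1] + dy)
                e = vertex_energy(prev, cand, nxt, g, alpha, beta)
                if e < best_e:                 # move only on strict decrease
                    best, best_e = cand, e
        if best != vertices[i]:
            vertices[i] = best
            moved += 1
    return moved
```

A typical driver repeats `greedy_step` until it returns 0. Each sweep tests $m^2$ positions for each of the $n$ vertices, which matches the $O(nm^2)$ cost stated above.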
3 The Geodesically Linked Active Contour
We develop an approach taking advantage of the methods described above. Our geodesically linked active contour is based simultaneously on an explicit implementation of active contours, minimal paths and the greedy algorithm. Basically, we deal with an evolving explicit closed curve, allowing initialization inside or around the target object without providing fixed points. Minimal paths coupled with a geometric energy functional allow a parameterization-free handling of the contour. The use of the greedy algorithm, as opposed to gradient descent, provides better robustness to local minima.
3.1 Minimal Paths to Connect Vertices
Let us consider a set of n linked vertices $S = \{v_i\}_{1 \le i \le n}$. We denote as $\gamma_i(u) = [x_i(u)\ y_i(u)]$ the geodesic path connecting $v_i$ to $v_{i+1}$:
$$\gamma_i = \arg\min_C L(C) \quad \text{s.t.} \quad C(0) = v_i,\ C(1) = v_{i+1} \tag{8}$$
where the cost functional L is defined in eq. 5. At every step of the evolution algorithm, the set of geodesics $\{\gamma_i\}_{1 \le i \le n}$ describes a closed piecewise differentiable contour Γ, whose Euclidean length is:
$$L_{euclidean}(\Gamma) = \sum_{i=1}^{n} \int_0^1 \|\gamma_i'(u)\|\,du$$
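As a rough illustration of eq. 8 on a pixel grid, the sketch below computes a minimal-cost path between two vertices with Dijkstra's algorithm, which section 2.2 describes as similar in principle to Fast Marching. It is a stand-in and not the Fast Marching method itself: paths are restricted to 4-connected grid moves, so with a uniform potential the result is a staircase path rather than a straight line. The edge cost (mean potential of the two pixels) and all names are assumptions of the sketch.

```python
import heapq


def minimal_path(potential, start, goal):
    """Dijkstra search on a 4-connected grid, stopped once `goal` is reached.

    `potential` is a list of rows of positive costs; `start` and `goal` are
    (row, col) tuples. Returns the path from start to goal and its cost.
    """
    rows, cols = len(potential), len(potential[0])
    dist = {start: 0.0}                  # discrete action map U
    pred = {}                            # predecessors for back-propagation
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:               # early stop: goal has its final cost
            break
        if d > dist[(r, c)]:
            continue                     # outdated heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 0.5 * (potential[r][c] + potential[nr][nc])
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    pred[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path = [goal]
    while path[-1] != start:             # walk back from goal to start
        path.append(pred[path[-1]])
    return path[::-1], dist[goal]
```

Stopping the search as soon as the goal is extracted avoids visiting the whole grid when the two vertices are close to each other.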
One may note that a concatenation of geodesics $\gamma_i$ is not a geodesic itself, since it is forced to pass through given points. To some extent, curve Γ may be considered as a piecewise minimizer of an edge-based functional. If a uniform potential $\tilde{P}(x) = 1$ was chosen, the geodesics would become straight lines of equation $\gamma_i(u) = (1 - u)v_i + u v_{i+1}$, $u \in [0, 1]$, in which case Γ would represent a polygon. Fig. 1 depicts geodesically linked contours with uniform potential and edge-based potential (dark smooth lines represent high image gradient areas). As described in section 2.2, path $\gamma_i$ is determined by gradient descent of the minimal action map $U_{i+1}$ of origin $v_{i+1}$. Given start point $v_{i+1}$, the Fast Marching algorithm [11] allows one to specify one or several end points (in our case $v_i$) so that propagation can be stopped when $v_i$ is reached. This prevents the whole image from being visited by the Fast Marching and saves computational time.
Fig. 1. Vertices linked by geodesics with uniform potential (a) and edge-based potential (b)
In the case of edge-based segmentation, the interest of describing the evolving contour with geodesics is obvious. Indeed, at the end of the deformation, the geodesics fit the actual boundaries of the sought object. On the other hand, in the case of region-based segmentation, image edges are not explicitly searched. However, we believe that linking vertices with geodesics is relevant for any usual segmentation criterion. We may assume that the final contour should be partially located on more or less salient boundaries, whatever energy functional is optimized. In subsequent sections, we formulate three energies independently implemented on the geodesically linked active contour, namely the edge, region and narrow band region energies. First, we recall Green's theorem, which we use to convert domain integrals into boundary integrals. For every region R and real-valued function f over $\mathbb{R}^2$, we have:
$$\iint_R f(x)\,dx = \oint_{\partial R} P\,dx + Q\,dy \tag{9}$$
where $[P(x)\ Q(x)]$ is a continuously differentiable vector field such that:
$$Q(x, y) = \frac{1}{2}\int_{-\infty}^{x} f(t, y)\,dt, \qquad P(x, y) = -\frac{1}{2}\int_{-\infty}^{y} f(x, t)\,dt \tag{10}$$
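A small numerical sketch may help to see eq. 9 at work. For $f = 1$ the boundary integral becomes $\frac{1}{2}\oint x\,dy - y\,dx$, and on a straight segment from $v_i$ to $v_{i+1}$ the integrand is constant, so each segment contributes $\frac{1}{2}(x_i y_{i+1} - x_{i+1} y_i)$. The sketch assumes the contour is given as a list of sampled points joined by straight segments; the function name is hypothetical.

```python
import math


def enclosed_area(points):
    """Signed area 0.5 * sum(x_i * y_{i+1} - x_{i+1} * y_i) of a closed polyline.

    The sign is positive for counter-clockwise orientation.
    """
    n = len(points)
    return 0.5 * sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1]
                     for i in range(n))


# Usage: a finely sampled circle of radius 2 has area close to 4 * pi.
circle = [(2 * math.cos(2 * math.pi * k / 1000),
           2 * math.sin(2 * math.pi * k / 1000)) for k in range(1000)]
area = enclosed_area(circle)
```

For a geodesically linked contour, the same sum could be applied to the concatenated samples of all geodesics, under the assumption that each geodesic is stored as an ordered list of points.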
The theorem requires ∂R to be at least piecewise smooth; it is thus applicable to the geodesically linked active contour. For instance, to express the area of region $R_{in}$ enclosed by Γ, we consider eq. 9 with f(x) = 1:
$$|R_{in}| = \frac{1}{2}\sum_{i=1}^{n} \int_0^1 x_i(u)\,y_i'(u) - x_i'(u)\,y_i(u)\,du$$
3.2 Edge Energy
Boundary-based segmentation is performed by minimizing an edge energy. The edge indicator function g is integrated along geodesics. In order not to penalize lengthy contours, the edge energy is normalized by Euclidean length:
$$E_{edge}(\Gamma) = \frac{1}{L_{euclidean}(\Gamma)} \sum_{i=1}^{n} \int_0^1 g(\gamma_i(u))\,\|\gamma_i'(u)\|\,du$$
Note that according to eq. 6, the integral of g along $\gamma_i$ equals $U_{i+1}(v_i)$ minus the weighted Euclidean length $w\,L_{euclidean}(\gamma_i)$. Hence, once the action maps have been computed, the edge indicator does not need to be summed over geodesics again. With the edge energy alone, if the search space of vertex coordinates is too small, the contour fails at capturing actual boundaries when initialized far from them. To increase the capture range, we add an area-dependent term, whose minimization acts like a balloon force [5]:
$$E_{balloon}(\Gamma) = \frac{|D| - |R_{in}|}{|D|}$$
where |D| is the image area. In that case, the total energy is a weighted sum of edge and balloon energies.
3.3 Region Energy
The increasing use of region terms has proven to overcome limitations of edge-based only models, especially when dealing with data sets suffering from noise and lack of contrast between neighboring structures. Classical region-based deformable models segment images according to statistical data computed over the object of interest and the background. Image partitions should be uniform in terms of pixel intensities or higher level features like texture descriptors. We rely on the intensity variance, which is close to the two-phase Mumford-Shah segmentation model by Chan and Vese [12]. The average intensity in the inner region is expressed using Green's theorem:
$$\mu(R_{in}) = \frac{1}{|R_{in}|} \iint_{R_{in}} I(x)\,dx = \frac{1}{|R_{in}|} \sum_{i=1}^{n} \int_0^1 x_i'\,P(\gamma_i) + y_i'\,Q(\gamma_i)\,du$$
where P and Q are the summed intensities (see template formulas in eq. 10). Then, the inner intensity variance is:
$$\sigma^2(R_{in}) = \frac{1}{|R_{in}|} \iint_{R_{in}} \left( I(x) - \mu(R_{in}) \right)^2 dx = \frac{1}{|R_{in}|} \iint_{R_{in}} I^2(x)\,dx - \mu(R_{in})^2$$
where the integral of squared intensities may also be expanded according to Green's theorem. Corresponding quantities on the outer region may be expressed using the relation
$$\iint_{R_{out}} f(x)\,dx = \iint_{D} f(x)\,dx - \iint_{R_{in}} f(x)\,dx$$
3.4 Narrow Band Region Energy
The ideal case of uniform regions is rarely encountered in real applications, as the background usually contains structures of various intensities. Hence, strict homogeneity is not necessarily a desirable property. In order to account for spatially varying intensity, local statistics in region-based segmentation have emerged recently [13] [14]. The narrow band principle, which has proven its efficiency in the evolution of level sets [3], is used in our approach to formulate a local region term [15].
Fig. 2. Inner and outer bands for narrow band region energy (labels in the figure: Γ, Γ[−B], Γ[B], Bin, Bout)
Instead of dealing with whole domains Rin and Rout, we consider an inner band Bin and an outer band Bout in the vicinity of the contour, as depicted in fig. 2. The narrow band region energy is the intensity variance over the bands:
\[ E_{\mathrm{band}}(\Gamma) = \sigma^2(B_{\mathrm{in}}) + \sigma^2(B_{\mathrm{out}}) \]
Our narrow band region energy is based on parallel curves [16]. We define curve $\gamma_{[B]i}$ as a parallel curve of $\gamma_i$:
\[ \gamma_{[B]i}(u) = \gamma_i(u) + B\,n_i(u) \tag{11} \]
where B is the user-defined band thickness, constant along the curve, and $n_i$ is the inward unit normal to geodesic $\gamma_i$. Hereafter, we will use the index [B] to denote all quantities related to the parallel curve. Bands Bin and Bout are bounded by parallel curves of the n geodesics $\gamma_i$, respectively $\gamma_{[B]i}$ and $\gamma_{[-B]i}$. We assume that geodesics are smooth enough so that their parallel curves do not self-intersect nor exhibit singularities. An important property resulting from the definition in eq. 11 is that the velocity vector of the parallel curve can be expressed as a function of the velocity vector of the initial curve, as well as its curvature and normal. Using the identity $n_i' = -\kappa_i \gamma_i'$, we have:
\[ \gamma_{[B]i}' = \gamma_i' + B\,n_i' = (1 - B\kappa_i)\,\gamma_i' \tag{12} \]
By a change of variable, an integral over inner band Bin may be expressed explicitly in terms of the curve and band thickness:
\[ \int_{B_{\mathrm{in}}} f(\mathbf{x})\,d\mathbf{x} = \sum_{i=1}^{n} \int_0^1 \int_0^B f(\gamma_i + b\,n_i)\, \|\gamma_i'\|\, (1 - b\kappa_i)\, db\, du \tag{13} \]
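As a sanity check of eqs. 11 to 13, the following worked example considers a special case that is our own choice and does not come from the experiments: a single circle (n = 1) of radius r > B, traversed counter-clockwise, with f = 1. Under these assumptions the band integral should reduce to the area of an annulus.

```latex
\begin{align*}
\gamma(u) &= r\,(\cos 2\pi u,\ \sin 2\pi u), \qquad
n(u) = -(\cos 2\pi u,\ \sin 2\pi u), \qquad \kappa = \tfrac{1}{r},\\
n'(u) &= 2\pi\,(\sin 2\pi u,\ -\cos 2\pi u) = -\kappa\,\gamma'(u),\\
\gamma_{[B]}(u) &= \gamma(u) + B\,n(u) = (r - B)\,(\cos 2\pi u,\ \sin 2\pi u),
\qquad \gamma_{[B]}'(u) = (1 - B\kappa)\,\gamma'(u),\\
|B_{\mathrm{in}}| &= \int_0^1\!\!\int_0^B \|\gamma'(u)\|\,(1 - b\kappa)\,db\,du
 = 2\pi r\left(B - \frac{B^2}{2r}\right) = \pi\left(r^2 - (r - B)^2\right).
\end{align*}
```

The last line is the area between the circles of radii r and r − B, as expected for the inner band.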
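We use the template formula in eq. 13 to express the mean and variance of intensities in the inner band:
\[ \mu(B_{\mathrm{in}}) = \frac{1}{|B_{\mathrm{in}}|} \sum_{i=1}^{n} \int_0^1 \int_0^B I(\gamma_i + b\,n_i)\, \|\gamma_i'\|\, (1 - b\kappa_i)\, db\, du \]
\[ \sigma^2(B_{\mathrm{in}}) = \frac{1}{|B_{\mathrm{in}}|} \sum_{i=1}^{n} \int_0^1 \int_0^B \left(I(\gamma_i + b\,n_i) - \mu(B_{\mathrm{in}})\right)^2 \|\gamma_i'\|\, (1 - b\kappa_i)\, db\, du \]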
and similarly for the outer band, replacing b with −b.
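The band statistics can be approximated numerically. The sketch below is a minimal midpoint-rule discretisation of the template formula in eq. 13 for a single closed curve (n = 1). The function names, the callable-based curve description and the circle used as a test curve are all assumptions made for illustration; the variance is computed through the equivalent form E[I²] − μ².

```python
import math

def band_statistics(intensity, gamma, dgamma, normal, curvature, B, nu=400, nb=20):
    """Midpoint-rule approximation of area, mean and variance over the inner band.

    gamma, dgamma, normal and curvature are callables of u in [0, 1] describing
    one closed curve; intensity is a callable I(x, y). All names are illustrative.
    """
    du, db = 1.0 / nu, B / nb
    area = s1 = s2 = 0.0
    for iu in range(nu):
        u = (iu + 0.5) * du
        (x, y), (dx, dy) = gamma(u), dgamma(u)
        (nx, ny), kappa = normal(u), curvature(u)
        speed = math.hypot(dx, dy)                      # ||gamma'(u)||
        for ib in range(nb):
            b = (ib + 0.5) * db
            w = speed * (1.0 - b * kappa) * db * du     # area element of eq. 13
            val = intensity(x + b * nx, y + b * ny)     # I(gamma + b n)
            area += w
            s1 += w * val
            s2 += w * val * val
    mean = s1 / area
    return area, mean, s2 / area - mean * mean

def make_circle(r):
    """Counter-clockwise circle of radius r with inward unit normal (test curve)."""
    two_pi = 2.0 * math.pi
    gamma = lambda u: (r * math.cos(two_pi * u), r * math.sin(two_pi * u))
    dgamma = lambda u: (-two_pi * r * math.sin(two_pi * u), two_pi * r * math.cos(two_pi * u))
    normal = lambda u: (-math.cos(two_pi * u), -math.sin(two_pi * u))
    curvature = lambda u: 1.0 / r
    return gamma, dgamma, normal, curvature
```

For the outer band, the same routine could be called with the normal negated and the sign of the curvature term flipped, which corresponds to replacing b with −b.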
3.5 Evolution with Greedy Algorithm
Vertices should be moved in order to minimize the selected energy. This is usually performed with gradient descent of the Euler-Lagrange equation. In our case, it is difficult to differentiate Eedge, Eregion or Eband with respect to a given vertex vi, since these energies depend on geodesics to vi (see eq. (8)). The greedy algorithm presented in section 2.3 provides a way to evolve vertices without differentiating the energy. Motion of curve points can always be decomposed into normal and tangential components. While the geometry of the curve is modified by normal displacements, tangential motion only affects curve parameterization [4]. Since the distribution of vertices along the contour can be updated with a resampling technique, we only consider normal displacement in the greedy evolution. We define a normal-oriented window $W_N$ of length m centered at vertex $v_i$:
\[ W_N(v_i) = \left\{ v_i + k\,n_{v_i} \;\middle|\; k \in \left[-\frac{m}{2}, \frac{m}{2}\right] \right\} \]
where $n_{v_i}$ is the inward unit normal vector, estimated by finite difference using the second and next-to-last points of geodesics $\gamma_i$ and $\gamma_{i+1}$, respectively. Since steps between successive points in the window are integers, the window may be computed using a Bresenham-like algorithm.
Fig. 3. Geodesics linking neighboring vertices to points in search window
Greedy evolution is performed by moving vertex $v_i$ to the position in the window whose corresponding geodesically linked contour has the smallest energy. Let us consider a test position $\tilde{v}_i$ belonging to the window. The associated geodesics $\tilde{\gamma}_{i-1}$ and $\tilde{\gamma}_i$ link it to the neighbors of $v_i$, as depicted in fig. 3. The energy of the corresponding geodesically linked contour $\tilde{\Gamma} = \{\gamma_1, ..., \gamma_{i-2}, \tilde{\gamma}_{i-1}, \tilde{\gamma}_i, \gamma_{i+1}, ..., \gamma_n\}$ is computed and compared to the energy of the initial contour Γ. All window points are tested in this way, so that the evolution scheme at iteration t is:
\[ v_i^{(t+1)} = \arg\min_{\tilde{v}_i \in W_N\left(v_i^{(t)}\right)} E(\tilde{\Gamma}) \]
where E is one of the previously described energies. If we consider set H = {1, ..., i − 2} ∪ {i + 1, ..., n} holding indices of geodesics not influenced by a modification on $v_i$, all quantities involved in the energies are written with constant and variable parts with respect to $\tilde{v}_i$. For instance, the area of the tested inner region is decomposed:
\[ |\tilde{R}_{\mathrm{in}}| = \frac{1}{2} \sum_{j \in H} \int_0^1 x_j(u)\,y_j'(u) - x_j'(u)\,y_j(u)\,du + \frac{1}{2} \int_0^1 \tilde{x}_{i-1}\,\tilde{y}_{i-1}' - \tilde{x}_{i-1}'\,\tilde{y}_{i-1}\,du + \frac{1}{2} \int_0^1 \tilde{x}_i\,\tilde{y}_i' - \tilde{x}_i'\,\tilde{y}_i\,du \]
This implies that the part of the energies invariant with respect to $\tilde{v}_i$ needs to be computed only once, before moving $v_i$. Finally, once all vertices have been treated, resampling may be performed to maintain consistent distribution of vertices along the curve.
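The constant/variable decomposition and the window search can be illustrated on a simplified model. The sketch below is not the method of the paper: it assumes straight segments between vertices instead of geodesics, a counter-clockwise polygon, and the balloon energy alone as the objective. The names `cross`, `polygon_area`, `inward_normal` and `greedy_step` are hypothetical helpers introduced for this example.

```python
import math

def cross(p, q):
    """Shoelace term x_p*y_q - x_q*y_p (twice the signed area of triangle 0, p, q)."""
    return p[0] * q[1] - q[0] * p[1]

def polygon_area(vertices):
    """Signed area of a closed polygon by Green's theorem (positive if counter-clockwise)."""
    n = len(vertices)
    return 0.5 * sum(cross(vertices[j], vertices[(j + 1) % n]) for j in range(n))

def inward_normal(vertices, i):
    """Unit normal at vertex i from a central difference of its two neighbours."""
    n = len(vertices)
    (x0, y0), (x1, y1) = vertices[i - 1], vertices[(i + 1) % n]
    tx, ty = x1 - x0, y1 - y0
    norm = math.hypot(tx, ty)
    return (-ty / norm, tx / norm)   # tangent rotated by +90 degrees: inward for CCW order

def greedy_step(vertices, i, m, domain_area):
    """Test every integer offset k in [-m/2, m/2] along the normal at vertex i."""
    n = len(vertices)
    prev, nxt = vertices[i - 1], vertices[(i + 1) % n]
    # Part of the area that does not depend on vertex i: computed once.
    constant = polygon_area(vertices) - 0.5 * (cross(prev, vertices[i]) + cross(vertices[i], nxt))
    nx, ny = inward_normal(vertices, i)
    best, best_energy = vertices[i], float("inf")
    for k in range(-(m // 2), m // 2 + 1):
        cand = (vertices[i][0] + k * nx, vertices[i][1] + k * ny)
        # Only the two segments adjacent to the candidate are re-evaluated.
        area = constant + 0.5 * (cross(prev, cand) + cross(cand, nxt))
        energy = (domain_area - area) / domain_area      # balloon energy only
        if energy < best_energy:
            best, best_energy = cand, energy
    return best, best_energy
```

With the balloon term alone, the best candidate is always the outermost window point; in the model described above this term is combined with an edge term, which is what stops the expansion at boundaries.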
4 Experiments
We tested the geodesically linked active contour with the three different energy configurations (edge+balloon, region and narrow band region). A comparison with a parametric snake endowed with the same energies is provided. The snake was initialized as a small circle inside the area of interest, far from the target boundaries. Similarly, the initial vertices of our model are sampled on a circle. Results are shown in fig. 4. For each row, columns (a) and (b) represent the initial and final states of the geodesically linked active contour, respectively. Columns (c) and (d) represent the same states for the snake. For all experiments, the regularization weight w was set to 0.25, which achieved sufficient regularization for all tested images. The size of the window was m = 50 and the maximal inter-vertex distance for resampling was set to 20.

Fig. 4. Segmentation of left ventricle: initialization (a) and final location (b) of the geodesically linked active contour, initialization (c) and final location (d) of the parametric contour

The image in row 1, which was segmented using the edge energy, illustrates the gap-closing ability of the model. The geodesically linked active contour managed to pass through false edges and reach actual boundaries. Thanks to the large search window, it turned out to be rather insensitive to balloon strength, as values for coefficient α in the range [0.1, 4] were suitable. On the other hand, the balloon coefficient has a strong influence on the gradient descent-driven parametric snake, which makes parameter tuning difficult. In fact, we could not find a balloon weight that allowed the snake to jump over false edges while stabilizing on real ones.

The image in row 2, which was segmented using the region energy, shows a similar phenomenon. The geodesically linked contour does not get trapped in small gaps in the region, which could be of interest for the segmentation of partially occluded objects.

Row 3 depicts an MRI of the left ventricle of the heart, which was used to put the narrow band region energy into application. The band thickness B is an important parameter. Apart from its impact on the algorithmic complexity (computing intensity means and variances on the bands takes at least O(nB) operations), it controls the trade-off between local and global features around the object. If B = 1, the region energy is as local as an edge term. The main image property having an effect on the minimal band thickness is edge sharpness: the deformable curve needs a larger band when the boundaries of the target object are fuzzy. However, B = 10 was a suitable value in our experiments. Note that we depict the state of the parametric snake before self-collision. One may note that an unconstrained region-based level set method would also properly segment the images in rows 2 and 3. However, this remark should be tempered by the fact that our model is dedicated to applications where topology preservation is needed.
5 Conclusion and Perspectives
We proposed the geodesically linked active contour model for image segmentation. The model relies on an explicitly implemented curve moved by an evolution method based on minimal paths and a greedy algorithm. Linking curve points with geodesics solves parameterization issues and allows the contour to fit the most salient boundaries at every step of deformation. Displacing vertices according to a greedy search ensured lower sensitivity to erroneous local minima than usual gradient descent of the energy. The model was endowed with edge and region energies and was validated on a few datasets. Further work may focus on developing an adaptive search window for greedy evolution. Currently, the window length is constant whatever the values of energies or the previous positions of vertices. We believe the algorithm could be improved by adapting the window length with respect to these properties, in order to avoid visiting positions that are unlikely to decrease the energy.
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
2. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
3. Malladi, R., Sethian, J., Vemuri, B.: Shape modeling with front propagation: a level set approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(2), 158–175 (1995)
4. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
5. Cohen, L.: On active contour models and balloons. Computer Vision, Graphics, and Image Processing: Image Understanding 53(2), 211–218 (1991)
6. Amini, A., Weymouth, T., Jain, R.: Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(9), 855–867 (1990)
7. Geiger, D., Gupta, A., Costa, L.A., Vlontzos, J.: Dynamic programming for detecting, tracking, and matching deformable contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(3), 294–302 (1995)
8. Williams, D., Shah, M.: A fast algorithm for active contours and curvature estimation. Computer Vision, Graphics, and Image Processing: Image Understanding 55(1), 14–26 (1992)
9. Sakalli, M., Lam, K.M., Yan, H.: A faster converging snake algorithm to locate object boundaries. IEEE Transactions on Image Processing 15(5), 1182–1191 (2006)
10. Cohen, L., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24(1), 57–78 (1997)
11. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences 93(4), 1591–1595 (1996)
12. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
13. Piovano, J., Rousson, M., Papadopoulo, T.: Efficient segmentation of piecewise smooth images. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 709–720. Springer, Heidelberg (2007)
14. Lankton, S., Tannenbaum, A.: Localizing region-based active contours. IEEE Transactions on Image Processing 17(11), 2029–2039 (2008)
15. Mille, J., Boné, R., Cohen, L.: Region-based 2D deformable generalized cylinder for narrow structures segmentation. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 392–404. Springer, Heidelberg (2008)
16. Farouki, R., Neff, C.: Analytic properties of plane offset curves. Computer Aided Geometric Design 7(1-4), 83–99 (1990)
Validation of Watershed Regions by Scale-Space Statistics
Tomoya Sakai and Atsushi Imiya
Institute of Media and Information Technology, Chiba University, Japan
{tsakai,imiya}@faculty.chiba-u.jp
Abstract. This paper shows a potential use of scale space for statistical validation of watershed regions of a greyscale image. The watershed segmentation has difficulty in distinguishing valid watershed regions associated with real structures of the image from invalid random regions due to background noise. In this paper, a hierarchy of watershed regions is established by following the merging process of the regions in a Gaussian scale space. The distribution of annihilation scales (lives) of the regional minima is investigated to statistically judge the regions as being valid or not. Recursive validation using the hierarchy prevents oversegmentation due to the randomness.
1 Introduction
The aim of this study is to develop a statistical validation scheme for segmentation of a greyscale image. If we do not have a priori knowledge of the shapes or structures of objects in the image, topographic features of the greyscale image, and the watersheds in particular, are useful for unsupervised image segmentation. A well-known phenomenon in the watershed segmentation is oversegmentation, that is, producing a large number of undesired tiny regions. Since the undesired watershed regions are mainly caused by noise in the image, it is desirable to settle the oversegmentation problem by taking account of statistical properties of the randomness.

There is a body of literature dealing with the oversegmentation problem of watersheds [1,2,3,4,5,6,7,8]. In the antecedent work, most schemes for preventing the oversegmentation attempt to hierarchically merge the oversegmented regions on the basis of similarity between adjacent regions measured by the MDL [3], colour distance [8], and so on. Diffusion-based multiscale image representations are also used for merging the regions [5, 6, 8], since the scale space theory [9,10,11,12,13,14,15] mathematically underpins topological relationships among the topographic features without a priori knowledge about them. The oversegmentation can be reduced by selecting levels in the hierarchy of regions, or by setting a lower bound on the scale, above and below which the watersheds are valid and invalid, respectively.

In this paper, we show that the scale-space treatment of the image is also useful for the statistical analysis of the random watershed regions. The validity of a watershed region can be quantified in terms of the statistical confidence of distinguishing it from invalid watershed regions due to randomness. We present a fully unsupervised watershed segmentation algorithm, in which the watershed regions are recursively validated according to their hierarchical relationships in the scale space.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 175–186, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Watershed Segmentation with Variable Scale

2.1 Gaussian Scale Space
In the Gaussian scale-space theory [9,10,11,12,14,15,16], a one-parameter family of nonnegative functions is derived from a d-dimensional greyscale image $f(x)$, $x \in \mathbb{R}^d$:
\[ f(x, \sigma) = G(x, \sigma) * f(x) \tag{1} \]
Here, “∗” expresses d-dimensional convolution, and G(x, σ) is an isotropic Gaussian function with the scale σ:
\[ G(x, \sigma) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^d} \exp\left(-\frac{|x|^2}{2\sigma^2}\right) \tag{2} \]
We redefine the d-dimensional greyscale image and its scale-space representation in the extended real scale and space as follows.

Definition 1. A d-dimensional greyscale image is defined as a nonnegative scalar function $f(x)$, $x \in \bar{\mathbb{R}}^d$, with a finite net image intensity $\int_{x \in \bar{\mathbb{R}}^d} f(x)\,dx$.

Definition 2. The scale-space image $f(x, \sigma)$, $(x, \sigma) \in (\bar{\mathbb{R}}^d, \bar{\mathbb{R}}_+)$, is the convolution of the greyscale image f(x) with the isotropic Gaussian kernel G(x, σ).

Here, $\bar{\mathbb{R}}^d$ and $\bar{\mathbb{R}}_+$ denote the d-dimensional extended real space including a point at infinity and the extended real scale including an infinite scale, respectively. Although the domain of a greyscale image in practice is bounded within a limited area or volume, we embed such an image in the extended real scale space. The point at infinity will be theoretically used as a representative point of the background of the image in the watershed segmentation later.

2.2 Watershed Segmentation and Hierarchy of Regions
The watershed segmentation was derived from spatial partitioning on the basis of the drainage patterns of rainfall. As the topographic height map defines the boundaries of the catchment basins draining to the same lowest points, a two-dimensional greyscale image defines the watershed boundary curves enclosing regions with local minima when we regard the image intensity as the topographic height. For a d-dimensional image, the entire space is partitioned by (d − 1)-dimensional hypersurfaces into d-dimensional watershed regions. Each watershed region defined by a smooth function f(x) contains a unique local minimum, to which any point in the watershed region is connected by a gradient curve of f(x). In practice, the watershed segmentation of the gradient image |∇f(x)| is known to provide more intuitive partitions than that of the image f(x) itself [2, 5, 6, 8] because object boundaries in a scene may cause large spatial changes in the image intensity.

Simple computation of the watersheds of the images results in oversegmentation caused by tiny and insignificant catchment basins. As suggested in the antecedent work [3, 5, 6, 8], hierarchical relationships among the watershed regions are of great help for merging the oversegmented regions. We employ the scale-space framework to derive the hierarchy because the scale-space axioms are acceptable in general cases where no prior information about the similarities among the unexpected watershed regions is given.

If we apply the gradient watershed segmentation to the image f(x, σ) with the variable scale σ, we can observe the evolution of the watersheds with respect to scale. The catastrophe theory applied to the gradient watershed segmentation in the Gaussian scale space [5] shows that the gradient watershed regions of f(x, σ) may be generically annihilated, merged, created and split with increasing scale σ. Therefore, hierarchical watershed segmentation using multiscale representation of the image [2, 6, 8] is essentially the extraction of the hierarchical relationships among the watershed regions in the scale space through the generic events. Since every watershed region is represented by its local minimum, the trajectories of the regional minima in scale space describe the relationships among the regions. For the purpose of validation of the regions, we derive the hierarchy from all the traceable regional minima from the finest scale along their trajectories in scale space. We trace the trajectories by local minimisation at every level of scale [16].
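To make the objects of this section concrete, the following sketch computes a scale-space signal and the local minima of its squared gradient for a one-dimensional signal (d = 1), chosen only for brevity. It is an illustration under our own assumptions: the kernel is truncated at four standard deviations and renormalised, the signal is treated as zero outside its support, and all function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sampled, renormalised 1-D Gaussian G(x, sigma); truncation radius is an assumption."""
    radius = int(math.ceil(4 * sigma)) if radius is None else radius
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def scale_space(f, sigma):
    """Discrete version of f(x, sigma) = G(x, sigma) * f(x), with f = 0 outside its support."""
    if sigma <= 0:
        return list(f)
    g = gaussian_kernel(sigma)
    r = len(g) // 2
    out = []
    for x in range(len(f)):
        acc = 0.0
        for k, gk in enumerate(g):
            j = x + k - r                 # the kernel is symmetric, so this is a convolution
            if 0 <= j < len(f):
                acc += gk * f[j]
        out.append(acc)
    return out

def gradient_sq(f):
    """Squared central-difference gradient at the interior samples of f."""
    return [((f[x + 1] - f[x - 1]) / 2.0) ** 2 for x in range(1, len(f) - 1)]

def local_minima(values):
    """Indices of strict local minima (plateaus are ignored in this simple sketch)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]
```

Calling `local_minima(gradient_sq(scale_space(f, sigma)))` for increasing values of `sigma` gives, in this 1-D setting, the regional minima whose trajectories are followed across scale.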
In an annihilation or merging event, two regional minima and a saddle between them are involved. We regard one of these two regional minima as a child of the resulting regional minimum after the event. We trace only one of two local minima after a creation or splitting event because we are interested in the hierarchy of the regions at the finest scale. Note that the point at infinity is a local minimum which exists at any scale. The local minimum at infinity is the regional minimum of the image background because the rainfall in the background region is drained to this ideal point. The following algorithm RegionHierarchy traces every trajectory of the regional minimum from every pixel p ∈ P at σ = 0 until the regional minimum disappears or goes outside the image boundary toward the local minimum at infinity with increasing scale.

RegionHierarchy(set of pixel centres P, image f(p ∈ P))
 1  let G be a graph with card(P) + 1 nodes with the labels l = 0, ..., N where l = 0 represents the point at infinity;
 2  store σ^t_l = ∞ in all nodes of G;
 3  set σ_max to be the size of the convex hull of P;
 4  σ := 0;
 5  Q := P;
 6  while card(Q) ≠ 1 or σ < σ_max do
 7      Q' := Q;
 8      σ := σ + Δσ, where Δσ is a small value compared with the space intervals of the points Q;
 9      for each q_l ∈ Q do
10          update q_l by minimising |∇f(x, σ)|² with q_l as the initial position ¹;
11          if q_l is outside the convex hull of P then
12              connect the two nodes of G labelled 0 and l;
13          end if
14      end for
15      let L be a list of labels corresponding to the points in Q;
16      while card(L) ≠ 1 do
17          pop a label l from L;
18          n := NearestNeighbour(L, l);
19          if |q_l − q_n| < εσ, where q_l, q_n ∈ Q, and ε is the tolerance of minimisation then
20              if |q_l − q'_l| > |q_n − q'_n|, where q'_l, q'_n ∈ Q' then
21                  child := l and parent := n;
22              else
23                  child := n and parent := l;
24              end if
25              connect the two nodes of G labelled parent and child;
26              set σ^t_child := σ;
27              remove q_child from Q;
28          end if
29      end while
30  end while
31  return G.
The resulting graph G is a set of trees representing the hierarchy of the watershed regions of the gradient image. Any node in G represents a watershed region consisting of the pixels indicated by its subtree nodes. The annihilation or merging scale σ^t is stored at the node in G corresponding to the pixel p. We utilise bicubic spline interpolation [17] to search for the local minimum with subpixel precision in Step 10. The function NearestNeighbour in Step 18 searches for the point nearest to q_l in the set of points listed in L and returns its label. The annihilation or merging event is detected in Step 19, and the one of the two regional minima with the larger displacement is identified as the child in Step 20. Figure 1 shows an example of the trajectories of regional minima and the region hierarchy obtained by RegionHierarchy. Since the set of trees G expresses hierarchical relationships among the image pixels, any tree node with a scale σ > 0 represents a set of pixels constituting a watershed region.
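The merge-detection part of RegionHierarchy (Steps 15 to 29) can be sketched as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: positions are stored in dictionaries keyed by label, the nearest neighbour is found by brute force, and a label that becomes a child is also dropped from the working list, which is an assumption on our part. The function name `detect_merges` is hypothetical.

```python
import math

def detect_merges(Q, Q_prev, sigma, eps):
    """Detect annihilation/merging events at one scale level.

    Q and Q_prev map label -> (x, y) position at the current and previous scale.
    Returns a list of (parent, child, sigma) events; children are removed from Q.
    """
    events = []
    labels = list(Q)
    while len(labels) > 1:
        l = labels.pop()
        # Brute-force stand-in for NearestNeighbour(L, l).
        n = min(labels, key=lambda j: math.dist(Q[l], Q[j]))
        if math.dist(Q[l], Q[n]) < eps * sigma:
            moved_l = math.dist(Q[l], Q_prev[l])
            moved_n = math.dist(Q[n], Q_prev[n])
            # The minimum with the larger displacement becomes the child.
            child, parent = (l, n) if moved_l > moved_n else (n, l)
            events.append((parent, child, sigma))   # sigma is the life of the child
            del Q[child]
            if child == n:
                labels.remove(n)                    # assumption: a child is not tested again
    return events
```

Each returned triple corresponds to one edge added to G together with the life stored at the child node.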
¹ It is trivial that the watershed regions of the gradient magnitude squared |∇f|² are identical to those of the gradient magnitude |∇f|.
Fig. 1. Trajectories of regional minima and region hierarchy. (a) A noisy 96 × 96 image f(x) embedded in a dark background. (b) Gradient magnitude squared |∇f(x, σ = 20)|². The brighter the larger magnitude. (c) Trajectories of the regional minima in scale space (vertical axis: σ). The thick curves (blue) are the parts of the trajectories for σ > 5. The thin straight lines (red) are the edges of G between the nodes with σ > 5.
2.3 Scale Selection Problem
We need a criterion to select the scales or the tree levels in the hierarchy. One may expect that the watersheds of the image f(x, σ) at a small scale σ approximate the boundaries of the true image regions well. However, if noise spoils the fine structure of the image, the estimated watersheds at small scales are stochastic and experimentally less reproducible. The noise is suppressed at a large scale, but the watershed segmentation is poor in terms of detection ability and localisation: the edges of small watershed regions are smoothed out, and the boundary shapes of large regions are simplified. Since the randomness is the major cause of the oversegmentation problem in the watershed methods [1, 4, 5], the oversegmentation problem should be resolved in a statistical manner.
3 Validation of Watershed Regions

3.1 Valid Watershed Regions
Generally, a greyscale image expresses the spatial distribution of a measured physical quantity. The true image $f^{\mathrm{true}}(x)$, which we want to measure and apply the watershed segmentation to, is inevitably spoiled by random noise through the measurement. Therefore, the actual image f(x) presents valid watersheds related to those of the true image $f^{\mathrm{true}}(x)$ and invalid watersheds due to the randomness.

Assertion 1. A valid watershed region of an observed gradient image |∇f(x)| is related to one of the watershed regions of the true gradient image $|\nabla f^{\mathrm{true}}(x)|$.

Since the watershed regions are represented by the regional minima, the image f(x) has valid watershed regions of the gradient image |∇f(x)| iff the true gradient image $|\nabla f^{\mathrm{true}}(x)|$ has corresponding local minima. Contrapositively, iff $|\nabla f^{\mathrm{true}}(x)|$ is a featureless image without any local minimum, then no valid watershed exists for any observation f(x), which should be considered as an image of the background only. This condition means that $f^{\mathrm{true}}(x) = 0$ everywhere in $\bar{\mathbb{R}}^d$ because of Definition 1. Therefore, f(x) for $f^{\mathrm{true}}(x) = 0$, i.e., the noise image, produces only invalid watershed regions. A valid watershed region must be statistically distinguishable from such invalid regions. From this viewpoint, the validity of the watershed region is interpreted as the statistical confidence in rejecting the following null hypothesis.

Null hypothesis H0: The watershed region is that of the noise image.
Alternative hypothesis H1: The watershed region is not that of the noise image.

The null hypothesis H0 is rejected if the regional minimum is distinguishable from that of the noise image using test statistics.

3.2 Life Distribution
An important fact is that the randomness of the image f(x, σ) is filtered out as the scale σ increases, and deterministic features of the image f(x) emerge at large scales. In other words, the deterministic features such as the valid watershed regions are established from coarse to fine. There presumably exists a critical lower bound of scale, above and below which the watersheds of f(x, σ) are valid and invalid, respectively. In order to observe how the valid regions survive until large scales against the scale-space filtering, we define the life of the watershed region.

Definition 3. The life of the watershed region is defined as the annihilation scale σ^t of the regional minimum.

Let W be a distribution of the lives of the watershed regions of |∇f(x, σ)| for the image of random noise. If W can be parametrically modelled, a goodness-of-fit test can be performed under the null hypothesis H0. That is, if an image f(x) is an observation of a true uniform image with noise, then the model of W fits the distribution of lives {σ^t} of its watershed regions, and H0 for any watershed region of f(x) is accepted.

We investigate experimentally the life distribution W for the gradient watershed regions of a Gaussian white noise image as shown in Fig. 2(a). We averaged the frequencies of lives over one hundred noise images. We discard the lives of pixel points whose annihilations are detected in 0 < σ ≤ Δσ by RegionHierarchy because not all the pixel centres are local minima. Figure 2(b) is the averaged histogram of life. The obtained life histogram shows a unimodal shape. This implies that there exists a scale where the merging of the regions most frequently occurs. The regional minima of the noise image are uniformly distributed random points, and the regions tend to merge with the nearest regions. Therefore, we deduce that this unimodal property is associated with the distribution of the nearest neighbour distances of random points. In fact, the nearest neighbour distance distribution has a unimodal shape (see Appendix A). The scale of the mode can be used as a gauge of the density of invalid regions. The regional minima with significantly large values of life out of the unimodal distribution W can be identified to be valid, because such regional minima are distinguishable from the invalid regional minima of the noise image.

Fig. 2. Noise image and the averaged life histogram for its gradient watershed regions (histogram axes: σ, relative frequency). (a) The noise image has uncorrelated random pixel values. (b) The life histogram shows relative frequency of scale at which the regional minima of the gradient image are annihilated as Gaussian blurring of the noise image proceeds.

3.3 Recursive Validation
We can set a critical value of the scale to judge the watershed regions valid or invalid. Although the computation of such a critical scale requires the parametric model of the life distribution in the strict sense of statistics, the critical scale can be roughly evaluated by the peak and decaying form of the life histogram. If the image contains valid regions, the life histogram may be multimodal or may have a peak at a small scale relative to the outlying lives representative of the valid regions. According to our experimental result in Section 3.2, a regional minimum with a life which is more than six times greater than the peak can be considered to be valid with the statistical confidence level α > 99% under the assumption of uncorrelated Gaussian random pixel values of a two-dimensional image as the noise. We present an algorithm RegionDiscovery for discovery of the valid watershed regions. This algorithm recursively validates the regions in a top-down fashion using each tree T in the graph G produced by RegionHierarchy. According to the hierarchy, any discovered region is split into subregions as long as they are valid. Each subregion is validated using the life histograms constructed from the lives stored in the subtrees of T corresponding to the subregion.
RegionDiscovery(tree T, set of valid regions V, significance level α)
1  let Σ be a set of life values stored in T except the root;
2  let s be the subroot node of T with the largest life value σ_max ∈ Σ;
3  if IsMultimodal(Σ) or IsOutlier(σ_max, Σ, α) then
4      RegionDiscovery(Subtree(T, s), V, α);
5      RegionDiscovery(T \ Subtree(T, s), V, α);
6  else
7      push the region R := Pixels(T) into V;
8  end if.

Here, the function IsMultimodal returns true if the histogram of Σ is not unimodal. IsOutlier returns true if the life σ^t is greater than the critical α-level of scale computed from the given set of lives Σ. Note that these functions discard the lives in 0 < σ ≤ Δσ. Subtree extracts the subtree with subroot node s from the tree T. Pixels returns a set of pixels whose labels are recorded at the nodes in the given tree. The following function, Watershed, executes our watershed segmentation algorithm for a given image f with a set of pixels P and a significance level α. It returns the set of valid watershed regions consisting of subsets of P.

Watershed(set of pixel centres P, image f, significance level α)
1  set V := ∅;
2  G := RegionHierarchy(P, f);
3  for each tree T in G do
4      RegionDiscovery(T, V, α);
5  end for
6  return V.
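The recursion of RegionDiscovery can be sketched as follows. Several parts are stand-ins chosen for illustration and are not taken from the paper: the `Node` class, the omission of IsMultimodal, and an `is_outlier` test that ignores α and simply applies the "six times the histogram peak" rule of thumb of Section 3.3 with a bin width `delta` playing the role of Δσ.

```python
from collections import Counter

class Node:
    """Hypothetical tree node: a pixel label, its life, and its child nodes."""
    def __init__(self, label, life, children=()):
        self.label, self.life, self.children = label, life, list(children)

def nodes(t):
    """All nodes of the tree rooted at t, root first."""
    out = [t]
    for c in t.children:
        out.extend(nodes(c))
    return out

def without(t, s):
    """Copy of tree t with the subtree rooted at s removed, i.e. T \\ Subtree(T, s)."""
    return Node(t.label, t.life, [without(c, s) for c in t.children if c is not s])

def is_outlier(sigma_max, lives, delta=1.0, factor=6.0):
    """Stand-in for IsOutlier: the life exceeds `factor` times the histogram peak."""
    usable = [v for v in lives if v > delta]          # discard lives in (0, delta]
    if len(usable) < 2:
        return False
    peak_bin, _ = Counter(int(v // delta) for v in usable).most_common(1)[0]
    peak = (peak_bin + 0.5) * delta                   # centre of the most populated bin
    return sigma_max > factor * peak

def region_discovery(t, valid):
    """Split the region of tree t while its longest-lived subregion looks valid."""
    rest = nodes(t)[1:]                               # Sigma: every node except the root
    if rest:
        s = max(rest, key=lambda v: v.life)
        if is_outlier(s.life, [v.life for v in rest]):
            region_discovery(s, valid)
            region_discovery(without(t, s), valid)
            return
    valid.append(sorted(v.label for v in nodes(t)))   # Pixels(T)
```

For example, a tree whose root has two short-lived children and one child of life 20 that itself has three short-lived children is split into two regions: the long-lived subtree and the remainder.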
Fig. 3. An example of our watershed segmentation of a noisy image. (a) Original image with 20% noise. (b) Trajectories of local minima of the gradient magnitude of (a) in scale space (scale axis from 5 to 30). The trajectories reaching out of the spatial domain are subordinate to a local minimum at infinity. (c) Watershed regions of the gradient magnitude by the algorithm Watershed. The brightness indicates the order of lives.
4 Test Example
We demonstrate our gradient watershed segmentation Watershed for a noisy greyscale image. The purpose of this section is not to test the performance of the algorithm, but to show that the statistics in scale space have the potential to discover the valid watershed regions without any prior information about them. Figure 3(a) shows a 128 × 128 test image f(x) with 20% additive noise [18]. The trajectories of local minima of |∇f(x, σ)| traced from σ = 0 in scale space are shown in Fig. 3(b). We see a large number of local minima created by the noise at small scales. As the scale increases, the local minima are hierarchically grouped and representative local minima survive at larger scales. Figure 3(c) shows the segmentation result with a confidence level α = 99% for f(x). There are nine discovered regions clearly corresponding to the major regions of the original image. The tiny faults in the regions were caused by failure in the minimisation. They were wrongly assigned to the image background, which should be fixed in future work.

For comparison, we show in Fig. 4 the simple watershed segmentation results at a few levels of scale without using the region hierarchy or statistics in scale space. We see invalid small regions at small scales while the shapes of valid regions at large scales are distorted. It is remarkable that structural and statistical analyses using scale space can reconstruct the precise edges of statistically valid watershed regions despite the significant noise.

Fig. 4. Watershed segmentation of Fig. 3(a) at different scales (columns: σ = 2, σ = 6, σ = 12). First row: the scale-space image f(x, σ). Second row: the gradient magnitude |∇f(x, σ)|. Third row: the watersheds of |∇f(x, σ)|. Each column corresponds to the same scale.
5
Concluding Remarks
The scale-space treatment of the image clarifies not only the hierarchical relationships among the watershed regions but also their statistical properties. We can observe in the Gaussian scale space how the random features are suppressed and deterministic features emerge as the scale grows. A valid watershed region must be statistically distinguishable from unreproducible regions caused by the random features. Reproducibility is a desirable property of image recognition techniques. On the basis of this simple requirement we described the null hypothesis H0, which is to be rejected if the watershed region is valid. A watershed region is recognised as valid at a statistical confidence level in rejecting H0.
We presented a validation scheme for watershed segmentation using statistics in scale space. We defined the life of a watershed region, whose distribution is useful for testing H0. We showed that the life distribution for the noise image is unimodal, and that the valid regions can be identified by the regional minima whose lives are significantly large and lie outside the unimodal distribution. The statistical properties of the life and the region hierarchy enable the recursive validation of the watershed regions. A distinctive feature of our scheme is that it does not require any definition of similarity or dissimilarity measures between watershed regions, which are used in many methods for preventing oversegmentation. Instead, we focused on the statistical differences between the valid and invalid regions in scale space.
In order to take advantage of the potential of scale-space statistics, our scheme requires further investigation, especially in relation to the model of the life distribution, and improvement and acceleration of the algorithms to obtain feasible segmentation results for larger real images.
References
1. Vincent, L., Soille, P.: Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. on Pattern Analysis and Machine Intelligence 13(6), 583–598 (1991)
2. Beucher, S.: Watershed, hierarchical segmentation and waterfall algorithm. In: Proc. Math. Morphology and Its Appl. to Image Processing, pp. 69–76 (1994)
3. Maes, F., Vandermeulen, D., Suetens, P., Marchal, G.: Computer-aided interactive object delineation using an intelligent paintbrush technique. In: Ayache, N. (ed.) CVRMed 1995. LNCS, vol. 905, pp. 77–83. Springer, Heidelberg (1995)
4. Hagyard, D., Razaz, M., Atkin, P.: Analysis of watershed algorithms for greyscale images. In: Proc. of IEEE Intl. Conf. Image Processing, vol. 3, pp. 41–44 (1996)
5. Olsen, O.F., Nielsen, M.: Generic events for the gradient squared with application to multi-scale segmentation. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 101–112. Springer, Heidelberg (1997)
6. Gauch, J.M.: Image segmentation and analysis via multiscale gradient watershed hierarchies. IEEE Trans. on Image Processing 8(1), 69–79 (1999)
7. Roerdink, J.B.T.M., Meijster, A.: The watershed transform: definitions, algorithms, and parallelization strategies. Fundamenta Informaticae 41, 187–228 (2001)
8. Vanhamel, I., Pratikakis, I., Sahli, H.: Multiscale gradient watersheds of color Images. IEEE Trans. on Image Processing 12(6), 617–626 (2003)
9. Witkin, A.P.: Scale space filtering. In: Proc. of 8th IJCAI, pp. 1019–1022 (1986)
10. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
11. Lindeberg, T.: Scale-Space Theory in Computer Vision. Kluwer, Boston (1994)
12. Weickert, J., Ishikawa, S., Imiya, A.: Linear Scale-Space has First been Proposed in Japan. Journal of Mathematical Imaging and Vision 10, 237–252 (1999)
13. Lifshitz, L.M., Pizer, S.M.: A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Trans. on Pattern Analysis and Machine Intelligence 12(6), 529–540 (1990)
14. Florack, L.M.J., Kuijper, A.: The topological structure of scale-space images. Journal of Mathematical Imaging and Vision 12(1), 65–79 (2000)
15. Kuijper, A.: The deep structure of Gaussian scale-space images. PhD thesis, Utrecht University (2002)
16. Sakai, T., Imiya, A.: Gradient structure of image in scale space. Journal of Mathematical Imaging and Vision 28(3), 243–257 (2007)
17. Keys, R.: Cubic convolution interpolation for digital image processing. IEEE Trans. on Acoustics, Speech, and Signal Processing 29(6), 1153–1160 (1981)
18. SAMPL database, http://sampl.ece.ohio-state.edu/database.htm
19. Suwa, N.: Quantitative morphology: stereology for biologists. Iwanami Shoten (1977) (in Japanese)
A
Distribution of Nearest Neighbour Distances
We present a proof that the nearest neighbour distances obey the Weibull distribution if the points in R^d are uniformly distributed in a Poisson arrangement [19]. The Poisson arrangement is defined as the uniformly random distribution of points with constant density ρ such that the number of points x in a fixed volume V follows the Poisson distribution

$$\mathrm{Po}(x; \lambda) = \frac{\lambda^{x} \exp(-\lambda)}{x!}. \qquad (3)$$
Here, λ = ρV is the expected number of points in the volume V. Let r be the distance from an arbitrary point. The distribution of the nearest neighbour distances, p(r), can be regarded as the probability that the nearest neighbour is found in an infinitesimal gap between r and r + δr. This is the case when no points are found within the distance r, and at least one point is found between r and r + δr. Since the volume V_d of a unit d-ball and its surface area S_{d−1} satisfy V_d d = S_{d−1}, we have

$$\begin{aligned}
p(r)\,\delta r &= \mathrm{Po}(0; \rho V_d r^{d}) \left(1 - \mathrm{Po}(0; \rho S_{d-1} r^{d-1} \delta r)\right) \\
&= \exp(-\rho V_d r^{d}) \left(1 - \exp(-\rho S_{d-1} r^{d-1} \delta r)\right) \\
&\approx \exp(-\rho V_d r^{d}) \cdot \rho S_{d-1} r^{d-1} \delta r \\
&= \exp(-\rho V_d r^{d}) \cdot \rho V_d \, d \, r^{d-1} \delta r .
\end{aligned}$$

Letting $s = 1/\sqrt[d]{\rho V_d}$ be the scale of the average volume of d-dimensional hypercube per point, we obtain the Weibull distribution

$$p(r; s, d) = \frac{d}{s} \left(\frac{r}{s}\right)^{d-1} \exp\left(-\left(\frac{r}{s}\right)^{d}\right), \qquad (4)$$

where s and d correspond to the so-called scale and shape parameters of the Weibull distribution, respectively. This distribution p(r; s, d) has a mode at $r = s \sqrt[d]{(d-1)/d}$. For a fixed dimensionality, the mode depends only on the scale parameter s, which enables us to calculate the point density ρ from the mode.
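The relation between the density ρ, the scale s and the mode of (4) can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative and not part of the original text, and the unit-ball volume is computed with the standard formula V_d = π^{d/2}/Γ(d/2 + 1).

```python
import math

def unit_ball_volume(d):
    """Volume V_d of the unit ball in R^d (standard closed form)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def scale_from_density(rho, d):
    """Scale parameter s = (rho * V_d)^(-1/d) used in eq. (4)."""
    return (rho * unit_ball_volume(d)) ** (-1.0 / d)

def nn_distance_pdf(r, s, d):
    """Weibull density (4) of the nearest neighbour distance r."""
    return (d / s) * (r / s) ** (d - 1) * math.exp(-((r / s) ** d))

def mode(s, d):
    """Mode of (4): r = s * ((d - 1) / d)^(1/d)."""
    return s * ((d - 1) / d) ** (1.0 / d)

def density_from_mode(r_mode, d):
    """Invert the mode formula to recover the point density rho (d >= 2)."""
    s = r_mode / ((d - 1) / d) ** (1.0 / d)
    return 1.0 / (unit_ball_volume(d) * s ** d)

if __name__ == "__main__":
    rho, d = 0.05, 2
    s = scale_from_density(rho, d)
    r_mode = mode(s, d)
    # The density recovered from the mode should agree with rho.
    print(s, r_mode, density_from_mode(r_mode, d))
```

In practice the mode would be estimated from a histogram of observed nearest neighbour distances; the sketch only illustrates the inversion for d ≥ 2.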
Adaptation of Eikonal Equation over Weighted Graph Vinh-Thong Ta, Abderrahim Elmoataz, and Olivier Lézoray Université de Caen Basse-Normandie, GREYC CNRS UMR 6072, Image Team {vinhthong.ta,abderrahim.elmoataz-billah,olivier.lezoray}@unicaen.fr http://www.info.unicaen.fr/~vta
Abstract. In this paper, an adaptation of the eikonal equation is proposed by considering the latter on weighted graphs of arbitrary structure. This novel approach is based on a family of discrete morphological local and nonlocal gradients expressed by partial difference equations (PdEs). Our formulation of the eikonal equation on weighted graphs generalizes local and nonlocal configurations in the context of image processing and extends this equation for the processing of any unorganized high dimensional discrete data that can be represented by a graph. Our approach leads to a unified formulation for image segmentation and high dimensional irregular data processing.
1
Introduction
Solutions of the nonlinear eikonal equation have found numerous applications. One can cite, for instance, geometric optics, image analysis and computer vision, including shape from shading [1, 2], medial axis or skeleton extraction [3], topographic segmentation (watershed) [4], and geodesic distance computation on discrete and parametric surfaces [5, 6, 7, 8, 9]. The latter works consider both structured and unstructured meshes on Cartesian or non-Cartesian domains. The eikonal equation is a special case of the following general continuous Hamilton-Jacobi equation:

$$\begin{cases} H(x, f, \nabla f) = 0, & x \in \Omega \subset \mathbb{R}^{n}, \\ f(x) = \phi(x), & x \in \Gamma \subset \Omega, \end{cases} \qquad (1)$$

where φ in the boundary condition is a positive speed function defined on Ω and f(x) is the traveling time or distance from the source Γ. The eikonal equation can then be expressed by using the following Hamiltonian:

$$H(x, f, \nabla f) = \|\nabla f(x)\| - P(x), \qquad (2)$$

where P(x) is a given potential function. The solution of (1) represents the shortest distance from x to the zero distance curve given by Γ (where φ(x) = 0). Solutions of (2) are usually based on a discretization of the Hamiltonian where the approximation of the derivatives is performed by the Godunov [10] or the Lax-Friedrichs [11] schemes.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 187–199, 2009. © Springer-Verlag Berlin Heidelberg 2009

Many numerical methods have been proposed and investigated to solve the nonlinear system described by (2). For instance, one can cite the following schemes. (i) An iterative scheme [1] relying on fixed point methods that solves a quadratic equation. (ii) The fast sweeping methods [12], which use Gauss-Seidel type iterations to update the distance function field. The key point of fast sweeping is to update the points in a certain order. (iii) Tsitsiklis [13] was the first to develop a Dijkstra-like method and proposed an optimal algorithm for solving the eikonal equation. Based on this idea, [14, 11] produced the fast marching methods. Another approach to solve (2) is to consider a time dependent version of the equation and to evolve it to the steady state. Then, (2) can be rewritten as

$$\begin{cases} \partial f(x, t)/\partial t = -\|\nabla f(x)\| + P(x), & x \in \Omega \subset \mathbb{R}^{n}, \\ f(x, t) = \phi(x), & x \in \Gamma \subset \mathbb{R}^{n}, \\ f(x, 0) = \phi_0(x), & x \in \Omega. \end{cases} \qquad (3)$$

This paper only considers the discrete analogue of the time dependent formulation of the eikonal equation but, in future works, the stationary (time independent) case will also be considered.

Contributions. In this work, we propose an adaptation of (3) over weighted graphs of arbitrary structure. The goal here is to provide a simple and common formulation that solves the eikonal equation for any discrete data that can be represented by a weighted graph, such as images or high dimensional data defined on irregular domains. This alternative formulation for solving the eikonal equation is based on partial difference equations (PdEs) and discrete gradients over weighted graphs. Our formulation has several advantages. Any discrete domain that can be described by a graph can be considered without any spatial discretization. In the context of image processing, local and nonlocal configurations are directly enabled within the same formulation.
Finally, the aim of this paper is not to solve a particular application with the eikonal equation but to show the potential of our proposal to address image segmentation, data clustering or distance computation.

Paper Organization. The paper is organized as follows. Section 2 recalls basics, definitions and operators on weighted graphs. Section 3 introduces our formulation for solving the eikonal equation. Section 4 shows the potential of our proposal for the segmentation of images and unorganized data processing. Finally, the last Section concludes.
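To illustrate how the time dependent formulation (3) evolves to a steady state, the following minimal Python sketch marches the equation on a regular 1D grid. It is not the graph-based scheme of this paper: the upwind approximation of the gradient norm, the zero initial condition, the explicit Euler step and all names are assumptions chosen for illustration only.

```python
def solve_eikonal_1d(potential, sources, h=1.0, dt=None, tol=1e-9,
                     max_iter=100000):
    """March df/dt = -|grad f| + P to steady state on a 1D grid.

    potential: list with the value of P at every node.
    sources:   indices of the nodes in Gamma, where f is held at 0.
    """
    n = len(potential)
    dt = 0.5 * h if dt is None else dt   # explicit scheme, assumes dt <= h
    f = [0.0] * n                        # assumed initial condition phi_0 = 0
    for _ in range(max_iter):
        new_f = f[:]
        for i in range(n):
            if i in sources:
                continue                 # boundary condition f = phi = 0
            left = f[i] - f[i - 1] if i > 0 else 0.0
            right = f[i] - f[i + 1] if i < n - 1 else 0.0
            # Upwind approximation of the gradient norm at node i.
            grad = max(left, right, 0.0) / h
            new_f[i] = f[i] + dt * (potential[i] - grad)
        change = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if change < tol:
            break
    return f

if __name__ == "__main__":
    # With P = 1 the steady state approximates the distance to the source.
    print(solve_eikonal_1d([1.0] * 11, {0}))
```

With a constant potential P = 1 and a single source at the first node, the steady state approaches the grid distance to the source, which is the behaviour expected from (1) and (2).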
2
Discrete Derivatives on Weighted Graphs
This Section recalls basics, definitions, operators and processes on weighted graphs.
2.1
Definitions and Weighted Graphs Construction
Notations and Definitions. We consider the general situation where any discrete domain can be viewed as a weighted graph. Let G=(V, E, w) be a weighted graph composed of two finite sets: vertices V and weighted edges E⊆V×V. An edge (u, v)∈E connects two adjacent (neighbor) vertices u and v. The neighborhood of a vertex u is denoted N(u)={v∈V\{u} : (u, v)∈E}. The weight ω_uv of an edge (u, v) can be defined with a function w:V×V→ℝ⁺ such that w(u, v)=ω_uv if (u, v)∈E and w(u, v)=0 otherwise. Graphs are assumed to be simple, connected and undirected, implying that the function w is symmetric. Let f:V→ℝ be a discrete real-valued function that assigns a real value f(u) to each vertex u∈V. We denote by H(V) the Hilbert space of such functions defined on V.

Weighted Graphs Construction. Any discrete domain can be represented by a weighted graph where functions of H(V) represent the data to process. In the general case, an unorganized set of points V⊂ℝⁿ can be seen as a function f⁰:V⊂ℝⁿ→ℝⁿ. Then, constructing a graph from this data consists in defining the set of edges E by modeling the neighborhood. It is based on a similarity relationship between data with a pairwise distance measure μ:V×V→ℝ⁺. There exist several methods to transform a set of vertices V into a neighborhood (similarity) graph (see [15] for a survey on proximity and neighborhood graphs). In this paper, we focus on two particular graphs: the τ-neighborhood graphs and a modified version of k-nearest neighbors graphs. The k-nearest neighbors graph, denoted k-NNG, is a weighted graph where each vertex u∈V is connected to its k nearest neighbors, i.e. the vertices which have the smallest distance measure towards u according to the function μ. Since this graph is directed, a modified version of this graph is used to make it undirected.
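The construction of an undirected k-NNG can be sketched as follows in Python. The text does not specify how the directed graph is modified; this sketch assumes the common choice of keeping an edge whenever either endpoint selects the other, and it takes μ to be the Euclidean distance. The function name is illustrative.

```python
import math

def knn_graph(points, k):
    """Return an undirected k-NN graph as a dict: vertex -> set of neighbours."""
    n = len(points)
    neighbours = {u: set() for u in range(n)}
    for u in range(n):
        # Rank the other vertices by their distance mu(u, v) to u
        # (Euclidean distance is an assumption of this sketch).
        others = sorted((v for v in range(n) if v != u),
                        key=lambda v: math.dist(points[u], points[v]))
        for v in others[:k]:
            # Assumed symmetrisation: keep the edge if either endpoint
            # selects the other one among its k nearest neighbours.
            neighbours[u].add(v)
            neighbours[v].add(u)
    return neighbours

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
    print(knn_graph(pts, 1))
```

In the example, vertex 2 selects vertex 1 as its nearest neighbor but not conversely; the symmetrisation still stores the edge at both endpoints, so the resulting weight function can be symmetric as required above.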
The τ-neighborhood graph, denoted G_τ, is a weighted graph where the τ-neighborhood N_τ for a given vertex u∈V is defined as N_τ(u)={v∈V\{u} : μ(u, v)≤τ} with τ>0 a threshold parameter. 2D images can be viewed as functions f⁰:V⊂ℤ²→ℝⁿ. In this case, the distance μ used to construct the neighborhood graph is usually the city block or the Chebyshev distance computed with the spatial coordinates of each vertex representing an image pixel. With these distances and the τ-neighborhood graphs, one recovers the two usual graphs used in image processing, the 4-adjacency grid graph (denoted G0, with the city block distance) and the 8-adjacency grid graph (denoted G1, with the Chebyshev distance) with τ≤1. Another useful graph structure in image processing is the region adjacency graph (RAG), where vertices correspond to image regions and the set of edges is obtained by considering an adjacency distance. With the τ-neighborhood (τ=1), the RAG is the Delaunay graph of an image partition.

Weights Computation. Similarities between data can be incorporated within the edge weights according to a measure of similarity g:E→ℝ⁺ that satisfies w(u, v)=g(u, v) for (u, v)∈E. Then, the distance computation between data is performed by comparing their features, which generally depend on a given initial function f⁰∈H(V). To this aim, each vertex u∈V is assigned a feature vector F(f⁰, u)∈ℝᵐ. With F, the following weight functions can be considered. For
a given edge (u, v)∈E and a distance measure ρ:V×V→ℝ⁺ associated to F, we can have

$$\begin{aligned}
g_0(u, v) &= 1 \quad \text{(constant weight case)}, \\
g_1(u, v) &= \left(\rho(F(f^0, u), F(f^0, v)) + \epsilon\right)^{-1} \quad \text{with } \epsilon > 0,\ \epsilon \to 0, \\
g_2(u, v) &= \exp\left(-\rho(F(f^0, u), F(f^0, v))^{2} / \sigma^{2}\right) \quad \text{with } \sigma > 0,
\end{aligned}$$

where σ controls the similarity and ρ is usually the Euclidean distance. Several choices for the expression of F can be considered depending on the features to preserve. The simplest one is F(f⁰, ·)=f⁰. In the context of image processing, an important feature vector F is provided by image patches, i.e., F(f⁰, u)=F_τ(f⁰, u)={f⁰(v) : v∈N_τ(u) ∪ {u}}. In the case of a grayscale image, F_τ(f⁰, ·) is a vector of size (2τ+1)² corresponding to the values of f⁰ in a square window of size (2τ+1)×(2τ+1) centered at vertex u (a pixel). Color images can be handled using features of dimension 3×(2τ+1)². Then, the resultant weight function directly incorporates local or nonlocal features [16]. This feature vector has been proposed in the context of texture synthesis [17], and further used in the context of image processing [18, 19].
2.2
Graph Based Discrete Gradients
Let G=(V, E, w) be a weighted graph. The discrete weighted gradient of a function f∈H(V) at a vertex u∈V is defined by

$$(\nabla_w f)(u) = \left(\partial_v f(u)\right)_{(u,v)\in E},$$

where ∂_v f(u) = w(u, v)(f(v)−f(u)) corresponds to the discrete (partial) derivative of f with respect to the edge (u, v). These definitions have been used by [20] for image and mesh regularization. Based on the latter works, two discrete formulations of weighted morphological gradients on graphs have been proposed by [21]: namely, the weighted external ∇⁺_w and internal ∇⁻_w gradient operators. For u∈V,

$$(\nabla_w^{+} f)(u) = \left(\partial_v^{+} f(u)\right)_{(u,v)\in E} \quad \text{and} \quad (\nabla_w^{-} f)(u) = \left(\partial_v^{-} f(u)\right)_{(u,v)\in E}, \qquad (4)$$

where the external ∂⁺_v f(u) and the internal ∂⁻_v f(u) discrete partial derivatives are ∂⁺_v f(u) = max(0, ∂_v f(u)) and ∂⁻_v f(u) = −min(0, ∂_v f(u)), with ∂⁻_v f(u) = ∂⁺_u f(v). When the weight is constant (w=g0) these definitions recover the classical directional derivative operators. The Lp-norm (with 0

[...]

$$\frac{\det(m_\theta)}{\det(m)} > 0, \quad \text{and} \quad \frac{\det(m_\varphi)}{\det(m)} > \frac{\left(g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\varphi^{j}\right)^{2}}{g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\theta^{j}}. \qquad (8)$$
Since we use linear interpolation between tensors, we only need to check the condition at original data-points. This condition is always met in our ODF-data, and we expect it to hold quite generally. The goal of this section was to define a Finsler-structure and in particular a Finsler metric tensor gij (x, y) corresponding to a given tensorial ODF. Indeed
L. Astola and L. Florack
in case the ODF is a symmetric tensor of order two, this metric tensor is equivalent to the Riemann metric tensor. Following our Finsler approach, instead of one metric tensor per voxel we obtain a bundle of metric tensors at any x. For illustration, see Fig.1.
4
Transforming a Polynomial Tensor to a Monomial Tensor
Assume we wish to apply Laplace-Beltrami smoothing to our spherical data, by which we obtain a field of spherical functions at any desired scale, and that we wish to use a tensorial representation of the data instead of spherical harmonics. As is shown in [10], this smoothing is easy to do, using iterative polynomial tensor fitting. The point here is that for Finsler analysis, we would rather work with a tensor representation of monomial form

$$D(y) = D_{i_1 \cdots i_n}\, y^{i_1} \cdots y^{i_n}, \qquad (9)$$

than with the equivalent polynomial expression

$$\tilde{D}(y) = \sum_{k=0}^{n} \tilde{D}_{i_1 \cdots i_k}\, y^{i_1} \cdots y^{i_k}, \qquad (10)$$

but still exploit the convenient (co-domain) scale space representation of the latter:

$$\tilde{D}(y, \tau) = \sum_{k=0}^{n} e^{-\tau k(k+1)}\, \tilde{D}_{i_1 \cdots i_k}\, y^{i_1} \cdots y^{i_k}. \qquad (11)$$

This poses no problem, since we can rather easily transform the polynomial expression to a monomial one, using the fact that our polynomials are restricted to the sphere (eq. (4)), thus we may expand a lower order tensor to a sparse higher order one and symmetrize it. We can also always transform the monomial expression to a polynomial sum of irreducible monomial tensors using the Clebsch projection [20].
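The scale space representation (11) only rescales each order-k term by the factor e^{−τk(k+1)}. A minimal Python sketch of this evaluation is given below. The representation of the terms as callables and the function name are assumptions made for illustration; the example term of order two, 3(y³)² − 1, is a standard harmonic polynomial and is not taken from the data.

```python
import math

def smoothed_value(components, y, tau):
    """Evaluate eq. (11): sum over k of exp(-tau*k*(k+1)) * D_k(y).

    components[k] is a callable returning the order-k term of the
    polynomial expression (10) for a unit vector y.
    """
    return sum(math.exp(-tau * k * (k + 1)) * term(y)
               for k, term in enumerate(components))

if __name__ == "__main__":
    # Hypothetical example: constant term, no first order term, and one
    # second order term that is harmonic on the sphere.
    comps = [lambda y: 1.0,
             lambda y: 0.0,
             lambda y: 3.0 * y[2] ** 2 - 1.0]
    north = (0.0, 0.0, 1.0)
    for tau in (0.0, 0.1, 1.0):
        print(tau, smoothed_value(comps, north, tau))
```

As τ grows, the higher order terms are damped faster than the lower order ones, so the function tends to its order-zero (isotropic) part.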
5
Fiber Tracking in HARDI Data Using Finsler Geometry
In the DTI setting, the most straightforward way of tracking fibers is to follow the principal eigenvector, corresponding to the largest eigenvalue of the diffusion tensor, until some stopping criterion is met. This method cannot reveal crossings and only provides a single direction (if at all) per voxel. If we instead compute the shortest paths according to the diffusion-induced Riemann metric tensor, we could expect these to be the candidates for real fibers [5]. Of course, most of the shortest paths (geodesics) do not represent actual fibers, and therefore we should extract the potential neural fibers from arbitrary geodesics based on their connectivity [6]. We show some results of solving for well-connected geodesics in an analytic field as well as in real rat brain data.
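The principal eigenvector tracking mentioned above can be sketched in a few lines of Python for a 2D field of symmetric tensors. This is only an illustration under stated assumptions: the Euler stepping, the eigenvalue gap used as stopping criterion, and the names `principal_eigenvector`, `track` and `tensor_at` are hypothetical and not taken from any DTI software.

```python
import math

def principal_eigenvector(t):
    """Unit eigenvector of the largest eigenvalue of a symmetric 2x2 tensor.

    t = (a, b, c) stands for the matrix [[a, b], [b, c]].
    """
    a, b, c = t
    # Orientation of the major axis of a symmetric 2x2 matrix.
    angle = 0.5 * math.atan2(2.0 * b, a - c)
    return (math.cos(angle), math.sin(angle))

def track(tensor_at, start, step=0.5, n_steps=20, min_gap=0.1):
    """Follow the principal eigenvector field from `start` with Euler steps."""
    path = [start]
    prev = None
    x, y = start
    for _ in range(n_steps):
        a, b, c = tensor_at(x, y)
        # Difference of the two eigenvalues; assumed stopping criterion
        # for nearly isotropic tensors.
        gap = math.hypot(a - c, 2.0 * b)
        if gap < min_gap:
            break
        dx, dy = principal_eigenvector((a, b, c))
        # Eigenvectors have no sign: keep the heading of the last step.
        if prev is not None and dx * prev[0] + dy * prev[1] < 0:
            dx, dy = -dx, -dy
        x, y = x + step * dx, y + step * dy
        path.append((x, y))
        prev = (dx, dy)
    return path

if __name__ == "__main__":
    # Constant field whose largest diffusion is along the first axis.
    print(track(lambda x, y: (2.0, 0.0, 1.0), (0.0, 0.0), n_steps=4))
```

The sketch returns a single path per seed point, which makes the limitation stated above explicit: only one direction per position can be followed.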
Finsler Geometry on HOT Fields
5.1
Analytic Tensor Field
We treat an analytic norm field in R², but the situation can be directly extended to R³. Let us take as a convex norm function at each spatial position

$$F(\varphi) = (\cos 4\varphi + 4)^{\frac{1}{4}} = \left(5 \cos^4 \varphi + 2 \cos^2 \varphi \sin^2 \varphi + 5 \sin^4 \varphi\right)^{\frac{1}{4}}. \qquad (12)$$

This is an example of a fourth order tensor on unit vectors. Such a tensor field could represent an infinitely dense field of orthogonally crossing fibers. From the fact that F has no x-dependence we conclude that the geodesic coefficients vanish and that the geodesics coincide with the Euclidean geodesics γ(t) = (t · cos ϕ, t · sin ϕ), i.e. straight lines. However, the so-called connectivity of a geodesic [6], [21] is relatively large only where the directional norm function is correspondingly small. In the Finsler setting the connectivity measure m(γ) is

$$m(\gamma) = \frac{\int \sqrt{\eta_{ij}\, \dot{\gamma}^{i} \dot{\gamma}^{j}}\; dt}{\int \sqrt{g_{ij}(\gamma, \dot{\gamma})\, \dot{\gamma}^{i} \dot{\gamma}^{j}}\; dt}, \qquad (13)$$

where η_ij(γ) represents the covariant Euclidean metric tensor, which in Cartesian coordinates reduces to the constant identity matrix, γ̇ is the tangent to the curve γ, and g_ij(γ, γ̇) is the Finsler metric tensor (which depends not only on the position on the curve but also on the tangent of the curve). For illustration we compute the metric tensors explicitly, using Cartesian coordinates:

$$g_{ij} = \frac{1}{\left(5 \cos^4 \varphi + 2 \cos^2 \varphi \sin^2 \varphi + 5 \sin^4 \varphi\right)^{3/2}} \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}, \qquad (14)$$

where

$$\begin{aligned}
g_{11} &= 5\left(5 \cos^6 \varphi + 3 \cos^4 \varphi \sin^2 \varphi + 15 \cos^2 \varphi \sin^4 \varphi + \sin^6 \varphi\right), \\
g_{12} &= g_{21} = -48 \cos^3 \varphi \sin^3 \varphi, \\
g_{22} &= 5\left(\cos^6 \varphi + 15 \cos^4 \varphi \sin^2 \varphi + 3 \cos^2 \varphi \sin^4 \varphi + 5 \sin^6 \varphi\right).
\end{aligned}$$

The strong convexity criterion in R² [13] on the indicatrix g(ϕ) = (g¹(ϕ), g²(ϕ)),

$$\frac{\ddot{g}^{1} \dot{g}^{2} - \dot{g}^{1} \ddot{g}^{2}}{\dot{g}^{1} g^{2} - g^{1} \dot{g}^{2}} > 0,$$

is satisfied for metric (14) for every ϕ, since

$$\frac{\ddot{g}^{1} \dot{g}^{2} - \dot{g}^{1} \ddot{g}^{2}}{\dot{g}^{1} g^{2} - g^{1} \dot{g}^{2}} = \frac{13 - 8 \cos 4\varphi}{(4 + \cos 4\varphi)^{2}} > 0. \qquad (15)$$

The connectivity measure for a (Euclidean) geodesic γ can be computed analytically:

$$m(\gamma) = \frac{\int dt}{\int (4 + \cos(4\varphi))^{1/4}\, dt}, \qquad (16)$$

which gives the maximal connectivities in the directions {π/4, 3π/4, 5π/4, 7π/4}, as expected. See Fig. 2 for an illustration. We observe that on such a norm field the Riemannian (DTI) framework would result in Euclidean geodesics and constant connectivity over all geodesics, thus revealing no information at all about the angular heterogeneity.
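The quantities of this example are easy to check numerically. The Python sketch below evaluates the norm (12), the metric tensor (14) and the connectivity (16) of a straight line with constant direction ϕ, for which (16) reduces to 1/F(ϕ). The function names are illustrative assumptions.

```python
import math

def norm_F(phi):
    """Norm function (12) evaluated on the unit vector with angle phi."""
    return (math.cos(4 * phi) + 4) ** 0.25

def metric(phi):
    """Finsler metric tensor (14) in direction phi: ((g11, g12), (g21, g22))."""
    c, s = math.cos(phi), math.sin(phi)
    p = 5 * c**4 + 2 * c**2 * s**2 + 5 * s**4
    g11 = 5 * (5 * c**6 + 3 * c**4 * s**2 + 15 * c**2 * s**4 + s**6)
    g12 = -48 * c**3 * s**3
    g22 = 5 * (c**6 + 15 * c**4 * s**2 + 3 * c**2 * s**4 + 5 * s**6)
    scale = p ** -1.5
    return ((scale * g11, scale * g12), (scale * g12, scale * g22))

def connectivity(phi):
    """Connectivity (16) of a straight line with constant direction phi."""
    return 1.0 / norm_F(phi)

if __name__ == "__main__":
    for k in range(8):
        phi = k * math.pi / 8
        (g11, g12), (_, g22) = metric(phi)
        c, s = math.cos(phi), math.sin(phi)
        # g_ij y^i y^j should reproduce F^2 for the unit vector y.
        quad = g11 * c * c + 2 * g12 * c * s + g22 * s * s
        print(round(phi, 3), quad, norm_F(phi) ** 2, connectivity(phi))
```

The printed values show that g_ij y^i y^j agrees with F² in every direction, and that the connectivity is largest for the diagonal directions, in agreement with (16).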
Fig. 2. Left: A field of fourth order spherical harmonics as in the norm function eq. (12), representing dense crossings, and some well connected geodesics, colored in red. Right: 200 equiangular metric tensors of the same norm function, and an ellipse in light blue corresponding to the metric in direction ϕ = π/4.
Fig. 3. Left: Finsler geodesics emanating from a voxel, with the most connective ones in red. Right: Fibers through the same neighborhood in the traditional DTI principal eigenvector tracking.
5.2
Real Rat Brain Data
The Subthalamic Nucleus is a small area in the brain that is involved in the physiopathology of Parkinson's disease [22]. We computed the Finsler geodesics and their connectivities, with initial points in several central voxels in the Subthalamic Nucleus. These voxels were located by comparison to an atlas of the rat brain [23]. We tracked Finsler geodesics using the standard equation (ODE formulation) [14] (p. 78) and a second order Taylor approximation, with the 49 measurement directions as initial directions, a step size of 0.2 voxel size, and 10 steps. Then we selected the 30% of all geodesics that have the best connectivity. Compared to the traditional DTI-tracking, we found that one of the main
directions with strong connectivity typically coincides with the DTI-fibers, but we also found other potential fiber directions. For illustration see Fig. 3.
6
Conclusions and Future Work
We have seen that it is indeed possible to analyze spherical tensor fields using Finsler geometry. It gives new methods to work with the data and also has the potential to give new information on the data. Finsler geodesics and Finsler curvatures are examples of geometric measures that can be applied on HARDI fiber-analysis, and which will be a subject of extensive future work.
Acknowledgement
The rat brain data, acquired for a study [24], were kindly provided by Ellen Brunenberg.
References
1. Tuch, D., Reese, T., Wiegell, M., Makris, N., Belliveau, J., van Wedeen, J.: High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magnetic Resonance in Medicine 48(6), 1358–1372 (2002)
2. Stejskal, E., Tanner, J.: Spin diffusion measurements: Spin echoes in the presence of a time-dependent field gradient. The Journal of Chemical Physics 42(1), 288–292 (1965)
3. Cohen de Lara, M.: Geometric and symmetry properties of a nondegenerate diffusion process. The Annals of Probability 23(4), 1557–1604 (1995)
4. O'Donnell, L., Haker, S., Westin, C.F.: New approaches to estimation of white matter connectivity in diffusion tensor MRI: Elliptic PDEs and geodesics in a tensor-warped space. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 459–466. Springer, Heidelberg (2002)
5. Lenglet, C., Deriche, R., Faugeras, O.: Inferring white matter geometry from diffusion tensor MRI: Application to connectivity mapping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 127–140. Springer, Heidelberg (2004)
6. Astola, L., Florack, L., ter Haar Romeny, B.: Measures for pathway analysis in brain white matter using diffusion tensor images. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 642–649. Springer, Heidelberg (2007)
7. Astola, L., Florack, L.: Sticky vector fields and other geometric measures on diffusion tensor images. In: MMBIA 2008, IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis, held in conjunction with CVPR 2008, Anchorage, Alaska, The United States. CVPR, vol. 20, pp. 1–7. Springer, Heidelberg (2008)
8. Özarslan, E., Mareci, T.: Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution diffusion imaging. Magnetic Resonance in Medicine 50, 955–965 (2003)
9. Barmpoutis, A., Jian, B., Vemuri, B., Shepherd, T.: Symmetric positive 4th order tensors and their estimation from diffusion weighted MRI. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 308–319. Springer, Heidelberg (2007)
10. Florack, L., Balmashnova, E.: Decomposition of high angular resolution diffusion images into a sum of self-similar polynomials on the sphere. In: Proceedings of the Eighteenth International Conference on Computer Graphics and Vision, GraphiCon 2008, Moscow, Russian Federation, June 2008, pp. 26–31 (2008) (invited paper)
11. Florack, L., Balmashnova, E.: Two canonical representations for regularized high angular resolution diffusion imaging. In: MICCAI Workshop on Computational Diffusion MRI, New York, USA, September 10, 2008, pp. 94–105 (2008)
12. Melonakos, J., Pichon, E., Angenent, S., Tannenbaum, A.: Finsler active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(3), 412–423 (2008)
13. Bao, D., Chern, S.S., Shen, Z.: An Introduction to Riemann-Finsler Geometry. Springer, Heidelberg (2000)
14. Shen, Z.: Lectures on Finsler Geometry. World Scientific, Singapore (2001)
15. Tuch, D.: Q-ball imaging. Magnetic Resonance in Medicine 52(4), 577–582 (2002)
16. Jansons, K., Alexander, D.: Persistent angular structure: New insights from diffusion magnetic resonance imaging data. Inverse Problems 19, 1031–1046 (2003)
17. Özarslan, E., Shepherd, T., Vemuri, B., Blackband, S., Mareci, T.: Resolution of complex tissue microarchitecture using the diffusion orientation transform. NeuroImage 31, 1086–1103 (2006)
18. Jian, B., Vemuri, B., Özarslan, E., Carney, P., Mareci, T.: A novel tensor distribution model for the diffusion-weighted MR signal. NeuroImage 37, 164–176 (2007)
19. Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R.: Regularized, fast and robust analytical q-ball imaging. Magnetic Resonance in Medicine 58(3), 497–510 (2006)
20. Müller, C. (ed.): Analysis of Spherical Symmetries in Euclidean Spaces. Applied Mathematical Sciences, vol. 129. Springer, New York (1998)
21. Prados, E., Soatto, S., Lenglet, C., Pons, J.P., Wotawa, N., Deriche, R., Faugeras, O.: Control Theory and Fast Marching Techniques for Brain Connectivity Mapping. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, New York, USA, vol. 1, pp. 1076–1083. IEEE Computer Society Press, Los Alamitos (2006)
22. Hamani, C., Saint-Cyr, J., Fraser, J., Kaplitt, M., Lozano, A.: The subthalamic nucleus in the context of movement disorders. Brain, a Journal of Neurology 127, 4–20 (2004)
23. Paxinos, G., Watson, C.: The Rat Brain In Stereotaxic Coordinates. Academic Press, San Diego (1998)
24. Brunenberg, E., Prckovska, V., Platel, B., Strijkers, G., ter Haar Romeny, B.M.: Untangling a fiber bundle knot: Preliminary results on STN connectivity using DTI and HARDI on rat brains. In: Proceedings of the 17th Meeting of the International Society for Magnetic Resonance in Medicine (ISMRM), Honolulu, Hawaii (2009)
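Appendix
We seek the general condition for

$$g_{ij}(y)\, v^{i} v^{j} > 0 \qquad (17)$$

to be valid in R³ (= T_x M). From the homogeneity of the norm function F, it follows that it is sufficient to have this condition on the unit level set of the norm. We consider this level surface, i.e. the set of vectors y for which F(y) = 1, and a parametrization y(θ, ϕ) = (y¹(θ, ϕ), y²(θ, ϕ), y³(θ, ϕ)). In what follows we abbreviate g_ij = g_ij(x, y). From F(y) = 1 we have

$$g_{ij}\, y^{i} y^{j} = 1. \qquad (18)$$

Taking derivatives of both sides and using a consequence of Euler's theorem for homogeneous functions ([13], p. 5) that says

$$\frac{\partial g_{ij}}{\partial y^{k}}\, y^{k} = 0, \qquad (19)$$

we obtain

$$g_{ij}\, \dot{y}_\theta^{i}\, y^{j} = 0, \qquad g_{ij}\, \dot{y}_\varphi^{i}\, y^{j} = 0, \qquad (20)$$

implying ẏ_θ ⊥_g y and ẏ_ϕ ⊥_g y. Taking derivatives once more, we get

$$\begin{aligned}
g_{ij}\, \ddot{y}_\theta^{i}\, y^{j} &= -g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\theta^{j}, \\
g_{ij}\, \ddot{y}_\varphi^{i}\, y^{j} &= -g_{ij}\, \dot{y}_\varphi^{i}\, \dot{y}_\varphi^{j}, \\
g_{ij}\, \ddot{y}_{\theta\varphi}^{i}\, y^{j} &= -g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\varphi^{j}.
\end{aligned} \qquad (21)$$

We may express an arbitrary vector v as a linear combination of orthogonal basis vectors:

$$v = \alpha\, y + \beta\, \dot{y}_\theta + \gamma \left( \dot{y}_\varphi - \frac{\langle \dot{y}_\varphi, \dot{y}_\theta \rangle}{\langle \dot{y}_\theta, \dot{y}_\theta \rangle}\, \dot{y}_\theta \right). \qquad (22)$$

We substitute this expression for v into the left hand side of (17) and, using (21), obtain

$$g_{ij}\, v^{i} v^{j} = \alpha^{2}\, g_{ij}\, y^{i} y^{j} - \beta^{2}\, g_{ij}\, \ddot{y}_\theta^{i}\, y^{j} - \gamma^{2} \left( g_{ij}\, \ddot{y}_\varphi^{i}\, y^{j} + \frac{\left(g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\varphi^{j}\right)^{2}}{g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\theta^{j}} \right), \qquad (23)$$

because the mixed terms vanish due to the orthogonality of the basis vectors. On the other hand, for y's on the indicatrix we have as a consequence of Euler's theorem on homogeneous functions (denoting $F_{y^i} = \partial F / \partial y^{i}$):

$$F_{y^i}\, y^{i} = F(y) = 1. \qquad (24)$$

Differentiating eq. (24) w.r.t. θ and ϕ, we obtain two equations:

$$F_{y^i}\, \dot{y}_\theta^{i} = 0, \qquad (25)$$

$$F_{y^i}\, \dot{y}_\varphi^{i} = 0, \qquad (26)$$

for F is a homogeneous function. The matrices m, m_θ, m_ϕ are as defined in eq. (7). Solving the system of equations (24), (25) and (26) we get:

$$F_{y^1} = \frac{\dot{y}_\varphi^{2} \dot{y}_\theta^{3} - \dot{y}_\varphi^{3} \dot{y}_\theta^{2}}{\det(m)}, \quad F_{y^2} = \frac{\dot{y}_\varphi^{3} \dot{y}_\theta^{1} - \dot{y}_\varphi^{1} \dot{y}_\theta^{3}}{\det(m)}, \quad F_{y^3} = \frac{\dot{y}_\varphi^{1} \dot{y}_\theta^{2} - \dot{y}_\varphi^{2} \dot{y}_\theta^{1}}{\det(m)}. \qquad (27)$$

Now using the equalities

$$F_{y^i} = g_{ij}\, y^{j}, \qquad g_{ij}\, \ddot{y}_\theta^{i}\, y^{j} = F_{y^k}\, \ddot{y}_\theta^{k}, \qquad g_{ij}\, \ddot{y}_\varphi^{i}\, y^{j} = F_{y^k}\, \ddot{y}_\varphi^{k}, \qquad (28)$$

and

$$-g_{ij}\, \ddot{y}_\theta^{i}\, y^{j} = \frac{\det(m_\theta)}{\det(m)}, \qquad -g_{ij}\, \ddot{y}_\varphi^{i}\, y^{j} = \frac{\det(m_\varphi)}{\det(m)}, \qquad (29)$$

we obtain

$$g_{ij}\, v^{i} v^{j} = \alpha^{2} - \beta^{2}\, g_{ij}\, \ddot{y}_\theta^{i}\, y^{j} - \gamma^{2} \left( g_{ij}\, \ddot{y}_\varphi^{i}\, y^{j} + \frac{\left(g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\varphi^{j}\right)^{2}}{g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\theta^{j}} \right) > 0 \qquad (30)$$

if

$$\frac{\det(m_\theta)}{\det(m)} > 0 \quad \text{and} \quad \frac{\det(m_\varphi)}{\det(m)} > \frac{\left(g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\varphi^{j}\right)^{2}}{g_{ij}\, \dot{y}_\theta^{i}\, \dot{y}_\theta^{j}}. \qquad (31)$$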
Bregman-EM-TV Methods with Application to Optical Nanoscopy Christoph Brune, Alex Sawatzky, and Martin Burger Westfälische Wilhelms-Universität Münster, Institut für Numerische und Angewandte Mathematik, Einsteinstr. 62, D-48149 Münster, Germany {christoph.brune,alex.sawatzky,martin.burger}@wwu.de http://imaging.uni-muenster.de Abstract. Measurements in nanoscopic imaging suffer from blurring effects concerning different point spread functions (PSF). Some apparatus even have PSFs that are locally dependent on phase shifts. Additionally, raw data are affected by Poisson noise resulting from laser sampling and "photon counts" in fluorescence microscopy. In these applications standard reconstruction methods (EM, filtered backprojection) deliver unsatisfactory and noisy results. Starting from a statistical modeling in terms of a MAP likelihood estimation we combine the iterative EM algorithm with TV regularization techniques to make an efficient use of a-priori information. Typically, TV-based methods deliver reconstructed cartoon-images suffering from contrast reduction. We propose an extension to EM-TV, based on Bregman iterations and inverse scale space methods, in order to obtain improved imaging results by simultaneous contrast enhancement. We illustrate our techniques by synthetic and experimental biological data.
1 Introduction
Image reconstruction is a fundamental problem in many fields of applied science, e.g. nanoscopic imaging, medical imaging or astronomy. Fluorescence microscopy, for example, is an important imaging technique for the investigation of biological (live) cells down to the nanoscale. In this case image reconstruction arises in the form of deconvolution problems. Undesirable blurring effects can be ascribed to the diffraction of light. Mathematically, image reconstruction in such applications can often be formulated as the solution of a linear, ill-posed inverse problem. The task consists of computing an estimate of an unknown object from given measurements. Typically these problems deal with Fredholm integral equations of the first kind

$$\bar f = \bar K u, \tag{1}$$

where $\bar K$ is a compact operator, $\bar f$ the (exact) data and $u$ the desired image. In the case of nanoscopic imaging $\bar K$ is a convolution operator

$$(\bar K u)(x) = (k * u)(x) = \int_\Omega k(x - y)\,u(y)\,dy,$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 235–246, 2009. © Springer-Verlag Berlin Heidelberg 2009
where $k$ is a convolution kernel describing the blurring effects created by a nanoscopic apparatus. Determining $u$ by direct inversion of $\bar K$ is not suitable, since (1) is ill-posed. In such cases regularization techniques are needed to produce reasonable reconstructions. A frequently used way to realize regularization is the Bayesian model, whose aim is to compute an estimate $u$ of the unknown object by maximizing the a-posteriori probability density $p(u|f)$ given measurements $f$. The latter is given by Bayes' formula

$$p(u|f) \sim p(f|u)\,p(u). \tag{2}$$
This approach is called maximum a-posteriori probability (MAP) estimation. If the measurements $f$ are given, we describe the density $p(u|f)$ as the a-posteriori likelihood function, which depends on $u$ only. The Bayesian approach (2) has the advantage that it allows additional information about $u$ to be incorporated into the reconstruction process via the prior probability density $p(u)$. The most frequently used prior densities are Gibbs functions

$$p(u) \sim e^{-\alpha R(u)}, \tag{3}$$
where $\alpha$ is a positive parameter and $R$ a convex energy. Usual models for the probability density $p(f|u)$ in (2) are Gaussian- or Poisson-distributed raw data $f$, i.e.

$$p(f|u) \sim e^{-\|Ku - f\|_2^2/(2\sigma^2)}, \qquad p(f|u) \sim \prod_i \frac{(Ku)_i^{f_i}}{f_i!}\,e^{-(Ku)_i}, \tag{4}$$
where $K$ is a semi-discrete operator based on $\bar K$. In the canonical case of additive Gaussian noise (see (4), left) the minimization of the negative log-likelihood function (2) leads to classical Tikhonov regularization [1], based on minimizing a functional of the form

$$\min_{u \ge 0}\ \frac{1}{2}\|Ku - f\|_2^2 + \alpha R(u). \tag{5}$$

The first, so-called data-fidelity term penalizes the deviation from equality in (1), whereas $R$ is a regularization term as in (3). If we choose $K = \mathrm{Id}$ and the total variation (TV) regularization $R(u) := |u|_{BV}$, we obtain the well-known ROF model [2] for image denoising. The additional positivity constraint is necessary in typical applications, as the unknown represents a density image. In nanoscopic imaging measured data are stochastic and pointwise; more precisely, they are called "photon counts". This property refers to laser scanning techniques in fluorescence microscopy. Consequently, the random variables of measured data are not Gaussian- but Poisson-distributed (see (4), right), with expected value given by equation (1). Hence a MAP estimation via the negative log-likelihood function (2) leads to the following variational problem [1]:

$$\min_{u \ge 0}\ \int_\Omega (Ku - f \log Ku)\,d\mu + \alpha R(u). \tag{6}$$
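The two data-fidelity terms in (5) and (6) can be compared numerically. The following is a minimal sketch on a discrete grid; the arrays `Ku` and `f` are made-up illustrative values and the function names are hypothetical, not taken from the paper. It also evaluates the Kullback-Leibler divergence, which differs from the Poisson term only by a quantity that does not depend on $Ku$.

```python
import numpy as np

def gaussian_fidelity(Ku, f):
    """Discrete analogue of the data term in (5): 0.5 * ||Ku - f||_2^2."""
    return 0.5 * np.sum((Ku - f) ** 2)

def poisson_fidelity(Ku, f):
    """Discrete analogue of the data term in (6): sum(Ku - f * log(Ku))."""
    return np.sum(Ku - f * np.log(Ku))

def kl_divergence(f, Ku):
    """Kullback-Leibler (I-)divergence: sum(f * log(f / Ku) - f + Ku)."""
    return np.sum(f * np.log(f / Ku) - f + Ku)

# Hypothetical values: expected counts (K u)_i > 0 and positive photon counts f_i.
Ku = np.array([4.0, 9.0, 1.5, 6.0])
f = np.array([3.0, 11.0, 2.0, 5.0])

print("Gaussian term:", gaussian_fidelity(Ku, f))
print("Poisson term: ", poisson_fidelity(Ku, f))
# The difference below equals sum(f * log(f) - f), independent of Ku.
print("KL - Poisson: ", kl_divergence(f, Ku) - poisson_fidelity(Ku, f))
```

Both terms are smallest when $Ku = f$, but the Poisson term weights deviations relative to the expected count rather than uniformly.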
Up to additive terms independent of $u$, the data-fidelity term is the so-called Kullback-Leibler functional (also known as cross entropy or I-divergence) between the two probability measures $f$ and $Ku$. A particular complication of (6) compared to (5) is the strong nonlinearity in the data-fidelity term and the resulting issues in the computation of minimizers. For $K = \mathrm{Id}$, i.e. Poisson noise removal with total variation regularization, we refer to [3]. In the absence of regularization ($\alpha = 0$) the EM algorithm (cf. [4, 5, 6]) has become a standard scheme, which is, however, difficult to generalize to regularized cases. A robust solution of this problem for appropriate models of $R$ is one of the novelties of this paper.

The specific choice of the regularization functional $R$ in (6) determines how a-priori information about the expected solution is incorporated into the reconstruction process. Smooth, in particular quadratic, regularizations have attracted most attention in the past, mainly due to their simplicity in analysis and computation. However, such regularization approaches always lead to blurring of the reconstructions; in particular, they cannot yield reconstructions with sharp edges. Recently, singular regularization energies, in particular those of $\ell^1$ or $L^1$ type, have attracted strong attention. In this work, we introduce an approach which uses total variation (TV) as the regularization functional. TV regularization was derived as a denoising technique in [2] and subsequently generalized to various other imaging tasks. The exact definition of TV [7] used in this paper is

$$R(u) := |u|_{BV} = \sup_{g \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|g\|_\infty \le 1}\ \int_\Omega u\,\mathrm{div}\,g, \tag{7}$$
which is formally (true if $u$ is sufficiently regular) $|u|_{BV} = \int_\Omega |\nabla u|$. The motivation for using TV is the effective suppression of noise and the realization of almost homogeneous regions with sharp edges. These features are attractive for nanoscopic imaging if the goal is to identify object shapes that are separated by sharp edges and are to be analyzed quantitatively. Unfortunately, images reconstructed by methods using TV regularization suffer from a loss of contrast. In this paper, we propose to extend EM-TV by iterative regularization to Bregman-EM-TV, attaining simultaneous contrast enhancement. More precisely, we apply total variation inverse scale space methods by employing the concept of Bregman distance regularization. The latter has been derived in [8] with a detailed analysis for Gaussian-type problems (5) and generalized to the time-continuous setting [9] and to $L^p$-norm data fitting terms [10]. Here, in the case of Poisson-type problems, the method consists in first computing a minimizer $u^1$ of (6) with $R(u) := |u|_{BV}$. Updates are then determined successively by computing

$$u^{l+1} = \arg\min_{u \in BV(\Omega)} \left\{ \int_\Omega (Ku - f \log Ku)\,d\mu + \alpha\left(|u|_{BV} - \langle p^l, u\rangle\right) \right\}, \tag{8}$$

where $p^l$ is an element of the subdifferential of the total variation seminorm at $u^l$. Introducing the Bregman distance with respect to $|\cdot|_{BV}$, defined via

$$D^{\tilde p}_{|\cdot|_{BV}}(u, \tilde u) := |u|_{BV} - |\tilde u|_{BV} - \langle \tilde p, u - \tilde u\rangle, \qquad \tilde p \in \partial|\tilde u|_{BV} \subseteq BV^*(\Omega), \tag{9}$$
where $\langle\cdot,\cdot\rangle$ denotes the duality product, allows us to characterize $u^{l+1}$ in (8) as

$$u^{l+1} = \arg\min_{u \in BV(\Omega)} \left\{ \int_\Omega (Ku - f \log Ku)\,d\mu + \alpha\,D^{p^l}_{|\cdot|_{BV}}(u, u^l) \right\}. \tag{10}$$
We will see that inverse scale space strategies can noticeably improve reconstructions for inverse problems with Poisson statistics like optical nanoscopy.
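The step from (8) to (10) rests on a standard property of convex, positively one-homogeneous functionals such as the total variation. The short derivation below is added here as a reading aid and is not part of the original text: for $p^l \in \partial|u^l|_{BV}$ one has $\langle p^l, u^l\rangle = |u^l|_{BV}$, so the Bregman distance (9) and the regularization term in (8) coincide.

```latex
\begin{align*}
D^{p^l}_{|\cdot|_{BV}}(u, u^l)
  &= |u|_{BV} - |u^l|_{BV} - \langle p^l, u - u^l \rangle \\
  &= |u|_{BV} - \langle p^l, u \rangle
     + \underbrace{\langle p^l, u^l \rangle - |u^l|_{BV}}_{=\,0
       \text{ by one-homogeneity of } |\cdot|_{BV}} \\
  &= |u|_{BV} - \langle p^l, u \rangle .
\end{align*}
```

Hence the functionals in (8) and (10) are equal and have the same minimizers.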
2 Reconstruction Methods
In the literature there are two types of reconstruction methods in general use: analytic (direct) and algebraic (iterative) methods. A classical example of a direct method is the Fourier-based filtered backprojection (FBP). Although FBP is well understood and computationally efficient, iterative methods receive more and more attention in the applications mentioned above. The major reasons are the high noise level (low SNR) and the type of statistics, neither of which can be taken into account by direct methods. Hence, we give a short review of the Expectation-Maximization (EM) algorithm [4, 11], which is a popular iterative algorithm to maximize the likelihood function $p(u|f)$ in problems with incomplete data. We then proceed to the presentation of the proposed EM-TV and Bregman-EM-TV algorithms.

2.1 Reconstruction Method: EM Algorithm
In the absence of prior knowledge any object $u$ has the same relevance, i.e. the Gibbs a-priori density $p(u)$ in (3) is constant. We can then normalize $p(u)$ such that $R(u) \equiv 0$. Hence (6) reduces to the constrained minimization problem

$$\min_{u \ge 0}\ \int_\Omega (Ku - f \log Ku)\,d\mu. \tag{11}$$
A suitable iteration scheme for computing stationary points, which also preserves positivity (assuming $K$ preserves positivity), is the so-called EM algorithm (cf. [12]),

$$u_{k+1} = \frac{u_k}{K^* 1}\,K^*\!\left(\frac{f}{K u_k}\right), \qquad k = 0, 1, \ldots \tag{12}$$

For noise-free data $f$, several proofs of convergence of the EM algorithm to the maximum likelihood estimate, i.e. the solution of (11), can be found in the literature [12,13,14,15]. Besides, it is known that iteration (12) converges slowly. A further property of the iteration is a lack of smoothing, whereby the so-called "checkerboard effect" arises. For noisy data $f$ it is necessary to distinguish between discrete and continuous modeling. In the discrete case, i.e. if $K$ is a matrix and $u$ is a vector, the existence of a minimum can be guaranteed since the smallest singular value is bounded below by a positive value. Hence, the iterates remain bounded during the iteration and convergence is ensured. However, if $K$ is a general continuous operator
the convergence is not only difficult to prove, but even divergence of the EM algorithm is possible. Again the reason is the ill-posedness of the integral equation (1), which carries over to problem (11). This can be seen as a lack of additional a-priori knowledge about the unknown $u$, resulting from $R(u) = 0$. The EM algorithm converges to a minimizer if one exists. Consequently, in the continuous case it is essential to ensure consistency of the given data to prevent divergence of the EM algorithm. As described in [13], the EM iterates show the following typical behavior for ill-posed problems: the (metric) distance between the iterates and the solution decreases initially, before it increases as the noise is amplified during the iteration. This issue may be controlled by appropriate stopping rules to obtain reasonable results. In [13] it is shown that certain stopping rules indeed allow stable approximations. Ways to improve reconstruction results are TV or Bregman-TV regularization techniques, which we consider in the following section.

2.2 Reconstruction Method: EM-TV Algorithm
The EM or Richardson/Lucy algorithm is currently the standard iterative reconstruction method for deconvolution problems with Poisson noise based on the linear equation (1). However, with the assumption $R(u) = 0$, no a-priori knowledge about the expected solution is taken into account, i.e. different images have the same a-priori probability. Especially in the case of measurements with low SNR, the multiplicative fixed point iteration (12) delivers unsatisfactory and noisy results even with early termination. Therefore we propose to integrate nonlinear variational methods into the reconstruction process to make efficient use of a-priori information and to obtain improved results. An interesting approach to improving the reconstruction is the EM-TV algorithm. In the classical EM algorithm, the negative log-likelihood functional (11) is minimized. We modify the functional by adding a weighted TV term [2],

$$\min_{u \in BV(\Omega),\ u \ge 0}\ \int_\Omega (Ku - f \log Ku)\,d\mu + \alpha\,|u|_{BV}. \tag{13}$$
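For reference, the unregularized multiplicative EM iteration (12), on which the EM-TV approach builds, can be sketched in a discrete setting as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $K$ is taken to be a row-normalized Gaussian blur matrix, and the test signal, the noise seed and the number of iterations are hypothetical choices.

```python
import numpy as np

def em_step(u, K, f, eps=1e-12):
    """One multiplicative EM (Richardson-Lucy) update, cf. (12)."""
    Ku = np.maximum(K @ u, eps)                 # guard against division by zero
    sensitivity = K.T @ np.ones_like(f)         # K^* 1
    return u / sensitivity * (K.T @ (f / Ku))

def poisson_objective(u, K, f, eps=1e-12):
    """Discrete version of the functional in (11)."""
    Ku = np.maximum(K @ u, eps)
    return np.sum(Ku - f * np.log(Ku))

# Hypothetical 1D test problem: Gaussian blur matrix and Poisson-distributed data.
n = 64
x = np.arange(n)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)
K /= K.sum(axis=1, keepdims=True)
u_true = np.where((x > 20) & (x < 40), 50.0, 5.0)
rng = np.random.default_rng(0)
f = rng.poisson(K @ u_true).astype(float)

u = np.ones(n)                                  # positive initial guess
history = [poisson_objective(u, K, f)]
for _ in range(50):
    u = em_step(u, K, f)
    history.append(poisson_objective(u, K, f))
```

The iterates stay nonnegative and the objective in `history` does not increase, but nothing in the update penalizes oscillations, which is why noise is amplified when the iteration is run for long.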
Problem (13) is exactly (6) with TV as the regularization functional $R$. That is, images with smaller total variation are preferred in the minimization (they have higher prior probability). $BV(\Omega)$ is a popular function space in image processing since it can represent discontinuous functions; minimizing TV even favors the latter [16, 17]. Hence, the expected reconstructions are cartoon-like images. Such an approach is not suited to studying very small structures in an object, but it is well suited to segmenting different cell structures and analyzing them quantitatively. For the solution of (13), we propose a forward-backward splitting algorithm, which can be realized by alternating classical EM steps with almost standard TV minimization steps as encountered in image denoising. The latter are solved by using duality [18], yielding a robust and efficient algorithm. For designing the proposed alternating algorithm, we consider the first order optimality condition
of (13). Due to the total variation, this variational problem is not differentiable in the usual sense. It is, however, convex, since TV is convex and since we can extend the data-fidelity term to a Kullback-Leibler functional, cf. [19], without affecting the stationary points. For such problems powerful methods from convex analysis are available, e.g. a generalized derivative called the subdifferential [20], denoted by $\partial$. This generalized notion of gradients and the Karush-Kuhn-Tucker (KKT) conditions [20, Theorem 2.1.4] yield the existence of a Lagrange multiplier $\lambda \ge 0$ such that

$$\left\{ \begin{aligned} 0 &\in K^* 1 - K^*\!\left(\frac{f}{Ku}\right) + \alpha\,\partial|u|_{BV} - \lambda \\ 0 &= \lambda\,u \end{aligned} \right\}. \tag{14}$$

By multiplying (14) with $u$ we can eliminate the Lagrange multiplier and derive the following semi-implicit iteration scheme

$$u_{k+1} - \frac{u_k}{K^* 1}\,K^*\!\left(\frac{f}{K u_k}\right) + \tilde\alpha\,u_k\,p_{k+1} = 0 \tag{15}$$

with $p_{k+1} \in \partial|u_{k+1}|_{BV}$ and $\tilde\alpha := \frac{\alpha}{K^* 1}$. Interestingly, the second term within this iteration scheme is the EM step in (12). Consequently, method (15) for solving the variational problem (13) can be realized as a nested two-step iteration,

$$\left\{ \begin{aligned} u_{k+\frac12} &= \frac{u_k}{K^* 1}\,K^*\!\left(\frac{f}{K u_k}\right) && \text{(EM step)} \\ u_{k+1} &= u_{k+\frac12} - \tilde\alpha\,u_k\,p_{k+1} && \text{(TV step)} \end{aligned} \right\}. \tag{16}$$

Thus, we alternate an EM step with a TV correction step. The more involved second half step from $u_{k+\frac12}$ to $u_{k+1}$ can be realized by solving the following variational problem,

$$u_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{(u - u_{k+\frac12})^2}{u_k} + \tilde\alpha\,|u|_{BV} \right\}. \tag{17}$$

Inspecting the first order optimality condition confirms the equivalence of this minimization with the TV correction step in (16). Problem (17) is just a modified version of the Rudin-Osher-Fatemi (ROF) model, with weight $\frac{1}{u_k}$ in the fidelity term. This analogy creates the opportunity to carry over efficient numerical schemes known for the ROF model. For the solution of (17) we use the exact definition of TV (7) with dual variable $g$ and derive an iteration scheme for the quadratic dual problem similar to [18].
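The resulting algorithm reads as follows: we initialize the dual variable $g^0$ with 0 (or with the $g$ resulting from the previous TV correction step) and for any $n \ge 0$ we compute the update

$$g^{n+1} = \frac{g^n + \tau\,\nabla\big(\tilde\alpha\,u_k\,\mathrm{div}\,g^n - u_{k+\frac12}\big)}{1 + \tau\,\big|\nabla\big(\tilde\alpha\,u_k\,\mathrm{div}\,g^n - u_{k+\frac12}\big)\big|}, \qquad 0 < \tau
$$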
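The dual update for the weighted ROF problem (17) can be sketched in one space dimension as follows. This is an illustrative sketch only: the forward-difference discretization, the helper names and, in particular, the step size `tau` are assumptions made here (the admissible range of $\tau$ is not available in the text above), and the primal iterate is recovered as $u = u_{k+\frac12} - \tilde\alpha\,u_k\,\mathrm{div}\,g$, i.e. the TV step of (16) with $p = \mathrm{div}\,g$.

```python
import numpy as np

def grad(u):
    """Forward differences; the last entry is zero (no flux across the boundary)."""
    g = np.zeros_like(u)
    g[:-1] = u[1:] - u[:-1]
    return g

def div(g):
    """Discrete divergence, the negative adjoint of grad (g[-1] is ignored)."""
    d = np.zeros_like(g)
    d[0] = g[0]
    d[1:-1] = g[1:-1] - g[:-2]
    d[-1] = -g[-2]
    return d

def total_variation(u):
    return np.sum(np.abs(grad(u)))

def tv_correction_step(v, w, alpha_t, n_iter=300):
    """Approximate solve of (17): v = u_{k+1/2}, w = u_k > 0, alpha_t = alpha-tilde."""
    # ASSUMPTION: conservative step size chosen for this 1D sketch, not from the text.
    tau = 1.0 / (4.0 * np.max(alpha_t * w))
    g = np.zeros_like(v)
    for _ in range(n_iter):
        d = grad(alpha_t * w * div(g) - v)
        g = (g + tau * d) / (1.0 + tau * np.abs(d))   # keeps |g| <= 1 pointwise
    return v - alpha_t * w * div(g), g

# Hypothetical data: a noisy step standing in for the EM half step u_{k+1/2}.
rng = np.random.default_rng(1)
v = np.where(np.arange(64) < 32, 10.0, 20.0) + rng.normal(0.0, 1.0, 64)
w = np.maximum(v, 1.0)                                # stand-in for the previous iterate u_k
u_new, g = tv_correction_step(v, w, alpha_t=0.2)
```

Alternating `tv_correction_step` with an EM update of the form (12) would give the nested iteration (16); warm-starting `g` from the previous correction step, as suggested above, is omitted here for brevity.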
0 (resp., ≥ 0) for $v \in \mathbb{R}^m \setminus \{0\}$. This set is of special interest since DT-MRI produces data with this property. Note that at each point $x$ the matrix $U(x)$ of a field of symmetric matrices can be diagonalised, yielding $U(x) = V(x)^\top D(x)\,V(x)$, where $V(x)$ is an orthogonal matrix and $D(x)$ is a diagonal matrix. In the sequel we denote $m \times m$ diagonal matrices with entries $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ from left to right simply by $\mathrm{diag}(\lambda_i)$. The extension of a function $h : \mathbb{R} \to \mathbb{R}$ to $\mathrm{Sym}_m(\mathbb{R})$ is standard [19]: with a slight abuse of notation we set $h(U) := V^\top \mathrm{diag}(h(\lambda_1), \ldots, h(\lambda_m))\,V \in \mathrm{Sym}^+_m(\mathbb{R})$, $h$ now denoting a function acting on matrices as well. Specifying $h(s) = |s|$, $s \in \mathbb{R}$, as the absolute value function leads to the absolute value $|A| \in \mathrm{Sym}^+_m(\mathbb{R})$ of a matrix $A$. It is natural to define the partial derivative for matrix fields componentwise:

$$\overline{\partial}_\omega U = \left(\partial_\omega U_{p,q}\right)_{p,q=1,\ldots,m} \tag{6}$$
where $\omega \in \{t, x_1, \ldots, x_d\}$, that is, $\overline{\partial}_\omega$ stands for a spatial or temporal derivative. Viewing a matrix as a tensor (of second order), its gradient would be a third order tensor according to the rules of differential geometry. However, we adopt a more operator-algebraic point of view by defining the generalised gradient $\overline{\nabla} U(x)$ at a voxel $x = (x_1, \ldots, x_d)$ by

$$\overline{\nabla} U(x) := \left(\overline{\partial}_{x_1} U(x), \ldots, \overline{\partial}_{x_d} U(x)\right) \tag{7}$$
B. Burgeth et al.
which is an element of $(\mathrm{Sym}_m(\mathbb{R}))^d$, in close analogy to the scalar setting where $\nabla u(x) \in \mathbb{R}^d$. For $W \in (\mathrm{Sym}_m(\mathbb{R}))^d$ we set $|W|_p := \sqrt[p]{|W_1|^p + \cdots + |W_d|^p}$ for $0 < p < +\infty$. It results in a positive semidefinite matrix from $\mathrm{Sym}^+_m(\mathbb{R})$, the direct counterpart of a nonnegative real number as the length of a vector in $\mathbb{R}^d$. There will be the need for a symmetric multiplication of symmetric matrices. We opt for the so-called Jordan product $A \bullet_J B := \frac12 (AB + BA)$. It produces a symmetric matrix, and it is commutative but not associative. Furthermore, for later use in numerical schemes we have to clarify the notion of maximum and minimum of two symmetric matrices $A$, $B$. In direct analogy with relations known to be valid for real numbers one defines [8]:

$$\max(A, B) = \frac12\left(A + B + |A - B|\right) \quad\text{and}\quad \min(A, B) = \frac12\left(A + B - |A - B|\right) \tag{8}$$
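The matrix-valued operations introduced so far can be sketched with a symmetric eigendecomposition. The following is a minimal illustration, not code from the paper; the function names and the two sample matrices are hypothetical. Note that `numpy.linalg.eigh` returns $U = Q\,\mathrm{diag}(\lambda)\,Q^\top$, so $Q$ plays the role of $V^\top$ in the notation above.

```python
import numpy as np

def matrix_function(U, h):
    """Apply a scalar function h to a symmetric matrix via its eigenvalues."""
    lam, Q = np.linalg.eigh(U)               # U = Q diag(lam) Q^T
    return (Q * h(lam)) @ Q.T                # Q diag(h(lam)) Q^T

def matrix_abs(A):
    """Absolute value |A| of a symmetric matrix (h(s) = |s|)."""
    return matrix_function(A, np.abs)

def jordan_product(A, B):
    """Symmetrised product A •_J B = (AB + BA) / 2."""
    return 0.5 * (A @ B + B @ A)

def matrix_max(A, B):
    """Maximum of two symmetric matrices as in (8)."""
    return 0.5 * (A + B + matrix_abs(A - B))

def matrix_min(A, B):
    """Minimum of two symmetric matrices as in (8)."""
    return 0.5 * (A + B - matrix_abs(A - B))

# Hypothetical sample matrices.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[4.0, 0.0], [0.0, 1.0]])
upper = matrix_max(A, B)
lower = matrix_min(A, B)
```

In this sketch `upper - A` and `upper - B` are positive semidefinite, so `upper` dominates both arguments in the usual partial order on symmetric matrices, mirroring the scalar identity $\max(a,b) = \frac12(a + b + |a - b|)$.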
where $|F|$ stands for the absolute value of the matrix $F$. With this at our disposal we formulate the matrix-valued counterpart of (3) as

$$\overline{\partial}_t U = \left| M(U) \bullet \overline{\nabla} U \right|_2 \tag{9}$$
with an initial matrix field $F(x) = U(x, 0)$. Here $M(U)$ denotes a symmetric $md \times md$ block matrix with $d^2$ blocks of size $m \times m$ that is multiplied block-wise with $\overline{\nabla} U$ employing the symmetrised product "$\bullet$". Note that $|\cdot|_2$ stands for the length of $M(U) \bullet \overline{\nabla} U$ in the matrix-valued sense. The construction of $M(U)$ is detailed in Section 3 and relies on the so-called full structure tensor. The full structure tensor $S_L$ for matrix fields as defined in [10] reads

$$S_L(U) := G_\rho * \left(\overline{\nabla} U \cdot (\overline{\nabla} U)^\top\right) = \left(G_\rho * \left(\overline{\partial}_{x_i} U \cdot \overline{\partial}_{x_j} U\right)\right)_{i,j=1,\ldots,d} \tag{10}$$

with $G_\rho *$ indicating a convolution with a Gaussian of standard deviation $\rho$. $S_L(U(x))$ is a symmetric $md \times md$ block matrix with $d^2$ blocks of size $m \times m$, $S_L(U(x)) \in \mathrm{Sym}_d(\mathrm{Sym}_m(\mathbb{R})) = \mathrm{Sym}_{md}(\mathbb{R})$. Typically for 3D medical DT-MRI data one has $d = 3$ and $m = 3$, yielding a $9 \times 9$ matrix $S_L$. It can be diagonalised as $S_L(U) = \sum_{k=1}^{md} \lambda_k\,w_k w_k^\top$ with real eigenvalues $\lambda_k$ (w.l.o.g. arranged in decreasing order) and an orthonormal basis $\{w_k\}_{k=1,\ldots,md}$ of $\mathbb{R}^{md}$. In order to extract useful $d$-dimensional directional information, $S_L(U) \in \mathrm{Sym}_{md}(\mathbb{R})$ is reduced to a structure tensor $S(U) \in \mathrm{Sym}_d(\mathbb{R})$ in a generalised projection step [10] using the block operator matrix $\mathrm{Tr}_A := \mathrm{diag}(\mathrm{tr}_A, \ldots, \mathrm{tr}_A)$ containing the trace operation. We set $\mathrm{Tr} := \mathrm{Tr}_{I_m}$, where $I_m$ denotes the $m \times m$ unit matrix. This operator matrix acts on elements of the space $(\mathrm{Sym}_m(\mathbb{R}))^d$ as well as on block matrices via formal block-wise matrix multiplication,

$$\begin{pmatrix} \mathrm{tr}_A & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \mathrm{tr}_A \end{pmatrix} \begin{pmatrix} M_{11} & \cdots & M_{1d} \\ \vdots & \ddots & \vdots \\ M_{d1} & \cdots & M_{dd} \end{pmatrix} = \begin{pmatrix} \mathrm{tr}_A(M_{11}) & \cdots & \mathrm{tr}_A(M_{1d}) \\ \vdots & \ddots & \vdots \\ \mathrm{tr}_A(M_{d1}) & \cdots & \mathrm{tr}_A(M_{dd}) \end{pmatrix}, \tag{11}$$

provided that the square blocks $M_{ij}$ have the same size as $A$. The projection conveyed by the reduction process condenses the directional information contained in $S_L(U)$; for a more detailed reasoning we must refer the reader to [10]
PDE-Driven Adaptive Morphology for Matrix Fields
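for the sake of brevity. The reduction operation is accompanied by an extension operation: the $I_m$-extension is the mapping from $\mathrm{Sym}_d(\mathbb{R})$ to $\mathrm{Sym}_{md}(\mathbb{R})$ conveyed by the Kronecker product $\otimes$:

$$\begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dd} \end{pmatrix} \longmapsto \begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dd} \end{pmatrix} \otimes \begin{pmatrix} I_m & \cdots & I_m \\ \vdots & \ddots & \vdots \\ I_m & \cdots & I_m \end{pmatrix} := \begin{pmatrix} v_{11} I_m & \cdots & v_{1d} I_m \\ \vdots & \ddots & \vdots \\ v_{d1} I_m & \cdots & v_{dd} I_m \end{pmatrix} \tag{12}$$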
This resizing step renders a proper matrix-vector multiplication with the large generalised gradient $\overline{\nabla} U(x)$ possible. By specifying the matrix $A$ in (11) one may incorporate a priori knowledge into the direction estimation [10]. Research on these structure-tensor concepts was initiated in [38, 7]. The approaches to matrix field regularisation suggested in [11] are based on differential geometric considerations. Comprehensive surveys on the analysis of matrix fields using various techniques can be found in [39].
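The reduction (11) and the extension (12) can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, and the reduction is written only for the case $A = I_m$, where $\mathrm{tr}_A$ is assumed to be the ordinary matrix trace.

```python
import numpy as np

def im_extension(V, m):
    """Map a d x d matrix V to the md x md block matrix (v_ij * I_m), cf. (12)."""
    return np.kron(V, np.eye(m))

def trace_reduction(S_L, d, m):
    """Replace every m x m block of an md x md matrix by its trace.

    ASSUMPTION: this covers only A = I_m, with tr_A taken as the plain trace.
    """
    blocks = S_L.reshape(d, m, d, m)         # blocks[i, a, j, b] = S_L[i*m + a, j*m + b]
    return np.einsum("iaja->ij", blocks)     # trace over the inner block indices

# Hypothetical example with d = 2 and m = 3.
V = np.array([[2.0, 1.0], [1.0, 3.0]])
E = im_extension(V, 3)                       # 6 x 6 block matrix
R = trace_reduction(E, 2, 3)                 # equals 3 * V, since tr(I_3) = 3
```

Reducing an extended matrix returns the original matrix scaled by $m$, which shows that the two operations are compatible up to a constant factor.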
3 Steering Matrix M(U) for Matrix Fields
With these notions we are in a position to propose the steering matrix $M$ in the adaptive dilation process for matrix fields. We proceed in four steps:

1. The matrix field $\mathbb{R}^d \ni x \mapsto U(x)$ provides us with a module field of generalised gradients $\overline{\nabla} U(x)$, from which we construct the generalised structure tensor $S_L(U(x))$, possibly with a certain integration scale $\rho$. This step corresponds exactly to the scalar case.

2. We infer $d$-dimensional directional information by reducing $S_L(U(x))$ with $\mathrm{tr}_A$ by means of the block operator matrix $\mathrm{Tr}_A$, leading to a symmetric $d \times d$ matrix $S$,

$$S(x) := \mathrm{Tr}_A\,S_L(U(x)), \tag{13}$$

for example $S = J_\rho$ if $A = I_m$.

3. The symmetric $d \times d$ matrix $S$ is spectrally decomposed, and the following mapping is applied:

$$H :\ \mathbb{R}^d_+ \to \mathbb{R}^d, \qquad (\lambda_1, \ldots, \lambda_d) \mapsto \frac{c}{\lambda_1 + \cdots + \lambda_d}\,(\lambda_d, \lambda_{d-1}, \ldots, K \cdot \lambda_1), \tag{14}$$

with constants $c, K > 0$. $H$ applied to $S$ yields the steering matrix $M$,

$$M := H(S). \tag{15}$$
Observe that the ellipsoid associated with the matrix $M$ is flipped compared with that of $S$ and, depending on the choice of $K$, more eccentric than the one accompanying $S$.

4. Finally we enlarge the $d \times d$ matrix $M$ to an $md \times md$ matrix $\mathbf{M}$ by the extension operation:

$$\mathbf{M} = M \otimes \begin{pmatrix} I_m & \cdots & I_m \\ \vdots & \ddots & \vdots \\ I_m & \cdots & I_m \end{pmatrix} \tag{16}$$
4 Matrix-Valued Numerical Schemes
In the context of PDE-based mathematical morphology, first-order finite difference methods such as the Osher-Sethian scheme [25] and the Rouy-Tourin method [28] are reasonable choices for solving the scalar PDE (4). We choose the latter in our experiments. For the sake of brevity we present the variant in its two-dimensional form, which reads

$$u^{n+1}_{i,j} = u^n_{i,j} + \tau \left[ \left( \max\left( \frac{1}{h_x}\max\left(-D^x_- u^n_{i,j}, 0\right),\ \frac{1}{h_x}\max\left(D^x_+ u^n_{i,j}, 0\right) \right) \right)^2 + \left( \max\left( \frac{1}{h_y}\max\left(-D^y_- u^n_{i,j}, 0\right),\ \frac{1}{h_y}\max\left(D^y_+ u^n_{i,j}, 0\right) \right) \right)^2 \right]^{1/2} \tag{17}$$

In the latter formulation we employ the notation $u^n_{ij}$ for the grey value of the image $u$ at the pixel centred at $(ih_x, jh_y) \in \mathbb{R}^2$ at the time level $n\tau$ of the
Fig. 1. (a) Top left: 2D slice of original 3D matrix field. (b) Top right: Adaptive dilation of the original data with K = 25, ρ = 1 after t = 0.3. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element after t = 1. (d) Bottom right: CED-filtering with ρ = 4 after t = 10.
evolution. Additionally we use standard abbreviations for forward and backward difference operators, i.e., $D^x_+ u^n_{i,j} := u^n_{i+1,j} - u^n_{i,j}$ and $D^x_- u^n_{i,j} := u^n_{i,j} - u^n_{i-1,j}$, and spatial grid sizes $h_x$, $h_y$. This scheme approximates, at the pixel $(ih_x, jh_y)$,

$$u_x \approx \max\left( \frac{1}{h_x}\max\left(-D^x_- u^n_{i,j}, 0\right),\ \frac{1}{h_x}\max\left(D^x_+ u^n_{i,j}, 0\right) \right) \tag{18}$$

$$u_y \approx \max\left( \frac{1}{h_y}\max\left(-D^y_- u^n_{i,j}, 0\right),\ \frac{1}{h_y}\max\left(D^y_+ u^n_{i,j}, 0\right) \right) \tag{19}$$
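The scalar scheme (17) with the one-sided approximations (18) and (19) can be sketched as follows for the isotropic dilation $\partial_t u = |\nabla u|$. This is a minimal sketch, not the authors' code: the function name, the replicated-edge boundary treatment and the test image are assumptions made for illustration, and the index $i$ is taken to run along the first array axis.

```python
import numpy as np

def rouy_tourin_dilation_step(u, tau=0.1, hx=1.0, hy=1.0):
    """One explicit step of (17) for the dilation PDE u_t = |grad u|."""
    p = np.pad(u, 1, mode="edge")            # ASSUMPTION: replicated boundary values
    c = p[1:-1, 1:-1]
    dxp = p[2:, 1:-1] - c                    # D^x_+ u (index i along axis 0)
    dxm = c - p[:-2, 1:-1]                   # D^x_- u
    dyp = p[1:-1, 2:] - c                    # D^y_+ u (index j along axis 1)
    dym = c - p[1:-1, :-2]                   # D^y_- u
    ux = np.maximum(np.maximum(-dxm, 0.0), np.maximum(dxp, 0.0)) / hx   # cf. (18)
    uy = np.maximum(np.maximum(-dym, 0.0), np.maximum(dyp, 0.0)) / hy   # cf. (19)
    return u + tau * np.sqrt(ux ** 2 + uy ** 2)

# Hypothetical test image: a single bright pixel that spreads under dilation.
u0 = np.zeros((9, 9))
u0[4, 4] = 1.0
u = u0.copy()
for _ in range(5):
    u = rouy_tourin_dilation_step(u)
```

With the time step 0.1 used here, grey values can only grow and the global maximum is left unchanged, which is the expected behaviour of a dilation.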
Using these approximations, we modify the original Rouy-Tourin scheme (17) in an obvious manner to obtain a numerical scheme for the adaptive version of the PDE-based dilation (3). The extension to higher dimensions poses no problem. Since linear combinations and elementary functions such as the square, square root or absolute value function for matrix fields are now at our disposal, it is straightforward to define one-sided differences in x-direction for 2D matrix fields of $m \times m$ matrices:

$$D^x_+ U^n(i, j) := U^n((i + 1)h_x, jh_y) - U^n(ih_x, jh_y) \in \mathrm{Sym}_m(\mathbb{R}) \tag{20}$$

$$D^x_- U^n(i, j) := U^n(ih_x, jh_y) - U^n((i - 1)h_x, jh_y) \in \mathrm{Sym}_m(\mathbb{R}) \tag{21}$$
Fig. 2. (a) Left: 2D slice of 3D DT-MRI data set. (b) Right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5.
In order to avoid confusion with the subscript notation for matrix components we use the notation $U(i, j)$ to indicate the (matrix) value of the matrix field evaluated at the voxel centred at $(ih_x, jh_y) \in \mathbb{R}^2$. The y-direction (and z-direction in 3D) is treated accordingly. The notion of supremum and infimum of two matrices, as needed in a matrix variant of Rouy-Tourin, has been provided by (8). Having these generalisations at our disposal, a modified, adaptive version of the Rouy-Tourin scheme is now available in the setting of matrix fields simply by replacing grey values $u^n_{ij}$ by matrices $U^n(i, j)$.
5 Experiments
The matrix data are visualised as an ellipsoid in each voxel via the level sets of the quadratic form $\{v \in \mathbb{R}^3 : v^\top U^{-2}(i, j)\,v = \mathrm{const.}\}$ associated with the matrix
Fig. 3. (a) Top left: Enlarged section of the original data of Figure 2 showing the genu area. (b) Top right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element with t = 0.5. (d) Bottom right: CED-filtering with ρ = 1 after t = 0.5.
Fig. 4. (a) Top left: Enlarged section of the original data of Figure 2 showing the splenium area. (b) Top right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element with t = 0.5. (d) Bottom right: CED-filtering with ρ = 1 after t = 0.5.
$U(i, j) \in \mathrm{Sym}^+_3(\mathbb{R})$ representing the matrix field at voxel $(ih_x, jh_y)$. By using $U^{-2}$, the lengths of the semi-axes of the ellipsoid correspond directly to the three eigenvalues of the matrix. Changing the constant const. amounts to a mere scaling of the ellipsoids. Note that only positive definite matrices produce ellipsoids as level sets of their quadratic forms. In all our experiments we compare the results of the proposed matrix-valued adaptive dilation with the isotropic dilation [8] and with the matrix-valued coherence-enhancing diffusion from [10]. For the explicit numerical schemes we used a time step size of 0.1, grid size $h_x = h_y = 1$, and $c = 0.01 \cdot K$ in (14).

Figure 1 shows a synthetic data set of size 32 × 32 representing an interrupted diagonal stripe built from cigar-shaped ellipsoids of equal size. All methods succeed to some degree in filling the gaps. In the case of the proposed adaptive dilation the gap is filled almost completely with tensors comparable in size to the original ones, while the width of the stripe is not altered at all. However, the numerical scheme has a slight bias towards the directions of the coordinate system, resulting in mild artefacts. Standard dilation fills the gap basically as a side effect of the isotropic dilation process, which also leads to a considerable widening of the ribbon-like structure. CED for matrix fields does indeed produce small cigar-shaped ellipsoids at the location of the gap. But the process is considerably slower than any of the dilation processes
and the neighbouring ellipsoids become smaller due to the property of mass conservation. Additionally, an undesirable widening of the stripe is observed. We also tested the proposed method on a real DT-MRI data set of a human head consisting of a 128 × 128 × 38 field of positive definite matrices. Figure 2 shows the lateral ventricles in a 40 × 55 2D section before and after applying adaptive dilation with speed parameter K = 10, integration scale ρ = 1 and stopping time t = 0.5. For a better comparison we display two enlarged regions of interest in Figures 3 and 4, namely the genu and the splenium areas, respectively. We observe that adaptive dilation preserves the shape of the ventricles better than the isotropic dilation, while slightly enhancing the directional structure of the fibre tracts surrounding the ventricles. Due to measurement errors the fibre tracts are interrupted in the original data, Figures 3(a) and 4(a). These holes in the anisotropic regions (splenium) are quickly filled by the adaptive dilation, while CED-filtering takes much longer to do so.
6 Conclusion
In this article we have presented a novel method for an adaptive, PDE-based dilation process in the setting of matrix fields. The evolution governed by a matrix-valued PDE is guided by a steering tensor, the construction of which relies on an extended structure tensor concept for matrix fields. A matrix-valued extension of the Rouy-Tourin scheme that allows directional information to be included is employed to solve the novel PDE. Experiments on positive semidefinite DT-MRI and synthetic data confirm that the novel adaptive dilation process displays line-enhancing and gap-closing qualities, and as such it is superior to standard isotropic dilation, which extends structures in all directions. It is also a valuable alternative in terms of quality and speed to coherence-enhancing diffusion filtering for matrix fields, an anisotropic process which also aims at enhancing flow-like structures but may suffer from dissipative effects. Future research will concentrate on improving the numerical realisation of our adaptive dilation.
Acknowledgement

The financial support of the German Academic Exchange Service (DAAD) for the third author is gratefully acknowledged.
References 1. Alvarez, L., Guichard, F., Lions, P.-L., Morel, J.-M.: Axioms and fundamental equations in image processing. Archive for Rational Mechanics and Analysis 123, 199–257 (1993) 2. Arehart, A.B., Vincent, L., Kimia, B.B.: Mathematical morphology: The Hamilton–Jacobi connection. In: Proc. Fourth International Conference on Computer Vision, Berlin, pp. 215–219. IEEE Computer Society Press, Los Alamitos (1993)
PDE-Driven Adaptive Morphology for Matrix Fields
257
3. Bigün, J.: Vision with Direction. Springer, Berlin (2006) 4. Bigün, J., Granlund, G.H., Wiklund, J.: Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(8), 775–790 (1991) 5. Breuß, M., Burgeth, B., Weickert, J.: Anisotropic continuous-scale morphology. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds.) IbPRIA 2007. LNCS, vol. 4478, pp. 515–522. Springer, Heidelberg (2007) 6. Brockett, R.W., Maragos, P.: Evolution equations for continuous-scale morphological filtering. IEEE Transactions on Signal Processing 42, 3377–3386 (1994) 7. Brox, T., Weickert, J., Burgeth, B., Mrázek, P.: Nonlinear structure tensors. Image and Vision Computing 24(1), 41–55 (2006) 8. Burgeth, B., Bruhn, A., Didas, S., Weickert, J., Welk, M.: Morphology for tensor data: Ordering versus PDE-based approach. Image and Vision Computing 25(4), 496–511 (2007) 9. Burgeth, B., Didas, S., Florack, L., Weickert, J.: A generic approach to diffusion filtering of matrix-fields. Computing 81, 179–197 (2007) 10. Burgeth, B., Didas, S., Weickert, J.: A general structure tensor concept and coherence-enhancing diffusion filtering for matrix fields. Technical Report 197, Department of Mathematics, Saarland University, Saarbrücken, Germany (July 2007); to appear in: Laidlaw, D., Weickert, J. (eds.): Visualization and Processing of Tensor Fields. Springer, Heidelberg (2009) 11. Chefd’Hotel, C., Tschumperlé, D., Deriche, R., Faugeras, O.: Constrained flows of matrix-valued functions: Application to diffusion tensor regularization. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 251–265. Springer, Heidelberg (2002) 12. Di Zenzo, S.: A note on the gradient of a multi-image. Computer Vision, Graphics and Image Processing 33, 116–125 (1986) 13. Feddern, C., Weickert, J., Burgeth, B., Welk, M.: Curvature-driven PDE methods for matrix-valued images. 
International Journal of Computer Vision 69(1), 91–103 (2006) 14. Förstner, W., Gülch, E.: A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: Proc. ISPRS Intercommission Conference on Fast Processing of Photogrammetric Data, Interlaken, Switzerland, June 1987, pp. 281–305 (1987) 15. Goutsias, J., Heijmans, H.J.A.M., Sivakumar, K.: Morphological operators for image sequences. Computer Vision and Image Understanding 62, 326–346 (1995) 16. Goutsias, J., Vincent, L., Bloomberg, D.S. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 18. Kluwer, Dordrecht (2000) 17. Heijmans, H.J.A.M.: Morphological Image Operators. Academic Press, Boston (1994) 18. Heijmans, H.J.A.M., Roerdink, J.B.T.M. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 12. Kluwer, Dordrecht (1998) 19. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1990) 20. Kramer, H.P., Bruckner, J.B.: Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition 7, 53–58 (1975) 21. Lerallut, R., Decencière, E., Meyer, F.: Image filtering using morphological amoebas. Image and Vision Computing 25(4), 395–404 (2007)
B. Burgeth et al.
22. Louverdis, G., Vardavoulia, M.I., Andreadis, I., Tsalides, P.: A new approach to morphological color image processing. Pattern Recognition 35, 1733–1741 (2002) 23. Matheron, G.: Eléments pour une théorie des milieux poreux. Masson, Paris (1967) 24. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975) 25. Osher, S., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, vol. 153. Springer, New York (2002) 26. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988) 27. Rao, A.R., Schunck, B.G.: Computing oriented texture fields. CVGIP: Graphical Models and Image Processing 53, 157–185 (1991) 28. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992) 29. Sapiro, G., Kimmel, R., Shaked, D., Kimia, B.B., Bruckstein, A.M.: Implementing continuous-scale morphology via curve evolution. Pattern Recognition 26, 1363– 1372 (1993) 30. Schultz, T., Burgeth, B., Weickert, J.: Flexible segmentation and smoothing of DTMRI fields through a customizable structure tensor. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Remagnino, P., Nefian, A., Meenakshisundaram, G., Pascucci, V., Zara, J., Molineros, J., Theisel, H., Malzbender, T. (eds.) ISVC 2006. LNCS, vol. 4291, pp. 455–464. Springer, Heidelberg (2006) 31. Serra, J.: Echantillonnage et estimation des phénomènes de transition minier. PhD thesis, University of Nancy, France (1967) 32. Serra, J.: Image Analysis and Mathematical Morphology, vol. 1. Academic Press, London (1982) 33. Serra, J.: Image Analysis and Mathematical Morphology, vol. 2. Academic Press, London (1988) 34. Soille, P.: Morphological Image Analysis, 2nd edn. Springer, Berlin (2003) 35. van den Boomgaard, R.: Mathematical Morphology: Extensions Towards Computer Vision. 
PhD thesis, University of Amsterdam, The Netherlands (1992) 36. Weickert, J.: Coherence-enhancing diffusion of colour images. In: Sanfeliu, A., Villanueva, J.J., Vitrià, J. (eds.) Proc. Seventh National Symposium on Pattern Recognition and Image Analysis, Barcelona, Spain, April 1997, vol. 1, pp. 239–244 (1997) 37. Weickert, J.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2/3), 111–127 (1999) 38. Weickert, J., Brox, T.: Diffusion and regularization of vector- and matrix-valued images. In: Nashed, M.Z., Scherzer, O. (eds.) Inverse Problems, Image Analysis, and Medical Imaging. Contemporary Mathematics, vol. 313, pp. 251–268. AMS, Providence (2002) 39. Weickert, J., Hagen, H. (eds.): Visualization and Processing of Tensor Fields. Springer, Berlin (2006)
On Semi-implicit Splitting Schemes for the Beltrami Color Flow

Lorina Dascal¹, Guy Rosman¹, Xue-Cheng Tai²,³, and Ron Kimmel¹

¹ Department of Computer Science, Technion – Israel Institute of Technology, 32000, Haifa, Israel
{lorina,rosman,ron}@cs.technion.ac.il
² Division of Mathematical Sciences, SPMS, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
[email protected]
³ Department of Mathematics, University of Bergen, Johannes Brunsgate 12, 5007, Bergen, Norway
[email protected]

Abstract. The Beltrami flow is an efficient non-linear filter that has been shown to be effective for color image processing. The corresponding anisotropic diffusion operator strongly couples the spectral components. Usually, this flow is implemented by explicit schemes, which are stable only for small time steps and therefore require many iterations. In this paper we introduce a semi-implicit scheme based on the locally one-dimensional (LOD) and additive operator splitting (AOS) schemes for implementing the anisotropic Beltrami operator. The mixed spatial derivatives are treated explicitly, while the non-mixed derivatives are approximated in a semi-implicit manner. Numerical experiments demonstrate the stability of the proposed scheme. Accuracy and efficiency of the splitting schemes are tested in applications such as scale-space analysis and denoising. In order to further accelerate the convergence of the numerical scheme, the reduced rank extrapolation (RRE) vector extrapolation technique is employed.
1 Introduction
Nonlinear diffusion filters based on partial differential equations (PDEs) have been extensively used in the last decade for different tasks in image processing. Their efficient implementation requires designing numerical schemes in which the issues of accuracy, stability, and computational cost all play important roles. The Beltrami image flow is an example of a non-linear filter that is efficient for color image processing. It treats the image as a 2-D manifold embedded in a hybrid spatial-feature space. Minimization of the surface area of the image manifold yields the Beltrami flow. The corresponding diffusion operator is anisotropic and strongly couples the spectral components. Due to its anisotropy and non-separability, so far there has been neither an efficient implicit scheme nor an operator-splitting-based numerical scheme for the partial differential equation that describes the Beltrami flow in color.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 259–270, 2009. © Springer-Verlag Berlin Heidelberg 2009
Usual discretizations of this filter are based on explicit schemes, which limit the time step and therefore result in a large number of iterations. In [1] an acceleration technique based on the reduced rank extrapolation (RRE) algorithm [2, 3] was proposed in order to speed up the slow convergence of the explicit scheme. As an alternative to the explicit scheme, an approximation using the short-time kernel of the Beltrami operator was suggested in [4]. Although unconditionally stable, this method is still computationally demanding, since computing the kernel involves geodesic distance computation around each pixel. The bilateral filter, which can be shown to be a Euclidean approximation of the Beltrami kernel, was studied in different contexts (see [5], [6], [7], [8], [9], [10]). Recently, a related filter, the nonlocal means filter, was proposed in [11] and shown to be useful in denoising gray-scale and color images.

In this paper we propose to approximate the system of nonlinear coupled equations given by the Beltrami flow by a semi-implicit finite difference scheme based on operator splitting. Additive operator splitting (AOS) schemes were first developed for (nonlinear elliptic/parabolic) monotone equations and Navier-Stokes equations [12, 13]. In image processing applications, the AOS scheme was found to be an efficient way of approximating the Perona-Malik filter [14], especially if symmetry in scale-space is required. The AOS scheme is first order in time, semi-implicit, and unconditionally stable with respect to its time step [13, 14]. In the 1950s (see [15]) the alternating-direction implicit (ADI) method was introduced, and in [16] the locally one-dimensional (LOD) splitting method was proposed. The LOD scheme and other multiplicative splitting methods were employed in the context of nonlinear diffusion image filtering in [17]. We stress that the main characteristic of this class of equations, which allows splitting, is local isotropy.

However, in the case of the anisotropic Beltrami operator, the main difficulty in splitting stems from the presence of the mixed derivatives. To overcome this problem, we suggest the following semi-implicit scheme: the spatial mixed derivatives are discretized explicitly at the current time step nΔt, while the non-mixed derivatives are approximated using an average of the two time levels nΔt and (n + 1)Δt (Crank-Nicolson scheme). As our equations are nonlinear, a stability proof of the corresponding finite difference scheme is a non-trivial task. We provide numerical experiments which indicate that the LOD and the AOS splitting schemes for the nonlinear Beltrami color filter are stable for a wide range of time steps. We demonstrate the efficiency and stability of the splitting in applications such as Beltrami-based scale-space and Beltrami-based denoising.

In order to further expedite the LOD/AOS splitting schemes, we show how to speed up their convergence by using the RRE (reduced rank extrapolation) technique. The RRE method was introduced by Mešina and Eddy [2, 3] to speed up the convergence of general sequences of vectors without explicit knowledge of the sequence generator. This technique was applied in [1] in order to speed up the slow convergence of the standard explicit scheme for the Beltrami color flow. In this paper we show that in applications such as scale-space and denoising of color images, the semi-implicit LOD/AOS schemes can also be accelerated using the RRE technique.
This paper is organized as follows: In Section 2 we briefly summarize the Beltrami framework. In Section 3 we briefly review general semi-implicit splitting operator schemes. In Section 4 we propose a semi-implicit splitting scheme for the anisotropic Beltrami operator, based on the LOD/AOS schemes. In Section 5 we demonstrate the efficiency and stability of the LOD/AOS splitting schemes for Beltrami-based scale-space and Beltrami-based denoising. Furthermore, we propose to accelerate the LOD/AOS schemes using the RRE technique. Section 6 concludes the paper.
2 The Beltrami Framework
Let us briefly review the Beltrami framework for non-linear diffusion in computer vision [18, 19, 20, 21]. We represent images as embedding maps of a Riemannian manifold in a higher-dimensional space. We denote the map by $U : \Sigma \to M$, where $\Sigma$ is a two-dimensional surface, with $(\sigma^1, \sigma^2)$ denoting coordinates on it. $M$ is the spatial-feature manifold, embedded in $\mathbb{R}^{d+2}$, where $d$ is the number of image channels. For example, a gray-level image can be represented as a 2D surface embedded in $\mathbb{R}^3$. The map $U$ in this case is $U(\sigma^1, \sigma^2) = (\sigma^1, \sigma^2, I(\sigma^1, \sigma^2))$, where $I$ is the image intensity. For color images, $U$ is given by $U(\sigma^1, \sigma^2) = (\sigma^1, \sigma^2, I^1(\sigma^1, \sigma^2), I^2(\sigma^1, \sigma^2), I^3(\sigma^1, \sigma^2))$, where $I^1, I^2, I^3$ are the three components of the color vector.

Next, we choose a Riemannian metric $g$ on this surface, with elements denoted by $g_{ij}$. The canonical choice of coordinates in image processing is Cartesian (we denote them here by $x_1$ and $x_2$). For such a choice, which we follow in the rest of the paper, we identify $\sigma^1 = x_1$ and $\sigma^2 = x_2$. In this case, $\sigma^1$ and $\sigma^2$ are the image coordinates. We denote the elements of the inverse of the metric by superscripts, $g^{ij}$, and the determinant by $g = \det(g_{ij})$. Once images are defined as embeddings of Riemannian manifolds, it is natural to look for a measure on this space of embedding maps. Denote by $(\Sigma, g)$ the image manifold and its metric, and by $(M, h)$ the space-feature manifold and its metric. Then, the functional $S[U]$ assigns a real number to a map $U : \Sigma \to M$,

$$S[U, g_{ij}, h_{ab}] = \int d^s\sigma \, \sqrt{g} \, \|dU\|^2_{g,h}, \tag{1}$$

where $s$ is the dimension of $\Sigma$, $g$ is the determinant of the image metric, and the range of indices is $i, j = 1, 2, \ldots, \dim(\Sigma)$ and $a, b = 1, 2, \ldots, \dim(M)$. The integrand $\|dU\|^2_{g,h}$ is expressed in a local coordinate system by $\|dU\|^2_{g,h} = (\partial_{x_i} U^a)\, g^{ij}\, (\partial_{x_j} U^b)\, h_{ab}$. This functional, for $\dim(\Sigma) = 2$ and $h_{ab} = \delta_{ab}$, was first proposed by Polyakov [22] in the context of high energy physics, in the theory known as string theory. The elements of the induced metric for color images with Cartesian color coordinates are

$$G = (g_{ij}) = \begin{pmatrix} 1 + \beta^2 \sum_{a=1}^{3} (U^a_{x_1})^2 & \beta^2 \sum_{a=1}^{3} U^a_{x_1} U^a_{x_2} \\ \beta^2 \sum_{a=1}^{3} U^a_{x_1} U^a_{x_2} & 1 + \beta^2 \sum_{a=1}^{3} (U^a_{x_2})^2 \end{pmatrix}, \tag{2}$$
where a subscript of $U$ denotes a partial derivative and the parameter $\beta > 0$ determines the ratio between the spatial and spectral (color) distances. Using standard methods in calculus of variations, the Euler-Lagrange equations with respect to the embedding (assuming Euclidean embedding space) are

$$0 = -\frac{1}{\sqrt{g}}\, h^{ab}\, \frac{\delta S}{\delta U^b} = \underbrace{\frac{1}{\sqrt{g}}\, \operatorname{div}(D \nabla U^a)}_{\Delta_g U^a}, \tag{3}$$

where the diffusion matrix is $D = \sqrt{g}\, G^{-1}$. Note that we can write

$$\operatorname{div}(D \nabla U) = \sum_{q,r=1}^{2} \partial_{x_q}\left(d_{qr}\, \partial_{x_r} U\right).$$
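As an illustration of the definitions above, the following NumPy sketch evaluates the induced metric of Eq. (2) and the entries of $D = \sqrt{g}\,G^{-1}$ on a pixel grid. The function name, the array layout (channel, $x_1$, $x_2$), and the use of `np.gradient` for the derivatives are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def beltrami_diffusion_tensor(U, beta, h=1.0):
    """Return (A, B, C, g) with D = sqrt(g) * G^{-1} = [[A, B], [B, C]].

    U is a float array of shape (3, m, m): channel, x1 index, x2 index.
    (Hypothetical helper, written for illustration.)
    """
    # Central differences in the interior, one-sided at the border.
    Ux1 = np.gradient(U, h, axis=1)
    Ux2 = np.gradient(U, h, axis=2)
    # Entries of the induced metric G, Eq. (2).
    g11 = 1.0 + beta**2 * np.sum(Ux1**2, axis=0)
    g22 = 1.0 + beta**2 * np.sum(Ux2**2, axis=0)
    g12 = beta**2 * np.sum(Ux1 * Ux2, axis=0)
    g = g11 * g22 - g12**2            # determinant of G
    sqrt_g = np.sqrt(g)
    # G^{-1} = (1/g) [[g22, -g12], [-g12, g11]], hence D = sqrt(g) G^{-1}.
    A = g22 / sqrt_g
    B = -g12 / sqrt_g
    C = g11 / sqrt_g
    return A, B, C, g
```

Since $\det D = g \det(G^{-1}) = 1$, checking that `A * C - B**2` is close to 1 at every pixel is a convenient sanity test for such an implementation.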
The operator that acts on $U$ is the natural generalization of the Laplacian from flat spaces to manifolds. It is called the Laplace-Beltrami operator, and denoted by $\Delta_g$. The parameter $\beta$, in the elements of the metric $g_{ij}$, determines the nature of the flow. At the limits, where $\beta \to 0$ and $\beta \to \infty$, we obtain respectively a linear diffusion flow and a nonlinear flow, akin to the TV flow [23] for the case of grey-level images (see [20] for details). The Beltrami scale-space emerges as a gradient descent minimization process

$$U^a_t = -\frac{1}{\sqrt{g}} \frac{\delta S}{\delta U^a} = \Delta_g U^a, \qquad a = 1, 2, 3. \tag{4}$$

For Euclidean embedding, the functional in Eq. (1) reduces to

$$S(U) = \int \sqrt{g}\; dx_1\, dx_2. \tag{5}$$

This geometric measure can be used as a regularization term for color image processing. In the variational framework, the reconstructed image is the minimizer of a cost functional. This functional can be written in the following general form,

$$\Psi(U) = \lambda \sum_{a=1}^{3} \|U^a - F^a\|^2 + S(U),$$

where the parameter $\lambda$ controls the smoothness of the solution and $F$ is the given image. The modified Euler-Lagrange equations as a gradient descent process are

$$U^a_t = -\frac{1}{\sqrt{g}} \frac{\delta \Psi}{\delta U^a} = -\frac{2\lambda}{\sqrt{g}} (U^a - F^a) + \Delta_g U^a, \qquad a = 1, 2, 3. \tag{6}$$
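For orientation, a single forward-Euler (explicit) step of Eq. (6) might be sketched as follows. This is only a minimal illustration under stated assumptions: the function name, the `np.gradient`-based discretization (which uses a wider stencil than a compact central-difference scheme), and the array layout are choices made here, not the discretization used by the authors.

```python
import numpy as np

def explicit_beltrami_step(U, F, beta, lam, dt, h=1.0):
    """One forward-Euler step of Eq. (6); lam = 0 gives the flow of Eq. (4).

    U and F are float arrays of shape (3, m, m). Hypothetical helper.
    """
    Ux = np.gradient(U, h, axis=1)
    Uy = np.gradient(U, h, axis=2)
    g11 = 1.0 + beta**2 * np.sum(Ux**2, axis=0)
    g22 = 1.0 + beta**2 * np.sum(Uy**2, axis=0)
    g12 = beta**2 * np.sum(Ux * Uy, axis=0)
    sqrt_g = np.sqrt(g11 * g22 - g12**2)
    # Flux D grad U with D = [[g22, -g12], [-g12, g11]] / sqrt(g).
    flux_x = (g22 * Ux - g12 * Uy) / sqrt_g
    flux_y = (g11 * Uy - g12 * Ux) / sqrt_g
    div = np.gradient(flux_x, h, axis=1) + np.gradient(flux_y, h, axis=2)
    laplace_beltrami = div / sqrt_g
    # Fidelity term pulls U toward the given image F.
    return U + dt * (laplace_beltrami - 2.0 * lam / sqrt_g * (U - F))
```

Because the step is explicit, `dt` has to be kept small for stability, which is the limitation the splitting schemes discussed in this paper aim to remove.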
3 Operator Splitting Schemes
In this section we briefly review standard first-order accurate splitting schemes for diffusion equations. One of the main drawbacks of semi-implicit schemes for such equations in multiple dimensions is that the resulting system matrix cannot be inverted efficiently. In order to remedy this shortcoming, splitting techniques are commonly employed in solving time-dependent partial differential equations. They allow one to reduce problems in multiple spatial dimensions to a sequence of problems in one dimension, which are easier to solve. One of the simplest splitting schemes, belonging to the class of multiplicative operator splitting schemes, is the locally one-dimensional (LOD) scheme [16]. The LOD scheme only needs to invert one tridiagonal matrix for each direction. It is simple to implement, unconditionally stable, and first-order accurate. However, the system matrix is not axis symmetric, a property that may be important in some cases. If such a property is required, one could use the additive operator splitting scheme [13], which was actually invented for parallel implementation of splitting methods. Even for sequential implementations, the AOS scheme is almost as efficient as the LOD scheme; instead of multiplying the one-dimensional operators, one applies their inverses independently and then averages the results. We want to emphasize that the matrices for AOS use 2Δt instead of Δt. It is not a trivial matter to apply dimensional splitting schemes to Beltrami-type equations. Our goal is to construct a splitting scheme for the nonlinear anisotropic Beltrami operator which would amount to inverting tridiagonal matrices, be unconditionally stable, and preserve the time discretization accuracy that was obtained without applying splitting techniques.
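To make the difference between the two splittings concrete, here is a minimal sketch for the linear heat equation $u_t = u_{xx} + u_{yy}$ with zero-flux boundaries, which is a simpler model problem than the Beltrami flow. The helper names and the Thomas-algorithm solver are written for this illustration only.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm along axis 0 of rhs (shape (n, k)); bands have shape (n,)."""
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros_like(rhs, dtype=float)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.empty_like(d)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def implicit_1d(u, tau, h, axis):
    """Apply (I - tau * L)^{-1} along `axis`, L = 1-D zero-flux Laplacian."""
    n = u.shape[axis]
    r = tau / h**2
    lower = np.full(n, -r)
    upper = np.full(n, -r)
    diag = np.full(n, 1.0 + 2.0 * r)
    diag[0] = diag[-1] = 1.0 + r
    lower[0] = 0.0
    upper[-1] = 0.0
    x = solve_tridiagonal(lower, diag, upper, np.moveaxis(u, axis, 0))
    return np.moveaxis(x, 0, axis)

def lod_step(u, dt, h=1.0):
    # Multiplicative splitting: one 1-D implicit solve after the other.
    return implicit_1d(implicit_1d(u, dt, h, 0), dt, h, 1)

def aos_step(u, dt, h=1.0):
    # Additive splitting: independent 1-D solves with 2*dt, then averaged.
    return 0.5 * (implicit_1d(u, 2.0 * dt, h, 0) + implicit_1d(u, 2.0 * dt, h, 1))
```

In this linear model problem both steps preserve the mean gray value and satisfy a discrete maximum principle for any `dt`; the AOS step additionally treats both axes symmetrically by construction.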
4 The Proposed Splitting Scheme
In this section we present an operator splitting scheme for the Beltrami filter. Before splitting, we first introduce a semi-implicit approximation scheme for our equations. A semi-implicit Crank-Nicolson scheme for an equation involving mixed derivatives can rely on the following discretization of the spatial derivative operators: mixed derivatives are computed at time step nΔt, while the non-mixed derivatives are computed as the average of the values at time steps nΔt and (n + 1)Δt. This approach for handling mixed derivatives in semi-implicit schemes for approximating linear equations has been considered in several previous works (see [24, 25, 26] for example), including the context of image processing [27], although it was not combined with the Crank-Nicolson method in the latter case. We note that in numerical experiments we have found the introduction of the Crank-Nicolson method into the splitting scheme necessary in order to maintain stability for large time steps. A simpler scheme, similar to the one used in [27], did not seem to be sufficiently stable for this PDE and the applications demonstrated in this paper. We now present the scheme we intend to use.
First, we refine our grid notations. We work on the rectangle $\Omega = (0, 1) \times (0, 1)$, which we discretize by a uniform grid of $m \times m$ pixels, such that $x_i = i\Delta x$, $y_j = j\Delta y$, $t_n = n\Delta t$, where $1 \le i \le m$, $1 \le j \le m$, $1 \le n \le J$ and $J\Delta t = T$. Let the grid size be $\Delta x = \Delta y = \frac{1}{m-1}$. For each channel $U^a$, $a = 1, 2, 3$, of the color vector, we define the discrete approximation $(U^a)^n_{ij}$ by

$$(U^a)^n_{ij} \approx U^a(i\Delta x, j\Delta y, n\Delta t).$$

We impose Neumann boundary conditions, and initially set $U^a$ to be our initial data image.

4.1 LOD/AOS Scheme for the Beltrami Scale-Space
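We approximate the Beltrami filter given in Eq. (4) by the following semi-implicit Crank-Nicolson scheme:

$$\frac{(U^a)^{n+1} - (U^a)^n}{\Delta t} = \frac{1}{\sqrt{g^n}} \left[ \sum_{l=1}^{2} \left( \frac{1}{2} A^n_{ll} (U^a)^{n+1} + \frac{1}{2} A^n_{ll} (U^a)^n \right) + \sum_{q=1}^{2} \sum_{r \neq q} A^n_{qr} (U^a)^n \right],$$

where $U^a$ is the $N$-dimensional vector denoting one of the components of the color vector, and $A^n_{qr}$ is a central difference approximation of the operator $\partial_{x_q}(d_{qr}\, \partial_{x_r})$ at time step $n$. Rearranging terms, we obtain

$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2\sqrt{g^n}} \sum_{l=1}^{2} A^n_{ll} \right)^{-1} \left( I + \frac{\Delta t}{\sqrt{g^n}} \sum_{q=1}^{2} \sum_{r \neq q} A^n_{qr} + \frac{\Delta t}{2\sqrt{g^n}} \sum_{l=1}^{2} A^n_{ll} \right) (U^a)^n,$$

which can also be written as

$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} \right)^{-1} \left( I + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} + \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} \right) (U^a)^n,$$

where

$$\bar{A}_{11} = \frac{1}{\sqrt{g}}\, \partial_x (A\, \partial_x), \qquad \bar{A}_{22} = \frac{1}{\sqrt{g}}\, \partial_y (C\, \partial_y), \qquad \bar{A}_{12} = \frac{1}{\sqrt{g}}\, \partial_x (B\, \partial_y), \qquad \bar{A}_{21} = \frac{1}{\sqrt{g}}\, \partial_y (B\, \partial_x),$$

and the functions $A$, $B$, $C$ are the corresponding elements of the diffusion matrix associated with the Beltrami flow.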
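This semi-implicit scheme still has a major drawback: at each iteration one needs to solve a large linear system whose coefficient matrix is not tridiagonal, which is costly. Instead, we employ the LOD splitting scheme

$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \bar{A}_{22} \right)^{-1} \left( I - \frac{\Delta t}{2} \bar{A}_{11} \right)^{-1} \left[ \left( I + \frac{\Delta t}{2} \bar{A}_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n,$$

or the AOS scheme, which reads

$$(U^a)^{n+1} = \frac{1}{2} \left[ \left( I - \Delta t\, \bar{A}_{22} \right)^{-1} + \left( I - \Delta t\, \bar{A}_{11} \right)^{-1} \right] \left[ \left( I + \frac{\Delta t}{2} \bar{A}_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n.$$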
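A compact sketch of one LOD step of the above form is given here for illustration. It is not the authors' implementation: the half-point averaging of the diffusivities, the `np.gradient`-based mixed terms, the zero-flux boundary treatment, and all function names are assumptions chosen to keep the example short.

```python
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Solve independent tridiagonal systems along axis 0 (all arrays (n, k))."""
    n = rhs.shape[0]
    c = np.zeros_like(rhs)
    d = np.zeros_like(rhs)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.empty_like(rhs)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def bands(W, sqrt_g, h):
    """Bands of (1/sqrt_g) d/dx (W d/dx) along axis 0, zero-flux boundaries."""
    half = 0.5 * (W[1:] + W[:-1]) / h**2      # W at the half points i + 1/2
    lower = np.zeros_like(W)
    upper = np.zeros_like(W)
    upper[:-1] = half
    lower[1:] = half
    return lower / sqrt_g, -(lower + upper) / sqrt_g, upper / sqrt_g

def apply_bands(b, V):
    lower, diag, upper = b
    out = diag * V
    out[:-1] += upper[:-1] * V[1:]
    out[1:] += lower[1:] * V[:-1]
    return out

def half_solve(b, rhs, dt):
    """Apply (I - dt/2 * A)^{-1} for the tridiagonal operator A given by b."""
    lower, diag, upper = b
    return thomas(-0.5 * dt * lower, 1.0 - 0.5 * dt * diag, -0.5 * dt * upper, rhs)

def lod_beltrami_step(U, beta, dt, h=1.0):
    """One LOD step for the scale-space flow; U is a float array (3, m, m)."""
    Ux = np.gradient(U, h, axis=1)
    Uy = np.gradient(U, h, axis=2)
    g11 = 1.0 + beta**2 * np.sum(Ux**2, axis=0)
    g22 = 1.0 + beta**2 * np.sum(Uy**2, axis=0)
    g12 = beta**2 * np.sum(Ux * Uy, axis=0)
    sqrt_g = np.sqrt(g11 * g22 - g12**2)
    A, B, C = g22 / sqrt_g, -g12 / sqrt_g, g11 / sqrt_g
    bx = bands(A, sqrt_g, h)            # A11-bar, acts along axis 0 (x)
    by = bands(C.T, sqrt_g.T, h)        # A22-bar, acts along y after a transpose
    out = np.empty_like(U)
    for a in range(U.shape[0]):
        V = U[a]
        # Explicit mixed terms (A12-bar + A21-bar) U^a at time level n.
        mixed = (np.gradient(B * Uy[a], h, axis=0)
                 + np.gradient(B * Ux[a], h, axis=1)) / sqrt_g
        W = V + 0.5 * dt * apply_bands(by, V.T).T       # (I + dt/2 A22) U
        rhs = W + 0.5 * dt * apply_bands(bx, W) + dt * mixed
        Y = half_solve(bx, rhs, dt)                     # (I - dt/2 A11)^{-1}
        out[a] = half_solve(by, Y.T, dt).T              # (I - dt/2 A22)^{-1}
    return out
```

An AOS variant would reuse the same right-hand side and replace the two successive solves by the average of two independent solves with `dt` in place of `dt/2`.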
The above splitting schemes are efficient because at each time step a single tridiagonal matrix inversion is performed for each spatial dimension. The system of differential equations we deal with is nonlinear. The question of theoretical stability of the LOD/AOS-based nonlinear finite difference scheme is a non-trivial challenge, with theory still lagging behind common practice. Our numerical experiments indicate that the splitting is stable for a wide variety of parameters, suitable for most applications, as will be shown in Section 5.

4.2 LOD/AOS Scheme for the Beltrami-Based Denoising
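The splitting scheme in the presence of a fidelity term requires a slight modification that we detail below. In this case we solve for each channel the equation

$$U^a_t = -\frac{2\lambda}{\sqrt{g}} (U^a - F^a) + \Delta_g U^a, \tag{7}$$

with Neumann boundary conditions and the initial condition $U^a(x, 0) = F^a(x)$. The Crank-Nicolson scheme approximating Eq. (7) is

$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} + 2\Delta t \frac{\lambda}{\sqrt{g^n}} I \right)^{-1} \left\{ \left[ \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n + 2\Delta t\, F^a \frac{\lambda}{\sqrt{g^n}} \right\}. \tag{8}$$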
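It is possible to use LOD/AOS approximations for the inverse of the matrix in the above scheme. However, we would like to treat the fidelity term in a special way. When $\lambda / \sqrt{g^n}$ is large, we find that the scheme proposed below possesses better stability properties. We now describe the details for treating the fidelity term in our Crank-Nicolson scheme. Dividing the numerator and the denominator by the matrix $S^n = \left( 1 + 2\Delta t \frac{\lambda}{\sqrt{g^n}} \right) I$, and rearranging terms, we get

$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} (S^n)^{-1} \sum_{l=1}^{2} \bar{A}^n_{ll} \right)^{-1} \left\{ (S^n)^{-1} \left[ \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n + 2 (S^n)^{-1} \Delta t\, F^a \frac{\lambda}{\sqrt{g^n}} \right\}.$$

Approximating the semi-implicit scheme based on the LOD splitting, we have

$$(U^a)^{n+1} = \left( I - \frac{1}{2} \Delta t (S^n)^{-1} \bar{A}^n_{22} \right)^{-1} \left( I - \frac{1}{2} \Delta t (S^n)^{-1} \bar{A}^n_{11} \right)^{-1} \left\{ (S^n)^{-1} \left[ \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n + 2 (S^n)^{-1} \Delta t\, F^a \frac{\lambda}{\sqrt{g^n}} \right\}.$$

A similar splitting scheme can be developed using AOS.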
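The rescaling by $S^n$ rests on the identity $\left(S - \tfrac{\Delta t}{2} \bar{A}\right)^{-1} = \left(I - \tfrac{\Delta t}{2} S^{-1} \bar{A}\right)^{-1} S^{-1}$ for an invertible diagonal $S$. The short check below verifies it numerically on a small one-dimensional stand-in problem; the matrices and the function name are hypothetical and serve only to illustrate the algebra.

```python
import numpy as np

def fidelity_rescaling_gap(n=8, dt=0.5, lam=2.0, seed=0):
    """Largest entrywise difference between the unsplit and rescaled inverses."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-ins: 1-D zero-flux Laplacian and a random sqrt(g) >= 1.
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = -1.0
    sqrt_g = 1.0 + rng.random(n)
    A_bar = L / sqrt_g[:, None]                  # (1/sqrt g) * L, row scaling
    I = np.eye(n)
    s = 1.0 + 2.0 * dt * lam / sqrt_g            # diagonal of S^n
    S_inv = np.diag(1.0 / s)
    # Inverse appearing in Eq. (8): (I - dt/2 A + 2 dt lam / sqrt(g) I)^{-1}.
    unsplit = np.linalg.inv(I - 0.5 * dt * A_bar
                            + 2.0 * dt * lam * np.diag(1.0 / sqrt_g))
    # Rescaled form: (I - dt/2 S^{-1} A)^{-1} S^{-1}.
    rescaled = np.linalg.inv(I - 0.5 * dt * S_inv @ A_bar) @ S_inv
    return np.max(np.abs(unsplit - rescaled))
```

The returned gap is at the level of rounding error, which confirms that the rescaled form is an exact rewriting of Eq. (8) before any splitting is applied.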
5 Experimental Results
We proceed to demonstrate experimentally the stability, accuracy, and efficiency of the LOD and AOS splitting schemes for the Beltrami color flow. In Figure 1 we show the results of the Beltrami flow, implemented by employing the LOD splitting scheme for approximating Eq. (4). Next we illustrate the use of the splitting schemes in the case where the functional involves a fidelity term. A noisy image as well as the reference denoising result, based on the explicit scheme, are shown in Figure 3, next to the results of the AOS and LOD splitting schemes. Note that the visual results obtained by the two schemes are similar to the reference image.

5.1 RRE Extrapolation Technique for Acceleration of the LOD Splitting Scheme
Fig. 1. Top row, left: The original image, which contains JPEG artifacts. Middle: Result of the LOD splitting scheme with Δt = 1 after 1 iteration, $\beta = \sqrt{10^3}$, λ = 0. Right: Result of the LOD splitting scheme after 2 iterations. Bottom row, left: Result of the LOD splitting scheme after 4 iterations. Middle: A close-up of the original image. Right: A close-up of the resulting image after 4 iterations.

Fig. 2. The different image channels of an image patch taken from the images in Figure 1. Left to right: An image patch before denoising, its different color channels, the denoised image, and the denoised color channels. The color arrows indicate the direction of the gradient in the various color channels.

In [28, 1] vector extrapolation was applied in order to speed up the slow convergence of the explicit schemes for the Beltrami color flow. In the experiments below we demonstrate how the RRE extrapolation technique can also be used to accelerate the convergence of implicit schemes. Figure 4 shows that the RRE method accelerates the LOD scheme. A comparison is also given to the convergence rate achieved by the method of [28, 1]. Extrapolation techniques also allow us to obtain a more accurate rate, if one takes a smaller time step.
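For readers unfamiliar with RRE, the following sketch shows one common formulation: from iterates $x_0, \ldots, x_{k+1}$ one forms first differences $u_j$ and second differences $w_j$ and returns $s = x_0 - U W^{+} u_0$, where $W^{+}$ denotes the least-squares pseudo-inverse. The function name is illustrative, and the test sequence is a simple linear fixed-point iteration rather than the Beltrami flow.

```python
import numpy as np

def rre(X):
    """Reduced rank extrapolation from iterates X[0], ..., X[k+1] (one per row)."""
    X = np.asarray(X, dtype=float)
    U = np.diff(X, axis=0)            # first differences u_j, shape (k+1, n)
    W = np.diff(U, axis=0)            # second differences w_j, shape (k, n)
    # Coefficients xi minimizing || u_0 + sum_j xi_j w_j ||_2.
    xi, *_ = np.linalg.lstsq(W.T, -U[0], rcond=None)
    return X[0] + U[:-1].T @ xi

# Usage on a linear iteration x_{j+1} = M x_j + b (an assumed toy example).
M = np.diag([0.9, 0.5, -0.3, 0.1])
b = np.ones(4)
x = np.zeros(4)
iterates = [x]
for _ in range(5):
    x = M @ x + b
    iterates.append(x)
limit_estimate = rre(iterates)
```

For a linear iteration in dimension 4, six iterates suffice for RRE to recover the fixed point up to rounding error, whereas the plain iteration is still far from it. In practice the extrapolation is typically applied in cycles, restarting the underlying scheme from the extrapolated vector.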
Fig. 3. Large image at the right: An image with artifacts resulting from lossy compression. Smaller images: a close-up on a section of the image. Top row, left: The image with JPEG artifacts. Right: Beltrami-based denoising by the explicit scheme, run with 4000 explicit iterations, Δt = 0.0005. Bottom row, left: Denoising by LOD, Δt = 0.02. Right: Denoising by AOS, Δt = 0.02. λ = 1, $\beta = \sqrt{2000}$.
[Plot: residual norm (logarithmic axis, about $10^{-15}$ to $10^{5}$) versus CPU time (sec, 0 to 50) for the Explicit, Explicit+RRE, LOD, and LOD+RRE schemes.]

Fig. 4. Graph of the residuals (LOD, explicit+RRE and LOD+RRE) versus CPU times. Parameters: Δt = 0.05 for the explicit scheme, Δt = 2.5 for LOD, λ = 0.5, $\beta = \sqrt{500} \approx 22.36$.
6 Conclusions
Due to its anisotropic and non-separable nature, neither an implicit scheme nor an operator-splitting-based scheme has so far been introduced for the partial differential equations that describe the Beltrami color flow. In this paper we propose a semi-implicit splitting scheme based on LOD/AOS for the anisotropic Beltrami operator. The spatial mixed derivatives are discretized explicitly at time step nΔt, while the non-mixed derivatives are approximated using the average of the two time levels nΔt and (n + 1)Δt. The stability of the splitting is empirically tested in applications such as Beltrami-based scale-space and Beltrami-based denoising, which display a stable behavior. In order to further accelerate the convergence of the splitting schemes, the RRE vector extrapolation technique is employed.
Acknowledgements

We thank Prof. Avram Sidi for interesting discussions. This research was supported by the United States–Israel Binational Science Foundation grant No. 2004274, by the Israeli Science Foundation grant No. 623/08, by the Ministry of Science grant No. 3-3414, and by the Elias Fund for Medical Research. Xue-Cheng Tai is supported by the MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010.
References

1. Rosman, G., Dascal, L., Kimmel, R., Sidi, A.: Efficient Beltrami image filtering via vector extrapolation methods. SIAM J. Imag. Sci. (2008) (submitted)
2. Mešina, M.: Convergence acceleration for the iterative solution of the equations X = AX + f. Comp. Meth. Appl. Mech. Eng. 10, 165–173 (1977)
3. Eddy, R.: Extrapolating to the limit of a vector sequence. In: Wang, P. (ed.) Information Linkage Between Applied Mathematics and Industry, New York, pp. 387–396. Academic Press, London (1979)
4. Spira, A., Kimmel, R., Sochen, N.A.: A short-time Beltrami kernel for smoothing images and manifolds. IEEE Trans. Image Process. 16(6), 1628–1636 (2007)
5. Smith, S.M., Brady, J.: SUSAN - a new approach to low level image processing. Intl. J. of Comp. Vision 23, 45–78 (1997)
6. Aurich, V., Weule, J.: Non-linear Gaussian filters performing edge preserving diffusion. In: Mustererkennung 1995, 17. DAGM-Symposium, London, UK, pp. 538–545. Springer, Heidelberg (1995)
7. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proceedings of IEEE International Conference on Computer Vision, pp. 836–846 (1998)
8. Sochen, N., Kimmel, R., Bruckstein, A.M.: Diffusions and confusions in signal and image processing. J. of Math. Imag. and Vision 14(3), 195–209 (2001)
9. Elad, M.: On the origin of the bilateral filter and ways to improve it. IEEE Trans. Image Process. 11(10), 1141–1151 (2002)
10. Barash, D.: A fundamental relationship between bilateral filtering, adaptive smoothing and the nonlinear diffusion equation. IEEE Trans. Pattern Anal. Mach. Intell. 24(6), 844–847 (2002)
11. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM Interdisciplinary Journal 4, 490–530 (2005)
12. Lu, T., Neittaanmäki, P., Tai, X.C.: A parallel splitting up method and its application to Navier-Stokes equations. Applied Mathematics Letters 4(2), 25–29 (1991)
13. Lu, T., Neittaanmäki, P., Tai, X.C.: A parallel splitting up method for partial differential equations and its application to Navier-Stokes equations. RAIRO Mathematical Modelling and Numerical Analysis 26(6), 673–708 (1992)
14. Weickert, J., Romeny, B.M.T.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7(3), 398–410 (1998)
15. Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. Journal Soc. Ind. Appl. Math. 3, 28–41 (1955)
16. Yanenko, N.N.: The method of fractional steps. The solution of problems of mathematical physics in several variables. Springer-Verlag, New York (1971)
17. Barash, D., Schlick, T., Israeli, M., Kimmel, R.: Multiplicative operator splittings in nonlinear diffusion: from spatial splitting to multiple timesteps. J. of Math. Imag. and Vision 19(16), 33–48 (2003)
18. Kimmel, R., Malladi, R., Sochen, N.: Images as embedding maps and minimal surfaces: Movies, color, texture, and volumetric medical images. Intl. J. of Comp. Vision 39(2), 111–129 (2000)
19. Sochen, N., Kimmel, R., Malladi, R.: From high energy physics to low level vision. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 236–247. Springer, Heidelberg (1997)
20. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image Process. 7, 310–318 (1998)
21. Yezzi, A.J.: Modified curvature motion for image smoothing and enhancement. IEEE Trans. Image Process. 7(3), 345–352 (1998)
22. Polyakov, A.M.: Quantum geometry of bosonic strings. Physics Letters 103 B, 207–210 (1981)
23. Rudin, L., Osher, S., Fatemi, E.: Non-linear total variation based noise removal algorithms. Physica D Letters 60, 259–268 (1992)
24. Yanenko, N.N.: About implicit difference methods of the calculation of the multidimensional equation of thermal conductivity. In: Proceedings of VUZ. Series of Mathematics, vol. 23(4), pp. 148–157 (1961)
25. Andreev, V.B.: Alternating direction methods for parabolic equations in two space dimensions with mixed derivatives. Zhurnal Vychislitelnoi Matematiki i Matematicheskoi Fiziki 7(2), 312–321 (1967)
26. Mckee, S., Mitchell, A.R.: Alternating direction methods for parabolic equations in three space dimensions with mixed derivatives. The Computer Journal 14(3), 25–30 (1971)
27. Weickert, J.: Coherence-enhancing diffusion filtering. Intl. J. of Comp. Vision 31(2/3), 111–127 (1999)
28. Dascal, L., Rosman, G., Kimmel, R.: Efficient Beltrami filtering of color images via vector extrapolation. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 92–103. Springer, Heidelberg (2007)
Multi-scale Total Variation with Automated Regularization Parameter Selection for Color Image Restoration

Yiqiu Dong¹ and Michael Hintermüller²

¹ START-Project “Interfaces and Free Boundaries” and SFB “Mathematical Optimization and Applications in Biomedical Science”, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010 Graz, Austria
[email protected]
² Department of Mathematics, Humboldt-University of Berlin, Unter den Linden 6, 10099 Berlin, Germany, and START-Project “Interfaces and Free Boundaries” and SFB “Mathematical Optimization and Applications in Biomedical Science”, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010 Graz, Austria
[email protected]

Abstract. In this paper, a multi-scale vectorial total variation model for color image restoration is introduced. The model utilizes a spatially dependent regularization parameter in order to preserve the details during noise removal. The automated adjustment strategy of the regularization parameter is based on local variance estimators combined with a confidence interval technique. Numerical results on images are presented to demonstrate the efficiency of the method.
1 Introduction
We consider the problem of recovering color images degraded by cross-channel blurring and Gaussian noise. Without loss of generality, we assume an image $\hat{u}$ is a vectorial function defined on a bounded and piecewise smooth open subset $\Omega \subset \mathbb{R}^2$, that is, $\hat{u} : \Omega \to \mathbb{R}^M$, where $M$ is the number of channels in the color model. The degraded form $z$ of $\hat{u}$ is given by

$$z = K\hat{u} + n,$$

where $K \in \mathcal{L}(L^2(\Omega; \mathbb{R}^M))$ is a cross-channel blurring operator, and $n$ represents white Gaussian noise with zero mean and standard deviation $\sigma$. The problem of restoring $\hat{u}$ from $z$ with unknown $n$ is known to be typically ill-posed [1].

In order to preserve significant edges when restoring images, Rudin, Osher and Fatemi proposed total variation regularization [2] for gray-level images. In this approach (which we call the TV-model in what follows), the image $\hat{u}$ is recovered by solving the optimization problem

$$\min_{u \in BV(\Omega)} \int_\Omega |Du| + \frac{\lambda}{2} \int_\Omega |Ku - z|^2 \, dx, \tag{1}$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 271–281, 2009. © Springer-Verlag Berlin Heidelberg 2009
where BV (Ω) denotes the space of functions of bounded variation and λ > 0. Because of the edge preservation ability, the TV-model is widely accepted as a reliable tool in image restoration. Over the years, various research efforts have been devoted to studying, solving and extending the TV-model; see, e.g., [3, 4, 5, 6, 7, 8, 9] as well as the monograph [1] and the many references therein. In general, images are comprised of multiple objects at different scales. This suggests that different values of λ localized at image features of different scales are desirable to obtain better restoration results. For this reason, a multi-scale total variation (MTV) model with a spatially varying choice of parameters was proposed [10]. In order to enhance image regions containing details while still sufficiently smoothing homogeneous features, a spatially dependent regularization parameter selection was proposed in [11]. In this paper, we will extend the multi-scale total variation with spatially dependent regularization parameter to restore degraded color images. The automated adjustment strategy of the regularization parameter is based on local variance estimators combined with a confidence interval technique. For speeding up the performance of the scheme we generalize the multi-scale representation according to [12, 13], and the corresponding subproblems are solved by a superlinearly convergent algorithm based on Fenchel-duality and inexact semismooth Newton techniques. The latter extends earlier work in [9]. The outline of the rest of the paper is as follows. In Section 2 we introduce the multi-scale vectorial total variation model and the primal-dual algorithm for solving the associated minimization problem. In Section 3 we extend the LVE-based parameter selection to color images. Section 4 proposes a method for color image restoration combining the multi-scale representation and spatially adaptive parameter selection. 
Section 5 gives numerical results to demonstrate the performance of the new method. Finally conclusions are drawn in Section 6.
2 Multi-scale Vectorial Total Variation
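Based on the TV-model (1), in [14] the vectorial total variation (VTV) regularization was proposed for restoring color images:

$$\min_{u \in BV(\Omega;\mathbb{R}^M)} \int_\Omega |Du| + \frac{\lambda}{2}\int_\Omega |Ku - z|^2\,dx, \qquad (2)$$

where the space $BV(\Omega;\mathbb{R}^M)$ of vector-valued functions is the set of functions $u \in L^1(\Omega;\mathbb{R}^M)$ such that $\int_\Omega |Du| < \infty$, where the vectorial TV norm $\int_\Omega |Du|$ is defined as

$$\int_\Omega |Du| = \sup\left\{ \int_\Omega u \cdot \operatorname{div} v\,dx \;:\; v \in C_c^1(\Omega;\mathbb{R}^{M\times 2}),\ |v| \le 1 \right\},$$

and $|v| = \sqrt{\sum_{i=1}^{M} \langle v_i, v_i\rangle}$. The space $BV(\Omega;\mathbb{R}^M)$ endowed with the norm

$$\|u\|_{BV(\Omega;\mathbb{R}^M)} = \|u\|_{L^1(\Omega;\mathbb{R}^M)} + \int_\Omega |Du|$$

is a Banach space.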
Multi-scale Total Variation
In the VTV-model (2), the parameter $\lambda$ controls the trade-off between a good fit of $z$ and a smoothness requirement due to the vectorial total variation regularization. Since images are usually comprised of multiple objects at different scales, a locally varying $\lambda$ is desirable. Therefore, here we consider multi-scale vectorial total variation (MVTV):

$$\min_{u \in BV(\Omega;\mathbb{R}^M)} \int_\Omega |Du| + \frac{1}{2}\int_\Omega \lambda(x)\,|Ku - z|^2\,dx. \qquad (3)$$

As in Section 2 of [11], we can obtain the same conclusion on the existence and uniqueness of the minimizer for the MVTV-model. Here, we do not repeat proof details, but rather refer to [11].

2.1 Primal-Dual Approach to Multi-scale Vectorial Total Variation
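In [9] an infeasible primal-dual algorithm of generalized Newton-type was proposed for solving (1). In the sequel we extend its key features to the case (3). Rather than operating on the MVTV-model (3), the method is based on

$$\min_{u \in H_0^1(\Omega;\mathbb{R}^M)} \frac{\mu}{2}\int_\Omega |\nabla u|^2\,dx + \frac{1}{2}\int_\Omega \lambda\,|Ku - z|^2\,dx + \int_\Omega |\nabla u|\,dx, \qquad (4)$$

where $0 < \underline{\lambda} \le \lambda(x) \le \bar{\lambda}$ for almost all $x \in \Omega$ and $0 < \mu \ll \bar{\lambda}^{-1}$. The $\mu$-term serves the purpose of a function space regularization for a "convenient" dualization in a Hilbert space setting. In our numerics, we typically choose $\mu = 0$. Applying the Fenchel-Legendre calculus [15] analogously as in [9], the Fenchel-dual of (4) reads

$$\sup_{\substack{p \in \mathbf{L}^2(\Omega;\mathbb{R}^M) \\ |p(x)| \le 1 \text{ a.e. in } \Omega}} -\frac{1}{2}\,|||K^* z - \operatorname{div} p|||^2_{H^{-1}} + \frac{1}{2}\|z\|^2_{L^2}, \qquad (P_0)$$

where $|||v|||^2_{H^{-1}} = \langle H_{\mu,K} v, v\rangle_{H_0^1,H^{-1}}$, $v \in H^{-1}(\Omega;\mathbb{R}^M)$, with $H_{\mu,K} = (K^*\lambda K - \mu\Delta)^{-1}$, $\Delta : H_0^1(\Omega;\mathbb{R}^M) \to H^{-1}(\Omega;\mathbb{R}^M)$, and $\langle\cdot,\cdot\rangle_{H_0^1,H^{-1}}$ denotes the duality pairing between $H_0^1(\Omega;\mathbb{R}^M)$ and its dual $H^{-1}(\Omega;\mathbb{R}^M)$. Moreover, $\mathbf{L}^2(\Omega;\mathbb{R}^M) = (L^2(\Omega;\mathbb{R}^M))^2$. In order to avoid the non-uniqueness of the solution of $(P_0)$, following [9] we consider a dual regularization:

$$\sup_{\substack{p \in \mathbf{L}^2(\Omega;\mathbb{R}^M) \\ |p(x)| \le 1 \text{ a.e. in } \Omega}} -\frac{1}{2}\,|||K^* z - \operatorname{div} p|||^2_{H^{-1}} + \frac{1}{2}\|z\|^2_{L^2} - \frac{\beta}{2}\|p\|^2_{L^2}, \qquad (P)$$

where $\beta > 0$ is the regularization parameter. In order to study the effect of the $\beta$-regularization of the Fenchel-dual, we apply the Fenchel-Legendre calculus once more and find that the dual of $(P)$ is given by

$$\min_{u \in H_0^1(\Omega;\mathbb{R}^M)} \frac{\mu}{2}\int_\Omega |\nabla u|^2\,dx + \frac{1}{2}\int_\Omega \lambda\,|Ku - z|^2\,dx + \int_\Omega \Phi_\beta(\nabla u)\,dx, \qquad (P^*)$$

where for $w \in \mathbf{L}^2(\Omega;\mathbb{R}^M)$,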
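$$\Phi_\beta(w)(x) = \begin{cases} |w(x)| - \dfrac{\beta}{2}, & \text{if } |w(x)| \ge \beta, \\[4pt] \dfrac{1}{2\beta}\,|w(x)|^2, & \text{if } |w(x)| < \beta. \end{cases} \qquad (5)$$

The first-order optimality conditions of $(P^*)$ characterize the solutions $\bar{u}$ and $\bar{p}$ of $(P^*)$ and $(P)$, respectively, by

$$-\mu\Delta\bar{u} + K^*\lambda K\bar{u} - \operatorname{div}\bar{p} = K^*\lambda z \quad \text{in } H^{-1}(\Omega;\mathbb{R}^M), \qquad (6a)$$

$$\max(\beta, |\nabla\bar{u}|)\,\bar{p} - \nabla\bar{u} = 0 \quad \text{in } \mathbf{L}^2(\Omega;\mathbb{R}^M). \qquad (6b)$$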
Note that the system (6) is non-smooth, i.e. not necessarily Fréchet-differentiable. The discrete version of this system can be solved by a semismooth Newton method [9, 11]. The generalized Newton solver converges globally, that is regardless of the initialization, and locally at a superlinear rate [9].
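To make the role of the $\beta$-regularization concrete, the following pure-Python sketch evaluates the function $\Phi_\beta$ from (5) at a single pixel and solves the pointwise relation (6b) for the dual variable. The function names `huber_phi` and `dual_from_gradient` are illustrative choices, not taken from the paper, and the gradient is assumed to be given as a flat tuple of its entries.

```python
import math

def _norm(g):
    """Euclidean norm of the gradient entries at one pixel."""
    return math.sqrt(sum(gi * gi for gi in g))

def huber_phi(g, beta):
    """Phi_beta from (5): linear for |g| >= beta, quadratic below."""
    n = _norm(g)
    if n >= beta:
        return n - beta / 2.0
    return n * n / (2.0 * beta)

def dual_from_gradient(g, beta):
    """Solve max(beta, |g|) * p - g = 0 from (6b) for p at one pixel."""
    scale = max(beta, _norm(g))
    return tuple(gi / scale for gi in g)
```

For a large gradient such as `(3.0, 4.0)` with `beta = 1.0`, the dual variable has unit length, while for a small gradient it is simply the gradient divided by `beta`, so the constraint $|p(x)| \le 1$ holds in both regimes.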
3 Spatially Dependent Regularization Parameter Selection
Since the capability of multi-scale vectorial total variation is mainly limited by the selection of the parameter $\lambda$, in this section we extend the way to choose $\lambda$ proposed in [11] to the MVTV-model. Suppose the variance of the Gaussian noise is $\sigma^2$, which can be estimated easily in practice. With a correct choice of $\lambda$ in the TV-model (1), the restored image $u$ can satisfy the constraint

$$\int_\Omega |Ku - z|^2\,dx = \sigma^2 |\Omega| \qquad (7)$$

globally. However, the MVTV-model (3) represents a localized version of the constraint by allowing $\lambda = \lambda(x)$. In order to enhance image details while preserving homogeneous regions, the choice of $\lambda$ must be based on local image features. Hence, we search for a reconstruction where the variance of the residual is closer to the noise variance in both the detail regions and the homogeneous parts. In order to achieve this goal we introduce local variance estimators (LVEs) for an automated adaptive choice of $\lambda$.

3.1 Local Variance Estimator
Consider the discrete version of the residual image $r^h = z^h - K^h u^h$, where $u^h$ is the restored image from the minimization problem (2) with $\lambda > 0$. If we use a relatively small parameter $\lambda$, the residual $r^h$ will include the noise as well as the details. Then, the average of the squared residual in a small window will reflect the distribution of details in the image. Let $\Omega^\omega_{i,j}$ denote the set of pixel-coordinates in a $\omega$-by-$\omega$ window centered at $(i,j)$ (with obvious modification near the boundary), i.e.,

$$\Omega^\omega_{i,j} = \left\{ (s+i,\, t+j) \;:\; -\left\lfloor \frac{\omega}{2} \right\rfloor \le s, t \le \left\lfloor \frac{\omega}{2} \right\rfloor \right\},$$
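where $\lfloor\cdot\rfloor$ means rounding to the nearest integer towards zero. Then we apply the mean filter with window size $\omega$ to the residual image $r^h$ as follows:

$$\mathrm{LVE}^\omega_{i,j} = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(r^h_{s,t}\right)_k^2 = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(z^h_{s,t} - (K^h u^h)_{s,t}\right)_k^2.$$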
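The mean filter above can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation: the name `local_variance_estimator` is hypothetical, the residual is assumed to be stored as `residual[k][i][j]` (channel, row, column), and the "obvious modification near the boundary" is assumed here to mean clipping the window to the image and dividing by the number of pixels actually covered.

```python
def local_variance_estimator(residual, omega):
    """Mean of squared residuals over all channels in an omega-by-omega window.

    residual[k][i][j] is channel k at pixel (i, j). Near the boundary the
    window is clipped and the mean is taken over the covered pixels only
    (an assumption; the paper only mentions an 'obvious modification').
    """
    channels = len(residual)
    rows, cols = len(residual[0]), len(residual[0][0])
    half = omega // 2  # rounding towards zero for positive omega
    lve = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s_range = range(max(0, i - half), min(rows, i + half + 1))
            t_range = range(max(0, j - half), min(cols, j + half + 1))
            total = 0.0
            for k in range(channels):
                for s in s_range:
                    for t in t_range:
                        total += residual[k][s][t] ** 2
            lve[i][j] = total / (channels * len(s_range) * len(t_range))
    return lve
```

For an interior pixel the divisor equals $M\omega^2$, so the sketch agrees with the formula there; only the boundary handling is a design choice of this example.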
Here LVE stands for a “Local Variance Estimator”. In general, $\mathrm{LVE}^\omega$ has a large value in the detail regions, and it has a small value in the homogeneous regions. But the noise in the residual may also lead to some large LVE values in the homogeneous regions. In order to reduce the effect due to noise, we utilize the confidence interval technique well-known in statistics [16, 17] in connection with LVE.

3.2 Upper Bound for the Local Variance
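In the discrete setting, all elements of $n$ can be regarded as an array of independent normally distributed random variables with mean 0 and variance $\sigma^2$. Then, the random variable

$$T^\omega_{i,j} = \frac{1}{\sigma^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(n^h_{s,t}\right)_k^2$$

has the $\chi^2$-distribution with $M\omega^2$ degrees of freedom, that is, $T^\omega_{i,j} \sim \chi^2_{M\omega^2}$. Set

$$S^\omega_{i,j} := \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(z^h_{s,t} - (K^h u^h)_{s,t}\right)_k^2.$$

If $u^h = \hat{u}^h$, where $\hat{u}^h$ satisfies $n^h = z^h - K^h\hat{u}^h$, then

$$S^\omega_{i,j} = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(z^h_{s,t} - (K^h \hat{u}^h)_{s,t}\right)_k^2 = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(n^h_{s,t}\right)_k^2 = \frac{\sigma^2}{M\omega^2}\, T^\omega_{i,j}.$$

On the contrary, if the residual image $z^h - K^h u^h$ contains details, we expect

$$S^\omega_{i,j} = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(z^h_{s,t} - (K^h u^h)_{s,t}\right)_k^2 > \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t)\in\Omega^\omega_{i,j}} \left(n^h_{s,t}\right)_k^2 = \frac{\sigma^2}{M\omega^2}\, T^\omega_{i,j}.$$

Therefore, we search for a bound $B$ such that $S^\omega_{i,j} > B$ for some pixel $(i,j)$ implies that in the residual some details are left. Given $m \times m$, the total number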
of pixels in the color image with $M$ channels, we propose to consider the expected maximum of the $m^2$ random variables $\frac{\sigma^2}{M\omega^2} T^\omega_s$, $s = 1, \ldots, m^2$, as the bound $B$:

$$B^{\omega,m} := \frac{\sigma^2}{M\omega^2}\, E\Big(\max_{k=1,\ldots,m^2} T^\omega_k\Big), \qquad (8)$$

where $E$ represents the expected value of a random variable. Similarly to what is proposed in [11], we get

$$B^{\omega,m} = \frac{\sigma^2}{M\omega^2}\big(E_m(T^\omega) + d_m(T^\omega)\big),$$

where $E_m(T^\omega) = T_d + \frac{\kappa}{\beta_m}$, $d_m(T^\omega) = \frac{\pi}{\sqrt{6}\,\beta_m}$, $\beta_m = m^2 f(T_d)$, $\kappa = 0.577215$, and $f(T_d)$ is the distribution of $T_d$, which is the so-called dominant value.

3.3 Selection of the Parameter λ
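The bound $B^{\omega,m}$ of Section 3.2, on which the selection rule in this section relies, can be evaluated numerically. The sketch below is a hedged illustration using only the standard library. It assumes, following the usual extreme-value (Gumbel) convention that the text does not spell out, that the dominant value $T_d$ satisfies $P(T^\omega > T_d) = 1/m^2$, and that $f$ is the $\chi^2_{M\omega^2}$ density. The Simpson-rule tail integral and the bisection are numerical stand-ins chosen for this example, and all function names are hypothetical.

```python
import math

EULER_KAPPA = 0.577215  # constant kappa as quoted in the text

def chi2_pdf(x, dof):
    """Density of the chi-square distribution with dof degrees of freedom (x > 0)."""
    half = dof / 2.0
    log_pdf = (half - 1.0) * math.log(x) - x / 2.0 - half * math.log(2.0) - math.lgamma(half)
    return math.exp(log_pdf)

def chi2_sf(x, dof, steps=2000):
    """Upper tail P(T > x), by Simpson's rule on a truncated interval."""
    upper = max(x, dof) + 40.0 * math.sqrt(2.0 * dof)
    h = (upper - x) / steps
    total = chi2_pdf(x, dof) + chi2_pdf(upper, dof)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * chi2_pdf(x + i * h, dof)
    return total * h / 3.0

def dominant_value(dof, n):
    """Assumed definition: the point T_d with P(T > T_d) = 1/n, found by bisection."""
    lo, hi = float(dof), dof + 40.0 * math.sqrt(2.0 * dof)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > 1.0 / n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lve_upper_bound(sigma, omega, m, channels):
    """B^{omega,m} = sigma^2 / (M omega^2) * (E_m + d_m) as in Section 3.2."""
    dof = channels * omega * omega
    n = m * m
    t_d = dominant_value(dof, n)
    beta_m = n * chi2_pdf(t_d, dof)
    expected_max = t_d + EULER_KAPPA / beta_m
    spread = math.pi / (math.sqrt(6.0) * beta_m)
    return sigma ** 2 / dof * (expected_max + spread)
```

Under these assumptions, a call such as `lve_upper_bound(0.1, 11, 512, 3)` returns a value somewhat above $\sigma^2$, which is consistent with $B^{\omega,m}$ acting as an upper confidence limit for the local variance of pure noise.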
Now, we use the confidence interval for $S^\omega$ to reduce the effect from noise on the local variance estimators in order to distinguish the detail regions in the images correctly. Recall that $\mathrm{LVE}^\omega$ denotes the mean of the squared residual in a given window. Ideally, there is only noise in the residual. Then $\mathrm{LVE}^\omega$ should behave like $S^\omega$. Hence, whenever

$$\mathrm{LVE}^\omega_{i,j} \in [0, B^{\omega,m}), \qquad (9)$$

we assume that the window contains noise only. On the other hand, if (9) is not satisfied, we suppose that this is due to image details contained in the residual image in $\Omega^\omega_{i,j}$. This property is useful when updating the parameter $\lambda$ locally. For adapting $\lambda$ algorithmically we proceed as follows. Initially we assign a small positive value to $\lambda$. Then we restore the image iteratively by increasing $\lambda$ according to the following rule:

$$\tilde{\lambda}^{k+1}_{i,j} = \zeta \cdot \min\left( \tilde{\lambda}^{k}_{i,j} + \rho\left( \sqrt{(\mathrm{LVE}^\omega_k)_{i,j}} - \sigma \right)^{+},\; L \right), \qquad (10a)$$

$$\lambda^{k+1}_{i,j} = \frac{1}{\omega^2} \sum_{(s,t)\in\Omega^\omega_{i,j}} \tilde{\lambda}^{k+1}_{s,t}, \qquad (10b)$$

where $\zeta \ge 1$, $\rho > 0$, $(x)^+ = \max(x, 0)$, $\mathrm{LVE}^\omega_k$ is obtained from $u_k$, $L$ is a large positive value to ensure $\tilde{\lambda}_k \in L^\infty(\Omega)$, and for each channel of the vectorial data we use the same $\lambda_k$ during restoration. In our numerics we choose $\zeta = 2$, which comes from the method proposed in [12] (TNV-algorithm). Finally, we set the parameter $\rho = \rho_k = \|\tilde{\lambda}_k\|_\infty / \sigma$ in order to keep the new $\tilde{\lambda}_{k+1}$ at the same scale as $\tilde{\lambda}_k$.
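A minimal sketch of one update step (10a)-(10b) on a 2D grid may help to see how the pieces interact. It is written under stated assumptions rather than as the authors' code: the square root of the LVE inside $(\cdot)^+$ is how the rule is read here (it makes the difference to $\sigma$ dimensionally consistent), the window mean in (10b) is clipped at the boundary, condition (9) is not applied, and the names `update_lambda` and `big_l` are hypothetical.

```python
import math

def update_lambda(lam_tilde, lve, sigma, omega, zeta=2.0, big_l=1e3):
    """One step of (10a)-(10b); returns (new lam_tilde, smoothed lambda)."""
    rows, cols = len(lam_tilde), len(lam_tilde[0])
    # rho_k = ||lam_tilde_k||_inf / sigma, as stated in the text
    rho = max(max(row) for row in lam_tilde) / sigma
    new_tilde = [
        [
            zeta * min(
                lam_tilde[i][j] + rho * max(math.sqrt(lve[i][j]) - sigma, 0.0),
                big_l,
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]
    # (10b): mean filter over the omega-by-omega window (clipped at the boundary)
    half = omega // 2
    lam = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                new_tilde[s][t]
                for s in range(max(0, i - half), min(rows, i + half + 1))
                for t in range(max(0, j - half), min(cols, j + half + 1))
            ]
            lam[i][j] = sum(window) / len(window)
    return new_tilde, lam
```

Where the local estimate matches the noise level, the step only multiplies $\tilde{\lambda}$ by $\zeta$; where the residual still carries detail, $\tilde{\lambda}$ grows additionally in proportion to the excess of $\sqrt{\mathrm{LVE}}$ over $\sigma$.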
4 Our Method
Recently, a multi-scale image decomposition method (TNV-algorithm) was proposed in [12], which uses the TV-model (1) to extract the details in the residual, and which varies the regularization parameter over a sequence of dyadic scales to capture different features in the image. Although this method performs better than a number of existing methods, it satisfies the constraint (7) only globally, and does not consider the local characteristic of the features in the image. Referring to this decomposition method, we intertwine its idea with the MVTV-model (3), and combine it with the spatially dependent regularization parameter selection. This results in the following algorithm:

Algorithm

1: Initialize $u^h_0 = 0 \in \mathbb{R}^{Mm^2}$, $p^h_0 = 0 \in \mathbb{R}^{Mm^2 \times 2}$, $\lambda_0 = [\lambda^0, \cdots, \lambda^0] \in \mathbb{R}^{Mm^2}$ with $\lambda^0 \in \mathbb{R}^{m^2}$ and $k = 0$.

2: If $k = 0$ solve the discrete version of the minimization problem

$$\tilde{u}_0 = \arg\min_{u \in H_0^1(\Omega;\mathbb{R}^M)} \frac{\mu}{2}\int_\Omega |\nabla u|^2\,dx + \frac{1}{2}\int_\Omega \lambda^0 |Ku - z|^2\,dx + \int_\Omega |\nabla u|\,dx,$$

else compute $v^h_k = z^h - K^h u^h_k$ and solve the discrete version of the minimization problem

$$\tilde{u}_k = \arg\min_{u \in H_0^1(\Omega;\mathbb{R}^M)} \frac{\mu}{2}\int_\Omega |\nabla u|^2\,dx + \frac{1}{2}\int_\Omega \lambda^k |Ku - v_k|^2\,dx + \int_\Omega |\nabla u|\,dx.$$

3: Update $u^h_{k+1} = u^h_k + \tilde{u}^h_k$.

4: Based on $u^h_{k+1}$, update

$$\tilde{\lambda}_{k+1} = 2 \cdot \min\left( \tilde{\lambda}_k + \rho\left( \sqrt{\mathrm{LVE}^\omega_k} - \sigma \right)^{+},\; L \right), \qquad (\lambda_{k+1})_{i,j} = \frac{1}{\omega^2} \sum_{(s,t)\in\Omega^\omega_{i,j}} (\tilde{\lambda}_{k+1})_{s,t}.$$

5: Stop; or set $k := k + 1$ and go to step 2.

A few remarks on the algorithm are in order. We initialize $\lambda$ by a relatively small positive constant. In our numerical practice an 11-by-11 window turned out to yield reliable results. In Section 5, we study the influence of the window size on the restoration results. Similar to the Bregman iteration proposed in [18], we stop the iterative procedure as soon as the residual $\|z^h - K^h u^h_k\|_2$ drops below $\xi\sigma$, where $\xi > 1$ relates to the image size. For $m \to \infty$ we have $\xi \to 1$.
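The outer loop of the algorithm can be sketched independently of the inner solver. The following pure-Python driver is only an illustration for the denoising case ($K^h$ the identity) on a flat list of values. The subproblem solver and the $\lambda$ update are passed in as callables, and the two functions `tikhonov_stand_in` and `doubling_stand_in` are deliberately simplistic placeholders, not the semismooth Newton solver or the LVE rule of the paper. The stopping test uses the root-mean-square residual, which is an assumption about how the norm is normalized.

```python
import math

def hierarchical_restore(z, solve_subproblem, update_lambda, lam0, sigma,
                         xi=1.05, max_iter=20):
    """Steps 2-5 of the algorithm for K = identity; returns (u, lam, solves)."""
    u = [0.0] * len(z)
    lam = list(lam0)
    solves = 0
    for _ in range(max_iter):
        v = [zi - ui for zi, ui in zip(z, u)]        # step 2: current residual
        u_tilde = solve_subproblem(v, lam)
        solves += 1
        u = [ui + ti for ui, ti in zip(u, u_tilde)]  # step 3: accumulate scales
        residual = [zi - ui for zi, ui in zip(z, u)]
        rms = math.sqrt(sum(r * r for r in residual) / len(residual))
        if rms <= xi * sigma:                        # step 5: stopping rule
            break
        lam = update_lambda(lam, residual)           # step 4: adapt lambda
    return u, lam, solves

def tikhonov_stand_in(v, lam):
    """Placeholder solver: minimizes 0.5*u^2 + 0.5*lam*(u - v)^2 pointwise."""
    return [l / (1.0 + l) * vi for vi, l in zip(v, lam)]

def doubling_stand_in(lam, residual):
    """Placeholder update: plain dyadic doubling, ignoring the residual."""
    return [2.0 * l for l in lam]
```

With these stand-ins the residual shrinks by a factor $1/(1+\lambda_k)$ per pass, so the loop illustrates how successive scales are added until the residual reaches the noise level.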
5 Numerical Results
In this section we provide numerical results to study the behavior of our method with respect to its image restoration capabilities. We use two RGB color images
Fig. 1. Original images: (a) “Barbara”, (b) “Lena”

Fig. 2. Results of denoising image “Barbara” (the 1st row) and “Lena” (the 2nd row): (a) Noisy images, (b) Restored images (k = 3), (c) Final values of λ
(i.e., M = 3), “Barbara” (576-by-720) and “Lena” (512-by-512), as shown in Figure 1. Furthermore, from the experiments conducted on a broad variety of images we found that our method is robust with respect to the initial choice of λ. Thus, in all experiments listed here we use the same initial choice λ = 2.5.

5.1 Color Image Denoising
Here, we concentrate on image denoising, i.e., $K^h$ is the identity matrix. The degraded images contain Gaussian white noise with the noise level σ = 0.1. For a study of our method in the case of texture-like structures we zoom the
Fig. 3. Restored images by our method with different ω: (a) ω = 5, (b) ω = 11, (c) ω = 17

Fig. 4. Results of restoring blurred noisy image “Barbara” (the 1st row) and “Lena” (the 2nd row): (a) Blurred noisy images, (b) Restored images (k = 5), (c) Final values of λ
two images in Figure 1 into certain regions. In all of our experiments the image intensity range is scaled to [0, 1]. The results are shown in Figure 2 together with the number of iterations k. We can see that our method suppresses the noise successfully while preserving the details. In addition, we also show the final values of λ obtained by our choice rule. We find that in detail regions λ is large in order to preserve the details, and it is small in the homogeneous regions to remove noise.
In order to test our method for different values of the window size ω, Figure 3 shows the restored images with ω = 5, 11, 17. Except for some slight effects, we observe a remarkable stability with respect to ω.

5.2 Color Image Deblurring and Denoising
In this section, we illustrate the restoration ability of our method for noisy blurred images. The blurring operator $K$ is a cross-channel blurring operator with the kernel:

$$\begin{bmatrix} K_{rr} & K_{rg} & K_{rb} \\ K_{gr} & K_{gg} & K_{gb} \\ K_{br} & K_{bg} & K_{bb} \end{bmatrix} = \begin{bmatrix} 0.8\cdot(M,7,135) & 0.1\cdot(G,9,7) & 0.1\cdot(A,7) \\ 0.1\cdot(A,9) & 0.8\cdot(M,7,90) & 0.1\cdot(G,5,1) \\ 0.1\cdot(G,7,5) & 0.1\cdot(M,7,45) & 0.8\cdot(A,11) \end{bmatrix},$$

where $(A, r)$ denotes the average blur with window size $r$, $(G, r, \sigma)$ denotes the Gaussian blur with window size $r$ and standard deviation $\sigma$, $(M, l, \theta)$ denotes the motion blur with length $l$ and angle $\theta$, and $(r, g, b)$ are the three channels in the RGB color model. In addition, Gaussian white noise with σ = 0.02 is added. Figure 4 depicts a part of the noisy blurred “Barbara” and “Lena” images with the restored results and final values of λ. We find that our method still can preserve most of the details; see, e.g., the features on the scarf. Furthermore, for noisy blurred images our method is still able to distinguish most of the detail regions properly.
6 Conclusion
A multi-scale vectorial total variation model with spatially adapted regularization parameter λ for color image restoration is proposed in this paper. The local variance estimator LVE of the residual image is extended to the multi-channel case, and turns out to be an accurate instrument for updating λ within an iterative procedure. Assuming that the noise variance σ 2 is known, the present algorithm is completely automatized, i.e., there is no necessity of tuning parameters. The numerical results show that the new method can restore the degraded images efficiently while preserving most details.
References

1. Vogel, C.: Computational Methods for Inverse Problems. Frontiers Appl. Math., vol. 23. SIAM, Philadelphia (2002)
2. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
3. Dobson, D., Vogel, C.: Convergence of an iterative method for total variation denoising. SIAM J. Numer. Anal. 34, 1779–1791 (1997)
4. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numerische Mathematik 76, 167–188 (1997)
5. Chang, Q., Chern, I.L.: Acceleration methods for total variation-based image denoising. SIAM J. Applied Mathematics 25, 982–994 (2003)
6. Strong, D., Chan, T.: Edge-preserving and scale-dependent properties of total variation regularization. Inverse Problems 19, 165–187 (2003)
7. Chambolle, A.: An algorithm for total variation minimization and application. Journal of Mathematical Imaging and Vision 20, 89–97 (2004)
8. Hintermüller, M., Kunisch, K.: Total bounded variation regularization as bilaterally constrained optimization problem. SIAM J. Appl. Math. 64, 1311–1333 (2004)
9. Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. SIAM Journal on Scientific Computing 28(1), 1–23 (2006)
10. Almansa, A., Ballester, C., Caselles, V., Haro, G.: A TV based restoration model with local constraints. J. Sci. Comput. 34(3), 209–236 (2008)
11. Dong, Y., Hintermüller, M., Rincon-Camacho, M.: Automated parameter selection in a multi-scale total variation model. IFB-Report No. 22, Institute of Mathematics and Scientific Computing, University of Graz (November 2008)
12. Tadmor, E., Nezzar, S., Vese, L.: A multiscale image representation using hierarchical (BV, L2) decompositions. Multiscale Model. Simul. 2, 554–579 (2004)
13. Tadmor, E., Nezzar, S., Vese, L.: Multiscale hierarchical decomposition of images with applications to deblurring, denoising and segmentation. Comm. Math. Sci. 6, 1–26 (2008)
14. Bresson, X., Chan, T.: Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging 2(4), 455–484 (2008)
15. Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. Classics Appl. Math., vol. 28. SIAM, Philadelphia (1999)
16. Papoulis, A.: Probability, Random Variables, Stochastic Processes. McGraw Hill, New York (1991)
17. Mood, A.: Introduction to the Theory of Statistics. McGraw-Hill, New York (1974)
18. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. SIAM Multiscale Model. and Simu. 4, 460–489 (2005)
Multiplicative Noise Cleaning via a Variational Method Involving Curvelet Coefficients

Sylvain Durand (1), Jalal Fadili (2), and Mila Nikolova (3)

(1) M.A.P. 5 - CNRS, University Paris Descartes, France, [email protected], http://www.math-info.univ-paris5.fr/∼sdurand/
(2) GREYC CNRS-ENSICAEN-Université de Caen, France, [email protected], http://www.greyc.ensicaen.fr/∼jfadili/
(3) CMLA - CNRS, ENS Cachan, PRES UniverSud, France, [email protected], http://www.cmla.ens-cachan.fr/∼nikolova/
Abstract. Classical ways to denoise images contaminated with multiplicative noise (e.g. speckle noise) are filtering, statistical (Bayesian) methods, variational methods and methods that convert the multiplicative noise into additive noise (using a logarithmic function) in order to apply a shrinkage estimation for the log-image data and transform back the result using an exponential function. We propose a new method that involves several stages: we apply a reasonable under-optimal hard-thresholding on the curvelet transform of the log-image; the latter is restored using a specialized hybrid variational method combining an $\ell^1$ data-fitting to the thresholded coefficients and a Total Variation regularization (TV) in the image domain; the restored image is an exponential of the obtained minimizer, weighted so that the mean of the original image is preserved. The minimization stage is realized using a properly adapted fast Douglas-Rachford splitting. The existence of a minimizer of our specialized criterion and the convergence of the minimization scheme are proved. The obtained numerical results outperform the main alternative methods.
1 Introduction
In many active imaging systems (e.g. synthetic aperture radar, laser or ultrasound imaging), the data for the unknown image $S_0 : \Omega \to \mathbb{R}_+$, $\Omega \subset \mathbb{R}^2$, are severely corrupted with multiplicative noise. Then several independent measurements for the same image are needed:

$$S_k = S_0\,\eta_k + n_k, \quad \forall k \in \{1, \cdots, K\}, \qquad (1)$$
where $\eta_k : \Omega \to \mathbb{R}_+$ and $n_k$ represent the multiplicative and a typically zero-mean additive noise, $\forall k$. Commonly (see e.g. [27]) $\eta_k$ is modeled as a one-sided exponential probability density function (pdf) (cf. Fig. 1(a)): $\mathrm{pdf}(\eta_k) = \mu\, e^{-\mu\eta_k}\, 1_{\mathbb{R}_+}(\eta_k)$ for $\mu = 1$. In practice, one takes an average of all measurements, see e.g. Fig. 2(b). Since $\frac{1}{K}\sum_{k=1}^{K} n_k \approx 0$, the data read (cf. e.g. [27, 1, 30]):

$$S = \frac{1}{K}\sum_{k=1}^{K} S_k = S_0\,\frac{1}{K}\sum_{k=1}^{K} \eta_k = S_0\,\eta. \qquad (2)$$

Usually all $\eta_k$ are independent. Denoting by $\Gamma$ the usual Gamma-function, the mean of the noise $\eta$ in (2) has a Gamma distribution (cf. Fig. 1(b)):

$$\eta = \frac{1}{K}\sum_{k=1}^{K} \eta_k : \quad \mathrm{pdf}(\eta) = \frac{K^K \eta^{K-1}}{\Gamma(K)} \exp(-K\eta). \qquad (3)$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 282–294, 2009. © Springer-Verlag Berlin Heidelberg 2009
Various adaptive filters have been proposed, see e.g. [31, 17]: they work well when the noise is moderate or weak, i.e. for $K$ large. Bayesian, variational or diffusion-based methods have been proposed as well; see e.g. [28, 24, 18, 2]. Numerous methods convert the multiplicative noise into additive noise by

$$v = \log S = \log S_0 + \log\eta = u_0 + n, \qquad (4)$$

see e.g. [16, 30, 1, 23]. Then the pdf of $n$ reads (cf. Fig. 1(c)):

$$n = \log\eta : \quad \mathrm{pdf}(n) = \frac{K^K}{\Gamma(K)} \exp\big(K(n - e^n)\big). \qquad (5)$$

One can prove that $E[n] = \psi_0(K) - \log K$ and $\mathrm{Var}[n] = \psi_1(K)$, where $\psi_k(z) = \left(\frac{d}{dz}\right)^{k+1} \log\Gamma(z)$ is the polygamma function. A common strategy is to decompose the log-data $v$ into a multiscale frame for $L^2(\mathbb{R}^2)$ (an over-complete basis), say $W \equiv \{w_i : i \in I\}$ where $I$ is a set of indexes:

$$y = Wv = Wu_0 + Wn. \qquad (6)$$
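The moments of the log-noise $n$ quoted above can be checked with a small Monte Carlo experiment. The sketch below uses only the standard library. For integer $K$ it relies on the classical identities $\psi_0(K) = -\gamma + \sum_{j=1}^{K-1} 1/j$ and $\psi_1(K) = \pi^2/6 - \sum_{j=1}^{K-1} 1/j^2$, and the function names are illustrative, not from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(K):
    """psi_0(K) for a positive integer K, via harmonic numbers."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, K))

def trigamma_int(K):
    """psi_1(K) for a positive integer K."""
    return math.pi ** 2 / 6.0 - sum(1.0 / (j * j) for j in range(1, K))

def simulate_log_noise(K, samples, seed=0):
    """Draw n = log(eta), where eta is the mean of K unit-rate exponentials."""
    rng = random.Random(seed)
    draws = []
    for _ in range(samples):
        eta = sum(rng.expovariate(1.0) for _ in range(K)) / K
        draws.append(math.log(eta))
    return draws

def mean_and_variance(values):
    """Sample mean and (biased) sample variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var
```

Comparing `mean_and_variance(simulate_log_noise(10, 20000))` with `digamma_int(10) - math.log(10)` and `trigamma_int(10)` shows that the log-noise has a small negative mean, which is the bias that has to be corrected after exponentiating the restored log-image.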
By the Central Limit Theorem, the noise $Wn$ in $y$ is nearly Gaussian (cf. Fig. 1(d)). Then coefficients $y$ are denoised using shrinkage estimators $T : \mathbb{R} \to \mathbb{R}$:

$$y_T[i] = T\big((Wv)[i]\big), \quad \forall i \in I. \qquad (7)$$

Shrinkage functions designed for multiplicative noise were proposed e.g. in [30, 1]. Let $\widetilde{W} \equiv \{\tilde{w}_i : i \in I\}$ be a left inverse of $W$. Then a denoised log-image $v_T$ reads

$$v_T = \sum_{i\in I} T\big((Wv)[i]\big)\,\tilde{w}_i = \sum_{i\in I} T(y[i])\,\tilde{w}_i. \qquad (8)$$

Then the sought-after image is of the form $S_T = \exp v_T$.
Fig. 1. Noise distributions: (a) $\eta_k$, (b) $\eta = \frac{1}{K}\sum_{k=1}^{K}\eta_k$, (c) $n = \log\eta$, (d) $Wn$
S. Durand, J. Fadili, and M. Nikolova
Our approach. We apply (4) and consider a tight-frame transform of the log-data. The restored log-image (section 2) minimizes a criterion composed of an $\ell^1$-fitting to the (suboptimally) hard-thresholded frame coefficients and a Total Variation (TV) regularization in the image domain. The minimization (section 3) uses a specialized Douglas-Rachford splitting. The full algorithm, involving a bias correction, is given in section 4. Experiments are presented in section 5. Some notations. $(\cdot)^T$ means transposed, $(\cdot)^*$ means convex conjugate and $(\cdot)^\star$ means adjoint.
2 Restoration of the Log-Image
Here we consider how to restore a good log-image given data $v : \Omega \to \mathbb{R}$ obtained using (4). We focus on methods which, for a given preprocessed data set, lead to convex optimization problems. We comment only on variational methods and shrinkage estimators since they underlie our specialized hybrid objective function.

2.1 Drawbacks of Shrinkage Restoration and Variational Methods
Shrinkage restoration. The main problem with these methods, sketched in (7)-(8), is that shrinking large coefficients entails an erosion of the spiky features, while shrinking small coefficients yields Gibbs-like oscillations in the vicinity of edges and a loss of details in the textured area. On the other hand, if shrinkage is insufficient, some coefficients bearing mainly noise can remain almost unchanged (we call such coefficients outliers), and (8) shows that they yield artifacts with the shape of the functions $\tilde{w}_i$, see Fig. 2. Even though various improvements were brought, these artifacts remain visible; see the results on Fig. 3(d) and Fig. 4(c) in Section 5 using the very recent Stein-block thresholding [8].
Fig. 2. (a) Noisy Lena obtained according to (1)-(2) for K = 10. (b)-(d) Restorations $\exp v_{T_H}$ where data $v$ are denoised by hard-thresholding of its curvelet coefficients, see (12)-(13), for different choices of $T$: (b) T = 2 Var[n], (c) T = 4 Var[n], (d) T = 6 Var[n].
Fig. 3. Restoration of (b) using modern methods. (a) Original (256 × 256). (b) Noisy: μ = 1, K = 10, see (2)-(3). (c) $\hat{u}$ by (11) and $\hat{S}$ by (34): PSNR=26.2 dB, MAE=8.5. (d) Stein-block [8]: PSNR=25.5 dB, MAE=9.4. (e) AA algorithm [2]: PSNR=25.4 dB, MAE=9.4. (f) Our method: PSNR=26.05 dB, MAE=8.8. Note that (c) is a slightly improved version of [26] and that the restoration in (d) is done in the curvelet domain.

Fig. 4. (a) Shepp-Logan phantom (256 × 256). (b) Noisy, K = 10. (c) Denoised with Stein-block thresholding in the curvelet domain [8] PSNR=24.73dB, MAE=4. (d) Denoised with our algorithm PSNR=31.25dB, MAE=1.87.
Variational methods. In these methods, the restored function minimizes a criterion $F_v$ of the form

$$F_v(u) = \rho \int_\Omega \psi\big(u(t), v(t)\big)\,dt + \int_\Omega \varphi(|\nabla u(t)|)\,dt, \qquad (9)$$
where $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ measures closeness to data and $\varphi(|\nabla u(\cdot)|)$ introduces priors via a trade-off parameter $\rho > 0$. A classical choice is $\psi = \big(u(\cdot) - v(\cdot)\big)^2$. It is usually required that the potential function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ promotes images involving edges. Analysing the minimizers of $F_v$ as solutions of PDE's on $\Omega$, Rudin, Osher and Fatemi [25] exhibited that $\varphi(|\nabla u(t)|) = |\nabla u(t)|$ leads to such images, where for any $z(t) = (z_1(t), z_2(t)) \in \mathbb{R}^2$, $t \in \Omega$, one sets $|z(t)| \stackrel{\mathrm{def}}{=} \sqrt{z_1(t)^2 + z_2(t)^2}$. The resulting regularization term is known as Total Variation (TV) and will be denoted by $\|\cdot\|_{TV}$. However, whatever smooth data-fitting is chosen, this regularization yields images containing numerous constant regions (called staircasing effect), hence textures and fine details are removed, see [22]. The method in [2] is of this kind and operates in the image domain; the fitting term is derived from (3) and the denoised image $\hat{S}$, defined by

$$\hat{S} = \arg\min_{\Sigma} F_S \quad \text{for} \quad F_S(\Sigma) = \rho(K) \int_\Omega \left( \log\Sigma(t) + \frac{S(t)}{\Sigma(t)} \right) dt + \|\Sigma\|_{TV}, \qquad (10)$$

exhibits constant regions (see section 5). In [26], the regularization $\|\Sigma\|_{TV}$ is changed into $\|\log\Sigma\|_{TV}$ so as to reformulate the model as a convex problem, and not to over smooth the image parts with higher gray values. To recover the denoised image, we applied

$$\hat{S} \propto \exp(\hat{u}) \quad \text{for} \quad \hat{u} = \arg\min_u F_v(u), \quad \text{where} \quad F_v(u) = \rho\,\|u - v\|^2 + \|u\|_{TV}. \qquad (11)$$
Following [25], various edge-preserving convex functions $\varphi$ have been proposed; see [3] for a recent overview. Even though $\varphi'(0) = 0$ alleviates stair-casing, a systematic drawback of the resulting restored images is that the amplitude of edges is underestimated; thus neat edges or spiky areas are subjected to erosion.

2.2 Hybrid Methods
Hybrid methods, see e.g. [9, 19, 5, 14], combine the information contained in the large coefficients $y[i]$ obtained according to (6) with priors directly on the image $u$. They amount to defining the restored function $\hat{u}$ by

$$\text{minimize } \Phi(u) \quad \text{subject to} \quad \hat{u} \in \big\{ u : |(W(u - v))[i]| \le \mu_i,\ \forall i \in I \big\}.$$

Using an edge-preserving regularization, such as $\Phi = \mathrm{TV}$, is a pertinent choice. The selection of parameters $\{\mu_i\}_{i\in I}$ is more tricky. This choice must take into account the magnitude of the relevant data coefficient $y[i]$. However, choosing $\mu_i$ based solely on $y[i]$, as done in these papers, is too rigid since there are either correct data coefficients that incur smoothing ($\mu_i > 0$), or noisy coefficients that are left unchanged ($\mu_i = 0$). A good compromise that we adopt is to determine $(\mu_i)_{i\in I}$ based both on the data and on the prior term.

2.3 A Specialized Hybrid Criterion
Given the log-data $v$ obtained by (4), we apply a frame transform as in (6) to get $y = Wv = Wu_0 + Wn$. The noise contained in the $i$-th datum reads $\langle n, w_i\rangle$.
The low frequency approximation coefficients carry important information on the image. Therefore, a good choice is to keep them intact at this stage. Let $I_* \subset I$ denote the subset of all such elements of the frame. Then we apply a hard-thresholding operator $T_H$ [12] to all coefficients $I \setminus I_*$:

$$y_{T_H}[i] \stackrel{\mathrm{def}}{=} T_H\big(y[i]\big), \ \forall i \in I \setminus I_*, \quad \text{where} \quad T_H(t) \stackrel{\mathrm{def}}{=} \begin{cases} 0 & \text{if } |t| \le T, \\ t & \text{otherwise}, \end{cases} \qquad (12)$$

where $T$ is an underoptimal threshold in order to preserve the information relevant to edges and to some fine details in textured areas, contained in the small coefficients. Let us consider

$$v_{T_H} = \sum_{i \in I_1} (Wv)[i]\,\tilde{w}_i, \quad \text{where} \quad I_1 = \{ i \in I : |y[i]| > T \} \cup I_*. \qquad (13)$$

The image $v_{T_H}$ contains a lot of artifacts with the shape of the $\tilde{w}_i$ for those $y[i]$ that are noisy but above the threshold $T$, as well as information on the fine details in the original log-image $u_0$. In all cases, whatever the choice of $T$, an image of the form $v_{T_H}$ is unsatisfactory; see Fig. 2. The denoised coefficients, denoted by $\hat{x}$, are obtained based on the under-thresholded data $y_{T_H}$. We focus on hybrid methods of the form: $\hat{x} = \arg\min_x F(x)$ for $F(x) = \Psi(x, y_{T_H}) + \Phi(\widetilde{W}x)$, where $\Psi$ is a data-fitting term in the frame domain and $\Phi$ is an edge-preserving regularization term in the log-image domain. Let us denote

$$I_0 = I \setminus I_1 = \{ i \in I \setminus I_* : |y[i]| \le T \}. \qquad (14)$$
Coefficients $y[i]$ for $i \in I_0$ can be of two types.

1. Coefficients $y[i]$ bearing mainly noise; then the best choice is $\hat{x}[i] = 0$.
2. Coefficients $y[i]$ relevant to edges and other details in $u_0$. Since $y[i]$ is difficult to distinguish from the noise, the relevant $\hat{x}[i]$ should be restored using the edge-preserving prior $\Phi$. Note that a careful restoration must find a nonzero $\hat{x}[i]$ in order to avoid Gibbs-like oscillations in $\hat{u}$.

Coefficients $y[i]$ for $i \in I_1$ are of the following two types.

1. Large coefficients which carry the main features of the sought-after function. They verify $y[i] \approx \langle w_i, u_0\rangle$ and can be kept intact.
2. Coefficients highly contaminated by noise, i.e. $|y[i]| \gg |\langle w_i, u_0\rangle|$. We call them outliers because if we had $\hat{x}[i] = y[i]$, then $\hat{u}$ would contain an artifact with the shape of $\tilde{w}_i$ since by (13) we get $v_{T_H} = \sum_{j \setminus i} \hat{x}[j]\,\tilde{w}_j + y[i]\,\tilde{w}_i$. Instead, $\hat{x}[i]$ must be restored according to the prior $\Phi$.

This analysis clearly defines the goals that the minimizer $\hat{x}$ of $F$ is expected to achieve: restored coefficients $\hat{x}[i]$ have to fit $y_{T_H}[i]$ exactly if they are coherent with the prior $\Phi$, otherwise they have to be restored according to $\Phi$. Since [21] it is known that such requirements can be satisfied by criteria $F$ where $\Psi$ is non-smooth at the origin (e.g. $\ell^1$), see also [13]. For these reasons, we focus on

$$F(x) = \Psi(x) + \Phi(x), \qquad (15)$$
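where, for $\Lambda = \mathrm{diag}(\lambda_i)_{i\in I}$,

$$\Psi(x) = \sum_{i \in I_1 \cup I_*} \lambda_i\,|(x - y)[i]| + \sum_{i \in I_0} \lambda_i\,|x[i]| = \|\Lambda(x - y_{T_H})\|_1, \qquad (16)$$

$$\Phi(x) = \int_\Omega |\nabla \widetilde{W}x|\,ds = \|\widetilde{W}x\|_{TV}. \qquad (17)$$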
In the pre-processing step (12) we do not recommend the use of a shrinkage function other than $T_H$ since it will alter all the data coefficients without restoring them faithfully. Via $T_H$, we base our restoration on data $y_{T_H}$ where all non-thresholded coefficients keep the original information on the sought-after image. The theorem stated next addresses the existence and the uniqueness of a minimizer for $F$. Given $y$, let $G_y$ be the (convex) set of all minimizers of $F$:

$$G_y \stackrel{\mathrm{def}}{=} \Big\{ \hat{x} \in \ell^2(I) : F(\hat{x}) = \min_{x \in \ell^2(I)} F(x) \Big\}. \qquad (18)$$

Theorem 1. [13] For $y \in \ell^2(I)$ and $T > 0$ given, consider $F$ as defined in (15), where $\Omega \subset \mathbb{R}^2$ is open, bounded and its boundary $\partial\Omega$ is Lipschitz. Suppose that $\{w_i\}_{i\in I}$ is a frame of $L^2(\Omega)$ and the operator $\widetilde{W}$ is the pseudo-inverse of $W$. Assume also that $\lambda_{\min} = \min_{i\in I} \lambda_i > 0$. Then $G_y$ is nonempty, and for all $\hat{x}_1, \hat{x}_2 \in G_y$, $\nabla\widetilde{W}\hat{x}_1 \propto \nabla\widetilde{W}\hat{x}_2$, a.e. on $\Omega$.

In words, $\hat{S}_1 = \widetilde{W}\hat{x}_1$ and $\hat{S}_2 = \widetilde{W}\hat{x}_2$ have the same level lines, i.e. they differ by a local change of contrast; the latter is usually invisible to the naked eye. The choice of $\lambda_i$ is investigated in [13]. Following this analysis, we use only two values for $\lambda_i$, depending only on the set $I_\cdot$ the index $i$ belongs to. We focus on curvelet transforms of the log-data because (a) such a transform captures efficiently the main features of the data and (b) it is a tight-frame, which is helpful for the subsequent numerical stage.
3 Minimization for the Log-Image
Let $\Gamma_0(\mathcal{H})$ denote the class of proper lower-semicontinuous convex functions on a Hilbert space $\mathcal{H}$. Now we focus on the minimization problem

$$\text{find } \hat{x} \text{ such that } F(\hat{x}) = \min_x F \quad \text{for} \quad F = \Psi + \Phi, \qquad (19)$$

where $\Psi$ and $\Phi$ are defined in (16)-(17). Clearly, $\Psi, \Phi \in \Gamma_0(\ell^2(I))$, hence $F \in \Gamma_0(\ell^2(I))$. The set $G_y$ in (18) is non-empty by Theorem 1 and can be rewritten as $G_y = \{ x \in \ell^2(I) \mid x \in (\partial F)^{-1}(0) \}$, where $\partial F$ stands for subdifferential. Minimizing $F$ amounts to finding a solution to the fixed point equation

$$x = (\mathrm{Id} + \gamma\,\partial F)^{-1}(x), \qquad (20)$$

where $(\mathrm{Id} + \gamma\,\partial F)^{-1}$ is the resolvent operator associated to $\partial F$, $\gamma > 0$ is the proximal stepsize and $\mathrm{Id}$ is the identity map on $\ell^2(I)$. Since $(\mathrm{Id} + \gamma(\partial\Psi + \partial\Phi))^{-1}$ cannot be calculated in closed-form, we focus on splitting methods that use separately the resolvent operators $(\mathrm{Id} + \gamma\,\partial\Psi)^{-1}$ and $(\mathrm{Id} + \gamma\,\partial\Phi)^{-1}$.
3.1 Specialized Douglas-Rachford (D-R) Splitting Algorithm
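The D-R family is the most general class of monotone operator splitting methods. Given a sequence $\mu_t \in (0, 2)$, D-R methods can be expressed via the recursion

$$x^{(t+1)} = \left[ \left(1 - \frac{\mu_t}{2}\right)\mathrm{Id} + \frac{\mu_t}{2}\,\big(2(\mathrm{Id} + \gamma\,\partial\Psi)^{-1} - \mathrm{Id}\big) \circ \big(2(\mathrm{Id} + \gamma\,\partial\Phi)^{-1} - \mathrm{Id}\big) \right] x^{(t)}. \qquad (21)$$

Since problem (19) has solutions, we have the following convergence result:

Theorem 2. Let $\gamma > 0$ and $\mu_t \in (0, 2)$ be such that $\sum_{t\in\mathbb{N}} \mu_t(2 - \mu_t) = +\infty$. Take $x^{(0)} \in \ell^2(I)$ and consider the sequence of iterates defined by (21). Then, $(x^{(t)})_{t\in\mathbb{N}}$ converges weakly to some point $\hat{x} \in \ell^2(I)$ and $(\mathrm{Id} + \gamma\,\partial\Phi)^{-1}(\hat{x}) \in G_y$.

The statement follows from [10, Corollary 5.2]. The sequence $\mu_t = 1$, $\forall t \in \mathbb{N}$, fits.

3.2 Proximal Calculus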
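Proximity operators, introduced in [20], generalize convex projection.
Definition 1 (Moreau [20]). Let $\varphi \in \Gamma_0(\mathcal{H})$. Then $\forall x \in \mathcal{H}$ the function $z \mapsto \varphi(z) + \|x - z\|^2/2$, for $z \in \mathcal{H}$, achieves its infimum at a unique point denoted by $\mathrm{prox}_\varphi x$. The operator $\mathrm{prox}_\varphi : \mathcal{H} \to \mathcal{H}$ is the proximity operator of $\varphi$.
By the minimality condition for $\mathrm{prox}_\varphi$, it is easy to see that $\forall x, p \in \mathcal{H}$ we have $p = \mathrm{prox}_\varphi x \iff x - p \in \partial\varphi(p)$, that is, $\mathrm{prox}_\varphi = (\mathrm{Id} + \partial\varphi)^{-1}$. By introducing the reflection operator $\mathrm{rprox}_\varphi \overset{\mathrm{def}}{=} 2\,\mathrm{prox}_\varphi - \mathrm{Id}$, the D-R iteration (21) reads
$x^{(t+1)} = \left[\left(1 - \frac{\mu_t}{2}\right)\mathrm{Id} + \frac{\mu_t}{2}\,\mathrm{rprox}_{\gamma\Psi} \circ \mathrm{rprox}_{\gamma\Phi}\right] x^{(t)}$. (22)
Proximity operator of Ψ
Lemma 1. Let $x \in \ell^2(I)$. Then
$\mathrm{prox}_{\gamma\Psi}(x) = \bigl(y_{TH}[i] + T_S^{\gamma\lambda_i}(x[i] - y_{TH}[i])\bigr)_{i\in I}$,
where $T_S^{\gamma\lambda_i}(z[i]) = \max\bigl(0, |z[i]| - \gamma\lambda_i\bigr)\,\mathrm{sign}(z[i])$ is the soft-thresholding operator.
The proof is quite standard and can be found in our Report [15]. Note that
$\mathrm{rprox}_{\gamma\Psi}(x) = 2\bigl(y_{TH}[i] + T_S^{\gamma\lambda_i}(x[i] - y_{TH}[i])\bigr)_{i\in I} - x$. (23)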
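As an illustration of Lemma 1 and (23), the following minimal Python sketch applies the shifted soft-thresholding coefficient by coefficient. The function names and the use of plain lists for the coefficient vectors are assumptions made for this example, not part of the paper.

```python
import math

def soft_threshold(z, thresh):
    """Soft-thresholding T_S: shrink z towards 0 by thresh (thresh >= 0)."""
    return math.copysign(max(0.0, abs(z) - thresh), z)

def prox_psi(x, y_th, lam, gamma):
    """prox of gamma*Psi as in Lemma 1; x, y_th, lam are equal-length lists."""
    return [y + soft_threshold(xi - y, gamma * li)
            for xi, y, li in zip(x, y_th, lam)]

def rprox_psi(x, y_th, lam, gamma):
    """Reflection operator (23): 2 * prox - identity."""
    p = prox_psi(x, y_th, lam, gamma)
    return [2.0 * pi - xi for pi, xi in zip(p, x)]
```

Coefficients closer to y_TH[i] than the threshold γλ_i are mapped exactly onto y_TH[i]; the others are moved towards it by γλ_i.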
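Proximity operator of Φ. Clearly, $\Phi(x) = \|\cdot\|_{TV} \circ \tilde{W}(x)$. Computing $\mathrm{prox}_{\gamma\Phi}$ for an arbitrary $\tilde{W}$ may be intractable. We assume that
(w1) $\tilde{W} : \ell^2(I) \to L^2(\Omega)$ is surjective;
(w2) $\tilde{W}W = \mathrm{Id}$ and $\tilde{W} = c^{-1}W^*$ for $0 < c < \infty$; note that $W^*W = c\,\mathrm{Id}$;
(w3) $W$ is bounded.
Let $\mathcal{X} = L^2(\Omega) \times L^2(\Omega)$, let $\langle\cdot,\cdot\rangle_{\mathcal{X}}$ be the inner product in $\mathcal{X}$ and $\|\cdot\|_p$, $p \in [1, \infty]$, the $L^p$-norm on $\mathcal{X}$. Define $B_\infty^{\gamma}(\mathcal{X})$ as the closed $L^\infty$-ball of radius $\gamma$ in $\mathcal{X}$,
$B_\infty^{\gamma} \overset{\mathrm{def}}{=} \{z \in \mathcal{X} : \|z\|_\infty \le \gamma\} = \{z = (z_1, z_2) \in \mathcal{X} : |z(t)| \le \gamma,\ \forall t \in \Omega\}$,
and $P_{B_\infty^{\gamma}(\mathcal{X})} : \mathcal{X} \to B_\infty^{\gamma}(\mathcal{X})$ the associated projector.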
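S. Durand, J. Fadili, and M. Nikolova
Lemma 2. Let $x \in \ell^2(I)$ and let $B_\infty^{\gamma}(\mathcal{X})$ be as defined above. Then:
(i) $\mathrm{prox}_{\gamma\Phi}(x) = \bigl(\mathrm{Id} - W \circ (\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}) \circ \tilde{W}\bigr)(x)$; (24)
(ii) $\mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}(u) = u - P_C(u)$, (25)
where $C = \{\mathrm{div}(z) \in L^2(\Omega) \mid z \in C_c^\infty(\Omega \times \Omega),\ z \in B_\infty^{\gamma/c}(\mathcal{X})\}$. (26)
Sketch of the proof. By (w1), $\mathrm{range}(\tilde{W}) = L^2(\Omega)$. Using that $\mathrm{domain}(\|\cdot\|_{TV}) = L^2(\Omega)$, we find $\mathrm{cone}(\mathrm{dom}\,\|\cdot\|_{TV} - \mathrm{range}\,\tilde{W}) = \{0\}$. Statement (i) follows from applying [11, Proposition 11], whose requirements are satisfied. If $\varphi \in \Gamma_0(L^2(\Omega))$ and $\varphi^*$ is its convex conjugate, the Moreau decomposition [20, Proposition 4.a] asserts
$\mathrm{prox}_\varphi + \mathrm{prox}_{\varphi^*} = \mathrm{Id}$. (27)
Since the conjugate function of a norm is the indicator function of the ball of its dual norm, $(c^{-1}\gamma\|\cdot\|_{TV})^*(z) = 0$ if $z \in C$ and $+\infty$ if $z \notin C$, where $C$ is given in (26). Using Definition 1, $\mathrm{prox}_{(c^{-1}\gamma\|\cdot\|_{TV})^*} = P_C$. Identifying $c^{-1}\gamma\|\cdot\|_{TV}$ with $\varphi$ and $(c^{-1}\gamma\|\cdot\|_{TV})^*$ with $\varphi^*$, equation (27) leads to (ii)¹. From (24)-(25) we easily find that
$\mathrm{rprox}_{\gamma\Phi}(x) = \bigl(\mathrm{Id} - 2\,W \circ P_C \circ \tilde{W}\bigr)(x)$. (28)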
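For readers who want the intermediate step, (28) can be obtained from (24)-(25) as sketched below; this only expands the definition of the reflection operator and introduces no new assumption.

```latex
\begin{align*}
\mathrm{rprox}_{\gamma\Phi}
  &= 2\,\mathrm{prox}_{\gamma\Phi} - \mathrm{Id} \\
  &= 2\bigl(\mathrm{Id} - W \circ (\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}) \circ \tilde{W}\bigr) - \mathrm{Id}
     && \text{by (24)} \\
  &= \mathrm{Id} - 2\, W \circ P_C \circ \tilde{W}
     && \text{by (25), since } \mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}} = P_C .
\end{align*}
```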
¹ Note that our argument (27) to compute $\mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}(u)$ is not used in [6], which instead uses conjugates and bi-conjugates of the objective function.

Calculation of the projection $P_C$ in (25) on a discrete grid. In this case, $W$ is an $M \times N$ tight frame with $M = \#I \gg N = \#\Omega$, and assumption (w2) reads $\tilde{W}W = \mathrm{Id}$ and $\tilde{W} = c^{-1}W^T$, $c \in (0, \infty)$ (hence $W^TW = c\,\mathrm{Id}$). The discrete counterpart of $\mathcal{X}$ is $\mathcal{X} = \ell^2(\Omega) \times \ell^2(\Omega)$. We denote the discrete gradient by $\ddot{\nabla}$ (cf. [6] or [29]), and the discrete divergence $\mathrm{Div} : \mathcal{X} \to \ell^2(\Omega)$ is defined as $\mathrm{Div} = -\ddot{\nabla}^*$. Moreover, $C$ in (26) admits a simpler expression:
$C = \{\mathrm{Div}(z) \in \ell^2(\Omega) \mid z \in B_\infty^{\gamma/c}(\mathcal{X})\}$, (29)
where $B_\infty^{\gamma/c}(\mathcal{X})$ is defined using the new discrete notation. The projection $P_C$ in (25) does not admit an explicit form, so we provide an iterative scheme for its calculation in the next lemma.
Lemma 3. We adapt all assumptions of Lemma 2 to the new discrete setting, as explained above. Consider the forward-backward iteration
$z^{(t+1)} = P_{B_\infty^{1}(\mathcal{X})}\Bigl(z^{(t)} + \beta_t\,\ddot{\nabla}\bigl(\mathrm{Div}(z^{(t)}) - cu/\gamma\bigr)\Bigr)$ (30)
for
$0 < \inf_t \beta_t \le \sup_t \beta_t < 1/4$, (31)
where, $\forall (i, j) \in \Omega$,
$P_{B_\infty^{1}(\mathcal{X})}(z)[i, j] = z[i, j]$ if $|z[i, j]| \le 1$, and $z[i, j]/|z[i, j]|$ otherwise. (32)
Then
(i) $(z^{(t)})_{t\in\mathbb{N}}$ converges to a point $\hat{z} \in B_\infty^{1}(\mathcal{X})$;
(ii) $\bigl(c^{-1}\gamma\,\mathrm{Div}(z^{(t)})\bigr)_{t\in\mathbb{N}}$ converges to $c^{-1}\gamma\,\mathrm{Div}(\hat{z}) = (\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}})(u)$.
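The iteration (30)-(32) can be sketched in a few lines of pure Python. This is only an illustrative sketch under stated assumptions: images are small lists of lists, the gradient uses forward differences with a zero last row/column (one common discretization, assumed here to match the one of [6]), the constant step `beta = 0.24` satisfies (31), and the parameter `kappa` stands for γ/c. All function names are hypothetical.

```python
import math

def grad(u):
    """Forward differences; the last row/column of each component is 0."""
    n, m = len(u), len(u[0])
    gx = [[u[i + 1][j] - u[i][j] if i < n - 1 else 0.0 for j in range(m)] for i in range(n)]
    gy = [[u[i][j + 1] - u[i][j] if j < m - 1 else 0.0 for j in range(m)] for i in range(n)]
    return gx, gy

def div(zx, zy):
    """Discrete divergence, i.e. minus the adjoint of grad."""
    n, m = len(zx), len(zx[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            dx = (zx[i][j] if i < n - 1 else 0.0) - (zx[i - 1][j] if i > 0 else 0.0)
            dy = (zy[i][j] if j < m - 1 else 0.0) - (zy[i][j - 1] if j > 0 else 0.0)
            out[i][j] = dx + dy
    return out

def total_variation(u):
    """Isotropic discrete TV: sum of the pointwise gradient magnitudes."""
    gx, gy = grad(u)
    return sum(math.hypot(a, b) for ra, rb in zip(gx, gy) for a, b in zip(ra, rb))

def prox_tv(u, kappa, n_iter=200, beta=0.24):
    """Approximate prox of kappa*TV at u: u - kappa*Div(z), z from (30)."""
    n, m = len(u), len(u[0])
    zx = [[0.0] * m for _ in range(n)]
    zy = [[0.0] * m for _ in range(n)]
    for _ in range(n_iter):
        d = div(zx, zy)
        r = [[d[i][j] - u[i][j] / kappa for j in range(m)] for i in range(n)]
        gx, gy = grad(r)
        for i in range(n):
            for j in range(m):
                ax = zx[i][j] + beta * gx[i][j]
                ay = zy[i][j] + beta * gy[i][j]
                s = max(1.0, math.hypot(ax, ay))  # projection (32) onto the unit ball
                zx[i][j], zy[i][j] = ax / s, ay / s
    d = div(zx, zy)
    return [[u[i][j] - kappa * d[i][j] for j in range(m)] for i in range(n)]
```

The returned image is u minus the approximate projection κ Div(z) of u onto C, in line with (25) and statement (ii) of the lemma.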
The proof of this lemma can be found in our Report [15]. The iteration proposed in (30) to compute the proximity operator of the TV-norm is different from the projection algorithm of [6]. A similar iteration was proposed in [7] and in some other articles. The proof we gave is however simpler, as it uses known properties of proximity operators. Note that computing $\mathrm{prox}_{\|\cdot\|_{TV}}$ amounts to solving a discrete ROF-denoising problem. Our iteration to solve this problem is one possibility among others, see e.g. the recent report [4]. A crucial property of the D-R scheme (22) is its robustness to numerical errors that may occur when computing the proximity operators $\mathrm{prox}_\Psi$ and $\mathrm{prox}_\Phi$, see [10]. More precisely, let $a_t \in \ell^2(I)$ be an error term that models the inexact computation of $\mathrm{prox}_{\gamma\Phi}$ in (24), as the latter is obtained through (30). If the sequence of error terms $(a_t)_{t\in\mathbb{N}}$ and stepsizes $(\mu_t)_{t\in\mathbb{N}}$ in Theorem 2 obey $\sum_{t\in\mathbb{N}} \mu_t\|a_t\| < +\infty$, then the D-R algorithm (22) converges [10, Corollary 6.2]. In our experiments, using 200 inner iterations in (30) is sufficient to satisfy this requirement.
3.3 Bias Correction to Recover the Sought-After Image
Recall from (4) that $u_0 = \log S_0$ and set $\hat{u} = \tilde{W}\hat{x}^{(N_{DR})}$ as the estimator of $u_0$, where $N_{DR}$ is the number of D-R iterations in (22). Unfortunately, the estimator $\hat{u}$ is prone to bias, i.e. $\mathrm{E}[\hat{u}] = u_0 - b_{\hat{u}}$. A problem that classically arises in statistical estimation is how to correct such a bias. More important is how this bias affects the estimate after applying the inverse transformation, here the exponential. Our goal is then to ensure that for the estimate $\hat{S}$ of the image, we have $\mathrm{E}[\hat{S}] = S_0$. Expanding $\hat{S}$ in the neighborhood of $\mathrm{E}[\hat{u}]$, we have
$\mathrm{E}[e^{\hat{u}}] = \exp(\mathrm{E}[\hat{u}])\,(1 + \mathrm{Var}[\hat{u}]/2 + R_2) = S_0 \exp(-b_{\hat{u}})\,(1 + \mathrm{Var}[\hat{u}]/2 + R_2)$, (33)
where $R_2$ is the expectation of the Lagrange remainder in the Taylor series. One can observe that the posterior distribution of $\hat{u}$ is nearly symmetric, hence $R_2 \approx 0$. Then $b_{\hat{u}} \approx \log(1 + \mathrm{Var}[\hat{u}]/2)$ ensures unbiasedness. Consequently, finite sample (nearly) unbiased estimates of $u_0$ and $S_0$ are respectively $\hat{u} + \log(1 + \mathrm{Var}[\hat{u}]/2)$ and $\exp(\hat{u})\,(1 + \mathrm{Var}[\hat{u}]/2)$. $\mathrm{Var}[\hat{u}]$ can be reasonably estimated by $\psi_1(K)$, the variance of the noise $n$ in (4) being given in (1). Thus, given the restored log-image $\hat{u}$, our denoised image reads:
$\hat{S} = \exp(\hat{u})\,(1 + \psi_1(K)/2)$. (34)
4 Full Algorithm to Suppress Multiplicative Noise
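The bias correction (34) supplies the output step of the algorithm in this section. A minimal sketch is given here, assuming ψ1 is the trigamma function (the text only says it is defined in (1) and available as a built-in Matlab function). The helper `trigamma` is an illustrative stand-in based on the standard recurrence and asymptotic series, and both function names are hypothetical.

```python
import math

def trigamma(x):
    """psi_1(x) for x > 0, via psi_1(x) = psi_1(x + 1) + 1/x**2 and an
    asymptotic expansion once the argument is at least 6."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return acc + series

def bias_corrected_image(u_hat, K):
    """Apply (34) pixelwise: S_hat = exp(u_hat) * (1 + psi_1(K) / 2)."""
    factor = 1.0 + trigamma(K) / 2.0
    return [[math.exp(v) * factor for v in row] for row in u_hat]
```

For K = 10 the multiplicative factor is about 1.05, so the correction is small but systematic.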
Piecing together Lemmas 1 and 2, and Theorem 2, we write down the full multiplicative noise removal algorithm:
Task: Denoise an image $S$ corrupted with multiplicative noise according to (2).
Parameters: The observed noisy image $S$, number of iterations $N_{DR}$ (Douglas-Rachford outer iterations) and $N_{FB}$ (forward-backward inner iterations), stepsizes $\mu_t \in (0, 2)$, $0 < \beta_t < 1/4$ and $\gamma > 0$, tight-frame transform $W$ and initial threshold $T$ (e.g. $T = 2\,\psi_1(K)$), regularization parameters $\lambda_{0,1}$ associated to the sets $I_{0,1}$.
Specific operators:
(a) $T_S^{\gamma\lambda_i}(z) = \bigl(\max(0, |z[i]| - \gamma\lambda_i)\,\mathrm{sign}(z[i])\bigr)_{i\in I}$, $\forall z \in \mathbb{R}^{\#I}$.
(b) $\forall (i, j) \in \Omega$, $P_{B_\infty^{1}(\mathcal{X})}(z)[i, j] = z[i, j]$ if $|z[i, j]| \le 1$, and $z[i, j]/|z[i, j]|$ otherwise.
(c) $\ddot{\nabla}$ and $\mathrm{Div}$: the discrete versions of the continuous operators $\nabla$ and $\mathrm{div}$.
(d) $\psi_1(\cdot)$ defined according to (1) (built-in Matlab function).
Initialization: Compute $v = \log S$ and the transform coefficients $y = Wv$. Hard-threshold $y$ at $T$ to get $y_{TH}$. Choose $x^{(0)}$.
Main iteration: For $t = 1$ to $N_{DR}$,
(1) Inverse curvelet transform of $x^{(t)}$ according to $u^{(t)} = \tilde{W}x^{(t)}$.
(2) Initialize $z^{(0)}$; for $s = 0$ to $N_{FB} - 1$: $z^{(s+1)} = P_{B_\infty^{1}(\mathcal{X})}\Bigl(z^{(s)} + \beta_t\,\ddot{\nabla}\bigl(\mathrm{Div}(z^{(s)}) - \tfrac{c}{\gamma}\,u^{(t)}\bigr)\Bigr)$.
(3) Set $z^{(t)} = z^{(N_{FB})}$ and compute $w^{(t)} = c^{-1}\gamma\,\mathrm{Div}(z^{(t)})$.
(4) Forward curvelet transform: $\alpha^{(t)} = Ww^{(t)}$.
(5) Compute $r^{(t)} = \mathrm{rprox}_{\gamma\Phi}(x^{(t)}) = x^{(t)} - 2\alpha^{(t)}$.
(6) Find $q^{(t)} = \mathrm{rprox}_{\gamma\Psi} \circ \mathrm{rprox}_{\gamma\Phi}(x^{(t)}) = 2\bigl(y_{TH}[i] + T_S^{\gamma\lambda_i}(r^{(t)}[i] - y_{TH}[i])\bigr)_{i\in I} - r^{(t)}$.
(7) Update $x^{(t+1)}$: $x^{(t+1)} = (1 - \mu_t/2)\,x^{(t)} + (\mu_t/2)\,q^{(t)}$.
Output: Denoised image $\hat{S} = \exp\bigl(\tilde{W}x^{(N_{DR})}\bigr)\,(1 + \psi_1(K)/2)$.
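The outer loop, steps (5)-(7), can be tried on a toy problem in which both reflection operators are available in closed form. In the sketch below the TV term Φ is replaced by the quadratic ½‖x − b‖², purely as an assumption for illustration: it is not the functional used in the paper, and it removes the need for the curvelet transform and the inner forward-backward loop. All names are hypothetical.

```python
import math

def soft(z, t):
    """Scalar soft-thresholding."""
    return math.copysign(max(0.0, abs(z) - t), z)

def rprox_psi(x, y, lam, gamma):
    """Reflection of the prox of gamma*Psi, Psi(x) = lam * sum |x_i - y_i|."""
    return [2.0 * (yi + soft(xi - yi, gamma * lam)) - xi for xi, yi in zip(x, y)]

def prox_phi(x, b, gamma):
    """prox of gamma*Phi for the stand-in Phi(x) = 0.5 * ||x - b||^2."""
    return [(xi + gamma * bi) / (1.0 + gamma) for xi, bi in zip(x, b)]

def rprox_phi(x, b, gamma):
    return [2.0 * pi - xi for pi, xi in zip(prox_phi(x, b, gamma), x)]

def douglas_rachford(y, b, lam, gamma=0.5, mu=1.0, n_iter=200):
    """Iterate (22) and return prox_{gamma*Phi} of the last iterate,
    which is the point Theorem 2 identifies as a minimizer of Psi + Phi."""
    x = [0.0] * len(y)
    for _ in range(n_iter):
        q = rprox_psi(rprox_phi(x, b, gamma), y, lam, gamma)
        x = [(1.0 - mu / 2.0) * xi + (mu / 2.0) * qi for xi, qi in zip(x, q)]
    return prox_phi(x, b, gamma)
```

For this toy problem the minimizer is known in closed form, y[i] + soft(b[i] − y[i], lam), which gives a simple check of the iteration. Note that the minimizer is read off after applying the prox of the stand-in Φ to the final iterate, mirroring the statement of Theorem 2.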
5 Experiments
In all experiments, our algorithm was run using the second-generation curvelet tight frame along with the following set of parameters: $\forall t$, $\mu_t \equiv 1$, $\beta_t = 0.24$, $\gamma = 10$ and $N_{DR} = 50$. The initial threshold $T$ was set to $2\,\psi_1(K)$. For comparison purposes, some very recent multiplicative noise removal algorithms from the literature are considered: the AA algorithm [2] minimizing the criterion in (10), and the Stein-block denoising method [8] in the curvelet domain, applied on the log-transformed image. The latter is a sophisticated shrinkage-based denoiser that thresholds the coefficients by blocks rather than individually, and has been shown to be nearly minimax over a large class of images in the presence of various additive bounded noises. We also tried the L2-TV method where the restored log-image $\hat{u}$ minimizes (11) and the denoised image $\hat{S}$ involves the bias correction (34). Thanks to the bias correction, it can be seen as an improved version of the first method proposed in the recent Report [26, § 4.1]. For fair comparison, the hyperparameters for all competitors were tuned to reach their best level of performance on each noisy realization.
The denoising algorithms were tested on two images: Lena and Boat, both of size 256×256 and gray-scale in the range [1, 256]. For each image, a noisy observation is generated by multiplying the original image by a realization of noise according to (2)-(3) for $K = 10$. The running time of our denoising method is 1 minute 3 seconds for 50 iterations on an Intel 2.5 GHz Core Duo. The denoising performance of any algorithm is measured in terms of peak signal-to-noise ratio (PSNR) and mean absolute deviation (MAE), namely
$\mathrm{PSNR} = 20\log_{10}\dfrac{\sqrt{N}\,\|S_0\|_\infty}{\|\hat{S} - S_0\|_2}$ dB and $\mathrm{MAE} = \dfrac{\|\hat{S} - S_0\|_1}{N}$.
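The two quality measures can be computed as in the following sketch. It assumes images are given as flat lists of N pixel values, and the function names are illustrative only.

```python
import math

def psnr_db(s_hat, s0):
    """PSNR = 20*log10(sqrt(N) * ||S0||_inf / ||S_hat - S0||_2), in dB.
    Raises ZeroDivisionError if the two images are identical."""
    n = len(s0)
    peak = max(abs(v) for v in s0)
    err2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_hat, s0)))
    return 20.0 * math.log10(math.sqrt(n) * peak / err2)

def mae(s_hat, s0):
    """MAE = ||S_hat - S0||_1 / N."""
    return sum(abs(a - b) for a, b in zip(s_hat, s0)) / len(s0)
```

With this definition the peak value is the maximum of the reference image S0 rather than a fixed constant such as 255.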
The results are depicted in Figs. 3 and 4. Note that the AA algorithm tends to over-regularize the solution. Our denoiser clearly outperforms its competitors.
References
1. Achim, A., Tsakalides, P., Bezerianos, A.: SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling. IEEE Trans. Geosci. Remote Sens. 41(8), 1773–1784 (2003)
2. Aubert, G., Aujol, J.-F.: A variational approach to remove multiplicative noise. J. on Applied Mathematics 68(4), 925–946 (2008)
3. Aubert, G., Kornprobst, P.: Mathematical problems in image processing, 2nd edn. Springer, Berlin (2006)
4. Aujol, J.-F.: Some algorithms for total variation based image restoration. Report CMLA 2008-05 (2008)
5. Candès, E.J., Guo, F.: New multiscale transforms, minimum total variation synthesis. Applications to edge-preserving image reconstruction. Signal Processing 82 (2002)
6. Chambolle, A.: An algorithm for total variation minimization and application. J. of Mathematical Imaging and Vision 20(1) (2004)
7. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
8. Chesneau, C., Fadili, J., Starck, J.-L.: Stein block thresholding for image denoising. Technical report (2008)
9. Coifman, R.R., Sowa, A.: Combining the calculus of variations and wavelets for image enhancement. Applied and Computational Harmonic Analysis 9 (2000)
10. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5) (2004)
11. Combettes, P.L., Pesquet, J.-C.: A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. of Selected Topics in Signal Processing 1(4), 564–574 (2007)
12. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
13. Durand, S., Nikolova, M.: Denoising of frame coefficients using l1 data-fidelity term and edge-preserving regularization. SIAM J. on Multiscale Modeling and Simulation 6(2), 547–576 (2007)
14. Durand, S., Froment, J.: Reconstruction of wavelet coefficients using total variation minimization. SIAM J. on Scientific Computing 24(5), 1754–1767 (2003)
15. Durand, S., Fadili, J., Nikolova, M.: Multiplicative noise removal using L1 fidelity on frame coefficients. Report CMLA n.2008-40 (2008)
16. Fukuda, S., Hirosawa, H.: Suppression of speckle in synthetic aperture radar images using wavelet. Int. J. Remote Sens. 19(3), 507–519 (1998)
17. Krissian, K., Westin, C.-F., Kikinis, R., Vosburgh, K.G.: Oriented speckle reducing anisotropic diffusion. IEEE Trans. on Image Processing 16(5), 1412–1424 (2007)
18. Ma, J., Plonka, G.: Combined Curvelet Shrinkage and Nonlinear Anisotropic Diffusion. IEEE Trans. on Image Processing 16(9), 2198–2206 (2007)
19. Malgouyres, F.: Mathematical analysis of a model which combines total variation and wavelet for image restoration. J. of information processes 2(1), 1–10 (2002)
20. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. CRAS Sér. A Math
21. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers. SIAM J. on Numerical Analysis 40(3), 965–994 (2002)
22. Nikolova, M.: Weakly constrained minimization. Application to the estimation of images and signals involving constant regions. J. of Mathematical Imaging and Vision 21(2), 155–175 (2004)
23. Pizurica, A., Wink, A.M., Vansteenkiste, E., Philips, W., Roerdink, J.B.T.M.: A review of wavelet denoising in MRI and ultrasound brain imaging. Current Medical Imaging Reviews 2(2), 247–260 (2006)
24. Rudin, L., Lions, P.-L., Osher, S.: Multiplicative denoising and deblurring: Theory and algorithms. In: Osher, S., Paragios, N. (eds.), pp. 103–119. Springer, Heidelberg (2003)
25. Rudin, L., Osher, S., Fatemi, C.: Nonlinear total variation based noise removal algorithm. Physica 60D, 259–268 (1992)
26. Shi, J., Osher, S.: A nonlinear inverse scale space method for a convex multiplicative noise model. In: UCLA 2007 (2007)
27. Ulaby, F., Dobson, M.C.: Handbook of Radar Scattering Statistics for Terrain. Artech House, Norwood (1989)
28. Walessa, M., Datcu, M.: Model-based despeckling and information extraction from SAR images. IEEE Trans. Geosci. Remote Sens. 38(9), 2258–2269 (2000)
29. Welk, M., Steidl, G., Weickert, J.: Locally analytic schemes: A link between diffusion filtering and wavelets shrinkage. Applied and Computational Harmonic Analysis 24, 195–224 (2008)
30. Xie, H., Pierce, L.E., Ulaby, F.T.: SAR speckle reduction using wavelet denoising and Markov random field modeling. IEEE Trans. Geosci. Remote Sensing 40(10), 2196–2212 (2002)
31. Yu, Y., Acton, S.T.: Speckle reducing anisotropic diffusion. IEEE Trans. on Image Processing 11(11), 1260–1270 (2002)
Projected Gradient Based Color Image Decomposition
Vincent Duval¹, Jean-François Aujol², and Luminita Vese³
¹ Institut TELECOM, TELECOM ParisTech, CNRS UMR 5141
[email protected]
² CMLA, ENS Cachan, CNRS, UniverSud
[email protected]
³ UCLA, Mathematics Department
[email protected]
Abstract. This work deals with color image processing, with a focus on color image decomposition. The problem of image decomposition consists in splitting an original image f into two components u and v = f − u. The component u contains the geometric information of the original image, while v is made of the oscillating patterns of f, such as textures. We propose a numerical scheme based on a projected gradient algorithm to compute the solution of various decomposition models for color images or vector-valued images. A direct convergence proof of the scheme is provided, and some analysis on color texture modeling is given.
Keywords: Color image decomposition, projected gradient algorithm, color texture modeling.
1 Introduction
Total variation regularization was introduced almost 20 years ago for image restoration in the seminal work by Rudin et al. [1]. It has now grown into a popular and widely used tool in image processing (see [2, 3] and references therein for instance). If we denote by f the original image, the problem we are interested in consists in minimizing energies of the type:
$\int |Du| + \mu\|f - u\|_T^k$. (1)
Here $\int |Du|$ is the total variation of u; we simply have $\int |Du| = \int |\nabla u|\,dx$ in the case when u is regular. $\|\cdot\|_T$ stands for a norm which favors the noise and/or the textures of the original image f (in the sense that it is small for such features) and k is a positive exponent. The most basic choice for $\|\cdot\|_T$ is the $L^2$ norm, with k = 2. However, inspired by the book by Y. Meyer [4], and also motivated by the work of Mumford-Gidas [5], other spaces have been considered for modeling natural images and oscillating patterns such as textures or noise. [4] inspired many works, e.g. to name a few [6, 7, 8, 9, 10, 11, 12, 13, 14]. Image decomposition consists in splitting an original image f into two components, u and v = f − u. The component u contains the geometrical component of the original image (it can be seen as a sketch of the original image), while v is made of the oscillatory component (when the original image f is noise free, v is the texture component).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 295–306, 2009. © Springer-Verlag Berlin Heidelberg 2009
In this work, we are concerned with color image processing. While some authors deal with color images using a Riemannian framework, like G. Sapiro and D. L. Ringach [15] or N. Sochen et al. [16], others combine a functional analysis viewpoint with the Chromaticity-Brightness representation [17]. The model we use is more basic: it is the same as the one used in [18] (and related to [19]). Its advantage is to have a rich functional analysis interpretation. Note that in [20], the authors also propose a cartoon + texture color decomposition and denoising model inspired by Y. Meyer [4], using the vectorial versions of total variation and approximations of the space G(Ω) for textures (to be defined later); unlike the work presented here, they use Euler-Lagrange equations and a gradient descent scheme for the minimization. Here, we give some insight into the definition of a texture space for color images. In [21], a TV-Hilbert model was proposed for image restoration and/or decomposition:
$\int |Du| + \mu\|f - u\|_{\mathcal{H}}^2$ (2)
where $\|\cdot\|_{\mathcal{H}}$ stands for the norm of some Hilbert space $\mathcal{H}$. This is a particular case of problem (1). Thanks to the Hilbert structure of $\mathcal{H}$, different methods can be used to minimize (2), such as a projection algorithm [21]. We extend (2) to the case of color images. From a numerical point of view, (1) is not straightforward to minimize. Depending on the choice of $\|\cdot\|_T$, the minimization of (1) can be quite challenging. Even in the simplest case when $\|\cdot\|_T$ is the $L^2$ norm and k = 2, handling the total variation term $\int |Du|$ needs to be done with care. The most classical approach consists in writing the Euler-Lagrange equation associated with problem (1). In [1], a fixed step gradient descent scheme is used to compute the solution.
This method has on the one hand the advantage of being very easy to implement, and on the other hand the disadvantage of being quite slow. To improve the convergence speed, quasi-Newton methods have been proposed [22]. Duality based schemes have also drawn a lot of attention to solve (1): first by Chan, Golub and Mulet in [23], later by A. Chambolle in [24] with a projection algorithm. This projection algorithm has recently been extended to the case of color images in [18]. It has been shown that graph cuts based algorithms could also be used [25,26]. It is also shown in [27,28] that Nesterov's schemes provide fast algorithms for minimizing (1). Another variant of Chambolle's projection algorithm [24] is to use a projected gradient algorithm [25, 28, 29]. Here we have decided to use this approach, which has the advantages of being both easy to implement and quite efficient. The plan of the paper is the following. In Sect. 2, we define and provide some analysis about the spaces we consider in the paper. In Sect. 3, we extend the TV-Hilbert model originally introduced in [21] to the case of color images. In Sect. 4, we present a projected gradient algorithm to compute a minimizer of problem (2). This projected gradient algorithm was first proposed by A. Chambolle
in [25] for total variation regularization. A proof of convergence was given in [28] relying on optimization results by Bermudez and Moreno [30]. We derive here a simple and direct proof of convergence. In Sect. 5, we apply this scheme to solve various classical denoising and decomposition problems. We illustrate our approach with many numerical examples.
2 Definitions and Properties of the Considered Color Spaces
In this section, we introduce some notation and provide some analysis of the function spaces we consider to model color textures.
2.1 Introduction
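Let $\Omega$ be a Lipschitz convex bounded open set in $\mathbb{R}^2$. We model color images as $\mathbb{R}^M$-valued functions defined on $\Omega$. The inner product in $L^2(\Omega, \mathbb{R}^M)$ is denoted as $\langle u, v\rangle_{L^2(\Omega,\mathbb{R}^M)} = \int_\Omega \sum_{i=1}^M u_i v_i$. For a vector $\xi \in \mathbb{R}^M$, we define the norms:
$|\xi|_1 = \sum_{i=1}^M |\xi_i|$, $\quad |\xi|_2 = \sqrt{\sum_{i=1}^M \xi_i^2}$, $\quad |\xi|_\infty = \max_{i=1\dots M} |\xi_i|$.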
We say that a function $f \in L^1(\Omega, \mathbb{R}^M)$ has bounded variation if the following quantity is finite:
$|f|_{TV} = \sup_{\xi \in B} \langle f, \mathrm{div}\,\xi\rangle_{L^2(\Omega,\mathbb{R}^M)}$, with $B = \{\xi \in C_c^1(\Omega, \mathbb{R}^{2\times M}) \,/\, \forall x \in \Omega,\ |\xi(x)|_2 \le 1\}$. (3)
This quantity is called the total variation. For more information on its properties, we refer the reader to [3]. The set of functions with bounded variation is a vector space classically denoted by $BV(\Omega, \mathbb{R}^M)$. For f smooth enough, the total variation of f is $|f|_{TV} = \int_\Omega \sqrt{\sum_{i=1}^M |\nabla f_i|^2}\,dx$. Other choices of sets B are possible (see [18] for a discussion), which are mathematically equivalent and define the same BV space. But in practice, in image processing, it is crucial to have a coupling between the channels as in (3) in order to avoid visual artifacts.
2.2 The Color G(Ω) Space
The $G(\mathbb{R}^2)$ space was introduced by Y. Meyer in [4] to model textures in grayscale images. For the generalization to color images, we adopt the framework of [8]; the color space $G(\Omega)$ is also used in [20], as a generalization of [6] to color image decomposition and color image denoising.
Definition 1. The space $G(\Omega)$ is defined by:
$G(\Omega) = \{v \in L^2(\Omega, \mathbb{R}^M) \,/\, \exists \xi \in L^\infty(\Omega, (\mathbb{R}^2)^M),\ \forall i = 1, \dots, M,\ v_i = \mathrm{div}\,\xi_i \text{ and } \xi_i \cdot N = 0 \text{ on } \partial\Omega\}$
(where $\xi_i \cdot N$ refers to the normal trace of $\xi_i$ over $\partial\Omega$). One can endow it with the norm:
$\|v\|_G = \inf\{\|\xi\|_\infty,\ \forall i = 1, \dots, M,\ v_i = \mathrm{div}\,\xi_i,\ \xi_i \cdot N = 0 \text{ on } \partial\Omega\}$ with $\|\xi\|_\infty = \operatorname{ess\,sup} \sqrt{\sum_{i=1}^M |\xi_i|^2}$.
The following result was proved in [9] for grayscale images: it characterizes $G(\Omega)$. Working component by component, it is straightforward to extend it to color images (see [31]).
Proposition 1. $G(\Omega) = \{v \in L^2(\Omega, \mathbb{R}^M) \,/\, \int_\Omega v = 0\}$.
Remark 1. The topology induced by the G-norm on $G(\Omega)$ is coarser than the one induced by the $L^2$ norm. Let us consider, for $m \in \mathbb{N}^*$, the sequence $f_m^{(k)}(x, y) = \cos mx + \cos my$, $\forall k = 1, \dots, M$, defined on $(-\pi, \pi)^2$. The vector field $\xi^{(k)} = (\frac{1}{m}\sin(mx), \frac{1}{m}\sin(my))$ satisfies the boundary condition, and its divergence is equal to $f_m^{(k)}$. As a consequence $\|f_m\|_G \le \frac{\sqrt{2M}}{m}$ and $\lim_{m\to+\infty} \|f_m\|_G = 0$. Yet, it is easy to see that $\|f_m\|^2_{L^2(\Omega,\mathbb{R}^M)} = 4M\pi^2$. The sequence $f_m$ converges to 0 for the topology induced by the G-norm, but not for the one induced by the $L^2$ norm.
More generally, oscillating patterns with zero mean have a small G norm (see [4] for more details).
3 Color TV-Hilbert Model: Presentation and Mathematical Analysis
3.1 Presentation
The TV-Hilbert framework was introduced for grayscale images by J.-F. Aujol and G. Gilboa in [21] as a way to approximate the BV-G model. They prove that one can extend Chambolle's algorithm to this model. In this section we show that this is still true for color images. We are interested in solving the following problem:
$\inf_u\ |u|_{TV} + \frac{1}{2\lambda}\|f - u\|_{\mathcal{H}}^2$ (4)
where $\mathcal{H}$ is the space of zero-mean functions of $L^2(\Omega, \mathbb{R}^M)$, regarded as a Hilbert space endowed with the following norm: $\|v\|_{\mathcal{H}}^2 = \langle v, Kv\rangle_{L^2(\Omega,\mathbb{R}^M)}$. Here we assume that $K : \mathcal{H} \to L^2(\Omega, \mathbb{R}^M)$ is a symmetric positive definite, bounded linear operator (for the topology induced by the $L^2(\Omega, \mathbb{R}^M)$ norm on $\mathcal{H}$) and that $K^{-1}$ is bounded on $\mathrm{Im}(K)$.
Example 1 (The Rudin-Osher-Fatemi model). It was proposed in [1] for grayscale images, then extended to color images using different methods (e.g. [15] or [19]). In [18], the authors use another kind of color total variation, which is the one we use in this paper. The idea is to minimize the functional:
$|u|_{TV} + \frac{1}{2\lambda}\|f - u\|^2_{L^2(\Omega,\mathbb{R}^M)}$. (5)
Without loss of generality, we can assume that f has zero mean. Then this model becomes a particular case of the TV-Hilbert model with $K = \mathrm{Id}$.
Example 2 (The OSV model). In [7], S. Osher, A. Solé and L. Vese propose to model textures by the $H^{-1}$ space. In order to generalize this model, we introduce the following functional:
$\inf_u\ |u|_{TV} + \frac{1}{2\lambda}\int_\Omega |\nabla\Delta^{-1}(f - u)|^2$ (6)
where $\Delta^{-1}v = (\Delta^{-1}v_1, \dots, \Delta^{-1}v_M)^T$, $\nabla\rho = (\nabla\rho_1, \dots, \nabla\rho_M)^T$, $|\nabla\rho|^2 = \sum_{j=1}^M |\nabla\rho_j|^2$ and
$\int_\Omega |\nabla\Delta^{-1}(f - u)|^2 = \int_\Omega \sum_{i=1}^M |\nabla\Delta^{-1}(f_i - u_i)|^2 = \langle f - u, -\Delta^{-1}(f - u)\rangle_{L^2(\Omega,\mathbb{R}^M)}$.
For $K = -\Delta^{-1}$, the Osher-Solé-Vese problem is a particular case of the TV-Hilbert framework. We also refer to L. Lieu, L. Vese [14] for more general $(BV, H^{-s})$ models, as particular cases of the TV-Hilbert formulation.
3.2 Mathematical Study
For $f \in L^2(\Omega, \mathbb{R}^M)$, the existence and uniqueness of the minimizer u of (4) can be proved using standard methods (see [3]). Now, let us introduce the notation $v = f - u$, when u is a minimizer of (4). Following Y. Meyer's steps, one can extend the result proposed in [4] for grayscale images (see [31] for a detailed proof):
Theorem 1 (Characterization of minimizers). Let $f \in L^2(\Omega, \mathbb{R}^M)$.
(i) If $\|Kf\|_G \le \lambda$ then the solution of the TV-Hilbert problem is given by $(u, v) = (0, f)$.
(ii) If $\|Kf\|_G > \lambda$ then the solution $(u, v)$ is characterized by: $\|Kv\|_G = \lambda$ and $\langle u, Kv\rangle_{L^2(\Omega,\mathbb{R}^M)} = \lambda|u|_{TV}$.
For $\lambda > 0$, the set $G_\lambda = \{v \in L^2(\Omega, \mathbb{R}^M),\ \|v\|_G \le \lambda\}$ is a closed convex set, as well as $K^{-1}G_\lambda$. The orthogonal projection on this set is well-defined and we can notice that Theorem 1 reformulates as:
$v = P^{\mathcal{H}}_{K^{-1}G_\lambda}(f)$, $\quad u = f - v$.
That is, v is the orthogonal projection of f on the set $K^{-1}G_\lambda$. Therefore, the problem is equivalent to its dual formulation, with $v = \lambda K^{-1}\mathrm{div}\,p$:
$\inf_{|p| \le 1} \|\lambda K^{-1}\mathrm{div}\,p - f\|_{\mathcal{H}}^2$. (7)
4 Projected Gradient Algorithm
We present here a projection algorithm for solving this dual formulation, inspired by [24, 18], and we provide a complete proof of convergence of this scheme.
4.1 Discrete Setting
From now on, we will work in the discrete case, using the following convention. A grayscale image is a matrix of size $N \times N$. We write $X = \mathbb{R}^{N\times N}$ for the space of grayscale images. Their gradients belong to the space $Y = X \times X$. The $L^2$ inner product is $\langle u, v\rangle_X = \sum_{1\le i,j\le N} u_{i,j}v_{i,j}$. For the gradient and divergence operators on grayscale images, we use the same discretizations as in [24]. A color image is an element of $X^M$ and its gradient belongs to $Y^M$. The gradient and the divergence are defined component by component, so that the color divergence is still the opposite of the adjoint of the color gradient. Notice that in this framework, we have $\|\nabla\|^2 = \|\mathrm{div}\|^2 = 8$ (see [24]).
4.2 Projected Gradient
It was recently noticed ([25], [28]) that problem (7) for grayscale images could be solved using a projected gradient descent. This is the algorithm we decided to extend to the case of color images. Let $B = \{v \in Y^M,\ \forall\, 1 \le i, j \le N,\ |v_{i,j}|_2 \le 1\}$ be the discrete version of our set of test-functions. The orthogonal projection on B is easily computed: $P_B(x) = \left(\frac{x_1}{\max\{1, |x|_2\}}, \frac{x_2}{\max\{1, |x|_2\}}\right)$. The projected gradient descent scheme is defined by:
$p^{m+1} = P_B\bigl(p^m + \tau\nabla(K^{-1}\mathrm{div}\,p^m - f/\lambda)\bigr)$ (8)
which amounts to:
$p^{m+1}_{i,j} = \dfrac{p^m_{i,j} + \tau\nabla(K^{-1}\mathrm{div}\,p^m - f/\lambda)_{i,j}}{\max\bigl\{1, |p^m_{i,j} + \tau\nabla(K^{-1}\mathrm{div}\,p^m - f/\lambda)_{i,j}|_2\bigr\}}$. (9)
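Scheme (9) can be sketched in pure Python for the simplest case K = Id (the color ROF model). This is a hedged illustration only: the image layout (a list of M channels, each a list of rows), the function names, the fixed iteration count and the step τ = 0.24 are assumptions, and the forward-difference gradient is one common choice assumed to match the discretization of [24].

```python
import math

def grad(u):
    """Forward differences; the last row/column of each component is 0."""
    n, m = len(u), len(u[0])
    gx = [[u[i + 1][j] - u[i][j] if i < n - 1 else 0.0 for j in range(m)] for i in range(n)]
    gy = [[u[i][j + 1] - u[i][j] if j < m - 1 else 0.0 for j in range(m)] for i in range(n)]
    return gx, gy

def div(px, py):
    """Discrete divergence, i.e. minus the adjoint of grad."""
    n, m = len(px), len(px[0])
    return [[(px[i][j] if i < n - 1 else 0.0) - (px[i - 1][j] if i > 0 else 0.0)
             + (py[i][j] if j < m - 1 else 0.0) - (py[i][j - 1] if j > 0 else 0.0)
             for j in range(m)] for i in range(n)]

def color_tv(f):
    """Channel-coupled total variation, as in (3)."""
    grads = [grad(ch) for ch in f]
    n, m = len(f[0]), len(f[0][0])
    return sum(math.sqrt(sum(gx[i][j] ** 2 + gy[i][j] ** 2 for gx, gy in grads))
               for i in range(n) for j in range(m))

def color_rof(f, lam, tau=0.24, n_iter=300):
    """Projected gradient (9) with K = Id; returns u = f - lam * div p."""
    M, n, m = len(f), len(f[0]), len(f[0][0])
    px = [[[0.0] * m for _ in range(n)] for _ in range(M)]
    py = [[[0.0] * m for _ in range(n)] for _ in range(M)]
    for _ in range(n_iter):
        steps = []
        for c in range(M):
            d = div(px[c], py[c])
            r = [[d[i][j] - f[c][i][j] / lam for j in range(m)] for i in range(n)]
            steps.append(grad(r))
        for i in range(n):
            for j in range(m):
                ax = [px[c][i][j] + tau * steps[c][0][i][j] for c in range(M)]
                ay = [py[c][i][j] + tau * steps[c][1][i][j] for c in range(M)]
                # One norm per pixel over all channels: this couples the channels.
                s = max(1.0, math.sqrt(sum(a * a for a in ax) + sum(a * a for a in ay)))
                for c in range(M):
                    px[c][i][j], py[c][i][j] = ax[c] / s, ay[c] / s
    dv = [div(px[c], py[c]) for c in range(M)]
    return [[[f[c][i][j] - lam * dv[c][i][j] for j in range(m)] for i in range(n)]
            for c in range(M)]
```

The projection divides all channels at a pixel by the same factor, which is the coupling between channels that the definition of B in (3) asks for.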
Since the functional is not elliptic, the standard proof of convergence of the projected gradient algorithm (see [32] for instance) needs to be adapted to this particular case.
Proposition 2. If $0 < \tau < \frac{1}{4\|K^{-1}\|}$, then algorithm (9) converges. More precisely, there exists $\tilde{p} \in B$ such that:
$\lim_{m\to\infty} (K^{-1}\mathrm{div}\,p^m) = K^{-1}\mathrm{div}\,\tilde{p}$
and $\|\lambda K^{-1}\mathrm{div}\,\tilde{p} - f\|_{\mathcal{H}}^2 = \inf_{p\in B} \|\lambda K^{-1}\mathrm{div}\,p - f\|_{\mathcal{H}}^2$.
Proof. We only give here a sketch of the proof. Let us first notice that p is a minimizer iff $p \in B$ and $\forall q \in B$, $\forall \tau > 0$, $\langle q - p,\ p - (p + \tau\nabla(K^{-1}\mathrm{div}\,p - f/\lambda))\rangle_{L^2} \ge 0$, or equivalently: $p = P_B\bigl(p + \tau\nabla(K^{-1}\mathrm{div}\,p - f/\lambda)\bigr)$, where $P_B$ is the orthogonal projection on B with respect to the $L^2$ inner product. Let p be such a minimizer.
• Now let us consider a sequence defined by (8), and write $A = -\nabla K^{-1}\mathrm{div}$. We have: $\|p^{k+1} - p\|^2 \le \|(I - \tau A)(p - p^k)\|^2$ since $P_B$ is 1-Lipschitz [32]. Provided $\|I - \tau A\| \le 1$, we can deduce:
$\|p^{k+1} - p\| \le \|p^k - p\|$ (10)
and the sequence $(\|p^k - p\|)$ is convergent.
• A is a symmetric positive semi-definite operator. By writing $E = \ker A$ and $F = \mathrm{Im}\,A$, we have $Y^M = E \overset{\perp}{\oplus} F$, and we can decompose any $q \in Y^M$ as the sum of two orthogonal components $q_E \in E$ and $q_F \in F$. Notice that by injectivity of $K^{-1}$, E is actually equal to the kernel of the divergence operator. Let $\mu_1 = 0 < \mu_2 \le \dots \le \mu_a$ be the ordered eigenvalues of A. Then
$\|I - \tau A\| = \max(|1 - \tau\mu_1|, |1 - \tau\mu_a|) = 1$ for $0 \le \tau \le \frac{2}{\mu_a}$.
We can restrict $I - \tau A$ to F and then define: $g(\tau) = \|(I - \tau A)|_F\| < 1$ for $0 < \tau < \frac{2}{\mu_a}$.
• Now we assume that $0 < \tau < \frac{2}{\mu_a}$. Therefore, inequality (10) is true and the sequence $(p^k)$ is bounded, and so is the sequence $(K^{-1}\mathrm{div}\,p^k)$. We are going to prove that the sequence $(K^{-1}\mathrm{div}\,p^k)$ has a unique cluster point. Let $(K^{-1}\mathrm{div}\,p^{\varphi(k)})$ be a convergent subsequence. By extraction, one can assume that $p^{\varphi(k)}$ is convergent too, and denote by $\tilde{p}$ its limit. Passing to the limit in (8), the sequence $(p^{\varphi(k)+1})$ converges towards $\hat{p} = P_B\bigl(\tilde{p} + \tau\nabla(K^{-1}\mathrm{div}\,\tilde{p} - f/\lambda)\bigr)$. Using (10), we also notice that $\|\tilde{p} - p\| = \|\hat{p} - p\|$. As a consequence:
$\|\tilde{p} - p\|^2 = \bigl\|P_B\bigl(\tilde{p} + \tau\nabla(K^{-1}\mathrm{div}\,\tilde{p} - f/\lambda)\bigr) - P_B\bigl(p + \tau\nabla(K^{-1}\mathrm{div}\,p - f/\lambda)\bigr)\bigr\|^2$
$\le \|(I - \tau A)(\tilde{p} - p)\|^2 \le \|(\tilde{p} - p)_E\|^2 + g(\tau)^2\|(\tilde{p} - p)_F\|^2 < \|\tilde{p} - p\|^2$ if $(\tilde{p} - p)_F \ne 0$.
Of course, this last inequality cannot hold, which means that $(\tilde{p} - p)_F = 0$. Hence $(\tilde{p} - p) \in E = \ker A$ and $K^{-1}\mathrm{div}\,\tilde{p} = K^{-1}\mathrm{div}\,p$: the sequence $(K^{-1}\mathrm{div}\,p^k)$ is convergent.
• Since $\|\mathrm{div}\|^2 = \|\nabla\|^2 = 8$ (see [24]), we conclude by noticing that $\mu_a \le 8\|K^{-1}\|$.
Since we are only interested in $v = \lambda K^{-1}\mathrm{div}\,p$, Proposition 2 justifies the validity of algorithm (8). We can actually prove that the sequence $(p^m)$ defined by (8) converges (see [31], Corollary 4.1).
5 Applications to Color Image Denoising and Decomposition
In this last section, we apply the projected gradient algorithm to solve various color image problems.
5.1 TV-Hilbert Model
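The Color ROF Model. As an application of (9), we use the following scheme for the ROF model (5):
$p^{m+1}_{i,j} = \dfrac{p^m_{i,j} + \tau\nabla(\mathrm{div}\,p^m - f/\lambda)_{i,j}}{\max\bigl\{1, |p^m_{i,j} + \tau\nabla(\mathrm{div}\,p^m - f/\lambda)_{i,j}|_2\bigr\}}$. (11)
The Color OSV Model. As for the OSV model (6), we use:
$p^{m+1}_{i,j} = \dfrac{p^m_{i,j} - \tau\nabla(\Delta\,\mathrm{div}\,p^m + f/\lambda)_{i,j}}{\max\bigl\{1, |p^m_{i,j} - \tau\nabla(\Delta\,\mathrm{div}\,p^m + f/\lambda)_{i,j}|_2\bigr\}}$. (12)
5.2 The Color A2BC Algorithm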
Following Y. Meyer [4], one can use the $G(\Omega)$ space to model textures, and try to solve the problem: $\inf_u (|u|_{BV} + \alpha\|f - u\|_G)$. In [8], the authors approximate this problem by minimizing the following functional:
$F_{\mu,\lambda}(u, v) = |u|_{BV} + \frac{1}{2\lambda}\|f - u - v\|^2_{L^2(\Omega)} + \chi_{G_\mu}(v)$ (13)
with $\chi_{G_\mu}(v) = 0$ if $v \in G_\mu$, and $+\infty$ otherwise.
Following [8,17,18], it is straightforward to extend the A2BC algorithm using the projection algorithm. We start by initializing with $u^0 = v^0 = 0$, and then compute iteratively until convergence¹:
$v^{n+1} = P_{G_\mu}(f - u^n)$ and $u^{n+1} = f - v^{n+1} - P_{G_\lambda}(f - v^{n+1})$.
5.3 The Color TV-L1 Model
The TV-L1 model is very popular for grayscale images. It benefits from having both good theoretical properties (it is a morphological filter) and fast algorithms (see [26]). In order to extend it to color images, we consider the problem: $\inf_u |u|_{TV} + \lambda\|f - u\|_1$ with the notation $\|u\|_1 = \int_\Omega \sqrt{\sum_{l=1}^M |u_l|^2}$. As for the A2BC algorithm, we are led to consider the approximation, for $\alpha > 0$:
$\inf_{u,v}\ |u|_{BV} + \frac{1}{2\alpha}\|f - u - v\|_2^2 + \lambda\|v\|_1$.
¹ The proof of convergence of this algorithm is the same as the one in [8].
Projected Gradient Based Color Image Decomposition
303
Fig. 1. From left to right: original and noisy images (WG, PSNR = 57.3 dB), denoised with color ROF (λ = 25, PSNR= 74.2 dB) and with color OSV (λ = 25, PSNR= 74.1 dB)
Fig. 2. Cartoon-texture decomposition using color A2BC algorithm (upper row) and color TVL1 (lower row). On top, the original image.
In order to generalize the TV-L1 algorithm proposed by Aujol et al. [33], we aim at solving the alternate minimization problem:

$$\inf_u\ |u|_{BV} + \frac{1}{2\alpha}\|f - u - v\|_2^2 \quad\text{and}\quad \inf_v\ \frac{1}{2\alpha}\|f - u - v\|_2^2 + \lambda\|v\|_1.$$
Fig. 3. From left to right: original and noisy images (using salt and pepper noise, PSNR= 34.6 dB), denoised with color TVL1 (PSNR= 67.5 dB) and noise part
The first problem is a Rudin-Osher-Fatemi problem. Scheme (9) with K = Id is well adapted for solving it. The second one can be solved by a "vectorial soft thresholding":

Proposition 3. The solution of the second problem is given by:

$$v(x) = VT_{\alpha\lambda}(f(x) - u(x)) = \frac{f(x) - u(x)}{|f(x) - u(x)|_2}\,\max\left(|f(x) - u(x)|_2 - \alpha\lambda,\ 0\right) \quad\text{a.e.}$$
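As a minimal sketch, the pointwise formula of Proposition 3 can be written for a single pixel as follows. The function name and the convention of returning the zero vector when the input vanishes are assumptions made for illustration; `w` plays the role of f(x) − u(x) and `threshold` the role of αλ.

```python
import math


def vector_soft_threshold(w, threshold):
    """Shrink the Euclidean norm of the vector w by `threshold`, keeping its direction."""
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0.0:
        # The formula is read as 0 where w vanishes (assumed convention).
        return [0.0 for _ in w]
    scale = max(norm - threshold, 0.0) / norm
    return [scale * c for c in w]
```

All color channels are scaled by the same factor, so the direction of the residual is preserved, and residuals with norm below the threshold are set to zero.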
The proof of Proposition 3 is given in [31]. Therefore, we propose to generalize the TV-L1 algorithm by initializing with $u^0 = v^0 = 0$, then computing iteratively until convergence (the proof of convergence is the same as the one in [33]):

$$v^{n+1} = VT_{\alpha\lambda}(f - u^n) \quad\text{and}\quad u^{n+1} = f - v^{n+1} - P_{G_\alpha}(f - v^{n+1}).$$

5.4 Numerical Experiments
Figure 1 displays denoising results using the ROF (5) and OSV (6) models. The images look very similar, but since the OSV model penalizes the highest frequencies much more than the ROF model [33], the denoised image still shows the lowest frequencies of the noise. The convergence speed in the ROF model is roughly the same as with the Bresson-Chan algorithm (see [18], [31]). Figure 2 displays a cartoon-texture decomposition experiment using different kinds of texture. The algorithms used were A2BC and TVL1. Both results look good. In Figure 3, a denoising experiment was performed using salt-and-pepper noise. The denoised picture looks quite good and, surprisingly, better than the original image. This is because the picture we used had some compression artifacts that the algorithm removed.
Acknowledgements This work has been supported by the French "Agence Nationale de la Recherche" (ANR), under grant FREEDOM (ANR07-JCJC-0048-01), "Films, REstauration Et DOnnées Manquantes", and by the National Science Foundation under Grants DMS-0312222 and DMS-0714945. Part of this work was done while the first author was visiting the Department of Mathematics, UCLA.
References 1. Rudin, L., Osher, S., Fatemi, E.: Non linear total variation based noise removal algorithms. Physica D 60, 259–268 (1992) 2. Chan, T., Shen, J.: Image processing and analysis - Variational, PDE, wavelet, and stochastic methods. SIAM Publisher, Philadelphia (2005) 3. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, vol. 147. Springer, Heidelberg (2001) 4. Meyer, Y.: Oscillating patterns in image processing and nonlinear evolution equations. In: The fifteenth Dean Jacqueline B. Lewis memorial lectures. University Lecture Series, vol. 22. American Mathematical Society, Providence, RI (2001) 5. Mumford, D., Gidas, B.: Stochastic models for generic images. Quarterly of Applied Mathematics LIV(1) (2001) 6. Vese, L., Osher, S.J.: Modeling textures with total variation minimization and oscillating patterns in image processing. Journal of Scientific Computing 19(1-3), 553–572 (2003) 7. Osher, S., Solé, A., Vese, L.: Image decomposition and restoration using total variation minimization and the H −1 norm. SIAM Journal on Multiscale Modeling and Simulation 1(3), 349–370 (2003) 8. Aujol, J.F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition into a bounded variation component and an oscillating component. Journal of Mathematical Imaging and Vision 22(1), 71–88 (2005) 9. Aubert, G., Aujol, J.: Modeling very oscillating signals. Application to image processing. Applied Mathematics and Optimization 51(2), 163–182 (2005) 10. Aujol, J.F., Chambolle, A.: Dual norms and image decomposition models. International Journal on Computer Vision 63(1), 85–104 (2005) 11. Yin, W., Goldfarb, D., Osher, S.: A comparison of three total variation based texture extraction models. Journal of Visual Communication and Image Representation 18(3), 240–252 (2007) 12. 
Garnett, J., Jones, P., Le, T., Vese, L.: Modeling oscillatory components with the homogeneous spaces BM O−α and W −α,p . Pure and Applied Mathematics Quarterly (to appear) 13. Le, T., Vese, L.: Image decomposition using total variation and div (BMO). Multiscale Modeling and Simulation, SIAM Interdisciplinary Journal 4(2), 390–423 (2005) 14. Lieu, L., Vese, L.: Image restoration and decomposition via bounded total variation and negative hilbert-sobolev spaces. Applied Mathematics & Optimization 58, 167– 193 (2008) 15. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Transactions on Image Processing 5(11), 1582–1586 (1996) 16. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Transactions on Image Processing 7(3), 310–318 (1998) 17. Aujol, J.F., Kang, S.H.: Color image decomposition and restoration. Journal of Visual Communication and Image Representation 17(4), 916–928 (2006) 18. Bresson, X., Chan, T.: Fast minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging (IPI) (accepted) (2007)
19. Blomgren, P., Chan, T.: Color TV: Total variation methods for restoration of vector valued images. IEEE Transactions on Image Processing 7(3), 304–309 (1998) 20. Vese, L., Osher, S.: Color texture modeling and color image decomposition in a variational-PDE approach. In: Proceedings of the Eighth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), pp. 103–110. IEEE, Los Alamitos (2006) 21. Aujol, J., Gilboa, G.: Constrained and SNR-based solutions for TV-Hilbert space image denoising. Journal of Mathematical Imaging and Vision 26(1-2), 217–237 (2006) 22. Vogel, C.: Computational Methods for Inverse Problems. Frontiers in Applied Mathematics, vol. 23. SIAM, Philadelphia (2002) 23. Chan, T., Golub, G., Mulet, P.: A nonlinear primal-dual method for total variationbased image restoration. SIAM Journal on Scientific Computing 20(6), 1964–1977 (1999) 24. Chambolle, A.: An algorithm for total variation minimization and its applications. JMIV 20, 89–97 (2004) 25. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005) 26. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. Journal of Mathematical Imaging and Vision 26(3), 277–291 (2006) 27. Weiss, P., Aubert, G., Blanc-Féraud, L.: Efficient schemes for total variation minimization under constraints in image processing. SIAM Journal on Scientific Computing (to appear) (2007) 28. Aujol, J.: Some algorithms for total variation based image restoration. CMLA Preprint 2008-05 (2008), http://hal.archives-ouvertes.fr/hal-00260494/en/ 29. Zhu, M., Wright, S., Chan, T.: Duality-based algorithms for total variation image restoration, UCLA CAM Report 08-33 (May 2008) 30. Bermudez, A., Moreno, C.: Duality methods for solving variational inequalities. Comp. and Maths. 
with Appls. 7(1), 43–58 (1981) 31. Duval, V., Aujol, J.F., Vese, L.: A projected gradient algorithm for color image decomposition. Technical report, UCLA, CAM Report 08-40 (2008) 32. Ciarlet, P.G.: Introduction á l’Analyse Numérique Matricielle et á l’Optimisation. Dunod (1998) 33. Aujol, J., Gilboa, G., Chan, T., Osher, S.: Structure-texture image decomposition modeling, algorithms, and parameter selection. International Journal of Computer Vision 67(1), 111–136 (2006)
A Dual Formulation of the TV-Stokes Algorithm for Image Denoising

Christoffer A. Elo (1), Alexander Malyshev (1), and Talal Rahman (2)

(1) Department of Mathematics, University of Bergen, Johannes Bruns gate 12, 5007 Bergen, Norway
[email protected], [email protected]
(2) Bergen University College, Faculty of Engineering, Nygårdsgaten 112, 5020 Bergen
[email protected]

Abstract. We propose a fast algorithm for image denoising, which is based on a dual formulation of a recent denoising model involving the total variation minimization of the tangential vector field under the incompressibility condition stating that the tangential vector field should be divergence free. The model turns noisy images into smooth and visually pleasant ones and preserves the edges quite well. While the original TV-Stokes algorithm, based on the primal formulation, is extremely slow, our new dual algorithm drastically improves the computational speed and possesses the same quality of denoising. Numerical experiments are provided to demonstrate practical efficiency of our algorithm.
1 Introduction

We suppose that the observed image $d_0(x, y)$, $(x, y) \in \Omega \subset \mathbb{R}^2$, is an original image $d(x, y)$ perturbed by an additive noise η,

$$d_0 = d + \eta. \tag{1}$$

The problem of recovering the image d from the noisy image $d_0$ is an inverse problem that is often solved by variational methods using the total variation (TV) minimization. The corresponding Euler equation, which is a set of nonlinear partial differential equations, is typically solved by applying a gradient-descent method to a finite difference approximation of these equations. A classical total variation denoising model is the primal formulation due to Rudin, Osher and Fatemi [1] (the ROF model):

$$\min_d\ \|\nabla d\|_{L^1} + \frac{\lambda}{2}\|d - d_0\|^2_{L^2}. \tag{2}$$

The parameter λ > 0 can be chosen, e.g., to approximately fulfill the condition $\|d - d_0\|_{L^2} \le \sigma$, where σ is an estimate of $\|\eta\|_{L^2}$. The Euler equation $-\operatorname{div}(\nabla d/|\nabla d|) + \lambda(d - d_0) = 0$ is usually replaced by a regularized one,

$$-\operatorname{div}\left(\frac{\nabla d}{|\nabla d|_\beta}\right) + \lambda(d - d_0) = 0, \tag{3}$$
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 307–318, 2009. c Springer-Verlag Berlin Heidelberg 2009
where $|\nabla d|_\beta = \sqrt{|\nabla d|^2 + \beta^2}$ is a necessary regularization, since images contain flat areas where $|\nabla d| = \sqrt{d_x^2 + d_y^2} \approx 0$. When solving (3) numerically, an explicit time marching scheme with an artificial time variable, t, is typically used. However, such an algorithm is rather slow due to severe restrictions requiring small time steps for convergence. It is well known that the ROF model suffers from the so-called staircase effect, which is a disadvantage when denoising images with affine regions. This defect motivates a two-step approach, where the fourth-order model studied in [2, 3, 4] is decoupled into two second-order problems. Such methods are known to overcome the staircase effect, but tend to have computational difficulties due to very large condition numbers. The authors of [5, 6] used the same two-step approach as in [7], but, adopting ideas from [8, 9], they proposed to preserve the divergence-free condition on the tangential vector field. Recall that the tangential vector field τ is orthogonal to the normal (gradient) vector field n of the image d:

$$n = \nabla d = (d_x, d_y)^T, \qquad \tau = \nabla^\perp d = (-d_y, d_x)^T. \tag{4}$$
Hence div τ = 0. The first step of the TV-Stokes algorithm smooths the tangential vector field $\tau_0 = \nabla^\perp d_0$ of a given noisy image $d_0$ by solving the minimization problem

$$\min_\tau\ \|\nabla\tau\|_{L^1} + \frac{1}{2\delta}\|\tau - \tau_0\|^2_{L^2} \quad\text{subject to}\quad \operatorname{div}\tau = 0, \tag{5}$$
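The constraint in (5) is compatible with the definition (4): the field (−d_y, d_x) is divergence free because mixed derivatives commute. The sketch below checks the discrete counterpart on a small array. It assumes plain forward differences in both directions, not the staggered grid used later in the paper, and the function names are hypothetical.

```python
def perp_gradient(d):
    """Return (tau1, tau2) = (-d_y, d_x) computed with forward differences."""
    rows, cols = len(d), len(d[0])
    d_x = [[d[i][j + 1] - d[i][j] for j in range(cols - 1)] for i in range(rows)]
    d_y = [[d[i + 1][j] - d[i][j] for j in range(cols)] for i in range(rows - 1)]
    tau1 = [[-v for v in row] for row in d_y]
    return tau1, d_x


def divergence(tau1, tau2):
    """Forward-difference divergence d/dx tau1 + d/dy tau2 on the common cells."""
    rows, cols = len(tau1), len(tau2[0])
    return [[(tau1[i][j + 1] - tau1[i][j]) + (tau2[i + 1][j] - tau2[i][j])
             for j in range(cols)] for i in range(rows)]
```

With these difference operators the divergence of the discrete tangential field vanishes up to rounding for any input array, which is the property that the first step of the algorithm is designed to preserve after smoothing.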
where δ > 0 is some carefully chosen parameter. Once a smoothed tangential vector field τ is obtained, the second step reconstructs the image d by fitting it to the normal vector field by solving the minimization problem

$$\min_d\ \|\nabla d\|_{L^1} - \left(\nabla d, \frac{n}{|n|}\right)_{L^2} \quad\text{subject to}\quad \|d - d_0\|_{L^2} = \sigma, \tag{6}$$
where σ is an estimate of $\|\eta\|_{L^2}$. In [5] the minimization problems (5) and (6) are numerically solved by means of a time marching explicit scheme, while existence and uniqueness are proven for the Modified TV-Stokes in [6]. The TV-Stokes approach resulted in an algorithm which does not suffer from the staircase effect, preserves the edges, and produces visually pleasant denoised images. However, the TV-Stokes algorithm from [5] converges extremely slowly and is therefore practically unusable, as demonstrated in the last section of the present paper. We adopt the TV-Stokes denoising model but reduce the primal formulation presented above to the so-called dual formulation, which is then numerically solved by a variant of Chambolle's fast iteration [10]. The reduction exploits the orthogonal projector $\Pi_K$ onto the subspace $K = \{\tau : \operatorname{div}\tau = 0\}$ for elimination of the divergence-free constraint.

2 The TV-Stokes Denoising Algorithm in Dual Formulation
To overcome difficulties with non-differentiability in the primal formulation, Carter [11], Chambolle [10] and Chan, Golub and Mulet [12] have proposed dual formulations of the ROF model, where a dual variable $p = (p_1(x, y), p_2(x, y))$ is used to express the total variation:

$$\|\nabla d\|_{L^1} = \max_p \left\{ (d, \operatorname{div} p)_{L^2} : |p_j(x, y)| \le 1\ \ \forall (x, y) \in \Omega,\ j = 1, 2 \right\}. \tag{7}$$

For instance, a variant of the dual formulation from [10] consists in minimization of the distance $\|\operatorname{div} p - \lambda d_0\|_{L^2}$. In [10] Chambolle also proposed a fast iteration for solving this minimization problem that produces a denoised image after a few steps only. Below we show how to reduce the TV-Stokes model to a dual formulation.

2.1 Step 1
To derive a dual formulation of the first step we take advantage of the following analog of (7) for the total variation of the tangential vector field $\tau = (\tau_1, \tau_2)^T$:

$$\|\nabla\tau\|_{L^1} = \max_p \left\{ (\tau, \operatorname{div} p)_{L^2} : |p_i(x, y)| \le 1\ \ \forall (x, y) \in \Omega,\ i = 1, 2 \right\}, \tag{8}$$

where the dual variable p is a pair of two rows, $p_1 = (p_{11}, p_{12})$ and $p_2 = (p_{21}, p_{22})$. The divergence is defined as follows: $\operatorname{div} p = (\operatorname{div} p_1, \operatorname{div} p_2)^T$, where

$$\operatorname{div} p_i = \frac{\partial p_{i1}}{\partial x} + \frac{\partial p_{i2}}{\partial y}, \quad i = 1, 2. \tag{9}$$

This definition is similar to the vectorial dual norm from [13] for vectorial images, e.g. color images. Plugging (8) into (5) yields

$$\min_{\operatorname{div}\tau = 0}\ \max_{|p_i| \le 1}\ (\tau, \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}. \tag{10}$$

Results from convex analysis, see for instance Theorem 9.3-1 in [14], allow us to exchange the order of max and min in (10) and obtain an equivalent optimization problem

$$\max_{|p_i| \le 1}\ \min_{\operatorname{div}\tau = 0}\ (\tau, \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}. \tag{11}$$

Now comes a trick. Let us introduce the orthogonal projection $\Pi_K$ onto the constrained subspace $K = \{\tau : \operatorname{div}\tau = 0\}$. Note that $\tau_0 \in K$. By means of the pseudoinverse $\Delta^+$ we may write that

$$\Pi_K \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} = \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} - \nabla\Delta^+ \operatorname{div} \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix}. \tag{12}$$
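The constraint div τ = 0 means that $\Pi_K\tau = \tau$, and the latter implies the equalities $(\tau, \operatorname{div} p) = (\Pi_K\tau, \operatorname{div} p) = (\tau, \Pi_K\operatorname{div} p)$. Hence (11) is equivalent to

$$\max_{|p_i| \le 1}\ \min_{\operatorname{div}\tau = 0}\ (\tau, \Pi_K\operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}. \tag{13}$$

The solution to the minimization problem (without the constraint div τ = 0!)

$$\min_\tau\ (\tau, \Pi_K\operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}$$

is

$$\tau = \tau_0 - \delta\,\Pi_K\operatorname{div} p \tag{14}$$

and satisfies the constraint div τ = 0. Owing to (14) we have the equality

$$(\tau, \Pi_K\operatorname{div} p) + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0) = \frac{1}{2\delta}\left[(\tau_0, \tau_0) - (\delta\,\Pi_K\operatorname{div} p - \tau_0,\ \delta\,\Pi_K\operatorname{div} p - \tau_0)\right],$$

which together with (13) gives our dual formulation:

$$\min_p \left\{ \|\Pi_K\operatorname{div} p - \delta^{-1}\tau_0\|_{L^2} : |p_i| \le 1,\ i = 1, 2 \right\}. \tag{15}$$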
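The step from the inner minimization to (15) can be spelled out as a short computation. The abbreviation w = Π_K div p and the name J for the inner objective are introduced here only for this sketch; they are not the paper's notation.

```latex
\begin{align*}
J(\tau) &= (\tau, w) + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0),
  \qquad w = \Pi_K \operatorname{div} p, \\
0 = \nabla_\tau J &= w + \frac{1}{\delta}(\tau - \tau_0)
  \;\Longrightarrow\; \tau = \tau_0 - \delta w, \\
J(\tau_0 - \delta w) &= (\tau_0, w) - \delta (w, w) + \frac{\delta}{2}(w, w)
  = (\tau_0, w) - \frac{\delta}{2}(w, w) \\
  &= \frac{1}{2\delta}(\tau_0, \tau_0)
   - \frac{\delta}{2}\,\bigl\| w - \delta^{-1}\tau_0 \bigr\|_{L^2}^2 .
\end{align*}
```

Since the first term does not depend on p, maximizing the last expression over |p_i| ≤ 1 amounts to minimizing the distance between Π_K div p and δ^{-1} τ_0. The minimizer τ_0 − δw is divergence free because both τ_0 and w lie in K.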
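The numerical solution of (15) is computed by Chambolle's iteration from [10]:

$$p^0 = 0, \qquad p^{n+1} = \frac{p^n + \Delta t\,\nabla\left(\Pi_K\operatorname{div} p^n - \delta^{-1}\tau_0\right)}{1 + \Delta t\,\left|\nabla\left(\Pi_K\operatorname{div} p^n - \delta^{-1}\tau_0\right)\right|}. \tag{16}$$

The iteration converges rapidly when $\Delta t \le \frac{1}{4}$. The smoothed tangential field after n iterations is given by $\tau_n = \tau_0 - \delta\,\Pi_K\operatorname{div} p^n$.

2.2 Step 2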
The image d is reconstructed at the second step by fitting it to the normal vector field built from the tangential vector field computed at step 1, $(n_1, n_2) = (\tau_2, -\tau_1)$. Again we introduce a dual variable $r = (r_1(x, y), r_2(x, y))$ and use the formula $\|\nabla d\|_{L^1} = \max_{|r| \le 1} (\nabla d, -r)_{L^2}$. Then the minimization problem (6) is equivalent to the problem

$$\min_d\ \max_{|r| \le 1}\ \left(d, \operatorname{div}\left(r + \frac{n}{|n|}\right)\right)_{L^2} + \frac{1}{2\mu}\|d - d_0\|^2_{L^2}, \tag{17}$$

where μ > 0 is a Lagrangian multiplier. After interchanging min and max in (17) we find conditions for attaining the minimum:

$$d = d_0 - \mu\operatorname{div}\left(r + \frac{n}{|n|}\right). \tag{18}$$

By analogy with (15) we can derive the dual formulation for step 2:

$$\min_r \left\{ \left\|\operatorname{div}\left(r + \frac{n}{|n|}\right) - \frac{d_0}{\mu}\right\|_{L^2} : |r| \le 1 \right\}. \tag{19}$$

Chambolle's iteration for (19) is as follows:

$$r^{n+1} = \frac{r^n + \Delta t\,\nabla\left(\operatorname{div}\left(r^n + \frac{n}{|n|}\right) - \mu^{-1} d_0\right)}{1 + \Delta t\,\left|\nabla\left(\operatorname{div}\left(r^n + \frac{n}{|n|}\right) - \mu^{-1} d_0\right)\right|}. \tag{20}$$

2.3 The Discrete Algorithm
The staggered grid is used for discretization as in [5]. For convenience we introduce the differentiation matrices

$$B = \frac{1}{h}\begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix}, \qquad -B^T = \frac{1}{h}\begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \\ & & & -1 \end{pmatrix}, \tag{21}$$

where B is the forward difference operator and $-B^T$ is the backward difference operator. The discrete gradient operator applied to a matrix d is then defined as

$$\nabla^h d = \left(d B_x^T,\ B_y d\right), \tag{22}$$

where $B_x$ ($B_y$) stands for differentiation in the x (resp. y) direction. The discrete divergence operator is given by

$$\operatorname{div}^h(p_1, p_2) = -p_1 B_x - B_y^T p_2. \tag{23}$$

The discrete analog of the projection operator $\Pi_K$ has the form

$$\Pi_K^h = I - \nabla^h (\Delta^h)^+ \operatorname{div}^h, \tag{24}$$

where the gradient and divergence are applied in a slightly different manner:

$$\operatorname{div}^h\begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} = -\tau_1 B_x^T - B_y \tau_2, \qquad \nabla^h d = \begin{pmatrix} d B_x \\ B_y^T d \end{pmatrix}. \tag{25}$$

To complete the definition (24) we need a description of the pseudoinverse operator $(\Delta^h)^+$ for the discrete Laplacian

$$\Delta^h d = -d B_x^T B_x - B_y^T B_y d. \tag{26}$$
Let us introduce the orthogonal N × N matrix of the Discrete Cosine Transform, C, which is defined by dct(eye(N)) in MATLAB. The symmetric matrix of the Discrete Sine Transform, S, defined in MATLAB by dst(eye(N-1)), satisfies the equation $S^T S = (N/2)\,I$, where I is the identity matrix of order N − 1. We prefer to use the orthogonal symmetric matrix $\tilde S = -S/\sqrt{N/2}$. The singular value decomposition of B has the form

$$B = \tilde S\,[0, \Sigma]\,C, \qquad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_{N-1}), \tag{27}$$

where the diagonal matrix Σ has the diagonal entries

$$\sigma_k = \frac{2}{h}\sin\frac{\pi k}{2N}, \qquad k = 1, 2, \ldots, N - 1. \tag{28}$$
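The decomposition (27) with the singular values (28) can be checked numerically. The sketch below assumes that C is the orthonormal DCT-II matrix (the convention of MATLAB's dct) and that the normalized sine matrix has entries −sqrt(2/N) sin(π m k / N); both the function names and the use of plain Python lists are choices made for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    c = []
    for k in range(n):
        w = math.sqrt((1.0 if k == 0 else 2.0) / n)
        c.append([w * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return c


def svd_residual(n, h):
    """Largest entry of B C^T - S~ [0, Sigma] for the forward difference matrix B."""
    c = dct_matrix(n)
    worst = 0.0
    for k in range(n):
        sigma = 2.0 / h * math.sin(math.pi * k / (2 * n))  # formula (28); zero for k = 0
        for m in range(n - 1):
            # Row m of B applied to column k of C^T.
            b_ck = (c[k][m + 1] - c[k][m]) / h
            # Entry (m, k) of the normalized sine matrix (assumed sign convention).
            s_tilde = -math.sqrt(2.0 / n) * math.sin(math.pi * (m + 1) * k / n)
            worst = max(worst, abs(b_ck - sigma * s_tilde))
    return worst
```

For example, `svd_residual(8, 0.5)` should be at the level of rounding error, which confirms that each cosine vector is mapped by B to a scaled sine vector.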
With the aid of (27), equation (26) can be rewritten as

$$f = \Delta^h d = -d\,C^T \begin{pmatrix} 0 & \\ & \Sigma_x^2 \end{pmatrix} C - C^T \begin{pmatrix} 0 & \\ & \Sigma_y^2 \end{pmatrix} C\,d. \tag{29}$$

Denoting $\hat f = C f C^T$ and $\hat d = C d C^T$ we arrive at the equation

$$\hat f = -\hat d \begin{pmatrix} 0 & \\ & \Sigma_x^2 \end{pmatrix} - \begin{pmatrix} 0 & \\ & \Sigma_y^2 \end{pmatrix} \hat d. \tag{30}$$

This equation is easily solved with respect to $\hat d$. Suppose that the matrices $\hat f$ and $\hat d$ have the entries $\hat f_{ij}$ and $\hat d_{ij}$ for i, j = 0, 1, .... Note that in our case $\hat f_{00} = 0$. Then the solution $\hat d = G(\hat f)$ is as follows:

$$\begin{aligned} \hat d_{00} &= 0, \\ \hat d_{i,0} &= -\hat f_{i,0}/\sigma_{i,y}^2, & i &= 1, 2, \ldots, \\ \hat d_{0,j} &= -\hat f_{0,j}/\sigma_{j,x}^2, & j &= 1, 2, \ldots, \\ \hat d_{ij} &= -\hat f_{ij}/(\sigma_{i,y}^2 + \sigma_{j,x}^2), & i, j &= 1, 2, \ldots. \end{aligned} \tag{31}$$

Thus the pseudoinverse operator $(\Delta^h)^+$ can be efficiently computed with the help of the Discrete Cosine Transform:

$$(\Delta^h)^+ f = C^T G(C f C^T)\,C, \tag{32}$$

where the function G is defined in (31). In conclusion we recall that multiplication of an N × N matrix by C or $C^T = C^{-1}$ is typically implemented with the aid of the fast Fourier transform and requires only $O(N^2 \log_2 N)$ arithmetical operations. All other computations have the cost $O(N^2)$.
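The idea behind (32) can be sketched in a one-dimensional analogue, where the Laplacian is −BᵀB acting on a vector and the pseudoinverse divides each cosine coefficient by −σ_k². The sketch uses dense matrix products for readability (a fast transform would replace them in practice), assumes the orthonormal DCT-II convention for C, and uses hypothetical function names.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    c = []
    for k in range(n):
        w = math.sqrt((1.0 if k == 0 else 2.0) / n)
        c.append([w * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return c


def laplacian_1d(d, h):
    """Apply -B^T B, with B the (N-1) x N forward difference matrix of (21)."""
    n = len(d)
    g = [(d[m + 1] - d[m]) / h for m in range(n - 1)]  # g = B d
    return [((g[i] if i < n - 1 else 0.0) - (g[i - 1] if i > 0 else 0.0)) / h
            for i in range(n)]                          # -B^T g


def pinv_laplacian_1d(f, h):
    """1D analogue of (32): transform, divide by -sigma_k^2, transform back."""
    n = len(f)
    c = dct_matrix(n)
    f_hat = [sum(c[k][j] * f[j] for j in range(n)) for k in range(n)]  # C f
    d_hat = [0.0] * n  # the constant mode (k = 0) is set to zero
    for k in range(1, n):
        sigma = 2.0 / h * math.sin(math.pi * k / (2 * n))
        d_hat[k] = -f_hat[k] / sigma ** 2
    return [sum(c[k][j] * d_hat[k] for k in range(n)) for j in range(n)]  # C^T d_hat
```

For a right-hand side with zero mean, applying `laplacian_1d` to the output of `pinv_laplacian_1d` recovers the input, and the returned solution itself has zero mean, which is the defining property of the pseudoinverse on this singular operator.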
Fig. 1. Original images: (a) Lena, 200 × 200; (b) Cameraman, 256 × 256; (c) Barbara, 512 × 512
Algorithm 1. Dual TV-Stokes algorithm for image denoising

Given $d_0$, k, δ and μ.

Step one.
  Let $p^0 = 0$ and $q^0 = 0$.
  Calculate $\tau_0 = (v_0, u_0)$: $v_0 = -B d_0$ and $u_0 = d_0 B^T$.
  Initialize counter: n = 0.
  while not converged do
    Calculate projections:
      $$(\pi_p, \pi_q) = \Pi_K^h(\operatorname{div}^h p^n, \operatorname{div}^h q^n) \tag{33}$$
      $$p^{n+1} = \frac{p^n + k\,\nabla^h(\pi_p - \delta^{-1} v_0)}{1 + k\,|\nabla^h(\pi_p - \delta^{-1} v_0)|} \tag{34}$$
      $$q^{n+1} = \frac{q^n + k\,\nabla^h(\pi_q - \delta^{-1} u_0)}{1 + k\,|\nabla^h(\pi_q - \delta^{-1} u_0)|} \tag{35}$$
    Update counter: n = n + 1.
  end
  Calculate τ = (v, u):
      $$\tau = \tau_0 - \Pi_K^h(\delta\operatorname{div}^h p^{n+1}, \delta\operatorname{div}^h q^{n+1}) \tag{36}$$

Step two.
  Let $r^0 = 0$ and calculate the normal field: $n = (n_1, n_2)$, $n_1 = u(v^2 + u^2)^{-1/2}$ and $n_2 = -v(v^2 + u^2)^{-1/2}$.
  Initialize counter: n = 0.
  while not converged do
      $$r^{n+1} = \frac{r^n + k\,\nabla^h\left(\operatorname{div}^h(r^n + n) - \mu^{-1} d_0\right)}{1 + k\,\left|\nabla^h\left(\operatorname{div}^h(r^n + n) - \mu^{-1} d_0\right)\right|} \tag{37}$$
    Update counter: n = n + 1.
  end
  Recover image d:
      $$d = d_0 - \mu\operatorname{div}^h(r^{n+1} + n) \tag{38}$$

2.4 Numerical Experiments
In what follows we present several examples to show how the TV-Stokes method works for different images. All the images we have tested are normalized into gray-scale values, ranging from 0 (black) to 1 (white). In the experiments we start with a clean image, shown in Figure 1, and then add random noise with zero mean. This is done by the imnoise MATLAB command, where the variance parameter is set to 0.001 for the Barbara image and 0.005 for the Lena image. The Cameraman image is taken directly from the paper [5], so we compare the results with the same noisy image as input. In [5] this model is further compared to the two-step method LOT and the famous ROF model. The signal-to-noise ratio is measured in decibels before denoising:

$$SNR = 20\log_{10}\left(\frac{\int_\Omega (d - \bar d)^2\,dx}{\int_\Omega (\eta - \bar\eta)^2\,dx}\right), \tag{39}$$

where $\bar d = \frac{1}{|\Omega|}\int_\Omega d\,dx$ and $\bar\eta = \frac{1}{|\Omega|}\int_\Omega \eta\,dx$.

The numerical procedures used in [5] were based on explicit finite difference schemes. This process is very slow, as the constraint converges slowly. However, in the proposed dual method the constraint is satisfied at each step by the orthogonal projection. The energy and number of iterations required for convergence in step one are shown in Figure 2. The figure clearly illustrates that the dual TV-Stokes algorithm requires fewer iterations before the energy is stable than the primal TV-Stokes algorithm. Although each iteration of the dual TV-Stokes algorithm requires more computational effort, it is much faster than using sparse linear solvers. Inverting the Laplacian for the orthogonal projection in each iteration is a bottleneck for very large images. In all these examples the projection was applied with the aid of the Fast Fourier Transform, which needs $O(N^2 \log N)$ operations in each iteration. For very large images, one should consider using a multigrid solver for applying the projection. This will reduce the operation cost to $O(N^2)$.

Fig. 2. Energy vs. iterations plot for the first step: (a) dual TV-Stokes algorithm (Algorithm 1); (b) TV-Stokes [5]

All methods were coded in MATLAB, and in Table 1 the CPU time is given in seconds for each test image. The figure shows the dual TV-Stokes algorithm vs. the primal TV-Stokes algorithm from [5]. We measure the L2-norm of the energy in (15) and (19) for the stopping criteria, and stop the iteration when the difference of the energy is below $10^{-3}$. For the TV-Stokes algorithm we used the same stopping criteria as in [5], where the tolerance of the L2-norm of the constraint is equal to $5 \times 10^{-3}$ and the tolerance for the difference in the energy is equal to $10^{-3}$. The time steps were set to $10^{-3}$ and $5 \times 10^{-3}$ respectively for the first and second step of the TV-Stokes algorithm.

Table 1. Runtimes (in seconds) of the dual TV-Stokes algorithm compared to the TV-Stokes algorithm [5]. The test system is a 2 Opteron 270 dual-core 64-bit processor and 8GB RAM. Both steps in the dual TV-Stokes algorithm are computed with 150 iterations, while the first step in the primal TV-Stokes algorithm is calculated with 75000 iterations and the second step with 25000 iterations.

             Dual TV-Stokes algorithm      TV-Stokes algorithm [5]
Image        First step    Second step     First step    Second step
Lena              9.8          1.12           9083.2        1992.5
Cameraman        17.4          2.2           11189.0        2259.4
Barbara         128.2         20.7           80602.5       14926.3

Our first test is the well-known Lena image, which we recover from heavy added noise. We have cropped the image to show the face, which consists of smooth areas and edges that are important to preserve. The denoised image in Figure 3 shows that the dual TV-Stokes method has recovered the smooth areas without inducing any staircase effect. The smoothing parameter δ is equal to 0.0835 and μ is equal to 0.17. Since this is a highly noisy image, the ROF model fails to give a visually pleasant image, because the smooth surfaces are piecewise continuous. The TV-Stokes algorithm, however, has nearly the same quality as the dual TV-Stokes algorithm. For the TV-Stokes algorithm, δ was equal to 0.045.

The next test is the Cameraman image, which consists of a smooth skyline and some low-intensity buildings in the background. The buildings are difficult to recover, as they get smeared out by the denoising. The results are shown in Figure 4 with δ equal to 0.055 and μ equal to 0.08. The TV-Stokes result is taken from [5], where the SNR is the same as the one we report, $20\log_{10}(8.21) \approx 18.28$. Figure 4(d) shows the TV-Stokes reconstruction for the same noisy image, where the parameter δ is equal to 0.06.

The last example is the Barbara image, which is quite detailed, with high and low intensity textures. The high intensity textures and the smooth areas are preserved quite well, but the low intensity textures disappear in the same way as for the Cameraman. This image is 512 × 512 in size, which makes the algorithm slower because of the rather large number of matrix operations per iteration. However, finding a result for the optimal parameters is still feasible, since the method produces a denoised image after a few steps.
Thus, one can run the method multiple times to find the optimal parameters. For this image we used δ equal to 0.05 and μ equal to 0.15. We do not report an optimal result for this particular case of the TV-Stokes algorithm, due to page limitation and the amount of running time. Clearly, using the dual formulation is more effective than solving the model with the explicit gradient descent method. The CPU time is measured for a single run only, since computing an average over many runs is very time consuming for the TV-Stokes method. Although the times shown are for a single run, they clearly indicate that our method is much faster and more stable. The comparison with the primal method also shows that the proposed dual method has the same denoising quality.

Fig. 3. Lena image (200 × 200), denoised using the dual TV-Stokes, TV-Stokes and the ROF algorithm: (a) noisy image, SNR ≈ 14.0; (b) denoised using the dual TV-Stokes algorithm; (c) contour plot, dual TV-Stokes image; (d) difference image, dual TV-Stokes; (e) denoised using ROF [1]; (f) difference image, ROF; (g) denoised using the TV-Stokes algorithm [5]; (h) difference image, TV-Stokes

Fig. 4. Cameraman (256 × 256), denoised using the dual and the primal formulation of the TV-Stokes algorithm: (a) noisy image, SNR ≈ 18.28; (b) denoised using the dual TV-Stokes algorithm; (c) difference image, dual TV-Stokes; (d) denoised using the TV-Stokes algorithm [5]

Fig. 5. Barbara (512 × 512), denoised using the dual formulation of the TV-Stokes algorithm: (a) noisy image, SNR ≈ 20.0; (b) denoised image
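The signal-to-noise ratio reported in the experiments can be computed with a few lines of code. The sketch below takes (39) to be 20 log10 of the ratio of the mean-centred energies of the clean image d and the noise η; this reading is an assumption where the printed formula is ambiguous, and the function name is hypothetical. Images are passed as flat lists of pixel values.

```python
import math


def snr_db(clean, noise):
    """SNR following (39): 20*log10 of the ratio of mean-centred energies."""
    def centred_energy(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)
    return 20.0 * math.log10(centred_energy(clean) / centred_energy(noise))
```

With this reading, an energy ratio of 8.21 gives the value 20 log10(8.21) ≈ 18.28 quoted for the Cameraman image.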
References 1. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1-4), 259–268 (1992) 2. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000) 3. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76, 167–188 (1997) 4. Lysaker, O., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Imag. Proc. 12, 1579–1590 (2003) 5. Rahman, T., Tai, X.C., Osher, S.: A tv-stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007) 6. Litvinov, W., Rahman, T., Tai, X.C.: A modified tv-stokes model for image processing (submitted) (2008) 7. Lysaker, O.M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Transaction on Image Processing 13(10), 1345–1357 (2004) 8. Bertalmio, M., Bertozzi, A., Sapiro, G.: Navier-stokes, fluid dynamics, and image and video inpainting. In: Proc. IEEE Computer Vision and Pattern Recognition (CVPR) (2001) 9. Tai, X., Osher, S., Holm, R.: Image inpainting using tv-stokes equation. Image Processing based on partial differential equations (2006) 10. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1-2), 89–97 (2004) 11. Carter, J.: Dual methods for total variation-based image restoration. PhD thesis, UCLA (2001) 12. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999) 13. Bresson, X., Cham, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. CAM Report 07-25 (2007) 14. 
Ciarlet, P.G., Jean-Marie, T., Bernadette, M.: Introduction to numerical linear algebra and optimisation. Cambridge University Press, Cambridge (1989)
Anisotropic Regularization for Inverse Problems with Application to the Wiener Filter with Gaussian and Impulse Noise

Micha Feigin and Nir Sochen

School of Mathematics, Tel Aviv University
[email protected], [email protected]

Abstract. Most inverse problems require a regularization term on the data. The classic approach for the variational formulation is to use the L2 norm on the data gradient as a penalty term. This however acts as a low pass filter and thus is not good at preserving edges in the reconstructed data. In this paper we propose a novel approach whereby an anisotropic regularization is used to preserve object edges. This is achieved by calculating the data gradient over a Riemannian manifold instead of the standard Euclidean space using the Laplace-Beltrami approach. We also employ a modified fidelity term to handle impulse noise. This approach is applicable to both scalar and vector valued images. The result is demonstrated via the Wiener filter with several approaches for minimizing the functional, including a novel GSVD based spectral approach applicable to functionals containing gradient based features.
1 Introduction

Handling degraded images, both due to blur and noise, is a practical reality in any imaging field. The common image degradation model is

$$I = I_0 * h + n \tag{1}$$
where I, the observed image, is the result of convolving the input image (or ideal image) $I_0$ with some blurring kernel h. The result is then summed with additive noise n. This is a common model for any system that contains a lens and sensor. Both the blur and noise are a combination of several processes. Some typical causes for image blur are out-of-focus images, motion blur due to an unstable camera and/or object, and a low pass filter resulting from the finite aperture and anti-aliasing filter on the sensor. Noise can result from the sensor and amplifier due to low light, heat, dead pixels and background radiation, or from memory and communication corruption. Each of these processes has its own typical blur kernel and noise distribution statistics [1, 2].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 319–330, 2009. © Springer-Verlag Berlin Heidelberg 2009

A direct naive approach to handle the blur can be given using a spectral (Fourier) manipulation of the degradation model equation. To see the difficulty though, look at the Fourier transform of this equation, $\hat I = \hat I_0 \cdot \hat h + \hat n$ (where the hat notation denotes the Fourier transform). This transforms the convolution into a multiplication, which allows for an easy rearrangement of the equation. Extracting $\hat I_0$ gives us $\hat I_0 = (\hat I - \hat n)/\hat h$. Any L2 kernel h will decay to zero at infinity. This results in a divide-by-zero issue, at least for high frequencies. Add to that the issue that the SNR usually drops at these frequencies, which makes this procedure very sensitive to noise. One solution in this case is the Wiener filter [3], which can be derived from the standard variational formulation for ill-posed inverse problems by adding prior knowledge (or assumptions) via an additional penalty term to the reconstruction. That is, to minimize an energy functional of the form

$$S(I_0) = \underbrace{\|I_0 * h - I\|}_{\text{fidelity term}} + \mu\,\underbrace{\Phi(I_0)}_{\text{penalty}}. \tag{2}$$
Here $\Phi$ is some function of the parameter $I_0$ that imposes the assumptions on the model. A common constraint term is $\Phi(I_0) = \|\nabla I_0\|$, which penalizes high frequencies, as these are often the source of instability. The side effect of this constraint is that while high frequency noise is reduced in the reconstruction, edge detail is lost as well, as demonstrated in Fig. 1.
Fig. 1. Edge preservation vs. noise suppression with the Wiener filter. Panels: (a) original image, (b) degraded input, (c) $\mu = 5 \cdot 10^{-4}$, (d) $\mu = 5 \cdot 10^{-5}$. The input image 1(a) is degraded using Gaussian white noise 1(b). The results show the difference between preferring noise suppression 1(c) and preferring edge preservation 1(d).
This functional is often minimized under the L2 norm, which is appropriate for Gaussian noise. This is mainly due to the fact that the resulting Euler-Lagrange equations are linear and are thus (relatively) easy to solve. That is, the classic Wiener filter functional based on the L2 norm,

$$S(I_0) = \| I_0 * h - I \|_{L_2}^2 + \mu \| \nabla I_0 \|_{L_2}^2 = \int \left( |I_0 * h - I|^2 + \mu |\nabla I_0|^2 \right) dA , \qquad (3)$$

results in the following Euler-Lagrange equation (see [4] for the derivation of the Euler-Lagrange formulation of the convolution)

$$-h(-\bar x) * \left( h(\bar x) * I_0 - I \right) + \mu \Delta I_0 = 0 . \qquad (4)$$
Here $\bar x = (x, y)$ is the coordinate vector for the two dimensional case. This can be solved as before by applying the Fourier transform, which results (up to an overall sign) in

$$\hat h(-\bar\omega) \cdot \left( \hat h(\bar\omega) \cdot \hat I_0 - \hat I \right) + \mu |\bar\omega|^2 \hat I_0 = 0 \qquad (5)$$

where $\bar\omega = (\omega_x, \omega_y)$ is the frequency vector for the resulting frequencies along the x and y axes respectively. Now, assuming that the convolution kernel is real, we can use the identity $\hat h(-\bar\omega) = \hat h^*(\bar\omega)$ (where $\hat h^*$ is the conjugate of $\hat h$) to rewrite the equation as

$$\hat I_0 = \frac{\hat h^*(\bar\omega)}{|\hat h(\bar\omega)|^2 + \mu |\bar\omega|^2} \, \hat I . \qquad (6)$$

Despite being easy to solve, there are two main issues with the L2 norm approach, both for the constraint and for the fidelity term. The first issue is that it fails to preserve object boundaries (Fig. 1). The main reason is the penalty term, which penalizes high frequencies. As the fidelity term is also L2, it does little to alleviate this problem. The second issue is that the fidelity term is designed to handle Gaussian noise and behaves poorly in the presence of impulse noise.

One solution to both these issues is to use the L1 or total variation (TV) norm [5,6,7]. When used for the fidelity term it improves behavior with impulse noise. For the constraint it improves edge preservation. For the functional

$$S(I_0) = \| I_0 * h - I \|_{TV} + \mu \| \nabla I_0 \|_{TV} = \int \left( |I_0 * h - I| + \mu |\nabla I_0| \right) dA \qquad (7)$$

the resulting Euler-Lagrange equations are

$$-h(-\bar x) * \frac{h(\bar x) * I_0 - I}{|h(\bar x) * I_0 - I|} + \mu \, \mathrm{div}\!\left( \frac{\nabla I_0}{|\nabla I_0|} \right) = 0 . \qquad (8)$$

Unfortunately, the solution of these equations is unstable. One approach to improve on this is to use an augmented TV norm [5]

$$S(I_0) = \int \left( \sqrt{(I_0 * h - I)^2 + \eta} + \mu \sqrt{|\nabla I_0|^2 + \eta} \right) dA \qquad (9)$$

with $0 < \eta \ll 1$. The resulting modified Euler-Lagrange equations are

$$-h(-\bar x) * \frac{h(\bar x) * I_0 - I}{\sqrt{(h(\bar x) * I_0 - I)^2 + \eta}} + \mu \, \mathrm{div}\!\left( \frac{\nabla I_0}{\sqrt{|\nabla I_0|^2 + \eta}} \right) = 0 . \qquad (10)$$
This greatly improves the response of the fidelity term to impulsive noise, but does much less for the edge preservation of the constraint. It also doesn't account explicitly for the edges in the image. Other approaches include incorporating Mumford-Shah-like edge detection into the functional [4], weighting the Laplacian based on edge detection [8], Perona-Malik-like regularizers [9], maximum likelihood estimators [10], certainty maps [11] and channel pairing on color images [12].
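As a point of reference for the comparisons that follow, the closed-form L2 solution of Eq. (6) can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: periodic boundary conditions (so that the FFT diagonalizes the convolution), a normalized Gaussian blur kernel, and the hypothetical helper names `gaussian_kernel_fft` and `wiener_l2`, none of which come from the paper.

```python
import numpy as np

def gaussian_kernel_fft(shape, sigma):
    """FFT of a periodic Gaussian blur kernel centered at the origin (assumed kernel)."""
    rows, cols = shape
    y = np.fft.fftfreq(rows) * rows          # integer offsets 0, 1, ..., -1
    x = np.fft.fftfreq(cols) * cols
    yy, xx = np.meshgrid(y, x, indexing="ij")
    h = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    h /= h.sum()                              # unit mass, so h_hat at zero frequency is 1
    return np.fft.fft2(h)

def wiener_l2(observed, h_hat, mu):
    """Evaluate Eq. (6): I0_hat = conj(h_hat) * I_hat / (|h_hat|^2 + mu * |omega|^2)."""
    rows, cols = observed.shape
    wy = 2.0 * np.pi * np.fft.fftfreq(rows)   # angular frequencies per axis
    wx = 2.0 * np.pi * np.fft.fftfreq(cols)
    wyy, wxx = np.meshgrid(wy, wx, indexing="ij")
    omega_sq = wxx**2 + wyy**2
    restored_hat = np.conj(h_hat) * np.fft.fft2(observed) / (np.abs(h_hat)**2 + mu * omega_sq)
    return np.real(np.fft.ifft2(restored_hat))

# Usage: blur a square, add Gaussian noise, restore with one of the mu values of Fig. 1.
rng = np.random.default_rng(0)
ideal = np.zeros((64, 64))
ideal[16:48, 16:48] = 1.0
h_hat = gaussian_kernel_fft(ideal.shape, sigma=1.5)
observed = np.real(np.fft.ifft2(h_hat * np.fft.fft2(ideal))) + 0.01 * rng.standard_normal(ideal.shape)
restored = wiener_l2(observed, h_hat, mu=5e-4)
```

Increasing `mu` damps more of the high frequencies, which is the trade-off between noise suppression and edge preservation shown in Fig. 1.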
We propose two novelties in this paper. The first is to combine the augmented L1 norm on the fidelity term for handling impulse noise with anisotropic regularization based on the Laplace-Beltrami operator for edge preservation. This is achieved by keeping the L2 norm of the gradient, but calculating it over a Riemannian manifold instead of the standard Euclidean space using a Laplace-Beltrami approach [13]. When combined with the augmented TV norm (9), this approach also produces exceptional results for impulsive noise (Sec. 4).

The second is the use of the GSVD (generalized singular value decomposition) for the minimization of functionals that employ a gradient based penalty term. Its direct contribution is the ability to easily minimize non-local operators and functionals defined on non-square domains, where the Fourier transform is inapplicable. For isotropic operators it can be very efficient, as the decomposition needs to be calculated only once, off line.

One interesting point about both these approaches is the relation to other frameworks. In particular, they make it possible to better understand the relation to sparse representation and K-SVD [14]. It is important to note that both these ideas are easily applicable to general ill-posed inverse problems over general feature spaces, and specifically for this case, also to color images [15] and textures [16].

The rest of this paper is organized as follows: Sec. 2 discusses the anisotropic approach. Sec. 3 discusses several approaches to minimizing the functional, including a novel approach using the GSVD. Sec. 4 shows some results of the method.
2 Anisotropic Regularization for the Wiener Filter
The problem with edge preservation lies with the gradient based penalty term. In the Euler-Lagrange equations it manifests as a Laplacian that acts as a low pass filter. In order to correctly formulate the anisotropic penalty term, we start with the Euler-Lagrange equation for the Wiener filter

$$-h(-\bar x) * \left( h(\bar x) * I_0 - I \right) + \mu \Delta I_0 = 0 \qquad (11)$$

and replace the Laplacian with an anisotropic operator, namely the Laplace-Beltrami operator [13], resulting in

$$-h(-\bar x) * \left( h(\bar x) * I_0 - I \right) + \mu \Delta_g I_0 = 0 . \qquad (12)$$

The Laplace-Beltrami operator is defined as

$$\Delta_g I = \frac{1}{\sqrt{g}} \, \mathrm{div}\!\left( \sqrt{g} \, G^{-1} \nabla I \right) \qquad (13)$$

where for the gray-scale case

$$G = \begin{pmatrix} 1 + I_x^2 & I_x I_y \\ I_x I_y & 1 + I_y^2 \end{pmatrix}, \qquad g = \det(G) . \qquad (14)$$
What this does is apply the Laplacian diffusion operator, but instead of applying it under the standard Euclidean norm, it is applied over the image manifold [13].
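To make Eqs. (13) and (14) concrete, the following is a minimal finite-difference sketch of the gray-scale Laplace-Beltrami operator. The discretization (central differences via `np.gradient`, unit pixel spacing) and the function name `laplace_beltrami` are assumptions chosen for illustration; the paper does not specify a discretization.

```python
import numpy as np

def laplace_beltrami(image):
    """Finite-difference sketch of Eq. (13) for a gray-scale image (assumed discretization)."""
    iy, ix = np.gradient(image)              # derivatives along rows (y) and columns (x)
    g = 1.0 + ix**2 + iy**2                  # g = det(G) for the metric G of Eq. (14)
    sqrt_g = np.sqrt(g)
    # G^{-1} grad(I), using G^{-1} = (1/g) [[1 + Iy^2, -Ix Iy], [-Ix Iy, 1 + Ix^2]]
    flux_x = ((1.0 + iy**2) * ix - ix * iy * iy) / g
    flux_y = (-ix * iy * ix + (1.0 + ix**2) * iy) / g
    # divergence of sqrt(g) * G^{-1} grad(I), then the 1/sqrt(g) prefactor
    div = np.gradient(sqrt_g * flux_x, axis=1) + np.gradient(sqrt_g * flux_y, axis=0)
    return div / sqrt_g

# Usage: a linear intensity ramp is a flat (planar) surface, so the operator vanishes on it.
ramp = 0.3 * np.tile(np.arange(16.0), (16, 1))
ramp_response = laplace_beltrami(ramp)
```

For this 2x2 metric the product $G^{-1}\nabla I$ simplifies to $\nabla I / g$, so the flux is $\nabla I / \sqrt{g}$: where the gradient is large (at an edge) the flux is strongly damped, which is the insulating behavior described in the text.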
This means that we are looking at the image as a two dimensional manifold in three dimensional space for gray scale images and in five dimensional space for color images. When applying the diffusion operator, distance between pixels is measured over this manifold, so the distance takes into account not only spatial offset but also intensity offset. The result is that pixels on different sides of an edge are farther apart than pixels in the same homogeneous region, and the edges act as insulators so that image data doesn't flow across edges.

This can be extended to color images by applying the diffusion on a per-channel basis, that is, for each channel $I^i$ the process is $\Delta_g I^i = \frac{1}{\sqrt{g}} \mathrm{div}\left( \sqrt{g} \, G^{-1} \nabla I^i \right)$ with

$$G = \begin{pmatrix} 1 + \sum_i (I_x^i)^2 & \sum_i I_x^i I_y^i \\ \sum_i I_x^i I_y^i & 1 + \sum_i (I_y^i)^2 \end{pmatrix} . \qquad (15)$$

The metric itself takes into account all the channels, coupling them in the final process to remove misalignment of the edges across the different channels. Note that the image channels can be color channels such as RGB or CMY, or more general features such as textures [16].

When extending the functional to handle impulse noise using the augmented L1 fidelity term, the Euler-Lagrange equations become instead

$$-h(-\bar x) * \frac{h(\bar x) * I_0 - I}{\sqrt{(h(\bar x) * I_0 - I)^2 + \eta}} + \mu \Delta_g I_0 = 0 . \qquad (16)$$

3 Finding the Minimizer
There are several approaches to minimizing the resulting functional. We already have the Euler-Lagrange equations, i.e., Eqs. (12) and (16). Using the direct Fourier space approach, even for the L2 fidelity term, is not applicable here, since the Fourier transform doesn't diagonalize the Laplace-Beltrami operator. A different, relatively simple approach is to use the gradient descent equations

$$\frac{\partial I_0}{\partial t} = -h(-\bar x) * \frac{h(\bar x) * I_0 - I}{\sqrt{(h(\bar x) * I_0 - I)^2 + \eta}} + \mu \Delta_g I_0 . \qquad (17)$$
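A minimal explicit-Euler sketch of the flow in Eq. (17) is given below. Several choices here are assumptions made only for illustration: periodic boundaries with the blur applied through its FFT `h_hat`, a central-difference Laplace-Beltrami operator, the hypothetical names `conv_fft`, `descent_step` and `restore`, and the default step size and iteration count. The signs are chosen so that the fidelity part of each step moves the estimate toward a smaller residual.

```python
import numpy as np

def laplace_beltrami(image):
    """Gray-scale Laplace-Beltrami operator, using G^{-1} grad(I) = grad(I) / g."""
    iy, ix = np.gradient(image)
    sqrt_g = np.sqrt(1.0 + ix**2 + iy**2)
    div = np.gradient(ix / sqrt_g, axis=1) + np.gradient(iy / sqrt_g, axis=0)
    return div / sqrt_g

def conv_fft(image, kernel_hat):
    """Periodic convolution expressed as a product in the Fourier domain."""
    return np.real(np.fft.ifft2(kernel_hat * np.fft.fft2(image)))

def descent_step(estimate, observed, h_hat, mu, eta, dt):
    """One explicit step of Eq. (17) with the augmented L1 fidelity term."""
    residual = conv_fft(estimate, h_hat) - observed
    weighted = residual / np.sqrt(residual**2 + eta)
    # convolution with h(-x) is multiplication by conj(h_hat) for a real kernel
    fidelity_force = conv_fft(weighted, np.conj(h_hat))
    return estimate + dt * (-fidelity_force + mu * laplace_beltrami(estimate))

def restore(observed, h_hat, mu=0.05, eta=1e-4, dt=0.1, steps=200):
    """Iterate the flow starting from the observed image (assumed initialization)."""
    estimate = observed.copy()
    for _ in range(steps):
        estimate = descent_step(estimate, observed, h_hat, mu, eta, dt)
    return estimate
```

Because the weighted residual is bounded by one in magnitude, a single outlier pixel cannot dominate the update, which is the reason the augmented L1 term copes better with impulse noise than the L2 term.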
For the L2 fidelity term there are two other spectral approaches that can be applied here, an eigen transform and the GSVD. The advantage of these, among other things, is that they provide a direct solution and thus prove the existence of the minimizer, the same as for the standard Wiener filter. Proving the existence of a minimizer for the proposed Tikhonov functional is much more difficult and beyond the scope of this paper, but can be done along lines similar to those taken in [5].

3.1 The Laplace-Beltrami Eigen-Space
We can use the same approach implemented in [17] to diagonalize the Laplace-Beltrami operator. The problem is that the eigenvectors of the Laplace-Beltrami operator don't convert the convolution into a multiplication, so we need to combine this approach with the Fourier transform. We start with the Euler-Lagrange equations for the anisotropic Wiener filter, Eq. (12). If we linearize the Laplace-Beltrami operator by fixing the metric, it becomes a self-adjoint negative (semi-)definite operator and thus its eigenspace is a basis for the function space under the L2 norm. Insert into this equation the eigen decomposition of the image using this eigenspace

$$I_0 = \sum_i c_i^0 \phi_i , \qquad I = \sum_i c_i \phi_i . \qquad (18)$$

This produces

$$-h(-\bar x) * \left( h(\bar x) * \sum_i c_i^0 \phi_i - \sum_i c_i \phi_i \right) + \mu \sum_i \lambda_i c_i^0 \phi_i = 0 \qquad (19)$$

which after rearrangement gives

$$\sum_i \left( -c_i^0 \, h(-\bar x) * h(\bar x) * \phi_i + \mu \lambda_i c_i^0 \phi_i \right) = -\sum_i c_i \, h(-\bar x) * \phi_i . \qquad (20)$$

Now, to handle the convolution, apply the Fourier transform

$$\sum_i \left( -\hat h^* \cdot \hat h \cdot c_i^0 \hat\phi_i + \mu \lambda_i c_i^0 \hat\phi_i \right) = -\sum_i c_i \hat h^* \hat\phi_i \qquad (21)$$

which can be rewritten as

$$\sum_i c_i^0 \left( |\hat h|^2 - \mu \lambda_i \right) \hat\phi_i = \sum_i c_i \hat h^* \hat\phi_i . \qquad (22)$$

This is a linear set of equations of the form $A \tilde I_0 = B \tilde I$. Here $\tilde I = (c_i)$ and $\tilde I_0 = (c_i^0)$ are the coefficient vectors in the Laplace-Beltrami eigenspace. This system of equations needs to be solved for $\tilde I_0$. Using these coefficients the ideal image $I_0$ can be reconstructed. For a full solution this needs to be combined with fixed point iterations updating the metric, although the metric is stable with respect to the flow, so in effect this is rarely needed.

There are two things to note here. First, the coefficients of $I$ decay rather quickly, so we can truncate $\tilde I$ and thus avoid calculating the right-hand columns of $B$. Second, the same assumption can be made for $\tilde I_0$ and thus for $A$.

3.2 Using the GSVD
Consider an energy functional with two linear operators $L_a$ and $L_b$ using the L2 norm

$$S(f) = \int \left( |L_a f|^2 + \mu |L_b f|^2 \right) dA . \qquad (23)$$

Assuming that these operators can be discretized as matrices $A$ and $B$ respectively, this can be written, with $v$ a vector representation of the function $f$, as

$$S(v) = \|A v\|^2 + \mu \|B v\|^2 . \qquad (24)$$

The two matrices $A$ and $B$ have a joint diagonalization based on the generalized singular value decomposition (GSVD) of the form [18]

$$A = U \Sigma_1 X^T , \qquad B = V \Sigma_2 X^T \qquad (25)$$

with $U$ and $V$ unitary matrices and $\Sigma_1$ and $\Sigma_2$ non-negative diagonal (not necessarily square). $A$ and $B$ must have the same number of columns but not necessarily the same number of rows (this last property we will need later on). Thus Eq. (24) can be rewritten as

$$S(v) = \| U \Sigma_1 X^T v \|_{L_2}^2 + \mu \| V \Sigma_2 X^T v \|_{L_2}^2 . \qquad (26)$$

Now, we can substitute $\tilde v = X^T v$ to construct a functional in $\tilde v$. Also note that the L2 norm is invariant to unitary transformations, thus this functional is equivalent to

$$S(\tilde v) = \| \Sigma_1 \tilde v \|_{L_2}^2 + \mu \| \Sigma_2 \tilde v \|_{L_2}^2 . \qquad (27)$$

This new functional can be minimized with respect to $\tilde v$, resulting in

$$\Sigma_1^T \Sigma_1 \tilde v + \mu \Sigma_2^T \Sigma_2 \tilde v = 0 . \qquad (28)$$

We would like to do something similar with the Wiener filter formulation. The problem is that the gradient operator cannot be discretized as a single matrix operator, since it takes a function and returns a vector. Luckily, what we need is an operator acting on $I_0$ such that its norm is equal to that of the gradient. For the L2 case this can be achieved as follows

$$S(I_0) = \int \left( |h * I_0 - I|^2 + \mu |\nabla I_0|^2 \right) dA \;\Rightarrow\; \| H I_0 - I \|_{L_2}^2 + \mu \left\| \begin{pmatrix} D_x \\ D_y \end{pmatrix} I_0 \right\|_{L_2}^2 = \| H I_0 - I \|_{L_2}^2 + \mu \| D I_0 \|_{L_2}^2 \qquad (29)$$

where $H$ is the convolution matrix (which is block cyclic but not cyclic in the 2D case) and $D = \begin{pmatrix} D_x \\ D_y \end{pmatrix}$ is the matrix resulting from stacking the matrix for the derivative in the x direction and the one for the derivative in the y direction. For the L2 case we get

$$\left\| \begin{pmatrix} D_x \\ D_y \end{pmatrix} I_0 \right\|_{L_2}^2 = \| D_x I_0 \|_{L_2}^2 + \| D_y I_0 \|_{L_2}^2 = \| \nabla I_0 \|_{L_2}^2 . \qquad (30)$$

Now we can use the fact that the GSVD can be applied to matrices with a different number of rows to diagonalize this equation

$$H = U \Sigma_1 X^T , \qquad D = V \Sigma_2 X^T . \qquad (31)$$
Using this we can follow the same procedure as before

$$\| H I_0 - I \|_{L_2}^2 + \mu \| D I_0 \|_{L_2}^2 \;\Rightarrow\; \| U \Sigma_1 X^T I_0 - I \|_{L_2}^2 + \mu \| V \Sigma_2 X^T I_0 \|_{L_2}^2 \qquad (32)$$

and again, based on $U$ and $V$ being unitary and substituting $\tilde I_0 = X^T I_0$ and $\tilde I = U^{-1} I = U^T I$, this results in

$$S(\tilde I_0) = \| \Sigma_1 \tilde I_0 - \tilde I \|_{L_2}^2 + \mu \| \Sigma_2 \tilde I_0 \|_{L_2}^2 . \qquad (33)$$

This can be minimized with respect to $\tilde I_0$ to produce

$$\Sigma_1^T \left( \Sigma_1 \tilde I_0 - \tilde I \right) + \mu \Sigma_2^T \Sigma_2 \tilde I_0 = 0 \qquad (34)$$

or, after rearrangement and back-substitution,

$$I_0 = X^{-T} \left( \Sigma_1^T \Sigma_1 + \mu \Sigma_2^T \Sigma_2 \right)^{-1} \Sigma_1^T U^T I . \qquad (35)$$

Note that $\Sigma_1^T \Sigma_1 + \mu \Sigma_2^T \Sigma_2$ is a diagonal matrix and thus easy to invert (in fact for $\mu = 1$ it is the identity matrix).

To apply the same idea to the anisotropic case, we need to formulate the prior leading to the Laplace-Beltrami operator as a gradient over a manifold instead. The operator arises from minimizing the following symmetric positive definite form

$$\int \nabla I^T \sqrt{g} \, G^{-1} \nabla I \, d^m\sigma = \int | D_g \nabla I |^2 \, d^m\sigma , \qquad \sqrt{g} \, G^{-1} = D_g^2 \qquad (36)$$

and the discrete formulation for the anisotropic derivative matrix $D_g$ (which replaces $D$ in Eq. (29)) can be found via an eigen decomposition of the matrix $\sqrt{g} \, G^{-1}$:

$$D_g = A \begin{pmatrix} D_x \\ D_y \end{pmatrix} = \begin{pmatrix} \dfrac{I_x^2 + \sqrt{1 + I_x^2 + I_y^2} \; I_y^2}{(I_x^2 + I_y^2) \sqrt[4]{1 + I_x^2 + I_y^2}} D_x + \dfrac{I_x I_y \left( 1 - \sqrt{1 + I_x^2 + I_y^2} \right)}{(I_x^2 + I_y^2) \sqrt[4]{1 + I_x^2 + I_y^2}} D_y \\[2ex] \dfrac{I_x I_y \left( 1 - \sqrt{1 + I_x^2 + I_y^2} \right)}{(I_x^2 + I_y^2) \sqrt[4]{1 + I_x^2 + I_y^2}} D_x + \dfrac{I_y^2 + \sqrt{1 + I_x^2 + I_y^2} \; I_x^2}{(I_x^2 + I_y^2) \sqrt[4]{1 + I_x^2 + I_y^2}} D_y \end{pmatrix} . \qquad (37)$$

One advantage of this approach is that it is applicable to non-local operators and to non-square domains, where the Fourier transform as applied to the original Wiener filter fails. For the isotropic case the decomposition needs to be calculated only once, off line, as the transform is constant, and it can thus be very efficient for recurring problems (or by splitting the problem into constant sized patches as described in [17]).
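The isotropic GSVD solution of Eqs. (29)-(35) can be sketched for a small 1D signal as follows. NumPy has no GSVD routine, so this sketch builds the joint factorization from a QR decomposition of the stacked matrix followed by an SVD of its upper block, which is one standard construction and not necessarily the one the authors used. It assumes that $H$ has at least as many rows as columns and that the stacked matrix has full column rank. The names `gsvd_factors`, `gsvd_wiener` and `blur_and_difference_matrices` are hypothetical.

```python
import numpy as np

def gsvd_factors(H, D):
    """Return U, c, X_T with H = U diag(c) X_T and D^T D = X diag(1 - c^2) X^T."""
    Q, R = np.linalg.qr(np.vstack([H, D]))       # reduced QR of the stacked matrix
    Q1 = Q[: H.shape[0]]                          # block of Q that belongs to H
    U, c, Wt = np.linalg.svd(Q1, full_matrices=False)
    return U, c, Wt @ R

def gsvd_wiener(H, D, observed, mu):
    """Evaluate Eq. (35) using the diagonal system of Eq. (34)."""
    U, c, X_T = gsvd_factors(H, D)
    diagonal = c**2 + mu * (1.0 - c**2)           # Sigma1^T Sigma1 + mu Sigma2^T Sigma2
    coeffs = c * (U.T @ observed) / diagonal
    return np.linalg.solve(X_T, coeffs)           # back-substitution I0 = X^{-T} coeffs

def blur_and_difference_matrices(n, sigma):
    """Illustrative 1D operators: circulant Gaussian blur H and forward differences D."""
    offsets = np.minimum(np.arange(n), n - np.arange(n))
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    H = np.array([np.roll(kernel, k) for k in range(n)]).T
    D = np.diff(np.eye(n), axis=0)                # (n - 1) x n, fewer rows than H
    return H, D

# Usage: restore a blurred, noisy 1D step.
rng = np.random.default_rng(0)
H, D = blur_and_difference_matrices(32, sigma=1.5)
ideal = np.where(np.arange(32) < 16, 0.0, 1.0)
observed = H @ ideal + 0.01 * rng.standard_normal(32)
restored = gsvd_wiener(H, D, observed, mu=1e-2)
```

With this normalization the two diagonal factors satisfy $c_i^2 + s_i^2 = 1$, which is why the diagonal matrix reduces to the identity for $\mu = 1$. The factors depend only on `H` and `D`, so they can be computed once and reused for every new observation.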
4 Numerical Results
Comparing the reconstruction quality based on standard measurements alone, such as SNR and PSNR, doesn't do justice to the method. This is due to the fact that these values, being L2 based measures, are not good assessors of edge reconstruction. Despite this, and for lack of a better objective comparison method, we do see an improvement in the reconstruction based on these measurements. It is important to also note the subjective difference when looking at the images themselves. The biggest difference is seen near pronounced edges and textures, which are much better preserved than with the standard Wiener filter. This method also removes the ringing (Gibbs effect) seen around strong edges and the color skews in color images. Due to the limits of the medium, the results are cropped and zoomed to better accentuate the difference.
Fig. 2. Reconstruction of a gray-scale image (2(a)) degraded using a Gaussian kernel and Gaussian noise (2(b)) with standard deviation of 10%. The image is reconstructed using the standard (2(c)) and anisotropic Wiener filter (2(d)). Panels: (a) input, (b) degraded, (c) standard W.F., (d) anisotropic W.F.
The first example (Fig. 2) shows the results for a gray scale image degraded by a Gaussian kernel and Gaussian noise with standard deviation of 10% (with a resulting SNR of 16.34 dB). The reconstruction for both the standard Wiener filter (2(c)) and the anisotropic version (2(d)) is done based on the L2 fidelity term. The SNRs of the reconstructed images are 20.72 dB and 21.08 dB respectively. The anisotropic reconstruction displays less noise, especially visible in homogeneous areas such as the white background and skin. The edges in the isotropic version, on the other hand, display both blur (such as the back, hands and hair) and ringing around pronounced edges that does not appear in the anisotropic version. This is most pronounced around the dominant edges of the back and the hair.

Figure 3 shows the results of applying the Wiener filter to an image with impulse noise (11% density, with 8.47 dB SNR). The first two examples (3(b), 3(e)) display the result of applying the standard and anisotropic Wiener filters respectively, both using the L2 fidelity term. Despite improved SNR values (15.9 dB and 16.48 dB), the results are still rather poor, although the anisotropic version still displays more pronounced edges (teeth, wall) as well as less noise. On the other hand, the versions employing the augmented L1 fidelity term (3(c) and 3(f)) can at first glance be mistaken for the input image. Even so, the anisotropic version still displays much sharper results up close, as well as improved SNR (22.98 dB compared to 22.48 dB).

The following examples for color images show the extensibility of the method to vector valued images.
Fig. 3. Restoration of a gray scale image corrupted by impulse noise of density 0.11. Figures 3(b) and 3(e) show the reconstruction using the regular and anisotropic Wiener filter with L2 fidelity. Figures 3(c) and 3(f) show the reconstruction using the L1 fidelity term. Panels: (a) input image, (b) std. W.F., L2 fidelity, (c) std. W.F., L1 fidelity, (d) degraded image, (e) AI W.F., L2 fidelity, (f) AI W.F., L1 fidelity.
Figure 4 shows the results for a color image degraded by a Gaussian kernel and Gaussian noise with a standard deviation of 10% (SNR of 16.7 dB). As can be seen, the anisotropic reconstruction produces sharper edges without the color shifts and ringing that are visible around sharp edges in the standard reconstruction. Additionally, there is less overall noise and there are fewer color shifts, due to the smoothing of the noise. The SNR for the isotropic case is 21.04 dB compared to 21.6 dB for the anisotropic variation.

Fig. 5 shows the results of applying both the regular and anisotropic Wiener filter, both based on the L1 fidelity term, to a color image degraded by a Gaussian kernel and impulse noise with 11% density (SNR of 11 dB). The anisotropic variation shows sharper edges, better color restoration and fewer color skews around edge boundaries. This, like the previous results, is most pronounced around bright edges such as the teeth, eyes and wall. The SNR of the reconstruction is 20 dB and 23.1 dB for the isotropic and anisotropic varieties respectively.
Fig. 4. Color image degraded by a Gaussian kernel and uncorrelated Gaussian noise (4(a)) with standard deviation of 10%. Figures 4(b) and 4(c) show the results for the standard and the anisotropic reconstruction. Panels: (a) degraded image, (b) standard W.F., (c) anisotropic W.F.
Fig. 5. Color image degraded by a Gaussian kernel and uncorrelated impulse noise (5(a)) with density 0.11. Figures 5(b) and 5(c) show the results for the standard and anisotropic restoration based on the L1 fidelity term. Panels: (a) degraded image, (b) std. W.F., L1 fidelity, (c) AI W.F., L1 fidelity.
5 Conclusion

In this work we presented an anisotropic regularization term for inverse problems that allows better preservation of object edges while at the same time improving noise suppression. Combined with an augmented L1 fidelity term, it provides remarkable results for images corrupted by impulse noise.
References

1. Goodman, J.: Introduction to Fourier Optics. McGraw-Hill Book Company, New York (1996)
2. Jähne, B.: Digital Image Processing, 5th edn. Springer, Heidelberg (2002)
3. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall, Englewood Cliffs (2002)
4. Bar, L., Sochen, N., Kiryati, N.: Semi-blind image restoration via Mumford-Shah regularization. IEEE Trans. on Image Processing 15(2), 483–493 (2005)
5. Bar, L., Kiryati, N., Sochen, N.: Image deblurring in the presence of impulsive noise. Int. J. Comput. Vision 70(3), 279–298 (2006)
6. Blomgren, P., Chan, T.F.: Color TV: Total variation methods for restoration of vector-valued images. IEEE Trans. Image Processing 7, 304–309 (1998)
7. Chan, T.F., Vese, L.A.: Image segmentation using level sets and the piecewise-constant Mumford-Shah model. Technical Report 00-14, UCLA CAM (2000)
8. Charbonnier, P., Blanc-Féraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Trans. Image Processing 6, 298–311 (1997)
9. Welk, M., Theis, D., Weickert, J.: Variational deblurring of images with uncertain and spatially variant blurs. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 485–492. Springer, Heidelberg (2005)
10. Jalobeanu, A., Blanc-Féraud, L., Zerubia, J.: An adaptive Gaussian model for satellite image deblurring. IEEE Transactions on Image Processing (4), 613–621 (2004)
11. Krajsek, K., Mester, R.: The edge preserving Wiener filter for scalar and tensor valued images. In: DAGM-Symposium, pp. 91–100 (2006)
12. Kaftory, R., Sochen, N., Zeevi, Y.Y.: Variational blind deconvolution of multichannel images. Int. J. Imaging Science and Technology 15(1), 56–63 (2005)
13. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image Processing, Special Issue on Geometry Driven Diffusion 7, 310–318 (1998)
14. Aharon, M., Elad, M., Bruckstein, A.: The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing 54(11), 4311–4322 (2006)
15. Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: Movies, color, texture, and volumetric medical images. International Journal of Computer Vision 39, 111–129 (2000)
16. Sagiv, C., Sochen, N., Zeevi, Y.: Gabor features diffusion via the minimal weighted area method. In: EMMCVPR (September 2001)
17. Feigin, M., Sochen, N., Vemuri, B.C.: Efficient anisotropic α-kernels decompositions and flows. In: POCV (2008)
18. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Locally Adaptive Total Variation Regularization

Markus Grasmair

Department of Mathematics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria
http://infmath.uibk.ac.at
Abstract. We introduce a locally adaptive parameter selection method for total variation regularization applied to image denoising. The algorithm iteratively updates the regularization parameter depending on the local smoothness of the outcome of the previous smoothing step. In addition, we propose an anisotropic total variation regularization step for edge enhancement. Test examples demonstrate the capability of our method to deal with varying, unknown noise levels.
1 Introduction
Because of its ability to generate images with piecewise smooth structures that are well separated by pronounced edges, total variation regularization is one of the most widely used techniques for image denoising and related tasks. Since the first proposal by Rudin, Osher, and Fatemi [14] of using the total variation, that is, the L1-norm of the gradient, for denoising purposes, this method has been applied to a wide range of applications in imaging and inverse problems. We refer to [1, 2, 3, 5, 12, 13, 15] to name but a few contributions to this field.

Given a noisy function $f \in L^2(\Omega)$ on some open and bounded domain $\Omega \subset \mathbb{R}^n$, $n \in \mathbb{N}$, the goal of denoising is to find a new function $u$ close to $f$ that retains the important features of $f$ while noise, consisting of fast oscillations, is removed. Noting that edges belong to the most prominent features in images, this task can be achieved by minimizing the total variation functional

$$T(u; \alpha) := \frac{1}{2} \int_\Omega \left( u(x) - f(x) \right)^2 dx + \alpha |Du|(\Omega) \qquad (1)$$

with respect to $u \in BV(\Omega)$. The regularization parameter $\alpha > 0$ in (1) controls the amount of smoothing that is desired: the larger $\alpha$, the more the regularized function $u_\alpha$ tends to consist of well separated homogeneous regions. Conversely, a small parameter $\alpha$ implies a function lying close to the input data, but also possibly exhibiting a significant number of oscillations.

The relation between $\alpha$ and $u_\alpha$, however, exists only on a qualitative level. There is no simple connection between the value of $\alpha$ and the smoothness of $u_\alpha$, or even between $\alpha$ and the difference $f - u_\alpha$, which is simply the part of the data classified as noise by the functional $T$. The necessity of taking into account both the data and the expected noise level is a well established fact in the theory of inverse problems (see for instance [8]). Because for many applications of mathematical imaging, in particular tasks that are to be completely automated, a precise knowledge of the noise is not available, this leads to the conclusion that, in these cases, a-priori parameter choices are not feasible. Instead, one should adapt $\alpha$ until both $u_\alpha$ and the perceived noise $f - u_\alpha$ are satisfactory.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 331–342, 2009. © Springer-Verlag Berlin Heidelberg 2009

Though better than a fixed a-priori choice, adaptation of a single global regularization parameter need not be sufficient for good results either. It may happen that the noise on the image $f$ is not identically distributed but varies locally. In this case, it is difficult to find a compromise between oversmoothing in noise-free regions caused by too large a parameter choice, and a still noisy output resulting from a small parameter. Similar effects can be observed if the structure of the noise-free data itself changes over the image. Then, the regularization parameter should be larger for homogeneous parts of the image than for parts with small details.

The problem of finding a parameter that is suited for the whole image can be circumvented by passing from a global parameter $\alpha > 0$ to a parameter function $\alpha : \Omega \to \mathbb{R}_{>0}$. Then, the regularization functional reads as

$$T(u; \alpha) = \frac{1}{2} \int_\Omega \left( u(x) - f(x) \right)^2 dx + \int_\Omega \alpha(x) \, d|Du|(x) . \qquad (2)$$

This functional is well-defined if $\alpha$ is continuous and, using direct methods, can readily be shown to attain a minimizer if $\alpha$ is bounded away from zero.

Total variation regularization with non-constant regularization parameter has already been studied in several other articles [6, 9, 10, 11, 16, 17]. In [16, 17], the choice of $\alpha$ is based on the scale of the features one wants to recover. In [10], at first the uniform problem is solved with an automatically identified optimal regularization parameter $\alpha$. The result of the first denoising attempt is used for extracting the edges in the image, at which subsequently the regularization parameter is locally increased. Then the minimization problem is solved a second time with the localized parameter $\alpha(x)$. The approach in [11] uses statistical properties of the residual in order to decide whether the local regularization parameter is suited. The criterion employed there is based on the local variance of the residual: if it is close to the noise level, one can expect that mostly noise has been filtered. If it is higher, then the residual probably contains texture and therefore the regularization parameter has to be decreased. The estimates in [11] are closely related to the inequalities in [10], though the approaches by which they are reached differ considerably. Note moreover that the same idea has already been employed in [6] for one-dimensional total variation regularization.

In this paper, we propose to target some a-priori specified smoothness of the output $u_\alpha$, which is measured in terms of the oscillations of the direction $\nabla u_\alpha / |\nabla u_\alpha|$ of the gradient of the image. This direction can be determined by passing to a dual formulation, as it essentially equals the rescaled dual variable. This idea of parameter adaptation based on the properties of the dual function is taken from [6].
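To make the role of the parameter function in Eq. (2) concrete, the following sketch evaluates a discrete version of the functional on a pixel grid. The discretization (forward differences with a zero difference at the far border, isotropic gradient magnitude) and the name `weighted_tv_energy` are assumptions made for illustration only.

```python
import numpy as np

def weighted_tv_energy(u, f, alpha):
    """Discrete sketch of T(u; alpha) in Eq. (2); alpha may be a scalar or a per-pixel array."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # forward difference, zero at the right border
    dy = np.diff(u, axis=0, append=u[-1:, :])   # forward difference, zero at the bottom border
    fidelity = 0.5 * np.sum((u - f) ** 2)
    regularizer = np.sum(alpha * np.sqrt(dx**2 + dy**2))
    return fidelity + regularizer

# Usage: a vertical step edge of height 1 in an 8 x 8 image.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
uniform_alpha = np.full(step.shape, 0.1)
edge_alpha = uniform_alpha.copy()
edge_alpha[:, 3] = 0.5                           # larger weight exactly where the jump is measured
energy_uniform = weighted_tv_energy(step, step, uniform_alpha)
energy_edge = weighted_tv_energy(step, step, edge_alpha)
```

With `u = f` the fidelity term vanishes, so the two energies differ only through the weight placed on the jump; this is the mechanism by which a local parameter penalizes variation more strongly in some regions than in others.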
The main concept of this paper of using a dual variable to provide a guess on the smoothness of the regularized image is introduced in Section 2. For further improving this smoothed image by enhancing the edges, we propose to subsequently apply anisotropic total variation regularization with an anisotropy that is estimated from the same dual variable that has determined the isotropic regularization parameter (see Section 3). A complete description of the algorithm can be found in Section 4. Finally, we apply this method in Section 5 to two test examples that show its suitability for adaptive noise removal.
2 Parameter Adaptation via Dual Variables
Consider the dual formulation of $T(\cdot; \alpha)$, which consists in solving the constrained minimization problem

$$J(V) := \int_\Omega \left( \mathrm{div}\, V(x) + f(x) \right)^2 dx \to \min , \qquad |V(x)| \le \alpha(x) \text{ almost everywhere on } \Omega , \qquad V(x) \cdot \nu(x) = 0 \text{ almost everywhere on } \partial\Omega , \qquad (3)$$

over the space of vector valued essentially bounded functions $L^\infty(\Omega; \mathbb{R}^n)$. In (3), $\nu$ denotes the outward normal to the domain $\Omega$, and the equation $V \cdot \nu = 0$ is understood in a distributional sense. Also, the divergence of an essentially bounded function is defined distributionally. To be precise, the functions $V$ and $\mathrm{div}\, V$ satisfy the equation

$$\int_\Omega \nabla\phi(x) \cdot V(x) \, dx = -\int_\Omega \phi(x) \, \mathrm{div}\, V(x) \, dx$$

for every $\phi \in C^1(\mathbb{R}^n)$. Minimization of $T(\cdot; \alpha)$ is equivalent to solving the dual problem (3) in the sense that a function $V_\alpha \in L^\infty(\Omega; \mathbb{R}^n)$ solves (3) if and only if $u_\alpha := f + \mathrm{div}\, V_\alpha$ minimizes $T(\cdot; \alpha)$. We refer to [4], which treats the dual formulation of total variation regularization, and to [7] for a detailed introduction to infinite dimensional convex analysis.

We now examine the dual variable $V$ more closely. Formally, the optimality condition for a minimizer $u_\alpha$ of the functional $T$ reads as

$$u_\alpha(x) - f(x) = \mathrm{div}\!\left( \alpha(x) \frac{\nabla u_\alpha(x)}{|\nabla u_\alpha(x)|} \right) \quad \text{for almost every } x \in \Omega .$$

Since $u_\alpha - f = \mathrm{div}\, V_\alpha$, one sees that the dual minimizer $V_\alpha$ introduced above in fact coincides with the direction of the gradient of $u_\alpha$, multiplied by $\alpha(x)$. In particular, for almost every $x \in \Omega$, we either have that $|V_\alpha(x)| = \alpha(x)$ or the gradient of $u_\alpha$ at $x$ is zero, that is, $u_\alpha$ is approximately constant near $x$. Even more, the local behaviour of $V_\alpha$ is strongly related to a certain kind of regularity of the regularized function $u_\alpha$: Large variations of $V_\alpha / \alpha$ on the unit sphere imply equally large variations of the direction of the gradient of $u_\alpha$. In other words, variations of $V_\alpha / \alpha$ imply small oscillations of $u_\alpha$. The method we propose in the following takes advantage of these properties of $V_\alpha$ and $u_\alpha$ and exploits their relation.

Let $r > 0$ be some fixed parameter. We define the r-local mean of a vector valued, essentially bounded function $W \in L^\infty(\Omega; \mathbb{R}^n)$ at $x \in \Omega$ by

$$M_r(x; W) := \frac{1}{\mathcal{L}^n\!\left( B_r(x) \cap \Omega \right)} \int_{B_r(x) \cap \Omega} W(y) \, dy .$$

Here, $\mathcal{L}^n$ denotes the n-dimensional Lebesgue measure. In addition, we define the r-local variation of $W$ by

$$\Sigma_r(x; W) := \left| W(x) - M_r(x; W) \right| . \qquad (4)$$

The definition of $\Sigma_r$ directly implies that

$$\Sigma_r(x; W) \le 2 \, \mathrm{ess\,sup} \left\{ |W(y)| : y \in B_r(x) \cap \Omega \right\}$$

for almost every $x \in \Omega$. Applying the above inequality to the scaled solution $W_\alpha(x) := V_\alpha(x) / \alpha(x)$ of (3), one immediately sees that

$$0 \le \Sigma_r(x; W_\alpha) \le 2 \max \left\{ |V_\alpha(y)| / \alpha(y) : y \in B_r(x) \cap \Omega \right\} \le 2 .$$

Moreover, the actual size of the value $\Sigma_r(x; W_\alpha)$ provides an indication of the oscillation of the function $u_\alpha$ near $x$: If $\Sigma_r(x; W_\alpha)$ is close to zero, then the gradient $\nabla u_\alpha$ points in roughly the same direction on the whole set $B_r(x)$. Conversely, a value above one implies that the orientation of $\nabla u_\alpha(x)$ vastly differs from the majority of directions present in $B_r(x)$. See Figure 1 for an example of a smoothed image with corresponding local variation of the dual variable $V_\alpha$.

In this manner, the function $\Sigma_r(x; W_\alpha)$ can serve as a local criterion for the smoothness of the regularized function $u_\alpha$. If the finally desired smoothness is not yet reached, that is, if $\Sigma_r(x; W_\alpha)$ is too large, it is necessary to increase the local regularization parameter $\alpha(x)$. Conversely, if the function $u_\alpha$ appears too smooth, that is, $\Sigma_r(x; W_\alpha)$ is close to zero, then $\alpha(x)$ is decreased and a new tentative solution $u_\alpha$ is computed. This process of computing $\Sigma_r(x; W_\alpha)$ and updating $\alpha$ is repeated until the update of $u_\alpha$ becomes small enough.
In order to reach a uniform smoothness of the regularized image uα over its whole domain, we propose to prescribe some target smoothness 0 < θ < 1. Then one can compute a suitable update α̃ of α by setting

$$\tilde{\alpha}(x) = \alpha(x) \bigl( \theta + \Sigma_r(x; W_\alpha)/2 \bigr)^{s} \tag{5}$$

for some parameter s > 0 determining the size of the update. Iteration of this update will lead to a uniform smoothness Σr(x; Wα) ≈ 2(1 − θ). The choice
Fig. 1. Smoothed image (left) and corresponding function Σr (right). Bright pixel values indicate a higher value of Σr .
of the target smoothness should reflect the properties of the image one wants to recover: A large parameter (θ ≥ 0.7) means that only the structures about the size of r are of interest. Small values (θ ≈ 0.55) put more emphasis on the structures of size smaller than r (see also Figure 4). In order to avoid too rapid changes of the parameter α(x), it is necessary to smooth the update α̃ computed by means of (5). Also from a theoretical point of view, this smoothing procedure is required for obtaining a continuous regularization function α. We propose to simply replace the update α̃(x) by its local mean value Mr(x; α̃). In this way, the average smoothness in the balls Br(x) will be almost independent of x.
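The behaviour of the pointwise update can be checked with a few numbers. This is a minimal sketch, assuming the update multiplies α(x) by (θ + Σr/2) raised to the power s; the function name `update_alpha` and the sample values are hypothetical, and the subsequent smoothing by the local mean is left out.

```python
def update_alpha(alpha, sigma, theta, s):
    """Pointwise update: alpha * (theta + sigma / 2) ** s."""
    return alpha * (theta + sigma / 2.0) ** s

theta, s = 0.6, 1.0
target = 2.0 * (1.0 - theta)                           # local variation at the fixed point
unchanged = update_alpha(1.0, target, theta, s)        # smoothness already on target
increased = update_alpha(1.0, target + 0.4, theta, s)  # too much local variation
decreased = update_alpha(1.0, target - 0.4, theta, s)  # result already too smooth
print(unchanged, increased, decreased)
```

The update leaves α unchanged exactly when the local variation equals 2(1 − θ), raises it where the variation is larger, and lowers it where the variation is smaller.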
3 Edge Enhancement by Anisotropy
Having determined the size of the local regularization parameter α(x) by means of the scaled dual variable Wα, it is in addition possible to use the distribution of the values of Wα on the unit sphere for sharpening edges and, in particular, thin ridges, which usually tend to get oversmoothed. To that end, instead of applying isotropic regularization, we introduce an anisotropy, the direction of which is determined by the local covariance of Wα. For R > 0 we define the $\mathbb{R}^{n \times n}$-valued function CovR(x; W), the covariance of W on BR(x) ∩ Ω, by defining its (i, j)-th component as

$$\mathrm{Cov}_R^{(i,j)}(x; W) := \frac{1}{\mathcal{L}^n\bigl(B_R(x) \cap \Omega\bigr)} \int_{B_R(x) \cap \Omega} \bigl( W^{(i)}(y) - M_R^{(i)}(x; W) \bigr) \bigl( W^{(j)}(y) - M_R^{(j)}(x; W) \bigr)\, dy . \tag{6}$$

Again using the property that Wα is proportional to ∇uα, one sees that the principal component of CovR(x; Wα) indicates, up to sign, the prevailing direction of ∇uα near x. This dominant direction can be pronounced further by replacing the isotropic bound |Vα(x)| ≤ α(x) in (3) by an anisotropic one defined by CovR(x; Wα). This is achieved by minimizing J(V) respecting the constraints V · ν = 0 on ∂Ω and
$$c(x)\, V(x)^t\, \mathrm{Cov}_R(x; W_\alpha)\, V(x) \le 1 \quad \text{on } \Omega . \tag{7}$$
Here, the scalar valued function $c : \Omega \to \mathbb{R}_{>0}$ has to be chosen in such a way that a similar amount of smoothing is reached as for isotropic regularization with parameter α(x). For determining a suitable size for c, note that the amount of smoothing induced by the bound (7) can be estimated by the determinant of the matrix c(x) CovR(x; Wα), which, for consistency with the constraint |V(x)| ≤ α(x), should equal $\alpha(x)^{-2n}$. Thus one obtains for the function c the value

$$c(x) = \alpha(x)^{-2} \det\bigl( \mathrm{Cov}_R(x; W_\alpha) \bigr)^{-1/n} .$$

We therefore propose an edge enhancement via solving the minimization problem

$$\begin{aligned}
& J(V) = \int_\Omega \bigl( \operatorname{div} V(x) + f(x) \bigr)^2 dx \to \min , \\
& V(x)^t A(x) V(x) \le 1 \quad \text{almost everywhere on } \Omega , \\
& V(x) \cdot \nu(x) = 0 \quad \text{almost everywhere on } \partial\Omega .
\end{aligned} \tag{8}$$

Here

$$A(x) = \alpha(x)^{-2} \det\bigl( \mathrm{Cov}_R(x; W_\alpha) \bigr)^{-1/n}\, \mathrm{Cov}_R(x; W_\alpha) ,$$

and Wα = Vα/α, where Vα is the solution of (3). Denoting the solution of (8) by VA and defining uA := f + div VA, we obtain an enhanced version of the isotropic total variation minimizer uα.
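The normalization of the anisotropy matrix can be verified numerically for n = 2. The sketch below is illustrative only: the integral over the ball is replaced by an average over a short, made-up list of sample vectors, the names `covariance_2d` and `anisotropy_matrix` are hypothetical, and rejecting a singular covariance is an assumption of this sketch, since the text does not discuss that case.

```python
def covariance_2d(samples):
    """2 x 2 covariance of a list of vectors (wx, wy), averaging over the samples."""
    n = len(samples)
    mx = sum(w[0] for w in samples) / n
    my = sum(w[1] for w in samples) / n
    cxx = sum((w[0] - mx) ** 2 for w in samples) / n
    cyy = sum((w[1] - my) ** 2 for w in samples) / n
    cxy = sum((w[0] - mx) * (w[1] - my) for w in samples) / n
    return [[cxx, cxy], [cxy, cyy]]

def anisotropy_matrix(cov, alpha):
    """A = alpha^-2 * det(Cov)^(-1/n) * Cov for n = 2."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    if det <= 0.0:
        # Assumption of this sketch: a degenerate covariance is rejected, not regularized.
        raise ValueError("covariance must be positive definite")
    c = alpha ** -2 * det ** -0.5
    return [[c * cov[0][0], c * cov[0][1]],
            [c * cov[1][0], c * cov[1][1]]]

samples = [(1.0, 0.0), (0.8, 0.6), (0.6, 0.8), (0.0, 1.0)]  # made-up unit vectors
alpha = 0.5
A = anisotropy_matrix(covariance_2d(samples), alpha)
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det_A, alpha ** -4)  # the two values agree up to rounding
```

By construction the determinant of A equals α to the power −2n, which is the determinant of the matrix describing the isotropic constraint |V(x)| ≤ α(x).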
4 Summary of the Algorithm
We now summarize the method developed in the previous sections for adaptive denoising of a noisy image $f \in L^2(\Omega)$.

Algorithm 1. Set k = 1, choose some initial regularization function $\alpha_1 : \Omega \to \mathbb{R}_{>0}$, a smoothness parameter 0 < θ < 1, some r > 0, R > 0, s > 0, and ε > 0.

1. Compute
$$V_k := \arg\min \bigl\{ J(V) : |V(x)| \le \alpha_k(x) \text{ on } \Omega,\ V \cdot \nu = 0 \text{ on } \partial\Omega \bigr\} .$$
2. Define $W_k := V_k / \alpha_k$ and compute $\Sigma_r(x; W_k)$ (see (4)).
3. If $\| V_k - V_{k-1} \| < \varepsilon$, go to 5.
4. Compute
$$\hat{\alpha}_{k+1}(x) := \alpha_k(x) \bigl( \theta + \Sigma_r(x; W_k)/2 \bigr)^{s} \quad \text{and} \quad \alpha_{k+1}(x) := M_r(x; \hat{\alpha}_{k+1}) ,$$
increase k by one, and go to 1.
5. Compute $\mathrm{Cov}_R(x; W_k)$ (see (6)) and
$$A(x) := \alpha_k(x)^{-2} \det\bigl( \mathrm{Cov}_R(x; W_k) \bigr)^{-1/n}\, \mathrm{Cov}_R(x; W_k) .$$
6. Compute
$$V_A := \arg\min \bigl\{ J(V) : V(x)^t A(x) V(x) \le 1 \text{ on } \Omega,\ V \cdot \nu = 0 \text{ on } \partial\Omega \bigr\} .$$
Define the regularized function $u_A := f + \operatorname{div} V_A$.

In steps 1–4, only the regularization function α is determined. For this, it is not necessary to compute the minimizers of J precisely. Instead, a reasonable approximation of a minimizer is sufficient to provide a good update of α, at least during the first iterations. In particular, if an iterative method is used for the minimization of J, the computation time can be improved by stopping the iteration well before convergence is reached. In the numerical examples below, the functions Vk and VA were computed by alternating between gradient descent steps for the minimization of J and projections of V on the sets $\{ V : |V(x)| \le \alpha_k(x) \}$ and $\{ V : V(x)^t A(x) V(x) \le 1 \}$, respectively. The function Vk−1 was used as initial guess for the computation of Vk.
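The alternation of descent steps and projections used for step 1 can be sketched in one space dimension, which keeps the code short. Everything below is an illustrative assumption rather than the implementation used for the experiments: the signal is a made-up noisy step, V lives on the edges between samples with V = 0 across the boundary, the step size `tau` is chosen below the stability bound 1/2 of this discretization, and the names `div`, `solve_dual` and `total_variation` are hypothetical.

```python
def div(V):
    """Discrete divergence of an edge field V (length N - 1), with V = 0 across the boundary."""
    n = len(V) + 1
    return [(V[i] if i < n - 1 else 0.0) - (V[i - 1] if i > 0 else 0.0)
            for i in range(n)]

def solve_dual(f, alpha, iterations=500, tau=0.25):
    """Projected gradient descent for J(V) = sum((div V + f)^2) subject to |V_i| <= alpha_i."""
    V = [0.0] * (len(f) - 1)
    for _ in range(iterations):
        d = div(V)
        u = [fi + di for fi, di in zip(f, d)]
        for i in range(len(V)):
            step = V[i] + tau * (u[i + 1] - u[i])        # descent step along -grad J
            V[i] = max(-alpha[i], min(alpha[i], step))   # projection onto |V_i| <= alpha_i
    return V

def total_variation(u):
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))

f = [0.0, 0.3, -0.2, 0.1, 1.2, 0.9, 1.1, 1.0]  # noisy step (made-up data)
alpha = [0.2] * (len(f) - 1)                   # may vary from edge to edge
V = solve_dual(f, alpha)
u = [fi + di for fi, di in zip(f, div(V))]     # regularized signal u = f + div V
print(total_variation(f), total_variation(u))
```

Because the bound `alpha` is passed as a list, the same routine accepts a spatially varying regularization parameter, and warm-starting it from the previous dual variable instead of zero would correspond to using Vk−1 as initial guess.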
5 Examples
The algorithm presented in Section 4 is tested by means of two images. The first, synthetic image shows a collection of ellipses and rectangles of different size and intensity (see Figure 2, upper left). These clean data were distorted by normally distributed random noise. In order to illustrate the capability of the algorithm for dealing with varying noise level, the standard deviation of the random noise was chosen to increase towards the bottom right of the image from about 10% of the maximal intensities to 150% (see Figure 2, lower left). Since the original image consists only of simple geometric forms without any texture, it should be perfectly suited for total variation regularization. The changing noise level within the distorted data, however, makes a uniform parameter choice almost impossible: If the regularization parameter is chosen too small, then the noise on the right hand side of the data is barely removed. In particular, the right hand edges of the lower ellipses can hardly be recovered. On the other hand, a too large regularization parameter leads to the disappearance of the small circle at the left hand side of the image (see Figure 2, middle column). Only a very small range of parameters removes the noise reasonably well while still preserving the small scale structure, and even then the contrast deteriorates.

Figure 2, upper right, shows the smoothed image obtained with Algorithm 1. Since the original image is very smooth, the smoothness parameter was chosen rather large as θ = 0.85. The local variation Σr was evaluated on balls with a radius of 3 pixels, the complete image measuring 256 × 256 pixels. The lower right image in Figure 2 shows the distribution of the finally chosen regularization function α. As expected, it increases towards the bottom right, where more noise is present. Over the whole image, the maxima and minima of α differ by a factor of 12.
Fig. 2. Left column: original and noisy image; the noise level increases to the right bottom of the image. Middle column: denoising without parameter adaptation; either small details are lost or the smoothing effects are partially insufficient. Right column, upper row: denoised image for smoothness parameter θ = 0.85. Right column, lower row: logarithm of the finally chosen regularization function α; the minima and maxima of α differ by a factor of 12.
One can see in the resulting image that the noise is efficiently removed. Also, the shape of the two lower ellipses is reconstructed in a reasonable way, considering that rather more noise than signal is present in these regions. Moreover, the small circle on the left is clearly visible, though some contrast was lost. As a second test example, we consider the photographer image. In a first experiment we add different levels of random noise (see Figure 3). The outcome of the adaptive Algorithm 1 (right column) is compared with the solution of standard total variation regularization with constant parameter choice independent of the noise level (middle column). The smoothness parameter for the adaptive algorithm was chosen as θ = 0.60; the regularization parameter for the standard algorithm was selected in such a way that the results for moderate noise level (third row) are comparable. The results show that, as expected, a constant regularization parameter only yields good results for a very specific noise level. For stronger noise, almost no smoothing is obtained, whereas the image is oversmoothed in case it is already quite clean. In contrast, the adaptive algorithm yields comparable results for different noise levels, and is also able to treat noise-free images (first row). In order to illustrate the effect of the smoothness parameter, we apply Algorithm 1 to the noise-free photographer image and vary θ (see Figure 4). For a value of θ = 0.55 mainly the grass and details of the camera are smoothed. As
Fig. 3. Left column: image with Gaussian noise; the noise level increases with each row (σ = 0, 30, 50, 100). Middle column: total variation denoised image with constant parameter choice. The regularization parameter is kept the same for all images. Right column: denoised images with adaptive parameter choice for a smoothness parameter θ = 0.60.
θ increases, more and more details are lost until only the large scale structures in the image remain. Thus, the smoothness parameter works in some sense like the regularization parameter of standard total variation regularization.
Fig. 4. Influence of the smoothness parameter θ. First row: original image and smoothed images with θ = 0.55 and θ = 0.60. Second row: smoothed images with θ = 0.65, θ = 0.70, and θ = 0.80.

Table 1. Comparison between standard TV regularization, the method proposed in [11], and our method for different smoothness parameters. The table provides signal to noise ratios for the photographer image with various levels of Gaussian noise added (σ = 20, 30, 40, 50).

  σ    original   uniform TV   adaptive ([11])   θ = 0.55   θ = 0.60   θ = 0.65
  20     9.47       14.63          15.63           15.30      14.73      13.71
  30     6.31       12.13          13.35           13.28      13.46      12.84
  40     3.86       10.05          11.59           11.47      12.45      12.18
  50     1.93        8.38          10.11           10.00      11.54      11.51
There is, however, a notable difference. In the standard method, the time when structures in the image disappear depends on their scale, which is basically the ratio between contrast, that is, the difference of the intensities of the structure and the background, and the perimeter of the structure. As opposed to this, the model presented here puts much less emphasis on the contrast. Low contrast but distinct parts of the image tend to disappear much later than with uniform regularization. Compare for instance the rightmost building in the images of Figure 4 with the outcome of the standard method (Figure 3, first row, middle image). Finally, Table 1 compares the performance of our algorithm with uniform total variation regularization and the adaptive method from [11]. The regularization parameter for the comparison was chosen in such a way that the norm of the
residual equals the norm of the noise. At small noise levels, the texture enhancing method [11] and even uniform regularization perform better. On the other hand, our algorithm provides good results if much noise is present. Note moreover that the method proposed here does not require an estimate of the noise level, whereas the other methods do.
6 Conclusion
We have introduced an algorithm for the local adaptation of the regularization parameter in total variation regularization applied to the task of image denoising. The main idea of the method is to base the parameter choice on the smoothness of the output image, which is measured in terms of the variation of the direction of its gradient. This variation can be obtained when employing a dual method for the actual minimization of the total variation regularization functional. Starting from an initial guess of the regularization function, the proposed algorithm consecutively computes the corresponding minimizer of the total variation functional and updates the regularization function depending on the smoothness of the update. The iteration stops when the update is sufficiently small. As a post-processing step, we propose to apply an anisotropic regularization method intended to sharpen edges. Again, the regularization is determined by the dual variable. This anisotropic regularization step reduces the contrast loss due to isotropic smoothing and, in particular, is suited for the enhancement of ridges. The examples presented in Section 5 indicate the suitability of the proposed method for denoising images with unknown, varying noise levels. In particular, they show its ability to provide an estimate for the amount of smoothing required to obtain a certain smoothness of the output.
Acknowledgement This work has been supported by the Austrian Science Fund (FWF) within the national research network Industrial Geometry, project 9203-N12.
References

1. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 10(6), 1217–1229 (1994)
2. Aubert, G., Kornprobst, P.: Mathematical problems in image processing. In: Partial differential equations and the calculus of variations, With a foreword by Olivier Faugeras, 2nd edn. Applied Mathematical Sciences, vol. 147. Springer, New York (2006)
3. Burger, M., Osher, S.: Convergence rates of convex variational regularization. Inverse Probl. 20(5), 1411–1421 (2004)
4. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vision 20(1–2), 89–97 (2004)
5. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
6. Davies, P.L., Kovac, A.: Local extremes, runs, strings and multiresolution. Ann. Statist. 29(1), 1–65 (2001)
7. Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. North-Holland, Amsterdam (1976)
8. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
9. Frigaard, I.A., Ngwa, G., Scherzer, O.: On effective stopping time selection for visco-plastic nonlinear BV diffusion filters used in image denoising. SIAM J. Appl. Math. 63(6), 1911–1934 (electronic) (2003)
10. Frigaard, I.A., Scherzer, O.: Herschel–Bulkley diffusion filtering: non-Newtonian fluid mechanics in image processing. Z. Angew. Math. Mech. 86(6), 474–494 (2006)
11. Gilboa, G., Sochen, N., Zeevi, Y.Y.: Variational denoising of partly-textured images by spatially varying constraints. IEEE Trans. Image Process. 15(8), 2281–2289 (2006)
12. Ito, K., Kunisch, K.: Augmented Lagrangian methods for nonsmooth, convex optimization in Hilbert spaces. Nonlinear Anal. 41A, 591–616 (2000)
13. Nashed, M.Z., Scherzer, O.: Least squares and bounded variation regularization with nondifferentiable functional. Numer. Funct. Anal. Optim. 19(7–8), 873–901 (1998)
14. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)
15. Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2008)
16. Strong, D.M.: Adaptive Total Variation Minimizing Image Restoration. CAM Report 97-38, University of California, Los Angeles (1997)
17. Strong, D.M., Aujol, J.-F., Chan, T.F.: Scale recognition, regularization parameter selection, and Meyer's G norm in total variation regularization. Multiscale Model. Simul. 5(1), 273–303 (electronic) (2006)
Basic Image Features (BIFs) Arising from Approximate Symmetry Type

Lewis D. Griffin¹, Martin Lillholm¹, Mike Crosier¹, and Justus van Sande²

¹ Computer Science, University College London, London WC1E 6BT, UK
² Biomedical Engineering, Eindhoven University of Technology, The Netherlands
[email protected]

Abstract. We consider detection of local image symmetry using linear filters. We prove a simple criterion for determining if a filter is sensitive to a group of symmetries. We show that derivative-of-Gaussian (DtG) filters are excellent at detecting local image symmetry. Building on this, we propose a very simple algorithm that, based on the responses of a bank of six DtG filters, classifies each location of an image into one of seven Basic Image Features (BIFs). This effectively and efficiently realizes Marr's proposal for an image primal sketch. We summarize results on the use of BIFs for texture classification, object category detection, and pixel classification.

Keywords: Gaussian Derivatives, Hermite Transform, Group Theory.
1 Introduction

Previous schemes for detection of image symmetry are fairly complex [1-6]; requiring, for example, comparison of the outputs of filters at multiple positions. Herein we show that symmetries may be detected by single linear filters. Building on this we present a simple algorithm that computes a Marr-type primal sketch [7] by categorizing local image structure according to its approximate symmetry. The paper is organized as follows. In section 2 we present results on image symmetries. In 3 we show how to test whether a linear filter is sensitive to a symmetry. In 4 we review image measurement with derivative-of-Gaussian (DtG) filters. In 5 we consider the symmetry-sensitivity of these DtG filters. In 6 we show how this sensitivity gives rise to a system of Basic Image Features (BIFs). In 7 we summarize results on using BIFs for texture categorization, object category detection and pixel classification. In 8 we conclude. Sections 2-5 are a distillation of work published, in press and under review in fuller form elsewhere [8-14]; 6 is new; parts of 7 have been presented or are under review in fuller form elsewhere [9, 11].
2 Image Symmetries

Symmetry of a structure (X) is always relative to some class of admissible transformations. A structure is said to have a symmetry when a non-trivial group of admissible transformations, known as the automorphism group, each leave it indistinguishable from the original. This is denoted by Aut[X] := {t | t ∘ X = X}.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 343–355, 2009. © Springer-Verlag Berlin Heidelberg 2009
Considering images, an obvious class of transformations are the spatial isometries; and the possible symmetries, relative to this class, have long been catalogued [15-17]. A broader class of transformations, where each spatial isometry is combined with a permutation of a finite set of image 'colour' values, has also been considered. These allow the symmetries of, for example, Escher's 'Reptiles' to be expressed [18]. The gamut of possible 'colour symmetries' has been fully determined [19, 20]. We propose that the class of 'image isometries', defined as a spatial isometry combined with an intensity isometry, is appropriate for images. We write an image isometry as g = (i, s), where i : ℝ → ℝ is an intensity isometry, and s : ℝ² → ℝ² is a spatial isometry. Such an image isometry is applied to an image I : ℝ² → ℝ according to g ∘ I = i ∘ I(s ∘ _) = i(I(s(_))). Choosing a class of transformations is tantamount to choosing a geometry [21], and the geometry that corresponds to the class of image isometries has previously been considered for images [22] and much earlier, abstractly, as one of a larger class of possible geometries [23].

We have employed a method for determining the possible automorphism groups of images, relative to the class of image isometries. The method relies on two results. First, that the projection of a group of image isometries onto their spatial or intensity components in both cases makes a group. Second, that (except for a special case) the intensity projection group must be isomorphic to a factor group of the spatial projection group [8]. Using the method, we have determined the possible automorphism groups of 2-D images, except for cases that contain discrete periodic translations. A summary of these possible symmetries, together with our notational system, is shown in fig. 1. The symmetries include: familiar ones, such as reflectional (J2,1), reflect-and-negate (J6,1), and Yin-Yang type (J7,2); simple but often ignored ones, such as variation in one direction only (J3); simple but novel, such as continuous translate-and-increment in one direction, plus a line of reflection parallel to that direction (J11); and some wholly novel, such as continuous translate-and-increment in one direction, plus a continuous line of centres of Yin-Yang type symmetry (J12).
3 Sensitivity of Linear Filters to Symmetries

Detection of a symmetry seems to require multiple measurements, but this is incorrect. Consider a +1/-1 filter, such as used in finite-difference schemes. When positioned so that it straddles a putative line of reflection, a necessary criterion for the symmetry is that the filter gives a 0 response. We generalize this: a filter F is sensitive to a symmetry K if it gives the same response to all images that have the symmetry (i.e. ∃ f ∈ ℝ such that Aut[I] ⊇ K ⇒ ⟨F | I⟩ = f). This definition is impractical because it requires assessment across all images. However, we have found a necessary and sufficient test that requires only a single integral to be computed. We present this below in Theorem 1, after introducing some notation.
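The +1/-1 example can be made concrete with a 1-D signal. The following is a minimal sketch with made-up sample values and a hypothetical helper `straddle_response`; it only illustrates that a zero response is necessary, not sufficient, for mirror symmetry about the straddled line.

```python
def straddle_response(signal, k):
    """Response of the +1/-1 filter placed on samples k and k + 1."""
    return signal[k + 1] - signal[k]

symmetric = [5, 2, 7, 7, 2, 5]    # mirror line between samples 2 and 3
asymmetric = [5, 2, 7, 9, 2, 5]   # symmetry broken next to the line
also_zero = [1, 2, 7, 7, 3, 4]    # zero response, yet not mirror-symmetric

print(straddle_response(symmetric, 2))   # 0
print(straddle_response(asymmetric, 2))  # 2
print(straddle_response(also_zero, 2))   # 0
```

A non-zero response rules the symmetry out, while the third signal shows that a zero response alone does not establish it.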
Fig. 1. The group/subgroup lattice of the possible image symmetries, excluding those with discrete periodic translation
We use an inner product notation $\langle F \,|\, I \rangle := \int_{\vec{x} \in \mathbb{R}^2} F(\vec{x})\, I(\vec{x})$ to denote the measurement of an image I : ℝ² → ℝ by a filter F : ℝ² → ℝ; and we define an operator $F^{(K)} := \sum_{(i, s) \in K} i\, s \circ F$ which, roughly speaking, 'smears' a filter by a group.

Theorem: Symmetry-Sensitivity Test for Filters

F is sensitive to K if and only if $\langle F \,|\, F^{(K)} \rangle = 0$.

Proof. A formal proof will be published elsewhere [14]. Intuitively the truth of the theorem can be understood as follows. The signal that a filter 'sees' best is a copy of itself. Of all the symmetric signals, a symmetrised version of the filter should be the most easily seen. If the filter cannot see a symmetrised version of itself, then it is insensitive to the symmetry.
4 Gaussian Derivative Filters

Gaussian Derivative (DtG) filters are defined in 1-D by

$$G_\sigma(x) := (2\pi\sigma^2)^{-1/2}\, e^{-\frac{x^2}{2\sigma^2}} , \qquad G_\sigma^{(n)}(x) := \frac{d^n}{dx^n} G_\sigma(x) = \left( \frac{-1}{\sigma\sqrt{2}} \right)^{n} H_n\!\left( \frac{x}{\sigma\sqrt{2}} \right) G_\sigma(x) ,$$

where $H_n$ is the nth Hermite polynomial; and in 2-D by

$$G_\sigma^{(m,n)}(x, y) := G_\sigma^{(m)}(x)\, G_\sigma^{(n)}(y) .$$
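The Hermite form of the derivatives can be checked against the familiar closed forms of the first and second derivative of a Gaussian. This is a sketch only: it assumes the physicists' Hermite polynomials (H_0 = 1, H_1 = 2t, H_{k+1} = 2t H_k − 2k H_{k−1}), and the function names `hermite`, `gaussian`, `dtg` and `dtg_2d` are hypothetical.

```python
import math

def hermite(n, t):
    """Physicists' Hermite polynomial H_n(t) via H_{k+1} = 2 t H_k - 2 k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * t
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h

def gaussian(x, sigma):
    """Normalized 1-D Gaussian of standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def dtg(n, x, sigma):
    """n-th derivative of the 1-D Gaussian, written in the Hermite form."""
    scale = sigma * math.sqrt(2.0)
    return (-1.0 / scale) ** n * hermite(n, x / scale) * gaussian(x, sigma)

def dtg_2d(m, n, x, y, sigma):
    """Separable 2-D filter: product of an m-th derivative in x and an n-th in y."""
    return dtg(m, x, sigma) * dtg(n, y, sigma)

x, sigma = 0.7, 1.3
g = gaussian(x, sigma)
print(dtg(1, x, sigma), -x / sigma ** 2 * g)                           # first derivative, two ways
print(dtg(2, x, sigma), (x ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g)  # second derivative, two ways
```

Sampling `dtg_2d` on a pixel grid for 0 ≤ m + n ≤ 2 would give a discrete version of the six filters of the 2nd order family; truncation and discretization effects are not addressed here.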
DtG filters are used as a general-purpose method to probe an image location (which for simplicity we assume is at the origin) by computation of inner products $j_{mn} = \langle G_\sigma^{(m,n)} \,|\, I \rangle$. Typically, one measures with a family of DtG filters up to some order, e.g. the 2nd order family $\{ G_\sigma^{(m,n)} \mid 0 \le m + n \le 2 \}$. Scale-normalized filter responses $c_{pq} := \sigma^{p+q} j_{pq}$ make later equations simpler. The suitability of DtGs as the front-end of an uncommitted computational vision system arises from the symmetries that individual filters and families possess [24]. First amongst these is a scale symmetry, which manifests as a change of size, but not of shape, when a DtG is rescaled by blurring with a Gaussian kernel. Second is that the linear span of a family of DtGs is rotationally symmetric. The responses of a bank of DtG filters entangle intrinsic and extrinsic aspects of image structure. For example, an in-plane rotation of the image about the measurement point causes the DtG responses to change. A representation that
disentangles these aspects for measurement up to 2nd order has been developed [13]. The representation works by factoring out of the 6-D 2nd order DtG response space the changes due to the group of image isometries that fix the measurement point and do not invert the intensity axis, which we denote D∞(0) × A+(1).

Fig. 2. The top part illustrates schematically the probing of an image patch by a bank of DtG filters (the 2nd order DtG family) resulting in a point in 6-D jet space; the bottom, the factoring of the jet space by a group of transformations (D∞(0) × A+(1), the group of centred rotations and reflections, and positive affine intensity re-scalings) resulting in the 2nd order local-image-structure orbifold
The result is an orbifold (a type of manifold with boundaries, creases and corners allowed) consisting of a 3-D and a 0-D component (figure 2). The intrinsic aspect of a 6-tuple of filter responses corresponds to a particular location in the orbifold, and is invariant to rotating the image about the measurement point, reflecting it in a line through the measurement point, or affinely scaling the intensity. When the responses of the 1st and 2nd order DtG filters are all zero, the intrinsic aspect is the 0-D part of the orbifold; all other responses map to the 3-D component. A coordinate system $(l, b, a) \in \left[ -\tfrac{\pi}{2}, \tfrac{\pi}{2} \right] \times \left[ 0, \tfrac{\pi}{2} \right] \times \left[ 0, \tfrac{\pi}{2} \right]$ for the 3-D component is given by [13]:

$$l = \arctan\!\left( \sqrt{ 4 \left( c_{10}^2 + c_{01}^2 \right) + ( c_{20} - c_{02} )^2 + 4 c_{11}^2 } ,\; c_{20} + c_{02} \right)$$

$$b = \arctan\!\left( 2 \sqrt{ c_{10}^2 + c_{01}^2 } ,\; \sqrt{ ( c_{20} - c_{02} )^2 + 4 c_{11}^2 } \right)$$

$$a = \tfrac{1}{2} \left| \arctan\!\left( \left( c_{01}^2 - c_{10}^2 \right) ( c_{02} - c_{20} ) + 4 c_{10} c_{01} c_{11} ,\; 2 \left( \left( c_{01}^2 - c_{10}^2 \right) c_{11} + c_{10} c_{01} ( c_{20} - c_{02} ) \right) \right) \right|$$

The orbifold has been equipped with a metric, induced by one on the filter response space, which expressed as a line element in the lba-system is

$$ds^2 = dl^2 + \cos^2 l \left( db^2 + da^2\, 2\, ( 5 - 3 \cos 2b )^{-1} \sin^2 2b \right) .$$

The orbifold is intrinsically curved, but it can be embedded into Euclidean 3-space with only mild distortion.
5 Symmetry-Sensitivity of DtG Filters

Using the elements of sections 2-4, we can determine which DtG filters are sensitive to which symmetries. We consider not just canonical filter forms (e.g. an x-derivative) but any linear combination of filters in the 2nd order filter family. This allows us to determine the symmetry-sensitivity of the entire filter family, independent of the particular basis filters used. For example, while the x-derivative filter is sensitive to a reflectional symmetry with a vertical mirror line through the measurement point, the x- and y-derivatives together are sensitive to any reflectional symmetry in a line through the measurement point.

Fig. 3. The sensitivity-submanifolds (SS) of different symmetry types are shown in red. The different possible SS are arranged in a lattice induced by inclusion relations. The symmetry type labels correspond to those used in figure 1. Superscripts indicate the spatial relationship between the symmetry and the origin: a c indicates origin-centred rotation; an a+ that the origin is contained in a line of reflection, but is not a centre of rotation; similarly for a- and anti-reflections; a g indicates general position, neither centred nor aligned. All symmetries labelled in a box have the indicated SS; those on the left are minimal.
The filter family sensitivities can be projected into the orbifold to determine where the intrinsic component of the jet responses must lie whenever the image has any of a class of symmetries equivalent by conjugation with an element of D∞ ( 0 ) × A+ (1) . We
call the restricted set of possible responses the sensitivity-submanifold (SS). For example, the SS is the orbifold exterior (a = 0 ∨ a = π/2) for reflectional symmetry in a mirror line through the measurement point. The results are summarized in fig. 3.
6 Symmetry-Based Basic Image Features (BIFs)

We have used the symmetry sensitivities of the DtG filters as a starting point in defining a set of Basic Image Features (BIFs) that realize Marr's idea of a primal sketch of image structure, in a computationally simple scheme. We do not claim that the scheme is derived as rigorously as the results on symmetry sensitivity. Our scheme works by considering the orbifold projection of jets, and classifying them according to the SS that they are closest to, i.e. we define a Voronoi cell partitioning of the orbifold with the SS as cell centres. We find that this works best when only seven 0-D SS (the first and second rows of figure 3) are used, though we cannot justify this beyond that it produces nice results. The resulting orbifold partitioning is shown in the top-left of figure 4.
Fig. 4. Top left: the partitioning of the orbifold into BIF categories. Bottom left: BIFs calculated across a range of scales for a simple image of a figure ‘8’; in each cube scale increases right-to-left. Lower cubes sectioned for visualisation. Right: an example complex greyscale image, with BIFs calculated at one particular scale.
The orbifold distances to the six of these SS that lie in the 3-D component of the orbifold are simple to compute; for example, the distance to the $J^{c}_{7,2}$ SS is

$$\tan^{-1} \sqrt{ \frac{ \tfrac{1}{2} c_{20}^2 + c_{11}^2 + \tfrac{1}{2} c_{02}^2 }{ c_{10}^2 + c_{01}^2 } } .$$

To find which distance is shortest it is computationally equivalent but simpler to find which of six quantities is maximum. The distance to the seventh SS, which corresponds to the origin of jet space where all the 1st and 2nd derivative filters have zero response, is not well-defined. We incorporate it into our scheme by using a multiple of the 0th order jet response. The full resulting scheme for computing BIFs is as follows.

i) Compute scale-normalized DtG filter responses $c_{pq}$ as described in section 4.
ii) Compute $\lambda := \tfrac{1}{2} ( c_{20} + c_{02} )$ and $\gamma := \sqrt{ \tfrac{1}{4} ( c_{20} - c_{02} )^2 + c_{11}^2 }$.
iii) Classify according to which is the largest of

$$M = \left\{ \varepsilon\, c_{00} ,\; \sqrt{ c_{10}^2 + c_{01}^2 } ,\; \lambda ,\; -\lambda ,\; ( \gamma + \lambda ) / \sqrt{2} ,\; ( \gamma - \lambda ) / \sqrt{2} ,\; \gamma \right\} .$$

In our scheme the only free parameters that have to be tuned to the application are the filter scale σ and ε, which controls the amount of image classified as flat; a setting of ε = 0.05 is an effective default. For display purposes we find the following colour scheme effective: if $\varepsilon\, c_{00}$ is the largest of M then colour the pixel pink; if $\sqrt{ c_{10}^2 + c_{01}^2 }$ is largest colour it grey; then, in the order of M, black, white, blue, yellow and green.
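The classification step can be sketched for a single set of responses. The code assumes the rule reads as follows: λ = (c20 + c02)/2, γ = sqrt((c20 − c02)²/4 + c11²), and the winner is the largest of ε·c00, sqrt(c10² + c01²), λ, −λ, (γ + λ)/√2, (γ − λ)/√2 and γ, coloured in that order. The function name `classify_bif`, the list `BIF_COLOURS` and the test responses are hypothetical, and ties are resolved in favour of the earlier entry, a choice the text does not specify.

```python
import math

# Colours in the order of the seven quantities compared in the classification step.
BIF_COLOURS = ["pink", "grey", "black", "white", "blue", "yellow", "green"]

def classify_bif(c00, c10, c01, c20, c11, c02, eps=0.05):
    """Return the colour label of the largest of the seven quantities."""
    lam = 0.5 * (c20 + c02)
    gam = math.sqrt(0.25 * (c20 - c02) ** 2 + c11 ** 2)
    scores = [
        eps * c00,                      # flat
        math.hypot(c10, c01),           # 1st order structure dominates
        lam,
        -lam,
        (gam + lam) / math.sqrt(2.0),
        (gam - lam) / math.sqrt(2.0),
        gam,
    ]
    return BIF_COLOURS[scores.index(max(scores))]  # ties go to the earlier entry

print(classify_bif(0.0, 1.0, 0.0, 0.0, 0.0, 0.0))   # pure gradient
print(classify_bif(0.0, 0.0, 0.0, 1.0, 0.0, -1.0))  # equal and opposite 2nd derivatives
```

Applied per pixel to the six scale-normalized responses, this yields a label image; the two free parameters remain the filter scale used to compute the responses and `eps`.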
7 Example Applications Using BIFs

We summarize results on using BIFs for texture, object and pixel classification.

7.1 Texture Classification
Textures are often classified based on a representation of them by a histogram over a texton vocabulary [25-29]. Textons are categorical patch classifications [25, 30]. To define the texton vocabulary, a space of patch descriptions is typically Voronoi partitioned into on-the-order-of 1000 texton categories, usually around centres found by k-means clustering of the responses from many images. Textures are then classified by nearest-neighbour matching of histograms. We have investigated the classification performance of an approach in which images are labelled using spatial complexes of BIFs instead of Voronoi cells in a local description vector space. Our approach is (i) simpler because we have eliminated the clustering step needed to produce a dictionary of features, and (ii) faster because we assign image patches to histogram bins without having to use a high-dimensional nearest-neighbour computation. We call the spatial complexes of BIFs that we use analogously to textons, Basic Image Patterns (BIPs). The type of BIP that we have found effective for texture description is a scale-template of the BIFs at the same location but at four, octave-separated scales. Unlike spatial-template BIPs, these scale-templates retain the rotation invariance of BIFs, which has been shown [30] to be advantageous in texture classification tasks. For textures, we do not use the pink/flat BIF category, so four scales produce a 6^4 = 1296 bin histogram representation, which seems to capture the right trade-off between specificity and generality (see figure 5).
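The size of the histogram follows from reading each column of four BIF labels as a four-digit number in base six. The sketch below assumes that the six non-flat categories are encoded as integers 0 to 5 and that each location is given as a 4-tuple with one label per scale; the names `column_bip_index` and `column_bip_histogram` and the sample columns are hypothetical.

```python
def column_bip_index(labels):
    """Map four BIF labels (each 0..5, one per scale) to a bin index in 0..1295."""
    index = 0
    for label in labels:
        if not 0 <= label < 6:
            raise ValueError("expected one of six non-flat BIF labels")
        index = index * 6 + label
    return index

def column_bip_histogram(columns):
    """Normalized occurrence histogram over the 6**4 = 1296 possible columns."""
    hist = [0.0] * 6 ** 4
    for labels in columns:
        hist[column_bip_index(labels)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

columns = [(0, 1, 2, 3), (0, 1, 2, 3), (5, 5, 5, 5)]  # made-up label columns
hist = column_bip_histogram(columns)
print(len(hist), hist[column_bip_index((0, 1, 2, 3))])
```

No clustering or nearest-neighbour search is involved in the binning, which reflects the simplicity and speed argument made in the text.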
Fig. 5. Left: An image from the CUReT 'polyester' texture class. Centre: BIFs computed at four octave-separated scales, stacked to form an array of 'column-BIPs'. Right: Occurrence histogram of column-BIPs from every position in the image.
Our method has been tested on the CUReT texture dataset [31]. As reported in [9], the simple column-BIP representation and nearest-neighbour matching using the Bhattacharyya distance correctly classifies 98.2±0.1% of the remaining 49 images per class, which is at least as good as other methods using nearest-neighbour classifiers. Extending this method by using a multi-scale histogram comparison [9] results in an improvement to 98.6±0.1% on CUReT, which is comparable to methods [27] using SVMs for classification; and produces what are, to the best of our knowledge, the best reported results [9] on the more challenging UIUCTex [32] and KTH-TIPS [27] datasets, which include variations in scale.
7.2 Object Categorization
Texton approaches have also been shown to be useful for object categorization [28, 33]. Similar to texture, the ‘standard’ approach is to partition a patch descriptor space, such as that used by SIFT [34, 35], into on the order of 1000 categories (visual words), then to describe each image to be analyzed by the visual words it contains, and to use machine learning techniques to determine a classifier that can predict the category of object based on such descriptions. We have conducted preliminary experiments to assess whether visual words built from BIFs could be used rather than SIFT-space categories. As with texture, this would be simpler and faster. For our initial experiments, we have labelled pixels according to their BIF type and, inspired by SIFT, with an orientation quantized at the $\pi/4$ level. The orientation depends on the BIF type: grey BIFs have one of eight possible orientations based on 1st order structure; yellow, green and blue BIFs have one of four possible orientations based on 2nd order structure; black, white and pink BIFs are unoriented. Thus we have twenty-three (8 + 3 × 4 + 3) possible orientation-augmented BIF (oBIF) labels. oBIFs are a natural 2nd order generalization of the gradient orientation alphabet typically used in SIFT [34, 35]. See figure 6 for an example image and calculated oBIFs.
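The count of twenty-three oBIF labels can be reproduced with a short enumeration. The sketch below assumes that first-order (grey) structure has a direction, so its angles cover a full turn, whereas second-order structure has only an axis, so its angles cover a half turn; this reading is consistent with the eight and four orientations quoted above, but the label encoding itself is hypothetical.

```python
import math

STEP = math.pi / 4  # orientation quantisation stated in the text

# Angular range per oriented BIF class (assumed interpretation):
# a direction spans [0, 2*pi), an axis spans [0, pi).
ANGLE_RANGE = {
    "grey": 2 * math.pi,
    "yellow": math.pi,
    "green": math.pi,
    "blue": math.pi,
}
UNORIENTED = ("black", "white", "pink")

def obif_labels():
    """Enumerate (class, orientation index) pairs; None marks unoriented classes."""
    labels = [(name, None) for name in UNORIENTED]
    for name, span in ANGLE_RANGE.items():
        n_orientations = round(span / STEP)
        labels += [(name, i) for i in range(n_orientations)]
    return labels

print(len(obif_labels()))  # 8 + 3*4 + 3 labels
```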
Fig. 6. The top row (left) shows an image from the PASCAL challenge, labeled with direction-augmented BIFs at right. On the bottom are shown the 4×4 template BIPs whose occurrence in an image most informatively signals the presence of a car.
We have tested three different types of visual word, which when built from BIFs or oBIFs we call Basic Image Patterns (BIPs): two based on geometrical partitioning of patch space and one based on more standard data-driven quantization. Each BIP system has been used with simple un-optimised off-the-shelf classifiers and applied to the 20-class PASCAL VOC 2008 [33] object recognition challenge dataset. Our score in figure 7 is based on a late fusion of the three schemes and is mid-field: above other first-time entrants and below well-optimised veteran entries. Using the PASCAL VOC 2008 [33] dataset, examples of 4×4 template BIPs whose presence in images is approximately independent, and which are maximally informative for the ‘car’ category, are shown in figure 6.
[Bar chart: one bar per challenge entry, horizontal axis from 0 to 50.]
Fig. 7. Results for the PASCAL VOC 2008 challenge. Each bar in the chart is a challenge entry - our result is highlighted.
7.3 Pixel Classification
Many image problems involve inferring one of a small class of labels for each pixel of an image. For complex images with unpredictable global structure, most approaches balance the likelihood of the labels, given the local image structure, and the likelihood of the local arrangement of inferred labels. In both cases the likelihoods are computed on the basis of statistics learnt from groundtruth-labelled training data. We have experimented with the use of BIFs in the computation of label likelihoods given the image, i.e. ignoring the likelihood of arrangements of inferred labels. For our experiments we have used 2-D Electron Microscopy images of neuronal grey matter tissue, stained to enhance neuronal membranes. We trained on four images with hand-drawn groundtruth data indicating the position of membranes, and evaluated on a further four images.
We use a k-Nearest Neighbour (k-NN) approach to classification. k-NN classification starts by compiling a list of descriptors of all the patches in the training data, together with the groundtruth label of the pixel in the patch centre. The classifier is used by extracting a patch around each pixel to be classified, forming a description of it, comparing the description to each of the compiled descriptions, finding the k which are most similar, and assigning the pixel being analyzed the label associated with the majority of the k.
We evaluated a baseline solution based on pixel values. The distance between two patch descriptions is simply the Euclidean distance between their blurred pixel values, minimized over allowing one patch to be rotated or reflected into eight configurations. We jointly optimize blur, patch size and k. The best settings that we find are: no blur, 7×7 patches, and k=14. At these settings membrane-labelled pixels overlap (intersection divided by union) with the groundtruth by 48%.
Our solution uses a patch of BIF labels as a patch descriptor. The distance between two descriptors is simply the number of pixels where the label does not agree. As in the baseline, we minimize the distance over one of the patches being rotated or reflected. We jointly optimize the scale (σ) at which the BIFs are computed, the parameter ε which controls the amount of the flat BIF class, patch size and k. The best settings that we find are σ = 1.2, ε = 0.15, 9×9 patches, and k=10. At these settings we achieve an overlap of 55%. See figure 8.
[Figure 8 panels: image; groundtruth; greylevel-based classification; BIF-based classification; BIFs]
Fig. 8. Typical results of our pixel-classification system
So, using BIF- rather than greylevel-description raises the score from 48% to 55%. Computation is also faster because the kNN lookup dominates the cost of computing patch descriptions, and with BIFs the distances that need to be computed are of a Hamming rather than Euclidean type.
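The label-patch distance and majority vote used in this section can be sketched as follows. This is a brute-force illustration under stated assumptions: patches are small square integer arrays of BIF labels, ties are broken by training order, and the function names are hypothetical; the paper does not describe its search implementation.

```python
import numpy as np
from collections import Counter

def dihedral_variants(patch):
    """The eight rotated/reflected configurations of a square patch."""
    variants = []
    for quarter_turns in range(4):
        rotated = np.rot90(patch, quarter_turns)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants

def patch_distance(a, b):
    """Number of disagreeing labels (Hamming distance), minimised over the
    eight configurations of b."""
    return min(int(np.count_nonzero(a != v)) for v in dihedral_variants(b))

def knn_label(query, train_patches, train_labels, k):
    """Majority label among the k training patches closest to the query."""
    distances = [patch_distance(query, p) for p in train_patches]
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Comparing labels for equality needs only integer comparisons and a count, with no subtraction, squaring or floating-point accumulation, which is one way to see why the Hamming-type distance is cheaper than the Euclidean one.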
8 Conclusions
We have derived a scheme for classifying image structure into one of seven BIF types based on the outputs of a bank of six DtG filters. Applied to an entire image, the output realizes Marr’s notion of an image primal sketch. Presented results show that BIF description is simple, fast and effective for texture, object and pixel classification. The BIF system was derived by considering the sensitivity of DtG filters to image symmetry. Although the final algorithm is pleasingly simple, there are some weak points in the derivation of the BIFs from symmetry sensitivities. Specifically, why are only the 0-D SS considered, how exactly does orbifold distance correspond to degree of failure of symmetry, why should least-approximate local symmetry be an effective feature label? We hope that the foundation of symmetry-sensitivity of DtGs can eventually answer all of these questions in a scheme where arbitrary choice has been eliminated. Such a scheme will be extendable to higher-order filter families (where appeal to visual evidence and past practice are less effective), for which a richer alphabet of feature labels is to be expected. We predict that such a richer alphabet will give more effective solutions in the application areas that we have reviewed.
References
1. Liu, Y.X., Collins, R.T., Tsin, Y.H.: A computational model for periodic pattern perception based on frieze and wallpaper groups. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(3), 354–371 (2004)
2. Scognamillo, R., et al.: A feature-based model of symmetry detection. Proceedings of the Royal Society of London Series B-Biological Sciences 270(1525), 1727–1733 (2003)
3. Mellor, M., Brady, M.: A new technique for local symmetry estimation. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 38–49. Springer, Heidelberg (2005)
4. Bonneh, Y., Reisfeld, D., Yeshurun, Y.: Quantification of local symmetry - application to texture-discrimination. Spatial Vision 8(4), 515–530 (1994)
5. Mancini, S., Sally, S.L., Gurnsey, R.: Detection of symmetry and anti-symmetry. Vision Research 45(16), 2145–2160 (2005)
6. Baylis, G.C., Driver, J.: Perception of symmetry and repetition within and across visual shapes: Part-descriptions and object-based attention. Visual Cognition 8(2), 163–196 (2001)
7. Marr, D.: Vision. W H Freeman & co., New York (1982)
8. Griffin, L.D.: Symmetries of 1-D Images. Journal of Mathematical Imaging and Vision 31(2-3), 157–164 (2008)
9. Crosier, M., Griffin, L.D.: Texture classification with a dictionary of basic image features. In: CVPR 2008. IEEE, Los Alamitos (2008)
10. Lillholm, M., Griffin, L.D.: Statistics and category systems for the shape index descriptor of local image. Image and Vision Computing (in press) (2008)
11. Lillholm, M., Griffin, L.D.: Novel image feature alphabets for object recognition. In: ICPR 2008 (2008)
12. Griffin, L.D.: Symmetries of 2D images: cases without periodic translation. Journal of Mathematical Imaging and Vision (in press)
13. Griffin, L.D.: The 2nd order local-image-structure solid. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(8), 1355–1366 (2007)
14. Griffin, L.D., Lillholm, M.: Symmetry-sensitivity of derivative of gaussian filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (in press)
15. Bieberbach, L.: Über die bewegungsgruppen der euklidischen raume I. Mathematische Annalen 70, 297 (1911)
16. Conway, J.H., et al.: On three-dimensional space groups. Contributions to Algebra and Geometry 42(2), 475–507 (2001)
17. Grünbaum, B., Shephard, G.C.: Tilings and Patterns. WH Freeman & co., New York (1987)
18. Schattschneider, D.: MC Escher. Visions of Symmetry. Plenum Press (1990)
19. Holser, W.T.: Classification of symmetry groups. Acta Crystallographica 14, 1236–1242 (1961)
20. Loeb, A.A.: Color and Symmetry. Robert E. Krieger (1978)
21. Klein, F.: A comparative review of recent researches in geometry (trans. by MW Haskell). Bulletin of the New York Mathematical Society 2, 215–249 (1892)
22. Koenderink, J.J., van Doorn, A.J.: Image processing done right. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 158–172. Springer, Heidelberg (2002)
23. Cayley, A.: Sixth memoir upon the quantics. Philosophical Transactions of the Royal Society 149, 61–70 (1859)
24. Koenderink, J.J., van Doorn, A.J.: Generic Neighborhood Operators. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(6), 597–605 (1992)
25. Varma, M., Zisserman, A.: Texture classification: are filter banks necessary? In: CVPR 2003. IEEE, Los Alamitos (2003)
26. Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. International Journal of Computer Vision 62(1), 61–81 (2005)
27. Hayman, E., et al.: On the significance of real-world conditions for material classification. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, pp. 253–266. Springer, Heidelberg (2004)
28. Zhang, J., et al.: Local features and kernels for classification of texture and object categories: a comprehensive study. In: CVPR 2006 (2006)
29. Perronnin, F., et al.: Adapted vocabularies for generic visual categorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 464–475. Springer, Heidelberg (2006)
30. Varma, M., Zisserman, A.: Unifying Statistical Texture Classification Frameworks. Image and Vision Computing (in press) (2005)
31. Cula, O.G., Dana, K.J.: Compact representation of bidirectional texture functions. In: CVPR 2001. IEEE, Los Alamitos (2001)
32. Lazebnik, S.C., Schmid, C., Ponce, J.: A sparse texture representation using local affine regions. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1265–1278 (2005)
33. Csurka, G., et al.: Visual categorization with a bag of keypoints. In: ECCV 2004, pp. 1–22 (2004)
34. Lowe, D.G.: Towards a computational model for object recognition in IT cortex. In: Biologically Motivated Computer Vision, Proceeding, pp. 20–31 (2000)
35. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
An Anisotropic Fourth-Order Partial Differential Equation for Noise Removal
Mohammad Reza Hajiaboli
Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
[email protected]
Abstract. Fourth-order nonlinear diffusion filters are isotropic filters in which the strength of diffusion is reduced at regions with strong image features, such as edges or texture, leading to their preservation. However, the parameter choice in the numerical solver of these filters that minimizes distortion of the image features results in a very slow convergence rate and in the formation of speckle noise on the denoised image, especially when the noise level is moderately high. In this paper, a new fourth-order nonlinear diffusion filter is introduced, which has an anisotropic behavior on the image features. It is shown that a suitable design of a set of diffusivity functions, which unevenly control the diffusion in the directions of the level set and the gradient, leads to a fast convergent filter with a good edge preservation capability. A comparison of the results obtained by the proposed filter with those of the classical and recently developed techniques shows that the proposed method produces a noticeable improvement in the quality of denoised images, evaluated subjectively and quantitatively, as well as a substantial increase in the convergence rate compared to the classical filter.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 356–367, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Introduction
Nonlinear diffusion denoising filters are known for their good edge preservation capabilities. In these techniques, the denoised image is a solution of a partial differential equation (PDE). The first of these denoising methods was introduced by Perona and Malik [1] in 1990 and is based on solving a nonlinear second-order PDE (the so-called Perona-Malik equation). Since then, there has been a great deal of research in this field, which has led to the introduction of a variety of nonlinear diffusion denoising techniques (see [2], [3] as a few examples). In spite of the good edge preservation obtained by these techniques, they tend to produce blocky effects in the images [4]. In fact, the solution of the Perona-Malik equation is piecewise constant, and therefore these filters create blocky effects on the smooth regions of the image. A spatially regularized version of the nonlinear diffusion filter has been introduced by Catte et al. [2] to reduce the formation of these artifacts on the denoised image. You and Kaveh [4] proposed a more effective solution to this problem by using a fourth-order PDE for noise removal, where a planar approximation of the noisy image is supported in the solution of the PDE, resulting in a significant improvement of the ramp edge preservation and a dramatic reduction of blocky effects. Based on this idea, a variety of fourth-order PDE based denoising techniques have been developed, such as the filters given in [5], [6], and [7]. However, the fourth-order diffusion filters damp high spatial frequency components (i.e. noise and step edges) much faster than the second-order ones [5]. This feature can result in distortion of the step edges during the evolutionary process of the image denoising, especially when the smoothing strength of the filter for the detected edges is not effectively reduced by a diffusivity function. Setting a small threshold value in the diffusivity function substantially reduces the diffusivity on the edges at the expense of a very slow convergence rate, as reported in [4] and [5].
All of the previously mentioned techniques belong to a class of diffusion-based denoising filters known as isotropic nonlinear diffusion denoising methods. This means that the total amount of diffusion controlled by the diffusivity function is applied on the different regions of the image regardless of the direction of the image features. To improve the edge preservation of these filters, another class of diffusion-based denoising techniques has emerged in which the diffusion is adapted to the direction of the local image features [8], [9] and [10]. That is, the filter minimizes the diffusion strength in the direction perpendicular to the local features and maximizes it in the direction of the local features. However, these techniques have been developed in the context of the second-order diffusion filters. In this paper, an anisotropic fourth-order diffusion filter is proposed in which the diffusion strength is adjusted with respect to the direction of the local features. Two different diffusivity functions are designed to strongly reduce the diffusion perpendicular to the feature orientation, while allowing the diffusion parallel to the edge orientation and on the smooth regions to proceed with normal strength. A comparison of the results obtained by the proposed filter with those of the classical and newly developed ones reveals a noticeable improvement in the quality of the denoised images, evaluated subjectively and quantitatively, as well as a substantial increase in the convergence rate compared to the classical filter.
2 A Brief Review
2.1 From Second to Fourth-Order Filters
The nonlinear diffusion filters are evolutionary processes. The fundamental PDE of the nonlinear diffusion filter introduced by Perona and Malik [1] is given by
\[ \partial u/\partial t = \operatorname{div}\bigl(c(\|\nabla u\|)\,\nabla u\bigr), \tag{1} \]
where $u$ is the image intensity function, $c(\cdot)$ is a diffusivity function by which the diffusion coefficient is calculated, and $t$ is the evolution time. The symbols $\operatorname{div}$ and $\|\cdot\|$ denote the divergence and the Euclidean norm, respectively. The diffusivity function is a positive and non-increasing function of $\|\nabla u\|$. One of the diffusivity functions defined by Perona and Malik is given by
\[ c(\|\nabla u\|) = \frac{k^2}{k^2 + \|\nabla u\|^2}, \tag{2} \]
where $k$ is the so-called contrast parameter. You and his colleagues [11] carried out a detailed analysis to show that solving (1) is equivalent to minimizing an energy functional. If the diffusivity function of (2) is used, then the energy functional is
\[ R(u) = \int_\Omega \frac{k^2}{2}\,\ln\bigl(k^2 + \|\nabla u\|^2\bigr)\,dx\,dy, \tag{3} \]
where $\Omega$ is the region of support of $u$. $R(u)$ is minimized when $\|\nabla u\|^2$ is minimum, which leads to a piecewise constant approximation of $u$. Therefore, formation of staircase artifacts on the ramp edges is unavoidable. In order to resolve this problem, You and Kaveh [4] introduced a fourth-order PDE-based denoising method in which the denoised image is obtained by minimization of the potential function given by
\[ E(u) = \int_\Omega f\bigl(|\nabla^2 u|\bigr)\,dx\,dy, \tag{4} \]
where $f'(s) = s\,c(s)$ and $|\nabla^2 u|$ is the absolute value of the Laplacian of $u$. Therefore, for the same diffusivity function in (2), $E(u)$ takes the form
\[ E(u) = \int_\Omega \frac{k^2}{2}\,\ln\bigl(k^2 + |\nabla^2 u|^2\bigr)\,dx\,dy, \tag{5} \]
meaning that $E(u)$ is minimized when $|\nabla^2 u|$ is minimum. Therefore, the ramp regions of $u$ (i.e. the regions where $|\nabla^2 u| = 0$) fit the solution of the associated fourth-order PDE. Applying the Euler equation to the minimization problem (4), followed by a gradient descent procedure, gives
\[ \partial u/\partial t = -\nabla^2\bigl[c(|\nabla^2 u|)\,\nabla^2 u\bigr]. \tag{6} \]
With the forward Euler approximation of $\partial u/\partial t$, the numerical solver of (6) is given by
\[ u^{n+1} = u^n - dt \times \nabla^2\bigl[c(|\nabla^2 u^n|)\,\nabla^2 u^n\bigr], \qquad u^0 = u_0 \quad\text{and}\quad n = 0, 1, \cdots, N, \tag{7} \]
where $n$ is the number of iterations, $dt$ is the time step-size and $u_0$ is a noisy image. This is an iterative process. In order to protect the edges from over-smoothing, the process needs to be stopped after a certain number of iterations, denoted by $N$.
Besides these nonlinear diffusion filters, another class of techniques, known as regularization techniques and also based on solving a nonlinear PDE, has been widely used for image restoration. The classical paper of Rudin, Osher and Fatemi [12] introduced one of the first filters of this kind, in which the PDE to be solved is of the second order. Therefore, the same problem of formation of staircases on the ramp regions of the image has motivated researchers to introduce new regularization techniques that solve a higher-order PDE, such as [13], [14]. However, the focus of this paper is on the diffusion-based techniques reviewed above.
2.2 Edge Preservation and Convergence Rate
Apart from a significant advancement in the reduction of blocky effects on the denoised image using (6), the optimal parameter setting for the numerical solver in (7) leads to a very slow convergence rate, especially when the level of contaminating noise is moderately high. A recently developed technique known as a fourth-order hybrid model [6] uses a relaxed median filter [15] to improve the quality of the denoised image when the observed image is heavily contaminated by noise. The numerical model of this filter is given by
\[ u^{n+1} = RM_{\alpha}^{\omega}\Bigl(u^n - dt \times \nabla^2\bigl[c(|\nabla^2 u^n|)\,\nabla^2 u^n\bigr]\Bigr), \tag{8} \]
where $RM$ denotes the relaxed median filter with a lower bound of $\alpha$ and an upper bound of $\omega$. This filtering process needs a lower number of iterations to give a denoised image. However, the denoised image is highly affected by the relaxed median filter, and the main advantage of using fourth-order diffusion filters (i.e. the ramp edge preservation) is hindered, as is shown later. Moreover, the computational burden per iteration is dramatically higher than that of the You and Kaveh filter. Another recently introduced technique [7] demonstrates a significant improvement in the convergence rate along with a good ramp edge preservation. In this technique, the diffusivity function of the You and Kaveh filter, $c(|\nabla^2 u|)$, is replaced by $c(\|\nabla u\|)$, and the PDE of the filter is given by
\[ \partial u/\partial t = -\nabla^2\bigl[c(\|\nabla u\|)\,\nabla^2 u\bigr]. \tag{9} \]
Although the energy functional of (9) does not have a closed form, it can be seen that the filter still supports the planar approximation of the image. The ramp edge preservation of this fourth-order diffusion filter comes from the fact that $\partial u/\partial t \to 0$ when $\nabla^2 u \to 0$. However, as $|\nabla^2 u| \ge \|\nabla u\|$, the diffusivity function $c(|\nabla^2 u|)$ gives a smaller diffusion coefficient for the step edges than $c(\|\nabla u\|)$. Therefore, in spite of the good convergence rate obtained by (9), the step edges still suffer a higher amount of distortion than with the classical methods.
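The explicit scheme (7) and its gradient-driven variant (9) differ only in the argument passed to the diffusivity function, which the following minimal sketch makes visible. The spatial discretization is an assumption made for illustration (five-point Laplacian, replicated boundaries, central differences via `numpy.gradient`); the paper does not specify these details here, and the function names are hypothetical.

```python
import numpy as np

def laplacian(u):
    """Five-point Laplacian with replicated boundaries and unit grid spacing."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def diffusivity(s, k):
    """Equation (2): c(s) = k^2 / (k^2 + s^2)."""
    return k**2 / (k**2 + s**2)

def fourth_order_step(u, dt, k, use_gradient=False):
    """One explicit iteration of (7); use_gradient=True switches the
    diffusivity argument from |lap u| to ||grad u||, as in (9)."""
    lap = laplacian(u)
    if use_gradient:
        gy, gx = np.gradient(u)
        s = np.hypot(gx, gy)
    else:
        s = np.abs(lap)
    return u - dt * laplacian(diffusivity(s, k) * lap)

# Usage: a few iterations on a noisy ramp (synthetic stand-in data).
rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0.0, 100.0, 32), (32, 1))
u = ramp + rng.normal(0.0, 15.0, ramp.shape)
for _ in range(10):
    u = fourth_order_step(u, dt=0.031, k=7.0, use_gradient=True)
```

Away from the image border, a planar image is left unchanged by either variant, because its discrete Laplacian vanishes; this is the ramp-preserving property discussed in the text.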
2.3 Anisotropic Diffusion Filters
The so-called anisotropic diffusion filters are schemes in which the diffusion rate is specifically controlled based on the direction of the local features, such as the ones introduced in [8], [9] and [10]. The coherence-enhancing diffusion filter [9] is one of this kind, in which the scalar diffusion coefficient in (1) is replaced by a tensor diffusion coefficient to reduce the diffusivity of the filter perpendicular to the orientation of the local features, while letting diffusion with high strength be performed in the direction of the level set. Another anisotropic filter, introduced by Carmona and Zhong [10], uses scalar diffusivity functions to perform anisotropic diffusion. The PDE of this filter is given by
\[ \partial u/\partial t = c_1\,(c_2\,u_{\eta\eta} + c_3\,u_{\xi\xi}), \tag{10} \]
where $c_1$, $c_2$ and $c_3$ are different diffusivity functions and $u_{\eta\eta}$ and $u_{\xi\xi}$ are the second-order directional derivatives. Here $\eta$ denotes the direction perpendicular to the orientation of the feature, the so-called gradient direction, and $\xi$ denotes the direction of the contour or level set.
All of these techniques belong to a class of filters known as second-order diffusion filters. Some techniques, such as [16] for surface smoothing by anisotropic diffusion filtering of the normals to the surface, or its variant for image denoising [17], can be considered fourth-order anisotropic filters. However, these are two-phase filters: in the first phase, an anisotropic filter is applied to the normal map of the surface or image, and in the second phase, a surface is fitted to the processed normals. In Section 3, a new setting of the fourth-order anisotropic diffusion filter is proposed, which is a single-phase filter and can be seen as a generalization of the classical fourth-order nonlinear diffusion filter of You and Kaveh.
3 The Proposed Model
3.1 Diffusion Equation
The previously mentioned fourth-order diffusion filters are isotropic: the extent of the diffusion is controlled by the diffusivity function regardless of the orientation of the edges. The only anisotropic behavior of those filters is limited to the anisotropic response of the discrete Laplacian operator. Most discrete Laplacian operators exhibit an anisotropic response to an edge with respect to x and y (i.e. the Cartesian coordinates) [18]. However, in order to give an anisotropic realization of the fourth-order diffusion filter, one should consider the second-order directional derivatives of the image. Two normalized and orthogonal vectors $\eta$ and $\xi$, pointing in the direction of the gradient and of the level set respectively, are given by
\[ \eta = \frac{[u_x \;\; u_y]}{\sqrt{u_x^2 + u_y^2}} \quad\text{and}\quad \xi = \frac{[-u_y \;\; u_x]}{\sqrt{u_x^2 + u_y^2}}. \tag{11} \]
Based on the definition in (11), one can derive the second-order derivatives of the image in the direction of the gradient and of the level set as
\[ u_{\eta\eta} = \frac{u_{xx}u_x^2 + 2u_xu_yu_{xy} + u_{yy}u_y^2}{u_x^2 + u_y^2} \tag{12} \]
and
\[ u_{\xi\xi} = \frac{u_{xx}u_y^2 - 2u_xu_yu_{xy} + u_{yy}u_x^2}{u_x^2 + u_y^2}. \tag{13} \]
It can be simply shown that the sum of these second directional derivatives is equal to the Laplacian of the image,
\[ \nabla^2 u = u_{xx} + u_{yy} = u_{\xi\xi} + u_{\eta\eta}. \tag{14} \]
Therefore, the proposed fourth-order diffusion equation, which is a generalization of (6), can be written as
\[ \partial u/\partial t = -\nabla^2\bigl(c_1\,(c_2\,u_{\eta\eta} + c_3\,u_{\xi\xi})\bigr). \tag{15} \]
In the proposed model, $c_1$, $c_2$ and $c_3$ are the diffusivity functions, where $c_1$ controls the total amount of diffusion and $c_2$ and $c_3$ control the uneven diffusion in the directions of $\eta$ and $\xi$. Choosing $c_2 = c_3$ and $c_1 c_2 = c$ leads to the nonlinear diffusion filter of (6) or (9), depending on the definition of $c$. In the next section, the criteria for a suitable choice of these diffusivity functions are discussed.
3.2 Diffusivity Functions
Different diffusivity functions have been introduced in the context of nonlinear diffusion denoising, and the behavior of the filter varies with the choice of the diffusivity function. The most commonly used diffusivity function in fourth-order diffusion filters is the one in (2), written as $c(s)$, where $s$ is the modulus of the derivative of the image ($s = |\nabla^2 u|$ in (6) or $s = \|\nabla u\|$ in (9)). Regardless of the choice of $s$, this diffusivity function is bounded in $(0, 1]$. A low-cost and suitable choice of the diffusivity functions in our proposed model is given by
\[ c_1(s) = c_2(s) = c(\|\nabla u\|) \quad\text{and}\quad c_3 = 1. \tag{16} \]
Similar to (9), $s$ in the function $c_1$ is the modulus of the gradient of $u$, which leads to a fast convergence rate, and $c_2 = c_1$ is an optimal choice in terms of the overall computational cost of the filter. Therefore, the proposed model in (15) can be rewritten in the form of
\[ \partial u/\partial t = -\nabla^2\bigl[c(\|\nabla u\|)^2\,u_{\eta\eta} + c(\|\nabla u\|)\,u_{\xi\xi}\bigr]. \tag{17} \]
Since the function $c$ is bounded in $(0, 1]$, the overall diffusivity in the $\eta$ direction is smaller than the one in the $\xi$ direction. Before presenting comparative results in the next section, the performance of the filter is compared to the second-order filter of Perona and Malik in Fig. 1, which shows the ability of the proposed filter to preserve the ramp edges. In fact, the proposed filter supports the planar approximation of the image similarly to (6) and (9), since for planar regions $u_{\eta\eta} \to 0$ and $u_{\xi\xi} \to 0$, which leads to $\partial u/\partial t \to 0$.
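The directional derivatives (12) and (13) and one explicit iteration of the proposed equation (17) can be sketched as below. The discretization is an assumption for illustration only: derivatives come from `numpy.gradient`, the outer Laplacian is the five-point stencil with replicated boundaries, a small constant `eps` guards the division where the gradient vanishes, and the time stepping simply mirrors (7). None of these choices is stated in the paper.

```python
import numpy as np

def derivatives(u):
    """First and second derivatives by finite differences (rows = y, columns = x)."""
    uy, ux = np.gradient(u)
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    return ux, uy, uxx, 0.5 * (uxy + uyx), uyy

def directional_second_derivatives(u, eps=1e-12):
    """Equations (12) and (13); also returns the gradient modulus."""
    ux, uy, uxx, uxy, uyy = derivatives(u)
    g2 = ux**2 + uy**2 + eps          # eps avoids 0/0 in flat regions (assumption)
    u_eta = (uxx * ux**2 + 2 * ux * uy * uxy + uyy * uy**2) / g2
    u_xi = (uxx * uy**2 - 2 * ux * uy * uxy + uyy * ux**2) / g2
    return u_eta, u_xi, np.sqrt(g2)

def laplacian(u):
    """Five-point Laplacian with replicated boundaries and unit grid spacing."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def proposed_step(u, dt, k):
    """One explicit iteration of (17) with c from (2) evaluated at ||grad u||."""
    u_eta, u_xi, grad = directional_second_derivatives(u)
    c = k**2 / (k**2 + grad**2)
    return u - dt * laplacian(c**2 * u_eta + c * u_xi)
```

Since $0 < c \le 1$ implies $c^2 \le c$, the weight on $u_{\eta\eta}$ never exceeds the weight on $u_{\xi\xi}$, and for a planar image both directional derivatives vanish so the step returns the input unchanged.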
Fig. 1. Comparing the results obtained by a second-order filter and the proposed filter, (a) noisy image, (b) denoised image by the Perona and Malik filter, (c) denoised image by the proposed filter
3.3 Inverse Diffusion
The classical fourth-order filter of You and Kaveh [4] in (6) is a well-posed process because its potential function, (5), is a positive potential function with a global minimum. On the other hand, deriving the potential function of the proposed filter, (17), is not as simple as for (6). However, in order to demonstrate that the unevenly weighted sum of $u_{\eta\eta}$ and $u_{\xi\xi}$ may lead to inverse diffusion, it is sufficient to show that, at least for a sub-region of $u$,
\[ \operatorname{sign}\bigl(c(\|\nabla u\|)^2\,u_{\eta\eta} + c(\|\nabla u\|)\,u_{\xi\xi}\bigr) \neq \operatorname{sign}\bigl(\nabla^2 u\bigr). \tag{18} \]
In this case, the dynamic flow of (17) performs an inverse diffusion, which results in edge enhancement. The maximum unevenness between the coefficients of $u_{\eta\eta}$ and $u_{\xi\xi}$ occurs when $c(\|\nabla u\|) = 1/2$. In this case, the linear version of (17) can be written in the form of
\[ \partial u/\partial t = -\nabla^2\Bigl(\frac{u_{\eta\eta}}{4} + \frac{u_{\xi\xi}}{2}\Bigr) = -\nabla^2\Bigl(\frac{u_{\eta\eta}}{4} + \frac{u_{\xi\xi}}{4} + \frac{u_{\xi\xi}}{4}\Bigr) = -\nabla^2\Bigl(\frac{\nabla^2 u}{4} + \frac{u_{\xi\xi}}{4}\Bigr). \tag{19} \]
Knowing that (6) has a positive potential function, if $\operatorname{sign}\bigl(\nabla^2 u/4 + u_{\xi\xi}/4\bigr) = \operatorname{sign}\bigl(\nabla^2 u\bigr)$, the filter (19) also has a positive potential function. This requires $|\nabla^2 u| > |u_{\xi\xi}|$ to be valid throughout the whole image, which does not hold true.
Fig. 2. Inverse diffusion as a result of the uneven diffusion in the directions of η and ξ: (a) the original image of "disk"; (b), (c) and (d) the intensity profiles at the middle row of the original image, the image diffused by (19), and the image diffused by the proposed filter (17), respectively
An example shown in Fig. 2 demonstrates that the linear diffusion equation (19) performs an inverse diffusion on the edges. The signal shown in Fig. 2-(b) is the intensity profile of the standard test image "disk" in Fig. 2-(a), extracted at the middle row. The signal in Fig. 2-(c) is the same intensity profile after the image has been filtered by (19). The inverse diffusion in this case leads to edge enhancement. However, if the filter is run in the nonlinear fashion proposed in (17), the profile in Fig. 2-(d) shows that the uplifting of the edges is dramatically reduced. In other words, in the general application of image denoising, the inverse diffusion in the proposed filter does not lead to instability of the filter or to the formation of ringing artifacts around the edges.
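The value $c = 1/2$ used to obtain (19) can be checked with a short calculation. The sketch below reads the "uneven weight" as the difference $g(c)$ between the coefficient of $u_{\xi\xi}$ and that of $u_{\eta\eta}$ in (17); this reading, and the symbol $g$, are assumptions introduced here for illustration.

```latex
\begin{align*}
g(c) &= c - c^{2}, \qquad 0 < c \le 1,\\
g'(c) &= 1 - 2c = 0 \;\Longrightarrow\; c = \tfrac{1}{2},
  \qquad g''(c) = -2 < 0,\\
\left. c^{2}u_{\eta\eta} + c\,u_{\xi\xi} \right|_{c = 1/2}
  &= \tfrac{1}{4}u_{\eta\eta} + \tfrac{1}{2}u_{\xi\xi}
   = \tfrac{1}{4}\left(u_{\eta\eta} + u_{\xi\xi}\right) + \tfrac{1}{4}u_{\xi\xi}
   = \tfrac{1}{4}\nabla^{2}u + \tfrac{1}{4}u_{\xi\xi}.
\end{align*}
```

The last equality uses (14), and the result matches the argument of the outer Laplacian in (19).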
4 Comparative Results
In this section, we present comparative results of the proposed method and other fourth-order nonlinear diffusion filters. The results of the following filters are compared:
1. The proposed filter with the PDE of (17), with k=7 and dt=0.031 (i.e. the time-step size that provides a data-independent stability in the numerical solver [7]).
2. The filter of (7) introduced by You and Kaveh [4], with the suggested parameter setting of dt=0.25 and k=1.
3. The filter of (8) introduced in [6], with the suggested parameter setting of dt=0.1, k=3, α = 3 and ω = 5.
4. The filter in (9) introduced in [7], which is a self-governing filter. In this filter, the diffusivity function of Perona and Malik, c(s), is used with $s = \|\nabla u\|$, the contrast parameter k is estimated by the histogram-based mechanism used in [1], and dt=0.031.
Three test images, "Pepper", "Cameraman" and "House", have been corrupted by additive white Gaussian noise with a standard deviation of 15. Table 1 presents an objective comparison of the performance of these filters in terms of the signal-to-noise ratio (SNR) of the denoised image and their computational complexity.
Table 1. Quantitative comparison of the results

Image      Noisy SNR (dB)  Method    Denoised SNR (dB)  Num. of Iter.  CPU/Iter. (s)  Convergence (s)
Pepper     10.98           Proposed  17.84              80             0.038          3.04
                           (9)       17.32              14             0.080          1.12
                           (7)       15.83              3133           0.031          97.12
                           (8)       15.21              2              0.155          0.31
Cameraman  12.38           Proposed  17.08              35             0.038          1.33
                           (9)       16.83              6              0.082          0.492
                           (7)       16.59              3015           0.031          93.46
                           (8)       13.59              1              0.160          0.16
House      9.68            Proposed  17.44              89             0.038          3.382
                           (9)       17.08              36             0.081          2.916
                           (7)       15.80              3907           0.031          121.12
                           (8)       15.39              2              0.160          0.32
[Plot: SNR (dB) versus number of iterations (logarithmic axis, 10^0 to 10^4) for filter (7), filter (9), the proposed filter (17) and filter (8)]
Fig. 3. Comparing the convergence rate of the filters for denoising of test image "House"
The results show that the proposed method consistently produces the denoised image with the highest SNR. It is important to note that the results are obtained at the optimal number of iterations, at which the maximum SNR in the evolution of each filter is achieved. If the iterative filtering process is continued beyond the optimal number of iterations, the SNR of the denoised image decreases due to over-smoothing of the edges. Another important feature of the proposed method is its fast convergence rate. As shown in Fig. 3 for the test image "House", the proposed method converges much faster than the filter of You and Kaveh. The computational burden of the filters is measured as the CPU time of each iteration, with all filters processing the same image on the same computer. Thus, the total convergence time of the filtering process is the product of the CPU time per iteration and the number of iterations. The relaxed median regularized filter converges faster than the proposed method; however, its maximum SNR is significantly lower than that of the other methods, and its SNR also decays very quickly due to over-smoothing of the edges. The computational cost of the proposed filter is slightly higher than that of the filter in (9); however, the higher SNR obtained by the proposed filter justifies this additional computational burden.

Fig. 4. Comparing the perceptual quality of the results. The pairs of images labeled (a) to (f) are as follows: (a) noiseless image, (b) noisy image, (c) denoised image using (7), (d) denoised image using (8), (e) denoised image using (9), (f) denoised image using the proposed filter (17).

In Fig. 4, the perceptual quality of the image denoised by the proposed method is compared with that of the other methods. The first row shows the whole image and the second row a magnified portion of it. Each pair of images is labeled from (a) to (f). The first two pairs, (a) and (b), are the noiseless and the noisy images. Fig. 4(c) shows the image denoised by the You and Kaveh filter, in which the formation of some speckle noise is visible. This drawback is known and addressed in [4]; it results from choosing a small value for k in the diffusivity function, a setting that is nevertheless necessary to protect the edges from over-smoothing. Fig. 4(d) shows the image denoised by the relaxed median regularized filter (8). This image is blurred, and some staircase artifacts have formed in its smooth regions. The next image, shown in Fig. 4(e), is the result of the filter in (9), in which the extent of denoising and edge preservation is noticeably better than that of the filters (7) and (8). However, comparing this result with the one obtained by the proposed filter in Fig. 4(f) reveals that the extent of edge preservation in the proposed filter is noticeably higher.
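The relation between the columns of Table 1 (total convergence time equals the number of iterations times the CPU time per iteration) can be checked with a few lines of Python. This is only an arithmetic sketch over the values reported in Table 1; the names `TABLE_1` and `convergence_time` are illustrative and do not come from the paper.

```python
# Rows of Table 1: (image, filter, iterations, CPU seconds per iteration,
# reported total convergence time in seconds).
TABLE_1 = [
    ("Pepper", "Proposed", 80, 0.038, 3.04),
    ("Pepper", "(9)", 14, 0.080, 1.12),
    ("Pepper", "(7)", 3133, 0.031, 97.12),
    ("Pepper", "(8)", 2, 0.155, 0.31),
    ("Cameraman", "Proposed", 35, 0.038, 1.33),
    ("Cameraman", "(9)", 6, 0.082, 0.492),
    ("Cameraman", "(7)", 3015, 0.031, 93.46),
    ("Cameraman", "(8)", 1, 0.160, 0.16),
    ("House", "Proposed", 89, 0.038, 3.382),
    ("House", "(9)", 36, 0.081, 2.916),
    ("House", "(7)", 3907, 0.031, 121.12),
    ("House", "(8)", 2, 0.160, 0.32),
]


def convergence_time(iterations, cpu_per_iteration):
    """Total convergence time as iterations times CPU time per iteration."""
    return iterations * cpu_per_iteration


for image, name, iters, cpu, reported in TABLE_1:
    total = convergence_time(iters, cpu)
    print(f"{image:10s} {name:9s} {total:8.3f} s (reported {reported} s)")
```

Every product agrees with the reported convergence time up to rounding of the last printed digit.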
5 Conclusion
An anisotropic fourth-order PDE for noise removal has been proposed. A brief theoretical review of second- and fourth-order diffusion denoising filters has been presented, highlighting the fact that previously developed fourth-order filters are isotropic filters in which the extent of edge preservation is controlled by reducing the diffusivity of the filter near an edge regardless of its orientation. A major challenge in these filters is that the optimal choice of the model parameters for good edge preservation leads to a dramatically slow convergence rate. In the proposed filter, however, the diffusion strength is adjusted with respect to the direction of the local features. Two different diffusivity functions have been designed to strongly suppress the diffusion perpendicular to the feature orientation (i.e. in the gradient direction), while letting the diffusion parallel to the orientation of the edge (i.e. in the direction of the level set) proceed at normal speed. Therefore, the proposed filter leads to a faster reduction of the uncorrelated noise and an overall faster convergence rate, with good edge preservation due to the reduced diffusivity of the filter in the gradient direction. The comparison of the results obtained by the proposed filter with those of the classical and newly developed filters has shown that the proposed method produces a noticeable improvement in the quality of the denoised images, evaluated both subjectively and quantitatively, as well as a substantial increase in the convergence rate compared to the classical filter.
References
1. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. on Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990)
2. Catte, F., et al.: Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29(1), 182–193 (1992)
3. Black, M.J., et al.: Robust anisotropic diffusion. IEEE Transactions on Image Processing 7(3), 421–432 (1998)
4. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing 9(10), 1723–1730 (2000)
5. Lysaker, M., Lundervold, A., Tai, X.-C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Tran. on Image Processing 12(12), 1579–1590 (2003)
6. Rajan, J., Kannan, K., Kaimal, M.R.: An Improved hybrid model for molecular image denoising. Journal of Mathematical Imaging and Vision 31, 73–79 (2008)
7. Hajiaboli, M.R.: A self-governing hybrid model for noise removal. In: Wada, T., Huang, F., Lin, S. (eds.) PSIVT 2009. LNCS, vol. 5414, pp. 295–305. Springer, Heidelberg (2009)
8. Weickert, J.: Anisotropic Diffusion in Image Processing. B. G. Teubner (1998)
9. Weickert, J.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2-3), 111–127 (1998)
10. Carmona, R.A., Zhong, S.: Adaptive smoothing respecting feature directions. IEEE Transactions on Image Processing 7(3), 353–358 (1998)
11. You, Y.-L., et al.: Behavioral analysis of anisotropic diffusion in image processing. IEEE Trans. Image Processing 5, 1539–1553 (1996)
12. Rudin, L., Osher, S., Fatemi, E.: Nonlinear Total Variation based noise removal algorithms. Physica D 60, 259–268 (1992)
13. Chan, T., Marquina, A., Mulet, R.: High Order Total Variation-based Image Restoration. SIAM J. on Scientific Computing 22(2), 503–516 (2000)
14. Fang, L., et al.: Image restoration combining a total variational filter and a fourth-order filter. Journal of Visual Communication and Image Representation 18(4), 322–330 (2007)
15. Hamza, A.B., et al.: Removing noise and preserving details with relaxed median filters. Journal of Mathematical Imaging and Vision 11(2), 161–177 (1999)
16. Tasdizen, T., et al.: Geometric surface smoothing via anisotropic diffusion of normals. IEEE visualization 1(1), 125–132 (2002)
17. Lysaker, M., Osher, S., Tai, X.-C.: Noise removal using smoothed normals and surface fitting. IEEE Transactions on Image Processing 13(10), 1345–1357 (2004)
18. Kamgar-Parsi, B., Rosenfeld, A.: Optimally isotropic Laplacian operator. IEEE Transactions on Image Processing 8(10), 1467–1472 (1999)
Enhancement of Blurred and Noisy Images Based on an Original Variant of the Total Variation
Khalid Jalalzai and Antonin Chambolle
Centre de Mathématiques Appliquées (CMAP), École Polytechnique, 91128 Palaiseau Cedex, France

Abstract. In this paper, we introduce a new variant of the total variation (TV). Its purpose is to simplify TV-based restoration when the image is degraded by some kernel which is easily computed in the Fourier domain (blur, Radon transform...). We actually replace the TV term by a mere L1 norm of some field, for which the optimization is much easier. This approach permits us to use a recent and fast algorithm to enhance, in particular, blurred and noisy images. We also compare our approach with standard total variation based denoising and show that it avoids the famous staircasing effect.
1 Introduction
In 1992, Rudin, Osher and Fatemi (ROF) introduced the total variation in their founding article [13] as a regularizing criterion for inverse problems in imaging. This has been fruitful in image restoration since it can regularize images without smoothing the edges. A possible approach to the minimization of ROF's problem is the generic forward-backward splitting method studied for instance by Combettes and Wajs in [3]. It consists in minimizing (ϕ + ψ), where ϕ and ψ are both convex functions with certain regularity properties. Usually in signal restoration, given a signal u, ϕ(u) is the so-called data fidelity term and is equal to $\frac{1}{2}\|Au - g\|^2$, where g is a noisy signal which also underwent a linear perturbation A. The second term ψ usually reflects a priori knowledge about the noise, for instance. In case ψ = TV, as in ROF's problem, the main drawback is that it is usually difficult to compute (even with a small error) the minimizer, namely the proximal operator $\mathrm{prox}_{TV}$ (see Moreau [9] or again Combettes and Wajs [3] for more about this). Therefore, what we propose in this article is a variant of ROF's problem where ψ is simply the L1 norm of some field p. This new term preserves the nice properties of the original total variation. Its relevance is due to the fact that its proximal operator is easy to compute, which leaves the way open to compressed sensing-type algorithms (see Nesterov [12] or Beck and Teboulle [2] for instance).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 368–376, 2009. © Springer-Verlag Berlin Heidelberg 2009
However, our idea is different from the "Augmented Lagrangian" (see Tai and Wu [15]) or "Split Bregman" (see Goldstein and Osher [6]) methods, where the field p must satisfy at convergence (sometimes approximately) the constraint p = ∇u, while in our approach p might be quite far from being a gradient.
2 Few Notations
From now on, an image u will be represented by an n×n matrix with real entries, i.e. an element of $X = \mathbb{R}^{n\times n}$. To simplify matters in the sequel, especially when we shall consider the discrete Fourier transform of u, we assume that the image u is also periodic and defined for all $k \in \mathbb{Z}$ by $u_{i+kn,j+kn} = u_{i,j}$ with $i, j \in \{1, ..., n\}$. To define the total variation of the image u, we first have to introduce a discretized version of the gradient. For $u \in X$, it is the vector ∇u of $Y = X \times X$ given by
\[ (\nabla u)_{i,j} = \begin{pmatrix} u_{i+1,j} - u_{i,j} \\ u_{i,j+1} - u_{i,j} \end{pmatrix}, \]
for i, j = 1, ..., n. Finally, the simplest approximation of the total variation of $u \in X$ is defined by
\[ TV(u) = \sum_{i,j} |(\nabla u)_{i,j}| \]
where | · | is simply the Euclidean norm of $\mathbb{R}^2$. Let us also introduce two important operators: the divergence div p of an element $p \in Y$ and the Laplacian Δv of an image v. By analogy with the continuous setting, we want them to satisfy
\[ \langle \operatorname{div} p, u \rangle_X = -\langle p, \nabla u \rangle_Y \quad\text{and}\quad \Delta v = \operatorname{div} \nabla v, \tag{1} \]
for all $u \in X$.
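The discrete operators of this section can be sketched in NumPy as follows. This is a minimal illustration, assuming 0-based arrays and periodic wrap-around implemented with `np.roll`; the function names `grad`, `div`, `laplacian` and `tv` are illustrative and not taken from the paper. The divergence is written out as the backward difference that makes the duality relation (1) hold.

```python
import numpy as np


def grad(u):
    """Forward differences with periodic boundary; result has shape (2, n, n)."""
    return np.stack([np.roll(u, -1, axis=0) - u,
                     np.roll(u, -1, axis=1) - u])


def div(p):
    """Backward differences, so that <div p, u> = -<p, grad u>."""
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))


def laplacian(v):
    """Discrete Laplacian defined as div(grad v)."""
    return div(grad(v))


def tv(u):
    """Sum over pixels of the Euclidean norm of the discrete gradient."""
    return np.sqrt((grad(u) ** 2).sum(axis=0)).sum()


rng = np.random.default_rng(0)
u = rng.normal(size=(8, 8))
p = rng.normal(size=(2, 8, 8))
# Numerical check of the duality relation (1): the two sides should agree.
print(np.sum(div(p) * u), -np.sum(p * grad(u)))
```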
3 The TV-Based Classical Approach
Given a noisy image g which has also been exposed to a linear perturbation A, the Rudin, Osher and Fatemi method suggests to minimize the following functional
\[ F(u) = \frac{1}{2}\|Au - g\|^2 + \lambda\, TV(u) \tag{2} \]
to restore the image g. The positive parameter λ controls the regularization level. Actually the TV term is not differentiable and in practice it is often replaced by another approximation of the total variation:
\[ TV_\varepsilon(u) = \sum_{i,j} \sqrt{\varepsilon^2 + |(\nabla u)_{i,j}|^2} \]
where ε is a positive real number. Therefore, we are led to find the unique $u_\varepsilon$ which minimizes
\[ F_\varepsilon(u) = \frac{1}{2}\|Au - g\|^2 + \lambda\, TV_\varepsilon(u). \]
We are actually facing a smooth convex optimization problem which can be solved easily with the gradient method. To do so, it is enough to consider a sequence $(u_n)$ of images and a small enough gradient step h > 0 that satisfy
\[ u_{n+1} = u_n - h\left( A^T (A u_n - g) + \lambda \nabla TV_\varepsilon(u_n) \right) \]
with
\[ \left(\nabla TV_\varepsilon(u_n)\right)_{i,j} = -\operatorname{div}\left( \frac{(\nabla u_n)_{i,j}}{\sqrt{\varepsilon^2 + |(\nabla u_n)_{i,j}|^2}} \right), \]
for any i, j = 1, ..., n. It remains to choose $u_0$: it would be wiser to take it as close as possible to the minimizer, consequently $u_0 = g$ seems to be a good choice. Unfortunately, the simple scheme suggested above is fairly slow since it converges as $O\left(\frac{1}{n}\right)$, which means that there exists a positive real C such that
\[ F_\varepsilon(u_n) - F_\varepsilon(u_\varepsilon) \le \frac{C}{n}. \]
A proof of this classical result can be found in [10], [11] or even [2]. Actually, in [11], Nesterov proposes a variant of the gradient algorithm with convergence rate $O\left(\frac{1}{n^2}\right)$ which solves the problem. It is as follows:
\[ v_n = \operatorname{argmin}\left\{ F_\varepsilon(u_n) + \langle v - u_n, \nabla F_\varepsilon(u_n) \rangle_X + \frac{L}{2}\|v - u_n\|^2,\ v \in X \right\}, \]
\[ w_n = \operatorname{argmin}\left\{ \sum_{k=0}^{n} \left[ F_\varepsilon(u_k) + \langle w - u_k, \nabla F_\varepsilon(u_k) \rangle_X \right] + \frac{1}{2}\|w - u_0\|^2,\ w \in X \right\}, \]
\[ u_{n+1} = \frac{2}{n+3}\, w_n + \frac{n+1}{n+3}\, v_n, \]
where L is the Lipschitz constant of $\nabla F_\varepsilon$. This algorithm efficiently combines the classical gradient method (for the calculation of $v_n$) and the conjugate gradient method (for the calculation of $w_n$). We refer to Nesterov [10] for further explanations on these two techniques. See also Beck-Teboulle [2] for a recent, simpler variant.
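The plain gradient scheme described at the beginning of this section can be sketched as follows for the pure denoising case. Several choices here are assumptions made for illustration only: A is taken to be the identity, the step is set to h = 1/(1 + 8λ/ε), which is one valid bound on 1/L for this discretization since the squared norm of the discrete gradient is at most 8, and the function names are hypothetical.

```python
import numpy as np


def grad(u):
    """Periodic forward differences, shape (2, n, n)."""
    return np.stack([np.roll(u, -1, axis=0) - u,
                     np.roll(u, -1, axis=1) - u])


def div(p):
    """Periodic backward differences (minus the adjoint of grad)."""
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))


def f_eps(u, g, lam, eps):
    """Smoothed energy F_eps with A = identity (assumption of this sketch)."""
    tv_eps = np.sqrt(eps ** 2 + (grad(u) ** 2).sum(axis=0)).sum()
    return 0.5 * np.sum((u - g) ** 2) + lam * tv_eps


def grad_tv_eps(u, eps):
    """Gradient of TV_eps: minus the divergence of the normalised gradient."""
    gu = grad(u)
    return -div(gu / np.sqrt(eps ** 2 + (gu ** 2).sum(axis=0)))


def gradient_descent(g, lam, eps, n_iter):
    """Iterate u <- u - h * grad F_eps(u), starting from u_0 = g."""
    h = 1.0 / (1.0 + 8.0 * lam / eps)  # step 1/L with L <= 1 + 8*lam/eps
    u = g.copy()
    for _ in range(n_iter):
        u = u - h * ((u - g) + lam * grad_tv_eps(u, eps))
    return u


rng = np.random.default_rng(0)
g = rng.normal(size=(16, 16))
u = gradient_descent(g, lam=0.5, eps=0.1, n_iter=50)
print(f_eps(g, g, 0.5, 0.1), f_eps(u, g, 0.5, 0.1))  # energy before and after
```

With a step no larger than 1/L the energy cannot increase from one iterate to the next, which makes the scheme easy to monitor even though it is slow.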
4 A Variant of TV
Let $u \in X$ be an image. The main idea is to replace the TV term in (2) by
\[ J(u) = \min_{\substack{p \in Y \\ \Pi p = \nabla u}} \|p\|_1 \]
where on the one hand, $\|p\|_1 = \sum_{i,j} \sqrt{(p^1_{i,j})^2 + (p^2_{i,j})^2}$ when $p = (p^1, p^2) \in X \times X$, and on the other hand, Π is the projection on the gradients defined by $\Pi p = \nabla \bar{v}$, where $\bar{v}$ realizes the minimum
\[ \min_{v \in X} \|\nabla v - p\|. \tag{3} \]
Here $\|\cdot\|$ is the Euclidean norm of Y. Remark by the way that we have
\[ J(u) \le TV(u) \le TV_\varepsilon(u) \]
for any $u \in X$. This is a straightforward consequence of the definition. In the sequel, we shall detail some other interesting properties of this functional which make us believe that it behaves the same way as TV. Let us get back to work: the solution of (3) is characterized by the Euler-Lagrange equation
\[ \nabla^* (\nabla u - p) = 0 \]
or, using the notation introduced in (1),
\[ \Delta u = \operatorname{div} p \]
(we recall that our operators ∇, div and Δ are here discrete operators with periodic boundary conditions). Therefore,
\[ J(u) = \min_{\substack{p \in Y \\ \operatorname{div} p = \Delta u}} \|p\|_1. \]
Hence, the Rudin, Osher and Fatemi problem expressed in terms of this new functional consists in minimizing
\[ G(p) = \frac{1}{2}\|Au - g\|^2 + \lambda \|p\|_1 \]
over (p, u) which satisfy the constraint Δu = div p. Lately, minimization of such functionals has attracted much attention, in data compression in particular, and was the subject of many papers. Among those, two recent articles by Nesterov [12] and by Beck and Teboulle [2] focus on the minimization of objective functions which can be decomposed as a sum ϕ + ψ, where ϕ is a continuously differentiable convex function whose gradient is Lipschitz continuous and ψ is a continuous convex function which is possibly nonsmooth but is simple in the sense that its proximal operator is easy to compute (see Combettes and Wajs [3] for the definition). These characteristics suit perfectly the two terms composing G and we henceforth denote
\[ \varphi(p) = \frac{1}{2}\|A \Delta^{-1} \operatorname{div} p - g\|^2 \quad\text{and}\quad \psi(p) = \|p\|_1. \]
In their article, Beck and Teboulle describe the following scheme to construct a minimizing sequence $(p_n)$ for G:
\[ q_1 = p_0 \in Y, \quad t_1 = 1, \]
\[ p_n = \operatorname{argmin}\left\{ \varphi(q_n) + \langle p - q_n, \nabla\varphi(q_n) \rangle_Y + \frac{L}{2}\|p - q_n\|^2 + \|p\|_1,\ p \in Y \right\}, \]
\[ t_{n+1} = \frac{1 + \sqrt{1 + 4 t_n^2}}{2}, \]
\[ q_{n+1} = p_n + \frac{t_n - 1}{t_{n+1}} (p_n - p_{n-1}), \]
where $L = \frac{1}{2}\left(1 - \cos\left(\frac{2\pi}{n}\right)\right)^{-1}$ is the Lipschitz constant of ∇ϕ. Remark that in this algorithm, each iteration needs only one computation of the gradient if things are done properly. As for Nesterov's algorithm, which converges as $O\left(\frac{1}{n^2}\right)$ as does Beck and Teboulle's, and which is again a clever combination of the gradient method and the conjugate gradient method, it demands two calculations of the gradient, which notably slows down each iteration.
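A compact NumPy sketch of this scheme is given below for the denoising case. It rests on several assumptions that are not stated in the paper: A is the identity, $\Delta^{-1}$ is implemented as the Fourier pseudo-inverse of the periodic Laplacian (so the mean of g is removed beforehand and restored at the end), the weight λ is carried inside the shrinkage step, and all function names are hypothetical. The proximal step for the L1 norm of a vector field is the pixelwise shrinkage of the vector (p¹, p²).

```python
import numpy as np


def grad(u):
    return np.stack([np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u])


def div(p):
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))


def make_inverse_laplacian(n):
    """Pseudo-inverse of the periodic Laplacian div(grad .) via the FFT."""
    theta = 2.0 * np.pi * np.fft.fftfreq(n)
    symbol = (2.0 * np.cos(theta) - 2.0)[:, None] + (2.0 * np.cos(theta) - 2.0)[None, :]
    symbol[0, 0] = 1.0  # placeholder; the zero frequency is discarded below

    def inv_lap(f):
        f_hat = np.fft.fft2(f)
        f_hat[0, 0] = 0.0
        return np.real(np.fft.ifft2(f_hat / symbol))

    return inv_lap


def shrink(p, tau):
    """Proximal operator of tau * ||p||_1 (pixelwise vector soft-thresholding)."""
    norm = np.sqrt((p ** 2).sum(axis=0))
    return np.maximum(1.0 - tau / np.maximum(norm, 1e-300), 0.0) * p


def objective(p, g0, lam, inv_lap):
    """G(p) with A = identity and zero-mean data g0."""
    u = inv_lap(div(p))
    return 0.5 * np.sum((u - g0) ** 2) + lam * np.sqrt((p ** 2).sum(axis=0)).sum()


def fista_denoise(g, lam, n_iter):
    """Beck-Teboulle iterations on G for a square n x n image g."""
    n = g.shape[0]
    inv_lap = make_inverse_laplacian(n)
    mean, g0 = g.mean(), g - g.mean()
    L = 0.5 / (1.0 - np.cos(2.0 * np.pi / n))
    p_prev = np.zeros((2, n, n))
    q, t = p_prev.copy(), 1.0
    for _ in range(n_iter):
        residual = inv_lap(div(q)) - g0
        grad_phi = -grad(inv_lap(residual))  # adjoint of inv_lap(div .) applied to residual
        p = shrink(q - grad_phi / L, lam / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        q = p + (t - 1.0) / t_next * (p - p_prev)
        p_prev, t = p, t_next
    return inv_lap(div(p_prev)) + mean, p_prev


rng = np.random.default_rng(0)
g = np.zeros((32, 32))
g[8:24, 8:24] = 100.0
g += rng.normal(0.0, 4.0, size=g.shape)
u_rec, p_rec = fista_denoise(g, lam=1.0, n_iter=50)
```

Each iteration evaluates the gradient of ϕ once, at the cost of two applications of the inverse Laplacian, that is, four two-dimensional FFTs.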
5 The Continuous Setting
Let us mention in this section some properties of the functional J in the continuous setting. We refer to Jalalzai [7] for proofs and further results. First of all, let us fix some notations especially for this section. Henceforth, Ω will designate an open set of $\mathbb{R}^n$ with a smooth enough boundary, and to simplify matters we first place ourselves in the context of functions u whose distributional derivatives are integrable functions that we denote Du, i.e. u lies in the Sobolev space $H^1(\Omega)$. The functional we previously introduced is a discretization of
\[ J(u) = \inf\left\{ \int_\Omega |\phi|,\ \phi \in L^2(\Omega)^n \text{ and } \Pi\phi = Du \right\}, \]
where Π is the orthogonal projection on gradients as in section 4. Formally speaking, given a function $\phi \in L^2(\Omega)^n$ we set $\Pi\phi = D\bar{v}$ where $\bar{v}$ minimizes
\[ \min_{v \in H^1(\Omega)} \|Dv - \phi\|_{L^2(\Omega)^n}. \]
It is actually easy to see that there exists a unique $\psi \in L^2(\Omega)^n$ such that we have the so-called Helmholtz decomposition
\[ \phi = \Pi\phi + \psi \quad\text{where}\quad \int_\Omega \nabla v \cdot \psi = 0 \text{ given any } v \in C^1(\bar{\Omega}) \]
(we refer to Dautray-Lions [4] or Temam [14] for more about this topic). If we put things together, we proved that
\[ J(u) = \inf\left\{ \int_\Omega |Du + \psi|,\ \psi \in L^2(\Omega)^n \text{ and } \int_\Omega \nabla v \cdot \psi = 0\ \ \forall v \in C^1(\bar{\Omega}) \right\}. \]
Nonetheless, this new formulation of J stays meaningful even when u is simply a function of bounded variation in Ω (denoted $u \in BV(\Omega)$), which means that its distributional derivative Du is this time in $M_b(\Omega)^n$, the space of $\mathbb{R}^n$-valued finite Radon measures on Ω. Henceforth, we also let ψ range in the space $M_b(\Omega)^n$. We refer to Ambrosio, Fusco and Pallara [1] or even Giusti [5] for properties of bounded variation functions and for other measure theory considerations. All this motivated a new definition of J when $u \in BV(\Omega)$, namely:
\[ J(u) = \inf\left\{ \int_\Omega |Du + \psi|,\ \psi \in M_b(\Omega)^n \text{ and } \int_\Omega \nabla v \cdot d\psi = 0\ \ \forall v \in C^1(\bar{\Omega}) \right\}. \]
Note by the way that J(u) is obviously well-defined for any $u \in BV(\Omega)$ since
\[ J(u) \le \int_\Omega |Du|. \tag{4} \]
Thanks to a classical convex duality argument, it is possible to show that under some additional assumptions on Ω, we have
\[ J(u) = \sup\left\{ \int_\Omega \nabla w \cdot Du,\ w \in C^1(\Omega) \text{ and } \|\nabla w\|_\infty \le 1 \right\}. \]
Using this dual formulation one can prove the following result:

Theorem 1. Let Ω be an open set in $\mathbb{R}^n$ and $u = \chi_E$ be the characteristic function of a finite-perimeter set $E \subset \Omega$, or even let $u \in BV(\Omega)$ with a derivative Du concentrated on the jump set. Then
\[ J(u) = \int_\Omega |Du|. \]
The proof is mostly based on the fact that rectifiable sets admit approximate tangent hyperplanes. Remark that when Du has a diffuse part, inequality (4) may be strict. The latter theorem legitimates the use of the functional J in the image processing context since it shows that J behaves the same way as TV.
6 Preliminary Numerical Simulations
In this last section, we compare the two different approaches based on the functionals TV and J. For this purpose, we use the two algorithms we presented above. In our implementation, Beck-Teboulle's algorithm performs half as many iterations, since it needs to compute four Fourier transforms per iteration whereas Nesterov's algorithm needs only two. We think that one can do much better, especially in the case of J, since its minimization makes extensive use of Fourier transform methods and is therefore easily parallelizable. Moreover, the functional J seems to avoid the famous staircasing effect (see Louchet and Moisan's article [8]) produced by TV minimization. Indeed, the latter yields images with peculiar local configurations; J does not. Throughout these tests, the regularization parameter λ is kept equal to 1. For all the simulations, we used a personal computer with a 2 GHz Core2 Duo processor and we let the two Matlab programs run for exactly 20 seconds.
6.1 First Example
We look at the 256 × 256 Lenna image. This photo went through a Gaussian blur of standard deviation σblur = 1.5 followed by additive zero-mean Gaussian noise with standard deviation σnoise = 4. The original image is shown in Fig. 1. We then implemented and ran Beck-Teboulle's and Nesterov's algorithms to restore the image. The results of these two experiments are shown in Fig. 3 and Fig. 4.
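The degradation used in this example (a Gaussian blur applied in the Fourier domain, followed by additive Gaussian noise) can be sketched as below. This is an illustrative NumPy version, not the Matlab code used for the experiments: the periodic sampling of the Gaussian kernel, the synthetic test image and the function names are all assumptions.

```python
import numpy as np


def gaussian_transfer_function(n, sigma):
    """FFT of a periodic, normalised Gaussian kernel on an n x n grid."""
    idx = np.arange(n)
    dist = np.minimum(idx, n - idx).astype(float)  # periodic distance to pixel 0
    k1 = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    k1 /= k1.sum()
    return np.fft.fft2(np.outer(k1, k1))


def blur(u, k_hat):
    """Periodic convolution computed as a product in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(u) * k_hat))


def degrade(u, sigma_blur, sigma_noise, rng):
    """Blur a square image, then add zero-mean Gaussian noise."""
    k_hat = gaussian_transfer_function(u.shape[0], sigma_blur)
    return blur(u, k_hat) + rng.normal(0.0, sigma_noise, size=u.shape)


rng = np.random.default_rng(0)
image = np.zeros((256, 256))
image[64:192, 64:192] = 200.0  # synthetic stand-in for the test photo
observed = degrade(image, sigma_blur=1.5, sigma_noise=4.0, rng=rng)
```

Because the kernel is normalised to sum to one, the blur leaves constant images and the mean grey level unchanged.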
Fig. 1. Original Lenna photo
Fig. 2. σblur = 1.5, σnoise = 4
Fig. 3. J-processed Lenna, 600 iterations
Fig. 4. TV-processed Lenna, 1000 iterations

6.2 Second Example
The second example aims to compare the deblurring for a text scan. Fig. 5 is the original image. In Fig. 6 the 256 × 256 poem image has undergone the same degradation process, with parameters σblur = 1 and σnoise = 4.
Fig. 5. The first verse of a famous Verlaine's poem
Fig. 6. σblur = 1, σnoise = 4
Fig. 7. J-processed poem, 600 iterations
Fig. 8. TV-processed poem, 1200 iterations
References
1. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, Oxford (2000)
2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences (accepted) (2008)
3. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM Journal on Multiscale Modeling and Simulation 4(4), 1168–1200 (2005)
4. Dautray, R., Lions, J.-L.: Mathematical Analysis VI and Numerical Methods for Science and Technology. Evolution Problems II, vol. 6. Springer, Heidelberg (1993)
5. Giusti, E.: Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, Basel (1984)
6. Goldstein, T., Osher, S.: The Split Bregman Method for L1 Regularized Problems. UCLA CAAM Report 08-29 (2008)
7. Jalalzai, K.: Étude des propriétés d'une variante de la variation totale. Master thesis (2008)
8. Louchet, C., Moisan, L.: Total variation denoising using posterior expectation (2008), http://hal.archives-ouvertes.fr
9. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)
10. Nesterov, Y.: Introductory lectures on convex optimization. Kluwer Academic Publishers, Dordrecht (2004)
11. Nesterov, Y.: Smooth minimization of non-smooth functions. Mathematical Programming (A), pp. 127–152 (2005)
12. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Report (2007)
13. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
14. Temam, R.: Navier-Stokes Equations Theory and Numerical Analysis. AMS Bookstore (2001)
15. Tai, X.-C., Wu, C.: Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model. UCLA CAAM Report 09-05 (2009)
Coarse-to-Fine Image Reconstruction Based on Weighted Differential Features and Background Gauge Fields
Bart Janssen, Remco Duits, and Luc Florack
Eindhoven University of Technology, Dept. of Biomedical Engineering & Dept. of Mathematics and Computer Science
{B.J.Janssen,R.Duits,L.M.J.Florack}@tue.nl
Abstract. We propose an iterative approximate reconstruction method where we minimize the difference between reconstructions from subsets of multi scale measurements. To this end we interpret images not as scalar-valued functions but as sections through a fibered space. Information from previous reconstructions, which can be obtained at a coarser scale than the current one, is propagated by means of covariant derivatives on a vector bundle. The gauge field that is used to define the covariant derivatives is defined by the previously reconstructed image. An advantage of using covariant derivatives in the variational formulation of the reconstruction method is that with the number of iterations the accuracy of the approximation increases. The presented reconstruction method allows for a reconstruction at a resolution of choice, which can also be used to speed up the approximation at a finer level. An application of our method to reconstruction from a sparse set of differential features of a scale space representation of an image allows for a weighting of the features based on the sensitivity of those features to noise. To demonstrate the method we apply it to the reconstruction from singular points of a scale space representation of an image.
1 Introduction
Reconstruction from signal samples is a long-standing problem in signal and image analysis [20]. We present a method for the approximation of a signal or image from its generalized samples, i.e. the samples are given on a non-equidistant grid and were obtained by means of spatially varying filters. Variational reconstruction of non-equidistant image samples has recently become of interest to the image compression community [9], where significant gains in reconstruction quality have been obtained by introducing anisotropic non-linear regularization strategies. In the scale space community there has been a general interest in reconstruction from generalized samples for quite some time [19, 18, 14, 12, 13]. We propose a method that produces an image that approximately satisfies all features. Features that are more robust to perturbations of the source image are given a higher weight, which steers the reconstruction method such that those features are better approximated than those that are more sensitive to noise. This leads to a more robust method compared to interpolating methods. A gauge field is introduced by means of covariant derivatives on a vector bundle. This way a model of the image to be reconstructed can be incorporated in the energy functional which is minimized to find a suitable reconstruction. Using this gauge field we can construct a coarse-to-fine image reconstruction method. A coarse-to-fine approach naturally leads to a more efficient algorithm in terms of memory consumption and computational efficiency.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 377–388, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Image Reconstruction
In the reconstruction problem we aim for a reconstruction from a set of linear functionals on an image. These functionals represent measurements on the image and are henceforth called features. More rigorously: a feature $d_i \in \mathbb{R}$ of an image $f \in L_2(\mathbb{R}^2)$ measured with a filter $\psi_i \in L_2(\mathbb{R}^2)$ is given by
\[ d_i = (\psi_i, f)_{L_2}, \quad i = 1 \ldots P, \]
in which $(\cdot, \cdot)_{L_2}$ denotes the $L_2$-inner product. In general the set of features does not describe the input image f unambiguously (the filters do not constitute a frame), and there is need for a model to which the reconstruction should adhere. When such a model can be described by a (semi-)norm, the reconstruction can be obtained directly by means of an orthogonal projection onto the features [14]. Nielsen and Lillholm [19, 18] proposed to find a reconstruction from its features using a nonlinear regularization term (model). Their so-called observation-constrained evolution ensures that the features are interpolated. When measurements are contaminated by noise, approximation is often favored over interpolation. In the following we will not discuss the interpolation but the approximation of a set of P features $\{d_i\}_{i=1}^P$ that were obtained by means of the filters $\{\psi_i\}_{i=1}^P$.
3 Approximation
Instead of searching for a signal that interpolates the given features one can try to find a signal that approximates the features. In the case of noisy measurements the latter approach is often preferred. We now aim for the function $g \in H^1(\mathbb{R}^2)$ that minimizes
\[ E(g) = \underbrace{\sum_{i=1}^{P} \left( (g, \psi_i)_{L_2} - d_i \right)^2}_{\text{data term}} + \underbrace{\frac{\lambda}{2} \int_{\mathbb{R}^2} \|\nabla g\|^2 \, dV}_{\text{regularization term}}, \tag{1} \]
where $\lambda \in \mathbb{R}^+$ is a parameter that controls the quality of the approximation. As λ tends to 0 the approximation will approach the interpolation of the features. The minimizer of this quadratic functional can be found by finding the unique g that solves the following Euler equation:
\[ \sum_{i=1}^{P} \psi_i \left( (\psi_i, g)_{L_2} - d_i \right) - \lambda \Delta g = 0. \tag{2} \]
The parameter λ takes into account each feature with the same weight. This is not desirable when the features are not normalized, and even after normalization one can improve on the selection of the weights. We allow for these improvements by introducing P extra parameters (which we will call feature weights), $\alpha_i \in \mathbb{R}^+$, $i = 1 \ldots P$, that will be set to a fixed value based on the properties of the features. In case of reconstruction from differential features of a scale space representation of an image, which is the main motivation for our method, we can select the newly introduced parameters based on the noise propagation in the scale space representation of an image. The global parameter λ can be absorbed by these parameters but will be maintained in our formulation for the sake of clarity. For fixed $\alpha_i$ we now search for the g that satisfies
\[ \arg\min_{g \in L_2(\mathbb{R}^2)} E(g) = \arg\min_{g \in L_2(\mathbb{R}^2)} \sum_{i=1}^{P} \alpha_i \left( (g, \psi_i)_{L_2} - d_i \right)^2 + \frac{\lambda}{2} \int_{\mathbb{R}^2} \|\nabla g\|^2 \, dV. \tag{3} \]
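On a small pixel grid the weighted energy of eq. (3) becomes a quadratic function of the pixel values, and its minimizer solves a linear system. The sketch below is only an illustration under several assumptions: the filters are plain (zeroth-order) Gaussians at random positions, the weights are uniform, the Laplacian is a periodic finite-difference matrix, and the system is solved densely. None of these choices, nor the function names, come from the paper.

```python
import numpy as np


def gaussian_filter(n, cx, cy, s):
    """Sampled, normalised Gaussian filter centred at (cx, cy), flattened."""
    y, x = np.mgrid[0:n, 0:n]
    psi = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * s ** 2))
    return (psi / psi.sum()).ravel()


def neg_laplacian(n):
    """Matrix of minus the periodic finite-difference Laplacian on n x n pixels."""
    D = np.roll(np.eye(n), 1, axis=1) - np.eye(n)  # (D u)_i = u_{i+1} - u_i
    DtD = D.T @ D
    return np.kron(DtD, np.eye(n)) + np.kron(np.eye(n), DtD)


def energy(g, Psi, d, alpha, lam, Lmat):
    """Discrete counterpart of eq. (3)."""
    r = Psi @ g - d
    return np.sum(alpha * r ** 2) + 0.5 * lam * g @ Lmat @ g


def reconstruct(Psi, d, alpha, lam, Lmat):
    """Solve the normal equations obtained by setting the gradient of E to zero."""
    A = 2.0 * Psi.T @ (alpha[:, None] * Psi) + lam * Lmat
    b = 2.0 * Psi.T @ (alpha * d)
    return np.linalg.solve(A, b)


n, P, lam = 16, 40, 1e-3
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:n, 0:n]
f = np.sin(2 * np.pi * xx / n) + np.cos(2 * np.pi * yy / n)  # synthetic source image
Psi = np.stack([gaussian_filter(n, *rng.uniform(0, n, size=2), 1.5) for _ in range(P)])
d = Psi @ f.ravel()          # noise-free features d_i = (psi_i, f)
alpha = np.full(P, 1.0 / P)  # uniform weights summing to one
Lmat = neg_laplacian(n)
g_rec = reconstruct(Psi, d, alpha, lam, Lmat)
print(energy(g_rec, Psi, d, alpha, lam, Lmat), energy(f.ravel(), Psi, d, alpha, lam, Lmat))
```

Replacing the uniform `alpha` by feature-dependent weights changes only the two lines that build `A` and `b`.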
In the next section we will discuss how the feature weights can be selected.

3.1 Noise Propagation
In order to be able to select sensible values for the $\alpha_i$ parameters that appear in eq. (3), we need to make some assumptions on the noise and the set of filters $\{\psi_i\}_{i=1}^P$ that are used to extract the measurements. With regard to the noise we assume additive zero-mean white Gaussian noise which has a correlation distance of τ pixels. In recent work about stability of toppoints [2] (which are singular points of a Gaussian scale space representation of an image) this was found to be a sensible assumption. In our application we will reconstruct from differential structure taken from the Gaussian scale space representation of the input image f, therefore we assume that the set of filters $\{\psi_i\}_{i=1}^P$ consists of Gaussian kernels or derivatives thereof. The idea is now to construct the weights $\alpha_i$ according to the sensitivity of their associated differential features to noise. In order to estimate the sensitivity of a feature $d_i$ of the image f that is contaminated by additive noise we can adopt work on noise propagation in scale space by Blom [4]. He proposes to compute at a certain scale t > 0 the momenta $M^2_{m_x,m_y,n_x,n_y} = \langle N_{m_x,m_y}, N_{n_x,n_y} \rangle$ of derivatives of orders $m_x$, $m_y$, $n_x$, and $n_y$ of the fiducial noise function N. He assumes only the covariance matrix $\langle N^2 \rangle$ of the noise to be given. In case the correlation distance τ is much smaller than the scale t,
\[ M^2_{m_x,m_y,n_x,n_y} \simeq \langle N^2 \rangle \, \frac{\tau}{2t} \left( \frac{-1}{4t} \right)^{\frac{1}{2}(m_x + m_y + n_x + n_y)} Q_{m_x+n_x} Q_{m_y+n_y}, \tag{4} \]
with $Q_n = (n + 1)!!$ for n even and $Q_n = 0$ otherwise. Features that are sensitive to perturbations on the source image f should influence the final result less than features that are relatively insensitive to these perturbations. Therefore we compute $\alpha_i$ from eq. (4) such that $\alpha_i \propto M^{-2}_{n^i_x, n^i_y, n^i_x, n^i_y}$ at scale $t^i$. The parameters $n^i_x$, $n^i_y$, and $t^i$ are the derivative order in the x direction, the derivative order in the y direction and the scale of the i-th filter $\psi_i$. Here we stress that these estimations are based on the assumption that the filters are partial derivatives of a Gaussian. We furthermore ensure that $\sum_{i=1}^{P} \alpha_i = 1$, which essentially makes $\alpha_i$ independent of the value of $\langle N^2 \rangle$ and τ.

3.2 Discretization
We can try to solve for an approximation to $g$ by discretizing eq. (2) (augmented with the feature weights), or by discretizing the energy functional in eq. (3) and thereafter finding a discrete minimizer of the discretized energy. These two approaches can be equivalent for a suitable choice of the so-called test functions that are involved in the former method. We will proceed by elaborating on directly discretizing the energy. To solve $g$ from eq. (3) we will approximate $g$ by a β-spline of order $n$:
\[
\beta^n(x) = \mathcal{F}^{-1}\left[ \omega \mapsto \frac{(e^{i\omega/2} - e^{-i\omega/2})^{n+1}}{(i\omega)^{n+1}} \right](x) , \tag{5}
\]
where $\mathcal{F}^{-1}$ denotes inverse Fourier transformation. Equality (5) is equivalent to the $(n + 1)$-fold convolution of the β-spline of order 0
\[
\beta^0(x) = \begin{cases} 1 & -\frac{1}{2} < x < \frac{1}{2} \\ \frac{1}{2} & |x| = \frac{1}{2} \\ 0 & \text{otherwise} \end{cases} ; \tag{6}
\]
a rectangle. Further details concerning β-splines can be found in e.g. [22]. It was shown in the context of optic flow [17] and registration [21] that such an approach has computational advantages over a finite difference approach. Arigovindan et al. [1] showed good results in their application of this approach to (a multigrid scheme for) image and vector field interpolation. Moreover it allows for a coarse-to-fine implementation in an elegant way because of the 2-scale relation
\[
\beta^n\left( \frac{x}{2^j} \right) = 2^{-n} \sum_{k \in \mathbb{Z}} \binom{n+1}{k} \beta^n\left( \frac{x}{2^{j-1}} - k \right) . \tag{7}
\]
The $n$-th order β-spline approximation of $g$ in two spatial dimensions at resolution $a > 0$ is given by
\[
\tilde{g}_a(x, y) = \sum_{l=0}^{M-1} \sum_{k=0}^{N-1} c_{k,l} \, \beta^n\left( \frac{x}{a} - k \right) \beta^n\left( \frac{y}{a} - l \right) , \tag{8}
\]
with $c_{k,l}, x, y \in \mathbb{R}$, $\beta^n(\cdot)$ the central β-spline of order $n \in \mathbb{N}$, resolution parameter $a$, and $N, M \in \mathbb{N}$ corresponding to the width and height of the image in pixels. Notice this is a representation of the image in the continuous domain and that $\tilde{g}_a \in C^n(\mathbb{R}^2)$, i.e. $n$-times continuously differentiable.
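As an illustration of eqs. (6) and (8), the following minimal sketch evaluates a central β-spline and the tensor-product expansion. It is not the authors' implementation: the function names `bspline` and `evaluate_expansion` are hypothetical, and the closed form in terms of truncated powers is the standard textbook expression for central B-splines, used here as an assumption in place of the Fourier definition in eq. (5).

```python
from math import comb, factorial

def bspline(n, x):
    """Central B-spline beta^n(x): the (n+1)-fold convolution of a unit box."""
    if n == 0:
        # Box function of eq. (6), including the value 1/2 on the boundary.
        if abs(x) < 0.5:
            return 1.0
        return 0.5 if abs(x) == 0.5 else 0.0
    # Standard closed form using truncated powers (assumed, not from the text).
    total = 0.0
    for k in range(n + 2):
        shifted = x + (n + 1) / 2 - k
        if shifted > 0:
            total += (-1) ** k * comb(n + 1, k) * shifted ** n
    return total / factorial(n)

def evaluate_expansion(coeffs, n, a, x, y):
    """Evaluate eq. (8); coeffs[l][k] holds c_{k,l} (row index l, column index k)."""
    value = 0.0
    for l, row in enumerate(coeffs):
        by = bspline(n, y / a - l)
        if by == 0.0:
            continue  # this row of basis functions does not reach y
        for k, c in enumerate(row):
            value += c * bspline(n, x / a - k) * by
    return value
```

With all coefficients equal to one, the expansion evaluates to one at interior points, which reflects the partition-of-unity property of the basis.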
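The regularization term in eq. (3), $\int_{\mathbb{R}^2} \|\nabla g\|^2 \, dx \, dy$, can be approximated with the help of eq. (8) by
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \|\nabla g_a(x, y)\|^2 \, dx \, dy = \sum_{i=0}^{1} \sum_{l,n=0}^{M-1} \sum_{k,m=0}^{N-1} c_{k,l} \, c_{m,n} \int_{-\infty}^{\infty} \frac{\partial \beta^n}{\partial x_i}\left( \frac{x_i}{a} - k \right) \frac{\partial \beta^n}{\partial x_i}\left( \frac{x_i}{a} - m \right) dx_i \int_{-\infty}^{\infty} \beta^n\left( \frac{x_{1-i}}{a} - l \right) \beta^n\left( \frac{x_{1-i}}{a} - n \right) dx_{1-i} , \tag{9}
\]
where $(x_0, x_1)$ correspond to $(x, y)$ in eq. (8). When we consider the integrals in the previous equation we notice that they can be expressed by a convolution:
\[
\int_{-\infty}^{\infty} \frac{\partial \beta^n}{\partial x}\left( \frac{x}{a} - k \right) \frac{\partial \beta^n}{\partial x}\left( \frac{x}{a} - m \right) dx = -a \int_{-\infty}^{\infty} \frac{\partial \beta^n}{\partial u}(u) \, \frac{\partial \beta^n}{\partial u}((m - k) - u) \, du . \tag{10}
\]
This is easily verified by substitution of the integration variable ($u = \frac{x}{a} - k$) and noting that $\beta^n(x) = \beta^n(-x)$ for all $x \in \mathbb{R}$. We furthermore note that a derivative of a central β-spline of degree $n$ is again a linear combination of β-splines at the expense of lowering its degree to $(n - 1)$:
\[
\frac{\partial}{\partial x} \beta^n(x) = \beta^{n-1}(x + 1/2) - \beta^{n-1}(x - 1/2) . \tag{11}
\]
As a result we can write eq. (9) in matrix-vector notation as
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \|\nabla g_a(x, y)\|^2 \, dx \, dy = c^T R c , \tag{12}
\]
with $c = \{c_i\}_{i=0}^{(M-1)(N-1)}$ and
\[
R = \left[ a \beta^{2n}(n - l) \right]_{n,l=0}^{N-1} \otimes \left[ -a \frac{\partial \beta^{2n}}{\partial x}(m - k) \right]_{m,k=0}^{M-1} + \left[ -a \frac{\partial \beta^{2n}}{\partial y}(n - l) \right]_{n,l=0}^{N-1} \otimes \left[ a \beta^{2n}(m - k) \right]_{m,k=0}^{M-1} . \tag{13}
\]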
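The derivative identity in eq. (11), on which the closed-form entries of $R$ rest, can be checked numerically. The sketch below is only a sanity check under stated assumptions: `bspline` uses the standard truncated-power closed form for central B-splines (not taken from the text), and `central_difference` is a hypothetical helper with an arbitrarily chosen step size.

```python
from math import comb, factorial

def bspline(n, x):
    """Central B-spline beta^n(x) (standard closed form, assumed here)."""
    if n == 0:
        if abs(x) < 0.5:
            return 1.0
        return 0.5 if abs(x) == 0.5 else 0.0
    total = 0.0
    for k in range(n + 2):
        shifted = x + (n + 1) / 2 - k
        if shifted > 0:
            total += (-1) ** k * comb(n + 1, k) * shifted ** n
    return total / factorial(n)

def bspline_derivative(n, x):
    """Right-hand side of eq. (11); meaningful for n >= 1."""
    return bspline(n - 1, x + 0.5) - bspline(n - 1, x - 0.5)

def central_difference(n, x, step=1e-5):
    """Finite-difference estimate of the derivative of beta^n at x."""
    return (bspline(n, x + step) - bspline(n, x - step)) / (2 * step)

# Compare both derivative estimates for the cubic spline at a few points.
for x in (-1.7, -0.4, 0.3, 1.2):
    gap = abs(bspline_derivative(3, x) - central_difference(3, x))
    print(f"x = {x:+.1f}, |difference| = {gap:.2e}")
```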
We will express the inner product in the data term in equation (3) in terms of β-splines as well. This leads to an expression similar to eq. (10),
\[
(g_a, \psi_i)_{L_2(\mathbb{R}^2)} = (-1)^{n_i + m_i} \sum_{k,l=0}^{N-1,M-1} c_{k,l} \, (\beta^n * \psi_i)(k - x_i, l - y_i) , \tag{14}
\]
where $(x_i, y_i)$ and $(n_i, m_i)$ are the location and differential order of the $i$th filter $\psi_i$ respectively. In contrast to the discretization of the regularization term we will not derive a closed form expression for this convolution, but we will approximate the β-spline in eq. (14) by a Gaussian, using the observation in [23] that
\[
\beta^n(x) \cong \sqrt{\frac{6}{\pi(n + 1)}} \, e^{-\frac{6x^2}{n+1}} . \tag{15}
\]
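To get a feeling for the quality of the approximation in eq. (15), the following sketch compares the spline with the Gaussian of variance $(n + 1)/12$ on a sample grid. The helper names, the sampling interval and the number of samples are arbitrary choices made for this illustration, and `bspline` again relies on the standard closed form rather than on anything stated in the text.

```python
from math import comb, exp, factorial, pi, sqrt

def bspline(n, x):
    """Central B-spline beta^n(x) for n >= 1 (standard closed form, assumed)."""
    total = 0.0
    for k in range(n + 2):
        shifted = x + (n + 1) / 2 - k
        if shifted > 0:
            total += (-1) ** k * comb(n + 1, k) * shifted ** n
    return total / factorial(n)

def gaussian_approx(n, x):
    """Right-hand side of eq. (15)."""
    return sqrt(6.0 / (pi * (n + 1))) * exp(-6.0 * x * x / (n + 1))

def max_abs_error(n, samples=401, half_width=4.0):
    """Largest pointwise gap between spline and Gaussian on [-half_width, half_width]."""
    xs = [-half_width + 2 * half_width * i / (samples - 1) for i in range(samples)]
    return max(abs(bspline(n, x) - gaussian_approx(n, x)) for x in xs)

for order in (3, 5, 7):
    print(order, max_abs_error(order))
```

In this experiment the gap shrinks as the order grows, in line with the asymptotic statement cited from [23].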
The data term can be expressed in matrix-vector notation by
\[
E_{\text{data}}(c) = \|Sc - d\|^2 , \tag{16}
\]
where $S = \{(\beta^n * \psi_i)(k - x_i, l - y_i)\}_{k,l=0,\,i=1}^{(N-1)(M-1),\,P}$ and $d = \{d_i\}_{i=1}^{P}$. Now we can write the minimizer of equation (3) in matrix-vector notation as
\[
\left( S^T S - \lambda R \right) c = S^T d . \tag{17}
\]
This linear system of equations can be solved using a conjugate gradient (CG) method [3]. In case the matrix $S$ is sparse it is beneficial to apply a multigrid method [5]. Mainly due to the non-sparseness of $S$, the conjugate gradient method is preferred. Notice that, in this specific case, $R$ can be expressed as a convolution. For large images it is infeasible to explicitly compute $S^T S$, therefore we compute the matrix-vector product $\hat{c} = S^T S c$ that appears in a conjugate gradient iteration by first evaluating $\tilde{c} = Sc$ and thereafter evaluating $\hat{c} = S^T \tilde{c}$.
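The two-step product described above can be sketched with a small matrix-free conjugate gradient solver. This is a toy illustration, not the authors' code: all function names are hypothetical, matrices are plain nested lists, and `mu` stands for the coefficient that multiplies $R$ in eq. (17). The sketch assumes that $S^T S + \mu R$ is symmetric positive definite, which is what CG requires; how that maps onto the sign convention printed in eq. (17) is left open.

```python
def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Product A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(apply_op, b, tol=1e-12, max_iter=200):
    """Solve apply_op(x) = b for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the start vector x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        Ap = apply_op(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pj for ri, pj in zip(r, p)]
        rs = rs_new
    return x

def solve_reconstruction(S, R, d, mu):
    """Solve (S^T S + mu R) c = S^T d without ever forming S^T S."""
    def apply_op(c):
        c_tilde = matvec(S, c)        # c~ = S c
        c_hat = rmatvec(S, c_tilde)   # c^ = S^T c~
        Rc = matvec(R, c)
        return [h + mu * r for h, r in zip(c_hat, Rc)]
    return conjugate_gradient(apply_op, rmatvec(S, d))
```

A possible call uses two measurements of three unknowns and a one-dimensional difference matrix as a stand-in for $R$: `solve_reconstruction([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]], [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]], [1.0, 2.0], 0.1)`.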
4 Adaptation to a Gauge Field
In the previous sections we used a very simple model as a regularization term. For several applications it would be beneficial if we were able to introduce a more sophisticated model of the image we want to reconstruct. Feature based image editing [16] and optic flow estimation [13,8] are applications that could benefit greatly from such a refinement. Recently an image in-painting method was introduced that achieves a model refinement by means of covariant derivatives on a vector bundle that are guided by a user selectable gauge field [10]. We will adopt a similar approach. The basic idea is to replace the gradient that appears in the regularization term of eq. (3) by a covariant derivative $D^{A_h}$ that is biased by a gauge field $h \in H^2(\mathbb{R}^2)$. To this covariant derivative the gauge field $h$ should be “invisible”, i.e. $D^{A_h} h = 0$. If we were able to put $h$ to be the original image $f$ the approximation would exactly produce $f$ again. To this end we interpret $f$ not as a scalar function but as a section through a fibered space $E = \mathbb{R}^2 \times \mathbb{R}^+$. Heuristically this means that we rescale intensity by a spatially varying factor, the unit section $\sigma$. Thus we consider $f\sigma$ instead of $f$ to model intensity values in the image (the latter is a special case in which $\sigma(x) = (x, 1)$ $\forall x \in \mathbb{R}^2$). This implies that when we consider derivatives, we need to account for the spatial variability of $\sigma$. In the next subsection we will introduce to this end a connection on a vector bundle. There, we will also make the heuristic description of our approach presented here a bit more rigorous. For the reader who is not familiar with the concept of vector bundles it could be useful to take notice of Fig. 1 before reading the next subsection, since it aids in developing the right geometrical interpretation of the presented material.

4.1 Connections on Vector Bundles
Consider a vector bundle $(E, \pi, M)$, with total space $E = \mathbb{R}^2 \times \mathbb{R}^+$, base space $M = \mathbb{R}^2$, and projection $\pi : E \to M$. $\pi$ projects a point in $E$ (a point in $M$ augmented with an intensity $L \in \mathbb{R}^+$) to $M$ in the following manner
\[
\pi(x, y, L) = (x, y) . \tag{18}
\]
$L$ amounts to a certain physical quantity such as luminous intensity, which is expressed in candela (cd). Next we define a section $s : M \to E$ such that $\pi \circ s = \mathrm{id}_M$, where $\mathrm{id}_M$ denotes the identity map on $M$. We define the association of a section $\sigma_f$ with unique image $f \in L_2(\mathbb{R}^2)$ as
\[
f \leftrightarrow \sigma_f \;\Leftrightarrow\; \forall_{(x,y) \in \mathbb{R}^2} \; \sigma_f(x, y) = (x, y, f(x, y)) . \tag{19}
\]
The multiplication of such a section $\sigma_f$ by an image $g$ is given by
\[
g \sigma_f = \sigma_{fg} . \tag{20}
\]
Let $\tilde{\sigma}$ denote the unit section $\tilde{\sigma}(x, y) = (x, y, L_0)$, with $L_0$ a fixed luminous intensity unit (e.g. 1 cd). We want to define a connection $D$ over the space of sections $\Gamma(E)$ on $E$. Let $\mathcal{L}(\Gamma(TM), \Gamma(E))$ denote the space of linear operators that map a section of a tangent bundle on $M$ to a section of a vector bundle. Here we stress that a section of a tangent bundle, $V \in \Gamma(TM)$, is just a vector field on $M$. A map
\[
D : \Gamma(E) \to \mathcal{L}(\Gamma(TM), \Gamma(E)) \tag{21}
\]
is a connection on a vector bundle iff it possesses the following properties, cf. [15], p. 106. In the following we will use standard notation $D_V \sigma = (D\sigma)(V)$.
1. $D$ is tensorial in $V$:
\[
D_{V+W} \sigma = D_V \sigma + D_W \sigma \quad \text{for } V, W \in \Gamma(TM), \; \sigma \in \Gamma(E) , \tag{22}
\]
\[
D_{fV} \sigma = f D_V \sigma \quad \text{for } f \in C^\infty(M, \mathbb{R}), \; V \in \Gamma(TM) . \tag{23}
\]
2. $D$ is $\mathbb{R}$-linear in $\sigma$:
\[
D_V(\sigma + \tau) = D_V \sigma + D_V \tau \quad \text{for } V \in \Gamma(TM), \; \sigma, \tau \in \Gamma(E) , \tag{24}
\]
and it satisfies the Leibniz product rule:
\[
D_V(f\sigma) = V(f)\sigma + f D_V \sigma \quad \text{for } f \in C^\infty(M, \mathbb{R}) . \tag{25}
\]
Suppose we have a connection $D$ on a vector bundle. Then it must satisfy the four properties (eqs. (22) to (25)) mentioned above. Therefore we must have the following identity
\[
D\sigma(X)(c(t)) = D(z\tilde{\sigma})(X)(c(t)) = X|_{c(t)}(z)\,\tilde{\sigma} + \sum_{i=1}^{2} z(c(t)) \, \dot{c}^i(t) \, D_{\partial_{x^i}} \tilde{\sigma} \tag{26}
\]
for all sections $\sigma = z\tilde{\sigma}$, and vector fields $X = \sum_{i=1}^{2} \dot{c}^i \partial_{x^i}$. Here $c : (0, 1) \to M$ is a smooth curve on $M$, $\dot{c}^i(t) = \langle dx^i, \dot{c}(t) \rangle$, $i = 1, 2$, with $\dot{c}(t) = \frac{d}{dt} c(t)$, and $z \in C^\infty(M, \mathbb{R})$ an arbitrary image. By $\{dx^i\}_{i=1}^{2} = \{dx, dy\}$ we denote the dual frame in the cotangent bundle $T^*M$.
For each $i = 1, 2$, $D_{\partial_{x^i}} \tilde{\sigma}$ should be a section on the vector bundle. Such a section can be identified with a function
\[
A_i : M \to \mathbb{R} \tag{27}
\]
by eq. (19), i.e.
\[
D_{\partial_{x^i}} \tilde{\sigma} = \sigma_{A_i} = A_i \tilde{\sigma} . \tag{28}
\]
Substituting eq. (28) into eq. (26) yields
\[
D\sigma(X)(c(t)) = \sum_{i=1}^{2} \left( \dot{c}^i(t) \, \partial_{x^i}(z)(c(t)) + z(c(t)) \, \dot{c}^i(t) \, A_i(c(t)) \right) \tilde{\sigma} . \tag{29}
\]
So each connection is parameterized by the co-vector field $A = \sum_{i=1}^{2} A_i \, dx^i$. At this point we still have a degree of freedom, namely we still can select a specific co-vector field. In our application we want a certain image $h$ to be “invisible”, so for a fixed $h$ we select $A = A_h$ such that
\[
D^{A_h}(\sigma_h) = 0 , \tag{30}
\]
i.e. $D^{A_h}_{\dot{c}}(\sigma_h) = 0$ for all curves $c$, holds for a specific image $h$. Here we made the dependence of $D$ on $A_h$ explicit in the superscript notation (in the previous equations we left it out in order to facilitate readability). Given the requirement of eq. (30) we explicitly calculate
\[
\sum_{i=1}^{2} \left( \dot{c}^i(t) (\partial_{x^i} h)(c(t)) + h(c(t)) \, \dot{c}^i(t) \, A_i(c(t)) \right) \tilde{\sigma} = 0\,\tilde{\sigma} \quad \text{for all curves } c : (0, 1) \to M
\]
\[
\Leftrightarrow \; (\nabla h)(c(t)) + h(c(t)) A(c(t)) = 0 \tag{31}
\]
\[
\Leftrightarrow \; A_h(c(t)) = -\sum_{i=1}^{2} \partial_{x^i} \log h(c(t)) \, dx^i \quad \forall h > 0 , \tag{32}
\]
which gives us an expression for $A_h$ (eq. (32)) provided $h$ is strictly positive. This is a limitation of our method. However, for a system that observes physical quantities this is a realistic assumption. From the previous derivations we conclude that applying a covariant derivative that is gauged by an image $h$ to an image $f$ amounts to
\[
\left( D^{A_h}(\sigma_f) \right)(\dot{c}) = \left( \dot{c}(f) + \sum_{i=1}^{2} A_i \dot{c}^i f \right) \tilde{\sigma} = \left( \dot{c}(f) - \sum_{i=1}^{2} \left( (\partial_{x^i} \log h) \, \dot{c}^i f \right) \right) \tilde{\sigma} = \left( \dot{c}(f) - \frac{f}{h} \dot{c}(h) \right) \tilde{\sigma} . \tag{33}
\]
Here we used the following short notation:
\[
\dot{c}(f) = \sum_{i=1}^{2} \dot{c}^i \partial_{x^i}(f) = (\dot{c}(t) \cdot \nabla f)(c(t)) = \frac{d}{dt} f(c(t)) . \tag{34}
\]
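The last expression in eq. (33) is easy to try out in one dimension, where the curve is simply $c(t) = t$ and $\dot{c}(f) = f'$. The sketch below is a hedged illustration with hypothetical names: `gauged_derivative` evaluates $f' - (f/h)\,h'$, and the gauge field `h` is an arbitrary strictly positive example function chosen for this demonstration.

```python
from math import cos, exp, sin

def gauged_derivative(f, df, h, dh, x):
    """Last form of eq. (33) along c(t) = t: f'(x) - f(x) * h'(x) / h(x)."""
    hx = h(x)
    if hx <= 0:
        raise ValueError("the gauge field h must be strictly positive")
    return df(x) - f(x) * dh(x) / hx

# Hypothetical strictly positive gauge field and its derivative.
def h(x):
    return 2.0 + sin(x)

def dh(x):
    return cos(x)

# The gauge field itself is "invisible" to the gauged derivative ...
print(gauged_derivative(h, dh, h, dh, 0.7))
# ... whereas another image, here exp(x), generally is not.
print(gauged_derivative(exp, exp, h, dh, 0.7))
```

Any constant multiple of $h$ is annihilated as well, and a constant gauge field reduces the gauged derivative to the ordinary derivative.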
Fig. 1. A visualization of the calculation of a covariant derivative as described in eq. (33). The base space $M$ corresponds to $\mathbb{R}^2$ and total space $E$ corresponds to $\mathbb{R}^2 \times \mathbb{R}^+$. We refer to the text right after eq. (33) for an explanation of this figure.
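Note that eq. (33) can be rewritten as
\[
\forall_{c : (0,1) \to M} : \; \left( D^{A_h} \sigma_f \right)(\dot{c}(t))(c(t)) = \tilde{\sigma} \, (df + f A_h)(\dot{c}(t)) , \tag{35}
\]
i.e. $D^{A_h}(f\tilde{\sigma}) = \tilde{\sigma}(df + f A_h)$. When we identify $\sigma_f = f\tilde{\sigma} \leftrightarrow f$ this simplifies to
\[
D^{A_h} f = (d + A_h) f , \tag{36}
\]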
in which $A_h f$ is a multiplication. The calculation of a covariant derivative as described in eq. (33) allows for a geometrical interpretation. A visualization thereof, which is depicted in Fig. 1, will be described next. We stress that this is a specially crafted example since there is structure present in only one direction. Therefore we only have to construct a visualization for the calculation of a covariant derivative in the direction that is labelled by $x$ in the figure. The derivative in the direction that is labelled by $y$ simply vanishes. On the base space $M$ a curve $c : (0, 1) \to M$ is drawn. We want to calculate the covariant derivative of the section $\sigma_f$ at the point that corresponds to $c(t)$ on the base space. The covariant derivative is gauged by the gauge field $h$. Therefore another section, $\sigma_h$, is depicted in the figure. The gradient of $\sigma_h$ at the point $\sigma_h(c(t))$ in total space $E$ is depicted by a line, labelled A, through $\sigma_h(c(t))$ and $\sigma_h(c(t + \epsilon))$. The line labelled D visualizes in a similar manner the gradient of $\sigma_f$ at $\sigma_h(c(t))$. On the left side it is shown how the gradient of A is attenuated by the fraction of the values of $\sigma_f(c(t))$ and $\sigma_h(c(t))$. The value of this attenuated directional derivative is added to the directional derivative of $\sigma_f$ at $\sigma_f(c(t))$ in the upper left of the figure to finally produce the result of eq. (33). To clarify the attenuation process we added Fig. 2, where the relevant lines are labeled the same as their corresponding lines in Fig. 1.

Fig. 2. Visualization of the amplification of $\dot{c}(h)$ by $\frac{f(c(t))}{h(c(t))}$. This image clarifies the congruence relations that are used in Fig. 1.

In essence the energy functional for which we search a minimizer stays the same as the one for which a minimizer is sought in eq. (3). We merely change the notion of a gradient, which is adapted to a gauge field $h$. The resulting energy functional now reads
\[
E(g) = \sum_{i=1}^{P} \alpha_i \left( (g, \psi_i)_{L_2} - d_i \right)^2 + \frac{\lambda}{2} \int_{\mathbb{R}^2} \|D^{A_h} g\|^2 \, dV , \tag{37}
\]
where $D^{A_h}$ is the covariant derivative or equivalently a linear connection acting on an image as in eq. (36).
5 Multi-scale Approximate Reconstruction from Singular Points
We will apply the gauged reconstruction of eq. (37) to the reconstruction from singular points of a Gaussian scale space representation $u_f$ of an image $f$, with $u_f$ the unique solution to $\frac{\partial u_f}{\partial s} = \Delta u_f$ with initial condition $u_f(\cdot, 0) = f$. Singular points $(x, y, s) \in \mathbb{R}^2 \times \mathbb{R}^+$ of $u_f$ are those points satisfying
\[
\begin{cases} \nabla u_f(x, y, s) = 0 \\ \det \nabla\nabla^T u_f(x, y, s) = 0 \end{cases} . \tag{38}
\]
For more information about catastrophe theory in general, its application in scale space theory and the calculation of the locations of singular points we refer to [11, 6, 7]. A filter $\psi_i$ corresponding to a derivative at a certain position in the scale space of an image is given by
\[
\psi_i(x, y) = (2 s_i)^{\frac{n_i + m_i}{2}} \, \frac{\partial^{\,n_i + m_i}}{\partial(x^{n_i} y^{m_i})} \, \frac{1}{4\pi s_i} \, e^{-\frac{(x - x_i)^2 + (y - y_i)^2}{4 s_i}} . \tag{39}
\]
Here we used multi-index notation $i = (x_i, y_i, m_i, n_i, s_i)$. A singular point is encoded by storing the second order derivative jet for each singular point location. The discretization proposed in Section 3.2 allows for a reconstruction at a certain resolution $a > 0$. We will select scales $\{2^j\}_{j=0}^{J}$. First we find all features which can be approximated well at the coarsest resolution $J$, i.e. those features for which $\|\psi_i - P_{V_a} \psi_i\| < \epsilon$, where $P_{V_a}$ denotes the $L_2$-projection onto the set $V_a = \{\beta^n(\frac{x}{a} - k)\,\beta^n(\frac{y}{a} - l) \mid k, l \in \mathbb{Z}\}$ and $\epsilon > 0$ a small constant. Next we compute a reconstruction at resolution $J$ using a constant gauge field $h$. Then, for each scale $j = J - 1, \ldots, 0$ we find the gauge field by application of the two scale relation (see eq. (7)) to the reconstructed image at scale $j + 1$. To reduce memory consumption and gain computational efficiency all features that were used in a coarser scale reconstruction are left out, such that those features are only implicitly encoded (via the gauge field) in the reconstruction algorithm. See the caption of Figure 3 for a description of the experiments we conducted. Comparing the fourth and fifth image shows that features which are not directly encoded are passed on by the gauge field (lower resolution images). In fact the difference between the two reconstructions is quite striking. We furthermore note that memory requirements and the computational complexity for the algorithms to produce these two images are equivalent. When the features of all 1070 singular points are directly used (rightmost image in Figure 3) the visual quality is more appealing. The memory requirements are however much larger. We also mention that the method of feature selection for the next level is quite crude and can be improved by incorporating e.g. a feedback loop. These are possibilities for future exploration which are allowed by the presented framework.

Fig. 3. From left to right, (1) the source image “trui”, (2) reconstruction at resolution 65×65 pixels from 84 feature points, (3) reconstruction from 226 feature points at 129×129 pixels and gauged by the image on its left, (4) reconstruction from 727 feature points at 257×257 pixels and gauged by the image on its left, (5) same reconstruction as the image on the left but not gauged, and (6) reconstruction from all 1070 feature points, no gauge field. The features are up to second order differential structure obtained from the scale space representation of the source image at its singular point positions.
6 Conclusions
We introduced a coarse-to-fine image reconstruction method that approximates a set of generalized samples that are weighted according to their noise robustness. Information from a coarse resolution reconstruction is passed to a finer resolution level by means of a gauge field. To this end we considered the image not as a scalar function but as a section through a fibered space. Application of the newly proposed method to the reconstruction from singular points of a scale space representation of an image shows the feasibility of the method.
References
1. Arigovindan, M.: Variational Reconstruction of Vector and Scalar Images from Non-Uniform Samples. PhD thesis, EPFL, Lausanne, Switzerland (2005)
2. Balmashnova, E.: Scale-Euclidean invariant object retrieval. PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands (2007)
3. Barret, R., Berry, M., Chan, T.F., et al.: Templates for the solution of linear systems: Building blocks for iterative methods. SIAM, Philadelphia (1994)
4. Blom, J.: Topological and Geometrical Aspects of Image Structure. PhD thesis, University of Utrecht, Utrecht, The Netherlands (1992)
5. Briggs, W.L., Henson, V.E., McCormick, S.F.: A Multigrid Tutorial. SIAM, Philadelphia (2000)
6. Damon, J.: Local Morse theory for solutions to the heat equation and Gaussian blurring. Journal of Differential Equations 115(2), 368–401 (1995)
7. Florack, L.M.J., Kuijper, A.: The topological structure of scale-space images. JMIV 12(1), 65–79 (2000)
8. Florack, L.M.J., Janssen, B.J., Kanters, F.M.W., Duits, R.: Towards a new paradigm for motion extraction. In: Campilho, A., Kamel, M.S. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 743–754. Springer, Heidelberg (2006)
9. Galic, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A., Seidel, H.: Image compression with anisotropic diffusion. JMIV 31(2-3), 255–269 (2008)
10. Georgiev, T.: Relighting, retinex theory, and perceived gradients. In: Proceedings of Mirage 2005 (March 2005)
11. Gilmore, R.: Catastrophe Theory for Scientists and Engineers. Dover Publications, New York (1993); Originally published by John Wiley & Sons, New York (1981)
12. Janssen, B.J., Duits, R., ter Haar Romeny, B.M.: Linear image reconstruction by Sobolev norms on the bounded domain. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 55–67. Springer, Heidelberg (2007)
13. Janssen, B.J., Florack, L.M.J., Duits, R., ter Haar Romeny, B.M.: Optic flow from multi-scale dynamic anchor point attributes. In: Campilho, A., Kamel, M.S. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 767–779. Springer, Heidelberg (2006)
14. Janssen, B.J., Kanters, F.M.W., Duits, R., Florack, L.M.J., ter Haar Romeny, B.M.: A linear image reconstruction framework based on Sobolev type inner products. IJCV 70(3), 231–240 (2006)
15. Jost, J.: Riemannian Geometry and Geometric Analysis, 4th edn. Springer, Berlin (2005)
16. Kanters, F.M.W.: Towards Object-based Image Editing. PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands (February 2007)
17. Le Besnerais, G., Champagnat, F.: B-Spline image model for energy minimization-based optical flow estimation. IEEE-TIP 15(10), 3201–3206 (2006)
18. Lillholm, M., Nielsen, M., Griffin, L.D.: Feature-based image analysis. International Journal of Computer Vision 52(2/3), 73–95 (2003)
19. Nielsen, M., Lillholm, M.: What do features tell about images? In: Proceedings on Scale Space 2001, pp. 39–50. Springer, Heidelberg (2001)
20. Shannon, C.E.: Communication in the presence of noise. In: Proc. IRE, vol. 37, pp. 10–21 (January 1949)
21. Thevenaz, P., Ruttimann, U.E., Unser, M.: A pyramid approach to subpixel registration based on intensity. IEEE-TIP 7(1), 27–41 (1998)
22. Unser, M.: Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine 16(6), 22–38 (1999)
23. Unser, M., Aldroubi, A., Eden, M.: On the asymptotic convergence of B-Spline wavelets to Gabor functions. IEEE-TIT 38(2), 864–872 (1992)
Edge-Enhanced Image Reconstruction Using (TV) Total Variation and Bregman Refinement
Shantanu H. Joshi^1, Antonio Marquina^{2,3}, Stanley J. Osher^3, Ivo Dinov^1, John D. Van Horn^1, and Arthur W. Toga^1
1 Laboratory of Neuroimaging, University of California, Los Angeles, CA 90095, USA
2 Departamento de Matematica Aplicada, Universidad de Valencia, C/ Dr Moliner, 50, 46100 Burjassot, Spain
3 Department of Mathematics, University of California, Los Angeles, CA 90095, USA
Abstract. We propose a novel image resolution enhancement method for multidimensional images based on a variational approach. Given an appropriate downsampling operator, the reconstruction problem is posed using a deconvolution model under the assumption of Gaussian noise. In order to preserve edges in the image, we regularize the optimization problem by the norm of the total variation of the image. Additionally, we propose a new edge-preserving operator that emphasizes and even enhances edges during the up-sampling and decimation of the image. Furthermore, we also propose the use of the Bregman iterative refinement procedure for the recovery of higher order information from the image. This is a coarse-to-fine approach that recovers finer scales in the image first, followed by the noise. This method is demonstrated on a variety of low-resolution, natural images as well as 3D anisotropic brain MRI images. The edge enhanced reconstruction is shown to yield significant improvement in resolution, especially preserving important edges containing anatomical information.
Keywords: Edge-preserving operators, total variation regularization, deconvolution, Gaussian blur, Bregman iteration, up/down sampling.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 389–400, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

With the recent advances in low-cost imaging solutions and increasing storage capacities, there is an increased demand for better image quality in a wide variety of applications involving both image and video processing. Oftentimes, owing to sensor shortcomings, low-power requirements, or environmental limitations, one is only able to acquire a low-resolution observation of the scene. The low-resolution data can exist in the form of still images, a sequence of image frames devoid of inter-frame motion, a single video sequence, or a collection of video sequences. Furthermore the observations can be corrupted by motion-induced artifacts either in the case of still images or videos. The collective approach that tackles the problem of reconstructing a high-resolution image from one or more of the above low-resolution observations is termed super-resolution. There are several prominent approaches to this problem, all of them largely employing various cues such as sub-pixel shifts between successive frames, the camera blur, defocus, and zoom, etc. These approaches can be divided into two types: ones that use motion information between successive frames (e.g., video super-resolution), and others that use a motion-free approach. Most of these approaches usually expect multiple low-resolution observations as input.

Super-resolution image reconstruction can be mathematically modeled as a nonlinear process consisting of a convolution operator acting on the image, followed by a down sampling operation and the mixing of additive noise. Most of the earlier research work in this area has been developed in the frequency domain, using (discrete) Fourier transform and wavelet-transform based methods. For example, Tsai and Huang [13] first outlined the idea of super-resolution in their seminal paper. Peleg et al. [8] used the iterative back projection scheme to achieve image reconstruction. Yet another approach [12] uses projections on convex sets (POCS) of images to restrict the solution domain for reconstruction. A hybrid approach by Elad and Feuer [5] combines the POCS and the maximum likelihood approaches for both motion-based and motion-free super-resolution. A very different set of methods use the learning-based approach for super-resolution. The general idea here is to learn a set of image features from exemplar images and use them for the reconstruction of a high-resolution image. Capel and Zisserman [2] use PCA on face image databases to learn the image model and use it to reconstruct images from multiple views. Freeman et al. [6] learn a feature set of image patches that encode the relationships among different spatial frequencies from a large training set and use it as prior information for reconstructing higher frequencies for resolution enhancement. The reader is referred to an excellent monograph by Chaudhari and Joshi [4] for a comprehensive bibliography and references in the field.
Along with a wide range of applications of super-resolution methods in tasks such as satellite image processing, surveillance, computer vision, and even video processing, there has been a considerable effort by researchers trying to apply these methods to medical imaging. In particular, MRI acquisitions usually have a low-resolution in the inter-slice direction, and it is of considerable interest to “fill-in” the intermediate slices. Carmi et al. [3] use sub-pixel shifted MR (Magnetic Resonance) images for high resolution reconstruction. Greenspan et al. [7] combine several low resolution images in the slice-select direction to achieve SR reconstruction. Kornprobst et al. [9] also achieve higher resolution in the slice-select direction for fMRI sequences. While super-resolution methods attempt to exploit the information redundancy in several low-resolution observations of images, at times, only a single low-resolution instance of the image is available. This is sometimes the case in MRI images, where due to economic or health reasons, a patient is scanned only once over a period of time, or the time elapsed between successive scans may be too large to preserve any temporal coherence to take advantage of. Based on this assumption, we will focus mainly on the problem of single frame high resolution reconstruction of images. Our approach will be based upon a variational model that uses the TV norm [11] as a regularizing functional. Recently, Marquina et al. [14] have proposed a new variational model based on the TV norm [11] for super-resolution of multidimensional images. They use a new multi-scale approach (Bregman iterations) for iterative refinement and recovery of finer details in images. We will follow this approach to solve the more general super-resolution problem using the TV norm as regularizing functional. In addition, we propose an iterative refinement procedure based on an original idea by Bregman [1], to improve spatial resolution. 
The proposed super-resolution method improves upon the behavior of any interpolation method (including high order and sinc interpolation) because our method preserves edges satisfactorily, avoiding the Gibbs phenomenon, whereas the iterative refinement procedure allows us to recover fine scales of the image. The main contributions of this paper are as follows:
– a three-dimensional variational model based on the TV norm [11] regularizer.
– a new multi-scale approach (Bregman iterations) for iterative refinement and recovery of finer details in images.
– a new piecewise-linear up (down) sampling operator that preserves edges.
– application of this method for super-resolution for anisotropic 3D MRI images.
This paper is organized as follows: Section 2 outlines the super-resolution model using TV regularization. In particular, it explains the variational model as well as a new scale-space approach that utilizes the Bregman iterative procedure for recovering finer details from images. Additionally, Section 2.2 proposes a new edge-preserving up (down) sampling operator used in the model. Section 3 presents details of the numerical implementation of the model. Section 4 demonstrates experimental results for a few 2D natural images as well as 2D slices and 3D volumes of MRI images, followed by the summary.
2 Image Observation and Synthesis Model

The low resolution image observation model can be formulated in a standard fashion as a down-sampled degraded version of the original high resolution image. We assume that the low resolution image $f$ is defined on a subset of a plane $\Omega \subset \mathbb{R}^k$. For the purpose of this paper, $k$ is either 2 or 3. From here onwards, all the notation will be specified for 3D images. The restriction to 2D images is straightforward. For a discrete representation, we assume $f \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p$. Let the unknown high resolution image to be estimated be given by $u \in \mathbb{R}^{2m} \times \mathbb{R}^{2n} \times \mathbb{R}^{2p}$. Then given a linear down sampling operator $D$, we can write the observation model as
\[
f = D(h * u) + n , \tag{1}
\]
where $n$ is an additive Gaussian white noise with zero mean and variance $\sigma^2$, and $h$ is a translation invariant convolution kernel corresponding to the point spread function of the imaging device. A related problem in the above formulation is the estimation of the kernel $h$, which we shall skip in this paper. Throughout this paper, we assume that the kernel is given by the Gaussian
\[
h(x, y, z) = K e^{-\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} + \frac{z^2}{\sigma_z^2} \right)} , \tag{2}
\]
where $K$ is a normalization constant, and $\sigma_x$, $\sigma_y$, $\sigma_z$ are the standard deviations along the $X$, $Y$, and $Z$ directions respectively. The problem in Eqn. 1 is usually solved as a constrained optimization problem that seeks to minimize the regularizer $\int_\Omega \|\nabla u\|^2 \, dx \, dy$, while constraining the noise to be $\|h * u - f\|^2_{L_2} = \sigma^2$. This ensures that the reconstructed image $u$ is free of discontinuities. An alternative to the above regularizer is the total variation proposed by Rudin and Osher [11]. This norm is shown to recover edges in images satisfactorily. The total variation norm is given as
\[
TV(u) = \int_\Omega |\nabla u| \, dx \, dy . \tag{3}
\]
Using the regularizer in Eqn. 3, we can state the single frame image reconstruction model as follows:
\[
\hat{u} = \arg\min_u \left\{ TV(u) + \frac{\lambda}{2} \left[ \|f - D(h * u)\|^2_{L_2} - \sigma^2 \right] \right\} . \tag{4}
\]
The Euler-Lagrange formulation for Eqn. 4 can be written as
\[
\nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda \left( \tilde{h} * S(f) - \tilde{h} * (S \circ D(h * u)) \right) = 0 \tag{5}
\]
\[
\Longrightarrow \; \nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda \tilde{h} * (\bar{g} - T(h * u)) = 0 , \tag{6}
\]
where $S$ is an upsampling operator, $\tilde{h}$ is the inverse of $h$, $\bar{g} = S(f)$, and the operator $T$ is defined as $T = S \circ D$. Furthermore $D \circ S = \mathrm{Id}$. The Euler-Lagrange equation given by Eqn. 6 can be solved as a time-dependent equation
\[
u_t = \nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda \tilde{h} * (\bar{g} - T(h * u)) \tag{7}
\]
with homogeneous Neumann boundary conditions and initial condition $u_0 = S(f)$.

2.1 Bregman Iterative Method

The convergence of Eqn. 7 to the steady state yields a reconstructed high resolution image. However if one wishes to recover even finer scales from the reconstructed image, one can use the Bregman iterative refinement procedure [1] to do so. If $u_0$ is the solution of the Euler-Lagrange equation (6), then we have
\[
\nabla \cdot \frac{\nabla u_0}{|\nabla u_0|} + \lambda \tilde{h} * (\bar{g} - T(h * u_0)) = 0 . \tag{8}
\]
We will denote the image residual in the high resolution scale by $v_0$,
\[
v_0 = \bar{g} - T(h * u_0) . \tag{9}
\]
We now solve the Euler-Lagrange equation for the new image $\bar{g} + v_0$ to obtain a new solution, which we denote by $u_1$. Again, the solution $u_1$ will satisfy
\[
\nabla \cdot \frac{\nabla u_1}{|\nabla u_1|} + \lambda \tilde{h} * \left( \bar{g} + v_0 - T(h * u_1) \right) = 0 , \tag{10}
\]
where the new residual is defined as
\[
v_1 = \bar{g} + v_0 - T(h * u_1) \tag{11}
\]
Edge-Enhanced Image Reconstruction
393
and so on. The sequence of images u0 , u1 , · · · , uj , · · · are also referred to as Bregman iterates. It is advisable to terminate this procedure when a satisfactory image quality is obtained, otherwise it has a tendency to recover noise after all the finer scales in the image are recovered. This iterative procedure was introduced for image restoration in [10]. 2.2 Edge-Preserving Up (Down)-Sampling Operator There are various choices for the up (S) and down (D) sampling operators used in the observation model in Eqn. 1 and the synthesis model in Eqn. 7 respectively. The simplest down sampling operator can be an averaging operator that simply averages the eight neighbors of the pixel using either a Gaussian kernel, or an arithmetic average. Correspondingly, the up sampling operation simply involves repeating voxel values for each row, column, and slice. Alternately, one can also use bilinear interpolation for up sampling and down sampling images. The problems with the above approaches are the unnecessary blurring (averaging) that is caused at each step of the iteration while solving the Euler Lagrange equation in 6. To overcome this problem, one can use better signal preserving operators that involve sinc or Fourier interpolation for up and down sampling. However these methods can potentially introduce ringing artifacts in images with sharp edges or boundaries. Especially for images with prominent edges and interfaces, we need an appropriate interpolation operator that preserves these features. Accordingly, we propose a new piecewise-linear up (down) sampling operator that preserves such edges and boundaries. We describe the edge-preserving operator in detail below. We set up the grid xj = (j − 1)Δx, yk = (k − 1)Δy and zl = (l − 1)Δz, where Δx > 0, Δy > 0, Δz > 0 and j = 1, . . . , n, k = 1, . . . , m and l = 1, . . . , p. We define the domain E = [0, A] × [0, B] × [0, C], where A = (n − 1)Δx, B = (n − 1)Δy, and C = (n − 1)Δz. 
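We consider the grid function $u$ defined as $u_{j,k,l} : \mathbb{R}^3 \to \mathbb{R}$. We define the edge-preserving piecewise linear approximation of the grid function $u$ as the function $L(x, y, z)|_{E_{jkl}} = L_{jkl}(x, y, z)$, where the computational voxel $E_{jkl}$ is given by

$$E_{jkl} = \left[x_j - \tfrac{\Delta x}{2},\, x_j + \tfrac{\Delta x}{2}\right] \times \left[y_k - \tfrac{\Delta y}{2},\, y_k + \tfrac{\Delta y}{2}\right] \times \left[z_l - \tfrac{\Delta z}{2},\, z_l + \tfrac{\Delta z}{2}\right]$$

and

$$L_{jkl}(x, y, z) = u_{j,k,l} + a(x - x_j) + b(y - y_k) + c(z - z_l),$$

where $a$, $b$, and $c$ are determined from

$$a = \operatorname{minmod}\left(\frac{\Delta^x_- u_{j,k,l}}{\Delta x}, \frac{\Delta^x_+ u_{j,k,l}}{\Delta x}\right), \quad b = \operatorname{minmod}\left(\frac{\Delta^y_- u_{j,k,l}}{\Delta y}, \frac{\Delta^y_+ u_{j,k,l}}{\Delta y}\right), \quad c = \operatorname{minmod}\left(\frac{\Delta^z_- u_{j,k,l}}{\Delta z}, \frac{\Delta^z_+ u_{j,k,l}}{\Delta z}\right),$$

where the operations in the term containing derivatives are understood component-wise, and given by $\Delta^x_\pm u^n_{i,j,k} = \pm(u^n_{i\pm1,j,k} - u^n_{i,j,k})$, $\Delta^y_\pm u^n_{i,j,k} = \pm(u^n_{i,j\pm1,k} - u^n_{i,j,k})$, and $\Delta^z_\pm u^n_{i,j,k} = \pm(u^n_{i,j,k\pm1} - u^n_{i,j,k})$, where $i, j, k$ are the indices of the 3D grid.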
S.H. Joshi et al.
The $\operatorname{minmod}(d, e)$ function is defined as

$$\operatorname{minmod}(d, e) = \frac{\operatorname{sgn}(d) + \operatorname{sgn}(e)}{2}\, \min(|d|, |e|), \tag{12}$$
where $\operatorname{sgn}(d) = 1$ if $d \ge 0$ and $\operatorname{sgn}(d) = -1$ otherwise. The function $L_{jkl}(x, y, z)$ is defined on the computational voxel $E_{jkl}$. We want to up-(down) sample the grid function $u$ with a spatial resolution of $h_x > 0$, $h_y > 0$, $h_z > 0$. Then the up-(down) sampled grid function $v$ is defined on a new grid $v(q, r, s)$ for $q = 1, \ldots, n_h$, $r = 1, \ldots, m_h$, and $s = 1, \ldots, p_h$, where

$$n_h = \operatorname{floor}\left(\frac{A}{h_x}\right), \quad m_h = \operatorname{floor}\left(\frac{B}{h_y}\right), \quad p_h = \operatorname{floor}\left(\frac{C}{h_z}\right),$$

where $\operatorname{floor}(d)$ is the largest integer $i$ such that $i \le d$. The new grid is then defined as $x^h_q = (q - 1)h_x$, $y^h_r = (r - 1)h_y$, and $z^h_s = (s - 1)h_z$. Based on this grid, the function $v$ is defined as $v(q, r, s) = L(x^h_q, y^h_r, z^h_s)$.

We demonstrate the edge-preserving property of the above operator by applying it to a checkerboard pattern as shown in Fig. 1. Figure 1 shows a low-resolution image, as well as its up sampled versions using a bilinear, a sinc and the edge-preserving operator for two different types of checkerboard patterns. It also shows a magnified portion from the center of the image. The bilinear and the sinc interpolation operators introduce significant spurious levels of gray in between the black squares in the pattern. Furthermore, they tend to smooth out the boundaries of the flat black squares in the image. In contrast, the edge-preserving operator has retained, and in some cases even enhanced, the boundaries and edges as compared to the low-resolution image.

Figure 3 shows similar results with a 280 × 200 scene image. The first image in the top row shows the 560 × 400 pixel replicated image, whereas the last image is the super-resolved image. The bottom row shows a small portion of the image magnified to show detail. One can immediately observe the blocking effects due to pixel replication in the first image, and blurring of the edge boundaries in the bilinearly interpolated version. The edges are somewhat better with sinc interpolation, but the best quality is given by the super-resolved image, which resolves and even enhances sharp edges and interfaces in the image. In both the above cases, we used an isotropic Gaussian kernel with kernel widths $\sigma_x = \sigma_y = 1$.
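The edge-preserving operator of Section 2.2 can be illustrated with a minimal Python sketch. It is restricted to one dimension for brevity, and the function names (`minmod`, `edge_preserving_resample_1d`), the replicated boundary differences, and the tie-breaking rule used at voxel boundaries are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def minmod(d, e):
    """Eq. (12): zero when d and e differ in sign, else the smaller magnitude (signed)."""
    sgn_d = np.where(np.asarray(d) >= 0, 1.0, -1.0)
    sgn_e = np.where(np.asarray(e) >= 0, 1.0, -1.0)
    return 0.5 * (sgn_d + sgn_e) * np.minimum(np.abs(d), np.abs(e))

def edge_preserving_resample_1d(u, dx, h):
    """Sample the piecewise-linear reconstruction L at the new grid x^h_q = (q - 1) h.

    Assumption: one-sided differences at the two ends are set to zero
    (the boundary sample is replicated).
    """
    u = np.asarray(u, dtype=float)
    n = u.size
    fwd = np.diff(u, append=u[-1])    # Delta_+ u_j = u_{j+1} - u_j
    bwd = np.diff(u, prepend=u[0])    # Delta_- u_j = u_j - u_{j-1}
    slope = minmod(bwd / dx, fwd / dx)

    A = (n - 1) * dx                  # domain length
    nh = int(np.floor(A / h))         # number of new samples
    xh = np.arange(nh) * h            # new grid points
    # Index of the cell E_j = [x_j - dx/2, x_j + dx/2] containing each new point
    # (ties at cell boundaries follow NumPy's round-half-to-even rule).
    j = np.clip(np.rint(xh / dx).astype(int), 0, n - 1)
    return u[j] + slope[j] * (xh - j * dx)

# Usage: upsample a step edge by a factor of two
step = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(edge_preserving_resample_1d(step, dx=1.0, h=0.5))
```

Because the minmod slope vanishes wherever the one-sided differences disagree in sign or one of them is zero, the reconstruction is flat on both sides of a jump, so no intermediate gray values are created at the edge.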
3 Numerical Implementation

This section discusses the numerical implementation of the solution to the Euler-Lagrange equation. The Euler-Lagrange derivative of the TV-norm is not well defined at points where $\nabla u = 0$, due to the presence of the term $\frac{1}{|\nabla u|}$. Hence we modify the regularization TV functional as follows:

$$\int_\Omega \sqrt{|\nabla u|^2 + \epsilon}\; dx\, dy \tag{13}$$
Fig. 1. (Panels: Low-resolution Image, Bilinear Interpolation, Sinc Interpolation, Edge-preserved Upsampling.) The first and the third rows show a low-resolution image from the left, and its up sampled versions using a bilinear interpolation operator, a sinc operator, and the new edge-preserving operator for two different checkerboard patterns. The second and the fourth rows show a magnified area from the center of the image.
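where $\epsilon$ is a small positive parameter. We express the 3D model (7) in terms of explicit partial derivatives

$$u_t = \lambda \tilde{h} * (\bar{g} - T(h * u)) + \frac{u^n_{xx}\left((u^n_y)^2 + (u^n_z)^2 + \epsilon\right) + u^n_{yy}\left((u^n_x)^2 + (u^n_z)^2 + \epsilon\right) + u^n_{zz}\left((u^n_x)^2 + (u^n_y)^2 + \epsilon\right)}{\left[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon\right]^{3/2}} + \frac{-2u^n_{xy} u^n_x u^n_y - 2u^n_{xz} u^n_x u^n_z - 2u^n_{yz} u^n_y u^n_z}{\left[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon\right]^{3/2}} \tag{14}$$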
396
S.H. Joshi et al.
low-resolution image
sinc interpolation
Super-resolved reconstruction
1st Bregman refinement
Fig. 2. Clockwise from top, a 380 × 285 low-resolution image, upsampled to twice the size by sinc interpolation, and super-resolved reconstruction, and the first Bregman iterated image
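using $u^0 = S(f)$ as the initial guess and homogeneous Neumann boundary conditions (i.e. absorbing boundary). The above expression can also be rewritten as

$$\frac{u^{n+1}_{i,j,k} - u^n_{i,j,k}}{\Delta t} = \lambda\left[\tilde{h} * (\bar{g} - T(h * u^n))\right]_{i,j,k} \tag{15}$$

$$+ \frac{u^n_{xx}\left((u^n_y)^2 + (u^n_z)^2 + \epsilon\right) + u^n_{yy}\left((u^n_x)^2 + (u^n_z)^2 + \epsilon\right) + u^n_{zz}\left((u^n_x)^2 + (u^n_y)^2 + \epsilon\right)}{\left[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon\right]^{3/2}} \tag{16}$$

$$+ \frac{-2u^n_{xy} u^n_x u^n_y - 2u^n_{xz} u^n_x u^n_z - 2u^n_{yz} u^n_y u^n_z}{\left[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon\right]^{3/2}} \tag{17}$$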
The approximations to the derivatives in Eqn. 17 can be calculated as:

$[u^n_{xx}]_{i,j,k} = \Delta^x_+ \Delta^x_- u^n_{i,j,k} / h_x^2$,
$[u^n_{yy}]_{i,j,k} = \Delta^y_+ \Delta^y_- u^n_{i,j,k} / h_y^2$,
$[u^n_{zz}]_{i,j,k} = \Delta^z_+ \Delta^z_- u^n_{i,j,k} / h_z^2$,
$[u^n_{xy}]_{i,j,k} = (\Delta^x_- + \Delta^x_+)(\Delta^y_- + \Delta^y_+) u^n_{i,j,k} / (4 h_x h_y)$,
$[u^n_{xz}]_{i,j,k} = (\Delta^x_- + \Delta^x_+)(\Delta^z_- + \Delta^z_+) u^n_{i,j,k} / (4 h_x h_z)$,
$[u^n_{yz}]_{i,j,k} = (\Delta^y_- + \Delta^y_+)(\Delta^z_- + \Delta^z_+) u^n_{i,j,k} / (4 h_y h_z)$,
$[u^n_x]_{i,j,k} = (\Delta^x_- + \Delta^x_+) u^n_{i,j,k} / (2 h_x)$,
$[u^n_y]_{i,j,k} = (\Delta^y_- + \Delta^y_+) u^n_{i,j,k} / (2 h_y)$,
$[u^n_z]_{i,j,k} = (\Delta^z_- + \Delta^z_+) u^n_{i,j,k} / (2 h_z)$.

Fig. 3. (Panels: Low-resolution Image, Bilinear Interpolation, Sinc Interpolation, Super-resolved reconstruction.) Top row shows the low-resolution image, and the upsampled versions using bilinear, sinc and the super-resolved reconstruction. The bottom row shows a magnified detail of a portion of the image.

The Lagrange multiplier $\lambda$ was chosen to be the maximum value for which the algorithm was stable. It was empirically determined to be $\lambda = 10$, and was not changed thereafter.
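As an illustration of how the regularization part of the explicit scheme (15)-(17) might be coded, the following Python sketch evaluates the curvature terms (16) and (17) with the central differences listed above. The names `tv_curvature_term` and `explicit_step`, the use of edge replication to impose the Neumann condition, and the `fidelity` argument (a precomputed array standing in for the data term in (15)) are assumptions made for this sketch; the blur and sampling operators are not implemented here.

```python
import numpy as np

def _shift(p, di, dj, dk):
    """View of the padded array p that corresponds to u[i+di, j+dj, k+dk]."""
    n0, n1, n2 = (s - 2 for s in p.shape)
    return p[1 + di:1 + di + n0, 1 + dj:1 + dj + n1, 1 + dk:1 + dk + n2]

def tv_curvature_term(u, eps=1e-3, h=(1.0, 1.0, 1.0)):
    """Sum of the terms (16) and (17) on a 3D grid with spacings h = (hx, hy, hz)."""
    hx, hy, hz = h
    p = np.pad(u, 1, mode="edge")  # edge replication: zero normal derivative (assumed)
    s = lambda a, b, c: _shift(p, a, b, c)

    # First derivatives: (Delta_- + Delta_+) u / (2h)
    ux = (s(1, 0, 0) - s(-1, 0, 0)) / (2 * hx)
    uy = (s(0, 1, 0) - s(0, -1, 0)) / (2 * hy)
    uz = (s(0, 0, 1) - s(0, 0, -1)) / (2 * hz)
    # Second derivatives: Delta_+ Delta_- u / h^2
    uxx = (s(1, 0, 0) - 2 * u + s(-1, 0, 0)) / hx**2
    uyy = (s(0, 1, 0) - 2 * u + s(0, -1, 0)) / hy**2
    uzz = (s(0, 0, 1) - 2 * u + s(0, 0, -1)) / hz**2
    # Mixed derivatives: product of two central differences
    uxy = (s(1, 1, 0) - s(1, -1, 0) - s(-1, 1, 0) + s(-1, -1, 0)) / (4 * hx * hy)
    uxz = (s(1, 0, 1) - s(1, 0, -1) - s(-1, 0, 1) + s(-1, 0, -1)) / (4 * hx * hz)
    uyz = (s(0, 1, 1) - s(0, 1, -1) - s(0, -1, 1) + s(0, -1, -1)) / (4 * hy * hz)

    num = (uxx * (uy**2 + uz**2 + eps)
           + uyy * (ux**2 + uz**2 + eps)
           + uzz * (ux**2 + uy**2 + eps)
           - 2 * uxy * ux * uy - 2 * uxz * ux * uz - 2 * uyz * uy * uz)
    den = (ux**2 + uy**2 + uz**2 + eps) ** 1.5
    return num / den

def explicit_step(u, dt, fidelity=None, eps=1e-3, h=(1.0, 1.0, 1.0)):
    """One forward-Euler update; `fidelity` is the data term of (15), if supplied."""
    rhs = tv_curvature_term(u, eps=eps, h=h)
    if fidelity is not None:
        rhs = rhs + fidelity
    return u + dt * rhs
```

A constant volume is a fixed point of this update when no fidelity term is supplied, which gives a quick sanity check of the stencil.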
4 Experimental Results

Lastly, we demonstrate the algorithm by performing experiments with 2D natural images, 2D slices of 3D volumetric images, and finally the full 3D volumetric MRI images themselves.

4.1 Results for Natural Images

Figure 2 shows the results of the super-resolution reconstruction algorithm applied to a 380 × 285 map image. This image has been scaled to 760 × 570 by pixel-replication for display purposes. It can be observed that pixel replication inherently adds blocking artifacts to the image. The low-resolution image is up sampled by a factor of two using bilinear interpolation, sinc interpolation, and finally the super-resolution reconstruction method. It is observed that bilinear interpolation grossly smoothes out the image and that sinc interpolation preserves some high frequency information, whereas the super-resolved reconstruction yields a sharp, crisp image, even resolving the small text at finer scales. One can further enhance this image by performing the 1st Bregman iteration as shown in Fig. 2. However, this process should be terminated after one or two iterations.

4.2 Results for 2D Slices of 3D MRI Image

In this experiment, we look at enhancing the in-plane resolution of individual transverse slices of a 3D MRI image. From left, all rows of Fig. 4 show an isotropic original image of size 180 × 216, the subsampled image, a Fourier interpolated image, and a super-resolved reconstructed image. For display purposes, the subsampled image is shown at twice the resolution using pixel-replication. It is observed that the super-resolved reconstructed image has sharper edge features and more details, and visually resembles the original image more closely than the Fourier interpolated result.

Fig. 4. (Panels: Original Image, Subsampled Image, Fourier Interpolation, SR reconstruction.) Examples of super-resolved reconstruction for 2D slices of 3D MRI images

4.3 Results for Full 3D MRI Images

The proposed super-resolution algorithm can be applied to arbitrary 2D images or even 3D volumes of anisotropic voxel dimensions. In this experiment, we apply the reconstruction algorithm to the full 3D MRI image volume. Figure 5 shows a volume rendering of an original image of dimensions 256 × 256 × 160, at voxel widths given by 1 × 1 × 1.25 mm$^3$. This image is first subsampled to half the resolution at 128 × 128 × 80 (2 × 2 × 2.5 mm$^3$) and then super-resolved to a full isotropic 256 × 256 × 160 image with 1 × 1 × 1 mm$^3$ resolution. As expected, we can see an improvement in the resolution plus an increase in the detail simultaneously across all X, Y, and Z dimensions. In this experiment, we used an anisotropic Gaussian kernel with the variances proportional to the voxel dimensions. Furthermore, the grid dimensions for the edge-preserving up sampling and down sampling operators were taken to be $\Delta x = \frac{h_x}{2}$, $\Delta y = \frac{h_y}{2}$, $\Delta z = \frac{h_z}{2}$, where $h_x$, $h_y$, $h_z$ are the voxel dimensions of the appropriate up sampled or down sampled image.

Fig. 5. (Panels: Original Image, Subsampled Image, Fourier Interpolation, SR reconstruction.) Examples of super-resolved reconstruction for full 3D MRI images (volume rendered)
5 Conclusion and Future Directions

We have presented a method for enhancing the resolution of images. The strengths of this approach lie in i) the TV norm as a regularizing functional in the variational model, and ii) a new piecewise-linear up (down) sampling operator that preserves edges. While we are aware that the proposed method works in physical space, and not in the frequency domain (k-space) of the data, we emphasize that the TV prior is a nonlinear prior that does modify the amplitudes of the k-space data. In other words, our algorithm works on the processed physical image, yet it implicitly modifies the spectral information in the data. This is an important point, especially in view of comparison with other MRI image processing methods that work with the k-space representation of the data. We have demonstrated the improvement in spatial resolution for 2D as well as 3D anatomical MRI images. In the future, we intend to investigate the problem of high resolution reconstruction of DT-MRI images using the proposed method.
Acknowledgments

This research was partially supported by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 RR021813. Additionally, Dr. Antonio Marquina gratefully acknowledges the support from the NSF grants DMS-0312222, ACI-0321917, the NIH grant G54 RR021813, as well as DGICYT MTM2008-03597 from the Spanish Government Agency.
References

1. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. and Math. Phys. 7, 200–217 (1967)
2. Capel, D., Zisserman, A.: Super-resolution from multiple views using learnt image models. In: CVPR, vol. 2, pp. 627–634 (2001)
3. Carmi, E., Liu, S., Alon, N., Fiat, A., Fiat, D.: Resolution enhancement in MRI. Magnetic Resonance Imaging 24(2), 133–154 (2006)
4. Chaudhuri, S., Joshi, M.: Motion-Free Super-Resolution. Springer, New York (2005)
5. Elad, M., Feuer, A.: Restoration of a single super-resolution image from several blurred, noisy, and undersampled measured images. IEEE Trans. Image Processing 6(12), 1646–1658 (1997)
6. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Computer Graphics and Applications 22(2), 56–65 (2002)
7. Greenspan, H., Oz, G., Kiryati, N., Peled, S.: MRI inter-slice reconstruction. Magnetic Resonance Imaging 20, 437–446 (2002)
8. Irani, M., Peleg, S.: Improving resolution by image registration. CVGIP: Graphical Models and Image Processing 53(3), 231–239 (1991)
9. Kornprobst, P., Peeters, R., Nikolova, M., Deriche, R., Ng, M., Van Hecke, P.: A superresolution framework for fMRI sequences and its impact on resulting activation maps. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2879, pp. 117–125. Springer, Heidelberg (2003)
10. Osher, S.J., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for Total Variation-based image restoration. Multiscale Modeling and Simulation 4(2), 460–489 (2005)
11. Rudin, L.I., Osher, S.J., Fatemi, E.: Nonlinear Total Variation based noise removal algorithms. Physica D 60(1-4), 259–268 (1992)
12. Stark, H., Oskoui, P.: High-resolution image recovery from image-plane arrays, using convex projections. Journal of the Optical Society of America 6, 1715–1726 (1989)
13. Tsai, R.Y., Huang, T.S.: Multi-frame image restoration and registration. In: Advances in Computer Vision and Image Processing, pp. 317–339 (1984)
14. Marquina, A., Osher, S.J.: Image super-resolution by TV-regularization and Bregman iteration. Journal of Scientific Computing 37(3), 367–382 (2008)
Nonlocal Variational Image Deblurring Models in the Presence of Gaussian or Impulse Noise

Miyoun Jung and Luminita A. Vese

University of California, Los Angeles, Department of Mathematics, Los Angeles, CA 90095-1555, USA
[email protected], [email protected]

Abstract. We wish to recover an image corrupted by blur and Gaussian or impulse noise, in a variational framework. We use two data-fidelity terms depending on the noise, and several local and nonlocal regularizers. Inspired by Buades-Coll-Morel, Gilboa-Osher, and other nonlocal models, we propose nonlocal versions of the Ambrosio-Tortorelli and Shah approximations to Mumford-Shah-like regularizing functionals, with applications to image deblurring in the presence of noise. In the case of the impulse noise model, we propose a necessary preprocessing step for the computation of the weight function. Experimental results show that these nonlocal MS regularizers yield better results than the corresponding local ones (proposed for deblurring by Bar et al.) in both noise models; moreover, they perform better than the nonlocal total variation in the presence of impulse noise. Characterization of minimizers is also given.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 401–412, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

We consider the problem of restoring an image blurred and then contaminated by Gaussian or impulse noise. Let $f, u : \Omega \to \mathbb{R}$ be image intensity functions, where $\Omega \subset \mathbb{R}^2$ is open and bounded. The standard linear degradation model is $f = k * u + n$; $f$ is the observed blurry-noisy image, $k$ is a (known) space-invariant blurring kernel, $u$ is the ideal image we want to recover, and $n$ is additive random noise independent of $u$. We approach the restoration problem within the variational framework: $\inf_u \{\Phi(f - k * u) + \Psi(|\nabla u|)\}$, where $\Phi$ defines a data-fidelity term, and $\Psi$ defines the regularization that enforces a smoothness constraint on $u$, depending on its gradient $\nabla u$. First, two different fidelity terms can be considered based on the noise; in the case of the Gaussian noise model, the $L^2$-fidelity term derived from maximum likelihood estimation is commonly used: $\Phi(f - k * u) = \int_\Omega (f - k * u)^2\, dx$. However, impulse noise, which might be caused by bit errors in transmission or faulty pixels, acts as an outlier for the quadratic data-fidelity term. So, for the impulse noise model, the $L^1$-fidelity term is more appropriate, due to its robustness to outliers [2], [17]: $\Phi(f - k * u) = \int_\Omega |f - k * u|\, dx$. Image deblurring-denoising is an inverse problem, which is known to be ill-posed due to either the non-uniqueness of the solution or the numerical
instability of the inversion of the blurring operator. The regularization term $\Psi$ alleviates this problem by reflecting some a-priori properties. Several regularization terms were suggested in the literature, including [23], [9], [19], [20], [16]. Here, we consider the total variation regularization [19], [20] and two approximations of Mumford-Shah regularizers [16], denoted MSH$^1$ and MSTV, proposed by Ambrosio-Tortorelli [3] and Shah [21], [1] respectively and recently used for image deblurring in the presence of Gaussian and impulse noise by Bar et al [4], [5]. These traditional regularization terms are based on local image operators, which denoise and preserve edges very well, but may induce loss of fine structures like texture during the restoration process. Recently, Buades et al [8] introduced the nonlocal means filter, which produces excellent denoising results. Kindermann et al [13] and Gilboa-Osher [10,11] formulated the variational framework of NL-means by proposing nonlocal regularizing functionals. Lou et al [14] used the nonlocal total variation (NL/TV) of Gilboa-Osher in image deblurring in the presence of Gaussian noise with a preprocessing step for the computation of the weight function.

We propose here nonlocal versions of the approximated Mumford-Shah and Ambrosio-Tortorelli regularizing functionals, called NL/MSH$^1$ and NL/MSTV, by applying the nonlocal operators proposed by Gilboa-Osher to MSH$^1$ and MSTV respectively, for image restoration in the presence of blur and Gaussian or impulse noise. In addition, for the impulse noise model, we propose to use a preprocessed image to compute the weights $w$ (the weights $w$ defined in the NL-means filter are more appropriate for additive Gaussian noise). We note that the interesting parallel work [7] also proposed the NL/MSH$^1$ regularizer for segmentation and denoising in the presence of Gaussian noise, but not for deblurring, nor for the impulse noise case.
More details about our proposed methods are presented in [12].

Local Regularizers. In this section, we recall several regularization terms. The first one is the Mumford-Shah regularizing functional [16], which gives preference to piecewise smooth images. The MS regularizer, depending on the image $u$ and on its edge set $K \subset \Omega$, is given by

$$\Psi^{MS}(u, K) = \beta \int_{\Omega \setminus K} |\nabla u|^2\, dx + \alpha \int_K d\mathcal{H}^1,$$

where $\mathcal{H}^1$ is the 1D Hausdorff measure. The first term enforces smoothness of $u$ everywhere except on the edge set $K$, and the second one minimizes the total length of edges. But it is difficult in practice to minimize the non-convex MS functional. Ambrosio and Tortorelli [3] approximated this functional by a sequence of regular functionals $\Psi_\epsilon$ using $\Gamma$-convergence. The edge set $K$ is represented by a smooth auxiliary function $v$. Thus we have an approximation to $\Psi^{MS}$ as [3]

$$\Psi_\epsilon^{MSH^1}(u, v) = \beta \int_\Omega v^2 |\nabla u|^2\, dx + \alpha \int_\Omega \left(\epsilon |\nabla v|^2 + \frac{(v - 1)^2}{4\epsilon}\right) dx,$$

where $0 \le v(x) \le 1$ represents the edges: $v(x) \approx 0$ if $x \in K$ and $v(x) \approx 1$ otherwise, $\epsilon > 0$ is a parameter, and $\alpha, \beta > 0$. A minimizer $u = u_\epsilon$ of $\Psi_\epsilon^{MSH^1}$ approaches a minimizer $u$ of $\Psi^{MS}$ as $\epsilon \to 0$.
An alternative approach is the total variation [19, 20] proposed by Rudin, Osher, and Fatemi, called the TV regularizer: $\Psi^{TV}(u) = \int_\Omega |Du| \approx \int_\Omega |\nabla u|\, dx$. Because of its benefits of preserving edges (which have high gradient levels) and convexity, TV has been widely used in image restoration. Shah [21] suggested a modified version of the AT approximation to the MS functional by replacing the 2-norm of $|\nabla u|$ by the 1-norm in the first term:

$$\Psi_\epsilon^{MSTV}(u, v) = \beta \int_\Omega v^2 |\nabla u|\, dx + \alpha \int_\Omega \left(\epsilon |\nabla v|^2 + \frac{(v - 1)^2}{4\epsilon}\right) dx.$$

This functional $\Gamma$-converges to the other functional $\Psi^{MSTV}$ as $\epsilon \to 0$ [1]:

$$\Psi^{MSTV}(u) = \beta \int_{\Omega \setminus K} |\nabla u|\, dx + \alpha \int_K \frac{|u^+ - u^-|}{1 + |u^+ - u^-|}\, d\mathcal{H}^1 + |D_c u|(\Omega),$$
where $u^+$ and $u^-$ denote the image values on the two sides of the jump set $K = K_u$ of $u$, and $D_c u$ is the Cantor part of the measure-valued derivative $Du$. Note that the non-convex term $\frac{|u^+ - u^-|}{1 + |u^+ - u^-|}$ is similar to the prior regularization by Geman-Reynolds [9]. We observe that this regularizing functional is similar to the total variation of $u \in BV(\Omega)$, which can be written as $\int_\Omega |Du| = \int_{\Omega \setminus K_u} |\nabla u|\, dx + \int_{K_u} |u^+ - u^-|\, d\mathcal{H}^1 + |D_c u|(\Omega)$. By comparing the second terms, we see that the MSTV regularizer does not penalize the jump part as much as the TV regularizer. In this paper, we consider the TV regularizer $\Psi^{TV}$, the MSH$^1$ regularizer $\Psi_\epsilon^{MSH^1}$, and the MSTV regularizer $\Psi_\epsilon^{MSTV}$.

Nonlocal Regularizers. Nonlocal methods in image processing have been explored in many papers because they are well adapted to texture denoising, while the standard denoising models working with local image information seem to consider texture as noise, which results in losing details. Nonlocal methods are generalized from neighborhood filters (e.g. Yaroslavsky filter, [24]) and patch based methods. The idea of a neighborhood filter is to restore a pixel by averaging the values of neighboring pixels with a similar grey level value. Buades et al. [8] generalized this idea by applying the patch-based method, and proposed the famous nonlocal-means (or NL-means) filter for denoising, given by

$$NLu(x) = \frac{1}{C(x)} \int_\Omega e^{-\frac{d_a(u(x), u(y))}{h^2}}\, u(y)\, dy;$$

$d_a(u(x), u(y)) = \int G_a(t) |u(x + t) - u(y + t)|^2\, dt$ is the patch distance, $G_a$ is the Gaussian kernel with standard deviation $a$ determining the patch size, $C(x) = \int_\Omega e^{-\frac{d_a(u(x), u(y))}{h^2}}\, dy$ is a normalization factor, and $h$ is the filtering parameter corresponding to the noise level (usually the standard deviation of the noise). The NL-means not only compares the grey level at a single point but the geometrical configuration in a whole neighborhood (patch).
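The NL-means formula above can be sketched in a few lines of Python. This is an unoptimized illustration for small 2D arrays; the function names, the default parameters, the reflected boundary handling, and the restriction of the integral to a square search window are all assumptions of this sketch rather than choices stated in the text.

```python
import numpy as np

def gaussian_patch_kernel(radius, a):
    """Discrete Gaussian G_a on a (2*radius+1)^2 patch, normalized to sum to 1."""
    t = np.arange(-radius, radius + 1)
    g = np.exp(-(t[:, None] ** 2 + t[None, :] ** 2) / (2.0 * a ** 2))
    return g / g.sum()

def nl_means(u, patch_radius=1, search_radius=3, a=1.0, h=0.1):
    """NLu(x) = (1/C(x)) * sum_y exp(-d_a(u(x), u(y)) / h^2) * u(y)."""
    u = np.asarray(u, dtype=float)
    pr = patch_radius
    G = gaussian_patch_kernel(pr, a)
    p = np.pad(u, pr, mode="reflect")   # boundary handling (assumed)
    H, W = u.shape
    out = np.empty_like(u)
    for i in range(H):
        for j in range(W):
            patch_x = p[i:i + 2 * pr + 1, j:j + 2 * pr + 1]
            num, C = 0.0, 0.0
            # Restrict y to a square search window around x.
            for k in range(max(0, i - search_radius), min(H, i + search_radius + 1)):
                for l in range(max(0, j - search_radius), min(W, j + search_radius + 1)):
                    patch_y = p[k:k + 2 * pr + 1, l:l + 2 * pr + 1]
                    d = np.sum(G * (patch_x - patch_y) ** 2)   # patch distance d_a
                    w = np.exp(-d / h ** 2)
                    num += w * u[k, l]
                    C += w
            out[i, j] = num / C
    return out
```

Each output value is a convex combination of input values, so the filtered image always stays within the range of the input; pixels whose surrounding patches look alike receive weights close to one.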
In the variational framework, Kindermann et al [13] formulated the neighborhood filters and NL-means filters as nonlocal regularizing functionals which generally are not convex. Then, Gilboa-Osher [10] formalized the convex nonlocal functional inspired from graph theory, and moreover, based on the gradient and divergence definitions on graphs in the context of machine learning,
they [11] derived the corresponding nonlocal operators. Let $u : \Omega \to \mathbb{R}$ be a function, and $w : \Omega \times \Omega \to \mathbb{R}$ be a nonnegative and symmetric weight function. The nonlocal gradient vector $\nabla_w u : \Omega \times \Omega \to \mathbb{R}$ is $(\nabla_w u)(x, y) := (u(y) - u(x)) \sqrt{w(x, y)}$. Hence, the nonlocal divergence $\operatorname{div}_w \vec{v} : \Omega \to \mathbb{R}$ of the vector $\vec{v} : \Omega \times \Omega \to \mathbb{R}$ is defined as the adjoint of the nonlocal gradient, $(\operatorname{div}_w \vec{v})(x) := \int_\Omega (v(x, y) - v(y, x)) \sqrt{w(x, y)}\, dy$, and the norm of the nonlocal gradient of $u$ at $x \in \Omega$ is given by $|\nabla_w u|(x) = \sqrt{\int_\Omega (u(y) - u(x))^2 w(x, y)\, dy}$. Based on these nonlocal operators, they introduced nonlocal regularizing functionals of the general form $\Psi(u) = \int_\Omega \phi(|\nabla_w u|^2)\, dx$, where $s \mapsto \phi(s)$ is a positive function, convex in $\sqrt{s}$, and $\phi(0) = 0$. By taking $\phi(s) = \sqrt{s}$, they proposed the nonlocal TV regularizer (NL/TV), which corresponds in the local case to $\Psi^{TV}(u) = \int_\Omega |\nabla u|\, dx$.

Inspired by these ideas, we propose in the next section nonlocal versions of the Ambrosio-Tortorelli and Shah approximations to the MS regularizers for image denoising-deblurring. This is also a continuation of the work by Bar et al. [4], [5], who were the first to propose the use of Mumford-Shah-like approximations for image restoration. In practice, we use the search window $\Omega_w = \{y \in \Omega : |y - x| \le r\}$ instead of $\Omega$ (semi-local) and the weight function $w$ at $(x, y) \in \Omega \times \Omega$ depending on a function $f : \Omega \to \mathbb{R}$, $w(x, y) = \exp\left(-\frac{d_a(f(x), f(y))}{h^2}\right)$. The weight function $w(x, y)$ gives the similarity of image features between two pixels $x$ and $y$, which is normally computed using the blurry-noisy image $f$. Recently, for image deblurring in the presence of Gaussian noise, Lou et al [14] used a preprocessed image obtained by applying the Wiener filter to $f$, instead of $f$, to compute $w$. In our work, only for the impulse noise model, we propose a different preprocessing step and evaluate $w$ by using the preprocessed image.
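A discrete version of these nonlocal operators is short enough to write out. In the following Python sketch the image is flattened to a vector of N pixels and the weights are stored as a dense symmetric N × N matrix; the function names and this dense storage are assumptions made for illustration (in practice the weights would be sparse because of the search window).

```python
import numpy as np

def nl_gradient(u, w):
    """(grad_w u)[x, y] = (u[y] - u[x]) * sqrt(w[x, y])."""
    return (u[None, :] - u[:, None]) * np.sqrt(w)

def nl_divergence(v, w):
    """(div_w v)[x] = sum_y (v[x, y] - v[y, x]) * sqrt(w[x, y])."""
    return np.sum((v - v.T) * np.sqrt(w), axis=1)

def nl_gradient_norm(u, w):
    """|grad_w u|[x] = sqrt(sum_y (u[y] - u[x])^2 * w[x, y])."""
    return np.sqrt(np.sum((u[None, :] - u[:, None]) ** 2 * w, axis=1))

def nl_tv(u, w):
    """Discrete NL/TV: sum over x of |grad_w u|[x], i.e. phi(s) = sqrt(s)."""
    return nl_gradient_norm(u, w).sum()

# Usage: check the duality between gradient and divergence on random data
rng = np.random.default_rng(0)
n = 6
w = rng.random((n, n))
w = 0.5 * (w + w.T)                      # symmetric, nonnegative weights
u = rng.random(n)
v = rng.random((n, n))
lhs = np.sum(nl_gradient(u, w) * v)      # <grad_w u, v>
rhs = -np.sum(u * nl_divergence(v, w))   # -<u, div_w v>
print(np.isclose(lhs, rhs))
```

With the definitions as written above and a symmetric $w$, the discrete operators satisfy $\langle \nabla_w u, v \rangle = -\langle u, \operatorname{div}_w v \rangle$, which is the relation the usage example checks.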
2 Description of the Proposed Models
We propose the following nonlocal Mumford-Shah regularizers (NL/MS) by applying the nonlocal operators to the approximations of the MS regularizer

$$\Psi^{NL/MS}(u, v) = \beta \int_\Omega v^2 \phi(|\nabla_w u|^2)\, dx + \alpha \int_\Omega \left(\epsilon |\nabla v|^2 + \frac{(v - 1)^2}{4\epsilon}\right) dx,$$

where $\phi(s) = s$ and $\phi(s) = \sqrt{s}$ correspond to the nonlocal versions of the MSH$^1$ and MSTV regularizers, called NL/MSH$^1$ and NL/MSTV, respectively. In addition, we use these nonlocal regularizers to deblur images in the presence of Gaussian or impulse noise. Thus, by incorporating the proper fidelity term depending on the noise model, we design two types of total energies as

Gaussian noise model: $E^G(u, v) = \int_\Omega (f - k * u)^2\, dx + \Psi^{NL/MS}(u, v)$,

Impulse noise model: $E^{Im}(u, v) = \int_\Omega |f - k * u|\, dx + \Psi^{NL/MS}(u, v)$.
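Minimizing these functionals in $u$ and $v$, we obtain the Euler-Lagrange equations

$$\frac{\partial E^G}{\partial v} = \frac{\partial E^{Im}}{\partial v} = 2\beta v \phi(|\nabla_w u|^2) - 2\epsilon\alpha \Delta v + \alpha \frac{v - 1}{2\epsilon} = 0,$$

Gaussian noise model: $\frac{\partial E^G}{\partial u} = \tilde{k} * (k * u - f) + L^{NL/MS} u = 0$,

Impulse noise model: $\frac{\partial E^{Im}}{\partial u} = \tilde{k} * \operatorname{sign}(k * u - f) + L^{NL/MS} u = 0$,

where $\tilde{k}(x) = k(-x)$ and

$$L^{NL/MS} u = -2 \int_\Omega (u(y) - u(x))\, w(x, y) \cdot \left[v^2(y)\, \phi'(|\nabla_w u|^2(y)) + v^2(x)\, \phi'(|\nabla_w u|^2(x))\right] dy.$$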
The energy functionals $E^G(u, v)$ and $E^{Im}(u, v)$ are convex in each variable and bounded from below. Therefore, to solve the two Euler-Lagrange equations simultaneously, the alternate minimization approach is applied. Note that since both energy functionals are not convex in the joint variable $(u, v)$, we may compute only a local minimizer. However, this is not a drawback in practice, since the initial guess for $u$ in our algorithm is the data $f$.

To extend the nonlocal methods to the impulse noise case, we need a preprocessing step for the weight function $w$ since we cannot directly use the data $f$ to compute $w$. In other words, in the presence of impulse noise, the noisy pixels tend to have larger weights than the other neighboring points, so it is likely that the noise value is kept at such a pixel. Thus, we propose a simple algorithm to obtain a preprocessed image $g$, which removes the impulse noise (outliers) while preserving texture as much as possible. Basically, we use the median filter, well-known for removing impulse noise. However, if we apply one step of the median filter, then the output may be too smoothed out. In order to preserve fine structures as well as to remove noise properly, we take the idea of Bregman iteration [6], [18], and we propose the following algorithm to obtain a preprocessed image $g$ that will be used only in the computation of the weight function $w$:

Initialize: $r_0 = 0$, $g_0 = 0$.
do (iterate $n = 0, 1, 2, \ldots, m$)
    $g_{n+1} = \operatorname{median}(f + r_n, [a\ a])$
    $r_{n+1} = r_n + f - k * g_{n+1}$
while $\|f - k * g_n\|_1 > \|f - k * g_{n+1}\|_1$
[Optional] $g_m = \operatorname{median}(g_m, [b\ b])$

where $f$ is the given noisy-blurry data, and $\operatorname{median}(f, [a\ a])$ is the median filter of size $a \times a$ with input $f$; the optional step is needed in the case when the final $g_m$ still has some salt-and-pepper-like noise. This algorithm is simple, it requires only a few iterations, and it takes less than 1 second for a 256 × 256 size image.
Moreover, the preprocessed image gm is a deblurred and denoised version of f ; it will be used only in the computation of the weights w, while keeping f in the data fidelity term, thus artifacts are not introduced by the median filter.
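The preprocessing loop above might be implemented along the following lines. This is a NumPy-only Python sketch: the helper names, the reflected boundary handling, the restriction to odd filter and kernel sizes, the iteration cap `max_iter`, and the choice to return the last iterate whose residual still decreased are all assumptions of this illustration, since the text does not specify them.

```python
import numpy as np

def median_filter(img, size):
    """size x size median filter (odd size), reflected boundaries (assumed)."""
    r = size // 2
    p = np.pad(img, r, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(p, (size, size))
    return np.median(windows, axis=(-2, -1))

def blur(img, k):
    """2D convolution k * img for a small odd-sized kernel, reflected boundaries."""
    r0, r1 = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img, ((r0, r0), (r1, r1)), mode="reflect")
    kf = k[::-1, ::-1]                       # flip the kernel: convolution, not correlation
    out = np.zeros(img.shape, dtype=float)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += kf[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def preprocess(f, k, a=3, b=None, max_iter=20):
    """Bregman-style iterated median filtering; returns the image g used for the weights."""
    f = np.asarray(f, dtype=float)
    g = np.zeros_like(f)                     # g_0 = 0
    r = np.zeros_like(f)                     # r_0 = 0
    for _ in range(max_iter):
        g_new = median_filter(f + r, a)      # g_{n+1} = median(f + r_n, [a a])
        r_new = r + f - blur(g_new, k)       # r_{n+1} = r_n + f - k * g_{n+1}
        res_old = np.abs(f - blur(g, k)).sum()
        res_new = np.abs(f - blur(g_new, k)).sum()
        if res_old <= res_new:               # L1 residual no longer decreases: stop
            break
        g, r = g_new, r_new
    if b is not None:                        # optional final clean-up pass
        g = median_filter(g, b)
    return g
```

For example, calling `preprocess(f, np.array([[1.0]]))` with an identity kernel reduces the scheme to iterated median filtering of the noisy data.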
Characterization of Minimizers. In this section we characterize the minimizers of the functionals formulated with the nonlocal regularizers, using [15, 22]. Assuming that a functional $\|\cdot\|$ on a subspace of $L^2(\Omega)$ is a semi-norm, we can define the dual norm (where $\langle \cdot, \cdot \rangle$ denotes the $L^2(\Omega)$ inner product) of $f \in L^2(\Omega) \subset L^1(\Omega)$ as $\|f\|_* := \sup_{\|\varphi\| \neq 0} \frac{\langle f, \varphi \rangle}{\|\varphi\|} \le +\infty$, so that the usual duality $\langle f, \varphi \rangle \le \|\varphi\| \|f\|_*$ holds for $\|\varphi\| \neq 0$. We define two functionals (here $Ku := k * u$),

$$F(u) = \lambda \int_\Omega |f - Ku|^2\, dx + |u|_{NL/TV},$$

$$G(u, v) = \int_\Omega \sqrt{|f - Ku|^2 + \eta^2}\, dx + \beta |u|_{NL/MS} + \alpha \int_\Omega \left(\epsilon |\nabla v|^2 + \frac{|v - 1|^2}{4\epsilon}\right) dx,$$

where $\lambda > 0$, and $|u|_{NL/MS} \in \{|u|_{NL/MSTV,v}, |u|_{NL/MSH^1,v}\}$. We use here the notations $|u|_{NL/TV} = \int_\Omega |\nabla_w u|(x)\, dx$, $|u|_{NL/MSTV,v} = \int_\Omega v^2(x) |\nabla_w u|(x)\, dx$, and $|u|_{NL/MSH^1,v} = \sqrt{\int_\Omega v^2(x) |\nabla_w u|^2(x)\, dx}$, which are semi-norms. We modified the regularizing functional $|u|_{NL/MSH^1,v}$; the square-root term replaces the original term of our model, $\int_\Omega v^2(x) |\nabla_w u|^2(x)\, dx$. It is introduced here to enable the characterization of minimizers below, but the numerical calculations utilize the original formulation. For the proofs we refer to [12].

Proposition 1. Let $K : L^2(\Omega) \to L^2(\Omega)$ be a linear bounded blurring operator with adjoint $K^*$ and let $F$ be the associated functional. Then
(1) $\|K^* f\|_* \le \frac{1}{2\lambda}$ if and only if $u \equiv 0$ is a minimizer of $F$.
(2) Assume that $\frac{1}{2\lambda} < \|K^* f\|_* < \infty$. Then $u$ is a minimizer of $F$ if and only if $\|K^*(f - Ku)\|_* = \frac{1}{2\lambda}$ and $\langle u, K^*(f - Ku) \rangle = \frac{1}{2\lambda} |u|_{NL/TV}$,
where $\|\cdot\|_*$ is the corresponding dual norm of $|\cdot|_{NL/TV}$.

Proposition 2. Let $K : L^2(\Omega) \to L^2(\Omega)$ be a linear bounded blurring operator with adjoint $K^*$ and let $G$ be the associated functional. If $(u, v)$ is a minimizer of $G$ with $v \in [0, 1]$, then

$$\left\| K^* \frac{f - Ku}{\sqrt{(f - Ku)^2 + \eta^2}} \right\|_* = \beta \quad \text{and} \quad \left\langle K^* \frac{f - Ku}{\sqrt{(f - Ku)^2 + \eta^2}},\, u \right\rangle = \beta |u|_{NL/MS},$$

where $\|\cdot\|_*$ is the corresponding dual norm of $|\cdot|_{NL/MS}$.
3 Experimental Results and Comparisons
The nonlocal MS regularizers proposed here, NL/MSTV and NL/MSH$^1$, are tested on several images with different blur kernels and noise types. We compare them with their traditional (local) versions, MSTV and MSH$^1$, and with the local and nonlocal total variations (TV [20], NL/TV [11]). In addition, we test the nonlocal regularizers in the impulse noise model with a preprocessing step for the weight function.
Fig. 1. Image recovery with cross sections: Gaussian blur kernel with $\sigma_b = 1$ and Gaussian noise with $\sigma_n = 5$. Top: original image and its cross section, noisy blurry image and its cross section. Middle, Bottom rows: recovered images (middle) and recovered cross sections (bottom) using TV, MSTV, NL/TV, NL/MSTV. SNR for the results: TV = 32.9485, MSTV = 33.5629, NL/TV = 45.1943, NL/MSTV = 50.6618. $\beta = 0.0045$ (MSTV), 0.001 (NL/MSTV), $\alpha = 0.00000015$, $\epsilon = 0.000001$.

Fig. 2. Top: (1st, 3rd) original images, (2nd, 4th) noisy blurry images with Gaussian kernel with $\sigma_b = 1$ (2nd) and using the pill-box kernel of radius 2 (4th), and then contaminated by Gaussian noise with $\sigma_n = 5$. Bottom: recovered images with SNR values: TV (14.4240), MSTV (14.4693), NL/TV (17.4165), NL/MSTV (16.5776). $\beta = 0.007$, $\alpha = 0.00000015$ (MSTV), $\beta = 0.0025$, $\alpha = 0.00000025$ (NL/MSTV), $\epsilon = 0.0000005$.
Fig. 3. Recovery of noisy blurry image from Fig. 2. Top: recovered image $u$ using TV (SNR=25.0230), MSTV (SNR=25.1968), MSH$^1$ (SNR=23.1324). Third row: recovered image $u$ using NL/TV (SNR=26.4554), NL/MSTV (SNR=26.4696), NL/MSH$^1$ (SNR=24.7164). Second, bottom rows: corresponding residuals $f - k * u$. $\beta = 0.0045$ (MSTV), 0.001 (NL/MSTV), 0.06 (MSH$^1$), 0.006 (NL/MSH$^1$), $\alpha = 0.00000001$, $\epsilon = 0.00002$.
First, we test the Gaussian noise model in Figs. 1-3. As expected, NL/MSTV and NL/MSH$^1$ perform better than MSTV and MSH$^1$ respectively, in the sense that not only do they recover fine scales such as texture better, but also, in the case of NL/MSTV, the model does not produce any staircase effect (which appears with MSTV). Furthermore, comparing the nonlocal MS regularizers
Fig. 4. Recovery of noisy blurry image with Gaussian kernel with $\sigma = 1$ and salt-and-pepper noise with $d = 0.3$. Top: original image, blurry image, noisy-blurry image. Middle: recovered images using TV (SNR=26.9251), MSTV (SNR=27.8336), MSH$^1$ (SNR=23.2052). Bottom: recovered images using NL/TV (SNR=29.2403), NL/MSTV (SNR=29.3503), NL/MSH$^1$ (SNR=27.1477). Second column: $\beta = 0.25$ (MSTV), 0.1 (NL/MSTV), $\alpha = 0.01$, $\epsilon = 0.002$. Third column: $\beta = 2$ (MSH$^1$), 0.55 (NL/MSH$^1$), $\alpha = 0.001$, $\epsilon = 0.0001$.
with NL/TV, NL/MSTV and NL/TV seem to lead to similar results visually and according to SNR, while NL/MSH$^1$ gives a smoother image and lower SNR. Specifically, in Fig. 1, we use a simple image and its 1D cross section. In this example, we use an 11 × 11 size search window for NL/MSTV, which is sufficient to obtain the best result, while NL/TV needs a 31 × 31 size. Moreover, NL/MSTV recovers the signals much better than NL/TV, which might be caused by the fact that originally, the MSTV regularizer does not suppress the jump part as much as TV. On the other hand, in Fig. 2, NL/TV produces clearer edges leading to higher SNR, while NL/MSTV has some artifacts near the edges of especially
410
M. Jung and L.A. Vese
Fig. 5. Comparison between M SH 1 and N L/M SH 1 with the image blurred and contaminated by high density (d = 0.4) of impulse noise. Top: noisy blurry images (left) using motion blur kernel of length=10, oriented at angle θ = 25◦ w.r.t. the horizon and salt-and-pepper noise with d = 0.4, (middle) using Gaussian kernel with σb = 1 and salt-and-pepper noise with d = 0.4, (right) using Gaussian kernel with σb = 1 and random-valued impulse noise with d = 0.4. Middle: recovered images using M SH 1 , (left) SNR=17.1106, (middle) SNR=15.2017, (right) SNR=16.6960. Bottom: recovered images using N L/M SH 1 , (left) SNR=21.2464, (middle) SNR=23.1998, (right) SNR=24.2500. First column: β = 2 (M SH 1 ), 0.4 (N L/M SH 1 ), second column: β = 2 (M SH 1 ), 1 (N L/M SH 1 ), α = 0.001, = 0.0002. Third column: β = 2.5 (M SH 1 ), 0.65 (N L/M SH 1 ), α = 0.000001, = 0.002.
small black boxes. However, in the other real boat image, there is no significant difference between them visually and according to SNR (see Fig. 3). Fig. 3 also justifies the result that the nonlocal regularizers preserve edges and details better than the traditional local ones because we see less textures in the residuals f − k ∗ u.
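The SNR values quoted above compare each restored image with the clean original. The definition used by the authors is not restated in this section, so the following is only a minimal sketch under an assumed, commonly used convention (signal energy about the mean divided by error energy, in decibels). The function name `snr_db` and the toy data are hypothetical.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR in dB under an ASSUMED convention: 10*log10(signal energy / error energy).

    The signal energy is measured around the mean of the reference image.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal = np.sum((reference - reference.mean()) ** 2)
    error = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(signal / error)

# Toy data (not from the paper): halving the error amplitude divides the
# error energy by four, so the SNR rises by 10*log10(4) dB.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noise = 0.1 * rng.standard_normal((64, 64))
gain = snr_db(clean, clean + 0.5 * noise) - snr_db(clean, clean + noise)
```

Under this convention a difference of about 6 dB corresponds to halving the root-mean-square error, which gives a rough scale for reading the SNR gaps between the local and nonlocal results.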
Next, we recover a blurred image contaminated by impulse noise (salt-and-pepper noise or random-valued impulse noise). First, we test all the nonlocal regularizers and the corresponding local ones on the Lenna image (Fig. 4) with a Gaussian blur kernel and salt-and-pepper noise with noise density d = 0.3. Then we test MSH^1 and NL/MSH^1 on the Einstein image (Fig. 5) with different blur kernels and both impulse noise models, salt-and-pepper noise and random-valued impulse noise, with the same noise density d = 0.4. By using a preprocessed image for the weight function, all the nonlocal regularizers outperform the traditional local ones by reducing the staircase effect and recovering the details better. Comparing the nonlocal regularizers, both NL/TV and NL/MSTV seem to give better results than NL/MSH^1 in terms of SNR, but visually NL/MSH^1 looks more natural because it preserves texture and details better, especially at high noise density (see Fig. 4). Moreover, in the presence of high-density noise, MSH^1 has difficulty restoring images, especially those blurred with a Gaussian kernel, while it works satisfactorily with other blur kernels such as motion blur. NL/MSH^1, in contrast, performs very well with Gaussian blur and also produces better results with the other blur kernels. This can be seen in Figs. 4 and 5. In Fig. 4, with Gaussian blur and high noise density d = 0.3, MSH^1 suffers from some artifacts induced by noise, while MSTV and TV give cleaner results. On the other hand, NL/MSH^1 provides a visually better result than the other nonlocal regularizers by preserving the fine structures. Even though NL/MSTV gives the highest SNR, its result still looks more cartoon-like because it suppresses the texture, especially in the hat. So in this case, we visually prefer NL/MSH^1. Based on the above results, in Fig. 5 we only compare MSH^1 and NL/MSH^1 with the different blur kernels and both impulse noise models at the higher noise density d = 0.4. As expected, NL/MSH^1 produces better results than MSH^1 in both blur cases; especially in the Gaussian blur case, the results do not have any artifacts, unlike those of MSH^1.

Finally, we note that in the MS regularizers the parameters α, β and ε were selected manually to provide the best SNR results. The smoothness parameter β increases with the noise level, while the other parameters α and ε are approximately fixed. As for the computational time, it takes about 5 minutes to construct the weight function of a 256 × 256 image with the 11 × 11 search window and 5 × 5 patch in MATLAB on a dual-core laptop with a 2 GHz processor and 2 GB of memory. The minimization for the (local or nonlocal) MS regularizers takes around 60 seconds for the computation of both u, using an explicit scheme based on the gradient descent method, and v, using a semi-implicit scheme, with a total of 5 × (100 + 5) iterations, while the (local or nonlocal) TV regularizer using gradient descent with an explicit scheme takes less than 55 seconds with 500 iterations.
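The two impulse noise models used in these experiments can be sketched as follows. This is an illustration only, not the authors' code: the intensity range [0, 1], the equal split between salt and pepper, and the function names are assumptions.

```python
import numpy as np

def salt_and_pepper(img, d, rng, lo=0.0, hi=1.0):
    """Replace a fraction d of the pixels by lo ("pepper") or hi ("salt").

    Assumption: salt and pepper are equally likely among corrupted pixels.
    """
    out = np.array(img, dtype=float)
    corrupted = rng.random(out.shape) < d      # each pixel is hit with probability d
    salt = rng.random(out.shape) < 0.5
    out[corrupted & salt] = hi
    out[corrupted & ~salt] = lo
    return out

def random_valued(img, d, rng, lo=0.0, hi=1.0):
    """Replace a fraction d of the pixels by values drawn uniformly from [lo, hi]."""
    out = np.array(img, dtype=float)
    corrupted = rng.random(out.shape) < d
    out[corrupted] = rng.uniform(lo, hi, size=int(corrupted.sum()))
    return out

# Toy input (a flat gray image) with the noise densities quoted in the text.
image = np.full((256, 256), 0.5)
sp = salt_and_pepper(image, 0.3, np.random.default_rng(1))
rv = random_valued(image, 0.4, np.random.default_rng(2))
```

Random-valued impulse noise is usually the harder case for detection-based schemes, because a corrupted pixel is no longer recognizable by taking an extreme value.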
Acknowledgments. This work has been supported by the National Science Foundation Grants DMS-0714945 and DMS-0312222.
A Geometric PDE for Interpolation of M-Channel Data

Frank Lenzen¹ and Otmar Scherzer¹,²

1 Department of Mathematics, University of Innsbruck, Technikerstrasse 21a, A-6020 Innsbruck, Austria
{Frank.Lenzen,Otmar.Scherzer}@uibk.ac.at
http://infmath.uibk.ac.at
2 Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenbergerstrasse 69, A-4040 Linz, Austria

Abstract. We propose a partial differential equation to be used for interpolating M-channel data, such as digital color images. This equation is derived via a semi-group from a variational regularization method for minimizing displacement errors. For actual image interpolation, the solution of the PDE is projected onto a space of functions satisfying interpolation constraints. A comparison of the test results with standard and state-of-the-art interpolation algorithms shows the competitiveness of this approach.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 413–425, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

A frequent task in image processing is interpolation, by which we mean the process of assigning an interpolating function to a discrete set of pixel positions and the corresponding discrete M-channel image data (e.g. RGB color data). Interpolation is frequently used for zooming into or scaling digital images. A special kind of image interpolation problem is inpainting, i.e. the problem of reconstructing lost or corrupted parts of images. Linear interpolation (that is, convolution methods) [18], such as nearest neighbor, spline, and Whittaker-Shannon interpolation [14, 4], is computationally efficient but produces unpleasant artifacts. On the other hand, nonlinear methods adapting to geometrical structures can produce more visually attractive results but are computationally more demanding. Nowadays, most of these nonlinear methods are motivated by energy minimization or by scale spaces of partial differential equations, see for example [1, 22, 21, 18]. In particular for inpainting such nonlinear methods are widely used, see for example [2, 5, 6, 23]. In this paper we derive a partial differential equation that is designed to correct and filter for displacement errors in M-channel data. Combined with the interpolation ideas of [11, 16], this method is suited for interpolation.

The paper is organized as follows: In Section 2 we consider a variational ansatz for correcting displacement errors. Application of the semi-group concepts yields a PDE, which can be considered the gradient flow of the variational problem. A relationship of our PDE to the Mean Curvature Flow (MCF) equation is established. Our approach is combined with interpolation constraints in Section 3. For comparison, we show in Section 4 results from the proposed method and from interpolation methods from the scale space literature. In particular we take into account the GREYCstoration software of Tschumperlé [21] and the interpolation method proposed by Roussos and Maragos [18, 19]. The paper ends with a conclusion in Section 5.
2 Displacement Regularization

Let $u : \Omega \to \mathbb{R}^M$ be an M-channel function representing continuous M-channel data on a bounded open domain $\Omega \subseteq \mathbb{R}^2$. We presume the following image acquisition model: data $u^{(0)}$ of $u$ are given, which satisfy
\[ u^{(0)} = u \circ \Phi, \tag{1} \]
where $\Phi : \Omega \to \Omega$ is a displacement vector field. In the following we consider the problem of finding $(u, \Phi)$ satisfying (1) such that the displacement $\Phi - \mathrm{Id}$ is small and $u$ has minimal total variation. A variational method corresponding to this problem consists in minimization of
\[ \frac{1}{2} \int_\Omega |\Phi(x) - x|^2 \, dx + \alpha \int_\Omega |\nabla u(x)| \, dx \tag{2} \]
for small $\alpha > 0$ over the set of functions satisfying $u^{(0)} = u \circ \Phi$. Here (written out for M = 3)
\[ \nabla u = \begin{pmatrix} \partial_1 u_1 & \partial_1 u_2 & \partial_1 u_3 \\ \partial_2 u_1 & \partial_2 u_2 & \partial_2 u_3 \end{pmatrix} \quad \text{and} \quad |\nabla u(x)| = \Big( \sum_{j=1}^{M} \sum_{i=1}^{2} (\partial_i u_j(x))^2 \Big)^{1/2}. \]
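As a concrete reading of the norm $|\nabla u|$ and of the functional (2), the following sketch evaluates both on a pixel grid. It is an illustration under assumptions that are not part of the paper: unit grid spacing, the finite differences of `numpy.gradient`, and the hypothetical function names `gradient_norm` and `energy`.

```python
import numpy as np

def gradient_norm(u, h=1.0):
    """Pointwise |∇u| for an (H, W, M) image: root of the summed squared partials."""
    d1 = np.gradient(u, h, axis=0)   # derivative along the first image axis
    d2 = np.gradient(u, h, axis=1)   # derivative along the second image axis
    return np.sqrt(np.sum(d1 ** 2 + d2 ** 2, axis=-1))

def energy(u, displacement, alpha, h=1.0):
    """Discrete analogue of (2); `displacement` has shape (H, W, 2) and stores Φ(x) - x."""
    fidelity = 0.5 * np.sum(displacement ** 2) * h * h
    total_variation = np.sum(gradient_norm(u, h)) * h * h
    return fidelity + alpha * total_variation

# Toy 3-channel image with linear channels, so that |∇u| = sqrt(1 + 4) everywhere.
rows, cols = np.mgrid[0:4, 0:5].astype(float)
u = np.stack([cols, 2.0 * rows, np.zeros_like(rows)], axis=-1)
tv_only = energy(u, np.zeros((4, 5, 2)), alpha=0.1)
```

Note that the channels are coupled through a single square root, so this is the vectorial total variation rather than the sum of the per-channel total variations.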
We want to avoid solving a coupled system for $(u, \Phi)$, and therefore we assume that $u$ is a smooth function, so that we can make a first order Taylor series expansion. Then it follows from our modeling assumptions that
\[ u^{(0)}(x) = (u \circ \Phi)(x) = u(x + (\Phi(x) - x)) \approx u(x) + \nabla u^T(x)\,(\Phi(x) - x). \tag{3} \]
Here, $\approx$ symbolizes that the left hand side approximates the right hand side for small displacements $\Phi - \mathrm{Id}$. In the following, we assume that equality holds instead of $\approx$, which implies that only small displacements occur. Note that the equation $\nabla u^T(x)(\Phi(x) - x) = u^{(0)}(x) - u(x)$ for the unknown $\Phi(x) - x$ is overdetermined. If the difference $u^{(0)}(x) - u(x)$ is not only caused by a distortion $\Phi$, no solution to this problem might exist. To overcome this problem, we consider the minimization of
\[ \big| \nabla u^T(x)(\Phi(x) - x) - u^{(0)}(x) + u(x) \big|^2, \tag{4} \]
that is, we search for the displacement vector $\Phi(x) - x$ which best fits the data $(u^{(0)}(x), u(x))$. The minimizer of (4) is given by
\[ \Phi(x) - x = (\nabla u^T(x))^\dagger \, (u^{(0)}(x) - u(x)), \tag{5} \]
where $(\nabla u(x))^\dagger$ denotes the pseudo-inverse (see [17]) of $\nabla u(x)$. For notational convenience, we leave out the dependence of $u$ on $x$ in the following. Inserting (5) into (2) gives the functional
\[ F^0_{u^{(0)}}(u) := \int_\Omega \Big( \frac{1}{2} (u - u^{(0)})^T (\nabla u^T \nabla u)^\dagger (u - u^{(0)}) + \alpha |\nabla u| \Big) \, dx. \tag{6} \]
In order to avoid computation of the pseudo-inverse, we additionally regularize the possibly singular matrix $\nabla u^T \nabla u$ by the regular, symmetric, and strictly positive definite matrix $(\varepsilon I + \nabla u^T \nabla u)$ with some $\varepsilon > 0$. To summarize, we consider in the sequel the variational problem of minimizing
\[ F^\varepsilon_{u^{(0)}}(u) := \int_\Omega \Big( \frac{1}{2} (u - u^{(0)})^T (\varepsilon I + \nabla u^T \nabla u)^{-1} (u - u^{(0)}) + \alpha |\nabla u| \Big) \, dx. \tag{7} \]
For this functional, existence theory within the classical framework of the Calculus of Variations [7, 8] is not applicable. Moreover, for a theoretical analysis, minimization has in fact to be considered over the space of M-channel functions with components of finite total variation. In order to implement the minimization of $F^\varepsilon_v$ numerically, quasi-convexification techniques would be most efficient. This approach requires the analytical calculation of the quasi-convex envelope of the function
\[ (x, \xi, \nu) \mapsto \frac{1}{2} (\xi - v(x))^T (\varepsilon I + \nu^T \nu)^{-1} (\xi - v(x)) + \alpha |\nu| \]
with respect to $\nu$. However, the quasi-convex envelope function is not known so far, and thus efficient numerical minimization based on this approach is not at hand.

In the following we recall the convex semi-group solution concept [3]: Let $R : H \to \mathbb{R} \cup \{\infty\}$ be a convex functional on a Hilbert space $H$, and let $u_\alpha$ be a minimizer of the variational regularization functional
\[ G_{u^{(0)}}(u) := \frac{1}{2} \big\| u - u^{(0)} \big\|_H^2 + \alpha R(u). \]
Then, for $u^{(0)}$ sufficiently smooth, $(u^{(0)} - u_\alpha)/\alpha$ converges for $\alpha \to 0$ to an element in the subgradient $\partial R(u^{(0)})$ of $R$. Choosing $u^{(k)} \in \operatorname{argmin} G_{u^{(k-1)}}$, iterative minimization of $G_{u^{(k)}}$ yields an approximation of the solution of the flow
\[ -\frac{\partial u}{\partial t} \in \partial R(u) \]
at scale $t = k\alpha$. In other words, variational regularization approximates a diffusion filtering scale space, which is the associated gradient flow equation. For convex semi-groups the solutions of diffusion filtering and variational methods are comparable and look rather similar [20]. We expect a similar behavior for the non-convex functional $F^\varepsilon_{u^{(0)}}$ and derive the corresponding flow equation, which is the gradient flow associated with (7). We use the abbreviations
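\[ A^\varepsilon(u) := \big( \varepsilon I + \nabla u^T \nabla u \big)^{-1} \quad \text{and} \quad S^\varepsilon_{u^{(k-1)}}(u) := \frac{1}{2} \int_\Omega (u - u^{(k-1)})^T A^\varepsilon(u) (u - u^{(k-1)}) \, dx. \]
The directional derivative of $S^\varepsilon_{u^{(k-1)}}$ at $u$ in direction $\phi$ (provided it exists) satisfies
\[ \partial_\tau S^\varepsilon_{u^{(k-1)}}(u + \tau\phi) = \int_\Omega \phi^T A^\varepsilon(u) (u - u^{(k-1)}) \, dx + \frac{1}{2} \int_\Omega (u - u^{(k-1)})^T \big( \partial_{u,\phi} A^\varepsilon(u) \big) (u - u^{(k-1)}) \, dx, \tag{8} \]
where
\[ \partial_{u,\phi} A^\varepsilon(u) := \lim_{\tau \to 0} \frac{A^\varepsilon(u + \tau\phi) - A^\varepsilon(u)}{\tau}. \]
In a similar way, the directional derivative of $R_\alpha(u) := \alpha \int_\Omega |\nabla u|$ at $u$ in direction $\phi$ can be derived in a formal way:
\[ \partial_\tau R_\alpha(u + \tau\phi) = \alpha \int_\Omega \nabla\phi^T \frac{\nabla u}{|\nabla u|} \, dx. \tag{9} \]
Note that the right hand side of (9) is meant as the subdifferential of the TV semi-norm evaluated in the direction of $\phi$. Using (8) and (9), the optimality condition for the minimizer $u^{(k)}$ of $F^\varepsilon_{u^{(k-1)}}$ reads as
\[ \int_\Omega \phi^T A^\varepsilon(u^{(k)}) \, \frac{u^{(k)} - u^{(k-1)}}{\alpha} \, dx + \frac{1}{2} \int_\Omega \frac{(u^{(k)} - u^{(k-1)})^T}{\alpha} \big( \partial_{u^{(k)},\phi} A^\varepsilon(u^{(k)}) \big) (u^{(k)} - u^{(k-1)}) \, dx = - \int_\Omega \nabla\phi^T \frac{\nabla u^{(k)}}{|\nabla u^{(k)}|} \, dx. \tag{10} \]
Let $t > 0$ be fixed and $k = t/\alpha$; then, as in the convex case, we can expect that $(u^{(k)} - u^{(k-1)})/\alpha$ converges to $\partial_t u(t)$ for $\alpha \to 0$. From this it follows that $u^{(k)} - u^{(k-1)} \to 0$, and from (10) it follows that
\[ \int_\Omega \phi^T A^\varepsilon(u(t)) \, \partial_t u(t) \, dx = - \int_\Omega \nabla\phi^T \frac{\nabla u(t)}{|\nabla u(t)|} \, dx. \tag{11} \]
Using Green's formula and the fundamental lemma, from (11) the strong formulation
\[ A^\varepsilon(u(t)) \, \partial_t u(t) = \big( \varepsilon I + \nabla u^T(t) \nabla u(t) \big)^{-1} \partial_t u(t) = \nabla \cdot \Big( \frac{\nabla u(t)}{|\nabla u(t)|} \Big) \tag{12} \]
follows, where $u(t)$ satisfies natural (Neumann) boundary conditions.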
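The regularized matrix $A^\varepsilon(u)$ used in this derivation replaces the pseudo-inverse appearing in (5) and (6). The following sketch checks at a single point how the two relate. It is only an illustration: the matrix `J` is a made-up stand-in for $\nabla u(x)$ with M = 3, and the names `quad_form`, `d_true` and `n` are hypothetical.

```python
import numpy as np

# Hypothetical 2 x M matrix standing in for ∇u(x) with M = 3 channels.
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])
d_true = np.array([0.3, -0.2])           # displacement Φ(x) - x
r = J.T @ d_true                         # u0(x) - u(x) under the linearization (3)

# Least-squares displacement (5) via the pseudo-inverse of ∇u^T.
d_hat = np.linalg.pinv(J.T) @ r

def quad_form(residual, eps):
    """residual^T (eps*I + ∇u^T ∇u)^{-1} residual, i.e. residual^T A^eps residual."""
    m = J.shape[1]
    return residual @ np.linalg.solve(eps * np.eye(m) + J.T @ J, residual)

q_pinv = r @ np.linalg.pinv(J.T @ J) @ r  # pseudo-inverse form as in (6), without the 1/2
q_small = quad_form(r, 1e-8)              # regularized form as in (7) for small eps

# A residual that no displacement can explain (null space of ∇u) is ignored
# by the pseudo-inverse but penalized like |n|^2 / eps by A^eps.
n = np.cross(J[0], J[1])
q_null = quad_form(n, 1e-3)
```

For residuals that a displacement can explain, both quadratic forms return the squared displacement length. For the remaining component the regularized form grows like 1/ε, which is one way to see why the limit ε → 0 is delicate when M > 2.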
In the following, we leave out the dependence of $u$ on $t$ for notational convenience. Multiplying both sides of (12) by $(\varepsilon I + \nabla u^T \nabla u)$, we get
\[ \partial_t u = (\varepsilon I + \nabla u^T \nabla u) \, \nabla \cdot \Big( \frac{\nabla u}{|\nabla u|} \Big). \tag{13} \]
Moreover, the initial condition associated with the flow is $u(0) := u^{(0)}$. Now, letting $\varepsilon \to 0$, which only seems to make sense mathematically if $M \le 2$, we obtain the evolutionary partial differential equation
\[ \partial_t u = (\nabla u^T \nabla u) \, \nabla \cdot \Big( \frac{\nabla u}{|\nabla u|} \Big). \tag{14} \]
Remark 1. For scalar data (M = 1) the equation (14) reads as
\[ \partial_t u = |\nabla u|^2 \, \nabla \cdot \Big( \frac{\nabla u}{|\nabla u|} \Big). \tag{15} \]
One recognizes that (15) differs from the Mean Curvature Flow equation by the leading factor $|\nabla u|^2$ instead of $|\nabla u|$.

We generalize the functional in (6) to
\[ \int_\Omega \Big( \frac{1}{2} (u - u^{(0)})^T \big( (\nabla u^T \nabla u)^p \big)^\dagger (u - u^{(0)}) + \alpha |\nabla u| \Big) \, dx \tag{16} \]
with $p \ge 0$. We note that the power of a matrix is defined via spectral decomposition. The case $p = 1/2$ is of particular interest, because
– the functional (16) becomes invariant under affine rescaling of the image brightness;
– the semi-group approach (see also [10] for the scalar case) gives the gradient flow
\[ \partial_t u = (\nabla u^T \nabla u)^{1/2} \, \nabla \cdot \Big( \frac{\nabla u}{|\nabla u|} \Big), \]
which, in the scalar case, is the Mean Curvature Flow equation.
For an analytical comparison of the solution of (16) for scalar, radially symmetric monotone data to the MCF solution we refer to [9].
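For scalar data, the two flows named above (p = 1, equation (15), and p = 1/2, Mean Curvature Flow) differ only in the power of $|\nabla u|$ in front of the curvature term. The sketch below shows one explicit Euler step under assumptions that are not in the paper: central differences from `numpy.gradient` (which do not enforce the Neumann boundary condition), a small smoothing constant `delta`, and the same update formula for values of p other than the two stated cases.

```python
import numpy as np

def flow_step(u, dt, p=1.0, delta=1e-8):
    """One explicit Euler step of du/dt = |∇u|^(2p) * div(∇u/|∇u|) for scalar u.

    p = 1 corresponds to (15), p = 1/2 to Mean Curvature Flow. `delta`
    smooths |∇u| to avoid division by zero (an implementation assumption).
    """
    uy, ux = np.gradient(u)                       # derivatives along rows, columns
    norm = np.sqrt(ux ** 2 + uy ** 2 + delta)
    div = np.gradient(ux / norm, axis=1) + np.gradient(uy / norm, axis=0)
    return u + dt * norm ** (2.0 * p) * div

# Toy data: a ramp has straight level lines (zero curvature) and stays fixed,
# while a radially symmetric bump has curved level lines and is changed.
ramp = np.tile(np.arange(8.0), (8, 1))
y, x = np.mgrid[-8:9, -8:9]
bump = np.exp(-(x ** 2 + y ** 2) / 18.0)
stepped = flow_step(bump, dt=0.03, p=1.0)
mcf_stepped = flow_step(bump, dt=0.03, p=0.5)
```

An explicit step like this is only stable for small `dt`; the value 0.03 is borrowed from the step sizes reported in Section 4 purely as a plausible order of magnitude.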
3 Interpolation of M-Channel Data

The evolution equation (14) can be used for interpolating discrete M-channel data by restricting u to satisfy interpolation constraints. The problem of interpolating M-channel data has already been studied in the literature, see for example [1, 21, 18, 19]. The approaches of [21, 18, 19] differ from ours in the PDE used for filtering: [21, 18, 19] use anisotropic diffusion, whereas the PDE (14) generalizes the Mean Curvature Flow equation.
To begin with, we recall the interpolation constraints proposed in [11, 16]. For simplicity of notation we restrict ourselves to M-channel data defined on a two-dimensional rectangular domain
\[ \Omega := \Big( \tfrac{1}{2}, N_x + \tfrac{1}{2} \Big) \times \Big( \tfrac{1}{2}, N_y + \tfrac{1}{2} \Big), \]
where $N_x, N_y \in \mathbb{N}$. The domain is partitioned into cells ('pixels')
\[ Q_{i,j} := \Big[ i - \tfrac{1}{2}, i + \tfrac{1}{2} \Big] \times \Big[ j - \tfrac{1}{2}, j + \tfrac{1}{2} \Big], \qquad (i, j) = (1, 1), (1, 2), \dots, (N_x, N_y). \]
Let $G$ be a kernel function defined on $\mathbb{R}^2$ and compactly supported in $[-\tfrac{1}{2}, \tfrac{1}{2}]^2$. Let $Z := (z_{m,i,j})$ be a tensor which denotes sampled data of a function $G * u : \mathbb{R}^2 \to \mathbb{R}^M$ at the positions $(i, j)$. Here $*$ denotes the convolution operator. In particular:
\[ z_{m,i,j} := (G * u_m)(i, j), \qquad (m, i, j) = (1, 1, 1), (1, 1, 2), \dots, (M, N_x, N_y). \tag{17} \]
Examples of kernel functions typically used in the literature are listed in [18]. We rewrite (17) as follows: Let $G_{i,j} := G(\cdot - (i, j))$, then
\[ z_{m,i,j} = \langle G_{i,j}, u_m \rangle_{L^2(\Omega)}, \qquad (m, i, j) = (1, 1, 1), \dots, (M, N_x, N_y). \]
We say that an M-channel function $u = (u_1, \dots, u_M)$ satisfies the interpolation constraints for some discrete data $Z = (z_{m,i,j})$ if $\langle G_{i,j}, u_m \rangle_{L^2(\Omega)} = z_{m,i,j}$. The set of functions satisfying the interpolation constraints for data $Z$ is denoted by $U_{Z,G}$.

Example 1. We consider for $G$ the two-dimensional $\delta$ distribution, i.e., $G(x, y) = \delta(x)\delta(y)$. Then $z_{m,i,j} = u_m((i, j))$. The nearest neighbor (componentwise, piecewise constant) interpolation reads as
\[ u^{(0)}_m |_{Q_{i,j}} = z_{m,i,j}, \qquad (m, i, j) = (1, 1, 1), \dots, (M, N_x, N_y). \]
Here, $u^{(0)} = u \circ \Phi$, where $\Phi(x, y)|_{Q_{i,j}} = (i, j)$. In particular, $u$ can be interpreted as a distortion of $u^{(0)}$ by a local sampling displacement $\Phi$.

Now let $u^{(0)} \in U_{Z,G}$ be arbitrary. The nearest neighbor interpolation in Example 1 motivates the assumption that, for a sampled function $u$, there exists $\Phi$ such that $u^{(0)} = u \circ \Phi$. Recalling the concepts presented in Section 2, we consider the functional defined in (7) restricted to the set $U_{Z,G}$ in order to reconstruct $u$ from given $u^{(0)}$. In turn, we restrict the flow equation (13) to $U_{Z,G}$:
\[ \partial_t u = P_{U_{0,G}} \Big( (\varepsilon I + \nabla u^T \nabla u) \, \nabla \cdot \Big( \frac{\nabla u}{|\nabla u|} \Big) \Big), \tag{18} \]
where
\[ P_{U_{0,G}}(v) = v - \|G\|^{-2}_{L^2(\mathbb{R}^2)} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \langle G_{i,j}, v \rangle_{L^2(\Omega)} \, G_{i,j} \]
is applied to each component separately. Note that the assumption $u^{(0)} \in U_{Z,G}$ together with $\partial_t u \in U_{0,G}$ asserts that the solution $u(t)$, $t \ge 0$, stays in $U_{Z,G}$. At this point we remark that there is no analytical theory guaranteeing the well-posedness of (18). Since (18) comprises a projection, a time-explicit scheme with sufficiently small step size $\Delta t$ is required in order to solve (18) numerically.
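Because the shifted kernels $G_{i,j}$ have disjoint supports, they are mutually orthogonal and the formula for $P_{U_{0,G}}$ is an orthogonal projection that can be evaluated cell by cell. The sketch below shows a discrete version for one channel. The sub-pixel grid with s × s samples per cell, the box kernel used as a stand-in for G, and the names `sample` and `project` are assumptions made for illustration.

```python
import numpy as np

def sample(u, g):
    """Discrete z_{i,j} = <G_{i,j}, u> for one channel.

    u has shape (Ny*s, Nx*s): each pixel cell is resolved by s x s sub-pixels
    of area 1/s^2, and g (shape (s, s)) holds the kernel G on one cell.
    """
    s = g.shape[0]
    ny, nx = u.shape[0] // s, u.shape[1] // s
    cells = u.reshape(ny, s, nx, s)
    return np.einsum("iajb,ab->ij", cells, g) / s ** 2

def project(v, g):
    """Discrete P_{U_{0,G}}: remove from v its component along every G_{i,j}."""
    s = g.shape[0]
    coeff = sample(v, g) / (np.sum(g * g) / s ** 2)    # <G_ij, v> / ||G||^2
    correction = coeff[:, None, :, None] * g[None, :, None, :]
    return v - correction.reshape(v.shape)

# Toy data: a projected update leaves the sampled values z unchanged.
rng = np.random.default_rng(0)
s = 4
g = np.ones((s, s))                     # box kernel with unit integral on the cell
u0 = rng.random((3 * s, 5 * s))
update = project(rng.standard_normal(u0.shape), g)
u1 = u0 + 0.03 * update                 # one explicit step with a projected right-hand side
```

With the box kernel the projection simply subtracts the mean of the update over each pixel cell, so every explicit step preserves the cell averages of the initial image.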
4 Numerical Results

We compare our method, which consists in numerically solving (18), to two standard interpolation methods, namely nearest neighbor and cubic interpolation, as well as to established, sophisticated interpolation methods proposed by Tschumperlé & Deriche [21] and by Roussos & Maragos [19]. The method of Tschumperlé & Deriche is implemented in the GREYCstoration software (see http://cimg.sourceforge.net/greycstoration/); for the method of Roussos & Maragos, test results are available from the site http://cvsp.cs.ntua.gr/~tassos/PDEinterp/ssvm07res/. In our method, the kernel function has to be chosen appropriately. We use
\[ G(x, y) := \frac{1}{\int_{[-\frac{1}{2}, \frac{1}{2}]^2} g_\sigma(x, y) \, dx \, dy} \; \chi_{[-\frac{1}{2}, \frac{1}{2}]^2} \, g_\sigma(x, y), \]
where $g_\sigma$ is the two-dimensional isotropic Gaussian of standard deviation $\sigma$. In our method a value of 20 is used for the variance $\sigma^2$.

For evaluating the methods, we use the two test images shown in Fig. 1. For both images, a low and a high resolution version is available, where the low resolution image is obtained from the high resolution image via low-pass filtering (convolution with a bicubic spline) and downsampling by a factor of four, see [19]. The test images were provided by Roussos & Maragos. The methods mentioned above are used to upsample the low resolution image by a factor of four. Our method is applied with 100 time steps, Δt = 0.03, ε = 0.05 and σ² = 20 for the first test image and with 100 time steps, Δt = 0.05, ε = 0.01 and σ² = 20 for the second. For GREYCstoration (version 2.9) we use the option '-resize' together with the target size of the high resolution image and the parameters '-anchor true', '-iter 3' and '-dt 10'. For the remaining parameters the default values are used. The results of Roussos' method were obtained from the web site mentioned above.

Fig. 1. Two test images. Each test image is available in a low and a high resolution version with a factor of four between the two resolutions.

Let us consider the results of upsampling the first test image. In order to highlight the differences between the methods, we compare only details of the resulting images, see Fig. 2. The results with nearest neighbor and cubic interpolation are shown in Fig. 2, top right and middle left, respectively. Both results are unsatisfactory and confirm what is well known from the literature: with nearest neighbor interpolation the upsampled images look blocky, and cubic interpolation produces blurry images. The result of GREYCstoration with interpolation constraints (Fig. 2, middle row right) also appears blurry, but reconstructs the edges in the image better than cubic interpolation does. The method proposed by Roussos & Maragos as well as our method (see Fig. 2, bottom row) produce sharp and well reconstructed edges.

In order to further investigate the differences between the PDE-based methods, we zoom into two regions of the second test image, one region containing an edge (see Fig. 3) and one region with texture (see Fig. 4). Fig. 3 shows the edge region after applying the methods proposed by Tschumperlé with interpolation constraints (top row, second left), Roussos (top row, second right) and our method (top row, right). For comparison we have plotted the detail of the original image (top row, left). One can see that with Tschumperlé's method the edges appear blurry and irregular. This seems to be an effect of the interpolation constraints, because when Tschumperlé's method is applied without constraints, strong anisotropic diffusion along the edge occurs so that the edge becomes more regular. With the method of Roussos the edge is reconstructed sharply, but overshoots appear. Our method is also able to reconstruct the edge sharply, but with only small overshoots.
Concerning the gray mark at the parrot's beak, we observe that Tschumperlé's method reconstructs the shape of the mark better than the other methods do. The differences in the behavior of the methods can also be recognized when applying the Sobel operator to the interpolated images: the thickness of the edges in the result of the Sobel operator indicates the blurriness of the reconstructed edge. We see that the proposed method produces sharper edges than the method by Roussos and more regular edges than the method by Tschumperlé. The overshoots introduced by Roussos' method can also be observed in the outcome of the Sobel operator. They are far stronger than the overshoots produced by our method.

Fig. 2. Upsampling by a factor of four, detail of the first test image. Top left: original high resolution image, top right: nearest neighbor interpolation, middle left: cubic interpolation, middle right: interpolation using GREYCstoration, bottom left: interpolation method proposed by Roussos et al., bottom right: proposed interpolation method.

Fig. 3. Detail of an edge in the original and interpolated images (top row, using GREYCstoration with interpolation constraints, Roussos' method, and the proposed method) and subsequently applied Sobel operator (bottom row)

Fig. 4. A texture detail of the original (top left) and interpolated images using GREYCstoration (top right), Roussos' method (bottom left) and the proposed method (bottom right)

Now we investigate the effect of the interpolation methods on textures. Fig. 4, top left, shows a textured region of the original image. The results of the methods proposed by Tschumperlé (with interpolation constraints) and Roussos are given in Fig. 4, top right and bottom left, respectively. The result of the proposed method is shown in Fig. 4, bottom right. One observes a certain blurriness in the results of Tschumperlé's method. As for the previous result, we point out that incorporating the interpolation constraints seems to have a strong effect on the result. When applying GREYCstoration without imposing constraints, the results are much more influenced by the anisotropic diffusion, and the edges and the texture are accentuated. In the result of the interpolation method proposed by Roussos, we see a strong effect of the anisotropic diffusion on the texture, so that the result is visually more appealing than the other results. Nevertheless, a comparison with the original image shows that the original and the reconstructed texture differ significantly. In particular, the orientations of the short stripes in the face of the parrot are different. Note that the anisotropic diffusion induced by the direction of the texture also affects the pupil of the parrot. Regarding the result of our method, we remark that the reconstruction of the texture is quite conservative, i.e., we stay near the initial guess. The blockiness is slightly reduced by the evolution process. Taking a look at the eye of the parrot, the relation of our method to Mean Curvature Flow can be observed: the pupil is reconstructed as a perfectly circular shape.
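The Sobel-based comparison of edge sharpness described in this section can be sketched as follows. This is not the evaluation code used for Fig. 3: the synthetic step and ramp images, the threshold `frac`, and the helper names `sobel_magnitude` and `edge_width` are assumptions made for illustration.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated border)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def edge_width(img, frac=0.25):
    """Number of pixels in the middle row whose Sobel response exceeds frac * max."""
    mag = sobel_magnitude(img)
    row = mag[mag.shape[0] // 2]
    return int(np.count_nonzero(row > frac * mag.max()))

# Toy data: a sharp vertical step edge and a blurred (ramp) version of it.
x = np.arange(16.0)
sharp = np.tile((x >= 8).astype(float), (16, 1))
blurry = np.tile(np.clip((x - 5.0) / 6.0, 0.0, 1.0), (16, 1))
widths = (edge_width(sharp), edge_width(blurry))
```

The wider band of strong Sobel response for the ramp is the discrete counterpart of the thicker edges seen for blurrier reconstructions.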
5 Conclusion

We have proposed a new PDE-based method for the interpolation of color images. The method differs from other state-of-the-art methods in the underlying evolution process. We use a PDE which is a generalized Mean Curvature Flow, whereas other methods are based on anisotropic diffusion. Interpolation constraints are satisfied by projecting the evolution process onto an adequate function space. Numerical tests show that our method is competitive with state-of-the-art interpolation methods. Due to the Mean Curvature Flow nature of the method, edges are well reconstructed. Textures are treated in a conservative manner.
Acknowledgments. We want to thank Gerhard Dziuk (Univ. Freiburg), Peter Elbau (RICAM, Linz) and Markus Grasmair (University Innsbruck) for inspirational discussions. We thank David Tschumperlé for providing GREYCstoration and Anastasios Roussos and Petros Maragos for providing the test images as well as the results of their algorithm. The work of O.S. is partially funded by the project FSP S 92 (subproject 9203-N12).
References

1. Belahmidi, A., Guichard, F.: A partial differential equation approach to image zoom. In: Proc. of the 2004 Int. Conf. on Image Processing, pp. 649–652 (2004)
2. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: [13], pp. 417–424 (2000)
3. Brézis, H.: Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. North-Holland Publishing Co., Amsterdam (1973); North-Holland Mathematics Studies, No. 5. Notas de Matemática (50)
4. Burger, W., Burge, M.J.: Digitale Bildverarbeitung. Springer, Heidelberg (2005)
5. Chan, R., Setzer, S., Steidl, G.: Inpainting by flexible Haar wavelet shrinkage. Preprint, University of Mannheim (2008)
6. Chan, T., Kang, S., Shen, J.: Euler's elastica and curvature based inpaintings. SIAM J. Appl. Math. 63(2), 564–592 (2002)
7. Dacorogna, B.: Weak Continuity and Weak Lower Semicontinuity of Non-Linear Functionals. Lecture Notes in Mathematics, vol. 922. Springer, Heidelberg (1982)
8. Dacorogna, B.: Direct Methods in the Calculus of Variations. Applied Mathematical Sciences, vol. 78. Springer, Berlin (1989)
9. Elbau, P., Grasmair, M., Lenzen, F., Scherzer, O.: Evolution by non-convex energy functionals. Reports of FSP S092 - Industrial Geometry 75, University of Innsbruck, Austria (submitted) (2008)
10. Grasmair, M., Lenzen, F., Obereder, A., Scherzer, O., Fuchs, M.: A non-convex PDE scale space. In: [15], pp. 303–315 (2005)
11. Guichard, F., Malgouyres, F.: Total variation based interpolation. In: Proceedings of the European Signal Processing Conference, vol. 3, pp. 1741–1744 (1998)
12. Hagen, H., Weickert, J. (eds.): Visualization and Processing of Tensor Fields. Mathematics and Visualization. Springer, Heidelberg (2006)
13. Hoffmeyer, S. (ed.): Proceedings of the Computer Graphics Conference 2000 (SIGGRAPH 2000). ACM Press, New York (2000)
14. Jähne, B.: Digitale Bildverarbeitung, 5th edn. Springer, Heidelberg (2002)
15. Kimmel, R., Sochen, N.A., Weickert, J. (eds.): Scale-Space 2005. LNCS, vol. 3459. Springer, Heidelberg (2005)
16. Malgouyres, F., Guichard, F.: Edge direction preserving image zooming: a mathematical and numerical analysis. SIAM J. Numer. Anal. 39, 1–37 (2001)
17. Nashed, M.Z. (ed.): Generalized inverses and applications. Academic Press/Harcourt Brace Jovanovich Publishers, New York (1976)
18. Roussos, A., Maragos, P.: Vector-valued image interpolation by an anisotropic diffusion-projection pde. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 104–115. Springer, Heidelberg (2007)
19. Roussos, A., Maragos, P.: Reversible interpolation of vectorial images by an anisotropic diffusion-projection pde. In: Special Issue for the SSVM 2007 conference. Springer, Heidelberg (2007) (accepted for publication)
20. Scherzer, O., Weickert, J.: Relations between regularization and diffusion filtering. J. Math. Imaging Vision 12(1), 43–63 (2000)
21. Tschumperlé, D.: Fast anisotropic smoothing of multi-valued images using curvature-preserving pde's. International Journal of Computer Vision (IJCV) 68, 65–82 (2006)
22. Tschumperlé, D., Deriche, R.: Vector valued image regularization with pdes: A common framework for different applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005)
23. Weickert, J., Welk, M.: Tensor field interpolation with pdes. In: [12], pp. 315–325 (2006)
An Edge-Preserving Multilevel Method for Deblurring, Denoising, and Segmentation

Serena Morigi¹, Lothar Reichel², and Fiorella Sgallari¹

¹ Dept. of Mathematics-CIRAM, University of Bologna, Bologna, Italy
{morigi,sgallari}@dm.unibo.it
² Dept. of Mathematical Sciences, Kent State University, Kent, OH 44242, USA
[email protected]

Abstract. We present a fast edge-preserving cascadic multilevel image restoration method for reducing blur and noise in contaminated images. The method also can be applied to segmentation. Our multilevel method blends linear algebra and partial differential equation techniques. Regularization is achieved by truncated iteration on each level. Prolongation is carried out by nonlinear edge-preserving and noise-reducing operators. A thresholding updating technique is shown to reduce "ringing" artifacts. Our algorithm combines deblurring, denoising, and segmentation within a single framework.
1 Introduction
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 426–438, 2009. © Springer-Verlag Berlin Heidelberg 2009

Digital image restoration, reconstruction, and segmentation are important in medical and astronomical imaging, film restoration, as well as in image and video coding. This paper introduces a cascadic multilevel method for simultaneous restoration and segmentation of blurred and noisy images. Blur arises for many reasons, including out-of-focus cameras, and camera or object motion during exposure. Blur often is modeled by a point-spread function (PSF). Noise is the random, unwanted variation in brightness of an image. It may originate from, e.g., film grain or electronic noise from a digital camera or scanner. We consider additive noise in this work.

It is well known that linear deblurring methods tend to introduce oscillatory artifacts. Variational deblurring methods are able to reduce these artifacts, but they typically are much more computationally intensive than linear methods; see, e.g., Welk et al. [15] for a discussion.

Many segmentation methods apply curve evolution techniques. These methods seek to detect object boundaries, represented by closed curves in an image. The contours are represented as the zero level set of an implicit function defined in higher dimension. The active contours evolve in time according to a Partial Differential Equation (PDE) model, which takes into account intrinsic geometric measures of the image. We will use a variant proposed by Li et al. [7] of the well-known Geodesic Active Contours (GAC) model [2].

This paper discusses a cascadic multilevel image restoration method that allows both spatially variant and spatially invariant PSFs. The method requires
the solution of a linear system of equations on each level. These systems are solved by an iterative method, the choice of which depends on properties of the PSF. We introduce a thresholding updating strategy in order to suppress “ringing.” The restriction operators are defined by solving local weighted least-squares problems, and the prolongation operators are determined by piecewise linear prolongation followed by integrating a discretized nonlinear Perona-Malik diffusion equation for a few time-steps. The purpose of the integration is to reduce noise. The cascadic multilevel method so obtained shares the computational efficiency and simplicity of truncated iteration for the solution of linear discrete ill-posed problems with the edge-preserving property of nonlinear models. The multilevel method proceeds from coarser to finer levels, and regularizes by truncated iteration on each level. For many image restoration problems, the multilevel method demands fewer matrix-vector product evaluations on the finest level than the corresponding 1-level truncated iterative method, and often determines restorations of higher quality. A benefit of our multilevel approach to image restoration is that it easily can be combined with image segmentation, as is illustrated in the present paper. We remark that our multilevel method differs significantly from multilevel methods for the solution of well-posed boundary value problems for elliptic partial differential equations in that prolongation and restriction operators, as well as the number of iterations on each level are chosen in a different manner. This paper is organized as follows. Section 2 introduces the variational deblurring and the denoising model, Section 3 discusses the cascadic multilevel framework, and Section 4 presents a few computed examples. Concluding remarks can be found in Section 5.
2 Deblurring, Denoising, and Segmentation of Images
We consider the restoration of two-dimensional gray-scale images, which have been contaminated by blur and noise. The available observed blur- and noise-contaminated image $f^\delta$ is related to the unavailable blur- and noise-free image $\hat{u}$ by the degradation model

$$f^\delta(x) = \int_\Omega h(x,y)\,\hat{u}(y)\,dy + \eta^\delta(x), \qquad x \in \Omega, \tag{1}$$

where $\Omega \subset \mathbb{R}^2$ is the image domain, $\eta^\delta$ represents noise in the data, and the kernel $h(x,y)$ models the PSF. If the blur is spatially invariant, then $h$ is of the form $h(x,y) = \tilde{h}(x-y)$ for some function $\tilde{h}$. The kernel is smooth or piecewise smooth and, therefore, the integral operator is compact. It follows that the solution of (1) is an ill-posed problem; see, e.g., Engl et al. [3] and Hansen [5] for discussions on ill-posed problems and their numerical solution.

We would like to determine an accurate approximation of $\hat{u}$ when the observed image $f^\delta$ and the kernel $h$, but not the noise $\eta^\delta$, are known. A popular approach to achieving this is to minimize the functional

$$E(u) = \int_\Omega \left( \frac{1}{2}\left( \int_\Omega h(x,y)\,u(y)\,dy - f^\delta(x) \right)^{2} + \rho\, R(u(x)) \right) dx, \tag{2}$$
where $\rho > 0$ is a regularization parameter and

$$R(u) = \psi(|\nabla u|^2) \tag{3}$$

is a regularization operator. Here $\psi$ is a differentiable monotonically increasing function and $\nabla u$ denotes the gradient of $u$; see, e.g., Rudin et al. [11] and Welk et al. [15] for discussions on this kind of regularization operator. The Euler-Lagrange equation associated with (2), supplied with a gradient descent which yields a minimizer as "time" $t \to \infty$, is given by

$$\frac{\partial u}{\partial t}(t,z) = -\int_\Omega h(x,z)\left( \int_\Omega h(x,y)\,u(t,y)\,dy - f^\delta(x) \right) dx + \rho\, D(u(t,z)), \tag{4}$$

for $z \in \Omega$ and $t \ge 0$. The initial function $u(0,z) = f^\delta(z)$, $z \in \Omega$, and suitable boundary conditions are used. We also refer to $D$ as a regularization operator. Image restoration methods based on the Euler-Lagrange equation require that the regularization operator $D$, as well as values of the regularization parameter $\rho$ and a suitable finite time-interval of integration $[0,T]$, be chosen. The determination of suitable values of $\rho$ and $T$ generally is not straightforward. We get from (3) that

$$D(u) = \operatorname{div}\bigl(g(|\nabla u|^2)\,\nabla u\bigr), \qquad g(t) = d\psi(t)/dt. \tag{5}$$

The function $g$ is referred to as the diffusivity. Perona-Malik regularization is obtained by choosing the diffusivity

$$g(s) = \frac{1}{1 + s/\sigma}, \tag{6}$$
where $\sigma$ is a positive constant; see [14]. Alternatively, one can use a regularization operator of total variation-type.

Nonlinear models based on (4)-(6) can provide denoising and deblurring of good quality; however, their time-integration is computationally demanding: explicit methods require many tiny time-steps and therefore are expensive, while each time-step with an implicit or semi-implicit method is, in general, expensive even if it could be accelerated by multigrid techniques.

A much cheaper and simpler approach to determining an approximation of the desired image $\hat{u}$ is to apply a few steps of an iterative method to the linear system of equations obtained by a discrete approximation of (1),

$$Au = b^\delta, \qquad A \in \mathbb{R}^{n \times n}, \qquad u, b^\delta \in \mathbb{R}^n. \tag{7}$$
Here A is a discrete blurring operator and bδ represents the available blur- and noise-contaminated image. In applications typically bδ , rather than f δ , is available; see [5] for details. Approximate solutions of (7) conveniently can be computed by Krylov subspace iterative methods, where the choice of method depends on the matrix properties. For instance, spatially variant blur often gives rise to a nonsymmetric matrix A, and we may use the LSQR Krylov subspace method [13] to solve
(7). This method is an implementation of the conjugate gradient method applied to the normal equations. When the matrix is symmetric, but possibly indefinite, the MR-II [4] Krylov subspace method is an attractive alternative to LSQR. The iteration number may be considered a discrete regularization parameter. It is important not to carry out too many iterations in order to avoid severe error propagation. This approach to determining a restored image is referred to as regularization by truncated iteration; see, e.g., [3, 4, 8] for discussions. Due to cut-off of high frequencies, these iterative methods may introduce artifacts, such as ringing, and fail to recover edges accurately.

Many image analysis applications require image segmentation. The level to which segmentation is carried out depends on the problem being solved; segmentation should be terminated when the regions of interest in the application have been isolated. This problem-dependence makes autonomous segmentation one of the most difficult computational tasks in image analysis. The presence of noise and blur makes this task even more complicated.

In this paper we carry out segmentation by computing Geodesic Active Contours (GAC). Segmentation methods of this kind are based on curve evolution theory, see [2] and references therein, and level sets [12]. The basic idea is to start with initial boundary shapes represented by closed curves, i.e., contours, and iteratively modify these contours by application of shrink/expansion operations determined by image constraints. The shrink/expansion operations, referred to as contour evolution, are performed by minimizing an energy functional, similarly to traditional region-based segmentation methods; however, the level set framework provides more flexibility.
The GAC PDE model proposed in [2] is given by

$$\frac{\partial \phi}{\partial t} = |\nabla \phi|\, \operatorname{div}\!\left( g(|\nabla b^\delta|^2)\, \frac{\nabla \phi}{|\nabla \phi|} \right), \tag{8}$$

where the edge-detector function $g$ is defined by (6) and the initial condition $\phi_0$ is the signed distance function to an arbitrary initial curve enclosing the objects to be segmented. The solution to the segmentation problem is the zero-level set of the steady state of the flow $\phi_t = 0$. We apply a fast curve evolution method recently suggested by Li et al. [7] in our multilevel method, which eliminates the need of costly re-initialization, but we remark that other GAC methods also can be used.
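To make the role of the edge-detector function in (8) concrete, the following minimal Python sketch evaluates the diffusivity (6) on a one-dimensional step signal. It is only an illustration: the function names, the default value sigma = 1, and the finite-difference approximation of the gradient are assumptions of this sketch and are not taken from the paper.

```python
def pm_diffusivity(s, sigma=1.0):
    """Perona-Malik diffusivity (6): g(s) = 1 / (1 + s / sigma), where s = |grad u|^2."""
    return 1.0 / (1.0 + s / sigma)


def edge_indicator_1d(signal, sigma=1.0):
    """Evaluate g(|u'|^2) along a 1D signal with at least two samples.

    Assumption of this sketch: central differences in the interior and
    one-sided differences at the two end points, on a unit grid.
    """
    n = len(signal)
    values = []
    for j in range(n):
        lo, hi = max(j - 1, 0), min(j + 1, n - 1)
        slope = (signal[hi] - signal[lo]) / (hi - lo)
        values.append(pm_diffusivity(slope * slope, sigma))
    return values


# A unit step: g stays at 1 in the flat parts and drops next to the jump.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
indicator = edge_indicator_1d(step)
```

The indicator equals 1 where the signal is flat and is smaller near the jump, which is the behavior that slows the contour evolution in (8) near edges.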
3 The Cascadic Multilevel Framework
We first review the cascadic multilevel method proposed in [8] for the removal of blur and noise. In [8] only symmetric blurring matrices are considered. Introduce for $v = [v^{(1)}, v^{(2)}, \ldots, v^{(n)}]^T \in \mathbb{R}^n$ the weighted least-squares norm

$$\|v\| = \left( \frac{1}{n} \sum_{i=1}^{n} \bigl(v^{(i)}\bigr)^2 \right)^{1/2}. \tag{9}$$
Let $\hat{b} \in \mathbb{R}^n$ denote the unknown noise-free right-hand side associated with the right-hand side $b^\delta$ of (7). We assume that $\hat{b} \in \mathrm{Range}(A)$ and that a bound $\delta$ for the noise $e = b^\delta - \hat{b}$ is available, i.e.,

$$\|e\| \le \delta. \tag{10}$$

Let $W_1 \subset W_2 \subset \cdots \subset W_\ell$ be a sequence of nested subspaces of $\mathbb{R}^n$ of dimension $\dim(W_i) = n_i$ with $n_1 < n_2 < \ldots < n_\ell = n$. We refer to the subspaces $W_i$ as levels, with $W_1$ being the coarsest and $W_\ell = \mathbb{R}^n$ the finest level. Each level is furnished with a weighted least-squares norm; level $W_i$ has a norm of the form (9) with $n$ replaced by $n_i$. We choose $n_{i-1} = n_i/4$, $1 < i \le \ell$. Let $A_i \in \mathbb{R}^{n_i \times n_i}$ be the representation of the blurring operator $A$ on level $W_i$. The matrix $A_i$ is determined by discretization of the integral operator (1) similarly as $A$. This defines implicitly the restriction operator $R_i : \mathbb{R}^n \to W_i$, such that

$$A_i = R_i A R_i^*. \tag{11}$$

We define $R_\ell = I$. The choice of restriction operators $R_i$ is in our experience less crucial for achieving high-quality restorations than the choice of restriction operators $R_i^{(\omega)} : \mathbb{R}^n \to W_i$ for reducing the available blur- and noise-contaminated image represented by the right-hand side $b^\delta$ in (7). We let

$$b_i^\delta = R_i^{(\omega)} b^\delta, \qquad 1 \le i < \ell, \tag{12}$$

where the $R_i^{(\omega)}$ are determined by repeated local weighted least-squares approximation, inspired by a "staircasing"-reducing scheme recently proposed by Buades et al. [1]. Also the choice of prolongation operators from level $i-1$ to level $i$ is important for the performance of the multilevel method. We apply nonlinear prolongation operators $P_i : W_{i-1} \to W_i$, $1 < i \le \ell$, defined by piecewise linear interpolation followed by integration of the Perona-Malik equation over a short time-interval; see below. The $P_i$ are designed to be noise-reducing and edge-preserving.

The multilevel methods of the present paper are cascadic, i.e., they first determine an approximate solution of $A_1 u = b_1^\delta$ in $W_1$, using the LSQR or MR-II iterative methods. We refer to the iterative method as IM in Algorithm 1 below. The iterations with this method are terminated by the discrepancy principle; see below. The so determined approximate solution in $W_1$ is mapped into $W_2$ by the prolongation $P_2$. A correction of this mapped iterate in $W_2$ is computed by the IM. Again, the iterations are terminated by the discrepancy principle, and the approximate solution in $W_2$ so obtained is mapped into $W_3$ by $P_3$. The computations are continued in this fashion until an approximation of $\hat{u}$ has been determined in $W_\ell = \mathbb{R}^n$. In the algorithm, $\Delta u_{i,m_i} := \mathrm{IM}(A_i, b_i^\delta - A_i u_{i,0})$ denotes the computation of the approximate solution $\Delta u_{i,m_i}$ of $A_i z_i = b_i^\delta - A_i u_{i,0}$ by $m_i$ iterations with one of the iterative methods MR-II or LSQR, using the initial iterate $\Delta u_{i,0} = 0$.
Algorithm 1. Multilevel Algorithm
Input: A, b^δ, δ, ℓ ≥ 1 (number of levels);
Output: approximate solution u_ℓ ∈ W_ℓ of (7); segmented result φ_ℓ;
Determine A_i and b_i^δ from (11) and (12), respectively, 1 ≤ i ≤ ℓ;
u_0 := 0; φ_0 := initial contour;
for i := 1, 2, ..., ℓ do
    u_{i,0} := P_i u_{i−1}; φ_{i,0} := S_i φ_{i−1};
    Δu_{i,m_i} := IM(A_i, b_i^δ − A_i u_{i,0});
    Correction step: u_i := u_{i,0} + β Δu_{i,m_i};
    Segmentation step: φ_i := GAC(φ_{i,0}, u_i);
endfor
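The restoration part of Algorithm 1 can be sketched in a few lines of Python for a one-dimensional toy problem. This is a rough illustration under several assumptions that are not part of the paper: a fixed 3-point symmetric blur is used on every level, Landweber iteration stands in for MR-II or LSQR, the prolongation is plain piecewise linear interpolation without the diffusion step, the coarse data are supplied by the caller, the deringing weight β is a scalar, and the segmentation step is omitted. The stopping tolerances follow the choice (13) and the stopping rule (14) described after the algorithm; all function names are hypothetical.

```python
import math


def rms_norm(v):
    """Weighted least-squares norm (9): square root of the mean of squares."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def level_tolerances(delta, levels, gamma=1.4):
    """Tolerances c_i * delta with c_levels = gamma and c_i = c_{i+1} / 3, cf. (13)."""
    return [gamma / 3 ** (levels - i) * delta for i in range(1, levels + 1)]


def blur(u):
    """Toy symmetric blur (assumption): 3-point average with zero boundary values."""
    n = len(u)
    return [0.25 * (u[j - 1] if j > 0 else 0.0) + 0.5 * u[j]
            + 0.25 * (u[j + 1] if j < n - 1 else 0.0) for j in range(n)]


def truncated_iteration(apply_a, rhs, tol, max_iter=200):
    """Landweber steps x += A (rhs - A x), stopped by the discrepancy principle.

    Stand-in for MR-II/LSQR; assumes A is symmetric with norm at most 1.
    Returns the iterate and the number of steps carried out.
    """
    x = [0.0] * len(rhs)
    for k in range(max_iter):
        res = [b - a for b, a in zip(rhs, apply_a(x))]
        if rms_norm(res) <= tol:
            return x, k
        x = [xj + gj for xj, gj in zip(x, apply_a(res))]
    return x, max_iter


def prolong_linear(u):
    """Piecewise linear prolongation to twice the length."""
    fine = []
    for j, uj in enumerate(u):
        nxt = u[j + 1] if j + 1 < len(u) else uj
        fine += [uj, 0.5 * (uj + nxt)]
    return fine


def cascadic(b_levels, tols, beta=1.0):
    """Coarse-to-fine loop: prolongate, correct by truncated iteration, update."""
    u, iters = None, []
    for b, tol in zip(b_levels, tols):
        u0 = [0.0] * len(b) if u is None else prolong_linear(u)
        rhs = [bj - aj for bj, aj in zip(b, blur(u0))]
        du, k = truncated_iteration(blur, rhs, tol)
        u = [x + beta * d for x, d in zip(u0, du)]
        iters.append(k)
    return u, iters
```

A two-level run would pass the coarse and fine data, e.g. `cascadic([b_coarse, b_fine], level_tolerances(0.02, 2))`, and returns the fine-level approximation together with the iteration counts per level.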
The number of iterations on each level is based on the discrepancy principle as follows: we assume that there are constants $c_i$ independent of $\delta$, such that

$$\|b_i^\delta - \hat{b}_i\| \le c_i \delta, \qquad 1 \le i \le \ell,$$

where $\delta$ satisfies (10). It can be seen by using the noise-reducing property of the restriction operators $R_i^{(\omega)}$ that a suitable choice is

$$c_i = \frac{1}{3}\, c_{i+1}, \qquad 1 \le i < \ell, \qquad c_\ell = \gamma, \tag{13}$$

for some constant $\gamma > 1$. In the computed examples of Section 4, we use $\gamma = 1.4$. The discrepancy principle prescribes that the iterations on level $i$ be terminated as soon as

$$\|b_i^\delta - A_i u_{i,0} - A_i \Delta u_{i,m_i}\| \le c_i \delta. \tag{14}$$

When many iterations are carried out, the computed approximate solution $\Delta u_{i,m_i}$ generally is severely contaminated by noise, which is propagated from $b_i^\delta - A_i u_{i,0}$. The purpose of the stopping criterion (14) is to i) allow enough iterations to be carried out to determine as accurate a restoration on level $i$ as possible, and ii) avoid carrying out so many iterations that the computed approximate solution $\Delta u_{i,m_i}$ is severely contaminated by propagated noise. Discussions on properties of the stopping rule (14) can be found in [8,10]. A general discussion on applications of the discrepancy principle to determine approximate solutions of ill-posed problems is provided in [3].

The nonlinear edge-preserving prolongation operators $P_i$ have previously been applied in [8], where further details on their implementation are provided; see also [16]. The prolongation operator $P_i$ first maps the approximate solution determined by the algorithm on level $W_{i-1}$ into $W_i$ by piecewise linear interpolation, and then uses the result as initial function for a discretized initial-boundary value problem for the Perona-Malik nonlinear diffusion equation

$$\frac{\partial u}{\partial t} = \operatorname{div}\bigl(g(|\nabla u|^2)\,\nabla u\bigr), \tag{15}$$
where $g$ is the Perona-Malik diffusivity (6). Integration over a short time-interval removes noise while preserving rapid spatial transitions, such as edges. Integration is performed by carrying out about 10 time-steps of size about 0.2 with an explicit finite difference method. The small number of time-steps avoids difficulties due to numerical instability and keeps the computational work required for integration negligible. We found it to be beneficial to apply more time-steps the more noise-contaminated the available image. However, in our experience the exact choices of the number of time-steps and their sizes are not crucial for the good performance of the multilevel method.

In the algorithm, $\phi_0$ denotes the initial contour for the GAC segmentation method implemented by solving (8); see [7]. The prolongation of the level set function from $W_{i-1}$ to $W_i$ is carried out by spline interpolation and denoted by $S_i$. The statement $\phi_i := \mathrm{GAC}(\phi_{i,0}, u_i)$ updates the contour on level $i$.

Ringing in restored images stems from the Gibbs phenomenon at discontinuities. The latter could be image borders, boundaries inside the image, or be introduced by inadequate spatial sampling of the image or kernel. The larger the support of the kernel in (1), the more pronounced the ringing. High contrast edges cause strong ringing, and the magnitude of the ringing is proportional to the norm of the image gradient. Based on these observations, we propose a deringing correction obtained by multiplying the image by the spatially variant function

$$\beta(x,y) = \alpha + (1-\alpha)\bigl(1 - g(|\nabla u_{i,0}(x,y)|^2)\bigr), \tag{16}$$

where $g$ is the diffusivity (6) and the parameter $0 \le \alpha \le 1$ controls the suppression of the computed correction. Since we would like to suppress ringing in the smooth regions, but avoid suppression of edges, the correction function $\beta$ should be small in smooth regions and large elsewhere.
We use α = 0.05 in the computed examples of this paper, but this value can be tuned depending on the presence of large homogeneous regions in the image.
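The prolongation and the deringing weight described above can be illustrated by a short one-dimensional Python sketch. It assumes a unit grid, reflecting (zero-flux) boundaries, sigma = 1, and simple finite differences, none of which are specified in the paper; the function names are hypothetical. The step count 10, the step size 0.2, and α = 0.05 are the values quoted in the text.

```python
def diffusivity(s, sigma=1.0):
    """Perona-Malik diffusivity (6)."""
    return 1.0 / (1.0 + s / sigma)


def linear_prolong(u):
    """Piecewise linear interpolation from n to 2n samples."""
    fine = []
    for j, uj in enumerate(u):
        nxt = u[j + 1] if j + 1 < len(u) else uj
        fine += [uj, 0.5 * (uj + nxt)]
    return fine


def perona_malik_steps(u, steps=10, tau=0.2, sigma=1.0):
    """Explicit time-stepping for the 1D analogue of (15) with zero-flux boundaries."""
    u = list(u)
    n = len(u)
    for _ in range(steps):
        # Flux g(d^2) * d between neighbouring samples j and j + 1.
        flux = []
        for j in range(n - 1):
            d = u[j + 1] - u[j]
            flux.append(diffusivity(d * d, sigma) * d)
        u = [u[j] + tau * ((flux[j] if j < n - 1 else 0.0)
                           - (flux[j - 1] if j > 0 else 0.0))
             for j in range(n)]
    return u


def prolong(u_coarse, steps=10, tau=0.2, sigma=1.0):
    """Edge-preserving prolongation: interpolate, then diffuse for a few steps."""
    return perona_malik_steps(linear_prolong(u_coarse), steps, tau, sigma)


def deringing_weight(u0, alpha=0.05, sigma=1.0):
    """Weight (16): alpha in flat regions, close to 1 near strong edges."""
    n = len(u0)
    beta = []
    for j in range(n):
        lo, hi = max(j - 1, 0), min(j + 1, n - 1)
        slope = (u0[hi] - u0[lo]) / (hi - lo)
        beta.append(alpha + (1.0 - alpha) * (1.0 - diffusivity(slope * slope, sigma)))
    return beta
```

With tau = 0.2 and g ≤ 1, each explicit step is a convex combination of neighboring values, so the sketch cannot create new extrema; this is one way to see why a small number of such steps is harmless.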
4 Numerical Results
We illustrate the performance of Algorithm 1. The computations are carried out in MATLAB with about 16 significant decimal digits. We assume that a fairly accurate estimate of the norm of the noise is available. If this is not the case, such an estimate can be computed by integration of bδ for a few time-steps with the Perona-Malik differential equation; details are described in a forthcoming paper. Note that the matrices Ai , defined by (11), do not have to be explicitly stored; it suffices to define functions for the evaluation of matrix-vector products with the Ai and, if Ai is nonsymmetric, also with the ATi . For the examples of this section, the matrix-vector products can be computed efficiently by using the structure of the Ai ; see, e.g., [9] for a discussion. The matrices corresponding to the finest level are numerically singular in all examples. The displayed restored images provide a qualitative comparison of the performance of the proposed cascadic multilevel method. A quantitative comparison is given by the Peak Signal-to-Noise Ratio,
$$\mathrm{PSNR}(u_\ell, \hat{u}) = 20 \log_{10} \frac{255}{\|u_\ell - \hat{u}\|}\ \mathrm{dB}, \tag{17}$$

where $\hat{u}$ denotes the blur- and noise-free image and $u_\ell$ the restored image determined by Algorithm 1. Each pixel is stored with 8 bits; the numerator 255 is the largest pixel-value that can be represented with 8 bits. A high PSNR-value indicates that the restoration is accurate; however, the PSNR-values are not always in agreement with visual perception. We also measure the variation in the error image $u_{\mathrm{err}} = u_\ell - \hat{u}$, defined by

$$\mathrm{EV}(u_\ell, \hat{u}) = \sum_{\mathrm{pixel}} \|\nabla u_{\mathrm{err}}\|_2^2, \tag{18}$$
where the sum is over all pixels of the image. The more accurately the edges are restored, the smaller this sum.
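The two quality measures (17) and (18) can be computed as in the following Python sketch for images stored as lists of rows. Two points are assumptions of this sketch rather than statements from the paper: the norm in (17) is taken to be the weighted norm (9), i.e., the root-mean-square error, and the gradient in (18) is approximated by forward differences that are set to zero at the last row and column.

```python
import math


def psnr(u, u_hat):
    """PSNR (17) in dB, using the root-mean-square error as the norm."""
    diffs = [a - b for row_u, row_h in zip(u, u_hat) for a, b in zip(row_u, row_h)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    if rms == 0.0:
        return math.inf  # identical images
    return 20.0 * math.log10(255.0 / rms)


def edge_variation(u, u_hat):
    """EV (18): sum over pixels of the squared gradient norm of the error image."""
    err = [[a - b for a, b in zip(row_u, row_h)] for row_u, row_h in zip(u, u_hat)]
    rows, cols = len(err), len(err[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            dx = err[i][j + 1] - err[i][j] if j + 1 < cols else 0.0
            dy = err[i + 1][j] - err[i][j] if i + 1 < rows else 0.0
            total += dx * dx + dy * dy
    return total
```

A constant error image gives EV = 0 regardless of its size, which reflects that (18) measures how well edges are restored rather than the overall intensity error.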
Fig. 1. Blur- and noise-free images used in the numerical experiments. Left: butterfly, 400 × 400 pixels. Right: corner, 512 × 512 pixels.
We apply Algorithm 1 to blur- and noise-contaminated versions of the images shown in Figure 1. The corner image is representative of images with well-defined edges, while the butterfly image is a gray-scale photographic image with smoothed edges.

Example 4.1. We consider the restoration of a contaminated version of the left-hand side image of Figure 1. Contamination is by space-invariant Gaussian blur as generated by the MATLAB function blur.m from Regularization Tools [6] with parameters sigma = 3 and band = 9. This function generates a block Toeplitz matrix with Toeplitz blocks. The parameter band specifies the half-bandwidth of the Toeplitz blocks and the parameter sigma defines the variance of the Gaussian PSF. The image also is contaminated by 5% Gaussian noise. The blurring operator is symmetric. We therefore use the MR-II iterative method.
Fig. 2. Example 4.1. Top-left: Image contaminated by Gaussian blur and 5% Gaussian noise. Top-right: Image restored by 1-level method. Bottom-left: Image restored by 3-level method. Bottom-right: Deringing function β defined by (16).
Figure 2 provides a qualitative comparison of images restored by the basic 1-level MR-II method and the 3-level method defined by Algorithm 1. The restoration obtained with the latter method can be seen to be of higher quality with sharper edges. The deringing function β (16) is shown in Figure 2 (bottom right); it is small in smooth image regions and large elsewhere. Table 1(a) gives a quantitative comparison of the restorations determined by Algorithm 1 with ℓ = 2 and ℓ = 3 levels, and the basic 1-level MR-II method, for different amounts of noise. The columns marked "PSNR" and "EV" display (17) and (18), respectively. They show Algorithm 1 with ℓ = 3 to yield images with the highest PSNR- and smallest EV-values. The column marked "iter" shows the number of iterations required on each level. For instance, the triplet 4-1-2 indicates that Algorithm 1 carried out 4 MR-II iterations on the coarsest level, 1 iteration on the intermediate level, and 2 iterations on the finest level. The computational effort is dominated by the matrix-vector product evaluations on the finest level. The 2- and 3-level methods can be seen to require fewer iterations on the finest level than the basic 1-level MR-II method. □

Table 1. PSNR, number of iterations (iter), and edge variation (EV) as functions of the number of levels and noise-level (% noise) for restorations of (a) the image of Example 4.1 contaminated by Gaussian blur determined by band = 9 and sigma = 3, and (b) the image of Example 4.2 contaminated by motion blur defined by r = 15 and θ = 10

(a)
levels   % noise   PSNR    iter     EV
1        1         26.05   11       5043
2        1         26.73   8-9      4179
3        1         26.86   9-7-9    4060
1        5         24.30   4        5279
2        5         24.38   3-3      4682
3        5         24.63   5-3-3    4555
1        10        23.25   3        5477
2        10        23.42   2-2      4949
3        10        23.60   4-1-2    4853

(b)
levels   % noise   PSNR    iter     EV
1        1         30.93   12       4629
2        1         31.69   11-9     2294
3        1         32.02   17-8-8   2251
1        5         27.13   5        6519
2        5         28.56   4-3      3553
3        5         28.69   7-2-3    3140
1        10        25.15   3        5368
2        10        26.77   2-2      3692
3        10        26.93   3-1-2    3466

Fig. 3. Example 4.2. Left: Restoration determined by 3-level LSQR-based multilevel method. Right: Restoration obtained by basic 1-level LSQR.

Example 4.2. Consider the restoration of a version of the right-hand side image of Figure 1 that has been contaminated by motion blur and 5% Gaussian noise. The PSF is represented by a line segment of length r pixels in the direction of the motion. The angle θ (in degrees) specifies the direction; it is measured counter-clockwise from the positive x-axis. The PSF takes on the value 1/r on this segment and vanishes elsewhere. We refer to the parameter r as the width.
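The motion-blur PSF described in Example 4.2 can be approximated on a pixel grid as in the following Python sketch. The rasterization is an assumption of this sketch: r sample points are placed at unit spacing along the line through the center of a size × size array, each sample is rounded to the nearest pixel and contributes 1/r, so that the PSF sums to one. The paper does not state how its PSF was discretized, and the function name and the `size` parameter are hypothetical.

```python
import math


def motion_psf(r, theta_deg, size):
    """Line-segment PSF of length r pixels at angle theta (degrees, counter-clockwise).

    Returns a size x size list of rows. Row indices grow downwards, so a
    counter-clockwise angle moves samples towards smaller row indices.
    """
    if size < r:
        raise ValueError("size must be at least r so that the segment fits")
    centre = size // 2
    theta = math.radians(theta_deg)
    psf = [[0.0] * size for _ in range(size)]
    for k in range(r):
        s = k - (r - 1) / 2.0  # signed position along the segment
        col = centre + int(round(s * math.cos(theta)))
        row = centre - int(round(s * math.sin(theta)))
        psf[row][col] += 1.0 / r
    return psf


# Parameters of Example 4.2: r = 15 and theta = 10 degrees.
psf_example = motion_psf(15, 10, 15)
```

For θ = 0 the sketch reduces to a horizontal row of r equal weights 1/r, which matches the description of the PSF in the text.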
Fig. 4. Example 4.3. Top-left: Segmentation of a blur- and noise-free image. Top-right: Segmentation of a blurred and noisy image by a 1-level method. Bottom-left: Segmentation by 3-level method of the blurred and noisy image on level 2. Bottom-right: Segmentation by 3-level method on finest level.
The motion blur for this example is defined by r = 15 and θ = 10. The blurring matrix A is nonsymmetric. We therefore use the LSQR iterative method in Algorithm 1. Figure 3 (left) shows the restoration determined by Algorithm 1 with 3 levels. The restored image obtained by the basic 1-level LSQR method is shown in Figure 3 (right). Visual comparison shows Algorithm 1 to give the most pleasing restoration. This is in agreement with the PSNR- and EV-values reported in Table 1(b). □

Example 4.3. We apply Algorithm 1 to segmentation of a contaminated version of the image of Figure 1 (right). The contamination is caused by Gaussian blur, determined by band = 9 and sigma = 3, and 10% Gaussian noise. Segmentation is carried out using the variational formulation for geodesic active contours
(GAC) without re-initialization as described by Li et al. [7]. The initial curve is close to the boundary of the image. Figure 4 (top-left) shows the segmentation obtained when applied to the noise- and blur-free image in Figure 1 (left). The curve evolution requires 900 iterations. Segmentation of the contaminated image is more difficult. We first deblur the contaminated image by the basic 1-level MR-II iterative method, and then apply GAC segmentation to the restored image. The resulting segmentation is shown in Figure 4 (top-right). The curve evolution required 1200 iterations. Finally, we apply Algorithm 1 with 3 levels and the Segmentation step. No segmentation is carried out on the coarsest level. On level 2, we apply GAC segmentation with 400 curve evolution iterations. The resulting segmentation is shown in Figure 4 (bottom-left). Prolongation of the evolved contour is carried out by spline interpolation. Only 100 curve evolution iterations are required on the finest level. The resulting segmentation is displayed in Figure 4 (bottom-right). The figure shows Algorithm 1 to be able to extract object boundaries with less computational effort and higher accuracy than the corresponding 1-level method. □
5 Conclusions and Extension
Visual inspection of the images shown in Section 4, as well as computed PSNR- and EV-values, show the cascadic multilevel method to give more accurate restorations than 1-level methods applied on the finest level only. A multilevel approach to segmentation of contaminated images also yields better results and requires less computational effort than the corresponding 1-level method. The aim of ongoing work is to gain increased understanding of the interplay between image restoration and segmentation.
Acknowledgments This research has been supported by PRIN-MIUR-Cofin 2006 project, by University of Bologna "Funds for selected research topics", and in part by an OBR Research Challenge Grant.
References
1. Buades, A., Coll, B., Morel, J.M.: The staircasing effect in neighborhood filters and its solution. IEEE Trans. Image Processing 15, 1499–1505 (2006)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22, 61–79 (1997)
3. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht (1996)
4. Hanke, M.: Conjugate Gradient Type Methods for Ill-Posed Problems. Longman, Essex (1995)
5. Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia (1997)
6. Hansen, P.C.: Regularization tools, version 4.0 for MATLAB 7.3. Numer. Algorithms 46, 189–294 (2007)
7. Li, C., Xu, C., Gui, C., Fox, M.D.: Level set evolution without re-initialization: a new variational formulation. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 430–436 (2005)
8. Morigi, S., Reichel, L., Sgallari, F., Shyshkov, A.: Cascadic multiresolution methods for image deblurring. SIAM J. Imaging Sci. 1, 51–74 (2008)
9. Ng, M.K., Chan, R.H., Tang, W.-C.: A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput. 21, 851–866 (1999)
10. Reichel, L., Shyshkov, A.: Cascadic multilevel methods for ill-posed problems. J. Comput. Appl. Math. (in press)
11. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
12. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
13. Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software 8, 43–71 (1982)
14. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639 (1990)
15. Welk, M., Theis, D., Brox, T., Weickert, J.: PDE-based deconvolution with forward-backward diffusivities and diffusion tensors. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 585–597. Springer, Heidelberg (2005)
16. Weickert, J., Romeny, B.M.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7, 398–410 (1998)
Fast Dejittering for Digital Video Frames

Mila Nikolova

CMLA, ENS Cachan, CNRS, PRES UniverSud, France
[email protected]
http://www.cmla.ens-cachan.fr/∼nikolova/
Abstract. We propose several very fast algorithms to restore jittered digital video frames (their rows are shifted) in one iteration. The restored row shifts minimize non-smooth and possibly non-convex local criteria applied on the second-order differences between consecutive rows. We introduce specific error measures to assess the quality of dejittering. Our algorithms are designed for gray-value, color and noisy images. Some of them can be considered as parameter-free. They outperform by far the existing algorithms both in quality and in speed. They are a crucial step towards real-time dejittering of digital video.
1 Intrinsic Dejittering
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 439–451, 2009. © Springer-Verlag Berlin Heidelberg 2009

Image jitter consists of a random horizontal shift of each row of a video frame. It occurs when the synchronization row pulses are corrupted, e.g., by noise or degradation of the storage medium, or in wireless transmission. The visual effect is disturbing since all shapes are jagged, cf. e.g. Fig. 4. Structured jitter can be provoked by acoustic or electrical interferences [7], cf. e.g. Fig. 8. Time base corrector machines recover the row synchronization pulses with some success, but this operation is often unsuccessful or impossible [6]. An alternative, restoring the video frames directly from the jittered data (called intrinsic dejittering [5]), is much more flexible and widely applicable.

State of the Art. Intrinsic dejittering was invented in [5]. The method is based on a 2D auto-regressive (AR) image model. The unknown AR coefficients and row starts are estimated iteratively, jointly by blocks; a drift compensation is applied afterwards [6]. In [7], the $\ell_1$ norm of the differences between 2 or 3 consecutive shifted rows is compared in the framework of dynamic programming. A fully Bayesian iterative method using a TV-based prior for joint dejittering and denoising is derived in [12]. The Bake and Shake method in [3] uses a good PDE image model (e.g. Perona-Malik) to recover the row positions. In [4], the same authors analyze the vertical slicing moments of images of bounded variation and derive a variational method (faster than [3] but less effective for difficult data).

Our Approach. We exhibit a pertinent model that makes it possible to discriminate natural images from their jittered versions. Each row is restored based on the previously restored rows using a simple non-smooth and possibly non-convex local criterion. We thus construct one-iteration effective and fast dejittering algorithms. Noisy
jittered images are restored in two stages: (a) dejittering of the raw data; (b) denoising of the obtained dejittered image.
2 The Main Points of Our Approach

Notations. For any positive integers $m$ and $n$, the rows of a matrix $h \in \mathbb{R}^{m\times n}$ are denoted by $h_i$, $1 \le i \le m$, and the components of a row $h_i$ by $h_i(j)$, $1 \le j \le n$. The components of any $n$-length vector $u$ are denoted by $u_i$, $1 \le i \le n$. Given an original image $f \in \mathbb{R}^{r\times c}$, a jittered image $g$ is produced according to:
\[
g_i(j) = \begin{cases} f_i(j + d_i) & \text{if } 1 \le j + d_i \le c,\\ 0 & \text{otherwise,} \end{cases}
\qquad 1 \le i \le r,\quad d_i \in \mathbb{Z},\quad 1 \le j \le c. \tag{1}
\]
In practice, the row shifts $d_i$ are bounded, $|d_i| \le M$, for $M \le 6$ or more [6]. The restored image and row shifts are denoted by $\hat f$ and $\hat d$, respectively.

2.1 Choice of a Local Criterion on Consecutive Rows
First of all, we need a good model for the columns of natural images.
Fig. 1. 50 × 50 zoom of Lena: (a) original; (b) one column; (c) jittered. Panel (b) shows the gray value of column 15 in (a) (left plot, original) and in (c) (right plot, jittered).
Remark 1. The gray value of the columns of natural images can be seen as pieces of 2nd or 3rd order polynomials, see Fig. 1(b) left or Fig. 3 in [9]. Such a claim is false for jittered images, see Fig. 1(b) right. This observation provides a sound basis to discriminate a natural image from its jittered versions.

Suppose that $\hat f_1, \ldots, \hat f_{i-1}$ are already dejittered. By Remark 1, we will estimate the next $\hat d_i$ using a criterion that compares $\hat f_{i-1}, \hat f_{i-2}, \ldots$ with all possible shifts of the $i$th data row, $g_i(j - d_i)$, $d_i \in \{-N, \ldots, N\}$ for $N \ge M$.

Fig. 2. Uniform jitter on $\{-M, \ldots, M\}$, $M = 6$. Restorations using (2)-(3) and (4). Panels: uniform jitter, $M = 6$; original (116 × 200); arg min $J$, $\alpha = 1$; arg min $J$, $\alpha = 0.5$.
Fast Dejittering for Digital Video Frames
Remark 2. Because of the jitter, each row of $g$ has no more than $N$ zero-valued pixels at both extremities, see e.g. Fig. 2. Involving them in our criterion can seriously distort its meaning. So for any row $i$, we will use only data samples $g_i(j)$ for $j \in \{N+1, \ldots, c-N\}$, which certainly belong to the original image.

Guided by Remarks 1 and 2, as well as by a series of preliminary experiments (see e.g. Fig. 3), our main focus is on
\[
\hat d_i = \arg\min \big\{ J(d_i) : d_i \in \{-N, \ldots, N\} \big\}, \qquad N \ge M, \tag{2}
\]
\[
J(d_i) = \sum_{j=N+1}^{c-N} \big| g_i(j - d_i) - 2\hat f_{i-1}(j) + \hat f_{i-2}(j) \big|^{\alpha}, \qquad \alpha \in \{0.5, 1\}. \tag{3}
\]
$\hat d_i$ is easily found by exhaustive search since it belongs to a small finite set. Then:
\[
\forall j \in \{1, \ldots, c\}, \qquad \hat f_i(j) = g_i(j - \hat d_i) \ \text{ if } 1 \le j - \hat d_i \le c \quad\text{and}\quad \hat f_i(j) = 0 \ \text{ otherwise}. \tag{4}
\]
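The exhaustive search in (2)-(3) and the row restoration in (4) can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `estimate_shift` and `restore_row` are hypothetical, rows are plain Python lists, and indices are 0-based, so the sum over $j \in \{N+1, \ldots, c-N\}$ becomes `range(N, c - N)`.

```python
def estimate_shift(g_i, f_prev1, f_prev2, N, alpha=1.0):
    """Exhaustive search for the shift of one row, following (2)-(3).

    g_i      : jittered data row i
    f_prev1  : restored row i-1
    f_prev2  : restored row i-2
    Returns the d in {-N, ..., N} minimizing J(d).
    """
    c = len(g_i)

    def J(d):
        # j - d stays inside [0, c-1] because N <= j <= c-N-1 and |d| <= N
        return sum(abs(g_i[j - d] - 2 * f_prev1[j] + f_prev2[j]) ** alpha
                   for j in range(N, c - N))

    return min(range(-N, N + 1), key=J)


def restore_row(g_i, d_hat):
    """Eq. (4): shift the data row back; pixels falling outside the frame are 0."""
    c = len(g_i)
    return [g_i[j - d_hat] if 0 <= j - d_hat < c else 0 for j in range(c)]
```

With $2N+1$ candidate shifts and about $c$ samples per candidate, one row costs on the order of $(2N+1)\,c$ operations, which is why a single pass over the rows is fast.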
Criterion $J$ for $\alpha \in (0, 1]$ is minimized by a $\hat d_i$ such that for a maximum number of components $j$ we have $\hat f_i(j) \approx 2\hat f_{i-1}(j) - \hat f_{i-2}(j)$, i.e. $\hat f_i(j)$, $\hat f_{i-1}(j)$ and $\hat f_{i-2}(j)$ form a nearly linear segment, while breakpoints are preserved; for a mathematical flavor, see [10, 11, 13]. Then the gray value of each column of $\hat f$ varies nearly piecewise linearly. More details are given in [9].

Remark 3. Dejittering a single frame yields a translated estimate $\hat p$ of the row shifts, say $\hat p = \hat d + C$. Given the original $d$, the integer $C$ is such that
\[
C = \arg\max_{n \in \mathbb{Z}} \ \# \big\{ i \in \{1, \ldots, r\} : \hat p_i - n = d_i \big\}, \tag{5}
\]
where $\#$ means cardinality.

Alternative criteria (see Fig. 3). Minimizing $J_1(d_i) = \sum_{j=N+1}^{c-N} \big| g_i(j - d_i) - \hat f_{i-1}(j) \big|^{\alpha}$ yields (c)-(d). Criteria $J_1$ work poorly: they tend to recover constant gray-value vertical pieces. Solving (2)-(3) yields the original image in (e)-(f). Criteria $J_3(d_i) = \sum_{j=N+1}^{c-N} \big| g_i(j - d_i) - 3\hat f_{i-1}(j) + 3\hat f_{i-2}(j) - \hat f_{i-3}(j) \big|^{\alpha}$ cannot discriminate well enough a natural image from its slightly shifted versions, see (g)-(h).

2.2 Error Measures for Dejittering
Recall that $\hat f$ is translated with respect to (w.r.t.) $f$ and that the extremities of its rows are null. In order to apply standard error measures, we shrink $\hat f$ to $\hat f^s$,
\[
\hat f^s_i(j) = \hat f_i(j + N), \qquad 1 \le j \le c - 2N, \quad \forall i \in \{1, \ldots, r\},
\]
so that $\hat f^s$ contains only proper image information. Then we select an $r \times (c - 2N)$ inner submatrix $f^s$ of the original $f$ that matches $\hat f^s$ the best. Note that any error measure on $\hat f^s - f^s$ is sensitive to the choice of $f^s$. We select $f^s$ using the $\ell_1$ norm:
\[
\| f^s - \hat f^s \|_1 = \min_{0 \le k \le 2N} \sum_{i=1}^{r} \sum_{j=1}^{c-2N} \big| f_i(j + k) - \hat f^s_i(j) \big| .
\]

Fig. 3. (b) Independent uniform jitter. Next: restorations for $N = M + 1$. Panels: (a) original; (b) jittered; (c) $J_1$, $\alpha = 1$; (d) $J_1$, $\alpha = 0.5$; (e) $J$, $\alpha = 1$; (f) $J$, $\alpha = 0.5$; (g) $J_3$, $\alpha = 1$; (h) $J_3$, $\alpha = 0.5$.

Then we consider the mean absolute error $\mathrm{mae}(\hat f, f) = \| f^s - \hat f^s \|_1 / \big( r(c - 2N) \big)$ and the peak signal to noise ratio, $\mathrm{psnr}(\hat f, f) = 10 \log_{10} \big( \delta^2\, r(c - 2N) / \| f^s - \hat f^s \|_2^2 \big)$, where $\|\cdot\|_2$ is the $\ell_2$-norm and $\delta$ is the dynamic range of $(\hat f^s, f^s)$.

The quality of dejittering can also be evaluated using $d - \hat d$. The error measure $e_1(\hat d, d) \stackrel{\mathrm{def}}{=} (1/r)\, \| d - \hat d \|_1$ gives the average displacement of the pixels along any column. The following two measures are quite interesting:
\[
e_\infty(\hat d, d) \stackrel{\mathrm{def}}{=} (100/c)\, \| d - \hat d \|_\infty \ \% ; \tag{6}
\]
\[
e_0^\Delta(\hat d, d) \stackrel{\mathrm{def}}{=} \frac{100}{r-1}\ \# \big\{ (\hat d_i - d_i) - (\hat d_{i+1} - d_{i+1}) \ne 0,\ 1 \le i \le r-1 \big\} \ \% . \tag{7}
\]
$e_\infty$ measures the maximum horizontal error w.r.t. the width $c$ of the image, while $e_0^\Delta$ measures the number of changes in $\hat d - d$ w.r.t. the height $r$ of the image.

Remark 4. When both $e_\infty$ and $e_0^\Delta$ are small (e.g. $e_\infty \le 0.4\%$ and $e_0^\Delta \le 0.8\%$), we are guaranteed that dejittering is nearly perfect, independently of any other error measure (see Figs. 6, 7, 10 and 12). Indeed, for a 512 × 512 image, the proposed error bounds mean that no more than 4 rows have an erroneous horizontal shift, and that this shift is no more than 2 pixels. For a natural image, such an error is invisible to the naked eye. However, if one of these values is larger, no conclusion can be drawn, cf. Fig. 9 and the relevant comments.
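The shift-based measures can be computed directly from the estimated and true row shifts. The sketch below is illustrative only; the helper names `recover_translation` and `shift_errors` are hypothetical. It first finds the translation constant of (5) as the most frequent value of $\hat p_i - d_i$, then evaluates $e_1$, $e_\infty$ of (6) and $e_0^\Delta$ of (7).

```python
from collections import Counter


def recover_translation(p_hat, d):
    """Eq. (5): the integer C for which p_hat[i] - C == d[i] holds for the most rows."""
    return Counter(p - di for p, di in zip(p_hat, d)).most_common(1)[0][0]


def shift_errors(d_hat, d, c):
    """Return (e_1, e_inf in percent, e_0^Delta in percent) for an image of width c."""
    r = len(d)
    err = [dh - di for dh, di in zip(d_hat, d)]
    e1 = sum(abs(e) for e in err) / r                      # mean displacement
    e_inf = 100.0 / c * max(abs(e) for e in err)           # eq. (6)
    changes = sum(1 for i in range(r - 1) if err[i] != err[i + 1])
    e0_delta = 100.0 / (r - 1) * changes                   # eq. (7)
    return e1, e_inf, e0_delta
```

For example, with `d = [1, -2, 0, 3, 1]` and `p_hat = [5, 2, 4, 8, 5]` the recovered constant is 4, and the single misplaced row produces two changes in $\hat d - d$, one when the error appears and one when it disappears.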
3 Algorithms for Gray-Value Natural Images

We construct an $r \times (c + 2N)$-size matrix $f^*$ for $N > M$. The middle of its first row $f^*_1$ is $g_1$, so $\hat p_1 = N + 1$. Then we restore the relative row shifts $\hat p_i \in \{1, \ldots, 2N+1\}$, $\forall i \in \{2, \ldots, r\}$, based on (2)-(3) and (4). Then $\hat f$ is an inner sub-matrix of $f^*$.
Notations. $[a \,\vdots\, b \,\vdots\, c]$ means that $a$, $b$ and $c$ are concatenated horizontally; $a \leftarrow b$ means that we replace $a$ by $b$. $\forall n \in \mathbb{N}$, $\theta(n)$ is the $n$-length 0-valued row:
\[
\theta(n) \stackrel{\mathrm{def}}{=} [\,0 \,\vdots\, \cdots \,\vdots\, 0\,], \qquad \#\theta(n) = n. \tag{8}
\]

Algorithm 1 (Gray value images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose $\alpha = 1$ or $\alpha = 0.5$.
1. Define $f^* \in \mathbb{R}^{r \times (c+2N)}$ and set $f^*_1 = [\theta(N) \,\vdots\, g_1 \,\vdots\, \theta(N)]$.
2. Split $g = [g^L \,\vdots\, \gamma \,\vdots\, g^R]$ where $g^L \in \mathbb{R}^{r \times N}$, $\gamma \in \mathbb{R}^{r \times (c-2N)}$ and $g^R \in \mathbb{R}^{r \times N}$.
3. Put $\hat p_0 = \hat p_1 = N + 1$ and $u = v = [\theta(N) \,\vdots\, \gamma_1 \,\vdots\, \theta(N)]$.
4. For any $i = 2, \ldots, r$, do:
   (a) $\forall k = 1, \ldots, 2N+1$, do
       (i) Put $h^k = [\theta(k-1) \,\vdots\, \gamma_i \,\vdots\, \theta(2N-k+1)]$;
       (ii) Find $m = \max\{k, \hat p_{i-1}, \hat p_{i-2}\}$ and $n = \min\{k, \hat p_{i-1}, \hat p_{i-2}\} + c - 1$;
       (iii) $J(k) = \dfrac{1}{n-m+1} \displaystyle\sum_{j=m}^{n} \big| h^k_j - 2u_j + v_j \big|^{\alpha}$;
   (b) Find $\hat p_i = \arg\min \{ J(k) : 1 \le k \le 2N+1 \}$;
   (c) Replace $v \leftarrow u$ and $u \leftarrow h^{\hat p_i} = [\theta(\hat p_i - 1) \,\vdots\, \gamma_i \,\vdots\, \theta(2N + 1 - \hat p_i)]$;
   (d) Set $f^*_i = [\theta(\hat p_i - 1) \,\vdots\, g_i \,\vdots\, \theta(2N - \hat p_i + 1)]$.
5. Extract $\hat f \in \mathbb{R}^{r \times c}$ from $f^* \in \mathbb{R}^{r \times (c+2N)}$: cancel the $2N$ columns at the left and right ends of $f^*$ that have the largest number of zeros.

Explanations. $u$, $v$ and $h^k$ are $c$-length rows such that at step $i$, $u$ and $v$ correspond to the restored rows $i-1$ and $i-2$, respectively, while $h^k$ in 4a(i) realizes all possible shifts for row $i$. In 4a(ii), $m$ and $n$ help to satisfy Remark 2. In 4b, $\hat p_i$ is the estimate of the relative shift of row $i$.

Computation time. We used Matlab 7.2 on a PC with a Pentium 4 CPU at 2.8 GHz and 1 GB RAM, under Windows XP Professional Service Pack 2. For a 512 × 512 image and $N = 7$ we got the solution in 0.62 s for $\alpha = 1$ and in 1 s for $\alpha = 0.5$.

Translation Recovery. In order to compute the errors defined in § 2.2, we need the translation constant $C$ given in (5). Note that $1 - N \le C \le 3N + 1$.

Algorithm (Translation Recovery)
1. Define $I = \{-N + 1, \ldots, 3N + 1\}$.
2. Compute the histogram $H(n) = \# \{ j \in \{1, \ldots, r\} : \hat p(j) - d(j) = n \}$, $\forall n \in I$.
3. Obtain $C = \arg\max_{n \in I} H(n)$. Then $\hat d_i = \hat p_i - C$, $1 \le i \le r$.
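To make the row-by-row procedure concrete, here is a minimal pure-Python sketch in the spirit of Algorithm 1. It is an illustration under stated assumptions, not the author's Matlab code: the names `jitter` and `dejitter` are hypothetical, offsets are 0-based (offset $k$ here corresponds to $k+1$ in the text), the overlap bounds `m` and `n` are computed from the actual length $c - 2N$ of the trusted middle part, and step 5 is interpreted as keeping the window of $c$ consecutive columns that contains the fewest zeros.

```python
def jitter(f, d):
    """Model (1): g_i(j) = f_i(j + d_i) inside the frame, 0 otherwise (0-based)."""
    c = len(f[0])
    return [[row[j + di] if 0 <= j + di < c else 0 for j in range(c)]
            for row, di in zip(f, d)]


def dejitter(g, N, alpha=1.0):
    """One pass over the rows; returns (restored image, offsets p_hat)."""
    r, c = len(g), len(g[0])
    inner = c - 2 * N                     # length of the trusted middle part gamma_i
    assert inner > 2 * N, "rows must be wider than 4N so that the overlap is non-empty"

    def pad(row, k, width):               # [theta(k) | row | theta(width - k - len(row))]
        return [0] * k + list(row) + [0] * (width - k - len(row))

    gamma = [row[N:c - N] for row in g]   # step 2: drop N pixels at both ends
    p_hat = [N]                           # step 3: first row sits in the middle
    u = v = pad(gamma[0], N, c)
    pu = pv = N                           # offsets of rows i-1 and i-2
    f_star = [pad(g[0], N, c + 2 * N)]    # step 1

    for i in range(1, r):                 # step 4
        best_k, best_J = None, float("inf")
        for k in range(2 * N + 1):        # all candidate offsets
            h = pad(gamma[i], k, c)
            m = max(k, pu, pv)            # overlap of the three trusted parts
            n = min(k, pu, pv) + inner - 1
            J = sum(abs(h[j] - 2 * u[j] + v[j]) ** alpha
                    for j in range(m, n + 1)) / (n - m + 1)
            if J < best_J:
                best_k, best_J = k, J
        p_hat.append(best_k)
        v, pv = u, pu
        u, pu = pad(gamma[i], best_k, c), best_k
        f_star.append(pad(g[i], best_k, c + 2 * N))

    def zeros(s):                         # step 5: zeros inside the window starting at s
        return sum(row[j] == 0 for row in f_star for j in range(s, s + c))

    s = min(range(2 * N + 1), key=zeros)
    return [row[s:s + c] for row in f_star], p_hat
```

A quick way to try it is to jitter a synthetic image with known shifts, e.g. `g = jitter(f, d)` followed by `f_hat, p_hat = dejitter(g, N=3, alpha=0.5)`, and to compare `p_hat` with the true shifts up to the translation constant of Remark 3.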
Compound models. If the gray values of the columns of an image are nearly constant on large pieces, we should involve in $J$ a term with 1st-order differences.

Algorithm 1(a). In Algorithm 1, step 4a(iii), use the $J$ below, where $\beta$ is a weight for the 1st-order differences:
\[
J(k) = \frac{1}{n-m+1} \sum_{j=m}^{n} \Big( \big| h^k_j - 2u_j + v_j \big| + \beta \big| h^k_j - u_j \big| \Big)^{\alpha}, \qquad \beta \ge 0.
\]

Illustrations. In all experiments, Algorithm 1 is applied with $N = M + 1$. The jitter in Fig. 4 is significant. We kept this first trial since our method found the original for $\alpha \in \{0.5, 1\}$. In Fig. 5 (Peppers), the dejittered image is hard to distinguish from the original. However, the error image $f^s - \hat f^s$ shows a slight displacement of several pixels. The dejittered image in Fig. 6 is nearly perfect since $e_0^\Delta = 0.6\%$ and $e_\infty = 0.39\%$. We observe that $\hat d - d$ has a 1-pixel error at rows 83, 84 and 401. The first two are within the zooms in the same figure. The restored Boat in Fig. 7 is quasi-perfect since $e_0^\Delta = 0.25\%$ and $e_\infty = 0.39\%$. The original Boat can be seen in Fig. 8, where the restorations are exact (all errors are null). For the results concerning [12] and [3], cf. section 6, p. 450.
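The compound criterion of Algorithm 1(a) differs from step 4a(iii) of Algorithm 1 only by the extra first-order term. A minimal sketch, with the hypothetical name `compound_cost` and 0-based inclusive bounds `m` and `n`:

```python
def compound_cost(h, u, v, m, n, alpha=0.5, beta=2.0):
    """Mean over j = m..n of (|h_j - 2 u_j + v_j| + beta * |h_j - u_j|) ** alpha."""
    total = sum((abs(h[j] - 2 * u[j] + v[j]) + beta * abs(h[j] - u[j])) ** alpha
                for j in range(m, n + 1))
    return total / (n - m + 1)
```

Setting `beta=0` recovers the criterion of Algorithm 1, so the same search loop can serve both variants.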
Fig. 4. Algorithm 1 for $\alpha = 1$ and $\alpha = 0.5$ yields the original image. Panels: uniform jitter, $M = 6$; Bayesian TV [12] (mae = 11.7, psnr = 22); Bake & Shake [3] (mae = 7.4, psnr = 23); Algorithm 1 ≡ original (mae = 0, psnr = ∞).

Fig. 5. Algorithm 1 with $\alpha = 0.5$ yields mae = 1.35, psnr = 31.51 and $e_1 = 0.4$. Panels: uniform jitter, $M = 10$; original (512 × 512); Algorithm 1, $\alpha = 0.5$; error $f^s - \hat f^s$.
Large-Scale Experiment. We tested all proposed algorithms using 1000 independent experiments where 4 images were degraded with 2 different types of random jitter and restorations were done for $\alpha = 1$ and $\alpha = 0.5$. The main conclusion is that $\alpha = 0.5$ is better for images with texture or curvatures (Lena, Barbara, Peppers), while $\alpha = 1$ is better for images with many straight lines (Boat). In all cases $\alpha = 1$ yields good results, but $\alpha = 0.5$ usually works better. The details are reported in [9]. Globally, the obtained mean results are very encouraging.

Fig. 6. (512 × 512). Algorithm 1: mae = 4.16, psnr = 25.53, $e_0^\Delta = 0.6\%$ and $e_\infty = 0.39\%$. Panels: uniform jitter, $M = 6$; Algorithm 1, $\alpha = 0.5$; zoom of the dejittered image; zoom of the original.

Fig. 7. Boat (400 × 512). Algorithm 1 is nearly perfect: $e_0^\Delta = 0.25\%$ and $e_\infty = 0.39\%$. Panels: uniform jitter, $M = 10$; Bayesian TV [12] (mae = 13.4, psnr = 20.8); Bake & Shake [3] (mae = 12.5, psnr = 20.3); Algorithm 1, $\alpha \in \{0.5, 1\}$ (mae = 0.6, psnr = 42.9).

Fig. 8. Boat (400 × 512). Panels: jitter $d = \lfloor 6 \sin(n/20) \rceil$; Algorithm 1 ≡ original; jitter $d = \lfloor 6 \sin(n/4) \rceil$; Algorithm 1 ≡ original. Here $\lfloor \cdot \rceil$ denotes approximation to the nearest integer.
4 Algorithms for Color Natural Images
We extend Algorithm 1 to RGB color images where all channels incur the same jitter. RGB images are represented by vector-valued matrices $f$ where each pixel $f_i(j)$ has 3 components, $f_i(j; \kappa)$, $1 \le \kappa \le 3$. The jittering model now reads:
\[
g_i(j; \kappa) = \begin{cases} f_i(j + d_i; \kappa) & \text{if } 1 \le j + d_i \le c,\\ 0 & \text{otherwise,} \end{cases}
\qquad 1 \le i \le r,\quad |d_i| \le M,\quad 1 \le \kappa \le 3,\quad 1 \le j \le c.
\]
The main algorithm is, yet again, based on (2)-(3) and (4). Since the jitter is the same for all color channels, we obtain from $g$ a gray-value image $\gamma$ and estimate the relative row shifts $\hat p_i$ using $\gamma$ as in Algorithm 1. The dejittered color image $\hat f$ is obtained by inserting $\hat p$ into $g$. Similarly to (8), for any positive integer $n$ we denote by $\theta(n \times 3)$ the $n$-length vector-valued row whose components are $(0, 0, 0)$ for all $i = 1, \ldots, n$.

Algorithm 2 (Color images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose $\alpha = 1$ or $\alpha = 0.5$.
1. Define $f^* \in \mathbb{R}^{r \times (c+2N) \times 3}$ and set $f^*_1 = [\theta(N \times 3) \,\vdots\, g_1 \,\vdots\, \theta(N \times 3)]$.
2. Split $g = [g^L \,\vdots\, \bar g \,\vdots\, g^R]$, where $g^L \in \mathbb{R}^{r \times N}$, $\bar g \in \mathbb{R}^{r \times (c-2N)}$ and $g^R \in \mathbb{R}^{r \times N}$.
3. Calculate $\gamma_1(j) = |\bar g_1(j; 1)| + |\bar g_1(j; 2)| + |\bar g_1(j; 3)|$ for $1 \le j \le c - 2N$.
4. Put $\hat p_0 = \hat p_1 = N + 1$ and $u = v = [\theta(N), \gamma_1, \theta(N)]$.
5. For any $i = 2, \ldots, r$ do:
   (a) $\forall k = 1, \ldots, 2N + 1$ do:
       i. $\gamma_i(j) = \bar g_i(j; 1) + \bar g_i(j; 2) + \bar g_i(j; 3)$;
       ii. do step 4a as in Algorithm 1;
   (b) Do steps 4b and 4c as in Algorithm 1;
   (c) Set $f^*_i = [\theta((\hat p_i - 1) \times 3) \,\vdots\, g_i \,\vdots\, \theta((2N - \hat p_i + 1) \times 3)]$.
6. Find $\hat f \in \mathbb{R}^{r \times c}$ as in step 5, Algorithm 1.

Computation time. Under the conditions described in the Computation time paragraph of Section 3, for a 512 × 512 RGB image and $N = 7$ we got the solution in 1 s for $\alpha = 1$ and in 1.4 s for $\alpha = 0.5$.

Algorithm 2(a) (Compound models). In step 5a, Algorithm 2, replace $J$ as done in Algorithm 1(a).

Illustrations. In all examples, Algorithms 2 and 2(a) are used with $N = M + 1$. In Fig. 9, the main part of the error in $\hat d$ corresponds to the sky and to the ground, which are quite homogeneous, so the error is invisible to the naked eye. Part of it reaches the boat, so we display a zoom of the latter. Fig. 10 shows a zoom of a 707 × 579 image. The dejittering of the full image is nearly perfect since $e_\infty = 0.17\%$ and $e_0^\Delta = 0.28\%$. The jitter in Fig. 11 is a centered Gaussian with standard deviation $\sigma = 6$, truncated and quantized on $\{-12, \ldots, 12\}$. Algorithm 2(a) for $\alpha = 0.5$ and $\beta \in \{2, 3\}$ gives better visual results than Algorithm 2. Fig. 12 shows a nearly perfect restoration since $e_\infty = 0.18\%$ and $e_0^\Delta = 0.37\%$.

Fig. 9. Dejittering yields mae = 1.45, psnr = 33.82, $e_1 = 0.76$ and $e_\infty = 3.76\%$. Panels: uniform jitter, $M = 8$, Man (478 × 532); Algorithm 2, $\alpha = 1$; zooms (original, restored).

Fig. 10. The restoration of the whole image is quasi-perfect: $e_\infty = 0.17\%$ and $e_0^\Delta = 0.28\%$. Panels (zooms of a 707 × 579 image): jitter $\mathcal{N}(0, 5^2)$ truncated on $\{-15, \ldots, 15\}$; Algorithm 2, $\alpha = 0.5$; original.

Fig. 11. Zooms: (a) jittered, (b) original, (c) dejittered. Gaussian jitter, $M = 12$; Algorithm 2(a).

Fig. 12. The result is quasi-perfect: mae = 0.14, psnr = 45.15, $e_0^\Delta = 0.37\%$ and $e_\infty = 0.18\%$. Panels: uniform jitter, $M = 8$; original (542 × 410); Algorithm 2, $\alpha = 0.5$.
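The two color-specific ingredients of Algorithm 2 are the gray-value surrogate used to estimate the shifts and the insertion of the estimated offset into the RGB row. A minimal sketch, assuming each pixel is an `(R, G, B)` tuple and using the hypothetical names `gray_surrogate` and `place_color_row` (the offset `p` is 0-based):

```python
def gray_surrogate(row_rgb):
    """Step 3 of Algorithm 2: sum of the absolute channel values of each pixel."""
    return [abs(r) + abs(g) + abs(b) for (r, g, b) in row_rgb]


def place_color_row(row_rgb, p, N):
    """Step 5(c): pad a color row with p black pixels on the left and 2N - p on the right."""
    black = (0, 0, 0)
    return [black] * p + list(row_rgb) + [black] * (2 * N - p)
```

Because the shift is estimated once on the surrogate and then applied to all three channels, the channels cannot drift apart, and the cost stays close to that of the gray-value case.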
5 Restoration of Noisy Jittered Images

Our approach is to first dejitter the raw data using the ideas of Algorithms 1-2 and then to denoise the dejittered image. In the second stage, we use fast shrinkage estimators, see e.g. [8]. Better methods would improve the final result.

5.1 Moderate Noise

For noise with an SNR of 15-20 dB or more, Algorithms 1 and 2 perform well.

Experiment. The image in Fig. 13(a) is corrupted with white zero-mean normal noise, 15 dB SNR, and independent uniform jitter on $\{-6, \ldots, 6\}$. Taking into account that the columns of the image are nearly constant on large segments, dejittering in (b) is done using Algorithm 1(a) for $\beta = 3$. Denoising of (b) is done in (c) by hard thresholding the 2D Daubechies wavelet transform with 4 vanishing moments for $T = 30$. The restoration is fast and the result is clean, compared to Fig. 5.

Fig. 13. Peppers (512 × 512): (a) 15 dB SNR + jitter; (b) dejittered, Algorithm 1; (c) denoised. For the restored image in (c), psnr = 29.34.

5.2 Strong Noise
When the noise is strong, we propose a slightly different scheme having a comparable computational cost. The idea is to partially denoise each row of the image using hard thresholding and to replace the function $|\cdot|^{\alpha}$ in step 4a(iii) of Algorithm 1 by a better adapted edge-preserving function $\psi$. Let $W : \mathbb{R}^{1 \times n} \to \mathbb{R}^{1 \times n}$ denote a 1D wavelet transform and $W^*$ its inverse. Given a threshold $T > 0$, let us introduce the hard thresholding operator $H_T : \mathbb{R}^{1 \times n} \to \mathbb{R}^{1 \times n}$ by
\[
H_T(w)(j) = \begin{cases} 0 & \text{if } |w(j)| \le T,\\ w(j) & \text{otherwise,} \end{cases}
\qquad 1 \le j \le n, \quad \forall w \in \mathbb{R}^{1 \times n}. \tag{9}
\]
Knowing that the asymptotically optimal $T$, cf. [2], oversmooths rows, we use an under-optimal $T$. In order to simplify the presentation, we give the algorithm for gray-value images. The extension to color images is straightforward, cf. [9].

Algorithm 3 (Quite noisy images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose a 1D wavelet transform $W$ (e.g. Daubechies).
– Fix an under-optimal threshold $T$.
– Choose $\psi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$, e.g. $\psi(s, t) = (|s| + \beta |t|)^{\alpha}$, and fix $\alpha > 0$ and $\beta \ge 0$.
1. Define $f^* \in \mathbb{R}^{r \times (c+2N)}$ and set $f^*_1 = [\theta(N) \,\vdots\, g_1 \,\vdots\, \theta(N)]$.
2. Split $g = [g^L \,\vdots\, \bar g \,\vdots\, g^R]$ where $g^L \in \mathbb{R}^{r \times N}$, $\bar g \in \mathbb{R}^{r \times (c-2N)}$ and $g^R \in \mathbb{R}^{r \times N}$.
3. Compute $\gamma_1 = W^* H_T(W \bar g_1)$.
4. Do steps 3 to 5 of Algorithm 1 with the following changes:
   (a) in step 4a(i), insert $\gamma_i = W^* H_T(W \bar g_i)$;
   (b) in step 4a(iii), use $J(k) = \dfrac{1}{n-m+1} \displaystyle\sum_{j=m}^{n} \psi\big( |h^k_j - 2u_j + v_j| + \beta |h^k_j - u_j| \big)$.
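The row under-denoising $\gamma_i = W^* H_T(W \bar g_i)$ can be sketched as follows. The paper suggests a Daubechies transform; to keep the example self-contained, this illustration substitutes the orthonormal Haar transform (the simplest member of that family) and assumes the row length is a power of two. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar(x):
    """Orthonormal Haar transform; output is [approximation, coarsest details, ..., finest details]."""
    x, details = list(x), []
    while len(x) > 1:
        half = len(x) // 2
        approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
        details = detail + details        # finer levels end up further right
        x = approx
    return x + details


def inverse_haar(w):
    """Inverse of haar(): rebuild the signal from coarse to fine."""
    x, pos = [w[0]], 1
    while pos < len(w):
        detail = w[pos:pos + len(x)]
        pos += len(detail)
        x = [val for a, d in zip(x, detail)
             for val in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


def hard_threshold(w, T):
    """Eq. (9): zero every coefficient with |w(j)| <= T, keep the others unchanged."""
    return [0.0 if abs(wj) <= T else wj for wj in w]


def denoise_row(row, T):
    """gamma = W* H_T(W row) with W the Haar transform."""
    return inverse_haar(hard_threshold(haar(row), T))
```

Hard thresholding leaves the large coefficients untouched, so edges along the row keep their amplitude, which is the property the comments on Algorithm 3 rely on.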
Fig. 14. Boat (512 × 512). Restoration of (a) using different methods. Panels: (a) 10 dB SNR + jitter; (b) Bayesian TV [12] (mae = 19.36, psnr = 20.24); (c) Bake & Shake [3] (mae = 20.62, psnr = 19.37); (d) Algorithm 3, dejittering; (e) our 2-stage method (mae = 7, psnr = 28.31); original.
Comments. Hard-thresholding in steps 3 and 4a is better than other shrinkages since it keeps unchanged the important coefficients. The 1D row under-denoising (step 4a) helps to approach the model of Remark 1. Denoising of a dejittered image can be done by various methods. Experiment. Boat in Fig. 14 is corrupted with 10 db snr white zero-mean normal noise and independent jitter, uniform on {−8, .., 8}. The restoration using Bake and Shake [3] is visually better than Bayesian TV [12]. For these results, cf. section 6, p. 450. We used Algorithm 3(a) for β = 0 and ψ(t) = |t|α for α = 0.5 in step 4b. In steps 3 and 4a we use hard-thresholding of the Daubechies wavelet coefficients with 2 vanishing moments for T = 30. The dejittered image in (d) is denoised in (e) by hard thresholding of its curvelet transform using the enhanced-denoising program in the CurveLab 2.1.2 toolbox relevant to [1].
6 Conclusions

The obtained results have a remarkable quality while the algorithms are nearly real-time. More details and examples are presented in [9]. The crux of our approach is (a) to minimize a nonsmooth and possibly nonconvex local criterion on the magnitude of the second-order differences between consecutive rows; (b) to exclude from $J$ all pixels due to the jitter. In the presence of strong noise, a critical step is to (under)-denoise the rows successively so that the prior mentioned in Remark 1 remains relevant, and to adapt the criterion $J$ if necessary. The natural evolution of this work is to involve it in the restoration of video sequences and to take advantage of the correlation between consecutive frames.
Acknowledgements

This work has been supported by grant Freedom, anr07-jcjc-0048-01. The author thanks Louis Laborelli (Institut National de l'Audiovisuel, France) for his discussion of practical questions relevant to jittering. The author is thankful to Dr. Sung-Ha Kang, Georgia Institute of Technology, Atlanta, who carried out all experiments with the methods [3] and [12], as well as to Dr. Jackie Shen (Barclays Capital, Wall Street), who provided his Matlab codes for [12].
References

1. Candès, E.J., Demanet, L., Donoho, D.L., Ying, L.: Fast discrete curvelet transforms. SIAM J. on Multiscale Modeling and Simulation 5(3), 861–899 (2006)
2. Donoho, D.L., Johnstone, I.M.: Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika 81(3), 425–455 (1994)
3. Kang, S.-H., Shen, J.: Video dejittering by bake and shake. Image and Vision Computing 24(2), 143–152 (2006)
4. Kang, S.-H., Shen, J.: Image Dejittering Based on Slicing Moments. Springer Series on Mathematics and Visualization, pp. 35–55 (2007)
5. Kokaram, A., Roosmalen, P.M.B., Rayner, P., Biemond, J.: Line registration of jittered video. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2553–2556 (1997)
6. Kokaram, A.: Motion picture restoration. Springer, Heidelberg (1998)
7. Laborelli, L.: Removal of video line jitter using a dynamic programming approach. In: Proc. of the IEEE ICASSP, pp. 331–334 (2003)
8. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, London (1999)
9. Nikolova, M.: One-iteration dejittering of digital video images. Report CMLA n.2008-20, http://www.cmla.ens-cachan.fr/fileadmin/Membres/nikolova/RT-DJ.pdf
10. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM J. on Appl. Mathematics 61(2), 633–658 (2000)
11. Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. SIAM J. on Multiscale Modeling and Simulation 4(3), 960–991 (2005)
12. Shen, J.: Bayesian video dejittering by bv image model. SIAM J. on Appl. Mathematics 64(5), 1691–1708 (2004)
13. Welk, M., Weickert, J., Becker, F., Schnörr, C., Feddern, C., Burgeth, B.: Median and related local filters for tensor-valued images. Signal Processing (special issue Tensor Signal Processing) 7, 291–308 (2007)
Sparsity Regularization for Radon Measures

Otmar Scherzer1,2 and Birgit Walch1

1 Department of Mathematics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria
[email protected], [email protected]
http://infmath.uibk.ac.at
2 Radon Institute of Computational and Applied Mathematics, Altenberger Str. 69, A-4040 Linz, Austria
Abstract. In this paper we establish a regularization method for Radon measures. Motivated by sparse $L^1$ regularization, we introduce a new regularization functional for the Radon norm, whose properties are then analyzed. Furthermore, we show well-posedness of Radon measure based sparsity regularization. Finally we present numerical examples along with the underlying algorithmic and implementation details. We shall see that the number of iterations turns out to be of utmost importance for obtaining reliable reconstructions of sparse data with varying intensities.
1 Introduction

In this paper we consider the solution of the abstract equation
\[
F u' = v \quad \text{subject to } u' \in \operatorname{dom} F. \tag{1}
\]
The operator $F$ is linear and bounded between Hilbert spaces $W'$ and $V$. We assume that $\operatorname{dom} F$ is a subset of the Radon measures on a bounded domain $\Omega \subseteq \mathbb{R}^n$. We consider solving the operator equation (1) approximately by a variational regularization method, which consists in minimizing the functional
\[
\hat T_{\alpha, v^\delta}(u') := \| F u' - v^\delta \|_V^2 + \alpha \| u' \|_{RM} \tag{2}
\]
on $\operatorname{dom} F \subseteq W'$. Here $\| u' \|_{RM}$ is the norm of the Radon measure $u'$. In order to see the relation to sparsity we note that if $u'$ is absolutely continuous with density $U$, i.e., $U\,dx = du'$, then we have that
\[
\| u' \|_{RM} = \sup \Big\{ \int_\Omega U v \, dx : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \Big\} = \| U \|_{L^1}.
\]
The regularization method with $\hat T_{\alpha, v^\delta}$, where the Radon measure is replaced by the $L^1$-norm, has been analyzed in [13]. There, however, different assumptions have been made that guarantee existence of a minimizer in $L^1(\Omega)$, while in this work we consider minimizers which are Radon measures.

Birgit Walch is a recipient of a DOC fFORTE-fellowship of the Austrian Academy of Sciences at the Department of Mathematics of the University of Innsbruck.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 452–463, 2009. © Springer-Verlag Berlin Heidelberg 2009

The notion of sparsity appears in a variety of settings. In the context of regularization it is mostly used in connection with regularization terms
\[
R_S(u') := \sum_i \omega_i\, | \langle u', \phi_i \rangle |,
\]
where $(\phi_i)$ is a set of appropriate functions, typically forming a basis or frame. The inner product is on a Hilbert space and the $\omega_i$ are positive coefficients. We refer to a few papers which are related to this topic [7, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14]. Some researchers even call total variation minimization sparsity regularization. We study the reconstruction of sparse functions and measures. In contrast to total variation regularization we focus on reconstructing sparse measures and not gradient measures.

There is a fundamental difference between regularization terms $R_S$ and $L^1$, respectively Radon measure, regularization. To see this, take $(\phi_i)$ an orthonormal basis and $\omega_i = 1$ in the definition of $R_S(u')$ and note that standard convex analysis in the Hilbert space $l^2$ is applicable. Note that $l^1 \subseteq l^2$ and therefore we can consider minimization of $u' \mapsto \| F u' - v^\delta \|^2 + \alpha R_S(u')$ over $l^2 \equiv L^2(\Omega)$. That is, there is a proper extension of the functional from $l^1$ to $l^2$ if the operator $F$ can be extended to $l^2$. However, convex analysis in the Hilbert space $L^2$ is not applicable for $\|\cdot\|_{L^1}$ regularization, since on domains with finite measure, $L^2(\Omega) \subset L^1(\Omega)$, and minimization of $u' \mapsto \| F u' - v^\delta \|^2 + \alpha \| u' \|_{L^1}$ over $L^2(\Omega)$ is a real restriction of the proper domain of the regularization functional, which is $L^1(\Omega)$. The curiosity is that after discretization of the latter with piecewise constant functions, a truncated expansion of the former is revealed.

The outline of this paper is as follows: In Section 2 we give a review of the analysis of regularization methods. In Section 3 we review some basic facts on Radon measures and duals of Sobolev spaces. Having specified the ingredients, we apply the general results of the review sections to $\hat T_{\alpha, v^\delta}$ in Section 4 and show well-posedness and regularizing properties. Section 5 shows the analogy in the analysis to total variation minimization. Section 6 presents an example for sparse recovery and shows some reconstructions.
2 Review on Convergence Properties of Variational Regularization Methods

In this section we make the following general assumptions, where we stick to the notation of [13]. Afterwards, we apply the results to the setting already used in the introduction.

Assumption 1
1. Let $U$ and $V$ be Hilbert spaces.
2. $L : U \to V$ is a bounded linear operator.
3. $F := L|_{\operatorname{dom} F}$, where $\emptyset \ne \operatorname{dom} F$ is closed and convex in $U$.
4. $\tau_U$ and $\tau_V$ are the weak topologies on $U$ and $V$, respectively.
We now consider the solution of the abstract equation
\[
F u = v \quad \text{subject to } u \in \operatorname{dom} F. \tag{3}
\]
We consider solving this operator equation by variational regularization methods, which consist in minimizing the functional
\[
T_{\alpha, v^\delta}(u) := \| F u - v^\delta \|_V^2 + \alpha R(u),
\]
where $v^\delta \in V$. For most applications $v^\delta$ will be considered a noisy approximation of $v$ as in equation (3). In order to have regularization properties of the family $(T_{\alpha, v^\delta})$ it is required that $R$, $\|\cdot\|_V$, and $L$ satisfy:

Assumption 2
1. The norm $\|\cdot\|_V$ is sequentially lower semi-continuous with respect to $\tau_V$.
2. The functional $R : U \to [0, \infty]$ is convex and sequentially lower semi-continuous with respect to $\tau_U$. $\operatorname{dom} R = \{ u : R(u) \ne \infty \}$ is the domain of $R$.
3. $D := \operatorname{dom} F \cap \operatorname{dom} R \ne \emptyset$ (which, in particular, implies that $R$ is proper).
4. For every $\alpha > 0$ and $M > 0$, the level sets $M_\alpha(M) := \operatorname{level}_M(T_{\alpha, v}) := \{ u \in U : T_{\alpha, v}(u) \le M \}$ are sequentially pre-compact with respect to $\tau_U$.
5. For every $M > 0$ the set $M_\alpha(M)$ is sequentially closed with respect to $\tau_U$ and the restriction of $F$ to $M_\alpha(M)$ is sequentially continuous with respect to the topologies $\tau_U$ and $\tau_V$.

We stress that the sets $M_\alpha(M)$ are defined based on the Tikhonov functional for unperturbed data $v$ and we do not a priori exclude the case that $M_\alpha(M) = \emptyset$. We refer to the following theorems from [13], which guarantee the existence of a minimizer, stability of the regularized solutions, and convergence:

Theorem 3 (Existence). Let $F$, $R$, $D$, $U$, and $V$ satisfy Assumption 2. Assume that $\alpha > 0$ and $v^\delta \in V$. Then, there exists a minimizer of $T_{\alpha, v^\delta}$.

It has been shown by several authors that information on the noise level
\[
\| v^\delta - v \| \le \delta \tag{4}
\]
is essential for an analysis of regularization methods. In fact, without this information the regularization parameter cannot be chosen such that convergence of $u^\delta_\alpha$ to a solution of equation (1) can be guaranteed.

Theorem 4 (Stability). Let $F$, $\operatorname{dom} F$, $U$, and $V$ satisfy Assumption 2. Assume that $\alpha > 0$ and $v_k \to v^\delta$. Moreover, let
\[
u_k \in \arg\min T_{\alpha, v_k}, \qquad k \in \mathbb{N}.
\]
Then, $(u_k)$ has a convergent subsequence. Every convergent subsequence converges to a minimizer of $T_{\alpha, v^\delta}$.
The following theorem clarifies the role of the regularization parameter $\alpha$. It has to be chosen in dependence on the noise level to guarantee approximation of the solution of (3).

Theorem 5 (Convergence). Let $F$, $\operatorname{dom} F$, $U$, and $V$ satisfy Assumption 2. Assume that (3) has a solution in $\operatorname{dom} F$ and that $\alpha : (0, \infty) \to (0, \infty)$ satisfies
\[
\alpha(\delta) \to 0 \quad \text{and} \quad \frac{\delta^2}{\alpha(\delta)} \to 0, \quad \text{as } \delta \to 0.
\]
Moreover, let the sequence $(\delta_k)$ of positive numbers converge to 0, and assume that the data $v_k := v^{\delta_k}$ satisfy $\| v - v_k \| \le \delta_k$. Let $u_k \in \arg\min T_{\alpha(\delta_k), v_k}$. Then $(u_k)$ has a subsequence converging to a solution of (1).
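As a short worked example of the parameter condition in Theorem 5 (an illustration, not taken from the source), consider the power-law choice $\alpha(\delta) = \delta^p$ with a fixed exponent $p > 0$:
\[
\alpha(\delta) = \delta^{p} \to 0
\quad\text{and}\quad
\frac{\delta^2}{\alpha(\delta)} = \delta^{\,2-p} \to 0
\quad (\delta \to 0)
\qquad\Longleftrightarrow\qquad 0 < p < 2 .
\]
In particular $\alpha(\delta) = \delta$ is admissible, whereas $\alpha(\delta) = \delta^2$ is not, since then $\delta^2 / \alpha(\delta) = 1$ does not tend to zero: the regularization parameter must decay, but more slowly than the squared noise level.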
3 Regularization on the Space of Radon Measures

We assume that $\Omega \subseteq \mathbb{R}^n$ and $\Omega' \subseteq \mathbb{R}^m$ are bounded, open and connected with Lipschitz boundary, respectively. For the sake of simplicity of presentation we take $V = L^2(\Omega')$. Other spaces can be considered, but then the notation is less transparent. We consider and study minimization of the functional
\[
\hat T_{\alpha, v^\delta}(u') := \int_{\Omega'} (F u' - v^\delta)^2 + \alpha \| u' \|_{RM} \tag{5}
\]
over the set of Radon measures on $\Omega$. Here, $\| u' \|_{RM}$ denotes the norm of the Radon measure $u'$.

Radon Measures

Below we briefly review some facts about Radon measures and specify the relevant properties. The set of Radon measures is the dual of $C_0(\Omega)$. Here, $C_0(\Omega)$ is the space of continuous functions from $\Omega$ into $\mathbb{R}$ with compact support in $\Omega$. We always consider $C_0(\Omega)$ equipped with the supremum norm. We denote the dual by $\mathcal{M} := (C_0(\Omega))'$, and for $u' \in \mathcal{M}$ the Radon norm is defined by
\[
\| u' \|_{RM} := \sup \Big\{ \int_\Omega v \, du' : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \Big\}.
\]
We recall the definition of weak* convergence in $\mathcal{M}$, i.e., a bounded sequence $(u'_k)_k$ in $\mathcal{M}$ is weakly* convergent to $u' \in \mathcal{M}$ if
\[
\lim_{k \to \infty} \int_\Omega f \, du'_k = \int_\Omega f \, du' \quad \text{for all } f \in C_0(\Omega).
\]
Below we show that $\|\cdot\|_{RM}$ is lower semi-continuous with respect to the weak* convergence on $\mathcal{M}$.
Lemma 1. $\|\cdot\|_{RM}$ is lower semi-continuous with respect to the weak* convergence on $\mathcal{M}$.

Proof. Let a sequence of Radon measures $(u'_k)_k$ be weakly* convergent to some measure $u'$. Then,
\[
\begin{aligned}
\| u' \|_{RM} &= \sup \Big\{ \int_\Omega v \, du' : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \Big\} \\
&= \sup \Big\{ \lim_{k \to \infty} \int_\Omega v \, du'_k : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \Big\} \\
&\le \liminf_{k \to \infty} \| u'_k \|_{RM}.
\end{aligned}
\]

Dual of a Sobolev Space

Let $s \in \mathbb{N}$ be fixed. In the following we investigate the dual of the Sobolev space $W := W_0^{s,2}(\Omega)$, which is a Hilbert space with the inner product
\[
\langle w_1, w_2 \rangle_s := \int_\Omega \nabla^s w_1 \cdot \nabla^s w_2,
\]
where $\nabla^s$ is the tensor containing all $s$-th derivatives. The associated norm is denoted by $\| w \|_s$. For $w' \in W'$, the dual of $W_0^{s,2}(\Omega)$, we have
\[
\| w' \|_{-s} := \sup \{ w' \tilde w : \tilde w \in W,\ \| \tilde w \|_s \le 1 \}.
\]
$W'$ satisfies the following properties:
1. From the Riesz representation theorem (see e.g. [6, Theorem 3.4]) it follows that for every $w' \in W'$ there exists $w \in W$ such that $w' \tilde w = \langle w, \tilde w \rangle_s$ for all $\tilde w \in W$. We define the Riesz mapping
\[
I w = w', \tag{6}
\]
and note that $I$ is an isomorphism between $W$ and $W'$, i.e., $\| w \|_s = \| I w \|_{-s} = \| w' \|_{-s}$. In particular, we have that $(w_k)_k \to w$ with respect to the topology $\tau_W$ if and only if $(w'_k)_k = (I w_k)_k \to I w = w'$ with respect to the topology $\tau_{W'}$.
2. The inner product on the dual space $W'$ can be defined by $\langle w'_1, w'_2 \rangle_{-s} = \langle w_1, w_2 \rangle_s$, where $w_1, w'_1$ and $w_2, w'_2$ are related by the Riesz representation theorem, respectively.

Now, we state a lemma which is central for our further considerations:

Lemma 2. Let $2s > n$ (recall that $s$ is the order of differentiation in the definition of $W$ and $n$ is the dimension of $\Omega$). Then
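1. $\|\cdot\|_{RM}$ is convex and lower semi-continuous on $W'$.
2. $\mathcal{M}$ is closed in $W'$.
3. There exists a constant $C$ such that $\| w' \|_{-s} \le C \| w' \|_{RM}$ for all $w' \in \mathcal{M}$.

Proof. We make some general statements first. Since, by assumption, $2s > n$, the Sobolev embedding theorem (see [1, Thm. 5.4]) guarantees that the embedding from $W$ into $C_0(\Omega)$ is bounded, i.e., there exists a constant $C$ such that
\[
\| u \|_{L^\infty} \le C \| u \|_s \quad \text{for all } u \in W. \tag{7}
\]
Since $C_0^\infty(\Omega)$ is dense in $W$ and $C_0(\Omega)$ (with respect to the topologies of $W$ and $C_0(\Omega)$, respectively), we have
\[
\begin{aligned}
\| u' \|_{RM} &= \sup \{ u' v : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \} \\
&= \sup \{ u' v : v \in C_0^\infty(\Omega),\ \| v \|_{L^\infty} \le 1 \} \\
&= \frac{1}{C} \sup \{ u' v : v \in C_0^\infty(\Omega),\ \| v \|_{L^\infty} \le C \} \\
&\ge \frac{1}{C} \sup \{ u' v : v \in C_0^\infty(\Omega),\ \| v \|_s \le 1 \} \\
&= \frac{1}{C} \sup \{ u' v : v \in W_0^{s,2}(\Omega),\ \| v \|_s \le 1 \} \\
&= \frac{1}{C} \| u' \|_{-s}.
\end{aligned}
\]
Thus, $\mathcal{M} \subseteq W'$.

1. Let $(u'_k)_k$ be a sequence of Radon measures which is convergent to $u'$ in $W'$ (i.e., with respect to $\tau_{W'}$). It remains to prove that $u'$ is a Radon measure. Since $(u'_k)_k$ is bounded in $W'$, it is also weakly* convergent in $W'$, meaning that $u'_k v \to u' v$ for all $v \in W$. Then, in particular, we have $u'_k v \to u' v$ for all $v \in C_0^\infty(\Omega)$. Now, let $v \in C_0^\infty(\Omega)$ satisfy $\| v \|_{L^\infty} \le 1$; then
\[
\begin{aligned}
u' v &= \lim_{k \to \infty} u'_k v \\
&\le \liminf_{k \to \infty} \sup \{ u'_k \tilde v : \tilde v \in C_0(\Omega),\ \| \tilde v \|_{L^\infty} \le 1 \} \\
&\le \liminf_{k \to \infty} \| u'_k \|_{RM}.
\end{aligned} \tag{8}
\]
Since $C_0^\infty(\Omega)$ is dense in $C_0(\Omega)$, the last inequality shows that
\[
\| u' \|_{RM} \le \liminf_{k \to \infty} \| u'_k \|_{RM}
\]
and, thus, $u'$ is a Radon measure.
2. From (8) it also follows that $\|\cdot\|_{RM}$ is lower semi-continuous on $W'$. The convexity is trivial.
3. Using (7) it follows that
\[
\begin{aligned}
\| w' \|_{-s} &= \sup \{ w' \tilde w : \tilde w \in W,\ \| \tilde w \|_s \le 1 \} \\
&\le \sup \{ w' \tilde w : \tilde w \in C_0(\Omega),\ \| \tilde w \|_{L^\infty} \le C \} \\
&= C \| w' \|_{RM}.
\end{aligned}
\]
This gives the third assertion.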
4 Application to Variational Regularization on Radon Measures
We consider minimization of $\hat T_{\alpha,v^\delta}$ on $W'$, the dual of the Sobolev space $W_0^{s,2}(\Omega)$, with dom $F := M$, the space of Radon measures, and $L : W' \to L^2(\Omega')$ bounded as in Assumption 1. Here $W'$ and $L^2(\Omega')$ play the role of $U$ and $V$ in Assumption 1; i.e., we consider the weak topologies on $W'$ (note that, since $W'$ is a Hilbert space, weak and weak* convergence can be identified) and on $L^2(\Omega')$. Note that in our notation of Assumption 1 we use here $F := L|_{\mathrm{dom}\,F}$. In order to apply the general results stated in Section 1 we have to verify Assumption 2. The requirement in Assumption 1 that dom $F = M$ is closed in $W'$ has already been shown in Lemma 2.

1. We recall that every norm on a Hilbert space is convex and continuous and is therefore sequentially lower semi-continuous with respect to the weak topology. Hence, $\|\cdot\|_{W'}$ is sequentially weakly lower semi-continuous with respect to $\tau_{W'}$.
2. The functional $R(\cdot) := \|\cdot\|_{RM}$ is convex and lower semi-continuous, which has already been shown in Lemma 2.
3. The set of Radon measures, which equals the domain $D$, is not empty.
4. Let $\alpha > 0$, $M > 0$, and let $(u_k)_k$ be a sequence in $M_\alpha(M)$. We show that $(u_k)_k$ has a convergent subsequence with respect to $\tau_{W'}$. From the definition of $\hat T_{\alpha,v^\delta}$ it follows that $(\|u_k\|_{RM})_k$ is bounded and, therefore, from Lemma 2 it follows that $(u_k)_k$ is bounded with respect to $\|\cdot\|_{-s}$. Thus, $(u_k)_k$ has a subsequence which converges weakly in $W'$. This shows that the sequence is sequentially precompact with respect to $\tau_{W'}$.
5. Let us follow up on the proof of the previous item.
– Let us denote the weak limit of $(u_k)_k$ in $W'$ by $u$. We show that $u \in M_\alpha(M)$. We use that $\|\cdot\|_{RM}$ is lower semi-continuous with respect to $W'$. Moreover, since $L : W' \to L^2(\Omega')$ is bounded, the functional $w \mapsto \|Lw - v^\delta\|^2$ is lower semi-continuous with respect to $W'$. Thus, the sum of both terms is lower semi-continuous and hence $u \in M_\alpha(M)$. Thus $M_\alpha(M)$ is sequentially closed.
– The operator $L|_{\mathrm{dom}\,F}$ is weakly continuous and dom $F$ is weakly sequentially closed. This follows from Lemma 2, which states that dom $F = M$ is closed and convex, and from the boundedness of $L$ on $W'$.

Therefore, Assumption 2 is satisfied and the assertions follow. Theorem 5 requires the existence of a solution of (3) in $D$. Thus, for the application of this result the existence of a solution with finite Radon measure is required.
5
Methodological Comparison with Finite Total Variation Regularization
The method which we are proposing is methodologically related to total variation minimization, which can be viewed as the relaxation of $W^{1,1}$-regularization, which in turn consists in minimization of the functional
$$u \mapsto \int_\Omega (F u - v^\delta)^2 + \alpha \int_\Omega |\nabla u| .$$
Total variation minimization consists in minimization of
$$u \mapsto \int_\Omega (F u - v^\delta)^2 + \alpha\, |Du| ,$$
where $|Du|$ is the total variation of $u$, which is the norm of the finite, vector-valued Radon measure $Du$. In our context the regularization is with respect to Radon measures, which is a relaxation of $L^1$-regularization. Thus, total variation regularization can be considered as a regularization method on Radon measures for the first derivatives of the function, while according to our theory, $L^1$-regularization is for the distributions in $W^{-2,2}(\Omega)$. The derived analogy is not completely satisfactory and certainly subject to further research. The analogy to total variation minimization suggests that the smallest Sobolev space which is a Hilbert space and contains the Radon measures is $W^{-1,2}(\Omega)$. However, based on our analysis so far, this space is slightly too small to perform analytical studies. Our analysis is based on the standard Sobolev embedding theorem and, as a consequence, slightly more regularity has to be imposed on the linear operator $F$ than expected from the comparison with the total variation analysis.
6
Application in Nuclear Medicine
Apart from its purely theoretical background, the concept of sparse data also proves relevant to a variety of real-world applications. From the imaging point of view we consider nuclear medicine one major area of interest. In principle, however, any type of peaky (clustered) data on an otherwise relatively homogeneous background appears suitable for sparsity reconstruction. In the following we give a short description of this application in order to introduce the practical part of sparsity regularization: The two most popular techniques in nuclear medicine, PET (Positron Emission Tomography) and SPECT (Single Photon Emission Computed Tomography), both rely on nuclear disintegration. Here, a tomographic scanner measures the decay of a radioactive tracer substance which has previously been injected into the patient's body. Such a procedure is often used, e.g., in cancer diagnosis. From the imaging point of view we consider the related isotopes our sparse data. Based on the respective measurements we obtain a so-called sinogram, plotting the number of radioactive disintegrations against the different scanner angles. The actual image is then reconstructed from the given sinogram. In the medical imaging context sparse variational reconstructions have already been used for MRI RF excitation pulse design in [15].
6.1
Algorithm Characteristics
The current section focuses on the most important implementation characteristics of the main reconstruction algorithms involved in sparsity reconstruction.
Firstly, we have decided to apply our sample data (see Paragraph 6.2) to the following Daubechies, Defrise, De Mol [7] (DDD)-type implementation
$$u^{k+1} := u^k - \lambda F^*(F u^k - v^\delta) - \alpha\,\operatorname{sgn}(u^{k+1}), \tag{9}$$
where the last term represents the sign operator (denoted by sgn), applied to the next step reconstruction, and may also be expressed by $u^{k+1}/|u^{k+1}|$. We thus obtain an alternative formulation
$$S^{-1}(u^{k+1}) := \Big(1 + \frac{\alpha}{|u^{k+1}|}\Big)\, u^{k+1} = u^k - \lambda F^*(F u^k - v^\delta). \tag{10}$$
As indicated by the notation, the set-valued operator $S^{-1}$ contains a univariate inverse and therefore we get an implementable scheme by applying the inverse of $S^{-1}$:
$$u^{k+1} = S\big(u^k - \lambda F^*(F u^k - v^\delta)\big), \tag{11}$$
where
$$S(t) := \begin{cases} t + \alpha & \text{if } t \le -\alpha, \\ t - \alpha & \text{if } t \ge +\alpha, \\ 0 & \text{else.} \end{cases} \tag{12}$$
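The scheme (11) with the thresholding function (12) can be sketched in a few lines. The following is only a minimal illustration, assuming that $F$ is available as a dense real NumPy matrix acting on a piecewise constant discretization, so that $F^*$ is the transpose. The names `soft_threshold` and `ddd_iteration` and the zero starting image are our own hypothetical choices and are not taken from the implementation used for the experiments.

```python
import numpy as np

def soft_threshold(t, alpha):
    """Componentwise thresholding function S from (12)."""
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

def ddd_iteration(F, v_delta, lam, alpha, n_iter, u0=None):
    """Iterate u <- S(u - lam * F^T (F u - v_delta)) as in (11).

    F is assumed to be a real matrix, so the adjoint F^* is F.T.
    """
    F = np.asarray(F, dtype=float)
    u = np.zeros(F.shape[1]) if u0 is None else np.array(u0, dtype=float)
    for _ in range(n_iter):
        residual = F @ u - v_delta
        u = soft_threshold(u - lam * (F.T @ residual), alpha)
    return u
```

For the identity operator and $\lambda = 1$ the iteration is stationary after one step and returns the thresholded data, which gives a quick sanity check of an implementation.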
We refer to this implementation as being of DDD-type, since it operates on functions (actually measures) and not on basis coefficients, to which the original sparsity approach is devoted. Aside from this difference it is the algorithm suggested in [7]. The numerical implementation is for piecewise constant functions approximating Radon measures. The situation is analogous to the case of total variation regularization with finite elements, where derivatives (which are Radon measures) are approximated by derivatives of finite element functions.
6.2
Experimental Results
In order to test the practical relevance of the above method we have created test data with a constant background exhibiting (clusters of) peaks, as we consider this the most realistic scenario. Most practical acquisition devices, however, rarely yield noise-free data, which has led us to add different types of noise to our sample data $v$. That is, in order to achieve a realistic scenario we restrict the input to our reconstruction algorithms (see Paragraph 6.1) to noisy sinograms $v^\delta$ only. Since the tested algorithms are mainly intended for medical use we have decided to adapt the sample framework to the nature of nuclear medical data acquisition. Most underlying processes in this field exhibit a clear Poisson nature, which has motivated the decision to overlay the clean sinogram data with typical Poisson noise. From a programming point of view we have decided to allow for the specification of four different parameters, each of which may have a certain influence on the outcome of the reconstruction process. The weighting parameters $\lambda$ and $\alpha$ from Equations (9) to (12) are an obvious choice in this case. Furthermore,
[Figure 1: left panel "l1 Regularization" (reconstruction image); right panel "l1 Residuals", showing the residuals (scale ×10^4) against the number of iterations (0 to 600).]
Fig. 1. The above figure illustrates the convergence behavior of our proposed regularization scheme from a practical point of view. The right-hand plot shows the declining residuals obtained during the computation process yielding the reconstruction image to the left.
[Figure 2: four reconstructions for α = 0.0036, α = 0.00036, α = 0.000036 and α = 0.0000036; intensity scale from 0 to 200.]
Fig. 2. Decreasing values of α tend to sharpen even smaller object boundaries but at the same time also produce more background noise. Increasing the parameter, however, results in a quite homogeneous background while blurring and sometimes even removing smaller objects.
we have added one algorithm-independent input parameter, i.e., the number of iteration cycles. With the above implementation details specified, we have finally submitted the DDD-type algorithm from Paragraph 6.1 to different test cases.
Number of Iterations: As is obvious from the problem statement in Equations (9) to (12), the final reconstruction is created by iteratively updating the current reconstruction image. In most cases the starting image will be of random nature. The number of iterations may thus have a certain impact on the outcome of the reconstruction process. For our algorithm we have created test cycles within the range of [25, 1600], with the remaining parameters fixed. In this respect we have determined 50 cycles as the minimum value for obtaining a relatively reliable result. Note, however, that here object boundaries appear blurred on an otherwise constant background. With an increasing number of iterations the different objects become sharper, while on
[Figure 3: three reconstructions, from left to right "l1 Regularization", "L2 Regularization" and "W1,2 Regularization".]
Fig. 3. The above figures compare our benchmark results to those of other popular methods, e.g., $L^2$ and $W^{1,2}$ regularization. Here, $l^1$, as depicted to the left, tends to yield the clearest approximations of the original objects. We have, however, noticed that in some cases small peaks may not be preserved during the regularization process. On the other hand, $L^2$ not only appears slightly more blurred but also fails to remove the circle object caused by the Radon transform, which we consider a major drawback. Finally, $W^{1,2}$ regularization tends to produce strong object blurs, which may be a problem not only for small and dense peaks but may also deteriorate the overall reconstruction quality.
the other hand we are faced with the problem of an ever more inhomogeneous background.
Weighting Parameters: As described in Paragraph 6.1 the implementation includes two weighting parameters $\lambda$ and $\alpha$ closely related to each other. Since we consider the role of the first one to be of higher importance we have decided for a ratio-based test environment. That is, setting $\lambda$ within the range of [0.016, 0.16] we have evaluated the quality of the reconstructions with $\alpha = \lambda/10^n$, where $1 \le n \le 4$ (compare the values of $\alpha$ in Fig. 2). The described test framework has, furthermore, helped in limiting the computational effort to a reasonable extent. Interestingly, our experiments have shown that the ratio between $\lambda$ and $\alpha$ turns out to be less important provided the first parameter is selected 'correctly'. There were no obvious differences between images with $\alpha = \lambda/10^2$ or $\alpha = \lambda/10^3$. On the other hand, we have noticed lower values of $\lambda$ producing a more homogeneous background while higher ones resulted in sharper object boundaries. In this respect the effects appear similar to those described for varying numbers of iterations. Finally, we may conclude that there exists a certain relation between the number of iterations and the choice of $\lambda$. The higher we set the weighting parameter, the sooner we have to stop the iterative cycle in order to limit the background inhomogeneities to a certain extent.
Acknowledgement This work has been supported by the Austrian Science Fund (FWF) within the national research networks Industrial Geometry, project 9203-N12, and Photoacoustic Imaging in Biology and Medicine, project S10505-N20.
References
1. Adams, R.A.: Sobolev Spaces. Academic Press, New York (1975)
2. Bredies, K., Lorenz, D.: Iterated hard shrinkage for minimization problems with sparsity constraints. SIAM J. Sci. Comput. 30(2), 657–683 (2008)
3. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
4. Combettes, P.L., Pesquet, J.-C.: Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. Optim. 18(4), 1351–1376 (2007)
5. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (electronic) (2005)
6. Conway, J.B.: A Course in Functional Analysis, 2nd edn. Graduate Texts in Mathematics, vol. 96. Springer, Heidelberg (1990)
7. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57(11), 1413–1457 (2004)
8. Daubechies, I., Fornasier, M., Loris, I.: Accelerated projected gradient methods for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl. (to appear) (2008)
9. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
10. Figueiredo, M., Nowak, R., Wright, S.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Topics Signal Process 1(4), 586–598 (2007)
11. Griesse, R., Lorenz, D.: A semismooth Newton method for Tikhonov functionals with sparsity constraints. Inverse Probl. 24(3), 035007, 19 (2008)
12. Ramlau, R., Teschke, G.: A Tikhonov-based projection iteration for nonlinear ill-posed problems with sparsity constraints. Numer. Math. 104(2), 177–203 (2006)
13. Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2008)
14. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)
15. Zelinski, A.C., Wald, L.L., Setsompop, K., Goyal, V.K., Adalsteinsson, E.: Sparsity-enforced slice-selective MRI RF excitation pulse design. IEEE Trans. Med. Imag. 27, 1213–1229 (2008)
Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage Simon Setzer University of Mannheim, A5, 68131 Mannheim, Germany
[email protected] http://kiwi.math.uni-mannheim.de
Abstract. We examine relations between popular variational methods in image processing and classical operator splitting methods in convex analysis. We focus on a gradient descent reprojection algorithm for image denoising and the recently proposed Split Bregman and alternating Split Bregman methods. By identifying the latter with the so-called Douglas-Rachford splitting algorithm we can guarantee its convergence. We show that for a special setting based on Parseval frames the gradient descent reprojection and the alternating Split Bregman algorithm are equivalent and turn out to be a frame shrinkage method.
1
Introduction
In recent years variational models were successfully applied in image restoration. These methods came along with various computational algorithms. Interestingly, the roots of many restoration algorithms can be found in classical algorithms from convex analysis dating back more than 40 years. It is useful from different points of view to discover these relations: Classical convergence results carry over to the restoration algorithms at hand and ensure their convergence. On the other hand, earlier mathematical results have found new applications and should be acknowledged. The present paper fits into this context. Our aim is twofold: First, we show that the Alternating Split Bregman Algorithm proposed by Goldstein and Osher for image restoration and compressed sensing can be interpreted as a Douglas-Rachford Splitting Algorithm. In particular, this clarifies the convergence of the algorithm. Second, we consider the following denoising problem which uses an $L_2$ data-fitting and a Besov-norm regularization term [1]
$$\operatorname*{argmin}_{u \in B^1_{1,1}(\Omega)} \Big\{ \tfrac{1}{2}\,\|u - f\|^2_{L_2(\Omega)} + \lambda\,\|u\|_{B^1_{1,1}(\Omega)} \Big\}. \tag{1}$$
We show that for discrete versions of this problem involving Parseval frames the corresponding alternating Split Bregman Algorithm can be seen as an application of a Forward-Backward Splitting Algorithm. The latter is also related to the Gradient Descent Reprojection Algorithm, see Chambolle [2]. Since our methods are based on soft (coupled) frame shrinkage, we also establish the relation to the classical wavelet shrinkage scheme. Finally, we consider the Rudin-Osher-Fatemi model [3]
$$\operatorname*{argmin}_{u \in BV(\Omega)} \tfrac{1}{2}\,\|u - f\|^2_{L_2(\Omega)} + \lambda \int_\Omega |\nabla u(x)|\, dx, \tag{2}$$
which is a successful edge-preserving image restoration method. We apply our findings to create an efficient frame-based minimization algorithm for the discrete version of this problem.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 464–476, 2009. © Springer-Verlag Berlin Heidelberg 2009
2
Operator Splitting Methods
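Proximation and Soft Shrinkage. We start by considering the proximity operator
$$\operatorname{prox}_{\gamma\Phi}(f) := \operatorname*{argmin}_{u \in H} \Big\{ \frac{1}{2\gamma}\,\|u - f\|^2 + \Phi(u) \Big\} \tag{3}$$
on a Hilbert space $H$. If $\Phi : H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and lower semi-continuous (lsc), then for any $f \in H$, there exists a unique minimizer $\hat u := \operatorname{prox}_{\gamma\Phi}(f)$ of (3). By Fermat's rule, this minimizer is determined by the inclusion
$$0 \in \frac{1}{\gamma}(\hat u - f) + \partial\Phi(\hat u) \;\Leftrightarrow\; f \in \hat u + \gamma\,\partial\Phi(\hat u) \;\Leftrightarrow\; \hat u = (I + \gamma\,\partial\Phi)^{-1} f,$$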
where the set-valued function $\partial\Phi : H \to 2^H$ is the subdifferential of $\Phi$. If $\Phi$ is proper, convex and lsc, then $\partial\Phi$ is a maximal monotone operator. For a set-valued function $F : H \to 2^H$, the operator $J_F := (I + F)^{-1}$ is called the resolvent of $F$. If $F$ is maximal monotone, then $J_F$ is single-valued and firmly nonexpansive. In this paper, we are mainly interested in the following two functions $\Phi_i$, $i = 1, 2$, on $H := \mathbb{R}^M$:
i) $\Phi_1(u) := \|\Lambda u\|_1$ with $\Lambda := \operatorname{diag}(\lambda_j)_{j=1}^M$, $\lambda_j \ge 0$,
ii) $\Phi_2(u) := \|\tilde\Lambda\, |u|\, \|_1$ with $\tilde\Lambda := \operatorname{diag}(\tilde\lambda_j)_{j=1}^N$, $\tilde\lambda_j \ge 0$ and $|u| := (\|u_j\|_2)_{j=1}^N$ for $u_j := (u_{j+kN})_{k=0}^{p-1}$ and $M = pN$.
The corresponding Fenchel conjugate functions are given by
i) $\Phi_1^*(u) := \iota_C(u)$ with $C := \{u \in \mathbb{R}^M : |u_j| \le \lambda_j,\ j = 1, \ldots, M\}$,
ii) $\Phi_2^*(u) := \iota_{\tilde C}(u)$ with $\tilde C := \{u \in \mathbb{R}^M : \|u_j\|_2 \le \tilde\lambda_j,\ j = 1, \ldots, N\}$,
where $\iota_C$ is the indicator function of the set $C$ (or $\tilde C$), i.e., $\iota_C(u) := 0$ for $u \in C$ and $\iota_C(u) := +\infty$ otherwise. A short calculation shows that for any $f \in \mathbb{R}^M$ we have
$$\operatorname{prox}_{\Phi_1}(f) = T_\Lambda(f), \qquad \operatorname{prox}_{\Phi_2}(f) = \tilde T_{\tilde\Lambda}(f),$$
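where $T_\Lambda$ denotes the soft shrinkage function given componentwise by
$$T_{\lambda_j}(f_j) := \begin{cases} 0 & \text{if } |f_j| \le \lambda_j, \\ f_j - \lambda_j \operatorname{sgn}(f_j) & \text{if } |f_j| > \lambda_j, \end{cases} \tag{4}$$
and $\tilde T_{\tilde\Lambda}$ denotes the coupled shrinkage function, compare [2, 4, 5],
$$\tilde T_{\tilde\lambda_j}(f_j) := \begin{cases} 0 & \text{if } \|f_j\|_2 \le \tilde\lambda_j, \\ f_j - \tilde\lambda_j\, f_j / \|f_j\|_2 & \text{if } \|f_j\|_2 > \tilde\lambda_j. \end{cases}$$
Similarly, we obtain
$$\operatorname{prox}_{\Phi_1^*}(f) = f - T_\Lambda(f), \qquad \operatorname{prox}_{\Phi_2^*}(f) = f - \tilde T_{\tilde\Lambda}(f). \tag{5}$$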
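The two shrinkage functions and the relation (5) can be illustrated numerically. The sketch below is our own and uses hypothetical function names; it assumes 0-based indexing, so that the group $f_j$ consists of the entries `f[j + k*N]`, $k = 0, \ldots, p-1$. For $\Phi_1$, relation (5) says that $f - T_\Lambda(f)$ is the orthogonal projection of $f$ onto the box $C$, which is what the clipping in the usage note checks.

```python
import numpy as np

def soft_shrink(f, lam):
    """Soft shrinkage T_Lambda from (4); lam may be a scalar or a vector."""
    return np.sign(f) * np.maximum(np.abs(f) - lam, 0.0)

def coupled_shrink(f, lam, p):
    """Coupled shrinkage: shrink the Euclidean norm of each group f_j.

    f has length M = p * N; row k of the reshaped array holds the
    entries f[k*N : (k+1)*N], so column j is the group f_j.
    """
    groups = np.asarray(f, dtype=float).reshape(p, -1)
    norms = np.linalg.norm(groups, axis=0)
    # Guard against division by zero for vanishing groups.
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-300), 0.0)
    return (groups * scale).ravel()
```

A possible check: for `f = np.array([3.0, -0.5, 0.0, 4.0])` and `lam = 1.0`, the difference `f - soft_shrink(f, lam)` coincides with `np.clip(f, -lam, lam)`, i.e., with the projection onto $C$.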
Operator Splittings. Now we consider more general minimization problems of the form
$$(P) \qquad \min_{u \in H_1} \big\{ \underbrace{g(u) + \Phi(Du)}_{=:\,F_P(u)} \big\},$$
where $D : H_1 \to H_2$ is a bounded linear operator and both functions $g : H_1 \to \mathbb{R} \cup \{+\infty\}$ and $\Phi : H_2 \to \mathbb{R} \cup \{+\infty\}$ are proper, convex and lsc. Furthermore, we assume that $0 \in \operatorname{int}(D \operatorname{dom}(g) - \operatorname{dom}(\Phi))$. For $g(u) := \frac{1}{2\gamma}\|u - f\|^2$ and $D = I$ this is again our proximation problem. The corresponding dual problem has the form
$$(D) \qquad -\min_{b \in H_2} \big\{ \underbrace{g^*(-D^* b) + \Phi^*(b)}_{=:\,F_D(b)} \big\}.$$
We assume that solutions $\hat u$ and $\hat b$ of the primal and dual problems, respectively, exist and that the duality gap is zero. In other words, we suppose that there is a pair $(\hat u, \hat b)$ which satisfies the Karush-Kuhn-Tucker conditions $0 \in \partial g(\hat u) + D^* \hat b$, $0 \in -D\hat u + \partial\Phi^*(\hat b)$. Then $\hat u$ is a solution of $(P)$ if and only if
$$0 \in \partial F_P(\hat u) = \partial g(\hat u) + \partial(\Phi \circ D)(\hat u).$$
Similarly, a solution $\hat b$ of the dual problem is characterized by
$$0 \in \partial F_D(\hat b) = \partial\big(g^* \circ (-D^*)\big)(\hat b) + \partial\Phi^*(\hat b).$$
In both the primal and the dual problem, one finally has to solve an inclusion of the form
$$0 \in A(\hat p) + B(\hat p). \tag{6}$$
Various splitting techniques make use of this additive structure. In this paper, we restrict our attention to the forward-backward splitting (FBS) and the Douglas-Rachford splitting (DRS). The inclusion (6) can be rewritten as the fixed point equation
$$\hat p - \eta B(\hat p) \in \hat p + \eta A(\hat p) \;\Leftrightarrow\; \hat p \in J_{\eta A}(I - \eta B)\hat p, \qquad \eta > 0, \tag{7}$$
and the FBS algorithm is just the corresponding iteration. For the following convergence result and generalizations of the algorithm we refer to [6, 7, 8, 9].
Theorem 1 (FBS). Let $A : H \to 2^H$ be maximal monotone and $\beta B : H \to H$ be firmly nonexpansive for some $\beta > 0$. Furthermore, assume that a solution of (6) exists. Then, for any $p^{(0)}$ and any $\eta \in (0, 2\beta)$ the following FBS algorithm converges weakly to such a solution of (6):
$$p^{(k+1)} = J_{\eta A}(I - \eta B)\,p^{(k)}. \tag{8}$$
To introduce the DRS, we rewrite the right-hand side of (7) as
$$\hat p + \eta B\hat p \in J_{\eta A}(I - \eta B)\hat p + \eta B\hat p \;\Leftrightarrow\; \hat p \in J_{\eta B}\big( \underbrace{J_{\eta A}(I - \eta B)\hat p + \eta B\hat p}_{=:\,\hat t} \big).$$
The DRS algorithm [10] is the corresponding iteration, where we use $t^{(k)} := p^{(k)} + \eta B p^{(k)}$. For the following convergence result, which in contrast to the FBS algorithm holds also for set-valued operators $B$, see [6, 8].
Theorem 2 (DRS). Let $A, B : H \to 2^H$ be maximal monotone operators and assume that a solution of (6) exists. Then, for any initial elements $t^{(0)}$ and $p^{(0)}$ and any $\eta > 0$, the following DRS algorithm converges weakly to an element $\hat t$:
$$t^{(k+1)} = J_{\eta A}(2p^{(k)} - t^{(k)}) + t^{(k)} - p^{(k)}, \qquad p^{(k+1)} = J_{\eta B}(t^{(k+1)}).$$
Furthermore, it holds that $\hat p := J_{\eta B}(\hat t)$ satisfies $0 \in A(\hat p) + B(\hat p)$. If $H$ is finite-dimensional, then the sequence $(p^{(k)})_{k \in \mathbb{N}}$ converges to $\hat p$.
3
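Bregman Methods
For a function $\varphi : H \to \mathbb{R} \cup \{+\infty\}$, the Bregman distance $D_\varphi^{(p)}$ is defined as
$$D_\varphi^{(p)}(u, v) = \varphi(u) - \varphi(v) - \langle p, u - v \rangle,$$
with $p \in \partial\varphi(v)$, cp. [11]. Given an arbitrary initial value $u^{(0)}$ and a parameter $\gamma > 0$, the Bregman proximal point algorithm (BPP) applied to $(P)$ has the form [12, 13, 14]
$$u^{(k+1)} = \operatorname*{argmin}_{u \in H_1} \Big\{ \frac{1}{\gamma}\, D_\varphi^{(p^{(k)})}(u, u^{(k)}) + F_P(u) \Big\}, \qquad p^{(k+1)} \in \partial\varphi(u^{(k+1)}). \tag{9}$$
For conditions on $\varphi$ such that $(u^{(k)})_{k \in \mathbb{N}}$ converges to a minimizer of $(P)$, see [13] and the references therein. For $\varphi := \frac12 \|\cdot\|_2^2$, we recover the classical proximal point algorithm (PP) for $(P)$ which can be written as follows, compare [15],
$$u^{(k+1)} = \operatorname{prox}_{\gamma F_P}(u^{(k)}) = \operatorname*{argmin}_{u \in H_1} \Big\{ \frac{1}{2\gamma}\,\|u - u^{(k)}\|_2^2 + F_P(u) \Big\} = J_{\gamma \partial F_P}(u^{(k)}).$$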
Under our assumptions on $g$, $\Phi$ and $D$, the weak convergence of the PP algorithm is guaranteed for any initial point $u^{(0)}$, see [16]. In the same way, we can define the PP algorithm for $(D)$
$$b^{(k+1)} = \operatorname{prox}_{\gamma F_D}(b^{(k)}) = \operatorname*{argmin}_{b \in H_2} \Big\{ \frac{1}{2\gamma}\,\|b - b^{(k)}\|_2^2 + F_D(b) \Big\} = J_{\gamma \partial F_D}(b^{(k)})$$
and the same convergence result holds true. It is well-known that the PP algorithm applied to $(D)$ is equivalent to the augmented Lagrangian method (AL) for the primal problem, see, e.g., [15, 14]. To define this algorithm we first transform $(P)$ into the constrained minimization problem
$$\min_{u \in H_1,\, d \in H_2} E(u, d) \quad \text{s.t.} \quad Du = d, \tag{10}$$
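where $E(u, d) := g(u) + \Phi(d)$. This problem was introduced in [29]. The corresponding AL algorithm for $(P)$ is then defined as
$$\begin{aligned}
(u^{(k+1)}, d^{(k+1)}) &= \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ E(u, d) + \langle b^{(k)}, Du - d \rangle + \frac{1}{2\gamma}\,\|Du - d\|_2^2 \Big\}, \\
b^{(k+1)} &= b^{(k)} + \frac{1}{\gamma}\,(Du^{(k+1)} - d^{(k+1)}).
\end{aligned} \tag{11}$$
Indeed, it has been shown that for the same initial value $b^{(0)}$ the sequence $(b^{(k)})_{k \in \mathbb{N}}$ coincides with the one produced by the PP algorithm applied to $(D)$, see [15]. Moreover, if $(b^{(k)})_{k \in \mathbb{N}}$ converges strongly then every strong cluster point of $(u^{(k)})_{k \in \mathbb{N}}$ is a solution of $(P)$, cf. [17]. To solve the constrained optimization problem (10), Goldstein and Osher [18] proposed to use the Bregman distance
$$D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) = E(u, d) - E(u^{(k)}, d^{(k)}) - \langle p_u^{(k)}, u - u^{(k)} \rangle - \langle p_d^{(k)}, d - d^{(k)} \rangle$$
and the term $\frac{1}{2\gamma}\|Du - d\|_2^2$ instead of $F_P$ in (9). This results in the algorithm
$$\begin{aligned}
(u^{(k+1)}, d^{(k+1)}) &= \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) + \frac{1}{2\gamma}\,\|Du - d\|_2^2 \Big\}, \\
p_u^{(k+1)} &= p_u^{(k)} - \frac{1}{\gamma}\, D^*(Du^{(k+1)} - d^{(k+1)}), \\
p_d^{(k+1)} &= p_d^{(k)} + \frac{1}{\gamma}\,(Du^{(k+1)} - d^{(k+1)}),
\end{aligned} \tag{12}$$
where we have used that (12) implies
$$\begin{aligned}
0 &\in \partial E(u^{(k+1)}, d^{(k+1)}) - \big(p_u^{(k)}, p_d^{(k)}\big) + \Big( \frac{1}{\gamma}\, D^*(Du^{(k+1)} - d^{(k+1)}),\ -\frac{1}{\gamma}\,(Du^{(k+1)} - d^{(k+1)}) \Big) \\
&= \partial E(u^{(k+1)}, d^{(k+1)}) - \big(p_u^{(k+1)}, p_d^{(k+1)}\big),
\end{aligned}$$
so that $\big(p_u^{(k)}, p_d^{(k)}\big) \in \partial E(u^{(k)}, d^{(k)})$. Setting $p_u^{(k)} = -\frac{1}{\gamma} D^* b^{(k)}$ and $p_d^{(k)} = \frac{1}{\gamma} b^{(k)}$ for all $k \ge 0$ and regarding that for a bounded linear operator $D$,
$$\begin{aligned}
D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) + \frac{1}{2\gamma}\,\|Du - d\|_2^2 = {} & E(u, d) - E(u^{(k)}, d^{(k)}) \\
& + \frac{1}{\gamma} \langle b^{(k)}, Du - Du^{(k)} \rangle - \frac{1}{\gamma} \langle b^{(k)}, d - d^{(k)} \rangle + \frac{1}{2\gamma}\,\|Du - d\|_2^2,
\end{aligned}$$
Goldstein and Osher obtained the Split Bregman method [18]
$$\begin{aligned}
(u^{(k+1)}, d^{(k+1)}) &= \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ E(u, d) + \frac{1}{2\gamma}\,\|b^{(k)} + Du - d\|_2^2 \Big\}, \\
b^{(k+1)} &= b^{(k)} + Du^{(k+1)} - d^{(k+1)}.
\end{aligned} \tag{13}$$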
As already discovered in [19], the Split Bregman algorithm (13) is just the AL algorithm (11) with the only difference that in (13) the iterates $b^{(k)}$ are scaled by $\gamma$. Hence, we can conclude that the sequence $(\frac{1}{\gamma} b^{(k)})_{k \in \mathbb{N}}$ generated by the Split Bregman method (13) converges to a solution of the dual problem. The same holds true for the sequence $(p_d^{(k)})_{k \in \mathbb{N}}$ we get from (12). To summarize:
PP for (D) = AL for (P) = Split Bregman Alg.
Since the minimization problem in (13) is hard to solve, Goldstein and Osher [18] proposed the following alternating Split Bregman algorithm without a convergence proof:
$$u^{(k+1)} = \operatorname*{argmin}_{u \in H_1} \Big\{ g(u) + \frac{1}{2\gamma}\,\|b^{(k)} + Du - d^{(k)}\|_2^2 \Big\}, \tag{14}$$
$$d^{(k+1)} = \operatorname*{argmin}_{d \in H_2} \Big\{ \Phi(d) + \frac{1}{2\gamma}\,\|b^{(k)} + Du^{(k+1)} - d\|_2^2 \Big\}, \tag{15}$$
$$b^{(k+1)} = b^{(k)} + Du^{(k+1)} - d^{(k+1)}. \tag{16}$$
The next theorem identifies this alternating Split Bregman method as a special case of a DRS.
DRS for (D) = Alternating Split Bregman Alg.
If $H_1$ and $H_2$ are finite-dimensional it therefore provides us with a convergence result for the sequence $(b^{(k)})_{k \in \mathbb{N}}$ of this algorithm.
Theorem 3. The alternating Split Bregman algorithm coincides with the DRS algorithm applied to $(D)$ with $A := \partial(g^* \circ (-D^*))$ and $B := \partial\Phi^*$, where $\eta = 1/\gamma$ and
$$t^{(k)} = \eta\,(b^{(k)} + d^{(k)}), \qquad p^{(k)} = \eta\, b^{(k)}, \qquad k \ge 0. \tag{17}$$
Proof: 1. First, we show that for a proper, convex, lsc function $h : H_1 \to \mathbb{R} \cup \{+\infty\}$ and a bounded linear operator $K : H_1 \to H_2$ the following relation holds true:
$$\hat p = \operatorname*{argmin}_{p \in H_1} \Big\{ \frac{\eta}{2}\,\|Kp - q\|^2 + h(p) \Big\} \;\Rightarrow\; \eta\,(K\hat p - q) = J_{\eta\, \partial(h^* \circ (-K^*))}(-\eta q). \tag{18}$$
The first equality is equivalent to
$$0 \in \eta K^*(K\hat p - q) + \partial h(\hat p) \;\Leftrightarrow\; \hat p \in \partial h^*\big( -\eta K^*(K\hat p - q) \big).$$
Applying $-\eta K$ on both sides and adding $-\eta q$ implies
$$\begin{aligned}
-\eta K \hat p &\in -\eta K\, \partial h^*\big( -\eta K^*(K\hat p - q) \big) = \eta\, \partial\big( h^* \circ (-K^*) \big)\big( \eta (K\hat p - q) \big), \\
-\eta q &\in \big( I + \eta\, \partial( h^* \circ (-K^*) ) \big)\big( \eta (K\hat p - q) \big),
\end{aligned}$$
which is by the definition of the resolvent equivalent to the right equality in (18).
2. Applying (18) to (14) with $h := g$, $K := D$ and $q := d^{(k)} - b^{(k)}$ we get
$$\eta\,(b^{(k)} + Du^{(k+1)} - d^{(k)}) = J_{\eta A}\big( \eta\,(b^{(k)} - d^{(k)}) \big).$$
Assume that the alternating Split Bregman iterates and the DRS iterates coincide with the identification (17) up to some $k \in \mathbb{N}$. Using this induction hypothesis it follows that
$$\eta\,(b^{(k)} + Du^{(k+1)}) = J_{\eta A}\big( \underbrace{\eta\,(b^{(k)} - d^{(k)})}_{2p^{(k)} - t^{(k)}} \big) + \underbrace{\eta\, d^{(k)}}_{t^{(k)} - p^{(k)}} = t^{(k+1)}. \tag{19}$$
By definition of $b^{(k+1)}$ in (16) we see that $\eta\,(b^{(k+1)} + d^{(k+1)}) = t^{(k+1)}$. Next we apply (18) to (15) with $h := \Phi$, $K := I$ and $q := b^{(k)} + Du^{(k+1)}$ which gives together with (19)
$$\eta\,(b^{(k)} + Du^{(k+1)} - d^{(k+1)}) = J_{\eta B}\big( \underbrace{\eta\,(b^{(k)} + Du^{(k+1)})}_{t^{(k+1)}} \big) = p^{(k+1)}.$$
Again by the formula (16) for $b^{(k+1)}$ we obtain $\eta\, b^{(k+1)} = p^{(k+1)}$ which completes the proof. □
A similar result was shown in [20, 21].
4
Application to Image Denoising
In the following, we restrict our attention to a discrete setting. We consider digital images defined on $\{1, \ldots, n\} \times \{1, \ldots, n\}$ and reshape them columnwise into vectors $f \in \mathbb{R}^N$ with $N = n^2$. If not stated otherwise the multiplication of vectors, their square root etc. are meant componentwise. We will now apply the algorithms defined in Sections 2 and 3 to the discrete denoising problem of the form
$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \tfrac{1}{2}\,\|u - f\|_2^2 + \Phi(Du) \Big\}, \qquad D \in \mathbb{R}^{M,N}, \quad M \ge N, \tag{20}$$
where $\Phi$ is defined as in Section 2. Consider the alternating Split Bregman algorithm (14)-(16) with $g(u) := \frac12 \|u - f\|_2^2$. Theorem 3 implies the convergence of $(b^{(k)})_{k \in \mathbb{N}}$ and it is not hard to show that for this special choice of $g$, the sequence $(u^{(k)})_{k \in \mathbb{N}}$ converges to a solution of the primal problem. The quadratic functional in (14) with the above choice of $g$ can simply be minimized by setting its gradient to zero which results in
$$u^{(k+1)} = (\gamma I + D^T D)^{-1} \big( \gamma f + D^T (d^{(k)} - b^{(k)}) \big).$$
Goldstein and Osher proposed to calculate the inverse $(\gamma I + D^T D)^{-1}$ by Gauß-Seidel iterations. Applying (4) we see that for $\Phi = \Phi_1$ the solution of the proximation problem in (15) is given by $d^{(k+1)} = T_{\gamma\Lambda}(b^{(k)} + Du^{(k+1)})$. The following algorithm shows the case $\Phi = \Phi_1$. Observe that in order to better compare this method to the other algorithms in this section, we have changed the order in which we compute $u^{(k+1)}$. This is allowed because there are no restrictions on the choice of the starting values.
Algorithm (Alternating Split Bregman Shrinkage)
Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \ldots$ repeat until a stopping criterion is reached
$$\begin{aligned}
d^{(k+1)} &:= T_{\gamma\Lambda}(b^{(k)} + Du^{(k)}), \\
b^{(k+1)} &:= b^{(k)} + Du^{(k)} - d^{(k+1)}, \\
u^{(k+1)} &:= (\gamma I + D^T D)^{-1} \big( \gamma f + D^T (d^{(k+1)} - b^{(k+1)}) \big).
\end{aligned}$$
For $\Phi = \Phi_2$ we have to replace the soft shrinkage $T_{\gamma\Lambda}$ by the coupled shrinkage $\tilde T_{\gamma\tilde\Lambda}$. Note that this algorithm can also be used for the deblurring problem which differs from (20) in having a more general data-fitting term $g(u) := \frac12 \|Ku - f\|_2^2$ with some linear operator $K$. In this case one has to invert the matrix $\gamma K^T K + D^T D$ which can be diagonalized in many applications by FFT or DCT techniques, e.g., if it is circulant. The problem (20) can also be solved via its dual problem by
$$\hat u = f - D^T \hat b, \quad \text{where} \quad \hat b = \operatorname*{argmin}_{b \in \mathbb{R}^M} \Big\{ \tfrac{1}{2}\,\|f - D^T b\|_2^2 + \Phi_i^*(b) \Big\}, \quad i = 1, 2, \tag{21}$$
see, e.g., [22]. Applying the FBS algorithm (8) to the dual problem (21) gives
$$b^{(k+1)} = \operatorname{prox}_{\gamma\Phi_i^*}\big( b^{(k)} + \gamma D (f - D^T b^{(k)}) \big), \quad i = 1, 2,$$
where $0 < \gamma < 2/\|D^T D\|_2$. Using the relation (5) we obtain for $\Phi = \Phi_1$
$$b^{(k+1)} = b^{(k)} + \gamma D (f - D^T b^{(k)}) - T_\Lambda\big( b^{(k)} + \gamma D (f - D^T b^{(k)}) \big).$$
This yields the following algorithm to compute the minimizer of (20) for $\Phi = \Phi_1$:
Algorithm (FBS Shrinkage)
Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \ldots$ repeat until a stopping criterion is reached
$$\begin{aligned}
d^{(k+1)} &:= T_\Lambda\big( b^{(k)} + \gamma D u^{(k)} \big), \\
b^{(k+1)} &:= b^{(k)} + \gamma D u^{(k)} - d^{(k+1)}, \\
u^{(k+1)} &:= f - D^T b^{(k+1)}.
\end{aligned}$$
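Both shrinkage algorithms for $\Phi = \Phi_1$ translate almost line by line into code. The sketch below is a minimal illustration under our own assumptions: $D$ is a dense NumPy matrix, $\Lambda = \lambda I$ with a scalar $\lambda$, the linear system in the third step of the alternating Split Bregman iteration is solved directly with `numpy.linalg.solve` instead of Gauß-Seidel iterations, and a fixed number of iterations replaces the stopping criterion. All function names are hypothetical.

```python
import numpy as np

def soft_shrink(x, thr):
    """Soft shrinkage T from (4) with a scalar threshold."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def alternating_split_bregman_shrinkage(f, D, lam, gamma, n_iter):
    """Alternating Split Bregman Shrinkage for Phi_1 with Lambda = lam * I."""
    u = np.array(f, dtype=float)
    b = np.zeros(D.shape[0])
    system = gamma * np.eye(D.shape[1]) + D.T @ D
    for _ in range(n_iter):
        d = soft_shrink(b + D @ u, gamma * lam)
        b = b + D @ u - d
        # Direct solve; an assumption replacing Gauss-Seidel iterations.
        u = np.linalg.solve(system, gamma * f + D.T @ (d - b))
    return u

def fbs_shrinkage(f, D, lam, gamma, n_iter):
    """FBS Shrinkage; requires 0 < gamma < 2 / ||D^T D||_2."""
    u = np.array(f, dtype=float)
    b = np.zeros(D.shape[0])
    for _ in range(n_iter):
        d = soft_shrink(b + gamma * (D @ u), lam)
        b = b + gamma * (D @ u) - d
        u = f - D.T @ b
    return u
```

As a sanity check, for $D = I$ and $\gamma = 1$ both routines return the soft shrinkage of $f$ after the first iteration and remain there, which agrees with the fact that $\operatorname{prox}_{\Phi_1}(f) = T_\Lambda(f)$ solves (20) in this case.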
For the functional $\Phi_2$ we have to replace the shrinkage functional by $\tilde T_{\tilde\Lambda}$. This algorithm can also be deduced as a simple gradient descent reprojection algorithm as it was done, e.g., by Chambolle [2]. Note that this is not the often cited Chambolle algorithm in [22]. A relation of this method to the Bermúdez-Moreno algorithm which also turns out to be an FBS algorithm was shown in [23]. A connection to min-max duality was established in [24].
4.1
Besov-Norm Regularization
For a sufficiently smooth orthogonal wavelet basis {ψi }i∈I of L2 (Ω) with wavelets of more than one vanishing moment, problem (1) can be rewritten as 1 d − c2 2 + λd 1 , 2 where c := ( f, ψi )i and d := ( u, ψi )i . In the discrete setting, consider the orthogonal matrix W ∈ RN,N having as rows the filters of orthogonal wavelets (and scaling functions) up to a certain level. Then the minimization problem corresponding to (1) is given by 1 u − f 22 + ΛW u1 2 u∈RN 1 = argmin W u − W f 22 + ΛW u1 . 2 N u∈R
u ˆ = argmin
(22)
ˆ where The orthogonality of W yields further u ˆ = W T d, 1 dˆ = argmin d − c22 + Λd1 , 2 d∈RN
c := W f, Λ := λIN
(23)
and by (4) we obtain the known wavelet shrinkage procedure u ˆ = W T TΛ (W f ) consisting of a wavelet transform W followed by soft shrinkage TΛ of the wavelet coefficients and the inverse wavelet transform W T . However, for image processing tasks like denoising or segmentation, ordinary orthogonal wavelets are not suited due to their lack of translational invariance which leads to visible artefacts. Nevertheless, without the usual subsampling, the method becomes translationally invariant and the results can be improved. But then W ∈ RM,N , M = pN , where p is three times the decomposition level plus one for the rows belonging to the scaling function filters on the coarsest scale. We still have W T W = IN , but of course W W T = IM , i.e., the rows of W form a discrete Parseval frame on RN but not a basis. For the design of such frames see, e.g., [25, 26]. Equality (22) is still true for Parseval frames, but the problem is no longer equivalent to (23). Instead we can apply FBS shrinkage or alternating Split Bregman shrinkage with D = W and Φ = Φ1 . Note that in order to use the FBS algorithm, γ has to fulfill 0 < γ < 2/W TW 2 . Now W T W = IN , thus we have to choose γ in (0, 2) and γ = 1 is an admissible choice. It was shown in [27] that both algorithms coincide for D = W with W T W = IN and γ = 1:
Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage
Alternating Split Bregman Shrinkage = FBS Shrinkage
Moreover, the third step of both algorithms can be simplified to the frame synthesis step
\[ u^{(k+1)} = W^T d^{(k+1)}. \tag{24} \]

4.2 ROF Regularization
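In this section, we apply the algorithms presented so far to the discrete ROF denoising method. We use an appropriate discretization of the absolute value of the gradient. Let $h_0 := \frac12 [1 \;\; 1]$ and $h_1 := \frac12 [1 \;\; {-1}]$ be the filters of the Haar wavelet. For convenience of notation, we use periodic boundary conditions and the corresponding circulant matrices are denoted by $H_0 \in \mathbb{R}^{n,n}$ and $H_1 \in \mathbb{R}^{n,n}$. Then the following matrix fulfills $W^T W = I_N$ but $W W^T \neq I_{4N}$:
\[ W := \begin{pmatrix} H_0 \otimes H_0 \\ H_0 \otimes H_1 \\ H_1 \otimes H_0 \\ H_1 \otimes H_1 \end{pmatrix} = \begin{pmatrix} \mathbf{H}_0 \\ \mathbf{H}_1 \end{pmatrix}, \]
where $\mathbf{H}_0 := H_0 \otimes H_0 \in \mathbb{R}^{N,N}$ and $\mathbf{H}_1 \in \mathbb{R}^{3N,N}$ stacks the remaining three blocks.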
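In [4,5] it was shown that
\[ \left( \big((H_0 \otimes H_1) u\big)^2 + \big((H_1 \otimes H_0) u\big)^2 + \big((H_1 \otimes H_1) u\big)^2 \right)^{1/2} \]
is a consistent finite difference discretization of $|\nabla u|$. Using this gradient discretization, the discrete version of the ROF functional in (2) reads
\[ \operatorname*{argmin}_{u\in\mathbb{R}^N} \left\{ \frac12 \|u - f\|_2^2 + \big\| \tilde\Lambda\, |\mathbf{H}_1 u| \big\|_1 \right\}, \qquad \tilde\Lambda := \lambda I_N. \tag{25} \]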
Observe that if we use the alternating Split Bregman algorithm with $D = \mathbf{H}_1$ for this problem we have to solve a linear system of equations in the third step of each iteration. This problem can be avoided by using that $\mathbf{H}_1$ is part of a Parseval frame, cp. [27]. To this end we define the proper, convex and lsc functional $\tilde\Phi_2$ which differs from $\Phi_2$ in that the first part of the input vector is neglected, i.e.,
\[ \tilde\Phi_2(c) = \big\| \tilde\Lambda\, |c_1| \big\|_1 \quad \text{for } c = (c_0, c_1) \in \mathbb{R}^N \times \mathbb{R}^{3N}. \]
Now we can rewrite (25) as follows:
\[ \operatorname*{argmin}_{u\in\mathbb{R}^N} \left\{ \frac12 \|u - f\|_2^2 + \tilde\Phi_2(W u) \right\}. \]
Applying the alternating Split Bregman algorithm, or equivalently the FBS method, with γ = 1 and (24) we obtain the following algorithm.
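Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \ldots$ repeat until a stopping criterion is reached
\[ \begin{aligned}
d_0^{(k+1)} &:= (W u^{(k)})_0, \\
d_1^{(k+1)} &:= \tilde T_{\tilde\Lambda}\big( b^{(k)} + (W u^{(k)})_1 \big), \\
b^{(k+1)} &:= b^{(k)} + (W u^{(k)})_1 - d_1^{(k+1)}, \\
u^{(k+1)} &:= W^T \begin{pmatrix} d_0^{(k+1)} \\ d_1^{(k+1)} \end{pmatrix},
\end{aligned} \tag{26} \]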
where $(W u)_0 := \mathbf{H}_0 u$ and $(W u)_1 := \mathbf{H}_1 u$. Note that starting with $b_0^{(0)} := 0$ all iterates $b_0^{(k)}$ remain zero vectors. We also obtain algorithm (26) if we apply FBS shrinkage directly to (25) with $D = \mathbf{H}_1$ and $\gamma = 1$.

We now give a numerical example for these two algorithms. The computations were performed in MATLAB. In Fig. 1 we see the result of applying the two algorithms to a noisy image. Note that we only show the resulting image for algorithm (26) here, since the difference to the alternating Split Bregman method with $D = \mathbf{H}_1$ is marginal. We also found that the two algorithms need nearly the same number of iterations. However, algorithm (26) is extremely fast and does not require solving a linear system of equations as the alternating Split Bregman shrinkage does. Moreover, $\gamma = 1$ seems to be a very good parameter choice. For the above numerical experiment we used periodic boundary conditions; concerning Neumann boundary conditions, see, e.g., [28].

Fig. 1. Comparison of algorithm (26) and the alternating Split Bregman method with $D = \mathbf{H}_1$. Stopping criterion: $\|u^{(k+1)} - u^{(k)}\|_\infty < 0.5$. Top left: Original image. Top right: Noisy image (white Gaussian noise with standard deviation 25). Bottom left: Algorithm (26), $\lambda = 70$ (53 iterations). Bottom right: Difference to alternating Split Bregman shrinkage with $D = \mathbf{H}_1$ (53 iterations).
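The structure of iteration (26) can be sketched in a few lines of Python. The sketch below is a one-dimensional analogue under simplifying assumptions that are not part of the paper: a periodic 1D signal instead of an image, the undecimated Haar pair $h_0$, $h_1$ as Parseval frame, and plain componentwise soft shrinkage in place of the coupled shrinkage $\tilde T_{\tilde\Lambda}$. All function names are illustrative.

```python
def soft_shrink(x, lam):
    """Soft shrinkage of a single coefficient with threshold lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def analysis(u):
    """Periodic undecimated Haar analysis; returns (H0 u, H1 u)."""
    n = len(u)
    low = [0.5 * (u[i] + u[(i + 1) % n]) for i in range(n)]
    high = [0.5 * (u[i] - u[(i + 1) % n]) for i in range(n)]
    return low, high


def synthesis(d0, d1):
    """Frame synthesis W^T d = H0^T d0 + H1^T d1 (index -1 wraps around)."""
    n = len(d0)
    return [0.5 * (d0[i] + d0[i - 1]) + 0.5 * (d1[i] - d1[i - 1])
            for i in range(n)]


def frame_shrinkage(f, lam, iterations=200):
    """1D analogue of iteration (26) with gamma = 1 (assumed simplification)."""
    u = list(f)
    b = [0.0] * len(f)
    for _ in range(iterations):
        d0, wu1 = analysis(u)                                   # d0 = (W u)_0
        d1 = [soft_shrink(bi + wi, lam) for bi, wi in zip(b, wu1)]
        b = [bi + wi - di for bi, wi, di in zip(b, wu1, d1)]    # Bregman update
        u = synthesis(d0, d1)                                   # u = W^T d
    return u


noisy_step = [0.2, -0.1, 0.1, 0.0, 4.1, 3.9, 4.0, 4.2]
denoised = frame_shrinkage(noisy_step, lam=0.5)
print(denoised)
```

No linear system is solved in the loop: each pass consists of one analysis step, one shrinkage and one synthesis step. In this sketch a constant signal is a fixed point and the sum of the signal is preserved, since the high-pass synthesis $H_1^T d_1$ has zero sum.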
References
1. DeVore, R.A., Lucier, B.J.: Fast wavelet techniques for near-optimal image processing. In: IEEE MILCOM 1992 Conf. Rec., vol. 3, pp. 1129–1135. IEEE Press, San Diego (1992)
2. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
3. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
4. Mrázek, P., Weickert, J.: Rotationally invariant wavelet shrinkage. In: Michaelis, B., Krell, G. (eds.) DAGM 2003. LNCS, vol. 2781, pp. 156–163. Springer, Heidelberg (2003)
5. Welk, M., Steidl, G., Weickert, J.: Locally analytic schemes: A link between diffusion filtering and wavelet shrinkage. Applied and Computational Harmonic Analysis 24, 195–224 (2008)
6. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979)
7. Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM Journal on Control and Optimization 29, 119–138 (1991)
8. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
9. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation 4, 1168–1200 (2005)
10. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society 82(2), 421–439 (1956)
11. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7(3), 200–217 (1967)
12. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Mathematics of Operations Research 18(1), 202–226 (1993)
13. Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM Journal on Control and Optimization 35(4), 1142–1168 (1997)
14. Frick, K.: The Augmented Lagrangian Method and Associated Evolution Equations. Dissertation, University of Innsbruck (2008)
15. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research 1(2), 97–116 (1976)
16. Browder, F.E., Petryshyn, W.V.: The solution by iteration of nonlinear functional equations in Banach spaces. Bulletin of the American Mathematical Society 72, 571–575 (1966)
17. Iusem, A.N.: Augmented Lagrangian methods and proximal point methods for convex optimization. Investigación Operativa 8, 11–49 (1999)
18. Goldstein, T., Osher, S.: The Split Bregman method for L1 regularized problems. UCLA CAM Report (2008)
19. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing. SIAM Journal on Imaging Sciences 1(1), 143–168 (2008)
20. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming 55, 293–318 (1992)
21. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary–Value Problems. Studies in Mathematics and its Applications, vol. 15, pp. 299–331. North–Holland, Amsterdam (1983)
22. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20, 89–97 (2004)
23. Aujol, J.F.: Some first-order algorithms for total variation based image restoration. Preprint ENS Cachan (2008)
24. Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report (2008)
25. Daubechies, I., Han, B., Ron, A., Shen, Z.: Framelets: MRA-based construction of wavelet frames. Applied and Computational Harmonic Analysis 14, 1–46 (2003)
26. Dong, B., Shen, Z.: Pseudo-splines, wavelets and framelets. Applied and Computational Harmonic Analysis 22, 78–104 (2007)
27. Setzer, S., Steidl, G.: Split Bregman method, gradient descent reprojection method and Parseval frames. Preprint Univ. Mannheim (2008)
28. Chan, R.H., Setzer, S., Steidl, G.: Inpainting by flexible Haar-wavelet shrinkage. SIAM Journal on Imaging Sciences 1, 273–293 (2008)
29. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3), 248–272 (2008)
Anisotropic Smoothing Using Double Orientations

Gabriele Steidl and Tanja Teuber

University of Mannheim, A5, 68131 Mannheim, Germany
[email protected], [email protected]
http://kiwi.math.uni-mannheim.de

Abstract. To improve the quality of image restoration methods, directional information has recently been incorporated into the restoration process. In this paper, we propose a two-step procedure for denoising images that is particularly suited to recover sharp vertices and X junctions in the presence of heavy noise. In the first step, we estimate the (smoothed) orientations of the image structures, where we find the double orientations at vertices and X junctions using a model of Aach et al. Based on shape preservation considerations, this directional information is then used to establish an energy functional which is minimized in the second step. We discuss the behavior of our new method in comparison with single direction approaches appearing, e.g., when using the classical structure tensor of Förstner and Gülch, and demonstrate the very good performance of our method by numerical examples.
1 Introduction
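Recently, much effort has been put into improving image restoration processes by involving directional information. Our paper contributes to this topic. We restrict our attention to the denoising of images $f \in L_2(\mathbb{R}^2)$ corrupted by heavy white Gaussian noise and the minimization of energy functionals
\[ \operatorname*{arg\,min}_{u\in L_2} \left\{ \frac12 \|f - u\|_{L_2}^2 + \lambda J(u) \right\}, \tag{1} \]
where $J : L_2 \to \mathbb{R}_{\ge 0} \cup \{+\infty\}$ denotes a proper, convex, closed functional which is in addition positively homogeneous. Frequently applied examples of such functionals are
\[ J(u) := \begin{cases} \int_{\mathbb{R}^2} \varphi(\nabla u)\, dx, & u \in BV_\varphi, \\ \infty, & u \in L_2 \setminus BV_\varphi, \end{cases} \tag{2} \]
where $\varphi(x) = \varphi_1(x) := |x_1| + |x_2|$ as in [1, 2] or $\varphi(x) = \varphi_2(x) := \sqrt{x_1^2 + x_2^2}$ as in the Rudin-Osher-Fatemi (ROF) model [3]. Here $BV_\varphi(\mathbb{R}^2) := \{u \in L_2(\mathbb{R}^2) : \int_{\mathbb{R}^2} \varphi(\nabla u)\, dx < \infty\}$ denotes the (anisotropic) space of functions of bounded variation equipped with the norm
\[ \int_{\mathbb{R}^2} \varphi(\nabla u)\, dx := \sup_{\substack{V \in C_c^1(\mathbb{R}^2, \mathbb{R}^2) \\ V \in W_\varphi \text{ a.e.}}} \; -\int_{\mathbb{R}^2} u(x)\, \operatorname{div} V(x)\, dx, \tag{3} \]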
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 477–489, 2009. © Springer-Verlag Berlin Heidelberg 2009
where the Wulff shape $W_\varphi := \{x \in \mathbb{R}^2 : \langle x, y\rangle \le \varphi(y) \ \forall y \in \mathbb{R}^2\}$ of $\varphi$ is the unit square with horizontal and vertical edges in case $\varphi = \varphi_1$ and the unit circle for $\varphi = \varphi_2$. Note that other positively homogeneous, finite, convex, even functions $\varphi$ with $\varphi(0) = 0$ and $\varphi(x) > 0$ for $x \neq 0$ can be used in (2) and that the spaces $BV_\varphi$ are equivalent for all these functions [4]. Besides (2) we will also apply inf convolution functionals
\[ J(u) := (J_1 \Box J_2)(u) := \inf_{u = u_1 + u_2} \{ J_1(u_1) + J_2(u_2) \}, \tag{4} \]
where $J_1$, $J_2$ are nonnegative, proper, convex, closed and positively homogeneous. A possible choice for $J_1$ and $J_2$ suggested, e.g., in [5] is $\int_{\mathbb{R}^2} |\partial_x u_1|\, dx$ and $\int_{\mathbb{R}^2} |\partial_y u_2|\, dx$. It is well known that for large regularization parameters $\lambda$ model (2) with $\varphi_1$ and similarly the above inf convolution model tend to cut vertices vertically and horizontally while the ROF approach rounds them. Therefore we propose to introduce local directional information obtained from the double direction tensors of Aach et al. [6] into these functionals.

Outline of our paper. In Sec. 2 we recall the single orientation estimations provided by the structure tensor in [7]. Then we turn to the double orientation estimations proposed in [6], where we get some additional insights on the nullspaces of these tensors. In Sec. 3 we start with shape preservation facts as motivation for the subsequent introduction of our new directional denoising model. Furthermore, we discuss our orientation choice in comparison to the classical structure tensor. The good performance of our method is demonstrated by numerical examples in Sec. 4. Conclusions are given in Sec. 5. More details including proofs are contained in the accompanying preprint [8].

Related work. Image restoration by first approximating the local geometry and then incorporating it into the restoration process was suggested in various papers. A group of methods retrieves the local geometry by computing the Gülch/Förstner structure tensor and then uses its eigenvalues and orthogonal eigenvectors to define a diffusion tensor which steers the direction of the flux in PDEs. Tschumperlé [9] divided these methods into divergence-based [10], trace-based [11] and his curvature-based methods. The first approach is also related to the minimization of specific energy functionals, see, e.g., [12, 13]. The curvature-based method [14, 9], which is related to the line integral convolution [15], is better suited for the restoration of sharp edges than the other two methods, but our method is superior in the presence of heavy noise. Note that as in [16] the curvature-based method can include multiple directions. Various papers deal with the smoothing of normal vectors by minimizing certain energy functionals [17, 18, 19, 20, 21, 22] and use this information for subsequent denoising. In general these minimization procedures are much more expensive than our double direction approach. Kimmel, Sochen et al. suggested restoration techniques within the Beltrami framework [23]. The corresponding smoothing with the so-called 'short-time Beltrami kernel' differs from the bilateral filters [24] in that it uses geodesic distances on the image manifold while the bilateral kernel applies Euclidean distances. In [25], the authors considered special images containing rotated rectangles and established a single functional both for finding the rotation angles and for denoising. However, the resulting algorithm is again a two-step procedure. For a simpler two-step approach we refer to [26]. So far, the best results after those of our new method were obtained by applying nonlocal means [27, 28]. An example is reported in Sec. 4.
2 Orientation Estimations

2.1 Single Orientation Estimations
Let $\Omega \subset \mathbb{R}^2$ be the image part of interest. For simplicity, we assume that $\Omega := B_\varepsilon(0)$ is the ball around 0 with radius $\varepsilon$. Our ideal assumption is that this part of the image corresponds to a function $f : \Omega \to \mathbb{R}$ which has constant values along a single direction $r$ with $\|r\|_2 = 1$, i.e., $f = \varphi(s^T \cdot)$ with $s := r^\perp = (r_2, -r_1)^T$ and $\varphi : [-\varepsilon, \varepsilon] \to \mathbb{R}$. Then
\[ 0 = \frac{\partial}{\partial r} f(x) = r^T \nabla f(x) = r^T \varphi'(s^T x)\, s \quad \forall x \in \Omega \]
holds true and we also have for a nonnegative weight function $w : \Omega \to \mathbb{R}$ that
\[ 0 = \int_\Omega w(x)\, (r^T \nabla f(x))^2\, dx = r^T \left( \int_\Omega w(x)\, \nabla f(x) \nabla f(x)^T\, dx \right) r. \tag{5} \]
If $\varphi$ is not constant, then the symmetric, positive semidefinite matrix
\[ J := \int_\Omega w(x)\, \nabla f(x) \nabla f(x)^T\, dx = \int_\Omega w(x)\, (\varphi'(s^T x))^2\, dx \; s s^T \]
has rank one and $r$ is an eigenvector of the eigenvalue 0. So far we have considered image parts with an ideal directional behavior. Since in applications we deal with noisy images, a pre-smoothing step with the 2D Gaussian $K_\sigma$ of standard deviation $\sigma$ is performed before computing the gradient in $J$. Thus, (5) holds at least approximately and $r$ is the minimizer of the weighted least squares expression $r^T J r$ subject to $\|r\|_2 = 1$, i.e., the eigenvector belonging to the smallest eigenvalue of $J$. Moreover, in natural images the significant directions vary in different image parts. To detect the direction in the neighborhood of every image point $x$, we use the shifted Gaussian $w = K_\rho(\cdot - x)$ (truncated outside $B_{3\rho}(x)$). In this way, we can attach to each image point a $2 \times 2$ matrix, the so-called structure tensor
\[ J_\rho := K_\rho * (\nabla f_\sigma \nabla f_\sigma^T), \qquad \nabla f_\sigma := \nabla (K_\sigma * f). \]
If the eigenvalues of $J_\rho(x)$ fulfill $\lambda_1 \ll \lambda_2$, then we are in the neighborhood of an edge and the orthogonal eigenvectors $r_1 = r$ and $r_2 = r^\perp$ approximate the isophote direction and the gradient direction in $x$. In the neighborhood of vertices, where $\lambda_2 \ge \lambda_1 \gg 0$, we obtain smoothed eigenvectors between neighboring edges. This causes artefacts in restoration models involving these directions. Therefore we are interested in double orientations.
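The single orientation estimate can be illustrated with a minimal Python sketch. It works under assumptions that go beyond the text: the ideal patch $f = \varphi(s^T \cdot)$ with the hypothetical choice $\varphi = \sin$, analytically known gradients instead of Gaussian pre-smoothing, and a sampled weighted sum in place of the integral in $J$. The closed-form eigenvector of a symmetric $2 \times 2$ matrix is used; the function names are illustrative.

```python
import math


def structure_tensor(gradients, weights):
    """Entries (j11, j12, j22) of J = sum_k w_k * g_k g_k^T."""
    j11 = j12 = j22 = 0.0
    for (gx, gy), w in zip(gradients, weights):
        j11 += w * gx * gx
        j12 += w * gx * gy
        j22 += w * gy * gy
    return j11, j12, j22


def smallest_eigenvector(j11, j12, j22):
    """Unit eigenvector of the smallest eigenvalue of [[j11, j12], [j12, j22]]."""
    # Angle of the eigenvector belonging to the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * j12, j11 - j22)
    # The eigenvector of the smallest eigenvalue is orthogonal to it.
    return (-math.sin(theta), math.cos(theta))


# Ideal single-orientation patch f(x) = phi(s^T x) with phi = sin (assumption).
r = (math.cos(math.pi / 6), math.sin(math.pi / 6))   # true isophote direction
s = (r[1], -r[0])                                     # s = r^perp
points = [(0.1 * i, 0.1 * j) for i in range(-5, 6) for j in range(-5, 6)]
gradients = [tuple(math.cos(s[0] * x + s[1] * y) * c for c in s)
             for x, y in points]                      # grad f = phi'(s^T x) s
weights = [math.exp(-(x * x + y * y)) for x, y in points]

estimate = smallest_eigenvector(*structure_tensor(gradients, weights))
print(estimate, r)   # the two vectors agree up to sign
```

For noisy data the gradients would come from a smoothed image and $J$ would have full rank, so the eigenvector of the smallest eigenvalue is only an approximation of $r$.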
2.2 Double Orientation Estimations
Assume that $f$ can be decomposed into two functions $f_i = \varphi_i(s_i^T \cdot)$ with $s_i := r_i^\perp$, $i = 1, 2$, where $r_1 \nparallel r_2$. As in Fig. 1, we consider two decompositions of $f$, the transparent model
\[ f(x) = f_1(x) + f_2(x) \quad \forall x \in \Omega \tag{6} \]
and the occlusion model with $\Omega = \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$ and
\[ f(x) = \begin{cases} f_1(x) & \text{for } x \in \Omega_1, \\ f_2(x) & \text{for } x \in \Omega_2. \end{cases} \tag{7} \]

Fig. 1. Illustration of the transparent model (left) and the occlusion model (right)
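Transparent model. By the definition of $f_1$ and $f_2$ we conclude for all $x \in \Omega$ that
\[ 0 = \frac{\partial^2}{\partial r_1 \partial r_2} \big( f_1(x) + f_2(x) \big) = \frac{\partial^2}{\partial r_1 \partial r_2} f(x) = r_2^T H(x)\, r_1 = r_1^T H(x)\, r_2 \tag{8} \]
with the Hessian $H(x)$ of $f$ at $x$. Applying tensor products $\otimes$ of matrices, (8) becomes
\[ 0 = (r_1 \otimes r_2)^T h(x) = (r_2 \otimes r_1)^T h(x) \quad \text{with } h := (\partial_{xx} f, \partial_{xy} f, \partial_{xy} f, \partial_{yy} f)^T \tag{9} \]
and since this holds true for all $x \in \Omega$ we also get
\[ 0 = \int_\Omega w(x)\, (r_1 \otimes r_2)^T h(x) h(x)^T (r_1 \otimes r_2)\, dx = (r_1 \otimes r_2)^T\, T\, (r_1 \otimes r_2) \tag{10} \]
with the symmetric, positive semidefinite matrix $T := \int_\Omega w(x)\, h(x) h(x)^T\, dx \in \mathbb{R}^{4,4}$. By (10) and since $r_1 \nparallel r_2$, the vectors $r_1 \otimes r_2$ and $r_2 \otimes r_1$ are two linearly independent eigenvectors of the eigenvalue 0 of $T$. Instead of determining the directions $r_1$ and $r_2$ via (10), Aach et al. [6] proposed to rewrite (9) by skipping the double entry $\partial_{xy} f$ in $h$ as
\[ 0 = \mathbf{r}^T \tilde h(x) \quad \text{with } \tilde h := (\partial_{xx} f, \partial_{xy} f, \partial_{yy} f)^T, \quad \mathbf{r} := (r_{11} r_{21},\; r_{11} r_{22} + r_{12} r_{21},\; r_{12} r_{22})^T. \tag{11} \]
Then our determining equation (10) becomes
\[ 0 = \mathbf{r}^T\, T\, \mathbf{r} \quad \text{with } T := \int_\Omega w(x)\, \tilde h(x) \tilde h(x)^T\, dx \in \mathbb{R}^{3,3} \tag{12} \]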
and $\mathbf{r}$ is an eigenvector of the eigenvalue 0 of the symmetric, positive semidefinite matrix $T$. More precisely, we can prove that $T = S \Phi S^T$ with $S := (s_1 \tilde\otimes s_1,\; s_2 \tilde\otimes s_2)$, $v \tilde\otimes v := (v_1^2, v_1 v_2, v_2^2)^T$ and
\[ \Phi := \int_\Omega w(x) \begin{pmatrix} (\varphi_1''(s_1^T x))^2 & \varphi_1''(s_1^T x)\, \varphi_2''(s_2^T x) \\ \varphi_1''(s_1^T x)\, \varphi_2''(s_2^T x) & (\varphi_2''(s_2^T x))^2 \end{pmatrix} dx \]
so that
rank $T = 0$ if $\varphi_i \in \Pi_1$, $i = 1, 2$,
rank $T = 1$ if $\varphi_i \in \Pi_1$ for exactly one $i$ or $\varphi_i \in \Pi_2 \setminus \Pi_1$ for $i = 1, 2$,
rank $T = 2$ otherwise,
where $\Pi_n$ denotes the space of polynomials on $[-\varepsilon, \varepsilon]$ of degree $\le n$. If rank $T = 2$ (vertex case), then the nullspace of $T$ is $N(T) = \{c\, \mathbf{r} : c \in \mathbb{R}\}$. If rank $T = 1$ (edge case) and $\varphi_2$ is linear but $\varphi_1$ not, then $N(T) = \{(r_{11} c_1,\; r_{11} c_2 + r_{12} c_1,\; r_{12} c_2)^T : c = (c_1, c_2)^T \in \mathbb{R}^2\}$, i.e., $c$ plays the role of $r_2$ in (11). There exist several possibilities to detect the directions $r_i$, $i = 1, 2$, from an eigenvector $u = (u_1, u_2, u_3)^T \in N(T)$. For example, it is not hard to check that the following setting from [6] does the job: For $u_1 \neq 0$ set
\[ r_1 := \frac{1}{\sqrt{u_1^2 + y_1^2}}\, (u_1, y_1)^T, \qquad r_2 := \frac{1}{\sqrt{u_1^2 + y_2^2}}\, (u_1, y_2)^T, \]
where $y_i$, $i = 1, 2$, are the solutions of the quadratic equation $y^2 - u_2 y + u_1 u_3 = 0$. If $u_1 = 0$, then $y_i = 0$ for one $i$ and we set $r_i := \frac{1}{\sqrt{u_2^2 + u_3^2}}\, (u_2, u_3)^T$ and $r_{3-i} := (0, 1)^T$.
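The recovery of $r_1$, $r_2$ from a nullspace vector can be written down directly. The following Python sketch follows the setting from [6] quoted above; the function names, the tolerance `tol` and the clamping of the discriminant against rounding noise are illustrative assumptions. The directions are returned up to sign and order.

```python
import math


def _unit(v):
    """Normalize a 2D vector to unit length."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def mixed_orientation(r1, r2):
    """Vector (r11 r21, r11 r22 + r12 r21, r12 r22)^T from (11)."""
    return (r1[0] * r2[0], r1[0] * r2[1] + r1[1] * r2[0], r1[1] * r2[1])


def directions_from_nullvector(u, tol=1e-12):
    """Recover two unit directions from u = c * mixed_orientation(r1, r2)."""
    u1, u2, u3 = u
    if abs(u1) > tol:
        # Roots of y^2 - u2*y + u1*u3 = 0; clamp tiny negative discriminants.
        disc = max(u2 * u2 - 4.0 * u1 * u3, 0.0)
        root = math.sqrt(disc)
        y1, y2 = 0.5 * (u2 + root), 0.5 * (u2 - root)
        return _unit((u1, y1)), _unit((u1, y2))
    # u1 = 0: one direction is vertical, the other is given by (u2, u3).
    return _unit((u2, u3)), (0.0, 1.0)


r1 = (math.cos(math.radians(20)), math.sin(math.radians(20)))
r2 = (math.cos(math.radians(100)), math.sin(math.radians(100)))
u = tuple(3.0 * c for c in mixed_orientation(r1, r2))   # arbitrary scaling
print(directions_from_nullvector(u))
```

The discriminant equals $c^2 (r_{11} r_{22} - r_{12} r_{21})^2$ for $u = c\, \mathbf{r}$, so the roots are real for exact data; for an eigenvector estimated from noisy data this is not guaranteed, which is why the sketch clamps it at zero.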
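In the following, we choose as direction $r_1$ the one fulfilling $|\langle r_1, \nabla f_{\tilde\sigma}\rangle| \le |\langle r_2, \nabla f_{\tilde\sigma}\rangle|$. In particular, $r_1$ is the isophote direction at edges, where some vector $c$ plays the role of $r_2$.

Occlusion model. By the definition of $f_1$ and $f_2$ we conclude for all $x \in \Omega$ that
\[ 0 = \frac{\partial}{\partial r_1} f(x)\, \frac{\partial}{\partial r_2} f(x) = (r_1^T \nabla f(x))\, (r_2^T \nabla f(x)) = r_1^T\, \nabla f(x) \nabla f(x)^T\, r_2 \tag{13} \]
and by rewriting the equation using tensor products that
\[ 0 = (r_2 \otimes r_1)^T g(x) = (r_1 \otimes r_2)^T g(x) \quad \text{with } g := \big( (\partial_x f)^2,\; \partial_x f\, \partial_y f,\; \partial_x f\, \partial_y f,\; (\partial_y f)^2 \big)^T. \]
This reads in the reduced form with $\mathbf{r}$ defined by (11) as
\[ 0 = \mathbf{r}^T \tilde g(x) \quad \text{with } \tilde g := \big( (\partial_x f)^2,\; \partial_x f\, \partial_y f,\; (\partial_y f)^2 \big)^T. \]
Since this relation is true for all $x \in \Omega$, we also have that
\[ 0 = \mathbf{r}^T\, C\, \mathbf{r} \quad \text{with } C := \int_\Omega w(x)\, \tilde g(x) \tilde g(x)^T\, dx. \tag{14} \]
Thus, $\mathbf{r}$ is an eigenvector of the eigenvalue 0 of the symmetric, positive semidefinite matrix $C$. More precisely, we can prove that
\[ C = \alpha_1\, (s_1 \tilde\otimes s_1)(s_1 \tilde\otimes s_1)^T + \alpha_2\, (s_2 \tilde\otimes s_2)(s_2 \tilde\otimes s_2)^T \quad \text{with } \alpha_i := \int_{\Omega_i} w(x)\, (\varphi_i'(s_i^T x))^4\, dx, \; i = 1, 2, \]
so that the rank of $C$ is $\nu \in \{0, 1, 2\}$ if exactly $2 - \nu$ of the functions $\varphi_i$ are constant on $\Omega_i$, $i = 1, 2$. The directions $r_i$, $i \in \{1, 2\}$, can be obtained from an eigenvector of $N(C)$ as in the transparent model.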
Fig. 2. Noisy images and their double orientation estimations by the occlusion model (left) and by the transparent model (right)
Double orientation tensors. In practice, we deal with noisy images having image parts with various significant directions. As for the classical structure tensor, the double orientation tensors are defined as
\[ T_\rho := K_\rho * (\tilde h_\sigma \tilde h_\sigma^T), \qquad C_\rho := K_\rho * (\tilde g_\sigma \tilde g_\sigma^T), \]
where $\tilde h_\sigma := (\partial_{xx} f_\sigma, \partial_{xy} f_\sigma, \partial_{yy} f_\sigma)^T$, $\tilde g_\sigma := \big( (\partial_x f_\sigma)^2,\; \partial_x f_\sigma\, \partial_y f_\sigma,\; (\partial_y f_\sigma)^2 \big)^T$, and the directions $r_1$, $r_2$ can be derived from an eigenvector of the smallest eigenvalue of $T_\rho(x)$ resp. $C_\rho(x)$. For an example of estimated double orientations see Fig. 2.
3 Image Restoration and Shape Preservation
We start with a proposition which characterizes the solution of (1).

Proposition 1. The function $\hat u \in L_2$ is the solution of the minimization problem (1) iff
i) $\hat u = f - \lambda \hat v$,
ii) $\hat v \in C_J := \{v \in L_2 : \langle v, w\rangle \le J(w) \ \forall w \in L_2\}$,
iii) $\langle \hat u, \hat v\rangle = J(\hat u)$.
For the special functional (2) we have that $\hat v \in C_J$ if there exists a vector field $\hat V \in L_\infty(\mathbb{R}^2, \mathbb{R}^2)$ such that $\hat v := -\operatorname{div} \hat V \in L_2(\mathbb{R}^2)$ and $\hat V \in W_\varphi$ a.e. on $\mathbb{R}^2$.

Using this proposition, one can prove that rectangles with horizontal and vertical edges [4] and + junctions [8] are preserved by the solution of (1) with (2) and $\varphi = \varphi_1$.

Corollary 1. The solution $\hat u$ of (1) with (2) and $\varphi = \varphi_1$ reads
i) for $f := c\, 1_\Omega$ with the characteristic function $1_\Omega$ of $\Omega := (-a, a) \times (-b, b)$ as
\[ \hat u = \Big( c - \lambda\, \frac{a + b}{ab} \Big) 1_\Omega, \qquad \lambda \le \frac{c\, a b}{a + b}, \quad a, b > 0, \]
ii) for $f := c_1 1_{\Omega_1} + c_2 1_{\Omega_2}$ with $\Omega_1 := (-l, l) \times (-a, a)$, $\Omega_2 := (-b, b) \times (-l, l)$ as
\[ \hat u = \Big( c_1 - \lambda\, \frac{l + a}{la} \Big) 1_{\Omega_1} + \Big( c_2 - \lambda\, \frac{l + b}{lb} \Big) 1_{\Omega_2}, \qquad \lambda \le \min\Big\{ \frac{c_1\, l a}{l + a}, \frac{c_2\, l b}{l + b} \Big\}, \quad l > a, b > 0. \]

In this paper, we propose to modify (2) (and similarly (4)) by locally including directions. The basic idea is that the minimizer of the modified functional also preserves shapes as, e.g., shown in Fig. 3 and arbitrary X junctions.

Fig. 3. Original and noisy trapezoid image (standard deviation 150)

This modification can be motivated by the following considerations for a globally fixed transform matrix $R$: Substituting $x := R^{-1} t$, $f_R := f(R^{-1} \cdot)$, $u_R := u(R^{-1} \cdot)$, we obtain
\[ \begin{aligned}
\int_{\mathbb{R}^2} \frac12 (f - u)^2 + \lambda \varphi(\nabla u)\, dx
&= \frac{1}{|\det R|} \int_{\mathbb{R}^2} \frac12 \big( f(R^{-1} t) - u(R^{-1} t) \big)^2 + \lambda\, \varphi\big( \nabla_x u(R^{-1} t) \big)\, dt \\
&= \frac{1}{|\det R|} \int_{\mathbb{R}^2} \frac12 \big( f_R(t) - u_R(t) \big)^2 + \lambda\, \varphi\big( R^T \nabla_t u_R(t) \big)\, dt.
\end{aligned} \]
Hence, if $\hat u$ minimizes the left-hand side, then the transformed image $\hat u_R := \hat u(R^{-1} \cdot)$ is a minimizer of
\[ \frac12 \int_{\mathbb{R}^2} (f_R - u)^2\, dx + \lambda \int_{\mathbb{R}^2} \varphi(R^T \nabla u)\, dx. \tag{15} \]
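The key step in the substitution leading to (15) is the chain rule identity $\nabla_x u(R^{-1} t) = R^T \nabla_t u_R(t)$. A small numerical check in Python may make this concrete. The test function `u`, the matrix `R` and the point `t` are hypothetical choices made only for illustration, and the gradients are approximated by central differences.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(m, v):
    """Matrix-vector product for a 2x2 matrix and a 2D vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))


def gradient(func, p, h=1e-5):
    """Central difference approximation of the gradient of func at p."""
    x, y = p
    return ((func((x + h, y)) - func((x - h, y))) / (2 * h),
            (func((x, y + h)) - func((x, y - h))) / (2 * h))


def u(p):
    """Hypothetical smooth test image."""
    x, y = p
    return x * x * y + 3.0 * y


R = ((1.0, 0.5), (0.0, 2.0))      # hypothetical shear and scaling
R_inv = inverse_2x2(R)


def u_R(t):
    """Transformed image u_R(t) = u(R^{-1} t)."""
    return u(matvec(R_inv, t))


t = (0.7, -0.4)
lhs = gradient(u, matvec(R_inv, t))             # grad_x u at x = R^{-1} t
rhs = matvec(transpose(R), gradient(u_R, t))    # R^T grad_t u_R at t
print(lhs, rhs)
```

Both vectors agree up to the finite difference error, which is why the regularizer of the transformed problem carries the factor $R^T$ inside $\varphi$.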
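In the following, we consider discrete square images $f := (f(x, y))_{x,y=0}^{n-1} \in \mathbb{R}^{n,n}$ in their columnwise reshaped form $f \in \mathbb{R}^N$, $N := n^2$. Instead of partial derivatives we use forward differences so that the discrete version of the gradient reads
\[ D = \begin{pmatrix} D_x \\ D_y \end{pmatrix} := \begin{pmatrix} H_0 \otimes H_1 \\ H_1 \otimes H_0 \end{pmatrix}, \quad
H_0 := \frac12 \begin{pmatrix} 1 & 1 & & & \\ & 1 & 1 & & \\ & & \ddots & \ddots & \\ & & & 1 & 1 \\ & & & & 2 \end{pmatrix}, \quad
H_1 := \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ & & & & 0 \end{pmatrix}. \]
Then problem (1) becomes
\[ \operatorname*{arg\,min}_{u\in\mathbb{R}^N} \left\{ \frac12 \|f - u\|_2^2 + \lambda J(u) \right\} \tag{16} \]
and (2) with $\varphi = \varphi_1$ resp. (4) with $J_1(u_1) := \int_{\mathbb{R}^2} |\partial_x u_1|\, dx$ and $J_2(u_2) := \int_{\mathbb{R}^2} |\partial_y u_2|\, dx$ read as
\[ J(u) := \|D u\|_1, \tag{17} \]
resp.
\[ J(u) := \min_{u = u_1 + u_2} \{ \|D_x u_1\|_1 + \|D_y u_2\|_1 \}. \tag{18} \]
The solution of (16) can be characterized as in the continuous setting: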
Fig. 4. Denoising with the directions $r$, $r^\perp$ from the classical structure tensor. Left: Angle of $r$ mod 180° ($\sigma = 2.5$, $\rho = 5$). The directions are smoothed near vertices following the shortest path between neighboring edge directions. Middle: Denoising result using only one direction $R := (r)$ ($\lambda = 2500$). Following this direction, obtuse vertices are rounded, while the acute one is elongated. Right: Denoising result using both directions $R = (r_1, r_2) = (r, r^\perp)$ ($\lambda = 1000$). The edges of the minimizer $\hat u$ tend to be aligned with one of the directions $r_i$, i.e., one of the summands $|\langle r_i, \nabla \hat u\rangle|$, $i = 1, 2$, becomes very small. Hence, rounding artefacts are visible at obtuse vertices, while the model decides for the wrong direction at the acute vertex, which leads to a cut-off artefact.
Proposition 2. The vector $\hat u \in \mathbb{R}^N$ is the solution of the minimization problem (16) if and only if i) - iii) of Proposition 1 hold true, where $L_2$ has to be replaced by $\mathbb{R}^N$ with the Euclidean inner product. For the special functionals (17) and (18) we have that $\hat v \in C_J$ if and only if there exists a vector $\hat V = \big( (\hat V^{(1)})^T, (\hat V^{(2)})^T \big)^T \in \mathbb{R}^{2N}$ such that
\[ \hat v := D^T \hat V \quad \text{and} \quad \|\hat V\|_\infty \le 1, \]
resp.,
\[ \hat v := D_x^T \hat V^{(1)} = D_y^T \hat V^{(2)} \quad \text{and} \quad \|\hat V\|_\infty \le 1. \]

As in the continuous case rectangles and + junctions are preserved by the solution of (16) with (17). However, due to image boundaries one has to be careful with the discretization.

Corollary 2. Let $x_0, y_0 \ge 0$ and $x_0 + a,\, y_0 + b \le n - 2$. The solution $\hat u$ of the minimization problem (16) with $J$ defined by (17) reads for
i) $f := c\, 1_\Omega$ with $\Omega := \{x_0 + 1, \ldots, x_0 + a\} \times \{y_0 + 1, \ldots, y_0 + b\}$ as
\[ \hat u = \Big( c - \lambda\, \frac{2(a + b)}{ab} \Big) 1_\Omega, \qquad \lambda \le \frac{c\, a b}{2(a + b)}, \]
where $H_i$ are modified by $H_i(0, 0) = 0$, $H_i(n - 1, n - 1) = (-1)^i$, $i = 0, 1$.
ii) $f := c_1 1_{\Omega_1} + c_2 1_{\Omega_2}$ with $\Omega_1 := \{x_0 + 1, \ldots, x_0 + a\} \times \{0, \ldots, n - 1\}$, $\Omega_2 := \{0, \ldots, n - 1\} \times \{y_0 + 1, \ldots, y_0 + b\}$ as
\[ \hat u = \Big( c_1 - \lambda\, \frac{2}{a} \Big) 1_{\Omega_1} + \Big( c_2 - \lambda\, \frac{2}{b} \Big) 1_{\Omega_2}, \qquad \lambda \le \min\Big\{ \frac{a c_1}{2}, \frac{b c_2}{2} \Big\}, \]
where $H_i$ are modified by $H_0(n - 1, 0) = 1$, $H_1(0, 0) = 0$, $H_i(n - 1, n - 1) = (-1)^i$, $i = 0, 1$.
Similarly it can be shown that the inf convolution approach preserves + junctions [8].
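The factor $2(a + b)/(ab)$ in Corollary 2 i) is the ratio of $J(1_\Omega) = \|D 1_\Omega\|_1$ to the number of pixels $ab$ of the rectangle. The following Python sketch evaluates $\|D u\|_1$ for a rectangle in the interior of the image, applying $H_0$ and $H_1$ as defined above (without the boundary modifications of the corollary) separably along the two image axes. The helper names and the assignment of array axes to $x$ and $y$ are assumptions for illustration; the sum $\|D_x u\|_1 + \|D_y u\|_1$ does not depend on that assignment.

```python
def apply_H0(v):
    """Averaging 0.5*(v[i] + v[i+1]); the last row of H0 returns v[n-1]."""
    n = len(v)
    return [0.5 * (v[i] + v[i + 1]) for i in range(n - 1)] + [v[n - 1]]


def apply_H1(v):
    """Forward difference v[i+1] - v[i]; the last row of H1 is zero."""
    n = len(v)
    return [v[i + 1] - v[i] for i in range(n - 1)] + [0.0]


def along_rows(op, img):
    """Apply a 1D operator to every row of the image."""
    return [op(row) for row in img]


def along_cols(op, img):
    """Apply a 1D operator to every column of the image."""
    cols = [op(list(col)) for col in zip(*img)]
    return [list(row) for row in zip(*cols)]


def aniso_tv(img):
    """Discrete J(u) = ||D_x u||_1 + ||D_y u||_1 from (17)."""
    dx = along_rows(apply_H0, along_cols(apply_H1, img))
    dy = along_cols(apply_H0, along_rows(apply_H1, img))
    return (sum(abs(v) for row in dx for v in row)
            + sum(abs(v) for row in dy for v in row))


n, c, a, b = 8, 3.0, 3, 4
img = [[c if 2 <= i < 2 + a and 1 <= j < 1 + b else 0.0 for j in range(n)]
       for i in range(n)]
J = aniso_tv(img)
print(J, 2 * c * (a + b))            # J(c 1_Omega) and 2c(a+b)
print(J / (c * a * b))               # ratio 2(a+b)/(ab)
```

The averaging filter $H_0$ spreads each jump over two neighboring pixels but preserves its total mass, so every edge of the rectangle contributes $c$ times its length on each of the two sides.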
Fig. 5. Denoising with double orientations from the occlusion model. Left/Middle: Energies $|\langle r_i, \nabla \hat u\rangle|$, $i = 1, 2$. Except at isolated vertex points the model aligns the edges of the minimizer $\hat u$ with the direction $r_1$ ($\sigma = 2$, $\rho = 9.5$). Right: Denoised image ($\lambda = 2500$). Although not perfect, this result is the best we got with various denoising methods so far.

Fig. 6. Denoising with the single direction $r_1$ from the occlusion model. Left: Angle corresponding to the chosen direction ($\sigma = 2$, $\rho = 9.5$, $\tilde\sigma = 5\sigma$). Middle: Denoising with the regularization term $|\langle r_1, \nabla u\rangle|$ introduces textures at flat regions ($\lambda = 2500$). Right: Denoising with the regularization term $|\nabla u| - \langle r_1^\perp, \nabla u\rangle$ avoids these artefacts ($\lambda = 4500$).
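Having (15) in mind we introduce our double orientations $r_1$, $r_2$ from Subsection 2.2 into (17) resp. (18) and consider for $\tilde r_i^T = (\operatorname{diag}(r_{i1}), \operatorname{diag}(r_{i2}))$, $i = 1, 2$, the minimizers of our new functionals
\[ \frac12 \|f - u\|_2^2 + \lambda \|\tilde R^T D u\|_1 = \frac12 \|f - u\|_2^2 + \lambda \big( \|\tilde r_1^T D u\|_1 + \|\tilde r_2^T D u\|_1 \big), \tag{19} \]
\[ \frac12 \|f - u\|_2^2 + \lambda \min_{u = u_1 + u_2} \big\{ \|\tilde r_1^T D u_1\|_1 + \|\tilde r_2^T D u_2\|_1 \big\}. \tag{20} \]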
We want to examine the behavior of (19) by the simple denoising example in Fig. 3. First, we computed the minimizers using the directions $r$ and $r^\perp$ from the classical structure tensor. The appearing artefacts are commented on in the caption of Fig. 4. Then, Fig. 5 shows the good denoising result with the proposed occlusion model for double orientations. Finally, Fig. 6 presents the denoising results obtained by using only direction $r_1$ from this model. This leads to artefacts in flat regions, where the process introduces texture due to directional smoothing of heavy noise. This effect can be avoided by replacing $|\langle r_1, \nabla u\rangle|$ by $|\nabla u| - \langle r_1^\perp, \nabla u\rangle$. Note that we have to adapt the sign of $r_1^\perp$ such that $\langle r_1^\perp, \nabla f_{\tilde\sigma}\rangle \ge 0$ here. This functional was also proposed in [19] but with a more expensive procedure to find appropriate directions $r_1^\perp$.
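The difference between the regularization terms discussed above can be seen at a single pixel. The Python sketch below evaluates the three integrands for a given gradient vector `g`; the function names are illustrative, and $r_1^\perp = (r_{12}, -r_{11})^T$ follows the convention $s := r^\perp = (r_2, -r_1)^T$ from Subsection 2.1.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def double_orientation_term(g, r1, r2):
    """Pointwise integrand |<r1, g>| + |<r2, g>| as in (19)."""
    return abs(dot(r1, g)) + abs(dot(r2, g))


def single_direction_term(g, r1):
    """Pointwise integrand |<r1, g>|."""
    return abs(dot(r1, g))


def signed_normal_term(g, r1):
    """Pointwise integrand |g| - <r1_perp, g> with r1_perp = (r12, -r11)."""
    r1_perp = (r1[1], -r1[0])
    return math.hypot(g[0], g[1]) - dot(r1_perp, g)


r1 = (1.0, 0.0)
for g in [(0.0, -2.0), (0.0, 2.0), (3.0, 0.0)]:
    print(g, single_direction_term(g, r1), signed_normal_term(g, r1))
```

The term $|\langle r_1, g\rangle|$ vanishes for every gradient orthogonal to $r_1$, regardless of its sign, so oscillations across $r_1$ are not penalized. The term $|g| - \langle r_1^\perp, g\rangle$ vanishes only for nonnegative multiples of $r_1^\perp$ and penalizes gradients pointing in the opposite direction.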
4 Numerical Examples
In the following, we present further numerical examples. All programs were written in MATLAB, where we solved the minimization problems via their dual problems using second-order cone programming implemented in the software package MOSEK [29]. To discretize the derivatives occurring in the orientation estimation tensors we applied the filters suggested by Scharr in [30]. The gray values of the original images are in [0, 255] and for visualization we have used the MATLAB routine 'imagesc', which incorporates an affine gray value scaling. Moreover, the parameters are chosen with respect to the best visual result. To start with, we took a noisy image with different shapes and restored it by nonlocal means, ROF and by (19) with occluding directions. The results are presented in Fig. 7. As already observed in [25], the result by ROF suffers from rounding artefacts at corners, since to remove all noise the regularization parameter $\lambda$ has to be chosen rather large. This is avoided by (19) using occluding directions, as visible at bottom right. The example with nonlocal means gives slightly worse results at corners. To demonstrate the performance on a real-world image we included Fig. 8. Here, the example shows that the shape of the building is much better preserved by (19) than by ROF, since the local directions in the image are treated much more accurately. In contrast to nonlocal means, our method as well as ROF suffers from staircasing effects. However, for a large smoothing parameter related to the noise level nonlocal means creates small blur artefacts where our result has sharp structures. Besides, our method is computationally much faster. Finally, to point out the benefits of inf convolution, Fig. 9 shows restored images of an oriented texture by (19) and (20), resp., using the transparent model. For such images inf convolution is better suited than (19), since (19), like ROF, aims for a piecewise constant solution, which means that too many details are removed.

Fig. 7. Top: noisy image (standard deviation 100) and restored image by iterating two times the nonlocal means filter [28]. Bottom left: denoised image by ROF ($\lambda = 500$). Bottom right: restored image by (19) and occluding directions ($\lambda = 900$, $\sigma = 2$, $\rho = 6$).

Fig. 8. Top: noisy image (standard deviation 30) and result by the nonlocal means filter [28]. Bottom left: denoised image by ROF ($\lambda = 50$). Bottom right: result by (19) and occluding directions ($\lambda = 50$, $\sigma = 0.5$, $\rho = 8$).

Fig. 9. Left to right: original image [30], noisy image (standard deviation 10), denoised image by (19) ($\lambda = 15$, $\sigma = 2$, $\rho = 12$), denoised image by (20) ($\lambda = 40$, $\sigma = 2$, $\rho = 12$). The directions are estimated by the transparent model.
5 Conclusions
We have demonstrated how directional information estimated by the transparent or the occlusion model [6] can be integrated into certain minimization problems to improve the restoration results, especially at sharp corners and X junctions. For simplicity we have restricted our attention to double orientations, but a generalization to more than two directions is possible with the results presented in [31]. To further improve the restoration results, one option would be to also use higher order derivatives as done in [32]. In this way it is, for example, possible to overcome the staircasing effects observed for (19).
References 1. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005) 2. Hintermüller, M., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math. 4(64), 1311–1333 (2004) 3. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992) 4. Esedoglu, S., Osher, S.: Decomposition of images by the anisotropic Rudin-OsherFatemi model. Comm. Pure and Applied Mathematics 57(12), 1609–1626 (2004) 5. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numerische Mathematik 76, 167–188 (1997) 6. Aach, T., Mota, C., Stuke, I., Mühlich, M., Barth, E.: Analysis of superimposed oriented patterns. IEEE Trans. on Image Processing 15(12), 3690–3700 (2006) 7. Förstner, W., Gülch, E.: A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: Proc. ISPRS Intercommission Conf. on Fast Processing of Photogrammetric Data, pp. 281–305 (1987) 8. Teuber, T.: Anisotropic smoothing using double orientations. Preprint University of Mannheim (2009) 9. Tschumperlé, D.: Fast anisotropic smoothing of multivalued images using curvature preserving PDEs. International Journal of Computer Vision 68(1), 65–82 (2006) 10. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998) 11. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PSDs: A common framework for different applications. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(4) (2005)
12. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, vol. 147. Springer, New York (2002) 13. Steidl, G., Teuber, T.: Diffusion tensors for denoising sheared and rotated rectangles (submitted) (2008) 14. Tschumperlé, D.: The CImg library. C++ Template Image Processing Library, http://cimg.sourceforge.net 15. Cabral, B., Leedom, L.C.: Imaging vector fields using line integral convolution. In: SIGGRAPH 1993, Computer Graphics, vol. 27, pp. 263–272 (1993) 16. Weickert, J.: Anisotropic diffusion filters for image processing based quality control. In: Fasano, A., Primicerio, M. (eds.) Proc. Seventh European Conference on Mathematics in Industry, pp. 355–362. Teubner, Stuttgart (1994) 17. Goldfarb, D., Wen, Z., Yin, W.: A curvilinear search method for p-harmonic flows on spheres. SIAM Journal on Imaging Sciences 2(1), 84–109 (2009) 18. Kimmel, R., Sochen, N.: Orientation diffusion or how to comb a porcupine? Journal of Visual Communication and Image Representation 13(1-2), 238–248 (2002) 19. Lysaker, O., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. on Image Processing 13(10), 1345–1357 (2004) 20. Vese, L., Osher, S.: Numerical methods for p-harmonic flows and applications to image processing. SIAM Journal on Numerical Analysis 40(6), 2085–2104 (2002) 21. Yuan, J., Schnörr, C., Steidl, G.: Convex Hodge decomposition and regularization of image flows. Journal of Mathematical Imaging and Vision 33(2), 169–177 (2009) 22. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007) 23. Spira, A., Kimmel, R., Sochen, N.: A short-time Beltrami kernel for smoothing images and manifolds. IEEE Trans. on Image Processing 16(6), 1628–1636 (2007) 24. 
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proc. Sixth Intern. Conf. on Computer Vision, pp. 839–846. Narosa Publishing House (1998) 25. Berkels, B., Burger, M., Droske, M., Nemitz, O., Rumpf, M.: Cartoon extraction based on anisotropic image classification. In: Vision, Modeling, and Visualization Proceedings, pp. 293–300 (2006) 26. Setzer, S., Steidl, G., Teuber, T.: Restoration of images with rotated shapes. Numerical Algorithms 48(1-3), 49–66 (2008) 27. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: IEEE Int. Conf. on Comp. Vision and Pattern Recognition., vol. 2, pp. 60–65 (2005) 28. Manjón, J.V., Buades, A.: NL means. MATLAB Software, http://dmi.uib.es/~abuades/software.html 29. The MOSEK Optimization Toolbox, http://www.mosek.com 30. Scharr, H.: Diffusion-like reconstruction schemes from linear data models. In: Franke, K., Müller, K.-R., Nickolay, B., Schäfer, R. (eds.) DAGM 2006. LNCS, vol. 4174, pp. 51–60. Springer, Heidelberg (2006) 31. Mühlich, M., Aach, T.: A theory for multiple orientation estimation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 69–82. Springer, Heidelberg (2006) 32. Setzer, S., Steidl, G.: Variational methods with higher-order derivatives in image processing. In: Neamtu, M., Schumaker, L.L. (eds.) Approximation Theory XII: San Antonio 2007, pp. 360–385. Nashboro Press (2008)
Image Denoising Using TV-Stokes Equation with an Orientation-Matching Minimization

Xue-Cheng Tai¹,², Sofia Borok¹, and Jooyoung Hahn¹

¹ Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
² Department of Mathematics, University of Bergen, Norway
[email protected]

Abstract. In this paper, we propose an orientation-matching minimization for denoising digital images with additive noise. Inspired by the two-step algorithm in the TV-Stokes denoising process [1, 2, 3], the regularized tangential vector field with the zero divergence condition is used in the first step. The present work suggests a different approach to reconstructing a denoised image in the second step. Namely, instead of finding an image that fits the regularized normal direction from the first step, we minimize an orientation between the image gradient and the regularized normal direction. This gives a nonlinear partial differential equation (PDE) for reconstructing denoised images, whose diffusivity depends on the orientation of a regularized normal vector field and whose weighted self-adaptive force term depends on the direction between the gradient of the image and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. The additive operator splitting scheme is used for discretizing the Euler-Lagrange equations. Various numerical experiments show the improved quality of the results.
1 Introduction
Digital image denoising based on partial differential equations (PDEs) and energy minimization has been studied extensively for the last 20 years, both theoretically and practically. From Gaussian filtering to anisotropic diffusion [4,5,6] and total variation (TV) minimization [7,8], a noisy image has been denoised from poorly estimated derivative information. TV filtering is very effective for piecewise constant images, and anisotropic diffusion is well suited to flow-like images. However, neither approach is suitable for an image with smoothly changing pixel values near sharp edges. Since the quality of a denoised image depends strongly on the estimated derivative information, regularizing the derivatives of an image [9], that is, its orientational information [10, 11, 12, 1], has become a crucial topic. Inspired by [1, 2, 3],
The research is supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010. In addition, the support from SUG 20/07 is also gratefully acknowledged.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 490–501, 2009. c Springer-Verlag Berlin Heidelberg 2009
we also use a regularization of the tangent vector field of an image with the zero divergence condition. The present work proposes a different approach to reconstructing a denoised image from the regularized normal vector field, which we call an orientation-matching minimization. That is, we minimize an orientation between the image gradient and the regularized normal direction. This gives a nonlinear PDE for reconstructing denoised images, whose diffusivity depends on the orientation of the regularized normal vector field and whose weighted self-adaptive force term depends on the direction between the gradient of the image and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges.

The paper is organized as follows. In Section 2, we introduce the proposed model together with a review of the TV-Stokes (TVS) denoising algorithm [1,2]. Some numerical aspects are explained in Section 3. Several numerical examples are shown and different models are compared in Section 4. The paper is concluded in Section 5.
2 Two-Step Denoising Model

2.1 Review of TV-Stokes Denoising Algorithm
Let us consider a true gray-scale image $d: \Omega \subset \mathbb{R}^2 \to [0, 1]$. We assume that a given noisy image $d_0$ is corrupted by additive Gaussian white noise $\eta$ according to

$$ d_0(p) = d(p) + \eta(p), \qquad p = (x, y) \in \Omega. $$
The normal and tangential vectors of the level curves of an image $d$ are given by

$$ n = \nabla d(p) = \left( \frac{\partial d}{\partial x}, \frac{\partial d}{\partial y} \right)^T \quad \text{and} \quad t = \nabla^{\perp} d(p) = \left( -\frac{\partial d}{\partial y}, \frac{\partial d}{\partial x} \right)^T, \tag{1} $$

where $T$ denotes the transpose. These vector fields satisfy the conditions

$$ \nabla \times n = 0 \quad \text{and} \quad \nabla \cdot t = 0, $$

which means that $n$ is an irrotational vector field and $t$ is an incompressible vector field. This property is crucial when an image is reconstructed from the information in $n$ or $t$.

The TVS denoising model [1, 2] consists of two steps to obtain a denoised image; its second step uses the same process as the Lysaker-Osher-Tai (LOT) model [10]. In the first step, however, instead of regularizing the normal vector field as in the LOT model, a tangential vector field is regularized under the constraint of incompressibility. The regularized tangential vector field $t$ is obtained by minimizing a functional:

$$ \min_{\nabla \cdot t = 0} \int_{\Omega} \left( |\nabla t| + \frac{\delta}{2} |t - t_0|^2 \right) dp, \tag{2} $$

where $t_0 = \nabla^{\perp} d_0$, $\delta$ is a positive parameter, and $|\nabla t|$ is defined by
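$$ |\nabla t| = \sqrt{ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 + \left( \frac{\partial v}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial y} \right)^2 }, \qquad \nabla t = \begin{pmatrix} \nabla u \\ \nabla v \end{pmatrix}, \qquad t = \begin{pmatrix} u \\ v \end{pmatrix}. $$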
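The relations in (1) and the definition of $|\nabla t|$ can be checked numerically. The following is a minimal sketch, assuming central differences via `np.gradient` on a regular pixel grid rather than the staggered grid used later in the paper; the test image and the helper names `normal_and_tangent`, `divergence`, and `grad_norm` are hypothetical and chosen only for illustration.

```python
import numpy as np

def normal_and_tangent(d):
    """Return n = grad d and t = grad^perp d as (x-component, y-component) pairs."""
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    d_y, d_x = np.gradient(d)
    n = (d_x, d_y)    # n = (d_x, d_y)^T
    t = (-d_y, d_x)   # t = (-d_y, d_x)^T
    return n, t

def divergence(field):
    """Discrete divergence u_x + v_y of a vector field (u, v)."""
    u, v = field
    return np.gradient(u, axis=1) + np.gradient(v, axis=0)

def grad_norm(field):
    """Pointwise |grad t| = sqrt(u_x^2 + u_y^2 + v_x^2 + v_y^2)."""
    u, v = field
    u_y, u_x = np.gradient(u)
    v_y, v_x = np.gradient(v)
    return np.sqrt(u_x**2 + u_y**2 + v_x**2 + v_y**2)

# Hypothetical smooth test image on a 48 x 64 pixel grid.
y, x = np.mgrid[0:48, 0:64]
d = 0.5 + 0.5 * np.sin(0.1 * x) * np.cos(0.07 * y)

n, t = normal_and_tangent(d)
max_div = np.max(np.abs(divergence(t)))   # vanishes up to rounding errors
tv_density = grad_norm(t)                 # integrand |grad t| of (2)
```

With this particular discretization the difference operators along the two axes commute, so the discrete divergence of $t$ vanishes up to rounding; other discretizations may only satisfy the condition approximately.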
The minimization problem was originally introduced in [2, 1]. The optimality condition for the saddle point is obtained by the gradient descent flow, which gives the PDE

$$ \begin{cases} \dfrac{\partial t}{\partial \tau} - \nabla \cdot \left( \dfrac{\nabla t}{|\nabla t|} \right) + \delta (t - t_0) - \nabla \lambda = 0, \\[2mm] \nabla \cdot t = 0, \end{cases} \tag{3} $$

with the boundary conditions and the initial condition

$$ \left( \frac{\nabla t}{|\nabla t|} + \lambda I \right) \cdot \nu = 0, \qquad t(p, 0) = t_0, $$

where $I$ is the identity matrix. Note that it is not straightforward to use the Perona-Malik (PM) model [4] or the Rudin-Osher-Fatemi (ROF) model [7] directly for regularizing the derivative information of an image [9]. One reason for regularizing the tangential vector field is that the incompressibility condition, $\nabla \cdot t = 0$, can be handled numerically by a Chorin-type projection method, which is well developed in fluid dynamics; see Section 3 for details. Moreover, the condition guarantees the existence of an image $d$ that satisfies relation (1).

Once the regularized tangent vector field $t = (u, v)^T$ is obtained in the first step, the regularized normal vector field $n$ is defined by $(v, -u)^T$. In two-step algorithms for image denoising [10, 2, 1] and image inpainting [2], it is suggested to solve the following minimization problem in the second step to reconstruct an image from $n$:

$$ \min_{\|d - d_0\|_2 = \sigma} \int_{\Omega} \left( |\nabla d| - \nabla d \cdot \frac{n}{|n|} \right) dp, \tag{4} $$

where $\|\cdot\|_2$ is the $L^2(\Omega)$ norm and $\sigma$ is the standard deviation of the Gaussian white noise. From the Euler-Lagrange equation and the gradient descent method along a fictitious time $\tau$, we obtain a PDE for reconstructing an image, with the free flux boundary condition and the initial condition $d(p, 0) = d_0(p)$:

$$ \frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \left( \frac{\nabla d}{|\nabla d|} - \frac{n}{|n|} \right) - \mu (d - d_0), \tag{5} $$

where $\mu$ is a positive parameter. Note that the ROF model corresponds to the case $n = 0$, which means that the TV-norm filter is very suitable for denoising a piecewise constant image. In other words, the model suffers from a staircase effect in regions where the pixel values change smoothly.
Since the TVS denoising model and the LOT model find an image that fits the regularized normal vector field from the PDE (5), it is natural to have a better performance than the ROF model. However, it still has problems when the original image has smoothly changing pixel values near sharp edges and the regularized normal vector field on some regions is almost parallel or has some numerical errors; see Figures 2 and 4.
2.2 Orientation-Matching Minimization
Inspired by the two-step algorithm in the TVS denoising model, we also use the regularized tangential vector field with the zero divergence condition in the first step. In this paper, we propose a new approach for reconstructing a denoised image in the second step. Namely, instead of finding an image that fits the regularized normal direction as in (4), we minimize an orientation between the image gradient and the regularized normal direction:

$$ \min_{\|d - d_0\|_2 = \sigma} \int_{\Omega} - \frac{|\nabla d \cdot n|}{|\nabla d| |n|} \, dp, \tag{6} $$

where $\|\cdot\|_2$ and $\sigma$ are the same as in (4). From the Euler-Lagrange equation and the gradient descent method along a fictitious time $\tau$, we obtain a new PDE for a denoised image, with the free flux boundary condition and the initial condition $d(p, 0) = d_0(p)$:

$$ \frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \left( \frac{|\nabla d \cdot n|}{|\nabla d|^2 |n|} \frac{\nabla d}{|\nabla d|} - \frac{\operatorname{sgn}(\nabla d \cdot n)}{|\nabla d|} \frac{n}{|n|} \right) - \mu (d - d_0), \tag{7} $$

where $\operatorname{sgn}(\cdot)$ is the sign function and $\mu$ is a positive parameter. Unlike the diffusivity term $\frac{1}{|\nabla d|}$ and the fixed force term $\nabla \cdot \frac{n}{|n|}$ in (5), the PDE from the proposed minimization has a diffusivity depending on the orientation of the regularized normal vector field $n$ and a weighted self-adaptive force term depending on the direction between $\nabla d$ and $n$.

We expect two differences between the proposed model (6) and the previous one (4) for reconstructing a denoised image. The first is a smaller orientation difference between the gradient of the original image and the gradient of the denoised image. The second is that the result of our model will have sharper edges in the denoised image, especially when the original image has smoothly changing pixel values near sharp edges. These differences are easily observed in numerical experiments, and there are plausible reasons for them. To see the first difference, let $\theta$ be the angle between $\nabla d / |\nabla d|$ and $n / |n|$. Then the functional in the proposed model can be written as

$$ \int_{\Omega} (-|\cos \theta|) \, dp, \tag{8} $$
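and the functional in the previous model is presented by

$$ \int_{\Omega} \left( |\nabla d| - \nabla d \cdot \frac{n}{|n|} \right) dp = \int_{\Omega} |\nabla d| \left( 1 - \frac{\nabla d \cdot n}{|\nabla d| |n|} \right) dp = \int_{\Omega} |\nabla d| (1 - \cos \theta) \, dp. \tag{9} $$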
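The different roles of $|\nabla d|$ in (8) and (9) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses central differences (`np.gradient`), a small constant `eps` inside the square roots to avoid division by zero, and a hypothetical linear test image whose gradient is rotated by 20 degrees to build the field $n$. Scaling the contrast of $d$ by a factor 3 scales the discrete version of (9) by 3, while the discrete version of (8) is unchanged.

```python
import numpy as np

def _dot_and_norms(d, n, eps):
    """Pointwise grad d . n together with regularized |grad d| and |n|."""
    d_y, d_x = np.gradient(d)          # axis 0 is y, axis 1 is x
    n_x, n_y = n
    grad_norm = np.sqrt(d_x**2 + d_y**2 + eps)
    n_norm = np.sqrt(n_x**2 + n_y**2 + eps)
    return d_x * n_x + d_y * n_y, grad_norm, n_norm

def fitting_energy(d, n, eps=1e-12):
    """Discrete version of (4)/(9): sum of |grad d| - grad d . n / |n|."""
    dot, grad_norm, n_norm = _dot_and_norms(d, n, eps)
    return float(np.sum(grad_norm - dot / n_norm))

def orientation_energy(d, n, eps=1e-12):
    """Discrete version of (6)/(8): sum of -|grad d . n| / (|grad d| |n|)."""
    dot, grad_norm, n_norm = _dot_and_norms(d, n, eps)
    return float(np.sum(-np.abs(dot) / (grad_norm * n_norm)))

# Hypothetical data: a linear ramp and a normal field rotated by 20 degrees.
y, x = np.mgrid[0:32, 0:32].astype(float)
d = 0.02 * x + 0.01 * y
d_y, d_x = np.gradient(d)
angle = np.deg2rad(20.0)
n = (np.cos(angle) * d_x - np.sin(angle) * d_y,
     np.sin(angle) * d_x + np.cos(angle) * d_y)

fit_1, fit_3 = fitting_energy(d, n), fitting_energy(3.0 * d, n)
ori_1, ori_3 = orientation_energy(d, n), orientation_energy(3.0 * d, n)
```

Here `fit_3` is three times `fit_1`, whereas `ori_1` and `ori_3` agree and equal minus the number of pixels times $\cos 20^\circ$, up to the effect of `eps`.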
The previous energy functional minimizes both |∇d| and the angle θ. If an image d has some regions where |∇d| is large enough, the minimization of the angle difference between ∇d/|∇d| and n/|n| has quite a weak effect. In the case of very
small $|\nabla d|$, any angle will fit $n / |n|$. Even a small angle difference can easily cause the graph of the denoised image to take a shape different from that of the original image. Since the proposed energy functional minimizes only the orientation difference, the shape of the denoised result is adjusted more sensitively to fit the original image, regardless of the magnitude of $|\nabla d|$. We show the orientation difference numerically for different methods in Table 1.

When we assume that $\nabla d$ is approximately parallel to $n$, the second difference is expected because the proposed PDE can be written as

$$ \nabla \cdot \left( \frac{|\nabla d \cdot n|}{|\nabla d|^2 |n|} \frac{\nabla d}{|\nabla d|} - \frac{\operatorname{sgn}(\nabla d \cdot n)}{|\nabla d|} \frac{n}{|n|} \right) \approx \nabla \cdot \left( \frac{1}{|\nabla d|} \left( \frac{\nabla d}{|\nabla d|} - (\pm) \frac{n}{|n|} \right) \right). $$

From this approximation, if $|\nabla d|$ is large, we observe that the proposed model (7) is dominated by the data fidelity term and only slightly affected by the regularization term. The previous model (5), however, is still affected by the additional force term from the regularized normal vector field. Since the numerical computation of (2) may introduce errors into the vector field, it is difficult to know whether the additional force will produce a good result or not. Even though the extra force reduces the staircase effect in smooth regions compared to the TV-filtering method, it may cause erroneous effects near edges where $|\nabla d|$ is large. We show numerically the quality of denoised images when the original image has smoothly changing pixel values near sharp edges; see Figures 2, 3, and 4.
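As a supplementary sketch (not part of the original derivation), the variation behind (7) can be written out explicitly. Write $q = \nabla d$, treat $n$ as fixed, and let $F(q) = -|q \cdot n| / (|q| |n|)$ be the integrand of (6). The regularization of the sign function and of the norms used in Section 3 is ignored here, and the multiplier $\mu$ of the constraint is simply carried along.

```latex
\begin{align*}
\frac{\partial F}{\partial q}
  &= -\frac{\operatorname{sgn}(q\cdot n)}{|q|}\,\frac{n}{|n|}
     + \frac{|q\cdot n|}{|q|^{2}\,|n|}\,\frac{q}{|q|}, \\
\frac{\partial d}{\partial \tau}
  &= \nabla\cdot\left.\frac{\partial F}{\partial q}\right|_{q=\nabla d}
     - \mu\,(d-d_0) \\
  &= \nabla\cdot\left(
       \frac{|\nabla d\cdot n|}{|\nabla d|^{2}\,|n|}\,\frac{\nabla d}{|\nabla d|}
       - \frac{\operatorname{sgn}(\nabla d\cdot n)}{|\nabla d|}\,\frac{n}{|n|}
     \right) - \mu\,(d-d_0).
\end{align*}
```

If $\nabla d$ is parallel to $n$, then $|\nabla d \cdot n| = |\nabla d| |n|$, so the first coefficient reduces to $1/|\nabla d|$; this is the common factor that makes the regularization term weak where $|\nabla d|$ is large.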
3 Numerical Aspects
For the discretization, we use the standard staggered grid suggested in [2]. In this section, we briefly note some discretization issues in the first and second steps.

3.1 A Regularization of the Tangent Vector Field
The minimization problem (2) for regularizing the tangent vector field under the incompressibility constraint is solved by the method of Lagrange multipliers and a Chorin-type projection method. We apply the Chorin-type projection method and the AOS method [13, 14] to solve the PDE (3).

1. Calculation of an intermediate tangent field $t^*$, which is not an incompressible vector field:

$$ \frac{t^* - t^n}{\Delta \tau} = \nabla \cdot \left( \frac{\nabla t^*}{|\nabla t^n|} \right) - \delta (t^* - t_0), $$

with the boundary condition $\nabla t^* \cdot \nu = 0$,
where $|\nabla t^n| \equiv \sqrt{\varepsilon + |\nabla t^n|^2}$ and $t^n$ is the tangent vector field at the $n$th time step. The AOS method is applied to the linearized equation for the components $u$ and $v$. The spatial derivatives with respect to $x$ and $y$ are approximated by standard one-sided finite differences.

2. Solving for $\lambda$ such that

$$ \begin{cases} \dfrac{t^{n+1} - t^*}{\Delta \tau} = \nabla \lambda, \\[2mm] \nabla \cdot t^{n+1} = 0. \end{cases} $$

This gives a Poisson equation for $\lambda$ with the zero Neumann boundary condition:

$$ \nabla \cdot \nabla \lambda = -\frac{1}{\Delta \tau} \nabla \cdot t^*. $$

3. Updating the tangent vector field by $t^{n+1} = t^* + \Delta \tau \nabla \lambda$. The boundary values are updated by the incompressibility condition.

More details are given in [2, 1]. For the stopping criterion, we use the steady state condition for the flow $t = (u, v)^T$:

$$ \max \left\{ \frac{\|u^{n+1} - u^n\|_\infty}{\|u^n\|_\infty}, \frac{\|v^{n+1} - v^n\|_\infty}{\|v^n\|_\infty} \right\} \le \alpha, $$

where $n$ and $n + 1$ are consecutive time steps and $\|\cdot\|_\infty$ is the $L^\infty(\Omega)$ norm. Note that $\alpha = 10^{-4}$ is fixed for all examples in the paper.

3.2 A Reconstruction of a Denoised Image
After the regularized tangent vector field t = (u, v)T is computed from the first step, we propose an orientation-matching minimization (6) to reconstruct a denoised image from the regularized normal vector field n = (v, −u)T . The optimality condition for the saddle point is obtained by the gradient descent flow which gives a PDE (7). We also apply the AOS method to solve the proposed PDE. Note that we use a regularized sign function [15]:
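$$ \operatorname{sgn}_\varepsilon(s) \equiv 2 H_\varepsilon(s) - 1, \qquad H_\varepsilon(s) \equiv \begin{cases} 1 & s > \varepsilon, \\ 0 & s < -\varepsilon, \\ \dfrac{1}{2} \left( 1 + \dfrac{s}{\varepsilon} + \dfrac{1}{\pi} \sin \dfrac{\pi s}{\varepsilon} \right) & \text{otherwise}, \end{cases} $$

and a parameter $\varepsilon$ is used to avoid division by zero in numerical experiments:

$$ |\nabla d^n| \equiv \sqrt{\varepsilon + |\nabla d^n|^2}, \qquad |n| \equiv \sqrt{\varepsilon + |n|^2}, $$

where the superscript $n$ denotes the $n$th time step. More details are given in [1, 2].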
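The regularized sign function and the regularized norms are simple to implement. The following is a minimal sketch; the function names are hypothetical, and the lower branch of the Heaviside function is taken at $s < -\varepsilon$, as in the usual definition in [15].

```python
import numpy as np

def heaviside_eps(s, eps):
    """Regularized Heaviside function H_eps(s), evaluated elementwise."""
    s = np.asarray(s, dtype=float)
    smooth = 0.5 * (1.0 + s / eps + np.sin(np.pi * s / eps) / np.pi)
    return np.where(s > eps, 1.0, np.where(s < -eps, 0.0, smooth))

def sign_eps(s, eps):
    """Regularized sign function sgn_eps(s) = 2 H_eps(s) - 1."""
    return 2.0 * heaviside_eps(s, eps) - 1.0

def regularized_norm(a_x, a_y, eps):
    """sqrt(eps + |a|^2) for a vector field a = (a_x, a_y); never zero for eps > 0."""
    return np.sqrt(eps + np.asarray(a_x) ** 2 + np.asarray(a_y) ** 2)

# Sample values across the transition region [-eps, eps] with eps = 0.1.
samples = np.array([-1.0, -0.05, 0.0, 0.05, 1.0])
values = sign_eps(samples, eps=0.1)
```

The function is odd, equals $\pm 1$ outside $[-\varepsilon, \varepsilon]$, and passes smoothly through zero at $s = 0$, which avoids the jump of the exact sign function in (7).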
For the stopping criterion, we use the steady state condition for the relative difference in the energy (6). That is,

$$ \frac{|E^{n+1} - E^n|}{|E^n|} \le \beta, $$

where $E^n$ is the energy value at time step $n$, approximated by

$$ E^n \approx \sum_{i,j} \left( - \frac{|\nabla d^n \cdot n|}{|\nabla d^n| |n|} \right). $$

The value of $\beta$ may differ between images, and we use $10^{-4} \le \beta \le 10^{-2}$. The energy (4) is computed similarly and is used for the stopping criterion of the second step in the previous model.

Remark 1. The right choice of parameters is crucial for the quality of a denoised image. The parameters $\delta$ and $\mu$ control the balance between data smoothing and the fidelity term. The parameter $\varepsilon$ is used to avoid division by zero and also controls the diffusivity for smoothing the data. The AOS scheme allows a wide range of time steps. However, if $\Delta \tau$ is too large, the visual quality of a denoised image deteriorates.
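To make the projection in steps 2 and 3 of Section 3.1 concrete, the following sketch solves the Poisson equation for $\lambda$ and updates the field. It deliberately swaps in a simpler setting than the one described above: periodic boundary conditions and FFT-based spectral derivatives are assumed instead of the staggered grid with Neumann conditions, so it illustrates the structure of the projection rather than the authors' discretization. The function names and the random test field are hypothetical.

```python
import numpy as np

def _wavenumbers(size):
    """Angular wavenumbers for a periodic grid with unit spacing."""
    k = 2.0 * np.pi * np.fft.fftfreq(size)
    if size % 2 == 0:
        k[size // 2] = 0.0  # drop the Nyquist mode so derivatives of real fields stay real
    return k

def spectral_divergence(u, v):
    """Divergence u_x + v_y computed with spectral derivatives."""
    ky = _wavenumbers(u.shape[0])[:, None]
    kx = _wavenumbers(u.shape[1])[None, :]
    div_hat = 1j * (kx * np.fft.fft2(u) + ky * np.fft.fft2(v))
    return np.fft.ifft2(div_hat).real

def chorin_projection(u_star, v_star, dtau):
    """Project the intermediate field t* = (u*, v*) onto divergence-free fields."""
    ky = _wavenumbers(u_star.shape[0])[:, None]
    kx = _wavenumbers(u_star.shape[1])[None, :]
    k2 = kx**2 + ky**2
    k2_safe = np.where(k2 == 0.0, 1.0, k2)   # div_hat is zero wherever k2 is zero
    u_hat, v_hat = np.fft.fft2(u_star), np.fft.fft2(v_star)
    div_hat = 1j * (kx * u_hat + ky * v_hat)
    # Step 2: Laplace(lambda) = -(1/dtau) div(t*), i.e. -|k|^2 lam_hat = -div_hat / dtau.
    lam_hat = div_hat / (dtau * k2_safe)
    # Step 3: t^{n+1} = t* + dtau * grad(lambda).
    u_new = np.fft.ifft2(u_hat + dtau * 1j * kx * lam_hat).real
    v_new = np.fft.ifft2(v_hat + dtau * 1j * ky * lam_hat).real
    return u_new, v_new

# Hypothetical intermediate field: random, hence far from divergence-free.
rng = np.random.default_rng(0)
u_star = rng.standard_normal((32, 40))
v_star = rng.standard_normal((32, 40))
u_new, v_new = chorin_projection(u_star, v_star, dtau=0.1)
residual = np.max(np.abs(spectral_divergence(u_new, v_new)))
```

In this setting the result does not depend on `dtau`, because the factor $1/\Delta\tau$ in the Poisson equation cancels against the factor $\Delta\tau$ in the update; applying the projection a second time leaves the field unchanged.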
4 Examples
In this section, we show numerical experiments for denoising an image based on the proposed method. Using synthetic and real images, we discuss the strengths of the proposed orientation-matching minimization and compare it with results from other methods. For simplicity, the following notation is used to indicate the parameters of the different methods.

– $V(\Delta \tau, \delta, \varepsilon)$: a regularization of the tangent vector field (3).
– $M_1(\Delta \tau, \mu, \varepsilon)$: a reconstruction of a denoised image from (7).
– $M_2(\Delta \tau, \mu, \varepsilon)$: a reconstruction of a denoised image from (5).
– $M_3(\lambda)$: the TV-filtering method in [8].
– $M_4(\mu, \rho, \varepsilon)$: a reconstruction of a denoised image from (10).
We also include an interesting numerical experiment that combines the anisotropic nonlinear diffusion [6, 5] with the regularized tangent vector field $t = (u, v)^T$ from the first step (2). That is, the diffusivity tensor is constructed from $n = (v, -u)^T$, and we solve a PDE with the free flux boundary condition:

$$ \frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \left( g\left( G_\rho * n n^T \right) \nabla d \right) - \mu (d - d_0), \tag{10} $$

where $(G_\rho * M)_{ij} = G_\rho * m_{ij}$ for a matrix $M = (m_{ij})$ and $G_\rho * f$ is the convolution of $f$ with the two-dimensional Gaussian kernel with standard deviation $\rho$. The function $g$ is defined on the set $S$ of real symmetric positive semidefinite $2 \times 2$ matrices:

$$ g(M) \equiv \frac{1}{\sqrt{\varepsilon + \Lambda_1}} v_{\Lambda_1} v_{\Lambda_1}^T + \frac{1}{\sqrt{\varepsilon + \Lambda_2}} v_{\Lambda_2} v_{\Lambda_2}^T, $$

where $(\Lambda_1, v_{\Lambda_1})$ and $(\Lambda_2, v_{\Lambda_2})$ are the eigenpairs of $M \in S$ with $\Lambda_1 \ge \Lambda_2$.
Fig. 1. Results from the proposed method for five test images (test 1 to test 5): the first row shows the original images, the second row shows the images after adding Gaussian white noise with zero mean and standard deviation 10, and the last row shows the results of the proposed method
Table 1. Comparison of the orientation difference γ in (11): (A) is the result of the proposed method, (B) is the result of the TVS denoising method, and (C) is the result of the TV-filter method. The denoised images from the proposed method are shown in the third row of Figure 1.

images   test 1   test 2   test 3   test 4   test 5
(A)      0.9706   0.8693   0.7668   0.5681   0.4936
(B)      0.9316   0.8478   0.6304   0.4983   0.4051
(C)      0.7466   0.6825   0.6218   0.3891   0.3228

Fig. 2. Comparison with other methods: (a), (b), and (c) are the graphs of the images from top to bottom of test 5 in Figure 1, respectively. (d) is the result of the TVS denoising model and (e) is the result of the TV-filtering model. (f) is the result from (10). Note that (c) is the result from the proposed model.
Fig. 3. (a) is an original image. We add a Gaussian white noise with zero mean and the standard deviation 20 in (b) which is larger noise than in test 4 in Figure 1. (c) is the result of the proposed model. (d) is the result of TVS denoising model and (e) is the result of TV-filtering model. (f) is the result from (10).
Fig. 4. (a) is a part of a tangent vector field from (2). (a1), (a2), and (a3) in the first row are a part of the images (c), (d), (f) in Figure 3, respectively. In the second row, we compute less smooth tangent vector field (b) in the first step and use the same method for the second step as the first row.
Fig. 5. There is a Gaussian white noise with zero mean and the standard deviation 10 in (a) from [16]. (b) is the result from the proposed model. (c) is the result of TVS denoising model and (d) is the result of TV-filtering model. The size of image is 240 × 124.
Fig. 6. There is a Gaussian white noise with zero mean and the standard deviation 10 in (a) from [16]. (b) is the result of the proposed model. (c) is the result of TVS denoising model and (d) is the result of TV-filtering model. The size of image is 181 × 274.
Example 1. We numerically check how well the orientation of the gradient of a denoised image fits the gradient of the original image. In Table 1, we measure the orientation difference for different test images:

$$ \gamma = \frac{1}{|\Omega|} \int_{\Omega} \frac{\nabla d_e}{|\nabla d_e|} \cdot \frac{\nabla d_c}{|\nabla d_c|} \, dp, \tag{11} $$

where $d_e$ is the original image, $d_c$ is the computed denoised image, and $|\Omega|$ is the area of the domain. In the first step in (A) and (B), $V(10^{-1}, 1, 10^{4})$ is
fixed for all test images. In the second step in (A) and (B), $M_1(10^{-3}, 1, 10^{-3})$ and $M_2(10^{-3}, 1, 10^{-6})$ for test 1, $M_1(10^{-3}, 1, 5 \cdot 10^{-3})$ and $M_2(10^{-3}, 1, 2.5 \times 10^{-5})$ for test 2, $M_1(10^{-3}, 1, 2.5 \cdot 10^{-5})$ and $M_2(10^{3}, 1, 5 \cdot 10^{-3})$ for test 3, $M_1(10^{-3}, 1, 10^{-3})$ and $M_2(10^{-3}, 5, 10^{-3})$ for test 4, and $M_1(10^{-3}, 2, 3 \times 10^{-3})$ and $M_2(10^{3}, 3, 3 \times 10^{-3})$ for test 5 are used, respectively. In (C), all results are obtained by $M_3(60)$. As explained in Section 2.2, the proposed model fits the orientation better. In Figure 2, the graphs of the computed results are presented in order to show the visual difference. The result (f) is obtained by (10) with $M_4(0.4, 0.1, 10^{-3})$. The denoised image from the proposed method has a very clean shape, even though the original image has smoothly changing pixel values near edges. We observe that the results from the other methods do not have very sharp edges. The result (e) from the TV-filtering model shows a staircase effect in smooth regions. These results are as expected from the discussion in Section 2.2.

Example 2. In Figure 3, we compare the results from different methods with larger noise than in Figure 1. For the regularization of the tangent vector field in (c) and (d), $V(5 \times 10^{-2}, 1, 10^{-4})$ is used. The result of the proposed method in (c) is obtained using $M_1(10^{-3}, 2, 10^{-3})$. (d), (e), and (f) are obtained by $M_2(10^{-3}, 4, 10^{-4})$, $M_3(80)$, and $M_4(0.5, 1, 10^{-3})$, respectively. Next, the effect of the first step (2) on the second step in (7), (5), and (10) is shown numerically. The first row in Figure 4 shows parts of the images in Figure 3. In the second row, we obtain a relatively less smooth vector field with $V(10^{-1}, 3, 10^{-4})$. (b2) is obtained by $M_1(10^{-3}, 2, 10^{-3})$, and we use the same parameters for (b1) and (b3) as for (a1) and (a3). Note that the result (b2) does not have a very clean edge even though we use a smaller $\mu$ in the second step for the previous model (5).
The other methods, (5) and (10), respond to a small change of the vector field because the field is used directly in the formulation without considering any relation to the image data.

Example 3. For real images, we compare denoised images from different methods. In Figure 5, the image (b) is obtained by the proposed method using $V(10^{-1}, 5, 10^{-4})$ and $M_1(5 \times 10^{-4}, 5, 5 \times 10^{-4})$; (c) is from $V(5 \times 10^{-2}, 5, 10^{-4})$ and $M_2(10^{-3}, 1, 5 \times 10^{-3})$; (d) is from $M_3(60)$. In Figure 6, the image (b) is obtained by the proposed method with $V(10^{-1}, 2, 10^{-4})$ and $M_1(10^{-4}, 30, 10^{-3})$; (c) is from $V(10^{-1}, 2, 10^{-4})$ and $M_2(10^{-3}, 2, 10^{-3})$; (d) is from $M_3(60)$. For these images, the two models (4) and (6) give similar results, which are better than that of the TV-filtering model.
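The orientation measure $\gamma$ in (11) from Example 1 can be approximated on a pixel grid as sketched below. This is an illustrative implementation under stated assumptions: central differences via `np.gradient`, the integral replaced by a mean over pixels, and a small constant `eps` inside the norms so that flat regions contribute approximately zero instead of causing a division by zero. The function name and the test ramps are hypothetical.

```python
import numpy as np

def orientation_measure(d_e, d_c, eps=1e-12):
    """Discrete gamma: mean of the cosine between grad d_e and grad d_c."""
    e_y, e_x = np.gradient(d_e)   # axis 0 is y, axis 1 is x
    c_y, c_x = np.gradient(d_c)
    norm_e = np.sqrt(e_x**2 + e_y**2 + eps)
    norm_c = np.sqrt(c_x**2 + c_y**2 + eps)
    return float(np.mean((e_x * c_x + e_y * c_y) / (norm_e * norm_c)))

# Hypothetical test data: linear ramps with known gradient directions.
y, x = np.mgrid[0:32, 0:32].astype(float)
d_e = 0.02 * x + 0.01 * y
gamma_same = orientation_measure(d_e, 5.0 * d_e)             # same orientation, other contrast
gamma_orth = orientation_measure(d_e, -0.01 * x + 0.02 * y)  # gradient rotated by 90 degrees
gamma_flip = orientation_measure(d_e, -d_e)                  # opposite gradient direction
```

Since $\gamma$ averages a cosine, values close to 1 indicate well-aligned gradients, which is the sense in which larger entries in Table 1 are better; the measure is insensitive to a change of contrast.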
5 Conclusions
We proposed an orientation-matching minimization for denoising digital images. Our algorithm consists of two steps. In the first step, we use the regularized tangent vector field with the incompressibility condition, as suggested in [2]. The condition is crucial for reconstructing an image from the vector field. In the second step, the present work proposes a minimization of an orientation between the image gradient and the regularized normal direction. This gives a nonlinear PDE for reconstructing denoised images, which has the diffusivity depending on an
orientation of the regularized normal vector field and the weighted self-adaptive force term depending on the direction between the gradient of an image and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. Various numerical experiments show the improved quality of the results.
References
1. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) Scale Space and Variational Methods in Computer Vision, pp. 473–482. Springer, Heidelberg (2007)
2. Tai, X.C., Osher, S., Holm, R.: Image inpainting using TV-Stokes equation. In: Image Processing Based on Partial Differential Equations, pp. 3–22. Springer, Heidelberg (2006)
3. Bertalmio, M., Sapiro, G., Bertozzi, A.L.: Navier-Stokes, fluid dynamics, and image and video inpainting. In: Proc. Conf. Comp. Vision Pattern Rec., pp. 355–362 (2001)
4. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell. 12(7), 629–639 (1990)
5. Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vis. 31, 111–127 (1999)
6. Brox, T., Weickert, J., Burgeth, B., Mrázek, P.: Nonlinear structure tensors. Image Vis. Comput. 24, 41–55 (2006)
7. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
8. Bresson, X., Chan, T.: Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging 2(4), 455–484 (2008)
9. Hahn, J., Lee, C.O.: A nonlinear structure tensor with the diffusivity matrix composed of the image gradient. J. Math. Imag. Vis. (accepted)
10. Lysaker, M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. Image Processing 13(10), 1345–1357 (2004)
11. Vese, L., Osher, S.: Numerical methods for p-harmonic flows and applications to image processing. SIAM J. Numer. Anal. 40(6), 2085–2104 (2002)
12. Sochen, N., Sagiv, C., Kimmel, R.: Stereographic combing a porcupine or studies on direction diffusion in image processing. SIAM J. Appl. Math. 64(5), 1477–1508 (2004)
13. Lu, T., Neittaanmaki, P., Tai, X.C.: A parallel splitting up method for partial differential equations and its application to Navier-Stokes equations. RAIRO Math. Model. and Numer. Anal. 26(6), 673–708 (1992)
14. Weickert, J., ter Haar Romeny, B.M., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Processing 7, 398–410 (1998)
15. Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Processing 10, 266–277 (2001)
16. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. 8th Int’l Conf. Computer Vision, vol. 2, pp. 416–423 (July 2001)
Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model

Xue-Cheng Tai¹ and Chunlin Wu²

¹ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore and Department of Mathematics, University of Bergen, Johannes Brunsgate 12, N-5008 Bergen, Norway
[email protected]
² Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
Abstract. In recent decades the ROF model (total variation (TV) minimization) has had great success in image restoration due to its good edge-preserving property. However, the non-differentiability of the minimization problem causes computational difficulties. Different techniques have been proposed to overcome this difficulty. Methods regarded as particularly efficient include the dual methods of CGM (Chan, Golub, and Mulet) [7] and Chambolle [6], split Bregman iteration [14], and splitting-and-penalty based methods [28] [29]. In this paper, we show that most of these methods can be classified under the same framework. The dual methods and split Bregman iteration are just different iterative procedures for solving the same system resulting from a Lagrangian and penalty approach. We show this relationship only for the ROF model; however, it provides a uniform framework for understanding these methods for other models. In addition, we provide some examples to illustrate the accuracy and efficiency of the proposed algorithm.
1 Introduction
Image restoration, such as denoising and deblurring, is one of the most fundamental tasks in image processing and is in general based on regularization. Preserving image edges and features during image regularization is difficult but highly desirable. Recently the ROF model [23] has been demonstrated to be very successful in edge-preserving image restoration; see [9] [11] and references therein. Consequently the model has attracted much attention and has been extended to high order models [8] [31] [18] [19] [16] [25] and vectorial models [24] [2] [10] for color image restoration [17] [27]. However, the computation of the ROF model suffers from serious nonlinearity and non-differentiability. In [23], the authors proposed an artificial time marching strategy for the associated Euler-Lagrange equation. This method is slow due to strict stability constraints on the time step size. Besides, the artificial time marching method computes solutions not of the exact ROF model but of an approximation, namely a regularized ROF model. Different techniques have been proposed to overcome this difficulty.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 502–513, 2009. © Springer-Verlag Berlin Heidelberg 2009
There are several methods regarded as particularly efficient. One approach is the dual methods [7] [5] [6], which are based on various dual formulations of the model. Another is split Bregman iteration [14], which uses functional splitting and Bregman iteration for constrained optimization [20] [30]. Similar to split Bregman iteration, another approach based on splitting followed by alternating minimization of the penalized cost function was proposed in [28] [29]. In this paper, we present an augmented Lagrangian method to solve the model and show that the dual methods and split Bregman iteration can actually be either deduced from, or shown to be equivalent to, our method.
2 ROF Model and Related Numerical Solvers
Assume $\Omega \subset \mathbb{R}^2$ is a bounded open subset (usually a rectangle in image processing) and $f : \Omega \to \mathbb{R}$ is an observed image. The image $f$ often contains various degradations and can be noisy and blurred, which is usually modelled as
\[ f = Ku + n, \tag{1} \]
where $u$ is the true image, and $K$ and $n$ are a linear operator and noise, respectively. The operator $K$ may stand for the identity operator or for various blur operations such as Gaussian blur and motion blur. The noise $n$ may denote Gaussian noise, salt-and-pepper noise, or other types. Image restoration aims to recover $u$ from $f$ with some information about $K$ and $n$. In this paper we assume that $n$ is Gaussian white noise and $K$ is a general blur operator. Since the variance of $n$ and the blur kernel of $K$ can usually be estimated, we further assume we know $K$ and the variance of $n$ exactly. Even with this knowledge, it is still difficult to recover $u$ from $f$. In the pure denoising case ($K = I$), it is not an easy task to get $u$ since we only know the variance of the random noise $n$. In the pure deblurring case, in which $K \neq I$ and $n = 0$, we cannot directly solve $f = Ku$ to get $u$ due to the compactness of $K$: the problem $f = Ku$ is ill-posed, and the solution would be highly oscillatory. Regularization of the solution should therefore be considered. The restoration problem is thus presented using some regularity term $R(u)$ as
\[ \min_u R(u) \quad \text{s.t.} \quad \|f - Ku\|^2 = \sigma^2, \tag{2} \]
where $\sigma^2$ is the variance of $n$. The constrained minimization problem is often solved approximately using Tikhonov regularization as follows:
\[ \min_u F(u) = R(u) + \frac{\lambda}{2}\|Ku - f\|^2, \tag{3} \]
for some parameter $\lambda$. There are many choices for the regularity term $R(u)$. One of the most basic and successful choices is due to Rudin, Osher, and Fatemi [23], in which $R(u)$ was chosen to be the total variation of $u$. The so-called ROF model reads
X.-C. Tai and C. Wu
\[ u = \arg\min_u \Big\{ F_{\mathrm{rof}}(u) = \int_\Omega |\nabla u| + \frac{\lambda}{2}\|Ku - f\|^2 \Big\}. \tag{4} \]
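To make the objective in (4) concrete, the following Python sketch evaluates a discrete analogue of $F_{\mathrm{rof}}$ in the denoising case $K = I$. The forward-difference gradient with replicated boundary values, the unit grid spacing, and the names `grad` and `rof_energy` are assumptions made for illustration; the text does not prescribe a discretization at this point.

```python
import numpy as np

def grad(u):
    """Forward differences; assumption: replicated boundary, unit grid spacing."""
    gx = np.diff(u, axis=0, append=u[-1:, :])  # last row difference is zero
    gy = np.diff(u, axis=1, append=u[:, -1:])  # last column difference is zero
    return gx, gy

def rof_energy(u, f, lam):
    """Discrete analogue of (4) for K = I: sum |grad u| + lam/2 * sum (u - f)^2."""
    gx, gy = grad(u)
    tv = np.sum(np.sqrt(gx**2 + gy**2))          # isotropic total variation
    fidelity = 0.5 * lam * np.sum((u - f) ** 2)  # quadratic data term
    return tv + fidelity
```

For a piecewise constant image the TV term only counts the jumps, which is the edge-preserving behaviour referred to above.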
In [23] the authors considered the image denoising problem ($K = I$) and presented a gradient descent method to solve (4). (Here the method is described for general $K$.) An artificial time marching was introduced for the associated Euler-Lagrange equation as follows:
\[ u_t = \nabla\cdot\Big(\frac{\nabla u}{\sqrt{|\nabla u|^2 + \beta}}\Big) + \lambda K^*(f - Ku), \qquad u(0) = f, \tag{5} \]
where $\beta$ is a small positive number to avoid division by zero and $K^*$ is the $L^2$ adjoint of $K$. The gradient descent method (5) has two main drawbacks. First, the method computes the solution of (4) not exactly, but only approximately. Second, the method is slow due to strict constraints on the time step size. The choice of $\beta$ affects both aspects: the larger $\beta$ is, the more efficient the scheme, but the worse the approximation. There is thus a tradeoff between accuracy and efficiency in choosing $\beta$. Many algorithms have been proposed to improve on this method. Those regarded as particularly efficient include dual methods and split Bregman iteration, as well as the splitting-and-penalty based method, as mentioned before.
Before we go on, we present an equivalent formulation of the restoration problem (4), which will play an important role in our derivation. The difficulty in solving the ROF restoration model (4) is due to the non-differentiability of the total variation norm. We introduce an auxiliary variable $q$ for $\nabla u$ to separate the calculation of the non-differentiable term and the fidelity term. The model (4) is thus equivalent to
\[ \min_{u,q} G_{\mathrm{rof}}(u,q) = \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 \quad \text{s.t.} \quad q = \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} \partial_x u \\ \partial_y u \end{pmatrix} = \nabla u, \tag{6} \]
a constrained optimization problem.
2.1 CGM Dual Method
In [7] Chan et al. presented a primal-dual method for TV minimization. They introduced a new variable
\[ \omega = \frac{\nabla u}{|\nabla u|} \tag{7} \]
into the Euler-Lagrange equation of the model (4), yielding
\[ \begin{cases} -\nabla\cdot\omega + \lambda K^*(Ku - f) = 0, \\ \nabla u - \omega|\nabla u| = 0, \end{cases} \tag{8} \]
to remove some of the singularity caused by the non-differentiability of the objective functional.
Unlike the original Euler-Lagrange equation for $u$, this system contains both the $u$ and $\omega$ variables. In [7], $u$ and $\omega$ are called the primal and dual variables, respectively. Again the authors approximate this primal-dual system using a regularized TV norm in actual computations. Newton's linearization technique for both the primal and dual variables is used to solve the discrete version.
2.2 Chambolle's Dual Method
Another work based on a dual formulation, with a slightly different derivation, is due to Chambolle. In [6] Chambolle used the Legendre-Fenchel transform and a key result from optimization theory to obtain an original and efficient algorithm for total variation minimization. The primal variable (the image) is expressed explicitly in terms of the dual variable, and only the dual variable is computed iteratively. The primal variable $u$ is obtained from the final result of the dual variable. However, the algorithm does not consider general operators $K$. Specifically, Chambolle adopted the following definition of total variation for a general (not necessarily smooth) function $u$:
\[ \mathrm{TV}(u) = \sup\Big\{ \int_\Omega u(x)\,\nabla\cdot\xi(x) : \xi \in C_c^1(\Omega;\mathbb{R}^2),\ |\xi(x)| \le 1,\ \forall x \in \Omega \Big\}. \tag{9} \]
Denoting
\[ S = \mathrm{Closure}\{ \nabla\cdot\xi(x) : \xi \in C_c^1(\Omega;\mathbb{R}^2),\ |\xi(x)| \le 1,\ \forall x \in \Omega \}, \tag{10} \]
Chambolle showed that the ROF restoration model (4) with $K = I$ (note the slight difference between Eqn. (4) and the model in [6] regarding the parameter $\lambda$) yields
\[ u = f - \frac{1}{\lambda}\pi_S(\lambda f) = f - \pi_{\frac{S}{\lambda}}(f), \tag{11} \]
where $\pi_S(\cdot)$ is the $L^2$-norm projection operator onto $S$, which reads
\[ \pi_S(\cdot) = \arg\min_{\mathrm{div}\,\xi(x)} \{ \|\mathrm{div}\,\xi(x) - \cdot\|^2 : |\xi(x)| \le 1,\ \forall x \in \Omega \}. \tag{12} \]
Since $S$ is not a linear space, this projection is nonlinear. From the KKT conditions and with careful observation, it was shown in [6] that $\xi(x)$ for $\pi_S(\lambda f)$ satisfies
\[ -\nabla(\mathrm{div}\,\xi(x) - \lambda f) + |\nabla(\mathrm{div}\,\xi(x) - \lambda f)|\,\xi(x) = 0, \tag{13} \]
which can be solved by a semi-implicit gradient descent algorithm. Note that here we present the continuous case instead of the discrete version used in [6].
2.3 Split Bregman Iteration
In recent years (split) Bregman iteration has attracted much attention in the signal recovery and image processing communities. The basic idea is to transform a constrained optimization problem into a series of unconstrained problems. In each unconstrained problem, the objective function is defined by the Bregman distance [3] of a convex functional. The Bregman distance of a convex functional $J(u)$ is defined as the following (nonnegative) quantity
\[ D_J^p(u,v) \equiv J(u) - J(v) - \langle p, u - v\rangle, \tag{14} \]
where $p \in \partial J(v)$. When $J(u)$ is a continuously differentiable functional, its sub-differential $\partial J(v)$ has a single element for each $v$, and consequently the Bregman distance is unique. In this case the distance is just the difference at the point $u$ between $J(\cdot)$ and its first order approximation at the point $v$. For non-differentiable functionals, the sub-differential may be empty or contain multiple elements. Therefore, the Bregman distance between $u$ and $v$ can be ill-defined or multivalued. However, this poses no difficulty for the iterative algorithms, as they automatically choose a unique sub-gradient in each iteration as long as the fidelity term for the constraints is differentiable (this condition usually holds). We also want to emphasize that the Bregman distance of a functional is not a distance in the usual sense since, in general, $D_J^p(u,v) \neq D_J^p(v,u)$ and the triangle inequality does not hold. See [20] [30] for more details.
To find the solution of the ROF model (4), or equivalently the constrained problem (6), split Bregman iteration (in [14], algorithms for $K = I$, i.e., TV denoising, are presented) solves a sequence of unconstrained problems of the form
\[ (u^{k+1}, q^{k+1}) = \arg\min_{u,q} D_{G_{\mathrm{rof}}}^{(p_u^k,p_q^k)}\big((u,q),(u^k,q^k)\big) + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \tag{15} \]
where $p_u^k$, $p_q^k$, sometimes written together as $(p_u^k, p_q^k)$, are the sub-gradients of $G_{\mathrm{rof}}$ at $(u^k, q^k)$ with respect to $u$ and $q$, respectively. Taking the update of the sub-gradients into consideration, the iteration procedure can be formulated as Algorithm 1. For the computation of $(u^{k+1}, q^{k+1})$, we refer to Algorithm 3 for more details.
Algorithm 1. Split Bregman iteration for the ROF model
1. Initialization: $q^0 = 0$, $u^0 = 0$, $p_q^0 = 0$, $p_u^0 = 0$;
2. For $k = 0, 1, 2, \ldots$: compute $(u^{k+1}, q^{k+1})$ using Eqn. (15), and update
\[ \begin{cases} p_u^{k+1} = p_u^k - r\,\mathrm{div}(q^{k+1} - \nabla u^{k+1}), \\ p_q^{k+1} = p_q^k - r(q^{k+1} - \nabla u^{k+1}). \end{cases} \tag{16} \]
3 Augmented Lagrangian Method, and Relations to Dual Methods and Split Bregman Iteration
In this section we present the augmented Lagrangian method [15] [21] [22] for the ROF model, or equivalently the constrained problem (6). The augmented Lagrangian method has many advantages over other methods such as the penalty method [1], and has been successfully applied to nonlinear PDEs and mechanics [13]. We also show that the dual methods and split Bregman iteration can be either deduced from, or shown equivalent to, the augmented Lagrangian method.
3.1 Augmented Lagrangian Method
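In the augmented Lagrangian method, one solves the constrained optimization problem (6) by
\[ \min_{u,q}\max_{\mu} L_{\mathrm{rof}}(u,q,\mu) = \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 + \int_\Omega \mu\cdot(q - \nabla u) + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \tag{17} \]
where $\mu = (\mu_1, \mu_2)^T$ is the Lagrange multiplier and $r$ is a positive constant. That is, the method seeks a saddle point of the augmented Lagrangian functional $L_{\mathrm{rof}}(u,q,\mu)$. The system of optimality conditions is thus
\begin{align}
\frac{\partial L_{\mathrm{rof}}}{\partial u} &= \lambda K^*(Ku - f) + \nabla\cdot\mu + r\nabla\cdot(q - \nabla u) = 0, \tag{18} \\
\frac{\partial L_{\mathrm{rof}}}{\partial q} &= \frac{q}{|q|} + \mu + r(q - \nabla u) = 0, \tag{19} \\
\frac{\partial L_{\mathrm{rof}}}{\partial \mu} &= q - \nabla u = 0. \tag{20}
\end{align}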
We now have two ways to solve the problem (17). One is to use optimization techniques to directly minimize/maximize the corresponding functionals; the other is to solve the associated system of optimality conditions. The augmented Lagrangian method uses an iterative procedure to solve (17); see Algorithm 2. The iterative scheme runs until some stopping condition is satisfied.
Algorithm 2. Augmented Lagrangian method for the ROF model
1. Initialization: $u^0 = 0$, $q^0 = 0$, $\mu^0 = 0$;
2. For $k = 0, 1, 2, \ldots$: compute $(u^{k+1}, q^{k+1})$ as a minimizer of the augmented Lagrangian functional for the Lagrange multiplier $\mu^k$, i.e.,
\[ (u^{k+1}, q^{k+1}) = \arg\min_{u,q} L_{\mathrm{rof}}(u,q,\mu^k), \tag{21} \]
where $L_{\mathrm{rof}}(u,q,\mu^k)$ is defined in Eqn. (17); and update
\[ \mu^{k+1} = \mu^k + r(q^{k+1} - \nabla u^{k+1}). \tag{22} \]
To solve the problem (21), we separate it into the following two sub-problems ([28] [29]):
\[ \arg\min_u \frac{\lambda}{2}\|Ku - f\|^2 - \int_\Omega \mu^k\cdot\nabla u + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \tag{23} \]
for given $q$, and
\[ \arg\min_q \int_\Omega |q| + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \tag{24} \]
for given $u$. Sub-problems (23) and (24) can be solved efficiently. For (23), the optimality condition gives a linear equation
\[ \lambda K^*(Ku - f) + \mathrm{div}\,\mu^k + r\,\mathrm{div}\,q - r\Delta u = 0 \]
for $u$, which allows us to use fast Fourier transforms. Denoting by $\mathcal{F}(u)$ the Fourier transform of $u$, we get $u$ from
\[ u = \mathcal{F}^{-1}\Big( \frac{\lambda\mathcal{F}(K^*)\mathcal{F}(f) - \mathcal{F}(\mathrm{div})\cdot\mathcal{F}(\mu^k) - r\mathcal{F}(\mathrm{div})\cdot\mathcal{F}(q)}{\lambda\mathcal{F}(K^*)\mathcal{F}(K) - r\mathcal{F}(\Delta)} \Big), \tag{25} \]
where applying the Fourier transform to a vector such as $\mathrm{div}$ or $\mu^k$ means applying it to each of its components; and Fourier transforms of operators such as $K$, $\partial_x$, $\partial_y$, $\Delta$ are regarded as the transforms of their corresponding convolution kernels (for differential operators, the kernels are approximated by kernels of difference operators). For (24), we actually have the following closed form solution
\[ q = \begin{cases} \dfrac{1}{r}\Big(1 - \dfrac{1}{|w(x,y)|}\Big) w(x,y), & |w(x,y)| > 1, \\ 0, & |w(x,y)| \le 1, \end{cases} \tag{26} \]
where $w = r\nabla u - \mu^k$, since we can reformulate the problem as
\[ \arg\min_q \int_\Omega |rq| + \frac{1}{2}\int_\Omega |rq - (r\nabla u - \mu^k)|^2. \]
Based on these observations, we can use Algorithm 3 to solve (21). Here $N$ can be chosen using some convergence test. In the common augmented Lagrangian method, one usually sets $N = 1$.
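The closed form (26) is a pointwise shrinkage of the vector field $w$. A minimal NumPy sketch follows, assuming the gradient of $u$ and the multiplier are stored as separate component arrays; the function name `shrink_q` and its argument layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def shrink_q(ux, uy, mux, muy, r):
    """Pointwise minimizer of sub-problem (24) following Eqn. (26).

    ux, uy   : components of grad u (assumed precomputed)
    mux, muy : components of the multiplier mu^k
    r        : positive penalty parameter
    """
    wx = r * ux - mux
    wy = r * uy - muy
    mag = np.sqrt(wx**2 + wy**2)
    # scale = (1/r) * (1 - 1/|w|) where |w| > 1, and 0 elsewhere;
    # np.maximum only guards the division where |w| <= 1.
    scale = np.where(mag > 1.0, (1.0 - 1.0 / np.maximum(mag, 1.0)) / r, 0.0)
    return scale * wx, scale * wy
```

Points where $|w| \le 1$ are mapped to zero, so small gradients are removed entirely while large ones are only shortened.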
Algorithm 3. Augmented Lagrangian method for the ROF model: solving the sub-problem of Eqn. (21)
1. Initialization: $u^{k+1,0} = u^k$, $q^{k+1,0} = q^k$;
2. For $n = 0, 1, 2, \ldots, N-1$: compute $u^{k+1,n+1}$ from Eqn. (25) for $q = q^{k+1,n}$; and then compute $q^{k+1,n+1}$ from Eqn. (26) for $u = u^{k+1,n+1}$;
3. $u^{k+1} = u^{k+1,N}$, $q^{k+1} = q^{k+1,N}$.
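Algorithms 2 and 3 can be sketched in Python for the denoising case $K = I$ as follows. The sketch assumes periodic boundary conditions and forward differences, so that the $u$-update (25) is diagonal under the discrete Fourier transform; these discretization choices, the default parameter values, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def grad(u):
    """Forward differences; assumption: periodic boundary conditions."""
    return np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u

def div(px, py):
    """Discrete divergence, the negative adjoint of grad (backward differences)."""
    return (px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1))

def alm_tv_denoise(f, lam, r=10.0, outer=100, inner=1):
    """Sketch of Algorithm 2 (outer loop) and Algorithm 3 (inner loop), K = I."""
    n1, n2 = f.shape
    # Fourier symbol of lam*I - r*Laplacian for the periodic 5-point Laplacian.
    c1 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n1) / n1)
    c2 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n2) / n2)
    denom = lam + r * (c1[:, None] + c2[None, :])

    u = np.zeros_like(f, dtype=float)
    qx = np.zeros_like(u); qy = np.zeros_like(u)
    mux = np.zeros_like(u); muy = np.zeros_like(u)
    for _ in range(outer):
        for _ in range(inner):
            # u-step: (lam - r*Laplacian) u = lam*f - div(mu) - r*div(q), cf. (25)
            rhs = lam * f - div(mux, muy) - r * div(qx, qy)
            u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
            # q-step: shrinkage of w = r*grad(u) - mu, cf. (26)
            ux, uy = grad(u)
            wx, wy = r * ux - mux, r * uy - muy
            mag = np.sqrt(wx**2 + wy**2)
            scale = np.where(mag > 1.0, (1.0 - 1.0 / np.maximum(mag, 1.0)) / r, 0.0)
            qx, qy = scale * wx, scale * wy
        # multiplier update, cf. (22)
        ux, uy = grad(u)
        mux = mux + r * (qx - ux)
        muy = muy + r * (qy - uy)
    return u
```

With `inner=1` this corresponds to the choice $N = 1$ mentioned above. For a general blur $K$ the denominator would additionally involve the transform of the blur kernel, as in (25).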
As for the second approach to solving the problem (17), one can use other iterative procedures to solve the corresponding optimality system. In fact, the optimality system naturally leads to the CGM method and the dual method of Chambolle, as shown in the following.
3.2 Relations between Augmented Lagrangian Method and Dual Methods as Well as Split Bregman Iteration
In this sub-section we show that the CGM and Chambolle dual methods for the ROF model can be deduced naturally from the augmented Lagrangian method. This gives a much simpler derivation of the dual methods. Split Bregman iteration is also shown to be equivalent to Algorithm 2.
Connection to CGM Dual Method. We first show that the CGM dual method can be deduced from the augmented Lagrangian method. The optimality conditions for the augmented Lagrangian approach are given in (18)–(20). From Eqn. (20), we get $q = \nabla u$. Combining this with (19), we see that
\[ \mu = -\frac{\nabla u}{|\nabla u|}. \tag{27} \]
Therefore, the dual variable in the CGM dual method is nothing but the Lagrange multiplier $\mu$ with the opposite sign. Hence, the system of optimality conditions (18)–(20) is equivalent to
\[ \begin{cases} \nabla\cdot\mu + \lambda K^*(Ku - f) = 0, \\ \nabla u + \mu|\nabla u| = 0, \end{cases} \tag{28} \]
which is just the primal-dual system of the CGM dual method if one replaces $-\mu$ with $\omega$.
Connection to Chambolle's Dual Method. We now further derive Chambolle's dual method. From the first equation of (28), we get $u$ as
\[ u = (\lambda K^*K)^{-1}(\lambda K^* f - \mathrm{div}\,\mu), \tag{29} \]
yielding the equation for the dual variable
\[ \nabla\big((K^*K)^{-1}(\lambda K^* f - \mathrm{div}\,\mu)\big) + \big|\nabla\big((K^*K)^{-1}(\lambda K^* f - \mathrm{div}\,\mu)\big)\big|\,\mu = 0. \tag{30} \]
For image denoising problems where $K = I$, (30) and (29) are just the equations used by Chambolle in [6] to solve for the dual variable and to recover the primal variable $u$, respectively. The equation (30) for the dual variable in [6] was obtained through a not well-known KKT condition for inequality-constrained optimization problems, whereas here we deduce this equation very naturally from the augmented Lagrangian method. This is a generic formulation and is not discussed in [6]. We also point out that some connections between the CGM and Chambolle dual methods have been noticed in [32].
Connection to Split Bregman Iteration. Split Bregman iteration is actually equivalent to the augmented Lagrangian method. Considering the zero initialization of the sub-gradients and the Lagrange multiplier, and letting
\[ (p_u^k, p_q^k) = -(\mathrm{div}\,\mu^k, \mu^k) \tag{31} \]
for each $k$, we have
\begin{align*}
(u^{k+1}, q^{k+1}) &= \arg\min_{u,q} D_{G_{\mathrm{rof}}}^{(p_u^k,p_q^k)}\big((u,q),(u^k,q^k)\big) + \frac{r}{2}\int_\Omega |q - \nabla u|^2 \\
&= \arg\min_{u,q} \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 + \int_\Omega u\,\mathrm{div}\,\mu^k + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2 \\
&= \arg\min_{u,q} \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 - \int_\Omega \mu^k\cdot\nabla u + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2 \\
&= \arg\min_{u,q} L_{\mathrm{rof}}(u,q,\mu^k),
\end{align*}
indicating the equivalence between split Bregman iteration and the iterative procedure of the augmented Lagrangian method. In the context of compressive sensing, this equivalence has been pointed out in [30].
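The steps behind the chain of equalities above can be spelled out as follows. This is a sketch under one assumption that is not stated explicitly in the text, namely that the boundary term vanishes in the integration by parts.

```latex
% Bregman distance of G_rof at (u^k, q^k) with sub-gradient (p_u^k, p_q^k), cf. (14):
D_{G_{\mathrm{rof}}}^{(p_u^k,p_q^k)}\big((u,q),(u^k,q^k)\big)
  = G_{\mathrm{rof}}(u,q) - G_{\mathrm{rof}}(u^k,q^k)
    - \langle p_u^k, u - u^k \rangle - \langle p_q^k, q - q^k \rangle .
% Terms depending only on (u^k, q^k) do not change the arg min. Using (31),
-\langle p_u^k, u \rangle = \int_\Omega u \,\mathrm{div}\,\mu^k , \qquad
-\langle p_q^k, q \rangle = \int_\Omega \mu^k \cdot q ,
% and integration by parts (assumed to have no boundary contribution) gives
\int_\Omega u \,\mathrm{div}\,\mu^k = -\int_\Omega \mu^k \cdot \nabla u .
% The identification (31) is preserved by the updates (16) and (22):
p_q^{k+1} = p_q^k - r(q^{k+1} - \nabla u^{k+1})
          = -\mu^k - r(q^{k+1} - \nabla u^{k+1}) = -\mu^{k+1}, \qquad
p_u^{k+1} = p_u^k - r\,\mathrm{div}(q^{k+1} - \nabla u^{k+1}) = -\mathrm{div}\,\mu^{k+1} .
```

Together with the zero initialization, the last line shows by induction that (31) holds for every $k$, so the two iterations generate the same sequence $(u^k, q^k)$.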
[Figure 1 panels: Original (SNR: Inf dB); Blurry&Noisy (SNR: 6.30 dB); deconvwnr (SNR: 11.29 dB, t = 0.08 s); deconvreg (SNR: 11.17 dB, t = 0.36 s); ALM, r = 10 (SNR: 12.99 dB, t = 0.86 s); deconvlucy (SNR: 9.29 dB, t = 1.31 s)]
Fig. 1. Augmented Lagrangian method for ROF restoration, and comparisons to built-in Matlab functions. In the sub-figures, SNR and t denote the signal-to-noise ratio and the CPU time usage, respectively.
[Figure 2 panels: FTVd, r0 = 1, SF = 2, r = 256 (SNR: 12.62 dB, t = 1.09 s); ALM, r0 = 1, SF = 2, r = 128 (SNR: 12.52 dB, t = 0.75 s); ALM, r0 = 1, SF = 1.70, r = 69.758 (SNR: 12.71 dB, t = 0.80 s)]
Fig. 2. Comparisons between the FTVd package (splitting-and-penalty) and the augmented Lagrangian method with increasing penalty parameters for ROF restoration. In the sub-figures, r0, SF and r stand for the initial value, the scaling factor and the final value of the penalty parameter of the methods, respectively. Here, SNR and t denote the signal-to-noise ratio and the CPU time usage, respectively.
3.3 Remark
We want to emphasize that our observations can be extended to many other models, including anisotropic TV, high order nonlinear PDE filters (e.g. fourth order models), vectorial TV, and even more general models. Similarly, we can use FFT-based fast solvers and closed form solutions to solve the sub-problems of the corresponding algorithms. In addition, one can also naturally derive the dual methods [12] [26] [4] from the system of optimality conditions of the augmented Lagrangian functionals for these models. Furthermore, the equivalence between split Bregman iteration and the augmented Lagrangian method is also valid for these models. More details will be given in a forthcoming paper.
4 Examples
Two numerical examples are provided in Fig. 1 and Fig. 2 to illustrate the accuracy and efficiency of our method. In Fig. 1 we compare our method with some built-in Matlab functions, i.e. deconvwnr.m, deconvreg.m and deconvlucy.m. As one can see, our method generates much better restorations than these built-in Matlab functions at comparable (or even lower) CPU time cost. In Fig. 2 we also compare our method (with increasing parameter r) with the recently developed FTVd package based on pure splitting-and-penalty, which is one of the most efficient approaches compared to other existing methods, as discussed in [29]. From Fig. 1 and Fig. 2 one can also compare FTVd with our method with fixed parameter r.
5 Conclusion
In this paper we use an approach based on the augmented Lagrangian method for the ROF model. The algorithm benefits from FFT-based fast solvers and a closed form solution. We also show that our method gives a uniform framework for understanding the approaches currently regarded as particularly efficient for the ROF model, such as dual methods and split Bregman iteration. The CGM and Chambolle dual methods are different iterative schemes for solving the augmented Lagrangian system, and the dual variables in these methods are nothing but the Lagrange multiplier. Split Bregman iteration is actually equivalent to the augmented Lagrangian method. Numerical examples demonstrate the accuracy and efficiency of our approach. The method can be extended to many other restoration models.
Acknowledgements. This research has been supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDM-IDM002-010. Support from SUG 20/07 is also gratefully acknowledged.
References
1. Bertsekas, D.P.: Multiplier methods: a survey. Automatica 12, 133–145 (1976)
2. Blomgren, P., Chan, T.F.: Color TV: total variation methods for restoration of vector-valued images. IEEE Trans. Image Process. 7, 304–309 (1998)
3. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
4. Bresson, X., Chan, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. UCLA CAM Report 07-25 (2007)
5. Carter, J.L.: Dual methods for total variation-based image restoration. Ph.D. thesis, UCLA (2001)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)
7. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20, 1964–1977 (1999)
8. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22, 503–516 (2000)
9. Chan, T.F., Osher, S., Shen, J.: The digital TV filter and nonlinear denoising. IEEE Trans. Image Process. 10, 231–241 (2001)
10. Chan, T.F., Kang, S.H., Shen, J.H.: Total variation denoising and enhancement of color images based on the CB and HSV color models. J. Visual Commun. Image Repres. 12, 422–435 (2001)
11. Chan, T., Esedoglu, S., Park, F.E., Yip, A.: Recent developments in total variation image restoration. UCLA CAM Report 05-01 (2005)
12. Chan, T.F., Esedoglu, S., Park, F.E.: A fourth order dual method for staircase reduction in texture extraction and image restoration problems. UCLA CAM Report 05-28 (2005)
13. Glowinski, R., Le Tallec, P.: Augmented Lagrangians and operator-splitting methods in nonlinear mechanics. SIAM, Philadelphia (1989)
14. Goldstein, T., Osher, S.: The split Bregman method for L1 regularized problems. UCLA CAM Report 08-29 (2008)
15. Hestenes, M.R.: Multiplier and gradient methods. Journal of Optimization Theory and Applications 4, 303–320 (1969)
16. Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian for convexification and denoising. Computing 76, 109–133 (2006)
17. Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: movies, color, texture, and volumetric medical images. Int'l J. Computer Vision 39, 111–129 (2000)
18. Lysaker, M., Lundervold, A., Tai, X.-C.: Noise removal using fourth-order partial differential equation with applications to medical Magnetic Resonance Images in space and time. IEEE Trans. Image Process. 12, 1579–1590 (2003)
19. Lysaker, M., Tai, X.-C.: Iterative image restoration combining total variation minimization and a second order functional. Int'l J. Computer Vision 66, 5–18 (2006)
20. Osher, S., Burger, M., Goldfarb, D., Xu, J.J., Yin, W.T.: An iterative regularization method for total variation-based image restoration. SIAM Multiscale Model. Simul. 4, 460–489 (2005)
21. Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1972)
22. Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Mathematical Programming 5, 354–373 (1973)
23. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
24. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans. Image Process. 5, 1582–1586 (1996)
25. Scherzer, O.: Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing 60, 1–27 (1998)
26. Steidl, G.: A note on the dual treatment of higher-order regularization functionals. Computing 76, 135–148 (2006)
27. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDEs: a common framework for different applications. IEEE Trans. Pattern Anal. Machine Intell. 27, 506–517 (2005)
28. Wang, Y.L., Yin, W.T., Zhang, Y.: A fast algorithm for image deblurring with total variation regularization. UCLA CAM Report 07-22 (2007)
29. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences (to appear)
30. Yin, W.T., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for compressed sensing and related problems. SIAM J. Imaging Sciences 1, 143–168 (2008)
31. You, Y.-L., Kaveh, M.: Fourth-order partial differential equation for noise removal. IEEE Trans. Image Process. 9, 1723–1730 (2000)
32. Zhu, M., Wright, S.J., Chan, T.F.: Duality-based algorithms for total variation image restoration. UCLA CAM Report 08-33 (2008)
The Convergence of a Central-Difference Discretization of Rudin-Osher-Fatemi Model for Image Denoising
Ming-Jun Lai¹, Bradley Lucier², and Jingyue Wang³
¹ University of Georgia, Athens GA 30602, USA, [email protected]
² Purdue University, West Lafayette IN 47907, USA, [email protected]
³ University of Georgia, Athens GA 30602, USA, [email protected]
Abstract. We study the connection between minimizers of the discrete and the continuous Rudin-Osher-Fatemi models. We use a central-difference total variation term in the discrete ROF model and treat the discrete input data as a projection of the continuous input data into the discrete space. We employ a method developed in [13], with slight adaptation to the setting of the central-difference total variation ROF model. We obtain an error bound between the discrete and the continuous minimizer in the $L^2$ norm under the assumption that the continuous input data are in $W^{1,2}$.
1 Introduction
One of the most influential variational models for image denoising is the total variation-based model proposed by Rudin, Osher and Fatemi (ROF) [10]. This model studies the following constrained minimization problem:
\[ \arg\min_u |u|_{BV} \quad \text{with} \quad \int_\Omega u = \int_\Omega g \quad \text{and} \quad \int_\Omega |u - g|^2 = \sigma^2, \tag{1} \]
where $g$ is the input data, $\sigma$ is the standard deviation of the noise, $\Omega$ is the unit square $[0,1]^2$, and $|u|_{BV}$ is the total variation (TV) of $u$, defined as follows. We consider functions $\phi$ in the space of $C^1$ functions from $\Omega$ to $\mathbb{R}^2$ with compact support, i.e., $[C_0^1(\Omega)]^2$. The variation of a function $u \in L^1(\Omega)$ is then defined to be
\[ |u|_{BV} := \int_\Omega |Du| := \sup_{\phi \in [C_0^1(\Omega)]^2,\ |\phi| \le 1 \text{ pointwise}} \int_\Omega u\,\nabla\cdot\phi. \]
For more details on functions of bounded variation, we refer the reader to [9]. X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 514–526, 2009. c Springer-Verlag Berlin Heidelberg 2009
The existence and uniqueness of the minimizer of (1) have been studied by Lions, Osher and Rudin [11] and more completely by Acar and Vogel [1]. Chambolle and Lions [4] proved that the constrained problem (1) is equivalent to the following unconstrained problem:
\[ \arg\min_u |u|_{BV} + \frac{1}{2\lambda}\int_\Omega |u - g|^2. \tag{2} \]
They also proved more general results on the existence and uniqueness of (1). We later call
\[ E(u) = |u|_{BV} + \frac{1}{2\lambda}\int_\Omega |u - g|^2 \tag{3} \]
the ROF energy functional. On the computational side, the most commonly used discrete variational model is based on the discrete energy
\[ E_k(u) = \sum_{i,j=0}^{k-1} \mu_{i,j}\,|(\nabla u)_{i,j}| + \frac{1}{2\lambda}\sum_{i,j=0}^{k-1} \mu_{i,j}\,(u_{i,j} - g_{i,j})^2, \tag{4} \]
where $u$ is a 2-dimensional matrix of size $k \times k$ and $\mu_{i,j}$ is related to the scale $k$. A simple choice is $\mu_{i,j} = 1/k^2$. There are several possible choices for the discrete gradient operator $\nabla u$ [3], [5], and [13]. A common choice is $(\nabla u)_{i,j} = ((\nabla_x u)_{i,j}, (\nabla_y u)_{i,j})$, with
\[ (\nabla_x u)_{i,j} = \frac{u_{i+1,j} - u_{i,j}}{h}, \qquad (\nabla_y u)_{i,j} = \frac{u_{i,j+1} - u_{i,j}}{h}, \]
where $h = 1/k$. On the boundary, $u$ is assumed to satisfy the discrete Neumann boundary conditions:
\begin{align}
u_{-1,j} = u_{0,j}, \qquad & u_{k,j} = u_{k-1,j}, \tag{5} \\
u_{i,-1} = u_{i,0}, \qquad & u_{i,k} = u_{i,k-1}. \tag{6}
\end{align}
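As an illustration of (4) with the forward-difference gradient and the boundary conditions (5)–(6), the following Python sketch evaluates the discrete energy for the simple choice $\mu_{i,j} = 1/k^2$. The function name `discrete_rof_energy` and the array layout (first index $i$, second index $j$) are assumptions made for this sketch.

```python
import numpy as np

def discrete_rof_energy(u, g, lam):
    """Evaluate E_k(u) from (4) for k-by-k arrays u and g, with mu_ij = 1/k^2."""
    k = u.shape[0]
    h = 1.0 / k
    mu = 1.0 / k**2
    # Forward differences; by (5)-(6), u_{k,j} = u_{k-1,j} and u_{i,k} = u_{i,k-1},
    # so the difference at the last index is zero.
    dx = np.diff(u, axis=0, append=u[-1:, :]) / h
    dy = np.diff(u, axis=1, append=u[:, -1:]) / h
    tv = mu * np.sum(np.sqrt(dx**2 + dy**2))
    fidelity = mu * np.sum((u - g) ** 2) / (2.0 * lam)
    return tv + fidelity
```

For example, for the ramp $u_{i,j} = ih$ with $g = u$, only the TV term contributes, and it equals $(k-1)/k$ because the last row of differences vanishes.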
The discrete function $g_{i,j}$ is the input image. Many efficient algorithms have been developed to find the numerical minimizer of (4) [6], [2], [3]. It is not hard to show that $E_k$ $\Gamma$-converges to $E$ (for the definition of $\Gamma$-convergence, we refer the reader to [7]); therefore, the sequence $\{u^k\}$ of minimizers of $E_k$ converges to $u$, the minimizer of $E$, in $L^1(\Omega)$, and $E_k(u^k)$ converges to $E(u)$ as $k$ tends to $\infty$ (cf. [7]). It is interesting to know the rate of convergence and whether convergence holds in other norms, e.g., in the $L^2$ norm. It is also interesting to see the difference between the continuous minimizer and the discrete minimizer. The authors of [13] proved that if the discrete energy $E_k$ is equipped with a symmetrical discrete total variation as defined in (7), and the discrete input data $g^k$ is the projection of the continuous input data $g$ obtained by averaging $g$ over each pixel, then one can bound the error between the discrete minimizer $u^k$ and the continuous minimizer $u$ in the $L^2$ norm by the Lipschitz norm of $g$, provided that $g$ is in some Lipschitz space.
\[
\begin{aligned}
|u^k|_{TV} = \frac{h^2}{4}\sum_{i,j=0}^{k-1}\Bigg[
&\bigg(\Big(\frac{u^k_{i+1,j} - u^k_{i,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j+1} - u^k_{i,j}}{h}\Big)^2\bigg)^{1/2}
+ \bigg(\Big(\frac{u^k_{i+1,j} - u^k_{i,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j} - u^k_{i,j-1}}{h}\Big)^2\bigg)^{1/2} \\
+ &\bigg(\Big(\frac{u^k_{i,j} - u^k_{i-1,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j+1} - u^k_{i,j}}{h}\Big)^2\bigg)^{1/2}
+ \bigg(\Big(\frac{u^k_{i,j} - u^k_{i-1,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j} - u^k_{i,j-1}}{h}\Big)^2\bigg)^{1/2}
\Bigg]
\end{aligned}
\tag{7}
\]
In this paper, we extend the study in [13], [12] to the discrete ROF model equipped with a central-difference TV term, which is much simpler than the symmetrical discrete TV term. The ideas of this paper are exactly the same as those in [13]. However, a problem with the central-difference model is that it does not deal well with some non-smooth data, for example, a chessboard image. Thus we have to adapt the study in [13] slightly to this situation and put a stronger assumption on the input data $g$ in order to establish convergence. We can still get a similar error bound if the input data $g \in W^{1,2}$. More precisely, our main results are the following.
Theorem 1. If $g \in W^{1,2}$, $u$ is the minimizer of $E$ in (3), and $u^k$ is the minimizer of $E_k$ in (4) equipped with the central-difference TV operator (whose definition is given in (10)), then
\[ |E(u) - E_k(u^k)| \le C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}. \]
Theorem 2. If $g \in W^{1,2}$, $u$ is the minimizer of the functional $E$ in (3), and $u^k$ is the minimizer of the functional $E_k$ in (10), then
\[ \|I_h u^k - u\|^2 \le C(\lambda + 1)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}, \]
where $I_h u^k$ is the piecewise constant injection of $u^k$ into the $L^2$ space. The definition of $I_h u^k$ will be given in (14) in the next section.
2 Preliminaries
A continuous image $u$ is defined as an $L^2$ function on $\Omega \subset \mathbb{R}^2$. In practice, we always assume $\Omega$ to be the unit square $[0,1] \times [0,1]$.
We assume the denoised output image to be in the space of functions of bounded variation. In the discrete setting, we consider the discrete set $\Omega^k$ to be the set of all pairs $i = (i_1, i_2) \in \mathbb{Z}^2$ with $0 \le i_1, i_2 \le k$. A discrete image $u^k$ is defined as a function on $\Omega^k$. Throughout this paper we use superscripts to indicate that a function is discrete. For discrete functions, we define the discrete $\ell^p(\Omega^k)$ norms
\[ \|u^k\|_{\ell^p(\Omega^k)} := \Big(\sum_{i \in \Omega^k} |u^k_i|^p \mu_i\Big)^{1/p} \quad \text{for } 1 \le p \le \infty, \]
where $\mu_i$ is the measure of the discrete space at each index $i$. The simplest choice of $\mu_i$ is $\mu_i = 1$ for $i \in \Omega^k$. In analogy with the Sobolev norm, we define discrete Sobolev norms as follows. The first order forward finite differences of $u^k$ at the point $i = (i_1, i_2)$ are
\[ \Delta_x^+ u^k_i = \frac{u^k_{i_1+1,i_2} - u^k_{i_1,i_2}}{h}; \qquad \Delta_y^+ u^k_i = \frac{u^k_{i_1,i_2+1} - u^k_{i_1,i_2}}{h}, \]
where $h = 1/k$ is the step size. We can also define backward finite differences as
\[ \Delta_x^- u^k_i = \frac{u^k_{i_1,i_2} - u^k_{i_1-1,i_2}}{h}; \qquad \Delta_y^- u^k_i = \frac{u^k_{i_1,i_2} - u^k_{i_1,i_2-1}}{h}. \]
One can define the second order finite difference as
\[ \Delta_{xx} u^k_i = \frac{\Delta_x^+ u^k_i - \Delta_x^- u^k_i}{h}. \]
Also $\Delta_{yy} u^k_i$ can be defined similarly. We define $\|\nabla u^k\|_{\ell^1}$, $\|\Delta_{xx} u^k\|_{\ell^1}$, $\|\Delta_{yy} u^k\|_{\ell^1}$ as
\[ \|\nabla u^k\|_{\ell^1} := \sum_i \big(|\Delta_x^+ u^k_i| + |\Delta_y^+ u^k_i|\big)\mu_i; \tag{8} \]
\[ \|\Delta_{xx} u^k\|_{\ell^1} := \sum_i |\Delta_{xx} u^k_i|\,\mu_i, \qquad \|\Delta_{yy} u^k\|_{\ell^1} := \sum_i |\Delta_{yy} u^k_i|\,\mu_i. \tag{9} \]
In this paper, we shall study the error bound for the central-difference discrete ROF model, whose energy functional is defined as
\[ E_c(u^k) = J_c(u^k) + \frac{1}{2\lambda}\|u^k - g^k\|_c^2, \tag{10} \]
where the BV term $J_c$ is defined by
\[ J_c(u^k) := \sum_{i \in \Omega^k} \sqrt{|\Delta_x^c u^k_i|^2 + |\Delta_y^c u^k_i|^2}\;\mu_i, \tag{11} \]
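and $\Delta_x^c u^k_i$ and $\Delta_y^c u^k_i$ at $i := (i_1, i_2)$ are defined by
\[ \Delta_x^c u^k_i = \frac{u^k_{i_1+1,i_2} - u^k_{i_1-1,i_2}}{2h}, \qquad \Delta_y^c u^k_i = \frac{u^k_{i_1,i_2+1} - u^k_{i_1,i_2-1}}{2h}. \]
Here $u^k$ satisfies the discrete Neumann boundary condition:
\[ u^k_{-1,j} = u^k_{1,j}, \qquad u^k_{k+1,j} = u^k_{k-1,j}, \qquad u^k_{i,-1} = u^k_{i,1}, \qquad u^k_{i,k+1} = u^k_{i,k-1}. \]
The discrete space measure is $\mu_i = |\Omega_i|$, where $\Omega_i$ is the intersection of $\Omega$ and the square with center $ih$ and side length $h$:
\[ \Omega_i := \Omega \cap \Big[i_1 h - \frac{h}{2},\ i_1 h + \frac{h}{2}\Big] \times \Big[i_2 h - \frac{h}{2},\ i_2 h + \frac{h}{2}\Big]. \tag{12} \]
It is straightforward to calculate
\[ \mu_i = \begin{cases} h^2/4, & (i_1, i_2) \in \{(0,0), (0,k), (k,0), (k,k)\}, \\ h^2/2, & i_1 \in \{0, k\},\ 0 < i_2 < k \ \text{ or } \ i_2 \in \{0, k\},\ 0 < i_1 < k, \\ h^2, & 0 < i_1, i_2 < k. \end{cases} \tag{13} \]
The $\ell^2$ term is defined by
\[ \|u^k - g^k\|_c^2 = \sum_{i,j=0}^{k} |u^k_{i,j} - g^k_{i,j}|^2\,\mu_{i,j}. \]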
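The central-difference term $J_c$ of (11), with the reflecting boundary condition and the measures (13), can be sketched in Python as follows. The sketch also gives one way to see the difficulty with chessboard data mentioned in the introduction: every central difference of a chessboard pattern vanishes, so $J_c$ is zero for it, whereas a forward-difference TV is not. The function names and the forward-difference comparison (which reuses the measures (13) and a zero difference past the last index) are assumptions made for illustration.

```python
import numpy as np

def cell_measures(k):
    """Measures mu_i from (13) on the (k+1)-by-(k+1) grid, h = 1/k."""
    h = 1.0 / k
    w = np.full(k + 1, h)
    w[0] = w[-1] = h / 2.0          # half cells on the boundary
    return np.outer(w, w)           # h^2 interior, h^2/2 edges, h^2/4 corners

def central_tv(u):
    """J_c(u) from (11); u has shape (k+1, k+1)."""
    k = u.shape[0] - 1
    h = 1.0 / k
    # mode="reflect" gives u_{-1} = u_1 and u_{k+1} = u_{k-1}
    p = np.pad(u, 1, mode="reflect")
    dx = (p[2:, 1:-1] - p[:-2, 1:-1]) / (2.0 * h)
    dy = (p[1:-1, 2:] - p[1:-1, :-2]) / (2.0 * h)
    return np.sum(np.sqrt(dx**2 + dy**2) * cell_measures(k))

def forward_tv(u):
    """Forward-difference TV on the same grid, for comparison only (assumption)."""
    k = u.shape[0] - 1
    h = 1.0 / k
    dx = np.diff(u, axis=0, append=u[-1:, :]) / h
    dy = np.diff(u, axis=1, append=u[:, -1:]) / h
    return np.sum(np.sqrt(dx**2 + dy**2) * cell_measures(k))

k = 8
i, j = np.indices((k + 1, k + 1))
chessboard = ((i + j) % 2).astype(float)
print(central_tv(chessboard), forward_tv(chessboard))
```

The first printed value is zero because $u_{i_1+1,i_2}$ and $u_{i_1-1,i_2}$ always lie on squares of the same colour, which suggests why a stronger smoothness assumption on $g$ is imposed for the central-difference model.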
We often need to extend $u \in L^p(\Omega)$ and $u^k \in \ell^p(\Omega^k)$ to all of $\mathbb{R}^2$ and $\mathbb{Z}^2$, respectively; we denote the extensions by $\mathrm{Ext}\,u$ and $\mathrm{Ext}^k u^k$. For $u \in L^p(\Omega)$, we use the following procedure. First,
\[ \mathrm{Ext}\,u(x) = u(x), \qquad x \in \Omega. \]
We then reflect horizontally across the line $x_1 = 1$,
\[ \mathrm{Ext}\,u(x_1, x_2) = \mathrm{Ext}\,u(2 - x_1, x_2), \qquad 1 \le x_1 \le 2,\ 0 \le x_2 \le 1, \]
and reflect again vertically across the line $x_2 = 1$,
\[ \mathrm{Ext}\,u(x_1, x_2) = \mathrm{Ext}\,u(x_1, 2 - x_2), \qquad 0 \le x_1 \le 2,\ 1 \le x_2 \le 2. \]
Having defined $\mathrm{Ext}\,u$ on $2\Omega$, we then extend $\mathrm{Ext}\,u$ periodically with period $(2, 2)$ on all of $\mathbb{R}^2$. We use a similar construction for discrete functions $u^k$. First we extend $u^k$ to $2\Omega^k := \{i = (i_1, i_2) \in \mathbb{Z}^2 \mid 0 \le i_1, i_2 \le 2k\}$ as follows:
\[ \mathrm{Ext}^k u^k_i = u^k_i, \qquad i \in \Omega^k; \]
The Convergence of a Central-Difference Discretization of ROF Model
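then we reflect horizontally
\[ \mathrm{Ext}^k u^k_{(i_1,i_2)} = \mathrm{Ext}^k u^k_{(2k-i_1,\,i_2)}, \qquad k+1 \le i_1 \le 2k,\ 0 \le i_2 \le k, \]
and then vertically
\[ \mathrm{Ext}^k u^k_{(i_1,i_2)} = \mathrm{Ext}^k u^k_{(i_1,\,2k-i_2)}, \qquad 0 \le i_1 \le 2k,\ k+1 \le i_2 \le 2k. \]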
Now that $\mathrm{Ext}^k u^k$ is defined on $2\Omega^k$, we extend it periodically with period $(2k, 2k)$ to all of $\mathbb{Z}^2$. Note that with this definition, the value of $\mathrm{Ext}^k u^k$ at any point immediately "outside" $\Omega^k$ is the same as the value of $u^k$ at the closest point "inside" $\Omega^k$. We sometimes need to inject or project functions into $L^2(\Omega)$ or the discrete space $\ell^2(\Omega^k)$, respectively. We use the piecewise constant injector to inject a discrete function $u^k$ into $L^p(\Omega)$:
\[ (I_h u^k)(x) = u^k_i \quad \text{for } x \in \Omega_i. \tag{14} \]
We also define an injector $L_h$ into a space of continuous, piecewise linear functions. In fact, $L_h$ is the linear interpolation of the discrete points $\{u^k_i\}$ on a triangulation with vertices $h\mathbb{Z}^2$:
\[ L_h u^k = \sum_{i \in \Omega^k} u^k_i\,\phi^k_i. \tag{15} \]
Here $\phi^k_i$ is a dilated and translated tent function,
\[ \phi^k_i(x) := \phi^k_{i_1,i_2}(x_1, x_2) := \phi(x_1/h - i_1,\ x_2/h - i_2), \tag{16} \]
where $\phi$ is the tent function which is continuous on $\mathbb{R}^2$, supported in the hexagon shown in Fig. 1, linear on each triangle as shown in Fig. 1, and satisfies
\[ \phi(i_1, i_2) = \begin{cases} 0, & (i_1, i_2) \in \mathbb{Z}^2 \setminus \{(0,0)\}, \\ 1, & (i_1, i_2) = (0,0). \end{cases} \]
We also consider the piecewise constant projector of $u \in L^1(\Omega)$ onto the space of discrete functions, defined by
\[ (P_k u)_i = \frac{1}{|\Omega_i|} \int_{\Omega_i} u, \qquad i \in \Omega^k, \]
where $|\Omega_i| = \mu_i$ is the measure of $\Omega_i$ defined in (12). We need both continuous and discrete smoothing operators, which we define as follows. Assume that $\eta(x)$ is a fixed non-negative, rotationally symmetric mollifier with support in the unit disk that is $C^\infty$ and has integral 1. For $\epsilon > 0$ we define the scaled function
\[ \eta_\epsilon(x) := \frac{1}{\epsilon^2}\,\eta\Big(\frac{x}{\epsilon}\Big), \qquad x \in \mathbb{R}^2; \]
[Figure: hexagonal support of $\phi$, centered at $(0, 0)$, with vertices $(-1, 1)$, $(0, 1)$, $(1, 0)$, $(1, -1)$, $(0, -1)$, and $(-1, 0)$.]
Fig. 1. The Support of φ
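we smooth a function $u \in L^p(\Omega)$, $1 \le p \le \infty$, by computing
\[ (\eta_\epsilon * \mathrm{Ext}\,u)(x) = \int_{\mathbb{R}^2} \eta_\epsilon(x - y)\,\mathrm{Ext}\,u(y)\,dy, \qquad x \in 2\Omega. \]
The discrete smoothing operator $S_L$ is defined by
\[ (S_L u^k)_i = \frac{1}{(2L+1)^2} \sum_{j_1, j_2 = -L}^{L} u^k_{i + (j_1, j_2)} \quad \text{for } i \in \Omega^k. \]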
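As an illustration of how the reflect-then-periodize extension and the box average fit together, here is a minimal sketch. It assumes that values of $u^k$ outside $\Omega^k$ are read through $\mathrm{Ext}^k$ when averaging; `ext_index` and `smooth` are hypothetical names chosen for this example.

```python
def ext_index(i, k):
    """Map any integer index to {0, ..., k}: period 2k, then reflection across k."""
    i %= 2 * k                          # periodic extension with period 2k
    return 2 * k - i if i > k else i    # reflection for k+1 <= i <= 2k-1

def smooth(u, L):
    """Box average over a (2L+1) x (2L+1) window, using the extended values."""
    k = len(u) - 1
    n = (2 * L + 1) ** 2
    out = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i1 in range(k + 1):
        for i2 in range(k + 1):
            s = 0.0
            for j1 in range(-L, L + 1):
                for j2 in range(-L, L + 1):
                    s += u[ext_index(i1 + j1, k)][ext_index(i2 + j2, k)]
            out[i1][i2] = s / n
    return out
```

Handling the extension through an index map avoids materializing the extended array, at the cost of one modulo operation per access.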
For $u \in L^p(\Omega)$ we define the (first-order) $L^p(\Omega)$ modulus of smoothness by
\[ \omega(u, t)_{L^p(\Omega)} = \sup_{\tau \in \mathbb{R}^2,\ |\tau| < t} \Big( \int_{x,\,x+\tau \in \Omega} |u(x + \tau) - u(x)|^p\,dx \Big)^{1/p}. \]
(d) (Lyapunov Functionals) $V(t) := \Phi(u(t)) := \sum_{i \in J} r(u_i(t))$ is a Lyapunov function for all $r \in C^1[a, b]$ with increasing $r'$ on $[a, b]$: $V(t)$ is decreasing and bounded from below by $\Phi(c)$, where $c := (\mu, \ldots, \mu) \in \mathbb{R}^N$.
(e) (Convergence to a Constant Steady State) $\lim_{t \to \infty} u(t) = c$.
The proof shows that not all of the requirements (S1)–(S5) are necessary for each of the theoretical results above: Requirement (S1) is needed for local well-posedness, while proving a maximum–minimum principle requires (S3) and (S4). Local well-posedness together with the maximum–minimum principle implies global well-posedness. The average grey value invariance is based on (S2) and (S3). The existence of Lyapunov functionals can be established by means of (S2)–(S4), and convergence to a constant steady state requires (S5) in addition to (S2)–(S4).
4 Application to Space-Discrete FAB Diffusion
It is straightforward to verify the prerequisites (S1)–(S5) for the popular positive diffusivity functions, so that Theorem 1 is applicable. However, for FAB diffusion negative diffusivities are possible and the situation becomes much more complicated. One immediately sees that space-discrete FAB diffusion with $g \in C^1[0, \infty)$ satisfies (S1: smoothness), (S2: symmetry), and (S3: vanishing row sums). However, this just implies local well-posedness and average grey level invariance. By inspecting (9) it becomes clear that (S4: nonnegative off-diagonals) and (S5: irreducibility) cannot be satisfied for typical FAB diffusivities: These diffusivities may vanish (which violates (S5)) and they may even become negative (violating (S4)). As a consequence, global well-posedness, a maximum–minimum principle, Lyapunov functions and convergence to a constant steady state cannot be proven in this way. For the practical applicability of FAB diffusion it would be highly desirable to have at least global well-posedness and a maximum–minimum principle. Is there a remedy for these properties? Fortunately the answer is affirmative, since (S4: nonnegative off-diagonals) can be replaced by a less restrictive condition that only holds at extrema:
Theorem 2 (Space-Discrete Diffusion Filtering under Weaker Conditions) Assume that a space-discrete filter satisfies only the properties (S1)–(S3) of the framework $(P_s)$, and
M. Welk, G. Gilboa, and J. Weickert
(S4a) nonnegative off-diagonals at extrema: $a_{i,j}(u) \ge 0$ for all $j \in J$ with $j \ne i$ if $u$ has an extremum in $i$.
Then the well-posedness result (a), the maximum–minimum principle (b), and the average grey level invariance (c) of Theorem 1 are still satisfied.
Proof. Following [13], one observes that in some pixel $k$ that is a discrete global maximum (i.e. $u_k \ge u_j$ for all $j \in J$), condition (S4a) implies that
\[ \frac{du_k}{dt} = \sum_{j \in J} a_{kj}(u)\,u_j = a_{kk}(u)\,u_k + \sum_{j \in J \setminus \{k\}} \underbrace{a_{kj}(u)}_{\ge 0}\,\underbrace{u_j}_{\le u_k} \le u_k \cdot \sum_{j \in J} a_{kj}(u) \overset{(S3)}{=} 0. \tag{10} \]
In the same way one can prove that if $k$ is a minimum, one has $\frac{du_k}{dt} \ge 0$. This nonenhancement behaviour in extrema is the only place where nonnegativity is required in the entire proof of the maximum–minimum principle in [13]. As a consequence, the maximum–minimum principle still holds if (S4) is replaced by the weaker condition (S4a). Moreover, together with local well-posedness, global well-posedness is obtained. This completes the proof.
While the preceding results are encouraging, we have not yet shown that a suitable space-discretisation satisfies the nonnegativity requirement (S4a) at extrema. Unfortunately, this issue is a bit more delicate than one might assume: A standard discretisation of the diffusivity $g(|\nabla u|^2)$ in some pixel $(i, j)$ is given by the central difference approximation
\[ g_{i,j} := g\left( \Big(\frac{u_{i+1,j} - u_{i-1,j}}{2h_1}\Big)^2 + \Big(\frac{u_{i,j+1} - u_{i,j-1}}{2h_2}\Big)^2 \right). \tag{11} \]
Note that even if $u$ has an extremum in $(i, j)$, the preceding central difference approximation of $|\nabla u|^2$ may become positive, and not 0 as one would expect from the continuous theory. Since the FAB diffusivities only guarantee that $g(0) > 0$, it can happen that this finite difference approximation creates negative diffusivities in extrema and (S4a) is violated. Fortunately there is an interesting alternative to the standard discretisation of the diffusivity that solves these problems immediately:
Theorem 3 (Properties of Space-Discrete FAB Diffusion) The space discretisation (6) of FAB diffusion with $g(0) > 0$ and $g \in C^1[0, \infty)$ is well-posed, satisfies a maximum–minimum principle and average grey level invariance, if the diffusivity is evaluated by the nonstandard finite difference approximation
\[ g_{i,j} := g\left( \max\Big( \frac{u_{i+1,j} - u_{i,j}}{h_1} \cdot \frac{u_{i,j} - u_{i-1,j}}{h_1},\ 0 \Big) + \max\Big( \frac{u_{i,j+1} - u_{i,j}}{h_2} \cdot \frac{u_{i,j} - u_{i,j-1}}{h_2},\ 0 \Big) \right). \tag{12} \]
Theoretical Foundations for Discrete Forward-and-Backward Diffusion Filtering
It should be noted that this approximation has the same quadratic order of consistency as the previous one. However, it guarantees a vanishing discrete gradient approximation in extrema. As a consequence, (S4a) is guaranteed, since FAB diffusivities satisfy $g(0) > 0$. Interestingly, the positivity property $g(0) > 0$ together with the smoothness assumption $g \in C^1[0, \infty)$ are the only requirements that are necessary to establish well-posedness and a maximum–minimum principle for space-discrete FAB diffusion. Last but not least, these results are not restricted to the two-dimensional case: With a similar nonstandard approximation, it is straightforward to verify that space-discrete FAB diffusion is well-posed and satisfies an extremum principle in any dimension.
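The difference between the two gradient approximations at an extremum can be checked on a small example. The sketch below evaluates only the argument passed to $g$ in (11) and in (12); the function names and the sample image are hypothetical and chosen for illustration.

```python
def standard_argument(u, i, j, h1=1.0, h2=1.0):
    """Central-difference approximation of |grad u|^2, the argument of g in (11)."""
    dx = (u[i + 1][j] - u[i - 1][j]) / (2 * h1)
    dy = (u[i][j + 1] - u[i][j - 1]) / (2 * h2)
    return dx * dx + dy * dy

def nonstandard_argument(u, i, j, h1=1.0, h2=1.0):
    """Products of one-sided differences clipped at zero, the argument of g in (12)."""
    px = (u[i + 1][j] - u[i][j]) / h1 * (u[i][j] - u[i - 1][j]) / h1
    py = (u[i][j + 1] - u[i][j]) / h2 * (u[i][j] - u[i][j - 1]) / h2
    return max(px, 0.0) + max(py, 0.0)

# Pixel (1, 1) is a strict local maximum with unequal neighbours.
sample = [[9.0, 1.0, 9.0],
          [0.0, 5.0, 2.0],
          [9.0, 3.0, 9.0]]
print(standard_argument(sample, 1, 1))     # positive although (1, 1) is an extremum
print(nonstandard_argument(sample, 1, 1))  # vanishes at the extremum
```

At an extremum the two one-sided differences in each direction have opposite signs (or one vanishes), so each product is nonpositive and the clipped sum is zero, which is why $g$ is evaluated at 0 there.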
5 Fully Discrete FAB Diffusion
In order to establish useful properties for FAB diffusion in the fully discrete case, we restrict ourselves to the 1-D setting and use a simple explicit time discretisation with step size $\tau$. Then the corresponding scheme to $\partial_t u = \partial_x\big(g((\partial_x u)^2)\,\partial_x u\big)$ is given by
\[ \frac{u_i^{k+1} - u_i^k}{\tau} = \frac{g_{i-1}^k + g_i^k}{2} \cdot \frac{u_{i-1}^k - u_i^k}{h^2} + \frac{g_{i+1}^k + g_i^k}{2} \cdot \frac{u_{i+1}^k - u_i^k}{h^2} \tag{13} \]
with the nonstandard approximation $g_i^k = g\Big(\max\Big(\frac{u_i^k - u_{i-1}^k}{h} \cdot \frac{u_{i+1}^k - u_i^k}{h},\ 0\Big)\Big)$. The upper index denotes the time level, i.e. $u_i^k$ approximates $u$ at location $(i - \frac{1}{2})h$ and time $k\tau$. This approximation also holds at the boundary pixels $u_1$ and $u_N$ when one uses the aforementioned dummy pixels. For our analysis, two additional assumptions are essential. While the first one refers to the range of grey values, the second one requires a diffusivity $g$ that still takes sufficiently large positive values for small positive arguments. We get the following result.
Theorem 4 (Properties of Fully Discrete FAB Diffusion) Let an initial 1-D image $f = (f_i)$ be given and let the sequence of images $u^k = (u_i^k)$ evolve according to (13) with the initial condition $u^0 = f$. Let the grey values $f_i$ be restricted to a finite interval of length $R$. Assume further that two constants $c_1 > c_2 > 0$ exist such that the diffusivity $g$ fulfils $g(0) = c_1$, and $g(z) > -c_2$ for all $z > 0$. Moreover, assume that a positive $\omega$ exists such that $g(s^2) > c_2$ holds for all $s$ with $0 < s < \omega R$. If the time step satisfies τ
$\omega^2 h^2 R^2$. Here we conclude from $u^k_{i+1} - u^k_{i+2} \le R$ that
\[ u^k_i - u^k_{i+1} > \omega^2 h^2 R. \tag{20} \]
Using $\frac{1}{2}(g^k_i + g^k_{i+1}) < c_1$ and $\frac{1}{2}(g^k_{i+1} + g^k_{i+2}) > -c_2$ we obtain from (19) the estimate
\[ u^{k+1}_{i+1} \le u^k_{i+1} + \frac{\tau}{h^2}\,c_1\,(u^k_i - u^k_{i+1}) + \frac{\tau}{h^2}\,c_2 R \tag{21} \]
which ensures $u^{k+1}_{i+1} \le u^k_i$, provided that
\[ \tau \le \frac{\omega^2 h^4}{c_1 \omega^2 h^2 + c_2} \tag{22} \]
holds. Condition (18) ensures the bounds of both cases, i.e. (17) and (22).
Step 3: No new extrema are generated around existing extrema
Assume that $u^k_i$ is a local maximum, and none of its neighbours is a local minimum. Assume first that
\[ (u^k_{i+1} - u^k_i)(u^k_{i+2} - u^k_{i+1}) > \omega^2 R^2 \tag{23} \]
and thus again (20) and (21) hold. Similar considerations for $u^{k+1}_i$ yield
\[ u^{k+1}_i \ge u^k_i + \frac{\tau}{h^2}\,c_1\,(u^k_{i+1} - u^k_i) - \frac{\tau}{h^2}\,c_1 R \tag{24} \]
which together with (21) implies
\[ u^{k+1}_i - u^{k+1}_{i+1} \ge \Big(1 - 2\,\frac{\tau}{h^2}\,c_1\Big)(u^k_i - u^k_{i+1}) - \frac{\tau}{h^2}\,(c_1 + c_2)\,R. \tag{25} \]
By the hypothesis of the theorem, (14), and (20) we have that τ
$u^k_{i+1} > u^k_{i+2} > u^k_{i+3}$. We show that then also $u^{k+1}_{i+1} \ge u^{k+1}_{i+2}$ holds. In the proof we distinguish three cases.
Case 1: $g^k_i + g^k_{i+1} \ge 0$ and $g^k_{i+2} + g^k_{i+3} \ge 0$. Then
\[ u^{k+1}_{i+1} - u^{k+1}_{i+2} \ge \Big(1 - 2\,\frac{\tau}{h^2}\,c_1\Big)(u^k_{i+1} - u^k_{i+2}) \tag{32} \]
such that the right-hand side is again nonnegative if (17) holds.
Case 2: $g^k_i + g^k_{i+1} \ge 0$ and $g^k_{i+2} + g^k_{i+3} < 0$. (The case $g^k_i + g^k_{i+1} < 0$ and $g^k_{i+2} + g^k_{i+3} \ge 0$ is treated in a symmetric way.) From $u^k_{i+2} - u^k_{i+3} \le R$ and $(u^k_{i+1} - u^k_{i+2})(u^k_{i+2} - u^k_{i+3}) > \omega^2 h^2 R^2$ we obtain
\[ u^k_{i+1} - u^k_{i+2} > \omega^2 h^2 R. \tag{33} \]
Consequently,
\[ \begin{aligned} u^{k+1}_{i+1} - u^{k+1}_{i+2} &\ge u^k_{i+1} - u^k_{i+2} - 2\,\frac{\tau}{h^2}\,c_1\,(u^k_{i+1} - u^k_{i+2}) - \frac{\tau}{h^2}\,c_2\,(u^k_{i+2} - u^k_{i+3}) \\ &> u^k_{i+1} - u^k_{i+2} - 2\,\frac{\tau}{h^2}\,c_1\,(u^k_{i+1} - u^k_{i+2}) - \frac{\tau}{h^2}\,c_2 R. \end{aligned} \tag{34} \]
Due to (33) the right-hand side is certainly nonnegative if
\[ \tau \le \frac{\omega^2 h^4}{2 c_1 \omega^2 h^2 + c_2}. \tag{35} \]
Case 3: $g^k_i + g^k_{i+1} < 0$ and $g^k_{i+2} + g^k_{i+3} < 0$. Since in this case we have
\[ (u^k_i - u^k_{i+1}) + (u^k_{i+2} - u^k_{i+3}) \le R, \tag{36} \]
it follows that
\[ (u^k_{i+1} - u^k_{i+2}) \min(u^k_i - u^k_{i+1},\ u^k_{i+2} - u^k_{i+3}) > \omega^2 h^2 R^2 \tag{37} \]
and thus
\[ u^k_{i+1} - u^k_{i+2} > 2 \omega^2 h^2 R. \tag{38} \]
A similar reasoning as in Case 2 gives that $u^{k+1}_{i+1} - u^{k+1}_{i+2} \ge 0$ is ensured if
\[ \tau \le \frac{\omega^2 h^4}{2 c_1 \omega^2 h^2 + c_2/2}. \tag{39} \]
Comparing the bounds derived for the different statements yields (14) as the most restrictive one. If this condition is imposed, extrema cannot be created but only shifted to neighbouring pixels, and monotone segments preserve their monotonicity. Both the maximum–minimum principle and the reduction of total variation follow immediately. This completes the proof. We are convinced that Theorem 4 also possesses a 2-D analogue. The preceding proof, however, does not transfer in a straightforward way to this case: The dependency of g on nonstandard discretisations of ux and uy (cf. (12)) makes it highly cumbersome to control the sign of g.
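For readers who want to experiment with the explicit scheme (13), the following is a minimal sketch of one time step in 1-D. Several pieces are assumptions made for illustration: the dummy pixels are taken to mirror the boundary values, the function names `fab_step` and `toy_g` are hypothetical, and `toy_g` is an invented diffusivity with $g(0) = 1$ and $g(z) > -0.2$, not one used by the authors.

```python
def toy_g(z):
    """Illustrative FAB-like diffusivity (assumption): positive near 0, negative for large z."""
    return (1.0 - 0.2 * z) / (1.0 + z)

def fab_step(u, g, tau, h=1.0):
    """One explicit step of scheme (13) with the nonstandard diffusivity argument."""
    n = len(u)
    ext = [u[0]] + list(u) + [u[-1]]          # mirrored dummy pixels (assumption)

    def diffusivity(i):                        # i indexes ext, 1..n
        left = (ext[i] - ext[i - 1]) / h
        right = (ext[i + 1] - ext[i]) / h
        return g(max(left * right, 0.0))

    gv = [diffusivity(i) for i in range(1, n + 1)]
    gext = [gv[0]] + gv + [gv[-1]]             # boundary fluxes vanish, so these are inert
    new = []
    for i in range(1, n + 1):
        flux_left = 0.5 * (gext[i - 1] + gext[i]) * (ext[i - 1] - ext[i]) / h ** 2
        flux_right = 0.5 * (gext[i + 1] + gext[i]) * (ext[i + 1] - ext[i]) / h ** 2
        new.append(ext[i] + tau * (flux_left + flux_right))
    return new
```

Because each interior flux enters two neighbouring pixels with opposite signs and the boundary fluxes vanish, the sum of grey values is preserved by every step regardless of the sign of the diffusivity. Whether extrema are also preserved depends on the time step restriction of Theorem 4, which this sketch does not check.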
6 Summary and Conclusions
In spite of its negative diffusivity, FAB diffusion becomes well-posed if a nonstandard space discretisation is used. It guarantees a positive diffusivity in discrete extrema. This result is fundamental for justifying FAB diffusion in a practical setting with digital images. Our ongoing work includes research on the multidimensional fully discrete case as well as extensions of our results to (semi-)implicit time discretisations.
Acknowledgement
This work was initiated during a visit of Guy Gilboa to Saarland University. His visit was financially supported by the Minerva Foundation.
References
1. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, 2nd edn. Applied Mathematical Sciences, vol. 147. Springer, New York (2006)
2. Breuß, M., Welk, M.: Staircasing in semidiscrete stabilised inverse diffusion algorithms. Journal of Computational and Applied Mathematics 206, 520–533 (2007)
3. Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Forward-and-backward diffusion processes for adaptive image enhancement and denoising. IEEE Transactions on Image Processing 11(7), 689–703 (2002)
4. Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Image sharpening by flows based on triple well potentials. Journal of Mathematical Imaging and Vision 20, 121–131 (2004)
5. Kramer, H.P., Bruckner, J.B.: Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition 7, 53–58 (1975)
6. Mrázek, P., Weickert, J., Steidl, G.: Diffusion-inspired shrinkage functions and stability results for wavelet denoising. International Journal of Computer Vision 64(2/3), 171–186 (2005)
7. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM Journal on Applied Mathematics 61(2), 633–658 (2000)
8. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data fidelity terms. Application to the processing of outliers. SIAM Journal on Numerical Analysis 40(3), 965–994 (2002)
9. Osher, S., Rudin, L.: Shocks and other nonlinear filtering applied to image processing. In: Tescher, A.G. (ed.) Applications of Digital Image Processing XIV. Proceedings of SPIE, vol. 1567, pp. 414–431. SPIE Press, Bellingham (1991)
10. Osher, S., Rudin, L.I.: Feature-oriented image enhancement using shock filters. SIAM Journal on Numerical Analysis 27, 919–940 (1990)
11. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639 (1990)
12. Pollak, I., Willsky, A.S., Krim, H.: Image segmentation and edge enhancement with stabilized inverse diffusion equations. IEEE Transactions on Image Processing 9(2), 256–266 (2000)
13. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
14. Weickert, J., Benhamouda, B.: A semidiscrete nonlinear scale-space theory and its relation to the Perona–Malik paradox. In: Solina, F., Kropatsch, W.G., Klette, R., Bajcsy, R. (eds.) Advances in Computer Vision, pp. 1–10. Springer, Wien (1997)
15. Welk, M., Weickert, J., Galić, I.: Theoretical foundations for spatially discrete 1-D shock filtering. Image and Vision Computing 25(4), 455–463 (2007)
$L_0$-Norm and Total Variation for Wavelet Inpainting
Andy C. Yau¹, Xue-Cheng Tai¹,², and Michael K. Ng³
¹ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
² Mathematics Institute, University of Bergen, Norway
³ Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Abstract. In this paper, we suggest an algorithm to recover an image whose wavelet coefficients are partially lost. We propose a wavelet inpainting model that uses the $L_0$-norm and total variation (TV) minimization. Traditionally, the $L_0$-norm is replaced by the $L_1$-norm or the $L_2$-norm due to numerical difficulties. We use an alternating minimization technique to overcome these difficulties. In order to improve the numerical efficiency, we also apply a graph cut algorithm to solve the subproblem related to TV minimization. Numerical results will be given to demonstrate the advantages of the proposed algorithm.
1 Introduction
Inpainting refers to filling in the missing "information" in an image. The missing information can be in the pixel domain, but it can also be in another domain. As wavelets play an important role in image compression, some information may be lost when the image is compressed and transmitted, either in terms of pixels or wavelet coefficients. In this work, we consider how to "inpaint" the missing wavelet information. The inpainting idea in digital image processing has been developed for several years. Masnou and Morel [17] solved the inpainting problem by propagating level curves. Bertalmio et al. [2] suggested solving a third-order PDE. Chan and Shen [6] proposed a total variation (TV) inpainting model which uses variational methods in inpainting. Tai et al. [18] suggested an inpainting algorithm that propagates the information into the inpainting domain along the isophote direction by solving the TV-Stokes equation. Chan et al. [5] suggested a unified TV model for inpainting and superresolution. However, all these methods work in the pixel domain only. In the wavelet domain, the situation is totally different. Damage in the wavelet domain produces correlated damage patterns in the pixel domain. Therefore, we cannot recover the damaged image directly in the pixel domain. Chan et al. [7] suggested a wavelet-based TV
The research is supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010. In addition, the support from SUG 20/07 is also gratefully acknowledged.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 539–551, 2009. c Springer-Verlag Berlin Heidelberg 2009
algorithm to recover the image whose wavelet coefficients are lost. They applied TV regularization in the pixel domain to control and restore wavelet coefficients in the wavelet domain. Cai et al. [3] suggested a tight frame based inpainting algorithm. They found the sparse representation of the image by using a smoothed function with the $L_1$-norm and solved the inpainting problem by the projection of a vector onto a convex set. In this paper, we will restore the image by finding the best sparse representation of the image in the wavelet domain with a TV minimization. There are some earlier works on image restoration in the wavelet domain with TV minimization. Chan and Zhou [8] discussed image denoising and compression by combining wavelets and TV minimization. Durand and Froment [11] used TV minimization in conjunction with wavelets to eliminate pseudo oscillations in image restoration. Wang and Zhou [20] suggested a wavelet-based TV minimization to denoise medical images. To find the sparse representation, we apply the $L_0$-norm, which counts the number of nonzero entries of a vector. It is usually used for finding sparse representations. However, minimizing it is a combinatorial problem which is NP-hard [10]. It is hard to solve directly and therefore it has not been used much in real applications. It is usually replaced by the $L_1$-norm. This makes the objective function of the optimization a convex functional, and the problem can be solved more easily. However, the computational cost of finding the solution of such an optimization is very high. Mancera and Portilla [16] suggested a method to find sparse representations using the $L_0$-norm directly. They presented a sub-optimal method, that is, looking for the vector with $K$ non-zero coefficients that minimizes the Euclidean distance to the input signal.
In this paper, we will consider the observed image damaged in the wavelet domain and propose a fast algorithm for image restoration by using the L0 -norm and the TV minimization. We first find the best sparse representation by taking the L0 -norm in the wavelet domain and then fill the missing information by the TV minimization solved by a graph cut algorithm [1] [9]. We will present a method to solve the L0 -norm minimization directly and efficiently. The paper will be organized as follows. In Section 2, we will introduce our mathematical model and discuss the method to solve it. Numerical results will be given in Section 3 to demonstrate our algorithm. We will conclude this paper in Section 4.
2 Mathematical Model
Consider the image domain $\Omega \subset \mathbb{R}^2$. Let $g$ be the original image of size $n_1 \times n_2$. Then the noisy image $\tilde{g}$ is given by
\[ \tilde{g} = g + \eta, \tag{1} \]
where $\eta$ is a noise vector. Let $W$ be the wavelet transform that maps an image from the pixel domain to the wavelet domain. Then the wavelet decomposition of $\tilde{g}$ is given by
\[ \hat{g} = Wg + W\eta. \tag{2} \]
We assume that some of the wavelet coefficients are lost. Let $J$ be the index set indicating the positions at which the wavelet coefficients are known; the rest are lost. Then we define a mapping $\Pi$ by
\[ \Pi_{i,j} = \begin{cases} 1, & \text{if } (i,j) \in J; \\ 0, & \text{if } (i,j) \notin J. \end{cases} \tag{3} \]
Therefore, the observed image, which is damaged in the wavelet domain, can be written as
\[ \hat{g}_{ob} = \Pi(Wg + W\eta). \tag{4} \]
To restore the image, we consider the following minimization:
\[ \min_u\ \|\Pi(\hat{u} - \hat{g})\|_0 + \beta\,TV(u), \tag{5} \]
where $\beta$ is a regularization parameter and $TV(u)$ denotes the TV norm of $u$, which is
\[ TV(u) = \int_\Omega |\nabla u|\,dx. \tag{6} \]
Here and later, for any function defined in the image domain, we will use $\hat{u}$ to denote its wavelet representation, i.e. $\hat{u} = Wu$. Due to the orthogonality of the wavelet transformation, we always have
\[ \|\hat{u}\|_2 = \|u\|_2. \tag{7} \]
For simplicity, we use $\|\cdot\|_2$ to denote both the matrix $L_2$ and the pixel domain $L_2(\Omega)$ norms. The $L_0$-norm measures the differences of wavelet coefficients between the observed image and the resultant image, while the TV norm fills in the missing information of the image in the pixel domain. Minimization problem (5) tries to find an image $u$ whose wavelet coefficients are close to $\hat{g}$ while its TV norm is minimized in the pixel domain. This problem is not easy to solve. To solve the minimization (5), we introduce an auxiliary function $f$ and one more fitting term [13] [21]. Then it becomes
\[ \min_{u,f}\ \|\Pi(\hat{u} - \hat{g})\|_0 + \frac{1}{\alpha}\|u - f\|_2^2 + \beta\,TV(f) = \min_{\hat{u}} \min_f\ \|\Pi(\hat{u} - \hat{g})\|_0 + \frac{1}{\alpha}\|\hat{u} - \hat{f}\|_2^2 + \beta\,TV(f). \tag{8} \]
We have used property (7) in the above formulation. The new auxiliary image $f$ is an approximation of $u$, and the new fitting term $\frac{1}{\alpha}\|u - f\|_2^2$ is used to control the difference between $u$ and $f$ in the pixel domain. When $\alpha$ goes to zero, the image $f$ will go to $u$. The advantage of the above formulation is that we can solve the minimization (5) by solving two sub-minimization problems, i.e.
\[ \min_{\hat{u}}\ \|\Pi(\hat{u} - \hat{g})\|_0 + \frac{1}{\alpha}\|\hat{u} - \hat{f}\|_2^2, \tag{9} \]
\[ \min_f\ \|u - f\|_2^2 + \beta\,TV(f). \tag{10} \]
These two minimization problems are coupled. The first minimization problem tries to find $u$ for a given $f$, and the second minimization problem tries to find $f$ for a given $u$. We use an iterative scheme that alternately minimizes these two sub-problems. For the first minimization problem (9), we shall show that the solution can be given by a simple explicit formula when $\alpha$ approaches zero. This is cost efficient. The second minimization problem (10) is essentially the ROF model. We shall use a new fast graph cut algorithm to solve it. In the minimization (9), the functional is separable and we can minimize with respect to $\hat{u}_{i,j}$ separately for different $(i, j)$. This is a very important property. For each $(i, j)$, there are two possible cases for $\hat{u}_{i,j}$. We solve the problem by considering both cases.
Case 1: $\hat{u}_{i,j} = \hat{g}_{i,j}$. The objective functional value of (9) related to the $(i, j)$-th coefficient is
\[ 0 + \frac{1}{\alpha}\,|\hat{g}_{i,j} - \hat{f}_{i,j}|^2. \tag{11} \]
Case 2: $\hat{u}_{i,j} \ne \hat{g}_{i,j}$. The objective functional value of (9) related to the $(i, j)$-th coefficient is
\[ 1 + \frac{1}{\alpha}\,|\hat{u}_{i,j} - \hat{f}_{i,j}|^2. \tag{12} \]
However, $\hat{u}_{i,j}$ has two possible choices, either $\hat{g}_{i,j}$ or $\hat{f}_{i,j}$. Therefore, in this case, we substitute $\hat{u}_{i,j} = \hat{f}_{i,j}$ into (12), and we have
\[ 1 + \frac{1}{\alpha}\,|\hat{f}_{i,j} - \hat{f}_{i,j}|^2 = 1. \tag{13} \]
If $\hat{u}_{i,j} = \hat{g}_{i,j}$, the following inequality must hold:
\[ \frac{1}{\alpha}\,|\hat{u}_{i,j} - \hat{f}_{i,j}|^2 \le 1. \tag{14} \]
Therefore, the update scheme for $\hat{u}_{i,j}$ is
\[ \hat{u}_{i,j} = \begin{cases} \hat{g}_{i,j}, & \text{if } \frac{1}{\alpha}(\hat{g}_{i,j} - \hat{f}_{i,j})^2 \le 1 \text{ and } (i,j) \in J; \\ \hat{f}_{i,j}, & \text{otherwise.} \end{cases} \tag{15} \]
If $\alpha$ is small enough, the update scheme (15) becomes
\[ \hat{u}_{i,j} = \begin{cases} \hat{g}_{i,j}, & \text{if } (i,j) \in J; \\ \hat{f}_{i,j}, & \text{otherwise.} \end{cases} \tag{16} \]
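The coefficient-wise update can be written down directly. The sketch below applies the rule as stated in (15) when a value of `alpha` is supplied and the simplified rule (16) otherwise; the function name `update_coefficients`, the boolean mask `known` (true exactly on the index set $J$), and the nested-list data layout are assumptions made for illustration.

```python
def update_coefficients(g_hat, f_hat, known, alpha=None):
    """Coefficient-wise update of u_hat.

    known[i][j] is True iff (i, j) belongs to J.
    alpha=None applies rule (16); a number applies the thresholded rule (15).
    """
    rows, cols = len(g_hat), len(g_hat[0])
    u_hat = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            keep = known[i][j]
            if keep and alpha is not None:
                # keep the observed coefficient only if the fitting cost stays below 1
                keep = (g_hat[i][j] - f_hat[i][j]) ** 2 / alpha <= 1.0
            u_hat[i][j] = g_hat[i][j] if keep else f_hat[i][j]
    return u_hat
```

Since the functional in (9) is separable, each coefficient is decided independently, so the update costs one pass over the coefficient array.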
In case there is noise, we should choose a small $\alpha$ and use (15) to update $\hat{u}_{i,j}$. When $\hat{u}$ is found, we can find $f$ by applying the graph cut algorithm to the minimization (10). In order to use the graph cut algorithm, we need to discretize the TV norm in the minimization. Let $M = \{(i, j) \mid i \in \{1, \ldots, n_1\},\ j \in \{1, \ldots, n_2\}\}$ be the set of grid points, and let $\delta$ denote the mesh size. We use a special form for the discrete TV norm as in [1] [4] [19]:
\[ TV^k(f) = \sum_{p \in M} \sum_{q \in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|, \tag{17} \]
where $N_k(p)$ is the set of neighboring points of a grid point $p \in M$, defined as
\[ N_4 = \{(i \pm 1, j),\ (i, j \pm 1) \mid (i, j) \in M\}, \qquad N_8 = \{(i \pm 1, j),\ (i, j \pm 1),\ (i \pm 1, j \pm 1) \mid (i, j) \in M\}, \]
and $\omega_{pq} = \dfrac{4\delta^2}{k\,\|p - q\|_2}$. Finally, the minimization (10) can be rewritten in the discrete form as follows:
\[ \min_f\ \sum_{p \in M} |u_p - f_p|^2 + \beta \sum_{p \in M} \sum_{q \in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|. \tag{18} \]
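To make the discrete objective concrete, here is a small sketch that evaluates (17) and (18) for the 4-neighborhood. It assumes a single constant weight for all neighbor pairs (passed in as `weight`, defaulting to 1, which corresponds to taking unit mesh size and unit neighbor distance) and it simply skips neighbors that fall outside the grid; the function names are hypothetical.

```python
def tv4(f, weight=1.0):
    """Discrete TV with N_4 neighbors; each unordered pair is visited twice with factor 1/2."""
    rows, cols = len(f), len(f[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                qi, qj = i + di, j + dj
                if 0 <= qi < rows and 0 <= qj < cols:   # skip neighbors outside the grid
                    total += 0.5 * weight * abs(f[i][j] - f[qi][qj])
    return total

def energy(u, f, beta, weight=1.0):
    """Objective of the discrete minimization: data fit plus beta times discrete TV."""
    fit = sum((up - fp) ** 2 for ur, fr in zip(u, f) for up, fp in zip(ur, fr))
    return fit + beta * tv4(f, weight)
```

The factor $\frac{1}{2}$ in (17) compensates for the double sum visiting every neighbor pair in both orders, so each pair contributes $\omega_{pq}|f_p - f_q|$ once overall.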
We assume that the image is in $n$-bit grey scale format and thus $f$ can only take values in $\{0, 1, \ldots, 2^n - 1\}$. Due to this special requirement, we can solve the minimization (10) by a graph cut algorithm.
2.1 Graph Construction
In this subsection, we shall use the graph cut method to solve the minimization (18). The graph cut method can basically be divided into two parts: graph construction and finding the minimal cut. Our graph construction is based on the method from Bae and Tai [1], which constructs a 3-dimensional graph. Suppose that the observed image is an $n$-bit gray level image. Then the intensity levels of this image range from $0$ to $2^n - 1$, and a set of vertices is defined as
\[ V = \{v_{p,l} \mid p \in M,\ l \in \{1, \ldots, 2^n - 1\}\} \cup \{s\} \cup \{t\}. \]
Here $s$, $t$ refer to the two terminal nodes, and we refer to [1] for more details. All the edges of the graph can be divided into two groups, $E_d$ and $E_r$, that is, $E = E_d \cup E_r$, where
\[ E_d = \bigcup_{l=1}^{2^n - 2} \{(v_{p,l}, v_{p,l+1}) \mid p \in M\} \cup \{(s, v_{p,1}) \mid \forall p \in M\} \cup \{(v_{p,2^n-1}, t) \mid \forall p \in M\}; \]
\[ E_r = \{(v_{p,l}, v_{q,l}) \mid p \in M,\ q \in N_k(p),\ \forall l \in \{1, \ldots, 2^n - 1\}\}. \]
The cost of the edges in $E_d$ is defined by the data fitting terms, which is given by
\[ c(s, v_{p,1}) = \delta^2 |u_p|^2, \qquad c(v_{p,l}, v_{p,l+1}) = \delta^2 |u_p - l|^2 \ \text{ where } l \in \{1, \ldots, 2^n - 2\}, \qquad c(v_{p,2^n-1}, t) = \delta^2 |u_p - (2^n - 1)|^2. \]
We say that a cut is admissible if it severs exactly one edge for each $p \in M$, in which case exactly $n_1 n_2$ edges from $E_d$ are severed. The cost of the edges in $E_r$ is defined by the TV norm, which is given by
\[ c(v_{p,l}, v_{q,l}) = \beta\,\omega_{pq} \quad \text{where } p \in M \text{ and } q \in N_k(p). \]
The above method gives us the 3-dimensional graph $G = (V, E)$. Then we can find the minimal cut on $G$. A cut on $G$ is a partition of the vertices $V$ into two disjoint sets $(V_s, V_t)$ such that $s \in V_s$ and $t \in V_t$. For a given cut, the set of severed edges $C$ is defined as
\[ C = \{(a, b) \mid a \in V_s,\ b \in V_t \text{ and } (a, b) \in E\}. \tag{19} \]
The cut severs the edge $e$ if $e$ is contained in $C$. The cost of the cut is defined as
\[ |C| = \sum_{e \in C} c(e). \tag{20} \]
The minimal cut is the cut whose total cost $|C|$ is minimal. As in [1], we associate an $f$ with every admissible cut $C$ on $G$ through the definition
\[ f_p = \begin{cases} 0, & \text{if } (s, v_{p,1}) \in C, \\ l, & \text{if } (v_{p,l}, v_{p,l+1}) \in C, \\ 2^n - 1, & \text{if } (v_{p,2^n-1}, t) \in C. \end{cases} \tag{21} \]
It is easy to see the following relation between a cut $C$ and the image function $f$:
\[ |C| = \sum_{p \in M} |u_p - f_p|^2 + \beta \sum_{p \in M} \sum_{q \in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|, \tag{22} \]
which is the objective function of the minimization (18). So if $C$ is the minimal cut, then the objective function is minimal with respect to $f$. Therefore, finding the minimal cut on this graph is equivalent to solving the minimization (18). Besides the graph construction mentioned above, there are other methods to construct the graph. Darbon et al. suggested constructing the graph with one layer only and solving the problem level by level [9]. Ishikawa et al. proposed a similar graph construction with more vertices and edges [15] [14]. According to the method mentioned above, we can construct the graph corresponding to the TV minimization (18) and find the minimal cut by applying the push-and-relabel algorithm [12]. As a result, we can solve the minimization (10).
3 Numerical Result
In the numerical experiment, three images (the 'lena' image, the 'bush' image and a synthetic image) of size 96×96 are used for testing and shown in Figure 1. We compare the quality of the images by the peak signal-to-noise ratio (PSNR), which is given by
\[ PSNR = 10 \log_{10} \frac{255^2}{\|u - u_0\|_2^2}, \tag{23} \]
where $u_0$ is an original image and $u$ is a reconstructed image.
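A PSNR computation along the lines of (23) might look as follows. One point is an assumption rather than something stated in the text: the squared norm is divided by the number of pixels here (the usual mean-squared-error convention), since (23) as printed does not show a normalization. The function name `psnr` and the `peak` parameter are illustrative.

```python
import math

def psnr(u, u0, peak=255.0):
    """PSNR in dB between a reconstruction u and a reference u0 (nested lists).

    Assumes the squared error is averaged over all pixels (mean squared error).
    """
    n = sum(len(row) for row in u)
    sq = sum((a - b) ** 2 for ru, r0 in zip(u, u0) for a, b in zip(ru, r0))
    if sq == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak ** 2 / (sq / n))
```

With this convention, an image that differs from the reference by the full peak value at every pixel has a PSNR of 0 dB, and higher values indicate a closer reconstruction.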
In the experiment, we assume that $\alpha$ is small enough that we can apply the update scheme (16) directly. We initialize $\beta$ with a starting value and decrease it by 1 in each iteration. The reason is that a large $\beta$ can recover the geometric information, while a small $\beta$ can recover the details of the image. Figure 2 shows the positions of the missing wavelet coefficients of the observed images. We use the 'db7' wavelet in our experiment. The experiments are run in Matlab on a laptop computer with an Intel Core 2 CPU T7200 (2 GHz) and 2 GB of memory. We compare with the algorithms of Chan et al. [7] and use Model 1 and Model 2 to refer to their model 1 and model 2.
Fig. 1. Original images: (a) 'lena' image, (b) 'bush' image and (c) synthetic image
Fig. 2. The position of the missing coefficients of the observed images: (a) 10% of wavelet coefficients are missing and (b) 50% of wavelet coefficients are missing
3.1 Noise Free Image
In the first experiment, we test our algorithm with the noise free image. Figure 3 shows the lena image with 10% wavelet coefficients lost and its restored image. The starting $\beta$ for this case is 60 and the PSNR of the resultant image is 27.80 dB. The results obtained by the algorithms of Chan et al. [7] are also shown in the figure. Figure 3(c) is obtained by their Model 1 with PSNR=28.26 dB and Figure 3(d) with PSNR=26.70 dB is obtained by their Model 2. Figure 4 shows the bush image with 10% wavelet coefficients lost and its restored image. The starting $\beta$ for this case is 55 and the PSNR of the resultant image is 29.54 dB. Figure 4(c) is obtained by Model 1 with PSNR=26.89 dB and Figure 4(d) with PSNR=25.65 dB is obtained by Model 2. Figure 5 shows the bush image with 50% wavelet coefficients lost and its restored image. The starting $\beta$ for this case is 60 and the PSNR of the resultant image is 19.48 dB. Figure 5(c) is obtained by Model 1 with PSNR=17.92 dB and Figure 5(d) with PSNR=18.22 dB is obtained by Model 2. Figure 6 shows the synthetic image with 50% wavelet coefficients lost and its restored image. The starting $\beta$ for this case is 60 and the PSNR of the resultant image is 23.79 dB. Figure 6(c) is obtained by Model 1 with PSNR=26.04 dB and Figure 6(d) with PSNR=22.38 dB is obtained by their Model 2. Table 1 summarizes the results of our experiment.
Fig. 3. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR=11.84 dB) and (b) restored image (PSNR=27.80 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].
Fig. 4. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR=16.08 dB) and (b) restored image (PSNR=29.54 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].
Fig. 5. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR=11.00 dB) and (b) restored image (PSNR=19.48 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].
L0 -Norm and Total Variation for Wavelet Inpainting
Fig. 6. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 10.02 dB) and (b) restored image (PSNR = 23.79 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].

Table 1. Comparison of noise-free cases (PSNR in dB)

Image            Missing coef.   Obs. image   Our alg.   Model 1   Model 2
Lena image       10%             11.84        27.80      28.26     26.70
Bush image       10%             16.08        29.54      26.89     25.65
Bush image       50%             11.00        19.48      17.92     18.22
Synthetic image  50%             10.02        23.79      26.04     22.38

Fig. 7. Noisy 'lena' image (a), 'bush' image (b) and synthetic image (c)
Fig. 8. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR = 11.28 dB) and (b) restored image (PSNR = 22.37 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
The experimental results show that our method obtains better results than Model 2 in all four cases. For the bush image, our results are also better than those of Model 1 in terms of PSNR. The resulting images show that our method can keep
Fig. 9. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR = 15.93 dB) and (b) restored image (PSNR = 26.22 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 10. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 10.76 dB) and (b) restored image (PSNR = 17.54 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 11. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 9.85 dB) and (b) restored image (PSNR = 19.06 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Table 2. Comparison of noisy cases (PSNR in dB)

Image            Missing coef.   Obs. image   Our alg.   Model 1   Model 2
Lena image       10%             11.28        22.37      17.79     20.67
Bush image       10%             15.93        26.22      24.47     22.48
Bush image       50%             10.76        17.54      15.36     15.90
Synthetic image  50%             9.85         19.06      15.84     18.95
more details in the restored image than the other methods. In Figure 6, the small circle in the upper right corner is clearer than in the other images.

3.2 Noisy Image
In the second experiment, we test our algorithm on noisy images. We add white Gaussian noise with σ = 0.01 to the original images; the noisy images are shown in Figure 7. Figure 8 shows the lena image with 10% of its wavelet coefficients lost and the restored image. The starting β for this case is 65. We obtained the best image for β = 23, with a PSNR of 22.37 dB. The PSNR of Figure 8(c), obtained by Model 1, is 17.79 dB and that of Figure 8(d), obtained by Model 2, is 20.67 dB. Figure 9 shows the bush image with 10% of its wavelet coefficients lost and the restored image. The starting β for this case is 50 and the PSNR of the resulting image is 26.22 dB. The PSNR of Figure 9(c), obtained by Model 1, is 24.47 dB and that of Figure 9(d), obtained by Model 2, is 22.48 dB. Figure 10 shows the bush image with 50% of its wavelet coefficients lost and the restored image. The starting β for this case is 60 and the PSNR of the resulting image is 17.54 dB. The PSNR values of the restored images are 15.36 dB for Model 1 (Figure 10(c)) and 15.90 dB for Model 2 (Figure 10(d)). Figure 11 shows the synthetic image with 50% of its wavelet coefficients lost and the restored image. The starting β for this case is 60 and the PSNR of the resulting image is 19.06 dB. The PSNR values of the restored images are 15.84 dB for Model 1 (Figure 11(c)) and 18.95 dB for Model 2 (Figure 11(d)). Table 2 summarizes the results of this experiment. This experiment is more difficult because noise is present in the observed image. Table 2 shows that our restored images are better than the results of the other methods in terms of PSNR. The resulting images show that our method can remove the noise and keep more details of the image. In Figure 10, our result keeps more details than the other results: the head of Bush retains a better shape than in the other images.
4 Conclusion
In this paper, we introduce an algorithm to solve the wavelet inpainting problem. We apply the L0-norm to optimize the wavelet coefficients and TV minimization to fill in the missing information. We suggest a method to treat the L0-norm directly: we solve the minimization (5) by introducing one more fitting term and breaking the problem down into the two minimizations (9) and (10). The minimization (9) minimizes the L0-norm and the minimization (10) minimizes the TV norm. We apply the graph cut algorithm to solve the TV minimization. The experimental results show that our algorithm obtains a higher PSNR than Model 2 of [7] in all tested cases and than Model 1 in most of them.
References
1. Bae, E., Tai, X.C.: Graph Cuts for the Multiphase Mumford-Shah Model Using Piecewise Constant Level Set Methods. UCLA, Applied Mathematics, CAM-report-08-36 (2008)
2. Bertalmio, M., Sapiro, G., Caselles, V., Balleste, C.: Image inpainting. Technical report, ECE-University of Minnesota 60, 259–268 (1999)
3. Cai, J.F., Chan, R.H., Shen, Z.: A Framelet-Based Image Inpainting Algorithm. Appl. Comput. Harmon. Anal. 24, 131–149 (2008)
4. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
5. Chan, T.F., Ng, M.K., Yau, A.C., Yip, A.M.: Superresolution image reconstruction using fast inpainting algorithms. Applied and Computational Harmonic Analysis 23(1), 3–24 (2007)
6. Chan, T., Shen, J.: Mathematical models for local non-texture inpainting. SIAM Journal on Applied Mathematics 62, 1019–1043 (2001)
7. Chan, T., Shen, J., Zhou, H.M.: Total Variation Wavelet Inpainting. Journal of Mathematical Imaging and Vision 25(1), 107–125 (2006)
8. Chan, T., Zhou, H.M.: Optimal Constructions of Wavelet Coefficients Using Total Variation Regularization in Image Compression. UCLA, Applied Mathematics, CAM Report, No. 00–27 (2000)
9. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. J. Math. Imaging Vis. 26(3), 261–276 (2006)
10. Donoho, D.L.: For Most Large Underdetermined Systems of Linear Equations the Minimal l1-norm Solution is also the Sparsest Solution. Communications on Pure and Applied Mathematics 59(7), 903–934 (2006)
11. Durand, S., Froment, J.: Artifact Free Signal Denoising with Wavelets. In: Proceedings of ICASSP 2001, vol. 6, pp. 3685–3688 (2001)
12. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum-flow problem. J. ACM 35(4), 921–940 (1988)
13. Huang, Y., Ng, M.K., Wen, Y.: A Fast Total Variation Minimization Method for Image Restoration. Multiscale Modeling & Simulation 7(2), 774–795 (2008)
14. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), 1333–1336 (2003)
15. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: CVPR 1998: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA. IEEE Computer Society, Los Alamitos (1998)
16. Mancera, L., Portilla, J.: L0-Norm-Based Sparse Representation Through Alternate Projections. In: IEEE International Conference on Image Processing, Atlanta, pp. 2089–2092 (2006)
17. Masnou, S., Morel, J.: Level-lines based disocclusion. In: Proc. 5th IEEE Int. Conf. on Image Process., Chicago, pp. 259–263 (1998)
18. Tai, X.C., Osher, S., Holm, R.: Image Inpainting Using a TV-Stokes Equation. In: Image Processing based on partial differential equations, pp. 3–22. Springer, Heidelberg (2006)
19. Ranchin, F., Chambolle, A., Dibos, F.: Total Variation Minimization and Graph Cuts for Moving Objects Segmentation. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 743–753. Springer, Heidelberg (2007)
20. Wang, Y., Zhou, H.: Total Variation Wavelet-Based Medical Image Denoising. International Journal of Biomedical Imaging 2006, 1–6 (2006)
21. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A New Alternating Minimization Algorithm for Total Variation Image Reconstruction. SIAM J. Imaging Science 1(3), 248–272 (2008)
Total-Variation Based Piecewise Affine Regularization

Jing Yuan¹, Christoph Schnörr¹, and Gabriele Steidl²

¹ Image and Pattern Analysis Group, Dept. Mathematics and Computer Science, University of Heidelberg, Germany
{yuanjing,schnoerr}@math.uni-heidelberg.de
² Appl. Math. Comp. Sci. Group, Dept. Mathematics and Computer Science, University of Mannheim
[email protected]

Abstract. In this paper, we introduce a novel second-order regularizer, the Affine Total-Variation term, to capture the geometry of piecewise affine functions. The approach can be characterized by two convex decompositions of a given image into piecewise affine structure and texture and noise, respectively. A convergent multiplier-based method is presented for computing a global optimum by computationally cheap iterative steps. Experiments with images and vector fields validate our approach and illustrate the difference to classical TV denoising and decomposition.
1 Introduction

1.1 Overview and Motivation
In this paper, we suggest and investigate a novel second-order regularization term,
\[ \mathrm{TV}_a(u) := \int_\Omega \sqrt{u_{xx}^2 + u_{yx}^2 + u_{xy}^2 + u_{yy}^2}\; dx, \tag{1} \]
called Affine Total Variation, for denoising and decomposing functions into piecewise affine structures. Our work has been motivated by the basic total variation approach [15] to the piecewise constant regularization of functions, henceforth called the ROF-model, and a recent extension of this approach suggested in [23] to the piecewise harmonic regularization of vector fields. The latter approach demonstrates that by modifying the usual total variation term
\[ \mathrm{TV}(u) = \int_\Omega |\nabla u|\; dx, \tag{2} \]
flows can be restored and decomposed into richer structures than merely piecewise constant functions, which model only a narrow subclass of real signals sufficiently accurately. At the same time, the basic structure of the ROF-model from the viewpoint of convex optimization has been preserved, such that standard methods from convex programming lead to efficient algorithms.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 552–564, 2009. © Springer-Verlag Berlin Heidelberg 2009
While the work [23] was motivated by flows related to image sequences from experimental fluid dynamics, our present work investigates piecewise affine regularization as an alternative to the piecewise harmonic case studied in [23]. Figure 1 shows the result of applying the novel regularizer (1) to a noisy image function. Our approach returns a denoised version of the input data with the piecewise affine structures preserved well. From the viewpoint of optimization, our approach has the same simple structure as the ROF-model. From the viewpoint of algorithm design, however, a bit more work is required to be able to resort to standard algorithms, due to the second-order partial derivatives appearing in (1).
Fig. 1. From left to right. Noisy input image f , denoised image u using the regularizer (1), and the difference between the original noise-free image and the denoised image. Up to local errors at discontinuities, this latter image is almost constant which means that the piecewise affine structure underlying the noisy input data has been successfully restored.
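The difference between the regularizers (1) and (2) can be made concrete on a discrete grid. The following is a minimal sketch, assuming plain forward differences on the interior of the grid (not the mimetic discretization used later in the paper); the function names `tv` and `tv_affine` are hypothetical. An affine ramp has a strictly positive TV value but a vanishing Affine Total Variation, which is why (1) does not penalize piecewise affine structure away from discontinuities.

```python
import numpy as np

def tv(u):
    """Isotropic discrete TV: sum of |grad u| using forward differences."""
    ux = np.diff(u, axis=1)[:-1, :]
    uy = np.diff(u, axis=0)[:, :-1]
    return np.sqrt(ux ** 2 + uy ** 2).sum()

def tv_affine(u):
    """Discrete analogue of (1): sum of the norm of all second differences."""
    uxx = np.diff(u, n=2, axis=1)[1:-1, :]
    uyy = np.diff(u, n=2, axis=0)[:, 1:-1]
    # Mixed difference; u_xy and u_yx coincide for this discretization.
    uxy = np.diff(np.diff(u, axis=0), axis=1)[:-1, :-1]
    return np.sqrt(uxx ** 2 + 2.0 * uxy ** 2 + uyy ** 2).sum()

# Usage: an affine ramp versus a step edge on a 16 x 16 grid.
y, x = np.mgrid[0:16, 0:16]
ramp = 2.0 * x + 3.0 * y + 1.0
step = (x >= 8).astype(float)
ramp_values = (tv(ramp), tv_affine(ramp))
step_values = (tv(step), tv_affine(step))
```

For the ramp, every first difference equals (2, 3), so `tv` sums the constant value sqrt(13) over the interior, while all second differences vanish. The step edge is penalized by both functionals.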
1.2 Related Work and Contribution
Related work. Applying the standard TV-term (2) to general, not necessarily piecewise constant signals and images leads to the well-known staircasing effect, that is, to many jumps of the minimizing functions, which makes the decomposition of the input data useless for signal interpretation. In this connection, higher-order regularization has been studied in the literature. In [1], Chambolle and Lions propose an inf-convolution of the total-variation term and a functional based on the second-order derivatives:
\[ R(u) = \min_{u=v+w} \int_\Omega |\nabla v|\; dx + \alpha \int_\Omega \sqrt{w_{xx}^2 + w_{yx}^2 + w_{xy}^2 + w_{yy}^2}\; dx. \]
A corresponding asymptotical case was studied in [16]. Chan et al. [3] adaptively add the Laplacian as regularizing term, or replace the second summand in the inf-convolution by the Laplacian in [2], to avoid staircasing. After mollifying the TV-measure, $\mathrm{TV}(u) \approx \int_\Omega \sqrt{|\nabla u|^2 + \varepsilon}\; dx$, $\varepsilon \ll 1$, the corresponding Euler-Lagrange equation is iteratively solved by the lagged-diffusivity fixed-point method (cf. [19]). Likewise, You and Kaveh [20] and Didas et al. [5] investigate Laplacians $\Delta u$ and variations thereof as argument of one convex functional. In [11], Lysaker and Tai provide two regularizers
\[ R_1(u) = \int_\Omega \big( |u_{xx}| + |u_{yy}| \big)\; dx, \tag{3} \]
\[ R_2(u) = \int_\Omega \sqrt{u_{xx}^2 + u_{yx}^2 + u_{xy}^2 + u_{yy}^2}\; dx, \tag{4} \]
which are used in a PDE-based image diffusion process so as to avoid the staircase effect in smooth regions, and a fourth-order numerical scheme is given. In [12], Lysaker and Tai further introduce the convex combination of a high-order regularizer and the classical total-variation term. The functional (3) was also considered in [8]. In [13], Rahman, Tai and Osher suggested a two-step high-order image denoising method, which first computes a denoised tangential field $\tau = (\tau_1, \tau_2)^T$, i.e. $\operatorname{div} \tau = 0$, by applying the regularizer $\int |\nabla \tau|\, dx$, which is actually equivalent to (4) for the image scalar field, and then reconstructs the image gray values by fitting the resulting normal field $n = (\tau_2, -\tau_1)$ through
\[ \min_u \int_\Omega \Big( |\nabla u| - \nabla u \cdot \frac{n}{|n|} \Big)\, dx, \quad \text{s.t.} \quad \int_\Omega (u - f)^2\, dx = \sigma^2. \]
Basically, the energy functionals used in our approaches possess the same structure as those in [11], except for the nonsmooth high-order regularizer applied here. The functional optimized in (13b) is similar to the tangential-smoothing step suggested in [13], except that our approach smooths the curl-free gradient field rather than the div-free field.

In connection with optical flow estimation, Trobin et al. [18] adopt from [4] the second-order term
\[ t(u) := \frac{1}{\sqrt{3}} \Big( \Delta u,\ \sqrt{2}\,(u_{xx} - u_{yy}),\ \sqrt{8}\, u_{xy} \Big)^T \]
and use the corresponding TV-term $\int_\Omega \sqrt{t(u) \cdot t(u)}\, dx$ for flow estimation. The derivation of $t(u)$ in [4] is based on Fourier transforms and motivated by designing local detectors for ridges and valleys of image functions. As a consequence, the corresponding TV-term appears not to be a proper basis for piecewise affine decomposition, and boundaries are not treated adequately (as is clearly visible e.g. in Fig. 2f in [18]).

Contribution. Our contribution consists in devising a novel regularization term (1) that provides a mathematically precise solution to the problem of denoising piecewise affine signals. Staircasing is suppressed as well, and an augmented-Lagrangian-based problem decomposition is derived that makes it possible to compute a global optimum by iterating computationally simple steps. Numerical experiments are presented mainly to illustrate and validate properties of the approach.
2 Subspaces and Orthogonal Decompositions
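We let $\Omega \subset \mathbb{R}^2$ denote an open bounded and simply-connected domain with Lipschitz-continuous boundary $\partial\Omega$. For scalar-valued functions, we denote by $|\cdot|_p$, $1 \le p < \infty$, the usual $L^p(\Omega)$ norm and by $\langle \cdot, \cdot \rangle$ the $L^2(\Omega)$ inner product. For vector-valued functions $g = (g_1, g_2)^T$, we set $\|g\|_p := \big| \sqrt{g_1^2 + g_2^2} \big|_p$ and $\langle g, h \rangle_\Omega := \langle g_1, h_1 \rangle + \langle g_2, h_2 \rangle$. Further, we use the notation $\bar u := |\Omega|^{-1} \int_\Omega u\, dx$ for the average of $u$ and $\nabla u := (u_x, u_y)^T$, $\nabla^\perp u := (u_y, -u_x)^T$, $\operatorname{div} g := g_{1x} + g_{2y}$ and $\operatorname{curl} g := g_{1y} - g_{2x}$. Let $H^1(\Omega)$ denote the Sobolev space with the inner product
\[ \langle u, v \rangle_{H^1} := \langle \nabla u, \nabla v \rangle_\Omega + \bar u\, \bar v \tag{5} \]
and let $H_0^1(\Omega) := \{ u \in H^1(\Omega) : u|_{\partial\Omega} = 0 \}$. We are interested in the space
\[ H(\Omega) := \big\{ u \in H^1(\Omega) : \partial_n u|_{\partial\Omega} = (\bar u_x, \bar u_y)^T \cdot n \big\}, \]
where $n$ denotes the outer unit normal vector at the boundary $\partial\Omega$. By the following proposition, we can decompose functions $u \in H(\Omega)$ into a globally affine component $u_a$ and an oscillating part $u_o$.

Proposition 1. The space $H(\Omega)$ admits the orthogonal decomposition
\[ H(\Omega) = H_a(\Omega) \oplus_{H^1} H_o(\Omega), \tag{6a} \]
\[ H_a(\Omega) := \big\{ u \in H(\Omega) : \nabla u = (\bar u_x, \bar u_y)^T \big\}, \tag{6b} \]
\[ H_o(\Omega) := \big\{ u \in H(\Omega) : \bar u = \bar u_x = \bar u_y = 0,\ \partial_n u|_{\partial\Omega} = 0 \big\}. \tag{6c} \]

Proof. For any $u \in H(\Omega)$, let $u_{o,x} := u_x - \bar u_x$ and $u_{o,y} := u_y - \bar u_y$. Then $u_a := \bar u_x x + \bar u_y y + \bar u \in H_a(\Omega)$, and the function $u_o$ defined by its partial derivatives $u_{o,x}$, $u_{o,y}$ and by $\bar u_o = 0$ belongs to $H_o(\Omega)$. Moreover, we have that $u = u_a + u_o$. The orthogonality of the decomposition follows by
\[ \langle u_a, u_o \rangle_{H^1} = \langle \nabla u_a, \nabla u_o \rangle_\Omega + \bar u_a\, \bar u_o = \bar u_x \int_\Omega u_{o,x}\, dx + \bar u_y \int_\Omega u_{o,y}\, dx = 0. \qquad \Box \]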
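The decomposition of Proposition 1 can be illustrated on sampled data. The following is a minimal sketch, assuming a grid with unit spacing, derivatives approximated by `numpy.gradient`, and coordinates centred at their mean so that the affine part carries the average of u (an assumption of this sketch, made so that the constant term can be taken as the mean of u); the function name `affine_oscillating_split` is hypothetical.

```python
import numpy as np

def affine_oscillating_split(u):
    """Split a sampled u into u_a (global affine part) and u_o = u - u_a."""
    u = np.asarray(u, dtype=float)
    rows, cols = u.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    uy, ux = np.gradient(u)  # derivatives along rows (y) and columns (x)
    # Centred coordinates: the mean of u_a then equals the mean of u.
    u_a = ux.mean() * (x - x.mean()) + uy.mean() * (y - y.mean()) + u.mean()
    return u_a, u - u_a

# Usage: an affine function plus an oscillation.
yy, xx = np.mgrid[0:32, 0:32].astype(float)
affine = 0.5 * xx - 0.25 * yy + 3.0
signal = affine + np.sin(0.7 * xx) * np.cos(0.3 * yy)
_, residual_affine = affine_oscillating_split(affine)
u_a, u_o = affine_oscillating_split(signal)
```

For a purely affine input the oscillating part vanishes, and in general the oscillating part has zero mean and zero mean gradient, mirroring the conditions in (6c).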
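The Helmholtz decomposition of vector fields, see [6, 22, 21] also for the discrete setting, is given by $L^2(\Omega)^2 = \nabla H^1(\Omega) \oplus \nabla^\perp H_0^1(\Omega)$, where the spaces can also be characterized by $\nabla H^1(\Omega) = \{ v \in L^2(\Omega)^2 : \operatorname{curl} v = 0 \}$ and $\nabla^\perp H_0^1(\Omega) = \{ v \in L^2(\Omega)^2 : \operatorname{div} v = 0,\ v \cdot n|_{\partial\Omega} = 0 \}$. We will need the space
\[ V(\Omega) := \{ v \in L^2(\Omega)^2 : v \cdot n|_{\partial\Omega} = 0,\ \bar v_1 = \bar v_2 = 0 \}. \tag{7} \]
By the Helmholtz decomposition, this space admits the orthogonal decomposition
\[ V(\Omega) = V_\nabla(\Omega) \oplus V_{\nabla^\perp}(\Omega), \tag{8} \]
where $V_\nabla(\Omega) := \{ v \in \nabla H^1(\Omega) : v \cdot n|_{\partial\Omega} = 0,\ \bar v_1 = \bar v_2 = 0 \}$ and $V_{\nabla^\perp}(\Omega) := \{ v \in \nabla^\perp H_0^1(\Omega) : \bar v_1 = \bar v_2 = 0 \}$.

Proposition 2. For every vector field $v \in V_\nabla(\Omega)$, there is a unique function $u_o \in H_o(\Omega)$ with $v = \nabla u_o$.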
Proof. By definition, for any $v \in V_\nabla(\Omega)$ there exists $u \in H^1(\Omega)$ such that $v = \nabla u$. Then we see that $v \cdot n|_{\partial\Omega} = \partial_n u|_{\partial\Omega} = 0$ and $\bar v_1 = \bar u_x = 0$, $\bar v_2 = \bar u_y = 0$. On the other hand, $u_o \in H_o$ is uniquely determined by the Neumann problem
\[ \Delta u_o = \operatorname{div} v, \qquad \partial_n u_o|_{\partial\Omega} = 0, \qquad \bar u_o = 0. \tag{9} \]

3 Variational Approaches
In the rest of this paper, we follow the first discretize, then optimize paradigm, yet adopt the usual (continuous) notation that is easier to read. Accordingly, all operators like $\nabla$, $\operatorname{div}$ etc. denote linear mappings between finite-dimensional spaces, $|\cdot|_p$ are the usual $\ell_p$ norms, and for $g := (g_i)_{i=1}^n$, $g_i \in \mathbb{R}^2$, we set $\|g\|_p := \big| (|g_i|_2)_{i=1}^n \big|_p$. In the following, we denote by $\delta_C$ the indicator function of a convex set $C$, i.e. $\delta_C(x) := 0$ if $x \in C$ and $\delta_C(x) := \infty$ otherwise, and by $P_C$ the orthogonal projector onto $C$. We exhibit the effect of the regularizer (1) by computing dual representations of the optimization problems (13), in accordance with the dual formulation of the ROF-model. In general, if $g : \mathbb{R}^n \to \mathbb{R}$ and $\Phi : \mathbb{R}^m \to \mathbb{R}$ are proper, closed convex functions and $D : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator, then the following problem (P) has the dual (D):
\[ (P) \quad \inf_{u \in \mathbb{R}^n} \{ g(u) + \Phi(Du) \}, \qquad (D) \quad - \inf_{p \in \mathbb{R}^m} \{ g^*(-D^* p) + \Phi^*(p) \}, \]
where $g^*$ denotes the conjugate function of $g$. For the problems considered in the following, it can be shown that solutions of the primal and dual problem exist and that the duality gap is zero.

Rudin-Osher-Fatemi (ROF) Model. We recall some basic formulas as a reference for our approach presented below. The ROF-model reads
\[ \inf_u \Big\{ \tfrac{1}{2} |f - u|_2^2 + \alpha\, \mathrm{TV}(u) \Big\}, \qquad \alpha\, \mathrm{TV}(u) := \sigma_{C_\alpha}(u), \tag{10} \]
where $\sigma_{C_\alpha}$ is the support function of $C_\alpha := \operatorname{div} B_\alpha$, $B_\alpha := \{ p : \|p\|_\infty \le \alpha \}$. Let $\hat u$ denote the minimizer of (10). Setting $g(u) := \tfrac{1}{2} |f - u|_2^2$, $D := I$ and $\Phi(u) := \alpha\, \mathrm{TV}(u)$, and regarding that $g^*(v) = \tfrac{1}{2} |f + v|_2^2 - \tfrac{1}{2} |f|_2^2$ and $\Phi^*(v) = \delta_{C_\alpha}(v)$, the dual problem reads
\[ - \inf_{v \in C_\alpha} \Big\{ \tfrac{1}{2} |f - v|_2^2 - \tfrac{1}{2} |f|_2^2 \Big\}, \tag{11} \]
where we have replaced $v$ by $-v$ by the symmetry of $C_\alpha$. Consequently, if
\[ \hat p := \arg\min_{p \in B_\alpha} \tfrac{1}{2} |f - \operatorname{div} p|_2^2, \tag{12} \]
then $\hat v := \operatorname{div} \hat p = P_{C_\alpha}(f)$ is the minimizer of the dual problem. Primal and dual solutions are related by the optimality condition $f - \operatorname{div} \hat p = \hat u$, which in turn yields the image decomposition $f = \hat u + \hat v$.

Affine Variational Models. Based on the regularizer (1) we consider two variational approaches:
\[ \inf_u \Big\{ \tfrac{1}{2} |u - f|_2^2 + \alpha\, \mathrm{TV}_a(u) \Big\}, \tag{13a} \]
\[ \inf_u \Big\{ \tfrac{1}{2} |u - f|_{H^1}^2 + \alpha\, \mathrm{TV}_a(u) \Big\}. \tag{13b} \]
These approaches differ in the data term, which is the usual one in case of (13a), whereas the data term in (13b) is induced by the discrete counterpart of the inner product (5).

3.1 Variational Approach (13a)
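We introduce an auxiliary vector field $v$ in order to express the regularizer (1) in terms of the ordinary TV-measure defined in (2). Then approach (13a) reads
\[ \inf_{u,v} \Big\{ \tfrac{1}{2} |f - u|_2^2 + \alpha\, \mathrm{TV}(v_1) + \alpha\, \mathrm{TV}(v_2) \Big\}, \quad \text{subject to} \quad v = \nabla u, \tag{14} \]
i.e.,
\[ \inf_{u,v} \big\{ g(u,v) + \Phi\big( D (u,v)^T \big) \big\} \]
with $g(u,v) := \tfrac{1}{2} |f - u|_2^2 + \alpha\, \mathrm{TV}(v_1) + \alpha\, \mathrm{TV}(v_2)$, $\Phi := \delta_0$, $D := (\nabla \;\; -I)$. Since $g^*(r,s) = \tfrac{1}{2} |f + r|_2^2 - \tfrac{1}{2} |f|_2^2 + \delta_{C_\alpha}(s_1) + \delta_{C_\alpha}(s_2)$, $\Phi^* \equiv 0$ and $-D^* = \begin{pmatrix} \operatorname{div} \\ I \end{pmatrix}$, the dual problem becomes
\[ - \inf_{q \in C_\alpha^2} g^*\Big( \begin{pmatrix} \operatorname{div} \\ I \end{pmatrix} q \Big) = - \inf_{q \in C_\alpha^2} \Big\{ \tfrac{1}{2} |f - \operatorname{div} q|_2^2 - \tfrac{1}{2} |f|_2^2 \Big\}. \]
This formulation parallels the dual formulation (11) of the ROF-model. Let
\[ \hat q := \arg\min_{q \in C_\alpha^2} \tfrac{1}{2} |f - \operatorname{div} q|_2^2. \tag{15} \]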
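The expression for $-D^*$ used in this derivation can be checked directly. The following short computation is a sketch that assumes the discrete operators satisfy $\nabla^* = -\operatorname{div}$ with respect to the inner products of Section 2; it is added as an illustration and is not part of the original derivation. For any $p$,
\[ \big\langle D (u,v)^T, p \big\rangle_\Omega = \langle \nabla u - v, p \rangle_\Omega = \langle u, -\operatorname{div} p \rangle + \langle v, -p \rangle_\Omega \quad \Longrightarrow \quad -D^* p = \begin{pmatrix} \operatorname{div} p \\ p \end{pmatrix}. \]
Inserting this into $g^*$ gives $g^*(-D^* q) = \tfrac{1}{2} |f + \operatorname{div} q|_2^2 - \tfrac{1}{2} |f|_2^2 + \delta_{C_\alpha}(q_1) + \delta_{C_\alpha}(q_2)$, and replacing $q$ by $-q$, which is admissible because $C_\alpha^2$ is symmetric, yields the data term $\tfrac{1}{2} |f - \operatorname{div} q|_2^2$ of the dual problem.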
The higher-order TV regularization becomes apparent through the texture part of $f$, which is defined by the orthogonal projection $\hat v = \operatorname{div} \hat q = P_{\operatorname{div} C_\alpha^2}(f)$ onto a different convex set. An alternative, more explicit characterization of the regularization effect of (1) in terms of the auxiliary field $v = \nabla u$ is obtained by reformulating (13a) as
\[ \inf_v \big\{ G(v) + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) \big\}, \quad \text{where} \quad G(v) := \inf_{u,\ \nabla u = v} \tfrac{1}{2} |u - f|_2^2. \tag{16} \]
Exploiting strong duality again we obtain that
\[ G(v) = - \inf_p \Big\{ \tfrac{1}{2} |f - \operatorname{div} p|_2^2 - \tfrac{1}{2} |f|_2^2 - \langle v, p \rangle_\Omega \Big\}. \tag{17} \]
Fermat's rule yields that the minimizer $\hat p$ has to fulfill $\nabla \operatorname{div} \hat p = \nabla f - v$ and, in turn, $\Delta(\operatorname{div} \hat p) = \Delta f - \operatorname{div} v$. Insertion into $G(v)$ in (17) yields for (16) (omitting the constant)
\[ \inf_v \Big\{ \tfrac{1}{2} \big| \Delta^{-1} (\Delta f - \operatorname{div} v) \big|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) \Big\}. \tag{18} \]
This representation of (13a) and (15), respectively, shows that the edge image $\Delta f$ is approximated by the divergence of a piecewise smooth vector field $v$ in terms of the $|\cdot|_{\Delta^{-2}}$-norm. Clearly, inserting $v = \nabla u$ into $\tfrac{1}{2} |\Delta^{-1} (\Delta f - \operatorname{div} v)|_2^2$ yields the data term $\tfrac{1}{2} |f - u|_2^2$ from (13a).

3.2 Variational Approach (13b)
The data term of problem (13b) decomposes according to the orthogonal decomposition (6a). By construction, the affine component $u_a$ of $u = u_a + u_o$ is not affected by the regularizer. Thus, $\hat u_a = f_a$, where $f_a$ can be computed in a preprocessing step. It remains to minimize
\[ \inf_{u_o, v} \Big\{ \tfrac{1}{2} \|\nabla f_o - v\|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) \Big\}, \quad \text{subject to} \quad v = \nabla u_o. \]
Due to Prop. 2 the linear constraint can be expressed as $\delta_{V_\nabla}(v)$. Reasoning similar to the previous section yields
\[ \sup_w \Big\{ \langle w, \nabla f_o - v \rangle_\Omega - \tfrac{1}{2} \|w\|_2^2 \Big\} + \sup_{q \in C_\alpha^2} \langle q, -v \rangle_\Omega + \delta_{V_\nabla}(v) = \sup_{w,\ q \in C_\alpha^2} \Big\{ \langle w + q, -v \rangle_\Omega + \delta_{V_\nabla}(v) + \langle w, \nabla f_o \rangle_\Omega - \tfrac{1}{2} \|w\|_2^2 \Big\}. \]
Interchanging inf and sup and taking $\inf_v$ (ignoring constants), we obtain
\[ \inf_{w,\ q \in C_\alpha^2} \tfrac{1}{2} \|\nabla f_o - w\|_2^2 \quad \text{subject to} \quad w + q \in V_{\nabla^\perp}. \tag{19} \]
The minimizer $\bar w$ is obviously an element of $V_\nabla$, which together with the constraints $q \in C_\alpha^2$, $w + q \in V_{\nabla^\perp}$ leads to the reformulation of (19)
\[ \inf_{q \in P_\nabla(C_\alpha^2)} \tfrac{1}{2} \|\nabla f_o - q\|_2^2. \tag{20} \]
Here $P_\nabla$ denotes the orthogonal projector onto the subspace $V_\nabla$. To compare this approach with (15), we rewrite (20) as
\[ \inf_{q \in P_\nabla(C_\alpha^2)} \tfrac{1}{2} \big\| \nabla (f_o - \Delta^{-1} \operatorname{div} q) \big\|_2^2 = \inf_{q \in P_\nabla(C_\alpha^2)} \tfrac{1}{2} \big\| \nabla \Delta^{-1} (\Delta f_o - \operatorname{div} q) \big\|_2^2, \tag{21} \]
where $\Delta^{-1}$ stands for the solution operator of problem (9). Approach (15), on the other hand, is given by
\[ \inf_{q \in C_\alpha^2} \big| f_a + (f_o - \operatorname{div} q) \big|_2^2. \tag{22} \]
Taking into account the representation of vector fields $q \in V_\nabla$ by a potential function $\phi_q$ in terms of $q = \nabla \phi_q$, viz. $\operatorname{div} q = \Delta \phi_q$ (Prop. 2), we see that (21) focuses on the decomposition of the edge set $\Delta f$, whereas (22) decomposes $f$ and does not discriminate between the two components $f_a$ and $f_o$. Comparing (21), on the other hand, with (18) indicates how regularization of the large-scale structural components of $f$ is accomplished by (21) in terms of the small-scale texture component $\phi_q$, by taking the gradient (after smoothing with $\Delta^{-1}$) and projecting onto a suitable set $P_\nabla(C_\alpha^2)$.
4 Optimization
In this section we specify algorithms for computing a global minimum of (13a) and (13b), respectively. We apply an alternating version of the split Bregman algorithm [7]. Note that the split Bregman algorithm coincides with the augmented Lagrangian method applied to the primal problem [14] and that its alternating version is just a Douglas-Rachford splitting for the dual problem [17]. The convergence properties of this technique are well known.

4.1 Algorithm Minimizing (13a)
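The split Bregman algorithm for (14) reads
\[ (u^{(k+1)}, v^{(k+1)}) = \arg\min_{u,v} \Big\{ \tfrac{1}{2} |f - u|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) + \tfrac{1}{2\tau} \|\nabla u - v\|_2^2 + \langle b^{(k)}, \nabla u - v \rangle_\Omega \Big\}, \]
\[ b^{(k+1)} = b^{(k)} + \tfrac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big). \]
Alternating the minimization with respect to $u$ and $v$ we obtain
\[ v^{(k+1)} = \arg\min_v \Big\{ \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) + \tfrac{1}{2\tau} \big\| \nabla u^{(k)} + \tau b^{(k)} - v \big\|_2^2 \Big\}, \]
\[ u^{(k+1)} = \arg\min_u \Big\{ \tfrac{1}{2} |f - u|_2^2 + \tfrac{1}{2\tau} \big\| \nabla u + \tau b^{(k)} - v^{(k+1)} \big\|_2^2 \Big\}. \]
Then $v^{(k+1)}$ follows as in the ROF approach by
\[ v^{(k+1)} = \nabla u^{(k)} + \tau b^{(k)} - P_{C_{\alpha\tau}^2} \big( \nabla u^{(k)} + \tau b^{(k)} \big), \]
and $u^{(k+1)}$ can be computed by setting the gradient to zero:
\[ u^{(k+1)} = \big( I - \tfrac{1}{\tau} \Delta \big)^{-1} \Big( f - \operatorname{div} \big( \tfrac{1}{\tau} v^{(k+1)} - b^{(k)} \big) \Big). \]

Algorithm.
Initialization: $b^{(0)} = 0$ and $u^{(0)} = f$.
For $k = 0, 1, \ldots$ iterate until a convergence criterion is reached:
  $w^{(k+1)} := \nabla u^{(k)} + \tau b^{(k)}$
  $v^{(k+1)} := w^{(k+1)} - P_{C_{\alpha\tau}^2}(w^{(k+1)})$
  $u^{(k+1)} := \big( I - \tfrac{1}{\tau} \Delta \big)^{-1} \big( f - \operatorname{div}( \tfrac{1}{\tau} v^{(k+1)} - b^{(k)} ) \big)$
  $b^{(k+1)} := b^{(k)} + \tfrac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big)$

4.2 Algorithm Minimizing (13b)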
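Based on the derivation in Section 3.2, we consider
\[ \inf_{u_o, v} \Big\{ \tfrac{1}{2} \|\nabla f_o - \nabla u_o\|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) \Big\}, \quad \text{subject to} \quad v = \nabla u_o, \]
and have to iterate
\[ (u^{(k+1)}, v^{(k+1)}) = \arg\min_{u,v} \Big\{ \tfrac{1}{2} \|\nabla f_o - \nabla u\|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) + \tfrac{1}{2\tau} \|\nabla u - v\|_2^2 + \langle b^{(k)}, \nabla u - v \rangle_\Omega \Big\}, \]
\[ b^{(k+1)} = b^{(k)} + \tfrac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big). \]
Alternating the first minimization process we obtain the following algorithm.

Algorithm.
Initialization: $b^{(0)} = 0$ and $u^{(0)} = f$.
For $k = 0, 1, \ldots$ iterate until a convergence criterion is reached:
  $w^{(k+1)} := \nabla u^{(k)} + \tau b^{(k)}$
  $v^{(k+1)} := w^{(k+1)} - P_{C_{\alpha\tau}^2}(w^{(k+1)})$
  $u^{(k+1)} := \tfrac{\tau}{1+\tau} \Delta^{-1} \operatorname{div} \big( \nabla f_o + ( \tfrac{1}{\tau} v^{(k+1)} - b^{(k)} ) \big)$
  $b^{(k+1)} := b^{(k)} + \tfrac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big)$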
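As a concrete illustration of the iteration in Section 4.1, the following is a minimal one-dimensional sketch, not the discretization used in the paper. It assumes forward differences with a matrix `D` playing the role of the gradient and `-D.T` the role of the divergence, and it approximates the step $w - P_C(w)$ by a fixed number of projected gradient steps on the dual variable; all function names and default parameters are hypothetical.

```python
import numpy as np

def diff_op(n):
    """Forward-difference matrix D of shape (n-1, n): (D u)[i] = u[i+1] - u[i]."""
    return np.diff(np.eye(n), axis=0)

def prox_tv_1d(w, lam, n_inner=200):
    """Approximate w - P_C(w), i.e. the prox of lam*TV, by projected gradient
    on the dual variable q with the box constraint |q[i]| <= lam."""
    E = diff_op(w.size)
    q = np.zeros(w.size - 1)
    for _ in range(n_inner):
        # Step 1/4 is safe because the largest eigenvalue of E E^T is below 4.
        q = np.clip(q + 0.25 * (E @ (w - E.T @ q)), -lam, lam)
    return w - E.T @ q

def affine_tv_denoise_1d(f, alpha, tau=1.0, n_outer=100):
    """1D sketch of the iteration for (13a); here grad = D and div = -D^T."""
    n = f.size
    D = diff_op(n)
    A = np.eye(n) + (D.T @ D) / tau  # discrete I - (1/tau) * Laplacian
    u, b = f.copy(), np.zeros(n - 1)
    for _ in range(n_outer):
        w = D @ u + tau * b
        v = prox_tv_1d(w, alpha * tau)
        u = np.linalg.solve(A, f + D.T @ (v / tau - b))  # f - div(v/tau - b)
        b = b + (D @ u - v) / tau
    return u

# Usage: an exactly affine signal is a fixed point, since its TV_a is zero.
x = np.linspace(0.0, 1.0, 50)
ramp = 2.0 * x + 1.0
ramp_out = affine_tv_denoise_1d(ramp, alpha=0.5, n_outer=5)

# A noisy piecewise affine signal with a kink at x = 0.5.
rng = np.random.default_rng(0)
noisy = np.abs(x - 0.5) + 0.05 * rng.standard_normal(x.size)
denoised = affine_tv_denoise_1d(noisy, alpha=0.05)
```

The inner loop is the simplest choice for the projection and could be replaced by any exact ROF solver. In one dimension the regularizer reduces to the sum of absolute second differences, so the sketch only mirrors the structure of the four update steps.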
5 Numerical Experiments
In this section we illustrate the properties of our approach with a few numerical experiments. The mimetic finite difference method [9, 10] is used for discretizing the relevant scalar and vector fields, and a detailed implementation of the nonlinear functionals is given in [21]. With this numerical scheme, the relevant boundary conditions are respected and turn out to be compatible with the corresponding integral identities.

Signals. Figure 2 shows that our approach (13b) effectively removes noise without the staircasing effect, in contrast to the ROF model. We also point out that boundaries are treated without introducing artifacts.
Fig. 2. Ground truth and noisy input data are shown in the first two graphs, respectively. Standard TV regularization (ROF model) leads to the well-known staircasing effect (see 3rd picture). Piecewise affine TV regularization effectively removes noise and recovers the piecewise affine signal structure (see 4th picture).
Variational approach (13a) versus (13b). Figure 3 compares the minimizers of the two variational approaches (13a) and (13b) for an arbitrary image section. The last two pictures of Figure 3 depict 3D plots of the minimizers subtracted from the original image section. The fifth plot, corresponding to approach (13b), clearly indicates an approximation "error" that is not noticeable in the fourth plot, corresponding to (13a). This result confirms the discussion above of the formal differences between (21) and (22): the $|\cdot|_{H^1}^2$-based data term is more sensitive to large noise levels because partial derivatives amplify noise.
Fig. 3. From left to right: original image section, minimizer of (13a), minimizer of (13b), and 3D plots of the minimizers subtracted from the original data, which illustrate a major difference between the variational approaches (13a) and (13b). While the fourth plot shows almost pure noise, the rightmost plot indicates an estimation error due to using the $|\cdot|_{H^1}^2$ data term, which is sensitive to large noise levels.
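The noise amplification by partial derivatives mentioned in this comparison can be quantified in a simple setting. The following sketch assumes independent Gaussian noise of standard deviation σ on a one-dimensional grid with spacing h and a forward-difference derivative; these choices and the variable names are illustrative assumptions. The differenced noise then has standard deviation √2 σ / h, which grows as the grid is refined.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, h, n = 0.1, 0.01, 100_000
noise = sigma * rng.standard_normal(n)
d_noise = np.diff(noise) / h          # forward difference of pure noise
ratio = d_noise.std() / noise.std()   # expected to be close to sqrt(2) / h
expected = np.sqrt(2.0) / h
```

With h = 0.01 the derivative of the noise is roughly 140 times larger than the noise itself, which gives a rough idea of why a data term involving gradients reacts more strongly to large noise levels.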
Denoising of vector fields. Figure 4 compares the standard TV regularization (ROF model) with piecewise affine TV regularization for the denoising of vector fields. The input data simulate estimates obtained for a moving camera in a scene with moving objects. This scenario is roughly represented by a piecewise planar layout of the scene. The numerical results confirm again that our approach returns useful estimates of both the denoised vector fields and their discontinuities, while the ROF-model only returns discontinuities but no useful vector field estimates.
Fig. 4. Top: color-coded motion field corresponding to a moving camera and static as well as moving objects represented by sections of planes; ground truth (1st fig.), input data (2nd fig.), the ROF-based result (3rd fig.) and the result based on the affine regularization (13a) (4th fig.). Last two rows: components of ∇u1 and ∇u2 for the ROF model (2nd row) and for piecewise affine regularization (3rd row). The result on the right illustrates that no staircasing effect occurs with piecewise affine regularization, thus enabling both discontinuity detection and motion estimation, while the latter is not feasible for such scenarios with the standard ROF-model.
6 Conclusion
We presented a novel convex variational approach to the denoising and the decomposition of signals, images and vector fields. Based on a suitable orthogonal decomposition of the underlying vector space, a TV measure comprising second-order derivatives was introduced that makes it possible to denoise noisy input data and to preserve piecewise affine signal structure using standard algorithms of convex programming. The latter are computationally simple due to a problem decomposition employing the augmented Lagrangian and primal and dual variables. By deriving dual variational formulations akin to the ROF model, differences between first- and second-order regularization and between two alternative data terms were worked out. Numerical experiments confirm these findings.
References
1. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
2. Chan, T., Esedoglu, S., Park, F.E.: Image decomposition combining staircase reduction and texture extraction. J. Visual Communication and Image Representation 18(6), 468–486 (2007)
3. Chan, T., Marquina, A., Mulet, P.: Higher-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
4. Danielsson, P.E., Lin, Q.: Efficient detection of second-degree variations in 2D and 3D images. J. Vis. Comm. Image Repr. 12, 255–305 (2001)
5. Didas, S., Setzer, S., Steidl, G.: Combined ℓ2 data and gradient fitting in conjunction with ℓ1 regularization. Advances in Computational Mathematics 30(1), 79–99 (2009)
6. Girault, V., Raviart, P.-A.: Finite Element Methods for Navier-Stokes Equations. Springer, Heidelberg (1986)
7. Goldstein, D., Osher, S.: The Split Bregman method for l1 regularized problems. UCLA CAM Report (2008)
8. Hintermüller, W., Kunisch, K.: Total bounded variation regularization as a bilaterally constraint optimization problem. SIAM J. Appl. Math. 64(4), 1311–1333 (2004)
9. Hyman, J.M., Shashkov, M.: Natural discretizations for the divergence, gradient, and curl on logically rectangular grids. Comput. Math. Appl. 33(4), 81–104 (1997)
10. Hyman, J.M., Shashkov, M.: Adjoint operators for the natural discretizations of the divergence, gradient and curl on logically rectangular grids. Appl. Numer. Math. 25(4), 413–442 (1997)
11. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Image Processing 12(12), 1579–1590 (2003)
12. Lysaker, M., Tai, X.C.: Iterative image restoration combining total variation minimization and a second-order functional. International Journal of Computer Vision 66(1), 5–18 (2006)
13. Rahman, T., Tai, X.C., Osher, S.J.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
14. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
15. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Scherzer, O.: Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing 60, 1–27 (1998)
17. Setzer, S.: Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In: Lie, K.A., Lysaker, M., Morken, K., Tai, X.C. (eds.) Scale Space and Variational Methods. LNCS. Springer, Heidelberg (2009)
18. Trobin, W., Pock, T., Cremers, D., Bischof, H.: An unbiased second-order prior for high-accuracy motion estimation. In: Rigoll, G. (ed.) DAGM 2008. LNCS, vol. 5096, pp. 296–405. Springer, Heidelberg (2008)
19. Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)
20. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Trans. Image Processing 9(10), 1723–1730 (2000)
21. Yuan, J., Schnörr, C., Steidl, G.: Simultaneous optical flow estimation and decomposition. SIAM J. Scientific Computing 29(6), 2283–2304 (2007)
22. Yuan, J., Schnörr, C., Memin, E.: Discrete orthogonal decomposition and variational fluid flow estimation. J. Math. Imaging and Vision 28(1), 67–80 (2007)
23. Yuan, J., Schnörr, C., Steidl, G.: Convex Hodge decomposition and regularization of image flows. J. Math. Imag. Vision 33(2), 169–177 (2009)
Image Denoising by Harmonic Mean Curvature Flow

Mourad Zéraï
Laboratory for Mathematical and Numerical Modeling in Engineering Science, National Engineering School at Tunis, ENIT-LAMSIN, B.P. 37, 1002 Tunis Belvédère, Tunisia
[email protected]

Abstract. We propose a noise-removal method for vector-valued images by considering the negative gradient flow (the biharmonic map heat flow) of the intrinsic bi-energy on a Riemannian manifold of non-positive curvature. This method represents a natural generalization of both harmonic maps and minimal immersions. It is derived by finding the critical points of the variational problem associated with the integral of the squared norm of the tension field (biharmonic map) or of the mean curvature vector field (biminimal immersion). In local coordinates, this method yields a fourth-order nonlinear system of PDEs that we solve numerically by an explicit finite difference method. Experiments on real color images endowed with the Helmholtz and Stiles metrics show that the proposed method is effective, accurate and highly robust.
1 Introduction
Let $(D, g)$ be a flat 2D image domain endowed with the metric $g$ and mapped into a coordinate manifold $(V, h)$, which can be, for instance, an RGB color space endowed with the color metric $h$. Consider the energy
\[ E_2(u) = \frac{1}{2} \int_D |\tau(u)|^2 \, d\mu_g, \tag{1} \]
where $\mu_g$ is the area measure on $D$ endowed with the metric $g$ and $\tau(u) = \mathrm{tr}_g \nabla du$ is the tension vector field, which vanishes at the critical points of the Dirichlet energy (i.e. harmonic maps),
\[ E(u) = \frac{1}{2} \int_D |du|^2 \, d\mu_g. \tag{2} \]
In local coordinates, the tension field takes the form
\[ \tau^\alpha(u) = g^{ij} \left( \frac{\partial^2 u^\alpha}{\partial x^i \partial x^j} - {}^{D}\Gamma^{k}_{ij} \frac{\partial u^\alpha}{\partial x^k} + {}^{V}\Gamma^{\alpha}_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} \right), \tag{3} \]
where ${}^{D}\Gamma^{k}_{ij}$ and ${}^{V}\Gamma^{\alpha}_{\beta\gamma}$ are the Christoffel symbols of the Levi-Civita connections on $(D, g)$ and $(V, h)$.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 565–575, 2009. © Springer-Verlag Berlin Heidelberg 2009
Critical points of $E_2(u)$ are called biharmonic maps. The Euler-Lagrange operator attached to the bienergy (1), called the bitension field and derived by Jiang [16], is
\[ \tau_2(u) = -\Delta_g \tau(u) - \mathrm{tr}\, R^V(du, \tau(u))\, du, \tag{4} \]
where $R^V$ is the curvature tensor of $V$. The corresponding gradient flow is given by the geometric evolution problem
\[ \partial_t u = -\tau_2(u). \tag{5} \]
Jiang also proved that, when the target manifold $V$ has non-positive curvature, every biharmonic map is harmonic; this is the case for almost all the coordinate manifolds (excluding the directional ones) that we are concerned with in image processing.

In the same way, if we denote by $\mathrm{Imm}(D, V)$ the space of Riemannian immersions in $(V, h)$, then a Riemannian immersion $u : (D, g) \to (V, h)$ is called minimal if it is a critical point of the volume functional
\[ \mathcal{V} : \mathrm{Imm}(D, V) \to \mathbb{R}, \qquad \mathcal{V}(u) = \frac{1}{2} \int_D d\mu_{u^*h}, \]
where $u^*h$ is the pull-back metric and $\mu_{u^*h}$ is the induced area measure on $D$. The corresponding Euler-Lagrange equation is $H = 0$, where $H$ is the mean curvature vector field. We recall a fact established by Eells and Sampson [7] that will be important in the sequel: if $u$ is an immersion, then its mean curvature is, up to a constant, the trace of the second fundamental form, $\mathrm{tr}_g \nabla du$.

As suggested by Eells and Sampson in their seminal paper [7], a natural generalization of harmonic maps is obtained by considering the critical points of the functional given by the integral of the squared norm of the tension field, i.e.
\[ E_2(u) = \frac{1}{2} \int_D |\tau(u)|^2 \, d\mu_g. \]
Critical points of the functional given by the integral of the squared norm of the mean curvature vector field, known in the literature as the (generalized) Willmore functional, provide a possible generalization of minimal immersions. More precisely, biminimal immersions (or Willmore immersions) are critical points of the Willmore functional (see [21] for a short survey on these topics):
\[ W(u) = \int_D \left( |H|^2 + K \right) d\mu_{u^*h}, \tag{6} \]
where $K$ is the sectional curvature of $(V, h)$ restricted to the image of $D$. Historically, this functional appeared in the context of an embedding of a surface $\Sigma$ in the three-dimensional Euclidean space $\mathbb{R}^3$, and consequently with a vanishing sectional curvature $K$. As noted by Willmore in [34], it was Weiner who, in 1978, added the curvature term $K$ to the integrand when he considered immersions of an orientable surface into a Riemannian manifold of constant sectional curvature [32].
The Willmore functional already appears in earlier work of Germain [11]. It was then considered in the early twentieth century in various works by Thomsen [27], and subsequently by Blaschke [2]. It was reintroduced, and more systematically studied within the framework of the conformal geometry of surfaces, by Willmore in 1965 [33]. The Willmore functional also plays an important role in various areas of science. In molecular biology, it is known as the Helfrich model [13], where it appears as a surface energy for lipid bilayers. In solid mechanics, the Willmore functional arises as the limit energy for thin-plate theory (see [9]). In general relativity, this functional appears as the main term in the expression of the Hawking quasilocal mass (see [12] and [15]). In image processing, the Willmore functional was used in 2004 by Droske and Rumpf in a level-set formalism (see [6]) for the restoration of damaged regions of a surface. Clarenz et al. [5] cover a hole and reconstruct a surface by minimizing a Willmore energy functional with a finite element implementation, leading to smooth surface patches with guaranteed continuity properties.

In this paper, we use the generalized Willmore functional (6) in the context of manifold-valued image processing. We are interested in a noise-removal method for multi-channel images, taking color images as a typical representative of this class. More precisely, we use flat color metrics, i.e., metrics with vanishing curvature tensor, namely the Helmholtz and Stiles metrics. Consequently, the curvature term disappears from (6) and the associated Euler-Lagrange equation reduces to $\Delta_g H = 0$. This simplifies the expression of the derived flow, which is the nonlinear parabolic PDE system
\[ \partial_t u = \Delta_g H, \tag{7} \]
where $g = u^*h$ and $\Delta_g$ stands for the rough Laplacian, i.e.
\[ \Delta_g H^\alpha = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^i} \left( \sqrt{\det g}\, g^{ij} \frac{\partial H^\alpha}{\partial x^j} \right) + {}^{V}\Gamma^{\alpha}_{\beta\gamma} \frac{\partial H^\beta}{\partial x^i} \frac{\partial H^\gamma}{\partial x^j} g^{ij}. \tag{8} \]
Following the denomination used by Chen in [4], we call the flow (7) the harmonic mean curvature flow (HMCF); it extends the notion of harmonic mean curvature from the Euclidean setting to the Riemannian one. We refer to [1] for more details about this topic in the Euclidean setting. We will tackle the problem with non-flat color metrics, such as the Schrödinger or Vos-Walraven metrics, in future work. We note that Sochen and Zeevi [25] have already used the Vos-Walraven line element in a Riemannian setting for processing color images with the Beltrami flow, which yields a second-order nonlinear parabolic PDE system. Finally, one can remark, formally, the apparent similarity between the HMCF in the Euclidean and (flat) Riemannian settings, since it is well known that every flat $n$-dimensional Riemannian manifold is locally isometric to $\mathbb{R}^n$ (see [10], p. 109, for instance).
1.1 Related Works
In local coordinates, the gradient descent of the Euler-Lagrange equation of the minimization problem related to (6) yields a fourth-order nonlinear system of PDEs, and thus our method can be classified in the family of fourth-order parabolic equations for image denoising. This family of methods has gained importance in the last few years: many nonlinear PDEs have been proposed to deal with the tradeoff between noise removal and edge preservation, and among them the fourth-order parabolic PDEs have drawn great interest. In general, the forms of fourth-order PDEs are analogous to the second-order ones. For example, the fourth-order equation proposed by You and Kaveh [35],
\[ u_t = -\Delta\left( g(\Delta u)\, \Delta u \right), \tag{9} \]
or the equation proposed by Wei [30],
\[ u_t = -\nabla \cdot \left( g(|\nabla u|)\, \nabla \Delta u \right), \tag{10} \]
are analogues of the Perona-Malik equation; the equation proposed by Tumblin and Turk in [28],
\[ u_t = \nabla \cdot \left( g(D_{ij} u)\, \nabla \Delta u \right), \tag{11} \]
where $g$ is a function of the second derivatives of the image intensity function $u$, is a possible fourth-order analogue of the anisotropic diffusion equation of Weickert [31]; and finally the equation proposed by Lysaker et al. [19],
\[ u_t = -\Delta\left( \frac{\Delta u}{|\Delta u|} \right), \tag{12} \]
is similar to the Total Variation model [23].
2 Color-Image as Typical Example of Vector-Valued Image Processing
Since the beginning of quantized color vision theories in the 19th century, two approaches have appeared. On one side is the Young-Helmholtz trichromatic approach, which is physically oriented and compatible with the science of colorimetry. On the other side is the opponent approach, which is mainly based on color sensation. In this paper, we are mainly interested in the geometrical trichromatic approach of Young-Helmholtz. This approach has many good computational characteristics but also many physiological limitations (for a nice discussion of this topic see [3], for instance). In this vein, many approaches were proposed by different line-element theories. The notion of line element is nothing but the metric associated with the color manifold. Among these line-element theories we mention Helmholtz [14], Schrödinger [24], Stiles [26] and Vos-Walraven [29].

In the geometrical trichromatic approach, a color image is considered as three images, Red, Green, and Blue (or one of their many transformations), that are composed into one. These three channels represent a limited domain in the three-dimensional Euclidean space $\mathbb{R}^3$, which we endow with a metric derived from the expression of the considered line element. With this construction we can consider the color space as a Riemannian (sub-)manifold. The different line elements (or metrics) proposed in the literature are derived from two main considerations:
– The first consideration is what Ron Kimmel has called inductive [18]. In this case, the line elements are established by simple assumptions on the visual response mechanisms. All models of this category assume that the color space can be simplified and represented as a Riemannian space of non-positive curvature. Some of the proposed metrics, like those of Helmholtz and Stiles, flatten the color space. Others, like the Schrödinger or Vos-Walraven metrics, warp the color space (negatively). Roughly speaking, we can see a manifold of negative curvature as a generalization of Euclidean space (which is flat), in the sense that if two geodesic lines start from the same point in different directions, they will never cross again (which is not true in a manifold with positive curvature, like the sphere). This characteristic ensures some nice properties, such as uniqueness results in minimization problems [17].
– The second class of line elements is derived from empirical considerations. In this category, the metric coefficients are determined to fit empirical data. Among them, some describe a Euclidean space, like CIELAB (CIE 1976 (L*a*b*)) [18]; others, like MacAdam [20], are based on an effective arclength.

2.1 Vector-Valued Image as Isometric Immersions
Let $(D, g)$ be a flat 2D image domain embedded in a coordinate manifold $(V, h)$, which can be, for instance, an RGB color space endowed with the color metric $h$, and let $(\tilde{V}, \tilde{h}) = (\mathbb{R}^2 \oplus V, \mathrm{can} \oplus h)$, where
– $\mathrm{can}$ is the canonical metric of $\mathbb{R}^2$,
– $V$ is an $n$-dimensional manifold equipped with the metric $h$, modeling the coordinate space formed by the (vector) channels of an image $u$ (the three-dimensional RGB space for a color image, for instance),
– $\tilde{V}$ is the direct sum of $\mathbb{R}^2$ and $V$,
– and $\tilde{h} = \mathrm{can} \oplus h$.
A vector-valued image can be described mathematically as an isometric immersion
\[ (x^1, x^2) \mapsto u = \left( x^1, x^2;\, v^1(x^1, x^2), \ldots, v^n(x^1, x^2) \right) \]
of a two-dimensional domain $D$ in the trivial fiber bundle $\mathbb{R}^2 \times (V, h)$, which is a $(2 + n)$-dimensional manifold. The image manifold and its metric $(D, g)$ are called the space of parameters in the dynamical system community; the target manifold and its metric $(\tilde{V}, \tilde{h})$ are called the space of coordinates. The metric $\tilde{h}$ of $\tilde{V}$ is then given by
\[ d\tilde{s}^2 = ds^2_{\mathrm{spatial}} + \beta^2\, ds^2_{\mathrm{vector}}, \tag{13} \]
where $\beta$ is the relative scale between the spatial coordinates and the intensity components, which we set equal to one for the sake of simplicity. We can rewrite the metric defined by (13) as the quadratic form
\[ d\tilde{s}^2 = (dx^1)^2 + (dx^2)^2 + (dv)^T h\, (dv), \]
where $v = (v^i)$. The corresponding metric tensor is
\[ \tilde{h} = \begin{pmatrix} I_2 & 0_{2,n} \\ 0_{n,2} & h \end{pmatrix}, \]
where $h$ is the metric tensor of $V$. Since the image is an isometric immersion, we have $g = u^*\tilde{h}$. Therefore
\[ g_{\alpha\beta} = \delta_{\alpha\beta} + h_{ij}\, \partial_\alpha v^i\, \partial_\beta v^j, \qquad \alpha, \beta = 1, 2, \quad i, j = 1, \ldots, n, \tag{14} \]
where $n = 3$ if we deal with color images.
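As an illustration of the induced metric (14) at a single pixel, the following is a minimal sketch rather than the author's implementation. It assumes a diagonal color metric passed as a list `h_diag` and channel derivatives that have already been estimated (for example by finite differences); the function names are hypothetical.

```python
def induced_metric(dv_dx1, dv_dx2, h_diag):
    """Return the 2x2 induced metric g_ab = delta_ab + sum_i h_ii d_a v^i d_b v^i.

    dv_dx1, dv_dx2: partial derivatives of the n channels along x^1 and x^2.
    h_diag: diagonal entries of the color metric h at this pixel (assumed diagonal).
    """
    d = (dv_dx1, dv_dx2)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(2):
        for b in range(2):
            delta = 1.0 if a == b else 0.0
            g[a][b] = delta + sum(h * d[a][i] * d[b][i] for i, h in enumerate(h_diag))
    return g


def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving the contravariant components g^{ab}."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


# Homogeneous region: all derivatives vanish, so g is the identity.
g_flat = induced_metric([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

# Edge across x^1 with a Euclidean color metric (h = identity, an assumption).
g_edge = induced_metric([0.5, 0.5, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
g_edge_inv = inverse_2x2(g_edge)
```

In this toy case `g_edge[0][0]` equals 1.75 while `g_edge[1][1]` stays at 1, so the inverse entry across the edge drops below 1 and the entry along the edge does not.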
3 Main Examples of Flat Color-Metric

3.1 Helmholtz's Color Metric
Hermann von Helmholtz (1821-1894) was the first to attempt a mathematical formulation of the distance between colors through the concept of line element. He defined the following line element:
\[ ds^2 = \left( \frac{dR}{R} \right)^2 + \left( \frac{dG}{G} \right)^2 + \left( \frac{dB}{B} \right)^2, \tag{15} \]
where $R$, $G$ and $B$ are the three color channels: Red, Green and Blue. In local coordinates, this can be expressed as a positive definite symmetric matrix:
\[ (h_{ij})_{i,j=1,2,3} = \begin{pmatrix} \frac{1}{x_1^2} & 0 & 0 \\ 0 & \frac{1}{x_2^2} & 0 \\ 0 & 0 & \frac{1}{x_3^2} \end{pmatrix}, \tag{16} \]
where we use the coordinate notation $x_1 = R$, $x_2 = G$ and $x_3 = B$. The color space is defined as a domain $D$ in the positive orthant $\mathbb{R}^3_+$ defined by
\[ \mathbb{R}^3_+ = \left\{ x \in \mathbb{R}^3 \mid x_i > 0,\ i = 1, 2, 3 \right\}. \tag{17} \]
Having the expression of the metric, we can now give the Christoffel symbols using the formula
\[ \Gamma^i_{jk} = \frac{1}{2} h^{il} \left( \frac{\partial h_{lj}}{\partial x^k} + \frac{\partial h_{lk}}{\partial x^j} - \frac{\partial h_{jk}}{\partial x^l} \right), \tag{18} \]
and hence the non-vanishing Christoffel symbols are
\[ \Gamma^1_{11} = -\frac{1}{R}, \qquad \Gamma^2_{22} = -\frac{1}{G}, \qquad \Gamma^3_{33} = -\frac{1}{B}. \]
A simple computation shows that the color manifold endowed with the Helmholtz metric is flat.
3.2 Stiles' Color Metric
Walter W. Stiles modified Helmholtz's proposal in order to better account for observations of threshold values (see [26], p. 660). He thus proposed the following form of color metric:
\[ (ds)^2 = \left( \frac{\zeta(R)}{\rho}\, dR \right)^2 + \left( \frac{\zeta(G)}{\gamma}\, dG \right)^2 + \left( \frac{\zeta(B)}{\beta}\, dB \right)^2, \]
where
\[ \zeta(R) = \frac{9}{1 + 9R}, \qquad \zeta(G) = \frac{9}{1 + 9G}, \qquad \zeta(B) = \frac{9}{1 + 9B}. \]
The functions $\zeta(R)$, $\zeta(G)$ and $\zeta(B)$ are determined experimentally. The constants $\rho$, $\gamma$ and $\beta$ are proportional to the limiting Weber fractions of the three cone responses at high luminances, and Stiles obtained the following values:
\[ \rho = 1.28, \qquad \gamma = 1.65, \qquad \beta = 7.25. \]
At high luminances, Stiles' metric reduces to
\[ (ds)^2 = \left( \frac{dR}{\rho R} \right)^2 + \left( \frac{dG}{\gamma G} \right)^2 + \left( \frac{dB}{\beta B} \right)^2, \]
and in this form its relationship with Helmholtz's metric is obvious. With the same notations as in the previous section, and using equation (18), we have
\[ \Gamma^1_{11} = -\frac{9}{1 + 9R}, \qquad \Gamma^2_{22} = -\frac{9}{1 + 9G}, \qquad \Gamma^3_{33} = -\frac{9}{1 + 9B}. \]
Another simple computation shows that the color manifold endowed with Stiles' metric is flat.
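The Christoffel symbols quoted for both metrics can be checked numerically. The sketch below is an illustration, not part of the original text. It relies on the fact that for a diagonal metric whose $i$-th entry depends only on $x^i$, formula (18) leaves only $\Gamma^i_{ii} = \frac{1}{2} h^{ii}\, \partial_i h_{ii}$. The function names and the finite-difference step are assumptions.

```python
def christoffel_diag(h_entry, x, eps=1e-6):
    """Approximate Gamma^i_ii = 0.5 * h'(x) / h(x) for one diagonal metric entry.

    h_entry: function giving h_ii as a function of its own coordinate x^i.
    The derivative is taken by a central finite difference with step eps.
    """
    dh = (h_entry(x + eps) - h_entry(x - eps)) / (2.0 * eps)
    return 0.5 * dh / h_entry(x)


def helmholtz_entry(x):
    """Helmholtz metric entry h_ii = 1 / x^2, as in (16)."""
    return 1.0 / (x * x)


def stiles_entry(x, weber=1.28):
    """Stiles metric entry h_ii = (zeta(x) / weber)^2 with zeta(x) = 9 / (1 + 9x)."""
    zeta = 9.0 / (1.0 + 9.0 * x)
    return (zeta / weber) ** 2


channel_value = 0.5
gamma_helmholtz = christoffel_diag(helmholtz_entry, channel_value)  # expected -1/x
gamma_stiles = christoffel_diag(stiles_entry, channel_value)        # expected -9/(1+9x)
```

The Weber constant cancels in the ratio $h'/h$, which is why it does not appear in the Stiles symbols. Flatness can be seen in the same spirit: each coordinate can be reparametrized on its own (for Helmholtz, $y = \ln x$) so that the metric becomes Euclidean.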
4 Harmonic Mean Curvature Flow
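We consider the flow
\[ \partial_t u = \Delta_g H, \tag{19} \]
where $H$, in local coordinates, takes the form (up to a multiplicative constant)
\[ H^\alpha(u) = \Delta_g u^\alpha + {}^{V}\Gamma^{\alpha}_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} g^{ij}, \tag{20} \]
$g = u^*h$ is the pull-back (induced) metric and $\Delta_g$ is the Laplace-Beltrami operator. Suppose that the color space is Euclidean; then all the Christoffel symbols ${}^{V}\Gamma^{\alpha}_{\beta\gamma}$ vanish and (19) becomes
\[ \partial_t u = \Delta_g (\Delta_g u), \tag{21} \]
where
\[ \Delta_g = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^\alpha} \left( \sqrt{\det g}\, g^{\alpha\beta} \frac{\partial}{\partial x^\beta} \right). \]
We thus recover a You-Kaveh type equation (9) when we consider the intrinsic formulation (21), and a Wei type equation (10) if we consider (21) in local coordinates. The effect of the term ${}^{V}\Gamma^{\alpha}_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} g^{ij}$ is to constrain the flow to live on the color manifold, and thus to take into account the physiological aspects of the different luminances.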
5 Numerical Issues
The gradient descent associated with the minimization of the functional (6) leads to a system of fourth-order partial differential equations. We have used an explicit finite difference discretization for this PDE system, which requires the evaluation of higher-order derivatives and comes with strong restrictions on the time step. This is not the best method to deal with this problem. To overcome these difficulties, a better strategy is, for instance, a discretization by the finite element method, as used by Clarenz et al. [5]. Nevertheless, the explicit finite difference scheme we used seems to be very robust and effective in numerical experiments.
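To make the time step restriction concrete, here is a minimal sketch of one explicit Euler step for a much simpler model than the one used in the paper. It assumes a scalar image, a flat Euclidean metric, unit grid spacing, periodic boundaries and the sign convention $u_t = -\Delta(\Delta u)$. None of these simplifications come from the paper, and the function names are hypothetical.

```python
def laplacian(u):
    """Five-point Laplacian on a 2D grid with periodic boundaries and unit spacing."""
    n, m = len(u), len(u[0])
    return [[u[(i + 1) % n][j] + u[(i - 1) % n][j]
             + u[i][(j + 1) % m] + u[i][(j - 1) % m] - 4.0 * u[i][j]
             for j in range(m)] for i in range(n)]


def explicit_step(u, dt):
    """One explicit Euler step of the flat model flow u_t = -Laplacian(Laplacian(u))."""
    bilaplacian = laplacian(laplacian(u))
    return [[u_ij - dt * b_ij for u_ij, b_ij in zip(row_u, row_b)]
            for row_u, row_b in zip(u, bilaplacian)]


# The checkerboard is the highest-frequency mode of the grid: the Laplacian maps
# it to -8 times itself, so one step multiplies it by (1 - 64 * dt).
checker = [[(-1.0) ** (i + j) for j in range(4)] for i in range(4)]
damped = explicit_step(checker, 0.02)     # factor 1 - 1.28 = -0.28, amplitude shrinks
amplified = explicit_step(checker, 0.05)  # factor 1 - 3.20 = -2.20, amplitude grows
```

For this simplified model the scheme is stable only when $|1 - 64\,dt| \le 1$, that is $dt \le 1/32$. The bound is specific to the flat scalar case with unit spacing; the metric-weighted operator of the paper scales the derivatives by $g^{ij}$ and may have a different limit.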
6 Experiments
To check that our model is effective, we ran tests on different color images. Figure 1 presents the effects of HMCF, with the Helmholtz and Stiles metrics, on a detail of the peppers color image. To assess the efficiency of our method, we must compare it with other fourth-order (and even second-order) methods; we will carry this out in future work. It is interesting to analyse Figure 2, which presents, as grey-level images, the intensities of the four entries of the inverse image metric tensor, namely $(g^{ij})$ with the above notations. It clearly shows that $(g^{ij})$ captures the morphological structure of the image and acts like an anisotropic edge-stopping function. This indicates, empirically, that our method preserves the contours of an image while smoothing homogeneous regions. Moreover, the fact that the edge-stopping function is matrix-valued, and hence anisotropic, motivates a comparison with the fourth-order equation (11) proposed by Tumblin and Turk [28].

Fig. 1. From left to right: (1) original image, a small detail from peppers; (2) highly degraded image; (3) HMCF with the Helmholtz metric, 100 iterations at dt = 0.05; (4) HMCF with the Stiles metric, 100 iterations at dt = 0.05

Fig. 2. Grey-level image representation of the four entries of the (symmetric) tensor $g^{ij}$, which is the inverse of the induced metric $g_{ij}$ and acts like an anisotropic edge-stopping function
Acknowledgment. I am indebted to Professor Maher Moakher for encouragement, insightful comments and assistance throughout my work.
References
1. Barros, M., Garay, O.J.: On Submanifolds with Harmonic Mean Curvature. Proceedings of the American Mathematical Society 123(8) (August 1995)
2. Blaschke, W.: Vorlesungen über Differentialgeometrie III. Springer, Berlin (1929)
3. Buchsbaum, G., Gottschalk, A.: Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proc. R. Soc. Lond. B (220), 89–113 (1983)
4. Chen, B.-Y.: Some open problems and conjectures on submanifolds of finite type. Soochow J. Math. 17, 169–188 (1991)
5. Clarenz, U., Diewald, U., Dziuk, G., Rumpf, M., Rusu, R.: A finite element method for surface restoration with smooth boundary conditions. Computer Aided Geometric Design 21(5), 427–445 (2004)
6. Droske, M., Rumpf, M.: A level set formulation for Willmore flow. Interfaces and Free Boundaries 6(3), 361–378 (2004)
7. Eells, J., Sampson, J.H.: Harmonic mappings of Riemannian manifolds. Amer. J. Math. 86, 109–160 (1964)
8. Eliasson, H.I.: Introduction to global calculus of variations. In: Global analysis and its applications, IAEA, Vienna, vol. II, pp. 113–135 (1974)
9. Friesecke, G., James, R.D., Müller, S.: A theorem on geometric rigidity and the derivation of nonlinear plate theory from three-dimensional elasticity. Commun. Pure Appl. Math. 55(11), 1461–1506 (2002)
10. Gallot, S., Hulin, D., Lafontaine, J.: Riemannian Geometry, 2nd edn. Springer, Heidelberg (1990)
11. Germain, S.: Recherches sur la théorie des surfaces élastiques. Courcier, Paris (1821)
12. Hawking, S.W.: Gravitational radiation in an expanding universe. J. Math. Phys. 9, 598–604 (1968)
13. Helfrich, W.: Elastic properties of lipid bilayers: theory and possible experiments. Z. Naturforsch. C 28, 693–703 (1973)
14. von Helmholtz, H.: Handbuch der Physiologischen Optik. Voss, Hamburg (1896)
15. Huisken, G., Ilmanen, T.: The Riemannian Penrose inequality. Int. Math. Res. Not. 1997(20), 1045–1058 (1997)
16. Jiang, G.Y.: 2-harmonic maps and their first and second variational formulas. Chin. Annals Math. A 7, 389–402 (1986)
17. Jost, J.: Riemannian Geometry and Geometric Analysis, 2nd edn. Springer, Heidelberg (1998)
18. Kimmel, R.: A natural norm for color processing. In: Chin, R., Pong, T.-C. (eds.) ACCV 1998. LNCS, vol. 1351, pp. 88–95. Springer, Heidelberg (1997)
19. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on Image Processing 12, 1579–1590 (2003)
20. MacAdam, D.L.: Visual sensitivity to color differences in daylight. J. Opt. Soc. Am. 32, 247 (1942)
21. Montaldo, S., Oniciuc, C.: A short survey on biharmonic maps between Riemannian manifolds. Revista de la Unión Matemática Argentina 47(2) (2006)
22. Olischläger, N., Rumpf, M.: A two step time discretization of Willmore flow. In: 21st Chemnitz FEM Symposium (2008)
23. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
24. Schrödinger, E.: Grundlinien einer Theorie der Farbenmetrik im Tagessehen. Ann. Physik 63, 481 (1920)
25. Sochen, N., Zeevi, Y.: Using Vos-Walraven line element for Beltrami flow in color images. EE-Technion and TAU HEP report, Technion and Tel-Aviv University (1992)
26. Stiles, W.S., Wyszecki, G.: Color Science, Concepts and Methods, Quantitative Data and Formulae. John Wiley & Sons, Inc., Chichester (2000)
27. Thomsen, G.: Über konforme Geometrie, I. Grundlagen der konformen Flächentheorie. Abh. Math. Semin. Univ. Hamburg, 31–56 (1923)
28. Tumblin, J., Turk, G.: LCIS: A boundary hierarchy for detail-preserving contrast reduction. In: Proceedings of the SIGGRAPH annual conference on Computer Graphics, Los Angeles, CA, USA, August 1999, pp. 83–90 (1999)
29. Vos, J.J., Walraven, P.L.: An analytical description of the line element in the zone-fluctuation model of colour vision II. The derivative of the line element. Vision Research (12), 1345–1365 (1972)
30. Wei, G.: Generalized Perona-Malik equation for image processing. IEEE Signal Processing Letters 6(7), 165–167 (1999)
31. Weickert, J.: Anisotropic diffusion in image processing. Teubner (1998)
32. Weiner, J.L.: On a problem of Chen, Willmore et alia. Indiana University Math. J. (27), 19–35 (1978)
33. Willmore, T.J.: Note on embedded surfaces. An. Stiint. Univ. Al. I. Cuza Iasi., Ser. Noua, Mat. 11B, 493–496 (1965)
34. Willmore, T.J.: Riemannian Geometry. Oxford Science Publications (1993)
35. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing 9(10), 1723–1730 (2000)
Tracking Closed Curves with Non-linear Stochastic Filters

Christophe Avenel¹, Etienne Mémin², and Patrick Pérez²
¹ ENS Cachan / IRISA
² INRIA, Vista Project, Center of Rennes
{Christophe.Avenel,Etienne.Memin,Patrick.Perez}@irisa.fr

Abstract. The joint analysis of motions and deformations is crucial in a number of computer vision applications. In this paper, we introduce a non-linear stochastic filtering technique to track the state of a free curve. The approach we propose is implemented through a particle filter which includes color measurements characterizing the target and the background respectively. We design a continuous-time dynamics that allows us to infer inter-frame deformations. The curve is defined by an implicit level-set representation and the stochastic dynamics is expressed on the level-set function. It takes the form of a stochastic differential equation with Brownian motion of low dimension. Specific noise models lead to traditional evolution laws based on mean curvature motions, while other forms lead to new evolution laws with different smoothing behaviors. In these evolution models, we propose to combine local motion information extracted from the images and an uncertainty modeling of the dynamics. The associated filter we propose for curve tracking thus belongs to the family of conditional particle filters. Its capabilities are demonstrated on various sequences with highly deformable objects.
1 Introduction
Tracking deformable structures delineated by free curves, with no prior on their possible shapes, is a very challenging problem. As a matter of fact, the shape of a deformable object, or even of a rigid body, may change drastically when visualized in an image sequence. These deformations are due to the apparent motion of the object, to perspective effects and to 3D shape evolution. This difficulty is amplified when the object becomes partially or totally occluded, even for a very short time period. The presence of cluttered background and ambiguities constitutes a further difficulty for tracking. For curve tracking, numerous approaches based on the level set representation have been proposed [1, 2, 3, 4, 5, 6, 7]. These techniques mainly addressed the problem as a succession of instantaneous detection or segmentation problems. At best, only discrete snapshots of the location of the object of interest are provided, and no dynamical or morphological consistency can really be enforced. Implausible growing/decreasing or merging/splitting cannot be avoided without resorting to shape priors [8,9,10]. This considerably reduces the generality of the tracker and restricts its use to very specific applications [8,10].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 576–587, 2009. © Springer-Verlag Berlin Heidelberg 2009

Such deterministic approaches also have great difficulty coping with ambiguities and noise. The explicit introduction of a dynamics in the curve evolution law has been considered in [4]. However, the proposed technique, although much more satisfying from the point of view of forecasting the curves, is not embedded into a tracking framework. In [11], an approach based on a group action mean shape and a moving average has been proposed. This tracking is restricted to simple motions. Recently, an optimal control strategy has been defined for curve tracking [12]. This technique makes it possible to cope with nonlinear differential evolution laws. It is nevertheless a deterministic technique that only involves Gaussian uncertainty on the dynamical system. It is also a batch technique which relies on the entire image sequence, and it can hardly be used for on-line tracking.

The extraction of state trajectories relying on past measurements and on a dynamical model, as done with stochastic filtering, naturally handles partial occlusions, cluttered noise and ambiguities. It also makes it possible to rely on an approximate knowledge of the underlying dynamics. However, the state dimension constitutes the Achilles' heel of recursive Bayesian filters such as the particle filter. Due to this so-called curse of dimensionality, only a few works have attempted to mix stochastic filtering and level set representation for curve tracking [13, 14]. These works have to face a high-dimensional sampling problem and, as a consequence, rely on a crude discretization of the nonlinear curve dynamics, which may be problematic in some situations.

The approach we propose for curve tracking is also implemented through a particle filter and a level set representation. This approach includes color measurements characterizing the target and the background respectively [15]. The dynamics involved is formulated as a stochastic differential equation. This allows us to get a continuous-time representation of the curve trajectory and, thus, to infer inter-frame deformations. This gives access to richer dynamics on curves. It would also permit the use of continuous-time physical evolution laws in specific contexts. The stochastic dynamics is expressed on the level-set function and takes the form of a stochastic differential equation with Brownian motion of low dimension. Although such an attempt to build stochastic dynamics for image segmentation has been made in [16], our approach is different, as it naturally integrates the contribution of noise in the derivation of the dynamics. It also allows interpreting additional smoothing terms on the curve as a consequence of the uncertainty we have on the curve dynamics. Conceptually, this yields a rigorous derivation of the curve dynamics, making it possible to handle topological changes occurring between two frame instants, and also to cope with the propagation of possibly irregular curves driven by noisy motion fields. No ad hoc additional filters are needed here to propagate the curve: such a smoothing is explicitly handled within the stochastic expression of the level set dynamics. The evolution models we propose combine local motion information extracted from the image and the modeling of the dynamics uncertainty. The associated filter thus belongs to the family of conditional particle filters [17].
2 Stochastic Filtering and Particle Filter
Before introducing in detail the stochastic evolution laws on which we rely in this work, we present in this section the generic problem of continuous-time stochastic filtering in the presence of discrete-time measurements. Stochastic filters are well-known procedures to estimate the posterior pdf $p(x_k \mid z_{1:k})$ (called the filtering distribution) of a state variable of interest at any measurement instant $k$, given the discrete measurement series $z_{1:k} = (z_1, \ldots, z_k)$ up to instant $k$ and an initial distribution $p(x_0)$. In the following, we consider a continuous-time state $x_t$. We denote by $x_{t=k}$ or $x_k$ its value at the measurement instant $k$. At each time instant $k$, the measurement equation relates the observation $z_k$ to the state $x_k$. In this work the general system we are dealing with is described by:
\[ \begin{cases} dx_t = f(x_t)\, dt + \sigma(t)\, dB_t, \\ z_k = g(x_k) + v_k, \end{cases} \tag{1} \]
where $B_t$ is a Brownian motion and $v_k$ is a noise variable. Functions $f$ and $g$ are nonlinear in the general case. Assuming there exists a transition distribution p(xt |xr =
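\[ \cdots = \begin{cases} \dfrac{\sigma_1^2 \varphi_x^2 + \sigma_2^2 \varphi_y^2}{\|\nabla\varphi\|^2}\, dt, \\[2ex] \dfrac{\sigma_1^2 \varphi_y^2 + \sigma_2^2 \varphi_x^2}{\|\nabla\varphi\|^2}\, dt, \\[2ex] \dfrac{(\sigma_1^2 - \sigma_2^2)\, \varphi_x \varphi_y}{\|\nabla\varphi\|^2}\, dt. \end{cases} \tag{15} \]
Introducing the surface normal expression, the Itô diffusion [21] driving the implicit surface evolution finally reads:
\[ d\varphi_t = \left( w_n^* \|\nabla\varphi\| + \frac{1}{2\|\nabla\varphi\|^2} \Big( \varphi_{xx} (\sigma_1^2 \varphi_x^2 + \sigma_2^2 \varphi_y^2) + \varphi_{yy} (\sigma_1^2 \varphi_y^2 + \sigma_2^2 \varphi_x^2) + 2 (\sigma_1^2 - \sigma_2^2)\, \varphi_x \varphi_y \varphi_{xy} \Big) \right) dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)}. \tag{16} \]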
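Recalling that the mean curvature can be expressed as
\[ \kappa = \mathrm{curv}(\varphi) = \frac{1}{\|\nabla\varphi\|} \left( \Delta\varphi - \frac{\nabla\varphi^T\, \nabla^2\varphi\, \nabla\varphi}{\|\nabla\varphi\|^2} \right), \tag{17} \]
where $\nabla^2\varphi$ denotes the Hessian matrix and $\Delta\varphi$ the Laplacian, the surface evolution law may be written in a more compact form as
\[ d\varphi_t = \left( w_n^* \|\nabla\varphi\| + \frac{\sigma_2^2}{2}\, \kappa \|\nabla\varphi\| + \frac{\sigma_1^2}{2\|\nabla\varphi\|^2}\, \nabla\varphi^T\, \nabla^2\varphi\, \nabla\varphi \right) dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)}. \tag{18} \]
It can be observed from (18) that if both uncertainties have the same strength (i.e. $\sigma_1 = \sigma_2$), this model takes a particularly simple form:
\[ d\varphi_t = \left( w_n^* \|\nabla\varphi\| + \frac{1}{2} \sigma_1^2\, \Delta\varphi \right) dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)}. \tag{19} \]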
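The algebra linking the expanded drift of (16), its compact form and the equal-variance case (19) can be checked at a single point. The sketch below is illustrative only. It assumes the curvature is $\kappa = \mathrm{div}(\nabla\varphi / \|\nabla\varphi\|)$, leaves out the transport term $w_n^* \|\nabla\varphi\|$, and uses a hypothetical test function $\varphi(x, y) = x^2 + 3xy + 2y^2$ evaluated at $(1, 2)$.

```python
def drift_expanded(fx, fy, fxx, fyy, fxy, s1, s2):
    """Diffusion-induced drift as written in (16), without the transport term."""
    norm2 = fx * fx + fy * fy
    bracket = (fxx * (s1 * s1 * fx * fx + s2 * s2 * fy * fy)
               + fyy * (s1 * s1 * fy * fy + s2 * s2 * fx * fx)
               + 2.0 * (s1 * s1 - s2 * s2) * fx * fy * fxy)
    return bracket / (2.0 * norm2)


def drift_compact(fx, fy, fxx, fyy, fxy, s1, s2):
    """Same drift written with kappa * |grad phi| and the Hessian quadratic form."""
    norm2 = fx * fx + fy * fy
    hess_quad = fxx * fx * fx + 2.0 * fxy * fx * fy + fyy * fy * fy
    kappa_times_norm = (fxx + fyy) - hess_quad / norm2  # kappa * |grad phi|
    return 0.5 * s2 * s2 * kappa_times_norm + 0.5 * s1 * s1 * hess_quad / norm2


# Derivatives of phi(x, y) = x^2 + 3xy + 2y^2 at the point (1, 2).
fx, fy = 2.0 * 1.0 + 3.0 * 2.0, 3.0 * 1.0 + 4.0 * 2.0   # 8 and 11
fxx, fyy, fxy = 2.0, 4.0, 3.0

general_gap = abs(drift_expanded(fx, fy, fxx, fyy, fxy, 0.7, 0.3)
                  - drift_compact(fx, fy, fxx, fyy, fxy, 0.7, 0.3))
equal_sigma_drift = drift_expanded(fx, fy, fxx, fyy, fxy, 0.7, 0.7)
half_sigma2_laplacian = 0.5 * 0.7 * 0.7 * (fxx + fyy)
```

Under these assumptions the two forms agree to rounding error, and for $\sigma_1 = \sigma_2$ the drift collapses to $\frac{1}{2}\sigma_1^2 \Delta\varphi$, the smoothing term of (19).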
The dynamical model (2) constitutes a general stochastic process allowing us to guide a curve through an implicit surface. This stochastic process will enable us to draw samples of curves in our tracking process. Before turning to the experiments, it is interesting to see what the expectation of these stochastic processes corresponds to. It can be shown, through the Kolmogorov backward equation (the adjoint of the Fokker-Planck equation), that the expectation $u(x, t) = E_x(\Phi(X_t))$ evolves as
\[ \frac{\partial u}{\partial t} = \left( w_n^* + \frac{\sigma_2^2}{2}\, \kappa \right) \|\nabla u\| + \frac{\sigma_1^2}{2\|\nabla u\|^2}\, \nabla u^T\, \nabla^2 u\, \nabla u, \qquad u(x, 0) = \Phi_0(x), \tag{20} \]
where $\Phi_0$ denotes the initial surface, built from an initial value of the contour. This equation gives us the evolution law of the expectation, on a fixed grid, of an implicit surface driven by a stochastic dynamical model of form (9). This dynamical model includes two independent Brownian uncertainties on the curve motion, directed along the curve's tangent and normal respectively. The first term corresponds to the traditional deterministic evolution law of a level set function. The curvature term is introduced here by the effect of the motion uncertainty along the curve's tangent. The second term is less usual and corresponds to an uncertainty directed along the surface normal. If both uncertainties are set to the same amplitude $\sigma$, the previous equation simplifies to
\[ \frac{\partial u}{\partial t} = w_n^* \|\nabla u\| + \frac{\sigma^2}{2}\, \Delta u, \qquad u(x, 0) = \Phi_0(x). \tag{21} \]

4 Experiments and Results
Motion Information Extracted from the Images. The evolution laws introduced in the previous section are based on a stochastic force w calculated
from the image. We now introduce the force we use in our experiments. It is a linear combination of two main components:
$$w_n^{*(i)} = \beta(t)\,v^T n + (1 - \beta(t))\,\partial_\phi F(\phi^{(i)}) \qquad (22)$$
with proportions $\beta(t) \in [0, 1]$ and $1 - \beta(t)$ respectively. The first component is a motion component obtained from an optical flow computation, while the second corresponds to a photometric edge component obtained from a generalized Chan-Vese operator [12].

Optical-Flow Component. The motion component $v = (v^x, v^y)^T$ is provided by a robust and fast optical-flow estimator. It is defined as the minimizer of the objective function: T f (∇I v + I(t + dt) − I(t)1p(zt |ν(x))

With $\nu_1, \nu_2 > 0$ two fixed parameters and $c_1$ and $c_2$ being two unknown constants depending on $\Phi_0$, $R$ and $u$, the distance measure functional $F_d$ (the segmentation criterion) is defined by:
$$F_d(c_1, c_2, u) = \nu_1\int_\Omega |R(x) - c_1|^2\,H(\Phi_0(x + u(x)))\,dx + \nu_2\int_\Omega |R(x) - c_2|^2\,\big(1 - H(\Phi_0(x + u(x)))\big)\,dx. \qquad (1)$$
We need to add a regularization term of the form $F_{reg}(u)$ to (1), which is a substitute for the length term of the evolving curve in [4], and therefore the unknown $\Phi(x)$ from [4] is substituted by $\Phi_0(x + u(x))$, with $\Phi_0$ fixed now. Thus, we obtain a binary segmentation method that can also be used for registration.

Introduction of a Nonlinear Elasticity-Based Regularizer. A regularizing term $F_{reg}$ is now introduced to ensure the smoothness of the displacement vector field $u$. To allow large displacements, we introduce a nonlinear-elasticity-based smoother. We propose to view the deformation of the initial contour into the final segmented contour as the deformation undergone by St. Venant-Kirchhoff materials. These materials are homogeneous, isotropic, hyperelastic and the axiom of frame indifference is satisfied (see [8] for further details). Let us denote by $\varepsilon$ the Green-St. Venant strain tensor defined by $\varepsilon = \frac{1}{2}(C - I)$ with $C = \nabla\varphi^T\nabla\varphi$, $\varphi$ being the deformation such that $\varphi = \mathrm{Id} + u$, $\nabla\varphi$ being the Jacobian matrix and $I$ denoting the identity matrix. We have equivalently $\varepsilon = \varepsilon(u) = \frac{1}{2}(\nabla u^T + \nabla u + \nabla u^T\nabla u)$. The strain tensor is a measure of the deviation between a given deformation and a rigid deformation, for which $C = I$. As stressed by Ciarlet [8], St. Venant-Kirchhoff materials are the simplest ones among nonlinear models (large strains are also possible when the stress is small, however a linear relation implies that the stress is small if and only if the strain is small). The stored energy of St. Venant-Kirchhoff materials [8] is given by $W(\varepsilon) = \frac{\lambda}{2}(\mathrm{tr}\,\varepsilon)^2 + \mu\,\mathrm{tr}\,\varepsilon^2$. Thus, the nonlinear elasticity regularizer that will be coupled with the distance measure functional $F_d$ is defined by:
$$F_{reg}(u) = \int_\Omega W(\varepsilon(u))\,dx = \int_\Omega \Big(\frac{\lambda}{2}(\mathrm{tr}\,\varepsilon(u))^2 + \mu\,\mathrm{tr}\,\varepsilon^2(u)\Big)\,dx. \qquad (2)$$
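As a pointwise illustration of the integrand in (2), the following minimal sketch evaluates the Green-St. Venant strain and the St. Venant-Kirchhoff stored energy for a given 2x2 displacement gradient. The function names (`green_strain`, `svk_energy`) and the plain nested-list matrix representation are assumptions made for this example, not part of the original work.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(2)] for i in range(2)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def green_strain(grad_u: Matrix) -> Matrix:
    """eps(u) = 0.5 * (grad_u^T + grad_u + grad_u^T grad_u)."""
    gt = transpose(grad_u)
    gtg = matmul(gt, grad_u)
    return [[0.5 * (gt[i][j] + grad_u[i][j] + gtg[i][j]) for j in range(2)]
            for i in range(2)]


def svk_energy(grad_u: Matrix, lam: float, mu: float) -> float:
    """W(eps) = lam/2 * (tr eps)^2 + mu * tr(eps^2)."""
    eps = green_strain(grad_u)
    trace = eps[0][0] + eps[1][1]
    eps_sq = matmul(eps, eps)
    return 0.5 * lam * trace ** 2 + mu * (eps_sq[0][0] + eps_sq[1][1])


def rotation_gradient(angle: float) -> Matrix:
    """grad_u = R - I for a rigid rotation by `angle` (illustrative helper)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c - 1.0, -s], [s, c - 1.0]]
```

For a rigid rotation, $C = I$ and the energy vanishes, which is the frame-indifference property mentioned in the text; a linearized strain would not have this property.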
A Combined Segmentation and Registration Framework
Although this functional does not satisfy the known theoretical assumptions that ensure existence of minimizers (the stored energy function is not polyconvex; it is also not rank-1 convex and consequently not quasiconvex, which raises a drawback of theoretical nature since the introduced functional is not lower semi-continuous on $W^{1,4}$), we can expect to get, in practice, better results than those obtained with linearized models, as will be demonstrated next. The computation of the Euler-Lagrange equation satisfied by $u$ is cumbersome. Following the idea of the more theoretical work [25], we propose to circumvent this issue by introducing a second unknown, a matrix auxiliary variable $V$, which approximates the Jacobian matrix of $u$. The nonlinear elasticity regularizer is thus applied to $V$ and no longer to $\nabla u$, that is, the nonlinearity is no longer in the derivatives of the unknown $u$. Also, as the matrix variable $V$ is introduced to mimic the Jacobian matrix of $u$, an additional term based on the Frobenius norm (denoted by $\|\cdot\|_F$) of $\nabla u - V$ is incorporated in the modeling. More precisely, letting $\widehat{V} = \frac{V + V^T + V^T V}{2}$ and $\alpha > 0$ a tuning parameter, we redefine the smoothing functional $F_{reg} = F_{reg}(u, V)$ by:
$$F_{reg}(u, V) = \int_\Omega W(\widehat{V})\,dx + \frac{\alpha}{2}\int_\Omega \|\nabla u - V\|_F^2\,dx. \qquad (3)$$
In the limit, as $\alpha \to +\infty$, we obtain $\nabla u \approx V$ in the $L^2$-topology.

Total Energy Functional. The total energy $E_{total}$ considered in the remainder of this work is given by:
$$E_{total}(c_1, c_2, u, V) = F_d(c_1, c_2, u) + F_{reg}(u, V). \qquad (4)$$
Evolution Problem. We give the form of the associated Euler-Lagrange equations in the two-dimensional case. In the calculations, the Heaviside function is replaced by a smooth version denoted by $H_\varepsilon$, and $H_\varepsilon' = \delta_\varepsilon$ is a regularization of the Dirac measure. Fixing $u$ and $V$ and minimizing $E_{total}(c_1, c_2, u, V)$ with respect to $c_1$ and $c_2$ yields, as in [4]:
$$c_1 = \frac{\int_\Omega R(x)\,H_\varepsilon(\Phi_0(x + u(x)))\,dx}{\int_\Omega H_\varepsilon(\Phi_0(x + u(x)))\,dx}, \qquad c_2 = \frac{\int_\Omega R(x)\,\big(1 - H_\varepsilon(\Phi_0(x + u(x)))\big)\,dx}{\int_\Omega \big(1 - H_\varepsilon(\Phi_0(x + u(x)))\big)\,dx}.$$
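The update of $c_1$ and $c_2$ amounts to two weighted averages of the reference image. The sketch below is a hedged illustration only: it assumes the arctan-based regularized Heaviside function of [4], takes the warped level set values $\Phi_0(x + u(x))$ as an already sampled array `phi`, and replaces the integrals by sums over pixels. The names `heaviside_eps`, `delta_eps` and `region_means` are hypothetical.

```python
import math
from typing import List, Tuple


def heaviside_eps(z: float, eps: float = 1.0) -> float:
    """Smooth Heaviside: 0.5 * (1 + (2/pi) * arctan(z/eps))."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))


def delta_eps(z: float, eps: float = 1.0) -> float:
    """Derivative of heaviside_eps, a regularized Dirac measure."""
    return (1.0 / math.pi) * eps / (eps * eps + z * z)


def region_means(image: List[List[float]],
                 phi: List[List[float]],
                 eps: float = 1.0) -> Tuple[float, float]:
    """Discrete counterparts of c1 and c2 (pixel sums instead of integrals)."""
    num1 = den1 = num2 = den2 = 0.0
    for image_row, phi_row in zip(image, phi):
        for r, p in zip(image_row, phi_row):
            h = heaviside_eps(p, eps)
            num1 += r * h
            den1 += h
            num2 += r * (1.0 - h)
            den2 += 1.0 - h
    return num1 / den1, num2 / den2
```

With a level set that is strongly positive on one region and strongly negative on the other, `region_means` returns values close to the mean intensities of the two regions.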
Computing the first variation of the functional $F_d(c_1, c_2, u)$ in (1) with respect to $u$ gives the following gradient:
$$\partial_u F_d(c_1, c_2, u) = \big[\nu_1(R - c_1)^2 - \nu_2(R - c_2)^2\big]\,\delta_\varepsilon(\Phi_0(x + u(x)))\,\nabla\Phi_0(x + u(x)).$$
Also, computing the first variation of the functional $F_{reg}(u, V)$ in (3) with respect to $u$ gives only linear differential equations in each $u_i$:
$$\partial_{u_k} F_{reg}(u, V) = -\alpha\Big(\Delta u_k - \frac{\partial v_{k1}}{\partial x_1} - \frac{\partial v_{k2}}{\partial x_2}\Big), \quad k = 1, 2. \qquad (5)$$
C. Le Guyader and L.A. Vese
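To finish, setting $V = (v_{ij})_{1\le i,j\le 2}$ and letting
$$c_{01} = v_{11} + v_{22} + \tfrac{1}{2}\big(v_{11}^2 + v_{12}^2 + v_{21}^2 + v_{22}^2\big), \qquad c_{02} = 2v_{11} + v_{11}^2 + v_{21}^2,$$
$$c_{03} = 2v_{22} + v_{12}^2 + v_{22}^2, \qquad c_{04} = v_{12} + v_{21} + v_{11}v_{12} + v_{21}v_{22},$$
we obtain:
$$\begin{aligned}
\partial_{v_{11}} F_{reg}(u, V) &= \alpha\Big(v_{11} - \frac{\partial u_1}{\partial x_1}\Big) + (\lambda c_{01} + \mu c_{02})(1 + v_{11}) + \mu c_{04}\,v_{12},\\
\partial_{v_{12}} F_{reg}(u, V) &= \alpha\Big(v_{12} - \frac{\partial u_1}{\partial x_2}\Big) + (\lambda c_{01} + \mu c_{03})\,v_{12} + \mu c_{04}(1 + v_{11}),\\
\partial_{v_{21}} F_{reg}(u, V) &= \alpha\Big(v_{21} - \frac{\partial u_2}{\partial x_1}\Big) + (\lambda c_{01} + \mu c_{02})\,v_{21} + \mu c_{04}(1 + v_{22}),\\
\partial_{v_{22}} F_{reg}(u, V) &= \alpha\Big(v_{22} - \frac{\partial u_2}{\partial x_2}\Big) + (\lambda c_{01} + \mu c_{03})(1 + v_{22}) + \mu c_{04}\,v_{21}.
\end{aligned} \qquad (6)$$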
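The elastic part of (6), i.e. the four expressions without the $\alpha$-coupling term, can be sanity-checked numerically against the stored energy evaluated at $(V + V^T + V^T V)/2$. The sketch below is an illustration under that assumption; it compares the closed-form partial derivatives with central finite differences. The function names and the tuple layout `(v11, v12, v21, v22)` are choices made here for the example.

```python
from typing import Sequence, Tuple


def coefficients(v: Sequence[float]) -> Tuple[float, float, float, float]:
    """c01..c04 as defined above for v = (v11, v12, v21, v22)."""
    v11, v12, v21, v22 = v
    c01 = v11 + v22 + 0.5 * (v11**2 + v12**2 + v21**2 + v22**2)
    c02 = 2 * v11 + v11**2 + v21**2
    c03 = 2 * v22 + v12**2 + v22**2
    c04 = v12 + v21 + v11 * v12 + v21 * v22
    return c01, c02, c03, c04


def stored_energy(v: Sequence[float], lam: float, mu: float) -> float:
    """W at (V + V^T + V^T V)/2: c01 is its trace, c02/2, c03/2, c04/2 its entries."""
    c01, c02, c03, c04 = coefficients(v)
    return 0.5 * lam * c01**2 + 0.25 * mu * (c02**2 + c03**2 + 2 * c04**2)


def energy_gradient(v: Sequence[float], lam: float, mu: float) -> Tuple[float, ...]:
    """Elastic terms of (6), without the alpha * (v_ij - du_i/dx_j) part."""
    v11, v12, v21, v22 = v
    c01, c02, c03, c04 = coefficients(v)
    return (
        (lam * c01 + mu * c02) * (1 + v11) + mu * c04 * v12,
        (lam * c01 + mu * c03) * v12 + mu * c04 * (1 + v11),
        (lam * c01 + mu * c02) * v21 + mu * c04 * (1 + v22),
        (lam * c01 + mu * c03) * (1 + v22) + mu * c04 * v21,
    )


def numeric_gradient(v: Sequence[float], lam: float, mu: float,
                     h: float = 1e-6) -> Tuple[float, ...]:
    """Central finite differences of stored_energy."""
    grads = []
    for i in range(4):
        plus, minus = list(v), list(v)
        plus[i] += h
        minus[i] -= h
        grads.append((stored_energy(plus, lam, mu)
                      - stored_energy(minus, lam, mu)) / (2 * h))
    return tuple(grads)
```

Such a finite-difference comparison is a common way to catch sign or index slips before the gradient is used in a descent scheme.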
We solve the Euler-Lagrange equations in $u$ and $V$ using gradient descent, parameterizing the descent direction by an artificial time $t \ge 0$. Systems of 4 and 2 equations are obtained (solved by semi-implicit finite difference schemes),
$$\frac{\partial V}{\partial t} = -\partial_V F_{reg}(u, V), \qquad \frac{\partial u}{\partial t} = -\partial_u F_d(c_1, c_2, u) - \partial_u F_{reg}(u, V), \qquad (7)$$
equipped with the boundary conditions $u = 0_{\mathbb{R}^2}$ on $\partial\Omega$ and with the initial conditions $u(x, 0) = 0_{\mathbb{R}^2}$ and $V = 0_{M_2(\mathbb{R})}$. In most cases, no regridding is necessary. Nevertheless, in the algorithm, we have used a regridding technique quite similar to the one proposed by Christensen et al. [7]. The Jacobian $\det(\nabla(\mathrm{Id} + u))$ is monitored and if it drops below a defined threshold in some parts of the image, the process is reinitialized. The only change is that instead of doing the reinitialization step with the last deformed template as done in [7], we use the last deformed level set function $\Phi_0(\cdot + u(\cdot))$. The overall displacement $u$ is reconstructed similarly to [7].
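The Jacobian monitoring step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the displacement components are stored as nested lists on a uniform grid of spacing `h`, uses central differences at interior nodes only, and the default threshold 0.01 mirrors the admissibility value quoted in the experiments. The names `jacobian_determinants` and `needs_regridding` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def jacobian_determinants(u1: Grid, u2: Grid, h: float = 1.0) -> Grid:
    """det(grad(Id + u)) at interior nodes; index i runs along x1, j along x2."""
    n1, n2 = len(u1), len(u1[0])
    dets = []
    for i in range(1, n1 - 1):
        row = []
        for j in range(1, n2 - 1):
            du1_dx1 = (u1[i + 1][j] - u1[i - 1][j]) / (2 * h)
            du1_dx2 = (u1[i][j + 1] - u1[i][j - 1]) / (2 * h)
            du2_dx1 = (u2[i + 1][j] - u2[i - 1][j]) / (2 * h)
            du2_dx2 = (u2[i][j + 1] - u2[i][j - 1]) / (2 * h)
            row.append((1 + du1_dx1) * (1 + du2_dx2) - du1_dx2 * du2_dx1)
        dets.append(row)
    return dets


def needs_regridding(u1: Grid, u2: Grid, threshold: float = 0.01,
                     h: float = 1.0) -> bool:
    """True if the Jacobian drops below the threshold somewhere."""
    dets = jacobian_determinants(u1, u2, h)
    return min(min(row) for row in dets) < threshold
```

A zero displacement gives a Jacobian of 1 everywhere, whereas a displacement that folds the grid gives a non-positive Jacobian and triggers the reinitialization.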
3 Numerical Experiments
We conclude the paper by presenting several results on both synthetic and real images in two dimensions. In most experiments, $\nu_1 = \nu_2 = 1$, but when dealing with complex topologies involving long and thin concavities, these parameters have been increased up to 2.5. The $C^\infty$ regularization of the Heaviside function [4] is $H_\varepsilon(z) = \frac{1}{2}\big(1 + \frac{2}{\pi}\arctan\frac{z}{\varepsilon}\big)$. Our first experimental test in Fig. 1 is an academic one and is similar to those performed by Modersitzki in [24] (we refer to pages 114–115, 129–130, 150–153, 168–170 for comparisons using linear elasticity, diffusion, curvature, or the viscous fluid method), with the goal of illustrating that the model easily handles large displacements while segmenting the reference object. The problem is to warp a black disk to the letter C, both defined on the same image domain. The given data are the template and reference images as well as the curve delineating the disk boundary. We wish to demonstrate that our method qualitatively performs in a way similar to the fluid model without requiring the expensive Navier-Stokes solver employed for its numerical discretization, and provides two results: the segmentation of the reference image as well as a smooth displacement vector field $u$. The implementation is simple, based on finite difference schemes, and removes the nonlinearity in the derivatives of the unknown $u$. Unlike the linear elasticity, diffusion, and curvature-based models, for which the registration cannot be accomplished because the images differ too much (see pages 114–115, 150–153, 168–171 from [24]), the method allows large deformations. In this example, three regridding steps were necessary: the transformation was considered admissible if the Jacobian exceeded 0.01. Note that regridding steps were also necessary with the fluid registration model.

Fig. 1. Top: left, the reference image; right, the template. Bottom: left, the boundary of the disk (zero level set of $\Phi_0$) superimposed on the reference image; middle, the segmentation of the letter C; right, deformed grid using nonlinear elasticity regularization.

Fig. 2. Left, boundary of the ellipse (zero level set of $\Phi_0$) superimposed on the reference image; middle, the topology-preserving segmentation of the two disks; right, deformed grid using nonlinear elasticity regularization.
Fig. 3. Topology-preserving segmentation of three complex slices of the brain. Left, the boundary of the disk (zero level set of Φ0 ) superimposed on the reference image; middle, the segmentation of the slice of the brain; right, deformed grid using nonlinear elasticity regularization.
The second example in Fig. 2 illustrates how the method can be used in the case of topology-preserving segmentation (see [16], [1], [28], [17] on this topic). This synthetic reference image represents two disks (similar to tests performed in prior related works [16], [28], [17]). The template image, defined on the same image domain, is made of a black ellipse such that, when superimposed on the reference image, its boundary encloses the two disks. We aim at segmenting these two disks while maintaining the same topology throughout the process (one path-connected component) and at obtaining a smooth displacement vector field $u$. In this example, two regridding steps were necessary: the transformation was considered admissible if the Jacobian exceeded 0.01. The method has also been tested on complex slices of brain data. The goal is to register a disk to the outer boundary of the brain with topology preservation. In Fig. 3, the template image, defined on the same image domain, is made of a disk (shown superimposed on the reference). Two regridding steps were necessary for the first row, and three to four regridding steps for the second and third rows: the transformation was considered admissible if the Jacobian exceeded 0.01.
Fig. 4. Top: left, reference R; right, template T (mouse atlas and gene data). Bottom, left to right: contour obtained by the proposed algorithm segmenting template T (starting with Φ0 defining a disk), superimposed over the reference R; segmented reference, using as Φ0 the output contour detected at the previous step; final deformed grid using nonlinear elasticity smoother.
Fig. 5. Experiment exactly as in Fig. 4
Another medical application, as shown in Fig. 4 and Fig. 5, is proposed for mapping mouse gene data to an atlas. First, the proposed method is applied to the gene data, using Φ0 defining a disk, to segment it and extract a contour; then the method is applied again using as Φ0 the new contour, to segment the atlas data. In the process, we obtain a smooth deformation between the gene and the atlas data. No regridding step was necessary for Fig. 4.
Acknowledgments. This work was supported in part by the National Institutes of Health (NIH) through the NIH Roadmap for Medical Research Grant U54 RR021813 entitled Center for Computational Biology, and by the National Science Foundation Grant DMS 0312222.
References

1. Alexandrov, O., Santosa, F.: A topology-preserving level set method for shape optimization. J. Comput. Phys. 204(1), 121–130 (2005)
2. Beg, F., Miller, M., Trouvé, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV 61(2), 139–157 (2005)
3. Broit, C.: Optimal Registration of Deformed Images. PhD thesis, Computer and Information Science, University of Pennsylvania (1981)
4. Chan, T., Vese, L.: Active Contours Without Edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
5. Chen, Y., Thiruvenkadam, H., Tagare, H., Huang, F., Wilson, D.: On the Incorporation of Shape Priors in Geometric Active Contours. In: IEEE Workshop on VLSM, pp. 145–152 (2001)
6. Chen, Y., Thiruvenkadam, H., Gopinath, K., Brigg, R.: Image Registration Using the Mumford-Shah Functional and Shape Information. In: World Multiconference on Systems, Cybernetics and Informatics, pp. 580–583 (2002)
7. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable Templates Using Large Deformation Kinematics. IEEE Trans. Image Process. 5(10), 1435–1447 (1996)
8. Ciarlet, P.G.: Elasticité Tridimensionnelle. Masson (1985)
9. Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM J. Appl. Math. 64(2), 668–687 (2004)
10. Duay, V., Houhou, N., Thiran, J.-P.: Atlas-based segmentation of medical images locally constrained by level sets. In: ICIP, vol. 2 (2005)
11. Fischer, B., Modersitzki, J.: Fast Diffusion Registration. AMS Contemporary Mathematics. Inverse Problems, Image Analysis, and Medical Imaging 313, 117–129 (2002)
12. Fischer, B., Modersitzki, J.: Curvature based image registration. JMIV 18(1), 81–85 (2003)
13. Fischer, B., Modersitzki, J.: A Unified Approach to Fast Image Registration and a New Curvature Based Registration Technique. Linear Algebra and its applications 380, 107–124 (2004)
14. Haber, E., Modersitzki, J.: Numerical methods for volume preserving image registration. Inverse problems 20(5), 1621–1638 (2004)
15. Haber, E., Modersitzki, J.: Image Registration with Guaranteed Displacement Regularity. Int. J. Comput. Vision 71(3), 361–372 (2007)
16. Han, X., Xu, C., Prince, J.L.: A Topology Preserving Level Set Method for Geometric Deformable Models. IEEE Trans. Pattern Anal. Mach. Intell. 25(6), 755–768 (2003)
17. Le Guyader, C., Vese, L.: Self-repelling snakes for topology-preserving segmentation models. IEEE Trans. Image Process. 17(5), 767–779 (2008)
18. Le Guyader, C., Vese, L.: A Combined Segmentation and Registration Framework with a nonlinear Elasticity Smoother. UCLA C.A.M. Report 08-16 (2008)
19. Leow, A., Chiang, M.-C., Protas, H., Thompson, P., Vese, L., Huang, H.S.C.: Linear and Non-Linear Geometric Object Matching with Implicit Representation. In: Proc. 17th ICPR, vol. 3, pp. 710–713 (2004)
20. Liao, W.-H., Khuu, A., Bergsneider, M., Vese, L., Huang, S.-C., Osher, S.: From Landmark Matching to Shape and Open Curve Matching: A Level Set Approach. UCLA CAM Report 02-59 (2002)
21. Liao, W.-H., Yu, C.-L., Bergsneider, M., Vese, L., Huang, S.-C.: A New Framework of Quantifying Differences Between Images by Matching Gradient Fields and Its Application to Image Blending. In: Nuclear Science Symposium Conference Record, vol. 2, pp. 1092–1096. IEEE, Los Alamitos (2002)
22. Lord, N.A., Ho, J., Vemuri, B.C., Eisenschenk, S.: Simultaneous Registration and Parcellation of Bilateral Hippocampal Surface Pairs for Local Asymmetry Quantification. IEEE Trans. Med. Imaging 26(4), 417–478 (2007)
23. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annu. Rev. B. Eng. 4, 375–405 (2002)
24. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
25. Negrón Marrero, P.V.: A numerical method for detecting singular minimizers of multidimensional problems in nonlinear elasticity. Numerische Mathematik 58, 135–144 (1990)
26. Rabbitt, R.D., Weiss, J.A., Christensen, G.E., Miller, M.I.: Mapping of hyperelastic deformable templates using the finite element method. In: Proceedings SPIE, vol. 2573, pp. 252–265 (1995)
27. Rouchdy, Y., Pousin, J., Schaerer, J., Clarysse, P.: A nonlinear elastic deformable template for soft structure segmentation: application to the heart segmentation in MRI. IP 23, 1017–1035 (2007)
28. Sundaramoorthi, G., Yezzi, A.: Global regularizing flows with topology preservation for active contours and polygons. IEEE Trans. Image Process. 16(3), 803–812 (2007)
29. Unal, G.B., Slabaugh, G.G.: Coupled PDE's for non-rigid registration and segmentation. In: CVPR, pp. 168–175 (2004)
30. Vemuri, B., Chen, Y.: Joint image registration and segmentation. In: Osher, S., Paragios, N. (eds.) Geometric Level Set Methods, pp. 251–269 (2003)
31. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: A level-set based approach to image registration. In: IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 86–93 (2000)
32. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: Image Registration via level-set motion: Applications to atlas-based segmentation. Medical Image Analysis 7(1), 1–20 (2003)
33. Vese, L., Chan, T.: A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model. IJCV 50(3), 271–293 (2002)
34. Wang, F., Vemuri, B.C.: Simultaneous registration and segmentation of anatomical structures from brain MRI. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 17–25. Springer, Heidelberg (2005)
35. Xiaohua, C., Brady, J.M., Rueckert, D.: Simultaneous segmentation and registration of medical images. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 663–670. Springer, Heidelberg (2004)
36. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 7, 359–369 (1998)
37. Yanovsky, I., Thompson, P.M., Osher, S., Leow, A.D.: Topology Preserving Log-Unbiased Nonlinear Image Registration: Theory and Implementation. In: IEEE Conf. on CVPR (2007)
38. Yezzi, A., Zollei, L., Kapur, T.: A variational framework for joint segmentation and registration. IEEE-MMBIA, 44–51 (2001)
A Scale-Space Approach to Landmark Constrained Image Registration

Eldad Haber (1), Stefan Heldmann (2), and Jan Modersitzki (3)

(1) Dept. of Math. and Computer Science, Emory University, Atlanta, USA
[email protected]
(2) Inst. of Mathematics, University of Lübeck, Lübeck, Germany
[email protected]
(3) Dept. of Computing and Software, McMaster University, Hamilton, Canada
[email protected]

Abstract. Adding external knowledge improves the results for ill-posed problems. In this paper we present a new multi-level optimization framework for image registration when adding landmark constraints on the transformation. Previous approaches are based on a fixed discretization and do not allow for continuous landmark positions that are not on grid points. Our novel approach overcomes these problems such that we can apply multi-level methods, which have been proven to be crucial for avoiding local minima in the course of optimization. Furthermore, for our numerical method we are able to use constraint elimination such that we trace back the landmark constrained problem to an unconstrained optimization, leading to an efficient algorithm.
1 Introduction
Image registration is a challenging problem in digital imaging. Roughly speaking, the problem can be described as follows. Given a reference image R and a template image T, find a reasonable spatial transformation y such that the transformed image T[y] is similar to the reference. Image registration is required whenever images resulting from different times, devices, and/or perspectives need to be compared or integrated. In the area of medical applications alone, registration is used in radiation therapy, surgery planning, treatment evaluation, motion correction and estimation, and many more; see, e.g. [1, 2, 3, 4, 5, 6] and references therein. See also [7, 8, 9] for related work. However, although the registration problem is easily stated, it is hard to solve. A key difficulty is the ill-posedness of the problem: for a particular point x, scalar intensity values R(x) and T(x) are given but a transformation vector y(x) is to be computed. A common approach is to phrase image registration as an optimization problem involving a distance measure D reflecting similarity of images and a regularization term S reflecting reasonability of the transformation. Though appropriate regularization results in a well-posed problem in the sense of Hadamard [10] (see, e.g. [11, 12, 13]), it is sometimes difficult or even impossible to find a regularization that conforms to the application.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 612–623, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Reference (left) and template (right) images
A simple example is shown in Fig. 1, where the reference and template image cover the full intensity range and share some obvious symmetries. Considering only rigid transformations, there are four different solutions for any reasonable distance measure. Regularization can be used to privilege one of these (for example by penalizing rotations). However, any regularization is somehow artificial and may favor a meaningless solution. One way to obtain better results and to guide the model towards a more realistic solution is by using landmarks. In the above example, just adding the information that the top-left corner of the square in the reference image corresponds to the bottom-right corner of the square in the template image eliminates three of the above solutions. Adding landmark information to image registration is far from being new; see e.g. [14, 15, 16, 17, 18, 4] and references therein. Although landmarks have been used extensively in the past, the effective numerical implementation of image registration with landmarks is unsatisfactory. For example, no landmark registration scheme known to us allows for the incorporation of scale space or multi-level techniques, which are frequently used to avoid local minima. Typically, landmark constraints are described in a discrete sense, where the ith pixel in a fixed discretization is constrained. This causes trouble if the discretization is variable, that is, if discretizations on different scales are used. The goal of this paper is to develop a multilevel technique for the incorporation of landmarks in a registration process. We stress that, ignoring issues such as local minima, algorithms different from the one proposed here (for example [17]) should give similar results. Thus, the focus of this work is on the numerical implementation of multilevel algorithms with landmark constraints. Starting with a variational formulation of the landmark constrained registration problem, this paper provides a consistent numerical approach. The new approach is based on a discretize-then-optimize approach and takes advantage of a multi-level discretization. The new approach automatically resolves the problem resulting from a fixed number of constraints versus a varying number of unknowns and the related inconsistency of the constraints. A numerically stable and computationally feasible basis of the constrained manifold is derived. Using a reduced formulation gives a handle to an elegant algorithm, where indefinite Karush-Kuhn-Tucker systems [19] can be avoided.
This paper is organized as follows. Sect. 2 introduces the basic notation and states the problem in a variational framework. A discretize-then-optimize approach is used to numerically solve the constrained registration problem. Details are outlined in Sect. 3, where the discretization, the construction of a basis for the constraint manifold, the numerical optimization, and a multi-level strategy are described. Sect. 4 presents some numerical results. Conclusions are given in Sect. 5.
2 Variational Formulation
In this section we formulate the constrained registration problem. Let $d \in \mathbb{N}$ denote the spatial dimension (typically $d = 2, 3$) and $\Omega \subset \mathbb{R}^d$ the region of interest, and let $T, R \in L^2(\mathbb{R}^d, \mathbb{R})$ denote the template and reference image, respectively. The objective is to find a transformation $y : \mathbb{R}^d \to \mathbb{R}^d$ such that the transformed image $T[y]$ is similar to $R$ and the transformation $y$ is regular, where similarity and regularity are measured by $D$ and $S$, respectively. More precisely,
$$T[y](x) := T(y(x)) \ \text{for all } x \in \Omega, \qquad D[T, R] := \tfrac{1}{2}\int_\Omega (T - R)^2\,dx, \qquad S[y] := \tfrac{\alpha}{2}\int_\Omega |\mathcal{B}y|^2\,dx, \qquad \mathcal{B} := I_d \otimes \Delta.$$
Here, for ease of presentation, it is assumed that similarity is quantified by the energy in the difference image. However, other distance measures like mutual information [20,21] or normalized gradient fields [22,23] can be handled similarly. Regularity is measured using the curvature regularizer [24, 25], where the partial differential operator $\mathcal{B}$ is the vector-valued Laplacian, $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^n$, and $\alpha$ is a regularization parameter. Note that the order of the regularizer has to be sufficiently high to cover the landmark constraints [26, 4]. It is assumed that a number $L$ of landmarks $r_1, \dots, r_L \in \mathbb{R}^d$ in the reference and corresponding landmarks $t_1, \dots, t_L \in \mathbb{R}^d$ in the template image are given. The automatic detection of landmarks is beyond the scope of this paper; see [16] for an overview. The point evaluation functional is denoted by $\delta_x$. With $(I_d \otimes \delta_{r_\ell})[y] = (y^1(r_\ell), \dots, y^d(r_\ell)) = y(r_\ell) \in \mathbb{R}^d$, the landmark constraints can be phrased as $C[y] = t := (t_1, \dots, t_L) \in \mathbb{R}^{L,d}$, where $C[y]$ collects the point evaluations $y(r_1), \dots, y(r_L)$. The constrained registration problem thus reads
$$\text{minimize}\quad J[y] = D[T[y], R] + S[y - y_{ref}] \quad\text{subject to}\quad C[y] = t, \qquad (1)$$
where $y_{ref}$ allows for a bias towards a particular solution. The above problem is strongly related to plain landmark based registration, where $D = 0$ and $S = S^{TPS}$ is the bending energy of a thin-plate-spline; see,
e.g. [26,4] for an extended discussion. The solution $y_{TPS}$ is explicitly known and is a linear combination of shifts of a radial basis function $\rho$ associated with $S$ and a polynomial correction. Following [4], the kth component of $y_{TPS}$ reads
$$y^k_{TPS}(x) = \sum_{\ell=1}^{L} \theta^k_\ell\,\rho(|x - r_\ell|) + (1, x_1, \dots, x_d)\,(\theta^k_{L+1}, \dots, \theta^k_{L+d+1})^T, \qquad (2)$$
where the coefficients are given by $A\theta^k = (t^k_1, \dots, t^k_L, 0, \dots, 0)^T$ with
$$A = \begin{pmatrix} [\rho(|r_i - r_j|)]_{i,j=1}^{L} & P^T \\ P & 0 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & \cdots & 1 \\ r_1 & \cdots & r_L \end{pmatrix} \in \mathbb{R}^{d+1,L}, \qquad \rho(t) = \begin{cases} t^2\log t & (d = 2) \\ t & (d = 3) \end{cases}.$$
In our final formulation of the continuous problem, we use this function as a reference for regularization, i.e. $y_{ref} = y_{TPS}$, and it is thus convenient to rephrase the problem in the update $u = y - y_{ref}$:
$$\text{minimize}\quad J[u] = D[T[y_{ref} + u], R] + S[u] \quad\text{subject to}\quad C[u] = 0. \qquad (3)$$
The plain landmark solution plays several roles as a reference. It can be seen as a good starting guess for a later implementation, minimizing the risk of being trapped by a local minimum. Moreover, it injects boundary values into the region of interest. In fact, these boundary conditions make $y_{TPS}$ linear for $x \to \infty$ and thus invertible, which is preferable for most applications. Finally, it yields homogeneous constraints. As pointed out later, this is a crucial point for the discretization, as the feasible set is then always non-empty.
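To make the thin-plate-spline reference of (2) concrete, the following sketch assembles the system $A\theta^k = (t^k_1, \dots, t^k_L, 0, \dots, 0)^T$ for $d = 2$ and evaluates one component of $y_{TPS}$. It is a small illustration only: the dense Gaussian elimination in `solve` is a stand-in chosen to keep the example dependency-free, and the names `rho`, `tps_coefficients` and `tps_eval` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rho(t: float) -> float:
    """Radial basis function for d = 2: t^2 log t, with rho(0) = 0."""
    return t * t * math.log(t) if t > 0.0 else 0.0


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def tps_coefficients(r: Sequence[Point], t_k: Sequence[float]) -> List[float]:
    """Solve A theta = (t_k, 0, 0, 0) for landmarks r (not all collinear)."""
    L = len(r)
    n = L + 3
    a = [[0.0] * n for _ in range(n)]
    for i in range(L):
        for j in range(L):
            a[i][j] = rho(math.dist(r[i], r[j]))
        # Fill the P^T block (upper right) and the P block (lower left).
        for c, val in enumerate((1.0, r[i][0], r[i][1])):
            a[i][L + c] = val
            a[L + c][i] = val
    return solve(a, list(t_k) + [0.0, 0.0, 0.0])


def tps_eval(x: Point, r: Sequence[Point], theta: Sequence[float]) -> float:
    """Evaluate one component of the thin-plate spline at x."""
    L = len(r)
    radial = sum(theta[l] * rho(math.dist(x, r[l])) for l in range(L))
    return radial + theta[L] + theta[L + 1] * x[0] + theta[L + 2] * x[1]
```

By construction the spline interpolates the landmark values, and landmark data that is already affine is reproduced by the polynomial part alone.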
3 Numerical Treatment
A discretize-then-optimize approach is used to compute a numerical solution of (3). The discretization is briefly outlined for dimension d = 2; see [27] for a detailed and general description. Note that the discretization is variable during the course of optimization and all quantities introduced in this section depend on the discretization width h. However, in this section a fixed discretization level is assumed and, in order to keep the presentation clear, dependencies on h are neglected.

3.1 Discretization
Fig. 2.a shows the discretization of a domain $\Omega$ in $m = (3, 4)$ cells with cell centers $x_j$, $j = 1, \dots, n = m_1 m_2$. Note that all discrete quantities depend on the discretization width $h$, with $h_i = \omega_i/m_i$ and $\hat{h} = h_1 \cdots h_d$. The following equations describe how the discrete quantities are assembled:
$$X = (x^1_1, \dots, x^1_n, \dots, x^d_n) \in \mathbb{R}^{dn}, \qquad U = (u^1_1, \dots, u^1_n, \dots, u^d_n) \in \mathbb{R}^{dn},$$
$$R = (R(x_1), \dots, R(x_n)) \in \mathbb{R}^n, \qquad T(U) = (T(u_1), \dots, T(u_n)) \in \mathbb{R}^n.$$
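Fig. 2. Discretization of a 2D domain $\Omega = (0, \omega_1) \times (0, \omega_2) \subset \mathbb{R}^2$ with cell widths $h_1, h_2$ and cell centers $x_j$ (left); discrete 2nd derivative $\partial_i^{2,h}$ (middle); linear interpolation at a landmark $r$ with weights $\xi_1, \xi_2$ between the neighboring grid points $x_a, x_b, x_c, x_d$ (right). The matrix shown in the middle panel is
$$\partial_i^{2,h} = \frac{1}{h_i^2}\begin{pmatrix} -2 & 1 & & & \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ & & & 1 & -2 \end{pmatrix} \in \mathbb{R}^{m_i, m_i}.$$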
The discretization of the curvature operator can be expressed as Kronecker products [25] of identity matrices $I_q \in \mathbb{R}^{q,q}$ and discrete 2nd derivatives $\partial_i^{2,h}$ (see Fig. 2.b): $\mathcal{B} \approx B = I_d \otimes (I_{m_2} \otimes \partial_1^{2,h} + \partial_2^{2,h} \otimes I_{m_1})$. Finally, the integrals are approximated using a midpoint quadrature rule. Thus
$$J[u] \approx J(U) = \tfrac{1}{2}\,\hat{h}\,|T(y_{TPS}(X) + U) - R|^2 + \tfrac{1}{2}\,\alpha\,\hat{h}\,|BU|^2.$$
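The one-dimensional building block of the discrete curvature operator can be sketched as follows. This is an illustration under simple assumptions: a dense nested-list matrix instead of a sparse one, the tridiagonal stencil $(1, -2, 1)/h^2$ applied unchanged in the first and last rows as in Fig. 2, and hypothetical helper names (`cell_centers`, `second_derivative_matrix`, `matvec`).

```python
from typing import List


def cell_centers(omega: float, m: int) -> List[float]:
    """Cell centers of (0, omega) split into m cells of width h = omega/m."""
    h = omega / m
    return [(j + 0.5) * h for j in range(m)]


def second_derivative_matrix(m: int, h: float) -> List[List[float]]:
    """m-by-m matrix (1/h^2) * tridiag(1, -2, 1)."""
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        d[i][i] = -2.0 / h**2
        if i > 0:
            d[i][i - 1] = 1.0 / h**2
        if i < m - 1:
            d[i][i + 1] = 1.0 / h**2
    return d


def matvec(a: List[List[float]], x: List[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
```

Applied to samples of $f(x) = x^2$ at the cell centers, the interior rows return the exact second derivative 2; only the first and last rows are affected by the truncated stencil.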
The final step is the discretization of the point evaluation functional $\delta_x$. For an arbitrary location $r_\ell$, a d-linear interpolation of discrete point evaluation functionals located at the $2^d$ closest grid points is exploited. For example, let $d = 2$ and let the four neighboring grid points of $r_\ell$ be denoted by $x_a, \dots, x_d$; see Fig. 2.c. Thus,
$$\delta_{r_\ell}[u] \approx \delta^h_{r_\ell} u = C_\ell\,u(X) = (1 - \xi_1)(1 - \xi_2)\,u(x_a) + \xi_1(1 - \xi_2)\,u(x_b) + (1 - \xi_1)\xi_2\,u(x_c) + \xi_1\xi_2\,u(x_d),$$
and $C_\ell$ is a sparse row vector with non-zero entries only at positions related to the locations of $x_a, \dots, x_d$. If for a certain discretization a landmark $r_\ell$ is located precisely on a grid point $x_j$, then $C_\ell$ has only one non-zero entry, at position $j$. Assembling these rows for $\ell = 1, \dots, L$ results in a sparse L-by-n matrix $C$ with at most $2^d$ non-zero entries per row, see Fig. 3.b. The Kronecker product $I_d \otimes C$ enables a simultaneous treatment of all components of the discretized vector field $U$. Note that even for a very coarse discretization ($n < L$) there exists a feasible solution fulfilling the constraints: $U = 0$. Thus the feasible set is non-empty. The discrete formulation of the constrained registration problem thus reads:
$$\text{minimize}\quad J(U) = \tfrac{1}{2}\,\hat{h}\,|T(y_{ref}(X) + U) - R|^2 + \tfrac{1}{2}\,\hat{h}\,\alpha\,|BU|^2 \quad\text{subject to}\quad (I_d \otimes C)\,U = 0, \quad U \in \mathbb{R}^{dn}. \qquad (4)$$
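One row of the constraint matrix can be sketched as below for $d = 2$. The sketch makes several assumptions that are not stated in the text: a cell-centered grid with at least two cells per direction, a lexicographic cell numbering in which the first coordinate runs fastest, a landmark lying between the outermost cell centers, and a dictionary `{cell index: weight}` as sparse row format. The function name `interpolation_row` is hypothetical.

```python
import math
from typing import Dict, Tuple


def interpolation_row(r: Tuple[float, float],
                      omega: Tuple[float, float],
                      m: Tuple[int, int]) -> Dict[int, float]:
    """Bilinear interpolation weights of landmark r as a sparse row."""
    h = (omega[0] / m[0], omega[1] / m[1])
    # Landmark position in units of cells, measured from the first cell center.
    s = [r[i] / h[i] - 0.5 for i in range(2)]
    # Index of the lower-left neighboring cell center, clamped to the grid.
    base = [min(max(int(math.floor(s[i])), 0), m[i] - 2) for i in range(2)]
    xi = [s[i] - base[i] for i in range(2)]
    row: Dict[int, float] = {}
    for d1, w1 in ((0, 1.0 - xi[0]), (1, xi[0])):
        for d2, w2 in ((0, 1.0 - xi[1]), (1, xi[1])):
            w = w1 * w2
            if w != 0.0:
                j = (base[0] + d1) + (base[1] + d2) * m[0]
                row[j] = row.get(j, 0.0) + w
    return row
```

A landmark that coincides with a cell center yields a single entry of weight one, while a landmark between four cell centers yields four weights that sum to one, in line with the description above.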
3.2 An Efficient Basis for the Feasible Set
The objective is to derive a numerical feasible basis for the nullspace of the operator C. Note the size L-by-n of C can be large (e.g. n = 1283 and L = 100)
A Scale-Space Approach to Landmark Constrained Image Registration
617
and the rank of this matrix is generally unknown. For a coarse discretization, C has more rows than columns and a fine discretization it has more columns than rows. The basic idea is to reorder the columns of C, such that the non-zeros columns are placed first. Let Π denote the corresponding n × n permutation matrix and C ∗ be a matrix consisting of the non-zeros columns of C, such that CΠ = ( C ∗ | 0 ). The size of C ∗ is L-by-p, where p ≤ 2d L since each row of C can have at most 2d non-zeros entries. The matrix C ∗ is not only relatively small but also very sparse. Assuming the number of landmarks to be less then 1.000, it is thus possible to compute a singular value decomposition (SVD) of C ∗ [28], i.e. C ∗ = W ΣV ,
W W = IL ,
where
V V = Ip ,
and Σ = diag(σ1 , ..., σmin{L,p} ) ∈ RL,p ,
σ1 ≥ · · · ≥ σmin{L,p} ≥ 0.
The above SVD enables the computation of the numerical rank of the matrix C ∗ and hence C. To this end let tol be a user proscribed tolerance (e.g. tol = 0 or tol = 10−16 ) and let k the largest integer such that σk > tol. The last p − k columns of V are a basis of the (numerical) nullspace of the matrix C ∗ and thus the columns of Z form a basis for the nullspace of C, where V (:, k+1 : p) ∈ Rp,p−k V (:, k + 1 : p) 0 ∈ Rn,n−k , Z=Π 0 I n−p
and the final step undoes the permutation. Important issues are summarized as follows. The matrix $C^*$ is relatively small, such that the SVD becomes numerically feasible. The SVD enables a uniform treatment independent of the rank of $C^*$ and thus handles a coarse discretization ($L > n$) as well as a fine discretization ($L < n$). Note that in the case $L > n$ the solution is the thin plate spline solution since there are 0 degrees of freedom. The columns of $Z$ form a sparse, orthonormal, and numerically stable basis for the set of constraints. For very fine discretizations, the matrix $Z$ is essentially the identity matrix and can be stored efficiently. Any feasible vector is given by $U = (I_d \otimes Z)w$, where $w \in \mathbb{R}^{d(n-k)}$, and there always exists a feasible point $w = 0$.
3.3 Numerical Optimization
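The reduced basis $Z$ of Sect. 3.2, on which the reduced formulation is built, can be computed in a few lines. The following is a minimal NumPy sketch and not the authors' implementation: the function name `nullspace_basis`, the default tolerance, the dense storage, and the toy matrix are assumptions made for illustration, and the constraint matrix is assumed to have at least one non-zero column.

```python
import numpy as np

def nullspace_basis(C, tol=1e-12):
    """Return (Z, k): an orthonormal null-space basis of C and its numerical rank."""
    C = np.asarray(C, dtype=float)
    n = C.shape[1]
    # Column permutation that moves the non-zero columns to the front:
    # C[:, perm] = (C* | 0).
    nonzero = np.flatnonzero(np.any(C != 0.0, axis=0))
    zero = np.setdiff1d(np.arange(n), nonzero)
    perm = np.concatenate([nonzero, zero])
    p = nonzero.size
    # Full SVD of the small L-by-p matrix C* = W diag(s) V^T.
    _, s, Vt = np.linalg.svd(C[:, nonzero], full_matrices=True)
    k = int(np.sum(s > tol))            # numerical rank of C* (and hence of C)
    V_null = Vt[k:, :].T                # last p-k columns of V
    # Z = Pi * blockdiag(V_null, I_{n-p}); indexing rows by perm undoes the permutation.
    Z = np.zeros((n, n - k))
    Z[perm[:p], :p - k] = V_null
    Z[perm[p:], p - k:] = np.eye(n - p)
    return Z, k

# Hypothetical toy constraint matrix: 3 landmarks, 6 grid points,
# two landmarks fall into the same cell, so the rows are linearly dependent.
C = np.array([[0.5, 0.5, 0.0, 0.00, 0.00, 0.0],
              [0.5, 0.5, 0.0, 0.00, 0.00, 0.0],
              [0.0, 0.0, 0.0, 0.25, 0.75, 0.0]])
Z, k = nullspace_basis(C)
```

In this toy case the rank is 2, so `Z` has $6 - 2 = 4$ orthonormal columns, and every vector `Z @ w` satisfies the homogeneous constraints `C @ (Z @ w) = 0`.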
The final version of the discrete constrained registration problem is given in terms of the reduced basis and reads
$$\text{minimize} \quad J(w) = \tfrac{1}{2}\,\hat h\,|T(y_{\mathrm{ref}}(X) + Z_d w) - R|^2 + \tfrac{1}{2}\,\hat h\,\alpha\,|B Z_d w|^2, \tag{5}$$
where Zd = Id ⊗ Z. In order to find a numerical solution to (5) standard optimization techniques can be applied; see e.g. [19] for an overview. Here, we use a Gauss-Newton type
algorithm with an Armijo line search as outlined in [27]. The quasi-Newton system is given by $Z_d^\top H Z_d\,\delta w = -\nabla J(w)$, where $\delta w$ is the new search direction and $H = \nabla T^\top \nabla T + \alpha B^\top B$ is an approximation to the Hessian. Note that since the regularization is quadratic, the term $B^\top B$ is exactly the Hessian of the regularization part and only the data fitting term is approximated. A generalized Gauss-Newton strategy can be used to handle other distance measures as mentioned before. For the numerical solution of the Newton systems, a preconditioned conjugate gradient solver is used with a symmetric Gauss-Seidel preconditioner; see [29] for details.
3.4 The Multilevel Strategy
It remains to describe the multi-level framework. To this end, a multi-level representation $\{T_D^\ell, R_D^\ell, m^\ell\}$ of the given discrete data is initialized, where for ease of presentation it is assumed that $m_i^\ell = 2^\ell$, $i = 1, \dots, d$, $\ell = \ell_{\min}, \dots, \ell_{\max}$. Note that $h_i^\ell = \omega_i / m_i^\ell$ depends on the level. More precisely,
$$T^{\ell_{\max}} = \text{original data}, \qquad T^{\ell-1} = \mathrm{downsample}(\mathrm{conv}(G, T^\ell)),$$
where $G$ is a smoothing kernel (in our numerical experiments we used the block smoother $G = (1, 1, 1)^\top (1, 1, 1)/9$). In general, we compute updates to the thin plate spline solution on different grids. Similar to many other multilevel algorithms, the solution on a finer grid is initialized by the coarser grid solution. To be more specific, running from coarse to fine, the continuous representations $T^\ell$, $R^\ell$ of $T_D^\ell$, $R_D^\ell$ are computed (in our numerical experiments, spline interpolation is used). Moreover, the discretized thin-plate spline solution $Y_{\mathrm{ref}}^\ell = y^{\mathrm{TPS}}(X^\ell)$ (cf. (2)) for a cell-centered grid $X^\ell$ of size $m^\ell$ and the matrix $Z^\ell$ (cf. Sect. 3.2) are initialized. A numerical solution $w_{\mathrm{opt}}^\ell$ of the discretized registration problem (5) is computed and the current grid solution is given by $Y_{\mathrm{opt}}^\ell = Y_0^\ell + Z_d^\ell w_{\mathrm{opt}}^\ell$. On the coarsest grid, we choose $w_0^{\ell_{\min}} = 0$ as starting guess, such that $Y_0^{\ell_{\min}} = Y_{\mathrm{ref}}^{\ell_{\min}}$. The starting guess $w_0^\ell$ for a finer grid is chosen as the best least squares approximation of the prolongated coarser grid solution, where $P_{\ell-1}^\ell$ denotes the linear prolongation operator. Since the constraint basis $Z^\ell$ is orthogonal, the computation simplifies to $w_0^\ell := (Z^\ell)^\top P_{\ell-1}^\ell Z^{\ell-1} w_{\mathrm{opt}}^{\ell-1}$. When designing a multilevel strategy we need to set the number of levels. Unfortunately, setting the number of levels is non-trivial. In general, similar to other problems, one requires that the coarsest level actually represents the problem [30].
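The two ingredients of the multilevel strategy, the image pyramid and the projected starting guess, can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the names `coarsen`, `build_pyramid`, and `initial_guess` are hypothetical, the boundary is handled by edge replication, and "downsample" is taken to mean keeping every second pixel, none of which is specified in the text. The prolongation matrix `P` is assumed to be given.

```python
import numpy as np

def coarsen(T):
    """One level down: 3x3 block smoothing G = (1,1,1)^T (1,1,1) / 9, then subsampling."""
    padded = np.pad(T, 1, mode="edge")          # boundary handling is an assumption
    smoothed = np.zeros(T.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            smoothed += padded[di:di + T.shape[0], dj:dj + T.shape[1]]
    return (smoothed / 9.0)[::2, ::2]           # keep every second pixel (assumption)

def build_pyramid(T, l_min):
    """Map level -> image for a square image of size 2^l_max x 2^l_max."""
    l_max = int(round(np.log2(T.shape[0])))
    pyramid = {l_max: np.asarray(T, dtype=float)}
    for level in range(l_max, l_min, -1):
        pyramid[level - 1] = coarsen(pyramid[level])
    return pyramid

def initial_guess(Z_fine, P, Z_coarse, w_coarse):
    """Least-squares fit of the prolongated coarse solution in the fine basis.

    Because Z_fine has orthonormal columns, the fit reduces to a projection.
    """
    return Z_fine.T @ (P @ (Z_coarse @ w_coarse))

# A constant 128 x 128 image coarsened from level 7 down to level 3.
pyramid = build_pyramid(np.ones((128, 128)), l_min=3)
```

With a 128-by-128 image and `l_min=3`, the pyramid contains levels 7 down to 3, the coarsest being 8-by-8; a constant image stays constant on every level, since the kernel weights sum to one.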
4 Results
4.1 Artificial 2D Data
We use the hand data shown in Fig. 4. In this example, a synthetic transformation ytrue has been specified and the reference is a transformed copy of the template image R = T [ytrue ]; see Fig. 4. This construction allows a comparison with
a ground truth. Here, 47 manually detected landmarks $\tilde t_j$ have been chosen in the template image. Using a numerical approximation to the inverse of the transformation, the landmarks in the reference are defined by $r_j \approx y_{\mathrm{true}}^{-1}(\tilde t_j)$ and the corresponding landmarks in the template image are defined by $t_j := y_{\mathrm{true}}(r_j)$. Note that since $y_{\mathrm{true}}$ is explicitly known, there are no errors in the landmark pairing. The original data is 128-by-128 and the level ranges from $\ell_{\min} = 3$ to $\ell_{\max} = 7$. Fig. 3.a shows the coarse grid representation of the data. Here, some cells contain many landmarks. The problem is over-constrained and the 47-by-64 matrix $C^{\ell_{\min}}$ is rank deficient (the rank being 27). The non-zero pattern of this matrix is shown in Fig. 3.b.
Fig. 3. Coarse grid representation of data with 47 landmarks (circles), $\ell_{\min} = 3$ (left); non-zero pattern of the matrix $C^{\ell_{\min}}$ (right)
Fig. 4 shows the original data (a,b,c) and the results based on the thin-plate-spline solution $y_{\mathrm{TPS}}$ (d,g), an unconstrained solution $y_{\mathrm{un}}$ (e,h), and the constrained solution $y_{\mathrm{con}}$ (f,i). The distance measure and landmark error are given by
$$\mathrm{err}(Y) := 100\, D(Y)/D(X)\ [\%], \qquad D(Y) = |T(Y) - R|^2, \qquad \mathrm{LM}(Y) := |(I_d \otimes C)Y - t|_{\mathrm{Frobenius}}.$$
All three registration approaches (TPS, unconstrained, constrained) perform well for this example. The TPS approach gives perfect results for the landmarks but a large difference for the trapezoid. The unconstrained approach results in a very small difference, but the landmark error is relatively large. Finally, the constrained approach performs perfectly on the landmarks and results in the smallest difference. The latter is due to the fact that the stopping criterion is relative to the initial guess, which results in a smaller distance in the constrained approach.
4.2 3D Example
For our 3D experiment we use real data from CT and 3D power Doppler ultrasound (US) of a human liver. The goal of this application is the alignment of
Fig. 4. Original template image with landmarks (crosses) and visualization of an artificial transformation $y_{\mathrm{true}}$ (top left); reference (top middle) is a transformed template $R(x) = T(y_{\mathrm{true}}(x))$, with visualization of transformed landmarks and initial grid; initial difference $|T - R|$ (top right); transformed template $T[y]$ based on thin-plate spline solution $y_{\mathrm{TPS}}$ (center left), unconstrained solution $y_{\mathrm{un}}$ (center middle), and constrained solution $y_{\mathrm{con}}$ (center right); differences $|T[y] - R|$ for $y = y_{\mathrm{TPS}}$ (bottom left), $y = y_{\mathrm{un}}$ (bottom middle), and $y = y_{\mathrm{con}}$ (bottom right). Panel annotations: initial difference err ≈ 100%, LM ≈ 8.1; $y_{\mathrm{TPS}}$: err ≈ 9%, LM ≈ $10^{-14}$; $y_{\mathrm{un}}$: err ≈ 0.6%, LM ≈ 0.68; $y_{\mathrm{con}}$: err ≈ 0.4%, LM ≈ $10^{-14}$
Fig. 5. 3D registration of CT and US. Reference $R$ with landmarks (black balls) (top left); template $T$ with landmarks (top right); reference $R$ and deformed template $T[y_{\mathrm{TPS}}]$ after landmark registration (bottom left); reference $R$ and deformed template $T[y_{\mathrm{con}}]$ after constrained registration (bottom right)
vessels that have been segmented from the original data. Consequently, we have binary images allowing for a direct comparison by the SSD distance measure. The size of the data in our experiment is 171 × 165 × 186 voxels. Additionally, we have 11 corresponding landmarks that were manually picked by an expert; see Fig. 5 (a,b). For the registration we used four levels starting from 22 × 21 × 24 and ranging to the original resolution with 171 × 165 × 186 voxels. Results for a plain landmark based registration by using only the thin-plate-spline solution yTPS and the constrained solution ycon = yTPS + u are shown in Fig. 5(c,d). As it turns out, the landmark solution provides a reasonable alignment but is far from being perfect. On the other hand, using the constrained approach improved the quality of the results considerably and leads to an almost perfect alignment of large parts of the vessel system.
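The level sizes quoted above are consistent with halving each dimension per level and rounding up. This rule is an assumption (the text does not state how the coarse sizes were obtained); the short sketch below, with the hypothetical helper `level_sizes`, merely checks the arithmetic.

```python
import math

def level_sizes(shape, n_levels):
    """Grid sizes from coarsest to finest, halving (rounding up) once per level."""
    sizes = [tuple(shape)]
    for _ in range(n_levels - 1):
        sizes.append(tuple(math.ceil(s / 2) for s in sizes[-1]))
    return sizes[::-1]

sizes = level_sizes((171, 165, 186), 4)
print(sizes)
# [(22, 21, 24), (43, 42, 47), (86, 83, 93), (171, 165, 186)]
```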
5 Conclusions
The paper presents a variational framework for the landmark constrained registration problem and a discretize-then-optimize approach for computing a
numerical solution. A difficulty for the multi-level discretization is that the number of constraints is constant while the number of degrees of freedom varies. In particular, for a coarse discretization, inconsistent constraints are to be expected. This paper provides a technique to overcome this problem by mixing landmark and update components, which results in compatible constraints. Moreover, it is shown how to efficiently compute a stable, orthogonal, and sparse basis for the constraint manifold, thus enabling a reduced space optimization that avoids saddle point problems.
References
1. Glasbey, C.: A review of image warping methods. Journal of Applied Statistics 25, 155–171 (1998)
2. Pluim, J., Maintz, J., Viergever, M.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22, 986–1004 (1999)
3. Hajnal, J., Hawkes, D., Hill, D.: Medical Image Registration. CRC Press, Boca Raton (2001)
4. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
5. Goshtasby, A.A.: 2-D and 3-D Image Registration. Wiley Press, New York (2005)
6. Joshi, A., Shattuck, D., Thompson, P.: Brain image registration using cortically constrained harmonic mappings. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 359–371. Springer, Heidelberg (2007)
7. Grady, L.: A lattice-preserving multigrid method for solving the inhomogeneous Poisson equations used in image analysis. In: Forsyth, D.A., Torr, P.H.S., Zisserman, A. (eds.) Scale Space and Variational Methods in Computer Vision, SSVM, ECCV (2008)
8. Koestler, H.: A Multigrid Framework for Variational Approaches in Medical Image Processing and Computer Vision. Ph.D. dissertation, University of Erlangen, Germany (2008)
9. Keller, S., Lauze, F., Nielsen, M.: Motion compensated video super resolution. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 801–812. Springer, Heidelberg (2007)
10. Hadamard, J.: Sur les problèmes aux dérivées partielles et leur signification physique, pp. 49–52. Princeton University Bulletin, Princeton (1902)
11. Weickert, J., Schnörr, C.: A theoretical framework for convex regularizers in PDE-based computation of image motion. Int. J. Computer Vision 45(3), 245–264 (2001)
12. Hinterberger, W., Scherzer, O., Schnörr, C., Weickert, J.: Analysis of optical flow models in the framework of calculus of variations. Num. Funct. Anal. Opt. 23, 69–82 (2002)
13. Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM Appl. Math. 64(2), 668–687 (2004)
14. Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
15. Maurer, C.R., Fitzpatrick, J.M.: A Review of Medical Image Registration. In: Interactive Image-Guided Neurosurgery. American Association of Neurological Surgeons, Park Ridge, IL, pp. 17–44 (1993)
16. Rohr, K.: Landmark-based Image Analysis. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (2001)
17. Fischer, B., Modersitzki, J.: Combining landmark and intensity driven registrations. PAMM 3, 32–35 (2003)
18. Ashburner, J., Friston, K.: Spatial normalization using basis functions. In: Frackowiak, R., Friston, K., Frith, C., Dolan, R., Friston, K., Price, C., Zeki, S., Ashburner, J., Penny, W. (eds.) Human Brain Function, 2nd edn. Academic Press, London (2003)
19. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
20. Collignon, A., Vandermeulen, A., Suetens, P., Marchal, G.: 3D multi-modality medical image registration based on information theory. Computational Imaging and Vision 3, 263–274 (1995)
21. Viola, P.A.: Alignment by Maximization of Mutual Information. PhD thesis, Massachusetts Institute of Technology (1995)
22. Clarenz, U., Droske, M., Rumpf, M.: Towards fast non-rigid registration. In: Inverse Problems, Image Analysis and Medical Imaging, AMS Special Session Interaction of Inverse Problems and Image Analysis, vol. 313, pp. 67–84. AMS (2002)
23. Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multimodal images. Methods of Information in Medicine 46(3), 292–299 (2007)
24. Fischer, B., Modersitzki, J.: Fast curvature based registration of MR-mammography images. In: Meiler, M., et al. (eds.) Bildverarbeitung für die Medizin, pp. 139–143. Springer, Heidelberg (2002)
25. Fischer, B., Modersitzki, J.: A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra and its Applications 380, 107–124 (2004)
26. Light, W.A.: Variational methods for interpolation, particularly by radial basis functions. In: Griffiths, D., Watson, G. (eds.) Numerical Analysis 1995, pp. 94–106. Longmans, London (1996)
27. Haber, E., Modersitzki, J.: A multilevel method for image registration. SIAM J. Sci. Comput. 27(5), 1594–1607 (2006)
28. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (2000)
29. Barrett, R., Berry, M., Chan, T.F., Demmel, J.W., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., van der Vorst, H.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. SIAM, Philadelphia (1994)
30. Trottenberg, U., Oosterlee, C., Schuller, A.: Multigrid. Academic Press, London (2001)
A Variational Approach for Volume-to-Slice Registration
Stefan Heldmann and Nils Papenberg
Institute of Mathematics, University of Lübeck, Germany
{heldmann,papenber}@math.uni-luebeck.de
Abstract. In this work we present a new variational approach for image registration where part of the data is only known on a low-dimensional manifold. Our work is motivated by navigated liver surgery. Therefore, we need to register 3D volumetric CT data and tracked 2D ultrasound (US) slices. The particular problem is that the set of all US slices does not assemble a full 3D domain. Other approaches use so-called compounding techniques to interpolate a 3D volume from the scattered slices. Instead of inventing new data by interpolation here we only use the given data. Our variational formulation of the problem is based on a standard approach. We minimize a joint functional made up from a distance term and a regularizer with respect to a 3D spatial deformation field. In contrast to existing methods we evaluate the distance of the images only on the two-dimensional manifold where the data is known. A crucial point here is regularization. To avoid kinks and to achieve a smooth deformation it turns out that at least second order regularization is needed. Our numerical method is based on Newton-type optimization. We present a detailed discretization and give some examples demonstrating the influence of regularization. Finally we show results for clinical data.
1 Introduction
In this paper we describe a new method for the registration of volumetric images to data that is given only on a low-dimensional submanifold. The work is motivated by a clinical problem on improved resection of tissue by pre-operative intervention planning in liver surgery [1, 2]. Before an intervention, an extensive planning including the definition of surgical paths and risk analysis is made. The planning is based on abdominal CT scans of the patient and subsequent segmentation of liver, liver segments, and vessels, cf. Fig. 1(a). During the intervention the surgeon is guided by tracked ultrasound (US) images of the liver. Consequently, the pre-operative CT planning data has to be aligned to the actual deformation of the liver given by the US data. A challenge in laparoscopic liver surgery is that the US data is recorded as a sequence of two-dimensional slices in 3-space. Although the spatial ordering of the slices follows the scan path, they are not aligned and in general each slice can have an arbitrary position, cf. Fig. 1(b).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 624–635, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Clinical image data; (a) pre-operative CT planning data (few slices out of volume and segmentation of the liver); (b) few US slices from a single scan
One approach for the registration of a CT volume and US slices is to use so-called compounding techniques. In a first step, the US slice data is compounded into a volume by interpolation, and subsequently standard volumetric image registration is applied. However, compounding has several drawbacks [3, 4, 5], and practical experiments showed that registration based on this approach performs poorly and does not produce reasonable results. Besides poor performance, matching volumetric CT data to artificially generated volumetric US data does not provide confidence in registration results for the surgeon. Here, we take a different approach by comparing volumetric data directly to the given slice data. We use a variational setting for image registration. That is, we minimize a cost functional consisting of a so-called distance measure and a regularizer with respect to a volumetric deformation. Here the regularizer is an integral on a $d$-dimensional domain while the distance is an integral on a $(d-1)$-dimensional manifold. Although this seems to be a slight modification, it turns out that higher order regularization is necessary to ensure smooth and differentiable deformations. In this work we provide a proof-of-concept for our new approach. Therefore we consider a simplified mono-modal setting, i.e., we assume the volumetric and the slice data stem from the same type of imaging device. Without loss of generality, this allows for using the easy-to-present sum-of-squared-differences distance measure for the description of our method. The paper is organized as follows. First we present our variational approach to image registration and the novel distance measure. Next we discuss the need for higher-order regularization. In Sect. 4 we present a numerical scheme and subsequently discuss our specific discretization of the distance measure and the regularizer in detail. Finally, in Sect. 5 we demonstrate the method with a synthesized clinical example.
2 Approach
In general we are given two images, a so-called reference $R : \mathbb{R}^d \to \mathbb{R}$ and a so-called template $T : \mathbb{R}^d \to \mathbb{R}$. The goal of image registration is to find a smooth
deformation $y : \Omega \to \mathbb{R}^d$ that spatially aligns the images best on a domain of interest $\Omega \subset \mathbb{R}^d$. Typically $\Omega$ is a rectangular domain. Mathematically we formulate image registration as an optimization problem [6]. That is, we want to compute a solution $y$ to
$$\min_y\ \mathcal{J}(y) := \mathcal{D}(R, T(y)) + \alpha\,\mathcal{S}(y) \tag{1}$$
where $T(y)$ denotes the composition $T \circ y$. The first term $\mathcal{D}$ of the objective function is a so-called distance measure that quantifies similarity between the reference $R$ and the deformed template $T(y)$. The second building block $\mathcal{S}$ is a regularizer forcing smoothness of the solution, where $\alpha > 0$ is a fixed chosen parameter. Typically $\mathcal{S}$ has the form [7]
$$\mathcal{S}(y) := \tfrac{1}{2}\,\|By\|^2_{L^2(\Omega)} \tag{2}$$
where $B$ is a linear differential operator. The particular difficulty in our case is that the template is a volumetric image while the reference is only known on a few scattered slices. As mentioned in the introduction, one can use compounding techniques to generate an artificial volume and subsequently use a standard distance measure that relies on comparing two images of the same dimension. We propose a different method. The idea of our new approach is to use only the given data rather than guessing the missing parts of the reference. To make the idea clear, in the following we assume that the distance measure is the so-called sum-of-squared-differences (SSD) [8], i.e., $\mathcal{D}$ is the squared $L^2$ norm of the difference of the images. This is no loss of generality. The proposed modification also applies to other distance measures such as mutual information [9,10], which is more suitable for multi-modal registration of CT and US data. As mentioned in the introduction, the goal of this paper is a proof-of-concept and an outline of the general method. Therefore, and for ease of presentation, we use the SSD distance measure here. The standard SSD for $d$-dimensional images is given by
$$\mathrm{SSD}(R, T) = \frac{1}{2} \int_\Omega \bigl(T(x) - R(x)\bigr)^2\, dx. \tag{3}$$
In our approach we assume the reference is given only on a few planes in $\Omega$. More generally, we assume $R$ is known only on a set of smooth and bounded $(d-1)$-dimensional sub-manifolds $\mathcal{M}_j \subset \Omega$, $j = 1, \dots, m$. Therefore, we modify (3) and define our distance measure by
$$\mathcal{D}(R, T) := \frac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \bigl(T(x) - R(x)\bigr)^2\, dS(x) \tag{4}$$
where dS is the (d − 1)-dimensional surface measure. Note that in the particular case when Mj are slices we can trace back our modified distance to a sum of SSD distances of (d − 1)-dimensional images similar to serial registration. In this
particular case we can parametrize $\mathcal{M}_j$ by linear maps $\tau_j$ with Gram determinant $\det(D\tau_j^\top D\tau_j) = 1$, where $D\tau_j$ denotes the Jacobian matrix of $\tau_j$, such that
$$\mathcal{D}(R, T) = \sum_{j=1}^{m} \mathrm{SSD}(R_j, T_j)$$
with $R_j := R \circ \tau_j$ and $T_j := T \circ \tau_j$. Although changing the integration domain in the distance measure seems a slight modification of problem (1), it turns out that regularization becomes crucial and needs to be chosen carefully. Since the data is now only given on a low-dimensional manifold, the solution is strongly influenced by the full-space regularization. It turns out that first-order regularization, e.g., choosing $B = \nabla$ in (2), will produce non-differentiable solutions with kinks at the boundary of the manifold, cf. Fig. 2(e) and (h). In contrast, using second order regularization, e.g., setting $B = \Delta$ where $\Delta$ denotes the vector Laplacian, produces smooth results, cf. Fig. 2(f) and (i). In Sect. 3 we analyze this behavior by considering a simplified quadratic functional. Generally, the order of regularization needed to ensure differentiability depends on the space dimension. However, from the analysis in Sect. 3 we found that second order regularization is sufficient for space dimension $d = 2$ and $d = 3$. As a result we propose using the curvature regularizer, i.e., setting $B = \Delta$. Summarizing, for volume-to-slice registration we consider problem (1) with the distance measure (4) and smoother (2) with $B = \Delta$. Thus, our approach is
$$\min_y\ \frac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2\, dS(x) + \frac{\alpha}{2} \int_\Omega |\Delta y|^2\, dx. \tag{5}$$
3 Regularization
In the following we motivate second order so-called curvature regularization [11, 12] by choosing $B = \Delta$. The resulting functional for the registration (cf. (5)) is highly non-linear and in general non-convex, which makes an analysis difficult and involved. To illustrate the main point on regularization we now consider a simplified quadratic problem
$$\min_y\ \frac{1}{2}\,\|By\|^2_{L^2(\Omega)} + \int_{\mathcal{M}} g\,y\, dS \tag{6}$$
where $\Omega \subset \mathbb{R}^d$ is a domain with smooth (Lipschitz) boundary, $\mathcal{M} \subset \Omega$ is a smooth $(d-1)$-dimensional manifold, and $g \in L^2(\mathcal{M})$ is a given function. Without loss of generality we assume that locally coordinates can be chosen such that $\mathcal{M} = \{x \in \Omega : x_d = 0\}$. Then we can define a distribution $f$ as the product of $g$ and a Dirac delta distribution, i.e., $f$ is given by $f = g\,\delta_{x_d}$, such that
$$\int_\Omega f\,y\, dx = \int_\Omega g\,\delta_{x_d}\,y\, dx = \int_{\mathcal{M}} g\,y\, dS. \tag{7}$$
Fig. 2. Volume-to-slice-registration results for academic 2D (a)–(f) and 3D (g)–(i) experiments. (a) Template image and 1D manifold (vertical line); (d) Original Reference that is compared to the template on the 1D manifold (vertical line); (b)+(e) Deformed template (a) and deformation for 1st order regularization (B = ∇); (c)+(f) Deformed template (a) and deformation for 2nd order regularization (B = Δ); (g) Surface of 3D template (elongated bar) and three orthogonal 2D manifolds with reference data taken from a big cuboid; (h) Deformed template for 1st order regularization (B = ∇); (i) Deformed template for 2nd order regularization (B = Δ).
Furthermore we assume that $g \neq 0$, i.e., $\|g\|_{L^2(\mathcal{M})} \neq 0$. Computing the Euler-Lagrange equations in weak form shows that a necessary condition for a minimizer is
$$Ay = f \tag{8}$$
where $A := -B^* B$ and $B^*$ denotes the adjoint of $B$.
The right-hand side $f$ belongs to the space $H^{-1}(\Omega)$ but clearly $f \notin L^2(\Omega) = H^0(\Omega)$, where $H^{-1}(\Omega)$ denotes the dual space of $H^1(\Omega)$ and $H^m(\Omega)$ is the Sobolev space of all $m$-times weakly differentiable functions [13, §3]. Now we discuss two different choices for the regularizer $B$: first, first-order so-called diffusive regularization [14] with $B = \nabla$, and second, second-order curvature regularization with $B = \Delta$. In the first case, $B = \nabla$ yields $B^* = -\nabla\cdot$ and hence $A = \Delta$ is a second-order differential operator. Since the right-hand side $f$ belongs to $H^{-1}(\Omega) \setminus H^0(\Omega)$, a solution of (8) must be in $H^{-1+2}(\Omega) \setminus H^{0+2}(\Omega) = H^1(\Omega) \setminus H^2(\Omega)$ (cf. [15, §8]). Due to the embedding $H^k(\Omega) \subset C^m(\Omega)$ for $m < k - d/2$, this shows that if $d > 1$ a solution cannot be differentiable [13, §6]. Applying the same logic in the second case $B = \Delta$, we find $B^* = \Delta$, since the Laplacian is formally self-adjoint, yielding the fourth-order differential operator $A = -\Delta^2$. Therefore, a weak solution $y$ of (8) has to satisfy $y \in H^3(\Omega) \setminus H^4(\Omega)$. Hence, if $d < 4$ then $y \in C^1(\Omega)$, such that a solution is continuously differentiable for $d = 2, 3$.
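The mechanism can already be seen in a one-dimensional model problem, added here purely as an illustration and not part of the analysis above. Assume $\Omega = \mathbb{R}$, let the "manifold" be the single point $\mathcal{M} = \{0\}$, and take $g = 1$, so that $f = \delta$. With $A = -B^*B$, the two choices of $B$ give

```latex
\begin{align*}
B = \tfrac{d}{dx}:\quad & A = \tfrac{d^2}{dx^2}, & y'' &= \delta
  &&\Longrightarrow\quad y(x) = \tfrac{1}{2}\,|x| + c_0 + c_1 x, \\
B = \tfrac{d^2}{dx^2}:\quad & A = -\tfrac{d^4}{dx^4}, & -y'''' &= \delta
  &&\Longrightarrow\quad y(x) = -\tfrac{1}{12}\,|x|^3 + q(x),
\end{align*}
```

where $c_0, c_1$ are constants and $q$ is a cubic polynomial, both fixed by boundary conditions. In the first case $y'$ jumps by 1 at $x = 0$, which is exactly a kink located at the data. In the second case $y$, $y'$, and $y''$ are continuous and only $y'''$ jumps. The sign of the right-hand side does not affect the regularity. In one dimension the delta distribution is less singular than a surface delta in $d = 2, 3$, so this example shows the qualitative effect only, not the sharp Sobolev exponents.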
4 Numerical Method
In this section we describe our approach to compute a numerical solution for the volume-to-slice registration problem (5). Here, we follow the first-discretize-then-optimize paradigm. That is, we discretize the functional and subsequently apply Gauss-Newton optimization. We start by explaining our discretization. In the following we describe the discretization for the three-dimensional case, i.e., $d = 3$. That is, the domain of interest $\Omega$ is a subset of $\mathbb{R}^3$ and the $\mathcal{M}_j$ are two-dimensional manifolds. We assume that the domain of interest is rectangular, i.e.,
$$\Omega = (a_1, b_1) \times (a_2, b_2) \times (a_3, b_3) \quad \text{with } -\infty < a_i < b_i < \infty,\ i = 1, 2, 3,$$
and the $\mathcal{M}_j$ are rectangular slices. For simplicity we assume that all slices $\mathcal{M}_j$ are parametrized over the same parameter space $\Theta$ such that
$$\mathcal{M}_j = \{x = \tau_j(t) : t \in \Theta\} \quad \text{and} \quad \Theta := (0, \theta_1) \times (0, \theta_2)$$
with parametrizations $\tau_j : \Theta \subset \mathbb{R}^2 \to \mathcal{M}_j \subset \mathbb{R}^3$ given by
$$\tau_j(t) := Q_j t + b_j, \qquad Q_j \in \mathbb{R}^{3 \times 2} \text{ such that } Q_j^\top Q_j = I \text{ and } b_j \in \mathbb{R}^3. \tag{9}$$
Note that the condition $Q_j^\top Q_j = I$ implies $\det(D\tau_j^\top D\tau_j) = 1$, where $D\tau_j$ denotes the Jacobian matrix of $\tau_j$. This property simplifies computing the integrals on the manifolds and will be used later. We start with the discretization of the deformation and the distance measure. Subsequently we describe the discretization of the regularizer.
Discretization of the Deformation
We use a nodal discretization for the deformation $y$ on $\Omega$. Therefore, we introduce a uniform grid composed of $n_1 \times n_2 \times n_3$ cells with grid spacing $h = \left(\frac{b_1 - a_1}{n_1}, \frac{b_2 - a_2}{n_2}, \frac{b_3 - a_3}{n_3}\right)$ and nodal grid points
$$\Omega^h := \bigl\{x_k = x_0 + k \odot h : k \in \{0, \dots, n_1\} \times \{0, \dots, n_2\} \times \{0, \dots, n_3\}\bigr\}$$
where $x_0 = (a_1, a_2, a_3)$ and $\odot$ denotes the Hadamard (point-wise) product of two vectors. Then, we collect the values $y(x_k) \in \mathbb{R}^3$ of the deformation at all $N = (n_1 + 1)(n_2 + 1)(n_3 + 1)$ nodal grid points $x_k \in \Omega^h$ in a grid function, i.e., a vector $y^h \in \mathbb{R}^{3N}$.
Discretization of the Distance Measure
Now we turn to the discretization of the distance measure. Recall that it was defined as
$$\mathcal{D}(R, T(y)) = \frac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2\, dS(x).$$
For an approximation of the integrals on $\mathcal{M}_j$ we start by discretizing the parameter space $\Theta$. Therefore, we define
$$\Theta^{\tilde h} := \bigl\{t_k = k \odot \tilde h - \tfrac{1}{2}\tilde h : k \in \{1, \dots, p_1\} \times \{1, \dots, p_2\}\bigr\} \quad \text{with } \tilde h = \left(\frac{\theta_1}{p_1}, \frac{\theta_2}{p_2}\right),$$
such that $\Theta^{\tilde h}$ contains the cell centers of a regular discretization by $p_1 \times p_2$ cells. Consequently, we discretize $\mathcal{M}_j$ by
$$\mathcal{M}_j^{\tilde h} := \{m_k = \tau_j(t_k) : t_k \in \Theta^{\tilde h}\}.$$
Note that we have two different grid spacings $h$ and $\tilde h$ for the discretization of the deformation $y$ on $\Omega$ and the discretization of the manifolds $\mathcal{M}_j$, respectively.
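The cell-centered slice grid can be generated directly from the parametrization $\tau_j(t) = Q_j t + b_j$. The following NumPy sketch is illustrative only; the function name `slice_grid` and the example slice are assumptions, not taken from the paper.

```python
import numpy as np

def slice_grid(Q, b, theta, p):
    """Cell-centered points m_k = Q t_k + b on a planar slice.

    Q: (3, 2) matrix with orthonormal columns, b: offset in R^3,
    theta: edge lengths (theta1, theta2) of the parameter rectangle,
    p: number of cells (p1, p2). Returns the (p1*p2, 3) points and the spacing.
    """
    Q = np.asarray(Q, dtype=float)
    b = np.asarray(b, dtype=float)
    if not np.allclose(Q.T @ Q, np.eye(2)):
        raise ValueError("columns of Q must be orthonormal")
    spacing = np.asarray(theta, dtype=float) / np.asarray(p)
    # Cell centers t_k = (k - 1/2) * spacing for k = 1, ..., p_i.
    t1 = (np.arange(1, p[0] + 1) - 0.5) * spacing[0]
    t2 = (np.arange(1, p[1] + 1) - 0.5) * spacing[1]
    t = np.stack(np.meshgrid(t1, t2, indexing="ij"), axis=-1).reshape(-1, 2)
    return t @ Q.T + b, spacing

# Hypothetical slice: the plane z = 0.5, spanned by the x- and y-directions.
Q = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
points, spacing = slice_grid(Q, b=(0.0, 0.0, 0.5), theta=(1.0, 2.0), p=(2, 4))
# Mid-point rule for the constant function 1 recovers the slice area theta1 * theta2.
area = spacing[0] * spacing[1] * len(points)
```

Because $Q_j^\top Q_j = I$, the surface element equals one, so the mid-point rule on the slice is simply the cell area times the sum of the sampled values.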
Fig. 3. Schematic overview of the discretization of the parameter space $\Theta$ (left: cell-centered discretization of the parameter space, axes $t_1$, $t_2$) and of a manifold $\mathcal{M}_j$ and the domain $\Omega$ (right: nodal discretization of the deformation (gray) with cell-centered discretization of the manifold (black), axes $x$, $y$, $z$); the map $\tau_j$ links the two
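A schematic overview of the different discretizations $\Theta^{\tilde h}$, $\mathcal{M}_j^{\tilde h}$, and $\Omega^h$ is shown in Fig. 3. Using the common mid-point rule with the slice grid spacing $\tilde h = (\tilde h_1, \tilde h_2) = (\theta_1/p_1, \theta_2/p_2)$ for the discretization of an integral over $\mathcal{M}_j$, we obtain
\begin{align*}
\int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2\, dS(x)
&= \int_\Theta \bigl(T(y(\tau_j(t))) - R(\tau_j(t))\bigr)^2 \sqrt{\det(D\tau_j^\top D\tau_j)}\; dt \\
&= \int_\Theta \bigl(T(y(\tau_j(t))) - R(\tau_j(t))\bigr)^2\, dt \\
&\approx \tilde h_1 \tilde h_2 \sum_{t_k \in \Theta^{\tilde h}} \bigl(T(y(\tau_j(t_k))) - R(\tau_j(t_k))\bigr)^2 \\
&= \tilde h_1 \tilde h_2 \sum_{m_k \in \mathcal{M}_j^{\tilde h}} \bigl(T(y(m_k)) - R(m_k)\bigr)^2,
\end{align*}
where we used orthogonality of the Jacobian matrix $D\tau_j$, cf. (9). For short notation, analogously to the deformation, we collect the $M = p_1 p_2$ grid points in $\mathcal{M}_j^{\tilde h}$ in a vector $m_j^{\tilde h} \in \mathbb{R}^{3M}$. With some abuse of notation let $R_j^{\tilde h} := R(m_j^{\tilde h}) \in \mathbb{R}^M$ be the values of the reference $R$ on $\mathcal{M}_j^{\tilde h}$ and analogously let $T(y(m_j^{\tilde h}))$ be the values of $T(y)$, such that
$$\|T(y(m_j^{\tilde h})) - R_j^{\tilde h}\|_2^2 = \sum_{m_k \in \mathcal{M}_j^{\tilde h}} \bigl(T(y(m_k)) - R(m_k)\bigr)^2.$$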
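As we can see, this approximation involves values of the deformation $y$ at points $m_k \in \mathcal{M}_j^{\tilde h}$, which are in general not grid points of our nodal discretization $\Omega^h$. We therefore approximate the values $y(m_k)$ for $m_k \in \mathcal{M}_j^{\tilde h}$ by interpolation of the nodal grid function $y^h$, i.e.,
$$y(m_k) \approx \sum_{i=1}^{3N} \xi_i\, y_i^h \quad \text{for } m_k \in \mathcal{M}_j^{\tilde h}.$$
We use linear interpolation, such that in fact only 8 coefficients per point are involved. Collecting all interpolation weights $\xi_i$ for each point $m_k \in \mathcal{M}_j^{\tilde h}$ in a $3M \times 3N$ matrix $P_j$, we have
$$T(P_j y^h) \approx T(y(m_j^{\tilde h})).$$
Summarizing, we approximate the distance measure by
$$\mathcal{D}(R, T(y)) = \frac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2\, dS(x) \approx \frac{\tilde h_1 \tilde h_2}{2} \sum_{j=1}^{m} \|T(P_j y^h) - R_j^{\tilde h}\|_2^2,$$
where $\tilde h = (\tilde h_1, \tilde h_2)$ denotes the grid spacing on the slices. Setting $R^h = (R_1^{\tilde h}, \dots, R_m^{\tilde h}) \in \mathbb{R}^{Mm}$ and stacking the blocks $P_1, \dots, P_m$ vertically into $P \in \mathbb{R}^{3Mm \times 3N}$, we obtain a concise formulation for a discrete version of $\mathcal{D}(R, T(y))$ given by
$$D(y^h) := \frac{\tilde h_1 \tilde h_2}{2}\, \|T(P y^h) - R^h\|_2^2. \tag{10}$$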
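The interpolation step and the discrete distance can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the names `trilinear_matrix` and `slice_distance` are hypothetical, the matrix is stored densely (a sparse format would be used in practice), the nodal deformation is stored as an $N \times 3$ array rather than a stacked vector in $\mathbb{R}^{3N}$, and the template `T` is any callable that evaluates the image at an array of points.

```python
import numpy as np

def trilinear_matrix(points, x0, h, n):
    """Dense (M, N) matrix of trilinear weights from the nodal grid to `points`.

    Nodal grid: x0 + k * h with k in {0..n1} x {0..n2} x {0..n3}, N = prod(n + 1),
    nodes numbered in C order. Each row has at most 8 non-zero weights.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    x0, h, n = np.asarray(x0, float), np.asarray(h, float), np.asarray(n, int)
    dims = tuple(int(d) for d in n + 1)
    s = (points - x0) / h                           # position in grid units
    k = np.clip(np.floor(s).astype(int), 0, n - 1)  # lower corner of the cell
    w = s - k                                       # local coordinates in the cell
    P = np.zeros((points.shape[0], int(np.prod(dims))))
    rows = np.arange(points.shape[0])
    for corner in np.ndindex(2, 2, 2):              # the 8 corners of each cell
        c = np.array(corner)
        weight = np.prod(np.where(c == 1, w, 1.0 - w), axis=1)
        cols = np.ravel_multi_index(tuple((k + c).T), dims)
        P[rows, cols] += weight
    return P

def slice_distance(T, P, Y, R_slice, cell_area):
    """Discrete SSD on one slice: cell_area / 2 * |T(P Y) - R|^2."""
    residual = T(P @ Y) - R_slice
    return 0.5 * cell_area * float(residual @ residual)

# Small check on a 2 x 2 x 2 cell grid with the identity deformation Y = nodes.
x0, h, n = (0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (2, 2, 2)
axes = [x0[i] + h[i] * np.arange(n[i] + 1) for i in range(3)]
nodes = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
points = np.array([[0.25, 0.25, 0.5], [0.9, 0.1, 0.7]])
P = trilinear_matrix(points, x0, h, n)
```

Trilinear interpolation reproduces affine functions exactly, so interpolating the identity deformation returns the slice points themselves, and the distance between a template and reference that agree on the slice is zero.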
Discretization of the Regularizer
For a discrete version of the curvature regularizer we use standard finite differences for approximating derivatives and the mid-point rule for approximating integrals. Recall the curvature regularizer was defined as
$$\mathcal{S}(y) = \frac{1}{2}\,\|\Delta y\|^2_{L^2(\Omega)} = \frac{1}{2} \int_\Omega |\Delta y|^2\, dx.$$
In a first step we approximate the Laplacian based on the standard second-order seven-point formula, i.e., we define
$$\Delta^h y(x) := \sum_{\ell=1}^{3} \frac{1}{h_\ell^2} \bigl(y(x - h_\ell e_\ell) - 2y(x) + y(x + h_\ell e_\ell)\bigr)$$
where $e_1, e_2, e_3$ are the unit vectors of $\mathbb{R}^3$. Furthermore, let $B^h \in \mathbb{R}^{3N \times 3N}$ be its matrix representation such that $B^h y^h$ is a second-order approximation to $\Delta y$ at the nodal grid points in $\Omega^h$, so that $(B^h y^h) \odot (B^h y^h)$ is a second-order approximation to $(\Delta y)^2$. Now, let $A_n^c \in \mathbb{R}^{n_1 n_2 n_3 \times N}$ be a matrix that averages values from nodes to the cell centers, such that $A_n^c \bigl((B^h y^h) \odot (B^h y^h)\bigr)$ is a second-order approximation to $(\Delta y)^2$ at the cell centers. Thus, applying the mid-point rule for mesh size $h = (h_1, h_2, h_3)$ we obtain
$$h_1 h_2 h_3\; e^\top A_n^c \bigl((B^h y^h) \odot (B^h y^h)\bigr) \approx \int_\Omega |\Delta y|^2\, dx$$
with $e = (1, 1, \dots, 1)^\top \in \mathbb{R}^{n_1 n_2 n_3}$ the vector of ones. Moreover, applying some linear algebra we find
$$e^\top A_n^c \bigl((B^h y^h) \odot (B^h y^h)\bigr) = e^\top A_n^c\, \mathrm{diag}(B^h y^h)\, B^h y^h = (y^h)^\top (B^h)^\top \mathrm{diag}(e^\top A_n^c)\, B^h y^h.$$
As a result, we define the discrete version of the curvature regularizer by
$$S(y^h) := \frac{1}{2}\, (y^h)^\top A^h y^h$$
with a matrix $A^h := h_1 h_2 h_3\, (B^h)^\top \mathrm{diag}(e^\top A_n^c)\, B^h \in \mathbb{R}^{3N \times 3N}$.
Gauss-Newton Optimization
Having established discrete versions of the distance measure and the smoother, we now aim to solve
$$\min_{y^h}\ D(y^h) + \alpha S(y^h). \tag{11}$$
Clearly, (11) is not a quadratic function due to the non-linearity in the distance $D$. Therefore, we cannot compute a solution directly and have to rely on an iterative method. Here, we use a standard Gauss-Newton method [16]. In each iteration we solve a linear system of the type
$$Hs = -g \tag{12}$$
to compute an update $s$ for the current iterate. Here, $g$ is the gradient $\nabla D + \alpha \nabla S$ of the objective function given by
$$g = \tilde h_1 \tilde h_2\, P^\top \nabla T^\top \bigl(T(P y^h) - R^h\bigr) + \alpha A^h y^h,$$
where $\tilde h_1 \tilde h_2$ is the cell size of the slice grid as in (10), and $H$ is an approximation to the Hessian $\nabla^2 D + \alpha \nabla^2 S$. Neglecting second order terms in $\nabla^2 D$ we set $H := \tilde h_1 \tilde h_2\, P^\top \nabla T^\top \nabla T\, P + \alpha A^h$. Thus, the Hessian is a sparse symmetric positive definite matrix, such that we can apply a conjugate gradient (CG) method for solving the linear system (12). In our implementation we use CG with symmetric Gauss-Seidel relaxation as a preconditioner. Summarizing, this leads to an efficient numerical algorithm for computing a solution to the discrete volume-to-slice registration problem (11).
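The structure of the iteration can be illustrated on a small one-dimensional toy problem. The sketch below is not the implementation described above: it uses plain (unpreconditioned) CG instead of a symmetric Gauss-Seidel preconditioner, takes full steps without a line search, works with dense matrices, and all names (`conjugate_gradient`, `gauss_newton`, the toy template `T`) are assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(H, b, rel_tol=1e-10, max_iter=500):
    """Unpreconditioned CG for a symmetric positive definite matrix H."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - H x for the start x = 0
    d = r.copy()
    rr = r @ r
    b_norm = np.sqrt(rr)
    if b_norm == 0.0:
        return x
    for _ in range(max_iter):
        Hd = H @ d
        step = rr / (d @ Hd)
        x += step * d
        r -= step * Hd
        rr_new = r @ r
        if np.sqrt(rr_new) <= rel_tol * b_norm:
            break
        d = r + (rr_new / rr) * d
        rr = rr_new
    return x

def gauss_newton(residual, jacobian, A, y0, alpha, weight=1.0,
                 max_iter=30, g_tol=1e-10):
    """Minimize weight/2 |r(y)|^2 + alpha/2 y^T A y with Gauss-Newton steps."""
    y = np.array(y0, dtype=float)
    for _ in range(max_iter):
        r, J = residual(y), jacobian(y)
        g = weight * (J.T @ r) + alpha * (A @ y)     # gradient
        if np.linalg.norm(g) <= g_tol:
            break
        H = weight * (J.T @ J) + alpha * A           # second-order data terms dropped
        y = y + conjugate_gradient(H, -g)            # full step, no line search
    return y

# Toy problem: find y with T(y_i) close to R_i, penalizing second differences.
n = 20
R = np.linspace(0.0, 2.0, n)
T = lambda x: x + 0.1 * np.sin(x)                    # mildly non-linear "template"
residual = lambda y: T(y) - R
jacobian = lambda y: np.diag(1.0 + 0.1 * np.cos(y))
B = np.eye(n - 2, n) - 2.0 * np.eye(n - 2, n, 1) + np.eye(n - 2, n, 2)
A = B.T @ B                                          # 1D curvature regularizer
alpha = 0.1
y = gauss_newton(residual, jacobian, A, y0=R, alpha=alpha)
gradient = jacobian(y).T @ residual(y) + alpha * (A @ y)
```

In this toy setting the data term alone makes `H` positive definite, so CG is applicable even though `A` itself is only positive semi-definite; the norm of `gradient` at the returned iterate serves as a simple convergence check.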
5 Experiments
We demonstrate our method with an academic example on real liver data. To this end, we use US volumetric data of size 238 × 155 × 156 captured by a 3D US scanner.
Fig. 4. 3D volume-to-slice registration results for clinical data. (a) 3D data (black) with five 2D reference slices; (b) 3D template (gray) with five reference slices; (c)+(d) 3D template (gray) and original data (black) before and after registration.
We simulate a typical ultrasound sweep by extracting a few 2D slices from the volume. Fig. 4(a) shows the setting for five slices, where we visualize the volumetric data by a surface rendering of the contained vessels. This slice data is used as reference. Subsequently, we apply an artificial non-linear deformation to the volume, and the deformed volume is used as the template. Fig. 4(b) displays a surface rendering of the template with the reference slice data. Based on the five reference slices and the volumetric template we then performed a volume-to-slice registration. Fig. 4(c) and 4(d) show the 3D template vessels before and after registration together with the original vessels. Note that the original vessels served only to generate the reference slices and were not taken into account during registration. As we can see, we obtain an almost perfect alignment based on very few reference data (see Fig. 4(d)).
6 Conclusions
We described a new method for registration of a d-dimensional template to (d − 1)-dimensional reference data, motivated by CT/US registration. A key observation is that high order regularization is required to avoid unwanted and non-differentiable deformations. Furthermore, we described an efficient algorithm based on a Gauss-Newton optimization method. In a first experiment we successfully demonstrated our method for the registration of artificially deformed data, where we were able to almost recover the original deformation based only on very few reference data. This promising first result shows that our approach works in general. Clearly, the chosen SSD distance measure is not suitable for the target application of CT and US registration. However, our overall method is independent of a particular choice for the distance measure. An extension to other distance measures that can handle multi-modality, such as mutual information, is straightforward. In conclusion, we have presented a novel scheme and proof-of-concept for a clinically relevant problem based on sound theory and efficient numerics. Future work includes extension to a multi-modal setting for registration of CT and US.
Acknowledgments We thank Dirk Langemann from the Institute of Mathematics at the University of Lübeck for his support on functional analysis. We also thank Thomas Lange from the Department of Surgery and Surgical Oncology at Charité - Universitätsmedizin Berlin for providing image data.
References
1. Fong, Y., Fortner, J., Sun, R., et al.: Clinical score for predicting recurrence after hepatic resection for metastatic colorectal cancer: analysis of 1001 consecutive cases. Ann. Surg. 230, 309–318 (1999)
2. Lang, H.: Technik der Leberresektion - Teil I. Chirurg 78(8), 761–774 (2007)
3. Barry, C., Allott, C., John, N., Mellor, P., Arundel, P., Thomson, D., Waterton, J.: Three-dimensional freehand ultrasound: Image reconstruction and volume analysis. Ultrasound in Medicine & Biology 23, 1209–1224 (1997)
4. Coupe, P., Azzabou, P.H.N., Barillot, C.: 3D freehand ultrasound reconstruction based on probe trajectory. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 597–604. Springer, Heidelberg (2005)
5. Rohling, R.: 3D Freehand Ultrasound: Reconstruction and Spatial Compounding. PhD thesis, Department of Engineering, University of Cambridge (1998)
6. Broit, C.: Optimal registration of deformed images. PhD thesis, Department of Computer and Information Science, University of Pennsylvania (1981)
7. Modersitzki, J.: Numerical Methods for Image Registration. Numerical Mathematics and Scientific Computation. Oxford University Press, Oxford (2003)
8. Brown, L.G.: A survey of image registration techniques. ACM Computing Surveys 24(4), 325–376 (1992)
9. Viola, P.A., Wells, W.M.I.: Alignment by maximization of mutual information. In: 5th International Conference on Computer Vision (1995)
10. Collignon, A., Maes, F., Vandermeulen, P., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Information Processing in Medical Imaging (1995)
11. Fischer, B., Modersitzki, J.: Curvature based image registration. JMIV 18(1) (2003)
12. Fischer, B., Modersitzki, J.: Combining landmark and intensity driven registrations. PAMM 3, 32–35 (2003)
13. Wloka, J.: Partial Differential Equations. Cambridge University Press, Cambridge (1987)
14. Fischer, B., Modersitzki, J.: Fast diffusion registration. In: Nashed, M., Scherzer, O. (eds.) Inverse Problems, Image Analysis, and Medical Imaging. Contemporary Mathematics, vol. 313. AMS (2002)
15. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1991)
16. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research. Springer, Heidelberg (1999)
Hyperbolic Numerics for Variational Approaches to Correspondence Problems

Henning Zimmer¹,², Michael Breuß¹, Joachim Weickert¹, and Hans-Peter Seidel²

¹ Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany
{zimmer,breuss,weickert}@mia.uni-saarland.de
² Max-Planck Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
[email protected]

Abstract. Variational approaches to correspondence problems such as stereo or optic flow have now been studied for more than 20 years. Nevertheless, only little attention has been paid to a subtle numerical approximation of derivatives. In the area of numerics for hyperbolic partial differential equations (HDEs) it is, however, well-known that such issues can be crucial for obtaining favourable results. In this paper we show that the use of hyperbolic numerics for variational approaches can lead to a significant quality gain in computational results. This improvement can be of the same order as obtained by introducing better models. Applying our novel scheme within existing variational models for stereo reconstruction and optic flow, we show that this approach can be beneficial for all variational approaches to correspondence problems.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 636–647, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Numerous tasks in the field of computer vision belong to the class of correspondence problems, where one has to match pixels of two or more images. Popular examples are stereo reconstruction and optic flow, which both amount to computing a displacement field between two images. In the stereo context, the absolute value of this field is called disparity and is needed to recover the depth information of a static scene. For optic flow, the displacement field is called optic flow field and gives information about the dynamics of a moving scene.

A successful class of techniques for solving correspondence problems like stereo or optic flow are the variational approaches that find the displacement field as the minimiser of a continuous energy functional. Those methods have been studied for more than two decades, starting from the optic flow approach of Horn and Schunck [1]. During this period of time, a lot of effort has been spent on improving the quality of models [2, 3, 4, 5, 6, 7]. In order to apply those continuous models to sampled digital images and for solving the minimisation problem on a computer, one certainly has to discretise
occurring image derivatives. This task obviously offers a certain degree of freedom in choosing a well-suited derivative approximation. Surprisingly, this issue has hardly been studied for variational approaches to correspondence problems. If the discretisation is discussed at all, most approaches use “standard” central finite difference approximations [3, 4, 5].

For variational approaches to image restoration, sophisticated approximation schemes have already been considered for a long time [8, 9]. They also have been thoroughly studied in the field of hyperbolic partial differential equations (HDEs) [10, 11], where one simulates the transport of liquids or gases, resulting in a problem setting related to correspondence problems: Given an initial density distribution (first image) and the velocity of transport (displacement), compute the density distributions at later times (second image). One realises that the role of known and unknown is switched compared to correspondence problems.

In this paper we make use of this relation between HDEs and correspondence problems for the first time in the literature. In the style of numerical schemes for HDEs, we develop an adaptive discretisation scheme that decides, based on a smoothness measure, on a suitable approximation of image derivatives at each point. This scheme is then used within variational frameworks for stereo reconstruction and optic flow. Experiments show that this approach improves the quality of results in the same order as can be achieved with model refinements.

This paper is organised as follows: In Sect. 2 we investigate the importance of an appropriate approximation of image derivatives using the example of simple 1-D correspondence problems. Based on this we develop the adaptive discretisation scheme that is applied to stereo reconstruction and optic flow in Sect. 3 and Sect. 4, respectively. There we also show corresponding experiments. The paper is then concluded by a summary and an outlook on future work in Sect. 5.
2 Hyperbolic Numerics for 1-D Variational Approaches

2.1 A Variational Approach for 1-D Correspondence Problems
For simplicity, let us consider a 1-D signal sequence $f(x, t)$ where $x \in \Omega$ denotes the position in the signal domain $\Omega \subset \mathbb{R}$ and $t \ge 0$ denotes time. In order to compute the unknown displacement function $u(x)$ that gives the displacements from time $t$ to $t + 1$, we minimise the energy functional

$$E(u) = \int_\Omega \left( (f_x u + f_t)^2 + \alpha \, u_x^2 \right) dx , \tag{1}$$

where subscripts denote partial derivatives. The term $(f_x u + f_t)^2$ is called data term and models how well the displacement $u$ matches the signal sequence $f$. We impose that the signal values are invariant under their displacement, i.e., $f(x + u, t + 1) = f(x, t)$. Assuming that $u$ is small and $f$ sufficiently smooth, we can perform a linearisation that finally leads to the presented data term. Note that in the 1-D setting, the data term alone allows us to compute a solution $u = -f_t / f_x$, if $f_x \neq 0$. However, in 2-D this will no longer be the case.
There, and also to obtain a solution in flat signal regions, the smoothness term $u_x^2$ is needed. By penalising large derivatives of $u$, it allows the displacement function to be filled in smoothly where the data term is not sufficient. Its contribution to the energy is steered by a smoothness weight $\alpha > 0$. In order to actually compute a minimiser $u$ of the energy (1), the calculus of variations states that $u$ necessarily has to fulfil the Euler-Lagrange equation

$$f_x (f_x u + f_t) - \alpha \, u_{xx} = 0 , \tag{2}$$

with homogeneous Neumann boundary conditions.

2.2 A Closer Look into Discretisation Issues
For solving the Euler-Lagrange equation (2) on a computer, we have to discretise the signal $f$, the displacement $u$ and their derivatives $f_x$, $f_t$ and $u_{xx}$. Note that the image derivatives that occur in the Euler-Lagrange equation (2) are in general the same as in the linearised data term of the energy (1). Thus, the data term suffices to find out which derivatives have to be approximated. Let us start with the discretisation of the signals $f$ and $u$. To this end we sample them on a spatio-temporal discrete grid which yields the approximations $f(x_i, t_k) \approx f_i^k$ and $u(x_i) \approx u_i$ where $x_i := (i - \tfrac{1}{2})\, h$ and $t_k = k \tau$ for a spatial grid size $h$ and a time step size $\tau$. In this paper we will only consider the two frames $f_i^k$ and $f_i^{k+1}$, assuming a temporal sampling of $\tau = 1$.

Derivative Approximations. The discretisation of the occurring derivatives can be done in different ways. We use the popular concept of finite differences, as for example presented in [12]. As notation for the approximation of partial derivatives we use $f_d(x_i, t_k) \approx (f_d)_i^k$ to denote the corresponding finite difference discretisation.

I. Temporal Discretisation. For the time derivative we use the forward difference

$$(f_t)_i^k := \frac{1}{\tau} \left( f_i^{k+1} - f_i^k \right) , \tag{3}$$

as this is the only reasonable choice, given $f_i^k$ and $f_i^{k+1}$.

II. Spatial Discretisation of First Order. The approximation of $f_x$ offers different possibilities for $(f_x)_i^k$. Basic choices are forward, backward and central differences:

$$\begin{aligned}
D_x^+ f_i^k &:= \frac{1}{h} \left( f_{i+1}^k - f_i^k \right) , &
D_x^+ f_i^{k+1} &:= \frac{1}{h} \left( f_{i+1}^{k+1} - f_i^{k+1} \right) , \\
D_x^- f_i^k &:= \frac{1}{h} \left( f_i^k - f_{i-1}^k \right) , &
D_x^- f_i^{k+1} &:= \frac{1}{h} \left( f_i^{k+1} - f_{i-1}^{k+1} \right) , \\
D_x^0 f_i^k &:= \frac{1}{2h} \left( f_{i+1}^k - f_{i-1}^k \right) , &
D_x^0 f_i^{k+1} &:= \frac{1}{2h} \left( f_{i+1}^{k+1} - f_{i-1}^{k+1} \right) ,
\end{aligned} \tag{4}$$

where $D^+$ denotes forward, $D^-$ backward and $D^0$ central differences, respectively, that can be computed at the time level $k$ or $k + 1$.
Note that the approximation error of the one-sided differences (forward and backward) is in $O(h)$, whereas their central counterparts only involve an error of $O(h^2)$. This, together with the unbiased stencil orientation, explains why central differences are a popular “standard” choice in image processing applications. To further reduce the approximation error one may consider averaged differences, taking into account the time levels $k$ and $k + 1$. In the remainder of this paper those will be referred to as the “standard” derivative approximation. They are given by

$$D_x^0 f_i^{k+\frac{1}{2}} := \frac{1}{2} \left( D_x^0 f_i^k + D_x^0 f_i^{k+1} \right) = \frac{1}{4h} \left( f_{i+1}^k - f_{i-1}^k + f_{i+1}^{k+1} - f_{i-1}^{k+1} \right) . \tag{5}$$

III. Spatial Discretisation of Second Order. Finally we have to approximate the second order spatial derivative of the displacement function. As this choice is not crucial we propose a simple central approximation

$$(u_{xx})_i := D_x^- D_x^+ u_i = \frac{1}{h^2} \left( u_{i+1} - 2 u_i + u_{i-1} \right) . \tag{6}$$
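The difference operators from (3) to (6) and their approximation orders can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the test signal `sin` are assumptions chosen for the example, and the grid follows $x_i = (i - \tfrac{1}{2}) h$ as above.

```python
import math


def d_forward(f, i, h):
    """One-sided forward difference D_x^+ f_i."""
    return (f[i + 1] - f[i]) / h


def d_backward(f, i, h):
    """One-sided backward difference D_x^- f_i."""
    return (f[i] - f[i - 1]) / h


def d_central(f, i, h):
    """Central difference D_x^0 f_i."""
    return (f[i + 1] - f[i - 1]) / (2.0 * h)


def d_central_averaged(f_k, f_k1, i, h):
    """Average of the central differences at time levels k and k+1, cf. (5)."""
    return 0.5 * (d_central(f_k, i, h) + d_central(f_k1, i, h))


def d_second(u, i, h):
    """Central second difference D_x^- D_x^+ u_i, cf. (6)."""
    return (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (h * h)


def max_error(diff, h):
    """Largest error of `diff` against the exact derivative of sin on [0, 1]."""
    n = int(round(1.0 / h))
    x = [(i - 0.5) * h for i in range(n + 2)]
    f = [math.sin(xi) for xi in x]
    return max(abs(diff(f, i, h) - math.cos(x[i])) for i in range(1, n + 1))
```

Halving the grid size should roughly halve `max_error(d_forward, h)` and roughly quarter `max_error(d_central, h)`, which mirrors the $O(h)$ and $O(h^2)$ statements above for a smooth signal.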
Why the Discretisation of $f_x$ Matters. To show that an appropriate choice of $(f_x)_i^k$ is crucial for computing reasonable displacements $u$, we conduct a small experiment: Consider the two frames of a signal sequence in Fig. 1 (a). Here, the signal is displaced by one position to the right in its middle part and stays unchanged otherwise, which is also indicated in the ground truth displacement in Fig. 1 (b). Note that this example comprises smooth as well as discontinuous signal and displacement regions which make it rather indicative.

In Fig. 1 (c)–(e) we depict computed displacements using different discretisations for $f_x$. The displacements were obtained as the solution of a linear system of equations that arises from the discretised Euler-Lagrange equation (2). As the system matrix is tri-diagonal, it can directly be solved via the Thomas algorithm [13]. Further note that we set the smoothness weight $\alpha = 10^{-4}$, to clearly see the influence of the data term where $f_x$ occurs.

When comparing the displacements in Fig. 1 (c)–(e), the large influence of the choice of $(f_x)_i^k$ becomes obvious: Averaged central differences only perform well in the smooth signal regions at the left and right boundaries. At discontinuities they suffer from over- and undershoots. One-sided differences perform either favourably or fail totally. Obviously, the correct orientation matters here. When using the “correct” one-sided differences, the displacement almost coincides with the ground truth, except at one point. This is, however, not due to the numerics, but is caused by the occlusion at the jump in the displacement. Hence the considered point at time level $k$ does not possess a matching point at time level $k + 1$ and its displacement is undefined. In the ground truth, we assign to this point the displacement of its right neighbour.

The observed behaviour in our experiment can be explained when looking into the theory of HDEs [10, 11].
There, so called upwind schemes are a widely used concept where the signal derivatives are approximated by “correctly oriented” one-sided differences. The correct orientation in our case means opposite to the displacement direction, see our experiment.
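The linear system behind this experiment can be sketched as follows. This is a minimal sketch under assumptions: the helper names are hypothetical, boundary values of the signal are replicated when a stencil leaves the domain, and the Neumann condition on $u$ is realised by dropping the missing neighbour in (6). The caller chooses which approximation of $f_x$ to pass in.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def spatial_derivative(f, kind, h=1.0):
    """Forward, backward or central differences with replicated boundaries."""
    n = len(f)

    def at(i):
        return f[min(max(i, 0), n - 1)]

    if kind == "forward":
        return [(at(i + 1) - at(i)) / h for i in range(n)]
    if kind == "backward":
        return [(at(i) - at(i - 1)) / h for i in range(n)]
    return [(at(i + 1) - at(i - 1)) / (2.0 * h) for i in range(n)]


def solve_displacement(f_k, f_k1, fx, alpha=1e-4, h=1.0, tau=1.0):
    """Solve the discretised Euler-Lagrange equation (2) for u.

    Row i reads fx_i^2 u_i - alpha (u_xx)_i = -fx_i ft_i. The matrix is
    singular if fx vanishes everywhere, which this sketch does not handle.
    """
    n = len(f_k)
    ft = [(b - a) / tau for a, b in zip(f_k, f_k1)]   # forward difference (3)
    w = alpha / (h * h)
    lower = [0.0] + [-w] * (n - 1)
    upper = [-w] * (n - 1) + [0.0]
    diag = [fx[i] ** 2 + w * ((i > 0) + (i < n - 1)) for i in range(n)]
    rhs = [-fx[i] * ft[i] for i in range(n)]
    return thomas(lower, diag, upper, rhs)
```

As a sanity check, a linear ramp `f_k = [0, 1, 2, ...]` shifted one position to the right, `f_k1 = [-1, 0, 1, ...]`, should yield a displacement of 1 when `fx = spatial_derivative(f_k, "backward")` is used. Replacing the ramp by a signal with jumps allows the effect of the stencil orientation described above to be explored.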
Fig. 1. Top row: (a) Signal at time $k$ (solid) and $k + 1$ (dotted). (b) Ground truth displacement. Bottom row: (c) Displacement computed using standard averaged central differences (solid), compared to the ground truth (dotted). (d) Same for one-sided forward differences. (e) Same for one-sided backward differences.
2.3 An Adaptive Discretisation Scheme
After explaining the outcome of our experiment with the help of hyperbolic numerics, we now adapt a successful concept from this area for our purpose. Recall that one-sided upwind differences, which are low-order approximations, perform well at signal discontinuities. However, they involve a higher discretisation error than central differences, which are high-order approximations and perform favourably in smooth signal regions. Hence a natural idea is to combine the two strategies by using high-order approximations in smooth signal parts and low-order ones at discontinuities. Slightly more involved techniques utilising this idea are the high-resolution methods [11], developed in the context of HDEs. They use a nonlinear blend of low- and high-order approximations, steered by a smoothness measure. Adapting this methodology to the variational framework will result in an adaptive high-resolution-type (HRT) discretisation scheme for correspondence problems, which will be presented now.

Measuring Smoothness. First we discuss how to determine the smooth and discontinuous regions of a signal. To this end we introduce a smoothness measure

$$\Theta_i := \Theta\left( f_i^k, f_i^{k+1} \right) := \left| D_x^- f_i^k - D_x^+ f_i^k \right| + \left| D_x^- f_i^{k+1} - D_x^+ f_i^{k+1} \right| , \tag{7}$$

that is close to 0 in smooth regions where backward and forward differences of $f_i^k$ and $f_i^{k+1}$ are almost identical, and large at discontinuities of $f_i$.

Determining the Upwind Directions. Next we need to determine the appropriate upwind directions for the one-sided differences. Note that our experiment from Fig. 1 has shown that this is very crucial. We propose to compute a predictor solution $\tilde{u}$ whose sign determines the upwind direction. The predictor is computed using standard averaged central differences and a comparatively large smoothness weight, e.g., $\tilde{\alpha} = 1$, to cope with outliers caused by the possibly less appropriate high-order discretisation. With its help the low-order upwind approximation $f_x^L$ of $f_x$ is defined as

$$\left( f_x^L \right)_i := \begin{cases} D_x^- f_i^k , & \text{if } \tilde{u}_i > 0 , \\ D_x^+ f_i^k , & \text{if } \tilde{u}_i < 0 , \\ \left( f_x^H \right)_i , & \text{if } \tilde{u}_i = 0 , \end{cases} \tag{8}$$

where

$$\left( f_x^H \right)_i := D_x^0 f_i^{k+\frac{1}{2}} \tag{9}$$

denotes the high-order standard approximation of $f_x$ using averaged central differences. Revisiting the experiment from Fig. 1, we realise that this definition agrees with the results obtained there.

The High-Resolution-Type (HRT) Discretisation Scheme. Now we have everything at hand to define the adaptive HRT discretisation scheme as

$$(f_x)_i^k := \left( f_x^L \right)_i + \Phi(\Theta_i) \left( \left( f_x^H \right)_i - \left( f_x^L \right)_i \right) , \tag{10}$$

using a blending function $\Phi(\Theta_i)$. It is close to 1 in smooth signal regions (indicated by $\Theta_i$), yielding a high-order approximation there. At discontinuities it is close to 0 which leads to a low-order approximation that is better suited there. For the actual choice of $\Phi(\Theta_i)$ we propose

$$\Phi(\Theta_i) := \begin{cases} 1 - \dfrac{\Theta_i}{T} , & \text{if } 0 \le \Theta_i < T , \\ 0 , & \text{else} , \end{cases} \tag{11}$$

using a threshold parameter $T > 0$. Note that for $T \to 0$ we obtain the upwind scheme and for $T \to \infty$ one falls back to a standard scheme. Applying the HRT scheme to the signal sequence from Fig. 1 gives the same result as with the appropriate upwind scheme, hence we omit an additional figure. However, for more challenging stereo and optic flow problems that we discuss in Sect. 3 and 4, the blending of the HRT scheme will give results superior to a pure upwind scheme.
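The blending in (7) to (11) can be sketched as a single function. This is a minimal illustration under assumptions: the function name is hypothetical, the smoothness measure (7) is read with absolute values, boundary values are replicated where a stencil leaves the domain, and the predictor `u_pred` is supplied by the caller rather than computed here.

```python
def hrt_derivative(f_k, f_k1, u_pred, T=1.0, h=1.0):
    """Blend low-order upwind and high-order averaged central differences."""
    n = len(f_k)

    def at(f, i):
        return f[min(max(i, 0), n - 1)]      # replicate boundary values

    def fwd(f, i):
        return (at(f, i + 1) - at(f, i)) / h

    def bwd(f, i):
        return (at(f, i) - at(f, i - 1)) / h

    def cen(f, i):
        return (at(f, i + 1) - at(f, i - 1)) / (2.0 * h)

    result = []
    for i in range(n):
        # Smoothness measure (7): small where one-sided differences agree.
        theta = abs(bwd(f_k, i) - fwd(f_k, i)) + abs(bwd(f_k1, i) - fwd(f_k1, i))
        # High-order approximation (9): averaged central differences.
        high = 0.5 * (cen(f_k, i) + cen(f_k1, i))
        # Low-order upwind approximation (8), steered by the predictor sign.
        if u_pred[i] > 0:
            low = bwd(f_k, i)
        elif u_pred[i] < 0:
            low = fwd(f_k, i)
        else:
            low = high
        # Blending function (11) and final HRT value (10).
        phi = 1.0 - theta / T if 0.0 <= theta < T else 0.0
        result.append(low + phi * (high - low))
    return result
```

On a linear ramp the measure vanishes in the interior, so the function returns the central difference there. At a jump larger than `T` the blend weight is zero and the pure upwind difference is returned.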
3 Integration into Variational Stereo Approaches
In this section we integrate our adaptive HRT discretisation scheme into a recent variational stereo approach by Slesareva et al. [6]. We restrict ourselves to the rectified scenario where displacements can only occur in horizontal direction and thus one has to solve a 1-D correspondence problem for each image row. However, it makes sense to couple those via a 2-D smoothness assumption, as will be described now.
3.1 Variational Stereo
We consider the image pair $f_l(\mathbf{x}) \equiv f(\mathbf{x}, t)$ and $f_r(\mathbf{x}) \equiv f(\mathbf{x}, t + 1)$ denoting the left and right view of a static scene, respectively. Here, $\mathbf{x} := (x, y)^\top$ denotes the location within a rectangular image domain $\Omega_2 \subset \mathbb{R}^2$. Further assume that the images are presmoothed by a Gaussian convolution of standard deviation $\sigma$. The unknown scalar-valued disparity is given by the absolute value of $u$, where the displacement can be written as $\mathbf{u} := (u, 0)^\top$ in the rectified case. In accordance with [6], the disparity is found by minimising the energy

$$E(u) = \int_{\Omega_2} \left[ M(u) + \alpha \, V(u) \right] d\mathbf{x} . \tag{12}$$

The data term

$$M(u) = \Psi_M \left( \left| f_r(\mathbf{x} + \mathbf{u}) - f_l(\mathbf{x}) \right|^2 + \gamma \left| \nabla f_r(\mathbf{x} + \mathbf{u}) - \nabla f_l(\mathbf{x}) \right|^2 \right) , \tag{13}$$

where $\nabla := (\partial_x, \partial_y)^\top$ denotes the spatial gradient operator, combines the brightness and gradient constancy assumption weighted by $\gamma > 0$. The latter makes the method more robust under illumination changes. To cope with outliers caused by noise or occlusions, a robust penaliser function $\Psi_M(s^2) := \sqrt{s^2 + \varepsilon^2}$ using a small regularisation parameter $\varepsilon > 0$ is employed that results in modified $L^1$ penalisation. As will be described below, the linearisation of the data term is postponed to the minimisation phase to allow for a correct handling of large displacements. The smoothness term

$$V(u) = \Psi_V \left( |\nabla u|^2 \right) , \tag{14}$$

uses the same robust non-quadratic penaliser function as the data term, i.e., $\Psi_V = \Psi_M$, resulting in Total Variation regularisation [8]. Concerning the minimisation of the energy (12), we refer to [6] for the corresponding Euler-Lagrange equation. To solve it, we employ a coarse-to-fine multiscale warping approach [4] and compute on each warping level small flow increments $du$ using the linearised data term

$$\Psi_M \left( (f_x \, du + f_t)^2 + \gamma \left( (f_{xx} \, du + f_{xt})^2 + (f_{xy} \, du + f_{yt})^2 \right) \right) . \tag{15}$$

Note that the discretised Euler-Lagrange equation now leads to a nonlinear system of equations. After linearisation, we obtain a large but sparse linear system, which can be solved efficiently by an iterative solver of Gauß-Seidel type [14].

3.2 The HRT Discretisation Scheme for Variational Stereo
We now adapt the HRT scheme from Sect. 2.3 to the stereo setting. First, we extend the discrete grid to a 2-D version with grid sizes $h_x$ and $h_y$ in x- and y-direction, respectively. The images and the disparity are then approximated by $f_l(x_i, y_j) \approx f_{i,j}^k$, $f_r(x_i, y_j) \approx f_{i,j}^{k+1}$ and $u(x_i, y_j) \approx u_{i,j}$.
I. Smoothness Measures. In the 2-D stereo case, we first of all need distinct smoothness measures $\Theta_x$, $\Theta_y$ and $\Theta_{xy}$ for the x-, y- and xy-direction, respectively. For $\Theta_x$ we use the corresponding expression (7) from the 1-D case, and $\Theta_y$ is obtained by using y- instead of x-differences. With their help, the mixed expression is defined as $\Theta_{xy} = \Theta_x + \Theta_y$.

II. Derivative Approximations. Inspecting the linearised data term from (15), we realise that now also the second-order derivatives $f_{xx}$, $f_{xt}$, $f_{xy}$ and $f_{yt}$ need to be discretised. Due to space limitations we will exemplify our approach for $f_{xy}$. The other derivatives are then approximated accordingly. Note that given the two signals $f_{i,j}^k$ and $f_{i,j}^{k+1}$, the time derivative $f_t$ is always approximated as in (3). We start with the high-order approximation of $f_{xy} = \partial_x f_y$. This translates to the finite difference case as

$$\begin{aligned}
\left( f_{xy}^H \right)_{i,j}^k &= D_x^0 D_y^0 f_{i,j}^{k+\frac{1}{2}} = \frac{1}{2} \left( D_x^0 D_y^0 f_{i,j}^k + D_x^0 D_y^0 f_{i,j}^{k+1} \right) && (16) \\
&= \frac{1}{4 h_y} \left( D_x^0 \left( f_{i,j+1}^k - f_{i,j-1}^k \right) + D_x^0 \left( f_{i,j+1}^{k+1} - f_{i,j-1}^{k+1} \right) \right) && (17) \\
&= \frac{1}{8 h_x h_y} \Big( \left( f_{i+1,j+1}^k - f_{i+1,j-1}^k \right) - \left( f_{i-1,j+1}^k - f_{i-1,j-1}^k \right) \\
&\qquad\qquad + \left( f_{i+1,j+1}^{k+1} - f_{i+1,j-1}^{k+1} \right) - \left( f_{i-1,j+1}^{k+1} - f_{i-1,j-1}^{k+1} \right) \Big) . && (18)
\end{aligned}$$

Note that for $f_{xx}$ we employ the central discretisation in accordance with (6). In the low-order case we use the upwind discretisation of $(f_x)_{i,j}^k$, steered by the predictor $\tilde{u}$. For the y-derivative we employ the central difference approximation because, in the rectified scenario, the displacement in y-direction is always zero. Thus we obtain for $\tilde{u} > 0$:

$$\left( f_{xy}^L \right)_{i,j}^k = D_x^- D_y^0 f_{i,j}^k = \frac{1}{2 h_x h_y} \left( \left( f_{i,j+1}^k - f_{i,j-1}^k \right) - \left( f_{i-1,j+1}^k - f_{i-1,j-1}^k \right) \right) , \tag{19}$$

and a corresponding expression for $\tilde{u} < 0$. Note that we do not need a larger smoothness weight $\tilde{\alpha}$ to compute $\tilde{u}$ in this case since an appropriate $\alpha$ for usual stereo pairs will be large enough.

3.3 Experiments for Variational Stereo
We now show results for disparity computations using the approach of Slesareva et al. [6] with different derivative approximations. We use greyscale versions of the stereo image data from the Middlebury University [15]¹. To measure the quality of estimated disparities compared to the given ground truth disparities, we employ the bad pixel error (BPE) measure [15]. As fixed parameters we set $\varepsilon = 10^{-3}$ and $T = 1$. In the stereo case we set $\sigma = 0.5$ and for the optic flow experiments in Sect. 4 we set $\sigma = 0.8$.

¹ Available under http://vision.middlebury.edu/stereo
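A bad pixel measure of this kind can be sketched as the percentage of pixels whose disparity deviates from the ground truth by more than a threshold. The sketch below is an assumption-laden illustration: the function name is hypothetical, the default threshold of one pixel is an assumed value that the text does not state, and occluded or invalid pixels are not treated specially.

```python
def bad_pixel_error(disparity, ground_truth, threshold=1.0):
    """Percentage of pixels with absolute disparity error above `threshold`.

    Both inputs are lists of rows of equal shape. The threshold of one
    pixel is an assumption for illustration only.
    """
    bad = 0
    total = 0
    for row_d, row_gt in zip(disparity, ground_truth):
        for d, gt in zip(row_d, row_gt):
            total += 1
            if abs(d - gt) > threshold:
                bad += 1
    return 100.0 * bad / total
```

For example, with estimates `[[0, 2], [5, 5]]` and ground truth `[[0, 0], [5, 3.5]]`, two of the four pixels exceed the threshold, giving a value of 50.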
In Fig. 2, the results for the Plastic pair are depicted. Considering the bad pixel maps in Fig. 2 (b)–(c), we see that the HRT scheme improves the results in the vicinity of image discontinuities and at the boundaries. Those areas are marked grey in the error maps. Note that the artefacts in Fig. 2 (f) are again caused by occlusions. The improvement also becomes visible in the BPE measures that are summarised in Table 1 that also lists other Middlebury pairs and parameter settings. Also error measures for a pure upwind scheme are given there. Comparing them to the HRT scheme shows that the blending of the latter scheme also pays off in terms of quality measures.
Fig. 2. Top row: (a) Left image of the Plastic pair. (b) Bad pixels for approach with a standard derivative approximation (bad pixels are coloured black). (c) Same for the HRT scheme. Bottom row: (d) Ground truth disparity. (e) Disparity for approach with a standard derivative approximation. (f) Same for the HRT scheme.
4 Extension to Variational Optic Flow
Having presented how to employ the adaptive HRT discretisation scheme for stereo, its extension to the optic flow case is more or less straightforward. For optic flow we consider a presmoothed image sequence $f(\mathbf{x}, t)$ and want to compute a flow field $\mathbf{w} := (u, v)^\top$, where $u$ and $v$ give the displacements in x- and y-direction, respectively. Using the method of Brox et al. [4] that was the basis for the stereo approach of Slesareva et al. [6], we compute $\mathbf{w}$ as the minimiser of an energy functional similar to the one from (12). One difference concerning the HRT scheme is that we now also have to approximate $f_y$ and $f_{yy}$. This, however, works analogously to the stereo case. More problematic are the low-order upwind approximations of $f_{xy}$, as they now depend on a predictor $\tilde{\mathbf{w}} = (\tilde{u}, \tilde{v})^\top$. Hence we need to do an extensive case distinction taking into account all possible combinations of the signs of $\tilde{u}$ and $\tilde{v}$. For example, let $\tilde{u} > 0$ and $\tilde{v} < 0$, then
$$\left( f_{xy}^L \right)_{i,j}^k = D_x^- D_y^+ f_{i,j}^k = \frac{1}{h_x h_y} \left( \left( f_{i,j+1}^k - f_{i,j}^k \right) - \left( f_{i-1,j+1}^k - f_{i-1,j}^k \right) \right) . \tag{20}$$

Table 1. BPE measures and parameters for stereo experiments

Image pair   Derivative approximation   Parameters             BPE
Plastic      standard                   α = 5.5, γ = 190.0     25.85
Plastic      upwind                     α = 5.5, γ = 190.0     21.35
Plastic      HRT scheme                 α = 5.5, γ = 190.0     18.85
Teddy        standard                   α = 8.0, γ = 9.5       17.45
Teddy        upwind                     α = 8.0, γ = 9.5       16.94
Teddy        HRT scheme                 α = 8.0, γ = 9.5       16.75
Venus        standard                   α = 4.5, γ = 0.5        3.06
Venus        upwind                     α = 4.5, γ = 0.5        2.78
Venus        HRT scheme                 α = 4.5, γ = 0.5        2.77
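The case distinction for the low-order approximation of $f_{xy}$ can be written compactly by choosing a stencil per direction from the sign of the predictor. The following is a sketch under assumptions: the function names are hypothetical, the image is indexed as `f[i][j]` with `i` along x and `j` along y, only time level $k$ is used, and for a zero predictor component a plain central difference is taken instead of the time-averaged high-order approximation.

```python
def stencil(direction):
    """Offsets (plus, minus) and width of a difference chosen by upwinding.

    A positive predictor selects the backward difference, a negative one
    the forward difference, and zero falls back to a central difference.
    """
    if direction > 0:
        return 0, -1, 1.0
    if direction < 0:
        return 1, 0, 1.0
    return 1, -1, 2.0


def mixed_upwind(f, i, j, u_pred, v_pred, hx=1.0, hy=1.0):
    """Low-order approximation of f_xy at (i, j), cf. (19) and (20)."""
    ap, am, wx = stencil(u_pred)   # offsets in x-direction
    bp, bm, wy = stencil(v_pred)   # offsets in y-direction
    y_diff_plus = f[i + ap][j + bp] - f[i + ap][j + bm]
    y_diff_minus = f[i + am][j + bp] - f[i + am][j + bm]
    return (y_diff_plus - y_diff_minus) / (wx * hx * wy * hy)
```

With `u_pred > 0` and `v_pred < 0` this reproduces the stencil of (20), and with `u_pred > 0` and `v_pred = 0` the stencil of (19). For the bilinear test image `f[i][j] = i * j` every sign combination should return 1 at interior points.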
In order to show that the HRT scheme also performs favourably for optic flow, we performed experiments using the recent optic flow data sets from the Middlebury University [16]². In Fig. 3 we show results obtained for the Urban3 sequence. Note that the error maps now show the magnitude of the average angular error (AAE) [17] measure. Inspecting them, the favourable performance of the HRT scheme in the marked regions becomes visible, which is also reflected in the AAE measures shown in Table 2. It again also comprises other Middlebury sequences, parameter settings and results for the upwind scheme. Concerning the latter, we see that also for optic flow, the HRT scheme performs better.
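The angular error commonly attributed to Barron et al. [17] measures the angle between the spatio-temporal vectors $(u, v, 1)$ of the estimated and the ground truth flow. A minimal sketch of this measure, with hypothetical function names and flow fields given as flat lists of `(u, v)` pairs, could look as follows.

```python
import math


def angular_error(u, v, u_gt, v_gt):
    """Angle in degrees between (u, v, 1) and (u_gt, v_gt, 1)."""
    num = u * u_gt + v * v_gt + 1.0
    den = math.sqrt(u * u + v * v + 1.0) * math.sqrt(u_gt * u_gt + v_gt * v_gt + 1.0)
    cosine = max(-1.0, min(1.0, num / den))   # guard against rounding
    return math.degrees(math.acos(cosine))


def average_angular_error(flow, flow_gt):
    """Mean angular error over two equally long lists of (u, v) pairs."""
    errors = [angular_error(u, v, ug, vg)
              for (u, v), (ug, vg) in zip(flow, flow_gt)]
    return sum(errors) / len(errors)
```

Identical flows give an error of zero, while an estimate of `(0, 0)` against a true flow of `(1, 0)` gives 45 degrees.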
Fig. 3. Top row: (a) Frame 10 of the Urban3 sequence. (b) AAE map for approach with a standard derivative approximation. (c) Same for the HRT scheme. Bottom row: (d) Flow magnitude of the ground truth. (e) Flow magnitude for approach with a standard derivative approximation. (f) Same for the HRT scheme.

² Available under http://vision.middlebury.edu/flow
Table 2. AAE measures and parameters for optic flow experiments

Image sequence   Derivative approximation   Parameters             AAE
Urban3           standard                   α = 4.5, γ = 4.0       5.71
Urban3           upwind                     α = 4.5, γ = 4.0       4.58
Urban3           HRT scheme                 α = 4.5, γ = 4.0       4.11
RubberWhale      standard                   α = 50.0, γ = 50.0     4.72
RubberWhale      upwind                     α = 50.0, γ = 50.0     4.73
RubberWhale      HRT scheme                 α = 50.0, γ = 50.0     4.34
Dimetrodon       standard                   α = 7.0, γ = 10.0      1.94
Dimetrodon       upwind                     α = 7.0, γ = 10.0      3.06
Dimetrodon       HRT scheme                 α = 7.0, γ = 10.0      1.88

5 Conclusions and Outlook
In this paper we have presented a sophisticated numerical scheme for the approximation of spatial image derivatives in variational approaches to correspondence problems. Our experiments demonstrated that such a scheme allows one to tangibly improve the quality of results, an improvement that, in more than 20 years of research in this field, has only been achieved through model refinements. We hence conjecture that the numerics can be a fruitful alternative starting point for further advances. This finding is no surprise for people acquainted with the theory of HDEs, where sophisticated numerical schemes have been thoroughly investigated. In this paper we have seen that HDEs and variational approaches share some structural similarities. However, we were the first to utilise this similarity for developing a well-engineered numerical scheme for variational approaches. We want to stress that the adaptive discretisation scheme developed within this paper is certainly not the only promising technique that can be adapted from the field of HDEs. Our current research is thus concerned with exploring further directions that may lead to better numerical schemes for variational approaches to correspondence problems.
Acknowledgement. Henning Zimmer gratefully acknowledges funding by the International Max-Planck Research School (IMPRS).
References
1. Horn, B., Schunck, B.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
2. Alvarez, L., Deriche, R., Papadopoulo, T., Sanchez, J.: Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision 75(3), 371–385 (2007)
3. Ben-Ari, R., Sochen, N.: Variational stereo vision with sharp discontinuities and occlusion handling. In: Proc. 2007 IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–7. IEEE Computer Society Press, Los Alamitos (2007)
4. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
5. Nir, T., Bruckstein, A.M., Kimmel, R.: Over-parameterized variational optical flow. International Journal of Computer Vision 76(2), 205–216 (2008)
6. Slesareva, N., Bruhn, A., Weickert, J.: Optic flow goes stereo: A variational method for estimating discontinuity-preserving dense disparity maps. In: Kropatsch, W., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 33–40. Springer, Heidelberg (2005)
7. Zimmer, H., Bruhn, A., Valgaerts, L., Breuß, M., Weickert, J., Rosenhahn, B., Seidel, H.P.: PDE-based anisotropic disparity-driven stereo vision. In: Deussen, O., Keim, D., Saupe, D. (eds.) Proceedings of Vision, Modeling, and Visualization (VMV) 2008, pp. 263–272. AKA, Heidelberg (2008)
8. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
9. Marquina, A., Osher, S.: Explicit algorithms for a new time dependent model based on level set motion for nonlinear deblurring and noise removal. SIAM Journal on Scientific Computing 22(2), 387–405 (2000)
10. LeVeque, R.J.: Numerical Methods for Conservation Laws. Birkhäuser, Basel (1992)
11. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
12. Morton, K.W., Mayers, L.M.: Numerical Solution of Partial Differential Equations. Cambridge University Press, Cambridge (1994)
13. Thomas, L.H.: Elliptic problems in linear difference equations over a network. Technical report, Watson Scientific Computing Laboratory, Columbia University, New York (1949)
14. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia (2003)
15. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47(1-3), 7–42 (2002)
16. Baker, S., Roth, S., Scharstein, D., Black, M., Lewis, J., Szeliski, R.: A database and evaluation methodology for optical flow. In: Proc. 2007 IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–8. IEEE Computer Society Press, Los Alamitos (2007)
17. Barron, J.L., Fleet, D.J., Beauchemin, S.S.: Performance of optical flow techniques. International Journal of Computer Vision 12(1), 43–77 (1994)
From a Single Point to a Surface Patch by Growing Minimal Paths

Fethallah Benmansour and Laurent D. Cohen

CEREMADE, UMR CNRS 7534, Université Paris Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 PARIS CEDEX 16, France
{benmansour,cohen}@ceremade.dauphine.fr
Abstract. We introduce a novel implicit approach for surface patch segmentation in 3D images starting from a single point. Since the boundary surface of an object is locally homeomorphic to a disc, the boundary of a small neighboring domain intersects the surface of interest along a single closed curve. As with active surfaces, we use a cost potential which penalizes image regions of low interest. First, using a front propagation approach from the source point chosen by the user, we observe that this closed curve corresponds to a valley line of the arrival time from the source point. Next, we use an implicit 3D segmentation method which assumes that the object boundary contains two known constraining curves. In our case, the first curve is reduced to a point and the other one is detected automatically by our approach. A partial differential equation is introduced and its solution is used for segmentation: the zero level set of this solution contains the valley line and the source point as well as the set of minimal paths joining them. We present a fast implementation which has been successfully applied to 3D biomedical and synthetic images.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 648–659, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

In this paper we are interested in the interactive segmentation of a surface in a 3D image: the user clicks a single point on the boundary of an object and obtains a patch of the desired surface around that point. For this we use energy minimizing techniques and partial differential equations.

Energy minimization techniques have been applied to a broad variety of problems in image processing and computer vision. Since the original work on snakes [1], they have notably been used for boundary detection. An active contour model, or snake, is a curve that deforms its shape in order to minimize an energy combining an internal part, which smooths the curve, and an external part, which guides the curve toward particular image features. One of the main drawbacks of this approach is that it suffers from local minima 'traps'. Consequently, results strongly depend on the model initialization.

Since the publication of [1], much work has been done in order to free active models from the problem of local minima. Cohen and Kimmel [2] introduced an approach to globally minimize the geodesic active contour energy, provided that two endpoints of the curve are initially supplied by the user. This energy is of the form $\int_\gamma \tilde{P}$, where the incremental cost $\tilde{P}$ is chosen to take lower values on the contours of the image, and γ is a path joining the two points. The solution of this minimization problem is obtained through the computation of the minimal action map associated to a source point. The minimal action map can be regarded as the arrival times of a front propagating from the source point with velocity $1/\tilde{P}$, and it satisfies the Eikonal equation. Therefore, we can compute the minimal action map efficiently with the Fast Marching Method, as detailed in section 2. However, their approach [2] cannot be directly extended to find the global minimum for an active surface in a 3D image. Nevertheless, it has been extended to surfaces in a 3D image by extracting a minimal surface lying on two given curves [3]. The advantage of this method is that it does not suffer from local minima problems, as other active surface methods like [4, 5] do.

In this work, we focus on a novel approach for 3D object segmentation. Our aim is to generate a local surface patch from a single point. The method presented herein can be seen as an extension of the Eulerian approach presented by Ardon et al. in [3] for surface extraction from a pair of 'constraining' closed curves. In our case, however, one of the curves is reduced to a single point and the other one is unknown.

Let $\tilde{P}$ : Ω → R+ be a potential, where Ω ⊂ R3, such that $\tilde{P}$ takes lower values on the surface of the object to be extracted, denoted S and unknown. Given a single point p on S and a neighborhood Σ ⊂ Ω of p, the required conditions are (see figure 1):
• the boundary ∂Σ is a connected closed surface;
• ∂Σ ∩ S is a simple closed curve;
• p ∈ S ∩ Σ.
The volume Σ might be a ball or any topologically equivalent volume. Our objective is to find the surface patch S ∩ Σ from the source point p and the potential P.

Fig. 1. On the left, one can see the required conditions for the surface patch extraction. The point p must be initialized on the surface S in the volume Σ. ∂Σ, the boundary of Σ, is a closed surface and ∂Σ ∩ S is a simple closed curve. On the right, we represent the information available in practice: the surface S is unknown but the potential P takes lower values along S and higher values elsewhere.

We proceed in two stages. First, we look for the boundary S ∩ ∂Σ of the surface patch and compute a good estimate Γ of it. Indeed, running the Fast Marching algorithm (detailed in section 2) from the source point p, one observes that the valley line Γ of the arrival time on the boundary ∂Σ is a good approximation of S ∩ ∂Σ. A detailed definition of the valley line and the way it is extracted is presented in section 3. Next, one can represent the surface of interest as a dense network of minimal paths joining points of the valley line Γ to the source point p (section 4). The surface generated by this algorithm is entirely composed of globally minimal paths. Indeed, by solving a stationary transport equation of the form ∇Ψ · ∇U = 0, where U is the action map (defined in section 2) and Ψ is the unknown, we show that any minimal path between the valley line Γ and the source point p is contained in the zero level set Ψ^{-1}({0}). Important advantages of this approach are that it needs minimal interaction and that it is computationally efficient, as explained later. This approach can also be used as a building block for a complete segmentation from one single point (see section 5). Segmentation results on synthetic and medical images are presented in section 5. Finally, conclusions, advantages and drawbacks of our method, and perspectives follow in section 6.
2 Background on Minimal Paths
Given a 3D image I : Ω → R+ and two points p1 and p2, the underlying idea introduced by Cohen and Kimmel [2] is to build a potential P : Ω → R∗+ which takes lower values near desired features of the image I. The choice of the potential P depends on the application. For example, one can define P as a decreasing function of $\|\nabla I\|$ to extract image edges by finding a curve that globally minimizes the energy functional $E : A_{p_1,p_2} \to \mathbb{R}^+$,

$$E(\gamma) = \int_\gamma \big( P(\gamma(s)) + w \big)\, ds = \int_\gamma \tilde{P}(\gamma(s))\, ds, \qquad (1)$$

where $A_{p_1,p_2}$ is the set of all paths connecting p1 to p2, s is the arc-length parameter, w > 0 is a regularization term and $\tilde{P} = P + w$. A curve connecting p1 to p2 that globally minimizes the energy (1) is a minimal path between p1 and p2, denoted $C_{p_1,p_2}$. The solution of this minimization problem is obtained through the computation of the minimal action map $U_1 : \Omega \to \mathbb{R}^+$ associated to p1. The minimal action is the minimal energy integrated along a path between p1 and any point x of the domain Ω:

$$\forall x \in \Omega, \quad U_1(x) = \min_{\gamma \in A_{p_1,x}} \int_\gamma \tilde{P}(\gamma(s))\, ds. \qquad (2)$$

The values of $U_1$ may be regarded as the arrival times of a front propagating from the source p1 with velocity $1/\tilde{P}$. $U_1$ satisfies the Eikonal equation

$$\|\nabla U_1(x)\| = \tilde{P}(x) \ \text{ for } x \in \Omega, \quad \text{and} \quad U_1(p_1) = 0. \qquad (3)$$
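The energy (1) can be made concrete with a small numerical sketch. The snippet below is only an illustration under stated assumptions: the path is given as a polyline, the integral is approximated by the midpoint rule, and the function name `path_energy` and its arguments are hypothetical, not part of the original method.

```python
import math

def path_energy(points, potential, w=0.1):
    """Approximate E(gamma) = integral of (P + w) ds along a polyline.

    points    : list of (x, y, z) vertices of the path (hypothetical input)
    potential : callable P(x, y, z) returning a non-negative cost
    w         : regularization weight, w > 0
    """
    energy = 0.0
    for a, b in zip(points, points[1:]):
        ds = math.dist(a, b)                          # length of the segment
        mid = tuple((u + v) / 2 for u, v in zip(a, b))
        energy += (potential(*mid) + w) * ds          # midpoint rule
    return energy

# With P = 0 the energy reduces to w times the Euclidean length,
# so the straight path is cheaper than the detour.
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
detour = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]
zero_potential = lambda x, y, z: 0.0
e_straight = path_energy(straight, zero_potential, w=0.5)
e_detour = path_energy(detour, zero_potential, w=0.5)
```

This shows the role of w: when the potential carries no information, the regularization term alone favors short paths.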
Fig. 2. Minimal action map U from the source p using the potential P of figure 1 computed using the Fast Marching algorithm. Left: slices through the volume. Right: some equi-distant surfaces (level sets) of U.
The map $U_1$ has only one local minimum, the point p1, and its flow lines satisfy the Euler-Lagrange equation of functional (1). Thus, the minimal path $C_{p_1,p_2}$ can be retrieved with a simple gradient descent on $U_1$ from p2 to p1, solving the following ordinary differential equation with standard numerical methods like Heun's or Runge-Kutta's:

$$\frac{dC_{p_1,p_2}}{ds}(s) = -\nabla U_1\big(C_{p_1,p_2}(s)\big), \quad \text{and} \quad C_{p_1,p_2}(0) = p_2. \qquad (4)$$
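The back-propagation (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that U is available as a callable (here the exact action map of a constant potential equal to 1, that is, the Euclidean distance to p1), approximates the gradient by central differences, and integrates with Heun's method. The names `grad` and `minimal_path` are hypothetical.

```python
import math

def grad(U, x, eps=1e-5):
    """Central-difference gradient of a scalar function U at point x."""
    g = []
    for d in range(len(x)):
        xp, xm = list(x), list(x)
        xp[d] += eps
        xm[d] -= eps
        g.append((U(xp) - U(xm)) / (2 * eps))
    return g

def minimal_path(U, p2, step=0.05, tol=0.05, max_iter=10000):
    """Descend U from p2 with Heun's method until U <= tol."""
    x = list(p2)
    path = [x]
    for _ in range(max_iter):
        if U(x) <= tol:
            break
        k1 = [-g for g in grad(U, x)]                      # slope at x
        x_pred = [xi + step * ki for xi, ki in zip(x, k1)]  # Euler predictor
        k2 = [-g for g in grad(U, x_pred)]                 # slope at predictor
        x = [xi + 0.5 * step * (a + b) for xi, a, b in zip(x, k1, k2)]
        path.append(x)
    return path

# Assumed toy setting: constant potential 1, so U is the distance to p1.
p1 = (0.0, 0.0, 0.0)
U = lambda x: math.dist(x, p1)
path = minimal_path(U, (1.0, 2.0, 2.0))
```

In this toy setting the extracted path is the straight segment from p2 to p1; on a real image, U would be the grid function computed by the Fast Marching Method and the gradient would be interpolated from grid values.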
2.1 Fast Marching Method
The Fast Marching Method (FMM) is a numerical method introduced by Sethian in [6] and Tsitsiklis in [7] for efficiently solving the isotropic Eikonal equation on a Cartesian grid. In equation (3), the values of U may be regarded as the arrival times of wavefronts propagating from the source point with velocity $1/\tilde{P}$. The central idea behind the FMM is to visit grid points in an order consistent with the way wavefronts propagate. It leads to a single-pass algorithm for solving equation (3) and computing the minimal action map U. The FMM is a front propagation approach that computes the values of U in increasing order, and the structure of the algorithm is almost identical to Dijkstra's algorithm for computing shortest paths on graphs [8]. In the course of the algorithm, each grid point is tagged as either Alive (point for which U has been computed and frozen), Trial (point for which U has been estimated but not frozen) or Far (point for which U is unknown). The set of Trial points forms an interface between the set of grid points for which U has been frozen (the Alive points) and the set of other grid points (the Far points). This interface may be regarded as a set of fronts expanding from each source until every grid point has been reached. The key to the speed of the FMM is the use of a priority queue to quickly find the Trial point with the smallest U value. If Trial points are ordered in a min-heap data structure, the computational complexity of the FMM is $O(N \log_2 N)$, where N is the total number of grid points.
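The Alive/Trial/Far bookkeeping described above can be sketched in a few lines. The code below is a simplified illustration and not the authors' implementation: it works on a 2D grid with uniform spacing h, uses a first-order upwind update, and handles outdated heap entries by skipping them when popped. The function name `fast_marching_2d` is hypothetical.

```python
import heapq
import math

def fast_marching_2d(P, source, h=1.0):
    """First-order Fast Marching on a 2D grid; P[i][j] > 0 is the local cost."""
    n, m = len(P), len(P[0])
    INF = math.inf
    U = [[INF] * m for _ in range(n)]          # Far points hold +infinity
    alive = [[False] * m for _ in range(n)]
    U[source[0]][source[1]] = 0.0
    heap = [(0.0, source[0], source[1])]       # Trial points, keyed by U

    def frozen(i, j):
        # Value of an Alive neighbor, +infinity otherwise
        return U[i][j] if 0 <= i < n and 0 <= j < m and alive[i][j] else INF

    def update(i, j):
        a = min(frozen(i - 1, j), frozen(i + 1, j))   # upwind value, axis 1
        b = min(frozen(i, j - 1), frozen(i, j + 1))   # upwind value, axis 2
        a, b = min(a, b), max(a, b)
        f = P[i][j] * h
        if b - a >= f:                 # only one axis contributes
            return a + f
        return 0.5 * (a + b + math.sqrt(2 * f * f - (a - b) ** 2))

    while heap:
        _, i, j = heapq.heappop(heap)
        if alive[i][j]:
            continue                   # stale entry, point already frozen
        alive[i][j] = True             # freeze the smallest Trial value
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            k, l = i + di, j + dj
            if 0 <= k < n and 0 <= l < m and not alive[k][l]:
                new = update(k, l)
                if new < U[k][l]:
                    U[k][l] = new
                    heapq.heappush(heap, (new, k, l))
    return U

# Constant cost 1: U approximates the Euclidean distance to the source.
U = fast_marching_2d([[1.0] * 21 for _ in range(21)], (10, 10))
```

With a constant cost the result is exact along the grid axes and slightly overestimates the distance along diagonals, which is the expected behavior of a first-order scheme.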
A way to estimate U for a grid point $x_n$ is presented here. We limit ourselves to the 3D case. Adopting standard notation, we denote by $U_{i,j,k}$ the value of U at the grid vertex (i, j, k) associated to the point $x_n$ with coordinates $(i\,h_x, j\,h_y, k\,h_z)$, where $h_x$, $h_y$ and $h_z$ are the grid spacings in the x, y and z directions. A discretized version of (3) is solved in order to compute $U_{i,j,k}$. For the Eikonal equation, classic finite difference schemes tend to overshoot and are unstable. Rouy and Tourin [9] showed that the correct viscosity solution for $U_{i,j,k}$ is given by the following first order accurate scheme:

$$\left( \frac{\max\{ U_{i,j,k} - U_{i-1,j,k},\; U_{i,j,k} - U_{i+1,j,k},\; 0 \}}{h_x} \right)^2 + \left( \frac{\max\{ U_{i,j,k} - U_{i,j-1,k},\; U_{i,j,k} - U_{i,j+1,k},\; 0 \}}{h_y} \right)^2 + \left( \frac{\max\{ U_{i,j,k} - U_{i,j,k-1},\; U_{i,j,k} - U_{i,j,k+1},\; 0 \}}{h_z} \right)^2 = (\tilde{P}_{i,j,k})^2. \qquad (5)$$
This is an upwind scheme : the forward and backward differences are chosen to follow the direction of the flow of information.
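Solving scheme (5) for the unknown $U_{i,j,k}$ amounts to solving a quadratic equation in which only the neighbors smaller than the solution contribute. The sketch below shows one common way of doing this, under the assumption that the caller passes, for each axis, the smaller of the two neighbor values; the function name `eikonal_update` is hypothetical.

```python
import math

def eikonal_update(neighbors, spacings, cost):
    """Solve the upwind scheme (5) for U at one grid point.

    neighbors : per axis, the smaller of the two neighbor values
                (math.inf if neither is known)
    spacings  : grid spacings (hx, hy, hz)
    cost      : local value of the potential, strictly positive
    """
    terms = sorted((a, h) for a, h in zip(neighbors, spacings) if a < math.inf)
    u = math.inf
    A = B = C = 0.0
    for a, h in terms:
        if u <= a:
            break            # this axis would contribute max(u - a, 0) = 0
        # Add the axis and solve A u^2 - 2 B u + (C - cost^2) = 0
        A += 1.0 / h ** 2
        B += a / h ** 2
        C += a * a / h ** 2
        disc = max(B * B - A * (C - cost ** 2), 0.0)
        u = (B + math.sqrt(disc)) / A   # larger root: the upwind solution
    return u

one_axis = eikonal_update([0.0, math.inf, math.inf], (1.0, 1.0, 1.0), 1.0)
two_axes = eikonal_update([1.0, 1.2, 5.0], (1.0, 1.0, 1.0), 1.0)
```

In the second call the third neighbor (value 5.0) is larger than the solution 1.8, so it is ignored, exactly as the max{·, 0} terms of (5) prescribe.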
3 Valley Line Detection
In this section, we present a method to extract the intersection between the sub-domain boundary and the unknown surface of interest. We propose to use the minimal action map to extract the desired curve, since one can observe (without formal proof) that it corresponds to a valley line of the minimal action map. Ridge and valley lines are concepts used in geomorphology and computer vision [10, 11]. According to Koenderink [12], valley lines are the locus of points on a surface at which the normal curvature assumes a local minimum in the principal direction associated with the largest negative curvature. The main drawback of the existing criteria [10, 11] is that thresholding is needed. Hence, the detection is not precise enough and requires more interaction for real noisy images. Moreover, these approaches are not adapted to our case, where we want to extract the valley line of a scalar function defined on a surface topologically equivalent to a sphere. Our approach is heuristic: it relies on the fact that the fast marching front propagates faster along the desired surface, so that the minimal action map takes lower values along the curve of intersection between the domain boundary and the surface.

Discrete definition of Σ and ∂Σ and Minimal action map on ∂Σ

In practice, we assume that the volume Σ is defined as a boolean array. Then, we can partition Σ into two subsets, int(Σ) and ∂Σ, its interior and its boundary. A voxel x ∈ Σ is in the interior of the volume if all its 6 neighbors are in Σ, and it is a point of the boundary ∂Σ if x ∈ Σ \ int(Σ). Then ∂Σ is also represented by a boolean array (see figure 3).
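The discrete partition of Σ into interior and boundary can be sketched as follows. This is an illustrative sketch that stores the volume as nested Python lists and assumes voxels outside the array do not belong to Σ; the function name `split_volume` is hypothetical.

```python
def split_volume(sigma):
    """Partition a boolean volume into interior and boundary voxel sets.

    sigma : nested list sigma[i][j][k] of booleans (True = voxel in Sigma)
    """
    nx, ny, nz = len(sigma), len(sigma[0]), len(sigma[0][0])

    def inside(i, j, k):
        # Voxels outside the array are treated as not belonging to Sigma
        return 0 <= i < nx and 0 <= j < ny and 0 <= k < nz and sigma[i][j][k]

    six = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    interior, boundary = set(), set()
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if not sigma[i][j][k]:
                    continue
                if all(inside(i + a, j + b, k + c) for a, b, c in six):
                    interior.add((i, j, k))      # all 6 neighbors in Sigma
                else:
                    boundary.add((i, j, k))      # Sigma minus its interior
    return interior, boundary

# A full 5 x 5 x 5 cube: the interior is the inner 3 x 3 x 3 block.
cube = [[[True] * 5 for _ in range(5)] for _ in range(5)]
interior, boundary = split_volume(cube)
```

For the cube above, the boundary is the one-voxel-thick outer shell, which is the set on which the restriction of U is examined in the rest of this section.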
Fig. 3. Discrete representation of the volume Σ and its boundary ∂Σ. (a) The volume Σ is described by a boolean array. (b) Σ is partitioned into two subsets int(Σ) and ∂Σ such that ∂Σ is connected with respect to 26-connectivity.

Fig. 4. Minimal action map associated with the source point p and the potential of figure 1. (a) Cut views of the minimal action map U on the volume Σ. (b) View of U on ∂Σ and its valley line Γ, the frontier of the surface patch. (c) Unfolded U|∂Σ with its valley line; the marked points on Γ correspond to local minima.
Let us denote by U|∂Σ : ∂Σ → R+∗ the restriction of U to ∂Σ (see figure 4). The value U(x) for a point x in ∂Σ is the arrival time at x of the wavefront propagating from the source point p with velocity $1/\tilde{P}$. Since the potential $\tilde{P}$ takes lower values along the surface S, the front propagates faster along it. So we can reasonably assume that the first point reached by the front on ∂Σ belongs to ∂Σ ∩ S. This point is easy to detect because it is the global minimum of U|∂Σ; it is denoted $x_{min}$. More generally, each local minimum $x_m$ of U|∂Σ has been reached by the front before all other points in a small neighborhood of $x_m$. Since the wavefront propagates faster along S, one can expect the curve ∂Σ ∩ S to correspond to valley lines of U|∂Σ. For valley line detection, our approach is simple and fast. Using the function U|∂Σ, and without parametrizing the surface ∂Σ, we find the frontier Γ of the surface patch S ∩ ∂Σ by looking for the cyclic sequences of valley lines of U|∂Σ containing $x_{min}$.

Finding Valley Lines of U|∂Σ

As explained above, valley lines of U|∂Σ contain the local minima $x_m$ as well as the saddle points. A robust way to link two local minima is to detect the saddle point between them and to perform a gradient descent from it toward each of the two minima. The difficulty here is that some local minima and saddle points of U|∂Σ do not belong to the curve of interest. To avoid this, saddle points of U|∂Σ are detected in increasing order of U. During this step, we store the information in a graph G such that the vertices of the graph correspond to local minima of U|∂Σ, and an edge corresponds to a pair of valley lines joining two local minima via a saddle point. The valley line detection algorithm stops when a cycle (in the sense of a simple closed path) is detected in the graph G. However, the closed curve Γ obtained in this way tends to be short, linking only nearby local minima. In practice, one adds two ad hoc constraints which make it possible to extract the border of the surface patch in a more robust manner. The algorithm stops as soon as the global minimum $x_{min}$ of U|∂Σ belongs to the closed sequence $\ell$ and the subset of int(Σ) defined by

$$U|_{\mathrm{int}(\Sigma)}^{-1}\Big( \big]\max_{x' \in \ell} U(x'),\, +\infty\big[ \Big) = \Big\{ x \in \mathrm{int}(\Sigma) \;;\; U(x) > \max_{x' \in \ell} U(x') \Big\}$$

includes exactly two connected components for the 26-connectivity, which means that the sequence cuts the boundary ∂Σ into exactly two connected components (see figure 4).
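The basic stopping rule on the graph G (add edges in increasing order of saddle value until a cycle appears) can be sketched with a union-find structure. This is only an illustration of the cycle test: it assumes the saddles have already been detected and are given as triples (value, minimum a, minimum b), and it does not implement the two ad hoc constraints described above. The function name `first_cycle_edge` is hypothetical.

```python
def first_cycle_edge(saddles):
    """Add edges by increasing saddle value; return the edge closing a cycle.

    saddles : iterable of (value, min_a, min_b), a hypothetical input giving
              the value of U at a saddle and the two local minima it links.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for value, a, b in sorted(saddles):
        ra, rb = find(a), find(b)
        if ra == rb:
            return (value, a, b)    # a and b were already linked: cycle found
        parent[ra] = rb             # otherwise merge the two components
    return None                     # the graph is a forest, no cycle

saddles = [(0.3, "m0", "m1"), (0.5, "m1", "m2"),
           (0.9, "m2", "m0"), (0.7, "m2", "m3")]
closing = first_cycle_edge(saddles)
```

In this toy example the edge with saddle value 0.9 closes the cycle m0, m1, m2, while the dangling branch toward m3 is not part of it.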
4 Dense Network of Minimal Paths: An Implicit Approach
Once the boundary curve Γ is obtained, it is easy to construct explicitly a network of minimal paths linking points of Γ to the source point p by simple gradient descents as in [13]. The network linking Γ to p is denoted

$$N_p^\Gamma = \bigcup_{x_\Gamma \in \Gamma} C_{x_\Gamma, p}.$$

Since this network may have holes, our objective is to find a smooth function Ψ : Σ → R such that the network $N_p^\Gamma$ is included in the zero level set of Ψ, i.e. $N_p^\Gamma \subset \Psi^{-1}(\{0\})$, where $\Psi^{-1}(\{0\}) = \{x \in \Sigma \,;\, \Psi(x) = 0\}$. A necessary condition on the function Ψ is

$$\nabla\Psi(x) \cdot \nabla U(x) = 0 \qquad (6)$$

for each point x belonging to a path $C_{x_\Gamma, p}$. Thus, the vector ∇Ψ is perpendicular to ∇U along the minimal paths of the network $N_p^\Gamma$. Extending the constraint given by equation (6) to the whole domain Σ gives a sufficient condition on Ψ. Moreover, adding a linear term in Ψ smooths the solution without changing the zero level set of Ψ. Hence, if Ψ is a smooth function satisfying the following conditions:

$$\begin{aligned} (C_1)\quad & \forall x \in \Sigma, \quad \nabla\Psi(x) \cdot \nabla U(x) - \alpha\, \Psi(x) = 0, \\ (C_2)\quad & \forall x \in \Gamma, \quad \Psi(x) = 0, \end{aligned} \qquad (7)$$

where α ≥ 0, then $N_p^\Gamma \subset \Psi^{-1}(\{0\})$. Finally, $\Psi^{-1}(\{0\})$ is a dense network of minimal paths. Indeed, if Ψ satisfies conditions $(C_1)$ and $(C_2)$, then for every $x \in \Psi^{-1}(\{0\})$, the minimal path $C_{x,p}$ linking x to the source p is included in $\Psi^{-1}(\{0\})$. Detailed proofs of these results can be found in [3]. Using conditions $(C_1)$ and $(C_2)$, we look for a solution Ψ of the following Dirichlet problem:

$$\begin{cases} \nabla\Psi(x) \cdot \nabla U(x) - \alpha\, \Psi(x) = 0 & \text{if } x \in \mathrm{int}(\Sigma), \\ \Psi(x) = d_{|\partial\Sigma}(x) & \text{if } x \in \partial\Sigma, \end{cases} \qquad (8)$$
where d|∂Σ is a signed Euclidean distance to Γ on ∂Σ. This choice makes the function Ψ satisfy the second condition (C2). One can propose other boundary conditions satisfying (C2), but empirically we found that the signed distance is an adequate choice. Since Γ is a simple closed curve on ∂Σ and ∂Σ is topologically equivalent to a sphere, Γ partitions ∂Σ into two distinct open surfaces, which makes the choice of sign for d|∂Σ obvious. First, the unsigned distance from Γ on ∂Σ is calculated using the Fast Marching algorithm (this time using 26-connectivity); then different signs are attributed to the distance on each connected component of ∂Σ \ Γ (see figure 5).
Fig. 5. Transport initialization. First, the distance map from the curve Γ is computed. Then using Γ , ∂Σ \ Γ is partitioned into exactly two parts. Finally, different signs are attributed to d|∂Σ on each connected component.
Equation (8) is a stationary transport equation. The associated non-stationary PDE models the transport in time and space of material along the vector field ∇U. The stationary transport equation has been studied for surface segmentation [3], for computing tissue thickness [14] and for inpainting [15]. Like most PDEs whose characteristics intersect, the stationary transport equation (8) is numerically hard to solve. Nevertheless, the direction in which information propagates is known (−∇U), so one can design a single-pass algorithm based on an ordered sweeping of the grid points [3, 14, 15]. We propose to find the values of Ψ by exploring the points of Σ in decreasing order of |Ψ|. The algorithm, called Fast Transport, is similar to the Fast Marching algorithm: only the ordering and the local update scheme differ. The complexity of the Fast Transport algorithm is O(N log(N)). The information propagates from ∂Σ to the source point p following the direction −∇U. Thus, it is important to use an upwind scheme that takes the direction −∇U into account to approximate the derivatives of Ψ. Let us denote by $\Psi_{i,j,k}$ the value of Ψ at the point x with coordinates $(i h_x, j h_y, k h_z)$, by $\partial_d \Psi_{i,j,k}$ the derivative of Ψ along direction d (where d is the x, y or z direction) and by $\partial_d U_{i,j,k}$ the derivative of U along direction d. If $\partial_d U_{i,j,k} < 0$, the information is transported in the direction of increasing d. Thus along the x direction we have:

$$\partial_x \Psi_{i,j,k} = \begin{cases} \dfrac{\Psi_{i+1,j,k} - \Psi_{i,j,k}}{h_x} & \text{if } \partial_x U_{i,j,k} \ge 0, \\[2ex] \dfrac{\Psi_{i,j,k} - \Psi_{i-1,j,k}}{h_x} & \text{if } \partial_x U_{i,j,k} < 0. \end{cases}$$

Fig. 6. On the left and in the middle are shown, on a cut view, the function Ψ and its sign on Σ, respectively. On the right is shown the extracted surface patch, i.e. the isosurface Ψ^{-1}({0}), as well as the network of minimal paths $N_p^\Gamma$.
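Injecting the upwind differences into equation (8) gives a closed-form local update: writing $w_d = |\partial_d U_{i,j,k}| / h_d$ and $\Psi^{up}_d$ for the neighbor selected by the sign of $\partial_d U_{i,j,k}$, the equation $\sum_d w_d (\Psi^{up}_d - \Psi_{i,j,k}) - \alpha \Psi_{i,j,k} = 0$ is linear in $\Psi_{i,j,k}$. The sketch below implements this derivation; it is our reading of the scheme and not necessarily the exact update of [3], and the function name `transport_update` is hypothetical.

```python
def transport_update(psi_nb, dU, spacings, alpha):
    """Local value of Psi from its upwind neighbors, following equation (8).

    psi_nb   : per axis, pair (psi_minus, psi_plus) of neighbor values
    dU       : per axis, the derivative of U at the grid point
    spacings : grid spacings (hx, hy, hz)
    alpha    : damping coefficient, alpha >= 0
    Assumes alpha > 0 or a nonzero gradient of U, so the denominator is nonzero.
    """
    num = 0.0
    den = alpha
    for (psi_minus, psi_plus), du, h in zip(psi_nb, dU, spacings):
        # Information comes from the side where U is larger
        upwind = psi_plus if du >= 0 else psi_minus
        weight = abs(du) / h
        num += weight * upwind
        den += weight
    return num / den

# Pure transport (alpha = 0) along +x: the value is copied from the +x neighbor.
copied = transport_update(((-5.0, 2.0), (9.0, 9.0), (9.0, 9.0)),
                          (1.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.0)
# With damping alpha = 1 the transported value is attenuated.
damped = transport_update(((-5.0, 2.0), (9.0, 9.0), (9.0, 9.0)),
                          (1.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1.0)
```

For α = 0 the update is a weighted average of the upwind neighbors; for α > 0 the average is scaled down, which is the damping responsible for the exponential decay discussed next.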
The derivatives along the y and z directions are similar. The update scheme of the Fast Transport algorithm is obtained by injecting the previous equation into equation (8); see [3] for more details. Lastly, although this scheme is of relatively low precision and dissipative, it gives satisfactory results in our experiments with an acceptable convergence speed.

In our implementation, α is a parameter that can be fixed through the maximum discontinuity jump of Ψ around the source p. Indeed, by considering the minimal path $C_{x,p}$ linking a point x ∈ ∂Σ to p, parametrized on the interval J = [0, L(x)], where L(x) is the Euclidean length of the path, one can prove using equation (8) that

$$\forall s \in J, \quad \Psi\big(C_{x,p}(s)\big) = d_{|\partial\Sigma}(x)\, e^{-\alpha s}.$$

Thus the discontinuity jump occurs around the source point p and is as high as $|d_{|\partial\Sigma}(x)|\, e^{-\alpha L(x)}$. Fixing a maximum discontinuity jump ε and setting

$$\alpha = \frac{\log\Big( \max_{x \in \partial\Sigma} |d_{|\partial\Sigma}(x)| \Big) - \log(\varepsilon)}{\min_{x \in \partial\Sigma} L(x)}$$

guarantees that the discontinuity jump around the source point p is less than or equal to ε. Imposing this constraint requires computing the Euclidean length L of the minimal paths, which can easily be done during the Fast Marching propagation as explained in [16, 17]. Figure 6 shows the function Ψ solving equation (8), the final segmentation result Ψ^{-1}({0}) and the network of minimal paths.
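The choice of α can be checked numerically. The sketch below uses made-up boundary distances and path lengths (they are illustrative values, not data from the paper) and a hypothetical helper `damping_coefficient`; it verifies that every jump $|d_{|\partial\Sigma}(x)|\, e^{-\alpha L(x)}$ stays below ε.

```python
import math

def damping_coefficient(d_values, path_lengths, eps):
    """alpha = (log max|d| - log eps) / min L, as in the formula above."""
    d_max = max(abs(d) for d in d_values)
    l_min = min(path_lengths)
    return (math.log(d_max) - math.log(eps)) / l_min

# Illustrative values: signed boundary distances and minimal path lengths.
d = [-4.0, 2.5, 8.0]
L = [10.0, 12.0, 20.0]
eps = 0.01
alpha = damping_coefficient(d, L, eps)
jumps = [abs(di) * math.exp(-alpha * Li) for di, Li in zip(d, L)]
```

The bound holds because each jump is at most max|d| · exp(−α min L), which equals ε by construction, provided max|d| ≥ ε so that α ≥ 0.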
Fig. 7. We select a sub-volume from a CT cardiac image. Then an edge detector potential, inversely proportional to the gradient magnitude $\|\nabla I\|$ of the image, is shown. The Fast Marching algorithm is launched from the selected source point to compute the minimal action map U. Then the valley line of U is calculated. Finally the information is transported from the initialized values on the sub-volume boundary using the Fast Transport algorithm, and the segmentation of this surface patch is obtained by applying the marching cubes algorithm to the solution of the transport equation.

Fig. 8. On the left: segmentation of a synthetic torus. On the right: segmentation of a closed cell from an electron microscopy image. (a) Potential P taking lower values on the features of interest, on which a single source point is selected. The other points are found automatically using the approach presented in [17]. (b) A cut view of the visited domain Ω∗ showing the values of the minimal action map U. (c) A cut view of the domain Ω∗ showing the Voronoi partition. (d) The set of sources and the valley lines detected on each Voronoi cell. (e) A cut view of the domain Ω∗ showing the values of the function Ψ, solution of the transport equation (8). (f) Isosurface Ψ^{-1}({0}) on which the detected keypoints, the valley lines and the geodesic meshing are superimposed. On the right: (g-h-i) Some slices of the original image with the final segmentation Ψ^{-1}({0}) superimposed.
5 Experimental Results
Using our method, one can extract a surface patch from a single point (see figure 7). The main advantages of our method are that it is minimally interactive and fast. The important constraint is that the boundary of the selected sub-volume intersects the surface along a single closed curve. One can imagine that, by subdividing the whole domain and selecting a few points in the sub-domains that contain the surface of interest, one can extract a full segmentation of the desired object.

Recently, we presented [17] a new method for segmenting closed contours and surfaces. That method builds on a variant of the minimal path approach. First, an initial point on the desired contour is chosen by the user. Next, new keypoints are detected automatically using a front propagation approach. We assume that the desired object has a closed boundary. This a priori knowledge of the topology is used to devise a relevant criterion for stopping the keypoint detection and front propagation. The final domain visited by the front yields a band surrounding the object of interest. Using this method for 3D closed objects, we can extract a network of minimal paths from a 3D image, called a Geodesic Meshing. This network alone is not a sufficient segmentation. However, the Voronoi partition of the visited domain gives a good subdivision of it, and by applying the algorithm presented in this paper on each Voronoi cell, one can find a full segmentation of the object of interest (see figure 8).
6 Conclusion
In this paper we have proposed a new method to segment a surface patch from a single source point. Our method needs minimal interaction: a single source point. An important condition is that the boundary of the sub-volume that contains the surface patch of interest should intersect the surface along a single closed curve. By observing that this closed curve corresponds to the valley line of the arrival time from the source point, we have proposed a heuristic to extract it automatically. Finally, we adapted an existing implicit surface segmentation method to find a complete surface that contains the valley line and the network of minimal paths linking this valley line to the source point. Our approach can be extended to segment a complete surface by subdividing the domain into several sub-domains containing the desired surface patches. Then, a few points can be enough to generate a coherent object boundary segmentation.
Acknowledgements

We would like to thank Stéphane Bonneau for his contributions, and Professor Anthony J. Yezzi for interesting discussions. This work was partially supported by ANR grant SURF -NT05-2_45825.
References

1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. International Journal of Computer Vision 1, 321–331 (1988)
2. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24, 57–78 (1997)
3. Ardon, R., Cohen, L.D., Yezzi, A.: Fast surface segmentation guided by user input using implicit extension of minimal paths. Journal of Mathematical Imaging and Vision 25, 289–305 (2006)
4. Caselles, V., Kimmel, R., Sapiro, G., Sbert, C.: Minimal surfaces based object segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 394–398 (1997)
5. Cohen, L.D., Cohen, I.: Finite element methods for active contour models and balloons for 2D and 3D images. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 1131–1147 (1993)
6. Sethian, J.A.: Level Set Methods and Fast Marching Methods. Cambridge University Press, Cambridge (1999)
7. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40, 1528–1538 (1995)
8. Dijkstra, E.W.: A note on two problems in connection with graphs. Numerische Mathematik 1, 269–271 (1959)
9. Rouy, E., Tourin, A.: A viscosity solution approach to shape from shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992)
10. López, A., Lloret, D.: On ridges and valleys. In: ICPR 2000: Proceedings of the International Conference on Pattern Recognition, Washington, DC, USA, p. 4059. IEEE Computer Society, Los Alamitos (2000)
11. Tang, C.K., Medioni, G.G.: Extremal feature extraction from 3-D vector and noisy scalar fields. In: IEEE Visualization 1998, October 1998, pp. 95–102 (1998)
12. Koenderink, J.: Solid Shape. MIT Press, Cambridge (1990)
13. Ardon, R., Cohen, L.D.: Fast constrained surface extraction by minimal paths. Int. J. Comput. Vision 69(1), 127–136 (2006)
14. Yezzi, A., Prince, J.L.: An Eulerian PDE Approach for Computing Tissue Thickness. IEEE Transactions On Medical Imaging 22, 1332–1339 (2003)
15. Bornemann, F., Marz, T.: Fast image inpainting based on coherence transport. JMIV 28(3), 259–278 (2007)
16. Cohen, L.D., Deschamps, T.: Segmentation of 3D tubular objects with adaptive front propagation and minimal tree extraction for 3D medical imaging. Computer Methods in Biomechanics and Biomedical Engineering 10, 289–305 (2007)
17. Benmansour, F., Cohen, L.D.: Fast object segmentation by growing minimal paths from a single point on 2D or 3D images. Journal of Mathematical Imaging and Vision 33(2), 209–221 (2009)
Optimization of Convex Shapes: An Approach to Crystal Shape Identification

Timo Eirola and Toni Lassila

Helsinki University of Technology, Institute of Mathematics, P.O. Box 1100, FI-02015 TKK, Finland
[email protected]

Abstract. We consider a shape identification problem of growing crystals. The shape of the crystal is to be constructed from a single interferometer measurement. This is an ill-posed inverse problem. The forward problem of interferogram from shape is injective if we restrict the problem to convex shapes with known boundary. The problem is formulated as a shape optimization problem. Our aim is to solve this numerically using the gradient descent method. In the numerical computations of this paper we study the behavior of the approach in simplified cases. Using H^1-gradients (inner products) acts as a regularization method. Methods for enforcing the convexity of shapes are discussed.
1 Introduction
Shape optimization is a field of mathematical optimization concerned with finding the shape (bounded open set with Lipschitz boundary) that minimizes a given cost functional. Boundary variational techniques can be used to compute sensitivities of functionals with respect to shape. Comprehensive texts on the topic of shape analysis include [1] and [2]. We consider a shape identification problem of finding the shape of a growing ³He crystal that best fits the interferogram produced in a Fabry-Pérot interferometer. Based on physical principles it is assumed that the crystal shape is convex at all times. For an overview of the growth process of ³He crystals and the interferometer setup, see [3]. The restriction to convex shapes can be used as a simplification tool in shape optimization problems. In [4] the authors showed the existence of solutions to very generic shape optimization problems with the constraint that the shapes were convex. In our problem of determining shape from interferogram the operator solving the forward problem is generally not injective if the shapes are allowed to be nonconvex. We prove that if the convexity assumption holds and the height of the shape at the boundary of the computational domain is known then the shape identification problem does have a unique solution.
This work has been supported by the Academy of Finland (decision number 107290/04). We would like to thank Heikki Junes from the Low Temperature Laboratory at TKK for his input and introducing us to this problem.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 660–671, 2009. © Springer-Verlag Berlin Heidelberg 2009
It has been previously noted that the convexity constraint can be difficult to handle in numerical computations, especially in higher dimensions. It is known that pointwise conditions, such as curvature conditions, can fail to guarantee convexity for functions sampled at discrete points. For further discussion on this point, see [5]. Methods for optimization in the family of convex functions have been previously studied in [5, 6, 7, 8, 9]. In contrast to most of these approaches we do not write a strict convex constraint system, but instead use a penalization method that allows convexity to be temporarily broken when it is beneficial to the convergence of the iteration. The shape identification problem is solved using level set methods and gradient descent for shapes. Methods for convexification by evolution equations, such as the level set method, have been previously considered in [10, 11]. As is typical for ill-posed inverse problems, the presence of experimental noise in the measurements requires some type of regularization. We demonstrate that using H 1 -gradients (inner products) for the shape gradients acts as a form of regularization.
2 Shapes and Shape Evolution

2.1 Representing Shapes
We first define the notation. The computational domain $D \subset \mathbb{R}^d$, $d \in \{1, 2\}$, is a convex bounded open set. We consider convex shapes (open sets with Lipschitz boundary) $\Omega \subset D \times \mathbb{R}_+$ that are supported by $D$ from below, that is to say

$$\langle n(x), e_3 \rangle < 0 \implies x_3 = 0, \tag{1}$$
where $n$ is the outward normal vector field on the surface $\partial\Omega$. A convex shape $\Omega$ supported by $D$ can be represented in many ways. One is to give a Lipschitz function $\phi : D \times \mathbb{R}_+ \to \mathbb{R}$ such that

$$\Omega = \{x : \phi(x) < 0\}, \qquad \Omega^c = \{x : \phi(x) \ge 0\} \tag{2}$$

and $|\nabla\phi|$ is nonvanishing on $\partial\Omega$. Then $\phi$ is called an implicit function or a level set function for $\Omega$. An alternative representation of $\Omega$ is with a function $u : D \to \mathbb{R}_+$ defined as

$$u(x_1, x_2) = \sup\{x_3 \ge 0 : \phi(x_1, x_2, x_3) \le 0\}, \tag{3}$$

where $\phi$ is an implicit function for $\Omega$. We call this the height function of $\Omega$. Note that if $\Omega$ is convex then $u$ is concave. Denote by $C \subset H^1(D) \cap C(D)$ the set of concave functions on $D$ that are continuous on $D$. We also define $C_h \subset C$ as the subset of concave functions that are equal to $h$ on the boundary $\partial D$ for a given function $h : \partial D \to \mathbb{R}_+$.
T. Eirola and T. Lassila

2.2 Level Set Methods
Consider an initial shape $\Omega_0$ and an evolution of its boundary $\partial\Omega_0$ under a smooth velocity field $v(x, t)$. When the shape $\Omega(t)$ at time $t$ is represented by an implicit function $\phi(\cdot, t)$, we have an Eulerian representation of the evolution of the implicit function in time,

$$\phi_t(x, t) + v_n(x, t)\,|\nabla\phi(x, t)| = 0, \tag{4}$$
where vn is the component of v in the outward normal direction of ∂Ω. This is called a level set equation. Level set methods are a generic framework of nonlinear hyperbolic-parabolic PDEs for implicit functions that can be used to model evolution of shapes under certain types of flows. For a generic introduction into level set methods, see [12]. For a survey of level set methods specifically in inverse problems, see [13].
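As an illustration of equation (4), the following is a minimal sketch of one explicit time step in 1-d, assuming a uniform grid and a first-order Godunov upwind approximation of the gradient magnitude. The function name `level_set_step`, the edge padding at the boundary, and the choice of scheme are assumptions made for this example; they are not taken from the paper, whose computations rely on the toolbox [20].

```python
import numpy as np

def level_set_step(phi, vn, dx, dt):
    """One explicit first-order upwind step of phi_t + vn * |phi_x| = 0 in 1-d.

    phi : level set function sampled on a uniform grid
    vn  : normal velocity (scalar or array of the same shape as phi)
    The time step should satisfy the CFL condition dt * max|vn| <= dx.
    """
    # One-sided differences; edge padding makes the outermost differences zero.
    padded = np.pad(phi, 1, mode="edge")
    d_minus = (padded[1:-1] - padded[:-2]) / dx   # backward difference
    d_plus = (padded[2:] - padded[1:-1]) / dx     # forward difference

    # Godunov approximation of |phi_x|, selected by the sign of vn.
    grad_pos = np.sqrt(np.maximum(np.maximum(d_minus, 0.0) ** 2,
                                  np.minimum(d_plus, 0.0) ** 2))
    grad_neg = np.sqrt(np.maximum(np.minimum(d_minus, 0.0) ** 2,
                                  np.maximum(d_plus, 0.0) ** 2))

    return phi - dt * (np.maximum(vn, 0.0) * grad_pos
                       + np.minimum(vn, 0.0) * grad_neg)
```

With a positive `vn` the region where `phi < 0` grows, which matches the convention in (2) that the shape is the negative sublevel set and that $v_n$ is measured along the outward normal.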
3 Shape Optimization

3.1 Shape Derivatives
Let $J(\Omega) : \Sigma \to \mathbb{R}$ be a shape functional defined on some family of admissible shapes $\Sigma$. The derivative with respect to shape at $\Omega_0$ in the direction of the smooth velocity field $v$ is defined as the limit

$$dJ(\Omega_0; v) = \lim_{t \to 0^+} \frac{J(\Omega_t) - J(\Omega_0)}{t} \tag{5}$$

when it exists. With some general assumptions (see Chap. 8 of [1] for details) this expression is bounded and linear with respect to $v$, and has support only on the boundary of $\Omega_0$:

$$dJ(\Omega_0; v) = \int_{\partial\Omega_0} D \cdot v_n \, dS. \tag{6}$$

Using the shape derivative (6) the shape functional can be expanded as

$$J(\Omega_t) = J(\Omega_0) + t \cdot dJ(\Omega_0; v) + o(t). \tag{7}$$

For a given Hilbert space $H(\partial\Omega_0)$ we look for the unique function $\nabla_S J \in H(\partial\Omega_0)$ such that

$$dJ(\Omega_0; v) = \langle \nabla_S J, v_n \rangle_H. \tag{8}$$

Then $\nabla_S J$ is the shape gradient of $J$ with respect to the chosen inner product. If the normal velocity field $v_n$ is chosen to be the negative shape gradient, $v_n = -\nabla_S J(\Omega_0)$, we have

$$J(\Omega_t) = J(\Omega_0) - t \cdot \|dJ(\Omega_0)\|^2_{H(\partial\Omega_0)} + o(t) < J(\Omega_0) \tag{9}$$
for sufficiently small t > 0. This is the method of gradient descent for shape optimization. The negative gradient flow can be efficiently implemented with numerical level set methods.
3.2 Convexity Constraints
To obtain level set methods that preserve the convexity of the shape we follow the basic idea of constrained gradient descent. Let $G(\Omega)$ be a shape constraint functional. We consider the constrained shape optimization problem

$$\min_{\Omega} J(\Omega) \tag{10}$$

subject to $G(\Omega) = 0$. Then if $J$ and $G$ are shape differentiable and there exist shape gradients $\nabla_S J$ and $\nabla_S G$, we let $\mu$ be a Lagrange multiplier and obtain the necessary conditions for a constrained minimum

$$\nabla_S J(\Omega) + \mu \nabla_S G(\Omega) = 0, \tag{11}$$

$$G(\Omega) = 0. \tag{12}$$
A $C^2$ shape in the plane is convex if the curvature of its boundary is nonnegative. In three dimensions a sufficient condition for convexity is that both principal curvatures of the surface are nonnegative. Let $\Omega$ be a convex shape with the height function $u$. Then the minimum curvature $k_1$ of the surface is given by

$$k_1 = -\frac{u_{x_1 x_1} + u_{x_2 x_2} + \sqrt{(u_{x_1 x_1} - u_{x_2 x_2})^2 + (2u_{x_1 x_2})^2}}{\sqrt{1 + u_{x_1}^2 + u_{x_2}^2}}. \tag{13}$$

This follows from taking the smaller eigenvalue of the matrix representation of the second fundamental form. We extend $k_1$ to all of $D \times \mathbb{R}_+$ by setting

$$k_1(x_1, x_2, x_3) = k_1(x_1, x_2, u(x_1, x_2)) \quad \text{for all } x_3 \ge 0. \tag{14}$$

Let $\Omega$ be supported by $D$ and define $\tilde{k} := k_1 \sqrt{1 + |\nabla u|^2}$. We use the constraint functional

$$G(\Omega) = \int_{\partial\Omega} u(x) \max\{0, -k_1(x)\} \, dS. \tag{15}$$

This functional vanishes if and only if $k_1$ is everywhere nonnegative. The scaling by $u$ is shown to be useful by the following computation. We reformulate the functional in terms of a change of integrals from $\partial\Omega$ to $D$. Then:

$$G(\Omega) = \int_{\partial\Omega} \frac{u}{\sqrt{1 + |\nabla u|^2}} \max\{0, -\tilde{k}\} \, dS = \int_D \max\{0, -u\tilde{k}\} \, dx_1 \, dx_2 = \int_D \int_0^{u(x_1, x_2)} \max\{0, -\tilde{k}\} \, dx_3 \, dx_2 \, dx_1 = \int_\Omega \max\{0, -\tilde{k}\} \, dx.$$

According to Theorem 4.2 of Chap. 8 in [1] this functional has the $L^2$ shape gradient

$$\nabla_S G = \max\{0, -\tilde{k}\}. \tag{16}$$
We obtain the penalty function formulation for the level set equation (4) with a convexity constraint,

$$\phi_t + \left(v_n - \mu \max\{0, -\tilde{k}\}\right) |\nabla\phi| = 0, \tag{17}$$

with a penalty parameter $\mu > 0$. This method is a version of the min/max curvature flows studied in [14], since

$$\phi_t + v_n |\nabla\phi| = \mu \min\{0, k_1\} \, |\nabla\phi|. \tag{18}$$
Furthermore, the minimum curvature flow will convexify the initial shape, justifying our choice of the constraint functional (15). The following theorem was proven in [11]: Theorem 1. In the case that vn ≡ 0, the viscosity solution of the equation (17) converges towards the convex hull of the initial shape Ω0 as t → ∞.
4 A Problem in $^3$He Crystal Imaging

4.1 Fabry-Pérot Interferometer Measurement of a Crystal
The formation of faceted crystals in low-temperature $^3$He has been the subject of study in the low temperature physics community. It is known that at temperatures below 200 mK smooth facets appear that correspond to orientations of the lattice planes. The problem of predicting which facets appear at which temperature is still open. It is known that as the temperature is increased past the so-called roughening limit the facets become rounded out and no longer appear. The theoretical roughening limit is much higher than what has been observed in practical experiments. We consider an experimental setup where liquid $^3$He at a temperature below 200 mK is placed between the two plates of a Fabry-Pérot interferometer. Overpressure is then exerted to allow crystals to form. As light passes through the crystals, a diffraction pattern is observed on a CCD imaging array. By relating the intensity of the interferogram to the phase delay through the crystal at each point we can determine the shape of the crystal and the orientation of all the facets.

4.2 Convexity of Crystals and the Growth Process
The growth of crystals is governed by three principal forces: the external work done to the system by the driving overpressure, the surface tension between the liquid and solid Helium, and gravity. When the crystal growth process is sufficiently slow we can assume that at each measurement the crystal has achieved thermal equilibrium. The crystal shape is then determined by minimizing a surface energy. This leads to an anisotropic mean curvature flow that models the growth process of crystals [15]. It is known that such flows preserve convexity of the shapes [16]. We therefore assume that, apart from small irregularities, the thermal equilibrium shape is also convex. This assumption has been verified in experimental measurements.
4.3 Inverse Problem of Shape from Interferogram
Let $D = [0, 1]^2$ be the domain of the interferogram and $f : D \to \mathbb{R}$ a function that gives the intensity of the interference pattern at each point on the CCD. The physical parameters are $\Delta n_{sl}$, the difference between the refractive indices of the solid and liquid $^3$He; $\lambda$, the interferometer laser wavelength; and $a(x)$, the amplitude. The intensity of the interference pattern at each point is given approximately by

$$F(u)(x) = a(x)\, \varphi\!\left(\frac{\Delta n_{sl}}{\lambda}\, u(x)\right) = f(x), \tag{19}$$

where $\varphi : \mathbb{R} \to [-1, 1]$ is a continuously differentiable piecewise strictly monotone waveform function. Note that this definition forbids square or sawtooth type waveforms. To simplify things we assume the laser amplitude to be almost constant and known, $a(x) \approx a$. The inverse problem to be solved is: given an interferogram $f \in L^2(D)$ of measured intensities (with noise), deduce the shape of the crystal $\Omega$. This problem can be posed as a mathematical shape optimization problem. Let $\Omega$ be a convex trial shape supported by $D$. Denote the bottom part of the surface of the shape as $\Gamma_b := \partial\Omega \cap D$. We consider the shape functional with the $L^2$-norm

$$J(\Omega) = \frac{1}{2} \int_{\partial\Omega \setminus \Gamma_b} |\varphi(x_3) - Sf(x_1, x_2)|^2 \, dS, \tag{20}$$
where $\varphi$ is a continuously differentiable and piecewise strictly monotone function and $S : L^2(D) \to H^1(D)$ is a smoothing operator. The corresponding mathematical shape optimization problem is then

$$\min_{\Omega \in \Sigma^{\mathrm{convex}}_{\Gamma_b}} J(\Omega), \tag{21}$$

where $\Sigma^{\mathrm{convex}}_{\Gamma_b}$ is a family of convex shapes with $\Gamma_b$ fixed. The choice of this family will be discussed later. We have the following existence theorem from [4]:

Theorem 2. Let $f$ be such that $Sf$ is continuous. Then the shape optimization problem (21) has at least one solution.

4.4 Is the Inverse Problem Uniquely Solvable?
It is possible to construct examples that show that in the absence of a convexity constraint the inverse problem of finding the shape $\Omega$ from its interference pattern $f$ is not uniquely solvable, even when we set a perimeter constraint such as requiring $\Gamma_b$ to be fixed. But if we require convexity and fix $u$ on the boundary $\partial\Gamma_b$, we have the following result:

Theorem 3. Let $D \subset \mathbb{R}^d$ be a bounded convex open set and $\Gamma$ its boundary. Fix a function $h \in C(\Gamma)$ on the boundary. Let $C_h$ be the family of concave functions $u : D \to \mathbb{R}$ in $C(D)$ such that $u|_\Gamma = h$. Let the operator $F : H^1(D) \to H^1(D)$ be defined as

$$(Fu)(x) = \varphi(u(x)), \tag{22}$$

where $\varphi$ is a continuously differentiable and piecewise strictly monotone function. Then the restriction of $F$ to $C_h$ is injective.

Proof. Case $d = 1$. Let $u, v : [a, b] \to \mathbb{R}$ be distinct concave functions such that $u(a) = v(a)$, $u(b) = v(b)$, and $\varphi(u) \equiv \varphi(v)$. Let $(\xi, \eta) \subset [a, b]$ be any open interval where $u \neq v$ but $u(\xi) = v(\xi)$ and $u(\eta) = v(\eta)$. Without loss of generality we assume $u > v$ on $(\xi, \eta)$. Since $\varphi$ is continuously differentiable and $\varphi(u) = \varphi(v)$ with $u \neq v$ on $(\xi, \eta)$, $\varphi$ is not injective in any neighborhood of $u(\xi)$, and from the inverse function theorem it follows that $\varphi'(u(\xi)) = 0$. From the assumption that $\varphi$ is piecewise strictly monotone it follows that $\varphi'$ has only isolated zeros. Thus the local behavior of $\varphi$ near $u(\xi)$ can be of only two types, a) or b), as shown in Fig. 1.
Fig. 1. The different kinds of possible local behavior of the function ϕ(u) near a bifurcation point ξ (panels a and b, each showing ϕ against u, v around u(ξ) = v(ξ))
Since $u$ is concave there exists an interval $(\xi, \xi + \varepsilon)$ where it is either constant, increasing, or decreasing:

1. If $u$ were constant in some interval $(\xi, \xi + \varepsilon)$ then so would be $\varphi(v)$. But because $\varphi'$ cannot vanish throughout any neighborhood of $u(\xi)$, this would mean that $v$ would also be constant in $(\xi, \xi + \varepsilon)$, a contradiction. So neither $u$ nor $v$ can be locally constant past the bifurcation point $\xi$.
2. Assume that $u$ is increasing in some interval $(\xi, \xi + \varepsilon)$ and the local behavior of $\varphi$ is as in a). Then $v$ must be decreasing in $(\xi, \xi + \varepsilon)$.
3. Assume that $u$ is decreasing in some interval $(\xi, \xi + \varepsilon)$ and the local behavior of $\varphi$ is as in b). But since $u > v$, case b) is impossible.

Thus immediately after the bifurcation point $\xi$ we must have $u$ increasing and $v$ decreasing. Using the same argument at $\eta$ we get that $u$ must be decreasing and $v$ increasing in some interval $(\eta - \varepsilon, \eta)$. But $v$ is concave and cannot be first decreasing and later increasing, a contradiction.
Case $d \ge 2$. For every pair of points $x, y \in \Gamma$ we take the line segment $L$ connecting $x$ to $y$ and look at the restrictions $u|_L$, $v|_L$, which are concave functions of one variable. Since $u$ and $v$ coincide on all such segments $L$, they are equal everywhere.

We remark that when the measurement is noisy we can lose the uniqueness of the solution. This is due to the fact that the range of the forward operator $F$ is nonconvex, and thus if the measurement $f$ lies outside the range of $F$ the minimization problem (21) can have multiple solutions.

4.5 Formulation for the $H^1$-Variation of a Shape Functional
To solve optimization problem (21) using the gradient descent method we must find the shape gradient of the functional given by (8). While the gradient could be computed only in the $L^2$ inner product, we prefer the $H^1$ inner product since the resulting gradients are smoother and hopefully also lead to a numerically more robust algorithm. The need for regularizing the shape variations is well-established in the literature, but the relation with regularization of ill-posed inverse problems perhaps less so. The effect of different inner products on the convergence of the gradient descent iteration was studied in more detail in [17].

Lemma 1. Consider the shape functional for $d$-dimensional convex shapes $\Omega \subset D \times \mathbb{R}_+$:

$$J(\Omega) = \int_{\partial\Omega \setminus \Gamma_b} g(x, n) \, dS, \tag{23}$$

where $g(x, n)$ is $H^1$ with respect to both arguments. Then $J$ is shape differentiable and the shape derivative $dJ(\Omega; v)$ with respect to a normal variation $v_n \in H^1_0(D)$ is given by

$$dJ(\Omega; v) = \int_D \left[ -\nabla_n g \cdot \nabla v_n + (\nabla_x g \cdot n + \kappa g)\, v_n \right] |F| \, d\xi, \tag{24}$$

for all $v_n \in H^1_0(D)$, where $|F| := \sqrt{1 + |\nabla u|^2}$ is the change of integrals term given by $u$, the height function of the convex shape.

Proof. The details are given for example in [18]. Here we reproduce only the general procedure. Let $\Omega$ be given and $\phi$ its implicit function. Then the coarea formula [19] gives

$$J(\Omega) = \int_{\partial\Omega \setminus \Gamma_b} g(x, n) \, dS = \int_{\mathbb{R}^d} g\!\left(x, \frac{\nabla\phi}{|\nabla\phi|}\right) |\nabla\phi| \, \delta(\phi) \, \mathbb{1}_{\Gamma_b^c} \, dx.$$

The variation can now be performed in terms of $\phi$. Let $v_n = -\psi/|\nabla\phi|$ be an extension velocity field to the entire $\mathbb{R}^d$ such that $\psi|_{\Gamma_b} \equiv 0$, i.e. the base remains fixed. The Gâteaux derivative is, after some computations, given by

$$dJ(\Omega; v) = \frac{d}{d\tau} J(\phi + \tau\psi)\Big|_{\tau = 0} = -\int_{\mathbb{R}^d} \frac{\psi}{|\nabla\phi|} \, \nabla \cdot \left( \nabla_n g + g \frac{\nabla\phi}{|\nabla\phi|} \right) |\nabla\phi| \, \delta(\phi) \, dx.$$
Integration by parts gives

$$\int_{\partial\Omega} \nabla \cdot (\nabla_n g) \, v_n \, dS = -\int_{\partial\Omega} \nabla_n g \cdot \nabla v_n \, dS$$

and the result follows by using the coarea formula in the other direction and noting that $n = \nabla\phi/|\nabla\phi|$ and $\kappa = \nabla \cdot n$ is the mean curvature of $\partial\Omega$.

We can thus compute the negative shape gradient of $J$ with respect to the $H^1$ inner product as the solution $w \in H^1_0(D)$ of the elliptic equation

$$\int_D (\nabla v_n \cdot \nabla w + v_n w) \, d\xi + \int_D (\alpha v_n + \beta \cdot \nabla v_n) \, d\xi = 0 \quad \text{for all } v_n \in H^1_0(D), \tag{25}$$

where $\alpha = |F| (\nabla_x g \cdot n + \kappa g)$ and $\beta = -|F| \nabla_n g$ as in Lemma 1, plus homogeneous Dirichlet boundary conditions. For the convex constrained iteration it is also beneficial to use the $H^1$-gradient of the constraint functional (15), which can be obtained by the same procedure from (16).
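To make equation (25) concrete, here is a minimal sketch for the 1-d case $D = [0, 1]$. Integrating (25) by parts gives the strong form $-w'' + w = -\alpha + \beta'$ with $w = 0$ at both end points, which is discretized below with second-order central differences on a uniform grid. The function name `h1_gradient_1d`, the dense linear solve, and the use of `np.gradient` for $\beta'$ are choices made for this illustration only and are not taken from the paper.

```python
import numpy as np

def h1_gradient_1d(alpha, beta, h):
    """Solve -w'' + w = -alpha + beta' on a uniform 1-d grid, w = 0 at both ends.

    alpha, beta : samples of the coefficients in (25) at all grid nodes
    h           : grid spacing
    Returns the negative H^1 shape gradient w at all grid nodes.
    """
    n = alpha.size
    rhs = -alpha + np.gradient(beta, h)   # right-hand side of the strong form

    # Tridiagonal matrix of (-d^2/dx^2 + I) on the n - 2 interior nodes.
    m = n - 2
    A = (np.diag(np.full(m, 2.0 / h**2 + 1.0))
         + np.diag(np.full(m - 1, -1.0 / h**2), 1)
         + np.diag(np.full(m - 1, -1.0 / h**2), -1))

    w = np.zeros(n)                        # homogeneous Dirichlet values
    w[1:-1] = np.linalg.solve(A, rhs[1:-1])
    return w
```

Because the operator $(I - \partial_{xx})^{-1}$ damps high-frequency components of the right-hand side, an oscillatory $L^2$ gradient yields a much smoother descent direction, which is the regularizing effect discussed in this section.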
5 Numerical Experiments

5.1 Methodology
As a first approach to optimization of convex shapes we limit the numerical experiments to 1-d and choose $D = [0, 1]$. The questions to be answered are:

– Does the convexity constraint penalty term improve the quality of the recovered shapes?
– We would like to estimate the tensor of anisotropy of the mean curvature flow that drives the crystal formation process. Can reasonable estimates for the curvatures be obtained from the recovered shapes?

The quality of the recovered shapes was studied with two different crystal profiles (shown in Fig. 2). Case A represents a faceted crystal, while Case B is a smooth profile. For the forward model we used a sinusoidal waveform, $f(x) = \sin(\gamma u(x))$. To measure the error of the recovered shapes we generated a testing sample of 100 noisy realizations of the data $f$, each with 10% standard deviation, and took the mean $L^2$-error over this sample set. At each descent step the shape derivative (24) was computed. The $H^1$-gradient was solved from equation (25). The normal velocity field was extended to the entire computational domain and the resulting level set evolution was solved using the Level set method toolbox [20] for Matlab. The gradient descent step size was chosen according to the Armijo rule [21] to obtain decreasing steps in the functional (20). The iteration was stopped when the recovered height function $u$ changed less than 0.1% in the $L^2$-norm during the previous step. For the convex constrained iteration (17) we used a penalty parameter value of $\mu = 10^5$.
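The step size selection by the Armijo rule can be sketched as a generic backtracking loop. This is only an illustration for a finite-dimensional objective: the function name `armijo_step` and the parameter values `t0`, `c`, and `shrink` are assumptions, since the text does not state which constants were used.

```python
import numpy as np

def armijo_step(f, x, grad, direction, t0=1.0, c=1e-4, shrink=0.5, max_halvings=50):
    """Backtracking line search satisfying the Armijo sufficient-decrease condition.

    f         : objective, maps an array to a float
    x         : current iterate
    grad      : gradient of f at x
    direction : descent direction (e.g. the negative gradient)
    Returns a step size t with f(x + t*direction) <= f(x) + c*t*<grad, direction>,
    or 0.0 if no such step is found within max_halvings reductions.
    """
    fx = f(x)
    slope = float(np.dot(grad, direction))   # directional derivative, negative for descent
    t = t0
    for _ in range(max_halvings):
        if f(x + t * direction) <= fx + c * t * slope:
            return t
        t *= shrink                           # reduce the step and try again
    return 0.0
```

In the shape setting, `f` would correspond to evaluating the functional (20) after evolving the level set function for a pseudo-time `t` along the chosen normal velocity.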
Fig. 2. Left: True crystal shape (solid line) and initial guess (dashed line) for the test Case A. Right: Same for Case B. (Both panels: horizontal axis from 0 to 1, vertical axis in units of 10^-4.)
5.2 Choosing the Smoothing Operator S

To construct the smoothing operator $S$ in (20) we considered linear diffusion operators of the form

$$(Sf)(x_i) = (I - \delta D_{xx})^{-K} f(x_i), \qquad K \in \mathbb{N}, \tag{26}$$

where $D_{xx}$ is an operator giving the discrete approximation of the second derivative of $f$ at $x_i$. The simplest choice is the symmetric difference approximation for the second derivative (in the 1-d case)

$$D_{xx} f(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{\Delta x^2}. \tag{27}$$
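A minimal sketch of the operator (26) with the stencil (27) is given below. The treatment of the two end points is not specified in the text; the mirrored (homogeneous Neumann) boundary rows used here are an assumption, chosen so that constant signals are left unchanged. The function name `smooth` and the dense solve are likewise illustrative choices.

```python
import numpy as np

def smooth(f, delta, K, dx):
    """Apply (I - delta * D_xx)^(-K) to the samples f on a uniform 1-d grid.

    D_xx is the symmetric second difference (27); the end points use a mirrored
    ghost value (an assumption of this sketch, not stated in the paper).
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    D = np.zeros((n, n))
    for i in range(1, n - 1):
        D[i, i - 1:i + 2] = [1.0, -2.0, 1.0]   # interior stencil of (27)
    D[0, :2] = [-2.0, 2.0]                      # mirrored ghost node at the left end
    D[-1, -2:] = [2.0, -2.0]                    # mirrored ghost node at the right end
    D /= dx**2

    A = np.eye(n) - delta * D
    out = f.copy()
    for _ in range(K):                          # K implicit diffusion steps
        out = np.linalg.solve(A, out)
    return out
```

Each application of `A`'s inverse is one implicit diffusion step of size `delta`, so `K = 0` returns the data unchanged and larger `K` gives progressively stronger smoothing.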
This difference approximation tends to smooth out especially the corners of $f$, so that for faceted profiles we should choose $K$ moderately small. We chose $\delta = 0.01$ and considered the cases $K = 0$ (no smoothing) and $K = 100$ (with smoothing).

5.3 Comparison of Convergence with and without the Convexity Constraint
The first observation we made was that the $L^2$-gradient descent iteration in general does not work at all. The computed boundary variations were too oscillatory. After an $H^1$-gradient was implemented the regularization was enough to provide local convergence from an initial guess that had 15%-20% relative $L^2$-error. In Table 1 we list the accuracy of the obtained shapes by the relative $L^2$-error from the true crystal shape. We note that in both cases the recovered solutions were roughly within 3% of relative error. This remained the case even with convexity constraints and smoothing of the data. The sharp corner of Case A also produced more error than the smooth profile of Case B.

5.4 Estimating the Curvature(s) of the Crystal Surface
One way of evaluating the quality of the recovered crystal shapes is to see if useful estimates for the curvature(s) of the crystal surface can be obtained. We ran both the unconstrained and convex constrained iterations for Case A. We also tested the effect of increasing $K$ in the smoothing operator (26). The obtained curvatures are plotted in Fig. 3. In this case the curvature should be zero almost everywhere with a singularity at one point. None of the curvature estimates are free from numerical artifacts. The convex constrained solution gives curvatures that are nearly nonnegative everywhere. The effect of added smoothing is to dampen the oscillations of the recovered curvatures.

Table 1. Relative L2-error from the true profile u obtained by the unconstrained (μ = 0) and convex constrained (μ = 10^5) iterations with and without smoothing

Case   No smoothing, μ = 0   No smoothing, μ = 10^5   With smoothing, μ = 0   With smoothing, μ = 10^5
A      1.71 %                2.61 %                   1.98 %                  2.63 %
B      0.47 %                0.51 %                   0.47 %                  0.61 %

Fig. 3. Estimated curvatures for the Case A obtained with the unconstrained and convex constrained iterations, with and without smoothing of the data. The true curvature is denoted by a dashed line. (Panels: μ = 0, no smoothing; μ = 0, with smoothing; μ = 10^5, no smoothing; μ = 10^5, with smoothing.)
6 Conclusions

The inverse problem of crystal shape identification from a single interferogram is uniquely solvable if the shape is required to be convex and we have boundary data available. Numerical level set methods can be used to solve such problems with the gradient descent method. We added a penalty term to enforce convexity of the shapes. By choosing $H^1$ shape gradients we introduced regularization to the problem. This allowed recovery of solutions of the otherwise ill-posed problem. We demonstrated that local convergence is obtained even when relatively large amounts of noise are present in the interferogram. The convex penalty term improved the quality of the recovered surface curvatures.
References

1. Delfour, M., Zolésio, J.P.: Shapes and geometries - analysis, differential calculus, and optimization. SIAM, Philadelphia (2001)
2. Sokolowski, J., Zolésio, J.P.: Introduction to shape optimization: shape sensitivity analysis. Springer, Heidelberg (2003)
3. Tsepelin, V., Alles, H., Babkin, A., Jochemsen, R., Parshin, A., Todoshchenko, I., Tvalashvili, G.: Morphology and growth kinetics of 3He crystals below 1 mK. J. Low Temp. Phys. 129(5-6), 489–530 (2002)
4. Buttazzo, G., Guasoni, P.: Shape optimization problems over classes of convex domains. J. Convex Anal. 4(2), 343–351 (1997)
5. Aguilera, N., Morin, P.: Approximating optimization problems over convex functions. Numer. Math. 111(1), 1–34 (2008)
6. Carlier, G., Lachand-Robert, T.: Convex bodies of optimal shape. J. Convex Anal. 10, 265–273 (2003)
7. Carlier, G., Lachand-Robert, T., Maury, B.: A numerical approach to variational problems subject to convexity constraint. Numer. Math. 88, 299–318 (2001)
8. Carlier, G., Lachand-Robert, T., Maury, B.: H1-projection into set of convex functions: A saddle point formulation. In: ESAIM: Proc., vol. 10, pp. 277–290 (2001)
9. Lachand-Robert, T., Oudet, É.: Minimizing within convex bodies using a convex hull method. SIAM J. Optim. 16(2), 368–379 (2005)
10. Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian for convexification and denoising. Comput. 76, 109–133 (2006)
11. Vese, L.: A method to convexify functions via curve evolution. Commun. Partial Differential Equations 24(9), 1573–1591 (1999)
12. Osher, S., Fedkiw, R.: Level set methods and dynamic implicit surfaces. Applied Mathematics Sciences, vol. 153. Springer, Heidelberg (2002)
13. Burger, M., Osher, S.: A survey on level set methods for inverse problems and optimal design. European Journal of Applied Mathematics 16(2), 263–301 (2005)
14. Malladi, R., Sethian, J.: Image processing: flows under min/max curvature and mean curvature. Graph. Models Image Process. 58(2), 127–141 (1996)
15. Wettlaufer, J., Jackson, M., Elbaum, M.: A geometric model for anisotropic crystal growth. J. Phys. A 27, 5957–5967 (1994)
16. Bellettini, G., Caselles, V., Chambolle, A., Novaga, M.: Crystalline mean curvature flow of convex sets. Arch. Ration. Mech. Anal. 179, 109–152 (2005)
17. Burger, M.: A framework for the construction of level set methods for shape optimization and reconstruction. Interfaces Free Bound 5, 301–329 (2003)
18. Solem, J.: Variational problems and level set methods in computer vision - theory and applications. PhD thesis, Lund University (2006)
19. Federer, H.: Geometric measure theory. Springer, New York (1969)
20. Mitchell, I.: The flexible, extensible and efficient toolbox of level set methods. J. Sci. Comput. (2007) (online first)
21. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1) (1966)
An Implicit Method for Interpolating Two Digital Closed Curves on Parallel Planes

Nikolaos Gabrielides and Laurent Cohen

Centre de Recherche en Mathématique de la Décision, Université Paris IX, Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France
[email protected], [email protected]

Abstract. Ardon et al. [2] presented an implicit method for surface segmentation in 3D images. The boundary of the surface is assumed to be constrained by two given curves in the image. In this work we adapt the aforementioned approach to interpolate two given digital curves lying on parallel planes, by introducing an artificial image potential, which is based on a triangular facet surface interpolation technique.
1 Introduction
Suppose we are given two digital contours $\Gamma$ and $\Delta$, i.e. two closed ordered sets of black voxels on a white background, lying on the planes $z = r_\Gamma$ and $z = r_\Delta$ of a 3D image $\Omega_{pqr}$, which discretizes the volume $\Omega \subset \mathbb{R}^3$, with $p$, $q$ and $r$ being the number of voxels distributed equidistantly along the $x$, $y$ and $z$ axes, respectively. We wish to construct a surface that interpolates the data sets $\Gamma$ and $\Delta$. A formulation similar to this digital contour interpolation problem can be found in the construction of a gradual transformation from the closed polygon $P_\Gamma$ to the closed polygon $P_\Delta$, most widely known as the morphing problem. Following Efrat et al. [11], this transformation can be expressed as a mapping $M(P_\Gamma, P_\Delta) = \{\mu(t),\ t \in [0, 1], \text{ such that } \mu(0) = P_\Gamma,\ \mu(1) = P_\Delta\}$, which can be computed by solving the following two problems: (a) The correspondence problem, where an explicit mapping between $P_\Gamma$ and $P_\Delta$ is established by specifying two functions $c_\gamma(u) : [0, 1] \to P_\Gamma$ and $c_\delta(u) : [0, 1] \to P_\Delta$. (b) The vertex path problem, where we seek the trajectory that connects $c_\gamma(u)$ with $c_\delta(u)$ (see also [15]). If this path is a straight line, then it is easy to find examples with self-intersections. The authors of [11] assert that if one adopts the policy of moving $c_\gamma(u)$ to $c_\delta(u)$ along the Euclidean shortest path from $c_\gamma(u)$ to $c_\delta(u)$ that avoids $P_\Gamma$ and $P_\Delta$, then it is guaranteed that all intermediate morphs are simple, since the shortest paths do not cross each other, although two such paths may have a common sub-path. Hence, in order to achieve a solution to the digital contour interpolation problem that is free of self-intersections, we seek a method that constructs surfaces from
This work was partially supported by ANR grant SURF -NT05-2_45825.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 672–683, 2009. c Springer-Verlag Berlin Heidelberg 2009
3D images, that contain geodesic paths connecting the digital contours Γ and Δ. The method presented in [2] might give us the opportunity to solve the problem with implicitly defined surfaces, as it possesses this property.
2 Preliminaries
In order to segment a given 2D or 3D image $I : \Omega \to \mathbb{R}$, a common approach is to define a Riemannian manifold, called the potential function, $P = P(I) : \Omega \to \mathbb{R}$, such that features in $I$ will be captured on $P$. This, of course, is ensured with an "appropriate" definition of the function $P$, which takes into account the nature of the features we aim to follow. More specifically, after the classic work of Kass et al. [16] on 2D image segmentation methods, the objective is to compute an active contour $C(s)$, $s \in [0, L]$, located on the surface $P$, that minimizes the energy functional

$$E(C) = \int_0^L P(C(s)) \, ds. \tag{1}$$
Towards this aim, Cohen & Kimmel, in [8], presented a segmentation method which computes the active contour connecting two given points $P_1$, $P_2$ on $P$. The authors show that a globally minimal curve for (1) is obtained by following the opposite gradient direction on the minimal action map $U_{P_1}(Q)$ (see [18]), which is defined by

$$U_{P_1}(Q) = \inf_{C(0) = P_1,\ C(L) = Q} \int_0^L P(C(s)) \, ds, \qquad Q \text{ on } P. \tag{2}$$

The minimal path $C(s)$ from $P_1$ to $P_2$ is then obtained by solving the problem

$$\frac{d\tilde{C}(\sigma)}{d\sigma} = -\nabla U_{P_1}(\tilde{C}(\sigma)), \quad \text{with } \tilde{C}(0) = P_2, \tag{3}$$

and setting $C(s) = \tilde{C}(L - \sigma)$. According to the analysis in [19] the minimal action map $U_{P_1}$ is the solution of the following eikonal equation:

$$\|\nabla U_{P_1}\| = P, \quad \text{with } U_{P_1}(P_1) = 0. \tag{4}$$
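The two steps above, solving the eikonal equation (4) and then backtracking along (3), can be illustrated on a 2D grid. The sketch below uses a first-order fast sweeping iteration for (4) and a purely discrete steepest-descent walk over grid nodes as a crude stand-in for the ordinary differential equation (3). Both choices, as well as the names `fast_sweep_eikonal` and `descend`, are assumptions made for this illustration; the works cited in the text do not necessarily use this particular solver.

```python
import numpy as np

def fast_sweep_eikonal(P, src, h, max_rounds=20, tol=1e-10):
    """First-order fast sweeping solution of |grad U| = P with U[src] = 0."""
    ny, nx = P.shape
    big = 1e10                                   # stands in for "not yet reached"
    U = np.full((ny, nx), big)
    U[src] = 0.0
    orders = [(range(ny), range(nx)),
              (range(ny), range(nx - 1, -1, -1)),
              (range(ny - 1, -1, -1), range(nx)),
              (range(ny - 1, -1, -1), range(nx - 1, -1, -1))]
    for _ in range(max_rounds):
        U_old = U.copy()
        for rows, cols in orders:                # four alternating sweep directions
            for i in rows:
                for j in cols:
                    if (i, j) == src:
                        continue
                    a = min(U[i - 1, j] if i > 0 else big,
                            U[i + 1, j] if i < ny - 1 else big)
                    b = min(U[i, j - 1] if j > 0 else big,
                            U[i, j + 1] if j < nx - 1 else big)
                    f = P[i, j] * h
                    if abs(a - b) >= f:          # information arrives from one axis only
                        new = min(a, b) + f
                    else:                        # two-sided upwind update
                        new = 0.5 * (a + b + np.sqrt(2.0 * f * f - (a - b) ** 2))
                    U[i, j] = min(U[i, j], new)
        if np.max(np.abs(U - U_old)) < tol:
            break
    return U

def descend(U, start):
    """Walk from start to the source by always moving to the lowest neighbor."""
    ny, nx = U.shape
    i, j = start
    path = [start]
    while True:
        nbrs = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di or dj) and 0 <= i + di < ny and 0 <= j + dj < nx]
        best = min(nbrs, key=lambda q: U[q])
        if U[best] >= U[i, j]:                   # local minimum: the source node
            return path
        i, j = best
        path.append(best)
```

For a constant potential the computed map approximates the Euclidean distance to the source, and the recovered paths approximate straight segments up to the grid resolution.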
An extension of the above results for 3D images is presented in [1]. Given a 3D image $I$ and the corresponding potential $P$, the Euler-Lagrange equations of the energy functional $E$ in the 3D space are

$$\nabla P(C) \cdot \hat{n} = P(C)\,\kappa \quad \text{and} \quad \nabla P(C) \cdot \hat{b} = 0, \tag{5}$$

where the vectors $\hat{n}$, $\hat{b}$ and the scalar $\kappa$ denote the normal, the binormal and the curvature of $C$, respectively. It was proved that if $U_{P_1}$ is the solution of the eikonal equation (4), then every curve $C(s)$ that is a solution of the ordinary differential equation (3) is also a solution of the Euler-Lagrange equations (5).
This result paved the way to define and compute the globally minimal path between a point $P$ and a curve $\Gamma$ on the Riemannian manifold $P$. The minimal action map with respect to $\Gamma$ and $P$ is defined as the function

$$U_\Gamma(P) = \min_C E(C) = \min_C \int_0^1 P(C(t)) \, \|C'(t)\| \, dt, \tag{6}$$

where $C(t)$, $t \in [0, 1]$, is any curve from the point $P$ to the curve $\Gamma$. Note that, by the definition of $C$, the minimal action map $U_\Gamma(P)$ is equal to $U_Q(P)$ for some $Q \in \Gamma$. Thus, $U_\Gamma$ satisfies the eikonal equation

$$\|\nabla U_\Gamma\| = P, \quad \text{with } U_\Gamma(Q) = 0, \quad \forall Q \in \Gamma. \tag{7}$$
Going one step beyond, let us assume that the point $P$ belongs to a set $\Delta$. Having solved (7), all the minimal paths from each point in $\Delta$ to the curve $\Gamma$ can be computed using (3). Let us denote this set of paths by $S^\Delta_\Gamma$. It can now be understood that if the points in $\Delta$ form a curve, then the set $S^\Delta_\Gamma$ consists of all the minimal paths, $C^\Delta_\Gamma(s)$, between the points of the two curves $\Gamma$ and $\Delta$. Next, in [2] a function $\Psi$ was defined on the image domain, such that its zero level set contains all the paths in $S^\Delta_\Gamma$, i.e. $\Psi(C^\Delta_\Gamma(s)) = 0$. Assuming that $\Psi$ is continuously differentiable, the following necessary condition was obtained:

$$\Psi(C^\Delta_\Gamma(s)) = 0 \implies \nabla\Psi(C^\Delta_\Gamma(s)) \cdot \frac{dC^\Delta_\Gamma(s)}{ds} = 0 \implies \nabla\Psi(P) \cdot \nabla U_\Gamma(P) = 0, \tag{8}$$

for every point $P \in S^\Delta_\Gamma$. Demanding that $\Psi$ satisfies a relation similar to (8) everywhere in $\Omega$, a sufficient condition for the minimal paths to be contained in $\Psi = 0$ is given by the following transport equation:

$$\nabla\Psi(P) \cdot \nabla U_\Gamma(P) + G(\Psi(P)) = 0, \qquad \Psi(Q) = 0, \quad \forall Q \in \Delta, \tag{9}$$
where the function $G$ is such that $G(0) = 0$ (e.g., $G(\Psi) = a\Psi(P)$). In fact it was proved that if $\Psi$ satisfies (9) then for all points $P$ of its zero level set, the minimal path joining $P$ with the curve $\Gamma$ is contained in the zero level set of $\Psi$. This, in turn, suggests solving equation (7) and then computing $\Psi$ through (9). Note that the equations (7) and (9) can be solved over the nodes of $\Omega_{pqr}$, which discretizes $\Omega$. In view of this, the point sets $\Gamma$ and $\Delta$ form two digital contours, which in turn implies that the above method essentially establishes an interpolation between the two given digital contours. This allows us to employ it in the digital contour interpolation problem, provided that $\Gamma$ and $\Delta$ lie on the parallel planes $z = r_\Gamma$ and $z = r_\Delta$, and no potential function is given. Since the surface $\Psi = 0$ contains all the minimal paths from the digital contour $\Gamma$ to the digital contour $\Delta$, we can claim that by solving the problems (7) and (9) we obtain an interpolating surface free of self-intersections.
3 An Artificial Image Potential
The need for an artificial image potential other than a constant can be explained as follows: if $P$ is constant, then the induced Riemannian manifold is a hyperplane in $\mathbb{R}^4$. Thus, the minimization of the energy functional (1) leads to a set of straight lines in $\mathbb{R}^3$ of minimum length, which start from the point set (contour) Δ and end on points of the contour Γ. Suppose now that the contour Γ is translated in the plane $z = r_\Gamma$ until one point $p$ of it is closer to all points of the set Δ than any other point of Γ. Then the surface that contains all minimal paths is conic, with its apex at $p$ and base the set Δ. In that case none of the points of Γ other than $p$ is interpolated by the surface Ψ = 0. Thus, the problem is to introduce an artificial potential function by using only the given information of Γ and Δ.

Let us suppose that we are given a matching between the two given point sets (pixel sets) Γ and Δ. Then we can easily define the set of minimal paths $S_\Gamma^\Delta$ through equation (3) for any potential function $P$. If $P$ is constant, then the minimal paths are the straight lines which connect the points of the two point sets (the pixel centers) according to the assumed matching; thus a $C^0$ surface containing all the minimal paths can be a triangular facet surface that interpolates Γ and Δ. The main disadvantage of such a construction is that self-intersections cannot be avoided in general (see [14]). However, since there are interpolation techniques which can easily construct triangular facet surfaces that interpolate the given point sets, the above remarks suggest computing the potential $P$ through the construction of such a surface, say $S$. Since $S$ consists of triangles, it can easily be implicitized on the grid $\Omega_{pqr}$. This can be achieved, for example, by computing the Euclidean distance function $D$ of $S$ at the grid nodes $\mathbf{P}$, i.e.

$$D(\mathbf{P}, S) = \min_{\mathbf{Q} \in S} \|\mathbf{P} - \mathbf{Q}\|, \quad \forall\, \mathbf{P} \in \Omega_{pqr}. \tag{10}$$
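As an illustration of (10), the following is a minimal sketch of how the unsigned distance to a triangulated surface could be evaluated on a grid. It is not the authors' implementation: the function names, the `spacing` parameter and the dictionary output are assumptions made here, and the closest-point routine is the standard region-based test from Ericson's "Real-Time Collision Detection", swapped in for whatever the authors used. Degenerate (zero-area) triangles are not handled.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on the (non-degenerate) triangle abc."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                    # vertex region a
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                    # vertex region b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        v = d1 / (d1 - d3)                          # edge region ab
        return tuple(a[i] + v * ab[i] for i in range(3))
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                    # vertex region c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        w = d2 / (d2 - d6)                          # edge region ac
        return tuple(a[i] + w * ac[i] for i in range(3))
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))     # edge region bc
        return tuple(b[i] + w * (c[i] - b[i]) for i in range(3))
    denom = 1.0 / (va + vb + vc)                    # interior of the face
    v, w = vb * denom, vc * denom
    return tuple(a[i] + v * ab[i] + w * ac[i] for i in range(3))

def unsigned_distance(p, triangles):
    """D(P, S) of (10): minimum over all triangles of the surface S."""
    return min(math.dist(p, closest_point_on_triangle(p, *tri))
               for tri in triangles)

def discrete_potential(shape, triangles, spacing=1.0):
    """Evaluate D on a p x q x r grid (hypothetical layout: node (i, j, k)
    sits at (i, j, k) * spacing). Cost is O(p q r) times the triangle count."""
    p, q, r = shape
    return {(i, j, k): unsigned_distance((i * spacing, j * spacing, k * spacing),
                                         triangles)
            for i in range(p) for j in range(q) for k in range(r)}
```

The brute-force loop over all triangles for every node mirrors the operation count quoted for the discrete potential; a spatial index would reduce it but is beyond this sketch.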
Then, regardless of the matching chosen between the points of Γ and Δ, if one traverses the minimal path from a point on Δ to some point on Γ and the surface intersects itself, the minimal path is chosen so as to have a common sub-path after the intersection point, thus avoiding self-intersections. This suggests that the surface $S$ could be the Riemannian manifold on which the minimal paths lie, i.e. the unsigned distance $D$ can play the role of the discrete potential $P$ on the discretized image domain $\Omega_{pqr}$.

3.1 Interpolating Two Polygons with $C^0$ Triangular Facet Surfaces
Previous Work. The construction of the surface $S$ can be formulated as follows:

Problem 1. Given the ordered closed planar point sets $P_\Gamma = \{P_{\Gamma,j} \in \mathbb{E}^3,\ j = 0, \dots, n-1\}$ and $P_\Delta = \{P_{\Delta,k} \in \mathbb{E}^3,\ k = 0, \dots, m-1\}$, which belong to the planes $z = r_\Gamma$ and $z = r_\Delta$, respectively, construct a $C^0$ surface that interpolates them and consists of triangles with vertices in $P_\Gamma$ and $P_\Delta$.

The total number of such triangulations is $\frac{(n+m)!}{(n-1)!\,(m-1)!}$. Among them, one has to compute the optimal one according to some objective function which quantifies the quality of these triangulations. Evidently, the quality of such a surface depends mainly on the relative twist between the points of the two contours. This in turn lets us call the objective function a twist minimization criterion.
Keppel introduced in [17] a representation of all continuous solutions with the aid of a toroidal graph, i.e., a binary matrix $K_{n \times m}$, where the indices $j, k$ of its elements are taken as $j = \operatorname{mod}(j, n)$ and $k = \operatorname{mod}(k, m)$. If $K_{jk} = 1$, then the points $P_{\Gamma,j}$ and $P_{\Delta,k}$ are connected. If $K_{jk} = 1$ and $K_{j+1,k} = 1$, then the points $P_{\Gamma,j}$, $P_{\Gamma,j+1}$ and $P_{\Delta,k}$ form a triangle. (Analogously, if $K_{jk} = 1$ and $K_{j,k+1} = 1$, then the points $P_{\Gamma,j}$, $P_{\Delta,k}$ and $P_{\Delta,k+1}$ form a triangle.) Each triangle arrangement is represented by a set of unitary elements in this matrix. Keppel proved that for an acceptable triangulation these elements form a monotone path in the graph. Thus, the optimum surface can be obtained by searching among all monotone paths in the toroidal graph $K_{n \times m}$. The methods for computing such paths can be divided into two categories: the exhaustive search methods (e.g., [17, 13]), which evaluate the final surface according to some global criterion, and the methods based on weighted graphs (e.g., [6, 4, 12]), in which a weight is assigned to each graph node and then, starting from the node of least weight, the whole path is computed by choosing at each step, among the neighboring nodes, the one with minimum weight. The methods based on weighted graphs effectively reduce the computational cost but, since they depend on the selection of the nodal weights, they may yield surfaces that do not interpolate all points in $P_\Gamma$ and $P_\Delta$. Our intention is to propose a nodal weight definition which resolves such ambiguities.

Our Method. In order to introduce our method, let us further restrict ourselves to convex contour data sets. In [6, 12] the weight at the node $K_{jk}$ of the toroidal graph is the length $\|P_{\Gamma,j} - P_{\Delta,k}\|$. Thus, by definition, the final result depends on the relative position of the sets $P_\Gamma$ and $P_\Delta$. The method of [4] proposes a translation of the polygons so that their centers, $A_\Gamma$ and $A_\Delta$, coincide.
Thus, the square of the distance defined above, for the translated polygons and expressed in terms of the initial points, is equal to

$$\|(P_{\Gamma,j} - A_\Gamma) - (P_{\Delta,k} - A_\Delta)\|^2 = \|P_{\Gamma,j} - A_\Gamma\|^2 + \|P_{\Delta,k} - A_\Delta\|^2 - 2\,(P_{\Gamma,j} - A_\Gamma) \cdot (P_{\Delta,k} - A_\Delta).$$

Then, setting $-(P_{\Gamma,j} - A_\Gamma) \cdot (P_{\Delta,k} - A_\Delta)$ as the nodal weight, the path is computed by choosing the minimum weight at each step. We propose as weight function the dimensionless quantity

$$-\frac{(P_{\Gamma,j} - A_\Gamma) \cdot (P_{\Delta,k} - A_\Delta)}{\|P_{\Gamma,j} - A_\Gamma\|\,\|P_{\Delta,k} - A_\Delta\|}, \tag{11}$$
which is equal to the negative cosine of the angle, in $[0, \pi]$, formed by the vectors $P_{\Gamma,j} - A_\Gamma$, $j = 0, \dots, n-1$, and $P_{\Delta,k} - A_\Delta$, $k = 0, \dots, m-1$. Since the cosine is a decreasing function on $[0, \pi]$, the proposed weight can equivalently be defined as the least angle $\phi(\theta_{\Gamma,j}, \theta_{\Delta,k})$ formed by two lines with directions given by $P_{\Gamma,j} - A_\Gamma$ and $P_{\Delta,k} - A_\Delta$, where $\theta_{\Gamma,j}$ denotes the polar angle of the point $P_{\Gamma,j}$ with respect to a coordinate system whose origin is $A_\Gamma$. (An analogous definition holds for $\theta_{\Delta,k}$.) We connect the point $P_{\Gamma,j}$ with the point $P_{\Delta,k}$ (analogously, the point $P_{\Delta,k}$ with $P_{\Gamma,j}$) when the index $k$ (index $j$) solves the following problems:

$$\min_{k = 0, \dots, m-1} \phi(\theta_{\Gamma,j}, \theta_{\Delta,k}) \quad \text{and} \quad \min_{j = 0, \dots, n-1} \phi(\theta_{\Gamma,j}, \theta_{\Delta,k}). \tag{12}$$
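The weight (11) and the two minimization problems (12) can be sketched as follows. This is an illustrative brute-force version, costing $O(nm)$, and not the authors' code: the function names are hypothetical, the polygons are given by their $(x, y)$ coordinates only (each contour lies in a plane of constant $z$), and the "center" of a polygon is assumed here to be the average of its vertices, which the text does not specify.

```python
import math

def centroid(points):
    """Assumed polygon center: the plain average of the vertices."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def polar_angles(points):
    """Polar angle of every vertex with respect to the polygon center."""
    ax, ay = centroid(points)
    return [math.atan2(y - ay, x - ax) for x, y in points]

def least_angle(theta_a, theta_b):
    """Least angle in [0, pi] between two directions, i.e. phi in (12)."""
    d = abs(theta_a - theta_b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def solve_matching(theta_g, theta_d):
    """Unitary nodes K_jk = 1 obtained by solving both problems in (12)."""
    n, m = len(theta_g), len(theta_d)
    nodes = set()
    for j in range(n):   # best k for every j
        k = min(range(m), key=lambda kk: least_angle(theta_g[j], theta_d[kk]))
        nodes.add((j, k))
    for k in range(m):   # best j for every k
        j = min(range(n), key=lambda jj: least_angle(theta_g[jj], theta_d[k]))
        nodes.add((j, k))
    return nodes
```

Because only angles relative to the centers enter the weight, the resulting matching does not change when either polygon is translated or uniformly scaled.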
We set the weight at every node $K_{jk}$ equal to the angle $\phi(\theta_{\Gamma,j}, \theta_{\Delta,k})$. Then $K_{jk} = 1$ for all pairs of points that constitute the set of solutions of the problems (12). Now we can easily establish that the solution has the following properties (see, e.g., Fig. 1):

i. In every row and every column of the toroidal graph there exists at least one unitary node, since for every $j$ we have computed the corresponding index $k$, and for every $k$ the corresponding $j$.

ii. The unitary nodes of the graph are ordered monotonically. The proof is simple if one realizes that, for any two connections $P_{\Gamma,p_1} P_{\Delta,q_1}$ and $P_{\Gamma,p_2} P_{\Delta,q_2}$, every point $P_{\Gamma,j}$ lying between $P_{\Gamma,p_1}$ and $P_{\Gamma,p_2}$ must be connected with a point lying between $P_{\Delta,q_1}$ and $P_{\Delta,q_2}$, since both polygons share the same orientation and are convex.

iii. Solving the problems (12) does not imply that all the nodes of the monotone path in the graph have been computed. It is possible to be left with the pairs $(p_1, q_1)$ and $(p_1 + 1, q_1 + 1)$ but with neither $(p_1, q_1 + 1)$ nor $(p_1 + 1, q_1)$.
Fig. 1. Left: The connections between the points of two convex polygons, as obtained by solving the problems (12). Right: The toroidal graph of the connections. The unitary nodes are illustrated by spheres, the computed triangle edges by blue lines and the possible triangle edges by red lines.
If we interpret the above properties geometrically, we may assert that up to this point we have constructed a surface which interpolates the point sets $P_\Gamma$ and $P_\Delta$ and consists of triangular and rectangular patches. The final triangulation can be obtained by tracking all the rectangular patches (i.e. those where property (iii) holds) and triangulating them based on the least nodal weight. Constructing the surface in this way requires $O(nm)$ operations, but this cost can be reduced effectively. To this end, we define the circular lists $L_\Gamma = \{\theta_{\Gamma,j}\}_{j=0}^{n-1}$ and $L_\Delta = \{\theta_{\Delta,k}\}_{k=0}^{m-1}$ of the polar angles of the points of the two initial point sets with respect to their centers. Note that the elements of these lists are in circularly increasing order. We find the element of the list $L_\Gamma$ with the least value and set the head of $L_\Gamma$ at its position. Then we compute the index which solves (12) for $j = 0$ and set the head of $L_\Delta$ at that index. (We also reorder the elements of the point sets $P_\Gamma$ and $P_\Delta$ accordingly.) Now we know that the element $K_{00}$ of the graph belongs to the set of solutions of the problem. Note that up to this point the operations performed are $O(n + m)$.

Suppose now that the node $K_{jk}$ belongs to the solution set of the problems (12), i.e. $K_{jk} = 1$. We consider only the possible connections of this node to the nodes $K_{j+1,k}$, $K_{j,k+1}$ and $K_{j+1,k+1}$, knowing that, due to properties (i)-(iii), at least one of them belongs to the solution nodes. Thus we begin from the node $K_{00}$, which is already computed, and at each step we compare the weights given by the function $\phi(\cdot, \cdot)$ only for these three neighboring nodes. If the node of least weight is $K_{j+1,k+1}$, we also insert into the path whichever of the other two has the lesser weight. The path computed this way traverses the nodes of the solution of the problems (12), and since exactly $n + m$ nodes are to be computed, it readily follows that the complexity of the algorithm is $O(n + m)$. Now we can state the following result:

Lemma 1. A $C^0$ triangular facet surface that interpolates any two convex planar polygons with $n$ and $m$ points and satisfies the criteria (12) can be computed in $O(n + m)$ operations. Moreover, the space needed for the whole process is $O(n + m)$.

If one or both polygons are not convex, we can map them onto their convex hulls and apply the algorithm to the transformed polygons. The output of the algorithm is actually a point matching, thus the final surface can be constructed by adopting this matching. The use of such a technique was first proposed and implemented in [12], but their method increases the computational cost. Alternatively, in order to eliminate the cost of this mapping, we project all the points of the non-convex segments, $P_j$, $j = S + 1, \dots, E - 1$, onto the corresponding convex hull segment $P_S P_E$ according to the rule

$$P_j = (1 - t_j)\,P_S + t_j\,P_E, \qquad t_j = \frac{\sum_{k=S}^{j-1} \|P_{k+1} - P_k\|}{\sum_{k=S}^{E-1} \|P_{k+1} - P_k\|}.$$

Computing the convex hull of a polygon by using the algorithm of [21] needs $O(n)$ operations, hence the results of Lemma 1 still hold in the general case of non-convex polygons. It is worth remarking that, although this algorithm is of linear complexity, the criterion is not local (in the sense that the same result is obtained by following the exhaustive search procedure), in contrast to all algorithms published to date except the one given in [24], which also applies to convex polygons. Finally, the result, i.e. the point matching, is independent of any translation of the initial data and, moreover, of any isotropic scaling of the initial data sets; thus it satisfies the criteria given by [24]. Note also that the whole method emulates the algorithmic procedure proposed in [5].

The Discrete Potential Function. Since the surface $S$ consists of $n + m$ triangles, the minimum Euclidean distance (10) from any point of the grid $\Omega_{pqr}$ to $S$ can be found in $O(n + m)$ operations, thus the total number of calculations for the discrete image potential is $O(pqr(n + m))$.
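The linear-time walk through the toroidal graph described in this subsection can be sketched as below. This is one possible reading of the procedure, not the authors' implementation: the function names are hypothetical, ties between the two axis neighbors are broken in favor of advancing along Γ, and the guards that stop a contour from being traversed more than once are assumptions added here so that the walk always closes after $n + m$ nodes.

```python
import math

def least_angle(a, b):
    """Least angle between two directions with polar angles a, b (radians)."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def monotone_path(theta_g, theta_d):
    """Walk the toroidal graph; theta_g, theta_d are polar angles in
    circularly increasing order. Returns n + m nodes (j, k), indices mod n, m."""
    n, m = len(theta_g), len(theta_d)

    def weight(j, k):                                # toroidal indexing
        return least_angle(theta_g[j % n], theta_d[k % m])

    j = min(range(n), key=lambda a: theta_g[a])      # head of L_Gamma
    k = min(range(m), key=lambda b: weight(j, b))    # head of L_Delta
    path = [(j, k)]
    steps = [0, 0]                                   # moves along Gamma / Delta

    def move(axis):
        nonlocal j, k
        if axis == 0:
            j += 1
        else:
            k += 1
        steps[axis] += 1
        path.append((j % n, k % m))

    while sum(steps) < n + m - 1:
        if steps[0] == n:                            # Gamma fully traversed
            move(1)
        elif steps[1] == m:                          # Delta fully traversed
            move(0)
        else:
            wa = weight(j + 1, k)
            wb = weight(j, k + 1)
            wd = weight(j + 1, k + 1)
            first = 0 if wa <= wb else 1
            move(first)                              # lesser axis neighbor
            if wd < min(wa, wb) and sum(steps) < n + m - 1:
                move(1 - first)                      # then reach the diagonal
    return path

def triangles_from_path(path):
    """One triangle per move of the closed path; vertices are tagged
    ('G', j) for points of P_Gamma and ('D', k) for points of P_Delta."""
    tris = []
    for (j0, k0), (j1, k1) in zip(path, path[1:] + path[:1]):
        if j1 != j0:
            tris.append((("G", j0), ("G", j1), ("D", k0)))
        else:
            tris.append((("G", j0), ("D", k0), ("D", k1)))
    return tris
```

Each loop iteration evaluates three weights and appends one or two nodes, so the work is proportional to $n + m$, in line with Lemma 1.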
4 Numerical Solution of the Eikonal and the Transport Equation
Both equations (7) and (9) belong to the class of stationary Hamilton-Jacobi equations and shall be considered simultaneously. The conditions under which the solution of a numerical approximation of a Hamilton-Jacobi equation converges to the so-called viscosity solution can be found in [9] and [10]. In [25, 2] a first-order upwind scheme is employed in order to solve equation (9). According to them, the numerical Hamiltonian of (9) can be written, for $G(\Psi) = \alpha\Psi$, as

$$\left(\Psi_x^{i,j,k},\ \Psi_y^{i,j,k},\ \Psi_z^{i,j,k}\right) \cdot \left((U_\Gamma)_x^{i,j,k},\ (U_\Gamma)_y^{i,j,k},\ (U_\Gamma)_z^{i,j,k}\right) + \alpha\,\Psi^{i,j,k} = 0, \tag{13}$$

where the subscripts denote partial differentiation with respect to $x$, $y$ and $z$. Approximating the derivatives by biasing the finite difference stencil in the direction from which the characteristic information is coming lets us write the product $\Psi_x^{i,j,k}\,(U_\Gamma)_x^{i,j,k}$ as

$$\Psi_x^{i,j,k}\,(U_\Gamma)_x^{i,j,k} = \left|(U_\Gamma)_x^{i,j,k}\right| \frac{\Psi^{i,j,k} - \Psi^{I,j,k}}{\Delta x}, \quad \text{where } I = \begin{cases} i + 1, & \text{if } (U_\Gamma)_x^{i,j,k} < 0, \\ i - 1, & \text{if } (U_\Gamma)_x^{i,j,k} > 0 \end{cases} \tag{14}$$

(when the derivative vanishes, the term drops out). For instance, if $(U_\Gamma)_x^{i,j,k} < 0$ the product equals $-(U_\Gamma)_x^{i,j,k}\,(\Psi^{i,j,k} - \Psi^{i+1,j,k})/\Delta x$. Applying the above to (13) and solving with respect to $\Psi^{i,j,k}$ we obtain

$$\Psi^{i,j,k} = \frac{\Psi^{I,j,k}\,\dfrac{|(U_\Gamma)_x^{i,j,k}|}{\Delta x} + \Psi^{i,J,k}\,\dfrac{|(U_\Gamma)_y^{i,j,k}|}{\Delta y} + \Psi^{i,j,K}\,\dfrac{|(U_\Gamma)_z^{i,j,k}|}{\Delta z}}{\dfrac{|(U_\Gamma)_x^{i,j,k}|}{\Delta x} + \dfrac{|(U_\Gamma)_y^{i,j,k}|}{\Delta y} + \dfrac{|(U_\Gamma)_z^{i,j,k}|}{\Delta z} + \alpha} \tag{15}$$
for $i = 0, \dots, p-1$, $j = 0, \dots, q-1$ and $k = 0, \dots, r-1$, with $I$, $J$ and $K$ being defined in a manner analogous to (14), according to the sign of the nodal derivatives of $U_\Gamma^{i,j,k}$ with respect to $x$, $y$ and $z$, respectively. For the eikonal equation (7) the scheme proposed by Rouy & Tourin [22],

$$\sum_{X \in \{x,y,z\}} \left[\max\left(\max\left((U_\Gamma)_{-X}^{i,j,k},\ 0\right),\ -\min\left((U_\Gamma)_{+X}^{i,j,k},\ 0\right)\right)\right]^2 = \left(P^{i,j,k}\right)^2, \tag{16}$$
leads to a quadratic equation with respect to $(U_\Gamma)^{i,j,k}$. Both equations (15) and (16) can be solved iteratively by updating their grid values until they converge to some predefined accuracy. A far more efficient approach to solving them is based on the so-called fast marching method, which was introduced by Sethian [23] for the eikonal equation (16). Realizing that the solution of the eikonal equation represents the distance map on the (hyper)surface $P$ from the boundary curve Γ (see [19] and [7]), it is to be expected that the information propagates from the smaller values, near the boundary Γ, to the larger ones as we move away from it. In other words, since the characteristics of the eikonal equation are straight lines (see [20]) emanating from the boundary Γ, the numerical solution can be built "outwards" from the smallest values, as Sethian pointed out. The idea is to sweep the front ahead by considering a set of points in a narrow band around the existing front, and to march this narrow band forward, freezing the values of existing points and bringing new ones into the narrow band structure. The key is the selection of which grid point in the narrow band to update. The answer is that the point having the smallest value in this narrow band (i.e. the one closest to the already calculated points) is the one that cannot be affected by the other points next to it, thus its value must be correct.

Returning to the discrete transport equation (15), extremely fast convergence can be achieved by visiting the points in the order in which they are reached by the characteristic curves, in a way analogous to that of the fast marching method for the eikonal equation (see [25, 2]). Considering the characteristics of equation (9), we obtain that the absolute values of $\Psi^{i,j,k}$ decrease as we move from the boundary to the zero level set of Ψ, provided that the coefficient α is greater than zero. Thus, in each step we update the values of $\Psi^{i,j,k}$ on a narrow band of nodes via equation (15), using the values of $\Psi^{i,j,k}$ that have already been calculated (solved), starting from the boundary of the domain. Then we consider as solved the point whose value is closest to those of the solved points, i.e., the one with the maximum absolute value in the narrow band.
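The freezing strategy described above can be illustrated with a small fast marching sketch for the eikonal equation. To keep it short it works on a 2D grid with uniform spacing `h` rather than the 3D grid of the paper, and all names (`local_update`, `fast_marching`, the list-of-lists layout) are assumptions made here, not the authors' code. The local update solves the two-dimensional counterpart of the quadratic equation arising from (16).

```python
import heapq
import math

def local_update(a, b, p, h=1.0):
    """Solve the 2D Rouy-Tourin quadratic at one node.
    a, b: smallest frozen neighbor values along the two axes
    (math.inf if none); p: potential at the node."""
    lo, hi = min(a, b), max(a, b)
    if hi - lo >= p * h:                  # only the smaller neighbor is upwind
        return lo + p * h
    # both neighbors contribute: (u - lo)^2 + (u - hi)^2 = (p h)^2
    return 0.5 * (lo + hi + math.sqrt(2.0 * (p * h) ** 2 - (hi - lo) ** 2))

def fast_marching(potential, sources, h=1.0):
    """Minimal action map U on a 2D grid with U = 0 at the source nodes."""
    rows, cols = len(potential), len(potential[0])
    U = [[math.inf] * cols for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]
    heap = []                             # narrow band as a priority queue
    for r, c in sources:
        U[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    def frozen_value(r, c):
        inside = 0 <= r < rows and 0 <= c < cols
        return U[r][c] if inside and frozen[r][c] else math.inf

    while heap:
        _, r, c = heapq.heappop(heap)     # smallest value in the band
        if frozen[r][c]:
            continue                      # stale heap entry
        frozen[r][c] = True               # its value can no longer change
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or frozen[nr][nc]:
                continue
            a = min(frozen_value(nr - 1, nc), frozen_value(nr + 1, nc))
            b = min(frozen_value(nr, nc - 1), frozen_value(nr, nc + 1))
            candidate = local_update(a, b, potential[nr][nc], h)
            if candidate < U[nr][nc]:
                U[nr][nc] = candidate
                heapq.heappush(heap, (candidate, nr, nc))
    return U
```

With a binary heap each node is frozen once and every update costs a logarithmic number of operations, which is where the efficiency gain over repeated sweeps of the whole grid comes from. The transport equation (15) would be marched in the same manner, with the selection rule replaced by the maximum absolute value as stated above.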
Regarding the boundary conditions, since we are concerned only with the zero level set of Ψ and the condition Ψ = 0 on Γ, following [2] we define the closed set

$$V_\eta^\Gamma = \{P \in \Omega_{pqr} : D(P, \Gamma) \le \eta\},$$

where η is a positive real value. We impose Ψ to be equal to the signed distance between $P$ and Γ on the nodes of $V_\eta^\Gamma \cap \Omega_{pqr}$, and equal to $\pm \min D(P, \Gamma)$, the minimum being taken over $P \in \Omega_{pqr}^{\Gamma}$, on the rest of the boundary nodes of $\Omega_{pqr}$, choosing the negative sign for the nodes exterior to Γ and the positive sign for those interior to Γ. Note also that Γ can be on the boundary of Ω, while Δ must lie entirely inside Ω. Numerical experimentation has shown that visually acceptable results can be achieved if we extend the grid $\Omega_{pqr}$ in the $z$ direction so that Δ lies in the middle $z$-plane. Finally, we should remark that the algorithm yields a different solution if we compute the surface containing the minimal paths $S_\Delta^\Gamma$ instead of the one containing $S_\Gamma^\Delta$. The authors of [3] remove this asymmetry by exploiting both minimal action maps $U_\Gamma$ and $U_\Delta$, the latter being defined analogously to $U_\Gamma$.
5 Examples

In what follows we have taken the digital contours Γ and Δ to lie on the planes $z = r_\Gamma$ and $z = r_\Delta$, with $r_\Gamma < r_\Delta$, and the coefficient α in equation (15) to be equal to 0.1. The grids are relatively coarse, ranging from 50 to 70 nodes in the $x$ and $y$ directions and 20 nodes in the $z$ direction. The first example (see Fig. 2) can be characterized as a “simple case”, where the triangular facet surface has no self-intersections. In the example shown in Figs. 3-4 the triangular surface has a widely spread self-intersection region, due to the interpolated contours, which are far from being convex. The method yields a surface with no self-intersections. The third example (see Fig. 5) is an interpolation of two contours of U- and S-like shapes. It shows that “morphing” cannot always be achieved, because in some cases the resulting surface, although it has no self-intersections, appears to have holes, i.e., disconnected cross sections in the area of self-intersection of the triangular surface. This means that the particular image potential function forces the minimal paths to go around the self-intersection area, thus generating a hole in the surface.

Fig. 2. Ex. 1: The $C^0$ triangular facet surface and the implicit surface Ψ = 0
Fig. 3. Ex. 2: The $C^0$ triangular facet surface and the implicit surface Ψ = 0
Fig. 4. Ex. 2: Intermediate slices of the implicit surface, from the contour Γ to Δ
Fig. 5. Ex. 3: The $C^0$ triangular facet surface and the implicit surface Ψ = 0
6 Conclusions

We presented an implicit method to interpolate two digital contours on parallel planes, employing the 3D image segmentation technique of [2]. In order to guarantee that the voxels of both contours will always be interpolated, we introduced an artificial potential function. To this end, we developed an interpolation method that matches all the pixel centers through a $C^0$ triangular facet surface, and set the potential function to be the Euclidean distance to this surface. The method results in non-self-intersecting surfaces. However, when the polygons connecting the contour voxel centers are far from convex, it cannot always produce morphs that preserve the connectedness of the given contours along each intermediate slice. This in turn raises the question of how the potential function can be improved so as to stably accomplish an acceptable morphing between Γ and Δ. This remains an open question. The idea behind this work was to set up processes for interpolating sets of pixels/voxels following minimal paths on some appropriately defined manifolds. The idea seems to be fruitful and it might pave the way to solving even more difficult interpolation problems in the future.
References

1. Ardon, R., Cohen, L.D.: Fast constrained surface extraction by minimal paths. Inter. J. of Computer Vision 69, 127–136 (2006)
2. Ardon, R., Cohen, L.D., Yezzi, A.: A new implicit method for surface segmentation by minimal paths in 3D images. Appl. Math. Optim. 55, 127–144 (2007)
3. Ardon, R., Cohen, L.D., Yezzi, A.: Fast surface segmentation guided by user input using implicit extension of minimal paths. J. of Math. Imaging & Vision 25, 289–305 (2006)
4. Batnitzki, S., et al.: Three-dimensional computer reconstruction from surface contours for head CT examinations. J. of Comp. Assist. Tomogr. 5, 60–67 (1981)
5. Choi, Y.-K., Park, K.-H.: A heuristic triangulation algorithm for multiple planar contours using an extended double branching procedure. Visual Computer 10, 372–387 (1994)
6. Christiansen, H.N., Sederberg, T.W.: Conversion of complex contour lines into polygonal element mosaics. In: Phillips, R.L. (ed.) Computer Graphics (SIGGRAPH 1978), vol. 12, pp. 187–192 (1978)
7. Cohen, L.D.: Minimal paths and fast marching methods for Image Analysis. In: Paragios, N., Chen, Y., Faugeras, O. (eds.) Mathematical Models in Computer Vision: The Handbook, pp. 97–111. Springer, Heidelberg (2005)
8. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: A minimal path approach. Inter. J. of Computer Vision 24, 57–78 (1997)
9. Crandall, M., Lions, P.L.: Viscosity solutions of Hamilton-Jacobi equations. Trans. Amer. Math. Soc. 277, 1–42 (1983)
10. Crandall, M., Lions, P.L.: Two approximations of solutions of Hamilton-Jacobi equations. Math. of Comp. 43, 1–19 (1984)
11. Efrat, A., Har-Peled, S., Guibas, L., Murali, T.: Morphing between Polylines. In: Proc. 12th Ann. ACM-SIAM Symp. on Discr. Alg. (SODA 2001), pp. 680–689 (2001)
12. Ekoule, A.B., Peyrin, F.C., Odet, C.L.: A triangulation algorithm from arbitrary shaped multiple planar contours. ACM Trans. on Graph. 10, 182–199 (1991)
13. Fuchs, H., Kedem, Z.M., Uselton, S.P.: Optimal surface reconstruction from planar contours. Commun. ACM 20, 693–702 (1977)
14. Gitlin, G., O’Rourke, J., Subramanian, V.: On reconstructing polyhedra from parallel slices. Intern. J. of Comp. Geom. & Appl. 6, 103–122 (1996)
15. Hahmann, S., Bonneau, G.-P., Caramiaux, B., Cornillac, M.: Multiresolution morphing of planar curves. Computing 79, 197–209 (2007)
16. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. Intern. J. of Computer Vision 1, 321–331 (1988)
17. Keppel, E.: Approximating complex surfaces by triangulation of contour lines. IBM J. Res. Devel. 19, 2–11 (1975)
18. Kimmel, R., Amir, A., Bruckstein, A.: Finding shortest paths on surfaces using level sets propagation. IEEE Trans. Pat. Anal. Mach. Int. 17, 635–640 (1995)
19. Kimmel, R., Kiryati, N., Bruckstein, A.: Sub-pixel distance map and weighted distance transforms. J. of Math. Imaging & Vision 6, 223–233 (1996)
20. Mauch, S.: Efficient Algorithms for Solving Static Hamilton-Jacobi Equations, Doctoral Thesis, California Institute of Technology, Pasadena, California (2003)
21. Melkman, A.: On-line construction of the convex hull of a simple polygon. Inform. Proc. Letters 25, 11–12 (1987)
22. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM J. Numer. Anal. 29, 867–884 (1992)
23. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proc. Natl. Acad. Sci. USA 93, 1591–1595 (1996)
24. Welzl, E., Wolfers, B.: Surface reconstruction between simple polygons. In: Lengauer, T. (ed.) ESA 1993. LNCS, vol. 726, pp. 397–408. Springer, Heidelberg (1993)
25. Yezzi, A., Prince, J.: An Eulerian PDE approach for computing tissue thickness. IEEE Trans. on Medical Imaging 22, 1332–1339 (2003)
Pose Invariant Shape Prior Segmentation Using Continuous Cuts and Gradient Descent on Lie Groups

Niels Chr. Overgaard, Ketut Fundana, and Anders Heyden
Applied Mathematics Group, Malmö University, Sweden
{nco,ketut.fundana,anders.heyden}@mah.se
Abstract. This paper proposes a novel formulation of the Chan-Vese model for pose invariant shape prior segmentation as a continuous cut problem. The model is based on the classic $L^2$ shape dissimilarity measure, with pose invariance under the full (Lie) group of similarity transforms in the plane. To overcome the common numerical problems associated with step size control for translation, rotation and scaling in the discretization of the pose model, a new gradient descent procedure for the pose estimation is introduced. This procedure is based on the construction of a Riemannian structure on the group of transformations and a derivation of the corresponding pose energy gradient. Numerically, this amounts to an adaptive step size selection in the discretization of the gradient descent equations. Together with efficient numerics for TV-minimization we get a fast and reliable implementation of the model. Moreover, the theory introduced is generic and reliable enough for application to more general segmentation and shape models.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 684–695, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

The celebrated model of T. Chan and L. Vese [1] for piecewise constant, two-phase segmentation of a gray scale image $I : \Omega \to \mathbb{R}_+$ can be formulated as follows: Among all characteristic functions $u = 1_\Sigma$ of measurable sets Σ contained in the bounded (image) domain $\Omega \subset \mathbb{R}^2$, and all pairs of real numbers $c = (c_0, c_1)$, find $u^* = 1_{\Sigma^*}$, $c^* = (c_0^*, c_1^*)$ which minimize the energy

$$E_{CV}(u, c) = J(u) + \frac{\lambda}{2}\left\{\langle 1 - u, (I - c_0)^2\rangle + \langle u, (I - c_1)^2\rangle\right\}, \tag{1}$$

where λ > 0 is a fixed weight, $J(u) = \int_\Omega |\nabla u|\,dx$ is the total variation of $u$, and $\langle u, v\rangle = \int_\Omega uv\,dx$ is the $L^2$ inner product of $u$ and $v$. Recall that for $u = 1_\Sigma$, $J(u) = \operatorname{Per}(\Sigma)$, the perimeter (in Ω) of Σ, i.e. the length of the boundary $\Gamma = \partial\Sigma$ in Ω.

Traditionally, and originally [1], minimization of (1) was formulated in the level set framework of Osher and Sethian [2, 3, 4] by setting $u = H(\phi)$, where $H$ denotes the Heaviside function and $\phi : \Omega \to \mathbb{R}$ is an embedding function used to represent the image object implicitly as $\Sigma = \{x \in \Omega\ ;\ \phi(x) > 0\}$. This highly non-linear optimization problem is solved using gradient descent, which, in the level set framework, corresponds to the following evolution PDE for the active contour $\Gamma(t) := \partial\Sigma(t) = \{x \in \Omega\ ;\ \phi(x, t) = 0\}$,

$$\frac{\partial\phi}{\partial t} = \left\{\operatorname{div}\left(\frac{\nabla\phi}{|\nabla\phi|}\right) + \frac{\lambda}{2}\left[(I - c_0)^2 - (I - c_1)^2\right]\right\}|\nabla\phi|,$$

where $t$ is an artificial time parameter and $\phi = \phi(x, t)$ a time dependent level set function. At every instant in this evolution, the gray value estimates $c_0$, $c_1$ are updated according to

$$c_0 = c_0(u) = \frac{\langle 1 - u, I\rangle}{\langle 1 - u, 1\rangle} \quad \text{and} \quad c_1 = c_1(u) = \frac{\langle u, I\rangle}{\langle u, 1\rangle}. \tag{2}$$
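For a discrete image, the update (2) reduces to two weighted means. The following sketch assumes the image and the label function are stored as equally sized lists of rows; the names `inner` and `gray_values` are hypothetical. The pixel area cancels in both quotients, so it is omitted, and the case where $u$ is identically 0 or 1 (zero denominator) is not handled.

```python
def inner(u, v):
    """Discrete L2 inner product <u, v> (pixel area omitted)."""
    return sum(a * b for row_u, row_v in zip(u, v)
               for a, b in zip(row_u, row_v))

def gray_values(u, image):
    """Gray value estimates c0 (where u = 0) and c1 (where u = 1), as in (2)."""
    ones = [[1.0] * len(row) for row in image]
    one_minus_u = [[1.0 - a for a in row] for row in u]
    c0 = inner(one_minus_u, image) / inner(one_minus_u, ones)
    c1 = inner(u, image) / inner(u, ones)
    return c0, c1
```

For a binary $u$ these are simply the mean gray values outside and inside the segmented region; for a relaxed $u$ with values in $[0, 1]$ they become weighted means.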
One of the most inspiring discoveries in recent years, due to Chan, Esedoğlu and Nikolova [5], is that, for any fixed $c$, the minimization of (1) with respect to binary label functions $u$ may be solved exactly by considering a convex relaxation of the problem, in which the set of admissible $u$'s is enlarged to

$$K := \{u \in BV(\Omega)\ ;\ 0 \le u(x) \le 1 \text{ for all } x \in \Omega\}. \tag{3}$$
In fact, it was shown in [5] that if $u \in K$ minimizes (1), then for almost all thresholds $t \in (0, 1)$ the function

$$u^t(x) = \begin{cases} 1 & \text{if } u(x) > t, \\ 0 & \text{otherwise} \end{cases} \qquad (x \in \Omega), \tag{4}$$

is a global minimizer for the original problem. The proof is recalled in Section 2.1. Thus, global minimizers of the Chan-Vese model can be found by truncation of the solution to an easier, unilaterally constrained, convex variational problem. The use of this truncation property is referred to as the continuous (graph) cut method, and problems formulated in this manner can be solved efficiently using fast algorithms for TV-minimization; see, e.g., Chambolle [6].

The problem of including a priori shape information into the segmentation process has been studied extensively within the level set framework for the last decade or so [7, 8, 9, 10, 11]. The common approach is to include an interaction energy between the object Σ and a prior shape into the segmentation functional. If $f$ denotes the characteristic function of the prior shape, then a typical shape prior segmentation functional looks like

$$E(u, c, f) = E_{CV}(u, c) + \frac{\gamma}{2}\|u - f\|^2, \tag{5}$$
where γ > 0 is a fixed coupling constant for the interaction, and $\|u\| = \langle u, u\rangle^{1/2}$ is the $L^2$ norm. The shape interaction in (5) may be interpreted geometrically as $\|u - f\|^2 = \operatorname{area}(\Sigma \,\triangle\, \{f = 1\})$, i.e. the area of the symmetric difference between the set Σ and the prior shape $\{f = 1\}$, cf. [10] and [11]. The segmentation is now obtained by minimization of the functional (5) with respect to the (binary) label functions $u$, gray values $c$ and $f \in F$, where $F$ denotes a class of prescribed shape priors. This formulation is quite general. A specific example, considered in this paper, is segmentation with pose invariant priors. In this case $F = \{f = f_0 \circ T\}_{T \in G}$, where the binary function $f_0$ is a shape template and $T$ ranges over a group of transformations $G$, e.g. the group of similarity transformations.

Since continuous cuts have emerged as an alternative to level sets for minimization of the CV model and other segmentation models, it is natural to ask whether known shape prior segmentation models can be reformulated as variational problems possessing the important truncation property, which allows them to be solved using TV-minimization algorithms. One such attempt has been made in [12], see Section 2.2, but it does not go all the way. The purpose of the present paper is to formulate the shape prior segmentation model (5) as a continuous cut problem. This is achieved by reformulating the problem as a CV model (see Section 3.1). We specifically consider shape priors which are pose invariant under the group of similarity transforms, which involves optimization over a Lie group. In order to solve this problem efficiently and reliably, we develop a theory for gradient descent on Lie groups (Section 3.3). The problem here is, essentially, to construct a Riemannian structure on the Lie group. The new theory eliminates the problems associated with step-size selection in discretizations of the gradient descent ODEs usually encountered in segmentation models with pose estimation.
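The geometric reading of the shape interaction term in (5) is easy to check on a pixel grid. The sketch below is an illustration only: the function names and the `pixel_area` parameter are assumptions introduced here, and binary images are stored as lists of rows with entries 0 or 1.

```python
def squared_l2_distance(u, f, pixel_area=1.0):
    """Discrete version of ||u - f||^2."""
    return pixel_area * sum((a - b) ** 2
                            for row_u, row_f in zip(u, f)
                            for a, b in zip(row_u, row_f))

def symmetric_difference_area(u, f, pixel_area=1.0):
    """Area of the pixels that belong to exactly one of the two shapes."""
    return pixel_area * sum(1 for row_u, row_f in zip(u, f)
                            for a, b in zip(row_u, row_f)
                            if (a > 0.5) != (b > 0.5))
```

For binary inputs the two functions agree, since $(u - f)^2$ equals 1 exactly on the pixels where the shapes differ and 0 elsewhere. For a relaxed $u$ with values strictly between 0 and 1 they no longer coincide.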
2 Background

2.1 Relaxation in the CV Model
In this section we briefly describe the theory behind the continuous cut solution for the CV model and its connection to the ROF denoising model and TV-minimization. Let us consider the minimization of (1) over the set of label functions $u \in K$ defined in (3), and gray values $c \in \mathbb{R}^2$. In this setting $E_{CV}$ is a bi-convex functional, that is, convex in each of its arguments $u$ and $c$ separately when the other is kept fixed. However, $E_{CV}$ is not jointly convex. One therefore uses a method, referred to in this paper as the CV-algorithm, which alternates between optimization in $u$ and in $c$: if an initial state $(u^0, c^0)$ is given, then a minimizing sequence $(u^k, c^k)$ is constructed by

$$u^{k+1} = \arg\min_{u \in K} E_{CV}(u, c^k), \tag{6}$$

$$c^{k+1} = \arg\min_{c \in \mathbb{R}^2} E_{CV}(u^{k+1}, c). \tag{7}$$
The sub-problem (7) is a simple quadratic optimization whose solution is readily given by the formulas in (2) with $u = u^{k+1}$. We therefore proceed to describe the theory and algorithms for the continuous cut solution of the sub-problem (6). If $c$ is fixed, then the minimization of (1) over $K$ is equivalent to the minimization over $K$ of the energy

$$\hat{E}(u) = J(u) + \frac{\lambda}{2}\left\langle (I - c_1)^2 - (I - c_0)^2,\ u\right\rangle := J(u) + \langle g, u\rangle, \tag{8}$$
where $g = (\lambda/2)\left[(I - c_1)^2 - (I - c_0)^2\right]$ is the data term. We now prove the result of Chan et al. [5] referred to in the Introduction, namely that a minimizer of $\hat{E}$ over binary $u$'s can be obtained by truncation from the solution of the convex variational problem $\inf_{u \in K} \hat{E}(u)$. For $u \in BV(\Omega)$, let $u^t$ denote the function defined in (4). We recall:

The Truncation Lemma. If $u \in K$ solves $\inf_K \hat{E}$, then so does $u^t$ for almost all $t \in [0, 1]$.

Proof. The coarea formula, $J(u) = \int_0^1 J(u^t)\,dt$, and the layer cake representation $\langle g, u\rangle = \int_0^1 \langle g, u^t\rangle\,dt$ together yield $\hat{E}(u) = \int_0^1 \hat{E}(u^t)\,dt$. Since $u^t \in K$ is admissible, $\hat{E}(u^t) \ge \hat{E}(u)$ for all $t$ by assumption, so the non-negative integrand on the left hand side of $\int_0^1 \left(\hat{E}(u^t) - \hat{E}(u)\right)dt = 0$ must be zero for almost all $t \in [0, 1]$.
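The identity $\hat{E}(u) = \int_0^1 \hat{E}(u^t)\,dt$ used in the proof can be checked numerically in a simple discrete setting. The sketch below assumes a 1D signal with values in $[0, 1]$ and the discrete total variation $\sum_i |u_{i+1} - u_i|$, for which the coarea formula holds exactly; the function names are hypothetical. The integral over $t$ is evaluated exactly, because $u^t$ only changes when $t$ crosses one of the values taken by $u$.

```python
def threshold(u, t):
    """The truncated function u^t of (4)."""
    return [1.0 if x > t else 0.0 for x in u]

def total_variation(u):
    """Discrete 1D total variation."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))

def energy(u, g):
    """Discrete version of (8): J(u) + <g, u>."""
    return total_variation(u) + sum(gi * ui for gi, ui in zip(g, u))

def integrated_threshold_energy(u, g):
    """Integral over t in [0, 1] of the energy of u^t, for u with values
    in [0, 1]. Between consecutive values of u the function u^t is constant,
    so one evaluation per interval suffices."""
    breaks = sorted(set([0.0, 1.0] + list(u)))
    total = 0.0
    for lo, hi in zip(breaks, breaks[1:]):
        mid = 0.5 * (lo + hi)
        total += (hi - lo) * energy(threshold(u, mid), g)
    return total
```

For example, with `u = [0.0, 0.25, 0.75, 1.0, 0.5]` and `g = [1.0, -2.0, 0.5, -1.0, 3.0]` both `energy(u, g)` and `integrated_threshold_energy(u, g)` evaluate to 1.875.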
In Chan et al. [5], the minimum was approximated by solving a degenerate parabolic PDE for $u$ (the gradient descent PDE) with an exact penalty term to ensure that the constraint $0 \le u \le 1$ is satisfied at all times. This PDE was implemented with an explicit finite difference scheme, and is therefore rather slow. We have chosen another method, introduced by Aujol and Chambolle [13] and used successfully by Bresson et al. [14, Sect. 3.2]. This consists of minimizing a variant of (8) which has been regularized slightly by infimal convolution with a quadratic function,

$$\inf_{v \in BV,\ u \in K}\ J(v) + \frac{1}{2\theta}\|v - u\|^2 + \langle g, u\rangle, \tag{9}$$

where θ > 0 is a parameter, and sending θ → 0. For fixed θ, the problem is solved iteratively using what we call the ABC-algorithm: if $(v^0, u^0)$ denotes an initial guess, then a minimizing sequence is given by the pairs $(v^n, u^n)$, where

$$v^{n+1} = \arg\min_{v \in BV}\ J(v) + \frac{1}{2\theta}\|v - u^n\|^2 = u^n - \theta\,\mathrm{Pr}_C(u^n/\theta), \tag{10}$$

$$u^{n+1} = \arg\min_{u \in K}\ \frac{1}{2\theta}\|v^{n+1} - u\|^2 + \langle g, u\rangle = \mathrm{Pr}_K(v^{n+1} - \theta g). \tag{11}$$
The first of these problems is the classical Rudin-Osher-Fatemi (ROF) image denoising model [15] with $u^n$ as input image. The second one is a simple $L^2$ optimization. Both problems are strictly convex, thus admit unique solutions, and, as indicated, their optima can be expressed in terms of $L^2$-projections onto closed convex sets: the first projection is onto $C$, which is the $L^2$-closure of the set $\{\operatorname{div}\xi\ ;\ \xi \in C^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1\ \forall x \in \Omega\}$, cf. Chambolle [6]. The second projection is onto $K$, defined above. The latter is easy to compute; indeed, $\mathrm{Pr}_K f(x) = \min(1, \max(0, f(x)))$ for $x \in \Omega$, for any square integrable function $f : \Omega \to \mathbb{R}$.
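The second half-step of the iteration, the projection onto $K$, is a pointwise clipping and can be sketched in a few lines. The names `project_K` and `update_u` are hypothetical, the functions are stored as flat lists of samples, and the ROF half-step (the projection onto $C$) is deliberately left out because it requires an iterative TV solver.

```python
def project_K(f):
    """Pointwise projection onto K: clip every sample to [0, 1]."""
    return [min(1.0, max(0.0, x)) for x in f]

def update_u(v, g, theta):
    """u-update of the alternating scheme: Pr_K(v - theta * g)."""
    return project_K([vi - theta * gi for vi, gi in zip(v, g)])
```

The closed form follows because the objective of the $u$-update decouples over the samples: at each point one minimizes $(v - u)^2 / (2\theta) + g\,u$ over $u \in [0, 1]$, a parabola in $u$ whose unconstrained minimizer is $v - \theta g$.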
N.Chr. Overgaard, K. Fundana, and A. Heyden
To minimize the ROF functional (10) we use a variant of the fast and reliable algorithm for TV-minimization proposed by Chambolle [6, 16].

2.2 The Algorithm of Fundana and Co-workers
A recent paper by Fundana et al. [12] contains what is probably the first attempt to include shape priors into continuous cut segmentation. The authors consider the model (5) where $f = f_0\circ T$ is pose invariant under the group of similarity transformations T of the plane, i.e. the variational problem

$$\inf_{u,c,T}\ E(u,c,T) := E_{CV}(u,c) + \frac{\gamma}{2}\|u - f_0\circ T\|^2. \tag{12}$$
This problem cannot be solved by continuous cuts (for c and T fixed) simply by enlarging the admissible label functions from the binary u’s to u ∈ K. The problem, of course, lies in the quadratic interaction term, which seems to “spoil” the Truncation Lemma. In [12] this problem is cleverly circumvented by the following construction: If $(u^0,c^0,T^0)$ denotes an initial guess, then a minimizing sequence $(u^k,c^k,T^k)$ is (essentially) constructed by the following procedure:

$$c^{k+1} = c(u^k)\quad\text{using formula (2)}, \tag{13}$$

$$T^{k+1} = T^k - \Delta t\,\frac{\partial E}{\partial T}(u^k,c^{k+1},T^k),\quad\text{time step } \Delta t>0, \tag{14}$$

$$u^{k+1} = \arg\min_{u\in K}\ E_{CV}(u,c^{k+1}) + \frac{\gamma}{2}\big\langle u - f_0\circ T^{k+1},\ u^k - f_0\circ T^{k+1}\big\rangle. \tag{15}$$
Here we observe that by “freezing” one occurrence of u = uk in the quadratic interaction term, the update step (15) becomes linear in u, hence solvable by continuous cut methods. In [12] this minimization was performed using the gradient descent PDE from [5]. Our aim is to improve the above method by formulating the problem in such a way that the model itself, not only the algorithm, satisfies the truncation property.
3 The Shape Prior Segmentation Model

3.1 The Basic Energy Functional
Our reformulation of the functional (5) is based on the following observation: If the label function $u:\Omega\to\{0,1\}$ is binary, and we define an image model by $I_{model} = I_{model}(u,c) = c_0(1-u) + c_1 u$, then it is easy to see that the CV-functional (1) may be rewritten as:

$$E_{CV}(u,c) = J(u) + \frac{\lambda}{2}\|I - I_{model}\|^2. \tag{16}$$
This suggests the following model for shape prior segmentation: If f : Ω → R denotes a (possibly fuzzy) shape prior, that is 0 ≤ f (x) ≤ 1 on Ω, then we
Pose Invariant Shape Prior Segmentation Using Continuous Cuts
associate an image model to f given by $I_{prior} = I_{prior}(f,b) = b_0(1-f) + b_1 f$. We now pose shape prior segmentation as the minimization over all binary label functions u of the following functional:

$$E(u,c,f,b) = E_{CV} + E_{prior} = J(u) + \frac{\lambda}{2}\|I - I_{model}\|^2 + \frac{\mu}{2}\|I_{model} - I_{prior}\|^2. \tag{17}$$

Notice that close to convergence, it is reasonable to expect that $b_0 \approx c_0$ and $b_1 \approx c_1$. Assuming that exact equality holds here, we find that

$$\frac{\mu}{2}\|I_{model} - I_{prior}\|^2 = \frac{\mu}{2}(c_1-c_0)^2\|u-f\|^2, \tag{18}$$
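which corresponds to the interaction term in (5) if we set $\gamma = \mu(c_1-c_0)^2$. We will use this simplification in Section 3.2. Let us consider the minimization of (17) with respect to u and c when prior data b and f are kept fixed. After completion of squares in (17) we find that

$$E(u,c,f,b) = J(u) + \frac{\lambda+\mu}{2}\Big\|I_{model} - \Big(\frac{\lambda}{\lambda+\mu}I + \frac{\mu}{\lambda+\mu}I_{prior}\Big)\Big\|^2 + \frac{\lambda}{2}\|I\|^2 + \frac{\mu}{2}\|I_{prior}\|^2 - \frac{\lambda+\mu}{2}\Big\|\frac{\lambda}{\lambda+\mu}I + \frac{\mu}{\lambda+\mu}I_{prior}\Big\|^2. \tag{19}$$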
Only the first square depends on the (binary) u and c. So updating u and c is equivalent to solving the following CV-problem:

$$\inf\ J(u) + \frac{\lambda+\mu}{2}\Big[\big\langle 1-u,(I_{eff}-c_0)^2\big\rangle + \big\langle u,(I_{eff}-c_1)^2\big\rangle\Big]. \tag{20}$$

Here $I_{eff} = \frac{\lambda}{\lambda+\mu}I + \frac{\mu}{\lambda+\mu}I_{prior}$ is an effective image obtained as a convex combination of the observed image I and the prior image $I_{prior}$. The problem (20) has the truncation property, and may be solved by the CV-algorithm (6), (7), using continuous cuts. This solution is a minimizer of (17). Suppose that c and u have been updated and are now held fixed. Returning to the energy E, written in the original form (17), we optimize with respect to the prior image model $I_{prior} = b_0(1-f) + b_1 f$. An easy calculation shows that the optimal gray scales $b = (b_0,b_1)$ are given by the formulas:

$$b_0 = \frac{\langle 1-f, I_{model}\rangle}{\|1-f\|^2}\quad\text{and}\quad b_1 = \frac{\langle f, I_{model}\rangle}{\|f\|^2}.$$
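The reduction to a single effective image can be checked on a few pixels. The sketch below is an illustration only: integrals are replaced by plain sums over pixels, the sample values are arbitrary, the shape prior in the gray-level example is taken to be binary, and all function names are hypothetical.

```python
def effective_image(I, I_prior, lam, mu):
    """Convex combination I_eff = (lam * I + mu * I_prior) / (lam + mu)."""
    return [(lam * a + mu * b) / (lam + mu) for a, b in zip(I, I_prior)]

def sq_dist(x, y):
    """Squared L2 distance, with the integral replaced by a sum over pixels."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def data_energy(I_model, I, I_prior, lam, mu):
    """The two quadratic terms of (17), without the length term J(u)."""
    return 0.5 * lam * sq_dist(I, I_model) + 0.5 * mu * sq_dist(I_model, I_prior)

def gray_levels(f, I_model):
    """Gray levels b0, b1 from the formulas above (f assumed binary here)."""
    one_minus_f = [1.0 - fi for fi in f]
    b0 = sum(w * m for w, m in zip(one_minus_f, I_model)) / sum(w * w for w in one_minus_f)
    b1 = sum(w * m for w, m in zip(f, I_model)) / sum(w * w for w in f)
    return b0, b1

lam, mu = 0.1, 0.4
I = [0.2, 0.8, 0.5]
I_prior = [0.0, 1.0, 1.0]
I_eff = effective_image(I, I_prior, lam, mu)

# For any two image models A and B, the difference of the quadratic terms of
# (17) equals the difference of ((lam + mu) / 2) * ||I_model - I_eff||^2.
A = [0.0, 1.0, 0.0]
B = [1.0, 1.0, 0.3]
lhs = data_energy(A, I, I_prior, lam, mu) - data_energy(B, I, I_prior, lam, mu)
rhs = 0.5 * (lam + mu) * (sq_dist(A, I_eff) - sq_dist(B, I_eff))
print(lhs, rhs)

b0, b1 = gray_levels([0.0, 0.0, 1.0, 1.0], [0.1, 0.3, 0.7, 0.9])
print(b0, b1)
```

Since the two differences coincide, ranking candidate image models by (17) or by the effective fidelity term gives the same order, which is what allows the update of u and c to be treated as an ordinary CV-problem on $I_{eff}$.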
With these values fixed, we proceed to update the pose of the shape prior f, which is the subject of the next few sections.

3.2 Pose Invariant Prior Interaction Energy
Let $f_0:\Omega\to\mathbb{R}$ denote a shape template of class $C_0^1(\Omega)$, and $T:\mathbb{R}^2\to\mathbb{R}^2$ a similarity transformation, that is, a mapping of the form $y = T(x) = \mu^{-1}R^{-1}(x-a)$, $x\in\mathbb{R}^2$, where $R\in SO(2)$ denotes rotation, $\mu>0$ a scaling factor, and $a\in\mathbb{R}^2$ translation. We define the shape prior f as the transformed template $T^*f_0:\mathbb{R}^2\to\mathbb{R}$ by the formula $f(x) = T^*f_0(x) = (f_0\circ T)(x) = f_0(T(x))$ for all $x\in\mathbb{R}^2$. If T is sufficiently close to the identity map then, clearly, $T^*f_0\in C_0^1(\Omega)$, so that the support of the prior remains inside the image domain Ω. In the present paper we use the simplification of (17) in (18) and consider a pose invariant prior interaction defined by the energy

$$E_{prior}(u) = \inf_T \|u - T^*f_0\|^2 = \inf_T \int_\Omega \big(u(x) - f_0(T(x))\big)^2\,dx, \tag{21}$$
where the infimum is taken over the group of similarity transforms T in the plane. The following (natural) parametrization is used throughout:

$$a\in\mathbb{R}^2,\quad \mu = e^{\sigma}\ (\sigma\in\mathbb{R}),\quad\text{and}\quad R(\theta) = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\ (\theta\in\mathbb{R}). \tag{22}$$

The pose parameters θ, σ and a are collected in a vector $p = (p_1,p_2,p_3,p_4) := (\theta,\sigma,a)\in\mathbb{R}^4$, the corresponding map is occasionally denoted T = T(p), and the shape prior becomes $f(x) = T^*f_0(x) = T(p)^*f_0(x) = f_0(e^{-\sigma}R(-\theta)(x-a))$. Now, the infimum in (21) is usually computed by applying a gradient descent procedure to the function $\mathbb{R}^4\ni p\mapsto E(p) := \|u - T(p)^*f_0\|^2/2$. That is, one solves the system of ODEs $\dot p(t) = -\nabla E(p(t))$ with respect to an artificial time parameter t, and then obtains the optimal pose $p^*$ as $p^* = \lim_{t\to\infty}p(t)$. This method requires the computation of the partial derivatives $\partial E(p)/\partial p_i$ for every component $p_i$ of p. A simple calculation shows that $\partial E(p)/\partial p_i = \langle T(p)^*f_0 - u,\ \partial T(p)^*f_0/\partial p_i\rangle$, so we begin with the partials $\partial T(p)^*f_0(x)/\partial p_i$. By the chain rule,

$$\frac{\partial}{\partial a}T^*f_0(x) = -\nabla_x T^*f_0(x) = -\nabla_x f(x)\quad\text{(two components!)}$$

$$\frac{\partial}{\partial\theta}T^*f_0(x) = -\nabla_x T^*f_0(x)^T J(x-a) = -\nabla_x f(x)^T J(x-a)$$

$$\frac{\partial}{\partial\sigma}T^*f_0(x) = -\nabla_x T^*f_0(x)^T (x-a) = -\nabla_x f(x)^T (x-a)$$

where $J = R(-\theta)^T R'(-\theta) = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$ is the clockwise rotation by π/2 radians. Notice that $-\nabla_x f$ appears in all the formulas, with the x-derivative computed after transformation of the template. It follows from the above formulas that the partial derivatives of E(p) are given by (the first equation being interpreted componentwise)

$$\frac{\partial}{\partial a}E(\theta,\sigma,a) = -\langle f-u,\nabla_x f\rangle,\quad \frac{\partial}{\partial\theta}E(\theta,\sigma,a) = -\langle f-u,\nabla_x f^T J(\cdot-a)\rangle,\quad\text{and}\quad \frac{\partial}{\partial\sigma}E(\theta,\sigma,a) = -\langle f-u,\nabla_x f^T(\cdot-a)\rangle. \tag{23}$$
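The parametrization (22) and two of the chain-rule identities above can be checked with finite differences. The sketch below is a hedged illustration: the Gaussian template, the sample pose and the finite-difference step are hypothetical choices, and only the translation and log-scale identities are tested (the rotation identity depends on the orientation convention for J and is not checked here).

```python
import math

def similarity_map(p, x):
    """T(p)x = exp(-sigma) * R(-theta) * (x - a), with p = (theta, sigma, a1, a2)."""
    theta, sigma, a1, a2 = p
    dx, dy = x[0] - a1, x[1] - a2
    c, s = math.cos(theta), math.sin(theta)
    scale = math.exp(-sigma)
    # R(-theta) = [[c, s], [-s, c]] for R(theta) as in (22).
    return (scale * (c * dx + s * dy), scale * (-s * dx + c * dy))

def template(y):
    """Hypothetical smooth template f0: an anisotropic Gaussian bump."""
    return math.exp(-(y[0] ** 2 + 2.0 * y[1] ** 2))

def prior(p, x):
    """Shape prior f(x) = f0(T(p)x)."""
    return template(similarity_map(p, x))

def central_diff(fun, h=1e-5):
    """Central finite difference of a scalar function at 0."""
    return (fun(h) - fun(-h)) / (2.0 * h)

p = (0.3, 0.2, 0.5, -0.4)
x = (1.1, 0.7)
theta, sigma, a1, a2 = p

# Spatial gradient of f at x, computed after transformation of the template.
fx = central_diff(lambda h: prior(p, (x[0] + h, x[1])))
fy = central_diff(lambda h: prior(p, (x[0], x[1] + h)))

# Partial derivatives with respect to translation and log-scale.
df_da1 = central_diff(lambda h: prior((theta, sigma, a1 + h, a2), x))
df_da2 = central_diff(lambda h: prior((theta, sigma, a1, a2 + h), x))
df_dsigma = central_diff(lambda h: prior((theta, sigma + h, a1, a2), x))

res_a = (df_da1 + fx, df_da2 + fy)                              # df/da + grad f
res_sigma = df_dsigma + fx * (x[0] - a1) + fy * (x[1] - a2)     # df/dsigma + grad f . (x - a)
print(res_a, res_sigma)
```

Both residuals vanish up to finite-difference error, in agreement with the formulas for the translation and scale derivatives.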
These integrals are effectively computed on the support of $-\nabla_x f$, that is, over a neighbourhood of the boundary of the shape prior. The traditional way to proceed is to iteratively update the pose parameters a, θ and σ using (essentially) the schemes $a(t+\Delta t_a) = a(t) - \Delta t_a\cdot\partial E/\partial a$, $\theta(t+\Delta t_\theta) = \theta(t) - \Delta t_\theta\cdot\partial E/\partial\theta$, and $\sigma(t+\Delta t_\sigma) = \sigma(t) - \Delta t_\sigma\cdot\partial E/\partial\sigma$. This is problematic: for this method to work properly, the time steps $\Delta t_a$, $\Delta t_\theta$ and $\Delta t_\sigma$ have to be chosen differently, and with great care. This is not only unsatisfying from a theoretical viewpoint, but it also limits the practical applicability of the method, not least because the delicate choice of time steps tends to be time-consuming. We address this problem in the next section.

3.3 The Gradient Construction
The group of similarity transformations constitutes a four-dimensional manifold that we denote M (i.e., M is a Lie group). Any point p ∈ M may be represented by the coordinates p = (θ, σ, a) using (22), which may be regarded as an (almost global) parametrization of a neighbourhood of the identity map in M. If E : M → R is a differentiable function then $dE(p): T_pM\to\mathbb{R}$ denotes the differential of E at p ∈ M, where $T_pM$ is the tangent space of M at p. In the local coordinates the differential may be expressed as $dE = \frac{\partial E}{\partial a}da + \frac{\partial E}{\partial\theta}d\theta + \frac{\partial E}{\partial\sigma}d\sigma$. Suppose that $T_pM$ is equipped with a scalar product $(\cdot,\cdot)_p$; then we may define the gradient of E at p as the unique vector $\nabla E(p)\in T_pM$ which satisfies the relation

$$(\nabla E(p), v)_p = dE(p)v,\quad \forall v\in T_pM. \tag{24}$$

The metric $ds^2 = |da|^2 + d\theta^2 + d\sigma^2$ defines a scalar product which, as already noted, is insufficient for the construction of a reliable gradient descent scheme for $E(p) = \|u - T(p)^*f_0\|^2/2$. Our goal is to define a Riemannian structure on M which is better suited for this task. Let a function $f: M\times\mathbb{R}^2\to\mathbb{R}$ be defined by $f(p,x) = T(p)^*f_0(x) = f_0(T(p)x)$. Since the shape template $f_0\in L^2(\mathbb{R}^2)$, the mapping $p\mapsto f(p,\cdot)$ is a function $f: M\to L^2(\mathbb{R}^2)$. Now, $L^2(\mathbb{R}^2)$ comes with an inner product $\langle\cdot,\cdot\rangle$, so it is natural to define the scalar product on $T_pM$ as the pullback by f of the $L^2$-inner product to the tangent space $T_pM$,

$$(v,w)_p = \langle df(p)v,\ df(p)w\rangle,\quad (v,w\in T_pM), \tag{25}$$

where $df(p): T_pM\to T_{f(p)}L^2(\mathbb{R}^2)\equiv L^2(\mathbb{R}^2)$ denotes the differential of f. By the chain rule, $df = -(D_xf_0\circ T)\,dT$, so in view of the identity $dT = DT\,dp = D_xT(p)\,DT(0)\,dp$ (which uses the group structure of M) we see that $df = -\nabla_x f^T DT(0)\,dp$, where DT(0) is the linear map given by the block matrix

$$DT(0) = \begin{bmatrix} I_{2\times2} & J(x-a) & (x-a)\end{bmatrix}.$$
As before, $J = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$. With this calculation we find that

$$\langle df(p)v,\ df(p)w\rangle = \big\langle -\nabla_x f^T DT(0)\,dp(v),\ -\nabla_x f^T DT(0)\,dp(w)\big\rangle = dp(v)^T\big\langle 1,\ DT(0)^T\nabla_x f\,\nabla_x f^T DT(0)\big\rangle\,dp(w) := dp(v)^T G(p)\,dp(w),$$

where G(p) denotes the metric tensor on $T_pM$ expressed in the coordinates p. If we define $M = \nabla_x f\,\nabla_x f^T$ then $G(p) = \langle 1, g(p,\cdot)\rangle$ where $g(p,\cdot):\mathbb{R}^2\to\mathbb{R}^{4\times4}$ is given by $g(p,x) = DT(0)^T M\,DT(0)$, which equals

$$\begin{bmatrix} M & MJ(x-a) & M(x-a)\\ (x-a)^TJ^TM & (x-a)^TJ^TMJ(x-a) & (x-a)^TJ^TM(x-a)\\ (x-a)^TM & (x-a)^TMJ(x-a) & (x-a)^TM(x-a)\end{bmatrix}.$$
This expression is, unfortunately, too complicated for our present purpose, so we need to make some simplification. This is achieved by approximating the structure tensor M by the simpler tensor $\frac12|\nabla_x f|^2 I_{2\times2}$. (There are some compelling reasons for doing so! For instance $g_{3,3} + g_{4,4} = |\nabla_x f|^2|x-a|^2$.) With this simplification we get

$$g(p,x) = |\nabla_x f|^2\begin{bmatrix} I_{2\times2} & J(x-a) & (x-a)\\ (x-a)^TJ^T & |x-a|^2 & (x-a)^TJ^T(x-a)\\ (x-a)^T & (x-a)^TJ(x-a) & |x-a|^2\end{bmatrix},$$

where we notice that, in fact, the matrix elements $g_{4,3} = g_{3,4} = 0$ because J is skew-symmetric. Finally, if we choose a (the center of rotation and scaling) such that $\langle|\nabla_x f|^2,\ x-a\rangle = 0$, that is, as the barycenter of the mass distribution $dm = |\nabla_x f|^2dx$, then the metric tensor $G = \langle1,g\rangle$ has the following diagonal form:

$$G(p) = \begin{bmatrix}\|\nabla_x f\|^2 I_{2\times2} & 0 & 0\\ 0 & \||x-a|\nabla_x f\|^2 & 0\\ 0 & 0 & \||x-a|\nabla_x f\|^2\end{bmatrix}. \tag{26}$$

Equivalently, $(dp,dp)_p = \|\nabla_x f\|^2|da|^2 + \||x-a|\nabla_x f\|^2(d\theta^2 + d\sigma^2)$. It follows from (25) and the formulas (23) that the corresponding gradient of E has the components

$$\nabla_aE = \frac{\langle f-u,\ -\nabla_xf\rangle}{\|\nabla_xf\|^2},\quad \nabla_\theta E = \frac{\langle f-u,\ -\nabla_xf^TJ(\cdot-a)\rangle}{\||x-a|\nabla_xf\|^2},\quad\text{and}\quad \nabla_\sigma E = \frac{\langle f-u,\ -\nabla_xf^T(\cdot-a)\rangle}{\||x-a|\nabla_xf\|^2}. \tag{27}$$
This is the gradient used in our implementation of gradient descent search for the optimal pose parameters. Its use amounts to an adaptive step-size control in the numerical discretization of the associated system of ODEs.
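The components (27) can be evaluated directly from pixel samples. The following sketch is an assumption-laden illustration, not the authors' MATLAB code: inner products and norms are approximated by plain sums over pixels (unit pixel area), the inputs are flat lists of samples, the matrix J is taken exactly as printed in the text, and the function name is hypothetical.

```python
def pose_gradient(r, fx, fy, xs, ys):
    """Rescaled gradient (27) from samples on a pixel grid.

    r      : values of f - u at the pixels
    fx, fy : components of the spatial gradient of f at the pixels
    xs, ys : pixel coordinates
    """
    J = ((0.0, 1.0), (-1.0, 0.0))  # the matrix J as printed in the text
    mass = [gx * gx + gy * gy for gx, gy in zip(fx, fy)]
    total = sum(mass)  # ||grad f||^2
    # Centre a: barycentre of the mass distribution |grad f|^2 dx.
    a = (sum(m * x for m, x in zip(mass, xs)) / total,
         sum(m * y for m, y in zip(mass, ys)) / total)
    dx = [x - a[0] for x in xs]
    dy = [y - a[1] for y in ys]
    # || |x - a| grad f ||^2
    moment = sum(m * (u * u + v * v) for m, u, v in zip(mass, dx, dy))

    grad_a = (-sum(ri * gx for ri, gx in zip(r, fx)) / total,
              -sum(ri * gy for ri, gy in zip(r, fy)) / total)
    grad_theta = -sum(
        ri * (gx * (J[0][0] * u + J[0][1] * v) + gy * (J[1][0] * u + J[1][1] * v))
        for ri, gx, gy, u, v in zip(r, fx, fy, dx, dy)) / moment
    grad_sigma = -sum(ri * (gx * u + gy * v)
                      for ri, gx, gy, u, v in zip(r, fx, fy, dx, dy)) / moment
    return a, grad_a, grad_theta, grad_sigma

# Two-pixel toy example: the residual is antisymmetric about the barycentre.
a, grad_a, grad_theta, grad_sigma = pose_gradient(
    r=[1.0, -1.0], fx=[1.0, 1.0], fy=[0.0, 0.0], xs=[0.0, 2.0], ys=[0.0, 0.0])
print(a, grad_a, grad_theta, grad_sigma)
```

Because each component is divided by the corresponding diagonal entry of (26), translation, rotation and scale updates come out on comparable scales, which is what makes a single step size usable for all pose parameters in this construction.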
Fig. 1. Experiment 1: First row: The original image, 212×320 pixels (left), the active contour Γ = {x ; u(x) = .5} in CV-segmentation without priors after 100 iterations (middle), and the corresponding segmentation (right). Second row: The shape template, the active contour and the shape prior after 150 iterations, and the final segmentation. Final row: segmentation of the image contaminated with 15% Gaussian noise using 200 iterations. Parameters: μ = .4, λ = .1, θ = .5 and step-size Δt = .75.
4 Experiments
The method presented in Section 3 was implemented in MATLAB with the following specifics: For the minimization of (17) (in the form (19)) we used the ABC-algorithm (10) and (11) with the parameter θ = 0.5 and a variant of Chambolle’s algorithm [16, Eq. (12)], implemented with periodic boundary conditions, for the TV-minimization in (10). This was alternated with an update of the pose of the prior, using gradient descent with the new gradient (27). The experiments presented here are limited to a proof-of-concept level. The first experiment (Figure 1) shows the CV segmentation with and without the shape prior, and with added noise. The segmentation result is displayed as a cutout from the original image by multiplication with the optimal label function u. This verifies the binary character of u. The second experiment (Figure 2) shows how the search evolves for three different initializations. As shown, the method may not always converge to the wanted solution. In fact, the prior contour may sometimes even shrink and disappear. These cases correspond, however, to quite plausible local minima for the pose energy, and this behavior is not unexpected in a local optimization method. More details are found in the figure captions.
Fig. 2. Experiment 2: Shape prior segmentation with three different initial poses (top row). Evolution after (approximately) 12, 25, 50, 100 and 200 iterations (rows 2–6). The run-time for 100 iterations is about 25 CPU-seconds. In the final phase of the segmentation, objects previously detected outside the prior disappear. With the third initialization the shape prior gets stuck in a local minimum. Such behavior cannot be ruled out when we work with local optimization methods. Image size and parameter settings are as in Experiment 1.
5 Conclusion
This paper contains two central contributions. Firstly, the reformulation in (17) of the shape prior segmentation model in (5), which leads to a minimization problem which can be solved using continuous cut methods. Secondly, the derivation of the gradient expressions (27), which is the basis for a stable and efficient gradient descent scheme for prior pose optimization. We believe that the ideas introduced here can be extended to cover more general and complex shape prior segmentation models. In particular it would be interesting to see if the ideas can be applied to pose problems in three dimensions.
References

1. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
2. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
3. Sethian, J.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, Cambridge (1999)
4. Osher, S.J., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Springer, Heidelberg (2002)
5. Chan, T.F., Esedoğlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging and Vision 20(1–2), 89–97 (2004)
7. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: CVPR (2000)
8. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 78–92. Springer, Heidelberg (2002)
9. Cremers, D., Soatto, S.: A pseudo-distance for shape priors in level set segmentation. In: Faugeras, O., Paragios, N. (eds.) 2nd IEEE Workshop on Variational, Geometric and Level Set Methods in Computer Vision (2003)
10. Chan, T., Zhu, W.: Level set based prior segmentation. Technical Report UCLA CAM 03-66, Department of Mathematics, UCLA (2003)
11. Riklin-Raviv, T., Kiryati, N., Sochen, N.: Unlevel-sets: Geometry and prior-based segmentation. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, pp. 50–61. Springer, Heidelberg (2004)
12. Fundana, K., Heyden, A., Gosch, C., Schnörr, C.: Continuous graph cuts for prior-based object segmentation. In: Proc. ICPR (2008)
13. Aujol, J.-F., Chambolle, A.: Dual Norms and Image Decomposition Models. Int. J. Comput. Vis. 63(1), 85–104 (2005)
14. Bresson, X., Esedoğlu, S., Vandergheynst, P., Thiran, J.-P., Osher, S.: Fast global minimization of the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)
15. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Chambolle, A.: Total variation minimization and a class of binary MRF models. UMR CNRS 7641, Ecole Polytechnique, Centre de mathématiques appliquées (June 2005)
A Non-local Approach to Shape from Ambient Shading

Emmanuel Prados^1, Nitin Jindal^1, and Stefano Soatto^2

^1 Perception Lab., INRIA Grenoble – Rhône-Alpes, France
^2 Computer Science Department, University of California, Los Angeles, USA
Abstract. We study the mathematical and numerical aspects of the estimation of the 3-D shape of a Lambertian scene seen under diffuse illumination. This problem is known as “shape from ambient shading” (SFAS), and its solution consists of integrating a strongly non-local and non-linear Integro-Partial Differential Equation (I-PDE). We provide a first analysis of this global I-PDE, whereas previous work had focused on a local version that ignored effects such as occlusion of the light field. We also design an original approximation scheme which, following Barles and Souganidis’ theory, ensures the correctness of the numerical approximations, and we discuss some numerical issues.
1 Introduction
Shape From Shading (SFS) refers to the problem of computing the three-dimensional shape of a surface, under certain assumptions on its reflectance and on the illumination, from a single grayscale image. By necessity, to render the problem tractable, these assumptions are rather coarse: Most restrict the illumination to a single point-light source at infinity [20, 4, 13, 7]. Only recently, [14] have shown that the problem actually simplifies when the attenuation of the light source at finite distance is taken into account. Nevertheless, due to inter-reflections and other complex phenomena, modeling illumination as a point source is very unrealistic even on a bright sunny day. Indeed, in most realistic conditions, including indoor and overcast outdoor conditions, a uniform hemispherical illumination source is a more realistic model. The study of SFS under such illumination conditions has been pioneered by Langer et al. [10, 16, 9], and followed by others that we discuss shortly. In this work, we focus on the mathematical properties of the problem of “Shape From Ambient Shading” (SFAS), and seek conditions that render the problem well-posed.

1.1 Relation to Prior Work
Langer et al. [10,16,9] were the first to consider the case of ambient lighting, and to note that vignetting effects, far from being a nuisance, enable the inference of object shape similar to more traditional SFS, except for the added complication of the distributed source. In [17], Tian, Tsui and Yeung have proposed a numerical SFS algorithm for dealing with some non-punctual and multiple light sources (any combination of spherical, rectangular and cylindrical light sources). Following a more elaborate and physically motivated model of illumination, [12, 18, 9, 19] introduced methods to deal with interreflection. However, in none of these works [17, 10, 16, 9, 12, 18, 9] are the mathematical properties of the SFAS problem elucidated analytically. In particular, there are no results on the existence and uniqueness of solutions for the ensuing global PDE. At the opposite end of the spectrum, Lions, Rouy and Tourin [11] performed a theoretical analysis of the SFS problem for multiple and continuously distributed light sources. Like Tian, Tsui and Yeung [17], Lions, Rouy and Tourin neglect shadows (i.e. occlusions of the light sources by the surface itself); more specifically, they assume that for any fixed point x on the surface, all the light sources located on the hemisphere normal to the surface at x are visible from this point. This allows them to neglect the global nature of the equation, which in turn significantly simplifies the analysis. Like Langer et al. [10,16,9], we focus on ambient lighting. In their work, Langer et al. do not neglect the “shadows effect” and they model interreflections. They also underline the importance of ambient lighting in psychophysics. In this context, light comes from all directions and the assumption of Lions, Rouy and Tourin [11] is equivalent to assuming that the solution is concave. Here, we do not want to limit ourselves to concave objects. Therefore, Lions’ constraints are far too restrictive.^1 The necessity to consider these phenomena takes us to mathematically uncharted territories. To the best of our knowledge, we are the first to provide theoretical results for the SFAS problem. Also, we introduce numerical algorithms verifying the properties of monotonicity, consistency and stability which typically ensure their convergence (see [1]).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 696–708, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Modeling Shape from Ambient Shading
Shape From Shading exploits assumptions on the illumination and reflectance properties of the scene (or of an object of interest within the scene) to relate its three-dimensional (3-D) shape to the measured grayscale image. The most typical assumptions are that the scene is Lambertian with constant diffuse albedo. This is akin to chalk and rough stone, and neglects specularities, translucency and other complex phenomena in the interaction of light with matter. While this assumption is clearly violated in most natural and man-made scenes, there are significant portions of scenes where the assumption is reasonable, and even objects that are far from Lambertian, such as human faces, have been successfully approximated as such for the purpose of analysis and inference (but not for synthesis, as humans are evolutionarily attuned to discriminate subtle features in human faces). Clearly, since SFS is an ill-posed problem, there is no way to validate the assumptions on the data themselves, so applying SFS to a scene that is not Lambertian and that does not have constant diffuse albedo will result in gross errors even if the SFS algorithm used is provably correct and optimal. The second class of assumptions commonly made concerns illumination. The most common assumption, that of a point light source, is made more for mathematical convenience than for realism. Under this model, anything hidden from direct line-of-sight to the sun would be invisible, clearly a far cry from reality. Modeling the entire sky as a constant-radiance hemisphere seems to be equally crude, but indeed it has been shown to be a better approximation than a single point-light source [8]. Clearly, both phenomena are important and we hope for their eventual integration. In the next subsection we formalize these assumptions and introduce our notation.

^1 For simplicity, however, we also neglect interreflections, as Lions et al. [11] did, and we lump their contribution into the ambient illumination term, up to additive errors.

2.1 Reflectance Assumptions
Let S be a surface that supports a bi-directional reflectance distribution function (BRDF) β with Lambertian reflection and constant diffuse albedo ρ. In other words, following [6], the BRDF at a point p ∈ S does not depend on the viewing direction $\nu_{px}$, but only on the light source direction ν and on the position of the point itself p ∈ S: $\beta(p;\nu_{px},\nu) = \rho$. Because the intensity of the light source is not known, without loss of generality we can assume the albedo to be equal to 1, and attribute the actual value to the light source.

2.2 Lighting Assumptions
We assume the dominating sky principle [10], so we neglect inter-reflections and, for any point of the surface, consider only radiant energy coming from the sky, which is assumed to be a whole sphere of infinite radius. We also assume that the illumination is homogeneous, that is to say, that its power density distribution is constant. This assumption is required if we want to get rid of other constraints while still keeping the problem manageable. Now, unlike most previous work, we want to model the effect of self-occlusions, whereby the light source is only partly visible at each point. Let q be a point in $\mathbb{R}^3$. We call visibility function, and denote by $\chi_S(q;\nu)$, the indicator function of the directions $\nu\in S^2$ from q that are not occluded by S: $\chi_S(q;\nu) = 1$ if $\{q+\lambda\nu,\ \lambda\in\mathbb{R}^+\}\cap S = \emptyset$; otherwise, $\chi_S(q;\nu) = 0$. The visibility function specifies whether a point q is reached by the light ray of direction ν. The visibility cone assembles all the visible rays from a point $q\in\mathbb{R}^3$: $C_{S,q} = \{\nu\in S^2 : \chi_S(q;\nu) = 1\}$.

2.3 Resulting Radiance
Given the assumptions above, the radiance of the surface at a point p is

$$R_S(p) = \int_{S^2}\chi_S(p;\nu)\,\langle\nu,\nu_p\rangle\,d\nu = \int_{C_{S,p}}\langle\nu,\nu_p\rangle\,d\nu = \int_{C_{S,p}}\langle\nu,\nu_p\rangle^+\,d\nu, \tag{1}$$

where $\nu_p$ is the unit normal vector to the surface S at p (see [6]) and where for all a in $\mathbb{R}$, $a^+ = a$ if a ≥ 0 and $a^+ = 0$ otherwise. Here, the surface is implicitly assumed to be smooth. This ensures that all the light rays visible from a point come from above its tangent plane (the tangent plane would not be defined otherwise). So,
for all points p on S, all the light rays visible from that point are included in the hemisphere defined by the normal $\nu_p$ to the surface at that point; that is to say $C_{S,p}\subset\mathrm{Hemi}_{\nu_p}$. Therefore $\forall\nu\in C_{S,p}$, $\langle\nu,\nu_p\rangle\ge0$. Already at this point one can immediately see the difficulty introduced by self-occlusions, for the integration domain of (1) is restricted to the visibility cone $C_{S,p}$, which directly depends on the global geometry of the scene S. This is unlike traditional SFS, where the radiance only depends on local properties of the scene, for instance the direction of the normal $\nu_p$ to the surface at a given point. This requires the deployment of a different arsenal of tools than those traditionally considered in SFS.^2 Unlike most prior work, we consider full ambient illumination. In such a case, the assumptions of [11] are equivalent to assuming that the surface is convex, which is too restrictive an assumption. In the next section we relate the measurements, i.e. the image greyscale, to the unknown (the 3-D shape of the scene) via the model above.
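The radiance integral (1) can be approximated by direct quadrature over the sphere of directions. The sketch below is a hedged illustration: the visibility function of a real scene would require ray casting against the surface, so it is replaced here by a user-supplied predicate (a hypothetical interface), and the midpoint rule with 200 × 200 cells is an arbitrary choice.

```python
import math

def ambient_radiance(normal, visible, n_polar=200, n_azimuth=200):
    """Midpoint-rule approximation of the radiance integral (1).

    normal  : unit surface normal (3-tuple)
    visible : function nu -> True/False standing in for chi_S(p; .)
    """
    d_polar = math.pi / n_polar
    d_azimuth = 2.0 * math.pi / n_azimuth
    total = 0.0
    for i in range(n_polar):
        polar = (i + 0.5) * d_polar
        for j in range(n_azimuth):
            azimuth = (j + 0.5) * d_azimuth
            nu = (math.sin(polar) * math.cos(azimuth),
                  math.sin(polar) * math.sin(azimuth),
                  math.cos(polar))
            cosine = sum(a * b for a, b in zip(nu, normal))
            if cosine > 0.0 and visible(nu):
                # Area element on the unit sphere: sin(polar) d_polar d_azimuth.
                total += cosine * math.sin(polar) * d_polar * d_azimuth
    return total

up = (0.0, 0.0, 1.0)
open_sky = ambient_radiance(up, lambda nu: True)
# Hypothetical occluder: a wall blocking every direction with positive x-component.
half_sky = ambient_radiance(up, lambda nu: nu[0] < 0.0)
print(open_sky, half_sky)
```

With the whole hemisphere visible the value is close to π, the maximal radiance in this model, and blocking half of the sky symmetrically about the normal halves it. Any occlusion therefore shows up as an intensity strictly below π.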
3 Shape from Ambient Shading
In this section we formalize the problem of SFAS as the solution of a global integro-partial differential equation, which we analyze in the next section.

3.1 Imaging Equation
We assume that we measure a greyscale image $I: D\subset\mathbb{R}^2\to\mathbb{R}^+$; $x\mapsto I(x)$, on a closed domain D. Our goal is to characterize the surfaces S which generate it. Note that in general there is no guarantee that the surface is unique. We now need to link the measurements (I) with the unknowns (S). To do so we use the assumptions developed in the previous section, together with the so-called Radiance equation [6], which approximates the brightness of a pixel x of the image with the radiance of the point $\pi_S^{-1}(x)$ of the surface viewed in x: $I(x) = R_S(\pi_S^{-1}(x))$. Using the results from the previous section we have

$$I(x) = \int_{C_{S,p}}\langle\nu,\nu_p\rangle^+\,d\nu, \tag{2}$$

where $\nu_p$ is the outward-pointing normal vector to the surface S at the point $p = \pi_S^{-1}(x)$. In what follows we are going to assume that the data I corresponds to an image of a scene verifying our modeling assumptions. In particular, for convenience, we rescale the range so as to have 0 ≤ I(x) ≤ π. Also for simplicity, we assume that the camera performs an orthographic projection of the scene. This is a reasonable hypothesis provided that the domain of interest in the scene is small compared to its distance to the camera. Under these conditions, we can represent the surface as the graph of a function u, and write the outward unit normal vector explicitly:

$$S = \{(x,u(x));\ x\in D\};\qquad \nu_{(x,u(x))} = \frac{1}{\sqrt{1+|\nabla u(x)|^2}}\,(-\nabla u(x),1).$$

Finally, following [13], we could assume that the camera is a pinhole. This assumption could be forgone at the cost of a more complicated notation, but the core of the analysis in this paper would hold nevertheless.

^2 In order to simplify the problem and to remove this global dependency, Lions, Rouy and Tourin [11] assume that for all the points of the surface, all the light sources located on the normal hemisphere are visible. In other words, they assume that there are no self-shadows. Also, such an assumption strongly simplifies the problem because we then have $\int_{C_{S,p}}\langle\nu,\nu_p\rangle R_L(\nu)\,d\nu = \int_{S^2}\langle\nu,\nu_p\rangle^+ R_L(\nu)\,d\nu$, where $R_L(\nu)$ is the power density distribution of the lighting. Also, this completely removes the global dependency of the radiance with respect to the whole shape.

3.2 Formulation as an Integro-Differential Equation
With the orthographic camera model, the image formation model above can be interpreted as a Partial Differential Equation (PDE) in the unknown function u:

$$I(x) = \int_{C_{u,(x,u(x))}}\Big\langle\frac{1}{\sqrt{1+|\nabla u(x)|^2}}\,(-\nabla u(x),1),\ \nu\Big\rangle^+ d\nu, \tag{3}$$

where $C_{u,p}$ denotes $C_{S,p}$ (the surface S is represented by the function u). Solving the SFAS problem then amounts to integrating the PDE (3) given an image I. Clearly the result would be meaningful only if a solution exists, and if it is unique, or at least if one can characterize the set of functions u that are indistinguishable in the sense of all solving (3) for a given measured image I. Note that this equation is a first-order stationary global integro-partial differential equation of the general form $H(x,u(x),\nabla u(x),u(\cdot)) = 0$, $\forall x\in\mathrm{Int}(D)$. The numerical and theoretical study of the solutions of this kind of equation is done via the Hamiltonian

$$H(x,t,p,u) = \int_{C_{u,(x,t)}}\Big\langle\frac{1}{\sqrt{1+|p|^2}}\,(-p,1),\ \nu\Big\rangle^+ d\nu - I(x).$$
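The local ingredient of the Hamiltonian, the positive part of the inner product between the graph normal and a light direction, is simple to write down. The sketch below covers only this integrand; the visibility cone, which carries the non-local dependence on u, is not modeled, and the function names are hypothetical.

```python
import math

def graph_normal(p):
    """Unit normal (-p, 1) / sqrt(1 + |p|^2) of the graph of u, with p = grad u(x)."""
    norm = math.sqrt(1.0 + p[0] ** 2 + p[1] ** 2)
    return (-p[0] / norm, -p[1] / norm, 1.0 / norm)

def shading_integrand(p, nu):
    """Positive part of <(-p, 1) / sqrt(1 + |p|^2), nu>, the integrand of (3)."""
    n = graph_normal(p)
    return max(0.0, sum(a * b for a, b in zip(n, nu)))

flat = graph_normal((0.0, 0.0))                            # flat surface: normal points up
tilted = shading_integrand((1.0, 0.0), (0.0, 0.0, 1.0))    # slope 1, light from the zenith
grazing = shading_integrand((1.0, 0.0), (1.0, 0.0, 0.0))   # direction below the tangent plane
print(flat, tilted, grazing)
```

Directions below the tangent plane contribute nothing, which is the role of the positive part in (3).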
4 Analysis of the Shape from Ambient Shading Equation
We now consider the problem of uniqueness of the solution of (3). While we show that the solution is, in general, not unique, we give an analytical characterization of all the different scenes that, under the given assumptions, yield the same measured image. This analysis is important both for the purpose of implementing a viable numerical integration scheme, and also to make SFAS a useful tool in Computer Vision. This is akin to what is done in Structure From Motion [5], where the 3-D structure of a scene is in general not unique, but one can easily characterize the solutions as being equivalence classes under the similarity, affine or projective groups depending on knowledge of the camera calibration.

4.1 An Intrinsic Ambiguity
First, recall that $0\le R_S(p)\le\pi$ for $p\in S$ and $C_{S,p}\subset\mathrm{Hemi}_{\nu_p}$, so one can easily show that $R_S(p) = \pi$ iff $C_{S,p} = \mathrm{Hemi}_{\nu_p}$. Now, let us consider a completely white image with maximal intensity: $I(x) = \pi$ for all $x\in D$. With such an image, the solutions of equation (2) satisfy $C_{S,p} = \mathrm{Hemi}_{\nu_p}$ for all the points p on the surface. Therefore, if we represent the surface as the graph of the function u, it is easy to see that the surface lies below the tangent plane to the surface at every point (x, u(x)). So, the solutions u of (3) are concave, and so is the surface S. Since, conversely, all concave functions generate such a white image, we can conclude that the set of solutions consists of all concave functions. In this case, the problem is clearly ill-posed because the image can be generated by a number of different surfaces, and therefore the solution cannot be unique. This problem does not arise only in this pathological case: it appears as soon as the image contains a subset of pixels having the maximal intensity, as we illustrate in Figure 1. Pixels with maximal intensity are shown in red; the green curve corresponds to a maximal solution and the blue curve to the minimal one. Any curve between these two, which is concave on the set of points with maximal intensity, generates the same image as the one generated by the black curve. In the following sections, we will show that this condition is minimal, in the sense that the solution is unique if and only if there are no subsets of pixels having the maximal intensity. Also, when there are multiple solutions, they are characterized in terms of their values on these subsets.

Fig. 1. (Plot of u(x) against x.) Example of multiple solutions in dimension 1 when the image contains a subset of pixels having the maximal intensity. Any curve between the blue and the green curves, and which is concave on the set of points with maximal intensity, generates the same image as the one generated by the initial black curve.

4.2 Uniqueness Result and Characterization of the Solutions
In this section we show that the solutions of the SFAS problem are characterized by their values on the subset {x | I(x) = π} ⊂ D. To this end, let us define Ω = {x | I(x) < π} and let us complete the equation
$$H(x, u(x), \nabla u(x), u) = 0, \quad \forall x \in D \tag{4}$$
by some Dirichlet boundary conditions on CΩ = D − Ω = {x ∈ D | I(x) = π}. In other words, we assume that we know the height of the solution on this subset. The equation then becomes
$$\begin{cases} H(x, u(x), \nabla u(x), u) = 0, & \forall x \in \Omega, \\ u(x) = \varphi(x), & \forall x \in C\Omega. \end{cases} \tag{5}$$
For mathematical convenience, we also assume that the brightness image I is continuous (then Ω is an open subset of D) and that the intensity is maximal on the boundary of the image (in other words, we assume that Ω̄ ⊂ Int D). We can now state the uniqueness theorem:
Theorem 1. If u and v are two C¹ solutions to equation (5), then u = v on D.
This theorem ensures that there exists at most one C¹ solution to equation (5). It also provides a characterization of the set of solutions of equation (4): each solution is characterized by its values on the subset CΩ (the region where I(x) = π). If the image never saturates (CΩ is empty), then the solution is unique when complemented by a Dirichlet boundary condition. Equivalently, all solutions are parameterized by their boundary conditions. Because of space constraints, we cannot report the complete proof of Theorem 1 here, and we refer the reader to our technical report [15] for details. The relevance of this result from the standpoint of Computer Vision is that if we know the depth of the scene on the subset where the image is saturated, then there exists a unique solution to the Shape From Ambient Shading problem. This means that, elsewhere on the image, ambient shading is sufficient to recover the original surface which generated the image. In the next section we develop an approximation scheme for numerically integrating (5).
5
Approximation Scheme and Numerical Algorithm
In section 3 we have formalized the SFAS problem as the solution of a partial differential equation of the form H(x, u(x), ∇u(x), u) = 0. We have then added Dirichlet boundary conditions on CΩ = D − Ω to arrive at a unique solution when the image is not saturated. In order to compute a reliable numerical solution to this equation, we use machinery available for Hamilton-Jacobi equations. The key point is then to design approximation schemes that are monotone [2, 1].
5.1
A Monotonic Scheme
Following [1], we consider schemes of the form S(h, x, u^ρ(x), u^ρ) = 0, where
$$S : \mathbb{R}^+ \times \bar\Omega \times \mathbb{R} \times B(\bar\Omega) \to \mathbb{R} : (h, x, t, u) \mapsto S(h, x, t, u);$$
h ∈ R+ defines the size of the grid that is used in the corresponding numerical algorithms (a 2D Cartesian grid); B(Ω̄) is the space of bounded functions defined on the set Ω̄; u^ρ is the unknown (u^ρ is a function), and we are interested in the solution u^ρ of the scheme S. We say that the scheme S is monotone if, for all h ∈ R+, x ∈ Ω̄ and t ∈ R, the function S(h, x, t, ·) : B(Ω̄) → R is monotone. That is, if u(y) ≤ v(y) for all y ∈ Ω̄, then S(h, x, t, u) ≥ S(h, x, t, v). An iterative algorithm for computing a numerical approximation of the solution directly follows. Given u^n (the approximation of u^ρ at step n) and a point x of Ω̄, the associated algorithm consists in solving the equation
$$S(h, x, t, u^n) = 0 \tag{6}$$
with respect to t. A solution of (6) is the updated value of u^n at x. Here, we use the definition of monotonicity given by Barles and Souganidis in [1]:
Definition 1 (monotonicity). The scheme S(h, x, u^ρ(x), u^ρ) = 0 defined in Ω̄ is monotone if ∀h ∈ R+, ∀x ∈ Ω̄, ∀t ∈ R and ∀u, v ∈ B(Ω̄),
$$u \le v \implies S(h, x, t, u) \ge S(h, x, t, v)$$
(the scheme is non-increasing with respect to u).
The interest of monotonicity is twofold. (i) With other basic assumptions (monotonicity with respect to t, existence of a subsolution, bound for the subsolutions), this property is the key to ensuring that the scheme is stable (existence of the solution and of an upper bound) and that the computed approximations converge towards the solution of the scheme; see [13]. (ii) Combined with some stability and consistency properties, monotonicity ensures that the solutions of the scheme converge towards the continuous solution of the considered PDE when the grid size vanishes; see [1]. In what follows, we design a monotonic approximation scheme for the SFAS problem in order to take advantage of all these benefits.
5.2
Monotonic Scheme for the SFAS Problem
For readability, we denote H_{u,t}(x, p) = H(x, t, p, u). Let us recall that the Hamiltonian of interest in SFAS is
$$H_{u,t}(x, p) = \int_{C_{u,(x,t)}} \Big\langle \frac{1}{\sqrt{1+|p|^2}}\,(-p, 1),\ \nu \Big\rangle_{+} \, d\nu \; - \; I(x).$$
One can easily verify that C_{u,(x,t)} is decreasing (in the sense of inclusion) with respect to u and increasing with respect to t. It follows that H_{u,t} has exactly the same monotonicity properties. On the other hand, in order to get a consistent approximation scheme, we have to replace ∇u (represented by the variable p in the above Hamiltonian) in the PDE by one of its numerical approximations (finite differences). The difficulty is then to find such a discretization while maintaining monotonicity. In order to get a monotonic scheme, we take inspiration from the Lax-Friedrichs scheme for conservation laws [3, 2]. We choose:
$$S(h, x, t, u) = H_{u,t}(x, Du(x)) - \theta\, L_u^t(x), \tag{7}$$
where Du(x) is the vector obtained by a centered discretization of ∇u(x); more precisely, the ith component of Du(x) is
$$[Du(x)]_i = \frac{u(x + h\vec{e}_i) - u(x - h\vec{e}_i)}{2h},$$
and where L_u^t(x) is the classical discretization of the Laplacian Δu(x) (in which one replaces u(x) by t), i.e.
$$L_u^t(x) = \sum_{i=1..N} \frac{u(x + h\vec{e}_i) + u(x - h\vec{e}_i) - 2t}{h^2}.$$
This scheme, however, is still not necessarily monotonic. To satisfy this property, we need to find an adequate value for θ. By differential calculus, one can verify that max_{i=1..N} h |∂_{p_i} H_{u,t}(x, Dz)| ≤ 2θ is a sufficient condition to ensure this property; see [15] for a detailed proof. By the same tools, one can also easily prove that |∂_{p_i} H_{u,t}(x, p)| ≤ 2√2 π. The scheme (7) is then monotonic as soon as θ ≥ √2 π h. Also, to limit the smoothing due to the Laplacian term introduced in the scheme (a term which can be interpreted as a regularization), θ must be as small as possible. On the other hand, under the assumptions of section 4.2, one can verify that any deep enough function is a subsolution of the scheme (7) (because the visibility cone becomes arbitrarily small). Moreover, the subsolutions are necessarily bounded by the function corresponding to the convex hull defined by the Dirichlet boundary constraints. Since the scheme is also increasing with respect to t and verifies lim_{t→+∞} S(h, x, t, u) ≥ 0, theorems 3.1 and 3.5 of [13] ensure that the scheme (7) is stable and that the iterative approximations converge towards the solution of the scheme. In practice, we can start from any subsolution and we just have to update the surface with scheme (7) until convergence. Finally, since our scheme is also consistent with the SFAS I-PDE, relying on the Barles and Souganidis theorem [1], we can conjecture that the computed approximations converge towards the continuous solution of the I-PDE. This supports the reliability of our numerical approximations with respect to the theoretical solution of our problem.
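To make the structure of scheme (7) concrete, the following minimal sketch evaluates the centered difference, the modified Laplacian and the scheme at a single node of a 2D grid, and solves the scheme for t. It is an illustration under stated assumptions, not the implementation used in the paper: the SFAS Hamiltonian needs the non-local visibility cone, so a simple local stand-in, |p| − I, is used here instead. For this stand-in |∂_{p_i} H| ≤ 1, so the sufficient condition max_i h |∂_{p_i} H| ≤ 2θ reduces to θ ≥ h/2, and the root in t is available in closed form because the stand-in does not depend on t. All function names are hypothetical.

```python
import numpy as np


def central_gradient(u, i, j, h):
    """Centered differences [Du(x)]_k = (u(x + h e_k) - u(x - h e_k)) / (2h)."""
    return np.array([
        (u[i + 1, j] - u[i - 1, j]) / (2.0 * h),
        (u[i, j + 1] - u[i, j - 1]) / (2.0 * h),
    ])


def neighbour_sum(u, i, j):
    """Sum of the four axis neighbours of node (i, j)."""
    return u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1]


def laplacian_with_t(u, i, j, t, h):
    """Discrete Laplacian in which the centre value u(x) is replaced by t."""
    return (neighbour_sum(u, i, j) - 4.0 * t) / h**2


def toy_hamiltonian(p, intensity):
    """ASSUMPTION: local stand-in |p| - I, not the SFAS Hamiltonian."""
    return np.linalg.norm(p) - intensity


def scheme(u, i, j, t, h, theta, intensity):
    """S(h, x, t, u) = H(x, Du(x)) - theta * L_u^t(x), the form of (7)."""
    H = toy_hamiltonian(central_gradient(u, i, j, h), intensity)
    return H - theta * laplacian_with_t(u, i, j, t, h)


def solve_for_t(u, i, j, h, theta, intensity):
    """Root of t -> S(h, x, t, u); closed form since the stand-in ignores t."""
    H = toy_hamiltonian(central_gradient(u, i, j, h), intensity)
    return 0.25 * (neighbour_sum(u, i, j) - (h**2 / theta) * H)
```

With θ ≥ h/2, raising the value of any neighbour of the node cannot increase S for this stand-in, which is the non-increasing behaviour required by Definition 1.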
6
Numerical Experiments
We focus here on the numerical results obtained by the algorithm associated with the scheme (7). As described in section 5.1, the approximation scheme suggests an iterative numerical algorithm, whose updating step (at point x) consists in solving the equation S(h, x, t, u) = 0 (an equation in t), where u is the approximation of the whole solution at the previous step. Here, to solve the equation H_{u,t}(x, Du(x)) − θ L_u^t(x) = 0, we rewrite it as a fixed point equation t = g(t), where
$$g(t) = \frac{1}{4}\Big(\sum_{i=1,2}\big(u(x + h\vec{e}_i) + u(x - h\vec{e}_i)\big) - \frac{h^2}{\theta}\, H_{u,t}(x, Du(x))\Big)$$
and then process the iterations t_{n+1} = g(t_n). In practice this process systematically converges after fewer than 5 iterations (we set t_0 to the previous value of u(x)). The numerical algorithm starts with a subsolution in the form of a very steep valley such that visibility is close to 0 for all points in the domain of the image. We refer the reader to [15] for further implementation details. To test our algorithm, we consider some scenarios for which the problem is well-posed. In other words, we limit the computation domain to a subset of Ω = {x | I(x) < π}. This computation domain is delimited by the red box in the corresponding figures. On the other part of the image domain, we enforce Dirichlet boundary conditions. In our tests, we use the sin(x) sin(y) surface. For the first test, we restrict the computation domain to a subset on which the surface is convex. As shown in
Fig. 2. Left: image generated by the sin x ∗ sin y surface with h = 0.05 and region of interest where we run the algorithm; middle: original surface (groundtruth) on the region of interest; right: surface reconstructed by our algorithm (result)
Fig. 3. Left: image generated by the sin x ∗ sin y surface with h = 0.05 inside a cubical box and region of interest where we run the algorithm; middle: original surface (groundtruth) on the region of interest; right: surface reconstructed by our algorithm (result)
Fig. 4. sin x ∗ sin y image with regularization and region of interest where we run the numerical scheme. Results of the numerical scheme with (right) and without (left) regularization in the input image.
Fig. 5. Reconstruction with different grid sizes h
Table 1. Errors for the first two tests
                              min value    max value    L1 errors    L2 errors    L∞ errors
sin x sin y, Fig. 2           -0.999707    0.066750     0.006191     0.009792     0.033867
sin x sin y in box, Fig. 3    -0.999707    0.999568     0.188896     0.240712     0.372564

Table 2. Errors by adding the regularization term in the input image
                              Min Value    Max Value    L1 Error     L2 Error     L∞ Error
without regularization        -0.999707    0.999568     0.186037     0.189434     0.207331
with regularization           -0.999707    0.999568     0.065627     0.067900     0.078941

Table 3. Errors with respect to h
grid size (h)    L1 error     L2 error     L∞ error
h = 0.2          0.504147     0.526644     0.658852
h = 0.1          0.358676     0.371685     0.424875
h = 0.08         0.270054     0.276862     0.308127
h = 0.05         0.186037     0.189434     0.207331
h = 0.04         0.151427     0.153691     0.166671
Figure 2, the computed iterative solution converges accurately towards the original surface. In the second test, we extend the computation domain to both concave and convex areas. To remove the ambiguity due to points with maximal intensity, we reduce the intensity of the image by placing the sin(x) sin(y) surface in a box, i.e. surrounded by four walls of a cube with the roof open. In this test, the algorithm converges towards the solution in both concave and convex regions. Nevertheless, as shown in Figure 3, while the reconstruction is very accurate in the convex region, there is a significant error in the concave region. Table 1 shows the minimum and maximum values of the original surfaces in the regions of interest (where the algorithm is applied). It also shows the L1, L2 and L∞ errors. The top row shows the errors for the first test (sin(x) sin(y) surface) illustrated in Figure 2. The second row shows the errors for the sin(x) sin(y) surface inside a box; it corresponds to the result of Figure 3. In our experiments, we have used the L1 error to test for convergence.
In the second test, one can understand the error on the concave region as a result of the introduction of the regularization term (which was needed to make the scheme monotonic). To further analyze this effect, we focus on the concave part and we perform the following two experiments.
1) We run our algorithm with an input image containing the regularization term. More precisely, we use
$$\tilde{I}(x) = \int_{C_{u,(x,u(x))}} \Big\langle \frac{1}{\sqrt{1+|Du(x)|^2}}\,(-Du(x), 1),\ \nu \Big\rangle_{+} \, d\nu \; - \; \theta\, Lu(x)$$
as input to our algorithm. So, in practice, the algorithm computes the solution of the equation
$$\int_{C_{u,(x,u(x))}} \Big\langle \frac{1}{\sqrt{1+|Du(x)|^2}}\,(-Du(x), 1),\ \nu \Big\rangle_{+} \, d\nu \; - \; \tilde{I}(x) \; - \; \theta\, Lu(x) = 0$$
and the computed solution should then better coincide with the original surface. We then run this third test with the sin x sin y surface inside the box (with a computation domain reduced to the concave part). As shown in Table 2 and Figure 4, the algorithm is now able to recover the surface accurately.
2) Finally, since the regularization parameter θ depends linearly on the grid size h, the regularization effect should decrease when the grid size vanishes. We therefore redo the second test (sin x sin y surface inside a box, with the original image I, with the same reduced computation domain as previously) with smaller and smaller grid sizes: h = 0.2, 0.1, 0.08, 0.05, 0.04. As we can see in Figure 5 and Table 3, the computed approximations actually converge towards the original surface when the grid size is reduced. In addition to confirming the above assertion, this also validates our methodology and our theory, which ensures a well-posed algorithm whose output converges towards the continuous solution when the grid size vanishes.
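The trend in Table 3 can be summarized by an observed convergence order between consecutive grid sizes, log(e_a / e_b) / log(h_a / h_b). The short sketch below applies this standard estimate to the L1 errors reported in Table 3. The variable and function names are ours, and the estimate is only indicative, since it rests on five data points.

```python
import math

# Grid sizes and L1 errors copied from Table 3.
grid_sizes = [0.2, 0.1, 0.08, 0.05, 0.04]
l1_errors = [0.504147, 0.358676, 0.270054, 0.186037, 0.151427]


def observed_orders(h, err):
    """Observed order log(e_a / e_b) / log(h_a / h_b) for consecutive pairs."""
    return [
        math.log(err[k] / err[k + 1]) / math.log(h[k] / h[k + 1])
        for k in range(len(h) - 1)
    ]


orders = observed_orders(grid_sizes, l1_errors)
for (ha, hb), order in zip(zip(grid_sizes, grid_sizes[1:]), orders):
    print(f"h: {ha} -> {hb}, observed L1 order: {order:.2f}")
```

All estimates are positive, which reflects the monotone decrease of the error with h; they vary noticeably from one pair of grid sizes to the next, so they should not be read as a sharp convergence rate.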
7
Conclusion and Future Work
In 3-D reconstruction approaches to Computer Vision, illumination is rarely modeled explicitly. With few notable exceptions, most work in Structure From Motion assumes that illumination is constant and therefore it ascribes all photometric effects to the radiance of the scene, regardless of how it comes to be. In Shape From Shading, where the illumination is key, most existing work models it as an ideal point light source. In this paper we focus on the opposite abstraction, where the illumination is diffuse, and indeed it is constant. Outdoor scenes on a cloudy day, or indoor scenes in modern offices are reasonably well approximated by these conditions. Clearly one would like to account for arbitrary unknown radiant distributions, and possibly also illumination, but this would render the analysis prohibitive. Already under the restrictive assumptions we have chosen to operate under, the problem of recovering the 3-D shape of the scene translates to a global integro-differential equation that, to the best of our knowledge, has never been analyzed. Although algorithms have been explored in the past to exploit diffuse shading for recovering properties of the scene, a thorough theoretical study of the mathematical properties of this problem has been lacking. We believe we are the first to study the uniqueness of SFAS, to show that – in general – it is not unique, and to characterize the set of scenes that are indistinguishable, in the sense of satisfying the assumptions of SFAS and generating the same image. While we believe that the main contribution of this paper is analytical, we do validate our results empirically in simulation. To that end, we propose a monotonic scheme for numerically integrating the SFAS equation, and show experimental results that highlight the features, and challenges, of this method.
Acknowledgement ANR-06-MDCA-007 and ONR N00014-08-1-0414.
References
1. Barles, G., Souganidis, P.E.: Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic Analysis 4, 271–283 (1991)
2. Crandall, M.G., Lions, P.L.: Two approximations of solutions of Hamilton-Jacobi equations. Mathematics of Computation 43(167), 1–19 (1984)
3. Crandall, M.G., Majda, A.: Monotone difference approximations for scalar conservation laws. Mathematics of Computation 34(149), 1–21 (1980)
4. Durou, J.-D., Falcone, M., Sagona, M.: Numerical methods for shape-from-shading: A new survey with benchmarks. CVIU 109(1), 22–43 (2008)
5. Faugeras, O.: Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge (1993)
6. Horn, B.K.: Robot Vision. MIT Press, Cambridge (1986)
7. Horn, B.K., Brooks, M.J. (eds.): Shape from Shading. MIT Press, Cambridge (1989)
8. Koenderink, J.J., Pont, S.C., van Doorn, A.J., Kappers, A.M.L., Todd, J.T.: The visual light field. Perception 36, 1595–1610 (2007)
9. Langer, M.S., Bulthoff, H.H.: Depth discrimination from shading under diffuse lighting. Perception 29(6), 649–660 (2000)
10. Langer, M.S., Zucker, S.W.: Shape from shading on a cloudy day. Journal of Optical Society of America 11, 467–478 (1994)
11. Lions, P.-L., Rouy, E., Tourin, A.: Shape-from-shading, viscosity solutions and edges. Numer. Math. 64, 323–353 (1993)
12. Nayar, S., Ikeuchi, K., Kanade, T.: Shape from interreflections. IJCV 6(3), 173–195 (1991)
13. Prados, E.: Application of the theory of the viscosity solutions to the Shape From Shading problem. PhD thesis, Univ. of Nice-Sophia Antipolis (2004)
14. Prados, E., Faugeras, O.: Shape from shading: a well-posed problem? In: Proceedings of CVPR 2005, vol. II, pp. 870–877. IEEE, Los Alamitos (2005)
15. Prados, E., Jindal, N., Soatto, S.: A non-local approach to shape from ambient shading. Technical report, INRIA (2009)
16. Stewart, A.J., Langer, M.S.: Towards accurate recovery of shape from shading under diffuse lighting. IEEE Trans. on PAMI 19(9), 1020–1025 (1997)
17. Tian, Y.L., Tsui, H.T., Yeung, S.Y., Ma, S.: Shape from shading for multiple light sources. Journal of the Optical Society of America 16(1), 36–52 (1999)
18. Wada, T., Ukida, H., Matsuyama, T.: Shape from shading with interreflections under proximal light source-3D shape reconstruction of unfolded book surface from a scanner image. In: ICCV (1995)
19. Yang, J., Zhang, D., Ohnishi, N., Sugie, N.: Determining a polyhedral shape using interreflections. In: CVPR 1997, p. 110 (1997)
20. Zhang, R., Tsai, P.-S., Cryer, J.-E., Shah, M.: Shape from Shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690–706 (1999)
An Elasticity Approach to Principal Modes of Shape Variation
Martin Rumpf and Benedikt Wirth
Bonn University, 53113 Bonn, Germany
{martin.rumpf,benedikt.wirth}@ins.uni-bonn.de
http://www.ins.uni-bonn.de
Abstract. Concepts from elasticity are applied to analyze modes of variation on shapes in two and three dimensions. This approach represents a physically motivated alternative to shape statistics on a Riemannian shape space, and it robustly treats strong nonlinear geometric variations of the input shapes. To compute a shape average, all input shapes are elastically deformed into the same configuration. That configuration which minimizes the total elastic deformation energy is defined as the average shape. Each of the deformations from one of the shapes onto the shape average induces a boundary stress. Small amplitude stimulation of these stresses leads to displacements which reflect the impact of every single input shape on the average. To extract the dominant modes of variation, a PCA is performed on this set of displacements. To make the approach computationally tractable, a relaxed formulation is proposed, and sharp contours are approximated via phase fields. For the spatial discretization of the resulting model, piecewise multilinear finite elements are applied. Applications in 2D and in 3D demonstrate the qualitative properties of the presented approach.
1
Introduction
This paper is concerned with the notion of shape averages and principal modes of shape variation based on concepts from continuum mechanics, namely nonlinear and linearized elasticity. As shapes we consider object contours, encoded as edge sets in images. Compared to a classical principal component analysis in a vector space, where an average and a covariance tensor can be computed directly on the linear space itself, in the case of shapes we are dealing with highly nonlinear geometric variations. Hence, for the zero moment analysis, i.e. the definition of a suitable shape average, the total elastic energy stored in a set of deformations from the input shapes onto a single image shape is minimized. At the energy minimum the corresponding image shape is defined as the shape average. Concerning a first moment analysis, we propose a physically sound linearization of shape variations which allows us to define a covariance tensor. Each deformation from an input onto the average shape induces stresses on the shape average, which can be regarded as the imprint of the input shape. Modulating these stresses leads to displacements on the shape average, where the mapping from stresses to displacements is linear and well-defined. Each of these displacements can be regarded as a linearization of the usually nonlinear elastic deformation from one of the image shapes onto the shape average. Thus, a covariance tensor can be computed based on these displacements of the shape average. It linearly encodes the modes of variation of the shape average induced by the set of input shapes, even though the underlying deformations are usually large and nonlinear. Finally, we perform a principal component analysis based on this covariance tensor, which allows us to identify the dominant modes of variation of the input shapes. Our model is related to the physical interpretation of the arithmetic mean and the covariance tensor for n points x_1, ..., x_n in IR^d. Indeed, the arithmetic mean x ∈ IR^d minimizes ∑_{i=1,...,n} α d(x, x_i)², where d(x, x_i) is the distance between x and x_i. Due to Hooke's law, the stored elastic energy α d(x, x_i)² in the spring connecting x_i and x is proportional to the squared distance. Hence, the arithmetic mean minimizes the total elastic energy of the system of connected springs. Likewise, the covariance tensor (x_i − x, x_j − x) can, up to the spring constant, be identified with the covariance tensor (σ_i, σ_j) of the forces σ_i pulling at the mean x. At first, shape analysis was mainly based on correspondences between landmark positions on different shapes as in the influential work by Cootes et al. [1]. Principal component analysis (PCA) is a classical, by definition linear statistical tool. Chalmond and Girard [2] have proposed a PCA which also incorporates truly nonlinear geometric transformations. A survey on the potential of shape analysis in brain imaging is given by Faugeras and coworkers in [3]. Another important application concerns ready-made clothing, where it would be favorable to know the shape of the average human body and its principal modes of variation to design clothes which sufficiently fit as many people as possible.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 709–720, 2009. © Springer-Verlag Berlin Heidelberg 2009
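The spring analogy for the arithmetic mean and the covariance tensor can be checked numerically. The sketch below is a small illustration with made-up points and a hypothetical spring constant `ALPHA`; it interprets the tensors (x_i − x, x_j − x) and (σ_i, σ_j) as matrices of pairwise inner products, which is our reading of the notation.

```python
import numpy as np

ALPHA = 0.5  # hypothetical spring constant


def spring_energy(x, points, alpha=ALPHA):
    """Total energy sum_i alpha * d(x, x_i)^2 of springs joining x to the points."""
    return alpha * np.sum((points - x) ** 2)


def spring_forces(x, points, alpha=ALPHA):
    """Force of each spring on x: minus the gradient of alpha * d(x, x_i)^2."""
    return 2.0 * alpha * (points - x)


# Made-up sample points in the plane.
points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [5.0, 1.0]])
mean = points.mean(axis=0)

# At the mean the spring forces balance.
forces = spring_forces(mean, points)
net_force = forces.sum(axis=0)

# Pairwise inner products of centred positions and of forces.
gram_positions = (points - mean) @ (points - mean).T
gram_forces = forces @ forces.T
```

Up to the factor (2 · ALPHA)², the two matrices of inner products coincide, which is the identification "up to the spring constant" mentioned above.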
Conceptually, correlations of shapes have been studied on the basis of a general framework of a space of shapes and its intrinsic structure. The notion of shape space was introduced by Kendall [4] already in 1984. Charpiat et al. [5] discuss shape averaging and shape statistics based on the Hausdorff distance of sets. Statistics on signed distance functions was also studied by Leventon et al. [6], whereas Dambreville et al. [7] used shape statistics based on characteristic functions to define a robust shape prior in image segmentation. Kernel density estimation in feature space was introduced by Cremers et al. [8] to incorporate the probability of 2D silhouettes of 3D objects in image segmentation. An overview on related kernel density methods is given by Rathi et al. [9]. Mémoli and Sapiro [10] have investigated the Gromov–Hausdorff distance as a global measure for the lack of isometry in shape analysis. In contrast to such a global measure for the defect from an isometry, the nonlinear elastic energy functional involved in our approach measures this defect locally, and locally isometric deformations indeed minimize the corresponding local functional. Understanding shape space as an infinite-dimensional Riemannian manifold has been studied extensively by Miller et al. [11, 12]. Fuchs et al. [13] proposed a viscoelastic notion of the distance between shapes S given as boundaries of physical objects O. The elasticity paradigm for shape analysis on which our
approach is founded differs significantly from these metric approaches to shape space (cf. Sect. 4 for a detailed discussion of the conceptual difference). In this paper, shapes are represented implicitly via a diffused phase field description. This in particular enables a robust and flexible application in two and three dimensions.
2
Zero Moment Analysis
In this section we briefly recall an elastic approach to shape averaging already presented in [14]. We consider shapes S_i as the boundaries ∂O_i of sufficiently regular objects O_i. Given n shapes, S_1, ..., S_n, we seek an average shape S that reflects the geometric characteristics of the given shapes in a physical manner. For that purpose we assume that the average shape S can be described as a deformed configuration of the input shapes, i.e. there are deformations φ_i : O_i → IR^d, i = 1, ..., n, with S = φ_i(S_i) (see Fig. 1). A natural choice for the shape average S is that particular shape which minimizes the total accumulated deformation energy of all deformations,
$$E[S, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n} \sum_{i=1}^{n} \mathcal{W}[O_i, \phi_i],$$
where W[O_i, φ_i] represents the stored deformation energy of the deformation φ_i. To ensure existence of a minimizing shape S, we add a regularizing prior L[S] to the energy. Here, we consider the H^{d−1}-measure of S, i.e. L[S] = ∫_S da, and the shape average S is defined as a minimizer of the energy E[S, (φ_i)_{i=1,...,n}] + μ L[S]. As deformation energy W[O_i, φ_i] we employ a nonlinear, hyperelastic energy
$$\mathcal{W}[O, \phi] = \int_O W(D\phi)\, dx,$$
whose integrand can be rewritten as a function of only the three invariants,
$$W(D\phi) = \hat{W}(D\phi, \operatorname{cof} D\phi, \det D\phi) = \bar{W}(I_1, I_2, I_3), \qquad (I_1, I_2, I_3) := \big(|D\phi|_2^2,\ |\operatorname{cof} D\phi|_2^2,\ \det D\phi\big).$$
Here |Dφ|_2 := √(tr(Dφ^T Dφ)), |cof(Dφ)|_2, and det(Dφ) describe the averaged local change of length, area, and volume, respectively. We consider polyconvex energy functionals [15], where Ŵ is convex and isometries, i.e. deformations with Dφ^T Dφ = 𝟙, are local minimizers (cf. Fig. 2). Typical energy densities are of the form
$$\bar{W}(I_1, I_2, I_3) = \alpha_1 I_1^{p/2} + \alpha_2 I_2^{q/2} + \alpha_3 I_3^{-s} + \alpha_4 I_3^{r}$$
with α_1, ..., α_4 > 0, where the penalization of volume shrinkage, i.e. W̄ → ∞ as I_3 → 0, enables us to control local injectivity (cf. [16]).
Fig. 1. Sketch of elastic shape averaging. The input shapes Si (i = 1, . . . , 4) are mapped onto a shape S via elastic deformations φi . The shape S which minimizes the elastic deformation energy is denoted the shape average.
Fig. 2. For two input shapes from Fig. 1 the deformation (via a deformed checkerboard), the averaged local change of length (1/√2) |Dφ_i|_2, and the local change of area det(Dφ_i) are depicted (colors encode range [0.95, 1.05])
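The invariants entering the hyperelastic energy density can be evaluated directly from a deformation gradient. The sketch below is a minimal illustration under stated assumptions: the function names, the coefficients α_1, ..., α_4 and the exponents p, q, r, s are hypothetical example values (not those used in the paper), the cofactor matrix is computed as det(A) A^{-T}, and det A > 0 is assumed.

```python
import numpy as np


def cofactor(A):
    """Cofactor matrix cof A = det(A) * A^{-T} (valid for invertible A)."""
    return np.linalg.det(A) * np.linalg.inv(A).T


def invariants(A):
    """(I1, I2, I3) = (|A|_2^2, |cof A|_2^2, det A) with the Frobenius norm."""
    C = cofactor(A)
    return np.sum(A * A), np.sum(C * C), np.linalg.det(A)


def energy_density(A, alpha=(1.0, 1.0, 1.0, 1.0), p=2.0, q=2.0, s=1.0, r=1.0):
    """Example density a1*I1^(p/2) + a2*I2^(q/2) + a3*I3^(-s) + a4*I3^r.

    The parameter values are illustrative assumptions; det A > 0 is required.
    """
    a1, a2, a3, a4 = alpha
    I1, I2, I3 = invariants(A)
    return a1 * I1 ** (p / 2) + a2 * I2 ** (q / 2) + a3 * I3 ** (-s) + a4 * I3 ** r


# A rotation about the z-axis (an isometry) and a uniform scaling by 2.
angle = 0.3
rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
scaling = 2.0 * np.eye(3)
```

For the rotation the invariants equal those of the identity in three dimensions, while the uniform scaling multiplies squared lengths by 4, squared areas by 16 and volume by 8, matching the interpretation of the three invariants as measures of length, area and volume change.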
This type of energy has two major advantages: it allows us to incorporate large deformations with strong material and geometric nonlinearities, and its form follows from first principles and allows us to distinguish the physical effects of length, area, and volume distortion, which reflect the local distance from an isometry. The first Piola–Kirchhoff stress tensor, which describes force per unit area in the reference configuration O, is then recovered as σ^ref[φ] = W_{,A}(Dφ) := ∂W(A)/∂A evaluated at A = Dφ. The Cauchy (real) stress, describing the force per unit area in the deformed configuration φ(O), reads σ[φ] = σ^ref[φ] (cof Dφ)^{−1}. To simplify the numerical treatment and to allow for slight topological differences between the shapes S_i we relax the constraint φ_i(S_i) = S, i = 1, ..., n, and introduce a penalty functional
$$F[S_i, \phi_i, S] = H^{d-1}\big( (S_i \setminus \phi_i^{-1}(S)) \cup (\phi_i^{-1}(S) \setminus S_i) \big),$$
which measures the symmetric difference of the input shapes S_i and the pull back φ_i^{−1}(S) of S. Our shape averaging model is thus based on the energy
$$E^{\gamma}[S, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n} \sum_{i=1}^{n} \Big( \int_{O_i} W(D\phi_i)\, dx + \gamma F[S_i, \phi_i, S] \Big) + \mu L[S].$$
3
As outlined in the introduction, our first moment analysis on shapes is based on an analysis of stresses induced on the shape average by each individual input shape. Modulation of each of these stresses results in a certain displacement, and the proposed principal component analysis on shapes will be performed on these displacements. To comprehensively derive this model we proceed in several steps:
Encoding nonlinear deformations via stresses on a linear vector space. Let us at first review the underlying physical concept of stress. By the Cauchy stress principle, each deformation φi : Oi → O is characterized by pointwise boundary stresses on S in the deformed configuration, which try to restore the undeformed configuration Oi . The stress at some point x on S is given by the application of the Cauchy stress tensor σi = σ[φi ] to the outer normal ν on S. The resulting stress σi ν is a force density acting on a local surface element of S. Let us assume that the above relation between the energetically favorable deformation and its induced stresses is one-to-one. Hence, the average shape can be described in terms of the input shape Si and the boundary stress σi ν, and
we write S = S_i[σ_i ν]. If we now scale the stress with a weight t ∈ [0, 1], we obtain a one-parameter family of shapes S(t) = S_i[t σ_i ν] connecting S_i = S(0) with S = S(1). Thus, we can regard σ_i ν as a representative of shape S_i in the linear space of vector fields on S.
Modeling the impact of an input shape on the average shape. Let us now study how the average shape S varies if we increase the impact of a particular input shape S_k for some k ∈ {1, ..., n}. In fact, we intend to associate to every surface load σ_k ν a displacement on the averaged object domain O via the solution operator of a suitable linearized elasticity problem. Here, the object O actually is a deformed configuration of different original objects O_i. Hence, we have to choose a proper elasticity tensor which reflects the compound stress configuration of the averaged domain O. A simple isotropic linearized elasticity model would not take into account the nonlinear geometric nature of our zero order analysis. To achieve this, we apply the Cauchy stress σ_k ν to the average shape S, scaled with a small constant δ. Based on our above discussion of stresses and due to the sketched equilibrium condition, this additional boundary stress δ σ_k ν acts as a first Piola–Kirchhoff stress on the (reference) configuration S. The elastic response is given by a correspondingly scaled displacement u_k : O → IR^d. To properly model the loaded configurations we concatenate this displacement with every nonlinear deformation φ_i and take into account the sum of the resulting elastic energies plus a term involving the given Cauchy stress in the following energy,
$$E_k[\delta, u] = \frac{1}{n} \sum_{i=1,\dots,n} \mathcal{W}[O_i, (\mathbb{1} + \delta u) \circ \phi_i] - \delta^2 \int_S \sigma_k \nu \cdot u \, da.$$
Now, the displacement u_k is obtained as a minimizer of this modulated energy for a fixed set of deformations (φ_i)_{i=1,...,n} under the constraints ∫_O u_k dx = 0 and ∫_O x × u_k dx = 0, which encode zero average translation and rotation. Let us remark that the boundary integral can be replaced by the volume integral ∫_O σ_k : Du dx, which is more convenient with respect to a numerical discretization. To verify this, we use integration by parts and the fact that div σ_k = 0 holds on O. As Euler–Lagrange condition for u_k we obtain div σ[δ u_k] = 0 on O and σ[δ u_k] ν = δ σ_k ν on S after a tedious but straightforward computation. Here,
$$\sigma[\delta u_k] := \frac{1}{n} \sum_{i=1,\dots,n} W_{,A}\big( (\mathbb{1} + \delta D u_k)\, D\phi_i \circ \phi_i^{-1} \big)\, \operatorname{cof} D(\phi_i^{-1})$$
is the first Piola–Kirchhoff stress tensor on the compound object O, which effectively reflects an average of all stresses in the n deformed configurations φi (Oi ) for i = 1, . . . , n. As long as A → W (A) is not quadratic in A, uk still solves a nonlinear elastic problem. The advantage of this nonlinear variational formulation is that it is of the same type as the one for the zero moment analysis, and it encodes in a natural way the compound elasticity configuration of the
Fig. 3. Sketch of the pointwise stress balance relation on the averaged shape
averaged shape domain $O$. As an obvious drawback we have to consider the sum of $n$ nonlinear elastic energies for the computation of every displacement $u_k$, $k = 1, \dots, n$. In the limit for $\delta \to 0$, we would obtain $u_k$ as the solution of the actually linear elasticity problem
\[ \operatorname{div}(C\,\epsilon[u]) = 0 \ \text{in } O, \qquad C\,\epsilon[u]\,\nu = \sigma_k \nu \ \text{on } S \]
for the symmetric displacement gradient $\epsilon[u] = (Du + Du^T)/2$ under the constraint $\int_O u \, dx = 0$. Here, the in general inhomogeneous and anisotropic elasticity tensor $C$ is defined by
\[ C = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{\det D\phi_i}\, D\phi_i \, W_{,AA}(D\phi_i) \, D\phi_i^T \right) \circ \phi_i^{-1} , \]
based on an appropriate transformation of the Hessian of the energy density $W$. This elasticity tensor takes into account the loads of the compound configuration based on the combination of all deformations $\phi_i$ on the input objects $O_i$ for $i = 1, \dots, n$. In our current implementation, we avoid the evaluation of $C$ and consider the above nonlinear approximation, which is simpler to implement but computationally more expensive.
The actual covariance analysis based on the derived displacements. Now, we have a set of displacements $u_k : O \to \mathbb{R}^d$ at hand which represent the variations of the average shape, induced by a modulation of the stresses $\sigma_k$ from the deformations $\phi_k$ of the input shapes $S_k$ into the average shape $S$. On this space of displacements, we consider the standard $L^2$-product $(u, \tilde{u})_2 := \int_O u \cdot \tilde{u} \, dx$ and define the covariance operator
\[ \mathrm{Cov} : L^2(O) \to L^2(O); \quad u \mapsto \mathrm{Cov}\,u := \frac{1}{n} \sum_{k=1}^{n} (u, u_k)_2 \, u_k . \]
Obviously, $\mathrm{Cov}$ is positive definite on $\operatorname{span}(u_1, \dots, u_n)$. Hence, we can diagonalize $\mathrm{Cov}$ on this finite dimensional space and obtain a set of $L^2$-orthogonal eigenfunctions $w_k : O \to \mathbb{R}^d$ (which are themselves displacements) and eigenvalues $\lambda_k > 0$ with $\mathrm{Cov}\,w_k = \lambda_k w_k$.
Fig. 4. The two dominant modes (right) for four different shapes (left) demonstrate that our principal component analysis properly captures strong geometric nonlinearities
These eigenfunctions can be considered as principal modes of variation of the average object $O$ and hence of the average shape $S$, given the $n$ input shapes. The eigenvalues encode the actual strength of these variations. Let us underline that this covariance analysis properly takes into account the usually strong geometric nonlinearity in shape analysis via the transfer of geometric shape variation to elastic stresses on the average shape, based on paradigms from nonlinear elasticity (cf. Fig. 4). These stresses lie in a linear vector space and thus allow for a covariance analysis, which is by definition linear. The interpretation of stresses in terms of displacements can be regarded as a proper choice of a scalar metric $g(\cdot, \cdot)$ on the space of stresses interpreted as a tangent space of the shape space at the average shape: we define $g(\sigma\nu, \tilde{\sigma}\nu) := (u, \tilde{u})_2$, given the above identification of stresses $\sigma\nu$, $\tilde{\sigma}\nu$ with induced displacements $u$, $\tilde{u}$ via the proper compound elasticity problem. Finally, this identification provides a suitable physical interpretation of stresses as modes of shape variation.
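The diagonalization of Cov on span(u_1, ..., u_n) reduces to an n × n eigenvalue problem for the Gram matrix of the displacements. The following is a minimal sketch of that reduction, not the authors' implementation: it assumes the displacements are already sampled at N quadrature nodes with known weights, and the names `principal_modes` and `apply_cov` are hypothetical.

```python
import numpy as np

def principal_modes(U, weights):
    """Diagonalize Cov u = (1/n) * sum_k (u, u_k)_2 u_k on span(u_1, ..., u_n).

    U       : (n, N) array; row k holds displacement u_k sampled at N nodes
              (vector components flattened). Assumed discretization.
    weights : (N,) quadrature weights, so (u, v)_2 ~ sum(weights * u * v).
    Returns eigenvalues in descending order and L2-orthonormal modes (rows).
    """
    n = U.shape[0]
    G = (U * weights) @ U.T              # Gram matrix G[j, k] = (u_j, u_k)_2
    lam, A = np.linalg.eigh(G / n)       # symmetric eigenproblem, ascending order
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    keep = lam > 1e-12 * lam[0]          # drop numerically zero directions
    lam, A = lam[keep], A[:, keep]
    # w_m = sum_j A[j, m] u_j, scaled so that (w_m, w_m)_2 = 1
    W = (A.T @ U) / np.sqrt(n * lam)[:, None]
    return lam, W

def apply_cov(U, weights, u):
    """Apply the covariance operator to a sampled displacement u."""
    coeffs = U @ (weights * u)           # coefficients (u, u_k)_2
    return coeffs @ U / U.shape[0]
```

If (1/n) G a = λ a, then w = Σ_j a_j u_j satisfies Cov w = λ w, which is why the small Gram problem suffices regardless of the number of nodes N.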
4 Elastic versus Riemannian Shape Analysis
The elasticity paradigm, on which our zero and first order shape analysis are based, differs significantly from a Riemannian approach to shape space as proposed for instance by Srivastava et al. [17]. Due to the axiom of elasticity, the energy at the deformed configuration $S$ is independent of the path from a shape $\tilde{S}$ to the shape $S$ along which the deformation is generated in time. Hence, there is no notion of shortest paths if we consider a purely elastic shape model. The visco-plastic model by Fuchs et al. [13] and the related model by Younes [18] define energies based on an integration of dissipation along transformation paths, where dissipation is understood as a Riemannian metric. This approach is not elastic in the classical axiomatic sense we consider here, and it particularly requires that at rest the intermediate configurations are all stress-free. The above-mentioned conceptual differences are reflected in a different behavior. If we regard shapes from a flow-oriented perspective, then a visco-elastic approach would be more appropriate. However, the elastic approach is favorable for rather rigid, more stable shapes, since it prevents locally strong isometry violation. An example is provided in Fig. 5: The input shapes are regarded as two versions of an object that may have none, one, or two pins at more or less stable positions. Both pins are apparently not interpreted as shifted versions of each other since a shifting deformation would cost too much energy. However, if the material was visco-plastic, a horizontal shift of each pin would be easier and result in an average shape with just one centered pin and its variation being a
Fig. 5. Average and variation (right) for two shapes with pins at different positions (left). The pins are not interpreted as shifted versions of each other.
sideward movement. This corresponds to a completely different perception of the input shapes. The strong local rigidity and isometry preservation of the elasticity concept becomes particularly evident in Fig. 4 and Fig. 6, where non-isometric deformations are concentrated only at joints. On a Riemannian manifold, the exponential map allows one to describe geodesics from an averaged shape $S$ (in the sense of Karcher [19]) to the input shapes $S_k$ via $S_k = \exp_S(v_k)$ for some tangent vector $v_k$ at the shape $S$ in shape space. Hence, a covariance analysis will be performed on the tangent vectors $v_1, \dots, v_n$ with respect to the Riemannian metric $g(\cdot, \cdot)$. In the strictly elastic setup, the shape space is in general not metrizable. Instead, the stresses $\sigma_k$ play the role of the $v_k$, imprinting the impact of $S_k$ on the average shape $S$ in terms of an induced displacement $u_k$.
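For comparison, the Riemannian procedure just described (Karcher mean, then covariance of the tangent vectors v_k with S_k = exp_S(v_k)) can be illustrated on a toy manifold. The sketch below uses the unit sphere in place of a shape space, which is purely an assumption made for illustration; the function names are hypothetical and nothing here is taken from [17] or [19].

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: geodesic from p with initial velocity v."""
    norm = np.linalg.norm(v)
    if norm < 1e-15:
        return p
    return np.cos(norm) * p + np.sin(norm) * v / norm

def sphere_log(p, q):
    """Tangent vector v at p with sphere_exp(p, v) = q (q must not be -p)."""
    w = q - np.dot(p, q) * p                       # project q onto tangent space at p
    norm = np.linalg.norm(w)
    if norm < 1e-15:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))  # geodesic distance
    return theta * w / norm

def karcher_mean(points, iters=100, tol=1e-12):
    """Fixed-point iteration: move along the mean of the log-mapped samples."""
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        step = np.mean([sphere_log(mean, q) for q in points], axis=0)
        mean = sphere_exp(mean, step)
        if np.linalg.norm(step) < tol:
            break
    return mean

def tangent_covariance(points):
    """Covariance of the tangent vectors v_k = log_mean(point_k)."""
    mean = karcher_mean(points)
    V = np.array([sphere_log(mean, q) for q in points])
    return mean, V.T @ V / len(points)
```

In the elastic setting of this paper no such exponential map is available, so the displacements u_k induced by the stresses take over the role of the tangent vectors.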
5 Finite Element Phase Field Approximation
Since explicit treatment of an edge set is difficult in a variational setting, we consider a phase field model picking up the approach by Ambrosio and Tortorelli [20] for the discretization of the Mumford–Shah model [21]. Hence, a shape $S$ is encoded by a smooth phase field function $v : \Omega \to \mathbb{R}$, which is close to zero on $S$ and one in between. In our approach we construct such phase field functions $v_i$ for the input shapes $S_i$ in advance. Usually, $v_i$ can be computed based on the model in [20] applied to the input images $u_i$. The specific form of the phase field function $v$ for the averaged shape $S$ is then directly determined via a phase field approximation of our variational model. Given a phase field parameter $\varepsilon$, which will determine the width of the phase field, we first define an approximate mismatch penalty
\[ F^{\varepsilon}[v_i, \phi_i, v] = \frac{1}{\varepsilon} \int_\Omega (v \circ \phi_i)^2 (1 - v_i)^2 + v_i^2 (1 - v \circ \phi_i)^2 \, dx . \]
Here, we suppose $v$ to be extended by 1 outside the computational domain $\Omega$. Next, we consider the energy
\[ L^{\varepsilon}[v] = \int_\Omega \varepsilon |\nabla v|^2 + \frac{1}{4\varepsilon} (v - 1)^2 \, dx , \]
which acts as an approximation of the prior $L[S]$. Furthermore, we simplify the later numerical implementation by assuming that the whole computational domain behaves elastically with an elasticity several orders of magnitude softer outside the object domains $O_i$ on the complement set $\Omega \setminus O_i$. Thus, given a smooth approximation $\chi^{\varepsilon}_{O_i}$ of the characteristic function of the object domain $O_i$, we define an approximate elastic energy
\[ W^{\varepsilon}[O_i, \phi_i] = \int_\Omega \big( (1 - \eta) \chi^{\varepsilon}_{O_i} + \eta \big) W(D\phi_i) \, dx , \]
where in our applications $\eta = 10^{-4}$. Finally, the resulting approximation of the total energy functional for the variational description of the average shape reads
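\[ E^{\gamma,\varepsilon}[v, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n} \sum_{i=1}^{n} \big( W^{\varepsilon}[O_i, \phi_i] + \gamma F^{\varepsilon}[v_i, \phi_i, v] \big) + \mu L^{\varepsilon}[v] . \]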
In analogy, a phase field approximation $E_k^{\gamma,\varepsilon}$ of the energy $E_k$ can be constructed. In these approximations, $F^{\varepsilon}$ acts as a penalty with $\gamma \gg 1$ and $L^{\varepsilon}$ ensures a mild regularization of the averaged shape with $\mu \ll 1$. Integration is performed only in regions where all integrands are defined. The actual spatial discretization is based on finite elements. We consider the phase fields $v$, $v_i$ and deformations $\phi_i$ as being represented by continuous, piecewise multilinear (trilinear in 3D and bilinear in 2D) finite element functions on an image domain $\Omega = [0, 1]^d$. A cascadic multiscale approach is applied for the relaxation of the energy. For details both on the phase field approximation and the numerical discretization we refer to [14].
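To make the phase field terms concrete, the following sketch evaluates the mismatch penalty, the Ambrosio–Tortorelli-type length term, and the softened elastic weight on a regular 2D pixel grid. It is a simplified illustration under stated assumptions: it uses finite differences and a plain Riemann sum instead of the multilinear finite elements of the paper, it expects the deformed phase field v ∘ φ_i to be supplied already resampled on the grid, and all function names are hypothetical.

```python
import numpy as np

def mismatch_penalty(v_deformed, v_i, eps, h):
    """(1/eps) * integral of (v∘phi_i)^2 (1 - v_i)^2 + v_i^2 (1 - v∘phi_i)^2.

    v_deformed : v∘phi_i sampled on the grid (assumed precomputed)
    v_i        : phase field of input shape i on the same grid
    eps        : phase field parameter; h : grid spacing
    """
    integrand = v_deformed**2 * (1 - v_i)**2 + v_i**2 * (1 - v_deformed)**2
    return integrand.sum() * h**2 / eps

def length_prior(v, eps, h):
    """Integral of eps |grad v|^2 + (v - 1)^2 / (4 eps), gradients by finite differences."""
    gy, gx = np.gradient(v, h)
    integrand = eps * (gx**2 + gy**2) + (v - 1)**2 / (4 * eps)
    return integrand.sum() * h**2

def elastic_weight(chi, eta=1e-4):
    """Pointwise factor (1 - eta) * chi + eta multiplying the energy density W(D phi_i)."""
    return (1 - eta) * chi + eta
```

The weight equals 1 inside the object and η outside, so the complement of the object behaves as a material several orders of magnitude softer, as described above.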
6 2D and 3D Applications
We have applied our shape analysis approach to various collections of 2D and 3D shapes. The computed average and dominant variations for sets of 2D shapes are depicted in Figs. 1 to 7 as first illustrative examples. Figure 1 shows the average of five human silhouettes. The corresponding deformations φi and local deformation invariants are displayed in Fig. 2 for two of the input shapes. Particularly the deformed checkerboard patterns show that – due to the invariance properties of the energy – isometries are locally preserved. Also, the indicators of length and area variation only peak locally at the person’s joints. The corresponding principal components are given in Fig. 6. The average shape is represented by the dark line, whereas the light red lines signify deformations of the shape along the principal components. Here, we see the bending of the arm and the leg basically decoupled as the first two dominant modes of variation. The silhouette variations of raising the arm or the leg can only be obtained as linear combinations of the first and fourth or of the second and third mode of variation, respectively. A larger set of shapes is treated in Fig. 7, where 20 binary images “device7” from the MPEG7 shape database serve as input shapes. Apparently, the first principal component is given by a thickening or thinning of the leaves, accompanied by a change of indentation depth between them. The second mode obviously corresponds to bending the leaves, and the third mode represents local changes at the tips: A sharpening and orientation of neighboring
Fig. 6. A set of input shapes (cf. Fig. 1) and their modes of variation with ratios $\lambda_i/\lambda_1$ of 1, 0.22, 0.15, and 0.06
Fig. 7. Original shapes and their first three modes of variation with ratios $\lambda_i/\lambda_1$ of 1, 0.20, and 0.05
Fig. 8. 24 given foot shapes, textured with the distance to the surface of the average foot (bottom right). The range [−6 mm, 6 mm] is color-coded.
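Fig. 9. The first six dominant modes of variation for the feet from Fig. 8, with eigenvalue ratios $\lambda_1/\lambda_1 = 1$, $\lambda_2/\lambda_1 = 0.010$, $\lambda_3/\lambda_1 = 0.010$, $\lambda_4/\lambda_1 = 0.003$, $\lambda_5/\lambda_1 = 0.001$, and $\lambda_6/\lambda_1 = 0.0008$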
tips towards each other, originating e. g. from the sixth or the second last input shape. The final example uses 24 foot-shapes as input (which were originally provided as triangulated surfaces and then converted to characteristic functions
on the unit cube). The average shape is shown along with the original shapes in Fig. 8, where the input feet are color-coded according to their local distance to the surface of the average foot. It is doubtlessly difficult to analyze the shape variation on this basis: We see modest variation at the toes and the heel as well as on the instep, but any correlation between these variations is difficult to determine. The corresponding modes of variation in Fig. 9, however, are quite intuitive. For all modes we show the average in the middle and its configurations after deformation according to the principal components. The first mode apparently represents changing foot lengths, the second and third mode belong to different variants of combined width and length variation, and the fourth to sixth mode correspond to variations in relative heel position, ankle thickness, and instep height.
7 Conclusion
We have developed an elasticity-based notion of shape variation. Since the shape space of elastically deformable objects inherently does not possess a Riemannian structure, we utilized an alternative shape space structure, in which distance is replaced by elastic deformation energy and boundary stresses play the role of linear representations of shapes. Such an approach imposes a physically and mathematically sound structure on spaces of elastic objects. Its computational feasibility has been proven by application to sets of 2D and 3D shapes.
Acknowledgments The authors thank Guillermo Sapiro for pointing them to the issue of an elastic principal component analysis. We are grateful to Heiko Schlarb from adidas, Herzogenaurach, Germany, for providing 3D scans of feet. Furthermore, we acknowledge support by the Hausdorff Center for Mathematics. Benedikt Wirth has been supported by the Bonn International Graduate School.
References
1. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
2. Chalmond, B., Girard, S.C.: Nonlinear modeling of scattered multivariate data and its application to shape change. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5), 422–432 (1999)
3. Faugeras, O., Adde, G., Charpiat, G., Chefd’Hotel, C., Clerc, M., Deneux, T., Deriche, R., Hermosillo, G., Keriven, R., Kornprobst, P., Kybic, J., Lenglet, C., LopezPerez, L., Papadopoulo, T., Pons, J.P., Segonne, F., Thirion, B., Tschumperlé, D., Viéville, T., Wotawa, N.: Variational, geometric, and statistical methods for modeling brain anatomy and function. NeuroImage 23, S46–S55 (2004)
4. Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. London Math. Soc. 16, 81–121 (1984)
5. Charpiat, G., Faugeras, O., Keriven, R.: Approximations of shape metrics and application to shape warping and empirical shape statistics. Foundations of Computational Mathematics 5(1), 1–58 (2005)
6. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: 5th IEEE EMBS International Summer School on Biomedical Imaging, 2002 (2002)
7. Dambreville, S., Rathi, Y., Tannenbaum, A.: A shape-based approach to robust image segmentation. In: Campilho, A., Kamel, M. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 173–183. Springer, Heidelberg (2006)
8. Cremers, D., Kohlberger, T., Schnörr, C.: Shape statistics in kernel space for variational image segmentation. Pattern Recognition 36, 1929–1943 (2003)
9. Rathi, Y., Dambreville, S., Tannenbaum, A.: Comparative analysis of kernel methods for statistical shape learning. In: Beichel, R., Sonka, M. (eds.) CVAMIA 2006. LNCS, vol. 4241, pp. 96–107. Springer, Heidelberg (2006)
10. Mémoli, F., Sapiro, G.: A theoretical and computational framework for isometry invariant recognition of point cloud data. Foundations of Computational Mathematics 5, 313–347 (2005)
11. Miller, M.I., Younes, L.: Group actions, homeomorphisms and matching: a general framework. International Journal of Computer Vision 41(1-2), 61–84 (2001)
12. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annual Review of Biomedical Engineering 4, 375–405 (2002)
13. Fuchs, M., Jüttler, B., Scherzer, O., Yang, H.: Shape metrics based on elastic deformations. Forschungsschwerpunkt S92, Industrial Geometry 71, Universität Innsbruck (2008)
14. Rumpf, M., Wirth, B.: A nonlinear elastic shape averaging approach. SIAM Journal on Imaging Sciences (2008) (submitted)
15. Ciarlet, P.G.: Three-dimensional elasticity. Elsevier Science Publishers B.V., Amsterdam (1988)
16. Baker, T.: Three dimensional mesh generation by triangulation of arbitrary point sets. In: Computational Fluid Dynamics Conference, 8th, Honolulu, HI, June 9-11, 1987, vol. 1124-CP, pp. 255–271 (1987)
17. Srivastava, A., Jain, A., Joshi, S., Kaziska, D.: Statistical shape models using elastic-string representations. In: Narayanan, P. (ed.) ACCV 2006. LNCS, vol. 3851, pp. 612–621. Springer, Heidelberg (2006)
18. Younes, L.: Computable elastic distances between shapes. SIAM J. Appl. Math. 58, 565–586 (1998)
19. Karcher, H.: Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics 30(5), 509–541 (1977)
20. Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity problems. Bollettino dell’Unione Matematica Italiana, Sezione B 6(7), 105–123 (1992)
21. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
Pre-image as Karcher Mean Using Diffusion Maps: Application to Shape and Image Denoising
Nicolas Thorstensen, Florent Segonne, and Renaud Keriven
Universite Paris-Est, Ecole des Ponts ParisTech, Certis
http://certis.enpc.fr/~thorsten
Abstract. In the context of shape and image modeling by manifold learning, we focus on the problem of denoising. A set of shapes or images being known through given samples, we capture its structure thanks to the Diffusion Maps method. Denoising a new element classically boils down to the key problem of pre-image determination, i.e. recovering a point, given its embedding. We propose to model the underlying manifold as the set of Karcher means of close sample points. This non-linear interpolation is particularly well-adapted to the case of shapes and images. We define the pre-image as such an interpolation having the targeted embedding. Results on synthetic 2D shapes and on real 2D images and 3D shapes are presented and demonstrate the superiority of our pre-image method compared to several state-of-the-art techniques in shape and image denoising based on statistical learning techniques.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 721–732, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Introduction
Manifold learning, the process of extracting the meaningful structure and correct geometric description present in a set of training points $\Gamma = \{s_1, \dots, s_p\} \subset \mathcal{S}$, has seen renewed interest over the past years. These techniques are closely related to the notion of dimensionality reduction, i.e. the process of recovering the underlying low-dimensional structure of a manifold $M$ that is embedded in a higher-dimensional space $\mathcal{S}$. Among the most recent and popular techniques are Locally Linear Embedding (LLE) [5], Isomap [6], Laplacian eigenmaps [7] and Diffusion Maps [8, 9, 10]. In this paper we focus on Diffusion Maps. Their nonlinearity, as well as their locality-preserving property and stable behavior under noise, are generally viewed as a major advantage over classical methods like principal component analysis (PCA) and classical multidimensional scaling [8]. This method considers an adjacency graph on the set $\Gamma$ of training samples, whose matrix $(W_{i,j})_{i,j \in 1,\dots,p}$ captures the local geometry of $\Gamma$, that is, its local connectivity, through the use of a kernel function $w$. $W_{i,j} = w(s_i, s_j)$ measures the strength of the edge between $s_i$ and $s_j$. Typically $w(s_i, s_j)$ is a decreasing function of the distance $d_{\mathcal{S}}(s_i, s_j)$ between the training points $s_i$ and $s_j$. In this work, we use the Gaussian kernel $w(s_i, s_j) = \exp(-d_{\mathcal{S}}^2(s_i, s_j)/2\sigma^2)$, with $\sigma$ estimated as the median of the distances between all the training points [2, 10]. The kernel function has the property to implicitly map data points into a high-dimensional space, called the feature space. This space is better suited for the study of non-linear data. Computing the Diffusion Maps amounts to embedding the data into the
feature space through a mapping $\Psi$. While the mapping from input space to feature space is of primary importance, the reverse mapping from feature space back to input space (the pre-image problem) is also useful. Consider for example the use of kernel PCA for pattern denoising. Given some noisy patterns, kernel PCA first applies linear PCA on the $\Psi$-mapped patterns in the feature space, and then performs denoising by projecting them onto the subspace defined by the leading eigenvectors. These projections, however, are still in the feature space and have to be mapped back to the input space in order to recover the denoised patterns.
1.1 Related Work
Statistical methods for shape processing are very common in computer vision. A seminal work in this direction was published by Leventon et al. [11], adding statistical knowledge into energy-based segmentation methods. Their method captures the main modes of variation by performing a PCA on the set of shapes. This was extended to nonlinear statistics by Cremers et al. in [12]. The authors introduce non-linear shape priors by using a probabilistic version of Kernel PCA (KPCA). Dambreville et al. [1] and Arias et al. [2] developed a method for shape denoising based on Kernel PCA. So did Kwok et al. [3] in the context of image denoising. Both methods compute a projection of the noisy datum onto a low-dimensional space. In [13, 4] the authors propose another kernel method for data denoising, the so-called Laplacian Eigenmaps Latent Variable Model (LELVM), a probabilistic method. This model provides a dimensionality reduction and reconstruction mapping based on linear combinations of input samples. LELVM performs well on motion capture data but fails on complex shapes (see Fig. 1). Further, we would like to mention the work of Pennec [14] and Fletcher [15], modeling the manifold of shapes as a Riemannian manifold and the mean of such shapes as a Karcher mean [16]. Their methodology is used in the context of computational anatomy to solve the average template matching problem. Closer to our work is the algorithm proposed by Etyngier et al. [17]. They use Diffusion Maps as a statistical framework for non-linear shape priors in segmentation. They augment an energy functional by a shape prior term. Contrary to us, they do not compute a denoised shape but propose an additional force toward a rough estimate of it.
Fig. 1. Digit images corrupted by additive Gaussian noise (from left to right, $\sigma^2$ = 0.25, 0.45, 0.65, 0.85). The different rows respectively represent, from top to bottom: the original digits; the corrupted digits; denoising with [1]; with [1]+[2]; with [3]; with [3]+[2]; with [4]; with our Karcher means based method. See Table 2 for quantified results.
1.2 Our Contributions
In this paper, we propose a new method to solve the pre-image problem (see Section 3) in the context of Diffusion Maps for shape and image denoising. We suggest a manifold interpretation and learn the intrinsic structure of a given training set. Our method relies on a geometric interpretation of the problem which naturally leads to the definition of the pre-image as a Karcher mean [16] that interpolates between neighboring samples according to the diffusion distance. Previous pre-image methods were designed for Kernel PCA. Our motivation for using Diffusion Maps comes from the fact that the computed mapping captures the intrinsic geometry of the underlying manifold independently of the sampling. Therefore, the resulting Nyström extension (see Section 2.2) proves to be more “meaningful” far from the manifold and leads to quantitatively better pre-image estimations, even for very noisy input data. In the case of shape denoising, we compare our results to the work proposed by Dambreville [1] and for image denoising, to several denoising algorithms using Kernel PCA: [3], [2], [4]. Results on 3D shapes and 2D images are presented and demonstrate the superiority of our method. The rest of the paper is organized as follows. Section 2 presents the Diffusion Maps framework and the out-of-sample extension. Section 3 introduces our pre-image methodology. Numerical experiments on real data are reported in Section 4 and Section 5 concludes.
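The adjacency matrix from the introduction, built from a Gaussian kernel whose bandwidth is the median pairwise distance, can be sketched as follows. This is only an illustration under assumptions: samples are taken to be plain vectors with the Euclidean distance standing in for the distance on the ambient space, the median is taken over distinct pairs, and the name `gaussian_affinity` is hypothetical.

```python
import numpy as np

def gaussian_affinity(samples):
    """Return W[i, j] = exp(-d(s_i, s_j)^2 / (2 sigma^2)) and the bandwidth sigma.

    samples : (p, D) array of training points (Euclidean distance assumed).
    sigma   : median of the pairwise distances over all pairs i < j.
    """
    diff = samples[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))       # (p, p) distance matrix
    iu = np.triu_indices(len(samples), k=1)      # indices of pairs with i < j
    sigma = np.median(dist[iu])
    W = np.exp(-dist**2 / (2 * sigma**2))
    return W, sigma
```

For shapes or images, the Euclidean distance would be replaced by whatever distance the application defines on the sample space; the rest of the construction is unchanged.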
2 Learning a Set of Shapes Let Γ = {s1 · · · sp } be p independent random points of a m-dimensional manifold M locally sampled under some density qM (s) (m