Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
6046
Ivan Dimov Stefka Dimova Natalia Kolkovska (Eds.)
Numerical Methods and Applications 7th International Conference, NMA 2010 Borovets, Bulgaria, August 20–24, 2010 Revised Papers
Volume Editors Ivan Dimov Bulgarian Academy of Sciences Institute of Computer and Communication Technologies Acad. G. Bonchev 25 A, 1113 Sofia, Bulgaria Email:
[email protected] Stefka Dimova University of Sofia "St. Kliment Ohridski" Faculty of Mathematics and Informatics Department Numerical Methods and Algorithms Blvd. James Bourchier 5, 1164 Sofia, Bulgaria Email:
[email protected] Natalia Kolkovska Bulgarian Academy of Sciences Institute of Mathematics and Informatics Acad. Bonchev St.,Bl.8, 1113 Sofia, Bulgaria Email:
[email protected] ISSN 0302-9743 e-ISSN 1611-3349 e-ISBN 978-3-642-18466-6 ISBN 978-3-642-18465-9 DOI 10.1007/978-3-642-18466-6 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2010942928 CR Subject Classification (1998): G.1, F.2.1, G.4, I.6, J.2, J.6 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The international conference Numerical Methods and Applications is a traditional forum for scientists from all over the world, providing an opportunity to share ideas and establish fruitful scientific cooperation. The aim of the conference is to bring together leading international scientists of the numerical and applied mathematics community and to attract original research papers of very high quality. The papers in this volume were presented at the seventh edition of the International Conference on Numerical Methods and Applications (ICNM&A 2010) held in Borovets, Bulgaria, August 20–24, 2010. The conference was organized by the Institute of Mathematics and Informatics of the Bulgarian Academy of Sciences in cooperation with SIAM. The Faculty of Mathematics and Informatics of St. Kliment Ohridski University of Sofia and the Institute of Computer and Communication Technologies, Bulgarian Academy of Sciences were co-organizers of this traditional scientific meeting. Over 100 participants from 22 countries attended the conference. Ninety-four talks, including ten invited and keynote talks, were presented. This volume contains 60 papers submitted by authors from 16 countries. During ICNM&A 2010 a wide range of problems concerning recent theoretical achievements in numerical methods and their applications in mathematical modeling were discussed. Specific topics of interest were the following: numerical methods for differential and integral equations; approximation techniques in numerical analysis; numerical linear algebra; hierarchical and domain decomposition methods; parallel algorithms; Monte Carlo methods; computational mechanics; computational physics, chemistry and biology; engineering applications. Five special sessions were organized: Monte Carlo and Quasi-Monte Carlo Methods; Environmental Modeling; Grid Computing and Applications; Metaheuristics for Optimisation Problems; Modeling and Simulation of Electrochemical Processes.
The ICNM&A 2010 talks were delivered by researchers representing some of the strongest research teams in the field of numerical methods and their application for solving a wide range of practical problems. The success of the conference and the present volume are due to the joint efforts of many colleagues from various institutions and organizations. We express our deep gratitude to all the members of the Scientific Committee for their valuable contribution to forming the scientific spirit of the conference, as well as for their help in reviewing the submitted papers. We are also grateful to the staff involved in the local organization.
We hope that this meeting among scientists who develop and study numerical methods, on the one hand, and researchers who use them for solving real-life problems, on the other, has broadened their horizons and contributed to their mutual enrichment.
December 2010
Ivan Dimov Stefka Dimova Natalia Kolkovska
Organization
International Scientific Committee A. Andreev (Bulgaria) E. Atanassov (Bulgaria) R. Blaheta (Czech Republic) T. Boyadjiev (Bulgaria) J. Buša (Slovakia) R. Ciegis (Lithuania) P. D’Ambra (Italy) I. Dimov (Bulgaria) S. Dimova (Bulgaria) I. Farago (Hungary) M. Feistauer (Czech Republic) S. Fidanova (Bulgaria) K. Georgiev (Bulgaria) A. Goolin (Russia) S. Gocheva-Ilieva (Bulgaria)
J. Guermond (USA) R. Herbin (France) O. Iliev (Germany) B. Jovanovic (Serbia) S. Korotov (Finland) J. Kraus (Austria) N. Krejic (Serbia) R. Lazarov (USA) I. Lirkov (Bulgaria) S. Margenov (Bulgaria) P. Marinov (Bulgaria) S. Markov (Bulgaria) P. Matus (Belarus) P. Minev (Canada) M. Nedjalkov (Bulgaria) J. Pedroso (Portugal) K. Penev (UK) B. Popov (USA)
S. Radev (Bulgaria) P. Ribeiro (Portugal) K. Sabelfeld (Russia) J. Schoeberl (Germany) S. Selberherr (Austria) Bl. Sendov (Bulgaria) K. Semerdzhiev (Bulgaria) S. Slavchev (Bulgaria) M. Todorov (Bulgaria) V. Thomee (Sweden) P. Vabishchevich (Russia) I. Yotov (USA) L. Zikatanov (USA)
Organizing Committee Chairperson: N. Kolkovska I. Bazhlekov T. Chernogorova I. Christov
M. Dimova I. Georgiev
S. Stoilova D. Vasileva
Table of Contents

Invited Papers
Space-Time Discontinuous Galerkin Finite Element Method for Convection-Diffusion Problems and Compressible Flow
    Miloslav Feistauer and Jan Česenek
Stochastic Algorithms in Linear Algebra - beyond the Markov Chains and von Neumann - Ulam Scheme
    Karl Sabelfeld
SM Stability for Time-Dependent Problems
    Petr N. Vabishchevich

Monte Carlo and Quasi-Monte Carlo Methods
Advanced Monte Carlo Techniques in the Simulation of CMOS Devices and Circuits
    Asen Asenov
Monte Carlo Method for Numerical Integration Based on Sobol’s Sequences
    Ivan Dimov and Rayna Georgieva
Using Monte-Carlo Simulation for Risk Assessment: Application to Occupational Exposure during Remediation Works
    M.L. Dinis and A. Fiúza
The b-adic Diaphony as a Tool to Study Pseudorandomness of Nets
    Ivan Lirkov and Stanislava Stoilova
Scatter Estimation for PET Reconstruction
    Milan Magdics, Laszlo Szirmay-Kalos, Balazs Tóth, Ádam Csendesi, and Anton Penzov
Modeling of the SET and RESET Process in Bipolar Resistive Oxide-Based Memory Using Monte Carlo Simulations
    Alexander Makarov, Viktor Sverdlov, and Siegfried Selberherr
Stochastic Algorithm for Solving the Wigner-Boltzmann Correction Equation
    M. Nedjalkov, S. Selberherr, and I. Dimov
Modeling Thermal Effects in Fully-Depleted SOI Devices with Arbitrary Crystallographic Orientation
    K. Raleva, D. Vasileska, and S.M. Goodnick
Particle Monte Carlo Algorithms with Small Number of Particles in Grid Cells
    Stefan K. Stefanov
Is Self-Heating Important in Nanowire FETs?
    D. Vasileska, A. Hossain, K. Raleva, and S.M. Goodnick

Environmental Modeling
Mixed-Hybrid Formulation of Multidimensional Fracture Flow
    Jan Březina and Milan Hokr
WRF-Fire Applied in Bulgaria
    Nina Dobrinkova, Georgi Jordanov, and Jan Mandel
Bulgarian Operative System for Chemical Weather Forecast
    Iglika Etropolska, Maria Prodanova, Dimiter Syrakov, Kostadin Ganev, Nikolai Miloshev, and Kiril Slavov
Atmospheric Composition Studies for the Balkan Region
    Georgi Gadzhev, Georgi Jordanov, Kostadin Ganev, Maria Prodanova, Dimiter Syrakov, and Nikolai Miloshev
Specialized Sparse Matrices Solver in the Chemical Part of an Environmental Model
    Krassimir Georgiev and Zahari Zlatev
A Numerical Investigation for the Optimal Contaminant Inlet Positions in Horizontal Subsurface Flow Wetlands
    Konstantinos Liolios, Vassilios Tsihrintzis, and Stefan Radev
Using Satellite Observations for Air Quality Assessment with an Inverse Model System
    Achim Strunk, Hendrik Elbern, and Adolf Ebel
Distributed Software System for Data Evaluation and Numerical Simulations of Atmospheric Processes
    Atanas T. Terziyski and Nikolay T. Kochev
Advanced Numerical Tools Applied to Geoenvironmental Engineering - Soils Contaminated by Petroleum Hydrocarbons, a Case Study
    Maria Cristina Vila, J.M. Soeiro de Carvalho, and António Fiúza
Richardson Extrapolated Numerical Methods for Treatment of One-Dimensional Advection Equations
    Zahari Zlatev, Ivan Dimov, István Faragó, Krassimir Georgiev, Ágnes Havasi, and Tzvetan Ostromsky

Grid Computing and Applications
Programming Problems with a Large Number of Objective Functions
    Cornel Resteanu and Romica Trandafir
First Results of SEE-GRID-SCI Application CCIAQ
    Dimiter Syrakov, Valery Spiridonov, Kostadin Ganev, Maria Prodanova, Andrey Bogachev, Nikolai Miloshev, and Kiril Slavov

Metaheuristics for Optimization Problems
Genetic Algorithms Based Parameter Identification of Yeast Fed-Batch Cultivation
    Maria Angelova, Stoyan Tzonkov, and Tania Pencheva
Intuitionistic Fuzzy Interpretations of Conway’s Game of Life
    Lilija Atanassova and Krassimir Atanassov
Ant Colony Optimization Approach to Tokens’ Movement within Generalized Nets
    Vassia Atanassova and Krassimir Atanassov
Start Strategies of ACO Applied on Subset Problems
    Stefka Fidanova, Krassimir Atanassov, and Pencho Marinov
Sensitivity Analysis of ACO Start Strategies for Subset Problems
    Stefka Fidanova, Pencho Marinov, and Krassimir Atanassov
A Highly-Parallel TSP Solver for a GPU Computing Platform
    Noriyuki Fujimoto and Shigeyoshi Tsutsui
Metaheuristics for the Asymmetric Hamiltonian Path Problem
    João Pedro Pedroso
Adaptive Intelligence Applied to Numerical Optimisation
    Kalin Penev and Anton Ruzhekov
Fed-Batch Cultivation Control Based on Genetic Algorithm PID Controller Tuning
    Olympia Roeva and Tsonyo Slavov
Perspectives of Selfish Behaviour in Mobile Ad Hoc Networks
    Marcin Seredynski and Pascal Bouvry
A Comparison of Metaheuristics for the Problem of Solving Parametric Interval Linear Systems
    Iwona Skalna and Jerzy Duda
Parametric Approximation of Functions Using Genetic Algorithms: An Example with a Logistic Curve
    Fernando Torrecilla-Pinero, Jesús A. Torrecilla-Pinero, Juan A. Gómez-Pulido, Miguel A. Vega-Rodríguez, and Juan M. Sánchez-Pérez
Population-Based Metaheuristics for Tasks Scheduling in Heterogeneous Distributed Systems
    Flavia Zamfirache, Marc Frîncu, and Daniela Zaharie

Modeling and Simulation of Electrochemical Processes
Modeling of Species and Charge Transport in Li–Ion Batteries Based on Nonequilibrium Thermodynamics
    Arnulf Latz, Jochen Zausch, and Oleg Iliev
Finite Volume Discretization of Equations Describing Nonlinear Diffusion in Li-Ion Batteries
    P. Popov, Y. Vutov, S. Margenov, and O. Iliev

Contributed Papers
Numerical Study of Magnetic Flux in the LJJ Model with Double Sine-Gordon Equation
    P.Kh. Atanasova, T.L. Boyadjiev, E.V. Zemlyanaya, and Yu.M. Shukrinov
A Simple Preconditioner for the SIPG Discretization of Linear Elasticity Equations
    B. Ayuso, I. Georgiev, J. Kraus, and L. Zikatanov
Merger Bound States in 0-π Josephson Structures
    Todor L. Boyadjiev and Hristo T. Melemov
Some Error Estimates for the Discretization of Parabolic Equations on General Multidimensional Nonconforming Spatial Meshes
    Abadallah Bradji and Jürgen Fuhrmann
Finite-Volume Difference Scheme for the Black-Scholes Equation in Stochastic Volatility Models
    Tatiana Chernogorova and Radoslav Valkov
On the Numerical Simulation of Unsteady Solutions for the 2D Boussinesq Paradigm Equation
    Christo I. Christov, Natalia Kolkovska, and Daniela Vasileva
Numerical Investigation of Spiral Structure Solutions of a Nonlinear Elliptic Problem
    Milena Dimova and Stefka Dimova
Bidirectional Beam Propagation Method Applied for Lasers with Multilayer Active Medium
    N.N. Elkin, A.P. Napartovich, and D.V. Vysotsky
Analysis of the CBS Constant for Quadratic Finite Elements
    Ivan Georgiev, Maria Lymbery, and Svetozar Margenov
Sensitivity of Results of the Water Flow Problem in a Discrete Fracture Network with Large Coefficient Differences
    Milan Hokr, Jiří Kopal, Jan Březina, and Petr Rálek
Fluxon Dynamics in Stacked Josephson Junctions
    Ivan Hristov and Stefka Dimova
Global Convergence Properties of the SOR-Weierstrass Method
    Vladimir Hristov, Nikolay Kyurkchiev, and Anton Iliev
Numerical Solution of a Nonlinear Evolution Equation for the Risk Preference
    Naoyuki Ishimura, Miglena N. Koleva, and Lubin G. Vulkov
A Numerical Approach for the American Call Option Pricing Model
    Juri D. Kandilarov and Radoslav L. Valkov
A Numerical Study of a Parabolic Monge-Ampère Equation in Mathematical Finance
    Miglena N. Koleva and Lubin G. Vulkov
Convergence of Finite Difference Schemes for a Multidimensional Boussinesq Equation
    Natalia T. Kolkovska
A Numerical Approach for Obtaining Fragility Curves in Seismic Structural Mechanics: A Bridge Case of Egnatia Motorway in Northern Greece
    Asterios Liolios, Panagiotis Panetsos, Angelos Liolios, George Hatzigeorgiou, and Stefan Radev
An Efficient Numerical Method for a System of Singularly Perturbed Semilinear Reaction-Diffusion Equations
    S. Chandra Sekhara Rao and Sunil Kumar
A Comparison of Methods for Solving Parametric Interval Linear Systems with General Dependencies
    Iwona Skalna
Numerical Investigation of the Upper Bounds on the Convective Heat Transport in a Heated from below Rotating Fluid Layer
    Nikolay Vitanov

Author Index
Space-Time Discontinuous Galerkin Finite Element Method for Convection-Diffusion Problems and Compressible Flow
Miloslav Feistauer and Jan Česenek
Charles University Prague, Faculty of Mathematics and Physics, Sokolovská 83, 186 75 Praha 8, Czech Republic
[email protected], [email protected]
Abstract. This paper is concerned with the numerical solution of nonstationary, nonlinear convection-diffusion problems by the space-time discontinuous Galerkin finite element method (DGFEM) and with applications to compressible flow. The first part is devoted to the theoretical analysis of error estimates of the method. In the second part, this technique is applied to the numerical solution of compressible flow in time-dependent domains and the simulation of flow-induced airfoil vibrations.
Keywords: nonlinear nonstationary convection-diffusion problems, space-time discontinuous Galerkin discretization, error estimates, numerical solution of compressible flow in time-dependent domains, ALE method, airfoil vibrations.
1 Introduction

During the last decade the discontinuous Galerkin finite element method, using piecewise polynomial discontinuous approximations (cf., e.g., [2]), has emerged as an efficient tool for the space discretization of a number of problems described by partial differential equations. The numerical simulation of strongly nonstationary transient problems requires numerical schemes of high order of accuracy both in space and in time. From this point of view, it appears suitable to use the discontinuous Galerkin discretization with respect to space as well as time. The discontinuous Galerkin time discretization was introduced and analyzed, e.g., in [9] for the solution of ordinary differential equations. In [10] and references therein, the solution of linear parabolic problems is carried out with the aid of conforming finite elements in space combined with the DG time discretization. In [5], the space-time DGFEM was analyzed for a linear nonstationary convection-diffusion-reaction problem. The papers [6] and [7] are devoted to the analysis of a nonstationary convection-diffusion problem with nonlinear convection and linear diffusion. In the present paper we are concerned with the space-time discontinuous Galerkin discretization applied to the numerical solution of a nonstationary convection-diffusion problem with nonlinear convection as well as diffusion. In the second part of the paper we apply this method to the simulation of compressible flow in time-dependent domains and flow-induced airfoil vibrations. For simplicity we shall consider problems with two space dimensions.

I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 1–13, 2011. © Springer-Verlag Berlin Heidelberg 2011

We consider the following initial-boundary value problem. Let $\Omega \subset \mathbb{R}^2$ be a bounded polygonal domain and $T > 0$. We want to find $u : Q_T = \Omega \times (0, T) \to \mathbb{R}$ such that
\[ \frac{\partial u}{\partial t} + \sum_{s=1}^{2} \frac{\partial f_s(u)}{\partial x_s} - \operatorname{div}(\beta(u)\nabla u) = g \quad \text{in } Q_T, \tag{1} \]
\[ u\big|_{\partial\Omega \times (0,T)} = u_D, \tag{2} \]
\[ u(x, 0) = u_0(x), \quad x \in \Omega. \tag{3} \]
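We assume that $g$, $u_D$, $u_0$, $f_s$ are given functions and $f_s \in C^1(\mathbb{R})$, $|f_s'| \le C$, $s = 1, 2$. Moreover, let
\[ \beta : \mathbb{R} \to [\beta_0, \beta_1], \quad 0 < \beta_0 < \beta_1 < \infty, \tag{4} \]
\[ |\beta(u_1) - \beta(u_2)| \le L\,|u_1 - u_2| \quad \forall u_1, u_2 \in \mathbb{R}. \tag{5} \]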
In the derivation and analysis of the discrete problem we assume that the exact solution is regular in the following sense:
\[ u \in L^2(0, T; H^2(\Omega)), \quad \frac{\partial u}{\partial t} \in L^2(0, T; H^1(\Omega)), \tag{6} \]
\[ \|\nabla u(t)\|_{L^\infty(\Omega)} \le C_R \quad \text{for a.e. } t \in (0, T). \tag{7} \]

2 Space-Time Discretization
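In the time interval $[0, T]$ we construct a partition $0 = t_0 < \dots < t_M = T$ and denote $I_m = (t_{m-1}, t_m)$, $\tau_m = t_m - t_{m-1}$, $\tau = \max_{m=1,\dots,M} \tau_m$. For each $I_m$ we consider a partition $\mathcal{T}_{h,m}$ of the closure $\overline{\Omega}$ of the domain $\Omega$ into a finite number of closed triangles with mutually disjoint interiors. The partitions $\mathcal{T}_{h,m}$ are in general different for different $m$.

By $\mathcal{F}_{h,m}$ we denote the system of all faces of all elements $K \in \mathcal{T}_{h,m}$. Further, we denote the set of all inner faces by $\mathcal{F}^I_{h,m}$ and the set of all boundary faces by $\mathcal{F}^B_{h,m}$. Each $\Gamma \in \mathcal{F}_{h,m}$ is associated with a unit normal vector $n_\Gamma$, which has the same orientation as the outer normal to $\partial\Omega$ for $\Gamma \in \mathcal{F}^B_{h,m}$. We set $h_K = \operatorname{diam}(K)$ for $K \in \mathcal{T}_{h,m}$, $h_m = \max_{K \in \mathcal{T}_{h,m}} h_K$, $h = \max_{m=1,\dots,M} h_m$. By $\rho_K$ we denote the radius of the largest circle inscribed into $K$.

For a function $\varphi$ defined in $\bigcup_{m=1}^{M} I_m$ we put $\varphi_m^{\pm} = \varphi(t_m\pm) = \lim_{t \to t_m\pm} \varphi(t)$ and $\{\varphi\}_m = \varphi(t_m+) - \varphi(t_m-)$.

Over a triangulation $\mathcal{T}_{h,m}$ we define the broken Sobolev spaces $H^k(\Omega, \mathcal{T}_{h,m}) = \{v;\ v|_K \in H^k(K)\ \forall K \in \mathcal{T}_{h,m}\}$. For each face $\Gamma \in \mathcal{F}^I_{h,m}$ there exist two neighbours $K_\Gamma^{(L)}, K_\Gamma^{(R)} \in \mathcal{T}_{h,m}$ such that $\Gamma \subset \partial K_\Gamma^{(L)} \cap \partial K_\Gamma^{(R)}$. We use the convention that $n_\Gamma$ is the outer normal to $\partial K_\Gamma^{(L)}$ and the inner normal to $\partial K_\Gamma^{(R)}$. If $\Gamma \in \mathcal{F}^B_{h,m}$, then $K_\Gamma^{(L)}$ will denote the element adjacent to $\Gamma$. For $v \in H^1(\Omega, \mathcal{T}_{h,m})$ and $\Gamma \in \mathcal{F}_{h,m}$ we use the notation $v_\Gamma^{(L)}$ for the trace of $v|_{K_\Gamma^{(L)}}$ on $\Gamma$. If $\Gamma \in \mathcal{F}^I_{h,m}$, then we set $v_\Gamma^{(R)}$ = the trace of $v|_{K_\Gamma^{(R)}}$ on $\Gamma$, $\langle v \rangle_\Gamma = \frac{1}{2}\big(v_\Gamma^{(L)} + v_\Gamma^{(R)}\big)$, $[v]_\Gamma = v_\Gamma^{(L)} - v_\Gamma^{(R)}$. Let $C_W > 0$ be a fixed constant. We set
\[ h(\Gamma) = \frac{h_{K_\Gamma^{(L)}} + h_{K_\Gamma^{(R)}}}{2 C_W} \ \text{ for } \Gamma \in \mathcal{F}^I_{h,m}, \qquad h(\Gamma) = \frac{h_{K_\Gamma^{(L)}}}{C_W} \ \text{ for } \Gamma \in \mathcal{F}^B_{h,m}. \tag{8} \]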
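By $(\cdot, \cdot)$ we denote the scalar product in $L^2(\Omega)$ and by $\|\cdot\|$ we denote the norm in $L^2(\Omega)$. If $u, v, \varphi \in H^2(\Omega, \mathcal{T}_{h,m})$, we define the forms
\[ a_{h,m}(v, u, \varphi) = \sum_{K \in \mathcal{T}_{h,m}} \int_K \beta(v)\nabla u \cdot \nabla\varphi \, dx - \sum_{\Gamma \in \mathcal{F}^I_{h,m}} \int_\Gamma \big( \langle \beta(v)\nabla u \rangle \cdot n_\Gamma [\varphi] + \theta \langle \beta(v)\nabla\varphi \rangle \cdot n_\Gamma [u] \big) \, dS \]
\[ \qquad - \sum_{\Gamma \in \mathcal{F}^B_{h,m}} \int_\Gamma \big( \beta(v)\nabla u \cdot n_\Gamma \, \varphi + \theta \beta(v)\nabla\varphi \cdot n_\Gamma \, u - \theta \beta(v)\nabla\varphi \cdot n_\Gamma \, u_D \big) \, dS, \tag{9} \]
\[ J_{h,m}(u, \varphi) = \sum_{\Gamma \in \mathcal{F}^I_{h,m}} h(\Gamma)^{-1} \int_\Gamma [u][\varphi] \, dS + \sum_{\Gamma \in \mathcal{F}^B_{h,m}} h(\Gamma)^{-1} \int_\Gamma u \, \varphi \, dS, \tag{10} \]
\[ A_{h,m} = a_{h,m} + \beta_0 J_{h,m}, \tag{11} \]
\[ b_{h,m}(u, \varphi) = - \sum_{K \in \mathcal{T}_{h,m}} \int_K \sum_{s=1}^{2} f_s(u) \frac{\partial\varphi}{\partial x_s} \, dx + \sum_{\Gamma \in \mathcal{F}^I_{h,m}} \int_\Gamma H\big(u_\Gamma^{(L)}, u_\Gamma^{(R)}, n_\Gamma\big) [\varphi] \, dS + \sum_{\Gamma \in \mathcal{F}^B_{h,m}} \int_\Gamma H\big(u_\Gamma^{(L)}, u_\Gamma^{(L)}, n_\Gamma\big) \varphi \, dS, \tag{12} \]
\[ \ell_{h,m}(\varphi) = (g, \varphi) + \beta_0 \sum_{\Gamma \in \mathcal{F}^B_{h,m}} h(\Gamma)^{-1} \int_\Gamma u_D \, \varphi \, dS. \tag{13} \]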
In (12), $H$ is a numerical flux with the following properties.
(H1) $H(u, v, n)$ is defined in $\mathbb{R}^2 \times B_1$, where $B_1 = \{n \in \mathbb{R}^2;\ |n| = 1\}$, and is Lipschitz-continuous with respect to $u, v$.
(H2) $H(u, v, n)$ is consistent: $H(u, u, n) = \sum_{s=1}^{2} f_s(u)\, n_s$, $u \in \mathbb{R}$, $n = (n_1, n_2) \in B_1$.
(H3) $H(u, v, n)$ is conservative: $H(u, v, n) = -H(v, u, -n)$, $u, v \in \mathbb{R}$, $n \in B_1$.
In the above forms we take $\theta = -1$, $\theta = 0$ and $\theta = 1$ and obtain the nonsymmetric (NIPG), incomplete (IIPG) and symmetric (SIPG) variants of the approximation of the diffusion terms, respectively.
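As an illustration of properties (H2) and (H3), the following minimal sketch builds a Lax-Friedrichs-type flux, which is one common choice and is not claimed to be the flux used by the authors. The sample fluxes `math.sin` and `math.cos` (both with bounded derivative, as assumed for $f_s$) and the stabilization parameter `alpha` are assumptions made only for this example.

```python
import math

def make_lax_friedrichs(f1, f2, alpha):
    """Return a numerical flux H(u, v, n) of Lax-Friedrichs type.

    f1, f2 : the scalar fluxes f_1, f_2 (illustrative choices, see below)
    alpha  : stabilization parameter (assumed; typically a bound on |f_s'|)
    """
    def H(u, v, n):
        n1, n2 = n
        flux_u = f1(u) * n1 + f2(u) * n2   # physical flux of the left state
        flux_v = f1(v) * n1 + f2(v) * n2   # physical flux of the right state
        # central average plus a jump penalty
        return 0.5 * (flux_u + flux_v) - 0.5 * alpha * (v - u)
    return H

# Hypothetical data for the check below.
H = make_lax_friedrichs(math.sin, math.cos, alpha=1.0)
n = (0.6, 0.8)            # unit normal
u, v = 0.3, 1.1

consistency_gap = H(u, u, n) - (math.sin(u) * n[0] + math.cos(u) * n[1])
conservativity_gap = H(u, v, n) + H(v, u, (-n[0], -n[1]))
```

Both gaps vanish up to rounding, since the penalty term cancels for equal states and changes sign when the states and the normal are swapped.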
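In the space $H^1(\Omega, \mathcal{T}_{h,m})$, the following norm will be used:
\[ \|\varphi\|_{DG,m} = \Big( \sum_{K \in \mathcal{T}_{h,m}} |\varphi|^2_{H^1(K)} + J_{h,m}(\varphi, \varphi) \Big)^{1/2}. \tag{14} \]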
Let $p, q \ge 1$ be integers. For each $m = 1, \dots, M$ we define the finite-dimensional space
\[ S^p_{h,m} = \big\{ \varphi \in L^2(\Omega);\ \varphi|_K \in P^p(K)\ \forall K \in \mathcal{T}_{h,m} \big\}. \tag{15} \]
Here $P^p(K)$ denotes the space of all polynomials on $K$ of degree $\le p$. We denote by $\Pi_m$ the $L^2(\Omega)$-projection on $S^p_{h,m}$. The approximate solution will be sought in the space
\[ S^{p,q}_{h,\tau} = \Big\{ \varphi \in L^2(Q_T);\ \varphi|_{I_m} = \sum_{i=0}^{q} t^i \varphi_i \ \text{with } \varphi_i \in S^p_{h,m},\ m = 1, \dots, M \Big\}. \tag{16} \]
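In what follows we shall use the notation $U' = \partial U/\partial t$, $u' = \partial u/\partial t$.

Definition 1. We say that a function $U$ is an approximate solution of problem (1) – (3), if $U \in S^{p,q}_{h,\tau}$ and
\[ \int_{I_m} \big( (U', \varphi) + A_{h,m}(U, U, \varphi) + b_{h,m}(U, \varphi) \big) \, dt + \big( \{U\}_{m-1}, \varphi^+_{m-1} \big) = \int_{I_m} \ell_{h,m}(\varphi) \, dt \quad \forall \varphi \in S^{p,q}_{h,\tau},\ m = 1, \dots, M, \quad U_0^- := \Pi_1 u_0. \tag{17} \]
The exact regular solution $u$ satisfies the identity
\[ \int_{I_m} \big( (u', \varphi) + A_{h,m}(u, u, \varphi) + b_{h,m}(u, \varphi) \big) \, dt + \big( \{u\}_{m-1}, \varphi^+_{m-1} \big) = \int_{I_m} \ell_{h,m}(\varphi) \, dt \quad \forall \varphi \in S^{p,q}_{h,\tau}, \ \text{with } u(0-) = u(0). \tag{18} \]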
It is also possible to consider q = 0. In this case, scheme (17) represents a version of the backward Euler method. Therefore, we shall be concerned only with q ≥ 1.
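The structure of the time discretization in (17) can be seen on a scalar model problem. The sketch below is an illustration only, not the scheme of the paper: it drops the space variable and replaces the forms $A_{h,m}$ and $b_{h,m}$ by multiplication with a constant $a > 0$, i.e. it treats the ODE $u' + a u = 0$. With these assumptions, $q = 0$ gives the backward Euler step, and $q = 1$ leads to a $2 \times 2$ linear system per time interval.

```python
import math

def dg0_step(u_prev, a, tau):
    # q = 0: U is constant on I_m, so U' = 0 and the scalar analogue of (17)
    # reads tau*a*U + (U - u_prev) = 0, which is backward Euler.
    return u_prev / (1.0 + a * tau)

def dg1_step(u_prev, a, tau):
    # q = 1: U(t) = U0 + U1*s with s = (t - t_{m-1})/tau in [0, 1].
    # Testing with phi = 1 and phi = s gives the system
    #   (1 + z) U0 + (1 + z/2)   U1 = u_prev
    #   (z/2)   U0 + (1/2 + z/3) U1 = 0,      z = a*tau.
    z = a * tau
    a11, a12 = 1.0 + z, 1.0 + z / 2.0
    a21, a22 = z / 2.0, 0.5 + z / 3.0
    det = a11 * a22 - a12 * a21
    U0 = u_prev * a22 / det        # Cramer's rule, right-hand side (u_prev, 0)
    U1 = -u_prev * a21 / det
    return U0 + U1                 # left-hand limit U(t_m -)

def integrate(step, a, T, M, u0=1.0):
    # March over M uniform intervals and return the value at t = T.
    tau = T / M
    u = u0
    for _ in range(M):
        u = step(u, a, tau)
    return u

def final_error(step, M, a=1.0, T=1.0):
    return abs(integrate(step, a, T, M) - math.exp(-a * T))
```

Comparing `final_error(dg1_step, M)` with `final_error(dg0_step, M)` for increasing `M` shows how much faster the nodal error of the linear-in-time variant decays in this model setting.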
3 Error Analysis
In the derivation of the error we shall use the $S^{p,q}_{h,\tau}$-interpolation $\pi$ of functions $v \in H^1(0, T; L^2(\Omega))$ defined by
\[ \text{a) } \pi v \in S^{p,q}_{h,\tau}, \qquad \text{b) } (\pi v)(t_m-) = \Pi_m v(t_m-), \qquad \text{c) } \int_{I_m} (\pi v - v, \varphi^*) \, dt = 0 \quad \forall \varphi^* \in S^{p,q-1}_{h,\tau},\ \forall m = 1, \dots, M. \tag{19} \]
It is possible to prove that $\pi u$ is uniquely determined and $\pi v|_{I_m} = \pi(\Pi_m v)|_{I_m}$.
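Conditions (19) b) and c) can be made concrete in the simplest setting. The following sketch is an illustration under strong simplifying assumptions: the function $v$ depends on time only (so the spatial projection $\Pi_m$ plays no role), $q = 1$, and the time integral is approximated by a two-point Gauss-Legendre rule, which is exact only for cubic polynomials. The function name `pi_q1` is hypothetical.

```python
import math

def pi_q1(v, t_left, t_right):
    """Coefficients (c0, c1) of pi v = c0 + c1*s on (t_left, t_right),
    with s = (t - t_left)/tau, for q = 1 and a scalar function v(t)."""
    tau = t_right - t_left
    # two-point Gauss-Legendre nodes on [0, 1]
    shift = 0.5 / math.sqrt(3.0)
    nodes = (0.5 - shift, 0.5 + shift)
    mean = 0.5 * sum(v(t_left + tau * s) for s in nodes)   # approximates (1/tau) * int v dt
    v_end = v(t_right)
    # b) c0 + c1   = v(t_m -)          (interpolation at the right end point)
    # c) c0 + c1/2 = mean value of v   (orthogonality to constants in time)
    c1 = 2.0 * (v_end - mean)
    c0 = v_end - c1
    return c0, c1

# Example: v(t) = t^2 on (0, 1); the quadrature is exact here.
c0, c1 = pi_q1(lambda t: t * t, 0.0, 1.0)
```

For this example the right end value of $\pi v$ equals $v(1) = 1$ and the mean value of $\pi v$ over the interval equals $1/3$, the mean of $t^2$.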
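Our main goal will be the estimation of the error $e = U - u$, which can be expressed in the form $e = \xi + \eta$, where $\xi = U - \pi u \in S^{p,q}_{h,\tau}$ and $\eta = \pi u - u$. Then, in virtue of (17) and (18), for each $\varphi \in S^{p,q}_{h,\tau}$ we have
\[ \int_{I_m} \big( (\xi', \varphi) + A_{h,m}(U, U, \varphi) - A_{h,m}(u, u, \varphi) \big) \, dt + \big( \{\xi\}_{m-1}, \varphi^+_{m-1} \big) \]
\[ \qquad = \int_{I_m} \big( b_{h,m}(u, \varphi) - b_{h,m}(U, \varphi) \big) \, dt - \int_{I_m} (\eta', \varphi) \, dt - \big( \{\eta\}_{m-1}, \varphi^+_{m-1} \big). \tag{20} \]

3.1 Derivation of an Abstract Error Estimate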
In our further considerations, by $C$ we shall denote a positive generic constant, independent of $h, \tau, m, M, K, u, U$, which can attain different values in different places. In the sequel, we shall consider a system of triangulations $\mathcal{T}_{h,m}$, $m = 1, \dots, M$, $h \in (0, h_0)$, which is shape regular and locally quasiuniform: there exist positive constants $C_R$ and $C_Q$, independent of $K, \Gamma, m, M$ and $h$, such that for all $m = 1, \dots, M$ and $h \in (0, h_0)$
\[ \frac{h_K}{\rho_K} \le C_R \quad \forall K \in \mathcal{T}_{h,m}, \tag{21} \]
\[ h_{K_\Gamma^{(L)}} \le C_Q \, h_{K_\Gamma^{(R)}}, \quad h_{K_\Gamma^{(R)}} \le C_Q \, h_{K_\Gamma^{(L)}} \quad \forall \Gamma \in \mathcal{F}^I_{h,m}. \tag{22} \]
Important tools in the analysis of the DGFEM are the multiplicative trace inequality and the inverse inequality: there exist constants $C_M, C_I > 0$ independent of $h \in (0, h_0)$, $m$, $M$, $K \in \mathcal{T}_{h,m}$ and $v$ such that
\[ \|v\|^2_{L^2(\partial K)} \le C_M \big( \|v\|_{L^2(K)} \, |v|_{H^1(K)} + h_K^{-1} \|v\|^2_{L^2(K)} \big), \quad v \in H^1(K), \tag{23} \]
and
\[ |v|_{H^1(K)} \le C_I \, h_K^{-1} \|v\|_{L^2(K)}, \quad v \in P^p(K). \tag{24} \]
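The analysis of the form $b_{h,m}$ implies that for each $k > 0$ there exists a constant $C = C(k)$ such that
\[ |b_{h,m}(U, \varphi) - b_{h,m}(u, \varphi)| \le \frac{\beta_0}{k} \|\varphi\|^2_{DG,m} + C \Big( \|\xi\|^2 + \|\eta\|^2_{L^2(\Omega)} + \sum_{K \in \mathcal{T}_{h,m}} h_K^2 \, |\eta|^2_{H^1(K)} \Big). \tag{25} \]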
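As for the coercivity, we can prove the following result: Let
\[ C_W > 0 \quad \text{for } \theta = -1 \text{ (NIPG)}, \tag{26} \]
\[ C_W \ge \Big( \frac{4\beta_1}{\beta_0} \Big)^2 C_{MI} \quad \text{for } \theta = 1 \text{ (SIPG)}, \tag{27} \]
\[ C_W \ge 2 \Big( \frac{2\beta_1}{\beta_0} \Big)^2 C_{MI} \quad \text{for } \theta = 0 \text{ (IIPG)}, \tag{28} \]
where $C_{MI} = C_M (C_I + 1)(C_Q + 1)$. Then
\[ a_{h,m}(U, \xi, \xi) + \beta_0 J_{h,m}(\xi, \xi) \ge \frac{\beta_0}{2} \|\xi\|^2_{DG,m}. \tag{29} \]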
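Let us substitute $\varphi := \xi$ in (20). Then a detailed technical analysis yields the estimate
\[ \|\xi_m^-\|^2 - \|\xi_{m-1}^-\|^2 + \frac{\beta_0}{2} \int_{I_m} \|\xi\|^2_{DG,m} \, dt \le C \int_{I_m} \|\xi\|^2 \, dt + 4 \|\eta_{m-1}^-\|^2 + C \int_{I_m} R_m(\eta) \, dt, \tag{30} \]
where
\[ R_m(\eta) = \|\eta\|^2_{DG,m} + \|\eta\|^2 + \sum_{K \in \mathcal{T}_{h,m}} \big( h_K^2 \, |\eta|^2_{H^1(K)} + h_K^2 \, |\eta|^2_{H^2(K)} \big). \tag{31} \]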
An important task is the estimation of the term $\int_{I_m} \|\xi\|^2 \, dt$. The case when $\beta(u) = \text{const} > 0$ was analyzed in [6] and [7] using the approach from [1] based on the application of the so-called Gauss-Radau quadrature and interpolation. However, in the case of nonlinear diffusion, this technique is not applicable.

Lemma 1. There exist constants $C, C^* > 0$ such that
\[ \int_{I_m} \|\xi\|^2 \, dt \le C \, \tau_m \Big( \|\xi_{m-1}^-\|^2 + \|\eta_{m-1}^-\|^2 + \int_{I_m} R_m(\eta) \, dt \Big), \tag{32} \]
provided
\[ 0 < \tau_m \le C^* \beta_0. \tag{33} \]
Proof. The proof is rather technical. Therefore, we mention only the most important steps. Let us set
\[ t_{m-l/q} = t_{m-1} + \frac{l}{q} (t_m - t_{m-1}) \quad \text{for } l = 0, \dots, q. \]
Using scaling arguments and the equivalence of norms in the space $P^q(0, 1)$, we get the inequalities
\[ \sum_{l=0}^{q} \|\xi(t_{m-l/q})\|^2 \ge \frac{L_q}{\tau_m} \int_{I_m} \|\xi\|^2 \, dt \tag{34} \]
and
\[ \|\xi_{m-1}^+\|^2 \le \frac{M_q}{\tau_m} \int_{I_m} \|\xi\|^2 \, dt \tag{35} \]
with constants $L_q$, $M_q$ depending on $q$ only.
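Let us substitute $\varphi := \xi$ in (20). Then a detailed analysis yields the estimate
\[ \|\xi_m^-\|^2 + \|\xi_{m-1}^+\|^2 + \frac{\beta_0}{2} \int_{I_m} \|\xi\|^2_{DG,m} \, dt \le C \Big( \int_{I_m} \|\xi\|^2 \, dt + \int_{I_m} R_m(\eta) \, dt \Big) + \frac{2 \|\eta_{m-1}^-\|^2}{\delta_1} + \frac{2 \|\xi_{m-1}^-\|^2}{\delta_1} + 4 \delta_1 \|\xi_{m-1}^+\|^2, \tag{36} \]
valid for any $\delta_1 > 0$. In the case $q = 1$, using (34) – (36) and choosing $\delta_1$ in a suitable way, we conclude that Lemma 1 holds.

Further, let $q \ge 2$. For each $l = 1, \dots, q - 1$ we set $\tilde{\xi}_l = \zeta_{t_{m-l/q}}$, where $\zeta_{t_{m-l/q}}$ is the discrete characteristic function to the function $\xi$ at the point $t_{m-l/q}$. This means that $\tilde{\xi}_l \in S^{p,q}_{h,\tau}$,
\[ \int_{I_m} (\tilde{\xi}_l, \varphi) \, dt = \int_{t_{m-1}}^{t_{m-l/q}} (\xi, \varphi) \, dt \quad \forall \varphi \in S^{p,q-1}_{h,m}, \qquad \tilde{\xi}_l(t_{m-1}^+) = \xi(t_{m-1}^+). \tag{37} \]
It is possible to show that
\[ \int_{I_m} \|\tilde{\xi}_l\|^2_{DG,m} \, dt \le C \int_{I_m} \|\xi\|^2_{DG,m} \, dt. \tag{38} \]
Using in (37) $\varphi := \xi'$, we find that
\[ \int_{I_m} (\xi', \tilde{\xi}_l) \, dt + \big( \xi_{m-1}^+, (\tilde{\xi}_l)_{m-1}^+ \big) = \frac{1}{2} \Big( \|\xi(t_{m-l/q})\|^2 + \|\xi_{m-1}^+\|^2 \Big). \tag{39} \]
Using (20) with $\varphi = \tilde{\xi}_l$, (34), (35), (38) and (39), after a detailed computation we find that for any $\delta_2 > 0$ we have
\[ \|\xi_{m-l/q}\|^2 + \|\xi_{m-1}^+\|^2 \le C \int_{I_m} \big( \|\xi\|^2_{DG,m} + \|\xi\|^2 + R_m(\eta) \big) \, dt + \frac{2 \|\xi_{m-1}^-\|^2}{\delta_2} + \frac{2 \|\eta_{m-1}^-\|^2}{\delta_2} + 4 \delta_2 \|\xi_{m-1}^+\|^2. \tag{40} \]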
If we sum (40) over all $l = 1, \dots, q - 1$, use (30), (34), (35) and choose $\delta_2$ in a suitable way, we prove the existence of a constant $C^* > 0$ such that (32) holds, if (33) is satisfied.

On the basis of (30) and (32), the discrete Gronwall lemma and the relations $\xi_0^- = 0$, $e = \xi + \eta$, we obtain the abstract error estimate:

Theorem 1. Let (33) hold. Then there exists a constant $C > 0$ such that the error $e = U - u$ satisfies the estimate
\[ \|e_m^-\|^2 + \frac{\beta_0}{2} \sum_{j=1}^{m} \int_{I_j} \|e\|^2_{DG,j} \, dt \le C \Big( \sum_{j=1}^{m} \|\eta_j^-\|^2 + \sum_{j=1}^{m} \int_{I_j} R_j(\eta) \, dt \Big) + 2 \|\eta_m^-\|^2 + \beta_0 \sum_{j=1}^{m} \int_{I_j} \|\eta\|^2_{DG,j} \, dt, \tag{41} \]
$m = 1, \dots, M$, $h \in (0, h_0)$.

3.2 Error Estimation in Terms of h and τ
The derivation of error estimates in dependence on $h$ and $\tau$ is obtained from the abstract error estimate and the estimation of terms containing $\eta$, under the assumptions (7) and
\[ u \in H^{q+1}\big(0, T; H^1(\Omega)\big) \cap C\big([0, T]; H^{p+1}(\Omega)\big), \tag{42} \]
and the assumption that the meshes satisfy conditions (21), (22), (33) and
\[ \tau_m \ge C h_m^2, \quad m = 1, \dots, M. \tag{43} \]
Moreover, we assume that the Dirichlet datum $u_D$ satisfies the condition
\[ u_D(x, t) = \sum_{j=0}^{q} \psi_j(x) \, t^j, \tag{44} \]
where $\psi_j \in H^{p+1/2}(\partial\Omega)$ for $j = 0, \dots, q$. If all meshes $\mathcal{T}_{h,m}$ are identical, then condition (43) can be omitted. Then, using a similar process as in [6] and [7], we obtain the main result:

Theorem 2. Let $u$ be the exact solution of problem (1) – (3) satisfying the regularity conditions (7) and (42). Let $U$ be the approximate solution to problem (1) – (3) obtained by scheme (17) in the case that the Dirichlet datum $u_D$ is defined by (44). Let conditions (21), (22), (33) and (43) be satisfied. Then there exists a constant $C > 0$ independent of $h, \tau, m, \varepsilon, u, U$ such that
\[ \|e_m^-\|^2 + \frac{\varepsilon}{2} \sum_{j=1}^{m} \int_{I_j} \|e\|^2_{DG,j} \, dt \le C \Big( h^{2p} \, |u|^2_{C([0,T]; H^{p+1}(\Omega))} + \tau^{2q+2} \, |u|_{H^{q+1}(0,T; H^1(\Omega))} \Big), \tag{45} \]
$m = 1, \dots, M$, $h \in (0, h_0)$.

The detailed analysis will be the subject of a paper [3] in preparation.
4 DGFEM for the Solution of Compressible Flow in Time-Dependent Domains

4.1 Continuous Problem in the ALE Form
We shall be concerned with the numerical solution of compressible flow in a bounded domain $\Omega_t \subset \mathbb{R}^2$ depending on time $t \in [0, T]$. The time dependence of the domain is taken into account with the aid of a regular one-to-one ALE mapping $A_t : \overline{\Omega}_0 \to \overline{\Omega}_t$. We define the ALE velocity

$$\tilde z(X, t) = \frac{\partial A_t(X)}{\partial t}, \qquad z(x, t) = \tilde z\big(A_t^{-1}(x), t\big), \qquad t \in [0, T],\ X \in \overline{\Omega}_0,\ x \in \overline{\Omega}_t,$$

and the ALE derivative of a function $f = f(x, t)$ defined for $x \in \Omega_t$ and $t \in (0, T)$:

$$\frac{D^A f}{Dt}(x, t) = \frac{\partial \tilde f}{\partial t}(X, t), \qquad \text{where } \tilde f(X, t) = f(A_t(X), t),\ X = A_t^{-1}(x) \in \Omega_0.$$
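The system describing compressible flow, consisting of the continuity equation, the Navier-Stokes equations, the energy equation and thermodynamical relations, can be written in the ALE form

$$\frac{D^A w}{Dt} + \sum_{s=1}^{2}\frac{\partial g_s(w)}{\partial x_s} + w\,\mathrm{div}\,z = \sum_{s=1}^{2}\frac{\partial R_s(w, \nabla w)}{\partial x_s}, \tag{46}$$

where

$$w = (w_1, \dots, w_4)^T = (\rho, \rho v_1, \rho v_2, E)^T \in \mathbb{R}^4, \qquad g_i(w) = f_i(w) - z_i w, \tag{47}$$

$$f_i(w) = (f_{i1}, \dots, f_{i4})^T = \big(\rho v_i,\ \rho v_1 v_i + \delta_{1i}\,p,\ \rho v_2 v_i + \delta_{2i}\,p,\ (E + p)v_i\big)^T,$$

$$R_i(w, \nabla w) = (R_{i1}, \dots, R_{i4})^T = \big(0,\ \tau^V_{i1},\ \tau^V_{i2},\ \tau^V_{i1} v_1 + \tau^V_{i2} v_2 + k\,\partial\theta/\partial x_i\big)^T,$$

$$\tau^V_{ij} = \lambda\,\mathrm{div}\,v\,\delta_{ij} + 2\mu\,d_{ij}(v), \qquad d_{ij}(v) = \frac{1}{2}\Big(\frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i}\Big),$$

$$p = (\gamma - 1)\big(E - \rho|v|^2/2\big), \qquad \theta = \big(E/\rho - |v|^2/2\big)/c_v. \tag{48}$$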
We use the following notation: $\rho$ - density, $p$ - pressure, $E$ - total energy, $v = (v_1, v_2)$ - velocity, $\theta$ - absolute temperature, $\gamma > 1$ - Poisson adiabatic constant, $c_v > 0$ - specific heat at constant volume, $\mu > 0$, $\lambda = -2\mu/3$ - viscosity coefficients, $k > 0$ - heat conduction. The above system is equipped with the initial condition

$$w(x, 0) = w^0(x), \quad x \in \Omega_0. \tag{49}$$
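As for boundary conditions, we assume that the boundary of $\Omega_t$ consists of three different parts: $\partial\Omega_t = \Gamma_I \cup \Gamma_O \cup \Gamma_{W_t}$, where $\Gamma_I$ is the inlet, $\Gamma_O$ is the outlet and $\Gamma_{W_t}$ denotes impermeable walls that may move in dependence on time. Then we prescribe the following boundary conditions:

$$\text{a) } \rho|_{\Gamma_I} = \rho_D, \qquad \text{b) } v|_{\Gamma_I} = v_D = (v_{D1}, v_{D2})^T, \qquad \text{c) } \sum_{i,j=1}^{2}\tau^V_{ij} n_i v_j + k\frac{\partial\theta}{\partial n} = 0 \ \text{ on } \Gamma_I, \tag{50}$$

$$\text{a) } v|_{\Gamma_{W_t}} = z_D = (z_{D1}, z_{D2}), \qquad \text{b) } \frac{\partial\theta}{\partial n}\Big|_{\Gamma_{W_t}} = 0, \tag{51}$$

$$\text{a) } \sum_{i=1}^{2}\tau^V_{ij} n_i = 0,\ j = 1, 2, \qquad \text{b) } \frac{\partial\theta}{\partial n} = 0 \ \text{ on } \Gamma_O. \tag{52}$$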
By $z_D$ we denote the velocity of a moving wall.

4.2 Discretization
Let us construct a partition 0 = t0 < t1 < t2 . . . of the time interval [0, T ]. At each time instant tm , the domain Ωtm is approximated by a polygonal domain Ωh,m , in which a triangulation Th,m is constructed. The discrete problem is formulated in a similar way as in Section 2. The approximate solution will be
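denoted by $W$. We assume that $W|_{I_m} \in S^{p,q}_{h,\tau,m} = \{\varphi \in L^2(\Omega_{h,m} \times I_m);\ \varphi = \sum_{i=0}^{q} t^i \varphi_i \text{ with } \varphi_i \in [S^p_{h,m}]^4,\ t \in I_m\}$. The symbol $F^D_{h,m}$ will denote the system of $\Gamma \in F^B_{h,m}$ on which a Dirichlet condition is prescribed. We introduce the forms

$$a_{h,m}(w, \varphi_h) = \sum_{K \in T_{h,m}} \int_K \sum_{s=1}^{2} R_s(w, \nabla w) \cdot \frac{\partial\varphi_h}{\partial x_s}\,dx \tag{53}$$

$$\qquad - \sum_{\Gamma \in F^I_{h,m}} \int_\Gamma \sum_{s=1}^{2} \langle R_s(w, \nabla w)\rangle (n_\Gamma)_s \cdot [\varphi_h]\,dS - \sum_{\Gamma \in F^D_{h,m}} \int_\Gamma \sum_{s=1}^{2} R_s(w, \nabla w)(n_\Gamma)_s \cdot \varphi_h\,dS, \tag{54}$$

$$b_{h,m}(w, \varphi_h) = -\sum_{K \in T_{h,m}} \int_K \sum_{s=1}^{2} g_s(w) \cdot \frac{\partial\varphi_h}{\partial x_s}\,dx + \sum_{\Gamma \in F^I_{h,m}} \int_\Gamma H_g\big(w_\Gamma^{(L)}, w_\Gamma^{(R)}, n_\Gamma\big) \cdot [\varphi_h]\,dS + \sum_{\Gamma \in F^B_{h,m}} \int_\Gamma H_g\big(w_\Gamma^{(L)}, w_\Gamma^{(R)}, n_\Gamma\big) \cdot \varphi_h\,dS, \tag{55}$$

$$J_{h,m}(w, \varphi_h) = \sum_{\Gamma \in F^I_{h,m}} \int_\Gamma h(\Gamma)^{-1}\,[w] \cdot [\varphi_h]\,dS + \sum_{\Gamma \in F^D_{h,m}} \int_\Gamma h(\Gamma)^{-1}\,w \cdot \varphi_h\,dS, \tag{56}$$

$$\ell_{h,m}(w, \varphi_h) = \sum_{\Gamma \in F^D_{h,m}} \int_\Gamma \sum_{s=1}^{2} h(\Gamma)^{-1}\,w_B \cdot \varphi_h\,dS, \tag{57}$$

$$d_{h,m}(w, \varphi_h) = \sum_{K \in T_{h,m}} \int_K (w \cdot \varphi_h)\,\mathrm{div}\,z\,dx. \tag{58}$$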
Here $H_g$ is a conservative numerical flux consistent with the fluxes $g_s$. We use the incomplete IIPG version (i.e. $\theta = 0$). The boundary state $w_B$ is defined on the basis of the Dirichlet boundary conditions and extrapolation. For $\Gamma \in F^B_{h,m}$ the boundary state $w_\Gamma^{(R)}$ appearing in the form $b_{h,m}$ is defined with the aid of the solution of the 1D linearized initial-boundary Riemann problem as in [4]. Further, we set $\hat W^-_{m-1} = W^-_{m-1} \circ A^{-1}_{t_{m-1}} \circ A_{t_m}$. Now we can define the approximate solution as a function $W$ satisfying the conditions $W|_{I_m} \in S^{p,q}_{h,\tau,m}$ and

$$\int_{I_m}\Big(\Big(\frac{D^A W}{Dt}, \varphi\Big) + a_{h,m}(W, \varphi) + b_{h,m}(W, \varphi) + J_{h,m}(W, \varphi) + d_{h,m}(W, \varphi)\Big)\,dt + \big(W^+_{m-1} - \hat W^-_{m-1}, \varphi^+_{m-1}\big) = \int_{I_m} \ell_{h,m}(\varphi)\,dt \tag{59}$$

$$\forall\,\varphi \in S^{p,q}_{h,\tau,m}, \quad m = 1, \dots, M, \qquad W_0^- := \Pi_1 u^0.$$
This nonlinear problem is solved with respect to $W|_{I_m}$ by a suitable iterative process.

4.3 Flow Induced Airfoil Vibrations
We consider an elastically supported airfoil with two degrees of freedom: the vertical displacement $H$ (positively oriented downwards) and the angle $\alpha$ of rotation around an elastic axis EO (positively oriented clockwise). The motion of the airfoil is described by the system of nonlinear ordinary differential equations for the unknowns $H$, $\alpha$:

$$m\ddot H + k_{HH} H + S_\alpha \ddot\alpha \cos\alpha - S_\alpha \dot\alpha^2 \sin\alpha + d_{HH}\dot H = -L(t),$$
$$S_\alpha \ddot H \cos\alpha + I_\alpha \ddot\alpha + k_{\alpha\alpha}\alpha + d_{\alpha\alpha}\dot\alpha = M(t). \tag{60}$$
We use the following notation: $m$ - mass of the airfoil, $S_\alpha$ - static moment around the elastic axis EO $= (x_{EO1}, x_{EO2})$, $I_\alpha$ - inertia moment around the elastic axis EO, $k_{HH}$ - bending stiffness, $k_{\alpha\alpha}$ - torsional stiffness, $d_{HH}$ - structural damping in bending, $d_{\alpha\alpha}$ - structural damping in torsion, $c$ - length of the chord of the airfoil, $l$ - airfoil depth. The aerodynamic lift force $L$ and the aerodynamic torsional moment $M$ are defined by

$$L = -l \int_{\Gamma_{W_t}} \sum_{j=1}^{2} \tau_{2j} n_j\,dS, \qquad M = l \int_{\Gamma_{W_t}} \sum_{i,j=1}^{2} \tau_{ij} n_j r_i^{\mathrm{ort}}\,dS, \tag{61}$$

$$\tau_{ij} = -p\,\delta_{ij} + \tau^V_{ij}, \qquad r_1^{\mathrm{ort}} = -(x_2 - x_{EO2}), \qquad r_2^{\mathrm{ort}} = x_1 - x_{EO1}.$$
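To make the structure of system (60) concrete, the following minimal sketch rewrites it as a first-order system for $(H, \alpha, \dot H, \dot\alpha)$ and advances it with the classical fourth-order Runge-Kutta method. It is only an illustration under stated assumptions: the aerodynamic loads $L(t)$ and $M(t)$ are replaced by zero placeholders (in the paper they come from the flow solver), the structural constants are the ones quoted for Fig. 1 with damping neglected, and all function names are hypothetical.

```python
import math

# Structural data quoted in the text for Fig. 1 (damping neglected).
m, S_a, I_a = 0.086622, -0.000779673, 0.000487291
k_HH, k_aa = 105.109, 3.696682
d_HH, d_aa = 0.0, 0.0


def rhs(t, y, lift, moment):
    """First-order form of (60); y = (H, alpha, dH/dt, dalpha/dt)."""
    H, a, dH, da = y
    c = math.cos(a)
    # Right-hand sides after moving all non-acceleration terms over.
    r1 = -lift(t) - k_HH * H + S_a * da * da * math.sin(a) - d_HH * dH
    r2 = moment(t) - k_aa * a - d_aa * da
    # Solve the 2x2 system [[m, S_a c], [S_a c, I_a]] (ddH, dda) = (r1, r2).
    det = m * I_a - (S_a * c) ** 2
    ddH = (I_a * r1 - S_a * c * r2) / det
    dda = (m * r2 - S_a * c * r1) / det
    return (dH, da, ddH, dda)


def rk4_step(t, y, dt, lift, moment):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(u, s, v):
        return tuple(ui + s * vi for ui, vi in zip(u, v))
    k1 = rhs(t, y, lift, moment)
    k2 = rhs(t + dt / 2, shifted(y, dt / 2, k1), lift, moment)
    k3 = rhs(t + dt / 2, shifted(y, dt / 2, k2), lift, moment)
    k4 = rhs(t + dt, shifted(y, dt, k3), lift, moment)
    return tuple(yi + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                 for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4))


def energy(y):
    """Mechanical energy of the undamped, unforced system."""
    H, a, dH, da = y
    kinetic = 0.5 * m * dH**2 + S_a * math.cos(a) * dH * da + 0.5 * I_a * da**2
    return kinetic + 0.5 * k_HH * H**2 + 0.5 * k_aa * a**2


def no_load(t):
    """Placeholder for L(t) and M(t): no aerodynamic forcing (assumption)."""
    return 0.0


y = (-0.02, math.radians(6.0), 0.0, 0.0)   # H(0), alpha(0), zero velocities
e0 = energy(y)
for n in range(1000):                       # 0.1 s with dt = 1e-4 s
    y = rk4_step(n * 1e-4, y, 1e-4, no_load, no_load)
```

Without forcing and damping the system is conservative, so the drift of `energy(y)` away from `e0` gives a simple check on the time integrator before it is coupled to a flow solver.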
System (60) is equipped with initial conditions prescribing the values $H(0)$, $\alpha(0)$, $\dot H(0)$, $\dot\alpha(0)$. It is transformed to a first-order ODE system and approximated by the fourth-order Runge-Kutta method coupled with scheme (59).

Figure 1 shows the displacement $H$ and the rotation angle $\alpha$ in dependence on time for the far-field velocities 20, 30 and 40 m/s and the following data: $m = 0.086622$ kg, $S_\alpha = -0.000779673$ kg m, $I_\alpha = 0.000487291$ kg m$^2$, $k_{HH} = 105.109$ N/m, $k_{\alpha\alpha} = 3.696682$ Nm/rad, $l = 0.05$ m, $c = 0.3$ m, $\mu = 1.8375 \cdot 10^{-5}$ kg m$^{-1}$ s$^{-1}$, far-field density $\rho = 1.225$ kg m$^{-3}$, $H(0) = -0.02$ m, $\alpha(0) = 6$ degrees, $\dot H(0) = 0$, $\dot\alpha(0) = 0$. The elastic axis lies on the chord of the airfoil at 40% of the chord length from the leading edge. The far-field Mach number is 0.014 for the velocity 20 m/s. The structural damping is neglected. The flow is purely subsonic in this case and, therefore, it is not necessary to introduce an artificial viscosity in scheme (59), as was done, e.g., in [8]. In (59), the approximation polynomial degrees $q = 0$, $p = 2$ were used. We see that for the velocities 20 and 30 m/s the vibrations are damped, but for the velocity 40 m/s we get the flutter instability, in which the vibration amplitudes increase in time. The monotone increase and decrease of the average values of $H$ and $\alpha$, respectively, show that the flutter is combined with a divergence instability in the presented example.

Fig. 1. Displacement H [mm] (left) and rotation angle α [°] (right) of the airfoil in dependence on time t [s] for far-field velocity 20, 30 and 40 m/s

Acknowledgements. This work is supported by the research project MSM 0021620839 (M. Feistauer) and by the Nečas Center for Mathematical Modelling, project LC06052 (J. Česenek), both financed by the Ministry of Education of the Czech Republic. The research of J. Česenek was also partly supported by the project No. 12810 of the Grant Agency of the Charles University in Prague.
References

1. Akrivis, G., Makridakis, C.: Galerkin time-stepping methods for nonlinear parabolic equations. ESAIM: Math. Modelling and Numer. Anal. 38, 261–289 (2004)
2. Arnold, D.N., Brezzi, F., Cockburn, B., Marini, D.: Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM J. Numer. Anal. 39, 1749–1779 (2001)
3. Česenek, J., Feistauer, M.: Theory of the space-time discontinuous Galerkin method for nonstationary parabolic problems with nonlinear convection and diffusion (in preparation)
4. Feistauer, M., Česenek, J., Horáček, J., Kučera, V., Prokopová, J.: DGFEM for the numerical solution of compressible flow in time dependent domains and applications to fluid-structure interaction. In: Pereira, J.C.F., Sequeira, A. (eds.) Proceedings of the 5th European Conference on Computational Fluid Dynamics ECCOMAS CFD 2010, Lisbon, Portugal, June 14-17 (2010) (published electronically), ISBN 978-989-96778-1-4
5. Feistauer, M., Hájek, J., Švadlenka, K.: Space-time discontinuous Galerkin method for solving nonstationary linear convection-diffusion-reaction problems. Appl. Math. 52, 197–234 (2007)
6. Feistauer, M., Kučera, V., Najzar, K., Prokopová, J.: Analysis of space-time discontinuous Galerkin method for nonlinear convection-diffusion problems. Preprint No. MATH-knm-2010/1, Charles University Prague, School of Mathematics (submitted to Numer. Math.)
7. Feistauer, M., Kučera, V., Najzar, K., Prokopová, J.: Space-time DG method for nonstationary convection-diffusion problems. In: Numerical Mathematics and Advanced Applications, ENUMATH 2009. Springer, Heidelberg (2010), doi:10.1007/978-3-642-11795-4_34
8. Feistauer, M., Kučera, V., Prokopová, J.: Discontinuous Galerkin solution of compressible flow in time-dependent domains. Mathematics and Computers in Simulation 80, 1612–1623 (2010)
9. Eriksson, K., Estep, D., Hansbo, P., Johnson, C.: Computational Differential Equations. Cambridge University Press, Cambridge (1996)
10. Thomée, V.: Galerkin Finite Element Methods for Parabolic Problems. Springer, Berlin (2006)
Stochastic Algorithms in Linear Algebra - beyond the Markov Chains and von Neumann-Ulam Scheme

Karl Sabelfeld

Institute Comp. Math. & Math. Geoph., Novosibirsk, Lavrentiev str, 6, 630090 Novosibirsk, Russia
[email protected]

Abstract. Sparsified Randomization Monte Carlo (SRMC) algorithms for solving systems of linear algebraic equations, introduced in our previous paper [34], are discussed here in a broader context. In particular, I present new randomized solvers for large systems of linear equations, randomized singular value decomposition (SVD) for large matrices and its use for solving inverse problems, and stochastic simulation of random fields. Stochastic projection methods, which I call here "random row action" algorithms, are extended to problems which involve systems of equations and constraints in the form of systems of linear inequalities.
1 Introduction
The use of Monte Carlo methods for solving large systems of linear equations is intimately tied to the von Neumann-Ulam scheme, e.g., see [15], [16], [20], [37], [31], [32], [5], [6], [7]. It can be interpreted as follows: (1) first, represent the solution in the form of the Neumann series; (2) then represent the solution (one component of the vector, in the case of a system of algebraic equations $x = Ax + b$) as an expectation over some Markov chain associated in a sense with the matrix $A$; (3) finally, calculate the expectation by taking an ensemble average (numerically, the arithmetic mean) of a random estimator defined on the constructed Markov chains. A nice feature of this method has always been its parsimonious memory usage: the method takes almost no memory, independent of the size of the matrix. A serious drawback, however, is its slow convergence: the error decreases as $O(N^{-1/2})$, where $N$ is the number of independent samples of the Markov chains. Quasi-Monte Carlo methods may sometimes improve the rate of convergence, but in practice the improvement is often too small. There has recently been dramatic progress in solving the storage problem, and it is natural to involve other stochastic ideas beyond the von Neumann-Ulam-Markov chain paradigm. As an example, we mention conventional deterministic iteration methods in which the weights are chosen at random (e.g., see [42], [36]). Another important example is the projection method, where one takes projections onto randomly sampled subspaces (e.g., see [41], [33]). Sampling from random subspaces is also the main idea in the randomized singular value decomposition technique (e.g., see [18], [10]–[12]).

The author thanks the organizers of the conference, and acknowledges the support of the RFBR under Grants N 06-01-00498, 09-01-12028-ofi-m, and a joint BMBF and Bortnik Funds Grant N 7326.

I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 14–28, 2011. © Springer-Verlag Berlin Heidelberg 2011

A general idea behind these methods appeals to the fundamental result of Johnson and Lindenstrauss [21], which says that any $n$-point subset of Euclidean space can be embedded in $k = O(\log n/\varepsilon^2)$ dimensions without distorting the distances between any pair of points by more than a factor of $(1 \pm \varepsilon)$, for any $0 < \varepsilon < 1$. In other words, any set of $n$ points in $d$-dimensional Euclidean space can be embedded into $k$-dimensional Euclidean space, where $k$ is logarithmic in $n$ and independent of $d$, so that all pairwise distances are maintained within an arbitrarily small factor. The linear transformation can be done by a random matrix whose entries are independent standard Gaussian random variables. This transformation was essentially simplified in [1], where it was shown that this matrix can be replaced by a matrix whose entries $r_{ij}$ are independent discrete random variables taking the values $\pm\sqrt{3}$ with probability 1/6 each and the value 0 with probability 2/3, which greatly sparsifies the matrix.

More precisely, Achlioptas' theorem is formulated as follows. Suppose that $A$ is an $n \times d$ matrix of $n$ points in $\mathbb{R}^d$. Fix constants $\varepsilon, \beta > 0$, and choose an integer $k$ such that

$$k \ge \frac{4 + 2\beta}{\varepsilon^2/2 - \varepsilon^3/3}\,\log n.$$

Suppose that $R$ is a random $d \times k$ matrix with independent entries $r_{ij}$ drawn from the distribution

$$r_{ij} = \sqrt{3} \times \begin{cases} +1 & \text{with probability } 1/6, \\ \ \ 0 & \text{with probability } 2/3, \\ -1 & \text{with probability } 1/6. \end{cases} \tag{1}$$

Define the $n \times k$ matrix $Q = \frac{1}{\sqrt{k}} A R$, which is considered as a projection of $A$ onto a $k$-dimensional subspace. For any row $u$ in $A$, let $f(u)$ be the corresponding row in $Q$. Then, for any distinct rows $u$, $v$ of $A$, we have

$$(1 - \varepsilon)\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \varepsilon)\|u - v\|^2$$

with probability at least $1 - n^{-\beta}$.
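The statement of the theorem can be tried out numerically. The sketch below is an illustration, not part of the cited work: it draws a sparse matrix $R$ from distribution (1), forms $Q = AR/\sqrt{k}$ with $k$ taken from the bound above, and compares pairwise squared distances before and after projection. The point cloud, its size and the seed are arbitrary assumptions, and the helper names are hypothetical.

```python
import numpy as np


def target_dimension(n, eps, beta):
    """Smallest integer k allowed by the bound in the theorem."""
    return int(np.ceil((4 + 2 * beta) / (eps**2 / 2 - eps**3 / 3) * np.log(n)))


def sparse_projection(A, k, rng):
    """Return Q = A R / sqrt(k) with the entries of R drawn from (1)."""
    d = A.shape[1]
    R = np.sqrt(3.0) * rng.choice([1.0, 0.0, -1.0], size=(d, k),
                                  p=[1 / 6, 2 / 3, 1 / 6])
    return A @ R / np.sqrt(k)


def sq_dists(X):
    """All pairwise squared Euclidean distances between rows of X."""
    g = X @ X.T
    sq = np.diag(g)
    return sq[:, None] + sq[None, :] - 2 * g


rng = np.random.default_rng(0)
n, d, eps, beta = 50, 1000, 0.5, 1.0
A = rng.standard_normal((n, d))          # arbitrary test points (assumption)
k = target_dimension(n, eps, beta)
Q = sparse_projection(A, k, rng)

iu = np.triu_indices(n, 1)               # each pair of distinct rows once
ratio = sq_dists(Q)[iu] / sq_dists(A)[iu]
```

The array `ratio` holds $\|f(u) - f(v)\|^2 / \|u - v\|^2$ for all pairs; the theorem asserts that, with probability at least $1 - n^{-\beta}$, all of these values lie in $[1 - \varepsilon, 1 + \varepsilon]$. About two thirds of the entries of `R` are zero, which is the source of the savings in the product `A @ R` when a sparse storage format is used.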
In [2], the authors suggested a low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$), called the Fast Johnson-Lindenstrauss Transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. In all these methods we deal with conventional numerical methods, but introduce some randomness to improve the convergence and, moreover, to turn to very high dimensions which cannot be treated purely deterministically. For instance, it is well known that the computational cost of a full SVD for large matrices increases rapidly with the matrix dimension. The randomized SVD solves this problem by randomly sampling small submatrices for which the SVD is computed. In the case of projection methods, one projects the points only onto a random set of subspaces. Methods of this type treat the dimension problem in a nontrivial manner. But the main advantage of these methods is in their convergence rate: it is dramatically increased compared to the conventional Monte Carlo methods, and is actually comparable with that of the best deterministic methods.

The computational cost of most simulation algorithms in dimension $m$ increases exponentially in $m$. Note that even simply accessing a vector in dimension $m$ requires $N^m$ operations, where $N$ is the number of entries in each direction. This complexity growth is often referred to as the Curse of Dimensionality [4]. Given an equation in $m$ dimensions, one can try to approximate its solution $u(x_1, \dots, x_m)$ by a separable function, say, as $u(x_1, \dots, x_m) \approx f_1(x_1) \cdots f_m(x_m)$, hence radically reducing the complexity to a linear function of $m$. More generally, a separation representation is defined as [4]

$$u(x_1, \dots, x_m) = \sum_{i=1}^{s} \lambda_i\, f_1^{(i)}(x_1) \cdots f_m^{(i)}(x_m) + O(\varepsilon).$$

Setting an accuracy goal $\varepsilon$ first, and then adapting $\{\lambda_i\}$, $\{f_j^{(i)}(x_j)\}$ and $s$ to achieve this goal with minimal separation rank $s$ is the idea behind many algorithms based on the separation representation approach.

In Monte Carlo methods, one often has to deal with very large dimensions, in problems like integration, solution of integral equations and PDEs, simulation of random fields, etc. It is customary to think that Monte Carlo methods are able to resolve problems of very high dimension; however, this is true only under the following conditions: (1) the variance of the MC estimator is small, (2) the desired accuracy is not high, (3) the complexity of constructing the random estimator grows slowly with the dimension $m$. Condition (3) can often be satisfied, but conditions (1) and (2) are the main concern, because the convergence rate of MC methods is slow, scaling as $\sigma/\sqrt{M}$, where $\sigma$ is the standard deviation and $M$ is the sample size. Therefore, any approach, method or algorithm capable of influencing one of the above three conditions is of great interest in Monte Carlo methods. In particular, one often says, in a very general sense, that a variance reduction is achieved when certain deterministic transformations lead to a transformed random estimator with smaller variance. The dimension is of less concern, though dimension reduction is desirable in relation, again, to variance reduction. For instance, the variance is reduced if an exact (or an efficient deterministic) integration over a part of the variables is possible.

In linear algebra, a fundamental approach to separation representations for matrices is based on the SVD [19]; see also the excellent tutorial presentation [38]. The literature on the numerical construction of the SVD is vast; we mention only some of it, e.g., [19], [24], [26], [29], [40], [44]. Recently, different matrix operations such as matrix multiplication and SVD for large matrices based on the randomization idea have been suggested in different papers, for different application fields, e.g., see [18], [8], [9], [10], [11], [12], [34], [13], [23], [4], [44], [25].

Where can these computational techniques be employed? Essentially in all fields where computation is extensively used, especially when dealing with very high dimensions, such as high-dimensional PDEs, integral equations of 3D potential theory, inverse problems of tomography and crystallography, the Schrödinger equation, and turbulence simulations. These techniques prove useful not only in computational mathematics: problems from information retrieval and Web analysis, such as the Google PageRank problem and latent semantic indexing, have also strongly motivated research on the design and analysis of linear algebra algorithms involving massive data sets. The list of applications can easily be extended by data clustering, property testing of graphs and image processing, among others.
2 Sparsified Randomization Algorithms for Linear Systems
Let us consider a system of linear algebraic equations with an $n \times n$ matrix $A$,

$$x = Ax + b, \tag{2}$$

$x = (x_1, \dots, x_n)^T$, $b = (b_1, \dots, b_n)^T \in \mathbb{R}^n$, and $A = \{A_{ij}\}_{i,j=1}^{n}$, where $T$ stands for the transpose operation, and $n$ is supposed to be large. For simplicity, we assume that the spectral radius of the matrix $A$ is less than unity, so that the solution of (2) can be calculated by the simple iteration method

$$x^{(m+1)} = A x^{(m)} + b; \quad x^{(0)} = b; \quad m = 0, 1, 2, \dots. \tag{3}$$

Generalizations to other iteration methods are presented in our paper [34].

Sampling of Columns without Replacement. Let $G$ be an unbiased estimator for the matrix $A$, that is, a random matrix such that $\mathbb{E}\,G = A$, and let $G^{(0)}, G^{(1)}, \dots, G^{(M-1)}$ be a sequence of independent samples of the random estimator $G$. The iterative procedure is defined by

$$\xi^{(m+1)} = G^{(m)} \xi^{(m)} + b, \quad m = 0, 1, \dots, M - 1, \tag{4}$$

where $\xi^{(0)} = b$. Since $G^{(m)}$, $m = 0, 1, \dots$ are all independent of each other, we get from (4) that $\mathbb{E}\,\xi^{(M)} = x^{(M)}$.

Let us consider the particular case when $G$ is chosen as a sparse matrix. We construct the matrix $G$ columnwise: fix an arbitrary integer $l$ which is much less than $n$, and choose a random set $J$ of $l$ integers uniformly from $1$ to $n$ without replacement; that is, we choose $j_1$ uniformly among $1, 2, \dots, n$, then $j_2$ uniformly among the remaining $n - 1$ integers, etc., the last being $j_l$, and define the entries of $G$ by

$$G_{ik} = \begin{cases} \dfrac{n}{l}\,A_{ik} & \text{for } k \in J, \\ 0 & \text{otherwise,} \end{cases}$$

for $i = 1, 2, \dots, n$. Thus, the random matrix $G$ has exactly $l$ nonzero columns, which are rescaled columns of the matrix $A$, and since $\mathbb{P}\{k \in J\} = l/n$, for any $i, k$ we have $\mathbb{E}\,G_{ik} = \frac{n}{l} A_{ik}\,\mathbb{P}\{k \in J\} = A_{ik}$. Note that for the calculation of the components of the vector $\xi^{(m+1)}$ we need only $l$ components of the vector $\xi^{(m)}$, and in order to calculate them we need only $l$ components of $\xi^{(m-1)}$, and so on. Consequently, we need $l^2$ operations in every step. For the approximation of $x^{(M)}$ we need $N M l^2$ operations, where $N$ is the number of samples needed and $M$ is the number of terms kept in the truncated Neumann series.

Nonuniform Sampling of Columns with Replacement. Let us present a different version of the sparsification algorithm, where the random choice of columns is not uniform, and is carried out as a sampling with replacement. In addition, for generality, we describe the evaluation of $AB$ where $B$ is a vector or a matrix. The product of two matrices $A$ and $B$ can be represented as $AB = \sum_{\tau=1}^{n} A^{(\tau)} B_{(\tau)}$, where $A^{(\tau)}$ denotes the $\tau$-th column of $A$ and $B_{(\tau)}$ the $\tau$-th row of $B$; this leads to a randomized calculation of $AB$. Let us choose a probability distribution $p_1, p_2, \dots, p_n$ for sampling from the indices $1, 2, \dots, n$. The randomized evaluation of the product $AB$ is formulated as follows:

1. For $\tau = 1$ to $l$, sample independently a random index $i_\tau \in \{1, \dots, n\}$ according to the probability distribution $\mathrm{Prob}(i_\tau = k) = p_k$, $k = 1, \dots, n$; the $\tau$-th column of $S$ is chosen as $A^{(i_\tau)}/\sqrt{l\,p_{i_\tau}}$, and the corresponding row of the matrix $R$ is taken as $B_{(i_\tau)}/\sqrt{l\,p_{i_\tau}}$.
2. The unbiased estimator for $AB$ is the matrix $SR$.
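The uniform column sparsification and iteration (4) can be sketched in a few lines. The example below is a hedged illustration: the $4 \times 4$ matrix is made-up data with spectral radius below one, the function names are hypothetical, and dense arrays are used for clarity even though a practical implementation would touch only the $l$ selected columns.

```python
import itertools
import numpy as np


def sparsified(A, J):
    """Estimator G: keeps the columns of A indexed by J, scaled by n/l."""
    n = A.shape[1]
    cols = list(J)
    G = np.zeros_like(A, dtype=float)
    G[:, cols] = (n / len(cols)) * A[:, cols]
    return G


def randomized_iteration(A, b, l, M, rng):
    """One realization xi^(M) of iteration (4), started from xi^(0) = b."""
    n = A.shape[1]
    xi = b.astype(float)
    for _ in range(M):
        J = rng.choice(n, size=l, replace=False)   # uniform, no replacement
        xi = sparsified(A, J) @ xi + b
    return xi


# Tiny illustrative system (hypothetical data, spectral radius < 1).
A = np.array([[0.10, 0.20, 0.00, 0.10],
              [0.00, 0.10, 0.30, 0.00],
              [0.20, 0.00, 0.10, 0.10],
              [0.10, 0.10, 0.00, 0.20]])
b = np.ones(4)

# Averaging G over all index sets J of size l = 2 reproduces A exactly.
all_J = list(itertools.combinations(range(4), 2))
G_mean = sum(sparsified(A, J) for J in all_J) / len(all_J)

# Deterministic iterate x^(5) of (3) and the sample mean of xi^(5).
x_M = b.copy()
for _ in range(5):
    x_M = A @ x_M + b
rng = np.random.default_rng(1)
samples = np.array([randomized_iteration(A, b, 2, 5, rng) for _ in range(2000)])
xi_mean = samples.mean(axis=0)
```

Here `G_mean` verifies $\mathbb{E}\,G = A$ by exhaustive enumeration of the index sets, while `xi_mean` is only a Monte Carlo estimate of `x_M`, with a statistical error that decays like the inverse square root of the number of samples.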
The estimator $SR$ is obviously unbiased: $\mathbb{E}\,(SR)_{ij} = (AB)_{ij}$, $i, j = 1, \dots, n$. The criterion for the best choice of the distribution $\{p_k\}$ can of course vary. It is convenient to use the mean error in the Frobenius norm, so we have to minimize the quantity $\mathbb{E}\,\|AB - SR\|_F^2$. It can be shown (see [34]) that the choice

$$p_k = \frac{|A^{(k)}|\,|B_{(k)}|}{\sum_{k'=1}^{n} |A^{(k')}|\,|B_{(k')}|} \tag{5}$$

minimizes the variance of the error, which in this case takes the form

$$\mathbb{E}\,\|AB - SR\|_F^2 = \frac{1}{l}\Big(\sum_{k=1}^{n} |A^{(k)}|\,|B_{(k)}|\Big)^2 - \frac{1}{l}\,\|AB\|_F^2. \tag{6}$$
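Formulas (5) and (6) are easy to check on a small example. The following sketch (hypothetical helper names, random test matrices chosen only for illustration) builds the estimator $SR$ and, for $l = 1$, evaluates the expectations exactly by summing over the $n$ possible indices, so that no Monte Carlo error enters the comparison.

```python
import numpy as np


def optimal_probabilities(A, B):
    """Distribution (5): p_k proportional to |A^(k)| |B_(k)|."""
    w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    return w / w.sum()


def sampled_product(A, B, idx, p):
    """Estimator S R built from the sampled indices idx (length l)."""
    scale = 1.0 / np.sqrt(len(idx) * p[idx])
    S = A[:, idx] * scale               # scaled columns of A
    R = B[idx, :] * scale[:, None]      # scaled rows of B
    return S @ R


rng = np.random.default_rng(2)
A = rng.standard_normal((5, 6))
B = rng.standard_normal((6, 4))
p = optimal_probabilities(A, B)

# For l = 1 the expectations can be evaluated exactly by summing over k.
terms = [sampled_product(A, B, np.array([k]), p) for k in range(6)]
mean_SR = sum(pk * T for pk, T in zip(p, terms))
mean_err = sum(pk * np.linalg.norm(A @ B - T, "fro") ** 2
               for pk, T in zip(p, terms))
w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
predicted = w.sum() ** 2 - np.linalg.norm(A @ B, "fro") ** 2   # (6) with l = 1

# A random draw with l = 3 indices sampled with replacement.
idx = rng.choice(6, size=3, p=p)
approx = sampled_product(A, B, idx, p)
```

`mean_SR` reproduces $AB$ (unbiasedness) and `mean_err` agrees with `predicted`, the right-hand side of (6) for $l = 1$; for general $l$ the error is smaller by the factor $1/l$.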
In conclusion we summarize that in the Sparsiﬁed Algorithm we have the following input parameters: n, the size of the matrix A, m, the number of iterations, and l, the size of the sampled submatrices which characterizes how sparse the random matrices in the randomization algorithm are.
3 SVD and Randomized Versions

3.1 SVD Background
Let $A$ be a rectangular $m \times n$ matrix with $m$ rows and $n$ columns, having rank $r$. From the fundamental theorem of linear algebra we know (e.g., see [38]) that the matrix can be represented as a sum of $r$ matrices of rank 1:

$$A = \sum_{i=1}^{r} \sigma_i\, u^{(i)} v^{(i)T}, \tag{7}$$

where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r$ are the singular values, and $u^{(i)} \in \mathbb{R}^m$, $v^{(i)} \in \mathbb{R}^n$, $i = 1, \dots, r$ are its left and right singular column vectors, respectively. The families $\{u^{(i)}\}$, $\{v^{(i)}\}$ are orthonormal sets of vectors: $u^{(i)T} u^{(j)} = \delta_{ij}$, and the same for $\{v^{(i)}\}$. In matrix form, the SVD representation (7) reads

$$A = U \Sigma V^T, \tag{8}$$

where the columns of $U$ and $V$ are the left and right singular vectors of $A$, respectively, and $\Sigma$ is a diagonal matrix: $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$. Recall that $U^T U = I_{r \times r}$ and $V^T V = I_{r \times r}$. The Frobenius norm $\|A\|_F$ and the spectral norm $\|A\|_2$ are defined by

$$\|A\|_F = \Big(\sum_{i,j} a_{ij}^2\Big)^{1/2}, \qquad \|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_1. \tag{9}$$

The following fundamental result is well known from linear algebra as the Eckart-Young theorem [14]. If we are interested in the best approximation (in the norms $\|\cdot\|_F$ and $\|\cdot\|_2$) of $A$ among all matrices $D$ of rank $k$, then the solution is $A_k = \sum_{i=1}^{k} \sigma_i\, u^{(i)} (v^{(i)})^T$, i.e., for all matrices $D$ of rank $k$, $\|A - A_k\|_2 \le \|A - D\|_2$ and $\|A - A_k\|_F \le \|A - D\|_F$. The matrix $A_k$ admits the representation

$$A_k = U_k \Sigma_k V_k^T = A V_k V_k^T = U_k U_k^T A,$$

where $U_k$, $V_k$ are the submatrices of $U$ and $V$ which contain only the top $k$ left and right singular vectors, respectively. A matrix $A$ has a good rank $k$ approximation if $\|A - A_k\|$ is small in the Frobenius and 2-norms. To estimate the errors, one may use the well known equalities

$$\|A - A_k\|_F = \Big(\sum_{i=k+1}^{r} \sigma_i^2(A)\Big)^{1/2}, \qquad \|A - A_k\|_2 = \sigma_{k+1}(A).$$
3.2 Randomized SVD Algorithm
Let us assume that the matrix $A$ is large, and that we want to construct a randomized approximation of the first $k$ singular values and the corresponding right singular vectors. The idea behind many versions of randomized algorithms for SVD is to sample randomly $s$ rows of $A$ to form a matrix $S$, then to form the $s \times s$ matrix $S S^T$ and use its spectral decomposition to compute approximate right singular vectors. Let us give the following version presented in [10]. Let us choose a discrete probability distribution $p_1, \dots, p_m$, $\sum_{i=1}^{m} p_i = 1$, for sampling from the rows $A_{(1)}, \dots, A_{(m)}$ of $A$.

Randomized SVD Algorithm

0. Fix an integer $s$, $k \le s \le m$, typically much larger than $k$; the error estimates below specify how large $s$ must be for a given error measure $\varepsilon$.
1. For $j = 1$ to $s$: sample a random row index $i_j \in \{1, \dots, m\}$ according to the probability distribution $\{p_i\}_{i=1}^{m}$, and include $A_{(i_j)}/\sqrt{s\,p_{i_j}}$ as the $j$-th row of $S$.
2. Compute $S S^T$ and its SVD: $S S^T = \sum_{j=1}^{s} \lambda_j^2\, w^{(j)} w^{(j)T}$.
3. Compute $h^{(j)} = S^T w^{(j)}/|S^T w^{(j)}|$ for $j = 1, \dots, k$. Construct $H_k$ as the matrix whose columns are the $h^{(j)}$; the values $\lambda_1, \dots, \lambda_k$ are our approximations to the first $k$ singular values of $A$. The resulting approximation to $A$, of rank at most $k$, is $A H_k H_k^T$.
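The steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of [10]: rows are sampled with probabilities $p_i = |A_{(i)}|^2/\|A\|_F^2$, the function name is hypothetical, and the test matrix is constructed to have exact rank 3 so that the rank-3 approximation should reproduce it up to round-off.

```python
import numpy as np


def randomized_svd_rows(A, k, s, rng):
    """Row-sampling sketch; returns (approximate singular values, H_k)."""
    row_sq = np.sum(A * A, axis=1)
    p = row_sq / row_sq.sum()                    # p_i = |A_(i)|^2 / ||A||_F^2
    idx = rng.choice(A.shape[0], size=s, p=p)    # step 1: sample with replacement
    S = A[idx, :] / np.sqrt(s * p[idx])[:, None]
    lam2, W = np.linalg.eigh(S @ S.T)            # step 2: s x s eigenproblem
    top = np.argsort(lam2)[::-1][:k]             # indices of the k largest
    H = S.T @ W[:, top]                          # step 3: S^T w^(j) ...
    H /= np.linalg.norm(H, axis=0)               # ... normalized columnwise
    return np.sqrt(np.maximum(lam2[top], 0.0)), H


rng = np.random.default_rng(3)
# Hypothetical test matrix of exact rank 3, so that A_k = A for k = 3.
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 60))
sigma, H = randomized_svd_rows(A, k=3, s=20, rng=rng)
rel_err = np.linalg.norm(A - A @ H @ H.T) / np.linalg.norm(A)
```

Only the $s \times s$ matrix $S S^T$ is decomposed, which is the point of the method when $s \ll m$. The columns of `H` are orthonormal because the $w^{(j)}$ are eigenvectors of $S S^T$, and `sigma` contains the values $\lambda_1, \dots, \lambda_k$ of step 3.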
Note that we could instead sample columns of $A$ and compute approximations of the left singular vectors; the approximation would then be the matrix $R R^T A$, where $R$ is an $m \times k$ matrix containing approximations to the top $k$ left singular vectors.

Let us give the error estimates presented in [12]. Assume that we construct a rank $k$ approximation $A H_k H_k^T$ to our matrix $A$ by the above algorithm, where the sampling of $s$ random rows is carried out according to a probability distribution $\{p_i\}_{i=1}^{m}$ satisfying the condition $p_i \ge \beta |A_{(i)}|^2/\|A\|_F^2$ for some positive $\beta \le 1$, and let $\varepsilon > 0$. If $s \ge 4k/(\beta\varepsilon^2)$, then the following estimate of the mean holds:

$$\mathbb{E}\,\|A - A H_k H_k^T\|_F^2 \le \|A - A_k\|_F^2 + \varepsilon \|A\|_F^2. \tag{10}$$

Error estimation in probability is also possible. Let $\eta = 1 + \sqrt{8 \log(1/\delta)/\beta}$. If $s \ge 4k\eta^2/(\beta\varepsilon^2)$, then with probability at least $1 - \delta$

$$\|A - A H_k H_k^T\|_F^2 \le \|A - A_k\|_F^2 + \varepsilon \|A\|_F^2. \tag{11}$$
The same estimates also hold in the spectral norm if the factor $k$ is omitted in the conditions $s \ge 4k/(\beta\varepsilon^2)$ and $s \ge 4k\eta^2/(\beta\varepsilon^2)$. From the description of the above algorithm it is clear that steps 1 and 2 are crucial for the efficiency of the method. In step 1, we could of course use uniform sampling, which requires only one call to the RAND generator per sample, independent of the dimension $n$. However, this would work well only if the "weights" of the rows, $|A_{(i)}|$, are more or less equal for all $i = 1, \dots, n$. Generally, according to the estimates (10), (11), it is reasonable to sample the rows according to the probability distribution $p_i = |A_{(i)}|^2/\|A\|_F^2$, i.e., with $\beta = 1$. In [8], the authors suggest using the conventional sampling algorithm, which needs about $n \log n$ operations. But we can use Walker's algorithm [43] (see the Fortran code in our recent paper [34]), which even in the general case needs only one call to the RAND generator per sample, independent of the dimension of the matrix. Outside the loop, we need only prepare two additional arrays of dimension $n$, which are calculated in $O(n)$ operations. This method works, of course, only if the rows are sampled with replacement, which is always the case here since we deal with matrices of large dimension. Thus this sampling algorithm is practically as efficient as uniform sampling of rows.
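The following sketch shows how such a sampler can be organized. It is not the Fortran code of [34]; it uses a standard linear-time table construction for the alias method (the variant commonly attributed to Vose), which is an assumption on my part, and all names are hypothetical. The two arrays `q` and `alias` are the precomputed tables of dimension $n$, and each draw uses a single uniform random number.

```python
import random


def build_alias_table(p):
    """O(n) preprocessing: cut-off values q and alias indices."""
    n = len(p)
    q = [pi * n for pi in p]
    alias = list(range(n))
    small = [i for i, qi in enumerate(q) if qi < 1.0]
    large = [i for i, qi in enumerate(q) if qi >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        alias[s] = g                  # the rest of cell s is given to g
        q[g] -= 1.0 - q[s]
        (small if q[g] < 1.0 else large).append(g)
    for i in small + large:           # leftovers equal 1 up to rounding
        q[i] = 1.0
    return q, alias


def sample_index(q, alias, rng=random):
    """Draw one index using a single uniform random number."""
    u = rng.random() * len(q)
    i = min(int(u), len(q) - 1)       # cell index; guard against rounding
    return i if u - i < q[i] else alias[i]


def implied_probabilities(q, alias):
    """Distribution actually encoded by the table (for checking)."""
    n = len(q)
    out = [qi / n for qi in q]
    for i, qi in enumerate(q):
        out[alias[i]] += (1.0 - qi) / n
    return out


p = [0.5, 0.2, 0.2, 0.1]              # example weights (assumption)
q, alias = build_alias_table(p)
draws = [sample_index(q, alias) for _ in range(10)]
```

The helper `implied_probabilities` reconstructs the distribution encoded by the table, which gives an exact check that the preprocessing is correct without relying on sample statistics.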
4 Simulation of Random Fields Based on the Karhunen-Loève Expansion
Let us now consider a real-valued inhomogeneous random field $u(x)$, $x \in G$, defined on a probability space $(\Omega, \mathcal{A}, P)$ and indexed on a bounded domain $G$. Assume (without loss of generality) that the field has zero mean and a variance $\mathbb{E}\,u^2(x)$ that is bounded for all $x \in G$. The Karhunen-Loève expansion has the form

$$u(x) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\,\xi_k\, h_k(x), \tag{12}$$

where $\lambda_k$ and $h_k(x)$ are the eigenvalues and eigenfunctions of the covariance function $B(x_1, x_2) = \langle u(x_1)\,u(x_2)\rangle$, and $\xi_k$ is a family of random variables. Thus $\lambda_k$ and $h_k(x)$ are the solutions of the following eigenvalue problem for the correlation operator:

$$\int_G B(x_1, x_2)\, h_k(x_1)\,dx_1 = \lambda_k h_k(x_2). \tag{13}$$

The eigenfunctions form a complete orthonormal set, $\int_G h_i(x)\,h_j(x)\,dx = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. The family $\{\xi_k\}$ is a set of uncorrelated random variables which are related to $h_k$ by

$$\xi_k = \frac{1}{\sqrt{\lambda_k}} \int_G u(x)\, h_k(x)\,dx, \qquad \mathbb{E}\,\xi_k = 0, \qquad \mathbb{E}\,\xi_i \xi_j = \delta_{ij}. \tag{14}$$

It is well known that the Karhunen-Loève expansion is optimal (in the mean square sense) for any distribution of $u(x)$. If $u(x)$ is a zero mean Gaussian random field, then $\{\xi_k\}$ is a family of standard Gaussian random variables. Some generalizations to non-Gaussian random fields are reported in [27].

4.1 Discrete Approximation of the Karhunen-Loève Expansion
An exact solution of the eigenvalue problem (13) can be obtained only in some simple cases; generally, one has to solve it numerically, using quadrature-based methods, e.g., the Nyström method [3]. Assume for simplicity that the random process $u(x)$ is defined on a bounded interval $G = [a, b]$, that $x_i$, $i = 1, \dots, n$ are points of a subdivision of this interval, and that we are seeking a discrete approximation $v \approx u(x)$, where the component $v_j$ of the vector $v$ approximates the value $u(x_j)$, $j = 1, \dots, n$. Then the $n \times n$ covariance matrix $B_v = \langle v v^T\rangle$ of the vector $v$ should approximate the given correlation function in the sense that $(B_v)_{ij} \approx B(x_i, x_j)$. This implies that the continuous eigenvalue problem (13) is approximated by the eigenvalue problem for the correlation matrix $B_v$:

$$B_v g_k = \lambda_k g_k, \tag{15}$$

where $\lambda_k$ are the eigenvalues and $g_k$ the corresponding eigenvectors. Since $B_v$ is symmetric and positive semidefinite, all eigenvalues $\lambda_1, \dots, \lambda_n$ are nonnegative, and the spectral representation of the matrix $B_v$ reads

$$B_v = \sum_{k=1}^{n} \lambda_k\, g_k g_k^T.$$

This leads us to the discrete KL expansion of the random vector $v$:

$$v = \sum_{k=1}^{n} \sqrt{\lambda_k}\,\xi_k\, g_k,$$

where $\{\xi_k\}_{k=1,\dots,n}$ is a sequence of independent standard Gaussian random variables. So what remains is to solve the eigenvalue problem (15). If the dimension of $B_v$ is not large, one may use standard numerical methods, e.g., the Lanczos algorithm. However, to approximate random fields with high accuracy, one needs a sufficiently fine subdivision, so the matrix $B_v$ can be very large. Then we can use the randomized low rank approximation method described in Section 3.2. It should be noted that the method can be very efficient if the matrix $B_v$ admits a good low rank approximation, which is true in many practical cases when the correlation is not too long-ranged.
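The discrete expansion can be illustrated by the short sketch below. It uses a dense symmetric eigensolver (`numpy.linalg.eigh`) rather than the randomized method, and the exponential correlation function, the grid and the truncation level are assumptions chosen only for the example; the function name is hypothetical.

```python
import numpy as np


def kl_sample(B, k, rng):
    """Sample v = sum_{i<=k} sqrt(lambda_i) xi_i g_i for covariance matrix B."""
    lam, G = np.linalg.eigh(B)               # eigenvalues in ascending order
    lam, G = lam[::-1][:k], G[:, ::-1][:, :k]
    lam = np.maximum(lam, 0.0)               # clip round-off negatives
    xi = rng.standard_normal(k)              # independent N(0, 1) variables
    return G @ (np.sqrt(lam) * xi), lam, G


# Hypothetical example: exponential correlation on a grid in [0, 1].
x = np.linspace(0.0, 1.0, 200)
B = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
rng = np.random.default_rng(4)

v, lam, G = kl_sample(B, k=200, rng=rng)     # full expansion
B_rebuilt = (G * lam) @ G.T                  # spectral representation of B_v
v20, lam20, _ = kl_sample(B, k=20, rng=rng)  # truncated expansion
captured = lam20.sum() / np.trace(B)         # share of variance in 20 modes
```

`B_rebuilt` checks the spectral representation of $B_v$, and `captured` measures how much of the total variance $\mathrm{tr}\,B_v$ is carried by the leading modes, which is the quantity that decides whether a low rank approximation is adequate.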
Lorentzian Random Field. In [34], we presented simulation results obtained by the randomized SVD-based algorithm described above. Consider the following random boundary value problem from [30]: in the upper half-plane $G = \{(x, y) : y \ge 0\}$, find a solution to the Laplace equation $\Delta u(x, y) = 0$ with the boundary condition $u|_{y=0} = g(x)$, where $g(x)$ is a Gaussian zero-mean white noise. The solution can be constructed explicitly: $u(x, y)$ is a partially homogeneous (i.e., homogeneous with respect to the longitudinal coordinate $x$) Gaussian random field which is uniquely defined by its correlations at two points $(x_1, y_1)$, $(x_2, y_2)$, and the correlation function has the following Lorentzian form:

$$ B(x_1, y_1; x_2, y_2) = \langle u(x_1, y_1)\, u(x_2, y_2) \rangle = \frac{1}{\pi}\, \frac{y_1 + y_2}{(y_1 + y_2)^2 + (x_1 - x_2)^2} . \qquad (16) $$
Thus the random field $u(x, y)$ is inhomogeneous in the transverse direction. In [30], we found an explicit KL expansion of this solution, which was used to validate our randomized SVD-based algorithm. The solution $u(x, y)$ was simulated on a rectangular domain $G$ with a grid of 500 × 500 nodes, and the rank $k = 20$ approximation was already enough to calculate the solution with 1% accuracy. The number of randomly sampled rows in the randomized SVD algorithm was $s = 200$. The rank $k = 20$ was sufficient because the correlations decrease relatively rapidly. In the next example we deal with a long-range correlation function of the fractional Wiener process.

Fractional Wiener Process. Let us consider the fractional Wiener process $W^H(t)$ of index $H$, $H \in (0, 1)$ (the Hurst parameter), which is defined as a centered Gaussian inhomogeneous random process on [0, 1] with the following correlation function:

$$ B_H(s, t) = E[W^H(s) W^H(t)] = \frac{1}{2} \left( |s|^{2H} + |t|^{2H} - |t - s|^{2H} \right) . $$

Simulation results for the fractional Wiener process on the interval [0, 2.5] with the Hurst parameter $H = 0.3$ are presented in [35]; the randomized SVD algorithm with a rank $k = 80$ approximation was constructed by sampling 160 random rows of the 240 × 240 correlation matrix.
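As a rough illustration of how well the fractional Wiener correlation matrix is captured by a rank-80 approximation, the sketch below builds a 240 × 240 matrix for $H = 0.3$ on [0, 2.5] and compresses it. Note the assumptions: the grid is taken to be uniform and to exclude $t = 0$, and the low rank approximation uses a Gaussian random projection with a few power iterations, which is a swapped-in standard technique and not the row-sampling algorithm of Section 2.2. The function names are hypothetical.

```python
import numpy as np


def fbm_cov(t, H):
    """Correlation matrix B_H(s, t) = 0.5 (|s|^2H + |t|^2H - |t - s|^2H) on the grid t."""
    s, u = t[:, None], t[None, :]
    return 0.5 * (np.abs(s) ** (2 * H) + np.abs(u) ** (2 * H) - np.abs(s - u) ** (2 * H))


def randomized_low_rank(B, k, oversample=10, n_iter=2, seed=0):
    """Rank-k eigen-approximation of a symmetric matrix via random projection."""
    rng = np.random.default_rng(seed)
    Y = B @ rng.standard_normal((B.shape[0], k + oversample))   # sample the range of B
    for _ in range(n_iter):                                     # power iterations
        Q, _ = np.linalg.qr(Y)
        Y = B @ Q
    Q, _ = np.linalg.qr(Y)
    lam, W = np.linalg.eigh(Q.T @ B @ Q)      # small (k + oversample)-dimensional problem
    top = np.argsort(lam)[::-1][:k]           # keep the k largest eigenvalues
    return lam[top], Q @ W[:, top]


t = 2.5 * np.arange(1, 241) / 240             # assumed uniform grid on (0, 2.5]
B = fbm_cov(t, 0.3)
lam, G = randomized_low_rank(B, k=80)
B_k = (G * lam) @ G.T                         # sum_k lambda_k g_k g_k^T
rel_err = np.linalg.norm(B - B_k) / np.linalg.norm(B)   # relative Frobenius error
```

The pairs `(lam, G)` can then be used in the discrete KL expansion in place of the full eigendecomposition.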
5 Solution of Integral Equations

5.1 Singular Approximations
The low rank approximation can be used to transform the original integral equation into an equivalent integral equation with a new kernel whose properties are better in a certain sense. For instance, in [31], Sect. 2.2, we present a method based on singular approximations in which the norm of the new kernel of the transformed equation is less than 1. This can be achieved by the randomized SVD with very low rank approximations. Let us present the method for a system of linear algebraic equations; for details of the numerical simulations see [35]. Thus we consider a large system of linear equations $x = Ax + b$ with an $m \times m$ matrix $A$ and right-hand side vector $b = (b_1, \ldots, b_m)^T$, and it is assumed that $\|A\| \ge 1$, so that the Neumann series may diverge. We introduce a matrix

$$ B = A - \sum_{i=1}^{r} \alpha_i \beta_i^T \qquad (17) $$

where $\alpha_1, \ldots, \alpha_r$ and $\beta_1, \ldots, \beta_r$ are arbitrary column vectors, i.e., the matrix $B$ is obtained by subtracting from $A$ a sum of singular matrices of the form $\alpha_i \beta_i^T$. Suppose such matrices are found, and we are interested in the relation between the solution $x$ and the solutions of equations with the matrix $B$. Consider $r + 1$ auxiliary linear systems with the matrix $B$ for different right-hand sides:

$$ x_0 = B x_0 + b, \quad x_1 = B x_1 + \alpha_1, \quad \ldots, \quad x_r = B x_r + \alpha_r . \qquad (18) $$

Then

$$ x = x_0 + \sum_{i=1}^{r} J_i x_i \qquad (19) $$
where $J_1, \ldots, J_r$ are the components of the vector $J$ which satisfies the equation $J = TJ + t$, where $T$ is the matrix with entries $T_{ij} = \beta_i^T x_j$, $i, j = 1, \ldots, r$, and $t$ is the vector with components $t_i = \beta_i^T x_0$, $i = 1, \ldots, r$. A practical implementation of this method makes sense if, for a small value of $r$, we can find the expansion (17) with $q_B = \|B\| < 1$. Note that the randomized SVD algorithm suggests such a solution, and we can try, step by step, to increase the number of terms until the condition $q_B = \|B\| < 1$ is satisfied. For example, in the boundary integral equation formulation of the Laplace equation for a convex domain one may take $r = 1$ (e.g., see [17]). For nonconvex domains, $r$ can be chosen quite small, as our calculations presented in the next section show. This is true for quite general singular kernels of potential theory which appear in the relevant boundary integral equations, see, e.g., [25], [28], [29].

5.2 Inconsistent Systems, Linear Least Squares, and Ill-Posed Problems
The general formulation of a linear least squares problem is the following: we have a set of vectors which we wish to combine linearly to provide the best possible approximation to a given vector. If the set of vectors is $\{a_1, a_2, \ldots, a_n\}$ and the given vector is $b$, we seek coefficients $x_1, x_2, \ldots, x_n$ which produce a minimal error $\| b - \sum_{i=1}^{n} x_i a_i \|$. In matrix form, with the vectors $a_i$ as the columns of $A$, we have to choose the vector $x$ so as to minimize $\|Ax - b\|$. Let the SVD of $A$ be $U \Sigma V^T$ (where $U$ and $V$ are square orthogonal matrices, and $\Sigma$ is rectangular with the same dimensions as $A$). Then we have

$$ Ax - b = U \Sigma V^T x - b = U (\Sigma V^T x) - U (U^T b) = U (\Sigma y - c) \qquad (20) $$
where $y = V^T x$ and $c = U^T b$. Note that $U$ is an orthogonal matrix and so preserves lengths, i.e., $\|U(\Sigma y - c)\| = \|\Sigma y - c\|$, and hence $\|Ax - b\| = \|\Sigma y - c\|$. This suggests a method for solving the least squares problem. First, determine the SVD of $A$ and calculate $c$ as the product of $U^T$ and $b$. Then, solve the least squares problem for $\Sigma$ and $c$, i.e., find a vector $y$ such that $\|\Sigma y - c\|$ is minimal, which is trivial since $\Sigma$ is diagonal. Since $y = V^T x$, we recover $x$ as $Vy$. This gives the solution vector $x$ as well as the magnitude of the error, $\|\Sigma y - c\|$.
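The three steps just described translate almost literally into code. The following is a minimal sketch using the full SVD from NumPy; the function name `lstsq_svd`, the relative cutoff `rcond` for treating small singular values as zero, and the small test matrix are assumptions made for illustration.

```python
import numpy as np


def lstsq_svd(A, b, rcond=1e-12):
    """Minimize ||Ax - b|| via A = U Sigma V^T; returns (x, ||Sigma y - c||)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: m x m, Vt: n x n
    c = U.T @ b                                       # c = U^T b
    r = int(np.sum(s > rcond * s[0])) if s[0] > 0 else 0   # numerical rank
    y = np.zeros(A.shape[1])
    y[:r] = c[:r] / s[:r]          # diagonal problem: y_i = c_i / sigma_i
    x = Vt.T @ y                   # x = V y
    residual = np.linalg.norm(c[r:])   # components of c that Sigma y cannot match
    return x, residual


# Hypothetical overdetermined example: fit a straight line to three points.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x, residual = lstsq_svd(A, b)
```

Setting the components of $y$ that correspond to (numerically) zero singular values to zero selects the minimum-norm solution, which is one common convention for rank-deficient or ill-posed problems.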
6 Random Row Action Iteration Process
We describe here a randomized version of the projection methods belonging to the class of "row-action" methods, which work well both for systems with singular matrices and for overdetermined systems. These methods belong to a type known as Projection on Convex Sets methods. Here we present a method beyond the conventional Markov chain based Neumann–Ulam scheme. The main idea is to choose the row in the projection method at random so that, on average, the convergence is improved compared to the conventional periodic choice of the rows. We extend this randomized method to linear systems coupled with systems of linear inequalities. The row action iteration process, also known as the projection method and first suggested by Kaczmarz [22], can be proved to converge for any system of linear equations with nonzero rows, even when it is singular and inconsistent, and the arithmetic operations required in an iteration of the method are comparatively few. Let us consider a system of linear algebraic equations

$$ Ax = b \qquad (21) $$
where $A$ is a rectangular $m \times n$ matrix with $m \ge n$, and $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$. We further denote by $a_i$ the $i$-th row of $A$, and by $a_i^T$ the corresponding column vector, the transpose of $a_i$. Our stochastic iterative process is written as follows:

$$ x_{k+1} = x_k + \omega_k\, E\left[ \frac{b_{\nu(i)} - (a_{\nu(i)} \cdot x_k)}{\|a_{\nu(i)}\|^2}\, a_{\nu(i)}^T \right], \quad k = 1, 2, \ldots \qquad (22) $$
where $\omega_k$ are some parameters (possibly random), and the expectation $E$ is taken over the distribution of random indices $\nu(i)$ whose values are sampled at random among random subsets of indices lying in $(1, 2, \ldots, m)$. We show that the distribution can be chosen so that the method converges with an expected exponential rate that does not depend on the number of equations in the system. The solver does not even need to know the whole system, but only some random rows of the matrix; therefore, it is well suited for solving very large systems of linear algebraic equations. Moreover, this method can be used for solving systems of linear equations coupled with systems of linear inequalities. Remarkably, the structure of the algorithm remains practically the same. We note that an example of nonuniform sampling of the random rows in the row action process was suggested in [39]; it is quite costly, because it requires recalculation of the sampling probabilities in each iteration process. So assume we solve a coupled system of linear equations and inequalities

$$ a_i^T x \le b_i, \quad i \in I_\le , \qquad (23) $$
$$ a_i^T x = b_i, \quad i \in I_= . \qquad (24) $$

Let

$$ \gamma_k^{(i)} = \begin{cases} [(a_i \cdot x_k) - b_i]^+ & \text{if } i \in I_\le , \\ (a_i \cdot x_k) - b_i & \text{if } i \in I_= , \end{cases} $$

and write the iteration process in the form

$$ x_{k+1} = x_k - \frac{\gamma_k^{(\nu(i))}}{\|a_{\nu(i)}\|^2}\, a_{\nu(i)}^T , \quad k = 1, 2, \ldots \qquad (25) $$

It can be shown that this process is convergent, and

$$ E\, d^2(x_{k+1}, S) \le \left( 1 - \frac{1}{L^2 \|A\|_F^2} \right) d^2(x_k, S) . $$

Here $L$ is the Hoffman constant defined by $d(x, S_b) \le L\, \|e(Ax - b)\|$, where $S_b$ is the set of possible solutions of our system, $d(x, S_b)$ is the Euclidean distance from $x$ to the set $S_b$, and $e(y)$ defines the error in the relevant line of our system of equations and inequalities:

$$ e(y)_i = \begin{cases} y_i^+ & (i \in I_\le) , \\ y_i & (i \in I_=) . \end{cases} $$
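A minimal sketch of the iteration (25) for a coupled system of equations and inequalities is given below. The sampling distribution of the random row index is not specified in this section, so uniform sampling over all rows is used here as an assumption; the function name `random_row_action` and the synthetic test system are likewise hypothetical.

```python
import numpy as np


def random_row_action(A, b, is_eq, n_iter, rng):
    """Iterate x <- x - gamma / ||a_i||^2 * a_i with a randomly chosen row i."""
    m, n = A.shape
    row_norm2 = np.einsum("ij,ij->i", A, A)    # squared row norms ||a_i||^2
    x = np.zeros(n)
    for _ in range(n_iter):
        i = rng.integers(m)                    # assumption: uniform row sampling
        gamma = A[i] @ x - b[i]                # signed violation of row i
        if not is_eq[i]:
            gamma = max(gamma, 0.0)            # inequality rows: only the positive part
        x = x - (gamma / row_norm2[i]) * A[i]  # project onto the hyperplane / half-space
    return x


# Synthetic consistent system: 20 equalities and 10 inequalities in 5 unknowns.
rng = np.random.default_rng(1)
x_true = rng.standard_normal(5)
A_eq = rng.standard_normal((20, 5))
A_in = rng.standard_normal((10, 5))
A = np.vstack([A_eq, A_in])
b = np.concatenate([A_eq @ x_true, A_in @ x_true + 1.0])   # inequalities hold with slack 1
is_eq = np.array([True] * 20 + [False] * 10)

x = random_row_action(A, b, is_eq, 5000, rng)
err = np.linalg.norm(x - x_true)
```

Each step touches a single row of the matrix, which is why the method does not need the whole system in memory; the only change needed for inequality rows is taking the positive part of the violation.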
References

1. Achlioptas, D., McSherry, F.: Fast computation of low rank matrix approximations. In: Proceedings of the 33rd Annual Symposium on Theory of Computing (2001)
2. Ailon, N., Chazelle, B.: The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39(1), 302–322 (2009)
3. Belongie, S., Fowlkes, C., Chung, F., Malik, J.: Spectral Partitioning with Indefinite Kernels Using the Nyström Extension. In: Heyden, A., et al. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 531–542. Springer, Heidelberg (2002)
4. Beylkin, G., Mohlenkamp, M.J.: Algorithms for numerical analysis in high dimension. SIAM Journal on Scientific Computing 26(6), 2133–2159 (2005)
5. Dimov, I., Philippe, B., Karaivanova, A., Weihrauch, C.: Robustness and Applicability of Markov Chain Monte Carlo Algorithms for Eigenvalue Problem. Journal of Applied Mathematical Modelling 32, 1511–1529 (2008)
6. Dimov, I., Alexandrov, V., Papancheva, R., Weihrauch, C.: Monte Carlo Numerical Treatment of Large Linear Algebra Problems. In: Shi, Y., et al. (eds.) ICCS 2007. LNCS, vol. 4487, pp. 747–754. Springer, Heidelberg (2007)
7. Dimov, I.T.: Monte Carlo Methods for Applied Scientists, p. 291. World Scientific, Singapore (2008)
8. Drineas, P., Frieze, A., Kannan, R., Vempala, S., Vinay, V.: Clustering Large Graphs via the Singular Value Decomposition. Machine Learning 56(1–3), 9–33 (2004)
9. Drineas, P., Kannan, R.: Pass Efficient Algorithms for Approximating Large Matrices. In: Proceedings of the 14th Annual Symposium on Discrete Algorithms (Baltimore, MD), pp. 223–232 (2003)
10. Drineas, P., Drinea, E., Huggins, P.S.: An experimental evaluation of a Monte Carlo algorithm for singular value decomposition. In: Manolopoulos, Y., Evripidou, S., Kakas, A.C. (eds.) PCI 2001. LNCS, vol. 2563, pp. 279–296. Springer, Heidelberg (2003)
11. Drineas, P., Kannan, R.: Fast Monte Carlo algorithms for approximate matrix multiplication. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, p. 452 (2001)
12. Drineas, P., Kannan, R., Mahoney, M.W.: Fast Monte Carlo algorithms for matrices I: approximating matrix multiplication. SIAM J. Comput. 36(1), 132–157 (2006)
13. Eberly, W., Kaltofen, E.: On Randomized Lanczos Algorithms. In: Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, pp. 176–183 (1997)
14. Eckart, C., Young, G.: A principal axis transformation for non-Hermitian matrices. Bulletin of the American Mathematical Society 45, 118–121 (1939)
15. Ermakov, S.M., Mikhailov, G.A.: Statistical modeling. Nauka, Moscow (1982) (in Russian)
16. Ermakov, S.M.: Monte Carlo Method in Computational Mathematics. An Introductory course. BINOM publisher, St. Petersburg (2009) (in Russian)
17. Ermakov, S.M., Sipin, A.S.: A new Monte Carlo scheme for solving problems of mathematical physics. Soviet Dokl. 285(3) (1985) (in Russian)
18. Frieze, A., Kannan, R., Vempala, S.: Fast Monte Carlo algorithms for finding low-rank approximations. J. ACM 51(6), 1025–1041 (2004)
19. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
20. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Chapman and Hall, London (1964)
21. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz maps into a Hilbert space. Contemp. Math. 26, 189–206 (1984)
22. Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Acad. Polon. Sciences et Lettres, A, 355–357 (1937)
23. Kobayashi, M., Dupret, G., King, O., Samukawa, H.: Estimation of singular values of very large matrices using random sampling. Computers and Mathematics with Applications 42, 1331–1352 (2001)
24. Lanczos, C.: An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards 45(4), 255–282 (1950)
25. Liberty, E., Woolfe, F., Martinsson, P.G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. Yale Dept. of Computer Science Technical Report 1388
26. Muller, N., Magaia, L., Herbst, B.M.: Singular Value Decomposition, Eigenfaces, and 3D Reconstructions. SIAM Review 46(3), 518–545 (2004)
27. Phoon, K.K., Huang, H.W., Quek, S.T.: Simulation of strongly non-Gaussian processes using Karhunen-Loeve expansion. Probabilistic Engineering Mechanics 20, 188–198 (2005)
28. Rokhlin, V.: Rapid solution of integral equations of classical potential theory. J. Comp. Phys. 60, 187–207 (1985)
29. Rokhlin, V., Szlam, A., Tygert, M.: A randomized algorithm for principal component analysis. SIAM J. Matrix Anal. Appl., arxiv.org (2009)
30. Expansion of random boundary excitations for some elliptic PDEs. Monte Carlo Methods and Applications 13(5–6), 403–451 (2007)
31. Sabelfeld, K.K.: Monte Carlo Methods in Boundary Value Problems. Springer, Heidelberg (1991)
32. Sabelfeld, K.K., Simonov, N.A.: Random Walks on Boundary for Solving PDEs. VSP, The Netherlands, Utrecht (1994)
33. Sabelfeld, K., Loshina, N.: Fast stochastic iterative projection methods for very large linear systems. In: Seventh IMACS Seminar on Monte Carlo Methods (MCM 2009), Brussels, September 6–11 (2009)
34. Sabelfeld, K., Mozartova, N.: Sparsified Randomization Algorithms for large systems of linear equations and a new version of the Random Walk on Boundary method. Monte Carlo Methods and Applications 15(3), 257–284 (2009)
35. Sabelfeld, K., Mozartova, N.: Sparsified Randomization Algorithms for low rank approximations and applications to integral equations and inhomogeneous random field simulation. Mathematics and Computers in Simulation (2010) (submitted)
36. Sabelfeld, K., Shalimova, I., Levykin, A.: Random Walk on Fixed Spheres for Laplace and Lamé equations. Monte Carlo Methods and Applications 12(1), 55–93 (2006)
37. Sobol, I.M.: Numerical Monte Carlo Methods. Nauka, Moscow (1973) (in Russian)
38. Strang, G.: The fundamental Theorem of linear algebra. The American Mathematical Monthly 100(9), 848–855 (1993)
39. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications 15, 262–278 (2009)
40. Stewart, G.W.: On the Early History of the Singular Value Decomposition. SIAM Review 35(4) (1993)
41. Vempala, S.S.: The Random projection method. AMS (2004)
42. Vorobiev, Ju.V.: Stochastic iteration process. J. Comp. Math. and Math. Physics 4(6), 5(5), 1088–1092, 787–795 (1964) (in Russian)
43. Walker, A.J.: New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters 10, 127–128 (1974)
44. Woolfe, F., Liberty, E., Rokhlin, V., Tygert, M.: A fast randomized algorithm for the approximation of matrices. Applied and Computational Harmonic Analysis 25, 335–366 (2008)
SM Stability for Time-Dependent Problems

Petr N. Vabishchevich
Keldysh Institute of Applied Mathematics, RAS, 4 Miusskaya Square, 125047 Moscow, Russia

Abstract. Various classes of stable finite difference schemes can be constructed to obtain a numerical solution. It is important to select, among all stable schemes, a scheme that is optimal in terms of certain additional criteria. In this study, we use a simple boundary value problem for a one-dimensional parabolic equation to discuss the selection of an approximation with respect to time. We consider the pure diffusion equation, the pure convective transport equation, and combined convection-diffusion phenomena. Requirements for unconditionally stable finite difference schemes are formulated that are related to retaining the main features of the differential problem. The concept of an SM stable finite difference scheme is introduced. The starting point is the difference schemes constructed on the basis of various Padé approximations.
1 Introduction
When time-dependent problems of mathematical physics are solved numerically, much emphasis is placed on computational algorithms of higher orders of accuracy (e.g., see [1, 2]). Along with improving the approximation accuracy with respect to space, improving the approximation accuracy with respect to time is also of interest. In this respect, the results concerning numerical methods for ordinary differential equations (ODEs) [3, 4] provide an example. Taking into account the specific features of time-dependent problems for PDEs, we are interested in numerical methods for solving the Cauchy problem in the case of stiff equations [5–7].

When time-dependent problems are solved approximately, the accuracy can be improved in various ways. In the case of two-level schemes (the solution at two adjacent time levels is involved), polynomial approximations of the scheme operators on the solutions are used explicitly or implicitly. The most popular representatives of such schemes are Runge-Kutta methods [7, 8], which are widely used in modern computations. The main feature of multilevel schemes (multistep methods) manifests itself in the approximation of time derivatives with a higher accuracy on a multipoint stencil. A characteristic example is provided by multistep methods based on backward numerical differentiation [9].

Various classes of stable finite difference schemes can be constructed to obtain a numerical solution [10, 11]. It is important to select, among all stable schemes, a scheme that is optimal in terms of certain additional criteria. In the theory of finite difference schemes, there is the class of asymptotically stable schemes (see [12, 13]) that ensure the correct long-time behavior of the approximate solution. In the theory of numerical methods for ODEs (see [7, 9]), the concept of L-stability is used, which reflects the long-time asymptotic behavior of the approximate solution from a different point of view.

In [14] the properties of two-level difference schemes of high order approximation for the approximate solution of the Cauchy problem for evolutionary equations with self-adjoint operators are considered. The simplest boundary value problem for the one-dimensional parabolic equation serves as a basic problem. The concept of SM stability (Spectral Mimetic stability) of a difference scheme is introduced. This property is connected with the behavior of individual harmonics of the approximate solutions.

In this paper, we continue to study the SM properties of difference schemes for the approximate solution of unsteady problems of mathematical physics. For the model boundary value problem for the one-dimensional parabolic equation, the spectral characteristics of the approximations in space and in time are considered. In particular, good approximation properties (third order approximation in space) are observed for the convection operator. Two-level schemes of higher order of approximation in time, based on the Padé approximation, are considered for solving problems of mathematical physics with symmetric and skew-symmetric operators.

I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 29–40, 2011. © Springer-Verlag Berlin Heidelberg 2011
2 Problem Formulation
We consider a finite-dimensional real Hilbert space $H$, where the scalar product and the norm are $(\cdot, \cdot)$ and $\|\cdot\|$, respectively. Let $u(t)$ ($0 \le t \le T$, $T > 0$) be defined as the solution of the Cauchy problem for the first-order evolutionary equation:

$$ \frac{du}{dt} + \Lambda u = f(t), \quad 0 < t \le T, \qquad (1) $$
$$ u(0) = u_0 . \qquad (2) $$

The right-hand side $f(t) \in H$ of equation (1) is given, and $\Lambda$, which may depend on $t$ ($\Lambda = \Lambda(t) \ge 0$), is a linear, nonnegative and, in general, non-self-adjoint operator from $H$ to $H$. For problem (1), (2) the estimate of stability is easily established. Taking into account the skew-symmetric property of the operator $\Lambda$, we have the equality

$$ \|u\| \frac{d\|u\|}{dt} = (f, u) . $$

By using $(f, u) \le \|u\| \|f\|$ we obtain a simple estimate of stability for the solution of (1), (2) with respect to the initial data and the right-hand side:

$$ \|u(t)\| \le \|u_0\| + \int_0^t \|f(\theta)\| \, d\theta . \qquad (3) $$
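For completeness, the step from the energy identity to the estimate (3) can be written out as follows. This is only a sketch: it assumes that $\|u(t)\|$ is differentiable and nonzero, so that one may divide by it; the case $\|u(t)\| = 0$ requires the usual separate argument.

```latex
\begin{aligned}
\Bigl(\frac{du}{dt}, u\Bigr) + (\Lambda u, u) &= (f, u),
  \qquad
  \Bigl(\frac{du}{dt}, u\Bigr) = \frac{1}{2}\frac{d}{dt}\|u\|^2
  = \|u\| \frac{d\|u\|}{dt}, \\
(\Lambda u, u) = 0 \;\Longrightarrow\;
  \|u\| \frac{d\|u\|}{dt} &= (f, u) \le \|u\|\,\|f\|
  \;\Longrightarrow\;
  \frac{d\|u\|}{dt} \le \|f(t)\|, \\
\|u(t)\| &\le \|u_0\| + \int_0^t \|f(\theta)\|\, d\theta .
\end{aligned}
```

If $\Lambda$ is merely nonnegative, $(\Lambda u, u) \ge 0$, the equality in the second line becomes the inequality $\|u\|\, d\|u\|/dt \le (f, u)$, and the same estimate follows.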
We would like to preserve these properties of the differential problem after the transition to a discrete analogue of problem (1), (2). The main attention in our discussion is given to unsteady boundary value problems for partial differential equations. In this context, we can associate the Cauchy problem (1), (2) with the application of the method of lines (approximation in space). Bearing in mind its importance for applications, we focus on the example of a boundary value problem for the one-dimensional parabolic equation of second order. Let a sufficiently smooth function $u(x, t)$ satisfy the equation $\frac{\partial u}{\partial t} + Lu = 0,$
0 2; If $9 \mid M$, then either $9 \mid a_2$ or $a_1 \equiv 1 \pmod 9$ and $a_2 a_0 \equiv 6 \pmod 9$; If $4 \mid M$, then $2 \mid a_2$ and $a_2 \equiv a_1 - 1 \pmod 4$; If $2 \mid M$, then $a_2 \equiv a_1 - 1 \pmod 2$.

Some authors have studied the pseudorandomness of $x_i$, $i = 0, 1, \ldots, M - 1$, by means of the discrepancy $D_M$ of the two-dimensional net

$$ (x_i, x_{i+1}) = \left( \frac{y_i}{M}, \frac{y_{i+1}}{M} \right), \quad i = 0, 1, \ldots, M - 1 . $$

J. Eichenauer-Herrmann and H. Niederreiter [4,5] proved bounds on the discrepancy $D_M$ of the two-dimensional net produced by the quadratic congruential generator, namely $D_M = O\left( \frac{(\log M)^2}{\sqrt{M}} \right)$. Using a geometric approach, O. Blažeková and O. Strauch [1] obtained the order $O\left( \frac{(\log M)^{3/2}}{\sqrt{M}} \right)$ for the star discrepancy $D_M^*$ of the same net. From the theory of uniform distribution [8] it is well known that the discrepancy and the star discrepancy are always of the same order of magnitude; they differ by a factor of at most $2^s$, where $s$ is the dimension. Obviously, the order obtained in [1] is better than the previously proved estimates in [4,5].
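The full-period property and the two-dimensional net discussed above can be checked by brute force for small moduli. The sketch below assumes that the generator (1) has the form $y_{i+1} \equiv a_2 y_i^2 + a_1 y_i + a_0 \pmod M$, which is consistent with the conditions listed above and with the generators used later in the paper; the function names and the starting value $y_0 = 0$ are hypothetical choices.

```python
def quadratic_orbit(a2, a1, a0, M, y0=0):
    """Return y_0, ..., y_{M-1} for y_{i+1} = a2*y_i^2 + a1*y_i + a0 (mod M)."""
    ys = [y0 % M]
    for _ in range(M - 1):
        y = ys[-1]
        ys.append((a2 * y * y + a1 * y + a0) % M)
    return ys


def has_full_period(a2, a1, a0, M, y0=0):
    """True if the sequence is purely periodic with period exactly M."""
    ys = quadratic_orbit(a2, a1, a0, M, y0)
    y_next = (a2 * ys[-1] ** 2 + a1 * ys[-1] + a0) % M
    return len(set(ys)) == M and y_next == ys[0]


def net_2d(a2, a1, a0, M):
    """Points (y_i / M, y_{i+1} / M), i = 0..M-1, wrapping around at the period."""
    ys = quadratic_orbit(a2, a1, a0, M)
    return [(ys[i] / M, ys[(i + 1) % M] / M) for i in range(M)]


# The two generators used in the tables of this paper.
full_2 = all(has_full_period(6, 3, 1, 2 ** mu) for mu in range(4, 11))
full_3 = all(has_full_period(3, 1, 2, 3 ** nu) for nu in range(3, 7))
points = net_2d(6, 3, 1, 1024)
```

The wrap-around in `net_2d` is only meaningful when the generator has a full period, since only then does $y_M = y_0$ hold.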
2 The b-adic Diaphony and Pseudorandomness
The study of the pseudorandom property of the sequence $x_i$, $i = 0, 1, \ldots$, is associated with an estimation of the distribution of the two-dimensional net $(x_i, x_{i+1})$. Until now, the discrepancy has been used to estimate the distribution of the nets. Here we use the b-adic diaphony for the study of the distribution of the two-dimensional net $(x_i, x_{i+1})$ and the pseudorandomness of the sequence $x_i$, $i = 0, 1, \ldots$.

Fig. 1. The distribution of points of sequences (2) and (3), M = 1024, b = 3 (panels: Van der Corput sequence; quadratic generator)

2.1 Pseudorandomness of the Van der Corput Sequence Using the b-adic Diaphony
We consider the net

$$ (\zeta_b(i), \zeta_b(i + 1)), \quad i = 0, 1, \ldots, M - 1 . \qquad (2) $$

This net is not uniformly distributed, because the points of the net lie on the lines $y = x + \frac{1}{b^{j+1}} + \frac{1}{b^j} - 1$, $j = 0, 1, 2, \ldots$ (see Fig. 1). The bad distribution of the two-dimensional net (2), based on the Van der Corput sequence, is seen from the values of the b-adic diaphony in Table 1.

2.2 Pseudorandomness of a Quadratic Generator Using the b-adic Diaphony
We consider the quadratic congruential generator (1) and obtain the sequence $x_i = \frac{y_i}{M}$ of quadratic congruential pseudorandom numbers. To investigate the pseudorandom property of the sequence $x_i$, we calculate the b-adic diaphony of the net

$$ (x_i, x_{i+1}), \quad i = 0, 1, \ldots, M - 1 \qquad (3) $$

for two concrete quadratic generators in the cases $M = b^\nu$ and $M = 2^\mu$; Table 2 shows the results.

Table 1. The diaphony F_M of the Van der Corput sequence, b = 3

M = b^ν, 3 ≤ ν ≤ 10
    M        F_M
    27       0.374992
    81       0.37243
    243      0.372141
    729      0.372108
    2187     0.372105
    6561     0.372104
    19683    0.372104
    59049    0.372104

M = 2^μ, 4 ≤ μ ≤ 16
    M        F_M
    16       0.387033
    32       0.376644
    64       0.373283
    128      0.372489
    256      0.372197
    512      0.372126
    1024     0.372112
    2048     0.372106
    4096     0.372105
    8192     0.372104
    16384    0.372104
    32768    0.372104
    65536    0.372104

Table 2. The diaphony F_M of the quadratic generator, b = 3

3x^2 + x + 2 (mod M), M = b^ν, 3 ≤ ν ≤ 10
    M        F_M
    27       0.214727
    81       0.10644
    243      0.0592701
    729      0.0348591
    2187     0.0165547
    6561     0.0119346
    19683    0.0072553
    59049    0.00361669

6x^2 + 3x + 1 (mod M), M = 2^μ, 4 ≤ μ ≤ 16
    M        F_M
    16       0.187028
    32       0.150217
    64       0.105382
    128      0.0760402
    256      0.0544161
    512      0.0362376
    1024     0.0242248
    2048     0.0171268
    4096     0.0125823
    8192     0.00912462
    16384    0.00630444
    32768    0.00436687
    65536    0.00314525

Fig. 2. The distribution of points of the combination of quadratic generator with Van der Corput sequence with M = 1024, b = 3, f_M(i) ≡ 6i^2 + 3i + 1 (mod M)

2.3 Pseudorandom Property of the Combination of the Van der Corput Sequence with a Quadratic Generator
O. Strauch proposed to combine the Van der Corput sequence with a quadratic generator. In this way, the resulting net has a better pseudorandom property than the original sequences. To improve the distribution of the two-dimensional net, we combine the Van der Corput sequence $\zeta_b(i)$ with the quadratic generator $y_{i+1} = f_M(y_i)$. In this way we obtain the net

$$ (\zeta_b(y_i), \zeta_b(y_{i+1})), \quad i = 0, 1, \ldots, M - 1 . \qquad (4) $$
Fig. 3. The distribution of the combination of quadratic generator with Van der Corput sequence with M = 1024 (panels: m = 2, 3, 4, 5, 6, 7)

Table 3. The diaphony F_M of the net (6) of the combination of quadratic generator f_M(i) ≡ 6i^2 + 3i + 1 (mod M) with Van der Corput sequence, b = 3, M = 2^μ, 4 ≤ μ ≤ 16, 3 ≤ ν ≤ 9, and m ≤ ν

    M        m=2       m=3       m=4       m=5       m=6       m=7       m=8       m=9
    16       0.19484   0.20886   -         -         -         -         -         -
    32       0.14907   0.1374    0.17695   -         -         -         -         -
    64       0.11024   0.09258   0.09042   -         -         -         -         -
    128      0.10049   0.06829   0.0731    0.07685   -         -         -         -
    256      0.08596   0.05764   0.04742   0.04808   0.05253   -         -         -
    512      0.07979   0.03917   0.03482   0.03402   0.03164   -         -         -
    1024     0.07538   0.03483   0.0256    0.02179   0.02098   0.03035   -         -
    2048     0.07313   0.0298    0.02051   0.01555   0.01698   0.01398   -         -
    4096     0.07217   0.02667   0.01602   0.01264   0.01263   0.01061   0.01293   -
    8192     0.07154   0.02542   0.01174   0.01009   0.00927   0.00878   0.00862   0.01095
    16384    0.07118   0.02441   0.01042   0.00657   0.00639   0.00577   0.00602   0.00702
    32768    0.07109   0.02385   0.00900   0.00517   0.00511   0.00458   0.00403   0.00416
    65536    0.07102   0.02364   0.00841   0.00423   0.00347   0.00295   0.00305   0.00286
If the quadratic generator has a purely full period of length $M$, then the net (4) has the same points as $(\zeta_b(i), \zeta_b(f_M(i)))$, $i = 0, 1, \ldots, M - 1$. The distribution of the obtained net is shown in Fig. 2.
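The nets (2) and (4) can be generated with a few lines of code. The sketch below assumes that $\zeta_b(i)$ denotes the usual radical inverse (the base-$b$ digits of $i$ reflected about the radix point), as is standard for the Van der Corput sequence, and uses the generator $f_M(i) \equiv 6i^2 + 3i + 1 \pmod M$ with $M = 1024$ and $b = 3$ from Fig. 2; the helper names and the starting value $y_0 = 0$ are hypothetical.

```python
def radical_inverse(i, b):
    """zeta_b(i): reflect the base-b digits of i about the radix point."""
    x, scale = 0.0, 1.0 / b
    while i > 0:
        i, d = divmod(i, b)
        x += d * scale
        scale /= b
    return x


def trailing_max_digits(i, b):
    """Number of trailing digits equal to b - 1 in the base-b expansion of i."""
    j = 0
    while i % b == b - 1:
        i //= b
        j += 1
    return j


def f(y, M, a2=6, a1=3, a0=1):
    """Quadratic generator step f_M(y) = a2*y^2 + a1*y + a0 (mod M)."""
    return (a2 * y * y + a1 * y + a0) % M


b, M = 3, 1024

# Net (2): consecutive Van der Corput points lie on the lines y = x + b^-(j+1) + b^-j - 1.
net2 = [(radical_inverse(i, b), radical_inverse(i + 1, b)) for i in range(M)]
on_lines = True
for i, (x, y) in enumerate(net2):
    j = trailing_max_digits(i, b)
    on_lines = on_lines and abs(y - x - (b ** -(j + 1) + b ** -j - 1.0)) < 1e-12

# Net (4): Van der Corput sequence evaluated along the generator orbit y_0 = 0, y_1, ...
ys = [0]
for _ in range(M):
    ys.append(f(ys[-1], M))
net4 = {(radical_inverse(ys[i], b), radical_inverse(ys[i + 1], b)) for i in range(M)}
net4_alt = {(radical_inverse(i, b), radical_inverse(f(i, M), b)) for i in range(M)}
same_points = net4 == net4_alt   # holds because the generator has a full period
```

The first check reproduces the line structure responsible for the poor distribution of net (2); the second confirms that, for a full-period generator, net (4) can be generated without iterating the recurrence.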
Table 4. The diaphony F_M of the net (6) of the combination of quadratic generator f_M(i) ≡ 3i^2 + i + 2 (mod M) with Van der Corput sequence, b = 3, M = b^ν, 3 ≤ ν ≤ 10, and m ≤ ν

    M        m=2       m=3       m=4       m=5       m=6       m=7       m=8       m=9
    27       0.23612   0.37499   -         -         -         -         -         -
    81       0.20728   0.21735   0.37243   -         -         -         -         -
    243      0.11450   0.18947   0.21512   0.37214   -         -         -         -
    729      0.08438   0.08945   0.18739   0.21487   0.37211   -         -         -
    2187     0.07770   0.04994   0.08626   0.18715   0.21484   0.37211   -         -
    6561     0.07182   0.03851   0.04458   0.08590   0.18713   0.21484   0.37210   -
    19683    0.07173   0.02585   0.03139   0.04395   0.08585   0.18713   0.21484   0.37210
    59049    0.07118   0.02562   0.01334   0.03050   0.04388   0.08585   0.18713   0.21483
Fig. 4. The diaphony F_M of the combination of quadratic generator with Van der Corput sequence, M = 2^μ (log-log plot of F_M against M; curves for m = 2, 3, 4, 5 and m = ν)
2.4 Simplification
For $x \in [0, 1)$ with the b-adic expansion $x = 0.x_1 x_2 \ldots x_{m-1} x_m x_{m+1} \ldots$, let $\zeta^*_{b^m}(x)$ be defined as $\zeta^*_{b^m}(x) = 0.x_m x_{m-1} \ldots x_2 x_1$. O. Strauch proposed the net

$$ \zeta^*_{b^m}\left( \frac{y_i}{M} \right), \quad i = 0, 1, \ldots, M - 1 . \qquad (5) $$

To assess the pseudorandomness of (5), we study the b-adic diaphony $F_M$ of the two-dimensional net

$$ \left( \zeta^*_{b^m}\left( \frac{y_i}{M} \right), \zeta^*_{b^m}\left( \frac{y_{i+1}}{M} \right) \right), \quad i = 0, 1, \ldots, M - 1 . $$

If $f_M(i)$ has a purely full period, then this net has the same points as

$$ \left( \zeta^*_{b^m}\left( \frac{i}{M} \right), \zeta^*_{b^m}\left( \frac{f_M(i)}{M} \right) \right), \quad i = 0, 1, \ldots, M - 1 \qquad (6) $$

and the same b-adic diaphony. The distribution of the points of the net (6) for six values of the number $m$ is shown in Fig. 3. Tables 3 and 4 as well as Figs. 4 and 5 show the computed b-adic diaphony of the nets using two quadratic generators with functions $f_M(i) \equiv 6i^2 + 3i + 1 \pmod M$, $M = 2^\mu$, and $f_M(i) \equiv 3i^2 + i + 2 \pmod M$, $M = 3^\nu$.

Fig. 5. The diaphony F_M of the combination of quadratic generator with Van der Corput sequence, M = b^ν (log-log plot of F_M against M; curves for m = 2, 3, 4, 5 and m = ν)

Conclusion and Future Work

The obtained results show that the b-adic diaphony is a good tool to study the pseudorandomness of sequences and nets. The calculations of the b-adic diaphony of the net (2) confirm the fact that the Van der Corput sequence is deterministic and does not have pseudorandom properties. The last figures illustrate that the b-adic diaphony of the net (6) decreases as the number of points increases. This shows that the net (6) is uniformly distributed and therefore the sequence (5) has good pseudorandomness. Hence, the b-adic diaphony can be used to study the pseudorandomness of sequences and nets. Furthermore, the b-adic diaphony of the nets (4) and (6), as well as of the sequence (5), can be estimated theoretically. In the future we plan to find such theoretical bounds.

Acknowledgments. We would like to thank Professor Oto Strauch for the wonderful ideas about the combination of the Van der Corput sequence with a quadratic generator and the simplification of this combination. The study of the pseudorandomness of the sequences proposed by Prof. Oto Strauch is very interesting and useful for us. The authors thank Professor Ivan Dimov for very useful remarks during the work on the paper. This work is supported by the project BgSk207, Bulgarian NSF.
References

1. Blažeková, O., Strauch, O.: Pseudorandomness of quadratic generators. Uniform Distribution Theory 2(2), 105–120 (2007)
2. Dimov, I., Atanassov, E.: Exact Error Estimates and Optimal Randomized Algorithms for Integration. In: Boyanov, T., Dimova, S., Georgiev, K., Nikolov, G. (eds.) NMA 2006. LNCS, vol. 4310, pp. 131–139. Springer, Heidelberg (2007)
3. Drmota, M., Tichy, R.F.: Sequences, Discrepancies and Applications. LNM, vol. 1651. Springer, Heidelberg (1997)
4. Eichenauer-Herrmann, J., Niederreiter, H.: On the discrepancy of quadratic congruential pseudorandom numbers. J. Comput. Appl. Math. 34(2), 243–249 (1991)
5. Eichenauer-Herrmann, J., Niederreiter, H.: An improved upper bound for the discrepancy of quadratic congruential pseudorandom numbers. Acta Arithmetica 69(2), 193–198 (1995)
6. Grozdanov, V., Stoilova, S.: The b-adic diaphony. Rendiconti di Matematica 22, 203–221 (2002)
7. Knuth, D.E.: Seminumerical algorithms, 2nd edn. The art of computer programming, vol. 2. Addison Wesley, Reading (1981)
8. Kuipers, L., Niederreiter, H.: Uniform distribution of sequences. John Wiley, New York (1974)
9. L'Ecuyer, P., Lemieux, C.: Recent Advances in Randomized Quasi-Monte Carlo Methods. In: Dror, M., L'Ecuyer, P., Szidarovszki, F. (eds.) Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications, pp. 419–474. Kluwer Academic Publishers, Dordrecht (2002)
10. Lemieux, C., L'Ecuyer, P.: Randomized Polynomial Lattice Rules for Multivariate Integration and Simulation. SIAM Journal on Scientific Computing 24(5), 1768–1789 (2003)
11. Niederreiter, H.: Random number generation and quasi-Monte Carlo methods. In: CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63. SIAM, Philadelphia (1992)
12. Niederreiter, H., Shparlinski, I.E.: On the distribution of inversive congruential pseudorandom numbers in parts of the period. Mathematics of Computation 70(236), 1569–1574 (2000)
13. Niederreiter, H., Shparlinski, I.E.: Exponential sums and the distribution of inversive congruential pseudorandom numbers with prime-power modulus. Acta Arithmetica XCII(1), 89–98 (2000)
14. Strauch, O., Porubský, Š.: Distribution of Sequences: A Sampler. Peter Lang, Frankfurt am Main (2005)
15. Weyl, H.: Über die Gleichverteilung von Zahlen mod. Eins. Math. Ann. 77, 313–352 (1916)
Scatter Estimation for PET Reconstruction

Milan Magdics, Laszlo Szirmay-Kalos, Balazs Tóth, Ádám Csendesi¹, and Anton Penzov²

¹ Budapest University of Technology and Economics, Hungary
² Institute of Information and Communication Technologies, BAS, Bulgaria
Abstract. This paper presents a Monte Carlo scatter estimation algorithm for Positron Emission Tomography (PET), where positron-electron annihilations induce photon pairs that fly independently in the medium and eventually get absorbed in the detector grid. The path of a photon pair is a polyline defined by the detector hits and the scattering points where one of the photons changed its direction. The value measured by a detector pair is then the total contribution, i.e. the integral, of such polyline paths of arbitrary length. This integral is evaluated with Monte Carlo quadrature, using a sampling strategy that suits the graphics processing unit (GPU) executing the process. We consider the contribution of photon paths to each pair of detectors as an integral over the Cartesian product set of the volume. This integration domain is sampled globally, i.e. a single polyline represents all annihilation events occurring at any of its points. Furthermore, line segments containing scattering points are reused for all detector pairs, which allows us to significantly reduce the number of samples. The scatter estimation is incorporated into a PET reconstruction algorithm where the scattered term is subtracted from the measurements.
I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 77–86, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

In positron emission tomography (PET) we need to find the spatial intensity distribution of positron–electron annihilations. During an annihilation event, two oppositely directed 511 keV photons are produced [Gea07]. We collect the number of simultaneous photon hits in detector pairs, also called Lines Of Response or LORs: $(y_1, y_2, \ldots, y_{N_{LOR}})$. The required output of the reconstruction method is the emission density function $x(v)$ that describes the number of photon pairs (i.e. the annihilation events) born in a unit volume around point $v$.

Tomography reconstruction algorithms are usually iterative. They start with an initial emission density, compute the detector response by simulating the photon transport, and update the emission density taking into account the simulated and the measured detector responses [SV82].

Before being detected, photons may interact with matter in many ways, but in our energy range and for living organs only Compton scattering and photoelectric absorption are relevant. The probability of scattering in unit distance is the scattering cross section $\sigma_s$. When scattering happens, there is a unique correspondence between the relative scattered energy and the cosine of the scattering angle $\theta$, as defined by the Compton formula:

$$\epsilon = \frac{1}{1 + \epsilon_0 (1 - \cos\theta)},$$
where $\epsilon = E_1/E_0$ expresses the ratio of the scattered energy $E_1$ and the incident energy $E_0$, and $\epsilon_0 = E_0/(m_e c^2)$ is the incident photon energy relative to the energy of the electron. The differential of the scattering cross section, i.e. the probability density that the photon is scattered from direction $\omega'$ into differential solid angle $d\omega$ in direction $\omega$, is given by the Klein-Nishina formula [Yan08]:

$$\frac{d\sigma_s(v, \cos\theta, \epsilon_0)}{d\omega} = \frac{r_e^2\, C(v)}{2} \left(\epsilon + \epsilon^3 - \epsilon^2 \sin^2\theta\right),$$

where $\cos\theta = \omega \cdot \omega'$, $C(v)$ is the electron density, and $r_e = 2.82 \cdot 10^{-15}$ [m] is the classical electron radius. The Klein-Nishina formula defines the product of the scattering cross section $\sigma_s(v, \epsilon_0)$ and the conditional probability density of the scattering direction. The scattering cross section can be obtained as the directional integral of the Klein-Nishina formula over the whole directional sphere:

$$\sigma_s(v, \epsilon_0) = \int_\Omega \frac{d\sigma_s(v, \cos\theta, \epsilon_0)}{d\omega}\, d\omega = \frac{r_e^2\, C(v)}{2}\, \sigma_s^0(\epsilon_0) \qquad (1)$$
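where $\Omega$ is the directional sphere and $\sigma_s^0$ is the normalized scattering cross section:

$$\sigma_s^0(\epsilon_0) = \int_\Omega \left(\epsilon + \epsilon^3 - \epsilon^2 \sin^2\theta\right) d\omega = 2\pi \int_{-1}^{1} \left(\epsilon + \epsilon^3 - \epsilon^2 \sin^2\theta\right) d\cos\theta.$$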
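As a numerical illustration of the two formulas above, the following Python sketch evaluates the Compton energy ratio and integrates the angular part of the Klein-Nishina formula over $\cos\theta$ with a simple midpoint rule. The function names and the number of integration steps are illustrative assumptions, not part of the paper.

```python
import math

def compton_ratio(cos_theta, eps0):
    """Scattered-to-incident energy ratio from the Compton formula."""
    return 1.0 / (1.0 + eps0 * (1.0 - cos_theta))

def kn_kernel(cos_theta, eps0):
    """Angular part of the Klein-Nishina formula: eps + eps^3 - eps^2 sin^2(theta)."""
    eps = compton_ratio(cos_theta, eps0)
    sin2 = 1.0 - cos_theta * cos_theta
    return eps + eps ** 3 - eps ** 2 * sin2

def normalized_cross_section(eps0, n=2000):
    """sigma_s^0(eps0) = 2*pi * integral of the kernel over cos(theta) in [-1, 1].

    Midpoint rule with n equal subintervals (n is an assumed resolution).
    """
    h = 2.0 / n
    total = sum(kn_kernel(-1.0 + (k + 0.5) * h, eps0) for k in range(n))
    return 2.0 * math.pi * h * total

# Low-energy limit: the kernel reduces to 1 + cos^2(theta),
# whose integral over the sphere is 16*pi/3.
sigma0_low = normalized_cross_section(0.0)
# 511 keV photons (eps0 = 1) have a smaller normalized cross section.
sigma0_511 = normalized_cross_section(1.0)
```

The low-energy value serves as a sanity check, since in that limit the integrand has a closed-form integral.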
The ratio of the Klein-Nishina formula and the scattering cross section is the phase function, which defines the probability density of the scattering direction, provided that scattering happens:

$$P_{KN}(\cos\theta, \epsilon_0) = \frac{d\sigma_s}{d\omega} \Big/ \sigma_s = \frac{\epsilon + \epsilon^3 - \epsilon^2 \sin^2\theta}{\sigma_s^0(\epsilon_0)}.$$
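To make the role of the phase function as a probability density concrete, the sketch below draws scattering angles from it by rejection sampling. This is only an illustration under our own assumptions: the paper does not state that directions are sampled this way, and the names `kn_kernel` and `sample_cos_theta` are hypothetical.

```python
import random

def kn_kernel(cos_theta, eps0):
    """Unnormalized phase function: eps + eps^3 - eps^2 sin^2(theta)."""
    eps = 1.0 / (1.0 + eps0 * (1.0 - cos_theta))   # Compton formula
    return eps + eps ** 3 - eps ** 2 * (1.0 - cos_theta ** 2)

def sample_cos_theta(eps0, rng):
    """Draw cos(theta) with density proportional to the Klein-Nishina kernel.

    Rejection sampling against a uniform proposal on [-1, 1]. The kernel
    never exceeds 2 (its value for forward scattering), so 2 is a valid bound.
    """
    while True:
        mu = rng.uniform(-1.0, 1.0)
        if rng.random() * 2.0 <= kn_kernel(mu, eps0):
            return mu

rng = random.Random(1)
samples = [sample_cos_theta(1.0, rng) for _ in range(5000)]
# For 511 keV photons the distribution is forward-peaked,
# so the mean cosine is positive.
mean_cos = sum(samples) / len(samples)
```

Because the solid angle element is $d\omega = d\phi\, d\cos\theta$, a density proportional to the kernel in $\cos\theta$ together with a uniform azimuth reproduces the phase function.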
The absorption cross section $\sigma_a(\epsilon_0)$ due to the photoelectric effect is approximately inversely proportional to the cube of the photon energy, thus

$$\sigma_a(v, \epsilon_0) \approx \frac{\text{const}}{E^3} = \frac{\sigma_a(v, 1)}{\epsilon_0^3}. \qquad (2)$$

The proportionality ratio $\sigma_a(v, 1)$ depends on the material composition and grows rapidly (with a power between 4 and 5) with the atomic number of the elements.
2 Previous Work
A physically plausible scatter correction needs photon transport simulation and the evaluation of high-dimensional integrals in photon path space. As classical quadrature rules fail in higher dimensions due to the curse of dimensionality, these high-dimensional integrals are estimated by Monte Carlo or quasi-Monte Carlo methods [SK08]. Unfortunately, available Monte Carlo tools, like Geant4/GATE [Gea07, ABB+04], MCNP¹, SimSet², and PeneloPET [EHV+06], are too general: they are not optimized for this particular task and are not suitable for GPU execution. Thus, they are too slow to be incorporated into an online iterative reconstruction.

For effective simulation, we run our algorithm on the graphics processing unit (GPU), which is a massively parallel supercomputer. It can reach teraflops performance if its quasi-SIMD architecture is respected, i.e. if threads execute the same instruction sequence with no communication. The direct simulation of the photon transport would not meet this requirement, since different photons may end up in the same detector, which requires synchronized writes. Thus, we consider the adjoint problem and take a detector-oriented viewpoint. For efficient evaluation, we transform the integral over the path space to a volumetric integral.
3 Scatter Estimation
If we consider photon scattering, the path of the photon pair is a polyline containing the emission point somewhere inside one of its line segments (Fig. 1). This polyline includes scattering points $s_1, \ldots, s_S$, where one of the photons changed its direction, in addition to detector hit points $z_1 = s_0$ and $z_2 = s_{S+1}$. The values measured by detector pairs are then the total contribution, i.e. the integral of such polyline paths of arbitrary length.

We consider the contribution of photon paths as an integral over the Cartesian product set of the volume. This integration domain is sampled globally, i.e. a single sample is used for the computation of all detector pairs. Sampling parts of photon paths globally and reusing a partial path for all detector pairs allow us to significantly reduce the number of samples.

To express the contribution of a polyline path, we take its line segments one by one and consider a line segment as a virtual LOR with two virtual detectors at locations $s_{i-1}$ and $s_i$, and of differential areas projected perpendicularly to the line segment, $dA^\perp_{i-1}$ and $dA^\perp_i$ (Fig. 1). The contribution of a virtual LOR at its endpoints, i.e. the expected number of photon pairs going through $dA^\perp_{i-1}$ and $dA^\perp_i$, is $C(s_{i-1}, s_i)\, dA^\perp_{i-1}\, dA^\perp_i$, where contribution $C$ is the product of several factors:

$$C(s_{i-1}, s_i) = G(s_{i-1}, s_i)\, X(s_{i-1}, s_i)\, T_1(s_{i-1}, s_i)\, B_1(s_{i-1}, s_i),$$

where $G(s_{i-1}, s_i)$ is the geometry factor, $X(s_{i-1}, s_i)$ is the total emission along the line segment, $T_{\epsilon_0}(s_{i-1}, s_i)$ is the total attenuation due to out-scattering, and $B_{\epsilon_0}(s_{i-1}, s_i)$ is the total attenuation due to photoelectric absorption, assuming photon energy $\epsilon_0$:

$$G(s_{i-1}, s_i) = \frac{1}{|s_{i-1} - s_i|^2}, \qquad X(s_{i-1}, s_i) = \frac{1}{2\pi} \int_{s_{i-1}}^{s_i} x(l)\, dl,$$

$$T_{\epsilon_0}(s_{i-1}, s_i) = e^{-\int_{s_{i-1}}^{s_i} \sigma_s(l, \epsilon_0)\, dl}, \qquad B_{\epsilon_0}(s_{i-1}, s_i) = e^{-\int_{s_{i-1}}^{s_i} \sigma_a(l, \epsilon_0)\, dl}.$$

[Figure 1 panels: Polyline photon path (left); Virtual LOR (right).]

Fig. 1. The scattered photon path is a polyline (left) made of virtual LORs (right). The left figure depicts the case of S = 2.

Footnotes:
1 http://mcnpgreen.lanl.gov/
2 http://depts.washington.edu/simset/html/simset_main.html
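The four factors of a virtual LOR can be obtained in a single pass along the segment. The following Python sketch marches a segment with a fixed number of midpoint samples and returns $G$, $X$, $T_1$ and $B_1$. The callables `emission`, `sigma_s` and `sigma_a`, the step count, and the homogeneous test values are assumptions made for illustration; the paper's implementation runs on the GPU over voxel grids.

```python
import math

def march_segment(p0, p1, emission, sigma_s, sigma_a, n_steps=64):
    """Return (G, X, T1, B1) for the virtual LOR between points p0 and p1.

    emission, sigma_s and sigma_a are callables mapping a 3D point to the
    emission density and to the cross sections at photon energy 1.
    Line integrals use the midpoint rule with n_steps equal steps.
    """
    length = math.dist(p0, p1)
    dl = length / n_steps
    x_int = s_int = a_int = 0.0
    for k in range(n_steps):
        t = (k + 0.5) / n_steps
        p = tuple(a + t * (b - a) for a, b in zip(p0, p1))
        x_int += emission(p) * dl
        s_int += sigma_s(p) * dl
        a_int += sigma_a(p) * dl
    G = 1.0 / length ** 2            # geometry factor
    X = x_int / (2.0 * math.pi)      # total emission along the segment
    T1 = math.exp(-s_int)            # attenuation due to out-scattering
    B1 = math.exp(-a_int)            # attenuation due to absorption
    return G, X, T1, B1

# Homogeneous test medium (all values are made-up illustration numbers).
G, X, T1, B1 = march_segment(
    (0.0, 0.0, 0.0), (0.0, 0.0, 2.0),
    emission=lambda p: 3.0,
    sigma_s=lambda p: 0.5,
    sigma_a=lambda p: 0.25,
)
contribution = G * X * T1 * B1   # C(s_{i-1}, s_i) of the text
```

Accumulating all three line integrals in the same loop reflects the point made in the text that each segment should be marched only once.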
In the line segment of the emission, the original photon energy has not changed yet, thus $\epsilon_0 = 1$.

Suppose that scattering happens around end point $s_i$ of the virtual LOR in differential volume $ds_i = dA^\perp_i\, dl$, i.e. at run length $dl$ (right of Fig. 1). Let us extend this virtual LOR by a single scattering step to form polyline $s_{i-1}, s_i, s_{i+1}$. The probability that the photon scatters along distance $dl$ and its new direction is in solid angle $d\omega$ is differential cross section $d\sigma_s(s_i, \cos\theta_i, \epsilon_0^{(i)})/d\omega \cdot dl$, where $\theta_i$ is the scattering angle. The scattered photon will go along virtual LOR $(s_i, s_{i+1})$ with differential area $dA^\perp_{i+1}$ at its end if area $dA^\perp_{i+1}$ subtends solid angle $d\omega$, that is:

$$d\omega = \frac{dA^\perp_{i+1}}{|s_i - s_{i+1}|^2}.$$

Upon scattering the photon changes its energy to

$$\epsilon_0^{(i+1)} = \frac{\epsilon_0^{(i)}}{1 + \epsilon_0^{(i)} (1 - \cos\theta_i)}.$$

This photon arrives at the other end of this virtual LOR if there is no further collision, which happens with probability $T_{\epsilon_0^{(i+1)}}(s_i, s_{i+1})\, B_{\epsilon_0^{(i+1)}}(s_i, s_{i+1})$.

Summarizing, the expected number of photon pairs born between $s_{i-1}$ and $s_i$ and reaching differential areas $dA^\perp_{i-1}$ and $dA^\perp_{i+1}$ via scattering at differential volume $ds_i = dl \cdot dA^\perp_i$ is:

$$C(s_{i-1}, s_i)\, \frac{d\sigma_s(s_i, \cos\theta_i, \epsilon_0^{(i)})}{d\omega}\, T_{\epsilon_0^{(i+1)}}(s_i, s_{i+1})\, B_{\epsilon_0^{(i+1)}}(s_i, s_{i+1})\, dA^\perp_{i-1}\, ds_i\, dA^\perp_{i+1}.$$
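The quantities attached to one scattering point follow directly from the three points of the polyline. The sketch below, a minimal illustration with hypothetical names, computes the scattering angle from the incoming and outgoing directions, the photon energy after scattering, and the Klein-Nishina differential cross section. The electron density passed in the example is an arbitrary value.

```python
import math

R_E = 2.82e-15   # classical electron radius [m], value from the text

def scatter_step(s_prev, s, s_next, eps_in, electron_density):
    """Return (cos_theta, eps_out, dsigma_domega) for a scattering at s.

    eps_in is the photon energy before scattering, relative to the
    electron rest energy; electron_density is C(v) at the point s.
    """
    d_in = [b - a for a, b in zip(s_prev, s)]
    d_out = [b - a for a, b in zip(s, s_next)]
    norm = math.hypot(*d_in) * math.hypot(*d_out)
    cos_theta = sum(a * b for a, b in zip(d_in, d_out)) / norm
    eps = 1.0 / (1.0 + eps_in * (1.0 - cos_theta))    # Compton ratio
    eps_out = eps_in * eps                             # energy after scattering
    sin2 = 1.0 - cos_theta ** 2
    dsigma = 0.5 * R_E ** 2 * electron_density * (
        eps + eps ** 3 - eps ** 2 * sin2)
    return cos_theta, eps_out, dsigma

# Right-angle scattering of a 511 keV photon halves its energy.
cos_theta, eps_out, dsigma = scatter_step(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
    eps_in=1.0, electron_density=1.0)
```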
The integral of the contributions of paths of $S$ scattering points is the product of these factors. For example, the integral of the contribution of paths of one scattering point is

$$\tilde{y}_L^{(1)} = \int_{D_1} \int_{D_2} \int_V \frac{d\sigma_s(s, \cos\theta, 1)}{d\omega}\, \cos\theta^{(0)} \cos\theta^{(2)}\, P(z_1, s, z_2)\, ds\, dz_2\, dz_1$$
where $\theta^{(0)}$ is the angle between the first detector's normal and the direction from $z_1$ to $s$, $\theta^{(2)}$ is the angle between the second detector's normal and the direction from $z_2$ to $s$, and $P(z_1, s, z_2)$ is the contribution of this polyline:

$$P(z_1, s, z_2) = C(z_1, s)\, T_{\epsilon_0}(s, z_2)\, B_{\epsilon_0}(s, z_2) + T_{\epsilon_0}(z_1, s)\, B_{\epsilon_0}(z_1, s)\, C(s, z_2). \qquad (3)$$

The photon's energy level $\epsilon_0$ is obtained from the Compton formula for the scattering angle $\theta$ formed by directions $s - z_1$ and $z_2 - s$.

When the attenuation is computed, we should take into account that the photon energy changes along the polyline and that the scattering cross section also depends on this energy. Thus, different cross section values should be integrated when annihilations on a different line segment are considered. As we wish to reuse the line segments and not repeat ray-marching redundantly, each line segment is marched only once assuming photon energy $\epsilon_0 = 1$, and attenuations $T_1$ and $B_1$ for this line segment are computed. Then, when the place of annihilation is taken into account and the real value of the photon energy $\epsilon_0$ is obtained, the initial attenuations $T_1$ and $B_1$ are transformed. The transformation is based on the decomposition of equations (1) and (2):

$$\sigma_s(l, \epsilon_0) = \sigma_s(l, 1) \cdot \frac{\sigma_s^0(\epsilon_0)}{\sigma_s^0(1)}, \qquad \sigma_a(l, \epsilon_0) = \frac{\sigma_a(l, 1)}{\epsilon_0^3}.$$
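Using this relation, we can write

$$T_{\epsilon_0} = e^{-\int_{s_{i-1}}^{s_i} \sigma_s(l, \epsilon_0)\, dl} = e^{-\frac{\sigma_s^0(\epsilon_0)}{\sigma_s^0(1)} \int_{s_{i-1}}^{s_i} \sigma_s(l, 1)\, dl} = T_1^{\,\sigma_s^0(\epsilon_0)/\sigma_s^0(1)},$$

$$B_{\epsilon_0} = e^{-\int_{s_{i-1}}^{s_i} \sigma_a(l, \epsilon_0)\, dl} = e^{-\frac{1}{\epsilon_0^3} \int_{s_{i-1}}^{s_i} \sigma_a(l, 1)\, dl} = B_1^{\,1/\epsilon_0^3}.$$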
The energy dependence of the cross section $\sigma_s^0(\epsilon_0)$ is a scalar function, which can be precomputed and stored in a table.
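The transformation of attenuations marched at energy 1 to another photon energy can be sketched as follows. The table resolution, the nearest-entry lookup, and all function names are assumptions chosen for brevity; the paper only states that the scalar function is tabulated.

```python
import math

def kn_kernel(cos_theta, eps0):
    eps = 1.0 / (1.0 + eps0 * (1.0 - cos_theta))
    return eps + eps ** 3 - eps ** 2 * (1.0 - cos_theta ** 2)

def sigma0(eps0, n=1000):
    """Normalized scattering cross section by the midpoint rule."""
    h = 2.0 / n
    return 2.0 * math.pi * h * sum(
        kn_kernel(-1.0 + (k + 0.5) * h, eps0) for k in range(n))

# Precomputed table of sigma0 on a uniform energy grid in (0, 1].
TABLE_SIZE = 64                      # assumed resolution
TABLE = [sigma0((k + 1) / TABLE_SIZE) for k in range(TABLE_SIZE)]

def sigma0_lookup(eps0):
    """Nearest-entry lookup; valid for 0 < eps0 <= 1."""
    k = min(TABLE_SIZE - 1, max(0, round(eps0 * TABLE_SIZE) - 1))
    return TABLE[k]

def rescale_attenuation(T1, B1, eps0):
    """Transform attenuations marched at energy 1 to photon energy eps0."""
    T = T1 ** (sigma0_lookup(eps0) / sigma0_lookup(1.0))
    B = B1 ** (1.0 / eps0 ** 3)
    return T, B

# A photon that lost half of its energy is attenuated more strongly.
T_half, B_half = rescale_attenuation(0.8, 0.9, 0.5)
```

With this approach the expensive ray-marching result is reused, and only two power functions are evaluated per energy level.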
4 High-Dimensional Quadrature Computation
In the previous section we concluded that the scattered contribution is a sequence of integrals of increasing dimension. Numerical quadratures generate $M$ discrete samples $u_1, u_2, \ldots, u_M$ in the domain of the integration and approximate the integral as:

$$\int f(u)\, du \approx \frac{1}{M} \sum_{j=1}^{M} \frac{f(u_j)}{p(u_j)} \qquad (4)$$
where $p(u_j)$ is the density of samples. In the integral of the contribution, a sample $u_j$ is a photon path connecting two detectors via $S$ scattering points and containing an emission point somewhere:

$$u_j = (s_0^{(j)}, s_1^{(j)}, \ldots, s_{S+1}^{(j)})$$

where $s_0^{(j)} = z_1$ and $s_{S+1}^{(j)} = z_2$. For example, if $S = 1$, i.e. we consider single scattering, then $u_j = (z_1, s^{(j)}, z_2)$.
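The estimator of equation (4) can be tried on a one-dimensional toy problem. In the sketch below the integrand, the sampling density and the sample count are illustrative assumptions; the point is only that samples are drawn with density $p$ and each term is weighted by $1/p$.

```python
import math
import random

def mc_quadrature(f, sample, density, M, rng):
    """Estimate the integral of f as (1/M) * sum f(u_j) / p(u_j), eq. (4)."""
    total = 0.0
    for _ in range(M):
        u = sample(rng)
        total += f(u) / density(u)
    return total / M

# Toy integrand on (0, 1]: the integral of 3 u^2 is exactly 1.
f = lambda u: 3.0 * u * u
# Density p(u) = 2u roughly mimics the integrand; inverse-CDF sampling.
sample = lambda rng: math.sqrt(1.0 - rng.random())   # u in (0, 1]
density = lambda u: 2.0 * u

estimate = mc_quadrature(f, sample, density, 20000, random.Random(0))
```

Because the density follows the shape of the integrand, the ratio $f/p$ varies little and the estimate has low variance, which is the motivation for importance sampling.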
[Figure 2 panels: 1. Scattering points. 2. Ray marching between scattering points. 3. Ray marching from detectors to scattering points. 4. Ray marching on LOR and combination of scattering paths.]

Fig. 2. Steps of the sampling process
As the computation of a single segment of such a path requires ray-marching and is therefore rather costly, we reuse the segments of a path in many other path samples. The basic steps of the path sampling process are shown in Fig. 2:

1. First, $N_{scatter}$ scattering points $s_1, \ldots, s_{N_{scatter}}$ are sampled.

2. In the second step, global paths are generated. If we decide to simulate paths of at most $S$ scattering points, $N_{path}$ ordered subsets of the scattering points are selected and paths of $S$ points are established. If statistically independent random variables were used to sample the scattering points, then the first path may be formed by points $s_1, \ldots, s_S$, the second by $s_{S+1}, \ldots, s_{2S}$, etc. Each path contains $S - 1$ line segments, which are marched assuming that the photon energy has not changed from the original electron energy. Note that by building a path of length $S$, we also obtain many shorter paths. A path of length $S$ can be considered as two different paths of length $S - 1$ where one of the end points is removed. Taking another example, we get $S - 1$ paths of length 1. Concerning the cost, rays should be marched only once, so the second step altogether marches on $N_{path}(S - 1)$ rays.

3. In the third step, each detector is connected to each of the scattering points in a deterministic manner. Each detector is assigned to a computation thread, which marches along the connection rays. The total number of rays processed by the third step is $N_{det} N_{scatter}$.

4. Finally, detector pairs are given to GPU threads that compute the direct contribution and combine the scattering paths ending up in them. The direct contribution needs altogether $N_{detline} N_{LOR}$ ray-marching computations.

The described sampling process generates point samples. As these point samples are connected to all detectors, paths of length 2 (single scattering, $S = 1$) can be obtained from them. Paths longer than 2, i.e. simulating at least double scattering, require the formation of global paths. The integral quadrature of equation (4) is evaluated with these samples.

To reduce the variance of the random estimator, we should find a sampling density $p$ that mimics the integrand. When inspecting the integrand, we should take into account that we evaluate a set of integrals (i.e. an integral for every LOR) using the same set of global samples, so the density should mimic the common factors of all these integrals. The common factor is the electron density $C(v)$ at the scattering points, so we mimic this function when sampling points. We store the scattering cross section at the energy level of the electron, $\sigma_s(v, 1)$, which is proportional to the electron density. As the electron density function is provided by the CT reconstruction as a voxel grid, we, in fact, sample voxels. The probability density of sampling point $v$ is:

$$p(v) = \frac{\sigma_s(v, 1)}{\int_{\mathcal{V}} \sigma_s(v, 1)\, dv} = \frac{\sigma_s[V]\, N_{voxel}}{C\, \mathcal{V}},$$

where $\sigma_s[V]$ is the scattering cross section at the energy level of the electron in voxel $V$, $C = \sum_{V=1}^{N_{voxel}} \sigma_s[V]$ is the sum over all voxels, and $\mathcal{V}$ is the volume of interest.
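Sampling voxels in proportion to their scattering cross section can be sketched with a cumulative table and a binary search. The helper names, the flat list of voxel values and the example numbers are assumptions for illustration; a uniform position inside the chosen voxel would complete the point sample.

```python
import bisect
import itertools
import random

def build_sampler(sigma_voxels, volume):
    """Prepare sampling of voxels with probability proportional to sigma_s[V].

    sigma_voxels: scattering cross section per voxel at energy 1.
    volume: total volume of interest (all voxels have equal size).
    Returns (draw, pdf): draw(rng) gives a voxel index, pdf(V) the
    probability density p(v) of a point inside voxel V.
    """
    n_voxel = len(sigma_voxels)
    C = sum(sigma_voxels)                       # sum over all voxels
    cdf = list(itertools.accumulate(sigma_voxels))

    def draw(rng):
        # Invert the discrete CDF; the min() guards against rounding.
        return min(bisect.bisect_right(cdf, rng.random() * C), n_voxel - 1)

    def pdf(V):
        return sigma_voxels[V] * n_voxel / (C * volume)

    return draw, pdf

sigma = [0.0, 1.0, 3.0, 0.0]     # made-up voxel values
draw, pdf = build_sampler(sigma, volume=8.0)
rng = random.Random(42)
counts = [0] * len(sigma)
for _ in range(10000):
    counts[draw(rng)] += 1
```

Voxels with zero cross section are never selected, so no samples are wasted in regions where scattering cannot occur.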
5 Results
The presented algorithm has been implemented in CUDA and run on NVIDIA GeForce GTX 480 GPUs. We have modeled the PET system of NanoPET/CT [Med], consisting of twelve square detector modules organized into a ring. The system measures LORs connecting a detector to three other detectors on the opposite side of the ring, which means that 12 × 3/2 = 18 module pairs need to be processed. Each of the 12 detector modules consists of 81 × 39 crystals, thus $N_{det} = 12 \cdot (81 \times 39)$.

The computational effort can be analyzed by counting the number $N_{ray}$ of rays that need to be marched, which is

$$N_{ray} = N_{path}(S - 1) + N_{det} N_{scatter} + N_{detline} N_{LOR}.$$

In our particular case $S = 1$, $N_{scatter} = 128$, and $N_{detline} = 4$. Thus, thanks to the heavy reuse of rays, scatter compensation requires just slightly more rays than the $N_{detline} N_{LOR}$ rays of the unscattered contribution computation.
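The ray count can be checked with a few lines of arithmetic. In the sketch below the number of LORs is not taken from the text: it is our assumption that every crystal pair of the 18 module pairs forms a LOR. The function name is likewise illustrative.

```python
def ray_counts(n_path, S, n_det, n_scatter, n_detline, n_lor):
    """Split N_ray into its three terms as in the formula of the text."""
    between_scatter = n_path * (S - 1)
    detector_to_scatter = n_det * n_scatter
    direct = n_detline * n_lor
    return between_scatter, detector_to_scatter, direct

crystals_per_module = 81 * 39
n_det = 12 * crystals_per_module
# Assumption: every crystal pair of the 18 module pairs is a LOR.
n_lor = 18 * crystals_per_module ** 2

# With S = 1 the first term vanishes for any number of paths.
between, to_scatter, direct = ray_counts(
    n_path=1, S=1, n_det=n_det, n_scatter=128, n_detline=4, n_lor=n_lor)
overhead = (between + to_scatter) / direct   # extra rays relative to direct term
```

Under this assumption the additional rays needed for single scattering stay below one percent of the rays of the direct term.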
[Figure 3 panels: Geometry only; Absorption compensation; Scatter compensation.]

Fig. 3. Reconstruction results of the Derenzo phantom. The upper two rows depict a coronal and a sagittal slice of the reconstructed data; the densities shown in the lower two rows are scaled by 5 in order to highlight the differences.
The reconstruction algorithm is an iteration of photon transfer simulation and density correction. We compared different options for the transfer simulation: computing only the geometry factors, adding the attenuation due to out-scattering and photoelectric absorption, and finally adding scatter compensation.
[Figure 4 panels: Geometry only; Absorption compensation; Scatter compensation.]

Fig. 4. 3D views of the Derenzo phantom reconstructions. We used a transfer function that emphasizes the cold noise in blue to make the differences more noticeable.
To compute single scattering, 128 scattering points are used, which are resampled in each iteration step. The algorithm has been tested on a Derenzo phantom that contains pipes with radioactive material. The Derenzo phantom is put in a cube of "super bone" of edge length 32 [mm]. Super bone has the same chemical composition as normal bone but is ten times denser. In fact, it is even denser than steel, thus it can emphasize scattering and absorption phenomena. The results of the different options after 100 iteration steps are shown in Fig. 3 and Fig. 4. Note that as the forward projection simulates more of the underlying physical process, the reconstruction becomes more accurate.
6 Conclusion
This paper proposed a GPU-based scatter compensation algorithm for the reconstruction of PET measurements. The approach is restructured to exploit the massively parallel nature of GPUs. Based on the recognition that the requirements of the GPU favor a detector-oriented viewpoint, we solve the adjoint problem, i.e. originate photon paths in the detectors. The detector-oriented viewpoint also allows us to reuse samples, that is, we compute many annihilation events by tracing only a few line segments. The resulting approach can reduce the computation time of the fully 3D PET reconstruction to a few minutes.
Acknowledgement. This work has been supported by the TeraTomo project of the NKTH, OTKA K719922 (Hungary), and Bulgarian NSF DTK 02/44. This work is connected to the scientific program of the "Development of quality-oriented and harmonized R+D+I strategy and functional model at BME" project. This project is supported by the New Hungary Development Plan (Project ID: TMOP4.2.1/B09/1/KMR20100002).
References

[ABB+04] Assié, K., Breton, V., Buvat, I., Comtat, C., Jan, S., Krieguer, M., Lazaro, D., Morel, C., Rey, M., Santin, G., Simon, L., Staelens, S., Strul, D., Vieira, J.M., Walle, R.V.D.: Monte carlo simulation in PET and SPECT instrumentation using GATE. Nuclear Instruments and Methods in Physics Research Section A 527(1-2), 180–189 (2004)
[EHV+06] Espana, S., Herraiz, J.L., Vicente, E., Vaquero, J.J., Desco, M., Udias, J.M.: PeneloPET, a Monte Carlo PET simulation toolkit based on PENELOPE: Features and validation. In: IEEE Nuclear Science Symposium Conference, pp. 2597–2601 (2006)
[Gea07] Geant. Physics reference manual, Geant4 9.1. Technical report, CERN (2007)
[Med] Mediso, http://www.bioscan.com/molecularimaging/nanopetct
[SK08] Szirmay-Kalos, L.: Monte-Carlo Methods in Global Illumination - Photorealistic Rendering with Randomization. VDM, Verlag Dr. Müller, Saarbrücken (2008)
[SV82] Shepp, L., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging 1, 113–122 (1982)
[Yan08] Yang, C.N.: The Klein-Nishina formula & quantum electrodynamics. Lect. Notes Phys., vol. 746, pp. 393–397 (2008)
Modeling of the SET and RESET Process in Bipolar Resistive Oxide-Based Memory Using Monte Carlo Simulations

Alexander Makarov, Viktor Sverdlov, and Siegfried Selberherr

Institute for Microelectronics, TU Wien, Gußhausstraße 27-29, A-1040 Vienna, Austria
{makarov,sverdlov,selberherr}@iue.tuwien.ac.at
Abstract. A stochastic model of the resistive switching mechanism in bipolar oxide-based resistive random access memory (RRAM) is presented. The distribution of electron occupation probabilities obtained is in agreement with previous work. In particular, a low occupation region is formed near the cathode. Our simulations of the temperature dependence of the electron occupation probability near the anode and the cathode demonstrate a high robustness of the low occupation region. The RESET process in RRAM simulated with our stochastic model is in good agreement with experimental results.

Keywords: stochastic model, resistive switching, RRAM, Monte Carlo method.
I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 87–94, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

With memories based on charge storage (such as DRAM, flash memory, and others) approaching the physical limits of scalability, research on new memory structures has significantly accelerated. Several concepts have been invented and developed as potential substitutes for charge-based memory. Some of the technologies are already available as prototypes (such as carbon nanotube RAM (NRAM) and copper bridge RAM (CBRAM)), others as products (phase change RAM (PCRAM), magnetoresistive RAM (MRAM), ferroelectric RAM (FRAM)), while the technologies of spin-torque transfer RAM (STT-RAM), racetrack memory, and resistive RAM (RRAM) are under research. A new type of memory must exhibit low operating voltages, low power consumption, high operation speed, long retention time, high endurance, simple structure, and small size [1].

One of the most promising candidates for future universal memory is the resistive random access memory (RRAM). It is based on new materials, such as metal oxides [2-4] and perovskite oxides [5]. This type of memory is characterized by high density, excellent scalability, low operating voltages (< 2 V), fast switching times (< 10 ns), and long retention time. On the other hand, RRAM devices have not yet demonstrated sufficient endurance. Unless this problem can be solved, this technology is unlikely to be brought to market in the 2020 timeframe [1].

Unfortunately, a proper fundamental understanding of the switching mechanism in RRAM is still missing, despite the fact that several physical mechanisms based on either electron or ion determined switching have been recently suggested in the literature: a model based on trapping of charge carriers [6], electrochemical migration of oxygen vacancies [7, 8], electrochemical migration of oxygen ions [9, 10], a unified physical model [11, 12], a domain model [13], a filament anodization model [14], a thermal dissolution model [15], and others. In this work we present a stochastic model of the bipolar resistive switching mechanism based on electron hopping between the oxygen vacancies along the conductive filament in an oxide layer.

[Figure 1 panels: current (A) versus voltage (V) hysteresis cycle; a) SET process; b) ON state; c) RESET process; d) OFF state. Legend: electrons; oxygen vacancy; ion of oxygen; vacancy occupied by electron; vacancy annihilated by ion of oxygen; current; vacancy annihilation; metal-oxide layer.]

Fig. 1. Typical hysteresis cycle in RRAM and illustration of the resistive switching mechanism in a bipolar oxide-based memory cell: (a) Schematic illustration of the SET process. (b) Schematic view of the conducting filament in the low resistance state (ON state). (c) Schematic illustration of the RESET process. (d) Schematic view of the conducting filament in the high resistance state (OFF state). Only the oxygen vacancies and ions that impact the resistive switching are shown.
2 Model Description
We associate the resistive switching behavior in oxide-based memory with the formation and rupture of a conductive filament (CF) (Fig. 1). The CF is formed by localized oxygen vacancies ($V_o$) [11, 12] or domains of $V_o$. Formation and rupture of a CF is due to a redox reaction in the oxide layer under a voltage bias. The conduction is due to electron hopping between these $V_o$. For modeling the resistive switching in bipolar oxide-based memory by a Monte Carlo method, we describe the dynamics of oxygen ions ($O^{2-}$) and electrons in an oxide layer as follows:

– formation of $V_o$ by $O^{2-}$ moving to an interstitial position;
– annihilation of $V_o$ by moving $O^{2-}$ to $V_o$;
– movement of $O^{2-}$ between the interstitials;
– an electron hop into $V_o$ from an electrode;
– an electron hop from $V_o$ to an electrode;
– an electron hop between two $V_o$.
In order to model the dependences of transport on the applied voltage and temperature, we choose the hopping rates for electrons as [16]:

$$\Gamma_{nm} = A_e \cdot \frac{dE}{1 - \exp(-dE/T)} \cdot \exp(-R_{nm}/a). \qquad (1)$$
Here, $A_e$ is a coefficient, $dE = E_n - E_m$ is the difference between the energies of an electron positioned at sites $n$ and $m$, $R_{nm}$ is the hopping distance, and $a$ is the localization radius. The hopping rates between an electrode (0 or $N + 1$) and an oxygen vacancy $m$ are described as [12]:

$$\Gamma_m^{iC} = \alpha \cdot \Gamma_{0m}, \qquad \Gamma_m^{oC} = \alpha \cdot \Gamma_{m0}, \qquad (2)$$

$$\Gamma_m^{iA} = \beta \cdot \Gamma_{(N+1)m}, \qquad \Gamma_m^{oA} = \beta \cdot \Gamma_{m(N+1)}. \qquad (3)$$
Here, $\alpha$ and $\beta$ are the coefficients of the boundary conditions on the cathode and anode, respectively, $N$ is the number of sites, $C$ and $A$ stand for cathode and anode, and $i$ and $o$ for hopping onto the site and out from the site, respectively. To describe the motion of ions we have chosen the ion rates similar to (1):

$$\Gamma_n = A_i \cdot \frac{dE}{1 - \exp(-dE/T)}. \qquad (4)$$
Here we assume hopping only to a nearest interstitial. Thus, a distance-dependent term is included in $A_i$. $dE$ includes the formation energy of the $m$-th $V_o$ or the annihilation energy of the $m$-th $V_o$, when $O^{2-}$ is moving to an interstitial or back to $V_o$, respectively. The current generated by hopping is calculated as:

$$I = q_e \cdot \sum dx \Big/ \sum \Big( 1 \Big/ \sum_m \Gamma_m \Big). \qquad (5)$$
Here qe is the electron charge.
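The rate expression (1) and the current estimate can be sketched in Python as follows. Two points are our assumptions rather than statements of the paper: Eq. (5) is read as the summed displacement divided by the summed waiting time, where each Monte Carlo event lasts the inverse of the total rate, and $dx$ is treated as a dimensionless fraction of the chain length. The function names and parameter values are illustrative.

```python
import math

Q_E = 1.602176634e-19   # elementary charge [C]

def hopping_rate(dE, R, T, a, A_e=1.0):
    """Electron hopping rate of Eq. (1); dE and T share the same energy unit.

    For dE -> 0 the factor dE / (1 - exp(-dE/T)) tends to T, which is
    used explicitly to avoid a division by zero.
    """
    if abs(dE) < 1e-12 * T:
        thermal = T
    else:
        thermal = dE / (1.0 - math.exp(-dE / T))
    return A_e * thermal * math.exp(-R / a)

def current(events):
    """Hopping current from a list of (dx, total_rate) Monte Carlo events.

    Each event advances time by 1 / total_rate, so the current is the
    charge times the summed displacement over the summed time.
    """
    travelled = sum(dx for dx, _ in events)
    elapsed = sum(1.0 / rate for _, rate in events)
    return Q_E * travelled / elapsed

# Rates for the same hop with opposite signs of the energy difference.
up = hopping_rate(dE=0.2, R=1.0, T=0.05, a=0.5)
down = hopping_rate(dE=-0.2, R=1.0, T=0.05, a=0.5)
```

The ratio of the two rates equals $\exp(dE/T)$, which is a convenient consistency check of an implementation of Eq. (1).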
Fig. 2. Calculated distributions of electron occupation probabilities for unidirectional next nearest neighbor hopping between the Vo (the 1st Vo is near the cathode, the last Vo is near the anode): (a) α > 0.5 and β > 0.5, pc = 0.5; (b) β < 0.5 and β < α, pc = 1 − β; α < 0.5 and α < β, pc = α
3 Model Verification
Calculations are performed on one-dimensional lattices. All $V_o$ are at the same energy level if no voltage is applied. To simplify the calculations, we assume that an oxygen vacancy is either empty or occupied by one electron.

3.1 Calculation of Electron Occupation Probabilities
To verify the proposed model, we first evaluate the average electron occupations of hopping sites under different conditions. For comparison with previous works, all calculations in this subsection are made on a lattice consisting of thirty equivalent, equidistantly positioned hopping sites $V_o$. Following [17], we first allow hopping in one direction and only to/from the closest $V_o$. The occupation probability of the central oxygen vacancies, $p_c$, depends on the boundary conditions as follows:

1) for $\alpha > 0.5$ and $\beta > 0.5$, $p_c = 0.5$;
2) for $\alpha < 0.5$ and $\alpha < \beta$, $p_c = \alpha$;
3) for $\beta < 0.5$ and $\beta < \alpha$, $p_c = 1 - \beta$.

Fig. 2 shows simulation results of our stochastic model, which are fully consistent with the theoretical predictions [17].

To move from a model system [17] to a more realistic structure, we calculated the distribution of electron occupations for a chain where hopping is allowed not only to/from the nearest $V_o$ ($T = 0$, Fig. 3), and for systems where hopping according to (1)-(3) is allowed in both directions ($T > 0$, Fig. 4). Note that for $\alpha > 0.5$ and $\beta > 0.5$ (Fig. 3a and Fig. 4a) we still have $p_c = 0.5$ in the center, while for other values of $\alpha$, $\beta$ we observe a decrease in $p_c$ for $\alpha < \beta$ and an increase in $p_c$ for $\beta < \alpha$.

Fig. 3. Calculated distribution of electron occupation probabilities, if unidirectional hopping is allowed not only to/from the closest $V_o$ ($T = 0$): (a) $\alpha > 0.5$ and $\beta > 0.5$; (b) $\beta < 0.5$ and $\beta < \alpha$; $\alpha < 0.5$ and $\alpha < \beta$

Fig. 4. Calculated distribution of electron occupation probabilities, for hopping according to (1)-(3), for $T > 0$: (a) $\alpha > 0.5$ and $\beta > 0.5$; (b) $\beta < 0.5$ and $\beta < \alpha$; $\alpha < 0.5$ and $\alpha < \beta$
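The unidirectional nearest-neighbor case described above can be reproduced with a short Monte Carlo sketch. It uses a random-sequential update with unit bulk hopping rate, which is our simplifying assumption and not necessarily the scheme used in the paper; the sweep counts, the seed and the function names are also illustrative.

```python
import random

def simulate_chain(alpha, beta, n_sites=30, sweeps=8000, burn_in=1000, seed=0):
    """Average occupation of each site for unidirectional nearest-neighbor hopping.

    Random-sequential update: each elementary step picks one of the
    n_sites + 1 bonds at random. Bond 0 injects from the cathode with
    probability alpha, the last bond extracts to the anode with
    probability beta, inner bonds move an electron one site forward if
    the target site is empty (single occupancy).
    """
    rng = random.Random(seed)
    occ = [0] * n_sites
    totals = [0] * n_sites
    for sweep in range(sweeps):
        for _ in range(n_sites + 1):
            b = rng.randrange(n_sites + 1)
            if b == 0:
                if occ[0] == 0 and rng.random() < alpha:
                    occ[0] = 1
            elif b == n_sites:
                if occ[-1] == 1 and rng.random() < beta:
                    occ[-1] = 0
            elif occ[b - 1] == 1 and occ[b] == 0:
                occ[b - 1], occ[b] = 0, 1
        if sweep >= burn_in:
            for i, o in enumerate(occ):
                totals[i] += o
    return [t / (sweeps - burn_in) for t in totals]

def central_occupation(profile):
    """Mean occupation of the middle third of the chain."""
    n = len(profile)
    mid = profile[n // 3: 2 * n // 3]
    return sum(mid) / len(mid)

p_low = central_occupation(simulate_chain(alpha=0.2, beta=0.8))
p_high = central_occupation(simulate_chain(alpha=0.8, beta=0.2))
```

For these two parameter sets the central occupation should come out close to $\alpha$ and to $1 - \beta$, respectively, in line with cases 2) and 3) of the text.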
Fig. 5. Calculated distribution of electron occupation probabilities under diﬀerent biasing voltages. Lines are from [12], symbols are obtained with our stochastic model.
Fig. 6. Temperature dependence of electron occupation probability near the anode (line) and the cathode (dotted line)
We have calibrated our model to reproduce the results reported in [12] for V = 0.6 V to V = 1.4 V. Fig. 5 shows a case where the hopping rate between two $V_o$ is larger than the rate between the electrodes and $V_o$ (i.e. $\alpha, \beta < 1$). In this case a low occupation region is formed near the cathode (bipolar behavior). With the calibrated model we simulated the temperature dependence of the site occupations in the low occupation region. The results shown in Fig. 6 indicate a high robustness of the low occupation region, with changes of less than 10% when the temperature is raised from 25 °C to 200 °C.
Fig. 7. I-V characteristics for a single-CF device obtained from our stochastic model: (a) SET I-V characteristics; (b) RESET I-V characteristics and measured results from [12]
3.2 Modeling of the SET and RESET Processes
For the simulations we have used a one-dimensional lattice consisting of thirty equivalent, equidistantly positioned hopping sites. To simplify calculations, we assume that the coefficients of the boundary conditions are constant and equal to 0.1, independent of the applied voltage. In both simulations (SET and RESET process) we have used the same formation/annihilation energy for $V_o$. The result of the simulation of the SET process is shown in Fig. 7a. To further demonstrate the capabilities of our model, we also simulated the RESET I-V characteristics for a single-CF device [12]. For this purpose the CF was modified in such a way that for each $V_o$ an oxygen ion is placed nearby. Fig. 7b shows the simulation result of the stochastic model, which is in perfect agreement with measurements from [12].
4 Conclusion
In this work we have presented a stochastic model of the bipolar resistive switching mechanism. The distribution of the electron occupation probabilities calculated with the model is in excellent agreement with previous work. The simulated RESET process in RRAM is in good agreement with the experimental result. The proposed stochastic model can be used for performance optimization of RRAM devices.

Acknowledgments. This research is supported by the European Research Council through the grant #247056 MOSILSPIN.
References

1. Kryder, M.H., Kim, C.S.: After Hard Drives - What Comes Next? IEEE Trans. on Mag. 45(10), 3406–3413 (2009)
2. Kugeler, C., Nauenheim, C., Meier, M., et al.: Fast Resistance Switching of TiO2 and MSQ Thin Films for Non-Volatile Memory Applications (RRAM). In: NVM Tech. Symp., p. 6 (2008)
3. Chen, Y.S., Wu, T.Y., Tzeng, P.J.: Forming-free HfO2 Bipolar RRAM Device with Improved Endurance and High Speed Operation. In: Symp. on VLSI Tech., pp. 37–38 (2009)
4. Dong, R., Lee, D.S., Xiang, W.F., et al.: Reproducible Hysteresis and Resistive Switching in Metal-CuxO-Metal Heterostructures. APL 90(4), 42107/13 (2007)
5. Lin, C.C., Lin, C.Y., Lin, M.H.: Voltage-Polarity-Independent and High-Speed Resistive Switching Properties of V-Doped SrZrO3 Thin Films. IEEE Trans. on Electron Dev. 54(12), 3146–3151 (2007)
6. Fujii, T., Kawasaki, M., Sawa, A., et al.: Hysteretic Current-Voltage Characteristics and Resistance Switching at an Epitaxial Oxide Schottky Junction SrRuO3/SrTi0.99Nb0.01O3. APL 86(1), art. no. 012107 (2005)
7. Nian, Y.B., Strozier, J., Wu, N.J., et al.: Evidence for an Oxygen Diffusion Model for the Electric Pulse Induced Resistance Change Effect in Transition-Metal Oxides. PRL 98(14), 146403/14 (2007)
8. Wu, S.X., Xu, L.M., Xing, X.J.: Reverse-Bias-Induced Bipolar Resistance Switching in Pt/TiO2/SrTi0.99Nb0.01O3/Pt Devices. APL 93(4), 043502/13 (2008)
9. Szot, K., Speier, W., Bihlmayer, G., Waser, R.: Switching the Electrical Resistance of Individual Dislocations in Single-Crystalline SrTiO3. Nature Materials 5, 312–320 (2006)
10. Nishi, Y., Jameson, J.R.: Recent Progress in Resistance Change Memory. In: Dev. Res. Conf., pp. 271–274 (2008)
11. Xu, N., Gao, B., Liu, L.F., et al.: A Unified Physical Model of Switching Behavior in Oxide-Based RRAM. In: Symp. on VLSI Tech., pp. 100–101 (2008)
12. Gao, B., Sun, B., Zhang, H., et al.: Unified Physical Model of Bipolar Oxide-Based Resistive Switching Memory. IEEE Electron Dev. Let. 30(12), 1326–1328 (2009)
13. Rozenberg, M.J., Inoue, I.H., Sanchez, M.J.: Nonvolatile Memory with Multilevel Switching: A Basic Model. PRL 92(17), 1783021 (2004)
14. Kinoshita, K., Tamura, T., Aso, H., et al.: New Model Proposed for Switching Mechanism of ReRAM. In: IEEE Non-Volatile Semicond. Memory Workshop 2006, pp. 84–85 (2006)
15. Russo, U., Ielmini, D., Cagli, C., et al.: Conductive-Filament Switching Analysis and Self-Accelerated Thermal Dissolution Model for Reset in NiO-Based RRAM. In: IEDM Tech. Dig., pp. 775–778 (2007)
16. Sverdlov, V., Korotkov, A.N., Likharev, K.K.: Shot-Noise Suppression at Two-Dimensional Hopping. PRB 63, 081302 (2001)
17. Derrida, B.: An Exactly Soluble Non-Equilibrium System: The Asymmetric Simple Exclusion Process. Phys. Rep. 301(1-3), 65–83 (1998)
Stochastic Algorithm for Solving the Wigner-Boltzmann Correction Equation

M. Nedjalkov^1, S. Selberherr^1, and I. Dimov^2

1 Institute for Microelectronics, TU Wien, Gußhausstraße 27-29/E360, A-1040 Vienna, Austria
2 Institute for Parallel Processing, Bulgarian Academy of Sciences, Acad. G. Bontchev str., Bl. 25A, 1113 Sofia, Bulgaria
Abstract. The quantum kinetics of current carriers in modern nanoscale semiconductor devices is determined by the interplay between coherent phenomena and processes which destroy the quantum phase correlations. The carrier behavior has recently been described with a two-stage Wigner function model, where the phase-breaking effects are considered as a correction to the coherent counterpart. The correction function satisfies a Boltzmann-like equation. A stochastic method for solving the equation for the correction function is developed in this work, under the condition of a priori knowledge of the coherent Wigner function. The steps of an almost optimal algorithm for a stepwise evaluation of the correction function are presented. The algorithm conforms to the well-established Monte Carlo device simulation methods, and thus allows an easy implementation.
1 Introduction
Modeling and simulation of electronic transport in semiconductor devices is challenged by the nanometer- and picosecond-scale processes which determine the functionality of modern integrated circuits. Quantum transport models are explored to correctly describe coherent processes, such as tunneling, in conjunction with the decoherence processes of scattering, which tend to restore the classical behavior of the current carriers. The Wigner-Boltzmann (WB) equation gives a comprehensive quantum-kinetic description of these phenomena, and has recently been applied to the simulation of a variety of nanometer devices and the transport phenomena involved [1]. Stochastic approaches to the WB equation describe the scattering processes efficiently; the coherent part of the transport, however, is obtained at significant numerical cost. A scheme which uses coherent data obtained by alternative approaches has been developed recently. The scattering-induced correction to the coherent Wigner function satisfies a Fredholm integral equation of the second kind, with a free term determined by the coherent data. Particle methods have been developed and used to calculate the free term. We have successfully applied these methods to very small devices, where this term can be regarded as a zeroth-order correction. Here we utilize the numerical Monte Carlo theory to derive a stochastic algorithm for solving the equation for the WB correction.

An important peculiarity is that the problem is composed of two models with different dimensions: while the coherent transport involves two variables, the position and wave vector $x, k_x$, the scattering occurs in the three-dimensional wave vector space, thus involving the transversal components $k_y, k_z = k_\perp$. The two models are combined into a four-dimensional space formulation by purely physical considerations. In this respect the sequel does not stick to the formal Monte Carlo schemes for solving integral equations, and in particular the adjoint equation, which has proved to be an established approach to carrier transport problems [2], [3]. The adjoint equation remains rather implicit in the derivations, which refer to basic schemes for evaluating integrals, in favor of an emphasis on the physical aspects.

I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 95–102, 2011. © Springer-Verlag Berlin Heidelberg 2011
2 The Model
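The time-independent Wigner-Boltzmann equation:

\[
\begin{aligned}
\frac{\hbar k_x}{m}\frac{\partial}{\partial x} f_w(x,k_x,k_\perp) = {} & \int dk_x'\, V_w(x,k_x-k_x')\, f_w(x,k_x',k_\perp) \\
& + \int dk'\, f_w(x,k')\,S(k',k) - f_w(x,k)\,\lambda(k)
\end{aligned} \tag{1}
\]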
describes the coherent part of the carrier transport at a rigorous quantum level, complemented by the Boltzmann scattering model of the phase-breaking processes. Here $V_w$ is the Wigner potential, and in the Boltzmann scattering operator $S(k,k')$ represents the scattering rate for a transition from $k$ to $k'$. $\lambda(k) = \int dk'\, S(k,k')$ is the total out-scattering rate, so that the quantity $S/\lambda$ is the probability density for scattering from the initial to the final state. The solution of (1) in the region $D$ of a given device determines the physical characteristics of the current carriers and thus the circuit behavior of the device. The external factors which determine the solution are the applied bias, which controls the electric potential profile in the device, and the boundary conditions. The latter are assumed to be given by the equilibrium distribution function deep inside the device leads. This is the Maxwell-Boltzmann distribution $f_{MB}$, which is the only function that turns the second row of (1) to zero independently of the physical origin of the scattering processes.

The coherent problem is obtained from (1) by switching off all scattering processes. In this case the solution $f_w^c(x,k_x)$ does not depend on the transversal wave vector components. To align the variables with those of the full problem, $f_w^c$ must be extended to the transversal variables in such a way that $f_w^c(x,k_x)$ is recovered after an integration over them. An assumption consistent with the boundary conditions is that the transversal variables enter through the equilibrium function $f_{MB}(k_\perp)$:

\[
f_w^c(x,k) = f_w^c(x,k_x)\, \frac{\hbar^2}{2\pi mkT}\, e^{-\frac{\hbar^2 k_\perp^2}{2mkT}} \tag{2}
\]
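As a quick numerical check of the transversal factor in (2), the following minimal Python sketch verifies that it integrates to unity over the $k_\perp$ plane, so that $f_w^c(x,k_x)$ is indeed recovered after the transversal integration, and shows one way $k_\perp$ could be sampled from it. The abbreviation `a` for $\hbar^2/(2mkT)$, the function names and the use of dimensionless units are assumptions made for this illustration only.

```python
import math
import random


def f_mb_perp(k_perp, a):
    """Transversal equilibrium factor of (2), with a = hbar^2 / (2 m k T)."""
    return (a / math.pi) * math.exp(-a * k_perp ** 2)


def radial_integral(a, k_max, n=20000):
    """Midpoint rule for the plane integral int_0^k_max 2*pi*k*f_mb_perp(k) dk."""
    h = k_max / n
    total = 0.0
    for i in range(n):
        k = (i + 0.5) * h
        total += 2.0 * math.pi * k * f_mb_perp(k, a) * h
    return total


def sample_k_perp(a, rng):
    """Draw (k_y, k_z) from the Gaussian in (2); each component has variance 1/(2a)."""
    sigma = math.sqrt(1.0 / (2.0 * a))
    return rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)


norm = radial_integral(a=1.0, k_max=10.0)   # close to 1
rng = random.Random(0)
samples = [sample_k_perp(1.0, rng) for _ in range(20000)]
mean_ky2 = sum(ky * ky for ky, _ in samples) / len(samples)   # close to 1/(2a)
```

Sampling the two transversal components as independent Gaussians is equivalent to sampling from the normalized factor in (2), which is how the initial transversal state is chosen later in the algorithm.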
This allows us to define the function

\[
f_w^\Delta(x,k) = f_w(x,k) - f_w^c(x,k), \tag{3}
\]
which is the scattering-induced correction to the coherent Wigner function. The equation for the correction $f_w^\Delta$ is obtained by subtracting the coherent counterpart from (1). An immediate property of (3) is that the correction is zero at the device boundaries, where the same boundary conditions are assumed for both cases. At the next step, the Wigner potential is approximated by its classical limit, valid for slowly varying potentials:

\[
\int dk_x'\, V_w(x,k_x-k_x')\, f_w^\Delta(x,k_x',k_\perp) = -\frac{eE(x)}{\hbar}\, \frac{\partial f_w^\Delta(x,k_x,k_\perp)}{\partial k_x} \tag{4}
\]

This means that the force $F(x) = eE(x)$, given by the derivative of the potential, can only be a linear function within the spatial support of $f_w^\Delta$, related to the spatial width of the electrons. Such an assumption in the general equation (1) precludes the quantum-mechanical description of the transport. In the equation for the correction, however, it has a different physical meaning. The width of the electron has already been accounted for by the coherent solution, so that the limit precludes only correlations between the electric potential and the scattering processes. The obtained model for the correction function can be written as a Fredholm integral equation of the second kind with a free term determined by $f_w^c$:

\[
\begin{aligned}
f_w^\Delta(x,k) &= \int_{t_b}^0 dt \int dk'\, f_w^\Delta(X(t),k')\, S(k',k(t))\, e^{-\int_t^0 \lambda(k(\tau))\,d\tau} + f_w^{\Delta,0}(x,k) \\
f_w^{\Delta,0}(x,k) &= \int_{t_b}^0 dt \left[ \int dk'\, f_w^c(X(t),k')\, S(k',k(t))\, e^{-\int_t^0 \lambda(k(\tau))\,d\tau} - f_w^c(X(t),k(t))\, \lambda(k(t))\, e^{-\int_t^0 \lambda(k(\tau))\,d\tau} \right]
\end{aligned} \tag{5}
\]

Here

\[
X(t) = x - \int_t^0 \frac{\hbar K_x(\tau)}{m}\, d\tau, \qquad K_x(t) = k_x - \int_t^0 \frac{F(X(\tau))}{\hbar}\, d\tau \tag{6}
\]

are classical Newton trajectories initialized by $x, k_x$ at time 0 and parameterized backward, $t < 0$, and $k(t)$ stands for $(K_x(t), k_\perp)$. The trajectory crosses the boundary of the device at a certain time $t_b$, where $f_w^\Delta(X(t_b),k(t_b)) = 0$.
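The Newton trajectories can be illustrated with a short sketch. It assumes dimensionless units ($\hbar = m = 1$ by default) and a user-supplied force function; the midpoint integrator and all names are choices made here for illustration, not part of the original model. For a constant force the result can be compared with the closed form $K_x(t) = k_x + Ft/\hbar$, $X(t) = x + \hbar k_x t/m + Ft^2/(2m)$, which follows directly from the two integrals defining the trajectory.

```python
def newton_trajectory(x, kx, t, force, hbar=1.0, m=1.0, steps=1000):
    """Return (X(t), K_x(t)) for the trajectory passing through (x, kx) at time 0.

    Integrates dX/dt = hbar*K/m and dK/dt = F(X)/hbar with the midpoint rule.
    A negative t gives the backward parameterization, a positive t the forward one.
    """
    dt = t / steps
    X, K = x, kx
    for _ in range(steps):
        X_mid = X + 0.5 * dt * hbar * K / m
        K_mid = K + 0.5 * dt * force(X) / hbar
        X += dt * hbar * K_mid / m
        K += dt * force(X_mid) / hbar
    return X, K


def constant_force(x):
    """Hypothetical constant force F = 0.5 used only for this check."""
    return 0.5


# Backward in time from (x, kx) = (0, 1) to t = -2: closed form gives X = -1, K = 0.
Xb, Kb = newton_trajectory(0.0, 1.0, -2.0, constant_force)

# Re-initializing at the reached point and moving forward by +2 returns to the start,
# which is the property later used to switch to a forward parameterization.
X0, K0 = newton_trajectory(Xb, Kb, 2.0, constant_force)
```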
3 Computational Problem
The general task is to compute the averaged value of $f_w^\Delta$ in a given domain $\Omega$ of the two-dimensional phase space. The averaged value can be expressed as:

\[
I(\Omega) = \int dx \int dk_x\, f_w^\Delta(x,k_x)\, \theta_\Omega(x,k_x) = \int dx \int dk_x \int dk_\perp\, f_w^\Delta(x,k)\, \theta_\Omega(x,k_x) \tag{7}
\]

by introducing the domain indicator $\theta_\Omega(x,k_x)$, which is unity if the arguments belong to $\Omega$, and 0 otherwise. The solution of equation (5) can be expressed as consecutive iterations of the kernel on the free term, $f_w^\Delta = \sum_{p=0}^{\infty} f_w^{\Delta,p}$:

\[
f_w^{\Delta,(p+1)}(x,k) = \int_{-\infty}^0 dt \int dk'\, \theta_D(X(t))\, f_w^{\Delta,p}(X(t),k')\, S(k',k(t))\, e^{-\int_t^0 \lambda(k(\tau))\,d\tau} \tag{8}
\]
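The structure of the iteration series can be mimicked on a finite-dimensional analogue, where the integral kernel is replaced by a small matrix. The sketch below is only such an analogue under assumed toy numbers (the matrix `K` and vector `f0` are invented for the example); it shows that summing the consecutive applications of the kernel to the free term reproduces the solution of $f = Kf + f_0$ when the series converges.

```python
def neumann_series(kernel, free_term, n_terms=200):
    """Sum f = sum_p K^p f0 for a small matrix kernel given as a list of rows."""
    n = len(free_term)
    term = list(free_term)      # p = 0 term: the free term itself
    total = list(free_term)
    for _ in range(n_terms):
        # next term of the series: one more application of the kernel
        term = [sum(kernel[i][j] * term[j] for j in range(n)) for i in range(n)]
        total = [a + b for a, b in zip(total, term)]
    return total


# Toy kernel with eigenvalues 0.5 and 0.1, so the series converges.
K = [[0.2, 0.1], [0.3, 0.4]]
f0 = [1.0, 2.0]
f = neumann_series(K, f0)

# Residual of the second-kind equation f = K f + f0.
residual = [f[i] - sum(K[i][j] * f[j] for j in range(2)) - f0[i] for i in range(2)]
```

In the Monte Carlo setting the individual terms are not stored; each trajectory segment contributes to one term of the sum.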
The lower bound of the time integral has been extended to $-\infty$, since the introduced device domain indicator $\theta_D$ takes care of its correct value $t_b$. We consider the contributions to (7) of the consecutive terms of (8). In this way we reduce the general task (7) to a problem of evaluation of the consecutive contributions:

\[
\begin{aligned}
I(\Omega) &= \int dx \int dk_x \int dk_\perp\, f_w^\Delta(x,k)\, \theta_\Omega(x,k_x) = \sum_{p=0}^{\infty} \int dk_\perp\, I_\Omega^{(p+1)}(k_\perp) \\
I_\Omega^{(p+1)}(k_\perp) &= \int_{-\infty}^0 dt \int dx \int dk_x \int dk'\, \theta_D(X(t))\, f_w^{\Delta,p}(X(t),k')\, S(k',k(t))\, e^{-\int_t^0 \lambda(k(\tau))\,d\tau}\, \theta_\Omega(x,k_x)
\end{aligned} \tag{9}
\]

The trajectory $X(t), k(t) = (K_x(t), k_\perp)$ is initialized by $x, k_x$ at time 0, and the parameterization is backward: $t < 0$.

3.1 Stochastic Analysis
The aim of the following analysis is twofold: to devise a Monte Carlo method for the evaluation of $I(\Omega)$, and to make the method compatible with the established algorithms for device simulation, thus allowing an easy implementation. These algorithms emulate the natural evolution of Boltzmann carriers, which proceeds forward in time. Thus equation (9) must be reformulated in a forward-in-time, $t > 0$, parameterization. According to (6) the trajectory is initialized by $x, k_x$ at time 0, which can be written as:

\[
X(t) = X(t; x, k_x, 0) = x^t, \qquad K_x(t) = K_x(t; x, k_x, 0) = k_x^t .
\]

Two basic properties of the Newton trajectories are utilized. A trajectory, being the unique solution of a system of first-order differential equations, can be initialized by any of its points $x^t, k_x^t$ associated with a given time $t$. Furthermore, under stationary conditions trajectories are invariant with respect to a shift of both the time origin and the parameterization time:

\[
X(\tau) = X(\tau - t; x^t, k_x^t, 0) = X^t(\tau - t); \qquad K_x(\tau) = K_x(\tau - t; x^t, k_x^t, 0) = K_x^t(\tau - t)
\]

Here the initialization point and time have been changed accordingly, followed by a shift in time by $-t$. The short notations $X^t, K^t$ recall the new initialization by $x^t, k_x^t, 0$. It follows that $x = X^t(-t)$, $k_x = K_x^t(-t)$. The Liouville theorem
$dx\,dk_x = dx^t\,dk_x^t$ is finally utilized to reformulate (9) as follows:

\[
\begin{aligned}
I_\Omega^{(p+1)}(k_\perp) = {} & \int_0^\infty dt \int dx^t \int dk_x^t \int dk'\, \theta_D(x^t)\, f_w^{\Delta,p}(x^t,k') \left\{ \frac{S(k',k_x^t,k_\perp)}{\lambda(k')} \right\} \\
& \times \left\{ \lambda(K_x^t(t),k_\perp)\, e^{-\int_0^t \lambda(K_x^t(\tau),k_\perp)\,d\tau} \right\} \frac{\lambda(k')}{\lambda(K_x^t(t),k_\perp)}\, \theta_\Omega(X^t(t),K_x^t(t))
\end{aligned} \tag{10}
\]

where, now, the trajectory $X^t(t), K_x^t(t)$, $t > 0$, is initialized by $x^t, k_x^t$ at the time origin, and the equation has been augmented to obtain the well-known Monte Carlo probability densities (enclosed in curly brackets) for the scattering, S, and drift, D, processes. Indeed, these densities associate a final point to an initial point within the scheme:

\[
SD\{x^t, k' \to x^t, k_x^t, k_\perp \Rightarrow X^t(t), K_x^t(t), k_\perp\}, \tag{11}
\]

where $\to$ corresponds to a scattering event and $\Rightarrow$ to a drift, also called free flight. The scheme defines a segment of a numerical trajectory obtained by the consecutive iterations of (10). To analyze the physical aspects behind such a trajectory, it is sufficient to consider the second iteration $I_\Omega^{(2)}$. The following property will be used: in the limiting case, when the domain $\Omega$ shrinks to a point so that the domain indicator becomes a delta function, $\delta(x - X^t(t))\,\delta(k_x - K_x^t(t))$, equation (10) obtains a recursive form, due to the fact that $I_\delta^{(p+1)}(k_\perp) = f_w^{\Delta,(p+1)}(x,k_x,k_\perp)$. A convention to mark the variables by the number of the corresponding iteration is followed; for convenience the superscript $t$ is omitted along with the subscript of $k_x$. Finally, the notation (11), which provides a convenient abbreviation for the product of the two probability densities in (10), is utilized:

\[
\begin{aligned}
I_\Omega^{(2)}(k_{\perp 3}) = {} & \int_0^\infty dt_2 \int dx_2 \int dk_2 \int dk'_2\, \theta_D(x_2) \int_0^\infty dt_1 \int dx_1 \int dk_1 \int dk'_1\, \theta_D(x_1)\, f_w^{\Delta,0}(x_1,k'_1) \\
& \times SD\{x_1, k'_1 \to x_1, k_1, k_{\perp 2} \Rightarrow X_1(t_1), K_1(t_1), k_{\perp 2}\}\, \frac{\lambda(k'_1)}{\lambda(k'_2)}\, \delta(x_2,k'_2; X_1, K_1, t_1) \\
& \times SD\{x_2, k'_2 \to x_2, k_2, k_{\perp 3} \Rightarrow X_2(t_2), K_2(t_2), k_{\perp 3}\}\, \frac{\lambda(k'_2)}{\lambda(k'_3)}\, \theta_\Omega(X_2(t_2),K_2(t_2))
\end{aligned} \tag{12}
\]

with

\[
\delta(x_{s+1},k'_{s+1}; X_s, K_s, t_s) = \delta(x_{s+1} - X_s(t_s))\, \delta(k'_{s+1} - K_s(t_s))
\]
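The drift density in curly brackets, $\lambda(K(t))\,e^{-\int_0^t \lambda\,d\tau}$, is the kind of density that device Monte Carlo codes commonly sample with the self-scattering (thinning) technique. A minimal sketch of that technique follows, assuming a constant force, units with $\hbar = 1$, and a rate function bounded from above by a constant `gamma`; these names and the toy setting are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_free_flight(kx0, k_perp, force, rate, gamma, rng, hbar=1.0):
    """Sample a flight time t with density rate(K(t)) * exp(-int_0^t rate(K(tau)) dtau).

    Candidate event times are generated with the constant rate gamma >= rate(.).
    A candidate is accepted as a real scattering event with probability rate/gamma;
    otherwise it is a fictitious (self-scattering) event and the flight continues.
    Returns the flight time and the x-component of the wave vector at its end.
    """
    t = 0.0
    while True:
        t += -math.log(1.0 - rng.random()) / gamma   # time to the next candidate
        kx = kx0 + force * t / hbar                  # constant-force drift
        if rng.random() * gamma < rate(kx, k_perp):
            return t, kx


def constant_rate(kx, k_perp):
    """Hypothetical rate lambda = 2, independent of the wave vector."""
    return 2.0


rng = random.Random(1)
times = [sample_free_flight(0.0, 0.0, 0.3, constant_rate, 5.0, rng)[0]
         for _ in range(20000)]
mean_time = sum(times) / len(times)   # close to 1 / lambda = 0.5 for a constant rate
```

For a constant rate the sampled times are exponentially distributed, which gives a simple way to check the routine before a wave-vector-dependent rate is plugged in.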
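The zeroth order is given by the free term which, according to (5), has two components denoted by $f_w^{\Delta,0A}$ and $f_w^{\Delta,0B}$. The former is expressed in a forward-in-time parameterization [4] as follows:

\[
\begin{aligned}
f_w^{\Delta,0A}(x_1,k'_1) = {} & \int_0^\infty dt_0 \int dx_0 \int dk_0 \int dk'_0\, \theta_D(x_0) \left\{ \frac{\hbar^2}{2\pi mkT}\, e^{-\frac{\hbar^2 k_{\perp 0}^2}{2mkT}}\, f_w^c(x_0,k'_0) \right\} \left\{ \frac{S(k'_0,k_0,k_{\perp 1})}{\lambda(k'_0)} \right\} \\
& \times \left\{ \lambda(K_0(t_0),k_{\perp 1})\, e^{-\int_0^{t_0} \lambda(K_0(\tau),k_{\perp 1})\,d\tau} \right\} \frac{\lambda(k'_0)}{\lambda(k'_1)}\, \delta(x_1,k'_1; X_0, K_0, t_0)
\end{aligned} \tag{13}
\]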
The terms in the curly brackets in (12) and (13) correspond to a sequence of conditional probabilities giving rise to free-flight and scattering events. The final point of each free flight becomes the initial point for the next scattering event:

\[
\begin{array}{l|l}
x_0, k'_0, k_{\perp 0} \to x_0, k_0, k_{\perp 1} \Rightarrow X_0(t_0) = x_1,\ K_0(t_0) = k'_1,\ k_{\perp 1} & f_w^{\Delta,0A}(x_1,k'_1) \\
x_1, k'_1, k_{\perp 1} \to x_1, k_1, k_{\perp 2} \Rightarrow X_1(t_1) = x_2,\ K_1(t_1) = k'_2,\ k_{\perp 2} & f_w^{\Delta,1A}(x_2,k'_2) \\
x_2, k'_2, k_{\perp 2} \to x_2, k_2, k_{\perp 3} \Rightarrow X_2(t_2),\ K_2(t_2),\ k_{\perp 3} & I_\Omega^{(2)}(k_{\perp 3})
\end{array}
\]

The sequence of events resembles the evolution of a Boltzmann particle and thus enables the implementation of the standard algorithm for trajectory construction utilized in device Monte Carlo simulators.

3.2 Numerical Aspects
We now return to the general task, the computation of $I(\Omega)$, and analyze what happens from a numerical point of view during the particle evolution. The basic notions of the Monte Carlo evaluation of integrals are assumed to be well known, and will be applied in the following. A general result is that a stochastic approach is optimal provided that the sampling probability density is proportional to the integrand function. In this respect the initial point $x_0, k'_0, k_{\perp 0}$ in (13) is chosen according to the Gaussian in the first curly brackets for the transversal variables, and according to

\[
\frac{|f_w^c(x_0,k'_0)|}{F_1}; \qquad F_1 = \int dx \int dk_x\, |f_w^c(x,k_x)|
\]

for the longitudinal ones. Thus the initial weight of the particle is $F_1$ times the sign of $f_w^c$ at the chosen point. The multiplication by $F_1$ can be done at the final stage of evaluation of the estimators, so that the initialized particle carries the sign only. The particle evolves to $x_1, k'_1, k_{\perp 1}$ as a result of a scattering and a drift event, and the weight is updated by the ratio of the two $\lambda$ values. We note that at this stage the above procedure can be regarded as a legitimate experiment for the evaluation of $I(\Omega)^{(0)} = \int dk_{\perp 1}\, I_\Omega^{(0)}(k_{\perp 1})$. An estimator $\xi_\Omega(0)$ is introduced, whose value is updated by adding $\mathrm{sign}(f_w^c)\,\lambda(k'_0)/\lambda(k'_1)$. The integral over the transverse variables means that the update of the estimator is independent of the concrete value of $k_{\perp 1}$. The trajectory continues with a second scattering and free flight, and the weight is updated by the next fraction $\lambda(k'_1)/\lambda(k'_2)$. The obtained two-segment trajectory is a legitimate experiment for the evaluation of $I(\Omega)^{(1)}$: the weight $\mathrm{sign}(f_w^c)\,\lambda(k'_0)\lambda(k'_1)/(\lambda(k'_1)\lambda(k'_2))$ is added to an estimator $\xi_\Omega(1)$. A third step follows in the same fashion, etc. The consecutive steps give rise to a weight $\mathrm{sign}(f_w^c)\,\lambda(k'_0)/\lambda(k'_{p+1})$ used to evaluate the consecutive values of $I(\Omega)^{(p)}$, stored by the corresponding estimators $\xi_\Omega(p)$.
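The collapse of the accumulated weight to a single ratio is a telescoping product. As a brief worked step (the counter $n$ for the number of completed scattering and free-flight segments, and the symbol $w_n$ for the accumulated weight, are labels introduced here for the illustration):

```latex
\[
w_n \;=\; \mathrm{sign}(f_w^c)\prod_{s=0}^{n-1}\frac{\lambda(k'_s)}{\lambda(k'_{s+1})}
    \;=\; \mathrm{sign}(f_w^c)\,
          \frac{\lambda(k'_0)}{\lambda(k'_1)}\cdot
          \frac{\lambda(k'_1)}{\lambda(k'_2)}\cdots
          \frac{\lambda(k'_{n-1})}{\lambda(k'_n)}
    \;=\; \mathrm{sign}(f_w^c)\,\frac{\lambda(k'_0)}{\lambda(k'_n)} .
\]
```

All intermediate rates cancel, so a trajectory only needs to carry the sign and the rate at its initial point, and divide by the rate at the end point of the latest free flight.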
The procedure continues until the trajectory leaves the device domain for the first time. In this case the device domain indicator becomes zero, which resets the accumulated weight of all further steps to 0. The contributions to the higher-order terms in the sum for $I(\Omega)$ become zero, and the further evolution of such a trajectory becomes unnecessary. In this way one trajectory represents one independent experiment for a direct evaluation of $I_\Omega$: all estimators can be merged into one, $\xi_\Omega$. Finally, the arithmetic mean of the value of $\xi_\Omega$ accumulated over $N$ independent trajectories, multiplied by $F_1$, is a Monte Carlo estimate of $I_\Omega$.

The contribution of the second component $f_w^{\Delta,0B}$ is subject to a similar analysis. The only difference is that the trajectory begins with a free flight, determined by the initialization point. This can be formally accounted for by replacing the first $S/\lambda$ term in (13) by a delta function. Different strategies may be considered: the two contributions can be evaluated separately, or $f_w^{\Delta,0}$ can be evaluated at a first stage and then used for a direct evaluation of the iteration series. As the efficiency of these strategies can be estimated by numerical experiments only, we continue by adopting the 'separate simulation' approach.

3.3 Pointwise Evaluation
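It is further assumed that the coherent solution is known only pointwise. The following decomposition can be utilized in (10):

\[
\int dx^t \int dk'_x\, f_w^{\Delta,(p)}(x^t,k'_x,k'_\perp) = \sum_{mn} f_w^{\Delta,(p)}(x_m^t,k'_{xn},k'_\perp)\, \Delta \tag{14}
\]

introduced by the interval $\Delta = \Delta k_x \Delta x$. The computational task is further focused on the evaluation of the averaged value of $f_w^{\Delta,(p+1)}$ in the domain $\Omega_{ij}$ specified by $\Delta$ around $(x_i, k_{xj})$. In particular, (10) reduces to the recursive relation:

\[
\begin{aligned}
f_w^{\Delta,(p+1)}(x_i,k_{xj},k_\perp) = {} & \sum_{mn} \int_0^\infty dt \int dk_x^t \int dk'_\perp\, f_w^{\Delta,p}(x_m^t,k'_{xn},k'_\perp) \left\{ \frac{S(k',k_x^t,k_\perp)}{\lambda(k')} \right\} \\
& \times \left\{ \lambda(K_x^t(t),k_\perp)\, e^{-\int_0^t \lambda(K_x^t(\tau),k_\perp)\,d\tau} \right\} \frac{\lambda(k')}{\lambda(K_x^t(t),k_\perp)}\, \theta_D(x_m^t)\, \theta_{\Omega_{ij}}(X^t(t),K_x^t(t)),
\end{aligned} \tag{15}
\]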
where the trajectory is initialized by $x_m, k_x^t$, and gives rise to the following algorithm:

- The phase space simulation domain is decomposed into subdomains $\Omega_{mn}$ around the nodes $x_m, k_{xn}$. The estimators $\xi_{mn}$ are initialized to zero. The probabilities
  \[
  P_{mn} = \frac{|f_w^c(x_m,k_{xn})|}{F_1}; \qquad F_1 = \sum_{mn} |f_w^c(x_m,k_{xn})|
  \]
  are evaluated. The number of independent Monte Carlo experiments is set to $N_l$.
- Within a loop over $l = 1, \ldots, N_l$: the initial point $x_m, k_{xn}, k_\perp$ of the $l$-th trajectory is chosen randomly by using $P_{mn}$ and the Gaussian distribution function of the transversal wave vectors. The product of the sign of $f_w^c$ and $\lambda$, both evaluated at the initial point, is assigned to a variable $w_l$.
- The construction of the trajectory begins with a scattering event for the iteration series A, corresponding to the first component of the free term, followed by a free flight. For the second component, B, only the free flight remains. In both cases the events are realized by the standard scheme for device Monte Carlo simulators.
- After each free flight: if the trajectory belongs to the device domain, the estimator of the node nearest to the end point is updated by adding $w_l/\lambda$, where $\lambda$ is determined by the free-flight end point; otherwise the construction of the trajectory is stopped and another trajectory begins.
- At the end of the loop the values of the estimators are divided by $N_l$. It holds: $f_w^{\Delta A,B}(x_i,k_{xj}) \simeq \xi_{ij}^{A,B}/N_l$.
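The steps above can be sketched in Python as follows. This is only a toy illustration under strong assumptions that are not in the original: units with $\hbar = m = 1$, a constant force, a device occupying $0 \le x \le$ `length`, an invented bounded rate `rate`, a unit thermal width, and an invented scattering model whose after-scattering state does not depend on the initial one. The function returns the estimators divided by $N_l$ together with $F_1$; how the two are combined follows the text, not this sketch.

```python
import math
import random

GAMMA = 2.0  # assumed upper bound of rate(); needed for self-scattering


def rate(kx, kp2):
    """Toy total out-scattering rate lambda(k), 1 < rate <= GAMMA (assumption)."""
    return 1.0 + 1.0 / (1.0 + kx * kx + kp2)


def thermal_kp2(rng, sigma=1.0):
    """Squared transversal wave vector drawn from a 2D Gaussian (toy thermal width)."""
    return rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2


def scatter(rng):
    """Toy after-scattering state; a realistic S/lambda depends on the initial state."""
    return rng.gauss(0.0, 1.0), thermal_kp2(rng)


def free_flight(x, kx, kp2, force, rng):
    """Constant-force drift; the flight time is sampled by self-scattering."""
    t = 0.0
    while True:
        t += -math.log(1.0 - rng.random()) / GAMMA
        if rng.random() * GAMMA < rate(kx + force * t, kp2):
            return x + kx * t + 0.5 * force * t * t, kx + force * t


def nearest(grid, value):
    """Index of the grid node closest to value (edge node if value is off-grid)."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - value))


def correction(xs, kxs, fc, force, length, n_traj, series, seed=0):
    """Estimators xi_ij / N_l for series 'A' or 'B', plus the normalization F1."""
    rng = random.Random(seed)
    nodes = [(m, n) for m in range(len(xs)) for n in range(len(kxs))]
    absf = [abs(fc[m][n]) for m, n in nodes]
    f1 = sum(absf)
    xi = [[0.0] * len(kxs) for _ in xs]
    for _ in range(n_traj):
        m, n = rng.choices(nodes, weights=absf)[0]         # node chosen with P_mn
        x, kx, kp2 = xs[m], kxs[n], thermal_kp2(rng)
        w = math.copysign(1.0, fc[m][n]) * rate(kx, kp2)   # sign times initial lambda
        skip_scattering = series == "B"                    # B starts with a free flight
        while True:
            if not skip_scattering:
                kx, kp2 = scatter(rng)
            skip_scattering = False
            x, kx = free_flight(x, kx, kp2, force, rng)
            if not 0.0 <= x <= length:                     # left the device: stop
                break
            xi[nearest(xs, x)][nearest(kxs, kx)] += w / rate(kx, kp2)
    return [[v / n_traj for v in row] for row in xi], f1


xs = [0.5, 1.5, 2.5, 3.5]
kxs = [-1.5, -0.5, 0.5, 1.5]
fc = [[0.1, 0.4, 0.3, -0.05] for _ in xs]   # invented coherent data with a negative value
xi_a, f1 = correction(xs, kxs, fc, 0.5, 4.0, 2000, "A", seed=1)
xi_b, _ = correction(xs, kxs, fc, 0.5, 4.0, 2000, "B", seed=1)
f_delta = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(xi_a, xi_b)]
```

End points whose wave vector falls outside the grid are binned to the edge node here, which is a simplification of the sketch; a production code would also use a physical scattering model and the actual force profile.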
Finally

\[
f_w^\Delta(x_i,k_{xj}) = f_w^{\Delta A}(x_i,k_{xj}) - f_w^{\Delta B}(x_i,k_{xj}).
\]

4 Conclusions
The presented approach aims at an estimation of the effect of scattering on the coherent transport in nanoscale devices. It offers high computational efficiency at the expense of neglecting the correlations between the electric potential and the scattering events. The devised Monte Carlo algorithm calculates pointwise the values of the scattering-induced Wigner function correction. It is compatible with the established methods for Monte Carlo device simulation and thus allows an easy implementation.
Acknowledgment. This work has been supported by the Austrian Science Fund Project FWF-P21685.
References
1. Querlioz, D., Dollfus, P.: The Wigner Monte Carlo Method for Nanoelectronic Devices - A particle description of quantum transport and decoherence (ISTE-Wiley) (2010)
2. Kosina, H., Nedjalkov, M., Selberherr, S.: The stationary Monte Carlo method for device simulation - Part I: Theory. J. Appl. Phys. 93(6), 3553–3563 (2003)
3. Nedjalkov, M., Kosina, H., Selberherr, S., Ringhofer, C., Ferry, D.K.: Unified particle approach to Wigner-Boltzmann transport in small semiconductor devices. Physical Review B 70(11), 115319–115335 (2004)
4. Schwaha, P., Baumgartner, O., Heinzl, R., Nedjalkov, M., Selberherr, S., Dimov, I.: Classical approximation of the scattering induced Wigner correction equation. In: 13th International Workshop on Computational Electronics Book of Abstracts, IWCE-13, Beijing, China, pp. 177–180. IEEE, Los Alamitos (2009)
Modeling Thermal Effects in Fully-Depleted SOI Devices with Arbitrary Crystallographic Orientation

K. Raleva^1, D. Vasileska^2, and S.M. Goodnick^2

1 University Sts. Cyril and Methodius, Skopje, Republic of Macedonia
2 Arizona State University, Tempe, AZ 85287-5706, USA
Abstract. In this work we continue our investigation of the heating effects in nanoscale FD-SOI devices using an in-house thermal particle-based device simulator. We focus on the current variations for FD-SOI devices with arbitrary crystallographic orientation and examine which crystallographic orientation gives better results from an electrical and thermal point of view. Our simulation results demonstrate that one can obtain the lowest current degradation with (110) wafer orientation. The temperature of the hot-spot is the smallest for the (110) orientation as well.

Keywords: nanoscale FD-SOI devices, self-heating effects, crystallographic orientation, particle-based device simulations.
1 Introduction
The continuous downscaling of MOSFET geometries is motivated by the need for higher packing density and device speed. The objective of device miniaturization is to deliver high performance at low cost. It results in reduced unit cost per function and in enhanced performance. Full functionality in MOSFETs with technological gate lengths between 10 nm and 100 nm has been achieved, leading to mass production of devices, and MOSFETs with gate lengths below 10 nm have been demonstrated. Maintaining the pace of MOSFET device scaling in the sub-100 nm gate length regime has become increasingly difficult. The simple scaling of the channel length and gate oxide thickness is no longer sufficient to deliver the projected speed/power performance enhancement for high-performance logic device technologies. Problems include short-channel effects, such as subthreshold leakage current and threshold voltage changes due to drain-induced barrier lowering (DIBL), and the high level of leakage current through the ultra-thin gate dielectric. These leakage currents cause higher static power dissipation. Active switching power is another key problem, where a higher number of gates switching at high frequency with only modest reductions in supply voltage results in high active power density.

The problems facing device scaling necessitate new solutions. The desired solution is one that increases the MOSFET drive current while reducing leakage currents, short-channel effects and the active power density. To achieve further improvement of performance in scaled silicon devices, applied mechanical stress [1], alternative wafer orientations [2], [3], and multi-gate transistors [4], [5] have been actively researched or are already in production. All these options take advantage of the anisotropic nature of the silicon crystal and, therefore, of its anisotropic band structure; in engineering terms, the gains come from the carrier transport mass and mobility. For instance, strained Si is the only new channel material which has recently made its way into commercial integrated circuits. By straining the silicon channel, carrier mobility can be enhanced. Also, devices fabricated on strained-Si (110) wafer orientations have shown improved mobility characteristics over (100) devices [6]. Similar results for (110) strained SOI MOSFETs have been published in Ref. [7] as well.

The current trend in device scaling is a transition away from conventional planar CMOS to alternative non-planar technology devices, such as fully-depleted (FD), dual-gate (DG), tri-gate silicon-on-insulator (SOI) and others. The advantages of these devices are higher drive current, low junction capacitance, reduced leakage current, suppression of floating-body effects, absence of latch-up and ease of scaling. However, one of the major problems with SOI devices is that they exhibit self-heating effects. These self-heating effects arise from the fact that the underlying SiO2 layer has about 100 times smaller thermal conductivity than bulk Si. We have previously reported that self-heating and increased power density play important roles in the operation of FD-SOI devices with gate lengths between 25 and 180 nm [8,9,10], using 2D electro-thermal simulation based on the self-consistent solution of the Boltzmann transport equation for the electrons via Monte Carlo techniques and the energy balance equations for acoustic and optical phonons.

I. Dimov, S. Dimova, and N. Kolkovska (Eds.): NMA 2010, LNCS 6046, pp. 103–109, 2011. © Springer-Verlag Berlin Heidelberg 2011

Fig. 1. Cross-section and geometrical dimensions of the simulated 25 nm gate-length FD-SOI structure
There, it was shown that, due to geometry and velocity overshoot, self-heating effects are more pronounced for larger channel length devices with correspondingly larger supply voltages. In this work we continue our investigation of the current degradation in nanoscale FD-SOI devices due to self-heating effects. We focus on the current variations for FD-SOI devices with arbitrary crystallographic orientation and examine which crystallographic orientation gives better results from an electrical and thermal point of view. Details of the structure being examined, the electrical and thermal boundary conditions, the current degradation due to self-heating and the lattice temperature profiles for (100), (111) and (110) wafer orientations are presented in Section 2 of this paper. Conclusions regarding this work and future directions of research are given in Section 3.

Fig. 2. Conduction band edge profile (in Volts) of the simulated structure for VGS = 1.2 V and VDS = 1.2 V. Also, the positions of the thermal Dirichlet boundary conditions are shown. (Labels in the figure: VG = 1.2 V, VS = 0, VD = 1.2 V, Vsubstrate = 0; 300 K at the metal gate, source and drain contacts; Tbox = 300 K.)
2 Electro-Thermal Simulations for 25 nm Gate-Length FD-SOI MOSFET
The cross-section of the simulated 25 nm gate-length FD-SOI structure is shown in Fig. 1. In order to get more realistic results from the thermal simulations, we extend the length of the metal (copper) gate, source and drain electrodes. In all simulations presented in this work, we have assumed Dirichlet boundary conditions at the bottom of the BOX and at the end of the three electrodes (see Fig. 2). For all other boundaries, Neumann conditions are assumed. Details on the role of the substrate and thermal boundary conditions can be found in [9]. The conduction band edge profile for VGS = 1.2 V and VDS = 1.2 V and the electric Dirichlet boundary conditions are also shown in Fig. 2.

To take into account the wafer orientation, we use the standard effective mass approach, which describes the band edge electronic properties in an approximate manner. Silicon Δ-valley effective masses and subband degeneracy for (100), (111) and (110) wafer orientations are given in Table 1, where ml and mt are the longitudinal and the transverse effective masses, respectively. The expressions for the effective mass are derived according to [11]. In Table 2 we present the on-current variations and current degradations due to self-heating for different wafer orientations.

Table 1. Silicon Δ-valley effective masses and subband degeneracy for (100), (111) and (110) wafer orientations (ml = 0.91, mt = 0.19)

Table 2. Current variations for different wafer orientations
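To make the role of the wafer orientation in the effective mass approach concrete, the sketch below evaluates the quantization (confinement) mass of each Δ valley from the standard ellipsoidal-valley relation $1/m_z = n^T M^{-1} n$, where $n$ is the unit normal of the wafer surface and $M^{-1}$ is the inverse effective mass tensor of the valley. This textbook relation and the function names are assumptions of the illustration; whether the resulting numbers coincide entry by entry with Table 1, whose expressions follow [11], is not verified here.

```python
import math

M_L, M_T = 0.91, 0.19  # longitudinal and transverse masses (units of the free electron mass)

# Long axes of the three pairs of Delta valleys, along [100], [010] and [001].
VALLEY_AXES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def quantization_mass(normal, axis, ml=M_L, mt=M_T):
    """Confinement mass from 1/m_z = n^T M^{-1} n for a valley with long axis `axis`."""
    norm = math.sqrt(sum(c * c for c in normal))
    n = [c / norm for c in normal]
    inverse = sum(n[i] ** 2 / (ml if axis[i] else mt) for i in range(3))
    return 1.0 / inverse


def valley_groups(normal):
    """Group the six valleys by confinement mass; returns {mass: degeneracy}."""
    groups = {}
    for axis in VALLEY_AXES:
        mz = round(quantization_mass(normal, axis), 6)
        groups[mz] = groups.get(mz, 0) + 2   # each axis carries the +k and -k valleys
    return groups


g100 = valley_groups((1, 0, 0))
g111 = valley_groups((1, 1, 1))
g110 = valley_groups((1, 1, 0))
```

Under this relation the (100) surface splits the valleys into a twofold group with the heavy confinement mass and a fourfold group with the light one, the (111) surface leaves all six valleys equivalent, and the (110) surface gives a fourfold and a twofold group.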
The simulation results show that the higher on-current is obtained when the simulated FD-SOI structure is designed on wafers with either (100) or (110) crystallographic orientation. This is due to the lower effective masses along the corresponding transport directions, which result in a higher electron drift velocity in the channel (see Fig. 3). Note that the carriers in the simulated structures for the given bias conditions are in the velocity overshoot regime, which leads to very small current degradation. The lattice temperature profiles in the active silicon layer for (100) and (111) crystallographic orientations are shown in Fig. 4 (left panel). From Fig. 4 (right
Fig. 3. Average electron velocity along the channel for different wafer orientations